# <div align="center" style="color: brown"><strong>Linear Regression</strong></div>

## <div style="color: red"><strong>Part 1: Theoretical Background</strong></div>


### Overview
Linear Regression is a supervised machine learning algorithm used for predicting continuous values. It models the relationship between a dependent variable (target) and one or more independent variables (features) by fitting a linear equation to the observed data.

### Mathematical Formulation
For simple linear regression:
y = β₀ + β₁x + ε

For multiple linear regression:
y = β₀ + β₁x₁ + β₂x₂ + ... + βₙxₙ + ε

Where:
- y is the dependent variable (target)
- x, x₁, x₂, ..., xₙ are independent variables (features)
- β₀ is the y-intercept (bias term)
- β₁, β₂, ..., βₙ are the coefficients (weights)
- ε is the error term (unobserved noise; the residuals yᵢ - ŷᵢ are its estimates after fitting)

### Types of Linear Regression
1. **Simple Linear Regression**: Uses one independent variable to predict a dependent variable.
2. **Multiple Linear Regression**: Uses multiple independent variables to predict a dependent variable.
3. **Polynomial Regression**: Models the relationship between x and y as an nth-degree polynomial. It still counts as linear regression because the model is linear in the coefficients.
4. **Ridge Regression**: Adds L2 regularization to prevent overfitting.
5. **Lasso Regression**: Adds L1 regularization which can lead to feature selection.
6. **Elastic Net**: Combines both L1 and L2 regularization.
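
To make the regularized variants above concrete, here is a minimal sketch that fits OLS, Ridge, Lasso, and Elastic Net on the same synthetic data and compares their coefficients. The data, the `alpha` values, and the `l1_ratio` are illustrative assumptions, not tuned settings.

```python
import numpy as np
from sklearn.linear_model import ElasticNet, Lasso, LinearRegression, Ridge

# Synthetic data (assumption): 5 features, only the first two affect the target
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.5, size=200)

# Penalty strengths are arbitrary illustrative choices
models = {
    "OLS": LinearRegression(),
    "Ridge": Ridge(alpha=1.0),
    "Lasso": Lasso(alpha=0.1),
    "ElasticNet": ElasticNet(alpha=0.1, l1_ratio=0.5),
}

# Fit every model on the same data and collect its coefficient vector
coefs = {name: model.fit(X, y).coef_ for name, model in models.items()}

for name, coef in coefs.items():
    print(f"{name:<10} {np.round(coef, 3)}")
```

On data like this, the L1-penalized models tend to push the coefficients of the three irrelevant features to exactly zero, while Ridge only shrinks them toward zero.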

### Key Assumptions
1. **Linearity**: The relationship between features and target is linear.
2. **Independence**: Observations are independent of each other.
3. **Homoscedasticity**: Constant variance in errors across all levels of independent variables.
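4. **Normality**: Residuals follow a normal distribution. This matters for confidence intervals and hypothesis tests, not for computing the coefficient estimates themselves.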
5. **No or little multicollinearity**: Independent variables are not highly correlated.
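
One way to check the multicollinearity assumption is the variance inflation factor (VIF). The sketch below computes it by regressing each feature on the others. The helper `variance_inflation_factors` and the synthetic features are hypothetical, written only for this illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression


def variance_inflation_factors(X):
    """Return VIF_j = 1 / (1 - R^2_j), where R^2_j is obtained by
    regressing column j of X on all remaining columns."""
    vifs = []
    for j in range(X.shape[1]):
        others = np.delete(X, j, axis=1)
        r2 = LinearRegression().fit(others, X[:, j]).score(others, X[:, j])
        vifs.append(1.0 / (1.0 - r2))
    return np.array(vifs)


# Synthetic features (assumption): x2 is nearly a copy of x1, x3 is unrelated
rng = np.random.default_rng(0)
x1 = rng.normal(size=300)
x2 = x1 + rng.normal(scale=0.1, size=300)
x3 = rng.normal(size=300)
X = np.column_stack([x1, x2, x3])

vifs = variance_inflation_factors(X)
print(np.round(vifs, 2))
```

A VIF close to 1 indicates little correlation with the other features. A common rule of thumb treats values above roughly 5 to 10 as a sign of problematic multicollinearity.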

### Estimation Method
The most common method to estimate parameters is Ordinary Least Squares (OLS), which minimizes the sum of squared differences between observed and predicted values:
min Σ(yᵢ - ŷᵢ)²
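
As a sketch of what OLS computes, the coefficients can be obtained directly with NumPy, either through a least-squares solver or through the normal equations (XᵀX)β = Xᵀy. The synthetic data and variable names below are assumptions chosen for illustration.

```python
import numpy as np

# Synthetic data (assumption): y = 4 + 3x + Gaussian noise
rng = np.random.default_rng(0)
x = 2 * rng.random(100)
y = 4 + 3 * x + rng.normal(size=100)

# Design matrix: a column of ones for the intercept, then the feature
X_design = np.column_stack([np.ones_like(x), x])

# Option 1: numerically stable least-squares solver
beta_lstsq, *_ = np.linalg.lstsq(X_design, y, rcond=None)

# Option 2: solve the normal equations (X^T X) beta = X^T y
beta_normal = np.linalg.solve(X_design.T @ X_design, X_design.T @ y)

# The quantity being minimized: the sum of squared residuals
residuals = y - X_design @ beta_lstsq
rss = float(residuals @ residuals)

print("lstsq estimate [b0, b1]:          ", beta_lstsq)
print("normal-equation estimate [b0, b1]:", beta_normal)
print("sum of squared residuals:", rss)
```

Both routes give the same estimates here. In practice the least-squares solver is usually the safer choice, because forming XᵀX explicitly can amplify numerical error when features are strongly correlated.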

### Evaluation Metrics
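1. **R-squared (Coefficient of Determination)**: Proportion of variance explained by the model. It lies between 0 and 1 on the training data of a model with an intercept, but can be negative on held-out data when the model predicts worse than the mean of the target.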
2. **Adjusted R-squared**: R-squared adjusted for the number of predictors.
3. **Mean Squared Error (MSE)**: Average of squared differences between predicted and actual values.
4. **Root Mean Squared Error (RMSE)**: Square root of MSE.
5. **Mean Absolute Error (MAE)**: Average of absolute differences between predicted and actual values.
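
The metrics above can be computed as in the following sketch. The small arrays are made-up values for illustration, and `n_predictors = 1` is an assumption about the model that produced the predictions.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Made-up targets and predictions (assumption)
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_hat = np.array([2.5, 5.5, 7.0, 10.0])
n_samples = len(y_true)
n_predictors = 1  # assumed number of features in the model

mse = mean_squared_error(y_true, y_hat)   # 0.375
rmse = np.sqrt(mse)
mae = mean_absolute_error(y_true, y_hat)  # 0.5
r2 = r2_score(y_true, y_hat)              # 0.925

# Adjusted R-squared penalizes R-squared for the number of predictors
adj_r2 = 1 - (1 - r2) * (n_samples - 1) / (n_samples - n_predictors - 1)

# Predictions worse than always guessing the mean give a negative R-squared
y_bad = np.array([9.0, 7.0, 5.0, 3.0])
r2_bad = r2_score(y_true, y_bad)          # -3.0

print(f"MSE={mse:.3f} RMSE={rmse:.3f} MAE={mae:.3f}")
print(f"R2={r2:.3f} adjusted R2={adj_r2:.4f} R2 (bad predictions)={r2_bad:.1f}")
```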

### Implementation Considerations
- Feature scaling/normalization does not change the predictions of plain OLS, but it matters for regularized variants (Ridge, Lasso, Elastic Net) and for gradient-based solvers
- Handle missing values and outliers appropriately
- Consider feature engineering to capture non-linear relationships
- Validate assumptions before interpreting the model
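
As an example of the feature engineering point above, the sketch below compares a plain linear fit with a pipeline that adds squared terms and scales the features before fitting. The quadratic synthetic data and the choice of `degree=2` are assumptions for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Synthetic data (assumption): the target depends on x quadratically
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = 0.5 * X[:, 0] ** 2 + X[:, 0] + 2 + rng.normal(scale=0.3, size=200)

# Baseline: a straight line fitted to curved data
linear_fit = LinearRegression().fit(X, y)

# Engineered features: add x^2, standardize, then fit the same linear model
poly_fit = make_pipeline(
    PolynomialFeatures(degree=2, include_bias=False),
    StandardScaler(),
    LinearRegression(),
).fit(X, y)

# In-sample R-squared of each model
r2_linear = linear_fit.score(X, y)
r2_poly = poly_fit.score(X, y)

print(f"R2 with raw feature:       {r2_linear:.3f}")
print(f"R2 with polynomial feature: {r2_poly:.3f}")
```

Putting the preprocessing steps in a pipeline keeps them tied to the model, so the same transformations are applied at prediction time.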

### Advantages
- Simple to understand and implement
- Computationally efficient
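- Highly interpretable (each coefficient is the expected change in the target per unit change in that feature, holding the others fixed; magnitudes are comparable across features only when the features share a scale)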
- Works well with large datasets
- Provides confidence intervals and prediction intervals

### Limitations
- Assumes linear relationship between variables
- Sensitive to outliers
- May underperform with highly non-linear data
- Cannot handle irrelevant features well without regularization
- Coefficient estimates become unstable when features are highly correlated (multicollinearity)
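
The sensitivity to outliers listed above can be seen in a short sketch: corrupting a single point at the edge of the feature range noticeably changes the fitted slope. The data and the size of the corruption are assumptions for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data (assumption): y = 2x + 1 plus small Gaussian noise
rng = np.random.default_rng(0)
X = np.linspace(0, 10, 50).reshape(-1, 1)
y = 2.0 * X[:, 0] + 1.0 + rng.normal(scale=0.5, size=50)

# Slope fitted on the clean data
slope_clean = LinearRegression().fit(X, y).coef_[0]

# Corrupt only the last observation (x = 10, a high-leverage position)
y_outlier = y.copy()
y_outlier[-1] += 100.0
slope_outlier = LinearRegression().fit(X, y_outlier).coef_[0]

print(f"Slope without outlier: {slope_clean:.3f}")
print(f"Slope with one outlier: {slope_outlier:.3f}")
```

Because OLS squares the residuals, one extreme point can dominate the loss and pull the whole line toward itself.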

### Applications
- Economics (demand prediction)
- Finance (asset pricing)
- Biology (growth rates)
- Real estate (house price prediction)
- Marketing (sales forecasting)

## <div style="color: red"><strong>Part 2: Implementation and Experiments</strong></div>

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# Generate some sample data
np.random.seed(0)
X = 2 * np.random.rand(100, 1)  # Feature (independent variable)
y = 4 + 3 * X + np.random.randn(100, 1)  # Target (dependent variable) with some noise

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create a Linear Regression model
model = LinearRegression()

# Train the model
model.fit(X_train, y_train)

# Make predictions on the test set
y_pred = model.predict(X_test)

# Print the fitted parameters (y is 2-D, so intercept_ and coef_ are arrays)
print(f"Intercept (beta_0): {model.intercept_[0]}")
print(f"Coefficient (beta_1): {model.coef_[0][0]}")

# Evaluate the model
mse = mean_squared_error(y_test, y_pred)
print(f"Mean Squared Error: {mse}")

# Plot the results
plt.figure(figsize=(10, 6))
plt.scatter(X_test, y_test, color='blue', label='Actual data')
plt.plot(X_test, y_pred, color='red', linewidth=2, label='Linear regression line')
plt.xlabel("Feature (X)")
plt.ylabel("Target (y)")
plt.title("Linear Regression Example")
plt.legend()
plt.grid(True)
plt.show()