# <div align="center" style="color: brown"><strong>Regularization in Regression</strong></div>

## <div style="color: red"><strong>Part 1. Introduction to Regularization</strong></div>

Regularization is a technique used in regression models to prevent overfitting by adding a penalty term to the loss function. Overfitting occurs when a model learns the noise in the training data, resulting in poor generalization to new data. Regularization discourages complex models by penalizing large coefficients.

### Why Regularization?
- **Reduces overfitting**: Helps the model generalize better to unseen data.
- **Controls model complexity**: Penalizes large weights, leading to simpler models.
- **Improves stability**: Especially useful when features are highly correlated (multicollinearity).
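
The stability point can be checked empirically. The sketch below is an illustration under assumed settings, not part of the original notebook: it refits ordinary least squares and a ridge model (the L2 penalty described in the next subsection) on bootstrap resamples of two nearly identical features and compares how much the coefficients vary. The sample size, the noise levels, and `alpha=10.0` are arbitrary choices.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(0)
n = 100
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.1, size=n)   # nearly a copy of x1 (multicollinearity)
X_boot = np.column_stack([x1, x2])
y_boot = 5 * x1 + 2 * x2 + rng.normal(scale=2, size=n)

ols_coefs, ridge_coefs = [], []
for _ in range(200):
    idx = rng.integers(0, n, size=n)      # bootstrap resample of the rows
    ols_coefs.append(LinearRegression().fit(X_boot[idx], y_boot[idx]).coef_)
    ridge_coefs.append(Ridge(alpha=10.0).fit(X_boot[idx], y_boot[idx]).coef_)

# Spread of each coefficient across the 200 refits
ols_std = np.std(ols_coefs, axis=0)
ridge_std = np.std(ridge_coefs, axis=0)
print("Std of OLS coefficients across resamples:  ", ols_std)
print("Std of ridge coefficients across resamples:", ridge_std)
```

A smaller spread for ridge means the fitted coefficients depend less on which rows happened to be sampled, at the cost of some bias toward zero.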

### Types of Regularization
1. **Ridge Regression (L2 Regularization)**: Adds the sum of squared coefficients as a penalty.
2. **Lasso Regression (L1 Regularization)**: Adds the sum of absolute coefficients as a penalty.
3. **Elastic Net**: Combines both L1 and L2 penalties.

### Mathematical Formulation
For a linear regression model:
$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_n X_n + \epsilon$$

The regularized loss functions are:
- **Ridge (L2):**
  $$\text{Loss} = \sum_{i=1}^m (y_i - \hat{y}_i)^2 + \lambda \sum_{j=1}^n \beta_j^2$$
- **Lasso (L1):**
  $$\text{Loss} = \sum_{i=1}^m (y_i - \hat{y}_i)^2 + \lambda \sum_{j=1}^n |\beta_j|$$

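Here $\lambda$ is the regularization strength, exposed as `alpha` in scikit-learn; a higher $\lambda$ means stronger shrinkage. The intercept $\beta_0$ is not penalized. Note that scikit-learn's `Lasso` divides the squared-error term by $2m$, so its `alpha` is on a different scale from the $\lambda$ above, while `Ridge` uses the unscaled sum of squares.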
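
To make the two formulas concrete, the following sketch evaluates both penalized losses by hand. All numbers are hypothetical toy values chosen only so the arithmetic is easy to follow.

```python
import numpy as np

# Hypothetical toy values, not taken from the notebook's dataset
y_true = np.array([3.0, 5.0, 7.0])
y_hat = np.array([2.0, 5.0, 9.0])
beta = np.array([2.0, -1.0])   # slope coefficients only; the intercept is not penalized
lam = 0.5                      # regularization strength (lambda)

rss = np.sum((y_true - y_hat) ** 2)        # 1 + 0 + 4 = 5
l2_penalty = lam * np.sum(beta ** 2)       # 0.5 * (4 + 1) = 2.5
l1_penalty = lam * np.sum(np.abs(beta))    # 0.5 * (2 + 1) = 1.5

ridge_loss = rss + l2_penalty              # 7.5
lasso_loss = rss + l1_penalty              # 6.5
print(ridge_loss, lasso_loss)
```

The squared-error term is identical in both cases; only the way coefficient size is charged differs.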

## <div style="color: red"><strong>Part 2. Implementing Regularization in Python</strong></div>

Let's start by importing the necessary libraries:

In [ ]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression, Ridge, Lasso
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.preprocessing import StandardScaler

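# The 'seaborn-whitegrid' Matplotlib style was deprecated in Matplotlib 3.6 and
# later removed, so we rely on seaborn's own call to set the white-grid style.
sns.set_style("whitegrid")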

### 2.1 Creating a Synthetic Dataset

We'll create a dataset with multicollinearity (highly correlated features) to demonstrate the effect of regularization.

In [ ]:
np.random.seed(42)
n_samples = 100
X1 = np.random.normal(0, 1, n_samples)
X2 = X1 + np.random.normal(0, 0.1, n_samples)  # Highly correlated with X1
X3 = np.random.normal(0, 1, n_samples)

# True coefficients
coefs = [5, 2, -3]
y = 3 + coefs[0]*X1 + coefs[1]*X2 + coefs[2]*X3 + np.random.normal(0, 2, n_samples)

data = pd.DataFrame({'X1': X1, 'X2': X2, 'X3': X3, 'y': y})
data.head()

### 2.2 Visualizing the Dataset

Let's plot the correlation matrix of the features and the target as a heatmap.

In [ ]:
plt.figure(figsize=(8, 6))
sns.heatmap(data.corr(), annot=True, cmap='coolwarm', linewidths=0.5)
plt.title('Correlation Matrix', fontsize=16)
plt.show()
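
The heatmap can be complemented with numbers. The sketch below regenerates the same three features and reports the X1/X2 correlation along with the condition number of the feature correlation matrix. Using the condition number as a multicollinearity indicator is a choice made for this illustration, and the names `features_demo`, `corr_demo`, and `cond_number` are hypothetical.

```python
import numpy as np

# Regenerate the three features with the same seed and draw order as in section 2.1
np.random.seed(42)
n_demo = 100
f1 = np.random.normal(0, 1, n_demo)
f2 = f1 + np.random.normal(0, 0.1, n_demo)   # highly correlated with f1
f3 = np.random.normal(0, 1, n_demo)
features_demo = np.column_stack([f1, f2, f3])

corr_demo = np.corrcoef(features_demo, rowvar=False)   # 3 x 3 correlation matrix
cond_number = np.linalg.cond(corr_demo)                # large values signal near-collinearity

print(f"corr(X1, X2) = {corr_demo[0, 1]:.3f}")
print(f"Condition number of the correlation matrix: {cond_number:.1f}")
```

A condition number far above 1 indicates that some direction in feature space carries almost no independent information, which is exactly the situation where unregularized coefficients become unstable.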

### 2.3 Preparing the Data

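We'll split the data and standardize the features. Scaling matters for regularization because the penalty acts on coefficient magnitudes, which depend on each feature's units. The scaler is fit on the training set only and then applied to the test set, so no test-set information leaks into training.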

In [ ]:
X = data[['X1', 'X2', 'X3']]
y = data['y']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

print(f'Training set size: {len(X_train)} samples')
print(f'Testing set size: {len(X_test)} samples')
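
One way to keep the scaling and the model together is a scikit-learn `Pipeline`. This is an optional alternative, not what the notebook itself does. The sketch below uses hypothetical demo data (`X_demo`, `y_demo`) and checks that the pipeline gives the same predictions as scaling by hand.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical demo data with three features
rng = np.random.default_rng(42)
X_demo = rng.normal(size=(100, 3))
y_demo = 3 + X_demo @ np.array([5.0, 2.0, -3.0]) + rng.normal(scale=2, size=100)

X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.2, random_state=42)

# Pipeline: the scaler is fit on the training data inside fit() and reused in predict()
pipe = make_pipeline(StandardScaler(), Ridge(alpha=1.0))
pipe.fit(X_tr, y_tr)
pipe_pred = pipe.predict(X_te)

# Manual equivalent: fit the scaler on the training data, then transform both sets
manual_scaler = StandardScaler().fit(X_tr)
manual_model = Ridge(alpha=1.0).fit(manual_scaler.transform(X_tr), y_tr)
manual_pred = manual_model.predict(manual_scaler.transform(X_te))

print("Same predictions:", np.allclose(pipe_pred, manual_pred))
```

The pipeline form makes it harder to accidentally fit the scaler on test data, which becomes more important once cross-validation is involved.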

## <div style="color: red"><strong>Part 3. Comparing Linear, Ridge, and Lasso Regression</strong></div>

Let's train and compare the three models.

In [ ]:
# Train models
linear = LinearRegression()
ridge = Ridge(alpha=1.0)
lasso = Lasso(alpha=0.5)

linear.fit(X_train_scaled, y_train)
ridge.fit(X_train_scaled, y_train)
lasso.fit(X_train_scaled, y_train)

# Predictions
y_pred_linear = linear.predict(X_test_scaled)
y_pred_ridge = ridge.predict(X_test_scaled)
y_pred_lasso = lasso.predict(X_test_scaled)

# Evaluation
results = pd.DataFrame({
    'Model': ['Linear', 'Ridge', 'Lasso'],
    'R2 Score': [r2_score(y_test, y_pred_linear), r2_score(y_test, y_pred_ridge), r2_score(y_test, y_pred_lasso)],
    'RMSE': [np.sqrt(mean_squared_error(y_test, y_pred_linear)),
             np.sqrt(mean_squared_error(y_test, y_pred_ridge)),
             np.sqrt(mean_squared_error(y_test, y_pred_lasso))]
})
results
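
Elastic Net, listed among the regularization types in Part 1, can be added to the same comparison. The sketch below assumes scikit-learn's `ElasticNet`, where `alpha` sets the overall strength and `l1_ratio` the mix between the L1 and L2 penalties. The values `alpha=0.5` and `l1_ratio=0.5` are arbitrary, and the `_demo` names are hypothetical, used so the notebook's own variables are not overwritten.

```python
import numpy as np
from sklearn.linear_model import ElasticNet
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Rebuild the synthetic dataset from section 2.1 (same seed and draw order)
np.random.seed(42)
n_demo = 100
a1 = np.random.normal(0, 1, n_demo)
a2 = a1 + np.random.normal(0, 0.1, n_demo)
a3 = np.random.normal(0, 1, n_demo)
y_demo = 3 + 5 * a1 + 2 * a2 - 3 * a3 + np.random.normal(0, 2, n_demo)
X_demo = np.column_stack([a1, a2, a3])

X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.2, random_state=42)
scaler_demo = StandardScaler()
X_tr_s = scaler_demo.fit_transform(X_tr)
X_te_s = scaler_demo.transform(X_te)

# l1_ratio=0.5 weights the L1 and L2 penalties equally (illustrative choice)
enet = ElasticNet(alpha=0.5, l1_ratio=0.5)
enet.fit(X_tr_s, y_tr)

enet_pred = enet.predict(X_te_s)
enet_r2 = r2_score(y_te, enet_pred)
enet_rmse = np.sqrt(mean_squared_error(y_te, enet_pred))
print(f"ElasticNet R2: {enet_r2:.3f}, RMSE: {enet_rmse:.3f}")
print("ElasticNet coefficients:", enet.coef_)
```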

### 3.1 Comparing Coefficients

Let's see how regularization affects the learned coefficients.

In [ ]:
coefs_df = pd.DataFrame({
    'Feature': X.columns,
    'Linear': linear.coef_,
    'Ridge': ridge.coef_,
    'Lasso': lasso.coef_
})
coefs_df
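
The ridge coefficients can also be reproduced from the closed-form solution $\hat{\beta} = (X^\top X + \lambda I)^{-1} X^\top y$ on centered data. The sketch below assumes scikit-learn's `Ridge` objective (unscaled sum of squares plus `alpha` times the squared L2 norm, with an unpenalized intercept) and uses hypothetical demo data rather than the notebook's variables.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.preprocessing import StandardScaler

# Hypothetical demo data with two nearly collinear features
rng = np.random.default_rng(0)
c1 = rng.normal(size=80)
c2 = c1 + rng.normal(scale=0.1, size=80)
c3 = rng.normal(size=80)
X_raw = np.column_stack([c1, c2, c3])
y_cf = 3 + 5 * c1 + 2 * c2 - 3 * c3 + rng.normal(scale=2, size=80)

X_std = StandardScaler().fit_transform(X_raw)   # columns now have mean 0 and unit variance
y_centered = y_cf - y_cf.mean()                 # centering handles the unpenalized intercept
alpha_cf = 1.0

# Closed form: solve (X^T X + alpha * I) beta = X^T y
beta_closed = np.linalg.solve(
    X_std.T @ X_std + alpha_cf * np.eye(X_std.shape[1]),
    X_std.T @ y_centered,
)

ridge_cf = Ridge(alpha=alpha_cf).fit(X_std, y_cf)
ols_cf = LinearRegression().fit(X_std, y_cf)

print("Closed form:", beta_closed)
print("sklearn    :", ridge_cf.coef_)
print("||ridge|| <= ||OLS||:", np.linalg.norm(ridge_cf.coef_) <= np.linalg.norm(ols_cf.coef_))
```

Adding $\lambda I$ lifts the small eigenvalues of $X^\top X$, which is why the solution stays well behaved even when two columns are almost identical.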

### 3.2 Visualizing Coefficient Shrinkage

Let's plot the coefficients for each model.

In [ ]:
coefs_df.set_index('Feature').plot(kind='bar', figsize=(10, 6))
plt.title('Comparison of Coefficients: Linear vs Ridge vs Lasso', fontsize=16)
plt.ylabel('Coefficient Value', fontsize=14)
plt.xlabel('Feature', fontsize=14)
plt.grid(True)
plt.show()
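
Shrinkage also depends strongly on the value of `alpha`. The sketch below refits both models over a hypothetical grid of strengths and tracks the size of the ridge coefficient vector and the number of nonzero lasso coefficients. The grid and the demo data are assumptions made for illustration.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge
from sklearn.preprocessing import StandardScaler

# Hypothetical demo data with the same structure as the notebook's dataset
rng = np.random.default_rng(1)
p1 = rng.normal(size=100)
p2 = p1 + rng.normal(scale=0.1, size=100)
p3 = rng.normal(size=100)
X_path = StandardScaler().fit_transform(np.column_stack([p1, p2, p3]))
y_path = 3 + 5 * p1 + 2 * p2 - 3 * p3 + rng.normal(scale=2, size=100)

alphas_path = [0.01, 0.1, 1.0, 10.0, 100.0]   # illustrative grid
ridge_norms, lasso_nonzero = [], []
for a in alphas_path:
    ridge_fit = Ridge(alpha=a).fit(X_path, y_path)
    lasso_fit = Lasso(alpha=a, max_iter=10_000).fit(X_path, y_path)
    ridge_norms.append(np.linalg.norm(ridge_fit.coef_))
    lasso_nonzero.append(int(np.sum(lasso_fit.coef_ != 0)))
    print(f"alpha={a:>6}: ridge ||coef|| = {ridge_norms[-1]:.3f}, "
          f"lasso nonzero coefs = {lasso_nonzero[-1]}")
```

Ridge coefficients keep shrinking as `alpha` grows but stay nonzero, whereas a large enough `alpha` drives every lasso coefficient to exactly zero.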

## <div style="color: red"><strong>Part 4. Summary</strong></div>

- **Ridge Regression** (L2) shrinks coefficients toward zero but generally does not set them exactly to zero. It is useful when features are multicollinear.
- **Lasso Regression** (L1) can shrink some coefficients to exactly zero, performing feature selection.
- Regularization helps prevent overfitting and improves model generalization.

**Tip:** Tune the regularization strength ($\lambda$ or `alpha`) using cross-validation for best results.
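
As a minimal sketch of that tip, scikit-learn's `RidgeCV` and `LassoCV` select `alpha` by cross-validation. The candidate grid, the 5-fold setting, and the demo data below are assumptions for illustration, not values from the notebook.

```python
import numpy as np
from sklearn.linear_model import LassoCV, RidgeCV
from sklearn.preprocessing import StandardScaler

# Hypothetical demo data with the same structure as the notebook's dataset
rng = np.random.default_rng(2)
t1 = rng.normal(size=100)
t2 = t1 + rng.normal(scale=0.1, size=100)
t3 = rng.normal(size=100)
X_cv = StandardScaler().fit_transform(np.column_stack([t1, t2, t3]))
y_cv = 3 + 5 * t1 + 2 * t2 - 3 * t3 + rng.normal(scale=2, size=100)

# RidgeCV evaluates each candidate alpha (efficient leave-one-out CV by default)
candidate_alphas = np.logspace(-3, 3, 13)
ridge_cv = RidgeCV(alphas=candidate_alphas).fit(X_cv, y_cv)

# LassoCV builds its own alpha grid and evaluates it with 5-fold CV
lasso_cv = LassoCV(cv=5, random_state=0).fit(X_cv, y_cv)

print(f"Best ridge alpha: {ridge_cv.alpha_:.4g}")
print(f"Best lasso alpha: {lasso_cv.alpha_:.4g}")
```

The selected values are specific to the data and grid used; on a real dataset it is worth confirming the choice on a held-out test set.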