Ridge regression, also known as Tikhonov regularization, is linear regression with an added L2 regularization term. The key idea is to accept a slightly worse fit to the training data than ordinary least squares (OLS) would give, in exchange for better generalization to new data. This is particularly useful when dealing with multicollinearity (highly correlated independent variables) or when the number of predictors (features) exceeds the number of observations.
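
To make the multicollinearity point concrete, here is a minimal sketch on synthetic data (the data, seed, and variable names are assumptions made for illustration, not part of the examples that follow). The two predictors are almost identical, so OLS has little information to split the effect between them, while the ridge penalty tends to keep the two coefficients small and similar.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(0)  # fixed seed so the sketch is repeatable
n = 50
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.01, size=n)  # nearly a copy of x1
X_collinear = np.column_stack([x1, x2])
# The data-generating coefficients are 1 and 1, plus noise
y_collinear = x1 + x2 + rng.normal(scale=0.5, size=n)

ols_demo = LinearRegression().fit(X_collinear, y_collinear)
ridge_demo = Ridge(alpha=1.0).fit(X_collinear, y_collinear)

print("OLS coefficients:  ", ols_demo.coef_)
print("Ridge coefficients:", ridge_demo.coef_)
```

The ridge coefficient vector is never longer (in L2 norm) than the OLS one on the same data; how far the OLS coefficients drift from 1 and 1 depends on the noise draw.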

### Key Concept:
- **Regularization**: Ridge regression adds a penalty proportional to the sum of the squared coefficients (the squared L2 norm of the coefficient vector), scaled by a tuning parameter $ \lambda $. This penalty shrinks the coefficients towards zero, but it does not make them exactly zero.

### Mathematical Representation:
Ridge regression modifies the least squares objective function by adding a penalty term:

$$ \text{Minimize } \sum_{i=1}^{n} (y_i - \sum_{j=1}^{p} x_{ij} \beta_j)^2 + \lambda \sum_{j=1}^{p} \beta_j^2 $$

where:
- $ y_i $ is the response value for the ith observation.
- $ x_{ij} $ is the value of the jth predictor for the ith observation.
- $ \beta_j $ is the regression coefficient for the jth predictor.
- $ \lambda $ is the tuning parameter that controls the strength of the penalty; $ \lambda \geq 0 $.
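
The same objective can be written in matrix form, which also gives a closed-form solution. The notation here is an assumption introduced for this sketch: $ X $ is the $ n \times p $ matrix with entries $ x_{ij} $, $ y $ is the vector of responses, and $ \beta $ is the vector of coefficients.

$$ \text{Minimize } \lVert y - X\beta \rVert_2^2 + \lambda \lVert \beta \rVert_2^2 $$

Setting the gradient $ -2X^\top (y - X\beta) + 2\lambda\beta $ to zero gives

$$ \hat{\beta}_{\text{ridge}} = (X^\top X + \lambda I_p)^{-1} X^\top y $$

For $ \lambda > 0 $ the matrix $ X^\top X + \lambda I_p $ is always invertible, which is why a unique ridge solution exists even when predictors are perfectly collinear or $ p > n $. With $ \lambda = 0 $ the expression reduces to the ordinary least squares solution, provided $ X^\top X $ is invertible.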


In the scikit-learn examples below, `alpha` is the regularization strength $ \lambda $. Adjusting `alpha` changes the strength of the regularization penalty: a larger `alpha` enforces stronger regularization (leading to smaller coefficients), while an `alpha` close to zero gives a model close to ordinary linear regression.
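
The effect of `alpha` can be checked with a short sketch. It uses the same four-point toy data as the first example in this notebook. The list of `alpha` values and the names `X_toy`, `y_toy`, and `norms` are choices made here for illustration.

```python
import numpy as np
from sklearn.linear_model import Ridge

# Same toy data as the first example: y = 1*x1 + 2*x2 + 3
X_toy = np.array([[1, 1], [1, 2], [2, 2], [2, 3]])
y_toy = X_toy @ np.array([1, 2]) + 3

alphas = [0.01, 0.1, 1.0, 10.0, 100.0]
norms = []
for alpha in alphas:
    model = Ridge(alpha=alpha).fit(X_toy, y_toy)
    # L2 norm of the coefficient vector (the intercept is not included)
    norms.append(np.linalg.norm(model.coef_))
    print(f"alpha={alpha:>6}: coef={model.coef_}, L2 norm={norms[-1]:.4f}")
```

The printed norms should decrease as `alpha` grows, starting just below the OLS value of $ \sqrt{1^2 + 2^2} \approx 2.236 $.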

### Key Points:
- **Choosing Alpha**: Selecting the right value of `alpha` is crucial. It is usually chosen by cross-validation, for example with scikit-learn's `RidgeCV` estimator.
- **Standardization**: The penalty depends on the scale of each predictor, so it is usually recommended to standardize the predictors before applying ridge regression.
- **Bias-Variance Tradeoff**: Ridge regression accepts some bias in the coefficient estimates in exchange for lower variance; `alpha` controls where the model sits on that tradeoff.
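
The first two points can be combined in one pipeline. The sketch below is only an illustration: the synthetic data from `make_regression`, the grid of candidate values, and the variable names are all assumptions, not part of the examples in this notebook.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import RidgeCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data, used only for illustration
X_syn, y_syn = make_regression(
    n_samples=200, n_features=10, noise=10.0, random_state=0
)

# Candidate regularization strengths on a log scale
candidate_alphas = np.logspace(-3, 3, 13)

# Standardize the predictors, then let RidgeCV pick alpha by cross-validation
model_cv = make_pipeline(StandardScaler(), RidgeCV(alphas=candidate_alphas))
model_cv.fit(X_syn, y_syn)

best_alpha = model_cv.named_steps["ridgecv"].alpha_
print("Selected alpha:", best_alpha)
```

With its default settings, `RidgeCV` uses an efficient form of leave-one-out cross-validation; passing `cv=5` (for example) would switch to 5-fold cross-validation.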

In [1]:
from sklearn.linear_model import LinearRegression, Ridge
import numpy as np

# Example data
X = np.array([[1, 1], [1, 2], [2, 2], [2, 3]])
# Target values
y = np.dot(X, np.array([1, 2])) + 3

# Linear regression
lr = LinearRegression()
lr.fit(X, y)

# Coefficients
print("Coefficients:", lr.coef_)
# Intercept
print("Intercept:", lr.intercept_)

Coefficients: [1. 2.]
Intercept: 3.0000000000000018


In [3]:
# Ridge regression
ridge = Ridge(alpha=0.5)
ridge.fit(X, y)

print("Ridge Coefficients:", ridge.coef_)
print("Ridge Intercept:", ridge.intercept_)

Ridge Coefficients: [0.90909091 1.63636364]
Ridge Intercept: 3.8636363636363633
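
The ridge output above can be reproduced with plain NumPy. This is a sketch under one assumption worth stating: scikit-learn's `Ridge` leaves the intercept out of the penalty, which is equivalent to centering the data, solving $ (X_c^\top X_c + \lambda I)\beta = X_c^\top y_c $, and recovering the intercept from the means. The names `X_toy`, `y_toy`, `lam`, `Xc`, and `yc` are introduced here for illustration.

```python
import numpy as np

# Same toy data as above
X_toy = np.array([[1, 1], [1, 2], [2, 2], [2, 3]], dtype=float)
y_toy = X_toy @ np.array([1.0, 2.0]) + 3.0
lam = 0.5  # same value as alpha in the Ridge cell above

# Center predictors and response so the intercept is not penalized
X_mean, y_mean = X_toy.mean(axis=0), y_toy.mean()
Xc, yc = X_toy - X_mean, y_toy - y_mean

# Solve (Xc^T Xc + lam * I) beta = Xc^T yc
beta = np.linalg.solve(Xc.T @ Xc + lam * np.eye(X_toy.shape[1]), Xc.T @ yc)
intercept = y_mean - X_mean @ beta

print("Closed-form coefficients:", beta)
print("Closed-form intercept:", intercept)
```

The coefficients come out as 10/11 and 18/11 (about 0.909 and 1.636) with an intercept of about 3.864, matching the `Ridge` output.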


# Comparing Linear Regression vs. Ridge Regression

In [18]:
import seaborn as sns
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.metrics import mean_squared_error, r2_score, mean_absolute_error, mean_absolute_percentage_error
from sklearn.preprocessing import OneHotEncoder, StandardScaler, MinMaxScaler
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
import numpy as np

# Load the data set
df = sns.load_dataset('diamonds')

In [21]:
df['z'] = df['z'].fillna(df['z'].median())
df.isna().sum()

carat      0
cut        0
color      0
clarity    0
depth      0
table      0
price      0
x          0
y          0
z          0
dtype: int64

In [22]:
X = df.drop(columns=['price'])
y = df['price']

# numeric features
numeric_features = ['carat', 'depth', 'table', 'x', 'y', 'z']
# categorical features
categorical_features = ['cut', 'color', 'clarity']

preprocessor = ColumnTransformer(
    transformers=[
        ('num', StandardScaler(), numeric_features),
        ('cat', OneHotEncoder(), categorical_features)
    ])

X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.8, random_state=42)

In [23]:
lr_pipeline = Pipeline(
    steps=[
        ('preprocessor', preprocessor),
        ('regressor', LinearRegression())
    ]
)

ridge_pipeline = Pipeline(
    steps=[
        ('preprocessor', preprocessor),
        ('regressor', Ridge(alpha=1.0))
    ]
)

In [24]:
# Train and evaluate Linear Regression
lr_pipeline.fit(X_train, y_train)
lr_pred = lr_pipeline.predict(X_test)
lr_mse = mean_squared_error(y_test, lr_pred)
lr_r2 = r2_score(y_test, lr_pred)
lr_mae = mean_absolute_error(y_test, lr_pred)
lr_mape = mean_absolute_percentage_error(y_test, lr_pred)
lr_rmse = np.sqrt(lr_mse)

# Train and evaluate Ridge Regression
ridge_pipeline.fit(X_train, y_train)
ridge_pred = ridge_pipeline.predict(X_test)
ridge_mse = mean_squared_error(y_test, ridge_pred)
ridge_r2 = r2_score(y_test, ridge_pred)
ridge_mae = mean_absolute_error(y_test, ridge_pred)
ridge_mape = mean_absolute_percentage_error(y_test, ridge_pred)
ridge_rmse = np.sqrt(ridge_mse)

# Print each metric for both models side by side
print("Linear Regression MSE:", lr_mse)
print("Ridge Regression MSE:", ridge_mse)
print("------------------------")

print("Linear Regression R2:", lr_r2)
print("Ridge Regression R2:", ridge_r2)
print("------------------------")
print("Linear Regression MAE:", lr_mae)
print("Ridge Regression MAE:", ridge_mae)
print("------------------------")
print("Linear Regression MAPE:", lr_mape)
print("Ridge Regression MAPE:", ridge_mape)
print("------------------------")
print("Linear Regression RMSE:", lr_rmse)
print("Ridge Regression RMSE:", ridge_rmse)

Linear Regression MSE: 1641246.9824469788
Ridge Regression MSE: 1641198.7640876612
------------------------
Linear Regression R2: 0.9132764312325968
Ridge Regression R2: 0.913278979093001
------------------------
Linear Regression MAE: 882.4325038403573
Ridge Regression MAE: 882.4060964639548
------------------------
Linear Regression MAPE: 0.3913935361894944
Ridge Regression MAPE: 0.3911448808103667
------------------------
Linear Regression RMSE: 1281.111619823573
Ridge Regression RMSE: 1281.0928007321177
