# Ridge Regression

Ridge Regression is a regularized version of Linear Regression: a regularization term equal to $\alpha \sum_{i=1}^{n} \theta_i^2$ is added to the cost function, where $\theta_i$ are the model weights and the hyperparameter $\alpha$ controls how strongly they are penalized. This forces the learning algorithm to not only fit the data but also `keep the model weights as small as possible`. Note that the regularization term should only be added to the cost function during training. Once the model is trained, evaluate its performance using the unregularized performance measure.

Regularization is a technique used in machine learning and statistics to prevent overfitting of models on training data. Overfitting occurs when a model learns the training data too well, including its noise and outliers, leading to poor generalization to new, unseen data. Regularization helps to solve this problem by adding a penalty to the model's complexity.

Ridge regression, also known as Tikhonov regularization, is linear regression with an L2 regularization term. The key idea is to accept a slightly worse fit to the training data than ordinary least squares regression in exchange for better generalization to new data. This is particularly useful when dealing with multicollinearity (highly correlated independent variables) or when the number of predictors (features) exceeds the number of observations.
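The effect of multicollinearity can be sketched on synthetic data. The snippet below is a minimal illustration, not part of the housing example: the two nearly identical predictors, the seed, and names such as `X_collinear` are assumptions made up for the demonstration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(0)  # fixed seed, arbitrary choice

# Two almost identical predictors: x2 is x1 plus tiny noise.
n_samples = 50
x1 = rng.normal(size=n_samples)
x2 = x1 + rng.normal(scale=1e-3, size=n_samples)
X_collinear = np.column_stack([x1, x2])

# The target depends only on the direction shared by x1 and x2.
y_collinear = 3.0 * x1 + rng.normal(scale=0.5, size=n_samples)

# Ordinary least squares versus ridge on the same data.
ols = LinearRegression().fit(X_collinear, y_collinear)
ridge = Ridge(alpha=1.0).fit(X_collinear, y_collinear)

ols_norm = np.linalg.norm(ols.coef_)
ridge_norm = np.linalg.norm(ridge.coef_)

print("OLS coefficients:  ", ols.coef_)
print("Ridge coefficients:", ridge.coef_)
print(f"Coefficient norm: OLS={ols_norm:.3f}, Ridge={ridge_norm:.3f}")
```

With correlated predictors, least squares has little information about how to split the effect between them, so its individual coefficients can be large and unstable. The ridge penalty should pull them toward a similar, smaller pair of values.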

Key Concept:
Regularization: Ridge regression adds a penalty proportional to the sum of the squared coefficients. `This penalty term (squared L2 norm) shrinks the coefficients towards zero, but it doesn't make them exactly zero.`
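Written out, the penalized objective looks as follows. This is a sketch in the sum-of-squared-errors form that scikit-learn's `Ridge` documents. The symbols $X$ (feature matrix), $y$ (targets), $\theta$ (weights) and $n$ (number of features) are notation assumed here, and the intercept is left out for brevity because it is not penalized.

```latex
J(\theta) = \lVert y - X\theta \rVert_2^2 + \alpha \sum_{i=1}^{n} \theta_i^2,
\qquad
\hat{\theta} = \left( X^\top X + \alpha I \right)^{-1} X^\top y
```

Setting $\alpha = 0$ recovers ordinary least squares. For $\alpha > 0$ the matrix $X^\top X + \alpha I$ is always invertible, which is why ridge regression still has a unique solution when predictors are collinear or outnumber the observations.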

Key Points:

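1. Choosing Alpha: The regularization strength alpha must be chosen carefully: too small and the model behaves like ordinary least squares, too large and it underfits. It is typically selected by cross-validation, for example with scikit-learn's RidgeCV estimator.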
   
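2. Standardization: Standardize the predictors before applying ridge regression. The penalty depends on the magnitude of each coefficient, so features on different scales would otherwise be penalized unevenly.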
   
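3. Bias-Variance Tradeoff: Increasing alpha raises the model's bias but lowers its variance. Ridge regression aims to trade a small increase in bias for a reduction in variance that improves generalization.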
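The first two points can be combined in one short sketch: a pipeline that standardizes the features and lets `RidgeCV` pick alpha from a grid. The synthetic data from `make_regression`, the alpha grid, and the variable names are assumptions chosen for illustration, not values from this notebook.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data stands in for a real dataset (an assumption for illustration).
X_demo, y_demo = make_regression(
    n_samples=200, n_features=10, noise=10.0, random_state=0
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=0
)

# Candidate alphas on a log scale; the range is an arbitrary choice.
candidate_alphas = np.logspace(-3, 3, 13)

# Scaling inside the pipeline means the scaler is fit on the training
# split only and never sees the test split.
cv_pipeline = make_pipeline(StandardScaler(), RidgeCV(alphas=candidate_alphas))
cv_pipeline.fit(X_tr, y_tr)

# make_pipeline names each step after its lowercased class name.
best_alpha = cv_pipeline.named_steps["ridgecv"].alpha_
test_r2 = cv_pipeline.score(X_te, y_te)

print(f"Selected alpha: {best_alpha}")
print(f"Test R-squared: {test_r2:.4f}")
```

By default `RidgeCV` uses an efficient form of leave-one-out cross-validation. Passing an integer `cv` argument switches to k-fold if that is preferred.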

In [18]:
# Import the necessary libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error, r2_score

In [19]:
# Load the California Housing dataset
housing = fetch_california_housing()

In [20]:
# Convert to a pandas DataFrame for convenience
data = pd.DataFrame(housing.data, columns=housing.feature_names)
data['target'] = housing.target

In [21]:
# Display the first few rows of the dataset
data.head()

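|   | MedInc | HouseAge | AveRooms | AveBedrms | Population | AveOccup | Latitude | Longitude | target |
|---|--------|----------|----------|-----------|------------|----------|----------|-----------|--------|
| 0 | 8.3252 | 41.0 | 6.984127 | 1.02381 | 322.0 | 2.555556 | 37.88 | -122.23 | 4.526 |
| 1 | 8.3014 | 21.0 | 6.238137 | 0.97188 | 2401.0 | 2.109842 | 37.86 | -122.22 | 3.585 |
| 2 | 7.2574 | 52.0 | 8.288136 | 1.073446 | 496.0 | 2.80226 | 37.85 | -122.24 | 3.521 |
| 3 | 5.6431 | 52.0 | 5.817352 | 1.073059 | 558.0 | 2.547945 | 37.85 | -122.25 | 3.413 |
| 4 | 3.8462 | 52.0 | 6.281853 | 1.081081 | 565.0 | 2.181467 | 37.85 | -122.25 | 3.422 |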


In [22]:
# Split the data into features and target
X = data.drop(columns='target')
y = data['target']

In [23]:
# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

In [24]:
# Standardize the data
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

In [25]:
# Train a Ridge Regression model with L2 regularization
ridge_model = Ridge(alpha=1.0)
ridge_model.fit(X_train_scaled, y_train)

In [26]:
# Make predictions on the test set
y_pred = ridge_model.predict(X_test_scaled)

In [27]:
# Evaluate the model's performance
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

# Print the results
print(f"Mean Squared Error: {mse:.4f}")
print(f"R-squared: {r2:.4f}")

Mean Squared Error: 0.5559
R-squared: 0.5758


In [28]:
# Display the coefficients
print("Coefficients:", ridge_model.coef_)

Coefficients: [ 0.85432679  0.12262397 -0.29421036  0.33900794 -0.00228221 -0.04083302
 -0.89616759 -0.86907074]
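To see the shrinkage behaviour directly, one can refit the model over a range of alpha values and watch the size of the coefficient vector. The sketch below uses synthetic data and an arbitrary list of alphas (both assumptions for illustration) so that it runs without downloading the housing dataset.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.preprocessing import StandardScaler

# Synthetic, standardized data (illustrative stand-in for the housing features).
X_syn, y_syn = make_regression(
    n_samples=200, n_features=8, noise=15.0, random_state=1
)
X_syn = StandardScaler().fit_transform(X_syn)

# Increasing regularization strengths; the values are arbitrary.
alphas = [0.01, 1.0, 100.0, 10000.0]
coef_norms = []
for alpha in alphas:
    model = Ridge(alpha=alpha).fit(X_syn, y_syn)
    coef_norms.append(np.linalg.norm(model.coef_))
    n_zero = int(np.sum(model.coef_ == 0.0))
    print(f"alpha={alpha:>8}: ||coef||_2 = {coef_norms[-1]:.4f}, exact zeros = {n_zero}")
```

The L2 norm of the coefficients should fall as alpha grows, while none of the individual coefficients is expected to become exactly zero. That is the practical difference from an L1 penalty.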
