# **Ridge Regression**

Ridge Regression is a type of linear regression that includes a regularization term to prevent overfitting. It is particularly useful when dealing with multicollinearity in the dataset.
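To make the multicollinearity point concrete, here is a minimal sketch on synthetic data (not the dataset used in this notebook). Two nearly identical predictors are built by hand; the names `X_collinear`, `y_collinear`, and `ridge_demo` are illustrative assumptions. Ordinary least squares can assign large, offsetting coefficients to the two columns, while the ridge fit keeps the coefficient vector small.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(0)

# Two almost identical predictors: x2 is x1 plus a tiny perturbation.
x1 = rng.normal(size=200)
x2 = x1 + rng.normal(scale=1e-3, size=200)
X_collinear = np.column_stack([x1, x2])

# The target depends only on x1 (plus noise).
y_collinear = 3.0 * x1 + rng.normal(scale=0.5, size=200)

# Fit an unregularized model and a ridge model on the same data.
ols = LinearRegression().fit(X_collinear, y_collinear)
ridge_demo = Ridge(alpha=1.0).fit(X_collinear, y_collinear)

print("OLS coefficients:  ", ols.coef_)
print("Ridge coefficients:", ridge_demo.coef_)
print("OLS coefficient norm:  ", np.linalg.norm(ols.coef_))
print("Ridge coefficient norm:", np.linalg.norm(ridge_demo.coef_))
```

The ridge coefficients tend to split the effect roughly evenly between the two correlated columns, so their sum stays close to the true effect of 3 while the individual values remain stable.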

It is based on L2 regularization, which adds a penalty proportional to the sum of the squared coefficients to the loss function. The strength of the penalty is set by the `alpha` hyperparameter.
The penalty shrinks the coefficients toward zero, which reduces model complexity and helps prevent overfitting, especially when the number of predictors is large or when predictors are highly correlated.
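The penalized loss can be written out directly. The sketch below assumes the objective that scikit-learn documents for `Ridge`, namely the residual sum of squares plus `alpha` times the squared L2 norm of the coefficients, and it omits the intercept to keep the algebra short. The helper `ridge_objective` and the arrays `X_small` and `y_small` are illustrative, not part of the notebook.

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(42)
X_small = rng.normal(size=(100, 5))
true_coef = np.array([2.0, -1.0, 0.0, 0.5, 3.0])
y_small = X_small @ true_coef + rng.normal(scale=0.1, size=100)

alpha = 1.0


def ridge_objective(w, X, y, alpha):
    """Residual sum of squares plus the L2 penalty alpha * ||w||^2."""
    residual = y - X @ w
    return residual @ residual + alpha * (w @ w)


# Closed-form minimizer (no intercept): w = (X^T X + alpha * I)^(-1) X^T y
n_features = X_small.shape[1]
w_closed = np.linalg.solve(
    X_small.T @ X_small + alpha * np.eye(n_features),
    X_small.T @ y_small,
)

# scikit-learn should recover the same coefficients when the intercept is disabled.
model = Ridge(alpha=alpha, fit_intercept=False).fit(X_small, y_small)

print("Closed form:", w_closed)
print("sklearn:    ", model.coef_)
print("Objective at the solution:", ridge_objective(w_closed, X_small, y_small, alpha))
```

Setting `alpha` to 0 reduces this to ordinary least squares, and larger values pull every coefficient closer to zero.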

In Python, scikit-learn implements it as the `Ridge` class in the `sklearn.linear_model` module.

Ridge regression predicts a continuous target variable from one or more predictor variables.
It is commonly used in fields such as finance, biology, and engineering.

In [40]:
import pandas as pd
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split
from sklearn.datasets import make_regression
from sklearn.preprocessing import StandardScaler

In [41]:
X, y = make_regression(n_samples=10000, n_features=15, noise=0.4, n_informative=6, random_state=42)

In [42]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

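# Fit the scaler on the training split only, then reuse its statistics
# for the test split so that no test information leaks into training.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)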

In [43]:
# Baseline model with the default regularization strength (alpha=1.0)
ridge = Ridge()
ridge.fit(X_train_scaled, y_train)

In [44]:
y_pred = ridge.predict(X_test_scaled)
display(pd.DataFrame({'Actual': y_test, 'Predicted': y_pred}))

Index,Actual,Predicted
0,130.282046,130.708344
1,-107.678959,-107.471374
2,33.140268,33.470632
3,167.046980,166.795310
4,196.386097,195.896719
...,...,...
1995,106.830871,107.085882
1996,75.492470,75.318061
1997,56.905620,56.470984
1998,-16.628251,-16.365673

In [45]:
# root_mean_squared_error requires scikit-learn >= 1.4
from sklearn.metrics import mean_squared_error, r2_score, mean_absolute_error, root_mean_squared_error

print("Mean Squared Error 1:", mean_squared_error(y_test, y_pred))
print("R2 Score 1:", r2_score(y_test, y_pred))
print("Mean Absolute Error 1:", mean_absolute_error(y_test, y_pred))
print("Root Mean Squared Error 1:", root_mean_squared_error(y_test, y_pred))

Mean Squared Error 1: 0.1556115358601225
R2 Score 1: 0.9999902719840599
Mean Absolute Error 1: 0.31579353224769136
Root Mean Squared Error 1: 0.39447628047846234
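The four metrics are closely related, which the following sketch makes explicit by computing them by hand with NumPy. The function `regression_metrics` and the toy arrays are illustrative assumptions, not part of the notebook. RMSE is simply the square root of MSE, which agrees with the output above (the square root of 0.1556 is about 0.3945).

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score


def regression_metrics(y_true, y_hat):
    """Compute MSE, RMSE, MAE and R^2 from their definitions."""
    y_true = np.asarray(y_true, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    errors = y_true - y_hat

    mse = np.mean(errors ** 2)
    ss_res = np.sum(errors ** 2)                      # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)    # total sum of squares
    return {
        "mse": mse,
        "rmse": np.sqrt(mse),
        "mae": np.mean(np.abs(errors)),
        "r2": 1.0 - ss_res / ss_tot,
    }


# Small toy example, unrelated to the notebook's dataset.
y_true_demo = np.array([3.0, -0.5, 2.0, 7.0])
y_hat_demo = np.array([2.5, 0.0, 2.0, 8.0])

metrics = regression_metrics(y_true_demo, y_hat_demo)
print(metrics)

# Cross-check against scikit-learn's implementations.
print(mean_squared_error(y_true_demo, y_hat_demo))
print(mean_absolute_error(y_true_demo, y_hat_demo))
print(r2_score(y_true_demo, y_hat_demo))
```

Because R^2 compares the residual error with the variance of the target, a low-noise synthetic dataset like the one generated here can be expected to give values very close to 1.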


In [46]:
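param_grid = {
    # Regularization strength: larger values shrink the coefficients more
    'alpha': [0.01, 0.1, 1, 10, 100],
    'solver': ['auto', 'svd', 'cholesky', 'lsqr', 'sparse_cg'],
    # max_iter is only used by iterative solvers (here 'lsqr' and 'sparse_cg')
    'max_iter': [1000, 1500, 2000]
}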

In [47]:
from sklearn.model_selection import GridSearchCV

grid_search = GridSearchCV(estimator=ridge, param_grid=param_grid, cv=10, n_jobs=2, verbose=2)
grid_search.fit(X_train_scaled, y_train)

Fitting 10 folds for each of 75 candidates, totalling 750 fits
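The search above runs on data that was scaled once using the full training split, so each cross-validation fold sees scaling statistics computed partly from its own validation rows. One common alternative, sketched here on a smaller synthetic dataset, wraps the scaler and the model in a `Pipeline` so that scaling is refit inside every fold. The names `pipeline`, `pipeline_grid`, and `pipeline_search` are illustrative, and the grid is restricted to `alpha` to keep the run short.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Smaller synthetic dataset so the sketch runs quickly.
X_demo, y_demo = make_regression(
    n_samples=500, n_features=15, n_informative=6, noise=0.4, random_state=42
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)

# Scaling becomes part of the estimator, so it is refit on each training fold.
pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("ridge", Ridge()),
])

# Parameters of a pipeline step are addressed as "<step name>__<parameter>".
pipeline_grid = {"ridge__alpha": [0.01, 0.1, 1, 10, 100]}

pipeline_search = GridSearchCV(pipeline, pipeline_grid, cv=5)
pipeline_search.fit(X_tr, y_tr)

print(pipeline_search.best_params_)
print("Test R2:", pipeline_search.score(X_te, y_te))
```

With standardization on a dataset this clean the difference is likely negligible, but the pipeline form keeps the cross-validation estimate free of leakage from the scaler.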


In [48]:
y_pred_2 = grid_search.predict(X_test_scaled)
# Show the tuned model's predictions (y_pred_2), not the baseline y_pred
display(pd.DataFrame({'Actual': y_test, 'Predicted': y_pred_2}))
print(grid_search.best_params_)

{'alpha': 0.01, 'max_iter': 1000, 'solver': 'auto'}


In [49]:
print("Mean Squared Error 2:", mean_squared_error(y_test, y_pred_2))
print("R2 Score 2:", r2_score(y_test, y_pred_2))
print("Mean Absolute Error 2:", mean_absolute_error(y_test, y_pred_2))
print("Root Mean Squared Error 2:", root_mean_squared_error(y_test, y_pred_2))

Mean Squared Error 2: 0.1553593869189563
R2 Score 2: 0.9999902877470874
Mean Absolute Error 2: 0.31542167395767
Root Mean Squared Error 2: 0.39415655128255356
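The tuned model improves on the baseline only in the fourth decimal place (MSE 0.15536 versus 0.15561), and the selected solver and `max_iter` are the first values in their lists, which suggests that `alpha` is the only setting that matters here. When only `alpha` needs tuning, scikit-learn's `RidgeCV` is a lighter alternative to a full grid search. The sketch below swaps in that technique on a smaller synthetic dataset; the variable names are illustrative.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Smaller synthetic dataset so the sketch runs quickly.
X_cv, y_cv = make_regression(
    n_samples=500, n_features=15, n_informative=6, noise=0.4, random_state=42
)
X_cv_train, X_cv_test, y_cv_train, y_cv_test = train_test_split(
    X_cv, y_cv, test_size=0.2, random_state=42
)

cv_scaler = StandardScaler()
X_cv_train_scaled = cv_scaler.fit_transform(X_cv_train)
X_cv_test_scaled = cv_scaler.transform(X_cv_test)

# With the default cv=None, RidgeCV uses an efficient form of
# leave-one-out cross-validation to choose among the candidate alphas.
candidate_alphas = [0.01, 0.1, 1, 10, 100]
ridge_cv = RidgeCV(alphas=candidate_alphas)
ridge_cv.fit(X_cv_train_scaled, y_cv_train)

print("Selected alpha:", ridge_cv.alpha_)
print("Test R2:", ridge_cv.score(X_cv_test_scaled, y_cv_test))
```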
