# Ridge Regression

A regularized version of linear regression with a regularization term added to the cost function.<br>

The regularization term is: $\alpha \sum_{i=1}^{n} \theta_i^2$<br>

where,<br>
$\alpha$ = regularization parameter<br>
$i$ = index of the weight<br>
$n$ = number of weights (one per feature)<br>
$\theta_i$ = weights<br>

Regularization adds a penalty proportional to the sum of the squared coefficients.<br>
This penalty term (the squared L2 norm) shrinks the coefficients towards zero, but it does not make them exactly zero.
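
To make the penalty concrete, here is a minimal sketch of a ridge cost function. It assumes the sum-of-squared-errors form of the data term and an unpenalized intercept. The helper name `ridge_cost` and the small array are made up for illustration.

```python
import numpy as np

def ridge_cost(X, y, theta, intercept, alpha):
    """Sum of squared errors plus the L2 penalty alpha * sum(theta_i ** 2)."""
    residuals = y - (X @ theta + intercept)
    penalty = alpha * np.sum(theta ** 2)  # the intercept is not penalized
    return np.sum(residuals ** 2) + penalty

# Illustrative data: y is an exact linear function of the two features
X = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]])
y = X @ np.array([1.0, 2.0]) + 3.0

theta = np.array([1.0, 2.0])
# Perfect fit, so without a penalty the cost is 0
cost_no_penalty = ridge_cost(X, y, theta, 3.0, alpha=0.0)
# Same fit with alpha = 1: 0 + 1 * (1**2 + 2**2) = 5
cost_with_penalty = ridge_cost(X, y, theta, 3.0, alpha=1.0)
print(cost_no_penalty, cost_with_penalty)
```

Even a perfect fit pays a price for large weights, which is why the minimizer of this cost prefers smaller coefficients.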

# Demonstration of Ridge regression

In [39]:
# import necessary libraries
from sklearn.linear_model import LinearRegression,Ridge 
import numpy as np
import pandas as pd 
import seaborn as sb 
from sklearn.metrics import mean_squared_error,r2_score,mean_absolute_error,mean_absolute_percentage_error
from sklearn.model_selection import train_test_split 
from sklearn.preprocessing import OneHotEncoder,StandardScaler 
from sklearn.compose import ColumnTransformer 
from sklearn.pipeline import Pipeline 

In [40]:
# Creating sample data 
X = np.array([[1, 2], [3, 4], [5, 6], [7, 8]]) 
Y = np.dot(X,np.array([1,2])) + 3 
print("Input Features:\n", X)
print("Target Values:\n", Y)

Input Features:
 [[1 2]
 [3 4]
 [5 6]
 [7 8]]
Target Values:
 [ 8 14 20 26]


In [41]:
# Using Ridge Regression 
ridge_reg = Ridge(alpha=1.0) 
ridge_reg.fit(X,Y)

In [42]:
# coefficients and intercept 
print('Coefficients: ',ridge_reg.coef_) 
print('Intercept: ',ridge_reg.intercept_)

Coefficients:  [1.46341463 1.46341463]
Intercept:  3.829268292682926
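
As a cross-check, the ridge values printed above can be reproduced by hand. This is a sketch under the assumption that the intercept is left unpenalized: center the data, solve the regularized normal equations, then recover the intercept. The names `Xc`, `Yc`, `w` and `b` are illustrative only.

```python
import numpy as np

X = np.array([[1, 2], [3, 4], [5, 6], [7, 8]], dtype=float)
Y = X @ np.array([1.0, 2.0]) + 3.0
alpha = 1.0

# Center the data so the intercept stays out of the penalty
X_mean, Y_mean = X.mean(axis=0), Y.mean()
Xc, Yc = X - X_mean, Y - Y_mean

# Solve (Xc^T Xc + alpha * I) w = Xc^T Yc for the weights
w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ Yc)
# Recover the intercept from the means
b = Y_mean - X_mean @ w

print("Closed-form coefficients:", w)
print("Closed-form intercept:", b)
```

Both weights come out as 60/41 and the intercept as 157/41, which matches the fitted `Ridge(alpha=1.0)` model.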


- Checking the coefficients and intercept against linear regression

In [43]:
lr = LinearRegression() 
lr.fit(X,Y) 
print('Linear Regression Coefficients: ',lr.coef_) 
print('Linear Regression Intercept: ',lr.intercept_)

Linear Regression Coefficients:  [1.5 1.5]
Linear Regression Intercept:  3.5000000000000036


<b>Interpretation:</b> The ridge coefficients (about 1.46) are slightly smaller than the linear regression coefficients (1.5), because the L2 penalty shrinks them towards zero.
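
The shrinkage grows with `alpha`. The sketch below refits `Ridge` on the same sample data for a few values of `alpha` and records the size of the coefficient vector. The values in `alphas` are chosen arbitrarily for illustration.

```python
import numpy as np
from sklearn.linear_model import Ridge

X = np.array([[1, 2], [3, 4], [5, 6], [7, 8]])
Y = np.dot(X, np.array([1, 2])) + 3

alphas = [0.1, 1.0, 10.0, 100.0]  # illustrative values only
coef_norms = []
for alpha in alphas:
    model = Ridge(alpha=alpha).fit(X, Y)
    # The L2 norm of the weights summarizes how strongly they were shrunk
    coef_norms.append(np.linalg.norm(model.coef_))
    print(f"alpha={alpha}: coefficients={model.coef_}")
```

The coefficient norm should decrease as `alpha` increases, while staying above zero.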

# Comparison between Simple linear regression and Ridge regression

In [44]:
# load the diamonds dataset 
df = sb.load_dataset('diamonds') 
df.head()

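| | carat | cut | color | clarity | depth | table | price | x | y | z |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.23 | Ideal | E | SI2 | 61.5 | 55.0 | 326 | 3.95 | 3.98 | 2.43 |
| 1 | 0.21 | Premium | E | SI1 | 59.8 | 61.0 | 326 | 3.89 | 3.84 | 2.31 |
| 2 | 0.23 | Good | E | VS1 | 56.9 | 65.0 | 327 | 4.05 | 4.07 | 2.31 |
| 3 | 0.29 | Premium | I | VS2 | 62.4 | 58.0 | 334 | 4.2 | 4.23 | 2.63 |
| 4 | 0.31 | Good | J | SI2 | 63.3 | 58.0 | 335 | 4.34 | 4.35 | 2.75 |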


In [45]:
# separating features and target variable 
X = df.drop('price',axis=1) 
Y = df['price'] 
# segregate numeric and categorical columns 
num_cols = list(X.select_dtypes(include=['int64','float64']).columns) 
cat_cols = list(X.select_dtypes(include=['object','category']).columns)
print("Numerical Columns: ",num_cols) 
print("Categorical Columns: ",cat_cols) 

Numerical Columns:  ['carat', 'depth', 'table', 'x', 'y', 'z']
Categorical Columns:  ['cut', 'color', 'clarity']


In [46]:
# preprocessing the data
preprocessor = ColumnTransformer(transformers=[('num',StandardScaler(),num_cols),
                                               ('cat',OneHotEncoder(),cat_cols)]) 
# train test split the data 
X_train,X_test,Y_train,Y_test = train_test_split(X,Y,test_size=0.2,random_state=42) 
# creating a pipeline with preprocessing and linear regression model 
lr_pipeline = Pipeline(steps=[('preprocessor',preprocessor),
                              ('lr_reg',LinearRegression())]) 
# creating a pipeline with preprocessing and ridge regression model 
ridge_pipeline = Pipeline(steps=[('preprocessor',preprocessor),
                                 ('ridge_reg',Ridge(alpha=1.0))])
# fitting the linear regression model
lr_pipeline.fit(X_train,Y_train)

In [47]:
# fitting the ridge regression model 
ridge_pipeline.fit(X_train,Y_train)

In [48]:
# Evaluating linear regression model 
Y_pred_lr = lr_pipeline.predict(X_test) 
print("Linear Regression Model Evaluation:") 
print("Mean Squared Error: ",mean_squared_error(Y_test,Y_pred_lr)) 
print("R2 Score: ",r2_score(Y_test,Y_pred_lr)) 
print("Mean Absolute Error: ",mean_absolute_error(Y_test,Y_pred_lr)) 
print("Mean Absolute Percentage Error: ",mean_absolute_percentage_error(Y_test,Y_pred_lr)) 
print("--------------------------------\n")
# Evaluating ridge regression model 
Y_pred_ridge = ridge_pipeline.predict(X_test)
print("Ridge Regression Model Evaluation:") 
print("Mean Squared Error: ",mean_squared_error(Y_test,Y_pred_ridge)) 
print("R2 Score: ",r2_score(Y_test,Y_pred_ridge)) 
print("Mean Absolute Error: ",mean_absolute_error(Y_test,Y_pred_ridge)) 
print("Mean Absolute Percentage Error: ",mean_absolute_percentage_error(Y_test,Y_pred_ridge)) 

Linear Regression Model Evaluation:
Mean Squared Error:  1288705.4778516763
R2 Score:  0.9189331350419386
Mean Absolute Error:  737.1513665933285
Mean Absolute Percentage Error:  0.3952933516494362
--------------------------------

Ridge Regression Model Evaluation:
Mean Squared Error:  1288677.6768713535
R2 Score:  0.9189348838808753
Mean Absolute Error:  737.1401555418629
Mean Absolute Percentage Error:  0.39520890011156
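
The two models score almost identically here, and the comparison uses only a single value, `alpha=1.0`. One common way to pick `alpha` is cross-validation. The sketch below uses `RidgeCV` on synthetic data from `make_regression`, so it runs without downloading anything. The dataset, the candidate grid and all variable names are assumptions for illustration, not part of the diamonds analysis.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (not the diamonds dataset)
X_syn, y_syn = make_regression(n_samples=200, n_features=20,
                               noise=10.0, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X_syn, y_syn,
                                          test_size=0.2, random_state=42)

# Candidate alphas on a log scale; RidgeCV keeps the best one by cross-validation
candidate_alphas = np.logspace(-3, 3, 13)
ridge_cv = RidgeCV(alphas=candidate_alphas).fit(X_tr, y_tr)

print("Selected alpha:", ridge_cv.alpha_)
print("Test R2:", ridge_cv.score(X_te, y_te))
```

The same idea could be applied to the diamonds pipeline by swapping `Ridge(alpha=1.0)` for `RidgeCV(alphas=...)` in the final step, though whether that changes the scores above is not something this sketch establishes.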
