<a href="https://colab.research.google.com/github/Metallicode/Math/blob/main/Regularization.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

#Regularization

Regression model:

$$\hat{y} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p$$

##RSS

[Residual sum of squares](https://en.wikipedia.org/wiki/Residual_sum_of_squares)

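In code terms: `sum((y - predictions)**2)`

$$\mathrm{RSS} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$

where $\hat{y}_i$ is the model's prediction for observation $i$. Expanded for a linear model with intercept $\beta_0$ and coefficients $\beta_j$:

$$\mathrm{RSS} = \sum_{i=1}^{n} \Big(y_i - \beta_0 - \sum_{j=1}^{p} \beta_j x_{ij}\Big)^2$$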

In [None]:
import numpy as np

y = np.array([1,3,5,6,7,8,9])
y_pred = np.array([1.3,2.1,4.5,6.23,8.8,8.9,10])

errors = y - y_pred
errors

array([-0.3 ,  0.9 ,  0.5 , -0.23, -1.8 , -0.9 , -1.  ])

In [None]:
rss = sum(errors**2)

In [None]:
rss

6.252900000000003
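The same calculation can be wrapped in a small helper. This is a minimal sketch; the function name `residual_sum_of_squares` is an assumption made for illustration and is not part of the notebook.

```python
import numpy as np

def residual_sum_of_squares(y_true, y_pred):
    """Return the sum of squared residuals between targets and predictions."""
    residuals = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.sum(residuals ** 2))

# Same values as in the cells above
y = np.array([1, 3, 5, 6, 7, 8, 9])
y_pred = np.array([1.3, 2.1, 4.5, 6.23, 8.8, 8.9, 10])

rss = residual_sum_of_squares(y, y_pred)
print(rss)
```

Using `np.sum` instead of the built-in `sum` keeps the whole computation vectorized, which matters once the arrays get large.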

#Ridge Regression

[Ridge Regression](https://www.statology.org/ridge-regression/)

Ridge regression minimizes the RSS plus a shrinkage penalty:

$$\mathrm{RSS} + \lambda \sum_{j=1}^{p} \beta_j^2$$

In code terms: `RSS + (λ * sum(model.coef_**2))`
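As a rough sketch, the objective can be written as a plain function. The name `ridge_objective`, its arguments, and the toy data are assumptions made for illustration; the intercept is kept out of the penalty, which is the usual convention.

```python
import numpy as np

def ridge_objective(X, y, intercept, coef, lam):
    """RSS plus lam times the sum of squared coefficients (intercept not penalized)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    coef = np.asarray(coef, dtype=float)
    predictions = intercept + X @ coef
    rss = np.sum((y - predictions) ** 2)
    penalty = lam * np.sum(coef ** 2)
    return float(rss + penalty)

# Toy data (made up): y = 1 + 2*x exactly, so the RSS is zero at the true parameters
X_toy = np.array([[0.0], [1.0], [2.0]])
y_toy = np.array([1.0, 3.0, 5.0])

loss_no_penalty = ridge_objective(X_toy, y_toy, intercept=1.0, coef=[2.0], lam=0.0)
loss_with_penalty = ridge_objective(X_toy, y_toy, intercept=1.0, coef=[2.0], lam=0.5)
print(loss_no_penalty, loss_with_penalty)
```

With a perfect fit, everything that remains in the second value comes from the penalty term, here `0.5 * 2.0**2`.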

In [7]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

In [46]:
df = pd.read_csv('Advertising.csv')

X = df.drop('sales', axis=1)
y = df["sales"]

In [47]:
df

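| Unnamed: 0 | TV    | radio | newspaper | sales |
|------------|-------|-------|-----------|-------|
| 0          | 230.1 | 37.8  | 69.2      | 22.1  |
| 1          | 44.5  | 39.3  | 45.1      | 10.4  |
| 2          | 17.2  | 45.9  | 69.3      | 9.3   |
| 3          | 151.5 | 41.3  | 58.5      | 18.5  |
| 4          | 180.8 | 10.8  | 58.4      | 12.9  |
| ...        | ...   | ...   | ...       | ...   |
| 195        | 38.2  | 3.7   | 13.8      | 7.6   |
| 196        | 94.2  | 4.9   | 8.1       | 9.7   |
| 197        | 177.0 | 9.3   | 6.4       | 12.8  |
| 198        | 283.6 | 42.0  | 66.2      | 25.5  |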


In [9]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, shuffle=True)

model = LinearRegression()

model.fit(X_train, y_train)

In [10]:
model.coef_

array([0.04777359, 0.16556323, 0.00855185])

##Shrinkage Penalty

In [11]:
sum(model.coef_**2)

0.029766631687732473

##Ridge Regression

In [15]:
y_pred = model.predict(X_test)

In [20]:
lambda_value = 1.0  # hyperparameter; it can be tuned with cross-validation
sum((np.array(y_test) - y_pred)**2) + lambda_value * sum(model.coef_**2)

216.29380100334723

##Comparing RSS to Shrinkage Penalty.

In [26]:
print("RSS\t\t\t", sum((np.array(y_test) - y_pred)**2))
print("Shrinkage Penalty\t", sum(model.coef_**2))

RSS			 216.2640343716595
Shrinkage Penalty	 0.029766631687732473


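💡**The shrinkage penalty is negligible next to the RSS here, because the coefficients are tiny when the features are in their original units. The penalty depends on the scale of each feature, which is why we must use Feature Scaling for Ridge Regression.**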
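To illustrate why the penalty depends on feature units, here is a minimal sketch on synthetic data (not the Advertising dataset). The helper `ols_slope` is a hypothetical name. Multiplying a feature by 1000 divides its least-squares coefficient by 1000, so its squared coefficient shrinks by a factor of a million even though the fit is the same.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 3.0 * x + rng.normal(scale=0.1, size=200)

def ols_slope(feature, target):
    """Slope of a one-feature least-squares fit with an intercept."""
    design = np.column_stack([np.ones_like(feature), feature])
    solution, *_ = np.linalg.lstsq(design, target, rcond=None)
    return solution[1]

slope_original = ols_slope(x, y)          # feature in its original units
slope_rescaled = ols_slope(x * 1000, y)   # same feature, numerically 1000x larger

penalty_original = slope_original ** 2
penalty_rescaled = slope_rescaled ** 2
print(penalty_original, penalty_rescaled)
```

Without scaling, a fixed λ would therefore penalize features very unevenly, depending only on the units they happen to be recorded in.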

##Feature Scaling

In [37]:
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()

# Fit on the training set only, so no test-set statistics leak into training
scaler.fit(X_train)

X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)

model = LinearRegression()

model.fit(X_train, y_train)
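For reference, standardization can also be written out by hand. This is a sketch on synthetic data; the array names and the chosen means and spreads are assumptions made for illustration. The statistics are learned from the training split and reused for the test split.

```python
import numpy as np

rng = np.random.default_rng(1)
# Synthetic features on very different scales (made up for illustration)
X_train_raw = rng.normal(loc=[150.0, 20.0, 30.0], scale=[85.0, 15.0, 20.0], size=(140, 3))
X_test_raw = rng.normal(loc=[150.0, 20.0, 30.0], scale=[85.0, 15.0, 20.0], size=(60, 3))

# Learn the statistics from the training set only
train_mean = X_train_raw.mean(axis=0)
train_std = X_train_raw.std(axis=0)  # population std (ddof=0), as StandardScaler uses

# Apply the same shift and scale to both splits
X_train_scaled = (X_train_raw - train_mean) / train_std
X_test_scaled = (X_test_raw - train_mean) / train_std

print(X_train_scaled.mean(axis=0), X_train_scaled.std(axis=0))
```

The scaled training columns have mean 0 and standard deviation 1; the test columns are only approximately standardized, which is expected.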

In [28]:
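# y_pred was computed before scaling. Least-squares predictions do not depend on
# feature standardization, so the RSS is unchanged and only the penalty grows.
print("RSS\t\t\t", sum((np.array(y_test) - y_pred)**2))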
print("Shrinkage Penalty\t", sum(model.coef_**2))

RSS			 216.2640343716595
Shrinkage Penalty	 22.118519404690296


📓
When you train a Ridge regression model (or another linear model) with scikit-learn, the `.coef_` attribute of the fitted model does not include the intercept. The intercept is stored in a separate attribute named `.intercept_`.


So if your dataset has 3 columns (features), `model.coef_` holds exactly 3 coefficients, one per feature, in column order.
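A quick check on synthetic data shows the two attributes side by side. The data and the names `X_demo`, `y_demo`, and `demo_model` are made up for illustration.

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(2)
X_demo = rng.normal(size=(50, 3))                    # 3 synthetic features
y_demo = 4.0 + X_demo @ np.array([1.0, -2.0, 0.5])   # true intercept of 4.0, no noise

demo_model = Ridge(alpha=1.0).fit(X_demo, y_demo)

print(demo_model.coef_.shape)  # one coefficient per feature
print(demo_model.intercept_)   # stored separately from coef_
```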

##sklearn Ridge Regression

In [30]:
from sklearn.linear_model import Ridge

In [31]:
ridge_model = Ridge(alpha=5)  # note: the λ hyperparameter is called "alpha" in scikit-learn

In [32]:
ridge_model.fit(X_train, y_train)

In [33]:
predictions = ridge_model.predict(X_test)
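To see what a ridge fit computes, here is a sketch of the closed-form solution in NumPy. The function `ridge_closed_form` and the synthetic data are assumptions made for illustration. It centers the data so that the intercept is not penalized, then solves $(X^\top X + \lambda I)\,\beta = X^\top y$. This is one standard way to solve the problem, not necessarily the solver scikit-learn picks internally.

```python
import numpy as np

def ridge_closed_form(X, y, lam):
    """Solve min ||y - b - X w||^2 + lam * ||w||^2 with an unpenalized intercept b."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean = X.mean(axis=0)
    y_mean = y.mean()
    X_centered = X - x_mean
    y_centered = y - y_mean
    gram = X_centered.T @ X_centered + lam * np.eye(X.shape[1])
    coef = np.linalg.solve(gram, X_centered.T @ y_centered)
    intercept = y_mean - x_mean @ coef
    return intercept, coef

rng = np.random.default_rng(3)
X_demo = rng.normal(size=(100, 3))
y_demo = 2.0 + X_demo @ np.array([3.0, -1.0, 0.0]) + rng.normal(scale=0.5, size=100)

_, coef_ols = ridge_closed_form(X_demo, y_demo, lam=0.0)    # lam=0 reduces to least squares
_, coef_ridge = ridge_closed_form(X_demo, y_demo, lam=5.0)  # coefficients shrink toward zero
print(coef_ols, coef_ridge)
```

Increasing `lam` shrinks the coefficient vector toward zero, which is the effect the shrinkage penalty is meant to have.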

##Evaluate

In [39]:
from sklearn.metrics import mean_absolute_error, mean_squared_error

In [43]:
mae = mean_absolute_error(y_test, predictions)
rmse = np.sqrt(mean_squared_error(y_test, predictions))

In [44]:
mae, rmse

(1.5186616279586536, 1.9086358711400049)
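Both metrics are simple to compute by hand, which also shows how RMSE relates to the RSS. This is a sketch; the helper names are hypothetical, and the values are the small example arrays from the RSS section rather than the Advertising predictions.

```python
import numpy as np

def mean_absolute_error_np(y_true, y_pred):
    """Average absolute residual."""
    return float(np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred))))

def root_mean_squared_error_np(y_true, y_pred):
    """Square root of the average squared residual, i.e. sqrt(RSS / n)."""
    return float(np.sqrt(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2)))

y_true = np.array([1, 3, 5, 6, 7, 8, 9])
y_hat = np.array([1.3, 2.1, 4.5, 6.23, 8.8, 8.9, 10])

mae_demo = mean_absolute_error_np(y_true, y_hat)
rmse_demo = root_mean_squared_error_np(y_true, y_hat)
print(mae_demo, rmse_demo)
```

RMSE penalizes large residuals more heavily than MAE, so it is never smaller than MAE on the same predictions.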

##Ridge Regression + Cross Validation

In [45]:
from sklearn.linear_model import RidgeCV

In [48]:
ridge_cv_model = RidgeCV(alphas=(0.1,1.0,10))

In [49]:
ridge_cv_model.fit(X_train, y_train)

In [50]:
ridge_cv_model.alpha_  # the alpha value selected by cross-validation

0.1
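The selection step can be sketched by hand with K-fold cross-validation. This is an illustration on synthetic data, with hypothetical helpers `fit_ridge` and `cv_mse`. By default `RidgeCV` uses an efficient leave-one-out scheme rather than the 5-fold loop shown here, so the two need not pick the same value.

```python
import numpy as np

def fit_ridge(X, y, lam):
    """Closed-form ridge fit with an unpenalized intercept."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    X_centered = X - x_mean
    gram = X_centered.T @ X_centered + lam * np.eye(X.shape[1])
    coef = np.linalg.solve(gram, X_centered.T @ (y - y_mean))
    return y_mean - x_mean @ coef, coef

def cv_mse(X, y, lam, n_folds=5, seed=0):
    """Average validation MSE of ridge over n_folds shuffled folds."""
    indices = np.random.default_rng(seed).permutation(len(y))
    folds = np.array_split(indices, n_folds)
    scores = []
    for k in range(n_folds):
        val_idx = folds[k]
        train_idx = np.concatenate([folds[j] for j in range(n_folds) if j != k])
        intercept, coef = fit_ridge(X[train_idx], y[train_idx], lam)
        preds = intercept + X[val_idx] @ coef
        scores.append(np.mean((y[val_idx] - preds) ** 2))
    return float(np.mean(scores))

rng = np.random.default_rng(4)
X_demo = rng.normal(size=(120, 3))
y_demo = 1.0 + X_demo @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=1.0, size=120)

candidate_lambdas = (0.1, 1.0, 10.0)
cv_scores = {lam: cv_mse(X_demo, y_demo, lam) for lam in candidate_lambdas}
best_lambda = min(cv_scores, key=cv_scores.get)  # lowest average validation error wins
print(cv_scores, best_lambda)
```

If the best value sits at the edge of the candidate grid, as `0.1` does above, it is usually worth extending the grid in that direction and rerunning.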