# Regularization of Linear Models with SKLearn
Linear models are usually a good starting point for training a model. However, many datasets do not exhibit linear relationships between the independent and the dependent variables, so it is frequently necessary to create a polynomial model. Such models are prone to overfitting, and one method of reducing overfitting in polynomial models is regularization.
Let’s import the necessary libraries and load up our training dataset.


In [2]:
#imports
import numpy as np
import pandas as pd
import math

from sklearn.model_selection import train_test_split

from sklearn.linear_model import LinearRegression
from sklearn.linear_model import Ridge
from sklearn.linear_model import Lasso
from sklearn.linear_model import ElasticNet
from sklearn.metrics import mean_squared_error

from sklearn.preprocessing import PolynomialFeatures
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

import matplotlib.pyplot as plt
import seaborn as sns

sns.set()
%matplotlib inline


Let’s split our data into a training set and a held-out test set, as before. We will hold out 30% of the data for testing and fix the random state to make our experiment reproducible.


In [3]:
data_url = "http://lib.stat.cmu.edu/datasets/boston"
# raw string so that "\s" is read as a regex token, not a string escape
raw_df = pd.read_csv(data_url, sep=r"\s+", skiprows=22, header=None)
X = np.hstack([raw_df.values[::2, :], raw_df.values[1::2, :2]])
y = raw_df.values[1::2, 2]
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42, test_size=0.3)


Let’s establish a baseline by training a linear regression model.


In [4]:

lr_model = LinearRegression()
lr_model.fit(X_train, y_train)

print('Training score: {}'.format(lr_model.score(X_train, y_train)))
print('Test score: {}'.format(lr_model.score(X_test, y_test)))

y_pred_train = lr_model.predict(X_train)
mse_train= mean_squared_error(y_train, y_pred_train)
rmse_train = math.sqrt(mse_train)

print('RMSE_train: {}'.format(rmse_train))

y_pred_test = lr_model.predict(X_test)
mse_test = mean_squared_error(y_test, y_pred_test)
rmse_test = math.sqrt(mse_test)

print('RMSE_test: {}'.format(rmse_test))


Training score: 0.7434997532004697
Test score: 0.7112260057484974
RMSE_train: 4.748208239685937
RMSE_test: 4.638689926172788


The model above gives an R² score (the value returned by `score`) of about 0.74 on the training set and about 0.71 on the test set, with an RMSE of about 4.7 on both. The next models we train should outperform this baseline with higher R² scores and a lower RMSE.
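The `score` method of a scikit-learn regressor returns the coefficient of determination (R²), not a classification accuracy. The following minimal sketch checks this on made-up synthetic data (the arrays `X_demo` and `y_demo` are assumptions for illustration, not the housing data) and computes the RMSE alongside it.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

# Synthetic data, assumed purely for illustration
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(100, 3))
y_demo = X_demo @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.5, size=100)

demo_model = LinearRegression().fit(X_demo, y_demo)
y_hat = demo_model.predict(X_demo)

# R^2 = 1 - (residual sum of squares) / (total sum of squares)
ss_res = np.sum((y_demo - y_hat) ** 2)
ss_tot = np.sum((y_demo - y_demo.mean()) ** 2)
r2_manual = 1.0 - ss_res / ss_tot

# RMSE is the square root of the mean squared error
rmse = np.sqrt(mean_squared_error(y_demo, y_hat))

print("score():  ", demo_model.score(X_demo, y_demo))
print("r2_score: ", r2_score(y_demo, y_hat))
print("manual R2:", r2_manual)
print("RMSE:     ", rmse)
```

The three R² values should agree up to floating-point error. A higher R² and a lower RMSE both indicate a better fit, but they are on different scales: R² is unitless, while RMSE is in the units of the target.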
We need to engineer new features. Specifically, we need to create polynomial features by taking our individual features and raising them to a chosen power. Thankfully, scikit-learn has an implementation for this and we don’t need to do it manually.
Something else we would like to do is standardize our data. `StandardScaler` rescales each feature to zero mean and unit variance. This lets us work with reasonable numbers when we raise to a power.
Finally, because we need to carry out the same operations on our training, validation, and test sets, we will introduce a pipeline. This will let us pipe our process so the same steps get carried out repeatedly.
To summarize, we will scale our data, then create polynomial features, and then train a linear regression model.


In [5]:
steps = [
    ('scaler', StandardScaler()),
    ('poly', PolynomialFeatures(degree=2)),
    ('model', LinearRegression())
]

pipeline = Pipeline(steps)

pipeline.fit(X_train, y_train)

y_pred_train = pipeline.predict(X_train)
mse_train= mean_squared_error(y_train, y_pred_train)
rmse_train = math.sqrt(mse_train)
print('RMSE_train: {}'.format(rmse_train))

y_pred_test = pipeline.predict(X_test)
mse_test = mean_squared_error(y_test, y_pred_test)
rmse_test = math.sqrt(mse_test)
print('RMSE_test: {}\n'.format(rmse_test))

print('Training score: {}'.format(pipeline.score(X_train, y_train)))
print('Test score: {}'.format(pipeline.score(X_test, y_test)))


RMSE_train: 2.162970056950185
RMSE_test: 5.0811876783255245

Training score: 0.9467733311147442
Test score: 0.6535042863861226


After running our code, we get a training R² of about 0.95 and a test R² of about 0.65, and the test RMSE rises to about 5.08, which is worse than the baseline. This gap between training and test performance is a sign of overfitting. Overfitting is normally not desirable, but that is exactly what we were hoping for.
We will now apply regularization to our new data.
## l2 Regularization or Ridge Regression
To understand Ridge Regression, we need to remind ourselves of what happens during gradient descent, when our model coefficients are trained. During training, our initial weights are updated according to a gradient update rule using a learning rate and a gradient. Ridge regression adds a penalty proportional to the squared L2 norm of the weights to the loss, so each update also shrinks the weights toward zero. This is implemented in scikit-learn as a class called Ridge.
We will create a new pipeline, this time using Ridge. We will specify our regularization strength by passing in a parameter, alpha, which plays the role of λ in the objective below. This can be really small, like 0.001, or as large as you would want it to be. The larger the value of alpha, the less variance your model will exhibit, at the cost of more bias.

$$\frac{1}{2} \sum_{n=1}^{N}\left\{y_{n}-\theta^{\top} \boldsymbol{\phi}\left(\mathbf{x}_{n}\right)\right\}^{2}+\frac{\lambda}{2}\|\theta\|_2^2$$
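To see the shrinking effect of the penalty, the following sketch fits `Ridge` with several values of `alpha` on made-up synthetic data (`X_demo`, `y_demo`, and `true_coef` are assumptions for illustration) and prints the L2 norm of the fitted coefficients. scikit-learn's `Ridge` minimizes the squared error plus `alpha` times the squared L2 norm, which has the same minimizer as the objective above with λ equal to `alpha`.

```python
import numpy as np
from sklearn.linear_model import Ridge

# Synthetic regression problem, assumed purely for illustration
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(50, 5))
true_coef = np.array([3.0, -2.0, 0.0, 1.0, 0.5])
y_demo = X_demo @ true_coef + rng.normal(scale=0.5, size=50)

alphas = [0.001, 1.0, 10.0, 100.0]
coef_norms = []
for alpha in alphas:
    ridge = Ridge(alpha=alpha).fit(X_demo, y_demo)
    norm = np.linalg.norm(ridge.coef_)  # L2 norm of the fitted weights
    coef_norms.append(norm)
    print(f"alpha={alpha:>7}: ||coef||_2 = {norm:.4f}")
```

The printed norm decreases as `alpha` grows: a stronger penalty pulls all weights toward zero, but in general none of them becomes exactly zero.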


In [6]:
steps = [
    ('scaler', StandardScaler()),
    ('poly', PolynomialFeatures(degree=2)),
    ('model', Ridge(alpha=10, fit_intercept=True))
]

ridge_pipe = Pipeline(steps)
ridge_pipe.fit(X_train, y_train)

y_pred_train = ridge_pipe.predict(X_train)
mse_train= mean_squared_error(y_train, y_pred_train)
rmse_train = math.sqrt(mse_train)
print('RMSE_train: {}'.format(rmse_train))

y_pred_test = ridge_pipe.predict(X_test)
mse_test = mean_squared_error(y_test, y_pred_test)
rmse_test = math.sqrt(mse_test)
print('RMSE_test: {}'.format(rmse_test))

print('Training Score: {}'.format(ridge_pipe.score(X_train, y_train)))
print('Test Score: {}'.format(ridge_pipe.score(X_test, y_test)))


RMSE_train: 2.441071076959751
RMSE_test: 3.823376123713985
Training Score: 0.9322063334864212
Test Score: 0.8038169683868278


By executing the code, we get a training R² of about 0.93 and a test R² of about 0.80, and the test RMSE drops to about 3.82. That is an improvement on our baseline linear regression model.
Let’s try something else.
## l1 Regularization or Lasso Regression
By creating a polynomial model, we created additional features. The question we need to ask ourselves is which of our features are relevant to our model, and which are not.
l1 regularization tries to answer this question by driving the values of certain coefficients down to 0. This eliminates the least important features in our model. We will create a pipeline similar to the one above, but using Lasso. You can play around with the value of alpha.

$$\frac{1}{2} \sum_{n=1}^{N}\left\{y_{n}-\theta^{\top} \boldsymbol{\phi}\left(\mathbf{x}_{n}\right)\right\}^{2}+\frac{\lambda}{2}\|\theta\|_1$$
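The feature-selection behaviour of the l1 penalty can be checked with a small sketch. The synthetic data below is an assumption for illustration: only the first two of ten features influence the target. We compare how many coefficients `Lasso` and `Ridge` set exactly to zero.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

# Synthetic data, assumed for illustration: only features 0 and 1 matter
rng = np.random.default_rng(0)
n_samples, n_features = 200, 10
X_demo = rng.normal(size=(n_samples, n_features))
y_demo = (3.0 * X_demo[:, 0] - 2.0 * X_demo[:, 1]
          + rng.normal(scale=0.1, size=n_samples))

lasso = Lasso(alpha=0.1).fit(X_demo, y_demo)
ridge = Ridge(alpha=0.1).fit(X_demo, y_demo)

# Count the coefficients that are exactly zero under each penalty
n_zero_lasso = int(np.sum(lasso.coef_ == 0))
n_zero_ridge = int(np.sum(ridge.coef_ == 0))

print("Lasso coefficients:", np.round(lasso.coef_, 3))
print("Ridge coefficients:", np.round(ridge.coef_, 3))
print("exact zeros (Lasso):", n_zero_lasso)
print("exact zeros (Ridge):", n_zero_ridge)
```

Note that scikit-learn's `Lasso` scales the squared-error term by the number of samples, so its `alpha` is not numerically interchangeable with the λ in the formula above, although it plays the same role.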


In [7]:
steps = [
    ('scaler', StandardScaler()),
    ('poly', PolynomialFeatures(degree=2)),
    ('model', Lasso(alpha=0.3, fit_intercept=True))
]

lasso_pipe = Pipeline(steps)

lasso_pipe.fit(X_train, y_train)

y_pred_train = lasso_pipe.predict(X_train)
mse_train= mean_squared_error(y_train, y_pred_train)
rmse_train = math.sqrt(mse_train)
print('RMSE_train: {}'.format(rmse_train))

y_pred_test = lasso_pipe.predict(X_test)
mse_test = mean_squared_error(y_test, y_pred_test)
rmse_test = math.sqrt(mse_test)
print('RMSE_test: {}'.format(rmse_test))

print('Training score: {}'.format(lasso_pipe.score(X_train, y_train)))
print('Test score: {}'.format(lasso_pipe.score(X_test, y_test)))


RMSE_train: 3.538738418298479
RMSE_test: 3.970165571442558
Training score: 0.8575294192309941
Test score: 0.7884638325042947


# Task
In Exercise 9, task 3, you found the optimal complexity for the polynomial model. In this task you will take that polynomial regression and apply Ridge and Lasso regression to it:
1. Use the optimal degree you found for the polynomial and calculate RMSE_train and RMSE_test for it.
2. Use the polynomial degree=10, apply Ridge regression, find the optimal lambda (around 0.001), and calculate RMSE_train and RMSE_test.
3. Use the polynomial degree=10, apply Lasso regression, find the optimal lambda (around 0.001), and calculate RMSE_train and RMSE_test.
4. In the results you can see that RMSE_test with regularization is lower than for the polynomial with optimal complexity. How do you justify these results?
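One possible way to look for a good regularization strength is to loop over a few candidate values and compare the held-out RMSE. The sketch below does this on separate made-up data. The sine target, the degree of 8, and the candidate list are all assumptions for illustration, not the exercise setup or its solution.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures

# Made-up one-dimensional data, assumed purely for illustration
rng = np.random.default_rng(0)
X_demo = rng.uniform(-1, 1, size=(60, 1))
y_demo = np.sin(3 * X_demo[:, 0]) + rng.normal(scale=0.1, size=60)
X_tr, X_val, y_tr, y_val = train_test_split(
    X_demo, y_demo, test_size=0.4, random_state=0
)

candidate_alphas = [1e-4, 1e-3, 1e-2, 1e-1, 1.0]
results = {}
for alpha in candidate_alphas:
    pipe = Pipeline([
        ("poly", PolynomialFeatures(degree=8, include_bias=False)),
        ("model", Ridge(alpha=alpha)),
    ])
    pipe.fit(X_tr, y_tr)
    # RMSE on the held-out split for this candidate alpha
    rmse_val = np.sqrt(mean_squared_error(y_val, pipe.predict(X_val)))
    results[alpha] = rmse_val
    print(f"alpha={alpha:g}: held-out RMSE = {rmse_val:.4f}")

# Keep the candidate with the smallest held-out RMSE
best_alpha = min(results, key=results.get)
print("best alpha among candidates:", best_alpha)
```

Swapping `Ridge` for `Lasso` in the pipeline gives the analogous search for the l1 penalty.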


In [13]:
#use this code for your solutions 
from sklearn.model_selection import train_test_split

# cosine function
def true_fun(X):
    return np.cos(1.5 * np.pi * X)

np.random.seed(0)

n_samples = 30

X = np.sort(np.random.rand(n_samples))
y = true_fun(X) + np.random.randn(n_samples) * 0.1


X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.60, test_size=0.40, random_state=1)
print('Train/Test Size : ', X_train.shape, X_test.shape, y_train.shape, y_test.shape)
degree = 10
polynomial_features = PolynomialFeatures(degree=degree, include_bias=False)
linear_regression = LinearRegression()
pipeline = Pipeline(
    [
        ("polynomial_features", polynomial_features),
        ("linear_regression", linear_regression),
    ]
)
# X_train is 1-D, so add a feature axis before fitting
pipeline.fit(X_train[:, np.newaxis], y_train)

y_pred_train = pipeline.predict(X_train[:, np.newaxis])
mse_train= mean_squared_error(y_train, y_pred_train)
rmse_train = math.sqrt(mse_train)
print('RMSE_train: {}'.format(rmse_train))

y_pred_test = pipeline.predict(X_test[:, np.newaxis])
mse_test = mean_squared_error(y_test, y_pred_test)
rmse_test = math.sqrt(mse_test)
print('RMSE_test: {}\n'.format(rmse_test))


Train/Test Size :  (18,) (12,) (18,) (12,)
RMSE_train: 0.056532247455264445
RMSE_test: 0.30856057840599427



In [9]:
# Copy paste your  code here


Train/Test Size :  (18,) (12,) (18,) (12,)
RMSE_train: 0.1079839948207335
RMSE_test: 0.12863714132664852



In [155]:
# Copy paste your  code here


RMSE_train: 0.1248997128894005
RMSE_test: 0.12911924203886604



In [10]:
# Copy paste your code here


RMSE_train: 0.10041317401921453
RMSE_test: 0.11953621236626978

