# **What is Regularization?**


Overfitting is one of the most frustrating problems with a machine learning model. Even after the time-consuming work of gathering, cleaning and preprocessing the data, the model may still fail to give a good result. Data often contains noise: variance in the target variable for identical predictor values, irrelevant features, or corrupted data points. The model cannot tell noise from signal, so it fits the noise along with the genuine patterns during training. This hurts its predictions on new data. This is called overfitting.

In simple words, overfitting is the result of an ML model trying to fit everything it gets from the data, including noise.

Please refer to [this](https://analyticsindiamag.com/lasso-regression-in-python-with-machinehack-data-science-hackathon/) post to read more about it.

## **Ridge Regression**

Ridge regression, also called L2 regularization, is used to reduce overfitting. The goal when building a machine learning model is to develop one that generalizes well, performing well on the training data as well as on the testing data.

Ridge regression is done to improve the generalizability of the model. This is done by adjusting the slope of the best-fit line. The model may perform slightly worse on the training data, because the line no longer passes as close to the data points, but it tends to give fairly good results in testing. The slope is changed, or the line is tilted a bit, by adding a penalty term whose strength is controlled by alpha, a hyperparameter. Linear regression minimizes the sum of squared errors; ridge regression minimizes the sum of squared errors plus a penalty equal to alpha multiplied by the square of the slope.

**Linear regression = min(Sum of squared errors)**

**Ridge regression = min(Sum of squared errors + alpha * (slope)^2)**
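Written out more formally for the single-feature case described here, the two objectives might look as follows. This is only a sketch: the symbols $b_0$ (intercept), $b_1$ (slope), $\alpha$ (penalty strength) and $n$ (number of data points) are notation assumed for illustration and do not appear in the original formulas.

```latex
\begin{aligned}
\text{Linear regression:}\quad & \min_{b_0,\, b_1} \sum_{i=1}^{n} \left(y_i - b_0 - b_1 x_i\right)^2 \\
\text{Ridge regression:}\quad & \min_{b_0,\, b_1} \sum_{i=1}^{n} \left(y_i - b_0 - b_1 x_i\right)^2 + \alpha\, b_1^2
\end{aligned}
```

With several features, the penalty becomes alpha times the sum of the squared coefficients; the intercept is normally left unpenalized.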

As the value of alpha increases, the line gets closer to horizontal and the slope shrinks toward zero, as shown in the graph below.
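The shrinking slope can be checked numerically. The sketch below assumes a single feature and an unpenalized intercept, in which case the ridge slope has the closed form `Sxy / (Sxx + alpha)`, where `Sxy` and `Sxx` are the centred cross-product and sum of squares. The helper `ridge_slope` and the toy data are hypothetical, made up purely for illustration.

```python
def ridge_slope(x, y, alpha):
    """Slope that minimises sum((y - b0 - b1*x)^2) + alpha * b1^2."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    # Centring the data accounts for the (unpenalised) intercept
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    return sxy / (sxx + alpha)

# Hypothetical toy data, not taken from the Boston data set
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.2, 1.9, 3.2, 3.8, 5.1]

for alpha in [0.0, 1.0, 10.0, 100.0]:
    print(f"alpha={alpha:>5}: slope={ridge_slope(x, y, alpha):.4f}")
```

With `alpha=0` this reduces to the ordinary least-squares slope; larger values of alpha pull the slope toward zero without ever reaching it.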

## **Lasso Regression**

It is also called L1 regularization. Lasso regression works in a similar fashion to ridge regression; the only difference is the penalty term. In ridge, we multiply alpha by the square of the slope, whereas in lasso we multiply alpha by the absolute value of the slope.

**Lasso regression = min(Sum of squared errors + alpha * |slope|)**

As with ridge regression, increasing the penalty term reduces the slope and the line becomes more horizontal. As the penalty grows, the model becomes less responsive to the independent variable.
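One practical difference from ridge is that the lasso penalty can push the slope to exactly zero. The sketch below illustrates this for a single feature with an unpenalized intercept, using the objective exactly as written above, whose minimiser is a soft-thresholded version of the least-squares slope. The helper `lasso_slope` and the toy data are hypothetical. Note that scikit-learn's `Lasso` scales the squared-error term by `1 / (2 * n_samples)`, so its `alpha` values are on a different scale from the ones used here.

```python
def lasso_slope(x, y, alpha):
    """Slope that minimises sum((y - b0 - b1*x)^2) + alpha * |b1|."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    # Soft-thresholding: shrink |sxy| by alpha / 2 and clip at zero
    shrunk = max(abs(sxy) - alpha / 2.0, 0.0)
    sign = 1.0 if sxy >= 0 else -1.0
    return sign * shrunk / sxx

# Hypothetical toy data, not taken from the Boston data set
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.2, 1.9, 3.2, 3.8, 5.1]

for alpha in [0.0, 5.0, 10.0, 20.0]:
    print(f"alpha={alpha:>5}: slope={lasso_slope(x, y, alpha):.4f}")
```

Once alpha is large enough (here, once `alpha / 2` exceeds `|Sxy|`), the slope is exactly zero, which is the behaviour that lets lasso drop features altogether.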

## **Implementation**

First, we will import all the required libraries and the data set. After importing, we will explore the data a bit, looking at its shape and at any missing values. Use the code below to do so.

In [None]:
!python -m pip install pip --upgrade --user -q 
!python -m pip install numpy pandas seaborn matplotlib scipy statsmodels scikit-learn --user -q

In [None]:
import IPython
IPython.Application.instance().kernel.do_shutdown(True)

In [1]:
import pandas as pd

from sklearn.linear_model import LinearRegression, Ridge, Lasso

# Note: load_boston was deprecated in scikit-learn 1.0 and removed in 1.2,
# so this cell needs an older scikit-learn release to run as written.
from sklearn.datasets import load_boston

# Load the data set and build a DataFrame with a 'target' column
boston_data = load_boston()
df = pd.DataFrame(boston_data.data, columns=boston_data.feature_names)
df['target'] = pd.Series(boston_data.target)
df.head()


    The Boston housing prices dataset has an ethical problem. You can refer to
    the documentation of this function for further details.

    The scikit-learn maintainers therefore strongly discourage the use of this
    dataset unless the purpose of the code is to study and educate about
    ethical issues in data science and machine learning.

    In this special case, you can fetch the dataset from the original
    source::

        import pandas as pd
        import numpy as np


        data_url = "http://lib.stat.cmu.edu/datasets/boston"
        raw_df = pd.read_csv(data_url, sep="\s+", skiprows=22, header=None)
        data = np.hstack([raw_df.values[::2, :], raw_df.values[1::2, :2]])
        target = raw_df.values[1::2, 2]

    Alternative datasets include the California housing dataset (i.e.
    :func:`~sklearn.datasets.fetch_california_housing`) and the Ames housing
    dataset. You can load the datasets as follows::

        from sklearn.datasets import fetch_california_h

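| | CRIM | ZN | INDUS | CHAS | NOX | RM | AGE | DIS | RAD | TAX | PTRATIO | B | LSTAT | target |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.00632 | 18.0 | 2.31 | 0.0 | 0.538 | 6.575 | 65.2 | 4.09 | 1.0 | 296.0 | 15.3 | 396.9 | 4.98 | 24.0 |
| 1 | 0.02731 | 0.0 | 7.07 | 0.0 | 0.469 | 6.421 | 78.9 | 4.9671 | 2.0 | 242.0 | 17.8 | 396.9 | 9.14 | 21.6 |
| 2 | 0.02729 | 0.0 | 7.07 | 0.0 | 0.469 | 7.185 | 61.1 | 4.9671 | 2.0 | 242.0 | 17.8 | 392.83 | 4.03 | 34.7 |
| 3 | 0.03237 | 0.0 | 2.18 | 0.0 | 0.458 | 6.998 | 45.8 | 6.0622 | 3.0 | 222.0 | 18.7 | 394.63 | 2.94 | 33.4 |
| 4 | 0.06905 | 0.0 | 2.18 | 0.0 | 0.458 | 7.147 | 54.2 | 6.0622 | 3.0 | 222.0 | 18.7 | 396.9 | 5.33 | 36.2 |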


In [2]:
print(df.shape)

print(df.isnull().sum())

(506, 14)
CRIM       0
ZN         0
INDUS      0
CHAS       0
NOX        0
RM         0
AGE        0
DIS        0
RAD        0
TAX        0
PTRATIO    0
B          0
LSTAT      0
target     0
dtype: int64


The data set contains 506 rows and 14 columns, and there are no missing values. We will now separate the independent variables and the dependent variable into X and y respectively, scale the data, and then divide it into training and testing sets. Use the code below to do so.

In [3]:
X = df.drop('target', axis=1)

y = df['target']

from sklearn import preprocessing

X = preprocessing.scale(X)

y = preprocessing.scale(y)

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=1)

print(X_train.shape)

print(y_train.shape)

print(X_test.shape)

print(y_test.shape)

(354, 13)
(354,)
(152, 13)
(152,)
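For readers wondering what the scaling step does: `preprocessing.scale` standardizes each column to zero mean and unit variance. This matters for ridge and lasso because the penalty treats every coefficient alike, so features measured on very different scales would otherwise be penalized unevenly. A minimal pure-Python sketch of the same transformation on one column is shown below; the helper `standardize` is hypothetical, and the sample values are the first five `RM` entries from the `df.head()` output above.

```python
import math

def standardize(values):
    """Rescale values to zero mean and unit (population) standard deviation."""
    n = len(values)
    mean = sum(values) / n
    # Population variance (divide by n, not n - 1)
    variance = sum((v - mean) ** 2 for v in values) / n
    std = math.sqrt(variance)
    return [(v - mean) / std for v in values]

# First five RM values from the df.head() output
rm = [6.575, 6.421, 7.185, 6.998, 7.147]
scaled = standardize(rm)

print([round(s, 4) for s in scaled])
print("mean:", round(sum(scaled) / len(scaled), 10))
print("variance:", round(sum(s * s for s in scaled) / len(scaled), 10))
```

Because only five rows are used here, the scaled numbers will differ from what the notebook computes over all 506 rows; the point is only the shape of the calculation.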


There are 354 rows in the training set and 152 in the testing set. We now build three models, using linear regression, ridge regression and lasso regression, and fit them to the training data. After the models are trained, we compute their training and testing scores. Use the code below to do so.

In [4]:
regression_model = LinearRegression()

regression_model.fit(X_train, y_train)

ridge = Ridge(alpha=.3)

ridge.fit(X_train,y_train)

print ("Ridge model:", (ridge.coef_))

Ridge model: [-0.09167006  0.15298094  0.04282248  0.06757044 -0.26900759  0.21412311
  0.01052611 -0.34552532  0.2873329  -0.20302407 -0.23634259  0.06408346
 -0.44092697]


In [5]:
lasso = Lasso(alpha=0.1)

lasso.fit(X_train,y_train)

print ("Lasso model:", (lasso.coef_))

Lasso model: [-0.          0.         -0.          0.01568283 -0.          0.21429193
 -0.         -0.         -0.         -0.         -0.1518455   0.
 -0.43149417]
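The lasso output above already shows feature selection at work: most coefficients are exactly zero. As a small illustration, the snippet below pairs the printed coefficients with the column names from the data set and keeps only the non-zero ones. The values are copied by hand from the outputs above rather than read from the fitted model, so treat it as a sketch.

```python
# Column names in the order shown in the df.head() output
feature_names = ["CRIM", "ZN", "INDUS", "CHAS", "NOX", "RM", "AGE",
                 "DIS", "RAD", "TAX", "PTRATIO", "B", "LSTAT"]

# Lasso coefficients copied from the printed output (alpha=0.1)
lasso_coef = [-0.0, 0.0, -0.0, 0.01568283, -0.0, 0.21429193, -0.0,
              -0.0, -0.0, -0.0, -0.1518455, 0.0, -0.43149417]

# Keep only the features whose coefficient is not zero
selected = {name: coef
            for name, coef in zip(feature_names, lasso_coef)
            if coef != 0}

print(len(selected), "of", len(feature_names), "features kept")
for name, coef in selected.items():
    print(f"{name:>8}: {coef:+.4f}")
```

With the fitted model in hand, the same selection could be made directly from `lasso.coef_`.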


In [6]:
print("Linear Regression Model Training Score: ", regression_model.score(X_train, y_train))

print("Linear Regression Model Testing Score: ",regression_model.score(X_test, y_test))

print("Ridge Regression Model Training Score: ",ridge.score(X_train, y_train))

print("Ridge Regression Model Testing Score: ",ridge.score(X_test, y_test))

print("Lasso Regression Model Training Score: ",lasso.score(X_train, y_train))

print("Lasso Regression Model Testing Score: ",lasso.score(X_test, y_test))

Linear Regression Model Training Score:  0.7103879080674731
Linear Regression Model Testing Score:  0.7836295385076292
Ridge Regression Model Training Score:  0.7103849353060623
Ridge Regression Model Testing Score:  0.7837023354761999
Lasso Regression Model Training Score:  0.6370760173272938
Lasso Regression Model Testing Score:  0.6795163947872269
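The `score` method of these scikit-learn regressors returns the coefficient of determination, R². As a rough sketch of what that number means, R² can be computed by hand as one minus the ratio of the residual sum of squares to the total sum of squares. The helper `r_squared` and the sample values below are hypothetical and are not taken from the notebook.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    # Squared error of the model's predictions
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Squared error of always predicting the mean
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Hypothetical targets and predictions for illustration
y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]

print(round(r_squared(y_true, y_pred), 4))
```

A score of 1 means perfect predictions, and a score of 0 means the model does no better than always predicting the mean of the target.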


The linear and ridge models score almost identically. The lasso model scores lower, but it keeps only 4 of the 13 coefficients non-zero, so it is a much simpler model. We will now create a polynomial regression model by generating new interaction features from the existing ones, then splitting the transformed data into training and testing sets. Use the code below to do so.

In [7]:
from sklearn.preprocessing import PolynomialFeatures

poly = PolynomialFeatures(degree = 2, interaction_only=True)

X_poly = poly.fit_transform(X)

X_train, X_test, y_train, y_test = train_test_split(X_poly, y, test_size=0.30, random_state=1)

regression_model.fit(X_train, y_train)

print(regression_model.coef_[0])

3.4026738293303087e-14
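To make the transformation concrete: with `degree=2` and `interaction_only=True`, `PolynomialFeatures` produces a constant bias column, the original features, and the product of every pair of distinct features (no squared terms). The pure-Python sketch below mimics that for a single row; `interaction_features` is a hypothetical helper, not part of scikit-learn.

```python
from itertools import combinations

def interaction_features(row):
    """Bias term, the original features, then products of each distinct pair."""
    pairs = [a * b for a, b in combinations(row, 2)]
    return [1.0] + list(row) + pairs

# A hypothetical row with three features
row = [2.0, 3.0, 5.0]
print(interaction_features(row))

# Number of columns produced from the 13 Boston features:
# 1 bias + 13 originals + 13 * 12 / 2 pairwise products
n = 13
print(1 + n + n * (n - 1) // 2)
```

The constant bias column is the likely reason the first coefficient printed above is essentially zero: each model fits its own intercept, so that column carries no extra information.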


In [8]:
ridge = Ridge(alpha=.3)

ridge.fit(X_train,y_train)

print ("Ridge model:", (ridge.coef_))

Ridge model: [ 0.          0.09131088  0.04957092  0.1912305   0.19590777 -0.13001372
  0.37566068 -0.2643974  -0.38884179  0.06450491 -0.28779474  0.03471483
  0.18384016 -0.35661823  0.04865369  0.16616854  0.47613909 -0.16414125
  0.21836262 -0.16577694 -0.39233041 -0.42146194 -0.01398648  0.302188
 -0.03417649  0.2035278  -0.00994461 -0.04934762  0.03032621 -0.00150432
  0.04859549  0.04881267  0.08643587  0.10519436 -0.00282823 -0.05732559
 -0.17394488 -0.05029663  0.27058686  0.14884354  0.04492921  0.02211007
 -0.07684504  0.02653827 -0.01718006 -0.078254   -0.0570902  -0.16382703
 -0.11038648  0.00273401  0.02295145 -0.20551405  0.3096989  -0.07257826
  0.04122948 -0.06498732  0.1078521  -0.26229666  0.1227714  -0.06765637
 -0.0123171  -0.10933651  0.03419968  0.1371888  -0.18918276  0.03822037
 -0.07982726 -0.33045092 -0.06043317 -0.04501405 -0.17734675 -0.11703109
  0.3814108  -0.16430925 -0.05597577 -0.11419795 -0.11576035 -0.19264137
 -0.10178204 -0.00464449 -0.11287499  0.

In [9]:
lasso = Lasso(alpha=0.003)

lasso.fit(X_train,y_train)

print ("Lasso model:", (lasso.coef_))

Lasso model: [ 0.         -0.         -0.          0.          0.1218874  -0.11519651
  0.34377146 -0.16172908 -0.34259854  0.08440576 -0.19649545 -0.07612681
  0.08292073 -0.38011254  0.         -0.          0.30719265 -0.08967884
  0.13770663 -0.         -0.         -0.00486047 -0.          0.
 -0.0115419   0.10023233  0.         -0.          0.          0.00192164
  0.          0.0075936   0.03241507  0.03288134  0.03381794 -0.
 -0.07207862  0.00459289  0.05496256  0.08358469  0.00386614 -0.08631844
 -0.          0.00528134 -0.06079282  0.         -0.01974897 -0.1241212
 -0.09561138  0.0443368  -0.00247302 -0.          0.05167622 -0.02668962
  0.         -0.06763413  0.         -0.08784486  0.         -0.11162687
  0.         -0.00728116  0.01023866  0.         -0.13155027  0.
 -0.14744298 -0.13516906 -0.10220115 -0.04182454 -0.18337532 -0.00987345
  0.16864435  0.         -0.0345977  -0.         -0.08200021 -0.1406453
 -0.03301659  0.          0.          0.06870581  0.00887873  0.

We will now compute the training and testing scores of the polynomial models. Use the code below to do so.

In [10]:
print("Linear Regression Model Training Score: ", regression_model.score(X_train, y_train))

print("Linear Regression Model Testing Score: ",regression_model.score(X_test, y_test))

print("Ridge Regression Model Training Score: ",ridge.score(X_train, y_train))

print("Ridge Regression Model Testing Score: ",ridge.score(X_test, y_test))

print("Lasso Regression Model Training Score: ",lasso.score(X_train, y_train))

print("Lasso Regression Model Testing Score: ",lasso.score(X_test, y_test))

Linear Regression Model Training Score:  0.9170667402821651
Linear Regression Model Testing Score:  0.8859749398610375
Ridge Regression Model Training Score:  0.9149881925509785
Ridge Regression Model Testing Score:  0.8856620334419097
Lasso Regression Model Training Score:  0.9023551833327291
Lasso Regression Model Testing Score:  0.901778455869959


Regularization is used to control model complexity and keep the model from overfitting. In this article, we discussed overfitting and two well-known regularization techniques, lasso and ridge regression. Lasso regression can shrink some coefficients to exactly 0, which means it can be used for feature selection and dimensionality reduction. A feature whose coefficient becomes 0 is less important in predicting the target variable and can be dropped. Ridge regression shrinks coefficients toward 0 but does not set them exactly to 0.

Please refer to [this](https://analyticsindiamag.com/hands-on-implementation-of-lasso-and-ridge-regression/) for a complete overview of the implementation, and to [this](https://analyticsindiamag.com/ridge-regression-vs-lasso-how-these-2-popular-ml-regularisation-techniques-work/) story to learn more about the theoretical aspects of ridge and lasso regularization.