# Regularization in Linear Regression

- We have multiple ways to apply regularization in regression: L1, L2, and Elastic Net
- Each machine learning algorithm has its own method of regularization. Examples:
    - Decision Tree: tree complexity and depth
    - Neural Networks: Dropout
    - Linear Regression: use the Lasso or Ridge variants
    - Other: some estimators expose a hyperparameter called `penalty`, for example `LogisticRegression()`:
        - `None`: no penalty is added
        - `'l2'`: an L2 penalty term is added (the default choice)
        - `'l1'`: an L1 penalty term is added
        - `'elasticnet'`: both L1 and L2 penalty terms are added
- For Linear Regression, you need to compare three models:
    1. No Regularization: `LinearRegression()`
    2. L1 Regularization: `Lasso()` Regression - adds a penalty proportional to the sum of the absolute values of the coefficients
    3. L2 Regularization: `Ridge()` Regression - adds a penalty proportional to the sum of the squared coefficients
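
As a rough illustration of the two penalty terms listed above, the sketch below computes them by hand for a made-up coefficient vector. The helper names `l1_penalty` and `l2_penalty` and all values are illustrative assumptions, not part of scikit-learn, and the sketch ignores how each estimator scales the data-fit part of its loss.

```python
# Hypothetical helpers: compute only the penalty term that each method
# adds to the least-squares loss (the data-fit term is not shown here).

def l1_penalty(coefs, alpha):
    """L1 (Lasso-style) penalty: alpha times the sum of absolute coefficients."""
    return alpha * sum(abs(w) for w in coefs)

def l2_penalty(coefs, alpha):
    """L2 (Ridge-style) penalty: alpha times the sum of squared coefficients."""
    return alpha * sum(w * w for w in coefs)

coefs = [3.0, -0.5, 0.0, 2.0]  # made-up coefficients
alpha = 0.1                    # regularization strength

print("L1 penalty:", l1_penalty(coefs, alpha))
print("L2 penalty:", l2_penalty(coefs, alpha))
```

The L2 term grows much faster for large coefficients (3.0 contributes 9.0 before scaling), while the L1 term grows linearly in each coefficient's size.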



In [1]:
import pandas as pd
from sklearn.datasets import load_diabetes # this is a regression dataset
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.linear_model import LinearRegression, Lasso, Ridge
from sklearn.metrics import mean_squared_error, r2_score #2 metrics for model evaluation for regression problems


In [2]:
diab = load_diabetes()
print(diab)

{'data': array([[ 0.03807591,  0.05068012,  0.06169621, ..., -0.00259226,
         0.01990749, -0.01764613],
       [-0.00188202, -0.04464164, -0.05147406, ..., -0.03949338,
        -0.06833155, -0.09220405],
       [ 0.08529891,  0.05068012,  0.04445121, ..., -0.00259226,
         0.00286131, -0.02593034],
       ...,
       [ 0.04170844,  0.05068012, -0.01590626, ..., -0.01107952,
        -0.04688253,  0.01549073],
       [-0.04547248, -0.04464164,  0.03906215, ...,  0.02655962,
         0.04452873, -0.02593034],
       [-0.04547248, -0.04464164, -0.0730303 , ..., -0.03949338,
        -0.00422151,  0.00306441]]), 'target': array([151.,  75., 141., 206., 135.,  97., 138.,  63., 110., 310., 101.,
        69., 179., 185., 118., 171., 166., 144.,  97., 168.,  68.,  49.,
        68., 245., 184., 202., 137.,  85., 131., 283., 129.,  59., 341.,
        87.,  65., 102., 265., 276., 252.,  90., 100.,  55.,  61.,  92.,
       259.,  53., 190., 142.,  75., 142., 155., 225.,  59., 104., 182.,
  

In [3]:
# col names
diab.feature_names

['age', 'sex', 'bmi', 'bp', 's1', 's2', 's3', 's4', 's5', 's6']

In [5]:
#data description
print(diab.DESCR)

.. _diabetes_dataset:

Diabetes dataset
----------------

Ten baseline variables, age, sex, body mass index, average blood
pressure, and six blood serum measurements were obtained for each of n =
442 diabetes patients, as well as the response of interest, a
quantitative measure of disease progression one year after baseline.

**Data Set Characteristics:**

  :Number of Instances: 442

  :Number of Attributes: First 10 columns are numeric predictive values

  :Target: Column 11 is a quantitative measure of disease progression one year after baseline

  :Attribute Information:
      - age     age in years
      - sex
      - bmi     body mass index
      - bp      average blood pressure
      - s1      tc, total serum cholesterol
      - s2      ldl, low-density lipoproteins
      - s3      hdl, high-density lipoproteins
      - s4      tch, total cholesterol / HDL
      - s5      ltg, possibly log of serum triglycerides level
      - s6      glu, blood sugar level

Note: Each of these 1

In [6]:
# we could fit the models directly with X=diab.data, y=diab.target
# but, to make the data easier to read, let's convert it to a dataframe

# split into 2 dataframes
X = pd.DataFrame(diab.data, columns=diab.feature_names)
X

Unnamed: 0,age,sex,bmi,bp,s1,s2,s3,s4,s5,s6
0,0.038076,0.050680,0.061696,0.021872,-0.044223,-0.034821,-0.043401,-0.002592,0.019907,-0.017646
1,-0.001882,-0.044642,-0.051474,-0.026328,-0.008449,-0.019163,0.074412,-0.039493,-0.068332,-0.092204
2,0.085299,0.050680,0.044451,-0.005670,-0.045599,-0.034194,-0.032356,-0.002592,0.002861,-0.025930
3,-0.089063,-0.044642,-0.011595,-0.036656,0.012191,0.024991,-0.036038,0.034309,0.022688,-0.009362
4,0.005383,-0.044642,-0.036385,0.021872,0.003935,0.015596,0.008142,-0.002592,-0.031988,-0.046641
...,...,...,...,...,...,...,...,...,...,...
437,0.041708,0.050680,0.019662,0.059744,-0.005697,-0.002566,-0.028674,-0.002592,0.031193,0.007207
438,-0.005515,0.050680,-0.015906,-0.067642,0.049341,0.079165,-0.028674,0.034309,-0.018114,0.044485
439,0.041708,0.050680,-0.015906,0.017293,-0.037344,-0.013840,-0.024993,-0.011080,-0.046883,0.015491
440,-0.045472,-0.044642,0.039062,0.001215,0.016318,0.015283,-0.028674,0.026560,0.044529,-0.025930


In [7]:
y=diab.target
y

array([151.,  75., 141., 206., 135.,  97., 138.,  63., 110., 310., 101.,
        69., 179., 185., 118., 171., 166., 144.,  97., 168.,  68.,  49.,
        68., 245., 184., 202., 137.,  85., 131., 283., 129.,  59., 341.,
        87.,  65., 102., 265., 276., 252.,  90., 100.,  55.,  61.,  92.,
       259.,  53., 190., 142.,  75., 142., 155., 225.,  59., 104., 182.,
       128.,  52.,  37., 170., 170.,  61., 144.,  52., 128.,  71., 163.,
       150.,  97., 160., 178.,  48., 270., 202., 111.,  85.,  42., 170.,
       200., 252., 113., 143.,  51.,  52., 210.,  65., 141.,  55., 134.,
        42., 111.,  98., 164.,  48.,  96.,  90., 162., 150., 279.,  92.,
        83., 128., 102., 302., 198.,  95.,  53., 134., 144., 232.,  81.,
       104.,  59., 246., 297., 258., 229., 275., 281., 179., 200., 200.,
       173., 180.,  84., 121., 161.,  99., 109., 115., 268., 274., 158.,
       107.,  83., 103., 272.,  85., 280., 336., 281., 118., 317., 235.,
        60., 174., 259., 178., 128.,  96., 126., 28

In [8]:
# splitting the data into train and test
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

In [9]:
# define each model with its hyperparameters
models = [
    {'name': 'LinearRegression', 'model':LinearRegression()}, #using vanilla LR
    #using a wide range at first to figure out the best val
    {'name': 'RidgeRegression', 'model':Ridge(), 'params':{'alpha': [0.01, 0.1, 1, 10, 40]}}, # - alpha adjusts the intensity of the regularization
    {'name': 'LassoRegression', 'model':Lasso(), 'params':{'alpha': [0.01, 0.1, 1, 10]}}

]
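
To give a feel for what `alpha` does before running the grid search, here is a minimal sketch for the simplest possible case: one feature and no intercept, where the Ridge and Lasso solutions have closed forms. The function names `soft_threshold` and `fit_one_feature` and the toy data are assumptions made for illustration; the two objectives in the comments follow the forms documented for scikit-learn's `Ridge` and `Lasso`.

```python
def soft_threshold(z, t):
    """Shrink z toward zero by t; return 0.0 when |z| <= t."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def fit_one_feature(x, y, alpha):
    """Closed-form OLS, ridge and lasso slopes for one feature, no intercept."""
    n = len(x)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    ols = sxy / sxx
    # Ridge minimizes sum((y - x*w)^2) + alpha * w^2
    ridge = sxy / (sxx + alpha)
    # Lasso minimizes sum((y - x*w)^2) / (2n) + alpha * |w|
    lasso = soft_threshold(sxy / n, alpha) / (sxx / n)
    return ols, ridge, lasso

# Toy data (made up): y is roughly equal to x
x = [-2.0, -1.0, 0.0, 1.0, 2.0]
y = [-2.1, -0.9, 0.1, 1.1, 1.9]

for alpha in [0.01, 0.1, 1, 10]:
    ols, ridge, lasso = fit_one_feature(x, y, alpha)
    print(f"alpha={alpha:<5} ols={ols:.3f} ridge={ridge:.3f} lasso={lasso:.3f}")
```

In this toy setting, Ridge shrinks the slope smoothly as `alpha` grows, while Lasso shrinks it and eventually sets it to exactly zero.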

In [10]:
for model_info in models:
    print(model_info)

{'name': 'LinearRegression', 'model': LinearRegression()}
{'name': 'RidgeRegression', 'model': Ridge(), 'params': {'alpha': [0.01, 0.1, 1, 10, 40]}}
{'name': 'LassoRegression', 'model': Lasso(), 'params': {'alpha': [0.01, 0.1, 1, 10]}}


In [14]:
# perform a GridSearchCV with a for loop to test all the models
for model_info in models:
    print(f"--\nTraining: {model_info['name']}:")
    GS = GridSearchCV(model_info['model'], model_info.get('params', {}), cv=5)
    GS.fit(X_train, y_train)

    #evaluate the best params
    best_model_par = GS.best_estimator_
    y_pred = best_model_par.predict(X_test)
    mse = mean_squared_error(y_test, y_pred)
    r2_sc = r2_score(y_test, y_pred)

    print(f"Best parameters for model {model_info['name']}: {GS.best_params_}")
    print(f'Mean Squared Error: {mse:.4f}')
    print(f'R-Squared: {r2_sc:.4f}')

--
Training: LinearRegression:
Best parameters for model LinearRegression: {}
Mean Squared Error: 2900.1936
R-Squared: 0.4526
--
Training: RidgeRegression:
Best parameters for model RidgeRegression: {'alpha': 0.1}
Mean Squared Error: 2856.4869
R-Squared: 0.4609
--
Training: LassoRegression:
Best parameters for model LassoRegression: {'alpha': 0.1}
Mean Squared Error: 2798.1935
R-Squared: 0.4719


- MSE: Mean Squared Error is the average of the squared differences between the actual and predicted target values; lower is better
- R-Squared measures the proportion of the variance in the target variable that the model explains. Its maximum is 1, and the closer we are to 1 the better our model fits the data; on held-out data it can drop below 0 when the model does worse than always predicting the mean

Lasso is the best-performing model on this test split because it has the lowest MSE and the highest R-Squared
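
To make the two metrics above concrete, the following sketch computes them by hand on made-up numbers. The function names `mse` and `r2` and the sample values are illustrative assumptions; in the notebook itself the values come from `mean_squared_error` and `r2_score`.

```python
def mse(y_true, y_pred):
    """Average of squared differences between actual and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def r2(y_true, y_pred):
    """1 - (residual sum of squares) / (total sum of squares around the mean)."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

y_true = [3.0, 5.0, 7.0, 9.0]      # made-up targets
close_pred = [2.5, 5.5, 6.5, 9.5]  # predictions near the targets
poor_pred = [9.0, 7.0, 5.0, 3.0]   # predictions in reversed order

print("close:", mse(y_true, close_pred), r2(y_true, close_pred))
print("poor: ", mse(y_true, poor_pred), r2(y_true, poor_pred))
```

The second set of predictions is worse than simply predicting the mean of `y_true`, so its R-Squared comes out negative.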

In [15]:
# redefine the models with a narrower alpha grid
models = [
    {'name': 'LinearRegression', 'model':LinearRegression()}, #using vanilla LR
    #narrowing the range around the best value found in the first search (alpha=0.1)
    {'name': 'RidgeRegression', 'model':Ridge(), 'params':{'alpha': [0.01, 0.1, 0.2, 0.3, 0.5]}}, # - alpha adjusts the intensity of the regularization
    {'name': 'LassoRegression', 'model':Lasso(), 'params':{'alpha': [0.01, 0.1, 0.2, 0.3, 0.5]}}

]

In [16]:
# perform a GridSearchCV with a for loop to test all the models
for model_info in models:
    print(f"--\nTraining: {model_info['name']}:")
    GS = GridSearchCV(model_info['model'], model_info.get('params', {}), cv=5)
    GS.fit(X_train, y_train)

    #evaluate the best params
    best_model_par = GS.best_estimator_
    y_pred = best_model_par.predict(X_test)
    mse = mean_squared_error(y_test, y_pred)
    r2_sc = r2_score(y_test, y_pred)

    print(f"Best parameters for model {model_info['name']}: {GS.best_params_}")
    print(f'Mean Squared Error: {mse:.4f}')
    print(f'R-Squared: {r2_sc:.4f}')

--
Training: LinearRegression:
Best parameters for model LinearRegression: {}
Mean Squared Error: 2900.1936
R-Squared: 0.4526
--
Training: RidgeRegression:
Best parameters for model RidgeRegression: {'alpha': 0.1}
Mean Squared Error: 2856.4869
R-Squared: 0.4609
--
Training: LassoRegression:
Best parameters for model LassoRegression: {'alpha': 0.1}
Mean Squared Error: 2798.1935
R-Squared: 0.4719
