## Regularization with Regression

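- It is a technique in ML to prevent or reduce overfitting, and so improve the generalization performance of the model
- It adds a penalty term to the model's loss function, not to the prediction formula itself. E.g. Linear Regression predicts $y=B_0 + B_1X_1 + \epsilon$ (where $\epsilon$ is the error term); with regularization, training minimizes the squared error plus a penalty on the size of the coefficients $B_j$, scaled by a regularization strength (`alpha` in scikit-learn)
- No single regularization strength works for every model. Therefore, it's part of the hyperparameter tuning process.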
- Types of Regularization:
    - L1 Regularization (Lasso):
        - Technique: it applies a penalty on the absolute values of the coefficients
        - Geometrically, the penalty confines the coefficients to a region with sharp corners (a diamond in two dimensions), and the best fit tends to land on a corner. That is why it can shrink some coefficients to exactly 0
        - Therefore, it may be useful for reducing the impact of irrelevant features (noise)
    - L2 Regularization (Ridge):
        - Technique: it applies a penalty on the squared values of the coefficients
        - It prevents the coefficients from growing too large (reduces their impact)
        - Unlike L1, it does not typically drive coefficients to exactly 0. It keeps them close to 0 if they need to be reduced
    - ElasticNet:
        - Technique: it adds both the L1 and L2 penalties, combined as a weighted average
        - Because we can control the level of both L1 and L2, it's recommended when you have multiple highly correlated features

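The three penalties above can be written out as a rough sketch. The symbols $\lambda$ (overall regularization strength) and $\rho$ (the L1 share of the mix) are notation assumed here, not taken from the notes; libraries differ in constant scaling factors, and the intercept $B_0$ is normally left unpenalized.

$$
\begin{aligned}
\text{Lasso (L1):}\quad & \text{Loss} = \text{MSE} + \lambda \sum_{j \ge 1} |B_j| \\
\text{Ridge (L2):}\quad & \text{Loss} = \text{MSE} + \lambda \sum_{j \ge 1} B_j^2 \\
\text{ElasticNet:}\quad & \text{Loss} = \text{MSE} + \lambda \Big( \rho \sum_{j \ge 1} |B_j| + (1-\rho) \sum_{j \ge 1} B_j^2 \Big)
\end{aligned}
$$

With $\rho = 1$ ElasticNet reduces to Lasso, and with $\rho = 0$ to Ridge.
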
- Applying Regularization:
    - To apply it to Linear Regression, swap `LinearRegression()` for a regularized estimator such as `Lasso()`, `Ridge()` or `ElasticNet()`
    - Other algorithms have regularization embedded in their hyperparameters
        - E.g. `LogisticRegression(penalty=...)`, where `penalty` is one of `'l1'`, `'l2'` or `'elasticnet'`

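As a minimal sketch of the difference between the two penalties, the snippet below fits `Lasso` and `Ridge` on synthetic data where only the first two of five features matter. The data, the seed, and the `alpha` values are assumptions chosen for illustration; they are not part of the insurance example.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

# Synthetic data (assumption): 5 features, only the first two drive y
rng = np.random.default_rng(0)
X_toy = rng.normal(size=(200, 5))
y_toy = 3.0 * X_toy[:, 0] - 2.0 * X_toy[:, 1] + rng.normal(scale=0.5, size=200)

# Same data, two different penalties
lasso = Lasso(alpha=0.5).fit(X_toy, y_toy)
ridge = Ridge(alpha=1.0).fit(X_toy, y_toy)

print('Lasso coefficients:', np.round(lasso.coef_, 3))
print('Ridge coefficients:', np.round(ridge.coef_, 3))
```

In this setup Lasso sets the three irrelevant coefficients to exactly 0, while Ridge leaves them small but non-zero.
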
### Automating Multiple Regression Models with Hyperparameter Tuning (Apply Regularization)

In [14]:
import numpy as np
import pandas as pd

from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.linear_model import LinearRegression, Lasso, Ridge, ElasticNet

from sklearn.metrics import mean_squared_error, r2_score


In [15]:
df = pd.read_csv('/home/vinayakgaur07/Downloads/insurance.csv')
df.head()

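|   | age | sex | bmi | children | smoker | region | expenses |
|---|---|---|---|---|---|---|---|
| 0 | 19 | female | 27.9 | 0 | yes | southwest | 16884.92 |
| 1 | 18 | male | 33.8 | 1 | no | southeast | 1725.55 |
| 2 | 28 | male | 33.0 | 3 | no | southeast | 4449.46 |
| 3 | 33 | male | 22.7 | 0 | no | northwest | 21984.47 |
| 4 | 32 | male | 28.9 | 0 | no | northwest | 3866.86 |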


`expenses` is the target: the amount of medical expenses.
- Objective: predict medical expenses

General Checks of Data

In [16]:
df.isna().sum()

age         0
sex         0
bmi         0
children    0
smoker      0
region      0
expenses    0
dtype: int64

The categorical columns must be converted to numerical ones, so we encode them:
- sex and smoker --> binary, i.e. label encoding
- region --> one-hot encoding (`pd.get_dummies`)

In [17]:
for col in ['sex', 'smoker', 'region']:
    print(col, ':', df[col].unique())

sex : ['female' 'male']
smoker : ['yes' 'no']
region : ['southwest' 'southeast' 'northwest' 'northeast']


In [18]:
df_org = df.copy()

In [19]:
df = pd.get_dummies(data=df, columns=['region'], dtype=int)
df.head()

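|   | age | sex | bmi | children | smoker | expenses | region_northeast | region_northwest | region_southeast | region_southwest |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 19 | female | 27.9 | 0 | yes | 16884.92 | 0 | 0 | 0 | 1 |
| 1 | 18 | male | 33.8 | 1 | no | 1725.55 | 0 | 0 | 1 | 0 |
| 2 | 28 | male | 33.0 | 3 | no | 4449.46 | 0 | 0 | 1 | 0 |
| 3 | 33 | male | 22.7 | 0 | no | 21984.47 | 0 | 1 | 0 | 0 |
| 4 | 32 | male | 28.9 | 0 | no | 3866.86 | 0 | 1 | 0 | 0 |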


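Now label-encode the binary columns. `pd.factorize` assigns codes in order of first appearance, so here female = 0 and male = 1, while smoker 'yes' = 0 and 'no' = 1: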

In [20]:
df['sex'] = pd.factorize(df['sex'])[0]
df['smoker'] = pd.factorize(df['smoker'])[0]
df.head()

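|   | age | sex | bmi | children | smoker | expenses | region_northeast | region_northwest | region_southeast | region_southwest |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 19 | 0 | 27.9 | 0 | 0 | 16884.92 | 0 | 0 | 0 | 1 |
| 1 | 18 | 1 | 33.8 | 1 | 1 | 1725.55 | 0 | 0 | 1 | 0 |
| 2 | 28 | 1 | 33.0 | 3 | 1 | 4449.46 | 0 | 0 | 1 | 0 |
| 3 | 33 | 1 | 22.7 | 0 | 1 | 21984.47 | 0 | 1 | 0 | 0 |
| 4 | 32 | 1 | 28.9 | 0 | 1 | 3866.86 | 0 | 1 | 0 | 0 |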

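Because `pd.factorize` numbers categories by order of first appearance, the codes depend on which row comes first. One possible alternative, sketched here on a small hypothetical frame (`toy` and the `smoker_mapped` column are illustrative names, not part of the notebook), is an explicit mapping that stays the same regardless of row order:

```python
import pandas as pd

# Hypothetical toy column, only to illustrate the two encodings
toy = pd.DataFrame({'smoker': ['yes', 'no', 'no', 'yes']})

# factorize: 'yes' appears first, so 'yes' -> 0 and 'no' -> 1
codes, uniques = pd.factorize(toy['smoker'])
print(list(codes))    # [0, 1, 1, 0]
print(list(uniques))  # ['yes', 'no']

# explicit mapping: the meaning of each code is fixed by the dictionary
toy['smoker_mapped'] = toy['smoker'].map({'no': 0, 'yes': 1})
print(toy['smoker_mapped'].tolist())  # [1, 0, 0, 1]
```

Either encoding works for a linear model with a binary column; only the sign of the fitted coefficient changes.
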

In [29]:
# Features: every column except the target (columns= already implies axis=1)
X = df.drop(columns='expenses')
y = df['expenses']

#### Define the Models and their Hyperparameters

Structure: list of dictionaries, each dictionary = {model name, model instance, model parameters (optional)}

In [21]:
# Hyperparameter grid setup: several regression models and,
# for some of them, multiple hyperparameter combinations to try.
models = [
    {'name': 'Linear Regression', 'model': LinearRegression()},
    {'name': 'Ridge Regression', 'model': Ridge(), 'params': {'alpha': [0.01, 0.1, 1, 10]}},
    {'name': 'Lasso Regression', 'model': Lasso(), 'params': {'alpha': [0.01, 0.1, 1, 10]}},
    {'name': 'ElasticNet Regression', 'model': ElasticNet(), 'params': {'alpha': [0.01, 0.1, 1, 10], 'l1_ratio': [0.2, 0.3, 0.6]}}
]

In [22]:
for model_info in models:
        print(model_info)

{'name': 'Linear Regression', 'model': LinearRegression()}
{'name': 'Ridge Regression', 'model': Ridge(), 'params': {'alpha': [0.01, 0.1, 1, 10]}}
{'name': 'Lasso Regression ', 'model': Lasso(), 'params': {'alpha': [0.01, 0.1, 1, 10]}}
{'name': 'ElasticNet Regression', 'model': ElasticNet(), 'params': {'alpha': [0.01, 0.1, 1, 10], 'l1_ratio': [0.2, 0.3, 0.6]}}


There are multiple ways to run this hyperparameter grid:
- Method 1: quick and easy
    - No train/test pre-split required
    - Choose a specific metric for evaluation
    - Let `GridSearchCV` run the search on the full data, scoring each candidate by cross-validation on that metric
    - Compare the results of each model's `best_estimator_` (`best_score_`, `best_params_`)

- Method 2: useful for double evaluation
    - Train/test pre-split required
    - Choose a specific metric for evaluation
    - Let `GridSearchCV` run the search on the train data only
    - Decide which model is the best based on 2 metrics computed on the test data:
        - MSE
        - R-squared

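The cell that follows uses Method 1. For comparison, Method 2 might look roughly like the sketch below. It is self-contained and runs on synthetic data from `make_regression` as a stand-in (an assumption, so the numbers it prints say nothing about the insurance data); `candidates`, `X_demo`, `y_demo` and `results` are illustrative names.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNet, Lasso, LinearRegression, Ridge
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in data (assumption); swap in the real X and y to use it on the notebook's data
X_demo, y_demo = make_regression(n_samples=300, n_features=8, noise=10.0, random_state=42)

# Pre-split: the test set is held out and never seen by GridSearchCV
X_train, X_test, y_train, y_test = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)

candidates = [
    {'name': 'Linear Regression', 'model': LinearRegression()},
    {'name': 'Ridge Regression', 'model': Ridge(), 'params': {'alpha': [0.01, 0.1, 1, 10]}},
    {'name': 'Lasso Regression', 'model': Lasso(), 'params': {'alpha': [0.01, 0.1, 1, 10]}},
    {'name': 'ElasticNet Regression', 'model': ElasticNet(),
     'params': {'alpha': [0.01, 0.1, 1, 10], 'l1_ratio': [0.2, 0.3, 0.6]}},
]

results = []
for info in candidates:
    # Tune on the training data only, using 5-fold cross-validation
    search = GridSearchCV(info['model'], info.get('params', {}),
                          cv=5, scoring='neg_mean_squared_error')
    search.fit(X_train, y_train)

    # Second evaluation: score the tuned model on the held-out test data
    y_pred = search.best_estimator_.predict(X_test)
    results.append({
        'name': info['name'],
        'best_params': search.best_params_,
        'test_mse': mean_squared_error(y_test, y_pred),
        'test_r2': r2_score(y_test, y_pred),
    })

for row in results:
    print(row)
```

The held-out test metrics give a check that is independent of the cross-validation score used to pick the hyperparameters.
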
In [33]:
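for model_info in models:
    print('Running GridSearchCV for:', model_info['name'])
    # Models without a 'params' entry (plain Linear Regression) get an empty grid,
    # so only the default estimator is cross-validated.
    model_gridsearch = GridSearchCV(model_info['model'],
                                    model_info.get('params', {}),
                                    cv=5,
                                    scoring='neg_mean_squared_error')

    # Method 1: fit on the full data; the score comes from 5-fold cross-validation
    model_gridsearch.fit(X, y)

    print('Best Parameters:', model_gridsearch.best_params_)
    print('Best Score:', model_gridsearch.best_score_)

    print('--------')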


Running GridSearchCV for: Linear Regression
Best Parameters: {}
Best Score: -36910755.016761966
--------
Running GridSearchCV for: Ridge Regression
Best Parameters: {'alpha': 0.1}
Best Score: -36910590.29392947
--------
Running GridSearchCV for: Lasso Regression 
Best Parameters: {'alpha': 10}
Best Score: -36901248.74691481
--------
Running GridSearchCV for: ElasticNet Regression


Best Parameters: {'alpha': 0.01, 'l1_ratio': 0.6}
Best Score: -36957387.9893402
--------


`neg_mean_squared_error` is the MSE multiplied by -1, so that a higher score is better: the closer the score is to 0, the better
- E.g. -5 is better than -100

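Negative MSE values are hard to read because they are in squared units. As a small sketch, the scores printed above can be converted to RMSE, which is in the same units as `expenses`. The dictionary below simply copies the reported scores; `cv_scores`, `rmse` and `best` are names chosen for this example.

```python
import math

# Best cross-validated scores copied from the GridSearchCV output above
cv_scores = {
    'Linear Regression': -36910755.016761966,
    'Ridge Regression': -36910590.29392947,
    'Lasso Regression': -36901248.74691481,
    'ElasticNet Regression': -36957387.9893402,
}

# RMSE = sqrt(MSE) = sqrt(-neg_mean_squared_error)
rmse = {name: math.sqrt(-score) for name, score in cv_scores.items()}

# Lowest RMSE corresponds to the score closest to 0
best = min(rmse, key=rmse.get)

for name, value in rmse.items():
    print(f'{name}: RMSE = {value:,.1f}')
print('Lowest RMSE:', best)
```

All four models land at an RMSE of roughly 6,075 to 6,080, so the differences between them are only a few units of `expenses`.
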
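> Therefore, **Lasso** (with `alpha=10`) is the best performer here, and it achieves this while applying regularization. The margin is small, though: its score is less than 0.1% better than that of plain Linear Regression.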