# Objective

To review the idea of regularization as a method to avoid overfitting.


## Preliminaries

In [None]:
import pandas as pd
import numpy as np

import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.model_selection import RandomizedSearchCV

from sklearn.linear_model import LogisticRegression

from sklearn.metrics import f1_score

# Regularization

In regularization, we constrain the parameters of a complex model with the explicit aim of minimizing the deviation between $E_{out}$ and $E_{in}$ (the training error).

The central idea of regularization is to come up with algorithm-specific measures that constrain the parameters and to add them to the training process.

## L2 regularization

In this form of regularization, we add a penalty on the square of the coefficients. So, in the case of regression, instead of minimizing:

$$(\sum_{i=0}^k w_ix_i - y)^2$$ 

we minimize:

$$(\sum_{i=0}^k w_ix_i - y)^2 \text{ subject to } \sum_{i=1}^kw_i^2 \leq C$$ 

So while the line is still being fit, the budget $C$ on the squared coefficients prevents it from fitting the training data too snugly. This constraint is also referred to as a penalty.
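The constrained problem above is commonly solved in an equivalent penalized form, where a multiplier weights the sum of squared weights; a larger multiplier corresponds to a smaller budget $C$. The following is a minimal sketch of that idea on synthetic data, using the closed-form ridge solution without an intercept. The data, the helper `ridge_fit`, and the multiplier values `lam` are illustrative assumptions and not part of this notebook.

```python
import numpy as np

# Synthetic regression data (assumed for illustration only)
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))
true_w = np.array([3.0, -2.0, 0.0, 0.0, 1.0])
y = X @ true_w + rng.normal(scale=0.5, size=50)


def ridge_fit(X, y, lam):
    """Closed-form minimizer of ||Xw - y||^2 + lam * ||w||^2 (no intercept)."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)


# A larger multiplier plays the role of a tighter budget on the weights
lams = [0.0, 1.0, 10.0, 100.0]
norms = [float(np.sum(ridge_fit(X, y, lam) ** 2)) for lam in lams]

for lam, norm in zip(lams, norms):
    print(f"lam={lam:>6}: sum of squared weights = {norm:.3f}")
```

The printed sum of squared weights should shrink as `lam` grows, which is the penalized counterpart of lowering the budget.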



## L1 regularization

In this form of regularization, we add a penalty on the absolute value of the coefficients. So, in the case of regression, instead of minimizing:

$$(\sum_{i=0}^k w_ix_i - y)^2$$ 

we minimize:

$$(\sum_{i=0}^k w_ix_i - y)^2 \text{ subject to } \sum_{i=1}^k|w_i|\leq C$$ 
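One way to see how the L1 penalty differs from L2 is the special case of an orthonormal design, where both penalized solutions can be written directly in terms of the unpenalized coefficients. The sketch below assumes that special case and the penalized objectives $\frac{1}{2}\|Xw - y\|^2 + \lambda\sum|w_i|$ and $\frac{1}{2}\|Xw - y\|^2 + \frac{\lambda}{2}\sum w_i^2$. The coefficient vector `w_ols`, the value of `lam`, and the helper names are hypothetical.

```python
import numpy as np


def soft_threshold(w, lam):
    """L1 solution under an orthonormal design: shrink by lam, clip at zero."""
    return np.sign(w) * np.maximum(np.abs(w) - lam, 0.0)


def ridge_shrink(w, lam):
    """L2 solution under an orthonormal design: rescale every weight."""
    return w / (1.0 + lam)


# Hypothetical unpenalized coefficients, some large and some small
w_ols = np.array([3.0, -0.4, 0.2, -1.5, 0.05])
lam = 0.5

w_l1 = soft_threshold(w_ols, lam)
w_l2 = ridge_shrink(w_ols, lam)

print("L1 weights:", w_l1)
print("L2 weights:", w_l2)
print("exact zeros under L1:", int(np.sum(w_l1 == 0)))
print("exact zeros under L2:", int(np.sum(w_l2 == 0)))
```

In this example L1 sets the three smallest coefficients exactly to zero, while L2 only scales every coefficient down, which is why L1 is often used when a sparse model is wanted.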



# Regularization for logistic regression 

`Scikit-learn` applies L2 regularization to logistic regression by [default](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html#sklearn.linear_model.LogisticRegression).
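Note that the `C` parameter of `LogisticRegression` is the inverse of the regularization strength, so smaller values of `C` regularize more strongly. The sketch below illustrates this on a small synthetic dataset from `make_classification`; the dataset and the chosen `C` values are assumptions for illustration, not the Fashion-MNIST data used in this notebook.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Small synthetic binary problem (assumed for illustration only)
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

coef_norms = {}
for C in [0.01, 1.0, 100.0]:
    # The default penalty is L2; only the strength changes here
    model = LogisticRegression(C=C, max_iter=1000)
    model.fit(X, y)
    coef_norms[C] = float(np.linalg.norm(model.coef_))
    print(f"C={C:>6}: L2 norm of coefficients = {coef_norms[C]:.3f}")
```

The coefficient norm should grow with `C`, since a larger `C` weakens the penalty.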

## Data

In [None]:
fashion_mnist_train = pd.read_csv("/content/drive/MyDrive/AI-ML/supervised-learning-revision/Day1/data/fashion-mnist_train.csv")

In [None]:
fashion_X, fashion_y = (fashion_mnist_train.drop('label', axis=1), 
                        fashion_mnist_train.label)

In [None]:
fashion_Xtrain, fashion_Xvalid, fashion_ytrain, fashion_yvalid = train_test_split(fashion_X,
                                                                                  fashion_y,
                                                                                  test_size=0.2,
                                                                                  random_state=20130810)

In [None]:
sc = StandardScaler()

fashion_scaledXtrain = sc.fit_transform(fashion_Xtrain)
fashion_scaledXvalid = sc.transform(fashion_Xvalid)

In [None]:
fashion_ytrain_trouser = (fashion_ytrain == 1)
fashion_yvalid_trouser = (fashion_yvalid == 1)

## Model 1

The default implementation already applies L2 regularization; the cell below spells out the default `penalty='l2'` and `C=1.0` explicitly.

In [None]:
learner_logistic = LogisticRegression(penalty='l2',
                                      C=1.0,
                                      solver='liblinear')

In [None]:
%%time
learner_logistic.fit(fashion_scaledXtrain, fashion_ytrain_trouser)

CPU times: user 40.3 s, sys: 1.08 s, total: 41.3 s
Wall time: 41.3 s


LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
                   intercept_scaling=1, l1_ratio=None, max_iter=100,
                   multi_class='auto', n_jobs=None, penalty='l2',
                   random_state=None, solver='liblinear', tol=0.0001, verbose=0,
                   warm_start=False)

$E_{in}$

In [None]:
learner_logistic.score(fashion_scaledXtrain,
                       fashion_ytrain_trouser)

0.9966875

$E_{val}$

In [None]:
learner_logistic.score(fashion_scaledXvalid,
                       fashion_yvalid_trouser)

0.99275

In [None]:
f1_score(fashion_yvalid_trouser,
         learner_logistic.predict(fashion_scaledXvalid))
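The F1 score is worth checking alongside accuracy because a one-versus-rest target such as "trouser or not" is imbalanced. The sketch below uses made-up labels with 10% positives (an assumption for illustration, not this notebook's data) and a trivial classifier that always predicts the negative class.

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score

# Made-up labels: 10 positives and 90 negatives
y_true = np.array([True] * 10 + [False] * 90)

# A trivial classifier that never predicts the positive class
y_pred = np.zeros(100, dtype=bool)

acc = accuracy_score(y_true, y_pred)
# zero_division=0 returns 0 without a warning when no positives are predicted
f1 = f1_score(y_true, y_pred, zero_division=0)

print(f"accuracy = {acc:.2f}, F1 = {f1:.2f}")
```

The accuracy looks high even though no positive example is ever found, while the F1 score exposes the failure.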

## Model 2

Let us now disable the penalty

In [None]:
# Note: scikit-learn 1.2+ expects penalty=None; the string 'none' is from older releases
learner_logistic = LogisticRegression(penalty='none',
                                      solver='saga')

In [None]:
%%time

learner_logistic.fit(fashion_scaledXtrain,
                     fashion_ytrain_trouser)

CPU times: user 41.9 s, sys: 39.2 ms, total: 41.9 s
Wall time: 41.9 s




LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
                   intercept_scaling=1, l1_ratio=None, max_iter=100,
                   multi_class='auto', n_jobs=None, penalty='none',
                   random_state=None, solver='saga', tol=0.0001, verbose=0,
                   warm_start=False)

In [None]:
# Retry with the default lbfgs solver and a larger iteration budget
learner_logistic = LogisticRegression(penalty='none',
                                      max_iter=1000)

In [None]:
%%time

learner_logistic.fit(fashion_scaledXtrain,
                     fashion_ytrain_trouser)

CPU times: user 1min 57s, sys: 7.6 s, total: 2min 4s
Wall time: 1min 4s


STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression


LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
                   intercept_scaling=1, l1_ratio=None, max_iter=1000,
                   multi_class='auto', n_jobs=None, penalty='none',
                   random_state=None, solver='lbfgs', tol=0.0001, verbose=0,
                   warm_start=False)

## Model 3

Let us now apply L1 regularization

In [None]:
learner_logistic = LogisticRegression(penalty='l1',
                                      C=1.0,
                                      solver='liblinear')

In [None]:
%%time
learner_logistic.fit(fashion_scaledXtrain, fashion_ytrain_trouser)

CPU times: user 22.4 s, sys: 406 ms, total: 22.8 s
Wall time: 22.8 s


LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
                   intercept_scaling=1, l1_ratio=None, max_iter=100,
                   multi_class='auto', n_jobs=None, penalty='l1',
                   random_state=None, solver='liblinear', tol=0.0001, verbose=0,
                   warm_start=False)

$E_{in}$

In [None]:
learner_logistic.score(fashion_scaledXtrain,
                       fashion_ytrain_trouser)

0.9963333333333333

$E_{val}$

In [None]:
learner_logistic.score(fashion_scaledXvalid,
                       fashion_yvalid_trouser)

0.9938333333333333
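Besides the scores, a useful check on an L1-regularized model is how many coefficients were driven exactly to zero. The sketch below compares L1 and L2 on a synthetic dataset with many uninformative features; the dataset and the value `C=0.05` are assumptions chosen to make the effect visible, not settings from this notebook.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# 50 features, only 5 of which carry signal (assumed for illustration only)
X, y = make_classification(n_samples=300, n_features=50, n_informative=5,
                           n_redundant=0, random_state=0)

zero_counts = {}
for penalty in ['l1', 'l2']:
    model = LogisticRegression(penalty=penalty, C=0.05, solver='liblinear')
    model.fit(X, y)
    # Count coefficients that are exactly zero after fitting
    zero_counts[penalty] = int(np.sum(model.coef_ == 0))
    print(f"{penalty}: {zero_counts[penalty]} of {model.coef_.size} coefficients are zero")
```

The same count can be taken on the fitted Model 3 with `np.sum(learner_logistic.coef_ == 0)`.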

## Model 4

A better way to tune model hyperparameters is to use [cross-validation](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.RandomizedSearchCV.html), here through a randomized search over `C` and `penalty`.

In [None]:
parameters = {
    'C': [0.001, 0.1, 1, 10],
    'penalty': ['l1', 'l2']
}

In [None]:
learner_logistic = LogisticRegression(solver='liblinear')

In [None]:
learner_logistic_cv = RandomizedSearchCV(learner_logistic,
                                         parameters,
                                         n_iter=3)

In [None]:
%%time
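# Fit on the scaled training split and the binary trouser target, as in Models 1-3
learner_logistic_cv.fit(fashion_scaledXtrain, fashion_ytrain_trouser)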
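Once the search has been fitted, the selected hyperparameters and their cross-validated score can be read from the search object. The sketch below repeats the same search grid on a small synthetic dataset so that it runs quickly; the dataset and the settings `scoring='f1'`, `cv=3`, and `random_state=0` are assumptions added for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RandomizedSearchCV

# Small synthetic binary problem (assumed for illustration only)
X, y = make_classification(n_samples=300, n_features=20, random_state=0)

parameters = {
    'C': [0.001, 0.1, 1, 10],
    'penalty': ['l1', 'l2']
}

# Sample 3 of the 8 possible combinations and score each with 3-fold CV
search = RandomizedSearchCV(LogisticRegression(solver='liblinear'),
                            parameters,
                            n_iter=3,
                            scoring='f1',
                            cv=3,
                            random_state=0)
search.fit(X, y)

print("best parameters:", search.best_params_)
print("best mean CV F1:", search.best_score_)
```

By default the search also refits the best configuration on all the data passed to `fit`, so `search.best_estimator_` can be used for prediction directly.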