<img src="http://imgur.com/1ZcRyrc.png" style="float: left; margin: 20px; height: 55px">

# Project 2
## Part 4: Model tuning


In [1]:
# imports
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression, Lasso, LassoCV, RidgeCV, LogisticRegression
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split, cross_val_score, GridSearchCV
from sklearn.metrics import r2_score
from sklearn.impute import SimpleImputer
from sklearn.feature_selection import SelectFromModel
from IPython.display import Image

### Hyperparameter tuning for Lasso Regression Model

In [2]:
%store -r X_scaled
%store -r y_train
%store -r X_test
%store -r y_test

#### Iteration 1

In [3]:
# Experiment with the range of alphas passed to the Lasso model.
# The initial range is log-spaced (10**0 to 10**10) so that it covers as wide a span as possible.
# Later iterations narrow the range around the best alpha from the previous iteration,
# first with tighter log-spaced grids and finally with a linearly spaced grid, to pin down the optimal value.

lasso_params = {
    'alpha':list(np.logspace(0,10,100)),
    'max_iter':[50000]   
}
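To make the difference between the two grid types concrete, here is a small sketch that compares the log-spaced grid from this iteration with the linearly spaced window used in the final iteration. The variable names (`log_grid`, `lin_grid`, `ratio`, `step`) are illustrative and not part of the original notebook.

```python
import numpy as np

# Same grid as the first iteration: 100 points from 10**0 to 10**10.
log_grid = np.logspace(0, 10, 100)

# In a log-spaced grid, consecutive points differ by a constant factor, not a constant step.
ratio = log_grid[1] / log_grid[0]

# Linearly spaced grid over the narrow window used in the final iteration.
lin_grid = np.linspace(915, 925, 100)
step = lin_grid[1] - lin_grid[0]

print(f"logspace: min={log_grid[0]:.0f}, max={log_grid[-1]:.0e}, ratio={ratio:.4f}")
print(f"linspace: min={lin_grid[0]:.0f}, max={lin_grid[-1]:.0f}, step={step:.4f}")
```

A log-spaced grid is useful for locating the right order of magnitude; once the window is narrow, a linear grid gives evenly spaced candidates around the optimum.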

In [4]:
lasso_gridsearch = GridSearchCV(Lasso(), lasso_params,cv=3,verbose=1)

In [5]:
lasso_gridsearch.fit(X_scaled,y_train)

Fitting 3 folds for each of 100 candidates, totalling 300 fits


In [6]:
# Mean cross-validated R^2 (over the 3 folds) for the best alpha
lasso_gridsearch.best_score_

0.8429377462835544

In [7]:
# R^2 of the best model, refit on the full training set, evaluated on that same training set
lasso_gridsearch.score(X_scaled, y_train)

0.906607998552871

In [8]:
# R^2 of the same refit model on the held-out test set
lasso_gridsearch.score(X_test, y_test)

0.882160201476166

In [9]:
lasso_gridsearch.best_params_

{'alpha': 849.7534359086438, 'max_iter': 50000}
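Before narrowing the grid, it can help to look at how the cross-validated score varies around the best alpha rather than only at `best_params_`. The sketch below shows one way to do this with `cv_results_`. It runs on synthetic data from `make_regression` as a stand-in, because the stored `X_scaled` and `y_train` are not available here; the names `X_demo`, `y_demo`, `demo_search`, and `top5` are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in data (assumption: replace with X_scaled / y_train in the notebook).
X_demo, y_demo = make_regression(
    n_samples=200, n_features=20, n_informative=5, noise=10.0, random_state=0
)

demo_search = GridSearchCV(
    Lasso(max_iter=50000),
    {"alpha": list(np.logspace(-2, 2, 20))},
    cv=3,
)
demo_search.fit(X_demo, y_demo)

# cv_results_ holds one row per candidate alpha.
columns = ["param_alpha", "mean_test_score", "std_test_score", "rank_test_score"]
results = pd.DataFrame(demo_search.cv_results_)[columns]

# The top-ranked candidates indicate where the next, narrower grid should be centred.
top5 = results.sort_values("rank_test_score").head(5)
print(top5)
```

If the top-ranked alphas sit at the edge of the grid, the range should be extended in that direction before zooming in.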

#### Iteration 2

In [10]:
lasso_params = {
    'alpha':list(np.logspace(2,3,100)),
    'max_iter':[50000]   
}

In [11]:
lasso_gridsearch = GridSearchCV(Lasso(), lasso_params,cv=3,verbose=1)

In [12]:
lasso_gridsearch.fit(X_scaled,y_train)

Fitting 3 folds for each of 100 candidates, totalling 300 fits


In [13]:
lasso_gridsearch.best_score_

0.8435531004073238

In [14]:
lasso_gridsearch.score(X_scaled,y_train)

0.904789890330967

In [15]:
lasso_gridsearch.score(X_test,y_test)

0.8833601728530867

In [16]:
lasso_gridsearch.best_params_

{'alpha': 911.1627561154896, 'max_iter': 50000}

#### Iteration 3

In [17]:
lasso_params = {
    'alpha':list(np.logspace(2.8,3,100)),
    'max_iter':[50000]   
}

In [18]:
lasso_gridsearch = GridSearchCV(Lasso(), lasso_params,cv=3,verbose=1)

In [19]:
lasso_gridsearch.fit(X_scaled,y_train)

Fitting 3 folds for each of 100 candidates, totalling 300 fits


In [20]:
lasso_gridsearch.best_score_

0.8435985795785924

In [21]:
lasso_gridsearch.score(X_scaled,y_train)

0.9045427580024477

In [22]:
lasso_gridsearch.score(X_test,y_test)

0.883490388476907

In [23]:
lasso_gridsearch.best_params_

{'alpha': 919.6791985117054, 'max_iter': 50000}

#### Iteration 4

In [24]:
lasso_params = {
    'alpha':list(np.linspace(915,925,100)),
    'max_iter':[50000]   
}

In [25]:
lasso_gridsearch = GridSearchCV(Lasso(), lasso_params,cv=3,verbose=1)

In [26]:
lasso_gridsearch.fit(X_scaled,y_train)

Fitting 3 folds for each of 100 candidates, totalling 300 fits


In [27]:
lasso_gridsearch.best_score_

0.8435987663261715

In [28]:
lasso_gridsearch.score(X_scaled,y_train)

0.9045201456415962

In [29]:
lasso_gridsearch.score(X_test,y_test)

0.8835017703623235

In [30]:
lasso_gridsearch.best_params_

{'alpha': 920.4545454545455, 'max_iter': 50000}

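We have zoomed in to obtain a more precise estimate of the optimal alpha (about 920). The cross-validated R^2 changed only slightly across iterations (from 0.8429 to 0.8436). The R^2 on the training set (0.905) and on the test set (0.884) are close to each other, which suggests that the model both fits well and generalizes well.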
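The manual zoom-in over four iterations could also be automated. The following is only a sketch of that idea, not the procedure used above: `refine_alpha` is a hypothetical helper, the data is synthetic (`make_regression`), and the rule for narrowing the window (taking the grid neighbours on either side of the best point) is an assumption chosen for illustration.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.model_selection import GridSearchCV


def refine_alpha(X, y, low_exp, high_exp, rounds=3, n_points=15, cv=3):
    """Hypothetical helper: repeatedly narrow a log-spaced alpha grid around the best value."""
    history = []
    for _ in range(rounds):
        # Candidate alphas are 10**exps, evenly spaced in the exponent.
        exps = np.linspace(low_exp, high_exp, n_points)
        search = GridSearchCV(
            Lasso(max_iter=50000), {"alpha": list(10.0 ** exps)}, cv=cv
        )
        search.fit(X, y)
        best = search.best_index_
        history.append((search.best_params_["alpha"], search.best_score_))
        # Next window: the grid neighbours on either side of the best point.
        low_exp = exps[max(best - 1, 0)]
        high_exp = exps[min(best + 1, n_points - 1)]
    return history


# Synthetic stand-in data (assumption: replace with X_scaled / y_train in the notebook).
X_demo, y_demo = make_regression(
    n_samples=200, n_features=20, n_informative=5, noise=10.0, random_state=0
)

history = refine_alpha(X_demo, y_demo, low_exp=-2, high_exp=3)
for round_no, (alpha, score) in enumerate(history, start=1):
    print(f"round {round_no}: alpha={alpha:.4f}, mean CV R^2={score:.6f}")
```

Because each new window still contains the previous best alpha, the best cross-validated score should not decrease from one round to the next; when it stops improving noticeably, further refinement adds little.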

In [31]:
# The results of the grid search over alphas are summarized in the table below.
Image(url="../pictures/Alpha tuning.PNG", width=460, height=460)