<img src="http://imgur.com/1ZcRyrc.png" style="float: left; margin: 20px; height: 55px">

# Grid Search Lab

---


Now we want to use grid search on the wine dataset to tune the regularisation strength of a regularised linear regression model.
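Conceptually, a grid search is a loop. For every candidate value of a hyperparameter, it estimates the model's out-of-sample score with k-fold cross-validation, then keeps the value with the best mean score. The sketch below writes that loop out for a Lasso model and checks it against `GridSearchCV`. So that it runs on its own, it assumes synthetic data from `make_regression` as a stand-in for the wine features. The alpha grid is arbitrary.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score

# Synthetic stand-in for the wine data (assumption): 12 features, continuous target.
X_demo, y_demo = make_regression(n_samples=1000, n_features=12, n_informative=8,
                                 noise=10.0, random_state=0)

alphas = np.logspace(-3, 1, 20)  # candidate regularisation strengths
cv = KFold(n_splits=5)           # identical folds for both searches

# Manual grid search: mean cross-validated R^2 (the regressor's default score) per alpha.
manual_scores = []
for alpha in alphas:
    scores = cross_val_score(Lasso(alpha=alpha, max_iter=10000), X_demo, y_demo, cv=cv)
    manual_scores.append(scores.mean())
manual_best = alphas[int(np.argmax(manual_scores))]

# The same search with GridSearchCV, which also refits the best model on all the data.
gs = GridSearchCV(Lasso(max_iter=10000), {'alpha': alphas}, cv=cv)
gs.fit(X_demo, y_demo)

print(manual_best, gs.best_params_['alpha'])
```

Both searches use the same folds and the same scorer, so they should pick the same alpha. `GridSearchCV` additionally stores all scores in `cv_results_` and refits the winning model as `best_estimator_`.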

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

plt.style.use('fivethirtyeight')

%matplotlib inline
%config InlineBackend.figure_format = 'retina'

from sklearn.model_selection import GridSearchCV
import patsy
```

### Load the wine dataset

```python
# Load dataset
df = pd.read_csv("winequality_merged.csv")
df.head()
```

| | fixed acidity | volatile acidity | citric acid | residual sugar | chlorides | free sulfur dioxide | total sulfur dioxide | density | pH | sulphates | alcohol | quality | red_wine |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 7.4 | 0.7 | 0.0 | 1.9 | 0.076 | 11.0 | 34.0 | 0.9978 | 3.51 | 0.56 | 9.4 | 5 | 1 |
| 1 | 7.8 | 0.88 | 0.0 | 2.6 | 0.098 | 25.0 | 67.0 | 0.9968 | 3.2 | 0.68 | 9.8 | 5 | 1 |
| 2 | 7.8 | 0.76 | 0.04 | 2.3 | 0.092 | 15.0 | 54.0 | 0.997 | 3.26 | 0.65 | 9.8 | 5 | 1 |
| 3 | 11.2 | 0.28 | 0.56 | 1.9 | 0.075 | 17.0 | 60.0 | 0.998 | 3.16 | 0.58 | 9.8 | 6 | 1 |
| 4 | 7.4 | 0.7 | 0.0 | 1.9 | 0.076 | 11.0 | 34.0 | 0.9978 | 3.51 | 0.56 | 9.4 | 5 | 1 |


### Clean the column names by replacing spaces with underscores

```python
# Underscored names work as attribute names (df.fixed_acidity) and in patsy formulas
df.columns = [col.replace(" ", "_") for col in df.columns]
df.columns
```

```
Index(['fixed_acidity', 'volatile_acidity', 'citric_acid', 'residual_sugar',
       'chlorides', 'free_sulfur_dioxide', 'total_sulfur_dioxide', 'density',
       'pH', 'sulphates', 'alcohol', 'quality', 'red_wine'],
      dtype='object')
```

```python
df['quality'].value_counts()
```

```
6    2836
5    2138
7    1079
4     216
8     193
3      30
9       5
Name: quality, dtype: int64
```

### Create the feature matrix and target (X and y)

```python
df['quality']
```

```
0       5
1       5
2       5
3       6
4       5
5       5
6       5
7       7
8       7
9       5
10      5
11      5
12      5
13      5
14      5
15      5
16      7
17      5
18      4
19      6
20      6
21      5
22      5
23      5
24      6
25      5
26      5
27      5
28      5
29      6
       ..
6467    6
6468    6
6469    7
6470    6
6471    5
6472    6
6473    6
6474    6
6475    7
6476    5
6477    4
6478    6
6479    6
6480    6
6481    5
6482    6
6483    5
6484    6
6485    7
6486    7
6487    5
6488    6
6489    6
6490    6
6491    5
6492    6
6493    5
6494    6
6495    7
6496    6
Name: quality, Length: 6497, dtype: int64
```

```python
features = [x for x in df if x != 'quality']
X = df[features]
y = df['quality']
print(X.shape)
print(y.shape)
```

```
(6497, 12)
(6497,)
```


### Use `StandardScaler` to standardise the feature matrix

```python
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()

# fit_transform returns a NumPy array; wrap it in a DataFrame to keep the index and column names
Xss = scaler.fit_transform(X)
Xss = pd.DataFrame(Xss, index=X.index, columns=X.columns)
Xss.head()
```

| | fixed_acidity | volatile_acidity | citric_acid | residual_sugar | chlorides | free_sulfur_dioxide | total_sulfur_dioxide | density | pH | sulphates | alcohol | red_wine |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.142473 | 2.188833 | -2.192833 | -0.744778 | 0.569958 | -1.10014 | -1.446359 | 1.034993 | 1.81309 | 0.193097 | -0.915464 | 1.75019 |
| 1 | 0.451036 | 3.282235 | -2.192833 | -0.59764 | 1.197975 | -0.31132 | -0.862469 | 0.701486 | -0.115073 | 0.999579 | -0.580068 | 1.75019 |
| 2 | 0.451036 | 2.5533 | -1.917553 | -0.660699 | 1.026697 | -0.874763 | -1.092486 | 0.768188 | 0.25812 | 0.797958 | -0.580068 | 1.75019 |
| 3 | 3.073817 | -0.362438 | 1.661085 | -0.744778 | 0.541412 | -0.762074 | -0.986324 | 1.101694 | -0.363868 | 0.32751 | -0.580068 | 1.75019 |
| 4 | 0.142473 | 2.188833 | -2.192833 | -0.744778 | 0.569958 | -1.10014 | -1.446359 | 1.034993 | 1.81309 | 0.193097 | -0.915464 | 1.75019 |
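`StandardScaler` turns each column into z-scores. It subtracts the column mean and divides by the column's population standard deviation (`ddof=0`), which differs from the sample standard deviation that pandas' `.std()` uses by default. The small check below uses a hypothetical four-row frame rather than the wine data. Standardising matters here because the L1/L2 penalties act on the coefficient sizes, and those sizes depend on each feature's units.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical four-row frame, not the wine data.
X_demo = pd.DataFrame({'alcohol': [9.4, 9.8, 9.8, 12.0],
                       'pH': [3.51, 3.20, 3.26, 3.16]})

Xss_demo = StandardScaler().fit_transform(X_demo)

# Manual z-scores with the population standard deviation (ddof=0).
manual = (X_demo - X_demo.mean()) / X_demo.std(ddof=0)

print(np.allclose(Xss_demo, manual.to_numpy()))  # True
print(Xss_demo.mean(axis=0), Xss_demo.std(axis=0))
```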


### Set up search parameters for a grid search over the regularisation strength alpha

Hint: look up `np.linspace` and `np.logspace` for efficient ways to define candidate values for alpha.
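A minimal sketch of both options. A log-spaced grid covers several orders of magnitude with few points, which suits alpha because its useful range is not known in advance. A linear grid is handy for a finer second pass once a promising range is known. The specific ranges below are arbitrary examples, not recommended values for the wine data.

```python
import numpy as np

# 50 values evenly spaced in the exponent, from 10**-4 to 10**1.
lasso_alphas = np.logspace(-4, 1, 50)

# 20 evenly spaced values for a finer search over a narrower,
# hypothetical range (0.001 to 0.05).
fine_alphas = np.linspace(0.001, 0.05, 20)

# GridSearchCV takes a dict mapping the estimator's parameter name to candidate values.
lasso_params = {'alpha': lasso_alphas}

print(lasso_alphas[:5])
print(fine_alphas[:5])
```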

### Perform a grid search using Lasso regularisation

```python
from sklearn.linear_model import Lasso
```
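One way to run the search, sketched on synthetic stand-in data (an assumption, so the block is self-contained). In the lab you would pass `Xss` and `y` instead. Each alpha is scored by 5-fold cross-validated R^2. Raising `max_iter` gives coordinate descent more room to converge for the smallest alphas.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.model_selection import GridSearchCV

# Stand-in data (assumption); replace with Xss, y from the cells above.
X_demo, y_demo = make_regression(n_samples=1000, n_features=12, n_informative=8,
                                 noise=10.0, random_state=0)

lasso_params = {'alpha': np.logspace(-4, 1, 30)}

# 5-fold CV for every alpha; train scores are kept for the plotting step later.
lasso_gs = GridSearchCV(Lasso(max_iter=10000), lasso_params, cv=5,
                        scoring='r2', return_train_score=True)
lasso_gs.fit(X_demo, y_demo)

print('best alpha:', lasso_gs.best_params_['alpha'])
print('best mean CV R^2:', lasso_gs.best_score_)
```

Because `Xss` is scaled on the full dataset before the search, the scaling statistics include each validation fold. Wrapping both steps as `make_pipeline(StandardScaler(), Lasso())` from `sklearn.pipeline`, and searching over `lasso__alpha`, refits the scaler inside every fold and avoids that leakage.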

### Obtain performance metrics and choose an optimal value for alpha
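A sketch of reading the search results, again on synthetic stand-in data. `cv_results_` holds one row per alpha with the mean and standard deviation of the cross-validated score, while `best_params_` and `best_score_` identify the winner. The CV score was used to *choose* alpha, so it is optimistic. The chosen model is therefore scored once on a held-out split that the search never saw. The 75/25 split and the R^2/RMSE metrics are choices made for this example.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import GridSearchCV, train_test_split

# Stand-in data (assumption).
X_demo, y_demo = make_regression(n_samples=1000, n_features=12, n_informative=8,
                                 noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X_demo, y_demo,
                                                    test_size=0.25, random_state=0)

alphas = np.logspace(-4, 1, 30)
gs = GridSearchCV(Lasso(max_iter=10000), {'alpha': alphas}, cv=5, scoring='r2')
gs.fit(X_train, y_train)

# One row per alpha: mean and spread of the cross-validated R^2, plus its rank.
results = pd.DataFrame(gs.cv_results_)[
    ['param_alpha', 'mean_test_score', 'std_test_score', 'rank_test_score']]
print(results.sort_values('rank_test_score').head())

best_alpha = gs.best_params_['alpha']
print('best alpha:', best_alpha, 'mean CV R^2:', gs.best_score_)

# best_estimator_ was refit on all of X_train (refit=True is the default).
y_pred = gs.best_estimator_.predict(X_test)
print('test R^2:', r2_score(y_test, y_pred))
print('test RMSE:', np.sqrt(mean_squared_error(y_test, y_pred)))
```

If several alphas score within one standard deviation of the best, picking the largest of them gives a sparser model at little cost in fit.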

### Plot the model scores obtained for the different alphas
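One possible plot, assuming the search was fitted with `return_train_score=True`. It shows mean train and mean cross-validated R^2 against alpha on a log axis, with a ±1 standard deviation band around the CV score and a dashed line at the selected alpha. Synthetic data stands in for the wine features.

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.model_selection import GridSearchCV

# Stand-in data (assumption).
X_demo, y_demo = make_regression(n_samples=1000, n_features=12, n_informative=8,
                                 noise=10.0, random_state=0)

alphas = np.logspace(-4, 1, 30)
gs = GridSearchCV(Lasso(max_iter=10000), {'alpha': alphas}, cv=5,
                  return_train_score=True)
gs.fit(X_demo, y_demo)

# With a single parameter, cv_results_ rows follow the order of `alphas`.
mean_test = gs.cv_results_['mean_test_score']
std_test = gs.cv_results_['std_test_score']
mean_train = gs.cv_results_['mean_train_score']

fig, ax = plt.subplots(figsize=(8, 5))
ax.plot(alphas, mean_train, label='mean train R^2')
ax.plot(alphas, mean_test, label='mean CV R^2')
ax.fill_between(alphas, mean_test - std_test, mean_test + std_test, alpha=0.2)
ax.axvline(gs.best_params_['alpha'], linestyle='--', color='grey', label='best alpha')
ax.set_xscale('log')
ax.set_xlabel('alpha')
ax.set_ylabel('R^2')
ax.legend()
```

Typically the train score falls steadily as alpha grows, while the CV score stays flat or rises until the penalty starts removing useful signal.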

### Fit a Lasso regression model on your features and target for all alpha values you used in your grid search and plot how the model coefficients change with alpha.
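A minimal coefficient-path sketch. It refits `Lasso` once per alpha, collects `coef_` into a DataFrame (rows are alphas, columns are features), and plots each column against alpha on a log scale. As alpha grows, Lasso tends to push coefficients exactly to zero. The feature names `x0` to `x11` and the synthetic data are placeholders. With the lab's data you would pass `Xss`, `y` and `Xss.columns`.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso

# Stand-in data and hypothetical feature names (assumptions).
X_demo, y_demo = make_regression(n_samples=1000, n_features=12, n_informative=8,
                                 noise=10.0, random_state=0)
feature_names = [f'x{i}' for i in range(X_demo.shape[1])]
alphas = np.logspace(-4, 1, 30)

# One fitted model per alpha; keep its coefficient vector.
coefs = []
for a in alphas:
    model = Lasso(alpha=a, max_iter=10000)
    model.fit(X_demo, y_demo)
    coefs.append(model.coef_)
coef_df = pd.DataFrame(coefs, index=alphas, columns=feature_names)

fig, ax = plt.subplots(figsize=(8, 5))
for name in coef_df.columns:
    ax.plot(coef_df.index, coef_df[name], label=name)
ax.axhline(0, color='black', linewidth=0.8)
ax.set_xscale('log')
ax.set_xlabel('alpha')
ax.set_ylabel('coefficient')
ax.legend(ncol=2, fontsize='small')

# Number of non-zero coefficients at each alpha.
print((coef_df != 0).sum(axis=1))
```

`sklearn.linear_model.lasso_path` computes the same path more efficiently. The explicit loop mirrors the exercise wording.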

### Bonus: Ridge regression

Do the same using ridge regression, adjusting the grid search parameters to an appropriate range.

Docs: 

http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.Ridge.html

http://scikit-learn.org/stable/modules/linear_model.html#ridge-regression

```python
from sklearn.linear_model import Ridge
```
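A sketch of the Ridge variant on synthetic stand-in data. In scikit-learn, the Lasso objective divides the squared error by `2 * n_samples`, while the Ridge objective does not. Useful Ridge alphas are therefore usually orders of magnitude larger, so the grid here is shifted upwards. The exact range is an assumption to adjust for the wine data. The last lines show that Ridge shrinks coefficients as alpha grows but does not set them exactly to zero.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

# Stand-in data (assumption).
X_demo, y_demo = make_regression(n_samples=1000, n_features=12, n_informative=8,
                                 noise=10.0, random_state=0)

# Hypothetical range, shifted up relative to the Lasso grid.
ridge_params = {'alpha': np.logspace(-2, 5, 30)}
ridge_gs = GridSearchCV(Ridge(), ridge_params, cv=5, scoring='r2')
ridge_gs.fit(X_demo, y_demo)
print('best alpha:', ridge_gs.best_params_['alpha'], 'CV R^2:', ridge_gs.best_score_)

# Coefficient path: the L2 norm shrinks with alpha, but no coefficient becomes exactly 0.
coefs = pd.DataFrame([Ridge(alpha=a).fit(X_demo, y_demo).coef_ for a in ridge_params['alpha']],
                     index=ridge_params['alpha'])
norms = np.linalg.norm(coefs.to_numpy(), axis=1)
print(norms[:3], norms[-3:])
```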

### Bonus: Elastic Net regression

Do the same using Elastic Net regression, adjusting the grid search parameters to an appropriate range.

Docs: 

http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.ElasticNet.html#sklearn.linear_model.ElasticNet

http://scikit-learn.org/stable/modules/linear_model.html#elastic-net

```python
from sklearn.linear_model import ElasticNet
```
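Elastic Net adds a second hyperparameter, `l1_ratio`, which mixes the two penalties. A value of 1.0 is pure Lasso, and values near 0 approach a pure L2 (ridge-style) penalty. The grid therefore becomes two-dimensional, and `GridSearchCV` tries every combination. The sketch below uses synthetic stand-in data and arbitrary candidate values. It reshapes the mean CV scores into an `l1_ratio` by alpha table, which could be passed to `sns.heatmap`.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNet
from sklearn.model_selection import GridSearchCV

# Stand-in data (assumption).
X_demo, y_demo = make_regression(n_samples=1000, n_features=12, n_informative=8,
                                 noise=10.0, random_state=0)

# 20 alphas x 6 mixing ratios = 120 candidate models (values are arbitrary examples).
enet_params = {
    'alpha': np.logspace(-4, 1, 20),
    'l1_ratio': [0.1, 0.3, 0.5, 0.7, 0.9, 1.0],
}
enet_gs = GridSearchCV(ElasticNet(max_iter=10000), enet_params, cv=5, scoring='r2')
enet_gs.fit(X_demo, y_demo)
print(enet_gs.best_params_, enet_gs.best_score_)

# Build the table from cv_results_['params'] so it does not depend on grid ordering.
res = pd.DataFrame({
    'alpha': [p['alpha'] for p in enet_gs.cv_results_['params']],
    'l1_ratio': [p['l1_ratio'] for p in enet_gs.cv_results_['params']],
    'mean_test_score': enet_gs.cv_results_['mean_test_score'],
})
score_table = res.pivot(index='l1_ratio', columns='alpha', values='mean_test_score')
print(score_table.round(3))
```

The number of fits grows multiplicatively with each added parameter (here 120 candidates x 5 folds), so coarse log grids are a sensible first pass.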