<h1 style="font-size:42px; text-align:center; margin-bottom:30px;"><span style="color:SteelBlue">Module 5:</span> Model Training</h1>
<hr>

At last, it's time to build our models! 

It might seem like it took us a while to get here, but professional data scientists actually spend the bulk of their time on the 3 steps leading up to this one: 
* Exploratory Analysis
* Data Cleaning
* Feature Engineering

That's because the biggest jumps in model performance are from **better data**, not from fancier algorithms.

This is a lengthy and action-packed module, so buckle up and let's dive right in!

<br><hr id="toc">

### In this module...

First, we'll load our analytical base table from Module 3. 

Then, we'll go through the essential modeling steps:

1. [Split your dataset](#split)
2. [Build model pipelines](#pipelines)
3. [Declare hyperparameters to tune](#hyperparameters)
4. [Fit and tune models with cross-validation](#fit-tune)
5. [Evaluate metrics and select winner](#evaluate)

Finally, we'll save the best model as a project deliverable!

<br><hr>

### First, let's import libraries, recruit models, and load the analytical base table.

Let's import our libraries and load the dataset. It's good practice to keep all of your library imports at the top of your notebook or program.

Before anything else, let's import the <code style="color:steelblue">print()</code> function from the future for compatibility with Python 3.

In [1]:
from __future__ import print_function
print('Print function ready to serve')

Print function ready to serve


Next, let's import the libraries we'll need for this module.

In [7]:
# NumPy for numerical computing
import numpy as np
# Pandas for dataframes
import pandas as pd
pd.set_option('display.max_columns', 100)
pd.set_option('display.float_format', lambda x: '%.3f' % x)
# Matplotlib for visualization
from matplotlib import pyplot as plt
# Display plots in the notebook
%matplotlib inline
# Seaborn for easier visualization
import seaborn as sns
# Scikit-Learn for modeling
import sklearn

Next, let's import the 5 algorithms we introduced in Module 4.

In [8]:
# Import Elastic Net, Ridge Regression, and Lasso Regression
from sklearn.linear_model import ElasticNet, Ridge, Lasso

# Import Random Forest and Gradient Boosted Trees
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor

<strong>Quick note about this module.</strong><br> In this module, we'll be relying heavily on Scikit-Learn, which has many helpful functions we can take advantage of. However, we won't import everything right away. Instead, we'll be importing each function from Scikit-Learn as we need it. That way, we can point out where you can find each function.


Next, let's load the analytical base table from Module 3.

In [9]:
# Load cleaned dataset from Module 3
df = pd.read_csv('cleaned_df.csv')
print(df.shape)

(1882, 26)


<br id="split">
# 1. Split your dataset

Let's start with a crucial but sometimes overlooked step: **Spending** your data.

<br>
First, let's import the <code style="color:steelblue">train_test_split()</code> function from Scikit-Learn.

In [10]:
# Function for splitting training and test set
from sklearn.model_selection import train_test_split

Next, separate your dataframe into separate objects for the target variable (<code style="color:steelblue">y</code>) and input features (<code style="color:steelblue">X</code>).

In [11]:
# Create separate object for target variable
y = df.tx_price
# Create separate object for input features
X = df.drop('tx_price', axis=1)

<br><hr style="border-color:royalblue;background-color:royalblue;height:1px;">
## <span style="color:RoyalBlue">Exercise 5.1</span>

**First, split <code style="color:steelblue">X</code> and <code style="color:steelblue">y</code> into training and test sets using the <code style="color:steelblue">train_test_split()</code> function.** 
* **Tip:** Its first two arguments should be X and y.
* **Pass in the argument <code style="color:steelblue">test_size=<span style="color:crimson">0.2</span></code> to set aside 20% of our observations for the test set.**
* **Pass in <code style="color:steelblue">random_state=<span style="color:crimson">1234</span></code> to set the random state for replicable results.**
* You can read more about this function in the <a href="http://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html" target="_blank">documentation</a>.

The function returns a tuple with 4 elements: <code style="color:steelblue">(X_train, X_test, y_train, y_test)</code>. Remember, you can **unpack** it. We've given you a head-start below with the code to unpack the tuple:

In [12]:
X_train, X_test, y_train, y_test = train_test_split(X,y, test_size=0.2, random_state=1234)

Let's confirm we have the right number of observations in each subset.

<br>
**Next, run this code to confirm the size of each subset is correct.**

In [13]:
print(len(X_train),len(X_test),len(y_train),len(y_test))

1505 377 1505 377
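
The counts above follow from how <code style="color:steelblue">train_test_split()</code> handles a fractional <code style="color:steelblue">test_size</code>: the test set gets the number of rows rounded up, and the training set gets the rest. Here is a minimal sketch of that arithmetic. The `rows` array is a stand-in for the real dataframe (an assumption for illustration), sized to match the 1,882 observations loaded earlier.

```python
import math

import numpy as np
from sklearn.model_selection import train_test_split

n_rows = 1882              # number of observations reported by df.shape
rows = np.arange(n_rows)   # stand-in for the real rows (illustration only)

train_rows, test_rows = train_test_split(rows, test_size=0.2, random_state=1234)

# 0.2 * 1882 = 376.4, which is rounded up for the test set
expected_test = math.ceil(0.2 * n_rows)
# The training set receives whatever is left over
expected_train = n_rows - expected_test

print(len(train_rows), len(test_rows))
print(expected_train, expected_test)

# No row lands in both subsets
overlap = set(train_rows) & set(test_rows)
```

Because the two subsets never overlap, the test rows stay unseen until evaluation time.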


Next, when we train our models, we can fit them on the <code style="color:steelblue">X_train</code> feature values and <code style="color:steelblue">y_train</code> target values.

Finally, when we're ready to evaluate our models on our test set, we would use the trained models to predict <code style="color:steelblue">X_test</code> and evaluate the predictions against <code style="color:steelblue">y_test</code>.

<hr style="border-color:royalblue;background-color:royalblue;height:1px;">
<div style="text-align:center; margin: 40px 0 40px 0;">
[**Back to Contents**](#toc)
</div>

<br id="pipelines">
# 2. Build model pipelines

In [14]:
#Summary statistics of X_train
X_train.describe()

Unnamed: 0,beds,baths,sqft,year_built,lot_size,basement,restaurants,groceries,nightlife,cafes,shopping,arts_entertainment,beauty_spas,active_life,median_age,married,college_grad,property_tax,insurance,median_school,num_schools,tx_year
count,1505.0,1505.0,1505.0,1505.0,1505.0,1505.0,1505.0,1505.0,1505.0,1505.0,1505.0,1505.0,1505.0,1505.0,1505.0,1505.0,1505.0,1505.0,1505.0,1505.0,1505.0,1505.0
mean,3.423,2.582,2317.345,1982.666,12414.493,0.874,40.508,4.559,5.142,5.373,41.242,3.43,23.587,16.072,38.575,69.007,64.932,467.595,140.826,6.485,2.793,2007.104
std,1.064,0.93,1300.074,20.335,33937.256,0.332,47.005,4.527,8.534,7.516,53.662,4.672,25.894,17.759,6.52,19.578,17.146,231.362,72.957,1.998,0.507,5.198
min,1.0,1.0,500.0,1880.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,22.0,11.0,5.0,88.0,30.0,1.0,1.0,1993.0
25%,3.0,2.0,1352.0,1969.0,1575.0,1.0,7.0,1.0,0.0,0.0,7.0,0.0,4.0,5.0,33.0,59.0,53.0,323.0,96.0,5.0,3.0,2004.0
50%,3.0,3.0,1908.0,1986.0,6050.0,1.0,23.0,3.0,2.0,3.0,22.0,2.0,15.0,10.0,38.0,73.0,66.0,427.0,127.0,7.0,3.0,2007.0
75%,4.0,3.0,3000.0,2000.0,11761.0,1.0,58.0,7.0,6.0,7.0,51.0,5.0,35.0,21.0,43.0,84.0,78.0,573.0,171.0,8.0,3.0,2011.0
max,5.0,6.0,7594.0,2015.0,436471.0,1.0,266.0,24.0,53.0,47.0,340.0,35.0,177.0,94.0,69.0,100.0,100.0,4508.0,1374.0,10.0,4.0,2016.0


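Next, standardize the training data manually, creating a new <code style="color:steelblue">X_train_new</code> object. Standardization only applies to numeric columns, so first drop the three non-numeric columns that <code style="color:steelblue">describe()</code> skipped above.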

In [16]:
# Drop the non-numeric columns before standardizing
X_train = X_train.drop(['property_type', 'exterior_walls', 'roof'], axis=1)
X_train

Unnamed: 0,beds,baths,sqft,year_built,lot_size,basement,restaurants,groceries,nightlife,cafes,shopping,arts_entertainment,beauty_spas,active_life,median_age,married,college_grad,property_tax,insurance,median_school,num_schools,tx_year
415,3,2,1144,1973,1502,1.000,21,3,1,5,13,3,13,13,41.000,73.000,60.000,258.000,78.000,5.000,3.000,2004
646,3,2,2140,1990,7405,0.000,126,8,29,16,47,11,43,40,45.000,61.000,83.000,647.000,224.000,6.000,3.000,2008
1082,4,2,2298,2004,4007,1.000,26,3,2,4,15,0,9,12,38.000,83.000,48.000,306.000,87.000,7.000,3.000,2016
858,3,3,2118,2009,3484,0.000,31,1,6,0,13,1,8,9,40.000,80.000,66.000,431.000,111.000,7.000,3.000,2015
638,3,2,2100,1993,2178,1.000,29,3,2,4,41,1,18,10,35.000,50.000,66.000,300.000,77.000,7.000,3.000,2007
642,3,2,2112,2000,2953,1.000,39,9,2,6,28,4,21,20,49.000,48.000,52.000,537.000,163.000,6.000,3.000,2013
853,3,3,2086,1930,12632,1.000,127,11,26,17,174,10,71,47,48.000,54.000,89.000,648.000,224.000,6.000,3.000,2009
214,2,2,1080,1985,1951,1.000,67,6,3,8,31,3,35,18,31.000,74.000,59.000,241.000,73.000,8.000,3.000,2011
1303,4,3,2530,1962,11761,1.000,2,0,2,0,7,1,4,2,49.000,84.000,65.000,491.000,149.000,9.000,3.000,2008
1119,4,2,2887,1996,10763,1.000,42,6,5,4,21,1,26,11,30.000,91.000,45.000,479.000,146.000,5.000,3.000,2003


In [17]:
# Standardize X_train
X_train_new = (X_train - X_train.mean())/ X_train.std()

Let's look at the summary statistics for <code style="color:steelblue">X_train_new</code> to confirm standardization worked correctly.
* How can you tell?

In [18]:
# Summary statistics of X_train_new
X_train_new.describe()

Unnamed: 0,beds,baths,sqft,year_built,lot_size,basement,restaurants,groceries,nightlife,cafes,shopping,arts_entertainment,beauty_spas,active_life,median_age,married,college_grad,property_tax,insurance,median_school,num_schools,tx_year
count,1505.0,1505.0,1505.0,1505.0,1505.0,1505.0,1505.0,1505.0,1505.0,1505.0,1505.0,1505.0,1505.0,1505.0,1505.0,1505.0,1505.0,1505.0,1505.0,1505.0,1505.0,1505.0
mean,0.0,-0.0,-0.0,-0.0,-0.0,-0.0,0.0,-0.0,0.0,-0.0,-0.0,0.0,0.0,-0.0,0.0,0.0,-0.0,0.0,0.0,0.0,0.0,0.0
std,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0
min,-2.278,-1.7,-1.398,-5.049,-0.366,-2.63,-0.862,-1.007,-0.603,-0.715,-0.769,-0.734,-0.911,-0.905,-2.542,-2.963,-3.495,-1.641,-1.519,-2.745,-3.533,-2.713
25%,-0.398,-0.626,-0.743,-0.672,-0.319,0.38,-0.713,-0.786,-0.603,-0.715,-0.638,-0.734,-0.756,-0.623,-0.855,-0.511,-0.696,-0.625,-0.614,-0.743,0.409,-0.597
50%,-0.398,0.449,-0.315,0.164,-0.188,0.38,-0.372,-0.344,-0.368,-0.316,-0.359,-0.306,-0.332,-0.342,-0.088,0.204,0.062,-0.175,-0.19,0.258,0.409,-0.02
75%,0.542,0.449,0.525,0.852,-0.019,0.38,0.372,0.539,0.101,0.217,0.182,0.336,0.441,0.277,0.679,0.766,0.762,0.456,0.414,0.758,0.409,0.749
max,1.482,3.673,4.059,1.59,12.495,0.38,4.797,4.294,5.608,5.538,5.567,6.757,5.925,4.388,4.667,1.583,2.045,17.464,16.903,1.759,2.379,1.711


In practice, we'll almost never standardize manually, because we'll include preprocessing steps in **model pipelines**.
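
To make the link between manual standardization and a pipeline's preprocessing step concrete, here is a small sketch on toy data. The `train` and `new` frames are hypothetical, not the real-estate dataset. One detail worth knowing: pandas' `.std()` defaults to the sample standard deviation (`ddof=1`), while `StandardScaler` divides by the population standard deviation (`ddof=0`), so the sketch passes `ddof=0` to make the two agree exactly.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical toy features (illustration only)
rng = np.random.RandomState(0)
train = pd.DataFrame({
    'sqft': rng.uniform(500, 5000, 200),
    'beds': rng.randint(1, 6, 200).astype(float),
})
new = pd.DataFrame({'sqft': [1000.0, 3000.0], 'beds': [2.0, 4.0]})

# Manual standardization, using the population std to match StandardScaler
manual_train = (train - train.mean()) / train.std(ddof=0)
# New data is scaled with the TRAINING mean and std, never its own
manual_new = (new - train.mean()) / train.std(ddof=0)

# StandardScaler learns the mean and std in fit() and reuses them in transform()
scaler = StandardScaler().fit(train)
scaled_train = scaler.transform(train)
scaled_new = scaler.transform(new)

print(np.allclose(manual_train.values, scaled_train))
print(np.allclose(manual_new.values, scaled_new))
```

Inside a pipeline, this fit-then-reuse behavior happens automatically, which is what keeps test data from influencing the scaling.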

<br>
So let's import the <code style="color:steelblue">make_pipeline()</code> function from Scikit-Learn.

In [19]:
# Function for creating model pipelines
from sklearn.pipeline import make_pipeline

Now let's import the <code style="color:steelblue">StandardScaler</code>, which is used for standardization.

In [20]:
# For standardization
from sklearn.preprocessing import StandardScaler

Next, create a dictionary of pipelines named <code style="color:steelblue">pipeline_dict</code>.
* It should include 3 keys: <code style="color:crimson">'lasso'</code>, <code style="color:crimson">'ridge'</code>, and <code style="color:crimson">'enet'</code>
* The corresponding values should be pipelines that first standardize the data.
* For the algorithm in each pipeline, set <code style="color:steelblue">random_state=<span style="color:crimson">123</span></code> to ensure replicable results.

In [42]:
# Create pipelines dictionary
pipeline_dict = {
    'lasso': make_pipeline(StandardScaler(), Lasso(random_state=123)),
    'ridge': make_pipeline(StandardScaler(), Ridge(random_state=123)),
    'enet': make_pipeline(StandardScaler(), ElasticNet(random_state=123)),
}

In the next exercise, you'll add pipelines for tree ensembles.

<hr style="border-color:royalblue;background-color:royalblue;height:1px;">
## <span style="color:RoyalBlue">Exercise 5.2</span>

**Add pipelines for <code style="color:SteelBlue">RandomForestRegressor</code> and <code style="color:SteelBlue">GradientBoostingRegressor</code> to your pipeline dictionary.**
* Name them <code style="color:crimson">'rf'</code> for random forest and <code style="color:crimson">'gb'</code> for gradient boosted tree.
* Both pipelines should standardize the data first.
* For both, set <code style="color:steelblue">random_state=<span style="color:crimson">123</span></code> to ensure replicable results.

In [46]:
# Add a pipeline for 'rf'
pipeline_dict['rf'] = make_pipeline(StandardScaler(), RandomForestRegressor(random_state=123))
# Add a pipeline for 'gb'
pipeline_dict['gb'] = make_pipeline(StandardScaler(), GradientBoostingRegressor(random_state=123))

Let's make sure our dictionary has pipelines for each of our algorithms.

<br>
**Run this code to confirm that you have all 5 algorithms, each part of a pipeline.**

In [47]:
# Check that we have all 5 algorithms, and that they are all pipelines
for key, value in pipeline_dict.items():
    print( key, type(value) )

lasso <class 'sklearn.pipeline.Pipeline'>
ridge <class 'sklearn.pipeline.Pipeline'>
enet <class 'sklearn.pipeline.Pipeline'>
rf <class 'sklearn.pipeline.Pipeline'>
gb <class 'sklearn.pipeline.Pipeline'>


Now that we have our pipelines, we're ready to move on to declaring hyperparameters to tune.

<hr style="border-color:royalblue;background-color:royalblue;height:1px;">

<div style="text-align:center; margin: 40px 0 40px 0;">
[**Back to Contents**](#toc)
</div>


<br id="hyperparameters">
# 3. Declare hyperparameters to tune

Up to now, we've been casually talking about "tuning" models, but now it's time to treat the topic more formally.

<br>
First, list all the tunable hyperparameters for your Lasso regression pipeline.

In [24]:
# List tuneable hyperparameters of our Lasso pipeline
pipeline_dict['lasso'].get_params()

{'memory': None,
 'steps': [('standardscaler',
   StandardScaler(copy=True, with_mean=True, with_std=True)),
  ('lasso', Lasso(alpha=1.0, copy_X=True, fit_intercept=True, max_iter=1000,
      normalize=False, positive=False, precompute=False, random_state=123,
      selection='cyclic', tol=0.0001, warm_start=False))],
 'standardscaler': StandardScaler(copy=True, with_mean=True, with_std=True),
 'lasso': Lasso(alpha=1.0, copy_X=True, fit_intercept=True, max_iter=1000,
    normalize=False, positive=False, precompute=False, random_state=123,
    selection='cyclic', tol=0.0001, warm_start=False),
 'standardscaler__copy': True,
 'standardscaler__with_mean': True,
 'standardscaler__with_std': True,
 'lasso__alpha': 1.0,
 'lasso__copy_X': True,
 'lasso__fit_intercept': True,
 'lasso__max_iter': 1000,
 'lasso__normalize': False,
 'lasso__positive': False,
 'lasso__precompute': False,
 'lasso__random_state': 123,
 'lasso__selection': 'cyclic',
 'lasso__tol': 0.0001,
 'lasso__warm_start': False}
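
The double-underscore names in that output follow a simple convention: `<step name>__<parameter>`, where `make_pipeline()` names each step after its lowercased class name. A short sketch of how this works, using a freshly built pipeline rather than the one in the dictionary above:

```python
from sklearn.linear_model import Lasso
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

pipe = make_pipeline(StandardScaler(), Lasso(random_state=123))

# Step names are generated from the lowercased class names
step_names = [name for name, _ in pipe.steps]
print(step_names)

# '<step name>__<parameter>' reaches into a step to change one setting
pipe.set_params(lasso__alpha=0.5)
print(pipe.named_steps['lasso'].alpha)
```

This is why the hyperparameter grids in this module use keys such as `lasso__alpha` rather than plain `alpha`.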

Next, declare hyperparameters to tune for Lasso and Ridge regression.
* Try values between 0.001 and 10 for <code style="color:steelblue">alpha</code>.

In [25]:
# Lasso hyperparameters
lasso_hyperparameters = { 'lasso__alpha' : [0.001, 0.005, 0.01, 0.05, 0.1, 0.5, 1, 5, 10] }

# Ridge hyperparameters 
ridge_hyperparameters = { 'ridge__alpha': [0.001, 0.005, 0.01, 0.05, 0.1, 0.5, 1, 5, 10] }

Now declare a hyperparameter grid for Elastic-Net.
* You should tune the <code style="color:steelblue">l1_ratio</code> in addition to <code style="color:steelblue">alpha</code>.

In [26]:
# Elastic Net hyperparameters
enet_hyperparameters = { 'elasticnet__alpha': [0.001, 0.005, 0.01, 0.05, 0.1, 0.5, 1, 5, 10], 
                       'elasticnet__l1_ratio': [0.1, 0.3, 0.5, 0.7, 0.9]}
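
Tuning two hyperparameters at once multiplies the number of settings to try. As a rough sketch, Scikit-Learn's `ParameterGrid` can enumerate the combinations so you can estimate the cost before fitting. The fold count of 10 matches the `cv=10` setting used later in this module; `enet_grid` simply repeats the grid above under a different name.

```python
from sklearn.model_selection import ParameterGrid

# Same values as the Elastic-Net grid above
enet_grid = {
    'elasticnet__alpha': [0.001, 0.005, 0.01, 0.05, 0.1, 0.5, 1, 5, 10],
    'elasticnet__l1_ratio': [0.1, 0.3, 0.5, 0.7, 0.9],
}

# Every combination of alpha and l1_ratio is one candidate model
candidates = list(ParameterGrid(enet_grid))
n_candidates = len(candidates)   # 9 alphas x 5 ratios

n_folds = 10                     # folds used for cross-validation
n_fits = n_candidates * n_folds  # fits performed during the search itself

print(n_candidates, n_fits)
print(candidates[0])
```

Counting candidates this way is a cheap sanity check before launching a search that may take minutes.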

<br><hr style="border-color:royalblue;background-color:royalblue;height:1px;">
## <span style="color:RoyalBlue">Exercise 5.3</span>

Let's start by declaring the hyperparameter grid for our random forest.

<br>
**Declare a hyperparameter grid for <code style="color:SteelBlue">RandomForestRegressor</code>.**
* Name it <code style="color:steelblue">rf_hyperparameters</code>

* Set <code style="color:steelblue"><span style="color:crimson">'randomforestregressor\__n_estimators'</span>: [100, 200]</code>
* Set <code style="color:steelblue"><span style="color:crimson">'randomforestregressor\__max_features'</span>: ['auto', 'sqrt', 0.33]</code>

In [27]:
# Random forest hyperparameters
rf_hyperparameters = { 
    'randomforestregressor__n_estimators' : [100, 200],
    'randomforestregressor__max_features': ['auto', 'sqrt', 0.33],
}

Next, let's declare settings to try for our boosted tree.

<br>
**Declare a hyperparameter grid for <code style="color:SteelBlue">GradientBoostingRegressor</code>.**
* Name it <code style="color:steelblue">gb_hyperparameters</code>.
* Set <code style="color:steelblue"><span style="color:crimson">'gradientboostingregressor\__n_estimators'</span>: [100, 200]</code>
* Set <code style="color:steelblue"><span style="color:crimson">'gradientboostingregressor\__learning_rate'</span>: [0.05, 0.1, 0.2]</code>
* Set <code style="color:steelblue"><span style="color:crimson">'gradientboostingregressor\__max_depth'</span>: [1, 3, 5]</code>

In [28]:
# Boosted tree hyperparameters
gb_hyperparameters = { 'gradientboostingregressor__n_estimators': [100, 200],
                     'gradientboostingregressor__learning_rate': [0.05, 0.1, 0.2],
                     'gradientboostingregressor__max_depth': [1, 3, 5]}

Now that we have all of our hyperparameters declared, let's store them in a dictionary for ease of access.

<br>
**Create a <code style="color:steelblue">hyperparameters</code> dictionary**.
* Use the same keys as in the <code style="color:steelblue">pipeline_dict</code> dictionary.
    * If you forgot what those keys were, you can insert a new code cell and call <code style="color:steelblue">pipeline_dict.keys()</code> for a reminder.
* Set the values to the corresponding **hyperparameter grids** we've been declaring throughout this module.
    * e.g. <code style="color:steelblue"><span style="color:crimson">'rf'</span> : rf_hyperparameters</code>
    * e.g. <code style="color:steelblue"><span style="color:crimson">'lasso'</span> : lasso_hyperparameters</code>

In [29]:
# Create hyperparameters dictionary
hyperparameters = {
    'rf' : rf_hyperparameters,
    'gb' : gb_hyperparameters,
    'lasso' : lasso_hyperparameters,
    'ridge' : ridge_hyperparameters,
    'enet' : enet_hyperparameters
}

In [30]:
for key in ['enet', 'gb', 'ridge', 'rf', 'lasso']:
    if key in hyperparameters:
        if type(hyperparameters[key]) is dict:
            print( key, 'was found in hyperparameters, and it is a grid.' )
        else:
            print( key, 'was found in hyperparameters, but it is not a grid.' )
    else:
        print( key, 'was not found in hyperparameters')

enet was found in hyperparameters, and it is a grid.
gb was found in hyperparameters, and it is a grid.
ridge was found in hyperparameters, and it is a grid.
rf was found in hyperparameters, and it is a grid.
lasso was found in hyperparameters, and it is a grid.
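
The check above confirms that each grid exists, but not that its keys are spelled correctly. One possible extra check, sketched below, compares each grid's keys against the parameter names that the matching pipeline actually exposes. The helper `unknown_params()` and the small `pipes` and `grids` dictionaries are hypothetical stand-ins for illustration.

```python
from sklearn.linear_model import Lasso, Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Small stand-ins for the module's dictionaries (illustration only)
pipes = {
    'lasso': make_pipeline(StandardScaler(), Lasso(random_state=123)),
    'ridge': make_pipeline(StandardScaler(), Ridge(random_state=123)),
}
grids = {
    'lasso': {'lasso__alpha': [0.01, 0.1, 1]},
    'ridge': {'ridge__alpha': [0.01, 0.1, 1]},
}


def unknown_params(pipeline, grid):
    """Return the grid keys that are not parameters of the pipeline."""
    valid = set(pipeline.get_params().keys())
    return sorted(set(grid) - valid)


for name in pipes:
    print(name, unknown_params(pipes[name], grids[name]))

# A misspelled key (single underscore) is caught before any fitting happens
typo_result = unknown_params(pipes['lasso'], {'lasso_alpha': [0.1]})
print(typo_result)
```

An empty list means every key in the grid maps to a real parameter.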


<hr style="border-color:royalblue;background-color:royalblue;height:1px;">
<div style="text-align:center; margin: 40px 0 40px 0;">
[**Back to Contents**](#toc)
</div>

<br id="fit-tune">
# 4. Fit and tune models with cross-validation

Now that we have our <code style="color:steelblue">pipeline_dict</code> and <code style="color:steelblue">hyperparameters</code> dictionaries declared, we're ready to tune our models with cross-validation.

<br>
First, let's import a helper for cross-validation called <code style="color:steelblue">GridSearchCV</code>.

In [32]:
# Helper for cross-validation
from sklearn.model_selection import GridSearchCV

Next, to see an example, set up cross-validation for Lasso regression.

In [33]:
# Create cross-validation object from Lasso pipeline and Lasso hyperparameters
model = GridSearchCV(pipeline_dict['lasso'], hyperparameters['lasso'], cv=10, n_jobs=-1)

Pass <code style="color:steelblue">X_train</code> and <code style="color:steelblue">y_train</code> into the <code style="color:steelblue">.fit()</code> function to tune hyperparameters.

In [43]:
# Fit and tune model
model.fit(X_train, y_train)



GridSearchCV(cv=10, error_score='raise-deprecating',
       estimator=Pipeline(memory=None,
     steps=[('standardscaler', StandardScaler(copy=True, with_mean=True, with_std=True)), ('gradientboostingregressor', GradientBoostingRegressor(alpha=0.9, criterion='friedman_mse', init=None,
             learning_rate=0.1, loss='ls', max_depth=3, max_features=None,
             max_leaf_nodes=None, mi...123, subsample=1.0, tol=0.0001,
             validation_fraction=0.1, verbose=0, warm_start=False))]),
       fit_params=None, iid='warn', n_jobs=-1,
       param_grid={'gradientboostingregressor__n_estimators': [100, 200], 'gradientboostingregressor__learning_rate': [0.05, 0.1, 0.2], 'gradientboostingregressor__max_depth': [1, 3, 5]},
       pre_dispatch='2*n_jobs', refit=True, return_train_score='warn',
       scoring=None, verbose=0)

By the way, don't worry if you get the message:

<pre style="color:crimson">ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations</pre>

We'll dive into some of the under-the-hood nuances later.
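
In the meantime, here is a minimal sketch of what a fitted <code style="color:steelblue">GridSearchCV</code> object exposes. It runs on synthetic data (`X_demo` and `y_demo` are made up for illustration), uses 5 folds to keep it quick, and raises `max_iter` as one common response to the convergence message. Whether that is needed for the real dataset is an assumption.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic regression data (illustration only)
rng = np.random.RandomState(123)
X_demo = rng.normal(size=(200, 5))
y_demo = 3.0 * X_demo[:, 0] - 2.0 * X_demo[:, 1] + rng.normal(scale=0.5, size=200)

# A larger max_iter gives the solver more iterations to converge
pipe = make_pipeline(StandardScaler(), Lasso(random_state=123, max_iter=10000))
grid = {'lasso__alpha': [0.001, 0.01, 0.1, 1]}

search = GridSearchCV(pipe, grid, cv=5)
search.fit(X_demo, y_demo)

# Winning setting, its mean cross-validated score, and every candidate's score
print(search.best_params_)
print(round(search.best_score_, 3))
print(search.cv_results_['mean_test_score'])
```

With the default `refit=True`, the best setting is also refit on all of the data passed to `.fit()` and stored in `best_estimator_`.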

<br>
In the next exercise, we'll write a loop that tunes all of our models.

<br><hr style="border-color:royalblue;background-color:royalblue;height:1px;">
## <span style="color:RoyalBlue">Exercise 5.4</span>

**Create a dictionary of models named <code style="color:SteelBlue">fitted_models</code> that have been tuned using cross-validation.**
* The keys should be the same as those in the <code style="color:SteelBlue">pipeline_dict</code> and <code style="color:SteelBlue">hyperparameters</code> dictionaries. 
* The values should be <code style="color:steelblue">GridSearchCV</code> objects that have been fitted to <code style="color:steelblue">X_train</code> and <code style="color:steelblue">y_train</code>.
* After fitting each model, print <code style="color:crimson">'{name} has been fitted.'</code> just to track the progress.
* **Tip:** We've started you off with some code.

This step can take a few minutes, so please be patient.

In [48]:
# Create empty dictionary called fitted_models
fitted_models = {}

# Loop through model pipelines, tuning each one and saving it to fitted_models
for name, pipeline in pipeline_dict.items():
    # Create cross-validation object from pipeline and hyperparameters
    model = GridSearchCV(pipeline, hyperparameters[name], cv=10, n_jobs=-1)
    
    # Fit model on X_train, y_train
    model.fit(X_train, y_train)
    
    # Store model in fitted_models[name] 
    fitted_models[name] = model
    
    # Print '{name} has been fitted'
    print(name, 'has been fitted.')

lasso has been fitted.
ridge has been fitted.
enet has been fitted.
rf has been fitted.
gb has been fitted.


<br>
**Run this code to check that the models are of the correct type.**

In [50]:
# Check that we have 5 cross-validation objects
for key, value in fitted_models.items():
    print( key, type(value) )

lasso <class 'sklearn.model_selection._search.GridSearchCV'>
ridge <class 'sklearn.model_selection._search.GridSearchCV'>
enet <class 'sklearn.model_selection._search.GridSearchCV'>
rf <class 'sklearn.model_selection._search.GridSearchCV'>
gb <class 'sklearn.model_selection._search.GridSearchCV'>


<br>
**Finally, run this code to check that the models have been fitted correctly.**

In [51]:
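# Drop the same non-numeric columns from X_test that were dropped from X_train.
# errors='ignore' keeps this cell safe to re-run once the columns are already gone.
X_test = X_test.drop(['property_type', 'exterior_walls', 'roof'], axis=1, errors='ignore')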

In [52]:
from sklearn.exceptions import NotFittedError

for name, model in fitted_models.items():
    try:
        pred = model.predict(X_test)
        print(name, 'has been fitted.')
    except NotFittedError as e:
        print(repr(e))

lasso has been fitted.
ridge has been fitted.
enet has been fitted.
rf has been fitted.
gb has been fitted.


Nice. Now we're ready to evaluate how our models performed!

<hr style="border-color:royalblue;background-color:royalblue;height:1px;">

<div style="text-align:center; margin: 40px 0 40px 0;">
[**Back to Contents**](#toc)
</div>

<br id="evaluate">
# 5. Evaluate models and select winner

Finally, it's time to evaluate our models and pick the best one.

<br>
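Let's display the cross-validated $R^2$ score for each fitted model. This is the average score on the held-out folds for each model's best hyperparameter setting, stored in <code style="color:steelblue">best_score_</code>.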

In [53]:
# Display best_score_ for each fitted model
for name, model in fitted_models.items():
    print(name, model.best_score_)

lasso 0.40065490963009015
ridge 0.4014165555748239
enet 0.40971035156107144
rf 0.7943960866513141
gb 0.8079318583159825
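
These scores come from the training data only. A natural follow-up is to compare them with $R^2$ on the untouched test set. The sketch below shows the pattern on synthetic data (all `_demo` names and the Ridge-only search are assumptions for illustration, not results from this project).

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import r2_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic regression data (illustration only)
rng = np.random.RandomState(1234)
X_demo = rng.normal(size=(300, 4))
y_demo = X_demo @ np.array([4.0, -3.0, 2.0, 0.0]) + rng.normal(scale=1.0, size=300)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=1234)

search = GridSearchCV(make_pipeline(StandardScaler(), Ridge()),
                      {'ridge__alpha': [0.01, 0.1, 1, 10]}, cv=5)
search.fit(X_tr, y_tr)

# predict() uses the best pipeline, refit on the full training set
pred = search.predict(X_te)
test_r2 = r2_score(y_te, pred)

print('cross-validated R^2:', round(search.best_score_, 3))
print('test R^2:', round(test_r2, 3))
```

A test score far below the cross-validated score would be a hint that the model is overfitting the training data.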
