# Exercise: Using a trained model on a new dataset

Previously, we created a basic model that let us find the relationship between a person's shoe length and their height. We showed how this model could then be used to make a prediction about a new, previously unseen person.

It's common to build, train, and then immediately use a model while we are experimenting. In the real world, though, we might train our model in Seattle and then use it three months later in New York on completely new data. How can we do this?

Here we will:

1. Create a basic model
2. Save it to disk
3. Load it from disk
4. Use it to make predictions about a new dataset

## Load the first dataset

Let's begin by loading the dataset from a file.

In [1]:
import pandas

# Load a file containing people's shoe sizes
# and height, both in cm
data = pandas.read_csv('Data/shoe-size-height.csv')

# Print the first few rows
data.head()


Unnamed: 0,shoe_length,height
0,35,170.0
1,32,195.0
2,15,168.0
3,32,190.9
4,24,185.0


## Create and train a model

As we have done before, we will create a simple linear regression model and train it on our dataset.

In [2]:
import statsmodels.formula.api as smf

# Fit a simple model that finds a linear relationship
# between shoe length and height, which we can use later
# to predict someone's height based on their shoe length
model = smf.ols(formula = "height ~ shoe_length", data = data).fit()

print("Model trained!")

Model trained!
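
To make the training step less of a black box, here is a minimal sketch of what an ordinary least squares fit computes for a single input variable. It is not how `statsmodels` is implemented internally. The helper name `fit_line` and the toy numbers are made up for illustration and do not come from the shoe dataset.

```python
def fit_line(xs, ys):
    """Closed-form least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    # Slope: how x and y vary together, divided by how much x varies
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx

    # The fitted line always passes through the point (mean_x, mean_y)
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Toy data (hypothetical) lying exactly on the line y = 1 + 2x
toy_x = [1.0, 2.0, 3.0, 4.0]
toy_y = [3.0, 5.0, 7.0, 9.0]

intercept, slope = fit_line(toy_x, toy_y)
print(intercept, slope)  # prints: 1.0 2.0
```

The two numbers returned here play the same role as the `Intercept` and `shoe_length` parameters of the trained model.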


## Saving and loading a model

Our model is ready to use, but we don't need it yet. Let's save it to disk.

In [3]:
import joblib
model_filename = './height_shoes_model.pkl'
joblib.dump(model, model_filename)

print("Model saved!")

Model saved!


Loading our model is just as easy:

In [4]:
model_loaded = joblib.load(model_filename)


print("We have loaded a model with the following parameters:")
print(model_loaded.params)


We have loaded a model with the following parameters:
Intercept      158.307560
shoe_length      1.095146
dtype: float64
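
The save and load steps are a round trip: an object is written to a file and later rebuilt from it. `joblib` builds its persistence on Python's pickle protocol, so the idea can be sketched with the standard library's `pickle` module alone. This sketch assumes a plain dictionary as a stand-in for the trained model, and the file name `tiny_model.pkl` is hypothetical.

```python
import os
import pickle
import tempfile

# Stand-in for a trained model: the two parameters printed above
original = {"Intercept": 158.307560, "shoe_length": 1.095146}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "tiny_model.pkl")

    # Save: serialize the object to a file on disk
    with open(path, "wb") as f:
        pickle.dump(original, f)

    # Load: rebuild an equivalent object from that file
    with open(path, "rb") as f:
        restored = pickle.load(f)

# The restored object is a separate copy with the same contents,
# so nothing needs to be retrained
print(restored == original)
print(restored is original)
```

As with any pickle-based format, only load model files that come from a source you trust, because loading can execute code stored in the file.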


## Putting it together

The normal use case for saved models is that we receive new data and want to run predictions on it without retraining the model.

Let's put everything here together to make a function that loads a model from disk and uses it to make predictions on new data.

In [5]:
def load_model_and_predict(path_to_model, path_to_data):
    '''
    This function loads a pretrained model and a dataset of 
    shoe sizes from disk. It uses the model to predict how 
    tall the people are, based on their shoe size.
    '''

    # Load the model and print basic information about it
    loaded_model = joblib.load(path_to_model)

    print("We have loaded a model with the following parameters:")
    print(loaded_model.params)

    # Load the dataset
    data = pandas.read_csv(path_to_data)
    print("\nTop rows of our input data:")
    print(data.head())

    # Use the model to make a prediction
    predicted_heights = loaded_model.predict(data.shoe_length)

    # Print out a table of the shoe sizes and predicted heights
    dataframe = pandas.DataFrame({"shoe_length (cm)":data.shoe_length, "Predicted height (cm)":predicted_heights})
    print("\nPredictions:")
    print(dataframe)

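# Use a forward slash in the path: it works on all platforms
# and avoids the invalid '\s' escape sequence
load_model_and_predict(model_filename, 'Data/shoe-sizes.csv')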

We have loaded a model with the following parameters:
Intercept      158.307560
shoe_length      1.095146
dtype: float64

Top rows of our input data:
   shoe_length
0         31.0
1         19.0
2         34.0
3         15.0
4         31.0

Predictions:
    shoe_length (cm)  Predicted height (cm)
0               31.0             192.257091
1               19.0             179.115337
2               34.0             195.542530
3               15.0             174.734752
4               31.0             192.257091
5               24.0             184.591068
6               33.0             194.447383
7               24.0             184.591068
8               16.0             175.829898
9               19.0             179.115337
10              15.0             174.734752
11              28.0             188.971653
12              35.0             196.637676
13              28.0             188.971653
14              25.0             185.686214
15              20.0             180.21048
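
Each prediction in this table is the model's straight line evaluated at one shoe length. As a quick check, the sketch below recomputes a few rows by hand from the printed parameters. The helper `predict_height` is a hypothetical name, and because the printed parameters are rounded to six decimals, the results can differ from the table in the last few digits.

```python
# Parameters as printed by the notebook (rounded to six decimals)
intercept = 158.307560
slope = 1.095146


def predict_height(shoe_length_cm):
    """Evaluate height = intercept + slope * shoe_length."""
    return intercept + slope * shoe_length_cm


# Recompute a few rows of the predictions table
for shoe_length in [31.0, 19.0, 15.0]:
    height = predict_height(shoe_length)
    print(f"{shoe_length:5.1f} cm -> {height:.2f} cm")
```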


## Summary

Well done!
In this exercise, we practiced:

1. Creating basic models
2. Training, then saving them to disk
3. Loading them from disk
4. Making predictions with them using new datasets