# Stacking Keras Models

Model stacking involves taking the predictions of one model and using them as an input to another.

Note: The models must be trained on different datasets. Here, the first model is trained on regular season games and the second on tournament games.
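As a rough illustration of the idea, the sketch below stacks two least-squares linear models on synthetic data. Everything in it (`fit_linear`, `predict_linear`, the datasets `X_a`, `X_b`, `extra_b`) is hypothetical and stands in for the Keras models and basketball data used in this notebook.

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_linear(X, y):
    """Least-squares fit of y ~ X @ w + b; returns (w, b)."""
    X1 = np.column_stack([X, np.ones(len(X))])
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return coef[:-1], coef[-1]

def predict_linear(X, w, b):
    return X @ w + b

# Hypothetical synthetic data: a large dataset A and a small dataset B.
X_a = rng.normal(size=(1000, 2))
y_a = X_a @ np.array([2.0, -1.0]) + rng.normal(scale=0.1, size=1000)

X_b = rng.normal(size=(200, 2))
extra_b = rng.normal(size=(200, 1))  # a feature that only exists in B
y_b = (X_b @ np.array([2.0, -1.0]) + 3.0 * extra_b[:, 0]
       + rng.normal(scale=0.1, size=200))

# Stage 1: fit the first model on dataset A only.
w1, b1 = fit_linear(X_a, y_a)

# Stage 2: the stage-1 prediction becomes an input column for dataset B.
pred_b = predict_linear(X_b, w1, b1)
X_stacked = np.column_stack([extra_b, pred_b])
w2, b2 = fit_linear(X_stacked, y_b)
stacked_pred = predict_linear(X_stacked, w2, b2)

# Errors are measured on B's own rows, so they are training errors.
mae_stage1 = np.mean(np.abs(y_b - pred_b))
mae_stacked = np.mean(np.abs(y_b - stacked_pred))
```

The second model never sees dataset A directly. It only sees the first model's prediction as one more numeric feature, which is the same pattern the `pred` column follows later in this notebook.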

In [1]:
import pandas as pd
import numpy as np
from numpy import unique
import matplotlib.pyplot as plt

In [2]:
from keras.layers import Input, Dense, Embedding, Flatten, Subtract, Add, Concatenate
from keras.models import Model
from keras.utils import plot_model

Using TensorFlow backend.


In [3]:
# load the data
full = pd.read_csv('./data/basket-ball/games_season.csv')
tournament = pd.read_csv('./data/basket-ball/games_tourney.csv')
full.shape, tournament.shape

((312178, 8), (4234, 9))

In the previous lesson, the regular season model made reasonably good predictions on the tournament dataset. Now try to improve those predictions by modeling the tournament specifically.

You'll use the prediction from the regular season model as an input to the tournament model. This is a form of "model stacking."

To start, take the regular season model from the previous lesson, and predict on the tournament data. Add this prediction to the tournament data as a new column.

In [4]:
#### MODEL from Notebook 5 ######
# Build the team strength Layer
n_teams = unique(full['team_1']).shape[0]

# Create an embedding layer
team_lookup = Embedding(
    input_dim=n_teams,
    output_dim=1,
    input_length=1,
    name='Team-Strength'
)

teamid_in = Input(shape=(1,)) # input tensor
strength_lookup = team_lookup(teamid_in)
strength_lookup_flat = Flatten()(strength_lookup) # flattened output tensor

# Combine the operations into a single, re-usable model
team_strength_model = Model(teamid_in, strength_lookup_flat, name='Team-Strength-Model')

## Create three input layers
team_in_1 = Input(shape=(1,), name='Team-1-In')
team_in_2 = Input(shape=(1,), name='Team-2-In')
home_in = Input(shape=(1,), name='Home-In')

# Lookup the team inputs in the team strength model
team_1_strength = team_strength_model(team_in_1)
team_2_strength = team_strength_model(team_in_2)

# Combine the team strengths with the home input using a Concatenate layer, then add a Dense layer
out = Concatenate()([team_1_strength, team_2_strength, home_in])
out = Dense(1)(out)

# Instantiate and compile the model
model = Model([team_in_1, team_in_2, home_in], out)
model.compile(optimizer='adam', loss='mean_absolute_error')

# train the model
model.fit(
    [full['team_1'], full['team_2'], full['home']],
    full['score_diff'],
    epochs=1,
    verbose=True,
    validation_split=0.1,
    batch_size=2048
)

Train on 280960 samples, validate on 31218 samples
Epoch 1/1


<keras.callbacks.History at 0x7f5a01e5f5c0>
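To make the shared team-strength lookup more concrete, here is a minimal NumPy sketch of what an embedding with `output_dim=1` followed by `Flatten()` does to a batch of team ids. The values in `strength_table` are made up for illustration; in the real model they are learned during `fit`.

```python
import numpy as np

# Hypothetical learned weights: one strength value per team id (output_dim=1).
strength_table = np.array([[0.3], [-1.2], [0.8], [0.0], [2.1]])  # shape (n_teams, 1)

# A batch of three team ids, shaped like Input(shape=(1,)).
team_ids = np.array([[4], [1], [4]])  # shape (batch, 1)

# The embedding is a row lookup: each id selects its row of the table.
looked_up = strength_table[team_ids]  # shape (batch, 1, 1)

# Flatten() collapses everything after the batch dimension.
flat = looked_up.reshape(len(team_ids), -1)  # shape (batch, 1)
```

Because `team_strength_model` is applied to both `team_in_1` and `team_in_2`, both inputs read from the same table, so a team has one strength value regardless of which slot it appears in.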

In [5]:
# Use the model to predict on the games_tourney dataset. 
# The model has three inputs: 'team_1', 'team_2', and 'home' columns. 
# Assign the predictions to a new column, 'pred'.
tournament['pred'] = model.predict([
    tournament['team_1'], 
    tournament['team_2'], 
    tournament['home']
])
print(tournament.shape)
tournament.head()

(4234, 10)


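|   | season | team_1 | team_2 | home | seed_diff | score_diff | score_1 | score_2 | won | pred |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1985 | 288 | 73 | 0 | -3 | -9 | 41 | 50 | 0 | 0.065246 |
| 1 | 1985 | 5929 | 73 | 0 | 4 | 6 | 61 | 55 | 1 | 0.120679 |
| 2 | 1985 | 9884 | 73 | 0 | 5 | -4 | 59 | 63 | 0 | 0.105372 |
| 3 | 1985 | 73 | 288 | 0 | 3 | 9 | 50 | 41 | 1 | 0.062881 |
| 4 | 1985 | 3920 | 410 | 0 | 1 | -9 | 54 | 63 | 0 | 0.185282 |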


### Save tournament set with predictions

We'll save the `tournament` dataset, including the `pred` column, for later use in notebook 7.

In [7]:
import os

os.makedirs('./tmp', exist_ok=True)
tournament.to_feather('./tmp/tournament')

Now we can try building a model for the tournament data based on our regular season predictions.

We'll look at a different way to create models with multiple inputs. This method only works for purely numeric data, but it's a much simpler approach to building multivariate neural networks.

Now we have three numeric columns in the tournament dataset: `'seed_diff'`, `'home'`, and `'pred'`. We'll create a neural network that uses a single input layer to process all three of these numeric inputs.

This model should have a single output to predict the tournament game score difference.

In [8]:
# Create an input layer with 3 columns
input_tensor = Input(shape=(3,))

# Connect this input to a Dense layer with 1 unit.
output_tensor = Dense(1)(input_tensor)

# Create a model with input_tensor as the input and output_tensor as the output.
model = Model(input_tensor, output_tensor)

# Compile the model
model.compile(optimizer='adam', loss='mean_absolute_error')
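Since this model is a single `Dense(1)` layer with no activation, it computes a weighted sum of the three input columns plus a bias. The NumPy sketch below spells out that arithmetic. The kernel `W` and bias `b` are hypothetical values chosen for illustration, and the two input rows use rounded values from the first two tournament rows.

```python
import numpy as np

# Hypothetical kernel and bias; Keras learns these during fit.
W = np.array([[0.5], [1.0], [2.0]])  # shape (3, 1): one weight per input column
b = np.array([0.1])                  # shape (1,)

# Two example rows with columns ordered 'home', 'seed_diff', 'pred'.
X = np.array([
    [0.0, -3.0, 0.065],
    [0.0,  4.0, 0.121],
])

# Dense(1) without an activation is a matrix product plus a bias.
y_hat = X @ W + b  # shape (2, 1)
```

With these made-up weights the predictions come out to about -2.77 and 4.342. The model has only four trainable parameters in total: three weights and one bias.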

Now that we've enriched the tournament dataset and built a model to make use of the new data, fit that model to the tournament data.

Note that this model has only one input layer, which handles all 3 inputs, so its inputs and outputs do not need to be lists.

Tournament games are split into a training set and a test set. Games from seasons before 2010 are in the training set, and games from 2010 onward are in the test set.

In [10]:
# split the tournament dataset into train and test
train, test = tournament[:3168], tournament[3168:]
train.shape, test.shape

((3168, 10), (1066, 10))
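The positional slice above relies on the rows being ordered by season. A possible alternative, sketched here on a small hypothetical stand-in DataFrame named `games`, is to split on the `season` column directly so that row order does not matter.

```python
import pandas as pd

# Hypothetical miniature stand-in for the tournament DataFrame.
games = pd.DataFrame({
    'season': [2008, 2009, 2009, 2010, 2011],
    'score_diff': [-2, 11, 5, 29, -17],
})

# Boolean mask: True for seasons before 2010.
is_train = games['season'] < 2010

train_by_season = games[is_train]
test_by_season = games[~is_train]
```

Applied to the real `tournament` frame, the same mask should reproduce the slice at row 3168 only if the file is sorted by season, which the `tail()` and `head()` outputs suggest but do not prove.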

In [13]:
train.tail()

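|   | season | team_1 | team_2 | home | seed_diff | score_diff | score_1 | score_2 | won | pred |
|---|---|---|---|---|---|---|---|---|---|---|
| 3163 | 2009 | 2902 | 10688 | 0 | 7 | -2 | 59 | 61 | 0 | 0.106128 |
| 3164 | 2009 | 10810 | 10688 | 0 | 8 | 11 | 60 | 49 | 1 | 0.113019 |
| 3165 | 2009 | 7024 | 10810 | 0 | 3 | 5 | 60 | 55 | 1 | 0.069208 |
| 3166 | 2009 | 7078 | 10810 | 0 | -9 | -18 | 59 | 77 | 0 | 0.075088 |
| 3167 | 2009 | 10688 | 10810 | 0 | -8 | -11 | 49 | 60 | 0 | 0.06819 |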


In [11]:
test.head()

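|   | season | team_1 | team_2 | home | seed_diff | score_diff | score_1 | score_2 | won | pred |
|---|---|---|---|---|---|---|---|---|---|---|
| 3168 | 2010 | 2365 | 401 | 0 | 15 | 29 | 73 | 44 | 1 | 0.169092 |
| 3169 | 2010 | 10655 | 401 | 0 | 0 | -17 | 44 | 61 | 0 | 0.161586 |
| 3170 | 2010 | 2365 | 647 | 0 | 2 | 7 | 78 | 71 | 1 | 0.081605 |
| 3171 | 2010 | 6757 | 647 | 0 | -8 | -8 | 68 | 76 | 0 | 0.083099 |
| 3172 | 2010 | 7616 | 647 | 0 | -11 | -9 | 59 | 68 | 0 | 0.08294 |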


In [14]:
# Fit the model to the tournament training set (`train`) using 1 epoch.
# The input columns are 'home', 'seed_diff', and 'pred'
# The target column is 'score_diff'.
model.fit(
    train[['home', 'seed_diff', 'pred']], # predictors
    train['score_diff'], # target
    epochs=1,
    verbose=True
)

Epoch 1/1


<keras.callbacks.History at 0x7fed587edb38>

Now that we've fit our model to the tournament training data, evaluate it on the tournament test data. Recall that the tournament test data contains games from 2010 onward.

In [15]:
# Evaluate the model on the tournament test set (`test`)
# Recall that the model's inputs are 'home', 'seed_diff', and 'pred'
# columns and the target column is 'score_diff'
model.evaluate(
    test[['home', 'seed_diff', 'pred']], 
    test['score_diff']
)



9.249671923510354
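The model was compiled with `loss='mean_absolute_error'` and no extra metrics, so the number returned by `evaluate` is the mean absolute error on the test set, in points of score difference. As a sketch of how that quantity is computed, the snippet below defines a hypothetical helper `mean_absolute_error` and applies it to a trivial baseline that predicts a score difference of 0 for every game, using only the five test rows displayed earlier.

```python
import numpy as np

def mean_absolute_error(y_true, y_pred):
    """Average of |y_true - y_pred| over all samples."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(y_true - y_pred)))

# score_diff values from the five test rows shown by test.head().
y_true = [29, -17, 7, -8, -9]

# Baseline for illustration: always predict a tie.
baseline_pred = np.zeros(len(y_true))
baseline_mae = mean_absolute_error(y_true, baseline_pred)
```

Five rows are far too few to judge the model, so a fair comparison would compute the same baseline over the whole `test` frame and set it against the evaluated loss.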