# Final model performance

In this notebook we run our best-performing model on the test set to get a final estimate of its performance.

In [None]:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

## Define the model

In `3_ModelSelection.ipynb`, we used cross-validation to select ...

In [None]:
# define/instantiate model from 3_ModelSelection.ipynb

## Load the data

In [None]:
df_train = pd.read_csv('./train_data.csv')
df_test = pd.read_csv('./test_data.csv')

We now impute missing values and rescale the features. Every transformation below is applied "within season," meaning that the data for a given season is transformed using information from *only* that season and no other.

In particular, since all of the data for a given season is available at the time we make predictions for that season, there is ***no data leakage*** occurring here.
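
To make "within season" concrete, here is a minimal toy sketch rather than the project's actual preprocessing. The column `PTS` and its values are made up for illustration. Each value is standardized using the mean and population standard deviation (the convention `StandardScaler` uses) of its own season only.

```python
import numpy as np
import pandas as pd

# Toy data: two seasons on very different raw scales ('PTS' is a hypothetical column)
toy = pd.DataFrame({
    'SEASON_START': [2000, 2000, 2000, 2001, 2001, 2001],
    'PTS': [10.0, 20.0, 30.0, 100.0, 200.0, 300.0],
})

# z-score each value using statistics from its own season only
grouped = toy.groupby('SEASON_START')['PTS']
season_mean = grouped.transform('mean')
season_std = grouped.transform(lambda s: s.std(ddof=0))  # population std, as in StandardScaler
toy['PTS_scaled'] = (toy['PTS'] - season_mean) / season_std

print(toy)
```

Both seasons end up with identical scaled values even though their raw scales differ by a factor of ten, because neither season's statistics influence the other.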

In [None]:
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler

# apply the same within-season preprocessing to each dataframe (modified in place)
for df in [df_train, df_test]:
    # fill all null stats with 0 (except for SALARY)
    null_cols = df.count()[df.count() < len(df)].index.drop('SALARY', errors='ignore')
    df[null_cols] = df[null_cols].fillna(0)

    # treat a SALARY of 0 as missing, then use mean imputer for SALARY within each season
    df.loc[df['SALARY']==0, 'SALARY'] = None
    mean_imputer = SimpleImputer(strategy='mean')
    df['SALARY'] = (
        df
        .groupby('SEASON_START')['SALARY']
        .transform(lambda x: mean_imputer.fit_transform(x.values.reshape(-1,1)).ravel())
    )

    # use standard scaler within each season (refit on every column of every season)
    cols_to_rescale = df.select_dtypes(include=['float']).columns
    scaler = StandardScaler()
    df[cols_to_rescale] = (
        df
        .groupby('SEASON_START')[cols_to_rescale]
        .transform(lambda x: scaler.fit_transform(x.values.reshape(-1,1)).ravel())
    )

## Testing model performance

To get a realistic estimate of our model's performance, we will perform walk-forward testing. The idea is as follows:

0. **Initialize datasets**: Begin with training set `df_train` (seasons 1990-91 through 2016-17) and test set `df_test[df_test['SEASON_START']==2017]` (season 2017-18).

1. **Train the model**: Fit the model using the current training set.

2. **Evaluate performance**: Use the model to predict outcomes for the current test set and compute performance metrics.

3. **Iterate**: Expand the training set to include the current test set, replace the test set with data from the next season, and repeat from Step 1 until no future data is available.

The following table lists the training and test seasons for each iteration.

| Iteration | Training set start seasons | Test set start season |
| ---       | ---                        | ---                   |
| 1         | 1990 - 2016                | 2017                  |
| 2         | 1990 - 2017                | 2018                  |
| 3         | 1990 - 2018                | 2019                  |
| 4         | 1990 - 2019                | 2020                  |
| 5         | 1990 - 2020                | 2021                  |
| 6         | 1990 - 2021                | 2022                  |
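
The steps above can be sketched as an expanding-window loop. This is an illustration under stated assumptions, not the final implementation. `LinearRegression` stands in for the model chosen in `3_ModelSelection.ipynb`, RMSE is an arbitrary choice of metric, and the data, the feature columns `x1`, `x2`, the target `y`, and the helper `walk_forward_test` are all hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.base import clone
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error


def walk_forward_test(model, df_train, df_test, features, target):
    """Return one row of metrics per test season, using an expanding training window."""
    train = df_train.copy()
    results = []
    for season in sorted(df_test['SEASON_START'].unique()):
        test = df_test[df_test['SEASON_START'] == season]
        # Step 1: fit a fresh, unfitted copy of the model on the current training set
        fitted = clone(model).fit(train[features], train[target])
        # Step 2: predict the held-out season and score the predictions
        preds = fitted.predict(test[features])
        rmse = np.sqrt(mean_squared_error(test[target], preds))
        results.append({'test_season': season, 'n_train': len(train), 'rmse': rmse})
        # Step 3: fold the season just tested into the training set
        train = pd.concat([train, test], ignore_index=True)
    return pd.DataFrame(results)


# Synthetic stand-in data: 20 rows per season for start seasons 1990 through 2022
rng = np.random.default_rng(0)
seasons = np.repeat(np.arange(1990, 2023), 20)
toy = pd.DataFrame({
    'SEASON_START': seasons,
    'x1': rng.normal(size=seasons.size),
    'x2': rng.normal(size=seasons.size),
})
toy['y'] = 2.0 * toy['x1'] - toy['x2'] + rng.normal(scale=0.1, size=len(toy))

toy_train = toy[toy['SEASON_START'] <= 2016]
toy_test = toy[toy['SEASON_START'] >= 2017]

wf_results = walk_forward_test(LinearRegression(), toy_train, toy_test, ['x1', 'x2'], 'y')
print(wf_results)
print('mean RMSE across iterations:', wf_results['rmse'].mean())
```

Using `clone` refits the model from scratch at every iteration, so no fitted state carries over from one season to the next. Only the training data grows.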

In [None]:
# perform walk-forward testing, compute average performance metrics across iterations

## Conclusions

**TODO** write up conclusions, make nice figures, etc.