## Train Final Model

In this notebook we train the final model, using the configuration that
performed best on the chosen metric (recall at k), and save it to disk.

### Import necessary tools

In [6]:
import joblib
import lightfm
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

import clean
import process
import eval

OPTIMAL_LIGHTFM = {'no_components': 30, 'loss': 'warp-kos', 'k': 2}
OPTIMAL_MATRIX = {'method': 'zero_one', 'threshold': 2.5}

In [4]:
raw_df = pd.read_parquet("raw-data.pq")

In [5]:
cleaned_df = clean.merge_similar_name_breweries(raw_df)
cleaned_df = clean.merge_brewery_ids(cleaned_df)
cleaned_df = clean.remove_dup_beer_rows(cleaned_df)
cleaned_df = clean.remove_null_rows(cleaned_df)
cleaned_df = clean.remove_duplicate_reviews(cleaned_df)

### Processing

We now build the interaction matrix from the cleaned data using the
optimal matrix settings (`OPTIMAL_MATRIX`) defined above. The final model
is fit on the full matrix, so no train/test split is made here.

In [7]:
int_matrix_trans = process.InteractionMatrixTransformer(cleaned_df)
matrix = int_matrix_trans.fit(**OPTIMAL_MATRIX)
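The `zero_one` method with a threshold of 2.5 suggests that ratings are binarized before fitting. The internals of `process.InteractionMatrixTransformer` are not shown here, so the following is only a minimal sketch of one way such a matrix could be built. The function name `zero_one_matrix`, the column names `user`, `item` and `rating`, and the choice of `>=` for the comparison are all assumptions for illustration.

```python
import numpy as np
import pandas as pd
from scipy import sparse


def zero_one_matrix(df, threshold):
    """Return a binary user-by-item CSR matrix plus the row/column labels.

    Hypothetical helper: an entry is 1 where the rating is at or above
    `threshold`, and 0 (not stored) otherwise. Assumes at most one review
    per (user, item) pair, as after the duplicate-removal step.
    """
    # Map raw user/item labels to contiguous integer indices.
    user_codes, user_index = pd.factorize(df["user"], sort=True)
    item_codes, item_index = pd.factorize(df["item"], sort=True)

    # Keep only the reviews that count as positive interactions.
    keep = (df["rating"] >= threshold).to_numpy()
    data = np.ones(keep.sum(), dtype=np.float32)

    matrix = sparse.coo_matrix(
        (data, (user_codes[keep], item_codes[keep])),
        shape=(len(user_index), len(item_index)),
    )
    return matrix.tocsr(), user_index, item_index


toy = pd.DataFrame(
    {
        "user": ["a", "a", "b"],
        "item": ["x", "y", "x"],
        "rating": [4.0, 2.0, 3.0],
    }
)
toy_matrix, toy_users, toy_items = zero_one_matrix(toy, threshold=2.5)
```

Users and items whose reviews all fall below the threshold still get a row or column here, which keeps the matrix shape stable across different thresholds.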

### Train model

In [8]:
estimator = lightfm.LightFM(**OPTIMAL_LIGHTFM)

In [9]:
estimator.fit(matrix)

<lightfm.lightfm.LightFM at 0x7fe264acd390>

### Evaluate and Save Model

In [10]:
k_recall = eval.recall_at_k(estimator, matrix)
print(f"Final Recall at k (k=10): {k_recall:5.3f}")

Final Recall at k (k=10): 0.626
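Note that the cell above scores the model on the same matrix it was fitted on, so this figure describes fit to the full data rather than held-out performance. For reference, recall at k for a user is the share of that user's relevant items that appear in their top k recommendations, averaged over users. The local `eval.recall_at_k` helper is not shown, so the sketch below is an assumption about the general technique on dense arrays, with the hypothetical name `recall_at_k_sketch`.

```python
import numpy as np


def recall_at_k_sketch(scores, relevant, k=10):
    """Mean recall at k over users that have at least one relevant item.

    scores:   (n_users, n_items) array of predicted scores.
    relevant: (n_users, n_items) boolean array of known positives.
    """
    scores = np.asarray(scores, dtype=float)
    relevant = np.asarray(relevant, dtype=bool)

    # Indices of the k highest-scoring items for each user.
    top_k = np.argsort(-scores, axis=1)[:, :k]

    # Count how many of those top-k items are relevant.
    hits = np.take_along_axis(relevant, top_k, axis=1).sum(axis=1)
    totals = relevant.sum(axis=1)

    # Users with no relevant items are skipped to avoid division by zero.
    has_relevant = totals > 0
    return float(np.mean(hits[has_relevant] / totals[has_relevant]))


toy_scores = [[0.9, 0.1, 0.8, 0.2], [0.1, 0.9, 0.2, 0.3]]
toy_relevant = [[1, 0, 0, 1], [0, 1, 0, 0]]
toy_recall = recall_at_k_sketch(toy_scores, toy_relevant, k=2)
```

In the toy example the first user has one of two relevant items in the top 2 and the second user has their only relevant item there, giving a mean of 0.75.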


In [11]:
joblib.dump(estimator, "final-model.joblib")

['final-model.joblib']
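The saved file can later be restored with `joblib.load` and used to rank items for a user. The sketch below is one possible way to do that. `top_items` is a hypothetical helper, and it assumes `user_id` is the row index in the interaction matrix (not the raw user identifier) and that the model exposes LightFM's `predict(user_ids, item_ids)` method.

```python
import joblib
import numpy as np


def top_items(model, user_id, n_items, n=5):
    """Return the column indices of the n highest-scoring items for one user."""
    item_ids = np.arange(n_items, dtype=np.int32)
    # Score every item for this user, then sort from highest to lowest.
    scores = np.asarray(model.predict(user_id, item_ids))
    return item_ids[np.argsort(-scores)[:n]]


# Usage, assuming the file written above and the `matrix` built earlier:
# model = joblib.load("final-model.joblib")
# top_items(model, user_id=0, n_items=matrix.shape[1])
```

The returned indices refer to matrix columns, so they would still need to be mapped back to beer identifiers with whatever index the transformer keeps.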