# Learning Model (SVD)
Now that we've implemented naive user-based collaborative filtering (CF), we'll implement a more advanced model: the SVD model.

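The SVD model used in collaborative filtering is a matrix factorization approach: each user and each item is represented by a vector of `k` latent factors, and a rating is predicted from the dot product of the corresponding user and item vectors. The factors are learned iteratively from the observed ratings, with the number of latent factors, the learning rate, and the regularization parameter as hyperparameters.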

One disadvantage of the SVD model is that it can't generalize to unseen items: predictions rely on item and user factors, which are learned during model training.
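To make the role of learned factors concrete, here is a minimal sketch of matrix factorization trained with stochastic gradient descent on a toy dataset. It is not the `ManualSVD` implementation (which is not shown in this notebook); the function names, the toy ratings, the global-mean offset, and the fallback for unknown indices are all assumptions made for illustration.

```python
import numpy as np

def sgd_factorize(ratings, n_users, n_items, mu, k=2, lr=0.05, reg=0.02,
                  n_epochs=500, seed=0):
    """Learn user factors P and item factors Q from (user, item, rating) triples."""
    rng = np.random.default_rng(seed)
    P = rng.normal(scale=0.1, size=(n_users, k))  # user factors
    Q = rng.normal(scale=0.1, size=(n_items, k))  # item factors
    for _ in range(n_epochs):
        for idx in rng.permutation(len(ratings)):
            u, i, r = ratings[idx]
            err = r - (mu + P[u] @ Q[i])          # prediction error
            p_u = P[u].copy()                     # keep old value for the Q update
            P[u] += lr * (err * Q[i] - reg * P[u])
            Q[i] += lr * (err * p_u - reg * Q[i])
    return P, Q

def predict(P, Q, mu, u, i):
    """Predict a rating; unseen users or items have no factors, so fall back to mu."""
    if u >= P.shape[0] or i >= Q.shape[0]:
        return mu
    return float(mu + P[u] @ Q[i])

def rmse(ratings, predict_fn):
    return float(np.sqrt(np.mean([(r - predict_fn(u, i)) ** 2 for u, i, r in ratings])))

# hypothetical toy data: (user index, item index, rating)
ratings = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0),
           (1, 2, 1.0), (2, 1, 2.0), (2, 2, 5.0)]
global_mean = float(np.mean([r for _, _, r in ratings]))

P, Q = sgd_factorize(ratings, n_users=3, n_items=3, mu=global_mean)

rmse_baseline = rmse(ratings, lambda u, i: global_mean)
rmse_model = rmse(ratings, lambda u, i: predict(P, Q, global_mean, u, i))
print(f"baseline RMSE: {rmse_baseline:.3f}, factor model RMSE: {rmse_model:.3f}")

# item index 99 was never seen in training, so only the fallback is available
print(predict(P, Q, global_mean, 0, 99))
```

The last call illustrates the limitation described above: an item without a learned factor vector can only receive a generic fallback prediction.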

In [1]:
%%capture
import sys
import os

# Add project root to Python path
project_root = os.path.abspath("..")
if project_root not in sys.path:
    sys.path.append(project_root)
# import packages
from utils.imports import *
# import user-defined funcs and classes
from utils.helpers import plot_heatmap
from utils.helpers import safe_len
from models.ManualSVD import ManualSVD

In [2]:
# load pandas dataframes
with open("../data/dataframes.pkl", "rb") as f:
    data = pickle.load(f)

train = data["train"]
validation = data["validation"]
baseline = data["baseline"]

# load sparse matrix
ui_csr = load_npz("../data/ui_csr.npz")

# load encodings
with open("../artifacts/user_encoder.pkl", "rb") as f:
    user_encoder = pickle.load(f)
with open("../artifacts/item_encoder.pkl", "rb") as f:
    item_encoder = pickle.load(f)
with open("../artifacts/user_map.pkl", "rb") as f:
    user_map = pickle.load(f)
with open("../artifacts/item_map.pkl", "rb") as f:
    item_map = pickle.load(f)

In [3]:
# this grid search takes a long time to run

k_grid = np.array([25, 50, 75, 100])  # number of latent factors
reg_grid = np.array([0.001, 0.02, 0.1])  # regularization strengths
N = 10  # list length for top-N coverage
models = []
# dataframe for grid search results
grid_search = pd.DataFrame(columns=('reg', 'k', 'RMSE', 'MAE', 'coverage'))
for k, reg in itertools.product(k_grid, reg_grid):
    # fit model with grid params
    model = ManualSVD(k=k, reg=reg)
    model.fit(ui_csr, validation, verbose=False)
    # evaluate: training set item catalog coverage @ top 10 beers
    coverage = model.top_N_coverage(N)
    # add results to dataframe
    grid_search.loc[len(grid_search)] = (reg, k, model.val_RMSE_clipped, model.val_MAE, coverage)
    # save model
    models.append(model)
    print('\n')

Final validation RMSE is: 0.695346195478145
Params: 25 latent factors, 0.005 learning rate, 0.001 reg. parameter
Stopped after 14 iterations


Final validation RMSE is: 0.694153797655426
Params: 25 latent factors, 0.005 learning rate, 0.02 reg. parameter
Stopped after 15 iterations


Final validation RMSE is: 0.6942797319143535
Params: 25 latent factors, 0.005 learning rate, 0.1 reg. parameter
Stopped after 14 iterations


Final validation RMSE is: 0.7000727484219289
Params: 50 latent factors, 0.005 learning rate, 0.001 reg. parameter
Stopped after 12 iterations


Final validation RMSE is: 0.697586883050648
Params: 50 latent factors, 0.005 learning rate, 0.02 reg. parameter
Stopped after 13 iterations


Final validation RMSE is: 0.6960137657139708
Params: 50 latent factors, 0.005 learning rate, 0.1 reg. parameter
Stopped after 13 iterations


Final validation RMSE is: 0.7035930348028471
Params: 75 latent factors, 0.005 learning rate, 0.001 reg. parameter
Stopped after 11 iterations
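The grid search records item catalog coverage at the top 10 alongside the error metrics. How `top_N_coverage` computes this is not shown here, so the following is only a sketch of one common definition: the fraction of catalog items that appear in at least one user's top-N list. The function name `top_n_coverage`, the `seen` mask, and the toy factor matrices are hypothetical.

```python
import numpy as np

def top_n_coverage(P, Q, n, seen=None):
    """Fraction of catalog items appearing in at least one user's top-n list."""
    scores = P @ Q.T                               # (n_users, n_items) predicted scores
    if seen is not None:
        scores = np.where(seen, -np.inf, scores)   # exclude already-rated items
    # indices of the n highest-scoring items per user (order within the n is arbitrary)
    top_n = np.argpartition(-scores, n - 1, axis=1)[:, :n]
    return np.unique(top_n).size / Q.shape[0]

# hypothetical toy factors: 3 users, 5 items, 2 latent factors
P = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
Q = np.array([[3.0, 0.0], [2.0, 0.0], [1.0, 0.0], [0.0, 3.0], [0.0, 2.0]])

coverage = top_n_coverage(P, Q, n=2)

# mark item 0 as already rated by user 0, which pushes item 2 into that user's list
seen = np.zeros((3, 5), dtype=bool)
seen[0, 0] = True
coverage_unseen_only = top_n_coverage(P, Q, n=2, seen=seen)

print(coverage, coverage_unseen_only)
```

In this toy case, item 2 is never recommended unless a user's higher-scoring items are masked out, which is why coverage rises once seen items are excluded.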



In [4]:
# save results of grid search
with open("../artifacts/models.pkl", "wb") as f:
    pickle.dump(models, f)
with open("../data/grid_search.pkl", "wb") as f:
    pickle.dump(grid_search, f)
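Once the results are saved, a natural next step is to compare parameter combinations. The sketch below shows one way to do that with pandas. As an assumption for illustration, it rebuilds a small frame from the seven RMSE values printed in the training log above; the real `grid_search` frame stores `val_RMSE_clipped` (which may differ from the printed values) and also has `MAE` and `coverage` columns that are omitted here.

```python
import pandas as pd

# RMSE values copied from the printed training log (first seven combinations only)
results = pd.DataFrame(
    [
        (0.001, 25, 0.695346195478145),
        (0.02, 25, 0.694153797655426),
        (0.1, 25, 0.6942797319143535),
        (0.001, 50, 0.7000727484219289),
        (0.02, 50, 0.697586883050648),
        (0.1, 50, 0.6960137657139708),
        (0.001, 75, 0.7035930348028471),
    ],
    columns=("reg", "k", "RMSE"),
)

# one row per k, one column per reg; missing combinations show up as NaN
pivot = results.pivot(index="k", columns="reg", values="RMSE")
print(pivot.round(4))

# row with the lowest validation RMSE
best = results.sort_values("RMSE").iloc[0]
print(f"lowest RMSE: k={int(best['k'])}, reg={best['reg']}")
```

Among the seven logged runs, the lowest RMSE belongs to 25 latent factors with a regularization parameter of 0.02, though the differences between runs are small.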