Precondition: run Task2_clean.ipynb before running this code.
That notebook creates the input file containing the Customer-Product utility matrix.

Model-based recommendation system.
We found that the memory-based item recommendation system gave poor results.
This is because the Customer-Product matrix is large and sparse, so few pairs of products
share enough customers to produce good correlations. (See Task2_collab_item.ipynb)
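
To make the sparsity argument concrete, the density of a customer-product matrix can be estimated from the long-format purchase rows. The sketch below uses only the five sample rows displayed in this notebook as toy data; the `matrix_density` helper is hypothetical and not part of the project code.

```python
def matrix_density(rows):
    """Return the fraction of customer-product cells that hold a purchase.

    rows: list of (customer_id, product_code, order_amount) tuples.
    """
    customers = {customer for customer, _, _ in rows}
    products = {product for _, product, _ in rows}
    # Distinct (customer, product) pairs that actually occur
    filled = {(customer, product) for customer, product, _ in rows}
    return len(filled) / (len(customers) * len(products))


# Toy data: the five sample rows of the recommender dataframe
sample_rows = [
    ("90fada91", 5002.0, 1),
    ("9006f9ac", 35012.0, 1),
    ("32270891", 5005.0, 1),
    ("97e03e47", 35078.5, 1),
    ("41949228", 49291.5, 5),
]

density = matrix_density(sample_rows)
print(f"density = {density:.2f}, sparsity = {1 - density:.2f}")
```

On the full dataset the density is likely to be much lower, since most customers buy only a handful of the available products.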

To install the surprise library, go to your anaconda prompt and type: 
>conda install -c conda-forge scikit-surprise

In [37]:
import pandas as pd
import numpy as np
from product_data import ProductData
from surprise import Reader, Dataset, SVD, accuracy
from surprise.model_selection import cross_validate, train_test_split

In [38]:
file_name = "recom_pivot.csv"
prod = ProductData(file_name)
prod.set_orig_dataframe_data_types()
prod.df_orig_recommender.head()

Unnamed: 0,Customer_ID,Code_Product,Order_Amount
0,90fada91,5002.0,1
1,9006f9ac,35012.0,1
2,32270891,5005.0,1
3,97e03e47,35078.5,1
4,41949228,49291.5,5


In [39]:
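# Reader() uses Surprise's default rating scale of 1 to 5
reader = Reader()
# load_from_df expects the columns in the order: user, item, rating
data = Dataset.load_from_df(prod.df_orig_recommender, reader)
svd = SVD()
# 3-fold cross-validation, reporting RMSE on each held-out fold
cross_validate(svd, data, measures=["RMSE"], cv=3)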

{'test_rmse': array([0.71280381, 0.77660473, 0.72865585]),
 'fit_time': (0.4248635768890381, 0.3991405963897705, 0.4308507442474365),
 'test_time': (0.08679533004760742, 0.08078384399414062, 0.16452836990356445)}

The best (lowest) RMSE across the three folds is 0.71, with a mean of about 0.74, which is very impressive.
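
The fold scores printed above can be summarised in a few lines. This sketch copies the three `test_rmse` values from the output and uses only the standard library; in the notebook itself they could be read from the dictionary returned by `cross_validate` instead.

```python
from statistics import mean, stdev

# test_rmse values copied from the cross_validate output above
fold_rmse = [0.71280381, 0.77660473, 0.72865585]

best_rmse = min(fold_rmse)   # lower RMSE is better
mean_rmse = mean(fold_rmse)
spread = stdev(fold_rmse)    # sample standard deviation across folds

print(f"best = {best_rmse:.2f}, mean = {mean_rmse:.2f}, stdev = {spread:.3f}")
```

The mean across folds is usually the fairer summary, because the best single fold can be lucky.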

Use the full dataset for training

In [40]:
trainset = data.build_full_trainset()
svd.fit(trainset)

<surprise.prediction_algorithms.matrix_factorization.SVD at 0x1eb86e3b280>

Choose a customer for testing

In [41]:
customer = "00e7053f"
prod.df_orig_recommender[prod.df_orig_recommender["Customer_ID"] == customer]

Unnamed: 0,Customer_ID,Code_Product,Order_Amount
9397,00e7053f,5001.0,1
12882,00e7053f,49292.0,4
17920,00e7053f,5000.5,3
38532,00e7053f,49291.5,1
38766,00e7053f,5027.0,1
49027,00e7053f,5000.5,2


Predict the order amount for a mix of products the customer has and has not bought

In [42]:
Code_Products = [5001.0, 49292.0, 5000.5, 49291.5, 5027.0, 5000.5, 35087.0, 40017.5, 10001.0]
Customer_Products_Bought = [5001.0, 49292.0, 5000.5, 49291.5, 5027.0, 5000.5]

print(f"For Customer = {customer}.")
for code in Code_Products:
    if code in Customer_Products_Bought:
        print(f"\tProduct = {code} bought")
    else:
        print(f"\tProduct = {code} NOT bought")
    p = svd.predict(customer, code)
    print(f"\t{p}\n\t----------------------------")

For Customer = 00e7053f.
	Product = 5001.0 bought
	user: 00e7053f   item: 5001.0     r_ui = None   est = 1.52   {'was_impossible': False}
	----------------------------
	Product = 49292.0 bought
	user: 00e7053f   item: 49292.0    r_ui = None   est = 1.65   {'was_impossible': False}
	----------------------------
	Product = 5000.5 bought
	user: 00e7053f   item: 5000.5     r_ui = None   est = 1.74   {'was_impossible': False}
	----------------------------
	Product = 49291.5 bought
	user: 00e7053f   item: 49291.5    r_ui = None   est = 1.38   {'was_impossible': False}
	----------------------------
	Product = 5027.0 bought
	user: 00e7053f   item: 5027.0     r_ui = None   est = 1.28   {'was_impossible': False}
	----------------------------
	Product = 5000.5 bought
	user: 00e7053f   item: 5000.5     r_ui = None   est = 1.74   {'was_impossible': False}
	----------------------------
	Product = 35087.0 NOT bought
	user: 00e7053f   item: 35087.0    r_ui = None   est = 1.50   {'was_impossible': False}

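The recommender does not seem able to predict zero sales for a customer, even though zero would be correct for most customer-product pairs. This is probably because the rows shown contain only actual purchases (Order_Amount of 1 or more) and because Reader() defaults to a rating scale of 1 to 5.
The estimates also vary every time the code is run, since SVD initialises its factors randomly.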
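
One plausible mechanism behind the missing zeros is clipping: Surprise clips each estimate to the reader's rating scale by default. The sketch below imitates that step in plain Python; `clip_to_scale` is a hypothetical helper for illustration, not a Surprise function, and the raw estimates are made up.

```python
def clip_to_scale(raw_estimate, rating_scale=(1, 5)):
    """Clamp a raw estimate into the rating scale, as a stand-in for
    the clipping Surprise applies to predictions by default."""
    lower, upper = rating_scale
    return max(lower, min(upper, raw_estimate))


# Made-up raw estimates, chosen only to show the effect
for raw in (-0.3, 0.0, 1.74, 6.2):
    print(f"raw = {raw:5.2f} -> clipped = {clip_to_scale(raw):.2f}")

# With a scale whose lower bound is 0, a zero estimate would survive
zero_kept = clip_to_scale(0.0, rating_scale=(0, 5))
print(zero_kept)
```

If zero order amounts were added to the training data, passing a `rating_scale` with a lower bound of 0 to `Reader` would be one way to let the model output them; whether that helps on this dataset is untested.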

In [43]:
train_data, test_data = train_test_split(data, test_size=0.2)
svd.fit(train_data)

<surprise.prediction_algorithms.matrix_factorization.SVD at 0x1eb86e3b280>
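
The split in the cell above is shuffled at random, so the test set, and therefore the RMSE, changes from run to run. Surprise's `train_test_split` and `SVD` both accept a `random_state` argument to fix this. The standard-library sketch below shows the underlying idea with a hypothetical `shuffled_split` helper: the same seed always yields the same split.

```python
import random


def shuffled_split(items, test_size=0.2, seed=None):
    """Shuffle a copy of items and split it into (train, test) lists."""
    rng = random.Random(seed)      # private generator; seed=None is unseeded
    shuffled = list(items)
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_size)
    return shuffled[n_test:], shuffled[:n_test]


ratings = list(range(10))          # stand-in for ten rating rows

train_a, test_a = shuffled_split(ratings, seed=42)
train_b, test_b = shuffled_split(ratings, seed=42)

print(test_a == test_b)                      # same seed, same split
print(sorted(train_a + test_a) == ratings)   # no rows lost or duplicated
```

In the notebook, the equivalent would be something like `train_test_split(data, test_size=0.2, random_state=42)` together with `SVD(random_state=42)`; the seed value 42 is arbitrary.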

In [44]:
predictions = svd.test(test_data)
accuracy.rmse(predictions)

RMSE: 0.7886


0.7885525714890977
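
For reference, RMSE is the square root of the mean squared difference between the true and estimated ratings. A minimal sketch with made-up (true, estimated) pairs, independent of Surprise; the `rmse` helper here is hypothetical and takes plain tuples rather than Surprise prediction objects.

```python
import math


def rmse(pairs):
    """Root mean squared error over (true_rating, estimate) pairs."""
    squared_errors = [(true - est) ** 2 for true, est in pairs]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Made-up values for illustration only
toy_pairs = [(1, 1.5), (3, 2.5), (5, 4.0)]
toy_rmse = rmse(toy_pairs)
print(f"RMSE: {toy_rmse:.4f}")
```

Read this way, an RMSE of about 0.79 means the estimates miss the held-out Order_Amount values by roughly 0.8 units in the root-mean-square sense.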