Now we attempt to build our baseline.

Several Python packages can help us build a recommender system. For our baseline, we use the `Surprise` package, which ships with several prediction algorithms.

We need to load our data into `Surprise` as a custom dataset. According to the documentation, the data frame must have three columns, in this order: the user id, the item id, and the rating. We also need to specify the rating scale; in our case, users rate a product with a whole number from 1 to 5.

`Surprise` also provides utilities for splitting the data into training and test sets, which we use here.

In [4]:
import pandas as pd
import numpy as np

from surprise import accuracy
from surprise import Dataset
from surprise import Reader
from surprise.model_selection import train_test_split

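# Load the preprocessed reviews.
reviews = pd.read_csv('preprocessed.csv')

# Surprise expects the columns in the order: user id, item id, rating.
reader = Reader(rating_scale=(1, 5))
data = Dataset.load_from_df(reviews[['customer_id', 'product_id', 'star_rating']], reader)

# Hold out 20% of the ratings for testing.
train_set, test_set = train_test_split(data, test_size=0.2)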

Briefly, a baseline estimate starts from the average rating across the entire dataset and adjusts it for the particular item and the particular user. Some items receive much higher ratings than others, and some users rate more critically than others. The baseline captures these deviations as an item bias and a user bias and uses them to predict a score.
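As a small worked example of that idea, the sketch below adds a user bias and an item bias to the global mean. All numbers are hypothetical, and clipping the result to the 1 to 5 scale is an assumption made here to match our rating scale.

```python
# Hypothetical numbers, chosen only to illustrate the arithmetic.
global_mean = 4.0   # average of all ratings in the dataset
user_bias = -0.5    # this user rates half a star below average
item_bias = 0.3     # this item is rated 0.3 stars above average


def baseline_estimate(global_mean, user_bias, item_bias, low=1.0, high=5.0):
    """Return global_mean + user_bias + item_bias, clipped to [low, high]."""
    raw = global_mean + user_bias + item_bias
    return min(high, max(low, raw))


print(baseline_estimate(global_mean, user_bias, item_bias))
```

A critical user and a well-liked item partly cancel out, so the estimate lands slightly below the global mean.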

Two popular algorithms for computing baseline estimates are Stochastic Gradient Descent (SGD) and Alternating Least Squares (ALS). We use ALS here, which is commonly used for collaborative filtering.
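To make the ALS idea concrete, here is a minimal pure-Python sketch that alternates between solving for item biases and user biases with regularization. It is an illustration under stated assumptions, not the `Surprise` implementation: the function name `als_baseline`, the toy ratings, and the default parameter values are all chosen here for the example.

```python
from collections import defaultdict


def als_baseline(ratings, n_epochs=10, reg_u=15, reg_i=10):
    """Estimate user and item biases by alternating least squares.

    ratings: list of (user, item, rating) tuples.
    Returns (global_mean, user_bias, item_bias).
    """
    global_mean = sum(r for _, _, r in ratings) / len(ratings)
    user_bias = defaultdict(float)
    item_bias = defaultdict(float)

    # Index the ratings by user and by item for the two update steps.
    by_user = defaultdict(list)
    by_item = defaultdict(list)
    for u, i, r in ratings:
        by_user[u].append((i, r))
        by_item[i].append((u, r))

    for _ in range(n_epochs):
        # Hold user biases fixed and solve for each item bias.
        for i, rows in by_item.items():
            residual = sum(r - global_mean - user_bias[u] for u, r in rows)
            item_bias[i] = residual / (reg_i + len(rows))
        # Hold item biases fixed and solve for each user bias.
        for u, rows in by_user.items():
            residual = sum(r - global_mean - item_bias[i] for i, r in rows)
            user_bias[u] = residual / (reg_u + len(rows))

    return global_mean, dict(user_bias), dict(item_bias)


# Hypothetical toy ratings: (user, item, rating).
toy = [("u1", "A", 5), ("u1", "B", 4), ("u2", "A", 2), ("u2", "B", 1), ("u3", "A", 4)]
mu, b_u, b_i = als_baseline(toy)
print(mu, b_u, b_i)
```

The regularization terms `reg_u` and `reg_i` shrink the biases of users and items with few ratings toward zero, which matters for sparse data like ours.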

We use `GridSearchCV` to search for good parameters. With more computational resources or time, a wider search might find a better configuration.

In [13]:
from surprise.model_selection import GridSearchCV
from surprise.prediction_algorithms.baseline_only import BaselineOnly
import random

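# The candidate values are sampled at random without a fixed seed,
# so the grid (and possibly the best parameters) can differ between runs.
param_grid = {'bsl_options': {'method': ['als'],
                              'n_epochs': random.sample(range(10, 20), 5),
                              'reg_u': random.sample(range(10, 30), 5),
                              'reg_i': random.sample(range(10, 30), 5)}}

# 5-fold cross-validation over the full dataset, using all available cores.
gs = GridSearchCV(BaselineOnly, param_grid, measures=['RMSE', 'MAE'], cv=5, n_jobs=-1)
gs.fit(data)
print(gs.best_score['rmse'], gs.best_params['rmse'])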

1.2418786306124092 {'bsl_options': {'method': 'als', 'n_epochs': 14, 'reg_u': 16, 'reg_i': 15}}
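To get a feel for the cost of this search, the sketch below counts the parameter combinations in a grid of the same shape as the one above. It assumes every combination is fitted once per fold; the fixed seed is added here only so the example is repeatable.

```python
import itertools
import random

random.seed(0)  # Assumption: fixed seed so the sampled grid is repeatable.

grid = {
    'method': ['als'],
    'n_epochs': random.sample(range(10, 20), 5),
    'reg_u': random.sample(range(10, 30), 5),
    'reg_i': random.sample(range(10, 30), 5),
}

# Every combination of one value per parameter.
combinations = list(itertools.product(*grid.values()))
n_folds = 5
n_fits = len(combinations) * n_folds
print(len(combinations), n_fits)
```

With 5 candidates for each of three parameters, the search covers 125 combinations, or 625 model fits under 5-fold cross-validation, which is why a wider grid quickly becomes expensive.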


In [14]:
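# Refit on the training split with the best parameters from the grid search.
baseline = BaselineOnly(bsl_options={'method': 'als', 'n_epochs': 14, 'reg_u': 16, 'reg_i': 15})
fit = baseline.fit(train_set)

# Predict the held-out ratings and report the test RMSE.
predictions = fit.test(test_set)
accuracy.rmse(predictions, verbose=False)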

Estimating biases using als...


1.2409841103484036
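For reference, RMSE is the square root of the mean squared difference between true and estimated ratings. The sketch below computes it by hand on a few hypothetical (true rating, estimate) pairs; the helper name `rmse` is defined here for illustration and is separate from `accuracy.rmse`.

```python
import math


def rmse(pairs):
    """Root mean squared error over (true_rating, estimated_rating) pairs."""
    squared_errors = [(true_r - est) ** 2 for true_r, est in pairs]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical ratings and estimates.
sample = [(5, 4.0), (3, 4.0), (4, 4.0), (1, 3.0)]
print(rmse(sample))
```

Because the errors are squared before averaging, a single estimate that is off by two stars contributes as much as four estimates that are each off by one star.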

With this model, we can build a preliminary "Recommendation" System. The `Surprise` documentation includes a function for getting the top-N recommendations for each user (https://surprise.readthedocs.io/en/stable/FAQ.html#how-to-get-the-top-n-recommendations-for-each-user), which we reuse here.


In [15]:
from collections import defaultdict

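def get_top_n(predictions, n=10):
    """Return the top-n recommendations for each user from a list of predictions.

    Returns a dict mapping each user id to a list of (item id, estimated rating)
    tuples, sorted by estimated rating, of length at most n.
    """

    # First map the predictions to each user.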
    top_n = defaultdict(list)
    for uid, iid, true_r, est, _ in predictions:
        top_n[uid].append((iid, est))

    # Then sort the predictions for each user and retrieve the n highest ones.
    for uid, user_ratings in top_n.items():
        user_ratings.sort(key=lambda x: x[1], reverse=True)
        top_n[uid] = user_ratings[:n]

    return top_n


top_n = get_top_n(predictions, n=10)

In [25]:
top_n[reviews.customer_id[1]]

[('B009DRP9RU', 4.03243074351026),
 ('B0043ZWQWI', 4.018181078562333),
 ('B004JKBEUM', 4.013568309771048),
 ('B0043ZYMP2', 4.009889495014582),
 ('B000FV8PTM', 3.9994831839353524),
 ('B003P1QDL6', 3.969674531673862),
 ('B002WJIQA8', 3.9067737247645327),
 ('B0044XDZII', 3.9045231337715585),
 ('B005MKGOOY', 3.88496863830129),
 ('B001QFYKMW', 3.8682906142917846)]

Because our rating data is quite sparse, the baseline system cannot always return a full top-10 list: many customers have rated only one product, and many products have only one rating. The lists are also built from test-set predictions, so a customer's list can be no longer than the number of their ratings that fell into the test split. We will examine this issue in our future work for the Final.
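One way to quantify this is to count how many users end up with fewer than n recommendations. The sketch below does so on a hypothetical dictionary with the same shape as the output of `get_top_n`; the helper `count_short_lists` and the toy data are illustrative assumptions, not part of `Surprise`.

```python
def count_short_lists(top_n, n=10):
    """Return how many users have fewer than n recommendations."""
    return sum(1 for recs in top_n.values() if len(recs) < n)


# Hypothetical top-N output for three users.
toy_top_n = {
    'user_a': [('item_1', 4.2), ('item_2', 3.9)],
    'user_b': [('item_3', 4.5)],
    'user_c': [('item_%d' % k, 4.0) for k in range(10)],
}

short = count_short_lists(toy_top_n, n=10)
print(short, len(toy_top_n))
```

Running `count_short_lists(top_n, n=10)` on our real `top_n` would show how widespread the problem is. The example in the `Surprise` FAQ predicts on `trainset.build_anti_testset()`, that is, on user-item pairs with no known rating, which may be a starting point for producing fuller lists.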