In [1]:
%matplotlib inline
%load_ext autoreload
%autoreload 2

import pandas as pd
import numpy as np

The project will focus on the top-k recommendation problem; that is, the evaluation only applies to the first k items our recommender systems suggest.

It's useful to create a synthetic dataset to test our evaluator. For simplicity, a single user is enough.

We only need a synthetic test set, since our dummy recommender has no real implementation. Note that test_ratings contains only 5-star ratings. This mirrors what we will do when we split the real dataset into a training set and a testing set: only 5-star ratings go into the testing set. Most papers use this methodology, although later on I'll revisit other methodologies as well.
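A minimal sketch of that split, assuming the full ratings table has the same `user_id`, `book_id`, `rating` columns as `test_ratings`. The helper `split_five_star` and its `test_frac`/`seed` parameters are hypothetical, not part of the project yet: for each user it samples a fraction of their 5-star ratings into the test set and keeps everything else for training.

```python
import pandas as pd

def split_five_star(ratings, test_frac=0.2, seed=42):
    """Hypothetical helper: hold out part of each user's 5-star ratings for testing."""
    # Only 5-star ratings are eligible for the test set
    five_star = ratings[ratings["rating"] == 5]
    # Sample the same fraction of 5-star ratings from every user
    test = five_star.groupby("user_id").sample(frac=test_frac, random_state=seed)
    # Everything not held out (including all non-5 ratings) goes to training
    train = ratings.drop(test.index)
    return train, test

# Made-up toy ratings for two users, only to exercise the helper
ratings = pd.DataFrame({
    "user_id": [1, 1, 1, 1, 2, 2, 2],
    "book_id": [1, 2, 3, 4, 1, 2, 5],
    "rating":  [5, 3, 5, 4, 5, 5, 2],
})
train, test = split_five_star(ratings, test_frac=0.5)
print(test)
```

Sampling per user (rather than over the whole table) keeps every user with at least two 5-star ratings represented in both sets.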

In [2]:
test_ratings = pd.DataFrame({
    'user_id': [1, 1, 1, 1],
    'book_id': [1, 2, 3, 4],
    'rating': [5, 5, 5, 5]
})
test_ratings

   user_id  book_id  rating
0        1        1       5
1        1        2       5
2        1        3       5
3        1        4       5


Next, our dummy recommender.

Our recommender's output is expected to be a DataFrame with one column per user_id, where each column lists, in rank order, the top-k items suggested to that user. For simplicity, k=2 here, although k=10 will be our main focus.
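To make that layout concrete, here is a sketch of how a real model might produce it, assuming the model first scores every (user, book) pair into a table with users as rows and book_ids as columns. The helper `top_k_frame` and the score values are hypothetical, chosen so the result matches the dummy recommender's output.

```python
import pandas as pd

def top_k_frame(scores, k):
    """Hypothetical helper: turn a user x book score table into the
    one-column-per-user top-k layout the evaluator expects."""
    # For each user (row), keep the k highest-scoring book_ids in rank order.
    # Assumes every user has at least k scored books, so all columns have length k.
    return pd.DataFrame({
        user_id: row.nlargest(k).index.to_list()
        for user_id, row in scores.iterrows()
    })

# One user (user_id 1) with made-up scores for five books (book_id 1..5)
scores = pd.DataFrame(
    [[0.9, 0.1, 0.3, 0.2, 0.8]],
    index=[1],
    columns=[1, 2, 3, 4, 5],
)
preds = top_k_frame(scores, k=2)
print(preds)  # column 1 holds books 1 and 5
```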

In [16]:
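class DummyRecommender:
    def fit(self, training_set):
        # Nothing to learn: the dummy ignores the training data
        pass

    def predict(self):
        # One column per user_id; rows are the top-k book_ids in rank order.
        # User 1 always gets books 1 and 5 (k=2).
        return pd.DataFrame({
            1: [1, 5]
        })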

As we can see, the dummy recommender suggests items 1 and 5 to user 1.

In [19]:
model = DummyRecommender()
model.fit(None)  # fit() ignores its argument, so no training set is needed
model.predict()

   1
0  1
1  5


Now for our evaluator.

In [67]:
class Evaluator:
    def __init__(self, k=10, training_set=None, testing_set=None):
        self.k = k
        self.training_set = training_set
        self.testing_set = testing_set
        self.preds = None  # filled in by evaluate()
        self.result = {}

    def _precision(self):
        # precision@k: fraction of the k recommended items found in the user's test set
        precisions = []
        for user_id in self.preds.columns:
            pred = self.preds[user_id].iloc[:self.k]  # only the top k count
            truth = self.testing_set[self.testing_set.user_id == user_id].book_id
            precisions.append(np.isin(pred, truth).sum() / self.k)
        # Average over all users
        return float(np.mean(precisions))

    def _recall(self):
        # recall@k: fraction of the user's test items found among the k recommendations
        recalls = []
        for user_id in self.preds.columns:
            pred = self.preds[user_id].iloc[:self.k]
            truth = self.testing_set[self.testing_set.user_id == user_id].book_id
            recalls.append(np.isin(pred, truth).sum() / truth.count())
        return float(np.mean(recalls))

    def evaluate(self, model):
        # Train on the training set, then score the model's top-k predictions
        model.fit(self.training_set)
        self.preds = model.predict()
        self.result['precision'] = self._precision()
        self.result['recall'] = self._recall()

    def print_result(self):
        print(self.result)

For now we'll start with precision and recall. Later, the project will focus on other metrics that reveal qualities of a recommender beyond plain effectiveness.

Let's test it on our synthetic testing set:

In [70]:
evl = Evaluator(k=2, testing_set=test_ratings)
evl.evaluate(model)
evl.print_result()

{'precision': 0.5, 'recall': 0.25}


The dummy prediction is [1, 5].

The true (5-star) items are [1, 2, 3, 4].

Of the k = 2 recommended items, only item 1 is in the truth array, so precision = hits / k = 1 / 2 = 0.5.

Recall = hits / number of items the user likes = 1 / 4 = 0.25.
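In general form (the notation is introduced here, not taken from the notebook): let U be the set of users in the predictions, R_u^k the top-k items recommended to user u, and T_u the items u rated 5 in the test set. The evaluator's two methods then compute

$$
\mathrm{Precision@}k = \frac{1}{|U|}\sum_{u \in U}\frac{|R_u^k \cap T_u|}{k},
\qquad
\mathrm{Recall@}k = \frac{1}{|U|}\sum_{u \in U}\frac{|R_u^k \cap T_u|}{|T_u|}
$$

For the synthetic user, R_1^2 = {1, 5} and T_1 = {1, 2, 3, 4}, so |R_1^2 ∩ T_1| = 1, giving 1/2 and 1/4.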

So far, the evaluator works as intended.
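As an extra check, the same numbers can be recomputed independently without per-user loops. This is a sketch, not the evaluator itself; the function name `precision_recall_at_k` is hypothetical. It reshapes the wide predictions into (user_id, book_id) pairs with `melt`, finds hits with an inner `merge` against the test set, and averages per user.

```python
import pandas as pd

def precision_recall_at_k(preds, testing_set, k):
    """Hypothetical loop-free cross-check of Evaluator's precision/recall."""
    top_k = preds.iloc[:k]
    # Wide (one column per user) -> long (user_id, book_id) pairs
    recommended = top_k.melt(var_name="user_id", value_name="book_id")
    # A hit is a recommended (user_id, book_id) pair that also appears in the test set;
    # users with no hits are restored with a count of 0
    hits = (
        recommended.merge(testing_set[["user_id", "book_id"]], on=["user_id", "book_id"])
        .groupby("user_id").size()
        .reindex(top_k.columns, fill_value=0)
    )
    # Number of relevant (test-set) items per user
    n_relevant = testing_set.groupby("user_id").size().reindex(top_k.columns)
    precision = float((hits / k).mean())
    recall = float((hits / n_relevant).mean())
    return precision, recall

test_ratings = pd.DataFrame({
    "user_id": [1, 1, 1, 1],
    "book_id": [1, 2, 3, 4],
    "rating": [5, 5, 5, 5],
})
preds = pd.DataFrame({1: [1, 5]})
print(precision_recall_at_k(preds, test_ratings, k=2))  # (0.5, 0.25)
```

One caveat of this design: a user with no test items gets a NaN recall, which `mean()` skips, whereas the loop in `Evaluator._recall` would carry that NaN into the average, so the two versions can disagree in that edge case.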

For the next two or three notebooks, this evaluator will be copied over and tested further before being moved into its own file.