# Title of the notebook

Here we explain what we are trying to achieve in this notebook. We add short pieces of text where they are needed to tell a story and document the process correctly.

The file should also have a meaningful name that reflects its purpose. Update the `readme.md` with a good description of the notebook contents.


In [1]:
import os, sys, inspect
currentdir = os.path.dirname(os.path.abspath(inspect.getfile(inspect.currentframe())))
parentdir = os.path.dirname(currentdir)
# Add the parent dir to the path so that Python finds the lenskit package
sys.path.insert(0, parentdir)

In [2]:
from lenskit import batch, topn, util
from lenskit import crossfold as xf
from lenskit.algorithms import Recommender, als, item_knn as knn, basic

In [3]:
import numpy as np
import pandas as pd
%matplotlib inline

We load the dataset. It consists of ... bla bla

In [4]:
dataset = os.path.join(parentdir, 'ml-latest-small/ratings.1.csv')
ratings = pd.read_csv(dataset, sep=',',
                      names=['user', 'item', 'rating', 'timestamp'], header=0)

In [5]:
ratings.head()

,user,item,rating,timestamp
0,1,31,2.5,1260759144
1,1,1029,3.0,1260759179
2,1,1061,3.0,1260759182
3,1,1129,2.0,1260759185
4,1,1172,4.0,1260759205


We define the algorithms that we are going to test.

In [6]:
algo_ii = knn.ItemItem(20)
algo_als = als.BiasedMF(50)

In [7]:
def eval(aname, algo, train, test):
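    """Fit a fresh copy of an algorithm and recommend for the test users.

    Args:
        aname: Name stored in the 'Algorithm' column of the result.
        algo: LensKit algorithm to clone and fit.
        train: Training ratings.
        test: Test ratings; recommendations are made for its unique users.

    Returns:
        Data frame with up to 100 recommendations per test user.
    """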
    fittable = util.clone(algo)
    fittable = Recommender.adapt(fittable)
    fittable.fit(train)
    users = test.user.unique()
    # now we run the recommender
    recs = batch.recommend(fittable, users, 100)
    # add the algorithm name for analyzability
    recs['Algorithm'] = aname
    return recs

In [8]:
all_recs = []
test_data = []
for train, test in xf.partition_users(ratings[['user', 'item', 'rating']], 2, xf.SampleFrac(0.2)):
    test_data.append(test)
    all_recs.append(eval('ItemItem', algo_ii, train, test))
    all_recs.append(eval('ALS', algo_als, train, test))
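
The loop above relies on `xf.partition_users` with `xf.SampleFrac(0.2)`. As a rough illustration of the idea, the following sketch splits the users into disjoint groups and, for each group, holds out a fraction of every user's rows as test data while everything else becomes training data. The function `partition_users_sketch`, its `seed` parameter, and the `toy` data frame are hypothetical names introduced here; this is an assumption about the general scheme, not LensKit's actual implementation, and it needs pandas 1.1 or later for `groupby(...).sample`.

```python
import numpy as np
import pandas as pd


def partition_users_sketch(ratings, n_parts, test_frac, seed=42):
    """Yield (train, test) pairs, one per disjoint group of test users.

    Hypothetical helper for illustration only; not part of LensKit.
    """
    rng = np.random.default_rng(seed)
    # Shuffle the distinct users, then cut them into n_parts groups.
    users = rng.permutation(ratings['user'].unique())
    for part_users in np.array_split(users, n_parts):
        candidates = ratings[ratings['user'].isin(part_users)]
        # Hold out test_frac of each test user's rows.
        test = candidates.groupby('user').sample(frac=test_frac, random_state=seed)
        # All remaining rows, including those of other users, are training data.
        train = ratings.drop(test.index)
        yield train, test


# Toy data: 4 users with 5 ratings each (made-up values).
toy = pd.DataFrame({
    'user': [u for u in range(1, 5) for _ in range(5)],
    'item': [i for _ in range(4) for i in range(10, 15)],
    'rating': 3.0,
})
for train, test in partition_users_sketch(toy, 2, 0.2):
    print(len(train), len(test))
```

Each user appears as a test user in exactly one partition, so the concatenated test sets cover every user once.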

In [9]:
all_recs = pd.concat(all_recs, ignore_index=True)
all_recs.head()

,item,score,user,rank,Algorithm
0,141,3.5625,1,1,ItemItem
1,296,3.5625,1,2,ItemItem
2,349,3.5625,1,3,ItemItem
3,480,3.5625,1,4,ItemItem
4,1961,3.5625,1,5,ItemItem


In [10]:
test_data = pd.concat(test_data, ignore_index=True)

In [11]:
rla = topn.RecListAnalysis()
rla.add_metric(topn.ndcg)
results = rla.compute(all_recs, test_data)
results.head()

user,Algorithm,ndcg
1,ALS,0.0
1,ItemItem,0.0
2,ALS,0.062138
2,ItemItem,0.029803
3,ALS,0.039328


In [12]:
results.groupby('Algorithm').ndcg.mean()

Algorithm
ALS         0.029696
ItemItem    0.017775
Name: ndcg, dtype: float64
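
To make the nDCG numbers above easier to interpret, here is a minimal sketch of the textbook metric for a single user. It uses a `log2(rank + 1)` discount and truncates the ideal ranking to the length of the recommendation list. The functions `dcg` and `ndcg` below are hypothetical helpers written for this notebook, and the discount and ideal-list conventions in `topn.ndcg` may differ, so the values are not guaranteed to match LensKit's.

```python
import math


def dcg(gains):
    """Discounted cumulative gain of gains listed in rank order (rank 1 first)."""
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains, start=1))


def ndcg(recommended, relevant_gains):
    """nDCG of a ranked item list against a dict mapping test items to gains."""
    # Gain of each recommended item; items absent from the test data score 0.
    gains = [relevant_gains.get(item, 0.0) for item in recommended]
    # Best achievable ordering: highest gains first, cut to the list length.
    ideal = sorted(relevant_gains.values(), reverse=True)[:len(recommended)]
    ideal_dcg = dcg(ideal)
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0


# Made-up example: one of two test items is recommended, at rank 2.
score = ndcg(['a', 'b', 'c'], {'b': 1.0, 'd': 1.0})
print(round(score, 4))
```

A user whose recommendation list contains none of their test items scores 0, which is consistent with the 0.0 entries for user 1 in the table above.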

# Conclusions and Remarks

As the output shows, ALS reaches a higher mean nDCG than ItemItem in this run (0.0297 versus 0.0178), so we can conclude that bla bla bla.

# Reproducibility

This section documents the libraries used in this notebook. 

In [13]:
# Load the watermark magic extension: https://github.com/rasbt/watermark
%load_ext watermark

ModuleNotFoundError: No module named 'watermark'

In [None]:
%watermark -i -p watermark,scipy,numba,cffi,pandas,numpy,matplotlib -m -g
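
Because the `watermark` extension was not installed in the run above, the library versions can also be recorded with the standard library alone. The sketch below is one possible fallback, assuming Python 3.8 or later and assuming that each distribution name matches the name passed to `%watermark`. The helper `report_versions` is a hypothetical name, and the sketch does not reproduce the machine details or Git hash that the `-m` and `-g` flags request.

```python
import platform
from importlib import metadata


def report_versions(packages):
    """Return a dict of package name -> installed version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The distribution is not installed in this environment.
            versions[name] = None
    return versions


print('Python', platform.python_version())
packages = ['scipy', 'numba', 'cffi', 'pandas', 'numpy', 'matplotlib']
for name, version in report_versions(packages).items():
    print(name, version or 'not installed')
```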