# Title of the notebook

Here we explain what this notebook sets out to achieve. Short text cells between the code cells tell the story and document the process.

The file should also have a meaningful name that reflects its purpose. Update the `readme.md` with a good description of the notebook contents.


In [1]:
import inspect
import os
import sys

currentdir = os.path.dirname(os.path.abspath(inspect.getfile(inspect.currentframe())))
parentdir = os.path.dirname(currentdir)
# Add parent dir to path, so that python finds the lenskit package
sys.path.insert(0, parentdir)

In [2]:
from lenskit import batch, topn, util
from lenskit import crossfold as xf
from lenskit.algorithms import Recommender, als, item_knn as knn, basic

In [3]:
import numpy as np
import pandas as pd
%matplotlib inline

We load the dataset. It consists of ...

In [6]:
dataset = os.path.join(parentdir, 'ml-latest-small/ratings.1.csv')
ratings = pd.read_csv(dataset, sep=',',
                      names=['user', 'item', 'rating', 'timestamp'], header=0)

In [7]:
ratings.head()

   user  item  rating   timestamp
0     1    31     2.5  1260759144
1     1  1029     3.0  1260759179
2     1  1061     3.0  1260759182
3     1  1129     2.0  1260759185
4     1  1172     4.0  1260759205


We define the algorithms that we are going to test.

In [8]:
algo_ii = knn.ItemItem(20)
algo_als = als.BiasedMF(50)

In [10]:
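# Note: this name shadows Python's built-in eval() inside the notebook.
def eval(aname, algo, train, test):
    """Fit a fresh copy of an algorithm and recommend for every test user.

    Args:
        aname: Name stored in the 'Algorithm' column of the result.
        algo: LensKit algorithm to evaluate; it is cloned, so the original stays unfitted.
        train: Training ratings frame.
        test: Test ratings frame; only its users are used here.

    Returns:
        A data frame with up to 100 recommendations per test user.

    Docstring style: https://github.com/google/styleguide/blob/gh-pages/pyguide.md#38-comments-and-docstrings
    """
    fittable = util.clone(algo)
    fittable = Recommender.adapt(fittable)
    fittable.fit(train)
    users = test.user.unique()
    # now we run the recommender
    recs = batch.recommend(fittable, users, 100)
    # add the algorithm name for analyzability
    recs['Algorithm'] = aname
    return recs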

In [11]:
all_recs = []
test_data = []
for train, test in xf.partition_users(ratings[['user', 'item', 'rating']], 2, xf.SampleFrac(0.2)):
    test_data.append(test)
    all_recs.append(eval('ItemItem', algo_ii, train, test))
    all_recs.append(eval('ALS', algo_als, train, test))
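
The loop above splits the data by user: users are divided into 2 disjoint groups, and for each group 20% of every test user's ratings are held out while all remaining rows form the training set. The following is a minimal standard-library sketch of that idea, not LensKit's implementation; the function name `partition_users_sketch`, the `seed` parameter, and the toy rows are hypothetical and only serve the illustration.

```python
import random
from collections import defaultdict


def partition_users_sketch(rows, n_folds, test_frac, seed=42):
    """Yield (train, test) lists of (user, item, rating) tuples.

    Users are split into n_folds disjoint groups. For each fold, a fraction
    test_frac of every test user's rows is held out; all other rows are training.
    Assumes rows are unique tuples (hypothetical simplification).
    """
    rng = random.Random(seed)
    by_user = defaultdict(list)
    for row in rows:
        by_user[row[0]].append(row)

    users = sorted(by_user)
    rng.shuffle(users)

    for fold in range(n_folds):
        test_users = users[fold::n_folds]  # every n_folds-th user, starting at `fold`
        test = []
        for user in test_users:
            n_held_out = round(test_frac * len(by_user[user]))
            test.extend(rng.sample(by_user[user], n_held_out))
        held_out = set(test)
        train = [row for row in rows if row not in held_out]
        yield train, test


# Hypothetical toy data: 4 users with 5 ratings each.
toy_rows = [(u, i, 3.0) for u in range(1, 5) for i in range(100, 105)]
for train, test in partition_users_sketch(toy_rows, n_folds=2, test_frac=0.2):
    print(len(train), len(test))
```

Each user lands in the test set of exactly one fold, so every user is evaluated once, while the training set still contains the rest of that user's ratings.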



In [12]:
all_recs = pd.concat(all_recs, ignore_index=True)
all_recs.head()

   item  score  user  rank Algorithm
0   141  3.875     1     1  ItemItem
1  1961  3.875     1     2  ItemItem
2   296  3.875     1     3  ItemItem
3   349  3.875     1     4  ItemItem
4   919  3.875     1     5  ItemItem


In [13]:
test_data = pd.concat(test_data, ignore_index=True)

In [14]:
rla = topn.RecListAnalysis()
rla.add_metric(topn.ndcg)
results = rla.compute(all_recs, test_data)
results.head()


                    ndcg
user Algorithm          
1    ALS        0.029296
     ItemItem   0.000000
2    ALS        0.000000
     ItemItem   0.000000
3    ALS        0.000000
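
To help read the `ndcg` column, here is a minimal sketch of the textbook nDCG formula for a single user, using held-out ratings as gains and a `log2(rank + 1)` discount. It is an illustration only: LensKit's `topn.ndcg` may use a different discount or handle edge cases differently, and the helper names and toy data below are hypothetical.

```python
import math


def dcg(gains):
    """Discounted cumulative gain of a list of gains given in rank order."""
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains, start=1))


def ndcg_sketch(recommended, test_ratings):
    """nDCG of one recommendation list against one user's held-out ratings.

    recommended: item ids in rank order.
    test_ratings: dict mapping item id to held-out rating (used as gain).
    """
    gains = [test_ratings.get(item, 0.0) for item in recommended]
    # The ideal list ranks the held-out items by descending rating.
    ideal = sorted(test_ratings.values(), reverse=True)[:len(recommended)]
    ideal_dcg = dcg(ideal)
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0


# Hypothetical toy data, not taken from the MovieLens run above.
toy_test = {10: 4.0, 20: 3.0}
print(ndcg_sketch([10, 20, 30], toy_test))  # held-out items ranked first, best one on top
print(ndcg_sketch([30, 40, 50], toy_test))  # no held-out item recommended
```

A score of 0 therefore means that none of the user's held-out items appeared in the recommendation list, which is common when only a few ratings per user are held out.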


# Conclusions and Remarks

So, as you see from the output, we can conclude that bla bla bla.

# Reproducibility

This section records the library versions and the environment used to run this notebook.

In [15]:
# Load the watermark magic extension: https://github.com/rasbt/watermark
%load_ext watermark

In [16]:
%watermark -i -p watermark,scipy,numba,cffi,pandas,numpy,matplotlib -m -g

2019-03-19T15:35:08+01:00

watermark 1.8.1
scipy 1.2.1
numba 0.43.0
cffi 1.12.2
pandas 0.24.2
numpy 1.16.2
matplotlib 3.0.3

compiler   : GCC 7.2.0
system     : Linux
release    : 4.15.0-46-generic
machine    : x86_64
processor  : x86_64
CPU cores  : 4
interpreter: 64bit
Git hash   : a549e7c063238437ea49e6c208f750b4179afcc4
