# Table of contents

1. [Load the dataset](#load_the_dataset)
2. [Split the dataset](#split_the_dataset)
3. [Fitting the recommender](#fitting)
4. [Sequential evaluation](#seq_evaluation)  
    4.1 [Evaluation with sequentially revealed user profiles](#eval_seq_rev)  
    4.2 [Evaluation with "static" user profiles](#eval_static)  
5. [Analysis of next-item recommendation](#next-item)  
    5.1 [Evaluation with different recommendation list lengths](#next-item_list_length)  
    5.2 [Evaluation with different user profile lengths](#next-item_profile_length)
    
---------------------
## **sraps**: 
    
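    FPMC cannot handle cold start. If a new user arrives and we have no history for them
    (listening history, purchase records, etc.), FPMC cannot predict (recommend) their
    next song or next purchase.
    Because FPMC is a personalized recommender, it has to learn a relation matrix between
    users U and items I, and a new user has no such matrix in the trained model.
    
    A new user can, however, still get predictions from a (non-personalized) matrix factorization model.
    
    We can think of every person (or item) as having both common traits and individual traits.
    In the cold-start state we can only infer from the common traits;
    the individual traits can be learned (with various models) once a history exists.
    A knowledge graph could also be brought in to make the best possible guess about the
    individual traits during cold start, tuning the initial balance between individual and
    common traits according to how much information the user provides.
    
    This document implements prediction from a seq to the next item or the next seq.
    A set-to-set implementation could also be considered.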
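
The cold-start note above can be illustrated with a minimal sketch. It is not part of `FPMCRecommender`: `personalized_scores` is a hypothetical stand-in for a trained personalized model, and the fallback simply ranks items by global popularity (the "common traits").

```python
from collections import Counter

def recommend(user_id, personalized_scores, train_sequences, top_n=3):
    """Return top_n item ids; fall back to popularity for unseen users."""
    if user_id in personalized_scores:
        # known user: rank items by the (hypothetical) personalized score
        scores = personalized_scores[user_id]
        ranked = sorted(scores, key=scores.get, reverse=True)
    else:
        # cold start: only the shared signal (global popularity) is available
        popularity = Counter(item for seq in train_sequences for item in seq)
        ranked = [item for item, _ in popularity.most_common()]
    return ranked[:top_n]

train_sequences = [['a', 'b'], ['b', 'c'], ['b', 'a']]
personalized_scores = {'u1': {'a': 0.1, 'b': 0.3, 'c': 0.9}}  # toy values

known = recommend('u1', personalized_scores, train_sequences)
unknown = recommend('new_user', personalized_scores, train_sequences)
```

The known user gets a ranking driven by their own scores, while the unseen user gets the same popularity ranking as everyone else without a history.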

In [2]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

In [3]:
from util.data_utils import create_seq_db_filter_top_k, sequences_to_spfm_format
from util.split import last_session_out_split
from util.metrics import precision, recall, mrr
from util import evaluation
from recommenders.FPMCRecommender import FPMCRecommender

In [4]:
import datetime

In [5]:
def get_test_sequences_and_users(test_data, given_k, train_users):
    # we can run evaluation only over sequences longer than abs(given_k)
    mask = test_data['sequence'].map(len) > abs(given_k)
    mask &= test_data['user_id'].isin(train_users)
    test_sequences = test_data.loc[mask, 'sequence'].values
    test_users = test_data.loc[mask, 'user_id'].values
    return test_sequences, test_users

<a id='load_the_dataset'></a>

# 1. Load the dataset

For this hands-on session we will use a dataset of user-listening sessions crawled from [last.fm](https://www.last.fm/). Specifically, we will use a subset of the following dataset:

* 30Music listening and playlists dataset, Turrin et al., ACM RecSys 2015 ([paper](https://home.deib.polimi.it/pagano/portfolio/papers/30Musiclisteningandplaylistsdataset.pdf))

In [None]:
# unzip the dataset, if you haven't already done it
# ! unzip datasets/sessions.zip -d datasets

In [8]:
dataset_path = 'datasets\\sessions\\sessions.csv'
# load this sample if you experience a severe slowdown with the previous dataset
#dataset_path = 'datasets/sessions_sample_10.csv'

# for the sake of speed, let's keep only the top-1k most popular items in the last month
dataset = create_seq_db_filter_top_k(path=dataset_path, topk=1000, last_months=1) 



Let's see what the dataset looks like

In [9]:
dataset.head()

Unnamed: 0,session_id,sequence,ts,user_id
0,122,"[1762, 3700, 638]",1420059172,2432
1,223,"[3772, 3953]",1419418147,15861
2,226,"[245, 1271, 379]",1419433841,15861
3,243,"[245, 1197, 4307, 3868]",1421674741,15861
4,245,"[409, 234, 2334, 2431, 231, 4738, 219, 2403]",1421679507,15861


Let's show some statistics about the dataset

In [10]:
from collections import Counter
cnt = Counter()
dataset.sequence.map(cnt.update);

In [11]:
sequence_length = dataset.sequence.map(len).values
n_sessions_per_user = dataset.groupby('user_id').size()

print('Number of items: {}'.format(len(cnt)))
print('Number of users: {}'.format(dataset.user_id.nunique()))
print('Number of sessions: {}'.format(len(dataset)) )

print('\nSession length:\n\tAverage: {:.2f}\n\tMedian: {}\n\tMin: {}\n\tMax: {}'.format(
    sequence_length.mean(), 
    np.quantile(sequence_length, 0.5), 
    sequence_length.min(), 
    sequence_length.max()))

print('Sessions per user:\n\tAverage: {:.2f}\n\tMedian: {}\n\tMin: {}\n\tMax: {}'.format(
    n_sessions_per_user.mean(), 
    np.quantile(n_sessions_per_user, 0.5), 
    n_sessions_per_user.min(), 
    n_sessions_per_user.max()))

Number of items: 1000
Number of users: 17850
Number of sessions: 66249

Session length:
	Average: 4.19
	Median: 3.0
	Min: 1
	Max: 198
Sessions per user:
	Average: 3.71
	Median: 3.0
	Min: 1
	Max: 38


In [22]:
a = dataset.groupby('user_id')

In [24]:
print(a)

<pandas.core.groupby.groupby.DataFrameGroupBy object at 0x000001EE8B66FE88>


In [12]:
print('Most popular items: {}'.format(cnt.most_common(5)))

Most popular items: [('443', 1983), ('1065', 1533), ('67', 1471), ('1622', 1220), ('2308', 1215)]


<a id='split_the_dataset'></a>

# 2. Split the dataset

For simplicity, let's split the dataset by assigning the **last session** of every user to the **test set**, and **all the previous** ones to the **training set**.
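
As a rough illustration of this split, here is a dependency-free sketch on toy data. It is not the implementation of `util.split.last_session_out_split`; the function name, the dict layout, and the assumption that timestamps are unique per user are all illustrative.

```python
def last_session_out(sessions):
    """Split sessions (dicts with 'user_id', 'ts', 'sequence') into train/test."""
    last_ts = {}
    for s in sessions:
        # remember the latest timestamp seen for each user
        last_ts[s['user_id']] = max(s['ts'], last_ts.get(s['user_id'], s['ts']))
    # every session before the user's last one goes to training
    train = [s for s in sessions if s['ts'] < last_ts[s['user_id']]]
    # the last session of each user goes to test
    test = [s for s in sessions if s['ts'] == last_ts[s['user_id']]]
    return train, test

sessions = [
    {'user_id': 1, 'ts': 10, 'sequence': ['a', 'b']},
    {'user_id': 1, 'ts': 20, 'sequence': ['c']},
    {'user_id': 2, 'ts': 5, 'sequence': ['d', 'e']},
]
train, test = last_session_out(sessions)
```

Note that a user with a single session (user 2 here) ends up only in the test set, which is why the helper `get_test_sequences_and_users` also filters on `train_users`.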

In [13]:
train_data, test_data = last_session_out_split(dataset)
print("Train sessions: {} - Test sessions: {}".format(len(train_data), len(test_data)))

Train sessions: 48399 - Test sessions: 17850


In [27]:
train_data.head(10)

Unnamed: 0,session_id,sequence,ts,user_id
1,223,"[3772, 3953]",1419418147,15861
2,226,"[245, 1271, 379]",1419433841,15861
3,243,"[245, 1197, 4307, 3868]",1421674741,15861
4,245,"[409, 234, 2334, 2431, 231, 4738, 219, 2403]",1421679507,15861
6,353,"[4255, 652, 4256, 4257, 4256, 2247]",1420927951,4296
7,354,"[652, 4257, 4255, 4256, 2247, 652, 4256, 2247,...",1420935007,4296
8,355,"[793, 3489, 3090, 1762]",1420979477,4296
9,357,"[793, 3489]",1421003874,4296
11,386,"[1251, 1255, 1252, 3178, 1256, 1253, 443, 1254...",1420522353,30980
12,388,"[1097, 276, 7182, 1254, 1864, 7431, 1255, 274]",1420646888,30980


In [30]:
test_data.head(10)

Unnamed: 0,session_id,sequence,ts,user_id
0,122,"[1762, 3700, 638]",1420059172,2432
5,246,"[245, 1197, 4307, 3868]",1421682592,15861
10,359,[1762],1421018535,4296
20,404,"[67, 471, 3551, 2466, 1475, 2267]",1421697722,30980
22,657,"[665, 2204, 2205, 2206, 2207]",1421713004,5953
25,689,"[3208, 531, 3081, 2285]",1421576868,8677
28,1614,"[3492, 2418, 2420, 1209, 3933, 7083, 3491, 247...",1421049831,12789
29,1861,"[6315, 3391]",1421422293,17917
30,2042,"[2211, 2208, 2205]",1421066642,8266
37,2218,"[788, 2353]",1421672287,17265


<a id='fitting'></a>

# 3. Fitting the recommender

Here we fit the recommendation algorithm over the sessions in the training set.  
This recommender is based on the following paper:

_Rendle, S., Freudenthaler, C., & Schmidt-Thieme, L. (2010). Factorizing personalized Markov chains for next-basket recommendation. Proceedings of the 19th International Conference on World Wide Web - WWW ’10, 811_

In short, FPMC factorizes a personalized order-1 transition tensor using Tensor Factorization (TF) with a pairwise loss function akin to BPR (Bayesian Personalized Ranking).

<img src="images/fpmc.png" width="200px" />

TF makes it possible to impute values for the missing transitions between items for each user. For this reason, FPMC can also be used to generate _personalized_ recommendations in session-aware recommenders.
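
To make the factorization more concrete, here is a minimal sketch of the FPMC scoring function for single-item transitions, following the pairwise-interaction form in the cited paper: a user-item term plus an item-to-item transition term. The factor names, sizes, and random initialization are illustrative assumptions, not the internals of `FPMCRecommender`.

```python
import math
import random

random.seed(0)
N_FACTOR = 4
users, items = ['u1', 'u2'], ['a', 'b', 'c']

def rand_vec():
    return [random.gauss(0, 0.1) for _ in range(N_FACTOR)]

# four factor matrices (names are illustrative)
V_UI = {u: rand_vec() for u in users}  # user factors, user-item interaction
V_IU = {i: rand_vec() for i in items}  # item factors, user-item interaction
V_IL = {i: rand_vec() for i in items}  # next-item factors, item-item transition
V_LI = {l: rand_vec() for l in items}  # last-item factors, item-item transition

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def fpmc_score(user, last_item, next_item):
    """Score of moving from last_item to next_item for this user."""
    return dot(V_UI[user], V_IU[next_item]) + dot(V_IL[next_item], V_LI[last_item])

def bpr_loss(user, last_item, pos_item, neg_item):
    """Pairwise loss: the observed next item should outscore a sampled negative."""
    diff = fpmc_score(user, last_item, pos_item) - fpmc_score(user, last_item, neg_item)
    return -math.log(1.0 / (1.0 + math.exp(-diff)))

# rank all candidate next items for user 'u1' whose last item was 'a'
ranking = sorted(items, key=lambda i: fpmc_score('u1', 'a', i), reverse=True)
loss = bpr_loss('u1', 'a', 'b', 'c')
```

Training would lower this loss by gradient steps over sampled (user, last item, observed next item, negative item) tuples. With untrained random factors the ranking is arbitrary.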

In this notebook, you will be able to change the number of latent factors and a few other learning hyper-parameters and see the impact on the recommendation quality.

The class `FPMCRecommender` has the following initialization hyper-parameters:
* `n_factor`: (optional) the number of latent factors
* `learn_rate`: (optional) the learning rate
* `regular`: (optional) the L2 regularization coefficient
* `n_epoch`: (optional) the number of training epochs
* `n_neg`: (optional) the number of negative samples used in BPR learning


In [31]:
recommender = FPMCRecommender(n_factor=16,
                              n_epoch=5)
recommender.fit(train_data)

2019-11-07 10:58:31,906 - INFO - epoch 0 done
2019-11-07 10:58:33,044 - INFO - epoch 1 done
2019-11-07 10:58:34,194 - INFO - epoch 2 done
2019-11-07 10:58:35,334 - INFO - epoch 3 done
2019-11-07 10:58:36,526 - INFO - epoch 4 done


<a id='seq_evaluation'></a>


# 4. Sequential evaluation

In the evaluation of sequence-aware recommenders, each sequence in the test set is split into:
- the _user profile_, used to compute recommendations, which consists of the first *k* events in the sequence;
- the _ground truth_, used for performance evaluation, which consists of the remainder of the sequence.

In the cells below, you can control the size of the _user profile_ by assigning a **positive** value to `GIVEN_K`, which corresponds to the number of events from the beginning of the sequence that will be assigned to the initial user profile. This ensures that each user profile in the test set will have exactly the same initial size, but the size of the ground truth will change for every sequence.

Alternatively, by assigning a **negative** value to `GIVEN_K`, you will set the initial size of the _ground truth_. In this way the _ground truth_ will have the same size for all sequences, but the dimension of the user profile will differ.
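
The effect of the sign of `GIVEN_K` can be sketched with plain Python slicing. This is an assumption about how such a split could be done, not a copy of the code in `util.evaluation`, and `split_sequence` is a hypothetical helper.

```python
def split_sequence(sequence, given_k):
    """Split one test sequence into (user_profile, ground_truth)."""
    # positive given_k: fixed-size profile; negative given_k: fixed-size ground truth
    return sequence[:given_k], sequence[given_k:]

seq = ['a', 'b', 'c', 'd', 'e']
profile_pos, truth_pos = split_sequence(seq, 2)   # profile has 2 events
profile_neg, truth_neg = split_sequence(seq, -2)  # ground truth has 2 events
```

With `given_k=2` the profile is `['a', 'b']` and the ground truth holds the remaining three events. With `given_k=-2` the ground truth is `['d', 'e']` and the profile holds everything before it.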

In [32]:
METRICS = {'precision':precision, 
           'recall':recall,
           'mrr': mrr}
TOPN = 10 # length of the recommendation list

<a id='eval_seq_rev'></a>

## 4.1 Evaluation with sequentially revealed user-profiles

Here we evaluate the quality of the recommendations in a setting in which user profiles are revealed _sequentially_.

The _user profile_ starts from the first `GIVEN_K` events (or, alternatively, from all but the last `-GIVEN_K` events if `GIVEN_K<0`).  
The recommendations are evaluated against the next `LOOK_AHEAD` events (the _ground truth_).  
The _user profile_ is then extended by the next `STEP` events, the ground truth is scrolled forward accordingly, and the evaluation continues until the sequence ends.

In typical **next-item recommendation**, we start with `GIVEN_K=1`, generate a set of **alternatives** that will be evaluated against the next event in the sequence (`LOOK_AHEAD=1`), move forward by one step (`STEP=1`) and repeat until the sequence ends.

You can set `LOOK_AHEAD='all'` to see what happens if you had to recommend a **whole sequence** instead of a set of alternatives to a user.

NOTE: Metrics are averaged over each sequence first, then averaged over all test sequences.
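
The scrolling procedure and the two-level averaging can be sketched as follows. This is a simplified illustration, not the code of `evaluation.sequential_evaluation`: the toy metric `hit_rate` and the toy recommender `next_letters` are assumptions made up for the example.

```python
def hit_rate(recommended, ground_truth):
    """1.0 if any ground-truth item is in the list, else 0.0 (toy metric)."""
    return float(any(item in recommended for item in ground_truth))

def scrolling_eval(recommend, sequence, given_k=1, look_ahead=1, step=1, top_n=2):
    """Average a metric over all reveal points of one sequence.

    Assumes len(sequence) > given_k, as guaranteed by the filtering helper.
    """
    scores = []
    k = given_k
    while k < len(sequence):
        profile = sequence[:k]
        ground_truth = sequence[k:k + look_ahead]
        scores.append(hit_rate(recommend(profile)[:top_n], ground_truth))
        k += step  # reveal `step` more events and slide the ground truth
    return sum(scores) / len(scores)

def next_letters(profile):
    """Toy recommender: suggest the two letters following the last event."""
    last = profile[-1]
    return [chr(ord(last) + 1), chr(ord(last) + 2)]

test_sequences = [['a', 'b', 'c'], ['a', 'x', 'y', 'b']]
# first average within each sequence, then across sequences
per_sequence = [scrolling_eval(next_letters, s) for s in test_sequences]
overall = sum(per_sequence) / len(per_sequence)
```

Averaging per sequence first means that a long sequence with many reveal points does not weigh more than a short one in the final score.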

**(TODO) Try out different evaluation settings to see how the recommendation quality changes.**


![](gifs/sequential_eval.gif)

In [33]:
# GIVEN_K=1, LOOK_AHEAD=1, STEP=1 corresponds to the classical next-item evaluation
GIVEN_K = 1
LOOK_AHEAD = 1
STEP=1

In [34]:
test_sequences, test_users = get_test_sequences_and_users(test_data, GIVEN_K, train_data['user_id'].values) # we need user ids now!
print('{} sequences available for evaluation ({} users)'.format(len(test_sequences), len(np.unique(test_users))))

results = evaluation.sequential_evaluation(recommender,
                                           test_sequences=test_sequences,
                                           users=test_users,
                                           given_k=GIVEN_K,
                                           look_ahead=LOOK_AHEAD,
                                           evaluation_functions=METRICS.values(),
                                           top_n=TOPN,
                                           scroll=True,  # scrolling averages metrics over all profile lengths
                                           step=STEP)

9200 sequences available for evaluation (9200 users)


100%|██████████████████████████████████████████████████████████████████████████████| 9200/9200 [13:40<00:00, 11.21it/s]


In [41]:
train_data.head()

Unnamed: 0,session_id,sequence,ts,user_id
1,223,"[3772, 3953]",1419418147,15861
2,226,"[245, 1271, 379]",1419433841,15861
3,243,"[245, 1197, 4307, 3868]",1421674741,15861
4,245,"[409, 234, 2334, 2431, 231, 4738, 219, 2403]",1421679507,15861
6,353,"[4255, 652, 4256, 4257, 4256, 2247]",1420927951,4296


In [36]:
test_sequences

array([list(['245', '1197', '4307', '3868']),
       list(['67', '471', '3551', '2466', '1475', '2267']),
       list(['665', '2204', '2205', '2206', '2207']), ...,
       list(['431', '2445']),
       list(['419', '930', '419', '908', '3493', '5294', '5297', '5299', '5296']),
       list(['40', '3319', '1546', '1484', '1417', '155', '3495', '3188'])],
      dtype=object)

In [37]:
test_users

array([15861, 30980,  5953, ..., 26213,  4503, 12934], dtype=int64)

In [35]:
print('Sequential evaluation (GIVEN_K={}, LOOK_AHEAD={}, STEP={})'.format(GIVEN_K, LOOK_AHEAD, STEP))
for mname, mvalue in zip(METRICS.keys(), results):
    print('\t{}@{}: {:.4f}'.format(mname, TOPN, mvalue))

Sequential evaluation (GIVEN_K=1, LOOK_AHEAD=1, STEP=1)
	precision@10: 0.0232
	recall@10: 0.2319
	mrr@10: 0.1077
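
In the output above, precision@10 is almost exactly recall@10 divided by 10 (0.0232 vs 0.2319). That is what the textbook definitions predict when the ground truth is a single item: a hit contributes 1/10 to precision and 1/1 to recall. The sketch below assumes `util.metrics` follows these standard definitions, which has not been checked against its source. The function names are illustrative.

```python
def precision_at_n(recommended, ground_truth):
    """Fraction of recommended items that are in the ground truth."""
    hits = len(set(recommended) & set(ground_truth))
    return hits / len(recommended)

def recall_at_n(recommended, ground_truth):
    """Fraction of ground-truth items that were recommended."""
    hits = len(set(recommended) & set(ground_truth))
    return hits / len(ground_truth)

def mrr_at_n(recommended, ground_truth):
    """Reciprocal rank of the first relevant item, 0.0 if none is found."""
    for rank, item in enumerate(recommended, start=1):
        if item in ground_truth:
            return 1.0 / rank
    return 0.0

recommended = ['i%d' % n for n in range(10)]  # a top-10 list: i0 ... i9
ground_truth = ['i3']                          # LOOK_AHEAD=1: a single item

p = precision_at_n(recommended, ground_truth)
r = recall_at_n(recommended, ground_truth)
m = mrr_at_n(recommended, ground_truth)
```

Here the single relevant item sits at rank 4, so precision is 0.1, recall is 1.0 and the reciprocal rank is 0.25.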


<a id='eval_static'></a>

## 4.2 Evaluation with "static" user-profiles

Here we evaluate the quality of the recommendations in a setting in which user profiles are instead _static_.

The _user profile_ starts from the first `GIVEN_K` events (or, alternatively, from all but the last `-GIVEN_K` events if `GIVEN_K<0`).  
The recommendations are evaluated against the next `LOOK_AHEAD` events (the _ground truth_).  

The user profile is *not extended* and the ground truth *doesn't move forward*.
This yields "snapshots" of the recommendation performance for different user profile and ground truth lengths.

Here too you can set `LOOK_AHEAD='all'` to see what happens if you had to recommend a **whole sequence** instead of a set of alternatives to a user.
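
A small sketch of what a single static step could look like, including the `'all'` case. `static_split` is a hypothetical helper written for illustration. How `util.evaluation` actually handles `look_ahead` has not been verified here.

```python
def static_split(sequence, given_k, look_ahead):
    """Profile and ground truth for a single, non-scrolling evaluation step."""
    profile = sequence[:given_k]
    rest = sequence[given_k:]
    # 'all' keeps the whole remainder, an integer keeps only the next events
    ground_truth = rest if look_ahead == 'all' else rest[:look_ahead]
    return profile, ground_truth

seq = ['a', 'b', 'c', 'd']
one_step = static_split(seq, 1, 1)
whole_rest = static_split(seq, 1, 'all')
```

With `look_ahead=1` only `'b'` is used as ground truth, whereas `'all'` evaluates the same recommendation list against `['b', 'c', 'd']`.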

**(TODO) Try out different evaluation settings to see how the recommendation quality changes.**

In [None]:
GIVEN_K = 1
LOOK_AHEAD = 'all'
STEP=1

In [None]:
test_sequences, test_users = get_test_sequences_and_users(test_data, GIVEN_K, train_data['user_id'].values) # we need user ids now!
print('{} sequences available for evaluation ({} users)'.format(len(test_sequences), len(np.unique(test_users))))

results = evaluation.sequential_evaluation(recommender,
                                           test_sequences=test_sequences,
                                           users=test_users,
                                           given_k=GIVEN_K,
                                           look_ahead=LOOK_AHEAD,
                                           evaluation_functions=METRICS.values(),
                                           top_n=TOPN,
                                           scroll=False  # notice that scrolling is disabled!
                                          )  

In [None]:
print('Sequential evaluation (GIVEN_K={}, LOOK_AHEAD={}, STEP={})'.format(GIVEN_K, LOOK_AHEAD, STEP))
for mname, mvalue in zip(METRICS.keys(), results):
    print('\t{}@{}: {:.4f}'.format(mname, TOPN, mvalue))

<a id='next-item'></a>

## 5. Analysis of next-item recommendation

Here we propose to analyse the performance of the recommender system in the scenario of *next-item recommendation* over the following dimensions:

* the *length* of the **recommendation list**, and
* the *length* of the **user profile**.

NOTE: This evaluation is by no means exhaustive, as the different hyper-parameters of the recommendation algorithm should be *carefully tuned* before drawing any conclusions. Unfortunately, given the time constraints for this tutorial, we had to leave hyper-parameter tuning out. A very useful reference about careful evaluation of (session-based) recommenders can be found at:

*  Evaluation of Session-based Recommendation Algorithms, Ludewig and Jannach, 2018 ([paper](https://arxiv.org/abs/1803.09587))

<a id='next-item_list_length'></a>

### 5.1 Evaluation for different recommendation list lengths

In [38]:
GIVEN_K = 1
LOOK_AHEAD = 1
STEP = 1
topn_list = [1, 5, 10, 20, 50, 100]

In [39]:
# ensure that all sequences have the same minimum length 
test_sequences, test_users = get_test_sequences_and_users(test_data, GIVEN_K, train_data['user_id'].values) # we need user ids now!
print('{} sequences available for evaluation ({} users)'.format(len(test_sequences), len(np.unique(test_users))))

9200 sequences available for evaluation (9200 users)


In [None]:
res_list = []

for topn in topn_list:
    print('Evaluating recommendation lists with length: {}'.format(topn))
    res_tmp = evaluation.sequential_evaluation(recommender,
                                               test_sequences=test_sequences,
                                               users=test_users,
                                               given_k=GIVEN_K,
                                               look_ahead=LOOK_AHEAD,
                                               evaluation_functions=METRICS.values(),
                                               top_n=topn,
                                               scroll=True,  # here we average over all profile lengths
                                               step=STEP)
    mvalues = list(zip(METRICS.keys(), res_tmp))
    res_list.append((topn, mvalues))

In [None]:
# show separate plots per metric
fig, axes = plt.subplots(nrows=1, ncols=len(METRICS), figsize=(15,5))
res_list_t = list(zip(*res_list))
for midx, metric in enumerate(METRICS):
    mvalues = [res_list_t[1][j][midx][1] for j in range(len(res_list_t[1]))]
    ax = axes[midx]
    ax.plot(topn_list, mvalues)
    ax.set_title(metric)
    ax.set_xticks(topn_list)
    ax.set_xlabel('List length')

<a id='next-item_profile_length'></a>

### 5.2 Evaluation for different user profile lengths

In [None]:
given_k_list = [1, 2, 3, 4]
LOOK_AHEAD = 1
STEP = 1
TOPN = 20

In [None]:
res_list = []

for gk in given_k_list:
    print('Evaluating profiles having length: {}'.format(gk))
    res_tmp = evaluation.sequential_evaluation(recommender,
                                               test_sequences=test_sequences,
                                               users=test_users,
                                               given_k=gk,
                                               look_ahead=LOOK_AHEAD,
                                               evaluation_functions=METRICS.values(),
                                               top_n=TOPN,
                                               scroll=False,  # here we stop at each profile length
                                               step=STEP)
    mvalues = list(zip(METRICS.keys(), res_tmp))
    res_list.append((gk, mvalues))
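
One thing worth checking when comparing profile lengths: a sequence only has a non-empty ground truth at every setting if it is longer than the largest value in `given_k_list`. The sketch below shows such a filter on toy data. `longer_than` is a hypothetical helper. In this notebook the same effect could presumably be obtained by calling `get_test_sequences_and_users` with `max(given_k_list)` as `given_k`.

```python
def longer_than(sequences, users, min_len):
    """Keep only (sequence, user) pairs whose sequence is longer than min_len."""
    kept = [(s, u) for s, u in zip(sequences, users) if len(s) > min_len]
    return [s for s, _ in kept], [u for _, u in kept]

given_k_list = [1, 2, 3, 4]
sequences = [['a', 'b'], ['a', 'b', 'c', 'd', 'e'], ['x', 'y', 'z']]
users = [10, 11, 12]

# only sequences with more than max(given_k_list) events survive
seqs, usrs = longer_than(sequences, users, max(given_k_list))
```

Evaluating every profile length on the same set of sequences also keeps the curves comparable, since each point is then computed over the same users.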

In [None]:
# show separate plots per metric
fig, axes = plt.subplots(nrows=1, ncols=len(METRICS), figsize=(15,5))
res_list_t = list(zip(*res_list))
for midx, metric in enumerate(METRICS):
    mvalues = [res_list_t[1][j][midx][1] for j in range(len(res_list_t[1]))]
    ax = axes[midx]
    ax.plot(given_k_list, mvalues)
    ax.set_title(metric)
    ax.set_xticks(given_k_list)
    ax.set_xlabel('Profile length')