<i>Copyright (c) Microsoft Corporation. All rights reserved.</i>

<i>Licensed under the MIT License.</i>

# Sequential Recommender Quick Start

### Example: SLi_Rec: Adaptive User Modeling with Long and Short-Term Preferences for Personalized Recommendation
Unlike general recommenders such as Matrix Factorization or xDeepFM (both available in this repo), which do not consider the order of a user's activities, sequential recommender systems take the sequence of user behaviors as context. Their goal is to predict the items a user will interact with in the near future (in the extreme case, the very next item the user will interact with).

This notebook gives a quick example of how to train a sequential model on a public Amazon dataset. We currently support NextItNet \[4\], GRU4Rec \[2\], Caser \[3\], A2SVD \[1\] and SLi_Rec \[1\]. Without loss of generality, this notebook uses the [SLi_Rec model](https://www.microsoft.com/en-us/research/uploads/prod/2019/07/IJCAI19-ready_v1.pdf) as the example.
SLi_Rec \[1\] is a deep learning-based model that captures both long- and short-term user preferences for precise recommendation. In summary, SLi_Rec has the following key properties:

* It adopts the attentive "Asymmetric-SVD" paradigm for long-term modeling.
* It takes both time irregularity and semantic irregularity into consideration by modifying the gating logic of the LSTM.
* It uses an attention mechanism to dynamically fuse the long-term and short-term components.

In this notebook, we test SLi_Rec on a subset of the public Amazon dataset: [Amazon_reviews](http://snap.stanford.edu/data/amazon/productGraph/categoryFiles/reviews_Movies_and_TV_5.json.gz) and [Amazon_metadata](http://snap.stanford.edu/data/amazon/productGraph/categoryFiles/meta_Movies_and_TV.json.gz).

This notebook has been tested with TF 1.15.0.

## 0. Global Settings and Imports

In [1]:
import sys
sys.path.append("../../")
import os
import logging
import papermill as pm
import scrapbook as sb
from tempfile import TemporaryDirectory

import tensorflow as tf
import time

import numpy as np

from reco_utils.common.constants import SEED
from reco_utils.recommender.deeprec.deeprec_utils import (
    prepare_hparams
)
from reco_utils.dataset.amazon_reviews import download_and_extract, data_preprocessing
from reco_utils.dataset.download_utils import maybe_download


from reco_utils.recommender.deeprec.models.sequential.sli_rec import SLI_RECModel as SeqModel
#### To use another model, use one of the following lines instead:
# from reco_utils.recommender.deeprec.models.sequential.asvd import A2SVDModel as SeqModel
# from reco_utils.recommender.deeprec.models.sequential.caser import CaserModel as SeqModel
# from reco_utils.recommender.deeprec.models.sequential.gru4rec import GRU4RecModel as SeqModel
# from reco_utils.recommender.deeprec.models.sequential.nextitnet import NextItNetModel as SeqModel

from reco_utils.recommender.deeprec.io.sequential_iterator import SequentialIterator
# from reco_utils.recommender.deeprec.io.nextitnet_iterator import NextItNetIterator  # NextItNet only



##  ATTENTION: change to the corresponding config file, e.g., caser.yaml for CaserModel 
yaml_file = '../../reco_utils/recommender/deeprec/config/sli_rec.yaml'  

In [2]:
print("System version: {}".format(sys.version))
print("Tensorflow version: {}".format(tf.__version__))


System version: 3.6.8 |Anaconda, Inc.| (default, Dec 30 2018, 01:22:34) 
[GCC 7.3.0]
Tensorflow version: 1.15.0


#### Parameters

In [3]:
EPOCHS = 10
BATCH_SIZE = 400
RANDOM_SEED = SEED  # Set None for non-deterministic result

data_path = os.path.join("..", "..", "tests", "resources", "deeprec", "slirec")

## 1. Input data format
The input data contains 8 columns: `<label> <user_id> <item_id> <category_id> <timestamp> <history_item_ids> <history_category_ids> <history_timestamp>`. Columns are separated by `"\t"`. `item_id` and `category_id` denote the target item and its category: for this instance, we want to predict whether user `user_id` will interact with item `item_id` at time `timestamp`. The `<history_*>` columns record the user's behavior list up to `<timestamp>`, with elements separated by commas. `<label>` is binary: 1 for positive instances and 0 for negative instances. An example instance:

`1       A1QQ86H5M2LVW2  B0059XTU1S      Movies  1377561600      B002ZG97WE,B004IK30PA,B000BNX3AU,B0017ANB08,B005LAIHW2  Movies,Movies,Movies,Movies,Movies   1304294400,1304812800,1315785600,1316304000,1356998400` 
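To make the column layout concrete, here is a minimal parsing sketch. `Instance` and `parse_instance` are hypothetical helpers written for illustration only; they are not part of `reco_utils`, and the real `SequentialIterator` does its own parsing. Note that the example line above is rendered with spaces, while the actual files use tab characters.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Instance:
    label: int
    user_id: str
    item_id: str
    category_id: str
    timestamp: int
    history_item_ids: List[str]
    history_category_ids: List[str]
    history_timestamps: List[int]


def _split(column: str) -> List[str]:
    # History columns are comma-separated; an empty column means no history.
    return [value for value in column.split(",") if value]


def parse_instance(line: str) -> Instance:
    """Parse one tab-separated line in the 8-column format described above."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 8:
        raise ValueError("expected 8 tab-separated columns, got {}".format(len(fields)))
    label, user, item, cate, ts, hist_items, hist_cates, hist_ts = fields
    inst = Instance(
        label=int(label),
        user_id=user,
        item_id=item,
        category_id=cate,
        timestamp=int(ts),
        history_item_ids=_split(hist_items),
        history_category_ids=_split(hist_cates),
        history_timestamps=[int(t) for t in _split(hist_ts)],
    )
    # The three history lists are aligned position by position.
    if not (len(inst.history_item_ids) == len(inst.history_category_ids) == len(inst.history_timestamps)):
        raise ValueError("history columns have different lengths")
    return inst
```

For the example line, the parsed history has five items, all earlier than the target timestamp 1377561600.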

In the data preprocessing stage, a script generates ID mapping dictionaries so that user_id, item_id and category_id are mapped to integer indices starting from 1, and you need to tell the input iterator where these ID mapping files are (for example, in the next section we use the mapping files user_vocab, item_vocab and cate_vocab). The data preprocessing script is at https://github.com/microsoft/recommenders/blob/master/reco_utils/dataset/amazon_reviews.py, where you need to call the `_create_vocab(train_file, user_vocab, item_vocab, cate_vocab)` function. Note that the ID vocabulary is created only from train_file, so new IDs in valid_file or test_file are treated as unknown and assigned the default index 0.
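The mapping rule (known IDs get indices from 1, unseen IDs fall back to 0) can be sketched as follows. This is a hypothetical illustration, not the body of `_create_vocab`: the frequency-based ordering is an assumption, and the real function writes pickle files rather than returning a dict.

```python
from collections import Counter
from typing import Dict, Iterable


def build_vocab(train_ids: Iterable[str]) -> Dict[str, int]:
    """Map each distinct training ID to an integer index starting from 1.

    Index 0 is reserved for IDs never seen in the training data.
    Ordering by descending frequency (ties broken by ID) is an illustrative choice.
    """
    counts = Counter(train_ids)
    ordered = sorted(counts, key=lambda key: (-counts[key], key))
    return {key: index for index, key in enumerate(ordered, start=1)}


def lookup(vocab: Dict[str, int], key: str) -> int:
    # IDs that appear only in valid_file or test_file map to the default index 0.
    return vocab.get(key, 0)
```

Because the vocabulary is built from the training file alone, any item that first appears in the validation or test data shares the single index 0.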

Only the SLi_Rec model is time-aware. For the other models, you can fill the timestamp columns with placeholder values to satisfy the format; those models ignore these columns.

We use a softmax loss function. In the training and evaluation stages, we group 1 positive instance with num_ngs negative instances. Pairwise ranking can be regarded as a special case of softmax ranking where num_ngs is set to 1.

More specifically, for training and evaluation you need to organize the data file so that each positive instance is followed by num_ngs negative instances. The program takes 1+num_ngs lines as a unit for the softmax calculation. num_ngs is a parameter you pass to the `prepare_hparams`, `fit` and `run_eval` functions. `train_num_ngs` in `prepare_hparams` denotes the number of negative instances for training; a recommended value is 4. `valid_num_ngs` in `fit` and `num_ngs` in `run_eval` denote the number used in evaluation. In evaluation, the model calculates metrics among the 1+num_ngs instances of each group. The `predict` function only needs to calculate a score for each individual instance, so it does not take a num_ngs setting. More details and examples are provided in the following sections.
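The grouped softmax loss described above can be sketched in plain Python. `group_softmax_loss` is a hypothetical helper that assumes the stated layout (the positive score comes first in each group of 1+num_ngs scores); the model computes its loss inside the TensorFlow graph, not with this code.

```python
import math
from typing import List, Sequence


def group_softmax_loss(scores: Sequence[float], num_ngs: int) -> float:
    """Mean softmax cross-entropy over consecutive groups of 1 + num_ngs scores.

    Within each group, scores[0] is the positive instance and the remaining
    num_ngs scores are its negatives.
    """
    group_size = 1 + num_ngs
    if len(scores) % group_size != 0:
        raise ValueError("number of scores must be a multiple of 1 + num_ngs")
    losses: List[float] = []
    for start in range(0, len(scores), group_size):
        group = scores[start:start + group_size]
        # log-sum-exp with the max subtracted for numerical stability
        m = max(group)
        log_z = m + math.log(sum(math.exp(s - m) for s in group))
        losses.append(log_z - group[0])  # equals -log(softmax probability of the positive)
    return sum(losses) / len(losses)
```

With num_ngs=1 the loss reduces to -log(sigmoid(s_pos - s_neg)), the pairwise logistic loss, which is why pairwise ranking is a special case. When all scores in a group are equal, the loss is ln(1+num_ngs); for train_num_ngs=4 that is ln 5 ≈ 1.609, close to the total_loss of 1.6105 logged early in the first training epoch.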

For the training stage, if you don't want to prepare negative instances, you can provide only positive instances and set `need_sample=True, train_num_ngs=train_num_ngs` in `prepare_hparams`; the model will then dynamically sample `train_num_ngs` negative instances for each positive instance in every mini-batch.
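As an illustration of what dynamic sampling produces, the sketch below expands a batch of positives into the 1+num_ngs group layout. It assumes in-batch sampling (negatives drawn, with replacement, from the other items in the same mini-batch); that strategy is an assumption for illustration, and `sample_in_batch_negatives` is a hypothetical name, not the library's sampler.

```python
import random
from typing import List, Tuple


def sample_in_batch_negatives(
    positive_items: List[str], num_ngs: int, rng: random.Random
) -> List[Tuple[str, int]]:
    """Expand positives into groups of (item, label): 1 positive followed by num_ngs negatives."""
    if num_ngs <= 0:
        raise ValueError("num_ngs must be greater than zero when sampling negatives")
    rows: List[Tuple[str, int]] = []
    for pos in positive_items:
        # Candidate negatives: other items from the same mini-batch.
        candidates = [item for item in positive_items if item != pos]
        if not candidates:
            raise ValueError("the batch needs at least two distinct items")
        rows.append((pos, 1))
        for _ in range(num_ngs):
            rows.append((rng.choice(candidates), 0))  # sampled with replacement
    return rows
```

The output follows the same layout as a pre-sampled file, so the grouped softmax loss can be applied to it unchanged.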

###  Amazon dataset
Now let's start with a public dataset containing product reviews and metadata from Amazon, which is widely used as a benchmark dataset in the recommendation systems field.

In [4]:

# for test
train_file = os.path.join(data_path, r'train_data')
valid_file = os.path.join(data_path, r'valid_data')
test_file = os.path.join(data_path, r'test_data')
user_vocab = os.path.join(data_path, r'user_vocab.pkl')
item_vocab = os.path.join(data_path, r'item_vocab.pkl')
cate_vocab = os.path.join(data_path, r'category_vocab.pkl')
output_file = os.path.join(data_path, r'output.txt')

reviews_name = 'reviews_Movies_and_TV_5.json'
meta_name = 'meta_Movies_and_TV.json'
reviews_file = os.path.join(data_path, reviews_name)
meta_file = os.path.join(data_path, meta_name)
train_num_ngs = 4 # number of negative instances with a positive instance for training
valid_num_ngs = 4 # number of negative instances with a positive instance for validation
test_num_ngs = 9 # number of negative instances with a positive instance for testing
sample_rate = 0.01 # sample a small item set for training and testing here for fast example

input_files = [reviews_file, meta_file, train_file, valid_file, test_file, user_vocab, item_vocab, cate_vocab]

if not os.path.exists(train_file):
    download_and_extract(reviews_name, reviews_file)
    download_and_extract(meta_name, meta_file)
    data_preprocessing(*input_files, sample_rate=sample_rate, valid_num_ngs=valid_num_ngs, test_num_ngs=test_num_ngs)
    #### uncomment this for the NextItNet model, because it does not need to unfold the user history
    # data_preprocessing(*input_files, sample_rate=sample_rate, valid_num_ngs=valid_num_ngs, test_num_ngs=test_num_ngs, is_history_expanding=False)


#### 1.1 Prepare hyper-parameters
`prepare_hparams()` creates a full set of hyper-parameters for model training, such as learning rate, feature number, and dropout ratio. We can put these parameters in a yaml file (a complete list of parameters can be found under our config folder), or pass them as function arguments, which override the yaml settings.
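The precedence rule (function arguments win over yaml values) can be illustrated with plain dicts. This toy sketch assumes the yaml file has already been loaded into a dict; `merge_hparams` is hypothetical and does not reflect how `prepare_hparams` is implemented internally.

```python
from typing import Any, Dict


def merge_hparams(yaml_config: Dict[str, Any], **overrides: Any) -> Dict[str, Any]:
    """Return a new dict in which keyword arguments take precedence over yaml values."""
    merged = dict(yaml_config)  # copy so the loaded yaml config is left untouched
    merged.update(overrides)
    return merged
```

For instance, passing `learning_rate=0.001` replaces whatever learning rate the yaml file specifies, while every key you do not pass keeps its yaml value.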

Parameter hints: <br>
`need_sample` controls whether to perform dynamic negative sampling in mini-batches.
`train_num_ngs` indicates how many negative instances follow each positive instance. <br>
Examples: <br>
(1) `need_sample=True and train_num_ngs=4`: There are only positive instances in your training file. The model will dynamically sample 4 negative instances for each positive instance in the mini-batch. Note that if need_sample is set to True, train_num_ngs must be greater than zero. <br>
(2) `need_sample=False and train_num_ngs=4`: In your training file, each positive line is followed by 4 negative lines. Note that if need_sample is set to False, you must provide a training file with negative instances, and train_num_ngs must match the number of negative instances per positive in your training file.
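Before training with `need_sample=False`, it can help to verify that your own file really follows the 1-positive-then-num_ngs-negatives layout. The helpers below are hypothetical checks written for this purpose, assuming the label is the first tab-separated column as described in Section 1.

```python
from typing import List, Sequence


def read_labels(path: str) -> List[int]:
    # The label is the first tab-separated column of each non-empty line.
    with open(path) as f:
        return [int(line.split("\t", 1)[0]) for line in f if line.strip()]


def check_label_groups(labels: Sequence[int], num_ngs: int) -> None:
    """Raise ValueError unless labels repeat the pattern [1] + [0] * num_ngs."""
    group_size = 1 + num_ngs
    if len(labels) % group_size != 0:
        raise ValueError(
            "{} lines is not a multiple of 1 + num_ngs = {}".format(len(labels), group_size)
        )
    expected = [1] + [0] * num_ngs
    for start in range(0, len(labels), group_size):
        group = list(labels[start:start + group_size])
        if group != expected:
            raise ValueError(
                "lines {}-{} have labels {}, expected {}".format(
                    start + 1, start + group_size, group, expected
                )
            )
```

A typical call would be `check_label_groups(read_labels(train_file), train_num_ngs)` for a training file you prepared yourself; the same check applies to a validation file with `valid_num_ngs`.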

In [5]:
### NOTE:  
### remember to use `_create_vocab(train_file, user_vocab, item_vocab, cate_vocab)` to generate the user_vocab, item_vocab and cate_vocab files, if you are using your own dataset rather than using our demo Amazon dataset.
hparams = prepare_hparams(yaml_file, 
                          embed_l2=0., 
                          layer_l2=0., 
                          learning_rate=0.001,  # set to 0.01 if batch normalization is disabled
                          epochs=EPOCHS,
                          batch_size=BATCH_SIZE,
                          show_step=20,
                          MODEL_DIR=os.path.join(data_path, "model/"),
                          SUMMARIES_DIR=os.path.join(data_path, "summary/"),
                          user_vocab=user_vocab,
                          item_vocab=item_vocab,
                          cate_vocab=cate_vocab,
                          need_sample=True,
                          train_num_ngs=train_num_ngs, # provides the number of negative instances for each positive instance for loss computation.
            )




#### 1.2 Create data loader
Designate a data iterator for the model. All our sequential models use SequentialIterator; the data format is introduced above.

<br>The validation and test files are negatively sampled offline, with `valid_num_ngs` and `test_num_ngs` negative instances per positive instance, respectively.

In [6]:
input_creator = SequentialIterator
#### uncomment this for the NextItNet model, because it needs a special data iterator for training
#input_creator = NextItNetIterator

## 2. Create model
When both hyper-parameters and data iterator are ready, we can create a model:

In [7]:
model = SeqModel(hparams, input_creator, seed=RANDOM_SEED)

## sometimes we don't want to train a model from scratch
## then we can load a pre-trained model like this: 
#model.load_model(r'your_model_path')








Now let's see how the model performs at this point (before any training):

In [8]:
print(model.run_eval(test_file, num_ngs=test_num_ngs)) # test_num_ngs is the number of negative lines after each positive line in your test_file

{'auc': 0.5131, 'logloss': 0.6931, 'mean_mrr': 0.289, 'ndcg@2': 0.1609, 'ndcg@4': 0.2475, 'ndcg@6': 0.3219, 'group_auc': 0.5134}


An AUC of 0.5 corresponds to random guessing. We can see that before training, the model behaves like a random guesser.
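The ranking metrics are computed within each group of 1+num_ngs candidates. Below is a plain-Python sketch of the standard per-group definitions of MRR and NDCG@k, assuming binary labels and ignoring score ties; the function names are hypothetical, `group_auc` is not covered, and the library's implementation may differ in such details.

```python
import math
from typing import List, Sequence


def _labels_by_score(y_true: Sequence[int], y_score: Sequence[float]) -> List[int]:
    # Sort candidate indices by descending score and return their labels in that order.
    order = sorted(range(len(y_score)), key=lambda i: y_score[i], reverse=True)
    return [y_true[i] for i in order]


def mrr(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """Mean reciprocal rank of the relevant items within one group."""
    ranked = _labels_by_score(y_true, y_score)
    return sum(rel / (pos + 1) for pos, rel in enumerate(ranked)) / sum(y_true)


def _dcg(ranked_labels: Sequence[int], k: int) -> float:
    return sum((2 ** rel - 1) / math.log2(pos + 2) for pos, rel in enumerate(ranked_labels[:k]))


def ndcg_at_k(y_true: Sequence[int], y_score: Sequence[float], k: int) -> float:
    """DCG@k of the model's ranking divided by DCG@k of the ideal ranking."""
    ideal = _dcg(sorted(y_true, reverse=True), k)
    return _dcg(_labels_by_score(y_true, y_score), k) / ideal


# Expected MRR of a random ranking over 10 candidates (1 positive + 9 negatives):
# the positive is equally likely to land at any rank r = 1..10.
expected_random_mrr = sum(1.0 / r for r in range(1, 11)) / 10
```

With test_num_ngs=9 each group has 10 candidates, so a random ranking gives an expected MRR of about 0.293, close to the untrained model's mean_mrr of 0.289. Likewise, a model that outputs 0.5 for every instance has a log loss of ln 2 ≈ 0.6931, the value reported above.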

#### 2.1 Train model
Next we want to train the model on a training set, and check the performance on a validation dataset. Training the model is as simple as a function call:

In [9]:
start_time = time.time()
model = model.fit(train_file, valid_file, valid_num_ngs=valid_num_ngs) 
# valid_num_ngs is the number of negative lines after each positive line in your valid_file 
# we will evaluate the performance of model on valid_file every epoch
end_time = time.time()
print('Time cost for training is {0:.2f} mins'.format((end_time-start_time)/60.0))


step 20 , total_loss: 1.6105, data_loss: 1.6105
eval valid at epoch 1: auc:0.4977,logloss:0.6933,mean_mrr:0.4526,ndcg@2:0.3198,ndcg@4:0.51,ndcg@6:0.5866,group_auc:0.4972
step 20 , total_loss: 1.5950, data_loss: 1.5950
eval valid at epoch 2: auc:0.5648,logloss:0.7007,mean_mrr:0.4957,ndcg@2:0.3825,ndcg@4:0.553,ndcg@6:0.6197,group_auc:0.5484
step 20 , total_loss: 1.4578, data_loss: 1.4578
eval valid at epoch 3: auc:0.6493,logloss:0.816,mean_mrr:0.5831,ndcg@2:0.507,ndcg@4:0.6476,ndcg@6:0.6866,group_auc:0.6532
step 20 , total_loss: 1.2790, data_loss: 1.2790
eval valid at epoch 4: auc:0.7018,logloss:0.7818,mean_mrr:0.6176,ndcg@2:0.5572,ndcg@4:0.6838,ndcg@6:0.7131,group_auc:0.6969
step 20 , total_loss: 1.3249, data_loss: 1.3249
eval valid at epoch 5: auc:0.7208,logloss:0.6877,mean_mrr:0.6466,ndcg@2:0.5921,ndcg@4:0.7101,ndcg@6:0.7349,group_auc:0.722
step 20 , total_loss: 1.2396, data_loss: 1.2396
eval valid at epoch 6: auc:0.7336,logloss:0.6063,mean_mrr:0.6554,ndcg@2:0.6022,ndcg@4:0.7173,ndcg

#### 2.2  Evaluate model

Again, let's see how the model performs now (after training):

In [10]:
res_syn = model.run_eval(test_file, num_ngs=test_num_ngs)
print(res_syn)
sb.glue("res_syn", res_syn)

{'auc': 0.7249, 'logloss': 0.5924, 'mean_mrr': 0.4946, 'ndcg@2': 0.4075, 'ndcg@4': 0.5107, 'ndcg@6': 0.5607, 'group_auc': 0.7133}




If we want to get the full prediction scores rather than evaluation metrics, we can do this:

In [11]:
model = model.predict(test_file, output_file)




In [12]:
# The data was downloaded to the data_path folder. You can delete it manually if you no longer need it.

#### 2.3  Running models with large dataset
Here are the performances of popular sequential models on the whole Amazon dataset, which contains 1,697,533 positive instances.
<br>Settings for reproducing the results:
<br>`learning_rate=0.001, dropout=0.3, item_embedding_dim=32, cate_embedding_dim=8, l2_norm=0, batch_size=400, 
train_num_ngs=4, valid_num_ngs=4, test_num_ngs=49`


We compare the running time with CPU only and with GPU on the larger dataset. In the table below, GPU training is roughly 1.8x (A2SVD) to 17x (Caser) faster per epoch than CPU-only training. Hardware specification for running the large dataset:
<br>GPU: Tesla P100-PCIE-16GB
<br>CPU: 6 cores Intel(R) Xeon(R) CPU E5-2690 v4 @ 2.60GHz
 
| Models | AUC | g-AUC | NDCG@2 | NDCG@10 | seconds per epoch on GPU | seconds per epoch on CPU| config |
| :------| :------: | :------: | :------: | :------: | :------: | :------: | :------ |
| A2SVD | 0.8251 | 0.8178 | 0.2922 | 0.4264 | 249.5 | 440.0 | N/A |
| GRU4Rec | 0.8411 | 0.8332 | 0.3213 | 0.4547 | 439.0 | 4285.0 | max_seq_length=50, hidden_size=40|
| Caser | 0.8244 | 0.8171 | 0.283 | 0.4194 | 314.3 | 5369.9 | T=1, n_v=128, n_h=128, L=3, min_seq_length=5|
| SLi_Rec | 0.8631 | 0.8519 | 0.3491 | 0.4842 | 549.6 | 5014.0 | attention_size=40, max_seq_length=50, hidden_size=40|
| NextItNet* | 0.6793 | 0.6769 | 0.0602 | 0.1733 | 112.0 | 214.5 | min_seq_length=3, dilations=\[1,2,4,1,2,4\], kernel_size=3 |

 Note 1: The five models were tuned with a coarse-grained grid search, so the results are for reference only.
 <br>Note 2: The NextItNet model requires a dataset with strong sequential properties. The Amazon dataset used in this notebook does not meet that requirement, so NextItNet may not perform well here. If you wish to use other datasets with strong sequential properties, NextItNet is recommended.
 <br>Note 3: The time cost of the NextItNet model is significantly lower than that of the other models because it doesn't need history expansion of the training data.

## 3. Online serving
In this section, we provide a simple example to illustrate how the trained model can be used to serve production requests.

Suppose we are in a new session. First, let's load a previously trained model:

In [13]:
model_best_trained = SeqModel(hparams, input_creator, seed=RANDOM_SEED)
path_best_trained = os.path.join(hparams.MODEL_DIR, "best_model")
print('loading saved model in {0}'.format(path_best_trained))
model_best_trained.load_model(path_best_trained)


loading saved model in ../../tests/resources/deeprec/slirec/model/best_model
INFO:tensorflow:Restoring parameters from ../../tests/resources/deeprec/slirec/model/best_model


Let's check that the model was loaded correctly. The test metrics should be close to the numbers we obtained after training.

In [14]:
model_best_trained.run_eval(test_file, num_ngs=test_num_ngs)

{'auc': 0.7249,
 'logloss': 0.5924,
 'mean_mrr': 0.4946,
 'ndcg@2': 0.4075,
 'ndcg@4': 0.5107,
 'ndcg@6': 0.5607,
 'group_auc': 0.7133}

Next, we make predictions with this model. In the following steps, we will make predictions with a serving model and then check whether the two result files are consistent.

In [15]:
model_best_trained.predict(test_file, output_file)

<reco_utils.recommender.deeprec.models.sequential.sli_rec.SLI_RECModel at 0x7f2da0326e80>

Now let's start our quick journey into online serving.

For efficient and flexible serving, we usually keep only the necessary computation nodes and freeze the TF model into a single pb file, so that we can easily compute scores with this unified pb file in either Python or Java:

In [16]:

with model_best_trained.sess as sess:
    graph_def = model_best_trained.graph.as_graph_def()
    output_graph_def = tf.graph_util.convert_variables_to_constants(
        sess,
        graph_def,
        ["pred"]
    )

    outfilepath = os.path.join(hparams.MODEL_DIR, "serving_model.pb")
    with tf.gfile.GFile(outfilepath, 'wb') as f:
        f.write(output_graph_def.SerializeToString())



INFO:tensorflow:Froze 61 variables.
INFO:tensorflow:Converted 61 variables to const ops.


The serving logic is as simple as feeding the feature values to the corresponding input nodes and fetching the score from the output node.

In our model, the input nodes are placeholders and control variables (such as is_training and layer_keeps). We can get the nodes by their names:

In [17]:
class LoadFrozedPredModel:
    def __init__(self, graph):
        self.pred = graph.get_tensor_by_name('import/pred:0') 
        self.items = graph.get_tensor_by_name('import/items:0') 
        self.cates = graph.get_tensor_by_name('import/cates:0') 
        self.item_history = graph.get_tensor_by_name('import/item_history:0') 
        self.item_cate_history = graph.get_tensor_by_name('import/item_cate_history:0') 
        self.mask = graph.get_tensor_by_name('import/mask:0')  
        self.time_from_first_action = graph.get_tensor_by_name('import/time_from_first_action:0') 
        self.time_to_now = graph.get_tensor_by_name('import/time_to_now:0') 
        self.layer_keeps = graph.get_tensor_by_name('import/layer_keeps:0') 
        self.is_training = graph.get_tensor_by_name('import/is_training:0') 


In [18]:
def infer_as_serving(model, infile, outfile, hparams, iterator, sess):
    preds = []
    
    for batch_data_input in iterator.load_data_from_file(infile, batch_num_ngs=0):
        if batch_data_input:
            feed_dict = {
                model.layer_keeps: np.ones(3, dtype=np.float32),  # keep probability 1.0, i.e. no dropout at inference
                model.is_training: False,
                model.items: batch_data_input[iterator.items],
                model.cates: batch_data_input[iterator.cates],
                model.item_history: batch_data_input[iterator.item_history],
                model.item_cate_history: batch_data_input[iterator.item_cate_history],
                model.mask: batch_data_input[iterator.mask],
                model.time_from_first_action: batch_data_input[iterator.time_from_first_action],
                model.time_to_now: batch_data_input[iterator.time_to_now]
            }
            step_pred = sess.run(model.pred, feed_dict=feed_dict)
            preds.extend(np.reshape(step_pred, -1))
                
    with open(outfile, "w") as wt:
        for line in preds:
            wt.write('{0}\n'.format(line))
            

Here is the main pipeline for inference in an online-serving manner. You can compare 'output_serving.txt' with 'output.txt' to check whether the results are consistent.

The input file format is the same as introduced in Section 1 'Input data format'. In the serving stage we do not need a ground-truth label, so you can simply place any number, such as 0, in the label column. The iterator will parse the input file and convert it into the required format for the model's feed dictionary.
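For example, to score several candidate items for one user at serving time, you could write one line per candidate with a placeholder label of 0. `make_serving_line` is a hypothetical helper; it assumes the history is given as (item_id, category_id, timestamp) tuples, ordered as in your training data.

```python
from typing import Sequence, Tuple


def make_serving_line(
    user_id: str,
    item_id: str,
    category_id: str,
    timestamp: int,
    history: Sequence[Tuple[str, str, int]],
) -> str:
    """Build one tab-separated line in the Section 1 format with a placeholder label of 0."""
    hist_items = ",".join(h[0] for h in history)
    hist_cates = ",".join(h[1] for h in history)
    hist_ts = ",".join(str(h[2]) for h in history)
    fields = ["0", user_id, item_id, category_id, str(timestamp), hist_items, hist_cates, hist_ts]
    return "\t".join(fields)
```

Writing such lines (one per candidate item, sharing the same history) to a file gives an input that the iterator can parse exactly like test_file.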

In [19]:
G = tf.Graph()
with tf.gfile.GFile(
        os.path.join(hparams.MODEL_DIR, "serving_model.pb"),
        'rb'
) as f, G.as_default():
    graph_def_optimized = tf.GraphDef()
    graph_def_optimized.ParseFromString(f.read())
    
    ####  uncomment this line if you want to check what content is included in the graph
    #print('graph_def_optimized = ' + str(graph_def_optimized))


with tf.Session(graph=G) as sess:
    tf.import_graph_def(graph_def_optimized)

    model = LoadFrozedPredModel(sess.graph)
    
    serving_output_file = os.path.join(data_path, r'output_serving.txt')  
    iterator = input_creator(hparams, tf.Graph())
    infer_as_serving(model, test_file, serving_output_file, hparams, iterator, sess)
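To compare 'output_serving.txt' with 'output.txt', a small tolerance-based check is safer than exact string comparison, since the two pipelines may format floats differently. This is a sketch: `compare_score_files` is a hypothetical helper, and the default tolerance of 1e-5 is an assumption you may need to adjust.

```python
import math
from typing import List, Tuple


def read_scores(path: str) -> List[float]:
    # One score per line; blank lines are skipped.
    with open(path) as f:
        return [float(line) for line in f if line.strip()]


def compare_score_files(path_a: str, path_b: str, tol: float = 1e-5) -> Tuple[bool, float]:
    """Return (consistent, max_abs_diff) for two one-score-per-line files."""
    a, b = read_scores(path_a), read_scores(path_b)
    if len(a) != len(b):
        return False, math.inf
    max_diff = max((abs(x - y) for x, y in zip(a, b)), default=0.0)
    return max_diff <= tol, max_diff
```

In this notebook the call would be `compare_score_files(output_file, serving_output_file)`; a result of `(True, ...)` indicates that the frozen serving graph reproduces the trained model's scores.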
    

## Reference
\[1\] Zeping Yu, Jianxun Lian, Ahmad Mahmoody, Gongshen Liu, Xing Xie. Adaptive User Modeling with Long and Short-Term Preferences for Personalized Recommendation. In Proceedings of the 28th International Joint Conference on Artificial Intelligence, IJCAI’19, Pages 4213-4219. AAAI Press, 2019.

\[2\] Balázs Hidasi, Alexandros Karatzoglou, Linas Baltrunas, Domonkos Tikk. Session-based Recommendations with Recurrent Neural Networks. ICLR (Poster), 2016.

\[3\] Jiaxi Tang, Ke Wang. Personalized Top-N Sequential Recommendation via Convolutional Sequence Embedding. In Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining. ACM, 2018.

\[4\] Fajie Yuan, Alexandros Karatzoglou, Ioannis Arapakis, Joemon M. Jose, Xiangnan He. A Simple Convolutional Generative Network for Next Item Recommendation. WSDM, 2019.