[View in Colaboratory](https://colab.research.google.com/github/whongyi/openrec/blob/master/tutorials/Vanilla_Youtube_Recommender_example.ipynb)

<p align="center">
  <img src ="https://recsys.acm.org/wp-content/uploads/2017/07/recsys-18-small.png" height="40" /> <font size="4">Recsys 2018 Tutorial</font>
</p>
<p align="center">
  <font size="4"><b>Modularizing Deep Neural Network-Inspired Recommendation Algorithms</b></font>
</p>
<p align="center">
  <font size="4">Hands-on: Customizing Deep YouTube Video Recommendation. Vanilla YouTube Recommender</font>
</p>

# Install OpenRec and download dataset

```python
!pip install openrec

import urllib.request

dataset_prefix = 'http://s3.amazonaws.com/cornell-tech-sdl-openrec'
urllib.request.urlretrieve('%s/lastfm/lastfm_test.npy' % dataset_prefix,
                           'lastfm_test.npy')
urllib.request.urlretrieve('%s/lastfm/lastfm_train.npy' % dataset_prefix,
                           'lastfm_train.npy')
```

## load LastFM dataset

```python
import numpy as np

total_users = 992
total_items = 14598
train_data = np.load('lastfm_train.npy')
test_data = np.load('lastfm_test.npy')
```

```python
train_data[:10], test_data[:10]
```

# Task

In this task, we will use users' listening histories to predict their next listens. You will need to implement a vanilla version of the YouTube video recommender using `openrec.tf1`.
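To make the prediction target concrete, here is a minimal plain-Python sketch (independent of OpenRec) of how one user's time-ordered listens become (past listens, next listen) pairs. The helper `next_listen_examples` is hypothetical; the OpenRec samplers used later produce inputs of this kind, but their exact sampling strategy may differ.

```python
from typing import List, Tuple


def next_listen_examples(history: List[int], min_len: int = 1) -> List[Tuple[List[int], int]]:
    """Split one user's time-ordered listens into (past listens, next listen) pairs."""
    examples = []
    for t in range(min_len, len(history)):
        # Items before position t form the input; the item at position t is the label.
        examples.append((history[:t], history[t]))
    return examples


# A user who listened to items 7, 3, 9 and 4, in that order.
for past, nxt in next_listen_examples([7, 3, 9, 4]):
    print(past, '->', nxt)
# [7] -> 3
# [7, 3] -> 9
# [7, 3, 9] -> 4
```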

# Implement the Vanilla YouTube Recommender

To implement a model using OpenRec, you first need to decide how the recommender should be decomposed into subgraphs, i.e., `inputgraph`, `usergraph`, `itemgraph`, `interactiongraph`, and `optimizergraph`. For example, the training graph of `VanillaYouTubeRec` can be decomposed as follows.

<p align="center">
  <img src ="https://s3.amazonaws.com/cornell-tech-sdl-openrec/tutorials/vanilla_youtube_module.png" height="600" />
</p>

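* **inputgraph**: the item consumption history (padded item IDs plus the true history length) and the ground-truth label.
* **usergraph**: left empty, since no user-specific latent factor is needed.
* **itemgraph**: extracts latent factors (embeddings) for the items in the history.
* **interactiongraph**: uses an MLP and a softmax over all items to model user-item interactions.
* **optimizergraph**: sums the registered losses and minimizes them with Adam.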
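The forward pass described by these subgraphs can be sketched in plain Python. This is a toy illustration, not OpenRec's `MLPSoftmax`: it assumes the history embeddings are pooled by a masked mean over the first `seq_len` positions (as in the YouTube paper's averaged watch vector) and that the hidden layer uses ReLU. The layer sizes mirror `dims=[dim_item_embed, total_items]`: one hidden layer of width `dim_item_embed`, then one logit per item. Check the `MLPSoftmax` source for its exact pooling and activations.

```python
import math
from typing import List, Sequence


def masked_mean(seq_vecs: Sequence[Sequence[float]], seq_len: int) -> List[float]:
    """Average the first seq_len item vectors; padded positions are ignored."""
    dim = len(seq_vecs[0])
    pooled = [0.0] * dim
    for vec in seq_vecs[:seq_len]:
        for j in range(dim):
            pooled[j] += vec[j] / seq_len
    return pooled


def dense(x: Sequence[float], weights, bias) -> List[float]:
    """Fully connected layer; `weights` holds one row per output unit."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def forward(seq_vecs, seq_len, w1, b1, w2, b2) -> List[float]:
    """Pool the history, apply a ReLU hidden layer, then a softmax over all items."""
    hidden = [max(0.0, h) for h in dense(masked_mean(seq_vecs, seq_len), w1, b1)]
    return softmax(dense(hidden, w2, b2))


# Toy setting: 2-d embeddings, max_seq_len = 3, seq_len = 2, a 3-item catalog.
seq_vecs = [[1.0, 0.0], [0.0, 1.0], [9.0, 9.0]]  # the last row is padding
w1, b1 = [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]    # hidden layer of width dim_item_embed
w2, b2 = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]], [0.0, 0.0, 0.0]  # one logit per item
probs = forward(seq_vecs, seq_len=2, w1=w1, b1=b1, w2=w2, b2=b2)
print([round(p, 3) for p in probs])
```

The padding row `[9.0, 9.0]` has no effect on the result, because only the first `seq_len` rows are pooled.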

A sample specification of the Vanilla YouTube Recommender is shown below.

<p align="center">
  <img src ="https://s3.amazonaws.com/cornell-tech-sdl-openrec/tutorials/vanilla_youtube.png" height="300" />
</p>

**Your task**
- fill in the placeholders in the implementation of the `Tutorial_VanillaYouTubeRec` function;
- successfully run the experiment code with the recommender you just built.

```python
from openrec.tf1.recommenders import Recommender
from openrec.tf1.modules.extractions import LatentFactor
from openrec.tf1.modules.interactions import MLPSoftmax
import tensorflow as tf


def Tutorial_VanillaYouTubeRec(batch_size, dim_item_embed, max_seq_len, total_items,
                               l2_reg_embed=None, l2_reg_mlp=None, dropout=None, init_model_dir=None,
                               save_model_dir='Vanilla_YouTube/', train=True, serve=False):

    rec = Recommender(init_model_dir=init_model_dir,
                      save_model_dir=save_model_dir, train=train, serve=serve)

    # inputgraph (training): padded item-id history, its true length, and the label (next item).
    @rec.traingraph.inputgraph(outs=['seq_item_id', 'seq_len', 'label'])
    def train_input_graph(subgraph):
        subgraph['seq_item_id'] = tf.placeholder(tf.int32,
                                                 shape=[batch_size, max_seq_len],
                                                 name='seq_item_id')
        subgraph['seq_len'] = tf.placeholder(tf.int32,
                                             shape=[batch_size],
                                             name='seq_len')
        subgraph['label'] = tf.placeholder(tf.int32,
                                           shape=[batch_size],
                                           name='label')
        # Register the placeholders so that batches can be fed by name.
        subgraph.register_global_input_mapping({'seq_item_id': subgraph['seq_item_id'],
                                                'seq_len': subgraph['seq_len'],
                                                'label': subgraph['label']})

    # inputgraph (serving): the same history inputs, no label, and a variable batch size.
    @rec.servegraph.inputgraph(outs=['seq_item_id', 'seq_len'])
    def serve_input_graph(subgraph):
        subgraph['seq_item_id'] = tf.placeholder(tf.int32,
                                                 shape=[None, max_seq_len],
                                                 name='seq_item_id')
        subgraph['seq_len'] = tf.placeholder(tf.int32,
                                             shape=[None],
                                             name='seq_len')
        subgraph.register_global_input_mapping({'seq_item_id': subgraph['seq_item_id'],
                                                'seq_len': subgraph['seq_len']})

    # itemgraph: look up an embedding for every item id in the history (used by both graphs).
    @rec.traingraph.itemgraph(ins=['seq_item_id'], outs=['seq_vec'])
    @rec.servegraph.itemgraph(ins=['seq_item_id'], outs=['seq_vec'])
    def item_graph(subgraph):
        _, subgraph['seq_vec'] = LatentFactor(l2_reg=l2_reg_embed,
                                              init='normal',
                                              id_=subgraph['seq_item_id'],
                                              shape=[total_items, dim_item_embed],
                                              subgraph=subgraph,
                                              scope='item')

    # interactiongraph (training): MLP + softmax over all items, supervised by the label.
    @rec.traingraph.interactiongraph(ins=['seq_vec', 'seq_len', 'label'])
    def train_interaction_graph(subgraph):
        MLPSoftmax(user=None,
                   item=subgraph['seq_vec'],
                   seq_len=subgraph['seq_len'],
                   max_seq_len=max_seq_len,
                   dims=[dim_item_embed, total_items],
                   l2_reg=l2_reg_mlp,
                   labels=subgraph['label'],
                   dropout=dropout,
                   train=True,
                   subgraph=subgraph,
                   scope='MLPSoftmax')

    # interactiongraph (serving): same module with train=False, so no labels or dropout.
    @rec.servegraph.interactiongraph(ins=['seq_vec', 'seq_len'])
    def serve_interaction_graph(subgraph):
        MLPSoftmax(user=None,
                   item=subgraph['seq_vec'],
                   seq_len=subgraph['seq_len'],
                   max_seq_len=max_seq_len,
                   dims=[dim_item_embed, total_items],
                   l2_reg=l2_reg_mlp,
                   train=False,
                   subgraph=subgraph,
                   scope='MLPSoftmax')

    # optimizergraph: sum every registered loss and minimize it with Adam.
    @rec.traingraph.optimizergraph
    def optimizer_graph(subgraph):
        losses = tf.add_n(subgraph.get_global_losses())
        optimizer = tf.train.AdamOptimizer(learning_rate=0.001)
        subgraph.register_global_operation(optimizer.minimize(losses))

    # connector (both graphs): wire subgraph outputs to the inputs of downstream subgraphs.
    @rec.traingraph.connector
    @rec.servegraph.connector
    def connect(graph):
        graph.itemgraph['seq_item_id'] = graph.inputgraph['seq_item_id']
        graph.interactiongraph['seq_len'] = graph.inputgraph['seq_len']
        graph.interactiongraph['seq_vec'] = graph.itemgraph['seq_vec']

    # Training-only connection: the label feeds the interaction graph.
    @rec.traingraph.connector.extend
    def train_connect(graph):
        graph.interactiongraph['label'] = graph.inputgraph['label']

    return rec
```
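The `optimizer_graph` above minimizes `tf.add_n(subgraph.get_global_losses())`, i.e. the sum of every loss the modules registered. Below is a plain-Python sketch of what that sum might contain for this model. It assumes the training `MLPSoftmax` registers a softmax cross-entropy averaged over the batch and that `LatentFactor` adds `l2_reg_embed * l2_loss(...)` over some embedding values only when `l2_reg_embed` is set; which values are penalized, and the exact averaging, are assumptions here. The helper names are hypothetical; `l2_loss` follows the `tf.nn.l2_loss` convention of half the sum of squares.

```python
import math
from typing import List, Optional, Sequence


def softmax_cross_entropy(logits: Sequence[float], label: int) -> float:
    """-log softmax(logits)[label], computed with the log-sum-exp trick."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def l2_loss(values: Sequence[float]) -> float:
    """Same convention as tf.nn.l2_loss: half the sum of squares."""
    return sum(v * v for v in values) / 2.0


def total_loss(batch_logits: List[Sequence[float]], labels: List[int],
               embeddings: Sequence[float], l2_reg_embed: Optional[float] = None) -> float:
    losses = []
    # Assumed: the cross-entropy term is averaged over the batch.
    ce = [softmax_cross_entropy(lg, y) for lg, y in zip(batch_logits, labels)]
    losses.append(sum(ce) / len(ce))
    # The L2 term only exists when a regularization weight is given.
    if l2_reg_embed is not None:
        losses.append(l2_reg_embed * l2_loss(embeddings))
    return sum(losses)  # analogous to tf.add_n(subgraph.get_global_losses())


print(total_loss([[0.0, 0.0]], [0], embeddings=[1.0, 1.0]))  # log(2)
print(total_loss([[0.0, 0.0]], [0], embeddings=[1.0, 1.0], l2_reg_embed=0.1))
```

With the default `l2_reg_embed=None` and `l2_reg_mlp=None` used in the experiment below, only the cross-entropy term would be minimized.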

# Experiment
We will use the recommender you implemented to run a toy experiment on the LastFM dataset.

## preprocessing dataset

```python
from openrec.tf1.utils import Dataset

train_dataset = Dataset(train_data, total_users, total_items,
                        sortby='ts', name='Train')
test_dataset = Dataset(test_data, total_users, total_items,
                       sortby='ts', name='Test')
```
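The `sortby='ts'` argument asks `Dataset` to order interactions by their timestamp, which is what makes "next listen" well defined. The grouping step can be sketched in plain Python as below; the record layout `(user_id, item_id, ts)` and the helper `user_histories` are assumptions for illustration, not OpenRec's internals.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def user_histories(records: Iterable[Tuple[int, int, int]]) -> Dict[int, List[int]]:
    """Group (user_id, item_id, ts) records by user, each list ordered by timestamp."""
    events = defaultdict(list)
    for user_id, item_id, ts in records:
        events[user_id].append((ts, item_id))
    # sorted() is stable, so listens with equal timestamps keep their input order.
    return {user: [item for _, item in sorted(evts, key=lambda e: e[0])]
            for user, evts in events.items()}


records = [(0, 5, 30), (1, 2, 10), (0, 8, 10), (0, 1, 20)]
print(user_histories(records))  # {0: [8, 1, 5], 1: [2]}
```

If, as `sortby='ts'` suggests, `train_data` is a NumPy structured array, `train_data.dtype.names` lists its actual field names.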

## hyperparameters and training parameters

```python
dim_item_embed = 50     # dimension of the item embeddings
max_seq_len = 100       # maximum length of a user's listening history
total_iter = int(1e3)   # number of training iterations
batch_size = 100        # training batch size
eval_iter = 100         # evaluate every eval_iter iterations
save_iter = eval_iter   # save the model every save_iter iterations
```
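A quick check of what these settings imply, as a plain-Python sketch. The helper `training_schedule` is hypothetical, and it assumes one batch per iteration and an evaluation at every multiple of `eval_iter`; whether `ModelTrainer` also evaluates at iteration 0 is not covered here.

```python
from typing import List, Tuple


def training_schedule(total_iter: int, batch_size: int, eval_iter: int) -> Tuple[int, List[int]]:
    """Return (training sequences sampled, iterations at which evaluation runs)."""
    sequences_sampled = total_iter * batch_size  # one batch per iteration
    # Assumed: evaluation at every multiple of eval_iter up to total_iter.
    eval_points = list(range(eval_iter, total_iter + 1, eval_iter))
    return sequences_sampled, eval_points


sequences, eval_points = training_schedule(total_iter=int(1e3), batch_size=100, eval_iter=100)
print(sequences, len(eval_points), eval_points[:3])  # 100000 10 [100, 200, 300]
```

With `save_iter = eval_iter`, checkpoints would be written at the same iterations as the evaluations.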

## define sampler
We use `TemporalSampler` to draw batches of training sequences and `TemporalEvaluationSampler` to generate the test sequences used for evaluation.

```python
from openrec.tf1.utils.samplers import TemporalEvaluationSampler, TemporalSampler

train_sampler = TemporalSampler(batch_size=batch_size, max_seq_len=max_seq_len,
                                dataset=train_dataset, num_process=1)
test_sampler = TemporalEvaluationSampler(dataset=test_dataset,
                                         max_seq_len=max_seq_len)
```
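The training input graph declares three fixed-shape inputs: `seq_item_id` of shape `[batch_size, max_seq_len]`, plus `seq_len` and `label` of shape `[batch_size]`. The sketch below shows one way a sampler could turn (history, next item) pairs into that format. It is hypothetical: keeping the most recent listens, padding on the right, and the `PAD_ID` value are assumptions, not a description of `TemporalSampler`'s internals.

```python
from typing import Dict, List, Sequence, Tuple

PAD_ID = 0  # hypothetical padding value; padded slots are masked out via seq_len


def to_model_input(history: Sequence[int], label: int,
                   max_seq_len: int) -> Tuple[List[int], int, int]:
    """Turn one (history, next item) pair into (seq_item_id, seq_len, label).

    Assumes max_seq_len >= 1.
    """
    recent = list(history[-max_seq_len:])  # keep only the most recent listens
    seq_len = len(recent)
    seq_item_id = recent + [PAD_ID] * (max_seq_len - seq_len)  # right-pad to max_seq_len
    return seq_item_id, seq_len, label


def to_batch(pairs, max_seq_len: int) -> Dict[str, list]:
    """Stack examples into the three inputs the training input graph declares."""
    seqs, lens, labels = zip(*(to_model_input(h, y, max_seq_len) for h, y in pairs))
    return {'seq_item_id': list(seqs), 'seq_len': list(lens), 'label': list(labels)}


print(to_batch([([4, 7, 9], 2), ([1], 3)], max_seq_len=3))
# {'seq_item_id': [[4, 7, 9], [1, 0, 0]], 'seq_len': [3, 1], 'label': [2, 3]}
```

Because `seq_len` carries the true length, the padding value may coincide with a real item id without affecting a model that respects the mask.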

## define evaluator

```python
from openrec.tf1.utils.evaluators import AUC, Recall

auc_evaluator = AUC()
recall_evaluator = Recall(recall_at=[100, 200, 300, 400, 500])
```
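For intuition, these are the textbook definitions of the two metrics for a single held-out item, written in plain Python. OpenRec's `AUC` and `Recall` evaluators may differ in details such as tie handling, which items are excluded from ranking, and how results are averaged over users; the helper names are hypothetical. With `recall_at=[100, 200, 300, 400, 500]`, recall is reported at each of those cutoffs.

```python
from typing import Dict, Sequence


def rank_of(scores: Sequence[float], item: int) -> int:
    """0-based rank of `item` when items are sorted by descending score.

    Only strictly higher scores push the item down, so ties favor it.
    """
    return sum(1 for s in scores if s > scores[item])


def recall_at(scores: Sequence[float], true_item: int, ks) -> Dict[int, float]:
    """With one held-out item, recall@k is 1.0 if it ranks in the top k, else 0.0."""
    r = rank_of(scores, true_item)
    return {k: 1.0 if r < k else 0.0 for k in ks}


def auc(scores: Sequence[float], true_item: int) -> float:
    """Fraction of the other items scored strictly below the held-out item."""
    negatives = [s for i, s in enumerate(scores) if i != true_item]
    return sum(1 for s in negatives if s < scores[true_item]) / len(negatives)


scores = [0.1, 0.9, 0.4, 0.7, 0.2]  # model scores for a 5-item catalog
print(recall_at(scores, true_item=3, ks=[1, 2]), auc(scores, true_item=3))
# {1: 0.0, 2: 1.0} 0.75
```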

## define model trainer

We instantiate the vanilla YouTube recommender defined above and wrap it in a `ModelTrainer`, which runs training and evaluation.

```python
from openrec.tf1 import ModelTrainer

model = Tutorial_VanillaYouTubeRec(batch_size=batch_size,
                                   total_items=train_dataset.total_items(),
                                   max_seq_len=max_seq_len,
                                   dim_item_embed=dim_item_embed,
                                   save_model_dir='vanilla_youtube_recommender/',
                                   train=True,
                                   serve=True)

model_trainer = ModelTrainer(model=model)
```

## training and testing

```python
model_trainer.train(total_iter=total_iter,
                    eval_iter=eval_iter,
                    save_iter=save_iter,
                    train_sampler=train_sampler,
                    eval_samplers=[test_sampler],
                    evaluators=[auc_evaluator, recall_evaluator])
```
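After training, the serving graph (the `MLPSoftmax` built with `train=False`) presumably yields one score per item for each input history. Turning such a score vector into a recommendation list is a top-k selection. The sketch below is plain Python and hypothetical; the option to drop items already in the user's history is a common choice, not something this notebook specifies.

```python
import heapq
from typing import Iterable, List, Sequence


def top_k_items(scores: Sequence[float], k: int, exclude: Iterable[int] = ()) -> List[int]:
    """Return the ids of the k highest-scoring items, best first."""
    excluded = set(exclude)
    candidates = ((s, i) for i, s in enumerate(scores) if i not in excluded)
    # nlargest avoids fully sorting the catalog (total_items scores per history).
    return [i for _, i in heapq.nlargest(k, candidates)]


scores = [0.1, 0.9, 0.4, 0.7, 0.2]
print(top_k_items(scores, k=2))               # [1, 3]
print(top_k_items(scores, k=2, exclude=[1]))  # [3, 2]
```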