In [1]:
# Copyright 2021 NVIDIA Corporation. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================

<img src="http://developer.download.nvidia.com/compute/machine-learning/frameworks/nvidia_logo.png" style="width: 90px; float: right;">

# Building Youtube-DNN Retrieval Model using Merlin Models

## Overview

[Merlin Models](https://github.com/NVIDIA-Merlin/models/) provides the building blocks for a two-stage pipeline that connects an item retrieval model (which extracts a subset of relevant items) to a ranking model (which identifies the top-k items to display to the user). For more information about the two-stage pipeline, see the example notebook [Retrieval models (ALI-CCP)- Two-Tower model example]**(add link when it is merged)**.

[Youtube-DNN](https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/45530.pdf) is a two-stage recommender model that proposes novel architectures for the retrieval and ranking models and trains them with custom training tasks. In this notebook, we build, train, and evaluate the retrieval architecture.

### Learning objectives

- Training and evaluating [Google's Youtube-DNN retrieval model](https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/45530.pdf) with only 3 commands.
- Building a retrieval index from the trained model.


## Downloading and preparing the dataset

In [2]:
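# `movielens` is the helper module that ships alongside this notebook.
from movielens import get_movielens

# Maximum number of items kept per user session.
MAX_LENGTH = 30
train, valid = get_movielens(variant="ml-1m", user_sessions=True, max_length=MAX_LENGTH)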

2022-03-22 22:40:12.496523: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 24570 MB memory:  -> device: 0, name: NVIDIA RTX A6000, pci bus id: 0000:65:00.0, compute capability: 8.6
downloading ml-1m.zip: 5.93MB [00:01, 4.95MB/s]                                                                                                                                                  
unzipping files: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 5/5 [00:00<00:00, 39.67files/s]
INFO:movielens:starting ETL..
INFO:movielens:saving the workflow..


In [3]:
train.compute().head()

| | userId | day | movieId-list | timestamp-list | genres-list | gender-first | age-first | occupation-first | zipcode-first | movieId-count | movieId-list_truncated |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 100 | [2249, 2317, 1102, 1767, 395, 1230, 963, 1270,... | [3865, 3865, 3865, 3865, 3865, 3865, 3865, 386... | [16, 2, 1, 1, 3, 8, 4, 3, 8, 4, 7, 8, 5, 4, 8,... | 1 | 5 | 2 | 11 | 118 | [3008, 1987, 639, 1955, 50, 2514, 128, 1038, 7... |
| 1 | 1 | 104 | [672, 558, 599, 1020, 661, 1620, 1583, 2374, 1... | [7742, 7742, 7742, 7742, 7742, 7742, 7742, 943... | [17, 8, 17, 8, 17, 4, 8, 17, 4, 2, 2, 14, 2, 2... | 1 | 5 | 2 | 11 | 10 | [672, 558, 599, 1020, 661, 1620, 1583, 2374, 1... |
| 2 | 1 | 105 | [1459, 815, 518, 656, 512, 2709, 187, 106, 710... | [2076, 2076, 2076, 2076, 2076, 2076, 4011, 401... | [9, 4, 1, 2, 9, 4, 3, 8, 1, 13, 14, 4, 2, 4, 2... | 1 | 5 | 2 | 11 | 147 | [121, 1267, 73, 231, 2494, 341, 972, 1808, 254... |
| 3 | 1 | 112 | [2701, 2794, 3005, 3318, 2950, 2364, 786, 871,... | [15406, 15406, 15406, 15406, 15406, 15406, 532... | [2, 13, 4, 2, 1, 3, 9, 5, 4, 1, 2, 2, 3, 5, 1,... | 1 | 5 | 2 | 11 | 15 | [2701, 2794, 3005, 3318, 2950, 2364, 786, 871,... |
| 4 | 1 | 116 | [856, 45, 143, 1212, 1482, 1031, 1153, 2148, 1... | [1050, 1050, 1050, 1050, 1050, 1050, 1050, 105... | [1, 7, 10, 2, 13, 2, 6, 11, 1, 13, 6, 1, 1, 11... | 1 | 5 | 2 | 11 | 51 | [2248, 3097, 2845, 3246, 3054, 3195, 3247, 162... |


## Train Youtube-DNN retrieval model

The candidate generation network proposed by [Youtube-DNN](https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/45530.pdf) leverages the sequence of YouTube browsing history as well as other context features about the user. 


The model is inspired by the bag-of-words representation from the NLP domain. In a nutshell, the embeddings of the `N-1` past events are averaged per event type (`watch_vector`, `search_vector`) to build the user's interaction embeddings, and the model is then trained to predict the next video the user will interact with. Lastly, a [sampled-softmax](http://arxiv.org/abs/1412.2007) loss is used to train efficiently over a large catalog of items.

<img src="images/YoutubeDNN.png"  width="30%">

<a href="https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/45530.pdf">Image Source: Youtube-DNN paper</a>
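To make the averaging and the sampled-softmax steps concrete, here is a minimal, dependency-free sketch. It is not the Merlin Models implementation: the function names, the toy embedding table, the uniform negative sampler, and the use of id `0` as padding are all assumptions made for illustration. It also skips the MLP that sits between the pooled history and the output layer, and it omits the sampling-probability correction that production sampled-softmax implementations apply.

```python
import math
import random


def mean_pool(history, table, pad_id=0):
    """Average the embeddings of the non-padding item ids in `history`."""
    rows = [table[i] for i in history if i != pad_id]
    if not rows:
        return [0.0] * len(table[0])
    return [sum(col) / len(rows) for col in zip(*rows)]


def sampled_softmax_loss(user_vec, table, target, num_sampled, rng, pad_id=0):
    """Cross-entropy over the target item plus `num_sampled` uniform negatives."""
    candidates = [i for i in range(len(table)) if i not in (pad_id, target)]
    negatives = rng.sample(candidates, num_sampled)
    ids = [target] + negatives
    # Dot-product score between the user vector and each candidate item.
    logits = [sum(u * v for u, v in zip(user_vec, table[i])) for i in ids]
    # Numerically stable log-sum-exp over the sampled candidates only.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[0]


# Toy setup (hypothetical): 9 items with 4-dimensional embeddings, id 0 is padding.
rng = random.Random(0)
table = [[0.0] * 4] + [[rng.gauss(0, 1) for _ in range(4)] for _ in range(9)]
history = [3, 7, 2, 0, 0]  # padded session of past item ids
user_vec = mean_pool(history, table)
loss = sampled_softmax_loss(user_vec, table, target=5, num_sampled=4, rng=rng)
```

The softmax is normalized over 5 candidates (the target plus 4 negatives) instead of the whole catalog, which is what keeps the loss cheap when the catalog is large.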

We select a subset of features to use for training the retrieval model:

In [4]:
schema = train.schema.select_by_name(['userId', 'movieId-list_truncated', 'gender-first', 'age-first','occupation-first', 'zipcode-first'])

We initialize the `YoutubeDNNRetrievalModel`.

In [5]:
import merlin.models.tf as mm

model = mm.YoutubeDNNRetrievalModel(
    schema=schema,
    num_sampled=1000,  # number of negatives drawn for the sampled-softmax loss
    top_block=mm.MLPBlock([128, 64]),  # MLP applied on top of the input features
    max_seq_length=MAX_LENGTH,
)
model.compile(optimizer="adam", run_eagerly=False)

Next, we train the model.

In [6]:
losses = model.fit(train, validation_data=valid, batch_size=256, epochs=5)

2022-03-22 22:40:24.999533: W tensorflow/python/util/util.cc:368] Sets are not currently considered sequences, but this may change in the future, so consider avoiding using them.
2022-03-22 22:40:26.610685: I tensorflow/stream_executor/cuda/cuda_blas.cc:1792] TensorFloat-32 will be used for the matrix multiplication. This will only be logged once.


Epoch 1/5
Please report this to the TensorFlow team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output.
Cause: module, class, method, function, traceback, frame, or code object was expected, got cython_function_or_method



2022-03-22 22:40:36.004959: W tensorflow/core/grappler/optimizers/loop_optimizer.cc:907] Skipping loop optimization for Merge node with control input: cond/then/_0/cond/cond/branch_executed/_154


Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


We evaluate the model on the validation set.

In [7]:
history = model.evaluate(valid, batch_size=64, return_dict=True)



## From model to top-k recommendation

After training, the neural network has learned a high-dimensional embedding for each movie, organized in a fixed vocabulary. The user representation vector `u` and the pre-computed video embeddings `V` are then fed to a nearest-neighbor index to retrieve the top-N videos, which are passed to the ranking stage to build the final list of recommendations.

**In progress**

The option to export the item and user towers from the YoutubeDNN retrieval model still needs to be implemented.
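As a rough illustration of the retrieval step, the sketch below scores every item against a user vector with a dot product and keeps the `k` best. It is a brute-force stand-in for a nearest-neighbor index, not the Merlin Models API: `top_k_items`, the item ids, and the 2-dimensional embeddings are hypothetical. A real deployment would typically use an approximate nearest-neighbor library instead of an exhaustive scan.

```python
import heapq


def top_k_items(user_vec, item_embeddings, k):
    """Return the ids of the k items with the highest dot-product score."""
    scores = {
        item_id: sum(u * v for u, v in zip(user_vec, vec))
        for item_id, vec in item_embeddings.items()
    }
    return heapq.nlargest(k, scores, key=scores.get)


# Hypothetical pre-computed item embeddings V, keyed by item id.
item_embeddings = {
    10: [1.0, 0.0],
    11: [0.0, 1.0],
    12: [0.7, 0.7],
    13: [-1.0, 0.0],
}
u = [0.9, 0.1]  # hypothetical user representation vector
top2 = top_k_items(u, item_embeddings, k=2)  # [10, 12]
```

The returned ids are ordered from the highest score to the lowest, so they can be passed to the ranking stage as-is.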

In [8]:
#TODO
# pre_embeddings = model.first['sparse'][0][0]['categorical'].embedding_tables['movieId-list_seq']