In [None]:
# Copyright 2021 NVIDIA Corporation. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================

## Training a DLRM model with TensorFlow

In the previous notebooks, we downloaded the MovieLens data, converted it to parquet files, and then used the NVTabular library to process the data, join data frames, and create input features. In this notebook, we use the NVIDIA Merlin Models library to build and train a Deep Learning Recommendation Model [(DLRM)](https://arxiv.org/abs/1906.00091), an architecture originally proposed by Facebook in 2019.

Figure 1 illustrates the DLRM architecture. The model was introduced as a personalization deep learning model that uses embeddings to process sparse features representing categorical data and a multilayer perceptron (MLP) to process dense features. It then models the interactions between these features explicitly, using the statistical techniques proposed in [this paper](https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5694074).

![DLRM](../images/DLRM.png)

<p>Figure 1. DLRM architecture. Image source: <a href="https://github.com/NVIDIA/DeepLearningExamples/tree/master/PyTorch/Recommendation/DLRM">Nvidia DL Examples</a></p>

DLRM accepts two types of features: categorical and numerical. For details of the DLRM architecture and how to build it using the Merlin Models low-level API, please see the `Binary_classificaion_DLRM` notebook.
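To make the explicit interaction step concrete, here is a minimal sketch in plain Python of the pairwise dot-product interaction described in the DLRM paper. It is not the Merlin Models implementation; the helper names and the toy vectors are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def pairwise_dot_interactions(vectors):
    """Return the dot product of every distinct pair of vectors."""
    return [dot(u, v) for u, v in combinations(vectors, 2)]


# Toy values (hypothetical): one bottom-MLP output and two embeddings,
# all with the same dimension so that dot products are defined.
dense_vector = [1.0, 0.0]
embedding_a = [0.5, 2.0]
embedding_b = [-1.0, 1.0]

interactions = pairwise_dot_interactions([dense_vector, embedding_a, embedding_b])

# In the paper, the interactions are concatenated with the dense vector
# and the result is fed to the top MLP.
top_mlp_input = dense_vector + interactions
print(top_mlp_input)
```

Three input vectors give three distinct pairs, so the top MLP in this toy case receives the two dense values followed by three interaction terms.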

### Import Libraries

In [1]:
import sys
sys.path.append("/workspace/merlin_models/")
sys.path.append("/nvtabular/")

In [2]:
import os
import glob
import numpy as np
import pandas as pd
import nvtabular as nvt

import merlin_models.tf as ml
from merlin_standard_lib import Schema, Tag

2022-02-01 01:22:52.960361: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1510] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 16254 MB memory:  -> device: 0, name: Quadro GV100, pci bus id: 0000:15:00.0, compute capability: 7.0


In [3]:
import logging
# disable WARNING, INFO and DEBUG logging everywhere
logging.disable(logging.WARNING)
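Note that `logging.disable(level)` suppresses messages at the given level and below, so passing `logging.WARNING` silences warnings as well as INFO and DEBUG messages. The short sketch below checks this with a hypothetical in-memory handler (`ListHandler` and the logger name are illustrative, not part of the notebook).

```python
import logging


class ListHandler(logging.Handler):
    """Collect formatted log messages in a list instead of printing them."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(record.getMessage())


logger = logging.getLogger("disable_demo")
logger.setLevel(logging.DEBUG)
logger.propagate = False  # keep the demo output out of the root logger
handler = ListHandler()
logger.addHandler(handler)

logging.disable(logging.WARNING)  # silence WARNING and everything below it
logger.info("info message")
logger.warning("warning message")
logger.error("error message")
logging.disable(logging.NOTSET)  # restore normal logging behaviour

captured = handler.messages
print(captured)
```

Only the ERROR message reaches the handler; the INFO and WARNING messages are dropped.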

### Data Download and Preprocess

In [4]:
INPUT_DATA_DIR = os.environ.get(
    "INPUT_DATA_DIR", os.path.expanduser("/workspace/data/movielens/")
)

With the help of a utility function, we first download and unzip the data. Next, we apply basic preprocessing, split the data into train and validation sets, and save them as parquet files. Finally, we preprocess the train and validation parquet files and generate features for model training using NVTabular.

Let's download the MovieLens 25M dataset, process it, and save the files to disk in parquet format.

In [5]:
from merlin_standard_lib.utils.data_etl_utils import movielens_download_etl
movielens_download_etl(INPUT_DATA_DIR, 'ml-25m')

The Merlin Models library relies on a `schema` object to automatically build all the layers needed to represent, normalize, and aggregate input features. The `schema.pbtxt` file is a protobuf text file that contains feature metadata, including statistics such as cardinality and min and max values, and tags features based on their characteristics and dtypes (e.g., categorical, continuous, list, integer).

The `schema.pbtxt` file was generated with NVTabular during preprocessing. Now we read this file to create a `schema` object.

In [6]:
from merlin_standard_lib import Schema

# Path to the schema file generated for the training set
SCHEMA_PATH = os.path.join(INPUT_DATA_DIR, "ml-25m", "train", "schema.pbtxt")
schema = Schema().from_proto_text(SCHEMA_PATH)

We remove the `rating` and `title` columns from the schema, and then print the remaining column names, which include the binary target column, `rating_binary`.

In [7]:
schema = schema.remove_by_name(['rating', 'title'])

In [8]:
schema.column_names

['movieId',
 'userId',
 'genres',
 'TE_movieId_rating',
 'userId_count',
 'rating_binary']

Select continuous and categorical columns from schema using feature tags.

In [9]:
con_schema = schema.select_by_tag(Tag.CONTINUOUS)
cat_schema = schema.select_by_tag(Tag.CATEGORICAL)

In [10]:
con_schema.column_names, cat_schema.column_names

(['TE_movieId_rating', 'userId_count'], ['movieId', 'userId', 'genres'])
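Conceptually, tag-based selection filters the columns by the labels attached to them. The sketch below mimics this with a plain list of dictionaries. It is a toy stand-in, not the `Schema` class: the `toy_schema` structure, the tag strings, and the `"target"` tag on `rating_binary` are all assumptions made for illustration.

```python
# Hypothetical, simplified stand-in for a schema: one dict per column.
toy_schema = [
    {"name": "movieId", "tags": {"categorical"}},
    {"name": "userId", "tags": {"categorical"}},
    {"name": "genres", "tags": {"categorical", "list"}},
    {"name": "TE_movieId_rating", "tags": {"continuous"}},
    {"name": "userId_count", "tags": {"continuous"}},
    {"name": "rating_binary", "tags": {"target"}},  # assumed tag
]


def select_by_tag(schema, tag):
    """Return the names of all columns that carry the given tag."""
    return [column["name"] for column in schema if tag in column["tags"]]


continuous_names = select_by_tag(toy_schema, "continuous")
categorical_names = select_by_tag(toy_schema, "categorical")
print(continuous_names, categorical_names)
```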

### Define Data Loader

Below we define a helper function, `get_dataloader`, that wraps `tf_dataloader.Dataset`. It reads the data in batches, uses the column names selected from the schema as categorical and continuous inputs, and returns `rating_binary` as the label, reshaped to a 1-D tensor.

In [11]:
import tensorflow as tf
import merlin_models.tf.dataset as tf_dataloader

# Define categorical and continuous columns
x_cat_names, x_cont_names = cat_schema.column_names, con_schema.column_names

# Maximum sequence length for each list column
sparse_features_max = {'genres': 10}

def get_dataloader(paths_or_dataset, batch_size=4096, shuffle=True):
    dataloader = tf_dataloader.Dataset(
        paths_or_dataset,
        batch_size=batch_size,
        label_names=['rating_binary'],
        cat_names=x_cat_names,
        cont_names=x_cont_names,
        sparse_names=list(sparse_features_max.keys()),
        sparse_max=sparse_features_max,
        sparse_as_dense=True,
        shuffle=shuffle,
    )
    # Flatten the labels to a 1-D tensor
    return dataloader.map(lambda X, y: (X, tf.reshape(y, (-1,))))
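As a rough illustration of what the `sparse_max` and `sparse_as_dense=True` settings are for, the sketch below pads variable-length lists to a fixed length and flattens a column of labels, using plain Python lists. It assumes that the loader truncates or pads each list to the configured maximum; the padding value of 0, the helper name `pad_to_dense`, and the toy data are assumptions, not the loader's actual code.

```python
def pad_to_dense(rows, max_len, pad_value=0):
    """Truncate or right-pad each list so every row has exactly max_len items."""
    dense = []
    for row in rows:
        clipped = list(row[:max_len])
        dense.append(clipped + [pad_value] * (max_len - len(clipped)))
    return dense


# Toy batch (hypothetical ids): each row holds the genre ids of one movie.
genres_batch = [[3, 7], [1], [2, 5, 9, 4]]
dense_genres = pad_to_dense(genres_batch, max_len=4)

# Labels arriving as a column, one value per row, flattened to a 1-D list,
# which is the list equivalent of reshaping to (-1,).
labels = [[1], [0], [1]]
flat_labels = [value for row in labels for value in row]

print(dense_genres)
print(flat_labels)
```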

In [12]:
OUTPUT_DIR = os.environ.get("OUTPUT_DIR", "/workspace/data/movielens/ml-25m/")
train_paths = glob.glob(os.path.join(OUTPUT_DIR, "train/*.parquet"))
eval_paths = glob.glob(os.path.join(OUTPUT_DIR, "valid/*.parquet"))

In the DLRM architecture, categorical features are processed using embeddings. Below, for each categorical feature, we create an embedding table that provides a dense representation for each unique value of the feature. The dense vector values in the embedding tables are learned during model training.
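An embedding table can be pictured as a matrix with one row per unique category value, where a lookup simply selects rows by integer id. The following plain-Python sketch illustrates the idea only; the function names, the table size, and the initialization range are hypothetical, and the random values stand in for weights that would normally be learned during training.

```python
import random


def make_embedding_table(cardinality, embedding_dim, seed=0):
    """Create a table with one small random vector per category id."""
    rng = random.Random(seed)
    return [
        [rng.uniform(-0.05, 0.05) for _ in range(embedding_dim)]
        for _ in range(cardinality)
    ]


def lookup(table, ids):
    """Return the embedding vector for each id in the batch."""
    return [table[i] for i in ids]


# Toy example: 5 distinct movie ids, each mapped to a 4-dimensional vector.
movie_table = make_embedding_table(cardinality=5, embedding_dim=4)
batch_vectors = lookup(movie_table, [2, 0, 2])

print(len(batch_vectors), len(batch_vectors[0]))
```

The same id always maps to the same row, so the first and third vectors in the batch are identical.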

### Building a DLRM model with Merlin Models

In [14]:
dlrm_body = ml.DLRMBlock(
    schema,
    embedding_dim=16,
    bottom_block=ml.MLPBlock([64, 16]),
    top_block=ml.MLPBlock([64, 32]),
)
model = dlrm_body.connect(ml.BinaryClassificationTask("rating_binary"))
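Note that the last layer of the bottom MLP has 16 units, the same as `embedding_dim`, so the dense vector and the embedding vectors have matching sizes for the dot-product interaction. The sketch below works out the width of the top MLP input under the formulation in the DLRM paper. It assumes that each of the three categorical features contributes one 16-dimensional vector and that the dense vector is concatenated with the pairwise interactions; the Merlin Models block may differ in its details, and the helper name is hypothetical.

```python
def dlrm_top_input_dim(num_embedded_features, vector_dim):
    """Width of the top-MLP input: pairwise interactions plus the dense vector."""
    num_vectors = num_embedded_features + 1  # embeddings + bottom-MLP output
    num_pairs = num_vectors * (num_vectors - 1) // 2
    return num_pairs + vector_dim


# movieId, userId and genres embeddings, each of dimension 16 (assumed).
top_input_dim = dlrm_top_input_dim(num_embedded_features=3, vector_dim=16)
print(top_input_dim)
```

With four vectors there are six distinct pairs, which together with the 16 dense values gives 22 inputs under these assumptions.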

In [17]:
import tensorflow as tf
optimizer = tf.keras.optimizers.Adam(0.005)
model.compile(optimizer=optimizer, run_eagerly=False)

In [18]:
train_loader = get_dataloader(nvt.Dataset(train_paths), shuffle=True) 
losses = model.fit(train_loader, epochs=3)
model.reset_metrics()

print('*'*20)
print("Start evaluation")
eval_loader = get_dataloader(nvt.Dataset(eval_paths), shuffle=False) 
eval_metrics = model.evaluate(eval_loader, return_dict=True)

2022-02-01 01:10:27.512305: W tensorflow/python/util/util.cc:348] Sets are not currently considered sequences, but this may change in the future, so consider avoiding using them.
2022-02-01 01:10:28.350752: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:185] None of the MLIR Optimization Passes are enabled (registered 2)


Epoch 1/3
Please report this to the TensorFlow team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output.
Cause: annotated name 'output' can't be nonlocal (tmpygc__wyq.py, line 36)
Epoch 2/3
Epoch 3/3
********************
Start evaluation


## Training a Deep & Cross Network (DCN)-V2 model with TensorFlow

The [Deep & Cross Network (DCN)-V2](https://arxiv.org/pdf/2008.13535.pdf) architecture was proposed by Google in 2020 as an improvement upon the original [DCN model](https://arxiv.org/pdf/1708.05123.pdf). The overall model architecture is depicted in Figure 2, which shows two ways to combine the cross network with the deep network: (1) stacked and (2) parallel.

![DCN](../images/DCN.png)

<p>Figure 2. DCN-v2 architecture. Image source: <a href="https://arxiv.org/pdf/2008.13535.pdf">DCN V2</a></p>

The output of the embedding layer is the concatenation of all the embedded vectors and the normalized dense features: x<sub>0</sub> = [x<sub>embed,1</sub>; . . . ; x<sub>embed,𝑛</sub>; x<sub>dense</sub>]. Below, we build the stacked structure shown in Figure 2(a). It starts with an input layer (typically an embedding layer), whose output x<sub>0</sub> is fed to the cross network, which contains multiple cross layers that model explicit feature interactions, followed by the deep network. In the last step, we connect the final layer to the `BinaryClassificationTask` head to perform binary classification.
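To show what a cross layer computes, here is a minimal plain-Python sketch of the DCN-V2 cross layer from the paper, x<sub>l+1</sub> = x<sub>0</sub> ⊙ (W x<sub>l</sub> + b) + x<sub>l</sub>, where ⊙ is element-wise multiplication. This is not the `ml.CrossBlock` implementation; the function names and the toy weights (identity matrix, zero bias) are chosen only so the result is easy to verify by hand.

```python
def cross_layer(x0, xl, weight, bias):
    """One DCN-V2 cross layer: x0 * (W @ xl + b) + xl, with element-wise *."""
    projected = [
        sum(w * x for w, x in zip(row, xl)) + b
        for row, b in zip(weight, bias)
    ]
    return [x0_i * p_i + xl_i for x0_i, p_i, xl_i in zip(x0, projected, xl)]


def cross_network(x0, layers):
    """Apply the cross layers in sequence; every layer reuses the original x0."""
    x = x0
    for weight, bias in layers:
        x = cross_layer(x0, x, weight, bias)
    return x


# Toy parameters (hypothetical): identity weights and zero bias.
identity = [[1.0, 0.0], [0.0, 1.0]]
zero_bias = [0.0, 0.0]
x0 = [1.0, 2.0]

one_layer = cross_network(x0, [(identity, zero_bias)])
two_layers = cross_network(x0, [(identity, zero_bias)] * 2)
print(one_layer, two_layers)
```

Each additional layer multiplies by x<sub>0</sub> once more, which is how the degree of the explicit feature interactions grows with the depth of the cross network.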

In [13]:
dcn_body = (
    ml.InputBlock(schema,
        embedding_options=ml.EmbeddingOptions(embedding_dim_default=16),
        aggregation="concat",
    )
    .connect(ml.CrossBlock(3))
    .connect(ml.MLPBlock([512, 256]))
)
model = dcn_body.connect(ml.BinaryClassificationTask("rating_binary"))
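The cell above chains the input block, the cross network, and the deep network one after another, which is the stacked layout. As a sketch of how this differs from the parallel layout, the snippet below composes two placeholder functions in both ways. The `cross` and `deep` functions here are toy stand-ins (hypothetical), not real network blocks.

```python
def stacked(cross, deep):
    """Stacked layout: the deep network consumes the cross network's output."""
    return lambda x0: deep(cross(x0))


def parallel(cross, deep):
    """Parallel layout: both networks read x0 and their outputs are concatenated."""
    return lambda x0: cross(x0) + deep(x0)


# Toy stand-ins for the two sub-networks, operating on plain lists.
def cross(x):
    return [2 * value for value in x]


def deep(x):
    return [sum(x)]


x0 = [1, 2]
stacked_output = stacked(cross, deep)(x0)
parallel_output = parallel(cross, deep)(x0)
print(stacked_output, parallel_output)
```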

In [15]:
model.compile(optimizer="adam", run_eagerly=False)
train_loader = get_dataloader(nvt.Dataset(train_paths), shuffle=True) 
losses = model.fit(train_loader, epochs=3)

print('*'*20)
print("Start evaluation")
eval_loader = get_dataloader(nvt.Dataset(eval_paths), shuffle=False) 
eval_metrics = model.evaluate(eval_loader, return_dict=True)

2022-02-01 01:24:11.423250: W tensorflow/python/util/util.cc:348] Sets are not currently considered sequences, but this may change in the future, so consider avoiding using them.
2022-02-01 01:24:12.341618: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:185] None of the MLIR Optimization Passes are enabled (registered 2)


Epoch 1/3
Please report this to the TensorFlow team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output.
Cause: annotated name 'output' can't be nonlocal (tmpskqv39bh.py, line 36)
Epoch 2/3
Epoch 3/3
********************
Start evaluation


Just like that, with a couple of lines of code, we are able to build state-of-the-art deep learning-based recommender system models.