In [1]:
# Copyright 2021 NVIDIA Corporation. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ================================

## Advanced: Define your own architecture

### Overview
In [explore-different-models](https://github.com/NVIDIA-Merlin/models/blob/main/examples/Exploring-different-models.ipynb), we benchmark various ranking models provided by the high-level Merlin Models API. The library also includes the standard deep learning components that let RecSys practitioners and researchers define custom models, train them, and export them for inference.


In this example, we will combine pre-existing blocks and demonstrate how to create the [DLRM](https://arxiv.org/abs/1906.00091) architecture.


### Learning objectives
- Understand the building blocks of Merlin Models
- Define a model architecture from scratch

### Introduction to the core building blocks of Merlin Models

The [Block](https://nvidia-merlin.github.io/models/review/pr-294/generated/merlin.models.tf.Block.html#merlin.models.tf.Block) is the core abstraction in Merlin Models and is the class from which all blocks inherit.
The class extends the [tf.keras.layers.Layer](https://www.tensorflow.org/api_docs/python/tf/keras/layers/Layer) base class and implements a number of properties that simplify the creation of custom blocks and models. These properties include the `Schema` object, which is used to determine the embedding dimensions, input shapes, and output shapes. Additionally, each `Block` has a `BlockContext` that stores and retrieves public variables so they can be shared as metadata with other blocks in the same model.

Before deep-diving into the definition of the DLRM architecture, let's start by listing the core components you need to know to define a model from scratch:

#### Feature Blocks

Feature blocks are input blocks that process the inputs according to their types and shapes. We support three main blocks:
- `EmbeddingFeatures`: Input block for embedding-lookups for categorical features.
- `SequenceEmbeddingFeatures`: Input block for embedding-lookups for sequential categorical features (3D tensors).
- `ContinuousFeatures`: Input block for continuous features.
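Conceptually, an embedding lookup is a row gather from a trainable table. The NumPy sketch below illustrates the input and output shapes of the three feature blocks; the table size, the ids, and the 64-dimensional width are illustrative assumptions, and this is not how Merlin Models implements these blocks internally.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical embedding table: 10 distinct ids, 64-dim vectors (illustrative sizes)
num_ids, dim = 10, 64
embedding_table = rng.normal(scale=0.05, size=(num_ids, dim)).astype(np.float32)

# EmbeddingFeatures-style lookup: one id per example -> (batch, dim)
user_ids = np.array([3, 7, 1, 3])
user_embeddings = embedding_table[user_ids]

# SequenceEmbeddingFeatures-style lookup: a sequence of ids per example -> (batch, seq_len, dim)
item_sequences = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [0, 1, 2]])
sequence_embeddings = embedding_table[item_sequences]

# ContinuousFeatures-style input: raw floats kept as (batch, 1) columns
continuous = rng.random((4, 1)).astype(np.float32)

print(user_embeddings.shape, sequence_embeddings.shape, continuous.shape)
# (4, 64) (4, 3, 64) (4, 1)
```

Note that identical ids (here, the two `3`s) always map to the same vector, which is what lets the model share what it learns about a user or item across examples.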

#### Transformation Blocks

Transformation blocks include operators commonly used to transform tensors in various parts of the model, such as:

- `AsDenseFeatures`: It takes a dictionary of raw input tensors and transforms the sparse ones into dense tensors.
- `L2Norm`: It takes a single or a dictionary of hidden tensors and applies an L2-normalization along a given axis. 
- `LogitsTemperatureScaler`: It scales the output tensor of predicted logits to lower the model's confidence. 
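L2 normalization divides each vector by its Euclidean norm. Here is a NumPy sketch (not Merlin's code) of what an op like `L2Norm` computes along the last axis. The epsilon guard follows the `tf.math.l2_normalize` convention; whether `L2Norm` handles zero vectors the same way is an assumption.

```python
import numpy as np

def l2_normalize(x, axis=-1, epsilon=1e-12):
    """Scale x to unit L2 norm along `axis`.

    Follows the tf.math.l2_normalize convention: x / sqrt(max(sum(x**2), epsilon)).
    """
    squared_norm = np.sum(np.square(x), axis=axis, keepdims=True)
    return x / np.sqrt(np.maximum(squared_norm, epsilon))

hidden = np.array([[3.0, 4.0], [0.0, 2.0]])
normalized = l2_normalize(hidden, axis=-1)
print(normalized)                           # each row now has unit length
print(np.linalg.norm(normalized, axis=-1))  # [1. 1.]
```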

#### Aggregation Blocks

Aggregation blocks include common aggregation ops that combine multiple tensors, such as:
- `ConcatFeatures`: Concatenate a dictionary of tensors along a given dimension.
- `StackFeatures`: Stack a dictionary of tensors along a given dimension.
- `CosineSimilarity`: Calculate the cosine similarity between two tensors.
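The difference between the three aggregations is easiest to see from the shapes. The NumPy sketch below assumes two 64-dim feature tensors for a batch of 4 and combines them in the dictionary's iteration order; the real blocks may order features differently.

```python
import numpy as np

rng = np.random.default_rng(0)
features = {
    "userId": rng.normal(size=(4, 64)),
    "movieId": rng.normal(size=(4, 64)),
}

# ConcatFeatures-style: join along the last axis -> (4, 128)
concatenated = np.concatenate(list(features.values()), axis=-1)

# StackFeatures-style: add a new feature axis -> (4, 2, 64)
stacked = np.stack(list(features.values()), axis=1)

def cosine_similarity(a, b, axis=-1, eps=1e-12):
    """Row-wise cosine similarity between two tensors of the same shape."""
    a_norm = a / np.maximum(np.linalg.norm(a, axis=axis, keepdims=True), eps)
    b_norm = b / np.maximum(np.linalg.norm(b, axis=axis, keepdims=True), eps)
    return np.sum(a_norm * b_norm, axis=axis)

similarity = cosine_similarity(features["userId"], features["movieId"])  # values in [-1, 1]
print(concatenated.shape, stacked.shape, similarity.shape)
# (4, 128) (4, 2, 64) (4,)
```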


#### Connect Methods

The base class `Block` implements several connect methods that control how a given block is linked to other blocks:

- `connect`: Connect the block to other blocks sequentially. The output is a tensor returned by the last block. 
- `connect_branch`: Link the block to other blocks in parallel. The output is a dictionary containing the output tensor of each block.
- `connect_with_shortcut`: Connect the block to other blocks sequentially and apply a skip connection with the block's output. 
- `connect_with_residual`: Connect the block to other blocks sequentially and apply a residual sum with the block's output.
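The data flow of the four connect methods can be illustrated by modeling blocks as plain Python functions, so that `a.connect(b)` becomes `connect(a, b)`. This is a hypothetical sketch only: it ignores schemas and Keras layer building, supports only concatenation for the shortcut (the real method takes an `aggregation` argument), and the order of the concatenated parts is a choice made here.

```python
import numpy as np

# Blocks are modeled as plain callables; `a.connect(b)` becomes connect(a, b).
def connect(*blocks):
    """Sequential: feed each block's output into the next one."""
    def run(x):
        for block in blocks:
            x = block(x)
        return x
    return run

def connect_branch(first, branches):
    """Parallel: run every branch on the same output of `first` and collect a dict."""
    def run(x):
        h = first(x)
        return {name: branch(h) for name, branch in branches.items()}
    return run

def connect_with_shortcut(first, block, shortcut_filter=None):
    """Sequential plus skip connection: concatenate (a filtered view of) h with block(h)."""
    def run(x):
        h = first(x)
        shortcut = h if shortcut_filter is None else shortcut_filter(h)
        return np.concatenate([shortcut, block(h)], axis=-1)
    return run

def connect_with_residual(first, block):
    """Sequential plus residual sum: h + block(h); shapes must match."""
    def run(x):
        h = first(x)
        return h + block(h)
    return run

def identity(x):
    return x

def double(x):
    return 2 * x

x = np.array([[1.0, 2.0]])
print(connect(identity, double)(x))                                # [[2. 4.]]
print(connect_branch(identity, {"a": identity, "b": double})(x))  # dict with keys "a" and "b"
print(connect_with_shortcut(identity, double)(x))                  # [[1. 2. 2. 4.]]
print(connect_with_residual(identity, double)(x))                  # [[3. 6.]]
```

The optional `shortcut_filter` mirrors the idea of selecting only part of a dictionary output for the skip connection, as done later with `mm.Filter("deep_continuous")`.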

#### Prediction Tasks

Merlin Models introduces the `PredictionTask` layer, which defines the blocks and transformation ops needed to compute the final prediction scores. It also provides the default loss and metrics for the given prediction task.\
We support the core tasks `BinaryClassificationTask`, `MultiClassClassificationTask`, and `RegressionTask`, as well as the RecSys-specific tasks `NextItemPredictionTask` and `ItemRetrievalTask`.
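At its core, a binary classification task maps the model's hidden representation to a single logit, turns it into a probability with a sigmoid, and is trained with binary cross-entropy. The following is a generic NumPy sketch of that computation, not Merlin's implementation: the random weights and labels are placeholders, and `BinaryClassificationTask` additionally handles target selection and metrics, which are omitted here.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def binary_cross_entropy(y_true, y_prob, eps=1e-7):
    """Mean binary cross-entropy; probabilities are clipped to avoid log(0)."""
    y_prob = np.clip(y_prob, eps, 1.0 - eps)
    return -np.mean(y_true * np.log(y_prob) + (1.0 - y_true) * np.log(1.0 - y_prob))

rng = np.random.default_rng(0)
hidden = rng.normal(size=(4, 512))              # output of the model body (placeholder)
weights = rng.normal(scale=0.01, size=(512, 1))  # placeholder weights of the output layer
bias = np.zeros(1)

logits = hidden @ weights + bias    # (4, 1): one score per example
probabilities = sigmoid(logits)     # values in (0, 1)
labels = np.array([[0.0], [1.0], [0.0], [1.0]])
loss = binary_cross_entropy(labels, probabilities)
print(probabilities.ravel(), loss)
```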




### Implement the DLRM model with Movielens-1M data

Now that we have introduced the core blocks of Merlin Models, let's take a look at how we can combine them to define the DLRM architecture:

In [2]:
import tensorflow as tf
import merlin.models.tf as mm

from merlin.models.data.movielens import get_movielens
from merlin.schema.tags import Tags



We use a utility function to download, extract, and preprocess the MovieLens-1M dataset.

In [3]:
train, valid = get_movielens(variant="ml-1m")


In [4]:
valid.head()

| | userId | movieId | title | genres | gender | age | occupation | zipcode | TE_age_rating | TE_gender_rating | TE_occupation_rating | TE_zipcode_rating | TE_movieId_rating | TE_userId_rating | rating_binary | rating |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 109 | 78 | 78 | [3, 2] | 1 | 2 | 10 | 277 | 0.418121 | 0.013356 | 0.497073 | 0.252915 | 0.689269 | 0.346225 | 0 | 3.0 |
| 1 | 3511 | 350 | 350 | [1, 5] | 1 | 2 | 12 | 2406 | 0.420087 | 0.035684 | 0.655097 | 0.625098 | 0.699561 | 0.671923 | 1 | 4.0 |
| 2 | 257 | 900 | 897 | [10, 1] | 1 | 3 | 16 | 148 | 0.005225 | 0.0 | 0.56061 | 0.555358 | 0.699954 | 0.601075 | 0 | 3.0 |
| 3 | 1325 | 43 | 43 | [3, 7, 5, 4] | 1 | 1 | 8 | 1289 | 0.149098 | 0.034863 | 0.426944 | 0.626964 | 0.693121 | 0.673556 | 1 | 4.0 |
| 4 | 337 | 566 | 566 | [1] | 1 | 1 | 16 | 521 | 0.153974 | 0.035684 | 0.558937 | 0.645513 | 0.72816 | 0.6868 | 1 | 4.0 |


We take the first batch of input tensors (4 rows) and use it to inspect the output of each building block.

In [5]:
from merlin.models.tf.dataset import BatchedDataset
batch = next(iter(BatchedDataset(valid, batch_size=4, shuffle=False)))[0]
batch.keys()

dict_keys(['genres', 'userId', 'movieId', 'title', 'gender', 'age', 'occupation', 'zipcode', 'TE_age_rating', 'TE_gender_rating', 'TE_occupation_rating', 'TE_zipcode_rating', 'TE_movieId_rating', 'TE_userId_rating'])

#### Define the inputs block

For the sake of simplicity, let's create a schema with a subset of the continuous and categorical features, plus the `rating_binary` target:

In [6]:
sub_schema = train.schema.select_by_name(['userId', 'movieId', 'title', 'gender', 'TE_zipcode_rating', 'TE_movieId_rating', 'rating_binary'])

We define the continuous block from the schema:

In [7]:
continuous_block = mm.ContinuousFeatures.from_schema(sub_schema, tags=Tags.CONTINUOUS)

We inspect the output of the continuous block on the first `batch`: it returns a dictionary with the raw tensors of the continuous features.

In [8]:
continuous_block(batch)

{'TE_zipcode_rating': <tf.Tensor: shape=(4, 1), dtype=float32, numpy=
 array([[0.25291514],
        [0.62509835],
        [0.5553576 ],
        [0.6269641 ]], dtype=float32)>,
 'TE_movieId_rating': <tf.Tensor: shape=(4, 1), dtype=float32, numpy=
 array([[0.68926865],
        [0.69956094],
        [0.6999536 ],
        [0.69312125]], dtype=float32)>}

We connect the continuous block to an `MLPBlock` to project the continuous features into a higher-dimensional (64-dimensional) space.

In [9]:
deep_continuous_block = continuous_block.connect(mm.MLPBlock([64]))
deep_continuous_block(batch).shape



TensorShape([4, 64])
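`MLPBlock([64])` can be thought of as a dense layer with 64 units applied to the continuous inputs. The NumPy sketch below reproduces the `(4, 64)` output shape under two assumptions about the block: the two continuous features are concatenated into a single `(4, 2)` matrix, and the layer uses a ReLU activation. All values and weights are random placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two continuous features (e.g. TE_zipcode_rating, TE_movieId_rating) for a batch of 4,
# concatenated into one (4, 2) matrix; values are random placeholders
continuous = rng.random((4, 2)).astype(np.float32)

# One dense layer with 64 units; weights are placeholders and ReLU is an assumption
kernel = rng.normal(scale=0.1, size=(2, 64)).astype(np.float32)
bias = np.zeros(64, dtype=np.float32)
projected = np.maximum(continuous @ kernel + bias, 0.0)

print(projected.shape)  # (4, 64)
```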

We define the categorical embedding block from the schema:

In [10]:
embedding_block = mm.EmbeddingFeatures.from_schema(sub_schema)

We inspect the output of the categorical block on the first `batch`: it returns a dictionary of embedding tensors, one per categorical feature, with a default dimension of 64.

In [11]:
embeddings = embedding_block(batch)
embeddings.keys(), embeddings['userId'].shape

(dict_keys(['userId', 'movieId', 'title', 'gender']), TensorShape([4, 64]))

Let's combine the continuous and categorical representations into a single dictionary using `ParallelBlock`:

In [12]:
dlrm_input_block = mm.ParallelBlock({"embeddings": embedding_block, "deep_continuous": deep_continuous_block})
print("Output shapes of DLRM input block:")
for key, val in dlrm_input_block(batch).items(): 
    print("\t%s : %s" %(key, val.shape))

Output shapes of DLRM input block:
	userId : (4, 64)
	movieId : (4, 64)
	title : (4, 64)
	gender : (4, 64)
	deep_continuous : (4, 64)


#### Define the interaction block

Now that we have a vector representation of each input feature, we will create the DLRM interaction block. It consists of three operations:
- Apply a dot product between all pairs of continuous and categorical feature vectors to learn pairwise interactions.
- Concatenate the resulting pairwise interactions with the deep representation of the continuous features (skip connection).
- Apply an `MLPBlock` with a series of dense layers to the concatenated tensor.
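The pairwise interaction from the DLRM paper can be written as the strictly upper-triangular part of the Gram matrix `X @ X.T` of the stacked feature vectors. Here is a NumPy sketch of that textbook formulation, using four categorical embeddings plus the projected continuous vector (random placeholders with the shapes printed by the input block). The output width of Merlin's `DotProductInteractionBlock` depends on its own implementation and may differ from the n(n-1)/2 terms computed here.

```python
import numpy as np

def dot_interaction(feature_vectors):
    """Pairwise dot products between n feature vectors of equal dimension.

    feature_vectors: array of shape (batch, n, dim).
    Returns the strictly upper-triangular entries of X X^T, shape (batch, n*(n-1)/2).
    """
    gram = np.einsum("bnd,bmd->bnm", feature_vectors, feature_vectors)
    n = feature_vectors.shape[1]
    rows, cols = np.triu_indices(n, k=1)
    return gram[:, rows, cols]

rng = np.random.default_rng(0)
batch, dim = 4, 64
embeddings = {name: rng.normal(size=(batch, dim)) for name in ["userId", "movieId", "title", "gender"]}
deep_continuous = rng.normal(size=(batch, dim))

# Step 1: pairwise dot products between all 5 feature vectors -> (4, 10)
stacked = np.stack(list(embeddings.values()) + [deep_continuous], axis=1)  # (4, 5, 64)
interactions = dot_interaction(stacked)

# Step 2: skip connection, concatenating the deep continuous vector -> (4, 74)
dlrm_features = np.concatenate([interactions, deep_continuous], axis=-1)
print(interactions.shape, dlrm_features.shape)
```

Step 3 would then feed `dlrm_features` into a stack of dense layers, as in the MLP sketch for the continuous features.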

First, we use `connect_with_shortcut` to create the first two operations of the DLRM interaction block.

In [13]:
from merlin.models.tf.blocks.dlrm import DotProductInteractionBlock
dlrm_interaction = dlrm_input_block.connect_with_shortcut(
    DotProductInteractionBlock(), 
    shortcut_filter=mm.Filter("deep_continuous"), 
    aggregation="concat"
)

The following diagram visualizes the ops of `dlrm_interaction`:

<img src="./images/residual_interaction.png"  width="30%">


In [14]:
dlrm_interaction(batch)

<tf.Tensor: shape=(4, 2080), dtype=float32, numpy=
array([[ 0.        ,  0.        ,  0.        , ...,  0.02601941,
         0.02672471, -0.00849337],
       [ 0.06243493,  0.        ,  0.        , ...,  0.00388642,
        -0.00386185, -0.00540488],
       [ 0.04459976,  0.        ,  0.        , ...,  0.02311607,
         0.01283619, -0.00895122],
       [ 0.06380296,  0.        ,  0.        , ...,  0.00019991,
         0.02553606, -0.03204093]], dtype=float32)>

Then, we project the learned interactions using a series of dense layers:

In [15]:
deep_dlrm_interaction = dlrm_interaction.connect(mm.MLPBlock([64, 128, 512]))
deep_dlrm_interaction(batch)

<tf.Tensor: shape=(4, 512), dtype=float32, numpy=
array([[0.        , 0.00548285, 0.        , ..., 0.        , 0.        ,
        0.        ],
       [0.        , 0.00389321, 0.        , ..., 0.00283362, 0.        ,
        0.        ],
       [0.00106108, 0.00902012, 0.00035212, ..., 0.00025212, 0.00450933,
        0.        ],
       [0.        , 0.        , 0.00242545, ..., 0.        , 0.        ,
        0.00243152]], dtype=float32)>

#### Define the Prediction block

At this stage, we have created the DLRM block, which takes a dictionary of categorical and continuous tensors as input and returns an interaction representation vector of size `512`. The next step is to use this hidden representation for a prediction task. In our case, we use the `rating_binary` label, and the objective is to predict whether user `A` will give movie `B` a high rating.
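The sample rows shown by `valid.head()` are consistent with a simple threshold: rows with a rating of 3.0 have `rating_binary = 0`, and rows with 4.0 have `rating_binary = 1`. The sketch below assumes that a rating above 3 counts as high; check the preprocessing in `get_movielens` for the actual rule.

```python
import numpy as np

# `rating` values of the five rows shown by valid.head()
ratings = np.array([3.0, 4.0, 3.0, 4.0, 4.0])

# Assumed rule: a rating above 3 counts as a "high" rating
rating_binary = (ratings > 3).astype(np.int32)
print(rating_binary)  # [0 1 0 1 1]
```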

We use `BinaryClassificationTask` and evaluate performance with the AUC metric. We also use `LogitsTemperatureScaler` as a pre-transformation op that scales the logits returned by the task before the loss and metrics are computed.
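Temperature scaling usually means dividing the logits by a temperature `T` before the sigmoid. The NumPy sketch below assumes that `LogitsTemperatureScaler` follows this standard definition and compares `T = 1` with the `T = 2` used in the task definition. The logits are illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

logits = np.array([-4.0, -1.0, 0.0, 1.0, 4.0])

# Assumed behavior: temperature scaling divides the logits by T before the sigmoid
probs_t1 = sigmoid(logits / 1.0)
probs_t2 = sigmoid(logits / 2.0)  # temperature=2, as in the task definition

# With T > 1, every probability moves toward 0.5, so the model looks less confident
print(np.round(probs_t1, 3))
print(np.round(probs_t2, 3))
```

Dividing by a positive temperature preserves the ranking of the scores. As a result, it leaves ranking metrics such as AUC unchanged and only affects the loss and the confidence of the predicted probabilities.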

In [16]:
from merlin.models.tf.blocks.core.transformations import LogitsTemperatureScaler
binary_task = mm.BinaryClassificationTask(
    target_name=sub_schema.select_by_tag(Tags.TARGET).column_names[0],
    metrics=[tf.keras.metrics.AUC], 
    pre=LogitsTemperatureScaler(temperature=2)
)

#### Define, train and evaluate the final DLRM Model

We connect `deep_dlrm_interaction` to `binary_task`, and the `connect` method automatically returns a `Model` for us.
Note that `Model` inherits from the [tf.keras.Model](https://keras.io/api/models/model/) class.

In [17]:
model = deep_dlrm_interaction.connect(binary_task)
type(model)

merlin.models.tf.models.base.Model

We train the model using the built-in Keras `fit` method:

In [18]:
model.compile(optimizer="adam")
model.fit(train, batch_size=1024, epochs=5)



Epoch 1/5








Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<keras.callbacks.History at 0x7fade8600ca0>

- We compute the evaluation scores on the validation set:

In [19]:
model.evaluate(valid, batch_size=1024, return_dict=True)





{'rating_binary/binary_classification_task/auc': 0.7453046441078186,
 'loss': 2.6448612213134766,
 'regularization_loss': 0.0,
 'total_loss': 2.6448612213134766}
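The AUC reported above is the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one. The following exact NumPy sketch computes it on toy scores. `tf.keras.metrics.AUC` instead approximates the ROC curve with a fixed set of thresholds (200 by default), so its value can differ slightly from this exact computation.

```python
import numpy as np

def roc_auc(labels, scores):
    """Exact ROC AUC: fraction of (positive, negative) pairs ranked correctly, ties count 1/2."""
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    positives = scores[labels]
    negatives = scores[~labels]
    diff = positives[:, None] - negatives[None, :]  # every positive vs every negative
    correct = np.sum(diff > 0) + 0.5 * np.sum(diff == 0)
    return correct / (len(positives) * len(negatives))

print(roc_auc([0, 1, 0, 1], [0.2, 0.8, 0.4, 0.3]))  # 0.75
```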

- We save the trained model:

In [20]:
model.save("custom_dlrm")



INFO:tensorflow:Assets written to: custom_dlrm/assets




## Conclusion 

Merlin Models provides common and state-of-the-art RecSys architectures in a high-level API as well as all the required low-level building blocks for you to create your own architecture (input blocks, MLP layers, prediction tasks, loss functions, etc.). In this example, we explored a subset of these pre-existing blocks to create the DLRM model, but you can view our [documentation](https://nvidia-merlin.github.io/models/main/) to discover more. You can also [contribute](https://github.com/NVIDIA-Merlin/models/blob/main/CONTRIBUTING.md) to the library by submitting new RecSys architectures and custom building Blocks.  



## Next steps
To learn how to deploy the trained DLRM model, see the [Get Started with Merlin Systems] (TODO: Include a link once it is merged) example, which deploys an [NVTabular](https://github.com/NVIDIA-Merlin/NVTabular) workflow and a trained Merlin Models model to [Triton Inference Server](https://github.com/triton-inference-server/server).

