In [1]:
# Copyright 2021 NVIDIA Corporation. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ================================

## Build your own DLRM model 

#TODO:  Add general introduction + Add references to DLRM paper and merlin example notebook 

### Learning objectives
- Get familiar with the building blocks of Merlin Models
- Build a model from scratch

### Introduction to Merlin-models core building blocks

The [Block](https://nvidia-merlin.github.io/models/review/pr-294/generated/merlin.models.tf.Block.html#merlin.models.tf.Block) is the core abstraction in Merlin models and is the class from which all blocks inherit.
The class extends the [tf.keras.layers.Layer](https://www.tensorflow.org/api_docs/python/tf/keras/layers/Layer) base class and implements a number of properties that simplify the creation of custom blocks/models. These properties include the `Schema` object for determining the embedding dimensions, input shapes, and output shapes. Additionally, `Block` has a `BlockContext` to store/retrieve public variables and share them with other blocks in the same model as additional meta-data. 



For this example, we will combine pre-existing blocks and demonstrate how to create the DLRM architecture. So let's start by listing the core Blocks you need to know to build a model from scratch: 

#### Features Blocks

They include input blocks to process various inputs based on their types and shapes. We support three main Blocks: 
- `EmbeddingFeatures`: Input block for embedding-lookups for categorical features.
- `SequenceEmbeddingFeatures`: Input block for embedding-lookups for sequential categorical features (3D tensors).
- `ContinuousFeatures`: Input block for continuous features.

#### Transformations Blocks

They include various operators commonly used to transform tensors in various parts of the model, such as: 

- `AsDenseFeatures`: It takes a dictionary of raw input tensors and transforms the sparse ones into dense tensors.
- `L2Norm`: It takes a single tensor or a dictionary of hidden tensors and applies an L2-normalization along a given axis. 
- `LogitsTemperatureScaler`: It scales the output tensor of predicted logits to lower the model's confidence. 
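
To make two of these transformations concrete, here is a framework-free sketch in plain Python. It is not the Merlin implementation: the helper names `to_dense` and `l2_normalize` are hypothetical, and the ragged input format (flat values plus per-row lengths) is an assumption made for illustration.

```python
import math


def to_dense(values, row_lengths, pad_value=0):
    """Pad a ragged batch (flat values + per-row lengths) to a dense 2D list."""
    max_len = max(row_lengths)
    rows, start = [], 0
    for length in row_lengths:
        row = values[start:start + length]
        # Right-pad shorter rows so that every row has the same length
        rows.append(row + [pad_value] * (max_len - length))
        start += length
    return rows


def l2_normalize(vector, eps=1e-12):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / max(norm, eps) for x in vector]


# Two ragged rows, as in a multi-hot column: [3, 5] and [3, 7, 6, 5, 11]
dense_genres = to_dense([3, 5, 3, 7, 6, 5, 11], row_lengths=[2, 5])
print(dense_genres)  # [[3, 5, 0, 0, 0], [3, 7, 6, 5, 11]]

unit = l2_normalize([3.0, 4.0])
print(unit)  # [0.6, 0.8]
```

Padding with `0` is only one possible convention; the actual padding value used by a given block may differ.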

#### Aggregations Blocks

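They include common aggregation ops to combine multiple tensors, such as: 
- `ConcatFeatures`: It concatenates a dictionary of tensors along a given axis.
- `StackFeatures`: It stacks a dictionary of tensors along a given axis.
- `CosineSimilarity`: It computes the cosine similarity between two tensors.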
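
The three aggregations can be sketched on a single example with plain Python lists. This is an illustration only, not Merlin's code: the function names are hypothetical, and iterating over features in sorted name order is an assumption made to keep the output deterministic.

```python
import math


def concat_features(features):
    """Join the per-feature vectors end to end (concatenation on the last axis)."""
    return [x for name in sorted(features) for x in features[name]]


def stack_features(features):
    """Stack equally sized per-feature vectors into a 2D list (one row per feature)."""
    return [features[name] for name in sorted(features)]


def cosine_similarity(a, b):
    """Dot product of two vectors divided by the product of their norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


features = {"user": [1.0, 0.0], "item": [1.0, 1.0]}

concatenated = concat_features(features)  # [1.0, 1.0, 1.0, 0.0] ("item" first)
stacked = stack_features(features)        # [[1.0, 1.0], [1.0, 0.0]]
similarity = cosine_similarity(features["user"], features["item"])
print(concatenated, stacked, round(similarity, 4))
```

Concatenation grows the feature dimension, while stacking keeps each feature as a separate row, which is the layout a pairwise interaction needs.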

#### Combinators Blocks

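They include the base ops to combine different blocks together: 
- `SequentialBlock`: It chains blocks so that the output of each block is the input of the next one.
- `ParallelBlock`: It applies several blocks to the same inputs and returns their outputs as a dictionary, optionally merged by an aggregation.
- `ResidualBlock`: It adds a shortcut (residual) connection that combines the inputs of a block with its outputs.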
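
The composition patterns behind these combinators can be sketched with ordinary Python functions standing in for blocks. The helpers `sequential`, `parallel`, and `residual` are hypothetical names used for illustration; they mimic the idea, not the Merlin classes or their signatures.

```python
def sequential(*blocks):
    """Chain blocks: the output of one block is the input of the next."""
    def run(x):
        for block in blocks:
            x = block(x)
        return x
    return run


def parallel(branches):
    """Apply every branch to the same input and collect the outputs by name."""
    def run(x):
        return {name: block(x) for name, block in branches.items()}
    return run


def residual(block):
    """Add the input back to the block output (shortcut connection)."""
    def run(x):
        return x + block(x)
    return run


def double(x):
    return 2 * x


def add_one(x):
    return x + 1


seq_out = sequential(double, add_one)(3)                       # (3 * 2) + 1
par_out = parallel({"double": double, "add_one": add_one})(3)  # one entry per branch
res_out = residual(double)(3)                                  # 3 + (3 * 2)
print(seq_out, par_out, res_out)  # 7 {'double': 6, 'add_one': 4} 9
```

The residual sketch merges by addition; other aggregations, such as concatenation, follow the same pattern.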

#### Prediction Blocks

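They include the common prediction tasks and are responsible for building the final prediction scores and computing the loss and evaluation metrics: 

- `BinaryClassificationTask`: It predicts a binary target, such as a click or a high rating.
- `MultiClassClassificationTask`: It predicts one class out of several mutually exclusive classes.
- `RegressionTask`: It predicts a continuous target, such as a rating value.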

### Build the DLRM model with Movielens-1M data

Now that we have introduced the core blocks of Merlin models, let's take a look at how we can combine them to build the DLRM architecture:

In [2]:
import tensorflow as tf
import merlin.models.tf as mm

from merlin.models.data.movielens import get_movielens
from merlin.schema.tags import Tags

2022-03-24 15:35:07.136315: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 24570 MB memory:  -> device: 0, name: NVIDIA RTX A6000, pci bus id: 0000:65:00.0, compute capability: 8.6


We will use the `get_movielens` utility function to download, extract, and preprocess the MovieLens 1M dataset.

In [3]:
train, valid = get_movielens(variant="ml-1m")



In [4]:
valid.head()

Unnamed: 0,userId,movieId,title,genres,gender,age,occupation,zipcode,TE_age_rating,TE_gender_rating,TE_occupation_rating,TE_zipcode_rating,TE_movieId_rating,TE_userId_rating,rating_binary,rating
0,2181,935,932,"[3, 5]",2,6,11,1770,0.982804,0.989014,0.640637,0.469049,0.440049,0.537162,0,3.0
1,147,4,4,"[3, 7, 6, 5, 11]",1,4,4,282,0.497828,0.0,0.425216,0.775113,0.811054,0.809315,0,3.0
2,2719,40,40,"[8, 4]",2,4,4,784,0.484754,0.943272,0.436012,0.671168,0.986661,0.781915,1,5.0
3,956,1270,1269,[18],1,1,2,352,0.140412,0.014632,0.326065,0.668503,0.850387,0.665899,0,3.0
4,314,746,746,[9],1,4,15,523,0.497828,0.0,0.330956,0.519841,0.503531,0.581438,0,1.0


We take the first batch of input tensors to check the outputs of each building block:

In [5]:
from merlin.models.tf.dataset import BatchedDataset
batch = next(iter(BatchedDataset(valid, batch_size=4, shuffle=False)))[0]
batch.keys()

dict_keys(['genres', 'userId', 'movieId', 'title', 'gender', 'age', 'occupation', 'zipcode', 'TE_age_rating', 'TE_gender_rating', 'TE_occupation_rating', 'TE_zipcode_rating', 'TE_movieId_rating', 'TE_userId_rating'])

### Build the inputs block

For the sake of simplicity, let's create a schema that keeps only a subset of the continuous and categorical features, together with the `rating_binary` target: 

In [6]:
sub_schema = train.schema.select_by_name(['userId', 'movieId', 'title', 'gender', 'TE_zipcode_rating', 'TE_movieId_rating', 'rating_binary'])

We define the continuous block based on the schema:

In [7]:
continous_block = mm.ContinuousFeatures.from_schema(sub_schema, tags=Tags.CONTINUOUS)

We inspect the output of the continuous block on the first `batch`: it returns the raw tensors of the continuous features, unchanged. 

In [8]:
continous_block(batch)

{'TE_zipcode_rating': <tf.Tensor: shape=(4, 1), dtype=float32, numpy=
 array([[0.46904874],
        [0.7751127 ],
        [0.67116815],
        [0.6685025 ]], dtype=float32)>,
 'TE_movieId_rating': <tf.Tensor: shape=(4, 1), dtype=float32, numpy=
 array([[0.44004914],
        [0.8110535 ],
        [0.9866613 ],
        [0.8503867 ]], dtype=float32)>}

We connect the continuous block to an `MLPBlock` to project the continuous features into a higher-dimensional space: 

In [9]:
deep_continous_block = continous_block.connect(mm.MLPBlock([64]))
deep_continous_block(batch).shape

2022-03-24 15:35:34.172673: I tensorflow/stream_executor/cuda/cuda_blas.cc:1792] TensorFloat-32 will be used for the matrix multiplication. This will only be logged once.


TensorShape([4, 64])
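
To show what this projection does to one row, here is a minimal sketch of a single fully connected layer with a ReLU activation, written in plain Python. The random weights, the ReLU activation, and the helper name `dense_relu` are assumptions for illustration; the layer inside `MLPBlock` learns its weights during training.

```python
import random


def dense_relu(inputs, weights, biases):
    """One fully connected layer followed by ReLU: max(0, x . W[:, j] + b[j])."""
    outputs = []
    for j in range(len(biases)):
        z = sum(x * weights[i][j] for i, x in enumerate(inputs)) + biases[j]
        outputs.append(max(0.0, z))
    return outputs


random.seed(0)
in_dim, out_dim = 2, 64  # two continuous features projected to 64 units
weights = [[random.uniform(-0.1, 0.1) for _ in range(out_dim)] for _ in range(in_dim)]
biases = [0.0] * out_dim

# TE_zipcode_rating and TE_movieId_rating of the first row in the batch
row = [0.46904874, 0.44004914]
projected = dense_relu(row, weights, biases)
print(len(projected))  # 64
```

Each example goes from 2 values to 64, which matches the `[4, 64]` shape above for a batch of 4 rows.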

We define the categorical embedding block based on the schema:

In [10]:
embedding_block = mm.EmbeddingFeatures.from_schema(sub_schema)

We inspect the output of the categorical block on the first `batch`: it returns one embedding tensor per categorical feature, with a default dimension of 64.

In [11]:
embeddings = embedding_block(batch)
embeddings.keys(), embeddings['userId'].shape

(dict_keys(['userId', 'movieId', 'title', 'gender']), TensorShape([4, 64]))
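
Conceptually, an embedding lookup selects one row of a table per category id. The sketch below illustrates this in plain Python; the table size, the random initialization, and the helper name `embedding_lookup` are hypothetical, and the real block holds trainable TensorFlow variables.

```python
import random

random.seed(0)
cardinality, dim = 10, 64  # hypothetical vocabulary size; 64 as in the output above
table = [[random.gauss(0.0, 0.05) for _ in range(dim)] for _ in range(cardinality)]


def embedding_lookup(table, ids):
    """Return the table row associated with each category id."""
    return [table[i] for i in ids]


gender_ids = [2, 1, 2, 1]  # `gender` values of the first four validation rows
gender_embeddings = embedding_lookup(table, gender_ids)
print(len(gender_embeddings), len(gender_embeddings[0]))  # 4 64
```

Rows that share an id (here the first and third) receive the same vector, which is how the model shares what it learns about a category across examples.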

Let's store the continuous and categorical representations in a single dictionary using `ParallelBlock`

In [12]:
dlrm_input_block = mm.ParallelBlock({"embeddings": embedding_block, "continuous": deep_continous_block})
print("Output shapes of DLRM input block:")
for key, val in dlrm_input_block(batch).items(): 
    print("\t%s : %s" %(key, val.shape))

Output shapes of DLRM input block:
	userId : (4, 64)
	movieId : (4, 64)
	title : (4, 64)
	gender : (4, 64)
	continuous : (4, 64)


### Build the interaction block

Now that we have a vector representation of each input feature, we will create the DLRM interaction block. It consists of three operations: 
- Apply a dot product between all continuous and categorical features to learn pairwise interactions. 
- Concatenate the resulting pairwise interactions with the deep representation of the continuous features. 
- Apply an `MLPBlock` with a series of layers on the concatenated tensor. 
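
The first two operations can be sketched for a single example in plain Python. This is a simplified illustration with 2-dimensional toy vectors; the function names are hypothetical, and the ordering and layout of the interactions are assumptions that may differ from what `DotProductInteractionBlock` produces.

```python
from itertools import combinations


def pairwise_dot_interactions(vectors):
    """Dot product of every unordered pair of feature vectors."""
    return [sum(a * b for a, b in zip(u, v)) for u, v in combinations(vectors, 2)]


def dlrm_interaction_sketch(embeddings, continuous):
    """Pairwise interactions, concatenated with the continuous representation."""
    vectors = list(embeddings.values()) + [continuous]
    interactions = pairwise_dot_interactions(vectors)
    # Shortcut: the deep continuous representation is appended unchanged
    return interactions + continuous


embeddings = {
    "userId": [1.0, 0.0],
    "movieId": [0.0, 1.0],
    "gender": [1.0, 1.0],
}
continuous = [0.5, 2.0]

out = dlrm_interaction_sketch(embeddings, continuous)
print(out)  # [0.0, 1.0, 0.5, 1.0, 2.0, 2.5, 0.5, 2.0]
```

With `F` feature vectors this sketch yields `F * (F - 1) / 2` interaction terms (6 for the 4 vectors here), followed by the continuous vector itself.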

The `Block` implements a method `connect_with_shortcut` that connects the input block to other blocks sequentially with a residual connection.

#TODO: Add a simple diagram to visualize the residual connection

First, we will use `connect_with_shortcut` to build the first two operations of the DLRM interaction block:

In [13]:
from merlin.models.tf.blocks.dlrm import DotProductInteractionBlock
dlrm_interaction = dlrm_input_block.connect_with_shortcut(
    DotProductInteractionBlock(), 
    shortcut_filter=mm.Filter("bottom_block"), 
    aggregation="concat"
)

In [14]:
dlrm_interaction(batch)

<tf.Tensor: shape=(4, 2016), dtype=float32, numpy=
array([[ 0.02296983, -0.01505249, -0.02917415, ...,  0.01208645,
         0.02926546,  0.04685134],
       [ 0.00456675,  0.01728372, -0.00656997, ...,  0.05093538,
         0.04981958,  0.05963509],
       [ 0.01007329,  0.02456067, -0.02941924, ...,  0.01750099,
         0.04780834,  0.00441987],
       [-0.00547133, -0.02438645,  0.03615618, ...,  0.08643801,
         0.03780769,  0.01758762]], dtype=float32)>

Then, we project the learned interactions using a series of dense layers:

In [15]:
deep_dlrm_interaction = dlrm_interaction.connect(mm.MLPBlock([64, 128, 512]))
deep_dlrm_interaction(batch)

<tf.Tensor: shape=(4, 512), dtype=float32, numpy=
array([[0.00150441, 0.        , 0.        , ..., 0.        , 0.00156314,
        0.00148631],
       [0.        , 0.00512344, 0.00612389, ..., 0.        , 0.        ,
        0.        ],
       [0.        , 0.        , 0.00159896, ..., 0.        , 0.        ,
        0.00342017],
       [0.00124279, 0.00247318, 0.00704529, ..., 0.        , 0.000864  ,
        0.0008056 ]], dtype=float32)>

### Build the Prediction block

At this stage, we have built the DLRM block, which takes a dictionary of categorical and continuous tensors as input and returns an interaction representation vector of dimension `512` for each example. The next step is to use this hidden representation for a prediction task. In our case, we use the label `rating_binary`, and the objective is to predict whether a user `A` will give a high rating to a movie `B`. 

We will use the `BinaryClassificationTask` and evaluate performance with the `AUC` metric. We will also use `LogitsTemperatureScaler` as a pre-transformation op that scales the logits returned by the task before the loss and metrics are computed. 

In [16]:
from merlin.models.tf.blocks.core.transformations import LogitsTemperatureScaler
binary_task = mm.BinaryClassificationTask(
    target_name=sub_schema.select_by_tag(Tags.TARGET).column_names[0],
    metrics=[tf.keras.metrics.AUC], 
    pre=LogitsTemperatureScaler(temperature=2)
)
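
The effect of temperature scaling on a binary prediction can be sketched in plain Python. The sketch assumes that scaling means dividing the logit by the temperature and that the loss is the standard binary cross-entropy; both are assumptions for illustration, not a description of the Merlin internals.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def scaled_probability(logit, temperature):
    """Probability obtained after dividing the logit by the temperature."""
    return sigmoid(logit / temperature)


def binary_cross_entropy(label, prob, eps=1e-7):
    """Negative log-likelihood of a binary label under a predicted probability."""
    prob = min(max(prob, eps), 1.0 - eps)  # clip to avoid log(0)
    return -(label * math.log(prob) + (1 - label) * math.log(1.0 - prob))


logit = 4.0
p_raw = scaled_probability(logit, temperature=1.0)     # about 0.982
p_scaled = scaled_probability(logit, temperature=2.0)  # about 0.881

# A confident but wrong prediction (true label 0) is penalized less after scaling
loss_raw = binary_cross_entropy(0, p_raw)
loss_scaled = binary_cross_entropy(0, p_scaled)
print(round(p_raw, 3), round(p_scaled, 3), loss_raw > loss_scaled)  # 0.982 0.881 True
```

A temperature above 1 pulls probabilities toward 0.5, which is what lowering the model's confidence means in practice.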

### Build, train and evaluate the final DLRM Model

We connect `deep_dlrm_interaction` to `binary_task`, and the `connect` method automatically generates the `Model` class for us.
Note that `Model` inherits from the [tf.keras.Model](https://keras.io/api/models/model/) class. 

In [17]:
model = deep_dlrm_interaction.connect(binary_task)
type(model)

merlin.models.tf.models.base.Model

We train the model using the built-in Keras `fit` method: 

In [18]:
model.compile(optimizer="adam")
model.fit(train, batch_size=1024, epochs=5)

2022-03-24 15:35:34.946572: W tensorflow/python/util/util.cc:368] Sets are not currently considered sequences, but this may change in the future, so consider avoiding using them.


Epoch 1/5








Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<keras.callbacks.History at 0x7f97b530ecd0>

We compute the evaluation scores on the validation set: 

In [19]:
model.evaluate(valid, batch_size=1024, return_dict=True)

2022-03-24 15:36:27.139613: W tensorflow/core/grappler/optimizers/loop_optimizer.cc:907] Skipping loop optimization for Merge node with control input: cond/then/_0/cond/cond/branch_executed/_128




{'rating_binary/binary_classification_task/auc': 0.7477688789367676,
 'loss': 2.4032702445983887,
 'regularization_loss': 0.0,
 'total_loss': 2.4032702445983887}
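
To interpret the AUC value, recall that it estimates the probability that a randomly chosen positive example is scored higher than a randomly chosen negative one. The sketch below computes this exactly on a toy example in plain Python; the helper name `roc_auc` is hypothetical, and `tf.keras.metrics.AUC` instead approximates the area from a fixed set of thresholds, so its values can differ slightly.

```python
def roc_auc(labels, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count as 0.5."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
auc = roc_auc(labels, scores)
print(auc)  # 0.75
```

An AUC of 0.5 corresponds to random ranking and 1.0 to a perfect ranking of positives above negatives.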

## Conclusion 

#TODO

## Next Steps

#TODO