In [1]:
# Copyright 2022 NVIDIA Corporation. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# =====

## 3. Customize and Extend Merlin Models

Merlin Models provides common and state-of-the-art RecSys architectures in a high-level API as well as all the required low-level building blocks (e.g., input blocks, MLP layers, prediction tasks, loss functions, etc.) for you to create your own architecture. 

In this lab, we define the DLRM model architecture from scratch and then customize it with Merlin Models.

**Learning Objectives**

- Understand the building blocks of Merlin Models
- Define the DLRM model architecture with the low-level API
- Customize the DLRM model with Merlin Models:
    - Add a cross-product transformation (see the [Wide & Deep](https://arxiv.org/abs/1606.07792) paper) to the DLRM model.
    - Replace the pairwise interaction layer of DLRM with a cross network (see the [DCN-v2](https://arxiv.org/abs/2008.13535) paper).

**Import Required Libraries**

In [2]:
import os

import glob
import cudf 
import pandas as pd
import numpy as np
import nvtabular as nvt
from nvtabular.ops import *
import gc

from merlin.schema.tags import Tags
import merlin.models.tf as mm
from merlin.io.dataset import Dataset

import tensorflow as tf

2022-09-08 20:12:38.804219: I tensorflow/core/platform/cpu_feature_guard.cc:194] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  SSE3 SSE4.1 SSE4.2 AVX
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-09-08 20:12:41.068057: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1532] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 16255 MB memory:  -> device: 0, name: Tesla V100-SXM2-32GB-LS, pci bus id: 0000:8a:00.0, compute capability: 7.0


In [3]:
seed = 42
tf.random.set_seed(seed)
np.random.seed(seed)

Define data paths.

In [4]:
data_path = '/workspace/data/ecom/'
output_path = os.path.join(data_path,'processed_nvt')

Read processed parquet files as Dataset objects.

In [5]:
train = Dataset(os.path.join(output_path, "train", "*.parquet"), part_size="500MB")
valid = Dataset(os.path.join(output_path, "valid", "*.parquet"), part_size="500MB")

# define schema object
schema = train.schema.without(['event_time_ts', 'user_id_raw', 'product_id_raw'])



In [6]:
target_column = schema.select_by_tag(Tags.TARGET).column_names[0]
target_column

'target'

### 3.1. Introduction to Merlin Models' core building blocks

Let's explain the functions and blocks that we use to build our DLRM model from scratch. The `Block` is the core abstraction in Merlin Models and is the class from which all blocks inherit. The class extends the `tf.keras.layers.Layer` base class and implements a number of properties that simplify the creation of custom blocks and models. These properties include the Schema object for determining the embedding dimensions, input shapes, and output shapes.

**Feature Blocks** <br>

`Embeddings:` Creates a ParallelBlock with an EmbeddingTable for each categorical feature in the schema. <br>
`ContinuousFeatures:` Input block for continuous features.

**Connect Methods** <br>
The base class `Block` implements several connect methods that control how a given block is linked to other blocks:

- `connect:` Connect the block to other blocks sequentially. The output is a tensor returned by the last block.
- `connect_branch:` Link the block to other blocks in parallel. The output is a dictionary containing the output tensor of each block.
- `connect_with_shortcut:` Connect the block to other blocks sequentially and apply a skip connection with the block's output.
- `connect_with_residual:` Connect the block to other blocks sequentially and apply a residual sum with the block's output.
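The four connect methods listed above can be pictured as plain function combinators. The following is a minimal conceptual sketch in pure Python; the helper functions and the `"shortcut"`/`"output"` keys are hypothetical illustrations and are not the Merlin Models API.

```python
# Conceptual sketch only: these helpers are hypothetical, NOT Merlin Models APIs.
from typing import Callable, Dict, List

Vector = List[float]
Fn = Callable[[Vector], Vector]


def connect(*blocks: Fn) -> Fn:
    """Sequential composition: the output of each block feeds the next one."""
    def run(x: Vector) -> Vector:
        for block in blocks:
            x = block(x)
        return x
    return run


def connect_branch(**branches: Fn) -> Callable[[Vector], Dict[str, Vector]]:
    """Parallel application: every branch receives the same input."""
    def run(x: Vector) -> Dict[str, Vector]:
        return {name: branch(x) for name, branch in branches.items()}
    return run


def connect_with_shortcut(first: Fn, second: Fn) -> Callable[[Vector], Dict[str, Vector]]:
    """Keep the first block's output next to the second block's output."""
    def run(x: Vector) -> Dict[str, Vector]:
        hidden = first(x)
        return {"shortcut": hidden, "output": second(hidden)}
    return run


def connect_with_residual(first: Fn, second: Fn) -> Fn:
    """Add the first block's output element-wise to the second block's output."""
    def run(x: Vector) -> Vector:
        hidden = first(x)
        return [h + s for h, s in zip(hidden, second(hidden))]
    return run


def double(v: Vector) -> Vector:
    return [2.0 * a for a in v]


def add_one(v: Vector) -> Vector:
    return [a + 1.0 for a in v]


x = [1.0, 2.0]
sequential_out = connect(double, add_one)(x)
branch_out = connect_branch(d=double, a=add_one)(x)
shortcut_out = connect_with_shortcut(double, add_one)(x)
residual_out = connect_with_residual(double, add_one)(x)
```

In this toy setting, `connect` returns a single vector, `connect_branch` and `connect_with_shortcut` return dictionaries, and `connect_with_residual` returns the element-wise sum.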

### 3.2. Build a DLRM model using Merlin Models low-level API

Let's convert the first five rows of the validation dataset into a batch of input tensors so that we can inspect the output of each block below.

In [7]:
batch = mm.sample_batch(valid, batch_size=5, shuffle=False, include_targets=False)

We define the continuous layer based on the schema.

In [8]:
continuous_block = mm.ContinuousFeatures.from_schema(schema, tags=Tags.CONTINUOUS)

We connect the continuous block to an `MLPBlock` to project the continuous features into the same dimensionality as the embedding width of the categorical features.

In [9]:
bottom_block = continuous_block.connect(mm.MLPBlock([64]))
bottom_block(batch).shape

TensorShape([5, 64])

We define the categorical embedding block based on the schema.

In [10]:
from merlin.models.utils.schema_utils import infer_embedding_dim

embeddings_block = mm.Embeddings(
    schema.select_by_tag(Tags.CATEGORICAL),
    dim = 64
)

We display the output of the categorical embedding block using the sample batch. Each categorical feature is mapped to an embedding tensor with the dimension of 64 that we set above.

In [11]:
embeddings = embeddings_block(batch)
embeddings.keys(), embeddings["user_id"].shape

(dict_keys(['user_id', 'ts_weekday', 'ts_hour', 'product_id', 'cat_0', 'cat_1', 'cat_2', 'brand']),
 TensorShape([5, 64]))

Let's store the continuous and categorical representations in a single dictionary using a `ParallelBlock` instance.

In [12]:
dlrm_input_block = mm.ParallelBlock(
    {"embeddings": embeddings_block, "bottom_block": bottom_block}
)

By looking at the output, we can see that the ParallelBlock class applies embedding and continuous blocks, in parallel, to the same input batch. Additionally, it merges the resulting tensors into one dictionary.

In [13]:
print("Output shapes of DLRM input block:")
for key, val in dlrm_input_block(batch).items():
    print("\t%s : %s" % (key, val.shape))

Output shapes of DLRM input block:
	user_id : (5, 64)
	ts_weekday : (5, 64)
	ts_hour : (5, 64)
	product_id : (5, 64)
	cat_0 : (5, 64)
	cat_1 : (5, 64)
	cat_2 : (5, 64)
	brand : (5, 64)
	bottom_block : (5, 64)


**Define the interaction block**

Now that we have a vector representation of each input feature, we will create the DLRM interaction block. It consists of three operations:

- Apply a dot product between all continuous and categorical features to learn pairwise interactions.
- Concatenate the resulting pairwise interactions with the deep representation of the continuous features (skip connection).
- Apply an `MLPBlock` with a series of dense layers to the concatenated tensor.
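The first two operations can be sketched with NumPy on random stand-in tensors. This is an illustration of the arithmetic described in the DLRM design, not Merlin's implementation; in particular, keeping each unordered pair exactly once (the strict upper triangle) is an assumption about how the pairwise interactions are collected.

```python
import numpy as np

rng = np.random.default_rng(0)
batch_size, num_features, dim = 5, 9, 64  # 8 embeddings + 1 projected continuous vector

# Random stand-ins for the input block outputs, used for shape illustration only.
feature_vectors = rng.normal(size=(batch_size, num_features, dim))
bottom_output = feature_vectors[:, -1, :]  # the projected continuous features

# 1) Dot product between every pair of feature vectors: shape (batch, F, F).
pairwise = np.einsum("bfd,bgd->bfg", feature_vectors, feature_vectors)

# Keep each unordered pair once (strict upper triangle): F * (F - 1) / 2 values.
rows, cols = np.triu_indices(num_features, k=1)
interactions = pairwise[:, rows, cols]

# 2) Skip connection: concatenate the interactions with the continuous representation.
interaction_output = np.concatenate([interactions, bottom_output], axis=1)

print(interactions.shape, interaction_output.shape)  # (5, 36) (5, 100)
```

With nine 64-dimensional vectors, there are 36 pairwise interactions, and concatenating them with the 64-dimensional continuous representation gives a 100-dimensional input for the top MLP in this sketch.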

First, we use the `connect_with_shortcut` method to create the first two operations of the DLRM interaction block:

In [14]:
from merlin.models.tf.blocks.dlrm import DotProductInteractionBlock

dlrm_interaction = dlrm_input_block.connect_with_shortcut(
    DotProductInteractionBlock(), shortcut_filter=mm.Filter("bottom_block"), aggregation="concat"
)

The `Filter` operation allows us to select the `bottom_block` tensor (the deep representation of the continuous features) from the `dlrm_input_block` outputs.

The following diagram provides a visualization of the operations that we constructed in the dlrm_interaction object.

<img src="./images/residual_interaction.png" width="300" height="200">

Uncomment the line below if you want to see the tensor outputs from the `dlrm_interaction` block.

In [15]:
#dlrm_interaction(batch)

In [16]:
top_mlp = mm.MLPBlock([128, 64, 32])
dlrm_body = dlrm_interaction.connect(top_mlp)

**Define the Prediction block**

At this stage, we have created the DLRM block that accepts a dictionary of categorical and continuous tensors as input. The output of this block is the interaction representation vector of shape 32. The next step is to use this hidden representation to conduct a given prediction task.

We use the `BinaryClassificationTask` class and evaluate performance using the AUC metric.

In [18]:
binary_task = mm.BinaryClassificationTask(schema)
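As a reminder of what the AUC metric measures, the sketch below computes it exactly as the fraction of (positive, negative) pairs that the model ranks correctly, counting ties as half. The function name `pairwise_auc` and the toy labels and scores are illustrative assumptions. Note that `tf.keras.metrics.AUC` approximates this value with a Riemann sum over a fixed set of thresholds, so its result can differ slightly.

```python
def pairwise_auc(labels, scores):
    """Exact AUC: probability that a random positive outranks a random negative."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")

    wins = 0.0
    for pos_score in positives:
        for neg_score in negatives:
            if pos_score > neg_score:
                wins += 1.0   # correctly ranked pair
            elif pos_score == neg_score:
                wins += 0.5   # ties count as half
    return wins / (len(positives) * len(negatives))


# Toy example: three of the four (positive, negative) pairs are ranked correctly.
auc = pairwise_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(auc)  # 0.75
```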

We connect the DLRM body to the binary task head by passing both to `mm.Model`, which builds the model for us. Note that the `Model` class inherits from the `tf.keras.Model` class.

In [19]:
model = mm.Model(dlrm_body, binary_task)

In [20]:
%%time 
model.compile(optimizer='adam', run_eagerly=False, metrics=[tf.keras.metrics.AUC()])
model.fit(train, validation_data=valid, batch_size=4096, epochs=2)

Epoch 1/2
Epoch 2/2
CPU times: user 49.6 s, sys: 7.72 s, total: 57.3 s
Wall time: 34.3 s


<keras.callbacks.History at 0x7f5ad7fe1ee0>

### 3.3. Customize DLRM architecture: Add Cross-Product features to DLRM Model

We can synthetically form new features by multiplying (crossing) two or more sparse features. Crossed features can provide predictive power beyond what the individual features provide on their own (see Google's [feature crosses lecture](https://developers.google.com/machine-learning/crash-course/feature-crosses/video-lecture#:~:text=A%20feature%20cross%20is%20a,an%20understanding%20of%20feature%20crosses)). In particular, cross-product feature transformations help the model memorize niche or rare interactions seen in the user's past history. Feature crossing is used in well-known DL architectures, for example in the wide part of [Wide & Deep](https://arxiv.org/abs/1606.07792). The `HashedCross` class in Merlin Models performs cross-product transformations between two or more sparse categorical features. By using the <i>hashing trick</i>, we can control the dimension of the resulting crossed features. Conceptually, the transformation can be thought of as `hash(concatenation of features) % num_bins`.

This use case illustrates how two categorical variables can be combined into one cross-product feature. In more general situations, where you want to create multiple crossed features from a list of categorical variables, you can use the `HashedCrossAll` class.
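The conceptual formula `hash(concatenation of features) % num_bins` can be sketched in a few lines of pure Python. This is not how `HashedCross` is implemented: the `zlib.crc32` hash, the `"_X_"` separator, and the sample id pairs are assumptions chosen only to make the example deterministic and self-contained.

```python
import zlib


def hashed_cross(values, num_bins):
    """Map a tuple of categorical values to one bin via the hashing trick."""
    # Join with a separator so that (1, 23) and (12, 3) produce different keys.
    key = "_X_".join(str(v) for v in values)
    # zlib.crc32 is a stand-in hash, chosen because it is deterministic across runs.
    return zlib.crc32(key.encode("utf-8")) % num_bins


def one_hot(index, size):
    """Encode a bin index as a one-hot list of length `size`."""
    return [1.0 if i == index else 0.0 for i in range(size)]


num_bins = 100
rows = [(3, 17), (3, 17), (5, 2)]  # hypothetical (cat_0, cat_1) id pairs
bins = [hashed_cross(row, num_bins) for row in rows]
encoded = [one_hot(b, num_bins) for b in bins]
```

Identical value pairs always land in the same bin, while distinct pairs may collide. A larger `num_bins` reduces collisions at the cost of a wider one-hot output.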

Instead of rebuilding the main blocks from scratch, at this step we can use `DLRMBlock` to define a DLRM architecture like the one above. Note that the bottom block here is `MLPBlock([128, 64])` rather than `MLPBlock([64])`.

In [21]:
dlrm_body = mm.DLRMBlock(schema, embedding_dim=64, bottom_block=mm.MLPBlock([128,64]), top_block= mm.MLPBlock([128, 64, 32]))

Define the `cross_features` transformation block.

In [None]:
cross_schema = schema.select_by_name(names=["cat_0", "cat_1"])
cross_features = mm.HashedCross(cross_schema, num_bins=100, output_mode="one_hot")

Uncomment the line below if you want to see the tensor outputs from the `cross_features` block.

In [None]:
#cross_features(batch)

To learn more about the `HashedCross` class arguments (e.g., `num_bins`, `output_mode`), see the [source code](https://github.com/NVIDIA-Merlin/models/blob/main/merlin/models/tf/core/transformations.py#L825-L846).

For high-cardinality features, it is important to set `sparse=True` for `HashedCross`, `HashedCrossAll` and `CategoryEncoding`, because a dense output might cause out-of-memory (OOM) errors. For this example, it is fine to keep the default `sparse=False`.

We add another feature interaction representation based on the weighted sum of feature crosses. Therefore, we connect the `HashedCross` transformation block to a simple single-neuron linear MLP.

In [25]:
# wide part: 

wide_body = cross_features.connect(
    mm.MLPBlock([1], no_activation_last_layer=True), block_name='cross_model'
)

In [26]:
wide_body(batch)

<tf.Tensor: shape=(5, 1), dtype=float32, numpy=
array([[0.1873807 ],
       [0.17904067],
       [0.1873807 ],
       [0.1873807 ],
       [0.1873807 ]], dtype=float32)>
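The five outputs above contain only two distinct values, which is consistent with several rows falling into the same hash bin. A single linear neuron applied to a one-hot vector simply selects the weight of the active bin and adds the bias. The sketch below illustrates this with made-up weights and hypothetical bin indices; none of the numbers come from the trained model.

```python
import random

random.seed(0)
num_bins = 100
# One weight per cross bin and a shared bias: made-up values for illustration.
weights = [random.uniform(-0.1, 0.1) for _ in range(num_bins)]
bias = 0.05


def wide_logit(bin_index):
    """Single linear neuron applied to the one-hot encoding of `bin_index`."""
    one_hot = [1.0 if i == bin_index else 0.0 for i in range(num_bins)]
    return sum(w * x for w, x in zip(weights, one_hot)) + bias


bins = [7, 42, 7, 7, 7]  # hypothetical cross bins for a batch of five rows
logits = [wide_logit(b) for b in bins]
```

Rows that share a bin receive exactly the same output, so the wide part effectively learns one scalar score per crossed bin.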

Concatenate the `wide_body` layer with the `dlrm_body` layer using `ParallelBlock`.

In [28]:
# wide-and-dlrm
wide_and_dlrm = mm.ParallelBlock({'wide':wide_body, "dlrm": dlrm_body}, aggregation="concat")
binary_task = mm.BinaryClassificationTask(schema)
model = mm.Model(wide_and_dlrm, binary_task)

In [29]:
%%time 
model.compile(optimizer='adam', run_eagerly=False, metrics=[tf.keras.metrics.AUC()])
model.fit(train, validation_data=valid, batch_size=4096, epochs=2)

Epoch 1/2
Epoch 2/2
CPU times: user 53.9 s, sys: 8.04 s, total: 1min 1s
Wall time: 35.1 s


<keras.callbacks.History at 0x7f5ae43c2880>

### 3.4. Replace DotProductInteractionBlock with CrossBlock

In this section, we replace the `DotProductInteractionBlock` layer with the `CrossBlock` that is used in the [DCN-v2](https://arxiv.org/pdf/2008.13535.pdf) architecture. `CrossBlock` uses the `Cross` layer, which creates interactions between all input features. Stacking several `Cross` layers inside a `CrossBlock` produces high-order feature interactions.

We keep the same bottom block as in the previous section.

In [30]:
continuous_block = mm.ContinuousFeatures.from_schema(schema, tags=Tags.CONTINUOUS)
bottom_block = continuous_block.connect(mm.MLPBlock([128,64]))

In [31]:
embeddings_block = mm.Embeddings(
    schema.select_by_tag(Tags.CATEGORICAL),
    dim = 64
)
embeddings = embeddings_block(batch)
embeddings.keys(), embeddings["user_id"].shape

(dict_keys(['user_id', 'ts_weekday', 'ts_hour', 'product_id', 'cat_0', 'cat_1', 'cat_2', 'brand']),
 TensorShape([5, 64]))

In [32]:
dlrm_input_block = mm.ParallelBlock(
    {"embeddings": embeddings_block, "bottom_block": bottom_block},
    aggregation="concat"
)

`CrossBlock` creates high-order feature interactions by stacking a number of `Cross` layers. We set the depth, which is the number of cross layers to be stacked, to 2.

In [34]:
#stacked
cross_inter_body = dlrm_input_block.connect(mm.CrossBlock(2))
cross_inter_body(batch)

<tf.Tensor: shape=(5, 576), dtype=float32, numpy=
array([[ 0.01346518,  0.01996166,  0.10235769, ..., -0.01322006,
        -0.00873787, -0.03701539],
       [ 0.13175648,  0.03475271,  0.        , ..., -0.01474485,
        -0.00794951, -0.03636587],
       [ 0.        ,  0.05581063,  0.10123038, ..., -0.0129318 ,
        -0.00880974, -0.03923511],
       [ 0.        ,  0.04674374,  0.12530847, ..., -0.0126127 ,
        -0.00837193, -0.04131868],
       [ 0.01239133,  0.02843624,  0.11145297, ..., -0.01299973,
        -0.00858843, -0.03969663]], dtype=float32)>
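To make the stacking concrete, here is a minimal NumPy sketch of the cross layer formula from the DCN-v2 paper, `x_{l+1} = x_0 * (W_l x_l + b_l) + x_l`, applied with a depth of 2. It uses random stand-in inputs and full-rank weight matrices; whether Merlin's `CrossBlock` uses exactly this parameterization is an assumption, and the function `cross_layer` is a hypothetical helper.

```python
import numpy as np


def cross_layer(x0, xl, weight, bias):
    """One DCN-v2 style cross layer: x_{l+1} = x0 * (xl @ W + b) + xl."""
    return x0 * (xl @ weight + bias) + xl


rng = np.random.default_rng(0)
batch_size, dim, depth = 5, 576, 2  # 576 = 8 embeddings * 64 + 64 bottom units

x0 = rng.normal(size=(batch_size, dim))  # stand-in for the concatenated inputs
params = [
    (rng.normal(scale=0.01, size=(dim, dim)), np.zeros(dim)) for _ in range(depth)
]

x = x0
for weight, bias in params:
    # Every layer multiplies by the original input x0, raising the interaction order.
    x = cross_layer(x0, x, weight, bias)

print(x.shape)  # (5, 576)
```

Each layer preserves the input width, which matches the `(5, 576)` output shape shown above, and the added `xl` term acts as a residual connection.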

Concatenate the `cross_inter_body` layer with the bottom block using `ParallelBlock`.

In [35]:
dlrm_interaction = mm.ParallelBlock(
    {"cross_inter_body": cross_inter_body, "bottom_block": bottom_block},
    aggregation="concat"
)                                                

Then, we project the learned interactions using a series of dense layers. This defines the top block.

In [38]:
deep_dlrm_interaction = dlrm_interaction.connect(mm.MLPBlock([128, 64, 32]))

In [39]:
binary_task = mm.BinaryClassificationTask(schema)

We connect the deep DLRM interaction layer to the binary task head, which automatically generates the model.

In [40]:
model = mm.Model(deep_dlrm_interaction, binary_task)

In [41]:
%%time 
model.compile(optimizer='adam', run_eagerly=False, metrics=[tf.keras.metrics.AUC()])
model.fit(train, validation_data=valid, batch_size=4096, epochs=2)

Epoch 1/2
Epoch 2/2
CPU times: user 48.6 s, sys: 8.26 s, total: 56.8 s
Wall time: 32.9 s


<keras.callbacks.History at 0x7f5ad719e8e0>

### Summary 

In this hands-on lab, we learned how to:

- use a subset of pre-existing blocks to create a DLRM model
- add cross-product transformation block to the DLRM Model
- replace DotProductInteractionBlock with CrossBlock

Please execute the cell below to shut down the kernel before moving on to the next notebook, `04-Building-multi-stage-RecSys-with-Merlin-Systems`.

In [38]:
import IPython
app = IPython.Application.instance()
app.kernel.do_shutdown(True)

{'status': 'ok', 'restart': True}