In [1]:
# Copyright 2022 NVIDIA Corporation. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# =====

<img src="http://developer.download.nvidia.com/compute/machine-learning/frameworks/nvidia_logo.png" style="width: 90px; float: right;">

# Training and Deploying Multi-Stage Recommender Systems

Industrial recommender systems are made up of complex pipelines requiring multiple steps, including feature engineering and preprocessing, a retrieval model for candidate generation, filtering, a feature store query, a ranking model for scoring, and an ordering stage. These pipelines need to be carefully deployed as a set, requiring coordination during their development and deployment. Data scientists, ML engineers, and researchers might focus on different stages of recommender systems; however, they share a common desire to reduce the time and effort spent searching for and combining boilerplate code from different sources, or writing custom code from scratch, to create their own RecSys pipelines.

This tutorial introduces the Merlin framework, which aims to make the development and deployment of recommender systems easier by providing methods for evaluating existing approaches, developing new ideas, and deploying them to production. Many techniques are commonly used in these pipelines, such as different model architectures (e.g., MF, DLRM, DCN), negative sampling strategies, loss functions, and prediction tasks (binary, multi-class, multi-task). Merlin provides building blocks that allow RecSys practitioners to focus on the “what” question in designing their model pipeline instead of the “how”. Supporting research into new ideas within the RecSys space is equally important, and Merlin supports adding custom components and extending existing ones to address gaps.

In this tutorial, participants will learn how to use the open-source Merlin framework and its libraries to:
   - easily implement common recommender system techniques for comparison,
   - modify components to evaluate new ideas,
   - deploy recommender systems, bringing new ideas to production.

## 2. Implementing popular RecSys architectures and algorithms with Merlin Models

**Learning Objectives**

- Introduction to the open-source framework Merlin and its libraries: NVTabular and Merlin Models
- Pre-processing and feature engineering with NVTabular
- Build and train common recommender models with Merlin Models

### NVIDIA Merlin

Merlin is an open-source framework for building large-scale (deep learning) recommender systems. It is designed to support recommender systems end-to-end, from ETL to training to deployment, on CPU or GPU. It integrates with common deep learning frameworks such as TensorFlow and PyTorch. Its key benefits are easy-to-use APIs, GPU acceleration, and scaling to multi-GPU or multi-node systems.

![Merlin](./images/Merlin.png)

### Merlin Models

[Merlin Models](https://github.com/NVIDIA-Merlin/models) is a library that makes it easy for users in industry or academia to train and deploy recommender models, with best practices baked into the library. Users in industry can easily train standard models against their own dataset and get high-performance, GPU-accelerated models into production. Researchers can build custom models by incorporating standard components of deep learning recommender models, and then benchmark their new models on example offline datasets. Core features are:
- Unified API enables users to create models in TensorFlow or PyTorch
- Deep integration with NVTabular for ETL and model serving
- Flexible APIs targeted to both production and research
- Many different recommender system architectures (tabular, two-tower, sequential) or tasks (binary, multi-class classification, multi-task)

### NVTabular 

[NVTabular](https://github.com/NVIDIA-Merlin/NVTabular) is a feature engineering and preprocessing library for tabular data that is designed to easily manipulate terabyte-scale datasets and train deep learning (DL) based recommender systems. It provides high-level abstractions to simplify code and accelerates computation on the GPU using the RAPIDS Dask-cuDF library. NVTabular helps data scientists and ML engineers to:
- process datasets that exceed GPU and CPU memory without having to worry about scale
- focus on what to do with the data and not how to do it by using abstraction at the operation level
- prepare datasets quickly and easily for experimentation so that more models can be trained.
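One idea that makes processing larger-than-memory data possible is that many statistics can be computed one partition at a time and then merged. The plain-Python sketch below illustrates this for the mean and standard deviation that an operator like `Normalize` needs. The partitions are hypothetical, the population standard deviation is an assumption, and this is not NVTabular's actual implementation.

```python
import math

def partition_stats(values):
    """Sufficient statistics for one partition: (count, sum, sum of squares)."""
    return len(values), sum(values), sum(v * v for v in values)

def combine(stats):
    """Merge per-partition statistics into a global mean and population std.

    Numerically naive (sum of squares); production code typically uses a
    more stable merge formula.
    """
    n = sum(s[0] for s in stats)
    total = sum(s[1] for s in stats)
    total_sq = sum(s[2] for s in stats)
    mean = total / n
    var = total_sq / n - mean * mean
    return mean, math.sqrt(max(var, 0.0))

# Hypothetical partitions, each small enough to fit in memory on its own.
partitions = [[1.0, 2.0], [3.0], [4.0, 5.0, 6.0]]
mean, std = combine([partition_stats(p) for p in partitions])
```

Only three numbers per partition ever need to be held at once, regardless of how many rows the full dataset has.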

![Merlin](./images/schema.png)

That's a short introduction to Merlin, NVTabular, and Merlin Models. If you are interested in learning more, we provide many examples in our GitHub repositories.

Let's get started!

### 2.1. Feature Engineering on GPU with NVTabular

In this hands-on tutorial, we use a publicly available [eCommerce behavior dataset](https://www.kaggle.com/mkechinov/ecommerce-behavior-data-from-multi-category-store). The full dataset contains 7 months of data (from October 2019 to April 2020) from a large multi-category online store. Each row in the file represents an event. All events are related to products and users, and together they form a many-to-many relation between products and users. The data was collected by the Open CDP project, and the source of the dataset is the REES46 Marketing Platform.

We use the CSV files from October 2019 to April 2020 to train and validate our models; you can download them from [Kaggle](https://www.kaggle.com/mkechinov/ecommerce-behavior-data-from-multi-category-store).

We have already performed certain preprocessing steps on the CSV files of the raw dataset. See the `01-Data-preparation.ipynb` notebook for the code that creates the `train` and `valid` parquet files used in this notebook and the following ones.

Below, we perform the following data operations with NVTabular:

- Categorify categorical features
- Create temporal features
- Apply a user-defined function with LambdaOp and transform continuous features
- Target encoding
- Tag input feature and target columns

In this lab we show how to use NVTabular operations for preprocessing and feature engineering, but we do not go into the details of NVTabular. To learn more about NVTabular operators, please visit the [documentation](https://nvidia-merlin.github.io/NVTabular/main/Introduction.html) page.

**Import Required Libraries**

In [2]:
import os

import glob
import cudf 
import pandas as pd
import numpy as np
import nvtabular as nvt
from nvtabular.ops import *
import gc


from merlin.schema.tags import Tags
import merlin.models.tf as mm
from merlin.io.dataset import Dataset

import tensorflow as tf

2022-09-09 15:42:28.712492: I tensorflow/core/platform/cpu_feature_guard.cc:194] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  SSE3 SSE4.1 SSE4.2 AVX
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-09-09 15:42:31.792996: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1532] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 16255 MB memory:  -> device: 0, name: Tesla V100-SXM2-32GB-LS, pci bus id: 0000:8a:00.0, compute capability: 7.0


In [3]:
seed = 42
tf.random.set_seed(seed)
np.random.seed(seed)

In [4]:
data_path = '/workspace/data/ecom/'
output_path = os.path.join(data_path,'processed_nvt')

Read raw parquet files

In [5]:
train_dataset = nvt.Dataset(os.path.join(data_path, 'train.parquet'))
valid_dataset = nvt.Dataset(os.path.join(data_path, 'valid.parquet'))



Let's print out a few rows from our training dataset.

In [6]:
train_dataset.to_ddf().head()

| | user_id | product_id | event_time | event_type | brand | price | user_session | target | cat_0 | cat_1 | cat_2 | cat_3 | timestamp | ts_month | event_time_ts |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 635096898 | 26205398 | 2020-03-31 20:00:17 UTC | purchase | | 178.380005 | 27282c23-cf25-436f-87f9-b1fefa8ecee3 | 1 | construction | components | faucet | | 2020-03-31 20:00:17 | 3 | 1585684817000000000 |
| 1 | 635096898 | 26205378 | 2020-03-31 19:58:21 UTC | purchase | | 263.070007 | 27282c23-cf25-436f-87f9-b1fefa8ecee3 | 1 | construction | components | faucet | | 2020-03-31 19:58:21 | 3 | 1585684701000000000 |
| 2 | 635096898 | 26500131 | 2020-03-31 19:55:25 UTC | purchase | lucente | 194.820007 | 27282c23-cf25-436f-87f9-b1fefa8ecee3 | 1 | kids | toys | | | 2020-03-31 19:55:25 | 3 | 1585684525000000000 |
| 3 | 635096898 | 26500571 | 2020-03-31 19:47:51 UTC | cart | lucente | 205.580002 | 27282c23-cf25-436f-87f9-b1fefa8ecee3 | 0 | kids | toys | | | 2020-03-31 19:47:51 | 3 | 1585684071000000000 |
| 4 | 635096898 | 26500149 | 2020-03-31 19:46:45 UTC | purchase | lucente | 309.809998 | 27282c23-cf25-436f-87f9-b1fefa8ecee3 | 1 | kids | toys | | | 2020-03-31 19:46:45 | 3 | 1585684005000000000 |


Categorify categorical columns.

In [7]:
user_id = ["user_id"] >> Categorify(dtype='int32') >> TagAsUserID()

item_id = ["product_id"] >> Categorify(dtype='int32') >> TagAsItemID()

item_features = ["cat_0", "cat_1", "cat_2", "brand"] >> Categorify(dtype='int32') >> TagAsItemFeatures()
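Conceptually, `Categorify` learns a mapping from each distinct category value to a contiguous integer ID, which is what the embedding tables later index into. The plain-Python sketch below shows the idea. Reserving ID 0 for missing or unseen values and assigning IDs in order of first appearance are assumptions for illustration; NVTabular's actual index layout and ordering (for example by frequency) may differ. The brand `"acme"` is a hypothetical value.

```python
def fit_categorify(values):
    """Map each distinct non-null value to a contiguous integer ID starting at 1.

    Assumption: ID 0 is reserved for nulls and values unseen during fit.
    IDs are assigned in order of first appearance.
    """
    mapping = {}
    for v in values:
        if v is not None and v not in mapping:
            mapping[v] = len(mapping) + 1
    return mapping

def transform_categorify(values, mapping):
    # Missing or unseen values fall back to the reserved ID 0.
    return [mapping.get(v, 0) for v in values]

brand_mapping = fit_categorify(["lucente", None, "lucente", "acme"])
encoded = transform_categorify(["acme", "lucente", "unknown", None], brand_mapping)
```

Because the mapping is fit once and reused, validation data is encoded with the same IDs as training data.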

Create temporal features and categorify them.

In [8]:
weekday = (
    ["timestamp"] >> 
    LambdaOp(lambda col: col.dt.weekday) >> 
    Rename(name ='ts_weekday')
)

hour = (
    ["timestamp"] >> 
    LambdaOp(lambda col: col.dt.hour) >> 
    Rename(name ='ts_hour')
)

timestamp = ['event_time_ts'] >> nvt.ops.AddMetadata(tags=[Tags.TIME]) 

context_features = (
    (weekday + hour)  
    >> Categorify(dtype='int32') >> TagAsUserFeatures()
)
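The two `LambdaOp`s above extract the day of the week and the hour from the `timestamp` column. The standard-library sketch below computes the same two values from the nanosecond `event_time_ts` of the first row printed earlier (2020-03-31 20:00:17 UTC), assuming UTC and the pandas/cuDF `dt.weekday` convention (Monday=0, Sunday=6).

```python
from datetime import datetime, timezone

def temporal_features(event_time_ns):
    """Derive (weekday, hour) from a nanosecond Unix timestamp, interpreted in UTC.

    weekday follows the pandas/cuDF `dt.weekday` convention: Monday=0 ... Sunday=6.
    """
    ts = datetime.fromtimestamp(event_time_ns // 1_000_000_000, tz=timezone.utc)
    return ts.weekday(), ts.hour

# event_time_ts of the first training row: 2020-03-31 20:00:17 UTC.
ts_weekday, ts_hour = temporal_features(1585684817000000000)
```

March 31, 2020 was a Tuesday, so this yields weekday 1 and hour 20.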

Apply a user-defined function with `LambdaOp` to calculate each event's price relative to the average price of its `product_id`, and transform the continuous features.

In [9]:
# Relative price to the average price for the product_id
def relative_price_to_avg_pr(col, gdf):
    epsilon = 1e-5
    col = ((gdf['price'] - col) / (col + epsilon)) * (col > 0).astype(int)
    return col


price = (
    ['price']
    >> FillMissing(0)
    >> LogOp()
    >> Normalize()
    >> LambdaOp(lambda col: col.astype("float32"))
    >> TagAsItemFeatures()
)   

avg_price_product = ['product_id'] >> JoinGroupby(cont_cols =['price'], stats=["mean"])

relative_price_to_avg = (
    avg_price_product 
    >> LambdaOp(relative_price_to_avg_pr, dependency=['price']) 
    >> LambdaOp(lambda col: col.astype("float32"))
    >> Rename(name='relative_price')
    >> AddMetadata(tags=["item", Tags.CONTINUOUS])
)
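`JoinGroupby` adds the mean price per `product_id`, and `relative_price_to_avg_pr` then computes `(price - avg) / (avg + epsilon)`, zeroed when the average is not positive. The plain-Python sketch below reproduces that arithmetic on hypothetical product IDs and prices. For simplicity it computes the averages on the same rows it transforms, whereas the workflow learns them from the training set during `fit`.

```python
from collections import defaultdict

EPSILON = 1e-5  # same epsilon as relative_price_to_avg_pr above

def relative_prices(product_ids, prices):
    """(price - mean price of the product) / (mean + eps), zeroed when the mean is not positive."""
    sums, counts = defaultdict(float), defaultdict(int)
    for pid, price in zip(product_ids, prices):
        sums[pid] += price
        counts[pid] += 1
    out = []
    for pid, price in zip(product_ids, prices):
        avg = sums[pid] / counts[pid]
        out.append(((price - avg) / (avg + EPSILON)) * (1 if avg > 0 else 0))
    return out

# Hypothetical data: product 101 averages 100.0, product 202 has a single event.
rel = relative_prices([101, 101, 202], [90.0, 110.0, 50.0])
```

An event 10% below its product's average price gets a value of about -0.1, and a product seen only once always gets 0.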

Below, we apply target encoding (TE) to categorical columns. TE is a popular feature engineering technique for tabular data: it calculates statistics of a target variable grouped by the unique values of one or more categorical features. It was used by many top solutions in the [RecSys2020](https://blog.twitter.com/engineering/en_us/topics/insights/2020/what_twitter_learned_from_recsys2020) competition, as well as in previous years. You can read and learn more about target encoding in this [blog post](https://medium.com/rapids-ai/target-encoding-with-rapids-cuml-do-more-with-your-categorical-data-8c762c79e784).

In [10]:
cat_groups =  nvt.ColumnSelector(['user_id', 'brand', 'cat_1', 'cat_2'])
label = nvt.ColumnSelector(["target"])
te_features = cat_groups >> TargetEncoding(label)
te_features_norm = te_features >> Normalize() >> LambdaOp(lambda col: col.astype("float32")) >> TagAsItemFeatures()
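A common form of target encoding replaces each category with a smoothed mean of the target, pulling rare categories toward the global mean. The plain-Python sketch below shows that smoothing on hypothetical data. NVTabular's `TargetEncoding` exposes a smoothing parameter (`p_smooth`) and can compute out-of-fold statistics on the training data to limit target leakage; the exact formula here is an assumption and the out-of-fold step is omitted.

```python
from collections import defaultdict

def fit_target_encoding(categories, targets, p_smooth=20.0):
    """Smoothed per-category target mean.

    encoding(c) = (sum of targets in c + p_smooth * global_mean) / (count(c) + p_smooth)
    """
    global_mean = sum(targets) / len(targets)
    sums, counts = defaultdict(float), defaultdict(int)
    for c, t in zip(categories, targets):
        sums[c] += t
        counts[c] += 1
    encoding = {
        c: (sums[c] + p_smooth * global_mean) / (counts[c] + p_smooth) for c in counts
    }
    return encoding, global_mean

def transform_target_encoding(categories, encoding, global_mean):
    # Categories unseen during fit fall back to the global mean.
    return [encoding.get(c, global_mean) for c in categories]

enc, prior = fit_target_encoding(["a", "a", "a", "b"], [1, 1, 0, 0], p_smooth=2.0)
te_values = transform_target_encoding(["a", "b", "c"], enc, prior)
```

With only one observation, category `"b"` is encoded as 1/3 rather than its raw mean of 0, which is the effect smoothing is meant to have.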

Tag the target column, and keep the original raw `user_id` and `product_id` columns. We keep the raw IDs because in the next notebook we register them in the feature store.

In [11]:
user_id_raw = ["user_id"] >> Rename(postfix='_raw') >> TagAsUserFeatures()
item_id_raw = ["product_id"] >> Rename(postfix='_raw') >> TagAsItemFeatures()

target = (
    ["target"] 
    >> nvt.ops.AddMetadata(tags=[Tags.BINARY_CLASSIFICATION, "target"]) 
)

Create workflow output node.

In [12]:
outputs = (user_id  + 
           context_features + 
           item_id + 
           item_features + 
           price + 
           relative_price_to_avg + 
           te_features_norm + 
           timestamp +
           user_id_raw +
           item_id_raw +
           target
          )
workflow = nvt.Workflow(outputs)

In [13]:
%%time
workflow.fit(train_dataset)

workflow.transform(train_dataset).to_parquet(
    output_path=os.path.join(output_path, "train")
)

workflow.transform(valid_dataset).to_parquet(
    output_path=os.path.join(output_path, "valid")
)



CPU times: user 4.72 s, sys: 2.54 s, total: 7.26 s
Wall time: 18.1 s
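`workflow.fit` is called only on `train_dataset`, and the learned statistics are then applied to both datasets, so validation data never influences preprocessing. The plain-Python sketch below mirrors the `price` pipeline (`FillMissing(0)`, `LogOp`, `Normalize`) on hypothetical prices. It assumes `LogOp` computes `log(1 + x)` and that `Normalize` uses the population standard deviation; neither detail is confirmed by this notebook.

```python
import math

def _log_price(p):
    # FillMissing(0) followed by log(1 + x).
    return math.log1p(p if p is not None else 0.0)

def fit_price_pipeline(train_prices):
    """Learn the Normalize statistics on the transformed training prices only."""
    logged = [_log_price(p) for p in train_prices]
    mean = sum(logged) / len(logged)
    std = math.sqrt(sum((x - mean) ** 2 for x in logged) / len(logged))
    return mean, std

def transform_price_pipeline(prices, mean, std):
    # Validation data is transformed with the statistics learned on train.
    return [(_log_price(p) - mean) / std for p in prices]

e = math.e
mean, std = fit_price_pipeline([0.0, e - 1, e * e - 1])  # logs: 0, 1, 2
valid_scaled = transform_price_pipeline([e - 1, None], mean, std)
```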


We can save the workflow.

In [14]:
workflow.save(os.path.join(output_path, "workflow"))

NVTabular exported the schema file of our processed dataset. The `schema.pbtxt` is a protobuf text file that contains feature metadata, including statistics about features such as cardinality and min and max values, as well as tags based on their characteristics and dtypes (e.g., categorical, continuous, list, item_id). Merlin Models loads this metadata from the schema and uses the tags to automatically set its parameters. In other words, Merlin Models relies on the schema object to automatically build all necessary input and output layers.

To learn more about NVTabular and schema object you can visit the example notebooks in the NVTabular [repo](https://github.com/NVIDIA-Merlin/NVTabular/tree/main/examples) and Merlin Models [repo](https://github.com/NVIDIA-Merlin/models/blob/main/examples/02-Merlin-Models-and-NVTabular-integration.ipynb).
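To make the role of tags concrete, the sketch below uses a plain dictionary as a hypothetical, heavily simplified stand-in for a Merlin schema and a hypothetical `select_by_tag` helper. It only illustrates the idea behind `schema.select_by_tag(...)` used later in this notebook: a model builder can route categorical columns to embedding tables, continuous columns to dense inputs, and target columns to the prediction task, without naming any column explicitly.

```python
# Hypothetical, simplified stand-in for a Merlin schema: column -> set of tags.
schema_tags = {
    "user_id": {"categorical", "user", "id", "user_id"},
    "product_id": {"categorical", "item", "id", "item_id"},
    "brand": {"categorical", "item"},
    "price": {"continuous", "item"},
    "target": {"binary_classification", "target"},
}

def select_by_tag(tags_by_column, tag):
    """Return the names of columns carrying `tag`, in schema order."""
    return [name for name, tags in tags_by_column.items() if tag in tags]

categorical = select_by_tag(schema_tags, "categorical")  # -> embedding tables
continuous = select_by_tag(schema_tags, "continuous")    # -> dense inputs
targets = select_by_tag(schema_tags, "target")           # -> prediction task
```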

### 2.2. Model Building and Training on GPU with Merlin Models

**GOAL:** In this lab, we build ranking models for a binary classification task, which aims to predict the likelihood (a relevance score) of a product to be purchased by a given user.
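Binary classification models of this kind typically output a sigmoid probability and are trained with binary cross-entropy (log loss). The sketch below shows both in plain Python; we assume, rather than quote from the Merlin source, that `BinaryClassificationTask` uses this standard pairing.

```python
import math

def sigmoid(logit):
    """Numerically stable logistic function."""
    if logit >= 0:
        return 1.0 / (1.0 + math.exp(-logit))
    z = math.exp(logit)
    return z / (1.0 + z)

def binary_cross_entropy(label, prob, eps=1e-7):
    """Log loss for one example; prob is clipped away from 0 and 1 to avoid log(0)."""
    prob = min(max(prob, eps), 1.0 - eps)
    return -(label * math.log(prob) + (1 - label) * math.log(1.0 - prob))

p = sigmoid(0.0)                       # an uninformative logit
loss_pos = binary_cross_entropy(1, p)  # loss if the product was in fact purchased
```

An uninformative prediction of 0.5 costs ln 2 per example, so a useful model should drive the average loss below that.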

**Read processed parquet files as Dataset objects**

In [15]:
train = Dataset(os.path.join(output_path, "train", "*.parquet"), part_size="500MB")
valid = Dataset(os.path.join(output_path, "valid", "*.parquet"), part_size="500MB")

# define schema object
schema = train.schema.without(['event_time_ts', 'user_id_raw', 'product_id_raw'])

In [16]:
schema

| | name | tags | dtype | is_list | is_ragged | properties.num_buckets | properties.freq_threshold | properties.max_size | properties.start_index | properties.cat_path | properties.embedding_sizes.cardinality | properties.embedding_sizes.dimension | properties.domain.min | properties.domain.max |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | user_id | (Tags.CATEGORICAL, Tags.USER, Tags.ID, Tags.US... | int32 | False | False | | 0.0 | 0.0 | 0.0 | .//categories/unique.user_id.parquet | 350630.0 | 512.0 | 0.0 | 350629.0 |
| 1 | ts_weekday | (Tags.CATEGORICAL, Tags.USER) | int32 | False | False | | 0.0 | 0.0 | 0.0 | .//categories/unique.ts_weekday.parquet | 8.0 | 16.0 | 0.0 | 7.0 |
| 2 | ts_hour | (Tags.CATEGORICAL, Tags.USER) | int32 | False | False | | 0.0 | 0.0 | 0.0 | .//categories/unique.ts_hour.parquet | 25.0 | 16.0 | 0.0 | 24.0 |
| 3 | product_id | (Tags.CATEGORICAL, Tags.ITEM_ID, Tags.ID, Tags... | int32 | False | False | | 0.0 | 0.0 | 0.0 | .//categories/unique.product_id.parquet | 51376.0 | 512.0 | 0.0 | 51375.0 |
| 4 | cat_0 | (Tags.CATEGORICAL, Tags.ITEM) | int32 | False | False | | 0.0 | 0.0 | 0.0 | .//categories/unique.cat_0.parquet | 14.0 | 16.0 | 0.0 | 13.0 |
| 5 | cat_1 | (Tags.CATEGORICAL, Tags.ITEM) | int32 | False | False | | 0.0 | 0.0 | 0.0 | .//categories/unique.cat_1.parquet | 61.0 | 16.0 | 0.0 | 60.0 |
| 6 | cat_2 | (Tags.CATEGORICAL, Tags.ITEM) | int32 | False | False | | 0.0 | 0.0 | 0.0 | .//categories/unique.cat_2.parquet | 90.0 | 20.0 | 0.0 | 89.0 |
| 7 | brand | (Tags.CATEGORICAL, Tags.ITEM) | int32 | False | False | | 0.0 | 0.0 | 0.0 | .//categories/unique.brand.parquet | 2653.0 | 132.0 | 0.0 | 2652.0 |
| 8 | price | (Tags.CONTINUOUS, Tags.ITEM) | float32 | False | False | | | | | | | | | |
| 9 | relative_price | (Tags.CONTINUOUS, Tags.ITEM) | float32 | False | False | | | | | | | | | |


In [17]:
target_column = schema.select_by_tag(Tags.TARGET).column_names[0]
target_column

'target'

#### 2.2.1. DLRM Model

The Deep Learning Recommendation Model (DLRM) is a popular neural network architecture for personalization, originally proposed by Facebook in 2019.

<img src="./images/DLRM.png" width="600" height="400">

DLRM accepts two types of features: categorical and numerical.

- For each categorical feature, an embedding table is used to provide dense representation to each unique value.
- Numerical features are fed to the model as dense features and then transformed by a simple neural network referred to as the "bottom MLP". This part of the network consists of a series of linear layers with ReLU activations.
- The output of the bottom MLP and the embedding vectors are then fed into the dot product interaction operation (see the Pairwise interaction step). The output of the dot interaction is then concatenated with the features resulting from the bottom MLP (a skip connection) and fed into the "top MLP", which is also a series of dense layers with activations (a fully connected NN).
- The model outputs a single number (here we use a sigmoid function to generate probabilities), which can be interpreted as the likelihood of a certain user clicking on an ad, watching a movie, or viewing a news page.
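The pairwise interaction step can be sketched in plain Python: take the bottom MLP output and every embedding vector, compute the dot product of each distinct pair, and concatenate the result with the bottom MLP output before the top MLP. The vectors below are hypothetical. Dot products require all vectors to share one dimension; consistent with this, the cell below ends the bottom MLP with 64 units, the same as `embedding_dim=64`.

```python
from itertools import combinations

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def dlrm_interaction(bottom_mlp_out, embeddings):
    """Pairwise dot products between all feature vectors (i < j only),
    concatenated with the bottom MLP output as a skip connection."""
    vectors = [bottom_mlp_out] + embeddings
    pairwise = [dot(u, v) for u, v in combinations(vectors, 2)]
    return pairwise + list(bottom_mlp_out)

dense = [1.0, 0.0]                # hypothetical bottom MLP output (dim 2)
embs = [[0.5, 2.0], [1.0, 1.0]]   # hypothetical embeddings of two categorical features
top_mlp_input = dlrm_interaction(dense, embs)
```

With `k` feature vectors, the interaction produces `k * (k - 1) / 2` scalars, so the top MLP input grows quadratically with the number of categorical features.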

In [18]:
model = mm.DLRMModel(
    schema,
    embedding_dim=64,
    bottom_block=mm.MLPBlock([128, 64]),
    top_block=mm.MLPBlock([128, 64, 32]),
    prediction_tasks=mm.BinaryClassificationTask(target_column),
)

In [19]:
%%time 
model.compile(optimizer='adam', run_eagerly=False, metrics=[tf.keras.metrics.AUC()])
model.fit(train, validation_data=valid, batch_size=4096, epochs=2)

Epoch 1/2
Epoch 2/2
CPU times: user 46.1 s, sys: 7.63 s, total: 53.7 s
Wall time: 34 s


<keras.callbacks.History at 0x7f4016079280>

#### 2.2.2. DCN Model

DCN-V2 is an architecture proposed as an improvement upon the original [DCN model](https://arxiv.org/pdf/1708.05123.pdf). Explicit feature interactions of the inputs are learned through cross layers and then combined with a deep network to learn complementary implicit interactions. The overall model architecture is depicted in the figure below, with two ways to combine the cross network with the deep network: (1) stacked and (2) parallel. The output of the embedding layer is the concatenation of all the embedded vectors and the normalized dense features: $\mathbf{x}_0 = [\mathbf{x}_{\text{embed},1}; \ldots; \mathbf{x}_{\text{embed},n}; \mathbf{x}_{\text{dense}}]$.

![DCN](./images/DCN.png)

<a href="https://arxiv.org/abs/2008.13535">Image Source: DCN V2 paper</a>

In this example, we build the stacked DCN-V2 architecture.
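Each DCN-V2 cross layer computes $\mathbf{x}_{l+1} = \mathbf{x}_0 \odot (W_l \mathbf{x}_l + \mathbf{b}_l) + \mathbf{x}_l$, where $\odot$ is the elementwise product. The plain-Python sketch below implements that formula and stacks it `depth` times; the identity weights and zero biases are hypothetical values chosen so the arithmetic is easy to follow. In the stacked variant, the cross network's output is then fed into the deep network (`deep_block`), which is not shown here.

```python
def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def cross_layer(x0, xl, W, b):
    """One DCN-V2 cross layer: x_{l+1} = x0 * (W @ xl + b) + xl, with * elementwise."""
    proj = [p + bi for p, bi in zip(matvec(W, xl), b)]
    return [x0i * pi + xli for x0i, pi, xli in zip(x0, proj, xl)]

def cross_network(x0, layers):
    """Apply len(layers) cross layers in sequence; len(layers) plays the role of `depth`."""
    x = x0
    for W, b in layers:
        x = cross_layer(x0, x, W, b)
    return x

identity = [[1.0, 0.0], [0.0, 1.0]]
zeros = [0.0, 0.0]
x0 = [1.0, 2.0]
out_depth1 = cross_network(x0, [(identity, zeros)])
out_depth2 = cross_network(x0, [(identity, zeros), (identity, zeros)])
```

Each extra layer multiplies by $\mathbf{x}_0$ once more, so a depth-`L` cross network can represent feature interactions up to degree `L + 1`.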

In [20]:
model = mm.DCNModel(
    schema,
    depth=2,
    deep_block=mm.MLPBlock([64, 32]),
    prediction_tasks=mm.BinaryClassificationTask(target_column),
)

In [21]:
model.compile('adam', run_eagerly=False, metrics=[tf.keras.metrics.AUC()])
model.fit(train, validation_data=valid, batch_size=4096, epochs=2)

Epoch 1/2
Epoch 2/2


<keras.callbacks.History at 0x7f4005e80790>

**Injecting additional Hyper-parameters to DCN Model**

Hyperparameter tuning (optimization) is the process of finding the best set of hyperparameters for building and training a model on a given dataset. It can be done manually or managed by an algorithm such as grid search or Bayesian optimization, which searches for the hyperparameters that maximize or minimize a chosen metric.
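Grid search, the simplest of these algorithms, evaluates every combination of candidate values and keeps the best-scoring one. In the plain-Python sketch below, `toy_validation_score` is a hypothetical stand-in for "train the model with these hyperparameters and return the validation AUC"; in practice each evaluation would be a full `model.fit` run.

```python
from itertools import product

def grid_search(score_fn, grid):
    """Evaluate every combination in `grid`; return (best_score, best_params), maximizing."""
    names = list(grid)
    best = None
    for values in product(*(grid[n] for n in names)):
        params = dict(zip(names, values))
        score = score_fn(params)
        if best is None or score > best[0]:
            best = (score, params)
    return best

# Hypothetical objective that peaks at depth=2, dropout=0.1.
def toy_validation_score(params):
    return -(params["depth"] - 2) ** 2 - (params["dropout"] - 0.1) ** 2

best_score, best_params = grid_search(
    toy_validation_score, {"depth": [1, 2, 3], "dropout": [0.0, 0.1, 0.3]}
)
```

The number of runs is the product of the candidate-list lengths (9 here), which is why smarter strategies such as Bayesian optimization are preferred when each run is expensive.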

Below, we show how to inject certain hyperparameters into our DCN model and set their values. Let's first explain these hyperparameters.

In Merlin Models, we introduce a data class called `EmbeddingOptions` that contains different parameters to configure the embedding tables of categorical variables. Among them are the embedding dimension, a boolean flag to infer the dimension from the variable's cardinality, and the L2 regularization of the embeddings. `embedding_dims` controls the dimensionality of the embeddings created for categorical columns. Below, we use this argument to set the embedding dimensions for the item ID and user ID columns. The higher the value, the greater the capacity of our model, but capacity beyond a certain point might result in overfitting. Picking a good embedding size can go a long way and is best arrived at by experimentation. A related parameter is `embeddings_l2_reg`: the higher its value, the stronger the constraint on the magnitude of the values in our embeddings. This is another parameter we can use to control the capacity of our model.
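To get a feel for cardinality-based sizing, note that the `embedding_sizes.dimension` values in the schema printed earlier are consistent with a common heuristic popularized by fastai: `round(1.6 * cardinality ** 0.56)`, clipped to the range [16, 512]. We cannot confirm that this is the rule Merlin Models applies when `infer_embedding_sizes=True`, so treat the sketch below as an illustration only; in the next cell, the explicit `embedding_dims` entries of 128 override the inferred size for the item and user IDs.

```python
def embedding_dim_rule(cardinality, min_size=16, max_size=512):
    """Heuristic dimension: round(1.6 * cardinality ** 0.56), clipped to [min_size, max_size]."""
    return int(min(max(min_size, round(1.6 * cardinality ** 0.56)), max_size))

# Cardinalities taken from the schema printed earlier.
cardinalities = {
    "brand": 2653,
    "cat_2": 90,
    "cat_1": 61,
    "product_id": 51376,
    "user_id": 350630,
}
dims = {name: embedding_dim_rule(c) for name, c in cardinalities.items()}
```

The sublinear exponent means dimensions grow much more slowly than cardinality, and the cap keeps very large ID vocabularies from dominating the model size.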


In the [DCNModel](https://github.com/NVIDIA-Merlin/models/blob/main/merlin/models/tf/models/ranking.py#L87) constructor, the `depth` parameter specifies the number of cross layers to be stacked. The default value is 1; increasing it again increases the capacity of the model and makes the mappings it can learn more expressive.

The `deep_block` parameter is a Multilayer Perceptron (MLP) block consisting of a stack of linear layers. Here we can alter the number and dimensionality of the layers (the `[64, 32]` argument indicates 64 units in the first layer and 32 in the second) and the activation function, controlled by the `activation` parameter.
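The plain-Python sketch below shows what such a block computes: each layer applies `activation(W @ x + b)`, and one `(W, b)` pair exists per entry in the layer list. It uses the SELU activation chosen in the next cell, with its standard constants, and applies the activation to every layer including the last, matching `no_activation_last_layer=False`. The 2x2 weights are hypothetical.

```python
import math

SELU_ALPHA = 1.6732632423543772
SELU_SCALE = 1.0507009873554805

def selu(x):
    return SELU_SCALE * (x if x > 0 else SELU_ALPHA * (math.exp(x) - 1.0))

def dense(x, W, b, activation=None):
    """One linear layer: activation(W @ x + b); W has shape (units, len(x))."""
    out = [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]
    return [activation(v) for v in out] if activation else out

def mlp(x, layers, activation=selu):
    """Apply (W, b) layers in order; [64, 32] units would mean two such pairs."""
    for W, b in layers:
        x = dense(x, W, b, activation)
    return x

# Hypothetical single 2-unit layer with identity weights and zero bias.
h = mlp([1.0, -1.0], [([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])])
```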

We again have two parameters for specifying the level of regularization: `kernel_regularizer` and `bias_regularizer`, which apply penalties on layer parameters during optimization. `kernel_regularizer` applies a penalty on the layer's weights, whereas `bias_regularizer` applies a penalty on the layer's bias. You can learn more about `tf.keras.regularizers` [here](https://www.tensorflow.org/api_docs/python/tf/keras/regularizers/Regularizer).
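For the `regularizers.l2(1e-5)` used in the next cells, the penalty added to the training loss is the coefficient times the sum of squared parameters. A plain-Python sketch of that arithmetic, with hypothetical kernel and bias values:

```python
def l2_penalty(weights, l2=1e-5):
    """Keras-style L2 penalty: l2 * sum(w ** 2), added to the training loss."""
    return l2 * sum(w * w for w in weights)

kernel = [1.0, 2.0, 3.0]  # hypothetical layer weights
bias = [0.5]              # hypothetical layer bias
penalty = l2_penalty(kernel) + l2_penalty(bias)
```

Because the penalty grows with the square of each parameter, large weights are discouraged much more strongly than small ones.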

The `dropout` parameter controls the amount of dropout that is applied. With a value greater than 0, some nodes are randomly excluded from the calculation during training, each with probability equal to the `dropout` value. This helps prevent overspecialization: no node can be completely sure it will receive input from any other node in the forward pass. This parameter can be thought of as another way to combat overfitting.
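Keras-style dropout is usually implemented as "inverted" dropout: during training, each value is zeroed with probability `rate`, and the survivors are scaled by `1 / (1 - rate)` so the expected activation is unchanged and nothing needs rescaling at inference time. A plain-Python sketch on hypothetical activations:

```python
import random

def dropout(values, rate, training=True, rng=random):
    """Inverted dropout: zero each value with probability `rate`, scale survivors by 1/(1-rate)."""
    if not training or rate == 0.0:
        return list(values)  # identity at inference time
    keep = 1.0 - rate
    return [v / keep if rng.random() >= rate else 0.0 for v in values]

rng = random.Random(0)
activations = [1.0] * 1000
dropped = dropout(activations, rate=0.2, rng=rng)
```

Roughly 20% of the entries come back as 0.0 and the rest as 1.25, so the mean stays close to 1.0.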

Last but not least, we can control the `epochs` parameter. Depending on the other values we specify, training for a different number of epochs may lead to better results. We can use hyperparameter optimization to discover that value!

In [22]:
from merlin.models.utils import schema_utils
from tensorflow.keras import regularizers
embedding_dims = {}

item_id_feature_name = schema.select_by_tag(Tags.ITEM_ID).column_names[0]
item_id_domain = schema_utils.categorical_domains(schema)[
    item_id_feature_name
]
embedding_dims[item_id_domain] = 128

user_id_feature_name = schema.select_by_tag(Tags.USER_ID).column_names[0]
user_id_domain = schema_utils.categorical_domains(schema)[
    user_id_feature_name
]
embedding_dims[user_id_domain] = 128

embedding_options = mm.EmbeddingOptions(
    embedding_dims=embedding_dims,
    infer_embedding_sizes=True,
    embeddings_l2_reg=0.00
    )

In [23]:
model = mm.DCNModel(
    schema,
    depth=2,
    deep_block=mm.MLPBlock(
        [64, 32],
        activation='selu',
        no_activation_last_layer=False,
        dropout=0.015,
        kernel_regularizer=regularizers.l2(1e-5),
        bias_regularizer=regularizers.l2(1e-5),
    ),
    stacked=True,
    embedding_options=embedding_options,
    prediction_tasks=mm.BinaryClassificationTask('target'),
)

In [24]:
model.compile('adam', run_eagerly=False, metrics=[tf.keras.metrics.AUC()])
model.fit(train, validation_data=valid, batch_size=4096, epochs=2)

Epoch 1/2
Epoch 2/2


<keras.callbacks.History at 0x7f40054ef2b0>

#### 2.2.3. XGBoost

[XGBoost](https://xgboost.ai/), which stands for Extreme Gradient Boosting, is a scalable, distributed gradient-boosted decision tree (GBDT) machine learning library. It provides parallel tree boosting and is a leading machine learning library for regression, classification, and ranking problems.

Gradient-boosted decision trees (GBDT) is a decision tree ensemble learning algorithm, similar to random forest, for classification and regression. Ensemble learning algorithms combine multiple machine learning models to obtain a better model. Both random forest and GBDT build a model consisting of multiple decision trees; the difference is in how the trees are built and combined.

The term “gradient boosting” comes from the idea of “boosting”, or improving a single weak model by combining it with a number of other weak models to generate a collectively strong model. Gradient boosting is an extension of boosting where the process of additively generating weak models is formalized as a gradient descent algorithm over an objective function. XGBoost is a scalable and highly accurate implementation of gradient boosting, designed for model performance and computational speed. Boosting itself is sequential: each new tree is fit to correct the errors of the trees before it. XGBoost speeds this up by parallelizing the construction of each individual tree, for example the search for the best split across features. You can read more about XGBoost [here](https://www.nvidia.com/en-us/glossary/data-science/xgboost/).
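The sequential boosting loop can be sketched in plain Python with squared-error loss and depth-1 trees ("stumps") on a single feature: each round fits a stump to the current residuals, which are the negative gradient of the squared error, and adds a damped copy of it to the ensemble. This is a toy illustration on hypothetical data. XGBoost with `binary:logistic` instead uses the gradients and second derivatives of the log loss, deeper trees, and regularization, none of which appear here.

```python
def fit_stump(xs, residuals):
    """Best single-split regression tree on 1-D inputs, by squared error."""
    best = None
    for thr in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= thr]
        right = [r for x, r in zip(xs, residuals) if x > thr]
        if not right:
            continue  # a split needs samples on both sides
        lv, rv = sum(left) / len(left), sum(right) / len(right)
        err = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
        if best is None or err < best[0]:
            best = (err, thr, lv, rv)
    if best is None:  # all inputs identical: fall back to a constant
        mean = sum(residuals) / len(residuals)
        return lambda x: mean
    _, thr, lv, rv = best
    return lambda x: lv if x <= thr else rv

def gradient_boost(xs, ys, n_rounds=20, learning_rate=0.3):
    """Sequentially add stumps, each fit to the residuals of the current ensemble."""
    base = sum(ys) / len(ys)
    preds = [base] * len(ys)
    stumps = []
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, preds)]  # negative gradient of 0.5*(y-p)^2
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * stump(x) for p, x in zip(preds, xs)]
    return lambda x: base + learning_rate * sum(s(x) for s in stumps)

xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [0, 0, 0, 0, 1, 1, 1, 1]
predict = gradient_boost(xs, ys)
```

Each round shrinks the remaining error by a factor of `1 - learning_rate` on this data, which shows why later trees depend on earlier ones and cannot simply be trained independently.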

To facilitate training on data larger than the available GPU memory, training leverages Dask. The `Distributed` context manager hides all the complexity of starting a local Dask cluster.

Without further ado, let's build and train our XGB model.

In [25]:
from merlin.core.utils import Distributed
from merlin.models.xgb import XGBoost
xgb_booster_params = {
    'objective':'binary:logistic',
    'tree_method':'gpu_hist',
    'eval_metric': "auc"
}

xgb_train_params = {
    'num_boost_round': 100,
    'verbose_eval': 10,
}


with Distributed():
    model = XGBoost(schema=schema, **xgb_booster_params)
    model.fit(
        train,
        evals=[(valid, 'validation_set'),],
        use_quantile = False,
        **xgb_train_params
    )
    metrics = model.evaluate(valid)

2022-09-09 15:44:34,315 - distributed.preloading - INFO - Import preload module: dask_cuda.initialize
Failed to bind address 'None', trying to use '127.0.0.1' instead.
[15:44:39] task [xgboost.dask]:tcp://127.0.0.1:36865 got new rank 0


[0]	validation_set-auc:0.63497
[10]	validation_set-auc:0.63740
[20]	validation_set-auc:0.63688
[30]	validation_set-auc:0.63625
[40]	validation_set-auc:0.63632
[50]	validation_set-auc:0.63618
[60]	validation_set-auc:0.63623
[70]	validation_set-auc:0.63621
[80]	validation_set-auc:0.63642
[90]	validation_set-auc:0.63650
[99]	validation_set-auc:0.63650


Print eval metrics.

In [26]:
metrics

{'auc': 0.6364974402143888}
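All models in this notebook are compared by AUC, which can be read as the probability that a randomly chosen positive example is scored above a randomly chosen negative one (ties count as half). The exact pairwise computation below is a plain-Python sketch on hypothetical scores; `tf.keras.metrics.AUC` approximates the same quantity with a fixed set of thresholds, so its values can differ slightly.

```python
def roc_auc(labels, scores):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))

auc = roc_auc([1, 0, 1, 0], [0.9, 0.8, 0.3, 0.1])
```

Three of the four positive/negative pairs are ordered correctly, so the AUC is 0.75; a random scorer averages 0.5.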

### Summary 

In this hands-on lab we learned how to:
- do feature preprocessing and generation on GPU using the NVTabular library
- use the schema object, which seamlessly integrates NVTabular and Merlin Models
- build and train popular recommender model architectures easily with the Merlin Models library

Please execute the cell below to shut down the kernel before moving on to the next notebook `03-Customize-Merlin-Models`.

In [27]:
import IPython
app = IPython.Application.instance()
app.kernel.do_shutdown(True)

{'status': 'ok', 'restart': True}