# End-to-end Recommender System with NVIDIA Merlin and Vertex AI

This notebook shows how to deploy and execute an end-to-end recommender system on Vertex Pipelines using NVIDIA Merlin.
The notebook covers the following:

1. Training pipeline overview.
2. Set pipeline configurations.
3. Build pipeline container images.
4. Configure pipeline parameters.
5. Compile KFP pipeline.
6. Submit pipeline to Vertex AI.


## 1. Training Pipeline Overview

The following diagram shows the end-to-end pipeline for preprocessing, training, and serving an `NVIDIA Merlin` recommender system using `Vertex AI`.
The pipeline is defined in the [src/training_pipelines.py](src/training_pipelines.py) module.

The `training_pipeline` pipeline function reads the Criteo data from `Cloud Storage` and performs the following steps:

1. Preprocess the data using `NVTabular`, as described in the [01-dataset-preprocessing.ipynb](01-dataset-preprocessing.ipynb) notebook:
    1. Convert CSV data to Parquet and write to `Cloud Storage`.
    2. Transform the data using an `NVTabular` workflow.
    3. Write the transformed data as Parquet files and the workflow object to `Cloud Storage`.
2. Train a DeepFM model using `HugeCTR`. This step submits a [Custom Training Job](https://cloud.google.com/vertex-ai/docs/training/create-custom-job) to `Vertex AI` training, as described in [02-model-training-hugectr.ipynb](02-model-training-hugectr.ipynb).
3. Export the model as a `Triton` ensemble to be served using `Triton` server. The ensemble consists of the `NVTabular` preprocessing workflow and a `HugeCTR` model.
4. Upload the exported `Triton` ensemble model to `Vertex AI` model resources.

Once the model is uploaded to `Vertex AI`, along with a reference to its serving `Triton` container, it can be deployed to `Vertex AI` Prediction, as described in [03-model-inference-hugectr.ipynb](03-model-inference-hugectr.ipynb).

All the components of the pipelines are defined in the [src/pipelines/components.py](src/pipelines/components.py) module.
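
The step ordering described above can be sketched as a small dependency graph. This is only an illustration in plain Python: the step names are hypothetical and are not taken from `src/pipelines/components.py`, and the real pipeline expresses these dependencies through KFP component inputs and outputs rather than an explicit graph.

```python
from graphlib import TopologicalSorter

# Hypothetical step names; the real component names may differ.
# Each key maps to the set of steps that must finish before it can start.
PIPELINE_STEPS = {
    'convert_csv_to_parquet': set(),
    'transform_dataset': {'convert_csv_to_parquet'},
    'train_hugectr': {'transform_dataset'},
    # The ensemble needs both the NVTabular workflow and the trained model.
    'export_triton_ensemble': {'transform_dataset', 'train_hugectr'},
    'upload_model': {'export_triton_ensemble'},
}


def execution_order(steps):
    """Return one valid execution order for the step graph."""
    return list(TopologicalSorter(steps).static_order())


if __name__ == '__main__':
    for position, step in enumerate(execution_order(PIPELINE_STEPS), start=1):
        print(f'{position}. {step}')
```

Because the export step depends on both the transformed dataset and the trained model, only one order is valid for this graph.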

<img src="images/merlin-vertex-e2e.png" alt="Pipeline" style="width:50%;"/>

## Setup

In [2]:
%load_ext autoreload
%autoreload 2

In [3]:
import os
import json
from datetime import datetime
from google.cloud import aiplatform as vertex_ai
from kfp.v2 import compiler

In [5]:
# Project definitions
PROJECT_ID = 'jk-mlops-dev' # Change to your project Id.
REGION = 'us-central1' # Change to your region.

# Bucket definitions
BUCKET = 'jk-staging-us-central1' # Change to your bucket. All the files will be stored here.
MODEL_NAME = 'deepfm'
MODEL_VERSION = 'v-01'
MODEL_DISPLAY_NAME = f'criteo-merlin-recommender-{MODEL_VERSION}'
WORKSPACE = f'gs://{BUCKET}/{MODEL_DISPLAY_NAME}'
TRAINING_PIPELINE_NAME = 'merlin-training-pipeline'

# Docker definitions for data preprocessing
NVT_IMAGE_NAME = 'nvt_preprocessing'
NVT_IMAGE_URI = f'gcr.io/{PROJECT_ID}/{NVT_IMAGE_NAME}'
NVT_DOCKERNAME = 'nvtabular'

# Docker definitions for model training
HUGECTR_IMAGE_NAME = 'hugectr-training'
HUGECTR_IMAGE_URI = f'gcr.io/{PROJECT_ID}/{HUGECTR_IMAGE_NAME}'
HUGECTR_DOCKERNAME = 'hugectr'

# Docker definitions for model serving
TRITON_IMAGE_NAME = 'triton-serving'
TRITON_IMAGE_URI = f'gcr.io/{PROJECT_ID}/{TRITON_IMAGE_NAME}'
TRITON_DOCKERNAME = 'triton'
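
The three image blocks above follow the same `gcr.io/<project>/<image name>` pattern, so they are easy to get wrong by copy and paste. As a minimal sketch, the URIs could instead be derived from a single mapping. The helper names `image_uri` and `build_image_uris` are hypothetical and are not part of the repository.

```python
# Docker name -> image name, as defined in the setup cell.
IMAGE_NAMES = {
    'nvtabular': 'nvt_preprocessing',
    'hugectr': 'hugectr-training',
    'triton': 'triton-serving',
}


def image_uri(project_id, image_name, registry='gcr.io'):
    """Build a Container Registry URI for one image."""
    return f'{registry}/{project_id}/{image_name}'


def build_image_uris(project_id):
    """Return a docker-name -> image-URI mapping for all pipeline images."""
    return {
        dockername: image_uri(project_id, name)
        for dockername, name in IMAGE_NAMES.items()
    }


if __name__ == '__main__':
    for dockername, uri in build_image_uris('my-project').items():
        print(f'{dockername}: {uri}')
```

With one mapping, each image gets a distinct URI by construction, which is simple to assert before starting any builds.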

## 2. Set Pipeline Configurations

In [6]:
os.environ['PROJECT_ID'] = PROJECT_ID
os.environ['REGION'] = REGION
os.environ['BUCKET'] = BUCKET
os.environ['WORKSPACE'] = WORKSPACE

os.environ['TRAINING_PIPELINE_NAME'] = TRAINING_PIPELINE_NAME
os.environ['MODEL_NAME'] = MODEL_NAME
os.environ['MODEL_VERSION'] = MODEL_VERSION
os.environ['MODEL_DISPLAY_NAME'] = MODEL_DISPLAY_NAME

os.environ['MEMORY_LIMIT'] = '170G'
os.environ['CPU_LIMIT'] = '24'
os.environ['GPU_LIMIT'] = '2'
os.environ['GPU_TYPE'] = 'NVIDIA_TESLA_A100'

os.environ['MACHINE_TYPE'] = 'a2-highgpu-4g'
os.environ['ACCELERATOR_TYPE'] = 'NVIDIA_TESLA_A100'
os.environ['ACCELERATOR_NUM'] = '4'
os.environ['NUM_WORKERS'] = '4'

os.environ['NUM_SLOTS'] = '26'
os.environ['MAX_NNZ'] = '2'
os.environ['EMBEDDING_VECTOR_SIZE'] = '11'
os.environ['MAX_BATCH_SIZE'] = '64'
os.environ['MODEL_REPOSITORY_PATH'] = '/models'

os.environ['NVT_IMAGE_URI'] = NVT_IMAGE_URI
os.environ['HUGECTR_IMAGE_URI'] = HUGECTR_IMAGE_URI
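# The pipeline serves with the merlin-on-gcp inference image set below.
# To use the image built in section 3 instead, uncomment the next line and remove the override.
# os.environ['TRITON_IMAGE_URI'] = TRITON_IMAGE_URI

os.environ['TRITON_IMAGE_URI'] = 'gcr.io/merlin-on-gcp/merlin-inference:22.02'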
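
Environment variables are always strings, so whichever module consumes them has to convert numeric values. The sketch below shows one way a configuration module could read a few of these variables and check that the accelerator count matches the machine type. It is an assumption about how such a module might look, not the contents of `src/pipelines/config.py`. The function names are hypothetical, and the check relies on the A2 naming scheme, where `a2-highgpu-4g` denotes a machine with 4 GPUs.

```python
import os
import re


def load_settings(env=None):
    """Read training hardware settings from environment variables."""
    env = os.environ if env is None else env
    return {
        'machine_type': env.get('MACHINE_TYPE', 'a2-highgpu-4g'),
        'accelerator_type': env.get('ACCELERATOR_TYPE', 'NVIDIA_TESLA_A100'),
        'accelerator_num': int(env.get('ACCELERATOR_NUM', '4')),
        'num_workers': int(env.get('NUM_WORKERS', '4')),
    }


def a2_gpu_count(machine_type):
    """Return the GPU count encoded in an A2 machine type name, else None."""
    match = re.fullmatch(r'a2-\w+-(\d+)g', machine_type)
    return int(match.group(1)) if match else None


def accelerators_match(settings):
    """True if ACCELERATOR_NUM agrees with the A2 machine type (or is unchecked)."""
    expected = a2_gpu_count(settings['machine_type'])
    return expected is None or expected == settings['accelerator_num']


if __name__ == '__main__':
    settings = load_settings()
    print(settings, 'consistent:', accelerators_match(settings))
```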

The following cell lists the configuration values in `config.py`:

In [7]:
from src.pipelines import config
import importlib
importlib.reload(config)

# Print only the upper-case module attributes, which hold the configuration values.
for key, value in config.__dict__.items():
    if key.isupper():
        print(f'{key}: {value}')

PROJECT_ID: jk-mlops-dev
REGION: us-central1
BUCKET: jk-staging-us-central1
VERTEX_SA: vertex-sa@jk-mlops-dev.iam.gserviceaccount.com
VERSION: v01
MODEL_NAME: deepfm
MODEL_DISPLAY_NAME: criteo-merlin-recommender-v-01
WORKSPACE: gs://jk-staging-us-central1/criteo-merlin-recommender-v-01
STAGING_LOCATION: gs://jk-staging-us-central1/criteo-merlin-recommender-v-01/staging
PREPROCESS_CSV_PIPELINE_NAME: nvt-csv-pipeline
PREPROCESS_CSV_PIPELINE_ROOT: gs://jk-staging-us-central1/criteo-merlin-recommender-v-01/nvt-csv-pipeline
TRAINING_PIPELINE_NAME: merlin-training-pipeline
TRAINING_PIPELINE_ROOT: gs://jk-staging-us-central1/criteo-merlin-recommender-v-01/merlin-training-pipeline
NVT_IMAGE_URI: gcr.io/jk-mlops-dev/nvt_preprocessing
HUGECTR_IMAGE_URI: gcr.io/jk-mlops-dev/hugectr-training
TRITON_IMAGE_URI: gcr.io/merlin-on-gcp/merlin-inference:22.02
INSTANCE_TYPE: n1-highmem-64
MACHINE_TYPE: a2-highgpu-4g
REPLICA_COUNT: 1
ACCELERATOR_TYPE: NVIDIA_TESLA_A100
ACCELERATOR_NUM: 4
NUM_WORKERS: 4
MEM

## 3. Build Pipeline Container Images

The following three commands build the NVTabular preprocessing, HugeCTR training, and Triton serving container images using Cloud Build, and store the container images in Container Registry.
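
The three build cells below differ only in the Docker name, the image URI, and the timeout. As a rough sketch, the same command line could be assembled in Python. The function name `cloud_build_command` is hypothetical. The flags and substitution keys are the ones used in the cells below, and nothing is executed here.

```python
import shlex


def cloud_build_command(dockername, image_uri, file_location='./src',
                        timeout='2h', machine_type='e2-highcpu-8'):
    """Assemble the `gcloud builds submit` argument list for one image."""
    substitutions = ','.join([
        f'_DOCKERNAME={dockername}',
        f'_IMAGE_URI={image_uri}',
        f'_FILE_LOCATION={file_location}',
    ])
    return [
        'gcloud', 'builds', 'submit',
        '--config', 'src/cloudbuild.yaml',
        '--substitutions', substitutions,
        f'--timeout={timeout}',
        f'--machine-type={machine_type}',
    ]


if __name__ == '__main__':
    # Placeholder project; replace with your own image URI.
    command = cloud_build_command('nvtabular', 'gcr.io/my-project/nvt_preprocessing')
    print(shlex.join(command))
```

The returned list can be passed to `subprocess.run(command, check=True)` when running outside a notebook.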

### Build NVTabular preprocessing container image

In [8]:
FILE_LOCATION = './src'
! gcloud builds submit --config src/cloudbuild.yaml --substitutions _DOCKERNAME=$NVT_DOCKERNAME,_IMAGE_URI=$NVT_IMAGE_URI,_FILE_LOCATION=$FILE_LOCATION --timeout=2h --machine-type=e2-highcpu-8

Creating temporary tarball archive of 51 file(s) totalling 5.1 MiB before compression.
Some files were not included in the source upload.

Check the gcloud log [/home/jupyter/.config/gcloud/logs/2022.02.28/17.04.15.882309.log] to see which files and the contents of the
default gcloudignore file used (see `$ gcloud topic gcloudignore` to learn
more).

Uploading tarball of [.] to [gs://jk-mlops-dev_cloudbuild/source/1646067855.967403-1c6f0a8b742a41a7bb25a26f46f7ea72.tgz]
Created [https://cloudbuild.googleapis.com/v1/projects/jk-mlops-dev/locations/global/builds/3096255d-4ff6-45c4-9824-6e42efc5cb2a].
Logs are available at [https://console.cloud.google.com/cloud-build/builds/3096255d-4ff6-45c4-9824-6e42efc5cb2a?project=895222332033].
----------------------------- REMOTE BUILD OUTPUT ------------------------------
starting build "3096255d-4ff6-45c4-9824-6e42efc5cb2a"

FETCHSOURCE
Fetching storage object: gs://jk-mlops-dev_cloudbuild/source/1646067855.967403-1c6f0a8b742a41a7bb25a26f46f7ea72.

### Build HugeCTR training container image

In [9]:
FILE_LOCATION = './src'
! gcloud builds submit --config src/cloudbuild.yaml --substitutions _DOCKERNAME=$HUGECTR_DOCKERNAME,_IMAGE_URI=$HUGECTR_IMAGE_URI,_FILE_LOCATION=$FILE_LOCATION --timeout=2h --machine-type=e2-highcpu-8

Creating temporary tarball archive of 51 file(s) totalling 5.1 MiB before compression.
Some files were not included in the source upload.

Check the gcloud log [/home/jupyter/.config/gcloud/logs/2022.02.28/17.09.23.277693.log] to see which files and the contents of the
default gcloudignore file used (see `$ gcloud topic gcloudignore` to learn
more).

Uploading tarball of [.] to [gs://jk-mlops-dev_cloudbuild/source/1646068163.359981-490ae70726b14e0cb0b5b50b5b55c9c5.tgz]
Created [https://cloudbuild.googleapis.com/v1/projects/jk-mlops-dev/locations/global/builds/817c6844-d86f-4700-a514-eedad4592417].
Logs are available at [https://console.cloud.google.com/cloud-build/builds/817c6844-d86f-4700-a514-eedad4592417?project=895222332033].
----------------------------- REMOTE BUILD OUTPUT ------------------------------
starting build "817c6844-d86f-4700-a514-eedad4592417"

FETCHSOURCE
Fetching storage object: gs://jk-mlops-dev_cloudbuild/source/1646068163.359981-490ae70726b14e0cb0b5b50b5b55c9c5.

### Build Triton serving container image

In [None]:
FILE_LOCATION = './src'
! gcloud builds submit --config src/cloudbuild.yaml --substitutions _DOCKERNAME=$TRITON_DOCKERNAME,_IMAGE_URI=$TRITON_IMAGE_URI,_FILE_LOCATION=$FILE_LOCATION --timeout=24h --machine-type=e2-highcpu-8

Creating temporary tarball archive of 51 file(s) totalling 5.2 MiB before compression.
Some files were not included in the source upload.

Check the gcloud log [/home/jupyter/.config/gcloud/logs/2022.02.28/17.14.19.406730.log] to see which files and the contents of the
default gcloudignore file used (see `$ gcloud topic gcloudignore` to learn
more).

Uploading tarball of [.] to [gs://jk-mlops-dev_cloudbuild/source/1646068459.48997-c2e3e9b621df4b4b8fc72ee5f4e2a057.tgz]
Created [https://cloudbuild.googleapis.com/v1/projects/jk-mlops-dev/locations/global/builds/3ff947b7-1b52-4124-a4ea-78420a33a1a8].
Logs are available at [https://console.cloud.google.com/cloud-build/builds/3ff947b7-1b52-4124-a4ea-78420a33a1a8?project=895222332033].
----------------------------- REMOTE BUILD OUTPUT ------------------------------
starting build "3ff947b7-1b52-4124-a4ea-78420a33a1a8"

FETCHSOURCE
Fetching storage object: gs://jk-mlops-dev_cloudbuild/source/1646068459.48997-c2e3e9b621df4b4b8fc72ee5f4e2a057.tg

## 4. Configure Pipeline Parameters

In [None]:
# List of path(s) to criteo file(s) or folder(s) in GCS.
# Training files
TRAIN_PATHS = ['gs://renatoleite-criteo-full'] # Training CSV file to be preprocessed.
# Validation files
VALID_PATHS = ['gs://renatoleite-criteo-full/day_0'] # Validation CSV file to be preprocessed.

sep = '\t' # Separator for the CSV file.
num_output_files_train = 24 # Number of Parquet files written for the training data.
num_output_files_valid = 1 # Number of Parquet files written for the validation data.

In [None]:
# Training parameters
NUM_EPOCHS = 0
MAX_ITERATIONS = 50000
EVAL_INTERVAL = 1000
EVAL_BATCHES = 500
EVAL_BATCHES_FINAL = 2500
DISPLAY_INTERVAL = 200
SNAPSHOT_INTERVAL = 0
PER_GPU_BATCHSIZE = 2048
LR = 0.001
DROPOUT_RATE = 0.5

In [None]:
parameter_values = {
    'train_paths': TRAIN_PATHS,
    'valid_paths': VALID_PATHS,
    'shuffle': json.dumps(None), # select PER_PARTITION, PER_WORKER, FULL, or None.
    'sep': sep,
    'num_output_files_train': num_output_files_train,
    'num_output_files_valid': num_output_files_valid,
    'per_gpu_batch_size': PER_GPU_BATCHSIZE,
    'max_iter': MAX_ITERATIONS,
    'max_eval_batches': EVAL_BATCHES,
    'eval_batches': EVAL_BATCHES_FINAL,
    'dropout_rate': DROPOUT_RATE,
    'lr': LR,
    'num_epochs': NUM_EPOCHS,
    'eval_interval': EVAL_INTERVAL,
    'snapshot': SNAPSHOT_INTERVAL,
    'display_interval': DISPLAY_INTERVAL
}
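
To get a feel for the scale these values imply, the following sketch works through the arithmetic. It assumes that the global batch size is the per-GPU batch size multiplied by the number of GPUs, that training uses the 4 GPUs set in `ACCELERATOR_NUM`, and that with `NUM_EPOCHS = 0` the run length is governed by `MAX_ITERATIONS`. The training code is not shown in this notebook, so treat these as assumptions.

```python
PER_GPU_BATCHSIZE = 2048
MAX_ITERATIONS = 50000
EVAL_INTERVAL = 1000
NUM_GPUS = 4  # Assumption: matches ACCELERATOR_NUM from the configuration cell.


def training_budget(per_gpu_batch_size, num_gpus, max_iterations, eval_interval):
    """Estimate batch size, samples processed, and number of evaluations."""
    global_batch_size = per_gpu_batch_size * num_gpus
    return {
        'global_batch_size': global_batch_size,
        'training_samples': global_batch_size * max_iterations,
        'evaluations': max_iterations // eval_interval,
    }


if __name__ == '__main__':
    budget = training_budget(PER_GPU_BATCHSIZE, NUM_GPUS, MAX_ITERATIONS, EVAL_INTERVAL)
    for name, value in budget.items():
        print(f'{name}: {value:,}')
```

Under these assumptions the run processes 8,192 samples per iteration, about 409.6 million samples in total, and evaluates 50 times.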

## 5. Compile KFP Pipeline

In [None]:
from src.pipelines import training_pipelines

compiled_pipeline_path = 'merlin_training_pipeline.json'
compiler.Compiler().compile(
    pipeline_func=training_pipelines.training_pipeline,
    package_path=compiled_pipeline_path
)
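
The compiler writes the pipeline definition to a JSON file, so a quick way to confirm that compilation produced something sensible is to load the file and look at its top-level structure. The helper below is a hypothetical convenience, not part of the repository, and it makes no assumption about which keys the compiler emits.

```python
import json
from pathlib import Path


def summarize_compiled_pipeline(path):
    """Return the file size and top-level keys of a compiled pipeline JSON file."""
    path = Path(path)
    spec = json.loads(path.read_text())
    return {
        'size_bytes': path.stat().st_size,
        'top_level_keys': sorted(spec),
    }


if __name__ == '__main__':
    compiled = Path('merlin_training_pipeline.json')
    if compiled.exists():
        print(summarize_compiled_pipeline(compiled))
    else:
        print(f'{compiled} not found; run the compile cell first.')
```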

## 6. Submit Pipeline to Vertex AI

In [None]:
job_name = f'merlin_training_{datetime.now().strftime("%Y%m%d%H%M%S")}'

pipeline_job = vertex_ai.PipelineJob(
    display_name=job_name,
    template_path=compiled_pipeline_path,
    enable_caching=False,
    parameter_values=parameter_values,
)

pipeline_job.submit()
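
The cell above relies on the default project and region of the environment and returns as soon as the job is submitted. A possible variation, sketched below, sets the project and region explicitly, optionally runs the pipeline under a service account, and blocks until the run finishes. The function names are hypothetical. Whether a service account or further arguments are needed depends on your project setup, which is an assumption here.

```python
from datetime import datetime


def make_job_name(now=None, prefix='merlin_training'):
    """Build a timestamped display name, as in the cell above."""
    now = now or datetime.now()
    return f'{prefix}_{now.strftime("%Y%m%d%H%M%S")}'


def submit_and_wait(project_id, region, template_path, parameter_values,
                    service_account=None):
    """Submit the compiled pipeline and block until the run finishes."""
    # Imported lazily so make_job_name works without the SDK installed.
    from google.cloud import aiplatform as vertex_ai

    vertex_ai.init(project=project_id, location=region)
    job = vertex_ai.PipelineJob(
        display_name=make_job_name(),
        template_path=template_path,
        parameter_values=parameter_values,
        enable_caching=False,
    )
    job.submit(service_account=service_account)
    job.wait()  # Blocks until the pipeline run completes.
    return job.state


if __name__ == '__main__':
    print(make_job_name())
```

In the notebook this could be called as `submit_and_wait(PROJECT_ID, REGION, compiled_pipeline_path, parameter_values)`.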