# End-to-end Recommender System with NVIDIA Merlin and Vertex AI

This notebook shows how to deploy and execute an end-to-end recommender system on Vertex Pipelines using NVIDIA Merlin.
The notebook covers the following:

1. Training pipeline overview.
2. Set pipeline configurations.
3. Build pipeline container images.
4. Configure pipeline parameters.
5. Compile KFP pipeline.
6. Submit pipeline to Vertex AI.


## 1. Training Pipeline Overview

The following diagram shows the end-to-end pipeline for preprocessing, training, and serving an `NVIDIA Merlin` recommender system on `Vertex AI`.
The pipeline is defined in the [src/training_pipelines.py](src/training_pipelines.py) module.

The `training_bq` pipeline function reads the Criteo data from `BigQuery` and performs the following steps:

1. Preprocess the data using `NVTabular`, as described in the [01-dataset-preprocessing.ipynb](01-dataset-preprocessing.ipynb) notebook:
    1. Export the `BigQuery` data to `Cloud Storage` as Parquet files.
    2. Transform the data using an `NVTabular` workflow.
    3. Write the transformed data as parquet files and the workflow object to `Cloud Storage`.
2. Train a DeepFM model using `HugeCTR`. This step submits a [Custom Training Job](https://cloud.google.com/vertex-ai/docs/training/create-custom-job) to `Vertex AI` Training, as described in [02-model-training-hugectr.ipynb](02-model-training-hugectr.ipynb).
3. Export the model as a `Triton` ensemble to be served using `Triton` server. The ensemble consists of the `NVTabular` preprocessing workflow and a `HugeCTR` model.
4. Upload the exported `Triton` ensemble model to `Vertex AI` model resources.

Once the model is uploaded to `Vertex AI`, along with a reference to its serving `Triton` container, it can be deployed to `Vertex AI` Prediction, as described in [03-model-inference-hugectr.ipynb](03-model-inference-hugectr.ipynb).

All the pipeline components are defined in the [src/pipelines/components.py](src/pipelines/components.py) module.

<img src="images/merlin-vertex-e2e.png" alt="Pipeline" style="height: 50%; width:50%;"/>
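
The step ordering described in this section can be sketched as a small dependency graph. This is only an illustration: the step labels below are hypothetical and are not the component names defined in `src/pipelines/components.py`.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical step labels (assumption). Each key maps to the steps it depends on.
PIPELINE_STEPS = {
    'export_bq_to_parquet': set(),
    'nvtabular_transform': {'export_bq_to_parquet'},
    'train_hugectr_deepfm': {'nvtabular_transform'},
    # The ensemble bundles the NVTabular workflow and the trained HugeCTR model.
    'export_triton_ensemble': {'nvtabular_transform', 'train_hugectr_deepfm'},
    'upload_to_vertex_ai': {'export_triton_ensemble'},
}

# Resolve an execution order in which every step runs after its dependencies.
execution_order = list(TopologicalSorter(PIPELINE_STEPS).static_order())
print(' -> '.join(execution_order))
```

The real ordering is determined by how the components pass outputs to each other inside the `training_bq` pipeline function.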

## Setup

In [None]:
import os
import json
from datetime import datetime
from google.cloud import aiplatform as vertex_ai
from kfp.v2 import compiler

In [None]:
PROJECT_ID = 'merlin-on-gcp' # Change to your project Id.
REGION = 'us-central1' # Change to your region.
BUCKET = 'merlin-on-gcp' # Change to your bucket.

MODEL_NAME = 'deepfm'
MODEL_VERSION = 'v01'
MODEL_DISPLAY_NAME = f'criteo-hugectr-{MODEL_NAME}-{MODEL_VERSION}'
WORKSPACE = f'gs://{BUCKET}/{MODEL_DISPLAY_NAME}'
TRAINING_PIPELINE_NAME = 'merlin-training-pipeline'

BQ_DATASET_NAME = 'criteo_pipeline' # Set to your BigQuery dataset including the Criteo dataset.
BQ_LOCATION = 'us' # Set to your BigQuery dataset location.
BQ_TRAIN_TABLE_NAME = 'train'
BQ_VALID_TABLE_NAME = 'valid'

NVT_IMAGE_NAME = 'nvt_preprocessing'
NVT_IMAGE_URI = f'gcr.io/{PROJECT_ID}/{NVT_IMAGE_NAME}'
NVT_DOCKERNAME = 'nvtabular'

HUGECTR_IMAGE_NAME = 'hugectr_training'
HUGECTR_IMAGE_URI = f'gcr.io/{PROJECT_ID}/{HUGECTR_IMAGE_NAME}'
HUGECTR_DOCKERNAME = 'hugectr'
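
Several of the values above are derived from others, so it can help to see what they resolve to before building images or writing to Cloud Storage. The sketch below repeats the same naming scheme in a helper function; `derive_names` is a hypothetical name introduced only for this illustration.

```python
def derive_names(project_id, bucket, model_name, model_version):
    """Rebuild the derived names using the same f-string patterns as the setup cell."""
    display_name = f'criteo-hugectr-{model_name}-{model_version}'
    return {
        'MODEL_DISPLAY_NAME': display_name,
        'WORKSPACE': f'gs://{bucket}/{display_name}',
        'NVT_IMAGE_URI': f'gcr.io/{project_id}/nvt_preprocessing',
        'HUGECTR_IMAGE_URI': f'gcr.io/{project_id}/hugectr_training',
    }

# Values copied from the setup cell; replace them with your own project and bucket.
names = derive_names('merlin-on-gcp', 'merlin-on-gcp', 'deepfm', 'v01')
for key, value in names.items():
    print(f'{key} = {value}')
```

With the defaults shown, `WORKSPACE` resolves to `gs://merlin-on-gcp/criteo-hugectr-deepfm-v01`.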

## 2. Set Pipeline Configurations

In [None]:
os.environ['PROJECT_ID'] = PROJECT_ID
os.environ['REGION'] = REGION
os.environ['BUCKET'] = BUCKET
os.environ['WORKSPACE'] = WORKSPACE

os.environ['BQ_DATASET_NAME'] = BQ_DATASET_NAME
os.environ['BQ_LOCATION'] = BQ_LOCATION
os.environ['BQ_TRAIN_TABLE_NAME'] = BQ_TRAIN_TABLE_NAME
os.environ['BQ_VALID_TABLE_NAME'] = BQ_VALID_TABLE_NAME

os.environ['TRAINING_PIPELINE_NAME'] = TRAINING_PIPELINE_NAME
os.environ['MODEL_NAME'] = MODEL_NAME
os.environ['MODEL_VERSION'] = MODEL_VERSION
os.environ['MODEL_DISPLAY_NAME'] = MODEL_DISPLAY_NAME

os.environ['NVT_IMAGE_URI'] = NVT_IMAGE_URI
os.environ['HUGECTR_IMAGE_URI'] = HUGECTR_IMAGE_URI

os.environ['MEMORY_LIMIT'] = '120G'
os.environ['CPU_LIMIT'] = '32'
os.environ['GPU_LIMIT'] = '4'
os.environ['GPU_TYPE'] = 'nvidia-tesla-t4'

os.environ['MACHINE_TYPE'] = 'a2-highgpu-4g'
os.environ['ACCELERATOR_TYPE'] = 'NVIDIA_TESLA_A100'
os.environ['ACCELERATOR_NUM'] = '4'
os.environ['NUM_WORKERS'] = '12'

os.environ['NUM_SLOTS'] = '26'
os.environ['MAX_NNZ'] = '2'
os.environ['EMBEDDING_VECTOR_SIZE'] = '11'
os.environ['MAX_BATCH_SIZE'] = '64'
os.environ['MODEL_REPOSITORY_PATH'] = '/models'
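
Because the pipeline code presumably reads these settings back from the environment, a mistyped or missing key may only surface later as a failure at compile time or run time. The following is a minimal sketch of a check that can run right after the cell above. The `REQUIRED_VARS` list is an assumption chosen for illustration; the variables actually read by the pipeline modules are not shown in this notebook.

```python
import os

# Hypothetical list of variables the pipeline code is assumed to read.
REQUIRED_VARS = [
    'PROJECT_ID', 'REGION', 'BUCKET', 'WORKSPACE',
    'NVT_IMAGE_URI', 'HUGECTR_IMAGE_URI',
]

def find_missing_vars(required, environ=os.environ):
    """Return the names in `required` that are unset or empty in `environ`."""
    return [name for name in required if not environ.get(name)]

missing = find_missing_vars(REQUIRED_VARS)
if missing:
    print(f'Missing environment variables: {missing}')
else:
    print('All required environment variables are set.')
```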

## 3. Build Pipeline Container Images

In [None]:
FILE_LOCATION = './src'
! gcloud builds submit --config src/cloudbuild.yaml --substitutions _DOCKERNAME=$NVT_DOCKERNAME,_IMAGE_URI=$NVT_IMAGE_URI,_FILE_LOCATION=$FILE_LOCATION

In [None]:
FILE_LOCATION = './src'
! gcloud builds submit --config src/cloudbuild.yaml --substitutions _DOCKERNAME=$HUGECTR_DOCKERNAME,_IMAGE_URI=$HUGECTR_IMAGE_URI,_FILE_LOCATION=$FILE_LOCATION
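
The two build cells differ only in the Dockerfile name and the image URI. As a sketch, the command line could be assembled by one helper so the substitutions stay consistent. `build_submit_command` is a hypothetical helper, and the flags it emits are the same ones used in the cells above.

```python
import shlex

def build_submit_command(dockername, image_uri,
                         file_location='./src', config='src/cloudbuild.yaml'):
    """Assemble the `gcloud builds submit` command line used in the cells above."""
    substitutions = ','.join([
        f'_DOCKERNAME={dockername}',
        f'_IMAGE_URI={image_uri}',
        f'_FILE_LOCATION={file_location}',
    ])
    args = ['gcloud', 'builds', 'submit',
            '--config', config,
            '--substitutions', substitutions]
    # Quote each argument so the string is safe to paste into a shell.
    return ' '.join(shlex.quote(arg) for arg in args)

nvt_command = build_submit_command('nvtabular', 'gcr.io/merlin-on-gcp/nvt_preprocessing')
hugectr_command = build_submit_command('hugectr', 'gcr.io/merlin-on-gcp/hugectr_training')
print(nvt_command)
print(hugectr_command)
```

The resulting strings can be run from a notebook cell with `! {nvt_command}` or passed to `subprocess.run` after splitting with `shlex.split`.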

## 4. Configure Pipeline Parameters

In [None]:
NUM_EPOCHS = 0
MAX_ITERATIONS = 50000
EVAL_INTERVAL = 1000
EVAL_BATCHES = 500
EVAL_BATCHES_FINAL = 2500
DISPLAY_INTERVAL = 200
SNAPSHOT_INTERVAL = 0
PER_GPU_BATCHSIZE = 2048
LR = 0.001
DROPOUT_RATE = 0.5

In [None]:
parameter_values = {
    'shuffle': json.dumps(None), # select PER_PARTITION, PER_WORKER, FULL, or None.
    'per_gpu_batch_size': PER_GPU_BATCHSIZE,
    'max_iter': MAX_ITERATIONS,
    'max_eval_batches': EVAL_BATCHES,
    'eval_batches': EVAL_BATCHES_FINAL,
    'dropout_rate': DROPOUT_RATE,
    'lr': LR,
    'num_epochs': NUM_EPOCHS,
    'eval_interval': EVAL_INTERVAL,
    'snapshot': SNAPSHOT_INTERVAL,
    'display_interval': DISPLAY_INTERVAL
}
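
The `shuffle` parameter is passed as a JSON-encoded string, which is why `None` is wrapped in `json.dumps`. The sketch below shows what that encoding produces and rejects values outside the options listed in the comment above. It assumes the receiving component decodes the value with `json.loads`, which is not shown in this notebook; `encode_shuffle` is a hypothetical helper.

```python
import json

# Options listed in the comment on the `shuffle` parameter.
ALLOWED_SHUFFLE = {'PER_PARTITION', 'PER_WORKER', 'FULL', None}

def encode_shuffle(option):
    """JSON-encode a shuffle option, mirroring json.dumps(None) in the cell above."""
    if option not in ALLOWED_SHUFFLE:
        raise ValueError(f'Unsupported shuffle option: {option!r}')
    return json.dumps(option)

print(encode_shuffle(None))    # the JSON literal null
print(encode_shuffle('FULL'))  # a quoted JSON string
```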

## 5. Compile KFP Pipeline

In [None]:
from src.pipelines.training_pipelines import training_bq

compiled_pipeline_path = 'merlin_training_bq.json'
compiler.Compiler().compile(
    pipeline_func=training_bq,
    package_path=compiled_pipeline_path,
)
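
Before submitting, it can be useful to confirm that the compiler actually wrote the pipeline definition. The following sketch loads the compiled JSON file and reports its size and top-level keys. `describe_compiled_pipeline` is a hypothetical helper, and the exact keys depend on the installed KFP version, so none are assumed here.

```python
import json
import os

def describe_compiled_pipeline(path):
    """Return (size in bytes, sorted top-level keys), or None if the file is absent."""
    if not os.path.exists(path):
        return None
    with open(path) as f:
        spec = json.load(f)
    return os.path.getsize(path), sorted(spec)

# Same file name as `compiled_pipeline_path` in the cell above.
print(describe_compiled_pipeline('merlin_training_bq.json'))
```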

## 6. Submit Pipeline to Vertex AI

In [None]:
job_name = f'merlin_training_bq_{datetime.now().strftime("%Y%m%d%H%M%S")}'

pipeline_job = vertex_ai.PipelineJob(
    display_name=job_name,
    template_path=compiled_pipeline_path,
    enable_caching=False,
    parameter_values=parameter_values,
)

pipeline_job.submit()