<img src="http://developer.download.nvidia.com/compute/machine-learning/frameworks/nvidia_logo.png" style="width: 90px; float: right;">

# HugeCTR Inference

## Overview

HugeCTR version 3.0 added inference support and exposed it through the Python interface. This notebook explains how to run inference with trained HugeCTR models in Python. For more details on the Python API, refer to [HugeCTR Python Interface](../docs/python_interface.md).

## Table of Contents
1. [Access the HugeCTR Python Interface](#1)
1. [DCN Inference Demo](#2)
1. [API Signatures](#3)

<a id="1"></a>
## 1. Access the HugeCTR Python Interface

1. Make sure that you started the notebook inside a running NGC Docker container: `nvcr.io/nvidia/merlin/merlin-training:0.5`.

   The `hugectr.so` shared library is installed in the system path `/usr/local/hugectr/lib/`. This path is also added to the `PYTHONPATH` environment variable, so the Python interface is available inside the Docker container. Check that the library is present with the following command:

In [1]:
!ls /usr/local/hugectr/lib

hugectr.so  libgmock_main.a  libgtest_main.a	    libhuge_ctr_static.a
libgmock.a  libgtest.a	     libhuge_ctr_shared.so


2. Import HugeCTR so that you can train your model and run inference from Python:

In [2]:
import hugectr
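
If the import fails, it can help to confirm whether Python can locate the module at all. The following is a minimal sketch that uses only the standard library; the helper name `module_available` is hypothetical and is not part of HugeCTR.

```python
import importlib.util


def module_available(name: str) -> bool:
    """Return True if Python can locate a top-level module called `name`."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    # Inside the container this is expected to report True, because the
    # HugeCTR library directory is on PYTHONPATH.
    print("hugectr importable:", module_available("hugectr"))
```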

<a id="2"></a>
## 2. DCN Inference Demo

### 2.1 Download and Preprocess Data
1. Download the Kaggle Criteo dataset using the following command:
   ```shell
   $ cd ${project_root}/tools
   $ wget http://azuremlsampleexperiments.blob.core.windows.net/criteo/day_1.gz
   ```

   During preprocessing, we further reduce the amount of data to speed up the preprocessing, fill in missing values, and remove feature values that occur very rarely. Here we use the pandas preprocessing method to prepare the dataset for HugeCTR training.

2. Preprocess the data with pandas using the following command:
   ```shell
   $ bash preprocess.sh 1 dcn_data pandas 1 0
   ```
   
   The first argument is the dataset postfix; it is `1` here because `day_1` is used. The second argument, `dcn_data`, is the directory where the preprocessed data is stored. The third argument, `pandas`, selects the preprocessing method. The fourth argument, `1`, enables normalization of the dense features. The last argument, `0`, disables feature crossing.
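
The positional arguments described above can be summarized with a small sketch. It only labels the five values passed to `preprocess.sh` in this notebook; the function name and the dictionary keys are illustrative assumptions, not names used by the script itself.

```python
from typing import Dict, List, Union


def describe_preprocess_args(args: List[str]) -> Dict[str, Union[str, bool]]:
    """Label the five positional arguments passed to preprocess.sh above."""
    if len(args) != 5:
        raise ValueError("expected 5 positional arguments, got {}".format(len(args)))
    postfix, output_dir, method, normalize, cross = args
    return {
        "dataset_postfix": postfix,           # "1" selects day_1
        "output_dir": output_dir,             # where preprocessed data is stored
        "method": method,                     # "pandas" in this notebook
        "normalize_dense": normalize == "1",  # 1 = normalize dense features
        "feature_cross": cross == "1",        # 0 = no feature crossing
    }


if __name__ == "__main__":
    print(describe_preprocess_args("1 dcn_data pandas 1 0".split()))
```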

### 2.2 DCN Model Training

We can train from scratch and store the trained dense model and embedding table in model files as follows:

1. Create a JSON file for the DCN model.
   **NOTE**: The solver clause no longer needs to be added to the JSON file when using the Python interface. Instead, configure these parameters with `hugectr.solver_parser_helper()` directly in Python.

In [3]:
%%writefile dcn_train.json
{
  "optimizer": {
    "type": "Adam",
    "update_type": "Global",
    "adam_hparam": {
      "learning_rate": 0.001,
      "beta1": 0.9,
      "beta2": 0.999,
      "epsilon": 0.0000001
    }
  },
  "layers": [
    {
      "name": "data",
      "type": "Data",
      "source": "./dcn_data/file_list.txt",
      "eval_source": "./dcn_data/file_list_test.txt",
      "check": "Sum",
      "label": {
        "top": "label",
        "label_dim": 1
      },
      "dense": {
        "top": "dense",
        "dense_dim": 13
      },
      "sparse": [
        {
          "top": "data1",
          "type": "DistributedSlot",
          "max_feature_num_per_sample": 30,
          "slot_num": 26
        }
      ]
    },
    {
      "name": "sparse_embedding1",
      "type": "DistributedSlotSparseEmbeddingHash",
      "bottom": "data1",
      "top": "sparse_embedding1",
      "sparse_embedding_hparam": {
        "max_vocabulary_size_per_gpu": 1447751,
        "embedding_vec_size": 16,
        "combiner": 0
      }
    },
    {
      "name": "reshape1",
      "type": "Reshape",
      "bottom": "sparse_embedding1",
      "top": "reshape1",
      "leading_dim": 416
    },
    {
      "name": "concat1",
      "type": "Concat",
      "bottom": [
        "reshape1",
        "dense"
      ],
      "top": "concat1"
    },
    {
      "name": "slice1",
      "type": "Slice",
      "bottom": "concat1",
      "ranges": [
        [
          0,
          429
        ],
        [
          0,
          429
        ]
      ],
      "top": [
        "slice11",
        "slice12"
      ]
    },
    {
      "name": "multicross1",
      "type": "MultiCross",
      "bottom": "slice11",
      "top": "multicross1",
      "mc_param": {
        "num_layers": 6
      }
    },
    {
      "name": "fc1",
      "type": "InnerProduct",
      "bottom": "slice12",
      "top": "fc1",
      "fc_param": {
        "num_output": 1024
      }
    },
    {
      "name": "relu1",
      "type": "ReLU",
      "bottom": "fc1",
      "top": "relu1"
    },
    {
      "name": "dropout1",
      "type": "Dropout",
      "rate": 0.5,
      "bottom": "relu1",
      "top": "dropout1"
    },
    {
      "name": "fc2",
      "type": "InnerProduct",
      "bottom": "dropout1",
      "top": "fc2",
      "fc_param": {
        "num_output": 1024
      }
    },
    {
      "name": "relu2",
      "type": "ReLU",
      "bottom": "fc2",
      "top": "relu2"
    },
    {
      "name": "dropout2",
      "type": "Dropout",
      "rate": 0.5,
      "bottom": "relu2",
      "top": "dropout2"
    },
    {
      "name": "concat2",
      "type": "Concat",
      "bottom": [
        "dropout2",
        "multicross1"
      ],
      "top": "concat2"
    },
    {
      "name": "fc4",
      "type": "InnerProduct",
      "bottom": "concat2",
      "top": "fc4",
      "fc_param": {
        "num_output": 1
      }
    },
    {
      "name": "loss",
      "type": "BinaryCrossEntropyLoss",
      "bottom": [
        "fc4",
        "label"
      ],
      "top": "loss"
    }
  ]
}


Writing dcn_train.json
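
Several numbers in this configuration depend on each other: the `leading_dim` of the `Reshape` layer (416) equals `slot_num` (26) times `embedding_vec_size` (16), and the `Slice` ranges end at 429, which is 416 plus `dense_dim` (13). The sketch below checks these relationships. It assumes the layer names used in the file above, and the helper name `check_dcn_dims` is hypothetical; the embedded dictionary keeps only the fields the check needs.

```python
def check_dcn_dims(config: dict) -> dict:
    """Verify that reshape and slice widths match the data layer settings."""
    layers = {layer["name"]: layer for layer in config["layers"]}
    data = layers["data"]
    slot_num = data["sparse"][0]["slot_num"]
    dense_dim = data["dense"]["dense_dim"]
    vec_size = layers["sparse_embedding1"]["sparse_embedding_hparam"]["embedding_vec_size"]

    leading_dim = slot_num * vec_size        # width of the flattened embeddings
    concat_width = leading_dim + dense_dim   # width after concatenating dense features

    if layers["reshape1"]["leading_dim"] != leading_dim:
        raise ValueError("reshape1.leading_dim should be {}".format(leading_dim))
    for start, end in layers["slice1"]["ranges"]:
        if (start, end) != (0, concat_width):
            raise ValueError("slice range should be [0, {}]".format(concat_width))
    return {"leading_dim": leading_dim, "concat_width": concat_width}


# Reduced copy of the relevant fields from dcn_train.json.
sample_config = {
    "layers": [
        {"name": "data", "dense": {"dense_dim": 13}, "sparse": [{"slot_num": 26}]},
        {"name": "sparse_embedding1",
         "sparse_embedding_hparam": {"embedding_vec_size": 16}},
        {"name": "reshape1", "leading_dim": 416},
        {"name": "slice1", "ranges": [[0, 429], [0, 429]]},
    ]
}
dims = check_dcn_dims(sample_config)
```

To run the same check on the full file, load it with `json.load` and pass the resulting dictionary to the function.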


2. Write the Python script for training.

In [4]:
%%writefile dcn_train.py
from hugectr import Session, solver_parser_helper
import sys
from mpi4py import MPI

def dcn_train(json_file):
  solver_config = solver_parser_helper(seed = 0,
                                     batchsize = 2048,
                                     batchsize_eval =2048,
                                     model_file = "",
                                     embedding_files = [],
                                     vvgpu = [[0]],
                                     use_mixed_precision = False,
                                     scaler = 1.0,
                                     i64_input_key = False,
                                     use_algorithm_search = True,
                                     use_cuda_graph = True,
                                     repeat_dataset = True
                                    )
  sess = Session(solver_config, json_file)
  sess.start_data_reading()
  for i in range(10000):
    sess.train()
    # report the training loss every 200 iterations
    if (i%200 == 0):
      loss = sess.get_current_loss()
      print("[HUGECTR][INFO] iter: {}; loss: {}".format(i, loss))
    # run evaluation every 1000 iterations (skipping iteration 0)
    if (i%1000 == 0 and i != 0):
      sess.check_overflow()
      sess.copy_weights_for_evaluation()
      data_reader_eval = sess.get_data_reader_eval()
      for _ in range(solver_config.max_eval_batches):
        sess.eval()
      metrics = sess.get_eval_metrics()
      print("[HUGECTR][INFO] iter: {}, {}".format(i, metrics))
  # after the loop i == 9999, so the model files are tagged with iteration 10000
  sess.download_params_to_files("./", i+1)
  return

if __name__ == "__main__":
  json_file = sys.argv[1]
  dcn_train(json_file)

Writing dcn_train.py


In [5]:
%%writefile dcn_train.sh
cd ../tools && \
python3 ../notebooks/dcn_train.py ../notebooks/dcn_train.json

Writing dcn_train.sh


In [6]:
!bash dcn_train.sh

[29d12h39m32s][HUGECTR][INFO]: Global seed is 772789275
[29d12h39m33s][HUGECTR][INFO]: Peer-to-peer access cannot be fully enabled.
Device 0: GeForce RTX 2080 Ti
[29d12h39m33s][HUGECTR][INFO]: cache_eval_data is not specified using default: 0
[29d12h39m33s][HUGECTR][INFO]: num_workers is not specified using default: 12
[29d12h39m33s][HUGECTR][INFO]: num of DataReader workers: 12
[29d12h39m33s][HUGECTR][INFO]: max_nnz is not specified using default: 30
[29d12h39m33s][HUGECTR][INFO]: num_internal_buffers 1
[29d12h39m33s][HUGECTR][INFO]: num_internal_buffers 1
[29d12h39m33s][HUGECTR][INFO]: max_vocabulary_size_per_gpu_=1447751
[29d12h39m40s][HUGECTR][INFO]: gpu0 start to init embedding
[29d12h39m40s][HUGECTR][INFO]: gpu0 init embedding done
[HUGECTR][INFO] iter: 0; loss: 0.8440738916397095
[HUGECTR][INFO] iter: 200; loss: 0.14668972790241241
[HUGECTR][INFO] iter: 400; loss: 0.14474165439605713
[HUGECTR][INFO] iter: 600; loss: 0.1524054855108261
[HUGECTR][INFO] iter: 800; loss: 0.119324915

### 2.3 DCN Model Inference

1. Create a JSON file for DCN model inference.

Check the stored model files that will be used for inference, and then create the JSON file for inference. Compared with the training JSON file, remove the solver and optimizer clauses and add an inference clause. Specify the paths of the stored dense model and sparse model(s) in `dense_model_file` and `sparse_model_file` within the inference clause. In the `data` layer, remove the `source` and `eval_source` entries and the `top` entry of `label`. Finally, change the last layer from `BinaryCrossEntropyLoss` to `Sigmoid`. The rest of `layers` should be exactly the same as in the training JSON file.

In [7]:
!ls ../tools/*.model

../tools/0_sparse_10000.model  ../tools/_dense_10000.model


In [8]:
%%writefile dcn_inference.json
{
  "inference": {
    "max_batchsize": 4096,
    "dense_model_file": "../tools/_dense_10000.model",
    "sparse_model_file": "../tools/0_sparse_10000.model"
  },
  "layers": [
    {
      "name": "data",
      "type": "Data",
      "check": "Sum",
      "label": {
        "label_dim": 1
      },
      "dense": {
        "top": "dense",
        "dense_dim": 13
      },
      "sparse": [
        {
          "top": "data1",
          "type": "DistributedSlot",
          "max_feature_num_per_sample": 30,
          "slot_num": 26
        }
      ]
    },
    {
      "name": "sparse_embedding1",
      "type": "DistributedSlotSparseEmbeddingHash",
      "bottom": "data1",
      "top": "sparse_embedding1",
      "sparse_embedding_hparam": {
        "max_vocabulary_size_per_gpu": 1447751,
        "embedding_vec_size": 16,
        "combiner": 0
      }
    },
    {
      "name": "reshape1",
      "type": "Reshape",
      "bottom": "sparse_embedding1",
      "top": "reshape1",
      "leading_dim": 416
    },
    {
      "name": "concat1",
      "type": "Concat",
      "bottom": [
        "reshape1",
        "dense"
      ],
      "top": "concat1"
    },
    {
      "name": "slice1",
      "type": "Slice",
      "bottom": "concat1",
      "ranges": [
        [
          0,
          429
        ],
        [
          0,
          429
        ]
      ],
      "top": [
        "slice11",
        "slice12"
      ]
    },
    {
      "name": "multicross1",
      "type": "MultiCross",
      "bottom": "slice11",
      "top": "multicross1",
      "mc_param": {
        "num_layers": 6
      }
    },
    {
      "name": "fc1",
      "type": "InnerProduct",
      "bottom": "slice12",
      "top": "fc1",
      "fc_param": {
        "num_output": 1024
      }
    },
    {
      "name": "relu1",
      "type": "ReLU",
      "bottom": "fc1",
      "top": "relu1"
    },
    {
      "name": "dropout1",
      "type": "Dropout",
      "rate": 0.5,
      "bottom": "relu1",
      "top": "dropout1"
    },
    {
      "name": "fc2",
      "type": "InnerProduct",
      "bottom": "dropout1",
      "top": "fc2",
      "fc_param": {
        "num_output": 1024
      }
    },
    {
      "name": "relu2",
      "type": "ReLU",
      "bottom": "fc2",
      "top": "relu2"
    },
    {
      "name": "dropout2",
      "type": "Dropout",
      "rate": 0.5,
      "bottom": "relu2",
      "top": "dropout2"
    },
    {
      "name": "concat2",
      "type": "Concat",
      "bottom": [
        "dropout2",
        "multicross1"
      ],
      "top": "concat2"
    },
    {
      "name": "fc4",
      "type": "InnerProduct",
      "bottom": "concat2",
      "top": "fc4",
      "fc_param": {
        "num_output": 1
      }
    },
    {
      "name": "sigmoid",
      "type": "Sigmoid",
      "bottom": "fc4",
      "top": "sigmoid"
    }
  ]
}


Writing dcn_inference.json
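
The differences between `dcn_train.json` and `dcn_inference.json` can also be expressed as a transformation. The following is a minimal sketch, assuming the configuration layout used in this notebook (the data layer comes first and the loss layer comes last). The function `to_inference_config` is a hypothetical helper, not part of HugeCTR.

```python
import copy


def to_inference_config(train_config, dense_model_file, sparse_model_file,
                        max_batchsize=4096):
    """Derive an inference config from a training config (notebook layout assumed)."""
    layers = copy.deepcopy(train_config["layers"])

    # Data layer: drop the dataset paths and the label output name.
    data = layers[0]
    data.pop("source", None)
    data.pop("eval_source", None)
    label_top = data["label"].pop("top", None)

    # Last layer: replace the loss with a Sigmoid on the prediction tensor.
    loss = layers[-1]
    if loss["type"] != "BinaryCrossEntropyLoss":
        raise ValueError("expected the last layer to be BinaryCrossEntropyLoss")
    prediction = [name for name in loss["bottom"] if name != label_top][0]
    layers[-1] = {"name": "sigmoid", "type": "Sigmoid",
                  "bottom": prediction, "top": "sigmoid"}

    # Only the inference clause and the layers are kept; the solver and
    # optimizer clauses are left out.
    return {
        "inference": {
            "max_batchsize": max_batchsize,
            "dense_model_file": dense_model_file,
            "sparse_model_file": sparse_model_file,
        },
        "layers": layers,
    }


# Reduced training config for illustration.
train_cfg = {
    "optimizer": {"type": "Adam"},
    "layers": [
        {"name": "data", "type": "Data", "source": "./dcn_data/file_list.txt",
         "eval_source": "./dcn_data/file_list_test.txt",
         "label": {"top": "label", "label_dim": 1}},
        {"name": "fc4", "type": "InnerProduct", "bottom": "concat2", "top": "fc4"},
        {"name": "loss", "type": "BinaryCrossEntropyLoss",
         "bottom": ["fc4", "label"], "top": "loss"},
    ],
}
infer_cfg = to_inference_config(train_cfg, "../tools/_dense_10000.model",
                                "../tools/0_sparse_10000.model")
```

The input dictionary is deep-copied, so the training configuration is left unchanged.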


2. Convert the Criteo data to the inference format.

HugeCTR inference is performed by the `predict()` method of `InferenceSession`, which takes dense features, embedding columns, and row pointers of slots as input and returns the prediction results. We first need to convert the Criteo data to this format. For details of the input format, see the [HugeCTR backend user guide](https://github.com/triton-inference-server/hugectr_backend/blob/main/docs/user_guide.md#variant-compressed-sparse-row-input).

In [9]:
!python3 ../tools/criteo_predict/criteo2predict.py --src_csv_path=../tools/dcn_data/val/test.txt --src_config=../tools/criteo_predict/dcn_data.json --dst_path=./dcn_csr.txt --batch_size=4096
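
To illustrate what "embedding columns and row pointers of slots" means, here is a minimal sketch of a compressed sparse row (CSR) layout for a single embedding table. It is not the code of `criteo2predict.py`; the function name `build_csr` and the toy keys are assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def build_csr(samples: Sequence[Sequence[Sequence[int]]]) -> Tuple[List[int], List[int]]:
    """Flatten per-slot key lists into embedding columns and row pointers.

    `samples[i][j]` holds the categorical keys of slot j in sample i.
    """
    embedding_columns: List[int] = []
    row_ptrs: List[int] = [0]
    for sample in samples:
        for slot_keys in sample:
            embedding_columns.extend(slot_keys)
            # Each slot contributes one offset: the running number of keys.
            row_ptrs.append(len(embedding_columns))
    return embedding_columns, row_ptrs


# Two samples with three slots each; a slot may hold zero, one, or several keys.
toy_samples = [
    [[10], [20, 21], []],
    [[11], [], [30]],
]
cols, ptrs = build_csr(toy_samples)
print(cols)  # [10, 20, 21, 11, 30]
print(ptrs)  # [0, 1, 3, 3, 4, 4, 5]
```

With `N` samples and `S` slots, the row pointer list has `N * S + 1` entries, and the keys of slot `k` (counted across the whole batch) are `cols[ptrs[k]:ptrs[k + 1]]`.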

3. Write the Python script for inference.

In [10]:
%%writefile dcn_inference.py
import sys
from mpi4py import MPI
from hugectr.inference import CreateParameterServer, CreateEmbeddingCache, InferenceSession

def dcn_inference(config_file, model_name, data_path):
  # read labels, dense features, embedding columns and row pointers (one per line)
  with open(data_path) as data_file:
    labels = [int(item) for item in data_file.readline().split(' ')]
    dense_features = [float(item) for item in data_file.readline().split(' ')]
    embedding_columns = [int(item) for item in data_file.readline().split(' ')]
    row_ptrs = [int(item) for item in data_file.readline().split(' ')]
  # create parameter server, embedding cache and inference session
  parameter_server = CreateParameterServer([config_file], [model_name], False)
  embedding_cache = CreateEmbeddingCache(parameter_server, 0, True, 0.2, config_file, model_name, False)
  inference_session = InferenceSession(config_file, 0, embedding_cache)
  # make prediction and calculate accuracy
  output = inference_session.predict(dense_features, embedding_columns, row_ptrs)
  accuracy = calculate_accuracy(labels, output)
  print("[HUGECTR][INFO] prediction number samples: {}, accuracy: {}".format(len(labels), accuracy))

def calculate_accuracy(labels, output):
  num_samples = len(labels)
  flags = [1 if ((labels[i] == 0 and output[i] <= 0.5) or (labels[i] == 1 and output[i] > 0.5)) else 0 for i in range(num_samples)]
  correct_samples = sum(flags)
  return float(correct_samples)/float(num_samples)
    
if __name__ == "__main__":
  config_file = sys.argv[1]
  model_name = sys.argv[2]
  data_path = sys.argv[3]
  dcn_inference(config_file, model_name, data_path)

Writing dcn_inference.py


In [11]:
!python3 dcn_inference.py dcn_inference.json DCN dcn_csr.txt

[29d12h41m36s][HUGECTR][INFO]: default_emb_vec_value is not specified using default: 0.000000
[29d12h41m37s][HUGECTR][INFO]: Global seed is 1326185260
[29d12h41m38s][HUGECTR][INFO]: Peer-to-peer access cannot be fully enabled.
[29d12h41m38s][HUGECTR][INFO]: algorithm_search is not specified using default: 1
[29d12h41m38s][HUGECTR][INFO]: cuda_graph is not specified using default: 1
[29d12h41m38s][HUGECTR][INFO]: start create embedding for inference
[29d12h41m38s][HUGECTR][INFO]: sparse_input name data1
[29d12h41m38s][HUGECTR][INFO]: create embedding for inference success
[HUGECTR][INFO] prediction number samples: 4096, accuracy: 0.96435546875


<a id="3"></a>
## 3. API Signatures

The following are the API signatures in the HugeCTR Python interface that relate to inference. As shown in the example above, they are `CreateParameterServer`, `CreateEmbeddingCache`, and `InferenceSession`.

**CreateParameterServer**
```text
CreateParameterServer(...) method of builtins.PyCapsule instance
    CreateParameterServer(model_config_path: List[str], model_name: List[str], i64_input_key: bool) -> hugectr.inference.ParameterServerBase
```

**CreateEmbeddingCache**
```text
CreateEmbeddingCache(...) method of builtins.PyCapsule instance
    CreateEmbeddingCache(parameter_server: hugectr.inference.ParameterServerBase, cuda_dev_id: int, use_gpu_embedding_cache: bool, cache_size_percentage: float, model_config_path: str, model_name: str, i64_input_key: bool) -> hugectr.inference.EmbeddingCacheInterface
```

**InferenceSession**
```text
class InferenceSession(pybind11_builtins.pybind11_object)
 |  Method resolution order:
 |      InferenceSession
 |      pybind11_builtins.pybind11_object
 |      builtins.object
 |
 |  Methods defined here:
 |
 |  __init__(...)
 |      __init__(self: hugectr.inference.InferenceSession, config_file: str, device_id: int, embedding_cache: hugectr.inference.EmbeddingCacheInterface) -> None
 |
 |  predict(...)
 |      predict(*args, **kwargs)
 |      Overloaded function.
 |
 |      1. predict(self: hugectr.inference.InferenceSession, dense_feature: List[float], embeddingcolumns: List[int], row_ptrs: List[int]) -> List[float]
 |
 |      2. predict(self: hugectr.inference.InferenceSession, dense_feature: List[float], embeddingcolumns: List[int], row_ptrs: List[int], i64_input_key: bool) -> List[float]

```