<img src="http://developer.download.nvidia.com/compute/machine-learning/frameworks/nvidia_logo.png" style="width: 90px; float: right;">

# HugeCTR Python Interface

## Overview

HugeCTR is a recommender-specific framework capable of distributed training across multiple GPUs and nodes for Click-Through-Rate (CTR) estimation. It is a component of NVIDIA [Merlin](https://developer.nvidia.com/nvidia-merlin#getstarted), a framework that accelerates the entire pipeline, from data ingestion and training to the deployment of GPU-accelerated recommender systems.

The Python interface was introduced in HugeCTR version 2.3. It supports setting the data source and model oversubscribing during training. This notebook explains how to access the HugeCTR Python interface, demonstrates how to use it, and lists its API signatures.

## Content
1. [Build HugeCTR Python Interface](#1)
1. [Wide&Deep Demo](#2)
1. [API Signatures](#3)

<a id="1"></a>
## 1. Build HugeCTR Python Interface

The HugeCTR Python interface takes the form of a dynamic library. To use it, enter the HugeCTR docker container and build HugeCTR with the following commands.
```shell
$ cd hugectr
$ mkdir -p build && cd build
$ cmake -DCMAKE_BUILD_TYPE=Release -DSM=70 .. # Target is NVIDIA V100
$ make -j
```
After building, the dynamic link library `hugectr.so` is located in the `hugectr/build/lib/` folder.

In [None]:
!ls /hugectr/build/lib
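
The `-DSM` value in the build command corresponds to the GPU's CUDA compute capability with the dot removed (7.0 for V100 gives `70`). As a small sketch, the lookup below covers a few GPUs using NVIDIA's published compute capabilities. The helper name `sm_flag` is hypothetical, and whether a given HugeCTR release accepts each value is an assumption to verify against its build documentation.

```python
# Compute capabilities (major, minor) as published by NVIDIA for a few GPUs.
COMPUTE_CAPABILITY = {
    "P100": (6, 0),
    "V100": (7, 0),
    "T4": (7, 5),
    "RTX 2080 Ti": (7, 5),
    "A100": (8, 0),
}


def sm_flag(gpu_name):
    """Return a cmake flag string for a GPU in COMPUTE_CAPABILITY (hypothetical helper)."""
    major, minor = COMPUTE_CAPABILITY[gpu_name]
    return "-DSM={}{}".format(major, minor)


print(sm_flag("V100"))  # the target used in the build command above
```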

You can copy `hugectr.so` to the folder where you want to use the Python interface. Alternatively, to use the Python interface throughout the docker container environment, install it to `/usr/local/hugectr/lib` and set the environment variable with `export PYTHONPATH=/usr/local/hugectr/lib:$PYTHONPATH`. You can then import `hugectr` and train your model directly from Python.

In [None]:
!cp /hugectr/build/lib/hugectr.so /hugectr/notebooks/

In [None]:
import hugectr
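
If the import fails, it can help to confirm that Python can actually see the library. The sketch below uses only the standard library: it adds a directory to `sys.path` for the current process (an alternative to exporting `PYTHONPATH`) and asks the import system whether a module can be located. The helper names are illustrative and not part of HugeCTR.

```python
import importlib.util
import sys


def add_library_dir(lib_dir):
    """Prepend lib_dir to sys.path for this process if it is not already listed."""
    if lib_dir not in sys.path:
        sys.path.insert(0, lib_dir)


def is_importable(module_name):
    """Return True if the import system can locate a top-level module."""
    return importlib.util.find_spec(module_name) is not None


# Install location mentioned above; adjust if hugectr.so was copied elsewhere.
add_library_dir("/usr/local/hugectr/lib")
print("hugectr importable:", is_importable("hugectr"))
```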

<a id="2"></a>
## 2. Wide&Deep Demo

### 2.1 Data Download and Preprocess
Download the dataset from [Kaggle Criteo datasets](http://labs.criteo.com/2014/02/kaggle-display-advertising-challenge-dataset/):
```shell
$ wget https://s3-eu-west-1.amazonaws.com/kaggle-display-advertising-challenge-dataset/dac.tar.gz
```

Extract the dataset
```shell
$ tar zxvf dac.tar.gz
```

Preprocess the data
```shell
$ mkdir wdl_data
$ shuf train.txt > train.shuf.txt
$ python3 /hugectr/tools/criteo_script/preprocess.py --src_csv_path=train.shuf.txt --dst_csv_path=wdl_data/train.out.txt --normalize_dense=1 --feature_cross=1
```
```

Split the dataset
```shell
head -n 36672493 wdl_data/train.out.txt > wdl_data/train && \
tail -n 9168124 wdl_data/train.out.txt > wdl_data/valtest && \
head -n 4584062 wdl_data/valtest > wdl_data/val && \
tail -n 4584062 wdl_data/valtest > wdl_data/test
```
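
The line counts in the split are related by simple arithmetic: 4584062 + 4584062 = 9168124, and 36672493 + 9168124 = 45840617. Because `tail` counts from the end of the file, the three outputs partition `train.out.txt` only if it has exactly 45840617 lines. As a rough Python equivalent under that assumption, the sketch below writes consecutive line ranges to separate files; the function name is illustrative.

```python
from itertools import islice

# Line counts taken from the shell commands above.
TRAIN_LINES, VAL_LINES, TEST_LINES = 36672493, 4584062, 4584062
assert VAL_LINES + TEST_LINES == 9168124


def split_by_line_counts(src_path, outputs):
    """Write consecutive line ranges of src_path to each (path, n_lines) pair in order."""
    with open(src_path, "r") as src:
        for out_path, n_lines in outputs:
            with open(out_path, "w") as dst:
                # islice consumes the next n_lines lines from the shared file iterator.
                dst.writelines(islice(src, n_lines))


# Example call (not executed here because it needs the preprocessed data):
# split_by_line_counts("wdl_data/train.out.txt", [
#     ("wdl_data/train", TRAIN_LINES),
#     ("wdl_data/val", VAL_LINES),
#     ("wdl_data/test", TEST_LINES),
# ])
```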

Then we can convert the dataset to the HugeCTR Norm format. Specifically, we will generate the `file_list.*.txt` and `file_list.*.keyset` files, together with all the training data files `*.data`, so that we can use the `set source` and `model prefetch` features during training.

In [None]:
%%writefile criteo2hugectr.sh
mkdir -p wdl_data_hugectr/wdl_data_bin && \
cd wdl_data_hugectr && \
cp /hugectr/build/bin/criteo2hugectr ./ &&
./criteo2hugectr /wdl_data/train wdl_data_bin/ file_list.txt 2 100

In [None]:
!bash criteo2hugectr.sh
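
Before training, it can be useful to confirm that every file list has a matching keyset file. The sketch below assumes the naming used in this notebook (`file_list.<i>.txt` paired with `file_list.<i>.keyset`) and only checks for existence; it does not inspect file contents. The function name is illustrative.

```python
from pathlib import Path


def file_list_pairs(data_dir, count):
    """Return (file_list, keyset, both_exist) for indices 0..count-1 in data_dir."""
    pairs = []
    for i in range(count):
        file_list = Path(data_dir) / "file_list.{}.txt".format(i)
        keyset = Path(data_dir) / "file_list.{}.keyset".format(i)
        pairs.append((file_list, keyset, file_list.exists() and keyset.exists()))
    return pairs


# Example: report the five training pairs used later in this notebook.
for file_list, keyset, ok in file_list_pairs("wdl_data_hugectr", 5):
    print(file_list.name, keyset.name, "ok" if ok else "missing")
```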

### 2.2 Train from scratch

We can train from scratch and store the trained dense model and embedding tables to files. First, create a JSON file for the Wide&Deep model. Note that the `solver` clause is no longer needed in the JSON file when using the Python interface; instead, the parameters are configured directly in Python with `hugectr.solver_parser_helper()`.

In [None]:
%%writefile wdl_1gpu.json
{
  "optimizer": {
    "type": "Adam",
    "update_type": "Global",
    "adam_hparam": {
      "learning_rate": 0.001,
      "beta1": 0.9,
      "beta2": 0.999,
      "epsilon": 0.0000001
    }
  },
  "layers": [
    {
      "name": "data",
      "type": "Data",
      "source": "./file_list.0.txt",
      "eval_source": "./file_list.5.txt",
      "check": "Sum",
      "label": {
        "top": "label",
        "label_dim": 1
      },
      "dense": {
        "top": "dense",
        "dense_dim": 13
      },
      "sparse": [
        {
          "top": "wide_data",
          "type": "DistributedSlot",
          "max_feature_num_per_sample": 30,
          "slot_num": 1
        },
        {
          "top": "deep_data",
          "type": "DistributedSlot",
          "max_feature_num_per_sample": 30,
          "slot_num": 26
        }
      ]
    },
    {
      "name": "sparse_embedding2",
      "type": "DistributedSlotSparseEmbeddingHash",
      "bottom": "wide_data",
      "top": "sparse_embedding2",
      "sparse_embedding_hparam": {
        "max_vocabulary_size_per_gpu": 2322444,
        "embedding_vec_size": 1,
        "combiner": 0
      }
    },
    {
      "name": "sparse_embedding1",
      "type": "DistributedSlotSparseEmbeddingHash",
      "bottom": "deep_data",
      "top": "sparse_embedding1",
      "sparse_embedding_hparam": {
        "max_vocabulary_size_per_gpu": 2322444,
        "embedding_vec_size": 16,
        "combiner": 0
      }
    },
    {
      "name": "reshape1",
      "type": "Reshape",
      "bottom": "sparse_embedding1",
      "top": "reshape1",
      "leading_dim": 416
    },
    {
      "name": "reshape2",
      "type": "Reshape",
      "bottom": "sparse_embedding2",
      "top": "reshape2",
      "leading_dim": 1
    },
    {
      "name": "concat1",
      "type": "Concat",
      "bottom": [
        "reshape1",
        "dense"
      ],
      "top": "concat1"
    },
    {
      "name": "fc1",
      "type": "InnerProduct",
      "bottom": "concat1",
      "top": "fc1",
      "fc_param": {
        "num_output": 1024
      }
    },
    {
      "name": "relu1",
      "type": "ReLU",
      "bottom": "fc1",
      "top": "relu1"
    },
    {
      "name": "dropout1",
      "type": "Dropout",
      "rate": 0.5,
      "bottom": "relu1",
      "top": "dropout1"
    },
    {
      "name": "fc2",
      "type": "InnerProduct",
      "bottom": "dropout1",
      "top": "fc2",
      "fc_param": {
        "num_output": 1024
      }
    },
    {
      "name": "relu2",
      "type": "ReLU",
      "bottom": "fc2",
      "top": "relu2"
    },
    {
      "name": "dropout2",
      "type": "Dropout",
      "rate": 0.5,
      "bottom": "relu2",
      "top": "dropout2"
    },
    {
      "name": "fc4",
      "type": "InnerProduct",
      "bottom": "dropout2",
      "top": "fc4",
      "fc_param": {
        "num_output": 1
      }
    },
    {
      "name": "add1",
      "type": "Add",
      "bottom": [
        "fc4",
        "reshape2"
      ],
      "top": "add1"
    },
    {
      "name": "loss",
      "type": "BinaryCrossEntropyLoss",
      "bottom": [
        "add1",
        "label"
      ],
      "top": "loss"
    }
  ]
}
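
In this configuration, each `Reshape` layer's `leading_dim` matches `slot_num * embedding_vec_size` of the embedding that feeds it: 26 × 16 = 416 for `deep_data` and 1 × 1 = 1 for `wide_data`. The sketch below checks that relationship on an excerpt of the config. Treat the rule as an observation about this file rather than a documented HugeCTR constraint; the function names are illustrative.

```python
def expected_leading_dims(config):
    """Map each embedding output name to slot_num * embedding_vec_size."""
    layers = config["layers"]
    data = next(layer for layer in layers if layer["type"] == "Data")
    slot_num = {s["top"]: s["slot_num"] for s in data["sparse"]}
    dims = {}
    for layer in layers:
        hparam = layer.get("sparse_embedding_hparam")
        if hparam is not None:
            dims[layer["top"]] = slot_num[layer["bottom"]] * hparam["embedding_vec_size"]
    return dims


def mismatched_reshapes(config):
    """Return names of Reshape layers whose leading_dim differs from the expected value."""
    dims = expected_leading_dims(config)
    return [
        layer["name"]
        for layer in config["layers"]
        if layer["type"] == "Reshape"
        and layer["bottom"] in dims
        and layer["leading_dim"] != dims[layer["bottom"]]
    ]


# Excerpt of wdl_1gpu.json; with the real file use json.load(open("wdl_1gpu.json")).
excerpt = {"layers": [
    {"type": "Data", "sparse": [{"top": "wide_data", "slot_num": 1},
                                {"top": "deep_data", "slot_num": 26}]},
    {"type": "DistributedSlotSparseEmbeddingHash", "bottom": "wide_data",
     "top": "sparse_embedding2", "sparse_embedding_hparam": {"embedding_vec_size": 1}},
    {"type": "DistributedSlotSparseEmbeddingHash", "bottom": "deep_data",
     "top": "sparse_embedding1", "sparse_embedding_hparam": {"embedding_vec_size": 16}},
    {"name": "reshape1", "type": "Reshape", "bottom": "sparse_embedding1", "leading_dim": 416},
    {"name": "reshape2", "type": "Reshape", "bottom": "sparse_embedding2", "leading_dim": 1},
]}
print(expected_leading_dims(excerpt), mismatched_reshapes(excerpt))
```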

Then we can write the Python script to train a Wide&Deep model from scratch and store the trained model to files. Here we set `repeat_dataset` to `False`, which means we have to specify the file list before the first call to `sess.train()` or `sess.evaluation()`. In addition, we have to create a writable directory for storing the temporary files of model oversubscribing.

In [None]:
%%writefile wdl_from_scratch.py
from hugectr import Session, solver_parser_helper
import sys

def train_from_scratch(json_file):
  dataset = [("./file_list."+str(i)+".txt", "./file_list."+str(i)+".keyset") for i in range(5)]
  solver_config = solver_parser_helper(seed = 0,
                                     batchsize = 16384,
                                     batchsize_eval =16384,
                                     model_file = "",
                                     embedding_files = [],
                                     vvgpu = [[0]],
                                     use_mixed_precision = False,
                                     scaler = 1.0,
                                     i64_input_key = False,
                                     use_algorithm_search = True,
                                     use_cuda_graph = True,
                                     repeat_dataset = False
                                    )
  sess = Session(solver_config, json_file, True, "./temp_embedding")
  data_reader_train = sess.get_data_reader_train()
  data_reader_eval = sess.get_data_reader_eval()
  data_reader_eval.set_file_list_source("./file_list.5.txt")
  model_oversubscriber = sess.get_model_oversubscriber()
  iteration = 0
  for file_list, keyset_file in dataset:
    data_reader_train.set_file_list_source(file_list)
    model_oversubscriber.update(keyset_file)
    while True:
      # With repeat_dataset = False, train() returns False once the current file list is exhausted.
      good = sess.train()
      if not good:
        break
      if iteration % 100 == 0:
        metrics = sess.evaluation()
        print("[HUGECTR][INFO] iter: {}, metrics: {}".format(iteration, metrics))
      iteration += 1
    print("[HUGECTR][INFO] trained with data in {}".format(file_list))
  sess.download_params_to_files("./", iteration)

if __name__ == "__main__":
  json_file = sys.argv[1]
  train_from_scratch(json_file)

In [None]:
%%writefile wdl_from_scratch.sh
cd wdl_data_hugectr && \
mkdir -p temp_embedding && \
python3 ../wdl_from_scratch.py ../wdl_1gpu.json

In [1]:
!bash wdl_from_scratch.sh

[10d11h09m06s][HUGECTR][INFO]: Initial seed is 2834829260
[10d11h09m07s][HUGECTR][INFO]: Peer-to-peer access cannot be fully enabled.
Device 0: GeForce RTX 2080 Ti
[10d11h09m07s][HUGECTR][INFO]: cache_eval_data is not specified using default: 0
[10d11h09m07s][HUGECTR][INFO]: max_nnz is not specified using default: 30
[10d11h09m07s][HUGECTR][INFO]: max_nnz is not specified using default: 30
[10d11h09m07s][HUGECTR][INFO]: num_internal_buffers 1
[10d11h09m07s][HUGECTR][INFO]: num_internal_buffers 1
[10d11h09m07s][HUGECTR][INFO]: max_vocabulary_size_per_gpu_=2322444
[10d11h09m07s][HUGECTR][INFO]: max_vocabulary_size_per_gpu_=2322444
[10d11h09m08s][HUGECTR][INFO]: Traning from scratch, no snapshot file specified
[10d11h09m08s][HUGECTR][INFO]: Write hash table <key,value> pairs to file
[10d11h09m08s][HUGECTR][INFO]: Write hash table <key,value> pairs to file
[10d11h09m08s][HUGECTR][INFO]: Start to upload embedding table file to GPUs, total loop_num: 0
[10d11h09m08s][HUGECTR][INFO]: Start to 

### 2.3 Train from stored model

First, check the stored model files that will be used in training. The dense model file and the embedding model files should be passed to `model_file` and `embedding_files`, respectively, when calling `solver_parser_helper()`. We will use the same JSON file and training data as in the previous section, and all other `solver_parser_helper` settings remain the same.

In [2]:
!ls wdl_data_hugectr/*.model

wdl_data_hugectr/0_sparse_1260.model  wdl_data_hugectr/_dense_1260.model
wdl_data_hugectr/1_sparse_1260.model
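
The file names above suggest the patterns `_dense_<iter>.model` and `<table>_sparse_<iter>.model`. Assuming those patterns hold (they are inferred from this listing, not from HugeCTR documentation), the sketch below finds the latest iteration in a directory and returns values suitable for `model_file`, `embedding_files` and the starting iteration. The function name is illustrative.

```python
import re
from pathlib import Path

# Patterns inferred from the directory listing above.
DENSE_RE = re.compile(r"^_dense_(\d+)\.model$")
SPARSE_RE = re.compile(r"^(\d+)_sparse_(\d+)\.model$")


def latest_checkpoint(model_dir):
    """Return (iteration, dense_file, embedding_files) for the newest dense model, or None."""
    dense = {}   # iteration -> dense file name
    sparse = {}  # iteration -> {table index -> sparse file name}
    for path in Path(model_dir).iterdir():
        match = DENSE_RE.match(path.name)
        if match:
            dense[int(match.group(1))] = path.name
            continue
        match = SPARSE_RE.match(path.name)
        if match:
            sparse.setdefault(int(match.group(2)), {})[int(match.group(1))] = path.name
    if not dense:
        return None
    iteration = max(dense)
    tables = sparse.get(iteration, {})
    # Order embedding files by their table index prefix.
    return iteration, dense[iteration], [tables[i] for i in sorted(tables)]
```

For the listing above, `latest_checkpoint("wdl_data_hugectr")` would yield iteration 1260 together with the dense file and the two sparse files.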


In [None]:
%%writefile wdl_from_stored.py
from hugectr import Session, solver_parser_helper
import sys

def train_from_stored(json_file):
  dataset = [("./file_list."+str(i)+".txt", "./file_list."+str(i)+".keyset") for i in range(5)]
  solver_config = solver_parser_helper(seed = 0,
                                     batchsize = 16384,
                                     batchsize_eval =16384,
                                     model_file = "_dense_1260.model",
                                     embedding_files = ["0_sparse_1260.model", "1_sparse_1260.model"],
                                     vvgpu = [[0]],
                                     use_mixed_precision = False,
                                     scaler = 1.0,
                                     i64_input_key = False,
                                     use_algorithm_search = True,
                                     use_cuda_graph = True,
                                     repeat_dataset = False
                                    )
  sess = Session(solver_config, json_file, True, "./temp_embedding")
  data_reader_train = sess.get_data_reader_train()
  data_reader_eval = sess.get_data_reader_eval()
  data_reader_eval.set_file_list_source("./file_list.5.txt")
  model_oversubscriber = sess.get_model_oversubscriber()
  iteration = 1260
  for file_list, keyset_file in dataset:
    data_reader_train.set_file_list_source(file_list)
    model_oversubscriber.update(keyset_file)
    while True:
      # With repeat_dataset = False, train() returns False once the current file list is exhausted.
      good = sess.train()
      if not good:
        break
      if iteration % 100 == 0:
        metrics = sess.evaluation()
        print("[HUGECTR][INFO] iter: {}, metrics: {}".format(iteration, metrics))
      iteration += 1
    print("[HUGECTR][INFO] trained with data in {}".format(file_list))
  sess.download_params_to_files("./", iteration)

if __name__ == "__main__":
  json_file = sys.argv[1]
  train_from_stored(json_file)

In [None]:
%%writefile wdl_from_stored.sh
cd wdl_data_hugectr && \
mkdir -p temp_embedding && \
python3 ../wdl_from_stored.py ../wdl_1gpu.json

In [3]:
!bash wdl_from_stored.sh

[10d11h09m56s][HUGECTR][INFO]: Initial seed is 1957937766
[10d11h09m57s][HUGECTR][INFO]: Peer-to-peer access cannot be fully enabled.
Device 0: GeForce RTX 2080 Ti
[10d11h09m57s][HUGECTR][INFO]: cache_eval_data is not specified using default: 0
[10d11h09m57s][HUGECTR][INFO]: max_nnz is not specified using default: 30
[10d11h09m57s][HUGECTR][INFO]: max_nnz is not specified using default: 30
[10d11h09m57s][HUGECTR][INFO]: num_internal_buffers 1
[10d11h09m57s][HUGECTR][INFO]: num_internal_buffers 1
[10d11h09m57s][HUGECTR][INFO]: max_vocabulary_size_per_gpu_=2322444
[10d11h09m57s][HUGECTR][INFO]: max_vocabulary_size_per_gpu_=2322444
Loading dense model: _dense_1260.model
[10d11h09m58s][HUGECTR][INFO]: Write hash table <key,value> pairs to file
[10d11h09m59s][HUGECTR][INFO]: Write hash table <key,value> pairs to file
[10d11h09m59s][HUGECTR][INFO]: Start to upload embedding table file to GPUs, total loop_num: 250
[10d11h10m00s][HUGECTR][INFO]: Start to upload embedding table file to GPUs, to

<a id="3"></a>
## 3. API Signatures

This section lists the API signatures of the HugeCTR Python interface that you need to be familiar with to train your own model. Following the example above, it covers `Session`, `DataReader`, `ModelPrefetcher` and `solver_parser_helper`.

**Session**
```bash
class Session(pybind11_builtins.pybind11_object)
 |  Method resolution order:
 |      Session
 |      pybind11_builtins.pybind11_object
 |      builtins.object
 |  
 |  Methods defined here:
 |  
 |  __init__(...)
 |      __init__(self: hugectr.Session, solver_config: hugectr.SolverParser, config_file: str, use_model_oversubscriber: bool=False, temp_embedding_dir: str='') -> None
 |  
 |  check_overflow(...)
 |      check_overflow(self: hugectr.Session) -> None
 |  
 |  download_params_to_files(...)
 |      download_params_to_files(self: hugectr.Session, prefix: str, iter: int) -> hugectr.Error_t
 |  
 |  evaluation(...)
 |      evaluation(self: hugectr.Session) -> List[Tuple[str, float]]
 |  
 |  get_current_loss(...)
 |      get_current_loss(self: hugectr.Session) -> float
 |  
 |  get_data_reader_eval(...)
 |      get_data_reader_eval(self: hugectr.Session) -> hugectr.IDataReader
 |  
 |  get_data_reader_train(...)
 |      get_data_reader_train(self: hugectr.Session) -> hugectr.IDataReader
 |  
 |  get_model_oversubscriber(...)
 |      get_model_oversubscriber(self: hugectr.Session) -> hugectr.ModelPrefetcher
 |  
 |  set_learning_rate(...)
 |      set_learning_rate(self: hugectr.Session, lr: float) -> hugectr.Error_t
 |  
 |  start_data_reading(...)
 |      start_data_reading(self: hugectr.Session) -> None
 |  
 |  train(...)
 |      train(self: hugectr.Session) -> bool
 |  
```
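
According to the signature above, `evaluation()` returns a list of `(name, value)` tuples. As a minimal sketch, the helper below turns such a list into a single log line. The metric name and value in the sample are placeholders, not real HugeCTR output, and the function name is illustrative.

```python
def format_metrics(iteration, metrics):
    """Format a List[Tuple[str, float]] as one log line."""
    parts = ", ".join("{}={:.4f}".format(name, value) for name, value in metrics)
    return "[HUGECTR][INFO] iter: {}, {}".format(iteration, parts)


# Placeholder data shaped like the documented return type of Session.evaluation().
sample_metrics = [("placeholder_metric", 0.75)]
print(format_metrics(100, sample_metrics))
```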

**DataReader**
```bash
class DataReader32(IDataReader)
 |  Method resolution order:
 |      DataReader32
 |      IDataReader
 |      pybind11_builtins.pybind11_object
 |      builtins.object
 |  
 |  Methods defined here:
 |   
 |  set_file_list_source(...)
 |      set_file_list_source(self: hugectr.DataReader32, file_list: str='') -> None
 
class DataReader64(IDataReader)
 |  Method resolution order:
 |      DataReader64
 |      IDataReader
 |      pybind11_builtins.pybind11_object
 |      builtins.object
 |  
 |  Methods defined here:
 |   
 |  set_file_list_source(...)
 |      set_file_list_source(self: hugectr.DataReader64, file_list: str='') -> None
```

**ModelPrefetcher**
```bash
class ModelPrefetcher(pybind11_builtins.pybind11_object)
 |  Method resolution order:
 |      ModelPrefetcher
 |      pybind11_builtins.pybind11_object
 |      builtins.object
 |  
 |  Methods defined here:
 |   
 |  update(...)
 |      update(*args, **kwargs)
 |      Overloaded function.
 |      
 |      1. update(self: hugectr.ModelPrefetcher, keyset_file: str) -> None
 |      
 |      2. update(self: hugectr.ModelPrefetcher, keyset_file_list: List[str]) -> None
```

**solver_parser_helper**
```bash
solver_parser_helper(...) method of builtins.PyCapsule instance
    solver_parser_helper(seed: int=0, batchsize_eval: int=16384, batchsize: int=16384, model_file: str='', embedding_files: List[str]=[], vvgpu: List[List[int]]=[[0]], use_mixed_precision: bool=False, scaler: float=1.0, i64_input_key: bool=False, use_algorithm_search: bool=True, use_cuda_graph: bool=True, repeat_dataset: bool=False) -> hugectr.SolverParser
```
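
Since every argument of `solver_parser_helper` has a default, a script only needs to state what differs. The sketch below mirrors the defaults from the signature above in a plain dictionary and merges overrides into it, rejecting misspelled names early. The dictionary and the `solver_kwargs` helper are illustrative and not part of HugeCTR; passing the result as keyword arguments assumes the function accepts them by name, as the scripts in this notebook do.

```python
import copy

# Defaults copied from the solver_parser_helper signature shown above.
SOLVER_DEFAULTS = {
    "seed": 0,
    "batchsize_eval": 16384,
    "batchsize": 16384,
    "model_file": "",
    "embedding_files": [],
    "vvgpu": [[0]],
    "use_mixed_precision": False,
    "scaler": 1.0,
    "i64_input_key": False,
    "use_algorithm_search": True,
    "use_cuda_graph": True,
    "repeat_dataset": False,
}


def solver_kwargs(**overrides):
    """Return the defaults updated with overrides; raise on unknown argument names."""
    unknown = set(overrides) - set(SOLVER_DEFAULTS)
    if unknown:
        raise TypeError("unknown solver arguments: {}".format(sorted(unknown)))
    merged = copy.deepcopy(SOLVER_DEFAULTS)  # avoid sharing the mutable list defaults
    merged.update(overrides)
    return merged


# Settings for resuming from the stored model in section 2.3:
resume = solver_kwargs(model_file="_dense_1260.model",
                       embedding_files=["0_sparse_1260.model", "1_sparse_1260.model"])
# With hugectr available: solver_config = solver_parser_helper(**resume)
print(resume["model_file"], resume["embedding_files"])
```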