<img src="http://developer.download.nvidia.com/compute/machine-learning/frameworks/nvidia_logo.png" style="width: 90px; float: right;">

# HugeCTR Python Interface

## Overview

In HugeCTR version 2.3, we've integrated the Python interface, which supports setting the data source and model oversubscribing during training. This notebook explains how to access and use the HugeCTR Python interface. For more details on the Python API, refer to [HugeCTR Python Interface](../docs/python_interface.md).

## Table of Contents
1. [Access the HugeCTR Python Interface](#1)
1. [Wide & Deep Demo](#2)
1. [API Signatures](#3)

<a id="1"></a>
## 1. Access the HugeCTR Python Interface

1. Please make sure that you have started the notebook inside the running NGC docker container: `nvcr.io/nvidia/merlin/merlin-training:0.5`.

   A dynamic link to the `hugectr.so` library is installed in the system path `/usr/local/hugectr/lib/`. This path is also added to the `PYTHONPATH` environment variable, so you can use the Python interface anywhere within the Docker container. Check the dynamic link with the following command:

In [None]:
!ls /usr/local/hugectr/lib

2. Import HugeCTR to train your model with Python:

In [None]:
import hugectr
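To confirm the setup from inside Python, you can check that `/usr/local/hugectr/lib` is listed in `PYTHONPATH` and that the `hugectr` module can be found without importing it. This is a minimal sketch using only the standard library; `check_hugectr_setup` is a hypothetical helper, not part of HugeCTR.

```python
import importlib.util
import os


def check_hugectr_setup(lib_dir="/usr/local/hugectr/lib", env=None):
    """Return (lib_dir_on_pythonpath, hugectr_importable).

    Hypothetical helper: it only inspects PYTHONPATH and the import system.
    """
    env = os.environ if env is None else env
    # normpath makes "/usr/local/hugectr/lib/" and "/usr/local/hugectr/lib" compare equal
    entries = [os.path.normpath(p) for p in env.get("PYTHONPATH", "").split(os.pathsep) if p]
    on_path = os.path.normpath(lib_dir) in entries
    # find_spec returns None when no module named "hugectr" can be located
    importable = importlib.util.find_spec("hugectr") is not None
    return on_path, importable


print(check_hugectr_setup())
```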

<a id="2"></a>
## 2. Wide & Deep Demo

### 2.1 Download and Preprocess Data
1. Go to [this link](https://ailab.criteo.com/download-criteo-1tb-click-logs-dataset/) and download one of the 24 files into the `${project_root}/tools` directory, or run the following commands:
   ```shell
   $ cd ${project_root}/tools
   $ wget http://azuremlsampleexperiments.blob.core.windows.net/criteo/day_1.gz
   ```
2. Extract the dataset using the following command:
   ```shell
   $ gunzip day_1.gz
   ```

3. Preprocess the data using `tools/preprocess.sh`:
   ```shell
   $ bash tools/preprocess.sh 1 wdl_data pandas 1 1 100 # using pandas to preprocess the data
   ```
   The command takes about half an hour and generates a folder named `wdl_data` containing the preprocessed data. For more details on `tools/preprocess.sh`, refer to the *Preprocessing by Pandas* section in [samples/wdl](../samples/wdl/README.md).
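The training scripts in this notebook read `wdl_data/file_list.0.txt` through `file_list.7.txt`, each with a matching `.keyset` file, and evaluate on `file_list.8.txt`. Before training, you can check that preprocessing produced these files. This is a minimal sketch; `missing_wdl_files` is a hypothetical helper, and the file names are taken from the training scripts rather than from the documentation of `preprocess.sh`.

```python
from pathlib import Path


def missing_wdl_files(data_dir="wdl_data", num_train_lists=8, eval_list_id=8):
    """List the expected preprocessing outputs that are not present in data_dir.

    Hypothetical helper: file names mirror the ones used by the training scripts.
    """
    data_dir = Path(data_dir)
    expected = []
    for i in range(num_train_lists):
        # training file list plus the keyset file passed to the model oversubscriber
        expected.append(data_dir / f"file_list.{i}.txt")
        expected.append(data_dir / f"file_list.{i}.keyset")
    expected.append(data_dir / f"file_list.{eval_list_id}.txt")  # evaluation file list
    return [str(p) for p in expected if not p.exists()]


print(missing_wdl_files() or "all file lists and keyset files found")
```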

### 2.2 Train from scratch

We can train from scratch and store the trained dense model and embedding tables in model files by doing the following:

1. Create a JSON file for the W&D model.
   **NOTE**: When you use the Python interface, the solver clause no longer needs to be added to the JSON file. Instead, configure the solver parameters directly in Python with `hugectr.solver_parser_helper()`.

In [1]:
%%writefile wdl_1gpu.json
{
  "optimizer": {
    "type": "Adam",
    "update_type": "Global",
    "adam_hparam": {
      "learning_rate": 0.001,
      "beta1": 0.9,
      "beta2": 0.999,
      "epsilon": 0.0000001
    }
  },
  "layers": [
    {
      "name": "data",
      "type": "Data",
      "source": "wdl_data/file_list.0.txt",
      "eval_source": "wdl_data/file_list.8.txt",
      "check": "Sum",
      "label": {
        "top": "label",
        "label_dim": 1
      },
      "dense": {
        "top": "dense",
        "dense_dim": 13
      },
      "sparse": [
        {
          "top": "wide_data",
          "type": "DistributedSlot",
          "max_feature_num_per_sample": 30,
          "slot_num": 1
        },
        {
          "top": "deep_data",
          "type": "DistributedSlot",
          "max_feature_num_per_sample": 30,
          "slot_num": 26
        }
      ]
    },
    {
      "name": "sparse_embedding2",
      "type": "DistributedSlotSparseEmbeddingHash",
      "bottom": "wide_data",
      "top": "sparse_embedding2",
      "sparse_embedding_hparam": {
        "max_vocabulary_size_per_gpu": 5863985,
        "embedding_vec_size": 1,
        "combiner": 0
      }
    },
    {
      "name": "sparse_embedding1",
      "type": "DistributedSlotSparseEmbeddingHash",
      "bottom": "deep_data",
      "top": "sparse_embedding1",
      "sparse_embedding_hparam": {
        "max_vocabulary_size_per_gpu": 5863985,
        "embedding_vec_size": 16,
        "combiner": 0
      }
    },
    {
      "name": "reshape1",
      "type": "Reshape",
      "bottom": "sparse_embedding1",
      "top": "reshape1",
      "leading_dim": 416
    },
    {
      "name": "reshape2",
      "type": "Reshape",
      "bottom": "sparse_embedding2",
      "top": "reshape2",
      "leading_dim": 1
    },
    {
      "name": "concat1",
      "type": "Concat",
      "bottom": [
        "reshape1",
        "dense"
      ],
      "top": "concat1"
    },
    {
      "name": "fc1",
      "type": "InnerProduct",
      "bottom": "concat1",
      "top": "fc1",
      "fc_param": {
        "num_output": 1024
      }
    },
    {
      "name": "relu1",
      "type": "ReLU",
      "bottom": "fc1",
      "top": "relu1"
    },
    {
      "name": "dropout1",
      "type": "Dropout",
      "rate": 0.5,
      "bottom": "relu1",
      "top": "dropout1"
    },
    {
      "name": "fc2",
      "type": "InnerProduct",
      "bottom": "dropout1",
      "top": "fc2",
      "fc_param": {
        "num_output": 1024
      }
    },
    {
      "name": "relu2",
      "type": "ReLU",
      "bottom": "fc2",
      "top": "relu2"
    },
    {
      "name": "dropout2",
      "type": "Dropout",
      "rate": 0.5,
      "bottom": "relu2",
      "top": "dropout2"
    },
    {
      "name": "fc4",
      "type": "InnerProduct",
      "bottom": "dropout2",
      "top": "fc4",
      "fc_param": {
        "num_output": 1
      }
    },
    {
      "name": "add1",
      "type": "Add",
      "bottom": [
        "fc4",
        "reshape2"
      ],
      "top": "add1"
    },
    {
      "name": "loss",
      "type": "BinaryCrossEntropyLoss",
      "bottom": [
        "add1",
        "label"
      ],
      "top": "loss"
    }
  ]
}

Writing wdl_1gpu.json
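In the JSON above, each `Reshape` layer takes the output of one embedding, and its `leading_dim` matches the input's `slot_num` times `embedding_vec_size`: 26 × 16 = 416 for `reshape1` and 1 × 1 = 1 for `reshape2`. `concat1` then joins those 416 values with the 13 dense features, so `fc1` receives 429 inputs. If you edit slot counts or vector sizes, a small check like the one below can catch a stale `leading_dim`. It is a sketch that assumes the intent is to flatten each sample's embeddings into one row and that the file uses the `Data`/embedding/`Reshape` layout shown here; `check_reshape_dims` is a hypothetical helper that only reads the configuration and does not call HugeCTR.

```python
def check_reshape_dims(config):
    """Map each Reshape layer name to (expected_leading_dim, configured_leading_dim).

    Hypothetical helper: assumes one Data layer and Reshape layers fed by embeddings.
    """
    layers = config["layers"]
    data = next(layer for layer in layers if layer["type"] == "Data")
    # slot_num of every sparse input produced by the Data layer
    slots = {s["top"]: s["slot_num"] for s in data["sparse"]}
    # embedding output name -> (sparse input name, embedding_vec_size)
    embeddings = {
        layer["top"]: (layer["bottom"], layer["sparse_embedding_hparam"]["embedding_vec_size"])
        for layer in layers
        if "sparse_embedding_hparam" in layer
    }
    result = {}
    for layer in layers:
        if layer["type"] == "Reshape" and layer["bottom"] in embeddings:
            sparse_input, vec_size = embeddings[layer["bottom"]]
            # flattening one sample's embeddings gives slot_num * embedding_vec_size values
            result[layer["name"]] = (slots[sparse_input] * vec_size, layer["leading_dim"])
    return result
```

For example, `check_reshape_dims(json.load(open("wdl_1gpu.json")))` returns `{'reshape1': (416, 416), 'reshape2': (1, 1)}` for the file written above.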


2. Write the Python script.
   Make sure that the `repeat_dataset` parameter is set to `False` in the script. This means that a file list must be assigned to the data reader with `set_source()` before calling `sess.train()` or `sess.eval()`. Also, create a write-enabled directory for the temporary files used by model oversubscribing.

In [2]:
%%writefile wdl_from_scratch.py
from hugectr import Session, solver_parser_helper
import sys
from mpi4py import MPI

def train_from_scratch(json_file):
  dataset = [("./wdl_data/file_list."+str(i)+".txt", "./wdl_data/file_list."+str(i)+".keyset") for i in range(8)]
  solver_config = solver_parser_helper(seed = 0,
                                     batchsize = 16384,
                                     batchsize_eval =16384,
                                     model_file = "",
                                     embedding_files = [],
                                     vvgpu = [[0]],
                                     use_mixed_precision = False,
                                     scaler = 1.0,
                                     i64_input_key = False,
                                     use_algorithm_search = True,
                                     use_cuda_graph = True,
                                     repeat_dataset = False
                                    )
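  # True and "./temp_embedding" appear to enable model oversubscribing and set the
  # directory for its temporary files (the directory must exist and be writable)
  sess = Session(solver_config, json_file, True, "./temp_embedding")
  data_reader_train = sess.get_data_reader_train()
  data_reader_eval = sess.get_data_reader_eval()
  # Evaluation always reads file_list.8.txt
  data_reader_eval.set_source("./wdl_data/file_list.8.txt")
  model_oversubscriber = sess.get_model_oversubscriber()
  iteration = 0
  for file_list, keyset_file in dataset:
    # With repeat_dataset=False, point the reader at the next file list and
    # update the model oversubscriber with the matching keyset file
    data_reader_train.set_source(file_list)
    model_oversubscriber.update(keyset_file)
    while True:
      # train() returns False once the current file list is exhausted
      good = sess.train()
      if not good:
        break
      if iteration % 100 == 0:
        # Every 100 iterations, evaluate on at most max_eval_batches batches
        sess.check_overflow()
        sess.copy_weights_for_evaluation()
        data_reader_eval = sess.get_data_reader_eval()
        good_eval = True
        j = 0
        while good_eval:
          if j >= solver_config.max_eval_batches:
            break
          good_eval = sess.eval()
          j += 1
        # Reset the eval reader (set_source() with no argument) if the eval data ran out
        if not good_eval:
          data_reader_eval.set_source()
        metrics = sess.get_eval_metrics()
        print("[HUGECTR][INFO] iter: {}, metrics: {}".format(iteration, metrics))
      iteration += 1
    print("[HUGECTR][INFO] trained with data in {}".format(file_list))
  # Write the dense and sparse model files, e.g. _dense_<iteration>.model
  sess.download_params_to_files("./", iteration)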

if __name__ == "__main__":
  json_file = sys.argv[1]
  train_from_scratch(json_file)

Writing wdl_from_scratch.py
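The evaluation block inside the training loop can be factored into a helper to keep the loop readable. This is a minimal sketch that calls only the `Session` and data-reader methods the script above already uses; `evaluate` is a hypothetical helper name. It is intended to behave like the inline block: at most `max_eval_batches` calls to `sess.eval()`, treating a `False` return as "evaluation data exhausted" (as the script does), and then resetting the eval reader with `set_source()`.

```python
def evaluate(sess, max_eval_batches):
    """Run the periodic evaluation from the training loop and return the metrics.

    Hypothetical helper: mirrors the inline evaluation block of wdl_from_scratch.py.
    """
    sess.check_overflow()
    sess.copy_weights_for_evaluation()
    data_reader_eval = sess.get_data_reader_eval()
    good_eval = True
    for _ in range(max_eval_batches):
        # the script treats a False return from eval() as "no more evaluation data"
        good_eval = sess.eval()
        if not good_eval:
            break
    if not good_eval:
        # reset the eval reader (no argument, as in the script) for the next round
        data_reader_eval.set_source()
    return sess.get_eval_metrics()
```

Inside the training loop, `metrics = evaluate(sess, solver_config.max_eval_batches)` would replace the lines from `sess.check_overflow()` through `sess.get_eval_metrics()`.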


In [3]:
%%writefile wdl_from_scratch.sh
mkdir -p temp_embedding && \
python3 wdl_from_scratch.py wdl_1gpu.json

Writing wdl_from_scratch.sh
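The shell script above creates `temp_embedding` with `mkdir -p` before starting Python. If you launch training from Python directly, the same step can be done in the script itself and can fail early when the directory is not writable. A minimal sketch using only the standard library; `ensure_writable_dir` is a hypothetical helper, not part of HugeCTR.

```python
import os


def ensure_writable_dir(path="./temp_embedding"):
    """Create `path` if needed and raise if this process cannot write to it.

    Hypothetical helper: equivalent to `mkdir -p` plus a write-permission check.
    """
    os.makedirs(path, exist_ok=True)  # no error if the directory already exists
    if not os.access(path, os.W_OK):
        raise PermissionError(f"{path} exists but is not writable")
    return os.path.abspath(path)
```

Calling `ensure_writable_dir()` before constructing `Session(..., "./temp_embedding")` makes a permission problem show up before any training work starts.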


In [1]:
!bash wdl_from_scratch.sh

[02d01h44m32s][HUGECTR][INFO]: Global seed is 3078712038
[02d01h44m35s][HUGECTR][INFO]: Peer-to-peer access cannot be fully enabled.
Device 0: GeForce RTX 2080 Ti
[02d01h44m35s][HUGECTR][INFO]: cache_eval_data is not specified using default: 0
[02d01h44m35s][HUGECTR][INFO]: num_workers is not specified using default: 12
[02d01h44m35s][HUGECTR][INFO]: num of DataReader workers: 12
[02d01h44m35s][HUGECTR][INFO]: max_nnz is not specified using default: 30
[02d01h44m35s][HUGECTR][INFO]: max_nnz is not specified using default: 30
[02d01h44m35s][HUGECTR][INFO]: num_internal_buffers 1
[02d01h44m35s][HUGECTR][INFO]: num_internal_buffers 1
[02d01h44m35s][HUGECTR][INFO]: max_vocabulary_size_per_gpu_=5863985
[02d01h44m35s][HUGECTR][INFO]: max_vocabulary_size_per_gpu_=5863985
[02d01h45m16s][HUGECTR][INFO]: Traning from scratch, no snapshot file specified
[02d01h45m16s][HUGECTR][INFO]: Write hash table <key,value> pairs to file
[02d01h45m16s][HUGECTR][INFO]: Write hash table <key,value> pairs to fi

### 2.3 Train from stored model

Check the stored model files that will be used for training. Pass the dense model file to `model_file` and the embedding (sparse) model files to `embedding_files` when calling `solver_parser_helper()`. We use the same JSON file and training data as in the previous section, and all the other `solver_parser_helper` settings are unchanged.

In [1]:
!ls *.model

0_sparse_2016.model  1_sparse_2016.model  _dense_2016.model
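Rather than typing the snapshot names by hand, you can look them up. This is a minimal sketch that assumes the naming pattern visible in the listing above (`_dense_<iter>.model` and `<index>_sparse_<iter>.model`, as written by `download_params_to_files("./", iteration)`); `find_latest_snapshot` is a hypothetical helper, not part of HugeCTR.

```python
import re
from pathlib import Path

# Assumed naming pattern, inferred from the `ls *.model` output above
_DENSE = re.compile(r"^_dense_(\d+)\.model$")
_SPARSE = re.compile(r"^(\d+)_sparse_(\d+)\.model$")


def find_latest_snapshot(model_dir="."):
    """Return (dense_file, sparse_files, iteration) for the newest complete snapshot, or None."""
    dense, sparse = {}, {}
    for p in Path(model_dir).iterdir():
        m = _DENSE.match(p.name)
        if m:
            dense[int(m.group(1))] = str(p)
            continue
        m = _SPARSE.match(p.name)
        if m:
            # group by iteration; remember the embedding index for ordering
            sparse.setdefault(int(m.group(2)), []).append((int(m.group(1)), str(p)))
    complete = [it for it in dense if it in sparse]
    if not complete:
        return None
    it = max(complete)
    # order embedding files by index (0_, 1_, ...), matching the list used in the script
    files = [path for _, path in sorted(sparse[it])]
    return dense[it], files, it
```

The first two returned values map onto `model_file` and `embedding_files` in `solver_parser_helper()`. The iteration number could also seed the script's `iteration` counter if you want the logs and output file names to continue from the snapshot.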


In [5]:
%%writefile wdl_from_stored.py
from hugectr import Session, solver_parser_helper
import sys
from mpi4py import MPI

def train_from_stored(json_file):
  dataset = [("./wdl_data/file_list."+str(i)+".txt", "./wdl_data/file_list."+str(i)+".keyset") for i in range(8)]
  solver_config = solver_parser_helper(seed = 0,
                                     batchsize = 16384,
                                     batchsize_eval =16384,
                                     model_file = "_dense_2016.model",
                                     embedding_files = ["0_sparse_2016.model", "1_sparse_2016.model"],
                                     vvgpu = [[0]],
                                     use_mixed_precision = False,
                                     scaler = 1.0,
                                     i64_input_key = False,
                                     use_algorithm_search = True,
                                     use_cuda_graph = True,
                                     repeat_dataset = False
                                    )
  sess = Session(solver_config, json_file, True, "./temp_embedding")
  data_reader_train = sess.get_data_reader_train()
  data_reader_eval = sess.get_data_reader_eval()
  data_reader_eval.set_source("./wdl_data/file_list.8.txt")
  model_oversubscriber = sess.get_model_oversubscriber()
  iteration = 1260
  for file_list, keyset_file in dataset:
    data_reader_train.set_source(file_list)
    model_oversubscriber.update(keyset_file)
    while True:
      good = sess.train()
      if good == False:
        break
      if iteration % 100 == 0:
        sess.check_overflow()
        sess.copy_weights_for_evaluation()
        data_reader_eval = sess.get_data_reader_eval()
        good_eval = True
        j = 0
        while good_eval:
          if j >= solver_config.max_eval_batches:
            break
          good_eval = sess.eval()
          j += 1
        if good_eval == False:
          data_reader_eval.set_source()
        metrics = sess.get_eval_metrics()
        print("[HUGECTR][INFO] iter: {}, metrics: {}".format(iteration, metrics))
      iteration += 1
    print("[HUGECTR][INFO] trained with data in {}".format(file_list))
  sess.download_params_to_files("./", iteration)

if __name__ == "__main__":
  json_file = sys.argv[1]
  train_from_stored(json_file)

Overwriting wdl_from_stored.py


In [3]:
%%writefile wdl_from_stored.sh
mkdir -p temp_embedding && \
python3 wdl_from_stored.py wdl_1gpu.json

Writing wdl_from_stored.sh


In [6]:
!bash wdl_from_stored.sh

[01d13h17m31s][HUGECTR][INFO]: Global seed is 431843434
[01d13h17m32s][HUGECTR][INFO]: Peer-to-peer access cannot be fully enabled.
Device 0: GeForce RTX 2080 Ti
[01d13h17m32s][HUGECTR][INFO]: cache_eval_data is not specified using default: 0
[01d13h17m32s][HUGECTR][INFO]: num_workers is not specified using default: 12
[01d13h17m32s][HUGECTR][INFO]: num of DataReader workers: 12
[01d13h17m32s][HUGECTR][INFO]: max_nnz is not specified using default: 30
[01d13h17m32s][HUGECTR][INFO]: max_nnz is not specified using default: 30
[01d13h17m32s][HUGECTR][INFO]: num_internal_buffers 1
[01d13h17m32s][HUGECTR][INFO]: num_internal_buffers 1
[01d13h17m37s][HUGECTR][INFO]: max_vocabulary_size_per_gpu_=5863985
[01d13h17m37s][HUGECTR][INFO]: max_vocabulary_size_per_gpu_=5863985
Loading dense model: _dense_2016.model
[01d13h18m24s][HUGECTR][INFO]: Write hash table <key,value> pairs to file
[01d13h18m24s][HUGECTR][INFO]: Write hash table <key,value> pairs to file
[01d13h18m25s][HUGECTR][INFO]: Start to

<a id="3"></a>
## 3. API Signatures

Here's a list of the main API signatures in the HugeCTR Python interface that you need to be familiar with to train your own model. The example above uses `Session`, `DataReader`, `ModelOversubscriber`, and `solver_parser_helper`.

**Session**
```bash
class Session(pybind11_builtins.pybind11_object)
 |  Method resolution order:
 |      Session
 |      pybind11_builtins.pybind11_object
 |      builtins.object
 |
 |  Methods defined here:
 |
 |  check_overflow(...)
 |      check_overflow(self: hugectr.Session) -> None
 |
 |  copy_weights_for_evaluation(...)
 |      copy_weights_for_evaluation(self: hugectr.Session) -> None
 |
 |  download_params_to_files(...)
 |      download_params_to_files(self: hugectr.Session, prefix: str, iter: int) -> hugectr.Error_t
 |
 |  eval(...)
 |      eval(self: hugectr.Session) -> bool
 |
 |  get_current_loss(...)
 |      get_current_loss(self: hugectr.Session) -> float
 |
 |  get_model_oversubscriber(...)
 |      get_model_oversubscriber(self: hugectr.Session) -> hugectr.ModelOversubscriber
 |
 |  get_params_num(...)
 |      get_params_num(self: hugectr.Session) -> int
 |
 |  init_params(...)
 |      init_params(self: hugectr.Session, model_file: str) -> hugectr.Error_t
 |
 |  set_learning_rate(...)
 |      set_learning_rate(self: hugectr.Session, lr: float) -> hugectr.Error_t
 |
 |  start_data_reading(...)
 |      start_data_reading(self: hugectr.Session) -> None
 |
 |  train(...)
 |      train(self: hugectr.Session) -> bool
```

**DataReader**
```bash
class DataReader32(IDataReader)
 |  Method resolution order:
 |      DataReader32
 |      IDataReader
 |      pybind11_builtins.pybind11_object
 |      builtins.object
 |
 |  Methods defined here:
 |
 |  __init__(...)
 |      __init__(self: hugectr.DataReader32, batchsize: int, label_dim: int, dense_dim: int, params: List[hugectr.DataReaderSparseParam], resource_manager: hugectr.ResourceManager, repeat: bool, num_chunk_threads: int, use_mixed_precision: bool, cache_num_iters: int) -> None
 |
 |  create_drwg_norm(...)
 |      create_drwg_norm(self: hugectr.DataReader32, file_list: str, Check_t: hugectr.Check_t, start_reading_from_beginning: bool=True) -> None
 |
 |  create_drwg_parquet(...)
 |      create_drwg_parquet(self: hugectr.DataReader32, file_list: str, slot_offset: List[int], start_reading_from_beginning: bool=True) -> None
 |
 |  create_drwg_raw(...)
 |      create_drwg_raw(self: hugectr.DataReader32, file_name: str, num_samples: int, slot_offset: List[int], float_label_dense: bool, data_shuffle: bool=False, start_reading_from_beginning: bool=True) -> None
 |
 |  get_dense_tensors(...)
 |      get_dense_tensors(self: hugectr.DataReader32) -> List[HugeCTR::TensorBag2]
 |
 |  get_label_tensors(...)
 |      get_label_tensors(self: hugectr.DataReader32) -> List[HugeCTR::Tensor2<float>]
 |
 |  get_row_offsets_tensors(...)
 |      get_row_offsets_tensors(*args, **kwargs)
 |      Overloaded function.
 |
 |      1. get_row_offsets_tensors(self: hugectr.DataReader32) -> List[HugeCTR::Tensor2<unsigned int>]
 |
 |      2. get_value_tensors(self: hugectr.DataReader32, param_id: int) -> List[HugeCTR::Tensor2<unsigned int>]
 |
 |  read_a_batch_to_device(...)
 |      read_a_batch_to_device(self: hugectr.DataReader32) -> int
 |
 |  read_a_batch_to_device_delay_release(...)
 |      read_a_batch_to_device_delay_release(self: hugectr.DataReader32) -> int
 |
 |  ready_to_collect(...)
 |      ready_to_collect(self: hugectr.DataReader32) -> None
 |
 |  set_source(...)
 |      set_source(self: hugectr.DataReader32, file_name: str='') -> None
 |
 |  start(...)
 |      start(self: hugectr.DataReader32) -> None

class DataReader64(IDataReader)
 |  Method resolution order:
 |      DataReader64
 |      IDataReader
 |      pybind11_builtins.pybind11_object
 |      builtins.object
 |
 |  Methods defined here:
 |
 |  create_drwg_norm(...)
 |      create_drwg_norm(self: hugectr.DataReader64, file_list: str, Check_t: hugectr.Check_t, start_reading_from_beginning: bool=True) -> None
 |
 |  create_drwg_parquet(...)
 |      create_drwg_parquet(self: hugectr.DataReader64, file_list: str, slot_offset: List[int], start_reading_from_beginning: bool=True) -> None
 |
 |  create_drwg_raw(...)
 |      create_drwg_raw(self: hugectr.DataReader64, file_name: str, num_samples: int, slot_offset: List[int], float_label_dense: bool, data_shuffle: bool=False, start_reading_from_beginning: bool=True) -> None
 |
 |  get_dense_tensors(...)
 |      get_dense_tensors(self: hugectr.DataReader64) -> List[HugeCTR::TensorBag2]
 |
 |  get_label_tensors(...)
 |      get_label_tensors(self: hugectr.DataReader64) -> List[HugeCTR::Tensor2<float>]
 |
 |  get_row_offsets_tensors(...)
 |      get_row_offsets_tensors(*args, **kwargs)
 |      Overloaded function.
 |
 |      1. get_row_offsets_tensors(self: hugectr.DataReader64) -> List[HugeCTR::Tensor2<long long>]
 |
 |      2. get_row_offsets_tensors(self: hugectr.DataReader64, param_id: int) -> List[HugeCTR::Tensor2<long long>]
 |
 |  get_value_tensors(...)
 |      get_value_tensors(*args, **kwargs)
 |      Overloaded function.
 |
 |      1. get_value_tensors(self: hugectr.DataReader64) -> List[HugeCTR::Tensor2<long long>]
 |
 |      2. get_value_tensors(self: hugectr.DataReader64, param_id: int) -> List[HugeCTR::Tensor2<long long>]
 |
 |  read_a_batch_to_device(...)
 |      read_a_batch_to_device(self: hugectr.DataReader64) -> int
 |
 |  read_a_batch_to_device_delay_release(...)
 |      read_a_batch_to_device_delay_release(self: hugectr.DataReader64) -> int
 |
 |  ready_to_collect(...)
 |      ready_to_collect(self: hugectr.DataReader64) -> None
 |
 |  set_source(...)
 |      set_source(self: hugectr.DataReader64, file_name: str='') -> None
 |
 |  start(...)
 |      start(self: hugectr.DataReader64) -> None
```

**ModelOversubscriber**
```bash
class ModelOversubscriber(pybind11_builtins.pybind11_object)
 |  Method resolution order:
 |      ModelOversubscriber
 |      pybind11_builtins.pybind11_object
 |      builtins.object
 |
 |  Methods defined here:
 |
 |  store(...)
 |      store(self: hugectr.ModelOversubscriber, snapshot_file_list: List[str]) -> None
 |
 |  update(...)
 |      update(*args, **kwargs)
 |      Overloaded function.
 |
 |      1. update(self: hugectr.ModelOversubscriber, keyset_file: str) -> None
 |
 |      2. update(self: hugectr.ModelOversubscriber, keyset_file_list: List[str]) -> None
```

**solver_parser_helper**
```bash
solver_parser_helper(...) method of builtins.PyCapsule instance
    solver_parser_helper(seed: int=0, max_eval_batches: int=100, batchsize_eval: int=2048, batchsize: int=2048, model_file: str='', embedding_files: List[str]=[], vvgpu: List[List[int]]=[[0]], use_mixed_precision: bool=False, enable_tf32_compute: bool=False, scaler: float=1.0, i64_input_key: bool=False, use_algorithm_search: bool=True, use_cuda_graph: bool=True, repeat_dataset: bool=True, max_iter: int=0, num_epochs: int=0, display: int=200, snapshot: int=10000, eval_interval: int=1000, use_model_oversubscriber: bool=False, temp_embedding_dir: str='./') -> hugectr.SolverParser
 ```
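To see which settings a script actually changes, you can compare its keyword arguments with the defaults in the `solver_parser_helper` signature. In this sketch, `SOLVER_DEFAULTS` is transcribed by hand from the `help()` output above and may differ in other HugeCTR versions; `non_default_args` is a hypothetical helper that does not call HugeCTR.

```python
# Defaults transcribed by hand from the solver_parser_helper signature above
SOLVER_DEFAULTS = {
    "seed": 0, "max_eval_batches": 100, "batchsize_eval": 2048, "batchsize": 2048,
    "model_file": "", "embedding_files": [], "vvgpu": [[0]],
    "use_mixed_precision": False, "enable_tf32_compute": False, "scaler": 1.0,
    "i64_input_key": False, "use_algorithm_search": True, "use_cuda_graph": True,
    "repeat_dataset": True, "max_iter": 0, "num_epochs": 0, "display": 200,
    "snapshot": 10000, "eval_interval": 1000, "use_model_oversubscriber": False,
    "temp_embedding_dir": "./",
}


def non_default_args(**kwargs):
    """Return the keyword arguments whose values differ from SOLVER_DEFAULTS."""
    unknown = sorted(set(kwargs) - set(SOLVER_DEFAULTS))
    if unknown:
        raise TypeError(f"unknown solver_parser_helper arguments: {unknown}")
    return {k: v for k, v in kwargs.items() if SOLVER_DEFAULTS[k] != v}


# The arguments passed in wdl_from_scratch.py
print(non_default_args(seed=0, batchsize=16384, batchsize_eval=16384, model_file="",
                       embedding_files=[], vvgpu=[[0]], use_mixed_precision=False,
                       scaler=1.0, i64_input_key=False, use_algorithm_search=True,
                       use_cuda_graph=True, repeat_dataset=False))
# {'batchsize': 16384, 'batchsize_eval': 16384, 'repeat_dataset': False}
```

Among the `solver_parser_helper` arguments, only the batch sizes and `repeat_dataset` differ from the defaults. Because the scripts do not pass `max_eval_batches`, the `solver_config.max_eval_batches` value used in their evaluation loop is the default of 100.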