HugeCTR is a recommender specific framework which is capable of distributed training across multiple GPUs and nodes for Click-Through-Rate (CTR) estimation. It is a component of NVIDIA Merlin, which is a framework accelerating the entire pipeline from data ingestion and training to deploying GPU-accelerated recommender systems.
Design Goals:
- Fast: it's a speed-of-light CTR training framework;
- Dedicated: we consider everything you need in CTR training;
- Easy: you can start your work now, no matter if you are a data scientist, a learner, or a developer.
HugeCTR version 2.2.1 is a minor update to v2.2, which includes the Parquet data support, the sample preprocessing script rewritten in nvTabular, etc. Find the full list of changes here.
In HugeCTR version 2.2, we add the new features like full fp16 pipeline, algorithm search, AUC calculation, etc,
whilst enabling the support of the world's most advanced accelerator, NVIDIA A100 Tensor Core GPU and the modern models such as Wide and Deep, Deep Cross Network, DeepFM, Deep Learning Recommendation Model (DLRM).
This document describes how to set up the environment and run HugeCTR.
For more details such as HugeCTR architecture and supported features, please refer to HugeCTR User Guide and Questions and Answers in directory docs/
You can download the HugeCTR repository and the third party modules which it relies upon:
git clone https://github.com/NVIDIA/HugeCTR.git
cd HugeCTR
git submodule update --init --recursive
Inside the HugeCTR directory, build a docker image and run a container with the image:
docker build -t hugectr:devel -f ./tools/dockerfiles/dev.a100.Dockerfile . #docker support A100. To use tf please use dev.tf.Dockerfile instead.
docker run --runtime=nvidia --rm -it -u $(id -u):$(id -g) -v $(pwd):/hugectr -w /hugectr hugectr:devel bash
Then build HugeCTR with the build type and compute capability specified. Please see Build Options for more details.
cd /hugectr
mkdir -p build
cd build
cmake -DCMAKE_BUILD_TYPE=Release -DSM=70 .. # Target is NVIDIA V100
make -j
Let’s download the Kaggle Display Advertising Challenge Dataset to $HugeCTR/tools/criteo_script/
and preprocess it to train the Deep & Cross Network (DCN) example:
cd ../../tools/criteo_script/ # assume that the downloaded dataset is here
bash preprocess.sh dcn 1 0
Alternatively you can generate a synthetic dataset:
cd /hugectr/build
mkdir dataset_dir
bin/data_generator ../samples/dcn/dcn.json ./dataset_dir 434428 1
cd /hugectr/build
bin/huge_ctr --train ../samples/dcn/dcn.json
The other sample models and their end-to-end instructions are available here.
- Requirements
- Supported Compute Capabilities
- Build Options
- Synthetic Data Generation and Benchmark
- File Format
- Document Generation
- Coding Style and Refactor
- cuBLAS >= 10.1
- Compute Capability >= 70 (V100)
- CMake >= 3.8
- cuDNN >= 7.5
- NCCL >= 2.0
- Clang-Format 3.8
- GCC >= 7.4.0
- ortools
- OpenMPI >= 4.0
- UCX library >= 1.6
- HWLOC library >= 2.1.0
- mpi4py
Compute Compatibility | GPU |
---|---|
60 | NVIDIA P100 (Pascal) |
70 | NVIDIA V100 (Volta) |
75 | NVIDIA T4 (Turing) |
80 | NVIDIA A100 (Ampere) |
You can choose to use Docker to simplify the environment setting up. Mare sure that you have installed Nvidia Docker .
To build a docker image of development environment from the corresponding Dockerfile, run the command below. It will install the libraries and tools required to use HugeCTR. HugeCTR build itself must be done by yourself.
$ docker build -t hugectr:devel -f ./tools/dockerfiles/dev.a100.Dockerfile .
Run with interaction mode (mount the home directory of repo into container for easy development and try):
$ docker run --runtime=nvidia --rm -it -u $(id -u):$(id -g) -v $(pwd):/hugectr -w /hugectr hugectr:devel bash
To build a docker image of production environment, run the command below.
In addition to resolving dependencies, it will build and install HugeCTR to /usr/local/hugectr
.
Note that SM
(the target GPU architecture list) and NCCL_A2A
(use NCCL ALL-to-ALL or not) are also specified.
You can change them according to your environment.
$ docker build --build-arg SM="70;75;80" \
--build-arg NCCL_A2A=on \
-t hugectr:build \
-f ./tools/dockerfiles/build.Dockerfile .
Compute Capability can be specified by -DSM=[Compute Compatibilities]
, which is SM70 by default (NVIDIA V100). It is also possible to set multiple Compute Capabilities, e.g., -DSM=70
for NVIDIA V100 and -DSM="70;75"
for both NVIDIA V100 and NVIDIA T4.
$ mkdir -p build
$ cd build
$ cmake -DCMAKE_BUILD_TYPE=Release -DSM=70 .. # Target is NVIDIA V100
$ make -j
If the build type is Debug
, HugeCTR will print more verbose logs and execute GPU tasks in a synchronous manner.
The other options remain the same as Release
build.
$ mkdir -p build
$ cd build
$ cmake -DCMAKE_BUILD_TYPE=Debug -DSM=70 .. # Target is NVIDIA V100
$ make -j
This mode is designed for framework validation. In this mode loss of training will be shown as the average of eval_batches
results. Only one thread and chunk will be used in DataReader. Performance will be lower than turning off.
$ mkdir -p build
$ cd build
$ cmake -DVAL_MODE=ON ..
$ make -j
To run with multi-nodes please build in this way and run HugeCTR with mpirun
. For more details please refer to samples/dcn2nodes
$ mkdir -p build
$ cd build
$ cmake -DENABLE_MULTINODES=ON ..
$ make -j
The default collection communication library used in LocalizedSlotSparseEmbedding is Gossip. NCCL all2all is also supported in HugeCTR. If you want to run with NCCL all2all, please turn on the NCCL_A2A switch in cmake.
$ mkdir -p build
$ cd build
$ cmake -DNCCL_A2A=ON ..
$ make -j
For quick benchmarking and research use, you can generate a synthetic dataset like below. Without any additional modification to JSON file. Both Norm format (with Header) and Raw format (without Header) dataset can be generated with data_generator
.
- For
Norm
format:
$ ./data_generator your_config.json data_folder vocabulary_size max_nnz (num_files) (num_samples_per_file)
$ ./huge_ctr --train your_config.json
- For
Raw
format:
$ ./data_generator your_config.json
$ ./huge_ctr --train your_config.json
Parameters:
data_folder
: Directory where the generated dataset is stored.vocabulary_size
: Total vocabulary size of your target dataset, which cannot be exceedmax_vocabulary_size_per_gpu
x the number of active GPUs.max_nnz
: You can use this parameter to simulate one-/multi-hot encodings. If you just want to use the one-hot encoding, set this parameter to 1. Otherwise, [1, max_nnz] values will be generated for each slot. Note thatmax_nnz * slot_num
must be less thanmax_feature_num_per_sample
in the data layer of the used JSON config file.num_files
: Number of data file will be generated (optional)num_samples_per_file
: Number of samples per file (optional)
In total, there are three types of files used in HugeCTR training: a configuration file, model file and dataset.
Configuration file must be in a json format, e.g., simple_sparse_embedding_fp32.json
There are three main JSON objects in a configuration file: "solver", "optimizer", and "layers". They can be specified in any order.
- solver: the active GPU list, batchsize, model_file, etc are specified.
- optimizer: The type of optimizer and its hyperparameters are specified.
- layers: training/evaluation data (and their paths), embeddings and dense layers are specified. Note that embeddings must precede the dense layers.
Model file is a binary file that will be loaded to initialize weights. In that file, the weights are stored in the same order with the layers in the configuration file.
We provide a tutorial on how to dump a model to TensorFlow. You can find more details on Model file and it format
there.
Two format of data set are supported in HugeCTR:
Norm format consists of a collection of data files and a file list.
The first line of a file list is the number of data files in the dataset. It is followed by the paths to those files.
$ cat simple_sparse_embedding_file_list.txt
10
./simple_sparse_embedding/simple_sparse_embedding0.data
./simple_sparse_embedding/simple_sparse_embedding1.data
./simple_sparse_embedding/simple_sparse_embedding2.data
./simple_sparse_embedding/simple_sparse_embedding3.data
./simple_sparse_embedding/simple_sparse_embedding4.data
./simple_sparse_embedding/simple_sparse_embedding5.data
./simple_sparse_embedding/simple_sparse_embedding6.data
./simple_sparse_embedding/simple_sparse_embedding7.data
./simple_sparse_embedding/simple_sparse_embedding8.data
./simple_sparse_embedding/simple_sparse_embedding9.data
A data file (binary) consists of a header and actual tabular data.
Header Definition:
typedef struct DataSetHeader_ {
long long error_check; // 0: no error check; 1: check_num
long long number_of_records; // the number of samples in this data file
long long label_dim; // dimension of label
long long dense_dim; // dimension of dense feature
long long slot_num; // slot_num for each embedding
long long reserved[3]; // reserved for future use
} DataSetHeader;
Data Definition (each sample):
typedef struct Data_{
int length; // bytes in this sample (optional: only in check_sum mode )
float label[label_dim];
float dense[dense_dim];
Slot slots[slot_num];
char checkbits; // checkbit for this sample (optional: only in checksum mode)
} Data;
typedef struct Slot_{
int nnz;
unsigned int* keys; // changeable to `long long` with `"input_key_type"` in `solver` object of JSON config file.
} Slot;
RAW format is introduced in HugeCTR v2.2
Different to Norm
format, The training Data in RAW
format is all in one binary file and in int32, no matter Label / Dense Feature / Category Features.
The number of Samples / Dense feature / Category feature / label dimension are all declared in the configure json file.
Note that only one-hot data is accepted with this format.
Data Definition (each sample):
typedef struct Data_{
int label[label_dim];
int dense[dense_dim];
int category[sparse_dim];
} Data;
Default coding style follows Google C++ coding style (link).
This project also uses Clang-Format
(link) to help developers to fix style issue, such as indent, number of spaces to tab, etc.
The Clang-Format is a tool that can auto-refactor source code.
Use following instructions to install and enable Clang-Format:
$ sudo apt-get install clang-format
# First, configure Cmake as usual
$ mkdir -p build
$ cd build
$ cmake -DCLANGFORMAT=ON ..
# Second, run Clang-Format
$ cmake --build . --target clangformat
# Third, check what Clang-Format did modify
$ git status
# or
$ git diff
Doxygen is supported in HugeCTR and by default an on-line documentation browser (in HTML) and an off-line reference manual (in LaTeX) can be generated within docs/
.
Within project home
directory
$ doxygen
Merlin HugeCTR is an industry oriented framework and we are keen on making sure that it satisfies your needs and that it provides an overall pleasant experience. If you face any problem or have any question, please file an issue here so that we can discuss it together. We would also be grateful if you have any suggestions or feature requests to enrich HugeCTR. To further advance the Merlin/HugeCTR Roadmap, we would like to invite you to share the gist of your recommender system pipeline via this survey