In [1]:
# Copyright 2021 NVIDIA Corporation. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================

# Scaling Criteo: Training with HugeCTR
## Overview

HugeCTR is an open-source framework to accelerate the training of CTR estimation models on NVIDIA GPUs. It is written in CUDA C++ and makes heavy use of GPU-accelerated libraries such as cuBLAS, cuDNN, and NCCL.

HugeCTR offers multiple advantages for training deep learning recommender systems:

* Speed: HugeCTR is a highly efficient framework written in C++. We observed speedups of up to 10x. HugeCTR on an NVIDIA DGX A100 system proved to be the fastest commercially available solution for training the Deep Learning Recommendation Model (DLRM) architecture developed by Facebook.
* Scale: HugeCTR supports model-parallel scaling. It distributes large embedding tables over multiple GPUs or multiple nodes.
* Ease of use: HugeCTR provides a Python API similar to Keras. Examples for popular deep learning recommender system architectures (Wide&Deep, DLRM, DCN, DeepFM) are available.

HugeCTR can also train recommender system models with larger-than-memory embedding tables by leveraging a parameter server.
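The Scale point above is about model parallelism: embedding tables are split across devices instead of being replicated on each one. The following is a hypothetical, simplified sketch of two placement policies, not HugeCTR's implementation; the `slot_id % num_gpus` and `key % num_gpus` rules are assumptions chosen for illustration. A slot-wise ("localized") policy keeps every key of one categorical feature on the same GPU, while a key-wise ("distributed") policy spreads the keys of every feature across all GPUs.

```python
# Hypothetical, simplified illustration of model-parallel embedding placement.
# Assumption: "localized" puts each whole slot on GPU slot_id % num_gpus, and
# "distributed" puts each individual key on GPU key % num_gpus.
from collections import defaultdict


def localized_placement(num_slots, num_gpus):
    """Map each GPU id to the list of whole slots it owns."""
    placement = defaultdict(list)
    for slot_id in range(num_slots):
        placement[slot_id % num_gpus].append(slot_id)
    return dict(placement)


def distributed_placement(keys, num_gpus):
    """Map each GPU id to the embedding keys it owns, regardless of slot."""
    placement = defaultdict(list)
    for key in keys:
        placement[key % num_gpus].append(key)
    return dict(placement)


if __name__ == "__main__":
    # 26 categorical slots, as in the Criteo data, spread over 4 GPUs.
    print(localized_placement(num_slots=26, num_gpus=4))
    print(distributed_placement(keys=[3, 8, 15, 42, 1001], num_gpus=4))
```

Both embedding types appear in this notebook: DLRM, DCN, and Wide & Deep use `LocalizedSlotSparseEmbeddingHash`, while DeepFM uses `DistributedSlotSparseEmbeddingHash`.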

You can find more information about HugeCTR [here](https://github.com/NVIDIA/HugeCTR).

### Learning Objectives 
In this notebook, we learn how to use HugeCTR to train recommender system models:

* Use HugeCTR to define a recommender system model
* Train Facebook's [Deep Learning Recommendation Model](https://arxiv.org/pdf/1906.00091.pdf) with HugeCTR
* Train popular [Deep & Cross Network](https://arxiv.org/pdf/1708.05123.pdf) with HugeCTR
* Train Google's [Wide & Deep Network](https://arxiv.org/pdf/1606.07792.pdf) with HugeCTR
* Train [DeepFM: A Factorization-Machine based Neural Network](https://arxiv.org/pdf/1703.04247.pdf) with HugeCTR


## Training with HugeCTR
Because HugeCTR runs the training in CUDA C++, we define the training pipeline and model architecture in a Python script and execute it from the command line. We use HugeCTR's Python API, which is similar to Keras.

If you are not familiar with HugeCTR's Python API and parameters, you can read more in its GitHub repository:
* [HugeCTR User Guide](https://github.com/NVIDIA/HugeCTR/blob/master/docs/hugectr_user_guide.md)
* [HugeCTR Python API](https://github.com/NVIDIA/HugeCTR/blob/master/docs/python_interface.md)
* [HugeCTR example architectures](https://github.com/NVIDIA/HugeCTR/tree/master/samples)
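HugeCTR's Python API wires layers together by name: each `model.add(...)` call reads the tensors listed in `bottom_names` and produces the tensors listed in `top_names`. The hypothetical helper below (not part of HugeCTR) checks such a name-based graph in plain Python, which can catch typos in tensor names before a long training job is launched. The layer list is a shortened, DLRM-like chain, not the full model from this notebook.

```python
def check_graph(inputs, layers):
    """Validate a name-wired layer graph (hypothetical helper, not HugeCTR API).

    inputs: tensor names available before any layer runs.
    layers: (layer_name, bottom_names, top_names) tuples in model.add() order.
    Returns the sorted names of tensors that are produced but never consumed.
    """
    available = set(inputs)
    consumed = set()
    for layer_name, bottoms, tops in layers:
        missing = [b for b in bottoms if b not in available]
        if missing:
            raise ValueError(f"layer {layer_name!r} reads undefined tensor(s): {missing}")
        consumed.update(bottoms)
        for top in tops:
            if top in available:
                raise ValueError(f"tensor {top!r} is produced more than once")
            available.add(top)
    return sorted(available - consumed)


if __name__ == "__main__":
    # Tensor names taken from the DLRM cell; the chain is shortened for brevity.
    inputs = ["label", "dense", "data1"]
    layers = [
        ("SparseEmbedding", ["data1"], ["sparse_embedding1"]),
        ("InnerProduct", ["dense"], ["fc1"]),
        ("ReLU", ["fc1"], ["relu1"]),
        ("Interaction", ["relu1", "sparse_embedding1"], ["interaction1"]),
        ("InnerProduct", ["interaction1"], ["fc2"]),
        ("BinaryCrossEntropyLoss", ["fc2", "label"], ["loss"]),
    ]
    print(check_graph(inputs, layers))  # ['loss']
```

A well-formed graph should leave exactly one unconsumed tensor, the loss.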

#### NOTE:
* In this example with the Criteo dataset, only the DLRM model is used later for inference. The remaining architectures are included to compare their AUC when trained with [SGD](https://arxiv.org/pdf/2003.10409.pdf) as the optimizer; the Wide & Deep example uses [ADAM](https://arxiv.org/pdf/1412.6980.pdf) instead. Feel free to try other optimizers as well.

* If you're training in the Merlin training container from NGC, start it with the following command to avoid warnings and exceptions:
```
docker run --runtime=nvidia --rm -it --cap-add SYS_NICE -v /your/host/dir:/your/container/dir -w /your/container/dir -u $(id -u):$(id -g) nvcr.io/nvidia/merlin/merlin-training:0.6
```
For example, using a local image ID and publishing the container's port 8888 (Jupyter's default) on host port 8000:
```
docker run --gpus=all --rm -it --cap-add SYS_NICE -v /home/nvidia/user:/workspace/user -p 8000:8888 d9cb4e6936ea
```
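The note above compares SGD with Adam. As a rough illustration of how the two differ, here is one parameter update of each in plain Python, following the standard textbook rules (Kingma & Ba for Adam); HugeCTR's GPU kernels may differ in details. `beta1`, `beta2`, and `eps` default to the values used in the Wide & Deep cell; the learning rate in the usage example is an arbitrary value chosen for illustration.

```python
import math


def sgd_step(w, grad, lr):
    """Plain SGD: move against the gradient, scaled by the learning rate."""
    return w - lr * grad


def adam_step(w, grad, m, v, t, lr, beta1=0.9, beta2=0.999, eps=1e-7):
    """One Adam update for a scalar parameter.

    m and v are running averages of the gradient and the squared gradient;
    t is the 1-based step count used for bias correction.
    Returns the updated (w, m, v).
    """
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad * grad
    m_hat = m / (1 - beta1 ** t)  # bias-corrected first moment
    v_hat = v / (1 - beta2 ** t)  # bias-corrected second moment
    w = w - lr * m_hat / (math.sqrt(v_hat) + eps)
    return w, m, v


if __name__ == "__main__":
    lr = 0.01  # illustrative value only
    for grad in (0.5, 50.0):
        w_sgd = sgd_step(1.0, grad, lr)
        w_adam, _, _ = adam_step(1.0, grad, m=0.0, v=0.0, t=1, lr=lr)
        print(f"grad={grad}: SGD -> {w_sgd:.4f}, Adam -> {w_adam:.4f}")
```

On the first step, Adam moves the parameter by roughly `lr` regardless of the gradient's magnitude (to about 0.99 in both cases), while the SGD step scales with the gradient (0.995 versus 0.5). This per-parameter normalization is one reason the two optimizers can reach different AUC values on the same model.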


### 1. DLRM Architecture

We write the code to a ./model.py file and execute it. The script trains the DLRM architecture, saves snapshots that we use for inference in the next notebook, and exports the model graph to ./criteo_hugectr/1/criteo.json.
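In the DLRM cell below, the bottom MLP ends in a 128-unit layer (`fc3`/`relu3`), matching `embedding_vec_size=128`, so the `Interaction` layer can take dot products between the dense feature vector and each of the 26 embeddings. The NumPy sketch below shows the pairwise dot-product interaction described in the DLRM paper. It is an illustration, not HugeCTR's kernel, and the exact output width of HugeCTR's `Interaction` layer may differ (for example, because of padding).

```python
import numpy as np


def dot_interaction(dense_vec, embeddings):
    """DLRM-style feature interaction.

    dense_vec:  (batch, d) output of the bottom MLP.
    embeddings: (batch, num_slots, d), one embedding vector per categorical slot.
    Returns (batch, d + n * (n - 1) / 2) with n = num_slots + 1: the dense
    vector followed by the dot product of every distinct pair of vectors.
    """
    feats = np.concatenate([dense_vec[:, None, :], embeddings], axis=1)  # (batch, n, d)
    gram = feats @ feats.transpose(0, 2, 1)  # (batch, n, n) all pairwise dot products
    rows, cols = np.triu_indices(feats.shape[1], k=1)  # each distinct pair once
    return np.concatenate([dense_vec, gram[:, rows, cols]], axis=1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    dense_vec = rng.standard_normal((4, 128))       # batch of 4, d = 128
    embeddings = rng.standard_normal((4, 26, 128))  # 26 Criteo slots
    out = dot_interaction(dense_vec, embeddings)
    print(out.shape)  # (4, 479): 128 + 27 * 26 / 2
```

The top MLP (`fc4` onward) then consumes this concatenated vector, so explicit second-order interactions between features are available to it without learning them from scratch.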

In [2]:
%%writefile './model.py'
import hugectr
from mpi4py import MPI  # noqa

# Solver: train and evaluate on GPU 0 with a batch size of 2720
solver = hugectr.CreateSolver(
    vvgpu=[[0]],
    max_eval_batches=100,
    batchsize_eval=2720,
    batchsize=2720,
    i64_input_key=True,
    use_mixed_precision=False,
    repeat_dataset=True,
)
optimizer = hugectr.CreateOptimizer(optimizer_type=hugectr.Optimizer_t.SGD)


reader = hugectr.DataReaderParams(
    data_reader_type=hugectr.DataReaderType_t.Parquet,
    source=["/raid/data/criteo/test_dask/output/train/_file_list.txt"],
    eval_source="/raid/data/criteo/test_dask/output/valid/_file_list.txt",
    check_type=hugectr.Check_t.Non,
    slot_size_array=[
        10000000,
        10000000,
        3014529,
        400781,
        11,
        2209,
        11869,
        148,
        4,
        977,
        15,
        38713,
        10000000,
        10000000,
        10000000,
        584616,
        12883,
        109,
        37,
        17177,
        7425,
        20266,
        4,
        7085,
        1535,
        64,
    ],
)
model = hugectr.Model(solver, reader, optimizer)
model.add(
    hugectr.Input(
        label_dim=1,
        label_name="label",
        dense_dim=13,
        dense_name="dense",
        data_reader_sparse_param_array=[hugectr.DataReaderSparseParam("data1", 1, False, 26)],
    )
)
model.add(
    hugectr.SparseEmbedding(
        embedding_type=hugectr.Embedding_t.LocalizedSlotSparseEmbeddingHash,
        workspace_size_per_gpu_in_mb=6000,
        embedding_vec_size=128,
        combiner="sum",
        sparse_embedding_name="sparse_embedding1",
        bottom_name="data1",
        optimizer=optimizer,
    )
)
model.add(
    hugectr.DenseLayer(
        layer_type=hugectr.Layer_t.InnerProduct,
        bottom_names=["dense"],
        top_names=["fc1"],
        num_output=512,
    )
)
model.add(
    hugectr.DenseLayer(layer_type=hugectr.Layer_t.ReLU, bottom_names=["fc1"], top_names=["relu1"])
)
model.add(
    hugectr.DenseLayer(
        layer_type=hugectr.Layer_t.InnerProduct,
        bottom_names=["relu1"],
        top_names=["fc2"],
        num_output=256,
    )
)
model.add(
    hugectr.DenseLayer(layer_type=hugectr.Layer_t.ReLU, bottom_names=["fc2"], top_names=["relu2"])
)
model.add(
    hugectr.DenseLayer(
        layer_type=hugectr.Layer_t.InnerProduct,
        bottom_names=["relu2"],
        top_names=["fc3"],
        num_output=128,
    )
)
model.add(
    hugectr.DenseLayer(layer_type=hugectr.Layer_t.ReLU, bottom_names=["fc3"], top_names=["relu3"])
)
model.add(
    hugectr.DenseLayer(
        layer_type=hugectr.Layer_t.Interaction,
        bottom_names=["relu3", "sparse_embedding1"],
        top_names=["interaction1"],
    )
)
model.add(
    hugectr.DenseLayer(
        layer_type=hugectr.Layer_t.InnerProduct,
        bottom_names=["interaction1"],
        top_names=["fc4"],
        num_output=1024,
    )
)
model.add(
    hugectr.DenseLayer(layer_type=hugectr.Layer_t.ReLU, bottom_names=["fc4"], top_names=["relu4"])
)
model.add(
    hugectr.DenseLayer(
        layer_type=hugectr.Layer_t.InnerProduct,
        bottom_names=["relu4"],
        top_names=["fc5"],
        num_output=1024,
    )
)
model.add(
    hugectr.DenseLayer(layer_type=hugectr.Layer_t.ReLU, bottom_names=["fc5"], top_names=["relu5"])
)
model.add(
    hugectr.DenseLayer(
        layer_type=hugectr.Layer_t.InnerProduct,
        bottom_names=["relu5"],
        top_names=["fc6"],
        num_output=512,
    )
)
model.add(
    hugectr.DenseLayer(layer_type=hugectr.Layer_t.ReLU, bottom_names=["fc6"], top_names=["relu6"])
)
model.add(
    hugectr.DenseLayer(
        layer_type=hugectr.Layer_t.InnerProduct,
        bottom_names=["relu6"],
        top_names=["fc7"],
        num_output=256,
    )
)
model.add(
    hugectr.DenseLayer(layer_type=hugectr.Layer_t.ReLU, bottom_names=["fc7"], top_names=["relu7"])
)
model.add(
    hugectr.DenseLayer(
        layer_type=hugectr.Layer_t.InnerProduct,
        bottom_names=["relu7"],
        top_names=["fc8"],
        num_output=1,
    )
)
model.add(
    hugectr.DenseLayer(
        layer_type=hugectr.Layer_t.BinaryCrossEntropyLoss,
        bottom_names=["fc8", "label"],
        top_names=["loss"],
    )
)
model.compile()
model.summary()
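# Train for 10,000 iterations: log every 1,000, evaluate and save a snapshot every 3,200
model.fit(max_iter=10000, eval_interval=3200, display=1000, snapshot=3200)
# Export the model graph to JSON for inference in the next notebook
model.graph_to_json(graph_config_file="./criteo_hugectr/1/criteo.json")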

Overwriting ./model.py


In [3]:
!python model.py

[28d10h30m09s][HUGECTR][INFO]: Global seed is 2223000078
[28d10h30m09s][HUGECTR][INFO]: Device to NUMA mapping:
  GPU 0 ->  node 0

[28d10h30m10s][HUGECTR][INFO]: Peer-to-peer access cannot be fully enabled.
[28d10h30m10s][HUGECTR][INFO]: Start all2all warmup
[28d10h30m10s][HUGECTR][INFO]: End all2all warmup
[28d10h30m10s][HUGECTR][INFO]: Using All-reduce algorithm OneShot
Device 0: Quadro RTX 8000
[28d10h30m10s][HUGECTR][INFO]: num of DataReader workers: 1
[28d10h30m10s][HUGECTR][INFO]: Vocabulary size: 54120457
[28d10h30m10s][HUGECTR][INFO]: max_vocabulary_size_per_gpu_=12288000
[28d10h30m10s][HUGECTR][INFO]: All2All Warmup Start
[28d10h30m10s][HUGECTR][INFO]: All2All Warmup End
[28d10h30m26s][HUGECTR][INFO]: gpu0 start to init embedding
[28d10h30m26s][HUGECTR][INFO]: gpu0 init embedding done
[28d10h30m26s][HUGECTR][INFO]: Starting AUC NCCL warm-up
[28d10h30m26s][HUGECTR][INFO]: Warm-up done
Label                                   Dense                         Sparse                 
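The log reports `Vocabulary size: 54120457`, which is exactly the sum of `slot_size_array`, and `max_vocabulary_size_per_gpu_=12288000`. Every `max_vocabulary_size_per_gpu_` value printed in this notebook matches `workspace_size_per_gpu_in_mb × 2^20 / (embedding_vec_size × 4)`, rounded down, that is, the number of float32 embedding rows that fit in the workspace. We inferred this relationship from the logs rather than from HugeCTR's documentation, so treat the sketch below as a sizing aid under that assumption.

```python
# Sizing sketch. Assumption (inferred from this notebook's logs): the per-GPU
# embedding table holds workspace MiB / (embedding_vec_size * 4 bytes) rows.
BYTES_PER_FLOAT32 = 4
MIB = 1024 * 1024

# slot_size_array from the DLRM cell above (26 categorical slots)
SLOT_SIZES = [
    10000000, 10000000, 3014529, 400781, 11, 2209, 11869, 148, 4, 977, 15, 38713,
    10000000, 10000000, 10000000, 584616, 12883, 109, 37, 17177, 7425, 20266, 4,
    7085, 1535, 64,
]


def max_vocab_per_gpu(workspace_mb, embedding_vec_size):
    """Number of float32 embedding rows that fit in workspace_mb MiB."""
    return workspace_mb * MIB // (embedding_vec_size * BYTES_PER_FLOAT32)


def workspace_mb_needed(vocab_size, embedding_vec_size):
    """Smallest whole number of MiB that holds vocab_size float32 rows."""
    total_bytes = vocab_size * embedding_vec_size * BYTES_PER_FLOAT32
    return -(-total_bytes // MIB)  # ceiling division


if __name__ == "__main__":
    print(sum(SLOT_SIZES))                            # 54120457, as in the log
    print(max_vocab_per_gpu(6000, 128))               # 12288000 (DLRM)
    print(max_vocab_per_gpu(89, 16))                  # 1458176 (DCN)
    print(max_vocab_per_gpu(61, 11))                  # 1453707 (DeepFM)
    print(workspace_mb_needed(sum(SLOT_SIZES), 128))  # 26427
```

With `workspace_size_per_gpu_in_mb=6000`, the DLRM table has room for about 12.3 million rows on the GPU, while holding all 54,120,457 keys at `embedding_vec_size=128` would take roughly 26,427 MiB.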

### 2. Deep and Cross Network

We write the code to a ./model_dcn.py file and execute it. The following cell defines the Deep & Cross Network architecture.

In [4]:
%%writefile './model_dcn.py'
import hugectr
from mpi4py import MPI


solver = hugectr.CreateSolver(
    vvgpu=[[0]],
    max_eval_batches=100,
    batchsize_eval=2720,
    batchsize=2720,
    i64_input_key=True,
    use_mixed_precision=False,
    repeat_dataset=True,
)

optimizer = hugectr.CreateOptimizer(optimizer_type=hugectr.Optimizer_t.SGD)

reader = hugectr.DataReaderParams(
    data_reader_type=hugectr.DataReaderType_t.Parquet,
    source=["/raid/data/criteo/test_dask/output/train/_file_list.txt"],
    eval_source="/raid/data/criteo/test_dask/output/valid/_file_list.txt",
    check_type=hugectr.Check_t.Sum,
)

model = hugectr.Model(solver, reader, optimizer)
model.add(hugectr.Input(label_dim = 1, label_name = "label",
                        dense_dim = 13, dense_name = "dense",
                        data_reader_sparse_param_array = 
                        [hugectr.DataReaderSparseParam("data1", 2, False, 26)]))
model.add(hugectr.SparseEmbedding(embedding_type = hugectr.Embedding_t.LocalizedSlotSparseEmbeddingHash, 
                            workspace_size_per_gpu_in_mb = 89,
                            embedding_vec_size = 16,
                            combiner = "sum",
                            sparse_embedding_name = "sparse_embedding1",
                            bottom_name = "data1",
                            optimizer = optimizer))
model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.Reshape,
                            bottom_names = ["sparse_embedding1"],
                            top_names = ["reshape1"],
                            leading_dim=416))
model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.Concat,
                            bottom_names = ["reshape1", "dense"], top_names = ["concat1"]))
model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.Slice,
                            bottom_names = ["concat1"],
                            top_names = ["slice11", "slice12"],
                            ranges=[(0,429),(0,429)]))
model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.MultiCross,
                            bottom_names = ["slice11"],
                            top_names = ["multicross1"],
                            num_layers=6))
model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.InnerProduct,
                            bottom_names = ["slice12"],
                            top_names = ["fc1"],
                            num_output=1024))
model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.ReLU,
                            bottom_names = ["fc1"],
                            top_names = ["relu1"]))
model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.Dropout,
                            bottom_names = ["relu1"],
                            top_names = ["dropout1"],
                            dropout_rate=0.5))
model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.InnerProduct,
                            bottom_names = ["dropout1"],
                            top_names = ["fc2"],
                            num_output=1024))
model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.ReLU,
                            bottom_names = ["fc2"],
                            top_names = ["relu2"]))
model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.Dropout,
                            bottom_names = ["relu2"],
                            top_names = ["dropout2"],
                            dropout_rate=0.5))
model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.Concat,
                            bottom_names = ["dropout2", "multicross1"],
                            top_names = ["concat2"]))

model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.InnerProduct,
                            bottom_names = ["concat2"],
                            top_names = ["fc3"],
                            num_output=1))
model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.BinaryCrossEntropyLoss,
                            bottom_names = ["fc3", "label"],
                            top_names = ["loss"]))

model.compile()
model.summary()

# snapshot=1000000 exceeds max_iter, so no snapshot is saved during this run
model.fit(max_iter=1400, eval_interval=200, display=200, snapshot=1000000)


Overwriting ./model_dcn.py


In [5]:
!python model_dcn.py 

[28d10h31m40s][HUGECTR][INFO]: Global seed is 1036292643
[28d10h31m40s][HUGECTR][INFO]: Device to NUMA mapping:
  GPU 0 ->  node 0

[28d10h31m42s][HUGECTR][INFO]: Peer-to-peer access cannot be fully enabled.
[28d10h31m42s][HUGECTR][INFO]: Start all2all warmup
[28d10h31m42s][HUGECTR][INFO]: End all2all warmup
[28d10h31m42s][HUGECTR][INFO]: Using All-reduce algorithm OneShot
Device 0: Quadro RTX 8000
[28d10h31m42s][HUGECTR][INFO]: num of DataReader workers: 1
[28d10h31m42s][HUGECTR][INFO]: Vocabulary size: 0
[28d10h31m42s][HUGECTR][INFO]: max_vocabulary_size_per_gpu_=1458176
[28d10h31m42s][HUGECTR][INFO]: All2All Warmup Start
[28d10h31m42s][HUGECTR][INFO]: All2All Warmup End
[28d10h31m53s][HUGECTR][INFO]: gpu0 start to init embedding
[28d10h31m53s][HUGECTR][INFO]: gpu0 init embedding done
[28d10h31m53s][HUGECTR][INFO]: Starting AUC NCCL warm-up
[28d10h31m53s][HUGECTR][INFO]: Warm-up done
Label                                   Dense                         Sparse                        
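In the DCN cell, the 26 embeddings of size 16 are reshaped into a 416-wide vector (`leading_dim=416`) and concatenated with the 13 dense features, giving 429 inputs. Both `Slice` ranges `(0,429)` therefore select the full vector: one copy feeds the `MultiCross` layer and the other feeds the deep MLP. The NumPy sketch below applies the cross layer from the DCN paper, `x_{l+1} = x_0 * (x_l · w_l) + b_l + x_l`, six times as with `num_layers=6`. We assume HugeCTR's `MultiCross` layer follows this formula; the sketch is not its implementation.

```python
import numpy as np


def cross_network(x0, weights, biases):
    """Stack of DCN cross layers: x_{l+1} = x0 * (x_l . w_l) + b_l + x_l.

    x0: (batch, d) input; weights and biases: lists of (d,) vectors, one per layer.
    The output keeps the input width d.
    """
    x = x0
    for w, b in zip(weights, biases):
        x = x0 * (x @ w)[:, None] + b + x  # (x @ w) is one scalar per example
    return x


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, num_layers = 26 * 16 + 13, 6  # 429 inputs, as in the DCN cell
    x0 = rng.standard_normal((4, d))
    weights = [rng.standard_normal(d) * 0.01 for _ in range(num_layers)]
    biases = [np.zeros(d) for _ in range(num_layers)]
    print(cross_network(x0, weights, biases).shape)  # (4, 429)
```

Each cross layer adds only `2 * d` parameters and keeps the width at `d`, which is why the cross branch can be concatenated directly with the deep branch (`concat2`) before the final `InnerProduct`.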


### 3. Wide & Deep Network
We write the code to a ./model_wdl.py file and execute it. The following cell defines the Wide & Deep architecture.
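In the Wide & Deep cell, the wide part is an embedding with `embedding_vec_size=1`, so each wide key contributes one learned scalar weight, and the `sum` combiner adds them into a linear term (`reshape2`); the deep part is the MLP ending in `fc3`. The `Add` layer sums the two logits before `BinaryCrossEntropyLoss`. The NumPy sketch below shows that combination. Since no `Sigmoid` layer precedes the loss in these cells, we assume HugeCTR's `BinaryCrossEntropyLoss` applies the sigmoid to its input itself; the sketch uses the numerically stable logit form of binary cross-entropy.

```python
import numpy as np


def bce_with_logits(logits, labels):
    """Mean binary cross-entropy of sigmoid(logits) against labels in {0, 1}.

    Uses the stable form max(z, 0) - z * y + log(1 + exp(-|z|)).
    """
    return float(np.mean(np.maximum(logits, 0) - logits * labels
                         + np.log1p(np.exp(-np.abs(logits)))))


def wide_and_deep_loss(deep_logit, wide_logit, labels):
    """Wide & Deep: add the deep and wide logits, then apply BCE."""
    return bce_with_logits(deep_logit + wide_logit, labels)


if __name__ == "__main__":
    labels = np.array([1.0, 0.0, 1.0, 0.0])
    deep_logit = np.array([2.0, -1.0, 0.5, 0.3])
    wide_logit = np.array([0.5, -0.5, -0.2, 0.1])
    print(wide_and_deep_loss(deep_logit, wide_logit, labels))
```

Because the two parts are summed before the loss, they are trained jointly: the wide weights memorize specific feature values while the deep MLP generalizes across them.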

In [6]:
%%writefile './model_wdl.py'
import hugectr
from mpi4py import MPI

solver = hugectr.CreateSolver(
    vvgpu=[[0]],
    max_eval_batches=100,
    batchsize_eval=2720,
    batchsize=2720,
    i64_input_key=True,
    use_mixed_precision=False,
    repeat_dataset=True,
)

# Wide & Deep uses Adam instead of the SGD optimizer used by the other models:
# optimizer = hugectr.CreateOptimizer(optimizer_type=hugectr.Optimizer_t.SGD)
optimizer = hugectr.CreateOptimizer(optimizer_type = hugectr.Optimizer_t.Adam,
                                    update_type = hugectr.Update_t.Global,
                                    beta1 = 0.9,
                                    beta2 = 0.999,
                                    epsilon = 0.0000001)

reader = hugectr.DataReaderParams(
    data_reader_type=hugectr.DataReaderType_t.Parquet,
    source=["/raid/data/criteo/test_dask/output/train/_file_list.txt"],
    eval_source="/raid/data/criteo/test_dask/output/valid/_file_list.txt",
    check_type=hugectr.Check_t.Sum,
)

model = hugectr.Model(solver, reader, optimizer)
model.add(hugectr.Input(label_dim = 1, label_name = "label",
                        dense_dim = 13, dense_name = "dense",
                        data_reader_sparse_param_array = 
                        [hugectr.DataReaderSparseParam("wide_data", 30, True, 1),
                        hugectr.DataReaderSparseParam("deep_data", 2, False, 26)]))
model.add(hugectr.SparseEmbedding(embedding_type = hugectr.Embedding_t.LocalizedSlotSparseEmbeddingHash, 
                            workspace_size_per_gpu_in_mb = 23,
                            embedding_vec_size = 1,
                            combiner = "sum",
                            sparse_embedding_name = "sparse_embedding2",
                            bottom_name = "wide_data",
                            optimizer = optimizer))
model.add(hugectr.SparseEmbedding(embedding_type = hugectr.Embedding_t.LocalizedSlotSparseEmbeddingHash, 
                            workspace_size_per_gpu_in_mb = 358,
                            embedding_vec_size = 16,
                            combiner = "sum",
                            sparse_embedding_name = "sparse_embedding1",
                            bottom_name = "deep_data",
                            optimizer = optimizer))
model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.Reshape,
                            bottom_names = ["sparse_embedding1"],
                            top_names = ["reshape1"],
                            leading_dim=416))
model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.Reshape,
                            bottom_names = ["sparse_embedding2"],
                            top_names = ["reshape2"],
                            leading_dim=1))
model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.Concat,
                            bottom_names = ["reshape1", "dense"],
                            top_names = ["concat1"]))
model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.InnerProduct,
                            bottom_names = ["concat1"],
                            top_names = ["fc1"],
                            num_output=1024))
model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.ReLU,
                            bottom_names = ["fc1"],
                            top_names = ["relu1"]))
model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.Dropout,
                            bottom_names = ["relu1"],
                            top_names = ["dropout1"],
                            dropout_rate=0.5))
model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.InnerProduct,
                            bottom_names = ["dropout1"],
                            top_names = ["fc2"],
                            num_output=1024))
model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.ReLU,
                            bottom_names = ["fc2"],
                            top_names = ["relu2"]))
model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.Dropout,
                            bottom_names = ["relu2"],
                            top_names = ["dropout2"],
                            dropout_rate=0.5))
model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.InnerProduct,
                            bottom_names = ["dropout2"],
                            top_names = ["fc3"],
                            num_output=1))
model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.Add,
                            bottom_names = ["fc3", "reshape2"],
                            top_names = ["add1"]))
model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.BinaryCrossEntropyLoss,
                            bottom_names = ["add1", "label"],
                            top_names = ["loss"]))
model.compile()
model.summary()
model.fit(max_iter=10000, eval_interval=3200, display=1000, snapshot=3200)

Overwriting ./model_wdl.py


In [7]:
!python model_wdl.py 

[28d10h32m03s][HUGECTR][INFO]: Global seed is 4063477066
[28d10h32m03s][HUGECTR][INFO]: Device to NUMA mapping:
  GPU 0 ->  node 0

[28d10h32m04s][HUGECTR][INFO]: Peer-to-peer access cannot be fully enabled.
[28d10h32m04s][HUGECTR][INFO]: Start all2all warmup
[28d10h32m04s][HUGECTR][INFO]: End all2all warmup
[28d10h32m04s][HUGECTR][INFO]: Using All-reduce algorithm OneShot
Device 0: Quadro RTX 8000
[28d10h32m04s][HUGECTR][INFO]: num of DataReader workers: 1
[28d10h32m04s][HUGECTR][INFO]: Vocabulary size: 0
[28d10h32m04s][HUGECTR][INFO]: max_vocabulary_size_per_gpu_=6029312
[28d10h32m04s][HUGECTR][INFO]: All2All Warmup Start
[28d10h32m04s][HUGECTR][INFO]: All2All Warmup End
[28d10h32m04s][HUGECTR][INFO]: max_vocabulary_size_per_gpu_=5865472
[28d10h32m04s][HUGECTR][INFO]: All2All Warmup Start
[28d10h32m04s][HUGECTR][INFO]: All2All Warmup End
[28d10h32m15s][HUGECTR][INFO]: gpu0 start to init embedding
[28d10h32m15s][HUGECTR][INFO]: gpu0 init embedding done
[28d10h32m15s][HUGECTR][INFO]: g

#### Notice how much the AUC changes just by switching the optimizer (Wide & Deep is trained with Adam, the other models with SGD). The highest AUC achieved in these runs is 0.764, with the Wide & Deep Network.
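AUC, the metric reported during evaluation, is the probability that a randomly chosen positive example is scored above a randomly chosen negative one: 0.5 means random ranking and 1.0 means perfect ranking. HugeCTR computes it on the GPU; the plain-Python reference below uses the equivalent rank-sum (Mann-Whitney U) formulation, giving tied scores their average rank, and can help sanity-check predictions exported from a model.

```python
def roc_auc(labels, scores):
    """ROC AUC via the rank-sum (Mann-Whitney U) statistic.

    labels: iterable of 0/1; scores: predicted scores of the same length.
    Tied scores receive the average of the ranks they span.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of equal scores starting at position i.
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC is undefined when only one class is present")
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


if __name__ == "__main__":
    print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

In the usage example, three of the four positive/negative pairs are ordered correctly (only 0.35 < 0.4 is not), which gives 0.75.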

### 4. DeepFM: Factorization Machine based Neural Network
We write the code to a ./model_deepfm.py file and execute it. The following cell defines the DeepFM architecture.
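DeepFM's factorization-machine part models every pair of fields `i, j` through the dot product of their factor vectors, and the second-order sum can be computed in linear time with the identity `sum_{i<j} <v_i, v_j> = 0.5 * ((sum_i v_i)^2 - sum_i v_i^2)`, applied per factor dimension. In the DeepFM cell, `embedding_vec_size=11` is split by `Slice` into 10 factor dimensions and 1 first-order weight per slot. The 26 slots give 260 factor values, and `WeightMultiply` with `weight_dims=[13,10]` gives the 13 dense features 130 more, so `FmOrder2` sees 390 = 39 × 10 values with `out_dim=10`. We assume `FmOrder2` computes the per-dimension term below, which `ReduceSum` then sums; the NumPy sketch is an illustration, not HugeCTR's implementation.

```python
import numpy as np


def fm_order2(v):
    """Per-dimension FM second-order term.

    v: (batch, num_fields, k) factor vectors, one per field.
    Returns (batch, k); summing over k gives sum_{i<j} <v_i, v_j>.
    """
    square_of_sum = v.sum(axis=1) ** 2
    sum_of_square = (v ** 2).sum(axis=1)
    return 0.5 * (square_of_sum - sum_of_square)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    v = rng.standard_normal((2, 39, 10))  # 26 sparse + 13 dense fields, 10 factors
    fm = fm_order2(v)                     # (2, 10), like FmOrder2 with out_dim=10
    print(fm.sum(axis=1))                 # the ReduceSum step
```

The final logit in the cell is then `fc4` (deep part) plus `reducesum1` (second-order FM term) plus `reducesum2` (first-order weights of the 26 slots and 13 dense features).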

In [8]:
%%writefile './model_deepfm.py'
import hugectr
from mpi4py import MPI

solver = hugectr.CreateSolver(
    vvgpu=[[0]],
    max_eval_batches=100,
    batchsize_eval=2720,
    batchsize=2720,
    i64_input_key=True,
    use_mixed_precision=False,
    repeat_dataset=True,
)


optimizer = hugectr.CreateOptimizer(optimizer_type=hugectr.Optimizer_t.SGD)


reader = hugectr.DataReaderParams(
    data_reader_type=hugectr.DataReaderType_t.Parquet,
    source=["/raid/data/criteo/test_dask/output/train/_file_list.txt"],
    eval_source="/raid/data/criteo/test_dask/output/valid/_file_list.txt",
    check_type=hugectr.Check_t.Sum,
)

model = hugectr.Model(solver, reader, optimizer)
model.add(hugectr.Input(label_dim = 1, label_name = "label",
                        dense_dim = 13, dense_name = "dense",
                        data_reader_sparse_param_array = 
                        [hugectr.DataReaderSparseParam("data1", 2, False, 26)]))
model.add(hugectr.SparseEmbedding(embedding_type = hugectr.Embedding_t.DistributedSlotSparseEmbeddingHash, 
                            workspace_size_per_gpu_in_mb = 61,
                            embedding_vec_size = 11,
                            combiner = "sum",
                            sparse_embedding_name = "sparse_embedding1",
                            bottom_name = "data1",
                            optimizer = optimizer))
model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.Reshape,
                            bottom_names = ["sparse_embedding1"],
                            top_names = ["reshape1"],
                            leading_dim=11))
model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.Slice,
                            bottom_names = ["reshape1"],
                            top_names = ["slice11", "slice12"],
                            ranges=[(0,10),(10,11)]))
model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.Reshape,
                            bottom_names = ["slice11"],
                            top_names = ["reshape2"],
                            leading_dim=260))
model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.Reshape,
                            bottom_names = ["slice12"],
                            top_names = ["reshape3"],
                            leading_dim=26))
model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.Slice,
                            bottom_names = ["dense"],
                            top_names = ["slice21", "slice22"],
                            ranges=[(0,13),(0,13)]))
model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.WeightMultiply,
                            bottom_names = ["slice21"],
                            top_names = ["weight_multiply1"],
                            weight_dims= [13,10]))
model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.WeightMultiply,
                            bottom_names = ["slice22"],
                            top_names = ["weight_multiply2"],
                            weight_dims= [13,1]))
model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.Concat,
                            bottom_names = ["reshape2","weight_multiply1"],
                            top_names = ["concat1"]))
model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.Slice,
                            bottom_names = ["concat1"],
                            top_names = ["slice31", "slice32"],
                            ranges=[(0,390),(0,390)]))
model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.InnerProduct,
                            bottom_names = ["slice31"],
                            top_names = ["fc1"],
                            num_output=400))
model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.ReLU,
                            bottom_names = ["fc1"],
                            top_names = ["relu1"]))
model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.Dropout,
                            bottom_names = ["relu1"],
                            top_names = ["dropout1"],
                            dropout_rate=0.5))
model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.InnerProduct,
                            bottom_names = ["dropout1"],
                            top_names = ["fc2"],
                            num_output=400))
model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.ReLU,
                            bottom_names = ["fc2"],
                            top_names = ["relu2"]))
model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.Dropout,
                            bottom_names = ["relu2"],
                            top_names = ["dropout2"],
                            dropout_rate=0.5))
model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.InnerProduct,
                            bottom_names = ["dropout2"],
                            top_names = ["fc3"],
                            num_output=400))
model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.ReLU,
                            bottom_names = ["fc3"],
                            top_names = ["relu3"]))
model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.Dropout,
                            bottom_names = ["relu3"],
                            top_names = ["dropout3"],
                            dropout_rate=0.5))
model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.InnerProduct,
                            bottom_names = ["dropout3"],
                            top_names = ["fc4"],
                            num_output=1))
model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.FmOrder2,
                            bottom_names = ["slice32"],
                            top_names = ["fmorder2"],
                            out_dim=10))
model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.ReduceSum,
                            bottom_names = ["fmorder2"],
                            top_names = ["reducesum1"],
                            axis=1))
model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.Concat,
                            bottom_names = ["reshape3","weight_multiply2"],
                            top_names = ["concat2"]))
model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.ReduceSum,
                            bottom_names = ["concat2"],
                            top_names = ["reducesum2"],
                            axis=1))
model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.Add,
                            bottom_names = ["fc4", "reducesum1", "reducesum2"],
                            top_names = ["add"]))
model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.BinaryCrossEntropyLoss,
                            bottom_names = ["add", "label"],
                            top_names = ["loss"]))
model.compile()
model.summary()
# snapshot=1000000 exceeds max_iter, so no snapshot is saved during this run
model.fit(max_iter = 1200, display = 200, eval_interval = 600, snapshot = 1000000, snapshot_prefix = "deepfm")

Overwriting ./model_deepfm.py


In [9]:
!python model_deepfm.py

[28d10h34m26s][HUGECTR][INFO]: Global seed is 554547979
[28d10h34m26s][HUGECTR][INFO]: Device to NUMA mapping:
  GPU 0 ->  node 0

[28d10h34m27s][HUGECTR][INFO]: Peer-to-peer access cannot be fully enabled.
[28d10h34m27s][HUGECTR][INFO]: Start all2all warmup
[28d10h34m27s][HUGECTR][INFO]: End all2all warmup
[28d10h34m27s][HUGECTR][INFO]: Using All-reduce algorithm OneShot
Device 0: Quadro RTX 8000
[28d10h34m27s][HUGECTR][INFO]: num of DataReader workers: 1
[28d10h34m27s][HUGECTR][INFO]: Vocabulary size: 0
[28d10h34m27s][HUGECTR][INFO]: max_vocabulary_size_per_gpu_=1453707
[28d10h34m33s][HUGECTR][INFO]: gpu0 start to init embedding
[28d10h34m33s][HUGECTR][INFO]: gpu0 init embedding done
[28d10h34m33s][HUGECTR][INFO]: Starting AUC NCCL warm-up
[28d10h34m33s][HUGECTR][INFO]: Warm-up done
Label                                   Dense                         Sparse                        
label                                   dense                          data1                         
(