<img src="http://developer.download.nvidia.com/compute/machine-learning/frameworks/nvidia_logo.png" style="width: 90px; float: right;">

# SOK to HPS DLRM Demo

## Overview

This notebook demonstrates how to train a DLRM model with SparseOperationKit (SOK) and then run inference with HierarchicalParameterServer (HPS). We recommend running [sparse_operation_kit_demo.ipynb](https://github.com/NVIDIA-Merlin/HugeCTR/blob/master/sparse_operation_kit/notebooks/sparse_operation_kit_demo.ipynb) and [hierarchical_parameter_server_demo.ipynb](hierarchical_parameter_server_demo.ipynb) before working through this notebook.

For more details about SOK, please refer to [SOK Documentation](https://nvidia-merlin.github.io/HugeCTR/sparse_operation_kit/master/index.html). For more details about HPS APIs, please refer to [HPS APIs](https://nvidia-merlin.github.io/HugeCTR/hierarchical_parameter_server/master/api/index.html). For more details about HPS per se, please refer to [HugeCTR Hierarchical Parameter Server (HPS)](https://nvidia-merlin.github.io/HugeCTR/master/hugectr_parameter_server.html#hugectr-hierarchical-parameter-server-database-backend).

## Installation

### Get SOK from NGC

Both SOK and HPS Python modules are preinstalled in the 22.08 and later [Merlin TensorFlow Container](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/merlin/containers/merlin-tensorflow): `nvcr.io/nvidia/merlin/merlin-tensorflow:22.08`.

You can check that the required libraries are available by running the following commands after launching the container.

```bash
$ python3 -c "import sparse_operation_kit as sok"
$ python3 -c "import hierarchical_parameter_server as hps"
```
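
The same check can be done from inside a Python session. The following is a minimal sketch that uses only the standard library; the helper name `missing_modules` is hypothetical and not part of SOK or HPS. Note that `importlib.util.find_spec` only confirms that a module can be located, not that it imports without errors.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Module names taken from the import statements used in this notebook.
required = ["sparse_operation_kit", "hierarchical_parameter_server"]
missing = missing_modules(required)
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All required modules are importable.")
```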

## Configurations

First, we specify the required configurations: the arguments for generating the dataset, the model parameters, and the paths for saving the model. We use a DLRM model that has one embedding table, bottom MLP layers, an interaction layer, and top MLP layers. Note that the input to the embedding layer is a sparse key tensor.

In [1]:
import sparse_operation_kit as sok
import sys
sys.path.append("/hugectr/sparse_operation_kit/unit_test/test_scripts/tf2/")
import utils

import os
import numpy as np
import tensorflow as tf
import struct

args = dict()

args["gpu_num"] = 1                               # the number of available GPUs
args["iter_num"] = 10                             # the number of training iterations
args["slot_num"] = 26                             # the number of feature fields in this embedding layer
args["embed_vec_size"] = 16                       # the dimension of embedding vectors
args["dense_dim"] = 13                            # the dimension of dense features
args["global_batch_size"] = 1024                  # the global batch size across all GPUs
args["max_vocabulary_size"] = 260000              # the total number of keys: 26 slots x 10000 keys per slot
args["vocabulary_range_per_slot"] = [[i*10000, (i+1)*10000] for i in range(26)]  # the [low, high) key range of each slot
args["max_nnz"] = 10                              # the maximum number of non-zero keys per slot
args["combiner"] = "mean"                         # how the embedding vectors of one slot are combined

args["ps_config_file"] = "dlrm.json"
args["dense_model_path"] = "dlrm_dense.model"
args["embedding_table_path"] = "dlrm_sparse.model"
args["saved_path"] = "dlrm_tf_saved_model"
args["np_key_type"] = np.int64
args["np_vector_type"] = np.float32
args["tf_key_type"] = tf.int64
args["tf_vector_type"] = tf.float32
args["optimizer"] = "plugin_adam"

os.environ["CUDA_VISIBLE_DEVICES"] = ",".join(map(str, range(args["gpu_num"])))

[INFO]: sparse_operation_kit is imported


In [2]:
def generate_random_samples(num_samples, vocabulary_range_per_slot, max_nnz, dense_dim):
    def generate_sparse_keys(num_samples, vocabulary_range_per_slot, max_nnz, key_dtype = args["np_key_type"]):
        slot_num = len(vocabulary_range_per_slot)
        indices = []
        values = []
        for i in range(num_samples):
            for j in range(slot_num):
                vocab_range = vocabulary_range_per_slot[j]
                nnz = np.random.randint(low=1, high=max_nnz+1)
                entries = sorted(np.random.choice(max_nnz, nnz, replace=False))
                for entry in entries:
                    indices.append([i, j, entry])
                values.extend(np.random.randint(low=vocab_range[0], high=vocab_range[1], size=(nnz, )))
        values = np.array(values, dtype=key_dtype)
        return tf.sparse.SparseTensor(indices = indices,
                                    values = values,
                                    dense_shape = (num_samples, slot_num, max_nnz))

    
    sparse_keys = generate_sparse_keys(num_samples, vocabulary_range_per_slot, max_nnz)
    dense_features = np.random.random((num_samples, dense_dim)).astype(np.float32)
    labels = np.random.randint(low=0, high=2, size=(num_samples, 1))
    return sparse_keys, dense_features, labels

def tf_dataset(sparse_keys, dense_features, labels, batchsize):
    dataset = tf.data.Dataset.from_tensor_slices((sparse_keys, dense_features, labels))
    dataset = dataset.batch(batchsize, drop_remainder=True)
    return dataset
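
The generated sparse keys have shape `(num_samples, slot_num, max_nnz)`, while the training and inference loops in this notebook flatten each batch with `tf.sparse.reshape(sparse_keys, [-1, sparse_keys.shape[-1]])`. The sketch below illustrates, in plain Python and with toy sizes, how such a row-major reshape maps each `[sample, slot, position]` index to a `[sample * slot_num + slot, position]` index. The helper name `flatten_sparse_indices` is hypothetical and is not a TensorFlow API.

```python
def flatten_sparse_indices(indices, slot_num):
    """Map [sample, slot, position] indices to [sample * slot_num + slot, position]."""
    return [[i * slot_num + j, k] for i, j, k in indices]


# Toy example: two samples, three slots, at most four keys per slot.
indices_3d = [[0, 0, 1], [0, 2, 0], [0, 2, 3], [1, 1, 2]]
indices_2d = flatten_sparse_indices(indices_3d, slot_num=3)
print(indices_2d)  # [[0, 1], [2, 0], [2, 3], [4, 2]]
```

After flattening, every row of the sparse tensor holds the keys of one slot of one sample, so a batch of 1024 samples with 26 slots yields 26624 rows.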

## Train with SOK embedding layers

We define the training graph with an SOK embedding layer, `sok.DistributedEmbedding`. After training, we save the weights of the embedding table in the format required by HPS. The dense layers are saved as a separate model graph, which can be loaded directly during inference.

In [3]:
class MLP(tf.keras.layers.Layer):
    def __init__(self,
                arch,
                activation='relu',
                out_activation=None,
                **kwargs):
        super(MLP, self).__init__(**kwargs)
        self.layers = []
        index = 0
        for units in arch[:-1]:
            self.layers.append(tf.keras.layers.Dense(units, activation=activation, name="{}_{}".format(kwargs['name'], index)))
            index+=1
        self.layers.append(tf.keras.layers.Dense(arch[-1], activation=out_activation, name="{}_{}".format(kwargs['name'], index)))

            
    def call(self, inputs, training=True):
        x = self.layers[0](inputs)
        for layer in self.layers[1:]:
            x = layer(x)
        return x

class SecondOrderFeatureInteraction(tf.keras.layers.Layer):
    def __init__(self, self_interaction=False):
        super(SecondOrderFeatureInteraction, self).__init__()
        self.self_interaction = self_interaction

    def call(self, inputs):
        batch_size = tf.shape(inputs)[0]
        num_feas = tf.shape(inputs)[1]

        dot_products = tf.matmul(inputs, inputs, transpose_b=True)

        ones = tf.ones_like(dot_products)
        mask = tf.linalg.band_part(ones, 0, -1)
        out_dim = num_feas * (num_feas + 1) // 2

        if not self.self_interaction:
            mask = mask - tf.linalg.band_part(ones, 0, 0)
            out_dim = num_feas * (num_feas - 1) // 2
        flat_interactions = tf.reshape(tf.boolean_mask(dot_products, mask), (batch_size, out_dim))
        return flat_interactions

class DLRM(tf.keras.models.Model):
    def __init__(self,
                 combiner,
                 max_vocabulary_size_per_gpu,
                 embed_vec_size,
                 slot_num,
                 max_nnz,
                 dense_dim,
                 arch_bot,
                 arch_top,
                 self_interaction,
                 **kwargs):
        super(DLRM, self).__init__(**kwargs)
        
        self.combiner = combiner
        self.max_vocabulary_size_per_gpu = max_vocabulary_size_per_gpu
        self.embed_vec_size = embed_vec_size
        self.slot_num = slot_num
        self.max_nnz = max_nnz
        self.dense_dim = dense_dim
        
        self.embedding_layer = sok.DistributedEmbedding(combiner=self.combiner,
                                                        max_vocabulary_size_per_gpu=self.max_vocabulary_size_per_gpu,
                                                        embedding_vec_size=self.embed_vec_size,
                                                        slot_num=self.slot_num,
                                                        max_nnz=self.max_nnz)
        self.bot_nn = MLP(arch_bot, name = "bottom", out_activation='relu')
        self.top_nn = MLP(arch_top, name = "top", out_activation='sigmoid')
        self.interaction_op = SecondOrderFeatureInteraction(self_interaction)
        if self_interaction:
            self.interaction_out_dim = (self.slot_num+1) * (self.slot_num+2) // 2
        else:
            self.interaction_out_dim = self.slot_num * (self.slot_num+1) // 2
        self.reshape_layer1 = tf.keras.layers.Reshape((1, arch_bot[-1]), name = "reshape1")
        self.concat1 = tf.keras.layers.Concatenate(axis=1, name = "concat1")
        self.concat2 = tf.keras.layers.Concatenate(axis=1, name = "concat2")
            
    def call(self, inputs, training=True):
        input_cat = inputs[0]
        input_dense = inputs[1]
        
        embedding_vector = self.embedding_layer(input_cat, training=training)
        dense_x = self.bot_nn(input_dense)
        concat_features = self.concat1([embedding_vector, self.reshape_layer1(dense_x)])
        
        Z = self.interaction_op(concat_features)
        z = self.concat2([dense_x, Z])
        logit = self.top_nn(z)
        return logit, embedding_vector

    def summary(self):
        inputs = [tf.keras.Input(shape=(self.max_nnz, ), sparse=True, dtype=args["tf_key_type"]), 
                  tf.keras.Input(shape=(self.dense_dim, ), dtype=tf.float32)]
        model = tf.keras.models.Model(inputs=inputs, outputs=self.call(inputs))
        return model.summary()
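
As a sanity check on the layer sizes, the following plain-Python sketch recomputes the interaction output dimension and the parameter counts of the two MLPs from the configuration values used in this notebook. The helper names are hypothetical. With 26 slots plus one bottom-MLP output there are 27 feature vectors, hence 27 * 26 / 2 = 351 pairwise interactions when self-interaction is disabled. The results agree with the `bottom (MLP)` entry (38544) and the `Total params: 165,777` line in the model summaries printed in this notebook.

```python
def interaction_out_dim(num_feas, self_interaction=False):
    """Number of pairwise dot products kept by the interaction layer."""
    if self_interaction:
        return num_feas * (num_feas + 1) // 2
    return num_feas * (num_feas - 1) // 2


def mlp_param_count(input_dim, arch):
    """Weights plus biases of a stack of fully connected layers."""
    total = 0
    for units in arch:
        total += input_dim * units + units
        input_dim = units
    return total


slot_num, embed_vec_size, dense_dim = 26, 16, 13
num_feas = slot_num + 1                       # 26 embeddings + 1 bottom-MLP output
pairs = interaction_out_dim(num_feas)         # 351
top_input_dim = embed_vec_size + pairs        # 367, dense_x concatenated with interactions
bottom_params = mlp_param_count(dense_dim, [256, 128, embed_vec_size])
top_params = mlp_param_count(top_input_dim, [256, 128, 1])
print(pairs, top_input_dim, bottom_params, bottom_params + top_params)
# 351 367 38544 165777
```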

In [4]:
def train(args):
    dlrm = DLRM(combiner = args["combiner"], 
                max_vocabulary_size_per_gpu = args["max_vocabulary_size"] // args["gpu_num"],
                embed_vec_size = args["embed_vec_size"],
                slot_num = args["slot_num"],
                max_nnz = args["max_nnz"],
                dense_dim = args["dense_dim"],
                arch_bot = [256, 128, args["embed_vec_size"]],
                arch_top = [256, 128, 1],
                self_interaction = False)

    emb_opt = utils.get_embedding_optimizer(args["optimizer"])(learning_rate=0.1)
    dense_opt = utils.get_dense_optimizer(args["optimizer"])(learning_rate=0.1)

    init_tensors = np.ones(shape=[args["max_vocabulary_size"], args["embed_vec_size"]], dtype=args["np_vector_type"])
    embedding_saver = sok.Saver()
    embedding_saver.load_embedding_values(dlrm.embedding_layer.embedding_variable, init_tensors)

    loss_fn = tf.keras.losses.BinaryCrossentropy(from_logits=True)

    @tf.function
    def _train_step(inputs, labels):
        with tf.GradientTape() as tape:
            logit, embedding_vector = dlrm(inputs, training=True)
            loss = loss_fn(labels, logit)
        embedding_variables, other_variable = sok.split_embedding_variable_from_others(dlrm.trainable_variables)
        grads, emb_grads = tape.gradient(loss, [other_variable, embedding_variables])
        if 'plugin' not in args["optimizer"]:
            with sok.OptimizerScope(embedding_variables):
                emb_opt.apply_gradients(zip(emb_grads, embedding_variables),
                                        experimental_aggregate_gradients=False)
        else:
            emb_opt.apply_gradients(zip(emb_grads, embedding_variables),
                                    experimental_aggregate_gradients=False)
        dense_opt.apply_gradients(zip(grads, other_variable))
        return logit, embedding_vector, loss

    sparse_keys, dense_features, labels = generate_random_samples(args["global_batch_size"]  * args["iter_num"], args["vocabulary_range_per_slot"], args["max_nnz"], args["dense_dim"])
    dataset = tf_dataset(sparse_keys, dense_features, labels, args["global_batch_size"])
    for i, (sparse_keys, dense_features, labels) in enumerate(dataset):
        sparse_keys = tf.sparse.reshape(sparse_keys, [-1, sparse_keys.shape[-1]])
        inputs = [sparse_keys, dense_features]
        logit, embedding_vector, loss = _train_step(inputs, labels)
        print("-"*20, "Step {}, loss: {}".format(i, loss),  "-"*20)
    return dlrm, embedding_saver

In [5]:
sok.Init(global_batch_size=args["global_batch_size"])
trained_model, embedding_saver = train(args)
trained_model.summary()

2022-07-29 07:16:16.793169: I tensorflow/core/platform/cpu_feature_guard.cc:152] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  SSE3 SSE4.1 SSE4.2 AVX
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-07-29 07:16:17.323141: W tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:39] Overriding allow_growth setting because the TF_FORCE_GPU_ALLOW_GROWTH environment variable is set. Original config value was 0.
2022-07-29 07:16:17.323214: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 30997 MB memory:  -> device: 0, name: Tesla V100-SXM2-32GB, pci bus id: 0000:06:00.0, compute capability: 7.0


2022-07-29 07:16:17.078977: I sparse_operation_kit/kit_cc/kit_cc_infra/src/resources/manager.cc:107] Mapping from local_replica_id to device_id:
2022-07-29 07:16:17.078977: I sparse_operation_kit/kit_cc/kit_cc_infra/src/resources/manager.cc:109] 0 -> 0
2022-07-29 07:16:17.078977: I sparse_operation_kit/kit_cc/kit_cc_infra/src/resources/manager.cc:84] Global seed is 4287744788
2022-07-29 07:16:17.078977: I sparse_operation_kit/kit_cc/kit_cc_infra/src/resources/manager.cc:85] Local GPU Count: 1
2022-07-29 07:16:17.078977: I sparse_operation_kit/kit_cc/kit_cc_infra/src/resources/manager.cc:86] Global GPU Count: 1
2022-07-29 07:16:17.078977: I sparse_operation_kit/kit_cc/kit_cc_infra/src/resources/manager.cc:127] Global Replica Id: 0; Local Replica Id: 0
2022-07-29 07:16:17.078977: I sparse_operation_kit/kit_cc/kit_cc_infra/src/parameters/raw_manager.cc:132] Created embedding variable whose name is EmbeddingVariable
2022-07-29 07:16:17.078977: I sparse_operation_kit/kit_cc/kit_cc_infra/src

  return dispatch_target(*args, **kwargs)


-------------------- Step 0, loss: 0.9379717111587524 --------------------
-------------------- Step 1, loss: 12726.013671875 --------------------
-------------------- Step 2, loss: 73.78772735595703 --------------------
-------------------- Step 3, loss: 71.33247375488281 --------------------
-------------------- Step 4, loss: 33.48320770263672 --------------------
-------------------- Step 5, loss: 234.79978942871094 --------------------
-------------------- Step 6, loss: 1.6663873195648193 --------------------
-------------------- Step 7, loss: 30.426162719726562 --------------------
-------------------- Step 8, loss: 2.430748462677002 --------------------
-------------------- Step 9, loss: 4.768443584442139 --------------------
Model: "model"
__________________________________________________________________________________________________
 Layer (type)                   Output Shape         Param #     Connected to                     
 input_2 (InputLayer)           [(None, 13)] 

In [6]:
dense_model = tf.keras.Model([trained_model.get_layer("distributed_embedding").output,
                             trained_model.get_layer("bottom").input],
                             trained_model.get_layer("top").output)
dense_model.summary()
dense_model.save(args["dense_model_path"])

Model: "model_1"
__________________________________________________________________________________________________
 Layer (type)                   Output Shape         Param #     Connected to                     
 input_2 (InputLayer)           [(None, 13)]         0           []                               
                                                                                                  
 bottom (MLP)                   (None, 16)           38544       ['input_2[0][0]']                
                                                                                                  
 input_3 (InputLayer)           [(None, 26, 16)]     0           []                               
                                                                                                  
 reshape1 (Reshape)             (None, 1, 16)        0           ['bottom[1][0]']                 
                                                                                            

2022-07-29 07:16:56.089529: W tensorflow/python/util/util.cc:368] Sets are not currently considered sequences, but this may change in the future, so consider avoiding using them.


INFO:tensorflow:Assets written to: dlrm_dense.model/assets




In [7]:
!mkdir -p dlrm_sparse.model
embedding_saver.dump_to_file(trained_model.embedding_layer.embedding_variable, args["embedding_table_path"])
!mv dlrm_sparse.model/EmbeddingVariable_keys.file dlrm_sparse.model/key
!mv dlrm_sparse.model/EmbeddingVariable_values.file dlrm_sparse.model/emb_vector
!ls -l dlrm_sparse.model

2022-07-29 07:17:01.079021: I sparse_operation_kit/kit_cc/kit_cc_infra/src/parameters/raw_manager.cc:192] Saving EmbeddingVariable to dlrm_sparse.model..
2022-07-29 07:17:01.079021: I sparse_operation_kit/kit_cc_impl/embedding/common/src/dumping_functions.cc:60] Worker: 0, GPU: 0 key-index count = 260000
2022-07-29 07:17:01.079021: I sparse_operation_kit/kit_cc_impl/embedding/common/src/dumping_functions.cc:147] Worker: 0, GPU: 0: dumping parameters from hashtable..
2022-07-29 07:17:01.079021: I sparse_operation_kit/kit_cc/kit_cc_infra/src/parameters/raw_manager.cc:200] Saved EmbeddingVariable to dlrm_sparse.model.
total 18360
-rw-r--r-- 1 nobody nogroup 16640000 Jul 29 07:17 emb_vector
-rw-r--r-- 1 nobody nogroup  2080000 Jul 29 07:17 key
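
The file sizes in the listing are consistent with raw binary arrays: 260000 keys × 8 bytes (`int64`) = 2,080,000 bytes for `key`, and 260000 × 16 × 4 bytes (`float32`) = 16,640,000 bytes for `emb_vector`. The sketch below round-trips a tiny table through that layout with the standard `struct` module. It assumes little-endian, tightly packed data, which is inferred from the sizes above rather than taken from the HPS documentation, and the helper names are hypothetical.

```python
import os
import struct
import tempfile


def write_sparse_model(folder, keys, vectors):
    """Write keys as int64 and vectors as float32, both tightly packed (assumed layout)."""
    os.makedirs(folder, exist_ok=True)
    with open(os.path.join(folder, "key"), "wb") as f:
        f.write(struct.pack("<{}q".format(len(keys)), *keys))
    with open(os.path.join(folder, "emb_vector"), "wb") as f:
        for vec in vectors:
            f.write(struct.pack("<{}f".format(len(vec)), *vec))


def read_sparse_model(folder, embed_vec_size):
    """Read the two files back into a {key: vector} dictionary."""
    with open(os.path.join(folder, "key"), "rb") as f:
        raw_keys = f.read()
    with open(os.path.join(folder, "emb_vector"), "rb") as f:
        raw_vecs = f.read()
    num_keys = len(raw_keys) // 8
    keys = struct.unpack("<{}q".format(num_keys), raw_keys)
    flat = struct.unpack("<{}f".format(num_keys * embed_vec_size), raw_vecs)
    return {
        key: list(flat[i * embed_vec_size:(i + 1) * embed_vec_size])
        for i, key in enumerate(keys)
    }


# Round trip with a toy table of two keys and 2-dimensional vectors.
with tempfile.TemporaryDirectory() as tmp:
    write_sparse_model(tmp, [3, 7], [[1.0, 2.0], [0.5, 0.25]])
    table = read_sparse_model(tmp, embed_vec_size=2)
print(table)  # {3: [1.0, 2.0], 7: [0.5, 0.25]}
```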


## Create the inference graph with HPS SparseLookupLayer

To use HPS in the inference stage, we need to create an inference model graph that is almost the same as the training graph, except that `sok.DistributedEmbedding` is replaced by `hps.SparseLookupLayer`. The trained dense model graph can be loaded directly, while the weights of the embedding table are retrieved by HPS from the folder `dlrm_sparse.model`.

We can then save the inference model graph, which will be ready to be loaded for inference deployment.
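
Conceptually, a sparse lookup with the `mean` combiner averages the embedding vectors of all keys in each row of the flattened sparse input, and the result is then reshaped to `[batch, slot_num, embed_vec_size]`. The following plain-Python sketch illustrates that computation with a toy table. It is a reference illustration only, not the HPS implementation; the function names and the fallback for missing keys are assumptions.

```python
def mean_combine_lookup(rows, table, embed_vec_size, default_value=0.0):
    """Average the vectors of the keys in each row; unknown keys use a default vector (assumed)."""
    combined = []
    for keys in rows:
        acc = [0.0] * embed_vec_size
        for key in keys:
            vec = table.get(key, [default_value] * embed_vec_size)
            acc = [a + v for a, v in zip(acc, vec)]
        count = max(len(keys), 1)  # avoid dividing by zero for an empty row
        combined.append([a / count for a in acc])
    return combined


def reshape_to_slots(vectors, slot_num):
    """Group consecutive rows so the result has shape [batch, slot_num, vec]."""
    return [vectors[i:i + slot_num] for i in range(0, len(vectors), slot_num)]


# Toy table with 2-dimensional vectors; two samples with two slots each.
table = {0: [1.0, 3.0], 1: [3.0, 5.0], 10: [2.0, 2.0]}
rows = [[0, 1], [10], [1], [10, 10]]
combined = mean_combine_lookup(rows, table, embed_vec_size=2)
embeddings = reshape_to_slots(combined, slot_num=2)
print(embeddings)
# [[[2.0, 4.0], [2.0, 2.0]], [[3.0, 5.0], [2.0, 2.0]]]
```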

In [8]:
import hierarchical_parameter_server as hps

class InferenceModel(tf.keras.models.Model):
    def __init__(self,
                 slot_num,
                 embed_vec_size,
                 max_nnz,
                 dense_dim,
                 dense_model_path,
                 **kwargs):
        super(InferenceModel, self).__init__(**kwargs)
        
        self.slot_num = slot_num
        self.embed_vec_size = embed_vec_size
        self.max_nnz = max_nnz
        self.dense_dim = dense_dim
        
        self.sparse_lookup_layer = hps.SparseLookupLayer(model_name = "dlrm", 
                                            table_id = 0,
                                            emb_vec_size = self.embed_vec_size,
                                            emb_vec_dtype = args["tf_vector_type"])
        self.dense_model = tf.keras.models.load_model(dense_model_path)
    
    def call(self, inputs):
        input_cat = inputs[0]
        input_dense = inputs[1]

        embeddings = tf.reshape(self.sparse_lookup_layer(sp_ids=input_cat, sp_weights = None, combiner="mean"),
                                shape=[-1, self.slot_num, self.embed_vec_size])
        logit = self.dense_model([embeddings, input_dense])
        return logit, embeddings

    def summary(self):
        inputs = [tf.keras.Input(shape=(self.max_nnz, ), sparse=True, dtype=args["tf_key_type"]), 
                  tf.keras.Input(shape=(self.dense_dim, ), dtype=tf.float32)]
        model = tf.keras.models.Model(inputs=inputs, outputs=self.call(inputs))
        return model.summary()

[INFO] hierarchical_parameter_server is imported


In [9]:
def create_and_save_inference_graph(args): 
    model = InferenceModel(args["slot_num"], args["embed_vec_size"], args["max_nnz"], args["dense_dim"], args["dense_model_path"])
    model.summary()
    inputs = [tf.keras.Input(shape=(args["max_nnz"], ), sparse=True, dtype=args["tf_key_type"]), 
              tf.keras.Input(shape=(args["dense_dim"], ), dtype=tf.float32)]
    _, _ = model(inputs)
    model.save(args["saved_path"])

In [10]:
create_and_save_inference_graph(args)

2022-07-29 07:24:43.911439: I tensorflow/core/platform/cpu_feature_guard.cc:151] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-07-29 07:24:44.490542: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 30989 MB memory:  -> device: 0, name: Tesla V100-SXM2-32GB, pci bus id: 0000:06:00.0, compute capability: 7.0


Model: "model"
__________________________________________________________________________________________________
 Layer (type)                   Output Shape         Param #     Connected to                     
 input_1 (InputLayer)           [(None, 10)]         0           []                               
                                                                                                  
 sparse_lookup_layer (SparseLoo  (None, 16)          0           ['input_1[0][0]']                
 kupLayer)                                                                                        
                                                                                                  
 tf.reshape (TFOpLambda)        (None, 26, 16)       0           ['sparse_lookup_layer[0][0]']    
                                                                                                  
 input_2 (InputLayer)           [(None, 13)]         0           []                           

2022-07-29 07:24:48.043599: W tensorflow/python/util/util.cc:368] Sets are not currently considered sequences, but this may change in the future, so consider avoiding using them.


INFO:tensorflow:Assets written to: dlrm_tf_saved_model/assets




## Inference with saved model graph

To initialize the lookup service provided by HPS, we also need to create a JSON configuration file that specifies the details of the embedding tables for the models to be deployed. Here we deploy a single DLRM model with one embedding table, although HPS can serve multiple models with multiple embedding tables. Note how `maxnum_catfeature_query_per_table_per_sample` is specified for the embedding table: `max_nnz` is 10 for every slot and there are 26 slots, so this entry is set to 26 × 10 = 260.

We first call `hps.Init` to do the necessary initialization work, and then load the saved model graph to run inference. Finally, we peek at the keys and the embedding vectors of the last inference batch.
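
To keep the configuration consistent with the training arguments, the JSON content can also be derived from them instead of being typed by hand. The sketch below is one way to do that, assuming the same keys and values as the `dlrm.json` used in this notebook; the helper `build_hps_config` is hypothetical and not part of the HPS API.

```python
import json


def build_hps_config(model_name, sparse_dir, slot_num, max_nnz,
                     embed_vec_size, max_batch_size):
    """Build an HPS configuration dictionary for one model with one embedding table."""
    return {
        "supportlonglong": True,
        "models": [{
            "model": model_name,
            "sparse_files": [sparse_dir],
            "num_of_worker_buffer_in_pool": 3,
            "embedding_table_names": ["sparse_embedding0"],
            "embedding_vecsize_per_table": [embed_vec_size],
            # Each sample can query at most slot_num * max_nnz keys from the table.
            "maxnum_catfeature_query_per_table_per_sample": [slot_num * max_nnz],
            "default_value_for_each_table": [1.0],
            "deployed_device_list": [0],
            "max_batch_size": max_batch_size,
            "cache_refresh_percentage_per_iteration": 0.2,
            "hit_rate_threshold": 1.0,
            "gpucacheper": 1.0,
            "gpucache": True,
        }],
    }


config = build_hps_config("dlrm", "dlrm_sparse.model", slot_num=26,
                          max_nnz=10, embed_vec_size=16, max_batch_size=1024)
print(json.dumps(config, indent=4))
```

The printed text can be written to the path stored in `args["ps_config_file"]` with `json.dump` if a generated file is preferred over the `%%writefile` cell.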

In [11]:
%%writefile dlrm.json
{
    "supportlonglong": true,
    "models": [{
        "model": "dlrm",
        "sparse_files": ["dlrm_sparse.model"],
        "num_of_worker_buffer_in_pool": 3,
        "embedding_table_names":["sparse_embedding0"],
        "embedding_vecsize_per_table": [16],
        "maxnum_catfeature_query_per_table_per_sample": [260],
        "default_value_for_each_table": [1.0],
        "deployed_device_list": [0],
        "max_batch_size": 1024,
        "cache_refresh_percentage_per_iteration": 0.2,
        "hit_rate_threshold": 1.0,
        "gpucacheper": 1.0,
        "gpucache": true
        }
    ]
}

Overwriting dlrm.json


In [12]:
def inference_with_saved_model(args):
    hps.Init(global_batch_size = args["global_batch_size"],
             ps_config_file = args["ps_config_file"])
    model = tf.keras.models.load_model(args["saved_path"])
    model.summary()
    def _infer_step(inputs, labels):
        logit, embeddings = model(inputs)
        return logit, embeddings
    
    embeddings_peek = list()
    inputs_peek = list()
    
    sparse_keys, dense_features, labels = generate_random_samples(args["global_batch_size"]  * args["iter_num"], args["vocabulary_range_per_slot"], args["max_nnz"], args["dense_dim"])
    dataset = tf_dataset(sparse_keys, dense_features, labels, args["global_batch_size"])
    for i, (sparse_keys, dense_features, labels) in enumerate(dataset):
        sparse_keys = tf.sparse.reshape(sparse_keys, [-1, sparse_keys.shape[-1]])
        inputs = [sparse_keys, dense_features]
        logit, embeddings = _infer_step(inputs, labels)
        embeddings_peek.append(embeddings)
        inputs_peek.append(inputs)
        print("-"*20, "Step {}".format(i),  "-"*20)
    return embeddings_peek, inputs_peek

In [13]:
embeddings_peek, inputs_peek = inference_with_saved_model(args)

# Peek at the last batch: the input keys are stored in a SparseTensor
print(inputs_peek[-1][0].values)
print(embeddings_peek[-1])

[HCTR][07:24:53.183][INFO][RK0][main]: dense_file is not specified using default: 
[HCTR][07:24:53.183][INFO][RK0][main]: num_of_refresher_buffer_in_pool is not specified using default: 1
[HCTR][07:24:53.183][INFO][RK0][main]: maxnum_des_feature_per_sample is not specified using default: 26
[HCTR][07:24:53.183][INFO][RK0][main]: refresh_delay is not specified using default: 0
[HCTR][07:24:53.183][INFO][RK0][main]: refresh_interval is not specified using default: 0
[HCTR][07:24:53.184][INFO][RK0][main]: Creating HashMap CPU database backend...
[HCTR][07:24:53.184][INFO][RK0][main]: Volatile DB: initial cache rate = 1
[HCTR][07:24:53.184][INFO][RK0][main]: Volatile DB: cache missed embeddings = 0
[HCTR][07:24:53.682][INFO][RK0][main]: Table: hps_et.dlrm.sparse_embedding0; cached 260000 / 260000 embeddings in volatile database (PreallocatedHashMapBackend); load: 260000 / 18446744073709551615 (0.00%).
[HCTR][07:24:53.682][DEBUG][RK0][main]: Real-time subscribers created!
[HCTR][07:24:53.68



Model: "inference_model"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 sparse_lookup_layer (Sparse  multiple                 0         
 LookupLayer)                                                    
                                                                 
 model_1 (Functional)        (None, 1)                 165777    
                                                                 
Total params: 165,777
Trainable params: 165,777
Non-trainable params: 0
_________________________________________________________________
-------------------- Step 0 --------------------
-------------------- Step 1 --------------------
-------------------- Step 2 --------------------
-------------------- Step 3 --------------------
-------------------- Step 4 --------------------
-------------------- Step 5 --------------------
-------------------- Step 6 --------------------
-------------------- Step 7 ----