
[Question] How to serve TF2 SOK model in Triton Inference and convert it to ONNX? #422

Closed
tuanavu opened this issue Sep 25, 2023 · 1 comment
Labels
question Further information is requested

Comments

tuanavu commented Sep 25, 2023

I'm currently working with an existing TensorFlow 2 (TF2) model and SparseOperationKit (SOK). This setup lets me use the SOK SparseEmbedding layer. However, I've found that I have to define the sok_model and the tf_model separately for training.

strategy = tf.distribute.MirroredStrategy()
with strategy.scope():
    # SOK model: the sparse part, built around SOK's SparseEmbedding layer
    sok_model = SOKModel(
        dense_feature_stats=dense_feature_stats,
        trainable_sparse_feature_vocab_dict=trainable_sparse_feature_vocab_dict,
        pretrained_sparse_feature_info_map=pretrained_sparse_feature_info_map,
        dense_dim=dense_dim,
        sparse_dim=sparse_dim,
    )
    # TF model: the dense part, a regular Keras model
    tf_model = TFModel(
        all_feature_names=all_feature_names,
        pretrained_sparse_feature_info_map=pretrained_sparse_feature_info_map,
        dense_dim=dense_dim,
        sparse_dim=sparse_dim,
    )

    # Separate optimizers for the SOK embedding variables and the TF variables
    sok_opt = sok.optimizers.Adam()
    tf_opt = tf.keras.optimizers.Adam()

After training the new TF2 model with SOK, I found that I need to export both the sok_model and the tf_model separately.

# Export the dense model as a TF SavedModel
tf_model.save(save_dir)

# Dump the SOK embedding tables to EmbeddingVariable_*_keys/_values files
saver = sok.Saver()
saver.dump_to_file()

The resulting outputs are as follows:

  • sok_model: this produces a collection of files named EmbeddingVariable_*_keys.file and EmbeddingVariable_*_values.file.
  • tf_model: this exports saved_model.pb and the variables files.

When I need to execute a local test prediction request, I have to load both models independently. I then call the inference_step as follows:

# Restore the SOK embedding tables and load the dense SavedModel
sok_model.load_pretrained_embedding_table()

tf_model = tf.saved_model.load(save_dir)

# Inference step: sok_model looks up the embeddings, tf_model scores them
@tf.function(reduce_retracing=True)  # reduce_retracing supersedes experimental_relax_shapes
def inference_step(inputs):
    return tf_model(sok_model(inputs, training=False), training=False)

# Call inference
res = inference_step(inputs)

Questions

  • Model Serving: My goal is to deploy this model on the Triton Inference Server, and I'm looking for guidance or examples that would streamline this process. What is the ideal structure for the deployment? Would it be beneficial to treat it as an ensemble model that includes both a SOK part and a TensorFlow 2 part? Which backend would be the best choice: HugeCTR, TensorFlow 2, or something else? Any resources or guides would be appreciated. For HugeCTR it seems necessary to export the model graph, and I'm wondering how to do the same with this TensorFlow 2 model that uses the SOK toolkit. (A rough sketch of what I have in mind follows this list.)
  • Model Conversion to ONNX: According to the Hierarchical Parameter Server Demo, HugeCTR can load both the sparse and dense models and convert them into a single ONNX model. How can I perform a similar conversion for this merlin-tensorflow model, which uses the SOK toolkit and exports the sparse and dense models separately? (A partial conversion sketch covering only the dense part also follows below.)
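
For concreteness, here is a rough sketch of what I currently have in mind for the serving question: chaining both parts inside a single Triton Python-backend model.py, mirroring the local inference_step above. The tensor names INPUT/OUTPUT, the SavedModel path, and the build_sok_model() helper are placeholders, and I don't know whether this is the recommended Merlin approach:

# model.py for a Triton Python-backend model (sketch only).
# Assumes SOK and the trained artifacts are available in the backend's
# Python environment; tensor names and paths below are placeholders.
import numpy as np
import tensorflow as tf
import triton_python_backend_utils as pb_utils


class TritonPythonModel:
    def initialize(self, args):
        # Rebuild the SOK embedding model and restore its tables, then load
        # the dense SavedModel, just like the local test above.
        self.sok_model = build_sok_model()  # placeholder helper
        self.sok_model.load_pretrained_embedding_table()
        self.tf_model = tf.saved_model.load("/models/dense/1/model.savedmodel")

    def execute(self, requests):
        responses = []
        for request in requests:
            inputs = pb_utils.get_input_tensor_by_name(request, "INPUT").as_numpy()
            # Chain the two models exactly as in inference_step
            embedded = self.sok_model(inputs, training=False)
            scores = self.tf_model(embedded, training=False)
            out = pb_utils.Tensor("OUTPUT", np.asarray(scores))
            responses.append(pb_utils.InferenceResponse(output_tensors=[out]))
        return responses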

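For the ONNX question, I can already convert the dense tf_model on its own with tf2onnx (sketch below; the input signature and opset are placeholders for whatever the embedded feature tensor actually looks like), but that leaves the SOK embedding tables uncovered, which is the part I'm asking about:

# Sketch: convert only the dense Keras model (tf_model) to ONNX via tf2onnx.
# The input signature is a placeholder for the features produced by sok_model.
import tensorflow as tf
import tf2onnx

spec = (tf.TensorSpec((None, dense_dim + sparse_dim), tf.float32, name="input"),)
model_proto, _ = tf2onnx.convert.from_keras(
    tf_model,
    input_signature=spec,
    opset=13,
    output_path="dense_model.onnx",
)
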
Environment details

  • Merlin version: nvcr.io/nvidia/merlin/merlin-tensorflow:23.02

KingsleyLiu-NV (Collaborator) commented

Hi @tuanavu, there are two solutions for deploying models trained with TF2 SOK:
