Embedding_bag operator on GPU #3319

rishucoding · 2023-09-13T15:55:14Z

Hello,

NVIDIA's MLPerf submissions suggest using the TensorRT framework for performant inference deployment. For DLRM (Deep Learning Recommendation Model) inference on GPU, I have the following questions:

Does TensorRT modify the backend (CUDA/C++ source code) of the embedding bag operator, or does it use the exact same vanilla PyTorch CUDA kernels?
What are the benefits of using vanilla PyTorch over TensorRT for DLRM inference?

Please let me know your comments. Thanks.

zerollzeng · 2023-09-17T12:50:46Z

@nvpohanh ^ ^

nvpohanh · 2023-09-18T03:28:00Z

For the Gather operation, TRT generates the kernel dynamically and tries to fuse it with other pointwise operations where possible. That means we do not use the same Gather kernels as PyTorch does.
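For readers following along, an embedding bag lookup can be thought of as a gather of table rows followed by a per-bag reduction, which is why the Gather kernel is the relevant one here. Below is a minimal pure-Python sketch of that view, assuming sum pooling and PyTorch-style `offsets` semantics. The function name `embedding_bag_sum` is hypothetical, and this is neither TensorRT's generated kernel nor PyTorch's CUDA kernel.

```python
def embedding_bag_sum(table, indices, offsets):
    """Sum-pooled embedding bag: gather rows, then reduce per bag.

    table   -- list of embedding rows (each a list of floats)
    indices -- flat list of row ids for all bags
    offsets -- start position of each bag inside `indices`
    (Illustrative helper only; name and signature are assumptions.)
    """
    dim = len(table[0])
    out = []
    for b, start in enumerate(offsets):
        # A bag ends where the next one starts, or at the end of `indices`.
        end = offsets[b + 1] if b + 1 < len(offsets) else len(indices)
        pooled = [0.0] * dim
        for idx in indices[start:end]:
            row = table[idx]            # the gather step
            for d in range(dim):
                pooled[d] += row[d]     # the per-bag sum reduction
        out.append(pooled)
    return out


table = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
# Two bags: rows {0, 2} and row {1}.
result = embedding_bag_sum(table, indices=[0, 2, 1], offsets=[0, 2])
```

Here `result` holds one pooled vector per bag. A GPU implementation would parallelize the loops, but the data flow is the same.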

nvpohanh · 2023-09-18T03:29:22Z

> What are the benefits of using vanilla PyTorch over TensorRT for DLRM inference?

Our MLPerf-Inference submission uses TensorRT for the DLRM benchmark: https://github.com/mlcommons/inference_results_v3.1/tree/main/closed/NVIDIA

Using TensorRT allows more aggressive fusions like Gemm+Pointwise fusions.
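As a rough illustration of what a Gemm+Pointwise fusion means, the sketch below computes `relu(x @ w + b)` two ways: once by materializing the full GEMM output and then applying the pointwise ops, and once by applying them as each output element is produced. It is plain Python under assumed names (`gemm_then_relu`, `fused_gemm_relu`) and only shows the idea; it is not how TensorRT implements its fusions.

```python
def gemm_then_relu(x, w, b):
    """Unfused: build the whole GEMM result, then apply bias + ReLU."""
    rows, inner, cols = len(x), len(w), len(w[0])
    # Intermediate buffer holding the raw matrix product.
    y = [[sum(x[i][k] * w[k][j] for k in range(inner)) for j in range(cols)]
         for i in range(rows)]
    return [[max(0.0, y[i][j] + b[j]) for j in range(cols)]
            for i in range(rows)]


def fused_gemm_relu(x, w, b):
    """Fused: apply bias + ReLU while each output element is computed."""
    rows, inner, cols = len(x), len(w), len(w[0])
    out = []
    for i in range(rows):
        out_row = []
        for j in range(cols):
            acc = b[j]                       # start the accumulator at the bias
            for k in range(inner):
                acc += x[i][k] * w[k][j]
            out_row.append(max(0.0, acc))    # ReLU applied before the write
        out.append(out_row)
    return out


x = [[1.0, 2.0]]
w = [[1.0, -1.0], [1.0, -1.0]]
b = [0.5, 0.5]
unfused = gemm_then_relu(x, w, b)
fused = fused_gemm_relu(x, w, b)
```

Both functions return the same values for this input; the fused version simply avoids writing and re-reading the intermediate matrix, which on a GPU would be a round trip through memory.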

ttyio · 2023-10-10T20:12:08Z

Closing since there has been no activity for more than 3 weeks. Thanks, all!

rishucoding · 2024-02-08T18:18:54Z

Thanks @nvpohanh for the comments. Could you share the source code for the TRT implementation of the Gather kernel used in the embedding stage for DLRMs? Also, could you compare the TRT Gather kernel with the PyTorch embedding stage CUDA kernel (link)?

zerollzeng · 2024-02-13T07:53:56Z

@nvpohanh ^ ^

rishucoding · 2024-06-16T00:18:59Z

Hi -- could you please share your comments on my follow-up question? Thanks.

zerollzeng assigned nvpohanh Sep 17, 2023

zerollzeng added the triaged label (Issue has been triaged by maintainers) Sep 17, 2023

ttyio closed this as completed Oct 10, 2023

rishucoding mentioned this issue Feb 12, 2024

Follow up on Issue#3319 #3664

Closed

zerollzeng reopened this Feb 13, 2024
