fxmarty changed the title from "Myelin ForeignNode takes most time at inference, seem to consume most of the model" to "Myelin ForeignNode takes most time at engine build & inference" on Dec 30, 2022
Description
The model and logs can be found here (upload in progress): https://huggingface.co/fxmarty/bugged-myelin-tensorrt-gptj/tree/main
A netron view can be found here: http://netron.app?url=https://huggingface.co/fxmarty/bugged-myelin-tensorrt-gptj/blob/main/decoder_model.onnx
I'd be glad if you could advise why the Myelin ForeignNode takes such a long time at engine build. Based on this answer, I should not be too worried about inference itself, but how can I be sure that the very long engine-build time is expected? The Myelin ForeignNode takes ~20 s at engine build for this small 300 MB model, but up to 500 s for GPT-J-6B. Is this expected?
The command run is:
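(The author's exact command is not reproduced in this copy of the issue. For context, a plausible invocation assuming the ONNX decoder from the linked repository and the trtexec CLI shipped with TensorRT 8.4 could look as follows; the flag names are standard trtexec options, but the input names and dynamic shapes below are guesses, not taken from the issue.)

```shell
# Hypothetical trtexec invocation -- flags are standard trtexec options,
# but the exact shapes and options the author used are unknown.
trtexec --onnx=decoder_model.onnx \
        --minShapes=input_ids:1x1,attention_mask:1x1 \
        --optShapes=input_ids:1x16,attention_mask:1x16 \
        --maxShapes=input_ids:1x64,attention_mask:1x64 \
        --saveEngine=decoder_model.plan \
        --verbose
```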
The profiling yields:
The first foreign node seems to come from:
The second one should be:
This last matmul is huge, but I somewhat doubt it would take >99% of the inference time, so I wonder whether there is a bug in the inference as well.
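A rough back-of-envelope supports this doubt. The sketch below estimates the FLOP share of the final vocabulary-projection matmul against the rest of the decoder stack for a single decoding step, assuming standard GPT-J-6B dimensions (hidden size 4096, vocabulary 50400, 28 layers); these dimensions are assumptions for illustration, not figures taken from the issue or from the 300 MB model.

```python
# Back-of-envelope FLOP estimate: final lm_head matmul vs. the decoder body.
# All model dimensions below are assumed GPT-J-6B values, for illustration.
def matmul_flops(m, k, n):
    # An (m x k) @ (k x n) matmul costs ~2*m*k*n floating-point operations.
    return 2 * m * k * n

seq_len = 1      # single decoding step
hidden = 4096    # assumed GPT-J-6B hidden size
vocab = 50400    # assumed GPT-J-6B vocabulary size
n_layers = 28    # assumed GPT-J-6B layer count

lm_head = matmul_flops(seq_len, hidden, vocab)

# Per layer, roughly: 4 attention projections (4*h*h MACs) plus the
# 4x-expanded MLP in and out projections (8*h*h MACs) => ~12*h*h MACs,
# i.e. matmul_flops(seq_len, hidden, 12*hidden) FLOPs per token.
per_layer = matmul_flops(seq_len, hidden, 12 * hidden)
body = n_layers * per_layer

print(f"lm_head share of total FLOPs: {lm_head / (lm_head + body):.1%}")
```

Even with these rough assumptions the output projection comes out at only a few percent of per-token FLOPs, nowhere near 99% of the work.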
Environment
TensorRT Version: 8.4.3.1
NVIDIA GPU: NVIDIA GeForce RTX 3060 Laptop GPU (but could reproduce on an A100 as well)
NVIDIA Driver Version: 515.86.01
CUDA Version: 11.7
CUDNN Version: 8.7.0
Operating System: Linux 5.15.0-56-generic #62-Ubuntu SMP Tue Nov 22 19:54:14 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
PyTorch Version (if applicable): 1.13.1 (version used for torch.onnx.export())
Related
#2308 #2576 huggingface/optimum#605