Description
When profiling the TensorRT engine with trtexec, we found that two ForeignNode layers account for about 60% of the inference time. The TensorRT graph has about 1000 nodes in total, so this much latency coming from just two nodes doesn't make sense.
Could you please share some guidance on:
- What is a ForeignNode?
- Why do these nodes run so slowly compared to the others? Are Myelin operators expected to be this slow?
- How can we identify the root cause, and is there a suggested fix?
We really need some help with these. Thanks in advance!
, { "name" : "{ForeignNode[ReduceMean_4492...Mul_4557]}", "timeMs" : 3802.94, "averageMs" : 37.2837, "medianMs" : 37.2818, "percentage" : 30.5601 }
, { "name" : "{ForeignNode[ReduceMean_699...Mul_764]}", "timeMs" : 3802.82, "averageMs" : 37.2825, "medianMs" : 37.2818, "percentage" : 30.5591 }
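For reference, here is a minimal sketch of how the per-layer entries can be ranked to spot the dominant nodes. It assumes the profile is a JSON array of entries shaped like the two above (for example, a file written with trtexec's `--exportProfile` option); the function name `top_layers` and the inline `sample` data are illustrative only, and entries without a `name` or `percentage` field are skipped.

```python
import json


def top_layers(profile, n=5):
    """Return the n layer entries with the largest share of total time."""
    # Keep only per-layer entries; anything without these keys is ignored.
    layers = [
        e for e in profile
        if isinstance(e, dict) and "name" in e and "percentage" in e
    ]
    return sorted(layers, key=lambda e: e["percentage"], reverse=True)[:n]


# Sample data: only the two entries quoted in this report.
sample = json.loads("""
[
  { "name" : "{ForeignNode[ReduceMean_4492...Mul_4557]}", "timeMs" : 3802.94,
    "averageMs" : 37.2837, "medianMs" : 37.2818, "percentage" : 30.5601 },
  { "name" : "{ForeignNode[ReduceMean_699...Mul_764]}", "timeMs" : 3802.82,
    "averageMs" : 37.2825, "medianMs" : 37.2818, "percentage" : 30.5591 }
]
""")

top = top_layers(sample, n=2)
total_pct = sum(e["percentage"] for e in top)

for e in top:
    print(f'{e["percentage"]:8.4f}%  {e["averageMs"]:9.4f} ms  {e["name"]}')
print(f"combined share: {total_pct:.4f}%")
```

On a full profile, the same function would be applied to the result of `json.load` on the exported file instead of `sample`.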
Environment
TensorRT Version: 8.4.3.1
NVIDIA GPU: A100
NVIDIA Driver Version: 470
CUDA Version: 11.4
CUDNN Version: 8.4
Operating System: Ubuntu 20.04
Python Version (if applicable): 3.8
Tensorflow Version (if applicable): NA
PyTorch Version (if applicable):
Baremetal or Container (if so, version):
Steps To Reproduce
We cannot share the model for reproduction, but it is a large diffusion model with a UNet structure and many attention layers.