Two of the ForeignNodes consume 60% of inference time, among 1000 nodes  #2308

@CanyonWind

Description

While profiling our TensorRT engine with trtexec, we found that two ForeignNode layers account for about 60% of the inference time. The TensorRT graph has about 1000 nodes in total, so it doesn't make sense to us that two of them dominate the latency like this.

Could you please share some guidance on:

  • What is a ForeignNode?
  • Why does it run so slowly compared to the other nodes? Are Myelin operators expected to be this slow?
  • How can we identify the root cause, and is there a suggested fix?

We would really appreciate some help with these. Thanks in advance!

, { "name" : "{ForeignNode[ReduceMean_4492...Mul_4557]}", "timeMs" : 3802.94, "averageMs" : 37.2837, "medianMs" : 37.2818, "percentage" : 30.5601 }
, { "name" : "{ForeignNode[ReduceMean_699...Mul_764]}", "timeMs" : 3802.82, "averageMs" : 37.2825, "medianMs" : 37.2818, "percentage" : 30.5591 }
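For anyone triaging a similar profile, here is a minimal Python sketch that ranks layers by their share of total time and pulls the two node names out of each ForeignNode label. It assumes the profile is a JSON list of per-layer entries with the fields shown above (the two entries are copied from this issue). The helper names `top_layers` and `foreign_node_span` are made up for illustration, and reading `ForeignNode[A...B]` as a fused region running from node A to node B is an assumption based only on the label format, not something confirmed in this thread.

```python
import json
import re

# Two entries copied from the profile above; a full profile would hold one
# entry per layer (about 1000 in this engine).
PROFILE_JSON = """
[
  { "name" : "{ForeignNode[ReduceMean_4492...Mul_4557]}", "timeMs" : 3802.94, "averageMs" : 37.2837, "medianMs" : 37.2818, "percentage" : 30.5601 },
  { "name" : "{ForeignNode[ReduceMean_699...Mul_764]}", "timeMs" : 3802.82, "averageMs" : 37.2825, "medianMs" : 37.2818, "percentage" : 30.5591 }
]
"""

# Assumed label format: {ForeignNode[<first node>...<last node>]}
FOREIGN_NODE = re.compile(r"\{ForeignNode\[(?P<first>.+?)\.\.\.(?P<last>.+?)\]\}")


def top_layers(entries, n=5):
    """Return the n entries with the largest share of total time."""
    # Skip any entry without timing fields (for example a header record).
    timed = [e for e in entries if "percentage" in e]
    return sorted(timed, key=lambda e: e["percentage"], reverse=True)[:n]


def foreign_node_span(name):
    """Return (first, last) node names from a ForeignNode label, else None."""
    match = FOREIGN_NODE.fullmatch(name)
    return (match.group("first"), match.group("last")) if match else None


entries = json.loads(PROFILE_JSON)
top = top_layers(entries)
combined = sum(e["percentage"] for e in top)
spans = [foreign_node_span(e["name"]) for e in top]

for entry, span in zip(top, spans):
    print(f'{entry["percentage"]:6.2f}%  {entry["averageMs"]:.3f} ms/run  {span}')
print(f"combined share: {combined:.2f}%")
```

The extracted node names (for example `ReduceMean_4492` and `Mul_4557`) can then be looked up in the original ONNX graph to see which part of the model each slow region covers.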

Environment

TensorRT Version: 8.4.3.1
NVIDIA GPU: A100
NVIDIA Driver Version: 470
CUDA Version: 11.4
CUDNN Version: 8.4
Operating System: Ubuntu 20.04
Python Version (if applicable): 3.8
Tensorflow Version (if applicable): NA
PyTorch Version (if applicable):
Baremetal or Container (if so, version):

Steps To Reproduce

We cannot share the model for reproduction, but it is a large diffusion model with a UNet structure and many attention layers.

Labels: triaged (issue has been triaged by maintainers)
