I rewrote the Hugging Face T5 demo for BART and run inference with generate() using top_p sampling and num_return_sequences=10 #1756

@zzj-otw

Description

[Screenshot 2022-01-25, 18:13: benchmark results]

As shown in the screenshot, when num_return_sequences=1 the TensorRT engine is about 2x faster than the PyTorch model. But as I increase num_return_sequences, the TRT engine gradually becomes slower than PyTorch; with num_return_sequences=10 it is clearly slower than the PyTorch model. Does anyone know why this happens and how to resolve it? Note that the batch size of the TRT BART decoder is set equal to num_return_sequences when the decoder is exported to TensorRT.

TensorRT Version: 8.2.2
GPU: T4
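
For reference, here is a minimal sketch of the PyTorch-side generate() call being compared against the TRT engine. The checkpoint (facebook/bart-base), the input text, top_p=0.9, and max_length=64 are assumptions for illustration; the benchmark's actual model and settings are not shown in the issue.

```python
import torch
from transformers import BartForConditionalGeneration, BartTokenizer

tokenizer = BartTokenizer.from_pretrained("facebook/bart-base")
model = (
    BartForConditionalGeneration.from_pretrained("facebook/bart-base")
    .cuda()
    .eval()
)

# Placeholder input; the issue's actual prompt is not shown.
inputs = tokenizer("Example source text to summarize.", return_tensors="pt").to("cuda")

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        do_sample=True,           # sampling instead of greedy/beam search
        top_p=0.9,                # nucleus (top-p) sampling; value assumed
        num_return_sequences=10,  # decoder effectively runs with batch size 10
        max_length=64,            # assumed generation length
    )

# generate() tiles the encoder outputs num_return_sequences times, so the
# decoder sees an effective batch of 10 per input sequence -- which is why
# the exported TRT decoder engine is built with batch_size = 10.
print(outputs.shape)  # torch.Size([10, <=64])
```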
