I rewrote the Hugging Face T5 demo for BART, and run inference with the `generate` method using top_p sampling and `num_return_sequences=10` #1756
As the screenshot shows, with `num_return_sequences=1` the TensorRT engine is about 2x faster than the PyTorch model. But as I increase `num_return_sequences`, the TRT engine gradually becomes slower than PyTorch.
With `num_return_sequences=10`, the TRT engine is clearly slower than the PyTorch model.
Does anyone know why this happens, and how to resolve it?
Note: when exporting the BART decoder to TRT, I set the decoder's batch size equal to `num_return_sequences`.
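For context on why the decoder batch size matters here: Hugging Face `generate` with `num_return_sequences=n` tiles the decoder inputs `n` times, so the decoder effectively runs with a batch of `batch_size * n`. A rough sketch of that expansion (numpy used as a stand-in for the actual tensors; shapes are illustrative assumptions):

```python
import numpy as np

# generate() with num_return_sequences=n repeats each input n times along
# the batch axis, so a TRT decoder engine exported with a fixed batch
# dimension must accommodate batch_size * n, not batch_size.
batch_size, num_return_sequences, seq_len = 1, 10, 5
decoder_input = np.zeros((batch_size, seq_len))

# Effective batch seen by the decoder during sampling:
expanded = np.repeat(decoder_input, num_return_sequences, axis=0)
print(expanded.shape)  # (10, 5)
```

If the engine was built with a fixed (or too-narrow optimization-profile) batch dimension of 10, every decoding step pays for the full expanded batch, which may explain why the TRT speedup shrinks as `num_return_sequences` grows.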