Description
We have a multi-model environment: the models run inference on the same GPU concurrently, but with different priorities.
For a model without Myelin, the priority can be set by calling enqueueV2 with a pre-created stream.
TensorRT then runs all ops on that stream, with the priority we want.
But for a model with Myelin, ops may also run on internal streams managed by Myelin.
These internal streams appear to be created with the default priority, and there seems to be no public API to change that.
So, is there any way to make stream priorities work for a model with Myelin?
Or can we disable Myelin's concurrency support, so that all ops run on the pre-created stream?
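For reference, a minimal sketch of the approach described above for non-Myelin models: create a CUDA stream with an explicit priority and pass it to enqueueV2. The function name enqueueWithPriority and the bindings/context setup are illustrative assumptions, not code from our application.

```cpp
// Sketch (assumed setup): run one model's inference on a stream with an
// explicit CUDA priority. Lower numbers mean higher priority in CUDA.
#include <cuda_runtime_api.h>
#include <NvInfer.h>

// context: an nvinfer1::IExecutionContext* for one model (hypothetical caller
// owns it); bindings: the usual array of device buffer pointers.
bool enqueueWithPriority(nvinfer1::IExecutionContext* context,
                         void** bindings, int priority)
{
    int leastPriority = 0, greatestPriority = 0;
    // Query the valid priority range for this device; CUDA clamps
    // out-of-range values to this range.
    cudaDeviceGetStreamPriorityRange(&leastPriority, &greatestPriority);

    cudaStream_t stream;
    // Create a non-blocking stream with the requested priority.
    cudaStreamCreateWithPriority(&stream, cudaStreamNonBlocking, priority);

    // For a non-Myelin engine, all kernels are launched on this stream and
    // inherit its priority. For a Myelin engine, some kernels may still run
    // on Myelin-internal default-priority streams -- the problem above.
    bool ok = context->enqueueV2(bindings, stream, nullptr);

    cudaStreamSynchronize(stream);
    cudaStreamDestroy(stream);
    return ok;
}
```

This works as intended for engines without Myelin; the question is how to get the same behavior when Myelin creates its own internal streams.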
Tasks
Environment
TensorRT Version: 8.4.3
NVIDIA GPU: A10
NVIDIA Driver Version: 470.82.01
CUDA Version: 11.4
Operating System: Ubuntu 20.04
Relevant Files
Steps To Reproduce