Description
I have two GPUs:
- GPU 0: NVIDIA GeForce RTX 3060 (UUID: GPU-aafef73b-feb4-ec58-f3f9-326cb9c7b353)
- GPU 1: NVIDIA GeForce RTX 2080 Ti (UUID: GPU-c0370922-f0e4-5f2b-5f7a-16ae5ab03013)
I used trtexec to build one engine per GPU, yolo11n-3060.engine and yolo11n-2080.engine, with the following commands:
trtexec --onnx=yolo11n.onnx --saveEngine=yolo11n-3060.engine --device=0
trtexec --onnx=yolo11n.onnx --saveEngine=yolo11n-2080.engine --device=1
When I deserialize either yolo11n-3060.engine or yolo11n-2080.engine and create its execution context on its own, everything works fine. However, when I try to deserialize both engines and create both execution contexts in the same process (first deserialize yolo11n-3060.engine and create its context, then do the same for yolo11n-2080.engine), the following error occurs:
ERROR: ICudaEngine::createExecutionContext: Error Code 1: Myelin ([::0] Compiled assuming that device 0 was SM 75, but device 0 is SM 86.)
Error: Failed to create execution context.
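For reference, the loading code is roughly equivalent to the sketch below (TensorRT 10 C++ API). The loadFile helper, the use of a separate IRuntime per device, and the exact placement of the cudaSetDevice calls are simplifications of my actual code:

#include <NvInfer.h>
#include <cuda_runtime.h>
#include <cstdio>
#include <fstream>
#include <vector>

// Minimal logger required by the TensorRT runtime.
class Logger : public nvinfer1::ILogger {
    void log(Severity severity, const char* msg) noexcept override {
        if (severity <= Severity::kWARNING) std::printf("%s\n", msg);
    }
} gLogger;

// Read a serialized engine from disk (placeholder helper for this sketch).
static std::vector<char> loadFile(const char* path) {
    std::ifstream f(path, std::ios::binary | std::ios::ate);
    std::vector<char> buf(static_cast<size_t>(f.tellg()));
    f.seekg(0);
    f.read(buf.data(), buf.size());
    return buf;
}

int main() {
    // Engine built on the RTX 3060 (SM 86), intended for device 0.
    cudaSetDevice(0);
    auto blob0 = loadFile("yolo11n-3060.engine");
    auto* runtime0 = nvinfer1::createInferRuntime(gLogger);
    auto* engine0  = runtime0->deserializeCudaEngine(blob0.data(), blob0.size());
    auto* ctx0     = engine0->createExecutionContext();   // succeeds

    // Engine built on the RTX 2080 Ti (SM 75), intended for device 1.
    cudaSetDevice(1);
    auto blob1 = loadFile("yolo11n-2080.engine");
    auto* runtime1 = nvinfer1::createInferRuntime(gLogger);
    auto* engine1  = runtime1->deserializeCudaEngine(blob1.data(), blob1.size());
    auto* ctx1     = engine1->createExecutionContext();   // fails with the Myelin SM error

    // (Cleanup omitted for brevity; TensorRT 10 objects are released with delete.)
    return 0;
}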
If I call cudaDeviceReset() before deserializing yolo11n-2080.engine and creating its context, the context is created successfully, but the reset invalidates the previously allocated device memory and TensorRT objects, as sketched below.
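In the sketch above, the workaround sits roughly here (cudaDeviceReset() destroys all state on the current device's primary context, which I assume is why the earlier allocations and TensorRT objects become unusable):

// Workaround: reset before loading the second engine.
cudaDeviceReset();   // wipes the current device's primary context
cudaSetDevice(1);
// ... deserialize yolo11n-2080.engine and create its context as above; this now succeeds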
Is it necessary for the SM versions to be consistent across GPUs in order to create execution contexts for multiple TensorRT engines in the same process in a multi-GPU environment?
Environment
TensorRT Version: v10.6.0.26
NVIDIA GPU: NVIDIA GeForce RTX 3060 and NVIDIA GeForce RTX 2080 Ti
NVIDIA Driver Version: v572.83
CUDA Version: v12.6
CUDNN Version: v8.9.7.29
Operating System: Windows 10 Business LTSC
Python Version (if applicable):
Tensorflow Version (if applicable):
PyTorch Version (if applicable):
Baremetal or Container (if so, version):
Relevant Files
Model link:
Steps To Reproduce
Commands or scripts:
Have you tried the latest release?:
Can this model run on other frameworks? For example run ONNX model with ONNXRuntime (polygraphy run <model.onnx> --onnxrt):