Description
I wonder whether device memory can be shared between execution contexts of different engines when those contexts are executed one by one in the same stream, for example:
// Deserialize two independent engines.
auto engine0 = runtime->deserializeCudaEngine(buffer0, size0, nullptr);
auto engine1 = runtime->deserializeCudaEngine(buffer1, size1, nullptr);
// Create execution contexts that do not allocate their own scratch memory.
auto context0 = engine0->createExecutionContextWithoutDeviceMemory();
auto context1 = engine1->createExecutionContextWithoutDeviceMemory();
// One allocation large enough for whichever engine needs more scratch space.
auto device_memory_size = std::max(engine0->getDeviceMemorySize(), engine1->getDeviceMemorySize());
void *device_memory = nullptr;
cudaMalloc(&device_memory, device_memory_size);
// Point both contexts at the same device memory.
context0->setDeviceMemory(device_memory);
context1->setDeviceMemory(device_memory);
...
// Enqueue back-to-back on the same stream, so the executions never overlap.
context0->enqueueV2(io_buffers0, stream, nullptr);
context1->enqueueV2(io_buffers1, stream, nullptr);
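For completeness, a minimal sketch of the teardown I have in mind (assuming both enqueues succeed): the shared allocation has to stay alive until every enqueue that uses it has finished on the stream, and TensorRT 7 objects are released with destroy().
// Wait for both inferences to finish before releasing the shared scratch memory.
cudaStreamSynchronize(stream);
cudaFree(device_memory);
// TensorRT 7 API: contexts must be destroyed before their engines.
context0->destroy();
context1->destroy();
engine0->destroy();
engine1->destroy();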
Environment
TensorRT Version: 7.1
NVIDIA GPU: Tesla T4
NVIDIA Driver Version: 530.41.03
CUDA Version: CUDA 11.0
CUDNN Version:
Operating System: Ubuntu 20.04
Python Version (if applicable): N/A
Baremetal or Container (if so, version):