Summary
I am unable to load a .trt engine using the tensorrt_lean runtime. Deserialization fails with a ReformatRunner error, even though the setup is inference-only and uses a CUDA runtime container.
Environment
Docker base image: nvidia/cuda:13.1.2-cudnn-runtime-ubuntu24.04
TensorRT version: 10.16.1.11 (tensorrt_lean)
Python: 3.12
GPU: NVIDIA GPU (CUDA enabled container)
Engine format: .trt (prebuilt outside container)
Installed packages
tensorrt_lean
numpy
Issue description
When trying to deserialize a TensorRT engine using:
import tensorrt_lean as trt
TRT_LOGGER = trt.Logger(trt.Logger.WARNING)
runtime = trt.Runtime(TRT_LOGGER)
with open("trt_weights_f16/mos.trt", "rb") as f:
    engine = runtime.deserialize_cuda_engine(f.read())
The following error occurs:
[TRT] [E] IRuntime::deserializeCudaEngine: Error Code 1: Internal Error
Unexpected call to stub loadRunner for ReformatRunner
As a result, engine is None and inference cannot proceed.
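Because deserialize_cuda_engine returns None rather than raising, the failure is easy to miss until a later call breaks. The sketch below turns it into an explicit error. load_engine is a hypothetical helper name, not part of TensorRT; the runtime is passed in as an argument, so the helper only assumes an object with a deserialize_cuda_engine method.

```python
from pathlib import Path


def load_engine(runtime, engine_path):
    """Deserialize an engine and raise instead of returning None."""
    path = Path(engine_path)
    if not path.is_file():
        raise FileNotFoundError(f"engine file not found: {path}")
    # deserialize_cuda_engine returns None on failure; the reason is only
    # reported through the TensorRT logger.
    engine = runtime.deserialize_cuda_engine(path.read_bytes())
    if engine is None:
        raise RuntimeError(
            f"failed to deserialize {path}; see the [TRT] [E] log lines"
        )
    return engine


def main():
    # Imported here so the helper above can be exercised without TensorRT.
    import tensorrt_lean as trt

    runtime = trt.Runtime(trt.Logger(trt.Logger.WARNING))
    engine = load_engine(runtime, "trt_weights_f16/mos.trt")
    print("engine loaded:", engine is not None)
```

Calling main() inside the container reproduces the failure as a RuntimeError, with the ReformatRunner message still printed by the TensorRT logger.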
Expected behavior
The engine should deserialize successfully and allow inference using:
context = engine.create_execution_context()
context.execute_async_v3(...)
Actual behavior
Engine fails during deserialization
ReformatRunner stub error is triggered
engine == None
No inference possible
What I tried
Switching to inference-only CUDA runtime image
Using tensorrt_lean instead of full TensorRT
Minimal Python inference script (no PyCUDA, no training dependencies)
Verifying engine file path and loading logic
Key observation
The engine was built using a full TensorRT environment, and fails when loaded with tensorrt_lean.
It seems that tensorrt_lean does not support certain internal runners (e.g., ReformatRunner) required by the engine.
Question
Is there a compatibility requirement between:
TensorRT engine build environment
TensorRT Lean runtime
Specifically:
Are engines built with full TensorRT incompatible with the Lean runtime?
Is there a required “Lean-compatible engine export” workflow?
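For context, the TensorRT documentation describes the lean runtime as intended for engines built with the version-compatibility builder flag. Whether that explains the ReformatRunner error here is unconfirmed. A sketch of such a build on the full-TensorRT side follows, assuming the model is available as ONNX. The function name and file paths are hypothetical, and the precision settings of the original build are not reproduced.

```python
def build_version_compatible_engine(onnx_path, engine_path):
    """Build and save an engine with BuilderFlag.VERSION_COMPATIBLE set."""
    # Needs the full tensorrt package (builder + ONNX parser), not tensorrt_lean.
    import tensorrt as trt

    logger = trt.Logger(trt.Logger.WARNING)
    builder = trt.Builder(logger)
    network = builder.create_network(0)
    parser = trt.OnnxParser(network, logger)

    with open(onnx_path, "rb") as f:
        if not parser.parse(f.read()):
            errors = [parser.get_error(i).desc() for i in range(parser.num_errors)]
            raise RuntimeError("ONNX parse failed: " + "; ".join(errors))

    config = builder.create_builder_config()
    # Assumption: this is the flag the lean runtime expects at build time.
    config.set_flag(trt.BuilderFlag.VERSION_COMPATIBLE)

    serialized = builder.build_serialized_network(network, config)
    if serialized is None:
        raise RuntimeError("engine build failed; check the TensorRT log")

    with open(engine_path, "wb") as f:
        f.write(serialized)
```

If the engine is built with trtexec instead, the corresponding option appears to be --versionCompatible.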
Additional context
This setup is intended for inference-only deployment. The goal was to use a minimal runtime container without the full TensorRT SDK.