Description
TensorRT appears to overflow for ONNX ReduceLogSumExp on large but finite float32 inputs.
ONNX Runtime returns a finite result, while TensorRT returns inf for the same model and input. This suggests TensorRT may be computing log(sum(exp(x))) directly without a numerically stable max-subtraction implementation.
This appears to be a TensorRT numerical stability issue for ONNX ReduceLogSumExp.
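For reference, the numerically stable formulation subtracts the per-reduction maximum before exponentiating, so exp() never sees a large positive argument. A minimal NumPy sketch (not TensorRT's actual implementation, just the standard max-subtraction identity):

```python
import numpy as np

def logsumexp_stable(x: np.ndarray) -> float:
    # log(sum(exp(x))) = m + log(sum(exp(x - m))), with m = max(x).
    # Since x - m <= 0, exp(x - m) is in (0, 1] and cannot overflow.
    m = np.max(x)
    return float(m + np.log(np.sum(np.exp(x - m))))

x = np.array([250.0, 248.0, 255.0, 251.0], dtype=np.float32)
print(logsumexp_stable(x))  # finite, ~255.0256, matching ONNX Runtime
```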
Environment
TensorRT Version: 10.16.1.11
NVIDIA GPU: N/A / not detected by nvidia-smi
NVIDIA Driver Version: N/A / nvidia-smi failed
CUDA Version: N/A / nvcc not found
CUDNN Version: N/A / torch.backends.cudnn.version() returned None
Operating System: Linux 6.17.0-20-generic x86_64, glibc 2.39
Python Version (if applicable): Python 3.11.15
Tensorflow Version (if applicable): N/A
PyTorch Version (if applicable): N/A
Baremetal or Container (if so, version): Baremetal / non-Docker environment (/proc/1/cgroup: 0::/init.scope)
Additional package versions:
ONNX Version: 1.21.0
ONNX Runtime Version: 1.25.1
Relevant Files
Model link: N/A
The ONNX model is generated inline by the minimal reproducible script below.
Steps To Reproduce
Commands or scripts:
import numpy as np
import onnxruntime as ort
from onnx import helper, TensorProto
from _trt_helper import build_engine_from_onnx, run_engine

# Build a minimal ONNX model: a single ReduceLogSumExp over a 1-D float32 input.
n = helper.make_node("ReduceLogSumExp", ["x"], ["y"], keepdims=0)
g = helper.make_graph(
    [n],
    "g",
    [helper.make_tensor_value_info("x", TensorProto.FLOAT, [4])],
    [helper.make_tensor_value_info("y", TensorProto.FLOAT, [])],
)
m = helper.make_model(g, opset_imports=[helper.make_opsetid("", 18)])
m.ir_version = 10
ob = m.SerializeToString()

# Large but finite inputs: exp(x) overflows float32, yet log(sum(exp(x))) is finite.
x = np.array([250.0, 248.0, 255.0, 251.0], dtype=np.float32)

# Reference result from ONNX Runtime on CPU.
ort_y = float(
    ort.InferenceSession(
        ob,
        providers=["CPUExecutionProvider"],
    ).run(["y"], {"x": x})[0]
)

# Same model and input through TensorRT.
eng, _ = build_engine_from_onnx(ob)
trt_y = float(
    run_engine(
        eng,
        {"x": x},
        ["y"],
        [()],
        [np.float32],
    )["y"]
)

print("ORT:", ort_y)
print("TRT:", trt_y)
assert np.isfinite(ort_y) and not np.isfinite(trt_y)
Have you tried the latest release?: Yes, reproduced with TensorRT 10.16.1.11.
Attach the captured .json and .bin files from TensorRT's API Capture tool if you're on an x86_64 Unix system: Not attached. The issue is reproducible from the self-contained Python script above.
Can this model run on other frameworks? For example run ONNX model with ONNXRuntime (polygraphy run <model.onnx> --onnxrt):
Yes. ONNX Runtime runs the same model and returns a finite result.
Actual output:
ORT: 255.025634765625
TRT: inf
TensorRT returns inf even though the mathematically expected result is finite.
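The inf result is consistent with a direct, unstabilized evaluation of log(sum(exp(x))) in float32: exp(250) is about 3.7e108, far beyond the float32 maximum of ~3.4e38, so the sum saturates to inf before the log is taken. This can be checked in NumPy (a sketch of the suspected failure mode, not TensorRT's confirmed code path):

```python
import numpy as np

x = np.array([250.0, 248.0, 255.0, 251.0], dtype=np.float32)
with np.errstate(over="ignore"):
    # exp() overflows every element to inf in float32, so the result is
    # log(inf) = inf even though the true value (~255.0256) is representable.
    naive = np.log(np.sum(np.exp(x)))
print(naive)  # inf
```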