TensorRT

NVIDIA TensorRT is a platform for high-performance deep learning inference on NVIDIA GPUs.

Quantization Scheme

TensorRT uses 8-bit per-channel symmetric linear quantization:

\begin{equation}
    q = \mathtt{clamp}(\lfloor x * s \rceil, lb, ub)
\end{equation}

where s is the scaling factor that maps a floating-point value x into the integer range, rounding is to the nearest integer, and lb and ub are the lower and upper bounds of the integer range. For weights, [lb, ub] = [-127, 127]. For activations, [lb, ub] = [-128, 127].
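The quantization equation can be sketched in a few lines of plain Python. This is only an illustration: the function name `quantize` and the example scale are hypothetical, and Python's built-in `round()` resolves exact ties to the nearest even integer, which is assumed here to be an acceptable reading of "round to nearest".

```python
def quantize(x, s, lb, ub):
    """Compute q = clamp(round(x * s), lb, ub) for a single float x."""
    q = round(x * s)  # built-in round(): ties go to the nearest even integer
    return max(lb, min(ub, q))


WEIGHT_RANGE = (-127, 127)      # [lb, ub] for weights
ACTIVATION_RANGE = (-128, 127)  # [lb, ub] for activations

s = 127 / 2.0  # hypothetical scale that maps |x| = 2.0 onto the integer 127

print(quantize(0.5, s, *WEIGHT_RANGE))       # 0.5 * 63.5 = 31.75, rounds to 32
print(quantize(-3.0, s, *WEIGHT_RANGE))      # out of range, saturates at -127
print(quantize(-3.0, s, *ACTIVATION_RANGE))  # out of range, saturates at -128
```

The only difference between the weight and activation cases is the lower bound: weights use a range that is symmetric around zero, while activations may use the extra negative level.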

For weights, each filter (output channel) has its own independent scale s.
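A minimal sketch of per-filter scales follows. It assumes each scale is derived from the largest absolute weight in its filter, s = 127 / max|w|. That calibration rule is an assumption made for illustration, since the text does not say how s is obtained. The helper names `per_filter_scales` and `quantize_weight` are hypothetical.

```python
def per_filter_scales(weight, ub=127):
    """Return one scale per filter; weight is a list of flat lists of floats."""
    scales = []
    for filt in weight:
        max_abs = max(abs(w) for w in filt)
        # Assumed fallback for an all-zero filter: use a scale of 1.0
        scales.append(ub / max_abs if max_abs > 0 else 1.0)
    return scales


def quantize_weight(weight, scales, lb=-127, ub=127):
    """Apply q = clamp(round(w * s), lb, ub) with a separate s for each filter."""
    return [
        [max(lb, min(ub, round(w * s))) for w in filt]
        for filt, s in zip(weight, scales)
    ]


# Two filters whose magnitudes differ by a factor of four
weight = [
    [1.0, -0.75, 0.25],
    [0.25, -0.1875, 0.0625],
]
scales = per_filter_scales(weight)
q = quantize_weight(weight, scales)
print(scales)  # [127.0, 508.0]
print(q)       # [[127, -95, 32], [127, -95, 32]]
```

Because each filter gets its own s, both filters use the full integer range even though the second one has much smaller weights. With a single shared scale, the second filter would occupy only a quarter of the available levels.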

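In fact, when building the TensorRT engine, the official tool expects the clipping value of each tensor as its quantization parameter rather than the scale itself. With s taken as the quantization step size, i.e. the floating-point width of one integer level, the clipping value is c = s * 127. If s is instead the multiplier that appears in the equation above, the same value is c = 127 / s.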
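The conversion to clipping values can be sketched as below. The sketch assumes that s in c = s * 127 is the quantization step size, so the multiplier used in the quantization equation is its reciprocal. This is an assumption about the convention, and the helper name `clipping_values` and the step sizes are hypothetical.

```python
def clipping_values(step_sizes, ub=127):
    """Return c = s * 127 for each step size s (one per tensor or per filter)."""
    return [s * ub for s in step_sizes]


step_sizes = [0.0078125, 0.001953125]  # 1/128 and 1/512, hypothetical values
clips = clipping_values(step_sizes)
print(clips)  # [0.9921875, 0.248046875]

# Sanity check: a value equal to its clipping value lands on the top level, 127
for s, c in zip(step_sizes, clips):
    assert round(c * (1.0 / s)) == 127
```

Any input whose magnitude exceeds c saturates at the integer bound, which is why c acts as the clipping threshold.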
