Skip to content

Latest commit

 

History

History
23 lines (15 loc) · 845 Bytes

tensorrt.rst

File metadata and controls

23 lines (15 loc) · 845 Bytes

TensorRT

NVIDIA TensorRT is a platform for high-performance deep learning inference on GPU device.

Quantization Scheme

8bit per-channel symmetric linear quantization.

\begin{equation}
    q = \mathtt{clamp}(\lfloor x * s \rceil, lb, ub)
\end{equation}

where s is scaling factor to quantize a number from floating range to integer range, lb and ub are bounds of integer range. For weights, [lb, ub] = [-127, 127]. For activations, [lb, ub] = [-128, 127].

For weights, each filter needs an independent scale s.

In fact, when building the TensorRT engine, the official tool requires the clipping value as quantization parameters, which can be calculated by c = s * 127.