NVIDIA TensorRT is a platform for high-performance deep learning inference on GPU device.
8bit per-channel symmetric linear quantization.
\begin{equation} q = \mathtt{clamp}(\lfloor x * s \rceil, lb, ub) \end{equation}
where s is scaling factor to quantize a number from floating range to integer range, lb and ub are bounds of integer range. For weights, [lb, ub] = [-127, 127]. For activations, [lb, ub] = [-128, 127].
For weights, each filter needs an independent scale s.
In fact, when building the TensorRT engine, the official tool requires the clipping value as quantization parameters, which can be calculated by c = s * 127.