
Int8 mode is slower than fp16 #993

@ye1024

Description


Hi,
I extracted the token embedding layer from BERT and built a TensorRT engine for it to test the inference performance of INT8 mode, but found that INT8 mode is slower than FP16.
I used nvprof to view the GPU time consumption of the two modes, as follows:

FP16:
                 Time(%)      Time  Calls       Avg       Min       Max  Name
GPU activities:   99.87%  22.158ms      6  3.6930ms  1.7280us  22.148ms  [CUDA memcpy HtoD]
                   0.06%  13.376us      8  1.6720us  1.6000us  1.9520us  [CUDA memset]
                   0.05%  10.688us      1  10.688us  10.688us  10.688us  void cuGatherLayer::gatherGeneric<float, int=32>(void*, cuGatherLayer::StrideArray, cuGatherLayer::gatherGeneric<float, int=32>, void*, int*, void*, cuGatherLayer::ShapeArray, int*, int*, int, cuGatherLayer::ReducedDivisorArray, int, int, int, int, cuGatherLayer::CoefficientData, cuGatherLayer::CoefficientIndices)
                   0.02%  4.1600us      1  4.1600us  4.1600us  4.1600us  [CUDA memcpy DtoH]
                   0.01%  1.6320us      1  1.6320us  1.6320us  1.6320us  [CUDA memcpy DtoD]

INT8:
                 Time(%)      Time  Calls       Avg       Min       Max  Name
GPU activities:   99.84%  20.210ms      6  3.3683ms  1.6950us  20.201ms  [CUDA memcpy HtoD]
                   0.07%  13.536us      8  1.6920us  1.6000us  1.9840us  [CUDA memset]
                   0.07%  13.311us      1  13.311us  13.311us  13.311us  void cuGatherLayer::gatherAxisZeroPartition<float, int=64, int=256>(void*, cuGatherLayer::StrideArray, cuGatherLayer::gatherAxisZeroPartition<float, int=64, int=256>, void*, int*, void*, cuGatherLayer::ShapeArray, int*, int*, int, cuGatherLayer::ReducedDivisorArray, cuGatherLayer::ShapeArray, cuGatherLayer::ShapeArray, int, int, int, int, int, int, nvinfer1::rt::reduced_divisor)
                   0.02%  3.7120us      1  3.7120us  3.7120us  3.7120us  [CUDA memcpy DtoH]
                   0.01%  1.7280us      1  1.7280us  1.7280us  1.7280us  [CUDA memcpy DtoD]

I would like to know whether something is wrong with the INT8 quantization.
Thanks!

TensorRT Version: 6.0.1.5
GPU Type: V100
Nvidia Driver Version: 418.39
CUDA Version: 10.1
Operating System: Ubuntu 18.04
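
For context, here is a minimal sketch of how an FP16/INT8 engine pair like this might be built with the TensorRT 6 Python API. The ONNX model path, the calibrator object, and the build_engine helper below are illustrative assumptions, not the exact script used for the measurements above.

```python
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)
EXPLICIT_BATCH = 1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)

def build_engine(onnx_path, use_int8=False, calibrator=None):
    """Build a TensorRT engine from an ONNX model in FP16 or mixed INT8/FP16 mode."""
    builder = trt.Builder(TRT_LOGGER)
    network = builder.create_network(EXPLICIT_BATCH)
    parser = trt.OnnxParser(network, TRT_LOGGER)

    # Parse the (hypothetical) ONNX export of the BERT token embedding layer.
    with open(onnx_path, "rb") as f:
        if not parser.parse(f.read()):
            raise RuntimeError("ONNX parse failed: {}".format(parser.get_error(0)))

    builder.max_workspace_size = 1 << 30
    builder.fp16_mode = True                   # FP16 baseline engine
    if use_int8:
        builder.int8_mode = True               # enable INT8 kernels
        builder.int8_calibrator = calibrator   # e.g. an IInt8EntropyCalibrator2 subclass
    return builder.build_cuda_engine(network)

# fp16_engine = build_engine("bert_token_embedding.onnx")
# int8_engine = build_engine("bert_token_embedding.onnx", use_int8=True,
#                            calibrator=my_calibrator)
```

Note that keeping fp16_mode enabled together with int8_mode lets TensorRT fall back to FP16 for layers that have no INT8 implementation, which is a common setup when only part of a model is quantized.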
