This document reports BERT inference performance for TurboTransformers.

CPU

We tested the performance of TurboTransformers on three CPU hardware platforms, comparing it against the PyTorch, PyTorch JIT, onnxruntime-mkldnn, and TensorRT implementations. Each reported result is the average of 150 iterations. To keep data from a previous iteration from staying in the CPU cache across runs, each test uses random input data and flushes the cache after the computation.
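The measurement procedure described above can be sketched roughly as follows. This is a minimal illustration, not the actual TurboTransformers benchmark script: `benchmark`, `flush_cache`, `make_random_input`, the 64 MiB scratch buffer, and the vocabulary size of 30522 are all assumptions made for this example, and `model_fn` stands in for whichever inference call is being timed. Reading a buffer larger than the last-level cache only approximates a cache flush.

```python
import random
import time


def flush_cache(scratch: bytearray) -> int:
    """Approximate a cache flush by reading a buffer larger than the L3 cache.

    Reads one byte per 64-byte stride (a typical cache-line size) so that
    every line of the scratch buffer is touched. Returns the sum of the
    bytes read.
    """
    return sum(memoryview(scratch)[::64])


def make_random_input(batch_size: int, seq_len: int, vocab_size: int = 30522):
    """Build a fresh batch of random token ids (vocab_size is an assumption)."""
    return [
        [random.randrange(vocab_size) for _ in range(seq_len)]
        for _ in range(batch_size)
    ]


def benchmark(model_fn, batch_size, seq_len, iterations=150,
              scratch_bytes=64 * 1024 * 1024):
    """Return the mean latency in seconds of model_fn over `iterations` runs."""
    scratch = bytearray(scratch_bytes)  # hypothetical size, chosen to exceed L3
    total = 0.0
    for _ in range(iterations):
        inputs = make_random_input(batch_size, seq_len)  # new data every run
        start = time.perf_counter()
        model_fn(inputs)
        total += time.perf_counter() - start
        flush_cache(scratch)  # evict cached data outside the timed region
    return total / iterations
```

Input generation and the cache flush sit outside the timed region, so only the call to `model_fn` contributes to the reported average.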

Intel Xeon 61xx

Intel Xeon 6133

Compared to the 61xx model, the Intel Xeon 6133 has a wider vector length of 512 bits and a 30 MB L3 cache shared between cores.

GPU

We tested the performance of TurboTransformers on four GPU hardware platforms, comparing it against the PyTorch, NVIDIA FasterTransformer, onnxruntime-gpu, and TensorRT implementations. Each reported result is the average of 150 iterations.

RTX 2060

Tesla V100

Tesla P40

Tesla M40
