
This page reports BERT inference performance for TurboTransformers.

CPU

We tested the performance of TurboTransformers on three CPU hardware platforms, comparing against PyTorch, PyTorch-JIT, onnxruntime-mkldnn, and TensorRT implementations. Each reported result is the average of 150 iterations. To prevent data from a previous iteration being served out of the CPU cache across repeated runs, every test uses freshly generated random input and flushes the cache after each computation.
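The measurement protocol above can be sketched as a small timing harness. This is a minimal illustration, not the benchmark script the project actually uses; the matrix multiply is a hypothetical stand-in for a BERT forward pass, and the fresh random input per iteration plays the role of the cache refresh described above.

```python
import time
import numpy as np

def benchmark(fn, make_input, n_iters=150):
    """Average latency of fn over n_iters runs. A fresh random input
    is generated each iteration so results cannot be served from
    data cached by the previous run."""
    total = 0.0
    for _ in range(n_iters):
        x = make_input()                  # new random data every run
        start = time.perf_counter()
        fn(x)
        total += time.perf_counter() - start
    return total / n_iters

# Toy stand-in for a model forward pass (hypothetical workload):
w = np.random.rand(768, 768).astype(np.float32)
avg = benchmark(lambda x: x @ w,
                lambda: np.random.rand(32, 768).astype(np.float32))
print(f"average latency: {avg * 1e3:.3f} ms")
```

Averaging over many iterations smooths out scheduler noise; regenerating the input each time keeps the comparison between frameworks from being skewed by warm caches.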

  • Intel Xeon 61xx

(Figure: Intel Xeon 61xx performance)

(Figure: Intel Xeon 61xx speedup)

  • Intel Xeon 6133: compared with the 61xx models, the Intel Xeon 6133 has a wider 512-bit vector length and a 30 MB L3 cache shared between cores.

(Figure: Intel Xeon 6133 performance)

(Figure: Intel Xeon 6133 speedup)

GPU

We tested the performance of turbo_transformers on four GPU hardware platforms, comparing against PyTorch, NVIDIA FasterTransformer, onnxruntime-gpu, and TensorRT implementations. Each reported result is the average of 150 iterations.

  • RTX 2060

(Figure: RTX 2060 performance)

(Figure: RTX 2060 speedup)

  • Tesla V100

(Figure: Tesla V100 performance)

(Figure: Tesla V100 speedup)

  • Tesla P40

(Figure: Tesla P40 performance)

(Figure: Tesla P40 speedup)

  • Tesla M40

(Figure: Tesla M40 performance)

(Figure: Tesla M40 speedup)