Jiaruifang/update readme (#5)
update readme add TODO
Co-authored-by: jiaruifang <jiaruifang@tencent.com>
feifeibear committed Apr 23, 2020
1 parent eeac89b commit f2d66bc
Showing 2 changed files with 6 additions and 2 deletions.
6 changes: 5 additions & 1 deletion README.md
@@ -8,7 +8,8 @@ Transformer is the most critical algorithm innovation in the NLP field in recent
1. Excellent CPU/GPU performance. On Intel multi-core CPU and NVIDIA GPU hardware platforms, TurboTransformers can fully utilize all levels of computing power of the hardware. It achieves better performance than pytorch/tensorflow and current mainstream optimization engines (such as onnxruntime-mkldnn/onnxruntime-gpu, torch JIT, and NVIDIA Faster Transformers) on a variety of CPU and GPU hardware. See the detailed benchmark results below.
2. Tailored to the characteristics of NLP inference tasks. Unlike CV tasks, the input dimensions of NLP inference tasks change constantly. The traditional approach is zero padding or truncation to a fixed length, which introduces extra zero-padding computation. Moreover, some frameworks such as onnxruntime, TensorRT, and torchlib need to preprocess the computation graph for the input size in advance, which is not suitable for NLP tasks with varying input sizes. TurboTransformers supports variable-length input sequences without any preprocessing.
3. A simpler method of use. TurboTransformers provides both Python and C++ interfaces and can be used as an acceleration plug-in for pytorch: in a Transformer task, end-to-end acceleration is obtained by adding a few lines of Python code, as shown in the sketch below.
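The sketch below illustrates that plug-in workflow. It is a minimal example, assuming a conversion helper along the lines of `turbo_transformers.BertModel.from_torch`; the exact class and method names are illustrative assumptions rather than facts from this commit, so consult the repository's examples for the authoritative API.

```python
# Minimal sketch of using TurboTransformers as a pytorch acceleration
# plug-in. turbo_transformers.BertModel.from_torch and the call
# signature of tt_model are assumptions made for illustration.
import torch
import transformers
import turbo_transformers

# Load a stock huggingface/transformers BERT model.
torch_model = transformers.BertModel.from_pretrained("bert-base-uncased")
torch_model.eval()

# One extra line converts it into a TurboTransformers model.
tt_model = turbo_transformers.BertModel.from_torch(torch_model)

# Variable-length inputs: no padding to a fixed length and no graph
# preprocessing; each call may use a different sequence length.
with torch.no_grad():
    for seq_len in (10, 33, 71):
        input_ids = torch.randint(
            low=0, high=torch_model.config.vocab_size,
            size=(1, seq_len), dtype=torch.long)
        outputs = tt_model(input_ids)
```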
-TurboTransformers has been applied to multiple online BERT service scenarios within Tencent. For example, it brings 1.88x acceleration to the WeChat FAQ service, 2.11x acceleration to the public cloud sentiment analysis service, and 13.6x acceleration to the QQ recommendation system.
+TurboTransformers has been applied to multiple online BERT service scenarios in Tencent. For example, it brings 1.88x acceleration to the WeChat FAQ service, 2.11x acceleration to the public cloud sentiment analysis service, and 13.6x acceleration to the QQ recommendation system.

The following table is a comparison of TurboTransformers and related work.

@@ -135,3 +136,6 @@ We choose [pytorch](https://github.com/huggingface "pytorch"), [NVIDIA Faster Tr

<img width="900" height="300" src="./images/M40-perf-0302.jpg" alt="M40 performance">
<img width="900" height="300" src="./images/M40-speedup-0302.jpg" alt="M40 speedup">

+## TODO
+Currently (April 2020), we only support an interface for the BERT encoder model using FP32. In the near future, we will add support for other models (GPT2, decoders, etc.) and low-precision floating point (CPU int8, GPU FP16).
2 changes: 1 addition & 1 deletion tools/build_and_run_unittests.sh
@@ -21,7 +21,7 @@ fi

SRC_ROOT=$1
WITH_GPU=$2
-BUILD_PATH=/tmp/build_cpu
+BUILD_PATH=/tmp/build
bash ${SRC_ROOT}/tools/compile.sh ${SRC_ROOT} ${WITH_GPU} ${BUILD_PATH}
python3 -m pip install -r ${SRC_ROOT}/requirements.txt
cd ${BUILD_PATH}
