Jiaruifang/update readme (#5)
update readme add TODO
Co-authored-by: jiaruifang <jiaruifang@tencent.com>
feifeibear committed Apr 23, 2020
1 parent eeac89b commit f2d66bc
Showing 2 changed files with 6 additions and 2 deletions.
6 changes: 5 additions & 1 deletion README.md
@@ -8,7 +8,8 @@ Transformer is the most critical algorithm innovation in the NLP field in recent
1. Excellent CPU/GPU performance. On Intel multi-core CPU and NVIDIA GPU hardware platforms, TurboTransformers can fully utilize all levels of computing power of the hardware. It achieves better performance than pytorch/tensorflow and current mainstream optimization engines (such as onnxruntime-mkldnn/onnxruntime-gpu, torch JIT, and NVIDIA Faster Transformers) on a variety of CPU and GPU hardware. See the detailed benchmark results below.
2. Tailored to the characteristics of NLP inference tasks. Unlike CV tasks, the input dimensions of NLP inference tasks change constantly. The traditional approach is zero padding or truncation to a fixed length, which introduces extra zero-padding computation. Moreover, some frameworks such as onnxruntime, TensorRT, and torchlib need to preprocess the computation graph for the input size in advance, which is not suitable for NLP tasks with varying input sizes. TurboTransformers supports variable-length input sequences without any preprocessing.
3. A simpler method of use. TurboTransformers provides both Python and C++ interfaces and can be used as an acceleration plug-in for pytorch: in a Transformer task, end-to-end acceleration is obtained by adding a few lines of Python code, as shown in the sketch below.
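The sketch below illustrates that plug-in workflow. It is a minimal example, assuming a conversion helper along the lines of `turbo_transformers.BertModel.from_torch`; the exact class and method names are illustrative assumptions rather than facts from this commit, so consult the repository's examples for the authoritative API.

```python
# Minimal sketch of using TurboTransformers as a pytorch acceleration
# plug-in. turbo_transformers.BertModel.from_torch and the call
# signature of tt_model are assumptions made for illustration.
import torch
import transformers
import turbo_transformers

# Load a stock huggingface/transformers BERT model.
torch_model = transformers.BertModel.from_pretrained("bert-base-uncased")
torch_model.eval()

# One extra line converts it into a TurboTransformers model.
tt_model = turbo_transformers.BertModel.from_torch(torch_model)

# Variable-length inputs: no padding to a fixed length and no graph
# preprocessing; each call may use a different sequence length.
with torch.no_grad():
    for seq_len in (10, 33, 71):
        input_ids = torch.randint(
            low=0, high=torch_model.config.vocab_size,
            size=(1, seq_len), dtype=torch.long)
        outputs = tt_model(input_ids)
```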
-TurboTransformers has been applied to multiple online BERT service scenarios within Tencent. For example, it brings 1.88x acceleration to the WeChat FAQ service, 2.11x acceleration to the public cloud sentiment analysis service, and 13.6x acceleration to the QQ recommendation system.
+TurboTransformers has been applied to multiple online BERT service scenarios in Tencent. For example, it brings 1.88x acceleration to the WeChat FAQ service, 2.11x acceleration to the public cloud sentiment analysis service, and 13.6x acceleration to the QQ recommendation system.

The following table is a comparison of TurboTransformers and related work.

@@ -135,3 +136,6 @@ We choose [pytorch](https://github.com/huggingface "pytorch"), [NVIDIA Faster Tr

<img width="900" height="300" src="./images/M40-perf-0302.jpg" alt="M40 performance">
<img width="900" height="300" src="./images/M40-speedup-0302.jpg" alt="M40 speedup">

+## TODO
+Currently (April 2020), we only support an interface for the BERT encoder model using FP32. In the near future, we will add support for other models (GPT2, decoders, etc.) and low-precision floating point (CPU int8, GPU FP16).
2 changes: 1 addition & 1 deletion tools/build_and_run_unittests.sh
@@ -21,7 +21,7 @@ fi

SRC_ROOT=$1
WITH_GPU=$2
-BUILD_PATH=/tmp/build_cpu
+BUILD_PATH=/tmp/build
bash ${SRC_ROOT}/tools/compile.sh ${SRC_ROOT} ${WITH_GPU} ${BUILD_PATH}
python3 -m pip install -r ${SRC_ROOT}/requirements.txt
cd ${BUILD_PATH}
