Enhance arm int8 #486
Conversation
seanxcwang commented Oct 20, 2020 (edited)
- int8 depthwise conv: added handling for channel counts that are not a multiple of 8
- Added 8x8 and 4x8 gemm kernels for cases with a small channel count and large hw
- Added a conv factory to eliminate the duplicated conv-impl selection code
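The factory idea above centralizes kernel selection in one place instead of repeating it at every call site. The sketch below is illustrative only: the struct, function name, thresholds, and kernel labels are hypothetical, not TNN's actual implementation.

```cpp
#include <string>

// Hypothetical conv parameters relevant to int8 kernel selection.
struct ConvParams {
    int input_channel;
    int output_height;
    int output_width;
};

// Illustrative dispatcher: for small channel counts with a large spatial
// size, prefer kernels that tile over hw (8x8 / 4x8); otherwise fall back
// to the default implementation. Thresholds here are made up.
std::string SelectInt8GemmKernel(const ConvParams &p) {
    const int hw = p.output_height * p.output_width;
    if (p.input_channel <= 8 && hw >= 64) {
        return (hw % 8 == 0) ? "gemm_int8_8x8" : "gemm_int8_4x8";
    }
    return "gemm_int8_default";
}
```

With a dispatcher like this, each conv acc only asks the factory for an implementation, so the selection logic is written once.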
void ArmConvLayerAccFactory::CreateImpFP(const std::vector<Blob *> &inputs, const std::vector<Blob *> &outputs,
CreateImpFP should be CreateImpHalf?
} // namespace TNN_NS
#endif  // TNN_SOURCE_TNN_DEVICE_ARM_ARM_CONV_LAYER_GROUP_H_
typo
Codecov Report

@@           Coverage Diff           @@
##           master     #486   +/-   ##
=======================================
  Coverage    24.34%   24.34%
=======================================
  Files          287      287
  Lines         8979     8979
=======================================
  Hits          2186     2186
  Misses        6793     6793

Continue to review the full report at Codecov.
c5e016c
[ARM][OPT] 1. add dw tail process 2. add qgemm asm kernel(big hw and small c) 3. add conv impl factory Co-authored-by: neiltian <65950677+neiltian-tencent@users.noreply.github.com>
* minor update
* Hotfix linux compile (#446)
* [CONVERTER][BUG]1. fix complie failed on centos (gcc 4.9);
* fix codecc warning (#441) Co-authored-by: lnmdlong <lnmdlong@hotmail.com> Co-authored-by: devandong <devandong@tencent.com> Co-authored-by: quinnrong94 <quinnrong@tencent.com>
* interpret func use reference param (#451)
* ncnn interpret use reference param
* fix enum value error (#454)
* add missing letter (#455) Co-authored-by: nihui <shuizhuyuanluo@126.com>
* Stable v0.2 merge master (#419)
* [EXAMPLES][PATCH] add face align demo and refactor for some case
* [EXAMPLES][FIX] fix align opencl error
* [EXAMPLES][FIX] fix arm linux demo
* [EXAMPLE][FIX] fix android preview size error
* [UPD]update youtu face alignment mean pts logic (#385)
* [BUG]fix YouTu face alignment model
* [UPD]update mean pts file logic
* [UPD]draw face points green
* [UPD]unify example controller list Co-authored-by: neiltian <65950677+neiltian-tencent@users.noreply.github.com>
* [ARM][BUG] fix sqrt layer with zero input (#392)
* pull tflite2tnn tools (#378)
* add tflite2tnn tools Co-authored-by: lucasktian <lucasktian@tencent.com>
* [UPD]move blaze anchor file to resource; fix blazeface error; (#390)
* [UPD]move blaze anchor file to resource Co-authored-by: neiltian <65950677+neiltian-tencent@users.noreply.github.com>
* [OPENCL]support google pixel phone opencl mode (#399) Co-authored-by: janchen <janchen@tencent.com> Co-authored-by: ShaunDai <66760945+shaundai-tencent@users.noreply.github.com> Co-authored-by: neiltian <65950677+neiltian-tencent@users.noreply.github.com>
* [UPD] update readme (#404)
* [UPD] fix newline in README_en.md Co-authored-by: neiltian <65950677+neiltian-tencent@users.noreply.github.com> Co-authored-by: darrenyao87 <62542779+darrenyao87@users.noreply.github.com>
* [OPENCL][BUG] fix gflops calculate bug in conv (#412)
* [OPENCL][FIX] fix deconv calculate flops Co-authored-by: neiltian <65950677+neiltian-tencent@users.noreply.github.com> Co-authored-by: neiltian <neiltian@tencent.com>
* Hotfix issue 400 (#410)
* [CPU] fix bfp16 blob converter
* [CPU] fix cpu device allocate
* [CPU] skip blob converter to yuv mat Co-authored-by: lucasktian <lucasktian@tencent.com>
* [ARM][BUG] fix armv7 gemm_float_n4 error Co-authored-by: darrenyao87 <62542779+darrenyao87@users.noreply.github.com> Co-authored-by: quinnrong94 <67782915+quinnrong94@users.noreply.github.com> Co-authored-by: stephehuang <69882565+stephehuang@users.noreply.github.com> Co-authored-by: lucasktian <lucasktian@tencent.com> Co-authored-by: Bbean <j850447553@icloud.com> Co-authored-by: janchen <janchen@tencent.com> Co-authored-by: ShaunDai <66760945+shaundai-tencent@users.noreply.github.com> Co-authored-by: devandong <67893313+devandong@users.noreply.github.com> Co-authored-by: seanxcwang <66675860+seanxcwang@users.noreply.github.com>
* x86 demo minor change
* x86 demo add resize
* add null check for MatUtils
* [RKNPU][CHG] add leakly relu convert (#465)
* [NPU][ADD] add Huawei NPU profiling support issue, #463
* [OPENCL][BUG] fix profiling summary incorrect when loop count > 1
* add webcam based demo
* add null check for MatUtils (#466)
* [Fix] Fix pad layer inconsistent problem
* [X86][OPENVINO] increase x86 unary layer operator
* add x86 demo: blaze face detector & aligner
* x86 demo change to UltraFaceDetecotr
* Update reshape 's conversion in TFLite (#469) Co-authored-by: lucasktian <lucasktian@tencent.com>
* x86 demo msvc ok
* fix cmake versioning & macos build scripts
* [X86][OPENVINO] Add Binary Op Frame
* [COMPILE][FIX] fix gnustl_static compile error and warning
* fix xcode compile error
* fix xcode build errors
* [FIX] fix sdk sample build error
* [NPU][CHG] refactor cpu blob converter, separate NPU blob converter from cpu blob converter
* [X86] Increase x86 convolution layer (im2col)
* [EXAMPLE][BUG] fix cls id error
* [X86] add pooling layer (max & average)
* [OPENCL][BUG] fix fp16 overflow risk in avg pooling issue, #480
* [OPENCL][CHG] add more error info
* [X86] Add batch and scale layer
* [X86] add reduce op layer
* fix base layer_builder Init
* build metal on macos
* [X86][OPENVINO] add splitv layer for openvino
* fix display of README_EN (#484) Co-authored-by: darrenyao87 <62542779+darrenyao87@users.noreply.github.com>
* [x86] add all reduce layer operations
* Opencl reduce softmax opt (#443)
* [OPENCL][BUG] skip NNV21/NNV12 blob converter test case, not supported for now
* [OPENCL][OPT] optimize reduce perf with fine-grained parallelism when parallelism is low, intensity is high
* [OPENCL][BUG] fix work group size init, ensure size to be power of 2
* [OPENCL][OPT] optimize softmax perf with fine-grained parallelism when parallelism is low, intensity is high
* [OPENCL] refine code for pull request
* update opencl program
* [OPENCL][FIX] use fp16 for local memory when enabled && fix global work items filter && set threshold based on experiments Co-authored-by: neiltian <65950677+neiltian-tencent@users.noreply.github.com> Co-authored-by: neiltian <neiltian@tencent.com>
* Feature fp16 workflow (#482)
* [OPT] rename int8 reformat; change SupportDevice to IsSupported
* [OPT] add GetEnabledPrecision in abstract device; implement RegisterLayerPrecision in arm device
* [OPT] update global_device_map only once
* [OPT] set fp16 blob in network initlayers
* [OPT] support fp16 reformat in net_optimizer
* [OPT] get cpu fp16 capability; refactor update blob precision
* [FIX] update fp16 blob with cpu support
* [FIX] fix typo
* [CHG] only update precision for cpu; rename to ImplementedPrecision
* Npu fp16 fix (#488)
* [NPU][UPD] add test android
* [NPU][UPD] add fp16
* [NPU][UPD] modify build test script
* [NPU][UPD]add permute op Co-authored-by: neiltian <65950677+neiltian-tencent@users.noreply.github.com> Co-authored-by: ShaunDai <66760945+shaundai-tencent@users.noreply.github.com>
* [FIX] fix inner_product_layer_builder error
* [FIX][X86][OPENVINO] fix openvino deconvolution shape unaligned
* [NPU][BUG] fix compile error due to api change
* Enhance arm int8 (#486) [ARM][OPT] 1. add dw tail process 2. add qgemm asm kernel(big hw and small c) 3. add conv impl factory Co-authored-by: neiltian <65950677+neiltian-tencent@users.noreply.github.com>
* Feature mat make border (#491)
* [CHG] enhance mat converter param check
* [CPU] implement cpu copy make border
* [TEST] add copy make border unit test
* [ARM] support copy make border
* [METAL] support copy make border
* [CHG] reset dst mat only when its data is nullptr
* [OPENCL][ADD] support copy make border
* [ARM] optimize mat copy make border
* [Metal] disable interpolation-related unit_tests on Metal Co-authored-by: devandong <devandong@tencent.com> Co-authored-by: lnmdlong <lnmdlong@hotmail.com> Co-authored-by: neiltian <65950677+neiltian-tencent@users.noreply.github.com>
* [OPENCL] fix chinese comments (#493)
* [DEVICE][OPENCL] change comment Co-authored-by: neiltian <65950677+neiltian-tencent@users.noreply.github.com> Co-authored-by: neiltian <neiltian@tencent.com>
* [X86] Add HardSwish layer acc
* [X86] Add Optimized HardSwish Layer ACC, fix custom_implmentation issue
* Enhance warpaffine nearest (#501)
* [CPU] support nearest warpaffine
* [CPU] fix nearest choose error
* [ARM] support nearest warpaffine
* [Metal] support nearest warpaffine
* [Metal] fix bilinear warpaffine border access error
* [ARM] optimize channel equals 4
* [OPENCL] support nearest warpaffine Co-authored-by: devandong <devandong@tencent.com> Co-authored-by: lnmdlong <lnmdlong@hotmail.com>
* [ONNX][BUG] fix pool fusion bug (#500)
* [DEV][UPD] 1. Int8Reformat -> Reformat;
* [X86] add concat layer
* [X86] resolve conflicts
* [X86][OPENVINO] fix splitv layer builder

Co-authored-by: Dandiding <Dandiding@tencent.com>
Co-authored-by: lucasktian <lucasktian@tencent.com>
Co-authored-by: seanxcwang <66675860+seanxcwang@users.noreply.github.com>
Co-authored-by: lnmdlong <lnmdlong@hotmail.com>
Co-authored-by: devandong <devandong@tencent.com>
Co-authored-by: quinnrong94 <quinnrong@tencent.com>
Co-authored-by: 103yiran <1039105206@qq.com>
Co-authored-by: nihui <shuizhuyuanluo@126.com>
Co-authored-by: neiltian <65950677+neiltian-tencent@users.noreply.github.com>
Co-authored-by: darrenyao87 <62542779+darrenyao87@users.noreply.github.com>
Co-authored-by: quinnrong94 <67782915+quinnrong94@users.noreply.github.com>
Co-authored-by: stephehuang <69882565+stephehuang@users.noreply.github.com>
Co-authored-by: Bbean <j850447553@icloud.com>
Co-authored-by: janchen <janchen@tencent.com>
Co-authored-by: ShaunDai <66760945+shaundai-tencent@users.noreply.github.com>
Co-authored-by: devandong <67893313+devandong@users.noreply.github.com>
Co-authored-by: shaundai <shaundai@tencent.com>
Co-authored-by: Dandi Ding <bluaxe@users.noreply.github.com>
Co-authored-by: ealinli <37806708+1627180283@users.noreply.github.com>
Co-authored-by: seanxcwang <seanxcwang@tencent.com>
Co-authored-by: neiltian <neiltian@tencent.com>
Co-authored-by: yeli <32798887+yl16417@users.noreply.github.com>