Enhance arm int8 #486
Conversation
seanxcwang commented Oct 20, 2020 (edited)
- int8 depthwise conv: added handling for channel counts that are not a multiple of 8
- Added 8x8 and 4x8 gemm kernels for cases with a small channel count and large hw
- Added a conv factory to eliminate the duplicated conv-impl selection code
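The factory idea above centralizes kernel selection in one place instead of repeating it at every call site. The sketch below is illustrative only: the struct, function name, thresholds, and kernel labels are hypothetical, not TNN's actual implementation.

```cpp
#include <string>

// Hypothetical conv parameters relevant to int8 kernel selection.
struct ConvParams {
    int input_channel;
    int output_height;
    int output_width;
};

// Illustrative dispatcher: for small channel counts with a large spatial
// size, prefer kernels that tile over hw (8x8 / 4x8); otherwise fall back
// to the default implementation. Thresholds here are made up.
std::string SelectInt8GemmKernel(const ConvParams &p) {
    const int hw = p.output_height * p.output_width;
    if (p.input_channel <= 8 && hw >= 64) {
        return (hw % 8 == 0) ? "gemm_int8_8x8" : "gemm_int8_4x8";
    }
    return "gemm_int8_default";
}
```

With a dispatcher like this, each conv acc only asks the factory for an implementation, so the selection logic is written once.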
void ArmConvLayerAccFactory::CreateImpFP(const std::vector<Blob *> &inputs, const std::vector<Blob *> &outputs,
CreateImpFP should be CreateImpHalf?
} // namespace TNN_NS
#endif  // TNN_SOURCE_TNN_DEVICE_ARM_ARM_CONV_LAYER_GROUP_H_
typo
Codecov Report

@@           Coverage Diff           @@
##           master     #486   +/-   ##
=======================================
  Coverage    24.34%   24.34%
=======================================
  Files          287      287
  Lines         8979     8979
=======================================
  Hits          2186     2186
  Misses        6793     6793

Continue to review the full report at Codecov.
c5e016c
[ARM][OPT] 1. add dw tail process 2. add qgemm asm kernel(big hw and small c) 3. add conv impl factory Co-authored-by: neiltian <65950677+neiltian-tencent@users.noreply.github.com>
* minor update
* Hotfix linux compile (#446)
* [CONVERTER][BUG]1. fix complie failed on centos (gcc 4.9);
* fix codecc warning (#441) Co-authored-by: lnmdlong <lnmdlong@hotmail.com> Co-authored-by: devandong <devandong@tencent.com> Co-authored-by: quinnrong94 <quinnrong@tencent.com>
* interpret func use reference param (#451)
* ncnn interpret use reference param
* fix enum value error (#454)
* add missing letter (#455) Co-authored-by: nihui <shuizhuyuanluo@126.com>
* Stable v0.2 merge master (#419)
* [EXAMPLES][PATCH] add face align demo and refactor for some case
* [EXAMPLES][FIX] fix align opencl error
* [EXAMPLES][FIX] fix arm linux demo
* [EXAMPLE][FIX] fix android preview size error
* [UPD]update youtu face alignment mean pts logic (#385)
* [BUG]fix YouTu face alignment model
* [UPD]update mean pts file logic
* [UPD]draw face points green
* [UPD]unify example controller list Co-authored-by: neiltian <65950677+neiltian-tencent@users.noreply.github.com>
* [ARM][BUG] fix sqrt layer with zero input (#392)
* pull tflite2tnn tools (#378)
* add tflite2tnn tools Co-authored-by: lucasktian <lucasktian@tencent.com>
* [UPD]move blaze anchor file to resource; fix blazeface error; (#390)
* [UPD]move blaze anchor file to resource Co-authored-by: neiltian <65950677+neiltian-tencent@users.noreply.github.com>
* [OPENCL]support google pixel phone opencl mode (#399) Co-authored-by: janchen <janchen@tencent.com> Co-authored-by: ShaunDai <66760945+shaundai-tencent@users.noreply.github.com> Co-authored-by: neiltian <65950677+neiltian-tencent@users.noreply.github.com>
* [UPD] update readme (#404)
* [UPD] fix newline in README_en.md Co-authored-by: neiltian <65950677+neiltian-tencent@users.noreply.github.com> Co-authored-by: darrenyao87 <62542779+darrenyao87@users.noreply.github.com>
* [OPENCL][BUG] fix gflops calculate bug in conv (#412)
* [OPENCL][FIX] fix deconv calculate flops Co-authored-by: neiltian <65950677+neiltian-tencent@users.noreply.github.com> Co-authored-by: neiltian <neiltian@tencent.com>
* Hotfix issue 400 (#410)
* [CPU] fix bfp16 blob converter
* [CPU] fix cpu device allocate
* [CPU] skip blob converter to yuv mat Co-authored-by: lucasktian <lucasktian@tencent.com>
* [ARM][BUG] fix armv7 gemm_float_n4 error Co-authored-by: darrenyao87 <62542779+darrenyao87@users.noreply.github.com> Co-authored-by: quinnrong94 <67782915+quinnrong94@users.noreply.github.com> Co-authored-by: stephehuang <69882565+stephehuang@users.noreply.github.com> Co-authored-by: lucasktian <lucasktian@tencent.com> Co-authored-by: Bbean <j850447553@icloud.com> Co-authored-by: janchen <janchen@tencent.com> Co-authored-by: ShaunDai <66760945+shaundai-tencent@users.noreply.github.com> Co-authored-by: devandong <67893313+devandong@users.noreply.github.com> Co-authored-by: seanxcwang <66675860+seanxcwang@users.noreply.github.com>
* x86 demo minor change
* x86 demo add resize
* add null check for MatUtils
* [RKNPU][CHG] add leakly relu convert (#465)
* [NPU][ADD] add Huawei NPU profiling support issue, #463
* [OPENCL][BUG] fix profiling summary incorrect when loop count > 1
* add webcam based demo
* add null check for MatUtils (#466)
* [Fix] Fix pad layer inconsistent problem
* [X86][OPENVINO] increase x86 unary layer operator
* add x86 demo: blaze face detector & aligner
* x86 demo change to UltraFaceDetecotr
* Update reshape 's conversion in TFLite (#469) Co-authored-by: lucasktian <lucasktian@tencent.com>
* x86 demo msvc ok
* fix cmake versioning & macos build scripts
* [X86][OPENVINO] Add Binary Op Frame
* [COMPILE][FIX] fix gnustl_static compile error and warning
* fix xcode compile error
* fix xcode build errors
* [FIX] fix sdk sample build error
* [NPU][CHG] refactor cpu blob converter, separate NPU blob converter from cpu blob converter
* [X86] Increase x86 convolution layer (im2col)
* [EXAMPLE][BUG] fix cls id error
* [X86] add pooling layer (max & average)
* [OPENCL][BUG] fix fp16 overflow risk in avg pooling issue, #480
* [OPENCL][CHG] add more error info
* [X86] Add batch and scale layer
* [X86] add reduce op layer
* fix base layer_builder Init
* build metal on macos
* [X86][OPENVINO] add splitv layer for openvino
* fix display of README_EN (#484) Co-authored-by: darrenyao87 <62542779+darrenyao87@users.noreply.github.com>
* [x86] add all reduce layer operations
* Opencl reduce softmax opt (#443)
* [OPENCL][BUG] skip NNV21/NNV12 blob converter test case, not supported for now
* [OPENCL][OPT] optimize reduce perf with fine-grained parallelism when parallelism is low, intensity is high
* [OPENCL][BUG] fix work group size init, ensure size to be power of 2
* [OPENCL][OPT] optimize softmax perf with fine-grained parallelism when parallelism is low, intensity is high
* [OPENCL] refine code for pull request
* update opencl program
* [OPENCL][FIX] use fp16 for local memory when enabled && fix global work items filter && set threshold based on experiments Co-authored-by: neiltian <65950677+neiltian-tencent@users.noreply.github.com> Co-authored-by: neiltian <neiltian@tencent.com>
* Feature fp16 workflow (#482)
* [OPT] rename int8 reformat; change SupportDevice to IsSupported
* [OPT] add GetEnabledPrecision in abstract device; implement RegisterLayerPrecision in arm device
* [OPT] update global_device_map only once
* [OPT] set fp16 blob in network initlayers
* [OPT] support fp16 reformat in net_optimizer
* [OPT] get cpu fp16 capability; refactor update blob precision
* [FIX] update fp16 blob with cpu support
* [FIX] fix typo
* [CHG] only update precision for cpu; rename to ImplementedPrecision
* Npu fp16 fix (#488)
* [NPU][UPD] add test android
* [NPU][UPD] add fp16
* [NPU][UPD] modify build test script
* [NPU][UPD]add permute op Co-authored-by: neiltian <65950677+neiltian-tencent@users.noreply.github.com> Co-authored-by: ShaunDai <66760945+shaundai-tencent@users.noreply.github.com>
* [FIX] fix inner_product_layer_builder error
* [FIX][X86][OPENVINO] fix openvino deconvolution shape unaligned
* [NPU][BUG] fix compile error due to api change
* Enhance arm int8 (#486) [ARM][OPT] 1. add dw tail process 2. add qgemm asm kernel(big hw and small c) 3. add conv impl factory Co-authored-by: neiltian <65950677+neiltian-tencent@users.noreply.github.com>
* Feature mat make border (#491)
* [CHG] enhance mat converter param check
* [CPU] implement cpu copy make border
* [TEST] add copy make border unit test
* [ARM] support copy make border
* [METAL] support copy make border
* [CHG] reset dst mat only when its data is nullptr
* [OPENCL][ADD] support copy make border
* [ARM] optimize mat copy make border
* [Metal] disable interpolation-related unit_tests on Metal Co-authored-by: devandong <devandong@tencent.com> Co-authored-by: lnmdlong <lnmdlong@hotmail.com> Co-authored-by: neiltian <65950677+neiltian-tencent@users.noreply.github.com>
* [OPENCL] fix chinese comments (#493)
* [DEVICE][OPENCL] change comment Co-authored-by: neiltian <65950677+neiltian-tencent@users.noreply.github.com> Co-authored-by: neiltian <neiltian@tencent.com>
* [X86] Add HardSwish layer acc
* [X86] Add Optimized HardSwish Layer ACC, fix custom_implmentation issue
* Enhance warpaffine nearest (#501)
* [CPU] support nearest warpaffine
* [CPU] fix nearest choose error
* [ARM] support nearest warpaffine
* [Metal] support nearest warpaffine
* [Metal] fix bilinear warpaffine border access error
* [ARM] optimize channel equals 4
* [OPENCL] support nearest warpaffine Co-authored-by: devandong <devandong@tencent.com> Co-authored-by: lnmdlong <lnmdlong@hotmail.com>
* [ONNX][BUG] fix pool fusion bug (#500)
* [DEV][UPD] 1. Int8Reformat -> Reformat;
* [X86] add concat layer
* [X86] resolve conflicts
* [X86][OPENVINO] fix splitv layer builder

Co-authored-by: Dandiding <Dandiding@tencent.com>
Co-authored-by: lucasktian <lucasktian@tencent.com>
Co-authored-by: seanxcwang <66675860+seanxcwang@users.noreply.github.com>
Co-authored-by: lnmdlong <lnmdlong@hotmail.com>
Co-authored-by: devandong <devandong@tencent.com>
Co-authored-by: quinnrong94 <quinnrong@tencent.com>
Co-authored-by: 103yiran <1039105206@qq.com>
Co-authored-by: nihui <shuizhuyuanluo@126.com>
Co-authored-by: neiltian <65950677+neiltian-tencent@users.noreply.github.com>
Co-authored-by: darrenyao87 <62542779+darrenyao87@users.noreply.github.com>
Co-authored-by: quinnrong94 <67782915+quinnrong94@users.noreply.github.com>
Co-authored-by: stephehuang <69882565+stephehuang@users.noreply.github.com>
Co-authored-by: Bbean <j850447553@icloud.com>
Co-authored-by: janchen <janchen@tencent.com>
Co-authored-by: ShaunDai <66760945+shaundai-tencent@users.noreply.github.com>
Co-authored-by: devandong <67893313+devandong@users.noreply.github.com>
Co-authored-by: shaundai <shaundai@tencent.com>
Co-authored-by: Dandi Ding <bluaxe@users.noreply.github.com>
Co-authored-by: ealinli <37806708+1627180283@users.noreply.github.com>
Co-authored-by: seanxcwang <seanxcwang@tencent.com>
Co-authored-by: neiltian <neiltian@tencent.com>
Co-authored-by: yeli <32798887+yl16417@users.noreply.github.com>