
Enhance arm int8 #486

Merged
merged 17 commits into from
Oct 27, 2020

seanxcwang
Collaborator

seanxcwang commented Oct 20, 2020

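  1. int8 dw conv: added handling for channel counts that are not a multiple of 8
  2. For cases with a small channel count and a large hw, added 8x8 and 4x8 GEMM kernels
  3. Added a conv factory to eliminate the duplicated code for selecting the conv impl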
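Item 1 can be illustrated with a scalar sketch. It assumes an HWC layout (channel innermost), stride 1, no padding, int8 weights stored as [k][k][C], and leaves out requantization; `DepthwiseInt8` is a hypothetical name, not the TNN function. Channels are processed in full blocks of 8, where a NEON kernel would use one 8-lane int8 vector per operand, and the remaining `c % 8` channels go through a scalar tail loop, so any channel count is covered.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sketch, not TNN code: int8 depthwise conv on an HWC buffer
// (channel innermost), stride 1, no padding, kernel k x k.
// Weights are laid out as [k][k][C]. Returns raw int32 accumulators;
// requantization back to int8 is omitted.
std::vector<int32_t> DepthwiseInt8(const std::vector<int8_t>& src, int h, int w, int c,
                                   const std::vector<int8_t>& weight, int k) {
    const int oh = h - k + 1;
    const int ow = w - k + 1;
    std::vector<int32_t> dst(static_cast<std::size_t>(oh) * ow * c, 0);
    const int c_body = c / 8 * 8;  // channels covered by full 8-wide blocks
    for (int y = 0; y < oh; ++y) {
        for (int x = 0; x < ow; ++x) {
            int32_t* out = &dst[(static_cast<std::size_t>(y) * ow + x) * c];
            for (int ky = 0; ky < k; ++ky) {
                for (int kx = 0; kx < k; ++kx) {
                    const int8_t* in = &src[(static_cast<std::size_t>(y + ky) * w + (x + kx)) * c];
                    const int8_t* wt = &weight[(static_cast<std::size_t>(ky) * k + kx) * c];
                    int ch = 0;
                    // Body: 8 channels per step (one int8x8 vector per operand on NEON).
                    for (; ch < c_body; ch += 8) {
                        for (int i = 0; i < 8; ++i) {
                            out[ch + i] += static_cast<int32_t>(in[ch + i]) * wt[ch + i];
                        }
                    }
                    // Tail: the c % 8 leftover channels, one at a time.
                    for (; ch < c; ++ch) {
                        out[ch] += static_cast<int32_t>(in[ch]) * wt[ch];
                    }
                }
            }
        }
    }
    return dst;
}
```

Without the tail loop, a channel count such as 11 would either skip the last 3 channels or read past the end of each pixel's channel run.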

}
}

void ArmConvLayerAccFactory::CreateImpFP(const std::vector<Blob *> &inputs, const std::vector<Blob *> &outputs,
Collaborator

Should CreateImpFP be CreateImpHalf?
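The conv factory from item 3 can be sketched as a per-precision table of candidate implementations, tried in priority order, each paired with a predicate on the conv parameters. All names here (`ConvImplFactory`, `Candidate`, `ConvParam`, `Precision`, the three impl classes and their selection rules) are hypothetical stand-ins and do not reproduce TNN's actual `ArmConvLayerAccFactory`. With one creation path per precision, the entry-point name is the main cue to which table it builds, which is the point the review question raises.

```cpp
#include <cassert>
#include <functional>
#include <initializer_list>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the conv layer parameters.
struct ConvParam {
    int kernel_h, kernel_w, stride_h, stride_w, group, input_channel;
};

// Hypothetical impl interface and three illustrative impls.
struct ConvImpl {
    virtual ~ConvImpl() = default;
    virtual std::string Name() const = 0;
};
struct ConvDepthwiseImpl : ConvImpl { std::string Name() const override { return "depthwise"; } };
struct Conv3x3Impl : ConvImpl { std::string Name() const override { return "3x3"; } };
struct ConvCommonImpl : ConvImpl { std::string Name() const override { return "common"; } };

enum class Precision { kFloat = 0, kHalf = 1, kInt8 = 2 };

// A candidate pairs a selection predicate with a constructor.
struct Candidate {
    std::function<bool(const ConvParam&)> is_preferred;
    std::function<std::unique_ptr<ConvImpl>()> create;
};

class ConvImplFactory {
public:
    void Register(Precision p, Candidate c) {
        table_[static_cast<int>(p)].push_back(std::move(c));
    }
    // Returns the first registered impl (in registration order) whose
    // predicate accepts the param, or nullptr if none does.
    std::unique_ptr<ConvImpl> Create(Precision p, const ConvParam& param) const {
        for (const Candidate& c : table_[static_cast<int>(p)]) {
            if (c.is_preferred(param)) return c.create();
        }
        return nullptr;
    }

private:
    std::vector<Candidate> table_[3];  // one list per Precision value
};

// Registers the same illustrative rules for every precision, so the
// selection logic lives in one place instead of one copy per precision.
ConvImplFactory MakeDefaultFactory() {
    auto is_depthwise = [](const ConvParam& p) { return p.group > 1 && p.group == p.input_channel; };
    auto is_3x3_s1 = [](const ConvParam& p) {
        return p.group == 1 && p.kernel_h == 3 && p.kernel_w == 3 && p.stride_h == 1 && p.stride_w == 1;
    };
    auto always = [](const ConvParam&) { return true; };
    ConvImplFactory f;
    for (Precision p : {Precision::kFloat, Precision::kHalf, Precision::kInt8}) {
        f.Register(p, {is_depthwise, []() -> std::unique_ptr<ConvImpl> { return std::unique_ptr<ConvImpl>(new ConvDepthwiseImpl()); }});
        f.Register(p, {is_3x3_s1, []() -> std::unique_ptr<ConvImpl> { return std::unique_ptr<ConvImpl>(new Conv3x3Impl()); }});
        f.Register(p, {always, []() -> std::unique_ptr<ConvImpl> { return std::unique_ptr<ConvImpl>(new ConvCommonImpl()); }});
    }
    return f;
}
```

Because the fallback candidate accepts every parameter set, `Create` never returns nullptr for the default table; a real factory might instead leave a precision unregistered and let the caller report an unsupported configuration.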


} // namespace TNN_NS

#endif // TNN_SOURCE_TNN_DEVICE_ARM_ARM_CONV_LAYER_GROUP_H_
Collaborator

typo

@codecov-io

codecov-io commented Oct 22, 2020

Codecov Report

Merging #486 into master will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff           @@
##           master     #486   +/-   ##
=======================================
  Coverage   24.34%   24.34%           
=======================================
  Files         287      287           
  Lines        8979     8979           
=======================================
  Hits         2186     2186           
  Misses       6793     6793           

Last update d70e594...c5e016c.

powerpwang previously approved these changes Oct 26, 2020
quinnrong94 previously approved these changes Oct 26, 2020
seanxcwang merged commit f452c41 into master on Oct 27, 2020
seanxcwang deleted the enhance-arm-int8 branch on October 27, 2020 at 02:51
gttiankai pushed a commit that referenced this pull request Nov 2, 2020
[ARM][OPT] 1. add dw tail process 2. add qgemm asm kernel(big hw and small c) 3. add conv impl factory

Co-authored-by: neiltian <65950677+neiltian-tencent@users.noreply.github.com>
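Item 2 of the description, listed as the "qgemm asm kernel (big hw and small c)" in the commit message above, names 8x8 and 4x8 kernel shapes, which suggests register tiling. A scalar sketch, assuming row-major C[M x N] = A[M x K] * B[K x N] with N taken as the output hw and K as the (small) input channel count: each block of 8 columns is covered by 8x8 tiles, then at most one 4x8 tile, then single rows, and the N % 8 leftover columns use plain loops. The accumulator arrays stand in for the NEON registers an assembly kernel would hold; `GemmInt8` and `TileKernel` are hypothetical names, and the tile shapes in TNN's actual kernels may map to different dimensions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sketch, not TNN's asm kernel: computes a ROWS x 8 tile of
// C = A * B starting at (m0, n0). A is M x K, B is K x N, C is M x N,
// all row-major. acc stands in for the accumulator registers.
template <int ROWS>
void TileKernel(const int8_t* a, const int8_t* b, int32_t* c, int K, int N, int m0, int n0) {
    int32_t acc[ROWS][8] = {};
    for (int k = 0; k < K; ++k) {
        for (int r = 0; r < ROWS; ++r) {
            const int32_t av = a[(m0 + r) * K + k];
            for (int j = 0; j < 8; ++j) {
                acc[r][j] += av * b[k * N + n0 + j];
            }
        }
    }
    for (int r = 0; r < ROWS; ++r) {
        for (int j = 0; j < 8; ++j) {
            c[(m0 + r) * N + n0 + j] = acc[r][j];
        }
    }
}

// int8 GEMM with int32 output: 8x8 tiles first, then at most one 4x8 tile,
// then single-row tiles, and plain loops for the N % 8 leftover columns.
void GemmInt8(const std::vector<int8_t>& a, const std::vector<int8_t>& b,
              std::vector<int32_t>& c, int M, int N, int K) {
    c.assign(static_cast<std::size_t>(M) * N, 0);
    const int n_body = N / 8 * 8;
    for (int n0 = 0; n0 < n_body; n0 += 8) {
        int m0 = 0;
        for (; m0 + 8 <= M; m0 += 8) TileKernel<8>(a.data(), b.data(), c.data(), K, N, m0, n0);
        if (m0 + 4 <= M) {
            TileKernel<4>(a.data(), b.data(), c.data(), K, N, m0, n0);
            m0 += 4;
        }
        for (; m0 < M; ++m0) TileKernel<1>(a.data(), b.data(), c.data(), K, N, m0, n0);
    }
    for (int m = 0; m < M; ++m) {
        for (int n = n_body; n < N; ++n) {
            int32_t sum = 0;
            for (int k = 0; k < K; ++k) sum += a[m * K + k] * b[k * N + n];
            c[m * N + n] = sum;
        }
    }
}
```

When K is small, each output tile needs only a few loads of A and B, so a larger tile spreads those loads over more multiply-accumulates; this is the usual motivation for such kernels, not a measured claim about TNN.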
bluaxe added a commit that referenced this pull request Nov 9, 2020
* minor update

* Hotfix linux compile (#446)

* [CONVERTER][BUG] 1. fix compile failure on CentOS (gcc 4.9);

* fix codecc warning (#441)

fix codecc warning

Co-authored-by: lnmdlong <lnmdlong@hotmail.com>
Co-authored-by: devandong <devandong@tencent.com>
Co-authored-by: quinnrong94 <quinnrong@tencent.com>

* interpret func use reference param (#451)

* interpret func use reference param

* ncnn interpret use reference param

* fix enum value error (#454)

* add missing letter (#455)

Co-authored-by: nihui <shuizhuyuanluo@126.com>

* Stable v0.2 merge master (#419)

* [EXAMPLES][PATCH] add face align demo and refactor for some case

* [EXAMPLES][FIX] fix align opencl error

* [EXAMPLES][FIX] fix arm linux demo

* [EXAMPLE][FIX] fix android preview size error

* [UPD]update youtu face alignment mean pts logic (#385)

* [BUG]fix YouTu face alignment model

* [UPD]update mean pts file logic

* [UPD]draw face points green

* [UPD]unify example controller list

* [UPD]unify example controller list

Co-authored-by: neiltian <65950677+neiltian-tencent@users.noreply.github.com>

* [ARM][BUG] fix sqrt layer with zero input (#392)

* pull tflite2tnn tools (#378)

* add tflite2tnn tools

Co-authored-by: lucasktian <lucasktian@tencent.com>

* [UPD]move blaze anchor file to resource; fix blazeface error; (#390)

* [BUG]fix YouTu face alignment model

* [UPD]update mean pts file logic

* [UPD]draw face points green

* [UPD]unify example controller list

* [UPD]unify example controller list

* [UPD]move blaze anchor file to resource

Co-authored-by: neiltian <65950677+neiltian-tencent@users.noreply.github.com>

* [OPENCL]support google pixel phone opencl mode (#399)

Co-authored-by: janchen <janchen@tencent.com>
Co-authored-by: ShaunDai <66760945+shaundai-tencent@users.noreply.github.com>
Co-authored-by: neiltian <65950677+neiltian-tencent@users.noreply.github.com>

* [UPD] update readme (#404)

* [UPD] update readme

* [UPD] fix newline in README_en.md

Co-authored-by: neiltian <65950677+neiltian-tencent@users.noreply.github.com>
Co-authored-by: darrenyao87 <62542779+darrenyao87@users.noreply.github.com>

* [OPENCL][BUG] fix gflops calculate bug in conv (#412)

* [OPENCL][BUG] fix gflops calculate bug in conv

* [OPENCL][FIX] fix deconv calculate flops

Co-authored-by: neiltian <65950677+neiltian-tencent@users.noreply.github.com>
Co-authored-by: neiltian <neiltian@tencent.com>

* Hotfix issue 400 (#410)

* [CPU] fix bfp16 blob converter

* [CPU] fix cpu device allocate

* [CPU] skip blob converter to yuv mat

Co-authored-by: lucasktian <lucasktian@tencent.com>

* [ARM][BUG] fix armv7 gemm_float_n4 error

Co-authored-by: darrenyao87 <62542779+darrenyao87@users.noreply.github.com>
Co-authored-by: quinnrong94 <67782915+quinnrong94@users.noreply.github.com>
Co-authored-by: stephehuang <69882565+stephehuang@users.noreply.github.com>
Co-authored-by: lucasktian <lucasktian@tencent.com>
Co-authored-by: Bbean <j850447553@icloud.com>
Co-authored-by: janchen <janchen@tencent.com>
Co-authored-by: ShaunDai <66760945+shaundai-tencent@users.noreply.github.com>
Co-authored-by: devandong <67893313+devandong@users.noreply.github.com>
Co-authored-by: seanxcwang <66675860+seanxcwang@users.noreply.github.com>

* x86 demo minor change

* x86 demo add resize

* add null check for MatUtils

* [RKNPU][CHG] add leaky relu convert (#465)

* [NPU][ADD] add Huawei NPU profiling support

issue, #463

* [OPENCL][BUG] fix profiling summary incorrect when loop count > 1

* add webcam based demo

* add null check for MatUtils (#466)

* [Fix] Fix pad layer inconsistent problem

* [X86][OPENVINO] add x86 unary layer operator

* add x86 demo: blaze face detector & aligner

* x86 demo change to UltraFaceDetector

* Update reshape's conversion in TFLite (#469)

Update reshape's conversion in TFLite

Co-authored-by: lucasktian <lucasktian@tencent.com>

* x86 demo msvc ok

* fix cmake versioning & macos build scripts

* [X86][OPENVINO] Add Binary Op Frame

* [COMPILE][FIX] fix gnustl_static compile error and warning

* fix xcode compile error

* fix xcode build errors

* [FIX] fix sdk sample build error

* [NPU][CHG] refactor cpu blob converter

separate NPU blob converter from cpu blob converter

* [X86] Add x86 convolution layer (im2col)

* [EXAMPLE][BUG] fix cls id error

* [X86] add pooling layer (max & average)

* [OPENCL][BUG] fix fp16 overflow risk in avg pooling

issue, #480

* [OPENCL][CHG] add more error info

* [X86] Add batch and scale layer

* [X86] add reduce op layer

* fix base layer_builder Init

* build metal on macos

* [X86][OPENVINO] add splitv layer for openvino

* fix display of README_EN (#484)

Co-authored-by: darrenyao87 <62542779+darrenyao87@users.noreply.github.com>

* [x86] add all reduce layer operations

* Opencl reduce softmax opt (#443)

* [OPENCL][BUG] skip NNV21/NNV12 blob converter test case, not supported for now

* [OPENCL][OPT] optimize reduce perf with fine-grained parallelism when parallelism is low, intensity is high

* [OPENCL][BUG] fix workgroup size init

* [OPENCL][BUG] fix work group size init, ensure size to be power of 2

* [OPENCL][OPT] optimize softmax perf with fine-grained parallelism when parallelism is low, intensity is high

* [OPENCL] refine code for pull request

* update opencl program

* [OPENCL][FIX] use fp16 for local memory when enable && fix global work items filter && set threshold based on experiments

Co-authored-by: neiltian <65950677+neiltian-tencent@users.noreply.github.com>
Co-authored-by: neiltian <neiltian@tencent.com>

* Feature fp16 workflow (#482)

* [OPT] rename int8 reformat; change SupportDevice to IsSupported

* [OPT] add GetEnabledPrecision in abstract device; implement RegisterLayerPrecision in arm device

* [OPT] update global_device_map only once

* [OPT] set fp16 blob in network initlayers

* [OPT] support fp16 reformat in net_optimizer

* [OPT] get cpu fp16 capability; refactor update blob precision

* [FIX] update fp16 blob with cpu support

* [FIX] fix typo

* [CHG] only update precision for cpu; rename to ImplementedPrecision

* Npu fp16 fix (#488)

* [NPU][UPD] add test android

* [NPU][UPD] add fp16

* [NPU][UPD] modify build test script

* [NPU][UPD]add permute op

Co-authored-by: neiltian <65950677+neiltian-tencent@users.noreply.github.com>
Co-authored-by: ShaunDai <66760945+shaundai-tencent@users.noreply.github.com>

* [FIX] fix inner_product_layer_builder error

* [FIX][X86][OPENVINO] fix openvino deconvolution shape unaligned

* [NPU][BUG] fix compile error due to api change

* Enhance arm int8 (#486)

[ARM][OPT] 1. add dw tail process 2. add qgemm asm kernel(big hw and small c) 3. add conv impl factory

Co-authored-by: neiltian <65950677+neiltian-tencent@users.noreply.github.com>

* Feature mat make border (#491)

* [CHG] enhance mat converter param check

* [CPU] implement cpu copy make border

* [TEST] add copy make border unit test

* [ARM] support copy make border

* [METAL] support copy make border

* [CHG] reset dst mat only when its data is nullptr

* [OPENCL][ADD] support copy make border

* [ARM] optimize mat copy make border

* [Metal] disable interpolation-related unit_tests on Metal

Co-authored-by: devandong <devandong@tencent.com>
Co-authored-by: lnmdlong <lnmdlong@hotmail.com>
Co-authored-by: neiltian <65950677+neiltian-tencent@users.noreply.github.com>

* [OPENCL] fix chinese comments (#493)

* [OPENCL] fix chinese comments

* [DEVICE][OPENCL] change comment

Co-authored-by: neiltian <65950677+neiltian-tencent@users.noreply.github.com>
Co-authored-by: neiltian <neiltian@tencent.com>

* [X86] Add HardSwish layer acc

* [X86] Add Optimized HardSwish Layer ACC, fix custom_implmentation issue

* Enhance warpaffine nearest (#501)

* [CPU] support nearest warpaffine

* [CPU] fix nearest choose error

* [ARM] support nearest warpaffine

* [Metal] support nearest warpaffine

* [Metal] fix bilinear warpaffine border access error

* [ARM] optimize channel equals 4

* [OPENCL] support nearest warpaffine

Co-authored-by: devandong <devandong@tencent.com>
Co-authored-by: lnmdlong <lnmdlong@hotmail.com>

* [ONNX][BUG] fix pool fusion bug (#500)

fix pool fusion bug

* [DEV][UPD] 1. Int8Reformat -> Reformat;

* [X86] add concat layer

* [X86] resolve conflicts

* [X86][OPENVINO] fix splitv layer builder

Co-authored-by: Dandiding <Dandiding@tencent.com>
Co-authored-by: lucasktian <lucasktian@tencent.com>
Co-authored-by: seanxcwang <66675860+seanxcwang@users.noreply.github.com>
Co-authored-by: lnmdlong <lnmdlong@hotmail.com>
Co-authored-by: devandong <devandong@tencent.com>
Co-authored-by: quinnrong94 <quinnrong@tencent.com>
Co-authored-by: 103yiran <1039105206@qq.com>
Co-authored-by: nihui <shuizhuyuanluo@126.com>
Co-authored-by: neiltian <65950677+neiltian-tencent@users.noreply.github.com>
Co-authored-by: darrenyao87 <62542779+darrenyao87@users.noreply.github.com>
Co-authored-by: quinnrong94 <67782915+quinnrong94@users.noreply.github.com>
Co-authored-by: stephehuang <69882565+stephehuang@users.noreply.github.com>
Co-authored-by: Bbean <j850447553@icloud.com>
Co-authored-by: janchen <janchen@tencent.com>
Co-authored-by: ShaunDai <66760945+shaundai-tencent@users.noreply.github.com>
Co-authored-by: devandong <67893313+devandong@users.noreply.github.com>
Co-authored-by: shaundai <shaundai@tencent.com>
Co-authored-by: Dandi Ding <bluaxe@users.noreply.github.com>
Co-authored-by: ealinli <37806708+1627180283@users.noreply.github.com>
Co-authored-by: seanxcwang <seanxcwang@tencent.com>
Co-authored-by: neiltian <neiltian@tencent.com>
Co-authored-by: yeli <32798887+yl16417@users.noreply.github.com>
gttiankai pushed a commit that referenced this pull request Nov 9, 2020
[ARM][OPT] 1. add dw tail process 2. add qgemm asm kernel(big hw and small c) 3. add conv impl factory

Co-authored-by: neiltian <65950677+neiltian-tencent@users.noreply.github.com>

5 participants