[arm]add gemm + relu6/leakyrelu fusion #2674

Merged
merged 67 commits into from
Jan 14, 2020

Conversation

chenjiaoAngel
Collaborator

No description provided.

….cc to distinguish between conv3x3s1_depthwise_fp32.cc
it is copied from __gemm_sdot_meta_.h
fix build error in kernels/x86/conv_compute.h
Merge branch 'develop' of git://github.com/PaddlePaddle/Paddle-Lite into PaddlePaddle-develop
Merge branch 'conv_pad' of https://github.com/chenjiaoAngel/Paddle-Lite into conv_pad
delete conv2d_transpose test; this test can be found in test/math/
@@ -11,8 +11,6 @@ if((NOT LITE_WITH_OPENCL AND NOT LITE_WITH_FPGA) AND (LITE_WITH_X86 OR LITE_WITH
lite_cc_test(test_kernel_activation_compute SRCS activation_compute_test.cc DEPS arena_framework ${npu_kernels} ${xpu_kernels} ${x86_kernels} ${cuda_kernels} ${arm_kernels} ${lite_ops} ${host_kernels})
lite_cc_test(test_kernel_argmax_compute SRCS argmax_compute_test.cc DEPS arena_framework ${x86_kernels} ${cuda_kernels} ${arm_kernels} ${lite_ops} ${host_kernels})
lite_cc_test(test_kernel_axpy_compute SRCS axpy_compute_test.cc DEPS arena_framework ${x86_kernels} ${cuda_kernels} ${arm_kernels} ${lite_ops} ${host_kernels})
lite_cc_test(test_kernel_conv_compute SRCS conv_compute_test.cc DEPS arena_framework ${xpu_kernels} ${npu_kernels} ${x86_kernels} ${cuda_kernels} ${arm_kernels} ${lite_ops} ${host_kernels})
lite_cc_test(test_kernel_conv2d_transpose_compute SRCS conv2d_transpose_compute_test.cc DEPS arena_framework ${x86_kernels} ${cuda_kernels} ${arm_kernels} ${lite_ops} ${host_kernels})
Collaborator

Are these two unit tests no longer being run?

Collaborator Author

No, they are no longer run. test/math also has unit tests for conv_compute and conv_transpose. Removing these avoids having to update two sets of unit tests on every change.

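Collaborator

  • I added the conv unit test recently to run the NPU kernel. For an ARM kernel it is skipped, so nothing runs twice.
  • The unit tests under lite/tests/kernels are best kept. 1. These tests mostly check functional behavior, such as padding_algorithm and whether results are correct under different parameter combinations. 2. They verify that the operator part is correct, not only the kernel itself. 3. Unit tests for different platforms can be added here fairly easily: only the place needs to change, with the platforms separated by LITE_WITH_XXX.
  • As I understand it, the unit tests under lite/tests/math check whether the multiple implementations of the same OP kernel are correct, so they should not be merged with the unit tests under lite/tests/kernels.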

Collaborator

image
For example, this makes it quick to add unit tests for different platforms.
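
The screenshot in this comment is not recoverable, so the following is only a rough sketch of the pattern the reviewer describes: one test body whose target is chosen by LITE_WITH_XXX-style macros. `TestPlace` and `PlacesUnderTest` are hypothetical names, not Paddle-Lite types, and which macros a given build defines is an assumption.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for a target description; not the Paddle-Lite Place type.
struct TestPlace {
  std::string target;
};

// Collect the places a kernel test should cover, gated by LITE_WITH_XXX-style
// build macros. The macro set used here is an assumption for illustration.
inline std::vector<TestPlace> PlacesUnderTest() {
  std::vector<TestPlace> places;
#if defined(LITE_WITH_NPU)
  places.push_back({"npu"});
#endif
#if defined(LITE_WITH_ARM)
  places.push_back({"arm"});
#endif
#if defined(LITE_WITH_X86)
  places.push_back({"x86"});
#endif
  // Fall back to a host-only run when no platform macro is defined.
  if (places.empty()) places.push_back({"host"});
  return places;
}
```

With this shape, adding a platform means adding one guarded line, while the test body that loops over the returned places stays unchanged.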

Collaborator Author

OK, sure. I will restore this in my next commit. I removed it because I am currently adding the relu6 fusion, which requires modifying the unit test to set values in conv_param.activation_param; otherwise the unit test fails. I did not want to modify the unit tests twice, so I deleted it.
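
To illustrate why a test has to fill in the activation settings once the activation is fused, here is a minimal reference sketch. `ActParam`, `ActType`, `ApplyAct`, and `BiasAct` are hypothetical names chosen for this example; they only loosely mirror the idea behind conv_param.activation_param and are not the actual Paddle-Lite definitions.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical activation settings a fused conv/gemm test would need to set.
enum class ActType { kNone, kRelu, kRelu6, kLeakyRelu };

struct ActParam {
  bool has_active = false;
  ActType type = ActType::kNone;
  float clip = 6.f;   // upper bound used by relu6
  float alpha = 0.f;  // negative slope used by leaky relu
};

// Scalar reference for the fused activation.
inline float ApplyAct(float x, const ActParam& p) {
  if (!p.has_active) return x;
  switch (p.type) {
    case ActType::kRelu:
      return std::max(x, 0.f);
    case ActType::kRelu6:
      return std::min(std::max(x, 0.f), p.clip);
    case ActType::kLeakyRelu:
      return x >= 0.f ? x : p.alpha * x;
    default:
      return x;
  }
}

// Reference post-processing: add a bias, then apply the activation.
inline std::vector<float> BiasAct(const std::vector<float>& out, float bias,
                                  const ActParam& p) {
  std::vector<float> result;
  result.reserve(out.size());
  for (float v : out) result.push_back(ApplyAct(v + bias, p));
  return result;
}
```

If the test leaves the parameters at their defaults while the kernel under test applies an activation (or the other way round), the reference and the kernel output diverge, which matches the failure the author describes.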

"vld1.32 {d10-d11}, [%[din_ptr]]! @ vld1q_f32(din_ptr) \n" \
"vld1.32 {d12-d13}, [%[din_ptr]]! @ vld1q_f32(din_ptr) \n" \
"vadd.f32 q3, q3, %q[vbias] @ add \n" \
"vadd.f32 q4, q5, %q[vbias] @ add \n" \
Collaborator

Should this be q4, q4? Or is q4, q5 really intended?

Collaborator Author

It should be q4, q4.

"vbif q3, q8, q7 @ choose \n" \
"vbif q4, q10, q9 @ choose \n" \
"vbif q5, q12, q11 @ choose \n" \
"vbif q6, q13, q13 @ choose \n"
Collaborator

q6,q14,q13?

Collaborator Author

q6,q14,q13
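
For readers unfamiliar with the instruction, here is a scalar model of one 32-bit lane of NEON VBIF (bitwise insert if false). The roles assigned to the registers (destination holds x, first operand holds alpha * x, second operand is an all-ones mask where x >= 0) are an assumption about the surrounding kernel, which the thread does not show. The helper names are hypothetical.

```cpp
#include <cstdint>
#include <cstring>

// One lane of VBIF: where a mask bit is 0, the bit from `src` is inserted into
// `dst`; where the mask bit is 1, `dst` keeps its own bit.
inline uint32_t VbifLane(uint32_t dst, uint32_t src, uint32_t mask) {
  return (dst & mask) | (src & ~mask);
}

inline uint32_t Bits(float f) {
  uint32_t u;
  std::memcpy(&u, &f, sizeof(u));
  return u;
}

inline float FromBits(uint32_t u) {
  float f;
  std::memcpy(&f, &u, sizeof(f));
  return f;
}

// Leaky relu for one lane under the assumed register roles described above.
inline float LeakyReluViaVbif(float x, float alpha) {
  const uint32_t mask = (x >= 0.f) ? 0xFFFFFFFFu : 0u;
  return FromBits(VbifLane(Bits(x), Bits(alpha * x), mask));
}
```

In this model, passing the same value as both the inserted operand and the mask, as in `vbif q6, q13, q13`, reduces to `dst & mask`, so the scaled values are never selected. That is consistent with the correction to q6, q14, q13.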

int remain = channel_size % 16;
float32x4_t vzero = vdupq_n_f32(0.f);
for (int j = 0; j < channel; j++) {
}
Collaborator

An empty loop?

Collaborator Author

I forgot to delete it.
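
The thread does not show what the per-channel loop was meant to contain. As a guess at the usual pattern behind `remain = channel_size % 16`, the sketch below applies relu6 in blocks of 16 elements plus a scalar tail. `Relu6Blocked` is a hypothetical name, and plain C++ stands in for the NEON code so the example stays portable.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Apply relu6 in place over `channel` contiguous rows of `channel_size` floats.
// Blocks of 16 model the width of a vectorized main loop (four 4-lane
// registers); the tail loop covers the channel_size % 16 leftover elements.
inline void Relu6Blocked(std::vector<float>& data, int channel,
                         int channel_size) {
  const int kBlock = 16;
  const int cnt = channel_size / kBlock;
  const int remain = channel_size % kBlock;
  for (int j = 0; j < channel; ++j) {
    float* ptr = data.data() + static_cast<std::size_t>(j) * channel_size;
    for (int b = 0; b < cnt; ++b) {
      for (int k = 0; k < kBlock; ++k) {
        ptr[k] = std::min(std::max(ptr[k], 0.f), 6.f);
      }
      ptr += kBlock;
    }
    for (int k = 0; k < remain; ++k) {
      ptr[k] = std::min(std::max(ptr[k], 0.f), 6.f);
    }
  }
}
```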

@MyPandaShaoxiang MyPandaShaoxiang merged commit c0af965 into PaddlePaddle:develop Jan 14, 2020
4 participants