riscv optimize convolution packed by nihui · Pull Request #6731 · Tencent/ncnn

nihui · 2026-05-19T16:34:26Z

No description provided.

tencent-adm · 2026-05-19T16:34:44Z

Thank you for your submission, we really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

codecov-commenter · 2026-05-19T16:38:24Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 95.95%. Comparing base (0f5c6ef) to head (8ef6fab).
⚠️ Report is 1 commit behind head on master.

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #6731      +/-   ##
==========================================
- Coverage   95.95%   95.95%   -0.01%     
==========================================
  Files         970      960      -10     
  Lines      403476   403285     -191     
==========================================
- Hits       387159   386967     -192     
- Misses      16317    16318       +1
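As a sanity check on the coverage table, here is a minimal sketch that confirms Hits + Misses equals Lines for both base and head and computes coverage as hits / lines from the raw counts. The struct and function names are hypothetical.

```cpp
#include <cassert>

// Line totals from a coverage report: every coverable line is a hit or a miss.
struct CoverageTotals
{
    long lines;
    long hits;
    long misses;
};

// True when hits and misses together account for every coverable line.
bool is_consistent(const CoverageTotals& t)
{
    return t.hits + t.misses == t.lines;
}

// Coverage as a percentage of coverable lines.
double coverage_percent(const CoverageTotals& t)
{
    assert(t.lines > 0);
    return 100.0 * (double)t.hits / (double)t.lines;
}
```

With the counts above, both base and head land between 95.95% and 95.96%, which matches the 95.95% headline figure.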

Copilot

Pull request overview

This PR consolidates RISC-V (RVV/Zvfh) convolution and deconvolution implementations by replacing multiple pack-specific kernels (packn/pack1ton/packnto1/pack1) with unified “packed” kernel transform + execution paths, and removes the older per-pack headers.
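Each legacy variant name describes how its input and output blobs are packed. The sketch below spells out that naming. It assumes `packn` is the number of lanes per vector register for the element type, and `legacy_kernel_family` is a hypothetical helper, not an ncnn function.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: the legacy kernel family that matched a given
// input/output packing before this PR consolidated them.
//   elempack     : lanes per element on the input blob (1 or packn)
//   out_elempack : lanes per element on the output blob (1 or packn)
std::string legacy_kernel_family(int elempack, int out_elempack, int packn)
{
    assert(packn > 1);
    assert(elempack == 1 || elempack == packn);
    assert(out_elempack == 1 || out_elempack == packn);

    if (elempack == packn && out_elempack == packn)
        return "packn";    // packed in, packed out
    if (elempack == 1 && out_elempack == packn)
        return "pack1ton"; // scalar in, packed out
    if (elempack == packn && out_elempack == 1)
        return "packnto1"; // packed in, scalar out
    return "pack1";        // scalar in, scalar out
}
```

The unified packed path replaces this four-way split with a single kernel transform and a single forward routine that cover every combination.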

Changes:

- Introduces unified packed kernel transform/execution helpers for convolution and deconvolution (fp32 + fp16s/fp16sa paths).
- Updates RISC-V convolution/deconvolution pipeline/forward code to call the new packed entry points.
- Removes the legacy pack-specific header implementations that are now superseded.
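The following sketch illustrates a packed kernel transform as a blocked weight reorder. It assumes the source weights are laid out as W[o][i][k]. Output and input channels are each split into full `packn` groups followed by leftover single channels, which is the `n / packn + n % packn` grouping used in the `weight_data_tm.create(...)` calls in the diff. The interleave order and the function names are illustrative assumptions, not the exact layout in `convolution_packed.h`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Size of the channel group starting at c0: packn while full groups remain,
// then 1 for each leftover channel.
static int group_size(int c0, int n, int packn)
{
    const int full_end = n / packn * packn; // channels covered by full groups
    return c0 < full_end ? packn : 1;
}

// Hypothetical reorder from W[o][i][k] into grouped blocks. Inside each
// (output group, input group) pair the order is k, then input, then output,
// so the outputs for one (k, i) pair are contiguous, a natural fit for one
// vector multiply-accumulate across output lanes.
std::vector<float> transform_packed(const std::vector<float>& w,
                                    int num_output, int num_input, int maxk, int packn)
{
    assert(packn > 0 && maxk > 0);
    assert(w.size() == (size_t)num_output * num_input * maxk);

    std::vector<float> out;
    out.reserve(w.size());
    for (int o0 = 0; o0 < num_output; o0 += group_size(o0, num_output, packn))
    {
        const int ob = group_size(o0, num_output, packn);
        for (int i0 = 0; i0 < num_input; i0 += group_size(i0, num_input, packn))
        {
            const int ib = group_size(i0, num_input, packn);
            for (int k = 0; k < maxk; k++)
                for (int i = 0; i < ib; i++)
                    for (int o = 0; o < ob; o++)
                        out.push_back(w[((size_t)(o0 + o) * num_input + (i0 + i)) * maxk + k]);
        }
    }
    return out;
}
```

Because the transform only permutes elements, it runs once when the pipeline is created. The forward pass can then stream each block sequentially without any index arithmetic on the original layout.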

Reviewed changes

Copilot reviewed 22 out of 22 changed files in this pull request and generated 4 comments.

File	Description
src/layer/riscv/deconvolution_riscv.cpp	Switches fp32 deconvolution to use `deconvolution_packed.*` transform + forward path.
src/layer/riscv/deconvolution_riscv_zfh.cpp	Switches fp16 deconvolution to use `deconvolution_packed_fp16s.*` transform + forward paths.
src/layer/riscv/deconvolution_packnto1.h	Removed legacy RVV packnto1 deconvolution kernel.
src/layer/riscv/deconvolution_packnto1_fp16s.h	Removed legacy RVV fp16s/fp16sa packnto1 deconvolution kernels.
src/layer/riscv/deconvolution_packn.h	Removed legacy RVV packn deconvolution kernel.
src/layer/riscv/deconvolution_packn_fp16s.h	Removed legacy RVV fp16s/fp16sa packn deconvolution kernels.
src/layer/riscv/deconvolution_pack1ton.h	Removed legacy RVV pack1ton deconvolution kernel.
src/layer/riscv/deconvolution_pack1ton_fp16s.h	Removed legacy RVV fp16s/fp16sa pack1ton deconvolution kernels.
src/layer/riscv/deconvolution_fp16s.h	Removed legacy fp16s deconvolution fallback implementation.
src/layer/riscv/deconvolution_packed.h	Adds unified fp32 packed deconvolution kernel transform + forward implementation.
src/layer/riscv/deconvolution_packed_fp16s.h	Adds unified fp16s/fp16sa packed deconvolution kernel transform + forward implementations.
src/layer/riscv/convolution_riscv.cpp	Switches fp32 convolution to use `convolution_packed.*` transform + forward path.
src/layer/riscv/convolution_riscv_zfh.cpp	Switches fp16 convolution to use `convolution_packed_fp16s.*` transform + forward paths.
src/layer/riscv/convolution_packnto1.h	Removed legacy RVV packnto1 convolution kernel.
src/layer/riscv/convolution_packnto1_fp16s.h	Removed legacy RVV fp16s/fp16sa packnto1 convolution kernels.
src/layer/riscv/convolution_packn.h	Removed legacy RVV packn convolution kernel.
src/layer/riscv/convolution_packn_fp16s.h	Removed legacy RVV fp16s/fp16sa packn convolution kernels.
src/layer/riscv/convolution_pack1ton.h	Removed legacy RVV pack1ton convolution kernel.
src/layer/riscv/convolution_pack1ton_fp16s.h	Removed legacy RVV fp16s/fp16sa pack1ton convolution kernels.
src/layer/riscv/convolution_fp16s.h	Removed legacy fp16s convolution fallback implementation.
src/layer/riscv/convolution_packed.h	Adds unified fp32 packed convolution kernel transform + forward implementation.
src/layer/riscv/convolution_packed_fp16s.h	Adds unified fp16s/fp16sa packed convolution kernel transform + forward implementations.

+#if __riscv_vector
+    if (num_output >= packn)
+    {
+        if (num_input >= packn)
+            weight_data_tm.create(packn * packn * maxk, num_input / packn + num_input % packn, num_output / packn + num_output % packn);
+        else
+            weight_data_tm.create(packn * maxk, num_input, num_output / packn + num_output % packn);
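The `weight_data_tm.create(w, h, c)` calls above size a 3D ncnn `Mat`. The sketch below reproduces only the branches visible in this excerpt so the arithmetic is easy to check. The struct and function names are hypothetical. The case where both channel counts are below `packn` is cut off in the excerpt, so the sketch returns `std::nullopt` for it instead of guessing.

```cpp
#include <cassert>
#include <optional>

// (w, h, c) of the transformed weight Mat.
struct TmShape
{
    int w;
    int h;
    int c;
};

// Full packn blocks plus leftover single channels: n / packn + n % packn.
static int num_groups(int n, int packn)
{
    return n / packn + n % packn;
}

// Mirrors the branches shown in the diff excerpt.
std::optional<TmShape> packed_tm_shape(int num_input, int num_output, int maxk, int packn)
{
    assert(packn > 0 && maxk > 0);
    if (num_output >= packn)
    {
        if (num_input >= packn)
            return TmShape{packn * packn * maxk, num_groups(num_input, packn), num_groups(num_output, packn)};
        return TmShape{packn * maxk, num_input, num_groups(num_output, packn)};
    }
    if (num_input >= packn)
        return TmShape{packn * maxk, num_groups(num_input, packn), num_output};
    return std::nullopt; // branch not shown in the excerpt
}
```

For example, with `packn = 4`, `num_input = 6`, `num_output = 8` and a 3x3 kernel (`maxk = 9`), the sketch gives a shape of (144, 3, 2). That is one full input block plus two leftover input channels, and two full output blocks.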


+            weight_data_tm.create(packn * packn * maxk, num_input / packn + num_input % packn, num_output / packn + num_output % packn, (size_t)2u);
+        else
+            weight_data_tm.create(packn * maxk, num_input, num_output / packn + num_output % packn, (size_t)2u);
+    }
+    else
+    {
+        if (num_input >= packn)
+            weight_data_tm.create(packn * maxk, num_input / packn + num_input % packn, num_output, (size_t)2u);
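This hunk is the fp16 counterpart. The shapes are built the same way, but `(size_t)2u` is passed as the element size, so each weight takes 2 bytes instead of the default 4. The sketch below estimates the resulting footprint. It assumes each channel plane is padded to a 16-byte boundary, modeled on how ncnn's `Mat` computes `cstep`, so treat the result as an approximation. The helper names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Round x up to a multiple of n (n must be a power of two).
static size_t align_size(size_t x, size_t n)
{
    return (x + n - 1) & ~(n - 1);
}

// Approximate bytes for a 3D Mat of shape (w, h, c), with each channel
// plane padded to 16 bytes.
size_t mat3d_bytes(int w, int h, int c, size_t elemsize)
{
    assert(w > 0 && h > 0 && c > 0 && elemsize > 0);
    const size_t cstep = align_size((size_t)w * h * elemsize, 16) / elemsize;
    return cstep * (size_t)c * elemsize;
}
```

Holding the shape fixed at (144, 3, 2), the fp32 layout takes 3456 bytes and the fp16 layout takes 1728 bytes. In practice the fp16 path may use a different `packn` for the same vector length, which changes the shape as well.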


+    if (num_output >= packn)
+    {
+        if (num_input >= packn)
+            weight_data_tm.create(packn * packn * maxk, num_input / packn + num_input % packn, num_output / packn + num_output % packn);
+        else
+            weight_data_tm.create(packn * maxk, num_input, num_output / packn + num_output % packn);
+    }
+    else
+    {
+        if (num_input >= packn)
+            weight_data_tm.create(packn * maxk, num_input / packn + num_input % packn, num_output);


+            weight_data_tm.create(packn * packn * maxk, num_input / packn + num_input % packn, num_output / packn + num_output % packn, (size_t)2u);
+        else
+            weight_data_tm.create(packn * maxk, num_input, num_output / packn + num_output % packn, (size_t)2u);
+    }
+    else
+    {
+        if (num_input >= packn)
+            weight_data_tm.create(packn * maxk, num_input / packn + num_input % packn, num_output, (size_t)2u);


riscv optimize convolution packed

8ef6fab

github-actions Bot added the riscv label May 19, 2026

nihui requested a review from Copilot May 20, 2026 03:23

Copilot started reviewing on behalf of nihui May 20, 2026 03:24

Copilot AI reviewed May 20, 2026

nihui merged commit 023dc18 into Tencent:master May 20, 2026
52 of 53 checks passed

nihui merged 1 commit into Tencent:master from nihui:opt-riscv-packed

4 participants

Conversation

nihui commented May 19, 2026

Uh oh!

tencent-adm commented May 19, 2026

Uh oh!

codecov-commenter commented May 19, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

codecov-commenter commented May 19, 2026 •

edited

Loading