
riscv optimize convolution packed #6731

Merged
nihui merged 1 commit into Tencent:master from nihui:opt-riscv-packed
May 20, 2026


@nihui (Member) commented May 19, 2026

No description provided.

@github-actions github-actions Bot added the riscv label May 19, 2026
@tencent-adm (Member)

CLA assistant check
Thank you for your submission, we really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.

@codecov-commenter commented May 19, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 95.95%. Comparing base (0f5c6ef) to head (8ef6fab).
⚠️ Report is 1 commit behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #6731      +/-   ##
==========================================
- Coverage   95.95%   95.95%   -0.01%     
==========================================
  Files         970      960      -10     
  Lines      403476   403285     -191     
==========================================
- Hits       387159   386967     -192     
- Misses      16317    16318       +1     
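
The percentages in this table can be reproduced from the Hits and Lines rows. A minimal C++ sketch, assuming Codecov rounds down (toward negative infinity) at two decimal places, which is what its YAML settings `round: down` and `precision: 2` describe; `coverage_percent` and `floor_hundredths` are hypothetical helpers, not part of Codecov:

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: line coverage as a percentage (hits / lines * 100).
double coverage_percent(long long hits, long long lines)
{
    assert(lines > 0);
    return 100.0 * static_cast<double>(hits) / static_cast<double>(lines);
}

// Hypothetical helper: round a percentage down (toward negative infinity)
// to two decimal places, returned in hundredths of a percent so callers
// can compare integers instead of floating-point values.
long long floor_hundredths(double percent)
{
    return static_cast<long long>(std::floor(percent * 100.0));
}
```

For the master column, floor_hundredths(coverage_percent(387159, 403476)) is 9595, i.e. 95.95%; the head column (386967 of 403285) also gives 95.95%. The unrounded change is about -0.002 percentage points, which rounds down to -0.01%, matching the diff column.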


Copilot AI (Contributor) left a comment


Pull request overview

This PR consolidates RISC-V (RVV/Zvfh) convolution and deconvolution implementations by replacing multiple pack-specific kernels (packn/pack1ton/packnto1/pack1) with unified “packed” kernel transform + execution paths, and removes the older per-pack headers.

Changes:

  • Introduces unified packed kernel transform/execution helpers for convolution and deconvolution (fp32 + fp16s/fp16sa paths).
  • Updates RISC-V convolution/deconvolution pipeline/forward code to call the new packed entrypoints.
  • Removes the legacy pack-specific header implementations that are now superseded.
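
Before this change, the RISC-V layers picked one of four pack-specific kernels from the input and output element packing, and the unified packed path has to cover all four cases. A sketch of that mapping, assuming the usual ncnn naming (packn: packn in / packn out, pack1ton: 1 in / packn out, packnto1: packn in / 1 out, pack1: 1 in / 1 out); `legacy_kernel_for` is a hypothetical function, not part of ncnn:

```cpp
#include <cassert>
#include <string>

// Hypothetical: which legacy kernel family handled a given combination of
// input elempack and output elempack, where packn is the RVV pack width.
std::string legacy_kernel_for(int elempack, int out_elempack, int packn)
{
    assert(elempack == 1 || elempack == packn);
    assert(out_elempack == 1 || out_elempack == packn);

    if (elempack == packn && out_elempack == packn)
        return "packn";
    if (elempack == 1 && out_elempack == packn)
        return "pack1ton";
    if (elempack == packn && out_elempack == 1)
        return "packnto1";
    return "pack1"; // scalar in, scalar out
}
```

With a single packed kernel per precision, this four-way selection no longer needs a separate header per combination.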

Reviewed changes

Copilot reviewed 22 out of 22 changed files in this pull request and generated 4 comments.

File: Description
src/layer/riscv/deconvolution_riscv.cpp: Switches fp32 deconvolution to use the deconvolution_packed.* transform + forward path.
src/layer/riscv/deconvolution_riscv_zfh.cpp: Switches fp16 deconvolution to use the deconvolution_packed_fp16s.* transform + forward paths.
src/layer/riscv/deconvolution_packnto1.h: Removed legacy RVV packnto1 deconvolution kernel.
src/layer/riscv/deconvolution_packnto1_fp16s.h: Removed legacy RVV fp16s/fp16sa packnto1 deconvolution kernels.
src/layer/riscv/deconvolution_packn.h: Removed legacy RVV packn deconvolution kernel.
src/layer/riscv/deconvolution_packn_fp16s.h: Removed legacy RVV fp16s/fp16sa packn deconvolution kernels.
src/layer/riscv/deconvolution_pack1ton.h: Removed legacy RVV pack1ton deconvolution kernel.
src/layer/riscv/deconvolution_pack1ton_fp16s.h: Removed legacy RVV fp16s/fp16sa pack1ton deconvolution kernels.
src/layer/riscv/deconvolution_fp16s.h: Removed legacy fp16s deconvolution fallback implementation.
src/layer/riscv/deconvolution_packed.h: Adds unified fp32 packed deconvolution kernel transform + forward implementation.
src/layer/riscv/deconvolution_packed_fp16s.h: Adds unified fp16s/fp16sa packed deconvolution kernel transform + forward implementations.
src/layer/riscv/convolution_riscv.cpp: Switches fp32 convolution to use the convolution_packed.* transform + forward path.
src/layer/riscv/convolution_riscv_zfh.cpp: Switches fp16 convolution to use the convolution_packed_fp16s.* transform + forward paths.
src/layer/riscv/convolution_packnto1.h: Removed legacy RVV packnto1 convolution kernel.
src/layer/riscv/convolution_packnto1_fp16s.h: Removed legacy RVV fp16s/fp16sa packnto1 convolution kernels.
src/layer/riscv/convolution_packn.h: Removed legacy RVV packn convolution kernel.
src/layer/riscv/convolution_packn_fp16s.h: Removed legacy RVV fp16s/fp16sa packn convolution kernels.
src/layer/riscv/convolution_pack1ton.h: Removed legacy RVV pack1ton convolution kernel.
src/layer/riscv/convolution_pack1ton_fp16s.h: Removed legacy RVV fp16s/fp16sa pack1ton convolution kernels.
src/layer/riscv/convolution_fp16s.h: Removed legacy fp16s convolution fallback implementation.
src/layer/riscv/convolution_packed.h: Adds unified fp32 packed convolution kernel transform + forward implementation.
src/layer/riscv/convolution_packed_fp16s.h: Adds unified fp16s/fp16sa packed convolution kernel transform + forward implementations.


Comment on lines +50 to +56
#if __riscv_vector
    if (num_output >= packn)
    {
        if (num_input >= packn)
            weight_data_tm.create(packn * packn * maxk, num_input / packn + num_input % packn, num_output / packn + num_output % packn);
        else
            weight_data_tm.create(packn * maxk, num_input, num_output / packn + num_output % packn);
Comment on lines +54 to +61
            weight_data_tm.create(packn * packn * maxk, num_input / packn + num_input % packn, num_output / packn + num_output % packn, (size_t)2u);
        else
            weight_data_tm.create(packn * maxk, num_input, num_output / packn + num_output % packn, (size_t)2u);
    }
    else
    {
        if (num_input >= packn)
            weight_data_tm.create(packn * maxk, num_input / packn + num_input % packn, num_output, (size_t)2u);
Comment on lines +34 to +44
    if (num_output >= packn)
    {
        if (num_input >= packn)
            weight_data_tm.create(packn * packn * maxk, num_input / packn + num_input % packn, num_output / packn + num_output % packn);
        else
            weight_data_tm.create(packn * maxk, num_input, num_output / packn + num_output % packn);
    }
    else
    {
        if (num_input >= packn)
            weight_data_tm.create(packn * maxk, num_input / packn + num_input % packn, num_output);
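
These allocations count channel groups as `n / packn + n % packn`: full blocks of packn channels, plus each leftover channel as its own group. A sketch of the (w, h, c) shapes passed to `weight_data_tm.create` for the three branches visible in these excerpts; `packed_groups`, `TmShape` and `weight_tm_shape` are hypothetical names, and the branch where both num_input and num_output are below packn is not shown in the excerpts, so the sketch does not model it:

```cpp
#include <cassert>

// Hypothetical: channel groups when channels are taken in full blocks of
// packn and any leftover channels are kept as single-channel groups.
int packed_groups(int channels, int packn)
{
    assert(packn > 0 && channels >= 0);
    return channels / packn + channels % packn;
}

struct TmShape
{
    int w; // elements per (input group, output group) pair
    int h; // input channel groups
    int c; // output channel groups
};

// Mirrors the three weight_data_tm.create(...) branches in the excerpts.
// The num_output < packn && num_input < packn case is not modeled.
TmShape weight_tm_shape(int num_input, int num_output, int maxk, int packn)
{
    assert(num_output >= packn || num_input >= packn);
    if (num_output >= packn)
    {
        if (num_input >= packn)
            return {packn * packn * maxk, packed_groups(num_input, packn), packed_groups(num_output, packn)};
        return {packn * maxk, num_input, packed_groups(num_output, packn)};
    }
    return {packn * maxk, packed_groups(num_input, packn), num_output};
}
```

For example, with an assumed packn = 4, num_input = 10, num_output = 6 and maxk = 9, the first branch yields w = 144, h = 2 + 2 = 4 and c = 1 + 2 = 3.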
Comment on lines +37 to +44
            weight_data_tm.create(packn * packn * maxk, num_input / packn + num_input % packn, num_output / packn + num_output % packn, (size_t)2u);
        else
            weight_data_tm.create(packn * maxk, num_input, num_output / packn + num_output % packn, (size_t)2u);
    }
    else
    {
        if (num_input >= packn)
            weight_data_tm.create(packn * maxk, num_input / packn + num_input % packn, num_output, (size_t)2u);
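
The fp16 excerpts pass `(size_t)2u` as the element size, while the fp32 ones rely on the default element size of `Mat::create`, which is 4 bytes. A sketch of how the pack width and the weight payload scale with element size, assuming packn is the vector register width in bytes divided by the element size (so VLEN = 128 would give packn = 4 for fp32 and 8 for fp16); `packn_for` and `tm_payload_bytes` are hypothetical, and the byte count ignores any per-channel alignment padding the Mat allocator may add:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical: number of elements of size elemsize that fit in one
// vector register of vlen_bits bits (assumed derivation of packn).
int packn_for(int vlen_bits, std::size_t elemsize)
{
    assert(vlen_bits % 8 == 0 && elemsize > 0);
    return static_cast<int>((vlen_bits / 8) / elemsize);
}

// Hypothetical: payload bytes of a w x h x c weight tensor, no padding.
std::size_t tm_payload_bytes(int w, int h, int c, std::size_t elemsize)
{
    return static_cast<std::size_t>(w) * h * c * elemsize;
}
```

Under that assumption, packn_for(128, 2) is 8, so for num_input = num_output = 16 and a 3x3 kernel (maxk = 9) the first fp16 branch allocates w = 8 * 8 * 9 = 576, h = 2, c = 2, i.e. 2304 elements or 4608 bytes at 2 bytes each.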
@nihui nihui merged commit 023dc18 into Tencent:master May 20, 2026
52 of 53 checks passed


4 participants