riscv optimize convolution packed #6731
Codecov Report
✅ All modified and coverable lines are covered by tests.
@@ Coverage Diff @@
## master #6731 +/- ##
==========================================
- Coverage 95.95% 95.95% -0.01%
==========================================
Files 970 960 -10
Lines 403476 403285 -191
==========================================
- Hits 387159 386967 -192
- Misses 16317 16318 +1
Pull request overview
This PR consolidates RISC-V (RVV/Zvfh) convolution and deconvolution implementations by replacing multiple pack-specific kernels (packn/pack1ton/packnto1/pack1) with unified “packed” kernel transform + execution paths, and removes the older per-pack headers.
Changes:
- Introduces unified packed kernel transform/execution helpers for convolution and deconvolution (fp32 + fp16s/fp16sa paths).
- Updates RISC-V convolution/deconvolution pipeline/forward code to call the new packed entrypoints.
- Removes the legacy pack-specific header implementations that are now superseded.
Reviewed changes
Copilot reviewed 22 out of 22 changed files in this pull request and generated 4 comments.
Summary per file:
| File | Description |
|---|---|
| src/layer/riscv/deconvolution_riscv.cpp | Switches fp32 deconvolution to use deconvolution_packed.* transform + forward path. |
| src/layer/riscv/deconvolution_riscv_zfh.cpp | Switches fp16 deconvolution to use deconvolution_packed_fp16s.* transform + forward paths. |
| src/layer/riscv/deconvolution_packnto1.h | Removed legacy RVV packnto1 deconvolution kernel. |
| src/layer/riscv/deconvolution_packnto1_fp16s.h | Removed legacy RVV fp16s/fp16sa packnto1 deconvolution kernels. |
| src/layer/riscv/deconvolution_packn.h | Removed legacy RVV packn deconvolution kernel. |
| src/layer/riscv/deconvolution_packn_fp16s.h | Removed legacy RVV fp16s/fp16sa packn deconvolution kernels. |
| src/layer/riscv/deconvolution_pack1ton.h | Removed legacy RVV pack1ton deconvolution kernel. |
| src/layer/riscv/deconvolution_pack1ton_fp16s.h | Removed legacy RVV fp16s/fp16sa pack1ton deconvolution kernels. |
| src/layer/riscv/deconvolution_fp16s.h | Removed legacy fp16s deconvolution fallback implementation. |
| src/layer/riscv/deconvolution_packed.h | Adds unified fp32 packed deconvolution kernel transform + forward implementation. |
| src/layer/riscv/deconvolution_packed_fp16s.h | Adds unified fp16s/fp16sa packed deconvolution kernel transform + forward implementations. |
| src/layer/riscv/convolution_riscv.cpp | Switches fp32 convolution to use convolution_packed.* transform + forward path. |
| src/layer/riscv/convolution_riscv_zfh.cpp | Switches fp16 convolution to use convolution_packed_fp16s.* transform + forward paths. |
| src/layer/riscv/convolution_packnto1.h | Removed legacy RVV packnto1 convolution kernel. |
| src/layer/riscv/convolution_packnto1_fp16s.h | Removed legacy RVV fp16s/fp16sa packnto1 convolution kernels. |
| src/layer/riscv/convolution_packn.h | Removed legacy RVV packn convolution kernel. |
| src/layer/riscv/convolution_packn_fp16s.h | Removed legacy RVV fp16s/fp16sa packn convolution kernels. |
| src/layer/riscv/convolution_pack1ton.h | Removed legacy RVV pack1ton convolution kernel. |
| src/layer/riscv/convolution_pack1ton_fp16s.h | Removed legacy RVV fp16s/fp16sa pack1ton convolution kernels. |
| src/layer/riscv/convolution_fp16s.h | Removed legacy fp16s convolution fallback implementation. |
| src/layer/riscv/convolution_packed.h | Adds unified fp32 packed convolution kernel transform + forward implementation. |
| src/layer/riscv/convolution_packed_fp16s.h | Adds unified fp16s/fp16sa packed convolution kernel transform + forward implementations. |
#if __riscv_vector
    if (num_output >= packn)
    {
        if (num_input >= packn)
            weight_data_tm.create(packn * packn * maxk, num_input / packn + num_input % packn, num_output / packn + num_output % packn);
        else
            weight_data_tm.create(packn * maxk, num_input, num_output / packn + num_output % packn);

            weight_data_tm.create(packn * packn * maxk, num_input / packn + num_input % packn, num_output / packn + num_output % packn, (size_t)2u);
        else
            weight_data_tm.create(packn * maxk, num_input, num_output / packn + num_output % packn, (size_t)2u);
    }
    else
    {
        if (num_input >= packn)
            weight_data_tm.create(packn * maxk, num_input / packn + num_input % packn, num_output, (size_t)2u);

    if (num_output >= packn)
    {
        if (num_input >= packn)
            weight_data_tm.create(packn * packn * maxk, num_input / packn + num_input % packn, num_output / packn + num_output % packn);
        else
            weight_data_tm.create(packn * maxk, num_input, num_output / packn + num_output % packn);
    }
    else
    {
        if (num_input >= packn)
            weight_data_tm.create(packn * maxk, num_input / packn + num_input % packn, num_output);

            weight_data_tm.create(packn * packn * maxk, num_input / packn + num_input % packn, num_output / packn + num_output % packn, (size_t)2u);
        else
            weight_data_tm.create(packn * maxk, num_input, num_output / packn + num_output % packn, (size_t)2u);
    }
    else
    {
        if (num_input >= packn)
            weight_data_tm.create(packn * maxk, num_input / packn + num_input % packn, num_output, (size_t)2u);
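The weight-shape branches quoted in these review hunks can be read as a single function of `num_input`, `num_output`, `maxk` and `packn`. The following is a minimal sketch of that arithmetic only, not the PR's implementation: `PackedWeightShape`, `packed_blocks` and `packed_weight_shape` are hypothetical names introduced here for illustration, the fp16 element size argument `(size_t)2u` is left out, and the final branch (both channel counts below `packn`) does not appear in the quoted hunks, so its shape is an assumption.

```cpp
// Hypothetical holder for the three dimensions passed to weight_data_tm.create(w, h, c).
struct PackedWeightShape
{
    int w;
    int h;
    int c;
};

// Block count used in the quoted hunks: one block per full group of packn
// channels plus one block per leftover channel ("n / packn + n % packn").
static int packed_blocks(int n, int packn)
{
    return n / packn + n % packn;
}

// Mirrors the branch structure of the quoted hunks for the packed weight layout.
static PackedWeightShape packed_weight_shape(int num_input, int num_output, int maxk, int packn)
{
    if (num_output >= packn)
    {
        if (num_input >= packn)
            return {packn * packn * maxk, packed_blocks(num_input, packn), packed_blocks(num_output, packn)};

        return {packn * maxk, num_input, packed_blocks(num_output, packn)};
    }

    if (num_input >= packn)
        return {packn * maxk, packed_blocks(num_input, packn), num_output};

    // Assumption: this branch is not visible in the quoted hunks; an unpacked
    // layout of maxk weights per (input, output) pair is used as a placeholder.
    return {maxk, num_input, num_output};
}
```

For example, with `packn = 4`, `maxk = 9`, `num_input = 10` and `num_output = 6`, the sketch yields `w = 4 * 4 * 9`, `h = 10 / 4 + 10 % 4` and `c = 6 / 4 + 6 % 4`. The width is sized for a full `packn`-wide block even where a leftover block holds a single channel, which is what the quoted `create` calls allocate.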
No description provided.