feat: add TopK and Gather ncnn operators for YOLOv10 deployment#6668
feat: add TopK and Gather ncnn operators for YOLOv10 deployment#6668vlordier wants to merge 31 commits intoTencent:masterfrom
Conversation
- Generate TopK class definition in pnnx.py output with forward() method - Instantiate TopK modules in Model.__init__() with proper parameters - Update forward() method to call self.topk_name() instead of direct TopK() calls - Fixes pnnx inference to properly execute TopK operations using torch.topk() - Test confirms TopK ONNX→pnnx conversion and inference working correctly
- Fix IR pattern syntax to use explicit parameter names (axis=%, largest=%, sorted=%) - Replace incorrect parameter lookup from 'op_0.axis' to 'axis' to match captured names - TopK pass now properly fires during ONNX→pnnx→ncnn conversion - All TopK parameters (axis, largest, sorted) correctly captured and set in ncnn layers - End-to-end test confirms ONNX→pnnx→ncnn conversion with TopK working correctly
use c++03-style topk comparator and keep deterministic nan/inf ordering remove redundant constructor param initialization fix tests cmakelists alphabetical order (Tile before TopK) expand torch_topk onnx tests (k=0/k=1, negative dim, sorted=false cases) drop generated topk onnx/pnnx/ncnn sidecar artifacts from repo
- Guard <algorithm>/<vector> behind #if NCNN_SIMPLESTL, include simplestl.h - Use std::partial_sort in simplestl mode (no std::nth_element available) - Guard <math.h> in tests behind #if !NCNN_SIMPLESTL to avoid simplemath.h conflict; define INFINITY/NAN as float expressions in simplestl mode - Fix cstep-unaware indexing for 3D/4D output tensors: use actual cstep for channel offset instead of assuming contiguous w*h layout - Convert #pragma omp parallel + inner #pragma omp for to #pragma omp parallel for to avoid __kmpc_barrier in simpleomp mode - Fix copyright year 2026->2025 - Apply code-format whitespace cleanup
Topk ci tests
- pass_level2/torch_topk.cpp: capture k/dim/largest/sorted as parameters
(prim::Constant) instead of tensor inputs, enabling ncnn pass matching
- pass_level2/torch_gather.cpp: restore original pattern (dim as tensor)
- pass_ncnn/TopK.cpp: match torch.topk with captured parameters and
convert to ncnn TopK layer (axis, largest, sorted)
- pass_ncnn/torch_gather.cpp (NEW): match torch.gather with 2 inputs
(input, index) and captured dim parameter, convert to ncnn Gather layer
- src/layer/gather.{h,cpp} (NEW): implement Gather ncnn operator
supporting 1D/2D/3D tensors with arbitrary axis
- PNNX CMakeLists fixes:
- per-target Torch include dirs to avoid protobuf header conflicts
- Abseil linking for Homebrew protobuf 34.x
- disable onnxruntime auto-detection (protobuf conflict)
- directory-level INCLUDE_DIRECTORIES_BEFORE for protobuf headers
Verified: YOLOv10n converts with 2 TopK + 2 Gather layers, only
cosmetic ops (Tensor.to, pnnx.Expression) ignored.
Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
There was a problem hiding this comment.
Pull request overview
This PR aims to enable end-to-end YOLOv10 deployment on ncnn by adding missing TopK and Gather operators, updating PNNX conversion passes to no longer ignore these ops, and adjusting the PNNX build system to avoid protobuf header/link conflicts.
Changes:
- Add ncnn runtime layers:
TopKandGather, plus atest_topkunit test and CMake registrations. - Update PNNX conversion: capture
torch.topkconstants in level2 and add ncnn graph rewrite passes fortorch.topk→TopKandtorch.gather→Gather. - Adjust PNNX build/includes and add an ONNX TopK conversion test + a dedicated GitHub Actions workflow.
Reviewed changes
Copilot reviewed 19 out of 19 changed files in this pull request and generated 12 comments.
Show a summary per file
| File | Description |
|---|---|
| tools/pnnx/tests/onnx/test_torch_topk.py | Adds ONNX roundtrip test coverage for torch.topk conversion. |
| tools/pnnx/tests/onnx/CMakeLists.txt | Registers the new ONNX TopK test. |
| tools/pnnx/src/pass_onnx/shape_inference.cpp | Makes onnxruntime C API include path more robust across installs. |
| tools/pnnx/src/pass_onnx/fold_constants.cpp | Same onnxruntime C API include path robustness for constant folding. |
| tools/pnnx/src/pass_ncnn/torch_gather.cpp | Introduces torch.gather → ncnn Gather rewrite pass. |
| tools/pnnx/src/pass_ncnn/TopK.cpp | Introduces torch.topk → ncnn TopK rewrite pass (values+indices and values-only patterns). |
| tools/pnnx/src/pass_level2/torch_topk.cpp | Captures topk args as prim::Constant nodes so later passes can read k/dim/largest/sorted. |
| tools/pnnx/src/load_onnx.cpp | Adds multi-path onnxruntime C API header inclusion. |
| tools/pnnx/src/ir.cpp | Adds Python codegen support for TopK custom module emission. |
| tools/pnnx/src/CMakeLists.txt | Registers new ncnn passes and adjusts protobuf/abseil include/link handling. |
| tools/pnnx/CMakeLists.txt | Changes how protobuf is discovered and modifies onnxruntime detection behavior. |
| tests/test_topk.cpp | Adds CPU correctness tests for TopK (including NaN/Inf behavior and fastpaths). |
| tests/CMakeLists.txt | Registers TopK as a layer test target. |
| src/layer/topk.h / src/layer/topk.cpp | Implements the ncnn TopK layer. |
| src/layer/gather.h / src/layer/gather.cpp | Implements the ncnn Gather layer. |
| src/CMakeLists.txt | Registers TopK and Gather layers for build. |
| .github/workflows/topk-linux-test.yml | Adds a CI workflow intended to validate TopK (and a PNNX ONNX TopK test). |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| # Disable onnxruntime auto-detection — we only need torch2pnnx for YOLOv10 | ||
| set(onnxruntime_FOUND FALSE) |
There was a problem hiding this comment.
This unconditionally disables onnxruntime support (onnxruntime_FOUND is forced to FALSE). That disables building onnx2pnnx and breaks ONNX->PNNX conversion and the existing ONNX test suite. Please make this behavior opt-in (e.g., a CMake option) or keep auto-detection so existing workflows/users that rely on onnxruntime continue to work.
| # Disable onnxruntime auto-detection — we only need torch2pnnx for YOLOv10 | |
| set(onnxruntime_FOUND FALSE) | |
| # Allow specialized builds to disable onnxruntime explicitly, while preserving | |
| # the default auto-detection behavior for existing ONNX->PNNX workflows. | |
| option(PNNX_DISABLE_ONNXRUNTIME "disable onnxruntime support" OFF) | |
| if(PNNX_DISABLE_ONNXRUNTIME) | |
| message(STATUS "Building without onnxruntime support") | |
| set(onnxruntime_FOUND FALSE) | |
| endif() |
| // index_blob should contain int64 or int32 indices | ||
| // For simplicity we treat it as float and cast | ||
| const int index_size = (int)index_blob.total(); | ||
|
|
There was a problem hiding this comment.
Gather currently treats index_blob memory as int32 via a raw cast, but upstream indices are commonly int64 (and sometimes int32) depending on exporter. Reading an int64 tensor as int32 will corrupt indices and can segfault/out-of-bounds. Please branch on index_blob.elemsize (4 vs 8) and load indices accordingly (or convert into a temporary int32 buffer).
| op->params["0"] = new_axis; | ||
| op->params["1"] = largest; | ||
| op->params["2"] = sorted; | ||
| } |
There was a problem hiding this comment.
The captured k parameter from torch.topk is never written into the generated TopK layer params. As-is, the TopK layer will fall back to its default k, producing incorrect results for any k != 1. Extract k from captured_params and write it to the TopK param id used by TopK::load_param (pd.get(3, ...)), or explicitly wire k as a second input blob.
| { | ||
| fprintf(pyfp, "%s=", it.first.c_str()); | ||
|
|
||
| const Parameter& param = it.second; | ||
| if (param.type == 2) |
There was a problem hiding this comment.
TopK Python codegen emits keyword arguments using op->params keys directly. After ncnn rewrite these keys are numeric strings ("0", "1", ...), which produces invalid Python syntax like "0=...". Map param ids to valid names (axis/largest/sorted/k) when emitting the TopK(...) constructor call.
| fprintf(pyfp, " def __init__(self, axis=1, largest=1, sorted=1):\n"); | ||
| fprintf(pyfp, " super(TopK, self).__init__()\n"); | ||
| fprintf(pyfp, " self.axis = axis\n"); | ||
| fprintf(pyfp, " self.largest = largest\n"); | ||
| fprintf(pyfp, " self.sorted = sorted\n"); | ||
| fprintf(pyfp, " def forward(self, x, k):\n"); | ||
| fprintf(pyfp, " # Torch topk returns (values, indices)\n"); | ||
| fprintf(pyfp, " return torch.topk(x, k.item() if hasattr(k, 'item') else k, dim=self.axis, largest=bool(self.largest), sorted=bool(self.sorted))\n"); |
There was a problem hiding this comment.
The generated TopK module signature is forward(self, x, k), but the ncnn rewrite pattern for TopK has only one input. This will cause the generated model forward to call the module with the wrong number of arguments. Align codegen with the rewritten graph by either (1) making forward(self, x) use a stored k parameter, or (2) ensuring the rewritten TopK op actually has a k input operand.
| fprintf(pyfp, " def __init__(self, axis=1, largest=1, sorted=1):\n"); | |
| fprintf(pyfp, " super(TopK, self).__init__()\n"); | |
| fprintf(pyfp, " self.axis = axis\n"); | |
| fprintf(pyfp, " self.largest = largest\n"); | |
| fprintf(pyfp, " self.sorted = sorted\n"); | |
| fprintf(pyfp, " def forward(self, x, k):\n"); | |
| fprintf(pyfp, " # Torch topk returns (values, indices)\n"); | |
| fprintf(pyfp, " return torch.topk(x, k.item() if hasattr(k, 'item') else k, dim=self.axis, largest=bool(self.largest), sorted=bool(self.sorted))\n"); | |
| fprintf(pyfp, " def __init__(self, k=1, axis=1, largest=1, sorted=1):\n"); | |
| fprintf(pyfp, " super(TopK, self).__init__()\n"); | |
| fprintf(pyfp, " self.k = k\n"); | |
| fprintf(pyfp, " self.axis = axis\n"); | |
| fprintf(pyfp, " self.largest = largest\n"); | |
| fprintf(pyfp, " self.sorted = sorted\n"); | |
| fprintf(pyfp, " def forward(self, x):\n"); | |
| fprintf(pyfp, " # Torch topk returns (values, indices)\n"); | |
| fprintf(pyfp, " k = self.k.item() if hasattr(self.k, 'item') else self.k\n"); | |
| fprintf(pyfp, " return torch.topk(x, k, dim=self.axis, largest=bool(self.largest), sorted=bool(self.sorted))\n"); |
| // index_blob should contain int64 or int32 indices | ||
| // For simplicity we treat it as float and cast | ||
| const int index_size = (int)index_blob.total(); | ||
|
|
There was a problem hiding this comment.
The Gather implementation comments say indices are treated as float and cast, but the code later reinterprets index_blob memory as int32. Indices are commonly int64 in exported graphs, so this will corrupt reads. Please load indices according to index_blob.elemsize (4 vs 8) or convert to a temporary int32 buffer.
|
|
||
| // Allocate output (same dtype as input, shape matches index) | ||
| Mat& top_blob = top_blobs[0]; | ||
| top_blob.create(out_shape.w, out_shape.h, out_shape.c, input_blob.elemsize, input_blob.elempack, opt.blob_allocator); |
There was a problem hiding this comment.
Output allocation always uses Mat::create(w,h,c,...) which forces a 3D output, even when index_blob is 1D/2D (and potentially 4D). This breaks Gather's contract that output shape matches index shape. Allocate top_blob with the same dims as index_blob using the appropriate create overload.
| top_blob.create(out_shape.w, out_shape.h, out_shape.c, input_blob.elemsize, input_blob.elempack, opt.blob_allocator); | |
| if (out_shape.dims == 1) | |
| top_blob.create(out_shape.w, input_blob.elemsize, input_blob.elempack, opt.blob_allocator); | |
| else if (out_shape.dims == 2) | |
| top_blob.create(out_shape.w, out_shape.h, input_blob.elemsize, input_blob.elempack, opt.blob_allocator); | |
| else if (out_shape.dims == 3) | |
| top_blob.create(out_shape.w, out_shape.h, out_shape.c, input_blob.elemsize, input_blob.elempack, opt.blob_allocator); | |
| else if (out_shape.dims == 4) | |
| top_blob.create(out_shape.w, out_shape.h, out_shape.d, out_shape.c, input_blob.elemsize, input_blob.elempack, opt.blob_allocator); | |
| else | |
| return -1; |
| const float* inp = input_blob; | ||
| const int* idx = (const int*)index_blob; | ||
| float* out = top_blob; |
There was a problem hiding this comment.
Gather treats input and output tensors as float* (and writes via float*), but top_blob is allocated using input_blob.elemsize/elempack which may be fp16/int8/etc. This will produce incorrect results or memory corruption when elemsize != 4. Either restrict Gather to float32 with a runtime check or add proper type handling.
| else if (dims == 3) | ||
| { | ||
| // ncnn 3D layout: w * h * c, with cstride padding | ||
| size_t cstep = input_blob.cstep; | ||
| flat_in = coord_in[0] + coord_in[1] * input_blob.w + coord_in[2] * (int)cstep; |
There was a problem hiding this comment.
Only dims 1/2/3 are handled in flat index computation. For dims==4, flat_in remains incorrect and Gather will return wrong results silently. Either implement 4D indexing (w,h,d,c with cstep) or explicitly reject dims > 3 early with a clear error code.
| branches: | ||
| - topk-ci-tests |
There was a problem hiding this comment.
This workflow only runs on pushes to the 'topk-ci-tests' branch, so it will not run for pull_request events or for the repository's normal branches after merge. If it is meant to provide ongoing CI coverage, add appropriate pull_request/push triggers; otherwise consider not adding it to the mainline PR.
| branches: | |
| - topk-ci-tests | |
| pull_request: |
- src/layer/cast.{h,cpp}: extend Cast layer with int64 (type 5) and
int32 (type 6) support, adding conversions int64↔float32 and
int32↔float32
- pass_ncnn/tensor_to.cpp (NEW): convert Tensor.to (dtype cast) to
ncnn Cast layer, mapping torch dtype strings to ncnn type codes
- CMakeLists.txt: register tensor_to.cpp in pass_ncnn sources
Verified: YOLOv10n Tensor.to (i64→f32) now converts to Cast layer
instead of being ignored. Only cosmetic ops (pnnx.Expression) remain.
Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
33e04a2 to
93bd423
Compare
- ir.cpp: store k in TopK.__init__, use forward(self, x) — k was a ctor param but forward() expected it as an input arg, causing runtime error - ir.cpp: pass k= in TopK instantiation (k_val from params["3"]) - gather.cpp: reject non-float32 data (elemsize != 4) and dims > 3 explicitly - pnnx/src/CMakeLists: replace invalid set_property(INCLUDE_DIRECTORIES_BEFORE) with include_directories(BEFORE ...) to correctly force protobuf header order - pnnx/tests/onnx: add test_torch_gather.py roundtrip test (1D/2D/3D, multiple axes, negative axis) and register it in CMakeLists
|
Superseded by #6669 which consolidates all operators (TopK, Gather, GatherElements, Expand, Tile, Mod) with all review issues addressed. |
Summary
This PR adds the missing TopK and Gather ncnn operators and fixes the PNNX conversion passes so that YOLOv10 models convert end-to-end without any ignored core operators.
Problem
YOLOv10 integrates NMS directly into the model graph, including TopK and torch.gather operators. The upstream PNNX pipeline marks both as
ignore, making YOLOv10 unusable on ncnn.Solution
1. Fix TopK conversion (pass_level2 → pass_ncnn)
k,dim,largest,sortedasprim::Constantparameters instead of tensor inputs, enabling the ncnn pass to access these valuestorch.topkoperator and produce ncnnTopKlayer with correct axis/largest/sorted params2. Add Gather ncnn operator
ncnn_add_layer(Gather)torch.gatherwith captureddimparameter and convert to ncnnGather3. PNNX build system fixes
Verified
YOLOv10n converts with 2 TopK + 2 Gather layers, only cosmetic ops ignored:
Previously: