
feat: add TopK and Gather ncnn operators for YOLOv10 deployment#6668

Closed
vlordier wants to merge 31 commits into Tencent:master from vlordier:fix-pnnx-gather-support

Conversation

@vlordier

Summary

This PR adds the missing TopK and Gather ncnn operators and fixes the PNNX conversion passes so that YOLOv10 models convert end-to-end without any ignored core operators.

Problem

YOLOv10 integrates NMS directly into the model graph, including TopK and torch.gather operators. The upstream PNNX pipeline marks both as ignore, making YOLOv10 unusable on ncnn.

Solution

1. Fix TopK conversion (pass_level2 → pass_ncnn)

  • pass_level2/torch_topk.cpp: Capture k, dim, largest, sorted as prim::Constant parameters instead of tensor inputs, enabling the ncnn pass to access these values
  • pass_ncnn/TopK.cpp: Updated match pattern to consume the parameterized torch.topk operator and produce ncnn TopK layer with correct axis/largest/sorted params

2. Add Gather ncnn operator

  • src/layer/gather.h — Gather layer header
  • src/layer/gather.cpp — Implementation supporting 1D/2D/3D tensors with arbitrary axis
  • src/CMakeLists.txt — Registered as ncnn_add_layer(Gather)
  • pass_ncnn/torch_gather.cpp (NEW) — Match torch.gather with captured dim parameter and convert to ncnn Gather

3. PNNX build system fixes

  • Per-target Torch include directories to avoid protobuf header conflicts (Torch bundles protobuf v3, system has v34.x)
  • Abseil linking for Homebrew protobuf 34.x (macOS/Linux)
  • Directory-level INCLUDE_DIRECTORIES_BEFORE for protobuf headers

Verified

YOLOv10n converts with 2 TopK + 2 Gather layers, only cosmetic ops ignored (param ids below: TopK 0=axis, 1=largest, 2=sorted; Gather 0=axis):

TopK                     topk_162   1 2 296 297 298 0=-1 1=1 2=1
TopK                     topk_163   1 2 304 305 306 0=-1 1=1 2=1
Gather                   gather_165 2 1 294 302 303 0=1
Gather                   gather_166 2 1 292 317 318 0=1

Previously:

ignore torch.topk torch.topk_11 param dim=-1
ignore torch.topk torch.topk_11 param k=300
ignore torch.gather torch.gather_69 param dim=1

vlordier and others added 28 commits April 10, 2026 12:24
- Generate TopK class definition in pnnx.py output with forward() method
- Instantiate TopK modules in Model.__init__() with proper parameters
- Update forward() method to call self.topk_name() instead of direct TopK() calls
- Fixes pnnx inference to properly execute TopK operations using torch.topk()
- Test confirms TopK ONNX→pnnx conversion and inference working correctly
- Fix IR pattern syntax to use explicit parameter names (axis=%, largest=%, sorted=%)
- Replace incorrect parameter lookup from 'op_0.axis' to 'axis' to match captured names
- TopK pass now properly fires during ONNX→pnnx→ncnn conversion
- All TopK parameters (axis, largest, sorted) correctly captured and set in ncnn layers
- End-to-end test confirms ONNX→pnnx→ncnn conversion with TopK working correctly
use c++03-style topk comparator and keep deterministic nan/inf ordering

remove redundant constructor param initialization

fix tests cmakelists alphabetical order (Tile before TopK)

expand torch_topk onnx tests (k=0/k=1, negative dim, sorted=false cases)

drop generated topk onnx/pnnx/ncnn sidecar artifacts from repo
- Guard <algorithm>/<vector> behind #if NCNN_SIMPLESTL, include simplestl.h
- Use std::partial_sort in simplestl mode (no std::nth_element available)
- Guard <math.h> in tests behind #if !NCNN_SIMPLESTL to avoid simplemath.h
  conflict; define INFINITY/NAN as float expressions in simplestl mode
- Fix cstep-unaware indexing for 3D/4D output tensors: use actual cstep
  for channel offset instead of assuming contiguous w*h layout
- Convert #pragma omp parallel + inner #pragma omp for to #pragma omp
  parallel for to avoid __kmpc_barrier in simpleomp mode
- Fix copyright year 2026->2025
- Apply code-format whitespace cleanup
- pass_level2/torch_topk.cpp: capture k/dim/largest/sorted as parameters
  (prim::Constant) instead of tensor inputs, enabling ncnn pass matching
- pass_level2/torch_gather.cpp: restore original pattern (dim as tensor)
- pass_ncnn/TopK.cpp: match torch.topk with captured parameters and
  convert to ncnn TopK layer (axis, largest, sorted)
- pass_ncnn/torch_gather.cpp (NEW): match torch.gather with 2 inputs
  (input, index) and captured dim parameter, convert to ncnn Gather layer
- src/layer/gather.{h,cpp} (NEW): implement Gather ncnn operator
  supporting 1D/2D/3D tensors with arbitrary axis
- PNNX CMakeLists fixes:
  - per-target Torch include dirs to avoid protobuf header conflicts
  - Abseil linking for Homebrew protobuf 34.x
  - disable onnxruntime auto-detection (protobuf conflict)
  - directory-level INCLUDE_DIRECTORIES_BEFORE for protobuf headers

Verified: YOLOv10n converts with 2 TopK + 2 Gather layers, only
cosmetic ops (Tensor.to, pnnx.Expression) ignored.

Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
Copilot AI review requested due to automatic review settings April 11, 2026 07:31
github-actions bot added the core label Apr 11, 2026

Copilot AI left a comment


Pull request overview

This PR aims to enable end-to-end YOLOv10 deployment on ncnn by adding missing TopK and Gather operators, updating PNNX conversion passes to no longer ignore these ops, and adjusting the PNNX build system to avoid protobuf header/link conflicts.

Changes:

  • Add ncnn runtime layers: TopK and Gather, plus a test_topk unit test and CMake registrations.
  • Update PNNX conversion: capture torch.topk constants in level2 and add ncnn graph rewrite passes for torch.topk → TopK and torch.gather → Gather.
  • Adjust PNNX build/includes and add an ONNX TopK conversion test + a dedicated GitHub Actions workflow.

Reviewed changes

Copilot reviewed 19 out of 19 changed files in this pull request and generated 12 comments.

Summary per file:

  • tools/pnnx/tests/onnx/test_torch_topk.py — Adds ONNX roundtrip test coverage for torch.topk conversion.
  • tools/pnnx/tests/onnx/CMakeLists.txt — Registers the new ONNX TopK test.
  • tools/pnnx/src/pass_onnx/shape_inference.cpp — Makes the onnxruntime C API include path more robust across installs.
  • tools/pnnx/src/pass_onnx/fold_constants.cpp — Same onnxruntime C API include path robustness for constant folding.
  • tools/pnnx/src/pass_ncnn/torch_gather.cpp — Introduces the torch.gather → ncnn Gather rewrite pass.
  • tools/pnnx/src/pass_ncnn/TopK.cpp — Introduces the torch.topk → ncnn TopK rewrite pass (values+indices and values-only patterns).
  • tools/pnnx/src/pass_level2/torch_topk.cpp — Captures topk args as prim::Constant nodes so later passes can read k/dim/largest/sorted.
  • tools/pnnx/src/load_onnx.cpp — Adds multi-path onnxruntime C API header inclusion.
  • tools/pnnx/src/ir.cpp — Adds Python codegen support for TopK custom module emission.
  • tools/pnnx/src/CMakeLists.txt — Registers new ncnn passes and adjusts protobuf/abseil include/link handling.
  • tools/pnnx/CMakeLists.txt — Changes how protobuf is discovered and modifies onnxruntime detection behavior.
  • tests/test_topk.cpp — Adds CPU correctness tests for TopK (including NaN/Inf behavior and fastpaths).
  • tests/CMakeLists.txt — Registers TopK as a layer test target.
  • src/layer/topk.h / src/layer/topk.cpp — Implements the ncnn TopK layer.
  • src/layer/gather.h / src/layer/gather.cpp — Implements the ncnn Gather layer.
  • src/CMakeLists.txt — Registers the TopK and Gather layers for build.
  • .github/workflows/topk-linux-test.yml — Adds a CI workflow intended to validate TopK (and a PNNX ONNX TopK test).


Comment thread tools/pnnx/CMakeLists.txt
Comment on lines +128 to +129
# Disable onnxruntime auto-detection — we only need torch2pnnx for YOLOv10
set(onnxruntime_FOUND FALSE)

Copilot AI Apr 11, 2026


This unconditionally disables onnxruntime support (onnxruntime_FOUND is forced to FALSE). That disables building onnx2pnnx and breaks ONNX->PNNX conversion and the existing ONNX test suite. Please make this behavior opt-in (e.g., a CMake option) or keep auto-detection so existing workflows/users that rely on onnxruntime continue to work.

Suggested change
# Disable onnxruntime auto-detection — we only need torch2pnnx for YOLOv10
set(onnxruntime_FOUND FALSE)
# Allow specialized builds to disable onnxruntime explicitly, while preserving
# the default auto-detection behavior for existing ONNX->PNNX workflows.
option(PNNX_DISABLE_ONNXRUNTIME "disable onnxruntime support" OFF)
if(PNNX_DISABLE_ONNXRUNTIME)
message(STATUS "Building without onnxruntime support")
set(onnxruntime_FOUND FALSE)
endif()

Comment thread src/layer/gather.cpp
Comment on lines +30 to +33
// index_blob should contain int64 or int32 indices
// For simplicity we treat it as float and cast
const int index_size = (int)index_blob.total();


Copilot AI Apr 11, 2026


Gather currently treats index_blob memory as int32 via a raw cast, but upstream indices are commonly int64 (and sometimes int32) depending on exporter. Reading an int64 tensor as int32 will corrupt indices and can segfault/out-of-bounds. Please branch on index_blob.elemsize (4 vs 8) and load indices accordingly (or convert into a temporary int32 buffer).

Comment on lines +75 to +78
op->params["0"] = new_axis;
op->params["1"] = largest;
op->params["2"] = sorted;
}

Copilot AI Apr 11, 2026


The captured k parameter from torch.topk is never written into the generated TopK layer params. As-is, the TopK layer will fall back to its default k, producing incorrect results for any k != 1. Extract k from captured_params and write it to the TopK param id used by TopK::load_param (pd.get(3, ...)), or explicitly wire k as a second input blob.

Comment thread tools/pnnx/src/ir.cpp
Comment on lines +1646 to +1650
{
fprintf(pyfp, "%s=", it.first.c_str());

const Parameter& param = it.second;
if (param.type == 2)

Copilot AI Apr 11, 2026


TopK Python codegen emits keyword arguments using op->params keys directly. After ncnn rewrite these keys are numeric strings ("0", "1", ...), which produces invalid Python syntax like "0=...". Map param ids to valid names (axis/largest/sorted/k) when emitting the TopK(...) constructor call.

Comment thread tools/pnnx/src/ir.cpp
Comment on lines +1497 to +1504
fprintf(pyfp, " def __init__(self, axis=1, largest=1, sorted=1):\n");
fprintf(pyfp, " super(TopK, self).__init__()\n");
fprintf(pyfp, " self.axis = axis\n");
fprintf(pyfp, " self.largest = largest\n");
fprintf(pyfp, " self.sorted = sorted\n");
fprintf(pyfp, " def forward(self, x, k):\n");
fprintf(pyfp, " # Torch topk returns (values, indices)\n");
fprintf(pyfp, " return torch.topk(x, k.item() if hasattr(k, 'item') else k, dim=self.axis, largest=bool(self.largest), sorted=bool(self.sorted))\n");

Copilot AI Apr 11, 2026


The generated TopK module signature is forward(self, x, k), but the ncnn rewrite pattern for TopK has only one input. This will cause the generated model forward to call the module with the wrong number of arguments. Align codegen with the rewritten graph by either (1) making forward(self, x) use a stored k parameter, or (2) ensuring the rewritten TopK op actually has a k input operand.

Suggested change
fprintf(pyfp, " def __init__(self, axis=1, largest=1, sorted=1):\n");
fprintf(pyfp, " super(TopK, self).__init__()\n");
fprintf(pyfp, " self.axis = axis\n");
fprintf(pyfp, " self.largest = largest\n");
fprintf(pyfp, " self.sorted = sorted\n");
fprintf(pyfp, " def forward(self, x, k):\n");
fprintf(pyfp, " # Torch topk returns (values, indices)\n");
fprintf(pyfp, " return torch.topk(x, k.item() if hasattr(k, 'item') else k, dim=self.axis, largest=bool(self.largest), sorted=bool(self.sorted))\n");
fprintf(pyfp, " def __init__(self, k=1, axis=1, largest=1, sorted=1):\n");
fprintf(pyfp, " super(TopK, self).__init__()\n");
fprintf(pyfp, " self.k = k\n");
fprintf(pyfp, " self.axis = axis\n");
fprintf(pyfp, " self.largest = largest\n");
fprintf(pyfp, " self.sorted = sorted\n");
fprintf(pyfp, " def forward(self, x):\n");
fprintf(pyfp, " # Torch topk returns (values, indices)\n");
fprintf(pyfp, " k = self.k.item() if hasattr(self.k, 'item') else self.k\n");
fprintf(pyfp, " return torch.topk(x, k, dim=self.axis, largest=bool(self.largest), sorted=bool(self.sorted))\n");

Comment thread src/layer/gather.cpp
Comment on lines +30 to +33
// index_blob should contain int64 or int32 indices
// For simplicity we treat it as float and cast
const int index_size = (int)index_blob.total();


Copilot AI Apr 11, 2026


The Gather implementation comments say indices are treated as float and cast, but the code later reinterprets index_blob memory as int32. Indices are commonly int64 in exported graphs, so this will corrupt reads. Please load indices according to index_blob.elemsize (4 vs 8) or convert to a temporary int32 buffer.

Comment thread src/layer/gather.cpp

// Allocate output (same dtype as input, shape matches index)
Mat& top_blob = top_blobs[0];
top_blob.create(out_shape.w, out_shape.h, out_shape.c, input_blob.elemsize, input_blob.elempack, opt.blob_allocator);

Copilot AI Apr 11, 2026


Output allocation always uses Mat::create(w,h,c,...) which forces a 3D output, even when index_blob is 1D/2D (and potentially 4D). This breaks Gather's contract that output shape matches index shape. Allocate top_blob with the same dims as index_blob using the appropriate create overload.

Suggested change
top_blob.create(out_shape.w, out_shape.h, out_shape.c, input_blob.elemsize, input_blob.elempack, opt.blob_allocator);
if (out_shape.dims == 1)
top_blob.create(out_shape.w, input_blob.elemsize, input_blob.elempack, opt.blob_allocator);
else if (out_shape.dims == 2)
top_blob.create(out_shape.w, out_shape.h, input_blob.elemsize, input_blob.elempack, opt.blob_allocator);
else if (out_shape.dims == 3)
top_blob.create(out_shape.w, out_shape.h, out_shape.c, input_blob.elemsize, input_blob.elempack, opt.blob_allocator);
else if (out_shape.dims == 4)
top_blob.create(out_shape.w, out_shape.h, out_shape.d, out_shape.c, input_blob.elemsize, input_blob.elempack, opt.blob_allocator);
else
return -1;

Comment thread src/layer/gather.cpp
Comment on lines +55 to +57
const float* inp = input_blob;
const int* idx = (const int*)index_blob;
float* out = top_blob;

Copilot AI Apr 11, 2026


Gather treats input and output tensors as float* (and writes via float*), but top_blob is allocated using input_blob.elemsize/elempack which may be fp16/int8/etc. This will produce incorrect results or memory corruption when elemsize != 4. Either restrict Gather to float32 with a runtime check or add proper type handling.

Comment thread src/layer/gather.cpp
Comment on lines +108 to +112
else if (dims == 3)
{
// ncnn 3D layout: w * h * c, with cstride padding
size_t cstep = input_blob.cstep;
flat_in = coord_in[0] + coord_in[1] * input_blob.w + coord_in[2] * (int)cstep;

Copilot AI Apr 11, 2026


Only dims 1/2/3 are handled in flat index computation. For dims==4, flat_in remains incorrect and Gather will return wrong results silently. Either implement 4D indexing (w,h,d,c with cstep) or explicitly reject dims > 3 early with a clear error code.

Comment thread .github/workflows/topk-linux-test.yml
Comment on lines +4 to +5
branches:
- topk-ci-tests

Copilot AI Apr 11, 2026


This workflow only runs on pushes to the 'topk-ci-tests' branch, so it will not run for pull_request events or for the repository's normal branches after merge. If it is meant to provide ongoing CI coverage, add appropriate pull_request/push triggers; otherwise consider not adding it to the mainline PR.

Suggested change
branches:
- topk-ci-tests
pull_request:

vlordier and others added 2 commits April 11, 2026 07:43
- src/layer/cast.{h,cpp}: extend Cast layer with int64 (type 5) and
  int32 (type 6) support, adding conversions int64↔float32 and
  int32↔float32
- pass_ncnn/tensor_to.cpp (NEW): convert Tensor.to (dtype cast) to
  ncnn Cast layer, mapping torch dtype strings to ncnn type codes
- CMakeLists.txt: register tensor_to.cpp in pass_ncnn sources

Verified: YOLOv10n Tensor.to (i64→f32) now converts to Cast layer
instead of being ignored. Only cosmetic ops (pnnx.Expression) remain.

Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
vlordier force-pushed the fix-pnnx-gather-support branch from 33e04a2 to 93bd423 on April 11, 2026 08:59
vlordier added a commit to vlordier/ncnn that referenced this pull request Apr 16, 2026
- ir.cpp: store k in TopK.__init__, use forward(self, x) — k was a ctor
  param but forward() expected it as an input arg, causing runtime error
- ir.cpp: pass k= in TopK instantiation (k_val from params["3"])
- gather.cpp: reject non-float32 data (elemsize != 4) and dims > 3 explicitly
- pnnx/src/CMakeLists: replace invalid set_property(INCLUDE_DIRECTORIES_BEFORE)
  with include_directories(BEFORE ...) to correctly force protobuf header order
- pnnx/tests/onnx: add test_torch_gather.py roundtrip test (1D/2D/3D,
  multiple axes, negative axis) and register it in CMakeLists
@vlordier
Author

Superseded by #6669 which consolidates all operators (TopK, Gather, GatherElements, Expand, Tile, Mod) with all review issues addressed.
