Add TopK layer support for pnnx ONNX export path #6558

Closed
vlordier wants to merge 32 commits into Tencent:master from vlordier:fix-pnnx-onnx-topk-support
Conversation

vlordier commented Feb 26, 2026

Summary

  • add native TopK layer implementation in ncnn core
  • lower ONNX TopK nodes in pnnx ncnn pass to the new TopK layer
  • register layer/build integration and add test_topk coverage
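As a rough illustration of the selection core the summary describes (a sketch under assumptions, not the actual ncnn implementation: the real layer operates on ncnn::Mat along an arbitrary axis, and per the later commits uses a C++03-style comparator rather than a lambda), a 1-D top-k with the largest/sorted options might look like:

```cpp
#include <algorithm>
#include <vector>

// Hypothetical helper sketching a TopK forward pass over one row.
// Assumes 1 <= k <= row.size(); a real layer must validate/clamp k.
static void topk_1d(const std::vector<float>& row, int k, bool largest, bool sorted,
                    std::vector<float>& values, std::vector<int>& indices)
{
    std::vector<int> order(row.size());
    for (size_t i = 0; i < order.size(); i++)
        order[i] = (int)i;

    // Descending order for largest=true, ascending otherwise.
    auto cmp = [&](int a, int b) {
        return largest ? row[a] > row[b] : row[a] < row[b];
    };

    if (sorted)
        std::partial_sort(order.begin(), order.begin() + k, order.end(), cmp);
    else
        // Unsorted output only needs the k-way partition, not a full sort.
        std::nth_element(order.begin(), order.begin() + k - 1, order.end(), cmp);

    values.resize(k);
    indices.resize(k);
    for (int i = 0; i < k; i++)
    {
        indices[i] = order[i];
        values[i] = row[order[i]];
    }
}
```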

Why

Issue #6377 reports ONNX export pipelines (for example, YOLOv10 via ultralytics) failing because TopK is unsupported in the ncnn conversion/runtime path.

Validation

  • configured and built test_topk locally
  • ran ./build/tests/test_topk successfully

Fixes #6377

Copilot AI review requested due to automatic review settings February 26, 2026 22:24
tencent-adm (Member) commented Feb 26, 2026

CLA assistant check
All committers have signed the CLA.

Copilot AI (Contributor) left a comment


Pull request overview

This pull request adds TopK layer support for the pnnx ONNX export path in ncnn, addressing issue #6377, which reports failures in ONNX export pipelines for models such as YOLOv10 through ultralytics.

Changes:

  • Implements native TopK layer in ncnn core with support for different axes, largest/smallest selection, and sorted/unsorted output
  • Adds pnnx pass to convert ONNX TopK nodes to ncnn TopK layer with proper batch axis handling
  • Adds comprehensive test coverage for 1D, 2D, 3D, and 4D tensors with various axis configurations
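The batch-axis handling mentioned above comes down to one adjustment: ncnn drops the leading batch dimension, so a non-negative ONNX axis greater than the batch index must shift down by one. A minimal sketch (the helper name is illustrative; negative axes already count from the end of the batchless shape and pass through unchanged):

```cpp
// Mirror of the axis remapping done in the pnnx->ncnn TopK pass.
static int adjust_axis_for_batch(int axis, int batch_index)
{
    if (axis >= 0 && axis > batch_index)
        return axis - 1;
    return axis;
}
```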

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 6 comments.

Show a summary per file:
  • src/layer/topk.h — header defining the TopK layer class, with parameters for axis, largest, sorted, and k
  • src/layer/topk.cpp — TopK layer implementation supporting 1–4D tensors, using std::partial_sort and std::nth_element for efficient selection
  • src/CMakeLists.txt — registers the TopK layer in the build system (correctly ordered alphabetically)
  • tools/pnnx/src/pass_ncnn/TopK.cpp — pnnx pass converting ONNX TopK operations to ncnn format, handling batch-axis removal and parameter mapping
  • tools/pnnx/src/CMakeLists.txt — adds the TopK pass to the pnnx build (correctly ordered alphabetically)
  • tests/test_topk.cpp — test cases covering 1D through 4D tensors with various axis, k, and largest parameters
  • tests/CMakeLists.txt — registers the TopK test (minor ordering issue: should come after Tile)


Comment thread tests/test_topk.cpp
Comment thread tests/CMakeLists.txt Outdated
Comment thread src/layer/topk.cpp Outdated
Comment thread src/layer/topk.h Outdated
Comment thread tests/test_topk.cpp Outdated
Comment thread tools/pnnx/src/pass_ncnn/TopK.cpp Outdated
@github-actions

Please enable GitHub Actions in your forked repo so the code-format workflow can run.

@mlknz

mlknz commented Apr 9, 2026

any updates on this?

vlordier added 11 commits April 10, 2026 12:24
- Generate TopK class definition in pnnx.py output with forward() method
- Instantiate TopK modules in Model.__init__() with proper parameters
- Update forward() method to call self.topk_name() instead of direct TopK() calls
- Fixes pnnx inference to properly execute TopK operations using torch.topk()
- Test confirms TopK ONNX→pnnx conversion and inference working correctly
- Fix IR pattern syntax to use explicit parameter names (axis=%, largest=%, sorted=%)
- Replace incorrect parameter lookup from 'op_0.axis' to 'axis' to match captured names
- TopK pass now properly fires during ONNX→pnnx→ncnn conversion
- All TopK parameters (axis, largest, sorted) correctly captured and set in ncnn layers
- End-to-end test confirms ONNX→pnnx→ncnn conversion with TopK working correctly
use c++03-style topk comparator and keep deterministic nan/inf ordering

remove redundant constructor param initialization

fix tests cmakelists alphabetical order (Tile before TopK)

expand torch_topk onnx tests (k=0/k=1, negative dim, sorted=false cases)

drop generated topk onnx/pnnx/ncnn sidecar artifacts from repo
vlordier and others added 8 commits April 10, 2026 12:24
- Guard <algorithm>/<vector> behind #if NCNN_SIMPLESTL, include simplestl.h
- Use std::partial_sort in simplestl mode (no std::nth_element available)
- Guard <math.h> in tests behind #if !NCNN_SIMPLESTL to avoid simplemath.h
  conflict; define INFINITY/NAN as float expressions in simplestl mode
- Fix cstep-unaware indexing for 3D/4D output tensors: use actual cstep
  for channel offset instead of assuming contiguous w*h layout
- Convert #pragma omp parallel + inner #pragma omp for to #pragma omp
  parallel for to avoid __kmpc_barrier in simpleomp mode
- Fix copyright year 2026->2025
- Apply code-format whitespace cleanup
@vlordier vlordier force-pushed the fix-pnnx-onnx-topk-support branch from 830149c to 84e083b Compare April 10, 2026 10:25
vlordier (Author)

@mlknz seems it's passing the tests and CI, give it a try and let me know

vlordier and others added 4 commits April 11, 2026 00:05
- pass_level2/torch_topk.cpp: capture k/dim/largest/sorted as parameters
  (prim::Constant) instead of tensor inputs, enabling ncnn pass matching
- pass_level2/torch_gather.cpp: restore original pattern (dim as tensor)
- pass_ncnn/TopK.cpp: match torch.topk with captured parameters and
  convert to ncnn TopK layer (axis, largest, sorted)
- pass_ncnn/torch_gather.cpp (NEW): match torch.gather with 2 inputs
  (input, index) and captured dim parameter, convert to ncnn Gather layer
- src/layer/gather.{h,cpp} (NEW): implement Gather ncnn operator
  supporting 1D/2D/3D tensors with arbitrary axis
- PNNX CMakeLists fixes:
  - per-target Torch include dirs to avoid protobuf header conflicts
  - Abseil linking for Homebrew protobuf 34.x
  - disable onnxruntime auto-detection (protobuf conflict)
  - directory-level INCLUDE_DIRECTORIES_BEFORE for protobuf headers

Verified: YOLOv10n converts with 2 TopK + 2 Gather layers, only
cosmetic ops (Tensor.to, pnnx.Expression) ignored.

Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
- src/layer/cast.{h,cpp}: extend Cast layer with int64 (type 5) and
  int32 (type 6) support, adding conversions int64↔float32 and
  int32↔float32
- pass_ncnn/tensor_to.cpp (NEW): convert Tensor.to (dtype cast) to
  ncnn Cast layer, mapping torch dtype strings to ncnn type codes
- CMakeLists.txt: register tensor_to.cpp in pass_ncnn sources

Verified: YOLOv10n Tensor.to (i64→f32) now converts to Cast layer
instead of being ignored. Only cosmetic ops (pnnx.Expression) remain.

Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
…PR triggers to workflow

- load_onnx.cpp does not use any onnxruntime API (no Ort* usage),
  so the include guards and #error are unnecessary and break builds
  for users who don't have onnxruntime installed
- Add pull_request trigger and fix-pnnx-onnx-topk-support push trigger
  to topk-linux-test.yml so CI runs on this PR

Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
vlordier added a commit to vlordier/ncnn that referenced this pull request Apr 11, 2026
…le ONNX pass

This commit adds critical missing operators needed for YOLO26 model conversion to NCNN.

New Operators:
- GatherElements: ONNX GatherElements operator for tensor element gathering
- Expand: ONNX Expand operator for tensor broadcasting with numpy semantics
- Tile ONNX pass: Conversion pass for ONNX Tile operator (layer already exists)

Changes:
- Add src/layer/gatherelements.h and .cpp
- Add src/layer/expand.h and .cpp
- Add tools/pnnx/src/pass_ncnn/gatherelements.cpp
- Add tools/pnnx/src/pass_ncnn/expand.cpp
- Add tools/pnnx/src/pass_ncnn/tile.cpp
- Update src/CMakeLists.txt to register GatherElements layer
- Update tools/pnnx/src/CMakeLists.txt to register PNNX passes

Implementation follows the pattern from PR Tencent#6558 (TopK/Gather/Cast).

YOLO26 Operator Analysis (453 nodes, 28 unique ops):
- 25 operators: Already supported in NCNN
- 3 operators: Newly implemented (this commit)
- 1 operator (Mod): Low priority, only 1 usage, can workaround

Testing:
- All files compile successfully
- No compilation errors
- Follows NCNN coding style and patterns

Enables YOLO26 end2end NMS-free conversion with output shape [1, 300, 6].

References:
- PR Tencent#6558: TopK/Gather/Cast implementation
- YOLO26: https://arxiv.org/abs/2602.14582
- Issues: Tencent#6518, Tencent#6610

Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
vlordier added a commit to vlordier/ncnn that referenced this pull request Apr 11, 2026
- Tile: Added support for ONNX Tile operator with repeats as second input blob
- Expand: Fixed implementation for proper shape broadcasting
- Both operators now follow PR Tencent#6558 pattern
- Maintains backward compatibility with parameter-based mode

Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>

Copilot AI left a comment


Pull request overview

Copilot reviewed 22 out of 22 changed files in this pull request and generated 12 comments.

Comments suppressed due to low confidence (1)

src/layer/cast.cpp:108

  • Cast advertises type_from=0 as “auto” (see cast.h), but Cast::forward() never resolves type_from==0 to an actual source type. With the new pnnx Tensor.to -> Cast lowering emitting type_from=0, this will produce uninitialized outputs. Add type_from==0 inference (e.g., from bottom_blob.elemsize / known flags) or explicitly treat 0 as float32 when no better signal exists, and return an error for unsupported conversions.
int Cast::forward(const Mat& bottom_blob, Mat& top_blob, const Option& opt) const
{
    if (type_from == type_to)
    {
        top_blob = bottom_blob;
        return 0;
    }

    int w = bottom_blob.w;
    int h = bottom_blob.h;
    int d = bottom_blob.d;
    int channels = bottom_blob.c;
    int dims = bottom_blob.dims;
    size_t elemsize = bottom_blob.elemsize;
    int elempack = bottom_blob.elempack;

    size_t out_elemsize = elemsize;
    if (type_to == 1)
    {
        // float32
        out_elemsize = 4 * elempack;
    }
    else if (type_to == 2)
    {
        // float16
        out_elemsize = 2 * elempack;
    }
    else if (type_to == 3)
    {
        // int8
        out_elemsize = elempack;
    }
    else if (type_to == 4)
    {
        // bfloat16
        out_elemsize = 2 * elempack;
    }
    else if (type_to == 5)
    {
        // int64
        out_elemsize = 8 * elempack;
    }
    else if (type_to == 6)
    {
        // int32
        out_elemsize = 4 * elempack;
    }

    if (dims == 1)
    {
        top_blob.create(w, out_elemsize, elempack, opt.blob_allocator);
    }
    else if (dims == 2)
    {
        top_blob.create(w, h, out_elemsize, elempack, opt.blob_allocator);
    }
    else if (dims == 3)
    {
        top_blob.create(w, h, channels, out_elemsize, elempack, opt.blob_allocator);
    }
    else if (dims == 4)
    {
        top_blob.create(w, h, d, channels, out_elemsize, elempack, opt.blob_allocator);
    }
    if (top_blob.empty())
        return -100;

    int size = w * h * d * elempack;
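One way to implement the type_from==0 resolution this comment asks for is to infer the source type from the blob's per-element byte size. The helper below is a sketch under that assumption (name and heuristic are not from the PR): byte size alone cannot distinguish float32 from int32, or float16 from bfloat16, so the 4-byte case defaults to float32 exactly as the comment suggests when no better signal exists.

```cpp
#include <cstddef>

// Resolve Cast's type_from==0 ("auto") to a concrete type code:
// 1=float32, 2=float16, 3=int8, 4=bfloat16, 5=int64, 6=int32.
// Returns -1 for byte sizes with no supported mapping.
static int resolve_type_from(int type_from, size_t elemsize, int elempack)
{
    if (type_from != 0)
        return type_from;

    size_t bytes = elemsize / elempack; // per-element size without packing
    if (bytes == 4) return 1; // could also be int32 -- ambiguous, default to fp32
    if (bytes == 2) return 2; // could also be bf16 -- ambiguous, default to fp16
    if (bytes == 1) return 3; // int8
    if (bytes == 8) return 5; // int64
    return -1;
}
```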



Comment thread tests/test_topk.cpp
Comment on lines +129 to +162
return 0
|| test_topk(a, 0, 1, 1, 1)
|| test_topk(a, 0, 5, 1, 1)
|| test_topk(a, 0, 1, 0, 0)
|| test_topk(a, -1, 7, 0, 1)
|| test_topk(a, 0, 4, 1, 0)
|| test_topk(a, 0, 9, 1, 1);
}

static int test_topk_1()
{
ncnn::Mat a = RandomMat(12, 17);

return 0
|| test_topk(a, 0, 1, 1, 1)
|| test_topk(a, 0, 5, 1, 1)
|| test_topk(a, 1, 3, 1, 1)
|| test_topk(a, -1, 8, 0, 1)
|| test_topk(a, 1, 6, 0, 0)
|| test_topk(a, -2, 7, 1, 1);
}

static int test_topk_2()
{
ncnn::Mat a = RandomMat(8, 9, 11);

return 0
|| test_topk(a, 0, 3, 1, 1)
|| test_topk(a, 1, 4, 1, 1)
|| test_topk(a, 2, 2, 0, 1)
|| test_topk(a, 2, 5, 1, 0)
|| test_topk(a, -1, 6, 1, 1)
|| test_topk(a, -2, 5, 0, 1)
|| test_topk(a, -3, 7, 1, 1);

Copilot AI Apr 16, 2026


Missing semicolon after return 0 causes a compile error, blocking the new test_topk target from building. Add the semicolon so the chained || test_topk(...) expressions compile.

Suggested change
return 0
|| test_topk(a, 0, 1, 1, 1)
|| test_topk(a, 0, 5, 1, 1)
|| test_topk(a, 0, 1, 0, 0)
|| test_topk(a, -1, 7, 0, 1)
|| test_topk(a, 0, 4, 1, 0)
|| test_topk(a, 0, 9, 1, 1);
}
static int test_topk_1()
{
ncnn::Mat a = RandomMat(12, 17);
return 0
|| test_topk(a, 0, 1, 1, 1)
|| test_topk(a, 0, 5, 1, 1)
|| test_topk(a, 1, 3, 1, 1)
|| test_topk(a, -1, 8, 0, 1)
|| test_topk(a, 1, 6, 0, 0)
|| test_topk(a, -2, 7, 1, 1);
}
static int test_topk_2()
{
ncnn::Mat a = RandomMat(8, 9, 11);
return 0
|| test_topk(a, 0, 3, 1, 1)
|| test_topk(a, 1, 4, 1, 1)
|| test_topk(a, 2, 2, 0, 1)
|| test_topk(a, 2, 5, 1, 0)
|| test_topk(a, -1, 6, 1, 1)
|| test_topk(a, -2, 5, 0, 1)
|| test_topk(a, -3, 7, 1, 1);
return 0 ||
test_topk(a, 0, 1, 1, 1) ||
test_topk(a, 0, 5, 1, 1) ||
test_topk(a, 0, 1, 0, 0) ||
test_topk(a, -1, 7, 0, 1) ||
test_topk(a, 0, 4, 1, 0) ||
test_topk(a, 0, 9, 1, 1);
}
static int test_topk_1()
{
ncnn::Mat a = RandomMat(12, 17);
return 0 ||
test_topk(a, 0, 1, 1, 1) ||
test_topk(a, 0, 5, 1, 1) ||
test_topk(a, 1, 3, 1, 1) ||
test_topk(a, -1, 8, 0, 1) ||
test_topk(a, 1, 6, 0, 0) ||
test_topk(a, -2, 7, 1, 1);
}
static int test_topk_2()
{
ncnn::Mat a = RandomMat(8, 9, 11);
return 0 ||
test_topk(a, 0, 3, 1, 1) ||
test_topk(a, 1, 4, 1, 1) ||
test_topk(a, 2, 2, 0, 1) ||
test_topk(a, 2, 5, 1, 0) ||
test_topk(a, -1, 6, 1, 1) ||
test_topk(a, -2, 5, 0, 1) ||
test_topk(a, -3, 7, 1, 1);

Comment on lines 13 to +20
return R"PNNXIR(7767517
7 7
12 7
pnnx.Input input_0 0 1 input
pnnx.Input input_1 0 1 k
pnnx.Input input_2 0 1 dim
pnnx.Input input_3 0 1 largest
pnnx.Input input_4 0 1 sorted
aten::topk op_0 5 2 input k dim largest sorted values indices
prim::Constant op_0 0 1 k value=%k
prim::Constant op_1 0 1 dim value=%dim
prim::Constant op_2 0 1 largest value=%largest
prim::Constant op_3 0 1 sorted value=%sorted
aten::topk op_4 5 2 input k dim largest sorted values indices

Copilot AI Apr 16, 2026


The PNNXIR header 12 7 does not match the actual number of operator lines in this pattern (7). This mismatch will prevent the graph rewriter from matching aten::topk nodes. Update the header counts to reflect the pattern contents (likely 7 7).

Comment thread tools/pnnx/src/ir.cpp
Comment on lines +1642 to +1664
fprintf(pyfp, " self.%s = TopK(", sanitize_identifier(op->name).c_str());

int i = 0;
for (const auto& it : op->params)
{
fprintf(pyfp, "%s=", it.first.c_str());

const Parameter& param = it.second;
if (param.type == 2)
{
fprintf(pyfp, "%d", param.i);
}
else if (param.type == 1)
{
fprintf(pyfp, "%d", param.b ? 1 : 0);
}

if (i + 1 != op->params.size())
fprintf(pyfp, ", ");
i++;
}

fprintf(pyfp, ")\n");

Copilot AI Apr 16, 2026


When instantiating TopK modules, this loop prints it.first as a Python keyword (e.g., 0=..., 1=...). ncnn-style param keys are numeric strings, which are not valid Python identifiers and will generate invalid Python code. Map these param ids to named arguments (axis, largest, sorted, k) or emit positional arguments in a fixed order.

Comment thread src/layer/gather.cpp
return -100;

const float* inp = input_blob;
const int* idx = (const int*)index_blob;

Copilot AI Apr 16, 2026


Gather::forward() reinterprets the index tensor as const int* (idx = (const int*)index_blob), but ncnn (and this PR’s TopK/ArgMax) commonly represent indices as float values stored in a float buffer. Reinterpreting float bits as int will produce huge indices and out-of-bounds reads. Read indices as float and cast to int (or branch based on index_blob.elemsize / the Cast output type).

Suggested change
const int* idx = (const int*)index_blob;
const float* idx = index_blob;

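A standalone illustration of the bug this comment flags (not ncnn code): reinterpreting the bit pattern of a float-stored index yields a huge bogus integer, while reading the value as float and truncating recovers the intended index.

```cpp
#include <cstring>

// Same effect as the flagged cast: idx = (const int*)index_blob;
// memcpy is used here only to express the bit reinterpretation legally.
static int index_via_reinterpret(const float* buf, int i)
{
    int v;
    std::memcpy(&v, &buf[i], sizeof(v));
    return v;
}

// The fix the reviewer recommends: read as float, truncate to int.
static int index_via_cast(const float* buf, int i)
{
    return (int)buf[i];
}
```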
Comment thread src/layer/gather.cpp
Comment on lines +66 to +114
int rem = i;
int coord_out[4] = {0, 0, 0, 0};
if (top_blob.dims == 1)
{
coord_out[0] = rem;
}
else if (top_blob.dims == 2)
{
coord_out[0] = rem % top_blob.w;
coord_out[1] = rem / top_blob.w;
}
else if (top_blob.dims == 3)
{
int hw = top_blob.w * top_blob.h;
coord_out[0] = (rem % hw) % top_blob.w;
coord_out[1] = (rem % hw) / top_blob.w;
coord_out[2] = rem / hw;
}

// Get index value at this output position
int gather_idx = idx[i];
// Handle negative indices
if (gather_idx < 0) gather_idx += axis_dim_size;

// Build input coordinate (same as output, but axis coord replaced)
int coord_in[4] = {coord_out[0], coord_out[1], coord_out[2], coord_out[3]};
coord_in[positive_axis] = gather_idx;

// Clamp to input bounds
if (coord_in[positive_axis] >= axis_dim_size) coord_in[positive_axis] = axis_dim_size - 1;
if (coord_in[positive_axis] < 0) coord_in[positive_axis] = 0;

// Compute flat input index
int flat_in = 0;
if (dims == 1)
{
flat_in = coord_in[0];
}
else if (dims == 2)
{
flat_in = coord_in[0] + coord_in[1] * input_blob.w;
}
else if (dims == 3)
{
// ncnn 3D layout: w * h * c, with cstride padding
size_t cstep = input_blob.cstep;
flat_in = coord_in[0] + coord_in[1] * input_blob.w + coord_in[2] * (int)cstep;
}


Copilot AI Apr 16, 2026


This implementation only decomposes output coordinates / computes flat_in for dims up to 3. For 4D inputs (and/or 4D index tensors), coord_out/flat_in are left partially unset and flat_in stays 0, so most outputs will incorrectly read the first element. Add explicit handling for dims==4 (both coordinate decomposition and flat index computation, including cstep/d strides).

Comment on lines +54 to +60
type_to = 5;
else if (dtype == "torch.int32" || dtype == "torch.int")
type_to = 6;

op->params["0"] = 0; // auto-detect input type
op->params["1"] = type_to;
}

Copilot AI Apr 16, 2026


Tensor.to is lowered to Cast with type_from=0 (“auto”), but Cast::forward() currently has no logic to infer the source type when type_from==0, so none of the conversion branches will run and the output buffer will be left uninitialized. Either implement type_from==0 auto-detection in Cast or set an explicit type_from here based on the actual input dtype tracked by pnnx.

Comment on lines +71 to +78
int new_axis = axis;
if (axis >= 0)
new_axis = axis > batch_index ? axis - 1 : axis;

op->params["0"] = new_axis;
op->params["1"] = largest;
op->params["2"] = sorted;
}

Copilot AI Apr 16, 2026


The ncnn TopK layer expects k via param id 3 (or via a second input blob), but this rewrite pass never writes k into op->params. As a result, converted graphs will use the default k=1 regardless of the original torch.topk(..., k=...). Capture and write k (and consider supporting the ONNX-style second input for dynamic K if needed).

Comment thread tools/pnnx/src/ir.cpp
Comment on lines +1497 to +1502
fprintf(pyfp, " def __init__(self, axis=1, largest=1, sorted=1):\n");
fprintf(pyfp, " super(TopK, self).__init__()\n");
fprintf(pyfp, " self.axis = axis\n");
fprintf(pyfp, " self.largest = largest\n");
fprintf(pyfp, " self.sorted = sorted\n");
fprintf(pyfp, " def forward(self, x, k):\n");

Copilot AI Apr 16, 2026


The generated Python TopK module takes forward(self, x, k), but the pnnx->ncnn lowering for torch.topk in this PR models k as a captured parameter (not an input tensor). If the resulting TopK operator only has one input, the emitted self.<name>(v_input) call will raise at runtime due to the missing k argument. Align the Python stub with how k is represented (store k as a module attribute and accept only x, or ensure the operator keeps a second k input).

Suggested change
fprintf(pyfp, " def __init__(self, axis=1, largest=1, sorted=1):\n");
fprintf(pyfp, " super(TopK, self).__init__()\n");
fprintf(pyfp, " self.axis = axis\n");
fprintf(pyfp, " self.largest = largest\n");
fprintf(pyfp, " self.sorted = sorted\n");
fprintf(pyfp, " def forward(self, x, k):\n");
fprintf(pyfp, " def __init__(self, axis=1, largest=1, sorted=1, k=None):\n");
fprintf(pyfp, " super(TopK, self).__init__()\n");
fprintf(pyfp, " self.axis = axis\n");
fprintf(pyfp, " self.largest = largest\n");
fprintf(pyfp, " self.sorted = sorted\n");
fprintf(pyfp, " self.k = k\n");
fprintf(pyfp, " def forward(self, x, k=None):\n");
fprintf(pyfp, " k = self.k if k is None else k\n");
fprintf(pyfp, " if k is None:\n");
fprintf(pyfp, " raise TypeError('TopK.forward() missing required argument: k')\n");

Comment thread src/layer/gather.cpp

// Allocate output (same dtype as input, shape matches index)
Mat& top_blob = top_blobs[0];
top_blob.create(out_shape.w, out_shape.h, out_shape.c, input_blob.elemsize, input_blob.elempack, opt.blob_allocator);

Copilot AI Apr 16, 2026


top_blob.create(out_shape.w, out_shape.h, out_shape.c, ...) always creates a 3D output, even when index_blob is 1D/2D/4D. That changes the output rank and breaks the coordinate decomposition logic below (which depends on top_blob.dims). Allocate top_blob using the correct create() overload based on out_shape.dims so the output shape truly matches the index tensor.

Suggested change
top_blob.create(out_shape.w, out_shape.h, out_shape.c, input_blob.elemsize, input_blob.elempack, opt.blob_allocator);
if (out_shape.dims == 1)
top_blob.create(out_shape.w, input_blob.elemsize, input_blob.elempack, opt.blob_allocator);
else if (out_shape.dims == 2)
top_blob.create(out_shape.w, out_shape.h, input_blob.elemsize, input_blob.elempack, opt.blob_allocator);
else if (out_shape.dims == 3)
top_blob.create(out_shape.w, out_shape.h, out_shape.c, input_blob.elemsize, input_blob.elempack, opt.blob_allocator);
else if (out_shape.dims == 4)
top_blob.create(out_shape.w, out_shape.h, out_shape.d, out_shape.c, input_blob.elemsize, input_blob.elempack, opt.blob_allocator);
else
return -1;

else if (dim_p.type == 5 && !dim_p.ai.empty())
axis = dim_p.ai[0];
}


Copilot AI Apr 16, 2026


This lowering writes axis directly from dim, but other ncnn rewrite passes drop/renumber the batch axis using __batch_index (see e.g. torch_flip/TopK). Without the same adjustment here, Gather will target the wrong dimension after batch removal. Apply the same batch-axis handling (reject gather along batch, and decrement axes > batch_index).

Suggested change
if (op->params.find("__batch_index") != op->params.end())
{
const Parameter& batch_index_p = op->params.at("__batch_index");
if (batch_index_p.type == 2)
{
const int batch_index = batch_index_p.i;
// ncnn lowers tensor ranks with the batch axis removed.
// gather along the removed batch axis cannot be represented here.
if (axis == batch_index)
return;
if (axis > batch_index)
axis -= 1;
}
}

vlordier added a commit to vlordier/ncnn that referenced this pull request Apr 16, 2026
- ir.cpp: store k in TopK.__init__, use forward(self, x) — k was a ctor
  param but forward() expected it as an input arg, causing runtime error
- ir.cpp: pass k= in TopK instantiation (k_val from params["3"])
- gather.cpp: reject non-float32 data (elemsize != 4) and dims > 3 explicitly
- pnnx/src/CMakeLists: replace invalid set_property(INCLUDE_DIRECTORIES_BEFORE)
  with include_directories(BEFORE ...) to correctly force protobuf header order
- pnnx/tests/onnx: add test_torch_gather.py roundtrip test (1D/2D/3D,
  multiple axes, negative axis) and register it in CMakeLists
vlordier (Author)

Superseded by #6669 which consolidates all operators (TopK, Gather, GatherElements, Expand, Tile, Mod) with all review issues addressed.

vlordier closed this Apr 16, 2026


Development

Successfully merging this pull request may close these issues.

Requesting support for TopK op

4 participants