Add TopK layer support for pnnx ONNX export path #6558

Closed
vlordier wants to merge 32 commits into Tencent:master from vlordier:fix-pnnx-onnx-topk-support
Conversation

vlordier commented Feb 26, 2026

Summary

  • add native TopK layer implementation in ncnn core
  • lower ONNX TopK nodes in pnnx ncnn pass to the new TopK layer
  • register layer/build integration and add test_topk coverage
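As a rough illustration of the selection core the summary describes (a sketch under assumptions, not the actual ncnn implementation: the real layer operates on ncnn::Mat along an arbitrary axis, and per the later commits uses a C++03-style comparator rather than a lambda), a 1-D top-k with the largest/sorted options might look like:

```cpp
#include <algorithm>
#include <vector>

// Hypothetical helper sketching a TopK forward pass over one row.
// Assumes 1 <= k <= row.size(); a real layer must validate/clamp k.
static void topk_1d(const std::vector<float>& row, int k, bool largest, bool sorted,
                    std::vector<float>& values, std::vector<int>& indices)
{
    std::vector<int> order(row.size());
    for (size_t i = 0; i < order.size(); i++)
        order[i] = (int)i;

    // Descending order for largest=true, ascending otherwise.
    auto cmp = [&](int a, int b) {
        return largest ? row[a] > row[b] : row[a] < row[b];
    };

    if (sorted)
        std::partial_sort(order.begin(), order.begin() + k, order.end(), cmp);
    else
        // Unsorted output only needs the k-way partition, not a full sort.
        std::nth_element(order.begin(), order.begin() + k - 1, order.end(), cmp);

    values.resize(k);
    indices.resize(k);
    for (int i = 0; i < k; i++)
    {
        indices[i] = order[i];
        values[i] = row[order[i]];
    }
}
```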

Why

Issue #6377 reports ONNX export pipelines (for example, YOLOv10 via ultralytics) failing because TopK is unsupported in the ncnn conversion/runtime path.

Validation

  • configured and built test_topk locally
  • ran ./build/tests/test_topk successfully

Fixes #6377

Copilot AI review requested due to automatic review settings February 26, 2026 22:24
tencent-adm (Member) commented Feb 26, 2026

CLA assistant check
All committers have signed the CLA.

Copilot AI (Contributor) left a comment


Pull request overview

This pull request adds TopK layer support for the pnnx ONNX export path in ncnn, addressing issue #6377, which reports failures in ONNX export pipelines for models such as YOLOv10 through ultralytics.

Changes:

  • Implements native TopK layer in ncnn core with support for different axes, largest/smallest selection, and sorted/unsorted output
  • Adds pnnx pass to convert ONNX TopK nodes to ncnn TopK layer with proper batch axis handling
  • Adds comprehensive test coverage for 1D, 2D, 3D, and 4D tensors with various axis configurations
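The batch-axis handling mentioned above comes down to one adjustment: ncnn drops the leading batch dimension, so a non-negative ONNX axis greater than the batch index must shift down by one. A minimal sketch (the helper name is illustrative; negative axes already count from the end of the batchless shape and pass through unchanged):

```cpp
// Mirror of the axis remapping done in the pnnx->ncnn TopK pass.
static int adjust_axis_for_batch(int axis, int batch_index)
{
    if (axis >= 0 && axis > batch_index)
        return axis - 1;
    return axis;
}
```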

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 6 comments.

Show a summary per file:
  • src/layer/topk.h — header defining the TopK layer class, with parameters for axis, largest, sorted, and k
  • src/layer/topk.cpp — TopK layer implementation supporting 1–4D tensors, using std::partial_sort and std::nth_element for efficient selection
  • src/CMakeLists.txt — registers the TopK layer in the build system (correctly ordered alphabetically)
  • tools/pnnx/src/pass_ncnn/TopK.cpp — pnnx pass converting ONNX TopK operations to ncnn format, handling batch-axis removal and parameter mapping
  • tools/pnnx/src/CMakeLists.txt — adds the TopK pass to the pnnx build (correctly ordered alphabetically)
  • tests/test_topk.cpp — test cases covering 1D through 4D tensors with various axis, k, and largest parameters
  • tests/CMakeLists.txt — registers the TopK test (minor ordering issue: should come after Tile)


Comment thread tests/test_topk.cpp
Comment thread tests/CMakeLists.txt Outdated
Comment thread src/layer/topk.cpp Outdated
Comment thread src/layer/topk.h Outdated
Comment thread tests/test_topk.cpp Outdated
Comment thread tools/pnnx/src/pass_ncnn/TopK.cpp Outdated
@github-actions

Please enable GitHub Actions in your forked repo so the code-format workflow can run.

@mlknz

mlknz commented Apr 9, 2026

any updates on this?

vlordier added 11 commits April 10, 2026 12:24
- Generate TopK class definition in pnnx.py output with forward() method
- Instantiate TopK modules in Model.__init__() with proper parameters
- Update forward() method to call self.topk_name() instead of direct TopK() calls
- Fixes pnnx inference to properly execute TopK operations using torch.topk()
- Test confirms TopK ONNX→pnnx conversion and inference working correctly
- Fix IR pattern syntax to use explicit parameter names (axis=%, largest=%, sorted=%)
- Replace incorrect parameter lookup from 'op_0.axis' to 'axis' to match captured names
- TopK pass now properly fires during ONNX→pnnx→ncnn conversion
- All TopK parameters (axis, largest, sorted) correctly captured and set in ncnn layers
- End-to-end test confirms ONNX→pnnx→ncnn conversion with TopK working correctly
use c++03-style topk comparator and keep deterministic nan/inf ordering

remove redundant constructor param initialization

fix tests cmakelists alphabetical order (Tile before TopK)

expand torch_topk onnx tests (k=0/k=1, negative dim, sorted=false cases)

drop generated topk onnx/pnnx/ncnn sidecar artifacts from repo
vlordier and others added 8 commits April 10, 2026 12:24
- Guard <algorithm>/<vector> behind #if NCNN_SIMPLESTL, include simplestl.h
- Use std::partial_sort in simplestl mode (no std::nth_element available)
- Guard <math.h> in tests behind #if !NCNN_SIMPLESTL to avoid simplemath.h
  conflict; define INFINITY/NAN as float expressions in simplestl mode
- Fix cstep-unaware indexing for 3D/4D output tensors: use actual cstep
  for channel offset instead of assuming contiguous w*h layout
- Convert #pragma omp parallel + inner #pragma omp for to #pragma omp
  parallel for to avoid __kmpc_barrier in simpleomp mode
- Fix copyright year 2026->2025
- Apply code-format whitespace cleanup
@vlordier vlordier force-pushed the fix-pnnx-onnx-topk-support branch from 830149c to 84e083b Compare April 10, 2026 10:25
vlordier (Author)

@mlknz seems it's passing the tests and CI, give it a try and let me know

vlordier and others added 4 commits April 11, 2026 00:05
- pass_level2/torch_topk.cpp: capture k/dim/largest/sorted as parameters
  (prim::Constant) instead of tensor inputs, enabling ncnn pass matching
- pass_level2/torch_gather.cpp: restore original pattern (dim as tensor)
- pass_ncnn/TopK.cpp: match torch.topk with captured parameters and
  convert to ncnn TopK layer (axis, largest, sorted)
- pass_ncnn/torch_gather.cpp (NEW): match torch.gather with 2 inputs
  (input, index) and captured dim parameter, convert to ncnn Gather layer
- src/layer/gather.{h,cpp} (NEW): implement Gather ncnn operator
  supporting 1D/2D/3D tensors with arbitrary axis
- PNNX CMakeLists fixes:
  - per-target Torch include dirs to avoid protobuf header conflicts
  - Abseil linking for Homebrew protobuf 34.x
  - disable onnxruntime auto-detection (protobuf conflict)
  - directory-level INCLUDE_DIRECTORIES_BEFORE for protobuf headers

Verified: YOLOv10n converts with 2 TopK + 2 Gather layers, only
cosmetic ops (Tensor.to, pnnx.Expression) ignored.

Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
- src/layer/cast.{h,cpp}: extend Cast layer with int64 (type 5) and
  int32 (type 6) support, adding conversions int64↔float32 and
  int32↔float32
- pass_ncnn/tensor_to.cpp (NEW): convert Tensor.to (dtype cast) to
  ncnn Cast layer, mapping torch dtype strings to ncnn type codes
- CMakeLists.txt: register tensor_to.cpp in pass_ncnn sources

Verified: YOLOv10n Tensor.to (i64→f32) now converts to Cast layer
instead of being ignored. Only cosmetic ops (pnnx.Expression) remain.

Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
…PR triggers to workflow

- load_onnx.cpp does not use any onnxruntime API (no Ort* usage),
  so the include guards and #error are unnecessary and break builds
  for users who don't have onnxruntime installed
- Add pull_request trigger and fix-pnnx-onnx-topk-support push trigger
  to topk-linux-test.yml so CI runs on this PR

Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
vlordier added a commit to vlordier/ncnn that referenced this pull request Apr 11, 2026
…le ONNX pass

This commit adds critical missing operators needed for YOLO26 model conversion to NCNN.

New Operators:
- GatherElements: ONNX GatherElements operator for tensor element gathering
- Expand: ONNX Expand operator for tensor broadcasting with numpy semantics
- Tile ONNX pass: Conversion pass for ONNX Tile operator (layer already exists)

Changes:
- Add src/layer/gatherelements.h and .cpp
- Add src/layer/expand.h and .cpp
- Add tools/pnnx/src/pass_ncnn/gatherelements.cpp
- Add tools/pnnx/src/pass_ncnn/expand.cpp
- Add tools/pnnx/src/pass_ncnn/tile.cpp
- Update src/CMakeLists.txt to register GatherElements layer
- Update tools/pnnx/src/CMakeLists.txt to register PNNX passes

Implementation follows the pattern from PR Tencent#6558 (TopK/Gather/Cast).

YOLO26 Operator Analysis (453 nodes, 28 unique ops):
- 25 operators: Already supported in NCNN
- 3 operators: Newly implemented (this commit)
- 1 operator (Mod): Low priority, only 1 usage, can workaround

Testing:
- All files compile successfully
- No compilation errors
- Follows NCNN coding style and patterns

Enables YOLO26 end2end NMS-free conversion with output shape [1, 300, 6].

References:
- PR Tencent#6558: TopK/Gather/Cast implementation
- YOLO26: https://arxiv.org/abs/2602.14582
- Issues: Tencent#6518, Tencent#6610

Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
vlordier added a commit to vlordier/ncnn that referenced this pull request Apr 11, 2026
- Tile: Added support for ONNX Tile operator with repeats as second input blob
- Expand: Fixed implementation for proper shape broadcasting
- Both operators now follow PR Tencent#6558 pattern
- Maintains backward compatibility with parameter-based mode

Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>

Copilot AI left a comment


Pull request overview

Copilot reviewed 22 out of 22 changed files in this pull request and generated 12 comments.

Comments suppressed due to low confidence (1)

src/layer/cast.cpp:108

  • Cast advertises type_from=0 as “auto” (see cast.h), but Cast::forward() never resolves type_from==0 to an actual source type. With the new pnnx Tensor.to -> Cast lowering emitting type_from=0, this will produce uninitialized outputs. Add type_from==0 inference (e.g., from bottom_blob.elemsize / known flags) or explicitly treat 0 as float32 when no better signal exists, and return an error for unsupported conversions.
int Cast::forward(const Mat& bottom_blob, Mat& top_blob, const Option& opt) const
{
    if (type_from == type_to)
    {
        top_blob = bottom_blob;
        return 0;
    }

    int w = bottom_blob.w;
    int h = bottom_blob.h;
    int d = bottom_blob.d;
    int channels = bottom_blob.c;
    int dims = bottom_blob.dims;
    size_t elemsize = bottom_blob.elemsize;
    int elempack = bottom_blob.elempack;

    size_t out_elemsize = elemsize;
    if (type_to == 1)
    {
        // float32
        out_elemsize = 4 * elempack;
    }
    else if (type_to == 2)
    {
        // float16
        out_elemsize = 2 * elempack;
    }
    else if (type_to == 3)
    {
        // int8
        out_elemsize = elempack;
    }
    else if (type_to == 4)
    {
        // bfloat16
        out_elemsize = 2 * elempack;
    }
    else if (type_to == 5)
    {
        // int64
        out_elemsize = 8 * elempack;
    }
    else if (type_to == 6)
    {
        // int32
        out_elemsize = 4 * elempack;
    }

    if (dims == 1)
    {
        top_blob.create(w, out_elemsize, elempack, opt.blob_allocator);
    }
    else if (dims == 2)
    {
        top_blob.create(w, h, out_elemsize, elempack, opt.blob_allocator);
    }
    else if (dims == 3)
    {
        top_blob.create(w, h, channels, out_elemsize, elempack, opt.blob_allocator);
    }
    else if (dims == 4)
    {
        top_blob.create(w, h, d, channels, out_elemsize, elempack, opt.blob_allocator);
    }
    if (top_blob.empty())
        return -100;

    int size = w * h * d * elempack;
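One way to implement the type_from==0 resolution this comment asks for is to infer the source type from the blob's per-element byte size. The helper below is a sketch under that assumption (name and heuristic are not from the PR): byte size alone cannot distinguish float32 from int32, or float16 from bfloat16, so the 4-byte case defaults to float32 exactly as the comment suggests when no better signal exists.

```cpp
#include <cstddef>

// Resolve Cast's type_from==0 ("auto") to a concrete type code:
// 1=float32, 2=float16, 3=int8, 4=bfloat16, 5=int64, 6=int32.
// Returns -1 for byte sizes with no supported mapping.
static int resolve_type_from(int type_from, size_t elemsize, int elempack)
{
    if (type_from != 0)
        return type_from;

    size_t bytes = elemsize / elempack; // per-element size without packing
    if (bytes == 4) return 1; // could also be int32 -- ambiguous, default to fp32
    if (bytes == 2) return 2; // could also be bf16 -- ambiguous, default to fp16
    if (bytes == 1) return 3; // int8
    if (bytes == 8) return 5; // int64
    return -1;
}
```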



Comment thread tests/test_topk.cpp
Comment on lines +129 to +162
return 0
|| test_topk(a, 0, 1, 1, 1)
|| test_topk(a, 0, 5, 1, 1)
|| test_topk(a, 0, 1, 0, 0)
|| test_topk(a, -1, 7, 0, 1)
|| test_topk(a, 0, 4, 1, 0)
|| test_topk(a, 0, 9, 1, 1);
}

static int test_topk_1()
{
ncnn::Mat a = RandomMat(12, 17);

return 0
|| test_topk(a, 0, 1, 1, 1)
|| test_topk(a, 0, 5, 1, 1)
|| test_topk(a, 1, 3, 1, 1)
|| test_topk(a, -1, 8, 0, 1)
|| test_topk(a, 1, 6, 0, 0)
|| test_topk(a, -2, 7, 1, 1);
}

static int test_topk_2()
{
ncnn::Mat a = RandomMat(8, 9, 11);

return 0
|| test_topk(a, 0, 3, 1, 1)
|| test_topk(a, 1, 4, 1, 1)
|| test_topk(a, 2, 2, 0, 1)
|| test_topk(a, 2, 5, 1, 0)
|| test_topk(a, -1, 6, 1, 1)
|| test_topk(a, -2, 5, 0, 1)
|| test_topk(a, -3, 7, 1, 1);

Copilot AI Apr 16, 2026


Missing semicolon after return 0 causes a compile error, blocking the new test_topk target from building. Add the semicolon so the chained || test_topk(...) expressions compile.

Suggested change
return 0
|| test_topk(a, 0, 1, 1, 1)
|| test_topk(a, 0, 5, 1, 1)
|| test_topk(a, 0, 1, 0, 0)
|| test_topk(a, -1, 7, 0, 1)
|| test_topk(a, 0, 4, 1, 0)
|| test_topk(a, 0, 9, 1, 1);
}
static int test_topk_1()
{
ncnn::Mat a = RandomMat(12, 17);
return 0
|| test_topk(a, 0, 1, 1, 1)
|| test_topk(a, 0, 5, 1, 1)
|| test_topk(a, 1, 3, 1, 1)
|| test_topk(a, -1, 8, 0, 1)
|| test_topk(a, 1, 6, 0, 0)
|| test_topk(a, -2, 7, 1, 1);
}
static int test_topk_2()
{
ncnn::Mat a = RandomMat(8, 9, 11);
return 0
|| test_topk(a, 0, 3, 1, 1)
|| test_topk(a, 1, 4, 1, 1)
|| test_topk(a, 2, 2, 0, 1)
|| test_topk(a, 2, 5, 1, 0)
|| test_topk(a, -1, 6, 1, 1)
|| test_topk(a, -2, 5, 0, 1)
|| test_topk(a, -3, 7, 1, 1);
return 0 ||
test_topk(a, 0, 1, 1, 1) ||
test_topk(a, 0, 5, 1, 1) ||
test_topk(a, 0, 1, 0, 0) ||
test_topk(a, -1, 7, 0, 1) ||
test_topk(a, 0, 4, 1, 0) ||
test_topk(a, 0, 9, 1, 1);
}
static int test_topk_1()
{
ncnn::Mat a = RandomMat(12, 17);
return 0 ||
test_topk(a, 0, 1, 1, 1) ||
test_topk(a, 0, 5, 1, 1) ||
test_topk(a, 1, 3, 1, 1) ||
test_topk(a, -1, 8, 0, 1) ||
test_topk(a, 1, 6, 0, 0) ||
test_topk(a, -2, 7, 1, 1);
}
static int test_topk_2()
{
ncnn::Mat a = RandomMat(8, 9, 11);
return 0 ||
test_topk(a, 0, 3, 1, 1) ||
test_topk(a, 1, 4, 1, 1) ||
test_topk(a, 2, 2, 0, 1) ||
test_topk(a, 2, 5, 1, 0) ||
test_topk(a, -1, 6, 1, 1) ||
test_topk(a, -2, 5, 0, 1) ||
test_topk(a, -3, 7, 1, 1);

Comment on lines 13 to +20
return R"PNNXIR(7767517
7 7
12 7
pnnx.Input input_0 0 1 input
pnnx.Input input_1 0 1 k
pnnx.Input input_2 0 1 dim
pnnx.Input input_3 0 1 largest
pnnx.Input input_4 0 1 sorted
aten::topk op_0 5 2 input k dim largest sorted values indices
prim::Constant op_0 0 1 k value=%k
prim::Constant op_1 0 1 dim value=%dim
prim::Constant op_2 0 1 largest value=%largest
prim::Constant op_3 0 1 sorted value=%sorted
aten::topk op_4 5 2 input k dim largest sorted values indices

Copilot AI Apr 16, 2026


The PNNXIR header 12 7 does not match the actual number of operator lines in this pattern (7). This mismatch will prevent the graph rewriter from matching aten::topk nodes. Update the header counts to reflect the pattern contents (likely 7 7).

Comment thread tools/pnnx/src/ir.cpp
Comment on lines +1642 to +1664
fprintf(pyfp, " self.%s = TopK(", sanitize_identifier(op->name).c_str());

int i = 0;
for (const auto& it : op->params)
{
fprintf(pyfp, "%s=", it.first.c_str());

const Parameter& param = it.second;
if (param.type == 2)
{
fprintf(pyfp, "%d", param.i);
}
else if (param.type == 1)
{
fprintf(pyfp, "%d", param.b ? 1 : 0);
}

if (i + 1 != op->params.size())
fprintf(pyfp, ", ");
i++;
}

fprintf(pyfp, ")\n");

Copilot AI Apr 16, 2026


When instantiating TopK modules, this loop prints it.first as a Python keyword (e.g., 0=..., 1=...). ncnn-style param keys are numeric strings, which are not valid Python identifiers and will generate invalid Python code. Map these param ids to named arguments (axis, largest, sorted, k) or emit positional arguments in a fixed order.

Comment thread src/layer/gather.cpp
return -100;

const float* inp = input_blob;
const int* idx = (const int*)index_blob;

Copilot AI Apr 16, 2026


Gather::forward() reinterprets the index tensor as const int* (idx = (const int*)index_blob), but ncnn (and this PR’s TopK/ArgMax) commonly represent indices as float values stored in a float buffer. Reinterpreting float bits as int will produce huge indices and out-of-bounds reads. Read indices as float and cast to int (or branch based on index_blob.elemsize / the Cast output type).

Suggested change
const int* idx = (const int*)index_blob;
const float* idx = index_blob;

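A standalone illustration of the bug this comment flags (not ncnn code): reinterpreting the bit pattern of a float-stored index yields a huge bogus integer, while reading the value as float and truncating recovers the intended index.

```cpp
#include <cstring>

// Same effect as the flagged cast: idx = (const int*)index_blob;
// memcpy is used here only to express the bit reinterpretation legally.
static int index_via_reinterpret(const float* buf, int i)
{
    int v;
    std::memcpy(&v, &buf[i], sizeof(v));
    return v;
}

// The fix the reviewer recommends: read as float, truncate to int.
static int index_via_cast(const float* buf, int i)
{
    return (int)buf[i];
}
```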
Comment thread src/layer/gather.cpp
Comment on lines +66 to +114
int rem = i;
int coord_out[4] = {0, 0, 0, 0};
if (top_blob.dims == 1)
{
coord_out[0] = rem;
}
else if (top_blob.dims == 2)
{
coord_out[0] = rem % top_blob.w;
coord_out[1] = rem / top_blob.w;
}
else if (top_blob.dims == 3)
{
int hw = top_blob.w * top_blob.h;
coord_out[0] = (rem % hw) % top_blob.w;
coord_out[1] = (rem % hw) / top_blob.w;
coord_out[2] = rem / hw;
}

// Get index value at this output position
int gather_idx = idx[i];
// Handle negative indices
if (gather_idx < 0) gather_idx += axis_dim_size;

// Build input coordinate (same as output, but axis coord replaced)
int coord_in[4] = {coord_out[0], coord_out[1], coord_out[2], coord_out[3]};
coord_in[positive_axis] = gather_idx;

// Clamp to input bounds
if (coord_in[positive_axis] >= axis_dim_size) coord_in[positive_axis] = axis_dim_size - 1;
if (coord_in[positive_axis] < 0) coord_in[positive_axis] = 0;

// Compute flat input index
int flat_in = 0;
if (dims == 1)
{
flat_in = coord_in[0];
}
else if (dims == 2)
{
flat_in = coord_in[0] + coord_in[1] * input_blob.w;
}
else if (dims == 3)
{
// ncnn 3D layout: w * h * c, with cstride padding
size_t cstep = input_blob.cstep;
flat_in = coord_in[0] + coord_in[1] * input_blob.w + coord_in[2] * (int)cstep;
}


Copilot AI Apr 16, 2026


This implementation only decomposes output coordinates / computes flat_in for dims up to 3. For 4D inputs (and/or 4D index tensors), coord_out/flat_in are left partially unset and flat_in stays 0, so most outputs will incorrectly read the first element. Add explicit handling for dims==4 (both coordinate decomposition and flat index computation, including cstep/d strides).

Comment on lines +54 to +60
type_to = 5;
else if (dtype == "torch.int32" || dtype == "torch.int")
type_to = 6;

op->params["0"] = 0; // auto-detect input type
op->params["1"] = type_to;
}

Copilot AI Apr 16, 2026


Tensor.to is lowered to Cast with type_from=0 (“auto”), but Cast::forward() currently has no logic to infer the source type when type_from==0, so none of the conversion branches will run and the output buffer will be left uninitialized. Either implement type_from==0 auto-detection in Cast or set an explicit type_from here based on the actual input dtype tracked by pnnx.

Comment on lines +71 to +78
int new_axis = axis;
if (axis >= 0)
new_axis = axis > batch_index ? axis - 1 : axis;

op->params["0"] = new_axis;
op->params["1"] = largest;
op->params["2"] = sorted;
}

Copilot AI Apr 16, 2026


The ncnn TopK layer expects k via param id 3 (or via a second input blob), but this rewrite pass never writes k into op->params. As a result, converted graphs will use the default k=1 regardless of the original torch.topk(..., k=...). Capture and write k (and consider supporting the ONNX-style second input for dynamic K if needed).

Comment thread tools/pnnx/src/ir.cpp
Comment on lines +1497 to +1502
fprintf(pyfp, " def __init__(self, axis=1, largest=1, sorted=1):\n");
fprintf(pyfp, " super(TopK, self).__init__()\n");
fprintf(pyfp, " self.axis = axis\n");
fprintf(pyfp, " self.largest = largest\n");
fprintf(pyfp, " self.sorted = sorted\n");
fprintf(pyfp, " def forward(self, x, k):\n");

Copilot AI Apr 16, 2026


The generated Python TopK module takes forward(self, x, k), but the pnnx->ncnn lowering for torch.topk in this PR models k as a captured parameter (not an input tensor). If the resulting TopK operator only has one input, the emitted self.<name>(v_input) call will raise at runtime due to the missing k argument. Align the Python stub with how k is represented (store k as a module attribute and accept only x, or ensure the operator keeps a second k input).

Suggested change
fprintf(pyfp, " def __init__(self, axis=1, largest=1, sorted=1):\n");
fprintf(pyfp, " super(TopK, self).__init__()\n");
fprintf(pyfp, " self.axis = axis\n");
fprintf(pyfp, " self.largest = largest\n");
fprintf(pyfp, " self.sorted = sorted\n");
fprintf(pyfp, " def forward(self, x, k):\n");
fprintf(pyfp, " def __init__(self, axis=1, largest=1, sorted=1, k=None):\n");
fprintf(pyfp, " super(TopK, self).__init__()\n");
fprintf(pyfp, " self.axis = axis\n");
fprintf(pyfp, " self.largest = largest\n");
fprintf(pyfp, " self.sorted = sorted\n");
fprintf(pyfp, " self.k = k\n");
fprintf(pyfp, " def forward(self, x, k=None):\n");
fprintf(pyfp, " k = self.k if k is None else k\n");
fprintf(pyfp, " if k is None:\n");
fprintf(pyfp, " raise TypeError('TopK.forward() missing required argument: k')\n");

Comment thread src/layer/gather.cpp

// Allocate output (same dtype as input, shape matches index)
Mat& top_blob = top_blobs[0];
top_blob.create(out_shape.w, out_shape.h, out_shape.c, input_blob.elemsize, input_blob.elempack, opt.blob_allocator);

Copilot AI Apr 16, 2026


top_blob.create(out_shape.w, out_shape.h, out_shape.c, ...) always creates a 3D output, even when index_blob is 1D/2D/4D. That changes the output rank and breaks the coordinate decomposition logic below (which depends on top_blob.dims). Allocate top_blob using the correct create() overload based on out_shape.dims so the output shape truly matches the index tensor.

Suggested change
top_blob.create(out_shape.w, out_shape.h, out_shape.c, input_blob.elemsize, input_blob.elempack, opt.blob_allocator);
if (out_shape.dims == 1)
top_blob.create(out_shape.w, input_blob.elemsize, input_blob.elempack, opt.blob_allocator);
else if (out_shape.dims == 2)
top_blob.create(out_shape.w, out_shape.h, input_blob.elemsize, input_blob.elempack, opt.blob_allocator);
else if (out_shape.dims == 3)
top_blob.create(out_shape.w, out_shape.h, out_shape.c, input_blob.elemsize, input_blob.elempack, opt.blob_allocator);
else if (out_shape.dims == 4)
top_blob.create(out_shape.w, out_shape.h, out_shape.d, out_shape.c, input_blob.elemsize, input_blob.elempack, opt.blob_allocator);
else
return -1;

else if (dim_p.type == 5 && !dim_p.ai.empty())
axis = dim_p.ai[0];
}


Copilot AI Apr 16, 2026


This lowering writes axis directly from dim, but other ncnn rewrite passes drop/renumber the batch axis using __batch_index (see e.g. torch_flip/TopK). Without the same adjustment here, Gather will target the wrong dimension after batch removal. Apply the same batch-axis handling (reject gather along batch, and decrement axes > batch_index).

Suggested change
if (op->params.find("__batch_index") != op->params.end())
{
const Parameter& batch_index_p = op->params.at("__batch_index");
if (batch_index_p.type == 2)
{
const int batch_index = batch_index_p.i;
// ncnn lowers tensor ranks with the batch axis removed.
// gather along the removed batch axis cannot be represented here.
if (axis == batch_index)
return;
if (axis > batch_index)
axis -= 1;
}
}

vlordier added a commit to vlordier/ncnn that referenced this pull request Apr 16, 2026
- ir.cpp: store k in TopK.__init__, use forward(self, x) — k was a ctor
  param but forward() expected it as an input arg, causing runtime error
- ir.cpp: pass k= in TopK instantiation (k_val from params["3"])
- gather.cpp: reject non-float32 data (elemsize != 4) and dims > 3 explicitly
- pnnx/src/CMakeLists: replace invalid set_property(INCLUDE_DIRECTORIES_BEFORE)
  with include_directories(BEFORE ...) to correctly force protobuf header order
- pnnx/tests/onnx: add test_torch_gather.py roundtrip test (1D/2D/3D,
  multiple axes, negative axis) and register it in CMakeLists
vlordier (Author)

Superseded by #6669 which consolidates all operators (TopK, Gather, GatherElements, Expand, Tile, Mod) with all review issues addressed.

vlordier closed this Apr 16, 2026


Development

Successfully merging this pull request may close these issues.

Requesting support for TopK op

4 participants