
feat(ascend-framework): framework scaffolding + CI/generator fixes for operator split #64

Merged
voltjia merged 16 commits into master from feat/ascend-framework-pr
Apr 21, 2026

Conversation

@zhangyue207
Collaborator

@zhangyue207 zhangyue207 commented Apr 18, 2026

Summary

Prerequisite PR for the Ascend operator split. Ships the shared framework,
build, and test infrastructure required by the three follow-up category
PRs, plus three independent bug fixes surfaced during the split.

No operator kernels are added in this PR — each category PR (op-simple /
op-norm-rope / op-cache-attn) ships its operators atomically together with
their src/base/<op>.h declarations and tests/test_<op>.py.

Contents

Ascend framework headers (shared across every Ascend operator):

  • src/ascend/common.h — AclTensorCache + toAclDtype helpers
  • src/ascend/workspace_pool_.h — stream-scoped workspace pool with named
    arenas; entry points are GetWorkspacePool() / Pool::Ensure()
    (matches master PR #60, "refactor: group backends by hardware category")
  • src/ascend/atb_common_.h — ATB Context management + toAtbTensor for
    operators wrapping ATB APIs
  • src/ascend/runtime_.h, data_type_.h, device_.h — Ascend-specific
    Runtime and type-mapping specializations

Custom-kernel build infra: src/ascend/custom_kernel/ — AscendC
standalone build system producing libascend_kernel.so. Gated by its own
CMakeLists.txt; the actual kernels (rms_norm, add_rms_norm) are
consumed by kernel_custom.h variants that land in the follow-up norm-rope
PR.

Core framework (shared helpers referenced by the generator):

  • src/hash.h — cache-key plumbing for Operator::call()
  • src/operator.h — Operator<Key, device, N> dispatch + SFINAE
    ActiveImplementationsImpl
  • src/pybind11_utils.h — tensor / optional-tensor / vector-tensor casters

Build (CMakeLists.txt, src/CMakeLists.txt) — Ascend target, ATB
discovery, WITH_ASCEND option; no per-op add_subdirectory needed
because the Ascend source list is glob-based (ascend/*.cc *.cpp).

Test infra (tests/conftest.py, tests/utils.py) — Payload,
randn_strided, get_npu_stream, get_stream(device), and device
parametrization (--devices ascend/nvidia/...).

CI (.ci/run.py, .ci/images/ascend/Dockerfile) — unchanged apart
from the fixes below.

Three bug fixes (bundled with related files)

fix(scripts): align py::arg order with C++ lambda params

The pybind11 bindings emitted by scripts/generate_wrappers.py listed
py::arg entries in a different order than the C++ lambda parameters.
When callers used kwargs, implementation_index and stream were
silently swapped — the stream integer landed in the impl-index slot, and
dispatch SIGABRT'd. Re-order so kwarg names line up positionally with the
C++ signature. Also generalize = py::none() default emission to all
std::optional<...> parameters (previously only std::optional<Tensor>).
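The invariant behind both generator fixes can be sketched in a few lines. This is a hypothetical emitter, not the real code in scripts/generate_wrappers.py: py::arg entries are produced in the same order as the C++ lambda parameters, and the = py::none() default is applied to every std::optional<...> parameter, not only std::optional<Tensor>.

```python
# Hypothetical sketch of the generator-side fix. Helper and parameter
# names are illustrative; the real emitter lives in
# scripts/generate_wrappers.py.

def emit_py_args(params):
    """params: list of (name, cpp_type) in C++ lambda declaration order.

    Emitting py::arg entries in declaration order keeps kwargs aligned
    with the positional C++ signature.
    """
    args = []
    for name, cpp_type in params:
        arg = f'py::arg("{name}")'
        # Generalized default: every std::optional<...> defaults to None,
        # not just std::optional<Tensor>.
        if cpp_type.startswith("std::optional<"):
            arg += " = py::none()"
        args.append(arg)

    return ", ".join(args)

params = [
    ("input", "Tensor"),
    ("stream", "std::optional<int64_t>"),
    ("implementation_index", "std::optional<int64_t>"),
]
print(emit_py_args(params))
```

With the bug, "stream" and "implementation_index" were emitted in the opposite order from the lambda, so a kwarg call bound the stream pointer to the impl-index slot.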

fix(ci): treat exit 137 as success when pytest junit XML is clean

Docker 18.09 occasionally SIGKILLs the container during its chown
teardown step, so .ci/run.py exits with code 137 even when pytest itself
completed normally. Parse /workspace/results/test-results.xml errors /
failures fields and treat 137 as success when pytest reports no
failures.
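A minimal stdlib-only sketch of this guard (the real logic lives in .ci/run.py; attribute names follow the standard pytest junit schema):

```python
# Sketch of the exit-137 guard: trust the junit XML when docker's
# teardown SIGKILLs the container after pytest already finished.
import xml.etree.ElementTree as ET

def pytest_run_clean(junit_xml_text):
    """True iff the junit report shows zero errors and zero failures."""
    root = ET.fromstring(junit_xml_text)
    # pytest emits either <testsuite> at the root or wraps suites in
    # a <testsuites> element depending on version.
    suites = [root] if root.tag == "testsuite" else root.findall("testsuite")
    return all(
        int(s.get("errors", 0)) == 0 and int(s.get("failures", 0)) == 0
        for s in suites
    )

def effective_exit_code(container_exit, junit_xml_text):
    # 137 == 128 + SIGKILL: docker may kill the container during its
    # chown teardown; fall back to the junit verdict in that case only.
    if container_exit == 137 and pytest_run_clean(junit_xml_text):
        return 0

    return container_exit
```

Any other nonzero exit code, or a 137 with reported failures, still propagates unchanged.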

test(conftest): auto-skip tests whose op has no impl on the target device

Adds a skip_op_without_platform_impl autouse fixture that derives the
InfiniOps class name from the test module filename (tests/test_<snake>.py
<Snake>) and checks active_implementation_indices for the
parametrized device. When the op has no backend specialization, the test
is skipped instead of SIGABRTing through Operator<Op, device>::Make().

This is essential for the operator split: each per-category branch
contains only its category's Ascend impls but inherits test files for all
operators from master. Without this guard, pytest tests/ --devices ascend crashes on ops lacking ascend impls on the branch.
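The core of the fixture is the filename-to-class-name derivation; the pytest wiring around it is sketched in comments (infiniops and active_implementation_indices are taken from the PR description, the helper name is hypothetical):

```python
# Hedged sketch of the auto-skip fixture's core logic.
import os

def op_class_name(module_path):
    """Derive the InfiniOps class name from a test module path,
    e.g. tests/test_add_rms_norm.py -> 'AddRmsNorm'."""
    stem = os.path.basename(module_path)
    stem = stem.removesuffix(".py").removeprefix("test_")
    return "".join(part.capitalize() for part in stem.split("_"))

# Inside conftest.py this would be wired up roughly as:
#
# @pytest.fixture(autouse=True)
# def skip_op_without_platform_impl(request, device):
#     op = getattr(infiniops, op_class_name(request.module.__file__), None)
#     if op is not None and not op.active_implementation_indices(device):
#         pytest.skip(f"no {device} implementation for this op")
```

Skipping up front avoids the SIGABRT that Operator<Op, device>::Make() would otherwise raise for an op with no backend specialization.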

Test-file updates (for master-existing ops)

tests/test_add.py, test_gemm.py, test_causal_softmax.py,
test_rms_norm.py, test_swiglu.py updated to match the framework PR's
tests/utils.py helpers and to guard Ascend dispatch against unsupported
implementation_index values (fixes a pre-existing master bug where
Operator<Gemm, kAscend, 2> was mistakenly reported active by SFINAE
because src/torch/gemm/gemm.h specializes Operator<Gemm, kDev, 2> for
all kDev).

Verification

  • python3 .ci/run.py --local --gpu-id <N> — framework-only build
    succeeds; libinfiniops.so links cleanly (Ascend glob finds only the
    src/ascend/gemm/kernel.h that already exists on master; generator
    emits bindings for master's 10 base headers).
  • tests/test_gemm.py — 1500 skips expected (Ascend-only
    parametrization; only impl=0 runs, impl=1/2 skip via the
    active_implementation_indices guard introduced in this PR).

Merge order

This PR must merge first. The three follow-up category PRs rebase
onto master after this lands.

Test plan

  • python3 .ci/run.py --local — framework build + master's gemm
    tests pass on Ascend 910B / CANN 8.5.1
  • scripts/generate_wrappers.py --devices ascend — generates
    generated/bindings/ops.cc with the 10 master base headers; no
    link error
  • clang-format passes locally on all tracked *.h / *.cc /
    *.cuh / *.mlu
  • CUDA / Metax / Cambricon / Moore / Iluvatar regressions (CI-verified
    — this PR does not alter those backends' operator paths)

zhangyue added 5 commits April 18, 2026 05:09
Docker 18.09 occasionally SIGKILLs the container during its `chown`
teardown step, causing `.ci/run.py` to exit 137 even when pytest
completed normally. Parse `/workspace/results/test-results.xml` for
`errors` / `failures` fields and treat 137 as success when pytest
reports no failures.

Also bundles a small Dockerfile update for the Ascend image used by
`.ci/run.py`.
… defaults

Two fixes in the pybind11 bindings generator:

1. `py::arg("implementation_index")` was emitted before `py::arg("stream")`
   in the generated `def(...)` call, but the C++ lambda parameters were
   declared in the opposite order. Kwargs then silently swapped — the
   stream integer landed in the impl-index slot, and dispatch SIGABRT'd.
   Re-order so `py::arg` entries are positional-consistent with the C++
   lambda signature.

2. Only `std::optional<Tensor>` parameters had a `= py::none()` default;
   `std::optional<int64_t>` (and other scalar optionals) had no default,
   forcing callers to pass them explicitly. Generalize the default
   emission to all `std::optional<...>` parameters.
Framework headers shared across all Ascend operators:
- `common.h`: `AclTensorCache` descriptor-caching + `toAclDtype` helpers
- `workspace_pool_.h`: stream-scoped `WorkspacePool` with named arenas;
  `GetWorkspacePool()` / `Pool::Ensure()` entry points (matches master
  PR #60 naming)
- `atb_common_.h`: ATB `Context` management + `toAtbTensor` helper for
  operators wrapping ATB APIs
- `data_type_.h`, `device_.h`: `TypeMap<Ascend, T>` + `Runtime` specialization
- `runtime_.h` is the existing file; left untouched by this PR

`custom_kernel/` ships the AscendC standalone build system for custom
kernels. Gated by its own `CMakeLists.txt`; produces
`libascend_kernel.so` consumed by `kernel_custom.h` op variants (landed
in follow-up category PRs).
Shared changes needed by every Ascend operator PR:

- `src/hash.h` + `src/operator.h`: cache-key plumbing used by
  `Operator<Op, device>` dispatch
- `src/pybind11_utils.h`: tensor / optional-tensor / vector-tensor
  pybind11 casters used by the generator output
- `CMakeLists.txt` + `src/CMakeLists.txt`: Ascend build target, atb
  discovery, `WITH_ASCEND` option
- `tests/conftest.py`: `auto_act_and_assert` fixture + device
  parametrization (`--devices ascend/nvidia/...`)
- `tests/utils.py`: `Payload`, `randn_strided`, `get_npu_stream`, and
  similar test helpers shared by every `tests/test_<op>.py`
…vice

Adds a `skip_op_without_platform_impl` autouse fixture that derives the
InfiniOps class name from the test module filename
(`tests/test_<snake>.py` → `<Snake>`) and checks
`active_implementation_indices` for the parametrized device. When the
op has no backend specialization on the current branch, the test is
skipped instead of SIGABRTing through `Operator<Op, device>::Make()`.

This is essential for the operator split: each per-category branch
contains only its category's Ascend impls but inherits test files for
all operators from master. Without this guard, `pytest tests/
--devices ascend` crashes on ops lacking ascend impls on the branch.
@zhangyue207 zhangyue207 changed the title from "framework scaffolding + CI/generator fixes for operator split" to "feat(ascend-framework): framework scaffolding + CI/generator fixes for operator split" on Apr 18, 2026
@zhangyue207
Collaborator Author

merge test:

[gw0] [ 99%] PASSED tests/test_swiglu.py::test_swiglu[npu-dtype2-0.01-0.005-1-shape3-input_strides3-gate_strides3-out_strides3] 
tests/test_swiglu.py::test_swiglu[npu-dtype2-0.01-0.005-1-shape4-None-None-None] 
[gw0] [ 99%] PASSED tests/test_swiglu.py::test_swiglu[npu-dtype2-0.01-0.005-1-shape4-None-None-None] 
tests/test_swiglu.py::test_swiglu[npu-dtype2-0.01-0.005-1-shape5-input_strides5-gate_strides5-out_strides5] 
[gw0] [ 99%] PASSED tests/test_swiglu.py::test_swiglu[npu-dtype2-0.01-0.005-1-shape5-input_strides5-gate_strides5-out_strides5] 
tests/test_swiglu.py::test_swiglu[npu-dtype2-0.01-0.005-1-shape6-None-None-None] 
[gw0] [ 99%] PASSED tests/test_swiglu.py::test_swiglu[npu-dtype2-0.01-0.005-1-shape6-None-None-None] 
tests/test_swiglu.py::test_swiglu[npu-dtype2-0.01-0.005-1-shape7-input_strides7-gate_strides7-out_strides7] 
[gw0] [100%] PASSED tests/test_swiglu.py::test_swiglu[npu-dtype2-0.01-0.005-1-shape7-input_strides7-gate_strides7-out_strides7] 

----------- generated xml file: /workspace/results/test-results.xml ------------
===================== 3767 passed, 1664 skipped in 45.95s ======================
========== Summary ==========
[warn] job ascend_npu: container exited with 137 (likely docker teardown SIGKILL after clean pytest); junit XML reports no failures — treating as success
EXIT=0

Collaborator

@voltjia voltjia left a comment


The changes are quite large, so I could only review up to this point. Many of the issues are probably shared, so have the AI treat every one of them seriously and make sure all of the changes are clean. Beyond that, there are some architectural concerns: under `src` I see both a `custom_kernel` and a `csrc` folder, as well as both `test` and `tests`, plus some `.gitignore` files and what look like plan documents from the AI development process; these should not be checked in. The extra CMake files and `*.sh` scripts introduced alongside them should also be removed, or consolidated into the project-level CMake and `scripts`. For some operators I see `op_host` and `op_kernel` folders; first confirm whether that split is really necessary. If it is, note that the other platforms just change the file suffix (e.g. `.h` vs `.cuh`). I'm not sure Ascend has a corresponding suffix; if not, a `kernel.h` / `kernel_.h` or `host.h` / `device.h` scheme would also work. The main point is to avoid over-splitting files. This review pass may not be exhaustive given how much there is; let's make a round of changes along these lines first, and we may need another pass afterward. Mainly, have the AI do a full sweep: each comment probably applies in more than one place. Thanks, Yue, much appreciated!

Comment thread scripts/generate_wrappers.py Outdated
Comment thread scripts/generate_wrappers.py
Comment thread scripts/generate_wrappers.py
Comment thread scripts/generate_wrappers.py
Comment thread scripts/generate_wrappers.py
Comment thread src/ascend/custom/add_rms_norm/op_host/add_rms_norm.cpp Outdated
Comment thread src/ascend/custom/add_rms_norm/op_host/add_rms_norm.cpp Outdated
Comment thread src/ascend/custom/add_rms_norm/op_host/add_rms_norm.cpp Outdated
Comment thread src/ascend/custom/add_rms_norm/op_host/add_rms_norm.cpp Outdated
Comment thread src/ascend/custom_kernel/.gitignore Outdated
@zhangyue207
Collaborator Author

The changes are quite large, so I could only review up to this point. Many of the issues are probably shared, so have the AI treat every one of them seriously and make sure all of the changes are clean. Beyond that, there are some architectural concerns: under `src` I see both a `custom_kernel` and a `csrc` folder, as well as both `test` and `tests`, plus some `.gitignore` files and what look like plan documents from the AI development process; these should not be checked in. The extra CMake files and `*.sh` scripts introduced alongside them should also be removed, or consolidated into the project-level CMake and `scripts`. For some operators I see `op_host` and `op_kernel` folders; first confirm whether that split is really necessary. If it is, note that the other platforms just change the file suffix (e.g. `.h` vs `.cuh`). I'm not sure Ascend has a corresponding suffix; if not, a `kernel.h` / `kernel_.h` or `host.h` / `device.h` scheme would also work. The main point is to avoid over-splitting files. This review pass may not be exhaustive given how much there is; let's make a round of changes along these lines first, and we may need another pass afterward. Mainly, have the AI do a full sweep: each comment probably applies in more than one place. Thanks, Yue, much appreciated!

This is the fixed template provided for Ascend kernel development. There isn't much public documentation we could use to change its overall structure; what I can do is delete the unused plan-style documents and the like, but the overall structure itself can't be changed.

zhangyue added 8 commits April 20, 2026 17:28
… files

Remove content that duplicates what the pytest integration tests
(`tests/test_rms_norm.py`, `tests/test_add_rms_norm.py`) already
cover, or that's developer scratchpad rather than checked-in
artifact:

- `csrc/ops/rms_norm/{README,design}.md` — design scratch
- `csrc/ops/rms_norm/test/{benchmark_rms_norm_msprof,run_rms_norm_case}.py`,
  `rms_norm_cases.jsonl`, `rms_norm_perf_report.md`,
  `rms_norm-test-cases.md` — per-op perf benchmarking + reports
- `tests/test_{rms_norm,add_rms_norm}.py` under custom_kernel/ —
  redundant with the top-level pytest integration tests

Build infra, kernel sources, registration, and utility headers are
unchanged; the `libascend_kernel.so` artifact and its consumers
(`kernel_custom.h` variants in the op-norm-rope PR) are unaffected.
…dundant .gitignore entry

Review items 1-5 on `scripts/generate_wrappers.py`:
- Restore docstring quoting in `_find_optional_tensor_params` (reverts
  accidental change to ```int`` and the double-space).
- Restore blank lines before `return` in `_find_optional_tensor_params`,
  `_is_optional_tensor`, and `_generate_params` / `_generate_arguments`
  (project CLAUDE.md Python style: "blank line before `return` unless
  inside a block body").
- Add missing blank line before `return` in `_find_vector_tensor_params`
  and `_is_vector_tensor`.
- Drop redundant `import re` inside `_find_vector_tensor_params` — `re`
  is imported at module level.

Review item 10 on `src/ascend/custom_kernel/.gitignore`:
- Drop redundant `build/` entry (already ignored globally via the
  project-root `.gitignore`). Keep `output/` and `python/` — both are
  AscendC-specific build artifacts not covered by the root ignore.
…tch vllm-ascend/csrc layout

Reviewer top-level feedback on PR #64: mirror the directory layout of
https://github.com/vllm-project/vllm-ascend/tree/main/csrc and drop the
extra nesting layers.

Directory changes:
- `src/ascend/custom_kernel/` → `src/ascend/custom/`
- Merge `csrc/` into the top: move `csrc/register.cpp`,
  `csrc/ops.h`, `csrc/utils/` up one level.
- Rename `register.cpp` → `torch_binding.cpp` to match vllm-ascend naming.
- Promote `csrc/ops/<op>/` to `<op>/` at the top (drop the `ops/` layer).
- Merge `csrc/CMakeLists.txt` content into top-level `CMakeLists.txt`;
  delete the now-empty `csrc/` layer.
- Remove `src/ascend/custom_kernel/.gitignore` (root `.gitignore`
  already ignores `build/`; `output/`+`python/` were custom_kernel-scoped
  build artifacts that fit the root gitignore's scope too).

Resulting layout:
  custom/
  ├── build.sh
  ├── CMakeLists.txt
  ├── cmake/{config_ascend,config_envs}.cmake
  ├── ops.h
  ├── torch_binding.cpp           (was `register.cpp`)
  ├── utils/torch_kernel_helper.h
  ├── rms_norm/{op_host,op_kernel}/rms_norm.cpp
  └── add_rms_norm/{op_host,op_kernel}/add_rms_norm.cpp

License preservation: files shared in structure/substance with
vllm-ascend (`torch_binding.cpp`, `ops.h`, `utils/torch_kernel_helper.h`,
top-level `CMakeLists.txt`) now carry proper Apache License 2.0 headers
with the original Huawei Technologies copyright preserved alongside
InfiniTensor's modification copyright.

Callers:
- `src/CMakeLists.txt`: `custom_kernel` → `custom` in two references.
- Root `CMakeLists.txt`: updated inline comment pointing to the build
  script.
- Library name (`ascend_kernel`), static lib (`no_workspace_kernel`),
  and Python module name remain unchanged — `kernel_custom.h` consumers
  in the op-norm-rope PR link via those identifiers, not by path, so
  this rename does not ripple into that branch.

CI: `.ci/run.py --local --gpu-id 0` passes 3072/1782 on Ascend 910B
with `BUILD_CUSTOM_KERNEL=OFF` (default); the custom kernel build
itself is exercised by the op-norm-rope PR's `kernel_custom.h`
integration.
Scan-and-fix pass for patterns flagged in reviewer comments on
`custom_kernel/` that also appear in other files in this PR.

- `src/ascend/common.h`: wrap `aclTensor` in backticks in two comments
  (matches comment 9 on Markdown formatting in custom_kernel).
- `tests/utils.py`: add missing blank line before trailing `return` in
  `get_stream()` (matches comments 3/5 on missing blank line before
  return in non-block-body context).

No camelCase-local violations in the framework C++ headers
(atb_common_, common, data_type_, device_, workspace_pool_, hash,
operator, pybind11_utils) — reviewer comment 6 was specific to
`custom/` op_host code adapted from vllm-ascend.
Reviewer @voltjia on PR #64 inline comments:
- Comment 6: local variables must follow Google C++ Style Guide
  (`dimLength` → `dim_length`, etc.). Applied across all locals in the
  two op_host files.
- Comment 7: namespace `ascend_kernel` is non-standard; use `detail` or
  `ascend::detail` to match other platforms. Renamed to
  `ascend::detail` in `ops.h`, `torch_binding.cpp`,
  `utils/torch_kernel_helper.h`, and both `op_host/*.cpp` files.

The library name (`ascend_kernel` → `libascend_kernel.so`), `OP_PLUGIN_NAME`,
and Python-import name are unchanged — those are compile/link identity
and are independent of the C++ namespace.  `kernel_custom.h` in
op-norm-rope links via the C `extern` launch symbol, not the namespace,
so this rename does not ripple into that branch.

Also took the opportunity to backtick-wrap identifiers in comments that
the rename touched.

Inline comments 8 and 9 (Markdown formatting in comments) were already
covered by the backtick pass in commit 0aed3a5 for non-custom files; the
custom/ comments here also get normalized as a side-effect of rewriting
the affected lines.
Scanned ALL 30 inline comments on PR #64 (not just the 10 visible in
collapsed view). 22 had been missed by the earlier passes.

Generator (scripts/generate_wrappers.py):
- Comments 8-10: swap `stream` and `implementation_index` in both the
  pybind lambda parameters and the `py::arg` declarations, to match
  the `Operator::Call(Handle, Config, ...)` order (Handle first, Config
  second). Previously ordered impl_index first for lambda-signature
  alignment; with the swap, both are reordered together so kwargs
  still resolve correctly.
- Comment 11: restore backticks around device names in `--devices` help
  text.
- Comment 12: `.def_static("clear_cache", ...)` kept — it is the API
  used by the new `_clear_operator_caches` pytest fixture.

CMakeLists.txt:
- Comments 13-14: wrap `NEEDED` and `torch_npu` in Markdown backticks in
  comments.

tests/conftest.py (comments 23-29):
- Reset the file to master's content and re-apply only the two new
  fixtures (`_clear_operator_caches`, `skip_op_without_platform_impl`)
  with Markdown docstrings (single backticks, not rST double). Reverts
  incidental changes to `pytest_addoption` help text,
  `skip_unsupported_dtypes` rename, `_PLATFORM_TO_TORCH_DEVICE` dict
  order, `_resolve_device` docstring, and the `torch_npu` comment
  line-wrap.
- Fix comment 27's concern: `_TORCH_DEVICE_TO_PLATFORMS` now maps one
  torch device type to multiple platforms (`cuda` →
  `{nvidia, metax, iluvatar}`) and `skip_op_without_platform_impl`
  checks `active_implementation_indices` across all of them; it skips
  only when every mapped platform reports empty.
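The one-to-many mapping and the skip condition can be sketched as follows (platform names and the mapping come from the PR description; the helper name is hypothetical):

```python
# Illustrative sketch of the comment-27 fix: one torch device type can
# correspond to several InfiniOps platforms, and the test is skipped
# only when every mapped platform reports no active implementations.
_TORCH_DEVICE_TO_PLATFORMS = {
    "cuda": {"nvidia", "metax", "iluvatar"},
    "npu": {"ascend"},
    "cpu": {"cpu"},
}

def should_skip(torch_device_type, active_indices_by_platform):
    """active_indices_by_platform: platform -> list of active impl indices."""
    platforms = _TORCH_DEVICE_TO_PLATFORMS.get(torch_device_type, set())
    # Skip only if *all* mapped platforms have an empty index list.
    return all(
        not active_indices_by_platform.get(p, []) for p in platforms
    )
```

So a cuda-device test stays enabled as long as any of nvidia / metax / iluvatar reports at least one active implementation.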

tests/utils.py:
- Comment 16: remove `get_npu_stream`; `get_stream(device)` covers all
  torch device types.

tests/test_{add,causal_softmax,gemm,rms_norm,swiglu}.py:
- Comments 17-22: replace the `if device.type == "npu"` branches with a
  single call that passes `stream=get_stream(<tensor>.device)`. Single-
  line import restored in `test_add.py` (comment 22 — format minimization
  after dropping the `get_npu_stream` import).

test_gemm.py specifically: moved the "impl=2 on Ascend is broken because
of `src/torch/gemm/gemm.h` SFINAE pollution" workaround from the
helper-level conditional into a `pytest.skip` at the top of the test
body, so the helper itself becomes unconditional.
- `src/ascend/custom/utils/torch_kernel_helper.h`: clang-format wrapped
  a long `ConvertTypes` macro continuation.
- `tests/test_add.py`: ruff `format` wrapped the 5-import `tests.utils`
  line (89 chars, over the default 88 limit) back into multi-line
  form. Reviewer comment 22 suggested restoring a single line after
  dropping `get_npu_stream`, but with `get_stream` added the shortened
  form still exceeds the ruff line-length cap.
…kticks

Scan-and-fix pass for identifiers in comments that still lack Markdown
backticks, matching reviewer comments 9, 11, 13, 14 on PR #64. Applied
only to files authored / modified by this PR (leaves custom/cmake/
config_envs.cmake and similar vllm-ascend-verbatim content untouched to
stay consistent with the upstream it was adapted from).

- `CMakeLists.txt`: `pybind11` (line 7).
- `src/ascend/common.h`: `shape`, `strides`, `storage_shape`, `dtype`
  in the `AclTensorCache` class doc.
- `src/ascend/custom/CMakeLists.txt`: `AscendC` toolchain reference.
- `src/ascend/custom/build.sh`: `AscendC`, `libascend_kernel.so`.
- `src/ascend/custom/cmake/config_ascend.cmake`: `SOC_VERSION`,
  `CANN`, `AscendC`.
@zhangyue207
Collaborator Author

zhangyue207 commented Apr 20, 2026

ascend:

tests/test_swiglu.py::test_swiglu[npu-dtype2-0.01-0.005-1-shape6-None-None-None] 
[gw0] [ 99%] SKIPPED tests/test_swiglu.py::test_swiglu[npu-dtype2-0.01-0.005-1-shape6-None-None-None] 
tests/test_swiglu.py::test_swiglu[npu-dtype2-0.01-0.005-1-shape7-input_strides7-gate_strides7-out_strides7] 
[gw0] [100%] SKIPPED tests/test_swiglu.py::test_swiglu[npu-dtype2-0.01-0.005-1-shape7-input_strides7-gate_strides7-out_strides7] 

----------- generated xml file: /workspace/results/test-results.xml ------------
===================== 1572 passed, 3282 skipped in 37.82s ======================
========== Summary ==========

moore

Requirement already satisfied: sympy==1.13.1 in /usr/local/lib/python3.10/dist-packages (from torch->InfiniOps==0.1.0) (1.13.1)
Requirement already satisfied: mpmath<1.4,>=1.1.0 in /usr/local/lib/python3.10/dist-packages (from sympy==1.13.1->torch->InfiniOps==0.1.0) (1.3.0)
Requirement already satisfied: MarkupSafe>=2.0 in /usr/local/lib/python3.10/dist-packages (from jinja2->torch->InfiniOps==0.1.0) (3.0.2)
Building wheels for collected packages: InfiniOps
  Building wheel for InfiniOps (pyproject.toml): started
  Building wheel for InfiniOps (pyproject.toml): still running...
  Building wheel for InfiniOps (pyproject.toml): still running...
  Building wheel for InfiniOps (pyproject.toml): finished with status 'done'
  Created wheel for InfiniOps: filename=infiniops-0.1.0-cp310-cp310-linux_x86_64.whl size=514579 sha256=eb7c8e51edaa954d0b596df06890d030fc0af86e5761ba51a524ed285164b525
  Stored in directory: /tmp/pip-ephem-wheel-cache-4vo9x05h/wheels/ac/4c/a5/78fe3376fbe0f633e8ad47ec3e677a6762cbf147a5e0195bab
Successfully built InfiniOps
Installing collected packages: InfiniOps
Successfully installed InfiniOps-0.1.0
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager, possibly rendering your system unusable. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv. Use the --root-user-action option if you know what you are doing and want to suppress this warning.

[notice] A new release of pip is available: 25.2 -> 26.0.1
[notice] To update, run: python -m pip install --upgrade pip
========== Stage: build ==========
========== Summary ==========

iluvatar

 Requirement already satisfied: MarkupSafe>=2.0 in /usr/local/corex-4.3.0.20250624/lib64/python3/dist-packages (from jinja2->torch->InfiniOps==0.1.0) (3.0.2)
     Requirement already satisfied: mpmath<1.4,>=1.1.0 in /usr/local/corex-4.3.0.20250624/lib64/python3/dist-packages (from sympy->torch->InfiniOps==0.1.0) (1.3.0)
     Building wheels for collected packages: InfiniOps
       Building wheel for InfiniOps (pyproject.toml): started
       Building wheel for InfiniOps (pyproject.toml): finished with status 'done'
       Created wheel for InfiniOps: filename=infiniops-0.1.0-cp310-cp310-linux_x86_64.whl size=329504 sha256=f30dd64e57d7c7e75e4ca27d2e1f086ea83de43225f2a2a19730ee68db54ae58
       Stored in directory: /tmp/pip-ephem-wheel-cache-fov27lij/wheels/ac/4c/a5/78fe3376fbe0f633e8ad47ec3e677a6762cbf147a5e0195bab
     Successfully built InfiniOps
     Installing collected packages: InfiniOps
     Successfully installed InfiniOps-0.1.0

nvidia

Requirement already satisfied: nvidia-cuda-cupti==13.0.85.* in /home/zhangyue/python3.10/lib/python3.10/site-packages (from cuda-toolkit[cublas,cudart,cufft,cufile,cupti,curand,cusolver,cusparse,nvjitlink,nvrtc,nvtx]==13.0.2; platform_system == "Linux"->torch->InfiniOps==0.1.0) (13.0.85)
Requirement already satisfied: cuda-pathfinder~=1.1 in /home/zhangyue/python3.10/lib/python3.10/site-packages (from cuda-bindings<14,>=13.0.3->torch->InfiniOps==0.1.0) (1.5.2)
Requirement already satisfied: mpmath<1.4,>=1.1.0 in /home/zhangyue/python3.10/lib/python3.10/site-packages (from sympy>=1.13.3->torch->InfiniOps==0.1.0) (1.3.0)
Requirement already satisfied: MarkupSafe>=2.0 in /home/zhangyue/python3.10/lib/python3.10/site-packages (from jinja2->torch->InfiniOps==0.1.0) (3.0.3)
Building wheels for collected packages: InfiniOps
  Building editable for InfiniOps (pyproject.toml) ... done
  Created wheel for InfiniOps: filename=infiniops-0.1.0-cp310-cp310-linux_x86_64.whl size=409566 sha256=9e3dcc598d9b8cd99f649e8060a257f1d1649d7e58e961fc19b75658a3bd4e91
  Stored in directory: /tmp/pip-ephem-wheel-cache-x_no3w_i/wheels/22/59/d1/fd9a553995db5fba4a2c3a44dd6adb7f24dd94303692b98e00
Successfully built InfiniOps
Installing collected packages: InfiniOps
  Attempting uninstall: InfiniOps
    Found existing installation: InfiniOps 0.1.0
    Uninstalling InfiniOps-0.1.0:
      Successfully uninstalled InfiniOps-0.1.0
Successfully installed InfiniOps-0.1.0

[gw4] [ 99%] PASSED tests/test_gemm.py::test_gemm[cpu-dtype1-0.01-0.01-0-True-True-0-1-a_shape3-b_shape3-c_shape3-a_strides3-b_strides3-c_strides3]
tests/test_gemm.py::test_gemm[cpu-dtype1-0.01-0.01-0-True-True-0-1-a_shape4-b_shape4-c_shape4-None-None-None]
[gw4] [ 99%] PASSED tests/test_gemm.py::test_gemm[cpu-dtype1-0.01-0.01-0-True-True-0-1-a_shape4-b_shape4-c_shape4-None-None-None]
tests/test_gemm.py::test_gemm[cpu-dtype1-0.01-0.01-0-True-True-0.5--1-a_shape0-b_shape0-c_shape0-None-None-None]
[gw4] [ 99%] PASSED tests/test_gemm.py::test_gemm[cpu-dtype1-0.01-0.01-0-True-True-0.5--1-a_shape0-b_shape0-c_shape0-None-None-None]
tests/test_gemm.py::test_gemm[cpu-dtype1-0.01-0.01-0-True-True-0.5--1-a_shape1-b_shape1-c_shape1-None-None-None]
[gw4] [ 99%] PASSED tests/test_gemm.py::test_gemm[cpu-dtype1-0.01-0.01-0-True-True-0.5--1-a_shape1-b_shape1-c_shape1-None-None-None]
tests/test_gemm.py::test_gemm[cpu-dtype1-0.01-0.01-0-True-True-0.5--1-a_shape2-b_shape2-c_shape2-a_strides2-b_strides2-c_strides2]
[gw4] [ 99%] PASSED tests/test_gemm.py::test_gemm[cpu-dtype1-0.01-0.01-0-True-True-0.5--1-a_shape2-b_shape2-c_shape2-a_strides2-b_strides2-c_strides2]
tests/test_gemm.py::test_gemm[cpu-dtype1-0.01-0.01-0-True-True-0.5--1-a_shape3-b_shape3-c_shape3-a_strides3-b_strides3-c_strides3]
[gw4] [ 99%] PASSED tests/test_gemm.py::test_gemm[cpu-dtype1-0.01-0.01-0-True-True-0.5--1-a_shape3-b_shape3-c_shape3-a_strides3-b_strides3-c_strides3]
tests/test_gemm.py::test_gemm[cpu-dtype1-0.01-0.01-0-True-True-0.5--1-a_shape4-b_shape4-c_shape4-None-None-None]
[gw4] [ 99%] PASSED tests/test_gemm.py::test_gemm[cpu-dtype1-0.01-0.01-0-True-True-0.5--1-a_shape4-b_shape4-c_shape4-None-None-None]
tests/test_gemm.py::test_gemm[cpu-dtype1-0.01-0.01-0-True-True-0.5--0.5-a_shape0-b_shape0-c_shape0-None-None-None]
[gw4] [100%] PASSED tests/test_gemm.py::test_gemm[cpu-dtype1-0.01-0.01-0-True-True-0.5--0.5-a_shape0-b_shape0-c_shape0-None-None-None]

----------- generated xml file: /workspace/results/test-results.xml ------------
================ 6016 passed, 3692 skipped in 115.22s (0:01:55) ================
========== Summary ==========

metax

zhangyue@test:~/InfiniOps$ python3 .ci/run.py  --local --stage build
platform: metax
==> running job: metax_gpu
========== Setup ==========
Building wheels for collected packages: InfiniOps
  Building wheel for InfiniOps (pyproject.toml): started
  Building wheel for InfiniOps (pyproject.toml): still running...
  Building wheel for InfiniOps (pyproject.toml): still running...
  Building wheel for InfiniOps (pyproject.toml): finished with status 'done'
  Created wheel for InfiniOps: filename=infiniops-0.1.0-cp310-cp310-linux_x86_64.whl size=791843 sha256=bdb6d3d06fa0172f91d6ac8199633a648b002b88faf9ff9d50e91956e940a623
  Stored in directory: /tmp/pip-ephem-wheel-cache-iatimiaj/wheels/ac/4c/a5/78fe3376fbe0f633e8ad47ec3e677a6762cbf147a5e0195bab
Successfully built InfiniOps
Installing collected packages: InfiniOps
Successfully installed InfiniOps-0.1.0
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager, possibly rendering your system unusable.It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv. Use the --root-user-action option if you know what you are doing and want to suppress this warning.
========== Stage: build ==========
========== Summary ==========

cambricon

Building wheels for collected packages: InfiniOps
  Building wheel for InfiniOps (pyproject.toml): started
  Building wheel for InfiniOps (pyproject.toml): still running...
  Building wheel for InfiniOps (pyproject.toml): still running...
  Building wheel for InfiniOps (pyproject.toml): finished with status 'done'
  Created wheel for InfiniOps: filename=infiniops-0.1.0-cp310-cp310-linux_aarch64.whl size=225319 sha256=d8d6dd6f0382f7cb107182e832cb5644224c9b9ca60a4cfd68fd5500059c7e64
  Stored in directory: /tmp/pip-ephem-wheel-cache-nf7ouvtx/wheels/ac/4c/a5/78fe3376fbe0f633e8ad47ec3e677a6762cbf147a5e0195bab
Successfully built InfiniOps
Installing collected packages: InfiniOps
Successfully installed InfiniOps-0.1.0
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
WARNING: There was an error checking the latest version of pip.
========== Stage: build ==========
========== Summary ==========

Comment thread src/ascend/custom/cmake/config_envs.cmake Outdated
Comment thread src/ascend/custom/cmake/config_envs.cmake Outdated
Comment thread src/ascend/atb_common_.h Outdated
Comment thread src/ascend/common.h Outdated
Comment thread tests/test_add.py
zhangyue added 2 commits April 20, 2026 21:53
Per Google C++ Style Guide §Function Names: ordinary non-accessor
functions are PascalCase. Accessors/mutators (get/set on class members)
are snake_case. These 7 are standalone helpers / converters /
predicates — not member accessors — so they need PascalCase.

  threadLocalAtbContext → ThreadLocalAtbContext
  getAtbContext          → GetAtbContext
  toAtbTensor (×2)       → ToAtbTensor
  isAclRuntimeAlive      → IsAclRuntimeAlive
  buildAclTensor         → BuildAclTensor
  toAclDtype             → ToAclDtype
  isIntegerDtype         → IsIntegerDtype

CANN APIs (`aclrtGetDevice`, `aclCreateTensor`, …), STL/PyTorch interop
methods (`begin`/`end`/`size`/`data`/…), and class accessors
(`get_<field>`/`set_<field>`) are all kept as-is — they either belong
to another vendor or match the "looks like a variable" exception.

Callers on the three category branches (op-simple / op-norm-rope /
op-cache-attn) will pick up the new names automatically on rebase.
…ticks in common.h

- `src/ascend/custom/cmake/config_envs.cmake`: capitalize + period + Markdown backticks on all comments and status messages.
- `src/ascend/custom/cmake/config_ascend.cmake`: fix `CANN` casing and backticks in the fatal-error message.
- `src/ascend/custom/CMakeLists.txt`: polish status messages and inline comments (Markdown backticks + sentence case).
- `src/ascend/common.h`: restore `Gemm` and `MatMul` backticks in the `BuildAclTensor` docstring per PR #64 review.
zhangyue207 pushed a commit that referenced this pull request Apr 21, 2026
…`TORCH_CHECK`, Google C++ naming

- `workspace_pool_.h`: drop commented-out `<cinttypes>` / `<cstdio>` includes (transitively available).
- `device_.h`: switch relative `../device.h` to absolute `device.h` — the historical `src/ascend/device.h` naming collision is no longer relevant.
- `custom/{add_rms_norm,rms_norm}/op_host/*.cpp`: drop unneeded BSD-3-Clause headers and switch `TORCH_CHECK` messages to Markdown-backticked identifiers.
- `custom/{add_rms_norm,rms_norm}/op_kernel/*.cpp`: drop unneeded BSD-3-Clause headers.
- Rename wrapper functions to PascalCase per Google C++ Style: `add_rms_norm` → `AddRmsNorm`, `rms_norm` → `RmsNorm` (ops.h + torch_binding.cpp updated; `torch.ops.npu.rms_norm` registry name unchanged; kernel entry-point names stay snake_case as required by `EXEC_KERNEL_CMD`).
zhangyue207 pushed a commit that referenced this pull request Apr 21, 2026
…`TORCH_CHECK`, Google C++ naming

- `workspace_pool_.h`: uncomment `<cinttypes>` / `<cstdio>` (needed for `PRIu64` and `fprintf` in the destructor; not transitively available on all platforms).
- `device_.h`: switch relative `../device.h` to absolute `device.h` — the historical `src/ascend/device.h` naming collision is no longer relevant.
- `custom/{add_rms_norm,rms_norm}/op_host/*.cpp`: drop unneeded BSD-3-Clause headers and switch `TORCH_CHECK` messages to Markdown-backticked identifiers.
- `custom/{add_rms_norm,rms_norm}/op_kernel/*.cpp`: drop unneeded BSD-3-Clause headers.
- Rename wrapper functions to PascalCase per Google C++ Style: `add_rms_norm` → `AddRmsNorm`, `rms_norm` → `RmsNorm` (ops.h + torch_binding.cpp updated; `torch.ops.npu.rms_norm` registry name unchanged; kernel entry-point names stay snake_case as required by `EXEC_KERNEL_CMD`).
@zhangyue207 zhangyue207 force-pushed the feat/ascend-framework-pr branch from 4c5960a to cb1d29b Compare April 21, 2026 06:04
@zhangyue207 zhangyue207 force-pushed the feat/ascend-framework-pr branch from cb1d29b to 70542ec Compare April 21, 2026 06:12
@zhangyue207 zhangyue207 force-pushed the feat/ascend-framework-pr branch from 70542ec to 720234d Compare April 21, 2026 06:17
@voltjia voltjia self-requested a review April 21, 2026 06:30
@voltjia voltjia merged commit a05713b into master Apr 21, 2026
4 checks passed
@voltjia voltjia deleted the feat/ascend-framework-pr branch April 21, 2026 06:32