
feat(ascend-framework): framework scaffolding + CI/generator fixes for operator split #64

Merged
voltjia merged 16 commits into master from feat/ascend-framework-pr
Apr 21, 2026

Conversation

@zhangyue207
Collaborator

@zhangyue207 zhangyue207 commented Apr 18, 2026

Summary

Prerequisite PR for the Ascend operator split. Ships the shared framework,
build, and test infrastructure required by the three follow-up category
PRs, plus three independent bug fixes surfaced during the split.

No operator kernels are added in this PR — each category PR (op-simple /
op-norm-rope / op-cache-attn) ships its operators atomically together with
their src/base/<op>.h declarations and tests/test_<op>.py.

Contents

Ascend framework headers (shared across every Ascend operator):

  • src/ascend/common.h — AclTensorCache + toAclDtype helpers
  • src/ascend/workspace_pool_.h — stream-scoped workspace pool with named
    arenas; entry points are GetWorkspacePool() / Pool::Ensure()
    (matches master PR #60, "refactor: group backends by hardware category")
  • src/ascend/atb_common_.h — ATB Context management + toAtbTensor for
    operators wrapping ATB APIs
  • src/ascend/runtime_.h, data_type_.h, device_.h — Ascend-specific
    Runtime and type-mapping specializations

Custom-kernel build infra: src/ascend/custom_kernel/ — AscendC
standalone build system producing libascend_kernel.so. Gated by its own
CMakeLists.txt; the actual kernels (rms_norm, add_rms_norm) are
consumed by kernel_custom.h variants that land in the follow-up norm-rope
PR.

Core framework (shared helpers referenced by the generator):

  • src/hash.h — cache-key plumbing for Operator::call()
  • src/operator.h — Operator<Key, device, N> dispatch + SFINAE
    ActiveImplementationsImpl
  • src/pybind11_utils.h — tensor / optional-tensor / vector-tensor casters

Build (CMakeLists.txt, src/CMakeLists.txt) — Ascend target, ATB
discovery, WITH_ASCEND option; no per-op add_subdirectory needed
because the Ascend source list is glob-based (ascend/*.cc *.cpp).

Test infra (tests/conftest.py, tests/utils.py) — Payload,
randn_strided, get_npu_stream, get_stream(device), and device
parametrization (--devices ascend/nvidia/...).

CI (.ci/run.py, .ci/images/ascend/Dockerfile) — unchanged apart
from the fixes below.

Three bug fixes (bundled with related files)

fix(scripts): align py::arg order with C++ lambda params

The pybind11 bindings emitted by scripts/generate_wrappers.py listed
py::arg entries in a different order than the C++ lambda parameters.
When callers used kwargs, implementation_index and stream were
silently swapped — the stream integer landed in the impl-index slot, and
dispatch SIGABRT'd. Re-order so kwarg names line up positionally with the
C++ signature. Also generalize = py::none() default emission to all
std::optional<...> parameters (previously only std::optional<Tensor>).
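The invariant behind both generator fixes can be sketched in a few lines. This is a hypothetical emitter, not the real code in scripts/generate_wrappers.py: py::arg entries are produced in the same order as the C++ lambda parameters, and the = py::none() default is applied to every std::optional<...> parameter, not only std::optional<Tensor>.

```python
# Hypothetical sketch of the generator-side fix. Helper and parameter
# names are illustrative; the real emitter lives in
# scripts/generate_wrappers.py.

def emit_py_args(params):
    """params: list of (name, cpp_type) in C++ lambda declaration order.

    Emitting py::arg entries in declaration order keeps kwargs aligned
    with the positional C++ signature.
    """
    args = []
    for name, cpp_type in params:
        arg = f'py::arg("{name}")'
        # Generalized default: every std::optional<...> defaults to None,
        # not just std::optional<Tensor>.
        if cpp_type.startswith("std::optional<"):
            arg += " = py::none()"
        args.append(arg)

    return ", ".join(args)

params = [
    ("input", "Tensor"),
    ("stream", "std::optional<int64_t>"),
    ("implementation_index", "std::optional<int64_t>"),
]
print(emit_py_args(params))
```

With the bug, "stream" and "implementation_index" were emitted in the opposite order from the lambda, so a kwarg call bound the stream pointer to the impl-index slot.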

fix(ci): treat exit 137 as success when pytest junit XML is clean

Docker 18.09 occasionally SIGKILLs the container during its chown
teardown step, so .ci/run.py exits with code 137 even when pytest itself
completed normally. Parse /workspace/results/test-results.xml errors /
failures fields and treat 137 as success when pytest reports no
failures.
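A minimal stdlib-only sketch of this guard (the real logic lives in .ci/run.py; attribute names follow the standard pytest junit schema):

```python
# Sketch of the exit-137 guard: trust the junit XML when docker's
# teardown SIGKILLs the container after pytest already finished.
import xml.etree.ElementTree as ET

def pytest_run_clean(junit_xml_text):
    """True iff the junit report shows zero errors and zero failures."""
    root = ET.fromstring(junit_xml_text)
    # pytest emits either <testsuite> at the root or wraps suites in
    # a <testsuites> element depending on version.
    suites = [root] if root.tag == "testsuite" else root.findall("testsuite")
    return all(
        int(s.get("errors", 0)) == 0 and int(s.get("failures", 0)) == 0
        for s in suites
    )

def effective_exit_code(container_exit, junit_xml_text):
    # 137 == 128 + SIGKILL: docker may kill the container during its
    # chown teardown; fall back to the junit verdict in that case only.
    if container_exit == 137 and pytest_run_clean(junit_xml_text):
        return 0

    return container_exit
```

Any other nonzero exit code, or a 137 with reported failures, still propagates unchanged.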

test(conftest): auto-skip tests whose op has no impl on the target device

Adds a skip_op_without_platform_impl autouse fixture that derives the
InfiniOps class name from the test module filename (tests/test_<snake>.py
<Snake>) and checks active_implementation_indices for the
parametrized device. When the op has no backend specialization, the test
is skipped instead of SIGABRTing through Operator<Op, device>::Make().

This is essential for the operator split: each per-category branch
contains only its category's Ascend impls but inherits test files for all
operators from master. Without this guard, pytest tests/ --devices ascend crashes on ops lacking ascend impls on the branch.
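The core of the fixture is the filename-to-class-name derivation; the pytest wiring around it is sketched in comments (infiniops and active_implementation_indices are taken from the PR description, the helper name is hypothetical):

```python
# Hedged sketch of the auto-skip fixture's core logic.
import os

def op_class_name(module_path):
    """Derive the InfiniOps class name from a test module path,
    e.g. tests/test_add_rms_norm.py -> 'AddRmsNorm'."""
    stem = os.path.basename(module_path)
    stem = stem.removesuffix(".py").removeprefix("test_")
    return "".join(part.capitalize() for part in stem.split("_"))

# Inside conftest.py this would be wired up roughly as:
#
# @pytest.fixture(autouse=True)
# def skip_op_without_platform_impl(request, device):
#     op = getattr(infiniops, op_class_name(request.module.__file__), None)
#     if op is not None and not op.active_implementation_indices(device):
#         pytest.skip(f"no {device} implementation for this op")
```

Skipping up front avoids the SIGABRT that Operator<Op, device>::Make() would otherwise raise for an op with no backend specialization.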

Test-file updates (for master-existing ops)

tests/test_add.py, test_gemm.py, test_causal_softmax.py,
test_rms_norm.py, test_swiglu.py updated to match the framework PR's
tests/utils.py helpers and to guard Ascend dispatch against unsupported
implementation_index values (fixes a pre-existing master bug where
Operator<Gemm, kAscend, 2> was mistakenly reported active by SFINAE
because src/torch/gemm/gemm.h specializes Operator<Gemm, kDev, 2> for
all kDev).

Verification

  • python3 .ci/run.py --local --gpu-id <N> — framework-only build
    succeeds; libinfiniops.so links cleanly (Ascend glob finds only the
    src/ascend/gemm/kernel.h that already exists on master; generator
    emits bindings for master's 10 base headers).
  • tests/test_gemm.py — 1500 skips expected (Ascend-only
    parametrization; only impl=0 runs, impl=1/2 skip via the
    active_implementation_indices guard introduced in this PR).

Merge order

This PR must merge first. The three follow-up category PRs rebase
onto master after this lands.

Test plan

  • python3 .ci/run.py --local — framework build + master's gemm
    tests pass on Ascend 910B / CANN 8.5.1
  • scripts/generate_wrappers.py --devices ascend — generates
    generated/bindings/ops.cc with the 10 master base headers; no
    link error
  • clang-format passes locally on all tracked *.h / *.cc /
    *.cuh / *.mlu
  • CUDA / Metax / Cambricon / Moore / Iluvatar regressions (CI-verified
    — this PR does not alter those backends' operator paths)

zhangyue added 5 commits April 18, 2026 05:09
Docker 18.09 occasionally SIGKILLs the container during its `chown`
teardown step, causing `.ci/run.py` to exit 137 even when pytest
completed normally. Parse `/workspace/results/test-results.xml` for
`errors` / `failures` fields and treat 137 as success when pytest
reports no failures.

Also bundles a small Dockerfile update for the Ascend image used by
`.ci/run.py`.
… defaults

Two fixes in the pybind11 bindings generator:

1. `py::arg("implementation_index")` was emitted before `py::arg("stream")`
   in the generated `def(...)` call, but the C++ lambda parameters were
   declared in the opposite order. Kwargs then silently swapped — the
   stream integer landed in the impl-index slot, and dispatch SIGABRT'd.
   Re-order so `py::arg` entries are positional-consistent with the C++
   lambda signature.

2. Only `std::optional<Tensor>` parameters had a `= py::none()` default;
   `std::optional<int64_t>` (and other scalar optionals) had no default,
   forcing callers to pass them explicitly. Generalize the default
   emission to all `std::optional<...>` parameters.
Framework headers shared across all Ascend operators:
- `common.h`: `AclTensorCache` descriptor-caching + `toAclDtype` helpers
- `workspace_pool_.h`: stream-scoped `WorkspacePool` with named arenas;
  `GetWorkspacePool()` / `Pool::Ensure()` entry points (matches master
  PR #60 naming)
- `atb_common_.h`: ATB `Context` management + `toAtbTensor` helper for
  operators wrapping ATB APIs
- `data_type_.h`, `device_.h`: `TypeMap<Ascend, T>` + `Runtime` specialization
- `runtime_.h` is the existing file; left untouched by this PR

`custom_kernel/` ships the AscendC standalone build system for custom
kernels. Gated by its own `CMakeLists.txt`; produces
`libascend_kernel.so` consumed by `kernel_custom.h` op variants (landed
in follow-up category PRs).
Shared changes needed by every Ascend operator PR:

- `src/hash.h` + `src/operator.h`: cache-key plumbing used by
  `Operator<Op, device>` dispatch
- `src/pybind11_utils.h`: tensor / optional-tensor / vector-tensor
  pybind11 casters used by the generator output
- `CMakeLists.txt` + `src/CMakeLists.txt`: Ascend build target, atb
  discovery, `WITH_ASCEND` option
- `tests/conftest.py`: `auto_act_and_assert` fixture + device
  parametrization (`--devices ascend/nvidia/...`)
- `tests/utils.py`: `Payload`, `randn_strided`, `get_npu_stream`, and
  similar test helpers shared by every `tests/test_<op>.py`
…vice

Adds a `skip_op_without_platform_impl` autouse fixture that derives the
InfiniOps class name from the test module filename
(`tests/test_<snake>.py` → `<Snake>`) and checks
`active_implementation_indices` for the parametrized device. When the
op has no backend specialization on the current branch, the test is
skipped instead of SIGABRTing through `Operator<Op, device>::Make()`.

This is essential for the operator split: each per-category branch
contains only its category's Ascend impls but inherits test files for
all operators from master. Without this guard, `pytest tests/
--devices ascend` crashes on ops lacking ascend impls on the branch.
@zhangyue207 zhangyue207 changed the title from "framework scaffolding + CI/generator fixes for operator split" to "feat(ascend-framework): framework scaffolding + CI/generator fixes for operator split" on Apr 18, 2026
@zhangyue207
Collaborator Author

merge test:

[gw0] [ 99%] PASSED tests/test_swiglu.py::test_swiglu[npu-dtype2-0.01-0.005-1-shape3-input_strides3-gate_strides3-out_strides3] 
tests/test_swiglu.py::test_swiglu[npu-dtype2-0.01-0.005-1-shape4-None-None-None] 
[gw0] [ 99%] PASSED tests/test_swiglu.py::test_swiglu[npu-dtype2-0.01-0.005-1-shape4-None-None-None] 
tests/test_swiglu.py::test_swiglu[npu-dtype2-0.01-0.005-1-shape5-input_strides5-gate_strides5-out_strides5] 
[gw0] [ 99%] PASSED tests/test_swiglu.py::test_swiglu[npu-dtype2-0.01-0.005-1-shape5-input_strides5-gate_strides5-out_strides5] 
tests/test_swiglu.py::test_swiglu[npu-dtype2-0.01-0.005-1-shape6-None-None-None] 
[gw0] [ 99%] PASSED tests/test_swiglu.py::test_swiglu[npu-dtype2-0.01-0.005-1-shape6-None-None-None] 
tests/test_swiglu.py::test_swiglu[npu-dtype2-0.01-0.005-1-shape7-input_strides7-gate_strides7-out_strides7] 
[gw0] [100%] PASSED tests/test_swiglu.py::test_swiglu[npu-dtype2-0.01-0.005-1-shape7-input_strides7-gate_strides7-out_strides7] 

----------- generated xml file: /workspace/results/test-results.xml ------------
===================== 3767 passed, 1664 skipped in 45.95s ======================
========== Summary ==========
[warn] job ascend_npu: container exited with 137 (likely docker teardown SIGKILL after clean pytest); junit XML reports no failures — treating as success
EXIT=0

Collaborator

@voltjia voltjia left a comment


The changes are quite large, so I could only review up to this point. Many of the issues are probably shared, so have the AI treat every one of them seriously and make sure all of the changes are clean. Beyond that, there are some architectural concerns: under `src` I see both a `custom_kernel` and a `csrc` folder, as well as both `test` and `tests`, plus some `.gitignore` files and what look like plan documents from the AI development process; these should not be checked in. The extra CMake files and `*.sh` scripts introduced alongside them should also be removed, or consolidated into the project-level CMake and `scripts`. For some operators I see `op_host` and `op_kernel` folders; first confirm whether that split is really necessary. If it is, note that the other platforms just change the file suffix (e.g. `.h` vs `.cuh`). I'm not sure Ascend has a corresponding suffix; if not, a `kernel.h` / `kernel_.h` or `host.h` / `device.h` scheme would also work. The main point is to avoid over-splitting files. This review pass may not be exhaustive given how much there is; let's make a round of changes along these lines first, and we may need another pass afterward. Mainly, have the AI do a full sweep: each comment probably applies in more than one place. Thanks, Yue, much appreciated!

Comment thread scripts/generate_wrappers.py Outdated
Comment thread scripts/generate_wrappers.py
Comment thread scripts/generate_wrappers.py
Comment thread scripts/generate_wrappers.py
Comment thread scripts/generate_wrappers.py
Comment thread src/ascend/custom/add_rms_norm/op_host/add_rms_norm.cpp Outdated
Comment thread src/ascend/custom/add_rms_norm/op_host/add_rms_norm.cpp Outdated
Comment thread src/ascend/custom/add_rms_norm/op_host/add_rms_norm.cpp Outdated
Comment thread src/ascend/custom/add_rms_norm/op_host/add_rms_norm.cpp Outdated
Comment thread src/ascend/custom_kernel/.gitignore Outdated
@zhangyue207
Collaborator Author

The changes are quite large, so I could only review up to this point. Many of the issues are probably shared, so have the AI treat every one of them seriously and make sure all of the changes are clean. Beyond that, there are some architectural concerns: under `src` I see both a `custom_kernel` and a `csrc` folder, as well as both `test` and `tests`, plus some `.gitignore` files and what look like plan documents from the AI development process; these should not be checked in. The extra CMake files and `*.sh` scripts introduced alongside them should also be removed, or consolidated into the project-level CMake and `scripts`. For some operators I see `op_host` and `op_kernel` folders; first confirm whether that split is really necessary. If it is, note that the other platforms just change the file suffix (e.g. `.h` vs `.cuh`). I'm not sure Ascend has a corresponding suffix; if not, a `kernel.h` / `kernel_.h` or `host.h` / `device.h` scheme would also work. The main point is to avoid over-splitting files. This review pass may not be exhaustive given how much there is; let's make a round of changes along these lines first, and we may need another pass afterward. Mainly, have the AI do a full sweep: each comment probably applies in more than one place. Thanks, Yue, much appreciated!

This is the fixed template provided for Ascend kernel development. There isn't much public documentation we could use to change its overall structure; what I can do is delete the unused plan-style documents and the like, but the overall structure itself can't be changed.

zhangyue added 8 commits April 20, 2026 17:28
… files

Remove content that duplicates what the pytest integration tests
(`tests/test_rms_norm.py`, `tests/test_add_rms_norm.py`) already
cover, or that's developer scratchpad rather than checked-in
artifact:

- `csrc/ops/rms_norm/{README,design}.md` — design scratch
- `csrc/ops/rms_norm/test/{benchmark_rms_norm_msprof,run_rms_norm_case}.py`,
  `rms_norm_cases.jsonl`, `rms_norm_perf_report.md`,
  `rms_norm-test-cases.md` — per-op perf benchmarking + reports
- `tests/test_{rms_norm,add_rms_norm}.py` under custom_kernel/ —
  redundant with the top-level pytest integration tests

Build infra, kernel sources, registration, and utility headers are
unchanged; the `libascend_kernel.so` artifact and its consumers
(`kernel_custom.h` variants in the op-norm-rope PR) are unaffected.
…dundant .gitignore entry

Review items 1-5 on `scripts/generate_wrappers.py`:
- Restore docstring quoting in `_find_optional_tensor_params` (reverts
  accidental change to ```int`` and the double-space).
- Restore blank lines before `return` in `_find_optional_tensor_params`,
  `_is_optional_tensor`, and `_generate_params` / `_generate_arguments`
  (project CLAUDE.md Python style: "blank line before `return` unless
  inside a block body").
- Add missing blank line before `return` in `_find_vector_tensor_params`
  and `_is_vector_tensor`.
- Drop redundant `import re` inside `_find_vector_tensor_params` — `re`
  is imported at module level.

Review item 10 on `src/ascend/custom_kernel/.gitignore`:
- Drop redundant `build/` entry (already ignored globally via the
  project-root `.gitignore`). Keep `output/` and `python/` — both are
  AscendC-specific build artifacts not covered by the root ignore.
…tch vllm-ascend/csrc layout

Reviewer top-level feedback on PR #64: mirror the directory layout of
https://github.com/vllm-project/vllm-ascend/tree/main/csrc and drop the
extra nesting layers.

Directory changes:
- `src/ascend/custom_kernel/` → `src/ascend/custom/`
- Merge `csrc/` into the top: move `csrc/register.cpp`,
  `csrc/ops.h`, `csrc/utils/` up one level.
- Rename `register.cpp` → `torch_binding.cpp` to match vllm-ascend naming.
- Promote `csrc/ops/<op>/` to `<op>/` at the top (drop the `ops/` layer).
- Merge `csrc/CMakeLists.txt` content into top-level `CMakeLists.txt`;
  delete the now-empty `csrc/` layer.
- Remove `src/ascend/custom_kernel/.gitignore` (root `.gitignore`
  already ignores `build/`; `output/`+`python/` were custom_kernel-scoped
  build artifacts that fit the root gitignore's scope too).

Resulting layout:
  custom/
  ├── build.sh
  ├── CMakeLists.txt
  ├── cmake/{config_ascend,config_envs}.cmake
  ├── ops.h
  ├── torch_binding.cpp           (was `register.cpp`)
  ├── utils/torch_kernel_helper.h
  ├── rms_norm/{op_host,op_kernel}/rms_norm.cpp
  └── add_rms_norm/{op_host,op_kernel}/add_rms_norm.cpp

License preservation: files shared in structure/substance with
vllm-ascend (`torch_binding.cpp`, `ops.h`, `utils/torch_kernel_helper.h`,
top-level `CMakeLists.txt`) now carry proper Apache License 2.0 headers
with the original Huawei Technologies copyright preserved alongside
InfiniTensor's modification copyright.

Callers:
- `src/CMakeLists.txt`: `custom_kernel` → `custom` in two references.
- Root `CMakeLists.txt`: updated inline comment pointing to the build
  script.
- Library name (`ascend_kernel`), static lib (`no_workspace_kernel`),
  and Python module name remain unchanged — `kernel_custom.h` consumers
  in the op-norm-rope PR link via those identifiers, not by path, so
  this rename does not ripple into that branch.

CI: `.ci/run.py --local --gpu-id 0` passes 3072/1782 on Ascend 910B
with `BUILD_CUSTOM_KERNEL=OFF` (default); the custom kernel build
itself is exercised by the op-norm-rope PR's `kernel_custom.h`
integration.
Scan-and-fix pass for patterns flagged in reviewer comments on
`custom_kernel/` that also appear in other files in this PR.

- `src/ascend/common.h`: wrap `aclTensor` in backticks in two comments
  (matches comment 9 on Markdown formatting in custom_kernel).
- `tests/utils.py`: add missing blank line before trailing `return` in
  `get_stream()` (matches comments 3/5 on missing blank line before
  return in non-block-body context).

No camelCase-local violations in the framework C++ headers
(atb_common_, common, data_type_, device_, workspace_pool_, hash,
operator, pybind11_utils) — reviewer comment 6 was specific to
`custom/` op_host code adapted from vllm-ascend.
Reviewer @voltjia on PR #64 inline comments:
- Comment 6: local variables must follow Google C++ Style Guide
  (`dimLength` → `dim_length`, etc.). Applied across all locals in the
  two op_host files.
- Comment 7: namespace `ascend_kernel` is non-standard; use `detail` or
  `ascend::detail` to match other platforms. Renamed to
  `ascend::detail` in `ops.h`, `torch_binding.cpp`,
  `utils/torch_kernel_helper.h`, and both `op_host/*.cpp` files.

The library name (`ascend_kernel` → `libascend_kernel.so`), `OP_PLUGIN_NAME`,
and Python-import name are unchanged — those are compile/link identity
and are independent of the C++ namespace.  `kernel_custom.h` in
op-norm-rope links via the C `extern` launch symbol, not the namespace,
so this rename does not ripple into that branch.

Also took the opportunity to backtick-wrap identifiers in comments that
the rename touched.

Inline comments 8 and 9 (Markdown formatting in comments) were already
covered by the backtick pass in commit 0aed3a5 for non-custom files; the
custom/ comments here also get normalized as a side-effect of rewriting
the affected lines.
Scanned ALL 30 inline comments on PR #64 (not just the 10 visible in
collapsed view). 22 had been missed by the earlier passes.

Generator (scripts/generate_wrappers.py):
- Comments 8-10: swap `stream` and `implementation_index` in both the
  pybind lambda parameters and the `py::arg` declarations, to match
  the `Operator::Call(Handle, Config, ...)` order (Handle first, Config
  second). Previously ordered impl_index first for lambda-signature
  alignment; with the swap, both are reordered together so kwargs
  still resolve correctly.
- Comment 11: restore backticks around device names in `--devices` help
  text.
- Comment 12: `.def_static("clear_cache", ...)` kept — it is the API
  used by the new `_clear_operator_caches` pytest fixture.

CMakeLists.txt:
- Comments 13-14: wrap `NEEDED` and `torch_npu` in Markdown backticks in
  comments.

tests/conftest.py (comments 23-29):
- Reset the file to master's content and re-apply only the two new
  fixtures (`_clear_operator_caches`, `skip_op_without_platform_impl`)
  with Markdown docstrings (single backticks, not rST double). Reverts
  incidental changes to `pytest_addoption` help text,
  `skip_unsupported_dtypes` rename, `_PLATFORM_TO_TORCH_DEVICE` dict
  order, `_resolve_device` docstring, and the `torch_npu` comment
  line-wrap.
- Fix comment 27's concern: `_TORCH_DEVICE_TO_PLATFORMS` now maps one
  torch device type to multiple platforms (`cuda` →
  `{nvidia, metax, iluvatar}`) and `skip_op_without_platform_impl`
  checks `active_implementation_indices` across all of them; it skips
  only when every mapped platform reports empty.
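The one-to-many mapping and the skip condition can be sketched as follows (platform names and the mapping come from the PR description; the helper name is hypothetical):

```python
# Illustrative sketch of the comment-27 fix: one torch device type can
# correspond to several InfiniOps platforms, and the test is skipped
# only when every mapped platform reports no active implementations.
_TORCH_DEVICE_TO_PLATFORMS = {
    "cuda": {"nvidia", "metax", "iluvatar"},
    "npu": {"ascend"},
    "cpu": {"cpu"},
}

def should_skip(torch_device_type, active_indices_by_platform):
    """active_indices_by_platform: platform -> list of active impl indices."""
    platforms = _TORCH_DEVICE_TO_PLATFORMS.get(torch_device_type, set())
    # Skip only if *all* mapped platforms have an empty index list.
    return all(
        not active_indices_by_platform.get(p, []) for p in platforms
    )
```

So a cuda-device test stays enabled as long as any of nvidia / metax / iluvatar reports at least one active implementation.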

tests/utils.py:
- Comment 16: remove `get_npu_stream`; `get_stream(device)` covers all
  torch device types.

tests/test_{add,causal_softmax,gemm,rms_norm,swiglu}.py:
- Comments 17-22: replace the `if device.type == "npu"` branches with a
  single call that passes `stream=get_stream(<tensor>.device)`. Single-
  line import restored in `test_add.py` (comment 22 — format minimization
  after dropping the `get_npu_stream` import).

test_gemm.py specifically: moved the "impl=2 on Ascend is broken because
of `src/torch/gemm/gemm.h` SFINAE pollution" workaround from the
helper-level conditional into a `pytest.skip` at the top of the test
body, so the helper itself becomes unconditional.
- `src/ascend/custom/utils/torch_kernel_helper.h`: clang-format wrapped
  a long `ConvertTypes` macro continuation.
- `tests/test_add.py`: ruff `format` wrapped the 5-import `tests.utils`
  line (89 chars, over the default 88 limit) back into multi-line
  form. Reviewer comment 22 suggested restoring a single line after
  dropping `get_npu_stream`, but with `get_stream` added the shortened
  form still exceeds the ruff line-length cap.
…kticks

Scan-and-fix pass for identifiers in comments that still lack Markdown
backticks, matching reviewer comments 9, 11, 13, 14 on PR #64. Applied
only to files authored / modified by this PR (leaves custom/cmake/
config_envs.cmake and similar vllm-ascend-verbatim content untouched to
stay consistent with the upstream it was adapted from).

- `CMakeLists.txt`: `pybind11` (line 7).
- `src/ascend/common.h`: `shape`, `strides`, `storage_shape`, `dtype`
  in the `AclTensorCache` class doc.
- `src/ascend/custom/CMakeLists.txt`: `AscendC` toolchain reference.
- `src/ascend/custom/build.sh`: `AscendC`, `libascend_kernel.so`.
- `src/ascend/custom/cmake/config_ascend.cmake`: `SOC_VERSION`,
  `CANN`, `AscendC`.
@zhangyue207
Collaborator Author

zhangyue207 commented Apr 20, 2026

ascend:

tests/test_swiglu.py::test_swiglu[npu-dtype2-0.01-0.005-1-shape6-None-None-None] 
[gw0] [ 99%] SKIPPED tests/test_swiglu.py::test_swiglu[npu-dtype2-0.01-0.005-1-shape6-None-None-None] 
tests/test_swiglu.py::test_swiglu[npu-dtype2-0.01-0.005-1-shape7-input_strides7-gate_strides7-out_strides7] 
[gw0] [100%] SKIPPED tests/test_swiglu.py::test_swiglu[npu-dtype2-0.01-0.005-1-shape7-input_strides7-gate_strides7-out_strides7] 

----------- generated xml file: /workspace/results/test-results.xml ------------
===================== 1572 passed, 3282 skipped in 37.82s ======================
========== Summary ==========

moore

Requirement already satisfied: sympy==1.13.1 in /usr/local/lib/python3.10/dist-packages (from torch->InfiniOps==0.1.0) (1.13.1)
Requirement already satisfied: mpmath<1.4,>=1.1.0 in /usr/local/lib/python3.10/dist-packages (from sympy==1.13.1->torch->InfiniOps==0.1.0) (1.3.0)
Requirement already satisfied: MarkupSafe>=2.0 in /usr/local/lib/python3.10/dist-packages (from jinja2->torch->InfiniOps==0.1.0) (3.0.2)
Building wheels for collected packages: InfiniOps
  Building wheel for InfiniOps (pyproject.toml): started
  Building wheel for InfiniOps (pyproject.toml): still running...
  Building wheel for InfiniOps (pyproject.toml): still running...
  Building wheel for InfiniOps (pyproject.toml): finished with status 'done'
  Created wheel for InfiniOps: filename=infiniops-0.1.0-cp310-cp310-linux_x86_64.whl size=514579 sha256=eb7c8e51edaa954d0b596df06890d030fc0af86e5761ba51a524ed285164b525
  Stored in directory: /tmp/pip-ephem-wheel-cache-4vo9x05h/wheels/ac/4c/a5/78fe3376fbe0f633e8ad47ec3e677a6762cbf147a5e0195bab
Successfully built InfiniOps
Installing collected packages: InfiniOps
Successfully installed InfiniOps-0.1.0
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager, possibly rendering your system unusable. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv. Use the --root-user-action option if you know what you are doing and want to suppress this warning.

[notice] A new release of pip is available: 25.2 -> 26.0.1
[notice] To update, run: python -m pip install --upgrade pip
========== Stage: build ==========
========== Summary ==========

iluvatar

 Requirement already satisfied: MarkupSafe>=2.0 in /usr/local/corex-4.3.0.20250624/lib64/python3/dist-packages (from jinja2->torch->InfiniOps==0.1.0) (3.0.2)
     Requirement already satisfied: mpmath<1.4,>=1.1.0 in /usr/local/corex-4.3.0.20250624/lib64/python3/dist-packages (from sympy->torch->InfiniOps==0.1.0) (1.3.0)
     Building wheels for collected packages: InfiniOps
       Building wheel for InfiniOps (pyproject.toml): started
       Building wheel for InfiniOps (pyproject.toml): finished with status 'done'
       Created wheel for InfiniOps: filename=infiniops-0.1.0-cp310-cp310-linux_x86_64.whl size=329504 sha256=f30dd64e57d7c7e75e4ca27d2e1f086ea83de43225f2a2a19730ee68db54ae58
       Stored in directory: /tmp/pip-ephem-wheel-cache-fov27lij/wheels/ac/4c/a5/78fe3376fbe0f633e8ad47ec3e677a6762cbf147a5e0195bab
     Successfully built InfiniOps
     Installing collected packages: InfiniOps
     Successfully installed InfiniOps-0.1.0

nvidia

Requirement already satisfied: nvidia-cuda-cupti==13.0.85.* in /home/zhangyue/python3.10/lib/python3.10/site-packages (from cuda-toolkit[cublas,cudart,cufft,cufile,cupti,curand,cusolver,cusparse,nvjitlink,nvrtc,nvtx]==13.0.2; platform_system == "Linux"->torch->InfiniOps==0.1.0) (13.0.85)
Requirement already satisfied: cuda-pathfinder~=1.1 in /home/zhangyue/python3.10/lib/python3.10/site-packages (from cuda-bindings<14,>=13.0.3->torch->InfiniOps==0.1.0) (1.5.2)
Requirement already satisfied: mpmath<1.4,>=1.1.0 in /home/zhangyue/python3.10/lib/python3.10/site-packages (from sympy>=1.13.3->torch->InfiniOps==0.1.0) (1.3.0)
Requirement already satisfied: MarkupSafe>=2.0 in /home/zhangyue/python3.10/lib/python3.10/site-packages (from jinja2->torch->InfiniOps==0.1.0) (3.0.3)
Building wheels for collected packages: InfiniOps
  Building editable for InfiniOps (pyproject.toml) ... done
  Created wheel for InfiniOps: filename=infiniops-0.1.0-cp310-cp310-linux_x86_64.whl size=409566 sha256=9e3dcc598d9b8cd99f649e8060a257f1d1649d7e58e961fc19b75658a3bd4e91
  Stored in directory: /tmp/pip-ephem-wheel-cache-x_no3w_i/wheels/22/59/d1/fd9a553995db5fba4a2c3a44dd6adb7f24dd94303692b98e00
Successfully built InfiniOps
Installing collected packages: InfiniOps
  Attempting uninstall: InfiniOps
    Found existing installation: InfiniOps 0.1.0
    Uninstalling InfiniOps-0.1.0:
      Successfully uninstalled InfiniOps-0.1.0
Successfully installed InfiniOps-0.1.0

[gw4] [ 99%] PASSED tests/test_gemm.py::test_gemm[cpu-dtype1-0.01-0.01-0-True-True-0-1-a_shape3-b_shape3-c_shape3-a_strides3-b_strides3-c_strides3]
tests/test_gemm.py::test_gemm[cpu-dtype1-0.01-0.01-0-True-True-0-1-a_shape4-b_shape4-c_shape4-None-None-None]
[gw4] [ 99%] PASSED tests/test_gemm.py::test_gemm[cpu-dtype1-0.01-0.01-0-True-True-0-1-a_shape4-b_shape4-c_shape4-None-None-None]
tests/test_gemm.py::test_gemm[cpu-dtype1-0.01-0.01-0-True-True-0.5--1-a_shape0-b_shape0-c_shape0-None-None-None]
[gw4] [ 99%] PASSED tests/test_gemm.py::test_gemm[cpu-dtype1-0.01-0.01-0-True-True-0.5--1-a_shape0-b_shape0-c_shape0-None-None-None]
tests/test_gemm.py::test_gemm[cpu-dtype1-0.01-0.01-0-True-True-0.5--1-a_shape1-b_shape1-c_shape1-None-None-None]
[gw4] [ 99%] PASSED tests/test_gemm.py::test_gemm[cpu-dtype1-0.01-0.01-0-True-True-0.5--1-a_shape1-b_shape1-c_shape1-None-None-None]
tests/test_gemm.py::test_gemm[cpu-dtype1-0.01-0.01-0-True-True-0.5--1-a_shape2-b_shape2-c_shape2-a_strides2-b_strides2-c_strides2]
[gw4] [ 99%] PASSED tests/test_gemm.py::test_gemm[cpu-dtype1-0.01-0.01-0-True-True-0.5--1-a_shape2-b_shape2-c_shape2-a_strides2-b_strides2-c_strides2]
tests/test_gemm.py::test_gemm[cpu-dtype1-0.01-0.01-0-True-True-0.5--1-a_shape3-b_shape3-c_shape3-a_strides3-b_strides3-c_strides3]
[gw4] [ 99%] PASSED tests/test_gemm.py::test_gemm[cpu-dtype1-0.01-0.01-0-True-True-0.5--1-a_shape3-b_shape3-c_shape3-a_strides3-b_strides3-c_strides3]
tests/test_gemm.py::test_gemm[cpu-dtype1-0.01-0.01-0-True-True-0.5--1-a_shape4-b_shape4-c_shape4-None-None-None]
[gw4] [ 99%] PASSED tests/test_gemm.py::test_gemm[cpu-dtype1-0.01-0.01-0-True-True-0.5--1-a_shape4-b_shape4-c_shape4-None-None-None]
tests/test_gemm.py::test_gemm[cpu-dtype1-0.01-0.01-0-True-True-0.5--0.5-a_shape0-b_shape0-c_shape0-None-None-None]
[gw4] [100%] PASSED tests/test_gemm.py::test_gemm[cpu-dtype1-0.01-0.01-0-True-True-0.5--0.5-a_shape0-b_shape0-c_shape0-None-None-None]

----------- generated xml file: /workspace/results/test-results.xml ------------
================ 6016 passed, 3692 skipped in 115.22s (0:01:55) ================
========== Summary ==========

metax

zhangyue@test:~/InfiniOps$ python3 .ci/run.py  --local --stage build
platform: metax
==> running job: metax_gpu
========== Setup ==========
Building wheels for collected packages: InfiniOps
  Building wheel for InfiniOps (pyproject.toml): started
  Building wheel for InfiniOps (pyproject.toml): still running...
  Building wheel for InfiniOps (pyproject.toml): still running...
  Building wheel for InfiniOps (pyproject.toml): finished with status 'done'
  Created wheel for InfiniOps: filename=infiniops-0.1.0-cp310-cp310-linux_x86_64.whl size=791843 sha256=bdb6d3d06fa0172f91d6ac8199633a648b002b88faf9ff9d50e91956e940a623
  Stored in directory: /tmp/pip-ephem-wheel-cache-iatimiaj/wheels/ac/4c/a5/78fe3376fbe0f633e8ad47ec3e677a6762cbf147a5e0195bab
Successfully built InfiniOps
Installing collected packages: InfiniOps
Successfully installed InfiniOps-0.1.0
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager, possibly rendering your system unusable.It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv. Use the --root-user-action option if you know what you are doing and want to suppress this warning.
========== Stage: build ==========
========== Summary ==========

cambricon

Building wheels for collected packages: InfiniOps
  Building wheel for InfiniOps (pyproject.toml): started
  Building wheel for InfiniOps (pyproject.toml): still running...
  Building wheel for InfiniOps (pyproject.toml): still running...
  Building wheel for InfiniOps (pyproject.toml): finished with status 'done'
  Created wheel for InfiniOps: filename=infiniops-0.1.0-cp310-cp310-linux_aarch64.whl size=225319 sha256=d8d6dd6f0382f7cb107182e832cb5644224c9b9ca60a4cfd68fd5500059c7e64
  Stored in directory: /tmp/pip-ephem-wheel-cache-nf7ouvtx/wheels/ac/4c/a5/78fe3376fbe0f633e8ad47ec3e677a6762cbf147a5e0195bab
Successfully built InfiniOps
Installing collected packages: InfiniOps
Successfully installed InfiniOps-0.1.0
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
WARNING: There was an error checking the latest version of pip.
========== Stage: build ==========
========== Summary ==========

Comment thread src/ascend/custom/cmake/config_envs.cmake Outdated
Comment thread src/ascend/custom/cmake/config_envs.cmake Outdated
Comment thread src/ascend/atb_common_.h Outdated
Comment thread src/ascend/common.h Outdated
Comment thread tests/test_add.py
zhangyue added 2 commits April 20, 2026 21:53
Per Google C++ Style Guide §Function Names: ordinary non-accessor
functions are PascalCase. Accessors/mutators (get/set on class members)
are snake_case. These 7 are standalone helpers / converters /
predicates — not member accessors — so they need PascalCase.

  threadLocalAtbContext → ThreadLocalAtbContext
  getAtbContext          → GetAtbContext
  toAtbTensor (×2)       → ToAtbTensor
  isAclRuntimeAlive      → IsAclRuntimeAlive
  buildAclTensor         → BuildAclTensor
  toAclDtype             → ToAclDtype
  isIntegerDtype         → IsIntegerDtype

CANN APIs (`aclrtGetDevice`, `aclCreateTensor`, …), STL/PyTorch interop
methods (`begin`/`end`/`size`/`data`/…), and class accessors
(`get_<field>`/`set_<field>`) are all kept as-is — they either belong
to another vendor or match the "looks like a variable" exception.

Callers on the three category branches (op-simple / op-norm-rope /
op-cache-attn) will pick up the new names automatically on rebase.
…ticks in common.h

- `src/ascend/custom/cmake/config_envs.cmake`: capitalize + period + Markdown backticks on all comments and status messages.
- `src/ascend/custom/cmake/config_ascend.cmake`: fix `CANN` casing and backticks in the fatal-error message.
- `src/ascend/custom/CMakeLists.txt`: polish status messages and inline comments (Markdown backticks + sentence case).
- `src/ascend/common.h`: restore `Gemm` and `MatMul` backticks in the `BuildAclTensor` docstring per PR #64 review.
zhangyue207 pushed a commit that referenced this pull request Apr 21, 2026
…`TORCH_CHECK`, Google C++ naming

- `workspace_pool_.h`: drop commented-out `<cinttypes>` / `<cstdio>` includes (transitively available).
- `device_.h`: switch relative `../device.h` to absolute `device.h` — the historical `src/ascend/device.h` naming collision is no longer relevant.
- `custom/{add_rms_norm,rms_norm}/op_host/*.cpp`: drop unneeded BSD-3-Clause headers and switch `TORCH_CHECK` messages to Markdown-backticked identifiers.
- `custom/{add_rms_norm,rms_norm}/op_kernel/*.cpp`: drop unneeded BSD-3-Clause headers.
- Rename wrapper functions to PascalCase per Google C++ Style: `add_rms_norm` → `AddRmsNorm`, `rms_norm` → `RmsNorm` (ops.h + torch_binding.cpp updated; `torch.ops.npu.rms_norm` registry name unchanged; kernel entry-point names stay snake_case as required by `EXEC_KERNEL_CMD`).
zhangyue207 pushed a commit that referenced this pull request Apr 21, 2026
…`TORCH_CHECK`, Google C++ naming

- `workspace_pool_.h`: uncomment `<cinttypes>` / `<cstdio>` (needed for `PRIu64` and `fprintf` in the destructor; not transitively available on all platforms).
- `device_.h`: switch relative `../device.h` to absolute `device.h` — the historical `src/ascend/device.h` naming collision is no longer relevant.
- `custom/{add_rms_norm,rms_norm}/op_host/*.cpp`: drop unneeded BSD-3-Clause headers and switch `TORCH_CHECK` messages to Markdown-backticked identifiers.
- `custom/{add_rms_norm,rms_norm}/op_kernel/*.cpp`: drop unneeded BSD-3-Clause headers.
- Rename wrapper functions to PascalCase per Google C++ Style: `add_rms_norm` → `AddRmsNorm`, `rms_norm` → `RmsNorm` (ops.h + torch_binding.cpp updated; `torch.ops.npu.rms_norm` registry name unchanged; kernel entry-point names stay snake_case as required by `EXEC_KERNEL_CMD`).
@zhangyue207 zhangyue207 force-pushed the feat/ascend-framework-pr branch from 4c5960a to cb1d29b Compare April 21, 2026 06:04
@zhangyue207 zhangyue207 force-pushed the feat/ascend-framework-pr branch from cb1d29b to 70542ec Compare April 21, 2026 06:12
@zhangyue207 zhangyue207 force-pushed the feat/ascend-framework-pr branch from 70542ec to 720234d Compare April 21, 2026 06:17
@voltjia voltjia self-requested a review April 21, 2026 06:30
@voltjia voltjia merged commit a05713b into master Apr 21, 2026
4 checks passed
@voltjia voltjia deleted the feat/ascend-framework-pr branch April 21, 2026 06:32