feat(ascend-framework): framework scaffolding + CI/generator fixes for operator split #64
Conversation
Docker 18.09 occasionally SIGKILLs the container during its `chown` teardown step, causing `.ci/run.py` to exit 137 even when pytest completed normally. Parse `/workspace/results/test-results.xml` for `errors` / `failures` fields and treat 137 as success when pytest reports no failures. Also bundles a small Dockerfile update for the Ascend image used by `.ci/run.py`.
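A minimal Python sketch of the exit-code masking described above (the XML path and the 137 exit code come from the description; the function names and junit-parsing details are illustrative, not the actual `.ci/run.py` code):

```python
import xml.etree.ElementTree as ET

RESULTS_XML = "/workspace/results/test-results.xml"  # path from the fix description
SIGKILL_EXIT = 137  # 128 + SIGKILL(9): Docker killed the container after the run

def pytest_run_clean(xml_path: str) -> bool:
    """True when the junit XML reports zero errors and zero failures."""
    root = ET.parse(xml_path).getroot()
    # junit may nest <testsuite> under a <testsuites> root; check every suite.
    suites = root.iter("testsuite") if root.tag == "testsuites" else [root]
    return all(
        int(suite.get("errors", 0)) == 0 and int(suite.get("failures", 0)) == 0
        for suite in suites
    )

def effective_exit_code(container_exit: int, xml_path: str = RESULTS_XML) -> int:
    """Mask exit 137 as success when pytest itself finished cleanly."""
    if container_exit == SIGKILL_EXIT and pytest_run_clean(xml_path):
        return 0
    return container_exit
```

Any non-137 exit code passes through untouched, so genuine pytest failures (exit 1) and other container errors are still reported.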
… defaults
Two fixes in the pybind11 bindings generator:
1. `py::arg("implementation_index")` was emitted before `py::arg("stream")`
in the generated `def(...)` call, but the C++ lambda parameters were
declared in the opposite order. Kwargs then silently swapped — the
stream integer landed in the impl-index slot, and dispatch SIGABRT'd.
Re-order so `py::arg` entries are positional-consistent with the C++
lambda signature.
2. Only `std::optional<Tensor>` parameters had a `= py::none()` default;
`std::optional<int64_t>` (and other scalar optionals) had no default,
forcing callers to pass them explicitly. Generalize the default
emission to all `std::optional<...>` parameters.
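As an illustration, the generalized default emission can be sketched like this (the regex, helper names, and parameter-tuple shape are hypothetical; the real `scripts/generate_wrappers.py` logic may differ):

```python
import re

# Match ANY std::optional<...> parameter type, not just std::optional<Tensor>.
_OPTIONAL_RE = re.compile(r"std::optional<.+>")

def emit_py_arg(name: str, cpp_type: str) -> str:
    """Emit one py::arg entry; every std::optional<...> gets a py::none() default."""
    if _OPTIONAL_RE.fullmatch(cpp_type.strip()):
        return f'py::arg("{name}") = py::none()'
    return f'py::arg("{name}")'

def emit_def_args(params: list[tuple[str, str]]) -> str:
    """Emit the py::arg list from the SAME (name, type) sequence as the C++
    lambda parameters, so kwargs stay positionally consistent (fix #1)."""
    return ", ".join(emit_py_arg(name, cpp_type) for name, cpp_type in params)
```

Driving both the lambda signature and the `py::arg` list from one parameter sequence is what makes the ordering bug structurally impossible.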
Framework headers shared across all Ascend operators:
- `common.h`: `AclTensorCache` descriptor-caching + `toAclDtype` helpers
- `workspace_pool_.h`: stream-scoped `WorkspacePool` with named arenas; `GetWorkspacePool()` / `Pool::Ensure()` entry points (matches master PR #60 naming)
- `atb_common_.h`: ATB `Context` management + `toAtbTensor` helper for operators wrapping ATB APIs
- `data_type_.h`, `device_.h`: `TypeMap<Ascend, T>` + `Runtime` specialization
- `runtime_.h` is the existing file; left untouched by this PR

`custom_kernel/` ships the AscendC standalone build system for custom kernels. Gated by its own `CMakeLists.txt`; produces `libascend_kernel.so` consumed by `kernel_custom.h` op variants (landed in follow-up category PRs).
Shared changes needed by every Ascend operator PR:
- `src/hash.h` + `src/operator.h`: cache-key plumbing used by `Operator<Op, device>` dispatch
- `src/pybind11_utils.h`: tensor / optional-tensor / vector-tensor pybind11 casters used by the generator output
- `CMakeLists.txt` + `src/CMakeLists.txt`: Ascend build target, ATB discovery, `WITH_ASCEND` option
- `tests/conftest.py`: `auto_act_and_assert` fixture + device parametrization (`--devices ascend/nvidia/...`)
- `tests/utils.py`: `Payload`, `randn_strided`, `get_npu_stream`, and similar test helpers shared by every `tests/test_<op>.py`
…vice

Adds a `skip_op_without_platform_impl` autouse fixture that derives the InfiniOps class name from the test module filename (`tests/test_<snake>.py` → `<Snake>`) and checks `active_implementation_indices` for the parametrized device. When the op has no backend specialization on the current branch, the test is skipped instead of SIGABRTing through `Operator<Op, device>::Make()`.

This is essential for the operator split: each per-category branch contains only its category's Ascend impls but inherits test files for all operators from master. Without this guard, `pytest tests/ --devices ascend` crashes on ops lacking ascend impls on the branch.
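The filename-to-class derivation above is a snake_case to PascalCase transform; a standalone sketch (the helper names and the `active_implementation_indices(device)` call shape are assumptions based on the description, not the actual conftest code):

```python
# Derivation used by the `skip_op_without_platform_impl` autouse fixture:
# tests/test_<snake>.py -> <Snake> (the InfiniOps class name).
def op_class_name(module_file: str) -> str:
    stem = module_file.rsplit("/", 1)[-1]                  # e.g. test_add_rms_norm.py
    snake = stem.removeprefix("test_").removesuffix(".py")
    return "".join(part.capitalize() for part in snake.split("_"))

def should_skip(ops_module, module_file: str, device: str) -> bool:
    """True when the op class is missing or reports no implementation for
    `device`; the fixture then skips instead of letting Make() SIGABRT."""
    cls = getattr(ops_module, op_class_name(module_file), None)
    return cls is None or not cls.active_implementation_indices(device)
```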
merge test
voltjia left a comment
There are too many changes; this is as far as I could review for now. Many of the issues are likely shared across files, so have the AI take every one of them seriously and make sure all the changes are clean. Beyond those, there are some architectural problems: I see both a `custom_kernel` and a `csrc` folder introduced under `src`, plus `test` alongside `tests`, along with some `.gitignore` entries and what look like plan-style documents left over from AI-assisted development; these should not be checked in. The extra CMake files and `*.sh` scripts introduced along with them should also be removed, or consolidated into the project-level CMake and `scripts` directories. For some operators I see `op_host` and `op_kernel` folders; first confirm whether that separation is really necessary. If it is, note that the other platforms just change the file suffix (e.g. `.h` vs `.cuh`). I'm not sure whether Ascend has a corresponding suffix; if not, a `kernel.h` / `kernel_.h` or `host.h` / `device.h` scheme would also work. The main point is to avoid over-splitting files. This review pass is probably incomplete because there is so much material, so let's fix things along these lines first and then likely do another pass. Mainly, have the AI do a full sweep: each comment probably applies in more than one place. Thanks for all the hard work, 越总!
This is the fixed template provided for Ascend kernel development; there isn't much public documentation that would let us change its overall structure. What I can do is delete the unused plan-style files and the like, but the overall structure cannot be changed.
… files
Remove content that duplicates what the pytest integration tests
(`tests/test_rms_norm.py`, `tests/test_add_rms_norm.py`) already
cover, or that is developer scratchpad rather than a checked-in
artifact:
- `csrc/ops/rms_norm/{README,design}.md` — design scratch
- `csrc/ops/rms_norm/test/{benchmark_rms_norm_msprof,run_rms_norm_case}.py`,
`rms_norm_cases.jsonl`, `rms_norm_perf_report.md`,
`rms_norm-test-cases.md` — per-op perf benchmarking + reports
- `tests/test_{rms_norm,add_rms_norm}.py` under custom_kernel/ —
redundant with the top-level pytest integration tests
Build infra, kernel sources, registration, and utility headers are
unchanged; the `libascend_kernel.so` artifact and its consumers
(`kernel_custom.h` variants in the op-norm-rope PR) are unaffected.
…dundant .gitignore entry

Review items 1-5 on `scripts/generate_wrappers.py`:
- Restore docstring quoting in `_find_optional_tensor_params` (reverts the accidental change to the ``int`` quoting and the double-space).
- Restore blank lines before `return` in `_find_optional_tensor_params`, `_is_optional_tensor`, and `_generate_params` / `_generate_arguments` (project CLAUDE.md Python style: "blank line before `return` unless inside a block body").
- Add missing blank line before `return` in `_find_vector_tensor_params` and `_is_vector_tensor`.
- Drop redundant `import re` inside `_find_vector_tensor_params` — `re` is imported at module level.

Review item 10 on `src/ascend/custom_kernel/.gitignore`:
- Drop the redundant `build/` entry (already ignored globally via the project-root `.gitignore`). Keep `output/` and `python/` — both are AscendC-specific build artifacts not covered by the root ignore.
…tch vllm-ascend/csrc layout

Reviewer top-level feedback on PR #64: mirror the directory layout of https://github.com/vllm-project/vllm-ascend/tree/main/csrc and drop the extra nesting layers.

Directory changes:
- `src/ascend/custom_kernel/` → `src/ascend/custom/`
- Merge `csrc/` into the top: move `csrc/register.cpp`, `csrc/ops.h`, `csrc/utils/` up one level.
- Rename `register.cpp` → `torch_binding.cpp` to match vllm-ascend naming.
- Promote `csrc/ops/<op>/` to `<op>/` at the top (drop the `ops/` layer).
- Merge `csrc/CMakeLists.txt` content into the top-level `CMakeLists.txt`; delete the now-empty `csrc/` layer.
- Remove `src/ascend/custom_kernel/.gitignore` (root `.gitignore` already ignores `build/`; `output/` + `python/` were custom_kernel-scoped build artifacts that fit the root gitignore's scope too).

Resulting layout:

    custom/
    ├── build.sh
    ├── CMakeLists.txt
    ├── cmake/{config_ascend,config_envs}.cmake
    ├── ops.h
    ├── torch_binding.cpp (was `register.cpp`)
    ├── utils/torch_kernel_helper.h
    ├── rms_norm/{op_host,op_kernel}/rms_norm.cpp
    └── add_rms_norm/{op_host,op_kernel}/add_rms_norm.cpp

License preservation: files shared in structure/substance with vllm-ascend (`torch_binding.cpp`, `ops.h`, `utils/torch_kernel_helper.h`, top-level `CMakeLists.txt`) now carry proper Apache License 2.0 headers with the original Huawei Technologies copyright preserved alongside InfiniTensor's modification copyright.

Callers:
- `src/CMakeLists.txt`: `custom_kernel` → `custom` in two references.
- Root `CMakeLists.txt`: updated inline comment pointing to the build script.
- Library name (`ascend_kernel`), static lib (`no_workspace_kernel`), and Python module name remain unchanged — `kernel_custom.h` consumers in the op-norm-rope PR link via those identifiers, not by path, so this rename does not ripple into that branch.
CI: `.ci/run.py --local --gpu-id 0` passes 3072/1782 on Ascend 910B with `BUILD_CUSTOM_KERNEL=OFF` (default); the custom kernel build itself is exercised by the op-norm-rope PR's `kernel_custom.h` integration.
Scan-and-fix pass for patterns flagged in reviewer comments on `custom_kernel/` that also appear in other files in this PR.
- `src/ascend/common.h`: wrap `aclTensor` in backticks in two comments (matches comment 9 on Markdown formatting in custom_kernel).
- `tests/utils.py`: add missing blank line before the trailing `return` in `get_stream()` (matches comments 3/5 on missing blank line before return in non-block-body context).

No camelCase-local violations in the framework C++ headers (`atb_common_`, `common`, `data_type_`, `device_`, `workspace_pool_`, `hash`, `operator`, `pybind11_utils`) — reviewer comment 6 was specific to `custom/` op_host code adapted from vllm-ascend.
Reviewer @voltjia on PR #64 inline comments:
- Comment 6: local variables must follow the Google C++ Style Guide (`dimLength` → `dim_length`, etc.). Applied across all locals in the two op_host files.
- Comment 7: namespace `ascend_kernel` is non-standard; use `detail` or `ascend::detail` to match other platforms. Renamed to `ascend::detail` in `ops.h`, `torch_binding.cpp`, `utils/torch_kernel_helper.h`, and both `op_host/*.cpp` files.

The library name (`ascend_kernel` → `libascend_kernel.so`), `OP_PLUGIN_NAME`, and Python-import name are unchanged — those are compile/link identity and are independent of the C++ namespace. `kernel_custom.h` in op-norm-rope links via the C `extern` launch symbol, not the namespace, so this rename does not ripple into that branch.

Also took the opportunity to backtick-wrap identifiers in comments that the rename touched. Inline comments 8 and 9 (Markdown formatting in comments) were already covered by the backtick pass in commit 0aed3a5 for non-custom files; the `custom/` comments here also get normalized as a side effect of rewriting the affected lines.
Scanned ALL 30 inline comments on PR #64 (not just the 10 visible in collapsed view). 22 had been missed by the earlier passes.

Generator (`scripts/generate_wrappers.py`):
- Comments 8-10: swap `stream` and `implementation_index` in both the pybind lambda parameters and the `py::arg` declarations, to match the `Operator::Call(Handle, Config, ...)` order (Handle first, Config second). Previously ordered impl_index first for lambda-signature alignment; with the swap, both are reordered together so kwargs still resolve correctly.
- Comment 11: restore backticks around device names in `--devices` help text.
- Comment 12: `.def_static("clear_cache", ...)` kept — it is the API used by the new `_clear_operator_caches` pytest fixture.

`CMakeLists.txt`:
- Comments 13-14: wrap `NEEDED` and `torch_npu` in Markdown backticks in comments.

`tests/conftest.py` (comments 23-29):
- Reset the file to master's content and re-apply only the two new fixtures (`_clear_operator_caches`, `skip_op_without_platform_impl`) with Markdown docstrings (single backticks, not rST double). Reverts incidental changes to the `pytest_addoption` help text, the `skip_unsupported_dtypes` rename, the `_PLATFORM_TO_TORCH_DEVICE` dict order, the `_resolve_device` docstring, and the `torch_npu` comment line-wrap.
- Fix comment 27's concern: `_TORCH_DEVICE_TO_PLATFORMS` now maps one torch device type to multiple platforms (`cuda` → `{nvidia, metax, iluvatar}`) and `skip_op_without_platform_impl` checks `active_implementation_indices` across all of them; it skips only when every mapped platform reports empty.

`tests/utils.py`:
- Comment 16: remove `get_npu_stream`; `get_stream(device)` covers all torch device types.

`tests/test_{add,causal_softmax,gemm,rms_norm,swiglu}.py`:
- Comments 17-22: replace the `if device.type == "npu"` branches with a single call that passes `stream=get_stream(<tensor>.device)`. Single-line import restored in `test_add.py` (comment 22 — format minimization after dropping the `get_npu_stream` import).
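The multi-platform skip logic for comment 27 could be sketched as follows (the `cuda` mapping is quoted from the description; the predicate name and the stub API shape are illustrative, not the actual conftest code):

```python
# One torch device type can back several InfiniOps platforms (comment 27).
_TORCH_DEVICE_TO_PLATFORMS = {
    "cuda": {"nvidia", "metax", "iluvatar"},  # mapping quoted from the description
    "npu": {"ascend"},
}

def should_skip_on(op_cls, torch_device_type: str) -> bool:
    """Skip only when EVERY platform mapped to this torch device type reports
    an empty `active_implementation_indices` (API shape assumed here)."""
    platforms = _TORCH_DEVICE_TO_PLATFORMS.get(torch_device_type, set())
    return bool(platforms) and all(
        not op_cls.active_implementation_indices(p) for p in platforms
    )
```

The `all(...)` over the mapped set is the key change: a `cuda` test runs as long as any one of `nvidia` / `metax` / `iluvatar` has an implementation.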
test_gemm.py specifically: moved the "impl=2 on Ascend is broken because of `src/torch/gemm/gemm.h` SFINAE pollution" workaround from the helper-level conditional into a `pytest.skip` at the top of the test body, so the helper itself becomes unconditional.
- `src/ascend/custom/utils/torch_kernel_helper.h`: clang-format wrapped a long `ConvertTypes` macro continuation. - `tests/test_add.py`: ruff `format` wrapped the 5-import `tests.utils` line (89 chars, over the default 88 limit) back into multi-line form. Reviewer comment 22 suggested restoring a single line after dropping `get_npu_stream`, but with `get_stream` added the shortened form still exceeds the ruff line-length cap.
…kticks

Scan-and-fix pass for identifiers in comments that still lack Markdown backticks, matching reviewer comments 9, 11, 13, 14 on PR #64. Applied only to files authored/modified by this PR (leaves `custom/cmake/config_envs.cmake` and similar vllm-ascend-verbatim content untouched to stay consistent with the upstream it was adapted from).
- `CMakeLists.txt`: `pybind11` (line 7).
- `src/ascend/common.h`: `shape`, `strides`, `storage_shape`, `dtype` in the `AclTensorCache` class doc.
- `src/ascend/custom/CMakeLists.txt`: `AscendC` toolchain reference.
- `src/ascend/custom/build.sh`: `AscendC`, `libascend_kernel.so`.
- `src/ascend/custom/cmake/config_ascend.cmake`: `SOC_VERSION`, `CANN`, `AscendC`.
Per Google C++ Style Guide §Function Names: ordinary non-accessor functions are PascalCase; accessors/mutators (get/set on class members) are snake_case. These 7 are standalone helpers / converters / predicates — not member accessors — so they need PascalCase:
- `threadLocalAtbContext` → `ThreadLocalAtbContext`
- `getAtbContext` → `GetAtbContext`
- `toAtbTensor` (×2) → `ToAtbTensor`
- `isAclRuntimeAlive` → `IsAclRuntimeAlive`
- `buildAclTensor` → `BuildAclTensor`
- `toAclDtype` → `ToAclDtype`
- `isIntegerDtype` → `IsIntegerDtype`

CANN APIs (`aclrtGetDevice`, `aclCreateTensor`, …), STL/PyTorch interop methods (`begin`/`end`/`size`/`data`/…), and class accessors (`get_<field>`/`set_<field>`) are all kept as-is — they either belong to another vendor or match the "looks like a variable" exception. Callers on the three category branches (op-simple / op-norm-rope / op-cache-attn) will pick up the new names automatically on rebase.
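Mechanically, every rename in the list above is the same initial-capital transform; a throwaway sketch of the mapping (for illustration only; the actual renames were applied directly in the source):

```python
def to_pascal(name: str) -> str:
    """camelCase helper name -> PascalCase per Google C++ function naming."""
    return name[:1].upper() + name[1:]

# The seven standalone helpers renamed in this commit.
RENAMES = {
    old: to_pascal(old)
    for old in (
        "threadLocalAtbContext", "getAtbContext", "toAtbTensor",
        "isAclRuntimeAlive", "buildAclTensor", "toAclDtype", "isIntegerDtype",
    )
}
```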
…ticks in common.h

- `src/ascend/custom/cmake/config_envs.cmake`: capitalize + period + Markdown backticks on all comments and status messages.
- `src/ascend/custom/cmake/config_ascend.cmake`: fix `CANN` casing and backticks in the fatal-error message.
- `src/ascend/custom/CMakeLists.txt`: polish status messages and inline comments (Markdown backticks + sentence case).
- `src/ascend/common.h`: restore `Gemm` and `MatMul` backticks in the `BuildAclTensor` docstring per PR #64 review.
…`TORCH_CHECK`, Google C++ naming
- `workspace_pool_.h`: drop commented-out `<cinttypes>` / `<cstdio>` includes (transitively available).
- `device_.h`: switch relative `../device.h` to absolute `device.h` — the historical `src/ascend/device.h` naming collision is no longer relevant.
- `custom/{add_rms_norm,rms_norm}/op_host/*.cpp`: drop unneeded BSD-3-Clause headers and switch `TORCH_CHECK` messages to Markdown-backticked identifiers.
- `custom/{add_rms_norm,rms_norm}/op_kernel/*.cpp`: drop unneeded BSD-3-Clause headers.
- Rename wrapper functions to PascalCase per Google C++ Style: `add_rms_norm` → `AddRmsNorm`, `rms_norm` → `RmsNorm` (ops.h + torch_binding.cpp updated; `torch.ops.npu.rms_norm` registry name unchanged; kernel entry-point names stay snake_case as required by `EXEC_KERNEL_CMD`).
…`TORCH_CHECK`, Google C++ naming
- `workspace_pool_.h`: uncomment `<cinttypes>` / `<cstdio>` (needed for `PRIu64` and `fprintf` in the destructor; not transitively available on all platforms).
- `device_.h`: switch relative `../device.h` to absolute `device.h` — the historical `src/ascend/device.h` naming collision is no longer relevant.
- `custom/{add_rms_norm,rms_norm}/op_host/*.cpp`: drop unneeded BSD-3-Clause headers and switch `TORCH_CHECK` messages to Markdown-backticked identifiers.
- `custom/{add_rms_norm,rms_norm}/op_kernel/*.cpp`: drop unneeded BSD-3-Clause headers.
- Rename wrapper functions to PascalCase per Google C++ Style: `add_rms_norm` → `AddRmsNorm`, `rms_norm` → `RmsNorm` (ops.h + torch_binding.cpp updated; `torch.ops.npu.rms_norm` registry name unchanged; kernel entry-point names stay snake_case as required by `EXEC_KERNEL_CMD`).
Force-pushed from 4c5960a to cb1d29b.
Force-pushed from cb1d29b to 70542ec.
Force-pushed from 70542ec to 720234d.
Summary
Prerequisite PR for the Ascend operator split. Ships the shared framework,
build, and test infrastructure required by the three follow-up category
PRs, plus three independent bug fixes surfaced during the split.
No operator kernels are added in this PR — each category PR (op-simple /
op-norm-rope / op-cache-attn) ships its operators atomically together with
their `src/base/<op>.h` declarations and `tests/test_<op>.py`.

Contents
Ascend framework headers (shared across every Ascend operator):
- `src/ascend/common.h` — `AclTensorCache` + `toAclDtype` helpers
- `src/ascend/workspace_pool_.h` — stream-scoped workspace pool with named arenas; entry points are `GetWorkspacePool()` / `Pool::Ensure()` (matches master PR "refactor: group backends by hardware category" #60)
- `src/ascend/atb_common_.h` — ATB `Context` management + `toAtbTensor` for operators wrapping ATB APIs
- `src/ascend/runtime_.h`, `data_type_.h`, `device_.h` — Ascend-specific `Runtime` and type-mapping specializations

Custom-kernel build infra:
- `src/ascend/custom_kernel/` — AscendC standalone build system producing `libascend_kernel.so`. Gated by its own `CMakeLists.txt`; the actual kernels (`rms_norm`, `add_rms_norm`) are consumed by `kernel_custom.h` variants that land in the follow-up norm-rope PR.

Core framework (shared helpers referenced by the generator):
- `src/hash.h` — cache-key plumbing for `Operator::call()`
- `src/operator.h` — `Operator<Key, device, N>` dispatch + SFINAE `ActiveImplementationsImpl`
- `src/pybind11_utils.h` — tensor / optional-tensor / vector-tensor casters

Build (`CMakeLists.txt`, `src/CMakeLists.txt`) — Ascend target, ATB discovery, `WITH_ASCEND` option; no per-op `add_subdirectory` needed because the Ascend source list is glob-based (`ascend/*.cc *.cpp`).

Test infra (`tests/conftest.py`, `tests/utils.py`) — `Payload`, `randn_strided`, `get_npu_stream`, `get_stream(device)`, and device parametrization (`--devices ascend/nvidia/...`).

CI (`.ci/run.py`, `.ci/images/ascend/Dockerfile`) — unchanged apart from the fixes below.
Three bug fixes (bundled with related files)

1. fix(scripts): align `py::arg` order with C++ lambda params

   The pybind11 bindings emitted by `scripts/generate_wrappers.py` listed `py::arg` entries in a different order than the C++ lambda parameters. When callers used kwargs, `implementation_index` and `stream` were silently swapped — the stream integer landed in the impl-index slot, and dispatch SIGABRT'd. Re-order so kwarg names line up positionally with the C++ signature. Also generalize `= py::none()` default emission to all `std::optional<...>` parameters (previously only `std::optional<Tensor>`).

2. fix(ci): treat exit 137 as success when the pytest junit XML is clean

   Docker 18.09 occasionally SIGKILLs the container during its `chown` teardown step, so `.ci/run.py` exits with code 137 even when pytest itself completed normally. Parse the `/workspace/results/test-results.xml` `errors` / `failures` fields and treat 137 as success when pytest reports no failures.

3. test(conftest): auto-skip tests whose op has no impl on the target device

   Adds a `skip_op_without_platform_impl` autouse fixture that derives the InfiniOps class name from the test module filename (`tests/test_<snake>.py` → `<Snake>`) and checks `active_implementation_indices` for the parametrized device. When the op has no backend specialization, the test is skipped instead of SIGABRTing through `Operator<Op, device>::Make()`. This is essential for the operator split: each per-category branch contains only its category's Ascend impls but inherits test files for all operators from master. Without this guard, `pytest tests/ --devices ascend` crashes on ops lacking ascend impls on the branch.

Test-file updates (for master-existing ops)

`tests/test_add.py`, `test_gemm.py`, `test_causal_softmax.py`, `test_rms_norm.py`, `test_swiglu.py` updated to match the framework PR's `tests/utils.py` helpers and to guard Ascend dispatch against unsupported `implementation_index` values (fixes a pre-existing master bug where `Operator<Gemm, kAscend, 2>` was mistakenly reported active by SFINAE because `src/torch/gemm/gemm.h` specializes `Operator<Gemm, kDev, 2>` for all `kDev`).

Verification
- `python3 .ci/run.py --local --gpu-id <N>` — framework-only build succeeds; `libinfiniops.so` links cleanly (the Ascend glob finds only the `src/ascend/gemm/kernel.h` that already exists on master; the generator emits bindings for master's 10 base headers).
- `tests/test_gemm.py` — 1500 skipped expected (Ascend-only parametrize; only impl=0 runs, impl=1/2 skip via the `active_implementation_indices` guard introduced in this PR).
Merge order
This PR must merge first. The three follow-up category PRs rebase
onto master after this lands.
Test plan
- `python3 .ci/run.py --local` — framework build + master's gemm tests pass on Ascend 910B / CANN 8.5.1
- `scripts/generate_wrappers.py --devices ascend` — generates `generated/bindings/ops.cc` with the 10 master base headers; no link error
- `clang-format` passes locally on all tracked `*.h` / `*.cc` / `*.cuh` / `*.mlu` — this PR does not alter those backends' operator paths