feat(ascend): add Ascend framework layer — runtime, type mapping, bui…#46
Merged
Force-pushed from fb9f42f to 62fb25a.
Per-platform author comments: nv | metax | iluvatar | cambricon | moore | ascend
Force-pushed from 80acc8b to 7628b2f.
voltjia requested changes (Apr 13, 2026).
zhangyue207 pushed a commit that referenced this pull request (Apr 13, 2026).
- Rename `toAclDtype` → `ToAclDtype`, `isIntegerDtype` → `IsIntegerDtype` (Google C++ Style Guide PascalCase).
- Reorder `switch` cases in `ToAclDtype` to match the `DataType` enum definition.
- Simplify the `device_.h` include to `#include "device.h"`.
- Add Markdown backticks to code references in comments and help messages.
- Add blank lines before `return`/`if` per the CONTRIBUTING.md Python style rules.
- Reorder pybind11-generated params: `Handle` (`stream`) before `Config` (`implementation_index`), matching the `Operator::call` signature.
- Rename `Matmul` → `MatMul` (ONNX convention), params → `input`/`other`/`out`; remove `trans_a`/`trans_b` (use `Gemm` for transposed matmul).
- Rename `AddRmsNorm` params: `x1`/`x2`/`gamma` → `input`/`other`/`weight`, `y_out`/`x_out` → `out`/`rstd_out` (PyTorch conventions).
- Rename `skip_unsupported_dtype` → `skip_unsupported_dtypes`.
- Replace `get_npu_stream` with a generic `get_stream(device)` using `torch.accelerator.current_stream` with device-specific fallbacks.
- Reorder `_PLATFORM_TO_TORCH_DEVICE` with `nvidia` first.
…ld integration

Add Ascend platform scaffolding:

- `device_.h`: `DeviceEnabled<kAscend>` specialization
- `data_type_.h`: `toAclDtype()`, `isIntegerDtype()`
- `common.h`: `buildAclTensor()` with optional transpose
- `workspace_pool_.h`: stream-keyed workspace allocator
- `runtime_.h`: `Runtime<kAscend>` (Malloc, Free, Memcpy, Memset)
- 5 new operator base classes (`AddRmsNorm`, `FlashAttention`, `Matmul`, `ReshapeAndCache`, `RotaryEmbedding`)

Integrate into the CMake build system, Python binding generation (stream + optional tensor support), and the examples runtime API.
…emove missing include

- Wrap `aclrtMemcpy` (5-arg) and `aclrtMemset` (4-arg) in lambdas to match the generic 4-arg / 3-arg calling convention used by the examples.
- Assert the `aclrtMalloc` return value in `WorkspacePool::ensure()`.
- Remove the `ascend/gemm/kernel.h` include from `runtime_api.h` (the file does not exist until the kernels commit).
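The arity adaptation can be sketched with stand-in functions. The real ACL signatures are `aclrtMemcpy(dst, destMax, src, count, kind)` and `aclrtMemset(dst, destMax, value, count)`; the generic convention assumed below is `cudaMemcpy`/`cudaMemset`-shaped, and reusing `count` as the destination capacity is one plausible way the lambdas could bridge the gap (the fake functions and the exact adapter shape are illustrative, not the PR's code):

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical stand-ins for the ACL runtime calls; they wrap libc and
// return 0 on success, mimicking the aclError convention.
int FakeAclrtMemcpy(void *dst, std::size_t dst_max, const void *src,
                    std::size_t count, int kind) {
    (void)dst_max;
    (void)kind;
    std::memcpy(dst, src, count);
    return 0;
}

int FakeAclrtMemset(void *dst, std::size_t dst_max, int value,
                    std::size_t count) {
    (void)dst_max;
    std::memset(dst, value, count);
    return 0;
}

// Lambdas adapt the 5-arg / 4-arg vendor calls to the generic 4-arg /
// 3-arg convention by folding the destination-capacity argument into
// `count`.
auto generic_memcpy = [](void *dst, const void *src, std::size_t count,
                         int kind) {
    return FakeAclrtMemcpy(dst, count, src, count, kind);
};

auto generic_memset = [](void *dst, int value, std::size_t count) {
    return FakeAclrtMemset(dst, count, value, count);
};
```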
- Add an Ascend GEMM specialization using `aclnnAddmm`/`aclnnBaddbmm`.
- Add a `get_npu_stream()` helper and NPU device detection in test utils.
- Add a `skip_unsupported_dtype` fixture for Ascend in conftest.
- Update `runtime_api.h` with the Ascend backend entry.
The `aclrtMalloc` call was the sole expression inside `assert()`, so it was compiled away in release builds (NDEBUG). This left the workspace buffer null, causing `aclnnAddmm` to return ACLNN_ERR_PARAM_NULLPTR (161001) for any operation that requires workspace (e.g. alpha != 1.0).
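The failure mode generalizes to any side-effecting call placed inside `assert()`. A minimal sketch, using a fake allocator in place of `aclrtMalloc` (names are illustrative, not the real ACL API or the PR's `WorkspacePool` code):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical stand-in for aclrtMalloc: returns 0 on success.
int FakeMalloc(void **ptr, std::size_t size) {
    *ptr = std::malloc(size);
    return *ptr != nullptr ? 0 : 1;
}

// Buggy pattern: the allocation is the assert's only expression, so
// compiling with -DNDEBUG removes the call entirely and the caller
// receives a null buffer.
void *EnsureBuggy(std::size_t size) {
    void *buf = nullptr;
    assert(FakeMalloc(&buf, size) == 0);
    return buf;  // null in release builds
}

// Fixed pattern: perform the call unconditionally, assert only on the
// captured result.
void *EnsureFixed(std::size_t size) {
    void *buf = nullptr;
    int rc = FakeMalloc(&buf, size);
    assert(rc == 0);
    (void)rc;  // silence unused-variable warning under NDEBUG
    return buf;
}
```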
`CudaCausalSoftmax` was missing `#include "cuda/runtime_utils.h"`, causing `RuntimeUtils` to be undefined.

Drop `std::forward` from the `Operator::make` nested lambda: NVCC instantiates the body during SFINAE invocability checks even inside `if constexpr` false branches, causing template resolution failures. All operator constructors take parameters by value, so passing lvalues has identical semantics.
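The equivalence argument can be sketched as follows. `Op` and `MakeWithoutForward` are illustrative stand-ins, not the project's actual `Operator::make`; the point is only that when constructors take parameters by value, forwarding buys at most one saved move:

```cpp
#include <memory>
#include <string>
#include <utility>

// Toy operator whose constructor takes its parameter by value, as the
// commit message says all operator constructors do.
struct Op {
    std::string name;
    explicit Op(std::string n) : name(std::move(n)) {}
};

// Factory without std::forward: args are taken by value and passed as
// lvalues. This sidesteps the NVCC issue of instantiating the
// std::forward call during SFINAE checks, at the cost of one extra
// move/copy versus perfect forwarding.
template <typename T, typename... Args>
std::unique_ptr<T> MakeWithoutForward(Args... args) {
    return std::make_unique<T>(args...);
}
```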
Upgrade base image from `nvcr.io/nvidia/pytorch:24.10-py3` (CUDA 12.6) to `25.12-py3` (CUDA 13.1), aligning CI with the local dev environment. Restore `std::forward<Args>(args)...` in `Operator::make`, as the NVCC bug that required dropping it is fixed in the newer toolkit.
Narrowing `Tensor::Size` (`unsigned long`) to `int64_t` is an error on MetaX's clang-based compiler (`-Wc++11-narrowing`).
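A minimal sketch of the usual fix, assuming the narrowing occurred in a brace-initialization context (`Size` and `ToInt64` are stand-in names; the real `Tensor` layout is not shown in the commit message):

```cpp
#include <cstdint>

// Stand-in for Tensor::Size from the commit message.
using Size = unsigned long;

std::int64_t ToInt64(Size n) {
    // Brace initialization such as `std::int64_t{n}` is rejected by
    // clang-based compilers (-Wc++11-narrowing) because unsigned long
    // does not convert losslessly to int64_t. An explicit static_cast
    // marks the conversion as intentional.
    return static_cast<std::int64_t>(n);
}
```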
- Add blank lines between struct/class members per the style guide.
- Capitalize comments and use backtick syntax for code refs in `matmul.h`.
- Move `import re` to module level in `generate_wrappers.py`.
- Add blank lines before `for`/`return` per PEP 8 in `generate_wrappers.py`.
- Replace `-k npu` with `--devices ascend` in the CI config.
- Fix `ruff format` violations in `generate_wrappers.py` and `test_gemm.py`.
- Fix `ruff isort` violation: move `import re` into the stdlib group.
- Add backticks around identifiers in comments (`numel()`, `operator()`, `make()`, `torch_npu`, `uint16`/`uint32`/`uint64`).
- Add a missing blank line after the `if` block in `skip_unsupported_dtype`.
- Remove `.worktrees/` from the project `.gitignore` (it belongs in the global gitignore).
…peAndCache`, `RotaryEmbedding`
The codegen script `generate_wrappers.py` uses `_snake_to_pascal()` to derive the class name from the filename: `matmul` → `Matmul`, but the class was renamed to `MatMul` (ONNX convention). Renaming the file to `mat_mul.h` makes `_snake_to_pascal("mat_mul")` → `MatMul`, fixing the `IndexError: list index out of range` build failure.
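The filename-to-class mapping can be illustrated with a plausible conversion helper. This is a sketch of the behavior the commit message describes, not the actual `_snake_to_pascal` implementation from `generate_wrappers.py`:

```python
def snake_to_pascal(name: str) -> str:
    """Convert a snake_case filename stem to PascalCase.

    Splits on underscores and capitalizes each segment, so the
    underscore placement in the filename controls the interior
    capitalization of the generated class name.
    """
    return "".join(part.capitalize() for part in name.split("_"))


# The rename in this commit hinges on this distinction:
#   "matmul"  -> "Matmul"   (no underscore, single capitalized word)
#   "mat_mul" -> "MatMul"   (underscore yields the ONNX-style name)
```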
The CI image has not been rebuilt with the 25.12 base yet, so NVCC 12.6 (in `24.10-py3`) still instantiates the `std::forward` call inside `if constexpr` false branches. Drop `std::forward`; all operator constructors take parameters by value, so passing lvalues is equivalent.
This reverts commit bf9e4b1.
Force-pushed from bf9e4b1 to 7398f9f.
voltjia approved these changes (Apr 14, 2026).