feat(hygon-gemm): add Hygon backend support for Add/Gemm by gongchensu · Pull Request #31 · InfiniTensor/InfiniOps

gongchensu · 2026-03-24T01:50:34Z

Summary

Add Hygon backend infrastructure under src/native/cuda/hygon/:
- src/native/cuda/hygon/device_.h
- src/native/cuda/hygon/device_property.h
- src/native/cuda/hygon/runtime_.h
- src/native/cuda/hygon/runtime_utils.h
- src/native/cuda/hygon/blas.h
- src/native/cuda/hygon/blas_utils.h
Add Hygon build integration through WITH_HYGON:
- Update backend/device auto-detection and Hygon toolchain setup in CMakeLists.txt.
- Register Hygon sources and backend wiring in src/CMakeLists.txt.
- Preserve the existing mutual-exclusion behavior among CUDA-like GPU backends.
Update Python binding device-name resolution in src/pybind11_utils.h so CUDA-compatible PyTorch device names resolve to the active InfiniOps backend, including Hygon.
Add Hygon CI and test-environment integration:
- .ci/config.yaml
- .ci/images/hygon/Dockerfile
- tests/conftest.py
Update build/example/documentation support for the new backend:
- examples/CMakeLists.txt
- README.md
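
The device-name resolution described above for src/pybind11_utils.h can be pictured with a small sketch. This is a Python illustration only, since the actual change is in a C++ header. The function name `resolve_device_name`, the set `CUDA_LIKE_BACKENDS`, and the backend strings are hypothetical and are not taken from the InfiniOps source.

```python
# Hypothetical sketch: the real resolution logic is C++ in src/pybind11_utils.h.
# The backend names below are assumptions made for illustration.
CUDA_LIKE_BACKENDS = {"nvidia", "iluvatar", "metax", "hygon"}


def resolve_device_name(name: str, active_backend: str) -> str:
    """Map a PyTorch-style or backend-specific device name to a backend name."""
    device_type = name.split(":", 1)[0].lower()  # "cuda:0" -> "cuda"
    if device_type == "cpu":
        return "cpu"
    # A CUDA-compatible PyTorch device type resolves to whichever
    # CUDA-like backend this build enabled.
    if device_type == "cuda" and active_backend in CUDA_LIKE_BACKENDS:
        return active_backend
    # Backend-specific internal names are accepted when they match the build.
    if device_type == active_backend:
        return active_backend
    raise ValueError(
        f"Cannot resolve device name {name!r} for backend {active_backend!r}."
    )


print(resolve_device_name("cuda:0", "hygon"))  # hygon
print(resolve_device_name("hygon", "hygon"))   # hygon
```

Under these assumptions, a tensor that PyTorch reports as `cuda:0` on a Hygon build dispatches to the Hygon backend, while a name that matches neither the PyTorch device type nor the active backend is rejected.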

Motivation

This PR introduces the Hygon backend infrastructure for InfiniOps.

Hygon DCU platforms expose a CUDA/HIP-compatible programming model, so InfiniOps can reuse the existing CUDA-style backend organization while adding Hygon-specific runtime, device, BLAS, and build-system integration. This makes it possible to compile and dispatch InfiniOps on Hygon hardware without changing the public operator API.

This PR is intentionally limited to backend infrastructure and related build/CI/test integration. It establishes the shared Hygon foundation needed by follow-up operator PRs, including:

- Hygon runtime and device abstractions
- Hygon BLAS integration utilities
- Hygon build and toolchain configuration
- Hygon CI and test-device wiring
- Python binding device-name resolution for the active backend

Operator implementations such as Add and Gemm will be submitted separately in follow-up PRs on top of this infrastructure layer.

No linked issue.

Type of Change

feat — new feature / new operator / new platform
fix — bug fix
perf — performance improvement (no behavioral change)
refactor — code restructuring without behavior change
test — adding or fixing tests only
docs — documentation only
build / ci — build system or CI configuration
chore — tooling, formatting, or other non-code changes
Breaking change (requires a ! in the Conventional Commits prefix or a BREAKING CHANGE: footer)

Platforms Affected

Test Results on Supported Platforms

Platform	Built	`pytest` Result
NVIDIA	Successfully installed InfiniOps-0.1.0	5 failed, 3108 passed, 1000 skipped in 191.93s (0:03:11)
Iluvatar	Successfully installed InfiniOps-0.1.0	5 failed, 2909 passed in 231.55s (0:03:51)
MetaX	Successfully installed InfiniOps-0.1.0	5 failed, 2909 passed in 361.25s (0:06:01)
Cambricon	Successfully installed InfiniOps-0.1.0	5 failed, 2179 passed, 1511 skipped, 6 warnings in 841.64s (0:14:01)
Moore	Successfully installed InfiniOps-0.1.0	5 failed, 3600 passed, 315 skipped in 756.88s (0:12:36)
Ascend	Successfully installed InfiniOps-0.1.0	5 failed, 3815 passed, 90 skipped in 492.85s (0:08:12)
Hygon	Successfully installed InfiniOps-0.1.0	3180 passed, 441 skipped in 18.35s
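
To compare rows of the table above programmatically, the `pytest` summary strings can be parsed into counts. The following is a minimal sketch; `parse_pytest_summary` is a hypothetical helper, not part of InfiniOps or pytest, and it only handles the outcome words that appear in this table.

```python
import re


def parse_pytest_summary(summary):
    """Extract outcome counts and wall time from a pytest summary string."""
    counts = {"failed": 0, "passed": 0, "skipped": 0}
    # Each outcome appears as "<count> <outcome>", e.g. "5 failed".
    for number, outcome in re.findall(r"(\d+) (failed|passed|skipped)", summary):
        counts[outcome] = int(number)
    # The elapsed time appears as "in <seconds>s".
    duration = re.search(r"\bin ([\d.]+)s", summary)
    counts["seconds"] = float(duration.group(1)) if duration else None
    return counts


nvidia = parse_pytest_summary(
    "5 failed, 3108 passed, 1000 skipped in 191.93s (0:03:11)"
)
hygon = parse_pytest_summary("3180 passed, 441 skipped in 18.35s")
print(nvidia["failed"], hygon["failed"])  # 5 0
```

Outcomes missing from a summary line, such as `failed` in the Hygon row, default to zero.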

Full `pytest` output (optional)

TODO: paste full or trimmed pytest output here.

Benchmark / Performance Impact

No benchmark numbers are included in this PR.

This PR adds initial Hygon backend support and validates functionality through Add and Gemm. It does not change existing CPU, NVIDIA, Iluvatar, MetaX, Cambricon, Moore, or Ascend implementations. Performance tuning for Hygon kernels and BLAS usage can be handled in follow-up PRs once the backend integration is established.

Notes for Reviewers

The Hygon backend is added as a CUDA-like backend and participates in the existing CUDA-like backend mutual-exclusion logic.
src/hygon/gemm/cublas.h intentionally follows the existing BLAS-backed GEMM structure used by CUDA-like platforms.
src/pybind11_utils.h now accepts backend-specific internal device names in addition to PyTorch device names, so a CUDA-compatible PyTorch device type can resolve to the enabled InfiniOps backend.
This PR is split into three reviewable commits:
- feat(hygon): add Hygon backend infrastructure
- feat(hygon-add): add Hygon backend support for Add
- feat(hygon-gemm): add Hygon backend support for Gemm
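
The mutual-exclusion rule mentioned in the first note can be sketched as follows. The real check is CMake logic in CMakeLists.txt; this Python version only illustrates the rule. `WITH_HYGON` is the option named in this PR, while the other option names and the function `select_cuda_like_backend` are assumptions.

```python
# Hypothetical sketch of the CMake check, written in Python for illustration.
# Only WITH_HYGON is named in this PR; the other option names are assumptions.
CUDA_LIKE_OPTIONS = ("WITH_NVIDIA", "WITH_ILUVATAR", "WITH_METAX", "WITH_HYGON")


def select_cuda_like_backend(options):
    """Return the single enabled CUDA-like option, or None if none is enabled."""
    enabled = [name for name in CUDA_LIKE_OPTIONS if options.get(name, False)]
    if len(enabled) > 1:
        # Mirrors a configure-time error when two CUDA-like backends are requested.
        raise ValueError(
            "Only one CUDA-like backend may be enabled, got: " + ", ".join(enabled)
        )
    return enabled[0] if enabled else None


print(select_cuda_like_backend({"WITH_HYGON": True}))  # WITH_HYGON
```

Requesting two of these options at once raises an error in this sketch, which is the behavior the PR says it preserves.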

Checklist

Every contributor must verify every item below before requesting
review. Tick each box only after the check has actually been performed —
do not tick speculatively. If an item truly does not apply, replace the
checkbox with N/A and briefly explain why in an inline comment.

Title, Branch, and Commits

PR title follows Conventional Commits (e.g. feat(hygon): add Add and Gemm backend support).
Branch name follows <type>/xxx-yyyy-zzzz where <type> matches the PR title's Conventional Commits type and words are joined with hyphens (see CONTRIBUTING.md §Branches).
Each commit message follows Conventional Commits.
Small PR is a single squashable commit; or, for a large PR, every commit is meaningful, well-formed, and independently reviewable (see CONTRIBUTING.md §Pull Requests #1).
No stray merge commits from master — the branch is rebased cleanly on top of the current master.
No fixup! / squash! / wip commits remain.

Scope and Design

Changes are minimal: this PR only adds Hygon backend infrastructure plus Hygon Add and Gemm support (CONTRIBUTING.md §Code/General #1).
No dead code, commented-out blocks, debug prints, printf/std::cout/print(...) left behind, or TODO without an owner and issue link.
No unrelated formatting churn that would obscure the diff.
Public API changes are intentional, documented, and reflected in affected callers/tests.

General Code Hygiene (applies to all languages)

The code is self-explanatory; comments were added only where the why is non-obvious (CONTRIBUTING.md §Code/General #2).
Every modified or added file ends with a single trailing newline (CONTRIBUTING.md §Code/General #3).
No trailing whitespace, tab/space mixing, or stray BOMs.
Identifiers in comments and error messages are wrapped in backticks where applicable (e.g. the `seqlens_k` tensor) (CONTRIBUTING.md §Code/General #4).
All comments and error messages are in English (CONTRIBUTING.md §Code/General #5).
Comments and error messages are complete sentences (capitalized first letter, terminal punctuation) unless the language/framework convention says otherwise (CONTRIBUTING.md §Code/General #6; Python #2).

C++ Specific (if C++ files changed)

Python Specific (if Python files changed)

N/A — no Python source files were changed in this PR.
N/A — ruff check is not applicable to this PR because no Python source files were changed.
N/A — ruff format --check is not applicable to this PR because no Python source files were changed.
N/A — no Python comments or error messages were added.
N/A — no Python control-flow formatting was changed.
N/A — no Python docstrings were added or changed.
N/A — no Python type hints were added or changed.

Testing

pytest was run locally on Hygon hardware, and the results are recorded in the "Test Results" table above (CONTRIBUTING.md §Pull Requests #3).
For any platform that could not be tested, an explicit reason is given in the table and a reviewer with access has been tagged.
New functionality has matching coverage through existing tests under tests/, especially tests/test_add.py and tests/test_gemm.py (CONTRIBUTING.md §Adding an Operator #3).
Tests use pytest.mark.parametrize correctly through the existing project test patterns.
Where appropriate, existing pytest.mark.auto_act_and_assert coverage is reused and the test returns a Payload whose func and ref share the same calling convention.
Default dtype / device parameterization is relied on, or overridden with an explicit pytest.mark.parametrize when necessary.
N/A — no new flaky test was added.
N/A — this is a new backend feature, not a bug fix requiring a regression test that fails on master.

Build, CI, and Tooling

The project builds cleanly from a fresh directory with pip install .[dev] on Hygon.
compile_commands.json still regenerates (CMake option CMAKE_EXPORT_COMPILE_COMMANDS=ON in pyproject.toml — required by the code-lint skill and clang-tidy -p).
Hygon has been added to auto-detection in CMakeLists.txt under if(AUTO_DETECT_DEVICES) and to if(AUTO_DETECT_BACKENDS) where applicable.
Only one CUDA-like GPU backend is selectable at a time — the existing mutual-exclusion check in CMakeLists.txt is not broken.
clang-format.yml is green locally.
N/A — ruff.yml is not applicable to this PR because no Python source files were changed.
No new runtime dependency was added without updating pyproject.toml's [project.optional-dependencies].

Documentation

README.md, examples, or inline docs were updated where Hygon behavior, build flags, or developer workflow changed.
New Hygon dispatch helpers and runtime utilities follow the existing backend layout and naming conventions.
N/A — no user-visible breaking change is introduced.

Security and Safety

No secrets, access tokens, internal URLs, customer data, or personal hardware identifiers have been committed.
Third-party code is license-compatible and attributed where applicable.
No unsafe pointer arithmetic, uninitialized reads, or missing bounds checks were intentionally introduced.

gongchensu · 2026-03-25T07:28:37Z

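Build and related operator tests on NVIDIA A100:

Build and related operator tests on MetaX:

Build and operator tests on Moore:

Build and operator tests on Cambricon:

Build test on Iluvatar: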

baominghelly · 2026-05-12T02:16:18Z

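The code structure has changed; should the Summary section be updated to match?
Should the CI test results from the other platforms be pasted in as well?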

baominghelly · 2026-05-12T03:25:43Z

Please also fill in these three TODOs:
TODO: paste build result, e.g. Successfully installed InfiniOps-0.1.0
TODO: paste result for tests/test_add.py and tests/test_gemm.py
TODO: Hygon DCU model / driver / runtime version

baominghelly · 2026-05-13T02:50:56Z

Why did 5 tests fail in the test runs?

voltjia

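Operator additions should be split out from the build, CI, and runtime changes rather than submitted together in a single PR. Different operators, such as Add and Gemm, should also be split into separate PRs.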

gongchensu self-assigned this Mar 24, 2026

gongchensu force-pushed the feat/hygon-gemm branch from a56e674 to 2290578 Compare March 25, 2026 06:51

gongchensu force-pushed the feat/hygon-gemm branch 2 times, most recently from 9b9dda2 to e397d93 Compare March 26, 2026 09:03

gongchensu marked this pull request as draft April 13, 2026 08:19

gongchensu force-pushed the feat/hygon-gemm branch from e397d93 to c8d8b56 Compare April 27, 2026 06:26

gongchensu changed the base branch from feat/dev-infra to master April 27, 2026 06:27

gongchensu force-pushed the feat/hygon-gemm branch from c8d8b56 to ceb9a4c Compare April 27, 2026 07:53

gongchensu requested a review from baominghelly April 27, 2026 08:40

gongchensu force-pushed the feat/hygon-gemm branch 2 times, most recently from aa6ed00 to 8fd111a Compare April 28, 2026 02:25

gongchensu marked this pull request as ready for review April 28, 2026 02:44

gongchensu requested a review from a team April 28, 2026 02:44

gongchensu force-pushed the feat/hygon-gemm branch from 8fd111a to b1117d4 Compare April 28, 2026 03:18

gongchensu requested a review from voltjia May 8, 2026 01:40

gongchensu force-pushed the feat/hygon-gemm branch 2 times, most recently from 8a7a264 to c9dccc3 Compare May 11, 2026 07:19

baominghelly requested changes May 12, 2026

baominghelly previously approved these changes May 13, 2026

voltjia requested changes May 15, 2026

gongchensu dismissed baominghelly’s stale review via a8f2bf3 May 18, 2026 08:25

gongchensu force-pushed the feat/hygon-gemm branch from 5901e76 to a8f2bf3 Compare May 18, 2026 08:25

gongchensu requested a review from voltjia May 18, 2026 08:36

gongchensu force-pushed the feat/hygon-gemm branch 2 times, most recently from 5901e76 to 153ea5d Compare May 19, 2026 07:06

feat(hygon): add backend infrastructure

77b13ef

gongchensu force-pushed the feat/hygon-gemm branch from 153ea5d to 77b13ef Compare May 19, 2026 08:06

voltjia reviewed May 20, 2026

Comment thread .github/ci_config.yml

Merge branch 'master' into feat/hygon-gemm

dbe302d

gongchensu force-pushed the feat/hygon-gemm branch from 08ea7a8 to dbe302d Compare May 20, 2026 07:39

baominghelly approved these changes May 20, 2026

voltjia approved these changes May 20, 2026

voltjia merged commit 76094ad into InfiniTensor:master May 20, 2026
22 of 26 checks passed

voltjia merged 2 commits into InfiniTensor:master from gongchensu:feat/hygon-gemm

3 participants
