feat(hygon-gemm): add Hygon backend support for Add/Gemm#31

Merged
voltjia merged 2 commits into InfiniTensor:master from gongchensu:feat/hygon-gemm on May 20, 2026
@gongchensu gongchensu commented Mar 24, 2026

Summary

  • Add Hygon backend infrastructure under src/native/cuda/hygon/:
    • src/native/cuda/hygon/device_.h
    • src/native/cuda/hygon/device_property.h
    • src/native/cuda/hygon/runtime_.h
    • src/native/cuda/hygon/runtime_utils.h
    • src/native/cuda/hygon/blas.h
    • src/native/cuda/hygon/blas_utils.h
  • Add Hygon build integration through WITH_HYGON:
    • Update backend/device auto-detection and Hygon toolchain setup in CMakeLists.txt.
    • Register Hygon sources and backend wiring in src/CMakeLists.txt.
    • Preserve the existing mutual-exclusion behavior among CUDA-like GPU backends.
  • Update Python binding device-name resolution in src/pybind11_utils.h so CUDA-compatible PyTorch device names resolve to the active InfiniOps backend, including Hygon.
  • Add Hygon CI and test-environment integration:
    • .ci/config.yaml
    • .ci/images/hygon/Dockerfile
    • tests/conftest.py
  • Update build/example/documentation support for the new backend:
    • examples/CMakeLists.txt
    • README.md
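
The mutual-exclusion rule for CUDA-like backends mentioned in the build-integration item can be modeled in a few lines. This is a minimal Python sketch of the rule only, not the CMake logic in CMakeLists.txt; the function names and the membership of CUDA_LIKE_FLAGS are assumptions made for illustration.

```python
# Hypothetical model of the mutual-exclusion rule; not InfiniOps code.

# Assumed membership, for illustration only. The authoritative list is
# whatever CMakeLists.txt treats as a CUDA-like backend.
CUDA_LIKE_FLAGS = ("WITH_NVIDIA", "WITH_ILUVATAR", "WITH_HYGON")


def enabled_cuda_like(options):
    """Return the CUDA-like WITH_* flags that are switched on."""
    return [flag for flag in CUDA_LIKE_FLAGS if options.get(flag, False)]


def check_mutual_exclusion(options):
    """Return the single enabled CUDA-like flag, or None if there is none.

    Raises ValueError when more than one CUDA-like backend is enabled.
    """
    enabled = enabled_cuda_like(options)
    if len(enabled) > 1:
        raise ValueError(
            "only one CUDA-like backend may be enabled, got: " + ", ".join(enabled)
        )
    return enabled[0] if enabled else None
```

Flags outside the assumed list (for example WITH_CPU) are ignored by the check, so a CPU plus Hygon configuration passes while a Hygon plus NVIDIA configuration is rejected.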

Motivation

This PR introduces the Hygon backend infrastructure for InfiniOps.

Hygon DCU platforms expose a CUDA/HIP-compatible programming model, so InfiniOps can reuse the existing CUDA-style backend organization while adding Hygon-specific runtime, device, BLAS, and build-system integration. This makes it possible to compile and dispatch InfiniOps on Hygon hardware without changing the public operator API.

This PR is intentionally limited to backend infrastructure and related build/CI/test integration. It establishes the shared Hygon foundation needed by follow-up operator PRs, including:

  • Hygon runtime and device abstractions
  • Hygon BLAS integration utilities
  • Hygon build and toolchain configuration
  • Hygon CI and test-device wiring
  • Python binding device-name resolution for the active backend

Operator implementations such as Add and Gemm will be submitted separately in follow-up PRs on top of this infrastructure layer.

No linked issue.

Type of Change

  • feat — new feature / new operator / new platform
  • fix — bug fix
  • perf — performance improvement (no behavioral change)
  • refactor — code restructuring without behavior change
  • test — adding or fixing tests only
  • docs — documentation only
  • build / ci — build system or CI configuration
  • chore — tooling, formatting, or other non-code changes
  • Breaking change (requires a ! in the Conventional Commits prefix or a BREAKING CHANGE: footer)

Platforms Affected

  • CPU (WITH_CPU)
  • NVIDIA (WITH_NVIDIA)
  • Iluvatar (WITH_ILUVATAR)
  • MetaX (WITH_METAX)
  • Cambricon (WITH_CAMBRICON)
  • Moore (WITH_MOORE)
  • Ascend (WITH_ASCEND)
  • Hygon (WITH_HYGON)
  • PyTorch C++ bindings (WITH_TORCH)
  • Build system / CMake / CI
  • Python bindings / user-facing API

Test Results on Supported Platforms

| Platform | Built | pytest Result | Notes / Hardware |
|---|---|---|---|
| NVIDIA | Successfully installed InfiniOps-0.1.0 | 5 failed, 3108 passed, 1000 skipped in 191.93s (0:03:11) | |
| Iluvatar | Successfully installed InfiniOps-0.1.0 | 5 failed, 2909 passed in 231.55s (0:03:51) | |
| MetaX | Successfully installed InfiniOps-0.1.0 | 5 failed, 2909 passed in 361.25s (0:06:01) | |
| Cambricon | Successfully installed InfiniOps-0.1.0 | 5 failed, 2179 passed, 1511 skipped, 6 warnings in 841.64s (0:14:01) | |
| Moore | Successfully installed InfiniOps-0.1.0 | 5 failed, 3600 passed, 315 skipped in 756.88s (0:12:36) | |
| Ascend | Successfully installed InfiniOps-0.1.0 | 5 failed, 3815 passed, 90 skipped in 492.85s (0:08:12) | |
| Hygon | Successfully installed InfiniOps-0.1.0 | 3180 passed, 441 skipped in 18.35s | |
Full `pytest` output (optional)
TODO: paste full or trimmed pytest output here.

Benchmark / Performance Impact

No benchmark numbers are included in this PR.

This PR adds initial Hygon backend support and validates functionality through Add and Gemm. It does not change existing CPU, NVIDIA, Iluvatar, MetaX, Cambricon, Moore, or Ascend implementations. Performance tuning for Hygon kernels and BLAS usage can be handled in follow-up PRs once the backend integration is established.

Notes for Reviewers

  • The Hygon backend is added as a CUDA-like backend and participates in the existing CUDA-like backend mutual-exclusion logic.
  • src/hygon/gemm/cublas.h intentionally follows the existing BLAS-backed GEMM structure used by CUDA-like platforms.
  • src/pybind11_utils.h now accepts backend-specific internal device names in addition to PyTorch device names, so a CUDA-compatible PyTorch device type can resolve to the enabled InfiniOps backend.
  • This PR is split into three reviewable commits:
    • feat(hygon): add Hygon backend infrastructure
    • feat(hygon-add): add Hygon backend support for Add
    • feat(hygon-gemm): add Hygon backend support for Gemm
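
The device-name resolution described for src/pybind11_utils.h can be illustrated as follows. This is a hedged Python sketch of the idea, not the C++ code in that header; resolve_device_name, its parameters, and the alias tuple are hypothetical names chosen for this example.

```python
# Hypothetical sketch; none of these names come from src/pybind11_utils.h.

def resolve_device_name(name, active_backend, internal_names, torch_aliases=("cuda",)):
    """Map a device name to a backend name.

    name            -- a PyTorch-style name ("cuda", "cuda:0", "cpu") or a
                       backend-specific internal name ("hygon")
    active_backend  -- the CUDA-like backend this build was compiled with
    internal_names  -- backend names accepted verbatim (assumed set)
    torch_aliases   -- PyTorch device types that stand for the active backend
    """
    base = name.split(":", 1)[0].lower()  # drop a device index such as ":0"
    if base in internal_names:
        return base  # already an internal backend name
    if base in torch_aliases:
        return active_backend  # CUDA-compatible name resolves to the build's backend
    raise ValueError(f"unknown device name: {name!r}")


internal = {"cpu", "nvidia", "hygon"}
print(resolve_device_name("cuda:0", "hygon", internal))  # hygon
print(resolve_device_name("hygon", "hygon", internal))   # hygon
print(resolve_device_name("cpu", "hygon", internal))     # cpu
```

Whether an internal name for a backend that is not enabled should be accepted or rejected is not stated in this PR; the sketch accepts any name in the assumed set.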

Checklist

Every contributor must verify every item below before requesting
review. Tick each box only after the check has actually been performed —
do not tick speculatively. If an item truly does not apply, replace the
checkbox with N/A and briefly explain why in an inline comment.

Title, Branch, and Commits

  • PR title follows Conventional Commits (e.g. feat(hygon): add Add and Gemm backend support).
  • Branch name follows <type>/xxx-yyyy-zzzz where <type> matches the PR title's Conventional Commits type and words are joined with hyphens (see CONTRIBUTING.md §Branches).
  • Each commit message follows Conventional Commits.
  • Small PR is a single squashable commit; or, for a large PR, every commit is meaningful, well-formed, and independently reviewable (see CONTRIBUTING.md §Pull Requests #1).
  • No stray merge commits from master — the branch is rebased cleanly on top of the current master.
  • No fixup! / squash! / wip commits remain.

Scope and Design

  • Changes are minimal: this PR only adds Hygon backend infrastructure plus Hygon Add and Gemm support (CONTRIBUTING.md §Code/General #1).
  • No dead code, commented-out blocks, debug prints, printf/std::cout/print(...) left behind, or TODO without an owner and issue link.
  • No unrelated formatting churn that would obscure the diff.
  • Public API changes are intentional, documented, and reflected in affected callers/tests.

General Code Hygiene (applies to all languages)

C++ Specific (if C++ files changed)

Python Specific (if Python files changed)

  • N/A — no Python source files were changed in this PR.
  • N/A — ruff check is not applicable to this PR because no Python source files were changed.
  • N/A — ruff format --check is not applicable to this PR because no Python source files were changed.
  • N/A — no Python comments or error messages were added.
  • N/A — no Python control-flow formatting was changed.
  • N/A — no Python docstrings were added or changed.
  • N/A — no Python type hints were added or changed.

Testing

  • pytest was run locally on Hygon hardware, and the results are recorded in the "Test Results" table above (CONTRIBUTING.md §Pull Requests #3).
  • For any platform that could not be tested, an explicit reason is given in the table and a reviewer with access has been tagged.
  • New functionality has matching coverage through existing tests under tests/, especially tests/test_add.py and tests/test_gemm.py (CONTRIBUTING.md §Adding an Operator #3).
  • Tests use pytest.mark.parametrize correctly through the existing project test patterns.
  • Where appropriate, existing pytest.mark.auto_act_and_assert coverage is reused and the test returns a Payload whose func and ref share the same calling convention.
  • Default dtype / device parameterization is relied on, or overridden with an explicit pytest.mark.parametrize when necessary.
  • N/A — no new flaky test was added.
  • N/A — this is a new backend feature, not a bug fix requiring a regression test that fails on master.
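
The requirement that func and ref share the same calling convention can be sketched as below. This is an illustrative Python stand-in under stated assumptions: SketchPayload and act_and_assert are hypothetical and are not the Payload class or the pytest.mark.auto_act_and_assert marker from the InfiniOps test suite.

```python
import math
import operator
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class SketchPayload:
    """Hypothetical stand-in for a test payload; not the InfiniOps helper."""
    func: Callable[..., Any]  # implementation under test
    ref: Callable[..., Any]   # reference implementation
    args: tuple = ()
    kwargs: dict = field(default_factory=dict)


def act_and_assert(payload, rel_tol=1e-6):
    """Call func and ref with identical arguments and compare element-wise."""
    actual = payload.func(*payload.args, **payload.kwargs)
    expected = payload.ref(*payload.args, **payload.kwargs)
    assert len(actual) == len(expected)
    for a, e in zip(actual, expected):
        assert math.isclose(a, e, rel_tol=rel_tol), (a, e)
    return actual


def add_under_test(x, y):
    """Toy element-wise add standing in for a backend operator."""
    return [a + b for a, b in zip(x, y)]


def add_reference(x, y):
    """Reference add with the same (x, y) signature as add_under_test."""
    return list(map(operator.add, x, y))


result = act_and_assert(
    SketchPayload(add_under_test, add_reference, ([1.0, 2.0], [3.0, 4.0]))
)
```

Because both callables take the same positional and keyword arguments, one argument tuple drives both calls and the comparison needs no per-operator adapter.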

Build, CI, and Tooling

  • The project builds cleanly from a fresh directory with pip install .[dev] on Hygon.
  • compile_commands.json still regenerates (CMake option CMAKE_EXPORT_COMPILE_COMMANDS=ON in pyproject.toml — required by the code-lint skill and clang-tidy -p).
  • Hygon has been added to auto-detection in CMakeLists.txt under if(AUTO_DETECT_DEVICES) and to if(AUTO_DETECT_BACKENDS) where applicable.
  • Only one CUDA-like GPU backend is selectable at a time — the existing mutual-exclusion check in CMakeLists.txt is not broken.
  • clang-format.yml is green locally.
  • N/A — ruff.yml is not applicable to this PR because no Python source files were changed.
  • No new runtime dependency was added without updating pyproject.toml's [project.optional-dependencies].

Documentation

  • README.md, examples, or inline docs were updated where Hygon behavior, build flags, or developer workflow changed.
  • New Hygon dispatch helpers and runtime utilities follow the existing backend layout and naming conventions.
  • N/A — no user-visible breaking change is introduced.

Security and Safety

  • No secrets, access tokens, internal URLs, customer data, or personal hardware identifiers have been committed.
  • Third-party code is license-compatible and attributed where applicable.
  • No unsafe pointer arithmetic, uninitialized reads, or missing bounds checks were intentionally introduced.

@gongchensu gongchensu self-assigned this Mar 24, 2026
gongchensu commented Mar 25, 2026

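A100 build and related operator tests:
image
MetaX build and related operator tests:
image
Moore build and operator tests:
image
Cambricon build and operator tests:
image
Iluvatar build test:
image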

@gongchensu gongchensu force-pushed the feat/hygon-gemm branch 2 times, most recently from 9b9dda2 to e397d93 on March 26, 2026 09:03
@gongchensu gongchensu marked this pull request as draft April 13, 2026 08:19
@gongchensu gongchensu changed the base branch from feat/dev-infra to master April 27, 2026 06:27
@gongchensu gongchensu requested a review from baominghelly April 27, 2026 08:40
@gongchensu gongchensu force-pushed the feat/hygon-gemm branch 2 times, most recently from aa6ed00 to 8fd111a on April 28, 2026 02:25
@gongchensu gongchensu marked this pull request as ready for review April 28, 2026 02:44
@gongchensu gongchensu requested a review from a team April 28, 2026 02:44
@gongchensu gongchensu requested a review from voltjia May 8, 2026 01:40
@gongchensu gongchensu force-pushed the feat/hygon-gemm branch 2 times, most recently from 8a7a264 to c9dccc3 on May 11, 2026 07:19
@baominghelly

  1. The code structure has changed; should the Summary section be updated to match?
  2. Should the CI test results on the other platforms be pasted in as well?

Comment thread .ci/images/hygon/Dockerfile Outdated
Comment thread .ci/config.yaml Outdated
Comment thread src/native/cuda/hygon/blas_utils.h Outdated
Comment thread src/CMakeLists.txt
Comment thread src/native/cuda/hygon/device_.h Outdated
Comment thread CMakeLists.txt Outdated
Comment thread src/CMakeLists.txt Outdated
@baominghelly

Please also fill in these three TODOs:
TODO: paste build result, e.g. Successfully installed InfiniOps-0.1.0
TODO: paste result for tests/test_add.py and tests/test_gemm.py
TODO: Hygon DCU model / driver / runtime version

@baominghelly

Why did 5 tests fail in the test runs?

baominghelly previously approved these changes May 13, 2026
Collaborator

@voltjia voltjia left a comment

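Operator additions should be split out from the build, CI, and runtime changes rather than submitted together in a single PR. Different operators, such as Add and Gemm, should also be split into separate PRs.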

@gongchensu gongchensu requested a review from voltjia May 18, 2026 08:36
@gongchensu gongchensu force-pushed the feat/hygon-gemm branch 2 times, most recently from 5901e76 to 153ea5d on May 19, 2026 07:06
Comment thread .github/ci_config.yml
@voltjia voltjia merged commit 76094ad into InfiniTensor:master May 20, 2026
22 of 26 checks passed
3 participants