fix(tests): run `causal_softmax` reference on CPU by voltjia · Pull Request #612 · InfiniTensor/InfiniOps

voltjia · 2026-05-16T02:40:04Z

Summary

Moves the PyTorch reference computation in `tests/test_causal_softmax.py` to CPU.
Copies the reference result back to the device and dtype of the tested output before comparison.
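
A minimal sketch of this pattern, not the code from this PR. The helper name `causal_softmax_reference`, the float32 upcast, and the masking convention (column `j` is masked for row `i` when `j > i + total_len - seq_len`, with `total_len >= seq_len`) are all assumptions made for illustration; the actual test may differ.

```python
import torch


def causal_softmax_reference(input: torch.Tensor, out: torch.Tensor) -> torch.Tensor:
    """Hypothetical helper: compute the reference on CPU, then match `out`."""
    # Run on CPU so the vendor PyTorch softmax device kernel is never invoked.
    # The float32 upcast is an assumption, not something the PR states.
    x = input.detach().to(device="cpu", dtype=torch.float32)
    seq_len, total_len = x.shape[-2], x.shape[-1]

    # Assumed causal convention: row `i` may attend to columns
    # `j <= i + (total_len - seq_len)`; everything to the right is masked.
    rows = torch.arange(seq_len).unsqueeze(1)
    cols = torch.arange(total_len).unsqueeze(0)
    mask = cols > rows + (total_len - seq_len)

    ref = torch.softmax(x.masked_fill(mask, float("-inf")), dim=-1)

    # Copy the result back to the tested output's device and dtype.
    return ref.to(device=out.device, dtype=out.dtype)
```

Under these assumptions, the operator under test would still write into `out` on the requested device, and only the value it is compared against would be produced on CPU.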

Motivation

Some vendor PyTorch CUDA forks can fail in their own softmax kernel with an `invalid device function` error on this test's shape/dtype matrix. The InfiniOps test then fails before it can validate `infini.ops.causal_softmax`. Running the reference calculation on CPU keeps the test focused on the InfiniOps implementation and removes the dependency on the vendor PyTorch softmax CUDA kernel.

Closes N/A.

Type of Change

feat — new feature / new operator / new platform
fix — bug fix
perf — performance improvement (no behavioral change)
refactor — code restructuring without behavior change
test — adding or fixing tests only
docs — documentation only
build / ci — build system or CI configuration
chore — tooling, formatting, or other non-code changes
Breaking change (requires a ! in the Conventional Commits prefix or a BREAKING CHANGE: footer)

Platforms Affected

Test Results on Supported Platforms

The clean full-suite validation was run on a temporary branch containing this PR plus #611, because the two fixes address independent failures that otherwise mask each other in full-platform runs.

| Platform | Built | `pytest` Result | Notes / Hardware |
| --- | --- | --- | --- |
| NVIDIA | Yes | `6295 passed, 2447 skipped in 345.48s` | Full suite passed. |
| Iluvatar | Yes | `4795 passed, 2447 skipped in 284.25s` | Full suite passed. Targeted `tests/test_causal_softmax.py` passed: `18 passed in 3.35s`. |
| MetaX | Yes | `5795 passed, 1447 skipped in 361.81s` | Full suite passed. |
| Cambricon | Yes | `3073 passed, 3857 skipped in 920.38s` | Full suite passed. |
| Moore | Yes | `5759 passed, 1483 skipped in 574.92s` | Full suite passed. |
| Ascend | Yes | `4472 passed, 2710 skipped in 527.68s`; wrapper exit code `137` | Pytest summary passed; the container exited after the test summary. |

Full `pytest` output (optional)

Iluvatar targeted validation:
18 passed in 3.35s

Combined validation with #611:
NVIDIA: 6295 passed, 2447 skipped in 345.48s
Iluvatar: 4795 passed, 2447 skipped in 284.25s
MetaX: 5795 passed, 1447 skipped in 361.81s
Cambricon: 3073 passed, 3857 skipped in 920.38s
Moore: 5759 passed, 1483 skipped in 574.92s
Ascend: 4472 passed, 2710 skipped in 527.68s; wrapper exit code 137 after pytest summary

Benchmark / Performance Impact

N/A. This changes only a test reference path.

Notes for Reviewers

The operator under test still runs on the requested device. Only the PyTorch reference calculation moves to CPU to avoid a vendor PyTorch reference-kernel failure.

This PR is independent from #611, but a fully clean all-platform suite currently requires both fixes: this PR removes the Iluvatar tests/test_causal_softmax.py reference failure, while #611 removes unrelated tests/test_gemm.py fallback failures.

Checklist

Title, Branch, and Commits

PR title follows Conventional Commits (e.g. feat(nvidia): …, fix(cuda/gemm): …).
Branch name follows <type>/xxx-yyyy-zzzz where <type> matches the PR title's Conventional Commits type and words are joined with hyphens (see CONTRIBUTING.md §Branches).
Each commit message follows Conventional Commits.
Small PR is a single squashable commit; or, for a large PR, every commit is meaningful, well-formed, and independently reviewable (see CONTRIBUTING.md §Pull Requests).
No stray merge commits from master — the branch is rebased cleanly on top of the current master.
No fixup! / squash! / wip commits remain.

Scope and Design

Changes are minimal — nothing unrelated to the stated motivation was added (CONTRIBUTING.md §Code/General).
No dead code, commented-out blocks, debug prints, printf/std::cout/print(...) left behind, or TODO without an owner and issue link.
No unrelated formatting churn that would obscure the diff.
N/A — No public API changes.

General Code Hygiene (applies to all languages)

The code is self-explanatory; comments were added only where the why is non-obvious (CONTRIBUTING.md §Code/General).
Every modified or added file ends with a single trailing newline (CONTRIBUTING.md §Code/General).
No trailing whitespace, tab/space mixing, or stray BOMs.
Identifiers in comments and error messages are wrapped in backticks (e.g. the `seqlens_k` tensor) (CONTRIBUTING.md §Code/General).
All comments and error messages are in English (CONTRIBUTING.md §Code/General).
Comments and error messages are complete sentences — capitalized first letter, terminal punctuation — unless the language/framework convention says otherwise (CONTRIBUTING.md §Code/General; §Python).

C++ Specific (if C++ files changed)

N/A — No C++ files changed.

Python Specific (if Python files changed)

Code is PEP 8 compliant; ruff check passes cleanly on CI (see .github/workflows/ruff.yml).
ruff format --check passes cleanly — if not, run ruff format and commit the result.
Comments are complete English sentences, starting with a capital letter and ending with punctuation; Markdown backticks are used for code references (CONTRIBUTING.md §Python).
Framework-specific conventions (e.g. lowercase pytest.skip messages without terminal period) are honored where applicable (CONTRIBUTING.md §Python).
No blank line between the function signature and the body when there is no docstring or comment (CONTRIBUTING.md §Python).
A blank line is present before and after if, for, and similar control-flow statements (CONTRIBUTING.md §Python).
A blank line appears before each return, except when it directly follows a control-flow statement like if or for (CONTRIBUTING.md §Python).
N/A — No Python docstrings added.
N/A — No type hints changed.

Testing

pytest was run locally on every supported platform that this PR can affect, and the results are recorded in the "Test Results" table above (CONTRIBUTING.md §Pull Requests).
N/A — No platform is intentionally omitted.
New functionality has matching tests under tests/ following tests/test_add.py / tests/test_gemm.py patterns (CONTRIBUTING.md §Adding an Operator).
Tests use pytest.mark.parametrize correctly: dependent parameters share one decorator (e.g. @pytest.mark.parametrize("dtype, rtol, atol", …)); independent parameters use separate decorators ordered by parameter declaration.
Where appropriate, pytest.mark.auto_act_and_assert is used and the test returns a Payload whose func and ref share the same calling convention.
Default dtype / device parameterization is relied on, or overridden with an explicit pytest.mark.parametrize when necessary.
Any new test that is flaky under parallelism is marked so, or documented to require pytest -n 1.
For bug fixes: a regression test has been added that fails on master and passes with this PR.

Build, CI, and Tooling

The project builds cleanly from a fresh directory with pip install .[dev] on at least one affected platform.
compile_commands.json still regenerates (CMake option CMAKE_EXPORT_COMPILE_COMMANDS=ON in pyproject.toml — required by the code-lint skill and clang-tidy -p).
N/A — No new backend or device auto-detection.
Only one CUDA-like GPU backend is selectable at a time — the existing mutual-exclusion check in CMakeLists.txt is not broken.
Both CI workflows (clang-format.yml, ruff.yml) are green locally (or expected to be green on CI).
No new runtime dependency was added without updating pyproject.toml's [project.optional-dependencies] (or justified in the PR description).

Documentation

N/A — No user-facing docs or workflow changed.
N/A — No new operators, dispatch helpers, or public utilities.
N/A — No user-visible breaking change.

Security and Safety

No secrets, access tokens, internal URLs, customer data, or personal hardware identifiers have been committed.
N/A — No third-party code added.
No unsafe pointer arithmetic, uninitialized reads, or missing bounds checks were introduced.

fix(tests): run causal softmax reference on CPU

8067f90

voltjia mentioned this pull request May 16, 2026

fix(torch): make gemm fallback portable #611

Open

65 tasks

voltjia marked this pull request as ready for review May 16, 2026 05:36

voltjia requested a review from a team May 16, 2026 05:36

voltjia changed the title ~~fix(tests): run causal softmax reference on CPU~~ fix(tests): run causal_softmax reference on CPU May 16, 2026

voltjia wants to merge 1 commit into master from fix/causal-softmax-reference
