adapt(norm): adapt tests/norm/ for Paddle compat by BingooYang · Pull Request #16 · PFCCLab/flashinfer

BingooYang · 2026-05-14T11:21:27Z

📌 Description

Adapt tests/norm/ for Paddle compat. All 4 test files handled:

| Test file | Result | Change |
| --- | --- | --- |
| `test_fused_dit_layernorm.py` | ✅ 35 passed | Fix strided chunk + as_strided byte offset |
| `test_fused_rmsnorm_silu.py` | ✅ 102 passed, 50 skipped | No change needed |
| `test_rmsnorm_fp4_quant_cute_dsl.py` | ⏭ module-level skip | NVFP4 requires PyTorch 2.6+ |
| `test_add_rmsnorm_fp4_quant_cute_dsl.py` | ⏭ module-level skip | Same as above |

3 key fixes:

1. Paddle `chunk()` returns contiguous copies (loses strides).
   - PyTorch `chunk(6, dim=2)` returns strided views (row stride = 6×H); Paddle returns contiguous copies (row stride = H).
   - Fix: `_chunk_strided()` helper using `torch.as_strided` to reconstruct the correct stride.
2. Paddle `as_strided` `storage_offset` is in BYTES (not elements): P0 silent data corruption.
   - Fix: `storage_offset = chunk_idx * hidden_dim * temb.element_size()`
3. `pytest.skip(allow_module_level=True)` is required for a module-level skip of the NVFP4 tests.
   - `pytestmark = pytest.mark.skip(...)` does NOT prevent collection (2195 tests collected vs 0 with the fix).
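
The stride and offset arithmetic behind the first two fixes can be modelled in plain Python. This is a minimal sketch, not the PR's `_chunk_strided()` helper: it calls neither PyTorch nor Paddle, the function names are hypothetical, and the `[batch, num_chunks, hidden_dim]` layout is an assumed simplification of the 4D `temb` tensor.

```python
def chunk_view_indices(batch, hidden_dim, chunk_idx, num_chunks=6):
    """Flat element indices covered by one strided chunk.

    Models a contiguous buffer laid out as [batch, num_chunks, hidden_dim]
    (an assumed layout) and returns, per row, the indices that a strided
    view of chunk `chunk_idx` would read.
    """
    row_stride = num_chunks * hidden_dim  # 6*H: row stride of a strided view
    offset = chunk_idx * hidden_dim       # start of the chunk, in elements
    return [
        [offset + b * row_stride + h for h in range(hidden_dim)]
        for b in range(batch)
    ]


def storage_offset(chunk_idx, hidden_dim, element_size, offset_in_bytes):
    """Offset to pass to as_strided, in the unit the backend expects."""
    offset_elems = chunk_idx * hidden_dim
    return offset_elems * element_size if offset_in_bytes else offset_elems


# Chunk 1 of a [2, 6, 4] buffer: rows start 6 * 4 = 24 elements apart.
rows = chunk_view_indices(batch=2, hidden_dim=4, chunk_idx=1)

# For a 2-byte dtype the two offset conventions differ by a factor of 2.
elem_offset = storage_offset(1, 4, element_size=2, offset_in_bytes=False)
byte_offset = storage_offset(1, 4, element_size=2, offset_in_bytes=True)
```

A contiguous copy of the same chunk would have a row stride of `hidden_dim` instead of `6 * hidden_dim`, which is why the copy fails the kernel's stride requirement. If an element offset is handed to an API that interprets it as bytes, the view starts inside an earlier chunk and no error is raised, which matches the silent corruption reported above.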

🔍 Related Issues

N/A

🚀 Pull Request Checklist

- pre-commit checks pass
- All adapted tests pass or are intentionally skipped with a clear reason
- scripts/paddle_all_test_cases.sh updated
- adaptation_exp.md updated (§40–44, Section 12)

🧪 Tests

tests/norm/test_fused_dit_layernorm.py           35 passed
tests/norm/test_fused_rmsnorm_silu.py            102 passed, 50 skipped
tests/norm/test_rmsnorm_fp4_quant_cute_dsl.py    SKIPPED (module-level)
tests/norm/test_add_rmsnorm_fp4_quant_cute_dsl.py SKIPPED (module-level)

Reviewer Notes

_chunk_strided in test_fused_dit_layernorm.py is the key fix; the byte-offset behaviour of Paddle as_strided differs from PyTorch and causes silent data corruption if not accounted for.
FP4 tests are skipped because torch.float4_e2m1fn_x2 (NVFP4 packed dtype) is only available in PyTorch 2.6+; the current Paddle compat environment ships an earlier version.
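
A module-level guard of the kind described for the FP4 tests might look like the sketch below. `skip_module_if_missing` is a hypothetical helper name, not taken from the PR; `pytest.skip(..., allow_module_level=True)` is the real pytest call, and the check on `float4_e2m1fn_x2` assumes the dtype is exposed as an attribute of the `torch` module.

```python
import pytest


def skip_module_if_missing(module, attr, reason):
    """Abort collection of the calling test module when `attr` is absent.

    Hypothetical helper: it must be called at import time, at the top of the
    test file, before any parametrized tests are defined.
    """
    if not hasattr(module, attr):
        # Raises pytest's Skipped outcome; with allow_module_level=True,
        # pytest skips the whole file instead of reporting an error.
        pytest.skip(reason, allow_module_level=True)


# Intended use at the top of a test file (assumes `import torch` succeeded):
# skip_module_if_missing(torch, "float4_e2m1fn_x2", "NVFP4 dtype unavailable")
```

By contrast, `pytestmark = pytest.mark.skip(...)` lets the module import run to completion, so every parametrized test is still generated and then marked as skipped one by one. That is consistent with the 2195 collected tests noted in the description.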

- test_fused_dit_layernorm.py: add _chunk_strided() helper using torch.as_strided to reconstruct correct stride from 4D temb tensor. Paddle chunk() returns contiguous copies (losing strides); kernel requires gate.stride(1)==6*hidden_dim. Offset uses byte units (Paddle as_strided storage_offset is in bytes, PyTorch in elements). Fix _make_strided_gate to use _chunk_strided instead of chunk().
- test_rmsnorm_fp4_quant_cute_dsl.py, test_add_rmsnorm_fp4_quant_cute_dsl.py: add module-level skip guard for torch.float4_e2m1fn_x2 (NVFP4 packed dtype, PyTorch 2.6+, not proxied in Paddle compat). Use pytest.skip(allow_module_level=True).
- scripts/paddle_all_test_cases.sh: add test_fused_dit_layernorm.py; add comments for fp4 tests (skipped, unavailable dtype).

Results:
- test_fused_rmsnorm_silu.py: 102 passed, 50 skipped
- test_fused_dit_layernorm.py: 35 passed
- fp4 tests: 2 skipped (dtype unavailable)

BingooYang merged commit b6fae6e into PFCCLab:0.6 May 14, 2026
1 of 2 checks passed

BingooYang merged 1 commit into PFCCLab:0.6 from BingooYang:adapt/norm_all
