adapt(norm): adapt tests/norm/ for Paddle compat #16
Merged
- test_fused_dit_layernorm.py: add _chunk_strided() helper using
  torch.as_strided to reconstruct the correct stride from the 4D temb tensor.
  Paddle chunk() returns contiguous copies (losing strides); the kernel
  requires gate.stride(1)==6*hidden_dim. The offset uses byte units (Paddle
  as_strided storage_offset is in bytes, PyTorch in elements). Fix
  _make_strided_gate to use _chunk_strided instead of chunk().
- test_rmsnorm_fp4_quant_cute_dsl.py, test_add_rmsnorm_fp4_quant_cute_dsl.py:
  add module-level skip guard for torch.float4_e2m1fn_x2 (NVFP4 packed dtype,
  PyTorch 2.6+, not proxied in Paddle compat). Use
  pytest.skip(allow_module_level=True).
- scripts/paddle_all_test_cases.sh: add test_fused_dit_layernorm.py; add
  comments for fp4 tests (skipped, unavailable dtype).

Results:
  test_fused_rmsnorm_silu.py: 102 passed, 50 skipped
  test_fused_dit_layernorm.py: 35 passed
  fp4 tests: 2 skipped (dtype unavailable)
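The module-level skip guard mentioned above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: `skip_module_unless_attr` is a hypothetical helper name, and the PR's guard may simply be written inline at the top of each fp4 test file. The point it shows is that `pytest.skip(..., allow_module_level=True)` raises while the module is being imported, so nothing defined after the guard is ever reached.

```python
import importlib

import pytest


def skip_module_unless_attr(module_name, attr):
    """Skip the calling test module unless `module_name` exposes `attr`.

    Hypothetical helper for illustration only; not taken from the PR.
    """
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # The skip exception propagates out of the test module's import.
        pytest.skip(f"{module_name} is not importable", allow_module_level=True)
    if not hasattr(module, attr):
        pytest.skip(f"{module_name}.{attr} is unavailable", allow_module_level=True)


# In a test file, call this at module level, before any test is defined, e.g.:
#     skip_module_unless_attr("torch", "float4_e2m1fn_x2")
```

Because the guard runs before any test function or parametrization is defined, the whole file is reported as a single skipped module.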
📌 Description
Adapt `tests/norm/` for Paddle compat. All 4 test files handled:

- `test_fused_dit_layernorm.py`
- `test_fused_rmsnorm_silu.py`
- `test_rmsnorm_fp4_quant_cute_dsl.py`
- `test_add_rmsnorm_fp4_quant_cute_dsl.py`

3 key fixes:

1. Paddle `chunk()` returns contiguous copies (loses strides). PyTorch `chunk(6, dim=2)` returns strided views (row stride=6×H); Paddle returns contiguous copies (row stride=H). Fix: `_chunk_strided()` helper using `torch.as_strided` to reconstruct the correct stride.
2. Paddle `as_strided` `storage_offset` is in BYTES, not elements (P0: silent data corruption). Fix: `storage_offset = chunk_idx * hidden_dim * temb.element_size()`
3. `pytest.skip(allow_module_level=True)` is required for a module-level skip of the NVFP4 tests. `pytestmark = pytest.mark.skip(...)` does NOT prevent collection (2195 tests collected vs 0 with the fix).

🔍 Related Issues
N/A
🚀 Pull Request Checklist
- `scripts/paddle_all_test_cases.sh` updated
- `adaptation_exp.md` updated (§40–44, Section 十二 (12))

🧪 Tests
Reviewer Notes
- `_chunk_strided` in `test_fused_dit_layernorm.py` is the key fix; the byte-offset behaviour of Paddle `as_strided` differs from PyTorch and causes silent data corruption if not accounted for.
- `torch.float4_e2m1fn_x2` (NVFP4 packed dtype) is only available in PyTorch 2.6+; the current Paddle compat environment ships an earlier version.
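To make the stride and offset arithmetic concrete, here is a small plain-Python model of strided indexing over flat storage. It is a sketch under stated assumptions, not the `_chunk_strided` helper itself: `read_strided` is a hypothetical function, the `[batch, 1, 6, hidden_dim]` layout of `temb` is assumed from the `chunk(6, dim=2)` call, and the byte-unit offset is modelled as this PR describes it for Paddle rather than checked against Paddle directly.

```python
from itertools import product


def read_strided(storage, size, stride, offset, element_size=1):
    """Gather elements from flat storage the way a strided view would.

    With element_size=1 the offset is counted in elements. With a larger
    element_size the offset is treated as bytes, which models the
    behaviour this PR reports for Paddle's as_strided.
    """
    start = offset // element_size
    return [
        storage[start + sum(i * s for i, s in zip(index, stride))]
        for index in product(*(range(n) for n in size))
    ]


# Assumed layout: temb is contiguous with shape [batch, 1, 6, hidden_dim].
batch, num_chunks, hidden_dim = 2, 6, 4
storage = list(range(batch * num_chunks * hidden_dim))

chunk_idx = 1
size = (batch, 1, 1, hidden_dim)
# A view of one chunk keeps the parent's strides: rows are 6 * hidden_dim apart.
view_stride = (num_chunks * hidden_dim, num_chunks * hidden_dim, hidden_dim, 1)

element_size = 2  # bytes per element for a 16-bit dtype
offset_elements = chunk_idx * hidden_dim
offset_bytes = offset_elements * element_size

# Element offset with element semantics, and byte offset with byte
# semantics, both select chunk 1 of every batch row.
chunk_elements = read_strided(storage, size, view_stride, offset_elements)
chunk_bytes = read_strided(storage, size, view_stride, offset_bytes, element_size)

# Element offset handed to a byte-based API starts in the wrong place
# and straddles two chunks without raising any error.
corrupted = read_strided(storage, size, view_stride, offset_elements, element_size)

print(chunk_elements)  # [4, 5, 6, 7, 28, 29, 30, 31]
print(corrupted)       # [2, 3, 4, 5, 26, 27, 28, 29]
```

The second print shows why the mismatch is silent: the read stays in bounds and has the expected shape, but half of each row comes from the neighbouring chunk.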