Reduce numerical differences by using smaller dimensions #5797
Before the PR, this test failed with `mpirun -np 1`.
!test
Review updated until commit def54ac
Relevant files: Tests
PR Reviewer Guide
Here are some key observations to aid the review process:
| 🧪 PR contains tests |
| ⚡ Recommended focus areas for review |
Numerical Precision Fix
The PR changes `e` from 12288 to `h * 8` (768) and removes the explicit tolerance parameters in `at::allclose`. This should improve numerical stability, but the reviewer should verify that the test now passes consistently with `mpirun -np 1` and that the reduced dimension still adequately exercises the vectorization path.
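To make the effect of dropping the explicit tolerances concrete, here is a minimal pure-Python sketch of the element-wise closeness criterion `|actual - expected| <= atol + rtol * |expected|`. It assumes `at::allclose` uses the same defaults documented for `torch.allclose` (`rtol=1e-5`, `atol=1e-8`); the helper name `allclose` and the sample values are illustrative only and are not taken from the test.

```python
def allclose(actual, expected, rtol=1e-5, atol=1e-8):
    """Return True if every pair satisfies |a - e| <= atol + rtol * |e|.

    The default rtol/atol are assumed to match torch.allclose.
    """
    return all(
        abs(a - e) <= atol + rtol * abs(e)
        for a, e in zip(actual, expected)
    )


# Hypothetical values: a result that is off by 0.05 on a magnitude of 100.
expected = [100.0]
actual = [100.05]

# Old-style check with the explicit tolerances mentioned in the review.
passes_with_old_tolerances = allclose(actual, expected, rtol=1e-3, atol=1e-3)

# New-style check with default tolerances, which is much stricter.
passes_with_defaults = allclose(actual, expected)

print(passes_with_old_tolerances, passes_with_defaults)
```

Under these assumptions the same discrepancy passes the old check and fails the default one, so the test only stays green if the fused output and the reference agree much more closely than before.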
Greptile Overview
Greptile Summary
This PR fixes a numerical precision issue in the
Changes Made
The PR makes three key changes to improve test stability:
Why This Works
The test performs BFloat16 operations including concatenation, reshape, casting to Float, and sum reduction. With the original dimension
The test still exercises the intended functionality: outer reduction with sharded inner dimensions, vectorization, and multi-device coordination.
Confidence Score: 4/5
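As general background on the BFloat16 casts mentioned above (not a reconstruction of the truncated explanation), the sketch below emulates a Float-to-BFloat16 cast in pure Python by keeping the upper 16 bits of the float32 encoding with round-to-nearest-even. The helper name `to_bfloat16` is hypothetical, and NaN/infinity handling is deliberately left out.

```python
import struct


def to_bfloat16(x):
    """Round a Python float to the nearest BFloat16 value, returned as a float.

    Emulates the cast by rounding the float32 bit pattern to its upper
    16 bits (round-to-nearest-even). Finite inputs only.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Add a rounding bias; the extra term breaks ties toward an even result.
    bits = (bits + 0x7FFF + ((bits >> 16) & 1)) & 0xFFFFFFFF
    bits &= 0xFFFF0000  # drop the low 16 mantissa bits
    return struct.unpack("<f", struct.pack("<I", bits))[0]


print(to_bfloat16(1.001))  # 1.0
print(to_bfloat16(1.01))   # 1.0078125
```

Adjacent BFloat16 values near 1.0 are 2^-7 apart, so two outputs that round to different neighbors differ by far more than a relative tolerance of 1e-5.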
Important Files Changed
File Analysis
Sequence Diagram
sequenceDiagram
participant Test as OuterReductionShardedInnerDimension Test
participant Fusion as Fusion Definition
participant Inputs as Input Tensors (BFloat16)
participant Shard as Sharding Operation
participant Exec as FusionExecutorCache
participant Ref as Reference Calculation
participant Valid as Validation
Note over Test: Setup: b=1, s=2048, h=96<br/>OLD: e=12288<br/>NEW: e=h*8=768
Test->>Fusion: Define fusion with 3 inputs shape {b,s,h,e/h}
Fusion->>Fusion: cat(tv0, tv1, tv2, dim=-1) → {b,s,h,3*e/h}
Fusion->>Fusion: reshape → {b,s,3*e}
Fusion->>Fusion: cast(BFloat16→Float)
Fusion->>Fusion: sum(dims={0,1}) → {3*e}
Fusion->>Fusion: cast(Float→BFloat16)
Test->>Inputs: Create 3 random BFloat16 tensors {b,s,h,e/h}
Inputs->>Shard: Shard on dimension 2 (h→h/d)
Note over Shard: Each device gets {b,s,h/d,e/h}
Shard->>Exec: Run fusion with sharded inputs
Exec->>Exec: Execute fusion on device
Exec-->>Test: Return nvf_out (BFloat16)
Test->>Ref: Compute reference output
Ref->>Ref: cat(sharded_inputs, dim=-1)
Ref->>Ref: view({b,s,3*e/d})
Ref->>Ref: sum(dims={0,1}) → {3*e/d}
Ref-->>Test: Return ref_out
Test->>Valid: Compare nvf_out vs ref_out
Note over Valid: OLD: allclose(rtol=1e-3, atol=1e-3)<br/>NEW: allclose(default tolerances)
Valid-->>Test: Assertion result
Test->>Test: Check vectorization enabled
Test->>Test: Check unroll_factor > 1
Note over Test: OLD: Expected exactly 8<br/>NEW: Expect > 1
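The shape bookkeeping in the diagram can be checked with a short sketch. It assumes only the values shown in the notes (`b=1`, `s=2048`, `h=96`, `e` either 12288 or `h * 8`) and a device count `d` that divides `h`; the function name `fusion_shapes` is hypothetical and does not come from the test.

```python
def fusion_shapes(b, s, h, e, d=1):
    """Trace per-device tensor shapes through cat -> reshape -> sum.

    Inputs are sharded on the h dimension across d devices, as in the diagram.
    """
    assert e % h == 0, "e must be divisible by h"
    assert h % d == 0, "h must be divisible by the device count"
    local_h = h // d
    per_input = (b, s, local_h, e // h)          # each of the 3 inputs
    after_cat = (b, s, local_h, 3 * e // h)      # cat on the last dim
    after_reshape = (b, s, local_h * after_cat[-1])
    after_sum = (after_reshape[-1],)             # sum over dims {0, 1}
    return per_input, after_cat, after_reshape, after_sum


h = 96
new_shapes = fusion_shapes(b=1, s=2048, h=h, e=h * 8)
old_shapes = fusion_shapes(b=1, s=2048, h=h, e=12288)

print(new_shapes)  # ((1, 2048, 96, 8), (1, 2048, 96, 24), (1, 2048, 2304), (2304,))
print(old_shapes[-1])  # (36864,)
```

With `d=2` the final per-device shape becomes `(1152,)`, matching the `{3*e/d}` reference shape in the diagram.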
No files reviewed, no comments
!test
No files reviewed, no comments