
Conversation

@wujingyue (Collaborator)

Before the PR, this test failed with `mpirun -np 1`.
wujingyue requested a review from Priya2698 on January 12, 2026 at 01:21
@wujingyue (Collaborator, Author)

!test

github-actions bot commented Jan 12, 2026

Review updated until commit def54ac

Description

  • Reduce the tensor dimension size from 12288 to 768 (`h * 8`) to minimize numerical differences

  • Remove explicit tolerance requirements from the `allclose` assertion

  • Make the unroll factor check more flexible (`EXPECT_GT` instead of `EXPECT_EQ`)

  • Fix the test failure that occurred with `mpirun -np 1`

Changes walkthrough

Relevant files
Tests
tests/cpp/test_multidevice_sharding.cpp: Fix multi-device test by reducing dimension size and tolerances (+4/-3)

  • Changed the `e` dimension from a fixed 12288 to `h * 8` (768) for vectorization
  • Removed explicit tolerance parameters from the `allclose` assertion
  • Changed the unroll factor check from an exact match to greater than 1

    PR Reviewer Guide

    Here are some key observations to aid the review process:

    🧪 PR contains tests
    ⚡ Recommended focus areas for review
    Numerical Precision Fix

    The PR changes tensor dimension `e` from 12288 to `h * 8` (768) and removes explicit tolerance parameters from `at::allclose`. This should improve numerical stability, but the reviewer should verify that the test now passes consistently with `mpirun -np 1` and that the reduced dimension still adequately exercises the vectorization path.

    ```cpp
    int64_t b = 1, s = 2048, h = 96, e = h * 8;
    ```
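    The tolerance half of this change is not shown above. Here is a before/after sketch based on the summaries later in this thread; the `EXPECT_TRUE` wrapper is an assumption, while the `at::allclose` calls themselves are quoted from those summaries.

    ```cpp
    // Before: explicit loose tolerances (rtol = atol = 1e-3).
    EXPECT_TRUE(at::allclose(nvf_out, ref_out, /*rtol=*/1e-3, /*atol=*/1e-3));
    // After: ATen defaults (rtol = 1e-5, atol = 1e-8).
    EXPECT_TRUE(at::allclose(nvf_out, ref_out));
    ```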
    Test Robustness

    The unroll factor expectation was changed from the exact value 8 to anything greater than 1. This makes the test more flexible but less specific. The reviewer should confirm the change doesn't mask underlying issues in the scheduler heuristics.

    ```cpp
    EXPECT_GT(rparams->unroll_factor_iter_dom, 1);
    ```
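    For contrast, the assertion this replaced, as quoted in the summaries below:

    ```cpp
    EXPECT_EQ(rparams->unroll_factor_iter_dom, 8);
    ```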

    greptile-apps bot (Contributor) commented Jan 12, 2026

    Greptile Overview

    Greptile Summary

    This PR fixes a numerical precision issue in the `OuterReductionShardedInnerDimension` test that was causing failures when run with `mpirun -np 1`.

    Changes Made

    The PR makes three key changes to improve test stability:

    1. Reduced dimension size: Changed `e` from a hardcoded value of 12288 to `h * 8` (which equals 768), reducing the dimension by a factor of 16. This significantly reduces numerical error accumulation during BFloat16 operations, particularly in the sum reduction over large tensors. The comment notes "e = h * 8 to exercise vectorization", establishing the relationship needed for vectorization to work properly.

    2. Removed explicit tolerance parameters: Changed from `at::allclose(nvf_out, ref_out, 1e-3, 1e-3)` to `at::allclose(nvf_out, ref_out)`, now using the default tolerances (rtol=1e-5, atol=1e-8). The stricter defaults are appropriate given the reduced numerical error from smaller dimensions.

    3. Relaxed unroll factor assertion: Changed from `EXPECT_EQ(rparams->unroll_factor_iter_dom, 8)` to `EXPECT_GT(rparams->unroll_factor_iter_dom, 1)`. With the reduced dimension size, the scheduler may choose a different unroll factor based on its heuristics, but the test still verifies that vectorization is enabled and that some unrolling occurs.

    Why This Works

    The test performs BFloat16 operations including concatenation, reshape, casting to Float, and a sum reduction. With the original dimension `e = 12288`, the large tensor sizes led to significant numerical error accumulation in BFloat16 (which stores only 7 mantissa bits). By reducing to `e = 768`, the errors stay within acceptable bounds even under the stricter tolerance checks.
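    As a rough standalone illustration (not part of the PR), here is a minimal ATen snippet that mimics the fusion's rounding pattern: round Float data to BFloat16, reduce in Float, and round the result back. The reduction length matches the test's `b * s = 2048`; everything else is illustrative.

    ```cpp
    #include <ATen/ATen.h>
    #include <cstdint>
    #include <iostream>

    int main() {
      const int64_t n = 2048;  // matches the test's reduction length b * s
      at::Tensor x = at::randn({n});            // Float32 source data
      at::Tensor x_bf16 = x.to(at::kBFloat16);  // round inputs to BFloat16

      // Reduce in Float, as the fusion does after its cast to Float, then
      // round the result back to BFloat16 like the fusion's final cast.
      at::Tensor ref = x.sum();
      at::Tensor out =
          x_bf16.to(at::kFloat).sum().to(at::kBFloat16).to(at::kFloat);

      // The difference comes from BFloat16 rounding of the inputs and the
      // output; it tends to grow with the reduction length.
      std::cout << "abs diff: " << (ref - out).abs().item<float>() << "\n";
      return 0;
    }
    ```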

    The test still exercises the intended functionality: outer reduction with sharded inner dimensions, vectorization, and multi-device coordination.

    Confidence Score: 4/5

    • This PR is safe to merge with minimal risk - it only modifies test code to address numerical precision issues.
    • The changes are well-justified and appropriate for fixing the numerical precision issue. The dimension reduction (16x smaller) significantly reduces BFloat16 error accumulation while still testing the same functionality. However, the relaxed unroll factor check (from exactly 8 to just > 1) is quite permissive and could potentially hide future regressions in the scheduler's unrolling heuristics. The change from looser (1e-3) to stricter default tolerances is counterintuitive at first but makes sense given the reduced error accumulation.
    • No files require special attention - this is a straightforward test fix with appropriate changes to dimensions and validation criteria.

    Important Files Changed

    File Analysis

    Filename: tests/cpp/test_multidevice_sharding.cpp
    Score: 4/5
    Overview: Reduces dimension `e` from 12288 to 768 (16x smaller) to fix numerical precision issues in BFloat16 operations. Removes explicit tolerance parameters to use the stricter defaults, and relaxes the unroll factor check from exactly 8 to greater than 1. The changes are appropriate for addressing the test failure with smaller data.

    Sequence Diagram

    ```mermaid
    sequenceDiagram
        participant Test as OuterReductionShardedInnerDimension Test
        participant Fusion as Fusion Definition
        participant Inputs as Input Tensors (BFloat16)
        participant Shard as Sharding Operation
        participant Exec as FusionExecutorCache
        participant Ref as Reference Calculation
        participant Valid as Validation

        Note over Test: Setup: b=1, s=2048, h=96<br/>OLD: e=12288<br/>NEW: e=h*8=768

        Test->>Fusion: Define fusion with 3 inputs shape {b,s,h,e/h}
        Fusion->>Fusion: cat(tv0, tv1, tv2, dim=-1) → {b,s,h,3*e/h}
        Fusion->>Fusion: reshape → {b,s,3*e}
        Fusion->>Fusion: cast(BFloat16→Float)
        Fusion->>Fusion: sum(dims={0,1}) → {3*e}
        Fusion->>Fusion: cast(Float→BFloat16)

        Test->>Inputs: Create 3 random BFloat16 tensors {b,s,h,e/h}
        Inputs->>Shard: Shard on dimension 2 (h→h/d)
        Note over Shard: Each device gets {b,s,h/d,e/h}

        Shard->>Exec: Run fusion with sharded inputs
        Exec->>Exec: Execute fusion on device
        Exec-->>Test: Return nvf_out (BFloat16)

        Test->>Ref: Compute reference output
        Ref->>Ref: cat(sharded_inputs, dim=-1)
        Ref->>Ref: view({b,s,3*e/d})
        Ref->>Ref: sum(dims={0,1}) → {3*e/d}
        Ref-->>Test: Return ref_out

        Test->>Valid: Compare nvf_out vs ref_out
        Note over Valid: OLD: allclose(rtol=1e-3, atol=1e-3)<br/>NEW: allclose(default tolerances)
        Valid-->>Test: Assertion result

        Test->>Test: Check vectorization enabled
        Test->>Test: Check unroll_factor > 1
        Note over Test: OLD: Expected exactly 8<br/>NEW: Expect > 1
    ```
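    For readers who prefer code to the diagram, here is a rough nvFuser C++ sketch of the fusion described above. It is reconstructed from the diagram, not quoted from tests/cpp/test_multidevice_sharding.cpp; helpers such as `makeContigTensor` follow nvFuser test-suite conventions and are assumptions here, as is the sharding note at the end.

    ```cpp
    // Sketch only; assumes the nvfuser namespace and its test helpers.
    Fusion fusion;
    FusionGuard fg(&fusion);

    const int64_t b = 1, s = 2048, h = 96, e = h * 8;

    // Three BFloat16 inputs of shape {b, s, h, e/h}.
    TensorView* tv0 = makeContigTensor(4, DataType::BFloat16);
    TensorView* tv1 = makeContigTensor(4, DataType::BFloat16);
    TensorView* tv2 = makeContigTensor(4, DataType::BFloat16);
    fusion.addInput(tv0);
    fusion.addInput(tv1);
    fusion.addInput(tv2);

    // cat on the last dim -> {b, s, h, 3*e/h}, then flatten -> {b, s, 3*e}.
    TensorView* catted = cat({tv0, tv1, tv2}, /*dim=*/-1);
    TensorView* flat = reshape(catted, {b, s, h, 3 * e / h}, {b, s, 3 * e});

    // Upcast, reduce the outer dims {b, s}, and cast back.
    TensorView* as_float = castOp(DataType::Float, flat);
    TensorView* reduced = sum(as_float, {0, 1});  // -> {3*e}
    TensorView* out = castOp(DataType::BFloat16, reduced);
    fusion.addOutput(out);

    // Per the diagram, the inputs are sharded on dimension 2 (h -> h/d),
    // e.g. by assigning a DeviceMesh and parallelizing axis 2 with
    // ParallelType::DIDx before running through FusionExecutorCache.
    ```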

    greptile-apps bot (Contributor) left a review comment

    No files reviewed, no comments

    @wujingyue (Collaborator, Author)

    !test

    greptile-apps bot (Contributor) left a review comment

    No files reviewed, no comments

    wujingyue merged commit 99a167f into main on Jan 12, 2026
    61 of 62 checks passed
    wujingyue deleted the wjy/numeric branch on January 12, 2026 at 23:45