[Fix]: Relax DFlash Regression Test Threshold for 2 GPUs #1373
Conversation
Signed-off-by: h-guo18 <67671475+h-guo18@users.noreply.github.com>
Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.
📝 Walkthrough
The regression test assertion threshold for offline DFlash model performance was relaxed. The final logged loss assertion was updated from a maximum of 4.0 to 5.0.
Estimated code review effort: 🎯 1 (Trivial) | ⏱️ ~2 minutes
Pre-merge checks: ✅ 6 passed
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/regression/torch/speculative/test_dflash_offline.py`:
- Around line 160-162: The assertion that final_loss < 5.0 is currently applied
for all runs; change it so the relaxed ceiling (5.0) is only used when the
process is running with exactly two GPUs. In the test_dflash_offline.py test
(where final_loss is computed), branch on torch.cuda.device_count() == 2 and
assert final_loss < 5.0 only in that branch; for all other cases (including 0 or
1 GPU) keep the original stricter threshold (final_loss < 4.0, the
previous value used for 1-GPU runs). Use torch.cuda.device_count() to detect GPU
count and update the assertion accordingly so only 2-GPU runs get the relaxed
ceiling.
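A minimal runnable sketch of the gating the prompt describes. The helper names (`loss_ceiling`, `check_final_loss`) are hypothetical, not from the test file; in the real test, `num_gpus` would come from `torch.cuda.device_count()`:

```python
def loss_ceiling(num_gpus: int) -> float:
    """Relaxed ceiling only for exactly two GPUs; stricter otherwise."""
    return 5.0 if num_gpus == 2 else 4.0

def check_final_loss(final_loss: float, num_gpus: int) -> None:
    """Assert the final loss is under the GPU-count-dependent ceiling."""
    ceiling = loss_ceiling(num_gpus)
    assert final_loss < ceiling, (
        f"Final loss {final_loss:.3f} too high "
        f"(expected < {ceiling}, num_gpus={num_gpus})"
    )

# In the test itself: check_final_loss(final_loss, torch.cuda.device_count())
```

This keeps the stricter 4.0 threshold for 0- and 1-GPU runs, so single-GPU regression sensitivity is preserved.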
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 2fd4c06d-a9e8-48ae-b2a8-42067802dbcb
📒 Files selected for processing (1)
tests/regression/torch/speculative/test_dflash_offline.py
```diff
     # Sanity ceiling — same threshold as the online regression. Offline trains
     # on fewer samples so we don't tighten it further here.
-    assert final_loss < 4.0, f"Final loss {final_loss:.3f} too high (expected < 4.0)"
+    assert final_loss < 5.0, f"Final loss {final_loss:.3f} too high (expected < 5.0)"
```
Gate the relaxed loss ceiling to 2-GPU runs only.
Line 162 currently relaxes the ceiling for all runs, but the PR objective says this should only apply when running on 2 GPUs. This weakens 1-GPU regression sensitivity unnecessarily.
💡 Proposed fix
```diff
-    # Sanity ceiling — same threshold as the online regression. Offline trains
-    # on fewer samples so we don't tighten it further here.
-    assert final_loss < 5.0, f"Final loss {final_loss:.3f} too high (expected < 5.0)"
+    # Keep stricter ceiling for 1-GPU; relax only for 2-GPU runs (fewer total steps).
+    visible_devices = os.environ.get("CUDA_VISIBLE_DEVICES", "")
+    num_gpus = len([d for d in visible_devices.split(",") if d.strip()]) if visible_devices else 1
+    loss_ceiling = 5.0 if num_gpus == 2 else 4.0
+    assert final_loss < loss_ceiling, (
+        f"Final loss {final_loss:.3f} too high (expected < {loss_ceiling:.1f}, num_gpus={num_gpus})"
+    )
```
Codecov Report
✅ All modified and coverable lines are covered by tests.

```
@@            Coverage Diff             @@
##             main    #1373      +/-   ##
==========================================
+ Coverage   76.48%   76.53%   +0.04%
==========================================
  Files         471      471
  Lines       50487    50487
==========================================
+ Hits        38617    38639      +22
+ Misses      11870    11848      -22
```

Flags with carried forward coverage won't be shown. ☔ View full report in Codecov by Sentry.
What does this PR do?
Type of change: Bug fix
Test failure raised by @kevalmorabia97:
Reason: The offline DFlash regression test can be run on 1 or 2 GPUs. With 2 GPUs, the total number of training steps is half that of the 1-GPU run, so the final loss is higher.
Fix: This PR relaxes the failing threshold from 4 to 5 for 2-GPU tests.
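The step-count halving behind this failure can be illustrated with a small sketch. The numbers and the `steps_per_epoch` helper are hypothetical, not from the test; the point is that data-parallel training splits a fixed dataset across ranks, so optimizer steps per epoch scale as 1/num_gpus:

```python
def steps_per_epoch(num_samples: int, per_gpu_batch: int, num_gpus: int) -> int:
    """Optimizer steps per epoch under data parallelism: the effective
    global batch is per_gpu_batch * num_gpus, so doubling GPUs halves steps."""
    return num_samples // (per_gpu_batch * num_gpus)

print(steps_per_epoch(1024, 8, 1))  # 128 steps on 1 GPU
print(steps_per_epoch(1024, 8, 2))  # 64 steps on 2 GPUs: half as many
```

Fewer steps means less training on the same data, hence a higher final loss and the need for a looser ceiling on 2-GPU runs.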
Usage
# Add a code snippet demonstrating how to use this
Testing
Before your PR is "Ready for review"
- Make sure you read and follow Contributor guidelines and your commits are signed (git commit -s -S).
- Make sure you read and follow the Security Best Practices (e.g. avoiding hardcoded trust_remote_code=True, torch.load(..., weights_only=False), pickle, etc.).
- CONTRIBUTING.md: ✅ / ❌ / N/A
Additional Information