[https://nvbugs/6211185][fix] Fix failed GSM8K accuracy tests for LagunaXS on B200/GB200/B300#14580
Conversation
Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>
|
/bot run --disable-fail-fast |
📝 WalkthroughWalkthroughThis PR adds hardware-specific RoPE fusion detection for Laguna attention on Blackwell GPUs (SM 100/103) and updates test configurations to validate the change. The implementation replaces a hardcoded ChangesRoPE Fusion Hardware Workaround and Test Updates
Estimated code review effort🎯 2 (Simple) | ⏱️ ~12 minutes Possibly related PRs
Suggested reviewers
🚥 Pre-merge checks | ✅ 3 | ❌ 2❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Warning Review ran into problems🔥 ProblemsStopped waiting for pipeline failures after 30000ms. One of your pipelines takes longer than our 30000ms fetch window to run, so review may not consider pipeline-failure results for inline comments if any failures occurred after the fetch window. Increase the timeout if you want to wait longer or run a Comment |
There was a problem hiding this comment.
🧹 Nitpick comments (1)
tests/integration/test_lists/test-db/l0_b200.yml (1)
24-24: 🏗️ Heavy liftAdd a perf guard for the RoPE fusion path switch.
Line 24 improves functional coverage, but this PR also changes an attention-kernel execution path. Please add a LagunaXS perf-sanity entry in
tests/integration/test_lists/test-db/l0_perf.yml(and the matchingtests/integration/test_lists/qa/llm_perf_*.ymlif it should run on QA schedules) so latency/throughput regressions are caught, not just accuracy failures.As per coding guidelines: “If the PR touches performance-sensitive paths… check whether a perf test entry is present or updated in…
test-dband QA perf lists… [and] note if only functional correctness is tested where a performance regression would not be caught.”🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/integration/test_lists/test-db/l0_b200.yml` at line 24, Add a LagunaXS performance guard entry corresponding to the functional test accuracy/test_llm_api_pytorch.py::TestLagunaXS::test_nvfp4 by adding a perf-sanity record in tests/integration/test_lists/test-db/l0_perf.yml (and mirror it in the matching QA perf list tests/integration/test_lists/qa/llm_perf_*.yml if this path should run on QA schedules); the perf entry should reference the same test target (TestLagunaXS::test_nvfp4 or an equivalent perf-target key), specify expected latency/throughput thresholds, and include the same platform/fixture tags used by the functional entry so any regression in the RoPE fusion attention-kernel path is caught.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@tests/integration/test_lists/test-db/l0_b200.yml`:
- Line 24: Add a LagunaXS performance guard entry corresponding to the
functional test accuracy/test_llm_api_pytorch.py::TestLagunaXS::test_nvfp4 by
adding a perf-sanity record in tests/integration/test_lists/test-db/l0_perf.yml
(and mirror it in the matching QA perf list
tests/integration/test_lists/qa/llm_perf_*.yml if this path should run on QA
schedules); the perf entry should reference the same test target
(TestLagunaXS::test_nvfp4 or an equivalent perf-target key), specify expected
latency/throughput thresholds, and include the same platform/fixture tags used
by the functional entry so any regression in the RoPE fusion attention-kernel
path is caught.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 657c3ac2-737f-44d7-81d6-f7ee666f1867
📒 Files selected for processing (3)
tensorrt_llm/_torch/models/modeling_laguna.pytests/integration/test_lists/test-db/l0_b200.ymltests/integration/test_lists/waives.txt
💤 Files with no reviewable changes (1)
- tests/integration/test_lists/waives.txt
|
PR_Github #50355 [ run ] triggered by Bot. Commit: |
|
PR_Github #50355 [ run ] completed with state
|
|
/bot run --disable-fail-fast |
|
PR_Github #50383 [ run ] triggered by Bot. Commit: |
|
PR_Github #50383 [ run ] completed with state |
…unaXS on B200/GB200/B300 (NVIDIA#14580) Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>
Summary by CodeRabbit
Bug Fixes
Tests
Description
Temporary workaround: Hopper fails without unfused RoPE for Laguna, while Blackwell has issues when RoPE is unfused.
This PR is to unblock Blackwell on main for Poolside while we find proper fixes.
Also adds B200 coverage to prevent regression. B300 in addition is unnecessary as the failure mode is the same. RTX Pro is unaffected.
Test Coverage
PR Checklist
Please review the following before submitting your PR:
PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.
PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.
Test cases are provided for new code paths (see test instructions)
If PR introduces API changes, an appropriate PR label is added - either
api-compatibleorapi-breaking. Forapi-breaking, includeBREAKINGin the PR title.Any new dependencies have been scanned for license and vulnerabilities
CODEOWNERS updated if ownership changes
Documentation updated as needed
Update tava architecture diagram if there is a significant design change in PR.
The reviewers assigned automatically/manually are appropriate for the PR.
Please check this after reviewing the above items as appropriate for this PR.
GitHub Bot Help
To see a list of available CI bot commands, please comment
/bot help.