[https://nvbugs/6221483][fix] AutoDeploy: Fix Eagle metadata host syncs by govind-ramnarayan · Pull Request #14714 · NVIDIA/TensorRT-LLM

govind-ramnarayan · 2026-05-29T00:17:55Z

Summary

- Make AutoDeploy D2H metadata mirroring blocking so reused pinned host buffers are safe under overlap scheduling.
- Add active_args_override for metadata mutation helpers so callers can narrow host mirroring to the next consumer's graph inputs.
- Use the draft model placeholders in the Eagle draft loop, avoiding target-only Mamba host metadata syncs during drafting.
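
To illustrate why the first item matters, here is a toy model of the hazard, not the AutoDeploy code. `ToyStream` is a hypothetical stand-in for a CUDA stream in which a non-blocking copy is deferred until the stream is synchronized; the list `host` plays the role of a reused pinned host buffer.

```python
class ToyStream:
    """Toy stand-in for a CUDA stream: non-blocking copies run later."""

    def __init__(self):
        self._pending = []

    def copy(self, dst, src, blocking):
        values = list(src)  # value the device would produce in stream order

        def work():
            dst[:] = values

        if blocking:
            self.synchronize()  # drain earlier work first
            work()              # data has landed when copy() returns
        else:
            self._pending.append(work)  # returns before the data lands

    def synchronize(self):
        for work in self._pending:
            work()
        self._pending.clear()


def mirror_and_read(blocking):
    stream = ToyStream()
    host = [0, 0, 0]  # reused host buffer holding old metadata
    stream.copy(host, [1, 2, 3], blocking=blocking)
    return list(host)  # host-side consumer reads immediately


stale = mirror_and_read(blocking=False)
fresh = mirror_and_read(blocking=True)
```

In this sketch the non-blocking path lets the host read the old contents, which is the kind of stale metadata a blocking copy rules out.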

Validation

CUDA_VISIBLE_DEVICES=0 PYTHONPATH=$PWD pytest -sv tests/unittest/auto_deploy/singlegpu/custom_ops/test_switch_to_generate_inplace.py
PYTHONPATH=$PWD LLM_MODELS_ROOT=/lustre/fs1/portfolios/coreai/projects/coreai_comparch_autodeploy/autodeploy_data/llm-models-fake CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 pytest -sv tests/integration/defs/accuracy/test_llm_api_autodeploy.py::TestNemotronSuperV3::test_mtp[fp8_ws8_80gb-trtllm]
A/B sanity: reverting this fix reproduces the indexSelectSmallIndex / "CUDA error: device-side assert triggered" failure on the same full repro.

Summary by CodeRabbit

Refactor
- Optimized device-to-host synchronization in KV-cache inference to selectively sync only required buffers, improving performance for graph-transformed model inference.
Tests
- Added test coverage to verify selective synchronization behavior during inference operations.

coderabbitai · 2026-05-29T00:23:09Z

📝 Walkthrough


This PR enables selective host-mirror D2H synchronization in KV-cache inference by adding optional active_args_override parameters to SequenceInfo metadata update methods. Host copies now block, and a helper method computes which host buffers require sync after updates, filtered by consumer argument requirements. Eagle model integration extracts placeholder names and passes them as overrides during draft loops to skip redundant host copies. Tests verify out-of-scope host mirrors remain unsynced.
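
The filtering described here can be sketched in a few lines. This is a minimal illustration, not the real `SequenceInfo._active_host_update_args()`: the function signature, the `HOST_SUFFIX` constant, and the assumption that host mirrors appear in the graph inputs as `<arg>_host` are all hypothetical.

```python
from typing import Iterable, Optional, Set

HOST_SUFFIX = "_host"  # assumption: host mirrors are named "<arg>_host"


def active_host_update_args(
    updated_args: Iterable[str],
    active_args: Iterable[str],
    active_args_override: Optional[Iterable[str]] = None,
) -> Set[str]:
    """Return base names of updated args whose host mirror needs a D2H sync."""
    # The override, when given, replaces the full set of active graph args.
    scope = set(active_args if active_args_override is None else active_args_override)
    return {name for name in updated_args if name + HOST_SUFFIX in scope}


active = {"input_ids", "cu_seqlen_host", "seq_len_with_cache_host"}
updated = ["cu_seqlen", "seq_len_with_cache"]

full = active_host_update_args(updated, active)
narrowed = active_host_update_args(updated, active, active_args_override={"input_ids"})
```

With no override, both updated mirrors are selected; narrowing the scope to `input_ids` selects none, so no host copy would be issued.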

Changes

Host-mirror selective synchronization

| Layer / File(s) | Summary |
| --- | --- |
| Host buffer sync foundation `tensorrt_llm/_torch/auto_deploy/custom_ops/attention_interface.py` | `InputBuffer.copy_to_host()` switches packed and truncatable tensor copies from non-blocking to blocking mode. `SequenceInfo._active_host_update_args()` helper computes which host args require D2H sync after metadata updates, optionally filtered by `active_args_override`. |
| Selective sync in offset_pos_and_cache_() `tensorrt_llm/_torch/auto_deploy/custom_ops/attention_interface.py` | Method signature updated to accept optional `active_args_override` parameter. Implementation uses `_active_host_update_args()` with override to scope D2H syncs to only host args needed by next consumer. |
| Selective sync in switch_to_generate_() `tensorrt_llm/_torch/auto_deploy/custom_ops/attention_interface.py` | Method signature updated to accept optional `active_args_override` parameter. Implementation delegates to `_active_host_update_args()` to determine D2H sync scope based on override. |
| Eagle model draft-loop integration `tensorrt_llm/_torch/auto_deploy/models/custom/modeling_eagle.py` | `EagleWrapper._submodule_placeholder_names()` extracts GraphModule placeholder argument names. During draft loop, `draft_arg_names` computed once from model placeholders and passed as `active_args_override` to both metadata update calls, skipping redundant host copies. |
| Host-mirror scoping verification tests `tests/unittest/auto_deploy/singlegpu/custom_ops/test_switch_to_generate_inplace.py` | Two new tests verify out-of-scope host mirrors remain unchanged: first confirms `cu_seqlen_host` unchanged during `switch_to_generate_()` with override limiting to `input_ids`; second confirms `seq_len_with_cache_host` not synced when `offset_pos_and_cache_()` scoped to `input_ids`. |
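
As a rough sketch of the placeholder extraction in the Eagle row, the helper below collects argument names from any object that exposes `graph.nodes` with `op` and `target` attributes, which is how `torch.fx` graph nodes are laid out. The function name `placeholder_names` and the toy graph built from `SimpleNamespace` are illustrative assumptions; the real `EagleWrapper._submodule_placeholder_names()` may differ.

```python
from types import SimpleNamespace


def placeholder_names(graph_module):
    """Collect graph input names, in graph order."""
    return [
        node.target
        for node in graph_module.graph.nodes
        if node.op == "placeholder"
    ]


def _node(op, target):
    # Toy node mimicking the `op` / `target` fields of a torch.fx node.
    return SimpleNamespace(op=op, target=target)


# Hypothetical draft model graph with two inputs.
toy_draft = SimpleNamespace(
    graph=SimpleNamespace(
        nodes=[
            _node("placeholder", "input_ids"),
            _node("placeholder", "position_ids"),
            _node("call_function", "attention"),
            _node("output", "output"),
        ]
    )
)

# Computed once, then reused as active_args_override for each draft step.
draft_arg_names = placeholder_names(toy_draft)
```

Because the draft graph here has no `<arg>_host` inputs, passing `draft_arg_names` as the override would skip host mirroring altogether during drafting.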

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Suggested reviewers

bmarimuthu-nv
tcherckez-nvidia

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description check | ✅ Passed | The PR description includes a clear summary of changes, validation commands with full reproduction steps, and A/B sanity testing notes, but lacks explicit Test Coverage section. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Title check | ✅ Passed | The title directly addresses the main change: fixing AutoDeploy Eagle metadata host syncs, which aligns with the PR objectives of fixing device-to-host metadata mirroring and narrowing host syncs. |


coderabbitai

🧹 Nitpick comments (2)

tests/unittest/auto_deploy/singlegpu/custom_ops/test_switch_to_generate_inplace.py (2)
378-460: QA list update is not needed for this PR scope.

This change is unit-test-only (tests/unittest/...), so no tests/integration/test_lists/qa/* update is required.

As per coding guidelines "If the PR only touches unittest/ or narrow unit scope, say explicitly whether QA list updates are unnecessary or optional."
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@tests/unittest/auto_deploy/singlegpu/custom_ops/test_switch_to_generate_inplace.py`
around lines 378 - 460, The PR only modifies unit tests under tests/unittest/...
(specifically test_switch_to_generate_inplace.py and functions like
test_out_of_scope_active_host_mirror_not_synced_by_switch_to_generate and
related tests), so explicitly state in the PR description that no QA list update
is necessary per the guideline for changes limited to unittest/ or narrow unit
scope; update the PR description (or add a short note to the PR checklist)
saying "QA list update not required for unit-test-only changes" so reviewers see
this decision without changing tests.
378-394: ⚡ Quick win

Add positive scoped-sync cases for active_args_override.

These additions validate the “excluded host arg stays stale” path, but they don’t explicitly validate the complementary “included host arg is synced” path when override contains <arg>_host. Please add one positive scoped test for each API (switch_to_generate_, offset_pos_and_cache_) to lock down both sides of the contract.

As per coding guidelines "Coverage expectations: Assess whether new/changed tests cover happy path, important edge cases, and failure modes relevant to the feature or fix."

Also applies to: 443-460
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@tests/unittest/auto_deploy/singlegpu/custom_ops/test_switch_to_generate_inplace.py`
around lines 378 - 394, The test currently only checks that excluded host args
remain stale after calling switch_to_generate_; add a complementary positive
scoped-sync test that passes an active_args_override including the host
placeholder name (e.g., "input_ids_host") and asserts the host mirror is synced
to staging values (mirror in si._input_buffer.get_host_view("cu_seqlen") matches
expected), and likewise add a matching positive case for offset_pos_and_cache_
to verify that when the override contains the host placeholder the host mirror
is synchronized; locate and modify the test functions around
test_out_of_scope_active_host_mirror_not_synced_by_switch_to_generate and the
analogous block for lines ~443-460 to add these assertions calling
si.switch_to_generate_ and si.offset_pos_and_cache_ with active_args_override
that includes "<arg>_host" and assert the host view equals the expected staged
values.
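
The two sides of the contract requested above could look roughly like the following pair of tests. `ToySequenceInfo` is a hypothetical stand-in written for this sketch; it is not the real `SequenceInfo`, and the `cu_seqlen_host` naming and the values used are assumptions.

```python
class ToySequenceInfo:
    """Toy stand-in: one device value with one host mirror."""

    def __init__(self, active_args):
        self.active_args = set(active_args)
        self.device = {"cu_seqlen": [0, 4, 9]}
        self.host = {"cu_seqlen": [0, 4, 9]}

    def switch_to_generate_(self, active_args_override=None):
        # Generate phase: one token per sequence.
        self.device["cu_seqlen"] = [0, 1, 2]
        scope = (
            self.active_args
            if active_args_override is None
            else set(active_args_override)
        )
        # Mirror to host only if the next consumer reads the host arg.
        if "cu_seqlen_host" in scope:
            self.host["cu_seqlen"] = list(self.device["cu_seqlen"])


def test_excluded_host_mirror_stays_stale():
    si = ToySequenceInfo({"input_ids", "cu_seqlen_host"})
    si.switch_to_generate_(active_args_override={"input_ids"})
    assert si.host["cu_seqlen"] == [0, 4, 9]


def test_included_host_mirror_is_synced():
    si = ToySequenceInfo({"input_ids", "cu_seqlen_host"})
    si.switch_to_generate_(active_args_override={"cu_seqlen_host"})
    assert si.host["cu_seqlen"] == [0, 1, 2]
```

The same pattern would apply to `offset_pos_and_cache_()` with its own host-mirrored argument.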


ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: d2e01141-fcbf-410b-9fdf-cfbcbef6a2f8

📥 Commits

Reviewing files that changed from the base of the PR and between f6ba936 and f74aa4d.

📒 Files selected for processing (3)

tensorrt_llm/_torch/auto_deploy/custom_ops/attention_interface.py
tensorrt_llm/_torch/auto_deploy/models/custom/modeling_eagle.py
tests/unittest/auto_deploy/singlegpu/custom_ops/test_switch_to_generate_inplace.py

govind-ramnarayan · 2026-05-29T18:21:44Z

Note: folded in an explicit error for running flashinfer + Eagle + cudagraph together (instead of silently switching to torch-simple).
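
A fail-fast check of this kind might be sketched as follows. The function name, its parameters, and the error text are hypothetical and chosen only for illustration; they are not taken from the AutoDeploy config code.

```python
def validate_backend_combo(attn_backend, uses_speculative_decoding, uses_cuda_graph):
    """Raise instead of silently swapping backends for an unsupported combination."""
    if attn_backend == "flashinfer" and uses_speculative_decoding and uses_cuda_graph:
        raise ValueError(
            "flashinfer attention with speculative decoding and CUDA graphs "
            "is not supported; choose a different backend or disable CUDA graphs"
        )
    return attn_backend


def is_rejected(attn_backend, speculative, cuda_graph):
    try:
        validate_backend_combo(attn_backend, speculative, cuda_graph)
    except ValueError:
        return True
    return False


rejected = is_rejected("flashinfer", True, True)
allowed = not is_rejected("flashinfer", False, True)
```

Raising early makes the unsupported combination visible to the user, whereas a silent fallback would change performance characteristics without notice.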

govind-ramnarayan · 2026-05-29T19:56:50Z

Review follow-up pushed in 7cff2196d0.

Updates:

- Removed brittle error-message matching in the FlashInfer + speculative + CUDA graph config test.
- Removed the redundant torch-simple assertion.
- Added positive scoped-sync tests for both switch_to_generate_() and offset_pos_and_cache_().
- Added validation that active_args_override is a subset of active graph args, with unit coverage for both mutation APIs.
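
The subset validation in the last item can be sketched as below. `resolve_active_args` is a hypothetical helper name and the error message is illustrative; only the rule itself (the override must be a subset of the active graph args) comes from the update above.

```python
def resolve_active_args(active_args, active_args_override=None):
    """Return the arg scope to use, rejecting overrides outside the active args."""
    active = set(active_args)
    if active_args_override is None:
        return active
    override = set(active_args_override)
    unknown = override - active
    if unknown:
        raise ValueError(
            f"active_args_override contains args that are not active graph args: {sorted(unknown)}"
        )
    return override


scope = resolve_active_args({"input_ids", "cu_seqlen_host"}, {"input_ids"})

try:
    resolve_active_args({"input_ids"}, {"not_a_graph_arg"})
    rejected = False
except ValueError:
    rejected = True
```

Rejecting unknown names catches typos in the override that would otherwise silently disable a needed host sync.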

QA list update not required: this PR adds focused unittest coverage only and does not add, remove, or rename integration tests/test-list entries.

Signed-off-by: Govind Ramnarayan <105831528+govind-ramnarayan@users.noreply.github.com>

govind-ramnarayan · 2026-05-29T20:01:08Z

/bot run --stage-list "A10-Build_Docs, A10-PackageSanityCheck-PY310-UB2204, A100X-PackageSanityCheck-PY312-UB2404, A30-AutoDeploy-1, H100_PCIe-AutoDeploy-1, DGX_B200-AutoDeploy-1, A100X-PyTorch-1, DGX_H100-4_GPUs-AutoDeploy-1, DGX_B200-4_GPUs-AutoDeploy-1, DGX_H100-4_GPUs-AutoDeploy-Post-Merge-1"

tensorrt-cicd · 2026-05-29T20:07:32Z

PR_Github #51100 [ run ] triggered by Bot. Commit: d8b9ad6 Link to invocation

tensorrt-cicd · 2026-05-29T20:24:23Z

PR_Github #51100 [ run ] completed with state FAILURE. Commit: d8b9ad6
/LLM/main/L0_MergeRequest_PR pipeline #40538 (Partly Tested) completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

Please check the failed tests and fix your PR
If you cannot view the failures, ask the CI triggerer to share details
Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

govind-ramnarayan · 2026-05-30T00:25:40Z

/bot run --stage-list "A10-Build_Docs, A10-PackageSanityCheck-PY310-UB2204, A100X-PackageSanityCheck-PY312-UB2404, A30-AutoDeploy-1, H100_PCIe-AutoDeploy-1, DGX_B200-AutoDeploy-1, A100X-PyTorch-1, DGX_H100-4_GPUs-AutoDeploy-1, DGX_B200-4_GPUs-AutoDeploy-1, DGX_H100-4_GPUs-AutoDeploy-Post-Merge-1"

tensorrt-cicd · 2026-05-30T00:31:16Z

PR_Github #51129 [ run ] triggered by Bot. Commit: d8b9ad6 Link to invocation

tensorrt-cicd · 2026-05-30T04:49:35Z

PR_Github #51129 [ run ] completed with state SUCCESS. Commit: d8b9ad6
/LLM/main/L0_MergeRequest_PR pipeline #40565 (Partly Tested) completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

Please check the failed tests and fix your PR
If you cannot view the failures, ask the CI triggerer to share details
Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

govind-ramnarayan requested a review from a team as a code owner May 29, 2026 00:17

govind-ramnarayan requested a review from Fridah-nv May 29, 2026 00:17

github-actions Bot assigned govind-ramnarayan May 29, 2026

coderabbitai Bot reviewed May 29, 2026


govind-ramnarayan changed the title ~~[NVBUG-6221483][fix] fix AutoDeploy MTP metadata host sync~~ [NVBUG-6221483][fix] AutoDeploy: Fix MTP metadata host syncs May 29, 2026

govind-ramnarayan changed the title ~~[NVBUG-6221483][fix] AutoDeploy: Fix MTP metadata host syncs~~ [https://nvbugs/6117814][fix] AutoDeploy: Fix MTP metadata host syncs May 29, 2026

govind-ramnarayan marked this pull request as draft May 29, 2026 00:25

govind-ramnarayan requested a review from hnover-nv May 29, 2026 00:27

govind-ramnarayan changed the title ~~[https://nvbugs/6117814][fix] AutoDeploy: Fix MTP metadata host syncs~~ [https://nvbugs/6221483][fix] AutoDeploy: Fix MTP metadata host syncs May 29, 2026

govind-ramnarayan marked this pull request as ready for review May 29, 2026 00:42

govind-ramnarayan marked this pull request as draft May 29, 2026 00:42

govind-ramnarayan changed the title ~~[https://nvbugs/6221483][fix] AutoDeploy: Fix MTP metadata host syncs~~ [https://nvbugs/6221483][fix] AutoDeploy: Fix Eagle metadata host syncs May 29, 2026

govind-ramnarayan force-pushed the gramnarayan/nvbug-6221483 branch 2 times, most recently from 8b58b36 to fde6af6 Compare May 29, 2026 19:16

govind-ramnarayan commented May 29, 2026


Comment thread tests/unittest/auto_deploy/singlegpu/shim/test_llm_config.py Outdated

govind-ramnarayan commented May 29, 2026


Comment thread tests/unittest/auto_deploy/singlegpu/shim/test_llm_config.py Outdated

govind-ramnarayan commented May 29, 2026


Comment thread tests/unittest/auto_deploy/singlegpu/custom_ops/test_switch_to_generate_inplace.py

govind-ramnarayan force-pushed the gramnarayan/nvbug-6221483 branch 2 times, most recently from 9c31278 to 7cff219 Compare May 29, 2026 19:56

[NVBUG-6221483][auto-deploy] fix draft metadata host sync

d8b9ad6

Signed-off-by: Govind Ramnarayan <105831528+govind-ramnarayan@users.noreply.github.com>

govind-ramnarayan force-pushed the gramnarayan/nvbug-6221483 branch from 7cff219 to d8b9ad6 Compare May 29, 2026 19:59

govind-ramnarayan marked this pull request as ready for review May 29, 2026 20:00

govind-ramnarayan requested a review from tcherckez-nvidia May 29, 2026 20:09
