[https://nvbugs/5969206][fix] BREAKING: Setting default value of KV cache transfer timeout to 60s#12249

Merged
pcastonguay merged 3 commits into NVIDIA:main from pcastonguay:default_kv_transfer_timeout
Mar 18, 2026

@pcastonguay
Collaborator

@pcastonguay pcastonguay commented Mar 16, 2026

Summary by CodeRabbit

  • Bug Fixes
    • KV cache transfer now defaults to a 60-second timeout, preventing indefinite waiting in transfer scenarios.
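To make the difference between a bounded and an unbounded wait concrete, here is a minimal sketch using only the Python standard library. It does not reproduce how TensorRT-LLM waits on KV cache transfers; `wait_for_transfer` and the two events are hypothetical names used for illustration, and treating a missing timeout as "wait forever" is an assumption drawn from the summary above.

```python
import threading
from typing import Optional


def wait_for_transfer(done: threading.Event, timeout_ms: Optional[int]) -> bool:
    """Hypothetical helper: block until `done` is set or the timeout elapses.

    Returns True if the event was set, False if the wait timed out.
    A timeout of None blocks indefinitely (assumed pre-change behavior).
    """
    timeout_s = None if timeout_ms is None else timeout_ms / 1000.0
    return done.wait(timeout=timeout_s)


# A transfer that never completes: the event is never set.
stalled = threading.Event()
# A transfer that has already completed.
finished = threading.Event()
finished.set()

# A short timeout is used here so the example returns quickly.
stalled_result = wait_for_transfer(stalled, timeout_ms=50)
finished_result = wait_for_transfer(finished, timeout_ms=60000)

print("stalled transfer completed:", stalled_result)
print("finished transfer completed:", finished_result)
```

With a finite timeout, the stalled case returns control to the caller, which can then report an error or retry. With `timeout_ms=None`, the same call would block until the event is set.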

Description

Test Coverage

PR Checklist

Please review the following before submitting your PR:

  • PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.

  • PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.

  • Test cases are provided for new code paths (see test instructions)

  • Any new dependencies have been scanned for license and vulnerabilities

  • CODEOWNERS updated if ownership changes

  • Documentation updated as needed

  • Update tava architecture diagram if there is a significant design change in PR.

  • The reviewers assigned automatically/manually are appropriate for the PR.

  • Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

To see a list of available CI bot commands, please comment /bot help.

@pcastonguay pcastonguay requested a review from Tabrizian March 16, 2026 16:46
@pcastonguay pcastonguay requested a review from a team as a code owner March 16, 2026 16:46
@pcastonguay pcastonguay requested a review from hchings March 16, 2026 16:46
@pcastonguay
Collaborator Author

/bot run --disable-fail-fast

@coderabbitai
Contributor

coderabbitai bot commented Mar 16, 2026

Walkthrough

The default value of kv_transfer_timeout_ms in CacheTransceiverConfig is changed from None to 60000 milliseconds, establishing an explicit 60-second default timeout for KV cache transfers when the parameter is not specified.
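As a rough illustration of what this default change means for callers, the sketch below uses a hypothetical stand-in dataclass rather than the real `CacheTransceiverConfig` from `tensorrt_llm/llmapi/llm_args.py`. Only the field name `kv_transfer_timeout_ms` and the values `None` and `60000` come from this PR. The class name `CacheTransceiverConfigSketch`, the helper `effective_timeout_s`, and the reading of `None` as "no timeout" are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CacheTransceiverConfigSketch:
    """Hypothetical stand-in for illustration; not the real TensorRT-LLM class."""

    # Before this PR the default was None; after it, 60000 ms (60 s).
    kv_transfer_timeout_ms: Optional[int] = 60000


def effective_timeout_s(cfg: CacheTransceiverConfigSketch) -> Optional[float]:
    """Illustrative helper: timeout in seconds, or None when no timeout is set."""
    if cfg.kv_transfer_timeout_ms is None:
        return None
    return cfg.kv_transfer_timeout_ms / 1000.0


# Callers that do not set the field now get a 60-second timeout.
default_cfg = CacheTransceiverConfigSketch()

# Callers that need a longer window set the field explicitly.
explicit_cfg = CacheTransceiverConfigSketch(kv_transfer_timeout_ms=120000)

print(effective_timeout_s(default_cfg))
print(effective_timeout_s(explicit_cfg))
```

Code that relied on the field being unset to get an unbounded wait is what the BREAKING label in the title refers to. Whether the real class still accepts `None` explicitly is not stated in this PR.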

Changes

| Cohort / File(s) | Summary |
|---|---|
| KV Cache Transfer Timeout Default: tensorrt_llm/llmapi/llm_args.py | Modified default value of kv_transfer_timeout_ms field in CacheTransceiverConfig class from None to 60000 (milliseconds), changing implicit timeout behavior for KV cache transfer operations. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Description check | ⚠️ Warning | The PR description is entirely empty except for the template. No actual description, test coverage, or justification for the breaking change is provided. | Fill in the Description section explaining why the default timeout was changed to 60 seconds and what impact this breaking change has. Document any test coverage validating this change. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Title check | ✅ Passed | The title clearly and specifically describes the main change: setting a default KV cache transfer timeout to 60 seconds, which is the only modification in the changeset. |

@pcastonguay pcastonguay changed the title from "[None][chore] BREAKING: Setting default value of KV cache transfer timeout to 60s" to "[https://nvbugs/5969206][fix] BREAKING: Setting default value of KV cache transfer timeout to 60s" Mar 16, 2026
@tensorrt-cicd
Collaborator

PR_Github #39111 [ run ] triggered by Bot. Commit: f03a210 Link to invocation

Collaborator

@QiJune QiJune left a comment

LGTM

@tensorrt-cicd
Collaborator

PR_Github #39111 [ run ] completed with state FAILURE. Commit: f03a210
/LLM/main/L0_MergeRequest_PR pipeline #30370 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@Tabrizian
Member

/bot run --disable-fail-fast

Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
@Tabrizian Tabrizian force-pushed the default_kv_transfer_timeout branch from f03a210 to 94efdf6 March 17, 2026 06:57
@tensorrt-cicd
Collaborator

PR_Github #39209 [ run ] triggered by Bot. Commit: 94efdf6 Link to invocation

@pcastonguay
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #39252 [ run ] triggered by Bot. Commit: 2d641b2 Link to invocation

@pcastonguay
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #39270 [ run ] triggered by Bot. Commit: 30960da Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #39270 [ run ] completed with state SUCCESS. Commit: 30960da
/LLM/main/L0_MergeRequest_PR pipeline #30529 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@pcastonguay
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #39465 [ run ] triggered by Bot. Commit: 30960da Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #39465 [ run ] completed with state SUCCESS. Commit: 30960da
/LLM/main/L0_MergeRequest_PR pipeline #30691 completed with status: 'SUCCESS'

CI Report

Link to invocation

@pcastonguay pcastonguay merged commit bd14845 into NVIDIA:main Mar 18, 2026
5 checks passed
limin2021 pushed a commit to limin2021/TensorRT-LLM that referenced this pull request Mar 19, 2026
…ache transfer timeout to 60s (NVIDIA#12249)

Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
