
[TRTLLM-10703][feature] abort, resume for Async RL in verl #12272

Open
hchings wants to merge 12 commits into NVIDIA:main from hchings:abort_resume_trtllm

Conversation

@hchings
Collaborator

@hchings hchings commented Mar 17, 2026

Summary by CodeRabbit

Release Notes

  • New Features

    • Added pause and resume functionality to control generation request submission dynamically.
    • Added ability to abort all currently in-flight generation requests simultaneously.
    • Added capability to reset KV-cache prefix reuse state on demand.
  • Tests

    • Added comprehensive test coverage validating prefix cache reset behavior and reuse metrics tracking.

Description

Test Coverage

PR Checklist

Please review the following before submitting your PR:

  • PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.

  • PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.

  • Test cases are provided for new code paths (see test instructions)

  • Any new dependencies have been scanned for license and vulnerabilities

  • CODEOWNERS updated if ownership changes

  • Documentation updated as needed

  • Update tava architecture diagram if there is a significant design change in PR.

  • The reviewers assigned automatically/manually are appropriate for the PR.

  • Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

To see a list of available CI bot commands, please comment /bot help.

update

Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>
@hchings hchings force-pushed the abort_resume_trtllm branch from e3294ca to 98bc625 Compare March 30, 2026 23:55
@hchings hchings marked this pull request as ready for review March 30, 2026 23:55
@hchings hchings requested review from a team as code owners March 30, 2026 23:55
@hchings hchings requested review from Naveassaf and byshiue March 30, 2026 23:55
@hchings
Collaborator Author

hchings commented Mar 30, 2026

/bot run

@coderabbitai
Contributor

coderabbitai bot commented Mar 31, 2026

📝 Walkthrough

Walkthrough

Changes introduce pause/resume generation functionality to AsyncLLM, add abort_all_requests method to RayExecutor, add reset_prefix_cache method to WorkerExtension, and include a new test validating prefix cache reset behavior with KV-cache reuse metrics.

Changes

Cohort / File(s) | Summary

  • AsyncLLM Pause/Resume (tensorrt_llm/_torch/async_llm.py):
    Added a _paused state flag, pause_generation() to block submissions and abort in-flight requests, resume_generation() to re-enable submissions, and an overridden generate_async() that enforces the paused state.
  • RayExecutor Abort (tensorrt_llm/executor/ray_executor.py):
    Added a public abort_all_requests() method that iterates over the in-flight GenerationResult objects and calls abort() on each, cancelling all active generation requests.
  • WorkerExtension Cache Reset (tensorrt_llm/llmapi/rlhf_utils.py):
    Added a public reset_prefix_cache() method that invalidates KV-cache prefix-reuse state by delegating to self.engine.reset_prefix_cache().
  • Test Coverage (tests/unittest/llmapi/test_async_llm.py):
    Added the async test test_async_llm_reset_prefix_cache, which validates KV-cache reuse metrics across three generate_async calls: cold (zero reuse), warm (positive reuse), and post-reset (zero reuse).
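Taken together, the summary above implies a small state machine: pausing flips a flag and tears down in-flight work, and the paused flag gates new submissions. A minimal runnable sketch of that flow, using only the names from the walkthrough (`_paused`, `pause_generation`, `resume_generation`, `generate_async`, `abort_all_requests`) with everything else stubbed, might look like this; it is illustrative only, not the actual tensorrt_llm implementation:

```python
# StubExecutor is invented here so the example runs standalone; the
# real executor is RayExecutor in tensorrt_llm/executor/ray_executor.py.

class StubExecutor:
    def __init__(self):
        self._results = []

    def submit(self, prompt):
        self._results.append(prompt)
        return f"result:{prompt}"

    def abort_all_requests(self):
        # The real method calls abort() on each in-flight GenerationResult.
        self._results.clear()


class AsyncLLMSketch:
    def __init__(self, executor):
        self._executor = executor
        self._paused = False

    def pause_generation(self):
        # Block new submissions, then cancel everything in flight.
        self._paused = True
        self._executor.abort_all_requests()

    def resume_generation(self):
        self._paused = False

    def generate_async(self, prompt):
        if self._paused:
            raise RuntimeError("generation is paused; call resume_generation() first")
        return self._executor.submit(prompt)


llm = AsyncLLMSketch(StubExecutor())
first = llm.generate_async("hello")   # accepted
llm.pause_generation()                # in-flight work aborted; new submissions raise
llm.resume_generation()
second = llm.generate_async("world")  # accepted again
```

The ordering inside pause_generation matters: setting `_paused` before aborting ensures no new request can slip in between the two steps.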

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (2 warnings)

  • Description check (⚠️ Warning): The PR description is entirely a template with no actual content filled in; all sections lack implementation details, rationale, and test-coverage explanations. Resolution: fill in the Description section explaining what the abort/resume functionality does and why it is needed, and list the specific validating tests under Test Coverage.
  • Docstring Coverage (⚠️ Warning): Docstring coverage is 53.33%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them.

✅ Passed checks (1 passed)

  • Title check (✅ Passed): The title clearly identifies the JIRA ticket, feature type, and main purpose: adding abort and resume functionality for Async RL.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (1)
tests/unittest/llmapi/test_async_llm.py (1)

140-178: Please add direct test coverage for pause/resume abort semantics.

This test validates reset_prefix_cache well, but the new feature also adds pause_generation() / resume_generation() and abort_all_requests(). A targeted async test should assert: (1) new submissions fail while paused, (2) in-flight requests are aborted, and (3) submissions succeed again after resume.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unittest/llmapi/test_async_llm.py` around lines 140 - 178, Add a new
async test in tests/unittest/llmapi/test_async_llm.py that mirrors the
reset_prefix_cache pattern but specifically exercises
AsyncLLM.pause_generation(), AsyncLLM.resume_generation(), and
AsyncLLM.abort_all_requests(): open an AsyncLLM context (same model,
kv_cache_config, sampling params), start a long-running generate_async call
(e.g., with a high max_tokens or a prompt that triggers streaming) and
concurrently call pause_generation() (or llm.collective_rpc("pause_generation")
if RPC-only), assert that new generate_async submissions immediately fail/raise
with a paused/validation error, call abort_all_requests() and assert the
in-flight generate_async finishes with an aborted/error state, then call
resume_generation() and assert subsequent generate_async calls succeed normally.
Reference AsyncLLM, generate_async, pause_generation, resume_generation, and
abort_all_requests so the test locates the right methods.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tensorrt_llm/_torch/async_llm.py`:
- Around line 105-109: pause_generation currently calls the synchronous method
_executor.abort_all_requests(), which blocks the asyncio event loop; change
pause_generation to call that synchronous function via await
asyncio.to_thread(self._executor.abort_all_requests) so the abort runs off the
event loop, and ensure asyncio is imported; update the async def
pause_generation method to set self._paused = True then await
asyncio.to_thread(...) to perform the abort without blocking.
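The fix the bot is proposing is the standard `asyncio.to_thread` offloading pattern. A self-contained sketch (the blocking function here is a stand-in for the synchronous `RayExecutor.abort_all_requests()`, not the real call):

```python
import asyncio
import time


def abort_all_requests_blocking():
    # Stand-in for the synchronous RayExecutor.abort_all_requests();
    # the real teardown can take long enough to stall the event loop.
    time.sleep(0.05)
    return "all aborted"


class PauseSketch:
    def __init__(self):
        self._paused = False

    async def pause_generation(self):
        self._paused = True
        # asyncio.to_thread runs the blocking call in a worker thread,
        # so other coroutines keep making progress during the abort.
        return await asyncio.to_thread(abort_all_requests_blocking)


status = asyncio.run(PauseSketch().pause_generation())
```

`asyncio.to_thread` is available from Python 3.9; on older interpreters the equivalent is `loop.run_in_executor(None, fn)`.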

In `@tensorrt_llm/executor/ray_executor.py`:
- Around line 312-315: The loop in abort_all_requests builds
list(self._results.values()) without synchronizing access to self._results,
which can be mutated concurrently; fix by acquiring the mutex that protects
_results (e.g., self._results_lock or the existing lock used around
submission/response paths) while copying the values into a local list, then
release the lock and iterate over that copied list calling result.abort() so
aborts run without holding the lock. Ensure you reference the abort_all_requests
method and the _results container when applying the change.
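The suggested fix is the snapshot-under-lock idiom: copy the container while holding the lock, then do the slow work outside it. In this sketch `_results_lock` is a hypothetical name (the real RayExecutor may guard `_results` differently), and FakeResult stands in for GenerationResult:

```python
import threading


class FakeResult:
    def __init__(self):
        self.aborted = False

    def abort(self):
        self.aborted = True


class RayExecutorSketch:
    def __init__(self):
        self._results = {}
        self._results_lock = threading.Lock()  # hypothetical lock name

    def submit(self, req_id):
        with self._results_lock:
            result = FakeResult()
            self._results[req_id] = result
        return result

    def abort_all_requests(self):
        # Copy under the lock so concurrent submissions can't mutate the
        # dict mid-iteration, then abort outside the lock so potentially
        # slow abort() calls don't block submitters.
        with self._results_lock:
            in_flight = list(self._results.values())
        for result in in_flight:
            result.abort()


ex = RayExecutorSketch()
r1, r2 = ex.submit(1), ex.submit(2)
ex.abort_all_requests()
```

Aborting outside the lock also avoids deadlock if `abort()` ever re-enters a code path that takes the same lock.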


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: d18175ed-7c06-4ed7-b46b-ce66ef8bd1c6

📥 Commits

Reviewing files that changed from the base of the PR and between d279344 and 88f495b.

📒 Files selected for processing (4)
  • tensorrt_llm/_torch/async_llm.py
  • tensorrt_llm/executor/ray_executor.py
  • tensorrt_llm/llmapi/rlhf_utils.py
  • tests/unittest/llmapi/test_async_llm.py

Comment thread tensorrt_llm/_torch/async_llm.py
Comment thread tensorrt_llm/executor/ray_executor.py
Collaborator

@Superjomn Superjomn left a comment


LGTM


@hchings hchings force-pushed the abort_resume_trtllm branch from 88f495b to 08ef80a Compare March 31, 2026 19:35
Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>
@hchings hchings force-pushed the abort_resume_trtllm branch from 08ef80a to 7260753 Compare March 31, 2026 19:35
@hchings
Collaborator Author

hchings commented Mar 31, 2026

/bot run

@tensorrt-cicd
Collaborator

PR_Github #41009 [ run ] triggered by Bot. Commit: 56f50b2 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #41009 [ run ] completed with state FAILURE. Commit: 56f50b2
/LLM/main/L0_MergeRequest_PR pipeline #31989 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@hchings
Collaborator Author

hchings commented Apr 1, 2026

/bot run

@hchings hchings enabled auto-merge (squash) April 1, 2026 21:18
@tensorrt-cicd
Collaborator

PR_Github #41263 [ run ] triggered by Bot. Commit: 56f50b2 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #41263 [ run ] completed with state SUCCESS. Commit: 56f50b2
/LLM/main/L0_MergeRequest_PR pipeline #32221 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@hchings
Collaborator Author

hchings commented Apr 2, 2026

/bot run

@tensorrt-cicd
Collaborator

PR_Github #42407 [ run ] completed with state SUCCESS. Commit: 8b1c58b
/LLM/main/L0_MergeRequest_PR pipeline #33180 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@hchings
Collaborator Author

hchings commented Apr 10, 2026

/bot run

@tensorrt-cicd
Collaborator

PR_Github #42735 [ run ] triggered by Bot. Commit: 8b1c58b Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #42735 [ run ] completed with state FAILURE. Commit: 8b1c58b

Link to invocation

@hchings
Collaborator Author

hchings commented Apr 13, 2026

/bot run

@tensorrt-cicd
Collaborator

PR_Github #42943 [ run ] triggered by Bot. Commit: e89160e Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #42943 [ run ] completed with state SUCCESS. Commit: e89160e
/LLM/main/L0_MergeRequest_PR pipeline #33601 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@hchings
Collaborator Author

hchings commented Apr 13, 2026

/bot run

@tensorrt-cicd
Collaborator

PR_Github #43092 [ run ] triggered by Bot. Commit: e89160e Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #43092 [ run ] completed with state SUCCESS. Commit: e89160e
/LLM/main/L0_MergeRequest_PR pipeline #33731 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@hchings
Collaborator Author

hchings commented Apr 14, 2026

/bot run

@tensorrt-cicd
Collaborator

PR_Github #43191 [ run ] triggered by Bot. Commit: e89160e Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #43191 [ run ] completed with state DISABLED
Freeze main and open the PR merge only after CI is back to healthy https://nvidia.slack.com/archives/C059LSY62BT/p1776141760843319?thread_ts=1775985925.442509&cid=C059LSY62BT

Link to invocation

@hchings
Collaborator Author

hchings commented Apr 14, 2026

/bot run

@tensorrt-cicd
Collaborator

PR_Github #43283 [ run ] triggered by Bot. Commit: a033fee Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #43283 [ run ] completed with state SUCCESS. Commit: a033fee
/LLM/main/L0_MergeRequest_PR pipeline #33830 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@hchings
Collaborator Author

hchings commented Apr 15, 2026

/bot run

@hchings
Collaborator Author

hchings commented Apr 15, 2026

/bot kill

@tensorrt-cicd
Collaborator

PR_Github #43551 [ run ] triggered by Bot. Commit: 5344d6b Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #43553 [ kill ] triggered by Bot. Commit: 5344d6b Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #43551 [ run ] completed with state ABORTED. Commit: 5344d6b

Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #43553 [ kill ] completed with state SUCCESS. Commit: 5344d6b
Successfully killed previous jobs for commit 5344d6b

Link to invocation

@hchings
Collaborator Author

hchings commented Apr 16, 2026

/bot run

@tensorrt-cicd
Collaborator

PR_Github #43682 [ run ] triggered by Bot. Commit: 87d993c Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #43682 [ run ] completed with state SUCCESS. Commit: 87d993c
/LLM/main/L0_MergeRequest_PR pipeline #34167 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation


3 participants