
[TRTLLM-10703][feature] abort, resume for Async RL in verl #12272

Open
hchings wants to merge 12 commits into NVIDIA:main from hchings:abort_resume_trtllm

Conversation

@hchings
Collaborator

@hchings hchings commented Mar 17, 2026

Summary by CodeRabbit

Release Notes

  • New Features

    • Added pause and resume functionality to control generation request submission dynamically.
    • Added ability to abort all currently in-flight generation requests simultaneously.
    • Added capability to reset KV-cache prefix reuse state on demand.
  • Tests

    • Added comprehensive test coverage validating prefix cache reset behavior and reuse metrics tracking.

Description

Test Coverage

PR Checklist

Please review the following before submitting your PR:

  • PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.

  • PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.

  • Test cases are provided for new code paths (see test instructions)

  • Any new dependencies have been scanned for license and vulnerabilities

  • CODEOWNERS updated if ownership changes

  • Documentation updated as needed

  • Update tava architecture diagram if there is a significant design change in PR.

  • The reviewers assigned automatically/manually are appropriate for the PR.

  • Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

To see a list of available CI bot commands, please comment /bot help.

update

Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>
@hchings hchings force-pushed the abort_resume_trtllm branch from e3294ca to 98bc625 Compare March 30, 2026 23:55
@hchings hchings marked this pull request as ready for review March 30, 2026 23:55
@hchings hchings requested review from a team as code owners March 30, 2026 23:55
@hchings hchings requested review from Naveassaf and byshiue March 30, 2026 23:55
@hchings
Collaborator Author

hchings commented Mar 30, 2026

/bot run

@coderabbitai
Contributor

coderabbitai bot commented Mar 31, 2026

📝 Walkthrough

Walkthrough

Changes introduce pause/resume generation functionality to AsyncLLM, add abort_all_requests method to RayExecutor, add reset_prefix_cache method to WorkerExtension, and include a new test validating prefix cache reset behavior with KV-cache reuse metrics.

Changes

Cohort / File(s) | Summary

  • AsyncLLM Pause/Resume (tensorrt_llm/_torch/async_llm.py):
    Added a _paused state flag, pause_generation() to block submissions and abort in-flight requests, resume_generation() to re-enable submissions, and an overridden generate_async() that enforces the paused state.
  • RayExecutor Abort (tensorrt_llm/executor/ray_executor.py):
    Added a public abort_all_requests() method that iterates over the in-flight GenerationResult objects and calls abort() on each, cancelling all active generation requests.
  • WorkerExtension Cache Reset (tensorrt_llm/llmapi/rlhf_utils.py):
    Added a public reset_prefix_cache() method that invalidates KV-cache prefix-reuse state by delegating to self.engine.reset_prefix_cache().
  • Test Coverage (tests/unittest/llmapi/test_async_llm.py):
    Added the async test test_async_llm_reset_prefix_cache, which validates KV-cache reuse metrics across three generate_async calls: cold (zero reuse), warm (positive reuse), and post-reset (zero reuse).
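Taken together, the summary above implies a small state machine: pausing flips a flag and tears down in-flight work, and the paused flag gates new submissions. A minimal runnable sketch of that flow, using only the names from the walkthrough (`_paused`, `pause_generation`, `resume_generation`, `generate_async`, `abort_all_requests`) with everything else stubbed, might look like this; it is illustrative only, not the actual tensorrt_llm implementation:

```python
# StubExecutor is invented here so the example runs standalone; the
# real executor is RayExecutor in tensorrt_llm/executor/ray_executor.py.

class StubExecutor:
    def __init__(self):
        self._results = []

    def submit(self, prompt):
        self._results.append(prompt)
        return f"result:{prompt}"

    def abort_all_requests(self):
        # The real method calls abort() on each in-flight GenerationResult.
        self._results.clear()


class AsyncLLMSketch:
    def __init__(self, executor):
        self._executor = executor
        self._paused = False

    def pause_generation(self):
        # Block new submissions, then cancel everything in flight.
        self._paused = True
        self._executor.abort_all_requests()

    def resume_generation(self):
        self._paused = False

    def generate_async(self, prompt):
        if self._paused:
            raise RuntimeError("generation is paused; call resume_generation() first")
        return self._executor.submit(prompt)


llm = AsyncLLMSketch(StubExecutor())
first = llm.generate_async("hello")   # accepted
llm.pause_generation()                # in-flight work aborted; new submissions raise
llm.resume_generation()
second = llm.generate_async("world")  # accepted again
```

The ordering inside pause_generation matters: setting `_paused` before aborting ensures no new request can slip in between the two steps.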

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (2 warnings)

  • Description check (⚠️ Warning): The PR description is entirely a template with no actual content filled in; all sections lack implementation details, rationale, and test-coverage explanations. Resolution: fill in the Description section explaining what the abort/resume functionality does and why it is needed, and list the specific validating tests under Test Coverage.
  • Docstring Coverage (⚠️ Warning): Docstring coverage is 53.33%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them.

✅ Passed checks (1 passed)

  • Title check (✅ Passed): The title clearly identifies the JIRA ticket, feature type, and main purpose: adding abort and resume functionality for Async RL.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (1)
tests/unittest/llmapi/test_async_llm.py (1)

140-178: Please add direct test coverage for pause/resume abort semantics.

This test validates reset_prefix_cache well, but the new feature also adds pause_generation() / resume_generation() and abort_all_requests(). A targeted async test should assert: (1) new submissions fail while paused, (2) in-flight requests are aborted, and (3) submissions succeed again after resume.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unittest/llmapi/test_async_llm.py` around lines 140 - 178, Add a new
async test in tests/unittest/llmapi/test_async_llm.py that mirrors the
reset_prefix_cache pattern but specifically exercises
AsyncLLM.pause_generation(), AsyncLLM.resume_generation(), and
AsyncLLM.abort_all_requests(): open an AsyncLLM context (same model,
kv_cache_config, sampling params), start a long-running generate_async call
(e.g., with a high max_tokens or a prompt that triggers streaming) and
concurrently call pause_generation() (or llm.collective_rpc("pause_generation")
if RPC-only), assert that new generate_async submissions immediately fail/raise
with a paused/validation error, call abort_all_requests() and assert the
in-flight generate_async finishes with an aborted/error state, then call
resume_generation() and assert subsequent generate_async calls succeed normally.
Reference AsyncLLM, generate_async, pause_generation, resume_generation, and
abort_all_requests so the test locates the right methods.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tensorrt_llm/_torch/async_llm.py`:
- Around line 105-109: pause_generation currently calls the synchronous method
_executor.abort_all_requests(), which blocks the asyncio event loop; change
pause_generation to call that synchronous function via await
asyncio.to_thread(self._executor.abort_all_requests) so the abort runs off the
event loop, and ensure asyncio is imported; update the async def
pause_generation method to set self._paused = True then await
asyncio.to_thread(...) to perform the abort without blocking.
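The fix the bot is proposing is the standard `asyncio.to_thread` offloading pattern. A self-contained sketch (the blocking function here is a stand-in for the synchronous `RayExecutor.abort_all_requests()`, not the real call):

```python
import asyncio
import time


def abort_all_requests_blocking():
    # Stand-in for the synchronous RayExecutor.abort_all_requests();
    # the real teardown can take long enough to stall the event loop.
    time.sleep(0.05)
    return "all aborted"


class PauseSketch:
    def __init__(self):
        self._paused = False

    async def pause_generation(self):
        self._paused = True
        # asyncio.to_thread runs the blocking call in a worker thread,
        # so other coroutines keep making progress during the abort.
        return await asyncio.to_thread(abort_all_requests_blocking)


status = asyncio.run(PauseSketch().pause_generation())
```

`asyncio.to_thread` is available from Python 3.9; on older interpreters the equivalent is `loop.run_in_executor(None, fn)`.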

In `@tensorrt_llm/executor/ray_executor.py`:
- Around line 312-315: The loop in abort_all_requests builds
list(self._results.values()) without synchronizing access to self._results,
which can be mutated concurrently; fix by acquiring the mutex that protects
_results (e.g., self._results_lock or the existing lock used around
submission/response paths) while copying the values into a local list, then
release the lock and iterate over that copied list calling result.abort() so
aborts run without holding the lock. Ensure you reference the abort_all_requests
method and the _results container when applying the change.
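The suggested fix is the snapshot-under-lock idiom: copy the container while holding the lock, then do the slow work outside it. In this sketch `_results_lock` is a hypothetical name (the real RayExecutor may guard `_results` differently), and FakeResult stands in for GenerationResult:

```python
import threading


class FakeResult:
    def __init__(self):
        self.aborted = False

    def abort(self):
        self.aborted = True


class RayExecutorSketch:
    def __init__(self):
        self._results = {}
        self._results_lock = threading.Lock()  # hypothetical lock name

    def submit(self, req_id):
        with self._results_lock:
            result = FakeResult()
            self._results[req_id] = result
        return result

    def abort_all_requests(self):
        # Copy under the lock so concurrent submissions can't mutate the
        # dict mid-iteration, then abort outside the lock so potentially
        # slow abort() calls don't block submitters.
        with self._results_lock:
            in_flight = list(self._results.values())
        for result in in_flight:
            result.abort()


ex = RayExecutorSketch()
r1, r2 = ex.submit(1), ex.submit(2)
ex.abort_all_requests()
```

Aborting outside the lock also avoids deadlock if `abort()` ever re-enters a code path that takes the same lock.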


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: d18175ed-7c06-4ed7-b46b-ce66ef8bd1c6

📥 Commits

Reviewing files that changed from the base of the PR and between d279344 and 88f495b.

📒 Files selected for processing (4)
  • tensorrt_llm/_torch/async_llm.py
  • tensorrt_llm/executor/ray_executor.py
  • tensorrt_llm/llmapi/rlhf_utils.py
  • tests/unittest/llmapi/test_async_llm.py

Comment thread tensorrt_llm/_torch/async_llm.py
Comment thread tensorrt_llm/executor/ray_executor.py
Collaborator

@Superjomn Superjomn left a comment


LGTM


@hchings hchings force-pushed the abort_resume_trtllm branch from 88f495b to 08ef80a Compare March 31, 2026 19:35
Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>
@hchings hchings force-pushed the abort_resume_trtllm branch from 08ef80a to 7260753 Compare March 31, 2026 19:35
@hchings
Collaborator Author

hchings commented Mar 31, 2026

/bot run

@tensorrt-cicd
Collaborator

PR_Github #41009 [ run ] triggered by Bot. Commit: 56f50b2 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #41009 [ run ] completed with state FAILURE. Commit: 56f50b2
/LLM/main/L0_MergeRequest_PR pipeline #31989 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@hchings
Collaborator Author

hchings commented Apr 1, 2026

/bot run

@hchings hchings enabled auto-merge (squash) April 1, 2026 21:18
@tensorrt-cicd
Collaborator

PR_Github #41263 [ run ] triggered by Bot. Commit: 56f50b2 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #41263 [ run ] completed with state SUCCESS. Commit: 56f50b2
/LLM/main/L0_MergeRequest_PR pipeline #32221 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@hchings
Collaborator Author

hchings commented Apr 2, 2026

/bot run

@tensorrt-cicd
Collaborator

PR_Github #42407 [ run ] completed with state SUCCESS. Commit: 8b1c58b
/LLM/main/L0_MergeRequest_PR pipeline #33180 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@hchings
Collaborator Author

hchings commented Apr 10, 2026

/bot run

@tensorrt-cicd
Collaborator

PR_Github #42735 [ run ] triggered by Bot. Commit: 8b1c58b Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #42735 [ run ] completed with state FAILURE. Commit: 8b1c58b

Link to invocation

@hchings
Collaborator Author

hchings commented Apr 13, 2026

/bot run

@tensorrt-cicd
Collaborator

PR_Github #42943 [ run ] triggered by Bot. Commit: e89160e Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #42943 [ run ] completed with state SUCCESS. Commit: e89160e
/LLM/main/L0_MergeRequest_PR pipeline #33601 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@hchings
Collaborator Author

hchings commented Apr 13, 2026

/bot run

@tensorrt-cicd
Collaborator

PR_Github #43092 [ run ] triggered by Bot. Commit: e89160e Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #43092 [ run ] completed with state SUCCESS. Commit: e89160e
/LLM/main/L0_MergeRequest_PR pipeline #33731 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@hchings
Collaborator Author

hchings commented Apr 14, 2026

/bot run

@tensorrt-cicd
Collaborator

PR_Github #43191 [ run ] triggered by Bot. Commit: e89160e Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #43191 [ run ] completed with state DISABLED
Freeze main and open the PR merge only after CI is back to healthy https://nvidia.slack.com/archives/C059LSY62BT/p1776141760843319?thread_ts=1775985925.442509&cid=C059LSY62BT

Link to invocation

@hchings
Collaborator Author

hchings commented Apr 14, 2026

/bot run

@tensorrt-cicd
Collaborator

PR_Github #43283 [ run ] triggered by Bot. Commit: a033fee Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #43283 [ run ] completed with state SUCCESS. Commit: a033fee
/LLM/main/L0_MergeRequest_PR pipeline #33830 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@hchings
Collaborator Author

hchings commented Apr 15, 2026

/bot run

@hchings
Collaborator Author

hchings commented Apr 15, 2026

/bot kill

@tensorrt-cicd
Collaborator

PR_Github #43551 [ run ] triggered by Bot. Commit: 5344d6b Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #43553 [ kill ] triggered by Bot. Commit: 5344d6b Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #43551 [ run ] completed with state ABORTED. Commit: 5344d6b

Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #43553 [ kill ] completed with state SUCCESS. Commit: 5344d6b
Successfully killed previous jobs for commit 5344d6b

Link to invocation

@hchings
Collaborator Author

hchings commented Apr 16, 2026

/bot run

@tensorrt-cicd
Collaborator

PR_Github #43682 [ run ] triggered by Bot. Commit: 87d993c Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #43682 [ run ] completed with state SUCCESS. Commit: 87d993c
/LLM/main/L0_MergeRequest_PR pipeline #34167 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation


3 participants