[Draft] Add NIXL transfer release cancellation hook #13495
yifjiang wants to merge 4 commits into NVIDIA:main
Conversation
Force-pushed from 037258e to 1ce7e8a
Follow-up from E2E testing: The previous version recovered from context-transfer cancellation because the sender erased a queued response, which abandoned the waiting promise and made the sender future ready only as a side effect. This branch now makes that unwind explicit: when the sender takes the cancelled response out of its ready bookkeeping, it sets an explicit cancellation exception on the sender future.
Force-pushed from 1ce7e8a to a0bbb57
Correction to the previous follow-up: The first explicit-exception patch changed the ordering: it made the sender future ready before the cancelled response was removed from the sender's ready bookkeeping. The branch has been amended again so cancellation now moves the response out of the ready map and erases sender bookkeeping before the explicit cancellation exception is set on the future.
Summary
This draft uses the same intended base/merge point as #13439: 4e69c14f732a6e6afce4f71616db5b5cd2b10530. It keeps the conservative request-lifetime hardening from #13439, then adds a NIXL transfer-release hook so TRT-LLM can release backend transfer handles when cancellation is observed.
The key safety boundary is intentional: `release()` means the backend accepted release of the transfer handle. It is not treated as proof that remote KV memory is quiesced and immediately safe to recycle, especially for UCX-backed one-sided RMA paths.
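To make that boundary concrete, here is a minimal C++ sketch of the hook shape. `TransferStatus::release()`, `NixlTransferStatus`, and `nixlAgent::releaseXferReq()` are named in this PR; the stub agent type, member names, and exact signatures below are illustrative assumptions rather than the TRT-LLM implementation.

```cpp
// Minimal sketch of the hook shape; stub types and member names are illustrative.
#include <iostream>

// Stand-in for the NIXL agent surface referenced in this PR.
struct NixlAgentStub
{
    // Ask the backend to release/cancel a transfer handle. Acceptance of the
    // release is NOT proof that remote KV memory is quiesced.
    void releaseXferReq(void* handle) noexcept
    {
        std::cout << "backend accepted release of handle " << handle << "\n";
    }
};

// Transfer-status interface gaining a release() hook (sketch).
class TransferStatus
{
public:
    virtual ~TransferStatus() = default;
    virtual void release() = 0;
};

class NixlTransferStatus final : public TransferStatus
{
public:
    NixlTransferStatus(NixlAgentStub* agent, void* handle)
        : mAgent(agent), mHandle(handle) {}

    // Release the backend transfer handle exactly once; never recycles KV memory.
    void release() override
    {
        if (mAgent != nullptr && mHandle != nullptr)
        {
            mAgent->releaseXferReq(mHandle);
            mHandle = nullptr;
        }
    }

    // Final cleanup guard: release any handle still outstanding at destruction.
    ~NixlTransferStatus() override { release(); }

private:
    NixlAgentStub* mAgent;
    void* mHandle;
};
```

The destructor acts as the final cleanup guard mentioned under What Changed; `release()` only hands the handle back to the backend and never touches KV memory, matching the boundary above.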
What Changed
Inherited from #13439:
- The cache transceiver holds a `std::shared_ptr<LlmRequest>` while the async receive future is outstanding (see the sketch after this list).
- The receiver worker holds a `std::shared_ptr<LlmRequest>` while queued or executing.
- Python fails closed while a generation receive is in flight instead of freeing request resources under raw C++ users.
- … progress.
- … were expanded.
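A minimal sketch of the request-pinning idea on the worker side, assuming a simplified `LlmRequest` stand-in and a hypothetical `CacheReceiverWorkerSketch`; the real TRT-LLM queueing code differs, this only shows why a `std::shared_ptr` copy in the queue keeps the request alive while it is queued or executing.

```cpp
// Hedged sketch: shared_ptr copies pin the request for the worker's lifetime.
#include <deque>
#include <memory>
#include <mutex>

struct LlmRequest
{
    int requestId;
    // ... KV-cache bookkeeping lives here in the real class ...
};

class CacheReceiverWorkerSketch
{
public:
    // The worker queue stores shared_ptr copies, so a request stays alive
    // while it is queued or executing even if other owners drop it.
    void enqueue(std::shared_ptr<LlmRequest> request)
    {
        std::lock_guard<std::mutex> lock(mMutex);
        mPending.push_back(std::move(request));
    }

    void processOne()
    {
        std::shared_ptr<LlmRequest> current;
        {
            std::lock_guard<std::mutex> lock(mMutex);
            if (mPending.empty()) return;
            current = std::move(mPending.front());
            mPending.pop_front();
        }
        // 'current' keeps the request valid for the whole receive, even if
        // Python-side cleanup released its own reference concurrently.
        (void) current->requestId;
    }

private:
    std::mutex mMutex;
    std::deque<std::shared_ptr<LlmRequest>> mPending;
};
```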
Added in this PR:
- Add `TransferStatus::release()` to the C++ transfer-status interface.
- Implement `NixlTransferStatus::release()` with `nixlAgent::releaseXferReq()`.
- Have `NixlTransferStatus` release outstanding handles in its destructor as a final cleanup guard.
- The receive cancellation path checks `getTransferTerminate()` and calls `status->release()` before surfacing cancellation.
- … notification actually arrived.
- … treating that as a successful receive.
- Expose `release()` through nanobind and Python transfer-status wrappers.
- The sender sets a request-specific cancellation exception after sender bookkeeping has been erased. This keeps the working v1 timing shape without relying on `std::future_error: Broken promise`.

Request Lifetime Before This Branch
Before the request-lifetime hardening, generation receive was vulnerable because
C++ tracked raw request pointers while Python could free resources through broad
error cleanup.
```mermaid
sequenceDiagram
    participant Py as Python executor
    participant CT as CacheTransceiver
    participant CR as CacheReceiver worker
    participant Req as LlmRequest
    participant RM as ResourceManager
    Py->>CT: start generation receive
    CT->>CR: queue raw LlmRequest pointer
    CT->>CT: store raw pointer plus future
    Py->>Py: broad error path handles active requests
    Py->>RM: free resources for request
    RM-->>Req: KV blocks and request resources may be released
    CR->>Req: later worker access through stale raw pointer
    CT->>Req: later status check through stale raw pointer
```

Impact: if the request or its KV resources were freed while the transfer worker or status checker still had only raw references, a later access could crash or corrupt memory.
Request Lifetime After This Branch
The current branch pins the request object in C++ until transfer completion or
error, and Python avoids freeing resources on broad generation-transfer errors.
```mermaid
sequenceDiagram
    participant Py as Python executor
    participant CT as CacheTransceiver
    participant CR as CacheReceiver worker
    participant Req as shared LlmRequest
    participant RM as ResourceManager
    Py->>CT: start generation receive
    CT->>CR: queue shared_ptr LlmRequest
    CT->>CT: store shared_ptr plus future
    Py->>Py: broad error while generation receive is in flight
    Py->>Py: fail closed without freeing active request resources
    CR-->>CT: future resolves or reports error
    CT->>Req: set transfer complete or transfer error
    Py->>RM: free resources only after C++ tracking has drained
```

Impact: the `LlmRequest` object remains valid while C++ workers and future tracking can still touch it. Ambiguous generation receive failures remain fail-closed because receiver-side cancellation cannot prove that a remote sender is no longer writing into the target KV blocks.
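One way to picture the drain-before-free ordering is the hedged sketch below, using hypothetical names (`TrackedTransfer`, `TransceiverTrackingSketch`, `canFreeResources`); it is not the actual CacheTransceiver code, only the ordering it describes: the request stays pinned, and resource freeing waits until the tracked future has resolved to complete or error.

```cpp
// Hedged sketch with hypothetical names; illustrates only the ordering
// "free request resources after C++ tracking has drained".
#include <chrono>
#include <future>
#include <memory>
#include <unordered_map>

struct LlmRequest { int requestId; };

struct TrackedTransfer
{
    std::shared_ptr<LlmRequest> request; // pins the request object
    std::future<void> done;              // resolved by the receive worker
};

class TransceiverTrackingSketch
{
public:
    void track(std::shared_ptr<LlmRequest> req, std::future<void> done)
    {
        int id = req->requestId;
        mActive.emplace(id, TrackedTransfer{std::move(req), std::move(done)});
    }

    // Poll one tracked transfer; only a resolved future removes the entry.
    bool resolve(int requestId)
    {
        auto it = mActive.find(requestId);
        if (it == mActive.end())
            return true; // nothing outstanding for this request
        if (it->second.done.wait_for(std::chrono::seconds(0)) != std::future_status::ready)
            return false; // still in flight: do not free resources yet
        try { it->second.done.get(); /* transfer complete */ }
        catch (...) { /* transfer error: surface a DISAGG_TRANS_ERROR-style state */ }
        mActive.erase(it);
        return true;
    }

    // Cleanup checks this before releasing KV resources for a request.
    bool canFreeResources(int requestId) const
    {
        return mActive.find(requestId) == mActive.end();
    }

private:
    std::unordered_map<int, TrackedTransfer> mActive;
};
```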
Context Send Cancellation In Current v3
The first version of this PR recovered by abandoning a `std::promise`, which made the waiting future ready with `std::future_error: Broken promise`. A later patch made the future ready too early and caused a regression. The current v3 keeps the working ordering but uses an explicit cancellation exception.
```mermaid
sequenceDiagram
    participant Py as Python timeout path
    participant CS as CacheSender
    participant Resp as queued Response
    participant Fut as sender future
    participant CT as CacheTransceiver status
    Py->>CS: cancel context request
    CS->>CS: send not-ready signal
    CS->>Resp: move response out of ready map
    CS->>CS: erase ready response and cancel bookkeeping
    CS->>CS: clear current request and ready state
    CS->>Fut: set explicit cancellation exception
    CT->>Fut: future.get observes cancellation
    CT->>CT: mark request DISAGG_TRANS_ERROR and erase future
```

This preserves the observed working cleanup path without depending on abandoned-promise semantics or racing the status checker against sender-side bookkeeping.
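A hedged sketch of that ordering with hypothetical names (`CacheSenderSketch`, `TransferCancelledError`); the real sender bookkeeping is richer, but the point is that the ready map and cancel bookkeeping are erased before the sender future is completed with an explicit, request-specific exception.

```cpp
// Hedged sketch of the v3 ordering: erase sender bookkeeping first, then make
// the waiting future ready with an explicit cancellation exception instead of
// relying on a broken promise. All names here are illustrative.
#include <future>
#include <map>
#include <stdexcept>
#include <string>

struct TransferCancelledError : std::runtime_error
{
    using std::runtime_error::runtime_error;
};

class CacheSenderSketch
{
public:
    std::future<void> registerSend(int requestId)
    {
        auto& promise = mPromises[requestId];
        mReadyResponses[requestId] = "queued response payload";
        return promise.get_future();
    }

    void cancel(int requestId)
    {
        // 1) Move the cancelled response out of the ready map and erase
        //    sender bookkeeping, matching the working v1 timing shape.
        mReadyResponses.erase(requestId);

        // 2) Only then make the future ready, with a request-specific
        //    cancellation exception rather than a broken promise.
        auto it = mPromises.find(requestId);
        if (it == mPromises.end()) return;
        it->second.set_exception(std::make_exception_ptr(
            TransferCancelledError("context send cancelled for request "
                                   + std::to_string(requestId))));
        mPromises.erase(it);
    }

private:
    std::map<int, std::promise<void>> mPromises;
    std::map<int, std::string> mReadyResponses;
};
```

The waiter's `future.get()` then throws the cancellation exception, which the status path can map to `DISAGG_TRANS_ERROR` without any broken-promise special case.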
Generation Timeout In Current v3
The generation side still does not have a clean in-progress cancel path. When the worker already owns the receive request, `CacheReceiver::cancelRequest()` can return false and log `Cannot cancel request`. In v3, the loop is bounded because the worker or future status path eventually resolves or errors and Python removes the request from the active transfer path.
```mermaid
sequenceDiagram
    participant Py as Python timeout path
    participant CT as CacheTransceiver
    participant CR as CacheReceiver
    participant Fut as receiver future
    Py->>Py: KV transfer timeout flag is set
    Py->>CT: cancel generation request
    CT->>CR: cancelRequest
    CR-->>CT: false if worker already owns request
    CT-->>Py: cancellation still pending
    Py->>Py: later iterations retry while request remains active
    CR-->>Fut: worker completes or reports error
    CT->>Fut: future.get
    CT->>CT: set complete or DISAGG_TRANS_ERROR
    Py->>Py: active request cleanup stops retry loop
```

This is functionally acceptable in the current e2e burst harness, but it is not as clean as the PR13301 deadline-driven path because it can still emit `Cannot cancel request` log noise.
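A hedged sketch of why the retry loop stays bounded, with hypothetical names (`GenerationReceiveSketch`, `pollOnce`); it is not the TRT-LLM timeout code, only its shape: `tryCancel()` can keep failing while the worker owns the request, but each timeout iteration also polls the receive future and stops retrying once it resolves.

```cpp
// Hedged sketch: cancel may be refused, but the timeout loop terminates once
// the receive future resolves and the request leaves the active transfer path.
#include <chrono>
#include <future>

enum class RequestState { Active, TransferComplete, TransferError };

struct GenerationReceiveSketch
{
    explicit GenerationReceiveSketch(std::future<void> done)
        : receiveDone(std::move(done)) {}

    std::future<void> receiveDone;   // resolved by the receiver worker
    RequestState state = RequestState::Active;

    // Best-effort cancel: mirrors cancelRequest() returning false when the
    // worker already popped the request from its queue.
    bool tryCancel(bool workerOwnsRequest)
    {
        return !workerOwnsRequest; // false => "Cannot cancel request" log noise
    }

    // Called on each timeout-path iteration; the loop is bounded because the
    // worker eventually completes or reports an error through the future.
    bool pollOnce()
    {
        if (receiveDone.wait_for(std::chrono::seconds(0)) != std::future_status::ready)
            return false; // still active, retry on a later iteration
        try { receiveDone.get(); state = RequestState::TransferComplete; }
        catch (...) { state = RequestState::TransferError; } // DISAGG_TRANS_ERROR analogue
        return true; // request leaves the active transfer path, retries stop
    }
};
```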
E2E Observations

Tested with `pr13495 + pr13359v3` on the 1P1D burst harness at concurrency 16, 48, and 128:
- The original failure mode, where runs emit `Cannot cancel request` warnings and fail to recover, is fixed.
- … `Cannot cancel request` markers, sparse `Broken promise` markers, and decode-side KV timeout markers.
- … cleanup avoids the `Cannot cancel request` loop.

Safety Notes
- `releaseXferReq()` is treated as a backend handle release or cancellation request, not as proof that remote KV memory can be immediately recycled.
- Receiver-side cancellation cannot prove quiescence of remote sender RMA.
- The fail-closed behavior avoids freeing and reusing KV memory while a sender may still be writing.
- … cascade-prune assertion in block reuse and eviction, not transport cancellation. It should land independently or be included in test stacks that exercise the NVBugs 6104831 burst workload.
Remaining Limitations
- The generation side still lacks a clean in-progress per-request cancel path.
- `Cannot cancel request` may still appear when the receiver worker has already popped the request from its queue.
Validation
- `git diff --check`
- `PYTHONPYCACHEPREFIX=/tmp/trtllm-cancel-pycache python3 -m py_compile tensorrt_llm/_torch/disaggregation/base/agent.py tensorrt_llm/_torch/disaggregation/nixl/_agent_cpp.py tensorrt_llm/_torch/disaggregation/nixl/_agent_py.py`
- `pr13495 + pr13359v3` at concurrency 16, 48, and 128 recovered successfully.
Not yet run from this branch: full C++ build or TRT-LLM runtime test suite.