Retain CUDA IPC events in MP adapter#3245
Conversation
There was a problem hiding this comment.
Code Review
This pull request modifies the LMCacheMPWorkerAdapter to maintain references to CUDA event objects during store and retrieve operations, ensuring they are not garbage collected while their IPC handles are in use. It also includes regression tests to verify that these events are correctly held and released upon request completion. I have no feedback to provide.
|
Follow-up pushed in 6dd0685. Changes:
Validation on Windows:
Note: python -m pytest tests\v1\test_vllm_mp_adapter.py -q is blocked locally before collection by the missing lmcache.c_ops native extension in this Windows checkout. |
6dd0685 to
ba4ccfb
Compare
6dd0685 to
530fd55
Compare
|
DCO is fixed now. I rebased the two PR commits with Signed-off-by footers and force-pushed the same code diff; current head is 530fd55. The temporary bad squash was immediately reverted, and the PR diff is back to the intended two files. |
530fd55 to
383d485
Compare
|
Rebased onto the latest dev and resolved the test conflict; current head is 383d485. The code diff is still limited to the MP adapter and its regression tests. Validation on Windows: python -m py_compile lmcache\integration\vllm\vllm_multi_process_adapter.py tests\v1\test_vllm_mp_adapter.py; python -m ruff check lmcache\integration\vllm\vllm_multi_process_adapter.py tests\v1\test_vllm_mp_adapter.py; python -m mypy --config-file pyproject.toml lmcache\integration\vllm\vllm_multi_process_adapter.py tests\v1\test_vllm_mp_adapter.py; git diff --check upstream/dev...HEAD. Targeted pytest is still blocked before collection by missing lmcache.c_ops in this Windows checkout. |
Signed-off-by: Yufeng He <40085740+he-yufeng@users.noreply.github.com>
Signed-off-by: Yufeng He <40085740+he-yufeng@users.noreply.github.com>
Signed-off-by: Yufeng He <40085740+he-yufeng@users.noreply.github.com>
383d485 to
3cd6b98
Compare
|
Updated again in 3cd6b98. Changes:
Validation on Windows:
The targeted pytest command is still blocked locally before collection in this Windows checkout by the missing native lmcache.c_ops extension; the previous CI failure itself should be covered by the Linux test job after this push. |
Signed-off-by: Yufeng He <40085740+he-yufeng@users.noreply.github.com>
|
Pushed 195edaa to fix the Linux CI failures in the new event-retention tests. The adapter was already clearing store_events / retrieve_events; the test mock's call_args was still holding the FakeCudaEvent reference, so the weakref assertion was testing the fixture rather than adapter cleanup. Validation on Windows: py_compile for the adapter and test, ruff check on touched files, mypy on touched files, and git diff --check passed. The targeted pytest still cannot collect locally on this Windows host because lmcache.c_ops is not built. |
|
Follow-up for the CI failure: the test was still retaining the fake CUDA event through the parent Validated locally:
|
Signed-off-by: Yufeng He <40085740+he-yufeng@users.noreply.github.com>
95af7d1 to
e50d972
Compare
|
@hlin99 Would you like to take a double check? |
* fix: retain CUDA IPC events in MP adapter Signed-off-by: Yufeng He <40085740+he-yufeng@users.noreply.github.com> * fix: avoid CUDA event type coupling Signed-off-by: Yufeng He <40085740+he-yufeng@users.noreply.github.com> * test: use transfer context in MP adapter event tests Signed-off-by: Yufeng He <40085740+he-yufeng@users.noreply.github.com> * test: clear MP event test mock references Signed-off-by: Yufeng He <40085740+he-yufeng@users.noreply.github.com> * test: drop parent mock event references Signed-off-by: Yufeng He <40085740+he-yufeng@users.noreply.github.com> --------- Signed-off-by: Yufeng He <40085740+he-yufeng@users.noreply.github.com>
What this PR does / why we need it:
Fixes #3236.
LMCacheMPWorkerAdapternow keeps the producer-side CUDA event objects alive for pending MP store and retrieve requests. The adapter already tracks the matching futures; this adds matching event references and drops them when the future completes or when pending work is drained after the server becomes unhealthy.Without retaining the event object, the daemon can receive an IPC handle whose producer-side event has already been collected, which can make
torch.cuda.Event.from_ipc_handle(...)fail withCUDA error: invalid argument.Special notes for your reviewers:
The unit tests use a fake CUDA event plus
weakrefto check both sides of the lifetime: pending requests keep the event alive, andget_finished()releases it once the matching future completes.Validation run locally on Windows with a CPU/source-only test environment:
If applicable: