Retain CUDA IPC events in MP adapter#3245

Merged
hlin99 merged 7 commits into LMCache:dev from he-yufeng:fix/mp-event-lifetime
Jun 2, 2026

Conversation

@he-yufeng
Contributor

What this PR does / why we need it:

Fixes #3236.

LMCacheMPWorkerAdapter now keeps the producer-side CUDA event objects alive for pending MP store and retrieve requests. The adapter already tracks the matching futures; this adds matching event references and drops them when the future completes or when pending work is drained after the server becomes unhealthy.

Without retaining the event object, the daemon can receive an IPC handle whose producer-side event has already been collected, which can make torch.cuda.Event.from_ipc_handle(...) fail with CUDA error: invalid argument.
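
The retention pattern described above can be sketched roughly as follows. This is a minimal illustration, not the adapter's actual code: `PendingRequests`, `StandInEvent`, `submit`, and `drain` are hypothetical names, and the only assumption about the event is that it exposes `ipc_handle()`.

```python
from concurrent.futures import Future
from typing import Any, Dict, List


class StandInEvent:
    """Hypothetical stand-in for a producer-side CUDA event."""

    def ipc_handle(self) -> bytes:
        return b"handle"


class PendingRequests:
    """Hypothetical tracker pairing each pending future with its event."""

    def __init__(self) -> None:
        self.futures: Dict[str, Future] = {}
        self.events: Dict[str, Any] = {}

    def submit(self, request_id: str, event: Any, future: Future) -> Any:
        # Keep a strong reference so the event outlives the exported handle.
        self.futures[request_id] = future
        self.events[request_id] = event
        return event.ipc_handle()

    def get_finished(self) -> List[str]:
        # Drop the event together with its future once the request completes.
        done = [rid for rid, fut in self.futures.items() if fut.done()]
        for rid in done:
            del self.futures[rid]
            self.events.pop(rid, None)
        return done

    def drain(self) -> None:
        # Release everything, e.g. after the server is reported unhealthy.
        self.futures.clear()
        self.events.clear()
```

Keying both dictionaries by the same request id means one cleanup path releases the future and the event together, so the event cannot outlive or be outlived by its request.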

Special notes for your reviewers:

The unit tests use a fake CUDA event plus weakref to check both sides of the lifetime: pending requests keep the event alive, and get_finished() releases it once the matching future completes.
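
As a rough illustration of that weakref technique (not the PR's actual test), the sketch below uses a plain dict in place of the adapter's event map. `FakeCudaEvent` here is a hypothetical stand-in class, and `check_lifetime` is a name made up for this example.

```python
import gc
import weakref
from typing import Dict, Tuple


class FakeCudaEvent:
    """Hypothetical fake; only needs to be weak-referenceable."""

    def ipc_handle(self) -> bytes:
        return b"fake-handle"


def check_lifetime() -> Tuple[bool, bool]:
    retained: Dict[str, FakeCudaEvent] = {}  # stands in for the adapter's event map
    event = FakeCudaEvent()
    ref = weakref.ref(event)

    retained["req-1"] = event
    del event  # the caller's reference goes away
    gc.collect()
    alive_while_pending = ref() is not None

    retained.pop("req-1")  # simulates cleanup when the future completes
    gc.collect()
    released_after_finish = ref() is None
    return alive_while_pending, released_after_finish
```

The weak reference observes the object without keeping it alive, so it can confirm both that the container holds the event while the request is pending and that nothing else holds it afterwards.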

Validation run locally on Windows with a CPU/source-only test environment:

python -m ruff check lmcache\integration\vllm\vllm_multi_process_adapter.py tests\v1\test_vllm_mp_adapter.py
python -m ruff format --check lmcache\integration\vllm\vllm_multi_process_adapter.py tests\v1\test_vllm_mp_adapter.py
.\.venv\Scripts\python.exe -m pytest tests\v1\test_vllm_mp_adapter.py -q
.\.venv\Scripts\python.exe -m py_compile lmcache\integration\vllm\vllm_multi_process_adapter.py tests\v1\test_vllm_mp_adapter.py
git diff --check
git fsck --no-progress

If applicable:

  • this PR contains user facing changes - docs added
  • this PR contains unit tests

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request modifies the LMCacheMPWorkerAdapter to maintain references to CUDA event objects during store and retrieve operations, ensuring they are not garbage collected while their IPC handles are in use. It also includes regression tests to verify that these events are correctly held and released upon request completion. I have no feedback to provide.

Contributor

@KuntaiDu KuntaiDu left a comment

LGTM! @ApostaC @YaoJiayi can you also check?

Comment thread lmcache/integration/vllm/vllm_multi_process_adapter.py Outdated
@he-yufeng
Contributor Author

Follow-up pushed in 6dd0685.

Changes:

  • replaced direct torch.cuda.Event annotations with a small structural ipc_handle() protocol, so this adapter API no longer couples the type surface to torch.cuda;
  • annotated the fake heartbeat callback slot to clear the mypy assignment error.
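
A structural protocol of the kind mentioned in the first bullet might look like the sketch below. The name `IpcEventLike`, the `Any` return type, and the helper `export_handle` are assumptions for illustration; the adapter's real definition may differ.

```python
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class IpcEventLike(Protocol):
    """Hypothetical protocol: anything that can export an IPC handle."""

    def ipc_handle(self) -> Any: ...


def export_handle(event: IpcEventLike) -> Any:
    # Type checkers accept any object with a matching ipc_handle() method,
    # so this signature does not need to import torch.
    return event.ipc_handle()


class FakeEvent:
    """Test double that satisfies the protocol without subclassing it."""

    def ipc_handle(self) -> bytes:
        return b"x"
```

Because the match is structural, both a real CUDA event and a test fake satisfy the annotation, which keeps the typed surface independent of torch.cuda.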

Validation on Windows:

  • python -m py_compile lmcache\integration\vllm\vllm_multi_process_adapter.py tests\v1\test_vllm_mp_adapter.py
  • git diff --check
  • python -m ruff check lmcache\integration\vllm\vllm_multi_process_adapter.py tests\v1\test_vllm_mp_adapter.py
  • python -m mypy --config-file pyproject.toml lmcache\integration\vllm\vllm_multi_process_adapter.py tests\v1\test_vllm_mp_adapter.py

Note: python -m pytest tests\v1\test_vllm_mp_adapter.py -q is blocked locally before collection by the missing lmcache.c_ops native extension in this Windows checkout.

@he-yufeng he-yufeng force-pushed the fix/mp-event-lifetime branch from 6dd0685 to ba4ccfb Compare May 27, 2026 08:51
@he-yufeng he-yufeng force-pushed the fix/mp-event-lifetime branch 2 times, most recently from 6dd0685 to 530fd55 Compare May 27, 2026 08:52
@he-yufeng
Contributor Author

DCO is fixed now. I rebased the two PR commits with Signed-off-by footers and force-pushed the same code diff; current head is 530fd55. The temporary bad squash was immediately reverted, and the PR diff is back to the intended two files.

@he-yufeng he-yufeng force-pushed the fix/mp-event-lifetime branch from 530fd55 to 383d485 Compare May 27, 2026 09:41
@he-yufeng
Contributor Author

Rebased onto the latest dev and resolved the test conflict; current head is 383d485. The code diff is still limited to the MP adapter and its regression tests.

Validation on Windows:

  • python -m py_compile lmcache\integration\vllm\vllm_multi_process_adapter.py tests\v1\test_vllm_mp_adapter.py
  • python -m ruff check lmcache\integration\vllm\vllm_multi_process_adapter.py tests\v1\test_vllm_mp_adapter.py
  • python -m mypy --config-file pyproject.toml lmcache\integration\vllm\vllm_multi_process_adapter.py tests\v1\test_vllm_mp_adapter.py
  • git diff --check upstream/dev...HEAD

Targeted pytest is still blocked before collection by the missing lmcache.c_ops extension in this Windows checkout.

he-yufeng added 3 commits May 28, 2026 02:08
Signed-off-by: Yufeng He <40085740+he-yufeng@users.noreply.github.com>
Signed-off-by: Yufeng He <40085740+he-yufeng@users.noreply.github.com>
Signed-off-by: Yufeng He <40085740+he-yufeng@users.noreply.github.com>
@he-yufeng he-yufeng force-pushed the fix/mp-event-lifetime branch from 383d485 to 3cd6b98 Compare May 27, 2026 18:09
@he-yufeng
Contributor Author

Updated again in 3cd6b98.

Changes:

  • rebased the PR onto current dev;
  • fixed the CI failure in the new event-retention tests by stubbing the current TransferContext path instead of the older send_lmcache_request/to_cuda_future path;
  • the earlier torch_dev review concern is addressed in 872da20: the adapter retains IPC-capable events through a small _IpcEvent protocol and no longer exposes a torch.cuda.Event type dependency in this adapter API.

Validation on Windows:

  • python -m py_compile lmcache\integration\vllm\vllm_multi_process_adapter.py tests\v1\test_vllm_mp_adapter.py
  • python -m ruff check lmcache\integration\vllm\vllm_multi_process_adapter.py tests\v1\test_vllm_mp_adapter.py
  • python -m mypy --config-file pyproject.toml lmcache\integration\vllm\vllm_multi_process_adapter.py tests\v1\test_vllm_mp_adapter.py
  • git diff --check

The targeted pytest command is still blocked locally before collection in this Windows checkout by the missing native lmcache.c_ops extension; the previous CI failure itself should be covered by the Linux test job after this push.

Signed-off-by: Yufeng He <40085740+he-yufeng@users.noreply.github.com>
@he-yufeng
Contributor Author

Pushed 195edaa to fix the Linux CI failures in the new event-retention tests. The adapter was already clearing store_events / retrieve_events; the test mock's call_args was still holding the FakeCudaEvent reference, so the weakref assertion was testing the fixture rather than adapter cleanup. Validation on Windows: py_compile for the adapter and test, ruff check on touched files, mypy on touched files, and git diff --check passed. The targeted pytest still cannot collect locally on this Windows host because lmcache.c_ops is not built.

@he-yufeng
Contributor Author

Follow-up for the CI failure: the test was still retaining the fake CUDA event through the parent transfer_ctx mock's call history. I switched the cleanup to reset the parent mock, so the weakref check now verifies the adapter no longer owns the event after get_finished() removes it.
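
The effect can be reproduced in isolation with `unittest.mock`. The sketch below is not the PR's test: `transfer_ctx.submit` is a hypothetical method name and `event_released_after` is a helper made up for this example. It shows that resetting only the child mock leaves the argument recorded in the parent's call history, while resetting the parent clears both.

```python
import gc
import weakref
from unittest.mock import MagicMock


class FakeCudaEvent:
    """Hypothetical fake event used only to observe its lifetime."""


def event_released_after(reset_parent: bool) -> bool:
    transfer_ctx = MagicMock()
    event = FakeCudaEvent()
    ref = weakref.ref(event)

    # The call is recorded on the child mock and on the parent's mock_calls.
    transfer_ctx.submit(event)
    del event

    if reset_parent:
        transfer_ctx.reset_mock()  # clears parent and child call history
    else:
        transfer_ctx.submit.reset_mock()  # parent history still holds the event

    gc.collect()
    return ref() is None
```

Without the parent reset, a weakref assertion would fail even if the code under test had already dropped its own reference.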

Validated locally:

  • python -m py_compile tests\v1\test_vllm_mp_adapter.py
  • ruff check tests\v1\test_vllm_mp_adapter.py
  • mypy tests\v1\test_vllm_mp_adapter.py
  • git diff --check

python -m pytest tests\v1\test_vllm_mp_adapter.py -q is still blocked in my Windows checkout before test collection by ModuleNotFoundError: No module named 'lmcache.c_ops' from tests/conftest.py.

Signed-off-by: Yufeng He <40085740+he-yufeng@users.noreply.github.com>
@he-yufeng he-yufeng force-pushed the fix/mp-event-lifetime branch from 95af7d1 to e50d972 Compare May 30, 2026 13:41
Collaborator

@maobaolong maobaolong left a comment

LGTM.

@maobaolong
Collaborator

@hlin99 Would you like to double-check this?

Collaborator

@hlin99 hlin99 left a comment

LGTM

@hlin99 hlin99 enabled auto-merge (squash) June 1, 2026 06:48
@github-actions github-actions Bot added the full Run comprehensive tests on this PR label Jun 1, 2026
@hlin99 hlin99 merged commit 5824ab3 into LMCache:dev Jun 2, 2026
26 of 27 checks passed
ruizhang0101 pushed a commit to ruizhang0101/LMCache that referenced this pull request Jun 2, 2026
* fix: retain CUDA IPC events in MP adapter

Signed-off-by: Yufeng He <40085740+he-yufeng@users.noreply.github.com>

* fix: avoid CUDA event type coupling

Signed-off-by: Yufeng He <40085740+he-yufeng@users.noreply.github.com>

* test: use transfer context in MP adapter event tests

Signed-off-by: Yufeng He <40085740+he-yufeng@users.noreply.github.com>

* test: clear MP event test mock references

Signed-off-by: Yufeng He <40085740+he-yufeng@users.noreply.github.com>

* test: drop parent mock event references

Signed-off-by: Yufeng He <40085740+he-yufeng@users.noreply.github.com>

---------

Signed-off-by: Yufeng He <40085740+he-yufeng@users.noreply.github.com>

Labels

full Run comprehensive tests on this PR

Development

Successfully merging this pull request may close these issues.

[Bug][MP] LMCacheMPWorkerAdapter does not retain producer-side CUDA Event after exporting ipc_handle

5 participants