fix(mp): correct store cached requests in lmcache_mp_connector #3012
ApostaC merged 6 commits into LMCache:dev
Conversation
Code Review
This pull request modifies the block calculation logic in GetStoreMetadata to include hit blocks and updates _process_cached_requests to use incremental scheduled tokens for consistency. A review comment was provided requesting the addition of regression tests for these bug fixes, as mandated by the repository style guide.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Reviewed by Cursor Bugbot for commit d1e4179.
Force-pushed from d1e4179 to 469aace
…connector_0180 Signed-off-by: baoloongmao <baoloongmao@tencent.com>
…s bugs Signed-off-by: baoloongmao <baoloongmao@tencent.com>
Signed-off-by: baoloongmao <baoloongmao@tencent.com>
Force-pushed from adc5fd8 to befd463
@maobaolong good catch on the bugs! Can you also create a PR in vLLM? Both @KuntaiDu and I can approve that for you!
Force-pushed from 7fe7821 to 4571a84
@ApostaC Thanks for this quick review!
…tor LMCache#3012 Signed-off-by: baoloongmao <baoloongmao@tencent.com>
…tor LMCache#3012 (#15) Signed-off-by: baoloongmao <baoloongmao@tencent.com>
@maobaolong Are we going to keep track of vllm-project/vllm#39719?
Yeah, I will.
Signed-off-by: baoloongmao <baoloongmao@tencent.com>
Signed-off-by: baoloongmao <baoloongmao@tencent.com>
Force-pushed from 7f26074 to a4ada01

Summary
This PR fixes two bookkeeping bugs in lmcache_mp_connector_0180.py that together cause KV cache blocks to be silently dropped during store operations. The corresponding vLLM-side PR is vllm-project/vllm#39655; its changes are identical to this PR's.
Bug 1:
GetStoreMetadata — incorrect min_available_blocks upper bound
Root cause (causes missed stores):
In GetStoreMetadata, min_available_blocks was computed solely from num_scheduled_tokens // vllm_block_size. However, num_stored_blocks (set during the lookup phase) already includes the retrieve-hit blocks. When a request has LMCache hits, the subtraction min_available_blocks - num_stored_blocks can go negative, preventing any new blocks from being stored.
Example (block_size=64, chunk_size=256, blocks_in_chunk=4):
Suppose a request has 4096 tokens, and LMCache lookup hits 60 blocks (3840 tokens). The scheduler then schedules the remaining 256 tokens (= 4 blocks).
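This arithmetic can be sketched in Python. The helper below is a minimal illustration using the names from the description above, not the actual lmcache_mp_connector_0180.py code:

```python
# Minimal sketch of the Bug 1 arithmetic (hypothetical helper, not the
# real connector code). block_size = 64, as in the example.
VLLM_BLOCK_SIZE = 64

def num_staging_blocks(num_scheduled_tokens, num_lmcache_hit_blocks,
                       num_stored_blocks, include_hit_blocks):
    computed_blocks = num_scheduled_tokens // VLLM_BLOCK_SIZE
    if include_hit_blocks:
        # The fix: count the retrieve-hit blocks in the upper bound so it
        # matches the baseline that num_stored_blocks was set against.
        computed_blocks += num_lmcache_hit_blocks
    min_available_blocks = computed_blocks
    return min_available_blocks - num_stored_blocks

# 4096-token request: lookup hits 60 blocks, 256 tokens scheduled.
old = num_staging_blocks(256, 60, 60, include_hit_blocks=False)
new = num_staging_blocks(256, 60, 60, include_hit_blocks=True)
print(old, new)  # -56 4: old code stages nothing, fixed code stages 4 blocks
```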
| | num_scheduled_tokens // block_size | num_lmcache_hit_blocks | computed_blocks | min_available_blocks | num_stored_blocks (from lookup) | num_staging_blocks |
| --- | --- | --- | --- | --- | --- | --- |
| old code | 4 | 60 | 4 | 4 | 60 | -56 |
| fixed | 4 | 60 | 64 | 64 | 60 | 4 |

With the old code, num_staging_blocks is negative, so no new block is ever stored. The fix adds num_lmcache_hit_blocks to computed_blocks so the upper bound matches the baseline that num_stored_blocks was set against. Hit blocks are not re-stored because they are already counted in num_stored_blocks.
Bug 2:
_process_cached_requests — cumulative vs. incremental token count
Root cause (bookkeeping corruption):
In _process_cached_requests, num_new_tokens was read from cached_reqs.num_computed_tokens[idx], which is a cumulative value (total computed tokens so far). It was then passed to increase_num_scheduled_tokens(), which does += (accumulation). This double-counts tokens across scheduling rounds.
Example (block_size=64, chunk_size=256):
Suppose a request has 4096 tokens and the scheduler splits it into two prefill rounds of 2048 tokens each:
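Under these assumptions, the double-count can be sketched as below. This is a minimal illustration, assuming the first round goes through _process_new_requests (incremental) and the later round through _process_cached_requests; names follow the description, not the real connector code:

```python
# Minimal sketch of the Bug 2 tracker inflation (hypothetical helper).
def track_scheduled_tokens(rounds, use_incremental):
    tracker = 0   # what increase_num_scheduled_tokens() accumulates with +=
    computed = 0  # cumulative computed-token counter
    for i, scheduled in enumerate(rounds):
        computed += scheduled
        if i == 0 or use_incremental:
            tracker += scheduled  # incremental, consistent across rounds
        else:
            tracker += computed   # old code: cumulative value double-counts
    return tracker

rounds = [2048, 2048]  # 4096-token request split into two prefill rounds
print(track_scheduled_tokens(rounds, use_incremental=False))  # 6144 (inflated)
print(track_scheduled_tokens(rounds, use_incremental=True))   # 4096 (correct)
```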
| round | num_computed_tokens (cumulative) | num_scheduled_tokens (incremental) | tracker after += (old code) | tracker after += (fixed) |
| --- | --- | --- | --- | --- |
| 1 | 2048 | 2048 | 2048 | 2048 |
| 2 | 4096 | 2048 | 6144 | 4096 |

The inflated tracker value (6144 instead of 4096) does not cause out-of-bound stores thanks to the min() guard in GetStoreMetadata, but it corrupts internal bookkeeping and is a latent bug. The fix switches to scheduler_output.num_scheduled_tokens[request_id] (incremental), consistent with _process_new_requests.
Changes
- GetStoreMetadata: include num_lmcache_hit_blocks in computed_blocks to restore the correct upper bound for num_staging_blocks.
- _process_cached_requests: use scheduler_output.num_scheduled_tokens[request_id] (incremental) instead of cached_reqs.num_computed_tokens[idx] (cumulative).
Note
Medium Risk
Changes KV-cache store bookkeeping for cached and partially-hit requests; mistakes here could still cause missed stores or incorrect block accounting during prefill/generation.
Overview
Fixes LMCache MP store bookkeeping so KV blocks aren’t silently skipped when requests have a mix of vLLM APC hits and LMCache hits.
GetStoreMetadata now computes the storeable-block upper bound using scheduled blocks plus max(num_vllm_hit_blocks, num_lmcache_hit_blocks) (with added rationale comments), and _process_cached_requests now increments num_scheduled_tokens using the per-step scheduler_output.num_scheduled_tokens[request_id] instead of a cumulative computed-token counter.
Reviewed by Cursor Bugbot for commit a8ee866.
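The upper-bound computation described in this overview can be sketched as follows. The function name and signature are illustrative assumptions, not the connector's real API:

```python
# Hedged sketch of the fixed storeable-block upper bound: scheduled blocks
# plus max(vLLM APC hit blocks, LMCache hit blocks).
def storeable_block_upper_bound(num_scheduled_tokens, num_vllm_hit_blocks,
                                num_lmcache_hit_blocks, block_size=64):
    scheduled_blocks = num_scheduled_tokens // block_size
    # Whichever hit source is larger set the baseline behind
    # num_stored_blocks, so the bound must include it to keep the later
    # subtraction non-negative.
    return scheduled_blocks + max(num_vllm_hit_blocks, num_lmcache_hit_blocks)

# Bug 1 example: 256 scheduled tokens (4 blocks), 60 LMCache hit blocks.
print(storeable_block_upper_bound(256, 0, 60))  # 64
```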