@grimoire grimoire commented Nov 7, 2025

We have to remove the free ratio check in schedule prefill because:

  1. When a decoding request with a large session length is evicted, it enters the waiting queue. The check might then prevent it from ever being recomputed.
  2. If all requests are in either the waiting or the hanging (finish) queue, the engine might crash, since the resources held by hanging requests would never be freed.

Since the reserved cache does not provide much acceleration, I think we should remove it until these problems are solved.
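
The first failure mode can be illustrated with a toy scheduler. This is only a sketch under stated assumptions, not the engine's actual code: `Request`, `schedule_prefill`, `min_free_ratio`, and the block counts are all hypothetical names and numbers chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Request:
    name: str
    num_blocks: int  # cache blocks needed to recompute the whole session


def schedule_prefill(waiting, total_blocks, free_blocks, min_free_ratio=None):
    """Admit waiting requests in order; return (admitted, still_waiting)."""
    admitted, still_waiting = [], []
    for req in waiting:
        fits = req.num_blocks <= free_blocks
        if fits and min_free_ratio is not None:
            # Hypothetical free-ratio check: keep a share of the cache in reserve.
            remaining = free_blocks - req.num_blocks
            fits = remaining / total_blocks >= min_free_ratio
        if fits:
            free_blocks -= req.num_blocks
            admitted.append(req)
        else:
            still_waiting.append(req)
    return admitted, still_waiting


# An evicted decoding request with a long session needs 95 of 100 blocks.
evicted = Request("long-session", num_blocks=95)

# With the check, the request is rejected even though the cache is empty.
with_check, stuck = schedule_prefill([evicted], 100, 100, min_free_ratio=0.1)
# Without the check, the same request is admitted and can be recomputed.
without_check, _ = schedule_prefill([evicted], 100, 100)
print(len(with_check), len(without_check))
```

In this toy model the long request stays in the waiting queue on every scheduling step while the check is enabled, because no amount of freed cache can satisfy both the request and the reserve.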

grimoire commented Nov 7, 2025

@Tsundoku958

@lvhan028 lvhan028 requested a review from CUHKSZzxy November 7, 2025 10:46

@CUHKSZzxy CUHKSZzxy left a comment

LGTM

@lvhan028 lvhan028 added the Bug:P0 label Nov 7, 2025
@lvhan028 lvhan028 merged commit b186cf0 into InternLM:main Nov 7, 2025
5 checks passed
3 participants