[Usage]: What should I do if I want to skip the prefill of a new request? #14863

chenhongyu2048 · 2025-03-15T13:54:34Z

Your current environment

The output of `python collect_env.py`

How would you like to use vllm

My question:
I want to add a new request to the engine, create a dummy KV cache for it, and have it start directly in the decode stage.
How should I do this?
Maybe I should first modify the STATUS in SeqGroup, then allocate some fake KV cache blocks in the block manager?
But I'm still unsure how to add the fake first output token for the request.
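
To make the plan above concrete, here is a toy sketch of the bookkeeping it implies. It deliberately does not use vLLM's real classes: `ToySequence`, `ToyBlockAllocator`, `skip_prefill`, and the `BLOCK_SIZE` value are all hypothetical names invented for illustration, and the actual scheduler and block manager in vLLM may track this state differently.

```python
from dataclasses import dataclass, field
from enum import Enum, auto
from math import ceil

BLOCK_SIZE = 16  # tokens per KV-cache block (assumed value, not read from vLLM)


class Status(Enum):
    WAITING = auto()  # request still needs its prefill
    RUNNING = auto()  # request is eligible for decode steps


@dataclass
class ToySequence:
    """Hypothetical stand-in for the per-request state an engine tracks."""
    prompt_token_ids: list
    output_token_ids: list = field(default_factory=list)
    num_computed_tokens: int = 0  # tokens whose KV entries are considered present
    block_table: list = field(default_factory=list)
    status: Status = Status.WAITING


class ToyBlockAllocator:
    """Hypothetical free-list allocator over fixed-size KV-cache blocks."""

    def __init__(self, num_blocks):
        self.free = list(range(num_blocks))

    def allocate(self, n):
        if n > len(self.free):
            raise RuntimeError("not enough free KV-cache blocks")
        blocks, self.free = self.free[:n], self.free[n:]
        return blocks


def skip_prefill(seq, allocator, fake_first_token):
    """Mark a sequence as if its prefill had already run."""
    prompt_len = len(seq.prompt_token_ids)
    # Reserve slots for the prompt plus one more position, because the first
    # decode step writes the KV entry of the fake first output token.
    seq.block_table = allocator.allocate(ceil((prompt_len + 1) / BLOCK_SIZE))
    # Pretend the prompt's KV entries exist; the blocks hold dummy data.
    seq.num_computed_tokens = prompt_len
    # Prefill normally samples the first output token, so supply a placeholder.
    seq.output_token_ids.append(fake_first_token)
    seq.status = Status.RUNNING
    return seq
```

For example, with a 20-token prompt, `skip_prefill(ToySequence(list(range(20))), ToyBlockAllocator(4), fake_first_token=0)` reserves two blocks and leaves the sequence in the `RUNNING` state with one placeholder output token. Since the reserved blocks contain dummy data, anything generated afterwards is only meaningful for performance experiments.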

Before submitting a new issue...

Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.


chenhongyu2048 added the usage label Mar 15, 2025

chenhongyu2048 changed the title ~~[Usage]: What should I do if I want to skip the prefill of s new request?~~ [Usage]: What should I do if I want to skip the prefill of a new request? Mar 15, 2025
