[Usage]: What should I do if I want to skip the prefill of a new request? #14863

Open
1 task done
chenhongyu2048 opened this issue Mar 15, 2025 · 0 comments
Labels
usage How to use vllm


Your current environment

The output of `python collect_env.py`

How would you like to use vllm

My question:
I want to add a new request to the engine, create some dummy KV cache for it, and then have it start the decode stage directly, without running prefill.
How should I do this?
Maybe I should first modify the STATUS in SeqGroup, then allocate and fake some KV cache in the block manager?
But I'm still confused about how to add the fake first output token for the request.
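
The flow described above can be sketched with a toy model. This is only an illustration of the three steps (reserve KV cache blocks, mark the request as already prefilled, seed a fake first output token). It does not use vLLM's real classes or APIs: `ToyRequest`, `ToyBlockManager`, `Status`, and `add_request_skipping_prefill` are hypothetical names, and the block contents are assumed to be left as whatever dummy values the cache already holds.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class Status(Enum):
    WAITING = auto()  # would normally go through prefill first
    RUNNING = auto()  # eligible for decode steps


@dataclass
class ToyRequest:
    # Hypothetical stand-in for a scheduler-side request record.
    request_id: str
    prompt_token_ids: list[int]
    output_token_ids: list[int] = field(default_factory=list)
    status: Status = Status.WAITING
    block_table: list[int] = field(default_factory=list)
    num_computed_tokens: int = 0


class ToyBlockManager:
    # Hypothetical stand-in for a paged KV cache allocator.
    def __init__(self, num_blocks: int, block_size: int) -> None:
        self.block_size = block_size
        self.free_blocks = list(range(num_blocks))

    def allocate(self, num_tokens: int) -> list[int]:
        needed = -(-num_tokens // self.block_size)  # ceiling division
        if needed > len(self.free_blocks):
            raise RuntimeError("not enough free KV cache blocks")
        blocks = self.free_blocks[:needed]
        self.free_blocks = self.free_blocks[needed:]
        return blocks


def add_request_skipping_prefill(
    manager: ToyBlockManager,
    request_id: str,
    prompt_token_ids: list[int],
    fake_first_token: int,
) -> ToyRequest:
    req = ToyRequest(request_id, list(prompt_token_ids))
    # Reserve slots for the prompt plus the slot the first decode step writes to.
    # The reserved blocks are never filled here, so their contents are dummy data.
    req.block_table = manager.allocate(len(prompt_token_ids) + 1)
    # Pretend the whole prompt has already been processed.
    req.num_computed_tokens = len(prompt_token_ids)
    # Seed the token that prefill would normally have sampled;
    # the first decode step consumes it as its input token.
    req.output_token_ids.append(fake_first_token)
    req.status = Status.RUNNING
    return req


manager = ToyBlockManager(num_blocks=8, block_size=16)
req = add_request_skipping_prefill(manager, "r0", list(range(20)), fake_first_token=42)
print(req.status, req.block_table, req.output_token_ids)
```

In this toy version the decode loop would treat `req` like any other running request, because its state is indistinguishable from one that finished prefill. Whether the real engine can be put into the same state, and through which classes, is exactly the open question.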

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.
@chenhongyu2048 chenhongyu2048 added the usage How to use vllm label Mar 15, 2025
@chenhongyu2048 chenhongyu2048 changed the title [Usage]: What should I do if I want to skip the prefill of s new request? [Usage]: What should I do if I want to skip the prefill of a new request? Mar 15, 2025