🚀 The feature, motivation and pitch
To enable speculative decoding via `vllm serve`, we currently need to pass the `speculative-config` arg as a valid JSON string, such as:
'{"method": "eagle", "model": "yuhuili/EAGLE-LLaMA3.1-Instruct-8B", "num_speculative_tokens": 3, "draft_tensor_parallel_size": 1, "max_model_len": 2048}'
The JSON string above contains spaces and both single and double quotes, which can be tricky to pass as a command-line argument in some production environments. A format in which each setting is a `key:value` pair and settings are separated by commas would solve this problem:
method:eagle,model:yuhuili/EAGLE-LLaMA3.1-Instruct-8B,num_speculative_tokens:3,draft_tensor_parallel_size:1,max_model_len:2048
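To make the proposal concrete, here is a minimal sketch of how such a string could be converted into the same dictionary the JSON form produces. This is not vLLM code: `parse_kv_config` and `_coerce` are hypothetical names, and the rules (split on commas, split each item at the first colon, treat numeric-looking values as numbers) are assumptions about how the format might behave.

```python
import json


def _coerce(value: str):
    """Turn numeric-looking strings into int or float; leave the rest as str."""
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            pass
    return value


def parse_kv_config(text: str) -> dict:
    """Hypothetical helper: parse 'key:value,key:value' into a dict."""
    config = {}
    for item in text.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate a trailing comma
        # Split at the first colon only, so values may contain colons.
        key, sep, value = item.partition(":")
        if not sep or not key.strip():
            raise ValueError(f"expected key:value, got {item!r}")
        config[key.strip()] = _coerce(value.strip())
    return config


spec = (
    "method:eagle,model:yuhuili/EAGLE-LLaMA3.1-Instruct-8B,"
    "num_speculative_tokens:3,draft_tensor_parallel_size:1,max_model_len:2048"
)
print(json.dumps(parse_kv_config(spec)))
```

Under these assumptions the printed JSON has the same keys and values as the JSON example above. One limitation of this sketch is that values cannot contain commas, so nested settings would still need the JSON form.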
Alternatives
No response
Additional context
No response