[Feature]: Simplify speculative-config format for vllm serve

### 🚀 The feature, motivation and pitch

To enable speculative decoding via `vllm serve`, we currently need to pass the `speculative-config` argument as a valid JSON string, such as:
> '{"method": "eagle", "model": "yuhuili/EAGLE-LLaMA3.1-Instruct-8B", "num_speculative_tokens": 3, "draft_tensor_parallel_size": 1, "max_model_len": 2048}'

The JSON string above contains spaces and both single and double quotes, which can be tricky to pass as a command-line argument in some production environments. A format in which each `key:value` setting is separated by a comma would solve this problem:
> method:eagle,model:yuhuili/EAGLE-LLaMA3.1-Instruct-8B,num_speculative_tokens:3,draft_tensor_parallel_size:1,max_model_len:2048
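
To make the proposal concrete, here is a minimal sketch of how such a string could be parsed into the same dictionary the JSON form produces. This is not vLLM's implementation: the function name `parse_compact_speculative_config` and the coercion rules (integers and booleans are converted, everything else stays a string) are assumptions made for illustration. The sketch still accepts the existing JSON form, so both styles could coexist.

```python
import json


def _coerce(raw: str):
    """Assumed coercion rule: ints and booleans are converted, the rest stay strings."""
    try:
        return int(raw)
    except ValueError:
        pass
    if raw.lower() in ("true", "false"):
        return raw.lower() == "true"
    return raw


def parse_compact_speculative_config(value: str) -> dict:
    """Hypothetical parser for the proposed `key:value,key:value` format."""
    value = value.strip()
    # Keep backward compatibility: a leading "{" means the existing JSON form.
    if value.startswith("{"):
        return json.loads(value)

    config = {}
    for item in value.split(","):
        # Split on the first colon only, so values may themselves contain colons.
        key, sep, raw = item.partition(":")
        key, raw = key.strip(), raw.strip()
        if not sep or not key or not raw:
            raise ValueError(f"expected key:value, got {item!r}")
        config[key] = _coerce(raw)
    return config


compact = (
    "method:eagle,model:yuhuili/EAGLE-LLaMA3.1-Instruct-8B,"
    "num_speculative_tokens:3,draft_tensor_parallel_size:1,max_model_len:2048"
)
print(parse_compact_speculative_config(compact))
```

One limitation of this sketch is that values containing commas cannot be expressed in the compact form; such configurations would still need the JSON string.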

### Alternatives

_No response_

### Additional context

_No response_

### Before submitting a new issue...

- [x] Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the [documentation page](https://docs.vllm.ai/en/latest/), which can answer lots of frequently asked questions.

[Feature]: Simplify speculative-config format for vllm serve #19709

🚀 The feature, motivation and pitch

Alternatives

Additional context

Before submitting a new issue...

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Uh oh!

[Feature]: Simplify speculative-config format for vllm serve #19709

Description

🚀 The feature, motivation and pitch

Alternatives

Additional context

Before submitting a new issue...

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions