[Feature]: EXL3 support

### 🚀 The feature, motivation and pitch

EXL3 has strong potential to become the go-to format for low bits-per-weight (bpw) quantization, as it offers an excellent trade-off between perplexity and model size. The authors note that changes in the format make integration with vLLM more feasible.
See: https://github.com/turboderp-org/exllamav3?tab=readme-ov-file#exl3-quantization

Adding EXL3 support to vLLM would be a major win: the native ExLlama engine struggles with large-scale serving, which is where vLLM excels.

### Alternatives

- Using ExLlama for inference: poor performance and poor concurrency.
- Using a different quantization format: higher VRAM usage or worse model quality.

### Additional context

_No response_

### Before submitting a new issue...

- [x] Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the [documentation page](https://docs.vllm.ai/en/latest/), which can answer lots of frequently asked questions.

[Feature]: EXL3 support #19896
