Issues: vllm-project/vllm
[Bug]: loading the nvidia/Llama3-ChatQA-1.5-8B model takes 15 min
bug (Something isn't working) · #5365 · opened Jun 9, 2024 by JJplane
[Bug]: Falcon fails if trust_remote_code=True
bug (Something isn't working) · #5363 · opened Jun 9, 2024 by robertgshaw2-neuralmagic
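For context, a minimal sketch of the configuration the title describes, using the offline LLM API; the checkpoint name is an illustrative choice from the Falcon family:

```python
from vllm import LLM

# Load a Falcon model with remote code enabled, per the report above.
llm = LLM(model="tiiuae/falcon-7b", trust_remote_code=True)
```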
[Bug]: Multi-GPU setup for vLLM in OpenShift still does not work
bug (Something isn't working) · #5360 · opened Jun 9, 2024 by jayteaftw
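For reference, multi-GPU serving in vLLM is configured through tensor parallelism; a minimal sketch, assuming two visible GPUs and an illustrative model name:

```python
from vllm import LLM

# Shard the model across 2 GPUs with tensor parallelism.
llm = LLM(model="meta-llama/Llama-2-13b-hf", tensor_parallel_size=2)
```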
[Bug]: TorchSDPAMetadata is out of date
bug (Something isn't working) · #5351 · opened Jun 7, 2024 by Reichenbachian
[Bug]: with --enable-prefix-caching, /completions crashes the server with echo=True above a certain prompt length
bug (Something isn't working) · #5344 · opened Jun 7, 2024 by hibukipanim
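A minimal sketch of the request shape the title describes, assuming a vLLM OpenAI-compatible server started with --enable-prefix-caching and listening on localhost:8000 (host, port, model name, and prompt length are illustrative):

```python
import requests

# /v1/completions with echo=True returns the prompt tokens alongside the
# generation; the report above says this crashes past a certain prompt length.
resp = requests.post(
    "http://localhost:8000/v1/completions",
    json={
        "model": "meta-llama/Llama-2-7b-hf",   # illustrative model name
        "prompt": "A very long prompt ... " * 512,  # long enough to cross the threshold
        "echo": True,
        "max_tokens": 16,
    },
)
print(resp.status_code)
```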
[Performance]: [Automatic Prefix Caching] When hitting KV-cached blocks, the first execution is slow, and subsequent ones are fast
performance (Performance-related issues) · #5339 · opened Jun 7, 2024 by soacker
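For context, automatic prefix caching is enabled with a single engine flag; a minimal sketch, with an illustrative model name:

```python
from vllm import LLM, SamplingParams

# Enable automatic prefix caching so repeated prompt prefixes reuse KV blocks.
llm = LLM(model="meta-llama/Llama-2-7b-hf", enable_prefix_caching=True)

params = SamplingParams(max_tokens=32)
shared_prefix = "You are a helpful assistant. " * 50
# The second call can reuse the cached prefix blocks from the first.
llm.generate([shared_prefix + "Question 1"], params)
llm.generate([shared_prefix + "Question 2"], params)
```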
[Usage]: How to quiet the terminal 'INFO' outputs in vLLM
usage (How to use vLLM) · #5338 · opened Jun 7, 2024 by rohitnanda1443
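One way to quiet the INFO output, assuming vLLM logs through the standard Python logging module under the 'vllm' logger namespace (recent versions may also honor a VLLM_LOGGING_LEVEL environment variable):

```python
import logging

# Raise the threshold on vLLM's logger before constructing the engine,
# so INFO-level startup and scheduling messages are suppressed.
logging.getLogger("vllm").setLevel(logging.WARNING)
```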
[Bug]: Getting an empty string ('') for every call on a fine-tuned Code-Llama-7b-hf model
bug (Something isn't working) · #5336 · opened Jun 7, 2024 by arthbohra
[Bug]: Unexpected prompt-token logprob behavior of Llama 2 when setting echo=True on the OpenAI API server
bug (Something isn't working) · #5334 · opened Jun 7, 2024 by fywalter
[Bug]: vLLM does not support virtual GPUs
bug (Something isn't working) · #5328 · opened Jun 7, 2024 by youkaichao
[Usage]: Function calling for Mistral v0.3
usage (How to use vLLM) · #5325 · opened Jun 6, 2024 by mansirthd
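Server-side tool support is what this issue asks about; for reference, a sketch of what such a request looks like through an OpenAI-compatible endpoint, using the standard OpenAI tools schema (the base URL, model name, and get_weather tool are illustrative):

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

# A hypothetical tool definition in the standard OpenAI "tools" schema.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

resp = client.chat.completions.create(
    model="mistralai/Mistral-7B-Instruct-v0.3",
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=tools,
)
print(resp.choices[0].message)
```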
[New Model]: mistralai/Codestral-22B-v0.1
new model (Requests for new models) · #5318 · opened Jun 6, 2024 by eduardozamudio
[Installation]: Compiling vLLM for CPU only
installation (Installation problems) · #5317 · opened Jun 6, 2024 by Zibri
[Performance]: GPTQ and AWQ quantization do not improve performance
performance (Performance-related issues) · #5316 · opened Jun 6, 2024 by aaronlyt
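For context, quantized checkpoints are selected via the quantization engine argument; a minimal sketch with an illustrative AWQ checkpoint:

```python
from vllm import LLM

# Load an AWQ-quantized checkpoint; use quantization="gptq" for GPTQ models.
llm = LLM(model="TheBloke/Llama-2-7B-AWQ", quantization="awq")
```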
[Feature]: Make an unstable 'latest' Docker image
feature request · #5315 · opened Jun 6, 2024 by emillykkejensen
[Speculative decoding]: Content generated with speculative decoding is inconsistent with content generated by the target model
bug (Something isn't working) · #5313 · opened Jun 6, 2024 by YuCheng-Qi
[Feature]: Support selecting a chat template
feature request · #5309 · opened Jun 6, 2024 by Theodotus1243
[Bug]: GPU memory usage is inconsistent with the gpu_memory_utilization setting
bug (Something isn't working) · #5305 · opened Jun 6, 2024 by yecphaha
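For reference, gpu_memory_utilization caps the fraction of GPU memory vLLM pre-allocates for weights plus KV cache; a minimal sketch with illustrative values:

```python
from vllm import LLM

# Ask the engine to use at most ~50% of each GPU's memory. vLLM
# pre-allocates KV-cache blocks up to this fraction at startup, so the
# observed usage reflects the cap rather than per-request demand.
llm = LLM(model="meta-llama/Llama-2-7b-hf", gpu_memory_utilization=0.5)
```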
[Bug]: speculative decoding with max-num-seqs <= 2 * num-speculative-tokens
bug (Something isn't working) · #5302 · opened Jun 6, 2024 by HappyLynn
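For context, a minimal speculative-decoding configuration of the kind these settings interact with; the argument names follow the engine arguments of this era and the model pair is illustrative:

```python
from vllm import LLM

# Target model plus a smaller draft model proposing 5 tokens per step.
llm = LLM(
    model="meta-llama/Llama-2-13b-hf",
    speculative_model="meta-llama/Llama-2-7b-hf",  # draft model (illustrative)
    num_speculative_tokens=5,
    use_v2_block_manager=True,  # required for speculative decoding at the time
)
```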
[Bug]: After fine-tuning Qwen with LoRA, inference results differ between vLLM and Hugging Face
bug (Something isn't working) · #5298 · opened Jun 6, 2024 by lonngxiang
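For reference, a sketch of loading a LoRA adapter in vLLM for an apples-to-apples comparison with Hugging Face (greedy sampling removes sampling noise); the adapter path and model name are illustrative placeholders:

```python
from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

llm = LLM(model="Qwen/Qwen-7B", enable_lora=True, trust_remote_code=True)

# Greedy decoding so outputs are directly comparable to Hugging Face.
params = SamplingParams(temperature=0.0, max_tokens=64)
outputs = llm.generate(
    ["Hello, how are you?"],
    params,
    lora_request=LoRARequest("qwen-lora", 1, "/path/to/lora_adapter"),
)
print(outputs[0].outputs[0].text)
```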
[Usage]: the v0.4.3 Docker image does not work
usage (How to use vLLM) · #5283 · opened Jun 5, 2024 by BUJIDAOVS