Issues: vllm-project/vllm
[Bug]: CohereForAI/c4ai-command-r-v01 OSError: [Errno 12] Cannot allocate memory · bug · #4891 · opened May 17, 2024 by epignatelli
[Bug]: assert parts[0] == "base_model" AssertionError · bug · #4883 · opened May 17, 2024 by Edisonwei54
[Usage]: why can't I set gpu nums while use "tensor_parallel_size"? · usage · #4882 · opened May 17, 2024 by GodHforever
[Installation]: Do we have the plan to update the pip package installation method for the CPU backend. · installation · #4881 · opened May 17, 2024 by Zhenzhong1
[Usage]: gpu memory usage when using tensor parallel · usage · #4880 · opened May 17, 2024 by DaiJianghai
[Bug]: single lora request error make all processing requests error · bug · #4879 · opened May 17, 2024 by jinzhen-lin
[Bug]: Shape error encountered in speculative decoding when enable_lora=True · bug · #4872 · opened May 17, 2024 by mitchellstern
[Feature]: Health check for restart policy · feature request · #4867 · opened May 16, 2024 by pseudotensor
[Usage]: distributed inference with kuberay · usage · #4865 · opened May 16, 2024 by hetian127
[Misc]: a question about chunked-prefill in flash-attn backends · misc · #4863 · opened May 16, 2024 by HarryWu99
[Bug]: No CUDA GPUs are available on 'CPU' use · bug · #4858 · opened May 16, 2024 by mcr-ksh
[Usage]: How to determine how many concurrent requests can be supported in an acceptable time duration with demo api server? · usage · #4853 · opened May 16, 2024 by senbinyu
[Bug]: Qwen1.5-72B L20x8 latest vLLM TPOT slower than v0.4.0.post, 48ms vs 39ms, why? · bug · #4852 · opened May 16, 2024 by DefTruth
[Misc]: Assertion with no description in vllm with DeepSeekMath 7b model, why, how to fix? · misc · #4849 · opened May 16, 2024 by brando90
[Feature]: Build and publish Neuron docker image · feature request · #4838 · opened May 15, 2024 by yaronr
[Bug]: Running vllm docker image with neuron fails · bug · #4836 · opened May 15, 2024 by yaronr
[New Model]: Google's Paligemma family of models · new model · #4833 · opened May 15, 2024 by nfplay
[Usage]: how to use run in mixed mode CPU/GPU (device_map="auto") · usage · #4832 · opened May 15, 2024 by osafaimal
[Usage]: Passing image to the vllm api endpoint · usage · #4826 · opened May 15, 2024 by davidramous
[Usage]: How to use tensor-parallel-size argument when deploy Llama3-8b with AsyncLLMEngine · usage · #4825 · opened May 15, 2024 by ANYMS-A
[Performance]: Will memcpy happen with distributed kv caches while decoding? · performance · #4823 · opened May 15, 2024 by GodHforever
Remove EOS token before passing the tokenized input to model · misc · #4814 · opened May 14, 2024 by VallabhMahajan1