Issues: vllm-project/vllm
[Bug]: CohereForAI/c4ai-command-r-v01 OSError: [Errno 12] Cannot allocate memory · bug · #4891 · opened May 17, 2024 by epignatelli
[Bug]: assert parts[0] == "base_model" AssertionError · bug · #4883 · opened May 17, 2024 by Edisonwei54
[Usage]: why can't I set gpu nums while use "tensor_parallel_size"? · usage · #4882 · opened May 17, 2024 by GodHforever
[Installation]: Do we have the plan to update the pip package installation method for the CPU backend. · installation · #4881 · opened May 17, 2024 by Zhenzhong1
[Usage]: gpu memory usage when using tensor parallel · usage · #4880 · opened May 17, 2024 by DaiJianghai
[Bug]: single lora request error make all processing requests error · bug · #4879 · opened May 17, 2024 by jinzhen-lin
[Bug]: Shape error encountered in speculative decoding when enable_lora=True · bug · #4872 · opened May 17, 2024 by mitchellstern
[Feature]: Health check for restart policy · feature request · #4867 · opened May 16, 2024 by pseudotensor
[Usage]: distributed inference with kuberay · usage · #4865 · opened May 16, 2024 by hetian127
[Misc]: a question about chunked-prefill in flash-attn backends · misc · #4863 · opened May 16, 2024 by HarryWu99
[Bug]: No CUDA GPUs are available on 'CPU' use · bug · #4858 · opened May 16, 2024 by mcr-ksh
[Usage]: How to determine how many concurrent requests can be supported in an acceptable time duration with demo api server? · usage · #4853 · opened May 16, 2024 by senbinyu
[Bug]: Qwen1.5-72B L20x8 latest vLLM TPOT slower than v0.4.0.post, 48ms vs 39ms, why? · bug · #4852 · opened May 16, 2024 by DefTruth
[Misc]: Assertion with no description in vllm with DeepSeekMath 7b model, why, how to fix? · misc · #4849 · opened May 16, 2024 by brando90
[Feature]: Build and publish Neuron docker image · feature request · #4838 · opened May 15, 2024 by yaronr
[Bug]: Running vllm docker image with neuron fails · bug · #4836 · opened May 15, 2024 by yaronr
[New Model]: Google's Paligemma family of models · new model · #4833 · opened May 15, 2024 by nfplay
[Usage]: how to use run in mixed mode CPU/GPU (device_map="auto") · usage · #4832 · opened May 15, 2024 by osafaimal
[Usage]: Passing image to the vllm api endpoint · usage · #4826 · opened May 15, 2024 by davidramous
[Usage]: How to use tensor-parallel-size argument when deploy Llama3-8b with AsyncLLMEngine · usage · #4825 · opened May 15, 2024 by ANYMS-A
[Performance]: Will memcpy happen with distributed kv caches while decoding? · performance · #4823 · opened May 15, 2024 by GodHforever
Remove EOS token before passing the tokenized input to model · misc · #4814 · opened May 14, 2024 by VallabhMahajan1