Issues: vllm-project/vllm
[Bug]: loading the nvidia/Llama3-ChatQA-1.5-8B model takes 15 min
bug (Something isn't working) · #5365 · opened Jun 9, 2024 by JJplane
[Bug]: Falcon fails if trust_remote_code=True
bug (Something isn't working) · #5363 · opened Jun 9, 2024 by robertgshaw2-neuralmagic
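For context, a minimal sketch of the configuration the title describes, using the offline LLM API; the checkpoint name is an illustrative choice from the Falcon family:

```python
from vllm import LLM

# Load a Falcon model with remote code enabled, per the report above.
llm = LLM(model="tiiuae/falcon-7b", trust_remote_code=True)
```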
[Bug]: Multi-GPU setup for vLLM in OpenShift still does not work
bug (Something isn't working) · #5360 · opened Jun 9, 2024 by jayteaftw
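For reference, multi-GPU serving in vLLM is configured through tensor parallelism; a minimal sketch, assuming two visible GPUs and an illustrative model name:

```python
from vllm import LLM

# Shard the model across 2 GPUs with tensor parallelism.
llm = LLM(model="meta-llama/Llama-2-13b-hf", tensor_parallel_size=2)
```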
[Bug]: TorchSDPAMetadata is out of date
bug (Something isn't working) · #5351 · opened Jun 7, 2024 by Reichenbachian
[Bug]: with --enable-prefix-caching, /completions crashes the server with echo=True above a certain prompt length
bug (Something isn't working) · #5344 · opened Jun 7, 2024 by hibukipanim
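A minimal sketch of the request shape the title describes, assuming a vLLM OpenAI-compatible server started with --enable-prefix-caching and listening on localhost:8000 (host, port, model name, and prompt length are illustrative):

```python
import requests

# /v1/completions with echo=True returns the prompt tokens alongside the
# generation; the report above says this crashes past a certain prompt length.
resp = requests.post(
    "http://localhost:8000/v1/completions",
    json={
        "model": "meta-llama/Llama-2-7b-hf",   # illustrative model name
        "prompt": "A very long prompt ... " * 512,  # long enough to cross the threshold
        "echo": True,
        "max_tokens": 16,
    },
)
print(resp.status_code)
```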
[Performance]: [Automatic Prefix Caching] When hitting KV-cached blocks, the first execution is slow, and subsequent ones are fast
performance (Performance-related issues) · #5339 · opened Jun 7, 2024 by soacker
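For context, automatic prefix caching is enabled with a single engine flag; a minimal sketch, with an illustrative model name:

```python
from vllm import LLM, SamplingParams

# Enable automatic prefix caching so repeated prompt prefixes reuse KV blocks.
llm = LLM(model="meta-llama/Llama-2-7b-hf", enable_prefix_caching=True)

params = SamplingParams(max_tokens=32)
shared_prefix = "You are a helpful assistant. " * 50
# The second call can reuse the cached prefix blocks from the first.
llm.generate([shared_prefix + "Question 1"], params)
llm.generate([shared_prefix + "Question 2"], params)
```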
[Usage]: How to quiet the terminal 'INFO' outputs in vLLM
usage (How to use vLLM) · #5338 · opened Jun 7, 2024 by rohitnanda1443
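One way to quiet the INFO output, assuming vLLM logs through the standard Python logging module under the 'vllm' logger namespace (recent versions may also honor a VLLM_LOGGING_LEVEL environment variable):

```python
import logging

# Raise the threshold on vLLM's logger before constructing the engine,
# so INFO-level startup and scheduling messages are suppressed.
logging.getLogger("vllm").setLevel(logging.WARNING)
```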
[Bug]: Getting an empty string ('') for every call on a fine-tuned Code-Llama-7b-hf model
bug (Something isn't working) · #5336 · opened Jun 7, 2024 by arthbohra
[Bug]: Unexpected prompt-token logprob behavior of Llama 2 when setting echo=True on the OpenAI API server
bug (Something isn't working) · #5334 · opened Jun 7, 2024 by fywalter
[Bug]: vLLM does not support virtual GPUs
bug (Something isn't working) · #5328 · opened Jun 7, 2024 by youkaichao
[Usage]: Function calling for Mistral v0.3
usage (How to use vLLM) · #5325 · opened Jun 6, 2024 by mansirthd
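Server-side tool support is what this issue asks about; for reference, a sketch of what such a request looks like through an OpenAI-compatible endpoint, using the standard OpenAI tools schema (the base URL, model name, and get_weather tool are illustrative):

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

# A hypothetical tool definition in the standard OpenAI "tools" schema.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

resp = client.chat.completions.create(
    model="mistralai/Mistral-7B-Instruct-v0.3",
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=tools,
)
print(resp.choices[0].message)
```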
[New Model]: mistralai/Codestral-22B-v0.1
new model (Requests for new models) · #5318 · opened Jun 6, 2024 by eduardozamudio
[Installation]: Compiling vLLM for CPU only
installation (Installation problems) · #5317 · opened Jun 6, 2024 by Zibri
[Performance]: GPTQ and AWQ quantization do not improve performance
performance (Performance-related issues) · #5316 · opened Jun 6, 2024 by aaronlyt
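For context, quantized checkpoints are selected via the quantization engine argument; a minimal sketch with an illustrative AWQ checkpoint:

```python
from vllm import LLM

# Load an AWQ-quantized checkpoint; use quantization="gptq" for GPTQ models.
llm = LLM(model="TheBloke/Llama-2-7B-AWQ", quantization="awq")
```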
[Feature]: Make an unstable 'latest' Docker image
feature request · #5315 · opened Jun 6, 2024 by emillykkejensen
[Speculative decoding]: Content generated with speculative decoding is inconsistent with content generated by the target model
bug (Something isn't working) · #5313 · opened Jun 6, 2024 by YuCheng-Qi
[Feature]: Support selecting a chat template
feature request · #5309 · opened Jun 6, 2024 by Theodotus1243
[Bug]: GPU memory usage is inconsistent with the gpu_memory_utilization setting
bug (Something isn't working) · #5305 · opened Jun 6, 2024 by yecphaha
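For reference, gpu_memory_utilization caps the fraction of GPU memory vLLM pre-allocates for weights plus KV cache; a minimal sketch with illustrative values:

```python
from vllm import LLM

# Ask the engine to use at most ~50% of each GPU's memory. vLLM
# pre-allocates KV-cache blocks up to this fraction at startup, so the
# observed usage reflects the cap rather than per-request demand.
llm = LLM(model="meta-llama/Llama-2-7b-hf", gpu_memory_utilization=0.5)
```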
[Bug]: speculative decoding with max-num-seqs <= 2 * num-speculative-tokens
bug (Something isn't working) · #5302 · opened Jun 6, 2024 by HappyLynn
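For context, a minimal speculative-decoding configuration of the kind these settings interact with; the argument names follow the engine arguments of this era and the model pair is illustrative:

```python
from vllm import LLM

# Target model plus a smaller draft model proposing 5 tokens per step.
llm = LLM(
    model="meta-llama/Llama-2-13b-hf",
    speculative_model="meta-llama/Llama-2-7b-hf",  # draft model (illustrative)
    num_speculative_tokens=5,
    use_v2_block_manager=True,  # required for speculative decoding at the time
)
```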
[Bug]: After fine-tuning Qwen with LoRA, inference results differ between vLLM and Hugging Face
bug (Something isn't working) · #5298 · opened Jun 6, 2024 by lonngxiang
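For reference, a sketch of loading a LoRA adapter in vLLM for an apples-to-apples comparison with Hugging Face (greedy sampling removes sampling noise); the adapter path and model name are illustrative placeholders:

```python
from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

llm = LLM(model="Qwen/Qwen-7B", enable_lora=True, trust_remote_code=True)

# Greedy decoding so outputs are directly comparable to Hugging Face.
params = SamplingParams(temperature=0.0, max_tokens=64)
outputs = llm.generate(
    ["Hello, how are you?"],
    params,
    lora_request=LoRARequest("qwen-lora", 1, "/path/to/lora_adapter"),
)
print(outputs[0].outputs[0].text)
```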
[Usage]: the v0.4.3 Docker image does not work
usage (How to use vLLM) · #5283 · opened Jun 5, 2024 by BUJIDAOVS