Pull requests: vllm-project/vllm
- [Bugfix] Fix arguments passed to `Sequence` in stop checker test (#5092, opened May 29, 2024 by DarkLight1337)
- [Core][CUDA Graph] add output buffer for cudagraph to reduce memory footprint (#5074, opened May 27, 2024 by youkaichao)
- [CI/Build][Misc] Add scripts that performs a fair comparison between vLLM and alternatives (TGI and TRT) (#5073, opened May 27, 2024 by KuntaiDu; 1 of 4 tasks)
- [Misc] Add a test case for 'microsoft/Phi-3-small-8k-instruct', because special tokens can cause a crash (#5068, opened May 27, 2024 by AllenDou)
- [Bugfix] Adds outlines performance improvement (#5053, opened May 26, 2024 by lynkz-matt-psaltis; Draft)
- [Model] Enable FP8 QKV in MoE and refine kernel tuning script (#5039, opened May 24, 2024 by comaniac)
- [Core] Change LoRA embedding sharding to support loading methods (#5038, opened May 24, 2024 by Yard1)
- [CI/Build] CMakeLists: build all extensions' cmake targets at the same time (#5034, opened May 24, 2024 by dtrifiro)
- [Bugfix] logprobs is not compatible with the OpenAI spec #4795 (#5031, opened May 24, 2024 by Etelis)
- [Bugfix][Frontend] Fix format of returned logprobs for OpenAI Chat Completions API (#5026, opened May 24, 2024 by DarkLight1337)
- [Kernel] Initial commit containing new Triton kernels for multi lora serving. (#5025, opened May 24, 2024 by FurtherAI)