Pull requests: vllm-project/vllm
- [Bugfix] Fix arguments passed to `Sequence` in stop checker test (#5092, opened May 29, 2024 by DarkLight1337)
- [Core][CUDA Graph] add output buffer for cudagraph to reduce memory footprint (#5074, opened May 27, 2024 by youkaichao)
- [CI/Build][Misc] Add scripts that performs a fair comparison between vLLM and alternatives (TGI and TRT) (#5073, opened May 27, 2024 by KuntaiDu; 1 of 4 tasks)
- [Misc] Add a test case for 'microsoft/Phi-3-small-8k-instruct', because special tokens can cause a crash (#5068, opened May 27, 2024 by AllenDou)
- [Bugfix] Adds outlines performance improvement (#5053, opened May 26, 2024 by lynkz-matt-psaltis; Draft)
- [Model] Enable FP8 QKV in MoE and refine kernel tuning script (#5039, opened May 24, 2024 by comaniac)
- [Core] Change LoRA embedding sharding to support loading methods (#5038, opened May 24, 2024 by Yard1)
- [CI/Build] CMakeLists: build all extensions' cmake targets at the same time (#5034, opened May 24, 2024 by dtrifiro)
- [Bugfix] logprobs is not compatible with the OpenAI spec #4795 (#5031, opened May 24, 2024 by Etelis)
- [Bugfix][Frontend] Fix format of returned logprobs for OpenAI Chat Completions API (#5026, opened May 24, 2024 by DarkLight1337)
- [Kernel] Initial commit containing new Triton kernels for multi lora serving. (#5025, opened May 24, 2024 by FurtherAI)