Pull requests: vllm-project/vllm
- #5367 [WIP][Core] Support tensor parallel division with remainder of attention heads (opened Jun 9, 2024 by NadavShmayo)
- #5356 [Kernel][RFC] Initial commit containing new Triton kernels for multi lora serving (opened Jun 8, 2024 by FurtherAI)
- #5355 [Bugfix] Take the VRAM usage of prompt_logprobs into account (opened Jun 8, 2024 by Conless)
- #5330 [ci] Mount buildkite agent on Docker container to upload benchmark results (opened Jun 7, 2024 by khluu)
- #5324 [Doc] Add an automatic prefix caching section in vllm documentation (opened Jun 6, 2024 by KuntaiDu)
- #5323 [Bugfix][CI/Build][Upgrade][AMD][ROCm] Fix the cmake build bug which generates garbage on MI300X, and upgrade to ROCm 6.1 (label: rocm; opened Jun 6, 2024 by hongxiayang)
- #5320 [Core][Distributed] Use device group for all broadcast (opened Jun 6, 2024 by youkaichao)
- #5319 [Feature][Frontend]: Continued stream_options implementation also in CompletionRequest (opened Jun 6, 2024 by Etelis)
- #5312 [Bugfix] OpenAI entrypoint limits logprobs while ignoring server-defined --max-logprobs (opened Jun 6, 2024 by maor-ps)
- #5303 [Bugfix] If the content is started with ":" (response of ping), client should i… (opened Jun 6, 2024 by sywangyi)
- #5293 [Core][Distributed] Add coordinator to reduce code duplication in tp and pp (opened Jun 5, 2024 by youkaichao)
- #5292 [WIP][Hardware] Initial TPU integration (label: tpu, related to Google TPUs; Draft, opened Jun 5, 2024 by WoosukKwon)