Pull requests: vllm-project/vllm
- #5367 [WIP][Core] Support tensor parallel division with remainder of attention heads (opened Jun 9, 2024 by NadavShmayo)
- #5356 [Kernel][RFC] Initial commit containing new Triton kernels for multi lora serving (opened Jun 8, 2024 by FurtherAI)
- #5355 [Bugfix] Take the VRAM usage of prompt_logprobs into account (opened Jun 8, 2024 by Conless)
- #5330 [ci] Mount buildkite agent on Docker container to upload benchmark results (opened Jun 7, 2024 by khluu)
- #5324 [Doc] Add an automatic prefix caching section in vllm documentation (opened Jun 6, 2024 by KuntaiDu)
- #5323 [Bugfix][CI/Build][Upgrade][AMD][ROCm] Fix the cmake build bug which generates garbage on MI300X, and upgrade to ROCm 6.1 (label: rocm; opened Jun 6, 2024 by hongxiayang)
- #5320 [Core][Distributed] Use device group for all broadcast (opened Jun 6, 2024 by youkaichao)
- #5319 [Feature][Frontend]: Continued stream_options implementation also in CompletionRequest (opened Jun 6, 2024 by Etelis)
- #5312 [Bugfix] OpenAI entrypoint limits logprobs while ignoring server-defined --max-logprobs (opened Jun 6, 2024 by maor-ps)
- #5303 [Bugfix] If the content is started with ":" (response of ping), client should i… (opened Jun 6, 2024 by sywangyi)
- #5293 [Core][Distributed] Add coordinator to reduce code duplication in tp and pp (opened Jun 5, 2024 by youkaichao)
- #5292 [WIP][Hardware] Initial TPU integration (label: tpu, related to Google TPUs; Draft, opened Jun 5, 2024 by WoosukKwon)