Pull requests: vllm-project/vllm
[Core] Eliminate parallel worker per-step task scheduling overhead (#4894, opened May 18, 2024 by njhill)
[Misc] Load FP8 kv-cache scaling factors from checkpoints (#4893, opened May 17, 2024 by comaniac)
[Bugfix] Still downloads from Hugging Face when VLLM_USE_MODELSCOPE=true is set (#4856, opened May 16, 2024 by liuzhenghua)
[Bugfix / Core] Prefix Caching Guards (merged with main) (#4846, opened May 16, 2024 by zhuohan123)
Add a new kernel for fusing dequantization into the fused-MoE GEMM (#4841, opened May 15, 2024 by RezaYazdaniAminabadi)
[Core] Cross-attention KV caching and memory management (towards eventual encoder/decoder model support) (#4837, opened May 15, 2024 by afeldman-nm)
[Build/CI] Enabling AMD Entrypoints Test (label: rocm) (#4834, opened May 15, 2024 by Alexei-V-Ivanov-AMD)
[Hardware][Intel] Add LoRA adapter support for CPU backend (label: x86 CPU) (#4830, opened May 15, 2024 by Isotr0py)
[Bugfix][Model] Add base class for vision-language models (#4809, opened May 14, 2024 by DarkLight1337)
[Speculative decoding] Enable TP>1 speculative decoding (#4808, opened May 14, 2024 by cadedaniel)
[Kernel][Backend][Model] Blocksparse flash attention kernel and Phi-3-Small model (#4799, opened May 14, 2024 by linxihui)