Pull requests: vllm-project/vllm
[Core] Eliminate parallel worker per-step task scheduling overhead (#4894, opened May 18, 2024 by njhill)
[Misc] Load FP8 kv-cache scaling factors from checkpoints (#4893, opened May 17, 2024 by comaniac)
[Bugfix] Still downloads from Hugging Face when VLLM_USE_MODELSCOPE=true is set (#4856, opened May 16, 2024 by liuzhenghua)
[Bugfix / Core] Prefix Caching Guards (merged with main) (#4846, opened May 16, 2024 by zhuohan123)
Add a new kernel for fusing dequantization into the fused-MoE GEMM (#4841, opened May 15, 2024 by RezaYazdaniAminabadi)
[Core] Cross-attention KV caching and memory management (towards eventual encoder/decoder model support) (#4837, opened May 15, 2024 by afeldman-nm)
[Build/CI] Enabling AMD Entrypoints Test (label: rocm) (#4834, opened May 15, 2024 by Alexei-V-Ivanov-AMD)
[Hardware][Intel] Add LoRA adapter support for CPU backend (label: x86 CPU) (#4830, opened May 15, 2024 by Isotr0py)
[Bugfix][Model] Add base class for vision-language models (#4809, opened May 14, 2024 by DarkLight1337)
[Speculative decoding] Enable TP>1 speculative decoding (#4808, opened May 14, 2024 by cadedaniel)
[Kernel][Backend][Model] Blocksparse flash attention kernel and Phi-3-Small model (#4799, opened May 14, 2024 by linxihui)