[FEAT][executors]: optimize advanced vLLM sampler with shared prefix caching #11
Closed
Labels
component: executors (Tasks involving the interaction of vLLM inference and DeepSpeed training endpoints.)
enhancement (New feature or request)
platform: cuda (Optimizations or bugs specific to NVIDIA graphics cards, such as FlashInfer and TMA optimizations.)
platform: rocm (Tasks specific to AMD graphics cards, such as CK and bpreshuffle/FA.)
type: performance (Performance optimization tasks aimed at increasing throughput, reducing latency, etc.)
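Refactor the rollout sampler to explicitly enable vLLM's prefix caching (referred to here as Shared Prefix Caching; vLLM documents it as Automatic Prefix Caching). This is critical for eliminating redundant KV cache when GRPO generates multiple candidate completions for the same prompt: every candidate in a group shares the prompt prefix, so its KV blocks should be computed and stored once rather than once per candidate.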
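A minimal sketch of why sharing the prefix pays off, assuming a simplified block-level cache rather than vLLM's actual implementation. The prompt is split into fixed-size token blocks. Each full block is identified by a hash chained with its parent block's hash, so identical tokens under different prefixes do not collide. A block's KV is computed only if its hash has not been seen before. `block_hashes`, `ToyPrefixCache`, `simulate_group`, and `BLOCK_SIZE = 16` are hypothetical names and values chosen for illustration. In vLLM itself, the switch is the `enable_prefix_caching` engine argument (e.g. `LLM(model=..., enable_prefix_caching=True)`), and the GRPO group size would typically be passed as `SamplingParams(n=G)`. Whether prefix caching is on by default depends on the vLLM version.

```python
import hashlib
from typing import List, Set, Tuple

# Hypothetical block size (tokens per KV block), chosen for illustration only.
BLOCK_SIZE = 16


def block_hashes(tokens: List[int], block_size: int = BLOCK_SIZE) -> List[str]:
    """Return one hash per *full* block; each hash folds in its parent's hash,
    so a block's identity depends on the entire prefix before it."""
    hashes: List[str] = []
    parent = ""
    full_len = len(tokens) - len(tokens) % block_size  # drop the partial tail
    for start in range(0, full_len, block_size):
        block = tokens[start:start + block_size]
        parent = hashlib.sha256((parent + repr(block)).encode()).hexdigest()
        hashes.append(parent)
    return hashes


class ToyPrefixCache:
    """Toy block-level KV cache that counts computed vs. reused prompt blocks."""

    def __init__(self, enabled: bool = True) -> None:
        self.enabled = enabled
        self.cached: Set[str] = set()
        self.computed = 0  # blocks whose KV would be computed and stored
        self.reused = 0  # blocks served from the cache instead

    def prefill(self, tokens: List[int]) -> None:
        for h in block_hashes(tokens):
            if self.enabled and h in self.cached:
                self.reused += 1
            else:
                self.computed += 1
                if self.enabled:
                    self.cached.add(h)


def simulate_group(prompt_len: int, group_size: int, enabled: bool) -> Tuple[int, int]:
    """Prefill the same prompt once per GRPO candidate; return (computed, reused)."""
    prompt = list(range(prompt_len))  # stand-in token ids
    cache = ToyPrefixCache(enabled)
    for _ in range(group_size):
        cache.prefill(prompt)
    return cache.computed, cache.reused


if __name__ == "__main__":
    for enabled in (False, True):
        computed, reused = simulate_group(prompt_len=64, group_size=8, enabled=enabled)
        print(f"prefix caching={enabled}: computed={computed}, reused={reused}")
```

With a 64-token prompt (4 full blocks) and a group of 8 candidates, the toy cache computes 4 prompt blocks instead of 32 and serves the other 28 from cache. Trailing partial blocks and decode-time completion tokens are ignored here for simplicity; completions differ per candidate, so only the shared prompt prefix benefits.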