[FEAT][executors]: optimize advanced vLLM sampler with shared prefix caching #11

@Flink-ddd

Description

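Refactor the rollout sampler to explicitly enable vLLM's Shared Prefix Caching mechanism. GRPO generates multiple candidates per prompt, and every candidate in a group starts from the same prompt prefix. Caching that prefix once eliminates redundant KV cache computation and storage across the candidates.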
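A minimal sketch of what the refactor could look like, not the project's actual sampler. `RolloutConfig`, `engine_kwargs`, `sampling_kwargs`, `prompt_tokens_prefilled`, and `rollout` are hypothetical names introduced for illustration. The only vLLM pieces assumed are the `enable_prefix_caching` engine argument, the `n` field of `SamplingParams`, and `LLM.generate`. The token estimate is idealized: it ignores block granularity and cache eviction.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RolloutConfig:
    """Hypothetical settings for a GRPO rollout sampler (illustrative only)."""
    model: str
    group_size: int            # candidates generated per prompt (G in GRPO)
    max_tokens: int = 256
    temperature: float = 1.0


def engine_kwargs(cfg: RolloutConfig) -> dict:
    # Turn prefix caching on explicitly instead of relying on the
    # default of whichever vLLM version happens to be installed.
    return {"model": cfg.model, "enable_prefix_caching": True}


def sampling_kwargs(cfg: RolloutConfig) -> dict:
    # n > 1 asks for several candidates from one prompt, so all of
    # them start from the same prompt prefix.
    return {
        "n": cfg.group_size,
        "temperature": cfg.temperature,
        "max_tokens": cfg.max_tokens,
    }


def prompt_tokens_prefilled(prompt_len: int, group_size: int, shared: bool) -> int:
    # Idealized count of prompt tokens whose KV entries must be computed
    # for one group: once if the prefix is shared, once per candidate if not.
    return prompt_len if shared else prompt_len * group_size


def rollout(cfg: RolloutConfig, prompts: list[str]) -> list[list[str]]:
    # Imported lazily so the helpers above work without vLLM installed.
    from vllm import LLM, SamplingParams

    llm = LLM(**engine_kwargs(cfg))
    outputs = llm.generate(prompts, SamplingParams(**sampling_kwargs(cfg)))
    # One inner list of candidate texts per prompt.
    return [[cand.text for cand in out.outputs] for out in outputs]
```

Under these assumptions, a 512-token prompt with a group size of 8 needs 4096 prompt tokens prefilled without sharing and 512 with it, which is the redundancy this issue targets.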

Metadata

Labels

component: executors - Tasks involving the interaction of vLLM inference and DeepSpeed training endpoints.
enhancement - New feature or request
platform: cuda - Specific optimizations or bugs in NVIDIA graphics cards (such as FlashInfer, TMA optimizations)
platform: rocm - Specific tasks specific to AMD graphics cards (such as CK, bpreshuffle/FA)
type: performance - Performance optimization tasks aimed at increasing throughput and reducing latency etc.
