Listed in Q3 2026 roadmap. ~5ms overhead target. Plugin should:
- Hook into vLLM's LLM.generate() postprocess
- Forward the residual stream at the named layer to the loaded probe(s)
- Return score alongside text in response
Reference: vLLM's extension API + agent-probe-guard sklearn probe interface.
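
A minimal sketch of the scoring step described above, not the plugin itself. It assumes the probe exposes an sklearn-style predict_proba method and that the residual-stream activation for each named layer has already been captured. How to capture it from vLLM is not shown here. The names Probe, ToyLinearProbe, ScoredOutput and postprocess are hypothetical, and ToyLinearProbe is only a stand-in for a fitted sklearn classifier so the example runs without extra dependencies.

```python
import math
from dataclasses import dataclass
from typing import Dict, Protocol, Sequence, Tuple


class Probe(Protocol):
    """Assumed sklearn-style interface: one probability row per input row."""

    def predict_proba(
        self, X: Sequence[Sequence[float]]
    ) -> Sequence[Sequence[float]]: ...


@dataclass
class ToyLinearProbe:
    """Hypothetical stand-in for a fitted sklearn binary classifier."""

    weights: Sequence[float]
    bias: float = 0.0

    def predict_proba(self, X):
        rows = []
        for x in X:
            logit = sum(w * v for w, v in zip(self.weights, x)) + self.bias
            p = 1.0 / (1.0 + math.exp(-logit))
            rows.append([1.0 - p, p])  # [P(class 0), P(class 1)]
        return rows


@dataclass
class ScoredOutput:
    """Generated text plus one score per loaded probe."""

    text: str
    scores: Dict[str, float]


def postprocess(
    text: str,
    residuals: Dict[str, Sequence[float]],
    probes: Dict[str, Tuple[str, Probe]],
) -> ScoredOutput:
    """Score the activation at each probe's named layer and attach it to the text.

    residuals maps layer name -> activation vector (assumed already captured).
    probes maps probe name -> (layer name, probe object).
    """
    scores = {}
    for name, (layer, probe) in probes.items():
        activation = residuals[layer]
        # Positive-class probability for the single activation row.
        scores[name] = float(probe.predict_proba([activation])[0][1])
    return ScoredOutput(text=text, scores=scores)


# Example call with made-up layer and probe names.
result = postprocess(
    "hello",
    {"layer_12": [0.0, 0.0]},
    {"demo": ("layer_12", ToyLinearProbe(weights=[1.0, -1.0]))},
)
```

Keeping the probe behind a predict_proba-shaped interface means a real fitted sklearn estimator could be dropped in without changing postprocess, provided the activation is passed as a 2D array-like.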