Internal state snapshot fuzzing for LLM inference engines.
Fracture finds deep bugs in LLM serving systems by snapshotting their internal state, mutating it, and checking what breaks. Unlike API-level fuzzers that send requests from the outside, Fracture operates inside the engine — directly manipulating KV cache block tables, prefix cache hash maps, scheduler queues, and memory allocators to reach states that would take millions of API calls to hit naturally.
LLM inference engines like vLLM and SGLang are complex stateful systems. They manage paged KV caches, prefix sharing trees, dynamic batching, and request lifecycle state machines — all under concurrent pressure. Real bugs in these systems include:
- KV cache corruption from stale prefix reuse across scheduler steps (vLLM #38715)
- Block table aliasing when cache groups with different layouts share raw tensors (vLLM #41657)
- TOCTOU races where cache hits are evicted between scheduler phases (vLLM #42289)
- Radix cache boundary bugs producing nondeterministic output (SGLang #22819)
- Empty batch crashes from KV-pool retraction creating zero-length CUDA kernel launches (SGLang #24252)
These bugs are hard to find because they emerge from specific internal state configurations that are difficult to reach through normal request traffic. Fracture constructs those configurations directly instead of waiting for traffic to produce them.
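To make "internal state configuration" concrete, here is a minimal sketch of the kind of consistency check that a block table aliasing bug would violate. It is a toy model, not Fracture's or vLLM's actual data structures: the function name `find_block_table_violations`, the dictionary layout, and the reference-count rule are all assumptions made for illustration.

```python
from collections import Counter


def find_block_table_violations(block_tables, ref_counts, num_blocks):
    """Return human-readable invariant violations for a toy paged KV cache.

    block_tables: dict mapping request id -> list of physical block ids
    ref_counts:   dict mapping physical block id -> reference count
    num_blocks:   total number of physical blocks in the pool
    (All three structures are hypothetical stand-ins for engine state.)
    """
    violations = []
    # Count how many block-table entries point at each physical block.
    observed = Counter(
        block for table in block_tables.values() for block in table
    )
    for block, uses in sorted(observed.items()):
        if not 0 <= block < num_blocks:
            violations.append(f"block {block} is out of range")
        elif ref_counts.get(block, 0) != uses:
            violations.append(
                f"block {block} referenced {uses}x but ref count is "
                f"{ref_counts.get(block, 0)}"
            )
    # A block with a positive ref count that no table references is leaked.
    for block, count in sorted(ref_counts.items()):
        if count > 0 and block not in observed:
            violations.append(f"block {block} leaked with ref count {count}")
    return violations


ref_counts = {0: 2, 1: 1, 2: 1}

# A consistent state: block 0 is a shared prefix, so its ref count is 2.
healthy = {"req-a": [0, 1], "req-b": [0, 2]}
healthy_violations = find_block_table_violations(healthy, ref_counts, 4)

# A mutated state: req-b now aliases block 1 without a ref count update.
mutated = {"req-a": [0, 1], "req-b": [0, 1]}
mutated_violations = find_block_table_violations(mutated, ref_counts, 4)

print(healthy_violations)
for violation in mutated_violations:
    print(violation)
```

The healthy state yields no violations, while the mutated one reports both the aliased block and the block that is now orphaned. A state like the second one is cheap to write directly into a snapshot but may be very rare under ordinary request traffic.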
```python
import fracture
from fracture.targets.vllm import VLLMTarget

# One line: good defaults, finds bugs immediately
findings = fracture.fuzz(VLLMTarget(model="Qwen/Qwen2.5-0.5B-Instruct"))
```

Need more control? Compose mutators and oracles:
```python
import fracture
from fracture.mutators import BlockSwapMutator, PrefixCorruptMutator, ComposeMutator
from fracture.oracles import CrashOracle, InvariantOracle, CompositeOracle
from fracture.targets.vllm import VLLMTarget

target = VLLMTarget(model="Qwen/Qwen2.5-0.5B-Instruct")

findings = fracture.fuzz(
    target,
    # Apply both mutators to each snapshot
    mutator=ComposeMutator([
        BlockSwapMutator(),
        PrefixCorruptMutator(corruption_rate=0.1),
    ]),
    # Report a finding if any oracle fires
    oracle=CompositeOracle([
        CrashOracle(),
        InvariantOracle(),
    ]),
    iterations=1000,
)

for finding in findings:
    print(finding)
```

Want to write your own mutator? Implement the protocol:
```python
from random import Random

from fracture.interfaces import Mutator  # the protocol MyMutator satisfies
from fracture.types import Snapshot


class MyMutator:
    name = "my_custom_mutator"

    def mutate(self, snapshot: Snapshot, rng: Random) -> Snapshot:
        # Your mutation logic here
        ...
```

```bash
pip install fracture

# With engine-specific support (quoted so the brackets survive shells like zsh)
pip install "fracture[vllm]"
pip install "fracture[sglang]"
```

Fracture is engine-agnostic by design. The core library defines three protocols:
| Protocol | Purpose | Contract |
|---|---|---|
| Target | Talk to an inference engine | `snapshot()`, `restore()`, `step()`, `check_invariants()` |
| Mutator | Perturb a snapshot | `mutate(snapshot, rng) -> Snapshot` |
| Oracle | Detect bugs | `check(before, after) -> Finding \| None` |
All engine-specific logic lives in target adapters. Adding support for a new engine means writing one module that implements the Target protocol. All existing mutators, oracles, and the fuzzing loop work immediately.
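The sketch below shows how the three protocols could fit together in a fuzzing loop. Only the method names come from the table above. The argument and return types, the use of a plain `dict` as the snapshot, the toy classes (`ToyTarget`, `FreeBlockMutator`, `NegativeFreeOracle`), the `toy_fuzz` function, and the reading of `before`/`after` as the state on either side of one `step()` are all assumptions for illustration, not Fracture's actual implementation.

```python
from __future__ import annotations

import copy
import random
from dataclasses import dataclass
from typing import Protocol


@dataclass
class Finding:
    message: str


class Target(Protocol):
    def snapshot(self) -> dict: ...
    def restore(self, snapshot: dict) -> None: ...
    def step(self) -> None: ...
    def check_invariants(self) -> list[str]: ...


class Mutator(Protocol):
    def mutate(self, snapshot: dict, rng: random.Random) -> dict: ...


class Oracle(Protocol):
    def check(self, before: dict, after: dict) -> Finding | None: ...


class ToyTarget:
    """Hypothetical engine: each running request takes one block per step."""

    def __init__(self) -> None:
        self.state = {"free_blocks": 4, "running": ["req-a", "req-b"]}

    def snapshot(self) -> dict:
        return copy.deepcopy(self.state)

    def restore(self, snapshot: dict) -> None:
        self.state = copy.deepcopy(snapshot)

    def step(self) -> None:
        # Deliberate bug: no check that enough free blocks remain.
        self.state["free_blocks"] -= len(self.state["running"])

    def check_invariants(self) -> list[str]:
        if self.state["free_blocks"] < 0:
            return ["free block count went negative"]
        return []


class FreeBlockMutator:
    def mutate(self, snapshot: dict, rng: random.Random) -> dict:
        mutated = copy.deepcopy(snapshot)
        mutated["free_blocks"] = rng.randint(0, 4)
        return mutated


class NegativeFreeOracle:
    def check(self, before: dict, after: dict) -> Finding | None:
        if after["free_blocks"] < 0:
            return Finding(
                f"step drove free_blocks from {before['free_blocks']} "
                f"to {after['free_blocks']}"
            )
        return None


def toy_fuzz(target: Target, mutator: Mutator, oracle: Oracle,
             iterations: int, seed: int = 0) -> list[Finding]:
    rng = random.Random(seed)
    base = target.snapshot()
    findings = []
    for _ in range(iterations):
        before = mutator.mutate(base, rng)  # perturb a copy of the base state
        target.restore(before)              # load it into the engine
        target.step()                       # advance the engine once
        finding = oracle.check(before, target.snapshot())
        if finding is not None:
            findings.append(finding)
    target.restore(base)  # leave the target as we found it
    return findings


target = ToyTarget()
findings = toy_fuzz(target, FreeBlockMutator(), NegativeFreeOracle(), iterations=20)
for finding in findings:
    print(finding.message)
```

Because the protocols are structural, none of the toy classes inherit from `Target`, `Mutator`, or `Oracle`. Swapping `ToyTarget` for a different adapter leaves the mutator, the oracle, and the loop unchanged, which is the property the paragraph above describes.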
```
fracture/
  core (engine-agnostic)     targets (engine-specific)
  ├── types.py               ├── vllm.py
  ├── interfaces.py          ├── sglang.py (future)
  ├── loop.py                ├── flashinfer.py (future)
  ├── mutators/              └── your_engine.py
  └── oracles/
```
GRIEF is an API-level greybox fuzzer — it sends request traces to a live server and observes external behavior. Fracture operates at a different layer:
| | GRIEF | Fracture |
|---|---|---|
| Approach | External API request traces | Internal state snapshot + mutation |
| State access | Observes latency, outputs, KV events | Directly reads/writes block tables, scheduler queues, prefix caches |
| Reachability | Limited to states reachable via API | Can construct arbitrary internal states |
| Reproduction | Replay timed request traces | Restore exact snapshot + deterministic step |
| Speed | Requires live server per iteration | Snapshot/restore without full server restart |
The approaches are complementary — GRIEF finds bugs from realistic workload patterns, Fracture finds bugs from deep internal state exploration.
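The "Reproduction" row can be illustrated with a small sketch: if a fuzz iteration is fully determined by a snapshot and an RNG seed, storing those two values is enough to replay it. Everything here is a toy model under that assumption. The snapshot layout and the `swap_blocks`, `step`, `record_case`, and `replay_case` functions are hypothetical and do not describe Fracture's storage format.

```python
import copy
import json
import random


def swap_blocks(snapshot, rng):
    """Toy mutation: swap two entries in one request's block table."""
    mutated = copy.deepcopy(snapshot)
    request_id = rng.choice(sorted(mutated["block_tables"]))
    table = mutated["block_tables"][request_id]
    if len(table) >= 2:
        i, j = rng.sample(range(len(table)), 2)
        table[i], table[j] = table[j], table[i]
    return mutated


def step(snapshot):
    """Toy deterministic scheduler step: admit one waiting request."""
    after = copy.deepcopy(snapshot)
    if after["waiting"] and after["free_blocks"]:
        request_id = after["waiting"].pop(0)
        after["block_tables"][request_id] = [after["free_blocks"].pop(0)]
    return after


def record_case(snapshot, seed):
    """Serialize everything needed to replay one fuzz iteration."""
    return json.dumps({"snapshot": snapshot, "seed": seed}, sort_keys=True)


def replay_case(case_json):
    """Rebuild the snapshot, reapply the seeded mutation, and step once."""
    case = json.loads(case_json)
    rng = random.Random(case["seed"])
    return step(swap_blocks(case["snapshot"], rng))


snapshot = {
    "block_tables": {"req-a": [0, 1, 2], "req-b": [3, 4]},
    "free_blocks": [5, 6],
    "waiting": ["req-c"],
}
case = record_case(snapshot, seed=1234)
first = replay_case(case)
second = replay_case(case)
print(first == second)
```

Both replays produce the same post-step state because the mutation draws only from the seeded generator and the step has no other source of nondeterminism. A timed request trace, by contrast, also depends on scheduling and network timing at replay time.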
- GRIEF: Continuous Discovery of Vulnerabilities in LLM Serving Systems with Fuzzing (arXiv 2026)
- Moneta: NDSS 2025 — GPU driver fuzzing
- KRR: OSDI 2025 — Kernel record/replay
- FlashInfer: MLSys 2025 Best Paper — Attention kernel library
- vLLM Architecture — Inside vLLM blog post
Apache-2.0