Fracture

Internal state snapshot fuzzing for LLM inference engines.

Fracture finds deep bugs in LLM serving systems by snapshotting their internal state, mutating it, and checking what breaks. Unlike API-level fuzzers that send requests from the outside, Fracture operates inside the engine — directly manipulating KV cache block tables, prefix cache hash maps, scheduler queues, and memory allocators to reach states that would take millions of API calls to hit naturally.

Why Fracture?

LLM inference engines like vLLM and SGLang are complex stateful systems. They manage paged KV caches, prefix sharing trees, dynamic batching, and request lifecycle state machines — all under concurrent pressure. Real bugs in these systems include:

KV cache corruption from stale prefix reuse across scheduler steps (vLLM #38715)
Block table aliasing when cache groups with different layouts share raw tensors (vLLM #41657)
TOCTOU races where cache hits are evicted between scheduler phases (vLLM #42289)
Radix cache boundary bugs producing nondeterministic output (SGLang #22819)
Empty batch crashes from KV-pool retraction creating zero-length CUDA kernel launches (SGLang #24252)

These bugs are hard to find because they emerge from specific internal state configurations that are difficult to reach through normal request traffic. Fracture constructs those configurations directly instead of waiting for traffic to produce them.

Quickstart

import fracture
from fracture.targets.vllm import VLLMTarget

# One call, using the default mutators and oracles
findings = fracture.fuzz(VLLMTarget(model="Qwen/Qwen2.5-0.5B-Instruct"))

Need more control? Compose mutators and oracles:

from fracture.mutators import BlockSwapMutator, PrefixCorruptMutator, ComposeMutator
from fracture.oracles import CrashOracle, InvariantOracle, CompositeOracle

target = VLLMTarget(model="Qwen/Qwen2.5-0.5B-Instruct")

findings = fracture.fuzz(
    target,
    mutator=ComposeMutator([
        BlockSwapMutator(),
        PrefixCorruptMutator(corruption_rate=0.1),
    ]),
    oracle=CompositeOracle([
        CrashOracle(),
        InvariantOracle(),
    ]),
    iterations=1000,
)

for finding in findings:
    print(finding)

Want to write your own mutator? Implement the protocol:

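from random import Random

from fracture.interfaces import Mutator  # the protocol being implemented
from fracture.types import Snapshot

# No subclassing needed: a class with a matching `name` and `mutate` satisfies the protocol.
class MyMutator:
    name = "my_custom_mutator"

    def mutate(self, snapshot: Snapshot, rng: Random) -> Snapshot:
        # Your mutation logic here; draw all randomness from `rng`
        ...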
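As an illustration of what the body of `mutate` could look like, here is a minimal sketch of a block-aliasing mutation. It does not use Fracture's real `Snapshot`: `ToySnapshot`, its `block_tables` field, and `BlockAliasMutator` are hypothetical stand-ins invented for this example, so the block runs without Fracture installed.

```python
from dataclasses import dataclass, replace
from random import Random


@dataclass(frozen=True)
class ToySnapshot:
    """Hypothetical stand-in for fracture.types.Snapshot; real fields may differ."""
    # Maps request id -> physical KV block ids, in logical order.
    block_tables: dict[str, tuple[int, ...]]


class BlockAliasMutator:
    """Points one slot of a request's block table at a block owned by another request."""
    name = "toy_block_alias"

    def mutate(self, snapshot: ToySnapshot, rng: Random) -> ToySnapshot:
        request_ids = sorted(snapshot.block_tables)
        if len(request_ids) < 2:
            return snapshot  # nothing to alias against
        victim, donor = rng.sample(request_ids, 2)
        victim_blocks = list(snapshot.block_tables[victim])
        donor_blocks = snapshot.block_tables[donor]
        if not victim_blocks or not donor_blocks:
            return snapshot
        slot = rng.randrange(len(victim_blocks))
        victim_blocks[slot] = rng.choice(donor_blocks)
        tables = dict(snapshot.block_tables)
        tables[victim] = tuple(victim_blocks)
        # Return a new snapshot; the input is left untouched.
        return replace(snapshot, block_tables=tables)


snap = ToySnapshot(block_tables={"req-a": (0, 1), "req-b": (2, 3)})
mutated = BlockAliasMutator().mutate(snap, Random(0))
```

Taking all randomness from the `rng` argument, rather than the global `random` module, keeps a mutation repeatable for a given seed.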

Installation

pip install fracture

# With engine-specific support (quoted so shells such as zsh do not expand the brackets)
pip install "fracture[vllm]"
pip install "fracture[sglang]"

Architecture

Fracture is engine-agnostic by design. The core library defines three protocols:

| Protocol | Purpose | Contract |
| --- | --- | --- |
| Target | Talk to an inference engine | `snapshot()`, `restore()`, `step()`, `check_invariants()` |
| Mutator | Perturb a snapshot | `mutate(snapshot, rng) -> Snapshot` |
| Oracle | Detect bugs | `check(before, after) -> Finding \| None` |

All engine-specific logic lives in target adapters. Adding support for a new engine means writing one module that implements the Target protocol. All existing mutators, oracles, and the fuzzing loop work immediately.
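To make the division of labor concrete, the following sketch writes the three contracts as `typing.Protocol` classes and drives them with a toy loop. Only the method names come from the table above. The `Snapshot` and `Finding` fields, the return type of `check_invariants()`, the toy classes, and the loop order (restore mutated state, step, compare against the baseline) are assumptions made for illustration, not Fracture's actual implementation.

```python
from dataclasses import dataclass
from random import Random
from typing import Optional, Protocol


@dataclass(frozen=True)
class Snapshot:  # hypothetical fields; a real snapshot carries engine state
    free_blocks: int
    used_blocks: int


@dataclass(frozen=True)
class Finding:  # hypothetical shape
    oracle: str
    detail: str


class Target(Protocol):
    def snapshot(self) -> Snapshot: ...
    def restore(self, snapshot: Snapshot) -> None: ...
    def step(self) -> None: ...
    def check_invariants(self) -> list[str]: ...


class Mutator(Protocol):
    def mutate(self, snapshot: Snapshot, rng: Random) -> Snapshot: ...


class Oracle(Protocol):
    def check(self, before: Snapshot, after: Snapshot) -> Optional[Finding]: ...


class ToyTarget:
    """Toy allocator with a fixed block pool; each step allocates one block."""
    TOTAL_BLOCKS = 8

    def __init__(self) -> None:
        self._free, self._used = self.TOTAL_BLOCKS, 0

    def snapshot(self) -> Snapshot:
        return Snapshot(self._free, self._used)

    def restore(self, snapshot: Snapshot) -> None:
        self._free, self._used = snapshot.free_blocks, snapshot.used_blocks

    def step(self) -> None:
        if self._free > 0:
            self._free -= 1
            self._used += 1

    def check_invariants(self) -> list[str]:
        if self._free + self._used != self.TOTAL_BLOCKS:
            return [f"{self._free} free + {self._used} used != {self.TOTAL_BLOCKS}"]
        return []


class LeakMutator:
    """Drops one or two blocks from the free list without marking them used."""
    def mutate(self, snapshot: Snapshot, rng: Random) -> Snapshot:
        return Snapshot(snapshot.free_blocks - rng.randint(1, 2), snapshot.used_blocks)


class ConservationOracle:
    """Flags a change in the total number of tracked blocks."""
    def check(self, before: Snapshot, after: Snapshot) -> Optional[Finding]:
        total_before = before.free_blocks + before.used_blocks
        total_after = after.free_blocks + after.used_blocks
        if total_before != total_after:
            return Finding("conservation", f"{total_before} blocks before, {total_after} after")
        return None


def fuzz(target: Target, mutator: Mutator, oracle: Oracle,
         iterations: int, seed: int = 0) -> list[Finding]:
    rng = Random(seed)
    baseline = target.snapshot()
    findings = []
    for _ in range(iterations):
        target.restore(mutator.mutate(baseline, rng))  # inject the mutated state
        target.step()                                  # let the engine run on it
        finding = oracle.check(baseline, target.snapshot())
        if finding is not None:
            findings.append(finding)
    target.restore(baseline)  # leave the target as we found it
    return findings


findings = fuzz(ToyTarget(), LeakMutator(), ConservationOracle(), iterations=5)
```

Because the loop only touches the four `Target` methods, swapping `ToyTarget` for another adapter leaves the mutator, the oracle, and the loop unchanged, which is the property the paragraph above describes.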

fracture/
  core (engine-agnostic)     targets (engine-specific)
  ├── types.py               ├── vllm.py
  ├── interfaces.py          ├── sglang.py (future)
  ├── loop.py                ├── flashinfer.py (future)
  ├── mutators/              └── your_engine.py
  └── oracles/

How It Differs from GRIEF

GRIEF is an API-level greybox fuzzer — it sends request traces to a live server and observes external behavior. Fracture operates at a different layer:

| Aspect | GRIEF | Fracture |
| --- | --- | --- |
| Approach | External API request traces | Internal state snapshot + mutation |
| State access | Observes latency, outputs, KV events | Directly reads/writes block tables, scheduler queues, prefix caches |
| Reachability | Limited to states reachable via API | Can construct arbitrary internal states |
| Reproduction | Replay timed request traces | Restore exact snapshot + deterministic step |
| Speed | Requires live server per iteration | Snapshot/restore without full server restart |

The approaches are complementary — GRIEF finds bugs from realistic workload patterns, Fracture finds bugs from deep internal state exploration.
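The "Reproduction" row rests on a simple property: if a mutation depends only on the snapshot and a seeded random generator, storing the pair is enough to rebuild the exact mutated state later. The sketch below shows that property with a toy dictionary snapshot. The `record` and `replay` helpers and the JSON format are hypothetical, chosen for illustration, and are not part of Fracture's API.

```python
import json
from random import Random


def mutate(snapshot: dict, rng: Random) -> dict:
    """Toy mutation: swap two entries of a block table."""
    table = list(snapshot["block_table"])
    i, j = rng.sample(range(len(table)), 2)
    table[i], table[j] = table[j], table[i]
    return {**snapshot, "block_table": table}


def record(snapshot: dict, seed: int) -> str:
    """Serialize everything needed to rebuild the mutated state."""
    return json.dumps({"seed": seed, "snapshot": snapshot}, sort_keys=True)


def replay(recording: str) -> dict:
    """Rebuild the mutated state from a recording."""
    data = json.loads(recording)
    return mutate(data["snapshot"], Random(data["seed"]))


baseline = {"block_table": [0, 1, 2, 3]}
first = mutate(baseline, Random(1234))
again = replay(record(baseline, 1234))
assert first == again  # same snapshot + same seed -> same mutated state
```

A timed request trace, by contrast, also depends on scheduling and arrival timing at replay, which is why the two rows of the table differ.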

References

GRIEF: Continuous Discovery of Vulnerabilities in LLM Serving Systems with Fuzzing (arXiv 2026)
Moneta: NDSS 2025 — GPU driver fuzzing
KRR: OSDI 2025 — Kernel record/replay
FlashInfer: MLSys 2025 Best Paper — Attention kernel library
vLLM Architecture — Inside vLLM blog post

License

Apache-2.0
