botirk38/fracture

Fracture

Internal state snapshot fuzzing for LLM inference engines.

Fracture finds deep bugs in LLM serving systems by snapshotting their internal state, mutating it, and checking what breaks. Unlike API-level fuzzers that send requests from the outside, Fracture operates inside the engine — directly manipulating KV cache block tables, prefix cache hash maps, scheduler queues, and memory allocators to reach states that would take millions of API calls to hit naturally.

Why Fracture?

LLM inference engines like vLLM and SGLang are complex stateful systems. They manage paged KV caches, prefix sharing trees, dynamic batching, and request lifecycle state machines — all under concurrent pressure. Real bugs in these systems include:

  • KV cache corruption from stale prefix reuse across scheduler steps (vLLM #38715)
  • Block table aliasing when cache groups with different layouts share raw tensors (vLLM #41657)
  • TOCTOU races where cache hits are evicted between scheduler phases (vLLM #42289)
  • Radix cache boundary bugs producing nondeterministic output (SGLang #22819)
  • Empty batch crashes from KV-pool retraction creating zero-length CUDA kernel launches (SGLang #24252)

These bugs are hard to find because they emerge from specific internal state configurations that are difficult to reach through normal request traffic. Fracture constructs those configurations directly rather than waiting for traffic to produce them.
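To make "internal state configuration" concrete, here is a minimal sketch of the kind of consistency check that block-table aliasing would violate. `ToyKVState` and this `check_invariants` function are hypothetical stand-ins written for illustration; they are not Fracture's or vLLM's actual data structures.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class ToyKVState:
    """Hypothetical, simplified stand-in for an engine's paged KV cache state."""
    # request id -> ordered list of physical block ids
    block_tables: dict = field(default_factory=dict)
    # physical block id -> reference count kept by the allocator
    ref_counts: dict = field(default_factory=dict)
    # physical block ids the allocator considers free
    free_blocks: set = field(default_factory=set)


def check_invariants(state: ToyKVState) -> list:
    """Return human-readable violations; an empty list means consistent."""
    violations = []
    # Count how many block-table entries point at each physical block.
    uses = Counter(b for table in state.block_tables.values() for b in table)
    for block, n in sorted(uses.items()):
        if block in state.free_blocks:
            violations.append(f"block {block} is in use but also on the free list")
        recorded = state.ref_counts.get(block, 0)
        if recorded != n:
            violations.append(f"block {block} referenced {n}x but ref_count={recorded}")
    return violations


# Block 0 is a legitimately shared prefix block with a matching ref count.
healthy = ToyKVState(
    block_tables={"req-a": [0, 1], "req-b": [0, 2]},
    ref_counts={0: 2, 1: 1, 2: 1},
    free_blocks={3},
)
# Aliasing: req-b now points at req-a's private block 1 without a ref count bump.
aliased = ToyKVState(
    block_tables={"req-a": [0, 1], "req-b": [0, 1]},
    ref_counts={0: 2, 1: 1, 2: 1},
    free_blocks={3},
)

print(check_invariants(healthy))
print(check_invariants(aliased))
```

A state like `aliased` is awkward to provoke through request traffic alone, but trivial to construct once the block tables can be written directly.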

Quickstart

import fracture
from fracture.targets.vllm import VLLMTarget

# One line — good defaults, finds bugs immediately
findings = fracture.fuzz(VLLMTarget(model="Qwen/Qwen2.5-0.5B-Instruct"))

Need more control? Compose mutators and oracles:

import fracture
from fracture.mutators import BlockSwapMutator, PrefixCorruptMutator, ComposeMutator
from fracture.oracles import CrashOracle, InvariantOracle, CompositeOracle
from fracture.targets.vllm import VLLMTarget

target = VLLMTarget(model="Qwen/Qwen2.5-0.5B-Instruct")

findings = fracture.fuzz(
    target,
    mutator=ComposeMutator([
        BlockSwapMutator(),
        PrefixCorruptMutator(corruption_rate=0.1),
    ]),
    oracle=CompositeOracle([
        CrashOracle(),
        InvariantOracle(),
    ]),
    iterations=1000,
)

for finding in findings:
    print(finding)

Want to write your own mutator? Implement the protocol:

from random import Random

from fracture.interfaces import Mutator
from fracture.types import Snapshot

class MyMutator:
    # Identifier for this mutator
    name = "my_custom_mutator"

    def mutate(self, snapshot: Snapshot, rng: Random) -> Snapshot:
        # Your mutation logic here; draw all randomness from rng
        ...
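As an illustration of what a body for `mutate` could look like, here is a self-contained sketch that swaps one block id between two requests. `ToySnapshot` is a hypothetical stand-in for `fracture.types.Snapshot`, whose real fields are not shown in this README, and `ToyBlockSwapMutator` is not the library's `BlockSwapMutator`.

```python
import copy
from dataclasses import dataclass, field
from random import Random


@dataclass
class ToySnapshot:
    """Hypothetical stand-in for fracture.types.Snapshot; real fields may differ."""
    # request id -> ordered list of physical block ids
    block_tables: dict = field(default_factory=dict)


class ToyBlockSwapMutator:
    name = "toy_block_swap"

    def mutate(self, snapshot: ToySnapshot, rng: Random) -> ToySnapshot:
        # Work on a deep copy so the caller's snapshot stays reusable.
        mutated = copy.deepcopy(snapshot)
        # Sort the candidates so the result depends only on the rng, not dict order.
        candidates = sorted(r for r, t in mutated.block_tables.items() if t)
        if len(candidates) < 2:
            return mutated  # nothing to swap
        a, b = rng.sample(candidates, 2)
        table_a, table_b = mutated.block_tables[a], mutated.block_tables[b]
        i = rng.randrange(len(table_a))
        j = rng.randrange(len(table_b))
        # Exchange one entry between the two block tables.
        table_a[i], table_b[j] = table_b[j], table_a[i]
        return mutated


original = ToySnapshot(block_tables={"req-a": [0, 1], "req-b": [2, 3]})
mutated = ToyBlockSwapMutator().mutate(original, Random(0))
print(original.block_tables)
print(mutated.block_tables)
```

Returning a copy and taking all randomness from the passed-in `rng` are design choices of this sketch: they keep the base snapshot intact across iterations and make a mutation repeatable from its seed.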

Installation

pip install fracture

# With engine-specific support
pip install "fracture[vllm]"
pip install "fracture[sglang]"

Architecture

Fracture is engine-agnostic by design. The core library defines three protocols:

| Protocol | Purpose | Contract |
| --- | --- | --- |
| Target | Talk to an inference engine | snapshot(), restore(), step(), check_invariants() |
| Mutator | Perturb a snapshot | mutate(snapshot, rng) -> Snapshot |
| Oracle | Detect bugs | check(before, after) -> Optional[Finding] |

All engine-specific logic lives in target adapters. Adding support for a new engine means writing one module that implements the Target protocol; the existing mutators, oracles, and fuzzing loop then work with it unchanged.
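The three contracts can be sketched with `typing.Protocol` and a toy loop. Everything below is an assumption made for illustration: `Snapshot` is modelled as a plain dict, the `Finding` fields are invented, and `ToyTarget`, `StaleFreeMutator`, `DoubleFreeOracle`, and this `fuzz` function are hypothetical, not the code in `fracture/loop.py`. The README does not say how `check_invariants()` feeds into oracles, so the toy loop leaves it out.

```python
import copy
from dataclasses import dataclass
from random import Random
from typing import Optional, Protocol

Snapshot = dict  # placeholder; the real fracture.types.Snapshot is not shown here


@dataclass
class Finding:
    oracle: str
    detail: str


class Target(Protocol):
    def snapshot(self) -> Snapshot: ...
    def restore(self, snapshot: Snapshot) -> None: ...
    def step(self) -> None: ...
    def check_invariants(self) -> list: ...


class Mutator(Protocol):
    def mutate(self, snapshot: Snapshot, rng: Random) -> Snapshot: ...


class Oracle(Protocol):
    def check(self, before: Snapshot, after: Snapshot) -> Optional[Finding]: ...


class ToyTarget:
    """Toy engine: each step finishes one running request and frees its blocks."""

    def __init__(self) -> None:
        self.state = {"free": [2, 3], "running": {"req-a": [0, 1]}}

    def snapshot(self) -> Snapshot:
        return copy.deepcopy(self.state)

    def restore(self, snapshot: Snapshot) -> None:
        self.state = copy.deepcopy(snapshot)

    def step(self) -> None:
        if self.state["running"]:
            req = sorted(self.state["running"])[0]
            self.state["free"].extend(self.state["running"].pop(req))

    def check_invariants(self) -> list:
        free = self.state["free"]
        return [f"block {b} freed twice" for b in sorted(set(free)) if free.count(b) > 1]


class StaleFreeMutator:
    """Puts a block that is still in use onto the free list."""

    def mutate(self, snapshot: Snapshot, rng: Random) -> Snapshot:
        mutated = copy.deepcopy(snapshot)
        in_use = [b for blocks in mutated["running"].values() for b in blocks]
        if in_use:
            mutated["free"].append(rng.choice(in_use))
        return mutated


class DoubleFreeOracle:
    def check(self, before: Snapshot, after: Snapshot) -> Optional[Finding]:
        free = after["free"]
        if len(set(free)) != len(free):
            return Finding("double_free", f"free list has duplicates: {sorted(free)}")
        return None


def fuzz(target: Target, mutator: Mutator, oracle: Oracle,
         iterations: int, seed: int = 0) -> list:
    rng = Random(seed)
    base = target.snapshot()
    findings = []
    for _ in range(iterations):
        mutated = mutator.mutate(base, rng)
        target.restore(mutated)  # jump straight to the mutated state
        target.step()            # let the engine run from there
        finding = oracle.check(mutated, target.snapshot())
        if finding is not None:
            findings.append(finding)
    target.restore(base)  # leave the target as we found it
    return findings


findings = fuzz(ToyTarget(), StaleFreeMutator(), DoubleFreeOracle(), iterations=5)
print(len(findings), findings[0].detail)
```

Because the loop only touches the `Target`, `Mutator`, and `Oracle` methods, swapping `ToyTarget` for another class with the same four methods requires no change to the mutator, oracle, or loop.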

fracture/
  core (engine-agnostic)     targets (engine-specific)
  ├── types.py               ├── vllm.py
  ├── interfaces.py          ├── sglang.py (future)
  ├── loop.py                ├── flashinfer.py (future)
  ├── mutators/              └── your_engine.py
  └── oracles/

How It Differs from GRIEF

GRIEF is an API-level greybox fuzzer — it sends request traces to a live server and observes external behavior. Fracture operates at a different layer:

| | GRIEF | Fracture |
| --- | --- | --- |
| Approach | External API request traces | Internal state snapshot + mutation |
| State access | Observes latency, outputs, KV events | Directly reads/writes block tables, scheduler queues, prefix caches |
| Reachability | Limited to states reachable via API | Can construct arbitrary internal states |
| Reproduction | Replay timed request traces | Restore exact snapshot + deterministic step |
| Speed | Requires live server per iteration | Snapshot/restore without full server restart |

The approaches are complementary — GRIEF finds bugs from realistic workload patterns, Fracture finds bugs from deep internal state exploration.
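The "restore exact snapshot + deterministic step" row can be illustrated with a toy engine. This is a sketch under assumptions: `ToyEngine` is hypothetical, and it treats the sampler's RNG state as part of the snapshot, which may or may not match what a real target adapter captures.

```python
import random


class ToyEngine:
    """Hypothetical engine whose step depends on internal state and on its RNG."""

    def __init__(self, seed: int) -> None:
        self.rng = random.Random(seed)
        self.tokens = []

    def step(self) -> None:
        # Stand-in for sampling the next token.
        self.tokens.append(self.rng.randrange(1000))

    def snapshot(self) -> dict:
        # Capture the RNG state too; without it a replay would diverge.
        return {"tokens": list(self.tokens), "rng_state": self.rng.getstate()}

    def restore(self, snap: dict) -> None:
        self.tokens = list(snap["tokens"])
        self.rng.setstate(snap["rng_state"])


engine = ToyEngine(seed=1234)
for _ in range(3):
    engine.step()
snap = engine.snapshot()  # the state we want to be able to revisit

engine.step()
first_run = list(engine.tokens)

engine.restore(snap)      # no restart and no request replay
engine.step()
second_run = list(engine.tokens)

print(first_run == second_run)
```

Replaying a timed request trace has to recreate this state indirectly; restoring a snapshot starts from it exactly, so the step that follows is the same every time.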

License

Apache-2.0
