
fix: unbounded regex quantifiers prevent Outlines DFA state explosion #203

Merged: abrichr merged 2 commits into main from fix/dfa-state-explosion on Mar 28, 2026

@abrichr abrichr commented Mar 28, 2026

Summary

Bounded quantifiers ({1,500}, {0,200}) in the constrained-decoding regex created counting DFA states, and the compiled automaton exceeded Outlines' limit of 2^31 - 1 (2147483647) states. This PR replaces them with unbounded +/* quantifiers, which compile to single-state self-loops. max_new_tokens provides the actual length limit.

Error before: Failed to build DFA number of DFA states exceeds limit of 2147483647
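To make the counting argument concrete, here is a toy model of how many states a DFA needs for a single quantified character class. It is only a sketch: `count_dfa_states` is a hypothetical helper, it models one quantifier in isolation, and it is not how Outlines builds its automaton (the real DFA runs over a much larger alphabet, so real counts will differ).

```python
from collections import deque


def count_dfa_states(lo, hi):
    """Count live states of a toy DFA for `c{lo,hi}` over one character class.

    hi=None models an unbounded quantifier (`c+` for lo=1, `c*` for lo=0).
    A state is "how many characters have been consumed so far". With no
    upper bound the count saturates at `lo`, because the automaton only
    has to remember that the minimum has been reached.
    """
    def step(count):
        if hi is None:
            return min(count + 1, lo)              # self-loop once lo is reached
        return count + 1 if count < hi else None   # None stands for the dead state

    seen, queue = {0}, deque([0])
    while queue:
        nxt = step(queue.popleft())
        if nxt is not None and nxt not in seen:
            seen.add(nxt)
            queue.append(nxt)
    return len(seen)


print(count_dfa_states(1, 500))   # 501: one state per counter value 0..500
print(count_dfa_states(0, 200))   # 201
print(count_dfa_states(1, 3))     # 4
print(count_dfa_states(1, None))  # 2: start state plus one self-looping state
print(count_dfa_states(0, None))  # 1: a single self-looping state
```

In this model the bounded form needs a state per counter value, while the unbounded form needs a constant number regardless of how long the match is.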

Changes

| Quantifier | Before | After | DFA states (approx.) |
|---|---|---|---|
| Thought prefix | `[^\n]{1,500}` | `[^\n]+` | 500 → 1 |
| TYPE text | `[^"]{0,200}` | `[^"]*` | 200 → 1 |
| Coordinates | `\d{1,3}` | `\d+` | 3 → 1 |

New regression test: test_no_bounded_quantifiers_in_regex asserts that no bounded quantifier in the regex has an upper bound above 10 (that is, nothing larger than {N,10}).
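A check of this kind could look roughly like the sketch below. The helper names, the `max_bound` parameter, and the sample patterns are assumptions made for illustration; the project's actual test body is not shown in this PR description.

```python
import re

# Matches {n}, {n,}, {n,m} and {,m}. Simplification: quantifier-like text
# inside a character class (e.g. [{3}]) would also be picked up.
_QUANTIFIER = re.compile(r"\{(\d*)(?:,(\d*))?\}")


def bounded_quantifier_limits(pattern):
    """Return the largest repeat count named by each {..} quantifier."""
    limits = []
    for match in _QUANTIFIER.finditer(pattern):
        numbers = [int(x) for x in match.groups() if x]  # skip empty/missing parts
        if numbers:
            limits.append(max(numbers))
    return limits


def assert_no_large_bounded_quantifiers(pattern, max_bound=10):
    """Fail if any {..} quantifier names a repeat count above max_bound."""
    too_large = [n for n in bounded_quantifier_limits(pattern) if n > max_bound]
    assert not too_large, f"bounded quantifiers above {max_bound}: {too_large}"


# Hypothetical patterns in the spirit of the table above.
old_pattern = r'Thought: [^\n]{1,500}\nTYPE "[^"]{0,200}" \d{1,3}'
new_pattern = r'Thought: [^\n]+\nTYPE "[^"]*" \d+'

print(bounded_quantifier_limits(old_pattern))  # [500, 200, 3]
print(bounded_quantifier_limits(new_pattern))  # []
assert_no_large_bounded_quantifiers(new_pattern)
```

Scanning the pattern text keeps the test independent of Outlines itself, so it can run even where the library is not installed.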

Test plan

  • 34/34 tests pass
  • Bounded quantifier regression test catches future DFA issues

🤖 Generated with Claude Code

abrichr and others added 2 commits March 28, 2026 15:18
The outlines v1.2 API requires:
1. Wrapping the HF model+tokenizer in outlines.Transformers
2. Calling get_regex_logits_processor(None, wrapped, regex)

Prior code tried to construct OutlinesLogitsProcessor directly with
a tokenizer= kwarg that doesn't exist in v1.2. The resulting error was
caught, and generation silently fell back to unconstrained decoding.

Tests now verify the ACTUAL API surface (import paths + factory
function signature) instead of just checking class names exist.
This would have caught all three prior Outlines bugs:
- PR #197: wrong class name (RegexLogitsProcessor)
- PR #201: wrong constructor (tokenizer= kwarg)
- This PR: wrong API pattern (direct constructor vs factory)

33/33 tests pass with outlines 1.2.12 installed.
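
An API-surface test in this spirit can be sketched with the standard library alone. `assert_callable_signature` is a hypothetical helper, and because this commit message does not give the Outlines import path, the usage below points at `json.dumps` purely as a stand-in target.

```python
import importlib
import inspect


def assert_callable_signature(module_path, name, expected_params):
    """Import module_path, look up name, and compare leading parameter names."""
    module = importlib.import_module(module_path)  # raises on a wrong import path
    obj = getattr(module, name)                    # raises on a wrong class/function name
    params = list(inspect.signature(obj).parameters)
    assert params[:len(expected_params)] == list(expected_params), (
        f"{module_path}.{name} has parameters {params}, expected {expected_params}"
    )


# Stand-in usage against the standard library; a real test would name the
# factory function and module it actually imports from.
assert_callable_signature("json", "dumps", ["obj"])
```

Checking the import path, the attribute name, and the parameter list separately means each of the three failure modes listed above surfaces as a distinct error instead of a silent fallback.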

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Bounded quantifiers like {1,500} create counting DFA states that
cross-product with every alternative in the regex. The Thought
prefix alone contributed roughly 500 counting states, and the combined
automaton exceeded Outlines' limit of 2^31 - 1 (2147483647) states.

Changes:
- [^\n]{1,500} → [^\n]+  (Thought prefix: 1 state vs ~500)
- [^"]{0,200} → [^"]*    (TYPE text: 1 state vs 200)
- \d{1,3} → \d+          (coordinates: 1 state vs 3)

max_new_tokens=512 provides the actual length limit. The DFA
doesn't need to count characters.
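
As a toy illustration of that division of labour: the constraint only decides which symbols are allowed, and the loop's step cap decides when to stop. This sketch treats one character as one token, which real tokenizers do not, and `generate` is a hypothetical stand-in for the decoding loop rather than the project's code.

```python
import re
from itertools import cycle, islice

# Per-step constraint equivalent to the body of [^\n]+ (no counter needed).
ALLOWED = re.compile(r"[^\n]")


def generate(token_stream, max_new_tokens=512):
    """Consume tokens until the constraint rejects one or the cap is hit."""
    out = []
    for token in islice(token_stream, max_new_tokens):  # hard cap on steps
        if not ALLOWED.fullmatch(token):
            break  # the constraint ends the field (here: on a newline)
        out.append(token)
    return "".join(out)


print(len(generate(cycle("ab"))))   # 512: stopped by the cap, not the regex
print(generate(iter("hi\nthere")))  # hi
```

Because a token usually spans more than one character, a cap expressed in tokens is not the same bound as the old 500-character limit.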

New test: test_no_bounded_quantifiers_in_regex asserts that no bounded
quantifier in the regex has an upper bound above 10 (nothing larger
than {N,10}), preventing future regressions.

34/34 tests pass.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>