fix: unbounded regex quantifiers prevent Outlines DFA state explosion #203
Merged
The outlines v1.2 API requires:
1. Wrapping the HF model+tokenizer in outlines.Transformers
2. Calling get_regex_logits_processor(None, wrapped, regex)

Prior code tried to construct OutlinesLogitsProcessor directly with a tokenizer= kwarg that doesn't exist in v1.2. The error was caught and silently fell back to unconstrained generation.

Tests now verify the ACTUAL API surface (import paths + factory function signature) instead of just checking class names exist. This would have caught all three prior Outlines bugs:
- PR #197: wrong class name (RegexLogitsProcessor)
- PR #201: wrong constructor (tokenizer= kwarg)
- This PR: wrong API pattern (direct constructor vs factory)

33/33 tests pass with outlines 1.2.12 installed.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
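Checking an API surface instead of a class name can be sketched with the standard library alone. This is a minimal illustration, not the project's test: `fake_factory` and `fake_constructor` are hypothetical stand-ins for the factory and the direct constructor, their parameter names are assumptions, and `assert_leading_params` is a helper name invented here. Only `inspect.signature` is a real library call.

```python
import inspect


def assert_leading_params(func, expected_names):
    """Fail if func's leading parameters do not match expected_names."""
    params = list(inspect.signature(func).parameters)
    actual = params[: len(expected_names)]
    if actual != list(expected_names):
        raise AssertionError(
            f"{func.__name__} takes {actual}, expected {list(expected_names)}"
        )


# Hypothetical stand-in with the three-argument shape the commit message
# describes; the parameter names are assumptions.
def fake_factory(backend_name, model, regex):
    return ("processor", backend_name, model, regex)


# Hypothetical stand-in for the direct-constructor pattern that failed.
def fake_constructor(regex, tokenizer=None):
    return ("processor", regex, tokenizer)
```

A test built this way fails as soon as the call shape drifts, whereas a check that a name merely exists would still pass. In a real test the stand-ins would be replaced by the imported callable.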
Bounded quantifiers like {1,500} create counting DFA states that
cross-product with every alternative in the regex. The Thought
prefix alone needed about 500 counting states (one per allowed
repetition), and the combined DFA exceeded Outlines' limit of
2^31 - 1 (2147483647) states.
Changes:
- [^\n]{1,500} → [^\n]+ (Thought prefix: 1 state vs about 500)
- [^"]{0,200} → [^"]* (TYPE text: 1 state vs 200)
- \d{1,3} → \d+ (coordinates: 1 state vs 3)
max_new_tokens=512 provides the actual length limit. The DFA
doesn't need to count characters.
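To make the state-count argument concrete, here is a small hand-rolled sketch of the two automata for a single character class C. It is not how Outlines builds its DFA; the function names are invented for illustration. A counted repetition C{lo,hi} needs one state per possible count, while C+ needs only a start state and one self-looping state.

```python
def bounded_dfa(lo, hi):
    """DFA for C{lo,hi}: state i means 'i characters of class C consumed'.

    Returns (num_states, accepting_states, step), where step(state, in_class)
    yields the next state, or None once the input can no longer match.
    """
    accepting = set(range(lo, hi + 1))

    def step(state, in_class):
        if in_class and state < hi:
            return state + 1
        return None

    return hi + 1, accepting, step


def plus_dfa():
    """DFA for C+: state 0 = nothing consumed, state 1 = self-loop."""

    def step(state, in_class):
        return 1 if in_class else None

    return 2, {1}, step


def accepts(dfa, text, in_class):
    """Run the DFA over text; in_class(ch) says whether ch belongs to C."""
    _, accepting, step = dfa
    state = 0
    for ch in text:
        state = step(state, in_class(ch))
        if state is None:
            return False
    return state in accepting
```

In this model `bounded_dfa(1, 500)[0]` is 501 and `plus_dfa()[0]` is 2. When several such counters sit inside alternatives that the automaton must track together, their state counts multiply, which is the growth the commit message describes.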
New test: test_no_bounded_quantifiers_in_regex asserts that no quantifier
in the regex has an upper bound above 10 (nothing larger than {N,10}),
preventing future regressions.
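A regression check of this kind could look roughly like the following. This is a sketch, not the test from the PR: the helper `oversized_quantifiers` is invented here, and `action_regex` is a hypothetical stand-in for the project's constrained-decoding regex, whose real text is not shown. The threshold of 10 comes from the commit message.

```python
import re

# Matches counted quantifiers such as {3}, {2,} and {1,500}.
_QUANTIFIER = re.compile(r"\{(\d+)(?:,(\d*))?\}")

MAX_UPPER_BOUND = 10  # threshold taken from the commit message


def oversized_quantifiers(pattern, limit=MAX_UPPER_BOUND):
    """Return counted quantifiers in pattern whose largest count exceeds limit."""
    found = []
    for match in _QUANTIFIER.finditer(pattern):
        lower, upper = match.group(1), match.group(2)
        # {n} has no comma (upper is None); {n,} has an empty upper bound.
        largest = int(upper) if upper else int(lower)
        if largest > limit:
            found.append(match.group(0))
    return found


def test_no_bounded_quantifiers_in_regex():
    # Hypothetical stand-in for the real regex under test.
    action_regex = r'Thought: [^\n]+\n(CLICK \d+ \d+|TYPE "[^"]*")'
    assert oversized_quantifiers(action_regex) == []
```

The scan is purely textual, so a literal `{1,500}` inside a character class would be flagged as well. For a guard against regressions that false positive is probably acceptable.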
34/34 tests pass.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Summary
Bounded quantifiers ({1,500}, {0,200}) in the constrained decoding regex created counting DFA states that exceeded Outlines' 2^31 state limit. Replaced with unbounded + / * (single-state self-loops). max_new_tokens provides the actual length limit.

Error before:
Failed to build DFA number of DFA states exceeds limit of 2147483647

Changes
- [^\n]{1,500} → [^\n]+
- [^"]{0,200} → [^"]*
- \d{1,3} → \d+

New regression test: test_no_bounded_quantifiers_in_regex asserts no quantifier exceeds {N,10}.

Test plan
🤖 Generated with Claude Code