feat: lazy DFA cache (R1+R2) over OPTIMIZED_NFA for large anchor-free patterns #67
Benchmark results: 3-fork baseline run
Environment: JDK 21.0.10 (Zulu), 3 forks × 10 measurement iterations,
hitPath is 26% slower than JDK: root-cause candidates
The warm path should theoretically win on repeated identical input (one array read per character vs JDK's NFA simulation), but it doesn't. Likely overhead sources:
frozenPath and jdkMissBaseline comparisons are misleading
Both are dominated by early-exit on random non-matching inputs. The frozen-path benchmark needs to use
Follow-up work (not blocking this PR)
🤖 Generated with Claude Code
Pull request overview
Adds a new LAZY_DFA strategy that lazily caches DFA-style transitions over NFA execution for large, group-free, anchor-free patterns to improve warm-path matching performance.
Changes:
- Adds LazyDFACache, NfaStep, and StateSetKey runtime support.
- Adds LazyDFABytecodeGenerator and routes qualifying patterns via PatternAnalyzer/RuntimeCompiler.
- Adds unit tests, a JMH benchmark, and a design spec for the lazy DFA cache.
Reviewed changes
Copilot reviewed 12 out of 12 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| reggie-runtime/src/main/java/com/datadoghq/reggie/runtime/LazyDFACache.java | Implements lazy DFA state interning, ASCII transition tables, freeze/fallback behavior. |
| reggie-runtime/src/main/java/com/datadoghq/reggie/runtime/NfaStep.java | Defines the generated NFA step interface used by the cache. |
| reggie-runtime/src/main/java/com/datadoghq/reggie/runtime/StateSetKey.java | Adds content-based keys for cached NFA state sets. |
| reggie-runtime/src/main/java/com/datadoghq/reggie/runtime/RuntimeCompiler.java | Wires LAZY_DFA generation into runtime bytecode compilation. |
| reggie-codegen/src/main/java/com/datadoghq/reggie/codegen/codegen/LazyDFABytecodeGenerator.java | Emits lazy-DFA-specific bytecode, NFA tables, and bounded/full match methods. |
| reggie-codegen/src/main/java/com/datadoghq/reggie/codegen/analysis/PatternAnalyzer.java | Adds LAZY_DFA strategy selection and capturing-group detection. |
| reggie-runtime/src/test/java/com/datadoghq/reggie/runtime/LazyDFACacheTest.java | Tests cache behavior, freeze/fallback, non-ASCII, and concurrency basics. |
| reggie-runtime/src/test/java/com/datadoghq/reggie/runtime/StateSetKeyTest.java | Tests state-set key equality and hash behavior. |
| reggie-runtime/src/test/java/com/datadoghq/reggie/runtime/LazyDFABytecodeGeneratorTest.java | Tests generated lazy-DFA matcher behavior and cache sharing. |
| reggie-codegen/src/test/java/com/datadoghq/reggie/codegen/analysis/PatternAnalyzerLazyDFATest.java | Tests analyzer routing into and away from LAZY_DFA. |
| reggie-benchmark/src/main/java/com/datadoghq/reggie/benchmark/LazyDFABenchmark.java | Adds JMH benchmarks for hit, miss, and frozen paths. |
| docs/superpowers/specs/2026-05-28-lazy-dfa-design.md | Documents the lazy DFA design, cache policy, and test plan. |
Follow-up:
| Benchmark | Before (ops/ms) | After (ops/ms) | Δ |
|---|---|---|---|
| hitPath (LAZY_DFA warm) | 987 ±78 | 2243 ±174 | +127% |
| jdkHitBaseline | 1335 ±17 | 1334 ±19 | flat |
LAZY_DFA warm path now beats JDK by +68%.
Commit: 0bb1b4d
🤖 Generated with Claude Code
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 5e2b8e2a1c
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
73c81d2 to 17d6e13
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
…iler Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
…bound interning map, add missing plan doc
…enPath benchmark input
…imization
After rebasing onto jb/logs-backend, patterns like (?:[a-z][0-9]){200} that
were previously routed to LAZY_DFA now route to DFA_TABLE (the new table-driven
DFA backend fits these in under the 1 MB budget). LAZY_DFA is now triggered only
for patterns where DFA construction hits the 10 000-state explosion limit.
Switch test/benchmark patterns to (?:a+b+|b+a+){75} which genuinely causes
DFA state explosion while keeping the NFA small enough to avoid method-too-large
in the NFA delegate methods. Also removes the early-out shortcut in PatternAnalyzer
(it incorrectly blocked DFA_TABLE routing for patterns that fit in the table).
c86528d to e9ece84
Rebased onto main (post-logs-backend merge)
Rebased 15 commits cleanly onto
Updated benchmark results (post-rebase,
| Benchmark | ops/ms | ±error | Notes |
|---|---|---|---|
| hitPath (LAZY_DFA warm) | 2212 | ±264 | +90% vs JDK |
| hardMissPath (LAZY_DFA, late-failing all-[ab] input) | 2148 | ±182 | +182% vs JDK |
| jdkHitBaseline | 1163 | ±151 | - |
| jdkHardMissBaseline | 762 | ±59 | - |
| frozenPath (NFA fallback after cache freeze) | 1979 | ±224 | full traversal |
hardMissPath vs jdkHardMissBaseline is the fair miss comparison: all-[ab] inputs that fail after 60–74 complete groups, forcing real NFA/DFA traversal on both engines. LAZY_DFA is about 2.8× faster than JDK on these hard misses (2148 vs 762 ops/ms) and about 1.9× faster on hits (2212 vs 1163 ops/ms), confirming LAZY_DFA earns its keep on non-trivial inputs.
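For anyone re-deriving the percentages quoted in this thread, the arithmetic is a plain throughput ratio. The helper below is a hypothetical sketch (the class and method names are not part of the PR); it only restates how a figure such as "+90% vs JDK" follows from the ops/ms columns, ignoring the error bars.

```java
/** Hypothetical helper for checking the throughput deltas quoted in the benchmark tables. */
final class SpeedupSketch {
    /** Ratio of candidate throughput to baseline throughput, e.g. 2.8 means 2.8x faster. */
    static double ratio(double candidateOpsPerMs, double baselineOpsPerMs) {
        return candidateOpsPerMs / baselineOpsPerMs;
    }

    /** Relative gain in whole percent; negative when the candidate is slower. */
    static long percentGain(double candidateOpsPerMs, double baselineOpsPerMs) {
        return Math.round((ratio(candidateOpsPerMs, baselineOpsPerMs) - 1.0) * 100.0);
    }
}
```

With the post-rebase numbers, `percentGain(2212, 1163)` gives the hit-path figure and `percentGain(2148, 762)` the hard-miss figure. Since the reported errors are around ±10%, the rounded percentages should be read as approximate.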
Note on deterministic patterns: (?:[a-z][0-9]){200} and similar deterministic patterns now route to DFA_TABLE (the table-driven DFA backend added in this merge) rather than LAZY_DFA. LAZY_DFA specifically targets patterns where DFA construction hits the 10 000-state explosion limit.
🤖 Generated with Claude Code
@codex review
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: db6a8974e0
…rage, spec doc
- LazyDFACache: add INT_ARRAY_VH for int[]; use setRelease/getAcquire on existing-table writes (item 3325667007)
- LazyDFABytecodeGenerator: replace IALOAD with VarHandle getAcquire in inlined hot loop (item 3325667007)
- LazyDFABenchmark FrozenState: change warm-up alphabet to "ab" so cache actually fills (item 3325673306)
- LazyDFABytecodeGeneratorTest: add match/matchBounded/findMatchFrom coverage (item 3325673394)
- ReggieMatcherBytecodeGeneratorTest: add LAZY_DFA processor end-to-end test (item 3325673350)
- docs: fix "c & 0x7F" → c < 128 guard description in lazy-dfa-design.md (item 3325673423)
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
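The two ideas in this commit, release/acquire access to an `int[]` through a `VarHandle` and a `c < 128` guard rather than a `c & 0x7F` mask, can be illustrated with a minimal sketch. Everything below is an assumption for illustration: `AsciiRowSketch`, `publish`, `lookup`, and the `UNKNOWN` sentinel are hypothetical names, not the actual `LazyDFACache` code; only `MethodHandles.arrayElementVarHandle`, `setRelease`, and `getAcquire` are real JDK APIs.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.util.Arrays;

/** Hypothetical single-row ASCII transition table with release/acquire element access. */
final class AsciiRowSketch {
    // Real JDK API: a VarHandle that addresses individual elements of an int[].
    private static final VarHandle INT_ARRAY_VH =
            MethodHandles.arrayElementVarHandle(int[].class);

    static final int UNKNOWN = -1; // assumed sentinel: transition not cached yet

    private final int[] row = new int[128];

    AsciiRowSketch() {
        Arrays.fill(row, UNKNOWN);
    }

    /** Writer side: publish a computed transition with release semantics. */
    void publish(char c, int nextState) {
        if (c < 128) { // guard: non-ASCII characters are never stored in the row
            INT_ARRAY_VH.setRelease(row, (int) c, nextState);
        }
    }

    /** Reader side: acquire-load the transition, or UNKNOWN for non-ASCII input. */
    int lookup(char c) {
        if (c >= 128) {
            return UNKNOWN; // caller is expected to take the slow path
        }
        return (int) INT_ARRAY_VH.getAcquire(row, (int) c);
    }

    /** What a mask would do instead of a guard: fold non-ASCII input onto ASCII slots. */
    static int maskedIndex(char c) {
        return c & 0x7F;
    }
}
```

The reason a mask is the wrong description is visible in `maskedIndex`: `'é'` (U+00E9) would land on the slot for `'i'` (0x69), so a table indexed that way would treat the two characters as the same transition. With the guard, non-ASCII input simply misses the table.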
@codex review
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 3e514183ce
@codex review
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: ecde02a95c
…p test; update spec
…l parity, package guard
@codex review
Codex Review: Didn't find any major issues. More of your lovely PRs please.
What does this PR do?
Adds a lazily-materialized DFA cache (LazyDFACache) over NFA execution for patterns with ≥300 NFA states, no anchors, and no capturing groups. On the warm path, matching cost drops from O(NFA-states) per character to a single int[128] array read.
Motivation
OPTIMIZED_NFA patterns recompute closure(stateSet, c) on every character of every matches() call. For large NFA patterns this is expensive. Recommendations R1 (lazy DFA state interning) and R2 (per-state ASCII transition table) from doc/plans/glob-perf-nfa-improvements.md address this. The glob_perf benchmark shows lazydfa is the consistent winner over plain NFA simulation once the cache is warm.
Related Issue(s)
Implements R1 + R2 from doc/plans/glob-perf-nfa-improvements.md.
Change Type
Checklist
./gradlew build)
Performance Impact
New LazyDFABenchmark with three variants measures the full performance envelope:
- hitPath: warm cache, all DFA transitions cached → single int[128] read per char
- missPath: cold cache, fresh diverse inputs → nfaStep + interning overhead
- frozenPath: cache at 4096-state cap, all new transitions fall back to NFA
Baseline comparison: same patterns via NFAFallbackBenchmark.
Additional Notes
Design decisions:
- NfaStep directly (public apply bridge → package-private nfaStep) instead of LambdaMetafactory INVOKEDYNAMIC, because hidden classes defined via defineHiddenClass cannot name themselves in lambda bootstrap descriptors.
- PatternAnalyzer short-circuits DFA construction entirely for LAZY_DFA-eligible patterns (saves SubsetConstructor cost).
- ConcurrentHashMap.computeIfAbsent for state interning; VarHandle.storeStoreFence() before publishing int[128] ASCII table references to prevent JIT reordering on weakly-ordered architectures (stale null reads safely fall back to lookupOrCompute).
🤖 Generated with Claude Code
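To make the hit, miss, and frozen paths described above concrete, here is a minimal single-threaded sketch of a lazy DFA cache. It is not the PR's `LazyDFACache`: the class name, the `Step` interface, the `UNKNOWN` sentinel, and the demo NFA are all hypothetical, `BitSet` stands in for `StateSetKey`, and the concurrency handling (`ConcurrentHashMap.computeIfAbsent`, fences) is deliberately left out. It assumes the cap is at least 1 so the start set can always be interned.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.BitSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustrative lazy DFA cache; all names here are hypothetical, not the PR's API. */
final class LazyDfaSketch {
    /** Stand-in for an NFA step: returns the successor set and must not mutate its input. */
    interface Step { BitSet apply(BitSet states, char c); }

    private static final int UNKNOWN = -1;      // transition not cached

    private final Step nfa;
    private final int acceptState;
    private final int maxStates;                // freeze cap, assumed >= 1
    private final Map<BitSet, Integer> ids = new HashMap<>();
    private final List<BitSet> sets = new ArrayList<>();
    private final List<int[]> ascii = new ArrayList<>();

    LazyDfaSketch(Step nfa, BitSet start, int acceptState, int maxStates) {
        this.nfa = nfa;
        this.acceptState = acceptState;
        this.maxStates = maxStates;
        intern(start);                          // DFA state 0 is the start set
    }

    /** Returns the DFA id for a state set, or UNKNOWN once the cache is frozen. */
    private int intern(BitSet set) {
        Integer id = ids.get(set);
        if (id != null) return id;
        if (sets.size() >= maxStates) return UNKNOWN;
        int[] row = new int[128];
        Arrays.fill(row, UNKNOWN);
        BitSet key = (BitSet) set.clone();      // defensive copy: BitSet is mutable
        sets.add(key);
        ascii.add(row);
        ids.put(key, sets.size() - 1);
        return sets.size() - 1;
    }

    int stateCount() { return sets.size(); }

    boolean matches(CharSequence in) {
        int cur = 0;
        BitSet live = null;                     // non-null after falling off the cache
        for (int i = 0; i < in.length(); i++) {
            char c = in.charAt(i);
            if (live != null) { live = nfa.apply(live, c); continue; }   // frozen path
            int next = c < 128 ? ascii.get(cur)[c] : UNKNOWN;            // hit path
            if (next == UNKNOWN) {                                       // miss path
                BitSet target = nfa.apply(sets.get(cur), c);
                next = intern(target);
                if (next == UNKNOWN) { live = target; continue; }
                if (c < 128) ascii.get(cur)[c] = next;
            }
            cur = next;
        }
        return (live != null ? live : sets.get(cur)).get(acceptState);
    }

    /** Demo NFA for "(a|b)*abb": states 0..3, state 3 accepting. */
    static LazyDfaSketch endsWithAbb(int maxStates) {
        Step step = (s, c) -> {
            BitSet out = new BitSet(4);
            if (s.get(0) && (c == 'a' || c == 'b')) out.set(0);
            if (s.get(0) && c == 'a') out.set(1);
            if (s.get(1) && c == 'b') out.set(2);
            if (s.get(2) && c == 'b') out.set(3);
            return out;
        };
        BitSet start = new BitSet(4);
        start.set(0);
        return new LazyDfaSketch(step, start, 3, maxStates);
    }
}
```

Calling `LazyDfaSketch.endsWithAbb(4096).matches("aabb")` warms the cache on first use, and repeating the call then follows only cached table entries. Passing a small cap such as 2 exercises the frozen path: the result is unchanged, but later characters are stepped through the NFA directly.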