Closed
Description
Summary
The `Longest()` method is currently a no-op stub. For full compatibility with the standard library's `regexp` package, calling it should switch matching semantics from leftmost-first to leftmost-longest.
Current behavior
```go
re := coregex.MustCompile(`(#|#!)`)
re.Longest()
result := re.ReplaceAllString("#!a", "")
// Returns: "!a" (leftmost-first, matches "#")
// Expected: "a" (leftmost-longest, matches "#!")
```
Expected behavior (stdlib compatible)
```go
// Default: leftmost-first (Perl semantics)
re := regexp.MustCompile(`(#|#!)`)
re.ReplaceAllString("#!a", "") // "!a"

// After Longest(): leftmost-longest (POSIX semantics)
re.Longest()
re.ReplaceAllString("#!a", "") // "a"
```
Research findings
| Engine | Default | Leftmost-longest support |
|---|---|---|
| Go stdlib | leftmost-first | ✅ Yes (`Longest()`) |
| Rust regex | leftmost-first | ❌ No |
| RE2 | leftmost-first | ✅ Yes (`longest_match` option) |
| coregex | leftmost-first | ❌ No (`Longest()` is a stub) |
Note: Rust's `regex` crate does not offer leftmost-longest matching, and RE2 offers it only as an opt-in option. For true stdlib drop-in compatibility, coregex should support it through `Longest()`.
Implementation plan
- Add a `longest bool` flag to the `Regex` struct
- Modify `Longest()` to set the flag
- Update the PikeVM search to continue looking for longer matches when the flag is set
- Propagate the flag through meta engine coordination
- Benchmark to ensure no performance regression in default mode
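A minimal sketch of how such a flag could change a Pike VM's match loop, assuming a toy VM rather than coregex's actual one: the instruction set (`inst`, `opSplit`, ...), the hand-compiled program `hashProg` for `(#|#!)`, and the helpers `addThread`, `matchAt` and `replaceAll` are all hypothetical. Threads are kept in priority order. In leftmost-first mode, a thread reaching `opMatch` cuts off every lower-priority thread; with `longest` set, the VM keeps stepping them and records the last (and therefore longest) match end. For brevity it tracks no captures and restarts the VM at each start position instead of running unanchored.

```go
package main

import (
	"fmt"
	"strings"
)

// Hypothetical toy instruction set: byte input, no capture groups.
type opcode int

const (
	opChar  opcode = iota // consume one byte equal to c
	opSplit               // fork: x is the preferred branch, y the fallback
	opJmp                 // continue at x
	opMatch               // accepting state
)

type inst struct {
	op   opcode
	c    byte
	x, y int
}

// hashProg is (#|#!) compiled by hand; the "#" branch has priority.
var hashProg = []inst{
	{op: opSplit, x: 1, y: 3},
	{op: opChar, c: '#'}, // 1: first alternative
	{op: opJmp, x: 5},
	{op: opChar, c: '#'}, // 3: second alternative
	{op: opChar, c: '!'},
	{op: opMatch},
}

// addThread follows jumps and splits, appending runnable pcs in priority order.
func addThread(prog []inst, list *[]int, seen []bool, pc int) {
	if seen[pc] {
		return
	}
	seen[pc] = true
	switch prog[pc].op {
	case opJmp:
		addThread(prog, list, seen, prog[pc].x)
	case opSplit:
		addThread(prog, list, seen, prog[pc].x)
		addThread(prog, list, seen, prog[pc].y)
	default:
		*list = append(*list, pc)
	}
}

// matchAt runs the VM anchored at start and returns the match end, or -1.
func matchAt(prog []inst, input string, start int, longest bool) int {
	end := -1
	var clist []int
	addThread(prog, &clist, make([]bool, len(prog)), 0)
	for pos := start; len(clist) > 0; pos++ {
		var nlist []int
		seen := make([]bool, len(prog))
		for _, pc := range clist {
			if prog[pc].op == opMatch {
				end = pos
				if !longest {
					break // leftmost-first: drop all lower-priority threads
				}
				continue // leftmost-longest: keep stepping the others
			}
			if pos < len(input) && input[pos] == prog[pc].c {
				addThread(prog, &nlist, seen, pc+1)
			}
		}
		clist = nlist
	}
	return end
}

// replaceAll deletes each leftmost match, like ReplaceAllString(s, "").
// It assumes non-empty matches, which holds for hashProg.
func replaceAll(prog []inst, input string, longest bool) string {
	var out strings.Builder
	for i := 0; i < len(input); {
		if end := matchAt(prog, input, i, longest); end > i {
			i = end
			continue
		}
		out.WriteByte(input[i])
		i++
	}
	return out.String()
}

func main() {
	fmt.Println(replaceAll(hashProg, "#!a", false)) // leftmost-first
	fmt.Println(replaceAll(hashProg, "#!a", true))  // leftmost-longest
}
```

Running it prints `!a` and then `a`, reproducing the two results from the examples above. The only cost in default mode is the `!longest` test on threads that reach `opMatch`, which is consistent with the near-zero overhead the plan expects.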
Performance considerations
- Default mode: expected ~0% overhead (a single bool check)
- Longest mode: expected 10-50% overhead (the search can no longer stop at the first match and must keep exploring the remaining alternatives)
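One way to get a reference point for these estimates is to measure the standard library itself, whose `*regexp.Regexp` exposes the same `MustCompile`, `Longest()` and `ReplaceAllString` calls the issue uses for coregex. The sketch below is a hypothetical harness (`benchReplace` is an illustrative name) built on `testing.Benchmark`, which is designed for running benchmarks outside `go test`; pointing it at `coregex.MustCompile` instead would give the comparison the plan asks for. Absolute numbers depend on the machine, so none are shown.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
	"testing"
)

// benchReplace measures ReplaceAllString on input, optionally after Longest().
func benchReplace(pattern, input string, longest bool) testing.BenchmarkResult {
	re := regexp.MustCompile(pattern)
	if longest {
		re.Longest()
	}
	return testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			re.ReplaceAllString(input, "")
		}
	})
}

func main() {
	input := strings.Repeat("#!a ", 1000)
	first := benchReplace(`(#|#!)`, input, false)
	longest := benchReplace(`(#|#!)`, input, true)
	fmt.Printf("leftmost-first:   %d ns/op\n", first.NsPerOp())
	fmt.Printf("leftmost-longest: %d ns/op\n", longest.NsPerOp())
}
```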
Acceptance criteria
- `Longest()` actually switches to leftmost-longest semantics
- Default mode performance unchanged (verified by benchmarks)
- All stdlib `Longest()` behaviors matched
- Documentation updated
Related
- Discovered via GoAWK integration: "DRAFT: test the coregex library" (benhoyt/goawk#264)
- AWK uses POSIX (leftmost-longest) semantics by default