Implement Longest() method for stdlib compatibility #26

@kolkov

Summary

The Longest() method is currently a no-op stub. For full stdlib regexp compatibility, it should switch matching semantics from leftmost-first to leftmost-longest.

Current behavior

re := coregex.MustCompile(`(#|#!)`)
re.Longest()
result := re.ReplaceAllString("#!a", "")
// Returns: "!a" (leftmost-first, matches "#")
// Expected: "a" (leftmost-longest, matches "#!")

Expected behavior (stdlib compatible)

// Default: leftmost-first (Perl semantics)
re := regexp.MustCompile(`(#|#!)`)
re.ReplaceAllString("#!a", "") // "!a"

// After Longest(): leftmost-longest (POSIX semantics)
re.Longest()
re.ReplaceAllString("#!a", "") // "a"
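The two modes only diverge when a higher-priority alternative is a prefix of a longer one. A minimal sketch that checks this against the standard library; only `regexp` is real API here, the helper name `firstMatch` and the extra patterns are made up for illustration:

```go
package main

import (
	"fmt"
	"regexp"
)

// firstMatch is a hypothetical helper: it compiles pattern with the standard
// library, optionally switches to leftmost-longest, and returns the first match.
func firstMatch(pattern, input string, longest bool) string {
	re := regexp.MustCompile(pattern)
	if longest {
		re.Longest()
	}
	return re.FindString(input)
}

func main() {
	cases := []struct{ pattern, input string }{
		{`(#|#!)`, "#!a"},     // modes disagree: "#" is a prefix of "#!"
		{`a|ab|abc`, "xabcd"}, // modes disagree: "a" is listed first
		{`#!|#`, "#!a"},       // modes agree: the longer alternative is listed first
	}
	for _, c := range cases {
		fmt.Printf("%q on %q: first=%q longest=%q\n", c.pattern, c.input,
			firstMatch(c.pattern, c.input, false),
			firstMatch(c.pattern, c.input, true))
	}
}
```

Cases like the third one are useful as regression guards, since an implementation should return the same result in both modes there.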

Research findings

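| Engine     | Default semantics | Leftmost-longest support          |
|------------|-------------------|-----------------------------------|
| Go stdlib  | leftmost-first    | ✅ Yes (Longest())                |
| Rust regex | leftmost-first    | ❌ No                             |
| RE2        | leftmost-first    | ✅ Yes (longest_match option)     |
| coregex    | leftmost-first    | ❌ No (stub)                      |

Note: The Rust regex crate does not offer leftmost-longest matching, and RE2 exposes it only as a compile-time option rather than a method on the compiled regex. For true stdlib drop-in compatibility, coregex should support it through Longest().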

Implementation plan

  1. Add longest bool flag to Regex struct
  2. Modify Longest() to set the flag
  3. Update PikeVM search to continue looking for longer matches when flag is set
  4. Propagate flag through meta engine coordination
  5. Benchmark to ensure no performance regression in default mode
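Steps 1 to 3 can be illustrated on a deliberately tiny model. The sketch below is not the coregex PikeVM: `toyRegex` is a hypothetical type that only handles an alternation of literal strings, and it exists purely to show where the flag changes the search loop.

```go
package main

import (
	"fmt"
	"strings"
)

// toyRegex is a hypothetical stand-in for the real Regex struct. It models
// only an alternation of literals, listed in priority order.
type toyRegex struct {
	alts    []string
	longest bool // step 1: the new flag
}

// Longest switches the toy matcher to leftmost-longest semantics (step 2).
func (r *toyRegex) Longest() { r.longest = true }

// find returns the start and end offsets of the leftmost match, or (-1, -1).
func (r *toyRegex) find(s string) (int, int) {
	for start := 0; start <= len(s); start++ {
		end := -1
		for _, alt := range r.alts {
			if !strings.HasPrefix(s[start:], alt) {
				continue
			}
			if !r.longest {
				// Leftmost-first: the highest-priority alternative wins.
				return start, start + len(alt)
			}
			// Leftmost-longest: keep looking for a longer match (step 3).
			if e := start + len(alt); e > end {
				end = e
			}
		}
		if end >= 0 {
			return start, end
		}
	}
	return -1, -1
}

func main() {
	re := &toyRegex{alts: []string{"#", "#!"}}
	fmt.Println(re.find("a#!b")) // leftmost-first
	re.Longest()
	fmt.Println(re.find("a#!b")) // leftmost-longest
}
```

In default mode the only extra work is the `!r.longest` branch, which is the property step 5 is meant to confirm on the real engine.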

Performance considerations

  • Default mode: expected ~0% overhead (a single bool check)
  • Longest mode: expected 10-50% overhead (the search cannot stop at the first match and must keep exploring alternatives for a longer one)
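These estimates would need to be confirmed by measurement. A rough sketch of how the two modes could be compared, using the standard library's `regexp` as a stand-in for coregex; the helper name `benchFind`, the pattern, and the input are arbitrary choices for illustration:

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
	"testing"
)

// benchFind is a hypothetical helper that times FindAllStringIndex over input,
// with or without Longest(). It uses stdlib regexp as a stand-in engine.
func benchFind(longest bool, input string) testing.BenchmarkResult {
	re := regexp.MustCompile(`(#|#!|#!!)`)
	if longest {
		re.Longest()
	}
	return testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			re.FindAllStringIndex(input, -1)
		}
	})
}

func main() {
	input := strings.Repeat("abc #!! def # ghi #! ", 1000)
	fmt.Println("default:", benchFind(false, input))
	fmt.Println("longest:", benchFind(true, input))
}
```

Running the default-mode benchmark before and after the change is what would back the "performance unchanged" claim; the longest-mode number only indicates the cost of opting in.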

Acceptance criteria

  • Longest() actually switches to leftmost-longest semantics
  • Default mode performance unchanged (verified by benchmarks)
  • All stdlib Longest() behaviors matched
  • Documentation updated
