An advanced regular expression engine for Rust, inspired by
mrab-regex (the Python regex
module).
eregex aims to bring the richer feature set of mrab-regex to Rust:
- Named groups, duplicate group names, repeated captures
- Greedy / lazy / possessive quantifiers
- Atomic groups `(?>...)`
- Variable-length lookbehind
- Nested character-class set operations `[a-z&&[^aeiou]]` (planned)
- Inline, scoped flags `(?flags-flags:...)`
- Backreferences `\1`, `\g<name>`, `(?P=name)`
- Unicode properties `\p{L}`, `\P{^N}` (subset)
- Fuzzy / approximate matching `(?:foo){e<=1}` (planned)
- Recursive patterns `(?R)`, `(?(DEFINE)...)` (planned)
- Partial matches, POSIX matching, reverse search (planned)
This crate currently implements a strong foundation. See Feature status below for what is ready today and what is on the roadmap.
use eregex::{Regex, flags};
let re = Regex::new(r"(\w+)\s+(\w+)").unwrap();
let m = re.find("hello world").unwrap();
assert_eq!(m.group(1), Some("hello"));
assert_eq!(m.group(2), Some("world"));
let re = Regex::new_with_flags(r"(?i)hello", flags::IGNORECASE).unwrap();
assert!(re.is_match("HELLO, World"));
// Repeated captures (signature mrab-regex feature)
let re = Regex::new(r"(\w)+").unwrap();
let m = re.find("abc").unwrap();
assert_eq!(m.captures(1), vec![Some("a"), Some("b"), Some("c")]);
// Partial matching: is the input a prefix of some full match?
let re = Regex::new(r"token=([a-z]+)([0-9]+)").unwrap();
// "token=abc" is incomplete — more input could turn it into a full match.
let p = re.find_partial("xxx token=abc").unwrap();
assert!(p.is_partial());
assert_eq!(p.matched, "token=abc");
// Group 1 fully matched, group 2 is still empty/partial.
assert_eq!(p.group(1), Some("abc"));
assert_eq!(p.group(2), Some(""));
// A wrong character rules out any continuation -> no match at all.
assert!(re.find_partial("xxx token=abc!").is_none());

Ready today:
- Literals, `.`, anchors `^ $ \A \z \b \B`
- Predefined classes `\d \D \w \W \s \S` (ASCII + Unicode via `std`)
- Character classes `[...]` with ranges, negation, escapes
- Alternation `a|b|c`
- Quantifiers `* + ? {m} {m,} {m,n}` with greedy, `?`-lazy, and `+`-possessive variants
- Capturing / non-capturing / named groups `(...) (?:...) (?P<n>...) (?<n>...)`
- Atomic groups `(?>...)`
- Backreferences `\1 \g<n> \g<name> (?P=name)`
- Lookahead / lookbehind `(?=...) (?!...) (?<=...) (?<!...)` (variable length)
- Partial (end-anchored) matching via `find_partial`
- Inline scoped flags `(?i) (?i:...) (?i-m:...)`
- Inline comments `(?#...)` and free-spacing (VERBOSE)
- Named & Unicode properties `\p{...}` (a curated subset)
- Repeated captures (`captures`, `captures_iter`)

API surface:
- `is_match`, `find`, `find_at`, `find_iter`, `find_partial`, `captures`, `captures_iter`
- `replace`, `replace_all` with `$1` / `${name}` / `$$` templates
- `split`, `split_iter`
- `escape`
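
To make the `$1` / `${name}` / `$$` replacement templates concrete, here is a minimal, std-only sketch of how such a template could be expanded. It does not use eregex: `expand_template` is a hypothetical helper written for illustration, and the rules it applies (greedy digit runs after `$`, unmatched or unknown groups expanding to the empty string, a stray `$` kept literally) are assumptions that may differ from what `replace` and `replace_all` actually do.

```rust
use std::collections::HashMap;

/// Hypothetical helper, not part of eregex: expands `$N`, `${name}` and `$$`.
/// `groups[N]` is the text of group N (`None` if it did not participate);
/// `named` maps group names to their captured text.
fn expand_template(
    template: &str,
    groups: &[Option<&str>],
    named: &HashMap<&str, &str>,
) -> String {
    let mut out = String::new();
    let mut rest = template;
    while let Some(pos) = rest.find('$') {
        // Copy the literal text before the `$`.
        out.push_str(&rest[..pos]);
        let after = &rest[pos + 1..];
        if let Some(tail) = after.strip_prefix('$') {
            // `$$` is an escaped dollar sign.
            out.push('$');
            rest = tail;
        } else if let Some(body) = after.strip_prefix('{') {
            match body.find('}') {
                Some(end) => {
                    // `${...}` holds either a group number or a group name.
                    let key = &body[..end];
                    let value = match key.parse::<usize>() {
                        Ok(n) => groups.get(n).copied().flatten(),
                        Err(_) => named.get(key).copied(),
                    };
                    // Assumption: missing groups expand to nothing.
                    out.push_str(value.unwrap_or(""));
                    rest = &body[end + 1..];
                }
                None => {
                    // Unterminated `${`: keep the `$` literally.
                    out.push('$');
                    rest = after;
                }
            }
        } else {
            // `$N`: take the longest run of ASCII digits (assumption).
            let digits = after.bytes().take_while(|b| b.is_ascii_digit()).count();
            if digits == 0 {
                out.push('$');
            } else {
                let n: usize = after[..digits].parse().unwrap_or(usize::MAX);
                out.push_str(groups.get(n).copied().flatten().unwrap_or(""));
            }
            rest = &after[digits..];
        }
    }
    out.push_str(rest);
    out
}
```

With groups `["2024-05", "2024", "05"]` and a named group `year = "2024"`, the template `${year}/$2 costs $$5` expands to `2024/05 costs $5` under these rules.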

On the roadmap:
- Fuzzy / approximate matching `{e<=2}`
- Recursive patterns & subexpression calls `(?R) (?1) (?&name) (?(DEFINE)...)`
- Branch reset `(?|...|...)`
- Nested set operations `[a&&b] [a--b] [a||b] [a~~b]`
- Full Unicode case-folding (ß ↔ ss); currently simple case-folding
- `\K`, `(*PRUNE)`, `(*SKIP)`, `(*FAIL)`, `\G` semantics
- POSIX (leftmost-longest) and reverse (`(?r)`) matching modes
- Concurrent/GIL-free operation, timeouts
- `\L<name>` named lists

- `Regex`: a compiled pattern. Compile once with `Regex::new` (or `Regex::new_with_flags`), then search many inputs.
- `Match`: a successful full match, with group lookup by index or name and full repeated-capture history.
- `PartialMatch`: the result of `Regex::find_partial`, carrying a `MatchStatus` of `Full` or `Partial` and per-group `GroupMatch` state.
- `Flags` and the `flags` module: compile-time flags (`IGNORECASE`, `MULTILINE`, `DOTALL`, …) and their inline `(?im)` syntax.

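
The `Full` / `Partial` distinction can be illustrated without the crate. The sketch below models end-anchored partial matching for a plain literal only: the result is `Full` if the literal occurs in the input, and `Partial` if the input ends with a proper prefix of the literal, so more input could still complete it. The `MatchStatus` enum here is a local stand-in and `find_partial_literal` is a hypothetical function; neither is the eregex implementation, which works on full patterns and tracks per-group state.

```rust
/// Local stand-in for illustration; not the eregex type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum MatchStatus {
    Full,
    Partial,
}

/// Hypothetical literal-only model of partial matching.
/// Returns the status and the matched slice of `haystack`, or `None`
/// when no continuation of the input could produce a match at its end.
fn find_partial_literal<'h>(
    haystack: &'h str,
    needle: &str,
) -> Option<(MatchStatus, &'h str)> {
    // A complete occurrence anywhere in the input is a full match.
    if let Some(start) = haystack.find(needle) {
        return Some((MatchStatus::Full, &haystack[start..start + needle.len()]));
    }
    // Otherwise look for the earliest-starting non-empty tail of the input
    // that is a proper prefix of the literal (the "end-anchored" condition).
    for (start, _) in haystack.char_indices() {
        let tail = &haystack[start..];
        if tail.len() < needle.len() && needle.starts_with(tail) {
            return Some((MatchStatus::Partial, tail));
        }
    }
    None
}
```

For example, searching `"xxx token=a"` for the literal `token=abc` yields a partial match on `token=a`, while `"xxx tokex"` yields no match because no tail of the input is a prefix of the literal.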
All fallible operations return Result<T, Error>.
Error carries
an ErrorKind
(syntax error, bad escape, bad quantifier, unknown group, …) plus the byte
offset in the pattern where the problem was detected, when known.
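
The shape described above (an error kind plus an optional byte offset) can be sketched with stand-in types. Everything in this block is hypothetical: the field names, the variant names, the message format, and the toy `check_parens` validator are assumptions for illustration, not the definitions in eregex. The validator ignores escapes and character classes, and it reports the offset of the unclosed opening parenthesis, which may not match the position the real crate reports.

```rust
use std::fmt;

/// Stand-in for illustration; the real `ErrorKind` may differ.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq, Eq)]
enum ErrorKind {
    Syntax,
    BadEscape,
    BadQuantifier,
    UnknownGroup,
}

/// Stand-in for illustration; the real `Error` may differ.
#[derive(Debug, Clone, PartialEq, Eq)]
struct Error {
    kind: ErrorKind,
    /// Byte offset in the pattern, when known.
    position: Option<usize>,
    message: String,
}

impl fmt::Display for Error {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self.position {
            Some(pos) => write!(f, "error at position {}: {}", pos, self.message),
            None => write!(f, "error: {}", self.message),
        }
    }
}

impl std::error::Error for Error {}

/// Toy validator standing in for pattern compilation: it only checks
/// that parentheses balance and does not understand escapes or classes.
fn check_parens(pattern: &str) -> Result<(), Error> {
    let mut open: Vec<usize> = Vec::new();
    for (i, b) in pattern.bytes().enumerate() {
        match b {
            b'(' => open.push(i),
            b')' => {
                if open.pop().is_none() {
                    return Err(Error {
                        kind: ErrorKind::Syntax,
                        position: Some(i),
                        message: "unmatched closing parenthesis".to_string(),
                    });
                }
            }
            _ => {}
        }
    }
    match open.pop() {
        // Report the innermost group that was never closed.
        Some(i) => Err(Error {
            kind: ErrorKind::Syntax,
            position: Some(i),
            message: "unclosed group".to_string(),
        }),
        None => Ok(()),
    }
}
```

Callers can branch on `kind` for programmatic handling and fall back to the `Display` output for user-facing messages; keeping the offset optional lets errors that are not tied to a location omit it.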
use eregex::Regex;
let err = Regex::new(r"(").unwrap_err();
println!("{}", err); // e.g. "eregex error at position 1: unclosed group"

The `examples/` directory contains runnable programs:
- `demo.rs`: a tour of the core API.
- `gap_match.rs`: gap-tolerant ("fuzzy") matching built on `find_at` + `find_partial`, for inputs where the target is split by noise (a workaround while in-pattern fuzzy matching is on the roadmap).

Run them with cargo run --example demo / cargo run --example gap_match.
The Rust core is wrapped by three companion crates, each under crates/:
| package | technology | install |
|---|---|---|
| `@a5i/eregex` (npm) | napi-rs (native addon) | `npm i @a5i/eregex` |
| `@a5i/eregex-wasm` (npm) | wasm-bindgen / wasm-pack | `npm i @a5i/eregex-wasm` |
| `eregex` (PyPI) | pyo3 / maturin | `pip install eregex` |

The Node and WASM packages expose the same JavaScript API (Regex,
Match, PartialMatch, the flag constants, parseFlags, …) and the same
null-on-absent semantics, so they are interchangeable: pick the native
build for raw speed, or the WASM build for a single portable binary that can
also be rebuilt for bundlers / browsers (wasm-pack build --target web).
A shared pre-commit hook (.githooks/pre-commit) runs the full test matrix
before each commit: cargo fmt --all --check, then cargo test --workspace
(core), plus the Python, Node, and WASM binding smoke suites.
The hook is gated on what you stage, so it stays fast:
- a change to the shared core (`src/`, `examples/`, `tests/`, root `*.rs`, `Cargo.toml`, `Cargo.lock`) fans out to all four suites, since every binding wraps the core;
- a change to a single binding crate (`crates/eregex-{node,python,wasm}/`) runs only that binding's suite (plus `cargo fmt` on any `.rs`);
- docs-only changes skip everything.

Each binding suite is skipped gracefully if its toolchain isn't installed
(Python venv at crates/eregex-python/.venv, wasm-pack, node/npm), so a
contributor who only works on one layer isn't blocked.
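
The gating rules above can be modelled as a small path classifier. This is a sketch of the decision logic only, written in Rust for consistency with the rest of this README; the actual hook in `.githooks/pre-commit` may be implemented differently. The `Suite` enum and the `suites_for` and `is_core_path` functions are hypothetical names, and the sketch assumes that any path matching neither rule (for example a docs file) triggers nothing. It does not model the `cargo fmt` step or the toolchain-availability checks.

```rust
use std::collections::BTreeSet;

/// Hypothetical names for the four suites mentioned above.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Suite {
    Core,
    Python,
    Node,
    Wasm,
}

/// True for paths that belong to the shared core.
fn is_core_path(path: &str) -> bool {
    let in_core_dir = ["src/", "examples/", "tests/"]
        .iter()
        .any(|dir| path.starts_with(*dir));
    let at_root = !path.contains('/');
    let root_core_file =
        at_root && (path.ends_with(".rs") || path == "Cargo.toml" || path == "Cargo.lock");
    in_core_dir || root_core_file
}

/// Maps a list of staged paths to the set of suites that should run.
fn suites_for(staged: &[&str]) -> BTreeSet<Suite> {
    let mut suites = BTreeSet::new();
    for path in staged {
        if let Some(rest) = path.strip_prefix("crates/eregex-") {
            // A binding crate only triggers its own suite.
            match rest.split('/').next().unwrap_or("") {
                "node" => {
                    suites.insert(Suite::Node);
                }
                "python" => {
                    suites.insert(Suite::Python);
                }
                "wasm" => {
                    suites.insert(Suite::Wasm);
                }
                _ => {}
            }
        } else if is_core_path(path) {
            // Every binding wraps the core, so a core change fans out.
            suites.extend([Suite::Core, Suite::Python, Suite::Node, Suite::Wasm]);
        }
        // Anything else (assumed to be docs) triggers nothing.
    }
    suites
}
```

Binding prefixes are checked before the core rule so that a file such as `crates/eregex-python/Cargo.toml` counts as a binding change rather than a root manifest change.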
Enable it once per clone:
git config core.hooksPath .githooks

Bypass it for a single commit with `git commit --no-verify`.
Setting up the optional binding toolchains (so the hook runs every suite):
# Node (already installed if you `npm ci` in crates/eregex-node)
cd crates/eregex-node && npm ci
# WASM (cargo binary)
cargo install wasm-pack
# Python (local venv used by the hook)
python -m venv crates/eregex-python/.venv
crates/eregex-python/.venv/Scripts/python -m pip install maturin # Windows
crates/eregex-python/.venv/bin/python -m pip install maturin # Unix

- MSRV: 1.85 (uses the 2024 edition).
- License: Apache-2.0.
- `#![forbid(unsafe_code)]` is enforced crate-wide.

Apache-2.0, matching the upstream mrab-regex project.