Skip to content

fix(regex): keep escaped hyphen \- literal inside a character class (#4425)#4429

Merged
proggeramlug merged 1 commit into
mainfrom
worktree-fix-regex-hyphen-4425
Jun 4, 2026
Merged

fix(regex): keep escaped hyphen \- literal inside a character class (#4425)#4429
proggeramlug merged 1 commit into
mainfrom
worktree-fix-regex-hyphen-4425

Conversation

@proggeramlug
Copy link
Copy Markdown
Contributor

Summary

Fixes #4425. An escaped hyphen \- inside a JS character class is always a literal hyphen, but Perry's regex translator emitted a bare - for it via push_escaped_literal. The Rust regex crate then read a - flanked by members as a range operator, so [a\- ] translated to the invalid range [a- ] (a 0x61 → space 0x20) and new RegExp threw invalid pattern.

[\-] (sole member) worked because there was nothing to form a range with; [a-z] worked because it's a valid range. Only \- flanked by other members hit the bug.

Fix

In js_regex_to_rust (crates/perry-runtime/src/regex/grammar.rs), when inside a character class, preserve the escape for \- so it stays a literal hyphen regardless of position. Outside a class a hyphen carries no range meaning, so it's still emitted bare.

Verification

The issue's repro now reports OK for every pattern, including marked's GFM table-delimiter regex:

OK   /[\-]/
OK   /[a\- ]/
OK   /[a-z]/
OK   /[:]/
OK   /[ ]/
OK   /[a-z ]/
OK   /[:\- ]/
OK   / {0,3}\|?(?:[:\- ]*\|)+[\:\- ]*\n/

Behavioral check confirms literal-hyphen semantics — /[a\- ]/ matches -, a, and space, but not z.

Added a runtime regression test (escaped_hyphen_in_class_stays_literal) covering both the translation output and that the previously-crashing patterns construct. All 9 regex:: tests pass; cargo fmt --check clean.

This unblocks a native build of marked (#4362 fixed codegen+link; this was the last blocker — the binary now boots instead of throwing on module-init regex).

…4425)

An escaped hyphen `\-` inside a JS character class is always a literal
hyphen, but `push_escaped_literal` emitted a bare `-`. The Rust `regex`
crate then read a `-` flanked by members as a range operator, so
`[a\- ]` translated to the invalid range `[a- ]` and construction failed
with "invalid pattern". When inside a class, preserve the escape so the
hyphen stays a literal regardless of position.

This unblocks a native build of `marked`, whose GFM table-delimiter
regex ` {0,3}\|?(?:[:\- ]*\|)+[\:\- ]*\n` was built at module-init and
crashed the binary on boot.
@proggeramlug proggeramlug merged commit 3e098f4 into main Jun 4, 2026
13 checks passed
@proggeramlug proggeramlug deleted the worktree-fix-regex-hyphen-4425 branch June 4, 2026 18:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

regex: escaped hyphen \- as a literal in a character class is rejected ("invalid pattern") when flanked by other members

1 participant