Fix regex parser to reject quantifiers on anchors (e.g., ^*, $+, \b?) #120638
Co-authored-by: stephentoub <2642209+stephentoub@users.noreply.github.com>
@copilot, does this mean there are mainstream engines that permit this?
Yes, the Rust regex engine does permit quantified anchors. These test cases were ported from the Rust regex library test suite (as noted in the file header), and they show that Rust accepts patterns like (?m)^* and (?m)$+ as valid. However, most other mainstream engines (JavaScript, Python, Java, PCRE) reject these patterns as errors. My change aligns .NET with the majority of engines rather than with Rust's permissive behavior.
Thanks. In that case, we should leave the behavior as-is. Having the quantifier on the anchor does not hurt anything, is logically valid (it's just applying the semantics of the quantifier to the anchor, which, while not particularly useful, follows reasonable rules), and is supported by other engines. Since we already have tests exercising the behavior, we can just consider the issue closed.
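The point that a quantified anchor "follows reasonable rules" can be checked with a small model. This is a sketch under stated assumptions, not how .NET or Rust actually match: an anchor is modeled as a predicate on the current position, and because an anchor consumes no input, every repetition re-tests the same position. The names start_of_text, quantified_anchor_matches, and QUANTIFIERS are hypothetical.

```python
from typing import Callable

# Hypothetical model: an anchor is a predicate over (text, position).
Anchor = Callable[[str, int], bool]


def start_of_text(text: str, pos: int) -> bool:
    """Model of ^ without multiline mode: true only at position 0."""
    return pos == 0


def quantified_anchor_matches(anchor: Anchor, text: str, pos: int,
                              min_reps: int) -> bool:
    """Can anchor{min_reps,max} match at pos, for any max >= min_reps?

    An anchor consumes no input, so every repetition re-tests the same
    position. Zero repetitions always succeed; any positive number of
    repetitions succeeds exactly when the anchor itself holds at pos.
    """
    if min_reps == 0:
        return True
    return anchor(text, pos)


# Minimum repetition count per quantifier; the maximum never changes the result.
QUANTIFIERS = {"*": 0, "+": 1, "?": 0, "{10}": 10, "{0,3}": 0}

for q, lo in QUANTIFIERS.items():
    at_start = quantified_anchor_matches(start_of_text, "ab", 0, lo)
    mid_text = quantified_anchor_matches(start_of_text, "ab", 1, lo)
    print(f"^{q}: pos 0 -> {at_start}, pos 1 -> {mid_text}")
```

In this model ^* and ^? reduce to the empty pattern (they succeed everywhere), while ^+ and ^{10} reduce to a bare ^.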
Fixes #18880
Summary
The .NET regex parser incorrectly accepts patterns in which anchors are quantified, such as ^*, $+, \b?, etc. These patterns are meaningless since anchors are zero-width assertions and quantifying them has no practical effect. Most major regex engines (JavaScript, Python, Java, PCRE) correctly reject these patterns with errors like "The preceding token is not quantifiable."

Note: The Rust regex engine does permit quantified anchors and treats them as valid patterns with specific matching behavior. This change aligns .NET with the majority of regex engines rather than Rust's permissive approach.
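The Python part of the claim above can be spot-checked with the standard re module. This sketch exercises only CPython's engine (it says nothing about .NET, JavaScript, Java, PCRE, or Rust), and CPython phrases the error as "nothing to repeat" rather than the wording quoted above. The helper name compiles is illustrative.

```python
import re

# Quantified anchors from the summary, plus bare anchors for comparison.
PATTERNS = [r"^*", r"$+", r"\b?", r"\A{10}", r"(?m)^*", r"^", r"\b"]


def compiles(pattern: str) -> bool:
    """Return True if Python's re module accepts the pattern."""
    try:
        re.compile(pattern)
    except re.error:
        return False
    return True


for p in PATTERNS:
    print(p, "accepted" if compiles(p) else "rejected")
```

Each quantified anchor is rejected at compile time, while the bare anchors compile normally.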
Changes
Core Implementation
RegexNode.cs
- IsAnchor property to identify the anchor node kinds that cannot be quantified: ^, $, \A, \Z, \z, \b, \B, \G, and the internal UpdateBumpalong
RegexParser.cs
- MakeQuantifier now checks whether the node is an anchor
- A QuantifierAfterNothing error with an appropriate message is reported when a quantifier is applied to an anchor
- Applies to the quantifiers *, +, ?, {n}, and {n,m}
Test Updates
New Tests (RegexParserTests.cs)
- Tests applying the quantifiers (*, +, ?) and range quantifiers ({n}, {n,m}) to anchors
Updated Existing Tests
- RegexParserTests.netcoreapp.cs: now expects errors for \b*, \b+, \b?, \A{10}
- RegexRustTests.cs: patterns like (?m)^*, (?m)$+ that were valid in Rust but are now invalid in .NET
- Regex.Match.Tests.cs: lookbehind patterns with quantified anchors
- Regex.KnownPattern.Tests.cs: filters out real-world patterns containing quantified anchors
Example
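A minimal sketch of the check described for RegexNode.cs and RegexParser.cs, written in Python rather than C# and covering only a tiny pattern subset (single characters, escapes, anchors, quantifiers; no groups, classes, alternation, or nested-quantifier detection). Every name here (ANCHOR_TOKENS, Node.is_anchor, parse, QuantifierAfterNothingError) is hypothetical and only mirrors the roles of IsAnchor, MakeQuantifier, and the QuantifierAfterNothing error named in the description; it is not the .NET implementation.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Anchor tokens listed in the description (UpdateBumpalong is internal and
# has no pattern syntax, so it is not modeled here).
ANCHOR_TOKENS = {"^", "$", r"\A", r"\Z", r"\z", r"\b", r"\B", r"\G"}

# *, +, ?, {n}, {n,}, {n,m}; a lazy "?" suffix is consumed separately below.
QUANTIFIER = re.compile(r"[*+?]|\{\d+(?:,\d*)?\}")


class QuantifierAfterNothingError(ValueError):
    """Hypothetical stand-in for the parser's QuantifierAfterNothing error."""

    def __init__(self, pattern: str, offset: int) -> None:
        super().__init__(f"{pattern!r}: quantifier at offset {offset} "
                         "follows nothing or an anchor")
        self.offset = offset


@dataclass
class Node:
    text: str
    is_anchor: bool  # plays the role of RegexNode.IsAnchor
    quantifier: Optional[str] = None


def parse(pattern: str) -> List[Node]:
    """Parse the tiny subset, rejecting quantifiers on nothing or anchors."""
    nodes: List[Node] = []
    i = 0
    while i < len(pattern):
        q = QUANTIFIER.match(pattern, i)
        if q:
            # MakeQuantifier-style check: refuse to quantify nothing or an anchor.
            if not nodes or nodes[-1].is_anchor:
                raise QuantifierAfterNothingError(pattern, i)
            end = q.end()
            if pattern.startswith("?", end):  # lazy form such as a+? or a{2}?
                end += 1
            nodes[-1].quantifier = pattern[i:end]
            i = end
            continue
        # An escape is two characters (\d, \b, ...); anything else is one.
        token = pattern[i:i + 2] if pattern[i] == "\\" else pattern[i]
        nodes.append(Node(token, token in ANCHOR_TOKENS))
        i += len(token)
    return nodes


for p in ["a*", r"\d+?", "^a{2,3}$", "^*", "$+", r"\b?", r"\A{10}"]:
    try:
        parse(p)
        print(f"{p}: accepted")
    except QuantifierAfterNothingError as exc:
        print(f"{p}: rejected at offset {exc.offset}")
```

Keeping the anchor test on the node (is_anchor) rather than in the quantifier-parsing code means every quantifier form goes through one check, which matches how the description splits the work between IsAnchor and MakeQuantifier.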
Breaking Change
This is a breaking change for code that relies on the current behavior of accepting quantified anchors. However:
Fixes #29174