Summary
Patterns that mix deeply nested capturing groups, literal digit characters immediately following a backref, and multiple independent groups fail to extract the correct group values.
Failing PCRE Test
- Pattern: (cat(a(ract|tonic)|erpillar)) \1()2(3)
- Input: cataract cataract23
- Expected: group 1 = cataract, group 2 = aract, group 3 = ract, group 4 = (empty string), group 5 = 3
- Actual: wrong values or match failure
Expected gain: +2 PCRE conformance tests (Category 9)
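As a cross-check on the expected values, the same pattern can be run through the JDK's built-in java.util.regex engine. This is only a sketch of a reference oracle: the class name BackrefOracle and the helper groups are hypothetical, and java.util.regex is not PCRE, although for this particular pattern the two are expected to agree.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; not part of the project under test.
public class BackrefOracle {
    // Returns groups 0..groupCount of the first match, or an empty list if nothing matches.
    static List<String> groups(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        List<String> out = new ArrayList<>();
        if (m.find()) {
            for (int i = 0; i <= m.groupCount(); i++) {
                out.add(m.group(i)); // null for a group that did not participate
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> g = groups("(cat(a(ract|tonic)|erpillar)) \\1()2(3)",
                                "cataract cataract23");
        for (int i = 0; i < g.size(); i++) {
            System.out.println("group " + i + " = [" + g.get(i) + "]");
        }
    }
}
```

With the JDK engine this prints group 1 = [cataract], group 2 = [aract], group 3 = [ract], group 4 = [] and group 5 = [3], preceded by the whole match as group 0.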
Root Cause
The backref \1 followed by the literal 2 is likely parsed or matched incorrectly (ambiguity between \12 and \1 + 2). Additionally, the empty group () and subsequent literal group (3) may not be tracked correctly in the current group-capture logic.
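The backref-number ambiguity can be illustrated with a small sketch. It is loosely modeled on the rule described in the PCRE2 pattern documentation (a digit run is a backreference if it is below 10, starts with 8 or 9, or does not exceed the number of capture groups; otherwise it is handled as an octal escape). The class, the method parseBackrefNumber, and its parameters are hypothetical and do not reflect the actual RegexParser.java code; the exact rule should be verified against the PCRE documentation. Note that in the failing pattern the digit run after the backslash ends at the parenthesis of (), so only 1 is consumed.

```java
// Hypothetical sketch; names and signature are assumptions, not the project's API.
public class BackrefNumberSketch {
    /**
     * pos is the index of the first digit after the backslash.
     * groupCount is the number of capture groups the caller considers visible.
     * Returns the backreference group number, or -1 if the digits should be
     * handled some other way (for example as an octal escape).
     */
    static int parseBackrefNumber(String pattern, int pos, int groupCount) {
        int end = pos;
        while (end < pattern.length()
                && pattern.charAt(end) >= '0' && pattern.charAt(end) <= '9') {
            end++;
        }
        if (end == pos || pattern.charAt(pos) == '0') {
            return -1; // no digits, or a leading 0 (assumed to be octal)
        }
        long number = 0;
        for (int i = pos; i < end; i++) {
            // Cap the value so very long digit runs cannot overflow.
            number = Math.min(number * 10 + (pattern.charAt(i) - '0'), Integer.MAX_VALUE);
        }
        char first = pattern.charAt(pos);
        boolean isBackref = number < 10
                || first == '8' || first == '9'
                || number <= groupCount;
        return isBackref ? (int) number : -1;
    }

    public static void main(String[] args) {
        // The digit run in "\1()2(3)" stops at '(', so this is group 1.
        System.out.println(parseBackrefNumber("\\1()2(3)", 1, 5));
        // "\12" with only 5 groups is not treated as a backreference here.
        System.out.println(parseBackrefNumber("\\12", 1, 5));
        // "\12" with 12 groups refers to group 12.
        System.out.println(parseBackrefNumber("\\12", 1, 12));
    }
}
```

Returning -1 rather than throwing keeps the decision about octal escapes with the caller, which is one possible design; an engine could equally return a small result type carrying both the number and the count of digits consumed.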
Implementation Notes
- Listed in doc/plans/pcre-conformance-roadmap.md as Phase 2, item 2.6
- Areas to check: RegexParser.java (backref number parsing), group-capture bytecode