Add support in named capture group identifiers for direct surrogate pairs

https://bugs.webkit.org/show_bug.cgi?id=178174

Reviewed by Darin Adler and Michael Saboff.

JSTests:

* test262/expectations.yaml: Mark 2 test cases as passing.

Source/JavaScriptCore:

This change:

a) Adds support for unescaped astral symbols in RegExp identifier names [1],
   aligning JSC with V8.

b) Rewords InvalidUnicodeEscape error code to be used for \uXXXX escapes in
   Unicode patterns and named groups/references instead of InvalidIdentityEscape,
   matching error messages in V8 and SpiderMonkey.

c) Adds hasError() checks after tryConsumeGroupName() so that errors generated in
   tryConsumeIdentifierCharacter() are not overridden.

d) Removes code duplication by using tryConsumeUnicodeEscape() for parsing \u
   in parseEscape(); cleans up parsing \u{} escapes a bit, preferring ASSERTs
   over hasError() checks.

[1]: https://tc39.es/ecma262/#prod-RegExpIdentifierName
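
A minimal sketch of what item a) allows, assuming an engine that implements the RegExpIdentifierName grammar from [1]. The group name 𝒜 (U+1D49C) and the sample input are illustrative choices, not taken from this patch's tests:

```javascript
// Illustrative only: 𝒜 (U+1D49C) is an astral symbol, so in the source text
// it is a direct surrogate pair rather than a \u{1D49C} escape.
const direct = /(?<𝒜>\d+)-\k<𝒜>/u;

// The same group name spelled with a code point escape.
const escaped = /(?<\u{1D49C}>\d+)-\k<\u{1D49C}>/u;

const m1 = direct.exec("42-42");
const m2 = escaped.exec("42-42");

// Both spellings should expose the capture under the same property name.
const directValue = m1.groups["𝒜"];
const escapedValue = m2.groups["𝒜"];
console.log(directValue, escapedValue);
```

Before this change JSC would be expected to reject the first pattern with "invalid group specifier name", which is the failure removed from test262/expectations.yaml.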

* yarr/YarrErrorCode.cpp:
(JSC::Yarr::errorMessage):
(JSC::Yarr::errorToThrow):
* yarr/YarrErrorCode.h:
* yarr/YarrParser.h:
(JSC::Yarr::Parser::parseEscape):
(JSC::Yarr::Parser::parseParenthesesBegin):
(JSC::Yarr::Parser::tryConsumeUnicodeEscape):
(JSC::Yarr::Parser::tryConsumeIdentifierCharacter):

LayoutTests:

Adjusted tests for error message changes and added coverage for messages
of syntax errors due to invalid \u escapes inside named groups/references.
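
A rough sketch of the kind of patterns covered, using a hypothetical helper (regExpErrorName is not part of the layout test harness). Only the error type is checked here, because the message wording is engine-specific; the messages quoted in this commit are JSC's:

```javascript
// Hypothetical helper: returns the thrown error's name, or null when the
// pattern compiles without error.
function regExpErrorName(source, flags) {
    try {
        new RegExp(source, flags);
        return null;
    } catch (e) {
        return e.name;
    }
}

const results = [
    regExpErrorName("(?<\\u>.)", "u"),     // \u with no hex digits in a group name
    regExpErrorName("\\k<\\uzzz>", "u"),   // \u followed by non-hex in a named reference
    regExpErrorName("(?<\\u{>.)", "u"),    // unterminated \u{ in a group name
    regExpErrorName("\\k<\\u{0>", "u"),    // unterminated \u{0 in a named reference
    regExpErrorName("\\u{110000}", "u"),   // code point above U+10FFFF
    regExpErrorName("(?<a>.)\\k<a>", "u"), // valid pattern, for contrast
];
console.log(results);
```

With this patch applied, JSC should report the first two as "invalid Unicode \u escape" and the next three as "invalid Unicode code point \u{} escape", instead of the more generic messages used before.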

* js/regexp-named-capture-groups-expected.txt:
* js/regexp-unicode-expected.txt:
* js/regress-158080-expected.txt:
* js/script-tests/regexp-named-capture-groups.js:
* js/script-tests/regexp-unicode.js:


Canonical link: https://commits.webkit.org/222707@main
git-svn-id: https://svn.webkit.org/repository/webkit/trunk@259262 268f45cc-cd09-0410-ab3c-d52691b4dbfc
shvaikalesh committed Mar 31, 2020
1 parent ae62e18 commit ca2f8b2ca0fc7a807458b92649b5eb6b583e29fa
Showing 12 changed files with 131 additions and 105 deletions.
@@ -1,3 +1,12 @@
2020-03-30 Alexey Shvayka <shvaikalesh@gmail.com>

Add support in named capture group identifiers for direct surrogate pairs
https://bugs.webkit.org/show_bug.cgi?id=178174

Reviewed by Darin Adler and Michael Saboff.

* test262/expectations.yaml: Mark 2 test cases as passing.

2020-03-30 Ross Kirsling <ross.kirsling@sony.com>

RegExp.prototype.exec must always access lastIndex
@@ -1246,9 +1246,6 @@ test/built-ins/Proxy/set/trap-is-undefined-target-is-proxy.js:
test/built-ins/Reflect/ownKeys/order-after-define-property.js:
default: 'Test262Error: Expected [Symbol(b), Symbol(a)] and [Symbol(a), Symbol(b)] to have the same contents. '
strict mode: 'Test262Error: Expected [Symbol(b), Symbol(a)] and [Symbol(a), Symbol(b)] to have the same contents. '
test/built-ins/RegExp/named-groups/unicode-property-names.js:
default: 'SyntaxError: Invalid regular expression: invalid group specifier name'
strict mode: 'SyntaxError: Invalid regular expression: invalid group specifier name'
test/built-ins/RegExp/property-escapes/generated/Alphabetic.js:
default: 'Test262Error: `\p{Alphabetic}` should match U+001CFA (`ᳺ`)'
strict mode: 'Test262Error: `\p{Alphabetic}` should match U+001CFA (`ᳺ`)'
@@ -1,3 +1,19 @@
2020-03-30 Alexey Shvayka <shvaikalesh@gmail.com>

Add support in named capture group identifiers for direct surrogate pairs
https://bugs.webkit.org/show_bug.cgi?id=178174

Reviewed by Darin Adler and Michael Saboff.

Adjusted tests for error message changes and added coverage for messages
of syntax errors due to invalid \u escapes inside named groups/references.

* js/regexp-named-capture-groups-expected.txt:
* js/regexp-unicode-expected.txt:
* js/regress-158080-expected.txt:
* js/script-tests/regexp-named-capture-groups.js:
* js/script-tests/regexp-unicode.js:

2020-03-30 Devin Rousso <drousso@apple.com>

[CSS Selectors 4] Add support for `:is()` with the same logic for the existing `:matches()`
@@ -61,6 +61,10 @@ PASS let r = new RegExp("/(?<𐆐groupName1>abc)/u") threw exception SyntaxError
PASS let r = new RegExp("/(?<g𐆛oupName1>abc)/u") threw exception SyntaxError: Invalid regular expression: invalid group specifier name.
PASS let r = new RegExp("/(?<‌groupName1>abc)/u") threw exception SyntaxError: Invalid regular expression: invalid group specifier name.
PASS let r = new RegExp("/(?<‍groupName1>abc)/u") threw exception SyntaxError: Invalid regular expression: invalid group specifier name.
PASS /(?<\u>.)/u threw exception SyntaxError: Invalid regular expression: invalid Unicode \u escape.
PASS /\k<\uzzz>/u threw exception SyntaxError: Invalid regular expression: invalid Unicode \u escape.
PASS /(?<\u{>.)/u threw exception SyntaxError: Invalid regular expression: invalid Unicode code point \u{} escape.
PASS /\k<\u{0>/u threw exception SyntaxError: Invalid regular expression: invalid Unicode code point \u{} escape.
PASS "XzzXzz".match(/\k<z>X(?<z>z*)X\k<z>/) is ["XzzXzz", "zz"]
PASS "XzzXzz".match(/\k<z>X(?<z>z*)X\k<z>/u) is ["XzzXzz", "zz"]
PASS "1122332211".match(/\k<ones>\k<twos>\k<threes>(?<ones>1*)(?<twos>2*)(?<threes>3*)\k<threes>\k<twos>\k<ones>/) is ["1122332211", "11", "22", "3"]
@@ -178,7 +178,7 @@ PASS "Testing ሴ\n1 2 3".match(/g [က-𐃿]$\n1/um)[0] is "g ሴ\n1"
PASS "Testing 𐃰\n1 2 3".match(/g [က-𐃿]$\n1/um)[0] is "g 𐃰\n1"
PASS "this is ba test".match(/is b\cha test/u)[0].length is 11
PASS new RegExp("\\/", "u").source is "\\/"
PASS r = new RegExp("\\u{110000}", "u") threw exception SyntaxError: Invalid regular expression: invalid Unicode {} escape.
PASS r = new RegExp("\\u{110000}", "u") threw exception SyntaxError: Invalid regular expression: invalid Unicode code point \u{} escape.
PASS r = new RegExp("𐐅{2147483648}", "u") threw exception SyntaxError: Invalid regular expression: pattern exceeds string length limits.
PASS /{/u threw exception SyntaxError: Invalid regular expression: incomplete {} quantifier for Unicode pattern.
PASS /[a-\d]/u threw exception SyntaxError: Invalid regular expression: invalid range in character class for Unicode pattern.
@@ -190,10 +190,10 @@ PASS r = new RegExp("[\\a]", "u") threw exception SyntaxError: Invalid regular e
PASS r = new RegExp("[\\B]", "u") threw exception SyntaxError: Invalid regular expression: invalid escaped character for Unicode pattern.
PASS r = new RegExp("\\x", "u") threw exception SyntaxError: Invalid regular expression: invalid escaped character for Unicode pattern.
PASS r = new RegExp("[\\x]", "u") threw exception SyntaxError: Invalid regular expression: invalid escaped character for Unicode pattern.
PASS r = new RegExp("\\u", "u") threw exception SyntaxError: Invalid regular expression: invalid escaped character for Unicode pattern.
PASS r = new RegExp("[\\u]", "u") threw exception SyntaxError: Invalid regular expression: invalid escaped character for Unicode pattern.
PASS r = new RegExp("\\u{", "u") threw exception SyntaxError: Invalid regular expression: invalid Unicode {} escape.
PASS r = new RegExp("\\u{\udead", "u") threw exception SyntaxError: Invalid regular expression: invalid Unicode {} escape.
PASS r = new RegExp("\\u", "u") threw exception SyntaxError: Invalid regular expression: invalid Unicode \u escape.
PASS r = new RegExp("[\\u]", "u") threw exception SyntaxError: Invalid regular expression: invalid Unicode \u escape.
PASS r = new RegExp("\\u{", "u") threw exception SyntaxError: Invalid regular expression: invalid Unicode code point \u{} escape.
PASS r = new RegExp("\\u{\udead", "u") threw exception SyntaxError: Invalid regular expression: invalid Unicode code point \u{} escape.
PASS /\1/u threw exception SyntaxError: Invalid regular expression: invalid backreference for Unicode pattern.
PASS /\2/u threw exception SyntaxError: Invalid regular expression: invalid backreference for Unicode pattern.
PASS /\3/u threw exception SyntaxError: Invalid regular expression: invalid backreference for Unicode pattern.
@@ -3,17 +3,17 @@ Regresion test for 158080. This test should pass and not crash.
On success, you will see a series of "PASS" messages, followed by "TEST COMPLETE".


PASS let r = /\u{|abc/u threw exception SyntaxError: Invalid regular expression: invalid Unicode {} escape.
PASS let r = /\u{/u threw exception SyntaxError: Invalid regular expression: invalid Unicode {} escape.
PASS let r = /\u{1/u threw exception SyntaxError: Invalid regular expression: invalid Unicode {} escape.
PASS let r = /\u{12/u threw exception SyntaxError: Invalid regular expression: invalid Unicode {} escape.
PASS let r = /\u{123/u threw exception SyntaxError: Invalid regular expression: invalid Unicode {} escape.
PASS let r = /\u{1234/u threw exception SyntaxError: Invalid regular expression: invalid Unicode {} escape.
PASS let r = /\u{abcde/u threw exception SyntaxError: Invalid regular expression: invalid Unicode {} escape.
PASS let r = /\u{abcdef/u threw exception SyntaxError: Invalid regular expression: invalid Unicode {} escape.
PASS let r = /\u{1111111}/u threw exception SyntaxError: Invalid regular expression: invalid Unicode {} escape.
PASS let r = /\u{fedbca98}/u threw exception SyntaxError: Invalid regular expression: invalid Unicode {} escape.
PASS let r = /\u{1{123}}/u threw exception SyntaxError: Invalid regular expression: invalid Unicode {} escape.
PASS let r = /\u{|abc/u threw exception SyntaxError: Invalid regular expression: invalid Unicode code point \u{} escape.
PASS let r = /\u{/u threw exception SyntaxError: Invalid regular expression: invalid Unicode code point \u{} escape.
PASS let r = /\u{1/u threw exception SyntaxError: Invalid regular expression: invalid Unicode code point \u{} escape.
PASS let r = /\u{12/u threw exception SyntaxError: Invalid regular expression: invalid Unicode code point \u{} escape.
PASS let r = /\u{123/u threw exception SyntaxError: Invalid regular expression: invalid Unicode code point \u{} escape.
PASS let r = /\u{1234/u threw exception SyntaxError: Invalid regular expression: invalid Unicode code point \u{} escape.
PASS let r = /\u{abcde/u threw exception SyntaxError: Invalid regular expression: invalid Unicode code point \u{} escape.
PASS let r = /\u{abcdef/u threw exception SyntaxError: Invalid regular expression: invalid Unicode code point \u{} escape.
PASS let r = /\u{1111111}/u threw exception SyntaxError: Invalid regular expression: invalid Unicode code point \u{} escape.
PASS let r = /\u{fedbca98}/u threw exception SyntaxError: Invalid regular expression: invalid Unicode code point \u{} escape.
PASS let r = /\u{1{123}}/u threw exception SyntaxError: Invalid regular expression: invalid Unicode code point \u{} escape.
PASS successfullyParsed is true

TEST COMPLETE
@@ -104,6 +104,12 @@ shouldThrow('let r = new RegExp("/(?<g\u{1019b}oupName1>abc)/u")', '"SyntaxError
shouldThrow('let r = new RegExp("/(?<\u200cgroupName1>abc)/u")', '"SyntaxError: Invalid regular expression: invalid group specifier name"');
shouldThrow('let r = new RegExp("/(?<\u200dgroupName1>abc)/u")', '"SyntaxError: Invalid regular expression: invalid group specifier name"');

// Check that invalid \u escape errors do not get overridden.
shouldThrow('/(?<\\u>.)/u', '"SyntaxError: Invalid regular expression: invalid Unicode \\\\u escape"');
shouldThrow('/\\k<\\uzzz>/u', '"SyntaxError: Invalid regular expression: invalid Unicode \\\\u escape"');
shouldThrow('/(?<\\u{>.)/u', '"SyntaxError: Invalid regular expression: invalid Unicode code point \\\\u{} escape"');
shouldThrow('/\\k<\\u{0>/u', '"SyntaxError: Invalid regular expression: invalid Unicode code point \\\\u{} escape"');

// Check the named forward references work
shouldBe('"XzzXzz".match(/\\\k<z>X(?<z>z*)X\\\k<z>/)', '["XzzXzz", "zz"]');
shouldBe('"XzzXzz".match(/\\\k<z>X(?<z>z*)X\\\k<z>/u)', '["XzzXzz", "zz"]');
@@ -227,7 +227,7 @@ shouldBe('"this is b\ba test".match(/is b\\cha test/u)[0].length', '11');

// Check that invalid unicode patterns throw exceptions
shouldBe('new RegExp("\\\\/", "u").source', '"\\\\/"');
shouldThrow('r = new RegExp("\\\\u{110000}", "u")', '"SyntaxError: Invalid regular expression: invalid Unicode {} escape"');
shouldThrow('r = new RegExp("\\\\u{110000}", "u")', '"SyntaxError: Invalid regular expression: invalid Unicode code point \\\\u{} escape"');
shouldThrow('r = new RegExp("\u{10405}{2147483648}", "u")', '"SyntaxError: Invalid regular expression: pattern exceeds string length limits"');
shouldThrow('/{/u', '"SyntaxError: Invalid regular expression: incomplete {} quantifier for Unicode pattern"');
shouldThrow('/[a-\\d]/u', '"SyntaxError: Invalid regular expression: invalid range in character class for Unicode pattern"');
@@ -250,11 +250,11 @@ shouldThrowInvalidEscape("[\\\\a]");
shouldThrowInvalidEscape("[\\\\B]");
shouldThrowInvalidEscape("\\\\x");
shouldThrowInvalidEscape("[\\\\x]");
shouldThrowInvalidEscape("\\\\u");
shouldThrowInvalidEscape("[\\\\u]");
shouldThrowInvalidEscape("\\\\u", '"SyntaxError: Invalid regular expression: invalid Unicode \\\\u escape"');
shouldThrowInvalidEscape("[\\\\u]", '"SyntaxError: Invalid regular expression: invalid Unicode \\\\u escape"');

shouldThrowInvalidEscape("\\\\u{", '"SyntaxError: Invalid regular expression: invalid Unicode {} escape"');
shouldThrowInvalidEscape("\\\\u{\\udead", '"SyntaxError: Invalid regular expression: invalid Unicode {} escape"');
shouldThrowInvalidEscape("\\\\u{", '"SyntaxError: Invalid regular expression: invalid Unicode code point \\\\u{} escape"');
shouldThrowInvalidEscape("\\\\u{\\udead", '"SyntaxError: Invalid regular expression: invalid Unicode code point \\\\u{} escape"');

// Check that invalid backreferences in unicode patterns throw exceptions.
shouldThrow(`/\\1/u`);
@@ -1,3 +1,38 @@
2020-03-30 Alexey Shvayka <shvaikalesh@gmail.com>

Add support in named capture group identifiers for direct surrogate pairs
https://bugs.webkit.org/show_bug.cgi?id=178174

Reviewed by Darin Adler and Michael Saboff.

This change:

a) Adds support for unescaped astral symbols in RegExp identifier names [1],
aligning JSC with V8.

b) Rewords InvalidUnicodeEscape error code to be used for \uXXXX escapes in
Unicode patterns and named groups/references instead of InvalidIdentityEscape,
matching error messages in V8 and SpiderMonkey.

c) Adds hasError() checks after tryConsumeGroupName() so that errors generated in
tryConsumeIdentifierCharacter() are not overridden.

d) Removes code duplication by using tryConsumeUnicodeEscape() for parsing \u
in parseEscape(); cleans up parsing \u{} escapes a bit, preferring ASSERTs
over hasError() checks.

[1]: https://tc39.es/ecma262/#prod-RegExpIdentifierName

* yarr/YarrErrorCode.cpp:
(JSC::Yarr::errorMessage):
(JSC::Yarr::errorToThrow):
* yarr/YarrErrorCode.h:
* yarr/YarrParser.h:
(JSC::Yarr::Parser::parseEscape):
(JSC::Yarr::Parser::parseParenthesesBegin):
(JSC::Yarr::Parser::tryConsumeUnicodeEscape):
(JSC::Yarr::Parser::tryConsumeIdentifierCharacter):

2020-03-30 Ross Kirsling <ross.kirsling@sony.com>

RegExp.prototype.exec must always access lastIndex
@@ -51,7 +51,8 @@ const char* errorMessage(ErrorCode error)
REGEXP_ERROR_PREFIX "range out of order in character class", // CharacterClassRangeOutOfOrder
REGEXP_ERROR_PREFIX "invalid range in character class for Unicode pattern", // CharacterClassRangeInvalid
REGEXP_ERROR_PREFIX "\\ at end of pattern", // EscapeUnterminated
REGEXP_ERROR_PREFIX "invalid Unicode {} escape", // InvalidUnicodeEscape
REGEXP_ERROR_PREFIX "invalid Unicode \\u escape", // InvalidUnicodeEscape
REGEXP_ERROR_PREFIX "invalid Unicode code point \\u{} escape", // InvalidUnicodeCodePointEscape
REGEXP_ERROR_PREFIX "invalid backreference for Unicode pattern", // InvalidBackreference
REGEXP_ERROR_PREFIX "invalid \\k<> named backreference", // InvalidNamedBackReference
REGEXP_ERROR_PREFIX "invalid escaped character for Unicode pattern", // InvalidIdentityEscape
@@ -87,6 +88,7 @@ JSObject* errorToThrow(JSGlobalObject* globalObject, ErrorCode error)
case ErrorCode::CharacterClassRangeInvalid:
case ErrorCode::EscapeUnterminated:
case ErrorCode::InvalidUnicodeEscape:
case ErrorCode::InvalidUnicodeCodePointEscape:
case ErrorCode::InvalidBackreference:
case ErrorCode::InvalidNamedBackReference:
case ErrorCode::InvalidIdentityEscape:
@@ -51,6 +51,7 @@ enum class ErrorCode : uint8_t {
CharacterClassRangeInvalid,
EscapeUnterminated,
InvalidUnicodeEscape,
InvalidUnicodeCodePointEscape,
InvalidBackreference,
InvalidNamedBackReference,
InvalidIdentityEscape,
