New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. Weβll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement RegExp v
flag with set notation + properties of strings
#10984
Implement RegExp v
flag with set notation + properties of strings
#10984
Conversation
var regexpUnicode = @tryGetById(regexp, "unicodeSets"); | ||
if (regexpUnicode !== @regExpProtoUnicodeGetter) | ||
return true; | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You need to add optimization for unicodeSets
. See JSGlobalObject.cpp etc. This is comparing with @regExpProtoUnicodeGetter
, which is unicode
getter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also need to add optimization done for unicode
getter etc. Like, DFGAbstractInterpreterInlines.h etc. for example.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See DFGFixupPhase.cpp
and DFGAbstractInterpreterInlines.h
too. Let's grep with propertyNames->global
for example.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I fixed these.
var regexpUnicode = @tryGetById(regexp, "unicodeSets"); | ||
if (regexpUnicode !== @regExpProtoUnicodeGetter) | ||
return true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ditto.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
var regexpUnicode = @tryGetById(regexp, "unicodeSets"); | ||
if (regexpUnicode !== @regExpProtoUnicodeGetter) | ||
return true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ditto. This is unicode
getter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please be sure to update the test262 results in expectations.yaml; in addition to the 7(x2) outstanding failures introduced by #10355 (displayed here), this patch appears to introduce an additional 2(x2) failures:
test/built-ins/RegExp/prototype/unicodeSets/uv-flags-constructor.js:
default: 'Test262Error: Expected a SyntaxError to be thrown but no exception was thrown at all'
strict mode: 'Test262Error: Expected a SyntaxError to be thrown but no exception was thrown at all'
test/built-ins/RegExp/prototype/unicodeSets/uv-flags.js:
default: 'Test262: This statement should not be evaluated.'
strict mode: 'Test262: This statement should not be evaluated.'
64ed139
to
b632354
Compare
EWS run on previous version of this PR (hash b632354)
|
@rkirsling I'll update test262 results in a follow on PR. |
@@ -3555,6 +3555,9 @@ bool AbstractInterpreter<AbstractStateType>::executeEffects(unsigned clobberLimi | |||
if (attemptToFold(m_vm.propertyNames->unicode.impl(), globalObject->regExpProtoUnicodeGetter())) | |||
break; | |||
|
|||
if (attemptToFold(m_vm.propertyNames->unicode.impl(), globalObject->regExpProtoUnicodeSetsGetter())) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not unicode
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Copy paste error - Fixed.
b632354
to
0045ca6
Compare
EWS run on previous version of this PR (hash 0045ca6)
|
Fixed these failures in the latest version. |
@@ -1416,6 +1416,9 @@ capitalName ## Constructor* lowerName ## Constructor = featureFlag ? capitalName | |||
GetterSetter* regExpProtoUnicodeGetter = getGetterById(this, m_regExpPrototype.get(), vm.propertyNames->unicode); | |||
catchScope.assertNoException(); | |||
m_linkTimeConstants[static_cast<unsigned>(LinkTimeConstant::regExpProtoUnicodeGetter)].set(vm, this, regExpProtoUnicodeGetter); | |||
GetterSetter* regExpProtoUnicodeSetsGetter = getGetterById(this, m_regExpPrototype.get(), vm.propertyNames->unicodeSets); | |||
catchScope.assertNoException(); | |||
m_linkTimeConstants[static_cast<unsigned>(LinkTimeConstant::regExpProtoUnicodeSetsGetter)].set(vm, this, regExpProtoUnicodeSetsGetter); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think RegExpPrototype watchpoint optimization is probably missing. Please grep all use of existing getters carefully and align the new one with this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I grep'ed the other uses of the RegExp getters and only find uses in DFGAbstractInterpreterInlines.h and DFGFixupPhase. The unicodeSets getter is already added to those sections.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rs=me with watchpoint fix.
0045ca6
to
033aaa0
Compare
EWS run on current version of this PR (hash 033aaa0)
|
033aaa0
to
b309ad8
Compare
https://bugs.webkit.org/show_bug.cgi?id=241593 rdar://100337109 Reviewed by Yusuke Suzuki. This change implements the TC39 stage 3 proposal RegExp v flag with set notation + properties of strings, https://github.com/tc39/proposal-regexp-v-flag. It adds a new "unicodeSets" compile mode for the Yarr engine. Like the prior Unicode Yarr features, this change is driven by Unicode Database Files (UCD). This change includes two such new files, JavaScriptCore/ucd/{emoji-sequences.txt & emoji-zwj-sequences.txt}. The newly added properties include lists of strings. These strings are processed via the character class syntax through. When it comes to matching however, there is some desuguraing that turns such a property of strings into a list of alternations. For example, say a property has strings str1...strN plus a traditional character class, single-character-class, we create the pattern equivalent of: (?:str1|str2|...|strN|[single-character-class]) Per the spec, longer strings appear earlier in the alternation, and before the traditional character class. This allows for searching for longer properties in a property list where substrings of other strings are included in that list. There are new set of combining operators allowed in the class set character classes. Two character class elements that appear adjacent to each other implicitly have the Union combining operations. There is also an Intersection operation with the && operator and a Subtraction operation with the || operator. There is new ClassSet parsing that follows new "cleaner" rules that traditional character classes. The prior ccharacter class constructor and delegates are mostly unchanged, except for the compile mode now being switched on an enum instead of a bool. Added check that both 'u' and 'v' flags don't appear in the same RegExp. Added unicodeSets getter watchpoint to the m_regExpPrimordialPropertiesWatchpointSet. * JSTests/es6/Proxy_internal_get_calls_RegExp.prototype.flags.js: * JSTests/stress/regexp-vflag-property-of-strings.js: Added. (arrayToString): (objectToString): (dumpValue): (compareArray): (compareGroups): (testRegExp): (testRegExpSyntaxError): * JSTests/stress/static-getter-in-names.js: * JSTests/test262/config.yaml: * LayoutTests/js/Object-getOwnPropertyNames-expected.txt: * LayoutTests/js/script-tests/Object-getOwnPropertyNames.js: * Source/JavaScriptCore/JavaScriptCore.xcodeproj/project.pbxproj: * Source/JavaScriptCore/builtins/BuiltinNames.h: * Source/JavaScriptCore/builtins/RegExpPrototype.js: (linkTimeConstant.hasObservableSideEffectsForRegExpMatch): (linkTimeConstant.hasObservableSideEffectsForRegExpSplit): (overriddenName.string_appeared_here.split): * Source/JavaScriptCore/builtins/StringPrototype.js: (linkTimeConstant.hasObservableSideEffectsForStringReplace): * Source/JavaScriptCore/bytecode/LinkTimeConstant.h: * Source/JavaScriptCore/dfg/DFGAbstractInterpreterInlines.h: (JSC::DFG::AbstractInterpreter<AbstractStateType>::executeEffects): * Source/JavaScriptCore/dfg/DFGFixupPhase.cpp: (JSC::DFG::FixupPhase::addStringReplacePrimordialChecks): * Source/JavaScriptCore/dfg/DFGOperations.cpp: (JSC::DFG::JSC_DEFINE_JIT_OPERATION): * Source/JavaScriptCore/dfg/DFGStrengthReductionPhase.cpp: (JSC::DFG::StrengthReductionPhase::handleNode): * Source/JavaScriptCore/runtime/CachedTypes.cpp: * Source/JavaScriptCore/runtime/CommonIdentifiers.h: * Source/JavaScriptCore/runtime/JSGlobalObject.cpp: (JSC::JSGlobalObject::init): * Source/JavaScriptCore/runtime/JSGlobalObject.h: * Source/JavaScriptCore/runtime/JSGlobalObjectInlines.h: (JSC::JSGlobalObject::regExpProtoUnicodeSetsGetter const): * Source/JavaScriptCore/runtime/RegExp.h: * Source/JavaScriptCore/runtime/RegExpCache.h: * Source/JavaScriptCore/runtime/RegExpObject.cpp: (JSC::RegExpObject::matchGlobal): * Source/JavaScriptCore/runtime/RegExpPrototype.cpp: (JSC::RegExpPrototype::finishCreation): (JSC::JSC_DEFINE_HOST_FUNCTION): * Source/JavaScriptCore/ucd/emoji-sequences.txt: Added. * Source/JavaScriptCore/ucd/emoji-zwj-sequences.txt: Added. * Source/JavaScriptCore/yarr/Yarr.h: * Source/JavaScriptCore/yarr/YarrErrorCode.cpp: (JSC::Yarr::errorMessage): (JSC::Yarr::errorToThrow): * Source/JavaScriptCore/yarr/YarrErrorCode.h: * Source/JavaScriptCore/yarr/YarrFlags.h: * Source/JavaScriptCore/yarr/YarrInterpreter.cpp: (JSC::Yarr::ByteTermDumper::ByteTermDumper): (JSC::Yarr::ByteTermDumper::unicode const): (JSC::Yarr::ByteTermDumper::unicodeSets const): (JSC::Yarr::ByteTermDumper::eitherUnicode const): (JSC::Yarr::Interpreter::tryConsumeBackReference): (JSC::Yarr::Interpreter::matchCharacterClass): (JSC::Yarr::Interpreter::backtrackCharacterClass): (JSC::Yarr::Interpreter::matchDisjunction): (JSC::Yarr::Interpreter::Interpreter): (JSC::Yarr::Interpreter::isLegacyCompilation const): (JSC::Yarr::Interpreter::isUnicodeCompilation const): (JSC::Yarr::Interpreter::isUnicodeSetsCompilation const): (JSC::Yarr::Interpreter::isEitherUnicodeCompilation const): (JSC::Yarr::ByteTermDumper::dumpTerm): (JSC::Yarr::ByteTermDumper::unicode): Deleted. * Source/JavaScriptCore/yarr/YarrInterpreter.h: (JSC::Yarr::BytecodePattern::BytecodePattern): (JSC::Yarr::BytecodePattern::compileMode const): (JSC::Yarr::BytecodePattern::unicodeSets const): (JSC::Yarr::BytecodePattern::eitherUnicode const): * Source/JavaScriptCore/yarr/YarrJIT.cpp: * Source/JavaScriptCore/yarr/YarrParser.h: (JSC::Yarr::Parser::CharacterClassParserDelegate::CharacterClassParserDelegate): (JSC::Yarr::Parser::ClassSetParserDelegate::ClassSetParserDelegate): (JSC::Yarr::Parser::ClassSetParserDelegate::begin): (JSC::Yarr::Parser::ClassSetParserDelegate::nestedClassBegin): (JSC::Yarr::Parser::ClassSetParserDelegate::doneAfterCharacterClassEnd): (JSC::Yarr::Parser::ClassSetParserDelegate::setUnionOp): (JSC::Yarr::Parser::ClassSetParserDelegate::setSubtractOp): (JSC::Yarr::Parser::ClassSetParserDelegate::setIntersectionOp): (JSC::Yarr::Parser::ClassSetParserDelegate::flushCachedCharacterIfNeeded): (JSC::Yarr::Parser::ClassSetParserDelegate::atomPatternCharacter): (JSC::Yarr::Parser::ClassSetParserDelegate::atomBuiltInCharacterClass): (JSC::Yarr::Parser::ClassSetParserDelegate::end): (JSC::Yarr::Parser::ClassSetParserDelegate::error): (JSC::Yarr::Parser::ClassSetParserDelegate::assertionWordBoundary): (JSC::Yarr::Parser::ClassSetParserDelegate::atomBackReference): (JSC::Yarr::Parser::ClassSetParserDelegate::atomNamedBackReference): (JSC::Yarr::Parser::ClassSetParserDelegate::atomNamedForwardReference): (JSC::Yarr::Parser::ClassStringDisjunctionParserDelegate::ClassStringDisjunctionParserDelegate): (JSC::Yarr::Parser::ClassStringDisjunctionParserDelegate::atomPatternCharacter): (JSC::Yarr::Parser::ClassStringDisjunctionParserDelegate::newAlternative): (JSC::Yarr::Parser::ClassStringDisjunctionParserDelegate::end): (JSC::Yarr::Parser::ClassStringDisjunctionParserDelegate::assertionWordBoundary): (JSC::Yarr::Parser::ClassStringDisjunctionParserDelegate::atomBackReference): (JSC::Yarr::Parser::ClassStringDisjunctionParserDelegate::atomNamedBackReference): (JSC::Yarr::Parser::ClassStringDisjunctionParserDelegate::atomNamedForwardReference): (JSC::Yarr::Parser::ClassStringDisjunctionParserDelegate::atomBuiltInCharacterClass): (JSC::Yarr::Parser::Parser): (JSC::Yarr::Parser::isIdentityEscapeAnError): (JSC::Yarr::Parser::parseEscape): (JSC::Yarr::Parser::consumePossibleSurrogatePair): (JSC::Yarr::Parser::parseAtomEscape): (JSC::Yarr::Parser::parseCharacterClassEscape): (JSC::Yarr::Parser::parseClassSetEscape): (JSC::Yarr::Parser::parseClassStringDisjunctionEscape): (JSC::Yarr::Parser::parseCharacterClass): (JSC::Yarr::Parser::parseClassSet): (JSC::Yarr::Parser::parseClassStringDisjunction): (JSC::Yarr::Parser::parseParenthesesEnd): (JSC::Yarr::Parser::parseTokens): (JSC::Yarr::Parser::handleIllegalReferences): (JSC::Yarr::Parser::tryConsumeUnicodeEscape): (JSC::Yarr::Parser::tryConsumeUnicodePropertyExpression): (JSC::Yarr::Parser::isLegacyCompilation const): (JSC::Yarr::Parser::isUnicodeCompilation const): (JSC::Yarr::Parser::isUnicodeSetsCompilation const): (JSC::Yarr::Parser::isEitherUnicodeCompilation const): (JSC::Yarr::compileMode): (JSC::Yarr::parse): * Source/JavaScriptCore/yarr/YarrPattern.cpp: (JSC::Yarr::CharacterClassConstructor::CharacterClassConstructor): (JSC::Yarr::CharacterClassConstructor::reset): (JSC::Yarr::CharacterClassConstructor::combiningSetOp): (JSC::Yarr::CharacterClassConstructor::append): (JSC::Yarr::CharacterClassConstructor::appendInverted): (JSC::Yarr::CharacterClassConstructor::putRange): (JSC::Yarr::CharacterClassConstructor::atomClassStringDisjunction): (JSC::Yarr::CharacterClassConstructor::performSetOpWith): (JSC::Yarr::CharacterClassConstructor::performSetOpWithStrings): (JSC::Yarr::CharacterClassConstructor::performSetOpWithMatches): (JSC::Yarr::CharacterClassConstructor::hasInverteStrings): (JSC::Yarr::CharacterClassConstructor::compareUTF32Strings): (JSC::Yarr::CharacterClassConstructor::sort): (JSC::Yarr::CharacterClassConstructor::charClass): (JSC::Yarr::CharacterClassConstructor::mergeRangesFrom): (JSC::Yarr::CharacterClassConstructor::unionStrings): (JSC::Yarr::CharacterClassConstructor::intersectionStrings): (JSC::Yarr::CharacterClassConstructor::subtractionStrings): (JSC::Yarr::CharacterClassConstructor::asciiOpSorted): (JSC::Yarr::CharacterClassConstructor::unicodeOpSorted): (JSC::Yarr::YarrPatternConstructor::YarrPatternConstructor): (JSC::Yarr::YarrPatternConstructor::resetForReparsing): (JSC::Yarr::YarrPatternConstructor::atomPatternCharacter): (JSC::Yarr::YarrPatternConstructor::atomBuiltInCharacterClass): (JSC::Yarr::YarrPatternConstructor::atomCharacterClassAtom): (JSC::Yarr::YarrPatternConstructor::atomCharacterClassRange): (JSC::Yarr::YarrPatternConstructor::atomCharacterClassBuiltIn): (JSC::Yarr::YarrPatternConstructor::atomClassStringDisjunction): (JSC::Yarr::YarrPatternConstructor::atomCharacterClassSetOp): (JSC::Yarr::YarrPatternConstructor::atomCharacterClassPushNested): (JSC::Yarr::YarrPatternConstructor::atomCharacterClassPopNested): (JSC::Yarr::YarrPatternConstructor::atomCharacterClassEnd): (JSC::Yarr::YarrPattern::compile): (JSC::Yarr::PatternTerm::dump): (JSC::Yarr::YarrPattern::dumpPatternString): (JSC::Yarr::YarrPattern::dumpPattern): * Source/JavaScriptCore/yarr/YarrPattern.h: (JSC::Yarr::CharacterClass::CharacterClass): (JSC::Yarr::CharacterClass::hasNonBMPCharacters const): (JSC::Yarr::CharacterClass::hasOneCharacterSize const): (JSC::Yarr::CharacterClass::hasOnlyNonBMPCharacters const): (JSC::Yarr::CharacterClass::hasStrings const): (JSC::Yarr::CharacterClass::hasSingleCharacters const): (JSC::Yarr::ClassSet::ClassSet): (JSC::Yarr::YarrPattern::unicodeSets const): (JSC::Yarr::YarrPattern::eitherUnicode const): (JSC::Yarr::YarrPattern::compileMode const): (JSC::Yarr::CharacterClass::hasNonBMPCharacters): Deleted. (JSC::Yarr::CharacterClass::hasOneCharacterSize): Deleted. (JSC::Yarr::CharacterClass::hasOnlyNonBMPCharacters): Deleted. * Source/JavaScriptCore/yarr/YarrSyntaxChecker.cpp: (JSC::Yarr::SyntaxChecker::atomClassStringDisjunction): (JSC::Yarr::SyntaxChecker::atomCharacterClassSetOp): (JSC::Yarr::SyntaxChecker::atomCharacterClassPushNested): (JSC::Yarr::SyntaxChecker::atomCharacterClassPopNested): (JSC::Yarr::checkSyntax): * Source/JavaScriptCore/yarr/YarrUnicodeProperties.cpp: (JSC::Yarr::unicodeMatchProperty): (JSC::Yarr::createUnicodeCharacterClassFor): (JSC::Yarr::characterClassMayContainStrings): * Source/JavaScriptCore/yarr/YarrUnicodeProperties.h: * Source/JavaScriptCore/yarr/generateYarrUnicodePropertyTables.py: (PropertyData.__init__): (PropertyData.addMatchString): (PropertyData.stringsCompare): (PropertyData): (PropertyData.sortStrings): (PropertyData.dumpMatchData): (PropertyData.convertStringToCppFormat): (PropertyData.dump): (PropertyData.dumpAll): (PropertyData.dumpMayContainStringFunc): (BinaryProperty.dump): (SequenceProperty): (SequenceProperty.__init__): (SequenceProperty.parsePropertyFile): (SequenceProperty.dump): * Source/WebCore/contentextensions/URLFilterParser.cpp: (WebCore::ContentExtensions::PatternParser::atomClassStringDisjunction): (WebCore::ContentExtensions::PatternParser::atomCharacterClassSetOp): (WebCore::ContentExtensions::PatternParser::atomCharacterClassPushNested): (WebCore::ContentExtensions::PatternParser::atomCharacterClassPopNested): (WebCore::ContentExtensions::URLFilterParser::addPattern): Canonical link: https://commits.webkit.org/261188@main
b309ad8
to
270c824
Compare
Committed 261188@main (270c824): https://commits.webkit.org/261188@main Reviewed commits have been landed. Closing PR #10984 and removing active labels. |
270c824
033aaa0
π§ͺ styleπ iosπ macπ wpeπ wincairoπ§ͺ bindingsπ ios-simπ mac-AS-debugπ§ͺ wpe-wk2π§ͺ ios-wk2π§ͺ api-macπ gtkπ§ͺ api-iosπ§ͺ mac-wk1π§ͺ gtk-wk2π π§ͺ jscπ tvπ§ͺ mac-wk2π§ͺ api-gtkπ tv-simπ§ͺ mac-AS-debug-wk2π watchπ§ͺ mac-wk2-stressπ§ͺ jsc-armv7-testsπ watch-simπ§ͺ jsc-mips-tests