Add support for RegExp lookbehind assertions
https://bugs.webkit.org/show_bug.cgi?id=174931
rdar://33183185

This change implements RegExp lookbehind in the Yarr interpreter.

This change introduces the notion of match direction, either forward or backward. The forward match direction is how the existing code works: it matches disjunction terms and the subject string left to right. Lookbehind assertions, as defined in the ECMAScript spec, process disjunction terms right to left, matching the corresponding subject string right to left as well. Except for the Yarr JIT, almost all of the Yarr code has been touched to account for this backward matching.

An additional ByteTerm, HaveCheckedInput, has been added. It checks that at least a given number of characters are available in the input stream, but it doesn't move the input stream position; it is essentially a CheckInput that leaves the input position where it is. For variable-count terms, we still need to check that we won't try to access characters before the first character of the subject string. For functions like readSurrogatePairChecked(), we check for available input before calling the function. New input functions with a try prefix, like tryReadBackward(), check for available input themselves. Once these checks prove that it is safe to access an offset to the left of the current input position, the actual matching can be performed.

The Yarr parser parses regular expressions in left-to-right order, and it computes character offsets in forward order. When compiling to ByteTerms, we process backward-matching disjunctions right to left. The parser also has special handling for forward references within a backward-matching parenthetical group. All such forward references are saved for that parenthetical group and processed at the end of the group. Each of these forward references is checked to see whether a capture to the right of it was found; if so, the forward reference is converted to a back reference.
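As a rough illustration of the two kinds of input checks described above, here is a minimal C++17 sketch, assuming a simplified stream of 8-bit characters rather than Yarr's actual input handling. InputStream, haveInputBehind(), readBackwardChecked(), matchLiteralBackward(), and countRepeatsBackward() are hypothetical names; only the tryReadBackward() name comes from the text above, and its signature here is an assumption. A fixed-length literal is checked once up front, in the spirit of HaveCheckedInput, and then compared right to left; a variable-count term instead uses a try-style read that checks each access itself.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string_view>

// Hypothetical, simplified input stream used only to illustrate backward
// reads. It is NOT Yarr's Interpreter::InputStream; all names are illustrative.
class InputStream {
public:
    InputStream(std::string_view input, std::size_t position)
        : m_input(input)
        , m_position(position)
    {
        assert(position <= input.size());
    }

    // Similar in spirit to HaveCheckedInput: verify that `count` characters
    // exist to the left of the current position, without moving the position.
    bool haveInputBehind(std::size_t count) const { return m_position >= count; }

    // Unchecked read of the character `offset` places to the left of the
    // current position. The caller must have checked availability first;
    // the assert only documents that obligation (compiled out under NDEBUG).
    char readBackwardChecked(std::size_t offset) const
    {
        assert(offset < m_position);
        return m_input[m_position - 1 - offset];
    }

    // "try" variant: performs its own availability check and returns
    // std::nullopt instead of reading before the start of the subject.
    std::optional<char> tryReadBackward(std::size_t offset) const
    {
        if (offset >= m_position)
            return std::nullopt;
        return m_input[m_position - 1 - offset];
    }

private:
    std::string_view m_input;
    std::size_t m_position;
};

// Match a fixed literal that must end at the current position, walking both
// the literal and the subject right to left.
bool matchLiteralBackward(const InputStream& in, std::string_view literal)
{
    // Fixed length: one up-front check makes every later read safe.
    if (!in.haveInputBehind(literal.size()))
        return false;
    for (std::size_t i = 0; i < literal.size(); ++i) {
        char expected = literal[literal.size() - 1 - i];
        if (in.readBackwardChecked(i) != expected)
            return false;
    }
    return true;
}

// Greedy backward scan for a variable-count term such as `a*` inside a
// lookbehind: the number of reads is unknown, so each read checks itself.
std::size_t countRepeatsBackward(const InputStream& in, char ch)
{
    std::size_t count = 0;
    while (auto c = in.tryReadBackward(count)) {
        if (*c != ch)
            break;
        ++count;
    }
    return count;
}
```

For instance, testing /(?<=ab)c/ at the c in "xabc" corresponds to matchLiteralBackward(InputStream("xabc", 3), "ab"), which returns true; with the position at index 1 of "abc", the up-front check fails because only one character lies behind it.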
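To make the forward-reference rule concrete: in a pattern such as /(?<=\1(\w))d/, the \1 precedes capture group 1 in the source, so a left-to-right parser first sees it as a forward reference, yet because the lookbehind is matched right to left, (\w) is matched before \1. Below is a minimal sketch of that end-of-group resolution over a hypothetical, flattened term list. Term, TermKind, and resolveReferences() are illustrative names, not Yarr's PatternTerm or YarrPatternConstructor API, and nested groups are ignored.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical, flattened term model for illustration only; not PatternTerm.
enum class TermKind { Capture, Reference, BackReference, ForwardReference, Other };

struct Term {
    TermKind kind;
    unsigned subpatternId = 0; // capture number for captures and references
};

// Classify the unresolved references (TermKind::Reference) among one group's
// terms, given in source (left-to-right) order. `capturesBefore` holds the
// captures that appear earlier in the pattern, outside this group.
void resolveReferences(std::vector<Term>& terms, const std::set<unsigned>& capturesBefore, bool backwardGroup)
{
    std::set<unsigned> capturesSeen = capturesBefore;
    std::vector<std::size_t> savedForwardReferences;

    for (std::size_t i = 0; i < terms.size(); ++i) {
        Term& term = terms[i];
        if (term.kind == TermKind::Capture) {
            capturesSeen.insert(term.subpatternId);
        } else if (term.kind == TermKind::Reference) {
            if (capturesSeen.count(term.subpatternId)) {
                term.kind = TermKind::BackReference;
            } else {
                // The capture has not been seen yet while parsing left to right.
                term.kind = TermKind::ForwardReference;
                if (backwardGroup)
                    savedForwardReferences.push_back(i);
            }
        }
    }

    // End of a backward-matching group: if the referenced capture was found
    // to the right of a saved forward reference, that capture is matched first
    // when the group runs right to left, so treat it as a back reference.
    for (std::size_t i : savedForwardReferences) {
        assert(terms[i].kind == TermKind::ForwardReference);
        if (capturesSeen.count(terms[i].subpatternId))
            terms[i].kind = TermKind::BackReference;
    }
}
```

The same two terms in a forward-matching group stay a forward reference, as does a reference whose capture lies outside the lookbehind group.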
As part of this work, the ByteTerm dumping code was significantly updated so that ByteCode can be dumped not only after it has been generated, but also while it is being interpreted. Dumping ByteTerms while interpreting is enabled with the Interpreter::verbose compile-time constant.

Reviewed by Yusuke Suzuki.

* JSTests/stress/regexp-lookbehind.js: New tests.
(arrayToString):
(dumpValue):
(compareArray):
(testRegExp):
* JSTests/test262/config.yaml:
* Source/JavaScriptCore/runtime/RegExp.cpp:
(JSC::RegExp::compile):
(JSC::RegExp::compileMatchOnly):
* Source/JavaScriptCore/yarr/YarrInterpreter.cpp:
(JSC::Yarr::ByteTermDumper::ByteTermDumper):
(JSC::Yarr::ByteTermDumper::unicode):
(JSC::Yarr::Interpreter::InputStream::readForCharacterDump):
(JSC::Yarr::Interpreter::InputStream::tryReadBackward):
(JSC::Yarr::Interpreter::InputStream::tryUncheckInput):
(JSC::Yarr::Interpreter::InputStream::isValidNegativeInputOffset):
(JSC::Yarr::Interpreter::InputStream::dump const):
(JSC::Yarr::Interpreter::checkCharacter):
(JSC::Yarr::Interpreter::checkSurrogatePair):
(JSC::Yarr::Interpreter::checkCasedCharacter):
(JSC::Yarr::Interpreter::checkCharacterClass):
(JSC::Yarr::Interpreter::checkCharacterClassDontAdvanceInputForNonBMP):
(JSC::Yarr::Interpreter::tryConsumeBackReference):
(JSC::Yarr::Interpreter::matchAssertionWordBoundary):
(JSC::Yarr::Interpreter::backtrackPatternCharacter):
(JSC::Yarr::Interpreter::backtrackPatternCasedCharacter):
(JSC::Yarr::Interpreter::matchCharacterClass):
(JSC::Yarr::Interpreter::backtrackCharacterClass):
(JSC::Yarr::Interpreter::matchBackReference):
(JSC::Yarr::Interpreter::backtrackBackReference):
(JSC::Yarr::Interpreter::recordParenthesesMatch):
(JSC::Yarr::Interpreter::matchParenthesesOnceBegin):
(JSC::Yarr::Interpreter::matchParenthesesOnceEnd):
(JSC::Yarr::Interpreter::backtrackParenthesesOnceEnd):
(JSC::Yarr::Interpreter::matchParentheticalAssertionBegin):
(JSC::Yarr::Interpreter::backtrackParentheticalAssertionBegin):
(JSC::Yarr::Interpreter::matchDisjunction):
(JSC::Yarr::ByteCompiler::compile):
(JSC::Yarr::ByteCompiler::haveCheckedInput):
(JSC::Yarr::ByteCompiler::assertionWordBoundary):
(JSC::Yarr::ByteCompiler::atomPatternCharacter):
(JSC::Yarr::ByteCompiler::atomCharacterClass):
(JSC::Yarr::ByteCompiler::atomBackReference):
(JSC::Yarr::ByteCompiler::atomParenthesesOnceBegin):
(JSC::Yarr::ByteCompiler::atomParenthesesTerminalBegin):
(JSC::Yarr::ByteCompiler::atomParenthesesSubpatternBegin):
(JSC::Yarr::ByteCompiler::atomParentheticalAssertionBegin):
(JSC::Yarr::ByteCompiler::atomParentheticalAssertionEnd):
(JSC::Yarr::ByteCompiler::atomParenthesesSubpatternEnd):
(JSC::Yarr::ByteCompiler::atomParenthesesOnceEnd):
(JSC::Yarr::ByteCompiler::atomParenthesesTerminalEnd):
(JSC::Yarr::ByteCompiler::emitDisjunction):
(JSC::Yarr::ByteCompiler::isSafeToRecurse):
(JSC::Yarr::ByteTermDumper::dumpTerm):
(JSC::Yarr::ByteTermDumper::dumpDisjunction):
(JSC::Yarr::Interpreter::InputStream::readPair): Deleted.
(JSC::Yarr::ByteCompiler::dumpDisjunction): Deleted.
* Source/JavaScriptCore/yarr/YarrInterpreter.h:
(JSC::Yarr::ByteTerm::ByteTerm):
(JSC::Yarr::ByteTerm::HaveCheckedInput):
(JSC::Yarr::ByteTerm::WordBoundary):
(JSC::Yarr::ByteTerm::BackReference):
(JSC::Yarr::ByteTerm::isCharacterType):
(JSC::Yarr::ByteTerm::isCasedCharacterType):
(JSC::Yarr::ByteTerm::isCharacterClass):
(JSC::Yarr::ByteTerm::matchDirection):
* Source/JavaScriptCore/yarr/YarrJIT.cpp:
(JSC::Yarr::dumpCompileFailure):
* Source/JavaScriptCore/yarr/YarrJIT.h:
* Source/JavaScriptCore/yarr/YarrParser.h:
(JSC::Yarr::Parser::parseParenthesesBegin):
* Source/JavaScriptCore/yarr/YarrPattern.cpp:
(JSC::Yarr::YarrPatternConstructor::resetForReparsing):
(JSC::Yarr::YarrPatternConstructor::assertionBOL):
(JSC::Yarr::YarrPatternConstructor::atomPatternCharacter):
(JSC::Yarr::YarrPatternConstructor::atomBuiltInCharacterClass):
(JSC::Yarr::YarrPatternConstructor::atomParenthesesSubpatternBegin):
(JSC::Yarr::YarrPatternConstructor::atomParentheticalAssertionBegin):
(JSC::Yarr::YarrPatternConstructor::atomParenthesesEnd):
(JSC::Yarr::YarrPatternConstructor::atomBackReference):
(JSC::Yarr::YarrPatternConstructor::copyDisjunction):
(JSC::Yarr::YarrPatternConstructor::quantifyAtom):
(JSC::Yarr::YarrPatternConstructor::disjunction):
(JSC::Yarr::YarrPatternConstructor::ParenthesisContext::SavedContext::SavedContext):
(JSC::Yarr::YarrPatternConstructor::ParenthesisContext::SavedContext::restore):
(JSC::Yarr::YarrPatternConstructor::ParenthesisContext::ParenthesisContext):
(JSC::Yarr::YarrPatternConstructor::ParenthesisContext::push):
(JSC::Yarr::YarrPatternConstructor::ParenthesisContext::pop):
(JSC::Yarr::YarrPatternConstructor::ParenthesisContext::setInvert):
(JSC::Yarr::YarrPatternConstructor::ParenthesisContext::invert const):
(JSC::Yarr::YarrPatternConstructor::ParenthesisContext::setMatchDirection):
(JSC::Yarr::YarrPatternConstructor::ParenthesisContext::matchDirection const):
(JSC::Yarr::YarrPatternConstructor::ParenthesisContext::reset):
(JSC::Yarr::YarrPatternConstructor::pushParenthesisContext):
(JSC::Yarr::YarrPatternConstructor::popParenthesisContext):
(JSC::Yarr::YarrPatternConstructor::setParenthesisInvert):
(JSC::Yarr::YarrPatternConstructor::parenthesisInvert const):
(JSC::Yarr::YarrPatternConstructor::setParenthesisMatchDirection):
(JSC::Yarr::YarrPatternConstructor::parenthesisMatchDirection const):
(JSC::Yarr::YarrPattern::YarrPattern):
(JSC::Yarr::dumpCharacterClass):
(JSC::Yarr::PatternTerm::dump):
* Source/JavaScriptCore/yarr/YarrPattern.h:
(JSC::Yarr::PatternTerm::PatternTerm):
(JSC::Yarr::PatternTerm::convertToBackreference):
(JSC::Yarr::PatternTerm::setMatchDirection):
(JSC::Yarr::PatternTerm::matchDirection const):
(JSC::Yarr::PatternAlternative::PatternAlternative):
(JSC::Yarr::PatternAlternative::matchDirection const):
(JSC::Yarr::PatternDisjunction::addNewAlternative):
(JSC::Yarr::YarrPattern::resetForReparsing):
* Source/JavaScriptCore/yarr/YarrSyntaxChecker.cpp:
(JSC::Yarr::SyntaxChecker::atomParentheticalAssertionBegin):
* Source/WTF/wtf/PrintStream.cpp:
(WTF::printInternal):
* Source/WTF/wtf/PrintStream.h:
* Source/WebCore/contentextensions/URLFilterParser.cpp:
(WebCore::ContentExtensions::PatternParser::atomParentheticalAssertionBegin):

Canonical link: https://commits.webkit.org/257823@main