Adopt char32_t instead of UChar32 to sidestep ambiguity with int and int32_t

https://bugs.webkit.org/show_bug.cgi?id=265478
rdar://118900819

Reviewed by Alex Christensen.

Use char32_t instead of UChar32 almost everywhere. This allows us to remove
StringBuilder::appendCharacter since we can make StringBuilder::append handle
char32_t as a character rather than as int32_t (what UChar32 is defined as).
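
The ambiguity can be illustrated outside WebKit. What follows is a minimal sketch, not WebKit's StringBuilder: MiniBuilder and LegacyCodePoint are hypothetical names, and LegacyCodePoint stands in for UChar32 on the assumption, taken from the paragraph above, that UChar32 is defined as int32_t. A typedef does not create a new type, so an overload taking it would also catch plain integers; char32_t is a distinct type and can select its own overload.

```cpp
#include <cstdint>
#include <string>
#include <type_traits>

// Hypothetical stand-in for UChar32 (assumed here to be an alias of int32_t).
using LegacyCodePoint = std::int32_t;

// An alias is indistinguishable from int32_t during overload resolution.
static_assert(std::is_same_v<LegacyCodePoint, std::int32_t>, "alias, not a new type");
// char32_t is its own built-in type, so it can have a dedicated overload.
static_assert(!std::is_same_v<char32_t, std::int32_t>, "distinct from int32_t");

// Hypothetical toy builder that only mimics the overloading idea.
class MiniBuilder {
public:
    // Integers are appended as decimal text.
    void append(std::int32_t number) { m_text += std::to_string(number); }

    // Code points are appended as characters (encoded as UTF-8 to keep the
    // sketch small; surrogates and out-of-range values are not validated).
    void append(char32_t codePoint)
    {
        if (codePoint < 0x80)
            m_text += static_cast<char>(codePoint);
        else if (codePoint < 0x800) {
            m_text += static_cast<char>(0xC0 | (codePoint >> 6));
            m_text += static_cast<char>(0x80 | (codePoint & 0x3F));
        } else if (codePoint < 0x10000) {
            m_text += static_cast<char>(0xE0 | (codePoint >> 12));
            m_text += static_cast<char>(0x80 | ((codePoint >> 6) & 0x3F));
            m_text += static_cast<char>(0x80 | (codePoint & 0x3F));
        } else {
            m_text += static_cast<char>(0xF0 | (codePoint >> 18));
            m_text += static_cast<char>(0x80 | ((codePoint >> 12) & 0x3F));
            m_text += static_cast<char>(0x80 | ((codePoint >> 6) & 0x3F));
            m_text += static_cast<char>(0x80 | (codePoint & 0x3F));
        }
    }

    const std::string& text() const { return m_text; }

private:
    std::string m_text;
};

// The same numeric value takes a different path depending on its type.
std::string demo()
{
    MiniBuilder builder;
    builder.append(std::int32_t { 65 }); // integer overload: appends the digits
    builder.append(U'A');                // char32_t overload: appends the character
    return builder.text();
}
```

With a typedef-based code point type, both calls in demo() would have resolved to the integer overload, which is presumably why a separately named appendCharacter was needed before this change.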

The ICU functions and macros that work with UChar32 mostly just work with char32_t.
There is a tiny number of exceptions to that rule that result in us using UChar32
in a couple places, but this changes every other instance outside ICU.

Also fixed UTF-8 conversion in the XSLT code to work more efficiently with
StringBuilder; added a new FromUTF8 adapter to makeString and StringBuilder::append.

* Source/JavaScriptCore/parser/Lexer.cpp:
(JSC::ParsedUnicodeEscapeValue::ParsedUnicodeEscapeValue): Use char32_t.
(JSC::ParsedUnicodeEscapeValue::isValid const): Ditto.
(JSC::ParsedUnicodeEscapeValue::value const): Ditto.
(JSC::Lexer<CharacterType>::parseUnicodeEscape): Ditto.
(JSC::isNonLatin1IdentStart): Ditto.
(JSC::isIdentStart): Ditto.
(JSC::isSingleCharacterIdentStart): Ditto.
(JSC::isNonLatin1IdentPart): Ditto.
(JSC::isIdentPart): Ditto.
(JSC::isSingleCharacterIdentPart): Ditto.
(JSC::Lexer<LChar>::currentCodePoint const): Ditto.
(JSC::Lexer<UChar>::currentCodePoint const): Ditto.
(JSC::Lexer<CharacterType>::recordUnicodeCodePoint): Ditto.
(JSC::Lexer<CharacterType>::parseIdentifierSlowCase): Ditto.
(JSC::Lexer<T>::lexWithoutClearingLineTerminator): Ditto.
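
One consequence of the switch for ParsedUnicodeEscapeValue: char32_t is unsigned, so negative sentinels and a check such as m_value >= 0 no longer work. The sketch below mirrors the pattern visible in the Lexer.cpp diff (sentinels at the top of the 32-bit range, validity checked by inequality). EscapeValue is a hypothetical, simplified name; the assertions from the real class are omitted.

```cpp
#include <cstdint>

// Hypothetical simplified version of the pattern used in the Lexer.cpp diff.
class EscapeValue {
public:
    // The sentinels sit far above the Unicode range (maximum U+10FFFF),
    // so they cannot collide with a real code point.
    enum SpecialValueType : char32_t { Incomplete = 0xFFFFFFFEu, Invalid = 0xFFFFFFFFu };

    constexpr EscapeValue(char32_t value) : m_value(value) { }
    constexpr EscapeValue(SpecialValueType type) : m_value(type) { }

    // With an unsigned type, "m_value >= 0" would always be true,
    // so validity is tested against the sentinels explicitly.
    constexpr bool isValid() const { return m_value != Incomplete && m_value != Invalid; }
    constexpr bool isIncomplete() const { return m_value == Incomplete; }
    constexpr char32_t value() const { return m_value; }

private:
    char32_t m_value;
};

// Every char32_t value is non-negative, including the sentinel bit patterns.
static_assert(static_cast<char32_t>(0xFFFFFFFFu) >= 0, "char32_t is unsigned");
static_assert(!EscapeValue(EscapeValue::Invalid).isValid(), "sentinel is rejected");
static_assert(EscapeValue(EscapeValue::Incomplete).isIncomplete(), "sentinel is recognized");
static_assert(EscapeValue(U'A').isValid(), "ordinary code point is accepted");
```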

* Source/JavaScriptCore/parser/Lexer.h: Use char32_t.
Define errorCodePoint.

* Source/JavaScriptCore/runtime/JSGlobalObjectFunctions.cpp:
(JSC::encode): Use char32_t.
(JSC::decode): Use char32_t and U_SENTINEL.

* Source/JavaScriptCore/runtime/StringPrototype.cpp:
(JSC::codePointAt): Use char32_t.
* Source/JavaScriptCore/yarr/YarrCanonicalize.h:
(JSC::Yarr::canonicalCharacterSetInfo): Ditto.
(JSC::Yarr::canonicalRangeInfoFor): Ditto.
(JSC::Yarr::getCanonicalPair): Ditto.
(JSC::Yarr::isCanonicallyUnique): Ditto.
(JSC::Yarr::areCanonicallyEquivalent): Ditto.
* Source/JavaScriptCore/yarr/YarrCanonicalizeUCS2.cpp: Ditto.
* Source/JavaScriptCore/yarr/YarrCanonicalizeUCS2.js:
(printHeader): Ditto.

* Source/JavaScriptCore/yarr/YarrInterpreter.cpp:
(JSC::Yarr::Interpreter::InputStream::read): Use char32_t and errorCodePoint.
(JSC::Yarr::Interpreter::InputStream::readChecked): Ditto.
(JSC::Yarr::Interpreter::InputStream::readCheckedDontAdvance): Ditto.
(JSC::Yarr::Interpreter::InputStream::readForCharacterDump): Ditto.
(JSC::Yarr::Interpreter::InputStream::tryReadBackward): Ditto.
(JSC::Yarr::Interpreter::InputStream::readSurrogatePairChecked): Ditto.
(JSC::Yarr::Interpreter::InputStream::reread): Ditto.
(JSC::Yarr::Interpreter::InputStream::prev): Ditto.
(JSC::Yarr::Interpreter::testCharacterClass): Ditto.
(JSC::Yarr::Interpreter::checkCharacter): Ditto.
(JSC::Yarr::Interpreter::checkSurrogatePair): Ditto.
(JSC::Yarr::Interpreter::checkCasedCharacter): Ditto.
(JSC::Yarr::Interpreter::checkCharacterClass): Ditto.
(JSC::Yarr::Interpreter::checkCharacterClassDontAdvanceInputForNonBMP): Ditto.
(JSC::Yarr::ByteCompiler::atomPatternCharacter): Ditto.

* Source/JavaScriptCore/yarr/YarrInterpreter.h:
(JSC::Yarr::ByteTerm::ByteTerm): Use char32_t.
* Source/JavaScriptCore/yarr/YarrJIT.cpp:
(JSC::Yarr::BoyerMooreInfo::set): Ditto.
(JSC::Yarr::BoyerMooreInfo::addCharacters): Ditto.
* Source/JavaScriptCore/yarr/YarrJIT.h:
(JSC::Yarr::BoyerMooreFastCandidates::at const): Ditto.
(JSC::Yarr::BoyerMooreFastCandidates::add): Ditto.
(JSC::Yarr::BoyerMooreBitmap::add): Ditto.
(JSC::Yarr::BoyerMooreBitmap::addCharacters): Ditto.
(JSC::Yarr::BoyerMooreBitmap::addRanges): Ditto.

* Source/JavaScriptCore/yarr/YarrParser.h: Use char32_t and errorCodePoint.
(JSC::Yarr::Parser::CharacterClassParserDelegate::atomPatternCharacter): Ditto.
(JSC::Yarr::Parser::ClassSetParserDelegate::atomPatternCharacter): Ditto.
(JSC::Yarr::Parser::ClassStringDisjunctionParserDelegate::atomPatternCharacter): Ditto.
(JSC::Yarr::Parser::isIdentityEscapeAnError): Ditto.
(JSC::Yarr::Parser::parseEscape): Ditto.
(JSC::Yarr::Parser::consumePossibleSurrogatePair): Ditto.
(JSC::Yarr::Parser::consumeAndCheckIfValidClassSetCharacter): Ditto.
(JSC::Yarr::Parser::parseClassSet): Ditto.
(JSC::Yarr::Parser::parseClassStringDisjunction): Ditto.
(JSC::Yarr::Parser::peek): Ditto.
(JSC::Yarr::Parser::tryConsumeUnicodeEscape): Ditto.
(JSC::Yarr::Parser::tryConsumeIdentifierCharacter): Ditto.
(JSC::Yarr::Parser::isIdentifierStart): Ditto.
(JSC::Yarr::Parser::isIdentifierPart): Ditto.
(JSC::Yarr::Parser::isUnicodePropertyValueExpressionChar): Ditto.
(JSC::Yarr::Parser::consume): Ditto.
(JSC::Yarr::Parser::tryConsumeHex): Ditto.
(JSC::Yarr::Parser::tryConsumeGroupName): Ditto.
(JSC::Yarr::Parser::tryConsumeUnicodePropertyExpression): Ditto.

* Source/JavaScriptCore/yarr/YarrPattern.cpp:
(JSC::Yarr::CharacterClassConstructor::appendInverted): Use char32_t.
(JSC::Yarr::CharacterClassConstructor::putChar): Ditto.
(JSC::Yarr::CharacterClassConstructor::putCharNonUnion): Ditto.
(JSC::Yarr::CharacterClassConstructor::putUnicodeIgnoreCase): Ditto.
(JSC::Yarr::CharacterClassConstructor::putRange): Ditto.
(JSC::Yarr::CharacterClassConstructor::atomClassStringDisjunction): Ditto.
(JSC::Yarr::CharacterClassConstructor::performSetOpWithStrings): Ditto.
(JSC::Yarr::CharacterClassConstructor::performSetOpWithMatches): Ditto.
(JSC::Yarr::CharacterClassConstructor::compareUTF32Strings): Ditto.
(JSC::Yarr::CharacterClassConstructor::sort): Ditto.
(JSC::Yarr::CharacterClassConstructor::addSorted): Ditto.
(JSC::Yarr::CharacterClassConstructor::addSortedRange): Ditto.
(JSC::Yarr::CharacterClassConstructor::unionStrings): Ditto.
(JSC::Yarr::CharacterClassConstructor::intersectionStrings): Ditto.
(JSC::Yarr::CharacterClassConstructor::subtractionStrings): Ditto.
(JSC::Yarr::CharacterClassConstructor::asciiOpSorted): Ditto.
(JSC::Yarr::CharacterClassConstructor::unicodeOpSorted): Ditto.
(JSC::Yarr::CharacterClassConstructor::coalesceTables): Ditto.
(JSC::Yarr::YarrPatternConstructor::atomPatternCharacter): Ditto.
(JSC::Yarr::YarrPatternConstructor::atomCharacterClassAtom): Ditto.
(JSC::Yarr::YarrPatternConstructor::atomCharacterClassRange): Ditto.
(JSC::Yarr::YarrPatternConstructor::atomClassStringDisjunction): Ditto.
(JSC::Yarr::dumpUChar32): Ditto.
(JSC::Yarr::dumpCharacterClass): Ditto.
* Source/JavaScriptCore/yarr/YarrPattern.h:
(JSC::Yarr::CharacterRange::CharacterRange): Ditto.
(JSC::Yarr::CharacterClass::CharacterClass): Ditto.
(JSC::Yarr::ClassSet::ClassSet): Ditto.
(JSC::Yarr::PatternTerm::PatternTerm): Ditto.
* Source/JavaScriptCore/yarr/YarrSyntaxChecker.cpp:
(JSC::Yarr::SyntaxChecker::atomPatternCharacter): Ditto.
(JSC::Yarr::SyntaxChecker::atomClassStringDisjunction): Ditto.
* Source/JavaScriptCore/yarr/generateYarrCanonicalizeUnicode:
(Canonicalize.createTables): Ditto.
* Source/JavaScriptCore/yarr/generateYarrUnicodePropertyTables.py:
(PropertyData.convertStringToCppFormat): Ditto.
(PropertyData.dump): Ditto.

* Source/WTF/wtf/ASCIICType.h:
(WTF::isASCIIAlphaCaselessEqual): Updated to compile without warnings both on
platforms where char is signed and platforms where char is unsigned.

* Source/WTF/wtf/PrintStream.cpp:
(WTF::printInternal): Use char32_t.
* Source/WTF/wtf/PrintStream.h: Ditto.
* Source/WTF/wtf/URLHelpers.cpp:
(WTF::URLHelpers::isLookalikeCharacterOfScriptType<USCRIPT_ARMENIAN>): Ditto.
(WTF::URLHelpers::isLookalikeCharacterOfScriptType<USCRIPT_TAMIL>): Ditto.
(WTF::URLHelpers::isLookalikeCharacterOfScriptType<USCRIPT_CANADIAN_ABORIGINAL>): Ditto.
(WTF::URLHelpers::isLookalikeCharacterOfScriptType<USCRIPT_THAI>): Ditto.
(WTF::URLHelpers::isOfScriptType): Ditto.
(WTF::URLHelpers::isLookalikeSequence): Ditto.
(WTF::URLHelpers::isLookalikeSequence<USCRIPT_ARABIC>): Ditto.
(WTF::URLHelpers::isLookalikeCharacter): Ditto.
(WTF::URLHelpers::allCharactersInAllowedIDNScriptList): Ditto.
(WTF::URLHelpers::escapeUnsafeCharacters): Ditto.
* Source/WTF/wtf/URLParser.cpp:
(WTF::appendCodePoint): Ditto.
(WTF::URLParser::appendToASCIIBuffer): Ditto.
(WTF::URLParser::utf8PercentEncode): Ditto.
(WTF::URLParser::utf8QueryEncode): Ditto.
(WTF::URLParser::isSingleDotPathSegment): Ditto.
(WTF::URLParser::isDoubleDotPathSegment): Ditto.
(WTF::URLParser::consumeSingleDotPathSegment): Ditto.
(WTF::URLParser::consumeDoubleDotPathSegment): Ditto.
(WTF::URLParser::checkLocalhostCodePoint): Ditto.
* Source/WTF/wtf/URLParser.h: Ditto.

* Source/WTF/wtf/text/AtomString.cpp:
(WTF::replaceUnpairedSurrogatesWithReplacementCharacterInternal): Use append.

* Source/WTF/wtf/text/CodePointIterator.h:
(WTF::CodePointIterator<LChar>::operator const): Use char32_t.
(WTF::CodePointIterator<UChar>::operator const): Ditto.
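
For illustration, reading a code point out of UTF-16 as a char32_t can be sketched as below. This is not WebKit's CodePointIterator and it does not use the ICU macros; codePointAt, kDecodeError and the surrogate helpers are hypothetical names, and the error value is an assumption (the real errorCodePoint is defined in Lexer.h, which is not shown here). The logic follows the shape of Lexer<UChar>::currentCodePoint in the diff: non-surrogates pass through, a lead followed by a trail is combined, and anything else is an error.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical error value, chosen outside the Unicode range for this sketch.
constexpr char32_t kDecodeError = 0xFFFFFFFFu;

constexpr bool isLeadSurrogate(char16_t unit) { return (unit & 0xFC00) == 0xD800; }
constexpr bool isTrailSurrogate(char16_t unit) { return (unit & 0xFC00) == 0xDC00; }

// Decodes the code point that starts at `index` (which must be in range).
// Returns kDecodeError for a lone trail surrogate or an unpaired lead surrogate.
constexpr char32_t codePointAt(std::u16string_view text, std::size_t index)
{
    char16_t unit = text[index];
    if (!isLeadSurrogate(unit) && !isTrailSurrogate(unit))
        return unit;
    if (!isLeadSurrogate(unit) || index + 1 >= text.size() || !isTrailSurrogate(text[index + 1]))
        return kDecodeError;
    // Combine the 10 payload bits of each surrogate and add the 0x10000 offset.
    return 0x10000
        + ((static_cast<char32_t>(unit) - 0xD800) << 10)
        + (static_cast<char32_t>(text[index + 1]) - 0xDC00);
}
```

Because the result is a char32_t, it can be compared directly with literals such as U'A' without any cast.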

* Source/WTF/wtf/text/StringBuilder.h:
(WTF::StringBuilder::appendCharacter): Deleted.

* Source/WTF/wtf/text/StringConcatenate.h: Added two new adapters. One makes
char32_t work with makeString and StringBuilder::append, and the other, FromUTF8,
supports converting from UTF-8 as part of the process of calling makeString or
StringBuilder::append.

* Source/WTF/wtf/text/StringImpl.cpp:
(WTF::StringImpl::characterStartingAt): Use char32_t.
* Source/WTF/wtf/text/StringImpl.h: Ditto.
* Source/WTF/wtf/text/StringView.h:
(WTF::StringView::CodePoints::Iterator::operator* const): Ditto.
* Source/WTF/wtf/text/WTFString.cpp:
(WTF::String::characterStartingAt const): Ditto.
(WTF::String::fromCodePoint): Ditto.
* Source/WTF/wtf/text/WTFString.h: Ditto.
* Source/WTF/wtf/unicode/CharacterNames.h: Ditto.

* Source/WTF/wtf/unicode/UTF8Conversion.cpp:
(WTF::Unicode::convertLatin1ToUTF8): Use char32_t and sentinelCodePoint.
Removed unneeded reinterpret_cast.
(WTF::Unicode::convertUTF16ToUTF8): Ditto.
(WTF::Unicode::convertUTF8ToUTF16Impl): Ditto.
(WTF::Unicode::calculateStringHashAndLengthFromUTF8MaskingTop8Bits): Ditto.
(WTF::Unicode::equalUTF16WithUTF8): Rewrite in a straightforward way using
U8_NEXT and U16_NEXT_UNSAFE instead of our own more complex code.
(WTF::Unicode::equalLatin1WithUTF8): Rewrite in a straightforward way using
U8_NEXT instead of our own more complex code.
(WTF::Unicode::computeUTFLengths): Added.
* Source/WTF/wtf/unicode/UTF8Conversion.h: Added computeUTFLengths.

* Source/WebCore/PAL/pal/text/EncodingTables.cpp: Use char32_t.
* Source/WebCore/PAL/pal/text/EncodingTables.h: Ditto.
* Source/WebCore/PAL/pal/text/TextCodec.cpp:
(PAL::TextCodec::getUnencodableReplacement): Ditto.
* Source/WebCore/PAL/pal/text/TextCodec.h: Ditto.
* Source/WebCore/PAL/pal/text/TextCodecCJK.cpp:
(PAL::TextCodecCJK::eucJPDecode): Ditto.
(PAL::eucJPEncode): Ditto.
(PAL::iso2022JPEncode): Ditto.
(PAL::shiftJISEncode): Ditto.
(PAL::eucKREncode): Ditto.
(PAL::big5Encode): Ditto.
(PAL::gb18030Ranges): Ditto.
(PAL::gb18030RangesCodePoint): Ditto.
(PAL::gb18030RangesPointer): Ditto.
(PAL::gb180302022Encode): Ditto.
(PAL::gb180302022Decode): Ditto.
(PAL::TextCodecCJK::gb18030Decode): Ditto.
(PAL::gbEncodeShared): Ditto.
(PAL::gb18030Encode): Ditto.
(PAL::gbkEncode): Ditto.
(PAL::appendDecimal): Ditto.
(PAL::urlEncodedEntityUnencodableHandler): Ditto.
(PAL::entityUnencodableHandler): Ditto.
(PAL::unencodableHandler): Ditto.
(PAL::TextCodecCJK::big5Decode): Ditto. Use append. The old code was using
appendCharacter, which was overkill for appending BMP characters.

* Source/WebCore/PAL/pal/text/TextCodecSingleByte.cpp:
(PAL::encode): Use char32_t.
* Source/WebCore/PAL/pal/text/TextCodecUTF16.cpp:
(PAL::TextCodecUTF16::decode): Ditto.
* Source/WebCore/accessibility/AXObjectCache.cpp:
(WebCore::characterForCharacterOffset): Ditto.
(WebCore::AXObjectCache::characterAfter): Ditto.
(WebCore::AXObjectCache::characterBefore): Ditto.
* Source/WebCore/accessibility/AXObjectCache.h: Ditto.
* Source/WebCore/accessibility/atspi/AccessibilityObjectTextAtspi.cpp:
(WebCore::offsetMapping): Ditto.
* Source/WebCore/contentextensions/URLFilterParser.cpp:
(WebCore::ContentExtensions::PatternParser::atomClassStringDisjunction): Ditto.
* Source/WebCore/css/CSSFontFace.cpp:
(WebCore::CSSFontFace::rangesMatchCodePoint const): Ditto.
* Source/WebCore/css/CSSFontFace.h: Ditto.
* Source/WebCore/css/CSSFontFaceSet.cpp:
(WebCore::codePointsFromString): Ditto.

* Source/WebCore/css/CSSMarkup.cpp:
(WebCore::serializeCharacter): Use append and char32_t.
(WebCore::serializeCharacterAsCodePoint): Ditto.
(WebCore::serializeIdentifier): Ditto.
(WebCore::serializeString): Ditto.

* Source/WebCore/css/CSSUnicodeRangeValue.h: Use char32_t.
* Source/WebCore/css/parser/CSSPropertyParserWorkerSafe.cpp:
(WebCore::CSSPropertyParserHelpersWorkerSafe::consumeUnicodeRange): Ditto.

* Source/WebCore/css/parser/CSSTokenizer.cpp:
(WebCore::CSSTokenizer::consumeStringTokenUntil): Use append.
(WebCore::CSSTokenizer::consumeURLToken): Ditto.
(WebCore::CSSTokenizer::consumeName): Ditto.
(WebCore::CSSTokenizer::consumeEscape): Use char32_t.
* Source/WebCore/css/parser/CSSTokenizer.h: Ditto.

* Source/WebCore/dom/Document.cpp:
(WebCore::isValidNameStart): Use char32_t.
(WebCore::isValidNamePart): Ditto.
(WebCore::operator<): Ditto.
(WebCore::isPotentialCustomElementNameCharacter): Ditto.
(WebCore::isValidNameNonASCII): Ditto.
(WebCore::Document::parseQualifiedName): Ditto.
* Source/WebCore/editing/Editor.cpp:
(WebCore::Editor::editorUIUpdateTimerFired): Ditto.
(WebCore::candidateWouldReplaceText): Ditto.
* Source/WebCore/editing/ReplaceSelectionCommand.cpp:
(WebCore::isCharacterSmartReplaceExemptConsideringNonBreakingSpace): Ditto.
* Source/WebCore/editing/SmartReplace.cpp:
(WebCore::isCharacterSmartReplaceExempt): Ditto.
* Source/WebCore/editing/SmartReplace.h: Ditto.
* Source/WebCore/editing/SmartReplaceCF.cpp:
(WebCore::isCharacterSmartReplaceExempt): Ditto.
* Source/WebCore/editing/TextIterator.cpp:
(WebCore::isNonLatin1Separator): Ditto.
(WebCore::isSeparator): Ditto.
(WebCore::SearchBuffer::SearchBuffer): Ditto.
(WebCore::SearchBuffer::isWordStartMatch const): Ditto.
* Source/WebCore/editing/TypingCommand.cpp:
(WebCore::TypingCommand::markMisspellingsAfterTyping): Ditto.
* Source/WebCore/editing/VisiblePosition.cpp:
(WebCore::VisiblePosition::characterAfter const): Ditto.
* Source/WebCore/editing/VisiblePosition.h:
(WebCore::VisiblePosition::characterBefore const): Ditto.
* Source/WebCore/editing/VisibleUnits.cpp:
(WebCore::charactersAroundPosition): Ditto.
* Source/WebCore/editing/VisibleUnits.h: Ditto.

* Source/WebCore/html/parser/HTMLEntityParser.cpp:
(WebCore::makeEntity): Use char32_t.
(WebCore::consumeDecimalHTMLEntity): Use uint32_t.
(WebCore::consumeHexHTMLEntity): Ditto.

* Source/WebCore/layout/formattingContexts/inline/InlineContentBreaker.cpp:
(WebCore::Layout::canBreakBefore): Use char32_t.
* Source/WebCore/layout/formattingContexts/inline/InlineItemsBuilder.cpp:
(WebCore::Layout::replaceNonPreservedNewLineAndTabCharactersAndAppend): Ditto.

* Source/WebCore/layout/formattingContexts/inline/text/TextUtil.cpp:
(WebCore::Layout::fallbackFontsForRunWithIterator): Use char32_t.
Removed unhelpful second call to u_toupper.
(WebCore::Layout::enclosingGlyphBoundsForRunWithIterator): Ditto.
(WebCore::Layout::TextUtil::isStrongDirectionalityCharacter): Ditto.
(WebCore::Layout::TextUtil::containsStrongDirectionalityText): Ditto.
(WebCore::Layout::TextUtil::firstUserPerceivedCharacterLength): Ditto.
(WebCore::Layout::TextUtil::hasPositionDependentContentWidth): Ditto.
* Source/WebCore/layout/formattingContexts/inline/text/TextUtil.h: Ditto.

* Source/WebCore/mathml/MathMLOperatorDictionary.cpp:
(WebCore::ExtractChar): Use char32_t.
(WebCore::ExtractKeyHorizontal): Ditto.
(WebCore::MathMLOperatorDictionary::search): Ditto.
(WebCore::MathMLOperatorDictionary::isVertical): Ditto.
* Source/WebCore/mathml/MathMLOperatorDictionary.h: Ditto.
* Source/WebCore/mathml/MathMLOperatorElement.h: Ditto.
* Source/WebCore/mathml/MathMLTokenElement.cpp:
(WebCore::MathMLTokenElement::convertToSingleCodePoint): Ditto.
* Source/WebCore/mathml/MathMLTokenElement.h: Ditto.
* Source/WebCore/platform/graphics/ComplexTextController.cpp:
(WebCore::ComplexTextController::advanceByCombiningCharacterSequence): Ditto.
(WebCore::ComplexTextController::collectComplexTextRuns): Ditto.
(WebCore::ComplexTextController::adjustGlyphsAndAdvances): Ditto.
(WebCore::ComplexTextController::ComplexTextRun::ComplexTextRun): Ditto.
* Source/WebCore/platform/graphics/ComplexTextController.h: Ditto.
* Source/WebCore/platform/graphics/ComposedCharacterClusterTextIterator.h:
(WebCore::ComposedCharacterClusterTextIterator::consume): Ditto.
* Source/WebCore/platform/graphics/Font.cpp:
(WebCore::Font::platformGlyphInit): Ditto.
(WebCore::codePointSupportIndex): Ditto.
(WebCore::Font::glyphForCharacter const): Ditto.
(WebCore::Font::glyphDataForCharacter const): Ditto.
(WebCore::Font::supportsCodePoint const): Ditto.
* Source/WebCore/platform/graphics/Font.h: Ditto.
* Source/WebCore/platform/graphics/FontCascade.cpp:
(WebCore::FontCascade::glyphDataForCharacter const): Ditto.
(WebCore::FontCascade::characterRangeCodePath): Ditto.
(WebCore::FontCascade::isCJKIdeograph): Ditto.
(WebCore::FontCascade::isCJKIdeographOrSymbol): Ditto.
(WebCore::FontCascade::expansionOpportunityCountInternal): Ditto.
(WebCore::FontCascade::leftExpansionOpportunity): Ditto.
(WebCore::FontCascade::rightExpansionOpportunity): Ditto.
(WebCore::FontCascade::canReceiveTextEmphasis): Ditto.
(WebCore::computeUnderlineType): Ditto.
(WebCore::FontCascade::getEmphasisMarkGlyphData const): Ditto.
(WebCore::FontCascade::fontForCombiningCharacterSequence const): Ditto.
(WebCore::shouldSynthesizeSmallCaps): Ditto.
(WebCore::capitalized): Ditto.
* Source/WebCore/platform/graphics/FontCascade.h: Ditto.
(WebCore::FontCascade::treatAsSpace): Ditto.
(WebCore::FontCascade::isCharacterWhoseGlyphsShouldBeDeletedForTextRendering): Ditto.
(WebCore::FontCascade::treatAsZeroWidthSpace): Ditto.
(WebCore::FontCascade::treatAsZeroWidthSpaceInComplexScript): Ditto.
* Source/WebCore/platform/graphics/FontCascadeFonts.cpp:
(WebCore::MixedFontGlyphPage::glyphDataForCharacter const): Ditto.
(WebCore::MixedFontGlyphPage::setGlyphDataForCharacter): Ditto.
(WebCore::FontCascadeFonts::GlyphPageCacheEntry::glyphDataForCharacter): Ditto.
(WebCore::FontCascadeFonts::GlyphPageCacheEntry::setGlyphDataForCharacter): Ditto.
(WebCore::isInRange): Ditto.
(WebCore::shouldIgnoreRotation): Ditto.
(WebCore::glyphDataForNonCJKCharacterWithGlyphOrientation): Ditto.
(WebCore::findBestFallbackFont): Ditto.
(WebCore::FontCascadeFonts::glyphDataForSystemFallback): Ditto.
(WebCore::FontCascadeFonts::glyphDataForVariant): Ditto.
(WebCore::glyphPageFromFontRanges): Ditto.
(WebCore::FontCascadeFonts::glyphDataForCharacter): Ditto.
* Source/WebCore/platform/graphics/FontCascadeFonts.h: Ditto.
* Source/WebCore/platform/graphics/FontRanges.cpp:
(WebCore::FontRanges::glyphDataForCharacter const): Ditto.
(WebCore::FontRanges::fontForCharacter const): Ditto.
* Source/WebCore/platform/graphics/FontRanges.h: Ditto.
(WebCore::FontRanges::Range::Range): Ditto.
(WebCore::FontRanges::Range::from const): Ditto.
(WebCore::FontRanges::Range::to const): Ditto.
* Source/WebCore/platform/graphics/GlyphPage.h:
(WebCore::GlyphPage::indexForCodePoint): Ditto.
(WebCore::GlyphPage::pageNumberForCodePoint): Ditto.
(WebCore::GlyphPage::startingCodePointInPageNumber): Ditto.
(WebCore::GlyphPage::glyphDataForCharacter const): Ditto.
(WebCore::GlyphPage::glyphForCharacter const): Ditto.
* Source/WebCore/platform/graphics/Latin1TextIterator.h:
(WebCore::Latin1TextIterator::consume): Ditto.
* Source/WebCore/platform/graphics/SurrogatePairAwareTextIterator.h:
(WebCore::SurrogatePairAwareTextIterator::consume): Ditto.
* Source/WebCore/platform/graphics/WidthIterator.cpp:
(WebCore::addToGlyphBuffer): Ditto.
(WebCore::updateCharacterAndSmallCapsIfNeeded): Ditto.
(WebCore::WidthIterator::advanceInternal): Ditto.
(WebCore::WidthIterator::characterCanUseSimplifiedTextMeasuring): Ditto.
* Source/WebCore/platform/graphics/WidthIterator.h: Ditto.
* Source/WebCore/platform/graphics/cairo/FontCairo.cpp:
(WebCore::FontCascade::resolveEmojiPolicy): Ditto.
* Source/WebCore/platform/graphics/cairo/FontCairoHarfbuzzNG.cpp:
(WebCore::characterSequenceIsEmoji): Ditto.
(WebCore::FontCascade::fontForCombiningCharacterSequence const):
* Source/WebCore/platform/graphics/coretext/FontCascadeCoreText.cpp:
(WebCore::FontCascade::fontForCombiningCharacterSequence const): Ditto.
(WebCore::FontCascade::resolveEmojiPolicy): Ditto.
* Source/WebCore/platform/graphics/coretext/FontCoreText.cpp:
(WebCore::Font::platformSupportsCodePoint const): Ditto.

* Source/WebCore/platform/graphics/freetype/FontCacheFreeType.cpp: Remove
unused UTF16UChar32Iterator.h.
* Source/WebCore/platform/graphics/freetype/FontSetCache.cpp: Remove unused
UTF16UChar32Iterator.h.
(WebCore::FontSetCache::bestForCharacters): Use char32_t.
* Source/WebCore/platform/graphics/freetype/GlyphPageTreeNodeFreeType.cpp:
Remove UTF16UChar32Iterator.h.
(WebCore::GlyphPage::fill): Use char32_t and U16_NEXT.
* Source/WebCore/platform/graphics/freetype/SimpleFontDataFreeType.cpp: Remove
unused UTF16UChar32Iterator.h.
(WebCore::Font::platformSupportsCodePoint const): Use char32_t.
* Source/WebCore/platform/graphics/freetype/UTF16UChar32Iterator.h: Removed.

* Source/WebCore/platform/graphics/harfbuzz/ComplexTextControllerHarfBuzz.cpp:
(WebCore::characterScript): Use char32_t.
(WebCore::findNextRun): Ditto.
* Source/WebCore/platform/graphics/mac/ComplexTextControllerCoreText.mm:
(WebCore::ComplexTextController::collectComplexTextRunsForCharacters): Ditto.
* Source/WebCore/platform/graphics/win/FontCacheWin.cpp:
(WebCore::currentFontContainsCharacter): Ditto.
* Source/WebCore/platform/graphics/win/SimpleFontDataCairoWin.cpp:
(WebCore::Font::platformSupportsCodePoint const): Ditto.

* Source/WebCore/platform/libwpe/PlatformKeyboardEventLibWPE.cpp:
(WebCore::PlatformKeyboardEvent::keyValueForWPEKeyCode): Use char32_t and
makeString.
(WebCore::PlatformKeyboardEvent::singleCharacterString): Ditto.

* Source/WebCore/platform/network/soup/ResourceResponseSoup.cpp:
(WebCore::sanitizeFilename): Use HashSet<uint16_t> for a set of UChar rather
than a HashSet<UChar32>.

* Source/WebCore/platform/text/CharacterProperties.h:
(WebCore::isEmojiGroupCandidate): Use char32_t.
(WebCore::isEmojiFitzpatrickModifier): Ditto.
(WebCore::isVariationSelector): Ditto.
(WebCore::isEmojiKeycapBase): Ditto.
(WebCore::isEmojiRegionalIndicator): Ditto.
(WebCore::isEmojiWithPresentationByDefault): Ditto.
(WebCore::isEmojiModifierBase): Ditto.
(WebCore::isDefaultIgnorableCodePoint): Ditto.
(WebCore::isControlCharacter): Ditto.
(WebCore::isPrivateUseAreaCharacter): Ditto.
* Source/WebCore/platform/text/TextBoundaries.cpp:
(WebCore::endOfFirstWordBoundaryContext): Ditto.
(WebCore::startOfLastWordBoundaryContext): Ditto.
* Source/WebCore/platform/text/TextBoundaries.h:
(WebCore::requiresContextForWordBoundary): Ditto.

* Source/WebCore/platform/text/hyphen/HyphenationLibHyphen.cpp:
(WebCore::countLeadingSpaces): Use char32_t and U_SENTINEL.

* Source/WebCore/platform/text/mac/TextBoundaries.mm:
(WebCore::isSkipCharacter): Use char32_t.
(WebCore::isWhitespaceCharacter): Ditto.
(WebCore::isWordDelimitingCharacter): Ditto.
(WebCore::isSymbolCharacter): Ditto.
(WebCore::isAmbiguousBoundaryCharacter): Ditto.
(WebCore::findSimpleWordBoundary): Ditto.
(WebCore::findWordBoundary): Ditto.
* Source/WebCore/rendering/LegacyInlineIterator.h:
(WebCore::LegacyInlineIterator::incrementByCodePointInTextNode): Ditto.

* Source/WebCore/rendering/RenderText.cpp:
(WebCore::capitalize): Use char32_t.
(WebCore::RenderText::initiateFontLoadingByAccessingGlyphDataAndComputeCanUseSimplifiedTextMeasuring):
Use char32_t and append.
(WebCore::convertToFullSizeKana): Use char32_t. Updated fast path where no conversion
is needed to be even faster by not allocating a StringBuilder.

* Source/WebCore/rendering/mathml/MathOperator.cpp:
(WebCore::MathOperator::setOperator): Use char32_t.
(WebCore::MathOperator::getGlyph const): Ditto.
(WebCore::glyphDataForCodePointOrFallbackGlyph): Ditto.
* Source/WebCore/rendering/mathml/MathOperator.h: Ditto.
* Source/WebCore/rendering/mathml/RenderMathMLFencedOperator.h: Ditto.
* Source/WebCore/rendering/mathml/RenderMathMLOperator.cpp: Ditto.
(WebCore::RenderMathMLOperator::textContent const): Ditto.
(WebCore::RenderMathMLOperator::isInvisibleOperator const): Ditto.
* Source/WebCore/rendering/mathml/RenderMathMLOperator.h: Ditto.
* Source/WebCore/rendering/mathml/RenderMathMLToken.cpp:
(WebCore::ExtractKey): Ditto.
(WebCore::MathVariantMappingSearch): Ditto.
(WebCore::mathVariant): Ditto.
(WebCore::RenderMathMLToken::updateMathVariantGlyph): Ditto.
* Source/WebCore/rendering/mathml/RenderMathMLToken.h: Ditto.
* Source/WebCore/rendering/updating/RenderTreeBuilderFirstLetter.cpp:
(WebCore::isPunctuationForFirstLetter): Ditto.
(WebCore::shouldSkipForFirstLetter): Ditto.
(WebCore::RenderTreeBuilder::FirstLetter::createRenderers): Ditto.
* Source/WebCore/svg/SVGParserUtilities.h: Ditto.
* Source/WebCore/svg/SVGToOTFFontConversion.cpp:
(WebCore::SVGToOTFFontConverter::appendFormat12CMAPTable): Ditto.
(WebCore::SVGToOTFFontConverter::appendFormat4CMAPTable): Ditto.
(WebCore::SVGToOTFFontConverter::appendCMAPTable): Ditto.
(WebCore::SVGToOTFFontConverter::firstGlyph const): Ditto.
(WebCore::codepointToString): Ditto.
(WebCore::SVGToOTFFontConverter::glyphsForCodepoint const): Ditto.
(WebCore::SVGToOTFFontConverter::appendLigatureGlyphs): Ditto.
(WebCore::SVGToOTFFontConverter::compareCodepointsLexicographically): Ditto.

* Source/WebCore/xml/XSLTProcessorLibxslt.cpp:
(WebCore::writeToStringBuilder): Use FromUTF8, resolving the inefficiency
mentioned here in a FIXME comment and removing a use of UChar32.

* Source/WebKit/Shared/EditorState.h: Use char32_t.
* Source/WebKit/Shared/EditorState.serialization.in: Ditto.

* Source/WebKit/Shared/wpe/WebKeyboardEventWPE.cpp:
(WebKit::WebKeyboardEvent::keyValueStringForWPEKeyval): Use char32_t and
makeString.
(WebKit::WebKeyboardEvent::singleCharacterStringForWPEKeyval): Ditto.

* Source/WebKit/UIProcess/Automation/SimulatedInputDispatcher.h: Ditto.
* Source/WebKit/UIProcess/Automation/WebAutomationSession.cpp:
(WebKit::pressedCharKey): Ditto.
* Source/WebKit/UIProcess/ios/WKContentViewInteraction.h: Ditto.

* Source/WebKit/UIProcess/ios/WKContentViewInteraction.mm:
(textRelativeToSelectionStart): Use char32_t and append.

* Source/WebKitLegacy/ios/WebCoreSupport/WebFrameIOS.mm:
(isAlphaNumericCharacter): Use char32_t.
(SimpleSmartExtendStart): Ditto.
(SimpleSmartExtendEnd): Ditto.
* Source/WebKitLegacy/ios/WebCoreSupport/WebVisiblePosition.mm:
(-[WebVisiblePosition positionAtStartOrEndOfWord]): Ditto.
(-[WebVisiblePosition atAlphaNumericBoundaryInDirection:]): Ditto.

* Tools/TestWebKitAPI/Tests/WTF/StringBuilder.cpp: Use char32_t
and literals of type char32_t and char16_t.
* Tools/TestWebKitAPI/Tests/WTF/StringView.cpp: Ditto.

Canonical link: https://commits.webkit.org/271373@main
darinadler committed Dec 1, 2023
1 parent 920ba07 commit c3f9044
Showing 136 changed files with 969 additions and 1,012 deletions.
47 changes: 23 additions & 24 deletions Source/JavaScriptCore/parser/Lexer.cpp
@@ -611,37 +611,37 @@ ALWAYS_INLINE T Lexer<T>::peek(int offset) const
}

struct ParsedUnicodeEscapeValue {
-ParsedUnicodeEscapeValue(UChar32 value)
+ParsedUnicodeEscapeValue(char32_t value)
: m_value(value)
{
ASSERT(isValid());
}

-enum SpecialValueType { Incomplete = -2, Invalid = -1 };
+enum SpecialValueType : char32_t { Incomplete = 0xFFFFFFFEu, Invalid = 0xFFFFFFFFu };
ParsedUnicodeEscapeValue(SpecialValueType type)
: m_value(type)
{
}

-bool isValid() const { return m_value >= 0; }
+bool isValid() const { return m_value != Incomplete && m_value != Invalid; }
bool isIncomplete() const { return m_value == Incomplete; }

-UChar32 value() const
+char32_t value() const
{
ASSERT(isValid());
return m_value;
}

private:
-UChar32 m_value;
+char32_t m_value;
};

template<typename CharacterType>
ParsedUnicodeEscapeValue Lexer<CharacterType>::parseUnicodeEscape()
{
if (m_current == '{') {
shift();
-UChar32 codePoint = 0;
+char32_t codePoint = 0;
do {
if (!isASCIIHexDigit(m_current))
return m_current ? ParsedUnicodeEscapeValue::Invalid : ParsedUnicodeEscapeValue::Incomplete;
@@ -728,15 +728,15 @@ ALWAYS_INLINE void Lexer<T>::skipWhitespace()
shift();
}

-static bool isNonLatin1IdentStart(UChar32 c)
+static bool isNonLatin1IdentStart(char32_t c)
{
return u_hasBinaryProperty(c, UCHAR_ID_START);
}

template<typename CharacterType>
static ALWAYS_INLINE bool isIdentStart(CharacterType c)
{
-static_assert(std::is_same_v<CharacterType, LChar> || std::is_same_v<CharacterType, UChar32>, "Call isSingleCharacterIdentStart for UChars that don't need to check for surrogate pairs");
+static_assert(std::is_same_v<CharacterType, LChar> || std::is_same_v<CharacterType, char32_t>, "Call isSingleCharacterIdentStart for UChars that don't need to check for surrogate pairs");
if (!isLatin1(c))
return isNonLatin1IdentStart(c);
return typesOfLatin1Characters[static_cast<LChar>(c)] == CharacterIdentifierStart;
@@ -746,7 +746,7 @@ static ALWAYS_INLINE UNUSED_FUNCTION bool isSingleCharacterIdentStart(UChar c)
{
if (LIKELY(isLatin1(c)))
return isIdentStart(static_cast<LChar>(c));
-return !U16_IS_SURROGATE(c) && isIdentStart(static_cast<UChar32>(c));
+return !U16_IS_SURROGATE(c) && isIdentStart(static_cast<char32_t>(c));
}

static ALWAYS_INLINE bool cannotBeIdentStart(LChar c)
@@ -761,15 +761,15 @@ static ALWAYS_INLINE bool cannotBeIdentStart(UChar c)
return Lexer<UChar>::isWhiteSpace(c) || Lexer<UChar>::isLineTerminator(c);
}

-static NEVER_INLINE bool isNonLatin1IdentPart(UChar32 c)
+static NEVER_INLINE bool isNonLatin1IdentPart(char32_t c)
{
return u_hasBinaryProperty(c, UCHAR_ID_CONTINUE) || c == 0x200C || c == 0x200D;
}

template<typename CharacterType>
static ALWAYS_INLINE bool isIdentPart(CharacterType c)
{
-static_assert(std::is_same_v<CharacterType, LChar> || std::is_same_v<CharacterType, UChar32>, "Call isSingleCharacterIdentPart for UChars that don't need to check for surrogate pairs");
+static_assert(std::is_same_v<CharacterType, LChar> || std::is_same_v<CharacterType, char32_t>, "Call isSingleCharacterIdentPart for UChars that don't need to check for surrogate pairs");
if (!isLatin1(c))
return isNonLatin1IdentPart(c);

@@ -783,7 +783,7 @@ static ALWAYS_INLINE bool isSingleCharacterIdentPart(UChar c)
{
if (LIKELY(isLatin1(c)))
return isIdentPart(static_cast<LChar>(c));
-return !U16_IS_SURROGATE(c) && isIdentPart(static_cast<UChar32>(c));
+return !U16_IS_SURROGATE(c) && isIdentPart(static_cast<char32_t>(c));
}

static ALWAYS_INLINE bool cannotBeIdentPartOrEscapeStart(LChar c)
@@ -802,24 +802,23 @@ static ALWAYS_INLINE bool cannotBeIdentPartOrEscapeStart(UChar c)


template<>
-ALWAYS_INLINE UChar32 Lexer<LChar>::currentCodePoint() const
+ALWAYS_INLINE char32_t Lexer<LChar>::currentCodePoint() const
{
return m_current;
}

template<>
-ALWAYS_INLINE UChar32 Lexer<UChar>::currentCodePoint() const
+ALWAYS_INLINE char32_t Lexer<UChar>::currentCodePoint() const
{
-ASSERT_WITH_MESSAGE(!isIdentStart(static_cast<UChar32>(U_SENTINEL)), "error values shouldn't appear as a valid identifier start code point");
+ASSERT_WITH_MESSAGE(!isIdentStart(errorCodePoint), "error values shouldn't appear as a valid identifier start code point");
if (!U16_IS_SURROGATE(m_current))
return m_current;

UChar trail = peek(1);
if (UNLIKELY(!U16_IS_LEAD(m_current) || !U16_IS_SURROGATE_TRAIL(trail)))
-return U_SENTINEL;
+return errorCodePoint;

-UChar32 codePoint = U16_GET_SUPPLEMENTARY(m_current, trail);
-return codePoint;
+return U16_GET_SUPPLEMENTARY(m_current, trail);
}
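
The surrogate handling in `Lexer<UChar>::currentCodePoint` can be sketched without ICU. This is an illustrative stand-alone version, not the WebKit code: `kErrorCodePoint`, `isSurrogate`, `isLeadSurrogate`, `isTrailSurrogate`, and `decodeCodePoint` are hypothetical names, and the bit tests are written out by hand in place of the `U16_*` macros.

```cpp
#include <cstdint>

// Hypothetical stand-in for Lexer::errorCodePoint (0xFFFFFFFFu in the diff).
constexpr char32_t kErrorCodePoint = 0xFFFFFFFFu;

// Surrogates occupy 0xD800..0xDFFF; leads are 0xD800..0xDBFF, trails 0xDC00..0xDFFF.
constexpr bool isSurrogate(char16_t unit) { return (unit & 0xF800) == 0xD800; }
constexpr bool isLeadSurrogate(char16_t unit) { return (unit & 0xFC00) == 0xD800; }
constexpr bool isTrailSurrogate(char16_t unit) { return (unit & 0xFC00) == 0xDC00; }

// Decodes the code point that starts at `current`; `next` is the following code unit.
constexpr char32_t decodeCodePoint(char16_t current, char16_t next)
{
    // A non-surrogate code unit is already a complete code point.
    if (!isSurrogate(current))
        return current;

    // An unpaired or out-of-order surrogate is reported through the sentinel.
    if (!isLeadSurrogate(current) || !isTrailSurrogate(next))
        return kErrorCodePoint;

    // Combine the two 10-bit payloads and add the supplementary-plane offset.
    return 0x10000
        + ((static_cast<char32_t>(current) - 0xD800) << 10)
        + (static_cast<char32_t>(next) - 0xDC00);
}

// The sentinel has the same bit pattern as -1 converted to char32_t.
static_assert(kErrorCodePoint == static_cast<char32_t>(-1));

static_assert(decodeCodePoint(u'A', 0) == U'A');
static_assert(decodeCodePoint(0xD83D, 0xDE00) == 0x1F600);
static_assert(decodeCodePoint(0xDE00, 0xD83D) == kErrorCodePoint);
```

The sentinel lies above 0x10FFFF, the largest valid code point, so it cannot collide with a successfully decoded value.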

template<typename CharacterType>
@@ -901,12 +900,12 @@ inline void Lexer<T>::record16(int c)
m_buffer16.append(static_cast<UChar>(c));
}

template<typename CharacterType> inline void Lexer<CharacterType>::recordUnicodeCodePoint(UChar32 codePoint)
template<typename CharacterType> inline void Lexer<CharacterType>::recordUnicodeCodePoint(char32_t codePoint)
{
ASSERT(codePoint >= 0);
ASSERT(codePoint <= UCHAR_MAX_VALUE);
if (U_IS_BMP(codePoint))
record16(codePoint);
record16(static_cast<UChar>(codePoint));
else {
UChar codeUnits[2] = { U16_LEAD(codePoint), U16_TRAIL(codePoint) };
append16(codeUnits, 2);
@@ -1119,8 +1118,8 @@ JSTokenType Lexer<CharacterType>::parseIdentifierSlowCase(JSTokenData* tokenData
if (UNLIKELY(!U16_IS_SURROGATE_LEAD(m_current)))
return INVALID_UNICODE_ENCODING_ERRORTOK;

UChar32 codePoint = currentCodePoint();
if (UNLIKELY(codePoint == U_SENTINEL))
char32_t codePoint = currentCodePoint();
if (UNLIKELY(codePoint == errorCodePoint))
return INVALID_UNICODE_ENCODING_ERRORTOK;
if (UNLIKELY(isStart ? !isNonLatin1IdentStart(codePoint) : !isNonLatin1IdentPart(codePoint)))
return INVALID_IDENTIFIER_UNICODE_ERRORTOK;
@@ -1928,7 +1927,7 @@ JSTokenType Lexer<T>::lexWithoutClearingLineTerminator(JSToken* tokenRecord, Opt
if (LIKELY(isLatin1(m_current)))
type = static_cast<CharacterType>(typesOfLatin1Characters[m_current]);
else {
UChar32 codePoint;
char32_t codePoint;
U16_GET(m_code, 0, 0, m_codeEnd - m_code, codePoint);
if (isNonLatin1IdentStart(codePoint))
type = CharacterIdentifierStart;
@@ -2493,7 +2492,7 @@ JSTokenType Lexer<T>::lexWithoutClearingLineTerminator(JSToken* tokenRecord, Opt
}
case CharacterIdentifierStart: {
if constexpr (ASSERT_ENABLED) {
UChar32 codePoint;
char32_t codePoint;
U16_GET(m_code, 0, 0, m_codeEnd - m_code, codePoint);
ASSERT(isIdentStart(codePoint));
}
5 changes: 3 additions & 2 deletions Source/JavaScriptCore/parser/Lexer.h
@@ -132,11 +132,12 @@ class Lexer {
void append8(const T*, size_t);
void record16(int);
void record16(T);
void recordUnicodeCodePoint(UChar32);
void recordUnicodeCodePoint(char32_t);
void append16(const LChar*, size_t);
void append16(const UChar* characters, size_t length) { m_buffer16.append(characters, length); }

UChar32 currentCodePoint() const;
static constexpr char32_t errorCodePoint = 0xFFFFFFFFu;
char32_t currentCodePoint() const;
ALWAYS_INLINE void shift();
ALWAYS_INLINE bool atEnd() const;
ALWAYS_INLINE T peek(int offset) const;
6 changes: 3 additions & 3 deletions Source/JavaScriptCore/runtime/JSGlobalObjectFunctions.cpp
@@ -98,7 +98,7 @@ static JSValue encode(JSGlobalObject* globalObject, const WTF::BitSet<256>& doNo

// 4-d-ii. If the code unit value of C is less than 0xD800 or greater than 0xDBFF, then
// 4-d-ii-1. Let V be the code unit value of C.
UChar32 codePoint;
char32_t codePoint;
if (!U16_IS_LEAD(character))
codePoint = character;
else {
@@ -186,10 +186,10 @@ static JSValue decode(JSGlobalObject* globalObject, const CharType* characters,
}
}
if (charLen != 0) {
UChar32 character;
char32_t character;
int32_t offset = 0;
U8_NEXT(sequence, offset, sequenceLen, character);
if (character < 0)
if (character == static_cast<char32_t>(U_SENTINEL))
charLen = 0;
else if (!U_IS_BMP(character)) {
// Convert to surrogate pair.
4 changes: 2 additions & 2 deletions Source/JavaScriptCore/runtime/StringPrototype.cpp
@@ -1024,12 +1024,12 @@ JSC_DEFINE_HOST_FUNCTION(stringProtoFuncCharCodeAt, (JSGlobalObject* globalObjec
return JSValue::encode(jsNaN());
}

static inline UChar32 codePointAt(const String& string, unsigned position, unsigned length)
static inline char32_t codePointAt(const String& string, unsigned position, unsigned length)
{
RELEASE_ASSERT(position < length);
if (string.is8Bit())
return string.characters8()[position];
UChar32 character;
char32_t character;
U16_NEXT(string.characters16(), position, length, character);
return character;
}
26 changes: 13 additions & 13 deletions Source/JavaScriptCore/yarr/YarrCanonicalize.h
@@ -44,31 +44,31 @@ enum UCS2CanonicalizationType {
CanonicalizeAlternatingUnaligned, // Unaligned consecutive pair, e.g. 0x241,0x242.
};
struct CanonicalizationRange {
UChar32 begin;
UChar32 end;
UChar32 value;
char32_t begin;
char32_t end;
char32_t value;
UCS2CanonicalizationType type;
};

extern const size_t UCS2_CANONICALIZATION_RANGES;
extern const UChar32* const ucs2CharacterSetInfo[];
extern const char32_t* const ucs2CharacterSetInfo[];
extern const CanonicalizationRange ucs2RangeInfo[];
extern const uint16_t canonicalTableLChar[256];

extern const size_t UNICODE_CANONICALIZATION_RANGES;
extern const UChar32* const unicodeCharacterSetInfo[];
extern const char32_t* const unicodeCharacterSetInfo[];
extern const CanonicalizationRange unicodeRangeInfo[];

enum class CanonicalMode { UCS2, Unicode };

inline const UChar32* canonicalCharacterSetInfo(unsigned index, CanonicalMode canonicalMode)
inline const char32_t* canonicalCharacterSetInfo(unsigned index, CanonicalMode canonicalMode)
{
const UChar32* const* rangeInfo = canonicalMode == CanonicalMode::UCS2 ? ucs2CharacterSetInfo : unicodeCharacterSetInfo;
const char32_t* const* rangeInfo = canonicalMode == CanonicalMode::UCS2 ? ucs2CharacterSetInfo : unicodeCharacterSetInfo;

Review comment from @darinadler (Author, Member), Dec 1, 2023: Should have just used auto* here.

return rangeInfo[index];
}

// This searches in log2 time over ~400-600 entries, so should typically result in 9 compares.
inline const CanonicalizationRange* canonicalRangeInfoFor(UChar32 ch, CanonicalMode canonicalMode = CanonicalMode::UCS2)
inline const CanonicalizationRange* canonicalRangeInfoFor(char32_t ch, CanonicalMode canonicalMode = CanonicalMode::UCS2)
{
const CanonicalizationRange* info = canonicalMode == CanonicalMode::UCS2 ? ucs2RangeInfo : unicodeRangeInfo;
size_t entries = canonicalMode == CanonicalMode::UCS2 ? UCS2_CANONICALIZATION_RANGES : UNICODE_CANONICALIZATION_RANGES;
@@ -88,7 +88,7 @@ inline const CanonicalizationRange* canonicalRangeInfoFor(UChar32 ch, CanonicalM
}
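
The body of `canonicalRangeInfoFor` is elided in this hunk, so the following is only a sketch of the kind of binary search its comment describes, not the WebKit implementation. `Range`, `ranges`, and `findRange` are hypothetical, and the five-entry table is made up for illustration. With roughly 512 entries, halving the search window takes about log2(512) = 9 steps, which matches the "9 compares" estimate in the comment.

```cpp
#include <cstddef>

// Hypothetical, simplified stand-in for CanonicalizationRange.
struct Range {
    char32_t begin;
    char32_t end; // inclusive
};

// Made-up table: sorted, non-overlapping, and covering every value in [0, 0xFFFF].
constexpr Range ranges[] = {
    { 0x0000, 0x0040 },
    { 0x0041, 0x005A },
    { 0x005B, 0x0060 },
    { 0x0061, 0x007A },
    { 0x007B, 0xFFFF },
};

// Binary search for the range containing `ch`; each step halves [low, high).
constexpr const Range* findRange(char32_t ch)
{
    std::size_t low = 0;
    std::size_t high = sizeof(ranges) / sizeof(ranges[0]);
    while (low < high) {
        std::size_t middle = low + (high - low) / 2;
        if (ch < ranges[middle].begin)
            high = middle;
        else if (ch > ranges[middle].end)
            low = middle + 1;
        else
            return &ranges[middle];
    }
    // Reached only when `ch` falls outside every range in the table.
    return nullptr;
}

static_assert(findRange(U'A') == &ranges[1]);
static_assert(findRange(U'z') == &ranges[3]);
static_assert(findRange(0x1F600) == nullptr);
```

Since `char32_t` is unsigned, the comparisons against `begin` and `end` need no special handling for negative inputs.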

// Should only be called for characters that have one canonically matching value.
inline UChar32 getCanonicalPair(const CanonicalizationRange* info, UChar32 ch)
inline char32_t getCanonicalPair(const CanonicalizationRange* info, char32_t ch)
{
ASSERT(ch >= info->begin && ch <= info->end);
switch (info->type) {
@@ -108,20 +108,20 @@ inline UChar32 getCanonicalPair(const CanonicalizationRange* info, UChar32 ch)
}

// Returns true if no other UCS2 codepoint can match this value.
inline bool isCanonicallyUnique(UChar32 ch, CanonicalMode canonicalMode = CanonicalMode::UCS2)
inline bool isCanonicallyUnique(char32_t ch, CanonicalMode canonicalMode = CanonicalMode::UCS2)
{
return canonicalRangeInfoFor(ch, canonicalMode)->type == CanonicalizeUnique;
}

// Returns true if values are equal, under the canonicalization rules.
inline bool areCanonicallyEquivalent(UChar32 a, UChar32 b, CanonicalMode canonicalMode = CanonicalMode::UCS2)
inline bool areCanonicallyEquivalent(char32_t a, char32_t b, CanonicalMode canonicalMode = CanonicalMode::UCS2)
{
const CanonicalizationRange* info = canonicalRangeInfoFor(a, canonicalMode);
auto* info = canonicalRangeInfoFor(a, canonicalMode);
switch (info->type) {
case CanonicalizeUnique:
return a == b;
case CanonicalizeSet: {
for (const UChar32* set = canonicalCharacterSetInfo(info->value, canonicalMode); (a = *set); ++set) {
for (auto* set = canonicalCharacterSetInfo(info->value, canonicalMode); (a = *set); ++set) {
if (a == b)
return true;
}
50 changes: 25 additions & 25 deletions Source/JavaScriptCore/yarr/YarrCanonicalizeUCS2.cpp
@@ -23,39 +23,39 @@
* OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/

// DO NOT EDIT! - this file autogenerated by YarrCanonicalize.js
// DO NOT EDIT! - this file autogenerated by YarrCanonicalizeUCS2.js

#include "config.h"
#include "YarrCanonicalize.h"

namespace JSC { namespace Yarr {

const UChar32 ucs2CharacterSet0[] = { 0x01c4, 0x01c5, 0x01c6, 0 };
const UChar32 ucs2CharacterSet1[] = { 0x01c7, 0x01c8, 0x01c9, 0 };
const UChar32 ucs2CharacterSet2[] = { 0x01ca, 0x01cb, 0x01cc, 0 };
const UChar32 ucs2CharacterSet3[] = { 0x01f1, 0x01f2, 0x01f3, 0 };
const UChar32 ucs2CharacterSet4[] = { 0x0392, 0x03b2, 0x03d0, 0 };
const UChar32 ucs2CharacterSet5[] = { 0x0395, 0x03b5, 0x03f5, 0 };
const UChar32 ucs2CharacterSet6[] = { 0x0398, 0x03b8, 0x03d1, 0 };
const UChar32 ucs2CharacterSet7[] = { 0x0345, 0x0399, 0x03b9, 0x1fbe, 0 };
const UChar32 ucs2CharacterSet8[] = { 0x039a, 0x03ba, 0x03f0, 0 };
const UChar32 ucs2CharacterSet9[] = { 0x00b5, 0x039c, 0x03bc, 0 };
const UChar32 ucs2CharacterSet10[] = { 0x03a0, 0x03c0, 0x03d6, 0 };
const UChar32 ucs2CharacterSet11[] = { 0x03a1, 0x03c1, 0x03f1, 0 };
const UChar32 ucs2CharacterSet12[] = { 0x03a3, 0x03c2, 0x03c3, 0 };
const UChar32 ucs2CharacterSet13[] = { 0x03a6, 0x03c6, 0x03d5, 0 };
const UChar32 ucs2CharacterSet14[] = { 0x0412, 0x0432, 0x1c80, 0 };
const UChar32 ucs2CharacterSet15[] = { 0x0414, 0x0434, 0x1c81, 0 };
const UChar32 ucs2CharacterSet16[] = { 0x041e, 0x043e, 0x1c82, 0 };
const UChar32 ucs2CharacterSet17[] = { 0x0421, 0x0441, 0x1c83, 0 };
const UChar32 ucs2CharacterSet18[] = { 0x0422, 0x0442, 0x1c84, 0x1c85, 0 };
const UChar32 ucs2CharacterSet19[] = { 0x042a, 0x044a, 0x1c86, 0 };
const UChar32 ucs2CharacterSet20[] = { 0x0462, 0x0463, 0x1c87, 0 };
const UChar32 ucs2CharacterSet21[] = { 0x1e60, 0x1e61, 0x1e9b, 0 };
const UChar32 ucs2CharacterSet22[] = { 0x1c88, 0xa64a, 0xa64b, 0 };
constexpr char32_t ucs2CharacterSet0[] = { 0x01c4, 0x01c5, 0x01c6, 0 };
constexpr char32_t ucs2CharacterSet1[] = { 0x01c7, 0x01c8, 0x01c9, 0 };
constexpr char32_t ucs2CharacterSet2[] = { 0x01ca, 0x01cb, 0x01cc, 0 };
constexpr char32_t ucs2CharacterSet3[] = { 0x01f1, 0x01f2, 0x01f3, 0 };
constexpr char32_t ucs2CharacterSet4[] = { 0x0392, 0x03b2, 0x03d0, 0 };
constexpr char32_t ucs2CharacterSet5[] = { 0x0395, 0x03b5, 0x03f5, 0 };
constexpr char32_t ucs2CharacterSet6[] = { 0x0398, 0x03b8, 0x03d1, 0 };
constexpr char32_t ucs2CharacterSet7[] = { 0x0345, 0x0399, 0x03b9, 0x1fbe, 0 };
constexpr char32_t ucs2CharacterSet8[] = { 0x039a, 0x03ba, 0x03f0, 0 };
constexpr char32_t ucs2CharacterSet9[] = { 0x00b5, 0x039c, 0x03bc, 0 };
constexpr char32_t ucs2CharacterSet10[] = { 0x03a0, 0x03c0, 0x03d6, 0 };
constexpr char32_t ucs2CharacterSet11[] = { 0x03a1, 0x03c1, 0x03f1, 0 };
constexpr char32_t ucs2CharacterSet12[] = { 0x03a3, 0x03c2, 0x03c3, 0 };
constexpr char32_t ucs2CharacterSet13[] = { 0x03a6, 0x03c6, 0x03d5, 0 };
constexpr char32_t ucs2CharacterSet14[] = { 0x0412, 0x0432, 0x1c80, 0 };
constexpr char32_t ucs2CharacterSet15[] = { 0x0414, 0x0434, 0x1c81, 0 };
constexpr char32_t ucs2CharacterSet16[] = { 0x041e, 0x043e, 0x1c82, 0 };
constexpr char32_t ucs2CharacterSet17[] = { 0x0421, 0x0441, 0x1c83, 0 };
constexpr char32_t ucs2CharacterSet18[] = { 0x0422, 0x0442, 0x1c84, 0x1c85, 0 };
constexpr char32_t ucs2CharacterSet19[] = { 0x042a, 0x044a, 0x1c86, 0 };
constexpr char32_t ucs2CharacterSet20[] = { 0x0462, 0x0463, 0x1c87, 0 };
constexpr char32_t ucs2CharacterSet21[] = { 0x1e60, 0x1e61, 0x1e9b, 0 };
constexpr char32_t ucs2CharacterSet22[] = { 0x1c88, 0xa64a, 0xa64b, 0 };

static constexpr size_t UCS2_CANONICALIZATION_SETS = 23;
const UChar32* const ucs2CharacterSetInfo[UCS2_CANONICALIZATION_SETS] = {
const char32_t* const ucs2CharacterSetInfo[UCS2_CANONICALIZATION_SETS] = {
ucs2CharacterSet0,
ucs2CharacterSet1,
ucs2CharacterSet2,
8 changes: 4 additions & 4 deletions Source/JavaScriptCore/yarr/YarrCanonicalizeUCS2.js
@@ -53,7 +53,7 @@ function printHeader()

print(copyright);
print();
print("// DO NOT EDIT! - this file autogenerated by YarrCanonicalize.js");
print("// DO NOT EDIT! - this file autogenerated by YarrCanonicalizeUCS2.js");
print();
print('#include "config.h"');
print('#include "YarrCanonicalize.h"');
@@ -68,7 +68,7 @@ function printFooter()
print();
}

// Helper function to convert a number to a fixed width hex representation of a UChar32.
// Helper function to convert a number to a fixed width hex representation of a char32_t.
function hex(x)
{
var s = Number(x).toString(16);
@@ -165,11 +165,11 @@ function createTables(prefix, maxValue, canonicalGroups)
var set = characterSetInfo[i];
for (var j in set)
characters += hex(set[j]) + ", ";
print("const UChar32 " + prefixLower + "CharacterSet" + i + "[] = { " + characters + "0 };");
print("constexpr char32_t " + prefixLower + "CharacterSet" + i + "[] = { " + characters + "0 };");
}
print();
print("static constexpr size_t " + prefixUpper + "_CANONICALIZATION_SETS = " + characterSetInfo.length + ";");
print("const UChar32* const " + prefixLower + "CharacterSetInfo[" + prefixUpper + "_CANONICALIZATION_SETS] = {");
print("const char32_t* const " + prefixLower + "CharacterSetInfo[" + prefixUpper + "_CANONICALIZATION_SETS] = {");
for (i in characterSetInfo)
print(" " + prefixLower + "CharacterSet" + i + ",");
print("};");

0 comments on commit c3f9044