@darinadler darinadler commented Jul 27, 2025

774c0dc

Make LChar a distinct type from uint8_t so it can imply character encoding as char8/16/32_t do
https://bugs.webkit.org/show_bug.cgi?id=296539
rdar://156856072

Reviewed by Chris Dumez, Sam Weinig, and Geoffrey Garen.

LChar is now a struct so it can be a distinct type, so LChar means Latin-1, char8_t means UTF-8,
and uint8_t and char remain ambiguous about encoding.

Tried to mostly stay with the minimum to get things compiling, without a lot of "cleanup".

As part of this made it possible to construct String directly from std::span<char8_t>
without having to utter String::fromUTF8 since the type is unambiguous.

We should follow up by removing more overloads and functions that interpret uint8_t, char, or
even std::byte as particular encodings, and use byteCast to make encoding explicit.

* Source/JavaScriptCore/API/JSScript.mm:
(+[JSScript scriptOfType:memoryMappedFromASCIIFile:withSourceURL:andBytecodeCache:inVirtualMachine:error:]):
Cast to LChar.

* Source/JavaScriptCore/API/JSStringRefCF.cpp:
(JSStringCreateWithCFString): Cast to UInt8.
(JSStringCopyCFString): Ditto.

* Source/JavaScriptCore/Scripts/xxd.pl:
Use constexpr std::array<LChar> for C++.

* Source/JavaScriptCore/heap/HeapSnapshotBuilder.cpp:
(JSC::HeapSnapshotBuilder::json): Rely on serialization of enum as a numeral,
rather than explicitly calling edgeTypeToNumber.

* Source/JavaScriptCore/inspector/remote/socket/RemoteInspectorConnectionClient.cpp:
(Inspector::RemoteInspectorConnectionClient::extractEvent): Cast to char8_t.

* Source/JavaScriptCore/inspector/remote/socket/RemoteInspectorSocket.cpp:
(Inspector::RemoteInspector::backendCommands const): Eliminate use of String::adopt.
It doesn't really work for vectors any more, and likely we should remove it to avoid
making a promise we can't keep. It doesn't work with byteCast, which is why we need
to do this here now.

* Source/JavaScriptCore/parser/Lexer.cpp:
(JSC::Lexer<T>::peek const): Tweak so the conditional operator compiles.
(JSC::Lexer<T>::parseString): Use SIMD::SizedUnsigned.
(JSC::Lexer<T>::parseStringSlowCase): Removed a static assertion that's not
super important and a bit tricky to write given that LChar is no longer a scalar.
(JSC::Lexer<T>::lexWithoutClearingLineTerminator): Use SIMD::SizedUnsigned.
* Source/JavaScriptCore/runtime/ISO8601.h: Cast to char.

* Source/JavaScriptCore/runtime/IntlObject.cpp:
(JSC::parseVariantCode): Removed a static assertion that's not super important
and a bit tricky to write given that LChar is no longer a scalar.

* Source/JavaScriptCore/runtime/JSGenericTypedArrayViewPrototype.cpp:
(JSC::uint8ArrayPrototypeToHex): Cast to uint8_t.

* Source/JavaScriptCore/runtime/JSONObject.cpp:
(JSC::stringCopySameType): Use SIMD::SizedUnsigned.
(JSC::stringCopyUpconvert): Use uint8_t directly.

* Source/JavaScriptCore/runtime/LiteralParser.cpp:
(JSC::reviverMode>::Lexer::lexString): Reduce mixing char with LChar a bit.
Also use SIMD::SizedUnsigned.

* Source/WTF/wtf/HexNumber.cpp:
(WTF::Internal::hexDigitsForMode): Moved this here from the header since it's
only used here.

* Source/WTF/wtf/HexNumber.h:
(WTF::Internal::appendHex): Added an overload that takes LChar and forwards all
the other arguments.

* Source/WTF/wtf/JSONValues.cpp: Use char16_t for local since that works with
a switch statement, LChar would not since it's a struct rather than a scalar.

* Source/WTF/wtf/PrintStream.h:
(WTF::printInternal): Added char8_t overload.

* Source/WTF/wtf/SIMDHelpers.h: Added SizedUnsigned, which works for LChar.
(WTF::SIMD::find): Use SizedUnsigned.
(WTF::SIMD::count): Ditto.

* Source/WTF/wtf/SortedArrayMap.h:
(WTF::foldForComparison): Use SIMD::SizedUnsigned.

* Source/WTF/wtf/StdLibExtras.h: Update ByteType to work with LChar. Also
tweaked it a little and renamed it IsByte and IsMutableByte.
(WTF::ByteCastTraits<T>::cast): Ditto.
(WTF::ByteCastTraits<T::cast): Ditto.
(WTF::byteCast): Ditto.

* Source/WTF/wtf/URLParser.cpp:
(WTF::URLParser::appendNumberToASCIIBuffer): Cast to char.
(WTF::URLParser::formURLDecode): Cast to char8_t.

* Source/WTF/wtf/cf/CFURLExtras.cpp:
(WTF::bytesAsString): Cast to UInt8.
(WTF::isSameOrigin): Cast to LChar.

* Source/WTF/wtf/cf/URLCF.cpp:
(WTF::URL::createCFURL): Cast to UInt8.

* Source/WTF/wtf/cocoa/NSURLExtras.mm:
(WTF::userVisibleString): Cast to LChar.

* Source/WTF/wtf/cocoa/SpanCocoa.h:
(WTF::toNSData): Added overloads to make this work with all IsByte types.
(WTF::toNSDataNoCopy): Ditto.

* Source/WTF/wtf/persistence/PersistentCoders.cpp:
(WTF::Persistence::Coder<String>::encodeForPersistence): Use asBytes to
convert result of span8 into bytes, not LChar.

* Source/WTF/wtf/persistence/PersistentDecoder.h:
(WTF::Persistence::Decoder::bufferIsLargeEnoughToContain const):
Update assertion to allow LChar.

* Source/WTF/wtf/text/ASCIIFastPath.h:
Redid the NonASCIIMask and NonLatin1Mask to not rely on specific types
for CharacterType and depend on the size of the type instead.

* Source/WTF/wtf/text/Base64.cpp:
(WTF::base64DecodeInternal): Take the vector type as the template parameter
instead of just the Malloc. Simplifies things a bit.
(WTF::base64DecodeToString): Updated for the above, also simplified by
removing the lambda since it reads well without it.

* Source/WTF/wtf/text/IntegerToStringConversion.h:
(WTF::numberToStringImpl): Cast to char.
(WTF::writeIntegerToBufferImpl): Ditto.

* Source/WTF/wtf/text/LChar.h: Make LChar a struct in the WTF namespace.
Added constexpr functions to smooth the use of this in code that treats
it as an integer. Added IsStringStorageCharacter concept so we can write
templates that work with LChar and char16_t without accidentally allowing
other types as well.

* Source/WTF/wtf/text/ParsingUtilities.h: Added an overload of skipWhile for char8_t.

* Source/WTF/wtf/text/StringBuilder.h:
(WTF::StringBuilder::append): Added an overload for uint8_t, needed since there is code
that depends on single argument append treating it as a character and other code that
depends on variadic append treating it as a numeral!
(WTF::StringBuilder::operator[] const): Tweak so the conditional operator compiles.

* Source/WTF/wtf/text/StringCommon.h: Added an IsFindableCharacter concept so
function templates compile for the correct types including LChar. Use SIMD::SizedUnsigned.
(WTF::equalLettersIgnoringASCIICaseWithLength):
(WTF::compareEach): Added a cast to int to silence a compiler warning about our
intentional use of bitwise operators with booleans.

* Source/WTF/wtf/text/StringHasher.h:
(WTF::StringHasher::DefaultConverter::convert): Fixed this so it works with LChar.

* Source/WTF/wtf/text/StringImpl.cpp:
(WTF::StringImpl::create): Added overload for char8_t spans.

* Source/WTF/wtf/text/StringImpl.h: Use IsStringStorageCharacter to make templates
type check a bit better.
(WTF::StringImpl::create): Added a new create function for UTF-8 strings
to move the logic from WTF::String to here.
(WTF::StringImpl::copyCharacters): Tweak to work with LChar.
(WTF::StringImpl::tryCreateUninitialized):
(WTF::StringImpl::at const): Tweak so the conditional operator compiles.
(WTF::StringImpl::createByReplacingInCharacters): Ditto.

* Source/WTF/wtf/text/WTFString.cpp:
(WTF::String::String): Added a constructor that takes a std::span<const char8_t>
for use instead of String::fromUTF8. Use StringImpl::create.
(WTF::String::ascii const): Tweak so the conditional operator compiles.
(WTF::String::fromUTF8ReplacingInvalidSequences): Use StringImpl::create.
(WTF::fromUTF8Impl): Deleted. Uses StringImpl instead.
(WTF::String::fromUTF8): Deleted the overload that takes const char8_t; instead
the template allows any byte type.

* Source/WTF/wtf/text/WTFString.h: Added the new constructor for span<char8_t>
and rearranged the "from" functions so they work with more types. Since they explicitly
state the encoding, we can allow them to take any byte type.

* Source/WTF/wtf/text/cf/StringCF.cpp:
(WTF::String::String): Cast to UInt8.
* Source/WTF/wtf/text/cf/StringImplCF.cpp:
(WTF::StringImpl::createCFString): Ditto.
* Source/WTF/wtf/text/cf/StringViewCF.cpp:
(WTF::StringView::createCFString const): Ditto.
(WTF::StringView::createCFStringWithoutCopying const): Ditto.

* Source/WebCore/DerivedSources-input.xcfilelist: Updated for scripts from
JavaScriptCore that were always used but are now dependencies.

* Source/WebCore/DerivedSources.make: Added missing dependencies in the rules
that produce XMLViewerCSS.h and XMLViewerJS.h so they are regenerated if the
scripts that produce them change, as with this patch that changes xxd.pl.

* Source/WebCore/Modules/encryptedmedia/InitDataRegistry.cpp:
(WebCore::extractKeyIDsKeyids): Cast to LChar to pass to parseJSON and remove
the unnecessary copy into a temporary String.

* Source/WebCore/Modules/mediastream/PeerConnectionBackend.cpp:
(WebCore::PeerConnectionBackend::handleLogMessage): Cast to uint8_t.
* Source/WebCore/Modules/mediastream/RTCRtpSFrameTransformerCocoa.cpp:
(WebCore::RTCRtpSFrameTransformer::computeSaltKey): Ditto.
(WebCore::createBaseSFrameKey): Ditto.
(WebCore::RTCRtpSFrameTransformer::computeAuthenticationKey): Ditto.
(WebCore::RTCRtpSFrameTransformer::computeEncryptionKey): Ditto.
* Source/WebCore/Modules/mediastream/gstreamer/GStreamerDtlsTransportBackend.cpp:
(WebCore::GStreamerDtlsTransportBackendObserver::stateChanged): Ditto.
* Source/WebCore/Modules/push-api/PushMessageCrypto.cpp:
(WebCore::PushCrypto::decryptAES128GCMPayload): Ditto.
(WebCore::PushCrypto::decryptAESGCMPayload): Ditto.

* Source/WebCore/Modules/url-pattern/URLPatternParser.cpp:
(WebCore::URLPatternUtilities::escapeRegexStringForCharacters): Specify the type
and size of the array to make it compile.
(WebCore::URLPatternUtilities::escapePatternStringForCharacters): Ditto.

* Source/WebCore/Modules/websockets/WebSocketExtensionDispatcher.cpp:
(WebCore::WebSocketExtensionDispatcher::processHeaderValue): Use char8_t
since the code currently parses a UTF-8 representation of the header value.

* Source/WebCore/Modules/websockets/WebSocketExtensionParser.cpp:
(WebCore::isSpaceOrTab): Use char8_t, for now at least.
* Source/WebCore/Modules/websockets/WebSocketExtensionParser.h: Ditto.

* Source/WebCore/Modules/websockets/WebSocketHandshake.cpp:
(WebCore::trimInputSample): Cast to LChar.
(WebCore::WebSocketHandshake::readStatusLine): Ditto.

* Source/WebCore/PAL/PAL.xcodeproj/project.pbxproj: Removed Gunzip.cpp/h.
* Source/WebCore/PAL/pal/CMakeLists.txt: Removed Gunzip.h.
* Source/WebCore/PAL/pal/Gunzip.h: Removed.
* Source/WebCore/PAL/pal/PlatformMac.cmake: Removed Gunzip.cpp.
* Source/WebCore/PAL/pal/cocoa/Gunzip.cpp: Removed.

* Source/WebCore/bindings/js/ScriptBufferSourceProvider.h: Cast to LChar.
* Source/WebCore/bindings/js/SerializedScriptValue.cpp:
(WebCore::CloneDeserializer::readString): Ditto.

* Source/WebCore/contentextensions/DFABytecodeInterpreter.cpp:
(WebCore::ContentExtensions::DFABytecodeInterpreter::interpretJumpTable):
Cast to char so the conditional operator compiles.
(WebCore::ContentExtensions::DFABytecodeInterpreter::interpret): Ditto.

* Source/WebCore/crypto/SubtleCrypto.cpp:
(WebCore::SubtleCrypto::unwrapKey): Cast to LChar to pass to JSONParse and remove
the unnecessary copy into a temporary String.

* Source/WebCore/editing/cocoa/WebContentReaderCocoa.mm:
(WebCore::replaceRichContentWithAttachments): Cast to LChar.
* Source/WebCore/fileapi/FileReaderLoader.cpp:
(WebCore::FileReaderLoader::stringResult): Ditto.
* Source/WebCore/html/FTPDirectoryDocument.cpp:
(WebCore::FTPDirectoryDocumentParser::loadDocumentTemplate): Ditto.

* Source/WebCore/html/parser/HTMLEntityParser.cpp:
(WebCore::StringParsingBufferSource::currentCharacter const): Tweak so
the conditional operator compiles.
* Source/WebCore/html/track/VTTScanner.h:
(WebCore::VTTScanner::currentChar const): Ditto.

* Source/WebCore/html/track/WebVTTParser.cpp:
(WebCore::WebVTTParser::fileFinished): Cast to uint8_t.

* Source/WebCore/loader/FTPDirectoryParser.cpp:
(WebCore::parseOneFTPLine): Cast to LChar.

* Source/WebCore/loader/FormSubmission.cpp:
(WebCore::appendMailtoPostFormDataToURL): Cast to LChar.
(WebCore::FormSubmission::create): Ditto.

* Source/WebCore/loader/TextResourceDecoder.cpp:
(WebCore::findXMLEncoding): Cast to uint8_t.
(WebCore::TextResourceDecoder::checkForCSSCharset): Cast to uint8_t and LChar.
(WebCore::TextResourceDecoder::checkForHeadCharset): Ditto.

* Source/WebCore/loader/cache/CachedScript.cpp:
(WebCore::CachedScript::script): Cast to LChar.
(WebCore::CachedScript::codeBlockHashConcurrently): Ditto.

* Source/WebCore/platform/encryptedmedia/CDMUtilities.cpp:
(WebCore::CDMUtilities::parseJSONObject): Cast to LChar to pass to parseJSON
and remove the unnecessary copy into a temporary String.
* Source/WebCore/platform/graphics/avfoundation/CDMFairPlayStreaming.cpp:
(WebCore::extractSinfData): Ditto.
(WebCore::CDMPrivateFairPlayStreaming::extractKeyIDsMpts): Ditto.

* Source/WebCore/platform/graphics/avfoundation/objc/CDMInstanceFairPlayStreamingAVFObjC.mm:
(WebCore::parseJSONValue): Cast to LChar.

* Source/WebCore/platform/graphics/freetype/FontCacheFreeType.cpp:
(WebCore::fontNameMapName): Cast to LChar.

* Source/WebCore/platform/graphics/gstreamer/eme/CDMThunder.cpp:
(WebCore::ParsedResponseMessage::ParsedResponseMessage): Cast to LChar.
(WebCore::CDMInstanceSessionThunder::loadSession): Ditto.

* Source/WebCore/platform/graphics/iso/ISOVTTCue.cpp:
Use char8_t for the buffer we are treating as UTF-8.

* Source/WebCore/platform/gstreamer/GStreamerElementHarness.cpp:
(WebCore::MermaidBuilder::span const): Cast to uint8_t.

* Source/WebCore/platform/image-decoders/png/PNGImageDecoder.cpp:
(WebCore::decodingWarning): Cast to char.
(WebCore::PNGImageDecoder::readChunks): Ditto.

* Source/WebCore/platform/mediarecorder/MediaRecorderPrivateMock.cpp:
(WebCore::MediaRecorderPrivateMock::fetchData): Cast to uint8_t.

* Source/WebCore/platform/network/HTTPParsers.cpp:
(WebCore::trimInputSample): Make into a non-template function.
(WebCore::parseHTTPHeader): Cast to LChar.

* Source/WebCore/platform/network/curl/OpenSSLHelper.cpp:
(OpenSSL::BIO::getDataAsString const): Cast to LChar.
(OpenSSL::toString): Ditto.
* Source/WebCore/platform/network/soup/CertificateInfoSoup.cpp:
(WebCore::CertificateInfo::summary const): Ditto.

* Source/WebCore/platform/text/SegmentedString.h:
(WebCore::SegmentedString::Substring::currentCharacter const):
Tweak to compile conditional operator.

* Source/WebCore/rendering/BreakLines.h:
(WebCore::BreakLines::nextBreakablePosition): Switch to use a simpler scoped
CharacterInfo struct instead of a struct template with conversions.

* Source/WebCore/testing/MockCDMFactory.cpp:
(WebCore::MockCDM::sanitizeResponse const): Cast to LChar.
(WebCore::MockCDMInstance::setServerCertificate): Ditto.
(WebCore::MockCDMInstanceSession::updateLicense): Ditto.

* Source/WebGPU/WGSL/Lexer.cpp:
(WGSL::Lexer<CharacterType>::makeToken): Moved from header.
(WGSL::Lexer<CharacterType>::makeFloatToken): Ditto.
(WGSL::Lexer<CharacterType>::makeIntegerToken): Ditto.
(WGSL::Lexer<CharacterType>::makeIdentifierToken): Ditto.

* Source/WebGPU/WGSL/Lexer.h: Simplified constructor to take a span instead
of a String, and moved some private function implementations out of the header.

* Source/WebGPU/WGSL/Parser.cpp:
(WGSL::parse): Changed to take CharacterType as the template argument rather than
the Lexer type, then we can pass a span into the Lexer instead of a String.

* Source/WebGPU/WGSL/ParserPrivate.h: Added include.

* Source/WebGPU/WebGPU/Pipeline.mm:
(WebKit::printToFileForPsoRepro): Cast to uint8_t.
* Source/WebKit/NetworkProcess/cache/NetworkCache.cpp:
(WebKit::NetworkCache::Cache::dumpContentsToFile): Ditto.

* Source/WebKit/NetworkProcess/curl/WebSocketTaskCurl.cpp:
(WebKit::WebSocketTask::didReceiveData): Cast to char8_t. Simplify some
code that is extracting bytes.
(WebKit::WebSocketTask::sendClosingHandshakeIfNeeded): Ditto.

* Source/WebKit/NetworkProcess/storage/CacheStorageManager.cpp:
(WebKit::readSizeFile): Cast to LChar.

* Source/WebKit/Platform/IPC/DaemonCoders.h:
(WebKit::Daemon::Coder<WTF::String>::encode): Cast to uint8_t.

* Source/WebKit/Platform/IPC/DaemonDecoder.h:
(WebKit::Daemon::Decoder::bufferIsLargeEnoughToContain const):
Update assertion to allow LChar.

* Source/WebKit/Shared/API/c/cf/WKStringCF.mm:
(WKStringCopyCFString): Cast to UInt8.

* Source/WebKit/Shared/Cocoa/SandboxExtensionCocoa.mm:
(WebKit::SandboxExtensionImpl::SandboxExtensionImpl): Cast to LChar.

* Source/WebKit/UIProcess/API/APIContentRuleListStore.cpp:
(API::getContentRuleListSourceFromMappedFile): Cast to LChar.

* Source/WebKit/UIProcess/API/C/WKPage.cpp:
(dataFrom): Cast to uint8_t.

* Source/WebKit/UIProcess/Cocoa/WebPasteboardProxyCocoa.mm:
(WebKit::WebPasteboardProxy::testIPCSharedMemory): Cast to LChar.

* Source/WebKit/UIProcess/Inspector/glib/RemoteInspectorClient.cpp:
(WebKit::RemoteInspectorClient::setBackendCommands): Cast to std::byte.

* Source/WebKit/UIProcess/Inspector/mac/RemoteWebInspectorUIProxyMac.mm:
(WebKit::RemoteWebInspectorUIProxy::platformLoad): Cast to LChar.

* Source/WebKit/UIProcess/Inspector/mac/WebInspectorUIProxyMac.mm:
(WebKit::WebInspectorUIProxy::platformLoad): Cast to LChar.

* Source/WebKit/UIProcess/wpe/WebPasteboardProxyWPE.cpp:
(WebKit::WebPasteboardProxy::readURLFromPasteboard): Cast to LChar.

* Source/WebKit/WebProcess/Network/webrtc/RTCDataChannelRemoteManager.cpp:
(WebKit::RTCDataChannelRemoteManager::sendData): Cast to LChar.

* Source/WebKit/WebProcess/cocoa/WebProcessCocoa.mm:
(WebKit::registerLogClient): Cast to uint8_t.

* Source/WebKitLegacy/WebCoreSupport/WebSocketChannel.cpp:
(WebCore::WebSocketChannel::processFrame): Cast to char8_t.

* Tools/Scripts/webkitpy/style/checkers/cpp.py:
(check_spacing): Add missing operator| check.

* Tools/TestWebKitAPI/Tests/WGSL/ConstLiteralTests.cpp:
(parseLCharPrimaryExpression): Use ASCIILiteral.

* Tools/TestWebKitAPI/Tests/WGSL/LexerTests.cpp:
(TestWGSLAPI::TestLexer::TestLexer): Use ASCIILiteral.
(TestWGSLAPI::checkSingleToken): Ditto.
(TestWGSLAPI::checkSingleIntegerLiteral): Ditto.
(TestWGSLAPI::checkSingleFloatLiteral): Ditto.

* Tools/TestWebKitAPI/Tests/WTF/Base64.cpp:
(TestWebKitAPI::TEST(Base64, Encode)): Cast to uint8_t.
(TestWebKitAPI::TEST(Base64, EncodeOmitPadding)): Ditto.
(TestWebKitAPI::TEST(Base64, EncodeURL)): Ditto.
(TestWebKitAPI::TEST(Base64, EncodeURLOmitPadding)): Ditto.

* Tools/TestWebKitAPI/Tests/WTF/FileSystem.cpp:
(TestWebKitAPI::createTestFile): Cast to uint8_t.
(TestWebKitAPI::TEST_F(FileSystemTest, openExistingFileTruncate)): Ditto.
(TestWebKitAPI::TEST_F(FileSystemTest, openExistingFileReadWrite)): Ditto.
(TestWebKitAPI::TEST_F(FileSystemTest, deleteEmptyDirectoryContainingDSStoreFile)): Ditto.
(TestWebKitAPI::TEST_F(FileSystemTest, deleteEmptyDirectoryOnNonEmptyDirectory)): Ditto.
(TestWebKitAPI::TEST_F(FileSystemTest, moveDirectory)): Ditto.
(TestWebKitAPI::runGetFileModificationTimeTest): Ditto.
(TestWebKitAPI::TEST_F(FileSystemTest, readEntireFile)): Ditto.

* Tools/TestWebKitAPI/Tests/WTF/StringImpl.cpp:
(TestWebKitAPI::TEST(WTF, ExternalStringImplCreate8bit)): Use char and cast to LChar.
(TestWebKitAPI::TEST(WTF, ExternalStringAtom)): Ditto.

* Tools/TestWebKitAPI/Tests/WTF/StringView.cpp:
(TestWebKitAPI::TEST(WTF, StringViewEqualIgnoringASCIICaseWithLatin1Characters)):
Use byteCast instead of reinterpret_cast.

* Tools/TestWebKitAPI/Tests/WTF/cocoa/URLExtras.mm:
(TestWebKitAPI::dataAsString): Pass a character instead of an int.

* Tools/TestWebKitAPI/Tests/WebCore/FileMonitor.cpp:
(TestWebKitAPI::readContentsOfFile): Cast to LChar.

* Tools/TestWebKitAPI/Tests/WebCore/PushMessageCrypto.cpp:
(TestWebKitAPI::TEST(PushMessageCrypto, AES128GCMPayloadWithMinimalPadding)): Eliminate
use of String::adopt.
(TestWebKitAPI::TEST(PushMessageCrypto, AES128GCMPayloadWithPadding)): Ditto.
(TestWebKitAPI::TEST(PushMessageCrypto, AESGCMPayloadWithMinimalPadding)): Ditto.
(TestWebKitAPI::TEST(PushMessageCrypto, AESGCMPayloadWithPadding)): Ditto.

* Tools/TestWebKitAPI/Tests/WebCore/SharedBuffer.cpp:
(TestWebKitAPI::TEST_F(FragmentedSharedBufferTest, createWithContentsOfExistingFile)):
Cast to LChar.
(TestWebKitAPI::TEST_F(FragmentedSharedBufferTest, read)): Ditto.
(TestWebKitAPI::TEST_F(SharedBufferChunkReaderTest, includeSeparator)): Do not mix
uint8_t with LChar.
(TestWebKitAPI::TEST_F(SharedBufferChunkReaderTest, peekData)): Cast to LChar.

* Tools/TestWebKitAPI/Tests/WebCore/SharedBufferTest.cpp:
(TestWebKitAPI::FragmentedSharedBufferTest::SetUp): Cast to uint8_t.
* Tools/TestWebKitAPI/Tests/WebCore/curl/CurlMultipartHandleTests.cpp:
(TestWebKitAPI::Curl::TEST(CurlMultipartHandleTests, SimpleMessage)): Ditto.
(TestWebKitAPI::Curl::TEST(CurlMultipartHandleTests, NoHeader)): Ditto.
(TestWebKitAPI::Curl::TEST(CurlMultipartHandleTests, NoBody)): Ditto.
(TestWebKitAPI::Curl::TEST(CurlMultipartHandleTests, TransportPadding)): Ditto.
(TestWebKitAPI::Curl::TEST(CurlMultipartHandleTests, NoEndOfBoundary)): Ditto.
(TestWebKitAPI::Curl::TEST(CurlMultipartHandleTests, NoEndOfBoundaryAfterCompleted)): Ditto.
(TestWebKitAPI::Curl::TEST(CurlMultipartHandleTests, NoCloseDelimiter)): Ditto.
(TestWebKitAPI::Curl::TEST(CurlMultipartHandleTests, NoCloseDelimiterAfterCompleted)): Ditto.
(TestWebKitAPI::Curl::TEST(CurlMultipartHandleTests, CloseDelimiter)): Ditto.
(TestWebKitAPI::Curl::TEST(CurlMultipartHandleTests, CloseDelimiterAfterCompleted)): Ditto.
(TestWebKitAPI::Curl::TEST(CurlMultipartHandleTests, DivideFirstDelimiter)): Ditto.
(TestWebKitAPI::Curl::TEST(CurlMultipartHandleTests, DivideSecondDelimiter)): Ditto.
(TestWebKitAPI::Curl::TEST(CurlMultipartHandleTests, DivideLastDelimiter)): Ditto.
(TestWebKitAPI::Curl::TEST(CurlMultipartHandleTests, DivideCloseDelimiter)): Ditto.
(TestWebKitAPI::Curl::TEST(CurlMultipartHandleTests, DivideTransportPadding)): Ditto.
(TestWebKitAPI::Curl::TEST(CurlMultipartHandleTests, DivideHeader)): Ditto.
(TestWebKitAPI::Curl::TEST(CurlMultipartHandleTests, DivideBody)): Ditto.
(TestWebKitAPI::Curl::TEST(CurlMultipartHandleTests, CompleteWhileHeaderProcessing)): Ditto.
* Tools/TestWebKitAPI/Tests/WebKitCocoa/IndexedDBPersistence.mm:
(-[IndexedDBOpenPanelUIDelegate webView:runOpenPanelWithParameters:initiatedByFrame:completionHandler:]):
Ditto.

* Tools/TestWebKitAPI/Tests/WebKitCocoa/WebPushDaemon.mm: Cast to LChar.

* Tools/WebKitTestRunner/StringFunctions.h:
(WTR::toWTFString): Use char8_t and char.

Canonical link: https://commits.webkit.org/300282@main

3a139dd

Misc iOS, visionOS, tvOS & watchOS macOS Linux Windows Apple Internal
✅ 🧪 style ✅ 🛠 ios ✅ 🛠 mac ✅ 🛠 wpe ✅ 🛠 win ✅ 🛠 ios-apple
✅ 🧪 bindings ✅ 🛠 ios-sim ✅ 🛠 mac-AS-debug 🧪 wpe-wk2 ✅ 🧪 win-tests ✅ 🛠 mac-apple
✅ 🧪 webkitperl ✅ 🧪 ios-wk2 ✅ 🧪 api-mac ✅ 🧪 api-wpe ✅ 🛠 vision-apple
✅ 🧪 webkitpy ✅ 🧪 ios-wk2-wpt ✅ 🧪 mac-wk1 ✅ 🛠 wpe-cairo
✅ 🛠 🧪 jsc ✅ 🧪 api-ios ✅ 🧪 mac-wk2 ✅ 🛠 gtk
✅ 🛠 🧪 jsc-arm64 ✅ 🛠 vision 🧪 mac-AS-debug-wk2 ✅ 🧪 gtk-wk2
✅ 🛠 vision-sim ✅ 🧪 mac-wk2-stress ✅ 🧪 api-gtk
✅ 🛠 🧪 merge ✅ 🧪 vision-wk2 ✅ 🧪 mac-intel-wk2 ✅ 🛠 playstation
✅ 🛠 tv ✅ 🛠 mac-safer-cpp ✅ 🛠 jsc-armv7
✅ 🛠 tv-sim ❌ 🧪 jsc-armv7-tests
✅ 🛠 watch
✅ 🛠 watch-sim

@darinadler darinadler self-assigned this Jul 27, 2025
@darinadler darinadler changed the title I was usMake LChar a distinct type from uint8_t so it can imply character encoding as char8/16/32_t do Make LChar a distinct type from uint8_t so it can imply character encoding as char8/16/32_t do Jul 27, 2025
@darinadler darinadler force-pushed the eng/Make-LChar-a-distinct-type-from-uint8_t-so-it-can-imply-character-encoding-as-char8-16-32_t-do branch from 3ac13b1 to 2f75985 Compare July 27, 2025 20:24
@webkit-ews-buildbot webkit-ews-buildbot added the merging-blocked Applied to prevent a change from being merged label Jul 27, 2025
@darinadler darinadler removed the merging-blocked Applied to prevent a change from being merged label Jul 28, 2025
@darinadler darinadler force-pushed the eng/Make-LChar-a-distinct-type-from-uint8_t-so-it-can-imply-character-encoding-as-char8-16-32_t-do branch from 2f75985 to bb9dd3c Compare July 28, 2025 00:36
@webkit-ews-buildbot webkit-ews-buildbot added the merging-blocked Applied to prevent a change from being merged label Jul 28, 2025
@darinadler darinadler removed the merging-blocked Applied to prevent a change from being merged label Jul 28, 2025
@darinadler darinadler force-pushed the eng/Make-LChar-a-distinct-type-from-uint8_t-so-it-can-imply-character-encoding-as-char8-16-32_t-do branch from bb9dd3c to 0aa73b0 Compare July 28, 2025 00:53
@webkit-ews-buildbot webkit-ews-buildbot added the merging-blocked Applied to prevent a change from being merged label Jul 28, 2025
@darinadler darinadler removed the merging-blocked Applied to prevent a change from being merged label Jul 28, 2025
@darinadler darinadler force-pushed the eng/Make-LChar-a-distinct-type-from-uint8_t-so-it-can-imply-character-encoding-as-char8-16-32_t-do branch from 0aa73b0 to f550a51 Compare July 28, 2025 01:30
@webkit-ews-buildbot webkit-ews-buildbot added the merging-blocked Applied to prevent a change from being merged label Jul 28, 2025
@darinadler darinadler removed the merging-blocked Applied to prevent a change from being merged label Sep 7, 2025
@darinadler darinadler force-pushed the eng/Make-LChar-a-distinct-type-from-uint8_t-so-it-can-imply-character-encoding-as-char8-16-32_t-do branch from f550a51 to f02d9e5 Compare September 7, 2025 18:03
@webkit-ews-buildbot webkit-ews-buildbot added the merging-blocked Applied to prevent a change from being merged label Sep 7, 2025
@darinadler darinadler removed the merging-blocked Applied to prevent a change from being merged label Sep 7, 2025
@darinadler darinadler force-pushed the eng/Make-LChar-a-distinct-type-from-uint8_t-so-it-can-imply-character-encoding-as-char8-16-32_t-do branch from f02d9e5 to 4903612 Compare September 7, 2025 19:19
@webkit-ews-buildbot webkit-ews-buildbot added the merging-blocked Applied to prevent a change from being merged label Sep 7, 2025
@darinadler darinadler removed the merging-blocked Applied to prevent a change from being merged label Sep 7, 2025
@darinadler darinadler force-pushed the eng/Make-LChar-a-distinct-type-from-uint8_t-so-it-can-imply-character-encoding-as-char8-16-32_t-do branch from 4903612 to bfbf1fa Compare September 7, 2025 19:46
Contributor

@cdumez cdumez left a comment

LGTM, nice.

}
if (frame.payload.size() >= 3)
- m_closeEventReason = String::fromUTF8({ &frame.payload[2], frame.payload.size() - 2 });
+ m_closeEventReason = String::fromUTF8(byteCast<char8_t>(frame.payload.subspan(2)));
Contributor

We should no longer need ::fromUTF8() here, right?

static void createTestFile(const String& path)
{
- auto written = FileSystem::overwriteEntireFile(path, FileSystemTestData.span8());
+ auto written = FileSystem::overwriteEntireFile(path, byteCast<uint8_t>(FileSystemTestData.span8()));
Contributor

I guess in the future we'll want to use std::byte for this sort of thing?

Member Author

@darinadler darinadler Sep 15, 2025

Yes, I think that's right.

We also will probably want to rethink names like span8 since that could mean any 8-bit type. In the context of a String it can mean LChar but elsewhere maybe not.

Contributor

Maybe we want to use span<T>() with a template parameter for clarity in the future instead of span8() / span16().

Member Author

Yes, I think it’s really cool to have the actual type of the span be there at the call site. The implementation doesn’t really have to be generic, can just be the two functions but use template syntax.

{
- constexpr LChar buffer[] = "hello";
- constexpr size_t bufferStringLength = sizeof(buffer) - 1;
+ static constinit char buffer[] = "hello";
Contributor

constinit vs constexpr

- constexpr LChar buffer[] = "hello";
- constexpr size_t bufferStringLength = sizeof(buffer) - 1;
+ static constinit char buffer[] = "hello";
+ static constinit size_t bufferStringLength = sizeof(buffer) - 1;
Contributor

constinit vs constexpr

static const std::array<LChar, 16>& hexDigitsForMode(HexConversionMode mode)
{
static constinit std::array<LChar, 16> lowercaseHexDigits { '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', 'a', 'b', 'c', 'd', 'e', 'f' };
static constinit std::array<LChar, 16> uppercaseHexDigits { '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', 'A', 'B', 'C', 'D', 'E', 'F' };
Contributor

These should stay constexpr. constinit should really only be used when a value needs constant (static) initialization but must remain mutable afterward.

}

- template<typename CharacterType1, typename CharacterType2, std::enable_if_t<std::is_integral_v<CharacterType1> && std::is_integral_v<CharacterType2>>* = nullptr>
+ template<IsFindableCharacter CharacterType1, IsFindableCharacter CharacterType2>
Contributor

Loving the adoption of concepts here. Much clearer!

}

// Construct a string with Latin-1 data.
String::String(std::span<const char> characters)
Contributor

Do we need this one? Ideally we would get rid of it and make callers be explicit here.

String json { buffer.span() };

- auto value = JSON::Value::parseJSON(json);
+ auto value = JSON::Value::parseJSON(byteCast<LChar>(buffer.span()));
Contributor

Not new here, but worth a FIXME. The spec link above indicates this should be interpreted as UTF-8.

// Check if the third message is a multi-lines string, concatenating such message would look ugly in log events.
if (values.size() >= 3 && values[2].value.find("\r\n"_s) != notFound)
- event = generateJSONLogEvent(MessageLogEvent { values[1].value, { values[2].value.span8() } }, false);
+ event = generateJSONLogEvent(MessageLogEvent { values[1].value, { byteCast<uint8_t>(values[2].value.span8()) } }, false);
Contributor

Not new, and not necessary to change now, but the name span8() leaves a lot to be desired (it should probably be called something like spanLatin1()). I think using the helper span<LChar>() might help here, though I wonder why it is safe to assume this string is Latin-1.

return false;
}

const CString headerValueData = headerValue.utf8();
Contributor

At some point we need to replace CString with explicit UTF8String and Latin1String classes and remove the need for this kind of casting.

return input;
return makeString(input.first(maxInputSampleSize), horizontalEllipsis);
return byteCast<LChar>(input);
return makeString(byteCast<LChar>(input.first(maxInputSampleSize)), horizontalEllipsis);
Contributor

@weinig weinig Sep 15, 2025

Why is LChar right here, as opposed to UTF-8? (not new, just if you know, might be worth a comment). Same below.


StringView statusCodeString(header.subspan(*firstSpaceIndex + 1, *secondSpaceIndex - *firstSpaceIndex - 1));
StringView statusCodeString(byteCast<LChar>(header.subspan(*firstSpaceIndex + 1, *secondSpaceIndex - *firstSpaceIndex - 1)));
if (statusCodeString.length() != 3) // Status code must consist of three digits.
Contributor

Is the StringView doing anything useful here other than taking a few branches? (Explicit use of String::length() or StringView::length() like this always makes me suspicious).

String jwkString(bytes.span());
JSLockHolder locker(vm);
auto jwkObject = JSONParse(&state, jwkString);
auto jwkObject = JSONParse(&state, byteCast<LChar>(bytes.span()));
Contributor

Another case where this should probably be UTF-8.

Vector<uint8_t> bodyData("body="_span);
FormDataBuilder::encodeStringAsFormData(bodyData, body.utf8());
body = makeStringByReplacingAll(bodyData.span(), '+', "%20"_s);
body = makeStringByReplacingAll(byteCast<LChar>(bodyData.span()), '+', "%20"_s);
Contributor

This is confusing, but I guess FormDataBuilder::encodeStringAsFormData() converts the UTF-8 body into ASCII or Latin-1?

Should we make FormDataBuilder::encodeStringAsFormData just take a Vector<LChar> then?

Contributor

@weinig weinig left a comment

This has been fascinating to review. My major takeaways for future work here:

  • Most uses of byteCast<LChar> probably need to be looked at after this lands and checked to make sure they are right. While many are probably fine (e.g., passing along a base64-encoded string), I am not super confident in quite a few. The explicitness here is great!

  • Would love for us to adopt std::byte instead of uint8_t, just to make it more clear (uint8_t looks a bit too close to char8_t for me).

  • Would love to use long names for these character types, even the standard ones.

    • LChar -> Latin1Character
    • char8_t -> UTF8Character
    • char16_t -> UTF16Character
      We have this historical exception for this type using abbreviations, but I don't think it warrants it.

Contributor

@geoffreygaren geoffreygaren left a comment

I'm a big fan of this change! Previously, it took me an embarrassingly long time to "decode" WebKit's approach to character types.

@mcatanzaro
Contributor

I like the proposal to rename LChar to Latin1Character.

Using longer custom names for char8_t and char16_t seems much less valuable, since these are standard types that C++ developers ought to be familiar with already.

@cdumez
Contributor

cdumez commented Sep 17, 2025

> I like the proposal to rename LChar to Latin1Character.
>
> Using longer custom names for char8_t and char16_t seems much less valuable, since these are standard types that C++ developers ought to be familiar with already.

I don't know, char8_t wasn't clear to me initially as representing UTF-8 🤷🏻

@darinadler darinadler force-pushed the eng/Make-LChar-a-distinct-type-from-uint8_t-so-it-can-imply-character-encoding-as-char8-16-32_t-do branch from 3912a64 to 3a139dd Compare September 20, 2025 15:49
String jsonData = String::fromUTF8(data);

auto messageValue = JSON::Value::parseJSON(jsonData);
auto messageValue = JSON::Value::parseJSON(String { byteCast<char8_t>(data.span()) });
Member Author

Just realized I could have simplified this further:

Suggested change
- auto messageValue = JSON::Value::parseJSON(String { byteCast<char8_t>(data.span()) });
+ auto messageValue = JSON::Value::parseJSON(byteCast<char8_t>(data.span()));

@darinadler darinadler added the merge-queue Applied to send a pull request to merge-queue label Sep 20, 2025
…oding as char8/16/32_t do

https://bugs.webkit.org/show_bug.cgi?id=296539
rdar://156856072

Reviewed by Chris Dumez, Sam Weinig, and Geoffrey Garen.

LChar is now a struct so it can be a distinct type, so LChar means Latin-1, char8_t means UTF-8,
and uint8_t and char remain ambiguous about encoding.

Tried to mostly stay with the minimum to get things compiling, without a lot of "cleanup".

As part of this made it possible to construct String directly from std::span<char8_t>
without having to utter String::fromUTF8 since the type is unambiguous.

We should follow up by removing more overloads and functions that interpret uint8_t, char, or
even std::byte as particular encodings, and use byteCast to make encoding explicit.

* Source/JavaScriptCore/API/JSScript.mm:
(+[JSScript scriptOfType:memoryMappedFromASCIIFile:withSourceURL:andBytecodeCache:inVirtualMachine:error:]):
Cast to LChar.

* Source/JavaScriptCore/API/JSStringRefCF.cpp:
(JSStringCreateWithCFString): Cast to UInt8.
(JSStringCopyCFString): Ditto.

* Source/JavaScriptCore/Scripts/xxd.pl:
Use constexpr std::array<LChar> for C++.

* Source/JavaScriptCore/heap/HeapSnapshotBuilder.cpp:
(JSC::HeapSnapshotBuilder::json): Rely on serialization of enum as a numeral,
rather than explicitly calling edgeTypeToNumber.

* Source/JavaScriptCore/inspector/remote/socket/RemoteInspectorConnectionClient.cpp:
(Inspector::RemoteInspectorConnectionClient::extractEvent): Cast to char8_t.

* Source/JavaScriptCore/inspector/remote/socket/RemoteInspectorSocket.cpp:
(Inspector::RemoteInspector::backendCommands const): Eliminate use of String::adopt.
It doesn't really work for vectors any more, and likely we should remove it to avoid
making a promise we can't keep. It doesn't work with byteCast, which is why we need
to do this here now.

* Source/JavaScriptCore/parser/Lexer.cpp:
(JSC::Lexer<T>::peek const): Tweak so the conditional operator compiles.
(JSC::Lexer<T>::parseString): Use SIMD::SizedUnsigned.
(JSC::Lexer<T>::parseStringSlowCase): Removed a static assertion that's not
super important and a bit tricky to write given that LChar is no longer a scalar.
(JSC::Lexer<T>::lexWithoutClearingLineTerminator): Use SIMD::SizedUnsigned.
* Source/JavaScriptCore/runtime/ISO8601.h: Cast to char.

* Source/JavaScriptCore/runtime/IntlObject.cpp:
(JSC::parseVariantCode): Removed a static assertion that's not super important
and a bit tricky to write given that LChar is no longer a scalar.

* Source/JavaScriptCore/runtime/JSGenericTypedArrayViewPrototype.cpp:
(JSC::uint8ArrayPrototypeToHex): Cast to uint8_t.

* Source/JavaScriptCore/runtime/JSONObject.cpp:
(JSC::stringCopySameType): Use SIMD::SizedUnsigned.
(JSC::stringCopyUpconvert): Use uint8_t directly.

* Source/JavaScriptCore/runtime/LiteralParser.cpp:
(JSC::reviverMode>::Lexer::lexString): Reduce mixing char with LChar a bit.
Also use SIMD::SizedUnsigned.

* Source/WTF/wtf/HexNumber.cpp:
(WTF::Internal::hexDigitsForMode): Moved this here from the header since it's
only used here.

* Source/WTF/wtf/HexNumber.h:
(WTF::Internal::appendHex): Added an overload that takes LChar and forwards all
the other arguments.

* Source/WTF/wtf/JSONValues.cpp: Use char16_t for local since that works with
a switch statement, LChar would not since it's a struct rather than a scalar.

* Source/WTF/wtf/PrintStream.h:
(WTF::printInternal): Added char8_t overload.

* Source/WTF/wtf/SIMDHelpers.h: Added SizedUnsigned, which works for LChar.
(WTF::SIMD::find): Use SizedUnsigned.
(WTF::SIMD::count): Ditto.

* Source/WTF/wtf/SortedArrayMap.h:
(WTF::foldForComparison): Use SIMD::SizedUnsigned.

* Source/WTF/wtf/StdLibExtras.h: Update ByteType to work with LChar. Also
tweaked it a little and renamed it IsByte and IsMutableByte.
(WTF::ByteCastTraits<T>::cast): Ditto.
(WTF::ByteCastTraits<T::cast): Ditto.
(WTF::byteCast): Ditto.

* Source/WTF/wtf/URLParser.cpp:
(WTF::URLParser::appendNumberToASCIIBuffer): Cast to char.
(WTF::URLParser::formURLDecode): Cast to char8_t.

* Source/WTF/wtf/cf/CFURLExtras.cpp:
(WTF::bytesAsString): Cast to UInt8.
(WTF::isSameOrigin): Cast to LChar.

* Source/WTF/wtf/cf/URLCF.cpp:
(WTF::URL::createCFURL): Cast to UInt8.

* Source/WTF/wtf/cocoa/NSURLExtras.mm:
(WTF::userVisibleString): Cast to LChar.

* Source/WTF/wtf/cocoa/SpanCocoa.h:
(WTF::toNSData): Added overloads to make this work with all IsByte types.
(WTF::toNSDataNoCopy): Ditto.

* Source/WTF/wtf/persistence/PersistentCoders.cpp:
(WTF::Persistence::Coder<String>::encodeForPersistence): Use asBytes to
convert result of span8 into bytes, not LChar.

* Source/WTF/wtf/persistence/PersistentDecoder.h:
(WTF::Persistence::Decoder::bufferIsLargeEnoughToContain const):
Update assertion to allow LChar.

* Source/WTF/wtf/text/ASCIIFastPath.h:
Redid the NonASCIIMask and NonLatin1Mask to not rely on specific types
for CharacterType and depend on the size of the type instead.

* Source/WTF/wtf/text/Base64.cpp:
(WTF::base64DecodeInternal): Take the vector type as the template parameter
instead of just the Malloc. Simplifies things a bit.
(WTF::base64DecodeToString): Updated for the above, also simplified by
removing the lambda since it reads well without it.

* Source/WTF/wtf/text/IntegerToStringConversion.h:
(WTF::numberToStringImpl): Cast to char.
(WTF::writeIntegerToBufferImpl): Ditto.

* Source/WTF/wtf/text/LChar.h: Make LChar a struct in the WTF namespace.
Added constexpr functions to smooth the use of this in code that treats
it as an integer. Added IsStringStorageCharacter concept so we can write
templates that work with LChar and char16_t without accidentally allowing
other types as well.

* Source/WTF/wtf/text/ParsingUtilities.h: Added an overload of skipWhile for char8_t.

* Source/WTF/wtf/text/StringBuilder.h:
(WTF::StringBuilder::append): Added an overload for uint8_t, needed since there is code
that depends on single argument append treating it as a character and other code that
depends on variadic append treating it as a numeral!
(WTF::StringBuilder::operator[] const): Tweak so the conditional operator compiles.

* Source/WTF/wtf/text/StringCommon.h: Added an IsFindableCharacter concept so
function templates compile for the correct types including LChar. Use SIMD::SizedUnsigned.
(WTF::equalLettersIgnoringASCIICaseWithLength):
(WTF::compareEach): Added a cast to int to silence a compiler warning about our
intentional use of bitwise with booleans.

* Source/WTF/wtf/text/StringHasher.h:
(WTF::StringHasher::DefaultConverter::convert): Fixed this so it works with LChar.

* Source/WTF/wtf/text/StringImpl.cpp:
(WTF::StringImpl::create): Added overload for char8_t spans.

* Source/WTF/wtf/text/StringImpl.h: Use IsStringStorageCharacter to make templates
type check a bit better.
(WTF::StringImpl::create): Added a new create function for UTF-8 strings
to move the logic from WTF::String to here.
(WTF::StringImpl::copyCharacters): Tweak to work with LChar.
(WTF::StringImpl::tryCreateUninitialized):
(WTF::StringImpl::at const): Tweak so the conditional operator compiles.
(WTF::StringImpl::createByReplacingInCharacters): Ditto.

* Source/WTF/wtf/text/WTFString.cpp:
(WTF::String::String): Added a constructor that takes a std::span<const char8_t>
for use instead of String::fromUTF8. Use StringImpl::create.
(WTF::String::ascii const): Tweak so the conditional operator compiles.
(WTF::String::fromUTF8ReplacingInvalidSequences): Use StringImpl::create.
(WTF::fromUTF8Impl): Deleted. Uses StringImpl instead.
(WTF::String::fromUTF8): Deleted the overload that takes const char8_t; instead
the template allows any byte type.

* Source/WTF/wtf/text/WTFString.h: Added the new constructor for span<char8_t>
and rearranged from functions so they work with more types. Since they explicitly
state the encoding, we can allow them to take any byte type.

* Source/WTF/wtf/text/cf/StringCF.cpp:
(WTF::String::String): Cast to UInt8.
* Source/WTF/wtf/text/cf/StringImplCF.cpp:
(WTF::StringImpl::createCFString): Ditto.
* Source/WTF/wtf/text/cf/StringViewCF.cpp:
(WTF::StringView::createCFString const): Ditto.
(WTF::StringView::createCFStringWithoutCopying const): Ditto.

* Source/WebCore/DerivedSources-input.xcfilelist: Updated for scripts from
JavaScriptCore that were always used but are now dependencies.

* Source/WebCore/DerivedSources.make: Added missing dependencies in the rules
that produce XMLViewerCSS.h and XMLViewerJS.h so they are regenerated if the
scripts that produce them change, as with this patch that changes xxd.pl.

* Source/WebCore/Modules/encryptedmedia/InitDataRegistry.cpp:
(WebCore::extractKeyIDsKeyids): Cast to LChar to pass to parseJSON and remove
the unnecessary copy into a temporary String.

* Source/WebCore/Modules/mediastream/PeerConnectionBackend.cpp:
(WebCore::PeerConnectionBackend::handleLogMessage): Cast to uint8_t.
* Source/WebCore/Modules/mediastream/RTCRtpSFrameTransformerCocoa.cpp:
(WebCore::RTCRtpSFrameTransformer::computeSaltKey): Ditto.
(WebCore::createBaseSFrameKey): Ditto.
(WebCore::RTCRtpSFrameTransformer::computeAuthenticationKey): Ditto.
(WebCore::RTCRtpSFrameTransformer::computeEncryptionKey): Ditto.
* Source/WebCore/Modules/mediastream/gstreamer/GStreamerDtlsTransportBackend.cpp:
(WebCore::GStreamerDtlsTransportBackendObserver::stateChanged): Ditto.
* Source/WebCore/Modules/push-api/PushMessageCrypto.cpp:
(WebCore::PushCrypto::decryptAES128GCMPayload): Ditto.
(WebCore::PushCrypto::decryptAESGCMPayload): Ditto.

* Source/WebCore/Modules/url-pattern/URLPatternParser.cpp:
(WebCore::URLPatternUtilities::escapeRegexStringForCharacters): Specify the type
and size of the array to make it compile.
(WebCore::URLPatternUtilities::escapePatternStringForCharacters): Ditto.

* Source/WebCore/Modules/websockets/WebSocketExtensionDispatcher.cpp:
(WebCore::WebSocketExtensionDispatcher::processHeaderValue): Use char8_t
since the code currently parses a UTF-8 representation of the header value.

* Source/WebCore/Modules/websockets/WebSocketExtensionParser.cpp:
(WebCore::isSpaceOrTab): Use char8_t, for now at least.
* Source/WebCore/Modules/websockets/WebSocketExtensionParser.h: Ditto.

* Source/WebCore/Modules/websockets/WebSocketHandshake.cpp:
(WebCore::trimInputSample): Cast to LChar.
(WebCore::WebSocketHandshake::readStatusLine): Ditto.

* Source/WebCore/PAL/PAL.xcodeproj/project.pbxproj: Removed Gunzip.cpp/h.
* Source/WebCore/PAL/pal/CMakeLists.txt: Removed Gunzip.h.
* Source/WebCore/PAL/pal/Gunzip.h: Removed.
* Source/WebCore/PAL/pal/PlatformMac.cmake: Removed Gunzip.cpp.
* Source/WebCore/PAL/pal/cocoa/Gunzip.cpp: Removed.

* Source/WebCore/bindings/js/ScriptBufferSourceProvider.h: Cast to LChar.
* Source/WebCore/bindings/js/SerializedScriptValue.cpp:
(WebCore::CloneDeserializer::readString): Ditto.

* Source/WebCore/contentextensions/DFABytecodeInterpreter.cpp:
(WebCore::ContentExtensions::DFABytecodeInterpreter::interpretJumpTable):
Cast to char so the conditional operator compiles.
(WebCore::ContentExtensions::DFABytecodeInterpreter::interpret): Ditto.

* Source/WebCore/crypto/SubtleCrypto.cpp:
(WebCore::SubtleCrypto::unwrapKey): Cast to LChar to pass to JSONParse and remove
the unnecessary copy into a temporary String.

* Source/WebCore/editing/cocoa/WebContentReaderCocoa.mm:
(WebCore::replaceRichContentWithAttachments): Cast to LChar.
* Source/WebCore/fileapi/FileReaderLoader.cpp:
(WebCore::FileReaderLoader::stringResult): Ditto.
* Source/WebCore/html/FTPDirectoryDocument.cpp:
(WebCore::FTPDirectoryDocumentParser::loadDocumentTemplate): Ditto.

* Source/WebCore/html/parser/HTMLEntityParser.cpp:
(WebCore::StringParsingBufferSource::currentCharacter const): Tweak so
the conditional operator compiles.
* Source/WebCore/html/track/VTTScanner.h:
(WebCore::VTTScanner::currentChar const): Ditto.

* Source/WebCore/html/track/WebVTTParser.cpp:
(WebCore::WebVTTParser::fileFinished): Cast to uint8_t.

* Source/WebCore/loader/FTPDirectoryParser.cpp:
(WebCore::parseOneFTPLine): Cast to LChar.

* Source/WebCore/loader/FormSubmission.cpp:
(WebCore::appendMailtoPostFormDataToURL): Cast to LChar.
(WebCore::FormSubmission::create): Ditto.

* Source/WebCore/loader/TextResourceDecoder.cpp:
(WebCore::findXMLEncoding): Cast to uint8_t.
(WebCore::TextResourceDecoder::checkForCSSCharset): Cast to uint8_t and LChar.
(WebCore::TextResourceDecoder::checkForHeadCharset): Ditto.

* Source/WebCore/loader/cache/CachedScript.cpp:
(WebCore::CachedScript::script): Cast to LChar.
(WebCore::CachedScript::codeBlockHashConcurrently): Ditto.

* Source/WebCore/platform/encryptedmedia/CDMUtilities.cpp:
(WebCore::CDMUtilities::parseJSONObject): Cast to LChar to pass to parseJSON
and remove the unnecessary copy into a temporary String.
* Source/WebCore/platform/graphics/avfoundation/CDMFairPlayStreaming.cpp:
(WebCore::extractSinfData): Ditto.
(WebCore::CDMPrivateFairPlayStreaming::extractKeyIDsMpts): Ditto.

* Source/WebCore/platform/graphics/avfoundation/objc/CDMInstanceFairPlayStreamingAVFObjC.mm:
(WebCore::parseJSONValue): Cast to LChar.

* Source/WebCore/platform/graphics/freetype/FontCacheFreeType.cpp:
(WebCore::fontNameMapName): Cast to LChar.

* Source/WebCore/platform/graphics/gstreamer/eme/CDMThunder.cpp:
(WebCore::ParsedResponseMessage::ParsedResponseMessage): Cast to LChar.
(WebCore::CDMInstanceSessionThunder::loadSession): Ditto.

* Source/WebCore/platform/graphics/iso/ISOVTTCue.cpp:
Use char8_t for the buffer we are treating as UTF-8.

* Source/WebCore/platform/gstreamer/GStreamerElementHarness.cpp:
(WebCore::MermaidBuilder::span const): Cast to uint8_t.

* Source/WebCore/platform/image-decoders/png/PNGImageDecoder.cpp:
(WebCore::decodingWarning): Cast to char.
(WebCore::PNGImageDecoder::readChunks): Ditto.

* Source/WebCore/platform/mediarecorder/MediaRecorderPrivateMock.cpp:
(WebCore::MediaRecorderPrivateMock::fetchData): Cast to uint8_t.

* Source/WebCore/platform/network/HTTPParsers.cpp:
(WebCore::trimInputSample): Make into a non-template function.
(WebCore::parseHTTPHeader): Cast to LChar.

* Source/WebCore/platform/network/curl/OpenSSLHelper.cpp:
(OpenSSL::BIO::getDataAsString const): Cast to LChar.
(OpenSSL::toString): Ditto.
* Source/WebCore/platform/network/soup/CertificateInfoSoup.cpp:
(WebCore::CertificateInfo::summary const): Ditto.

* Source/WebCore/platform/text/SegmentedString.h:
(WebCore::SegmentedString::Substring::currentCharacter const):
Tweak to compile conditional operator.

* Source/WebCore/rendering/BreakLines.h:
(WebCore::BreakLines::nextBreakablePosition): Switch to use a simpler scoped
CharacterInfo struct instead of a struct template with conversions.

* Source/WebCore/testing/MockCDMFactory.cpp:
(WebCore::MockCDM::sanitizeResponse const): Cast to LChar.
(WebCore::MockCDMInstance::setServerCertificate): Ditto.
(WebCore::MockCDMInstanceSession::updateLicense): Ditto.

* Source/WebGPU/WGSL/Lexer.cpp:
(WGSL::Lexer<CharacterType>::makeToken): Moved from header.
(WGSL::Lexer<CharacterType>::makeFloatToken): Ditto.
(WGSL::Lexer<CharacterType>::makeIntegerToken): Ditto.
(WGSL::Lexer<CharacterType>::makeIdentifierToken): Ditto.

* Source/WebGPU/WGSL/Lexer.h: Simplified constructor to take a span instead
of a String, and moved some private function implementations out of the header.

* Source/WebGPU/WGSL/Parser.cpp:
(WGSL::parse): Changed to take CharacterType as the template argument rather than
the Lexer type, then we can pass a span into the Lexer instead of a String.

* Source/WebGPU/WGSL/ParserPrivate.h: Added include.

* Source/WebGPU/WebGPU/Pipeline.mm:
(WebKit::printToFileForPsoRepro): Cast to uint8_t.
* Source/WebKit/NetworkProcess/cache/NetworkCache.cpp:
(WebKit::NetworkCache::Cache::dumpContentsToFile): Ditto.

* Source/WebKit/NetworkProcess/curl/WebSocketTaskCurl.cpp:
(WebKit::WebSocketTask::didReceiveData): Cast to char8_t. Simplify some
code that is extracting bytes.
(WebKit::WebSocketTask::sendClosingHandshakeIfNeeded): Ditto.

* Source/WebKit/NetworkProcess/storage/CacheStorageManager.cpp:
(WebKit::readSizeFile): Cast to LChar.

* Source/WebKit/Platform/IPC/DaemonCoders.h:
(WebKit::Daemon::Coder<WTF::String>::encode): Cast to uint8_t.

* Source/WebKit/Platform/IPC/DaemonDecoder.h:
(WebKit::Daemon::Decoder::bufferIsLargeEnoughToContain const):
Update assertion to allow LChar.

* Source/WebKit/Shared/API/c/cf/WKStringCF.mm:
(WKStringCopyCFString): Cast to UInt8.

* Source/WebKit/Shared/Cocoa/SandboxExtensionCocoa.mm:
(WebKit::SandboxExtensionImpl::SandboxExtensionImpl): Cast to LChar.

* Source/WebKit/UIProcess/API/APIContentRuleListStore.cpp:
(API::getContentRuleListSourceFromMappedFile): Cast to LChar.

* Source/WebKit/UIProcess/API/C/WKPage.cpp:
(dataFrom): Cast to uint8_t.

* Source/WebKit/UIProcess/Cocoa/WebPasteboardProxyCocoa.mm:
(WebKit::WebPasteboardProxy::testIPCSharedMemory): Cast to LChar.

* Source/WebKit/UIProcess/Inspector/glib/RemoteInspectorClient.cpp:
(WebKit::RemoteInspectorClient::setBackendCommands): Cast to std::byte.

* Source/WebKit/UIProcess/Inspector/mac/RemoteWebInspectorUIProxyMac.mm:
(WebKit::RemoteWebInspectorUIProxy::platformLoad): Cast to LChar.

* Source/WebKit/UIProcess/Inspector/mac/WebInspectorUIProxyMac.mm:
(WebKit::WebInspectorUIProxy::platformLoad): Cast to LChar.

* Source/WebKit/UIProcess/wpe/WebPasteboardProxyWPE.cpp:
(WebKit::WebPasteboardProxy::readURLFromPasteboard): Cast to LChar.

* Source/WebKit/WebProcess/Network/webrtc/RTCDataChannelRemoteManager.cpp:
(WebKit::RTCDataChannelRemoteManager::sendData): Cast to LChar.

* Source/WebKit/WebProcess/cocoa/WebProcessCocoa.mm:
(WebKit::registerLogClient): Cast to uint8_t.

* Source/WebKitLegacy/WebCoreSupport/WebSocketChannel.cpp:
(WebCore::WebSocketChannel::processFrame): Cast to char8_t.

* Tools/Scripts/webkitpy/style/checkers/cpp.py:
(check_spacing): Add missing operator| check.

* Tools/TestWebKitAPI/Tests/WGSL/ConstLiteralTests.cpp:
(parseLCharPrimaryExpression): Use ASCIILiteral.

* Tools/TestWebKitAPI/Tests/WGSL/LexerTests.cpp:
(TestWGSLAPI::TestLexer::TestLexer): Use ASCIILiteral.
(TestWGSLAPI::checkSingleToken): Ditto.
(TestWGSLAPI::checkSingleIntegerLiteral): Ditto.
(TestWGSLAPI::checkSingleFloatLiteral): Ditto.

* Tools/TestWebKitAPI/Tests/WTF/Base64.cpp:
(TestWebKitAPI::TEST(Base64, Encode)): Cast to uint8_t.
(TestWebKitAPI::TEST(Base64, EncodeOmitPadding)): Ditto.
(TestWebKitAPI::TEST(Base64, EncodeURL)): Ditto.
(TestWebKitAPI::TEST(Base64, EncodeURLOmitPadding)): Ditto.

* Tools/TestWebKitAPI/Tests/WTF/FileSystem.cpp:
(TestWebKitAPI::createTestFile): Cast to uint8_t.
(TestWebKitAPI::TEST_F(FileSystemTest, openExistingFileTruncate)): Ditto.
(TestWebKitAPI::TEST_F(FileSystemTest, openExistingFileReadWrite)): Ditto.
(TestWebKitAPI::TEST_F(FileSystemTest, deleteEmptyDirectoryContainingDSStoreFile)): Ditto.
(TestWebKitAPI::TEST_F(FileSystemTest, deleteEmptyDirectoryOnNonEmptyDirectory)): Ditto.
(TestWebKitAPI::TEST_F(FileSystemTest, moveDirectory)): Ditto.
(TestWebKitAPI::runGetFileModificationTimeTest): Ditto.
(TestWebKitAPI::TEST_F(FileSystemTest, readEntireFile)): Ditto.

* Tools/TestWebKitAPI/Tests/WTF/StringImpl.cpp:
(TestWebKitAPI::TEST(WTF, ExternalStringImplCreate8bit)): Use char and cast to LChar.
(TestWebKitAPI::TEST(WTF, ExternalStringAtom)): Ditto.

* Tools/TestWebKitAPI/Tests/WTF/StringView.cpp:
(TestWebKitAPI::TEST(WTF, StringViewEqualIgnoringASCIICaseWithLatin1Characters)):
Use byteCast instead of reinterpret_cast.

* Tools/TestWebKitAPI/Tests/WTF/cocoa/URLExtras.mm:
(TestWebKitAPI::dataAsString): Pass a character instead of an int.

* Tools/TestWebKitAPI/Tests/WebCore/FileMonitor.cpp:
(TestWebKitAPI::readContentsOfFile): Cast to LChar.

* Tools/TestWebKitAPI/Tests/WebCore/PushMessageCrypto.cpp:
(TestWebKitAPI::TEST(PushMessageCrypto, AES128GCMPayloadWithMinimalPadding)): Eliminate
use of String::adopt.
(TestWebKitAPI::TEST(PushMessageCrypto, AES128GCMPayloadWithPadding)): Ditto.
(TestWebKitAPI::TEST(PushMessageCrypto, AESGCMPayloadWithMinimalPadding)): Ditto.
(TestWebKitAPI::TEST(PushMessageCrypto, AESGCMPayloadWithPadding)): Ditto.

* Tools/TestWebKitAPI/Tests/WebCore/SharedBuffer.cpp:
(TestWebKitAPI::TEST_F(FragmentedSharedBufferTest, createWithContentsOfExistingFile)):
Cast to LChar.
(TestWebKitAPI::TEST_F(FragmentedSharedBufferTest, read)): Ditto.
(TestWebKitAPI::TEST_F(SharedBufferChunkReaderTest, includeSeparator)): Do not mix
uint8_t with LChar.
(TestWebKitAPI::TEST_F(SharedBufferChunkReaderTest, peekData)): Cast to LChar.

* Tools/TestWebKitAPI/Tests/WebCore/SharedBufferTest.cpp:
(TestWebKitAPI::FragmentedSharedBufferTest::SetUp): Cast to uint8_t.
* Tools/TestWebKitAPI/Tests/WebCore/curl/CurlMultipartHandleTests.cpp:
(TestWebKitAPI::Curl::TEST(CurlMultipartHandleTests, SimpleMessage)): Ditto.
(TestWebKitAPI::Curl::TEST(CurlMultipartHandleTests, NoHeader)): Ditto.
(TestWebKitAPI::Curl::TEST(CurlMultipartHandleTests, NoBody)): Ditto.
(TestWebKitAPI::Curl::TEST(CurlMultipartHandleTests, TransportPadding)): Ditto.
(TestWebKitAPI::Curl::TEST(CurlMultipartHandleTests, NoEndOfBoundary)): Ditto.
(TestWebKitAPI::Curl::TEST(CurlMultipartHandleTests, NoEndOfBoundaryAfterCompleted)): Ditto.
(TestWebKitAPI::Curl::TEST(CurlMultipartHandleTests, NoCloseDelimiter)): Ditto.
(TestWebKitAPI::Curl::TEST(CurlMultipartHandleTests, NoCloseDelimiterAfterCompleted)): Ditto.
(TestWebKitAPI::Curl::TEST(CurlMultipartHandleTests, CloseDelimiter)): Ditto.
(TestWebKitAPI::Curl::TEST(CurlMultipartHandleTests, CloseDelimiterAfterCompleted)): Ditto.
(TestWebKitAPI::Curl::TEST(CurlMultipartHandleTests, DivideFirstDelimiter)): Ditto.
(TestWebKitAPI::Curl::TEST(CurlMultipartHandleTests, DivideSecondDelimiter)): Ditto.
(TestWebKitAPI::Curl::TEST(CurlMultipartHandleTests, DivideLastDelimiter)): Ditto.
(TestWebKitAPI::Curl::TEST(CurlMultipartHandleTests, DivideCloseDelimiter)): Ditto.
(TestWebKitAPI::Curl::TEST(CurlMultipartHandleTests, DivideTransportPadding)): Ditto.
(TestWebKitAPI::Curl::TEST(CurlMultipartHandleTests, DivideHeader)): Ditto.
(TestWebKitAPI::Curl::TEST(CurlMultipartHandleTests, DivideBody)): Ditto.
(TestWebKitAPI::Curl::TEST(CurlMultipartHandleTests, CompleteWhileHeaderProcessing)): Ditto.
* Tools/TestWebKitAPI/Tests/WebKitCocoa/IndexedDBPersistence.mm:
(-[IndexedDBOpenPanelUIDelegate webView:runOpenPanelWithParameters:initiatedByFrame:completionHandler:]):
Ditto.

* Tools/TestWebKitAPI/Tests/WebKitCocoa/WebPushDaemon.mm: Cast to LChar.

* Tools/WebKitTestRunner/StringFunctions.h:
(WTR::toWTFString): Use char8_t and char.

Canonical link: https://commits.webkit.org/300282@main
@webkit-commit-queue webkit-commit-queue force-pushed the eng/Make-LChar-a-distinct-type-from-uint8_t-so-it-can-imply-character-encoding-as-char8-16-32_t-do branch from 3a139dd to 774c0dc Compare September 20, 2025 17:49
@webkit-commit-queue
Collaborator

Committed 300282@main (774c0dc): https://commits.webkit.org/300282@main

Reviewed commits have been landed. Closing PR #48579 and removing active labels.

@webkit-commit-queue webkit-commit-queue merged commit 774c0dc into WebKit:main Sep 20, 2025
@webkit-commit-queue webkit-commit-queue removed the merge-queue Applied to send a pull request to merge-queue label Sep 20, 2025
@darinadler darinadler deleted the eng/Make-LChar-a-distinct-type-from-uint8_t-so-it-can-imply-character-encoding-as-char8-16-32_t-do branch September 20, 2025 18:02
@mcatanzaro
Contributor

> I don't know, char8_t wasn't clear to me initially as representing UTF-8 🤷🏻

After spending a bit of time in the char*_t weeds, I again more strongly suggest that we stick with standard C++ types. They're a little new, but we will eventually get used to them.

9 participants