Skip to content

Use std::span and char8_t for UTF-8#26377

Merged
webkit-commit-queue merged 1 commit intoWebKit:mainfrom
darinadler:eng/Use-stdspan-and-char8_t-for-UTF-8
Apr 9, 2024
Merged

Use std::span and char8_t for UTF-8#26377
webkit-commit-queue merged 1 commit intoWebKit:mainfrom
darinadler:eng/Use-stdspan-and-char8_t-for-UTF-8

Conversation

@darinadler
Copy link
Member

@darinadler darinadler commented Mar 23, 2024

353a203

Use std::span and char8_t for UTF-8
https://bugs.webkit.org/show_bug.cgi?id=271527
rdar://problem/125298503

Reviewed by Chris Dumez.

An advantage of using char8_t is that it expresses the fact that
the characters are UTF-8, so we can just use std::span<char8_t>
instead of things like FromUTF8. Also, makeString never returns
a null string, which turns out to be what many of these callers
want, so we save code.

* Source/JavaScriptCore/API/JSStringRef.cpp:
(JSStringCreateWithUTF8CString): Cast to char8_t.

* Source/JavaScriptCore/wasm/WasmFormat.cpp:
(JSC::Wasm::makeString): Use makeString instead of String::fromUTF8.

* Source/JavaScriptCore/wasm/WasmName.h: Use char8_t.

* Source/JavaScriptCore/wasm/WasmParser.h:
(JSC::Wasm::Parser::fail const): Remove unneeded call to String::number.
(JSC::Wasm::Parser<SuccessType>::consumeUTF8String): Use span and
spanReinterpretCast<const char8_t>.

* Source/JavaScriptCore/wasm/WasmSectionParser.cpp:
(JSC::Wasm::SectionParser::parseExport): Use makeString instead of
String::fromUTF8.

* Source/JavaScriptCore/wasm/js/JSWebAssemblyInstance.cpp:
(JSC::JSWebAssemblyInstance::tryCreate): Use makeAtomString instead of
String::fromUTF8.
* Source/JavaScriptCore/wasm/js/JSWebAssemblyModule.cpp:
(JSC::JSWebAssemblyModule::finishCreation): Ditto.

* Source/JavaScriptCore/wasm/js/WebAssemblyModuleConstructor.cpp:
(JSC::webAssemblyModuleCustomSections): Use makeString instead of
String::fromUTF8.
* Source/JavaScriptCore/wasm/js/WebAssemblyModuleRecord.cpp:
(JSC::WebAssemblyModuleRecord::finishCreation): Ditto.
(JSC::WebAssemblyModuleRecord::initializeImports): Take advantage of
the fact that makeString already handles this without calling
String::UTF8, and call makeAtomString instead of String::fromUTF8..
(JSC::WebAssemblyModuleRecord::initializeExports): Ditto.

* Source/WTF/wtf/text/AtomStringImpl.cpp:
(WTF::HashAndUTF8CharactersTranslator::translate): Cast to char8_t.

* Source/WTF/wtf/text/StringConcatenate.h: Renamed FromUTF8 to UTF8Adapter
since there is little need to use it directly any more. Use char8_t instead
of char. Added a specialization for span<char8_t> that uses the UTF8Adapter.

* Source/WTF/wtf/text/WTFString.cpp:
(WTF::fromUTF8Impl): Use spanReinterpretCast<const char8_t>.

* Source/WTF/wtf/unicode/UTF8Conversion.cpp:
(WTF::Unicode::convertUTF8ToUTF16Impl): Use std::span<const char8_t>.
(WTF::Unicode::convertUTF8ToUTF16): Ditto.
(WTF::Unicode::convertUTF8ToUTF16ReplacingInvalidSequences): Ditto.
(WTF::Unicode::computeUTFLengths): Ditto.
* Source/WTF/wtf/unicode/UTF8Conversion.h: Ditto.

* Source/WebCore/workers/ScriptBuffer.cpp:
(WebCore::ScriptBuffer::toString const): Use
spanReinterpretCast<const char8_t>.

* Source/WebCore/xml/XSLTProcessorLibxslt.cpp:
(WebCore::writeToStringBuilder): Update to use UTF8Adapter.

Canonical link: https://commits.webkit.org/277249@main

978b196

Misc iOS, tvOS & watchOS macOS Linux Windows
✅ 🧪 style ✅ 🛠 ios ✅ 🛠 mac loading 🛠 wpe loading 🛠 wincairo
✅ 🧪 bindings ✅ 🛠 ios-sim ✅ 🛠 mac-AS-debug ✅ 🧪 wpe-wk2
✅ 🧪 webkitperl ✅ 🧪 ios-wk2 🧪 api-mac ✅ 🧪 api-wpe
✅ 🧪 ios-wk2-wpt 🧪 mac-wk1 loading 🛠 wpe-skia
✅ 🛠 🧪 jsc ✅ 🧪 api-ios 🧪 mac-wk2 loading 🛠 gtk
loading 🛠 🧪 jsc-arm64 ✅ 🛠 tv ✅ 🧪 mac-AS-debug-wk2 ✅ 🧪 gtk-wk2
✅ 🛠 tv-sim ✅ 🧪 api-gtk
loading 🛠 watch ✅ 🛠 jsc-armv7
✅ 🛠 🧪 unsafe-merge ✅ 🛠 watch-sim ✅ 🧪 jsc-armv7-tests

@darinadler darinadler self-assigned this Mar 23, 2024
@darinadler darinadler requested a review from a team as a code owner March 23, 2024 22:02
@webkit-ews-buildbot webkit-ews-buildbot added the merging-blocked Applied to prevent a change from being merged label Mar 23, 2024
@darinadler darinadler removed the merging-blocked Applied to prevent a change from being merged label Mar 23, 2024
@darinadler darinadler force-pushed the eng/Use-stdspan-and-char8_t-for-UTF-8 branch from be863f2 to 4ac008d Compare March 23, 2024 22:57
@webkit-ews-buildbot webkit-ews-buildbot added the merging-blocked Applied to prevent a change from being merged label Mar 24, 2024
@darinadler darinadler marked this pull request as draft March 24, 2024 11:19
@darinadler darinadler removed the merging-blocked Applied to prevent a change from being merged label Apr 9, 2024
@darinadler darinadler force-pushed the eng/Use-stdspan-and-char8_t-for-UTF-8 branch from 4ac008d to 978b196 Compare April 9, 2024 02:28
@darinadler darinadler marked this pull request as ready for review April 9, 2024 02:28
@darinadler
Copy link
Member Author

I'm not sure why EWS doesn't show it but this is working and passing all the tests. Ready for review. I am a little surprised you didn't review already, @cdumez.

@cdumez
Copy link
Contributor

cdumez commented Apr 9, 2024

I'm not sure why EWS doesn't show it but this is working and passing all the tests. Ready for review. I am a little surprised you didn't review already, @cdumez.

On it. The last time I looked EWS was red and then I got side-tracked.

Copy link
Contributor

@cdumez cdumez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@darinadler darinadler added the unsafe-merge-queue Applied to send a pull request to merge-queue, but skip building and testing label Apr 9, 2024
https://bugs.webkit.org/show_bug.cgi?id=271527
rdar://problem/125298503

Reviewed by Chris Dumez.

An advantage of using char8_t is that it expresses the fact that
the characters are UTF-8, so we can just use std::span<char8_t>
instead of things like FromUTF8. Also, makeString never returns
a null string, which turns out to be what many of these callers
want, so we save code.

* Source/JavaScriptCore/API/JSStringRef.cpp:
(JSStringCreateWithUTF8CString): Cast to char8_t.

* Source/JavaScriptCore/wasm/WasmFormat.cpp:
(JSC::Wasm::makeString): Use makeString instead of String::fromUTF8.

* Source/JavaScriptCore/wasm/WasmName.h: Use char8_t.

* Source/JavaScriptCore/wasm/WasmParser.h:
(JSC::Wasm::Parser::fail const): Remove unneeded call to String::number.
(JSC::Wasm::Parser<SuccessType>::consumeUTF8String): Use span and
spanReinterpretCast<const char8_t>.

* Source/JavaScriptCore/wasm/WasmSectionParser.cpp:
(JSC::Wasm::SectionParser::parseExport): Use makeString instead of
String::fromUTF8.

* Source/JavaScriptCore/wasm/js/JSWebAssemblyInstance.cpp:
(JSC::JSWebAssemblyInstance::tryCreate): Use makeAtomString instead of
String::fromUTF8.
* Source/JavaScriptCore/wasm/js/JSWebAssemblyModule.cpp:
(JSC::JSWebAssemblyModule::finishCreation): Ditto.

* Source/JavaScriptCore/wasm/js/WebAssemblyModuleConstructor.cpp:
(JSC::webAssemblyModuleCustomSections): Use makeString instead of
String::fromUTF8.
* Source/JavaScriptCore/wasm/js/WebAssemblyModuleRecord.cpp:
(JSC::WebAssemblyModuleRecord::finishCreation): Ditto.
(JSC::WebAssemblyModuleRecord::initializeImports): Take advantage of
the fact that makeString already handles this without calling
String::UTF8, and call makeAtomString instead of String::fromUTF8..
(JSC::WebAssemblyModuleRecord::initializeExports): Ditto.

* Source/WTF/wtf/text/AtomStringImpl.cpp:
(WTF::HashAndUTF8CharactersTranslator::translate): Cast to char8_t.

* Source/WTF/wtf/text/StringConcatenate.h: Renamed FromUTF8 to UTF8Adapter
since there is little need to use it directly any more. Use char8_t instead
of char. Added a specialization for span<char8_t> that uses the UTF8Adapter.

* Source/WTF/wtf/text/WTFString.cpp:
(WTF::fromUTF8Impl): Use spanReinterpretCast<const char8_t>.

* Source/WTF/wtf/unicode/UTF8Conversion.cpp:
(WTF::Unicode::convertUTF8ToUTF16Impl): Use std::span<const char8_t>.
(WTF::Unicode::convertUTF8ToUTF16): Ditto.
(WTF::Unicode::convertUTF8ToUTF16ReplacingInvalidSequences): Ditto.
(WTF::Unicode::computeUTFLengths): Ditto.
* Source/WTF/wtf/unicode/UTF8Conversion.h: Ditto.

* Source/WebCore/workers/ScriptBuffer.cpp:
(WebCore::ScriptBuffer::toString const): Use
spanReinterpretCast<const char8_t>.

* Source/WebCore/xml/XSLTProcessorLibxslt.cpp:
(WebCore::writeToStringBuilder): Update to use UTF8Adapter.

Canonical link: https://commits.webkit.org/277249@main
@webkit-commit-queue webkit-commit-queue force-pushed the eng/Use-stdspan-and-char8_t-for-UTF-8 branch from 978b196 to 353a203 Compare April 9, 2024 16:00
@webkit-commit-queue
Copy link
Collaborator

Committed 277249@main (353a203): https://commits.webkit.org/277249@main

Reviewed commits have been landed. Closing PR #26377 and removing active labels.

@webkit-commit-queue webkit-commit-queue merged commit 353a203 into WebKit:main Apr 9, 2024
@webkit-commit-queue webkit-commit-queue removed the unsafe-merge-queue Applied to send a pull request to merge-queue, but skip building and testing label Apr 9, 2024
@darinadler darinadler deleted the eng/Use-stdspan-and-char8_t-for-UTF-8 branch April 10, 2024 09:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants