Parsing HTML entities shouldn't call malloc
https://bugs.webkit.org/show_bug.cgi?id=119921
rdar://109976279

Reviewed by Chris Dumez.

This was inspired by some work done on Chromium. While I didn't use the Chromium patch, I did much the same work, taking advantage of the fact that HTML entity parsing only generates a sequence of 1-3 UTF-16 code units, not arbitrary strings.

Also fixed a mismatch between the interface and the needs of the fast path HTML parser.

* Source/WebCore/WebCore.xcodeproj/project.pbxproj: Removed CharacterReferenceParserInlines.h.

* Source/WebCore/html/parser/HTMLDocumentParserFastPath.cpp:
(WebCore::HTMLFastPathParser::scanHTMLCharacterReference): Use the consumeHTMLEntity function that takes a StringParsingBuffer. This eliminates the need for a temporary SegmentedString, and resolves the FIXME that was here.

* Source/WebCore/html/parser/HTMLEntityParser.cpp:
(WebCore::DecodedHTMLEntity::DecodedHTMLEntity): Added constructors for the class used for the return type.
(WebCore::makeEntity): Added. Converts a UChar32, Checked<UChar32>, or HTMLEntityTableEntry into a DecodedHTMLEntity.
(WebCore::SegmentedStringSource): Added. Adapter for SegmentedString so we can share a single set of parser functions.
(WebCore::StringParsingBufferSource): Added. Adapter for StringParsingBuffer.
(WebCore::consumeDecimalHTMLEntity): Added. Refactored from code formerly in CharacterReferenceParserInlines.h.
(WebCore::consumeHexHTMLEntity): Ditto.
(WebCore::consumeNamedHTMLEntity): Added. Refactored from code formerly in HTMLEntityParser::consumeNamedEntity.
(WebCore::consumeHTMLEntity): Added. Refactored from code formerly in CharacterReferenceParserInlines.h.
(WebCore::decodeNamedHTMLEntityForXMLParser): Renamed from decodeNamedEntityToUCharArray. We now take a std::array& for safety, so it's no longer necessary to put the data type in the function name.

* Source/WebCore/html/parser/HTMLEntityParser.h: Updated includes. Added a new DecodedHTMLEntity type for the return value from the parser.
Got rid of out parameters and put the error cases in the return value. Another alternative would have been std::expected.

* Source/WebCore/html/parser/HTMLTokenizer.cpp:
(WebCore::HTMLTokenizer::processEntity): Updated for changes to consumeHTMLEntity.
(WebCore::HTMLTokenizer::processToken): Ditto.

* Source/WebCore/xml/parser/CharacterReferenceParserInlines.h: Removed.

* Source/WebCore/xml/parser/XMLDocumentParserLibxml2.cpp:
(WebCore::convertUTF16EntityToUTF8): Updated to use std::span.
(WebCore::getXHTMLEntity): Updated for decodeNamedHTMLEntityForXMLParser.

Canonical link: https://commits.webkit.org/264675@main
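The description and the HTMLEntityParser.h entry combine two ideas. First, a decoded entity is at most 1-3 UTF-16 code units, so it can sit in fixed inline storage instead of a heap-allocated string. Second, failures are reported in the return value instead of through out parameters. The following is a minimal C++17 sketch of a return type built that way. DecodedEntitySketch, Status, failure, and fromCodePoints are hypothetical names. They are not WebKit's actual DecodedHTMLEntity API.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical sketch, not WebKit's DecodedHTMLEntity. A character reference
// decodes to one code point plus an optional BMP character, which is at most
// 3 UTF-16 code units, so fixed inline storage is enough (no heap allocation).
class DecodedEntitySketch {
public:
    // Failure cases are part of the return value instead of out parameters.
    enum class Status { Ok, NotEnoughCharacters, NoMatch };

    static DecodedEntitySketch failure(Status status)
    {
        assert(status != Status::Ok);
        DecodedEntitySketch result;
        result.m_status = status;
        return result;
    }

    // Encodes `first` as one or two UTF-16 code units (a surrogate pair when it
    // is outside the BMP), then appends the optional second character (0 = none).
    static DecodedEntitySketch fromCodePoints(char32_t first, char16_t optionalSecond = 0)
    {
        assert(first <= 0x10FFFF && (first < 0xD800 || first > 0xDFFF));
        DecodedEntitySketch result;
        result.m_status = Status::Ok;
        if (first >= 0x10000) {
            char32_t offset = first - 0x10000;
            result.append(static_cast<char16_t>(0xD800 + (offset >> 10)));
            result.append(static_cast<char16_t>(0xDC00 + (offset & 0x3FF)));
        } else
            result.append(static_cast<char16_t>(first));
        if (optionalSecond)
            result.append(optionalSecond);
        return result;
    }

    Status status() const { return m_status; }
    std::size_t length() const { return m_length; }
    char16_t operator[](std::size_t index) const
    {
        assert(index < m_length);
        return m_units[index];
    }

private:
    void append(char16_t unit)
    {
        assert(m_length < m_units.size()); // never more than 3 code units
        m_units[m_length++] = unit;
    }

    std::array<char16_t, 3> m_units {};
    std::size_t m_length { 0 };
    Status m_status { Status::NoMatch };
};
```

For example, fromCodePoints(U'A') stores one code unit. U+1D504 lies outside the BMP, so it is stored as the surrogate pair D835 DD04. None of these calls touches the heap. Under the same assumptions, a std::expected-based return type would be an equally valid way to express the error cases.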
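The HTMLEntityParser.cpp entries describe sharing a single set of parser functions between SegmentedString input and StringParsingBuffer input through small adapter types. The sketch below shows that pattern in C++17 using assumed names. ContiguousSource and SegmentedSource are hypothetical stand-ins for those adapters. consumeHexDigits is a simplified hex-reference parser, not WebKit's consumeHexHTMLEntity. It clamps values above U+10FFFF to U+FFFD and leaves out the rest of the HTML specification's error handling.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <optional>
#include <string>
#include <string_view>
#include <utility>

// Hypothetical adapter over one contiguous buffer (in the spirit of StringParsingBuffer).
class ContiguousSource {
public:
    explicit ContiguousSource(std::u16string_view text) : m_text(text) { }
    bool atEnd() const { return m_position == m_text.size(); }
    char16_t peek() const { assert(!atEnd()); return m_text[m_position]; }
    void advance() { ++m_position; }
private:
    std::u16string_view m_text;
    std::size_t m_position { 0 };
};

// Hypothetical adapter over input that arrives in chunks (in the spirit of SegmentedString).
class SegmentedSource {
public:
    void append(std::u16string chunk) { m_chunks.push_back(std::move(chunk)); skipEmptyChunks(); }
    bool atEnd() const { return m_chunks.empty(); }
    char16_t peek() const { assert(!atEnd()); return m_chunks.front()[m_offset]; }
    void advance()
    {
        if (++m_offset == m_chunks.front().size()) {
            m_chunks.pop_front();
            m_offset = 0;
            skipEmptyChunks();
        }
    }
private:
    void skipEmptyChunks()
    {
        while (!m_chunks.empty() && m_chunks.front().empty())
            m_chunks.pop_front();
    }
    std::deque<std::u16string> m_chunks;
    std::size_t m_offset { 0 };
};

// One parser for every adapter: consumes the hex digits of a reference such as
// "&#x1F600;" (the caller has already consumed "&#x") and an optional ';'.
// Returns std::nullopt when no hex digit is present.
template<typename Source>
std::optional<char32_t> consumeHexDigits(Source& source)
{
    std::uint32_t value = 0;
    bool sawDigit = false;
    bool tooLarge = false;
    while (!source.atEnd()) {
        char16_t c = source.peek();
        std::uint32_t digit;
        if (c >= u'0' && c <= u'9')
            digit = c - u'0';
        else if (c >= u'a' && c <= u'f')
            digit = c - u'a' + 10;
        else if (c >= u'A' && c <= u'F')
            digit = c - u'A' + 10;
        else
            break;
        sawDigit = true;
        if (!tooLarge) { // stop accumulating once past U+10FFFF, so no wraparound
            value = value * 16 + digit;
            tooLarge = value > 0x10FFFF;
        }
        source.advance();
    }
    if (!sawDigit)
        return std::nullopt;
    if (!source.atEnd() && source.peek() == u';')
        source.advance();
    return tooLarge ? static_cast<char32_t>(0xFFFD) : static_cast<char32_t>(value);
}
```

The parser is a template over the source type, so both adapters run the same parsing logic. A caller that already has a contiguous buffer does not need to copy it into a temporary chunked string first.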