Skip to content

Commit

Permalink
Cherry-pick 7d0bddb. rdar://110454455
Browse files Browse the repository at this point in the history
    Speculative fix for crash at WebCore: PAL::newTextCodec
    https://bugs.webkit.org/show_bug.cgi?id=263084
    rdar://110454455

    Reviewed by Chris Dumez.

    This is a speculative fix for CrashTracer: com.apple.WebKit.WebContent at WebCore: PAL::newTextCodec.

    Text encodings work with the use of 2 static maps:
    - Name map, TextEncodingNameMap: maps encoding names to a canonical encoding name.
    - Codec map, TextCodecMap: maps canonical encoding names to a function that creates
    the desired codec. All possible canonical name has an entry in the map.

    When a TextEncoding object is created, it receives a string name that
    is transformed in a canonical name by atomCanonicalTextEncodingName().
    A TextEncoding name can only be null, in which case it is not valid, or
    a valid canonical name. Therefore, it should not be possible to create a
    valid name which is not canonical.

    When a canonical name is being created, we populate both Name and Codec
    maps. First we try to populate them with a reduced set (buildBaseTextCodecMaps()),
    if not sufficient we populate them with an extended set (extendTextCodecMaps()).

    Since it is not possible to create a valid non-canonical name and that
    all canonical names have an entry in the Codec maps, it should not be
    possible to have no entry in the Codec map for a valid TextEncoding, i.e.
    a TextEncoding with a valid name.

    For that reason, the callers of newTextCodec() only assert that m_encoding,
    which is a TextEncoding, isValid(). This should theoretically be enough,
    but since it is crashing, it is not. The crash seems to happen because
    we are trying to deref a value from the TextEncodingNameMap that is null,
    that can happen if:
    [1]. we hace an entry for the encoding name in the map but the value is null.
    [2]. we have a valid encoding name that has no entry in the codec map.

    [1] is most likely not what is happening because we would need:
    1.a: Having one of the functions that populate the codec maps (registerCodecs())
    registering a null value. This shouldn't be possible becuse all of these function
    register lambda functions.

    1.b: race-condition: This could also happen if we would get a valid entry in the map and
    between this point in time and the time we access the value of such iterator,
    the entry would be invalidated. That also shouldn't be possible, because
    all functions that mutate/access the maps are guarded by a static lock (encodingRegistryLock).

    [2] For testing this hypothesis, I've forced WebKit to populate
    the name and codec maps in the extended version, so I could verify a state
    in which all possible canonical names (and its codecs) are registered.
    I've then checked that in fact there is no canonical name without an
    entry in the codec map. Therefore, theoretically this should also be ruled
    out.

    As I haven't found a valid hypothesis for now, and also as I can't reproduce
    this issue, I'm trying a speculative fix, in which, we no longer just
    assert that m_encoding is valid and assume that there will be a
    an entry for it on the codec map. Instead, newTextCodec() will prevent a
    null deref by returning a default UTF-8 codec for:
    - a invalid encoding
    - if there is no entry for a valid encoding in the codec map
    - if there is an entry for a valid encoding in the codec map, but that
    entry has a null value.

    We also generate proper log that will help on further investigation.

    * Source/WebCore/PAL/pal/Logging.h:
    * Source/WebCore/PAL/pal/text/TextCodecUTF8.cpp:
    (PAL::TextCodecUTF8::codec):
    (PAL::TextCodecUTF8::registerCodecs):
    * Source/WebCore/PAL/pal/text/TextCodecUTF8.h:
    * Source/WebCore/PAL/pal/text/TextEncodingRegistry.cpp:
    (PAL::newTextCodec):

    Canonical link: https://commits.webkit.org/269304@main
Identifier: 267815.311@safari-7617.1.11.13-branch
  • Loading branch information
vitorroriz authored and MyahCobbs committed Oct 16, 2023
1 parent a9d12e2 commit 1d3a1c1
Show file tree
Hide file tree
Showing 4 changed files with 22 additions and 3 deletions.
3 changes: 2 additions & 1 deletion Source/WebCore/PAL/pal/Logging.h
Original file line number Diff line number Diff line change
Expand Up @@ -38,7 +38,8 @@ PAL_EXPORT void registerNotifyCallback(ASCIILiteral, Function<void()>&&);
#endif

#define PAL_LOG_CHANNELS(M) \
M(Media)
M(Media) \
M(TextEncoding)

#undef DECLARE_LOG_CHANNEL
#define DECLARE_LOG_CHANNEL(name) \
Expand Down
7 changes: 6 additions & 1 deletion Source/WebCore/PAL/pal/text/TextCodecUTF8.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -54,10 +54,15 @@ void TextCodecUTF8::registerEncodingNames(EncodingNameRegistrar registrar)
registrar("x-unicode20utf8", "UTF-8");
}

std::unique_ptr<TextCodecUTF8> TextCodecUTF8::codec()
{
return makeUnique<TextCodecUTF8>();
}

void TextCodecUTF8::registerCodecs(TextCodecRegistrar registrar)
{
registrar("UTF-8", [] {
return makeUnique<TextCodecUTF8>();
return codec();
});
}

Expand Down
1 change: 1 addition & 0 deletions Source/WebCore/PAL/pal/text/TextCodecUTF8.h
Original file line number Diff line number Diff line change
Expand Up @@ -37,6 +37,7 @@ class TextCodecUTF8 final : public TextCodec {
static void registerCodecs(TextCodecRegistrar);

static Vector<uint8_t> encodeUTF8(StringView);
static std::unique_ptr<TextCodecUTF8> codec();

private:
void stripByteOrderMark() final { m_shouldStripByteOrderMark = true; }
Expand Down
14 changes: 13 additions & 1 deletion Source/WebCore/PAL/pal/text/TextEncodingRegistry.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -27,6 +27,7 @@
#include "config.h"
#include "TextEncodingRegistry.h"

#include "Logging.h"
#include "TextCodecCJK.h"
#include "TextCodecICU.h"
#include "TextCodecLatin1.h"
Expand Down Expand Up @@ -254,8 +255,19 @@ std::unique_ptr<TextCodec> newTextCodec(const TextEncoding& encoding)
Locker locker { encodingRegistryLock };

ASSERT(textCodecMap);
if (!encoding.isValid()) {
RELEASE_LOG_ERROR(TextEncoding, "Trying to create new text codec with invalid (null) encoding name. Will default to UTF-8.");
return TextCodecUTF8::codec();
}
auto result = textCodecMap->find(encoding.name());
ASSERT(result != textCodecMap->end());
if (result == textCodecMap->end()) {
RELEASE_LOG_ERROR(TextEncoding, "Can't find codec for valid encoding %" PUBLIC_LOG_STRING ". Will default to UTF-8.", encoding.name());
return TextCodecUTF8::codec();
}
if (!result->value) {
RELEASE_LOG_ERROR(TextEncoding, "Codec for encoding %" PUBLIC_LOG_STRING " is null. Will default to UTF-8", encoding.name());
return TextCodecUTF8::codec();
}
return result->value();
}

Expand Down

0 comments on commit 1d3a1c1

Please sign in to comment.