[JSC] Introduce wyhash to StringImpl
https://bugs.webkit.org/show_bug.cgi?id=260147
rdar://113860312

Reviewed by Yusuke Suzuki.

StringImpl::hashSlowCase() is one of the hottest functions when running
JetStream2. However, the existing hash function is slow for large
strings since it is based on incremental hashing of each character.
This patch introduces wyhash[1] (which hashes data in bulk) to StringImpl
with OSS approval (ID OSS-16203), since wyhash provides the best hash
score without quality issues according to SMhasher[2].

In this patch:
1. Rename the previous StringHasher.h to SuperFastHash.h.
2. Use SuperFastHash for small strings and wyhash for large strings
in order to get the best performance on JetStream2.

[1] https://github.com/wangyi-fudan/wyhash
[2] https://github.com/rurban/smhasher
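
The small/large split described in item 2 can be sketched as a simple length-based dispatch. This is a minimal illustration in Python, not the patch's code: the 48-character threshold is taken from hashValue() in the create_hash_table diff in this commit, while `hash_string`, `small_hasher` and `large_hasher` are hypothetical names standing in for SuperFastHash and wyhash.

```python
from typing import Callable, Sequence

# Threshold taken from hashValue() in create_hash_table in this commit:
# strings of up to 48 characters go to the "small string" hasher.
SMALL_STRING_MAX_CHARS = 48


def hash_string(
    chars: Sequence[str],
    small_hasher: Callable[[Sequence[str]], int],
    large_hasher: Callable[[Sequence[str]], int],
) -> int:
    """Pick a hasher by length (hypothetical helper, for illustration only)."""
    if len(chars) <= SMALL_STRING_MAX_CHARS:
        return small_hasher(chars)   # stands in for SuperFastHash
    return large_hasher(chars)       # stands in for wyhash
```

Because both paths must produce the same value for the same string everywhere the hash is computed (C++, Perl and Python generators), the threshold has to be identical in every port.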

* Source/JavaScriptCore/create_hash_table:
* Source/JavaScriptCore/runtime/JSCBytecodeCacheVersion.cpp.in:
* Source/JavaScriptCore/runtime/JSCell.cpp:
(JSC::reportZappedCellAndCrash):
* Source/JavaScriptCore/runtime/TemplateObjectDescriptor.h:
(JSC::TemplateObjectDescriptor::calculateHash):
* Source/JavaScriptCore/tools/JSDollarVM.cpp:
* Source/JavaScriptCore/tools/VMInspector.cpp:
(JSC::VMInspector::dumpSubspaceHashes):
* Source/JavaScriptCore/yarr/hasher.py:
(stringHash):
(finalizeAndMaskTop8Bits):
(superFastHash):
(wyhash):
(wyhash.add64):
(wyhash.multi64):
(wyhash.wymum):
(wyhash.wymix):
(wyhash.convert32BitTo64Bit):
(wyhash.convert16BitTo32Bit):
(wyhash.c2i):
(wyhash.wyr8):
(wyhash.wyr4):
(wyhash.wyr2):
(ceilingToPowerOf2):
(createHashTable):
* Source/WTF/WTF.xcodeproj/project.pbxproj:
* Source/WTF/wtf/CMakeLists.txt:
* Source/WTF/wtf/Hasher.h:
* Source/WTF/wtf/text/CString.cpp:
(WTF::CString::hash const):
* Source/WTF/wtf/text/StringHash.h:
(WTF::ASCIICaseInsensitiveHash::hash):
* Source/WTF/wtf/text/StringHasher.h:
(WTF::StringHasher::StringHasher):
(WTF::StringHasher::finalize):
(WTF::StringHasher::finalizeAndMaskTop8Bits):
(WTF::StringHasher::addCharactersAssumingAligned): Deleted.
(WTF::StringHasher::addCharacter): Deleted.
(WTF::StringHasher::addCharacters): Deleted.
(WTF::StringHasher::hashWithTop8BitsMasked const): Deleted.
(WTF::StringHasher::hash const): Deleted.
(WTF::StringHasher::computeHashAndMaskTop8Bits): Deleted.
(WTF::StringHasher::computeHash): Deleted.
(WTF::StringHasher::computeLiteralHash): Deleted.
(WTF::StringHasher::computeLiteralHashAndMaskTop8Bits): Deleted.
(WTF::StringHasher::calculateWithRemainingLastCharacter): Deleted.
(WTF::StringHasher::calculateWithTwoCharacters): Deleted.
(WTF::StringHasher::computeHashImpl): Deleted.
(WTF::StringHasher::processPendingCharacter const): Deleted.
* Source/WTF/wtf/text/StringHasherInlines.h: Added.
(WTF::StringHasher::computeHashAndMaskTop8Bits):
(WTF::StringHasher::computeLiteralHashAndMaskTop8Bits):
(WTF::StringHasher::addCharacter):
(WTF::StringHasher::hashWithTop8BitsMasked):
* Source/WTF/wtf/text/StringImpl.h:
* Source/WTF/wtf/text/SuperFastHash.h: Copied from Source/WTF/wtf/text/StringHasher.h.
(WTF::SuperFastHash::addCharactersAssumingAligned):
(WTF::SuperFastHash::addCharacter):
(WTF::SuperFastHash::addCharacters):
(WTF::SuperFastHash::hashWithTop8BitsMasked const):
(WTF::SuperFastHash::hash const):
(WTF::SuperFastHash::computeHashAndMaskTop8Bits):
(WTF::SuperFastHash::computeHash):
(WTF::SuperFastHash::computeLiteralHash):
(WTF::SuperFastHash::computeLiteralHashAndMaskTop8Bits):
(WTF::SuperFastHash::calculateWithRemainingLastCharacter):
(WTF::SuperFastHash::calculateWithTwoCharacters):
(WTF::SuperFastHash::computeHashImpl):
(WTF::SuperFastHash::processPendingCharacter const):
* Source/WTF/wtf/text/WYHash.h: Added.
(WTF::WYHash::computeHashAndMaskTop8Bits):
(WTF::WYHash::wyrot):
(WTF::WYHash::wymum):
(WTF::WYHash::wymix):
(WTF::WYHash::Reader16Bit::hasDefaultConverter):
(WTF::WYHash::Reader16Bit::convert):
(WTF::WYHash::Reader16Bit::wyr3):
(WTF::WYHash::Reader16Bit::wyr4WithConvert):
(WTF::WYHash::Reader16Bit::wyr4):
(WTF::WYHash::Reader16Bit::wyr8WithConvert):
(WTF::WYHash::Reader16Bit::wyr8):
(WTF::WYHash::Reader8Bit::hasDefaultConverter):
(WTF::WYHash::Reader8Bit::convert):
(WTF::WYHash::Reader8Bit::wyr3):
(WTF::WYHash::Reader8Bit::wyr4WithConvert):
(WTF::WYHash::Reader8Bit::wyr4):
(WTF::WYHash::Reader8Bit::wyr8WithConvert):
(WTF::WYHash::Reader8Bit::wyr8):
(WTF::WYHash::initSeed):
(WTF::WYHash::consume24Characters):
(WTF::WYHash::handleEndCase):
(WTF::WYHash::handleGreaterThan8CharactersCase):
(WTF::WYHash::hash):
(WTF::WYHash::computeHashImpl):
* Source/WTF/wtf/unicode/UTF8Conversion.cpp:
(WTF::Unicode::calculateStringHashAndLengthFromUTF8MaskingTop8Bits):
* Source/WebCore/bindings/scripts/Hasher.pm:
(finalizeAndMaskTop8Bits):
(superFastHash):
(uint64_add):
(uint64_multi):
(GenerateHashValue): Deleted.
* Source/WebCore/contentextensions/DFAMinimizer.cpp:
* Source/WebCore/contentextensions/HashableActionList.h:
(WebCore::ContentExtensions::HashableActionList::HashableActionList):
* Source/WebCore/platform/SharedStringHash.cpp:
(WebCore::computeSharedStringHashInline):
* Source/WebCore/platform/graphics/WidthCache.h:
(WebCore::WidthCache::SmallStringKey::SmallStringKey):
* Tools/TestWebKitAPI/CMakeLists.txt:
* Tools/TestWebKitAPI/TestWebKitAPI.xcodeproj/project.pbxproj:
* Tools/TestWebKitAPI/Tests/WTF/StringHasher.cpp:
(TestWebKitAPI::TEST):
* Tools/TestWebKitAPI/Tests/WTF/SuperFastHash.cpp: Copied from Tools/TestWebKitAPI/Tests/WTF/StringHasher.cpp.
(TestWebKitAPI::TEST):
* Tools/TestWebKitAPI/Tests/WTF/WYHash.cpp: Added.
(TestWebKitAPI::TEST):

Canonical link: https://commits.webkit.org/266929@main
hyjorc1 authored and Constellation committed Aug 15, 2023
1 parent 56861d2 commit ff19fc5
Showing 29 changed files with 1,993 additions and 838 deletions.
256 changes: 204 additions & 52 deletions Source/JavaScriptCore/create_hash_table
@@ -5,7 +5,7 @@
# (c) 2000-2002 by Harri Porten <porten@kde.org> and
# David Faure <faure@kde.org>
# Modified (c) 2004 by Nikolas Zimmermann <wildfox@kde.org>
# Copyright (C) 2007-2022 Apple Inc. All rights reserved.
# Copyright (C) 2007-2023 Apple Inc. All rights reserved.
#
# This library is free software; you can redistribute it and/or
# modify it under the terms of the GNU Lesser General Public
@@ -24,6 +24,7 @@

use strict;
use warnings;
use bigint;
use Getopt::Long qw(:config pass_through);

my $file = shift @ARGV or die("Must provide source file as final argument.");
@@ -46,6 +47,8 @@ my $pefectHashSize;
my $compactSize;
my $compactHashSizeMask;
my $banner = 0;
my $mask64 = 2**64 - 1;
my $mask32 = 2**32 - 1;
sub calcPerfectHashSize();
sub calcCompactHashSize();
sub output();
@@ -194,60 +197,209 @@ sub calcCompactHashSize()
}
}

sub finalizeAndMaskTop8Bits($) {
my ($value) = @_;

$value &= $mask32;

# Force "avalanching" of lower 32 bits
$value ^= leftShift($value, 3);
$value += ($value >> 5);
$value = ($value & $mask32);
$value ^= (leftShift($value, 2) & $mask32);
$value += ($value >> 15);
$value = $value & $mask32;
$value ^= (leftShift($value, 10) & $mask32);

# Save 8 bits for StringImpl to use as flags.
$value &= 0xffffff;

# This avoids ever returning a hash code of 0, since that is used to
# signal "hash not computed yet". Setting the high bit maintains
# reasonable fidelity to a hash code of 0 because it is likely to yield
# exactly 0 when hash lookup masks out the high bits.
$value = (0x80000000 >> 8) if ($value == 0);

return $value;
}
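
For readers following along without Perl, the finalization step above can be sketched in Python. This is an illustrative port, not part of the patch: it assumes every step wraps as an unsigned 32-bit value (the `leftShift` helper is not shown in this excerpt, so its masking behaviour is an assumption), and the function name is chosen here for illustration.

```python
MASK32 = 0xFFFFFFFF


def finalize_and_mask_top8_bits(value: int) -> int:
    """Avalanche the low 32 bits, keep 24 of them, and never return 0."""
    value &= MASK32

    # Force "avalanching" of the lower 32 bits, emulating uint32 wraparound.
    value ^= (value << 3) & MASK32
    value = (value + (value >> 5)) & MASK32
    value ^= (value << 2) & MASK32
    value = (value + (value >> 15)) & MASK32
    value ^= (value << 10) & MASK32

    # Reserve the top 8 bits for StringImpl flags.
    value &= 0xFFFFFF

    # 0 means "hash not computed yet", so substitute a fixed nonzero value.
    if value == 0:
        value = 0x80000000 >> 8
    return value
```

The result always fits in 24 bits and is never zero, which is what lets StringImpl pack the hash and its flags into a single 32-bit field.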

# Paul Hsieh's SuperFastHash
# http://www.azillionmonkeys.com/qed/hash.html
sub superFastHash {
my @chars = @_;

# This hash is designed to work on 16-bit chunks at a time. But since the normal case
# (above) is to hash UTF-16 characters, we just treat the 8-bit chars as if they
# were 16-bit chunks, which should give matching results

my $hash = 0x9e3779b9;
my $l = scalar @chars; #I wish this was in Ruby --- Maks
my $rem = $l & 1;
$l = $l >> 1;

my $s = 0;

# Main loop
for (; $l > 0; $l--) {
$hash += ord($chars[$s]);
my $tmp = leftShift(ord($chars[$s+1]), 11) ^ $hash;
$hash = (leftShift($hash, 16) & $mask32) ^ $tmp;
$s += 2;
$hash += $hash >> 11;
$hash &= $mask32;
}

# Handle end case
if ($rem != 0) {
$hash += ord($chars[$s]);
$hash ^= (leftShift($hash, 11) & $mask32);
$hash += $hash >> 17;
}

return finalizeAndMaskTop8Bits($hash);
}

sub uint64_add($$) {
my ($a, $b) = @_;
my $sum = $a + $b;
return $sum & $mask64;
}

sub uint64_multi($$) {
my ($a, $b) = @_;
my $product = $a * $b;
return $product & $mask64;
}

sub wymum($$) {
my ($A, $B) = @_;

my $ha = $A >> 32;
my $hb = $B >> 32;
my $la = $A & $mask32;
my $lb = $B & $mask32;
my $hi;
my $lo;
my $rh = uint64_multi($ha, $hb);
my $rm0 = uint64_multi($ha, $lb);
my $rm1 = uint64_multi($hb, $la);
my $rl = uint64_multi($la, $lb);
my $t = uint64_add($rl, ($rm0 << 32));
my $c = int($t < $rl);

$lo = uint64_add($t, ($rm1 << 32));
$c += int($lo < $t);
$hi = uint64_add($rh, uint64_add(($rm0 >> 32), uint64_add(($rm1 >> 32), $c)));

return ($lo, $hi);
};
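
wymum above computes the full 128-bit product of two 64-bit values from four 32x32 partial products and returns its low and high halves. The following Python sketch mirrors that decomposition and checks it against Python's arbitrary-precision multiplication; the function names are chosen here for illustration and are not taken from the patch.

```python
MASK32 = (1 << 32) - 1
MASK64 = (1 << 64) - 1


def wymum_decomposed(a: int, b: int) -> tuple[int, int]:
    """64x64 -> 128-bit multiply via 32-bit halves, returning (lo, hi)."""
    ha, hb = a >> 32, b >> 32
    la, lb = a & MASK32, b & MASK32

    rh = ha * hb    # contributes to bits 64..127
    rm0 = ha * lb   # contributes to bits 32..95
    rm1 = hb * la   # contributes to bits 32..95
    rl = la * lb    # contributes to bits 0..63

    # Add the middle terms into the low word, tracking carries by wraparound.
    t = (rl + (rm0 << 32)) & MASK64
    carry = int(t < rl)
    lo = (t + (rm1 << 32)) & MASK64
    carry += int(lo < t)

    hi = (rh + (rm0 >> 32) + (rm1 >> 32) + carry) & MASK64
    return lo, hi


def wymum_direct(a: int, b: int) -> tuple[int, int]:
    """Reference result using Python's unbounded integers."""
    product = a * b
    return product & MASK64, product >> 64


def wymix(a: int, b: int) -> int:
    """Fold the 128-bit product into 64 bits by XOR-ing its halves."""
    lo, hi = wymum_decomposed(a, b)
    return lo ^ hi
```

The decomposition matters in languages without a native 128-bit type; in Python the direct form is enough, and is used here only as a cross-check.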

sub wymix($$) {
my ($A, $B) = @_;
($A, $B) = wymum($A, $B);
return $A ^ $B;
}

sub convert32BitTo64Bit($) {
my ($v) = @_;
my ($mask1) = 281470681808895; # 0x0000_ffff_0000_ffff
$v = ($v | ($v << 16)) & $mask1;
my ($mask2) = 71777214294589695; # 0x00ff_00ff_00ff_00ff
return ($v | ($v << 8)) & $mask2;
}

sub convert16BitTo32Bit($) {
my ($v) = @_;
return ($v | ($v << 8)) & 0x00ff_00ff;
}
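
The two convert helpers above spread packed 8-bit characters into 16-bit lanes, which appears intended to make an 8-bit string hash identically to the same text stored as 16-bit characters. A small Python sketch of the bit manipulation follows, with a byte-level reference used as a cross-check; `widen_reference` is a hypothetical helper added here for illustration.

```python
import struct


def convert_16bit_to_32bit(v: int) -> int:
    """Spread 2 packed bytes into two 16-bit lanes."""
    return (v | (v << 8)) & 0x00FF_00FF


def convert_32bit_to_64bit(v: int) -> int:
    """Spread 4 packed bytes into four 16-bit lanes."""
    v = (v | (v << 16)) & 0x0000_FFFF_0000_FFFF
    return (v | (v << 8)) & 0x00FF_00FF_00FF_00FF


def widen_reference(data: bytes) -> int:
    """Zero-extend each byte to a little-endian 16-bit unit, read as one integer."""
    widened = b"".join(struct.pack("<H", byte) for byte in data)
    return int.from_bytes(widened, "little")
```

For example, the bytes 0x11 0x22 0x33 0x44 packed little-endian as 0x44332211 become 0x0044003300220011, the same bits as four zero-extended 16-bit code units.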

sub wyhash {
# https://github.com/wangyi-fudan/wyhash
my @chars = @_;
my $charCount = scalar @chars;
my $byteCount = $charCount << 1;
my $charIndex = 0;
my $seed = 0;
my @secret = ( 11562461410679940143, 16646288086500911323, 10285213230658275043, 6384245875588680899 );
my $move1 = (($byteCount >> 3) << 2) >> 1;

$seed ^= wymix($seed ^ $secret[0], $secret[1]);
my $a = 0;
my $b = 0;

local *c2i = sub {
my ($i) = @_;
return ord($chars[$i]);
};

local *wyr8 = sub {
my ($i) = @_;
my $v = c2i($i) | (c2i($i + 1) << 8) | (c2i($i + 2) << 16) | (c2i($i + 3) << 24);
return convert32BitTo64Bit($v);
};

local *wyr4 = sub {
my ($i) = @_;
my $v = c2i($i) | (c2i($i + 1) << 8);
return convert16BitTo32Bit($v);
};

local *wyr2 = sub {
my ($i) = @_;
return c2i($i) << 16;
};

if ($byteCount <= 16) {
if ($byteCount >= 4) {
$a = (wyr4($charIndex) << 32) | wyr4($charIndex + $move1);
$charIndex = $charIndex + $charCount - 2;
$b = (wyr4($charIndex) << 32) | wyr4($charIndex - $move1);
} elsif ($byteCount > 0) {
$a = wyr2($charIndex);
$b = 0;
} else {
$a = $b = 0;
}
} else {
my $i = $byteCount;
if ($i > 48) {
my $see1 = $seed;
my $see2 = $seed;
do {
$seed = wymix(wyr8($charIndex) ^ $secret[1], wyr8($charIndex + 4) ^ $seed);
$see1 = wymix(wyr8($charIndex + 8) ^ $secret[2], wyr8($charIndex + 12) ^ $see1);
$see2 = wymix(wyr8($charIndex + 16) ^ $secret[3], wyr8($charIndex + 20) ^ $see2);
$charIndex += 24;
$i -= 48;
} while ($i > 48);
$seed ^= $see1 ^ $see2;
}
while ($i > 16) {
$seed = wymix(wyr8($charIndex) ^ $secret[1], wyr8($charIndex + 4) ^ $seed);
$i -= 16;
$charIndex += 8;
}
my $move2 = $i >> 1;
$a = wyr8($charIndex + $move2 - 8);
$b = wyr8($charIndex + $move2 - 4);
}
$a ^= $secret[1];
$b ^= $seed;

($a, $b) = wymum($a, $b);
my $hash = wymix($a ^ $secret[0] ^ $byteCount, $b ^ $secret[1]) & $mask32;

return finalizeAndMaskTop8Bits($hash);
}

sub hashValue($) {
my @chars = split(/ */, $_[0]);

# This hash is designed to work on 16-bit chunks at a time. But since the normal case
# (above) is to hash UTF-16 characters, we just treat the 8-bit chars as if they
# were 16-bit chunks, which should give matching results

my $EXP2_32 = 4294967296;

my $hash = 0x9e3779b9;
my $l = scalar @chars; #I wish this was in Ruby --- Maks
my $rem = $l & 1;
$l = $l >> 1;

my $s = 0;

# Main loop
for (; $l > 0; $l--) {
$hash += ord($chars[$s]);
my $tmp = leftShift(ord($chars[$s+1]), 11) ^ $hash;
$hash = (leftShift($hash, 16)% $EXP2_32) ^ $tmp;
$s += 2;
$hash += $hash >> 11;
$hash %= $EXP2_32;
}

# Handle end case
if ($rem != 0) {
$hash += ord($chars[$s]);
$hash ^= (leftShift($hash, 11)% $EXP2_32);
$hash += $hash >> 17;
}

# Force "avalanching" of final 127 bits
$hash ^= leftShift($hash, 3);
$hash += ($hash >> 5);
$hash = ($hash% $EXP2_32);
$hash ^= (leftShift($hash, 2)% $EXP2_32);
$hash += ($hash >> 15);
$hash = $hash% $EXP2_32;
$hash ^= (leftShift($hash, 10)% $EXP2_32);

# Save 8 bits for StringImpl to use as flags.
$hash &= 0xffffff;

# This avoids ever returning a hash code of 0, since that is used to
# signal "hash not computed yet". Setting the high bit maintains
# reasonable fidelity to a hash code of 0 because it is likely to yield
# exactly 0 when hash lookup masks out the high bits.
$hash = (0x80000000 >> 8) if ($hash == 0);

return $hash;
my $string = $_[0];
my @chars = split(/ */, $string);
my $charCount = scalar @chars;
if ($charCount <= 48) {
return superFastHash(@chars);
}
return wyhash(@chars);
}

sub output() {
4 changes: 2 additions & 2 deletions Source/JavaScriptCore/runtime/JSCBytecodeCacheVersion.cpp.in
@@ -2,6 +2,6 @@
#include "JSCBytecodeCacheVersion.h"

#include "config.h"
#include <wtf/text/StringHasher.h>
#include <wtf/text/SuperFastHash.h>

const uint32_t JSCBytecodeCacheVersion = StringHasher::computeHash(CACHED_TYPES_CKSUM);
const uint32_t JSCBytecodeCacheVersion = SuperFastHash::computeHash(CACHED_TYPES_CKSUM);
6 changes: 3 additions & 3 deletions Source/JavaScriptCore/runtime/JSCell.cpp
@@ -1,7 +1,7 @@
/*
* Copyright (C) 1999-2001 Harri Porten (porten@kde.org)
* Copyright (C) 2001 Peter Kelly (pmk@post.com)
* Copyright (C) 2003-2020 Apple Inc. All rights reserved.
* Copyright (C) 2003-2023 Apple Inc. All rights reserved.
*
* This library is free software; you can redistribute it and/or
* modify it under the terms of the GNU Library General Public
@@ -299,7 +299,7 @@ NEVER_INLINE NO_RETURN_DUE_TO_CRASH NOT_TAIL_CALLED void reportZappedCellAndCras
MarkedBlock* foundBlock = nullptr;
if (foundBlockHandle) {
foundBlock = &foundBlockHandle->block();
subspaceHash = StringHasher::computeHash(foundBlockHandle->subspace()->name());
subspaceHash = SuperFastHash::computeHash(foundBlockHandle->subspace()->name());
cellSize = foundBlockHandle->cellSize();

variousState |= static_cast<uint64_t>(foundBlockHandle->isFreeListed()) << 0;
@@ -335,7 +335,7 @@ NEVER_INLINE NO_RETURN_DUE_TO_CRASH NOT_TAIL_CALLED void reportZappedCellAndCras
return IterationStatus::Continue;
});
if (foundPreciseAllocation) {
subspaceHash = StringHasher::computeHash(foundPreciseAllocation->subspace()->name());
subspaceHash = SuperFastHash::computeHash(foundPreciseAllocation->subspace()->name());
cellSize = foundPreciseAllocation->cellSize();

variousState |= static_cast<uint64_t>(isFreeListed) << 0;
4 changes: 2 additions & 2 deletions Source/JavaScriptCore/runtime/TemplateObjectDescriptor.h
@@ -1,6 +1,6 @@
/*
* Copyright (C) 2015 Yusuke Suzuki <utatane.tea@gmail.com>.
* Copyright (C) 2019 Apple Inc. All rights reserved.
* Copyright (C) 2019-2023 Apple Inc. All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions
@@ -97,7 +97,7 @@ inline TemplateObjectDescriptor::TemplateObjectDescriptor(EmptyValueTag)

inline unsigned TemplateObjectDescriptor::calculateHash(const StringVector& rawStrings)
{
StringHasher hasher;
SuperFastHash hasher;
for (const String& string : rawStrings) {
if (string.is8Bit())
hasher.addCharacters(string.characters8(), string.length());