
x1.4 Faster LruCache #5251

Merged
merged 7 commits into NethermindEth:master from the cache branch on Feb 9, 2023

Conversation

@benaadams (Member) commented on Feb 8, 2023

Changes

  • Strongly typed references
  • Devirtualize the key by wrapping Keccak in a KeccakKey struct (a sketch follows this list)
  • Remove the secondary data structure used to maintain least-recently-used order
  • Use the whole set of Keccak bytes in GetHashCode to reduce dictionary collisions
  • Add Unsafe.AreSame + .Length comparisons to some span compares, for an early exit when the references are the same
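
A minimal sketch of the reference-equality early exit from the last bullet (illustrative only; the helper name and shape are assumed, not the merged code):

```csharp
// Illustrative sketch, not the PR's exact code: skip the byte-wise compare
// entirely when both spans point at the same memory with the same length.
using System;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;

internal static class SpanCompare
{
    public static bool AreEqual(ReadOnlySpan<byte> a, ReadOnlySpan<byte> b)
    {
        // Early exit: same starting reference + same length => same content.
        if (a.Length == b.Length &&
            Unsafe.AreSame(ref MemoryMarshal.GetReference(a), ref MemoryMarshal.GetReference(b)))
        {
            return true;
        }

        return a.SequenceEqual(b);
    }
}
```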

Types of changes

What types of changes does your code introduce?

  • Bugfix (a non-breaking change that fixes an issue)
  • New feature (a non-breaking change that adds functionality)
  • Breaking change (a change that causes existing functionality not to work as expected)
  • Optimization
  • Refactoring
  • Documentation update
  • Build-related changes
  • Other: Description

Testing

Requires testing

  • Yes
  • No

If yes, did you write tests?

  • Yes
  • No

Notes on testing

Performance tests (before/after)

@benaadams changed the title from "Faster cache" to "Faster LruCache" on Feb 8, 2023
@LukaszRozmej (Member) commented:

Do we need KeccakKey when we already have ValueKeccak?
Wonder how it benchmarks, also compared to #2978

@benaadams (Member, Author) replied:

> Do we need KeccakKey when we already have ValueKeccak?

KeccakKey is just a wrapper that devirtualises the dictionary by using a struct, rather than a class, as the key. Generics all share a single JITted path for class type arguments, so calls on the generic type, such as .Equals inside Dictionary, act like interface dispatch; for structs the JIT emits a concrete implementation per struct, so .Equals and .GetHashCode become inlinable direct calls for struct keys.
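
To illustrate the point (an illustrative sketch only, not the merged KeccakKey code), a struct key implementing IEquatable<T> lets Dictionary<TKey, TValue> bind Equals/GetHashCode to the concrete struct:

```csharp
using System;
using System.Collections.Generic;

// Illustrative sketch of a struct dictionary key; names and details are assumed.
public readonly struct KeccakKey : IEquatable<KeccakKey>
{
    public readonly byte[] Bytes;

    public KeccakKey(byte[] bytes) => Bytes = bytes;

    public bool Equals(KeccakKey other) =>
        ReferenceEquals(Bytes, other.Bytes) ||       // fast path: same underlying array
        (Bytes is not null && other.Bytes is not null && Bytes.AsSpan().SequenceEqual(other.Bytes));

    public override bool Equals(object? obj) => obj is KeccakKey other && Equals(other);

    public override int GetHashCode()
    {
        if (Bytes is null) return 0;
        HashCode hash = new();
        hash.AddBytes(Bytes);                        // fold in all of the digest, not just a prefix
        return hash.ToHashCode();
    }
}
```

Because KeccakKey is a struct, Dictionary<KeccakKey, TValue> gets its own specialised generic instantiation, so these calls are direct and inlinable rather than virtual.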

ValueKeccak would have additional copying and allocation-size costs on the conversions from Keccak, as well as losing the fast path of Bytes being reference-equal (which may or may not be common). It would be interesting to investigate, but getting the benefits would probably be the larger task of moving everything to ValueKeccak and dropping Keccak.

> Wonder how it benchmarks, also compared to #2978

ConcurrentLru uses a ConcurrentDictionary and 3 x ConcurrentQueues so is much heavier on memory usage; depends how much contention you experience on the caches?

@LukaszRozmej (Member) replied:

> Wonder how it benchmarks, also compared to #2978

> ConcurrentLru uses a ConcurrentDictionary and 3 x ConcurrentQueues so is much heavier on memory usage; depends how much contention you experience on the caches?

Good question, I think the only way to know is to profile the node as a whole and try to see the differences.

@benaadams (Member, Author) commented:

~x1.4 improvement

BenchmarkDotNet=v0.13.4, OS=Windows 11 (10.0.22623.1250)
Intel Core i5-9400F CPU 2.90GHz (Coffee Lake), 1 CPU, 6 logical and 6 physical cores
.NET SDK=7.0.200-preview.22628.1
[Host] : .NET 7.0.2 (7.0.222.60605), X64 RyuJIT AVX2
Job-UJOYGK : .NET 7.0.2 (7.0.222.60605), X64 RyuJIT AVX2

| Method             | MaxCapacity | ItemsCount |       Mean |    Error | Ratio | Alloc Ratio |
|--------------------|------------:|-----------:|-----------:|---------:|------:|------------:|
| WithItems          |          16 |          1 |   101.1 ns |  0.27 ns |  0.78 |        0.91 |
| WithItems_Previous |          16 |          1 |   130.2 ns |  0.62 ns |  1.00 |        1.00 |
| WithItems          |          16 |          2 |   144.7 ns |  0.58 ns |  0.71 |        0.88 |
| WithItems_Previous |          16 |          2 |   205.2 ns |  2.75 ns |  1.00 |        1.00 |
| WithItems          |          16 |          8 |   451.9 ns |  5.10 ns |  0.70 |        0.78 |
| WithItems_Previous |          16 |          8 |   647.0 ns |  4.94 ns |  1.00 |        1.00 |
| WithItems          |          16 |         32 | 1,977.0 ns |  5.19 ns |  0.69 |        0.72 |
| WithItems_Previous |          16 |         32 | 2,878.1 ns | 10.62 ns |  1.00 |        1.00 |
| WithItems          |          16 |         64 | 4,448.1 ns |  9.15 ns |  0.69 |        0.72 |
| WithItems_Previous |          16 |         64 | 6,439.9 ns | 54.36 ns |  1.00 |        1.00 |
| WithItems          |          32 |          1 |   126.2 ns |  0.62 ns |  0.80 |        0.95 |
| WithItems_Previous |          32 |          1 |   158.1 ns |  1.15 ns |  1.00 |        1.00 |
| WithItems          |          32 |          2 |   171.2 ns |  0.68 ns |  0.76 |        0.93 |
| WithItems_Previous |          32 |          2 |   223.9 ns |  1.40 ns |  1.00 |        1.00 |
| WithItems          |          32 |          8 |   466.6 ns |  3.99 ns |  0.69 |        0.84 |
| WithItems_Previous |          32 |          8 |   676.8 ns |  5.14 ns |  1.00 |        1.00 |
| WithItems          |          32 |         32 | 1,731.8 ns | 15.45 ns |  0.71 |        0.72 |
| WithItems_Previous |          32 |         32 | 2,441.5 ns |  8.47 ns |  1.00 |        1.00 |
| WithItems          |          32 |         64 | 4,146.0 ns | 56.99 ns |  0.71 |        0.72 |
| WithItems_Previous |          32 |         64 | 5,858.2 ns | 41.07 ns |  1.00 |        1.00 |
| WithItems          |         128 |          1 |   242.2 ns |  2.25 ns |  0.88 |        0.98 |
| WithItems_Previous |         128 |          1 |   274.8 ns |  2.07 ns |  1.00 |        1.00 |
| WithItems          |         128 |          2 |   291.6 ns |  2.56 ns |  0.82 |        0.97 |
| WithItems_Previous |         128 |          2 |   354.9 ns |  6.84 ns |  1.00 |        1.00 |
| WithItems          |         128 |          8 |   593.9 ns |  8.64 ns |  0.76 |        0.93 |
| WithItems_Previous |         128 |          8 |   783.7 ns |  7.19 ns |  1.00 |        1.00 |
| WithItems          |         128 |         32 | 1,713.1 ns |  4.10 ns |  0.69 |        0.83 |
| WithItems_Previous |         128 |         32 | 2,489.2 ns | 11.77 ns |  1.00 |        1.00 |
| WithItems          |         128 |         64 | 3,342.2 ns |  7.24 ns |  0.69 |        0.77 |
| WithItems_Previous |         128 |         64 | 4,848.3 ns | 43.01 ns |  1.00 |        1.00 |

@benaadams marked this pull request as ready for review on February 8, 2023 12:34
@benaadams changed the title from "Faster LruCache" to "x1.4 Faster LruCache" on Feb 8, 2023
@benaadams requested review from LukaszRozmej and removed the request for Scooletz on February 8, 2023 20:39
@Scooletz (Contributor) left a review:

General comments/remarks.

src/Nethermind/Nethermind.Core/Caching/LruKeyCache.cs (outdated review thread, resolved)
{
    if (Bytes is null) return 0;

    long v0 = Unsafe.ReadUnaligned<long>(ref MemoryMarshal.GetArrayDataReference(Bytes));
    // ...
Contributor commented:

Have you noticed collisions at all? This is Keccak on its own, so it should have a low collision rate. Also, as GetHashCode returns an int, we'll still only use 32 bits, so:

if (Bytes is null) return 0;
return Unsafe.ReadUnaligned<int>(ref MemoryMarshal.GetArrayDataReference(Bytes));

Member commented:

Good question; if Keccak has an even distribution, its first 4 bytes should also be evenly distributed, so mixing all of them might be overkill?

@benaadams (Member, Author) commented on Feb 9, 2023:

It is a world where people brute force Keccak hashes to get vanity addresses [1][2] with as many zeros as possible; 32 bits (8 nibbles) of consecutive zeros takes a fairly small amount of time [1], and GPU brute forcing has become so powerful that you can inverse brute force from an 8+ character vanity address back to the private key if a deterministic random start was used [3].

If the distribution of GetHashCode returns can be shaped by user input, that can lead to a hashcode collision attack on Dictionary and a DoS CVE [4], since handling collisions is O(n²) (e.g. CVE-2011-3414).

So, for example, if someone brute-forced txn construction and sent 1000+ txns whose Keccak hashes all started with 8 zeros, and the hashcode only took the first 4 bytes, it is entirely possible to turn that into a DoS of the execution client; whereas using all 256 bits makes that an impossibility with current hardware.

[1] https://github.com/1inch/profanity2
[2] https://github.com/johguse/ERADICATE2
[3] https://blog.1inch.io/a-vulnerability-disclosed-in-profanity-an-ethereum-vanity-address-tool-68ed7455fc8c
[4] https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2011-3414
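
As a toy illustration of that failure mode (not code from this PR): a key type whose GetHashCode is constant pushes every entry into the same Dictionary bucket, so n inserts degrade to O(n²) comparisons in total.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

// Toy example: a deliberately bad hash shows why collision resistance matters.
readonly struct CollidingKey : IEquatable<CollidingKey>
{
    private readonly int _value;
    public CollidingKey(int value) => _value = value;
    public bool Equals(CollidingKey other) => _value == other._value;
    public override bool Equals(object? obj) => obj is CollidingKey k && Equals(k);
    public override int GetHashCode() => 0; // every key lands in the same bucket
}

class Program
{
    static void Main()
    {
        var dict = new Dictionary<CollidingKey, int>();
        var sw = Stopwatch.StartNew();
        for (int i = 0; i < 20_000; i++) dict[new CollidingKey(i)] = i;   // each insert scans the whole chain
        Console.WriteLine($"20k all-colliding inserts took {sw.Elapsed}"); // vastly slower than with distinct hashes
    }
}
```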

@benaadams (Member, Author) commented on Feb 9, 2023:

tl;dr: it is safer to fold in all the bits, since some of the Keccak hashes are directly determined by an EOA, and hashes with 8 leading zero nibbles (32 bits) are brute-forceable.
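
For context, a sketch of what folding the whole 32-byte digest into the 32-bit hash code can look like, continuing the long v0 read shown in the diff excerpt above (the type and field names here are assumed, not the exact merged code):

```csharp
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;

// Assumed stand-in for the cache key; Bytes is taken to be a 32-byte Keccak digest.
public readonly struct DigestKey
{
    public readonly byte[]? Bytes;

    public DigestKey(byte[] bytes) => Bytes = bytes;

    public override int GetHashCode()
    {
        if (Bytes is null) return 0;

        // Read the 32-byte digest as four 64-bit words.
        ref byte start = ref MemoryMarshal.GetArrayDataReference(Bytes);
        long v0 = Unsafe.ReadUnaligned<long>(ref start);
        long v1 = Unsafe.ReadUnaligned<long>(ref Unsafe.Add(ref start, 8));
        long v2 = Unsafe.ReadUnaligned<long>(ref Unsafe.Add(ref start, 16));
        long v3 = Unsafe.ReadUnaligned<long>(ref Unsafe.Add(ref start, 24));

        // Fold 256 bits down to 64, then to 32, so a crafted 4-byte prefix
        // alone cannot force dictionary bucket collisions.
        long folded = v0 ^ v1 ^ v2 ^ v3;
        return (int)(folded ^ (folded >> 32));
    }
}
```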

Contributor commented:

Interesting finding about the vanity addresses. Looks good then!

@benaadams (Member, Author) commented:

I may have dabbled in vanity addresses 😅

@benaadams (Member, Author) commented:

Also that was my CVE report for .NET 🤣

> It is a world where people brute force Keccak hashes to get vanity addresses [1][2] […]

Great explanation! By the way, the Profanity vanity-address vulnerability caused a large hack of a famous market maker recently.

@LukaszRozmej LukaszRozmej merged commit 43a4eab into NethermindEth:master Feb 9, 2023
@benaadams benaadams deleted the cache branch February 9, 2023 13:43