Faster optimized frozen dictionary creation (4/n) #87876
Conversation
Instead of creating a dictionary of lists and a multi-dimensional array, we rent a single one-dimensional array where every bucket has five slots. The bucket starts at index (key.Length - minLength) * 5. Each value is an index of the key in the _keys array, or just -1, which represents "null". Creation time is from 2x to 10x faster. Indexing is 4-10% slower, but still an order of magnitude faster than Dictionary.
Tagging subscribers to this area: @dotnet/area-system-collections
{
// This size check prevents OOM.
Not really :) It just prevents artificial OOMs from arrays of unsupported sizes. You could still try to rent an array of length Array.MaxLength - 1 and OOM.
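To make the distinction concrete, here is a hypothetical sketch of the kind of size guard being discussed (TryAllocateBuckets and its parameters are illustrative, not the PR's actual code): it rejects sizes the runtime cannot allocate at all, but a rent just under Array.MaxLength can still OOM on a constrained machine.

```csharp
using System;

// Hypothetical sketch: bail out of the length-buckets strategy when the
// computed array size exceeds what the runtime can allocate. This only
// avoids guaranteed failures; an allocation near Array.MaxLength can
// still OOM depending on available memory.
static int[]? TryAllocateBuckets(int minLength, int maxLength)
{
    const int MaxPerLength = 5; // five slots per length bucket (from the PR)
    long arraySize = ((long)maxLength - minLength + 1) * MaxPerLength;

    if (arraySize > Array.MaxLength)
    {
        return null; // fall back to another strategy
    }

    return new int[arraySize];
}

Console.WriteLine(TryAllocateBuckets(1, 10)?.Length);           // 50
Console.WriteLine(TryAllocateBuckets(0, int.MaxValue) is null); // True
```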
#if NET6_0_OR_GREATER
List<KeyValuePair<string, TValue>> list = CollectionsMarshal.GetValueRefOrAddDefault(groupedByLength, s.Length, out _) ??= new List<KeyValuePair<string, TValue>>(MaxPerLength);
if (arraySize >= Array.MaxLength)
Why >= rather than >?
namespace System.Collections.Frozen
{
    /// <summary>Provides a frozen dictionary implementation where strings are grouped by their lengths.</summary>
    internal sealed class LengthBucketsFrozenDictionary<TValue> : FrozenDictionary<string, TValue>
    {
        /// <summary>Allowed ratio between buckets with values and total buckets. Under this ratio, this implementation won't be used due to too much wasted space.</summary>
        private const double EmptyLengthsRatio = 0.2;
I don't understand why it's ok to do away with this concept. If I have one string of length 1_000_000 and one string of length 1, won't I now end up allocating an array of 5 million elements?
If I have one string of length 1_000_000 and one string of length 1, won't I now end up allocating an array of 5 million elements?
This case is the first thing we check and I've left it untouched:
Lines 43 to 49 in 29d5bfc
// If without even looking at the keys we know that some bucket will exceed the max per-bucket
// limit (pigeon hole principle), we can early-exit out without doing any further work.
int spread = maxLength - minLength + 1;
if (keys.Length / spread > MaxPerLength)
{
    return null;
}
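A worked example of the pigeonhole check quoted above, assuming MaxPerLength is 5 (the numbers here are illustrative):

```csharp
using System;

// With 100 keys whose lengths span only two distinct values, some length
// bucket must hold at least 50 keys (pigeonhole principle), so the
// strategy can be rejected without inspecting a single key.
const int MaxPerLength = 5;
int keyCount = 100, minLength = 3, maxLength = 4;

int spread = maxLength - minLength + 1;           // 2 possible lengths
bool rejected = keyCount / spread > MaxPerLength; // 50 > 5
Console.WriteLine(rejected);                      // True
```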
I am going to benchmark whether it's beneficial to remove this check or not and get back to you.
This case is the first thing we check and I've left it untouched:
No, that's different. The check that's there is seeing whether it can be easily predetermined whether any bucket might have too many elements in it. In contrast, the EmptyLengthsRatio you deleted was used to determine whether the lengths were too spread out such that we'd be wasting a ton of space with holes in the array.
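A sketch of the EmptyLengthsRatio concept described above; the exact shape of the deleted check is assumed here, only the 0.2 constant comes from the quoted code. With two keys of lengths 1 and 1_000_000, only 2 of 1_000_000 buckets hold anything, so the array would be almost entirely holes:

```csharp
using System;

// Assumed form of the sparseness check: if fewer than 20% of the length
// buckets would hold any key, reject the strategy as too wasteful.
const double EmptyLengthsRatio = 0.2;
int usedLengths = 2;
int spread = 1_000_000 - 1 + 1; // maxLength - minLength + 1

bool rejected = (double)usedLengths / spread < EmptyLengthsRatio;
Console.WriteLine(rejected); // True
```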
I am going to benchmark whether it's beneficial to remove this check or not and get back to you.
For MaxPerLength you mean? For a given bucket, the implementation is doing a linear scan of every element. If 1000 strings were to end up in the same bucket and every look up involved comparing up to 1000 strings, it would absolutely be worse than an O(1) dictionary lookup. So there is going to be some threshold where we no longer want to use this strategy. It's certainly possible, especially with the other changes, that MaxPerLength can be increased, but it can't become infinite.
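A minimal sketch (not the actual implementation) of the per-bucket linear scan described above: each bucket holds up to MaxPerLength key indices, and a lookup compares the query against every candidate in turn, so cost grows linearly with how full the bucket is.

```csharp
using System;

const int MaxPerLength = 5;
const int minLength = 3;
string[] keys = { "cat", "dog", "fox", "owl" };   // all length 3
// buckets[(len - minLength) * MaxPerLength + slot] = key index, or -1
int[] buckets = { 0, 1, 2, 3, -1 };

int IndexOf(string query)
{
    int start = (query.Length - minLength) * MaxPerLength;
    if ((uint)start >= (uint)buckets.Length) return -1; // length not bucketed

    for (int slot = 0; slot < MaxPerLength; slot++)
    {
        int keyIndex = buckets[start + slot];
        if (keyIndex < 0) break;                        // -1 marks an empty slot
        if (keys[keyIndex] == query) return keyIndex;   // linear comparison
    }
    return -1;
}

Console.WriteLine(IndexOf("fox"));   // 2
Console.WriteLine(IndexOf("zebra")); // -1
```

With MaxPerLength bounded at a small constant, this scan stays cheap; let buckets grow to 1000 entries and each miss pays for up to 1000 string comparisons, which is the degradation described above.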
if (buckets[index] < 0) buckets[index] = i;
else if (buckets[index + 1] < 0) buckets[index + 1] = i;
else if (buckets[index + 2] < 0) buckets[index + 2] = i;
else if (buckets[index + 3] < 0) buckets[index + 3] = i;
else if (buckets[index + 4] < 0) buckets[index + 4] = i;
It's brittle to manually unroll this loop that's tied to the MaxPerLength constant (e.g. if the constant were increased or decreased, this unrolling would be wrong). Can we just make this a loop?
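A loop-based version along the lines suggested (a sketch, not the PR's code): tying the loop bound to MaxPerLength means changing the constant cannot silently desynchronize an unrolling.

```csharp
using System;

const int MaxPerLength = 5;
int[] buckets = new int[MaxPerLength];
Array.Fill(buckets, -1); // -1 marks an empty slot

// Returns false when the bucket is already full; the caller would then
// reject this strategy rather than overflow the bucket.
bool TryInsert(int bucketStart, int keyIndex)
{
    for (int slot = 0; slot < MaxPerLength; slot++)
    {
        if (buckets[bucketStart + slot] < 0)
        {
            buckets[bucketStart + slot] = keyIndex;
            return true;
        }
    }
    return false; // bucket full
}

for (int i = 0; i < MaxPerLength; i++) Console.WriteLine(TryInsert(0, i)); // True x5
Console.WriteLine(TryInsert(0, 5)); // False: a sixth key does not fit
```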
// AllocateUninitializedArray is slower for small inputs, hence the size check.
int[] copy = arraySize < 1000
    ? new int[arraySize]
    : GC.AllocateUninitializedArray<int>(arraySize);
The first thing GC.AllocateUninitializedArray<int> does is a length < 512 check. Can we just let it do its thing rather than also checking here?
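In other words, since GC.AllocateUninitializedArray<T> (available on .NET 5+) already falls back to an ordinary zeroed allocation for small lengths internally, the explicit arraySize < 1000 branch could likely collapse to a single call:

```csharp
using System;

// Single-call sketch of the simplification suggested above; the small-size
// fast path is left to the runtime's own internal threshold.
int arraySize = 100;
int[] copy = GC.AllocateUninitializedArray<int>(arraySize);
Console.WriteLine(copy.Length); // 100
```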
Cc @geeknoid in case he's interested in this series of PRs
Updated perf numbers: creation time is from 2x to 16x faster, and not more than 2x slower compared to Dictionary.
Instead of creating a dictionary of lists and a multi-dimensional array, we rent a single one-dimensional array where every bucket has five slots.
The bucket starts at
(key.Length - minLength) * 5
index of the array. Each value is an index of the key in the _keys array, or just -1, which represents "null".
We avoid having two copies and re-ordering of the keys and values collections.
Creation time is from 2x to 10x faster, and not more than 2x slower compared to Dictionary.
Indexing is 6-12% slower, but for large inputs it's still an order of magnitude faster than Dictionary.
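The layout described above can be checked with a small worked example (the concrete lengths here are illustrative): with minLength of 3 and five slots per bucket, a key of length 5 has its bucket starting at index 10 of the rented array.

```csharp
using System;

// Bucket-start arithmetic from the description: (key.Length - minLength) * 5.
int minLength = 3;
int keyLength = 5;
int bucketStart = (keyLength - minLength) * 5;
Console.WriteLine(bucketStart); // 10
```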