Faster optimized frozen dictionary creation (4/n) #87876

adamsitnik · 2023-06-21T18:54:57Z

Instead of creating a dictionary of lists and a multi-dimensional array we rent a single dimension array, where every bucket has five slots.
The bucket starts at (key.Length - minLength) * 5 index of the array.
Each value is an index of the key from _keys array or just -1, which represents "null".
We avoid having two copies and re-ordering of keys and values collections.

Creation time is from x2 to x10 times faster, not more than 2x slower compared to Dictionary.
Indexing is 6-12% slower, but for large inputs it's still and order of magnitude faster than Dictionary.

Method	Job	Count	ItemsPerBucket	Mean	Ratio	Allocated	Alloc Ratio
ToDictionary	#87688	10	1	102.26 ns	-	440 B	-
ToFrozenDictionary_Optimized	#87876	10	1	218.24 ns	0.25	488 B	0.16
ToFrozenDictionary_Optimized	#87688	10	1	889.59 ns	1.00	3120 B	1.00

TryGetValue_True_Dictionary	#87688	10	1	97.78 ns	-	-	NA
TryGetValue_True_FrozenDictionaryOptimized	#87876	10	1	24.48 ns	1.13	-	NA
TryGetValue_True_FrozenDictionaryOptimized	#87688	10	1	21.69 ns	1.00	-	NA

ToDictionary	#87688	10	5	104.26 ns	-	440 B	-
ToFrozenDictionary_Optimized	#87876	10	5	223.41 ns	0.43	328 B	0.33
ToFrozenDictionary_Optimized	#87688	10	5	523.13 ns	1.00	1000 B	1.00

TryGetValue_True_Dictionary	#87688	10	5	98.22 ns	-	-	NA
TryGetValue_True_FrozenDictionaryOptimized	#87876	10	5	90.22 ns	1.12	-	NA
TryGetValue_True_FrozenDictionaryOptimized	#87688	10	5	80.66 ns	1.00	-	NA

ToDictionary	#87688	100	1	766.64 ns	-	3128 B	-
ToFrozenDictionary_Optimized	#87876	100	1	995.67 ns	0.14	3728 B	0.12
ToFrozenDictionary_Optimized	#87688	100	1	7,179.71 ns	1.00	30320 B	1.00

TryGetValue_True_Dictionary	#87688	100	1	1,761.21 ns	-	-	NA
TryGetValue_True_FrozenDictionaryOptimized	#87876	100	1	227.97 ns	1.13	-	NA
TryGetValue_True_FrozenDictionaryOptimized	#87688	100	1	201.72 ns	1.00	-	NA

ToDictionary	#87688	100	5	761.50 ns	-	3128 B	-
ToFrozenDictionary_Optimized	#87876	100	5	996.27 ns	0.25	2128 B	0.24
ToFrozenDictionary_Optimized	#87688	100	5	3,959.95 ns	1.00	8768 B	1.00

TryGetValue_True_Dictionary	#87688	100	5	1,042.10 ns	-	-	NA
TryGetValue_True_FrozenDictionaryOptimized	#87876	100	5	929.69 ns	1.08	-	NA
TryGetValue_True_FrozenDictionaryOptimized	#87688	100	5	864.51 ns	1.00	-	NA

ToDictionary	#87688	1000	1	7,234.20 ns	-	31016 B	-
ToFrozenDictionary_Optimized	#87876	1000	1	8,918.56 ns	0.12	36128 B	0.12
ToFrozenDictionary_Optimized	#87688	1000	1	77,083.15 ns	1.00	302344 B	1.00

TryGetValue_True_Dictionary	#87688	1000	1	115,510.37 ns	-	-	NA
TryGetValue_True_FrozenDictionaryOptimized	#87876	1000	1	2,559.60 ns	1.09	-	NA
TryGetValue_True_FrozenDictionaryOptimized	#87688	1000	1	2,341.85 ns	1.00	-	NA

ToDictionary	#87688	1000	5	7,541.47 ns	-	31016 B	-
ToFrozenDictionary_Optimized	#87876	1000	5	9,438.56 ns	0.24	20128 B	0.23
ToFrozenDictionary_Optimized	#87688	1000	5	39,774.73 ns	1.00	88040 B	1.00

TryGetValue_True_Dictionary	#87688	1000	5	31,294.80 ns	-	-	NA
TryGetValue_True_FrozenDictionaryOptimized	#87876	1000	5	9,168.88 ns	1.12	-	NA
TryGetValue_True_FrozenDictionaryOptimized	#87688	1000	5	8,156.66 ns	1.00	-	NA

ToDictionary	#87688	10000	1	112,776.03 ns	-	283020 B	-
ToFrozenDictionary_Optimized	#87876	10000	1	263,961.31 ns	0.07	360130 B	0.12
ToFrozenDictionary_Optimized	#87688	10000	1	4,023,881.49 ns	1.00	2942142 B	1.00

TryGetValue_True_Dictionary	#87688	10000	1	12,874,506.46 ns	-	-	NA
TryGetValue_True_FrozenDictionaryOptimized	#87876	10000	1	65,572.38 ns	1.08	-	NA
TryGetValue_True_FrozenDictionaryOptimized	#87688	10000	1	60,486.95 ns	1.00	-	NA

ToDictionary	#87688	10000	5	100,764.40 ns	-	283090 B	-
ToFrozenDictionary_Optimized	#87876	10000	5	168,735.65 ns	0.15	200128 B	0.23
ToFrozenDictionary_Optimized	#87688	10000	5	1,097,493.94 ns	1.00	871771 B	1.00

TryGetValue_True_Dictionary	#87688	10000	5	2,732,495.46 ns	-	-	NA
TryGetValue_True_FrozenDictionaryOptimized	#87876	10000	5	113,734.85 ns	1.06	-	NA
TryGetValue_True_FrozenDictionaryOptimized	#87688	10000	5	107,283.76 ns	1.00	-	NA

we rent a single dimension array, where every bucket has five slots. The bucket starts at (key.Length - minLength) * 5. Each value is an index of the key from _keys array or just -1, which represents "null". Creation time is from 2 to 10 times faster Indexing is 4-10% slower, but still and order of magnitude faster than Dictionary

ghost · 2023-06-21T18:55:11Z

Tagging subscribers to this area: @dotnet/area-system-collections
See info in area-owners.md if you want to be subscribed.

Issue Details

Instead of creating a dictionary of lists and a multi-dimensional array we rent a single dimension array, where every bucket has five slots.
The bucket starts at (key.Length - minLength) * 5 index of the array.
Each value is an index of the key from _keys array or just -1, which represents "null".
We avoid having two copies and re-ordering of keys and values collections.

Creation time is from x2 to x10 times faster, not more than 2x slower compared to Dictionary.
Indexing is 4-10% slower, but still and order of magnitude faster than Dictionary.

Author:	adamsitnik
Assignees:	-
Labels:	`area-System.Collections`, `tenet-performance`
Milestone:	-

stephentoub · 2023-06-22T12:40:09Z

....Collections.Immutable/src/System/Collections/Frozen/String/LengthBucketsFrozenDictionary.cs

            {
+                // This size check prevents OOM.


Not really :) It just prevents artificial OOMs from arrays of unsupported sizes. You could still try to rent an array of length Array.MaxLength - 1 and OOM.

stephentoub · 2023-06-22T12:40:18Z

....Collections.Immutable/src/System/Collections/Frozen/String/LengthBucketsFrozenDictionary.cs

 #if NET6_0_OR_GREATER
-                List<KeyValuePair<string, TValue>> list = CollectionsMarshal.GetValueRefOrAddDefault(groupedByLength, s.Length, out _) ??= new List<KeyValuePair<string, TValue>>(MaxPerLength);
+            if (arraySize >= Array.MaxLength)


Why >= rather than >?

stephentoub · 2023-06-22T12:42:02Z

....Collections.Immutable/src/System/Collections/Frozen/String/LengthBucketsFrozenDictionary.cs


 namespace System.Collections.Frozen
 {
    /// <summary>Provides a frozen dictionary implementation where strings are grouped by their lengths.</summary>
    internal sealed class LengthBucketsFrozenDictionary<TValue> : FrozenDictionary<string, TValue>
    {
-        /// <summary>Allowed ratio between buckets with values and total buckets.  Under this ratio, this implementation won't be used due to too much wasted space.</summary>
-        private const double EmptyLengthsRatio = 0.2;


I don't understand why it's ok to do away with this concept. If I have one string of length 1_000_000 and one string of length 1, won't I now end up allocating an array of 5 million elements?

If I have one string of length 1_000_000 and one string of length 1, won't I now end up allocating an array of 5 million elements?

This case is the first thing we check and I've left it untouched:

runtime/src/libraries/System.Collections.Immutable/src/System/Collections/Frozen/String/LengthBucketsFrozenDictionary.cs

Lines 43 to 49 in 29d5bfc

// If without even looking at the keys we know that some bucket will exceed the max per-bucket

// limit (pigeon hole principle), we can early-exit out without doing any further work.

int spread = maxLength - minLength + 1;

if (keys.Length / spread > MaxPerLength)

{

return null;

}

I am going to benchmark whether it's beneficial to remove this check or not and get back to you.

This case is the first thing we check and I've left it untouched:

No, that's different. The check that's there is seeing whether it can be easily predetermined whether any bucket might have too many elements in it. In contrast, the EmptyLengthsRatio you deleted was used to determine whether the lengths were too spread out such that we'd be wasting a ton of space with holes in the array.

I am going to benchmark whether it's beneficial to remove this check or not and get back to you.

For MaxPerLength you mean? For a given bucket, the implementation is doing a linear scan of every element. If 1000 strings were to end up in the same bucket and every look up involved comparing up to 1000 strings, it would absolutely be worse than an O(1) dictionary lookup. So there is going to be some threshold where we no longer want to use this strategy. It's certainly possible, especially with the other changes, that MaxPerLength can be increased, but it can't become infinite.

stephentoub · 2023-06-22T12:44:33Z

....Collections.Immutable/src/System/Collections/Frozen/String/LengthBucketsFrozenDictionary.cs

+                if (buckets[index] < 0) buckets[index] = i;
+                else if (buckets[index + 1] < 0) buckets[index + 1] = i;
+                else if (buckets[index + 2] < 0) buckets[index + 2] = i;
+                else if (buckets[index + 3] < 0) buckets[index + 3] = i;
+                else if (buckets[index + 4] < 0) buckets[index + 4] = i;


It's brittle to manually unroll this loop that's tied to the MaxPerLength constant (e.g. if the constant were increased or decreased, this unrolling would be wrong). Can we just make this a loop?

stephentoub · 2023-06-22T12:46:44Z

....Collections.Immutable/src/System/Collections/Frozen/String/LengthBucketsFrozenDictionary.cs

+            // AllocateUninitializedArray is slower for small inputs, hence the size check.
+            int[] copy = arraySize < 1000
+                ? new int[arraySize]
+                : GC.AllocateUninitializedArray<int>(arraySize);


The first thing GC.AllocateUninitializedArray<int> does is a length < 512 check. Can we just let it do its thing rather than also checking here?

....Collections.Immutable/src/System/Collections/Frozen/String/LengthBucketsFrozenDictionary.cs

danmoseley · 2023-06-22T12:53:43Z

Cc @geeknoid in case he's interested in this series of PRs

adamsitnik · 2023-06-22T16:38:39Z

Updated perf numbers: Creation time is from x2 to x16 times faster, not more than 2x slower compared to Dictionary.
Indexing is 0-8% slower, but for large inputs it's still and order of magnitude faster than Dictionary.

Method	Job	Count	PerBucket	Mean	Ratio	Allocated	Alloc Ratio
ToDictionary	#87688	10	1	102.98 ns	-	440 B	-
ToFrozenDictionary_Optimized	#87876	10	1	232.24 ns	0.26	488 B	0.16
ToFrozenDictionary_Optimized	#87688	10	1	908.94 ns	1.00	3120 B	1.00

TryGetValue_True_Dictionary	#87688	10	1	94.47 ns	-	-	NA
TryGetValue_True_FrozenDictionaryOptimized	#87876	10	1	22.66 ns	1.04	-	NA
TryGetValue_True_FrozenDictionaryOptimized	#87688	10	1	21.77 ns	1.00	-	NA

ToDictionary	#87688	10	5	101.63 ns	-	440 B	-
ToFrozenDictionary_Optimized	#87876	10	5	236.91 ns	0.46	328 B	0.33
ToFrozenDictionary_Optimized	#87688	10	5	520.80 ns	1.00	1000 B	1.00

TryGetValue_True_Dictionary	#87688	10	5	89.71 ns	-	-	NA
TryGetValue_True_FrozenDictionaryOptimized	#87876	10	5	88.69 ns	1.07	-	NA
TryGetValue_True_FrozenDictionaryOptimized	#87688	10	5	82.67 ns	1.00	-	NA

ToDictionary	#87688	100	1	772.06 ns	-	3128 B	-
ToFrozenDictionary_Optimized	#87876	100	1	1,014.77 ns	0.14	3728 B	0.12
ToFrozenDictionary_Optimized	#87688	100	1	7,009.70 ns	1.00	30320 B	1.00

TryGetValue_True_Dictionary	#87688	100	1	1,749.18 ns	-	-	NA
TryGetValue_True_FrozenDictionaryOptimized	#87876	100	1	203.88 ns	1.02	-	NA
TryGetValue_True_FrozenDictionaryOptimized	#87688	100	1	200.65 ns	1.00	-	NA

ToDictionary	#87688	100	5	763.90 ns	-	3128 B	-
ToFrozenDictionary_Optimized	#87876	100	5	1,076.07 ns	0.27	2128 B	0.24
ToFrozenDictionary_Optimized	#87688	100	5	3,980.54 ns	1.00	8768 B	1.00

TryGetValue_True_Dictionary	#87688	100	5	1,031.84 ns	-	-	NA
TryGetValue_True_FrozenDictionaryOptimized	#87876	100	5	968.47 ns	1.09	-	NA
TryGetValue_True_FrozenDictionaryOptimized	#87688	100	5	886.03 ns	1.00	-	NA

ToDictionary	#87688	1000	1	7,414.00 ns	-	31016 B	-
ToFrozenDictionary_Optimized	#87876	1000	1	9,243.94 ns	0.12	36128 B	0.12
ToFrozenDictionary_Optimized	#87688	1000	1	76,829.34 ns	1.00	302344 B	1.00

TryGetValue_True_Dictionary	#87688	1000	1	113,945.71 ns	-	-	NA
TryGetValue_True_FrozenDictionaryOptimized	#87876	1000	1	2,335.38 ns	1.00	-	NA
TryGetValue_True_FrozenDictionaryOptimized	#87688	1000	1	2,327.41 ns	1.00	-	NA

ToDictionary	#87688	1000	5	7,124.45 ns	-	31016 B	-
ToFrozenDictionary_Optimized	#87876	1000	5	9,365.89 ns	0.24	20128 B	0.23
ToFrozenDictionary_Optimized	#87688	1000	5	38,455.22 ns	1.00	88040 B	1.00

TryGetValue_True_Dictionary	#87688	1000	5	30,516.19 ns	-	-	NA
TryGetValue_True_FrozenDictionaryOptimized	#87876	1000	5	8,836.81 ns	1.05	-	NA
TryGetValue_True_FrozenDictionaryOptimized	#87688	1000	5	8,438.84 ns	1.00	-	NA

ToDictionary	#87688	10000	1	115,595.22 ns	-	283023 B	1.00
ToFrozenDictionary_Optimized	#87876	10000	1	253,471.88 ns	0.06	360129 B	0.12
ToFrozenDictionary_Optimized	#87688	10000	1	3,989,689.41 ns	1.00	2942141 B	1.00

TryGetValue_True_Dictionary	#87688	10000	1	12,715,860.97 ns	-	-	NA
TryGetValue_True_FrozenDictionaryOptimized	#87876	10000	1	63,461.63 ns	1.06	-	NA
TryGetValue_True_FrozenDictionaryOptimized	#87688	10000	1	59,874.28 ns	1.00	-	NA

ToDictionary	#87688	10000	5	95,454.42 ns	-	283088 B	-
ToFrozenDictionary_Optimized	#87876	10000	5	176,332.31 ns	0.17	200129 B	0.23
ToFrozenDictionary_Optimized	#87688	10000	5	1,068,480.18 ns	1.00	871771 B	1.00

TryGetValue_True_Dictionary	#87688	10000	5	2,605,945.23 ns	-	-	NA
TryGetValue_True_FrozenDictionaryOptimized	#87876	10000	5	114,240.60 ns	1.08	-	NA
TryGetValue_True_FrozenDictionaryOptimized	#87688	10000	5	105,536.70 ns	1.00	-	NA

adamsitnik added area-System.Collections tenet-performance Performance related issue labels Jun 21, 2023

adamsitnik requested a review from stephentoub June 21, 2023 18:54

ghost assigned adamsitnik Jun 21, 2023

stephentoub reviewed Jun 22, 2023

View reviewed changes

....Collections.Immutable/src/System/Collections/Frozen/String/LengthBucketsFrozenDictionary.cs Show resolved Hide resolved

address code review feedback

30e3671

adamsitnik requested a review from stephentoub June 22, 2023 16:39

adamsitnik mentioned this pull request Jun 23, 2023

Frozen collection construction performance #87964

Closed

stephentoub approved these changes Jun 23, 2023

View reviewed changes

stephentoub merged commit a05537e into dotnet:main Jun 23, 2023
105 checks passed

dotnet locked as resolved and limited conversation to collaborators Jul 23, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Faster optimized frozen dictionary creation (4/n) #87876

Faster optimized frozen dictionary creation (4/n) #87876

adamsitnik commented Jun 21, 2023 •

edited

ghost commented Jun 21, 2023

stephentoub Jun 22, 2023

stephentoub Jun 22, 2023

stephentoub Jun 22, 2023

adamsitnik Jun 22, 2023

stephentoub Jun 22, 2023

stephentoub Jun 22, 2023

stephentoub Jun 22, 2023

stephentoub Jun 22, 2023

danmoseley commented Jun 22, 2023 •

edited

adamsitnik commented Jun 22, 2023

	// If without even looking at the keys we know that some bucket will exceed the max per-bucket
	// limit (pigeon hole principle), we can early-exit out without doing any further work.
	int spread = maxLength - minLength + 1;
	if (keys.Length / spread > MaxPerLength)
	{
	return null;
	}

Faster optimized frozen dictionary creation (4/n) #87876

Faster optimized frozen dictionary creation (4/n) #87876

Conversation

adamsitnik commented Jun 21, 2023 • edited

ghost commented Jun 21, 2023

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

danmoseley commented Jun 22, 2023 • edited

adamsitnik commented Jun 22, 2023

adamsitnik commented Jun 21, 2023 •

edited

danmoseley commented Jun 22, 2023 •

edited