
Fix FrozenSet/FrozenDictionary load factor cliff at ~3.5M unique hash codes#125939

Draft
sachinsharma3191 wants to merge 9 commits into dotnet:main from sachinsharma3191:fix/125878-frozen-hashtable-load-factor

Conversation

@sachinsharma3191

@sachinsharma3191 commented Mar 23, 2026

Fixes #125878

When uniqueCodesCount * 2 exceeds the largest precomputed prime in the HashHelpers.Primes table, the fallback used GetPrime(uniqueCodesCount), which returns a prime just above uniqueCodesCount, giving a load factor of ~1.0. This created a sharp performance cliff at ~3.5M unique hash codes.

Use GetPrime(minNumBuckets) in the fallback instead, maintaining a ~0.5 load factor when the precomputed primes table is exceeded and avoiding the cliff.
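The arithmetic behind the fix can be sanity-checked in isolation. A minimal, self-contained sketch follows; `IsPrime`/`GetPrime` here are simplified stand-ins for the internal `HashHelpers` helpers, not the runtime implementation:

```csharp
using System;

static bool IsPrime(int n)
{
    if (n < 2) return false;
    for (long d = 2; d * d <= n; d++)
        if (n % d == 0) return false;
    return true;
}

// Simplified stand-in for HashHelpers.GetPrime: smallest prime >= min.
static int GetPrime(int min)
{
    for (int i = min; ; i++)
        if (IsPrime(i)) return i;
}

int uniqueCodesCount = 3_500_000;
int minNumBuckets = uniqueCodesCount * 2;

int oldBuckets = GetPrime(uniqueCodesCount); // old fallback: prime just above N
int newBuckets = GetPrime(minNumBuckets);    // fixed fallback: prime just above 2N

Console.WriteLine($"old load factor ~ {(double)uniqueCodesCount / oldBuckets:F2}"); // ~1.00
Console.WriteLine($"new load factor ~ {(double)uniqueCodesCount / newBuckets:F2}"); // ~0.50
```

The old fallback yields a table barely larger than the element count; the fixed one restores the ~0.5 load factor used below the primes-table threshold.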

… codes

When uniqueCodesCount * 2 exceeds the largest precomputed prime, the
fallback used GetPrime(uniqueCodesCount) giving ~1.0 load factor and
a sharp performance cliff. Use GetPrime(minNumBuckets) to maintain
~0.5 load factor when exceeding the primes table.

Fixes dotnet#125878

Made-with: Cursor
@dotnet-policy-service bot added the community-contribution label (Indicates that the PR has been added by a community member) Mar 23, 2026
@dotnet-policy-service
Contributor

Tagging subscribers to this area: @dotnet/area-system-collections
See info in area-owners.md if you want to be subscribed.

@sachinsharma3191 marked this pull request as ready for review March 23, 2026 05:07
@jeffhandley
Member

@MihuBot

@eiriktsarpalis
Member

Note

This comment was generated by GitHub Copilot.

@EgorBot -amd -intel

using System;
using BenchmarkDotNet.Attributes;
using System.Collections.Frozen;
using System.Collections.Generic;
using System.Linq;

public class Perf_FrozenSet_LoadFactorCliff
{
    private FrozenSet<int> _frozenSet;
    private int[] _hitKeys;
    private int[] _missKeys;

    [Params(3_000_000, 4_000_000)]
    public int Count;

    [GlobalSetup]
    public void Setup()
    {
        _frozenSet = Enumerable.Range(0, Count).ToFrozenSet();

        var rng = new Random(42);
        _hitKeys = new int[1024];
        for (int i = 0; i < _hitKeys.Length; i++)
            _hitKeys[i] = rng.Next(Count);

        _missKeys = new int[1024];
        for (int i = 0; i < _missKeys.Length; i++)
            _missKeys[i] = Count + i;
    }

    [Benchmark]
    public bool Contains_True()
    {
        bool result = false;
        var set = _frozenSet;
        var keys = _hitKeys;
        for (int i = 0; i < keys.Length; i++)
            result ^= set.Contains(keys[i]);
        return result;
    }

    [Benchmark]
    public bool Contains_False()
    {
        bool result = false;
        var set = _frozenSet;
        var keys = _missKeys;
        for (int i = 0; i < keys.Length; i++)
            result ^= set.Contains(keys[i]);
        return result;
    }
}

public class Perf_FrozenDictionary_String_LoadFactorCliff
{
    private FrozenDictionary<string, int> _frozenDict;
    private string[] _hitKeys;
    private string[] _missKeys;

    [Params(3_000_000, 4_000_000)]
    public int Count;

    [GlobalSetup]
    public void Setup()
    {
        var dict = new Dictionary<string, int>(Count);
        for (int i = 0; i < Count; i++)
            dict[i.ToString("D8")] = i;

        _frozenDict = dict.ToFrozenDictionary();

        var rng = new Random(42);
        _hitKeys = new string[1024];
        for (int i = 0; i < _hitKeys.Length; i++)
            _hitKeys[i] = rng.Next(Count).ToString("D8");

        _missKeys = new string[1024];
        for (int i = 0; i < _missKeys.Length; i++)
            _missKeys[i] = (Count + i).ToString("D8");
    }

    [Benchmark]
    public bool TryGetValue_True()
    {
        bool result = false;
        var dict = _frozenDict;
        var keys = _hitKeys;
        for (int i = 0; i < keys.Length; i++)
            result ^= dict.TryGetValue(keys[i], out _);
        return result;
    }

    [Benchmark]
    public bool TryGetValue_False()
    {
        bool result = false;
        var dict = _frozenDict;
        var keys = _missKeys;
        for (int i = 0; i < keys.Length; i++)
            result ^= dict.TryGetValue(keys[i], out _);
        return result;
    }
}

@eiriktsarpalis
Member

@sachinsharma3191 @danmoseley I am not entirely familiar with the cited issue; what is the scenario that results in a measurable performance cliff? The EgorBot benchmarks suggest this change introduces a marginal regression.

@danmoseley
Member

danmoseley commented Mar 24, 2026

@eiriktsarpalis I didn't investigate whether there was a performance regression that we care about -- I figured a fix (this one maybe) would prove that there was.

All I observed in the code was that at an arbitrary point of ~3.5M unique hash codes (so, exactly as many keys as in your benchmark above) the FrozenDictionary transitions from working hard, trying different primes to get the best load factor of approximately 0.5 for the best performance, to giving up entirely and picking one prime that gives a load factor near 1.0.

Assuming the main value proposition of these collections is precisely what they do to minimize collisions, this seems a clear defect. (Or we could just say that collections over 3.5M entries are not interesting to optimize for.)

If it didn't show up as worse perf, I'd be curious why we bother with all the prime hunting for smaller tables. So something is wrong somewhere. I do not know why your perf test above doesn't show a benefit; it would be good to understand. It's not worth taking this or any change unless we see an improvement.

Another option is that Copilot and I didn't understand the code. That's a non-zero possibility, but it would surprise me.

@danmoseley
Member

@EgorBot -intel

using System;
using BenchmarkDotNet.Attributes;
using System.Collections.Frozen;
using System.Collections.Generic;
using System.Linq;

public class Perf_FrozenSet_LoadFactorCliff
{
    private FrozenSet<int> _frozenSet;
    private int[] _hitKeys;
    private int[] _missKeys;

    [Params(2_000_000, 3_000_000, 4_000_000, 6_000_000, 8_000_000, 16_000_000)]
    public int Count;

    [GlobalSetup]
    public void Setup()
    {
        _frozenSet = Enumerable.Range(0, Count).ToFrozenSet();

        var rng = new Random(42);
        _hitKeys = new int[1024];
        for (int i = 0; i < _hitKeys.Length; i++)
            _hitKeys[i] = rng.Next(Count);

        _missKeys = new int[1024];
        for (int i = 0; i < _missKeys.Length; i++)
            _missKeys[i] = Count + i;
    }

    [Benchmark]
    public bool Contains_True()
    {
        bool result = false;
        var set = _frozenSet;
        var keys = _hitKeys;
        for (int i = 0; i < keys.Length; i++)
            result ^= set.Contains(keys[i]);
        return result;
    }

    [Benchmark]
    public bool Contains_False()
    {
        bool result = false;
        var set = _frozenSet;
        var keys = _missKeys;
        for (int i = 0; i < keys.Length; i++)
            result ^= set.Contains(keys[i]);
        return result;
    }
}

public class Perf_FrozenDictionary_String_LoadFactorCliff
{
    private FrozenDictionary<string, int> _frozenDict;
    private string[] _hitKeys;
    private string[] _missKeys;

    [Params(2_000_000, 3_000_000, 4_000_000, 6_000_000, 8_000_000, 16_000_000)]
    public int Count;

    [GlobalSetup]
    public void Setup()
    {
        var dict = new Dictionary<string, int>(Count);
        for (int i = 0; i < Count; i++)
            dict[i.ToString("D8")] = i;

        _frozenDict = dict.ToFrozenDictionary();

        var rng = new Random(42);
        _hitKeys = new string[1024];
        for (int i = 0; i < _hitKeys.Length; i++)
            _hitKeys[i] = rng.Next(Count).ToString("D8");

        _missKeys = new string[1024];
        for (int i = 0; i < _missKeys.Length; i++)
            _missKeys[i] = (Count + i).ToString("D8");
    }

    [Benchmark]
    public bool TryGetValue_True()
    {
        bool result = false;
        var dict = _frozenDict;
        var keys = _hitKeys;
        for (int i = 0; i < keys.Length; i++)
            result ^= dict.TryGetValue(keys[i], out _);
        return result;
    }

    [Benchmark]
    public bool TryGetValue_False()
    {
        bool result = false;
        var dict = _frozenDict;
        var keys = _missKeys;
        for (int i = 0; i < keys.Length; i++)
            result ^= dict.TryGetValue(keys[i], out _);
        return result;
    }
}

@danmoseley
Member

The dictionary test above uses strings as keys, so it will have fewer unique hash codes (which are also more expensive to calculate) than using ints. I added larger sizes to try, but I don't have a theory. Let me see what Copilot thinks.

danmoseley and others added 3 commits March 24, 2026 11:52
The early-exit path for collections near Array.MaxLength had the same
bug as the primes-table-exhaustion fallback: it used GetPrime(uniqueCodesCount)
giving ~1.0 load factor instead of GetPrime(minNumBuckets) for ~0.5.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Replace the implicit primes.Length check with an explicit
LargestPrecomputedPrime constant (7,199,369). This makes the threshold
visible in CalcNumBuckets rather than being an opaque dependency on the
HashHelpers.Primes table size. The loop no longer needs a bounds check
since we verify we are within range before entering it.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add DOTNET_FROZEN_SKIP_TUNING and DOTNET_FROZEN_TUNING_THRESHOLD env vars
to CalcNumBuckets, read uncached during construction only (not on the
lookup hot path). This allows EgorBot benchmarks to test whether the
collision-counting loop helps lookup performance at various sizes.

This commit is temporary and should be reverted before merge.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@danmoseley
Member

Note

This comment was generated by GitHub Copilot.

Investigating whether the prime-tuning collision loop in CalcNumBuckets helps lookup performance at medium sizes (1K–500K). See analysis. Two classes below: baseline (default tuning) vs skip-tuning (GetPrime(2*N) only, no collision-counting loop). Comparing within the same EgorBot run.

@EgorBot -intel

using System;
using BenchmarkDotNet.Attributes;
using System.Collections.Frozen;
using System.Collections.Generic;
using System.Linq;

[MemoryDiagnoser]
public class Perf_FrozenSet_Int_TuningBaseline
{
    private FrozenSet<int> _frozenSet;
    private int[] _hitKeys;
    private int[] _missKeys;

    [Params(1_000, 10_000, 50_000, 100_000, 500_000)]
    public int Count;

    [GlobalSetup]
    public void Setup()
    {
        Environment.SetEnvironmentVariable("DOTNET_FROZEN_SKIP_TUNING", null);
        _frozenSet = Enumerable.Range(0, Count).ToFrozenSet();

        var rng = new Random(42);
        _hitKeys = new int[1024];
        for (int i = 0; i < _hitKeys.Length; i++)
            _hitKeys[i] = rng.Next(Count);

        _missKeys = new int[1024];
        for (int i = 0; i < _missKeys.Length; i++)
            _missKeys[i] = Count + i;
    }

    [Benchmark]
    public bool Contains_Hit()
    {
        bool result = false;
        var set = _frozenSet;
        var keys = _hitKeys;
        for (int i = 0; i < keys.Length; i++)
            result ^= set.Contains(keys[i]);
        return result;
    }

    [Benchmark]
    public bool Contains_Miss()
    {
        bool result = false;
        var set = _frozenSet;
        var keys = _missKeys;
        for (int i = 0; i < keys.Length; i++)
            result ^= set.Contains(keys[i]);
        return result;
    }
}

[MemoryDiagnoser]
public class Perf_FrozenSet_Int_SkipTuning
{
    private FrozenSet<int> _frozenSet;
    private int[] _hitKeys;
    private int[] _missKeys;

    [Params(1_000, 10_000, 50_000, 100_000, 500_000)]
    public int Count;

    [GlobalSetup]
    public void Setup()
    {
        Environment.SetEnvironmentVariable("DOTNET_FROZEN_SKIP_TUNING", "1");
        _frozenSet = Enumerable.Range(0, Count).ToFrozenSet();
        Environment.SetEnvironmentVariable("DOTNET_FROZEN_SKIP_TUNING", null);

        var rng = new Random(42);
        _hitKeys = new int[1024];
        for (int i = 0; i < _hitKeys.Length; i++)
            _hitKeys[i] = rng.Next(Count);

        _missKeys = new int[1024];
        for (int i = 0; i < _missKeys.Length; i++)
            _missKeys[i] = Count + i;
    }

    [Benchmark]
    public bool Contains_Hit()
    {
        bool result = false;
        var set = _frozenSet;
        var keys = _hitKeys;
        for (int i = 0; i < keys.Length; i++)
            result ^= set.Contains(keys[i]);
        return result;
    }

    [Benchmark]
    public bool Contains_Miss()
    {
        bool result = false;
        var set = _frozenSet;
        var keys = _missKeys;
        for (int i = 0; i < keys.Length; i++)
            result ^= set.Contains(keys[i]);
        return result;
    }
}

@danmoseley
Member

Note

This comment was generated by GitHub Copilot.

FrozenDictionary<int, int> — baseline vs skip-tuning at medium sizes.

@EgorBot -intel

using System;
using BenchmarkDotNet.Attributes;
using System.Collections.Frozen;
using System.Collections.Generic;
using System.Linq;

[MemoryDiagnoser]
public class Perf_FrozenDict_Int_TuningBaseline
{
    private FrozenDictionary<int, int> _frozenDict;
    private int[] _hitKeys;
    private int[] _missKeys;

    [Params(1_000, 10_000, 50_000, 100_000, 500_000)]
    public int Count;

    [GlobalSetup]
    public void Setup()
    {
        Environment.SetEnvironmentVariable("DOTNET_FROZEN_SKIP_TUNING", null);
        _frozenDict = Enumerable.Range(0, Count).ToDictionary(i => i, i => i).ToFrozenDictionary();

        var rng = new Random(42);
        _hitKeys = new int[1024];
        for (int i = 0; i < _hitKeys.Length; i++)
            _hitKeys[i] = rng.Next(Count);

        _missKeys = new int[1024];
        for (int i = 0; i < _missKeys.Length; i++)
            _missKeys[i] = Count + i;
    }

    [Benchmark]
    public bool TryGetValue_Hit()
    {
        bool result = false;
        var dict = _frozenDict;
        var keys = _hitKeys;
        for (int i = 0; i < keys.Length; i++)
            result ^= dict.TryGetValue(keys[i], out _);
        return result;
    }

    [Benchmark]
    public bool TryGetValue_Miss()
    {
        bool result = false;
        var dict = _frozenDict;
        var keys = _missKeys;
        for (int i = 0; i < keys.Length; i++)
            result ^= dict.TryGetValue(keys[i], out _);
        return result;
    }
}

[MemoryDiagnoser]
public class Perf_FrozenDict_Int_SkipTuning
{
    private FrozenDictionary<int, int> _frozenDict;
    private int[] _hitKeys;
    private int[] _missKeys;

    [Params(1_000, 10_000, 50_000, 100_000, 500_000)]
    public int Count;

    [GlobalSetup]
    public void Setup()
    {
        Environment.SetEnvironmentVariable("DOTNET_FROZEN_SKIP_TUNING", "1");
        _frozenDict = Enumerable.Range(0, Count).ToDictionary(i => i, i => i).ToFrozenDictionary();
        Environment.SetEnvironmentVariable("DOTNET_FROZEN_SKIP_TUNING", null);

        var rng = new Random(42);
        _hitKeys = new int[1024];
        for (int i = 0; i < _hitKeys.Length; i++)
            _hitKeys[i] = rng.Next(Count);

        _missKeys = new int[1024];
        for (int i = 0; i < _missKeys.Length; i++)
            _missKeys[i] = Count + i;
    }

    [Benchmark]
    public bool TryGetValue_Hit()
    {
        bool result = false;
        var dict = _frozenDict;
        var keys = _hitKeys;
        for (int i = 0; i < keys.Length; i++)
            result ^= dict.TryGetValue(keys[i], out _);
        return result;
    }

    [Benchmark]
    public bool TryGetValue_Miss()
    {
        bool result = false;
        var dict = _frozenDict;
        var keys = _missKeys;
        for (int i = 0; i < keys.Length; i++)
            result ^= dict.TryGetValue(keys[i], out _);
        return result;
    }
}

@danmoseley
Member

Note

This comment was generated by GitHub Copilot.

FrozenDictionary<string, int> — baseline vs skip-tuning. Strings have more expensive Equals, so load factor should matter more here.

@EgorBot -intel

using System;
using BenchmarkDotNet.Attributes;
using System.Collections.Frozen;
using System.Collections.Generic;
using System.Linq;

[MemoryDiagnoser]
public class Perf_FrozenDict_String_TuningBaseline
{
    private FrozenDictionary<string, int> _frozenDict;
    private string[] _hitKeys;
    private string[] _missKeys;

    [Params(1_000, 10_000, 50_000, 100_000, 500_000)]
    public int Count;

    [GlobalSetup]
    public void Setup()
    {
        Environment.SetEnvironmentVariable("DOTNET_FROZEN_SKIP_TUNING", null);
        var dict = new Dictionary<string, int>(Count);
        for (int i = 0; i < Count; i++)
            dict[i.ToString("D8")] = i;

        _frozenDict = dict.ToFrozenDictionary();

        var rng = new Random(42);
        _hitKeys = new string[1024];
        for (int i = 0; i < _hitKeys.Length; i++)
            _hitKeys[i] = rng.Next(Count).ToString("D8");

        _missKeys = new string[1024];
        for (int i = 0; i < _missKeys.Length; i++)
            _missKeys[i] = (Count + i).ToString("D8");
    }

    [Benchmark]
    public bool TryGetValue_Hit()
    {
        bool result = false;
        var dict = _frozenDict;
        var keys = _hitKeys;
        for (int i = 0; i < keys.Length; i++)
            result ^= dict.TryGetValue(keys[i], out _);
        return result;
    }

    [Benchmark]
    public bool TryGetValue_Miss()
    {
        bool result = false;
        var dict = _frozenDict;
        var keys = _missKeys;
        for (int i = 0; i < keys.Length; i++)
            result ^= dict.TryGetValue(keys[i], out _);
        return result;
    }
}

[MemoryDiagnoser]
public class Perf_FrozenDict_String_SkipTuning
{
    private FrozenDictionary<string, int> _frozenDict;
    private string[] _hitKeys;
    private string[] _missKeys;

    [Params(1_000, 10_000, 50_000, 100_000, 500_000)]
    public int Count;

    [GlobalSetup]
    public void Setup()
    {
        Environment.SetEnvironmentVariable("DOTNET_FROZEN_SKIP_TUNING", "1");
        var dict = new Dictionary<string, int>(Count);
        for (int i = 0; i < Count; i++)
            dict[i.ToString("D8")] = i;

        _frozenDict = dict.ToFrozenDictionary();
        Environment.SetEnvironmentVariable("DOTNET_FROZEN_SKIP_TUNING", null);

        var rng = new Random(42);
        _hitKeys = new string[1024];
        for (int i = 0; i < _hitKeys.Length; i++)
            _hitKeys[i] = rng.Next(Count).ToString("D8");

        _missKeys = new string[1024];
        for (int i = 0; i < _missKeys.Length; i++)
            _missKeys[i] = (Count + i).ToString("D8");
    }

    [Benchmark]
    public bool TryGetValue_Hit()
    {
        bool result = false;
        var dict = _frozenDict;
        var keys = _hitKeys;
        for (int i = 0; i < keys.Length; i++)
            result ^= dict.TryGetValue(keys[i], out _);
        return result;
    }

    [Benchmark]
    public bool TryGetValue_Miss()
    {
        bool result = false;
        var dict = _frozenDict;
        var keys = _missKeys;
        for (int i = 0; i < keys.Length; i++)
            result ^= dict.TryGetValue(keys[i], out _);
        return result;
    }
}

@danmoseley
Member

Ideally we would have posted one EgorBot run; as it is, it's spinning up extra VMs. Oh well.

@danmoseley
Member

danmoseley commented Mar 24, 2026

Note

This comment was generated by GitHub Copilot.

EgorBot results: tuning ON vs tuning OFF

The three benchmarks above compared main (tuning loop active) vs PR branch with DOTNET_FROZEN_SKIP_TUNING=1 (tuning loop skipped, just uses GetPrime(2*N)). Results at 1K, 10K, 50K, 100K, 500K entries:

  • FrozenDictionary<int, int> -- no difference at any size, hits or misses. All ratios 0.99-1.01.
  • FrozenSet<int> -- no difference. All ratios 0.98-1.03, within noise.
  • FrozenDictionary<string, int> -- mostly noise, but a possible small signal at 500K misses: main was 3-7% faster than skip-tuning across two runs. Other sizes/scenarios showed no consistent difference.

Why the tuning loop doesn't help

CalcNumBuckets starts at GetPrime(2*N) (load factor ~0.5) and tries progressively larger primes until collisions drop below 5%. But FrozenHashTable stores colliding entries contiguously in HashCodes[] -- collision chains are sequential scans within the same cache line.

The extra comparisons from collisions are hash code comparisons (cheap int scan), not key Equals calls. Equals is only called when a hash code matches, which for well-distributed 32-bit hashes is ~once per hit and ~never per miss -- regardless of load factor or tuning. So even expensive key types (strings, custom comparers) don't amplify the collision cost difference.
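The contiguous layout described above can be illustrated with a toy version (illustrative only, not the FrozenHashTable source): entries are grouped by bucket into shared hash-code/item arrays, each bucket records a start/end range, and key equality runs only on an exact 32-bit hash match.

```csharp
using System;
using System.Linq;

// Toy frozen-style table: a collision chain is a sequential scan of ints
// stored contiguously, and equality is checked only on an exact hash match.
string[] source = { "apple", "banana", "cherry", "date", "elderberry" };
int numBuckets = 3; // deliberately tiny to force collision chains

var grouped = source
    .Select(s => (item: s, hash: s.GetHashCode()))
    .OrderBy(e => (uint)e.hash % (uint)numBuckets)
    .ToArray();

int[] hashCodes = grouped.Select(e => e.hash).ToArray();
string[] items = grouped.Select(e => e.item).ToArray();

// bucket -> (start, endExclusive) range into the contiguous arrays
var buckets = new (int start, int end)[numBuckets];
for (int b = 0, i = 0; b < numBuckets; b++)
{
    int start = i;
    while (i < grouped.Length && (uint)grouped[i].hash % (uint)numBuckets == b) i++;
    buckets[b] = (start, i);
}

bool Contains(string item)
{
    int hash = item.GetHashCode();
    var (start, end) = buckets[(uint)hash % (uint)numBuckets];
    for (int i = start; i < end; i++)
        if (hash == hashCodes[i] && items[i] == item) // Equals only on hash match
            return true;
    return false;
}

Console.WriteLine(Contains("cherry")); // True
Console.WriteLine(Contains("grape"));  // False
```

With well-distributed hashes, the `items[i] == item` comparison fires at most about once per lookup, regardless of how long the chain of hash-code comparisons is.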

For large inputs (>=1K entries), MaxLargeBucketTableMultiplier = 3, so the tuning loop searches primes from ~2N up to ~3N. At 3N buckets the expected collision rate (birthday problem: ~N^2/(2M)) is ~17% -- still well above the 5% target. So the tuning loop can't meet its collision target at medium-to-large sizes and exhausts its search range, falling back to the best prime found. This means it's spending construction time without achieving its goal.
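The birthday-problem figures above can be checked numerically; the 2x/3x multipliers and the 5% target are taken from the paragraph, and the approximation is the usual expected-collisions estimate:

```csharp
using System;

double N = 1_000_000; // entries (any large N gives the same rates)
foreach (double mult in new[] { 2.0, 3.0 })
{
    double M = mult * N;                         // buckets
    double expectedCollisions = N * N / (2 * M); // birthday approximation
    double rate = expectedCollisions / N;        // collisions per entry
    Console.WriteLine($"{mult}x buckets: collision rate ~{rate * 100:F1}%");
}
// 2x -> ~25.0%, 3x -> ~16.7%: both well above the 5% tuning target,
// so the loop exhausts its search range without ever meeting it.
```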

Assumption: this analysis assumes well-behaved GetHashCode() implementations that distribute uniformly across the 32-bit space. With such hash functions, actual hash code collisions (distinct keys with identical hash codes) are negligible until collection sizes reach hundreds of millions. Poorly distributed hash functions are out of scope -- the fix for those is fixing the hash function, not tuning bucket counts.

Does the base load factor (0.5) matter either?

The tuning loop is one knob; the base load factor is another. Currently CalcNumBuckets targets LF ~0.5 by requesting GetPrime(2*N) buckets before the tuning loop runs. If collisions are nearly free thanks to the contiguous layout, this 2x bucket overhead may also be unnecessary.

At LF 1.0 (GetPrime(N)) vs LF 0.5 (GetPrime(2*N)):

  • Collision cost increase: ~0.5 more hash code comparisons per lookup (avg bucket size 2.0 vs 1.5 for hits). At ~0.3ns per int comparison, that's ~0.15ns per lookup.
  • Memory savings: bucket array is half the size. At 100K entries with 8-byte Bucket structs: 1.6MB down to 800KB.
  • Better locality: smaller array = more fits in L2/L3 = fewer cache misses on the bucket lookup itself.
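The memory figures in the list above follow directly from the bucket counts (the 8-byte Bucket struct size is taken from the bullet):

```csharp
using System;

int entries = 100_000;
int bucketBytes = 8;                              // per the comment's Bucket struct size
long lf05 = (long)entries * 2 * bucketBytes;      // ~2N buckets at LF ~0.5
long lf10 = (long)entries * 1 * bucketBytes;      // ~N buckets at LF ~1.0
Console.WriteLine($"LF 0.5: {lf05 / 1_000_000.0:F1} MB"); // 1.6 MB
Console.WriteLine($"LF 1.0: {lf10 / 1_000_000.0:F1} MB"); // 0.8 MB
```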

For cheap key types (int, long, Guid), LF 1.0 could actually be faster -- the cache/locality gain from the smaller bucket array may exceed the ~0.15ns collision penalty. Dictionary<K,V> uses LF ~0.72 despite having scattered collision chains (pointer chasing), which are far more expensive than FrozenHashTable's contiguous scans.

For ref types and expensive comparers, the load factor story is the same -- the extra comparisons are hash code comparisons (cheap), not Equals calls. Equals call count is determined by hash code collisions (same 32-bit value), not bucket collisions (same modulo). With well-distributed hashes, Equals is called ~once per hit and ~0 per miss at any load factor.

Note that even at LF 0.5, 39% of lookups already involve chaining (bucket size >= 2). At LF 0.75 (what Dictionary uses) it's 53%, and at LF 1.0 it's 63% -- using the formula 1 - e^(-N/M). The avg hash code comparisons per hit are 1.5, 1.75, and 2.0 respectively. Intermediate values like LF 0.75 are worth exploring if LF 1.0 shows a regression, but we're starting with the extremes (1 vs 2 multiplier) to find out whether load factor matters at all.
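The percentages above come from the Poisson model cited in the comment (chaining fraction 1 - e^(-N/M), average comparisons per the comment's 1 + N/M figures); a quick check:

```csharp
using System;

foreach (double lf in new[] { 0.5, 0.75, 1.0 }) // load factor N/M
{
    double chained = 1 - Math.Exp(-lf); // fraction of lookups that chain, per the model
    double avgCmp = 1 + lf;             // avg hash-code comparisons per hit, per the comment
    Console.WriteLine($"LF {lf:F2}: chaining {chained * 100:F0}%, avg comparisons {avgCmp:F2}");
}
// LF 0.50: chaining 39%, avg comparisons 1.50
// LF 0.75: chaining 53%, avg comparisons 1.75
// LF 1.00: chaining 63%, avg comparisons 2.00
```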

Planned follow-up experiment

We've added DOTNET_FROZEN_BUCKET_MULTIPLIER=N env var (construction-time only) to control the base load factor: default 2 (LF ~0.5), set to 1 for LF ~1.0.

The key question: does the smaller bucket array at LF 1.0 give a measurable locality win that offsets ~0.5 extra hash code comparisons per lookup?

Plan: a single EgorBot run testing LF 0.5 vs LF 1.0 (both with tuning OFF via DOTNET_FROZEN_TUNING_THRESHOLD=0) across:

  • FrozenSet<int> at 1K, 10K, 50K, 100K, 500K (hits + misses)
  • FrozenDictionary<string, int> at 10K, 50K, 100K, 200K, 500K (hits + misses)

If LF 1.0 matches or beats LF 0.5: both the tuning loop and the 2x bucket overhead are dead weight. CalcNumBuckets could be simplified to just GetPrime(N).

DOTNET_FROZEN_SKIP_TUNING is redundant with DOTNET_FROZEN_TUNING_THRESHOLD=0.
Add DOTNET_FROZEN_BUCKET_MULTIPLIER=N to test load factor impact
(default 2 for LF ~0.5, set to 1 for LF ~1.0).

This commit is temporary and should be reverted before merge.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
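A construction-time knob like the one described in the commit might be read as follows (hypothetical sketch; only the env-var name and default come from the commit message, and the real code would consult it inside CalcNumBuckets):

```csharp
using System;

// Read once during construction; never on the lookup hot path.
static int GetBucketMultiplier()
{
    string? raw = Environment.GetEnvironmentVariable("DOTNET_FROZEN_BUCKET_MULTIPLIER");
    return int.TryParse(raw, out int m) && m >= 1 ? m : 2; // default 2 => LF ~0.5
}

Environment.SetEnvironmentVariable("DOTNET_FROZEN_BUCKET_MULTIPLIER", "1");
Console.WriteLine(GetBucketMultiplier()); // 1  (LF ~1.0)
Environment.SetEnvironmentVariable("DOTNET_FROZEN_BUCKET_MULTIPLIER", null);
Console.WriteLine(GetBucketMultiplier()); // 2  (default, LF ~0.5)
```

Reading the variable uncached is what lets the benchmark's GlobalSetup flip it between constructing the two collections.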
@danmoseley
Member

danmoseley commented Mar 24, 2026

@sachinsharma3191 thanks for your change. As you can see, it appears that, counterintuitively, fixing this odd cliff does not help perf. So we've hijacked this PR for now to have AI run some experiments that might suggest we significantly change the implementation...

@danmoseley marked this pull request as draft March 24, 2026 20:45
@danmoseley added the NO-MERGE label (The PR is not ready for merge yet; see discussion for detailed reasons) Mar 24, 2026
@danmoseley
Member

@EgorBot -intel

// EgorBot benchmark: Does the base load factor (0.5 vs 1.0) matter for FrozenSet/FrozenDictionary lookup?
//
// Background:
//   FrozenHashTable.CalcNumBuckets allocates ~2N buckets (load factor ~0.5) then runs a collision-counting
//   tuning loop to try to push collisions below 5%. Our prior experiment showed the tuning loop provides
//   zero measurable benefit (see EgorBot/Benchmarks#66-68). This experiment tests whether the BASE load
//   factor matters: does doubling the bucket count (LF 0.5) actually help vs just using N buckets (LF 1.0)?
//
//   At LF 1.0 the frozen collection uses half the memory and has better cache locality, but ~63% of buckets
//   have >1 entry (vs ~39% at LF 0.5). However, collisions in FrozenHashTable are cheap: entries are stored
//   contiguously, so chain traversal is a sequential int-comparison scan within the same cache line.
//
// Env vars (only effective on PR build; ignored on main):
//   DOTNET_FROZEN_BUCKET_MULTIPLIER=N: bucket count = N * uniqueCodesCount. Default 2 (LF ~0.5), set to 1 for LF ~1.0.
//   DOTNET_FROZEN_TUNING_THRESHOLD=0: skip the collision-counting tuning loop entirely.
//
// Setup:
//   Two frozen collections are created per GlobalSetup with different env-var-controlled load factors,
//   both with the tuning loop disabled. On the 'main' toolchain these env vars are ignored (code doesn't
//   exist there), so both collections use default behavior — serving as a baseline to confirm the
//   benchmark harness isn't introducing noise.
//
// What to look for in results:
//   Compare LF05_* vs LF10_* methods within the PR toolchain column.
//   If LF 1.0 matches or beats LF 0.5, the entire CalcNumBuckets complexity could be replaced
//   with a simple GetPrime(N).

using System;
using BenchmarkDotNet.Attributes;
using System.Collections.Frozen;
using System.Collections.Generic;
using System.Linq;

// FrozenSet<int>: cheap key type — hash and equality are both fast.
// Any LF difference here would be from the raw collision-chain scan cost.
[MemoryDiagnoser]
public class Perf_Frozen_LoadFactor
{
    private FrozenSet<int> _setLF05;
    private FrozenSet<int> _setLF10;
    private int[] _intHitKeys;
    private int[] _intMissKeys;

    private FrozenDictionary<string, int> _dictLF05;
    private FrozenDictionary<string, int> _dictLF10;
    private string[] _strHitKeys;
    private string[] _strMissKeys;

    [Params(1_000, 10_000, 100_000, 500_000)]
    public int Count;

    [GlobalSetup]
    public void Setup()
    {
        // --- FrozenSet<int> ---
        var intSource = Enumerable.Range(0, Count).ToHashSet();

        // Disable the collision-counting tuning loop for both configurations.
        // We want to isolate the effect of load factor alone.
        Environment.SetEnvironmentVariable("DOTNET_FROZEN_TUNING_THRESHOLD", "0");

        // LF ~0.5: 2N buckets (current default)
        Environment.SetEnvironmentVariable("DOTNET_FROZEN_BUCKET_MULTIPLIER", "2");
        _setLF05 = intSource.ToFrozenSet();

        // LF ~1.0: N buckets (half the memory, more collisions)
        Environment.SetEnvironmentVariable("DOTNET_FROZEN_BUCKET_MULTIPLIER", "1");
        _setLF10 = intSource.ToFrozenSet();

        var rng = new Random(42);
        _intHitKeys = new int[1024];
        for (int i = 0; i < _intHitKeys.Length; i++)
            _intHitKeys[i] = rng.Next(Count); // guaranteed hits

        _intMissKeys = new int[1024];
        for (int i = 0; i < _intMissKeys.Length; i++)
            _intMissKeys[i] = Count + i; // guaranteed misses (all values are 0..Count-1)

        // --- FrozenDictionary<string, int> ---
        // Same-length keys prevent the length-bucket optimization; FrozenDictionary will
        // use the hash-based strategy that goes through CalcNumBuckets.
        var strData = new Dictionary<string, int>(Count);
        for (int i = 0; i < Count; i++)
            strData[$"key_{i:D8}"] = i;

        Environment.SetEnvironmentVariable("DOTNET_FROZEN_BUCKET_MULTIPLIER", "2");
        _dictLF05 = strData.ToFrozenDictionary();

        Environment.SetEnvironmentVariable("DOTNET_FROZEN_BUCKET_MULTIPLIER", "1");
        _dictLF10 = strData.ToFrozenDictionary();

        // Clean up env vars
        Environment.SetEnvironmentVariable("DOTNET_FROZEN_TUNING_THRESHOLD", null);
        Environment.SetEnvironmentVariable("DOTNET_FROZEN_BUCKET_MULTIPLIER", null);

        _strHitKeys = new string[1024];
        for (int i = 0; i < _strHitKeys.Length; i++)
            _strHitKeys[i] = $"key_{rng.Next(Count):D8}";

        // Miss keys use a different prefix so they can't possibly match
        _strMissKeys = new string[1024];
        for (int i = 0; i < _strMissKeys.Length; i++)
            _strMissKeys[i] = $"miss{i:D8}";
    }

    // --- FrozenSet<int> lookups ---

    [Benchmark]
    public bool SetInt_LF05_Hit()
    {
        bool result = false;
        var set = _setLF05;
        var keys = _intHitKeys;
        for (int i = 0; i < keys.Length; i++)
            result ^= set.Contains(keys[i]);
        return result;
    }

    [Benchmark]
    public bool SetInt_LF10_Hit()
    {
        bool result = false;
        var set = _setLF10;
        var keys = _intHitKeys;
        for (int i = 0; i < keys.Length; i++)
            result ^= set.Contains(keys[i]);
        return result;
    }

    [Benchmark]
    public bool SetInt_LF05_Miss()
    {
        bool result = false;
        var set = _setLF05;
        var keys = _intMissKeys;
        for (int i = 0; i < keys.Length; i++)
            result ^= set.Contains(keys[i]);
        return result;
    }

    [Benchmark]
    public bool SetInt_LF10_Miss()
    {
        bool result = false;
        var set = _setLF10;
        var keys = _intMissKeys;
        for (int i = 0; i < keys.Length; i++)
            result ^= set.Contains(keys[i]);
        return result;
    }

    // --- FrozenDictionary<string, int> lookups ---
    // String keys exercise the OrdinalStringFrozenDictionary path with more expensive
    // hash computation and equality checks.

    [Benchmark]
    public bool DictStr_LF05_Hit()
    {
        bool result = false;
        var dict = _dictLF05;
        var keys = _strHitKeys;
        for (int i = 0; i < keys.Length; i++)
            result ^= dict.TryGetValue(keys[i], out _);
        return result;
    }

    [Benchmark]
    public bool DictStr_LF10_Hit()
    {
        bool result = false;
        var dict = _dictLF10;
        var keys = _strHitKeys;
        for (int i = 0; i < keys.Length; i++)
            result ^= dict.TryGetValue(keys[i], out _);
        return result;
    }

    [Benchmark]
    public bool DictStr_LF05_Miss()
    {
        bool result = false;
        var dict = _dictLF05;
        var keys = _strMissKeys;
        for (int i = 0; i < keys.Length; i++)
            result ^= dict.TryGetValue(keys[i], out _);
        return result;
    }

    [Benchmark]
    public bool DictStr_LF10_Miss()
    {
        bool result = false;
        var dict = _dictLF10;
        var keys = _strMissKeys;
        for (int i = 0; i < keys.Length; i++)
            result ^= dict.TryGetValue(keys[i], out _);
        return result;
    }
}

