New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dictionary CQ #15419

Merged
merged 4 commits into from Mar 1, 2018

Conversation

Projects
None yet
@benaadams
Collaborator

benaadams commented Dec 7, 2017

Up to x 1.5 speed up for finding value types, x1.2 for finding object types

  • Improve Dictionary FindEntry CQ #15460
  • Improve Dictionary TryInsert CQ #15462
  • Use EqualityComparer<TKey>.Default (only from #15411)
  • 1-bases _buckets so the don't be set to -1 on each resize

Adds 5kB (0.16% of base) to System.Private.CoreLib.ni.dll #15419 (comment)

Before

                                Method |
-------------------------------------- |
    ContainsKeySimplePositive_IntKey   | IEqualityComparer
    ContainsKeySimpleNegative_IntKey   | IEqualityComparer
 ContainsKeySimplePositive_StringKey   | IEqualityComparer
 ContainsKeySimpleNegative_StringKey   | IEqualityComparer
   ContainsKeySimplePositive_ObjectKey | IEqualityComparer
   ContainsKeySimpleNegative_ObjectKey | IEqualityComparer

After

                                Method |
-------------------------------------- |
    ContainsKeySimplePositive_IntKey   | devirtualized
    ContainsKeySimpleNegative_IntKey   | devirtualized
 ContainsKeySimplePositive_StringKey   | IEqualityComparer
 ContainsKeySimpleNegative_StringKey   | IEqualityComparer
   ContainsKeySimplePositive_ObjectKey | virtual
   ContainsKeySimpleNegative_ObjectKey | virtual
                          Method |Items |       Current |   Single func | Change |
-------------------------------- |----- |--------------:|--------------:|-------:|
   ContainsKey_Single_Yes_IntKey | 1000 |     11.231 ns |      7.110 ns | x 1.58 |
    ContainsKey_Single_No_IntKey | 1000 |     10.794 ns |      7.179 ns | x 1.50 |
          ContainsKey_All_IntKey | 1000 | 10,815.405 ns |  7,220.128 ns | x 1.50 |
ContainsKey_Single_Yes_StringKey | 1000 |     20.489 ns |     19.925 ns | x 1.03 |
 ContainsKey_Single_No_StringKey | 1000 |     12.747 ns |     13.414 ns | x 0.95 |
ContainsKey_Single_Yes_ObjectKey | 1000 |     11.233 ns |      9.119 ns | x 1.23 |
 ContainsKey_Single_No_ObjectKey | 1000 |     11.192 ns |      8.560 ns | x 1.31 |
                      Add_IntKey | 1000 | 27,573.689 ns | 22,503.652 ns | x 1.23 |
                   Add_StringKey | 1000 | 76,095.748 ns | 77,253.701 ns | x 0.99 |
                   Add_ObjectKey | 1000 | 68,845.402 ns | 67,529.550 ns | x 1.02 |
              PreSizedAdd_IntKey | 1000 | 11,558.450 ns |  9,658.953 ns | x 1.20 |
           PreSizedAdd_StringKey | 1000 | 57,360.936 ns | 57,710.254 ns | x 0.99 |
           PreSizedAdd_ObjectKey | 1000 | 54,780.942 ns | 50,300.463 ns | x 1.09 |

Performance-wise, multipath was faster but that increased code-gen by 27kB #15419 (comment)

Resolves: #16258

{
// To start, move off default comparer for string which is randomised

This comment has been minimized.

@jkotas

jkotas Dec 7, 2017

Member

Are we still going to take this path when somebody passes in EqualityComparer<string>.Default explicitly?

This comment has been minimized.

@benaadams

benaadams Dec 7, 2017

Collaborator

Yes as earlier test comparer != EqualityComparer<TKey>.Default will mean its null

This comment has been minimized.

@jkotas

jkotas Dec 7, 2017

Member

this.comparer will be null. comparer is going to be non-null.

This comment has been minimized.

@benaadams

benaadams Dec 7, 2017

Collaborator

ah, yes... would like to rename all the class levels - is a bit confusing.

Also, should be done in isolation as it makes any code change confusing.

This comment has been minimized.

@jkotas

jkotas Dec 7, 2017

Member

Right

This comment has been minimized.

@benaadams

benaadams Dec 7, 2017

Collaborator

fixed

for (int i = 0; i < count; i++)
{
if (entries[i].hashCode >= 0 && c.Equals(entries[i].value, value)) return true;
if (entries[i].hashCode >= 0 && (default(TValue) == null ? value.Equals(entries[i].value) : EqualityComparer<TValue>.Default.Equals(entries[i].value, value))) return true;

This comment has been minimized.

@jkotas

jkotas Dec 7, 2017

Member

This is functional change, e.g. when the value implements IEquatable.

This comment has been minimized.

@benaadams

benaadams Dec 7, 2017

Collaborator

fixed

@@ -358,7 +358,7 @@ private int FindEntry(TKey key)
int hashCode = (comparer != null ? comparer.GetHashCode(key) : key.GetHashCode()) & 0x7FFFFFFF;
for (int i = buckets[hashCode % buckets.Length]; i >= 0; i = entries[i].next)
{
if (entries[i].hashCode == hashCode && (comparer != null ? comparer.Equals(entries[i].key, key) : (default(TKey) == null ? key.Equals(entries[i].key) : EqualityComparer<TKey>.Default.Equals(entries[i].key, key)))) return i;
if (entries[i].hashCode == hashCode && (comparer != null ? comparer.Equals(entries[i].key, key) : EqualityComparer<TKey>.Default.Equals(entries[i].key, key))) return i;

This comment has been minimized.

@omariom

omariom Dec 7, 2017

Contributor

why without ref Entry?

This comment has been minimized.

@benaadams

benaadams Dec 7, 2017

Collaborator

Going to follow up with ref changes in different PR

This comment has been minimized.

@benaadams

benaadams Dec 7, 2017

Collaborator

i.e. just pulling in the devirtualization of EqualityComparer<T>.Default #14125 to skip the interface dispatch

@benaadams benaadams changed the title from Dictionary - use EqualityComparer<TKey>.Default directly to Dictionary - use EqualityComparer<TKey>.Default devirtualization Dec 7, 2017

@benaadams benaadams changed the title from Dictionary - use EqualityComparer<TKey>.Default devirtualization to Dictionary - EqualityComparer<TKey>.Default devirtualization Dec 7, 2017

@jkotas

This comment has been minimized.

Member

jkotas commented Dec 7, 2017

@dotnet-bot test Windows_NT x64 Checked corefx_baseline

@jkotas

This comment has been minimized.

Member

jkotas commented Dec 7, 2017

@dotnet-bot test Ubuntu x64 Checked corefx_baseline

@jkotas

This comment has been minimized.

Member

jkotas commented Dec 7, 2017

Do you have any numbers for this? Throughput, code size?

@benaadams

This comment has been minimized.

Collaborator

benaadams commented Dec 7, 2017

@omariom

This comment has been minimized.

Contributor

omariom commented Dec 7, 2017

Is there a way to run perf tests from CoreFX? There are tests for Dictionary and HashSet there.

@benaadams

This comment has been minimized.

Collaborator

benaadams commented Dec 7, 2017

I've never worked out how to run the Microsoft.Xunit.Performance tests

@danmosemsft

This comment has been minimized.

Member

danmosemsft commented Dec 7, 2017

I haven't done it in a while, but this describes how to run the tests and of course you could build with /p:CoreCLROverridePath before doing that.

@benaadams

This comment has been minimized.

Collaborator

benaadams commented Dec 7, 2017

Failures are in Regex tests

System.OutOfMemoryException : Exception of type 'System.OutOfMemoryException' was thrown.

@benaadams benaadams changed the title from Dictionary - EqualityComparer<TKey>.Default devirtualization to [WIP] Dictionary - EqualityComparer<TKey>.Default devirtualization Dec 8, 2017

@benaadams

This comment has been minimized.

Collaborator

benaadams commented Dec 8, 2017

While its an improvement; its a bit fragile (changing the code a little can loose the improvement) - so I think it needs work

             Method |     Mean |    Error |   StdDev |
------------------- |---------:|---------:|---------:|
 Knucleotide (Pre)  | 72.88 ms | 1.493 ms | 1.777 ms |
 Knucleotide (Post) | 67.64 ms | 1.349 ms | 1.847 ms |

The corefx tests produce odd results; with variations in the same tests at different sizes; feels like GC - also there are no object key tests (are string keys, but that's a special case)

I think this might be a longer journey...

@omariom

This comment has been minimized.

Contributor

omariom commented Dec 8, 2017

Did you use BDN?

@benaadams

This comment has been minimized.

Collaborator

benaadams commented Dec 8, 2017

Did you use BDN?

Yeah, [InProcess] publish self-contained; copy coreclr release tests bin dlls (corefx+coreclr) on top; duplicate folder for each variant; copy Product coreclr dll over for each variant - bit manual; but means don't have to rebuild corefx each time to produce a NetCoreApp package using the right coreclr; which is a very slow iteration step

@AndyAyersMS

This comment has been minimized.

Member

AndyAyersMS commented Dec 8, 2017

Doing a proper evaluation of these kinds of changes is not easy.

We don't have anything I know of that would be considered a representative perf test suite for dictionaries (covering different kinds of key types, comparers, load factors, loading history, collision patterns, hash functions, etc). Creating something like this would be very useful, but also a lot of work.

I have some estimates for the typical distribution of key types seen in internal Microsoft usage of dictionaries, but this is a static accounting (eg it looks at the memory image of a process, not at how often a dictionary is used and via what API). I'm not all that confident in the data yet, but it appears strings are the dominant key type, at least statically. We have been trying to look at comparer usage and typical load factors and collision chain distributions too, but haven't gotten very far with that yet. So maybe as this gets a bit further and we get some dynamic information fed in we can build a dictionary performance modelling framework.

Without that it's not obvious which cases to optimize for: there are almost always tradeoffs to be made, and we need guidance on which cases can become a bit slower without causing undue harm, what the worst-case impact is on any scenario, and whether the wins on some cases are sufficient to offset the losses in others.

Also we don't have a cookbook way of assessing whether code size increases we might incur in trying to optimize are justified by the improved performance.

I'm not trying to discourage this sort of exploration. It is interesting to know what is possible and how the performance characteristics can be altered.

But it is going to be hard to take changes here unless they are unconditionally better in all key aspects (cleaner code, smaller code, faster code, faster jitting).

@benaadams

This comment has been minimized.

Collaborator

benaadams commented Dec 8, 2017

Agreed. The issue I'm having is obvious "next step" improvements, don't improve things or regress (though not below baseline) - so the changes feel fragile.

So I think the next step is to have better performance tests to validate the changes and iterate from there. Its more complicated to test than I was doing previously as also have pick up the Jit changes; rather than have two sets of Dictionary implementation (e.g. for the devirtualization)

@omariom

This comment has been minimized.

Contributor

omariom commented Dec 8, 2017

@benaadams
What is Knucleotide?
I started porting Perf.Dictionary.cs but stuck at set_Item - there was a mysterious degradation, though with manual nano-benchmarking there was improvement.
I will publish them as gist tomorrow.

@benaadams

This comment has been minimized.

Collaborator

benaadams commented Dec 8, 2017

What is Knucleotide?

Its interesting use of Dictionary from benchmarks game, rather than just testing raw function time; though not specifically just testing dictionary

Example implementations in the JIT Performance/CodeQuality directory; though I think I was using an easier iteration of the algorithm.

@omariom

This comment has been minimized.

Contributor

omariom commented Dec 11, 2017

@benaadams

If still needed

Some perf tests
using System;
using System.Collections.Generic;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Configs;
using BenchmarkDotNet.Jobs;
using BenchmarkDotNet.Running;

namespace BDN.Tests
{
    
    [Config(typeof(InProcConfig))]
    public class CollectiontestPefBenchmarks
    {
        private Dictionary<int, int> _dict;
        private int _key;
        
        const int Iterations = 20000;
        
        [Params(1000, 10_000, 100_000)]
        public int Size_Common { get; set; }
        
        #region Add

        [IterationSetup(Target=nameof(Add))]
        public void IterationSetup_Add()
        {
            _dict = new Dictionary<int, int>(CreateDictionary(Size_Common));
        }

        [Benchmark(OperationsPerInvoke = Iterations * 9)]
        public void Add()
        {
            for (int i = 0; i <= Iterations; i++)
            {
                _dict.Add(i * 10 + 1, 0);
                _dict.Add(i * 10 + 2, 0);
                _dict.Add(i * 10 + 3, 0);
                _dict.Add(i * 10 + 4, 0);
                _dict.Add(i * 10 + 5, 0);
                _dict.Add(i * 10 + 6, 0);
                _dict.Add(i * 10 + 7, 0);
                _dict.Add(i * 10 + 8, 0);
                _dict.Add(i * 10 + 9, 0);
            }
        }

        #endregion

        #region GetItem

        [IterationSetup(Target=nameof(GetItem))]
        public void IterationSetup_GetItem()
        {
            _dict = CreateDictionary(Size_Common);

            for (int i = 1; i <= 9; i++)
            {
                _dict.Add(i, 0);
            }
        }

        [Benchmark(OperationsPerInvoke = Iterations * 9)]
        public void GetItem()
        {
            int retrieved;

            for (int i = 0; i < Iterations; i++)
            {
                retrieved = _dict[1];
                retrieved = _dict[2];
                retrieved = _dict[3];
                retrieved = _dict[4];
                retrieved = _dict[5];
                retrieved = _dict[6];
                retrieved = _dict[7];
                retrieved = _dict[8];
                retrieved = _dict[9];
            }
        }

        #endregion

        #region SetItem

        [IterationSetup(Target=nameof(SetItem))]
        public void IterationSetup_SetItem()
        {
            _dict = CreateDictionary(Size_Common);

            for (int i = 1; i <= 9; i++)
            {
                _dict.Add(i, 0);
            }
        }

        [Benchmark(OperationsPerInvoke = Iterations * 9)]
        public void SetItem()
        {
            for (int i = 0; i < Iterations; i++)
            {
                _dict[1] = 0;
                _dict[2] = 0;
                _dict[3] = 0;
                _dict[4] = 0;
                _dict[5] = 0;
                _dict[6] = 0;
                _dict[7] = 0;
                _dict[8] = 0;
                _dict[9] = 0;
            }
        }

        #endregion

        #region TryGetValue

        [IterationSetup(Target = nameof(TryGetValue))]
        public void IterationSetup_TryGetValue()
        {
            _dict = CreateDictionary(Size_Common);
            // Setup - utils needs a specific seed to prevent key collision with TestData
            Random rand = new Random(837322);
            _key = rand.Next(0, 400000);
            _dict.Add(_key, 12);
        }

        [Benchmark(OperationsPerInvoke = Iterations * 9)]
        public void TryGetValue()
        {
            int value;
            int key = _key;

            for (int i = 0; i < Iterations; i++)
            {
                _dict.TryGetValue(key, out value);
                _dict.TryGetValue(key, out value);
                _dict.TryGetValue(key, out value);
                _dict.TryGetValue(key, out value);
                _dict.TryGetValue(key, out value);
                _dict.TryGetValue(key, out value);
                _dict.TryGetValue(key, out value);
                _dict.TryGetValue(key, out value);
                _dict.TryGetValue(key, out value);
            }
        }

        #endregion

        #region ContainsKey_Int_True

        [IterationSetup(Target = nameof(ContainsKey_Int_True))]
        public void IterationSetup_ContainsKey_Int_True()
        {
            _dict = new Dictionary<int, int>();

            for (int i = 0; i < Size_Common; i++)
            {
                _dict.Add(i, i);
            }
        }

        [Benchmark]
        public void ContainsKey_Int_True()
        {
            bool result = false;
            int iterations = _dict.Count;

            for (int i = 0; i < iterations; i++)
            {
                result = _dict.ContainsKey(i);
            }
        }

        #endregion

        private static Dictionary<int, int> CreateDictionary(int size)
        {
            Random rand = new Random(837322);
            Dictionary<int, int> dict = new Dictionary<int, int>();
            while (dict.Count < size)
            {
                int key = rand.Next(500000, int.MaxValue);
                dict.TryAdd(key, 0);
            }
            return dict;
        }
    }

    class InProcConfig : ManualConfig
    {
        public InProcConfig()
        {
            Add(Job.InProcess
                //.WithLaunchCount(1)
                .WithUnrollFactor(1)
                .WithInvocationCount(1)
                .WithId("InProcess"));
        }
    }
}

@benaadams benaadams force-pushed the benaadams:dict-default-comparer branch from 4d8b7ac to b50667f Dec 15, 2017

@benaadams benaadams changed the title from [WIP] Dictionary - EqualityComparer<TKey>.Default devirtualization to Dictionary - EqualityComparer<TKey>.Default devirtualization Dec 15, 2017

@benaadams benaadams closed this Dec 15, 2017

@benaadams benaadams reopened this Dec 15, 2017

@benaadams

This comment has been minimized.

Collaborator

benaadams commented Dec 16, 2017

@omariom tests, Plus a straight Add to empty dict and GC before each iteration for stability

Before

                     Method | Size_Common |            Mean |           Error |          Median |
--------------------------- |------------ |----------------:|----------------:|----------------:|
                        Add |        1000 |        40.08 ns |       4.3242 ns |        33.62 ns |
                    GetItem |        1000 |        10.96 ns |       0.3272 ns |        10.69 ns |
                    SetItem |        1000 |        11.62 ns |       0.1644 ns |        11.56 ns |
                TryGetValue |        1000 |        11.69 ns |       0.1441 ns |        11.67 ns |
 ContainsKey_Int_True_Total |        1000 |    10,826.17 ns |     631.9406 ns |    10,500.50 ns |
                  Add_Total |        1000 |    27,086.82 ns |   4,350.4026 ns |    25,560.43 ns |
                        Add |       10000 |        53.30 ns |       1.4740 ns |        54.26 ns |
                    GetItem |       10000 |        10.82 ns |       0.1663 ns |        10.81 ns |
                    SetItem |       10000 |        12.25 ns |       0.2434 ns |        12.26 ns |
                TryGetValue |       10000 |        12.09 ns |       0.2394 ns |        11.99 ns |
 ContainsKey_Int_True_Total |       10000 |   104,565.16 ns |   2,645.6341 ns |   103,465.48 ns |
                  Add_Total |       10000 |   307,192.85 ns |   5,521.5687 ns |   305,067.24 ns |
                        Add |      100000 |        60.51 ns |       1.3262 ns |        60.09 ns |
                    GetItem |      100000 |        10.74 ns |       0.1687 ns |        10.69 ns |
                    SetItem |      100000 |        11.72 ns |       0.2996 ns |        11.56 ns |
                TryGetValue |      100000 |        11.65 ns |       0.2297 ns |        11.58 ns |
 ContainsKey_Int_True_Total |      100000 | 1,037,571.45 ns |     457.3348 ns | 1,037,753.65 ns |
                  Add_Total |      100000 | 3,825,088.47 ns | 224,668.2170 ns | 3,367,917.89 ns |

After

                     Method | Size_Common |            Mean |           Error |          Median |
--------------------------- |------------ |----------------:|----------------:|----------------:|
                        Add |        1000 |        34.28 ns |       2.6295 ns |        32.36 ns |
                    GetItem |        1000 |         7.54 ns |       0.1193 ns |         7.56 ns |
                    SetItem |        1000 |         7.52 ns |       0.0662 ns |         7.49 ns |
                TryGetValue |        1000 |         8.10 ns |       0.1818 ns |         8.07 ns |
 ContainsKey_Int_True_Total |        1000 |     5,487.10 ns |     111.5480 ns |     5,388.42 ns |
                  Add_Total |        1000 |    23,512.64 ns |     513.0279 ns |    23,389.28 ns |
                        Add |       10000 |        67.47 ns |       5.9138 ns |        57.77 ns |
                    GetItem |       10000 |         7.52 ns |       0.1439 ns |         7.43 ns |
                    SetItem |       10000 |         7.67 ns |       0.0222 ns |         7.67 ns |
                TryGetValue |       10000 |         8.04 ns |       0.1263 ns |         7.98 ns |
 ContainsKey_Int_True_Total |       10000 |    53,739.41 ns |     658.6621 ns |    53,607.83 ns |
                  Add_Total |       10000 |   286,839.07 ns |   5,628.6049 ns |   282,980.66 ns |
                        Add |      100000 |        66.64 ns |       1.2882 ns |        66.88 ns |
                    GetItem |      100000 |         7.50 ns |       0.1166 ns |         7.44 ns |
                    SetItem |      100000 |         7.50 ns |       0.0293 ns |         7.49 ns |
                TryGetValue |      100000 |         8.10 ns |       0.2361 ns |         7.98 ns |
 ContainsKey_Int_True_Total |      100000 |   534,637.44 ns |     984.6944 ns |   534,440.07 ns |
                  Add_Total |      100000 | 3,590,623.28 ns | 191,663.7678 ns | 3,195,646.47 ns |
@benaadams

This comment has been minimized.

Collaborator

benaadams commented Dec 16, 2017

Added per iteration GC to cleanup; results more stable

Incorporates

  • Improve Dictionary FindEntry CQ #15460
  • Improve Dictionary TryInsert CQ #15462

Before

                           Method |      Mean |     Error |    StdDev |
--------------------------------- |----------:|----------:|----------:|
    ContainsKeySimplePositive_Int | 10.457 ns | 0.0080 ns | 0.0067 ns | IEqualityComparer
    ContainsKeySimpleNegative_Int |  7.387 ns | 0.0100 ns | 0.0093 ns | IEqualityComparer
 ContainsKeyCollisionPositive_Int | 21.294 ns | 0.1610 ns | 0.1257 ns | IEqualityComparer
 ContainsKeyCollisionNegative_Int | 20.041 ns | 0.0380 ns | 0.0337 ns | IEqualityComparer
 ContainsKeySimplePositive_String | 21.382 ns | 0.0796 ns | 0.0744 ns | IEqualityComparer
 ContainsKeySimpleNegative_String | 17.249 ns | 0.0302 ns | 0.0282 ns | IEqualityComparer
   ContainsKeySimplePositive_Type | 16.377 ns | 0.0159 ns | 0.0133 ns | IEqualityComparer
   ContainsKeySimpleNegative_Type | 10.821 ns | 0.0243 ns | 0.0227 ns | IEqualityComparer

After

                           Method |      Mean |     Error |    StdDev |
--------------------------------- |----------:|----------:|----------:|
    ContainsKeySimplePositive_Int |  5.980 ns | 0.0117 ns | 0.0109 ns | x 1.74 devirtualized
    ContainsKeySimpleNegative_Int |  4.846 ns | 0.0073 ns | 0.0069 ns | x 1.52 devirtualized
 ContainsKeyCollisionPositive_Int | 15.208 ns | 0.0289 ns | 0.0271 ns | x 1.40 devirtualized
 ContainsKeyCollisionNegative_Int | 15.247 ns | 0.0330 ns | 0.0292 ns | x 1.31 devirtualized
 ContainsKeySimplePositive_String | 20.636 ns | 0.0324 ns | 0.0270 ns | x 1.03 IEqualityComparer
 ContainsKeySimpleNegative_String | 17.536 ns | 0.0174 ns | 0.0163 ns | x 0.98 IEqualityComparer
   ContainsKeySimplePositive_Type | 16.330 ns | 0.0199 ns | 0.0176 ns | x 1.00 virtual
   ContainsKeySimpleNegative_Type |  7.676 ns | 0.0080 ns | 0.0074 ns | x 1.41 virtual
@jkotas

This comment has been minimized.

Member

jkotas commented Mar 1, 2018

@benaadams Thank you!

@HFadeel

This comment has been minimized.

HFadeel commented Mar 2, 2018

@AndyAyersMS would love to see this in CoreCLR 2.1 and future .NET

@danmosemsft

This comment has been minimized.

Member

danmosemsft commented Mar 2, 2018

@HFadeel it should be.

@omariom

This comment has been minimized.

Contributor

omariom commented Mar 2, 2018

I am sure a lot of code will suddenly get faster

@AndyAyersMS

This comment has been minimized.

Member

AndyAyersMS commented Mar 2, 2018

We'd be quite interested in independent observations on the perf impact of this change, both good and (perhaps especially important) bad.

Looks like it may take a few days to get these bits out into the nightly build packages. Once that happens we should perhaps do a bit of advertising....

They should also be in 2.1 preview 2.

@benaadams benaadams deleted the benaadams:dict-default-comparer branch Mar 2, 2018

@AndyAyersMS

This comment has been minimized.

Member

AndyAyersMS commented Mar 11, 2018

Perf infrastructure is finally back on line. Data shows this change gave roughly 6% improvement on k-nuckeotide-1 and 4.5% on k-nuckeotide-9. It should help close the gap vs Java somewhat, from 1.32x to something more like 1.25x (assuming proportional speedup on those ancient machines).

Probably should look at this test anew and see what else we might be able to do.

cc @danmosemsft @tannergooding

image

@benaadams

This comment has been minimized.

Collaborator

benaadams commented Mar 11, 2018

Good news!

The Benchmarks Game tests also I believe include Jit and startup time; was the extra cost from ArrayPoolEventSource in 2.0? #15954

I assume so; and that has also gone.

@AndyAyersMS

This comment has been minimized.

Member

AndyAyersMS commented Mar 11, 2018

Looking back a bit further at -9, the overall speedup over 2.1 may be more like 1.125x (2720/2415), so vs java maybe now only ~1.17x slower. Seems like enabling software write watch (#16516) may have helped here too.

(Actually forgot official runs are on Ubuntu -- speedup over 2.0 is looks similar though 3400/3000 or about 1.13x).

We don't measure jit costs in our HW lab (we discard the initial iteration). Maybe Tanner can do some pseudo-official runs...

@danmosemsft

This comment has been minimized.

Member

danmosemsft commented Mar 11, 2018

The Benchmarks game measures the total time the process runs, including JIT, although the machine is warmed up first.

I would love @tannergooding or @ViktorHofer to get a full set of CLBG numbers on the special machine this week.

@simplejackcoder

This comment has been minimized.

simplejackcoder commented Jul 4, 2018

PreCondition: I'm not trying to be a smart alec or rude.

This improvement for TryGetValue (33%) made me think some very clever algorithm improvement or major bug like looping through twice.

I check code change and there is no such change.

If I read it correctly I should create my own dictionary for best fastest perf. IntDictionary, LongDictionary, StringDictionary -- I want @mikedn implementation which is even faster.

@benaadams

This comment has been minimized.

Collaborator

benaadams commented Jul 4, 2018

This improvement for TryGetValue (33%) made me think some very clever algorithm improvement or major bug like looping through twice.

I check code change and there is no such change.

There was a change to the Jit #14125 to turn EqualityComparer<T>.Default.Equals into an intrinsic. So if you have a value type T (struct) that implements IEquatable<T> then it will devitualize it. For example for int the Equals then becomes a single cpu instruction.

This code change was to take advantage of that Jit change in Dictionary when you don't specify your own custom IEqualityComparer<TKey> comparer in the constructor.

If I read it correctly I should create my own dictionary for best fastest perf. IntDictionary, LongDictionary, StringDictionary

You will get this advantage just by using the regular dictionary constructor for int and long and not using the overload that specifies an IEqualityComparer<TKey>
e.g.

var intDict = new Dictionary<int, MyValue>();
var longDict = new Dictionary<long, MyValue>();

string is a bit different as it also has DoS protection that kicks in if you start getting too many key collisions by moving to a randomised HashCode generator which is more expensive but reduces the collisions for a net performance gain. This is doubly important as string Dictionary keys can be externally controlled (like entries in a forms, route or query string collection for a webserver)

If you have a specific scenario for object type keys you want to optimize for you can create a struct wrapper and use that as the key for a regular dictionary for the same behaviour. For example you could create a TypeKey and use that to create a TypeDictionary that is optimized for ReferenceEquality on the types.

@mikedn

This comment has been minimized.

Contributor

mikedn commented Jul 5, 2018

I want @mikedn implementation which is even faster.

Note that my implementation is specific to the k-nucleotide benchmark. Not to say that it cannot be used for other purposes but it's certainly not a general purpose hashtable. For example, it takes advantage of the fact that the hashcode of an integer is fast to compute so it doesn't store the hash code in the Entry object. That saves 8 bytes per entry and considering that the hashtable is pretty large that's pretty good. And AFAIR I customized the hash code computation as well.

If I read it correctly I should create my own dictionary for best fastest perf. IntDictionary, LongDictionary, StringDictionary

Yes. Assuming that "best perf" is indeed relevant to your application. If the application spends 5% of its time in hashtable code then creating your own hashtable is quite possible a waste of time. If the application spends 95% of its time in hashtable code then creating your own hashtable (or whatever suitable data structure is required) is something that you may want to consider.

@ViktorHofer

This comment has been minimized.

Member

ViktorHofer commented Jul 5, 2018

@mikedn did we ever make use of your implementation for k-nucleotide, i.e. in a nuget package similar to what Java does in their benchmark?

@mikedn

This comment has been minimized.

Contributor

mikedn commented Jul 5, 2018

@ViktorHofer Not that I know of. In fact I'm not even sure where I put that thing, need to search my drive for it.

@Zhentar

This comment has been minimized.

Zhentar commented Jul 11, 2018

100% Real™ code, 100% Real™ data, very Dictionary heavy (mostly <string, object>, and <int, string>)

Method Runtime Toolchain Mean Error StdDev
ParseLargeFile Clr Default 6.679 s 0.3603 s 0.1600 s
ParseLargeFile Core .NET Core 2.0 6.208 s 0.2307 s 0.0599 s
ParseLargeFile Core .NET Core 2.1 5.657 s 0.3788 s 0.1682 s

From profiling, it looks like this change is about half of the improvement between .Net 4.7 & Core 2.1 💯

@danmosemsft

This comment has been minimized.

Member

danmosemsft commented Jul 11, 2018

It's like we need partial specialization in C# so we can have Dictionary<int, T>, but from reading this discussion it's not clear to me whether that's even possible for generics, or if it is, whether it would have non zero costs (shapes implemented with interfaces?)

Maybe the way forward is for code gen to have special knowledge of Dictionary. It's critical enough that the ickyness may be worthwhile.

@benaadams

This comment has been minimized.

Collaborator

benaadams commented Jul 11, 2018

Dictionary<int, T> is covered by this PR; is where TKey is an object type that its not.

From what I understand; shapes are like the Surface Phone, they are a panacea for all problems, and probably both will be released at the same time IEnumerable<T> devirualizes and becomes non-allocating 😉

@mikedn

This comment has been minimized.

Contributor

mikedn commented Jul 23, 2018

@ViktorHofer and everyone else who may be interested: My version is here: https://gist.github.com/mikedn/8dc555a867aba65b6e26f39cf3cb66eb

On my machine it's about 1.4x faster than the current top C# entry on Benchmarks Game. If this factor translates to Benchmarks Game's environment then it will probably be almost as fast as the current top entry (a C++ version).

That said, it goes a bit beyond "just use another hashtable" approach. It has been customized specifically for knucleotide (e.g. the hashBits = 18 thing). If you're trying to make a more generic hashtable then performance might suffer (maybe 1.2x instead of 1.4x, I don't remember).

And to be clear: I don't want to encourage people to write their own hashtables because "omg, this is faster". But it's something you can do in very specific cases.

@danmosemsft

This comment has been minimized.

Member

danmosemsft commented Jul 23, 2018

@AnthonyLloyd

This comment has been minimized.

AnthonyLloyd commented Jul 23, 2018

Rules of the k-nukleotide are that it has to use a hash table but you are not allowed to implement one yourself and can only nuget.

My increment method sneaks past this just.

@mikedn

This comment has been minimized.

Contributor

mikedn commented Jul 23, 2018

Yeah, I know the rules, that's what's I'm not really bothering with it (that is, I'm not going to waste my time to create a nuget package for my custom hashtable).

My increment method sneaks past this just.

Heh, it's odd that they accepted that because what you did can be equated with a custom hashtable implementation. Well, at least the hashtable lookup part.

@danmosemsft

This comment has been minimized.

Member

danmosemsft commented Jul 23, 2018

Hmm it is tempting to create such a package if it would be the easiest way to improve perf on this benchmark. It's basically what Java does. Maybe we could put in in corefx lab..

@ViktorHofer

This comment has been minimized.

Member

ViktorHofer commented Jul 23, 2018

I believed that custom packages are also not allowed? The java one was an exception and was criticized in the forum. Am I wrong?

@AnthonyLloyd

This comment has been minimized.

AnthonyLloyd commented Jul 23, 2018

It has to be a package with a reasonable number of downloads. The Java one is part of something else I think.

@danmosemsft

This comment has been minimized.

Member

danmosemsft commented Jul 23, 2018

Maybe it could be part of dotnet/corefx#31191 (if it's binary). At least as a general purpose long->int map not something hacked for the benchmark.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment