Consider expression interpretation to speed-up first time resolution #45

Closed
dadhi opened this issue Nov 9, 2018 · 18 comments
Labels
enhancement New feature or request

@dadhi
Owner

dadhi commented Nov 9, 2018

The idea is to get fast first-time (cold start) performance, while compiling and caching the expression in parallel so that later resolutions are as fast as possible.

Assumptions:

  1. Activator.CreateInstance is much faster than Expression.Compile.
  2. Creating the expression is much faster than Compile + caching.

If the assumptions hold, we may try to do both the expression creation and Activator.CreateInstance, then schedule the expression compilation on the ThreadPool and return the activated instance right away.
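A minimal sketch of this two-path idea, assuming the service has a public parameterless constructor. `HybridActivator`, `WaitForCompilation` and `IsCompiled` are hypothetical names for illustration, not DryIoc API; a real container would compile the full dependency expression rather than a bare `new T()`.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq.Expressions;
using System.Threading.Tasks;

// Hypothetical sketch (not DryIoc code): cold requests are served by
// Activator.CreateInstance while a compiled factory is built on the ThreadPool.
public sealed class HybridActivator
{
    private readonly ConcurrentDictionary<Type, Func<object>> _compiled =
        new ConcurrentDictionary<Type, Func<object>>();

    // Lazy<Task> ensures compilation is scheduled at most once per type,
    // even if GetOrAdd runs its value factory more than once under contention.
    private readonly ConcurrentDictionary<Type, Lazy<Task>> _pending =
        new ConcurrentDictionary<Type, Lazy<Task>>();

    public object Create(Type type)
    {
        // Warm path: the compiled delegate is already cached.
        if (_compiled.TryGetValue(type, out var factory))
            return factory();

        // Schedule the expensive Expression.Compile in the background.
        _ = _pending.GetOrAdd(type, t => new Lazy<Task>(() => Task.Run(() => Compile(t)))).Value;

        // Cold path: return an instance right away via reflection.
        return Activator.CreateInstance(type);
    }

    private void Compile(Type type)
    {
        // Builds and compiles: () => (object)new T()
        var body = Expression.Convert(Expression.New(type), typeof(object));
        _compiled[type] = Expression.Lambda<Func<object>>(body).Compile();
    }

    // Helper for tests: block until the background compilation finishes.
    public void WaitForCompilation(Type type)
    {
        if (_pending.TryGetValue(type, out var pending))
            pending.Value.Wait();
    }

    public bool IsCompiled(Type type) => _compiled.ContainsKey(type);
}
```

Once the background task has stored the delegate, later calls skip reflection entirely; until then, every request pays the Activator.CreateInstance cost.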

@ahydrax

ahydrax commented Nov 18, 2018

Hi @dadhi, the thing is that Activator.CreateInstance is slower than expression-based activators. Proof: https://blogs.msdn.microsoft.com/seteplia/2017/02/01/dissecting-the-new-constraint-in-c-a-perfect-example-of-a-leaky-abstraction/

@dadhi
Owner Author

dadhi commented Nov 18, 2018

Yep, I know :)

But the idea is different: I am talking about first-time resolution, where compiling the expression and then calling the resulting delegate is much slower than a single Activator.CreateInstance call.
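To check the first-time claim on a given machine, a rough one-shot probe could time each path exactly once on a type it has not touched before. This is a hypothetical sketch (`ColdStartProbe`, `ViaActivator` and `ViaCompile` are illustrative names): a single Stopwatch reading is noisy and also includes some JIT warm-up, so a harness such as BenchmarkDotNet should be used for real numbers.

```csharp
using System;
using System.Diagnostics;
using System.Linq.Expressions;

// Hypothetical cold-start probe: each path runs once, on a type that
// neither path has used yet, so first-call costs are included.
public static class ColdStartProbe
{
    public sealed class ViaActivator { }
    public sealed class ViaCompile { }

    public static (TimeSpan activator, TimeSpan compile) Measure()
    {
        // Path 1: reflection-based activation.
        var sw = Stopwatch.StartNew();
        var a = Activator.CreateInstance(typeof(ViaActivator));
        var activatorTime = sw.Elapsed;

        // Path 2: build the expression, compile it, call the delegate once.
        sw.Restart();
        var body = Expression.Convert(Expression.New(typeof(ViaCompile)), typeof(object));
        var factory = Expression.Lambda<Func<object>>(body).Compile();
        var b = factory();
        var compileTime = sw.Elapsed;

        // Keep the instances alive so the calls are not optimized away.
        GC.KeepAlive(a);
        GC.KeepAlive(b);
        return (activatorTime, compileTime);
    }
}
```

Calling `Measure()` more than once in the same process no longer measures a cold start, because both paths are warm after the first call.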

@ahydrax

ahydrax commented Nov 19, 2018

Oh, ok, I get the idea :)

@dadhi dadhi changed the title Test two parallel path resolution, 1st via Activator, 2nd via Expression compile Consider expression interpretation to speed-up first time resolution Nov 20, 2018
@dadhi
Owner Author

dadhi commented Nov 20, 2018

Other things to consider:

  1. Two types of cache, for default and for keyed services
  2. Collection resolution
  3. Nested lambdas
  4. Func and Func with arguments
  5. Partly interpreted singletons

@dadhi
Owner Author

dadhi commented Nov 22, 2018

Remaining work:

  • Interpret scoped dependency creation.
  • Make Interpreter public, to allow the client to interpret the resolved expression.
  • Switch off the UseInterpretation rule for expression generation (DryIocZero).
  • Directly call the full Resolve from the expression instead of Invoke.
  • Tighten the Resolve loop for inlining.
  • Separate code with lambdas from the hot path for inlining, e.g. cache Swap methods.
  • Replace Activator.CreateInstance with ctor .Invoke for singletons, or consider reusing TryInterpret for singletons too.
  • Replace .SingleMethod calls with a faster, less allocating alternative (benchmark the alternatives).
  • Handle Resolve for keyed services.
  • Interpret Expression.Invoke.
  • Optimize the Interpreter to avoid recursion and stack growth where possible, similar to how FEC (FastExpressionCompiler) does it.
  • Benchmark with IoC Performance.
  • Benchmark with Autofac, Grace, LightInject and compare with MS.DI for reference, including both approaches: activation based and compilation based (maybe adding one with Roslyn compilation, like Lamar).
  • Benchmark with a scoped dependency, which is not interpreted until #52 (Consider dependency creation Order to simplify scoped dependency expression) is addressed.
  • Minimize memory allocations where possible
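The core of "expression interpretation" is walking the expression tree and executing each node by reflection instead of calling Expression.Compile. A minimal sketch, assuming only `New`, `Constant` and reference/boxing `Convert` nodes need support; `TinyInterpreter` is a hypothetical name and not DryIoc's Interpreter. It calls `ConstructorInfo.Invoke` directly (as in the ctor .Invoke item above) and is recursive, which is exactly what the item about avoiding recursion and stack growth aims to remove.

```csharp
using System;
using System.Linq;
using System.Linq.Expressions;

// Hypothetical minimal interpreter (not DryIoc's Interpreter): evaluates
// New/Constant/boxing-Convert nodes by reflection instead of Compile().
public static class TinyInterpreter
{
    public static object Interpret(Expression expr)
    {
        switch (expr)
        {
            case ConstantExpression c:
                return c.Value;

            case NewExpression n when n.Constructor == null:
                // Expression.New(valueType) has no ConstructorInfo: default value.
                return Activator.CreateInstance(n.Type);

            case NewExpression n:
                // Interpret the arguments first, then invoke the constructor directly.
                var args = n.Arguments.Select(Interpret).ToArray();
                return n.Constructor.Invoke(args);

            case UnaryExpression u when u.NodeType == ExpressionType.Convert
                                     && u.Type.IsAssignableFrom(u.Operand.Type):
                // Reference conversions and boxing do not change the object.
                return Interpret(u.Operand);

            default:
                // Anything else would need a fallback to Expression.Compile().
                throw new NotSupportedException($"Cannot interpret {expr.NodeType}");
        }
    }
}
```

Throwing on unsupported nodes keeps the sketch honest: a container would catch that case and fall back to compilation, which is where the "Interpret Expression.Invoke" and keyed-service items come in.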

dadhi added a commit that referenced this issue Nov 25, 2018
…g and controlling method into the Factory.TryInterpretation #45
@dadhi
Owner Author

dadhi commented Nov 25, 2018

Benchmark

Here is the source.

The DryIoc setup is shown as an example; the other containers are set up the same way.

public static DryIoc.IContainer PrepareDryIoc()
{
    var container = new Container();

    container.Register<Parameter1>(Reuse.Transient);
    container.Register<Parameter2>(Reuse.Singleton);
    container.Register<ScopedBlah>(Reuse.Scoped);

    return container;
}

public static object Measure(DryIoc.IContainer container)
{
    using (var scope = container.OpenScope())
        return scope.Resolve<ScopedBlah>();
}

Register, then Open Scope and Resolve for the first time

[Benchmark(Baseline = true)]
public object BmarkDryIoc() => Measure(PrepareDryIoc());

Results:

BenchmarkDotNet=v0.11.3, OS=Windows 10.0.17134.345 (1803/April2018Update/Redstone4)
Intel Core i7-8750H CPU 2.20GHz (Coffee Lake), 1 CPU, 12 logical and 6 physical cores
Frequency=2156252 Hz, Resolution=463.7677 ns, Timer=TSC
.NET Core SDK=2.1.500
  [Host]     : .NET Core 2.1.6 (CoreCLR 4.6.27019.06, CoreFX 4.6.27019.05), 64bit RyuJIT
  DefaultJob : .NET Core 2.1.6 (CoreCLR 4.6.27019.06, CoreFX 4.6.27019.05), 64bit RyuJIT


           Method |       Mean |     Error |    StdDev |  Ratio | RatioSD | Gen 0/1k Op | Gen 1/1k Op | Gen 2/1k Op | Allocated Memory/Op |
----------------- |-----------:|----------:|----------:|-------:|--------:|------------:|------------:|------------:|--------------------:|
     BmarkAutofac |  29.831 us | 0.2230 us | 0.2086 us |   7.36 |    0.06 |      5.2185 |           - |           - |            24.15 KB |
      BmarkDryIoc |   4.053 us | 0.0178 us | 0.0167 us |   1.00 |    0.00 |      1.2131 |           - |           - |              5.6 KB |
       BmarkGrace | 507.573 us | 5.6479 us | 5.2830 us | 125.24 |    1.59 |      5.8594 |      2.9297 |           - |            30.21 KB |
 BmarkLightInject | 401.432 us | 3.0346 us | 2.8386 us |  99.05 |    0.82 |      6.8359 |      3.4180 |           - |            32.31 KB |

Open Scope and Resolve for the first time

private static readonly DryIoc.IContainer _dryioc = PrepareDryIoc();

[Benchmark(Baseline = true)]
public object BmarkDryIoc() => Measure(_dryioc);

Results:

           Method |       Mean |     Error |    StdDev | Ratio | RatioSD | Gen 0/1k Op | Gen 1/1k Op | Gen 2/1k Op | Allocated Memory/Op |
----------------- |-----------:|----------:|----------:|------:|--------:|------------:|------------:|------------:|--------------------:|
     BmarkAutofac | 1,577.7 ns | 3.7877 ns | 3.3577 ns | 12.13 |    0.04 |      0.5302 |           - |           - |              2504 B |
      BmarkDryIoc |   130.0 ns | 0.3659 ns | 0.3422 ns |  1.00 |    0.00 |      0.0558 |           - |           - |               264 B |
       BmarkGrace |   152.0 ns | 0.3930 ns | 0.3676 ns |  1.17 |    0.00 |      0.0608 |           - |           - |               288 B |
 BmarkLightInject |   609.2 ns | 2.0553 ns | 1.8220 ns |  4.68 |    0.02 |      0.1488 |           - |           - |               704 B |

@dadhi
Owner Author

dadhi commented Nov 29, 2018

Here is the state of fast .NET DI containers in a certain (not uncommon) scenario; in other words, always treat benchmarks in context!

Compared to the benchmark above, I have just added a scoped dependency. At the moment DryIoc does not support interpreting a scoped dependency, because it uses a nested lambda expression.

CreateContainerAndRegister_FirstTimeOpenScopeResolve:

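                            Method |       Mean |     Error |    StdDev |  Ratio | RatioSD | Gen 0/1k Op | Gen 1/1k Op | Gen 2/1k Op | Allocated Memory/Op |
---------------------------------- |-----------:|----------:|----------:|-------:|--------:|------------:|------------:|------------:|--------------------:|
                      BmarkAutofac |  35.672 us | 0.3983 us | 0.3326 us |   7.32 |    0.08 |      6.4697 |           - |           - |            29.83 KB |
                       BmarkDryIoc | 529.302 us | 2.3636 us | 2.0953 us | 108.60 |    0.69 |      1.9531 |      0.9766 |           - |            12.61 KB |
 BmarkMicrosoftDependencyInjection |   4.873 us | 0.0267 us | 0.0250 us |   1.00 |    0.00 |      1.0529 |           - |           - |             4.87 KB |
                        BmarkGrace | 783.044 us | 3.8283 us | 3.5810 us | 160.68 |    1.18 |      8.7891 |      3.9063 |           - |            42.44 KB |
                  BmarkLightInject | 666.277 us | 6.2531 us | 5.8492 us | 136.72 |    1.38 |      8.7891 |      3.9063 |           - |            43.12 KB |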

FirstTimeOpenScopeResolve:

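                            Method |       Mean |     Error |     StdDev | Ratio | RatioSD | Gen 0/1k Op | Gen 1/1k Op | Gen 2/1k Op | Allocated Memory/Op |
---------------------------------- |-----------:|----------:|-----------:|------:|--------:|------------:|------------:|------------:|--------------------:|
                      BmarkAutofac | 1,970.3 ns | 11.519 ns | 10.7747 ns |  7.17 |    0.06 |      0.6676 |           - |           - |              3152 B |
                       BmarkDryIoc |   207.0 ns |  1.062 ns |  0.9931 ns |  0.75 |    0.01 |      0.0966 |           - |           - |               456 B |
 BmarkMicrosoftDependencyInjection |   274.9 ns |  2.064 ns |  1.9308 ns |  1.00 |    0.00 |      0.0758 |           - |           - |               360 B |
                        BmarkGrace |   264.8 ns |  1.653 ns |  1.5462 ns |  0.96 |    0.01 |      0.1216 |           - |           - |               576 B |
                  BmarkLightInject |   998.6 ns |  4.589 ns |  4.0676 ns |  3.64 |    0.02 |      0.2422 |           - |           - |              1144 B |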

PS. MS.DI performs great for what it is designed for 👍 (again, in this specific start-up / first resolution scenario).

@ahydrax

ahydrax commented Nov 29, 2018

Hi @dadhi ,
Could you also add Simple Injector to the comparison chart?

@dadhi
Owner Author

dadhi commented Nov 29, 2018

@ahydrax
Maybe later, but I would expect results similar to (or slightly slower than) LightInject. It boils down to the approach: similar approaches produce similar results. It should be measured, though ;)

@dadhi
Owner Author

dadhi commented Nov 29, 2018

Considering that @jeremydmiller has just announced the Lamar 2.0 release, let's check it out too, because it uses yet another approach: Roslyn-based compilation. I expect it to perform slower in the above use case precisely because of that approach. Let's see.

@dadhi
Owner Author

dadhi commented Nov 29, 2018

@ahydrax

Here are the results with SimpleInjector.

CreateContainerAndRegister_FirstTimeOpenScopeResolve.BmarkSimpleInjector: DefaultJob
Runtime = .NET Core 2.1.6 (CoreCLR 4.6.27019.06, CoreFX 4.6.27019.05), 64bit RyuJIT; GC = Concurrent Workstation
Mean = 1.3705 ms, StdErr = 0.0216 ms (1.58%); N = 91, StdDev = 0.2064 ms
Min = 1.2566 ms, Q1 = 1.2671 ms, Median = 1.2764 ms, Q3 = 1.2922 ms, Max = 1.9380 ms
IQR = 0.0251 ms, LowerFence = 1.2295 ms, UpperFence = 1.3298 ms
ConfidenceInterval = [1.2969 ms; 1.4441 ms] (CI 99.9%), Margin = 0.0736 ms (5.37% of Mean)
Skewness = 1.84, Kurtosis = 4.78, MValue = 2
-------------------- Histogram --------------------
[1.241 ms ; 1.322 ms) | @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
[1.322 ms ; 1.371 ms) |
[1.371 ms ; 1.451 ms) | @@
[1.451 ms ; 1.500 ms) |
[1.500 ms ; 1.580 ms) | @@
[1.580 ms ; 1.678 ms) | @@@@
[1.678 ms ; 1.762 ms) | @
[1.762 ms ; 1.842 ms) | @@
[1.842 ms ; 1.944 ms) | @@@@@@@@
---------------------------------------------------

// * Summary *

BenchmarkDotNet=v0.11.3, OS=Windows 10.0.17134.407 (1803/April2018Update/Redstone4)
Intel Core i7-8750H CPU 2.20GHz (Coffee Lake), 1 CPU, 12 logical and 6 physical cores
Frequency=2156248 Hz, Resolution=463.7685 ns, Timer=TSC
.NET Core SDK=2.1.500
  [Host]     : .NET Core 2.1.6 (CoreCLR 4.6.27019.06, CoreFX 4.6.27019.05), 64bit RyuJIT
  DefaultJob : .NET Core 2.1.6 (CoreCLR 4.6.27019.06, CoreFX 4.6.27019.05), 64bit RyuJIT


                            Method |         Mean |      Error |     StdDev |       Median |  Ratio | RatioSD | Gen 0/1k Op | Gen 1/1k Op | Gen 2/1k Op | Allocated Memory/Op |
---------------------------------- |-------------:|-----------:|-----------:|-------------:|-------:|--------:|------------:|------------:|------------:|--------------------:|
 BmarkMicrosoftDependencyInjection |     5.474 us |  0.6720 us |   1.981 us |     3.949 us |   1.00 |    0.00 |      0.9155 |           - |           - |             4.27 KB |
                      BmarkAutofac |    44.365 us |  3.7849 us |  11.160 us |    47.858 us |   9.03 |    3.60 |      5.2490 |           - |           - |            24.22 KB |
                  BmarkLightInject |   633.471 us | 16.6356 us |  21.039 us |   626.910 us | 119.65 |   40.94 |      7.8125 |      3.9063 |           - |             38.4 KB |
                       BmarkDryIoc |   676.467 us | 79.1501 us | 233.376 us |   505.889 us | 139.14 |   66.56 |      1.9531 |           - |           - |            10.83 KB |
                        BmarkGrace |   800.199 us | 88.9617 us | 162.672 us |   742.909 us | 143.31 |   54.79 |      7.8125 |      3.9063 |           - |            40.25 KB |
               BmarkSimpleInjector | 1,370.494 us | 73.5985 us | 206.378 us | 1,276.409 us | 275.71 |   98.10 |     15.6250 |      7.8125 |           - |            77.75 KB |

// * Warnings *
MultimodalDistribution
  CreateContainerAndRegister_FirstTimeOpenScopeResolve.BmarkMicrosoftDependencyInjection: Default -> It seems that the distribution can have several modes (mValue = 2.9)
  CreateContainerAndRegister_FirstTimeOpenScopeResolve.BmarkAutofac: Default                      -> It seems that the distribution is bimodal (mValue = 3.72)
  CreateContainerAndRegister_FirstTimeOpenScopeResolve.BmarkDryIoc: Default                       -> It seems that the distribution is bimodal (mValue = 3.3)

FirstTimeOpenScopeResolve.BmarkSimpleInjector:

                             Method |       Mean |     Error |    StdDev |     Median | Ratio | RatioSD | Gen 0/1k Op | Gen 1/1k Op | Gen 2/1k Op | Allocated Memory/Op |
----------------------------------- |-----------:|----------:|----------:|-----------:|------:|--------:|------------:|------------:|------------:|--------------------:|
                       BmarkAutofac | 2,346.6 ns | 216.86 ns | 639.42 ns | 2,553.5 ns |  6.82 |    2.12 |      0.5455 |           - |           - |              2576 B |
                        BmarkDryIoc |   216.5 ns |  10.51 ns |  28.60 ns |   203.6 ns |  0.64 |    0.13 |      0.0830 |           - |           - |               392 B |
 BmarkMicrosoftSDependencyInjection |   349.6 ns |  25.51 ns |  70.25 ns |   325.5 ns |  1.00 |    0.00 |      0.0687 |           - |           - |               328 B |
                         BmarkGrace |   323.0 ns |  12.65 ns |  36.11 ns |   306.7 ns |  0.95 |    0.19 |      0.1149 |           - |           - |               544 B |
                   BmarkLightInject | 1,100.0 ns |  18.29 ns |  16.22 ns | 1,098.9 ns |  2.84 |    0.47 |      0.2346 |           - |           - |              1112 B |
                BmarkSimpleInjector |   582.9 ns |  46.52 ns | 134.96 ns |   546.2 ns |  1.74 |    0.52 |      0.1101 |           - |           - |               520 B |

@jeremydmiller

@dadhi "But I expect it to perform slower in the above use-case exactly because of the approach." A big yes and maybe no. If you're using it simply, yeah, the cold start time isn't super awesome on the first usage of Roslyn, but they (Roslyn team) have made huge strides on that one.

Lamar also has a model where it can drop the generated C# code into your code once, and just use the already compiled resolver strategies for much, much faster cold start times.

@dadhi
Owner Author

dadhi commented Nov 29, 2018

@jeremydmiller,

Quoting: "Lamar also has a model where it can drop the generated C# code into your code once, and just use the already compiled resolver strategies for much, much faster cold start times."

This is super interesting. How can I test that?

@jeremydmiller

Shame on me, I haven't written any docs for that. Let me write a blog post on that and all the struggles I went through w/ optimizing cold start. And then you know that Lamar uses bits and pieces of ImTools and its own copy of FastExpressionCompiler for internal types. So even if Lamar is competitive, you still get credit;-)

@dadhi
Copy link
Owner Author

dadhi commented Nov 30, 2018

Hey, here is the benchmark with the first version of DryIoc that supports scoped dependency interpretation:

CreateContainerAndRegister_FirstTimeOpenScopeResolve:

                            Method |       Mean |      Error |      StdDev |     Median |          P95 |  Ratio | RatioSD | Gen 0/1k Op | Gen 1/1k Op | Gen 2/1k Op | Allocated Memory/Op |
---------------------------------- |-----------:|-----------:|------------:|-----------:|-------------:|-------:|--------:|------------:|------------:|------------:|--------------------:|
 BmarkMicrosoftDependencyInjection |   4.797 us |  0.1125 us |   0.2904 us |   4.724 us |     5.754 us |   1.00 |    0.00 |      1.0529 |           - |           - |             4.87 KB |
                       BmarkDryIoc |   5.547 us |  0.0401 us |   0.0313 us |   5.535 us |     5.594 us |   1.17 |    0.01 |      1.6251 |           - |           - |             7.49 KB |
                      BmarkAutofac |  36.195 us |  0.1435 us |   0.1198 us |  36.232 us |    36.332 us |   7.64 |    0.03 |      6.4697 |           - |           - |            29.83 KB |
                        BmarkGrace | 776.478 us |  5.3626 us |   4.4780 us | 774.993 us |   783.422 us | 163.89 |    1.13 |      8.7891 |      3.9063 |           - |            42.44 KB |
                  BmarkLightInject | 799.472 us | 79.1998 us | 231.0294 us | 658.761 us | 1,299.696 us | 169.62 |   51.94 |      8.7891 |      3.9063 |           - |            43.12 KB |

FirstTimeOpenScopeResolve:

                             Method |        Mean |      Error |       StdDev |      Median | Ratio | RatioSD | Gen 0/1k Op | Gen 1/1k Op | Gen 2/1k Op | Allocated Memory/Op |
----------------------------------- |------------:|-----------:|-------------:|------------:|------:|--------:|------------:|------------:|------------:|--------------------:|
                        BmarkDryIoc |    244.9 ns |  24.425 ns |    71.634 ns |    198.8 ns |  0.81 |    0.24 |      0.0896 |           - |           - |               424 B |
                         BmarkGrace |    269.0 ns |   5.140 ns |     4.556 ns |    267.4 ns |  0.95 |    0.02 |      0.1216 |           - |           - |               576 B |
 BmarkMicrosoftSDependencyInjection |    282.9 ns |   5.552 ns |     4.636 ns |    280.9 ns |  1.00 |    0.00 |      0.0758 |           - |           - |               360 B |
                   BmarkLightInject |  1,072.9 ns | 111.797 ns |    99.105 ns |  1,046.4 ns |  3.80 |    0.38 |      0.2422 |           - |           - |              1144 B |
                       BmarkAutofac |  2,203.3 ns | 599.133 ns |   560.429 ns |  1,991.7 ns |  7.88 |    1.96 |      0.6676 |           - |           - |              3152 B |
                         BmarkLamar | 20,724.7 ns | 727.986 ns | 2,100.407 ns | 20,637.7 ns | 76.88 |    5.79 |           - |           - |           - |              1512 B |

Btw @jeremydmiller, here is my head-on benchmark with Lamar v2, added just for reference:

                            Method |           Mean |         Error |        StdDev |         Median |     Ratio |  RatioSD | Gen 0/1k Op | Gen 1/1k Op | Gen 2/1k Op | Allocated Memory/Op |
---------------------------------- |---------------:|--------------:|--------------:|---------------:|----------:|---------:|------------:|------------:|------------:|--------------------:|
 BmarkMicrosoftDependencyInjection |       4.676 us |     0.0201 us |     0.0178 us |       4.679 us |      1.00 |     0.00 |      1.0529 |           - |           - |             4.87 KB |
                       BmarkDryIoc |       5.519 us |     0.1487 us |     0.1527 us |       5.480 us |      1.18 |     0.04 |      1.6251 |           - |           - |             7.49 KB |
                      BmarkAutofac |      35.191 us |     0.1632 us |     0.1447 us |      35.197 us |      7.53 |     0.04 |      6.4697 |           - |           - |            29.83 KB |
                  BmarkLightInject |     751.269 us |    52.9342 us |   152.7272 us |     666.809 us |    161.81 |    35.92 |      8.7891 |      3.9063 |           - |            43.12 KB |
                        BmarkGrace |   1,054.674 us |   117.4721 us |   346.3691 us |     826.261 us |    223.94 |    79.97 |      8.7891 |      3.9063 |           - |            42.44 KB |
                        BmarkLamar | 110,034.715 us | 9,271.5789 us | 8,672.6407 us | 106,728.911 us | 23,553.16 | 1,957.17 |   2000.0000 |   1000.0000 |           - |         10695.63 KB |

dadhi added a commit that referenced this issue Nov 30, 2018
@dadhi
Owner Author

dadhi commented Nov 30, 2018

For completeness, here is the OpenScope-Resolve-Dispose load (your usual Unit-of-Work or Request) after a warm-up of 5 such cycles:

                             Method |       Mean |      Error |     StdDev | Ratio | RatioSD | Gen 0/1k Op | Gen 1/1k Op | Gen 2/1k Op | Allocated Memory/Op |
----------------------------------- |-----------:|-----------:|-----------:|------:|--------:|------------:|------------:|------------:|--------------------:|
                        BmarkDryIoc |   199.2 ns |  1.5780 ns |  1.4761 ns |  0.74 |    0.01 |      0.0896 |           - |           - |               424 B |
                         BmarkGrace |   258.3 ns |  1.4635 ns |  1.2973 ns |  0.96 |    0.01 |      0.1216 |           - |           - |               576 B |
 BmarkMicrosoftSDependencyInjection |   269.7 ns |  0.7576 ns |  0.6326 ns |  1.00 |    0.00 |      0.0758 |           - |           - |               360 B |
                   BmarkLightInject |   976.5 ns |  3.7448 ns |  3.3197 ns |  3.62 |    0.02 |      0.2422 |           - |           - |              1144 B |
                       BmarkAutofac | 2,185.3 ns | 15.1710 ns | 13.4487 ns |  8.10 |    0.06 |      0.6676 |           - |           - |              3152 B |
                         BmarkLamar | 2,360.9 ns | 11.5168 ns | 10.2094 ns |  8.75 |    0.05 |      0.3166 |           - |           - |              1512 B |

Again, we should benchmark a more real-world object graph (maybe 10 levels deep and 5 to 10 dependencies wide on each level, with a variety of lifestyles). The current setup is a 2-level root with 3 dependencies. Take this into account when looking at the benchmarks.

dadhi added a commit that referenced this issue Nov 30, 2018
@jeremydmiller

Psst, here's a cheap way to do more realistic benchmarking: https://github.com/JasperFx/lamar/tree/master/src/Benchmarks. Use an ASP.NET Core app to quickly get yourself a much better set of registrations.

@dadhi dadhi self-assigned this Feb 10, 2019
@dadhi dadhi added the enhancement New feature or request label Feb 10, 2019
@dadhi dadhi added this to the 4.0.0 milestone Feb 10, 2019
@dadhi
Owner Author

dadhi commented Feb 10, 2019

Considering this complete.

@dadhi dadhi closed this as completed Feb 10, 2019
dadhi added a commit that referenced this issue Feb 12, 2019
fixed: #63
added: another test for #45
dadhi added a commit that referenced this issue Feb 16, 2019
…penGenerics

optimized and cleaned up rule selection a bit
Leszek-Kowalski pushed a commit to Leszek-Kowalski/DryIoc that referenced this issue Oct 11, 2019
Leszek-Kowalski pushed a commit to Leszek-Kowalski/DryIoc that referenced this issue Oct 11, 2019
…for OpenGenerics

optimized and cleaned up rule selection a bit

3 participants