
Emit shorter opcodes in ILGenerator.Emit(OpCode, int) #35427

Merged 1 commit on Apr 25, 2020

Conversation

stephentoub
Member

The Ldloc, Stloc, Ldloca, Ldc_I4, Ldarg, Ldarga, and Starg opcodes all have shorter variants that take up less space in the IL instruction stream. ILGenerator.Emit(OpCode, LocalBuilder) already special-cases Ldloc, Stloc, and Ldloca to automatically translate them into their shorter forms where applicable, but no similar logic exists in Emit(OpCode, int) for Ldc_I4, Ldarg, Ldarga, and Starg. As a result, various libraries higher in the stack that use reflection emit either duplicate the special-casing in their own helper routines, or forgo it and end up with larger IL than necessary.

This PR moves the logic down into Emit(OpCode, int) so that all callers benefit, and removes the duplicated special-casing from the other libraries.
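To make the effect concrete, here is a minimal, illustrative sketch (not the actual runtime source) of what a caller sees after this change: emitting a small constant through the general Emit(OpCode, int) overload, which can now be encoded with the one-byte short form rather than the five-byte long form.

```csharp
using System;
using System.Reflection.Emit;

class Demo
{
    // Builds a delegate that returns the constant 5 via reflection emit.
    public static Func<int> BuildReturnFive()
    {
        var dm = new DynamicMethod("ReturnFive", typeof(int), Type.EmptyTypes);
        ILGenerator il = dm.GetILGenerator();

        // With this PR, Emit(OpCodes.Ldc_I4, 5) can be encoded as the
        // one-byte Ldc_I4_5 instead of the five-byte "Ldc_I4 <int32>" form;
        // callers no longer need to pick the short opcode themselves.
        il.Emit(OpCodes.Ldc_I4, 5);
        il.Emit(OpCodes.Ret);

        return (Func<int>)dm.CreateDelegate(typeof(Func<int>));
    }

    static void Main() => Console.WriteLine(BuildReturnFive()()); // prints 5
}
```

Either way the delegate behaves identically; only the size of the emitted IL stream differs.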
@stephentoub stephentoub added this to the 5.0 milestone Apr 24, 2020
Member

@danmoseley danmoseley left a comment


Nice optimization. How did this not happen already over the years?

@danmoseley
Member

Could one imagine an analogous abstraction that saves me having to pick from Br vs Br_S, etc.? That is one of those things that makes it harder to maintain code that emits IL: as you add or remove instructions, you potentially have to adjust the size of nearby jumps.

@stephentoub
Member Author

Could one imagine an analogous abstraction that saves me having to pick from Br vs Br_S etc?

Maybe for backward branches. Not for forward, at least not without a lot more complexity / post-processing of the IL stream.
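The asymmetry comes from when the branch offset is known. A backward branch targets a label that has already been marked, so its distance is known at emit time; a forward branch targets a label whose position is still unknown, so the generator cannot choose between Br_S (one-byte offset) and Br (four-byte offset) without buffering or post-processing the stream. A small illustrative sketch (hypothetical example method, not code from this PR):

```csharp
using System;
using System.Reflection.Emit;

class BranchDemo
{
    // Emits abs(x) with a forward conditional branch.
    public static Func<int, int> BuildAbs()
    {
        var dm = new DynamicMethod("Abs", typeof(int), new[] { typeof(int) });
        ILGenerator il = dm.GetILGenerator();
        Label nonNegative = il.DefineLabel();

        il.Emit(OpCodes.Ldarg_0);
        il.Emit(OpCodes.Ldc_I4_0);
        // Forward branch: 'nonNegative' has not been marked yet, so at this
        // point the generator cannot know whether the target fits in the
        // one-byte offset of Bge_S or needs the four-byte offset of Bge.
        il.Emit(OpCodes.Bge, nonNegative);

        il.Emit(OpCodes.Ldarg_0);
        il.Emit(OpCodes.Neg);
        il.Emit(OpCodes.Ret);

        il.MarkLabel(nonNegative);   // only here does the offset become known
        il.Emit(OpCodes.Ldarg_0);
        il.Emit(OpCodes.Ret);

        return (Func<int, int>)dm.CreateDelegate(typeof(Func<int, int>));
    }

    static void Main() => Console.WriteLine(BuildAbs()(-3)); // prints 3
}
```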

@jkotas
Member

jkotas commented Apr 25, 2020

Test failures are known issues (#35438, #35451).

@jkotas jkotas merged commit cfaaf77 into dotnet:master Apr 25, 2020
@stephentoub stephentoub deleted the specializeilgen branch April 25, 2020 13:57
@davidfowl
Member

@stephentoub if you’re looking at ref emit can you also nuke the allocations 😁?

@stephentoub
Member Author

can you also nuke the allocations

Which allocations here are impactful, and how?

@davidfowl
Member

// This class walks the service call site tree and tries to calculate approximate
// code size to avoid array resizings during IL generation

This was an optimization @pakrym did to reduce allocations while building the callsite for object creation during DI.
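The idea behind that optimization is to estimate the IL size up front and hand it to the generator, so the internal byte buffer is allocated once instead of growing repeatedly. A hedged sketch of the technique, assuming a hypothetical per-operation size estimate (the estimator and numbers are illustrative, not the DI code):

```csharp
using System;
using System.Reflection.Emit;

class PresizeDemo
{
    // Hypothetical estimator: guess an average byte cost per emitted
    // operation to size the IL buffer up front.
    static int EstimateIlSize(int operationCount) => operationCount * 8;

    public static Func<int> BuildResolver()
    {
        var dm = new DynamicMethod("Resolver", typeof(int), Type.EmptyTypes);

        // DynamicMethod.GetILGenerator(int streamSize) pre-sizes the internal
        // IL byte array, avoiding repeated array resizes while emitting.
        ILGenerator il = dm.GetILGenerator(EstimateIlSize(100));

        il.Emit(OpCodes.Ldc_I4, 42);
        il.Emit(OpCodes.Ret);

        return (Func<int>)dm.CreateDelegate(typeof(Func<int>));
    }

    static void Main() => Console.WriteLine(BuildResolver()()); // prints 42
}
```

An overestimate only wastes a little memory briefly; an underestimate just falls back to the normal resize path, so the estimate does not need to be exact.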

@pakrym
Contributor

pakrym commented Apr 25, 2020

Using an array pool in ILGenerator might be a good start.

@stephentoub
Member Author

I understand techniques could be used to avoid allocation. I'm asking how it helps, other than lowering an allocation number. Is ref emit being used in ASP.NET somewhere on a hot path?

@pakrym
Contributor

pakrym commented Apr 25, 2020

Yes, DependencyInjection uses it to emit the service resolvers: https://github.com/dotnet/runtime/blob/e3ffd343ad5bd3a999cb9515f59e6e7a777b2c34/src/libraries/Microsoft.Extensions.DependencyInjection/src/ServiceLookup/ILEmit/ILEmitResolverBuilder.cs

BTW, it has the same optimizations as this PR addressed

private void Ldloc(ILGenerator generator, int index)
{
    switch (index)
    {
        case 0: generator.Emit(OpCodes.Ldloc_0); return;
        case 1: generator.Emit(OpCodes.Ldloc_1); return;
        case 2: generator.Emit(OpCodes.Ldloc_2); return;
        case 3: generator.Emit(OpCodes.Ldloc_3); return;
    }

    if (index < byte.MaxValue)
    {
        generator.Emit(OpCodes.Ldloc_S, (byte)index);
        return;
    }

    generator.Emit(OpCodes.Ldloc, index);
}

private void Stloc(ILGenerator generator, int index)
{
    switch (index)
    {
        case 0: generator.Emit(OpCodes.Stloc_0); return;
        case 1: generator.Emit(OpCodes.Stloc_1); return;
        case 2: generator.Emit(OpCodes.Stloc_2); return;
        case 3: generator.Emit(OpCodes.Stloc_3); return;
    }

    if (index < byte.MaxValue)
    {
        generator.Emit(OpCodes.Stloc_S, (byte)index);
        return;
    }

    generator.Emit(OpCodes.Stloc, index);
}

@stephentoub
Member Author

Yes, DependencyInjection uses it to emit the service resolvers:

On a hot path? That's not a one-time / infrequent action? And do you have evidence that reducing managed allocation here moved a throughput / startup measurement, rather than just lowering a bytes-allocated counter?

@stephentoub
Member Author

BTW, it has the same optimizations as this PR addressed

That code is for ldloc/stloc. Specialization of those already existed, even on .NET Framework:
https://referencesource.microsoft.com/#mscorlib/system/reflection/emit/ilgenerator.cs,876

@pakrym
Contributor

pakrym commented Apr 25, 2020

On a hot path? That's not a one-time / infrequent action?

Most of the service resolver building happens in parallel with app startup, so it's not a recurring hot path, but improvements in it might improve startup time.

And do you have evidence that reducing managed allocation here moved a throughput / startup measurement, rather than just lowering a bytes-allocated counter?

Adding that optimization did improve service resolver building performance at the time. I don't think I did app startup time measurements, though.

That code is for ldloc/stloc. Specialization of those already existed, even on .NET Framework:

Huh. I wrote some useless code then.

@davidfowl
Member

rather than just lowering a bytes allocated counter

Pushing it all the way to 0. I'm going to turn the GC into a heap scanner 😄

@stephentoub
Member Author

stephentoub commented Apr 25, 2020

Pushing it all the way to 0.

Pooling has costs. Extra code to avoid allocation needs to be loaded and JIT'd, which has its own costs (beyond complexity and maintenance). Allocation counts as a sole metric can easily tune for the wrong thing; an occasional allocation can yield better all-up performance than the extra code to avoid it.

Adding that optimization did improve resolver service building performance at the time of writing

Thanks. I'm trying to gauge the benefits of actually investing more here. The wins you saw: were they from reducing allocation alone, or were other aspects of the change more impactful? Link to a PR? Thanks.

@jkotas
Member

jkotas commented Apr 25, 2020

Adding that optimization did improve resolver service building performance at the time of writing

I assume that you measured it with microbenchmarks. Microbenchmarks do not account for static footprint and startup time, which matter quite a bit for real apps too.

I don't think I did app startup time measurements at the time.

This suggests that you may have just shifted the cost from one place to another, while increasing complexity.

Pushing it all the way to 0

Well, we may be getting to the point where the focus on reducing allocations is starting to hurt more than help. I am finding myself telling people to allocate more quite often recently to avoid writing hard to maintain overengineered code (e.g. https://github.com/dotnet/runtime/pull/35383/files#r414296004).

@pakrym
Contributor

pakrym commented Apr 25, 2020

Thanks. I'm trying to gauge the benefits to actually investing more here. Those wins you saw, that was from reducing allocation alone, or were there other aspects of the change that were the most impactful? Link to a PR? Thanks.

It was done as part of working on a larger PR (aspnet/DependencyInjection#630), so I'm afraid the precise data on the specific changes is lost.

This suggest that you may have just shifted the cost from one place to a different place, while increasing complexity.

What other place? The optimization precalculates the approximate size of the IL body to avoid extensive resizing/allocations. It happens inline with the operation being benchmarked.

@davidfowl
Member

I was obviously kidding, as I don't want to make things worse, but it's going to be hard to convince me not to allocate if avoiding it is easy and there's no negative performance impact. If the code is much harder to maintain and there's no benefit, or if the pooling overhead defeats the purpose, then sure. Otherwise, don't allocate.

@jkotas
Member

jkotas commented Apr 25, 2020

The optimization is precalculating the approximate size of IL body to avoid extensive resizing/allocations.

If this happens for almost free, it is fine. It was not obvious to me; from a brief look, it looked like there is a special pass for the pre-calculation.

@ghost ghost locked as resolved and limited conversation to collaborators Dec 9, 2020