Faster List Add #9539

benaadams · 2017-02-12T06:12:38Z

List Add and Clear are warmspots in Kestrel

This is a mild tweak to Add; however, because what's being cleared is a list of GCHandle structs, Clear is a significant win.

@jkotas is this a valid use of JitHelpers.ContainsReferences<T>()?

/cc @stephentoub

jkotas · 2017-02-12T06:23:52Z

> @jkotas is this a valid use of JitHelpers.ContainsReferences<T>()?

Yes, it is what it is meant for.

stephentoub · 2017-02-12T12:39:07Z

I'd made a similar change to Add in #9323. (The PR shows separating out the EnsureCapacity, but I'd reverted that part locally and just never pushed it up, so it looked basically identical to this PR.) It showed benefits on one machine, but on another it actually showed a slowdown. We should just make sure it's consistently better before changing it.

stephentoub · 2017-02-12T12:39:27Z

src/mscorlib/src/System/Collections/Generic/List.cs


+using System.Reflection;


The usings should all be together.

stephentoub · 2017-02-12T12:39:48Z

src/mscorlib/src/System/Collections/Generic/List.cs

-            _items[_size++] = item;
+        public void Add(T item)
+        {
+            var size = _size;


Nit: var => int

stephentoub · 2017-02-12T12:41:51Z

src/mscorlib/src/System/Collections/Generic/List.cs

            if (_size > 0)
            {
-                Array.Clear(_items, 0, _size); // Don't need to doc this but we clear the elements so that the gc can reclaim the references.
+                if (JitHelpers.ContainsReferences<T>())


Assuming the call completely evaporates, should we use this in List (and maybe elsewhere in other data structures) anywhere we null out a field, e.g. not just Clear but all of the Remove methods?

jkotas: Yes, the call should completely evaporate. It should be a good idea to use it, in particular before bulk clearing (Array.Clear).
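A minimal sketch of the pattern discussed in this thread, not the code from the PR. JitHelpers.ContainsReferences<T>() is internal to mscorlib, so the sketch assumes the public RuntimeHelpers.IsReferenceOrContainsReferences<T>() (available from .NET Core 2.0) as a stand-in. SimpleList<T> and its members are hypothetical names used only for illustration.

```csharp
using System;
using System.Runtime.CompilerServices;

// Hypothetical minimal list used only to illustrate the Clear pattern.
public sealed class SimpleList<T>
{
    private T[] _items = new T[4];
    private int _size;

    public int Count => _size;

    public void Add(T item)
    {
        if (_size == _items.Length)
        {
            Array.Resize(ref _items, _items.Length * 2);
        }
        _items[_size++] = item;
    }

    public void Clear()
    {
        // Assumption: the public RuntimeHelpers API stands in for the internal
        // JitHelpers.ContainsReferences<T>() mentioned in the thread.
        if (RuntimeHelpers.IsReferenceOrContainsReferences<T>())
        {
            // Elements may hold references: clear them so the GC can reclaim them.
            if (_size > 0)
            {
                Array.Clear(_items, 0, _size);
            }
        }
        // For reference-free T (for example int), resetting the count is sufficient.
        _size = 0;
    }
}
```

Because the check depends only on T, the expectation stated in the thread is that the JIT folds it to a constant per instantiation, so a reference-free instantiation such as SimpleList<int> would skip the Array.Clear call entirely.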

benaadams · 2017-02-12T13:58:16Z

src/mscorlib/src/System/Collections/Generic/List.cs

+            var size = _size;
+            if (size == _items.Length)
+            {
+                EnsureCapacity(size + 1);


Tried something like @stephentoub's #9323, where the addition was also factored out to try to get Add to be inlinable, as it's not; but the simple function was [below ALWAYS_INLINE size] and seemed to ignore the [MethodImpl(MethodImplAttributes.NoInlining)] attribute.

Not sure if this is expected or a change? /cc @AndyAyersMS

Also had to test before and after in coreclr due to the stackcrawlmark stuff.

AndyAyersMS: NoInlining should always be honored; it's one of the first things the JIT looks for.

jkotas · 2017-02-12T15:44:37Z

src/mscorlib/src/System/Collections/Generic/List.cs

+            if (_size == Array.MaxArrayLength)
+            {
+                // New length would be too large for array
+                ThrowHelper.ThrowLengthArgumentOutOfRange_ArgumentOutOfRange_NeedNonNegNum();


Why do we need this exception when it was not there before? It seems to be changing the behavior for this case - was OutOfMemoryException thrown before in this case?

benaadams: Couldn't find docs saying any exception could be thrown; it would call set_Capacity with a negative number, which would call the array .ctor with a negative number: new T[value].

Should throw OutOfMemoryException instead?

Actually... just need to remove the casts from Math.Min
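For readers following the overflow question, here is a hedged sketch of the kind of growth computation involved, based on the (uint) clamp that appears in the EnsureCapacity hunk quoted later in this conversation. CapacityMath and NextCapacity are hypothetical names, and the MaxArrayLength value is an assumption meant to mirror the internal Array.MaxArrayLength constant.

```csharp
using System;

// Hypothetical helper illustrating the capacity clamp; not part of the PR.
public static class CapacityMath
{
    // Assumption: mirrors the value of the internal Array.MaxArrayLength constant.
    public const int MaxArrayLength = 0x7FEFFFFF;

    public static int NextCapacity(int currentLength, int min)
    {
        // Doubling can wrap to a negative int once currentLength exceeds ~1G elements.
        int newCapacity = unchecked(currentLength == 0 ? 4 : currentLength * 2);

        // Reinterpreted as uint, a wrapped (negative) value becomes very large,
        // so one comparison clamps both "too big" and "overflowed" results.
        if (unchecked((uint)newCapacity) > MaxArrayLength) newCapacity = MaxArrayLength;
        if (newCapacity < min) newCapacity = min;
        return newCapacity;
    }
}
```

Note that when min itself exceeds the clamp, the last line still returns min, so any failure for that case surfaces in the array allocation rather than in this computation.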

benaadams · 2017-02-12T17:31:47Z

Going to submit a separate PR for Clear and Remove as they are quite simple. Add ends up with a lot going on in the asm, so will see if I can take care of it separately.

danmoseley · 2017-02-13T02:26:27Z

src/mscorlib/src/System/Collections/Generic/List.cs

                // Allow the list to grow to maximum possible capacity (~2G elements) before encountering overflow.
                // Note that this check works even when _items.Length overflowed thanks to the (uint) cast
                if ((uint)newCapacity > Array.MaxArrayLength) newCapacity = Array.MaxArrayLength;
                if (newCapacity < min) newCapacity = min;
                Capacity = newCapacity;
            }
        }
-
+
+        [MethodImpl(MethodImplAttributes.NoInlining)]


maybe a comment? // Separated out of List.Add to improve its code quality

jamesqo · 2017-02-14T23:47:46Z

@benaadams Awesome, nice work!

benaadams · 2017-02-14T23:49:44Z

Clear+Remove were picked up and merged in #9540 so this will just become Add

jamesqo · 2017-02-15T03:35:33Z

src/mscorlib/src/System/Collections/Generic/List.cs

+            if (size == _items.Length)
+            {
+                IncreaseCapacity();
+            }


Maybe avoid a field access in the common case by doing:

T[] items = _items;
if (size == items.Length)
{
    IncreaseCapacity();
    items = _items;
}
_size = size + 1;
items[size] = item;

?

benaadams · 2017-02-17T01:49:52Z

Will close and reopen a different PR for Add

benaadams · 2017-02-22T01:14:25Z

Trims the asm by 10 bytes and doesn't get it permanently branded as no-inline.

Pre

; ============================================================
Marking List`1:Add(long):this as NOINLINE because of unprofitable inline
**************** Inline Tree
Inlines into 060034D4 List`1:Add(long):this
  [0 IL=0025 TR=000050 060034E3] [FAILED: noinline per IL/cached result] List`1:EnsureCapacity(int):this
Budget: initialTime=282, finalTime=282, initialBudget=2820, currentBudget=2820
Budget: initialSize=1818, finalSize=1818
; Assembly listing for method List`1:Add(long):this
; Emitting BLENDED_CODE for X64 CPU with SSE2
; optimized code
; rsp based frame
; partially interruptible
; Final local variable assignments
;
;  V00 this         [V00,T00] ( 20,  18  )     ref  ->  rsi         this
;  V01 arg1         [V01,T03] (  3,   3  )    long  ->  rdi        
;  V02 loc0         [V02,T02] (  6,   6  )     int  ->  rdx        
;  V03 tmp0         [V03,T01] (  4,   8  )     ref  ->  rax        
;  V04 OutArgs      [V04    ] (  1,   1  )  lclBlk (32) [rsp+0x00]  
;
; Lcl frame size = 40

G_M46198_IG01:
       57                   push     rdi
       56                   push     rsi
       4883EC28             sub      rsp, 40
       488BF1               mov      rsi, rcx
       488BFA               mov      rdi, rdx

G_M46198_IG02:
       8B5618               mov      edx, dword ptr [rsi+24]
       488B4E08             mov      rcx, gword ptr [rsi+8]
       3B5108               cmp      edx, dword ptr [rcx+8]
       750D                 jne      SHORT G_M46198_IG03
       8B5618               mov      edx, dword ptr [rsi+24]
       FFC2                 inc      edx
       488BCE               mov      rcx, rsi
       E800000000           call     List`1:EnsureCapacity(int):this

G_M46198_IG03:
       488B4608             mov      rax, gword ptr [rsi+8]
       8B5618               mov      edx, dword ptr [rsi+24]
       8D4A01               lea      ecx, [rdx+1]
       894E18               mov      dword ptr [rsi+24], ecx
       3B5008               cmp      edx, dword ptr [rax+8]
       7312                 jae      SHORT G_M46198_IG05
       4863D2               movsxd   rdx, edx
       48897CD010           mov      qword ptr [rax+8*rdx+16], rdi
       FF461C               inc      dword ptr [rsi+28]

G_M46198_IG04:
       4883C428             add      rsp, 40
       5E                   pop      rsi
       5F                   pop      rdi
       C3                   ret      

G_M46198_IG05:
       E800000000           call     CORINFO_HELP_RNGCHKFAIL
       CC                   int3     

; Total bytes of code 79, prolog size 6 for method List`1:Add(long):this

Post

**************** Inline Tree
Inlines into 060034D4 List`1:Add(long):this
  [0 IL=0054 TR=000029 060034D5] [FAILED: noinline per IL/cached result] List`1:AddWithResize(long):this
Budget: initialTime=240, finalTime=240, initialBudget=2400, currentBudget=2400
Budget: initialSize=1499, finalSize=1499
; Assembly listing for method List`1:Add(long):this
; Emitting BLENDED_CODE for X64 CPU with SSE2
; optimized code
; rsp based frame
; fully interruptible
; Final local variable assignments
;
;  V00 this         [V00,T00] ( 12,  10.5)     ref  ->  rcx         this
;  V01 arg1         [V01,T02] (  4,   3  )    long  ->  rdx        
;  V02 loc0         [V02,T03] (  4,   3  )     ref  ->  rax        
;  V03 loc1         [V03,T01] (  7,   5  )     int  ->   r8        
;  V04 OutArgs      [V04    ] (  1,   1  )  lclBlk (32) [rsp+0x00]  
;
; Lcl frame size = 40

G_M46198_IG01:
       4883EC28             sub      rsp, 40
       90                   nop      

G_M46198_IG02:
       488B4108             mov      rax, gword ptr [rcx+8]
       448B4118             mov      r8d, dword ptr [rcx+24]
       FF411C               inc      dword ptr [rcx+28]
       44394008             cmp      dword ptr [rax+8], r8d
       761C                 jbe      SHORT G_M46198_IG04
       458D4801             lea      r9d, [r8+1]
       44894918             mov      dword ptr [rcx+24], r9d
       443B4008             cmp      r8d, dword ptr [rax+8]
       731C                 jae      SHORT G_M46198_IG06
       4963C8               movsxd   rcx, r8d
       488954C810           mov      qword ptr [rax+8*rcx+16], rdx

G_M46198_IG03:
       4883C428             add      rsp, 40
       C3                   ret      

G_M46198_IG04:
       488D0500000000       lea      rax, [(reloc)]

G_M46198_IG05:
       4883C428             add      rsp, 40
       48FFE0               rex.jmp  rax

G_M46198_IG06:
       E800000000           call     CORINFO_HELP_RNGCHKFAIL
       CC                   int3     

; Total bytes of code 69, prolog size 5 for method List`1:Add(long):this

PTAL @stephentoub @jkotas

benaadams · 2017-02-22T01:19:14Z

Can't eliminate the range check, so there is a double check on length: https://github.com/dotnet/coreclr/issues/9707

jkotas · 2017-02-22T01:57:37Z

src/mscorlib/src/System/Collections/Generic/List.cs

+        }
+
+        // Separated out of List.Add to improve inlinability of both functions
+        private void AddWithoutResize(T item)


Is this different from just marking the public method as [MethodImpl(MethodImplOptions.AggressiveInlining)]?

benaadams: The JIT still chooses based on context? Whereas aggressive inlining considers context less?

jkotas: @AndyAyersMS Would you prefer MethodImplOptions.AggressiveInlining here, or splitting the method into two to trick the JIT into inlining it more often?

AndyAyersMS: Would prefer AggressiveInlining since it makes the intent clear.

benaadams: K, can make the asm better by keeping them together.

benaadams: Updated the asm after keeping them in the same method with aggressive inlining; it is reduced by 10 bytes, with potential for the range check to be eliminated.
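A rough sketch of the shape this thread converges on: an aggressively inlined Add whose common case stays small, with the resize kept out of line. This is an illustration under assumptions, not the merged code. FastAddList<T> is a hypothetical type, Array.Resize stands in for the real capacity logic, and overflow handling for very large lists is omitted.

```csharp
using System;
using System.Runtime.CompilerServices;

// Hypothetical minimal list; a sketch of the fast-path/slow-path split only.
public sealed class FastAddList<T>
{
    private T[] _items = new T[4];
    private int _size;

    public int Count => _size;

    public T this[int index]
    {
        get
        {
            if ((uint)index >= (uint)_size) throw new ArgumentOutOfRangeException(nameof(index));
            return _items[index];
        }
    }

    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public void Add(T item)
    {
        T[] items = _items; // read the field once into a local
        int size = _size;
        if ((uint)size < (uint)items.Length)
        {
            // Common case: room is available, so just store the item.
            _size = size + 1;
            items[size] = item;
        }
        else
        {
            // Uncommon case: kept in a separate method so Add stays small.
            AddWithResize(item);
        }
    }

    [MethodImpl(MethodImplOptions.NoInlining)]
    private void AddWithResize(T item)
    {
        int size = _size;
        // Assumption: simple doubling; the real list clamps to a maximum length.
        Array.Resize(ref _items, size == 0 ? 4 : size * 2);
        _size = size + 1;
        _items[size] = item;
    }
}
```

Comparing against the local array's Length, rather than re-reading the field, is what gives the JIT a chance to prove the store is in bounds and drop the second length check that the asm in this conversation still shows.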

benaadams · 2017-02-22T05:30:09Z

Updated to an aggressively inlined method; the asm is reduced and cleaner, though it still has both

cmp      dword ptr [rax+8], r8d
jbe      SHORT G_M46198_IG04

and

cmp      r8d, dword ptr [rax+8]
jae      SHORT G_M46198_IG06

benaadams · 2017-03-21T00:13:30Z

Second check now elided by #9773

jamesqo · 2017-03-21T00:17:10Z

@benaadams Great, then other collections can take advantage of this.

jamesqo · 2017-03-21T00:25:23Z

Opened dotnet/corefx#17318

omariom · 2017-03-21T10:39:12Z

@benaadams
If you change the order..

array[size] = item;
_size = size + 1;

it may reuse r8d.
Like

inc      r8d
mov      dword ptr [rcx+24], r8d

A couple of bytes less :)
Though not sure about perf.

benaadams · 2017-03-21T10:49:46Z

@omariom for structs containing refs, and for classes, array[size] = item; becomes a memory barrier assign, so I was approaching it as getting the int assign into the CPU pipeline prior to the memory barrier. Might not make much difference though.

Commit migrated from dotnet/coreclr@7d9e017

dnfclas added the cla-already-signed label Feb 12, 2017

benaadams force-pushed the list-clear branch from c7bb365 to 9f26028 Compare February 12, 2017 06:16

benaadams closed this Feb 12, 2017

benaadams reopened this Feb 12, 2017

dnfclas added the cla-already-signed label Feb 12, 2017

benaadams force-pushed the list-clear branch from 9f26028 to 2601c83 Compare February 12, 2017 06:22

stephentoub reviewed Feb 12, 2017

stephentoub reviewed Feb 12, 2017

stephentoub approved these changes Feb 12, 2017

benaadams commented Feb 12, 2017

jkotas reviewed Feb 12, 2017

danmoseley added the area-System.Collections label Feb 13, 2017

danmoseley reviewed Feb 13, 2017

jamesqo reviewed Feb 15, 2017

benaadams closed this Feb 17, 2017

benaadams reopened this Feb 22, 2017

dnfclas added the cla-already-signed label Feb 22, 2017

benaadams force-pushed the list-clear branch from bb4d8c2 to 828225c Compare February 22, 2017 00:53

benaadams changed the title ~~Faster List Add & Clear~~ Faster List Add Feb 22, 2017

jkotas reviewed Feb 22, 2017

benaadams force-pushed the list-clear branch from eeeabda to 9c8554a Compare February 22, 2017 04:48

Faster List Add

e2a5a90

benaadams force-pushed the list-clear branch from 9c8554a to e2a5a90 Compare February 22, 2017 12:19

jkotas merged commit 7d9e017 into dotnet:master Feb 23, 2017

benaadams mentioned this pull request Mar 21, 2017

Use for rather than foreach on List aspnet/KestrelHttpServer#1523

Merged

jorive pushed a commit to guhuro/coreclr that referenced this pull request May 4, 2017

Faster List Add (dotnet#9539)

f47350d

karelz modified the milestone: 2.0.0 Aug 28, 2017

benaadams deleted the list-clear branch March 27, 2018 05:12

gfoidl mentioned this pull request Jun 21, 2018

ImmutableArray<T>.Builder.Add splitted in fast- and cold-path dotnet/corefx#28184

Closed

jamesqo mentioned this pull request Jan 31, 2020

[Performance] Take advantage of new array range check elimination in other collections dotnet/runtime#20702

Closed

picenka21 pushed a commit to picenka21/runtime that referenced this pull request Feb 18, 2022

Faster List Add (dotnet/coreclr#9539)

d2db18b

Commit migrated from dotnet/coreclr@7d9e017
