Skip to content
This repository has been archived by the owner on Jan 23, 2023. It is now read-only.

Faster List Add #9539

Merged
merged 1 commit into from Feb 23, 2017
Merged

Faster List Add #9539

merged 1 commit into from Feb 23, 2017

Conversation

benaadams
Copy link
Member

List Add and Clear are warmspots in Kestrel

This is a mild tweak to Add however as what's being cleared is a list of GCHandle structs Clear is a significant win.

@jkotas is this a valid use of JitHelpers.ContainsReferences<T>()?

/cc @stephentoub

@jkotas
Copy link
Member

jkotas commented Feb 12, 2017

@jkotas is this a valid use of JitHelpers.ContainsReferences<T>()?

Yes, it is what it is meant for.

@stephentoub
Copy link
Member

I'd made a similar change to Add in #9323 (the PR shows separating out the EnsureCapacity, but I'd reverted that part locally and just never pushed it up, so it looked basically identically to this PR), as it showed benefits on one machine, but on another it actually showed a slowdown. We should just make sure it's consistently better before changing it.


using System.Reflection;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The usings should all be together.

_items[_size++] = item;
public void Add(T item)
{
var size = _size;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: var => int

if (_size > 0)
{
Array.Clear(_items, 0, _size); // Don't need to doc this but we clear the elements so that the gc can reclaim the references.
if (JitHelpers.ContainsReferences<T>())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assuming the call completely evaporates, should we use this in List (and maybe elsewhere in other data structures) anywhere we null out a field, e.g. not just Clear but all of the Remove methods?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the call should completely evaporate. It should be good idea to use - in particular before bulk clearing (Array.Clear).

var size = _size;
if (size == _items.Length)
{
EnsureCapacity(size + 1);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tried something like @stephentoub's #9323 where the addition was also factored out to try and get Add to be inlinable as its not; but the simple function was [below ALWAYS_INLINE size] and seemed to ignore the [MethodImpl(MethodImplAttributes.NoInlining)] attribute.

Not sure if is expected or a change? /cc @AndyAyersMS

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also had to test in before after in coreclr due to the stackcrawlmark stuff

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Noinlining should always be honored, it's one of the first things the jit looks for.

if (_size == Array.MaxArrayLength)
{
// New length would be too large for array
ThrowHelper.ThrowLengthArgumentOutOfRange_ArgumentOutOfRange_NeedNonNegNum();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need this exception when it was not there before? It seems to be changing the behavior for this case - was OutOfMemoryException thrown before in this case?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couldn't find docs saying any exception could be thrown; it would call set_Capacity with a negative number, which would call the array .ctor with negative number new T[value].

Should throw OutOfMemoryExecption instead?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually... just need to remove the casts from Math.Min

@benaadams
Copy link
Member Author

Going to submit separate PR for Clear and Remove as they are quite simple. Add ends up with a lot going on in the asm so will see if can take care of it seperately

// Allow the list to grow to maximum possible capacity (~2G elements) before encountering overflow.
// Note that this check works even when _items.Length overflowed thanks to the (uint) cast
if ((uint)newCapacity > Array.MaxArrayLength) newCapacity = Array.MaxArrayLength;
if (newCapacity < min) newCapacity = min;
Capacity = newCapacity;
}
}


[MethodImpl(MethodImplAttributes.NoInlining)]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe a comment? // Separated out of List.Add to improve its code quality

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added

@jamesqo
Copy link

jamesqo commented Feb 14, 2017

@benaadams Awesome, nice work!

@benaadams
Copy link
Member Author

Clear+Remove were picked up and merged in #9540 so this will just become Add

if (size == _items.Length)
{
IncreaseCapacity();
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe avoid a field access in the common case by doing:

T[] items = _items;
if (size == items.Length)
{
    IncreaseCapacity();
    items = _items;
}
_size = size + 1;
items[size] = item;

?

@benaadams
Copy link
Member Author

Willc lose and reopen different PR for Add

@benaadams benaadams closed this Feb 17, 2017
@benaadams benaadams reopened this Feb 22, 2017
@benaadams benaadams changed the title Faster List Add & Clear Faster List Add Feb 22, 2017
@benaadams
Copy link
Member Author

benaadams commented Feb 22, 2017

Trims the asm by 10 bytes and doesn't get it permanently branded as no-inline.

Pre

; ============================================================
Marking List`1:Add(long):this as NOINLINE because of unprofitable inline
**************** Inline Tree
Inlines into 060034D4 List`1:Add(long):this
  [0 IL=0025 TR=000050 060034E3] [FAILED: noinline per IL/cached result] List`1:EnsureCapacity(int):this
Budget: initialTime=282, finalTime=282, initialBudget=2820, currentBudget=2820
Budget: initialSize=1818, finalSize=1818
; Assembly listing for method List`1:Add(long):this
; Emitting BLENDED_CODE for X64 CPU with SSE2
; optimized code
; rsp based frame
; partially interruptible
; Final local variable assignments
;
;  V00 this         [V00,T00] ( 20,  18  )     ref  ->  rsi         this
;  V01 arg1         [V01,T03] (  3,   3  )    long  ->  rdi        
;  V02 loc0         [V02,T02] (  6,   6  )     int  ->  rdx        
;  V03 tmp0         [V03,T01] (  4,   8  )     ref  ->  rax        
;  V04 OutArgs      [V04    ] (  1,   1  )  lclBlk (32) [rsp+0x00]  
;
; Lcl frame size = 40

G_M46198_IG01:
       57                   push     rdi
       56                   push     rsi
       4883EC28             sub      rsp, 40
       488BF1               mov      rsi, rcx
       488BFA               mov      rdi, rdx

G_M46198_IG02:
       8B5618               mov      edx, dword ptr [rsi+24]
       488B4E08             mov      rcx, gword ptr [rsi+8]
       3B5108               cmp      edx, dword ptr [rcx+8]
       750D                 jne      SHORT G_M46198_IG03
       8B5618               mov      edx, dword ptr [rsi+24]
       FFC2                 inc      edx
       488BCE               mov      rcx, rsi
       E800000000           call     List`1:EnsureCapacity(int):this

G_M46198_IG03:
       488B4608             mov      rax, gword ptr [rsi+8]
       8B5618               mov      edx, dword ptr [rsi+24]
       8D4A01               lea      ecx, [rdx+1]
       894E18               mov      dword ptr [rsi+24], ecx
       3B5008               cmp      edx, dword ptr [rax+8]
       7312                 jae      SHORT G_M46198_IG05
       4863D2               movsxd   rdx, edx
       48897CD010           mov      qword ptr [rax+8*rdx+16], rdi
       FF461C               inc      dword ptr [rsi+28]

G_M46198_IG04:
       4883C428             add      rsp, 40
       5E                   pop      rsi
       5F                   pop      rdi
       C3                   ret      

G_M46198_IG05:
       E800000000           call     CORINFO_HELP_RNGCHKFAIL
       CC                   int3     

; Total bytes of code 79, prolog size 6 for method List`1:Add(long):this

Post

**************** Inline Tree
Inlines into 060034D4 List`1:Add(long):this
  [0 IL=0054 TR=000029 060034D5] [FAILED: noinline per IL/cached result] List`1:AddWithResize(long):this
Budget: initialTime=240, finalTime=240, initialBudget=2400, currentBudget=2400
Budget: initialSize=1499, finalSize=1499
; Assembly listing for method List`1:Add(long):this
; Emitting BLENDED_CODE for X64 CPU with SSE2
; optimized code
; rsp based frame
; fully interruptible
; Final local variable assignments
;
;  V00 this         [V00,T00] ( 12,  10.5)     ref  ->  rcx         this
;  V01 arg1         [V01,T02] (  4,   3  )    long  ->  rdx        
;  V02 loc0         [V02,T03] (  4,   3  )     ref  ->  rax        
;  V03 loc1         [V03,T01] (  7,   5  )     int  ->   r8        
;  V04 OutArgs      [V04    ] (  1,   1  )  lclBlk (32) [rsp+0x00]  
;
; Lcl frame size = 40

G_M46198_IG01:
       4883EC28             sub      rsp, 40
       90                   nop      

G_M46198_IG02:
       488B4108             mov      rax, gword ptr [rcx+8]
       448B4118             mov      r8d, dword ptr [rcx+24]
       FF411C               inc      dword ptr [rcx+28]
       44394008             cmp      dword ptr [rax+8], r8d
       761C                 jbe      SHORT G_M46198_IG04
       458D4801             lea      r9d, [r8+1]
       44894918             mov      dword ptr [rcx+24], r9d
       443B4008             cmp      r8d, dword ptr [rax+8]
       731C                 jae      SHORT G_M46198_IG06
       4963C8               movsxd   rcx, r8d
       488954C810           mov      qword ptr [rax+8*rcx+16], rdx

G_M46198_IG03:
       4883C428             add      rsp, 40
       C3                   ret      

G_M46198_IG04:
       488D0500000000       lea      rax, [(reloc)]

G_M46198_IG05:
       4883C428             add      rsp, 40
       48FFE0               rex.jmp  rax

G_M46198_IG06:
       E800000000           call     CORINFO_HELP_RNGCHKFAIL
       CC                   int3     

; Total bytes of code 69, prolog size 5 for method List`1:Add(long):this

PTAL @stephentoub @jkotas

@benaadams
Copy link
Member Author

Can't eliminate the range check so there is a double check on length https://github.com/dotnet/coreclr/issues/9707

}

// Separated out of List.Add to improve inlinability of both functions
private void AddWithoutResize(T item)
Copy link
Member

@jkotas jkotas Feb 22, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this different from just marking the public method as [MethodImpl(MethodImplOptions.AggressiveInlining)]?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Jit still chooses based on context? Whereas aggressive inline considers context less?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@AndyAyersMS Would you prefer MethodImplOptions.AggressiveInlining here; or splitting the method into two to trick the JIT to inline it more often?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would prefer AggressiveInlining since it makes the intent clear.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

K, can make the asm better by keeping them together

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated the asm from keeping them in same method with aggressive inline; is reduced by 10 bytes; with potential for the range check to be eliminated.

@benaadams
Copy link
Member Author

benaadams commented Feb 22, 2017

Updated to aggressive inlined method; asm reduced and cleaner though still has both

cmp      dword ptr [rax+8], r8d
jbe      SHORT G_M46198_IG04

and

cmp      r8d, dword ptr [rax+8]
jae      SHORT G_M46198_IG06

@benaadams
Copy link
Member Author

Second check now elided by #9773

@jamesqo
Copy link

jamesqo commented Mar 21, 2017

@benaadams Great, then other collections can take advantage of this then.

@jamesqo
Copy link

jamesqo commented Mar 21, 2017

Opened dotnet/corefx#17318

@omariom
Copy link

omariom commented Mar 21, 2017

@benaadams
If you change the order..

array[size] = item;
_size = size + 1;

it may reuse r8d.
Like

inc      r8d
mov      dword ptr [rcx+24], r8d

A couple of bytes less :)
Though not sure about perf.

@benaadams
Copy link
Member Author

@omariom for struct containing refs and classes array[size] = item; becomes a memory barrier assign, so was approaching it as getting the int assign in the cpu pipeline prior to the memory barrier. Might not make much difference tho

jorive pushed a commit to guhuro/coreclr that referenced this pull request May 4, 2017
@karelz karelz modified the milestone: 2.0.0 Aug 28, 2017
@benaadams benaadams deleted the list-clear branch March 27, 2018 05:12
picenka21 pushed a commit to picenka21/runtime that referenced this pull request Feb 18, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
9 participants