Conversation
c7bb365
to
9f26028
Compare
9f26028
to
2601c83
Compare
Yes, it is what it is meant for. |
I'd made a similar change to Add in #9323 (the PR shows separating out the EnsureCapacity, but I'd reverted that part locally and just never pushed it up, so it looked basically identically to this PR), as it showed benefits on one machine, but on another it actually showed a slowdown. We should just make sure it's consistently better before changing it. |
|
||
using System.Reflection; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The usings should all be together.
_items[_size++] = item; | ||
public void Add(T item) | ||
{ | ||
var size = _size; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: var => int
if (_size > 0) | ||
{ | ||
Array.Clear(_items, 0, _size); // Don't need to doc this but we clear the elements so that the gc can reclaim the references. | ||
if (JitHelpers.ContainsReferences<T>()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Assuming the call completely evaporates, should we use this in List (and maybe elsewhere in other data structures) anywhere we null out a field, e.g. not just Clear but all of the Remove methods?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the call should completely evaporate. It should be good idea to use - in particular before bulk clearing (Array.Clear).
var size = _size; | ||
if (size == _items.Length) | ||
{ | ||
EnsureCapacity(size + 1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tried something like @stephentoub's #9323 where the addition was also factored out to try and get Add
to be inlinable as its not; but the simple function was [below ALWAYS_INLINE size]
and seemed to ignore the [MethodImpl(MethodImplAttributes.NoInlining)]
attribute.
Not sure if is expected or a change? /cc @AndyAyersMS
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also had to test in before after in coreclr due to the stackcrawlmark stuff
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Noinlining should always be honored, it's one of the first things the jit looks for.
if (_size == Array.MaxArrayLength) | ||
{ | ||
// New length would be too large for array | ||
ThrowHelper.ThrowLengthArgumentOutOfRange_ArgumentOutOfRange_NeedNonNegNum(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need this exception when it was not there before? It seems to be changing the behavior for this case - was OutOfMemoryException thrown before in this case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Couldn't find docs saying any exception could be thrown; it would call set_Capacity
with a negative number, which would call the array .ctor with negative number new T[value]
.
Should throw OutOfMemoryExecption instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually... just need to remove the casts from Math.Min
Going to submit separate PR for Clear and Remove as they are quite simple. Add ends up with a lot going on in the asm so will see if can take care of it seperately |
// Allow the list to grow to maximum possible capacity (~2G elements) before encountering overflow. | ||
// Note that this check works even when _items.Length overflowed thanks to the (uint) cast | ||
if ((uint)newCapacity > Array.MaxArrayLength) newCapacity = Array.MaxArrayLength; | ||
if (newCapacity < min) newCapacity = min; | ||
Capacity = newCapacity; | ||
} | ||
} | ||
|
||
|
||
[MethodImpl(MethodImplAttributes.NoInlining)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe a comment? // Separated out of List.Add to improve its code quality
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added
@benaadams Awesome, nice work! |
Clear+Remove were picked up and merged in #9540 so this will just become Add |
if (size == _items.Length) | ||
{ | ||
IncreaseCapacity(); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe avoid a field access in the common case by doing:
T[] items = _items;
if (size == items.Length)
{
IncreaseCapacity();
items = _items;
}
_size = size + 1;
items[size] = item;
?
Willc lose and reopen different PR for Add |
bb4d8c2
to
828225c
Compare
Trims the asm by 10 bytes and doesn't get it permanently branded as no-inline. Pre ; ============================================================
Marking List`1:Add(long):this as NOINLINE because of unprofitable inline
**************** Inline Tree
Inlines into 060034D4 List`1:Add(long):this
[0 IL=0025 TR=000050 060034E3] [FAILED: noinline per IL/cached result] List`1:EnsureCapacity(int):this
Budget: initialTime=282, finalTime=282, initialBudget=2820, currentBudget=2820
Budget: initialSize=1818, finalSize=1818
; Assembly listing for method List`1:Add(long):this
; Emitting BLENDED_CODE for X64 CPU with SSE2
; optimized code
; rsp based frame
; partially interruptible
; Final local variable assignments
;
; V00 this [V00,T00] ( 20, 18 ) ref -> rsi this
; V01 arg1 [V01,T03] ( 3, 3 ) long -> rdi
; V02 loc0 [V02,T02] ( 6, 6 ) int -> rdx
; V03 tmp0 [V03,T01] ( 4, 8 ) ref -> rax
; V04 OutArgs [V04 ] ( 1, 1 ) lclBlk (32) [rsp+0x00]
;
; Lcl frame size = 40
G_M46198_IG01:
57 push rdi
56 push rsi
4883EC28 sub rsp, 40
488BF1 mov rsi, rcx
488BFA mov rdi, rdx
G_M46198_IG02:
8B5618 mov edx, dword ptr [rsi+24]
488B4E08 mov rcx, gword ptr [rsi+8]
3B5108 cmp edx, dword ptr [rcx+8]
750D jne SHORT G_M46198_IG03
8B5618 mov edx, dword ptr [rsi+24]
FFC2 inc edx
488BCE mov rcx, rsi
E800000000 call List`1:EnsureCapacity(int):this
G_M46198_IG03:
488B4608 mov rax, gword ptr [rsi+8]
8B5618 mov edx, dword ptr [rsi+24]
8D4A01 lea ecx, [rdx+1]
894E18 mov dword ptr [rsi+24], ecx
3B5008 cmp edx, dword ptr [rax+8]
7312 jae SHORT G_M46198_IG05
4863D2 movsxd rdx, edx
48897CD010 mov qword ptr [rax+8*rdx+16], rdi
FF461C inc dword ptr [rsi+28]
G_M46198_IG04:
4883C428 add rsp, 40
5E pop rsi
5F pop rdi
C3 ret
G_M46198_IG05:
E800000000 call CORINFO_HELP_RNGCHKFAIL
CC int3
; Total bytes of code 79, prolog size 6 for method List`1:Add(long):this Post **************** Inline Tree
Inlines into 060034D4 List`1:Add(long):this
[0 IL=0054 TR=000029 060034D5] [FAILED: noinline per IL/cached result] List`1:AddWithResize(long):this
Budget: initialTime=240, finalTime=240, initialBudget=2400, currentBudget=2400
Budget: initialSize=1499, finalSize=1499
; Assembly listing for method List`1:Add(long):this
; Emitting BLENDED_CODE for X64 CPU with SSE2
; optimized code
; rsp based frame
; fully interruptible
; Final local variable assignments
;
; V00 this [V00,T00] ( 12, 10.5) ref -> rcx this
; V01 arg1 [V01,T02] ( 4, 3 ) long -> rdx
; V02 loc0 [V02,T03] ( 4, 3 ) ref -> rax
; V03 loc1 [V03,T01] ( 7, 5 ) int -> r8
; V04 OutArgs [V04 ] ( 1, 1 ) lclBlk (32) [rsp+0x00]
;
; Lcl frame size = 40
G_M46198_IG01:
4883EC28 sub rsp, 40
90 nop
G_M46198_IG02:
488B4108 mov rax, gword ptr [rcx+8]
448B4118 mov r8d, dword ptr [rcx+24]
FF411C inc dword ptr [rcx+28]
44394008 cmp dword ptr [rax+8], r8d
761C jbe SHORT G_M46198_IG04
458D4801 lea r9d, [r8+1]
44894918 mov dword ptr [rcx+24], r9d
443B4008 cmp r8d, dword ptr [rax+8]
731C jae SHORT G_M46198_IG06
4963C8 movsxd rcx, r8d
488954C810 mov qword ptr [rax+8*rcx+16], rdx
G_M46198_IG03:
4883C428 add rsp, 40
C3 ret
G_M46198_IG04:
488D0500000000 lea rax, [(reloc)]
G_M46198_IG05:
4883C428 add rsp, 40
48FFE0 rex.jmp rax
G_M46198_IG06:
E800000000 call CORINFO_HELP_RNGCHKFAIL
CC int3
; Total bytes of code 69, prolog size 5 for method List`1:Add(long):this PTAL @stephentoub @jkotas |
Can't eliminate the range check so there is a double check on length https://github.com/dotnet/coreclr/issues/9707 |
} | ||
|
||
// Separated out of List.Add to improve inlinability of both functions | ||
private void AddWithoutResize(T item) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this different from just marking the public method as [MethodImpl(MethodImplOptions.AggressiveInlining)]
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Jit still chooses based on context? Whereas aggressive inline considers context less?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@AndyAyersMS Would you prefer MethodImplOptions.AggressiveInlining here; or splitting the method into two to trick the JIT to inline it more often?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would prefer AggressiveInlining
since it makes the intent clear.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
K, can make the asm better by keeping them together
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated the asm from keeping them in same method with aggressive inline; is reduced by 10 bytes; with potential for the range check to be eliminated.
eeeabda
to
9c8554a
Compare
Updated to aggressive inlined method; asm reduced and cleaner though still has both
and
|
9c8554a
to
e2a5a90
Compare
Second check now elided by #9773 |
@benaadams Great, then other collections can take advantage of this then. |
Opened dotnet/corefx#17318 |
@benaadams array[size] = item;
_size = size + 1; it may reuse r8d. inc r8d
mov dword ptr [rcx+24], r8d A couple of bytes less :) |
@omariom for struct containing refs and classes |
Commit migrated from dotnet/coreclr@7d9e017
List Add and Clear are warmspots in Kestrel
This is a mild tweak to
Add
however as what's being cleared is a list ofGCHandle
structsClear
is a significant win.@jkotas is this a valid use of
JitHelpers.ContainsReferences<T>()
?/cc @stephentoub