-
Notifications
You must be signed in to change notification settings - Fork 4.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Potential CQ improvements in Dictionary<K,T> #8953
Comments
I do not think you need to worry about this. It is unpredictable behavior. |
I think it's an acceptable change. Yes, it's technically breaking, but such could would be extremely brittle and terribly suspect. |
Nice, didn't notice that one before. FWIW I have a plan to get rid of the remaining 2 range checks. |
A related item that may help with devirtualization is the default comparer can be a virtual type rather than an interface type? e.g. private static EqualityComparer<TKey> s_defaultComparer;
private IEqualityComparer<TKey> _comparer; EqualityComparer<TKey> defaultComparer = s_defaultComparer;
IEqualityComparer<TKey> comparer = _comparer;
int hashCode = (comparer?.GetHashCode(key) ?? defaultComparer.GetHashCode(key)) & 0x7FFFFFFF; Note: default string comparer is a bit of a special case as it switches type in use. |
Will also need to look more closely at the case where key is a ref type -- the above helps it somewhat but the loop still looks overly complex (at least when prejitting). Possibly the helper call is killing too much. Also odd that the loop body ends up non-contiguous (in both before & after). ; Assembly listing for method System.Collections.Generic.Dictionary`2[__Canon,__Canon][System.__Canon,System.__Canon]:FindEntry(ref):int:this
; AFTER
G_M14721_IG05:
mov ecx, dword ptr [rbp+8]
cmp r13d, ecx
jae G_M14721_IG14
movsxd rcx, r13d
lea rcx, [rcx+2*rcx]
lea rax, bword ptr [rbp+8*rcx+16]
mov bword ptr [rsp+20H], rax
cmp dword ptr [rax+16], r12d
jne SHORT G_M14721_IG07
mov r8, gword ptr [rbp+8*rcx+16]
mov gword ptr [rsp+28H], r8
mov rcx, r14
mov r11, qword ptr [r15+136]
test r11, r11
jne SHORT G_M14721_IG10
lea rdx, [(reloc)]
call CORINFO_HELP_RUNTIMEHANDLE_CLASS
mov r11, rax
mov r8, gword ptr [rsp+28H]
G_M14721_IG06:
mov rcx, rbx
mov rdx, r8
mov r8, rdi
cmp dword ptr [rcx], ecx
call qword ptr [r11]
test eax, eax
jne SHORT G_M14721_IG11
G_M14721_IG07:
mov rax, bword ptr [rsp+20H]
mov r13d, dword ptr [rax+20]
test r13d, r13d
jge SHORT G_M14721_IG05
...
G_M14721_IG10:
mov r8, gword ptr [rsp+28H]
jmp SHORT G_M14721_IG06 Seems like the helper call should be hoistable |
Have added caching to ; Assembly listing for method System.Collections.Generic.Dictionary`2[Char,__Canon][System.Char,System.__Canon]:TryInsert(char,ref,ubyte):bool:this
; AFTER
G_M62716_IG04:
cmp r10d, dword ptr [r15+8]
jae G_M62716_IG18
mov dword ptr [rsp+3CH], r10d
movsxd rdx, r10d
lea rdx, [rdx+2*rdx]
lea r11, bword ptr [r15+8*rdx+16]
cmp dword ptr [r11+8], r13d
jne SHORT G_M62716_IG05
mov bword ptr [rsp+20H], r11
movzx rdx, word ptr [r11+16]
mov r8d, r12d
mov rcx, r14
lea r11, [(reloc)]
cmp dword ptr [rcx], ecx
call qword ptr [r11]System.Collections.Generic.IEqualityComparer`1[Char][System.Char]:Equals(char,char):bool:this
test eax, eax
mov r11, bword ptr [rsp+20H]
jne SHORT G_M62716_IG07
G_M62716_IG05:
mov r9d, dword ptr [rsp+40H]
inc r9d
mov dword ptr [rsp+40H], r9d
mov r10d, dword ptr [r11+12]
mov edx, r10d
test edx, edx
mov r10d, edx
jge SHORT G_M62716_IG04
So may just look at updating |
Changes in case anyone else is interested in poking at this: master...AndyAyersMS:DictionaryHacks |
Null check comes from this bit of code in lower.cpp: GenTree* Lowering::LowerVirtualStubCall(GenTreeCall* call)
{
...
#ifdef _TARGET_64BIT_
// Non-tail calls: Jump Stubs are not taken into account by VM for mapping an AV into a NullRef
// exception. Therefore, JIT needs to emit an explicit null check. Note that Jit64 too generates
// an explicit null check.
//
// Tail calls: fgMorphTailCall() materializes null check explicitly and hence no need to emit
// null check.
// Non-64-bit: No need to null check the this pointer - the dispatch code will deal with this.
// The VM considers exceptions that occur in stubs on 64-bit to be not managed exceptions and
// it would be difficult to change this in a way so that it affects only the right stubs.
if (!call->IsTailCallViaHelper())
{
call->gtFlags |= GTF_CALL_NULLCHECK;
}
#endif Seems like the kind of thing that should be set in the importer or morph rather than implied by the call type, so that if it ever gets cleared we actually know it's not needed. As things are now the non-null assertion from the initial comparer call doesn't propagate here because assertion prop thinks this subsequent call doesn't need a null check. |
This is not accurate statement. There is nothing special about x64 that makes this hard. We have it for every other platform. It should be a matter of copying |
As in-function constraints don't currently exist dotnet/csharplang#905 Using the pattern that Vector uses if (typeof(TKey) == typeof(byte))
{
return (byte)(object)key0 == (byte)(object)key1;
}
else if (typeof(TKey) == typeof(sbyte))
{
return (sbyte)(object)key0 == (sbyte)(object)key1;
}
else if (typeof(TKey) == typeof(ushort))
{
return (ushort)(object)key0 == (ushort)(object)key1;
}
else if (typeof(TKey) == typeof(short))
{
return (short)(object)key0 == (short)(object)key1;
}
else if (typeof(TKey) == typeof(uint))
{
return (uint)(object)key0 == (uint)(object)key1;
}
else if (typeof(TKey) == typeof(int))
{
return (int)(object)key0 == (int)(object)key1;
}
else if (typeof(TKey) == typeof(ulong))
{
return (ulong)(object)key0 == (ulong)(object)key1;
}
else if (typeof(TKey) == typeof(long))
{
return (long)(object)key0 == (long)(object)key1;
}
else if (typeof(TKey) == typeof(IntPtr))
{
return (IntPtr)(object)key0 == (IntPtr)(object)key1;
}
else if (typeof(TKey) == typeof(UIntPtr))
{
return (UIntPtr)(object)key0 == (UIntPtr)(object)key1;
}
else if (typeof(TKey) == typeof(Guid))
{
return (Guid)(object)key0 == (Guid)(object)key1;
}
// ... to get no virtual calls for default comparer and the BCL valuetype primitives master...benaadams:dictionary; won't apply to generic structs though. FindEntry for a char key doesn't look too bad..? G_M38479_IG01:
4157 push r15
4156 push r14
4155 push r13
4154 push r12
57 push rdi
56 push rsi
55 push rbp
53 push rbx
4883EC58 sub rsp, 88
488BF1 mov rsi, rcx
G_M38479_IG02:
BFFFFFFFFF mov edi, -1
488B5E08 mov rbx, gword ptr [rsi+8]
4885DB test rbx, rbx
0F84AE000000 je G_M38479_IG10
488B4E20 mov rcx, gword ptr [rsi+32]
4885C9 test rcx, rcx
7512 jne SHORT G_M38479_IG03
0FB7EA movzx rbp, dx
8BD5 mov edx, ebp
8BCA mov ecx, edx
C1E110 shl ecx, 16
448BF2 mov r14d, edx
440BF1 or r14d, ecx
EB14 jmp SHORT G_M38479_IG04 ; non-default comparer
;G_M38479_IG03: ; non-default comparer
; 0FB7EA movzx rbp, dx
; 8BD5 mov edx, ebp
; 4C8D1D00000000 lea r11, [(reloc)]
; 3909 cmp dword ptr [rcx], ecx
; 41FF13 call qword ptr [r11]IEqualityComparer`1:GetHashCode(char):int:this
; 448BF0 mov r14d, eax
G_M38479_IG04:
4181E6FFFFFF7F and r14d, 0x7FFFFFFF
4C8B7E10 mov r15, gword ptr [rsi+16]
448B4308 mov r8d, dword ptr [rbx+8]
418BC6 mov eax, r14d
99 cdq
41F7F8 idiv edx:eax, r8d
413BD0 cmp edx, r8d
7377 jae SHORT G_M38479_IG12
4C63C2 movsxd r8, edx
428B5C8310 mov ebx, dword ptr [rbx+4*r8+16]
85DB test ebx, ebx
7C58 jl SHORT G_M38479_IG10
G_M38479_IG05:
413B5F08 cmp ebx, dword ptr [r15+8]
7365 jae SHORT G_M38479_IG12
4C63C3 movsxd r8, ebx
4F8D0440 lea r8, [r8+2*r8]
4F8D64C710 lea r12, bword ptr [r15+8*r8+16]
4539742408 cmp dword ptr [r12+8], r14d
7532 jne SHORT G_M38479_IG08
410FB7542410 movzx rdx, word ptr [r12+16]
488B4E20 mov rcx, gword ptr [rsi+32]
4885C9 test rcx, rcx
750C jne SHORT G_M38479_IG06 ; non-default comparer
3BD5 cmp edx, ebp
410F94C5 sete r13b
450FB6ED movzx r13, r13b
EB12 jmp SHORT G_M38479_IG07
;G_M38479_IG06: ; non-default comparer
; 448BC5 mov r8d, ebp
; 4C8D1D00000000 lea r11, [(reloc)]
; 3909 cmp dword ptr [rcx], ecx
; 41FF13 call qword ptr [r11]IEqualityComparer`1:Equals(char,char):bool:this
; 448BE8 mov r13d, eax
G_M38479_IG07:
4585ED test r13d, r13d
750B jne SHORT G_M38479_IG09
G_M38479_IG08:
418B5C240C mov ebx, dword ptr [r12+12]
85DB test ebx, ebx
7DAC jge SHORT G_M38479_IG05
EB02 jmp SHORT G_M38479_IG10
G_M38479_IG09:
8BFB mov edi, ebx
G_M38479_IG10:
8BC7 mov eax, edi
G_M38479_IG11:
4883C458 add rsp, 88
5B pop rbx
5D pop rbp
5E pop rsi
5F pop rdi
415C pop r12
415D pop r13
415E pop r14
415F pop r15
C3 ret
G_M38479_IG12:
E800000000 call CORINFO_HELP_RNGCHKFAIL
CC int3 |
TIL: You break dictionary, you break every test 🙀 |
Been looking for a good set of perf tests. The ones from CoreFX are a plausible starting point but they may be too noisy to use in low-level CQ tuning. Need to play around with them some more. On that note, consecutive integer keys are probably best-case for collision avoidance. Would be nice to have test inputs that produce "representative" collision chains, if such a thing can be defined. benchmarkgames/k-nucleotide is ostensibly a hash table perf test so also warrants a look. Also want to look deeper at what the actual cost of the virtual stub dispatch is relative to everything else to see whether there's any hope that shaving off a handful of instructions (which is what we'd see on a first-bucket hit/miss w/o being able to inline away the hash & equals) will show up as a perf improvement. Since VSD is stateful, this requires some care. |
VSD looks like a double call for monomorphic sites and could be reduced to a single call? https://github.com/dotnet/coreclr/issues/8819 |
@AndyAyersMS is this resolved with dotnet/coreclr#15419? |
Could very well be. It is probably worth looking at the new code gen to see if the minor issues mentioned above are cleared up (or now perhaps more prominent ). I'm be happy to do it, if nobody else gets to it in the next few days. |
@AndyAyersMS is this fixed now do you know? |
Perf is definitely improved from when I opened the issue. There are still things here we can improve further but we should take a fresh look. So we might as well close this issue and open a new one when we do the analysis and come up with ideas. |
Dictionary<K,T>
has some inefficient CQ in key inner loops, in part because the jit has to assume that calls to the comparer may side effect the dictionary itself. For instance inFindEntry
's search loop:the jit must assume that both
comparer
andentries
are modified by the call tocomparer.Equals
. This yields native code like the following:So at
IG04
, the jit re-fetchesentries
, then its length, and then does a redundant bounds check.There is also some register shuffling before the call to the comparer; this is caused in part by the fact that the comparer object is also be re-fetched from the dictionary before the call.
These issues are unlikely to go away with proposed/conceptual changes to allow default comparer dictionaries to devirtualize, since the possibility of a non-default comparer with unknown effects will still be present. One could perhaps avoid this by unswitching on the "is default" across the entire loop.
Manually caching the comparer and equals array into locals as in the following:
avoids these issues and produces tighter inner loop code (statically, 6 instructions less all told)
However this alters dictionary behavior in cases where the comparer side effects the dictionary. I am not sure if such behaviors are worth preserving, and it's not clear they are at all useful, but they are currently permitted.
If this sort of change seems reasonable, I'll go and apply it to the other methods that use the comparer, and do some benchmarking to see what the overall impact is on performance.
Still need to look at why the null check on the comparer remains; it should now be fully redundant.
@stephentoub @jkotas @benaadams @dotnet/jit-contrib thoughts?
The text was updated successfully, but these errors were encountered: