Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use Unsafe.BitCast for Int128UInt128 operators #104506

Merged
merged 1 commit into from
Jul 9, 2024

Conversation

xtqqczze
Copy link
Contributor

@xtqqczze xtqqczze commented Jul 6, 2024

Diffs show increased inlining and tail calls.

MihuBot/runtime-utils#478
MihuBot/runtime-utils#479

@dotnet-policy-service dotnet-policy-service bot added the community-contribution Indicates that the PR has been added by a community member label Jul 6, 2024
Copy link
Contributor

Tagging subscribers to this area: @dotnet/area-system-numerics
See info in area-owners.md if you want to be subscribed.

@tannergooding
Copy link
Member

This in general LGTM, but it might be interesting to understand why promotion didn't handle this given it's a simple struct containing 2x ulong fields.

CC. @jakobbotsch

@jakobbotsch
Copy link
Member

This in general LGTM, but it might be interesting to understand why promotion didn't handle this given it's a simple struct containing 2x ulong fields.

CC. @jakobbotsch

I would need some concrete cases to look at. As it is this looks like a size wise regression. @xtqqczze do you have concrete benchmarks showing improvements from the change?

@stephentoub
Copy link
Member

As it is this looks like a size wise regression

Is that not just because the system chose to inline more things as a result of this? If BitCast had been available when this code was first written, presumably we'd have chosen to use it then.

@jakobbotsch
Copy link
Member

Is that not just because the system chose to inline more things as a result of this? If BitCast had been available when this code was first written, presumably we'd have chosen to use it then.

Yeah, definitely looks like there are different inlining decisions here (both new inlines we perform and cases where we no longer inline, it looks like).

Also looks like the second diff is a larger size-wise improvement than the first diff is a regression, so in that sense this isn't actually a size-wise regression (but as you said, it's hard to compare in the face of different inlining decisions).

Either way I'd be happy to look at concrete cases if there are any, but I wasn't immediately able to identify anything that looks related to deficiencies in promotion in the diffs. If you and @tannergooding prefer BitCast over the explicit field accesses then certainly that's fine with me. I don't think the JIT should have any problem handling either pattern.

@tannergooding
Copy link
Member

tannergooding commented Jul 9, 2024

Is that not just because the system chose to inline more things as a result of this?

That's what it looks like to me. Size diffs are always a little wonky for xarch due to the variable sized encoding and differing encoding cost for some sizes or registers. LSRA choosing to use R8 instead of RAX can lead to an additional byte, for example.

In this case, it looks like we eliminate code, which allows inlining to kick in and that changes register preferences and some operation sizes causes the size to increase even though the number of instructions often decreases. For example in System.Int128:TryFormat(System.Span1[ushort],byref,System.ReadOnlySpan1[ushort],System.IFormatProvider):ubyte:this,

-       mov      rsi, rax
-       or       rsi, 1
-       lzcnt    rsi, rsi
-       xor      esi, 63
-       movsxd   rsi, esi
+       mov      rdi, rax
+       or       rdi, 1
+       lzcnt    rdi, rdi
+       xor      edi, 63
+       movsxd   rdi, edi
+       mov      rsi, 0xD1FFAB1E      ; static handle
+       movzx    rdi, byte  ptr [rdi+rsi]
+       mov      esi, edi
        mov      rdx, 0xD1FFAB1E      ; static handle
-       movzx    rsi, byte  ptr [rsi+rdx]
-       mov      edx, esi
-       mov      rcx, 0xD1FFAB1E      ; static handle
-       cmp      rax, qword ptr [rcx+8*rdx]
-       setb     dl
-       movzx    rdx, dl
-       sub      esi, edx
-       lea      eax, [rsi+0x14]
+       cmp      rax, qword ptr [rdx+8*rsi]
+       setb     sil
+       movzx    rsi, sil
+       sub      edi, esi
+       lea      eax, [rdi+0x14]
        jmp      SHORT G_M10567_IG08
-						;; size=106 bbWeight=0.50 PerfScore 9.25
+						;; size=108 bbWeight=0.50 PerfScore 9.25

This code is 2 bytes bigger, but it actually hasn't fundamentally changed and if you were to replace the registers used with abstract names like reg1 you'd find they're identical. The reason the latter is bigger is because LSRA decided to use RSI for the "second register" rather than RDX, this requires an extra byte to encode the setb sil and subsequent movzx rsi, sil.

We see quite a lot of diffs that are in this realm and it'd probably be beneficial to track the number of instructions in addition to the size so we can get a better view over what's actually changed.

Later on in the method we get 96 extra bytes that are "real" due to the call to UInt128.DivRem being inlined and thus us having 3 calls to op_Division, op_Multiply, and op_Subtraction instead
-- Notably there's also a call to the System.ValueTuple constructor which seems non-ideal given its just setting two fields
-- Ideally we'd also optimize DivRem to avoid needing the multiply/subtract. The main division algorithm can just return the remainder directly and avoid needing the extra work

@tannergooding
Copy link
Member

If you and @tannergooding prefer BitCast over the explicit field accesses then certainly that's fine with me. I don't think the JIT should have any problem handling either pattern.

I think I prefer BitCast just because it's less IL and gives a small inlining profitability boost due to being intrinsic. Had just seemed a little odd that there were other improvements beyond just inlining.

@stephentoub
Copy link
Member

Thanks.

@stephentoub stephentoub merged commit 01dbf51 into dotnet:main Jul 9, 2024
146 checks passed
matouskozak added a commit to matouskozak/runtime that referenced this pull request Jul 11, 2024
@github-actions github-actions bot locked and limited conversation to collaborators Aug 9, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-System.Numerics community-contribution Indicates that the PR has been added by a community member
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants