Use `Unsafe.BitCast` for `Int128` ↔`UInt128` operators #104506

xtqqczze · 2024-07-06T12:16:39Z

Diffs show increased inlining and tail calls.

MihuBot/runtime-utils#478
MihuBot/runtime-utils#479

dotnet-policy-service · 2024-07-06T12:17:11Z

Tagging subscribers to this area: @dotnet/area-system-numerics
See info in area-owners.md if you want to be subscribed.

tannergooding · 2024-07-06T18:24:11Z

This in general LGTM, but it might be interesting to understand why promotion didn't handle this given it's a simple struct containing 2x ulong fields.

CC. @jakobbotsch

jakobbotsch · 2024-07-07T14:42:51Z

This in general LGTM, but it might be interesting to understand why promotion didn't handle this given it's a simple struct containing 2x ulong fields.

CC. @jakobbotsch

I would need some concrete cases to look at. As it is this looks like a size wise regression. @xtqqczze do you have concrete benchmarks showing improvements from the change?

stephentoub · 2024-07-09T15:27:44Z

As it is this looks like a size wise regression

Is that not just because the system chose to inline more things as a result of this? If BitCast had been available when this code was first written, presumably we'd have chosen to use it then.

jakobbotsch · 2024-07-09T15:44:13Z

Is that not just because the system chose to inline more things as a result of this? If BitCast had been available when this code was first written, presumably we'd have chosen to use it then.

Yeah, definitely looks like there are different inlining decisions here (both new inlines we perform and cases where we no longer inline, it looks like).

Also looks like the second diff is a larger size-wise improvement than the first diff is a regression, so in that sense this isn't actually a size-wise regression (but as you said, it's hard to compare in the face of different inlining decisions).

Either way I'd be happy to look at concrete cases if there are any, but I wasn't immediately able to identify anything that looks related to deficiencies in promotion in the diffs. If you and @tannergooding prefer BitCast over the explicit field accesses then certainly that's fine with me. I don't think the JIT should have any problem handling either pattern.

tannergooding · 2024-07-09T15:52:32Z

Is that not just because the system chose to inline more things as a result of this?

That's what it looks like to me. Size diffs are always a little wonky for xarch due to the variable sized encoding and differing encoding cost for some sizes or registers. LSRA choosing to use R8 instead of RAX can lead to an additional byte, for example.

In this case, it looks like we eliminate code, which allows inlining to kick in and that changes register preferences and some operation sizes causes the size to increase even though the number of instructions often decreases. For example in System.Int128:TryFormat(System.Span1[ushort],byref,System.ReadOnlySpan1[ushort],System.IFormatProvider):ubyte:this,

-       mov      rsi, rax
-       or       rsi, 1
-       lzcnt    rsi, rsi
-       xor      esi, 63
-       movsxd   rsi, esi
+       mov      rdi, rax
+       or       rdi, 1
+       lzcnt    rdi, rdi
+       xor      edi, 63
+       movsxd   rdi, edi
+       mov      rsi, 0xD1FFAB1E      ; static handle
+       movzx    rdi, byte  ptr [rdi+rsi]
+       mov      esi, edi
        mov      rdx, 0xD1FFAB1E      ; static handle
-       movzx    rsi, byte  ptr [rsi+rdx]
-       mov      edx, esi
-       mov      rcx, 0xD1FFAB1E      ; static handle
-       cmp      rax, qword ptr [rcx+8*rdx]
-       setb     dl
-       movzx    rdx, dl
-       sub      esi, edx
-       lea      eax, [rsi+0x14]
+       cmp      rax, qword ptr [rdx+8*rsi]
+       setb     sil
+       movzx    rsi, sil
+       sub      edi, esi
+       lea      eax, [rdi+0x14]
        jmp      SHORT G_M10567_IG08
-						;; size=106 bbWeight=0.50 PerfScore 9.25
+						;; size=108 bbWeight=0.50 PerfScore 9.25

This code is 2 bytes bigger, but it actually hasn't fundamentally changed and if you were to replace the registers used with abstract names like reg1 you'd find they're identical. The reason the latter is bigger is because LSRA decided to use RSI for the "second register" rather than RDX, this requires an extra byte to encode the setb sil and subsequent movzx rsi, sil.

We see quite a lot of diffs that are in this realm and it'd probably be beneficial to track the number of instructions in addition to the size so we can get a better view over what's actually changed.

Later on in the method we get 96 extra bytes that are "real" due to the call to UInt128.DivRem being inlined and thus us having 3 calls to op_Division, op_Multiply, and op_Subtraction instead
-- Notably there's also a call to the System.ValueTuple constructor which seems non-ideal given its just setting two fields
-- Ideally we'd also optimize DivRem to avoid needing the multiply/subtract. The main division algorithm can just return the remainder directly and avoid needing the extra work

tannergooding · 2024-07-09T15:55:56Z

If you and @tannergooding prefer BitCast over the explicit field accesses then certainly that's fine with me. I don't think the JIT should have any problem handling either pattern.

I think I prefer BitCast just because it's less IL and gives a small inlining profitability boost due to being intrinsic. Had just seemed a little odd that there were other improvements beyond just inlining.

stephentoub · 2024-07-09T15:59:05Z

Thanks.

…t#104506)" This reverts commit 01dbf51.

Use Unsafe.BitCast for Int128 ↔UInt128 operators

82df818

dotnet-issue-labeler bot added the area-System.Numerics label Jul 6, 2024

dotnet-policy-service bot added the community-contribution Indicates that the PR has been added by a community member label Jul 6, 2024

build-analysis bot mentioned this pull request Jul 6, 2024

The "RestoreTask" task returned false but did not log an error dotnet/dnceng#3100

Open

3 tasks

stephentoub approved these changes Jul 7, 2024

View reviewed changes

stephentoub assigned tannergooding Jul 9, 2024

stephentoub merged commit 01dbf51 into dotnet:main Jul 9, 2024
146 checks passed

tannergooding approved these changes Jul 9, 2024

View reviewed changes

matouskozak added a commit to matouskozak/runtime that referenced this pull request Jul 11, 2024

Revert "Use Unsafe.BitCast for Int128 ↔UInt128 operators (dotne…

06e0997

…t#104506)" This reverts commit 01dbf51.

LoopedBard3 mentioned this pull request Jul 11, 2024

[Perf] Linux/arm64: 1 Regression in System.Tests.Perf_Int128 #104739

Closed

github-actions bot locked and limited conversation to collaborators Aug 9, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Use `Unsafe.BitCast` for `Int128` ↔`UInt128` operators #104506

Use `Unsafe.BitCast` for `Int128` ↔`UInt128` operators #104506

xtqqczze commented Jul 6, 2024

dotnet-policy-service bot commented Jul 6, 2024

tannergooding commented Jul 6, 2024

jakobbotsch commented Jul 7, 2024

stephentoub commented Jul 9, 2024

jakobbotsch commented Jul 9, 2024

tannergooding commented Jul 9, 2024 •

edited by stephentoub

Loading

tannergooding commented Jul 9, 2024

stephentoub commented Jul 9, 2024

Use Unsafe.BitCast for Int128 ↔UInt128 operators #104506

Use Unsafe.BitCast for Int128 ↔UInt128 operators #104506

Conversation

xtqqczze commented Jul 6, 2024

dotnet-policy-service bot commented Jul 6, 2024

tannergooding commented Jul 6, 2024

jakobbotsch commented Jul 7, 2024

stephentoub commented Jul 9, 2024

jakobbotsch commented Jul 9, 2024

tannergooding commented Jul 9, 2024 • edited by stephentoub Loading

tannergooding commented Jul 9, 2024

stephentoub commented Jul 9, 2024

Use `Unsafe.BitCast` for `Int128` ↔`UInt128` operators #104506

Use `Unsafe.BitCast` for `Int128` ↔`UInt128` operators #104506

tannergooding commented Jul 9, 2024 •

edited by stephentoub

Loading