Arm64 SVE: Optimise zero/allbits vectors the same as masks #115566
base: main
Fixes dotnet#114443

* IsVectorZero() should allow for all-zero vectors and false masks that have been converted to vectors.
* IsVectorAllBitsSet() should allow for all-bits-set vectors and true masks that have been converted to vectors.
* IsMaskZero() should allow for false masks and all-zero vectors that have been converted to masks.
* IsMaskAllBitsSet() should allow for true masks and all-bits-set vectors that have been converted to masks.

In addition:

* Fix up all the errors caused by these changes.
* Add a bunch of asmcheck tests.
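The four rules in the list above can be sketched as a tiny C# model. This is a hypothetical illustration, not the JIT's code: a "node" here is just an optional conversion name plus a constant name, and every name below is made up for the example.

```csharp
using System;

// Hypothetical toy model of the IR shapes named in the PR description.
// A node is a constant, optionally wrapped in exactly one conversion:
//   conv  : "" (no wrapper), "MaskToVector" or "VectorToMask"
//   value : "VectorZero", "VectorAllBitsSet", "MaskFalse" or "MaskTrue"
// These names are illustrative only; they are not the JIT's API.

// Zero vector, or a false mask that has been converted to a vector.
bool IsVectorZero(string conv, string value) =>
    (conv == "" && value == "VectorZero") ||
    (conv == "MaskToVector" && value == "MaskFalse");

// All-bits-set vector, or a true mask that has been converted to a vector.
bool IsVectorAllBitsSet(string conv, string value) =>
    (conv == "" && value == "VectorAllBitsSet") ||
    (conv == "MaskToVector" && value == "MaskTrue");

// False mask, or an all-zero vector that has been converted to a mask.
bool IsMaskZero(string conv, string value) =>
    (conv == "" && value == "MaskFalse") ||
    (conv == "VectorToMask" && value == "VectorZero");

// True mask, or an all-bits-set vector that has been converted to a mask.
bool IsMaskAllBitsSet(string conv, string value) =>
    (conv == "" && value == "MaskTrue") ||
    (conv == "VectorToMask" && value == "VectorAllBitsSet");

Console.WriteLine(IsVectorZero("MaskToVector", "MaskFalse"));      // True
Console.WriteLine(IsMaskAllBitsSet("VectorToMask", "VectorZero")); // False
```

In this sketch each check looks through at most one conversion of the opposite kind, which is all the description specifies.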
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
Currently there are some issues around non-faulting LoadVectors, and I need to check I've not created any code size regressions.
So, this should handle the case in #114286?
Needs more investigation, but it's definitely a step in the right direction.
Rebased and almost everything is looking good now. Just need to fix
The predicate optimisations are failing, but I think in a good way. Consider the following; note that a vector (not a mask) is returned from the function:
static Vector<short> ZipLow()
{
    return Sve.ZipLow(Vector<short>.Zero, Sve.CreateTrueMaskInt16());
}
HEAD, Tier 0:
HEAD:
With this PR:
Ideally, the optimisation should build a cost model covering both the vector version and the predicate version, taking into account all input arguments and all uses of the result. I feel doing that would add too much scope to this PR (especially given the code being produced here is already better than HEAD). For this PR, I suggest disabling the optimisation by removing the call to
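A cost model along the lines suggested above could look roughly like this C# sketch. Everything in it is an assumption for illustration: the function `Cost`, its parameters, and the unit costs, which are instruction counts loosely read off sequences in the diffs on this page (`movi`/`mvni` or `pfalse`/`ptrue` for constants, `mov z.T, p/z, #1` to turn a mask into a vector, `ptrue` plus `cmpne ..., #0` to turn a vector into a mask). Because this PR treats zero/all-bits-set vectors and false/true masks as interchangeable constants, constant operands are charged the same in either form.

```csharp
using System;

// Hypothetical unit costs (instruction counts), loosely based on sequences in the diffs:
const int ConstCost = 1;         // movi/mvni (vector) or pfalse/ptrue (mask)
const int MaskToVectorCost = 1;  // mov z.T, p/z, #1
const int VectorToMaskCost = 2;  // ptrue + cmpne z.T, #0
const int OpCost = 1;            // the operation itself, in either form

// Estimated cost of emitting one operation in predicate form or in vector form.
//   constOperands  : zero/all-bits constants, materialised directly in whichever form is needed
//   vectorOperands : inputs already computed as vectors
//   maskOperands   : inputs already computed as masks
//   vectorUses / maskUses : consumers of the result, by the form they need
// Each conversion is charged once per operand/use; a real model would share them (e.g. via CSE).
int Cost(bool predicateForm, int constOperands, int vectorOperands, int maskOperands,
         int vectorUses, int maskUses)
{
    int cost = OpCost + constOperands * ConstCost;
    if (predicateForm)
    {
        cost += vectorOperands * VectorToMaskCost; // vector inputs must become masks
        cost += vectorUses * MaskToVectorCost;     // vector consumers need the result converted back
    }
    else
    {
        cost += maskOperands * MaskToVectorCost;   // mask inputs must become vectors
        cost += maskUses * VectorToMaskCost;       // mask consumers need the result converted
    }
    return cost;
}

// ZipLow(Vector<short>.Zero, CreateTrueMaskInt16()) whose result is returned as a vector:
int vectorCost = Cost(false, constOperands: 2, vectorOperands: 0, maskOperands: 0, vectorUses: 1, maskUses: 0);
int predicateCost = Cost(true, constOperands: 2, vectorOperands: 0, maskOperands: 0, vectorUses: 1, maskUses: 0);
Console.WriteLine($"vector={vectorCost}, predicate={predicateCost}"); // vector=3, predicate=4
```

In this toy model the vector form comes out cheaper for the ZipLow example because the only use of the result wants a vector; an operation with mask inputs and a mask consumer tips the other way.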
What are the diffs for this commit alone?
Prior to removing calls to
Diffs are based on 2,631,375 contexts (1,092,633 MinOpts, 1,538,742 FullOpts).
Overall (-334,028 bytes)
MinOpts (-31,028 bytes)
FullOpts (-303,000 bytes)
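The headline numbers in the diffs below can be reproduced with simple arithmetic. The sketch uses hypothetical helper names; it recomputes the percentage in each per-method header as the byte delta divided by the HEAD size, the unwind "Function Length" (on Arm64 the code size in bytes divided by 4; every A64 instruction is 4 bytes, so it also matches the instruction count), and the Overall total as MinOpts plus FullOpts.

```csharp
using System;
using System.Globalization;

// Hypothetical helpers reproducing numbers printed in the jit-diff report.

// Percentage in each per-method header: (diff - base) / base, rounded to two places.
double PercentDelta(int baseBytes, int diffBytes) =>
    Math.Round(100.0 * (diffBytes - baseBytes) / baseBytes, 2);

// Arm64 unwind "Function Length" counts 4-byte units of code.
int FunctionLengthUnits(int codeBytes) => codeBytes / 4;

string F2(double d) => d.ToString("F2", CultureInfo.InvariantCulture);

int minOptsDelta = -31_028, fullOptsDelta = -303_000;

Console.WriteLine(F2(PercentDelta(308, 304)));   // -1.30 (StrCmp:SveTail)
Console.WriteLine(F2(PercentDelta(152, 160)));   // 5.26  (StrLen:SveStrLen)
Console.WriteLine(FunctionLengthUnits(304));      // 76
Console.WriteLine(minOptsDelta + fullOptsDelta);  // -334028 (Overall)
```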
Example diffs
benchmarks.run.linux.arm64.checked.mch
-4 (-1.30%) : 16114.dasm - SveBenchmarks.StrCmp:SveTail():long:this (FullOpts)
@@ -7,33 +7,33 @@
; No matching PGO data
; Final local variable assignments
;
-; V00 this [V00,T09] ( 5, 5 ) ref -> x0 this class-hnd single-def <SveBenchmarks.StrCmp>
-; V01 loc0 [V01,T05] ( 3, 9 ) mask -> p0 single-def <System.Numerics.Vector`1[byte]>
+; V00 this [V00,T08] ( 5, 5 ) ref -> x0 this class-hnd single-def <SveBenchmarks.StrCmp>
+;* V01 loc0 [V01,T22] ( 0, 0 ) mask -> zero-ref single-def <System.Numerics.Vector`1[byte]>
;* V02 loc1 [V02 ] ( 0, 0 ) simd16 -> zero-ref <System.Numerics.Vector`1[byte]>
;* V03 loc2 [V03 ] ( 0, 0 ) simd16 -> zero-ref <System.Numerics.Vector`1[byte]>
;* V04 loc3 [V04 ] ( 0, 0 ) simd16 -> zero-ref <System.Numerics.Vector`1[byte]>
; V05 loc4 [V05,T00] ( 12, 41.50) int -> x1
-; V06 loc5 [V06,T13] ( 3, 6 ) int -> x2 single-def
-; V07 loc6 [V07,T17] ( 3, 5 ) long -> x4
-; V08 loc7 [V08,T18] ( 3, 5 ) long -> x6
+; V06 loc5 [V06,T12] ( 3, 6 ) int -> x2 single-def
+; V07 loc6 [V07,T16] ( 3, 5 ) long -> x4
+; V08 loc7 [V08,T17] ( 3, 5 ) long -> x6
; V09 loc8 [V09 ] ( 1, 0.50) ref -> [fp+0x18] must-init pinned class-hnd single-def <byte[]>
; V10 loc9 [V10 ] ( 1, 0.50) ref -> [fp+0x10] must-init pinned class-hnd single-def <byte[]>
-; V11 loc10 [V11,T08] ( 2, 8 ) ubyte -> x8
+; V11 loc10 [V11,T07] ( 2, 8 ) ubyte -> x8
;# V12 OutArgs [V12 ] ( 1, 1 ) struct ( 0) [sp+0x00] do-not-enreg[XS] addr-exposed "OutgoingArgSpace" <Empty>
-; V13 tmp1 [V13,T15] ( 5, 5 ) ref -> x4 class-hnd single-def "dup spill" <byte[]>
-; V14 tmp2 [V14,T16] ( 5, 5 ) ref -> x6 class-hnd single-def "dup spill" <byte[]>
-; V15 tmp3 [V15,T19] ( 2, 2 ) long -> x4 "Cast away GC"
-; V16 tmp4 [V16,T20] ( 2, 2 ) long -> x6 "Cast away GC"
+; V13 tmp1 [V13,T14] ( 5, 5 ) ref -> x4 class-hnd single-def "dup spill" <byte[]>
+; V14 tmp2 [V14,T15] ( 5, 5 ) ref -> x6 class-hnd single-def "dup spill" <byte[]>
+; V15 tmp3 [V15,T18] ( 2, 2 ) long -> x4 "Cast away GC"
+; V16 tmp4 [V16,T19] ( 2, 2 ) long -> x6 "Cast away GC"
; V17 tmp5 [V17,T01] ( 3, 24 ) ref -> x2 "arr expr"
; V18 tmp6 [V18,T02] ( 3, 24 ) ref -> x6 "arr expr"
-;* V19 tmp7 [V19,T21] ( 0, 0 ) ref -> zero-ref single-def "arr expr"
-;* V20 tmp8 [V20,T22] ( 0, 0 ) ref -> zero-ref single-def "arr expr"
-; V21 cse0 [V21,T06] ( 3, 8.50) int -> x2 "CSE #11: aggressive"
-; V22 cse1 [V22,T07] ( 3, 8.50) int -> x4 "CSE #14: aggressive"
-; V23 cse2 [V23,T14] ( 3, 6 ) int -> x7 "CSE #07: aggressive"
-; V24 cse3 [V24,T12] ( 4, 6.50) int -> x0 "CSE #06: aggressive"
-; V25 cse4 [V25,T10] ( 4, 6.50) ref -> x3 "CSE #01: aggressive"
-; V26 cse5 [V26,T11] ( 4, 6.50) ref -> x5 "CSE #03: aggressive"
+;* V19 tmp7 [V19,T20] ( 0, 0 ) ref -> zero-ref single-def "arr expr"
+;* V20 tmp8 [V20,T21] ( 0, 0 ) ref -> zero-ref single-def "arr expr"
+; V21 cse0 [V21,T05] ( 3, 8.50) int -> x2 "CSE #11: aggressive"
+; V22 cse1 [V22,T06] ( 3, 8.50) int -> x4 "CSE #14: aggressive"
+; V23 cse2 [V23,T13] ( 3, 6 ) int -> x7 "CSE #07: aggressive"
+; V24 cse3 [V24,T11] ( 4, 6.50) int -> x0 "CSE #06: aggressive"
+; V25 cse4 [V25,T09] ( 4, 6.50) ref -> x3 "CSE #01: aggressive"
+; V26 cse5 [V26,T10] ( 4, 6.50) ref -> x5 "CSE #03: aggressive"
; V27 cse6 [V27,T03] ( 3, 12 ) long -> x4 "CSE #08: aggressive"
; V28 cse7 [V28,T04] ( 3, 12 ) long -> x8 "CSE #05: aggressive"
;
@@ -46,7 +46,6 @@ G_M892_IG01: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, n
;; size=12 bbWeight=1 PerfScore 2.50
G_M892_IG02: ; bbWeight=1, gcrefRegs=0001 {x0}, byrefRegs=0000 {}, byref, isz
; gcrRegs +[x0]
- ptrue p0.b
mov w1, wzr
cntb x2, all
ldr x3, [x0, #0x10]
@@ -57,7 +56,7 @@ G_M892_IG02: ; bbWeight=1, gcrefRegs=0001 {x0}, byrefRegs=0000 {}, byref,
ldr w6, [x5, #0x08]
cmp w4, w6
bne G_M892_IG11
- ;; size=36 bbWeight=1 PerfScore 18.00
+ ;; size=32 bbWeight=1 PerfScore 16.00
G_M892_IG03: ; bbWeight=0.50, gcrefRegs=0029 {x0 x3 x5}, byrefRegs=0000 {}, byref, isz
mov x4, x3
; gcrRegs +[x4]
@@ -99,14 +98,14 @@ G_M892_IG07: ; bbWeight=1, gcrefRegs=0029 {x0 x3 x5}, byrefRegs=0000 {},
G_M892_IG08: ; bbWeight=4, gcrefRegs=0028 {x3 x5}, byrefRegs=0000 {}, byref, isz
sxtw x8, w1
add x9, x4, x8
+ ptrue p0.b
ld1b { z16.b }, p0/z, [x9]
add x8, x6, x8
ld1b { z17.b }, p0/z, [x8]
- ptrue p1.b
- cmpne p1.b, p1/z, z16.b, z17.b
- mov z16.b, p1/z, #1
- ptrue p1.b
- uaddv d16, p1, z16.b
+ cmpne p0.b, p0/z, z16.b, z17.b
+ mov z16.b, p0/z, #1
+ ptrue p0.b
+ uaddv d16, p0, z16.b
umov x8, v16.d[0]
uxtb w8, w8
cmp w8, #0
@@ -169,7 +168,7 @@ G_M892_IG15: ; bbWeight=0, gcVars=0000000000000000 {}, gcrefRegs=0000 {},
brk #0
;; size=8 bbWeight=0 PerfScore 0.00
-; Total bytes of code 308, prolog size 12, PerfScore 259.00, instruction count 77, allocated bytes for code 308 (MethodHash=5bfdfc83) for method SveBenchmarks.StrCmp:SveTail():long:this (FullOpts)
+; Total bytes of code 304, prolog size 12, PerfScore 257.00, instruction count 76, allocated bytes for code 304 (MethodHash=5bfdfc83) for method SveBenchmarks.StrCmp:SveTail():long:this (FullOpts)
; ============================================================
Unwind Info:
@@ -180,7 +179,7 @@ Unwind Info:
E bit : 0
X bit : 0
Vers : 0
- Function Length : 77 (0x0004d) Actual length = 308 (0x000134)
+ Function Length : 76 (0x0004c) Actual length = 304 (0x000130)
---- Epilog scopes ----
---- Scope 0
Epilog Start Offset : 3523193630 (0xd1ffab1e) Actual offset = 3523193630 (0xd1ffab1e) Offset from main function begin = 3523193630 (0xd1ffab1e)
-4 (-1.05%) : 26115.dasm - SveBenchmarks.StrIndexOf:SveTail():int:this (FullOpts)
@@ -21,14 +21,13 @@
;# V10 OutArgs [V10 ] ( 1, 1 ) struct ( 0) [sp+0x00] do-not-enreg[XS] addr-exposed "OutgoingArgSpace" <Empty>
; V11 tmp1 [V11,T05] ( 5, 8 ) ref -> x3 class-hnd single-def "dup spill" <char[]>
;* V12 tmp2 [V12 ] ( 0, 0 ) ushort -> zero-ref "Inlining Arg"
-;* V13 tmp3 [V13 ] ( 0, 0 ) simd16 -> zero-ref "Inlining Arg" <System.Numerics.Vector`1[short]>
-; V14 tmp4 [V14,T11] ( 2, 2 ) long -> x3 "Cast away GC"
-; V15 tmp5 [V15,T01] ( 3, 24 ) ref -> x3 "arr expr"
-; V16 cse0 [V16,T08] ( 3, 6 ) int -> x4 "CSE #07: aggressive"
-; V17 cse1 [V17,T03] ( 5, 10.25) int -> x0 "CSE #02: aggressive"
-; V18 cse2 [V18,T07] ( 3, 6 ) ref -> x2 "CSE #06: aggressive"
-; V19 cse3 [V19,T04] ( 4, 10 ) int -> x5 "CSE #05: aggressive"
-; V20 cse4 [V20,T10] ( 2, 4.25) mask -> p0 hoist "CSE #03: aggressive"
+; V13 tmp3 [V13,T11] ( 2, 2 ) long -> x3 "Cast away GC"
+; V14 tmp4 [V14,T01] ( 3, 24 ) ref -> x3 "arr expr"
+; V15 cse0 [V15,T08] ( 3, 6 ) int -> x4 "CSE #07: aggressive"
+; V16 cse1 [V16,T03] ( 5, 10.25) int -> x0 "CSE #02: aggressive"
+; V17 cse2 [V17,T07] ( 3, 6 ) ref -> x2 "CSE #06: aggressive"
+; V18 cse3 [V18,T04] ( 4, 10 ) int -> x5 "CSE #05: aggressive"
+; V19 cse4 [V19,T10] ( 2, 4.25) mask -> p0 hoist "CSE #03: aggressive"
;
; Lcl frame size = 16
@@ -62,14 +61,13 @@ G_M34028_IG04: ; bbWeight=0.50, gcrefRegs=0005 {x0 x2}, byrefRegs=0000 {}
G_M34028_IG05: ; bbWeight=1, gcrefRegs=0005 {x0 x2}, byrefRegs=0000 {}, byref, isz
ldrh w4, [x0, #0x14]
dup v16.8h, w4
- ptrue p0.h
- mov z17.h, p0/z, #1
+ mvni v17.4s, #0
ldr w0, [x0, #0x10]
; gcrRegs -[x0]
cnth x5, all
cmp w0, w5
ble G_M34028_IG10
- ;; size=32 bbWeight=1 PerfScore 15.50
+ ;; size=28 bbWeight=1 PerfScore 12.00
G_M34028_IG06: ; bbWeight=0.25, gcrefRegs=0004 {x2}, byrefRegs=0000 {}, byref
ptrue p0.h
cmpne p0.h, p0/z, z17.h, #0
@@ -177,7 +175,7 @@ G_M34028_IG18: ; bbWeight=0, gcVars=0000000000000000 {}, gcrefRegs=0000 {
brk #0
;; size=8 bbWeight=0 PerfScore 0.00
-; Total bytes of code 380, prolog size 12, PerfScore 236.38, instruction count 95, allocated bytes for code 380 (MethodHash=3e617b13) for method SveBenchmarks.StrIndexOf:SveTail():int:this (FullOpts)
+; Total bytes of code 376, prolog size 12, PerfScore 232.88, instruction count 94, allocated bytes for code 376 (MethodHash=3e617b13) for method SveBenchmarks.StrIndexOf:SveTail():int:this (FullOpts)
; ============================================================
Unwind Info:
@@ -188,7 +186,7 @@ Unwind Info:
E bit : 0
X bit : 0
Vers : 0
- Function Length : 95 (0x0005f) Actual length = 380 (0x00017c)
+ Function Length : 94 (0x0005e) Actual length = 376 (0x000178)
---- Epilog scopes ----
---- Scope 0
Epilog Start Offset : 3523193630 (0xd1ffab1e) Actual offset = 3523193630 (0xd1ffab1e) Offset from main function begin = 3523193630 (0xd1ffab1e)
+8 (+2.11%) : 8287.dasm - SveBenchmarks.StrCmp:SveStrCmp():long:this (FullOpts)
@@ -8,31 +8,31 @@
; 0 inlinees with PGO data; 1 single block inlinees; 0 inlinees without PGO data
; Final local variable assignments
;
-; V00 this [V00,T06] ( 9, 7 ) ref -> x0 this class-hnd single-def <SveBenchmarks.StrCmp>
+; V00 this [V00,T05] ( 9, 7 ) ref -> x0 this class-hnd single-def <SveBenchmarks.StrCmp>
; V01 loc0 [V01,T02] ( 6, 17.50) int -> x1
; V02 loc1 [V02,T04] ( 5, 10 ) int -> x2 single-def
-; V03 loc2 [V03,T05] ( 4, 10 ) mask -> p0 single-def <System.Numerics.Vector`1[byte]>
-; V04 loc3 [V04,T01] ( 6, 18 ) mask -> p1 <System.Numerics.Vector`1[byte]>
+;* V03 loc2 [V03,T19] ( 0, 0 ) mask -> zero-ref single-def <System.Numerics.Vector`1[byte]>
+; V04 loc3 [V04,T01] ( 6, 18 ) mask -> p0 <System.Numerics.Vector`1[byte]>
; V05 loc4 [V05,T20] ( 4, 13 ) simd16 -> d16 <System.Numerics.Vector`1[byte]>
;* V06 loc5 [V06 ] ( 0, 0 ) simd16 -> zero-ref <System.Numerics.Vector`1[byte]>
;* V07 loc6 [V07 ] ( 0, 0 ) simd16 -> zero-ref <System.Numerics.Vector`1[byte]>
-; V08 loc7 [V08,T11] ( 3, 5 ) long -> x4
-; V09 loc8 [V09,T12] ( 3, 5 ) long -> x5
+; V08 loc7 [V08,T10] ( 3, 5 ) long -> x4
+; V09 loc8 [V09,T11] ( 3, 5 ) long -> x5
; V10 loc9 [V10 ] ( 1, 0.50) ref -> [fp+0x28] must-init pinned class-hnd single-def <byte[]>
; V11 loc10 [V11 ] ( 1, 0.50) ref -> [fp+0x20] must-init pinned class-hnd single-def <byte[]>
-; V12 loc11 [V12,T10] ( 4, 5 ) int -> x3
-; V13 loc12 [V13,T18] ( 3, 1.50) int -> x3 single-def
+; V12 loc11 [V12,T09] ( 4, 5 ) int -> x3
+; V13 loc12 [V13,T17] ( 3, 1.50) int -> x3 single-def
; V14 loc13 [V14,T00] ( 7, 22.50) int -> x4
;# V15 OutArgs [V15 ] ( 1, 1 ) struct ( 0) [sp+0x00] do-not-enreg[XS] addr-exposed "OutgoingArgSpace" <Empty>
-; V16 tmp1 [V16,T08] ( 5, 5 ) ref -> x4 class-hnd single-def "dup spill" <byte[]>
-; V17 tmp2 [V17,T09] ( 5, 5 ) ref -> x5 class-hnd single-def "dup spill" <byte[]>
-; V18 tmp3 [V18,T16] ( 2, 2 ) long -> x4 "Cast away GC"
-; V19 tmp4 [V19,T17] ( 2, 2 ) long -> x5 "Cast away GC"
-; V20 tmp5 [V20,T13] ( 3, 3 ) ref -> x2 single-def "arr expr"
-; V21 tmp6 [V21,T14] ( 3, 3 ) ref -> x0 single-def "arr expr"
-; V22 cse0 [V22,T07] ( 3, 6 ) int -> x3 "CSE #05: aggressive"
-; V23 cse1 [V23,T19] ( 3, 1.50) long -> x3 "CSE #08: moderate"
-; V24 cse2 [V24,T15] ( 4, 2 ) int -> x1 "CSE #07: moderate"
+; V16 tmp1 [V16,T07] ( 5, 5 ) ref -> x4 class-hnd single-def "dup spill" <byte[]>
+; V17 tmp2 [V17,T08] ( 5, 5 ) ref -> x5 class-hnd single-def "dup spill" <byte[]>
+; V18 tmp3 [V18,T15] ( 2, 2 ) long -> x4 "Cast away GC"
+; V19 tmp4 [V19,T16] ( 2, 2 ) long -> x5 "Cast away GC"
+; V20 tmp5 [V20,T12] ( 3, 3 ) ref -> x2 single-def "arr expr"
+; V21 tmp6 [V21,T13] ( 3, 3 ) ref -> x0 single-def "arr expr"
+; V22 cse0 [V22,T06] ( 3, 6 ) int -> x3 "CSE #05: aggressive"
+; V23 cse1 [V23,T18] ( 3, 1.50) long -> x3 "CSE #08: moderate"
+; V24 cse2 [V24,T14] ( 4, 2 ) int -> x1 "CSE #07: moderate"
; V25 cse3 [V25,T03] ( 3, 12 ) long -> x6 "CSE #06: aggressive"
; V26 rat0 [V26,T21] ( 3, 9 ) simd16 -> [fp+0x10] do-not-enreg[S] "SIMDInitTempVar"
;
@@ -47,10 +47,9 @@ G_M14759_IG02: ; bbWeight=1, gcrefRegs=0001 {x0}, byrefRegs=0000 {}, byre
; gcrRegs +[x0]
mov w1, wzr
cntb x2, all
- ptrue p0.b
ldr w3, [x0, #0x20]
mov w4, wzr
- whilelt p1.b, w4, w3
+ whilelt p0.b, w4, w3
movi v16.4s, #0
ldr x4, [x0, #0x10]
; gcrRegs +[x4]
@@ -62,7 +61,7 @@ G_M14759_IG02: ; bbWeight=1, gcrefRegs=0001 {x0}, byrefRegs=0000 {}, byre
; gcrRegs -[x5]
cmp w4, w5
bne G_M14759_IG14
- ;; size=52 bbWeight=1 PerfScore 24.00
+ ;; size=48 bbWeight=1 PerfScore 22.00
G_M14759_IG03: ; bbWeight=0.50, gcrefRegs=0001 {x0}, byrefRegs=0000 {}, byref, isz
ldr x4, [x0, #0x10]
; gcrRegs +[x4]
@@ -96,27 +95,30 @@ G_M14759_IG06: ; bbWeight=0.50, gcrefRegs=0001 {x0}, byrefRegs=0000 {}, b
mov x5, xzr
;; size=4 bbWeight=0.50 PerfScore 0.25
G_M14759_IG07: ; bbWeight=1, gcrefRegs=0001 {x0}, byrefRegs=0000 {}, byref, isz
- ptest p0, p1.b
+ ptrue p1.b
+ ptest p1, p0.b
bge G_M14759_IG09
- ;; size=8 bbWeight=1 PerfScore 3.00
+ ;; size=12 bbWeight=1 PerfScore 5.00
G_M14759_IG08: ; bbWeight=4, gcrefRegs=0001 {x0}, byrefRegs=0000 {}, byref, isz
sxtw x6, w1
add x7, x4, x6
- ld1b { z16.b }, p1/z, [x7]
+ ld1b { z16.b }, p0/z, [x7]
add x6, x5, x6
- ld1b { z17.b }, p1/z, [x6]
+ ld1b { z17.b }, p0/z, [x6]
+ ptrue p0.b
+ cmpne p0.b, p0/z, z16.b, z17.b
+ mov z16.b, p0/z, #1
+ ptrue p0.b
+ cmpne p0.b, p0/z, z16.b, #0
ptrue p1.b
- cmpne p1.b, p1/z, z16.b, z17.b
- mov z16.b, p1/z, #1
- ptrue p1.b
- cmpne p1.b, p1/z, z16.b, #0
- ptest p0, p1.b
+ ptest p1, p0.b
bne G_M14759_IG09
add w1, w1, w2
- whilelt p1.b, w1, w3
- ptest p0, p1.b
+ whilelt p0.b, w1, w3
+ ptrue p1.b
+ ptest p1, p0.b
blt G_M14759_IG08
- ;; size=64 bbWeight=4 PerfScore 152.00
+ ;; size=72 bbWeight=4 PerfScore 168.00
G_M14759_IG09: ; bbWeight=0.50, gcrefRegs=0001 {x0}, byrefRegs=0000 {}, byref, isz
mov w3, wzr
mov w4, wzr
@@ -198,7 +200,7 @@ G_M14759_IG19: ; bbWeight=0, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
brk #0
;; size=8 bbWeight=0 PerfScore 0.00
-; Total bytes of code 380, prolog size 12, PerfScore 248.50, instruction count 95, allocated bytes for code 380 (MethodHash=5df7c658) for method SveBenchmarks.StrCmp:SveStrCmp():long:this (FullOpts)
+; Total bytes of code 388, prolog size 12, PerfScore 264.50, instruction count 97, allocated bytes for code 388 (MethodHash=5df7c658) for method SveBenchmarks.StrCmp:SveStrCmp():long:this (FullOpts)
; ============================================================
Unwind Info:
@@ -209,7 +211,7 @@ Unwind Info:
E bit : 0
X bit : 0
Vers : 0
- Function Length : 95 (0x0005f) Actual length = 380 (0x00017c)
+ Function Length : 97 (0x00061) Actual length = 388 (0x000184)
---- Epilog scopes ----
---- Scope 0
Epilog Start Offset : 3523193630 (0xd1ffab1e) Actual offset = 3523193630 (0xd1ffab1e) Offset from main function begin = 3523193630 (0xd1ffab1e)
+8 (+5.26%) : 21403.dasm - SveBenchmarks.StrLen:SveStrLen():ulong:this (FullOpts)
@@ -8,19 +8,20 @@
; Final local variable assignments
;
; V00 this [V00,T05] ( 4, 4 ) ref -> x0 this class-hnd single-def <SveBenchmarks.StrLen>
-; V01 loc0 [V01,T04] ( 3, 7 ) mask -> p0 single-def <System.Numerics.Vector`1[byte]>
+; V01 loc0 [V01,T11] ( 2, 3 ) simd16 -> d16 <System.Numerics.Vector`1[byte]>
;* V02 loc1 [V02 ] ( 0, 0 ) mask -> zero-ref <System.Numerics.Vector`1[byte]>
-; V03 loc2 [V03,T10] ( 5, 13 ) simd16 -> d16 <System.Numerics.Vector`1[byte]>
+; V03 loc2 [V03,T10] ( 5, 13 ) simd16 -> d17 <System.Numerics.Vector`1[byte]>
; V04 loc3 [V04,T00] ( 6, 18 ) long -> x1
; V05 loc4 [V05,T07] ( 2, 5 ) long -> x2 single-def
-; V06 loc5 [V06,T01] ( 5, 12 ) mask -> p1 <System.Numerics.Vector`1[byte]>
-; V07 loc6 [V07,T03] ( 4, 7 ) long -> x0
+; V06 loc5 [V06,T01] ( 5, 12 ) mask -> p0 <System.Numerics.Vector`1[byte]>
+; V07 loc6 [V07,T04] ( 4, 7 ) long -> x0
; V08 loc7 [V08 ] ( 1, 1 ) ref -> [fp+0x18] must-init pinned class-hnd single-def <byte[]>
;# V09 OutArgs [V09 ] ( 1, 1 ) struct ( 0) [sp+0x00] do-not-enreg[XS] addr-exposed "OutgoingArgSpace" <Empty>
; V10 tmp1 [V10,T02] ( 5, 8 ) ref -> x0 class-hnd single-def "dup spill" <byte[]>
; V11 tmp2 [V11,T08] ( 2, 2 ) long -> x0 "Cast away GC"
; V12 cse0 [V12,T06] ( 3, 6 ) int -> x3 "CSE #02: aggressive"
-; V13 cse1 [V13,T09] ( 2, 1 ) int -> x4 "CSE #01: moderate"
+; V13 cse1 [V13,T03] ( 3, 8 ) mask -> p1 "CSE #03: aggressive"
+; V14 cse2 [V14,T09] ( 2, 1 ) int -> x4 "CSE #01: moderate"
;
; Lcl frame size = 16
@@ -31,16 +32,16 @@ G_M60402_IG01: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref,
;; size=12 bbWeight=1 PerfScore 2.50
G_M60402_IG02: ; bbWeight=1, gcrefRegs=0001 {x0}, byrefRegs=0000 {}, byref, isz
; gcrRegs +[x0]
- ptrue p0.b
+ mvni v16.4s, #0
mov x1, xzr
cntb x2, all
ldr w3, [x0, #0x18]
mov w4, wzr
- whilelt p1.b, w4, w3
+ whilelt p0.b, w4, w3
ldr x0, [x0, #0x08]
str x0, [fp, #0x18] // [V08 loc7]
cbz x0, G_M60402_IG04
- ;; size=36 bbWeight=1 PerfScore 15.00
+ ;; size=36 bbWeight=1 PerfScore 13.50
G_M60402_IG03: ; bbWeight=0.50, gcrefRegs=0001 {x0}, byrefRegs=0000 {}, byref, isz
ldr w4, [x0, #0x08]
cbz w4, G_M60402_IG04
@@ -54,28 +55,30 @@ G_M60402_IG04: ; bbWeight=0.50, gcrefRegs=0000 {}, byrefRegs=0000 {}, byr
mov x0, xzr
;; size=4 bbWeight=0.50 PerfScore 0.25
G_M60402_IG05: ; bbWeight=2, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, isz
- ld1b { z16.b }, p1/z, [x0]
- movi v17.4s, #0
+ ld1b { z17.b }, p0/z, [x0]
+ ptrue p1.b
+ cmpne p1.b, p1/z, z16.b, #0
+ movi v16.4s, #0
ptrue p2.b
- cmpeq p2.b, p2/z, z16.b, z17.b
- ptest p0, p2.b
+ cmpeq p2.b, p2/z, z17.b, z16.b
+ ptest p1, p2.b
bne G_M60402_IG07
- ;; size=24 bbWeight=2 PerfScore 33.00
+ ;; size=32 bbWeight=2 PerfScore 43.00
G_M60402_IG06: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, isz
add x1, x1, x2
- whilelt p1.b, w1, w3
+ whilelt p0.b, w1, w3
add x4, x0, x1
- ld1b { z16.b }, p1/z, [x4]
- movi v17.4s, #0
+ ld1b { z17.b }, p0/z, [x4]
+ movi v16.4s, #0
ptrue p2.b
- cmpeq p2.b, p2/z, z16.b, z17.b
- ptest p0, p2.b
+ cmpeq p2.b, p2/z, z17.b, z16.b
+ ptest p1, p2.b
beq G_M60402_IG06
;; size=36 bbWeight=4 PerfScore 78.00
G_M60402_IG07: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
- ptrue p0.b
- cmpne p0.b, p0/z, z16.b, #0
- cntp x0, p1, p0.b
+ ptrue p1.b
+ cmpne p1.b, p1/z, z17.b, #0
+ cntp x0, p0, p1.b
add x0, x0, x1
;; size=16 bbWeight=1 PerfScore 7.50
G_M60402_IG08: ; bbWeight=1, epilog, nogc, extend
@@ -83,7 +86,7 @@ G_M60402_IG08: ; bbWeight=1, epilog, nogc, extend
ret lr
;; size=8 bbWeight=1 PerfScore 2.00
-; Total bytes of code 152, prolog size 12, PerfScore 141.00, instruction count 38, allocated bytes for code 152 (MethodHash=b293140d) for method SveBenchmarks.StrLen:SveStrLen():ulong:this (FullOpts)
+; Total bytes of code 160, prolog size 12, PerfScore 149.50, instruction count 40, allocated bytes for code 160 (MethodHash=b293140d) for method SveBenchmarks.StrLen:SveStrLen():ulong:this (FullOpts)
; ============================================================
Unwind Info:
@@ -94,7 +97,7 @@ Unwind Info:
E bit : 0
X bit : 0
Vers : 0
- Function Length : 38 (0x00026) Actual length = 152 (0x000098)
+ Function Length : 40 (0x00028) Actual length = 160 (0x0000a0)
---- Epilog scopes ----
---- Scope 0
Epilog Start Offset : 3523193630 (0xd1ffab1e) Actual offset = 3523193630 (0xd1ffab1e) Offset from main function begin = 3523193630 (0xd1ffab1e)
coreclr_tests.run.linux.arm64.checked.mch
-28 (-58.33%) : 358603.dasm - PredicateInstructions:BitwiseClear():System.Numerics.Vector`1[short] (FullOpts)
@@ -17,22 +17,15 @@ G_M44742_IG01: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref,
mov fp, sp
;; size=8 bbWeight=1 PerfScore 1.50
G_M44742_IG02: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
- pfalse p0.b
- ptrue p1.h
- ptrue p2.h
- ptrue p3.h
- bic p1.b, p3/z, p1.b, p2.b
- pfalse p2.b
- sel p0.b, p0, p1.b, p2.b
- mov z0.h, p0/z, #1
- ;; size=32 bbWeight=1 PerfScore 16.00
+ movi v0.4s, #0
+ ;; size=4 bbWeight=1 PerfScore 0.50
G_M44742_IG03: ; bbWeight=1, epilog, nogc, extend
ldp fp, lr, [sp], #0x10
ret lr
;; size=8 bbWeight=1 PerfScore 2.00
; END METHOD PredicateInstructions:BitwiseClear():System.Numerics.Vector`1[short]
-; Total bytes of code 48, prolog size 8, PerfScore 19.50, instruction count 12, allocated bytes for code 48 (MethodHash=71345139) for method PredicateInstructions:BitwiseClear():System.Numerics.Vector`1[short] (FullOpts)
+; Total bytes of code 20, prolog size 8, PerfScore 4.00, instruction count 5, allocated bytes for code 20 (MethodHash=71345139) for method PredicateInstructions:BitwiseClear():System.Numerics.Vector`1[short] (FullOpts)
; ============================================================
Unwind Info:
@@ -43,7 +36,7 @@ Unwind Info:
E bit : 0
X bit : 0
Vers : 0
- Function Length : 12 (0x0000c) Actual length = 48 (0x000030)
+ Function Length : 5 (0x00005) Actual length = 20 (0x000014)
---- Epilog scopes ----
---- Scope 0
Epilog Start Offset : 3523193630 (0xd1ffab1e) Actual offset = 3523193630 (0xd1ffab1e) Offset from main function begin = 3523193630 (0xd1ffab1e)
-24 (-54.55%) : 358606.dasm - PredicateInstructions:ConditionalSelect():System.Numerics.Vector`1[int] (FullOpts)
@@ -17,21 +17,15 @@ G_M19455_IG01: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref,
mov fp, sp
;; size=8 bbWeight=1 PerfScore 1.50
G_M19455_IG02: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
- ptrue p0.s
- movi v0.4s, #0
- cmpne p0.s, p0/z, z0.s, #0
- pfalse p1.b
- ptrue p2.s
- sel p0.b, p0, p1.b, p2.b
- mov z0.s, p0/z, #1
- ;; size=28 bbWeight=1 PerfScore 13.50
+ mvni v0.4s, #0
+ ;; size=4 bbWeight=1 PerfScore 0.50
G_M19455_IG03: ; bbWeight=1, epilog, nogc, extend
ldp fp, lr, [sp], #0x10
ret lr
;; size=8 bbWeight=1 PerfScore 2.00
; END METHOD PredicateInstructions:ConditionalSelect():System.Numerics.Vector`1[int]
-; Total bytes of code 44, prolog size 8, PerfScore 17.00, instruction count 11, allocated bytes for code 44 (MethodHash=0304b400) for method PredicateInstructions:ConditionalSelect():System.Numerics.Vector`1[int] (FullOpts)
+; Total bytes of code 20, prolog size 8, PerfScore 4.00, instruction count 5, allocated bytes for code 20 (MethodHash=0304b400) for method PredicateInstructions:ConditionalSelect():System.Numerics.Vector`1[int] (FullOpts)
; ============================================================
Unwind Info:
@@ -42,7 +36,7 @@ Unwind Info:
E bit : 0
X bit : 0
Vers : 0
- Function Length : 11 (0x0000b) Actual length = 44 (0x00002c)
+ Function Length : 5 (0x00005) Actual length = 20 (0x000014)
---- Epilog scopes ----
---- Scope 0
Epilog Start Offset : 3523193630 (0xd1ffab1e) Actual offset = 3523193630 (0xd1ffab1e) Offset from main function begin = 3523193630 (0xd1ffab1e)
-28 (-43.75%) : 679474.dasm - Runtime_1068867:TestEntryPoint() (FullOpts)
@@ -16,7 +16,6 @@
;* V05 tmp1 [V05 ] ( 0, 0 ) long -> zero-ref class-hnd exact "NewObj constructor temp" <C0>
;* V06 tmp2 [V06 ] ( 0, 0 ) simd16 -> zero-ref "location for address-of(RValue)"
;* V07 tmp3 [V07 ] ( 0, 0 ) struct (16) zero-ref do-not-enreg[SF] "stack allocated C0" <C0>
-; V08 cse0 [V08,T00] ( 3, 3 ) mask -> p0 "CSE #01: aggressive"
;
; Lcl frame size = 0
@@ -24,28 +23,19 @@ G_M538_IG01: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, n
stp fp, lr, [sp, #-0x10]!
mov fp, sp
;; size=8 bbWeight=1 PerfScore 1.50
-G_M538_IG02: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, isz
- ptrue p0.s
+G_M538_IG02: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
movi v0.4s, #0
- cmpne p0.s, p0/z, z0.s, #0
- movi v0.4s, #0
- ldr q16, [@RWD00]
- sel z0.s, p0, z0.s, z16.s
- movi v16.4s, #0
- sel z0.s, p0, z0.s, z16.s
movz x0, #0xD1FFAB1E // code for <unknown method>
movk x0, #0xD1FFAB1E LSL #16
movk x0, #0xD1FFAB1E LSL #32
ldr x0, [x0]
- ;; size=48 bbWeight=1 PerfScore 17.00
+ ;; size=20 bbWeight=1 PerfScore 5.00
G_M538_IG03: ; bbWeight=1, epilog, nogc, extend
ldp fp, lr, [sp], #0x10
br x0
;; size=8 bbWeight=1 PerfScore 2.00
-RWD00 dq 0000000000000001h, 0000000000000000h
-
-; Total bytes of code 64, prolog size 8, PerfScore 20.50, instruction count 16, allocated bytes for code 64 (MethodHash=1c40fde5) for method Runtime_1068867:TestEntryPoint() (FullOpts)
+; Total bytes of code 36, prolog size 8, PerfScore 8.50, instruction count 9, allocated bytes for code 36 (MethodHash=1c40fde5) for method Runtime_1068867:TestEntryPoint() (FullOpts)
; ============================================================
Unwind Info:
@@ -56,7 +46,7 @@ Unwind Info:
E bit : 0
X bit : 0
Vers : 0
- Function Length : 16 (0x00010) Actual length = 64 (0x000040)
+ Function Length : 9 (0x00009) Actual length = 36 (0x000024)
---- Epilog scopes ----
---- Scope 0
Epilog Start Offset : 3523193630 (0xd1ffab1e) Actual offset = 3523193630 (0xd1ffab1e) Offset from main function begin = 3523193630 (0xd1ffab1e)
+16 (+2.53%) : 575424.dasm - JIT.HardwareIntrinsics.Arm._Sve.SimpleTernaryOpTest__Sve_DotProductBySelectedScalar_ulong:RunBasicScenario_Load():this (FullOpts)
@@ -9,12 +9,12 @@
; Final local variable assignments
;
; V00 this [V00,T02] ( 4, 4 ) ref -> x19 this class-hnd single-def <JIT.HardwareIntrinsics.Arm._Sve.SimpleTernaryOpTest__Sve_DotProductBySelectedScalar_ulong>
-;* V01 loc0 [V01,T30] ( 0, 0 ) mask -> zero-ref single-def <System.Numerics.Vector`1[ushort]>
-; V02 loc1 [V02,T29] ( 3, 3 ) mask -> [fp+0x10] spill-single-def <System.Numerics.Vector`1[ushort]>
-; V03 loc2 [V03,T33] ( 2, 2 ) simd16 -> d8 <System.Numerics.Vector`1[ulong]>
+;* V01 loc0 [V01,T34] ( 0, 0 ) simd16 -> zero-ref <System.Numerics.Vector`1[ushort]>
+; V02 loc1 [V02,T32] ( 2, 2 ) simd16 -> d8 <System.Numerics.Vector`1[ushort]>
+; V03 loc2 [V03,T33] ( 2, 2 ) simd16 -> d10 <System.Numerics.Vector`1[ulong]>
;# V04 OutArgs [V04 ] ( 1, 1 ) struct ( 0) [sp+0x00] do-not-enreg[XS] addr-exposed "OutgoingArgSpace" <Empty>
-; V05 tmp1 [V05,T31] ( 2, 4 ) simd16 -> d8 "impAppendStmt"
-; V06 tmp2 [V06,T32] ( 2, 4 ) simd16 -> d10 "impAppendStmt"
+; V05 tmp1 [V05,T30] ( 2, 4 ) simd16 -> d10 "impAppendStmt"
+; V06 tmp2 [V06,T31] ( 2, 4 ) simd16 -> d8 "impAppendStmt"
; V07 tmp3 [V07,T18] ( 2, 4 ) long -> x21 "impAppendStmt"
; V08 tmp4 [V08,T19] ( 2, 4 ) long -> x22 "impAppendStmt"
; V09 tmp5 [V09,T20] ( 2, 4 ) long -> x23 "impAppendStmt"
@@ -51,21 +51,23 @@
;* V40 tmp36 [V40 ] ( 0, 0 ) long -> zero-ref ld-addr-op "Inline stloc first use temp"
; V41 tmp37 [V41,T28] ( 2, 4 ) long -> x0 "Inlining Arg"
; V42 tmp38 [V42,T17] ( 3, 6 ) long -> x4 "Inlining Arg"
-; V43 cse0 [V43,T00] ( 9, 9 ) byref -> x20 "CSE #02: aggressive"
+; V43 cse0 [V43,T29] ( 3, 3 ) mask -> [fp+0x18] spill-single-def "CSE #02: moderate"
+; V44 cse1 [V44,T00] ( 9, 9 ) byref -> x20 "CSE #01: aggressive"
;
-; Lcl frame size = 8
+; Lcl frame size = 16
G_M33034_IG01: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, nogc <-- Prolog IG
- stp fp, lr, [sp, #-0x60]!
- stp d8, d9, [sp, #0x18]
- stp d10, d11, [sp, #0x28]
- stp x19, x20, [sp, #0x38]
- stp x21, x22, [sp, #0x48]
- str x23, [sp, #0x58]
+ stp fp, lr, [sp, #-0x70]!
+ stp d8, d9, [sp, #0x20]
+ stp d10, d11, [sp, #0x30]
+ str d12, [sp, #0x40]
+ stp x19, x20, [sp, #0x48]
+ stp x21, x22, [sp, #0x58]
+ str x23, [sp, #0x68]
mov fp, sp
mov x19, x0
; gcrRegs +[x19]
- ;; size=32 bbWeight=1 PerfScore 7.00
+ ;; size=36 bbWeight=1 PerfScore 8.00
G_M33034_IG02: ; bbWeight=1, gcrefRegs=80000 {x19}, byrefRegs=0000 {}, byref
movz x0, #0xD1FFAB1E
movk x0, #0xD1FFAB1E LSL #16
@@ -85,9 +87,7 @@ G_M33034_IG02: ; bbWeight=1, gcrefRegs=80000 {x19}, byrefRegs=0000 {}, by
ldr x1, [x1]
blr x1
; gcrRegs -[x0]
- ptrue p0.h
- add xip1, fp, #16
- str p0, [xip1]
+ mvni v8.4s, #0
add x20, x19, #96
; byrRegs +[x20]
mov x21, x20
@@ -99,22 +99,6 @@ G_M33034_IG02: ; bbWeight=1, gcrefRegs=80000 {x19}, byrefRegs=0000 {}, by
movk x1, #0xD1FFAB1E LSL #16
movk x1, #0xD1FFAB1E LSL #32
ldr x1, [x1]
- blr x1
- ; byrRegs -[x0]
- ldr x1, [x21, #0x20]
- add x0, x0, x1
- sub x0, x0, #1
- sub x1, x1, #1
- bic x0, x0, x1
- ptrue p0.d
- ld1d { z8.d }, p0/z, [x0]
- mov x21, x20
- add x0, x21, #48
- ; byrRegs +[x0]
- movz x1, #0xD1FFAB1E // code for <unknown method>
- movk x1, #0xD1FFAB1E LSL #16
- movk x1, #0xD1FFAB1E LSL #32
- ldr x1, [x1]
mov v9.d[0], v8.d[1]
blr x1
; byrRegs -[x0]
@@ -123,11 +107,10 @@ G_M33034_IG02: ; bbWeight=1, gcrefRegs=80000 {x19}, byrefRegs=0000 {}, by
sub x0, x0, #1
sub x1, x1, #1
bic x0, x0, x1
- add xip1, fp, #16
- ldr p0, [xip1]
- ld1h { z10.h }, p0/z, [x0]
+ ptrue p0.d
+ ld1d { z10.d }, p0/z, [x0]
mov x21, x20
- add x0, x21, #56
+ add x0, x21, #48
; byrRegs +[x0]
movz x1, #0xD1FFAB1E // code for <unknown method>
movk x1, #0xD1FFAB1E LSL #16
@@ -141,20 +124,20 @@ G_M33034_IG02: ; bbWeight=1, gcrefRegs=80000 {x19}, byrefRegs=0000 {}, by
sub x0, x0, #1
sub x1, x1, #1
bic x0, x0, x1
- add xip1, fp, #16
- ldr p0, [xip1]
- ld1h { z7.h }, p0/z, [x0]
+ ptrue p0.h
mov v8.d[1], v9.d[0]
- mov v10.d[1], v11.d[0]
- udot z8.d, z10.h, z7.h[1]
+ cmpne p0.h, p0/z, z8.h, #0
+ add xip1, fp, #24
+ str p0, [xip1]
+ ld1h { z8.h }, p0/z, [x0]
mov x21, x20
- add x0, x21, #64
+ add x0, x21, #56
; byrRegs +[x0]
movz x1, #0xD1FFAB1E // code for <unknown method>
movk x1, #0xD1FFAB1E LSL #16
movk x1, #0xD1FFAB1E LSL #32
ldr x1, [x1]
- mov v9.d[0], v8.d[1]
+ mov v12.d[0], v8.d[1]
blr x1
; byrRegs -[x0]
ldr x1, [x21, #0x20]
@@ -162,8 +145,29 @@ G_M33034_IG02: ; bbWeight=1, gcrefRegs=80000 {x19}, byrefRegs=0000 {}, by
sub x0, x0, #1
sub x1, x1, #1
bic x0, x0, x1
- mov v8.d[1], v9.d[0]
- str q8, [x0]
+ add xip1, fp, #24
+ ldr p0, [xip1]
+ ld1h { z7.h }, p0/z, [x0]
+ mov v10.d[1], v11.d[0]
+ mov v8.d[1], v12.d[0]
+ udot z10.d, z8.h, z7.h[1]
+ mov x21, x20
+ add x0, x21, #64
+ ; byrRegs +[x0]
+ movz x1, #0xD1FFAB1E // code for <unknown method>
+ movk x1, #0xD1FFAB1E LSL #16
+ movk x1, #0xD1FFAB1E LSL #32
+ ldr x1, [x1]
+ mov v8.d[0], v10.d[1]
+ blr x1
+ ; byrRegs -[x0]
+ ldr x1, [x21, #0x20]
+ add x0, x0, x1
+ sub x0, x0, #1
+ sub x1, x1, #1
+ bic x0, x0, x1
+ mov v10.d[1], v8.d[0]
+ str q10, [x0]
mov x21, x20
add x0, x21, #40
; byrRegs +[x0]
@@ -236,29 +240,30 @@ G_M33034_IG02: ; bbWeight=1, gcrefRegs=80000 {x19}, byrefRegs=0000 {}, by
movk x6, #0xD1FFAB1E LSL #16
movk x6, #0xD1FFAB1E LSL #32
ldr x6, [x6]
- ;; size=572 bbWeight=1 PerfScore 168.50
+ ;; size=580 bbWeight=1 PerfScore 168.50
G_M33034_IG03: ; bbWeight=1, epilog, nogc, extend
- ldr x23, [sp, #0x58]
- ldp x21, x22, [sp, #0x48]
- ldp x19, x20, [sp, #0x38]
- ldp d10, d11, [sp, #0x28]
- ldp d8, d9, [sp, #0x18]
- ldp fp, lr, [sp], #0x60
+ ldr x23, [sp, #0x68]
+ ldp x21, x22, [sp, #0x58]
+ ldp x19, x20, [sp, #0x48]
+ ldr d12, [sp, #0x40]
+ ldp d10, d11, [sp, #0x30]
+ ldp d8, d9, [sp, #0x20]
+ ldp fp, lr, [sp], #0x70
br x6
- ;; size=28 bbWeight=1 PerfScore 8.00
+ ;; size=32 bbWeight=1 PerfScore 10.00
-; Total bytes of code 632, prolog size 28, PerfScore 183.50, instruction count 158, allocated bytes for code 632 (MethodHash=1d6a7ef5) for method JIT.HardwareIntrinsics.Arm._Sve.SimpleTernaryOpTest__Sve_DotProductBySelectedScalar_ulong:RunBasicScenario_Load():this (FullOpts)
+; Total bytes of code 648, prolog size 32, PerfScore 186.50, instruction count 162, allocated bytes for code 648 (MethodHash=1d6a7ef5) for method JIT.HardwareIntrinsics.Arm._Sve.SimpleTernaryOpTest__Sve_DotProductBySelectedScalar_ulong:RunBasicScenario_Load():this (FullOpts)
; ============================================================
Unwind Info:
>> Start offset : 0x000000 (not in unwind data)
>> End offset : 0xd1ffab1e (not in unwind data)
- Code Words : 3
+ Code Words : 4
Epilog Count : 1
E bit : 0
X bit : 0
Vers : 0
- Function Length : 158 (0x0009e) Actual length = 632 (0x000278)
+ Function Length : 162 (0x000a2) Actual length = 648 (0x000288)
---- Epilog scopes ----
---- Scope 0
Epilog Start Offset : 3523193630 (0xd1ffab1e) Actual offset = 3523193630 (0xd1ffab1e) Offset from main function begin = 3523193630 (0xd1ffab1e)
@@ -266,12 +271,15 @@ Unwind Info:
---- Unwind codes ----
E1 set_fp; mov fp, sp
---- Epilog start at index 1 ----
- D1 0B save_reg X#4 Z#11 (0x0B); str x23, [sp, #88]
+ D1 0D save_reg X#4 Z#13 (0x0D); str x23, [sp, #104]
E6 save_next
- C8 07 save_regp X#0 Z#7 (0x07); stp x19, x20, [sp, #56]
+ C8 09 save_regp X#0 Z#9 (0x09); stp x19, x20, [sp, #72]
+ DD 08 save_freg X#4 Z#8 (0x08); str d12, [sp, #64]
E6 save_next
- D8 03 save_fregp X#0 Z#3 (0x03); stp d8, d9, [sp, #24]
- 8B save_fplr_x #11 (0x0B); stp fp, lr, [sp, #-96]!
+ D8 04 save_fregp X#0 Z#4 (0x04); stp d8, d9, [sp, #32]
+ 8D save_fplr_x #13 (0x0D); stp fp, lr, [sp, #-112]!
+ E4 end
+ E4 end
E4 end
E4 end
+16 (+2.53%) : 575272.dasm - JIT.HardwareIntrinsics.Arm._Sve.SimpleTernaryOpTest__Sve_DotProduct_int:RunBasicScenario_Load():this (FullOpts)
@@ -9,12 +9,12 @@
; Final local variable assignments
;
; V00 this [V00,T02] ( 4, 4 ) ref -> x19 this class-hnd single-def <JIT.HardwareIntrinsics.Arm._Sve.SimpleTernaryOpTest__Sve_DotProduct_int>
-;* V01 loc0 [V01,T30] ( 0, 0 ) mask -> zero-ref single-def <System.Numerics.Vector`1[sbyte]>
-; V02 loc1 [V02,T29] ( 3, 3 ) mask -> [fp+0x10] spill-single-def <System.Numerics.Vector`1[sbyte]>
-; V03 loc2 [V03,T33] ( 2, 2 ) simd16 -> d8 <System.Numerics.Vector`1[int]>
+;* V01 loc0 [V01,T34] ( 0, 0 ) simd16 -> zero-ref <System.Numerics.Vector`1[sbyte]>
+; V02 loc1 [V02,T32] ( 2, 2 ) simd16 -> d8 <System.Numerics.Vector`1[sbyte]>
+; V03 loc2 [V03,T33] ( 2, 2 ) simd16 -> d10 <System.Numerics.Vector`1[int]>
;# V04 OutArgs [V04 ] ( 1, 1 ) struct ( 0) [sp+0x00] do-not-enreg[XS] addr-exposed "OutgoingArgSpace" <Empty>
-; V05 tmp1 [V05,T31] ( 2, 4 ) simd16 -> d8 "impAppendStmt"
-; V06 tmp2 [V06,T32] ( 2, 4 ) simd16 -> d10 "impAppendStmt"
+; V05 tmp1 [V05,T30] ( 2, 4 ) simd16 -> d10 "impAppendStmt"
+; V06 tmp2 [V06,T31] ( 2, 4 ) simd16 -> d8 "impAppendStmt"
; V07 tmp3 [V07,T18] ( 2, 4 ) long -> x21 "impAppendStmt"
; V08 tmp4 [V08,T19] ( 2, 4 ) long -> x22 "impAppendStmt"
; V09 tmp5 [V09,T20] ( 2, 4 ) long -> x23 "impAppendStmt"
@@ -51,21 +51,23 @@
;* V40 tmp36 [V40 ] ( 0, 0 ) long -> zero-ref ld-addr-op "Inline stloc first use temp"
; V41 tmp37 [V41,T28] ( 2, 4 ) long -> x0 "Inlining Arg"
; V42 tmp38 [V42,T17] ( 3, 6 ) long -> x4 "Inlining Arg"
-; V43 cse0 [V43,T00] ( 9, 9 ) byref -> x20 "CSE #02: aggressive"
+; V43 cse0 [V43,T29] ( 3, 3 ) mask -> [fp+0x18] spill-single-def "CSE #02: moderate"
+; V44 cse1 [V44,T00] ( 9, 9 ) byref -> x20 "CSE #01: aggressive"
;
-; Lcl frame size = 8
+; Lcl frame size = 16
G_M55930_IG01: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, nogc <-- Prolog IG
- stp fp, lr, [sp, #-0x60]!
- stp d8, d9, [sp, #0x18]
- stp d10, d11, [sp, #0x28]
- stp x19, x20, [sp, #0x38]
- stp x21, x22, [sp, #0x48]
- str x23, [sp, #0x58]
+ stp fp, lr, [sp, #-0x70]!
+ stp d8, d9, [sp, #0x20]
+ stp d10, d11, [sp, #0x30]
+ str d12, [sp, #0x40]
+ stp x19, x20, [sp, #0x48]
+ stp x21, x22, [sp, #0x58]
+ str x23, [sp, #0x68]
mov fp, sp
mov x19, x0
; gcrRegs +[x19]
- ;; size=32 bbWeight=1 PerfScore 7.00
+ ;; size=36 bbWeight=1 PerfScore 8.00
G_M55930_IG02: ; bbWeight=1, gcrefRegs=80000 {x19}, byrefRegs=0000 {}, byref
movz x0, #0xD1FFAB1E
movk x0, #0xD1FFAB1E LSL #16
@@ -85,9 +87,7 @@ G_M55930_IG02: ; bbWeight=1, gcrefRegs=80000 {x19}, byrefRegs=0000 {}, by
ldr x1, [x1]
blr x1
; gcrRegs -[x0]
- ptrue p0.b
- add xip1, fp, #16
- str p0, [xip1]
+ mvni v8.4s, #0
add x20, x19, #96
; byrRegs +[x20]
mov x21, x20
@@ -99,22 +99,6 @@ G_M55930_IG02: ; bbWeight=1, gcrefRegs=80000 {x19}, byrefRegs=0000 {}, by
movk x1, #0xD1FFAB1E LSL #16
movk x1, #0xD1FFAB1E LSL #32
ldr x1, [x1]
- blr x1
- ; byrRegs -[x0]
- ldr x1, [x21, #0x20]
- add x0, x0, x1
- sub x0, x0, #1
- sub x1, x1, #1
- bic x0, x0, x1
- ptrue p0.s
- ld1w { z8.s }, p0/z, [x0]
- mov x21, x20
- add x0, x21, #48
- ; byrRegs +[x0]
- movz x1, #0xD1FFAB1E // code for <unknown method>
- movk x1, #0xD1FFAB1E LSL #16
- movk x1, #0xD1FFAB1E LSL #32
- ldr x1, [x1]
mov v9.d[0], v8.d[1]
blr x1
; byrRegs -[x0]
@@ -123,11 +107,10 @@ G_M55930_IG02: ; bbWeight=1, gcrefRegs=80000 {x19}, byrefRegs=0000 {}, by
sub x0, x0, #1
sub x1, x1, #1
bic x0, x0, x1
- add xip1, fp, #16
- ldr p0, [xip1]
- ld1b { z10.b }, p0/z, [x0]
+ ptrue p0.s
+ ld1w { z10.s }, p0/z, [x0]
mov x21, x20
- add x0, x21, #56
+ add x0, x21, #48
; byrRegs +[x0]
movz x1, #0xD1FFAB1E // code for <unknown method>
movk x1, #0xD1FFAB1E LSL #16
@@ -141,20 +124,20 @@ G_M55930_IG02: ; bbWeight=1, gcrefRegs=80000 {x19}, byrefRegs=0000 {}, by
sub x0, x0, #1
sub x1, x1, #1
bic x0, x0, x1
- add xip1, fp, #16
- ldr p0, [xip1]
- ld1b { z16.b }, p0/z, [x0]
+ ptrue p0.b
mov v8.d[1], v9.d[0]
- mov v10.d[1], v11.d[0]
- sdot z8.s, z10.b, z16.b
+ cmpne p0.b, p0/z, z8.b, #0
+ add xip1, fp, #24
+ str p0, [xip1]
+ ld1b { z8.b }, p0/z, [x0]
mov x21, x20
- add x0, x21, #64
+ add x0, x21, #56
; byrRegs +[x0]
movz x1, #0xD1FFAB1E // code for <unknown method>
movk x1, #0xD1FFAB1E LSL #16
movk x1, #0xD1FFAB1E LSL #32
ldr x1, [x1]
- mov v9.d[0], v8.d[1]
+ mov v12.d[0], v8.d[1]
blr x1
; byrRegs -[x0]
ldr x1, [x21, #0x20]
@@ -162,8 +145,29 @@ G_M55930_IG02: ; bbWeight=1, gcrefRegs=80000 {x19}, byrefRegs=0000 {}, by
sub x0, x0, #1
sub x1, x1, #1
bic x0, x0, x1
- mov v8.d[1], v9.d[0]
- str q8, [x0]
+ add xip1, fp, #24
+ ldr p0, [xip1]
+ ld1b { z16.b }, p0/z, [x0]
+ mov v10.d[1], v11.d[0]
+ mov v8.d[1], v12.d[0]
+ sdot z10.s, z8.b, z16.b
+ mov x21, x20
+ add x0, x21, #64
+ ; byrRegs +[x0]
+ movz x1, #0xD1FFAB1E // code for <unknown method>
+ movk x1, #0xD1FFAB1E LSL #16
+ movk x1, #0xD1FFAB1E LSL #32
+ ldr x1, [x1]
+ mov v8.d[0], v10.d[1]
+ blr x1
+ ; byrRegs -[x0]
+ ldr x1, [x21, #0x20]
+ add x0, x0, x1
+ sub x0, x0, #1
+ sub x1, x1, #1
+ bic x0, x0, x1
+ mov v10.d[1], v8.d[0]
+ str q10, [x0]
mov x21, x20
add x0, x21, #40
; byrRegs +[x0]
@@ -236,29 +240,30 @@ G_M55930_IG02: ; bbWeight=1, gcrefRegs=80000 {x19}, byrefRegs=0000 {}, by
movk x6, #0xD1FFAB1E LSL #16
movk x6, #0xD1FFAB1E LSL #32
ldr x6, [x6]
- ;; size=572 bbWeight=1 PerfScore 168.50
+ ;; size=580 bbWeight=1 PerfScore 168.50
G_M55930_IG03: ; bbWeight=1, epilog, nogc, extend
- ldr x23, [sp, #0x58]
- ldp x21, x22, [sp, #0x48]
- ldp x19, x20, [sp, #0x38]
- ldp d10, d11, [sp, #0x28]
- ldp d8, d9, [sp, #0x18]
- ldp fp, lr, [sp], #0x60
+ ldr x23, [sp, #0x68]
+ ldp x21, x22, [sp, #0x58]
+ ldp x19, x20, [sp, #0x48]
+ ldr d12, [sp, #0x40]
+ ldp d10, d11, [sp, #0x30]
+ ldp d8, d9, [sp, #0x20]
+ ldp fp, lr, [sp], #0x70
br x6
- ;; size=28 bbWeight=1 PerfScore 8.00
+ ;; size=32 bbWeight=1 PerfScore 10.00
-; Total bytes of code 632, prolog size 28, PerfScore 183.50, instruction count 158, allocated bytes for code 632 (MethodHash=b01a2585) for method JIT.HardwareIntrinsics.Arm._Sve.SimpleTernaryOpTest__Sve_DotProduct_int:RunBasicScenario_Load():this (FullOpts)
+; Total bytes of code 648, prolog size 32, PerfScore 186.50, instruction count 162, allocated bytes for code 648 (MethodHash=b01a2585) for method JIT.HardwareIntrinsics.Arm._Sve.SimpleTernaryOpTest__Sve_DotProduct_int:RunBasicScenario_Load():this (FullOpts)
; ============================================================
Unwind Info:
>> Start offset : 0x000000 (not in unwind data)
>> End offset : 0xd1ffab1e (not in unwind data)
- Code Words : 3
+ Code Words : 4
Epilog Count : 1
E bit : 0
X bit : 0
Vers : 0
- Function Length : 158 (0x0009e) Actual length = 632 (0x000278)
+ Function Length : 162 (0x000a2) Actual length = 648 (0x000288)
---- Epilog scopes ----
---- Scope 0
Epilog Start Offset : 3523193630 (0xd1ffab1e) Actual offset = 3523193630 (0xd1ffab1e) Offset from main function begin = 3523193630 (0xd1ffab1e)
@@ -266,12 +271,15 @@ Unwind Info:
---- Unwind codes ----
E1 set_fp; mov fp, sp
---- Epilog start at index 1 ----
- D1 0B save_reg X#4 Z#11 (0x0B); str x23, [sp, #88]
+ D1 0D save_reg X#4 Z#13 (0x0D); str x23, [sp, #104]
E6 save_next
- C8 07 save_regp X#0 Z#7 (0x07); stp x19, x20, [sp, #56]
+ C8 09 save_regp X#0 Z#9 (0x09); stp x19, x20, [sp, #72]
+ DD 08 save_freg X#4 Z#8 (0x08); str d12, [sp, #64]
E6 save_next
- D8 03 save_fregp X#0 Z#3 (0x03); stp d8, d9, [sp, #24]
- 8B save_fplr_x #11 (0x0B); stp fp, lr, [sp, #-96]!
+ D8 04 save_fregp X#0 Z#4 (0x04); stp d8, d9, [sp, #32]
+ 8D save_fplr_x #13 (0x0D); stp fp, lr, [sp, #-112]!
+ E4 end
+ E4 end
E4 end
E4 end
+16 (+2.53%) : 569192.dasm - JIT.HardwareIntrinsics.Arm._Sve2.SimpleTernaryOpTest__Sve2_Xor_ushort:RunBasicScenario_Load():this (FullOpts)
@@ -9,12 +9,12 @@
; Final local variable assignments
;
; V00 this [V00,T02] ( 4, 4 ) ref -> x19 this class-hnd single-def <JIT.HardwareIntrinsics.Arm._Sve2.SimpleTernaryOpTest__Sve2_Xor_ushort>
-;* V01 loc0 [V01,T30] ( 0, 0 ) mask -> zero-ref single-def <System.Numerics.Vector`1[ushort]>
-; V02 loc1 [V02,T29] ( 3, 3 ) mask -> [fp+0x10] spill-single-def <System.Numerics.Vector`1[ushort]>
-; V03 loc2 [V03,T33] ( 2, 2 ) simd16 -> d8 <System.Numerics.Vector`1[ushort]>
+;* V01 loc0 [V01,T34] ( 0, 0 ) simd16 -> zero-ref <System.Numerics.Vector`1[ushort]>
+; V02 loc1 [V02,T32] ( 2, 2 ) simd16 -> d8 <System.Numerics.Vector`1[ushort]>
+; V03 loc2 [V03,T33] ( 2, 2 ) simd16 -> d10 <System.Numerics.Vector`1[ushort]>
;# V04 OutArgs [V04 ] ( 1, 1 ) struct ( 0) [sp+0x00] do-not-enreg[XS] addr-exposed "OutgoingArgSpace" <Empty>
-; V05 tmp1 [V05,T31] ( 2, 4 ) simd16 -> d8 "impAppendStmt"
-; V06 tmp2 [V06,T32] ( 2, 4 ) simd16 -> d10 "impAppendStmt"
+; V05 tmp1 [V05,T30] ( 2, 4 ) simd16 -> d10 "impAppendStmt"
+; V06 tmp2 [V06,T31] ( 2, 4 ) simd16 -> d8 "impAppendStmt"
; V07 tmp3 [V07,T18] ( 2, 4 ) long -> x21 "impAppendStmt"
; V08 tmp4 [V08,T19] ( 2, 4 ) long -> x22 "impAppendStmt"
; V09 tmp5 [V09,T20] ( 2, 4 ) long -> x23 "impAppendStmt"
@@ -51,21 +51,23 @@
;* V40 tmp36 [V40 ] ( 0, 0 ) long -> zero-ref ld-addr-op "Inline stloc first use temp"
; V41 tmp37 [V41,T28] ( 2, 4 ) long -> x0 "Inlining Arg"
; V42 tmp38 [V42,T17] ( 3, 6 ) long -> x4 "Inlining Arg"
-; V43 cse0 [V43,T00] ( 9, 9 ) byref -> x20 "CSE #02: aggressive"
+; V43 cse0 [V43,T29] ( 3, 3 ) mask -> [fp+0x18] spill-single-def "CSE #02: moderate"
+; V44 cse1 [V44,T00] ( 9, 9 ) byref -> x20 "CSE #01: aggressive"
;
-; Lcl frame size = 8
+; Lcl frame size = 16
G_M13407_IG01: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, nogc <-- Prolog IG
- stp fp, lr, [sp, #-0x60]!
- stp d8, d9, [sp, #0x18]
- stp d10, d11, [sp, #0x28]
- stp x19, x20, [sp, #0x38]
- stp x21, x22, [sp, #0x48]
- str x23, [sp, #0x58]
+ stp fp, lr, [sp, #-0x70]!
+ stp d8, d9, [sp, #0x20]
+ stp d10, d11, [sp, #0x30]
+ str d12, [sp, #0x40]
+ stp x19, x20, [sp, #0x48]
+ stp x21, x22, [sp, #0x58]
+ str x23, [sp, #0x68]
mov fp, sp
mov x19, x0
; gcrRegs +[x19]
- ;; size=32 bbWeight=1 PerfScore 7.00
+ ;; size=36 bbWeight=1 PerfScore 8.00
G_M13407_IG02: ; bbWeight=1, gcrefRegs=80000 {x19}, byrefRegs=0000 {}, byref
movz x0, #0xD1FFAB1E
movk x0, #0xD1FFAB1E LSL #16
@@ -85,9 +87,7 @@ G_M13407_IG02: ; bbWeight=1, gcrefRegs=80000 {x19}, byrefRegs=0000 {}, by
ldr x1, [x1]
blr x1
; gcrRegs -[x0]
- ptrue p0.h
- add xip1, fp, #16
- str p0, [xip1]
+ mvni v8.4s, #0
add x20, x19, #96
; byrRegs +[x20]
mov x21, x20
@@ -99,22 +99,6 @@ G_M13407_IG02: ; bbWeight=1, gcrefRegs=80000 {x19}, byrefRegs=0000 {}, by
movk x1, #0xD1FFAB1E LSL #16
movk x1, #0xD1FFAB1E LSL #32
ldr x1, [x1]
- blr x1
- ; byrRegs -[x0]
- ldr x1, [x21, #0x20]
- add x0, x0, x1
- sub x0, x0, #1
- sub x1, x1, #1
- bic x0, x0, x1
- ptrue p0.h
- ld1h { z8.h }, p0/z, [x0]
- mov x21, x20
- add x0, x21, #48
- ; byrRegs +[x0]
- movz x1, #0xD1FFAB1E // code for <unknown method>
- movk x1, #0xD1FFAB1E LSL #16
- movk x1, #0xD1FFAB1E LSL #32
- ldr x1, [x1]
mov v9.d[0], v8.d[1]
blr x1
; byrRegs -[x0]
@@ -123,11 +107,10 @@ G_M13407_IG02: ; bbWeight=1, gcrefRegs=80000 {x19}, byrefRegs=0000 {}, by
sub x0, x0, #1
sub x1, x1, #1
bic x0, x0, x1
- add xip1, fp, #16
- ldr p0, [xip1]
+ ptrue p0.h
ld1h { z10.h }, p0/z, [x0]
mov x21, x20
- add x0, x21, #56
+ add x0, x21, #48
; byrRegs +[x0]
movz x1, #0xD1FFAB1E // code for <unknown method>
movk x1, #0xD1FFAB1E LSL #16
@@ -141,20 +124,20 @@ G_M13407_IG02: ; bbWeight=1, gcrefRegs=80000 {x19}, byrefRegs=0000 {}, by
sub x0, x0, #1
sub x1, x1, #1
bic x0, x0, x1
- add xip1, fp, #16
- ldr p0, [xip1]
- ld1h { z16.h }, p0/z, [x0]
+ ptrue p0.h
mov v8.d[1], v9.d[0]
- mov v10.d[1], v11.d[0]
- eor3 z8.d, z8.d, z10.d, z16.d
+ cmpne p0.h, p0/z, z8.h, #0
+ add xip1, fp, #24
+ str p0, [xip1]
+ ld1h { z8.h }, p0/z, [x0]
mov x21, x20
- add x0, x21, #64
+ add x0, x21, #56
; byrRegs +[x0]
movz x1, #0xD1FFAB1E // code for <unknown method>
movk x1, #0xD1FFAB1E LSL #16
movk x1, #0xD1FFAB1E LSL #32
ldr x1, [x1]
- mov v9.d[0], v8.d[1]
+ mov v12.d[0], v8.d[1]
blr x1
; byrRegs -[x0]
ldr x1, [x21, #0x20]
@@ -162,8 +145,29 @@ G_M13407_IG02: ; bbWeight=1, gcrefRegs=80000 {x19}, byrefRegs=0000 {}, by
sub x0, x0, #1
sub x1, x1, #1
bic x0, x0, x1
- mov v8.d[1], v9.d[0]
- str q8, [x0]
+ add xip1, fp, #24
+ ldr p0, [xip1]
+ ld1h { z16.h }, p0/z, [x0]
+ mov v10.d[1], v11.d[0]
+ mov v8.d[1], v12.d[0]
+ eor3 z10.d, z10.d, z8.d, z16.d
+ mov x21, x20
+ add x0, x21, #64
+ ; byrRegs +[x0]
+ movz x1, #0xD1FFAB1E // code for <unknown method>
+ movk x1, #0xD1FFAB1E LSL #16
+ movk x1, #0xD1FFAB1E LSL #32
+ ldr x1, [x1]
+ mov v8.d[0], v10.d[1]
+ blr x1
+ ; byrRegs -[x0]
+ ldr x1, [x21, #0x20]
+ add x0, x0, x1
+ sub x0, x0, #1
+ sub x1, x1, #1
+ bic x0, x0, x1
+ mov v10.d[1], v8.d[0]
+ str q10, [x0]
mov x21, x20
add x0, x21, #40
; byrRegs +[x0]
@@ -236,29 +240,30 @@ G_M13407_IG02: ; bbWeight=1, gcrefRegs=80000 {x19}, byrefRegs=0000 {}, by
movk x6, #0xD1FFAB1E LSL #16
movk x6, #0xD1FFAB1E LSL #32
ldr x6, [x6]
- ;; size=572 bbWeight=1 PerfScore 166.50
+ ;; size=580 bbWeight=1 PerfScore 166.50
G_M13407_IG03: ; bbWeight=1, epilog, nogc, extend
- ldr x23, [sp, #0x58]
- ldp x21, x22, [sp, #0x48]
- ldp x19, x20, [sp, #0x38]
- ldp d10, d11, [sp, #0x28]
- ldp d8, d9, [sp, #0x18]
- ldp fp, lr, [sp], #0x60
+ ldr x23, [sp, #0x68]
+ ldp x21, x22, [sp, #0x58]
+ ldp x19, x20, [sp, #0x48]
+ ldr d12, [sp, #0x40]
+ ldp d10, d11, [sp, #0x30]
+ ldp d8, d9, [sp, #0x20]
+ ldp fp, lr, [sp], #0x70
br x6
- ;; size=28 bbWeight=1 PerfScore 8.00
+ ;; size=32 bbWeight=1 PerfScore 10.00
-; Total bytes of code 632, prolog size 28, PerfScore 181.50, instruction count 158, allocated bytes for code 632 (MethodHash=f1c3cba0) for method JIT.HardwareIntrinsics.Arm._Sve2.SimpleTernaryOpTest__Sve2_Xor_ushort:RunBasicScenario_Load():this (FullOpts)
+; Total bytes of code 648, prolog size 32, PerfScore 184.50, instruction count 162, allocated bytes for code 648 (MethodHash=f1c3cba0) for method JIT.HardwareIntrinsics.Arm._Sve2.SimpleTernaryOpTest__Sve2_Xor_ushort:RunBasicScenario_Load():this (FullOpts)
; ============================================================
Unwind Info:
>> Start offset : 0x000000 (not in unwind data)
>> End offset : 0xd1ffab1e (not in unwind data)
- Code Words : 3
+ Code Words : 4
Epilog Count : 1
E bit : 0
X bit : 0
Vers : 0
- Function Length : 158 (0x0009e) Actual length = 632 (0x000278)
+ Function Length : 162 (0x000a2) Actual length = 648 (0x000288)
---- Epilog scopes ----
---- Scope 0
Epilog Start Offset : 3523193630 (0xd1ffab1e) Actual offset = 3523193630 (0xd1ffab1e) Offset from main function begin = 3523193630 (0xd1ffab1e)
@@ -266,12 +271,15 @@ Unwind Info:
---- Unwind codes ----
E1 set_fp; mov fp, sp
---- Epilog start at index 1 ----
- D1 0B save_reg X#4 Z#11 (0x0B); str x23, [sp, #88]
+ D1 0D save_reg X#4 Z#13 (0x0D); str x23, [sp, #104]
E6 save_next
- C8 07 save_regp X#0 Z#7 (0x07); stp x19, x20, [sp, #56]
+ C8 09 save_regp X#0 Z#9 (0x09); stp x19, x20, [sp, #72]
+ DD 08 save_freg X#4 Z#8 (0x08); str d12, [sp, #64]
E6 save_next
- D8 03 save_fregp X#0 Z#3 (0x03); stp d8, d9, [sp, #24]
- 8B save_fplr_x #11 (0x0B); stp fp, lr, [sp, #-96]!
+ D8 04 save_fregp X#0 Z#4 (0x04); stp d8, d9, [sp, #32]
+ 8D save_fplr_x #13 (0x0D); stp fp, lr, [sp, #-112]!
+ E4 end
+ E4 end
E4 end
E4 end
benchmarks.run_pgo.linux.arm64.checked.mch
-4 (-0.85%) : 58518.dasm - SveBenchmarks.StrLen:SveStrLen():ulong:this (Instrumented Tier0)
@@ -36,8 +36,7 @@ G_M60402_IG01: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref,
G_M60402_IG02: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, isz
mov w0, #0xD1FFAB1E
str w0, [fp, #0x20] // [V11 tmp2]
- ptrue p0.b
- mov z16.b, p0/z, #1
+ mvni v16.4s, #0
str q16, [fp, #0x80] // [V01 loc0]
str xzr, [fp, #0x58] // [V04 loc3]
cntb x0, all
@@ -62,7 +61,7 @@ G_M60402_IG02: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref,
ldr w0, [x0, #0x08]
; gcrRegs -[x0]
cbnz w0, G_M60402_IG05
- ;; size=96 bbWeight=1 PerfScore 40.50
+ ;; size=92 bbWeight=1 PerfScore 37.00
G_M60402_IG03: ; bbWeight=0.50, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
movz x0, #0xD1FFAB1E
movk x0, #0xD1FFAB1E LSL #16
@@ -179,7 +178,7 @@ G_M60402_IG11: ; bbWeight=0, gcVars=0000000000000000 {}, gcrefRegs=0000 {
brk #0
;; size=8 bbWeight=0 PerfScore 0.00
-; Total bytes of code 472, prolog size 36, PerfScore 168.27, instruction count 118, allocated bytes for code 472 (MethodHash=b293140d) for method SveBenchmarks.StrLen:SveStrLen():ulong:this (Instrumented Tier0)
+; Total bytes of code 468, prolog size 36, PerfScore 164.77, instruction count 117, allocated bytes for code 468 (MethodHash=b293140d) for method SveBenchmarks.StrLen:SveStrLen():ulong:this (Instrumented Tier0)
; ============================================================
Unwind Info:
@@ -190,7 +189,7 @@ Unwind Info:
E bit : 0
X bit : 0
Vers : 0
- Function Length : 118 (0x00076) Actual length = 472 (0x0001d8)
+ Function Length : 117 (0x00075) Actual length = 468 (0x0001d4)
---- Epilog scopes ----
---- Scope 0
Epilog Start Offset : 3523193630 (0xd1ffab1e) Actual offset = 3523193630 (0xd1ffab1e) Offset from main function begin = 3523193630 (0xd1ffab1e)
-4 (-0.58%) : 24532.dasm - SveBenchmarks.StrIndexOf:SveIndexOf():int:this (Instrumented Tier0)
@@ -229,8 +229,7 @@ G_M22667_IG17: ; bbWeight=0.01, gcrefRegs=0000 {}, byrefRegs=0000 {}, byr
;; size=12 bbWeight=0.01 PerfScore 0.02
G_M22667_IG18: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, isz
ptrue p0.h
- mov z16.h, p0/z, #1
- ptrue p0.h
+ mvni v16.4s, #0
cmpne p0.h, p0/z, z16.h, #0
ptrue p1.h
ldr q16, [fp, #0x50] // [V05 loc4]
@@ -249,7 +248,7 @@ G_M22667_IG18: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref,
bl CORINFO_HELP_COUNTPROFILE32
; gcr arg pop 0
movn w0, #0
- ;; size=76 bbWeight=1 PerfScore 25.50
+ ;; size=72 bbWeight=1 PerfScore 22.00
G_M22667_IG19: ; bbWeight=1, epilog, nogc, extend
ldp fp, lr, [sp], #0x90
ret lr
@@ -265,7 +264,7 @@ G_M22667_IG21: ; bbWeight=0, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
brk #0
;; size=8 bbWeight=0 PerfScore 0.00
-; Total bytes of code 688, prolog size 36, PerfScore 211.04, instruction count 172, allocated bytes for code 688 (MethodHash=8b05a774) for method SveBenchmarks.StrIndexOf:SveIndexOf():int:this (Instrumented Tier0)
+; Total bytes of code 684, prolog size 36, PerfScore 207.54, instruction count 171, allocated bytes for code 684 (MethodHash=8b05a774) for method SveBenchmarks.StrIndexOf:SveIndexOf():int:this (Instrumented Tier0)
; ============================================================
Unwind Info:
@@ -276,7 +275,7 @@ Unwind Info:
E bit : 0
X bit : 0
Vers : 0
- Function Length : 172 (0x000ac) Actual length = 688 (0x0002b0)
+ Function Length : 171 (0x000ab) Actual length = 684 (0x0002ac)
---- Epilog scopes ----
---- Scope 0
Epilog Start Offset : 3523193630 (0xd1ffab1e) Actual offset = 3523193630 (0xd1ffab1e) Offset from main function begin = 3523193630 (0xd1ffab1e)
-4 (-0.49%) : 76632.dasm - SveBenchmarks.StrIndexOf:SveTail():int:this (Instrumented Tier0)
@@ -95,8 +95,7 @@ G_M34028_IG06: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
ldr x2, [x2]
blr x2
; gcr arg pop 0
- ptrue p0.h
- mov z0.h, p0/z, #1
+ mvni v0.4s, #0
movz x0, #0xD1FFAB1E // code for <unknown method>
movk x0, #0xD1FFAB1E LSL #16
movk x0, #0xD1FFAB1E LSL #32
@@ -105,7 +104,7 @@ G_M34028_IG06: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
; gcr arg pop 0
str q0, [fp, #0x50] // [V05 loc4]
b G_M34028_IG16
- ;; size=68 bbWeight=1 PerfScore 22.50
+ ;; size=64 bbWeight=1 PerfScore 19.00
G_M34028_IG07: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, isz
ldr w0, [fp, #0x84] // [V01 loc0]
sxtw x0, w0
@@ -317,7 +316,7 @@ G_M34028_IG27: ; bbWeight=0, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
brk #0
;; size=8 bbWeight=0 PerfScore 0.00
-; Total bytes of code 812, prolog size 36, PerfScore 237.56, instruction count 203, allocated bytes for code 812 (MethodHash=3e617b13) for method SveBenchmarks.StrIndexOf:SveTail():int:this (Instrumented Tier0)
+; Total bytes of code 808, prolog size 36, PerfScore 234.06, instruction count 202, allocated bytes for code 808 (MethodHash=3e617b13) for method SveBenchmarks.StrIndexOf:SveTail():int:this (Instrumented Tier0)
; ============================================================
Unwind Info:
@@ -328,7 +327,7 @@ Unwind Info:
E bit : 0
X bit : 0
Vers : 0
- Function Length : 203 (0x000cb) Actual length = 812 (0x00032c)
+ Function Length : 202 (0x000ca) Actual length = 808 (0x000328)
---- Epilog scopes ----
---- Scope 0
Epilog Start Offset : 3523193630 (0xd1ffab1e) Actual offset = 3523193630 (0xd1ffab1e) Offset from main function begin = 3523193630 (0xd1ffab1e)
-4 (-0.38%) : 14743.dasm - SveBenchmarks.StrCmp:SveStrCmp():long:this (Instrumented Tier0)
@@ -49,8 +49,7 @@ G_M14759_IG02: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref,
str wzr, [fp, #0xC4] // [V01 loc0]
cntb x0, all
str w0, [fp, #0xC0] // [V02 loc1]
- ptrue p0.b
- mov z16.b, p0/z, #1
+ mvni v16.4s, #0
str q16, [fp, #0xB0] // [V03 loc2]
ldr x0, [fp, #0xC8] // [V00 this]
; gcrRegs +[x0]
@@ -86,7 +85,7 @@ G_M14759_IG02: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref,
ldr w0, [x0, #0x08]
; gcrRegs -[x0]
cbnz w0, G_M14759_IG05
- ;; size=136 bbWeight=1 PerfScore 59.50
+ ;; size=132 bbWeight=1 PerfScore 56.00
G_M14759_IG03: ; bbWeight=0.50, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
movz x0, #0xD1FFAB1E
movk x0, #0xD1FFAB1E LSL #16
@@ -394,7 +393,7 @@ G_M14759_IG27: ; bbWeight=0, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
brk #0
;; size=8 bbWeight=0 PerfScore 0.00
-; Total bytes of code 1052, prolog size 44, PerfScore 348.29, instruction count 263, allocated bytes for code 1052 (MethodHash=5df7c658) for method SveBenchmarks.StrCmp:SveStrCmp():long:this (Instrumented Tier0)
+; Total bytes of code 1048, prolog size 44, PerfScore 344.79, instruction count 262, allocated bytes for code 1048 (MethodHash=5df7c658) for method SveBenchmarks.StrCmp:SveStrCmp():long:this (Instrumented Tier0)
; ============================================================
Unwind Info:
@@ -405,7 +404,7 @@ Unwind Info:
E bit : 0
X bit : 0
Vers : 0
- Function Length : 263 (0x00107) Actual length = 1052 (0x00041c)
+ Function Length : 262 (0x00106) Actual length = 1048 (0x000418)
---- Epilog scopes ----
---- Scope 0
Epilog Start Offset : 3523193630 (0xd1ffab1e) Actual offset = 3523193630 (0xd1ffab1e) Offset from main function begin = 3523193630 (0xd1ffab1e)
-4 (-0.43%) : 39357.dasm - SveBenchmarks.StrCmp:SveTail():long:this (Instrumented Tier0)
@@ -41,8 +41,7 @@ G_M892_IG01: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, n
G_M892_IG02: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, isz
mov w0, #0xD1FFAB1E
str w0, [fp, #0x28] // [V15 tmp3]
- ptrue p0.b
- mov z16.b, p0/z, #1
+ mvni v16.4s, #0
str q16, [fp, #0xA0] // [V01 loc0]
str wzr, [fp, #0x6C] // [V05 loc4]
cntb x0, all
@@ -71,7 +70,7 @@ G_M892_IG02: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, i
ldr w0, [x0, #0x08]
; gcrRegs -[x0]
cbnz w0, G_M892_IG05
- ;; size=104 bbWeight=1 PerfScore 46.00
+ ;; size=100 bbWeight=1 PerfScore 42.50
G_M892_IG03: ; bbWeight=0.50, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
movz x0, #0xD1FFAB1E
movk x0, #0xD1FFAB1E LSL #16
@@ -357,7 +356,7 @@ G_M892_IG24: ; bbWeight=0, gcVars=0000000000000000 {}, gcrefRegs=0000 {},
brk #0
;; size=8 bbWeight=0 PerfScore 0.00
-; Total bytes of code 928, prolog size 36, PerfScore 307.29, instruction count 232, allocated bytes for code 928 (MethodHash=5bfdfc83) for method SveBenchmarks.StrCmp:SveTail():long:this (Instrumented Tier0)
+; Total bytes of code 924, prolog size 36, PerfScore 303.79, instruction count 231, allocated bytes for code 924 (MethodHash=5bfdfc83) for method SveBenchmarks.StrCmp:SveTail():long:this (Instrumented Tier0)
; ============================================================
Unwind Info:
@@ -368,7 +367,7 @@ Unwind Info:
E bit : 0
X bit : 0
Vers : 0
- Function Length : 232 (0x000e8) Actual length = 928 (0x0003a0)
+ Function Length : 231 (0x000e7) Actual length = 924 (0x00039c)
---- Epilog scopes ----
---- Scope 0
Epilog Start Offset : 3523193630 (0xd1ffab1e) Actual offset = 3523193630 (0xd1ffab1e) Offset from main function begin = 3523193630 (0xd1ffab1e)
libraries.pmi.linux.arm64.checked.mch
-4 (-16.67%) : 11401.dasm - System.Runtime.Intrinsics.Arm.Sve:CreateFalseMaskInt32():System.Numerics.Vector`1[int] (FullOpts)
@@ -16,15 +16,14 @@ G_M40111_IG01: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref,
mov fp, sp
;; size=8 bbWeight=1 PerfScore 1.50
G_M40111_IG02: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
- pfalse p0.b
- mov z0.s, p0/z, #1
- ;; size=8 bbWeight=1 PerfScore 4.00
+ movi v0.4s, #0
+ ;; size=4 bbWeight=1 PerfScore 0.50
G_M40111_IG03: ; bbWeight=1, epilog, nogc, extend
ldp fp, lr, [sp], #0x10
ret lr
;; size=8 bbWeight=1 PerfScore 2.00
-; Total bytes of code 24, prolog size 8, PerfScore 7.50, instruction count 6, allocated bytes for code 24 (MethodHash=96116350) for method System.Runtime.Intrinsics.Arm.Sve:CreateFalseMaskInt32():System.Numerics.Vector`1[int] (FullOpts)
+; Total bytes of code 20, prolog size 8, PerfScore 4.00, instruction count 5, allocated bytes for code 20 (MethodHash=96116350) for method System.Runtime.Intrinsics.Arm.Sve:CreateFalseMaskInt32():System.Numerics.Vector`1[int] (FullOpts)
; ============================================================
Unwind Info:
@@ -35,7 +34,7 @@ Unwind Info:
E bit : 0
X bit : 0
Vers : 0
- Function Length : 6 (0x00006) Actual length = 24 (0x000018)
+ Function Length : 5 (0x00005) Actual length = 20 (0x000014)
---- Epilog scopes ----
---- Scope 0
Epilog Start Offset : 3523193630 (0xd1ffab1e) Actual offset = 3523193630 (0xd1ffab1e) Offset from main function begin = 3523193630 (0xd1ffab1e)
-4 (-16.67%) : 11402.dasm - System.Runtime.Intrinsics.Arm.Sve:CreateFalseMaskInt64():System.Numerics.Vector`1[long] (FullOpts)
@@ -16,15 +16,14 @@ G_M56373_IG01: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref,
mov fp, sp
;; size=8 bbWeight=1 PerfScore 1.50
G_M56373_IG02: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
- pfalse p0.b
- mov z0.d, p0/z, #1
- ;; size=8 bbWeight=1 PerfScore 4.00
+ movi v0.4s, #0
+ ;; size=4 bbWeight=1 PerfScore 0.50
G_M56373_IG03: ; bbWeight=1, epilog, nogc, extend
ldp fp, lr, [sp], #0x10
ret lr
;; size=8 bbWeight=1 PerfScore 2.00
-; Total bytes of code 24, prolog size 8, PerfScore 7.50, instruction count 6, allocated bytes for code 24 (MethodHash=c46823ca) for method System.Runtime.Intrinsics.Arm.Sve:CreateFalseMaskInt64():System.Numerics.Vector`1[long] (FullOpts)
+; Total bytes of code 20, prolog size 8, PerfScore 4.00, instruction count 5, allocated bytes for code 20 (MethodHash=c46823ca) for method System.Runtime.Intrinsics.Arm.Sve:CreateFalseMaskInt64():System.Numerics.Vector`1[long] (FullOpts)
; ============================================================
Unwind Info:
@@ -35,7 +34,7 @@ Unwind Info:
E bit : 0
X bit : 0
Vers : 0
- Function Length : 6 (0x00006) Actual length = 24 (0x000018)
+ Function Length : 5 (0x00005) Actual length = 20 (0x000014)
---- Epilog scopes ----
---- Scope 0
Epilog Start Offset : 3523193630 (0xd1ffab1e) Actual offset = 3523193630 (0xd1ffab1e) Offset from main function begin = 3523193630 (0xd1ffab1e)
-4 (-16.67%) : 11403.dasm - System.Runtime.Intrinsics.Arm.Sve:CreateFalseMaskSByte():System.Numerics.Vector`1[sbyte] (FullOpts)
@@ -16,15 +16,14 @@ G_M57390_IG01: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref,
mov fp, sp
;; size=8 bbWeight=1 PerfScore 1.50
G_M57390_IG02: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
- pfalse p0.b
- mov z0.b, p0/z, #1
- ;; size=8 bbWeight=1 PerfScore 4.00
+ movi v0.4s, #0
+ ;; size=4 bbWeight=1 PerfScore 0.50
G_M57390_IG03: ; bbWeight=1, epilog, nogc, extend
ldp fp, lr, [sp], #0x10
ret lr
;; size=8 bbWeight=1 PerfScore 2.00
-; Total bytes of code 24, prolog size 8, PerfScore 7.50, instruction count 6, allocated bytes for code 24 (MethodHash=86bf1fd1) for method System.Runtime.Intrinsics.Arm.Sve:CreateFalseMaskSByte():System.Numerics.Vector`1[sbyte] (FullOpts)
+; Total bytes of code 20, prolog size 8, PerfScore 4.00, instruction count 5, allocated bytes for code 20 (MethodHash=86bf1fd1) for method System.Runtime.Intrinsics.Arm.Sve:CreateFalseMaskSByte():System.Numerics.Vector`1[sbyte] (FullOpts)
; ============================================================
Unwind Info:
@@ -35,7 +34,7 @@ Unwind Info:
E bit : 0
X bit : 0
Vers : 0
- Function Length : 6 (0x00006) Actual length = 24 (0x000018)
+ Function Length : 5 (0x00005) Actual length = 20 (0x000014)
---- Epilog scopes ----
---- Scope 0
Epilog Start Offset : 3523193630 (0xd1ffab1e) Actual offset = 3523193630 (0xd1ffab1e) Offset from main function begin = 3523193630 (0xd1ffab1e)
-4 (-16.67%) : 11400.dasm - System.Runtime.Intrinsics.Arm.Sve:CreateFalseMaskInt16():System.Numerics.Vector`1[short] (FullOpts)
@@ -16,15 +16,14 @@ G_M33416_IG01: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref,
mov fp, sp
;; size=8 bbWeight=1 PerfScore 1.50
G_M33416_IG02: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
- pfalse p0.b
- mov z0.h, p0/z, #1
- ;; size=8 bbWeight=1 PerfScore 4.00
+ movi v0.4s, #0
+ ;; size=4 bbWeight=1 PerfScore 0.50
G_M33416_IG03: ; bbWeight=1, epilog, nogc, extend
ldp fp, lr, [sp], #0x10
ret lr
;; size=8 bbWeight=1 PerfScore 2.00
-; Total bytes of code 24, prolog size 8, PerfScore 7.50, instruction count 6, allocated bytes for code 24 (MethodHash=c51e7d77) for method System.Runtime.Intrinsics.Arm.Sve:CreateFalseMaskInt16():System.Numerics.Vector`1[short] (FullOpts)
+; Total bytes of code 20, prolog size 8, PerfScore 4.00, instruction count 5, allocated bytes for code 20 (MethodHash=c51e7d77) for method System.Runtime.Intrinsics.Arm.Sve:CreateFalseMaskInt16():System.Numerics.Vector`1[short] (FullOpts)
; ============================================================
Unwind Info:
@@ -35,7 +34,7 @@ Unwind Info:
E bit : 0
X bit : 0
Vers : 0
- Function Length : 6 (0x00006) Actual length = 24 (0x000018)
+ Function Length : 5 (0x00005) Actual length = 20 (0x000014)
---- Epilog scopes ----
---- Scope 0
Epilog Start Offset : 3523193630 (0xd1ffab1e) Actual offset = 3523193630 (0xd1ffab1e) Offset from main function begin = 3523193630 (0xd1ffab1e)
-4 (-16.67%) : 11407.dasm - System.Runtime.Intrinsics.Arm.Sve:CreateFalseMaskUInt64():System.Numerics.Vector`1[ulong] (FullOpts)
@@ -16,15 +16,14 @@ G_M18837_IG01: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref,
mov fp, sp
;; size=8 bbWeight=1 PerfScore 1.50
G_M18837_IG02: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
- pfalse p0.b
- mov z0.d, p0/z, #1
- ;; size=8 bbWeight=1 PerfScore 4.00
+ movi v0.4s, #0
+ ;; size=4 bbWeight=1 PerfScore 0.50
G_M18837_IG03: ; bbWeight=1, epilog, nogc, extend
ldp fp, lr, [sp], #0x10
ret lr
;; size=8 bbWeight=1 PerfScore 2.00
-; Total bytes of code 24, prolog size 8, PerfScore 7.50, instruction count 6, allocated bytes for code 24 (MethodHash=e813b66a) for method System.Runtime.Intrinsics.Arm.Sve:CreateFalseMaskUInt64():System.Numerics.Vector`1[ulong] (FullOpts)
+; Total bytes of code 20, prolog size 8, PerfScore 4.00, instruction count 5, allocated bytes for code 20 (MethodHash=e813b66a) for method System.Runtime.Intrinsics.Arm.Sve:CreateFalseMaskUInt64():System.Numerics.Vector`1[ulong] (FullOpts)
; ============================================================
Unwind Info:
@@ -35,7 +34,7 @@ Unwind Info:
E bit : 0
X bit : 0
Vers : 0
- Function Length : 6 (0x00006) Actual length = 24 (0x000018)
+ Function Length : 5 (0x00005) Actual length = 20 (0x000014)
---- Epilog scopes ----
---- Scope 0
Epilog Start Offset : 3523193630 (0xd1ffab1e) Actual offset = 3523193630 (0xd1ffab1e) Offset from main function begin = 3523193630 (0xd1ffab1e)
-4 (-16.67%) : 11399.dasm - System.Runtime.Intrinsics.Arm.Sve:CreateFalseMaskDouble():System.Numerics.Vector`1[double] (FullOpts)
@@ -16,15 +16,14 @@ G_M43790_IG01: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref,
mov fp, sp
;; size=8 bbWeight=1 PerfScore 1.50
G_M43790_IG02: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
- pfalse p0.b
- mov z0.d, p0/z, #1
- ;; size=8 bbWeight=1 PerfScore 4.00
+ movi v0.4s, #0
+ ;; size=4 bbWeight=1 PerfScore 0.50
G_M43790_IG03: ; bbWeight=1, epilog, nogc, extend
ldp fp, lr, [sp], #0x10
ret lr
;; size=8 bbWeight=1 PerfScore 2.00
-; Total bytes of code 24, prolog size 8, PerfScore 7.50, instruction count 6, allocated bytes for code 24 (MethodHash=73a354f1) for method System.Runtime.Intrinsics.Arm.Sve:CreateFalseMaskDouble():System.Numerics.Vector`1[double] (FullOpts)
+; Total bytes of code 20, prolog size 8, PerfScore 4.00, instruction count 5, allocated bytes for code 20 (MethodHash=73a354f1) for method System.Runtime.Intrinsics.Arm.Sve:CreateFalseMaskDouble():System.Numerics.Vector`1[double] (FullOpts)
; ============================================================
Unwind Info:
@@ -35,7 +34,7 @@ Unwind Info:
E bit : 0
X bit : 0
Vers : 0
- Function Length : 6 (0x00006) Actual length = 24 (0x000018)
+ Function Length : 5 (0x00005) Actual length = 20 (0x000014)
---- Epilog scopes ----
---- Scope 0
Epilog Start Offset : 3523193630 (0xd1ffab1e) Actual offset = 3523193630 (0xd1ffab1e) Offset from main function begin = 3523193630 (0xd1ffab1e)
benchmarks.run_pgo_optrepeat.linux.arm64.checked.mch
-4 (-1.30%) : 13109.dasm - SveBenchmarks.StrCmp:SveTail():long:this (FullOpts)
@@ -7,33 +7,33 @@
; No matching PGO data
; Final local variable assignments
;
-; V00 this [V00,T09] ( 5, 5 ) ref -> x0 this class-hnd single-def <SveBenchmarks.StrCmp>
-; V01 loc0 [V01,T05] ( 3, 9 ) mask -> p0 single-def <System.Numerics.Vector`1[byte]>
+; V00 this [V00,T08] ( 5, 5 ) ref -> x0 this class-hnd single-def <SveBenchmarks.StrCmp>
+;* V01 loc0 [V01,T22] ( 0, 0 ) mask -> zero-ref single-def <System.Numerics.Vector`1[byte]>
;* V02 loc1 [V02 ] ( 0, 0 ) simd16 -> zero-ref <System.Numerics.Vector`1[byte]>
;* V03 loc2 [V03 ] ( 0, 0 ) simd16 -> zero-ref <System.Numerics.Vector`1[byte]>
;* V04 loc3 [V04 ] ( 0, 0 ) simd16 -> zero-ref <System.Numerics.Vector`1[byte]>
; V05 loc4 [V05,T00] ( 12, 41.50) int -> x1
-; V06 loc5 [V06,T13] ( 3, 6 ) int -> x2 single-def
-; V07 loc6 [V07,T17] ( 3, 5 ) long -> x4
-; V08 loc7 [V08,T18] ( 3, 5 ) long -> x6
+; V06 loc5 [V06,T12] ( 3, 6 ) int -> x2 single-def
+; V07 loc6 [V07,T16] ( 3, 5 ) long -> x4
+; V08 loc7 [V08,T17] ( 3, 5 ) long -> x6
; V09 loc8 [V09 ] ( 1, 0.50) ref -> [fp+0x18] must-init pinned class-hnd single-def <byte[]>
; V10 loc9 [V10 ] ( 1, 0.50) ref -> [fp+0x10] must-init pinned class-hnd single-def <byte[]>
-; V11 loc10 [V11,T08] ( 2, 8 ) ubyte -> x8
+; V11 loc10 [V11,T07] ( 2, 8 ) ubyte -> x8
;# V12 OutArgs [V12 ] ( 1, 1 ) struct ( 0) [sp+0x00] do-not-enreg[XS] addr-exposed "OutgoingArgSpace" <Empty>
-; V13 tmp1 [V13,T15] ( 5, 5 ) ref -> x4 class-hnd single-def "dup spill" <byte[]>
-; V14 tmp2 [V14,T16] ( 5, 5 ) ref -> x6 class-hnd single-def "dup spill" <byte[]>
-; V15 tmp3 [V15,T19] ( 2, 2 ) long -> x4 "Cast away GC"
-; V16 tmp4 [V16,T20] ( 2, 2 ) long -> x6 "Cast away GC"
+; V13 tmp1 [V13,T14] ( 5, 5 ) ref -> x4 class-hnd single-def "dup spill" <byte[]>
+; V14 tmp2 [V14,T15] ( 5, 5 ) ref -> x6 class-hnd single-def "dup spill" <byte[]>
+; V15 tmp3 [V15,T18] ( 2, 2 ) long -> x4 "Cast away GC"
+; V16 tmp4 [V16,T19] ( 2, 2 ) long -> x6 "Cast away GC"
; V17 tmp5 [V17,T01] ( 3, 24 ) ref -> x2 "arr expr"
; V18 tmp6 [V18,T02] ( 3, 24 ) ref -> x6 "arr expr"
-;* V19 tmp7 [V19,T21] ( 0, 0 ) ref -> zero-ref single-def "arr expr"
-;* V20 tmp8 [V20,T22] ( 0, 0 ) ref -> zero-ref single-def "arr expr"
-; V21 cse0 [V21,T06] ( 3, 8.50) int -> x2 "CSE #11: aggressive"
-; V22 cse1 [V22,T07] ( 3, 8.50) int -> x4 "CSE #14: aggressive"
-; V23 cse2 [V23,T14] ( 3, 6 ) int -> x7 "CSE #07: aggressive"
-; V24 cse3 [V24,T12] ( 4, 6.50) int -> x0 "CSE #06: aggressive"
-; V25 cse4 [V25,T10] ( 4, 6.50) ref -> x3 "CSE #01: aggressive"
-; V26 cse5 [V26,T11] ( 4, 6.50) ref -> x5 "CSE #03: aggressive"
+;* V19 tmp7 [V19,T20] ( 0, 0 ) ref -> zero-ref single-def "arr expr"
+;* V20 tmp8 [V20,T21] ( 0, 0 ) ref -> zero-ref single-def "arr expr"
+; V21 cse0 [V21,T05] ( 3, 8.50) int -> x2 "CSE #11: aggressive"
+; V22 cse1 [V22,T06] ( 3, 8.50) int -> x4 "CSE #14: aggressive"
+; V23 cse2 [V23,T13] ( 3, 6 ) int -> x7 "CSE #07: aggressive"
+; V24 cse3 [V24,T11] ( 4, 6.50) int -> x0 "CSE #06: aggressive"
+; V25 cse4 [V25,T09] ( 4, 6.50) ref -> x3 "CSE #01: aggressive"
+; V26 cse5 [V26,T10] ( 4, 6.50) ref -> x5 "CSE #03: aggressive"
; V27 cse6 [V27,T03] ( 3, 12 ) long -> x4 "CSE #08: aggressive"
; V28 cse7 [V28,T04] ( 3, 12 ) long -> x8 "CSE #05: aggressive"
;
@@ -46,7 +46,6 @@ G_M892_IG01: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, n
;; size=12 bbWeight=1 PerfScore 2.50
G_M892_IG02: ; bbWeight=1, gcrefRegs=0001 {x0}, byrefRegs=0000 {}, byref, isz
; gcrRegs +[x0]
- ptrue p0.b
mov w1, wzr
cntb x2, all
ldr x3, [x0, #0x10]
@@ -57,7 +56,7 @@ G_M892_IG02: ; bbWeight=1, gcrefRegs=0001 {x0}, byrefRegs=0000 {}, byref,
ldr w6, [x5, #0x08]
cmp w4, w6
bne G_M892_IG11
- ;; size=36 bbWeight=1 PerfScore 18.00
+ ;; size=32 bbWeight=1 PerfScore 16.00
G_M892_IG03: ; bbWeight=0.50, gcrefRegs=0029 {x0 x3 x5}, byrefRegs=0000 {}, byref, isz
mov x4, x3
; gcrRegs +[x4]
@@ -99,14 +98,14 @@ G_M892_IG07: ; bbWeight=1, gcrefRegs=0029 {x0 x3 x5}, byrefRegs=0000 {},
G_M892_IG08: ; bbWeight=4, gcrefRegs=0028 {x3 x5}, byrefRegs=0000 {}, byref, isz
sxtw x8, w1
add x9, x4, x8
+ ptrue p0.b
ld1b { z16.b }, p0/z, [x9]
add x8, x6, x8
ld1b { z17.b }, p0/z, [x8]
- ptrue p1.b
- cmpne p1.b, p1/z, z16.b, z17.b
- mov z16.b, p1/z, #1
- ptrue p1.b
- uaddv d16, p1, z16.b
+ cmpne p0.b, p0/z, z16.b, z17.b
+ mov z16.b, p0/z, #1
+ ptrue p0.b
+ uaddv d16, p0, z16.b
umov x8, v16.d[0]
uxtb w8, w8
cmp w8, #0
@@ -169,7 +168,7 @@ G_M892_IG15: ; bbWeight=0, gcVars=0000000000000000 {}, gcrefRegs=0000 {},
brk #0
;; size=8 bbWeight=0 PerfScore 0.00
-; Total bytes of code 308, prolog size 12, PerfScore 259.00, instruction count 77, allocated bytes for code 308 (MethodHash=5bfdfc83) for method SveBenchmarks.StrCmp:SveTail():long:this (FullOpts)
+; Total bytes of code 304, prolog size 12, PerfScore 257.00, instruction count 76, allocated bytes for code 304 (MethodHash=5bfdfc83) for method SveBenchmarks.StrCmp:SveTail():long:this (FullOpts)
; ============================================================
Unwind Info:
@@ -180,7 +179,7 @@ Unwind Info:
E bit : 0
X bit : 0
Vers : 0
- Function Length : 77 (0x0004d) Actual length = 308 (0x000134)
+ Function Length : 76 (0x0004c) Actual length = 304 (0x000130)
---- Epilog scopes ----
---- Scope 0
Epilog Start Offset : 3523193630 (0xd1ffab1e) Actual offset = 3523193630 (0xd1ffab1e) Offset from main function begin = 3523193630 (0xd1ffab1e)
-4 (-1.05%) : 26420.dasm - SveBenchmarks.StrIndexOf:SveTail():int:this (FullOpts)
@@ -21,14 +21,13 @@
;# V10 OutArgs [V10 ] ( 1, 1 ) struct ( 0) [sp+0x00] do-not-enreg[XS] addr-exposed "OutgoingArgSpace" <Empty>
; V11 tmp1 [V11,T05] ( 5, 8 ) ref -> x3 class-hnd single-def "dup spill" <char[]>
;* V12 tmp2 [V12 ] ( 0, 0 ) ushort -> zero-ref "Inlining Arg"
-;* V13 tmp3 [V13 ] ( 0, 0 ) simd16 -> zero-ref "Inlining Arg" <System.Numerics.Vector`1[short]>
-; V14 tmp4 [V14,T11] ( 2, 2 ) long -> x3 "Cast away GC"
-; V15 tmp5 [V15,T01] ( 3, 24 ) ref -> x3 "arr expr"
-; V16 cse0 [V16,T08] ( 3, 6 ) int -> x4 "CSE #07: aggressive"
-; V17 cse1 [V17,T03] ( 5, 10.25) int -> x0 "CSE #02: aggressive"
-; V18 cse2 [V18,T07] ( 3, 6 ) ref -> x2 "CSE #06: aggressive"
-; V19 cse3 [V19,T04] ( 4, 10 ) int -> x5 "CSE #05: aggressive"
-; V20 cse4 [V20,T10] ( 2, 4.25) mask -> p0 hoist "CSE #03: aggressive"
+; V13 tmp3 [V13,T11] ( 2, 2 ) long -> x3 "Cast away GC"
+; V14 tmp4 [V14,T01] ( 3, 24 ) ref -> x3 "arr expr"
+; V15 cse0 [V15,T08] ( 3, 6 ) int -> x4 "CSE #07: aggressive"
+; V16 cse1 [V16,T03] ( 5, 10.25) int -> x0 "CSE #02: aggressive"
+; V17 cse2 [V17,T07] ( 3, 6 ) ref -> x2 "CSE #06: aggressive"
+; V18 cse3 [V18,T04] ( 4, 10 ) int -> x5 "CSE #05: aggressive"
+; V19 cse4 [V19,T10] ( 2, 4.25) mask -> p0 hoist "CSE #03: aggressive"
;
; Lcl frame size = 16
@@ -62,14 +61,13 @@ G_M34028_IG04: ; bbWeight=0.50, gcrefRegs=0005 {x0 x2}, byrefRegs=0000 {}
G_M34028_IG05: ; bbWeight=1, gcrefRegs=0005 {x0 x2}, byrefRegs=0000 {}, byref, isz
ldrh w4, [x0, #0x14]
dup v16.8h, w4
- ptrue p0.h
- mov z17.h, p0/z, #1
+ mvni v17.4s, #0
ldr w0, [x0, #0x10]
; gcrRegs -[x0]
cnth x5, all
cmp w0, w5
ble G_M34028_IG10
- ;; size=32 bbWeight=1 PerfScore 15.50
+ ;; size=28 bbWeight=1 PerfScore 12.00
G_M34028_IG06: ; bbWeight=0.25, gcrefRegs=0004 {x2}, byrefRegs=0000 {}, byref
ptrue p0.h
cmpne p0.h, p0/z, z17.h, #0
@@ -177,7 +175,7 @@ G_M34028_IG18: ; bbWeight=0, gcVars=0000000000000000 {}, gcrefRegs=0000 {
brk #0
;; size=8 bbWeight=0 PerfScore 0.00
-; Total bytes of code 380, prolog size 12, PerfScore 236.38, instruction count 95, allocated bytes for code 380 (MethodHash=3e617b13) for method SveBenchmarks.StrIndexOf:SveTail():int:this (FullOpts)
+; Total bytes of code 376, prolog size 12, PerfScore 232.88, instruction count 94, allocated bytes for code 376 (MethodHash=3e617b13) for method SveBenchmarks.StrIndexOf:SveTail():int:this (FullOpts)
; ============================================================
Unwind Info:
@@ -188,7 +186,7 @@ Unwind Info:
E bit : 0
X bit : 0
Vers : 0
- Function Length : 95 (0x0005f) Actual length = 380 (0x00017c)
+ Function Length : 94 (0x0005e) Actual length = 376 (0x000178)
---- Epilog scopes ----
---- Scope 0
Epilog Start Offset : 3523193630 (0xd1ffab1e) Actual offset = 3523193630 (0xd1ffab1e) Offset from main function begin = 3523193630 (0xd1ffab1e)
+8 (+2.11%) : 6897.dasm - SveBenchmarks.StrCmp:SveStrCmp():long:this (FullOpts)
@@ -8,31 +8,31 @@
; 0 inlinees with PGO data; 1 single block inlinees; 0 inlinees without PGO data
; Final local variable assignments
;
-; V00 this [V00,T06] ( 9, 7 ) ref -> x0 this class-hnd single-def <SveBenchmarks.StrCmp>
+; V00 this [V00,T05] ( 9, 7 ) ref -> x0 this class-hnd single-def <SveBenchmarks.StrCmp>
; V01 loc0 [V01,T02] ( 6, 17.50) int -> x1
; V02 loc1 [V02,T04] ( 5, 10 ) int -> x2 single-def
-; V03 loc2 [V03,T05] ( 4, 10 ) mask -> p0 single-def <System.Numerics.Vector`1[byte]>
-; V04 loc3 [V04,T01] ( 6, 18 ) mask -> p1 <System.Numerics.Vector`1[byte]>
+;* V03 loc2 [V03,T19] ( 0, 0 ) mask -> zero-ref single-def <System.Numerics.Vector`1[byte]>
+; V04 loc3 [V04,T01] ( 6, 18 ) mask -> p0 <System.Numerics.Vector`1[byte]>
; V05 loc4 [V05,T20] ( 4, 13 ) simd16 -> d16 <System.Numerics.Vector`1[byte]>
;* V06 loc5 [V06 ] ( 0, 0 ) simd16 -> zero-ref <System.Numerics.Vector`1[byte]>
;* V07 loc6 [V07 ] ( 0, 0 ) simd16 -> zero-ref <System.Numerics.Vector`1[byte]>
-; V08 loc7 [V08,T11] ( 3, 5 ) long -> x4
-; V09 loc8 [V09,T12] ( 3, 5 ) long -> x5
+; V08 loc7 [V08,T10] ( 3, 5 ) long -> x4
+; V09 loc8 [V09,T11] ( 3, 5 ) long -> x5
; V10 loc9 [V10 ] ( 1, 0.50) ref -> [fp+0x28] must-init pinned class-hnd single-def <byte[]>
; V11 loc10 [V11 ] ( 1, 0.50) ref -> [fp+0x20] must-init pinned class-hnd single-def <byte[]>
-; V12 loc11 [V12,T10] ( 4, 5 ) int -> x3
-; V13 loc12 [V13,T18] ( 3, 1.50) int -> x3 single-def
+; V12 loc11 [V12,T09] ( 4, 5 ) int -> x3
+; V13 loc12 [V13,T17] ( 3, 1.50) int -> x3 single-def
; V14 loc13 [V14,T00] ( 7, 22.50) int -> x4
;# V15 OutArgs [V15 ] ( 1, 1 ) struct ( 0) [sp+0x00] do-not-enreg[XS] addr-exposed "OutgoingArgSpace" <Empty>
-; V16 tmp1 [V16,T08] ( 5, 5 ) ref -> x4 class-hnd single-def "dup spill" <byte[]>
-; V17 tmp2 [V17,T09] ( 5, 5 ) ref -> x5 class-hnd single-def "dup spill" <byte[]>
-; V18 tmp3 [V18,T16] ( 2, 2 ) long -> x4 "Cast away GC"
-; V19 tmp4 [V19,T17] ( 2, 2 ) long -> x5 "Cast away GC"
-; V20 tmp5 [V20,T13] ( 3, 3 ) ref -> x2 single-def "arr expr"
-; V21 tmp6 [V21,T14] ( 3, 3 ) ref -> x0 single-def "arr expr"
-; V22 cse0 [V22,T07] ( 3, 6 ) int -> x3 "CSE #05: aggressive"
-; V23 cse1 [V23,T19] ( 3, 1.50) long -> x3 "CSE #08: moderate"
-; V24 cse2 [V24,T15] ( 4, 2 ) int -> x1 "CSE #07: moderate"
+; V16 tmp1 [V16,T07] ( 5, 5 ) ref -> x4 class-hnd single-def "dup spill" <byte[]>
+; V17 tmp2 [V17,T08] ( 5, 5 ) ref -> x5 class-hnd single-def "dup spill" <byte[]>
+; V18 tmp3 [V18,T15] ( 2, 2 ) long -> x4 "Cast away GC"
+; V19 tmp4 [V19,T16] ( 2, 2 ) long -> x5 "Cast away GC"
+; V20 tmp5 [V20,T12] ( 3, 3 ) ref -> x2 single-def "arr expr"
+; V21 tmp6 [V21,T13] ( 3, 3 ) ref -> x0 single-def "arr expr"
+; V22 cse0 [V22,T06] ( 3, 6 ) int -> x3 "CSE #05: aggressive"
+; V23 cse1 [V23,T18] ( 3, 1.50) long -> x3 "CSE #08: moderate"
+; V24 cse2 [V24,T14] ( 4, 2 ) int -> x1 "CSE #07: moderate"
; V25 cse3 [V25,T03] ( 3, 12 ) long -> x6 "CSE #06: aggressive"
; V26 rat0 [V26,T21] ( 3, 9 ) simd16 -> [fp+0x10] do-not-enreg[S] "SIMDInitTempVar"
;
@@ -47,10 +47,9 @@ G_M14759_IG02: ; bbWeight=1, gcrefRegs=0001 {x0}, byrefRegs=0000 {}, byre
; gcrRegs +[x0]
mov w1, wzr
cntb x2, all
- ptrue p0.b
ldr w3, [x0, #0x20]
mov w4, wzr
- whilelt p1.b, w4, w3
+ whilelt p0.b, w4, w3
movi v16.4s, #0
ldr x4, [x0, #0x10]
; gcrRegs +[x4]
@@ -62,7 +61,7 @@ G_M14759_IG02: ; bbWeight=1, gcrefRegs=0001 {x0}, byrefRegs=0000 {}, byre
; gcrRegs -[x5]
cmp w4, w5
bne G_M14759_IG14
- ;; size=52 bbWeight=1 PerfScore 24.00
+ ;; size=48 bbWeight=1 PerfScore 22.00
G_M14759_IG03: ; bbWeight=0.50, gcrefRegs=0001 {x0}, byrefRegs=0000 {}, byref, isz
ldr x4, [x0, #0x10]
; gcrRegs +[x4]
@@ -96,27 +95,30 @@ G_M14759_IG06: ; bbWeight=0.50, gcrefRegs=0001 {x0}, byrefRegs=0000 {}, b
mov x5, xzr
;; size=4 bbWeight=0.50 PerfScore 0.25
G_M14759_IG07: ; bbWeight=1, gcrefRegs=0001 {x0}, byrefRegs=0000 {}, byref, isz
- ptest p0, p1.b
+ ptrue p1.b
+ ptest p1, p0.b
bge G_M14759_IG09
- ;; size=8 bbWeight=1 PerfScore 3.00
+ ;; size=12 bbWeight=1 PerfScore 5.00
G_M14759_IG08: ; bbWeight=4, gcrefRegs=0001 {x0}, byrefRegs=0000 {}, byref, isz
sxtw x6, w1
add x7, x4, x6
- ld1b { z16.b }, p1/z, [x7]
+ ld1b { z16.b }, p0/z, [x7]
add x6, x5, x6
- ld1b { z17.b }, p1/z, [x6]
+ ld1b { z17.b }, p0/z, [x6]
+ ptrue p0.b
+ cmpne p0.b, p0/z, z16.b, z17.b
+ mov z16.b, p0/z, #1
+ ptrue p0.b
+ cmpne p0.b, p0/z, z16.b, #0
ptrue p1.b
- cmpne p1.b, p1/z, z16.b, z17.b
- mov z16.b, p1/z, #1
- ptrue p1.b
- cmpne p1.b, p1/z, z16.b, #0
- ptest p0, p1.b
+ ptest p1, p0.b
bne G_M14759_IG09
add w1, w1, w2
- whilelt p1.b, w1, w3
- ptest p0, p1.b
+ whilelt p0.b, w1, w3
+ ptrue p1.b
+ ptest p1, p0.b
blt G_M14759_IG08
- ;; size=64 bbWeight=4 PerfScore 152.00
+ ;; size=72 bbWeight=4 PerfScore 168.00
G_M14759_IG09: ; bbWeight=0.50, gcrefRegs=0001 {x0}, byrefRegs=0000 {}, byref, isz
mov w3, wzr
mov w4, wzr
@@ -198,7 +200,7 @@ G_M14759_IG19: ; bbWeight=0, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
brk #0
;; size=8 bbWeight=0 PerfScore 0.00
-; Total bytes of code 380, prolog size 12, PerfScore 248.50, instruction count 95, allocated bytes for code 380 (MethodHash=5df7c658) for method SveBenchmarks.StrCmp:SveStrCmp():long:this (FullOpts)
+; Total bytes of code 388, prolog size 12, PerfScore 264.50, instruction count 97, allocated bytes for code 388 (MethodHash=5df7c658) for method SveBenchmarks.StrCmp:SveStrCmp():long:this (FullOpts)
; ============================================================
Unwind Info:
@@ -209,7 +211,7 @@ Unwind Info:
E bit : 0
X bit : 0
Vers : 0
- Function Length : 95 (0x0005f) Actual length = 380 (0x00017c)
+ Function Length : 97 (0x00061) Actual length = 388 (0x000184)
---- Epilog scopes ----
---- Scope 0
Epilog Start Offset : 3523193630 (0xd1ffab1e) Actual offset = 3523193630 (0xd1ffab1e) Offset from main function begin = 3523193630 (0xd1ffab1e)
+8 (+5.26%) : 21539.dasm - SveBenchmarks.StrLen:SveStrLen():ulong:this (FullOpts)
@@ -8,19 +8,20 @@
; Final local variable assignments
;
; V00 this [V00,T05] ( 4, 4 ) ref -> x0 this class-hnd single-def <SveBenchmarks.StrLen>
-; V01 loc0 [V01,T04] ( 3, 7 ) mask -> p0 single-def <System.Numerics.Vector`1[byte]>
+; V01 loc0 [V01,T11] ( 2, 3 ) simd16 -> d16 <System.Numerics.Vector`1[byte]>
;* V02 loc1 [V02 ] ( 0, 0 ) mask -> zero-ref <System.Numerics.Vector`1[byte]>
-; V03 loc2 [V03,T10] ( 5, 13 ) simd16 -> d16 <System.Numerics.Vector`1[byte]>
+; V03 loc2 [V03,T10] ( 5, 13 ) simd16 -> d17 <System.Numerics.Vector`1[byte]>
; V04 loc3 [V04,T00] ( 6, 18 ) long -> x1
; V05 loc4 [V05,T07] ( 2, 5 ) long -> x2 single-def
-; V06 loc5 [V06,T01] ( 5, 12 ) mask -> p1 <System.Numerics.Vector`1[byte]>
-; V07 loc6 [V07,T03] ( 4, 7 ) long -> x0
+; V06 loc5 [V06,T01] ( 5, 12 ) mask -> p0 <System.Numerics.Vector`1[byte]>
+; V07 loc6 [V07,T04] ( 4, 7 ) long -> x0
; V08 loc7 [V08 ] ( 1, 1 ) ref -> [fp+0x18] must-init pinned class-hnd single-def <byte[]>
;# V09 OutArgs [V09 ] ( 1, 1 ) struct ( 0) [sp+0x00] do-not-enreg[XS] addr-exposed "OutgoingArgSpace" <Empty>
; V10 tmp1 [V10,T02] ( 5, 8 ) ref -> x0 class-hnd single-def "dup spill" <byte[]>
; V11 tmp2 [V11,T08] ( 2, 2 ) long -> x0 "Cast away GC"
; V12 cse0 [V12,T06] ( 3, 6 ) int -> x3 "CSE #02: aggressive"
-; V13 cse1 [V13,T09] ( 2, 1 ) int -> x4 "CSE #01: moderate"
+; V13 cse1 [V13,T03] ( 3, 8 ) mask -> p1 "CSE #03: aggressive"
+; V14 cse2 [V14,T09] ( 2, 1 ) int -> x4 "CSE #01: moderate"
;
; Lcl frame size = 16
@@ -31,16 +32,16 @@ G_M60402_IG01: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref,
;; size=12 bbWeight=1 PerfScore 2.50
G_M60402_IG02: ; bbWeight=1, gcrefRegs=0001 {x0}, byrefRegs=0000 {}, byref, isz
; gcrRegs +[x0]
- ptrue p0.b
+ mvni v16.4s, #0
mov x1, xzr
cntb x2, all
ldr w3, [x0, #0x18]
mov w4, wzr
- whilelt p1.b, w4, w3
+ whilelt p0.b, w4, w3
ldr x0, [x0, #0x08]
str x0, [fp, #0x18] // [V08 loc7]
cbz x0, G_M60402_IG04
- ;; size=36 bbWeight=1 PerfScore 15.00
+ ;; size=36 bbWeight=1 PerfScore 13.50
G_M60402_IG03: ; bbWeight=0.50, gcrefRegs=0001 {x0}, byrefRegs=0000 {}, byref, isz
ldr w4, [x0, #0x08]
cbz w4, G_M60402_IG04
@@ -54,28 +55,30 @@ G_M60402_IG04: ; bbWeight=0.50, gcrefRegs=0000 {}, byrefRegs=0000 {}, byr
mov x0, xzr
;; size=4 bbWeight=0.50 PerfScore 0.25
G_M60402_IG05: ; bbWeight=2, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, isz
- ld1b { z16.b }, p1/z, [x0]
- movi v17.4s, #0
+ ld1b { z17.b }, p0/z, [x0]
+ ptrue p1.b
+ cmpne p1.b, p1/z, z16.b, #0
+ movi v16.4s, #0
ptrue p2.b
- cmpeq p2.b, p2/z, z16.b, z17.b
- ptest p0, p2.b
+ cmpeq p2.b, p2/z, z17.b, z16.b
+ ptest p1, p2.b
bne G_M60402_IG07
- ;; size=24 bbWeight=2 PerfScore 33.00
+ ;; size=32 bbWeight=2 PerfScore 43.00
G_M60402_IG06: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, isz
add x1, x1, x2
- whilelt p1.b, w1, w3
+ whilelt p0.b, w1, w3
add x4, x0, x1
- ld1b { z16.b }, p1/z, [x4]
- movi v17.4s, #0
+ ld1b { z17.b }, p0/z, [x4]
+ movi v16.4s, #0
ptrue p2.b
- cmpeq p2.b, p2/z, z16.b, z17.b
- ptest p0, p2.b
+ cmpeq p2.b, p2/z, z17.b, z16.b
+ ptest p1, p2.b
beq G_M60402_IG06
;; size=36 bbWeight=4 PerfScore 78.00
G_M60402_IG07: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
- ptrue p0.b
- cmpne p0.b, p0/z, z16.b, #0
- cntp x0, p1, p0.b
+ ptrue p1.b
+ cmpne p1.b, p1/z, z17.b, #0
+ cntp x0, p0, p1.b
add x0, x0, x1
;; size=16 bbWeight=1 PerfScore 7.50
G_M60402_IG08: ; bbWeight=1, epilog, nogc, extend
@@ -83,7 +86,7 @@ G_M60402_IG08: ; bbWeight=1, epilog, nogc, extend
ret lr
;; size=8 bbWeight=1 PerfScore 2.00
-; Total bytes of code 152, prolog size 12, PerfScore 141.00, instruction count 38, allocated bytes for code 152 (MethodHash=b293140d) for method SveBenchmarks.StrLen:SveStrLen():ulong:this (FullOpts)
+; Total bytes of code 160, prolog size 12, PerfScore 149.50, instruction count 40, allocated bytes for code 160 (MethodHash=b293140d) for method SveBenchmarks.StrLen:SveStrLen():ulong:this (FullOpts)
; ============================================================
Unwind Info:
@@ -94,7 +97,7 @@ Unwind Info:
E bit : 0
X bit : 0
Vers : 0
- Function Length : 38 (0x00026) Actual length = 152 (0x000098)
+ Function Length : 40 (0x00028) Actual length = 160 (0x0000a0)
---- Epilog scopes ----
---- Scope 0
Epilog Start Offset : 3523193630 (0xd1ffab1e) Actual offset = 3523193630 (0xd1ffab1e) Offset from main function begin = 3523193630 (0xd1ffab1e)
Size improvements/regressions per collection
PerfScore improvements/regressions per collection
Context information
jit-analyze output
benchmarks.run.linux.arm64.checked.mch
Detail diffs
coreclr_tests.run.linux.arm64.checked.mch
Detail diffs
benchmarks.run_pgo.linux.arm64.checked.mch
Detail diffs
libraries.pmi.linux.arm64.checked.mch
Detail diffs
benchmarks.run_pgo_optrepeat.linux.arm64.checked.mch
Detail diffs
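The CreateFalseMask* and StrIndexOf diffs above replace a predicate-to-vector conversion (`pfalse`/`ptrue` followed by `mov z, p/z, #1`) with a plain Neon immediate (`movi` or `mvni`). The following is a hypothetical lane-level Python model, not JIT code, that assumes a 128-bit vector length (matching the `simd16` locals in the listings) viewed as eight 16-bit lanes. It shows that a false mask converted to a vector is bit-for-bit the zero vector. A true mask converted the same way holds 1 per lane rather than all bits set. Both forms still convert back to the same all-true predicate via `cmpne ..., #0`, which is the only consumer of `z17` visible in the StrIndexOf lines shown.

```python
# Hypothetical lane-level model of SVE/Neon instructions seen in the diffs.
# Assumes a 128-bit vector length, viewed as 8 x 16-bit (.h) lanes.
LANES_H = 8
ALL_BITS_H = 0xFFFF


def pfalse():
    # `pfalse p.b`: no lane is active.
    return [False] * LANES_H


def ptrue():
    # `ptrue p.h` (ALL pattern): every lane is active.
    return [True] * LANES_H


def mov_zeroing_imm(pred, imm):
    # `mov z.h, p/z, #imm`: active lanes receive imm, inactive lanes are zeroed.
    return [(imm & ALL_BITS_H) if active else 0 for active in pred]


def movi_zero():
    # `movi v.4s, #0`: every bit of the register is cleared.
    return [0] * LANES_H


def mvni_zero():
    # `mvni v.4s, #0`: every bit of the register is set.
    return [ALL_BITS_H] * LANES_H


def cmpne_zero(governing, vec):
    # `cmpne p.h, pg/z, z.h, #0`: a lane is active if governed and nonzero.
    return [g and x != 0 for g, x in zip(governing, vec)]


# A false mask converted to a vector is exactly the zero vector.
false_mask_as_vector = mov_zeroing_imm(pfalse(), 1)
# A true mask converted to a vector holds 1 per lane, not all bits set...
true_mask_as_vector = mov_zeroing_imm(ptrue(), 1)
# ...but both forms turn back into the same all-true predicate.
roundtrip_from_mov = cmpne_zero(ptrue(), true_mask_as_vector)
roundtrip_from_mvni = cmpne_zero(ptrue(), mvni_zero())
```

The model deliberately ignores vector lengths above 128 bits, where a Neon write would also zero the upper part of the Z register.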
After removing calls to
Diffs are based on 2,631,375 contexts (1,092,633 MinOpts, 1,538,742 FullOpts).
Overall (-334,028 bytes)
MinOpts (-31,028 bytes)
FullOpts (-303,000 bytes)
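The summary numbers can be cross-checked by hand: the overall byte delta is the sum of the MinOpts and FullOpts deltas, and the context count is the sum of the MinOpts and FullOpts contexts. The following sketch re-derives them, together with the per-method figures in the diff headers above. Two assumptions are labelled in the code: that the header percentage is the byte delta divided by the base size (this matches every header shown, but is not taken from the jit-analyze source), and that the unwind `Function Length` counts 4-byte A64 instruction words, which is how ARM64 unwind data encodes it. The helper names are hypothetical.

```python
# Hypothetical helpers for re-deriving numbers in the superpmi/jit-analyze output.


def relative_change_pct(base_bytes, diff_bytes):
    # Assumption: percentage = (diff - base) / base, rounded to 2 decimals.
    return round((diff_bytes - base_bytes) * 100 / base_bytes, 2)


def unwind_function_length(actual_bytes):
    # ARM64 unwind data records the function length in 4-byte instruction words.
    if actual_bytes % 4 != 0:
        raise ValueError("A64 code size must be a multiple of 4 bytes")
    return actual_bytes // 4


# The overall delta is the sum of the MinOpts and FullOpts deltas.
minopts_delta = -31_028
fullopts_delta = -303_000
overall_delta = minopts_delta + fullopts_delta

# The context count is the sum of the MinOpts and FullOpts contexts.
total_contexts = 1_092_633 + 1_538_742

# Per-method (base bytes, diff bytes) pairs taken from the diff headers.
examples = {
    "CreateFalseMaskInt16": (24, 20),
    "StrCmp:SveTail": (308, 304),
    "StrIndexOf:SveTail": (380, 376),
    "StrCmp:SveStrCmp": (380, 388),
    "StrLen:SveStrLen": (152, 160),
}
for name, (base, diff) in examples.items():
    print(f"{name}: {diff - base:+d} bytes ({relative_change_pct(base, diff):+.2f}%), "
          f"unwind length {unwind_function_length(base)} -> {unwind_function_length(diff)}")
```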
Example diffs
benchmarks.run.linux.arm64.checked.mch
-4 (-1.30%) : 16114.dasm - SveBenchmarks.StrCmp:SveTail():long:this (FullOpts)
@@ -7,33 +7,33 @@
; No matching PGO data
; Final local variable assignments
;
-; V00 this [V00,T09] ( 5, 5 ) ref -> x0 this class-hnd single-def <SveBenchmarks.StrCmp>
-; V01 loc0 [V01,T05] ( 3, 9 ) mask -> p0 single-def <System.Numerics.Vector`1[byte]>
+; V00 this [V00,T08] ( 5, 5 ) ref -> x0 this class-hnd single-def <SveBenchmarks.StrCmp>
+;* V01 loc0 [V01,T22] ( 0, 0 ) mask -> zero-ref single-def <System.Numerics.Vector`1[byte]>
;* V02 loc1 [V02 ] ( 0, 0 ) simd16 -> zero-ref <System.Numerics.Vector`1[byte]>
;* V03 loc2 [V03 ] ( 0, 0 ) simd16 -> zero-ref <System.Numerics.Vector`1[byte]>
;* V04 loc3 [V04 ] ( 0, 0 ) simd16 -> zero-ref <System.Numerics.Vector`1[byte]>
; V05 loc4 [V05,T00] ( 12, 41.50) int -> x1
-; V06 loc5 [V06,T13] ( 3, 6 ) int -> x2 single-def
-; V07 loc6 [V07,T17] ( 3, 5 ) long -> x4
-; V08 loc7 [V08,T18] ( 3, 5 ) long -> x6
+; V06 loc5 [V06,T12] ( 3, 6 ) int -> x2 single-def
+; V07 loc6 [V07,T16] ( 3, 5 ) long -> x4
+; V08 loc7 [V08,T17] ( 3, 5 ) long -> x6
; V09 loc8 [V09 ] ( 1, 0.50) ref -> [fp+0x18] must-init pinned class-hnd single-def <byte[]>
; V10 loc9 [V10 ] ( 1, 0.50) ref -> [fp+0x10] must-init pinned class-hnd single-def <byte[]>
-; V11 loc10 [V11,T08] ( 2, 8 ) ubyte -> x8
+; V11 loc10 [V11,T07] ( 2, 8 ) ubyte -> x8
;# V12 OutArgs [V12 ] ( 1, 1 ) struct ( 0) [sp+0x00] do-not-enreg[XS] addr-exposed "OutgoingArgSpace" <Empty>
-; V13 tmp1 [V13,T15] ( 5, 5 ) ref -> x4 class-hnd single-def "dup spill" <byte[]>
-; V14 tmp2 [V14,T16] ( 5, 5 ) ref -> x6 class-hnd single-def "dup spill" <byte[]>
-; V15 tmp3 [V15,T19] ( 2, 2 ) long -> x4 "Cast away GC"
-; V16 tmp4 [V16,T20] ( 2, 2 ) long -> x6 "Cast away GC"
+; V13 tmp1 [V13,T14] ( 5, 5 ) ref -> x4 class-hnd single-def "dup spill" <byte[]>
+; V14 tmp2 [V14,T15] ( 5, 5 ) ref -> x6 class-hnd single-def "dup spill" <byte[]>
+; V15 tmp3 [V15,T18] ( 2, 2 ) long -> x4 "Cast away GC"
+; V16 tmp4 [V16,T19] ( 2, 2 ) long -> x6 "Cast away GC"
; V17 tmp5 [V17,T01] ( 3, 24 ) ref -> x2 "arr expr"
; V18 tmp6 [V18,T02] ( 3, 24 ) ref -> x6 "arr expr"
-;* V19 tmp7 [V19,T21] ( 0, 0 ) ref -> zero-ref single-def "arr expr"
-;* V20 tmp8 [V20,T22] ( 0, 0 ) ref -> zero-ref single-def "arr expr"
-; V21 cse0 [V21,T06] ( 3, 8.50) int -> x2 "CSE #11: aggressive"
-; V22 cse1 [V22,T07] ( 3, 8.50) int -> x4 "CSE #14: aggressive"
-; V23 cse2 [V23,T14] ( 3, 6 ) int -> x7 "CSE #07: aggressive"
-; V24 cse3 [V24,T12] ( 4, 6.50) int -> x0 "CSE #06: aggressive"
-; V25 cse4 [V25,T10] ( 4, 6.50) ref -> x3 "CSE #01: aggressive"
-; V26 cse5 [V26,T11] ( 4, 6.50) ref -> x5 "CSE #03: aggressive"
+;* V19 tmp7 [V19,T20] ( 0, 0 ) ref -> zero-ref single-def "arr expr"
+;* V20 tmp8 [V20,T21] ( 0, 0 ) ref -> zero-ref single-def "arr expr"
+; V21 cse0 [V21,T05] ( 3, 8.50) int -> x2 "CSE #11: aggressive"
+; V22 cse1 [V22,T06] ( 3, 8.50) int -> x4 "CSE #14: aggressive"
+; V23 cse2 [V23,T13] ( 3, 6 ) int -> x7 "CSE #07: aggressive"
+; V24 cse3 [V24,T11] ( 4, 6.50) int -> x0 "CSE #06: aggressive"
+; V25 cse4 [V25,T09] ( 4, 6.50) ref -> x3 "CSE #01: aggressive"
+; V26 cse5 [V26,T10] ( 4, 6.50) ref -> x5 "CSE #03: aggressive"
; V27 cse6 [V27,T03] ( 3, 12 ) long -> x4 "CSE #08: aggressive"
; V28 cse7 [V28,T04] ( 3, 12 ) long -> x8 "CSE #05: aggressive"
;
@@ -46,7 +46,6 @@ G_M892_IG01: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, n
;; size=12 bbWeight=1 PerfScore 2.50
G_M892_IG02: ; bbWeight=1, gcrefRegs=0001 {x0}, byrefRegs=0000 {}, byref, isz
; gcrRegs +[x0]
- ptrue p0.b
mov w1, wzr
cntb x2, all
ldr x3, [x0, #0x10]
@@ -57,7 +56,7 @@ G_M892_IG02: ; bbWeight=1, gcrefRegs=0001 {x0}, byrefRegs=0000 {}, byref,
ldr w6, [x5, #0x08]
cmp w4, w6
bne G_M892_IG11
- ;; size=36 bbWeight=1 PerfScore 18.00
+ ;; size=32 bbWeight=1 PerfScore 16.00
G_M892_IG03: ; bbWeight=0.50, gcrefRegs=0029 {x0 x3 x5}, byrefRegs=0000 {}, byref, isz
mov x4, x3
; gcrRegs +[x4]
@@ -99,14 +98,14 @@ G_M892_IG07: ; bbWeight=1, gcrefRegs=0029 {x0 x3 x5}, byrefRegs=0000 {},
G_M892_IG08: ; bbWeight=4, gcrefRegs=0028 {x3 x5}, byrefRegs=0000 {}, byref, isz
sxtw x8, w1
add x9, x4, x8
+ ptrue p0.b
ld1b { z16.b }, p0/z, [x9]
add x8, x6, x8
ld1b { z17.b }, p0/z, [x8]
- ptrue p1.b
- cmpne p1.b, p1/z, z16.b, z17.b
- mov z16.b, p1/z, #1
- ptrue p1.b
- uaddv d16, p1, z16.b
+ cmpne p0.b, p0/z, z16.b, z17.b
+ mov z16.b, p0/z, #1
+ ptrue p0.b
+ uaddv d16, p0, z16.b
umov x8, v16.d[0]
uxtb w8, w8
cmp w8, #0
@@ -169,7 +168,7 @@ G_M892_IG15: ; bbWeight=0, gcVars=0000000000000000 {}, gcrefRegs=0000 {},
brk #0
;; size=8 bbWeight=0 PerfScore 0.00
-; Total bytes of code 308, prolog size 12, PerfScore 259.00, instruction count 77, allocated bytes for code 308 (MethodHash=5bfdfc83) for method SveBenchmarks.StrCmp:SveTail():long:this (FullOpts)
+; Total bytes of code 304, prolog size 12, PerfScore 257.00, instruction count 76, allocated bytes for code 304 (MethodHash=5bfdfc83) for method SveBenchmarks.StrCmp:SveTail():long:this (FullOpts)
; ============================================================
Unwind Info:
@@ -180,7 +179,7 @@ Unwind Info:
E bit : 0
X bit : 0
Vers : 0
- Function Length : 77 (0x0004d) Actual length = 308 (0x000134)
+ Function Length : 76 (0x0004c) Actual length = 304 (0x000130)
---- Epilog scopes ----
---- Scope 0
Epilog Start Offset : 3523193630 (0xd1ffab1e) Actual offset = 3523193630 (0xd1ffab1e) Offset from main function begin = 3523193630 (0xd1ffab1e)
-4 (-1.05%) : 26115.dasm - SveBenchmarks.StrIndexOf:SveTail():int:this (FullOpts)
@@ -21,14 +21,13 @@
;# V10 OutArgs [V10 ] ( 1, 1 ) struct ( 0) [sp+0x00] do-not-enreg[XS] addr-exposed "OutgoingArgSpace" <Empty>
; V11 tmp1 [V11,T05] ( 5, 8 ) ref -> x3 class-hnd single-def "dup spill" <char[]>
;* V12 tmp2 [V12 ] ( 0, 0 ) ushort -> zero-ref "Inlining Arg"
-;* V13 tmp3 [V13 ] ( 0, 0 ) simd16 -> zero-ref "Inlining Arg" <System.Numerics.Vector`1[short]>
-; V14 tmp4 [V14,T11] ( 2, 2 ) long -> x3 "Cast away GC"
-; V15 tmp5 [V15,T01] ( 3, 24 ) ref -> x3 "arr expr"
-; V16 cse0 [V16,T08] ( 3, 6 ) int -> x4 "CSE #07: aggressive"
-; V17 cse1 [V17,T03] ( 5, 10.25) int -> x0 "CSE #02: aggressive"
-; V18 cse2 [V18,T07] ( 3, 6 ) ref -> x2 "CSE #06: aggressive"
-; V19 cse3 [V19,T04] ( 4, 10 ) int -> x5 "CSE #05: aggressive"
-; V20 cse4 [V20,T10] ( 2, 4.25) mask -> p0 hoist "CSE #03: aggressive"
+; V13 tmp3 [V13,T11] ( 2, 2 ) long -> x3 "Cast away GC"
+; V14 tmp4 [V14,T01] ( 3, 24 ) ref -> x3 "arr expr"
+; V15 cse0 [V15,T08] ( 3, 6 ) int -> x4 "CSE #07: aggressive"
+; V16 cse1 [V16,T03] ( 5, 10.25) int -> x0 "CSE #02: aggressive"
+; V17 cse2 [V17,T07] ( 3, 6 ) ref -> x2 "CSE #06: aggressive"
+; V18 cse3 [V18,T04] ( 4, 10 ) int -> x5 "CSE #05: aggressive"
+; V19 cse4 [V19,T10] ( 2, 4.25) mask -> p0 hoist "CSE #03: aggressive"
;
; Lcl frame size = 16
@@ -62,14 +61,13 @@ G_M34028_IG04: ; bbWeight=0.50, gcrefRegs=0005 {x0 x2}, byrefRegs=0000 {}
G_M34028_IG05: ; bbWeight=1, gcrefRegs=0005 {x0 x2}, byrefRegs=0000 {}, byref, isz
ldrh w4, [x0, #0x14]
dup v16.8h, w4
- ptrue p0.h
- mov z17.h, p0/z, #1
+ mvni v17.4s, #0
ldr w0, [x0, #0x10]
; gcrRegs -[x0]
cnth x5, all
cmp w0, w5
ble G_M34028_IG10
- ;; size=32 bbWeight=1 PerfScore 15.50
+ ;; size=28 bbWeight=1 PerfScore 12.00
G_M34028_IG06: ; bbWeight=0.25, gcrefRegs=0004 {x2}, byrefRegs=0000 {}, byref
ptrue p0.h
cmpne p0.h, p0/z, z17.h, #0
@@ -177,7 +175,7 @@ G_M34028_IG18: ; bbWeight=0, gcVars=0000000000000000 {}, gcrefRegs=0000 {
brk #0
;; size=8 bbWeight=0 PerfScore 0.00
-; Total bytes of code 380, prolog size 12, PerfScore 236.38, instruction count 95, allocated bytes for code 380 (MethodHash=3e617b13) for method SveBenchmarks.StrIndexOf:SveTail():int:this (FullOpts)
+; Total bytes of code 376, prolog size 12, PerfScore 232.88, instruction count 94, allocated bytes for code 376 (MethodHash=3e617b13) for method SveBenchmarks.StrIndexOf:SveTail():int:this (FullOpts)
; ============================================================
Unwind Info:
@@ -188,7 +186,7 @@ Unwind Info:
E bit : 0
X bit : 0
Vers : 0
- Function Length : 95 (0x0005f) Actual length = 380 (0x00017c)
+ Function Length : 94 (0x0005e) Actual length = 376 (0x000178)
---- Epilog scopes ----
---- Scope 0
Epilog Start Offset : 3523193630 (0xd1ffab1e) Actual offset = 3523193630 (0xd1ffab1e) Offset from main function begin = 3523193630 (0xd1ffab1e)
+8 (+2.11%) : 8287.dasm - SveBenchmarks.StrCmp:SveStrCmp():long:this (FullOpts)
@@ -8,31 +8,31 @@
; 0 inlinees with PGO data; 1 single block inlinees; 0 inlinees without PGO data
; Final local variable assignments
;
-; V00 this [V00,T06] ( 9, 7 ) ref -> x0 this class-hnd single-def <SveBenchmarks.StrCmp>
+; V00 this [V00,T05] ( 9, 7 ) ref -> x0 this class-hnd single-def <SveBenchmarks.StrCmp>
; V01 loc0 [V01,T02] ( 6, 17.50) int -> x1
; V02 loc1 [V02,T04] ( 5, 10 ) int -> x2 single-def
-; V03 loc2 [V03,T05] ( 4, 10 ) mask -> p0 single-def <System.Numerics.Vector`1[byte]>
-; V04 loc3 [V04,T01] ( 6, 18 ) mask -> p1 <System.Numerics.Vector`1[byte]>
+;* V03 loc2 [V03,T19] ( 0, 0 ) mask -> zero-ref single-def <System.Numerics.Vector`1[byte]>
+; V04 loc3 [V04,T01] ( 6, 18 ) mask -> p0 <System.Numerics.Vector`1[byte]>
; V05 loc4 [V05,T20] ( 4, 13 ) simd16 -> d16 <System.Numerics.Vector`1[byte]>
;* V06 loc5 [V06 ] ( 0, 0 ) simd16 -> zero-ref <System.Numerics.Vector`1[byte]>
;* V07 loc6 [V07 ] ( 0, 0 ) simd16 -> zero-ref <System.Numerics.Vector`1[byte]>
-; V08 loc7 [V08,T11] ( 3, 5 ) long -> x4
-; V09 loc8 [V09,T12] ( 3, 5 ) long -> x5
+; V08 loc7 [V08,T10] ( 3, 5 ) long -> x4
+; V09 loc8 [V09,T11] ( 3, 5 ) long -> x5
; V10 loc9 [V10 ] ( 1, 0.50) ref -> [fp+0x28] must-init pinned class-hnd single-def <byte[]>
; V11 loc10 [V11 ] ( 1, 0.50) ref -> [fp+0x20] must-init pinned class-hnd single-def <byte[]>
-; V12 loc11 [V12,T10] ( 4, 5 ) int -> x3
-; V13 loc12 [V13,T18] ( 3, 1.50) int -> x3 single-def
+; V12 loc11 [V12,T09] ( 4, 5 ) int -> x3
+; V13 loc12 [V13,T17] ( 3, 1.50) int -> x3 single-def
; V14 loc13 [V14,T00] ( 7, 22.50) int -> x4
;# V15 OutArgs [V15 ] ( 1, 1 ) struct ( 0) [sp+0x00] do-not-enreg[XS] addr-exposed "OutgoingArgSpace" <Empty>
-; V16 tmp1 [V16,T08] ( 5, 5 ) ref -> x4 class-hnd single-def "dup spill" <byte[]>
-; V17 tmp2 [V17,T09] ( 5, 5 ) ref -> x5 class-hnd single-def "dup spill" <byte[]>
-; V18 tmp3 [V18,T16] ( 2, 2 ) long -> x4 "Cast away GC"
-; V19 tmp4 [V19,T17] ( 2, 2 ) long -> x5 "Cast away GC"
-; V20 tmp5 [V20,T13] ( 3, 3 ) ref -> x2 single-def "arr expr"
-; V21 tmp6 [V21,T14] ( 3, 3 ) ref -> x0 single-def "arr expr"
-; V22 cse0 [V22,T07] ( 3, 6 ) int -> x3 "CSE #05: aggressive"
-; V23 cse1 [V23,T19] ( 3, 1.50) long -> x3 "CSE #08: moderate"
-; V24 cse2 [V24,T15] ( 4, 2 ) int -> x1 "CSE #07: moderate"
+; V16 tmp1 [V16,T07] ( 5, 5 ) ref -> x4 class-hnd single-def "dup spill" <byte[]>
+; V17 tmp2 [V17,T08] ( 5, 5 ) ref -> x5 class-hnd single-def "dup spill" <byte[]>
+; V18 tmp3 [V18,T15] ( 2, 2 ) long -> x4 "Cast away GC"
+; V19 tmp4 [V19,T16] ( 2, 2 ) long -> x5 "Cast away GC"
+; V20 tmp5 [V20,T12] ( 3, 3 ) ref -> x2 single-def "arr expr"
+; V21 tmp6 [V21,T13] ( 3, 3 ) ref -> x0 single-def "arr expr"
+; V22 cse0 [V22,T06] ( 3, 6 ) int -> x3 "CSE #05: aggressive"
+; V23 cse1 [V23,T18] ( 3, 1.50) long -> x3 "CSE #08: moderate"
+; V24 cse2 [V24,T14] ( 4, 2 ) int -> x1 "CSE #07: moderate"
; V25 cse3 [V25,T03] ( 3, 12 ) long -> x6 "CSE #06: aggressive"
; V26 rat0 [V26,T21] ( 3, 9 ) simd16 -> [fp+0x10] do-not-enreg[S] "SIMDInitTempVar"
;
@@ -47,10 +47,9 @@ G_M14759_IG02: ; bbWeight=1, gcrefRegs=0001 {x0}, byrefRegs=0000 {}, byre
; gcrRegs +[x0]
mov w1, wzr
cntb x2, all
- ptrue p0.b
ldr w3, [x0, #0x20]
mov w4, wzr
- whilelt p1.b, w4, w3
+ whilelt p0.b, w4, w3
movi v16.4s, #0
ldr x4, [x0, #0x10]
; gcrRegs +[x4]
@@ -62,7 +61,7 @@ G_M14759_IG02: ; bbWeight=1, gcrefRegs=0001 {x0}, byrefRegs=0000 {}, byre
; gcrRegs -[x5]
cmp w4, w5
bne G_M14759_IG14
- ;; size=52 bbWeight=1 PerfScore 24.00
+ ;; size=48 bbWeight=1 PerfScore 22.00
G_M14759_IG03: ; bbWeight=0.50, gcrefRegs=0001 {x0}, byrefRegs=0000 {}, byref, isz
ldr x4, [x0, #0x10]
; gcrRegs +[x4]
@@ -96,27 +95,30 @@ G_M14759_IG06: ; bbWeight=0.50, gcrefRegs=0001 {x0}, byrefRegs=0000 {}, b
mov x5, xzr
;; size=4 bbWeight=0.50 PerfScore 0.25
G_M14759_IG07: ; bbWeight=1, gcrefRegs=0001 {x0}, byrefRegs=0000 {}, byref, isz
- ptest p0, p1.b
+ ptrue p1.b
+ ptest p1, p0.b
bge G_M14759_IG09
- ;; size=8 bbWeight=1 PerfScore 3.00
+ ;; size=12 bbWeight=1 PerfScore 5.00
G_M14759_IG08: ; bbWeight=4, gcrefRegs=0001 {x0}, byrefRegs=0000 {}, byref, isz
sxtw x6, w1
add x7, x4, x6
- ld1b { z16.b }, p1/z, [x7]
+ ld1b { z16.b }, p0/z, [x7]
add x6, x5, x6
- ld1b { z17.b }, p1/z, [x6]
+ ld1b { z17.b }, p0/z, [x6]
+ ptrue p0.b
+ cmpne p0.b, p0/z, z16.b, z17.b
+ mov z16.b, p0/z, #1
+ ptrue p0.b
+ cmpne p0.b, p0/z, z16.b, #0
ptrue p1.b
- cmpne p1.b, p1/z, z16.b, z17.b
- mov z16.b, p1/z, #1
- ptrue p1.b
- cmpne p1.b, p1/z, z16.b, #0
- ptest p0, p1.b
+ ptest p1, p0.b
bne G_M14759_IG09
add w1, w1, w2
- whilelt p1.b, w1, w3
- ptest p0, p1.b
+ whilelt p0.b, w1, w3
+ ptrue p1.b
+ ptest p1, p0.b
blt G_M14759_IG08
- ;; size=64 bbWeight=4 PerfScore 152.00
+ ;; size=72 bbWeight=4 PerfScore 168.00
G_M14759_IG09: ; bbWeight=0.50, gcrefRegs=0001 {x0}, byrefRegs=0000 {}, byref, isz
mov w3, wzr
mov w4, wzr
@@ -198,7 +200,7 @@ G_M14759_IG19: ; bbWeight=0, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
brk #0
;; size=8 bbWeight=0 PerfScore 0.00
-; Total bytes of code 380, prolog size 12, PerfScore 248.50, instruction count 95, allocated bytes for code 380 (MethodHash=5df7c658) for method SveBenchmarks.StrCmp:SveStrCmp():long:this (FullOpts)
+; Total bytes of code 388, prolog size 12, PerfScore 264.50, instruction count 97, allocated bytes for code 388 (MethodHash=5df7c658) for method SveBenchmarks.StrCmp:SveStrCmp():long:this (FullOpts)
; ============================================================
Unwind Info:
@@ -209,7 +211,7 @@ Unwind Info:
E bit : 0
X bit : 0
Vers : 0
- Function Length : 95 (0x0005f) Actual length = 380 (0x00017c)
+ Function Length : 97 (0x00061) Actual length = 388 (0x000184)
---- Epilog scopes ----
---- Scope 0
Epilog Start Offset : 3523193630 (0xd1ffab1e) Actual offset = 3523193630 (0xd1ffab1e) Offset from main function begin = 3523193630 (0xd1ffab1e)

+8 (+5.26%) : 21403.dasm - SveBenchmarks.StrLen:SveStrLen():ulong:this (FullOpts)
@@ -8,19 +8,20 @@
; Final local variable assignments
;
; V00 this [V00,T05] ( 4, 4 ) ref -> x0 this class-hnd single-def <SveBenchmarks.StrLen>
-; V01 loc0 [V01,T04] ( 3, 7 ) mask -> p0 single-def <System.Numerics.Vector`1[byte]>
+; V01 loc0 [V01,T11] ( 2, 3 ) simd16 -> d16 <System.Numerics.Vector`1[byte]>
;* V02 loc1 [V02 ] ( 0, 0 ) mask -> zero-ref <System.Numerics.Vector`1[byte]>
-; V03 loc2 [V03,T10] ( 5, 13 ) simd16 -> d16 <System.Numerics.Vector`1[byte]>
+; V03 loc2 [V03,T10] ( 5, 13 ) simd16 -> d17 <System.Numerics.Vector`1[byte]>
; V04 loc3 [V04,T00] ( 6, 18 ) long -> x1
; V05 loc4 [V05,T07] ( 2, 5 ) long -> x2 single-def
-; V06 loc5 [V06,T01] ( 5, 12 ) mask -> p1 <System.Numerics.Vector`1[byte]>
-; V07 loc6 [V07,T03] ( 4, 7 ) long -> x0
+; V06 loc5 [V06,T01] ( 5, 12 ) mask -> p0 <System.Numerics.Vector`1[byte]>
+; V07 loc6 [V07,T04] ( 4, 7 ) long -> x0
; V08 loc7 [V08 ] ( 1, 1 ) ref -> [fp+0x18] must-init pinned class-hnd single-def <byte[]>
;# V09 OutArgs [V09 ] ( 1, 1 ) struct ( 0) [sp+0x00] do-not-enreg[XS] addr-exposed "OutgoingArgSpace" <Empty>
; V10 tmp1 [V10,T02] ( 5, 8 ) ref -> x0 class-hnd single-def "dup spill" <byte[]>
; V11 tmp2 [V11,T08] ( 2, 2 ) long -> x0 "Cast away GC"
; V12 cse0 [V12,T06] ( 3, 6 ) int -> x3 "CSE #02: aggressive"
-; V13 cse1 [V13,T09] ( 2, 1 ) int -> x4 "CSE #01: moderate"
+; V13 cse1 [V13,T03] ( 3, 8 ) mask -> p1 "CSE #03: aggressive"
+; V14 cse2 [V14,T09] ( 2, 1 ) int -> x4 "CSE #01: moderate"
;
; Lcl frame size = 16
@@ -31,16 +32,16 @@ G_M60402_IG01: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref,
;; size=12 bbWeight=1 PerfScore 2.50
G_M60402_IG02: ; bbWeight=1, gcrefRegs=0001 {x0}, byrefRegs=0000 {}, byref, isz
; gcrRegs +[x0]
- ptrue p0.b
+ mvni v16.4s, #0
mov x1, xzr
cntb x2, all
ldr w3, [x0, #0x18]
mov w4, wzr
- whilelt p1.b, w4, w3
+ whilelt p0.b, w4, w3
ldr x0, [x0, #0x08]
str x0, [fp, #0x18] // [V08 loc7]
cbz x0, G_M60402_IG04
- ;; size=36 bbWeight=1 PerfScore 15.00
+ ;; size=36 bbWeight=1 PerfScore 13.50
G_M60402_IG03: ; bbWeight=0.50, gcrefRegs=0001 {x0}, byrefRegs=0000 {}, byref, isz
ldr w4, [x0, #0x08]
cbz w4, G_M60402_IG04
@@ -54,28 +55,30 @@ G_M60402_IG04: ; bbWeight=0.50, gcrefRegs=0000 {}, byrefRegs=0000 {}, byr
mov x0, xzr
;; size=4 bbWeight=0.50 PerfScore 0.25
G_M60402_IG05: ; bbWeight=2, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, isz
- ld1b { z16.b }, p1/z, [x0]
- movi v17.4s, #0
+ ld1b { z17.b }, p0/z, [x0]
+ ptrue p1.b
+ cmpne p1.b, p1/z, z16.b, #0
+ movi v16.4s, #0
ptrue p2.b
- cmpeq p2.b, p2/z, z16.b, z17.b
- ptest p0, p2.b
+ cmpeq p2.b, p2/z, z17.b, z16.b
+ ptest p1, p2.b
bne G_M60402_IG07
- ;; size=24 bbWeight=2 PerfScore 33.00
+ ;; size=32 bbWeight=2 PerfScore 43.00
G_M60402_IG06: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, isz
add x1, x1, x2
- whilelt p1.b, w1, w3
+ whilelt p0.b, w1, w3
add x4, x0, x1
- ld1b { z16.b }, p1/z, [x4]
- movi v17.4s, #0
+ ld1b { z17.b }, p0/z, [x4]
+ movi v16.4s, #0
ptrue p2.b
- cmpeq p2.b, p2/z, z16.b, z17.b
- ptest p0, p2.b
+ cmpeq p2.b, p2/z, z17.b, z16.b
+ ptest p1, p2.b
beq G_M60402_IG06
;; size=36 bbWeight=4 PerfScore 78.00
G_M60402_IG07: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
- ptrue p0.b
- cmpne p0.b, p0/z, z16.b, #0
- cntp x0, p1, p0.b
+ ptrue p1.b
+ cmpne p1.b, p1/z, z17.b, #0
+ cntp x0, p0, p1.b
add x0, x0, x1
;; size=16 bbWeight=1 PerfScore 7.50
G_M60402_IG08: ; bbWeight=1, epilog, nogc, extend
@@ -83,7 +86,7 @@ G_M60402_IG08: ; bbWeight=1, epilog, nogc, extend
ret lr
;; size=8 bbWeight=1 PerfScore 2.00
-; Total bytes of code 152, prolog size 12, PerfScore 141.00, instruction count 38, allocated bytes for code 152 (MethodHash=b293140d) for method SveBenchmarks.StrLen:SveStrLen():ulong:this (FullOpts)
+; Total bytes of code 160, prolog size 12, PerfScore 149.50, instruction count 40, allocated bytes for code 160 (MethodHash=b293140d) for method SveBenchmarks.StrLen:SveStrLen():ulong:this (FullOpts)
; ============================================================
Unwind Info:
@@ -94,7 +97,7 @@ Unwind Info:
E bit : 0
X bit : 0
Vers : 0
- Function Length : 38 (0x00026) Actual length = 152 (0x000098)
+ Function Length : 40 (0x00028) Actual length = 160 (0x0000a0)
---- Epilog scopes ----
---- Scope 0
Epilog Start Offset : 3523193630 (0xd1ffab1e) Actual offset = 3523193630 (0xd1ffab1e) Offset from main function begin = 3523193630 (0xd1ffab1e)

coreclr_tests.run.linux.arm64.checked.mch

-28 (-58.33%) : 358603.dasm - PredicateInstructions:BitwiseClear():System.Numerics.Vector`1[short] (FullOpts)
@@ -17,22 +17,15 @@ G_M44742_IG01: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref,
mov fp, sp
;; size=8 bbWeight=1 PerfScore 1.50
G_M44742_IG02: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
- pfalse p0.b
- ptrue p1.h
- ptrue p2.h
- ptrue p3.h
- bic p1.b, p3/z, p1.b, p2.b
- pfalse p2.b
- sel p0.b, p0, p1.b, p2.b
- mov z0.h, p0/z, #1
- ;; size=32 bbWeight=1 PerfScore 16.00
+ movi v0.4s, #0
+ ;; size=4 bbWeight=1 PerfScore 0.50
G_M44742_IG03: ; bbWeight=1, epilog, nogc, extend
ldp fp, lr, [sp], #0x10
ret lr
;; size=8 bbWeight=1 PerfScore 2.00
; END METHOD PredicateInstructions:BitwiseClear():System.Numerics.Vector`1[short]
-; Total bytes of code 48, prolog size 8, PerfScore 19.50, instruction count 12, allocated bytes for code 48 (MethodHash=71345139) for method PredicateInstructions:BitwiseClear():System.Numerics.Vector`1[short] (FullOpts)
+; Total bytes of code 20, prolog size 8, PerfScore 4.00, instruction count 5, allocated bytes for code 20 (MethodHash=71345139) for method PredicateInstructions:BitwiseClear():System.Numerics.Vector`1[short] (FullOpts)
; ============================================================
Unwind Info:
@@ -43,7 +36,7 @@ Unwind Info:
E bit : 0
X bit : 0
Vers : 0
- Function Length : 12 (0x0000c) Actual length = 48 (0x000030)
+ Function Length : 5 (0x00005) Actual length = 20 (0x000014)
---- Epilog scopes ----
---- Scope 0
Epilog Start Offset : 3523193630 (0xd1ffab1e) Actual offset = 3523193630 (0xd1ffab1e) Offset from main function begin = 3523193630 (0xd1ffab1e)

-24 (-54.55%) : 358606.dasm - PredicateInstructions:ConditionalSelect():System.Numerics.Vector`1[int] (FullOpts)
@@ -17,21 +17,15 @@ G_M19455_IG01: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref,
mov fp, sp
;; size=8 bbWeight=1 PerfScore 1.50
G_M19455_IG02: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
- ptrue p0.s
- movi v0.4s, #0
- cmpne p0.s, p0/z, z0.s, #0
- pfalse p1.b
- ptrue p2.s
- sel p0.b, p0, p1.b, p2.b
- mov z0.s, p0/z, #1
- ;; size=28 bbWeight=1 PerfScore 13.50
+ mvni v0.4s, #0
+ ;; size=4 bbWeight=1 PerfScore 0.50
G_M19455_IG03: ; bbWeight=1, epilog, nogc, extend
ldp fp, lr, [sp], #0x10
ret lr
;; size=8 bbWeight=1 PerfScore 2.00
; END METHOD PredicateInstructions:ConditionalSelect():System.Numerics.Vector`1[int]
-; Total bytes of code 44, prolog size 8, PerfScore 17.00, instruction count 11, allocated bytes for code 44 (MethodHash=0304b400) for method PredicateInstructions:ConditionalSelect():System.Numerics.Vector`1[int] (FullOpts)
+; Total bytes of code 20, prolog size 8, PerfScore 4.00, instruction count 5, allocated bytes for code 20 (MethodHash=0304b400) for method PredicateInstructions:ConditionalSelect():System.Numerics.Vector`1[int] (FullOpts)
; ============================================================
Unwind Info:
@@ -42,7 +36,7 @@ Unwind Info:
E bit : 0
X bit : 0
Vers : 0
- Function Length : 11 (0x0000b) Actual length = 44 (0x00002c)
+ Function Length : 5 (0x00005) Actual length = 20 (0x000014)
---- Epilog scopes ----
---- Scope 0
Epilog Start Offset : 3523193630 (0xd1ffab1e) Actual offset = 3523193630 (0xd1ffab1e) Offset from main function begin = 3523193630 (0xd1ffab1e)

-28 (-43.75%) : 679474.dasm - Runtime_1068867:TestEntryPoint() (FullOpts)
@@ -16,7 +16,6 @@
;* V05 tmp1 [V05 ] ( 0, 0 ) long -> zero-ref class-hnd exact "NewObj constructor temp" <C0>
;* V06 tmp2 [V06 ] ( 0, 0 ) simd16 -> zero-ref "location for address-of(RValue)"
;* V07 tmp3 [V07 ] ( 0, 0 ) struct (16) zero-ref do-not-enreg[SF] "stack allocated C0" <C0>
-; V08 cse0 [V08,T00] ( 3, 3 ) mask -> p0 "CSE #01: aggressive"
;
; Lcl frame size = 0
@@ -24,28 +23,19 @@ G_M538_IG01: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, n
stp fp, lr, [sp, #-0x10]!
mov fp, sp
;; size=8 bbWeight=1 PerfScore 1.50
-G_M538_IG02: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, isz
- ptrue p0.s
+G_M538_IG02: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
movi v0.4s, #0
- cmpne p0.s, p0/z, z0.s, #0
- movi v0.4s, #0
- ldr q16, [@RWD00]
- sel z0.s, p0, z0.s, z16.s
- movi v16.4s, #0
- sel z0.s, p0, z0.s, z16.s
movz x0, #0xD1FFAB1E // code for <unknown method>
movk x0, #0xD1FFAB1E LSL #16
movk x0, #0xD1FFAB1E LSL #32
ldr x0, [x0]
- ;; size=48 bbWeight=1 PerfScore 17.00
+ ;; size=20 bbWeight=1 PerfScore 5.00
G_M538_IG03: ; bbWeight=1, epilog, nogc, extend
ldp fp, lr, [sp], #0x10
br x0
;; size=8 bbWeight=1 PerfScore 2.00
-RWD00 dq 0000000000000001h, 0000000000000000h
-
-; Total bytes of code 64, prolog size 8, PerfScore 20.50, instruction count 16, allocated bytes for code 64 (MethodHash=1c40fde5) for method Runtime_1068867:TestEntryPoint() (FullOpts)
+; Total bytes of code 36, prolog size 8, PerfScore 8.50, instruction count 9, allocated bytes for code 36 (MethodHash=1c40fde5) for method Runtime_1068867:TestEntryPoint() (FullOpts)
; ============================================================
Unwind Info:
@@ -56,7 +46,7 @@ Unwind Info:
E bit : 0
X bit : 0
Vers : 0
- Function Length : 16 (0x00010) Actual length = 64 (0x000040)
+ Function Length : 9 (0x00009) Actual length = 36 (0x000024)
---- Epilog scopes ----
---- Scope 0
Epilog Start Offset : 3523193630 (0xd1ffab1e) Actual offset = 3523193630 (0xd1ffab1e) Offset from main function begin = 3523193630 (0xd1ffab1e)

+16 (+2.53%) : 575424.dasm - JIT.HardwareIntrinsics.Arm._Sve.SimpleTernaryOpTest__Sve_DotProductBySelectedScalar_ulong:RunBasicScenario_Load():this (FullOpts)
@@ -9,12 +9,12 @@
; Final local variable assignments
;
; V00 this [V00,T02] ( 4, 4 ) ref -> x19 this class-hnd single-def <JIT.HardwareIntrinsics.Arm._Sve.SimpleTernaryOpTest__Sve_DotProductBySelectedScalar_ulong>
-;* V01 loc0 [V01,T30] ( 0, 0 ) mask -> zero-ref single-def <System.Numerics.Vector`1[ushort]>
-; V02 loc1 [V02,T29] ( 3, 3 ) mask -> [fp+0x10] spill-single-def <System.Numerics.Vector`1[ushort]>
-; V03 loc2 [V03,T33] ( 2, 2 ) simd16 -> d8 <System.Numerics.Vector`1[ulong]>
+;* V01 loc0 [V01,T34] ( 0, 0 ) simd16 -> zero-ref <System.Numerics.Vector`1[ushort]>
+; V02 loc1 [V02,T32] ( 2, 2 ) simd16 -> d8 <System.Numerics.Vector`1[ushort]>
+; V03 loc2 [V03,T33] ( 2, 2 ) simd16 -> d10 <System.Numerics.Vector`1[ulong]>
;# V04 OutArgs [V04 ] ( 1, 1 ) struct ( 0) [sp+0x00] do-not-enreg[XS] addr-exposed "OutgoingArgSpace" <Empty>
-; V05 tmp1 [V05,T31] ( 2, 4 ) simd16 -> d8 "impAppendStmt"
-; V06 tmp2 [V06,T32] ( 2, 4 ) simd16 -> d10 "impAppendStmt"
+; V05 tmp1 [V05,T30] ( 2, 4 ) simd16 -> d10 "impAppendStmt"
+; V06 tmp2 [V06,T31] ( 2, 4 ) simd16 -> d8 "impAppendStmt"
; V07 tmp3 [V07,T18] ( 2, 4 ) long -> x21 "impAppendStmt"
; V08 tmp4 [V08,T19] ( 2, 4 ) long -> x22 "impAppendStmt"
; V09 tmp5 [V09,T20] ( 2, 4 ) long -> x23 "impAppendStmt"
@@ -51,21 +51,23 @@
;* V40 tmp36 [V40 ] ( 0, 0 ) long -> zero-ref ld-addr-op "Inline stloc first use temp"
; V41 tmp37 [V41,T28] ( 2, 4 ) long -> x0 "Inlining Arg"
; V42 tmp38 [V42,T17] ( 3, 6 ) long -> x4 "Inlining Arg"
-; V43 cse0 [V43,T00] ( 9, 9 ) byref -> x20 "CSE #02: aggressive"
+; V43 cse0 [V43,T29] ( 3, 3 ) mask -> [fp+0x18] spill-single-def "CSE #02: moderate"
+; V44 cse1 [V44,T00] ( 9, 9 ) byref -> x20 "CSE #01: aggressive"
;
-; Lcl frame size = 8
+; Lcl frame size = 16
G_M33034_IG01: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, nogc <-- Prolog IG
- stp fp, lr, [sp, #-0x60]!
- stp d8, d9, [sp, #0x18]
- stp d10, d11, [sp, #0x28]
- stp x19, x20, [sp, #0x38]
- stp x21, x22, [sp, #0x48]
- str x23, [sp, #0x58]
+ stp fp, lr, [sp, #-0x70]!
+ stp d8, d9, [sp, #0x20]
+ stp d10, d11, [sp, #0x30]
+ str d12, [sp, #0x40]
+ stp x19, x20, [sp, #0x48]
+ stp x21, x22, [sp, #0x58]
+ str x23, [sp, #0x68]
mov fp, sp
mov x19, x0
; gcrRegs +[x19]
- ;; size=32 bbWeight=1 PerfScore 7.00
+ ;; size=36 bbWeight=1 PerfScore 8.00
G_M33034_IG02: ; bbWeight=1, gcrefRegs=80000 {x19}, byrefRegs=0000 {}, byref
movz x0, #0xD1FFAB1E
movk x0, #0xD1FFAB1E LSL #16
@@ -85,9 +87,7 @@ G_M33034_IG02: ; bbWeight=1, gcrefRegs=80000 {x19}, byrefRegs=0000 {}, by
ldr x1, [x1]
blr x1
; gcrRegs -[x0]
- ptrue p0.h
- add xip1, fp, #16
- str p0, [xip1]
+ mvni v8.4s, #0
add x20, x19, #96
; byrRegs +[x20]
mov x21, x20
@@ -99,22 +99,6 @@ G_M33034_IG02: ; bbWeight=1, gcrefRegs=80000 {x19}, byrefRegs=0000 {}, by
movk x1, #0xD1FFAB1E LSL #16
movk x1, #0xD1FFAB1E LSL #32
ldr x1, [x1]
- blr x1
- ; byrRegs -[x0]
- ldr x1, [x21, #0x20]
- add x0, x0, x1
- sub x0, x0, #1
- sub x1, x1, #1
- bic x0, x0, x1
- ptrue p0.d
- ld1d { z8.d }, p0/z, [x0]
- mov x21, x20
- add x0, x21, #48
- ; byrRegs +[x0]
- movz x1, #0xD1FFAB1E // code for <unknown method>
- movk x1, #0xD1FFAB1E LSL #16
- movk x1, #0xD1FFAB1E LSL #32
- ldr x1, [x1]
mov v9.d[0], v8.d[1]
blr x1
; byrRegs -[x0]
@@ -123,11 +107,10 @@ G_M33034_IG02: ; bbWeight=1, gcrefRegs=80000 {x19}, byrefRegs=0000 {}, by
sub x0, x0, #1
sub x1, x1, #1
bic x0, x0, x1
- add xip1, fp, #16
- ldr p0, [xip1]
- ld1h { z10.h }, p0/z, [x0]
+ ptrue p0.d
+ ld1d { z10.d }, p0/z, [x0]
mov x21, x20
- add x0, x21, #56
+ add x0, x21, #48
; byrRegs +[x0]
movz x1, #0xD1FFAB1E // code for <unknown method>
movk x1, #0xD1FFAB1E LSL #16
@@ -141,20 +124,20 @@ G_M33034_IG02: ; bbWeight=1, gcrefRegs=80000 {x19}, byrefRegs=0000 {}, by
sub x0, x0, #1
sub x1, x1, #1
bic x0, x0, x1
- add xip1, fp, #16
- ldr p0, [xip1]
- ld1h { z7.h }, p0/z, [x0]
+ ptrue p0.h
mov v8.d[1], v9.d[0]
- mov v10.d[1], v11.d[0]
- udot z8.d, z10.h, z7.h[1]
+ cmpne p0.h, p0/z, z8.h, #0
+ add xip1, fp, #24
+ str p0, [xip1]
+ ld1h { z8.h }, p0/z, [x0]
mov x21, x20
- add x0, x21, #64
+ add x0, x21, #56
; byrRegs +[x0]
movz x1, #0xD1FFAB1E // code for <unknown method>
movk x1, #0xD1FFAB1E LSL #16
movk x1, #0xD1FFAB1E LSL #32
ldr x1, [x1]
- mov v9.d[0], v8.d[1]
+ mov v12.d[0], v8.d[1]
blr x1
; byrRegs -[x0]
ldr x1, [x21, #0x20]
@@ -162,8 +145,29 @@ G_M33034_IG02: ; bbWeight=1, gcrefRegs=80000 {x19}, byrefRegs=0000 {}, by
sub x0, x0, #1
sub x1, x1, #1
bic x0, x0, x1
- mov v8.d[1], v9.d[0]
- str q8, [x0]
+ add xip1, fp, #24
+ ldr p0, [xip1]
+ ld1h { z7.h }, p0/z, [x0]
+ mov v10.d[1], v11.d[0]
+ mov v8.d[1], v12.d[0]
+ udot z10.d, z8.h, z7.h[1]
+ mov x21, x20
+ add x0, x21, #64
+ ; byrRegs +[x0]
+ movz x1, #0xD1FFAB1E // code for <unknown method>
+ movk x1, #0xD1FFAB1E LSL #16
+ movk x1, #0xD1FFAB1E LSL #32
+ ldr x1, [x1]
+ mov v8.d[0], v10.d[1]
+ blr x1
+ ; byrRegs -[x0]
+ ldr x1, [x21, #0x20]
+ add x0, x0, x1
+ sub x0, x0, #1
+ sub x1, x1, #1
+ bic x0, x0, x1
+ mov v10.d[1], v8.d[0]
+ str q10, [x0]
mov x21, x20
add x0, x21, #40
; byrRegs +[x0]
@@ -236,29 +240,30 @@ G_M33034_IG02: ; bbWeight=1, gcrefRegs=80000 {x19}, byrefRegs=0000 {}, by
movk x6, #0xD1FFAB1E LSL #16
movk x6, #0xD1FFAB1E LSL #32
ldr x6, [x6]
- ;; size=572 bbWeight=1 PerfScore 168.50
+ ;; size=580 bbWeight=1 PerfScore 168.50
G_M33034_IG03: ; bbWeight=1, epilog, nogc, extend
- ldr x23, [sp, #0x58]
- ldp x21, x22, [sp, #0x48]
- ldp x19, x20, [sp, #0x38]
- ldp d10, d11, [sp, #0x28]
- ldp d8, d9, [sp, #0x18]
- ldp fp, lr, [sp], #0x60
+ ldr x23, [sp, #0x68]
+ ldp x21, x22, [sp, #0x58]
+ ldp x19, x20, [sp, #0x48]
+ ldr d12, [sp, #0x40]
+ ldp d10, d11, [sp, #0x30]
+ ldp d8, d9, [sp, #0x20]
+ ldp fp, lr, [sp], #0x70
br x6
- ;; size=28 bbWeight=1 PerfScore 8.00
+ ;; size=32 bbWeight=1 PerfScore 10.00
-; Total bytes of code 632, prolog size 28, PerfScore 183.50, instruction count 158, allocated bytes for code 632 (MethodHash=1d6a7ef5) for method JIT.HardwareIntrinsics.Arm._Sve.SimpleTernaryOpTest__Sve_DotProductBySelectedScalar_ulong:RunBasicScenario_Load():this (FullOpts)
+; Total bytes of code 648, prolog size 32, PerfScore 186.50, instruction count 162, allocated bytes for code 648 (MethodHash=1d6a7ef5) for method JIT.HardwareIntrinsics.Arm._Sve.SimpleTernaryOpTest__Sve_DotProductBySelectedScalar_ulong:RunBasicScenario_Load():this (FullOpts)
; ============================================================
Unwind Info:
>> Start offset : 0x000000 (not in unwind data)
>> End offset : 0xd1ffab1e (not in unwind data)
- Code Words : 3
+ Code Words : 4
Epilog Count : 1
E bit : 0
X bit : 0
Vers : 0
- Function Length : 158 (0x0009e) Actual length = 632 (0x000278)
+ Function Length : 162 (0x000a2) Actual length = 648 (0x000288)
---- Epilog scopes ----
---- Scope 0
Epilog Start Offset : 3523193630 (0xd1ffab1e) Actual offset = 3523193630 (0xd1ffab1e) Offset from main function begin = 3523193630 (0xd1ffab1e)
@@ -266,12 +271,15 @@ Unwind Info:
---- Unwind codes ----
E1 set_fp; mov fp, sp
---- Epilog start at index 1 ----
- D1 0B save_reg X#4 Z#11 (0x0B); str x23, [sp, #88]
+ D1 0D save_reg X#4 Z#13 (0x0D); str x23, [sp, #104]
E6 save_next
- C8 07 save_regp X#0 Z#7 (0x07); stp x19, x20, [sp, #56]
+ C8 09 save_regp X#0 Z#9 (0x09); stp x19, x20, [sp, #72]
+ DD 08 save_freg X#4 Z#8 (0x08); str d12, [sp, #64]
E6 save_next
- D8 03 save_fregp X#0 Z#3 (0x03); stp d8, d9, [sp, #24]
- 8B save_fplr_x #11 (0x0B); stp fp, lr, [sp, #-96]!
+ D8 04 save_fregp X#0 Z#4 (0x04); stp d8, d9, [sp, #32]
+ 8D save_fplr_x #13 (0x0D); stp fp, lr, [sp, #-112]!
+ E4 end
+ E4 end
E4 end
E4 end

+16 (+2.53%) : 575272.dasm - JIT.HardwareIntrinsics.Arm._Sve.SimpleTernaryOpTest__Sve_DotProduct_int:RunBasicScenario_Load():this (FullOpts)
@@ -9,12 +9,12 @@
; Final local variable assignments
;
; V00 this [V00,T02] ( 4, 4 ) ref -> x19 this class-hnd single-def <JIT.HardwareIntrinsics.Arm._Sve.SimpleTernaryOpTest__Sve_DotProduct_int>
-;* V01 loc0 [V01,T30] ( 0, 0 ) mask -> zero-ref single-def <System.Numerics.Vector`1[sbyte]>
-; V02 loc1 [V02,T29] ( 3, 3 ) mask -> [fp+0x10] spill-single-def <System.Numerics.Vector`1[sbyte]>
-; V03 loc2 [V03,T33] ( 2, 2 ) simd16 -> d8 <System.Numerics.Vector`1[int]>
+;* V01 loc0 [V01,T34] ( 0, 0 ) simd16 -> zero-ref <System.Numerics.Vector`1[sbyte]>
+; V02 loc1 [V02,T32] ( 2, 2 ) simd16 -> d8 <System.Numerics.Vector`1[sbyte]>
+; V03 loc2 [V03,T33] ( 2, 2 ) simd16 -> d10 <System.Numerics.Vector`1[int]>
;# V04 OutArgs [V04 ] ( 1, 1 ) struct ( 0) [sp+0x00] do-not-enreg[XS] addr-exposed "OutgoingArgSpace" <Empty>
-; V05 tmp1 [V05,T31] ( 2, 4 ) simd16 -> d8 "impAppendStmt"
-; V06 tmp2 [V06,T32] ( 2, 4 ) simd16 -> d10 "impAppendStmt"
+; V05 tmp1 [V05,T30] ( 2, 4 ) simd16 -> d10 "impAppendStmt"
+; V06 tmp2 [V06,T31] ( 2, 4 ) simd16 -> d8 "impAppendStmt"
; V07 tmp3 [V07,T18] ( 2, 4 ) long -> x21 "impAppendStmt"
; V08 tmp4 [V08,T19] ( 2, 4 ) long -> x22 "impAppendStmt"
; V09 tmp5 [V09,T20] ( 2, 4 ) long -> x23 "impAppendStmt"
@@ -51,21 +51,23 @@
;* V40 tmp36 [V40 ] ( 0, 0 ) long -> zero-ref ld-addr-op "Inline stloc first use temp"
; V41 tmp37 [V41,T28] ( 2, 4 ) long -> x0 "Inlining Arg"
; V42 tmp38 [V42,T17] ( 3, 6 ) long -> x4 "Inlining Arg"
-; V43 cse0 [V43,T00] ( 9, 9 ) byref -> x20 "CSE #02: aggressive"
+; V43 cse0 [V43,T29] ( 3, 3 ) mask -> [fp+0x18] spill-single-def "CSE #02: moderate"
+; V44 cse1 [V44,T00] ( 9, 9 ) byref -> x20 "CSE #01: aggressive"
;
-; Lcl frame size = 8
+; Lcl frame size = 16
G_M55930_IG01: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, nogc <-- Prolog IG
- stp fp, lr, [sp, #-0x60]!
- stp d8, d9, [sp, #0x18]
- stp d10, d11, [sp, #0x28]
- stp x19, x20, [sp, #0x38]
- stp x21, x22, [sp, #0x48]
- str x23, [sp, #0x58]
+ stp fp, lr, [sp, #-0x70]!
+ stp d8, d9, [sp, #0x20]
+ stp d10, d11, [sp, #0x30]
+ str d12, [sp, #0x40]
+ stp x19, x20, [sp, #0x48]
+ stp x21, x22, [sp, #0x58]
+ str x23, [sp, #0x68]
mov fp, sp
mov x19, x0
; gcrRegs +[x19]
- ;; size=32 bbWeight=1 PerfScore 7.00
+ ;; size=36 bbWeight=1 PerfScore 8.00
G_M55930_IG02: ; bbWeight=1, gcrefRegs=80000 {x19}, byrefRegs=0000 {}, byref
movz x0, #0xD1FFAB1E
movk x0, #0xD1FFAB1E LSL #16
@@ -85,9 +87,7 @@ G_M55930_IG02: ; bbWeight=1, gcrefRegs=80000 {x19}, byrefRegs=0000 {}, by
ldr x1, [x1]
blr x1
; gcrRegs -[x0]
- ptrue p0.b
- add xip1, fp, #16
- str p0, [xip1]
+ mvni v8.4s, #0
add x20, x19, #96
; byrRegs +[x20]
mov x21, x20
@@ -99,22 +99,6 @@ G_M55930_IG02: ; bbWeight=1, gcrefRegs=80000 {x19}, byrefRegs=0000 {}, by
movk x1, #0xD1FFAB1E LSL #16
movk x1, #0xD1FFAB1E LSL #32
ldr x1, [x1]
- blr x1
- ; byrRegs -[x0]
- ldr x1, [x21, #0x20]
- add x0, x0, x1
- sub x0, x0, #1
- sub x1, x1, #1
- bic x0, x0, x1
- ptrue p0.s
- ld1w { z8.s }, p0/z, [x0]
- mov x21, x20
- add x0, x21, #48
- ; byrRegs +[x0]
- movz x1, #0xD1FFAB1E // code for <unknown method>
- movk x1, #0xD1FFAB1E LSL #16
- movk x1, #0xD1FFAB1E LSL #32
- ldr x1, [x1]
mov v9.d[0], v8.d[1]
blr x1
; byrRegs -[x0]
@@ -123,11 +107,10 @@ G_M55930_IG02: ; bbWeight=1, gcrefRegs=80000 {x19}, byrefRegs=0000 {}, by
sub x0, x0, #1
sub x1, x1, #1
bic x0, x0, x1
- add xip1, fp, #16
- ldr p0, [xip1]
- ld1b { z10.b }, p0/z, [x0]
+ ptrue p0.s
+ ld1w { z10.s }, p0/z, [x0]
mov x21, x20
- add x0, x21, #56
+ add x0, x21, #48
; byrRegs +[x0]
movz x1, #0xD1FFAB1E // code for <unknown method>
movk x1, #0xD1FFAB1E LSL #16
@@ -141,20 +124,20 @@ G_M55930_IG02: ; bbWeight=1, gcrefRegs=80000 {x19}, byrefRegs=0000 {}, by
sub x0, x0, #1
sub x1, x1, #1
bic x0, x0, x1
- add xip1, fp, #16
- ldr p0, [xip1]
- ld1b { z16.b }, p0/z, [x0]
+ ptrue p0.b
mov v8.d[1], v9.d[0]
- mov v10.d[1], v11.d[0]
- sdot z8.s, z10.b, z16.b
+ cmpne p0.b, p0/z, z8.b, #0
+ add xip1, fp, #24
+ str p0, [xip1]
+ ld1b { z8.b }, p0/z, [x0]
mov x21, x20
- add x0, x21, #64
+ add x0, x21, #56
; byrRegs +[x0]
movz x1, #0xD1FFAB1E // code for <unknown method>
movk x1, #0xD1FFAB1E LSL #16
movk x1, #0xD1FFAB1E LSL #32
ldr x1, [x1]
- mov v9.d[0], v8.d[1]
+ mov v12.d[0], v8.d[1]
blr x1
; byrRegs -[x0]
ldr x1, [x21, #0x20]
@@ -162,8 +145,29 @@ G_M55930_IG02: ; bbWeight=1, gcrefRegs=80000 {x19}, byrefRegs=0000 {}, by
sub x0, x0, #1
sub x1, x1, #1
bic x0, x0, x1
- mov v8.d[1], v9.d[0]
- str q8, [x0]
+ add xip1, fp, #24
+ ldr p0, [xip1]
+ ld1b { z16.b }, p0/z, [x0]
+ mov v10.d[1], v11.d[0]
+ mov v8.d[1], v12.d[0]
+ sdot z10.s, z8.b, z16.b
+ mov x21, x20
+ add x0, x21, #64
+ ; byrRegs +[x0]
+ movz x1, #0xD1FFAB1E // code for <unknown method>
+ movk x1, #0xD1FFAB1E LSL #16
+ movk x1, #0xD1FFAB1E LSL #32
+ ldr x1, [x1]
+ mov v8.d[0], v10.d[1]
+ blr x1
+ ; byrRegs -[x0]
+ ldr x1, [x21, #0x20]
+ add x0, x0, x1
+ sub x0, x0, #1
+ sub x1, x1, #1
+ bic x0, x0, x1
+ mov v10.d[1], v8.d[0]
+ str q10, [x0]
mov x21, x20
add x0, x21, #40
; byrRegs +[x0]
@@ -236,29 +240,30 @@ G_M55930_IG02: ; bbWeight=1, gcrefRegs=80000 {x19}, byrefRegs=0000 {}, by
movk x6, #0xD1FFAB1E LSL #16
movk x6, #0xD1FFAB1E LSL #32
ldr x6, [x6]
- ;; size=572 bbWeight=1 PerfScore 168.50
+ ;; size=580 bbWeight=1 PerfScore 168.50
G_M55930_IG03: ; bbWeight=1, epilog, nogc, extend
- ldr x23, [sp, #0x58]
- ldp x21, x22, [sp, #0x48]
- ldp x19, x20, [sp, #0x38]
- ldp d10, d11, [sp, #0x28]
- ldp d8, d9, [sp, #0x18]
- ldp fp, lr, [sp], #0x60
+ ldr x23, [sp, #0x68]
+ ldp x21, x22, [sp, #0x58]
+ ldp x19, x20, [sp, #0x48]
+ ldr d12, [sp, #0x40]
+ ldp d10, d11, [sp, #0x30]
+ ldp d8, d9, [sp, #0x20]
+ ldp fp, lr, [sp], #0x70
br x6
- ;; size=28 bbWeight=1 PerfScore 8.00
+ ;; size=32 bbWeight=1 PerfScore 10.00
-; Total bytes of code 632, prolog size 28, PerfScore 183.50, instruction count 158, allocated bytes for code 632 (MethodHash=b01a2585) for method JIT.HardwareIntrinsics.Arm._Sve.SimpleTernaryOpTest__Sve_DotProduct_int:RunBasicScenario_Load():this (FullOpts)
+; Total bytes of code 648, prolog size 32, PerfScore 186.50, instruction count 162, allocated bytes for code 648 (MethodHash=b01a2585) for method JIT.HardwareIntrinsics.Arm._Sve.SimpleTernaryOpTest__Sve_DotProduct_int:RunBasicScenario_Load():this (FullOpts)
; ============================================================
Unwind Info:
>> Start offset : 0x000000 (not in unwind data)
>> End offset : 0xd1ffab1e (not in unwind data)
- Code Words : 3
+ Code Words : 4
Epilog Count : 1
E bit : 0
X bit : 0
Vers : 0
- Function Length : 158 (0x0009e) Actual length = 632 (0x000278)
+ Function Length : 162 (0x000a2) Actual length = 648 (0x000288)
---- Epilog scopes ----
---- Scope 0
Epilog Start Offset : 3523193630 (0xd1ffab1e) Actual offset = 3523193630 (0xd1ffab1e) Offset from main function begin = 3523193630 (0xd1ffab1e)
@@ -266,12 +271,15 @@ Unwind Info:
---- Unwind codes ----
E1 set_fp; mov fp, sp
---- Epilog start at index 1 ----
- D1 0B save_reg X#4 Z#11 (0x0B); str x23, [sp, #88]
+ D1 0D save_reg X#4 Z#13 (0x0D); str x23, [sp, #104]
E6 save_next
- C8 07 save_regp X#0 Z#7 (0x07); stp x19, x20, [sp, #56]
+ C8 09 save_regp X#0 Z#9 (0x09); stp x19, x20, [sp, #72]
+ DD 08 save_freg X#4 Z#8 (0x08); str d12, [sp, #64]
E6 save_next
- D8 03 save_fregp X#0 Z#3 (0x03); stp d8, d9, [sp, #24]
- 8B save_fplr_x #11 (0x0B); stp fp, lr, [sp, #-96]!
+ D8 04 save_fregp X#0 Z#4 (0x04); stp d8, d9, [sp, #32]
+ 8D save_fplr_x #13 (0x0D); stp fp, lr, [sp, #-112]!
+ E4 end
+ E4 end
E4 end
E4 end

+16 (+2.53%) : 569192.dasm - JIT.HardwareIntrinsics.Arm._Sve2.SimpleTernaryOpTest__Sve2_Xor_ushort:RunBasicScenario_Load():this (FullOpts)
@@ -9,12 +9,12 @@
; Final local variable assignments
;
; V00 this [V00,T02] ( 4, 4 ) ref -> x19 this class-hnd single-def <JIT.HardwareIntrinsics.Arm._Sve2.SimpleTernaryOpTest__Sve2_Xor_ushort>
-;* V01 loc0 [V01,T30] ( 0, 0 ) mask -> zero-ref single-def <System.Numerics.Vector`1[ushort]>
-; V02 loc1 [V02,T29] ( 3, 3 ) mask -> [fp+0x10] spill-single-def <System.Numerics.Vector`1[ushort]>
-; V03 loc2 [V03,T33] ( 2, 2 ) simd16 -> d8 <System.Numerics.Vector`1[ushort]>
+;* V01 loc0 [V01,T34] ( 0, 0 ) simd16 -> zero-ref <System.Numerics.Vector`1[ushort]>
+; V02 loc1 [V02,T32] ( 2, 2 ) simd16 -> d8 <System.Numerics.Vector`1[ushort]>
+; V03 loc2 [V03,T33] ( 2, 2 ) simd16 -> d10 <System.Numerics.Vector`1[ushort]>
;# V04 OutArgs [V04 ] ( 1, 1 ) struct ( 0) [sp+0x00] do-not-enreg[XS] addr-exposed "OutgoingArgSpace" <Empty>
-; V05 tmp1 [V05,T31] ( 2, 4 ) simd16 -> d8 "impAppendStmt"
-; V06 tmp2 [V06,T32] ( 2, 4 ) simd16 -> d10 "impAppendStmt"
+; V05 tmp1 [V05,T30] ( 2, 4 ) simd16 -> d10 "impAppendStmt"
+; V06 tmp2 [V06,T31] ( 2, 4 ) simd16 -> d8 "impAppendStmt"
; V07 tmp3 [V07,T18] ( 2, 4 ) long -> x21 "impAppendStmt"
; V08 tmp4 [V08,T19] ( 2, 4 ) long -> x22 "impAppendStmt"
; V09 tmp5 [V09,T20] ( 2, 4 ) long -> x23 "impAppendStmt"
@@ -51,21 +51,23 @@
;* V40 tmp36 [V40 ] ( 0, 0 ) long -> zero-ref ld-addr-op "Inline stloc first use temp"
; V41 tmp37 [V41,T28] ( 2, 4 ) long -> x0 "Inlining Arg"
; V42 tmp38 [V42,T17] ( 3, 6 ) long -> x4 "Inlining Arg"
-; V43 cse0 [V43,T00] ( 9, 9 ) byref -> x20 "CSE #02: aggressive"
+; V43 cse0 [V43,T29] ( 3, 3 ) mask -> [fp+0x18] spill-single-def "CSE #02: moderate"
+; V44 cse1 [V44,T00] ( 9, 9 ) byref -> x20 "CSE #01: aggressive"
;
-; Lcl frame size = 8
+; Lcl frame size = 16
G_M13407_IG01: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, nogc <-- Prolog IG
- stp fp, lr, [sp, #-0x60]!
- stp d8, d9, [sp, #0x18]
- stp d10, d11, [sp, #0x28]
- stp x19, x20, [sp, #0x38]
- stp x21, x22, [sp, #0x48]
- str x23, [sp, #0x58]
+ stp fp, lr, [sp, #-0x70]!
+ stp d8, d9, [sp, #0x20]
+ stp d10, d11, [sp, #0x30]
+ str d12, [sp, #0x40]
+ stp x19, x20, [sp, #0x48]
+ stp x21, x22, [sp, #0x58]
+ str x23, [sp, #0x68]
mov fp, sp
mov x19, x0
; gcrRegs +[x19]
- ;; size=32 bbWeight=1 PerfScore 7.00
+ ;; size=36 bbWeight=1 PerfScore 8.00
G_M13407_IG02: ; bbWeight=1, gcrefRegs=80000 {x19}, byrefRegs=0000 {}, byref
movz x0, #0xD1FFAB1E
movk x0, #0xD1FFAB1E LSL #16
@@ -85,9 +87,7 @@ G_M13407_IG02: ; bbWeight=1, gcrefRegs=80000 {x19}, byrefRegs=0000 {}, by
ldr x1, [x1]
blr x1
; gcrRegs -[x0]
- ptrue p0.h
- add xip1, fp, #16
- str p0, [xip1]
+ mvni v8.4s, #0
add x20, x19, #96
; byrRegs +[x20]
mov x21, x20
@@ -99,22 +99,6 @@ G_M13407_IG02: ; bbWeight=1, gcrefRegs=80000 {x19}, byrefRegs=0000 {}, by
movk x1, #0xD1FFAB1E LSL #16
movk x1, #0xD1FFAB1E LSL #32
ldr x1, [x1]
- blr x1
- ; byrRegs -[x0]
- ldr x1, [x21, #0x20]
- add x0, x0, x1
- sub x0, x0, #1
- sub x1, x1, #1
- bic x0, x0, x1
- ptrue p0.h
- ld1h { z8.h }, p0/z, [x0]
- mov x21, x20
- add x0, x21, #48
- ; byrRegs +[x0]
- movz x1, #0xD1FFAB1E // code for <unknown method>
- movk x1, #0xD1FFAB1E LSL #16
- movk x1, #0xD1FFAB1E LSL #32
- ldr x1, [x1]
mov v9.d[0], v8.d[1]
blr x1
; byrRegs -[x0]
@@ -123,11 +107,10 @@ G_M13407_IG02: ; bbWeight=1, gcrefRegs=80000 {x19}, byrefRegs=0000 {}, by
sub x0, x0, #1
sub x1, x1, #1
bic x0, x0, x1
- add xip1, fp, #16
- ldr p0, [xip1]
+ ptrue p0.h
ld1h { z10.h }, p0/z, [x0]
mov x21, x20
- add x0, x21, #56
+ add x0, x21, #48
; byrRegs +[x0]
movz x1, #0xD1FFAB1E // code for <unknown method>
movk x1, #0xD1FFAB1E LSL #16
@@ -141,20 +124,20 @@ G_M13407_IG02: ; bbWeight=1, gcrefRegs=80000 {x19}, byrefRegs=0000 {}, by
sub x0, x0, #1
sub x1, x1, #1
bic x0, x0, x1
- add xip1, fp, #16
- ldr p0, [xip1]
- ld1h { z16.h }, p0/z, [x0]
+ ptrue p0.h
mov v8.d[1], v9.d[0]
- mov v10.d[1], v11.d[0]
- eor3 z8.d, z8.d, z10.d, z16.d
+ cmpne p0.h, p0/z, z8.h, #0
+ add xip1, fp, #24
+ str p0, [xip1]
+ ld1h { z8.h }, p0/z, [x0]
mov x21, x20
- add x0, x21, #64
+ add x0, x21, #56
; byrRegs +[x0]
movz x1, #0xD1FFAB1E // code for <unknown method>
movk x1, #0xD1FFAB1E LSL #16
movk x1, #0xD1FFAB1E LSL #32
ldr x1, [x1]
- mov v9.d[0], v8.d[1]
+ mov v12.d[0], v8.d[1]
blr x1
; byrRegs -[x0]
ldr x1, [x21, #0x20]
@@ -162,8 +145,29 @@ G_M13407_IG02: ; bbWeight=1, gcrefRegs=80000 {x19}, byrefRegs=0000 {}, by
sub x0, x0, #1
sub x1, x1, #1
bic x0, x0, x1
- mov v8.d[1], v9.d[0]
- str q8, [x0]
+ add xip1, fp, #24
+ ldr p0, [xip1]
+ ld1h { z16.h }, p0/z, [x0]
+ mov v10.d[1], v11.d[0]
+ mov v8.d[1], v12.d[0]
+ eor3 z10.d, z10.d, z8.d, z16.d
+ mov x21, x20
+ add x0, x21, #64
+ ; byrRegs +[x0]
+ movz x1, #0xD1FFAB1E // code for <unknown method>
+ movk x1, #0xD1FFAB1E LSL #16
+ movk x1, #0xD1FFAB1E LSL #32
+ ldr x1, [x1]
+ mov v8.d[0], v10.d[1]
+ blr x1
+ ; byrRegs -[x0]
+ ldr x1, [x21, #0x20]
+ add x0, x0, x1
+ sub x0, x0, #1
+ sub x1, x1, #1
+ bic x0, x0, x1
+ mov v10.d[1], v8.d[0]
+ str q10, [x0]
mov x21, x20
add x0, x21, #40
; byrRegs +[x0]
@@ -236,29 +240,30 @@ G_M13407_IG02: ; bbWeight=1, gcrefRegs=80000 {x19}, byrefRegs=0000 {}, by
movk x6, #0xD1FFAB1E LSL #16
movk x6, #0xD1FFAB1E LSL #32
ldr x6, [x6]
- ;; size=572 bbWeight=1 PerfScore 166.50
+ ;; size=580 bbWeight=1 PerfScore 166.50
G_M13407_IG03: ; bbWeight=1, epilog, nogc, extend
- ldr x23, [sp, #0x58]
- ldp x21, x22, [sp, #0x48]
- ldp x19, x20, [sp, #0x38]
- ldp d10, d11, [sp, #0x28]
- ldp d8, d9, [sp, #0x18]
- ldp fp, lr, [sp], #0x60
+ ldr x23, [sp, #0x68]
+ ldp x21, x22, [sp, #0x58]
+ ldp x19, x20, [sp, #0x48]
+ ldr d12, [sp, #0x40]
+ ldp d10, d11, [sp, #0x30]
+ ldp d8, d9, [sp, #0x20]
+ ldp fp, lr, [sp], #0x70
br x6
- ;; size=28 bbWeight=1 PerfScore 8.00
+ ;; size=32 bbWeight=1 PerfScore 10.00
-; Total bytes of code 632, prolog size 28, PerfScore 181.50, instruction count 158, allocated bytes for code 632 (MethodHash=f1c3cba0) for method JIT.HardwareIntrinsics.Arm._Sve2.SimpleTernaryOpTest__Sve2_Xor_ushort:RunBasicScenario_Load():this (FullOpts)
+; Total bytes of code 648, prolog size 32, PerfScore 184.50, instruction count 162, allocated bytes for code 648 (MethodHash=f1c3cba0) for method JIT.HardwareIntrinsics.Arm._Sve2.SimpleTernaryOpTest__Sve2_Xor_ushort:RunBasicScenario_Load():this (FullOpts)
; ============================================================
Unwind Info:
>> Start offset : 0x000000 (not in unwind data)
>> End offset : 0xd1ffab1e (not in unwind data)
- Code Words : 3
+ Code Words : 4
Epilog Count : 1
E bit : 0
X bit : 0
Vers : 0
- Function Length : 158 (0x0009e) Actual length = 632 (0x000278)
+ Function Length : 162 (0x000a2) Actual length = 648 (0x000288)
---- Epilog scopes ----
---- Scope 0
Epilog Start Offset : 3523193630 (0xd1ffab1e) Actual offset = 3523193630 (0xd1ffab1e) Offset from main function begin = 3523193630 (0xd1ffab1e)
@@ -266,12 +271,15 @@ Unwind Info:
---- Unwind codes ----
E1 set_fp; mov fp, sp
---- Epilog start at index 1 ----
- D1 0B save_reg X#4 Z#11 (0x0B); str x23, [sp, #88]
+ D1 0D save_reg X#4 Z#13 (0x0D); str x23, [sp, #104]
E6 save_next
- C8 07 save_regp X#0 Z#7 (0x07); stp x19, x20, [sp, #56]
+ C8 09 save_regp X#0 Z#9 (0x09); stp x19, x20, [sp, #72]
+ DD 08 save_freg X#4 Z#8 (0x08); str d12, [sp, #64]
E6 save_next
- D8 03 save_fregp X#0 Z#3 (0x03); stp d8, d9, [sp, #24]
- 8B save_fplr_x #11 (0x0B); stp fp, lr, [sp, #-96]!
+ D8 04 save_fregp X#0 Z#4 (0x04); stp d8, d9, [sp, #32]
+ 8D save_fplr_x #13 (0x0D); stp fp, lr, [sp, #-112]!
+ E4 end
+ E4 end
E4 end
E4 end
benchmarks.run_pgo.linux.arm64.checked.mch
-4 (-0.85%) : 58518.dasm - SveBenchmarks.StrLen:SveStrLen():ulong:this (Instrumented Tier0)
@@ -36,8 +36,7 @@ G_M60402_IG01: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref,
G_M60402_IG02: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, isz
mov w0, #0xD1FFAB1E
str w0, [fp, #0x20] // [V11 tmp2]
- ptrue p0.b
- mov z16.b, p0/z, #1
+ mvni v16.4s, #0
str q16, [fp, #0x80] // [V01 loc0]
str xzr, [fp, #0x58] // [V04 loc3]
cntb x0, all
@@ -62,7 +61,7 @@ G_M60402_IG02: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref,
ldr w0, [x0, #0x08]
; gcrRegs -[x0]
cbnz w0, G_M60402_IG05
- ;; size=96 bbWeight=1 PerfScore 40.50
+ ;; size=92 bbWeight=1 PerfScore 37.00
G_M60402_IG03: ; bbWeight=0.50, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
movz x0, #0xD1FFAB1E
movk x0, #0xD1FFAB1E LSL #16
@@ -179,7 +178,7 @@ G_M60402_IG11: ; bbWeight=0, gcVars=0000000000000000 {}, gcrefRegs=0000 {
brk #0
;; size=8 bbWeight=0 PerfScore 0.00
-; Total bytes of code 472, prolog size 36, PerfScore 168.27, instruction count 118, allocated bytes for code 472 (MethodHash=b293140d) for method SveBenchmarks.StrLen:SveStrLen():ulong:this (Instrumented Tier0)
+; Total bytes of code 468, prolog size 36, PerfScore 164.77, instruction count 117, allocated bytes for code 468 (MethodHash=b293140d) for method SveBenchmarks.StrLen:SveStrLen():ulong:this (Instrumented Tier0)
; ============================================================
Unwind Info:
@@ -190,7 +189,7 @@ Unwind Info:
E bit : 0
X bit : 0
Vers : 0
- Function Length : 118 (0x00076) Actual length = 472 (0x0001d8)
+ Function Length : 117 (0x00075) Actual length = 468 (0x0001d4)
---- Epilog scopes ----
---- Scope 0
Epilog Start Offset : 3523193630 (0xd1ffab1e) Actual offset = 3523193630 (0xd1ffab1e) Offset from main function begin = 3523193630 (0xd1ffab1e)
-4 (-0.58%) : 24532.dasm - SveBenchmarks.StrIndexOf:SveIndexOf():int:this (Instrumented Tier0)
@@ -229,8 +229,7 @@ G_M22667_IG17: ; bbWeight=0.01, gcrefRegs=0000 {}, byrefRegs=0000 {}, byr
;; size=12 bbWeight=0.01 PerfScore 0.02
G_M22667_IG18: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, isz
ptrue p0.h
- mov z16.h, p0/z, #1
- ptrue p0.h
+ mvni v16.4s, #0
cmpne p0.h, p0/z, z16.h, #0
ptrue p1.h
ldr q16, [fp, #0x50] // [V05 loc4]
@@ -249,7 +248,7 @@ G_M22667_IG18: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref,
bl CORINFO_HELP_COUNTPROFILE32
; gcr arg pop 0
movn w0, #0
- ;; size=76 bbWeight=1 PerfScore 25.50
+ ;; size=72 bbWeight=1 PerfScore 22.00
G_M22667_IG19: ; bbWeight=1, epilog, nogc, extend
ldp fp, lr, [sp], #0x90
ret lr
@@ -265,7 +264,7 @@ G_M22667_IG21: ; bbWeight=0, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
brk #0
;; size=8 bbWeight=0 PerfScore 0.00
-; Total bytes of code 688, prolog size 36, PerfScore 211.04, instruction count 172, allocated bytes for code 688 (MethodHash=8b05a774) for method SveBenchmarks.StrIndexOf:SveIndexOf():int:this (Instrumented Tier0)
+; Total bytes of code 684, prolog size 36, PerfScore 207.54, instruction count 171, allocated bytes for code 684 (MethodHash=8b05a774) for method SveBenchmarks.StrIndexOf:SveIndexOf():int:this (Instrumented Tier0)
; ============================================================
Unwind Info:
@@ -276,7 +275,7 @@ Unwind Info:
E bit : 0
X bit : 0
Vers : 0
- Function Length : 172 (0x000ac) Actual length = 688 (0x0002b0)
+ Function Length : 171 (0x000ab) Actual length = 684 (0x0002ac)
---- Epilog scopes ----
---- Scope 0
Epilog Start Offset : 3523193630 (0xd1ffab1e) Actual offset = 3523193630 (0xd1ffab1e) Offset from main function begin = 3523193630 (0xd1ffab1e)
-4 (-0.49%) : 76632.dasm - SveBenchmarks.StrIndexOf:SveTail():int:this (Instrumented Tier0)
@@ -95,8 +95,7 @@ G_M34028_IG06: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
ldr x2, [x2]
blr x2
; gcr arg pop 0
- ptrue p0.h
- mov z0.h, p0/z, #1
+ mvni v0.4s, #0
movz x0, #0xD1FFAB1E // code for <unknown method>
movk x0, #0xD1FFAB1E LSL #16
movk x0, #0xD1FFAB1E LSL #32
@@ -105,7 +104,7 @@ G_M34028_IG06: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
; gcr arg pop 0
str q0, [fp, #0x50] // [V05 loc4]
b G_M34028_IG16
- ;; size=68 bbWeight=1 PerfScore 22.50
+ ;; size=64 bbWeight=1 PerfScore 19.00
G_M34028_IG07: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, isz
ldr w0, [fp, #0x84] // [V01 loc0]
sxtw x0, w0
@@ -317,7 +316,7 @@ G_M34028_IG27: ; bbWeight=0, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
brk #0
;; size=8 bbWeight=0 PerfScore 0.00
-; Total bytes of code 812, prolog size 36, PerfScore 237.56, instruction count 203, allocated bytes for code 812 (MethodHash=3e617b13) for method SveBenchmarks.StrIndexOf:SveTail():int:this (Instrumented Tier0)
+; Total bytes of code 808, prolog size 36, PerfScore 234.06, instruction count 202, allocated bytes for code 808 (MethodHash=3e617b13) for method SveBenchmarks.StrIndexOf:SveTail():int:this (Instrumented Tier0)
; ============================================================
Unwind Info:
@@ -328,7 +327,7 @@ Unwind Info:
E bit : 0
X bit : 0
Vers : 0
- Function Length : 203 (0x000cb) Actual length = 812 (0x00032c)
+ Function Length : 202 (0x000ca) Actual length = 808 (0x000328)
---- Epilog scopes ----
---- Scope 0
Epilog Start Offset : 3523193630 (0xd1ffab1e) Actual offset = 3523193630 (0xd1ffab1e) Offset from main function begin = 3523193630 (0xd1ffab1e)
-4 (-0.38%) : 14743.dasm - SveBenchmarks.StrCmp:SveStrCmp():long:this (Instrumented Tier0)
@@ -49,8 +49,7 @@ G_M14759_IG02: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref,
str wzr, [fp, #0xC4] // [V01 loc0]
cntb x0, all
str w0, [fp, #0xC0] // [V02 loc1]
- ptrue p0.b
- mov z16.b, p0/z, #1
+ mvni v16.4s, #0
str q16, [fp, #0xB0] // [V03 loc2]
ldr x0, [fp, #0xC8] // [V00 this]
; gcrRegs +[x0]
@@ -86,7 +85,7 @@ G_M14759_IG02: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref,
ldr w0, [x0, #0x08]
; gcrRegs -[x0]
cbnz w0, G_M14759_IG05
- ;; size=136 bbWeight=1 PerfScore 59.50
+ ;; size=132 bbWeight=1 PerfScore 56.00
G_M14759_IG03: ; bbWeight=0.50, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
movz x0, #0xD1FFAB1E
movk x0, #0xD1FFAB1E LSL #16
@@ -394,7 +393,7 @@ G_M14759_IG27: ; bbWeight=0, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
brk #0
;; size=8 bbWeight=0 PerfScore 0.00
-; Total bytes of code 1052, prolog size 44, PerfScore 348.29, instruction count 263, allocated bytes for code 1052 (MethodHash=5df7c658) for method SveBenchmarks.StrCmp:SveStrCmp():long:this (Instrumented Tier0)
+; Total bytes of code 1048, prolog size 44, PerfScore 344.79, instruction count 262, allocated bytes for code 1048 (MethodHash=5df7c658) for method SveBenchmarks.StrCmp:SveStrCmp():long:this (Instrumented Tier0)
; ============================================================
Unwind Info:
@@ -405,7 +404,7 @@ Unwind Info:
E bit : 0
X bit : 0
Vers : 0
- Function Length : 263 (0x00107) Actual length = 1052 (0x00041c)
+ Function Length : 262 (0x00106) Actual length = 1048 (0x000418)
---- Epilog scopes ----
---- Scope 0
Epilog Start Offset : 3523193630 (0xd1ffab1e) Actual offset = 3523193630 (0xd1ffab1e) Offset from main function begin = 3523193630 (0xd1ffab1e)
-4 (-0.43%) : 39357.dasm - SveBenchmarks.StrCmp:SveTail():long:this (Instrumented Tier0)
@@ -41,8 +41,7 @@ G_M892_IG01: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, n
G_M892_IG02: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, isz
mov w0, #0xD1FFAB1E
str w0, [fp, #0x28] // [V15 tmp3]
- ptrue p0.b
- mov z16.b, p0/z, #1
+ mvni v16.4s, #0
str q16, [fp, #0xA0] // [V01 loc0]
str wzr, [fp, #0x6C] // [V05 loc4]
cntb x0, all
@@ -71,7 +70,7 @@ G_M892_IG02: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, i
ldr w0, [x0, #0x08]
; gcrRegs -[x0]
cbnz w0, G_M892_IG05
- ;; size=104 bbWeight=1 PerfScore 46.00
+ ;; size=100 bbWeight=1 PerfScore 42.50
G_M892_IG03: ; bbWeight=0.50, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
movz x0, #0xD1FFAB1E
movk x0, #0xD1FFAB1E LSL #16
@@ -357,7 +356,7 @@ G_M892_IG24: ; bbWeight=0, gcVars=0000000000000000 {}, gcrefRegs=0000 {},
brk #0
;; size=8 bbWeight=0 PerfScore 0.00
-; Total bytes of code 928, prolog size 36, PerfScore 307.29, instruction count 232, allocated bytes for code 928 (MethodHash=5bfdfc83) for method SveBenchmarks.StrCmp:SveTail():long:this (Instrumented Tier0)
+; Total bytes of code 924, prolog size 36, PerfScore 303.79, instruction count 231, allocated bytes for code 924 (MethodHash=5bfdfc83) for method SveBenchmarks.StrCmp:SveTail():long:this (Instrumented Tier0)
; ============================================================
Unwind Info:
@@ -368,7 +367,7 @@ Unwind Info:
E bit : 0
X bit : 0
Vers : 0
- Function Length : 232 (0x000e8) Actual length = 928 (0x0003a0)
+ Function Length : 231 (0x000e7) Actual length = 924 (0x00039c)
---- Epilog scopes ----
---- Scope 0
Epilog Start Offset : 3523193630 (0xd1ffab1e) Actual offset = 3523193630 (0xd1ffab1e) Offset from main function begin = 3523193630 (0xd1ffab1e)
libraries.pmi.linux.arm64.checked.mch
-4 (-16.67%) : 11401.dasm - System.Runtime.Intrinsics.Arm.Sve:CreateFalseMaskInt32():System.Numerics.Vector`1[int] (FullOpts)
@@ -16,15 +16,14 @@ G_M40111_IG01: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref,
mov fp, sp
;; size=8 bbWeight=1 PerfScore 1.50
G_M40111_IG02: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
- pfalse p0.b
- mov z0.s, p0/z, #1
- ;; size=8 bbWeight=1 PerfScore 4.00
+ movi v0.4s, #0
+ ;; size=4 bbWeight=1 PerfScore 0.50
G_M40111_IG03: ; bbWeight=1, epilog, nogc, extend
ldp fp, lr, [sp], #0x10
ret lr
;; size=8 bbWeight=1 PerfScore 2.00
-; Total bytes of code 24, prolog size 8, PerfScore 7.50, instruction count 6, allocated bytes for code 24 (MethodHash=96116350) for method System.Runtime.Intrinsics.Arm.Sve:CreateFalseMaskInt32():System.Numerics.Vector`1[int] (FullOpts)
+; Total bytes of code 20, prolog size 8, PerfScore 4.00, instruction count 5, allocated bytes for code 20 (MethodHash=96116350) for method System.Runtime.Intrinsics.Arm.Sve:CreateFalseMaskInt32():System.Numerics.Vector`1[int] (FullOpts)
; ============================================================
Unwind Info:
@@ -35,7 +34,7 @@ Unwind Info:
E bit : 0
X bit : 0
Vers : 0
- Function Length : 6 (0x00006) Actual length = 24 (0x000018)
+ Function Length : 5 (0x00005) Actual length = 20 (0x000014)
---- Epilog scopes ----
---- Scope 0
Epilog Start Offset : 3523193630 (0xd1ffab1e) Actual offset = 3523193630 (0xd1ffab1e) Offset from main function begin = 3523193630 (0xd1ffab1e)
-4 (-16.67%) : 11402.dasm - System.Runtime.Intrinsics.Arm.Sve:CreateFalseMaskInt64():System.Numerics.Vector`1[long] (FullOpts)
@@ -16,15 +16,14 @@ G_M56373_IG01: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref,
mov fp, sp
;; size=8 bbWeight=1 PerfScore 1.50
G_M56373_IG02: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
- pfalse p0.b
- mov z0.d, p0/z, #1
- ;; size=8 bbWeight=1 PerfScore 4.00
+ movi v0.4s, #0
+ ;; size=4 bbWeight=1 PerfScore 0.50
G_M56373_IG03: ; bbWeight=1, epilog, nogc, extend
ldp fp, lr, [sp], #0x10
ret lr
;; size=8 bbWeight=1 PerfScore 2.00
-; Total bytes of code 24, prolog size 8, PerfScore 7.50, instruction count 6, allocated bytes for code 24 (MethodHash=c46823ca) for method System.Runtime.Intrinsics.Arm.Sve:CreateFalseMaskInt64():System.Numerics.Vector`1[long] (FullOpts)
+; Total bytes of code 20, prolog size 8, PerfScore 4.00, instruction count 5, allocated bytes for code 20 (MethodHash=c46823ca) for method System.Runtime.Intrinsics.Arm.Sve:CreateFalseMaskInt64():System.Numerics.Vector`1[long] (FullOpts)
; ============================================================
Unwind Info:
@@ -35,7 +34,7 @@ Unwind Info:
E bit : 0
X bit : 0
Vers : 0
- Function Length : 6 (0x00006) Actual length = 24 (0x000018)
+ Function Length : 5 (0x00005) Actual length = 20 (0x000014)
---- Epilog scopes ----
---- Scope 0
Epilog Start Offset : 3523193630 (0xd1ffab1e) Actual offset = 3523193630 (0xd1ffab1e) Offset from main function begin = 3523193630 (0xd1ffab1e)
-4 (-16.67%) : 11403.dasm - System.Runtime.Intrinsics.Arm.Sve:CreateFalseMaskSByte():System.Numerics.Vector`1[sbyte] (FullOpts)
@@ -16,15 +16,14 @@ G_M57390_IG01: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref,
mov fp, sp
;; size=8 bbWeight=1 PerfScore 1.50
G_M57390_IG02: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
- pfalse p0.b
- mov z0.b, p0/z, #1
- ;; size=8 bbWeight=1 PerfScore 4.00
+ movi v0.4s, #0
+ ;; size=4 bbWeight=1 PerfScore 0.50
G_M57390_IG03: ; bbWeight=1, epilog, nogc, extend
ldp fp, lr, [sp], #0x10
ret lr
;; size=8 bbWeight=1 PerfScore 2.00
-; Total bytes of code 24, prolog size 8, PerfScore 7.50, instruction count 6, allocated bytes for code 24 (MethodHash=86bf1fd1) for method System.Runtime.Intrinsics.Arm.Sve:CreateFalseMaskSByte():System.Numerics.Vector`1[sbyte] (FullOpts)
+; Total bytes of code 20, prolog size 8, PerfScore 4.00, instruction count 5, allocated bytes for code 20 (MethodHash=86bf1fd1) for method System.Runtime.Intrinsics.Arm.Sve:CreateFalseMaskSByte():System.Numerics.Vector`1[sbyte] (FullOpts)
; ============================================================
Unwind Info:
@@ -35,7 +34,7 @@ Unwind Info:
E bit : 0
X bit : 0
Vers : 0
- Function Length : 6 (0x00006) Actual length = 24 (0x000018)
+ Function Length : 5 (0x00005) Actual length = 20 (0x000014)
---- Epilog scopes ----
---- Scope 0
Epilog Start Offset : 3523193630 (0xd1ffab1e) Actual offset = 3523193630 (0xd1ffab1e) Offset from main function begin = 3523193630 (0xd1ffab1e)
-4 (-16.67%) : 11400.dasm - System.Runtime.Intrinsics.Arm.Sve:CreateFalseMaskInt16():System.Numerics.Vector`1[short] (FullOpts)
@@ -16,15 +16,14 @@ G_M33416_IG01: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref,
mov fp, sp
;; size=8 bbWeight=1 PerfScore 1.50
G_M33416_IG02: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
- pfalse p0.b
- mov z0.h, p0/z, #1
- ;; size=8 bbWeight=1 PerfScore 4.00
+ movi v0.4s, #0
+ ;; size=4 bbWeight=1 PerfScore 0.50
G_M33416_IG03: ; bbWeight=1, epilog, nogc, extend
ldp fp, lr, [sp], #0x10
ret lr
;; size=8 bbWeight=1 PerfScore 2.00
-; Total bytes of code 24, prolog size 8, PerfScore 7.50, instruction count 6, allocated bytes for code 24 (MethodHash=c51e7d77) for method System.Runtime.Intrinsics.Arm.Sve:CreateFalseMaskInt16():System.Numerics.Vector`1[short] (FullOpts)
+; Total bytes of code 20, prolog size 8, PerfScore 4.00, instruction count 5, allocated bytes for code 20 (MethodHash=c51e7d77) for method System.Runtime.Intrinsics.Arm.Sve:CreateFalseMaskInt16():System.Numerics.Vector`1[short] (FullOpts)
; ============================================================
Unwind Info:
@@ -35,7 +34,7 @@ Unwind Info:
E bit : 0
X bit : 0
Vers : 0
- Function Length : 6 (0x00006) Actual length = 24 (0x000018)
+ Function Length : 5 (0x00005) Actual length = 20 (0x000014)
---- Epilog scopes ----
---- Scope 0
Epilog Start Offset : 3523193630 (0xd1ffab1e) Actual offset = 3523193630 (0xd1ffab1e) Offset from main function begin = 3523193630 (0xd1ffab1e)
-4 (-16.67%) : 11407.dasm - System.Runtime.Intrinsics.Arm.Sve:CreateFalseMaskUInt64():System.Numerics.Vector`1[ulong] (FullOpts)
@@ -16,15 +16,14 @@ G_M18837_IG01: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref,
mov fp, sp
;; size=8 bbWeight=1 PerfScore 1.50
G_M18837_IG02: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
- pfalse p0.b
- mov z0.d, p0/z, #1
- ;; size=8 bbWeight=1 PerfScore 4.00
+ movi v0.4s, #0
+ ;; size=4 bbWeight=1 PerfScore 0.50
G_M18837_IG03: ; bbWeight=1, epilog, nogc, extend
ldp fp, lr, [sp], #0x10
ret lr
;; size=8 bbWeight=1 PerfScore 2.00
-; Total bytes of code 24, prolog size 8, PerfScore 7.50, instruction count 6, allocated bytes for code 24 (MethodHash=e813b66a) for method System.Runtime.Intrinsics.Arm.Sve:CreateFalseMaskUInt64():System.Numerics.Vector`1[ulong] (FullOpts)
+; Total bytes of code 20, prolog size 8, PerfScore 4.00, instruction count 5, allocated bytes for code 20 (MethodHash=e813b66a) for method System.Runtime.Intrinsics.Arm.Sve:CreateFalseMaskUInt64():System.Numerics.Vector`1[ulong] (FullOpts)
; ============================================================
Unwind Info:
@@ -35,7 +34,7 @@ Unwind Info:
E bit : 0
X bit : 0
Vers : 0
- Function Length : 6 (0x00006) Actual length = 24 (0x000018)
+ Function Length : 5 (0x00005) Actual length = 20 (0x000014)
---- Epilog scopes ----
---- Scope 0
Epilog Start Offset : 3523193630 (0xd1ffab1e) Actual offset = 3523193630 (0xd1ffab1e) Offset from main function begin = 3523193630 (0xd1ffab1e)
-4 (-16.67%) : 11399.dasm - System.Runtime.Intrinsics.Arm.Sve:CreateFalseMaskDouble():System.Numerics.Vector`1[double] (FullOpts)
@@ -16,15 +16,14 @@ G_M43790_IG01: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref,
mov fp, sp
;; size=8 bbWeight=1 PerfScore 1.50
G_M43790_IG02: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
- pfalse p0.b
- mov z0.d, p0/z, #1
- ;; size=8 bbWeight=1 PerfScore 4.00
+ movi v0.4s, #0
+ ;; size=4 bbWeight=1 PerfScore 0.50
G_M43790_IG03: ; bbWeight=1, epilog, nogc, extend
ldp fp, lr, [sp], #0x10
ret lr
;; size=8 bbWeight=1 PerfScore 2.00
-; Total bytes of code 24, prolog size 8, PerfScore 7.50, instruction count 6, allocated bytes for code 24 (MethodHash=73a354f1) for method System.Runtime.Intrinsics.Arm.Sve:CreateFalseMaskDouble():System.Numerics.Vector`1[double] (FullOpts)
+; Total bytes of code 20, prolog size 8, PerfScore 4.00, instruction count 5, allocated bytes for code 20 (MethodHash=73a354f1) for method System.Runtime.Intrinsics.Arm.Sve:CreateFalseMaskDouble():System.Numerics.Vector`1[double] (FullOpts)
; ============================================================
Unwind Info:
@@ -35,7 +34,7 @@ Unwind Info:
E bit : 0
X bit : 0
Vers : 0
- Function Length : 6 (0x00006) Actual length = 24 (0x000018)
+ Function Length : 5 (0x00005) Actual length = 20 (0x000014)
---- Epilog scopes ----
---- Scope 0
Epilog Start Offset : 3523193630 (0xd1ffab1e) Actual offset = 3523193630 (0xd1ffab1e) Offset from main function begin = 3523193630 (0xd1ffab1e)
benchmarks.run_pgo_optrepeat.linux.arm64.checked.mch
-4 (-1.30%) : 13109.dasm - SveBenchmarks.StrCmp:SveTail():long:this (FullOpts)
@@ -7,33 +7,33 @@
; No matching PGO data
; Final local variable assignments
;
-; V00 this [V00,T09] ( 5, 5 ) ref -> x0 this class-hnd single-def <SveBenchmarks.StrCmp>
-; V01 loc0 [V01,T05] ( 3, 9 ) mask -> p0 single-def <System.Numerics.Vector`1[byte]>
+; V00 this [V00,T08] ( 5, 5 ) ref -> x0 this class-hnd single-def <SveBenchmarks.StrCmp>
+;* V01 loc0 [V01,T22] ( 0, 0 ) mask -> zero-ref single-def <System.Numerics.Vector`1[byte]>
;* V02 loc1 [V02 ] ( 0, 0 ) simd16 -> zero-ref <System.Numerics.Vector`1[byte]>
;* V03 loc2 [V03 ] ( 0, 0 ) simd16 -> zero-ref <System.Numerics.Vector`1[byte]>
;* V04 loc3 [V04 ] ( 0, 0 ) simd16 -> zero-ref <System.Numerics.Vector`1[byte]>
; V05 loc4 [V05,T00] ( 12, 41.50) int -> x1
-; V06 loc5 [V06,T13] ( 3, 6 ) int -> x2 single-def
-; V07 loc6 [V07,T17] ( 3, 5 ) long -> x4
-; V08 loc7 [V08,T18] ( 3, 5 ) long -> x6
+; V06 loc5 [V06,T12] ( 3, 6 ) int -> x2 single-def
+; V07 loc6 [V07,T16] ( 3, 5 ) long -> x4
+; V08 loc7 [V08,T17] ( 3, 5 ) long -> x6
; V09 loc8 [V09 ] ( 1, 0.50) ref -> [fp+0x18] must-init pinned class-hnd single-def <byte[]>
; V10 loc9 [V10 ] ( 1, 0.50) ref -> [fp+0x10] must-init pinned class-hnd single-def <byte[]>
-; V11 loc10 [V11,T08] ( 2, 8 ) ubyte -> x8
+; V11 loc10 [V11,T07] ( 2, 8 ) ubyte -> x8
;# V12 OutArgs [V12 ] ( 1, 1 ) struct ( 0) [sp+0x00] do-not-enreg[XS] addr-exposed "OutgoingArgSpace" <Empty>
-; V13 tmp1 [V13,T15] ( 5, 5 ) ref -> x4 class-hnd single-def "dup spill" <byte[]>
-; V14 tmp2 [V14,T16] ( 5, 5 ) ref -> x6 class-hnd single-def "dup spill" <byte[]>
-; V15 tmp3 [V15,T19] ( 2, 2 ) long -> x4 "Cast away GC"
-; V16 tmp4 [V16,T20] ( 2, 2 ) long -> x6 "Cast away GC"
+; V13 tmp1 [V13,T14] ( 5, 5 ) ref -> x4 class-hnd single-def "dup spill" <byte[]>
+; V14 tmp2 [V14,T15] ( 5, 5 ) ref -> x6 class-hnd single-def "dup spill" <byte[]>
+; V15 tmp3 [V15,T18] ( 2, 2 ) long -> x4 "Cast away GC"
+; V16 tmp4 [V16,T19] ( 2, 2 ) long -> x6 "Cast away GC"
; V17 tmp5 [V17,T01] ( 3, 24 ) ref -> x2 "arr expr"
; V18 tmp6 [V18,T02] ( 3, 24 ) ref -> x6 "arr expr"
-;* V19 tmp7 [V19,T21] ( 0, 0 ) ref -> zero-ref single-def "arr expr"
-;* V20 tmp8 [V20,T22] ( 0, 0 ) ref -> zero-ref single-def "arr expr"
-; V21 cse0 [V21,T06] ( 3, 8.50) int -> x2 "CSE #11: aggressive"
-; V22 cse1 [V22,T07] ( 3, 8.50) int -> x4 "CSE #14: aggressive"
-; V23 cse2 [V23,T14] ( 3, 6 ) int -> x7 "CSE #07: aggressive"
-; V24 cse3 [V24,T12] ( 4, 6.50) int -> x0 "CSE #06: aggressive"
-; V25 cse4 [V25,T10] ( 4, 6.50) ref -> x3 "CSE #01: aggressive"
-; V26 cse5 [V26,T11] ( 4, 6.50) ref -> x5 "CSE #03: aggressive"
+;* V19 tmp7 [V19,T20] ( 0, 0 ) ref -> zero-ref single-def "arr expr"
+;* V20 tmp8 [V20,T21] ( 0, 0 ) ref -> zero-ref single-def "arr expr"
+; V21 cse0 [V21,T05] ( 3, 8.50) int -> x2 "CSE #11: aggressive"
+; V22 cse1 [V22,T06] ( 3, 8.50) int -> x4 "CSE #14: aggressive"
+; V23 cse2 [V23,T13] ( 3, 6 ) int -> x7 "CSE #07: aggressive"
+; V24 cse3 [V24,T11] ( 4, 6.50) int -> x0 "CSE #06: aggressive"
+; V25 cse4 [V25,T09] ( 4, 6.50) ref -> x3 "CSE #01: aggressive"
+; V26 cse5 [V26,T10] ( 4, 6.50) ref -> x5 "CSE #03: aggressive"
; V27 cse6 [V27,T03] ( 3, 12 ) long -> x4 "CSE #08: aggressive"
; V28 cse7 [V28,T04] ( 3, 12 ) long -> x8 "CSE #05: aggressive"
;
@@ -46,7 +46,6 @@ G_M892_IG01: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, n
;; size=12 bbWeight=1 PerfScore 2.50
G_M892_IG02: ; bbWeight=1, gcrefRegs=0001 {x0}, byrefRegs=0000 {}, byref, isz
; gcrRegs +[x0]
- ptrue p0.b
mov w1, wzr
cntb x2, all
ldr x3, [x0, #0x10]
@@ -57,7 +56,7 @@ G_M892_IG02: ; bbWeight=1, gcrefRegs=0001 {x0}, byrefRegs=0000 {}, byref,
ldr w6, [x5, #0x08]
cmp w4, w6
bne G_M892_IG11
- ;; size=36 bbWeight=1 PerfScore 18.00
+ ;; size=32 bbWeight=1 PerfScore 16.00
G_M892_IG03: ; bbWeight=0.50, gcrefRegs=0029 {x0 x3 x5}, byrefRegs=0000 {}, byref, isz
mov x4, x3
; gcrRegs +[x4]
@@ -99,14 +98,14 @@ G_M892_IG07: ; bbWeight=1, gcrefRegs=0029 {x0 x3 x5}, byrefRegs=0000 {},
G_M892_IG08: ; bbWeight=4, gcrefRegs=0028 {x3 x5}, byrefRegs=0000 {}, byref, isz
sxtw x8, w1
add x9, x4, x8
+ ptrue p0.b
ld1b { z16.b }, p0/z, [x9]
add x8, x6, x8
ld1b { z17.b }, p0/z, [x8]
- ptrue p1.b
- cmpne p1.b, p1/z, z16.b, z17.b
- mov z16.b, p1/z, #1
- ptrue p1.b
- uaddv d16, p1, z16.b
+ cmpne p0.b, p0/z, z16.b, z17.b
+ mov z16.b, p0/z, #1
+ ptrue p0.b
+ uaddv d16, p0, z16.b
umov x8, v16.d[0]
uxtb w8, w8
cmp w8, #0
@@ -169,7 +168,7 @@ G_M892_IG15: ; bbWeight=0, gcVars=0000000000000000 {}, gcrefRegs=0000 {},
brk #0
;; size=8 bbWeight=0 PerfScore 0.00
-; Total bytes of code 308, prolog size 12, PerfScore 259.00, instruction count 77, allocated bytes for code 308 (MethodHash=5bfdfc83) for method SveBenchmarks.StrCmp:SveTail():long:this (FullOpts)
+; Total bytes of code 304, prolog size 12, PerfScore 257.00, instruction count 76, allocated bytes for code 304 (MethodHash=5bfdfc83) for method SveBenchmarks.StrCmp:SveTail():long:this (FullOpts)
; ============================================================
Unwind Info:
@@ -180,7 +179,7 @@ Unwind Info:
E bit : 0
X bit : 0
Vers : 0
- Function Length : 77 (0x0004d) Actual length = 308 (0x000134)
+ Function Length : 76 (0x0004c) Actual length = 304 (0x000130)
---- Epilog scopes ----
---- Scope 0
Epilog Start Offset : 3523193630 (0xd1ffab1e) Actual offset = 3523193630 (0xd1ffab1e) Offset from main function begin = 3523193630 (0xd1ffab1e)
-4 (-1.05%) : 26420.dasm - SveBenchmarks.StrIndexOf:SveTail():int:this (FullOpts)
@@ -21,14 +21,13 @@
;# V10 OutArgs [V10 ] ( 1, 1 ) struct ( 0) [sp+0x00] do-not-enreg[XS] addr-exposed "OutgoingArgSpace" <Empty>
; V11 tmp1 [V11,T05] ( 5, 8 ) ref -> x3 class-hnd single-def "dup spill" <char[]>
;* V12 tmp2 [V12 ] ( 0, 0 ) ushort -> zero-ref "Inlining Arg"
-;* V13 tmp3 [V13 ] ( 0, 0 ) simd16 -> zero-ref "Inlining Arg" <System.Numerics.Vector`1[short]>
-; V14 tmp4 [V14,T11] ( 2, 2 ) long -> x3 "Cast away GC"
-; V15 tmp5 [V15,T01] ( 3, 24 ) ref -> x3 "arr expr"
-; V16 cse0 [V16,T08] ( 3, 6 ) int -> x4 "CSE #07: aggressive"
-; V17 cse1 [V17,T03] ( 5, 10.25) int -> x0 "CSE #02: aggressive"
-; V18 cse2 [V18,T07] ( 3, 6 ) ref -> x2 "CSE #06: aggressive"
-; V19 cse3 [V19,T04] ( 4, 10 ) int -> x5 "CSE #05: aggressive"
-; V20 cse4 [V20,T10] ( 2, 4.25) mask -> p0 hoist "CSE #03: aggressive"
+; V13 tmp3 [V13,T11] ( 2, 2 ) long -> x3 "Cast away GC"
+; V14 tmp4 [V14,T01] ( 3, 24 ) ref -> x3 "arr expr"
+; V15 cse0 [V15,T08] ( 3, 6 ) int -> x4 "CSE #07: aggressive"
+; V16 cse1 [V16,T03] ( 5, 10.25) int -> x0 "CSE #02: aggressive"
+; V17 cse2 [V17,T07] ( 3, 6 ) ref -> x2 "CSE #06: aggressive"
+; V18 cse3 [V18,T04] ( 4, 10 ) int -> x5 "CSE #05: aggressive"
+; V19 cse4 [V19,T10] ( 2, 4.25) mask -> p0 hoist "CSE #03: aggressive"
;
; Lcl frame size = 16
@@ -62,14 +61,13 @@ G_M34028_IG04: ; bbWeight=0.50, gcrefRegs=0005 {x0 x2}, byrefRegs=0000 {}
G_M34028_IG05: ; bbWeight=1, gcrefRegs=0005 {x0 x2}, byrefRegs=0000 {}, byref, isz
ldrh w4, [x0, #0x14]
dup v16.8h, w4
- ptrue p0.h
- mov z17.h, p0/z, #1
+ mvni v17.4s, #0
ldr w0, [x0, #0x10]
; gcrRegs -[x0]
cnth x5, all
cmp w0, w5
ble G_M34028_IG10
- ;; size=32 bbWeight=1 PerfScore 15.50
+ ;; size=28 bbWeight=1 PerfScore 12.00
G_M34028_IG06: ; bbWeight=0.25, gcrefRegs=0004 {x2}, byrefRegs=0000 {}, byref
ptrue p0.h
cmpne p0.h, p0/z, z17.h, #0
@@ -177,7 +175,7 @@ G_M34028_IG18: ; bbWeight=0, gcVars=0000000000000000 {}, gcrefRegs=0000 {
brk #0
;; size=8 bbWeight=0 PerfScore 0.00
-; Total bytes of code 380, prolog size 12, PerfScore 236.38, instruction count 95, allocated bytes for code 380 (MethodHash=3e617b13) for method SveBenchmarks.StrIndexOf:SveTail():int:this (FullOpts)
+; Total bytes of code 376, prolog size 12, PerfScore 232.88, instruction count 94, allocated bytes for code 376 (MethodHash=3e617b13) for method SveBenchmarks.StrIndexOf:SveTail():int:this (FullOpts)
; ============================================================
Unwind Info:
@@ -188,7 +186,7 @@ Unwind Info:
E bit : 0
X bit : 0
Vers : 0
- Function Length : 95 (0x0005f) Actual length = 380 (0x00017c)
+ Function Length : 94 (0x0005e) Actual length = 376 (0x000178)
---- Epilog scopes ----
---- Scope 0
Epilog Start Offset : 3523193630 (0xd1ffab1e) Actual offset = 3523193630 (0xd1ffab1e) Offset from main function begin = 3523193630 (0xd1ffab1e)
+8 (+2.11%) : 6897.dasm - SveBenchmarks.StrCmp:SveStrCmp():long:this (FullOpts)
@@ -8,31 +8,31 @@
; 0 inlinees with PGO data; 1 single block inlinees; 0 inlinees without PGO data
; Final local variable assignments
;
-; V00 this [V00,T06] ( 9, 7 ) ref -> x0 this class-hnd single-def <SveBenchmarks.StrCmp>
+; V00 this [V00,T05] ( 9, 7 ) ref -> x0 this class-hnd single-def <SveBenchmarks.StrCmp>
; V01 loc0 [V01,T02] ( 6, 17.50) int -> x1
; V02 loc1 [V02,T04] ( 5, 10 ) int -> x2 single-def
-; V03 loc2 [V03,T05] ( 4, 10 ) mask -> p0 single-def <System.Numerics.Vector`1[byte]>
-; V04 loc3 [V04,T01] ( 6, 18 ) mask -> p1 <System.Numerics.Vector`1[byte]>
+;* V03 loc2 [V03,T19] ( 0, 0 ) mask -> zero-ref single-def <System.Numerics.Vector`1[byte]>
+; V04 loc3 [V04,T01] ( 6, 18 ) mask -> p0 <System.Numerics.Vector`1[byte]>
; V05 loc4 [V05,T20] ( 4, 13 ) simd16 -> d16 <System.Numerics.Vector`1[byte]>
;* V06 loc5 [V06 ] ( 0, 0 ) simd16 -> zero-ref <System.Numerics.Vector`1[byte]>
;* V07 loc6 [V07 ] ( 0, 0 ) simd16 -> zero-ref <System.Numerics.Vector`1[byte]>
-; V08 loc7 [V08,T11] ( 3, 5 ) long -> x4
-; V09 loc8 [V09,T12] ( 3, 5 ) long -> x5
+; V08 loc7 [V08,T10] ( 3, 5 ) long -> x4
+; V09 loc8 [V09,T11] ( 3, 5 ) long -> x5
; V10 loc9 [V10 ] ( 1, 0.50) ref -> [fp+0x28] must-init pinned class-hnd single-def <byte[]>
; V11 loc10 [V11 ] ( 1, 0.50) ref -> [fp+0x20] must-init pinned class-hnd single-def <byte[]>
-; V12 loc11 [V12,T10] ( 4, 5 ) int -> x3
-; V13 loc12 [V13,T18] ( 3, 1.50) int -> x3 single-def
+; V12 loc11 [V12,T09] ( 4, 5 ) int -> x3
+; V13 loc12 [V13,T17] ( 3, 1.50) int -> x3 single-def
; V14 loc13 [V14,T00] ( 7, 22.50) int -> x4
;# V15 OutArgs [V15 ] ( 1, 1 ) struct ( 0) [sp+0x00] do-not-enreg[XS] addr-exposed "OutgoingArgSpace" <Empty>
-; V16 tmp1 [V16,T08] ( 5, 5 ) ref -> x4 class-hnd single-def "dup spill" <byte[]>
-; V17 tmp2 [V17,T09] ( 5, 5 ) ref -> x5 class-hnd single-def "dup spill" <byte[]>
-; V18 tmp3 [V18,T16] ( 2, 2 ) long -> x4 "Cast away GC"
-; V19 tmp4 [V19,T17] ( 2, 2 ) long -> x5 "Cast away GC"
-; V20 tmp5 [V20,T13] ( 3, 3 ) ref -> x2 single-def "arr expr"
-; V21 tmp6 [V21,T14] ( 3, 3 ) ref -> x0 single-def "arr expr"
-; V22 cse0 [V22,T07] ( 3, 6 ) int -> x3 "CSE #05: aggressive"
-; V23 cse1 [V23,T19] ( 3, 1.50) long -> x3 "CSE #08: moderate"
-; V24 cse2 [V24,T15] ( 4, 2 ) int -> x1 "CSE #07: moderate"
+; V16 tmp1 [V16,T07] ( 5, 5 ) ref -> x4 class-hnd single-def "dup spill" <byte[]>
+; V17 tmp2 [V17,T08] ( 5, 5 ) ref -> x5 class-hnd single-def "dup spill" <byte[]>
+; V18 tmp3 [V18,T15] ( 2, 2 ) long -> x4 "Cast away GC"
+; V19 tmp4 [V19,T16] ( 2, 2 ) long -> x5 "Cast away GC"
+; V20 tmp5 [V20,T12] ( 3, 3 ) ref -> x2 single-def "arr expr"
+; V21 tmp6 [V21,T13] ( 3, 3 ) ref -> x0 single-def "arr expr"
+; V22 cse0 [V22,T06] ( 3, 6 ) int -> x3 "CSE #05: aggressive"
+; V23 cse1 [V23,T18] ( 3, 1.50) long -> x3 "CSE #08: moderate"
+; V24 cse2 [V24,T14] ( 4, 2 ) int -> x1 "CSE #07: moderate"
; V25 cse3 [V25,T03] ( 3, 12 ) long -> x6 "CSE #06: aggressive"
; V26 rat0 [V26,T21] ( 3, 9 ) simd16 -> [fp+0x10] do-not-enreg[S] "SIMDInitTempVar"
;
@@ -47,10 +47,9 @@ G_M14759_IG02: ; bbWeight=1, gcrefRegs=0001 {x0}, byrefRegs=0000 {}, byre
; gcrRegs +[x0]
mov w1, wzr
cntb x2, all
- ptrue p0.b
ldr w3, [x0, #0x20]
mov w4, wzr
- whilelt p1.b, w4, w3
+ whilelt p0.b, w4, w3
movi v16.4s, #0
ldr x4, [x0, #0x10]
; gcrRegs +[x4]
@@ -62,7 +61,7 @@ G_M14759_IG02: ; bbWeight=1, gcrefRegs=0001 {x0}, byrefRegs=0000 {}, byre
; gcrRegs -[x5]
cmp w4, w5
bne G_M14759_IG14
- ;; size=52 bbWeight=1 PerfScore 24.00
+ ;; size=48 bbWeight=1 PerfScore 22.00
G_M14759_IG03: ; bbWeight=0.50, gcrefRegs=0001 {x0}, byrefRegs=0000 {}, byref, isz
ldr x4, [x0, #0x10]
; gcrRegs +[x4]
@@ -96,27 +95,30 @@ G_M14759_IG06: ; bbWeight=0.50, gcrefRegs=0001 {x0}, byrefRegs=0000 {}, b
mov x5, xzr
;; size=4 bbWeight=0.50 PerfScore 0.25
G_M14759_IG07: ; bbWeight=1, gcrefRegs=0001 {x0}, byrefRegs=0000 {}, byref, isz
- ptest p0, p1.b
+ ptrue p1.b
+ ptest p1, p0.b
bge G_M14759_IG09
- ;; size=8 bbWeight=1 PerfScore 3.00
+ ;; size=12 bbWeight=1 PerfScore 5.00
G_M14759_IG08: ; bbWeight=4, gcrefRegs=0001 {x0}, byrefRegs=0000 {}, byref, isz
sxtw x6, w1
add x7, x4, x6
- ld1b { z16.b }, p1/z, [x7]
+ ld1b { z16.b }, p0/z, [x7]
add x6, x5, x6
- ld1b { z17.b }, p1/z, [x6]
+ ld1b { z17.b }, p0/z, [x6]
+ ptrue p0.b
+ cmpne p0.b, p0/z, z16.b, z17.b
+ mov z16.b, p0/z, #1
+ ptrue p0.b
+ cmpne p0.b, p0/z, z16.b, #0
ptrue p1.b
- cmpne p1.b, p1/z, z16.b, z17.b
- mov z16.b, p1/z, #1
- ptrue p1.b
- cmpne p1.b, p1/z, z16.b, #0
- ptest p0, p1.b
+ ptest p1, p0.b
bne G_M14759_IG09
add w1, w1, w2
- whilelt p1.b, w1, w3
- ptest p0, p1.b
+ whilelt p0.b, w1, w3
+ ptrue p1.b
+ ptest p1, p0.b
blt G_M14759_IG08
- ;; size=64 bbWeight=4 PerfScore 152.00
+ ;; size=72 bbWeight=4 PerfScore 168.00
G_M14759_IG09: ; bbWeight=0.50, gcrefRegs=0001 {x0}, byrefRegs=0000 {}, byref, isz
mov w3, wzr
mov w4, wzr
@@ -198,7 +200,7 @@ G_M14759_IG19: ; bbWeight=0, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
brk #0
;; size=8 bbWeight=0 PerfScore 0.00
-; Total bytes of code 380, prolog size 12, PerfScore 248.50, instruction count 95, allocated bytes for code 380 (MethodHash=5df7c658) for method SveBenchmarks.StrCmp:SveStrCmp():long:this (FullOpts)
+; Total bytes of code 388, prolog size 12, PerfScore 264.50, instruction count 97, allocated bytes for code 388 (MethodHash=5df7c658) for method SveBenchmarks.StrCmp:SveStrCmp():long:this (FullOpts)
; ============================================================
Unwind Info:
@@ -209,7 +211,7 @@ Unwind Info:
E bit : 0
X bit : 0
Vers : 0
- Function Length : 95 (0x0005f) Actual length = 380 (0x00017c)
+ Function Length : 97 (0x00061) Actual length = 388 (0x000184)
---- Epilog scopes ----
---- Scope 0
Epilog Start Offset : 3523193630 (0xd1ffab1e) Actual offset = 3523193630 (0xd1ffab1e) Offset from main function begin = 3523193630 (0xd1ffab1e)
+8 (+5.26%) : 21539.dasm - SveBenchmarks.StrLen:SveStrLen():ulong:this (FullOpts)
@@ -8,19 +8,20 @@
; Final local variable assignments
;
; V00 this [V00,T05] ( 4, 4 ) ref -> x0 this class-hnd single-def <SveBenchmarks.StrLen>
-; V01 loc0 [V01,T04] ( 3, 7 ) mask -> p0 single-def <System.Numerics.Vector`1[byte]>
+; V01 loc0 [V01,T11] ( 2, 3 ) simd16 -> d16 <System.Numerics.Vector`1[byte]>
;* V02 loc1 [V02 ] ( 0, 0 ) mask -> zero-ref <System.Numerics.Vector`1[byte]>
-; V03 loc2 [V03,T10] ( 5, 13 ) simd16 -> d16 <System.Numerics.Vector`1[byte]>
+; V03 loc2 [V03,T10] ( 5, 13 ) simd16 -> d17 <System.Numerics.Vector`1[byte]>
; V04 loc3 [V04,T00] ( 6, 18 ) long -> x1
; V05 loc4 [V05,T07] ( 2, 5 ) long -> x2 single-def
-; V06 loc5 [V06,T01] ( 5, 12 ) mask -> p1 <System.Numerics.Vector`1[byte]>
-; V07 loc6 [V07,T03] ( 4, 7 ) long -> x0
+; V06 loc5 [V06,T01] ( 5, 12 ) mask -> p0 <System.Numerics.Vector`1[byte]>
+; V07 loc6 [V07,T04] ( 4, 7 ) long -> x0
; V08 loc7 [V08 ] ( 1, 1 ) ref -> [fp+0x18] must-init pinned class-hnd single-def <byte[]>
;# V09 OutArgs [V09 ] ( 1, 1 ) struct ( 0) [sp+0x00] do-not-enreg[XS] addr-exposed "OutgoingArgSpace" <Empty>
; V10 tmp1 [V10,T02] ( 5, 8 ) ref -> x0 class-hnd single-def "dup spill" <byte[]>
; V11 tmp2 [V11,T08] ( 2, 2 ) long -> x0 "Cast away GC"
; V12 cse0 [V12,T06] ( 3, 6 ) int -> x3 "CSE #02: aggressive"
-; V13 cse1 [V13,T09] ( 2, 1 ) int -> x4 "CSE #01: moderate"
+; V13 cse1 [V13,T03] ( 3, 8 ) mask -> p1 "CSE #03: aggressive"
+; V14 cse2 [V14,T09] ( 2, 1 ) int -> x4 "CSE #01: moderate"
;
; Lcl frame size = 16
@@ -31,16 +32,16 @@ G_M60402_IG01: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref,
;; size=12 bbWeight=1 PerfScore 2.50
G_M60402_IG02: ; bbWeight=1, gcrefRegs=0001 {x0}, byrefRegs=0000 {}, byref, isz
; gcrRegs +[x0]
- ptrue p0.b
+ mvni v16.4s, #0
mov x1, xzr
cntb x2, all
ldr w3, [x0, #0x18]
mov w4, wzr
- whilelt p1.b, w4, w3
+ whilelt p0.b, w4, w3
ldr x0, [x0, #0x08]
str x0, [fp, #0x18] // [V08 loc7]
cbz x0, G_M60402_IG04
- ;; size=36 bbWeight=1 PerfScore 15.00
+ ;; size=36 bbWeight=1 PerfScore 13.50
G_M60402_IG03: ; bbWeight=0.50, gcrefRegs=0001 {x0}, byrefRegs=0000 {}, byref, isz
ldr w4, [x0, #0x08]
cbz w4, G_M60402_IG04
@@ -54,28 +55,30 @@ G_M60402_IG04: ; bbWeight=0.50, gcrefRegs=0000 {}, byrefRegs=0000 {}, byr
mov x0, xzr
;; size=4 bbWeight=0.50 PerfScore 0.25
G_M60402_IG05: ; bbWeight=2, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, isz
- ld1b { z16.b }, p1/z, [x0]
- movi v17.4s, #0
+ ld1b { z17.b }, p0/z, [x0]
+ ptrue p1.b
+ cmpne p1.b, p1/z, z16.b, #0
+ movi v16.4s, #0
ptrue p2.b
- cmpeq p2.b, p2/z, z16.b, z17.b
- ptest p0, p2.b
+ cmpeq p2.b, p2/z, z17.b, z16.b
+ ptest p1, p2.b
bne G_M60402_IG07
- ;; size=24 bbWeight=2 PerfScore 33.00
+ ;; size=32 bbWeight=2 PerfScore 43.00
G_M60402_IG06: ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, isz
add x1, x1, x2
- whilelt p1.b, w1, w3
+ whilelt p0.b, w1, w3
add x4, x0, x1
- ld1b { z16.b }, p1/z, [x4]
- movi v17.4s, #0
+ ld1b { z17.b }, p0/z, [x4]
+ movi v16.4s, #0
ptrue p2.b
- cmpeq p2.b, p2/z, z16.b, z17.b
- ptest p0, p2.b
+ cmpeq p2.b, p2/z, z17.b, z16.b
+ ptest p1, p2.b
beq G_M60402_IG06
;; size=36 bbWeight=4 PerfScore 78.00
G_M60402_IG07: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
- ptrue p0.b
- cmpne p0.b, p0/z, z16.b, #0
- cntp x0, p1, p0.b
+ ptrue p1.b
+ cmpne p1.b, p1/z, z17.b, #0
+ cntp x0, p0, p1.b
add x0, x0, x1
;; size=16 bbWeight=1 PerfScore 7.50
G_M60402_IG08: ; bbWeight=1, epilog, nogc, extend
@@ -83,7 +86,7 @@ G_M60402_IG08: ; bbWeight=1, epilog, nogc, extend
ret lr
;; size=8 bbWeight=1 PerfScore 2.00
-; Total bytes of code 152, prolog size 12, PerfScore 141.00, instruction count 38, allocated bytes for code 152 (MethodHash=b293140d) for method SveBenchmarks.StrLen:SveStrLen():ulong:this (FullOpts)
+; Total bytes of code 160, prolog size 12, PerfScore 149.50, instruction count 40, allocated bytes for code 160 (MethodHash=b293140d) for method SveBenchmarks.StrLen:SveStrLen():ulong:this (FullOpts)
; ============================================================
Unwind Info:
@@ -94,7 +97,7 @@ Unwind Info:
E bit : 0
X bit : 0
Vers : 0
- Function Length : 38 (0x00026) Actual length = 152 (0x000098)
+ Function Length : 40 (0x00028) Actual length = 160 (0x0000a0)
---- Epilog scopes ----
---- Scope 0
Epilog Start Offset : 3523193630 (0xd1ffab1e) Actual offset = 3523193630 (0xd1ffab1e) Offset from main function begin = 3523193630 (0xd1ffab1e)

Size improvements/regressions per collection
PerfScore improvements/regressions per collection
Context information
jit-analyze output
benchmarks.run.linux.arm64.checked.mch
Detail diffs
coreclr_tests.run.linux.arm64.checked.mch
Detail diffs
benchmarks.run_pgo.linux.arm64.checked.mch
Detail diffs
libraries.pmi.linux.arm64.checked.mch
Detail diffs
benchmarks.run_pgo_optrepeat.linux.arm64.checked.mch
Detail diffs
|
The diffs look the same to me, which is as expected: only PredicateInstructions.cs triggers the optimisation. |
In that case we will have to reopen all the issues that #114438 closed when we introduced that method.
Do you think we should fix the Predicate tests that were regressed by this PR, and then open a follow-up issue to fix the problem in general? |
Given these were essentially duplicates of each other, could we just reopen one of them?
I think fixing it will be quite a bit of work, so it is best done in a new PR. This PR at least removes the asmcheck lines. |
Fixes #114443
Fixes #114431
Fixes #114433