Arm64 SVE: Optimise zero/allbits vectors the same as masks #115566

Open. a74nh wants to merge 64 commits into main.

a74nh (Contributor) commented May 14, 2025

Fixes #114443
Fixes #114431
Fixes #114433

  • TrueMasks are imported as constant vectors with the mask pattern expanded into the constant. If the pattern is unknown or cannot be expanded, a TrueMaskX node is imported.
  • FalseMasks are imported as a constant zero vector.
  • The mask used in a ConvertVectorToMask is imported as a ConversionTrueMask.
  • During folding, constant vectors that are used as masks are turned into constant masks. This captures both imported TrueMasks/FalseMasks and constant vectors created by users.
  • Optimisations on masks should use IsTrueMask() and IsFalseMask().
  • IsTrueMask() checks for a constant mask pattern that matches the type of the parent.
  • IsFalseMask() checks for a constant zero mask.
  • TrueMaskX and ConversionTrueMask are generally not optimised.
  • At code generation, constant masks are pattern matched back into a mask pattern for use in ptrue/pfalse.
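
As a rough illustration of what "expanding the mask pattern into a constant" and "matching the type of the parent" could mean, here is a minimal sketch. It is not the JIT's code: the names `ExpandTrueMask`, `IsTrueMaskBits` and `IsFalseMaskBits` are hypothetical, and it assumes a 128-bit vector with the architectural predicate layout of one predicate bit per vector byte.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model, not the JIT's data structures: a 128-bit SVE vector
// has a 16-bit predicate, one predicate bit per vector byte.
constexpr unsigned kVectorBytes = 16;

// Expand "all elements true" for an element size of 1, 2, 4 or 8 bytes into
// predicate bits. Only the lowest bit of each element's group is set, which
// is the value ptrue writes for that element size.
inline uint32_t ExpandTrueMask(unsigned elemSizeBytes)
{
    assert(elemSizeBytes == 1 || elemSizeBytes == 2 || elemSizeBytes == 4 || elemSizeBytes == 8);
    uint32_t bits = 0;
    for (unsigned byte = 0; byte < kVectorBytes; byte += elemSizeBytes)
    {
        bits |= 1u << byte;
    }
    return bits;
}

// A false mask is simply the all-zero constant.
inline bool IsFalseMaskBits(uint32_t bits)
{
    return bits == 0;
}

// A constant counts as a true mask only if it equals the expansion for the
// element size of the node that consumes it (the "parent").
inline bool IsTrueMaskBits(uint32_t bits, unsigned parentElemSizeBytes)
{
    return bits == ExpandTrueMask(parentElemSizeBytes);
}
```

Under these assumptions the constant 0x5555 is a true mask for a 16-bit parent (the `ptrue p0.h` case) but not for an 8-bit one, which is why the check has to take the parent's type into account.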

Fixes dotnet#114443

* IsVectorZero() should allow for all zero vectors and false masks that have been converted to vectors.
* IsVectorAllBitsSet() should allow for all bits set vectors and true masks that have been converted to vectors.
* IsMaskZero() should allow for false masks and all zero vectors that have been converted to masks.
* IsMaskAllBitsSet() should allow for true masks and all bits set vectors that have been converted to masks.

In addition:
* Fix up all the errors caused by these changes.
* Add a bunch of asmcheck tests
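
The zero-checking rules listed above can be sketched on a toy IR. This is only an illustration under stated assumptions: the `Node` type, the `NodeKind` values and the `Make*` helpers are hypothetical and are not the JIT's GenTree representation. The two AllBitsSet checks would follow the same shape with the all-ones constants.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical toy IR: constants in either form, plus the two conversions.
enum class NodeKind { ConstVector, ConstMask, MaskToVector, VectorToMask };

struct Node
{
    NodeKind kind = NodeKind::ConstVector;
    std::array<uint8_t, 16> vectorBytes{}; // payload of a ConstVector
    uint16_t maskBits = 0;                 // payload of a ConstMask
    const Node* operand = nullptr;         // input of a conversion node
};

inline Node MakeConstVector(uint8_t fill)
{
    Node n;
    n.kind = NodeKind::ConstVector;
    n.vectorBytes.fill(fill);
    return n;
}

inline Node MakeConstMask(uint16_t bits)
{
    Node n;
    n.kind = NodeKind::ConstMask;
    n.maskBits = bits;
    return n;
}

inline Node MakeConversion(NodeKind kind, const Node* operand)
{
    assert(kind == NodeKind::MaskToVector || kind == NodeKind::VectorToMask);
    assert(operand != nullptr);
    Node n;
    n.kind = kind;
    n.operand = operand;
    return n;
}

inline bool IsConstVectorZero(const Node& n)
{
    if (n.kind != NodeKind::ConstVector) return false;
    for (uint8_t b : n.vectorBytes)
    {
        if (b != 0) return false;
    }
    return true;
}

inline bool IsConstMaskZero(const Node& n)
{
    return n.kind == NodeKind::ConstMask && n.maskBits == 0;
}

// Zero vector: a zero constant, or a false mask converted to a vector.
inline bool IsVectorZero(const Node& n)
{
    if (IsConstVectorZero(n)) return true;
    return n.kind == NodeKind::MaskToVector && n.operand != nullptr && IsConstMaskZero(*n.operand);
}

// Zero mask: a false mask, or a zero vector converted to a mask.
inline bool IsMaskZero(const Node& n)
{
    if (IsConstMaskZero(n)) return true;
    return n.kind == NodeKind::VectorToMask && n.operand != nullptr && IsConstVectorZero(*n.operand);
}
```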
@dotnet-policy-service dotnet-policy-service bot added the community-contribution Indicates that the PR has been added by a community member label May 14, 2025
@github-actions github-actions bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label May 14, 2025

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

@a74nh a74nh added arch-arm64 arm-sve Work related to arm64 SVE/SVE2 support labels May 14, 2025
a74nh (Contributor Author) commented May 14, 2025

Currently there are some issues around Non Faulting LoadVectors, and I need to check that I have not created any code size regressions.

@a74nh a74nh changed the title Arm64 SVE: Better optimise zero/allbits vectors Arm64 SVE: Optimise zero/allbits vectors the same as masks May 22, 2025
kunalspathak (Member) commented

> The jit now imports this as a constant mask (instead of as intrinsic). This causes assertion propagation to spot there is a constant and, if the use is not inside a loop, then replace the uses with the constant.

So, this should handle the case in #114286?

a74nh (Contributor Author) commented Jun 11, 2025

> > The jit now imports this as a constant mask (instead of as intrinsic). This causes assertion propagation to spot there is a constant and, if the use is not inside a loop, then replace the uses with the constant.
>
> So, this should handle the case in #114286?

Needs more investigation, but it's definitely a step in the right direction.
The constant propagation should step in and do it in loops only, but I suspect we might want to tweak it for ptrues so it always does it.

a74nh (Contributor Author) commented Jun 13, 2025

Rebased and almost everything is looking good now. Just need to fix JIT/opt/SVE/PredicateInstructions/PredicateInstructions.

a74nh (Contributor Author) commented Jun 13, 2025

The predicate optimisations are failing, but I think in a good way.

Consider the following example. Note that a vector (not a mask) is returned from the function.

    static Vector<short> ZipLow()
    {
        return Sve.ZipLow(Vector<short>.Zero, Sve.CreateTrueMaskInt16());
    }

HEAD, Tier 0:
The input arguments are converted to vectors:

IN0001: 000008      ptrue   p0.h
IN0002: 00000C      mov     z16.h, p0/z, #1
IN0003: 000010      movi    v17.4s, #0
IN0004: 000014      zip1    z0.h, z17.h, z16.h

HEAD:
The input arguments are masks, but the return value needs converting from a mask to a vector:

IN0001: 000008      ptrue   p0.h
IN0002: 00000C      pfalse  p1.b
IN0003: 000010      zip1    p0.h, p1.h, p0.h
IN0004: 000014      mov     z0.h, p0/z, #1

With this PR:
The predicate optimisation fails to trigger (because canMorphVectorOperandToMask() is now incorrect).
However, the code is already optimal without the optimisation. "Fixing" the predicate optimisation would bring back the previous code, which is worse.

IN0001: 000008      movi    v0.4s, #0
IN0002: 00000C      mvni    v16.4s, #0
IN0003: 000010      zip1    z0.h, z0.h, z16.h

Ideally, the optimisation should build up a cost model of using both the vector version and the predicate version, taking into account all input arguments and all uses of the result. I feel that doing so would add scope to this PR (especially given that the code produced here is better than HEAD).

I suggest for this PR disabling the optimisation by removing the call to fgMorphTryUseAllMaskVariant() and fixing up the test cases. A follow on PR would add the cost modelling.
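
To make the cost-model idea concrete, here is a minimal sketch of what such a comparison might look like. Everything in it is an assumption for illustration: the `Form`, `Candidate`, `EstimateCost` and `PickVariant` names are hypothetical, every instruction is given a unit cost, and constants are treated as equally cheap in either form. A real model in the JIT would need measured costs.

```cpp
#include <cassert>
#include <vector>

// Hypothetical: the form a value is available in, or is needed in.
enum class Form { Vector, Mask, Either };

struct Candidate
{
    std::vector<Form> operands; // form each operand is already available in
    std::vector<Form> uses;     // form each consumer of the result needs
};

// Unit-cost estimate for emitting the operation in the given variant.
inline int EstimateCost(const Candidate& c, Form variant)
{
    assert(variant != Form::Either);
    int cost = 1; // the operation itself
    for (Form operand : c.operands)
    {
        cost += 1; // materialise the operand
        if (operand != Form::Either && operand != variant)
        {
            cost += 1; // convert the operand into the variant's form
        }
    }
    for (Form use : c.uses)
    {
        if (use != Form::Either && use != variant)
        {
            cost += 1; // one conversion of the result, shared by all such uses
            break;
        }
    }
    return cost;
}

// Prefer the vector variant unless the mask variant is strictly cheaper.
inline Form PickVariant(const Candidate& c)
{
    return EstimateCost(c, Form::Mask) < EstimateCost(c, Form::Vector) ? Form::Mask : Form::Vector;
}
```

For the ZipLow example, both operands are constants (`Form::Either`) and the only use is the vector return. The sketch then estimates 3 for the vector variant and 4 for the mask variant, which lines up with the three-instruction sequence from this PR and the four-instruction predicate sequence on HEAD.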

kunalspathak (Member) commented

> I suggest for this PR disabling the optimisation by removing the call to fgMorphTryUseAllMaskVariant() and fixing up the test cases. A follow on PR would add the cost modelling.

What are the diffs for this commit alone?

a74nh (Contributor Author) commented Jun 16, 2025

Prior to removing calls to fgMorphTryUseAllMaskVariant():

Diffs are based on 2,631,375 contexts (1,092,633 MinOpts, 1,538,742 FullOpts).

Overall (-334,028 bytes)

| Collection | Base size (bytes) | Diff size (bytes) | PerfScore in Diffs |
|---|---|---|---|
| benchmarks.run.linux.arm64.checked.mch | 16,042,184 | +8 | +2.49% |
| coreclr_tests.run.linux.arm64.checked.mch | 567,688,716 | -333,984 | -5.31% |
| benchmarks.run_pgo.linux.arm64.checked.mch | 71,813,300 | -20 | -1.47% |
| libraries.pmi.linux.arm64.checked.mch | 68,207,660 | -40 | -46.67% |
| benchmarks.run_pgo_optrepeat.linux.arm64.checked.mch | 16,695,796 | +8 | +2.49% |

MinOpts (-31,028 bytes)

| Collection | Base size (bytes) | Diff size (bytes) | PerfScore in Diffs |
|---|---|---|---|
| coreclr_tests.run.linux.arm64.checked.mch | 383,578,920 | -31,008 | -2.93% |
| benchmarks.run_pgo.linux.arm64.checked.mch | 25,146,552 | -20 | -1.47% |

FullOpts (-303,000 bytes)

| Collection | Base size (bytes) | Diff size (bytes) | PerfScore in Diffs |
|---|---|---|---|
| benchmarks.run.linux.arm64.checked.mch | 15,723,912 | +8 | +2.49% |
| coreclr_tests.run.linux.arm64.checked.mch | 184,109,796 | -302,976 | -7.75% |
| libraries.pmi.linux.arm64.checked.mch | 68,087,900 | -40 | -46.67% |
| benchmarks.run_pgo_optrepeat.linux.arm64.checked.mch | 16,330,864 | +8 | +2.49% |

Example diffs
benchmarks.run.linux.arm64.checked.mch
-4 (-1.30%) : 16114.dasm - SveBenchmarks.StrCmp:SveTail():long:this (FullOpts)
@@ -7,33 +7,33 @@
 ; No matching PGO data
 ; Final local variable assignments
 ;
-;  V00 this         [V00,T09] (  5,  5   )     ref  ->   x0         this class-hnd single-def <SveBenchmarks.StrCmp>
-;  V01 loc0         [V01,T05] (  3,  9   )    mask  ->   p0         single-def <System.Numerics.Vector`1[byte]>
+;  V00 this         [V00,T08] (  5,  5   )     ref  ->   x0         this class-hnd single-def <SveBenchmarks.StrCmp>
+;* V01 loc0         [V01,T22] (  0,  0   )    mask  ->  zero-ref    single-def <System.Numerics.Vector`1[byte]>
 ;* V02 loc1         [V02    ] (  0,  0   )  simd16  ->  zero-ref    <System.Numerics.Vector`1[byte]>
 ;* V03 loc2         [V03    ] (  0,  0   )  simd16  ->  zero-ref    <System.Numerics.Vector`1[byte]>
 ;* V04 loc3         [V04    ] (  0,  0   )  simd16  ->  zero-ref    <System.Numerics.Vector`1[byte]>
 ;  V05 loc4         [V05,T00] ( 12, 41.50)     int  ->   x1        
-;  V06 loc5         [V06,T13] (  3,  6   )     int  ->   x2         single-def
-;  V07 loc6         [V07,T17] (  3,  5   )    long  ->   x4        
-;  V08 loc7         [V08,T18] (  3,  5   )    long  ->   x6        
+;  V06 loc5         [V06,T12] (  3,  6   )     int  ->   x2         single-def
+;  V07 loc6         [V07,T16] (  3,  5   )    long  ->   x4        
+;  V08 loc7         [V08,T17] (  3,  5   )    long  ->   x6        
 ;  V09 loc8         [V09    ] (  1,  0.50)     ref  ->  [fp+0x18]   must-init pinned class-hnd single-def <byte[]>
 ;  V10 loc9         [V10    ] (  1,  0.50)     ref  ->  [fp+0x10]   must-init pinned class-hnd single-def <byte[]>
-;  V11 loc10        [V11,T08] (  2,  8   )   ubyte  ->   x8        
+;  V11 loc10        [V11,T07] (  2,  8   )   ubyte  ->   x8        
 ;# V12 OutArgs      [V12    ] (  1,  1   )  struct ( 0) [sp+0x00]   do-not-enreg[XS] addr-exposed "OutgoingArgSpace" <Empty>
-;  V13 tmp1         [V13,T15] (  5,  5   )     ref  ->   x4         class-hnd single-def "dup spill" <byte[]>
-;  V14 tmp2         [V14,T16] (  5,  5   )     ref  ->   x6         class-hnd single-def "dup spill" <byte[]>
-;  V15 tmp3         [V15,T19] (  2,  2   )    long  ->   x4         "Cast away GC"
-;  V16 tmp4         [V16,T20] (  2,  2   )    long  ->   x6         "Cast away GC"
+;  V13 tmp1         [V13,T14] (  5,  5   )     ref  ->   x4         class-hnd single-def "dup spill" <byte[]>
+;  V14 tmp2         [V14,T15] (  5,  5   )     ref  ->   x6         class-hnd single-def "dup spill" <byte[]>
+;  V15 tmp3         [V15,T18] (  2,  2   )    long  ->   x4         "Cast away GC"
+;  V16 tmp4         [V16,T19] (  2,  2   )    long  ->   x6         "Cast away GC"
 ;  V17 tmp5         [V17,T01] (  3, 24   )     ref  ->   x2         "arr expr"
 ;  V18 tmp6         [V18,T02] (  3, 24   )     ref  ->   x6         "arr expr"
-;* V19 tmp7         [V19,T21] (  0,  0   )     ref  ->  zero-ref    single-def "arr expr"
-;* V20 tmp8         [V20,T22] (  0,  0   )     ref  ->  zero-ref    single-def "arr expr"
-;  V21 cse0         [V21,T06] (  3,  8.50)     int  ->   x2         "CSE #11: aggressive"
-;  V22 cse1         [V22,T07] (  3,  8.50)     int  ->   x4         "CSE #14: aggressive"
-;  V23 cse2         [V23,T14] (  3,  6   )     int  ->   x7         "CSE #07: aggressive"
-;  V24 cse3         [V24,T12] (  4,  6.50)     int  ->   x0         "CSE #06: aggressive"
-;  V25 cse4         [V25,T10] (  4,  6.50)     ref  ->   x3         "CSE #01: aggressive"
-;  V26 cse5         [V26,T11] (  4,  6.50)     ref  ->   x5         "CSE #03: aggressive"
+;* V19 tmp7         [V19,T20] (  0,  0   )     ref  ->  zero-ref    single-def "arr expr"
+;* V20 tmp8         [V20,T21] (  0,  0   )     ref  ->  zero-ref    single-def "arr expr"
+;  V21 cse0         [V21,T05] (  3,  8.50)     int  ->   x2         "CSE #11: aggressive"
+;  V22 cse1         [V22,T06] (  3,  8.50)     int  ->   x4         "CSE #14: aggressive"
+;  V23 cse2         [V23,T13] (  3,  6   )     int  ->   x7         "CSE #07: aggressive"
+;  V24 cse3         [V24,T11] (  4,  6.50)     int  ->   x0         "CSE #06: aggressive"
+;  V25 cse4         [V25,T09] (  4,  6.50)     ref  ->   x3         "CSE #01: aggressive"
+;  V26 cse5         [V26,T10] (  4,  6.50)     ref  ->   x5         "CSE #03: aggressive"
 ;  V27 cse6         [V27,T03] (  3, 12   )    long  ->   x4         "CSE #08: aggressive"
 ;  V28 cse7         [V28,T04] (  3, 12   )    long  ->   x8         "CSE #05: aggressive"
 ;
@@ -46,7 +46,6 @@ G_M892_IG01:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, n
 						;; size=12 bbWeight=1 PerfScore 2.50
 G_M892_IG02:        ; bbWeight=1, gcrefRegs=0001 {x0}, byrefRegs=0000 {}, byref, isz
             ; gcrRegs +[x0]
-            ptrue   p0.b
             mov     w1, wzr
             cntb    x2, all
             ldr     x3, [x0, #0x10]
@@ -57,7 +56,7 @@ G_M892_IG02:        ; bbWeight=1, gcrefRegs=0001 {x0}, byrefRegs=0000 {}, byref,
             ldr     w6, [x5, #0x08]
             cmp     w4, w6
             bne     G_M892_IG11
-						;; size=36 bbWeight=1 PerfScore 18.00
+						;; size=32 bbWeight=1 PerfScore 16.00
 G_M892_IG03:        ; bbWeight=0.50, gcrefRegs=0029 {x0 x3 x5}, byrefRegs=0000 {}, byref, isz
             mov     x4, x3
             ; gcrRegs +[x4]
@@ -99,14 +98,14 @@ G_M892_IG07:        ; bbWeight=1, gcrefRegs=0029 {x0 x3 x5}, byrefRegs=0000 {},
 G_M892_IG08:        ; bbWeight=4, gcrefRegs=0028 {x3 x5}, byrefRegs=0000 {}, byref, isz
             sxtw    x8, w1
             add     x9, x4, x8
+            ptrue   p0.b
             ld1b    { z16.b }, p0/z, [x9]
             add     x8, x6, x8
             ld1b    { z17.b }, p0/z, [x8]
-            ptrue   p1.b
-            cmpne   p1.b, p1/z, z16.b, z17.b
-            mov     z16.b, p1/z, #1
-            ptrue   p1.b
-            uaddv   d16, p1, z16.b
+            cmpne   p0.b, p0/z, z16.b, z17.b
+            mov     z16.b, p0/z, #1
+            ptrue   p0.b
+            uaddv   d16, p0, z16.b
             umov    x8, v16.d[0]
             uxtb    w8, w8
             cmp     w8, #0
@@ -169,7 +168,7 @@ G_M892_IG15:        ; bbWeight=0, gcVars=0000000000000000 {}, gcrefRegs=0000 {},
             brk     #0
 						;; size=8 bbWeight=0 PerfScore 0.00
 
-; Total bytes of code 308, prolog size 12, PerfScore 259.00, instruction count 77, allocated bytes for code 308 (MethodHash=5bfdfc83) for method SveBenchmarks.StrCmp:SveTail():long:this (FullOpts)
+; Total bytes of code 304, prolog size 12, PerfScore 257.00, instruction count 76, allocated bytes for code 304 (MethodHash=5bfdfc83) for method SveBenchmarks.StrCmp:SveTail():long:this (FullOpts)
 ; ============================================================
 
 Unwind Info:
@@ -180,7 +179,7 @@ Unwind Info:
   E bit             : 0
   X bit             : 0
   Vers              : 0
-  Function Length   : 77 (0x0004d) Actual length = 308 (0x000134)
+  Function Length   : 76 (0x0004c) Actual length = 304 (0x000130)
   ---- Epilog scopes ----
   ---- Scope 0
   Epilog Start Offset        : 3523193630 (0xd1ffab1e) Actual offset = 3523193630 (0xd1ffab1e) Offset from main function begin = 3523193630 (0xd1ffab1e)
-4 (-1.05%) : 26115.dasm - SveBenchmarks.StrIndexOf:SveTail():int:this (FullOpts)
@@ -21,14 +21,13 @@
 ;# V10 OutArgs      [V10    ] (  1,  1   )  struct ( 0) [sp+0x00]   do-not-enreg[XS] addr-exposed "OutgoingArgSpace" <Empty>
 ;  V11 tmp1         [V11,T05] (  5,  8   )     ref  ->   x3         class-hnd single-def "dup spill" <char[]>
 ;* V12 tmp2         [V12    ] (  0,  0   )  ushort  ->  zero-ref    "Inlining Arg"
-;* V13 tmp3         [V13    ] (  0,  0   )  simd16  ->  zero-ref    "Inlining Arg" <System.Numerics.Vector`1[short]>
-;  V14 tmp4         [V14,T11] (  2,  2   )    long  ->   x3         "Cast away GC"
-;  V15 tmp5         [V15,T01] (  3, 24   )     ref  ->   x3         "arr expr"
-;  V16 cse0         [V16,T08] (  3,  6   )     int  ->   x4         "CSE #07: aggressive"
-;  V17 cse1         [V17,T03] (  5, 10.25)     int  ->   x0         "CSE #02: aggressive"
-;  V18 cse2         [V18,T07] (  3,  6   )     ref  ->   x2         "CSE #06: aggressive"
-;  V19 cse3         [V19,T04] (  4, 10   )     int  ->   x5         "CSE #05: aggressive"
-;  V20 cse4         [V20,T10] (  2,  4.25)    mask  ->   p0         hoist "CSE #03: aggressive"
+;  V13 tmp3         [V13,T11] (  2,  2   )    long  ->   x3         "Cast away GC"
+;  V14 tmp4         [V14,T01] (  3, 24   )     ref  ->   x3         "arr expr"
+;  V15 cse0         [V15,T08] (  3,  6   )     int  ->   x4         "CSE #07: aggressive"
+;  V16 cse1         [V16,T03] (  5, 10.25)     int  ->   x0         "CSE #02: aggressive"
+;  V17 cse2         [V17,T07] (  3,  6   )     ref  ->   x2         "CSE #06: aggressive"
+;  V18 cse3         [V18,T04] (  4, 10   )     int  ->   x5         "CSE #05: aggressive"
+;  V19 cse4         [V19,T10] (  2,  4.25)    mask  ->   p0         hoist "CSE #03: aggressive"
 ;
 ; Lcl frame size = 16
 
@@ -62,14 +61,13 @@ G_M34028_IG04:        ; bbWeight=0.50, gcrefRegs=0005 {x0 x2}, byrefRegs=0000 {}
 G_M34028_IG05:        ; bbWeight=1, gcrefRegs=0005 {x0 x2}, byrefRegs=0000 {}, byref, isz
             ldrh    w4, [x0, #0x14]
             dup     v16.8h, w4
-            ptrue   p0.h
-            mov     z17.h, p0/z, #1
+            mvni    v17.4s, #0
             ldr     w0, [x0, #0x10]
             ; gcrRegs -[x0]
             cnth    x5, all
             cmp     w0, w5
             ble     G_M34028_IG10
-						;; size=32 bbWeight=1 PerfScore 15.50
+						;; size=28 bbWeight=1 PerfScore 12.00
 G_M34028_IG06:        ; bbWeight=0.25, gcrefRegs=0004 {x2}, byrefRegs=0000 {}, byref
             ptrue   p0.h
             cmpne   p0.h, p0/z, z17.h, #0
@@ -177,7 +175,7 @@ G_M34028_IG18:        ; bbWeight=0, gcVars=0000000000000000 {}, gcrefRegs=0000 {
             brk     #0
 						;; size=8 bbWeight=0 PerfScore 0.00
 
-; Total bytes of code 380, prolog size 12, PerfScore 236.38, instruction count 95, allocated bytes for code 380 (MethodHash=3e617b13) for method SveBenchmarks.StrIndexOf:SveTail():int:this (FullOpts)
+; Total bytes of code 376, prolog size 12, PerfScore 232.88, instruction count 94, allocated bytes for code 376 (MethodHash=3e617b13) for method SveBenchmarks.StrIndexOf:SveTail():int:this (FullOpts)
 ; ============================================================
 
 Unwind Info:
@@ -188,7 +186,7 @@ Unwind Info:
   E bit             : 0
   X bit             : 0
   Vers              : 0
-  Function Length   : 95 (0x0005f) Actual length = 380 (0x00017c)
+  Function Length   : 94 (0x0005e) Actual length = 376 (0x000178)
   ---- Epilog scopes ----
   ---- Scope 0
   Epilog Start Offset        : 3523193630 (0xd1ffab1e) Actual offset = 3523193630 (0xd1ffab1e) Offset from main function begin = 3523193630 (0xd1ffab1e)
+8 (+2.11%) : 8287.dasm - SveBenchmarks.StrCmp:SveStrCmp():long:this (FullOpts)
@@ -8,31 +8,31 @@
 ; 0 inlinees with PGO data; 1 single block inlinees; 0 inlinees without PGO data
 ; Final local variable assignments
 ;
-;  V00 this         [V00,T06] (  9,  7   )     ref  ->   x0         this class-hnd single-def <SveBenchmarks.StrCmp>
+;  V00 this         [V00,T05] (  9,  7   )     ref  ->   x0         this class-hnd single-def <SveBenchmarks.StrCmp>
 ;  V01 loc0         [V01,T02] (  6, 17.50)     int  ->   x1        
 ;  V02 loc1         [V02,T04] (  5, 10   )     int  ->   x2         single-def
-;  V03 loc2         [V03,T05] (  4, 10   )    mask  ->   p0         single-def <System.Numerics.Vector`1[byte]>
-;  V04 loc3         [V04,T01] (  6, 18   )    mask  ->   p1         <System.Numerics.Vector`1[byte]>
+;* V03 loc2         [V03,T19] (  0,  0   )    mask  ->  zero-ref    single-def <System.Numerics.Vector`1[byte]>
+;  V04 loc3         [V04,T01] (  6, 18   )    mask  ->   p0         <System.Numerics.Vector`1[byte]>
 ;  V05 loc4         [V05,T20] (  4, 13   )  simd16  ->  d16         <System.Numerics.Vector`1[byte]>
 ;* V06 loc5         [V06    ] (  0,  0   )  simd16  ->  zero-ref    <System.Numerics.Vector`1[byte]>
 ;* V07 loc6         [V07    ] (  0,  0   )  simd16  ->  zero-ref    <System.Numerics.Vector`1[byte]>
-;  V08 loc7         [V08,T11] (  3,  5   )    long  ->   x4        
-;  V09 loc8         [V09,T12] (  3,  5   )    long  ->   x5        
+;  V08 loc7         [V08,T10] (  3,  5   )    long  ->   x4        
+;  V09 loc8         [V09,T11] (  3,  5   )    long  ->   x5        
 ;  V10 loc9         [V10    ] (  1,  0.50)     ref  ->  [fp+0x28]   must-init pinned class-hnd single-def <byte[]>
 ;  V11 loc10        [V11    ] (  1,  0.50)     ref  ->  [fp+0x20]   must-init pinned class-hnd single-def <byte[]>
-;  V12 loc11        [V12,T10] (  4,  5   )     int  ->   x3        
-;  V13 loc12        [V13,T18] (  3,  1.50)     int  ->   x3         single-def
+;  V12 loc11        [V12,T09] (  4,  5   )     int  ->   x3        
+;  V13 loc12        [V13,T17] (  3,  1.50)     int  ->   x3         single-def
 ;  V14 loc13        [V14,T00] (  7, 22.50)     int  ->   x4        
 ;# V15 OutArgs      [V15    ] (  1,  1   )  struct ( 0) [sp+0x00]   do-not-enreg[XS] addr-exposed "OutgoingArgSpace" <Empty>
-;  V16 tmp1         [V16,T08] (  5,  5   )     ref  ->   x4         class-hnd single-def "dup spill" <byte[]>
-;  V17 tmp2         [V17,T09] (  5,  5   )     ref  ->   x5         class-hnd single-def "dup spill" <byte[]>
-;  V18 tmp3         [V18,T16] (  2,  2   )    long  ->   x4         "Cast away GC"
-;  V19 tmp4         [V19,T17] (  2,  2   )    long  ->   x5         "Cast away GC"
-;  V20 tmp5         [V20,T13] (  3,  3   )     ref  ->   x2         single-def "arr expr"
-;  V21 tmp6         [V21,T14] (  3,  3   )     ref  ->   x0         single-def "arr expr"
-;  V22 cse0         [V22,T07] (  3,  6   )     int  ->   x3         "CSE #05: aggressive"
-;  V23 cse1         [V23,T19] (  3,  1.50)    long  ->   x3         "CSE #08: moderate"
-;  V24 cse2         [V24,T15] (  4,  2   )     int  ->   x1         "CSE #07: moderate"
+;  V16 tmp1         [V16,T07] (  5,  5   )     ref  ->   x4         class-hnd single-def "dup spill" <byte[]>
+;  V17 tmp2         [V17,T08] (  5,  5   )     ref  ->   x5         class-hnd single-def "dup spill" <byte[]>
+;  V18 tmp3         [V18,T15] (  2,  2   )    long  ->   x4         "Cast away GC"
+;  V19 tmp4         [V19,T16] (  2,  2   )    long  ->   x5         "Cast away GC"
+;  V20 tmp5         [V20,T12] (  3,  3   )     ref  ->   x2         single-def "arr expr"
+;  V21 tmp6         [V21,T13] (  3,  3   )     ref  ->   x0         single-def "arr expr"
+;  V22 cse0         [V22,T06] (  3,  6   )     int  ->   x3         "CSE #05: aggressive"
+;  V23 cse1         [V23,T18] (  3,  1.50)    long  ->   x3         "CSE #08: moderate"
+;  V24 cse2         [V24,T14] (  4,  2   )     int  ->   x1         "CSE #07: moderate"
 ;  V25 cse3         [V25,T03] (  3, 12   )    long  ->   x6         "CSE #06: aggressive"
 ;  V26 rat0         [V26,T21] (  3,  9   )  simd16  ->  [fp+0x10]   do-not-enreg[S] "SIMDInitTempVar"
 ;
@@ -47,10 +47,9 @@ G_M14759_IG02:        ; bbWeight=1, gcrefRegs=0001 {x0}, byrefRegs=0000 {}, byre
             ; gcrRegs +[x0]
             mov     w1, wzr
             cntb    x2, all
-            ptrue   p0.b
             ldr     w3, [x0, #0x20]
             mov     w4, wzr
-            whilelt p1.b, w4, w3
+            whilelt p0.b, w4, w3
             movi    v16.4s, #0
             ldr     x4, [x0, #0x10]
             ; gcrRegs +[x4]
@@ -62,7 +61,7 @@ G_M14759_IG02:        ; bbWeight=1, gcrefRegs=0001 {x0}, byrefRegs=0000 {}, byre
             ; gcrRegs -[x5]
             cmp     w4, w5
             bne     G_M14759_IG14
-						;; size=52 bbWeight=1 PerfScore 24.00
+						;; size=48 bbWeight=1 PerfScore 22.00
 G_M14759_IG03:        ; bbWeight=0.50, gcrefRegs=0001 {x0}, byrefRegs=0000 {}, byref, isz
             ldr     x4, [x0, #0x10]
             ; gcrRegs +[x4]
@@ -96,27 +95,30 @@ G_M14759_IG06:        ; bbWeight=0.50, gcrefRegs=0001 {x0}, byrefRegs=0000 {}, b
             mov     x5, xzr
 						;; size=4 bbWeight=0.50 PerfScore 0.25
 G_M14759_IG07:        ; bbWeight=1, gcrefRegs=0001 {x0}, byrefRegs=0000 {}, byref, isz
-            ptest   p0, p1.b
+            ptrue   p1.b
+            ptest   p1, p0.b
             bge     G_M14759_IG09
-						;; size=8 bbWeight=1 PerfScore 3.00
+						;; size=12 bbWeight=1 PerfScore 5.00
 G_M14759_IG08:        ; bbWeight=4, gcrefRegs=0001 {x0}, byrefRegs=0000 {}, byref, isz
             sxtw    x6, w1
             add     x7, x4, x6
-            ld1b    { z16.b }, p1/z, [x7]
+            ld1b    { z16.b }, p0/z, [x7]
             add     x6, x5, x6
-            ld1b    { z17.b }, p1/z, [x6]
+            ld1b    { z17.b }, p0/z, [x6]
+            ptrue   p0.b
+            cmpne   p0.b, p0/z, z16.b, z17.b
+            mov     z16.b, p0/z, #1
+            ptrue   p0.b
+            cmpne   p0.b, p0/z, z16.b, #0
             ptrue   p1.b
-            cmpne   p1.b, p1/z, z16.b, z17.b
-            mov     z16.b, p1/z, #1
-            ptrue   p1.b
-            cmpne   p1.b, p1/z, z16.b, #0
-            ptest   p0, p1.b
+            ptest   p1, p0.b
             bne     G_M14759_IG09
             add     w1, w1, w2
-            whilelt p1.b, w1, w3
-            ptest   p0, p1.b
+            whilelt p0.b, w1, w3
+            ptrue   p1.b
+            ptest   p1, p0.b
             blt     G_M14759_IG08
-						;; size=64 bbWeight=4 PerfScore 152.00
+						;; size=72 bbWeight=4 PerfScore 168.00
 G_M14759_IG09:        ; bbWeight=0.50, gcrefRegs=0001 {x0}, byrefRegs=0000 {}, byref, isz
             mov     w3, wzr
             mov     w4, wzr
@@ -198,7 +200,7 @@ G_M14759_IG19:        ; bbWeight=0, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
             brk     #0
 						;; size=8 bbWeight=0 PerfScore 0.00
 
-; Total bytes of code 380, prolog size 12, PerfScore 248.50, instruction count 95, allocated bytes for code 380 (MethodHash=5df7c658) for method SveBenchmarks.StrCmp:SveStrCmp():long:this (FullOpts)
+; Total bytes of code 388, prolog size 12, PerfScore 264.50, instruction count 97, allocated bytes for code 388 (MethodHash=5df7c658) for method SveBenchmarks.StrCmp:SveStrCmp():long:this (FullOpts)
 ; ============================================================
 
 Unwind Info:
@@ -209,7 +211,7 @@ Unwind Info:
   E bit             : 0
   X bit             : 0
   Vers              : 0
-  Function Length   : 95 (0x0005f) Actual length = 380 (0x00017c)
+  Function Length   : 97 (0x00061) Actual length = 388 (0x000184)
   ---- Epilog scopes ----
   ---- Scope 0
   Epilog Start Offset        : 3523193630 (0xd1ffab1e) Actual offset = 3523193630 (0xd1ffab1e) Offset from main function begin = 3523193630 (0xd1ffab1e)
+8 (+5.26%) : 21403.dasm - SveBenchmarks.StrLen:SveStrLen():ulong:this (FullOpts)
@@ -8,19 +8,20 @@
 ; Final local variable assignments
 ;
 ;  V00 this         [V00,T05] (  4,  4   )     ref  ->   x0         this class-hnd single-def <SveBenchmarks.StrLen>
-;  V01 loc0         [V01,T04] (  3,  7   )    mask  ->   p0         single-def <System.Numerics.Vector`1[byte]>
+;  V01 loc0         [V01,T11] (  2,  3   )  simd16  ->  d16         <System.Numerics.Vector`1[byte]>
 ;* V02 loc1         [V02    ] (  0,  0   )    mask  ->  zero-ref    <System.Numerics.Vector`1[byte]>
-;  V03 loc2         [V03,T10] (  5, 13   )  simd16  ->  d16         <System.Numerics.Vector`1[byte]>
+;  V03 loc2         [V03,T10] (  5, 13   )  simd16  ->  d17         <System.Numerics.Vector`1[byte]>
 ;  V04 loc3         [V04,T00] (  6, 18   )    long  ->   x1        
 ;  V05 loc4         [V05,T07] (  2,  5   )    long  ->   x2         single-def
-;  V06 loc5         [V06,T01] (  5, 12   )    mask  ->   p1         <System.Numerics.Vector`1[byte]>
-;  V07 loc6         [V07,T03] (  4,  7   )    long  ->   x0        
+;  V06 loc5         [V06,T01] (  5, 12   )    mask  ->   p0         <System.Numerics.Vector`1[byte]>
+;  V07 loc6         [V07,T04] (  4,  7   )    long  ->   x0        
 ;  V08 loc7         [V08    ] (  1,  1   )     ref  ->  [fp+0x18]   must-init pinned class-hnd single-def <byte[]>
 ;# V09 OutArgs      [V09    ] (  1,  1   )  struct ( 0) [sp+0x00]   do-not-enreg[XS] addr-exposed "OutgoingArgSpace" <Empty>
 ;  V10 tmp1         [V10,T02] (  5,  8   )     ref  ->   x0         class-hnd single-def "dup spill" <byte[]>
 ;  V11 tmp2         [V11,T08] (  2,  2   )    long  ->   x0         "Cast away GC"
 ;  V12 cse0         [V12,T06] (  3,  6   )     int  ->   x3         "CSE #02: aggressive"
-;  V13 cse1         [V13,T09] (  2,  1   )     int  ->   x4         "CSE #01: moderate"
+;  V13 cse1         [V13,T03] (  3,  8   )    mask  ->   p1         "CSE #03: aggressive"
+;  V14 cse2         [V14,T09] (  2,  1   )     int  ->   x4         "CSE #01: moderate"
 ;
 ; Lcl frame size = 16
 
@@ -31,16 +32,16 @@ G_M60402_IG01:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref,
 						;; size=12 bbWeight=1 PerfScore 2.50
 G_M60402_IG02:        ; bbWeight=1, gcrefRegs=0001 {x0}, byrefRegs=0000 {}, byref, isz
             ; gcrRegs +[x0]
-            ptrue   p0.b
+            mvni    v16.4s, #0
             mov     x1, xzr
             cntb    x2, all
             ldr     w3, [x0, #0x18]
             mov     w4, wzr
-            whilelt p1.b, w4, w3
+            whilelt p0.b, w4, w3
             ldr     x0, [x0, #0x08]
             str     x0, [fp, #0x18]	// [V08 loc7]
             cbz     x0, G_M60402_IG04
-						;; size=36 bbWeight=1 PerfScore 15.00
+						;; size=36 bbWeight=1 PerfScore 13.50
 G_M60402_IG03:        ; bbWeight=0.50, gcrefRegs=0001 {x0}, byrefRegs=0000 {}, byref, isz
             ldr     w4, [x0, #0x08]
             cbz     w4, G_M60402_IG04
@@ -54,28 +55,30 @@ G_M60402_IG04:        ; bbWeight=0.50, gcrefRegs=0000 {}, byrefRegs=0000 {}, byr
             mov     x0, xzr
 						;; size=4 bbWeight=0.50 PerfScore 0.25
 G_M60402_IG05:        ; bbWeight=2, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, isz
-            ld1b    { z16.b }, p1/z, [x0]
-            movi    v17.4s, #0
+            ld1b    { z17.b }, p0/z, [x0]
+            ptrue   p1.b
+            cmpne   p1.b, p1/z, z16.b, #0
+            movi    v16.4s, #0
             ptrue   p2.b
-            cmpeq   p2.b, p2/z, z16.b, z17.b
-            ptest   p0, p2.b
+            cmpeq   p2.b, p2/z, z17.b, z16.b
+            ptest   p1, p2.b
             bne     G_M60402_IG07
-						;; size=24 bbWeight=2 PerfScore 33.00
+						;; size=32 bbWeight=2 PerfScore 43.00
 G_M60402_IG06:        ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, isz
             add     x1, x1, x2
-            whilelt p1.b, w1, w3
+            whilelt p0.b, w1, w3
             add     x4, x0, x1
-            ld1b    { z16.b }, p1/z, [x4]
-            movi    v17.4s, #0
+            ld1b    { z17.b }, p0/z, [x4]
+            movi    v16.4s, #0
             ptrue   p2.b
-            cmpeq   p2.b, p2/z, z16.b, z17.b
-            ptest   p0, p2.b
+            cmpeq   p2.b, p2/z, z17.b, z16.b
+            ptest   p1, p2.b
             beq     G_M60402_IG06
 						;; size=36 bbWeight=4 PerfScore 78.00
 G_M60402_IG07:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
-            ptrue   p0.b
-            cmpne   p0.b, p0/z, z16.b, #0
-            cntp    x0, p1, p0.b
+            ptrue   p1.b
+            cmpne   p1.b, p1/z, z17.b, #0
+            cntp    x0, p0, p1.b
             add     x0, x0, x1
 						;; size=16 bbWeight=1 PerfScore 7.50
 G_M60402_IG08:        ; bbWeight=1, epilog, nogc, extend
@@ -83,7 +86,7 @@ G_M60402_IG08:        ; bbWeight=1, epilog, nogc, extend
             ret     lr
 						;; size=8 bbWeight=1 PerfScore 2.00
 
-; Total bytes of code 152, prolog size 12, PerfScore 141.00, instruction count 38, allocated bytes for code 152 (MethodHash=b293140d) for method SveBenchmarks.StrLen:SveStrLen():ulong:this (FullOpts)
+; Total bytes of code 160, prolog size 12, PerfScore 149.50, instruction count 40, allocated bytes for code 160 (MethodHash=b293140d) for method SveBenchmarks.StrLen:SveStrLen():ulong:this (FullOpts)
 ; ============================================================
 
 Unwind Info:
@@ -94,7 +97,7 @@ Unwind Info:
   E bit             : 0
   X bit             : 0
   Vers              : 0
-  Function Length   : 38 (0x00026) Actual length = 152 (0x000098)
+  Function Length   : 40 (0x00028) Actual length = 160 (0x0000a0)
   ---- Epilog scopes ----
   ---- Scope 0
   Epilog Start Offset        : 3523193630 (0xd1ffab1e) Actual offset = 3523193630 (0xd1ffab1e) Offset from main function begin = 3523193630 (0xd1ffab1e)
coreclr_tests.run.linux.arm64.checked.mch
-28 (-58.33%) : 358603.dasm - PredicateInstructions:BitwiseClear():System.Numerics.Vector`1[short] (FullOpts)
@@ -17,22 +17,15 @@ G_M44742_IG01:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref,
             mov     fp, sp
 						;; size=8 bbWeight=1 PerfScore 1.50
 G_M44742_IG02:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
-            pfalse  p0.b
-            ptrue   p1.h
-            ptrue   p2.h
-            ptrue   p3.h
-            bic     p1.b, p3/z, p1.b, p2.b
-            pfalse  p2.b
-            sel     p0.b, p0, p1.b, p2.b
-            mov     z0.h, p0/z, #1
-						;; size=32 bbWeight=1 PerfScore 16.00
+            movi    v0.4s, #0
+						;; size=4 bbWeight=1 PerfScore 0.50
 G_M44742_IG03:        ; bbWeight=1, epilog, nogc, extend
             ldp     fp, lr, [sp], #0x10
             ret     lr
 						;; size=8 bbWeight=1 PerfScore 2.00
 ; END METHOD PredicateInstructions:BitwiseClear():System.Numerics.Vector`1[short]
 
-; Total bytes of code 48, prolog size 8, PerfScore 19.50, instruction count 12, allocated bytes for code 48 (MethodHash=71345139) for method PredicateInstructions:BitwiseClear():System.Numerics.Vector`1[short] (FullOpts)
+; Total bytes of code 20, prolog size 8, PerfScore 4.00, instruction count 5, allocated bytes for code 20 (MethodHash=71345139) for method PredicateInstructions:BitwiseClear():System.Numerics.Vector`1[short] (FullOpts)
 ; ============================================================
 
 Unwind Info:
@@ -43,7 +36,7 @@ Unwind Info:
   E bit             : 0
   X bit             : 0
   Vers              : 0
-  Function Length   : 12 (0x0000c) Actual length = 48 (0x000030)
+  Function Length   : 5 (0x00005) Actual length = 20 (0x000014)
   ---- Epilog scopes ----
   ---- Scope 0
   Epilog Start Offset        : 3523193630 (0xd1ffab1e) Actual offset = 3523193630 (0xd1ffab1e) Offset from main function begin = 3523193630 (0xd1ffab1e)
-24 (-54.55%) : 358606.dasm - PredicateInstructions:ConditionalSelect():System.Numerics.Vector`1[int] (FullOpts)
@@ -17,21 +17,15 @@ G_M19455_IG01:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref,
             mov     fp, sp
 						;; size=8 bbWeight=1 PerfScore 1.50
 G_M19455_IG02:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
-            ptrue   p0.s
-            movi    v0.4s, #0
-            cmpne   p0.s, p0/z, z0.s, #0
-            pfalse  p1.b
-            ptrue   p2.s
-            sel     p0.b, p0, p1.b, p2.b
-            mov     z0.s, p0/z, #1
-						;; size=28 bbWeight=1 PerfScore 13.50
+            mvni    v0.4s, #0
+						;; size=4 bbWeight=1 PerfScore 0.50
 G_M19455_IG03:        ; bbWeight=1, epilog, nogc, extend
             ldp     fp, lr, [sp], #0x10
             ret     lr
 						;; size=8 bbWeight=1 PerfScore 2.00
 ; END METHOD PredicateInstructions:ConditionalSelect():System.Numerics.Vector`1[int]
 
-; Total bytes of code 44, prolog size 8, PerfScore 17.00, instruction count 11, allocated bytes for code 44 (MethodHash=0304b400) for method PredicateInstructions:ConditionalSelect():System.Numerics.Vector`1[int] (FullOpts)
+; Total bytes of code 20, prolog size 8, PerfScore 4.00, instruction count 5, allocated bytes for code 20 (MethodHash=0304b400) for method PredicateInstructions:ConditionalSelect():System.Numerics.Vector`1[int] (FullOpts)
 ; ============================================================
 
 Unwind Info:
@@ -42,7 +36,7 @@ Unwind Info:
   E bit             : 0
   X bit             : 0
   Vers              : 0
-  Function Length   : 11 (0x0000b) Actual length = 44 (0x00002c)
+  Function Length   : 5 (0x00005) Actual length = 20 (0x000014)
   ---- Epilog scopes ----
   ---- Scope 0
   Epilog Start Offset        : 3523193630 (0xd1ffab1e) Actual offset = 3523193630 (0xd1ffab1e) Offset from main function begin = 3523193630 (0xd1ffab1e)
-28 (-43.75%) : 679474.dasm - Runtime_1068867:TestEntryPoint() (FullOpts)
@@ -16,7 +16,6 @@
 ;* V05 tmp1         [V05    ] (  0,  0   )    long  ->  zero-ref    class-hnd exact "NewObj constructor temp" <C0>
 ;* V06 tmp2         [V06    ] (  0,  0   )  simd16  ->  zero-ref    "location for address-of(RValue)"
 ;* V07 tmp3         [V07    ] (  0,  0   )  struct (16) zero-ref    do-not-enreg[SF] "stack allocated C0" <C0>
-;  V08 cse0         [V08,T00] (  3,  3   )    mask  ->   p0         "CSE #01: aggressive"
 ;
 ; Lcl frame size = 0
 
@@ -24,28 +23,19 @@ G_M538_IG01:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, n
             stp     fp, lr, [sp, #-0x10]!
             mov     fp, sp
 						;; size=8 bbWeight=1 PerfScore 1.50
-G_M538_IG02:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, isz
-            ptrue   p0.s
+G_M538_IG02:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
             movi    v0.4s, #0
-            cmpne   p0.s, p0/z, z0.s, #0
-            movi    v0.4s, #0
-            ldr     q16, [@RWD00]
-            sel     z0.s, p0, z0.s, z16.s
-            movi    v16.4s, #0
-            sel     z0.s, p0, z0.s, z16.s
             movz    x0, #0xD1FFAB1E      // code for <unknown method>
             movk    x0, #0xD1FFAB1E LSL #16
             movk    x0, #0xD1FFAB1E LSL #32
             ldr     x0, [x0]
-						;; size=48 bbWeight=1 PerfScore 17.00
+						;; size=20 bbWeight=1 PerfScore 5.00
 G_M538_IG03:        ; bbWeight=1, epilog, nogc, extend
             ldp     fp, lr, [sp], #0x10
             br      x0
 						;; size=8 bbWeight=1 PerfScore 2.00
-RWD00  	dq	0000000000000001h, 0000000000000000h
 
-
-; Total bytes of code 64, prolog size 8, PerfScore 20.50, instruction count 16, allocated bytes for code 64 (MethodHash=1c40fde5) for method Runtime_1068867:TestEntryPoint() (FullOpts)
+; Total bytes of code 36, prolog size 8, PerfScore 8.50, instruction count 9, allocated bytes for code 36 (MethodHash=1c40fde5) for method Runtime_1068867:TestEntryPoint() (FullOpts)
 ; ============================================================
 
 Unwind Info:
@@ -56,7 +46,7 @@ Unwind Info:
   E bit             : 0
   X bit             : 0
   Vers              : 0
-  Function Length   : 16 (0x00010) Actual length = 64 (0x000040)
+  Function Length   : 9 (0x00009) Actual length = 36 (0x000024)
   ---- Epilog scopes ----
   ---- Scope 0
   Epilog Start Offset        : 3523193630 (0xd1ffab1e) Actual offset = 3523193630 (0xd1ffab1e) Offset from main function begin = 3523193630 (0xd1ffab1e)
+16 (+2.53%) : 575424.dasm - JIT.HardwareIntrinsics.Arm._Sve.SimpleTernaryOpTest__Sve_DotProductBySelectedScalar_ulong:RunBasicScenario_Load():this (FullOpts)
@@ -9,12 +9,12 @@
 ; Final local variable assignments
 ;
 ;  V00 this         [V00,T02] (  4,  4   )     ref  ->  x19         this class-hnd single-def <JIT.HardwareIntrinsics.Arm._Sve.SimpleTernaryOpTest__Sve_DotProductBySelectedScalar_ulong>
-;* V01 loc0         [V01,T30] (  0,  0   )    mask  ->  zero-ref    single-def <System.Numerics.Vector`1[ushort]>
-;  V02 loc1         [V02,T29] (  3,  3   )    mask  ->  [fp+0x10]   spill-single-def <System.Numerics.Vector`1[ushort]>
-;  V03 loc2         [V03,T33] (  2,  2   )  simd16  ->   d8         <System.Numerics.Vector`1[ulong]>
+;* V01 loc0         [V01,T34] (  0,  0   )  simd16  ->  zero-ref    <System.Numerics.Vector`1[ushort]>
+;  V02 loc1         [V02,T32] (  2,  2   )  simd16  ->   d8         <System.Numerics.Vector`1[ushort]>
+;  V03 loc2         [V03,T33] (  2,  2   )  simd16  ->  d10         <System.Numerics.Vector`1[ulong]>
 ;# V04 OutArgs      [V04    ] (  1,  1   )  struct ( 0) [sp+0x00]   do-not-enreg[XS] addr-exposed "OutgoingArgSpace" <Empty>
-;  V05 tmp1         [V05,T31] (  2,  4   )  simd16  ->   d8         "impAppendStmt"
-;  V06 tmp2         [V06,T32] (  2,  4   )  simd16  ->  d10         "impAppendStmt"
+;  V05 tmp1         [V05,T30] (  2,  4   )  simd16  ->  d10         "impAppendStmt"
+;  V06 tmp2         [V06,T31] (  2,  4   )  simd16  ->   d8         "impAppendStmt"
 ;  V07 tmp3         [V07,T18] (  2,  4   )    long  ->  x21         "impAppendStmt"
 ;  V08 tmp4         [V08,T19] (  2,  4   )    long  ->  x22         "impAppendStmt"
 ;  V09 tmp5         [V09,T20] (  2,  4   )    long  ->  x23         "impAppendStmt"
@@ -51,21 +51,23 @@
 ;* V40 tmp36        [V40    ] (  0,  0   )    long  ->  zero-ref    ld-addr-op "Inline stloc first use temp"
 ;  V41 tmp37        [V41,T28] (  2,  4   )    long  ->   x0         "Inlining Arg"
 ;  V42 tmp38        [V42,T17] (  3,  6   )    long  ->   x4         "Inlining Arg"
-;  V43 cse0         [V43,T00] (  9,  9   )   byref  ->  x20         "CSE #02: aggressive"
+;  V43 cse0         [V43,T29] (  3,  3   )    mask  ->  [fp+0x18]   spill-single-def "CSE #02: moderate"
+;  V44 cse1         [V44,T00] (  9,  9   )   byref  ->  x20         "CSE #01: aggressive"
 ;
-; Lcl frame size = 8
+; Lcl frame size = 16
 
 G_M33034_IG01:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, nogc <-- Prolog IG
-            stp     fp, lr, [sp, #-0x60]!
-            stp     d8, d9, [sp, #0x18]
-            stp     d10, d11, [sp, #0x28]
-            stp     x19, x20, [sp, #0x38]
-            stp     x21, x22, [sp, #0x48]
-            str     x23, [sp, #0x58]
+            stp     fp, lr, [sp, #-0x70]!
+            stp     d8, d9, [sp, #0x20]
+            stp     d10, d11, [sp, #0x30]
+            str     d12, [sp, #0x40]
+            stp     x19, x20, [sp, #0x48]
+            stp     x21, x22, [sp, #0x58]
+            str     x23, [sp, #0x68]
             mov     fp, sp
             mov     x19, x0
             ; gcrRegs +[x19]
-						;; size=32 bbWeight=1 PerfScore 7.00
+						;; size=36 bbWeight=1 PerfScore 8.00
 G_M33034_IG02:        ; bbWeight=1, gcrefRegs=80000 {x19}, byrefRegs=0000 {}, byref
             movz    x0, #0xD1FFAB1E
             movk    x0, #0xD1FFAB1E LSL #16
@@ -85,9 +87,7 @@ G_M33034_IG02:        ; bbWeight=1, gcrefRegs=80000 {x19}, byrefRegs=0000 {}, by
             ldr     x1, [x1]
             blr     x1
             ; gcrRegs -[x0]
-            ptrue   p0.h
-            add     xip1, fp, #16
-            str     p0, [xip1]
+            mvni    v8.4s, #0
             add     x20, x19, #96
             ; byrRegs +[x20]
             mov     x21, x20
@@ -99,22 +99,6 @@ G_M33034_IG02:        ; bbWeight=1, gcrefRegs=80000 {x19}, byrefRegs=0000 {}, by
             movk    x1, #0xD1FFAB1E LSL #16
             movk    x1, #0xD1FFAB1E LSL #32
             ldr     x1, [x1]
-            blr     x1
-            ; byrRegs -[x0]
-            ldr     x1, [x21, #0x20]
-            add     x0, x0, x1
-            sub     x0, x0, #1
-            sub     x1, x1, #1
-            bic     x0, x0, x1
-            ptrue   p0.d
-            ld1d    { z8.d }, p0/z, [x0]
-            mov     x21, x20
-            add     x0, x21, #48
-            ; byrRegs +[x0]
-            movz    x1, #0xD1FFAB1E      // code for <unknown method>
-            movk    x1, #0xD1FFAB1E LSL #16
-            movk    x1, #0xD1FFAB1E LSL #32
-            ldr     x1, [x1]
             mov     v9.d[0], v8.d[1]
             blr     x1
             ; byrRegs -[x0]
@@ -123,11 +107,10 @@ G_M33034_IG02:        ; bbWeight=1, gcrefRegs=80000 {x19}, byrefRegs=0000 {}, by
             sub     x0, x0, #1
             sub     x1, x1, #1
             bic     x0, x0, x1
-            add     xip1, fp, #16
-            ldr     p0, [xip1]
-            ld1h    { z10.h }, p0/z, [x0]
+            ptrue   p0.d
+            ld1d    { z10.d }, p0/z, [x0]
             mov     x21, x20
-            add     x0, x21, #56
+            add     x0, x21, #48
             ; byrRegs +[x0]
             movz    x1, #0xD1FFAB1E      // code for <unknown method>
             movk    x1, #0xD1FFAB1E LSL #16
@@ -141,20 +124,20 @@ G_M33034_IG02:        ; bbWeight=1, gcrefRegs=80000 {x19}, byrefRegs=0000 {}, by
             sub     x0, x0, #1
             sub     x1, x1, #1
             bic     x0, x0, x1
-            add     xip1, fp, #16
-            ldr     p0, [xip1]
-            ld1h    { z7.h }, p0/z, [x0]
+            ptrue   p0.h
             mov     v8.d[1], v9.d[0]
-            mov     v10.d[1], v11.d[0]
-            udot    z8.d, z10.h, z7.h[1]
+            cmpne   p0.h, p0/z, z8.h, #0
+            add     xip1, fp, #24
+            str     p0, [xip1]
+            ld1h    { z8.h }, p0/z, [x0]
             mov     x21, x20
-            add     x0, x21, #64
+            add     x0, x21, #56
             ; byrRegs +[x0]
             movz    x1, #0xD1FFAB1E      // code for <unknown method>
             movk    x1, #0xD1FFAB1E LSL #16
             movk    x1, #0xD1FFAB1E LSL #32
             ldr     x1, [x1]
-            mov     v9.d[0], v8.d[1]
+            mov     v12.d[0], v8.d[1]
             blr     x1
             ; byrRegs -[x0]
             ldr     x1, [x21, #0x20]
@@ -162,8 +145,29 @@ G_M33034_IG02:        ; bbWeight=1, gcrefRegs=80000 {x19}, byrefRegs=0000 {}, by
             sub     x0, x0, #1
             sub     x1, x1, #1
             bic     x0, x0, x1
-            mov     v8.d[1], v9.d[0]
-            str     q8, [x0]
+            add     xip1, fp, #24
+            ldr     p0, [xip1]
+            ld1h    { z7.h }, p0/z, [x0]
+            mov     v10.d[1], v11.d[0]
+            mov     v8.d[1], v12.d[0]
+            udot    z10.d, z8.h, z7.h[1]
+            mov     x21, x20
+            add     x0, x21, #64
+            ; byrRegs +[x0]
+            movz    x1, #0xD1FFAB1E      // code for <unknown method>
+            movk    x1, #0xD1FFAB1E LSL #16
+            movk    x1, #0xD1FFAB1E LSL #32
+            ldr     x1, [x1]
+            mov     v8.d[0], v10.d[1]
+            blr     x1
+            ; byrRegs -[x0]
+            ldr     x1, [x21, #0x20]
+            add     x0, x0, x1
+            sub     x0, x0, #1
+            sub     x1, x1, #1
+            bic     x0, x0, x1
+            mov     v10.d[1], v8.d[0]
+            str     q10, [x0]
             mov     x21, x20
             add     x0, x21, #40
             ; byrRegs +[x0]
@@ -236,29 +240,30 @@ G_M33034_IG02:        ; bbWeight=1, gcrefRegs=80000 {x19}, byrefRegs=0000 {}, by
             movk    x6, #0xD1FFAB1E LSL #16
             movk    x6, #0xD1FFAB1E LSL #32
             ldr     x6, [x6]
-						;; size=572 bbWeight=1 PerfScore 168.50
+						;; size=580 bbWeight=1 PerfScore 168.50
 G_M33034_IG03:        ; bbWeight=1, epilog, nogc, extend
-            ldr     x23, [sp, #0x58]
-            ldp     x21, x22, [sp, #0x48]
-            ldp     x19, x20, [sp, #0x38]
-            ldp     d10, d11, [sp, #0x28]
-            ldp     d8, d9, [sp, #0x18]
-            ldp     fp, lr, [sp], #0x60
+            ldr     x23, [sp, #0x68]
+            ldp     x21, x22, [sp, #0x58]
+            ldp     x19, x20, [sp, #0x48]
+            ldr     d12, [sp, #0x40]
+            ldp     d10, d11, [sp, #0x30]
+            ldp     d8, d9, [sp, #0x20]
+            ldp     fp, lr, [sp], #0x70
             br      x6
-						;; size=28 bbWeight=1 PerfScore 8.00
+						;; size=32 bbWeight=1 PerfScore 10.00
 
-; Total bytes of code 632, prolog size 28, PerfScore 183.50, instruction count 158, allocated bytes for code 632 (MethodHash=1d6a7ef5) for method JIT.HardwareIntrinsics.Arm._Sve.SimpleTernaryOpTest__Sve_DotProductBySelectedScalar_ulong:RunBasicScenario_Load():this (FullOpts)
+; Total bytes of code 648, prolog size 32, PerfScore 186.50, instruction count 162, allocated bytes for code 648 (MethodHash=1d6a7ef5) for method JIT.HardwareIntrinsics.Arm._Sve.SimpleTernaryOpTest__Sve_DotProductBySelectedScalar_ulong:RunBasicScenario_Load():this (FullOpts)
 ; ============================================================
 
 Unwind Info:
   >> Start offset   : 0x000000 (not in unwind data)
   >>   End offset   : 0xd1ffab1e (not in unwind data)
-  Code Words        : 3
+  Code Words        : 4
   Epilog Count      : 1
   E bit             : 0
   X bit             : 0
   Vers              : 0
-  Function Length   : 158 (0x0009e) Actual length = 632 (0x000278)
+  Function Length   : 162 (0x000a2) Actual length = 648 (0x000288)
   ---- Epilog scopes ----
   ---- Scope 0
   Epilog Start Offset        : 3523193630 (0xd1ffab1e) Actual offset = 3523193630 (0xd1ffab1e) Offset from main function begin = 3523193630 (0xd1ffab1e)
@@ -266,12 +271,15 @@ Unwind Info:
   ---- Unwind codes ----
     E1          set_fp; mov fp, sp
     ---- Epilog start at index 1 ----
-    D1 0B       save_reg X#4 Z#11 (0x0B); str x23, [sp, #88]
+    D1 0D       save_reg X#4 Z#13 (0x0D); str x23, [sp, #104]
     E6          save_next
-    C8 07       save_regp X#0 Z#7 (0x07); stp x19, x20, [sp, #56]
+    C8 09       save_regp X#0 Z#9 (0x09); stp x19, x20, [sp, #72]
+    DD 08       save_freg X#4 Z#8 (0x08); str d12, [sp, #64]
     E6          save_next
-    D8 03       save_fregp X#0 Z#3 (0x03); stp d8, d9, [sp, #24]
-    8B          save_fplr_x #11 (0x0B); stp fp, lr, [sp, #-96]!
+    D8 04       save_fregp X#0 Z#4 (0x04); stp d8, d9, [sp, #32]
+    8D          save_fplr_x #13 (0x0D); stp fp, lr, [sp, #-112]!
+    E4          end
+    E4          end
     E4          end
     E4          end
 
+16 (+2.53%) : 575272.dasm - JIT.HardwareIntrinsics.Arm._Sve.SimpleTernaryOpTest__Sve_DotProduct_int:RunBasicScenario_Load():this (FullOpts)
@@ -9,12 +9,12 @@
 ; Final local variable assignments
 ;
 ;  V00 this         [V00,T02] (  4,  4   )     ref  ->  x19         this class-hnd single-def <JIT.HardwareIntrinsics.Arm._Sve.SimpleTernaryOpTest__Sve_DotProduct_int>
-;* V01 loc0         [V01,T30] (  0,  0   )    mask  ->  zero-ref    single-def <System.Numerics.Vector`1[sbyte]>
-;  V02 loc1         [V02,T29] (  3,  3   )    mask  ->  [fp+0x10]   spill-single-def <System.Numerics.Vector`1[sbyte]>
-;  V03 loc2         [V03,T33] (  2,  2   )  simd16  ->   d8         <System.Numerics.Vector`1[int]>
+;* V01 loc0         [V01,T34] (  0,  0   )  simd16  ->  zero-ref    <System.Numerics.Vector`1[sbyte]>
+;  V02 loc1         [V02,T32] (  2,  2   )  simd16  ->   d8         <System.Numerics.Vector`1[sbyte]>
+;  V03 loc2         [V03,T33] (  2,  2   )  simd16  ->  d10         <System.Numerics.Vector`1[int]>
 ;# V04 OutArgs      [V04    ] (  1,  1   )  struct ( 0) [sp+0x00]   do-not-enreg[XS] addr-exposed "OutgoingArgSpace" <Empty>
-;  V05 tmp1         [V05,T31] (  2,  4   )  simd16  ->   d8         "impAppendStmt"
-;  V06 tmp2         [V06,T32] (  2,  4   )  simd16  ->  d10         "impAppendStmt"
+;  V05 tmp1         [V05,T30] (  2,  4   )  simd16  ->  d10         "impAppendStmt"
+;  V06 tmp2         [V06,T31] (  2,  4   )  simd16  ->   d8         "impAppendStmt"
 ;  V07 tmp3         [V07,T18] (  2,  4   )    long  ->  x21         "impAppendStmt"
 ;  V08 tmp4         [V08,T19] (  2,  4   )    long  ->  x22         "impAppendStmt"
 ;  V09 tmp5         [V09,T20] (  2,  4   )    long  ->  x23         "impAppendStmt"
@@ -51,21 +51,23 @@
 ;* V40 tmp36        [V40    ] (  0,  0   )    long  ->  zero-ref    ld-addr-op "Inline stloc first use temp"
 ;  V41 tmp37        [V41,T28] (  2,  4   )    long  ->   x0         "Inlining Arg"
 ;  V42 tmp38        [V42,T17] (  3,  6   )    long  ->   x4         "Inlining Arg"
-;  V43 cse0         [V43,T00] (  9,  9   )   byref  ->  x20         "CSE #02: aggressive"
+;  V43 cse0         [V43,T29] (  3,  3   )    mask  ->  [fp+0x18]   spill-single-def "CSE #02: moderate"
+;  V44 cse1         [V44,T00] (  9,  9   )   byref  ->  x20         "CSE #01: aggressive"
 ;
-; Lcl frame size = 8
+; Lcl frame size = 16
 
 G_M55930_IG01:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, nogc <-- Prolog IG
-            stp     fp, lr, [sp, #-0x60]!
-            stp     d8, d9, [sp, #0x18]
-            stp     d10, d11, [sp, #0x28]
-            stp     x19, x20, [sp, #0x38]
-            stp     x21, x22, [sp, #0x48]
-            str     x23, [sp, #0x58]
+            stp     fp, lr, [sp, #-0x70]!
+            stp     d8, d9, [sp, #0x20]
+            stp     d10, d11, [sp, #0x30]
+            str     d12, [sp, #0x40]
+            stp     x19, x20, [sp, #0x48]
+            stp     x21, x22, [sp, #0x58]
+            str     x23, [sp, #0x68]
             mov     fp, sp
             mov     x19, x0
             ; gcrRegs +[x19]
-						;; size=32 bbWeight=1 PerfScore 7.00
+						;; size=36 bbWeight=1 PerfScore 8.00
 G_M55930_IG02:        ; bbWeight=1, gcrefRegs=80000 {x19}, byrefRegs=0000 {}, byref
             movz    x0, #0xD1FFAB1E
             movk    x0, #0xD1FFAB1E LSL #16
@@ -85,9 +87,7 @@ G_M55930_IG02:        ; bbWeight=1, gcrefRegs=80000 {x19}, byrefRegs=0000 {}, by
             ldr     x1, [x1]
             blr     x1
             ; gcrRegs -[x0]
-            ptrue   p0.b
-            add     xip1, fp, #16
-            str     p0, [xip1]
+            mvni    v8.4s, #0
             add     x20, x19, #96
             ; byrRegs +[x20]
             mov     x21, x20
@@ -99,22 +99,6 @@ G_M55930_IG02:        ; bbWeight=1, gcrefRegs=80000 {x19}, byrefRegs=0000 {}, by
             movk    x1, #0xD1FFAB1E LSL #16
             movk    x1, #0xD1FFAB1E LSL #32
             ldr     x1, [x1]
-            blr     x1
-            ; byrRegs -[x0]
-            ldr     x1, [x21, #0x20]
-            add     x0, x0, x1
-            sub     x0, x0, #1
-            sub     x1, x1, #1
-            bic     x0, x0, x1
-            ptrue   p0.s
-            ld1w    { z8.s }, p0/z, [x0]
-            mov     x21, x20
-            add     x0, x21, #48
-            ; byrRegs +[x0]
-            movz    x1, #0xD1FFAB1E      // code for <unknown method>
-            movk    x1, #0xD1FFAB1E LSL #16
-            movk    x1, #0xD1FFAB1E LSL #32
-            ldr     x1, [x1]
             mov     v9.d[0], v8.d[1]
             blr     x1
             ; byrRegs -[x0]
@@ -123,11 +107,10 @@ G_M55930_IG02:        ; bbWeight=1, gcrefRegs=80000 {x19}, byrefRegs=0000 {}, by
             sub     x0, x0, #1
             sub     x1, x1, #1
             bic     x0, x0, x1
-            add     xip1, fp, #16
-            ldr     p0, [xip1]
-            ld1b    { z10.b }, p0/z, [x0]
+            ptrue   p0.s
+            ld1w    { z10.s }, p0/z, [x0]
             mov     x21, x20
-            add     x0, x21, #56
+            add     x0, x21, #48
             ; byrRegs +[x0]
             movz    x1, #0xD1FFAB1E      // code for <unknown method>
             movk    x1, #0xD1FFAB1E LSL #16
@@ -141,20 +124,20 @@ G_M55930_IG02:        ; bbWeight=1, gcrefRegs=80000 {x19}, byrefRegs=0000 {}, by
             sub     x0, x0, #1
             sub     x1, x1, #1
             bic     x0, x0, x1
-            add     xip1, fp, #16
-            ldr     p0, [xip1]
-            ld1b    { z16.b }, p0/z, [x0]
+            ptrue   p0.b
             mov     v8.d[1], v9.d[0]
-            mov     v10.d[1], v11.d[0]
-            sdot    z8.s, z10.b, z16.b
+            cmpne   p0.b, p0/z, z8.b, #0
+            add     xip1, fp, #24
+            str     p0, [xip1]
+            ld1b    { z8.b }, p0/z, [x0]
             mov     x21, x20
-            add     x0, x21, #64
+            add     x0, x21, #56
             ; byrRegs +[x0]
             movz    x1, #0xD1FFAB1E      // code for <unknown method>
             movk    x1, #0xD1FFAB1E LSL #16
             movk    x1, #0xD1FFAB1E LSL #32
             ldr     x1, [x1]
-            mov     v9.d[0], v8.d[1]
+            mov     v12.d[0], v8.d[1]
             blr     x1
             ; byrRegs -[x0]
             ldr     x1, [x21, #0x20]
@@ -162,8 +145,29 @@ G_M55930_IG02:        ; bbWeight=1, gcrefRegs=80000 {x19}, byrefRegs=0000 {}, by
             sub     x0, x0, #1
             sub     x1, x1, #1
             bic     x0, x0, x1
-            mov     v8.d[1], v9.d[0]
-            str     q8, [x0]
+            add     xip1, fp, #24
+            ldr     p0, [xip1]
+            ld1b    { z16.b }, p0/z, [x0]
+            mov     v10.d[1], v11.d[0]
+            mov     v8.d[1], v12.d[0]
+            sdot    z10.s, z8.b, z16.b
+            mov     x21, x20
+            add     x0, x21, #64
+            ; byrRegs +[x0]
+            movz    x1, #0xD1FFAB1E      // code for <unknown method>
+            movk    x1, #0xD1FFAB1E LSL #16
+            movk    x1, #0xD1FFAB1E LSL #32
+            ldr     x1, [x1]
+            mov     v8.d[0], v10.d[1]
+            blr     x1
+            ; byrRegs -[x0]
+            ldr     x1, [x21, #0x20]
+            add     x0, x0, x1
+            sub     x0, x0, #1
+            sub     x1, x1, #1
+            bic     x0, x0, x1
+            mov     v10.d[1], v8.d[0]
+            str     q10, [x0]
             mov     x21, x20
             add     x0, x21, #40
             ; byrRegs +[x0]
@@ -236,29 +240,30 @@ G_M55930_IG02:        ; bbWeight=1, gcrefRegs=80000 {x19}, byrefRegs=0000 {}, by
             movk    x6, #0xD1FFAB1E LSL #16
             movk    x6, #0xD1FFAB1E LSL #32
             ldr     x6, [x6]
-						;; size=572 bbWeight=1 PerfScore 168.50
+						;; size=580 bbWeight=1 PerfScore 168.50
 G_M55930_IG03:        ; bbWeight=1, epilog, nogc, extend
-            ldr     x23, [sp, #0x58]
-            ldp     x21, x22, [sp, #0x48]
-            ldp     x19, x20, [sp, #0x38]
-            ldp     d10, d11, [sp, #0x28]
-            ldp     d8, d9, [sp, #0x18]
-            ldp     fp, lr, [sp], #0x60
+            ldr     x23, [sp, #0x68]
+            ldp     x21, x22, [sp, #0x58]
+            ldp     x19, x20, [sp, #0x48]
+            ldr     d12, [sp, #0x40]
+            ldp     d10, d11, [sp, #0x30]
+            ldp     d8, d9, [sp, #0x20]
+            ldp     fp, lr, [sp], #0x70
             br      x6
-						;; size=28 bbWeight=1 PerfScore 8.00
+						;; size=32 bbWeight=1 PerfScore 10.00
 
-; Total bytes of code 632, prolog size 28, PerfScore 183.50, instruction count 158, allocated bytes for code 632 (MethodHash=b01a2585) for method JIT.HardwareIntrinsics.Arm._Sve.SimpleTernaryOpTest__Sve_DotProduct_int:RunBasicScenario_Load():this (FullOpts)
+; Total bytes of code 648, prolog size 32, PerfScore 186.50, instruction count 162, allocated bytes for code 648 (MethodHash=b01a2585) for method JIT.HardwareIntrinsics.Arm._Sve.SimpleTernaryOpTest__Sve_DotProduct_int:RunBasicScenario_Load():this (FullOpts)
 ; ============================================================
 
 Unwind Info:
   >> Start offset   : 0x000000 (not in unwind data)
   >>   End offset   : 0xd1ffab1e (not in unwind data)
-  Code Words        : 3
+  Code Words        : 4
   Epilog Count      : 1
   E bit             : 0
   X bit             : 0
   Vers              : 0
-  Function Length   : 158 (0x0009e) Actual length = 632 (0x000278)
+  Function Length   : 162 (0x000a2) Actual length = 648 (0x000288)
   ---- Epilog scopes ----
   ---- Scope 0
   Epilog Start Offset        : 3523193630 (0xd1ffab1e) Actual offset = 3523193630 (0xd1ffab1e) Offset from main function begin = 3523193630 (0xd1ffab1e)
@@ -266,12 +271,15 @@ Unwind Info:
   ---- Unwind codes ----
     E1          set_fp; mov fp, sp
     ---- Epilog start at index 1 ----
-    D1 0B       save_reg X#4 Z#11 (0x0B); str x23, [sp, #88]
+    D1 0D       save_reg X#4 Z#13 (0x0D); str x23, [sp, #104]
     E6          save_next
-    C8 07       save_regp X#0 Z#7 (0x07); stp x19, x20, [sp, #56]
+    C8 09       save_regp X#0 Z#9 (0x09); stp x19, x20, [sp, #72]
+    DD 08       save_freg X#4 Z#8 (0x08); str d12, [sp, #64]
     E6          save_next
-    D8 03       save_fregp X#0 Z#3 (0x03); stp d8, d9, [sp, #24]
-    8B          save_fplr_x #11 (0x0B); stp fp, lr, [sp, #-96]!
+    D8 04       save_fregp X#0 Z#4 (0x04); stp d8, d9, [sp, #32]
+    8D          save_fplr_x #13 (0x0D); stp fp, lr, [sp, #-112]!
+    E4          end
+    E4          end
     E4          end
     E4          end
 
+16 (+2.53%) : 569192.dasm - JIT.HardwareIntrinsics.Arm._Sve2.SimpleTernaryOpTest__Sve2_Xor_ushort:RunBasicScenario_Load():this (FullOpts)
@@ -9,12 +9,12 @@
 ; Final local variable assignments
 ;
 ;  V00 this         [V00,T02] (  4,  4   )     ref  ->  x19         this class-hnd single-def <JIT.HardwareIntrinsics.Arm._Sve2.SimpleTernaryOpTest__Sve2_Xor_ushort>
-;* V01 loc0         [V01,T30] (  0,  0   )    mask  ->  zero-ref    single-def <System.Numerics.Vector`1[ushort]>
-;  V02 loc1         [V02,T29] (  3,  3   )    mask  ->  [fp+0x10]   spill-single-def <System.Numerics.Vector`1[ushort]>
-;  V03 loc2         [V03,T33] (  2,  2   )  simd16  ->   d8         <System.Numerics.Vector`1[ushort]>
+;* V01 loc0         [V01,T34] (  0,  0   )  simd16  ->  zero-ref    <System.Numerics.Vector`1[ushort]>
+;  V02 loc1         [V02,T32] (  2,  2   )  simd16  ->   d8         <System.Numerics.Vector`1[ushort]>
+;  V03 loc2         [V03,T33] (  2,  2   )  simd16  ->  d10         <System.Numerics.Vector`1[ushort]>
 ;# V04 OutArgs      [V04    ] (  1,  1   )  struct ( 0) [sp+0x00]   do-not-enreg[XS] addr-exposed "OutgoingArgSpace" <Empty>
-;  V05 tmp1         [V05,T31] (  2,  4   )  simd16  ->   d8         "impAppendStmt"
-;  V06 tmp2         [V06,T32] (  2,  4   )  simd16  ->  d10         "impAppendStmt"
+;  V05 tmp1         [V05,T30] (  2,  4   )  simd16  ->  d10         "impAppendStmt"
+;  V06 tmp2         [V06,T31] (  2,  4   )  simd16  ->   d8         "impAppendStmt"
 ;  V07 tmp3         [V07,T18] (  2,  4   )    long  ->  x21         "impAppendStmt"
 ;  V08 tmp4         [V08,T19] (  2,  4   )    long  ->  x22         "impAppendStmt"
 ;  V09 tmp5         [V09,T20] (  2,  4   )    long  ->  x23         "impAppendStmt"
@@ -51,21 +51,23 @@
 ;* V40 tmp36        [V40    ] (  0,  0   )    long  ->  zero-ref    ld-addr-op "Inline stloc first use temp"
 ;  V41 tmp37        [V41,T28] (  2,  4   )    long  ->   x0         "Inlining Arg"
 ;  V42 tmp38        [V42,T17] (  3,  6   )    long  ->   x4         "Inlining Arg"
-;  V43 cse0         [V43,T00] (  9,  9   )   byref  ->  x20         "CSE #02: aggressive"
+;  V43 cse0         [V43,T29] (  3,  3   )    mask  ->  [fp+0x18]   spill-single-def "CSE #02: moderate"
+;  V44 cse1         [V44,T00] (  9,  9   )   byref  ->  x20         "CSE #01: aggressive"
 ;
-; Lcl frame size = 8
+; Lcl frame size = 16
 
 G_M13407_IG01:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, nogc <-- Prolog IG
-            stp     fp, lr, [sp, #-0x60]!
-            stp     d8, d9, [sp, #0x18]
-            stp     d10, d11, [sp, #0x28]
-            stp     x19, x20, [sp, #0x38]
-            stp     x21, x22, [sp, #0x48]
-            str     x23, [sp, #0x58]
+            stp     fp, lr, [sp, #-0x70]!
+            stp     d8, d9, [sp, #0x20]
+            stp     d10, d11, [sp, #0x30]
+            str     d12, [sp, #0x40]
+            stp     x19, x20, [sp, #0x48]
+            stp     x21, x22, [sp, #0x58]
+            str     x23, [sp, #0x68]
             mov     fp, sp
             mov     x19, x0
             ; gcrRegs +[x19]
-						;; size=32 bbWeight=1 PerfScore 7.00
+						;; size=36 bbWeight=1 PerfScore 8.00
 G_M13407_IG02:        ; bbWeight=1, gcrefRegs=80000 {x19}, byrefRegs=0000 {}, byref
             movz    x0, #0xD1FFAB1E
             movk    x0, #0xD1FFAB1E LSL #16
@@ -85,9 +87,7 @@ G_M13407_IG02:        ; bbWeight=1, gcrefRegs=80000 {x19}, byrefRegs=0000 {}, by
             ldr     x1, [x1]
             blr     x1
             ; gcrRegs -[x0]
-            ptrue   p0.h
-            add     xip1, fp, #16
-            str     p0, [xip1]
+            mvni    v8.4s, #0
             add     x20, x19, #96
             ; byrRegs +[x20]
             mov     x21, x20
@@ -99,22 +99,6 @@ G_M13407_IG02:        ; bbWeight=1, gcrefRegs=80000 {x19}, byrefRegs=0000 {}, by
             movk    x1, #0xD1FFAB1E LSL #16
             movk    x1, #0xD1FFAB1E LSL #32
             ldr     x1, [x1]
-            blr     x1
-            ; byrRegs -[x0]
-            ldr     x1, [x21, #0x20]
-            add     x0, x0, x1
-            sub     x0, x0, #1
-            sub     x1, x1, #1
-            bic     x0, x0, x1
-            ptrue   p0.h
-            ld1h    { z8.h }, p0/z, [x0]
-            mov     x21, x20
-            add     x0, x21, #48
-            ; byrRegs +[x0]
-            movz    x1, #0xD1FFAB1E      // code for <unknown method>
-            movk    x1, #0xD1FFAB1E LSL #16
-            movk    x1, #0xD1FFAB1E LSL #32
-            ldr     x1, [x1]
             mov     v9.d[0], v8.d[1]
             blr     x1
             ; byrRegs -[x0]
@@ -123,11 +107,10 @@ G_M13407_IG02:        ; bbWeight=1, gcrefRegs=80000 {x19}, byrefRegs=0000 {}, by
             sub     x0, x0, #1
             sub     x1, x1, #1
             bic     x0, x0, x1
-            add     xip1, fp, #16
-            ldr     p0, [xip1]
+            ptrue   p0.h
             ld1h    { z10.h }, p0/z, [x0]
             mov     x21, x20
-            add     x0, x21, #56
+            add     x0, x21, #48
             ; byrRegs +[x0]
             movz    x1, #0xD1FFAB1E      // code for <unknown method>
             movk    x1, #0xD1FFAB1E LSL #16
@@ -141,20 +124,20 @@ G_M13407_IG02:        ; bbWeight=1, gcrefRegs=80000 {x19}, byrefRegs=0000 {}, by
             sub     x0, x0, #1
             sub     x1, x1, #1
             bic     x0, x0, x1
-            add     xip1, fp, #16
-            ldr     p0, [xip1]
-            ld1h    { z16.h }, p0/z, [x0]
+            ptrue   p0.h
             mov     v8.d[1], v9.d[0]
-            mov     v10.d[1], v11.d[0]
-            eor3    z8.d, z8.d, z10.d, z16.d
+            cmpne   p0.h, p0/z, z8.h, #0
+            add     xip1, fp, #24
+            str     p0, [xip1]
+            ld1h    { z8.h }, p0/z, [x0]
             mov     x21, x20
-            add     x0, x21, #64
+            add     x0, x21, #56
             ; byrRegs +[x0]
             movz    x1, #0xD1FFAB1E      // code for <unknown method>
             movk    x1, #0xD1FFAB1E LSL #16
             movk    x1, #0xD1FFAB1E LSL #32
             ldr     x1, [x1]
-            mov     v9.d[0], v8.d[1]
+            mov     v12.d[0], v8.d[1]
             blr     x1
             ; byrRegs -[x0]
             ldr     x1, [x21, #0x20]
@@ -162,8 +145,29 @@ G_M13407_IG02:        ; bbWeight=1, gcrefRegs=80000 {x19}, byrefRegs=0000 {}, by
             sub     x0, x0, #1
             sub     x1, x1, #1
             bic     x0, x0, x1
-            mov     v8.d[1], v9.d[0]
-            str     q8, [x0]
+            add     xip1, fp, #24
+            ldr     p0, [xip1]
+            ld1h    { z16.h }, p0/z, [x0]
+            mov     v10.d[1], v11.d[0]
+            mov     v8.d[1], v12.d[0]
+            eor3    z10.d, z10.d, z8.d, z16.d
+            mov     x21, x20
+            add     x0, x21, #64
+            ; byrRegs +[x0]
+            movz    x1, #0xD1FFAB1E      // code for <unknown method>
+            movk    x1, #0xD1FFAB1E LSL #16
+            movk    x1, #0xD1FFAB1E LSL #32
+            ldr     x1, [x1]
+            mov     v8.d[0], v10.d[1]
+            blr     x1
+            ; byrRegs -[x0]
+            ldr     x1, [x21, #0x20]
+            add     x0, x0, x1
+            sub     x0, x0, #1
+            sub     x1, x1, #1
+            bic     x0, x0, x1
+            mov     v10.d[1], v8.d[0]
+            str     q10, [x0]
             mov     x21, x20
             add     x0, x21, #40
             ; byrRegs +[x0]
@@ -236,29 +240,30 @@ G_M13407_IG02:        ; bbWeight=1, gcrefRegs=80000 {x19}, byrefRegs=0000 {}, by
             movk    x6, #0xD1FFAB1E LSL #16
             movk    x6, #0xD1FFAB1E LSL #32
             ldr     x6, [x6]
-						;; size=572 bbWeight=1 PerfScore 166.50
+						;; size=580 bbWeight=1 PerfScore 166.50
 G_M13407_IG03:        ; bbWeight=1, epilog, nogc, extend
-            ldr     x23, [sp, #0x58]
-            ldp     x21, x22, [sp, #0x48]
-            ldp     x19, x20, [sp, #0x38]
-            ldp     d10, d11, [sp, #0x28]
-            ldp     d8, d9, [sp, #0x18]
-            ldp     fp, lr, [sp], #0x60
+            ldr     x23, [sp, #0x68]
+            ldp     x21, x22, [sp, #0x58]
+            ldp     x19, x20, [sp, #0x48]
+            ldr     d12, [sp, #0x40]
+            ldp     d10, d11, [sp, #0x30]
+            ldp     d8, d9, [sp, #0x20]
+            ldp     fp, lr, [sp], #0x70
             br      x6
-						;; size=28 bbWeight=1 PerfScore 8.00
+						;; size=32 bbWeight=1 PerfScore 10.00
 
-; Total bytes of code 632, prolog size 28, PerfScore 181.50, instruction count 158, allocated bytes for code 632 (MethodHash=f1c3cba0) for method JIT.HardwareIntrinsics.Arm._Sve2.SimpleTernaryOpTest__Sve2_Xor_ushort:RunBasicScenario_Load():this (FullOpts)
+; Total bytes of code 648, prolog size 32, PerfScore 184.50, instruction count 162, allocated bytes for code 648 (MethodHash=f1c3cba0) for method JIT.HardwareIntrinsics.Arm._Sve2.SimpleTernaryOpTest__Sve2_Xor_ushort:RunBasicScenario_Load():this (FullOpts)
 ; ============================================================
 
 Unwind Info:
   >> Start offset   : 0x000000 (not in unwind data)
   >>   End offset   : 0xd1ffab1e (not in unwind data)
-  Code Words        : 3
+  Code Words        : 4
   Epilog Count      : 1
   E bit             : 0
   X bit             : 0
   Vers              : 0
-  Function Length   : 158 (0x0009e) Actual length = 632 (0x000278)
+  Function Length   : 162 (0x000a2) Actual length = 648 (0x000288)
   ---- Epilog scopes ----
   ---- Scope 0
   Epilog Start Offset        : 3523193630 (0xd1ffab1e) Actual offset = 3523193630 (0xd1ffab1e) Offset from main function begin = 3523193630 (0xd1ffab1e)
@@ -266,12 +271,15 @@ Unwind Info:
   ---- Unwind codes ----
     E1          set_fp; mov fp, sp
     ---- Epilog start at index 1 ----
-    D1 0B       save_reg X#4 Z#11 (0x0B); str x23, [sp, #88]
+    D1 0D       save_reg X#4 Z#13 (0x0D); str x23, [sp, #104]
     E6          save_next
-    C8 07       save_regp X#0 Z#7 (0x07); stp x19, x20, [sp, #56]
+    C8 09       save_regp X#0 Z#9 (0x09); stp x19, x20, [sp, #72]
+    DD 08       save_freg X#4 Z#8 (0x08); str d12, [sp, #64]
     E6          save_next
-    D8 03       save_fregp X#0 Z#3 (0x03); stp d8, d9, [sp, #24]
-    8B          save_fplr_x #11 (0x0B); stp fp, lr, [sp, #-96]!
+    D8 04       save_fregp X#0 Z#4 (0x04); stp d8, d9, [sp, #32]
+    8D          save_fplr_x #13 (0x0D); stp fp, lr, [sp, #-112]!
+    E4          end
+    E4          end
     E4          end
     E4          end
 
benchmarks.run_pgo.linux.arm64.checked.mch
-4 (-0.85%) : 58518.dasm - SveBenchmarks.StrLen:SveStrLen():ulong:this (Instrumented Tier0)
@@ -36,8 +36,7 @@ G_M60402_IG01:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref,
 G_M60402_IG02:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, isz
             mov     w0, #0xD1FFAB1E
             str     w0, [fp, #0x20]	// [V11 tmp2]
-            ptrue   p0.b
-            mov     z16.b, p0/z, #1
+            mvni    v16.4s, #0
             str     q16, [fp, #0x80]	// [V01 loc0]
             str     xzr, [fp, #0x58]	// [V04 loc3]
             cntb    x0, all
@@ -62,7 +61,7 @@ G_M60402_IG02:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref,
             ldr     w0, [x0, #0x08]
             ; gcrRegs -[x0]
             cbnz    w0, G_M60402_IG05
-						;; size=96 bbWeight=1 PerfScore 40.50
+						;; size=92 bbWeight=1 PerfScore 37.00
 G_M60402_IG03:        ; bbWeight=0.50, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
             movz    x0, #0xD1FFAB1E
             movk    x0, #0xD1FFAB1E LSL #16
@@ -179,7 +178,7 @@ G_M60402_IG11:        ; bbWeight=0, gcVars=0000000000000000 {}, gcrefRegs=0000 {
             brk     #0
 						;; size=8 bbWeight=0 PerfScore 0.00
 
-; Total bytes of code 472, prolog size 36, PerfScore 168.27, instruction count 118, allocated bytes for code 472 (MethodHash=b293140d) for method SveBenchmarks.StrLen:SveStrLen():ulong:this (Instrumented Tier0)
+; Total bytes of code 468, prolog size 36, PerfScore 164.77, instruction count 117, allocated bytes for code 468 (MethodHash=b293140d) for method SveBenchmarks.StrLen:SveStrLen():ulong:this (Instrumented Tier0)
 ; ============================================================
 
 Unwind Info:
@@ -190,7 +189,7 @@ Unwind Info:
   E bit             : 0
   X bit             : 0
   Vers              : 0
-  Function Length   : 118 (0x00076) Actual length = 472 (0x0001d8)
+  Function Length   : 117 (0x00075) Actual length = 468 (0x0001d4)
   ---- Epilog scopes ----
   ---- Scope 0
   Epilog Start Offset        : 3523193630 (0xd1ffab1e) Actual offset = 3523193630 (0xd1ffab1e) Offset from main function begin = 3523193630 (0xd1ffab1e)
-4 (-0.58%) : 24532.dasm - SveBenchmarks.StrIndexOf:SveIndexOf():int:this (Instrumented Tier0)
@@ -229,8 +229,7 @@ G_M22667_IG17:        ; bbWeight=0.01, gcrefRegs=0000 {}, byrefRegs=0000 {}, byr
 						;; size=12 bbWeight=0.01 PerfScore 0.02
 G_M22667_IG18:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, isz
             ptrue   p0.h
-            mov     z16.h, p0/z, #1
-            ptrue   p0.h
+            mvni    v16.4s, #0
             cmpne   p0.h, p0/z, z16.h, #0
             ptrue   p1.h
             ldr     q16, [fp, #0x50]	// [V05 loc4]
@@ -249,7 +248,7 @@ G_M22667_IG18:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref,
             bl      CORINFO_HELP_COUNTPROFILE32
             ; gcr arg pop 0
             movn    w0, #0
-						;; size=76 bbWeight=1 PerfScore 25.50
+						;; size=72 bbWeight=1 PerfScore 22.00
 G_M22667_IG19:        ; bbWeight=1, epilog, nogc, extend
             ldp     fp, lr, [sp], #0x90
             ret     lr
@@ -265,7 +264,7 @@ G_M22667_IG21:        ; bbWeight=0, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
             brk     #0
 						;; size=8 bbWeight=0 PerfScore 0.00
 
-; Total bytes of code 688, prolog size 36, PerfScore 211.04, instruction count 172, allocated bytes for code 688 (MethodHash=8b05a774) for method SveBenchmarks.StrIndexOf:SveIndexOf():int:this (Instrumented Tier0)
+; Total bytes of code 684, prolog size 36, PerfScore 207.54, instruction count 171, allocated bytes for code 684 (MethodHash=8b05a774) for method SveBenchmarks.StrIndexOf:SveIndexOf():int:this (Instrumented Tier0)
 ; ============================================================
 
 Unwind Info:
@@ -276,7 +275,7 @@ Unwind Info:
   E bit             : 0
   X bit             : 0
   Vers              : 0
-  Function Length   : 172 (0x000ac) Actual length = 688 (0x0002b0)
+  Function Length   : 171 (0x000ab) Actual length = 684 (0x0002ac)
   ---- Epilog scopes ----
   ---- Scope 0
   Epilog Start Offset        : 3523193630 (0xd1ffab1e) Actual offset = 3523193630 (0xd1ffab1e) Offset from main function begin = 3523193630 (0xd1ffab1e)
-4 (-0.49%) : 76632.dasm - SveBenchmarks.StrIndexOf:SveTail():int:this (Instrumented Tier0)
@@ -95,8 +95,7 @@ G_M34028_IG06:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
             ldr     x2, [x2]
             blr     x2
             ; gcr arg pop 0
-            ptrue   p0.h
-            mov     z0.h, p0/z, #1
+            mvni    v0.4s, #0
             movz    x0, #0xD1FFAB1E      // code for <unknown method>
             movk    x0, #0xD1FFAB1E LSL #16
             movk    x0, #0xD1FFAB1E LSL #32
@@ -105,7 +104,7 @@ G_M34028_IG06:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
             ; gcr arg pop 0
             str     q0, [fp, #0x50]	// [V05 loc4]
             b       G_M34028_IG16
-						;; size=68 bbWeight=1 PerfScore 22.50
+						;; size=64 bbWeight=1 PerfScore 19.00
 G_M34028_IG07:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, isz
             ldr     w0, [fp, #0x84]	// [V01 loc0]
             sxtw    x0, w0
@@ -317,7 +316,7 @@ G_M34028_IG27:        ; bbWeight=0, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
             brk     #0
 						;; size=8 bbWeight=0 PerfScore 0.00
 
-; Total bytes of code 812, prolog size 36, PerfScore 237.56, instruction count 203, allocated bytes for code 812 (MethodHash=3e617b13) for method SveBenchmarks.StrIndexOf:SveTail():int:this (Instrumented Tier0)
+; Total bytes of code 808, prolog size 36, PerfScore 234.06, instruction count 202, allocated bytes for code 808 (MethodHash=3e617b13) for method SveBenchmarks.StrIndexOf:SveTail():int:this (Instrumented Tier0)
 ; ============================================================
 
 Unwind Info:
@@ -328,7 +327,7 @@ Unwind Info:
   E bit             : 0
   X bit             : 0
   Vers              : 0
-  Function Length   : 203 (0x000cb) Actual length = 812 (0x00032c)
+  Function Length   : 202 (0x000ca) Actual length = 808 (0x000328)
   ---- Epilog scopes ----
   ---- Scope 0
   Epilog Start Offset        : 3523193630 (0xd1ffab1e) Actual offset = 3523193630 (0xd1ffab1e) Offset from main function begin = 3523193630 (0xd1ffab1e)
-4 (-0.38%) : 14743.dasm - SveBenchmarks.StrCmp:SveStrCmp():long:this (Instrumented Tier0)
@@ -49,8 +49,7 @@ G_M14759_IG02:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref,
             str     wzr, [fp, #0xC4]	// [V01 loc0]
             cntb    x0, all
             str     w0, [fp, #0xC0]	// [V02 loc1]
-            ptrue   p0.b
-            mov     z16.b, p0/z, #1
+            mvni    v16.4s, #0
             str     q16, [fp, #0xB0]	// [V03 loc2]
             ldr     x0, [fp, #0xC8]	// [V00 this]
             ; gcrRegs +[x0]
@@ -86,7 +85,7 @@ G_M14759_IG02:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref,
             ldr     w0, [x0, #0x08]
             ; gcrRegs -[x0]
             cbnz    w0, G_M14759_IG05
-						;; size=136 bbWeight=1 PerfScore 59.50
+						;; size=132 bbWeight=1 PerfScore 56.00
 G_M14759_IG03:        ; bbWeight=0.50, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
             movz    x0, #0xD1FFAB1E
             movk    x0, #0xD1FFAB1E LSL #16
@@ -394,7 +393,7 @@ G_M14759_IG27:        ; bbWeight=0, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
             brk     #0
 						;; size=8 bbWeight=0 PerfScore 0.00
 
-; Total bytes of code 1052, prolog size 44, PerfScore 348.29, instruction count 263, allocated bytes for code 1052 (MethodHash=5df7c658) for method SveBenchmarks.StrCmp:SveStrCmp():long:this (Instrumented Tier0)
+; Total bytes of code 1048, prolog size 44, PerfScore 344.79, instruction count 262, allocated bytes for code 1048 (MethodHash=5df7c658) for method SveBenchmarks.StrCmp:SveStrCmp():long:this (Instrumented Tier0)
 ; ============================================================
 
 Unwind Info:
@@ -405,7 +404,7 @@ Unwind Info:
   E bit             : 0
   X bit             : 0
   Vers              : 0
-  Function Length   : 263 (0x00107) Actual length = 1052 (0x00041c)
+  Function Length   : 262 (0x00106) Actual length = 1048 (0x000418)
   ---- Epilog scopes ----
   ---- Scope 0
   Epilog Start Offset        : 3523193630 (0xd1ffab1e) Actual offset = 3523193630 (0xd1ffab1e) Offset from main function begin = 3523193630 (0xd1ffab1e)
-4 (-0.43%) : 39357.dasm - SveBenchmarks.StrCmp:SveTail():long:this (Instrumented Tier0)
@@ -41,8 +41,7 @@ G_M892_IG01:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, n
 G_M892_IG02:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, isz
             mov     w0, #0xD1FFAB1E
             str     w0, [fp, #0x28]	// [V15 tmp3]
-            ptrue   p0.b
-            mov     z16.b, p0/z, #1
+            mvni    v16.4s, #0
             str     q16, [fp, #0xA0]	// [V01 loc0]
             str     wzr, [fp, #0x6C]	// [V05 loc4]
             cntb    x0, all
@@ -71,7 +70,7 @@ G_M892_IG02:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, i
             ldr     w0, [x0, #0x08]
             ; gcrRegs -[x0]
             cbnz    w0, G_M892_IG05
-						;; size=104 bbWeight=1 PerfScore 46.00
+						;; size=100 bbWeight=1 PerfScore 42.50
 G_M892_IG03:        ; bbWeight=0.50, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
             movz    x0, #0xD1FFAB1E
             movk    x0, #0xD1FFAB1E LSL #16
@@ -357,7 +356,7 @@ G_M892_IG24:        ; bbWeight=0, gcVars=0000000000000000 {}, gcrefRegs=0000 {},
             brk     #0
 						;; size=8 bbWeight=0 PerfScore 0.00
 
-; Total bytes of code 928, prolog size 36, PerfScore 307.29, instruction count 232, allocated bytes for code 928 (MethodHash=5bfdfc83) for method SveBenchmarks.StrCmp:SveTail():long:this (Instrumented Tier0)
+; Total bytes of code 924, prolog size 36, PerfScore 303.79, instruction count 231, allocated bytes for code 924 (MethodHash=5bfdfc83) for method SveBenchmarks.StrCmp:SveTail():long:this (Instrumented Tier0)
 ; ============================================================
 
 Unwind Info:
@@ -368,7 +367,7 @@ Unwind Info:
   E bit             : 0
   X bit             : 0
   Vers              : 0
-  Function Length   : 232 (0x000e8) Actual length = 928 (0x0003a0)
+  Function Length   : 231 (0x000e7) Actual length = 924 (0x00039c)
   ---- Epilog scopes ----
   ---- Scope 0
   Epilog Start Offset        : 3523193630 (0xd1ffab1e) Actual offset = 3523193630 (0xd1ffab1e) Offset from main function begin = 3523193630 (0xd1ffab1e)
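
A quick sanity check on the summary lines of the five `benchmarks.run_pgo` diffs above. This is only a sketch and not part of the diff tooling. It assumes the percentage is the byte delta divided by the base size, and that every Arm64 instruction is 4 bytes, so total bytes equal 4 times the instruction count. The helper name `size_delta` is hypothetical. The numbers are copied from the "Total bytes of code" lines.

```python
# Assumption: percentage = (diff bytes - base bytes) / base bytes, rounded to 2 places.
def size_delta(base_bytes: int, diff_bytes: int) -> tuple[int, float]:
    """Return (byte delta, percentage change relative to the base size)."""
    delta = diff_bytes - base_bytes
    return delta, round(100.0 * delta / base_bytes, 2)


# (method, base bytes, diff bytes, base instruction count, diff instruction count)
SAMPLES = [
    ("StrLen:SveStrLen", 472, 468, 118, 117),
    ("StrIndexOf:SveIndexOf", 688, 684, 172, 171),
    ("StrIndexOf:SveTail", 812, 808, 203, 202),
    ("StrCmp:SveStrCmp", 1052, 1048, 263, 262),
    ("StrCmp:SveTail", 928, 924, 232, 231),
]

for name, base, diff, base_n, diff_n in SAMPLES:
    # Fixed-width Arm64 encoding: 4 bytes per instruction.
    assert base == 4 * base_n and diff == 4 * diff_n
    delta, pct = size_delta(base, diff)
    print(f"{delta:+d} ({pct:+.2f}%) : {name}")
```

Each printed line should reproduce the `-4 (-x.xx%)` prefix of the corresponding summary line, which is consistent with every one of these methods saving exactly one instruction.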
libraries.pmi.linux.arm64.checked.mch
-4 (-16.67%) : 11401.dasm - System.Runtime.Intrinsics.Arm.Sve:CreateFalseMaskInt32():System.Numerics.Vector`1[int] (FullOpts)
@@ -16,15 +16,14 @@ G_M40111_IG01:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref,
             mov     fp, sp
 						;; size=8 bbWeight=1 PerfScore 1.50
 G_M40111_IG02:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
-            pfalse  p0.b
-            mov     z0.s, p0/z, #1
-						;; size=8 bbWeight=1 PerfScore 4.00
+            movi    v0.4s, #0
+						;; size=4 bbWeight=1 PerfScore 0.50
 G_M40111_IG03:        ; bbWeight=1, epilog, nogc, extend
             ldp     fp, lr, [sp], #0x10
             ret     lr
 						;; size=8 bbWeight=1 PerfScore 2.00
 
-; Total bytes of code 24, prolog size 8, PerfScore 7.50, instruction count 6, allocated bytes for code 24 (MethodHash=96116350) for method System.Runtime.Intrinsics.Arm.Sve:CreateFalseMaskInt32():System.Numerics.Vector`1[int] (FullOpts)
+; Total bytes of code 20, prolog size 8, PerfScore 4.00, instruction count 5, allocated bytes for code 20 (MethodHash=96116350) for method System.Runtime.Intrinsics.Arm.Sve:CreateFalseMaskInt32():System.Numerics.Vector`1[int] (FullOpts)
 ; ============================================================
 
 Unwind Info:
@@ -35,7 +34,7 @@ Unwind Info:
   E bit             : 0
   X bit             : 0
   Vers              : 0
-  Function Length   : 6 (0x00006) Actual length = 24 (0x000018)
+  Function Length   : 5 (0x00005) Actual length = 20 (0x000014)
   ---- Epilog scopes ----
   ---- Scope 0
   Epilog Start Offset        : 3523193630 (0xd1ffab1e) Actual offset = 3523193630 (0xd1ffab1e) Offset from main function begin = 3523193630 (0xd1ffab1e)
-4 (-16.67%) : 11402.dasm - System.Runtime.Intrinsics.Arm.Sve:CreateFalseMaskInt64():System.Numerics.Vector`1[long] (FullOpts)
@@ -16,15 +16,14 @@ G_M56373_IG01:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref,
             mov     fp, sp
 						;; size=8 bbWeight=1 PerfScore 1.50
 G_M56373_IG02:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
-            pfalse  p0.b
-            mov     z0.d, p0/z, #1
-						;; size=8 bbWeight=1 PerfScore 4.00
+            movi    v0.4s, #0
+						;; size=4 bbWeight=1 PerfScore 0.50
 G_M56373_IG03:        ; bbWeight=1, epilog, nogc, extend
             ldp     fp, lr, [sp], #0x10
             ret     lr
 						;; size=8 bbWeight=1 PerfScore 2.00
 
-; Total bytes of code 24, prolog size 8, PerfScore 7.50, instruction count 6, allocated bytes for code 24 (MethodHash=c46823ca) for method System.Runtime.Intrinsics.Arm.Sve:CreateFalseMaskInt64():System.Numerics.Vector`1[long] (FullOpts)
+; Total bytes of code 20, prolog size 8, PerfScore 4.00, instruction count 5, allocated bytes for code 20 (MethodHash=c46823ca) for method System.Runtime.Intrinsics.Arm.Sve:CreateFalseMaskInt64():System.Numerics.Vector`1[long] (FullOpts)
 ; ============================================================
 
 Unwind Info:
@@ -35,7 +34,7 @@ Unwind Info:
   E bit             : 0
   X bit             : 0
   Vers              : 0
-  Function Length   : 6 (0x00006) Actual length = 24 (0x000018)
+  Function Length   : 5 (0x00005) Actual length = 20 (0x000014)
   ---- Epilog scopes ----
   ---- Scope 0
   Epilog Start Offset        : 3523193630 (0xd1ffab1e) Actual offset = 3523193630 (0xd1ffab1e) Offset from main function begin = 3523193630 (0xd1ffab1e)
-4 (-16.67%) : 11403.dasm - System.Runtime.Intrinsics.Arm.Sve:CreateFalseMaskSByte():System.Numerics.Vector`1[sbyte] (FullOpts)
@@ -16,15 +16,14 @@ G_M57390_IG01:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref,
             mov     fp, sp
 						;; size=8 bbWeight=1 PerfScore 1.50
 G_M57390_IG02:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
-            pfalse  p0.b
-            mov     z0.b, p0/z, #1
-						;; size=8 bbWeight=1 PerfScore 4.00
+            movi    v0.4s, #0
+						;; size=4 bbWeight=1 PerfScore 0.50
 G_M57390_IG03:        ; bbWeight=1, epilog, nogc, extend
             ldp     fp, lr, [sp], #0x10
             ret     lr
 						;; size=8 bbWeight=1 PerfScore 2.00
 
-; Total bytes of code 24, prolog size 8, PerfScore 7.50, instruction count 6, allocated bytes for code 24 (MethodHash=86bf1fd1) for method System.Runtime.Intrinsics.Arm.Sve:CreateFalseMaskSByte():System.Numerics.Vector`1[sbyte] (FullOpts)
+; Total bytes of code 20, prolog size 8, PerfScore 4.00, instruction count 5, allocated bytes for code 20 (MethodHash=86bf1fd1) for method System.Runtime.Intrinsics.Arm.Sve:CreateFalseMaskSByte():System.Numerics.Vector`1[sbyte] (FullOpts)
 ; ============================================================
 
 Unwind Info:
@@ -35,7 +34,7 @@ Unwind Info:
   E bit             : 0
   X bit             : 0
   Vers              : 0
-  Function Length   : 6 (0x00006) Actual length = 24 (0x000018)
+  Function Length   : 5 (0x00005) Actual length = 20 (0x000014)
   ---- Epilog scopes ----
   ---- Scope 0
   Epilog Start Offset        : 3523193630 (0xd1ffab1e) Actual offset = 3523193630 (0xd1ffab1e) Offset from main function begin = 3523193630 (0xd1ffab1e)
-4 (-16.67%) : 11400.dasm - System.Runtime.Intrinsics.Arm.Sve:CreateFalseMaskInt16():System.Numerics.Vector`1[short] (FullOpts)
@@ -16,15 +16,14 @@ G_M33416_IG01:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref,
             mov     fp, sp
 						;; size=8 bbWeight=1 PerfScore 1.50
 G_M33416_IG02:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
-            pfalse  p0.b
-            mov     z0.h, p0/z, #1
-						;; size=8 bbWeight=1 PerfScore 4.00
+            movi    v0.4s, #0
+						;; size=4 bbWeight=1 PerfScore 0.50
 G_M33416_IG03:        ; bbWeight=1, epilog, nogc, extend
             ldp     fp, lr, [sp], #0x10
             ret     lr
 						;; size=8 bbWeight=1 PerfScore 2.00
 
-; Total bytes of code 24, prolog size 8, PerfScore 7.50, instruction count 6, allocated bytes for code 24 (MethodHash=c51e7d77) for method System.Runtime.Intrinsics.Arm.Sve:CreateFalseMaskInt16():System.Numerics.Vector`1[short] (FullOpts)
+; Total bytes of code 20, prolog size 8, PerfScore 4.00, instruction count 5, allocated bytes for code 20 (MethodHash=c51e7d77) for method System.Runtime.Intrinsics.Arm.Sve:CreateFalseMaskInt16():System.Numerics.Vector`1[short] (FullOpts)
 ; ============================================================
 
 Unwind Info:
@@ -35,7 +34,7 @@ Unwind Info:
   E bit             : 0
   X bit             : 0
   Vers              : 0
-  Function Length   : 6 (0x00006) Actual length = 24 (0x000018)
+  Function Length   : 5 (0x00005) Actual length = 20 (0x000014)
   ---- Epilog scopes ----
   ---- Scope 0
   Epilog Start Offset        : 3523193630 (0xd1ffab1e) Actual offset = 3523193630 (0xd1ffab1e) Offset from main function begin = 3523193630 (0xd1ffab1e)
-4 (-16.67%) : 11407.dasm - System.Runtime.Intrinsics.Arm.Sve:CreateFalseMaskUInt64():System.Numerics.Vector`1[ulong] (FullOpts)
@@ -16,15 +16,14 @@ G_M18837_IG01:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref,
             mov     fp, sp
 						;; size=8 bbWeight=1 PerfScore 1.50
 G_M18837_IG02:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
-            pfalse  p0.b
-            mov     z0.d, p0/z, #1
-						;; size=8 bbWeight=1 PerfScore 4.00
+            movi    v0.4s, #0
+						;; size=4 bbWeight=1 PerfScore 0.50
 G_M18837_IG03:        ; bbWeight=1, epilog, nogc, extend
             ldp     fp, lr, [sp], #0x10
             ret     lr
 						;; size=8 bbWeight=1 PerfScore 2.00
 
-; Total bytes of code 24, prolog size 8, PerfScore 7.50, instruction count 6, allocated bytes for code 24 (MethodHash=e813b66a) for method System.Runtime.Intrinsics.Arm.Sve:CreateFalseMaskUInt64():System.Numerics.Vector`1[ulong] (FullOpts)
+; Total bytes of code 20, prolog size 8, PerfScore 4.00, instruction count 5, allocated bytes for code 20 (MethodHash=e813b66a) for method System.Runtime.Intrinsics.Arm.Sve:CreateFalseMaskUInt64():System.Numerics.Vector`1[ulong] (FullOpts)
 ; ============================================================
 
 Unwind Info:
@@ -35,7 +34,7 @@ Unwind Info:
   E bit             : 0
   X bit             : 0
   Vers              : 0
-  Function Length   : 6 (0x00006) Actual length = 24 (0x000018)
+  Function Length   : 5 (0x00005) Actual length = 20 (0x000014)
   ---- Epilog scopes ----
   ---- Scope 0
   Epilog Start Offset        : 3523193630 (0xd1ffab1e) Actual offset = 3523193630 (0xd1ffab1e) Offset from main function begin = 3523193630 (0xd1ffab1e)
-4 (-16.67%) : 11399.dasm - System.Runtime.Intrinsics.Arm.Sve:CreateFalseMaskDouble():System.Numerics.Vector`1[double] (FullOpts)
@@ -16,15 +16,14 @@ G_M43790_IG01:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref,
             mov     fp, sp
 						;; size=8 bbWeight=1 PerfScore 1.50
 G_M43790_IG02:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
-            pfalse  p0.b
-            mov     z0.d, p0/z, #1
-						;; size=8 bbWeight=1 PerfScore 4.00
+            movi    v0.4s, #0
+						;; size=4 bbWeight=1 PerfScore 0.50
 G_M43790_IG03:        ; bbWeight=1, epilog, nogc, extend
             ldp     fp, lr, [sp], #0x10
             ret     lr
 						;; size=8 bbWeight=1 PerfScore 2.00
 
-; Total bytes of code 24, prolog size 8, PerfScore 7.50, instruction count 6, allocated bytes for code 24 (MethodHash=73a354f1) for method System.Runtime.Intrinsics.Arm.Sve:CreateFalseMaskDouble():System.Numerics.Vector`1[double] (FullOpts)
+; Total bytes of code 20, prolog size 8, PerfScore 4.00, instruction count 5, allocated bytes for code 20 (MethodHash=73a354f1) for method System.Runtime.Intrinsics.Arm.Sve:CreateFalseMaskDouble():System.Numerics.Vector`1[double] (FullOpts)
 ; ============================================================
 
 Unwind Info:
@@ -35,7 +34,7 @@ Unwind Info:
   E bit             : 0
   X bit             : 0
   Vers              : 0
-  Function Length   : 6 (0x00006) Actual length = 24 (0x000018)
+  Function Length   : 5 (0x00005) Actual length = 20 (0x000014)
   ---- Epilog scopes ----
   ---- Scope 0
   Epilog Start Offset        : 3523193630 (0xd1ffab1e) Actual offset = 3523193630 (0xd1ffab1e) Offset from main function begin = 3523193630 (0xd1ffab1e)
benchmarks.run_pgo_optrepeat.linux.arm64.checked.mch
-4 (-1.30%) : 13109.dasm - SveBenchmarks.StrCmp:SveTail():long:this (FullOpts)
@@ -7,33 +7,33 @@
 ; No matching PGO data
 ; Final local variable assignments
 ;
-;  V00 this         [V00,T09] (  5,  5   )     ref  ->   x0         this class-hnd single-def <SveBenchmarks.StrCmp>
-;  V01 loc0         [V01,T05] (  3,  9   )    mask  ->   p0         single-def <System.Numerics.Vector`1[byte]>
+;  V00 this         [V00,T08] (  5,  5   )     ref  ->   x0         this class-hnd single-def <SveBenchmarks.StrCmp>
+;* V01 loc0         [V01,T22] (  0,  0   )    mask  ->  zero-ref    single-def <System.Numerics.Vector`1[byte]>
 ;* V02 loc1         [V02    ] (  0,  0   )  simd16  ->  zero-ref    <System.Numerics.Vector`1[byte]>
 ;* V03 loc2         [V03    ] (  0,  0   )  simd16  ->  zero-ref    <System.Numerics.Vector`1[byte]>
 ;* V04 loc3         [V04    ] (  0,  0   )  simd16  ->  zero-ref    <System.Numerics.Vector`1[byte]>
 ;  V05 loc4         [V05,T00] ( 12, 41.50)     int  ->   x1        
-;  V06 loc5         [V06,T13] (  3,  6   )     int  ->   x2         single-def
-;  V07 loc6         [V07,T17] (  3,  5   )    long  ->   x4        
-;  V08 loc7         [V08,T18] (  3,  5   )    long  ->   x6        
+;  V06 loc5         [V06,T12] (  3,  6   )     int  ->   x2         single-def
+;  V07 loc6         [V07,T16] (  3,  5   )    long  ->   x4        
+;  V08 loc7         [V08,T17] (  3,  5   )    long  ->   x6        
 ;  V09 loc8         [V09    ] (  1,  0.50)     ref  ->  [fp+0x18]   must-init pinned class-hnd single-def <byte[]>
 ;  V10 loc9         [V10    ] (  1,  0.50)     ref  ->  [fp+0x10]   must-init pinned class-hnd single-def <byte[]>
-;  V11 loc10        [V11,T08] (  2,  8   )   ubyte  ->   x8        
+;  V11 loc10        [V11,T07] (  2,  8   )   ubyte  ->   x8        
 ;# V12 OutArgs      [V12    ] (  1,  1   )  struct ( 0) [sp+0x00]   do-not-enreg[XS] addr-exposed "OutgoingArgSpace" <Empty>
-;  V13 tmp1         [V13,T15] (  5,  5   )     ref  ->   x4         class-hnd single-def "dup spill" <byte[]>
-;  V14 tmp2         [V14,T16] (  5,  5   )     ref  ->   x6         class-hnd single-def "dup spill" <byte[]>
-;  V15 tmp3         [V15,T19] (  2,  2   )    long  ->   x4         "Cast away GC"
-;  V16 tmp4         [V16,T20] (  2,  2   )    long  ->   x6         "Cast away GC"
+;  V13 tmp1         [V13,T14] (  5,  5   )     ref  ->   x4         class-hnd single-def "dup spill" <byte[]>
+;  V14 tmp2         [V14,T15] (  5,  5   )     ref  ->   x6         class-hnd single-def "dup spill" <byte[]>
+;  V15 tmp3         [V15,T18] (  2,  2   )    long  ->   x4         "Cast away GC"
+;  V16 tmp4         [V16,T19] (  2,  2   )    long  ->   x6         "Cast away GC"
 ;  V17 tmp5         [V17,T01] (  3, 24   )     ref  ->   x2         "arr expr"
 ;  V18 tmp6         [V18,T02] (  3, 24   )     ref  ->   x6         "arr expr"
-;* V19 tmp7         [V19,T21] (  0,  0   )     ref  ->  zero-ref    single-def "arr expr"
-;* V20 tmp8         [V20,T22] (  0,  0   )     ref  ->  zero-ref    single-def "arr expr"
-;  V21 cse0         [V21,T06] (  3,  8.50)     int  ->   x2         "CSE #11: aggressive"
-;  V22 cse1         [V22,T07] (  3,  8.50)     int  ->   x4         "CSE #14: aggressive"
-;  V23 cse2         [V23,T14] (  3,  6   )     int  ->   x7         "CSE #07: aggressive"
-;  V24 cse3         [V24,T12] (  4,  6.50)     int  ->   x0         "CSE #06: aggressive"
-;  V25 cse4         [V25,T10] (  4,  6.50)     ref  ->   x3         "CSE #01: aggressive"
-;  V26 cse5         [V26,T11] (  4,  6.50)     ref  ->   x5         "CSE #03: aggressive"
+;* V19 tmp7         [V19,T20] (  0,  0   )     ref  ->  zero-ref    single-def "arr expr"
+;* V20 tmp8         [V20,T21] (  0,  0   )     ref  ->  zero-ref    single-def "arr expr"
+;  V21 cse0         [V21,T05] (  3,  8.50)     int  ->   x2         "CSE #11: aggressive"
+;  V22 cse1         [V22,T06] (  3,  8.50)     int  ->   x4         "CSE #14: aggressive"
+;  V23 cse2         [V23,T13] (  3,  6   )     int  ->   x7         "CSE #07: aggressive"
+;  V24 cse3         [V24,T11] (  4,  6.50)     int  ->   x0         "CSE #06: aggressive"
+;  V25 cse4         [V25,T09] (  4,  6.50)     ref  ->   x3         "CSE #01: aggressive"
+;  V26 cse5         [V26,T10] (  4,  6.50)     ref  ->   x5         "CSE #03: aggressive"
 ;  V27 cse6         [V27,T03] (  3, 12   )    long  ->   x4         "CSE #08: aggressive"
 ;  V28 cse7         [V28,T04] (  3, 12   )    long  ->   x8         "CSE #05: aggressive"
 ;
@@ -46,7 +46,6 @@ G_M892_IG01:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, n
 						;; size=12 bbWeight=1 PerfScore 2.50
 G_M892_IG02:        ; bbWeight=1, gcrefRegs=0001 {x0}, byrefRegs=0000 {}, byref, isz
             ; gcrRegs +[x0]
-            ptrue   p0.b
             mov     w1, wzr
             cntb    x2, all
             ldr     x3, [x0, #0x10]
@@ -57,7 +56,7 @@ G_M892_IG02:        ; bbWeight=1, gcrefRegs=0001 {x0}, byrefRegs=0000 {}, byref,
             ldr     w6, [x5, #0x08]
             cmp     w4, w6
             bne     G_M892_IG11
-						;; size=36 bbWeight=1 PerfScore 18.00
+						;; size=32 bbWeight=1 PerfScore 16.00
 G_M892_IG03:        ; bbWeight=0.50, gcrefRegs=0029 {x0 x3 x5}, byrefRegs=0000 {}, byref, isz
             mov     x4, x3
             ; gcrRegs +[x4]
@@ -99,14 +98,14 @@ G_M892_IG07:        ; bbWeight=1, gcrefRegs=0029 {x0 x3 x5}, byrefRegs=0000 {},
 G_M892_IG08:        ; bbWeight=4, gcrefRegs=0028 {x3 x5}, byrefRegs=0000 {}, byref, isz
             sxtw    x8, w1
             add     x9, x4, x8
+            ptrue   p0.b
             ld1b    { z16.b }, p0/z, [x9]
             add     x8, x6, x8
             ld1b    { z17.b }, p0/z, [x8]
-            ptrue   p1.b
-            cmpne   p1.b, p1/z, z16.b, z17.b
-            mov     z16.b, p1/z, #1
-            ptrue   p1.b
-            uaddv   d16, p1, z16.b
+            cmpne   p0.b, p0/z, z16.b, z17.b
+            mov     z16.b, p0/z, #1
+            ptrue   p0.b
+            uaddv   d16, p0, z16.b
             umov    x8, v16.d[0]
             uxtb    w8, w8
             cmp     w8, #0
@@ -169,7 +168,7 @@ G_M892_IG15:        ; bbWeight=0, gcVars=0000000000000000 {}, gcrefRegs=0000 {},
             brk     #0
 						;; size=8 bbWeight=0 PerfScore 0.00
 
-; Total bytes of code 308, prolog size 12, PerfScore 259.00, instruction count 77, allocated bytes for code 308 (MethodHash=5bfdfc83) for method SveBenchmarks.StrCmp:SveTail():long:this (FullOpts)
+; Total bytes of code 304, prolog size 12, PerfScore 257.00, instruction count 76, allocated bytes for code 304 (MethodHash=5bfdfc83) for method SveBenchmarks.StrCmp:SveTail():long:this (FullOpts)
 ; ============================================================
 
 Unwind Info:
@@ -180,7 +179,7 @@ Unwind Info:
   E bit             : 0
   X bit             : 0
   Vers              : 0
-  Function Length   : 77 (0x0004d) Actual length = 308 (0x000134)
+  Function Length   : 76 (0x0004c) Actual length = 304 (0x000130)
   ---- Epilog scopes ----
   ---- Scope 0
   Epilog Start Offset        : 3523193630 (0xd1ffab1e) Actual offset = 3523193630 (0xd1ffab1e) Offset from main function begin = 3523193630 (0xd1ffab1e)
-4 (-1.05%) : 26420.dasm - SveBenchmarks.StrIndexOf:SveTail():int:this (FullOpts)
@@ -21,14 +21,13 @@
 ;# V10 OutArgs      [V10    ] (  1,  1   )  struct ( 0) [sp+0x00]   do-not-enreg[XS] addr-exposed "OutgoingArgSpace" <Empty>
 ;  V11 tmp1         [V11,T05] (  5,  8   )     ref  ->   x3         class-hnd single-def "dup spill" <char[]>
 ;* V12 tmp2         [V12    ] (  0,  0   )  ushort  ->  zero-ref    "Inlining Arg"
-;* V13 tmp3         [V13    ] (  0,  0   )  simd16  ->  zero-ref    "Inlining Arg" <System.Numerics.Vector`1[short]>
-;  V14 tmp4         [V14,T11] (  2,  2   )    long  ->   x3         "Cast away GC"
-;  V15 tmp5         [V15,T01] (  3, 24   )     ref  ->   x3         "arr expr"
-;  V16 cse0         [V16,T08] (  3,  6   )     int  ->   x4         "CSE #07: aggressive"
-;  V17 cse1         [V17,T03] (  5, 10.25)     int  ->   x0         "CSE #02: aggressive"
-;  V18 cse2         [V18,T07] (  3,  6   )     ref  ->   x2         "CSE #06: aggressive"
-;  V19 cse3         [V19,T04] (  4, 10   )     int  ->   x5         "CSE #05: aggressive"
-;  V20 cse4         [V20,T10] (  2,  4.25)    mask  ->   p0         hoist "CSE #03: aggressive"
+;  V13 tmp3         [V13,T11] (  2,  2   )    long  ->   x3         "Cast away GC"
+;  V14 tmp4         [V14,T01] (  3, 24   )     ref  ->   x3         "arr expr"
+;  V15 cse0         [V15,T08] (  3,  6   )     int  ->   x4         "CSE #07: aggressive"
+;  V16 cse1         [V16,T03] (  5, 10.25)     int  ->   x0         "CSE #02: aggressive"
+;  V17 cse2         [V17,T07] (  3,  6   )     ref  ->   x2         "CSE #06: aggressive"
+;  V18 cse3         [V18,T04] (  4, 10   )     int  ->   x5         "CSE #05: aggressive"
+;  V19 cse4         [V19,T10] (  2,  4.25)    mask  ->   p0         hoist "CSE #03: aggressive"
 ;
 ; Lcl frame size = 16
 
@@ -62,14 +61,13 @@ G_M34028_IG04:        ; bbWeight=0.50, gcrefRegs=0005 {x0 x2}, byrefRegs=0000 {}
 G_M34028_IG05:        ; bbWeight=1, gcrefRegs=0005 {x0 x2}, byrefRegs=0000 {}, byref, isz
             ldrh    w4, [x0, #0x14]
             dup     v16.8h, w4
-            ptrue   p0.h
-            mov     z17.h, p0/z, #1
+            mvni    v17.4s, #0
             ldr     w0, [x0, #0x10]
             ; gcrRegs -[x0]
             cnth    x5, all
             cmp     w0, w5
             ble     G_M34028_IG10
-						;; size=32 bbWeight=1 PerfScore 15.50
+						;; size=28 bbWeight=1 PerfScore 12.00
 G_M34028_IG06:        ; bbWeight=0.25, gcrefRegs=0004 {x2}, byrefRegs=0000 {}, byref
             ptrue   p0.h
             cmpne   p0.h, p0/z, z17.h, #0
@@ -177,7 +175,7 @@ G_M34028_IG18:        ; bbWeight=0, gcVars=0000000000000000 {}, gcrefRegs=0000 {
             brk     #0
 						;; size=8 bbWeight=0 PerfScore 0.00
 
-; Total bytes of code 380, prolog size 12, PerfScore 236.38, instruction count 95, allocated bytes for code 380 (MethodHash=3e617b13) for method SveBenchmarks.StrIndexOf:SveTail():int:this (FullOpts)
+; Total bytes of code 376, prolog size 12, PerfScore 232.88, instruction count 94, allocated bytes for code 376 (MethodHash=3e617b13) for method SveBenchmarks.StrIndexOf:SveTail():int:this (FullOpts)
 ; ============================================================
 
 Unwind Info:
@@ -188,7 +186,7 @@ Unwind Info:
   E bit             : 0
   X bit             : 0
   Vers              : 0
-  Function Length   : 95 (0x0005f) Actual length = 380 (0x00017c)
+  Function Length   : 94 (0x0005e) Actual length = 376 (0x000178)
   ---- Epilog scopes ----
   ---- Scope 0
   Epilog Start Offset        : 3523193630 (0xd1ffab1e) Actual offset = 3523193630 (0xd1ffab1e) Offset from main function begin = 3523193630 (0xd1ffab1e)
+8 (+2.11%) : 6897.dasm - SveBenchmarks.StrCmp:SveStrCmp():long:this (FullOpts)
@@ -8,31 +8,31 @@
 ; 0 inlinees with PGO data; 1 single block inlinees; 0 inlinees without PGO data
 ; Final local variable assignments
 ;
-;  V00 this         [V00,T06] (  9,  7   )     ref  ->   x0         this class-hnd single-def <SveBenchmarks.StrCmp>
+;  V00 this         [V00,T05] (  9,  7   )     ref  ->   x0         this class-hnd single-def <SveBenchmarks.StrCmp>
 ;  V01 loc0         [V01,T02] (  6, 17.50)     int  ->   x1        
 ;  V02 loc1         [V02,T04] (  5, 10   )     int  ->   x2         single-def
-;  V03 loc2         [V03,T05] (  4, 10   )    mask  ->   p0         single-def <System.Numerics.Vector`1[byte]>
-;  V04 loc3         [V04,T01] (  6, 18   )    mask  ->   p1         <System.Numerics.Vector`1[byte]>
+;* V03 loc2         [V03,T19] (  0,  0   )    mask  ->  zero-ref    single-def <System.Numerics.Vector`1[byte]>
+;  V04 loc3         [V04,T01] (  6, 18   )    mask  ->   p0         <System.Numerics.Vector`1[byte]>
 ;  V05 loc4         [V05,T20] (  4, 13   )  simd16  ->  d16         <System.Numerics.Vector`1[byte]>
 ;* V06 loc5         [V06    ] (  0,  0   )  simd16  ->  zero-ref    <System.Numerics.Vector`1[byte]>
 ;* V07 loc6         [V07    ] (  0,  0   )  simd16  ->  zero-ref    <System.Numerics.Vector`1[byte]>
-;  V08 loc7         [V08,T11] (  3,  5   )    long  ->   x4        
-;  V09 loc8         [V09,T12] (  3,  5   )    long  ->   x5        
+;  V08 loc7         [V08,T10] (  3,  5   )    long  ->   x4        
+;  V09 loc8         [V09,T11] (  3,  5   )    long  ->   x5        
 ;  V10 loc9         [V10    ] (  1,  0.50)     ref  ->  [fp+0x28]   must-init pinned class-hnd single-def <byte[]>
 ;  V11 loc10        [V11    ] (  1,  0.50)     ref  ->  [fp+0x20]   must-init pinned class-hnd single-def <byte[]>
-;  V12 loc11        [V12,T10] (  4,  5   )     int  ->   x3        
-;  V13 loc12        [V13,T18] (  3,  1.50)     int  ->   x3         single-def
+;  V12 loc11        [V12,T09] (  4,  5   )     int  ->   x3        
+;  V13 loc12        [V13,T17] (  3,  1.50)     int  ->   x3         single-def
 ;  V14 loc13        [V14,T00] (  7, 22.50)     int  ->   x4        
 ;# V15 OutArgs      [V15    ] (  1,  1   )  struct ( 0) [sp+0x00]   do-not-enreg[XS] addr-exposed "OutgoingArgSpace" <Empty>
-;  V16 tmp1         [V16,T08] (  5,  5   )     ref  ->   x4         class-hnd single-def "dup spill" <byte[]>
-;  V17 tmp2         [V17,T09] (  5,  5   )     ref  ->   x5         class-hnd single-def "dup spill" <byte[]>
-;  V18 tmp3         [V18,T16] (  2,  2   )    long  ->   x4         "Cast away GC"
-;  V19 tmp4         [V19,T17] (  2,  2   )    long  ->   x5         "Cast away GC"
-;  V20 tmp5         [V20,T13] (  3,  3   )     ref  ->   x2         single-def "arr expr"
-;  V21 tmp6         [V21,T14] (  3,  3   )     ref  ->   x0         single-def "arr expr"
-;  V22 cse0         [V22,T07] (  3,  6   )     int  ->   x3         "CSE #05: aggressive"
-;  V23 cse1         [V23,T19] (  3,  1.50)    long  ->   x3         "CSE #08: moderate"
-;  V24 cse2         [V24,T15] (  4,  2   )     int  ->   x1         "CSE #07: moderate"
+;  V16 tmp1         [V16,T07] (  5,  5   )     ref  ->   x4         class-hnd single-def "dup spill" <byte[]>
+;  V17 tmp2         [V17,T08] (  5,  5   )     ref  ->   x5         class-hnd single-def "dup spill" <byte[]>
+;  V18 tmp3         [V18,T15] (  2,  2   )    long  ->   x4         "Cast away GC"
+;  V19 tmp4         [V19,T16] (  2,  2   )    long  ->   x5         "Cast away GC"
+;  V20 tmp5         [V20,T12] (  3,  3   )     ref  ->   x2         single-def "arr expr"
+;  V21 tmp6         [V21,T13] (  3,  3   )     ref  ->   x0         single-def "arr expr"
+;  V22 cse0         [V22,T06] (  3,  6   )     int  ->   x3         "CSE #05: aggressive"
+;  V23 cse1         [V23,T18] (  3,  1.50)    long  ->   x3         "CSE #08: moderate"
+;  V24 cse2         [V24,T14] (  4,  2   )     int  ->   x1         "CSE #07: moderate"
 ;  V25 cse3         [V25,T03] (  3, 12   )    long  ->   x6         "CSE #06: aggressive"
 ;  V26 rat0         [V26,T21] (  3,  9   )  simd16  ->  [fp+0x10]   do-not-enreg[S] "SIMDInitTempVar"
 ;
@@ -47,10 +47,9 @@ G_M14759_IG02:        ; bbWeight=1, gcrefRegs=0001 {x0}, byrefRegs=0000 {}, byre
             ; gcrRegs +[x0]
             mov     w1, wzr
             cntb    x2, all
-            ptrue   p0.b
             ldr     w3, [x0, #0x20]
             mov     w4, wzr
-            whilelt p1.b, w4, w3
+            whilelt p0.b, w4, w3
             movi    v16.4s, #0
             ldr     x4, [x0, #0x10]
             ; gcrRegs +[x4]
@@ -62,7 +61,7 @@ G_M14759_IG02:        ; bbWeight=1, gcrefRegs=0001 {x0}, byrefRegs=0000 {}, byre
             ; gcrRegs -[x5]
             cmp     w4, w5
             bne     G_M14759_IG14
-						;; size=52 bbWeight=1 PerfScore 24.00
+						;; size=48 bbWeight=1 PerfScore 22.00
 G_M14759_IG03:        ; bbWeight=0.50, gcrefRegs=0001 {x0}, byrefRegs=0000 {}, byref, isz
             ldr     x4, [x0, #0x10]
             ; gcrRegs +[x4]
@@ -96,27 +95,30 @@ G_M14759_IG06:        ; bbWeight=0.50, gcrefRegs=0001 {x0}, byrefRegs=0000 {}, b
             mov     x5, xzr
 						;; size=4 bbWeight=0.50 PerfScore 0.25
 G_M14759_IG07:        ; bbWeight=1, gcrefRegs=0001 {x0}, byrefRegs=0000 {}, byref, isz
-            ptest   p0, p1.b
+            ptrue   p1.b
+            ptest   p1, p0.b
             bge     G_M14759_IG09
-						;; size=8 bbWeight=1 PerfScore 3.00
+						;; size=12 bbWeight=1 PerfScore 5.00
 G_M14759_IG08:        ; bbWeight=4, gcrefRegs=0001 {x0}, byrefRegs=0000 {}, byref, isz
             sxtw    x6, w1
             add     x7, x4, x6
-            ld1b    { z16.b }, p1/z, [x7]
+            ld1b    { z16.b }, p0/z, [x7]
             add     x6, x5, x6
-            ld1b    { z17.b }, p1/z, [x6]
+            ld1b    { z17.b }, p0/z, [x6]
+            ptrue   p0.b
+            cmpne   p0.b, p0/z, z16.b, z17.b
+            mov     z16.b, p0/z, #1
+            ptrue   p0.b
+            cmpne   p0.b, p0/z, z16.b, #0
             ptrue   p1.b
-            cmpne   p1.b, p1/z, z16.b, z17.b
-            mov     z16.b, p1/z, #1
-            ptrue   p1.b
-            cmpne   p1.b, p1/z, z16.b, #0
-            ptest   p0, p1.b
+            ptest   p1, p0.b
             bne     G_M14759_IG09
             add     w1, w1, w2
-            whilelt p1.b, w1, w3
-            ptest   p0, p1.b
+            whilelt p0.b, w1, w3
+            ptrue   p1.b
+            ptest   p1, p0.b
             blt     G_M14759_IG08
-						;; size=64 bbWeight=4 PerfScore 152.00
+						;; size=72 bbWeight=4 PerfScore 168.00
 G_M14759_IG09:        ; bbWeight=0.50, gcrefRegs=0001 {x0}, byrefRegs=0000 {}, byref, isz
             mov     w3, wzr
             mov     w4, wzr
@@ -198,7 +200,7 @@ G_M14759_IG19:        ; bbWeight=0, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
             brk     #0
 						;; size=8 bbWeight=0 PerfScore 0.00
 
-; Total bytes of code 380, prolog size 12, PerfScore 248.50, instruction count 95, allocated bytes for code 380 (MethodHash=5df7c658) for method SveBenchmarks.StrCmp:SveStrCmp():long:this (FullOpts)
+; Total bytes of code 388, prolog size 12, PerfScore 264.50, instruction count 97, allocated bytes for code 388 (MethodHash=5df7c658) for method SveBenchmarks.StrCmp:SveStrCmp():long:this (FullOpts)
 ; ============================================================
 
 Unwind Info:
@@ -209,7 +211,7 @@ Unwind Info:
   E bit             : 0
   X bit             : 0
   Vers              : 0
-  Function Length   : 95 (0x0005f) Actual length = 380 (0x00017c)
+  Function Length   : 97 (0x00061) Actual length = 388 (0x000184)
   ---- Epilog scopes ----
   ---- Scope 0
   Epilog Start Offset        : 3523193630 (0xd1ffab1e) Actual offset = 3523193630 (0xd1ffab1e) Offset from main function begin = 3523193630 (0xd1ffab1e)
+8 (+5.26%) : 21539.dasm - SveBenchmarks.StrLen:SveStrLen():ulong:this (FullOpts)
@@ -8,19 +8,20 @@
 ; Final local variable assignments
 ;
 ;  V00 this         [V00,T05] (  4,  4   )     ref  ->   x0         this class-hnd single-def <SveBenchmarks.StrLen>
-;  V01 loc0         [V01,T04] (  3,  7   )    mask  ->   p0         single-def <System.Numerics.Vector`1[byte]>
+;  V01 loc0         [V01,T11] (  2,  3   )  simd16  ->  d16         <System.Numerics.Vector`1[byte]>
 ;* V02 loc1         [V02    ] (  0,  0   )    mask  ->  zero-ref    <System.Numerics.Vector`1[byte]>
-;  V03 loc2         [V03,T10] (  5, 13   )  simd16  ->  d16         <System.Numerics.Vector`1[byte]>
+;  V03 loc2         [V03,T10] (  5, 13   )  simd16  ->  d17         <System.Numerics.Vector`1[byte]>
 ;  V04 loc3         [V04,T00] (  6, 18   )    long  ->   x1        
 ;  V05 loc4         [V05,T07] (  2,  5   )    long  ->   x2         single-def
-;  V06 loc5         [V06,T01] (  5, 12   )    mask  ->   p1         <System.Numerics.Vector`1[byte]>
-;  V07 loc6         [V07,T03] (  4,  7   )    long  ->   x0        
+;  V06 loc5         [V06,T01] (  5, 12   )    mask  ->   p0         <System.Numerics.Vector`1[byte]>
+;  V07 loc6         [V07,T04] (  4,  7   )    long  ->   x0        
 ;  V08 loc7         [V08    ] (  1,  1   )     ref  ->  [fp+0x18]   must-init pinned class-hnd single-def <byte[]>
 ;# V09 OutArgs      [V09    ] (  1,  1   )  struct ( 0) [sp+0x00]   do-not-enreg[XS] addr-exposed "OutgoingArgSpace" <Empty>
 ;  V10 tmp1         [V10,T02] (  5,  8   )     ref  ->   x0         class-hnd single-def "dup spill" <byte[]>
 ;  V11 tmp2         [V11,T08] (  2,  2   )    long  ->   x0         "Cast away GC"
 ;  V12 cse0         [V12,T06] (  3,  6   )     int  ->   x3         "CSE #02: aggressive"
-;  V13 cse1         [V13,T09] (  2,  1   )     int  ->   x4         "CSE #01: moderate"
+;  V13 cse1         [V13,T03] (  3,  8   )    mask  ->   p1         "CSE #03: aggressive"
+;  V14 cse2         [V14,T09] (  2,  1   )     int  ->   x4         "CSE #01: moderate"
 ;
 ; Lcl frame size = 16
 
@@ -31,16 +32,16 @@ G_M60402_IG01:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref,
 						;; size=12 bbWeight=1 PerfScore 2.50
 G_M60402_IG02:        ; bbWeight=1, gcrefRegs=0001 {x0}, byrefRegs=0000 {}, byref, isz
             ; gcrRegs +[x0]
-            ptrue   p0.b
+            mvni    v16.4s, #0
             mov     x1, xzr
             cntb    x2, all
             ldr     w3, [x0, #0x18]
             mov     w4, wzr
-            whilelt p1.b, w4, w3
+            whilelt p0.b, w4, w3
             ldr     x0, [x0, #0x08]
             str     x0, [fp, #0x18]	// [V08 loc7]
             cbz     x0, G_M60402_IG04
-						;; size=36 bbWeight=1 PerfScore 15.00
+						;; size=36 bbWeight=1 PerfScore 13.50
 G_M60402_IG03:        ; bbWeight=0.50, gcrefRegs=0001 {x0}, byrefRegs=0000 {}, byref, isz
             ldr     w4, [x0, #0x08]
             cbz     w4, G_M60402_IG04
@@ -54,28 +55,30 @@ G_M60402_IG04:        ; bbWeight=0.50, gcrefRegs=0000 {}, byrefRegs=0000 {}, byr
             mov     x0, xzr
 						;; size=4 bbWeight=0.50 PerfScore 0.25
 G_M60402_IG05:        ; bbWeight=2, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, isz
-            ld1b    { z16.b }, p1/z, [x0]
-            movi    v17.4s, #0
+            ld1b    { z17.b }, p0/z, [x0]
+            ptrue   p1.b
+            cmpne   p1.b, p1/z, z16.b, #0
+            movi    v16.4s, #0
             ptrue   p2.b
-            cmpeq   p2.b, p2/z, z16.b, z17.b
-            ptest   p0, p2.b
+            cmpeq   p2.b, p2/z, z17.b, z16.b
+            ptest   p1, p2.b
             bne     G_M60402_IG07
-						;; size=24 bbWeight=2 PerfScore 33.00
+						;; size=32 bbWeight=2 PerfScore 43.00
 G_M60402_IG06:        ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, isz
             add     x1, x1, x2
-            whilelt p1.b, w1, w3
+            whilelt p0.b, w1, w3
             add     x4, x0, x1
-            ld1b    { z16.b }, p1/z, [x4]
-            movi    v17.4s, #0
+            ld1b    { z17.b }, p0/z, [x4]
+            movi    v16.4s, #0
             ptrue   p2.b
-            cmpeq   p2.b, p2/z, z16.b, z17.b
-            ptest   p0, p2.b
+            cmpeq   p2.b, p2/z, z17.b, z16.b
+            ptest   p1, p2.b
             beq     G_M60402_IG06
 						;; size=36 bbWeight=4 PerfScore 78.00
 G_M60402_IG07:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
-            ptrue   p0.b
-            cmpne   p0.b, p0/z, z16.b, #0
-            cntp    x0, p1, p0.b
+            ptrue   p1.b
+            cmpne   p1.b, p1/z, z17.b, #0
+            cntp    x0, p0, p1.b
             add     x0, x0, x1
 						;; size=16 bbWeight=1 PerfScore 7.50
 G_M60402_IG08:        ; bbWeight=1, epilog, nogc, extend
@@ -83,7 +86,7 @@ G_M60402_IG08:        ; bbWeight=1, epilog, nogc, extend
             ret     lr
 						;; size=8 bbWeight=1 PerfScore 2.00
 
-; Total bytes of code 152, prolog size 12, PerfScore 141.00, instruction count 38, allocated bytes for code 152 (MethodHash=b293140d) for method SveBenchmarks.StrLen:SveStrLen():ulong:this (FullOpts)
+; Total bytes of code 160, prolog size 12, PerfScore 149.50, instruction count 40, allocated bytes for code 160 (MethodHash=b293140d) for method SveBenchmarks.StrLen:SveStrLen():ulong:this (FullOpts)
 ; ============================================================
 
 Unwind Info:
@@ -94,7 +97,7 @@ Unwind Info:
   E bit             : 0
   X bit             : 0
   Vers              : 0
-  Function Length   : 38 (0x00026) Actual length = 152 (0x000098)
+  Function Length   : 40 (0x00028) Actual length = 160 (0x0000a0)
   ---- Epilog scopes ----
   ---- Scope 0
   Epilog Start Offset        : 3523193630 (0xd1ffab1e) Actual offset = 3523193630 (0xd1ffab1e) Offset from main function begin = 3523193630 (0xd1ffab1e)
Details

Size improvements/regressions per collection

| Collection | Contexts with diffs | Improvements | Regressions | Same size | Improvements (bytes) | Regressions (bytes) |
|---|---|---|---|---|---|---|
| benchmarks.run.linux.arm64.checked.mch | 4 | 2 | 2 | 0 | -8 | +16 |
| coreclr_tests.run.linux.arm64.checked.mch | 10,995 | 10,810 | 148 | 37 | -336,352 | +2,368 |
| benchmarks.run_pgo.linux.arm64.checked.mch | 5 | 5 | 0 | 0 | -20 | +0 |
| libraries.pmi.linux.arm64.checked.mch | 10 | 10 | 0 | 0 | -40 | +0 |
| libraries_tests_no_tiered_compilation.run.linux.arm64.Release.mch | 0 | 0 | 0 | 0 | -0 | +0 |
| libraries_tests.run.linux.arm64.Release.mch | 0 | 0 | 0 | 0 | -0 | +0 |
| smoke_tests.nativeaot.linux.arm64.checked.mch | 0 | 0 | 0 | 0 | -0 | +0 |
| realworld.run.linux.arm64.checked.mch | 0 | 0 | 0 | 0 | -0 | +0 |
| benchmarks.run_pgo_optrepeat.linux.arm64.checked.mch | 4 | 2 | 2 | 0 | -8 | +16 |
| libraries.crossgen2.linux.arm64.checked.mch | 0 | 0 | 0 | 0 | -0 | +0 |
| | 11,018 | 10,829 | 152 | 37 | -336,428 | +2,400 |
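
As a quick sanity check on the size table, the sketch below re-adds the per-collection byte counts and derives the net delta for each collection. The short collection names and variable names are only illustrative; the numbers are copied from the rows above, and collections without diffs are left out because they contribute zero.

```python
# (collection, improvement bytes, regression bytes) copied from the size table.
# Collection names are shortened here for readability only.
size_diffs = [
    ("benchmarks.run", -8, 16),
    ("coreclr_tests.run", -336_352, 2_368),
    ("benchmarks.run_pgo", -20, 0),
    ("libraries.pmi", -40, 0),
    ("benchmarks.run_pgo_optrepeat", -8, 16),
]

# Column totals, to compare with the unlabeled last row of the table.
total_improved = sum(imp for _, imp, _ in size_diffs)
total_regressed = sum(reg for _, _, reg in size_diffs)

# Net change per collection: improvements and regressions added together.
net_by_collection = {name: imp + reg for name, imp, reg in size_diffs}

print(total_improved, total_regressed)
print(net_by_collection)
```

The totals agree with the last row of the table (-336,428 and +2,400), and the net values for benchmarks.run (8) and coreclr_tests.run (-333984) agree with the "Total bytes of delta" lines in the jit-analyze output further down.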

PerfScore improvements/regressions per collection

| Collection | Contexts with diffs | Improvements | Regressions | Same PerfScore | Improvements (PerfScore) | Regressions (PerfScore) | PerfScore Overall in FullOpts |
|---|---|---|---|---|---|---|---|
| benchmarks.run.linux.arm64.checked.mch | 4 | 2 | 2 | 0 | -1.13% | +6.23% | +0.0003% |
| coreclr_tests.run.linux.arm64.checked.mch | 10,995 | 10,811 | 148 | 36 | -5.42% | +1.61% | -0.1610% |
| benchmarks.run_pgo.linux.arm64.checked.mch | 5 | 5 | 0 | 0 | -1.47% | 0.00% | 0.0000% |
| libraries.pmi.linux.arm64.checked.mch | 10 | 10 | 0 | 0 | -46.67% | 0.00% | -0.0024% |
| libraries_tests_no_tiered_compilation.run.linux.arm64.Release.mch | 0 | 0 | 0 | 0 | 0.00% | 0.00% | 0.0000% |
| libraries_tests.run.linux.arm64.Release.mch | 0 | 0 | 0 | 0 | 0.00% | 0.00% | 0.0000% |
| smoke_tests.nativeaot.linux.arm64.checked.mch | 0 | 0 | 0 | 0 | 0.00% | 0.00% | 0.0000% |
| realworld.run.linux.arm64.checked.mch | 0 | 0 | 0 | 0 | 0.00% | 0.00% | 0.0000% |
| benchmarks.run_pgo_optrepeat.linux.arm64.checked.mch | 4 | 2 | 2 | 0 | -1.13% | +6.23% | +0.0003% |
| libraries.crossgen2.linux.arm64.checked.mch | 0 | 0 | 0 | 0 | 0.00% | 0.00% | 0.0000% |
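
The -1.13% and +6.23% figures in the benchmarks rows appear to be consistent with a geometric mean of the per-method PerfScore ratios from the four benchmark diffs shown earlier. This is an inference from the numbers, not something the tool output states, and the assignment of those four methods to this row is also an assumption. A minimal sketch of that calculation:

```python
import math

# (base PerfScore, diff PerfScore) taken from the "Total bytes of code" lines
# of the four benchmark diffs above. Grouping them this way is an assumption.
regressed = [(248.50, 264.50), (141.00, 149.50)]  # StrCmp:SveStrCmp, StrLen:SveStrLen
improved = [(259.00, 257.00), (236.38, 232.88)]   # StrCmp:SveTail, StrIndexOf:SveTail

def geomean_change_percent(pairs):
    """Geometric mean of diff/base ratios, expressed as a percentage change."""
    log_sum = sum(math.log(diff / base) for base, diff in pairs)
    return (math.exp(log_sum / len(pairs)) - 1.0) * 100.0

print(f"{geomean_change_percent(regressed):+.2f}%")
print(f"{geomean_change_percent(improved):+.2f}%")
```

Under these assumptions the regressions average to roughly +6.23% and the improvements to roughly -1.13%, which lines up with the table. The larger regression figure comes from the loop blocks (bbWeight=4 and bbWeight=2) gaining extra `ptrue`/`cmpne` instructions in `SveStrCmp` and `SveStrLen`.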

Context information

| Collection | Diffed contexts | MinOpts | FullOpts | Missed, base | Missed, diff |
|---|---|---|---|---|---|
| benchmarks.run.linux.arm64.checked.mch | 34,596 | 2,941 | 31,655 | 0 (0.00%) | 0 (0.00%) |
| coreclr_tests.run.linux.arm64.checked.mch | 738,681 | 470,040 | 268,641 | 0 (0.00%) | 0 (0.00%) |
| benchmarks.run_pgo.linux.arm64.checked.mch | 132,155 | 61,313 | 70,842 | 0 (0.00%) | 0 (0.00%) |
| libraries.pmi.linux.arm64.checked.mch | 259,557 | 5 | 259,552 | 0 (0.00%) | 0 (0.00%) |
| libraries_tests_no_tiered_compilation.run.linux.arm64.Release.mch | 350,289 | 21,739 | 328,550 | 0 (0.00%) | 0 (0.00%) |
| libraries_tests.run.linux.arm64.Release.mch | 767,445 | 533,158 | 234,287 | 0 (0.00%) | 0 (0.00%) |
| smoke_tests.nativeaot.linux.arm64.checked.mch | 18,714 | 7 | 18,707 | 0 (0.00%) | 0 (0.00%) |
| realworld.run.linux.arm64.checked.mch | 28,798 | 39 | 28,759 | 0 (0.00%) | 0 (0.00%) |
| benchmarks.run_pgo_optrepeat.linux.arm64.checked.mch | 35,913 | 3,374 | 32,539 | 0 (0.00%) | 0 (0.00%) |
| libraries.crossgen2.linux.arm64.checked.mch | 265,227 | 17 | 265,210 | 0 (0.00%) | 0 (0.00%) |
| | 2,631,375 | 1,092,633 | 1,538,742 | 0 (0.00%) | 0 (0.00%) |

jit-analyze output

benchmarks.run.linux.arm64.checked.mch

Summary of Code Size diffs:
(Lower is better)

Total bytes of base: 16042184 (overridden on cmd)
Total bytes of diff: 16042192 (overridden on cmd)
Total bytes of delta: 8 (0.00 % of base)
    diff is a regression.
    relative diff is a regression.
Detail diffs


Top file regressions (bytes):
           8 : 21403.dasm (5.26 % of base)
           8 : 8287.dasm (2.11 % of base)

Top file improvements (bytes):
          -4 : 16114.dasm (-1.30 % of base)
          -4 : 26115.dasm (-1.05 % of base)

4 total files with Code Size differences (2 improved, 2 regressed), 0 unchanged.

Top method regressions (bytes):
           8 (2.11 % of base) : 8287.dasm - SveBenchmarks.StrCmp:SveStrCmp():long:this (FullOpts)
           8 (5.26 % of base) : 21403.dasm - SveBenchmarks.StrLen:SveStrLen():ulong:this (FullOpts)

Top method improvements (bytes):
          -4 (-1.30 % of base) : 16114.dasm - SveBenchmarks.StrCmp:SveTail():long:this (FullOpts)
          -4 (-1.05 % of base) : 26115.dasm - SveBenchmarks.StrIndexOf:SveTail():int:this (FullOpts)

Top method regressions (percentages):
           8 (5.26 % of base) : 21403.dasm - SveBenchmarks.StrLen:SveStrLen():ulong:this (FullOpts)
           8 (2.11 % of base) : 8287.dasm - SveBenchmarks.StrCmp:SveStrCmp():long:this (FullOpts)

Top method improvements (percentages):
          -4 (-1.30 % of base) : 16114.dasm - SveBenchmarks.StrCmp:SveTail():long:this (FullOpts)
          -4 (-1.05 % of base) : 26115.dasm - SveBenchmarks.StrIndexOf:SveTail():int:this (FullOpts)

4 total methods with Code Size differences (2 improved, 2 regressed).
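
The byte deltas and percentages listed for these four methods follow directly from the instruction counts in the per-method totals shown in the diffs earlier, because every A64 instruction is 4 bytes wide. The sketch below recomputes them; the dictionary layout and helper name are illustrative only, and the counts are copied from the "Total bytes of code" lines.

```python
A64_INSTRUCTION_BYTES = 4  # every A64 instruction is encoded in 32 bits

# method: (base instruction count, diff instruction count, base bytes, diff bytes)
methods = {
    "StrCmp:SveTail": (77, 76, 308, 304),
    "StrIndexOf:SveTail": (95, 94, 380, 376),
    "StrCmp:SveStrCmp": (95, 97, 380, 388),
    "StrLen:SveStrLen": (38, 40, 152, 160),
}

def size_delta(base_count, diff_count, base_bytes, diff_bytes):
    """Return (byte delta, percent of base) after checking bytes == count * 4."""
    assert base_bytes == base_count * A64_INSTRUCTION_BYTES
    assert diff_bytes == diff_count * A64_INSTRUCTION_BYTES
    delta = diff_bytes - base_bytes
    return delta, round(100.0 * delta / base_bytes, 2)

for name, row in methods.items():
    print(name, size_delta(*row))
```

So the two improvements each drop one instruction (-4 bytes), and the two regressions each gain two instructions (+8 bytes).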


coreclr_tests.run.linux.arm64.checked.mch

Summary of Code Size diffs:
(Lower is better)

Total bytes of base: 567688716 (overridden on cmd)
Total bytes of diff: 567354732 (overridden on cmd)
Total bytes of delta: -333984 (-0.06 % of base)
    diff is an improvement.
    relative diff is an improvement.
Detail diffs


Top file regressions (bytes):
          16 : 567144.dasm (2.53 % of base)
          16 : 575424.dasm (2.53 % of base)
          16 : 575544.dasm (2.50 % of base)
          16 : 594152.dasm (2.50 % of base)
          16 : 575294.dasm (2.53 % of base)
          16 : 588776.dasm (2.50 % of base)
          16 : 569192.dasm (2.53 % of base)
          16 : 568047.dasm (2.53 % of base)
          16 : 568272.dasm (2.53 % of base)
          16 : 574096.dasm (2.42 % of base)
          16 : 567479.dasm (2.53 % of base)
          16 : 574200.dasm (2.42 % of base)
          16 : 574664.dasm (2.42 % of base)
          16 : 575358.dasm (2.53 % of base)
          16 : 567416.dasm (2.53 % of base)
          16 : 569088.dasm (2.53 % of base)
          16 : 575272.dasm (2.53 % of base)
          16 : 568247.dasm (2.53 % of base)
          16 : 568072.dasm (2.53 % of base)
          16 : 568472.dasm (2.53 % of base)

Top file improvements (bytes):
        -152 : 574655.dasm (-8.02 % of base)
        -152 : 574294.dasm (-8.02 % of base)
        -152 : 574605.dasm (-8.02 % of base)
        -152 : 574555.dasm (-8.02 % of base)
        -152 : 574138.dasm (-8.02 % of base)
        -152 : 574580.dasm (-8.02 % of base)
        -152 : 574268.dasm (-8.02 % of base)
        -152 : 574242.dasm (-8.02 % of base)
        -152 : 574630.dasm (-8.02 % of base)
        -152 : 574680.dasm (-8.02 % of base)
        -152 : 574164.dasm (-8.02 % of base)
        -152 : 574112.dasm (-8.02 % of base)
        -152 : 574530.dasm (-8.02 % of base)
        -152 : 574505.dasm (-8.02 % of base)
        -152 : 574190.dasm (-8.02 % of base)
        -152 : 574216.dasm (-8.02 % of base)
        -144 : 574842.dasm (-8.07 % of base)
        -144 : 574796.dasm (-8.07 % of base)
        -144 : 574819.dasm (-8.07 % of base)
        -144 : 574865.dasm (-8.07 % of base)

84 total files with Code Size differences (54 improved, 30 regressed), 20 unchanged.

Top method regressions (bytes):
          16 (2.50 % of base) : 594152.dasm - JIT.HardwareIntrinsics.Arm._Sve.SimpleTernaryOpTest__Sve_ConditionalExtractAfterLastActiveElement_byte:RunBasicScenario_Load():this (FullOpts)
          16 (2.50 % of base) : 594408.dasm - JIT.HardwareIntrinsics.Arm._Sve.SimpleTernaryOpTest__Sve_ConditionalExtractAfterLastActiveElementAndReplicate_int:RunBasicScenario_Load():this (FullOpts)
          16 (2.50 % of base) : 594792.dasm - JIT.HardwareIntrinsics.Arm._Sve.SimpleTernaryOpTest__Sve_ConditionalExtractLastActiveElement_byte:RunBasicScenario_Load():this (FullOpts)
          16 (2.50 % of base) : 595048.dasm - JIT.HardwareIntrinsics.Arm._Sve.SimpleTernaryOpTest__Sve_ConditionalExtractLastActiveElementAndReplicate_int:RunBasicScenario_Load():this (FullOpts)
          16 (2.42 % of base) : 574096.dasm - JIT.HardwareIntrinsics.Arm._Sve.SimpleTernaryOpTest__Sve_CreateBreakAfterPropagateMask_byte:RunBasicScenario_Load():this (FullOpts)
          16 (2.42 % of base) : 574200.dasm - JIT.HardwareIntrinsics.Arm._Sve.SimpleTernaryOpTest__Sve_CreateBreakAfterPropagateMask_sbyte:RunBasicScenario_Load():this (FullOpts)
          16 (2.42 % of base) : 574664.dasm - JIT.HardwareIntrinsics.Arm._Sve.SimpleTernaryOpTest__Sve_CreateBreakBeforePropagateMask_long:RunBasicScenario_Load():this (FullOpts)
          16 (2.53 % of base) : 575272.dasm - JIT.HardwareIntrinsics.Arm._Sve.SimpleTernaryOpTest__Sve_DotProduct_int:RunBasicScenario_Load():this (FullOpts)
          16 (2.53 % of base) : 575294.dasm - JIT.HardwareIntrinsics.Arm._Sve.SimpleTernaryOpTest__Sve_DotProduct_long:RunBasicScenario_Load():this (FullOpts)
          16 (2.53 % of base) : 575358.dasm - JIT.HardwareIntrinsics.Arm._Sve.SimpleTernaryOpTest__Sve_DotProductBySelectedScalar_int:RunBasicScenario_Load():this (FullOpts)
          16 (2.53 % of base) : 575424.dasm - JIT.HardwareIntrinsics.Arm._Sve.SimpleTernaryOpTest__Sve_DotProductBySelectedScalar_ulong:RunBasicScenario_Load():this (FullOpts)
          16 (2.50 % of base) : 575544.dasm - JIT.HardwareIntrinsics.Arm._Sve.SimpleTernaryOpTest__Sve_FusedMultiplyAddNegated_float:RunBasicScenario_Load():this (FullOpts)
          16 (2.50 % of base) : 588064.dasm - JIT.HardwareIntrinsics.Arm._Sve.SimpleTernaryOpTest__Sve_MultiplyAdd_int:RunBasicScenario_Load():this (FullOpts)
          16 (2.50 % of base) : 588776.dasm - JIT.HardwareIntrinsics.Arm._Sve.SimpleTernaryOpTest__Sve_MultiplySubtract_long:RunBasicScenario_Load():this (FullOpts)
          16 (2.53 % of base) : 567119.dasm - JIT.HardwareIntrinsics.Arm._Sve2.SimpleTernaryOpTest__Sve2_AbsoluteDifferenceAdd_int:RunBasicScenario_Load():this (FullOpts)
          16 (2.53 % of base) : 567144.dasm - JIT.HardwareIntrinsics.Arm._Sve2.SimpleTernaryOpTest__Sve2_AbsoluteDifferenceAdd_long:RunBasicScenario_Load():this (FullOpts)
          16 (2.53 % of base) : 567311.dasm - JIT.HardwareIntrinsics.Arm._Sve2.SimpleTernaryOpTest__Sve2_AbsoluteDifferenceAddWideningLower_long_int:RunBasicScenario_Load():this (FullOpts)
          16 (2.53 % of base) : 567416.dasm - JIT.HardwareIntrinsics.Arm._Sve2.SimpleTernaryOpTest__Sve2_AbsoluteDifferenceAddWideningUpper_int_short:RunBasicScenario_Load():this (FullOpts)
          16 (2.53 % of base) : 567479.dasm - JIT.HardwareIntrinsics.Arm._Sve2.SimpleTernaryOpTest__Sve2_AbsoluteDifferenceAddWideningUpper_uint_ushort:RunBasicScenario_Load():this (FullOpts)
          16 (2.53 % of base) : 567847.dasm - JIT.HardwareIntrinsics.Arm._Sve2.SimpleTernaryOpTest__Sve2_BitwiseClearXor_int:RunBasicScenario_Load():this (FullOpts)

Top method improvements (bytes):
        -152 (-8.02 % of base) : 574112.dasm - JIT.HardwareIntrinsics.Arm._Sve.SimpleTernaryOpTest__Sve_CreateBreakAfterPropagateMask_byte:ConditionalSelect_ZeroOp():this (FullOpts)
        -152 (-8.02 % of base) : 574268.dasm - JIT.HardwareIntrinsics.Arm._Sve.SimpleTernaryOpTest__Sve_CreateBreakAfterPropagateMask_int:ConditionalSelect_ZeroOp():this (FullOpts)
        -152 (-8.02 % of base) : 574294.dasm - JIT.HardwareIntrinsics.Arm._Sve.SimpleTernaryOpTest__Sve_CreateBreakAfterPropagateMask_long:ConditionalSelect_ZeroOp():this (FullOpts)
        -152 (-8.02 % of base) : 574216.dasm - JIT.HardwareIntrinsics.Arm._Sve.SimpleTernaryOpTest__Sve_CreateBreakAfterPropagateMask_sbyte:ConditionalSelect_ZeroOp():this (FullOpts)
        -152 (-8.02 % of base) : 574242.dasm - JIT.HardwareIntrinsics.Arm._Sve.SimpleTernaryOpTest__Sve_CreateBreakAfterPropagateMask_short:ConditionalSelect_ZeroOp():this (FullOpts)
        -152 (-8.02 % of base) : 574164.dasm - JIT.HardwareIntrinsics.Arm._Sve.SimpleTernaryOpTest__Sve_CreateBreakAfterPropagateMask_uint:ConditionalSelect_ZeroOp():this (FullOpts)
        -152 (-8.02 % of base) : 574190.dasm - JIT.HardwareIntrinsics.Arm._Sve.SimpleTernaryOpTest__Sve_CreateBreakAfterPropagateMask_ulong:ConditionalSelect_ZeroOp():this (FullOpts)
        -152 (-8.02 % of base) : 574138.dasm - JIT.HardwareIntrinsics.Arm._Sve.SimpleTernaryOpTest__Sve_CreateBreakAfterPropagateMask_ushort:ConditionalSelect_ZeroOp():this (FullOpts)
        -152 (-8.02 % of base) : 574505.dasm - JIT.HardwareIntrinsics.Arm._Sve.SimpleTernaryOpTest__Sve_CreateBreakBeforePropagateMask_byte:ConditionalSelect_ZeroOp():this (FullOpts)
        -152 (-8.02 % of base) : 574655.dasm - JIT.HardwareIntrinsics.Arm._Sve.SimpleTernaryOpTest__Sve_CreateBreakBeforePropagateMask_int:ConditionalSelect_ZeroOp():this (FullOpts)
        -152 (-8.02 % of base) : 574680.dasm - JIT.HardwareIntrinsics.Arm._Sve.SimpleTernaryOpTest__Sve_CreateBreakBeforePropagateMask_long:ConditionalSelect_ZeroOp():this (FullOpts)
        -152 (-8.02 % of base) : 574605.dasm - JIT.HardwareIntrinsics.Arm._Sve.SimpleTernaryOpTest__Sve_CreateBreakBeforePropagateMask_sbyte:ConditionalSelect_ZeroOp():this (FullOpts)
        -152 (-8.02 % of base) : 574630.dasm - JIT.HardwareIntrinsics.Arm._Sve.SimpleTernaryOpTest__Sve_CreateBreakBeforePropagateMask_short:ConditionalSelect_ZeroOp():this (FullOpts)
        -152 (-8.02 % of base) : 574555.dasm - JIT.HardwareIntrinsics.Arm._Sve.SimpleTernaryOpTest__Sve_CreateBreakBeforePropagateMask_uint:ConditionalSelect_ZeroOp():this (FullOpts)
        -152 (-8.02 % of base) : 574580.dasm - JIT.HardwareIntrinsics.Arm._Sve.SimpleTernaryOpTest__Sve_CreateBreakBeforePropagateMask_ulong:ConditionalSelect_ZeroOp():this (FullOpts)
        -152 (-8.02 % of base) : 574530.dasm - JIT.HardwareIntrinsics.Arm._Sve.SimpleTernaryOpTest__Sve_CreateBreakBeforePropagateMask_ushort:ConditionalSelect_ZeroOp():this (FullOpts)
        -144 (-8.07 % of base) : 574842.dasm - JIT.HardwareIntrinsics.Arm._Sve.SimpleBinaryOpTest__Sve_CreateBreakPropagateMask_int:ConditionalSelect_ZeroOp():this (FullOpts)
        -144 (-8.07 % of base) : 574865.dasm - JIT.HardwareIntrinsics.Arm._Sve.SimpleBinaryOpTest__Sve_CreateBreakPropagateMask_long:ConditionalSelect_ZeroOp():this (FullOpts)
        -144 (-8.07 % of base) : 574796.dasm - JIT.HardwareIntrinsics.Arm._Sve.SimpleBinaryOpTest__Sve_CreateBreakPropagateMask_sbyte:ConditionalSelect_ZeroOp():this (FullOpts)
        -144 (-8.07 % of base) : 574819.dasm - JIT.HardwareIntrinsics.Arm._Sve.SimpleBinaryOpTest__Sve_CreateBreakPropagateMask_short:ConditionalSelect_ZeroOp():this (FullOpts)

Top method regressions (percentages):
          16 (2.53 % of base) : 575272.dasm - JIT.HardwareIntrinsics.Arm._Sve.SimpleTernaryOpTest__Sve_DotProduct_int:RunBasicScenario_Load():this (FullOpts)
          16 (2.53 % of base) : 575294.dasm - JIT.HardwareIntrinsics.Arm._Sve.SimpleTernaryOpTest__Sve_DotProduct_long:RunBasicScenario_Load():this (FullOpts)
          16 (2.53 % of base) : 575358.dasm - JIT.HardwareIntrinsics.Arm._Sve.SimpleTernaryOpTest__Sve_DotProductBySelectedScalar_int:RunBasicScenario_Load():this (FullOpts)
          16 (2.53 % of base) : 575424.dasm - JIT.HardwareIntrinsics.Arm._Sve.SimpleTernaryOpTest__Sve_DotProductBySelectedScalar_ulong:RunBasicScenario_Load():this (FullOpts)
          16 (2.53 % of base) : 567119.dasm - JIT.HardwareIntrinsics.Arm._Sve2.SimpleTernaryOpTest__Sve2_AbsoluteDifferenceAdd_int:RunBasicScenario_Load():this (FullOpts)
          16 (2.53 % of base) : 567144.dasm - JIT.HardwareIntrinsics.Arm._Sve2.SimpleTernaryOpTest__Sve2_AbsoluteDifferenceAdd_long:RunBasicScenario_Load():this (FullOpts)
          16 (2.53 % of base) : 567311.dasm - JIT.HardwareIntrinsics.Arm._Sve2.SimpleTernaryOpTest__Sve2_AbsoluteDifferenceAddWideningLower_long_int:RunBasicScenario_Load():this (FullOpts)
          16 (2.53 % of base) : 567416.dasm - JIT.HardwareIntrinsics.Arm._Sve2.SimpleTernaryOpTest__Sve2_AbsoluteDifferenceAddWideningUpper_int_short:RunBasicScenario_Load():this (FullOpts)
          16 (2.53 % of base) : 567479.dasm - JIT.HardwareIntrinsics.Arm._Sve2.SimpleTernaryOpTest__Sve2_AbsoluteDifferenceAddWideningUpper_uint_ushort:RunBasicScenario_Load():this (FullOpts)
          16 (2.53 % of base) : 567847.dasm - JIT.HardwareIntrinsics.Arm._Sve2.SimpleTernaryOpTest__Sve2_BitwiseClearXor_int:RunBasicScenario_Load():this (FullOpts)
          16 (2.53 % of base) : 567872.dasm - JIT.HardwareIntrinsics.Arm._Sve2.SimpleTernaryOpTest__Sve2_BitwiseClearXor_long:RunBasicScenario_Load():this (FullOpts)
          16 (2.53 % of base) : 568047.dasm - JIT.HardwareIntrinsics.Arm._Sve2.SimpleTernaryOpTest__Sve2_BitwiseSelect_int:RunBasicScenario_Load():this (FullOpts)
          16 (2.53 % of base) : 568072.dasm - JIT.HardwareIntrinsics.Arm._Sve2.SimpleTernaryOpTest__Sve2_BitwiseSelect_long:RunBasicScenario_Load():this (FullOpts)
          16 (2.53 % of base) : 568247.dasm - JIT.HardwareIntrinsics.Arm._Sve2.SimpleTernaryOpTest__Sve2_BitwiseSelectLeftInverted_int:RunBasicScenario_Load():this (FullOpts)
          16 (2.53 % of base) : 568272.dasm - JIT.HardwareIntrinsics.Arm._Sve2.SimpleTernaryOpTest__Sve2_BitwiseSelectLeftInverted_long:RunBasicScenario_Load():this (FullOpts)
          16 (2.53 % of base) : 568447.dasm - JIT.HardwareIntrinsics.Arm._Sve2.SimpleTernaryOpTest__Sve2_BitwiseSelectRightInverted_int:RunBasicScenario_Load():this (FullOpts)
          16 (2.53 % of base) : 568472.dasm - JIT.HardwareIntrinsics.Arm._Sve2.SimpleTernaryOpTest__Sve2_BitwiseSelectRightInverted_long:RunBasicScenario_Load():this (FullOpts)
          16 (2.53 % of base) : 569166.dasm - JIT.HardwareIntrinsics.Arm._Sve2.SimpleTernaryOpTest__Sve2_Xor_byte:RunBasicScenario_Load():this (FullOpts)
          16 (2.53 % of base) : 569088.dasm - JIT.HardwareIntrinsics.Arm._Sve2.SimpleTernaryOpTest__Sve2_Xor_short:RunBasicScenario_Load():this (FullOpts)
          16 (2.53 % of base) : 569192.dasm - JIT.HardwareIntrinsics.Arm._Sve2.SimpleTernaryOpTest__Sve2_Xor_ushort:RunBasicScenario_Load():this (FullOpts)

Top method improvements (percentages):
         -28 (-58.33 % of base) : 358603.dasm - PredicateInstructions:BitwiseClear():System.Numerics.Vector`1[short] (FullOpts)
         -24 (-54.55 % of base) : 358606.dasm - PredicateInstructions:ConditionalSelect():System.Numerics.Vector`1[int] (FullOpts)
         -28 (-43.75 % of base) : 679474.dasm - Runtime_1068867:TestEntryPoint() (FullOpts)
         -20 (-41.67 % of base) : 358602.dasm - PredicateInstructions:And():System.Numerics.Vector`1[short] (FullOpts)
         -20 (-41.67 % of base) : 358605.dasm - PredicateInstructions:Or():System.Numerics.Vector`1[short] (FullOpts)
         -20 (-41.67 % of base) : 358604.dasm - PredicateInstructions:Xor():System.Numerics.Vector`1[int] (FullOpts)
         -48 (-33.33 % of base) : 679471.dasm - Runtime_106868:TestEntryPoint() (FullOpts)
         -28 (-23.33 % of base) : 642641.dasm - ChangeMaskUse:CastMaskUseAsMask() (FullOpts)
         -20 (-22.73 % of base) : 679365.dasm - Runtime_105720:TestEntryPoint() (FullOpts)
         -28 (-21.21 % of base) : 679478.dasm - Runtime_106872:TestEntryPoint() (FullOpts)
         -20 (-19.23 % of base) : 642640.dasm - ChangeMaskUse:CastMaskUseAsVector() (FullOpts)
         -16 (-17.39 % of base) : 349058.dasm - EmbeddedLoads:CndSelectEmbeddedOp3LoadAllBits(int[],System.Numerics.Vector`1[int]) (FullOpts)
         -16 (-17.39 % of base) : 349060.dasm - EmbeddedLoads:CndSelectEmbeddedOp3LoadZero(int[],System.Numerics.Vector`1[int]) (FullOpts)
         -20 (-17.24 % of base) : 679570.dasm - Runtime_113338:Test() (FullOpts)
          -4 (-16.67 % of base) : 524461.dasm - System.Runtime.Intrinsics.Arm.Sve:CreateFalseMaskByte():System.Numerics.Vector`1[byte] (FullOpts)
          -4 (-16.67 % of base) : 106072.dasm - System.Runtime.Intrinsics.Arm.Sve:CreateFalseMaskByte():System.Numerics.Vector`1[byte] (Tier0)
          -4 (-16.67 % of base) : 524463.dasm - System.Runtime.Intrinsics.Arm.Sve:CreateFalseMaskDouble():System.Numerics.Vector`1[double] (FullOpts)
          -4 (-16.67 % of base) : 106082.dasm - System.Runtime.Intrinsics.Arm.Sve:CreateFalseMaskDouble():System.Numerics.Vector`1[double] (Tier0)
          -4 (-16.67 % of base) : 524465.dasm - System.Runtime.Intrinsics.Arm.Sve:CreateFalseMaskInt16():System.Numerics.Vector`1[short] (FullOpts)
          -4 (-16.67 % of base) : 106093.dasm - System.Runtime.Intrinsics.Arm.Sve:CreateFalseMaskInt16():System.Numerics.Vector`1[short] (Tier0)


benchmarks.run_pgo.linux.arm64.checked.mch

Summary of Code Size diffs:
(Lower is better)

Total bytes of base: 71813300 (overridden on cmd)
Total bytes of diff: 71813280 (overridden on cmd)
Total bytes of delta: -20 (-0.00 % of base)
    diff is an improvement.
    relative diff is an improvement.
Detail diffs


Top file improvements (bytes):
          -4 : 58518.dasm (-0.85 % of base)
          -4 : 76632.dasm (-0.49 % of base)
          -4 : 39357.dasm (-0.43 % of base)
          -4 : 14743.dasm (-0.38 % of base)
          -4 : 24532.dasm (-0.58 % of base)

5 total files with Code Size differences (5 improved, 0 regressed), 0 unchanged.

Top method improvements (bytes):
          -4 (-0.38 % of base) : 14743.dasm - SveBenchmarks.StrCmp:SveStrCmp():long:this (Instrumented Tier0)
          -4 (-0.43 % of base) : 39357.dasm - SveBenchmarks.StrCmp:SveTail():long:this (Instrumented Tier0)
          -4 (-0.58 % of base) : 24532.dasm - SveBenchmarks.StrIndexOf:SveIndexOf():int:this (Instrumented Tier0)
          -4 (-0.49 % of base) : 76632.dasm - SveBenchmarks.StrIndexOf:SveTail():int:this (Instrumented Tier0)
          -4 (-0.85 % of base) : 58518.dasm - SveBenchmarks.StrLen:SveStrLen():ulong:this (Instrumented Tier0)

Top method improvements (percentages):
          -4 (-0.85 % of base) : 58518.dasm - SveBenchmarks.StrLen:SveStrLen():ulong:this (Instrumented Tier0)
          -4 (-0.58 % of base) : 24532.dasm - SveBenchmarks.StrIndexOf:SveIndexOf():int:this (Instrumented Tier0)
          -4 (-0.49 % of base) : 76632.dasm - SveBenchmarks.StrIndexOf:SveTail():int:this (Instrumented Tier0)
          -4 (-0.43 % of base) : 39357.dasm - SveBenchmarks.StrCmp:SveTail():long:this (Instrumented Tier0)
          -4 (-0.38 % of base) : 14743.dasm - SveBenchmarks.StrCmp:SveStrCmp():long:this (Instrumented Tier0)

5 total methods with Code Size differences (5 improved, 0 regressed).


libraries.pmi.linux.arm64.checked.mch

Summary of Code Size diffs:
(Lower is better)

Total bytes of base: 68207660 (overridden on cmd)
Total bytes of diff: 68207620 (overridden on cmd)
Total bytes of delta: -40 (-0.00 % of base)
    diff is an improvement.
    relative diff is an improvement.
Detail diffs


Top file improvements (bytes):
          -4 : 11404.dasm (-16.67 % of base)
          -4 : 11402.dasm (-16.67 % of base)
          -4 : 11398.dasm (-16.67 % of base)
          -4 : 11400.dasm (-16.67 % of base)
          -4 : 11399.dasm (-16.67 % of base)
          -4 : 11403.dasm (-16.67 % of base)
          -4 : 11401.dasm (-16.67 % of base)
          -4 : 11405.dasm (-16.67 % of base)
          -4 : 11406.dasm (-16.67 % of base)
          -4 : 11407.dasm (-16.67 % of base)

10 total files with Code Size differences (10 improved, 0 regressed), 0 unchanged.

Top method improvements (bytes):
          -4 (-16.67 % of base) : 11398.dasm - System.Runtime.Intrinsics.Arm.Sve:CreateFalseMaskByte():System.Numerics.Vector`1[byte] (FullOpts)
          -4 (-16.67 % of base) : 11399.dasm - System.Runtime.Intrinsics.Arm.Sve:CreateFalseMaskDouble():System.Numerics.Vector`1[double] (FullOpts)
          -4 (-16.67 % of base) : 11400.dasm - System.Runtime.Intrinsics.Arm.Sve:CreateFalseMaskInt16():System.Numerics.Vector`1[short] (FullOpts)
          -4 (-16.67 % of base) : 11401.dasm - System.Runtime.Intrinsics.Arm.Sve:CreateFalseMaskInt32():System.Numerics.Vector`1[int] (FullOpts)
          -4 (-16.67 % of base) : 11402.dasm - System.Runtime.Intrinsics.Arm.Sve:CreateFalseMaskInt64():System.Numerics.Vector`1[long] (FullOpts)
          -4 (-16.67 % of base) : 11403.dasm - System.Runtime.Intrinsics.Arm.Sve:CreateFalseMaskSByte():System.Numerics.Vector`1[sbyte] (FullOpts)
          -4 (-16.67 % of base) : 11404.dasm - System.Runtime.Intrinsics.Arm.Sve:CreateFalseMaskSingle():System.Numerics.Vector`1[float] (FullOpts)
          -4 (-16.67 % of base) : 11405.dasm - System.Runtime.Intrinsics.Arm.Sve:CreateFalseMaskUInt16():System.Numerics.Vector`1[ushort] (FullOpts)
          -4 (-16.67 % of base) : 11406.dasm - System.Runtime.Intrinsics.Arm.Sve:CreateFalseMaskUInt32():System.Numerics.Vector`1[uint] (FullOpts)
          -4 (-16.67 % of base) : 11407.dasm - System.Runtime.Intrinsics.Arm.Sve:CreateFalseMaskUInt64():System.Numerics.Vector`1[ulong] (FullOpts)

Top method improvements (percentages):
          -4 (-16.67 % of base) : 11398.dasm - System.Runtime.Intrinsics.Arm.Sve:CreateFalseMaskByte():System.Numerics.Vector`1[byte] (FullOpts)
          -4 (-16.67 % of base) : 11399.dasm - System.Runtime.Intrinsics.Arm.Sve:CreateFalseMaskDouble():System.Numerics.Vector`1[double] (FullOpts)
          -4 (-16.67 % of base) : 11400.dasm - System.Runtime.Intrinsics.Arm.Sve:CreateFalseMaskInt16():System.Numerics.Vector`1[short] (FullOpts)
          -4 (-16.67 % of base) : 11401.dasm - System.Runtime.Intrinsics.Arm.Sve:CreateFalseMaskInt32():System.Numerics.Vector`1[int] (FullOpts)
          -4 (-16.67 % of base) : 11402.dasm - System.Runtime.Intrinsics.Arm.Sve:CreateFalseMaskInt64():System.Numerics.Vector`1[long] (FullOpts)
          -4 (-16.67 % of base) : 11403.dasm - System.Runtime.Intrinsics.Arm.Sve:CreateFalseMaskSByte():System.Numerics.Vector`1[sbyte] (FullOpts)
          -4 (-16.67 % of base) : 11404.dasm - System.Runtime.Intrinsics.Arm.Sve:CreateFalseMaskSingle():System.Numerics.Vector`1[float] (FullOpts)
          -4 (-16.67 % of base) : 11405.dasm - System.Runtime.Intrinsics.Arm.Sve:CreateFalseMaskUInt16():System.Numerics.Vector`1[ushort] (FullOpts)
          -4 (-16.67 % of base) : 11406.dasm - System.Runtime.Intrinsics.Arm.Sve:CreateFalseMaskUInt32():System.Numerics.Vector`1[uint] (FullOpts)
          -4 (-16.67 % of base) : 11407.dasm - System.Runtime.Intrinsics.Arm.Sve:CreateFalseMaskUInt64():System.Numerics.Vector`1[ulong] (FullOpts)

10 total methods with Code Size differences (10 improved, 0 regressed).
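
To put the repeated "-4 (-16.67 % of base)" entries in terms of instructions: A64 instructions are a fixed 4 bytes wide, so a 4-byte delta is a single instruction. The sketch below recovers the approximate base size from the reported delta and percentage. It is not part of the SuperPMI tooling; the helper name `estimate_base_bytes` is hypothetical, and because the report rounds the percentage, the recovered size is an estimate.

```python
A64_INSTRUCTION_BYTES = 4  # every A64 instruction is encoded in 4 bytes


def estimate_base_bytes(delta_bytes, percent_of_base):
    """Recover the approximate base size from a delta and its rounded percentage."""
    return round(delta_bytes * 100.0 / percent_of_base)


# Figures copied from the CreateFalseMask* rows above.
base = estimate_base_bytes(-4, -16.67)
diff = base + (-4)

print(base // A64_INSTRUCTION_BYTES, "->", diff // A64_INSTRUCTION_BYTES, "instructions")
```

Under these assumptions each `CreateFalseMask*` method goes from roughly 6 instructions to 5.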


benchmarks.run_pgo_optrepeat.linux.arm64.checked.mch

Summary of Code Size diffs:
(Lower is better)

Total bytes of base: 16695796 (overridden on cmd)
Total bytes of diff: 16695804 (overridden on cmd)
Total bytes of delta: 8 (0.00 % of base)
    diff is a regression.
    relative diff is a regression.
Detail diffs


Top file regressions (bytes):
           8 : 21539.dasm (5.26 % of base)
           8 : 6897.dasm (2.11 % of base)

Top file improvements (bytes):
          -4 : 26420.dasm (-1.05 % of base)
          -4 : 13109.dasm (-1.30 % of base)

4 total files with Code Size differences (2 improved, 2 regressed), 0 unchanged.

Top method regressions (bytes):
           8 (2.11 % of base) : 6897.dasm - SveBenchmarks.StrCmp:SveStrCmp():long:this (FullOpts)
           8 (5.26 % of base) : 21539.dasm - SveBenchmarks.StrLen:SveStrLen():ulong:this (FullOpts)

Top method improvements (bytes):
          -4 (-1.30 % of base) : 13109.dasm - SveBenchmarks.StrCmp:SveTail():long:this (FullOpts)
          -4 (-1.05 % of base) : 26420.dasm - SveBenchmarks.StrIndexOf:SveTail():int:this (FullOpts)

Top method regressions (percentages):
           8 (5.26 % of base) : 21539.dasm - SveBenchmarks.StrLen:SveStrLen():ulong:this (FullOpts)
           8 (2.11 % of base) : 6897.dasm - SveBenchmarks.StrCmp:SveStrCmp():long:this (FullOpts)

Top method improvements (percentages):
          -4 (-1.30 % of base) : 13109.dasm - SveBenchmarks.StrCmp:SveTail():long:this (FullOpts)
          -4 (-1.05 % of base) : 26420.dasm - SveBenchmarks.StrIndexOf:SveTail():int:this (FullOpts)

4 total methods with Code Size differences (2 improved, 2 regressed).


@a74nh
Contributor Author

a74nh commented Jun 16, 2025

After removing calls to fgMorphTryUseAllMaskVariant():

Diffs are based on 2,631,375 contexts (1,092,633 MinOpts, 1,538,742 FullOpts).

Overall (-334,028 bytes)

| Collection | Base size (bytes) | Diff size (bytes) | PerfScore in Diffs |
|---|---|---|---|
| benchmarks.run.linux.arm64.checked.mch | 16,042,184 | +8 | +2.49% |
| coreclr_tests.run.linux.arm64.checked.mch | 567,688,716 | -333,984 | -5.31% |
| benchmarks.run_pgo.linux.arm64.checked.mch | 71,813,300 | -20 | -1.47% |
| libraries.pmi.linux.arm64.checked.mch | 68,207,660 | -40 | -46.67% |
| benchmarks.run_pgo_optrepeat.linux.arm64.checked.mch | 16,695,796 | +8 | +2.49% |

MinOpts (-31,028 bytes)

| Collection | Base size (bytes) | Diff size (bytes) | PerfScore in Diffs |
|---|---|---|---|
| coreclr_tests.run.linux.arm64.checked.mch | 383,578,920 | -31,008 | -2.93% |
| benchmarks.run_pgo.linux.arm64.checked.mch | 25,146,552 | -20 | -1.47% |

FullOpts (-303,000 bytes)

| Collection | Base size (bytes) | Diff size (bytes) | PerfScore in Diffs |
|---|---|---|---|
| benchmarks.run.linux.arm64.checked.mch | 15,723,912 | +8 | +2.49% |
| coreclr_tests.run.linux.arm64.checked.mch | 184,109,796 | -302,976 | -7.75% |
| libraries.pmi.linux.arm64.checked.mch | 68,087,900 | -40 | -46.67% |
| benchmarks.run_pgo_optrepeat.linux.arm64.checked.mch | 16,330,864 | +8 | +2.49% |

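
As a quick sanity check, the section totals above can be reproduced by summing the per-collection byte deltas. The sketch below is illustrative only and is not part of the SuperPMI tooling; the `DELTAS` layout and the `section_total` helper are assumptions made for this example, and the numbers are copied from the figures above (collection names shortened by dropping the `.linux.arm64.checked.mch` suffix).

```python
# Per-collection code size deltas in bytes, copied from the summary above.
DELTAS = {
    "Overall": {
        "benchmarks.run": 8,
        "coreclr_tests.run": -333_984,
        "benchmarks.run_pgo": -20,
        "libraries.pmi": -40,
        "benchmarks.run_pgo_optrepeat": 8,
    },
    "MinOpts": {
        "coreclr_tests.run": -31_008,
        "benchmarks.run_pgo": -20,
    },
    "FullOpts": {
        "benchmarks.run": 8,
        "coreclr_tests.run": -302_976,
        "libraries.pmi": -40,
        "benchmarks.run_pgo_optrepeat": 8,
    },
}


def section_total(section):
    """Sum the byte deltas of every collection listed in one section."""
    return sum(DELTAS[section].values())


for name in DELTAS:
    print(f"{name}: {section_total(name):+,} bytes")

# The MinOpts and FullOpts totals should add up to the Overall total.
assert section_total("MinOpts") + section_total("FullOpts") == section_total("Overall")
```

The three totals match the headings (-334,028, -31,028 and -303,000 bytes), and the MinOpts and FullOpts totals add up to the overall figure.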
Example diffs
benchmarks.run.linux.arm64.checked.mch
-4 (-1.30%) : 16114.dasm - SveBenchmarks.StrCmp:SveTail():long:this (FullOpts)
@@ -7,33 +7,33 @@
 ; No matching PGO data
 ; Final local variable assignments
 ;
-;  V00 this         [V00,T09] (  5,  5   )     ref  ->   x0         this class-hnd single-def <SveBenchmarks.StrCmp>
-;  V01 loc0         [V01,T05] (  3,  9   )    mask  ->   p0         single-def <System.Numerics.Vector`1[byte]>
+;  V00 this         [V00,T08] (  5,  5   )     ref  ->   x0         this class-hnd single-def <SveBenchmarks.StrCmp>
+;* V01 loc0         [V01,T22] (  0,  0   )    mask  ->  zero-ref    single-def <System.Numerics.Vector`1[byte]>
 ;* V02 loc1         [V02    ] (  0,  0   )  simd16  ->  zero-ref    <System.Numerics.Vector`1[byte]>
 ;* V03 loc2         [V03    ] (  0,  0   )  simd16  ->  zero-ref    <System.Numerics.Vector`1[byte]>
 ;* V04 loc3         [V04    ] (  0,  0   )  simd16  ->  zero-ref    <System.Numerics.Vector`1[byte]>
 ;  V05 loc4         [V05,T00] ( 12, 41.50)     int  ->   x1        
-;  V06 loc5         [V06,T13] (  3,  6   )     int  ->   x2         single-def
-;  V07 loc6         [V07,T17] (  3,  5   )    long  ->   x4        
-;  V08 loc7         [V08,T18] (  3,  5   )    long  ->   x6        
+;  V06 loc5         [V06,T12] (  3,  6   )     int  ->   x2         single-def
+;  V07 loc6         [V07,T16] (  3,  5   )    long  ->   x4        
+;  V08 loc7         [V08,T17] (  3,  5   )    long  ->   x6        
 ;  V09 loc8         [V09    ] (  1,  0.50)     ref  ->  [fp+0x18]   must-init pinned class-hnd single-def <byte[]>
 ;  V10 loc9         [V10    ] (  1,  0.50)     ref  ->  [fp+0x10]   must-init pinned class-hnd single-def <byte[]>
-;  V11 loc10        [V11,T08] (  2,  8   )   ubyte  ->   x8        
+;  V11 loc10        [V11,T07] (  2,  8   )   ubyte  ->   x8        
 ;# V12 OutArgs      [V12    ] (  1,  1   )  struct ( 0) [sp+0x00]   do-not-enreg[XS] addr-exposed "OutgoingArgSpace" <Empty>
-;  V13 tmp1         [V13,T15] (  5,  5   )     ref  ->   x4         class-hnd single-def "dup spill" <byte[]>
-;  V14 tmp2         [V14,T16] (  5,  5   )     ref  ->   x6         class-hnd single-def "dup spill" <byte[]>
-;  V15 tmp3         [V15,T19] (  2,  2   )    long  ->   x4         "Cast away GC"
-;  V16 tmp4         [V16,T20] (  2,  2   )    long  ->   x6         "Cast away GC"
+;  V13 tmp1         [V13,T14] (  5,  5   )     ref  ->   x4         class-hnd single-def "dup spill" <byte[]>
+;  V14 tmp2         [V14,T15] (  5,  5   )     ref  ->   x6         class-hnd single-def "dup spill" <byte[]>
+;  V15 tmp3         [V15,T18] (  2,  2   )    long  ->   x4         "Cast away GC"
+;  V16 tmp4         [V16,T19] (  2,  2   )    long  ->   x6         "Cast away GC"
 ;  V17 tmp5         [V17,T01] (  3, 24   )     ref  ->   x2         "arr expr"
 ;  V18 tmp6         [V18,T02] (  3, 24   )     ref  ->   x6         "arr expr"
-;* V19 tmp7         [V19,T21] (  0,  0   )     ref  ->  zero-ref    single-def "arr expr"
-;* V20 tmp8         [V20,T22] (  0,  0   )     ref  ->  zero-ref    single-def "arr expr"
-;  V21 cse0         [V21,T06] (  3,  8.50)     int  ->   x2         "CSE #11: aggressive"
-;  V22 cse1         [V22,T07] (  3,  8.50)     int  ->   x4         "CSE #14: aggressive"
-;  V23 cse2         [V23,T14] (  3,  6   )     int  ->   x7         "CSE #07: aggressive"
-;  V24 cse3         [V24,T12] (  4,  6.50)     int  ->   x0         "CSE #06: aggressive"
-;  V25 cse4         [V25,T10] (  4,  6.50)     ref  ->   x3         "CSE #01: aggressive"
-;  V26 cse5         [V26,T11] (  4,  6.50)     ref  ->   x5         "CSE #03: aggressive"
+;* V19 tmp7         [V19,T20] (  0,  0   )     ref  ->  zero-ref    single-def "arr expr"
+;* V20 tmp8         [V20,T21] (  0,  0   )     ref  ->  zero-ref    single-def "arr expr"
+;  V21 cse0         [V21,T05] (  3,  8.50)     int  ->   x2         "CSE #11: aggressive"
+;  V22 cse1         [V22,T06] (  3,  8.50)     int  ->   x4         "CSE #14: aggressive"
+;  V23 cse2         [V23,T13] (  3,  6   )     int  ->   x7         "CSE #07: aggressive"
+;  V24 cse3         [V24,T11] (  4,  6.50)     int  ->   x0         "CSE #06: aggressive"
+;  V25 cse4         [V25,T09] (  4,  6.50)     ref  ->   x3         "CSE #01: aggressive"
+;  V26 cse5         [V26,T10] (  4,  6.50)     ref  ->   x5         "CSE #03: aggressive"
 ;  V27 cse6         [V27,T03] (  3, 12   )    long  ->   x4         "CSE #08: aggressive"
 ;  V28 cse7         [V28,T04] (  3, 12   )    long  ->   x8         "CSE #05: aggressive"
 ;
@@ -46,7 +46,6 @@ G_M892_IG01:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, n
 						;; size=12 bbWeight=1 PerfScore 2.50
 G_M892_IG02:        ; bbWeight=1, gcrefRegs=0001 {x0}, byrefRegs=0000 {}, byref, isz
             ; gcrRegs +[x0]
-            ptrue   p0.b
             mov     w1, wzr
             cntb    x2, all
             ldr     x3, [x0, #0x10]
@@ -57,7 +56,7 @@ G_M892_IG02:        ; bbWeight=1, gcrefRegs=0001 {x0}, byrefRegs=0000 {}, byref,
             ldr     w6, [x5, #0x08]
             cmp     w4, w6
             bne     G_M892_IG11
-						;; size=36 bbWeight=1 PerfScore 18.00
+						;; size=32 bbWeight=1 PerfScore 16.00
 G_M892_IG03:        ; bbWeight=0.50, gcrefRegs=0029 {x0 x3 x5}, byrefRegs=0000 {}, byref, isz
             mov     x4, x3
             ; gcrRegs +[x4]
@@ -99,14 +98,14 @@ G_M892_IG07:        ; bbWeight=1, gcrefRegs=0029 {x0 x3 x5}, byrefRegs=0000 {},
 G_M892_IG08:        ; bbWeight=4, gcrefRegs=0028 {x3 x5}, byrefRegs=0000 {}, byref, isz
             sxtw    x8, w1
             add     x9, x4, x8
+            ptrue   p0.b
             ld1b    { z16.b }, p0/z, [x9]
             add     x8, x6, x8
             ld1b    { z17.b }, p0/z, [x8]
-            ptrue   p1.b
-            cmpne   p1.b, p1/z, z16.b, z17.b
-            mov     z16.b, p1/z, #1
-            ptrue   p1.b
-            uaddv   d16, p1, z16.b
+            cmpne   p0.b, p0/z, z16.b, z17.b
+            mov     z16.b, p0/z, #1
+            ptrue   p0.b
+            uaddv   d16, p0, z16.b
             umov    x8, v16.d[0]
             uxtb    w8, w8
             cmp     w8, #0
@@ -169,7 +168,7 @@ G_M892_IG15:        ; bbWeight=0, gcVars=0000000000000000 {}, gcrefRegs=0000 {},
             brk     #0
 						;; size=8 bbWeight=0 PerfScore 0.00
 
-; Total bytes of code 308, prolog size 12, PerfScore 259.00, instruction count 77, allocated bytes for code 308 (MethodHash=5bfdfc83) for method SveBenchmarks.StrCmp:SveTail():long:this (FullOpts)
+; Total bytes of code 304, prolog size 12, PerfScore 257.00, instruction count 76, allocated bytes for code 304 (MethodHash=5bfdfc83) for method SveBenchmarks.StrCmp:SveTail():long:this (FullOpts)
 ; ============================================================
 
 Unwind Info:
@@ -180,7 +179,7 @@ Unwind Info:
   E bit             : 0
   X bit             : 0
   Vers              : 0
-  Function Length   : 77 (0x0004d) Actual length = 308 (0x000134)
+  Function Length   : 76 (0x0004c) Actual length = 304 (0x000130)
   ---- Epilog scopes ----
   ---- Scope 0
   Epilog Start Offset        : 3523193630 (0xd1ffab1e) Actual offset = 3523193630 (0xd1ffab1e) Offset from main function begin = 3523193630 (0xd1ffab1e)
-4 (-1.05%) : 26115.dasm - SveBenchmarks.StrIndexOf:SveTail():int:this (FullOpts)
@@ -21,14 +21,13 @@
 ;# V10 OutArgs      [V10    ] (  1,  1   )  struct ( 0) [sp+0x00]   do-not-enreg[XS] addr-exposed "OutgoingArgSpace" <Empty>
 ;  V11 tmp1         [V11,T05] (  5,  8   )     ref  ->   x3         class-hnd single-def "dup spill" <char[]>
 ;* V12 tmp2         [V12    ] (  0,  0   )  ushort  ->  zero-ref    "Inlining Arg"
-;* V13 tmp3         [V13    ] (  0,  0   )  simd16  ->  zero-ref    "Inlining Arg" <System.Numerics.Vector`1[short]>
-;  V14 tmp4         [V14,T11] (  2,  2   )    long  ->   x3         "Cast away GC"
-;  V15 tmp5         [V15,T01] (  3, 24   )     ref  ->   x3         "arr expr"
-;  V16 cse0         [V16,T08] (  3,  6   )     int  ->   x4         "CSE #07: aggressive"
-;  V17 cse1         [V17,T03] (  5, 10.25)     int  ->   x0         "CSE #02: aggressive"
-;  V18 cse2         [V18,T07] (  3,  6   )     ref  ->   x2         "CSE #06: aggressive"
-;  V19 cse3         [V19,T04] (  4, 10   )     int  ->   x5         "CSE #05: aggressive"
-;  V20 cse4         [V20,T10] (  2,  4.25)    mask  ->   p0         hoist "CSE #03: aggressive"
+;  V13 tmp3         [V13,T11] (  2,  2   )    long  ->   x3         "Cast away GC"
+;  V14 tmp4         [V14,T01] (  3, 24   )     ref  ->   x3         "arr expr"
+;  V15 cse0         [V15,T08] (  3,  6   )     int  ->   x4         "CSE #07: aggressive"
+;  V16 cse1         [V16,T03] (  5, 10.25)     int  ->   x0         "CSE #02: aggressive"
+;  V17 cse2         [V17,T07] (  3,  6   )     ref  ->   x2         "CSE #06: aggressive"
+;  V18 cse3         [V18,T04] (  4, 10   )     int  ->   x5         "CSE #05: aggressive"
+;  V19 cse4         [V19,T10] (  2,  4.25)    mask  ->   p0         hoist "CSE #03: aggressive"
 ;
 ; Lcl frame size = 16
 
@@ -62,14 +61,13 @@ G_M34028_IG04:        ; bbWeight=0.50, gcrefRegs=0005 {x0 x2}, byrefRegs=0000 {}
 G_M34028_IG05:        ; bbWeight=1, gcrefRegs=0005 {x0 x2}, byrefRegs=0000 {}, byref, isz
             ldrh    w4, [x0, #0x14]
             dup     v16.8h, w4
-            ptrue   p0.h
-            mov     z17.h, p0/z, #1
+            mvni    v17.4s, #0
             ldr     w0, [x0, #0x10]
             ; gcrRegs -[x0]
             cnth    x5, all
             cmp     w0, w5
             ble     G_M34028_IG10
-						;; size=32 bbWeight=1 PerfScore 15.50
+						;; size=28 bbWeight=1 PerfScore 12.00
 G_M34028_IG06:        ; bbWeight=0.25, gcrefRegs=0004 {x2}, byrefRegs=0000 {}, byref
             ptrue   p0.h
             cmpne   p0.h, p0/z, z17.h, #0
@@ -177,7 +175,7 @@ G_M34028_IG18:        ; bbWeight=0, gcVars=0000000000000000 {}, gcrefRegs=0000 {
             brk     #0
 						;; size=8 bbWeight=0 PerfScore 0.00
 
-; Total bytes of code 380, prolog size 12, PerfScore 236.38, instruction count 95, allocated bytes for code 380 (MethodHash=3e617b13) for method SveBenchmarks.StrIndexOf:SveTail():int:this (FullOpts)
+; Total bytes of code 376, prolog size 12, PerfScore 232.88, instruction count 94, allocated bytes for code 376 (MethodHash=3e617b13) for method SveBenchmarks.StrIndexOf:SveTail():int:this (FullOpts)
 ; ============================================================
 
 Unwind Info:
@@ -188,7 +186,7 @@ Unwind Info:
   E bit             : 0
   X bit             : 0
   Vers              : 0
-  Function Length   : 95 (0x0005f) Actual length = 380 (0x00017c)
+  Function Length   : 94 (0x0005e) Actual length = 376 (0x000178)
   ---- Epilog scopes ----
   ---- Scope 0
   Epilog Start Offset        : 3523193630 (0xd1ffab1e) Actual offset = 3523193630 (0xd1ffab1e) Offset from main function begin = 3523193630 (0xd1ffab1e)
+8 (+2.11%) : 8287.dasm - SveBenchmarks.StrCmp:SveStrCmp():long:this (FullOpts)
@@ -8,31 +8,31 @@
 ; 0 inlinees with PGO data; 1 single block inlinees; 0 inlinees without PGO data
 ; Final local variable assignments
 ;
-;  V00 this         [V00,T06] (  9,  7   )     ref  ->   x0         this class-hnd single-def <SveBenchmarks.StrCmp>
+;  V00 this         [V00,T05] (  9,  7   )     ref  ->   x0         this class-hnd single-def <SveBenchmarks.StrCmp>
 ;  V01 loc0         [V01,T02] (  6, 17.50)     int  ->   x1        
 ;  V02 loc1         [V02,T04] (  5, 10   )     int  ->   x2         single-def
-;  V03 loc2         [V03,T05] (  4, 10   )    mask  ->   p0         single-def <System.Numerics.Vector`1[byte]>
-;  V04 loc3         [V04,T01] (  6, 18   )    mask  ->   p1         <System.Numerics.Vector`1[byte]>
+;* V03 loc2         [V03,T19] (  0,  0   )    mask  ->  zero-ref    single-def <System.Numerics.Vector`1[byte]>
+;  V04 loc3         [V04,T01] (  6, 18   )    mask  ->   p0         <System.Numerics.Vector`1[byte]>
 ;  V05 loc4         [V05,T20] (  4, 13   )  simd16  ->  d16         <System.Numerics.Vector`1[byte]>
 ;* V06 loc5         [V06    ] (  0,  0   )  simd16  ->  zero-ref    <System.Numerics.Vector`1[byte]>
 ;* V07 loc6         [V07    ] (  0,  0   )  simd16  ->  zero-ref    <System.Numerics.Vector`1[byte]>
-;  V08 loc7         [V08,T11] (  3,  5   )    long  ->   x4        
-;  V09 loc8         [V09,T12] (  3,  5   )    long  ->   x5        
+;  V08 loc7         [V08,T10] (  3,  5   )    long  ->   x4        
+;  V09 loc8         [V09,T11] (  3,  5   )    long  ->   x5        
 ;  V10 loc9         [V10    ] (  1,  0.50)     ref  ->  [fp+0x28]   must-init pinned class-hnd single-def <byte[]>
 ;  V11 loc10        [V11    ] (  1,  0.50)     ref  ->  [fp+0x20]   must-init pinned class-hnd single-def <byte[]>
-;  V12 loc11        [V12,T10] (  4,  5   )     int  ->   x3        
-;  V13 loc12        [V13,T18] (  3,  1.50)     int  ->   x3         single-def
+;  V12 loc11        [V12,T09] (  4,  5   )     int  ->   x3        
+;  V13 loc12        [V13,T17] (  3,  1.50)     int  ->   x3         single-def
 ;  V14 loc13        [V14,T00] (  7, 22.50)     int  ->   x4        
 ;# V15 OutArgs      [V15    ] (  1,  1   )  struct ( 0) [sp+0x00]   do-not-enreg[XS] addr-exposed "OutgoingArgSpace" <Empty>
-;  V16 tmp1         [V16,T08] (  5,  5   )     ref  ->   x4         class-hnd single-def "dup spill" <byte[]>
-;  V17 tmp2         [V17,T09] (  5,  5   )     ref  ->   x5         class-hnd single-def "dup spill" <byte[]>
-;  V18 tmp3         [V18,T16] (  2,  2   )    long  ->   x4         "Cast away GC"
-;  V19 tmp4         [V19,T17] (  2,  2   )    long  ->   x5         "Cast away GC"
-;  V20 tmp5         [V20,T13] (  3,  3   )     ref  ->   x2         single-def "arr expr"
-;  V21 tmp6         [V21,T14] (  3,  3   )     ref  ->   x0         single-def "arr expr"
-;  V22 cse0         [V22,T07] (  3,  6   )     int  ->   x3         "CSE #05: aggressive"
-;  V23 cse1         [V23,T19] (  3,  1.50)    long  ->   x3         "CSE #08: moderate"
-;  V24 cse2         [V24,T15] (  4,  2   )     int  ->   x1         "CSE #07: moderate"
+;  V16 tmp1         [V16,T07] (  5,  5   )     ref  ->   x4         class-hnd single-def "dup spill" <byte[]>
+;  V17 tmp2         [V17,T08] (  5,  5   )     ref  ->   x5         class-hnd single-def "dup spill" <byte[]>
+;  V18 tmp3         [V18,T15] (  2,  2   )    long  ->   x4         "Cast away GC"
+;  V19 tmp4         [V19,T16] (  2,  2   )    long  ->   x5         "Cast away GC"
+;  V20 tmp5         [V20,T12] (  3,  3   )     ref  ->   x2         single-def "arr expr"
+;  V21 tmp6         [V21,T13] (  3,  3   )     ref  ->   x0         single-def "arr expr"
+;  V22 cse0         [V22,T06] (  3,  6   )     int  ->   x3         "CSE #05: aggressive"
+;  V23 cse1         [V23,T18] (  3,  1.50)    long  ->   x3         "CSE #08: moderate"
+;  V24 cse2         [V24,T14] (  4,  2   )     int  ->   x1         "CSE #07: moderate"
 ;  V25 cse3         [V25,T03] (  3, 12   )    long  ->   x6         "CSE #06: aggressive"
 ;  V26 rat0         [V26,T21] (  3,  9   )  simd16  ->  [fp+0x10]   do-not-enreg[S] "SIMDInitTempVar"
 ;
@@ -47,10 +47,9 @@ G_M14759_IG02:        ; bbWeight=1, gcrefRegs=0001 {x0}, byrefRegs=0000 {}, byre
             ; gcrRegs +[x0]
             mov     w1, wzr
             cntb    x2, all
-            ptrue   p0.b
             ldr     w3, [x0, #0x20]
             mov     w4, wzr
-            whilelt p1.b, w4, w3
+            whilelt p0.b, w4, w3
             movi    v16.4s, #0
             ldr     x4, [x0, #0x10]
             ; gcrRegs +[x4]
@@ -62,7 +61,7 @@ G_M14759_IG02:        ; bbWeight=1, gcrefRegs=0001 {x0}, byrefRegs=0000 {}, byre
             ; gcrRegs -[x5]
             cmp     w4, w5
             bne     G_M14759_IG14
-						;; size=52 bbWeight=1 PerfScore 24.00
+						;; size=48 bbWeight=1 PerfScore 22.00
 G_M14759_IG03:        ; bbWeight=0.50, gcrefRegs=0001 {x0}, byrefRegs=0000 {}, byref, isz
             ldr     x4, [x0, #0x10]
             ; gcrRegs +[x4]
@@ -96,27 +95,30 @@ G_M14759_IG06:        ; bbWeight=0.50, gcrefRegs=0001 {x0}, byrefRegs=0000 {}, b
             mov     x5, xzr
 						;; size=4 bbWeight=0.50 PerfScore 0.25
 G_M14759_IG07:        ; bbWeight=1, gcrefRegs=0001 {x0}, byrefRegs=0000 {}, byref, isz
-            ptest   p0, p1.b
+            ptrue   p1.b
+            ptest   p1, p0.b
             bge     G_M14759_IG09
-						;; size=8 bbWeight=1 PerfScore 3.00
+						;; size=12 bbWeight=1 PerfScore 5.00
 G_M14759_IG08:        ; bbWeight=4, gcrefRegs=0001 {x0}, byrefRegs=0000 {}, byref, isz
             sxtw    x6, w1
             add     x7, x4, x6
-            ld1b    { z16.b }, p1/z, [x7]
+            ld1b    { z16.b }, p0/z, [x7]
             add     x6, x5, x6
-            ld1b    { z17.b }, p1/z, [x6]
+            ld1b    { z17.b }, p0/z, [x6]
+            ptrue   p0.b
+            cmpne   p0.b, p0/z, z16.b, z17.b
+            mov     z16.b, p0/z, #1
+            ptrue   p0.b
+            cmpne   p0.b, p0/z, z16.b, #0
             ptrue   p1.b
-            cmpne   p1.b, p1/z, z16.b, z17.b
-            mov     z16.b, p1/z, #1
-            ptrue   p1.b
-            cmpne   p1.b, p1/z, z16.b, #0
-            ptest   p0, p1.b
+            ptest   p1, p0.b
             bne     G_M14759_IG09
             add     w1, w1, w2
-            whilelt p1.b, w1, w3
-            ptest   p0, p1.b
+            whilelt p0.b, w1, w3
+            ptrue   p1.b
+            ptest   p1, p0.b
             blt     G_M14759_IG08
-						;; size=64 bbWeight=4 PerfScore 152.00
+						;; size=72 bbWeight=4 PerfScore 168.00
 G_M14759_IG09:        ; bbWeight=0.50, gcrefRegs=0001 {x0}, byrefRegs=0000 {}, byref, isz
             mov     w3, wzr
             mov     w4, wzr
@@ -198,7 +200,7 @@ G_M14759_IG19:        ; bbWeight=0, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
             brk     #0
 						;; size=8 bbWeight=0 PerfScore 0.00
 
-; Total bytes of code 380, prolog size 12, PerfScore 248.50, instruction count 95, allocated bytes for code 380 (MethodHash=5df7c658) for method SveBenchmarks.StrCmp:SveStrCmp():long:this (FullOpts)
+; Total bytes of code 388, prolog size 12, PerfScore 264.50, instruction count 97, allocated bytes for code 388 (MethodHash=5df7c658) for method SveBenchmarks.StrCmp:SveStrCmp():long:this (FullOpts)
 ; ============================================================
 
 Unwind Info:
@@ -209,7 +211,7 @@ Unwind Info:
   E bit             : 0
   X bit             : 0
   Vers              : 0
-  Function Length   : 95 (0x0005f) Actual length = 380 (0x00017c)
+  Function Length   : 97 (0x00061) Actual length = 388 (0x000184)
   ---- Epilog scopes ----
   ---- Scope 0
   Epilog Start Offset        : 3523193630 (0xd1ffab1e) Actual offset = 3523193630 (0xd1ffab1e) Offset from main function begin = 3523193630 (0xd1ffab1e)
+8 (+5.26%) : 21403.dasm - SveBenchmarks.StrLen:SveStrLen():ulong:this (FullOpts)
@@ -8,19 +8,20 @@
 ; Final local variable assignments
 ;
 ;  V00 this         [V00,T05] (  4,  4   )     ref  ->   x0         this class-hnd single-def <SveBenchmarks.StrLen>
-;  V01 loc0         [V01,T04] (  3,  7   )    mask  ->   p0         single-def <System.Numerics.Vector`1[byte]>
+;  V01 loc0         [V01,T11] (  2,  3   )  simd16  ->  d16         <System.Numerics.Vector`1[byte]>
 ;* V02 loc1         [V02    ] (  0,  0   )    mask  ->  zero-ref    <System.Numerics.Vector`1[byte]>
-;  V03 loc2         [V03,T10] (  5, 13   )  simd16  ->  d16         <System.Numerics.Vector`1[byte]>
+;  V03 loc2         [V03,T10] (  5, 13   )  simd16  ->  d17         <System.Numerics.Vector`1[byte]>
 ;  V04 loc3         [V04,T00] (  6, 18   )    long  ->   x1        
 ;  V05 loc4         [V05,T07] (  2,  5   )    long  ->   x2         single-def
-;  V06 loc5         [V06,T01] (  5, 12   )    mask  ->   p1         <System.Numerics.Vector`1[byte]>
-;  V07 loc6         [V07,T03] (  4,  7   )    long  ->   x0        
+;  V06 loc5         [V06,T01] (  5, 12   )    mask  ->   p0         <System.Numerics.Vector`1[byte]>
+;  V07 loc6         [V07,T04] (  4,  7   )    long  ->   x0        
 ;  V08 loc7         [V08    ] (  1,  1   )     ref  ->  [fp+0x18]   must-init pinned class-hnd single-def <byte[]>
 ;# V09 OutArgs      [V09    ] (  1,  1   )  struct ( 0) [sp+0x00]   do-not-enreg[XS] addr-exposed "OutgoingArgSpace" <Empty>
 ;  V10 tmp1         [V10,T02] (  5,  8   )     ref  ->   x0         class-hnd single-def "dup spill" <byte[]>
 ;  V11 tmp2         [V11,T08] (  2,  2   )    long  ->   x0         "Cast away GC"
 ;  V12 cse0         [V12,T06] (  3,  6   )     int  ->   x3         "CSE #02: aggressive"
-;  V13 cse1         [V13,T09] (  2,  1   )     int  ->   x4         "CSE #01: moderate"
+;  V13 cse1         [V13,T03] (  3,  8   )    mask  ->   p1         "CSE #03: aggressive"
+;  V14 cse2         [V14,T09] (  2,  1   )     int  ->   x4         "CSE #01: moderate"
 ;
 ; Lcl frame size = 16
 
@@ -31,16 +32,16 @@ G_M60402_IG01:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref,
 						;; size=12 bbWeight=1 PerfScore 2.50
 G_M60402_IG02:        ; bbWeight=1, gcrefRegs=0001 {x0}, byrefRegs=0000 {}, byref, isz
             ; gcrRegs +[x0]
-            ptrue   p0.b
+            mvni    v16.4s, #0
             mov     x1, xzr
             cntb    x2, all
             ldr     w3, [x0, #0x18]
             mov     w4, wzr
-            whilelt p1.b, w4, w3
+            whilelt p0.b, w4, w3
             ldr     x0, [x0, #0x08]
             str     x0, [fp, #0x18]	// [V08 loc7]
             cbz     x0, G_M60402_IG04
-						;; size=36 bbWeight=1 PerfScore 15.00
+						;; size=36 bbWeight=1 PerfScore 13.50
 G_M60402_IG03:        ; bbWeight=0.50, gcrefRegs=0001 {x0}, byrefRegs=0000 {}, byref, isz
             ldr     w4, [x0, #0x08]
             cbz     w4, G_M60402_IG04
@@ -54,28 +55,30 @@ G_M60402_IG04:        ; bbWeight=0.50, gcrefRegs=0000 {}, byrefRegs=0000 {}, byr
             mov     x0, xzr
 						;; size=4 bbWeight=0.50 PerfScore 0.25
 G_M60402_IG05:        ; bbWeight=2, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, isz
-            ld1b    { z16.b }, p1/z, [x0]
-            movi    v17.4s, #0
+            ld1b    { z17.b }, p0/z, [x0]
+            ptrue   p1.b
+            cmpne   p1.b, p1/z, z16.b, #0
+            movi    v16.4s, #0
             ptrue   p2.b
-            cmpeq   p2.b, p2/z, z16.b, z17.b
-            ptest   p0, p2.b
+            cmpeq   p2.b, p2/z, z17.b, z16.b
+            ptest   p1, p2.b
             bne     G_M60402_IG07
-						;; size=24 bbWeight=2 PerfScore 33.00
+						;; size=32 bbWeight=2 PerfScore 43.00
 G_M60402_IG06:        ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, isz
             add     x1, x1, x2
-            whilelt p1.b, w1, w3
+            whilelt p0.b, w1, w3
             add     x4, x0, x1
-            ld1b    { z16.b }, p1/z, [x4]
-            movi    v17.4s, #0
+            ld1b    { z17.b }, p0/z, [x4]
+            movi    v16.4s, #0
             ptrue   p2.b
-            cmpeq   p2.b, p2/z, z16.b, z17.b
-            ptest   p0, p2.b
+            cmpeq   p2.b, p2/z, z17.b, z16.b
+            ptest   p1, p2.b
             beq     G_M60402_IG06
 						;; size=36 bbWeight=4 PerfScore 78.00
 G_M60402_IG07:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
-            ptrue   p0.b
-            cmpne   p0.b, p0/z, z16.b, #0
-            cntp    x0, p1, p0.b
+            ptrue   p1.b
+            cmpne   p1.b, p1/z, z17.b, #0
+            cntp    x0, p0, p1.b
             add     x0, x0, x1
 						;; size=16 bbWeight=1 PerfScore 7.50
 G_M60402_IG08:        ; bbWeight=1, epilog, nogc, extend
@@ -83,7 +86,7 @@ G_M60402_IG08:        ; bbWeight=1, epilog, nogc, extend
             ret     lr
 						;; size=8 bbWeight=1 PerfScore 2.00
 
-; Total bytes of code 152, prolog size 12, PerfScore 141.00, instruction count 38, allocated bytes for code 152 (MethodHash=b293140d) for method SveBenchmarks.StrLen:SveStrLen():ulong:this (FullOpts)
+; Total bytes of code 160, prolog size 12, PerfScore 149.50, instruction count 40, allocated bytes for code 160 (MethodHash=b293140d) for method SveBenchmarks.StrLen:SveStrLen():ulong:this (FullOpts)
 ; ============================================================
 
 Unwind Info:
@@ -94,7 +97,7 @@ Unwind Info:
   E bit             : 0
   X bit             : 0
   Vers              : 0
-  Function Length   : 38 (0x00026) Actual length = 152 (0x000098)
+  Function Length   : 40 (0x00028) Actual length = 160 (0x0000a0)
   ---- Epilog scopes ----
   ---- Scope 0
   Epilog Start Offset        : 3523193630 (0xd1ffab1e) Actual offset = 3523193630 (0xd1ffab1e) Offset from main function begin = 3523193630 (0xd1ffab1e)
coreclr_tests.run.linux.arm64.checked.mch
-28 (-58.33%) : 358603.dasm - PredicateInstructions:BitwiseClear():System.Numerics.Vector`1[short] (FullOpts)
@@ -17,22 +17,15 @@ G_M44742_IG01:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref,
             mov     fp, sp
 						;; size=8 bbWeight=1 PerfScore 1.50
 G_M44742_IG02:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
-            pfalse  p0.b
-            ptrue   p1.h
-            ptrue   p2.h
-            ptrue   p3.h
-            bic     p1.b, p3/z, p1.b, p2.b
-            pfalse  p2.b
-            sel     p0.b, p0, p1.b, p2.b
-            mov     z0.h, p0/z, #1
-						;; size=32 bbWeight=1 PerfScore 16.00
+            movi    v0.4s, #0
+						;; size=4 bbWeight=1 PerfScore 0.50
 G_M44742_IG03:        ; bbWeight=1, epilog, nogc, extend
             ldp     fp, lr, [sp], #0x10
             ret     lr
 						;; size=8 bbWeight=1 PerfScore 2.00
 ; END METHOD PredicateInstructions:BitwiseClear():System.Numerics.Vector`1[short]
 
-; Total bytes of code 48, prolog size 8, PerfScore 19.50, instruction count 12, allocated bytes for code 48 (MethodHash=71345139) for method PredicateInstructions:BitwiseClear():System.Numerics.Vector`1[short] (FullOpts)
+; Total bytes of code 20, prolog size 8, PerfScore 4.00, instruction count 5, allocated bytes for code 20 (MethodHash=71345139) for method PredicateInstructions:BitwiseClear():System.Numerics.Vector`1[short] (FullOpts)
 ; ============================================================
 
 Unwind Info:
@@ -43,7 +36,7 @@ Unwind Info:
   E bit             : 0
   X bit             : 0
   Vers              : 0
-  Function Length   : 12 (0x0000c) Actual length = 48 (0x000030)
+  Function Length   : 5 (0x00005) Actual length = 20 (0x000014)
   ---- Epilog scopes ----
   ---- Scope 0
   Epilog Start Offset        : 3523193630 (0xd1ffab1e) Actual offset = 3523193630 (0xd1ffab1e) Offset from main function begin = 3523193630 (0xd1ffab1e)
-24 (-54.55%) : 358606.dasm - PredicateInstructions:ConditionalSelect():System.Numerics.Vector`1[int] (FullOpts)
@@ -17,21 +17,15 @@ G_M19455_IG01:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref,
             mov     fp, sp
 						;; size=8 bbWeight=1 PerfScore 1.50
 G_M19455_IG02:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
-            ptrue   p0.s
-            movi    v0.4s, #0
-            cmpne   p0.s, p0/z, z0.s, #0
-            pfalse  p1.b
-            ptrue   p2.s
-            sel     p0.b, p0, p1.b, p2.b
-            mov     z0.s, p0/z, #1
-						;; size=28 bbWeight=1 PerfScore 13.50
+            mvni    v0.4s, #0
+						;; size=4 bbWeight=1 PerfScore 0.50
 G_M19455_IG03:        ; bbWeight=1, epilog, nogc, extend
             ldp     fp, lr, [sp], #0x10
             ret     lr
 						;; size=8 bbWeight=1 PerfScore 2.00
 ; END METHOD PredicateInstructions:ConditionalSelect():System.Numerics.Vector`1[int]
 
-; Total bytes of code 44, prolog size 8, PerfScore 17.00, instruction count 11, allocated bytes for code 44 (MethodHash=0304b400) for method PredicateInstructions:ConditionalSelect():System.Numerics.Vector`1[int] (FullOpts)
+; Total bytes of code 20, prolog size 8, PerfScore 4.00, instruction count 5, allocated bytes for code 20 (MethodHash=0304b400) for method PredicateInstructions:ConditionalSelect():System.Numerics.Vector`1[int] (FullOpts)
 ; ============================================================
 
 Unwind Info:
@@ -42,7 +36,7 @@ Unwind Info:
   E bit             : 0
   X bit             : 0
   Vers              : 0
-  Function Length   : 11 (0x0000b) Actual length = 44 (0x00002c)
+  Function Length   : 5 (0x00005) Actual length = 20 (0x000014)
   ---- Epilog scopes ----
   ---- Scope 0
   Epilog Start Offset        : 3523193630 (0xd1ffab1e) Actual offset = 3523193630 (0xd1ffab1e) Offset from main function begin = 3523193630 (0xd1ffab1e)
-28 (-43.75%) : 679474.dasm - Runtime_1068867:TestEntryPoint() (FullOpts)
@@ -16,7 +16,6 @@
 ;* V05 tmp1         [V05    ] (  0,  0   )    long  ->  zero-ref    class-hnd exact "NewObj constructor temp" <C0>
 ;* V06 tmp2         [V06    ] (  0,  0   )  simd16  ->  zero-ref    "location for address-of(RValue)"
 ;* V07 tmp3         [V07    ] (  0,  0   )  struct (16) zero-ref    do-not-enreg[SF] "stack allocated C0" <C0>
-;  V08 cse0         [V08,T00] (  3,  3   )    mask  ->   p0         "CSE #01: aggressive"
 ;
 ; Lcl frame size = 0
 
@@ -24,28 +23,19 @@ G_M538_IG01:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, n
             stp     fp, lr, [sp, #-0x10]!
             mov     fp, sp
 						;; size=8 bbWeight=1 PerfScore 1.50
-G_M538_IG02:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, isz
-            ptrue   p0.s
+G_M538_IG02:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
             movi    v0.4s, #0
-            cmpne   p0.s, p0/z, z0.s, #0
-            movi    v0.4s, #0
-            ldr     q16, [@RWD00]
-            sel     z0.s, p0, z0.s, z16.s
-            movi    v16.4s, #0
-            sel     z0.s, p0, z0.s, z16.s
             movz    x0, #0xD1FFAB1E      // code for <unknown method>
             movk    x0, #0xD1FFAB1E LSL #16
             movk    x0, #0xD1FFAB1E LSL #32
             ldr     x0, [x0]
-						;; size=48 bbWeight=1 PerfScore 17.00
+						;; size=20 bbWeight=1 PerfScore 5.00
 G_M538_IG03:        ; bbWeight=1, epilog, nogc, extend
             ldp     fp, lr, [sp], #0x10
             br      x0
 						;; size=8 bbWeight=1 PerfScore 2.00
-RWD00  	dq	0000000000000001h, 0000000000000000h
 
-
-; Total bytes of code 64, prolog size 8, PerfScore 20.50, instruction count 16, allocated bytes for code 64 (MethodHash=1c40fde5) for method Runtime_1068867:TestEntryPoint() (FullOpts)
+; Total bytes of code 36, prolog size 8, PerfScore 8.50, instruction count 9, allocated bytes for code 36 (MethodHash=1c40fde5) for method Runtime_1068867:TestEntryPoint() (FullOpts)
 ; ============================================================
 
 Unwind Info:
@@ -56,7 +46,7 @@ Unwind Info:
   E bit             : 0
   X bit             : 0
   Vers              : 0
-  Function Length   : 16 (0x00010) Actual length = 64 (0x000040)
+  Function Length   : 9 (0x00009) Actual length = 36 (0x000024)
   ---- Epilog scopes ----
   ---- Scope 0
   Epilog Start Offset        : 3523193630 (0xd1ffab1e) Actual offset = 3523193630 (0xd1ffab1e) Offset from main function begin = 3523193630 (0xd1ffab1e)
+16 (+2.53%) : 575424.dasm - JIT.HardwareIntrinsics.Arm._Sve.SimpleTernaryOpTest__Sve_DotProductBySelectedScalar_ulong:RunBasicScenario_Load():this (FullOpts)
@@ -9,12 +9,12 @@
 ; Final local variable assignments
 ;
 ;  V00 this         [V00,T02] (  4,  4   )     ref  ->  x19         this class-hnd single-def <JIT.HardwareIntrinsics.Arm._Sve.SimpleTernaryOpTest__Sve_DotProductBySelectedScalar_ulong>
-;* V01 loc0         [V01,T30] (  0,  0   )    mask  ->  zero-ref    single-def <System.Numerics.Vector`1[ushort]>
-;  V02 loc1         [V02,T29] (  3,  3   )    mask  ->  [fp+0x10]   spill-single-def <System.Numerics.Vector`1[ushort]>
-;  V03 loc2         [V03,T33] (  2,  2   )  simd16  ->   d8         <System.Numerics.Vector`1[ulong]>
+;* V01 loc0         [V01,T34] (  0,  0   )  simd16  ->  zero-ref    <System.Numerics.Vector`1[ushort]>
+;  V02 loc1         [V02,T32] (  2,  2   )  simd16  ->   d8         <System.Numerics.Vector`1[ushort]>
+;  V03 loc2         [V03,T33] (  2,  2   )  simd16  ->  d10         <System.Numerics.Vector`1[ulong]>
 ;# V04 OutArgs      [V04    ] (  1,  1   )  struct ( 0) [sp+0x00]   do-not-enreg[XS] addr-exposed "OutgoingArgSpace" <Empty>
-;  V05 tmp1         [V05,T31] (  2,  4   )  simd16  ->   d8         "impAppendStmt"
-;  V06 tmp2         [V06,T32] (  2,  4   )  simd16  ->  d10         "impAppendStmt"
+;  V05 tmp1         [V05,T30] (  2,  4   )  simd16  ->  d10         "impAppendStmt"
+;  V06 tmp2         [V06,T31] (  2,  4   )  simd16  ->   d8         "impAppendStmt"
 ;  V07 tmp3         [V07,T18] (  2,  4   )    long  ->  x21         "impAppendStmt"
 ;  V08 tmp4         [V08,T19] (  2,  4   )    long  ->  x22         "impAppendStmt"
 ;  V09 tmp5         [V09,T20] (  2,  4   )    long  ->  x23         "impAppendStmt"
@@ -51,21 +51,23 @@
 ;* V40 tmp36        [V40    ] (  0,  0   )    long  ->  zero-ref    ld-addr-op "Inline stloc first use temp"
 ;  V41 tmp37        [V41,T28] (  2,  4   )    long  ->   x0         "Inlining Arg"
 ;  V42 tmp38        [V42,T17] (  3,  6   )    long  ->   x4         "Inlining Arg"
-;  V43 cse0         [V43,T00] (  9,  9   )   byref  ->  x20         "CSE #02: aggressive"
+;  V43 cse0         [V43,T29] (  3,  3   )    mask  ->  [fp+0x18]   spill-single-def "CSE #02: moderate"
+;  V44 cse1         [V44,T00] (  9,  9   )   byref  ->  x20         "CSE #01: aggressive"
 ;
-; Lcl frame size = 8
+; Lcl frame size = 16
 
 G_M33034_IG01:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, nogc <-- Prolog IG
-            stp     fp, lr, [sp, #-0x60]!
-            stp     d8, d9, [sp, #0x18]
-            stp     d10, d11, [sp, #0x28]
-            stp     x19, x20, [sp, #0x38]
-            stp     x21, x22, [sp, #0x48]
-            str     x23, [sp, #0x58]
+            stp     fp, lr, [sp, #-0x70]!
+            stp     d8, d9, [sp, #0x20]
+            stp     d10, d11, [sp, #0x30]
+            str     d12, [sp, #0x40]
+            stp     x19, x20, [sp, #0x48]
+            stp     x21, x22, [sp, #0x58]
+            str     x23, [sp, #0x68]
             mov     fp, sp
             mov     x19, x0
             ; gcrRegs +[x19]
-						;; size=32 bbWeight=1 PerfScore 7.00
+						;; size=36 bbWeight=1 PerfScore 8.00
 G_M33034_IG02:        ; bbWeight=1, gcrefRegs=80000 {x19}, byrefRegs=0000 {}, byref
             movz    x0, #0xD1FFAB1E
             movk    x0, #0xD1FFAB1E LSL #16
@@ -85,9 +87,7 @@ G_M33034_IG02:        ; bbWeight=1, gcrefRegs=80000 {x19}, byrefRegs=0000 {}, by
             ldr     x1, [x1]
             blr     x1
             ; gcrRegs -[x0]
-            ptrue   p0.h
-            add     xip1, fp, #16
-            str     p0, [xip1]
+            mvni    v8.4s, #0
             add     x20, x19, #96
             ; byrRegs +[x20]
             mov     x21, x20
@@ -99,22 +99,6 @@ G_M33034_IG02:        ; bbWeight=1, gcrefRegs=80000 {x19}, byrefRegs=0000 {}, by
             movk    x1, #0xD1FFAB1E LSL #16
             movk    x1, #0xD1FFAB1E LSL #32
             ldr     x1, [x1]
-            blr     x1
-            ; byrRegs -[x0]
-            ldr     x1, [x21, #0x20]
-            add     x0, x0, x1
-            sub     x0, x0, #1
-            sub     x1, x1, #1
-            bic     x0, x0, x1
-            ptrue   p0.d
-            ld1d    { z8.d }, p0/z, [x0]
-            mov     x21, x20
-            add     x0, x21, #48
-            ; byrRegs +[x0]
-            movz    x1, #0xD1FFAB1E      // code for <unknown method>
-            movk    x1, #0xD1FFAB1E LSL #16
-            movk    x1, #0xD1FFAB1E LSL #32
-            ldr     x1, [x1]
             mov     v9.d[0], v8.d[1]
             blr     x1
             ; byrRegs -[x0]
@@ -123,11 +107,10 @@ G_M33034_IG02:        ; bbWeight=1, gcrefRegs=80000 {x19}, byrefRegs=0000 {}, by
             sub     x0, x0, #1
             sub     x1, x1, #1
             bic     x0, x0, x1
-            add     xip1, fp, #16
-            ldr     p0, [xip1]
-            ld1h    { z10.h }, p0/z, [x0]
+            ptrue   p0.d
+            ld1d    { z10.d }, p0/z, [x0]
             mov     x21, x20
-            add     x0, x21, #56
+            add     x0, x21, #48
             ; byrRegs +[x0]
             movz    x1, #0xD1FFAB1E      // code for <unknown method>
             movk    x1, #0xD1FFAB1E LSL #16
@@ -141,20 +124,20 @@ G_M33034_IG02:        ; bbWeight=1, gcrefRegs=80000 {x19}, byrefRegs=0000 {}, by
             sub     x0, x0, #1
             sub     x1, x1, #1
             bic     x0, x0, x1
-            add     xip1, fp, #16
-            ldr     p0, [xip1]
-            ld1h    { z7.h }, p0/z, [x0]
+            ptrue   p0.h
             mov     v8.d[1], v9.d[0]
-            mov     v10.d[1], v11.d[0]
-            udot    z8.d, z10.h, z7.h[1]
+            cmpne   p0.h, p0/z, z8.h, #0
+            add     xip1, fp, #24
+            str     p0, [xip1]
+            ld1h    { z8.h }, p0/z, [x0]
             mov     x21, x20
-            add     x0, x21, #64
+            add     x0, x21, #56
             ; byrRegs +[x0]
             movz    x1, #0xD1FFAB1E      // code for <unknown method>
             movk    x1, #0xD1FFAB1E LSL #16
             movk    x1, #0xD1FFAB1E LSL #32
             ldr     x1, [x1]
-            mov     v9.d[0], v8.d[1]
+            mov     v12.d[0], v8.d[1]
             blr     x1
             ; byrRegs -[x0]
             ldr     x1, [x21, #0x20]
@@ -162,8 +145,29 @@ G_M33034_IG02:        ; bbWeight=1, gcrefRegs=80000 {x19}, byrefRegs=0000 {}, by
             sub     x0, x0, #1
             sub     x1, x1, #1
             bic     x0, x0, x1
-            mov     v8.d[1], v9.d[0]
-            str     q8, [x0]
+            add     xip1, fp, #24
+            ldr     p0, [xip1]
+            ld1h    { z7.h }, p0/z, [x0]
+            mov     v10.d[1], v11.d[0]
+            mov     v8.d[1], v12.d[0]
+            udot    z10.d, z8.h, z7.h[1]
+            mov     x21, x20
+            add     x0, x21, #64
+            ; byrRegs +[x0]
+            movz    x1, #0xD1FFAB1E      // code for <unknown method>
+            movk    x1, #0xD1FFAB1E LSL #16
+            movk    x1, #0xD1FFAB1E LSL #32
+            ldr     x1, [x1]
+            mov     v8.d[0], v10.d[1]
+            blr     x1
+            ; byrRegs -[x0]
+            ldr     x1, [x21, #0x20]
+            add     x0, x0, x1
+            sub     x0, x0, #1
+            sub     x1, x1, #1
+            bic     x0, x0, x1
+            mov     v10.d[1], v8.d[0]
+            str     q10, [x0]
             mov     x21, x20
             add     x0, x21, #40
             ; byrRegs +[x0]
@@ -236,29 +240,30 @@ G_M33034_IG02:        ; bbWeight=1, gcrefRegs=80000 {x19}, byrefRegs=0000 {}, by
             movk    x6, #0xD1FFAB1E LSL #16
             movk    x6, #0xD1FFAB1E LSL #32
             ldr     x6, [x6]
-						;; size=572 bbWeight=1 PerfScore 168.50
+						;; size=580 bbWeight=1 PerfScore 168.50
 G_M33034_IG03:        ; bbWeight=1, epilog, nogc, extend
-            ldr     x23, [sp, #0x58]
-            ldp     x21, x22, [sp, #0x48]
-            ldp     x19, x20, [sp, #0x38]
-            ldp     d10, d11, [sp, #0x28]
-            ldp     d8, d9, [sp, #0x18]
-            ldp     fp, lr, [sp], #0x60
+            ldr     x23, [sp, #0x68]
+            ldp     x21, x22, [sp, #0x58]
+            ldp     x19, x20, [sp, #0x48]
+            ldr     d12, [sp, #0x40]
+            ldp     d10, d11, [sp, #0x30]
+            ldp     d8, d9, [sp, #0x20]
+            ldp     fp, lr, [sp], #0x70
             br      x6
-						;; size=28 bbWeight=1 PerfScore 8.00
+						;; size=32 bbWeight=1 PerfScore 10.00
 
-; Total bytes of code 632, prolog size 28, PerfScore 183.50, instruction count 158, allocated bytes for code 632 (MethodHash=1d6a7ef5) for method JIT.HardwareIntrinsics.Arm._Sve.SimpleTernaryOpTest__Sve_DotProductBySelectedScalar_ulong:RunBasicScenario_Load():this (FullOpts)
+; Total bytes of code 648, prolog size 32, PerfScore 186.50, instruction count 162, allocated bytes for code 648 (MethodHash=1d6a7ef5) for method JIT.HardwareIntrinsics.Arm._Sve.SimpleTernaryOpTest__Sve_DotProductBySelectedScalar_ulong:RunBasicScenario_Load():this (FullOpts)
 ; ============================================================
 
 Unwind Info:
   >> Start offset   : 0x000000 (not in unwind data)
   >>   End offset   : 0xd1ffab1e (not in unwind data)
-  Code Words        : 3
+  Code Words        : 4
   Epilog Count      : 1
   E bit             : 0
   X bit             : 0
   Vers              : 0
-  Function Length   : 158 (0x0009e) Actual length = 632 (0x000278)
+  Function Length   : 162 (0x000a2) Actual length = 648 (0x000288)
   ---- Epilog scopes ----
   ---- Scope 0
   Epilog Start Offset        : 3523193630 (0xd1ffab1e) Actual offset = 3523193630 (0xd1ffab1e) Offset from main function begin = 3523193630 (0xd1ffab1e)
@@ -266,12 +271,15 @@ Unwind Info:
   ---- Unwind codes ----
     E1          set_fp; mov fp, sp
     ---- Epilog start at index 1 ----
-    D1 0B       save_reg X#4 Z#11 (0x0B); str x23, [sp, #88]
+    D1 0D       save_reg X#4 Z#13 (0x0D); str x23, [sp, #104]
     E6          save_next
-    C8 07       save_regp X#0 Z#7 (0x07); stp x19, x20, [sp, #56]
+    C8 09       save_regp X#0 Z#9 (0x09); stp x19, x20, [sp, #72]
+    DD 08       save_freg X#4 Z#8 (0x08); str d12, [sp, #64]
     E6          save_next
-    D8 03       save_fregp X#0 Z#3 (0x03); stp d8, d9, [sp, #24]
-    8B          save_fplr_x #11 (0x0B); stp fp, lr, [sp, #-96]!
+    D8 04       save_fregp X#0 Z#4 (0x04); stp d8, d9, [sp, #32]
+    8D          save_fplr_x #13 (0x0D); stp fp, lr, [sp, #-112]!
+    E4          end
+    E4          end
     E4          end
     E4          end
 
+16 (+2.53%) : 575272.dasm - JIT.HardwareIntrinsics.Arm._Sve.SimpleTernaryOpTest__Sve_DotProduct_int:RunBasicScenario_Load():this (FullOpts)
@@ -9,12 +9,12 @@
 ; Final local variable assignments
 ;
 ;  V00 this         [V00,T02] (  4,  4   )     ref  ->  x19         this class-hnd single-def <JIT.HardwareIntrinsics.Arm._Sve.SimpleTernaryOpTest__Sve_DotProduct_int>
-;* V01 loc0         [V01,T30] (  0,  0   )    mask  ->  zero-ref    single-def <System.Numerics.Vector`1[sbyte]>
-;  V02 loc1         [V02,T29] (  3,  3   )    mask  ->  [fp+0x10]   spill-single-def <System.Numerics.Vector`1[sbyte]>
-;  V03 loc2         [V03,T33] (  2,  2   )  simd16  ->   d8         <System.Numerics.Vector`1[int]>
+;* V01 loc0         [V01,T34] (  0,  0   )  simd16  ->  zero-ref    <System.Numerics.Vector`1[sbyte]>
+;  V02 loc1         [V02,T32] (  2,  2   )  simd16  ->   d8         <System.Numerics.Vector`1[sbyte]>
+;  V03 loc2         [V03,T33] (  2,  2   )  simd16  ->  d10         <System.Numerics.Vector`1[int]>
 ;# V04 OutArgs      [V04    ] (  1,  1   )  struct ( 0) [sp+0x00]   do-not-enreg[XS] addr-exposed "OutgoingArgSpace" <Empty>
-;  V05 tmp1         [V05,T31] (  2,  4   )  simd16  ->   d8         "impAppendStmt"
-;  V06 tmp2         [V06,T32] (  2,  4   )  simd16  ->  d10         "impAppendStmt"
+;  V05 tmp1         [V05,T30] (  2,  4   )  simd16  ->  d10         "impAppendStmt"
+;  V06 tmp2         [V06,T31] (  2,  4   )  simd16  ->   d8         "impAppendStmt"
 ;  V07 tmp3         [V07,T18] (  2,  4   )    long  ->  x21         "impAppendStmt"
 ;  V08 tmp4         [V08,T19] (  2,  4   )    long  ->  x22         "impAppendStmt"
 ;  V09 tmp5         [V09,T20] (  2,  4   )    long  ->  x23         "impAppendStmt"
@@ -51,21 +51,23 @@
 ;* V40 tmp36        [V40    ] (  0,  0   )    long  ->  zero-ref    ld-addr-op "Inline stloc first use temp"
 ;  V41 tmp37        [V41,T28] (  2,  4   )    long  ->   x0         "Inlining Arg"
 ;  V42 tmp38        [V42,T17] (  3,  6   )    long  ->   x4         "Inlining Arg"
-;  V43 cse0         [V43,T00] (  9,  9   )   byref  ->  x20         "CSE #02: aggressive"
+;  V43 cse0         [V43,T29] (  3,  3   )    mask  ->  [fp+0x18]   spill-single-def "CSE #02: moderate"
+;  V44 cse1         [V44,T00] (  9,  9   )   byref  ->  x20         "CSE #01: aggressive"
 ;
-; Lcl frame size = 8
+; Lcl frame size = 16
 
 G_M55930_IG01:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, nogc <-- Prolog IG
-            stp     fp, lr, [sp, #-0x60]!
-            stp     d8, d9, [sp, #0x18]
-            stp     d10, d11, [sp, #0x28]
-            stp     x19, x20, [sp, #0x38]
-            stp     x21, x22, [sp, #0x48]
-            str     x23, [sp, #0x58]
+            stp     fp, lr, [sp, #-0x70]!
+            stp     d8, d9, [sp, #0x20]
+            stp     d10, d11, [sp, #0x30]
+            str     d12, [sp, #0x40]
+            stp     x19, x20, [sp, #0x48]
+            stp     x21, x22, [sp, #0x58]
+            str     x23, [sp, #0x68]
             mov     fp, sp
             mov     x19, x0
             ; gcrRegs +[x19]
-						;; size=32 bbWeight=1 PerfScore 7.00
+						;; size=36 bbWeight=1 PerfScore 8.00
 G_M55930_IG02:        ; bbWeight=1, gcrefRegs=80000 {x19}, byrefRegs=0000 {}, byref
             movz    x0, #0xD1FFAB1E
             movk    x0, #0xD1FFAB1E LSL #16
@@ -85,9 +87,7 @@ G_M55930_IG02:        ; bbWeight=1, gcrefRegs=80000 {x19}, byrefRegs=0000 {}, by
             ldr     x1, [x1]
             blr     x1
             ; gcrRegs -[x0]
-            ptrue   p0.b
-            add     xip1, fp, #16
-            str     p0, [xip1]
+            mvni    v8.4s, #0
             add     x20, x19, #96
             ; byrRegs +[x20]
             mov     x21, x20
@@ -99,22 +99,6 @@ G_M55930_IG02:        ; bbWeight=1, gcrefRegs=80000 {x19}, byrefRegs=0000 {}, by
             movk    x1, #0xD1FFAB1E LSL #16
             movk    x1, #0xD1FFAB1E LSL #32
             ldr     x1, [x1]
-            blr     x1
-            ; byrRegs -[x0]
-            ldr     x1, [x21, #0x20]
-            add     x0, x0, x1
-            sub     x0, x0, #1
-            sub     x1, x1, #1
-            bic     x0, x0, x1
-            ptrue   p0.s
-            ld1w    { z8.s }, p0/z, [x0]
-            mov     x21, x20
-            add     x0, x21, #48
-            ; byrRegs +[x0]
-            movz    x1, #0xD1FFAB1E      // code for <unknown method>
-            movk    x1, #0xD1FFAB1E LSL #16
-            movk    x1, #0xD1FFAB1E LSL #32
-            ldr     x1, [x1]
             mov     v9.d[0], v8.d[1]
             blr     x1
             ; byrRegs -[x0]
@@ -123,11 +107,10 @@ G_M55930_IG02:        ; bbWeight=1, gcrefRegs=80000 {x19}, byrefRegs=0000 {}, by
             sub     x0, x0, #1
             sub     x1, x1, #1
             bic     x0, x0, x1
-            add     xip1, fp, #16
-            ldr     p0, [xip1]
-            ld1b    { z10.b }, p0/z, [x0]
+            ptrue   p0.s
+            ld1w    { z10.s }, p0/z, [x0]
             mov     x21, x20
-            add     x0, x21, #56
+            add     x0, x21, #48
             ; byrRegs +[x0]
             movz    x1, #0xD1FFAB1E      // code for <unknown method>
             movk    x1, #0xD1FFAB1E LSL #16
@@ -141,20 +124,20 @@ G_M55930_IG02:        ; bbWeight=1, gcrefRegs=80000 {x19}, byrefRegs=0000 {}, by
             sub     x0, x0, #1
             sub     x1, x1, #1
             bic     x0, x0, x1
-            add     xip1, fp, #16
-            ldr     p0, [xip1]
-            ld1b    { z16.b }, p0/z, [x0]
+            ptrue   p0.b
             mov     v8.d[1], v9.d[0]
-            mov     v10.d[1], v11.d[0]
-            sdot    z8.s, z10.b, z16.b
+            cmpne   p0.b, p0/z, z8.b, #0
+            add     xip1, fp, #24
+            str     p0, [xip1]
+            ld1b    { z8.b }, p0/z, [x0]
             mov     x21, x20
-            add     x0, x21, #64
+            add     x0, x21, #56
             ; byrRegs +[x0]
             movz    x1, #0xD1FFAB1E      // code for <unknown method>
             movk    x1, #0xD1FFAB1E LSL #16
             movk    x1, #0xD1FFAB1E LSL #32
             ldr     x1, [x1]
-            mov     v9.d[0], v8.d[1]
+            mov     v12.d[0], v8.d[1]
             blr     x1
             ; byrRegs -[x0]
             ldr     x1, [x21, #0x20]
@@ -162,8 +145,29 @@ G_M55930_IG02:        ; bbWeight=1, gcrefRegs=80000 {x19}, byrefRegs=0000 {}, by
             sub     x0, x0, #1
             sub     x1, x1, #1
             bic     x0, x0, x1
-            mov     v8.d[1], v9.d[0]
-            str     q8, [x0]
+            add     xip1, fp, #24
+            ldr     p0, [xip1]
+            ld1b    { z16.b }, p0/z, [x0]
+            mov     v10.d[1], v11.d[0]
+            mov     v8.d[1], v12.d[0]
+            sdot    z10.s, z8.b, z16.b
+            mov     x21, x20
+            add     x0, x21, #64
+            ; byrRegs +[x0]
+            movz    x1, #0xD1FFAB1E      // code for <unknown method>
+            movk    x1, #0xD1FFAB1E LSL #16
+            movk    x1, #0xD1FFAB1E LSL #32
+            ldr     x1, [x1]
+            mov     v8.d[0], v10.d[1]
+            blr     x1
+            ; byrRegs -[x0]
+            ldr     x1, [x21, #0x20]
+            add     x0, x0, x1
+            sub     x0, x0, #1
+            sub     x1, x1, #1
+            bic     x0, x0, x1
+            mov     v10.d[1], v8.d[0]
+            str     q10, [x0]
             mov     x21, x20
             add     x0, x21, #40
             ; byrRegs +[x0]
@@ -236,29 +240,30 @@ G_M55930_IG02:        ; bbWeight=1, gcrefRegs=80000 {x19}, byrefRegs=0000 {}, by
             movk    x6, #0xD1FFAB1E LSL #16
             movk    x6, #0xD1FFAB1E LSL #32
             ldr     x6, [x6]
-						;; size=572 bbWeight=1 PerfScore 168.50
+						;; size=580 bbWeight=1 PerfScore 168.50
 G_M55930_IG03:        ; bbWeight=1, epilog, nogc, extend
-            ldr     x23, [sp, #0x58]
-            ldp     x21, x22, [sp, #0x48]
-            ldp     x19, x20, [sp, #0x38]
-            ldp     d10, d11, [sp, #0x28]
-            ldp     d8, d9, [sp, #0x18]
-            ldp     fp, lr, [sp], #0x60
+            ldr     x23, [sp, #0x68]
+            ldp     x21, x22, [sp, #0x58]
+            ldp     x19, x20, [sp, #0x48]
+            ldr     d12, [sp, #0x40]
+            ldp     d10, d11, [sp, #0x30]
+            ldp     d8, d9, [sp, #0x20]
+            ldp     fp, lr, [sp], #0x70
             br      x6
-						;; size=28 bbWeight=1 PerfScore 8.00
+						;; size=32 bbWeight=1 PerfScore 10.00
 
-; Total bytes of code 632, prolog size 28, PerfScore 183.50, instruction count 158, allocated bytes for code 632 (MethodHash=b01a2585) for method JIT.HardwareIntrinsics.Arm._Sve.SimpleTernaryOpTest__Sve_DotProduct_int:RunBasicScenario_Load():this (FullOpts)
+; Total bytes of code 648, prolog size 32, PerfScore 186.50, instruction count 162, allocated bytes for code 648 (MethodHash=b01a2585) for method JIT.HardwareIntrinsics.Arm._Sve.SimpleTernaryOpTest__Sve_DotProduct_int:RunBasicScenario_Load():this (FullOpts)
 ; ============================================================
 
 Unwind Info:
   >> Start offset   : 0x000000 (not in unwind data)
   >>   End offset   : 0xd1ffab1e (not in unwind data)
-  Code Words        : 3
+  Code Words        : 4
   Epilog Count      : 1
   E bit             : 0
   X bit             : 0
   Vers              : 0
-  Function Length   : 158 (0x0009e) Actual length = 632 (0x000278)
+  Function Length   : 162 (0x000a2) Actual length = 648 (0x000288)
   ---- Epilog scopes ----
   ---- Scope 0
   Epilog Start Offset        : 3523193630 (0xd1ffab1e) Actual offset = 3523193630 (0xd1ffab1e) Offset from main function begin = 3523193630 (0xd1ffab1e)
@@ -266,12 +271,15 @@ Unwind Info:
   ---- Unwind codes ----
     E1          set_fp; mov fp, sp
     ---- Epilog start at index 1 ----
-    D1 0B       save_reg X#4 Z#11 (0x0B); str x23, [sp, #88]
+    D1 0D       save_reg X#4 Z#13 (0x0D); str x23, [sp, #104]
     E6          save_next
-    C8 07       save_regp X#0 Z#7 (0x07); stp x19, x20, [sp, #56]
+    C8 09       save_regp X#0 Z#9 (0x09); stp x19, x20, [sp, #72]
+    DD 08       save_freg X#4 Z#8 (0x08); str d12, [sp, #64]
     E6          save_next
-    D8 03       save_fregp X#0 Z#3 (0x03); stp d8, d9, [sp, #24]
-    8B          save_fplr_x #11 (0x0B); stp fp, lr, [sp, #-96]!
+    D8 04       save_fregp X#0 Z#4 (0x04); stp d8, d9, [sp, #32]
+    8D          save_fplr_x #13 (0x0D); stp fp, lr, [sp, #-112]!
+    E4          end
+    E4          end
     E4          end
     E4          end
 
+16 (+2.53%) : 569192.dasm - JIT.HardwareIntrinsics.Arm._Sve2.SimpleTernaryOpTest__Sve2_Xor_ushort:RunBasicScenario_Load():this (FullOpts)
@@ -9,12 +9,12 @@
 ; Final local variable assignments
 ;
 ;  V00 this         [V00,T02] (  4,  4   )     ref  ->  x19         this class-hnd single-def <JIT.HardwareIntrinsics.Arm._Sve2.SimpleTernaryOpTest__Sve2_Xor_ushort>
-;* V01 loc0         [V01,T30] (  0,  0   )    mask  ->  zero-ref    single-def <System.Numerics.Vector`1[ushort]>
-;  V02 loc1         [V02,T29] (  3,  3   )    mask  ->  [fp+0x10]   spill-single-def <System.Numerics.Vector`1[ushort]>
-;  V03 loc2         [V03,T33] (  2,  2   )  simd16  ->   d8         <System.Numerics.Vector`1[ushort]>
+;* V01 loc0         [V01,T34] (  0,  0   )  simd16  ->  zero-ref    <System.Numerics.Vector`1[ushort]>
+;  V02 loc1         [V02,T32] (  2,  2   )  simd16  ->   d8         <System.Numerics.Vector`1[ushort]>
+;  V03 loc2         [V03,T33] (  2,  2   )  simd16  ->  d10         <System.Numerics.Vector`1[ushort]>
 ;# V04 OutArgs      [V04    ] (  1,  1   )  struct ( 0) [sp+0x00]   do-not-enreg[XS] addr-exposed "OutgoingArgSpace" <Empty>
-;  V05 tmp1         [V05,T31] (  2,  4   )  simd16  ->   d8         "impAppendStmt"
-;  V06 tmp2         [V06,T32] (  2,  4   )  simd16  ->  d10         "impAppendStmt"
+;  V05 tmp1         [V05,T30] (  2,  4   )  simd16  ->  d10         "impAppendStmt"
+;  V06 tmp2         [V06,T31] (  2,  4   )  simd16  ->   d8         "impAppendStmt"
 ;  V07 tmp3         [V07,T18] (  2,  4   )    long  ->  x21         "impAppendStmt"
 ;  V08 tmp4         [V08,T19] (  2,  4   )    long  ->  x22         "impAppendStmt"
 ;  V09 tmp5         [V09,T20] (  2,  4   )    long  ->  x23         "impAppendStmt"
@@ -51,21 +51,23 @@
 ;* V40 tmp36        [V40    ] (  0,  0   )    long  ->  zero-ref    ld-addr-op "Inline stloc first use temp"
 ;  V41 tmp37        [V41,T28] (  2,  4   )    long  ->   x0         "Inlining Arg"
 ;  V42 tmp38        [V42,T17] (  3,  6   )    long  ->   x4         "Inlining Arg"
-;  V43 cse0         [V43,T00] (  9,  9   )   byref  ->  x20         "CSE #02: aggressive"
+;  V43 cse0         [V43,T29] (  3,  3   )    mask  ->  [fp+0x18]   spill-single-def "CSE #02: moderate"
+;  V44 cse1         [V44,T00] (  9,  9   )   byref  ->  x20         "CSE #01: aggressive"
 ;
-; Lcl frame size = 8
+; Lcl frame size = 16
 
 G_M13407_IG01:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, nogc <-- Prolog IG
-            stp     fp, lr, [sp, #-0x60]!
-            stp     d8, d9, [sp, #0x18]
-            stp     d10, d11, [sp, #0x28]
-            stp     x19, x20, [sp, #0x38]
-            stp     x21, x22, [sp, #0x48]
-            str     x23, [sp, #0x58]
+            stp     fp, lr, [sp, #-0x70]!
+            stp     d8, d9, [sp, #0x20]
+            stp     d10, d11, [sp, #0x30]
+            str     d12, [sp, #0x40]
+            stp     x19, x20, [sp, #0x48]
+            stp     x21, x22, [sp, #0x58]
+            str     x23, [sp, #0x68]
             mov     fp, sp
             mov     x19, x0
             ; gcrRegs +[x19]
-						;; size=32 bbWeight=1 PerfScore 7.00
+						;; size=36 bbWeight=1 PerfScore 8.00
 G_M13407_IG02:        ; bbWeight=1, gcrefRegs=80000 {x19}, byrefRegs=0000 {}, byref
             movz    x0, #0xD1FFAB1E
             movk    x0, #0xD1FFAB1E LSL #16
@@ -85,9 +87,7 @@ G_M13407_IG02:        ; bbWeight=1, gcrefRegs=80000 {x19}, byrefRegs=0000 {}, by
             ldr     x1, [x1]
             blr     x1
             ; gcrRegs -[x0]
-            ptrue   p0.h
-            add     xip1, fp, #16
-            str     p0, [xip1]
+            mvni    v8.4s, #0
             add     x20, x19, #96
             ; byrRegs +[x20]
             mov     x21, x20
@@ -99,22 +99,6 @@ G_M13407_IG02:        ; bbWeight=1, gcrefRegs=80000 {x19}, byrefRegs=0000 {}, by
             movk    x1, #0xD1FFAB1E LSL #16
             movk    x1, #0xD1FFAB1E LSL #32
             ldr     x1, [x1]
-            blr     x1
-            ; byrRegs -[x0]
-            ldr     x1, [x21, #0x20]
-            add     x0, x0, x1
-            sub     x0, x0, #1
-            sub     x1, x1, #1
-            bic     x0, x0, x1
-            ptrue   p0.h
-            ld1h    { z8.h }, p0/z, [x0]
-            mov     x21, x20
-            add     x0, x21, #48
-            ; byrRegs +[x0]
-            movz    x1, #0xD1FFAB1E      // code for <unknown method>
-            movk    x1, #0xD1FFAB1E LSL #16
-            movk    x1, #0xD1FFAB1E LSL #32
-            ldr     x1, [x1]
             mov     v9.d[0], v8.d[1]
             blr     x1
             ; byrRegs -[x0]
@@ -123,11 +107,10 @@ G_M13407_IG02:        ; bbWeight=1, gcrefRegs=80000 {x19}, byrefRegs=0000 {}, by
             sub     x0, x0, #1
             sub     x1, x1, #1
             bic     x0, x0, x1
-            add     xip1, fp, #16
-            ldr     p0, [xip1]
+            ptrue   p0.h
             ld1h    { z10.h }, p0/z, [x0]
             mov     x21, x20
-            add     x0, x21, #56
+            add     x0, x21, #48
             ; byrRegs +[x0]
             movz    x1, #0xD1FFAB1E      // code for <unknown method>
             movk    x1, #0xD1FFAB1E LSL #16
@@ -141,20 +124,20 @@ G_M13407_IG02:        ; bbWeight=1, gcrefRegs=80000 {x19}, byrefRegs=0000 {}, by
             sub     x0, x0, #1
             sub     x1, x1, #1
             bic     x0, x0, x1
-            add     xip1, fp, #16
-            ldr     p0, [xip1]
-            ld1h    { z16.h }, p0/z, [x0]
+            ptrue   p0.h
             mov     v8.d[1], v9.d[0]
-            mov     v10.d[1], v11.d[0]
-            eor3    z8.d, z8.d, z10.d, z16.d
+            cmpne   p0.h, p0/z, z8.h, #0
+            add     xip1, fp, #24
+            str     p0, [xip1]
+            ld1h    { z8.h }, p0/z, [x0]
             mov     x21, x20
-            add     x0, x21, #64
+            add     x0, x21, #56
             ; byrRegs +[x0]
             movz    x1, #0xD1FFAB1E      // code for <unknown method>
             movk    x1, #0xD1FFAB1E LSL #16
             movk    x1, #0xD1FFAB1E LSL #32
             ldr     x1, [x1]
-            mov     v9.d[0], v8.d[1]
+            mov     v12.d[0], v8.d[1]
             blr     x1
             ; byrRegs -[x0]
             ldr     x1, [x21, #0x20]
@@ -162,8 +145,29 @@ G_M13407_IG02:        ; bbWeight=1, gcrefRegs=80000 {x19}, byrefRegs=0000 {}, by
             sub     x0, x0, #1
             sub     x1, x1, #1
             bic     x0, x0, x1
-            mov     v8.d[1], v9.d[0]
-            str     q8, [x0]
+            add     xip1, fp, #24
+            ldr     p0, [xip1]
+            ld1h    { z16.h }, p0/z, [x0]
+            mov     v10.d[1], v11.d[0]
+            mov     v8.d[1], v12.d[0]
+            eor3    z10.d, z10.d, z8.d, z16.d
+            mov     x21, x20
+            add     x0, x21, #64
+            ; byrRegs +[x0]
+            movz    x1, #0xD1FFAB1E      // code for <unknown method>
+            movk    x1, #0xD1FFAB1E LSL #16
+            movk    x1, #0xD1FFAB1E LSL #32
+            ldr     x1, [x1]
+            mov     v8.d[0], v10.d[1]
+            blr     x1
+            ; byrRegs -[x0]
+            ldr     x1, [x21, #0x20]
+            add     x0, x0, x1
+            sub     x0, x0, #1
+            sub     x1, x1, #1
+            bic     x0, x0, x1
+            mov     v10.d[1], v8.d[0]
+            str     q10, [x0]
             mov     x21, x20
             add     x0, x21, #40
             ; byrRegs +[x0]
@@ -236,29 +240,30 @@ G_M13407_IG02:        ; bbWeight=1, gcrefRegs=80000 {x19}, byrefRegs=0000 {}, by
             movk    x6, #0xD1FFAB1E LSL #16
             movk    x6, #0xD1FFAB1E LSL #32
             ldr     x6, [x6]
-						;; size=572 bbWeight=1 PerfScore 166.50
+						;; size=580 bbWeight=1 PerfScore 166.50
 G_M13407_IG03:        ; bbWeight=1, epilog, nogc, extend
-            ldr     x23, [sp, #0x58]
-            ldp     x21, x22, [sp, #0x48]
-            ldp     x19, x20, [sp, #0x38]
-            ldp     d10, d11, [sp, #0x28]
-            ldp     d8, d9, [sp, #0x18]
-            ldp     fp, lr, [sp], #0x60
+            ldr     x23, [sp, #0x68]
+            ldp     x21, x22, [sp, #0x58]
+            ldp     x19, x20, [sp, #0x48]
+            ldr     d12, [sp, #0x40]
+            ldp     d10, d11, [sp, #0x30]
+            ldp     d8, d9, [sp, #0x20]
+            ldp     fp, lr, [sp], #0x70
             br      x6
-						;; size=28 bbWeight=1 PerfScore 8.00
+						;; size=32 bbWeight=1 PerfScore 10.00
 
-; Total bytes of code 632, prolog size 28, PerfScore 181.50, instruction count 158, allocated bytes for code 632 (MethodHash=f1c3cba0) for method JIT.HardwareIntrinsics.Arm._Sve2.SimpleTernaryOpTest__Sve2_Xor_ushort:RunBasicScenario_Load():this (FullOpts)
+; Total bytes of code 648, prolog size 32, PerfScore 184.50, instruction count 162, allocated bytes for code 648 (MethodHash=f1c3cba0) for method JIT.HardwareIntrinsics.Arm._Sve2.SimpleTernaryOpTest__Sve2_Xor_ushort:RunBasicScenario_Load():this (FullOpts)
 ; ============================================================
 
 Unwind Info:
   >> Start offset   : 0x000000 (not in unwind data)
   >>   End offset   : 0xd1ffab1e (not in unwind data)
-  Code Words        : 3
+  Code Words        : 4
   Epilog Count      : 1
   E bit             : 0
   X bit             : 0
   Vers              : 0
-  Function Length   : 158 (0x0009e) Actual length = 632 (0x000278)
+  Function Length   : 162 (0x000a2) Actual length = 648 (0x000288)
   ---- Epilog scopes ----
   ---- Scope 0
   Epilog Start Offset        : 3523193630 (0xd1ffab1e) Actual offset = 3523193630 (0xd1ffab1e) Offset from main function begin = 3523193630 (0xd1ffab1e)
@@ -266,12 +271,15 @@ Unwind Info:
   ---- Unwind codes ----
     E1          set_fp; mov fp, sp
     ---- Epilog start at index 1 ----
-    D1 0B       save_reg X#4 Z#11 (0x0B); str x23, [sp, #88]
+    D1 0D       save_reg X#4 Z#13 (0x0D); str x23, [sp, #104]
     E6          save_next
-    C8 07       save_regp X#0 Z#7 (0x07); stp x19, x20, [sp, #56]
+    C8 09       save_regp X#0 Z#9 (0x09); stp x19, x20, [sp, #72]
+    DD 08       save_freg X#4 Z#8 (0x08); str d12, [sp, #64]
     E6          save_next
-    D8 03       save_fregp X#0 Z#3 (0x03); stp d8, d9, [sp, #24]
-    8B          save_fplr_x #11 (0x0B); stp fp, lr, [sp, #-96]!
+    D8 04       save_fregp X#0 Z#4 (0x04); stp d8, d9, [sp, #32]
+    8D          save_fplr_x #13 (0x0D); stp fp, lr, [sp, #-112]!
+    E4          end
+    E4          end
     E4          end
     E4          end
 
benchmarks.run_pgo.linux.arm64.checked.mch
-4 (-0.85%) : 58518.dasm - SveBenchmarks.StrLen:SveStrLen():ulong:this (Instrumented Tier0)
@@ -36,8 +36,7 @@ G_M60402_IG01:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref,
 G_M60402_IG02:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, isz
             mov     w0, #0xD1FFAB1E
             str     w0, [fp, #0x20]	// [V11 tmp2]
-            ptrue   p0.b
-            mov     z16.b, p0/z, #1
+            mvni    v16.4s, #0
             str     q16, [fp, #0x80]	// [V01 loc0]
             str     xzr, [fp, #0x58]	// [V04 loc3]
             cntb    x0, all
@@ -62,7 +61,7 @@ G_M60402_IG02:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref,
             ldr     w0, [x0, #0x08]
             ; gcrRegs -[x0]
             cbnz    w0, G_M60402_IG05
-						;; size=96 bbWeight=1 PerfScore 40.50
+						;; size=92 bbWeight=1 PerfScore 37.00
 G_M60402_IG03:        ; bbWeight=0.50, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
             movz    x0, #0xD1FFAB1E
             movk    x0, #0xD1FFAB1E LSL #16
@@ -179,7 +178,7 @@ G_M60402_IG11:        ; bbWeight=0, gcVars=0000000000000000 {}, gcrefRegs=0000 {
             brk     #0
 						;; size=8 bbWeight=0 PerfScore 0.00
 
-; Total bytes of code 472, prolog size 36, PerfScore 168.27, instruction count 118, allocated bytes for code 472 (MethodHash=b293140d) for method SveBenchmarks.StrLen:SveStrLen():ulong:this (Instrumented Tier0)
+; Total bytes of code 468, prolog size 36, PerfScore 164.77, instruction count 117, allocated bytes for code 468 (MethodHash=b293140d) for method SveBenchmarks.StrLen:SveStrLen():ulong:this (Instrumented Tier0)
 ; ============================================================
 
 Unwind Info:
@@ -190,7 +189,7 @@ Unwind Info:
   E bit             : 0
   X bit             : 0
   Vers              : 0
-  Function Length   : 118 (0x00076) Actual length = 472 (0x0001d8)
+  Function Length   : 117 (0x00075) Actual length = 468 (0x0001d4)
   ---- Epilog scopes ----
   ---- Scope 0
   Epilog Start Offset        : 3523193630 (0xd1ffab1e) Actual offset = 3523193630 (0xd1ffab1e) Offset from main function begin = 3523193630 (0xd1ffab1e)
-4 (-0.58%) : 24532.dasm - SveBenchmarks.StrIndexOf:SveIndexOf():int:this (Instrumented Tier0)
@@ -229,8 +229,7 @@ G_M22667_IG17:        ; bbWeight=0.01, gcrefRegs=0000 {}, byrefRegs=0000 {}, byr
 						;; size=12 bbWeight=0.01 PerfScore 0.02
 G_M22667_IG18:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, isz
             ptrue   p0.h
-            mov     z16.h, p0/z, #1
-            ptrue   p0.h
+            mvni    v16.4s, #0
             cmpne   p0.h, p0/z, z16.h, #0
             ptrue   p1.h
             ldr     q16, [fp, #0x50]	// [V05 loc4]
@@ -249,7 +248,7 @@ G_M22667_IG18:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref,
             bl      CORINFO_HELP_COUNTPROFILE32
             ; gcr arg pop 0
             movn    w0, #0
-						;; size=76 bbWeight=1 PerfScore 25.50
+						;; size=72 bbWeight=1 PerfScore 22.00
 G_M22667_IG19:        ; bbWeight=1, epilog, nogc, extend
             ldp     fp, lr, [sp], #0x90
             ret     lr
@@ -265,7 +264,7 @@ G_M22667_IG21:        ; bbWeight=0, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
             brk     #0
 						;; size=8 bbWeight=0 PerfScore 0.00
 
-; Total bytes of code 688, prolog size 36, PerfScore 211.04, instruction count 172, allocated bytes for code 688 (MethodHash=8b05a774) for method SveBenchmarks.StrIndexOf:SveIndexOf():int:this (Instrumented Tier0)
+; Total bytes of code 684, prolog size 36, PerfScore 207.54, instruction count 171, allocated bytes for code 684 (MethodHash=8b05a774) for method SveBenchmarks.StrIndexOf:SveIndexOf():int:this (Instrumented Tier0)
 ; ============================================================
 
 Unwind Info:
@@ -276,7 +275,7 @@ Unwind Info:
   E bit             : 0
   X bit             : 0
   Vers              : 0
-  Function Length   : 172 (0x000ac) Actual length = 688 (0x0002b0)
+  Function Length   : 171 (0x000ab) Actual length = 684 (0x0002ac)
   ---- Epilog scopes ----
   ---- Scope 0
   Epilog Start Offset        : 3523193630 (0xd1ffab1e) Actual offset = 3523193630 (0xd1ffab1e) Offset from main function begin = 3523193630 (0xd1ffab1e)
-4 (-0.49%) : 76632.dasm - SveBenchmarks.StrIndexOf:SveTail():int:this (Instrumented Tier0)
@@ -95,8 +95,7 @@ G_M34028_IG06:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
             ldr     x2, [x2]
             blr     x2
             ; gcr arg pop 0
-            ptrue   p0.h
-            mov     z0.h, p0/z, #1
+            mvni    v0.4s, #0
             movz    x0, #0xD1FFAB1E      // code for <unknown method>
             movk    x0, #0xD1FFAB1E LSL #16
             movk    x0, #0xD1FFAB1E LSL #32
@@ -105,7 +104,7 @@ G_M34028_IG06:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
             ; gcr arg pop 0
             str     q0, [fp, #0x50]	// [V05 loc4]
             b       G_M34028_IG16
-						;; size=68 bbWeight=1 PerfScore 22.50
+						;; size=64 bbWeight=1 PerfScore 19.00
 G_M34028_IG07:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, isz
             ldr     w0, [fp, #0x84]	// [V01 loc0]
             sxtw    x0, w0
@@ -317,7 +316,7 @@ G_M34028_IG27:        ; bbWeight=0, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
             brk     #0
 						;; size=8 bbWeight=0 PerfScore 0.00
 
-; Total bytes of code 812, prolog size 36, PerfScore 237.56, instruction count 203, allocated bytes for code 812 (MethodHash=3e617b13) for method SveBenchmarks.StrIndexOf:SveTail():int:this (Instrumented Tier0)
+; Total bytes of code 808, prolog size 36, PerfScore 234.06, instruction count 202, allocated bytes for code 808 (MethodHash=3e617b13) for method SveBenchmarks.StrIndexOf:SveTail():int:this (Instrumented Tier0)
 ; ============================================================
 
 Unwind Info:
@@ -328,7 +327,7 @@ Unwind Info:
   E bit             : 0
   X bit             : 0
   Vers              : 0
-  Function Length   : 203 (0x000cb) Actual length = 812 (0x00032c)
+  Function Length   : 202 (0x000ca) Actual length = 808 (0x000328)
   ---- Epilog scopes ----
   ---- Scope 0
   Epilog Start Offset        : 3523193630 (0xd1ffab1e) Actual offset = 3523193630 (0xd1ffab1e) Offset from main function begin = 3523193630 (0xd1ffab1e)
-4 (-0.38%) : 14743.dasm - SveBenchmarks.StrCmp:SveStrCmp():long:this (Instrumented Tier0)
@@ -49,8 +49,7 @@ G_M14759_IG02:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref,
             str     wzr, [fp, #0xC4]	// [V01 loc0]
             cntb    x0, all
             str     w0, [fp, #0xC0]	// [V02 loc1]
-            ptrue   p0.b
-            mov     z16.b, p0/z, #1
+            mvni    v16.4s, #0
             str     q16, [fp, #0xB0]	// [V03 loc2]
             ldr     x0, [fp, #0xC8]	// [V00 this]
             ; gcrRegs +[x0]
@@ -86,7 +85,7 @@ G_M14759_IG02:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref,
             ldr     w0, [x0, #0x08]
             ; gcrRegs -[x0]
             cbnz    w0, G_M14759_IG05
-						;; size=136 bbWeight=1 PerfScore 59.50
+						;; size=132 bbWeight=1 PerfScore 56.00
 G_M14759_IG03:        ; bbWeight=0.50, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
             movz    x0, #0xD1FFAB1E
             movk    x0, #0xD1FFAB1E LSL #16
@@ -394,7 +393,7 @@ G_M14759_IG27:        ; bbWeight=0, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
             brk     #0
 						;; size=8 bbWeight=0 PerfScore 0.00
 
-; Total bytes of code 1052, prolog size 44, PerfScore 348.29, instruction count 263, allocated bytes for code 1052 (MethodHash=5df7c658) for method SveBenchmarks.StrCmp:SveStrCmp():long:this (Instrumented Tier0)
+; Total bytes of code 1048, prolog size 44, PerfScore 344.79, instruction count 262, allocated bytes for code 1048 (MethodHash=5df7c658) for method SveBenchmarks.StrCmp:SveStrCmp():long:this (Instrumented Tier0)
 ; ============================================================
 
 Unwind Info:
@@ -405,7 +404,7 @@ Unwind Info:
   E bit             : 0
   X bit             : 0
   Vers              : 0
-  Function Length   : 263 (0x00107) Actual length = 1052 (0x00041c)
+  Function Length   : 262 (0x00106) Actual length = 1048 (0x000418)
   ---- Epilog scopes ----
   ---- Scope 0
   Epilog Start Offset        : 3523193630 (0xd1ffab1e) Actual offset = 3523193630 (0xd1ffab1e) Offset from main function begin = 3523193630 (0xd1ffab1e)
-4 (-0.43%) : 39357.dasm - SveBenchmarks.StrCmp:SveTail():long:this (Instrumented Tier0)
@@ -41,8 +41,7 @@ G_M892_IG01:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, n
 G_M892_IG02:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, isz
             mov     w0, #0xD1FFAB1E
             str     w0, [fp, #0x28]	// [V15 tmp3]
-            ptrue   p0.b
-            mov     z16.b, p0/z, #1
+            mvni    v16.4s, #0
             str     q16, [fp, #0xA0]	// [V01 loc0]
             str     wzr, [fp, #0x6C]	// [V05 loc4]
             cntb    x0, all
@@ -71,7 +70,7 @@ G_M892_IG02:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, i
             ldr     w0, [x0, #0x08]
             ; gcrRegs -[x0]
             cbnz    w0, G_M892_IG05
-						;; size=104 bbWeight=1 PerfScore 46.00
+						;; size=100 bbWeight=1 PerfScore 42.50
 G_M892_IG03:        ; bbWeight=0.50, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
             movz    x0, #0xD1FFAB1E
             movk    x0, #0xD1FFAB1E LSL #16
@@ -357,7 +356,7 @@ G_M892_IG24:        ; bbWeight=0, gcVars=0000000000000000 {}, gcrefRegs=0000 {},
             brk     #0
 						;; size=8 bbWeight=0 PerfScore 0.00
 
-; Total bytes of code 928, prolog size 36, PerfScore 307.29, instruction count 232, allocated bytes for code 928 (MethodHash=5bfdfc83) for method SveBenchmarks.StrCmp:SveTail():long:this (Instrumented Tier0)
+; Total bytes of code 924, prolog size 36, PerfScore 303.79, instruction count 231, allocated bytes for code 924 (MethodHash=5bfdfc83) for method SveBenchmarks.StrCmp:SveTail():long:this (Instrumented Tier0)
 ; ============================================================
 
 Unwind Info:
@@ -368,7 +367,7 @@ Unwind Info:
   E bit             : 0
   X bit             : 0
   Vers              : 0
-  Function Length   : 232 (0x000e8) Actual length = 928 (0x0003a0)
+  Function Length   : 231 (0x000e7) Actual length = 924 (0x00039c)
   ---- Epilog scopes ----
   ---- Scope 0
   Epilog Start Offset        : 3523193630 (0xd1ffab1e) Actual offset = 3523193630 (0xd1ffab1e) Offset from main function begin = 3523193630 (0xd1ffab1e)
libraries.pmi.linux.arm64.checked.mch
-4 (-16.67%) : 11401.dasm - System.Runtime.Intrinsics.Arm.Sve:CreateFalseMaskInt32():System.Numerics.Vector`1[int] (FullOpts)
@@ -16,15 +16,14 @@ G_M40111_IG01:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref,
             mov     fp, sp
 						;; size=8 bbWeight=1 PerfScore 1.50
 G_M40111_IG02:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
-            pfalse  p0.b
-            mov     z0.s, p0/z, #1
-						;; size=8 bbWeight=1 PerfScore 4.00
+            movi    v0.4s, #0
+						;; size=4 bbWeight=1 PerfScore 0.50
 G_M40111_IG03:        ; bbWeight=1, epilog, nogc, extend
             ldp     fp, lr, [sp], #0x10
             ret     lr
 						;; size=8 bbWeight=1 PerfScore 2.00
 
-; Total bytes of code 24, prolog size 8, PerfScore 7.50, instruction count 6, allocated bytes for code 24 (MethodHash=96116350) for method System.Runtime.Intrinsics.Arm.Sve:CreateFalseMaskInt32():System.Numerics.Vector`1[int] (FullOpts)
+; Total bytes of code 20, prolog size 8, PerfScore 4.00, instruction count 5, allocated bytes for code 20 (MethodHash=96116350) for method System.Runtime.Intrinsics.Arm.Sve:CreateFalseMaskInt32():System.Numerics.Vector`1[int] (FullOpts)
 ; ============================================================
 
 Unwind Info:
@@ -35,7 +34,7 @@ Unwind Info:
   E bit             : 0
   X bit             : 0
   Vers              : 0
-  Function Length   : 6 (0x00006) Actual length = 24 (0x000018)
+  Function Length   : 5 (0x00005) Actual length = 20 (0x000014)
   ---- Epilog scopes ----
   ---- Scope 0
   Epilog Start Offset        : 3523193630 (0xd1ffab1e) Actual offset = 3523193630 (0xd1ffab1e) Offset from main function begin = 3523193630 (0xd1ffab1e)
-4 (-16.67%) : 11402.dasm - System.Runtime.Intrinsics.Arm.Sve:CreateFalseMaskInt64():System.Numerics.Vector`1[long] (FullOpts)
@@ -16,15 +16,14 @@ G_M56373_IG01:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref,
             mov     fp, sp
 						;; size=8 bbWeight=1 PerfScore 1.50
 G_M56373_IG02:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
-            pfalse  p0.b
-            mov     z0.d, p0/z, #1
-						;; size=8 bbWeight=1 PerfScore 4.00
+            movi    v0.4s, #0
+						;; size=4 bbWeight=1 PerfScore 0.50
 G_M56373_IG03:        ; bbWeight=1, epilog, nogc, extend
             ldp     fp, lr, [sp], #0x10
             ret     lr
 						;; size=8 bbWeight=1 PerfScore 2.00
 
-; Total bytes of code 24, prolog size 8, PerfScore 7.50, instruction count 6, allocated bytes for code 24 (MethodHash=c46823ca) for method System.Runtime.Intrinsics.Arm.Sve:CreateFalseMaskInt64():System.Numerics.Vector`1[long] (FullOpts)
+; Total bytes of code 20, prolog size 8, PerfScore 4.00, instruction count 5, allocated bytes for code 20 (MethodHash=c46823ca) for method System.Runtime.Intrinsics.Arm.Sve:CreateFalseMaskInt64():System.Numerics.Vector`1[long] (FullOpts)
 ; ============================================================
 
 Unwind Info:
@@ -35,7 +34,7 @@ Unwind Info:
   E bit             : 0
   X bit             : 0
   Vers              : 0
-  Function Length   : 6 (0x00006) Actual length = 24 (0x000018)
+  Function Length   : 5 (0x00005) Actual length = 20 (0x000014)
   ---- Epilog scopes ----
   ---- Scope 0
   Epilog Start Offset        : 3523193630 (0xd1ffab1e) Actual offset = 3523193630 (0xd1ffab1e) Offset from main function begin = 3523193630 (0xd1ffab1e)
-4 (-16.67%) : 11403.dasm - System.Runtime.Intrinsics.Arm.Sve:CreateFalseMaskSByte():System.Numerics.Vector`1[sbyte] (FullOpts)
@@ -16,15 +16,14 @@ G_M57390_IG01:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref,
             mov     fp, sp
 						;; size=8 bbWeight=1 PerfScore 1.50
 G_M57390_IG02:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
-            pfalse  p0.b
-            mov     z0.b, p0/z, #1
-						;; size=8 bbWeight=1 PerfScore 4.00
+            movi    v0.4s, #0
+						;; size=4 bbWeight=1 PerfScore 0.50
 G_M57390_IG03:        ; bbWeight=1, epilog, nogc, extend
             ldp     fp, lr, [sp], #0x10
             ret     lr
 						;; size=8 bbWeight=1 PerfScore 2.00
 
-; Total bytes of code 24, prolog size 8, PerfScore 7.50, instruction count 6, allocated bytes for code 24 (MethodHash=86bf1fd1) for method System.Runtime.Intrinsics.Arm.Sve:CreateFalseMaskSByte():System.Numerics.Vector`1[sbyte] (FullOpts)
+; Total bytes of code 20, prolog size 8, PerfScore 4.00, instruction count 5, allocated bytes for code 20 (MethodHash=86bf1fd1) for method System.Runtime.Intrinsics.Arm.Sve:CreateFalseMaskSByte():System.Numerics.Vector`1[sbyte] (FullOpts)
 ; ============================================================
 
 Unwind Info:
@@ -35,7 +34,7 @@ Unwind Info:
   E bit             : 0
   X bit             : 0
   Vers              : 0
-  Function Length   : 6 (0x00006) Actual length = 24 (0x000018)
+  Function Length   : 5 (0x00005) Actual length = 20 (0x000014)
   ---- Epilog scopes ----
   ---- Scope 0
   Epilog Start Offset        : 3523193630 (0xd1ffab1e) Actual offset = 3523193630 (0xd1ffab1e) Offset from main function begin = 3523193630 (0xd1ffab1e)
-4 (-16.67%) : 11400.dasm - System.Runtime.Intrinsics.Arm.Sve:CreateFalseMaskInt16():System.Numerics.Vector`1[short] (FullOpts)
@@ -16,15 +16,14 @@ G_M33416_IG01:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref,
             mov     fp, sp
 						;; size=8 bbWeight=1 PerfScore 1.50
 G_M33416_IG02:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
-            pfalse  p0.b
-            mov     z0.h, p0/z, #1
-						;; size=8 bbWeight=1 PerfScore 4.00
+            movi    v0.4s, #0
+						;; size=4 bbWeight=1 PerfScore 0.50
 G_M33416_IG03:        ; bbWeight=1, epilog, nogc, extend
             ldp     fp, lr, [sp], #0x10
             ret     lr
 						;; size=8 bbWeight=1 PerfScore 2.00
 
-; Total bytes of code 24, prolog size 8, PerfScore 7.50, instruction count 6, allocated bytes for code 24 (MethodHash=c51e7d77) for method System.Runtime.Intrinsics.Arm.Sve:CreateFalseMaskInt16():System.Numerics.Vector`1[short] (FullOpts)
+; Total bytes of code 20, prolog size 8, PerfScore 4.00, instruction count 5, allocated bytes for code 20 (MethodHash=c51e7d77) for method System.Runtime.Intrinsics.Arm.Sve:CreateFalseMaskInt16():System.Numerics.Vector`1[short] (FullOpts)
 ; ============================================================
 
 Unwind Info:
@@ -35,7 +34,7 @@ Unwind Info:
   E bit             : 0
   X bit             : 0
   Vers              : 0
-  Function Length   : 6 (0x00006) Actual length = 24 (0x000018)
+  Function Length   : 5 (0x00005) Actual length = 20 (0x000014)
   ---- Epilog scopes ----
   ---- Scope 0
   Epilog Start Offset        : 3523193630 (0xd1ffab1e) Actual offset = 3523193630 (0xd1ffab1e) Offset from main function begin = 3523193630 (0xd1ffab1e)
-4 (-16.67%) : 11407.dasm - System.Runtime.Intrinsics.Arm.Sve:CreateFalseMaskUInt64():System.Numerics.Vector`1[ulong] (FullOpts)
@@ -16,15 +16,14 @@ G_M18837_IG01:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref,
             mov     fp, sp
 						;; size=8 bbWeight=1 PerfScore 1.50
 G_M18837_IG02:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
-            pfalse  p0.b
-            mov     z0.d, p0/z, #1
-						;; size=8 bbWeight=1 PerfScore 4.00
+            movi    v0.4s, #0
+						;; size=4 bbWeight=1 PerfScore 0.50
 G_M18837_IG03:        ; bbWeight=1, epilog, nogc, extend
             ldp     fp, lr, [sp], #0x10
             ret     lr
 						;; size=8 bbWeight=1 PerfScore 2.00
 
-; Total bytes of code 24, prolog size 8, PerfScore 7.50, instruction count 6, allocated bytes for code 24 (MethodHash=e813b66a) for method System.Runtime.Intrinsics.Arm.Sve:CreateFalseMaskUInt64():System.Numerics.Vector`1[ulong] (FullOpts)
+; Total bytes of code 20, prolog size 8, PerfScore 4.00, instruction count 5, allocated bytes for code 20 (MethodHash=e813b66a) for method System.Runtime.Intrinsics.Arm.Sve:CreateFalseMaskUInt64():System.Numerics.Vector`1[ulong] (FullOpts)
 ; ============================================================
 
 Unwind Info:
@@ -35,7 +34,7 @@ Unwind Info:
   E bit             : 0
   X bit             : 0
   Vers              : 0
-  Function Length   : 6 (0x00006) Actual length = 24 (0x000018)
+  Function Length   : 5 (0x00005) Actual length = 20 (0x000014)
   ---- Epilog scopes ----
   ---- Scope 0
   Epilog Start Offset        : 3523193630 (0xd1ffab1e) Actual offset = 3523193630 (0xd1ffab1e) Offset from main function begin = 3523193630 (0xd1ffab1e)
-4 (-16.67%) : 11399.dasm - System.Runtime.Intrinsics.Arm.Sve:CreateFalseMaskDouble():System.Numerics.Vector`1[double] (FullOpts)
@@ -16,15 +16,14 @@ G_M43790_IG01:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref,
             mov     fp, sp
 						;; size=8 bbWeight=1 PerfScore 1.50
 G_M43790_IG02:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
-            pfalse  p0.b
-            mov     z0.d, p0/z, #1
-						;; size=8 bbWeight=1 PerfScore 4.00
+            movi    v0.4s, #0
+						;; size=4 bbWeight=1 PerfScore 0.50
 G_M43790_IG03:        ; bbWeight=1, epilog, nogc, extend
             ldp     fp, lr, [sp], #0x10
             ret     lr
 						;; size=8 bbWeight=1 PerfScore 2.00
 
-; Total bytes of code 24, prolog size 8, PerfScore 7.50, instruction count 6, allocated bytes for code 24 (MethodHash=73a354f1) for method System.Runtime.Intrinsics.Arm.Sve:CreateFalseMaskDouble():System.Numerics.Vector`1[double] (FullOpts)
+; Total bytes of code 20, prolog size 8, PerfScore 4.00, instruction count 5, allocated bytes for code 20 (MethodHash=73a354f1) for method System.Runtime.Intrinsics.Arm.Sve:CreateFalseMaskDouble():System.Numerics.Vector`1[double] (FullOpts)
 ; ============================================================
 
 Unwind Info:
@@ -35,7 +34,7 @@ Unwind Info:
   E bit             : 0
   X bit             : 0
   Vers              : 0
-  Function Length   : 6 (0x00006) Actual length = 24 (0x000018)
+  Function Length   : 5 (0x00005) Actual length = 20 (0x000014)
   ---- Epilog scopes ----
   ---- Scope 0
   Epilog Start Offset        : 3523193630 (0xd1ffab1e) Actual offset = 3523193630 (0xd1ffab1e) Offset from main function begin = 3523193630 (0xd1ffab1e)
benchmarks.run_pgo_optrepeat.linux.arm64.checked.mch
-4 (-1.30%) : 13109.dasm - SveBenchmarks.StrCmp:SveTail():long:this (FullOpts)
@@ -7,33 +7,33 @@
 ; No matching PGO data
 ; Final local variable assignments
 ;
-;  V00 this         [V00,T09] (  5,  5   )     ref  ->   x0         this class-hnd single-def <SveBenchmarks.StrCmp>
-;  V01 loc0         [V01,T05] (  3,  9   )    mask  ->   p0         single-def <System.Numerics.Vector`1[byte]>
+;  V00 this         [V00,T08] (  5,  5   )     ref  ->   x0         this class-hnd single-def <SveBenchmarks.StrCmp>
+;* V01 loc0         [V01,T22] (  0,  0   )    mask  ->  zero-ref    single-def <System.Numerics.Vector`1[byte]>
 ;* V02 loc1         [V02    ] (  0,  0   )  simd16  ->  zero-ref    <System.Numerics.Vector`1[byte]>
 ;* V03 loc2         [V03    ] (  0,  0   )  simd16  ->  zero-ref    <System.Numerics.Vector`1[byte]>
 ;* V04 loc3         [V04    ] (  0,  0   )  simd16  ->  zero-ref    <System.Numerics.Vector`1[byte]>
 ;  V05 loc4         [V05,T00] ( 12, 41.50)     int  ->   x1        
-;  V06 loc5         [V06,T13] (  3,  6   )     int  ->   x2         single-def
-;  V07 loc6         [V07,T17] (  3,  5   )    long  ->   x4        
-;  V08 loc7         [V08,T18] (  3,  5   )    long  ->   x6        
+;  V06 loc5         [V06,T12] (  3,  6   )     int  ->   x2         single-def
+;  V07 loc6         [V07,T16] (  3,  5   )    long  ->   x4        
+;  V08 loc7         [V08,T17] (  3,  5   )    long  ->   x6        
 ;  V09 loc8         [V09    ] (  1,  0.50)     ref  ->  [fp+0x18]   must-init pinned class-hnd single-def <byte[]>
 ;  V10 loc9         [V10    ] (  1,  0.50)     ref  ->  [fp+0x10]   must-init pinned class-hnd single-def <byte[]>
-;  V11 loc10        [V11,T08] (  2,  8   )   ubyte  ->   x8        
+;  V11 loc10        [V11,T07] (  2,  8   )   ubyte  ->   x8        
 ;# V12 OutArgs      [V12    ] (  1,  1   )  struct ( 0) [sp+0x00]   do-not-enreg[XS] addr-exposed "OutgoingArgSpace" <Empty>
-;  V13 tmp1         [V13,T15] (  5,  5   )     ref  ->   x4         class-hnd single-def "dup spill" <byte[]>
-;  V14 tmp2         [V14,T16] (  5,  5   )     ref  ->   x6         class-hnd single-def "dup spill" <byte[]>
-;  V15 tmp3         [V15,T19] (  2,  2   )    long  ->   x4         "Cast away GC"
-;  V16 tmp4         [V16,T20] (  2,  2   )    long  ->   x6         "Cast away GC"
+;  V13 tmp1         [V13,T14] (  5,  5   )     ref  ->   x4         class-hnd single-def "dup spill" <byte[]>
+;  V14 tmp2         [V14,T15] (  5,  5   )     ref  ->   x6         class-hnd single-def "dup spill" <byte[]>
+;  V15 tmp3         [V15,T18] (  2,  2   )    long  ->   x4         "Cast away GC"
+;  V16 tmp4         [V16,T19] (  2,  2   )    long  ->   x6         "Cast away GC"
 ;  V17 tmp5         [V17,T01] (  3, 24   )     ref  ->   x2         "arr expr"
 ;  V18 tmp6         [V18,T02] (  3, 24   )     ref  ->   x6         "arr expr"
-;* V19 tmp7         [V19,T21] (  0,  0   )     ref  ->  zero-ref    single-def "arr expr"
-;* V20 tmp8         [V20,T22] (  0,  0   )     ref  ->  zero-ref    single-def "arr expr"
-;  V21 cse0         [V21,T06] (  3,  8.50)     int  ->   x2         "CSE #11: aggressive"
-;  V22 cse1         [V22,T07] (  3,  8.50)     int  ->   x4         "CSE #14: aggressive"
-;  V23 cse2         [V23,T14] (  3,  6   )     int  ->   x7         "CSE #07: aggressive"
-;  V24 cse3         [V24,T12] (  4,  6.50)     int  ->   x0         "CSE #06: aggressive"
-;  V25 cse4         [V25,T10] (  4,  6.50)     ref  ->   x3         "CSE #01: aggressive"
-;  V26 cse5         [V26,T11] (  4,  6.50)     ref  ->   x5         "CSE #03: aggressive"
+;* V19 tmp7         [V19,T20] (  0,  0   )     ref  ->  zero-ref    single-def "arr expr"
+;* V20 tmp8         [V20,T21] (  0,  0   )     ref  ->  zero-ref    single-def "arr expr"
+;  V21 cse0         [V21,T05] (  3,  8.50)     int  ->   x2         "CSE #11: aggressive"
+;  V22 cse1         [V22,T06] (  3,  8.50)     int  ->   x4         "CSE #14: aggressive"
+;  V23 cse2         [V23,T13] (  3,  6   )     int  ->   x7         "CSE #07: aggressive"
+;  V24 cse3         [V24,T11] (  4,  6.50)     int  ->   x0         "CSE #06: aggressive"
+;  V25 cse4         [V25,T09] (  4,  6.50)     ref  ->   x3         "CSE #01: aggressive"
+;  V26 cse5         [V26,T10] (  4,  6.50)     ref  ->   x5         "CSE #03: aggressive"
 ;  V27 cse6         [V27,T03] (  3, 12   )    long  ->   x4         "CSE #08: aggressive"
 ;  V28 cse7         [V28,T04] (  3, 12   )    long  ->   x8         "CSE #05: aggressive"
 ;
@@ -46,7 +46,6 @@ G_M892_IG01:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, n
 						;; size=12 bbWeight=1 PerfScore 2.50
 G_M892_IG02:        ; bbWeight=1, gcrefRegs=0001 {x0}, byrefRegs=0000 {}, byref, isz
             ; gcrRegs +[x0]
-            ptrue   p0.b
             mov     w1, wzr
             cntb    x2, all
             ldr     x3, [x0, #0x10]
@@ -57,7 +56,7 @@ G_M892_IG02:        ; bbWeight=1, gcrefRegs=0001 {x0}, byrefRegs=0000 {}, byref,
             ldr     w6, [x5, #0x08]
             cmp     w4, w6
             bne     G_M892_IG11
-						;; size=36 bbWeight=1 PerfScore 18.00
+						;; size=32 bbWeight=1 PerfScore 16.00
 G_M892_IG03:        ; bbWeight=0.50, gcrefRegs=0029 {x0 x3 x5}, byrefRegs=0000 {}, byref, isz
             mov     x4, x3
             ; gcrRegs +[x4]
@@ -99,14 +98,14 @@ G_M892_IG07:        ; bbWeight=1, gcrefRegs=0029 {x0 x3 x5}, byrefRegs=0000 {},
 G_M892_IG08:        ; bbWeight=4, gcrefRegs=0028 {x3 x5}, byrefRegs=0000 {}, byref, isz
             sxtw    x8, w1
             add     x9, x4, x8
+            ptrue   p0.b
             ld1b    { z16.b }, p0/z, [x9]
             add     x8, x6, x8
             ld1b    { z17.b }, p0/z, [x8]
-            ptrue   p1.b
-            cmpne   p1.b, p1/z, z16.b, z17.b
-            mov     z16.b, p1/z, #1
-            ptrue   p1.b
-            uaddv   d16, p1, z16.b
+            cmpne   p0.b, p0/z, z16.b, z17.b
+            mov     z16.b, p0/z, #1
+            ptrue   p0.b
+            uaddv   d16, p0, z16.b
             umov    x8, v16.d[0]
             uxtb    w8, w8
             cmp     w8, #0
@@ -169,7 +168,7 @@ G_M892_IG15:        ; bbWeight=0, gcVars=0000000000000000 {}, gcrefRegs=0000 {},
             brk     #0
 						;; size=8 bbWeight=0 PerfScore 0.00
 
-; Total bytes of code 308, prolog size 12, PerfScore 259.00, instruction count 77, allocated bytes for code 308 (MethodHash=5bfdfc83) for method SveBenchmarks.StrCmp:SveTail():long:this (FullOpts)
+; Total bytes of code 304, prolog size 12, PerfScore 257.00, instruction count 76, allocated bytes for code 304 (MethodHash=5bfdfc83) for method SveBenchmarks.StrCmp:SveTail():long:this (FullOpts)
 ; ============================================================
 
 Unwind Info:
@@ -180,7 +179,7 @@ Unwind Info:
   E bit             : 0
   X bit             : 0
   Vers              : 0
-  Function Length   : 77 (0x0004d) Actual length = 308 (0x000134)
+  Function Length   : 76 (0x0004c) Actual length = 304 (0x000130)
   ---- Epilog scopes ----
   ---- Scope 0
   Epilog Start Offset        : 3523193630 (0xd1ffab1e) Actual offset = 3523193630 (0xd1ffab1e) Offset from main function begin = 3523193630 (0xd1ffab1e)
-4 (-1.05%) : 26420.dasm - SveBenchmarks.StrIndexOf:SveTail():int:this (FullOpts)
@@ -21,14 +21,13 @@
 ;# V10 OutArgs      [V10    ] (  1,  1   )  struct ( 0) [sp+0x00]   do-not-enreg[XS] addr-exposed "OutgoingArgSpace" <Empty>
 ;  V11 tmp1         [V11,T05] (  5,  8   )     ref  ->   x3         class-hnd single-def "dup spill" <char[]>
 ;* V12 tmp2         [V12    ] (  0,  0   )  ushort  ->  zero-ref    "Inlining Arg"
-;* V13 tmp3         [V13    ] (  0,  0   )  simd16  ->  zero-ref    "Inlining Arg" <System.Numerics.Vector`1[short]>
-;  V14 tmp4         [V14,T11] (  2,  2   )    long  ->   x3         "Cast away GC"
-;  V15 tmp5         [V15,T01] (  3, 24   )     ref  ->   x3         "arr expr"
-;  V16 cse0         [V16,T08] (  3,  6   )     int  ->   x4         "CSE #07: aggressive"
-;  V17 cse1         [V17,T03] (  5, 10.25)     int  ->   x0         "CSE #02: aggressive"
-;  V18 cse2         [V18,T07] (  3,  6   )     ref  ->   x2         "CSE #06: aggressive"
-;  V19 cse3         [V19,T04] (  4, 10   )     int  ->   x5         "CSE #05: aggressive"
-;  V20 cse4         [V20,T10] (  2,  4.25)    mask  ->   p0         hoist "CSE #03: aggressive"
+;  V13 tmp3         [V13,T11] (  2,  2   )    long  ->   x3         "Cast away GC"
+;  V14 tmp4         [V14,T01] (  3, 24   )     ref  ->   x3         "arr expr"
+;  V15 cse0         [V15,T08] (  3,  6   )     int  ->   x4         "CSE #07: aggressive"
+;  V16 cse1         [V16,T03] (  5, 10.25)     int  ->   x0         "CSE #02: aggressive"
+;  V17 cse2         [V17,T07] (  3,  6   )     ref  ->   x2         "CSE #06: aggressive"
+;  V18 cse3         [V18,T04] (  4, 10   )     int  ->   x5         "CSE #05: aggressive"
+;  V19 cse4         [V19,T10] (  2,  4.25)    mask  ->   p0         hoist "CSE #03: aggressive"
 ;
 ; Lcl frame size = 16
 
@@ -62,14 +61,13 @@ G_M34028_IG04:        ; bbWeight=0.50, gcrefRegs=0005 {x0 x2}, byrefRegs=0000 {}
 G_M34028_IG05:        ; bbWeight=1, gcrefRegs=0005 {x0 x2}, byrefRegs=0000 {}, byref, isz
             ldrh    w4, [x0, #0x14]
             dup     v16.8h, w4
-            ptrue   p0.h
-            mov     z17.h, p0/z, #1
+            mvni    v17.4s, #0
             ldr     w0, [x0, #0x10]
             ; gcrRegs -[x0]
             cnth    x5, all
             cmp     w0, w5
             ble     G_M34028_IG10
-						;; size=32 bbWeight=1 PerfScore 15.50
+						;; size=28 bbWeight=1 PerfScore 12.00
 G_M34028_IG06:        ; bbWeight=0.25, gcrefRegs=0004 {x2}, byrefRegs=0000 {}, byref
             ptrue   p0.h
             cmpne   p0.h, p0/z, z17.h, #0
@@ -177,7 +175,7 @@ G_M34028_IG18:        ; bbWeight=0, gcVars=0000000000000000 {}, gcrefRegs=0000 {
             brk     #0
 						;; size=8 bbWeight=0 PerfScore 0.00
 
-; Total bytes of code 380, prolog size 12, PerfScore 236.38, instruction count 95, allocated bytes for code 380 (MethodHash=3e617b13) for method SveBenchmarks.StrIndexOf:SveTail():int:this (FullOpts)
+; Total bytes of code 376, prolog size 12, PerfScore 232.88, instruction count 94, allocated bytes for code 376 (MethodHash=3e617b13) for method SveBenchmarks.StrIndexOf:SveTail():int:this (FullOpts)
 ; ============================================================
 
 Unwind Info:
@@ -188,7 +186,7 @@ Unwind Info:
   E bit             : 0
   X bit             : 0
   Vers              : 0
-  Function Length   : 95 (0x0005f) Actual length = 380 (0x00017c)
+  Function Length   : 94 (0x0005e) Actual length = 376 (0x000178)
   ---- Epilog scopes ----
   ---- Scope 0
   Epilog Start Offset        : 3523193630 (0xd1ffab1e) Actual offset = 3523193630 (0xd1ffab1e) Offset from main function begin = 3523193630 (0xd1ffab1e)
+8 (+2.11%) : 6897.dasm - SveBenchmarks.StrCmp:SveStrCmp():long:this (FullOpts)
@@ -8,31 +8,31 @@
 ; 0 inlinees with PGO data; 1 single block inlinees; 0 inlinees without PGO data
 ; Final local variable assignments
 ;
-;  V00 this         [V00,T06] (  9,  7   )     ref  ->   x0         this class-hnd single-def <SveBenchmarks.StrCmp>
+;  V00 this         [V00,T05] (  9,  7   )     ref  ->   x0         this class-hnd single-def <SveBenchmarks.StrCmp>
 ;  V01 loc0         [V01,T02] (  6, 17.50)     int  ->   x1        
 ;  V02 loc1         [V02,T04] (  5, 10   )     int  ->   x2         single-def
-;  V03 loc2         [V03,T05] (  4, 10   )    mask  ->   p0         single-def <System.Numerics.Vector`1[byte]>
-;  V04 loc3         [V04,T01] (  6, 18   )    mask  ->   p1         <System.Numerics.Vector`1[byte]>
+;* V03 loc2         [V03,T19] (  0,  0   )    mask  ->  zero-ref    single-def <System.Numerics.Vector`1[byte]>
+;  V04 loc3         [V04,T01] (  6, 18   )    mask  ->   p0         <System.Numerics.Vector`1[byte]>
 ;  V05 loc4         [V05,T20] (  4, 13   )  simd16  ->  d16         <System.Numerics.Vector`1[byte]>
 ;* V06 loc5         [V06    ] (  0,  0   )  simd16  ->  zero-ref    <System.Numerics.Vector`1[byte]>
 ;* V07 loc6         [V07    ] (  0,  0   )  simd16  ->  zero-ref    <System.Numerics.Vector`1[byte]>
-;  V08 loc7         [V08,T11] (  3,  5   )    long  ->   x4        
-;  V09 loc8         [V09,T12] (  3,  5   )    long  ->   x5        
+;  V08 loc7         [V08,T10] (  3,  5   )    long  ->   x4        
+;  V09 loc8         [V09,T11] (  3,  5   )    long  ->   x5        
 ;  V10 loc9         [V10    ] (  1,  0.50)     ref  ->  [fp+0x28]   must-init pinned class-hnd single-def <byte[]>
 ;  V11 loc10        [V11    ] (  1,  0.50)     ref  ->  [fp+0x20]   must-init pinned class-hnd single-def <byte[]>
-;  V12 loc11        [V12,T10] (  4,  5   )     int  ->   x3        
-;  V13 loc12        [V13,T18] (  3,  1.50)     int  ->   x3         single-def
+;  V12 loc11        [V12,T09] (  4,  5   )     int  ->   x3        
+;  V13 loc12        [V13,T17] (  3,  1.50)     int  ->   x3         single-def
 ;  V14 loc13        [V14,T00] (  7, 22.50)     int  ->   x4        
 ;# V15 OutArgs      [V15    ] (  1,  1   )  struct ( 0) [sp+0x00]   do-not-enreg[XS] addr-exposed "OutgoingArgSpace" <Empty>
-;  V16 tmp1         [V16,T08] (  5,  5   )     ref  ->   x4         class-hnd single-def "dup spill" <byte[]>
-;  V17 tmp2         [V17,T09] (  5,  5   )     ref  ->   x5         class-hnd single-def "dup spill" <byte[]>
-;  V18 tmp3         [V18,T16] (  2,  2   )    long  ->   x4         "Cast away GC"
-;  V19 tmp4         [V19,T17] (  2,  2   )    long  ->   x5         "Cast away GC"
-;  V20 tmp5         [V20,T13] (  3,  3   )     ref  ->   x2         single-def "arr expr"
-;  V21 tmp6         [V21,T14] (  3,  3   )     ref  ->   x0         single-def "arr expr"
-;  V22 cse0         [V22,T07] (  3,  6   )     int  ->   x3         "CSE #05: aggressive"
-;  V23 cse1         [V23,T19] (  3,  1.50)    long  ->   x3         "CSE #08: moderate"
-;  V24 cse2         [V24,T15] (  4,  2   )     int  ->   x1         "CSE #07: moderate"
+;  V16 tmp1         [V16,T07] (  5,  5   )     ref  ->   x4         class-hnd single-def "dup spill" <byte[]>
+;  V17 tmp2         [V17,T08] (  5,  5   )     ref  ->   x5         class-hnd single-def "dup spill" <byte[]>
+;  V18 tmp3         [V18,T15] (  2,  2   )    long  ->   x4         "Cast away GC"
+;  V19 tmp4         [V19,T16] (  2,  2   )    long  ->   x5         "Cast away GC"
+;  V20 tmp5         [V20,T12] (  3,  3   )     ref  ->   x2         single-def "arr expr"
+;  V21 tmp6         [V21,T13] (  3,  3   )     ref  ->   x0         single-def "arr expr"
+;  V22 cse0         [V22,T06] (  3,  6   )     int  ->   x3         "CSE #05: aggressive"
+;  V23 cse1         [V23,T18] (  3,  1.50)    long  ->   x3         "CSE #08: moderate"
+;  V24 cse2         [V24,T14] (  4,  2   )     int  ->   x1         "CSE #07: moderate"
 ;  V25 cse3         [V25,T03] (  3, 12   )    long  ->   x6         "CSE #06: aggressive"
 ;  V26 rat0         [V26,T21] (  3,  9   )  simd16  ->  [fp+0x10]   do-not-enreg[S] "SIMDInitTempVar"
 ;
@@ -47,10 +47,9 @@ G_M14759_IG02:        ; bbWeight=1, gcrefRegs=0001 {x0}, byrefRegs=0000 {}, byre
             ; gcrRegs +[x0]
             mov     w1, wzr
             cntb    x2, all
-            ptrue   p0.b
             ldr     w3, [x0, #0x20]
             mov     w4, wzr
-            whilelt p1.b, w4, w3
+            whilelt p0.b, w4, w3
             movi    v16.4s, #0
             ldr     x4, [x0, #0x10]
             ; gcrRegs +[x4]
@@ -62,7 +61,7 @@ G_M14759_IG02:        ; bbWeight=1, gcrefRegs=0001 {x0}, byrefRegs=0000 {}, byre
             ; gcrRegs -[x5]
             cmp     w4, w5
             bne     G_M14759_IG14
-						;; size=52 bbWeight=1 PerfScore 24.00
+						;; size=48 bbWeight=1 PerfScore 22.00
 G_M14759_IG03:        ; bbWeight=0.50, gcrefRegs=0001 {x0}, byrefRegs=0000 {}, byref, isz
             ldr     x4, [x0, #0x10]
             ; gcrRegs +[x4]
@@ -96,27 +95,30 @@ G_M14759_IG06:        ; bbWeight=0.50, gcrefRegs=0001 {x0}, byrefRegs=0000 {}, b
             mov     x5, xzr
 						;; size=4 bbWeight=0.50 PerfScore 0.25
 G_M14759_IG07:        ; bbWeight=1, gcrefRegs=0001 {x0}, byrefRegs=0000 {}, byref, isz
-            ptest   p0, p1.b
+            ptrue   p1.b
+            ptest   p1, p0.b
             bge     G_M14759_IG09
-						;; size=8 bbWeight=1 PerfScore 3.00
+						;; size=12 bbWeight=1 PerfScore 5.00
 G_M14759_IG08:        ; bbWeight=4, gcrefRegs=0001 {x0}, byrefRegs=0000 {}, byref, isz
             sxtw    x6, w1
             add     x7, x4, x6
-            ld1b    { z16.b }, p1/z, [x7]
+            ld1b    { z16.b }, p0/z, [x7]
             add     x6, x5, x6
-            ld1b    { z17.b }, p1/z, [x6]
+            ld1b    { z17.b }, p0/z, [x6]
+            ptrue   p0.b
+            cmpne   p0.b, p0/z, z16.b, z17.b
+            mov     z16.b, p0/z, #1
+            ptrue   p0.b
+            cmpne   p0.b, p0/z, z16.b, #0
             ptrue   p1.b
-            cmpne   p1.b, p1/z, z16.b, z17.b
-            mov     z16.b, p1/z, #1
-            ptrue   p1.b
-            cmpne   p1.b, p1/z, z16.b, #0
-            ptest   p0, p1.b
+            ptest   p1, p0.b
             bne     G_M14759_IG09
             add     w1, w1, w2
-            whilelt p1.b, w1, w3
-            ptest   p0, p1.b
+            whilelt p0.b, w1, w3
+            ptrue   p1.b
+            ptest   p1, p0.b
             blt     G_M14759_IG08
-						;; size=64 bbWeight=4 PerfScore 152.00
+						;; size=72 bbWeight=4 PerfScore 168.00
 G_M14759_IG09:        ; bbWeight=0.50, gcrefRegs=0001 {x0}, byrefRegs=0000 {}, byref, isz
             mov     w3, wzr
             mov     w4, wzr
@@ -198,7 +200,7 @@ G_M14759_IG19:        ; bbWeight=0, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
             brk     #0
 						;; size=8 bbWeight=0 PerfScore 0.00
 
-; Total bytes of code 380, prolog size 12, PerfScore 248.50, instruction count 95, allocated bytes for code 380 (MethodHash=5df7c658) for method SveBenchmarks.StrCmp:SveStrCmp():long:this (FullOpts)
+; Total bytes of code 388, prolog size 12, PerfScore 264.50, instruction count 97, allocated bytes for code 388 (MethodHash=5df7c658) for method SveBenchmarks.StrCmp:SveStrCmp():long:this (FullOpts)
 ; ============================================================
 
 Unwind Info:
@@ -209,7 +211,7 @@ Unwind Info:
   E bit             : 0
   X bit             : 0
   Vers              : 0
-  Function Length   : 95 (0x0005f) Actual length = 380 (0x00017c)
+  Function Length   : 97 (0x00061) Actual length = 388 (0x000184)
   ---- Epilog scopes ----
   ---- Scope 0
   Epilog Start Offset        : 3523193630 (0xd1ffab1e) Actual offset = 3523193630 (0xd1ffab1e) Offset from main function begin = 3523193630 (0xd1ffab1e)
+8 (+5.26%) : 21539.dasm - SveBenchmarks.StrLen:SveStrLen():ulong:this (FullOpts)
@@ -8,19 +8,20 @@
 ; Final local variable assignments
 ;
 ;  V00 this         [V00,T05] (  4,  4   )     ref  ->   x0         this class-hnd single-def <SveBenchmarks.StrLen>
-;  V01 loc0         [V01,T04] (  3,  7   )    mask  ->   p0         single-def <System.Numerics.Vector`1[byte]>
+;  V01 loc0         [V01,T11] (  2,  3   )  simd16  ->  d16         <System.Numerics.Vector`1[byte]>
 ;* V02 loc1         [V02    ] (  0,  0   )    mask  ->  zero-ref    <System.Numerics.Vector`1[byte]>
-;  V03 loc2         [V03,T10] (  5, 13   )  simd16  ->  d16         <System.Numerics.Vector`1[byte]>
+;  V03 loc2         [V03,T10] (  5, 13   )  simd16  ->  d17         <System.Numerics.Vector`1[byte]>
 ;  V04 loc3         [V04,T00] (  6, 18   )    long  ->   x1        
 ;  V05 loc4         [V05,T07] (  2,  5   )    long  ->   x2         single-def
-;  V06 loc5         [V06,T01] (  5, 12   )    mask  ->   p1         <System.Numerics.Vector`1[byte]>
-;  V07 loc6         [V07,T03] (  4,  7   )    long  ->   x0        
+;  V06 loc5         [V06,T01] (  5, 12   )    mask  ->   p0         <System.Numerics.Vector`1[byte]>
+;  V07 loc6         [V07,T04] (  4,  7   )    long  ->   x0        
 ;  V08 loc7         [V08    ] (  1,  1   )     ref  ->  [fp+0x18]   must-init pinned class-hnd single-def <byte[]>
 ;# V09 OutArgs      [V09    ] (  1,  1   )  struct ( 0) [sp+0x00]   do-not-enreg[XS] addr-exposed "OutgoingArgSpace" <Empty>
 ;  V10 tmp1         [V10,T02] (  5,  8   )     ref  ->   x0         class-hnd single-def "dup spill" <byte[]>
 ;  V11 tmp2         [V11,T08] (  2,  2   )    long  ->   x0         "Cast away GC"
 ;  V12 cse0         [V12,T06] (  3,  6   )     int  ->   x3         "CSE #02: aggressive"
-;  V13 cse1         [V13,T09] (  2,  1   )     int  ->   x4         "CSE #01: moderate"
+;  V13 cse1         [V13,T03] (  3,  8   )    mask  ->   p1         "CSE #03: aggressive"
+;  V14 cse2         [V14,T09] (  2,  1   )     int  ->   x4         "CSE #01: moderate"
 ;
 ; Lcl frame size = 16
 
@@ -31,16 +32,16 @@ G_M60402_IG01:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref,
 						;; size=12 bbWeight=1 PerfScore 2.50
 G_M60402_IG02:        ; bbWeight=1, gcrefRegs=0001 {x0}, byrefRegs=0000 {}, byref, isz
             ; gcrRegs +[x0]
-            ptrue   p0.b
+            mvni    v16.4s, #0
             mov     x1, xzr
             cntb    x2, all
             ldr     w3, [x0, #0x18]
             mov     w4, wzr
-            whilelt p1.b, w4, w3
+            whilelt p0.b, w4, w3
             ldr     x0, [x0, #0x08]
             str     x0, [fp, #0x18]	// [V08 loc7]
             cbz     x0, G_M60402_IG04
-						;; size=36 bbWeight=1 PerfScore 15.00
+						;; size=36 bbWeight=1 PerfScore 13.50
 G_M60402_IG03:        ; bbWeight=0.50, gcrefRegs=0001 {x0}, byrefRegs=0000 {}, byref, isz
             ldr     w4, [x0, #0x08]
             cbz     w4, G_M60402_IG04
@@ -54,28 +55,30 @@ G_M60402_IG04:        ; bbWeight=0.50, gcrefRegs=0000 {}, byrefRegs=0000 {}, byr
             mov     x0, xzr
 						;; size=4 bbWeight=0.50 PerfScore 0.25
 G_M60402_IG05:        ; bbWeight=2, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, isz
-            ld1b    { z16.b }, p1/z, [x0]
-            movi    v17.4s, #0
+            ld1b    { z17.b }, p0/z, [x0]
+            ptrue   p1.b
+            cmpne   p1.b, p1/z, z16.b, #0
+            movi    v16.4s, #0
             ptrue   p2.b
-            cmpeq   p2.b, p2/z, z16.b, z17.b
-            ptest   p0, p2.b
+            cmpeq   p2.b, p2/z, z17.b, z16.b
+            ptest   p1, p2.b
             bne     G_M60402_IG07
-						;; size=24 bbWeight=2 PerfScore 33.00
+						;; size=32 bbWeight=2 PerfScore 43.00
 G_M60402_IG06:        ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, isz
             add     x1, x1, x2
-            whilelt p1.b, w1, w3
+            whilelt p0.b, w1, w3
             add     x4, x0, x1
-            ld1b    { z16.b }, p1/z, [x4]
-            movi    v17.4s, #0
+            ld1b    { z17.b }, p0/z, [x4]
+            movi    v16.4s, #0
             ptrue   p2.b
-            cmpeq   p2.b, p2/z, z16.b, z17.b
-            ptest   p0, p2.b
+            cmpeq   p2.b, p2/z, z17.b, z16.b
+            ptest   p1, p2.b
             beq     G_M60402_IG06
 						;; size=36 bbWeight=4 PerfScore 78.00
 G_M60402_IG07:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
-            ptrue   p0.b
-            cmpne   p0.b, p0/z, z16.b, #0
-            cntp    x0, p1, p0.b
+            ptrue   p1.b
+            cmpne   p1.b, p1/z, z17.b, #0
+            cntp    x0, p0, p1.b
             add     x0, x0, x1
 						;; size=16 bbWeight=1 PerfScore 7.50
 G_M60402_IG08:        ; bbWeight=1, epilog, nogc, extend
@@ -83,7 +86,7 @@ G_M60402_IG08:        ; bbWeight=1, epilog, nogc, extend
             ret     lr
 						;; size=8 bbWeight=1 PerfScore 2.00
 
-; Total bytes of code 152, prolog size 12, PerfScore 141.00, instruction count 38, allocated bytes for code 152 (MethodHash=b293140d) for method SveBenchmarks.StrLen:SveStrLen():ulong:this (FullOpts)
+; Total bytes of code 160, prolog size 12, PerfScore 149.50, instruction count 40, allocated bytes for code 160 (MethodHash=b293140d) for method SveBenchmarks.StrLen:SveStrLen():ulong:this (FullOpts)
 ; ============================================================
 
 Unwind Info:
@@ -94,7 +97,7 @@ Unwind Info:
   E bit             : 0
   X bit             : 0
   Vers              : 0
-  Function Length   : 38 (0x00026) Actual length = 152 (0x000098)
+  Function Length   : 40 (0x00028) Actual length = 160 (0x0000a0)
   ---- Epilog scopes ----
   ---- Scope 0
   Epilog Start Offset        : 3523193630 (0xd1ffab1e) Actual offset = 3523193630 (0xd1ffab1e) Offset from main function begin = 3523193630 (0xd1ffab1e)
Details

Size improvements/regressions per collection

| Collection | Contexts with diffs | Improvements | Regressions | Same size | Improvements (bytes) | Regressions (bytes) |
|---|--:|--:|--:|--:|--:|--:|
| benchmarks.run.linux.arm64.checked.mch | 4 | 2 | 2 | 0 | -8 | +16 |
| coreclr_tests.run.linux.arm64.checked.mch | 10,995 | 10,810 | 148 | 37 | -336,352 | +2,368 |
| benchmarks.run_pgo.linux.arm64.checked.mch | 5 | 5 | 0 | 0 | -20 | +0 |
| libraries.pmi.linux.arm64.checked.mch | 10 | 10 | 0 | 0 | -40 | +0 |
| libraries_tests_no_tiered_compilation.run.linux.arm64.Release.mch | 0 | 0 | 0 | 0 | -0 | +0 |
| libraries_tests.run.linux.arm64.Release.mch | 0 | 0 | 0 | 0 | -0 | +0 |
| smoke_tests.nativeaot.linux.arm64.checked.mch | 0 | 0 | 0 | 0 | -0 | +0 |
| realworld.run.linux.arm64.checked.mch | 0 | 0 | 0 | 0 | -0 | +0 |
| benchmarks.run_pgo_optrepeat.linux.arm64.checked.mch | 4 | 2 | 2 | 0 | -8 | +16 |
| libraries.crossgen2.linux.arm64.checked.mch | 0 | 0 | 0 | 0 | -0 | +0 |
| | 11,018 | 10,829 | 152 | 37 | -336,428 | +2,400 |
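
As a quick sanity check on the totals row, the per-collection byte columns can be summed by hand. The sketch below is not part of the superpmi tooling; the dictionary name and the shortened collection keys are made up for illustration, and the numbers are copied from the table above (collections with no diffs are omitted).

```python
# Per-collection byte deltas copied from the size table:
# (improvements in bytes, regressions in bytes).
# Keys are shortened collection names, used here only as labels.
size_deltas = {
    "benchmarks.run": (-8, 16),
    "coreclr_tests.run": (-336_352, 2_368),
    "benchmarks.run_pgo": (-20, 0),
    "libraries.pmi": (-40, 0),
    "benchmarks.run_pgo_optrepeat": (-8, 16),
}

total_improved = sum(improved for improved, _ in size_deltas.values())
total_regressed = sum(regressed for _, regressed in size_deltas.values())
net = total_improved + total_regressed

print(f"improvements: {total_improved:,} bytes")
print(f"regressions:  {total_regressed:+,} bytes")
print(f"net change:   {net:,} bytes")
```

The two column sums match the totals row (-336,428 and +2,400), and almost all of the net reduction comes from the coreclr_tests collection.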

PerfScore improvements/regressions per collection

| Collection | Contexts with diffs | Improvements | Regressions | Same PerfScore | Improvements (PerfScore) | Regressions (PerfScore) | PerfScore Overall in FullOpts |
|---|--:|--:|--:|--:|--:|--:|--:|
| benchmarks.run.linux.arm64.checked.mch | 4 | 2 | 2 | 0 | -1.13% | +6.23% | +0.0003% |
| coreclr_tests.run.linux.arm64.checked.mch | 10,995 | 10,811 | 148 | 36 | -5.42% | +1.61% | -0.1610% |
| benchmarks.run_pgo.linux.arm64.checked.mch | 5 | 5 | 0 | 0 | -1.47% | 0.00% | 0.0000% |
| libraries.pmi.linux.arm64.checked.mch | 10 | 10 | 0 | 0 | -46.67% | 0.00% | -0.0024% |
| libraries_tests_no_tiered_compilation.run.linux.arm64.Release.mch | 0 | 0 | 0 | 0 | 0.00% | 0.00% | 0.0000% |
| libraries_tests.run.linux.arm64.Release.mch | 0 | 0 | 0 | 0 | 0.00% | 0.00% | 0.0000% |
| smoke_tests.nativeaot.linux.arm64.checked.mch | 0 | 0 | 0 | 0 | 0.00% | 0.00% | 0.0000% |
| realworld.run.linux.arm64.checked.mch | 0 | 0 | 0 | 0 | 0.00% | 0.00% | 0.0000% |
| benchmarks.run_pgo_optrepeat.linux.arm64.checked.mch | 4 | 2 | 2 | 0 | -1.13% | +6.23% | +0.0003% |
| libraries.crossgen2.linux.arm64.checked.mch | 0 | 0 | 0 | 0 | 0.00% | 0.00% | 0.0000% |

Context information

| Collection | Diffed contexts | MinOpts | FullOpts | Missed, base | Missed, diff |
|---|--:|--:|--:|--:|--:|
| benchmarks.run.linux.arm64.checked.mch | 34,596 | 2,941 | 31,655 | 0 (0.00%) | 0 (0.00%) |
| coreclr_tests.run.linux.arm64.checked.mch | 738,681 | 470,040 | 268,641 | 0 (0.00%) | 0 (0.00%) |
| benchmarks.run_pgo.linux.arm64.checked.mch | 132,155 | 61,313 | 70,842 | 0 (0.00%) | 0 (0.00%) |
| libraries.pmi.linux.arm64.checked.mch | 259,557 | 5 | 259,552 | 0 (0.00%) | 0 (0.00%) |
| libraries_tests_no_tiered_compilation.run.linux.arm64.Release.mch | 350,289 | 21,739 | 328,550 | 0 (0.00%) | 0 (0.00%) |
| libraries_tests.run.linux.arm64.Release.mch | 767,445 | 533,158 | 234,287 | 0 (0.00%) | 0 (0.00%) |
| smoke_tests.nativeaot.linux.arm64.checked.mch | 18,714 | 7 | 18,707 | 0 (0.00%) | 0 (0.00%) |
| realworld.run.linux.arm64.checked.mch | 28,798 | 39 | 28,759 | 0 (0.00%) | 0 (0.00%) |
| benchmarks.run_pgo_optrepeat.linux.arm64.checked.mch | 35,913 | 3,374 | 32,539 | 0 (0.00%) | 0 (0.00%) |
| libraries.crossgen2.linux.arm64.checked.mch | 265,227 | 17 | 265,210 | 0 (0.00%) | 0 (0.00%) |
| | 2,631,375 | 1,092,633 | 1,538,742 | 0 (0.00%) | 0 (0.00%) |

jit-analyze output

benchmarks.run.linux.arm64.checked.mch

Summary of Code Size diffs:
(Lower is better)

Total bytes of base: 16042184 (overridden on cmd)
Total bytes of diff: 16042192 (overridden on cmd)
Total bytes of delta: 8 (0.00 % of base)
    diff is a regression.
    relative diff is a regression.
Detail diffs


Top file regressions (bytes):
           8 : 21403.dasm (5.26 % of base)
           8 : 8287.dasm (2.11 % of base)

Top file improvements (bytes):
          -4 : 16114.dasm (-1.30 % of base)
          -4 : 26115.dasm (-1.05 % of base)

4 total files with Code Size differences (2 improved, 2 regressed), 0 unchanged.

Top method regressions (bytes):
           8 (2.11 % of base) : 8287.dasm - SveBenchmarks.StrCmp:SveStrCmp():long:this (FullOpts)
           8 (5.26 % of base) : 21403.dasm - SveBenchmarks.StrLen:SveStrLen():ulong:this (FullOpts)

Top method improvements (bytes):
          -4 (-1.30 % of base) : 16114.dasm - SveBenchmarks.StrCmp:SveTail():long:this (FullOpts)
          -4 (-1.05 % of base) : 26115.dasm - SveBenchmarks.StrIndexOf:SveTail():int:this (FullOpts)

Top method regressions (percentages):
           8 (5.26 % of base) : 21403.dasm - SveBenchmarks.StrLen:SveStrLen():ulong:this (FullOpts)
           8 (2.11 % of base) : 8287.dasm - SveBenchmarks.StrCmp:SveStrCmp():long:this (FullOpts)

Top method improvements (percentages):
          -4 (-1.30 % of base) : 16114.dasm - SveBenchmarks.StrCmp:SveTail():long:this (FullOpts)
          -4 (-1.05 % of base) : 26115.dasm - SveBenchmarks.StrIndexOf:SveTail():int:this (FullOpts)

4 total methods with Code Size differences (2 improved, 2 regressed).
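
The "Total bytes of delta" and "% of base" lines in each summary follow directly from the two totals. A minimal sketch of that arithmetic, using totals copied from the per-collection summaries in this report; the helper names `code_size_delta` and `verdict` are hypothetical and are not part of jit-analyze:

```python
def code_size_delta(base_bytes: int, diff_bytes: int) -> tuple[int, float]:
    """Return (delta in bytes, delta as a percentage of the base size)."""
    delta = diff_bytes - base_bytes
    percent = 100.0 * delta / base_bytes
    return delta, percent


def verdict(delta: int) -> str:
    """Lower is better, so a positive delta counts as a regression."""
    if delta > 0:
        return "regression"
    if delta < 0:
        return "improvement"
    return "unchanged"


# (base, diff) totals as reported in the summaries; keys are shortened collection names.
summaries = {
    "benchmarks.run": (16042184, 16042192),
    "coreclr_tests.run": (567688716, 567354732),
    "benchmarks.run_pgo": (71813300, 71813280),
    "libraries.pmi": (68207660, 68207620),
    "benchmarks.run_pgo_optrepeat": (16695796, 16695804),
}

for name, (base, diff) in summaries.items():
    delta, percent = code_size_delta(base, diff)
    print(f"{name}: {delta:+d} bytes ({percent:+.2f} % of base), {verdict(delta)}")
```

At two decimal places every collection except coreclr_tests rounds to zero, so the byte deltas are the more informative figure for the benchmark collections.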


coreclr_tests.run.linux.arm64.checked.mch

Summary of Code Size diffs:
(Lower is better)

Total bytes of base: 567688716 (overridden on cmd)
Total bytes of diff: 567354732 (overridden on cmd)
Total bytes of delta: -333984 (-0.06 % of base)
    diff is an improvement.
    relative diff is an improvement.
Detail diffs


Top file regressions (bytes):
          16 : 568472.dasm (2.53 % of base)
          16 : 567119.dasm (2.53 % of base)
          16 : 569192.dasm (2.53 % of base)
          16 : 569088.dasm (2.53 % of base)
          16 : 588064.dasm (2.50 % of base)
          16 : 574096.dasm (2.42 % of base)
          16 : 588776.dasm (2.50 % of base)
          16 : 567479.dasm (2.53 % of base)
          16 : 567847.dasm (2.53 % of base)
          16 : 568247.dasm (2.53 % of base)
          16 : 575544.dasm (2.50 % of base)
          16 : 567144.dasm (2.53 % of base)
          16 : 574664.dasm (2.42 % of base)
          16 : 575272.dasm (2.53 % of base)
          16 : 568047.dasm (2.53 % of base)
          16 : 567872.dasm (2.53 % of base)
          16 : 594152.dasm (2.50 % of base)
          16 : 569166.dasm (2.53 % of base)
          16 : 595048.dasm (2.50 % of base)
          16 : 568447.dasm (2.53 % of base)

Top file improvements (bytes):
        -152 : 574294.dasm (-8.02 % of base)
        -152 : 574505.dasm (-8.02 % of base)
        -152 : 574268.dasm (-8.02 % of base)
        -152 : 574680.dasm (-8.02 % of base)
        -152 : 574630.dasm (-8.02 % of base)
        -152 : 574138.dasm (-8.02 % of base)
        -152 : 574216.dasm (-8.02 % of base)
        -152 : 574605.dasm (-8.02 % of base)
        -152 : 574164.dasm (-8.02 % of base)
        -152 : 574530.dasm (-8.02 % of base)
        -152 : 574555.dasm (-8.02 % of base)
        -152 : 574112.dasm (-8.02 % of base)
        -152 : 574580.dasm (-8.02 % of base)
        -152 : 574190.dasm (-8.02 % of base)
        -152 : 574655.dasm (-8.02 % of base)
        -152 : 574242.dasm (-8.02 % of base)
        -144 : 574842.dasm (-8.07 % of base)
        -144 : 574819.dasm (-8.07 % of base)
        -144 : 574865.dasm (-8.07 % of base)
        -144 : 574796.dasm (-8.07 % of base)

84 total files with Code Size differences (54 improved, 30 regressed), 20 unchanged.

Top method regressions (bytes):
          16 (2.50 % of base) : 594152.dasm - JIT.HardwareIntrinsics.Arm._Sve.SimpleTernaryOpTest__Sve_ConditionalExtractAfterLastActiveElement_byte:RunBasicScenario_Load():this (FullOpts)
          16 (2.50 % of base) : 594408.dasm - JIT.HardwareIntrinsics.Arm._Sve.SimpleTernaryOpTest__Sve_ConditionalExtractAfterLastActiveElementAndReplicate_int:RunBasicScenario_Load():this (FullOpts)
          16 (2.50 % of base) : 594792.dasm - JIT.HardwareIntrinsics.Arm._Sve.SimpleTernaryOpTest__Sve_ConditionalExtractLastActiveElement_byte:RunBasicScenario_Load():this (FullOpts)
          16 (2.50 % of base) : 595048.dasm - JIT.HardwareIntrinsics.Arm._Sve.SimpleTernaryOpTest__Sve_ConditionalExtractLastActiveElementAndReplicate_int:RunBasicScenario_Load():this (FullOpts)
          16 (2.42 % of base) : 574096.dasm - JIT.HardwareIntrinsics.Arm._Sve.SimpleTernaryOpTest__Sve_CreateBreakAfterPropagateMask_byte:RunBasicScenario_Load():this (FullOpts)
          16 (2.42 % of base) : 574200.dasm - JIT.HardwareIntrinsics.Arm._Sve.SimpleTernaryOpTest__Sve_CreateBreakAfterPropagateMask_sbyte:RunBasicScenario_Load():this (FullOpts)
          16 (2.42 % of base) : 574664.dasm - JIT.HardwareIntrinsics.Arm._Sve.SimpleTernaryOpTest__Sve_CreateBreakBeforePropagateMask_long:RunBasicScenario_Load():this (FullOpts)
          16 (2.53 % of base) : 575272.dasm - JIT.HardwareIntrinsics.Arm._Sve.SimpleTernaryOpTest__Sve_DotProduct_int:RunBasicScenario_Load():this (FullOpts)
          16 (2.53 % of base) : 575294.dasm - JIT.HardwareIntrinsics.Arm._Sve.SimpleTernaryOpTest__Sve_DotProduct_long:RunBasicScenario_Load():this (FullOpts)
          16 (2.53 % of base) : 575358.dasm - JIT.HardwareIntrinsics.Arm._Sve.SimpleTernaryOpTest__Sve_DotProductBySelectedScalar_int:RunBasicScenario_Load():this (FullOpts)
          16 (2.53 % of base) : 575424.dasm - JIT.HardwareIntrinsics.Arm._Sve.SimpleTernaryOpTest__Sve_DotProductBySelectedScalar_ulong:RunBasicScenario_Load():this (FullOpts)
          16 (2.50 % of base) : 575544.dasm - JIT.HardwareIntrinsics.Arm._Sve.SimpleTernaryOpTest__Sve_FusedMultiplyAddNegated_float:RunBasicScenario_Load():this (FullOpts)
          16 (2.50 % of base) : 588064.dasm - JIT.HardwareIntrinsics.Arm._Sve.SimpleTernaryOpTest__Sve_MultiplyAdd_int:RunBasicScenario_Load():this (FullOpts)
          16 (2.50 % of base) : 588776.dasm - JIT.HardwareIntrinsics.Arm._Sve.SimpleTernaryOpTest__Sve_MultiplySubtract_long:RunBasicScenario_Load():this (FullOpts)
          16 (2.53 % of base) : 567119.dasm - JIT.HardwareIntrinsics.Arm._Sve2.SimpleTernaryOpTest__Sve2_AbsoluteDifferenceAdd_int:RunBasicScenario_Load():this (FullOpts)
          16 (2.53 % of base) : 567144.dasm - JIT.HardwareIntrinsics.Arm._Sve2.SimpleTernaryOpTest__Sve2_AbsoluteDifferenceAdd_long:RunBasicScenario_Load():this (FullOpts)
          16 (2.53 % of base) : 567311.dasm - JIT.HardwareIntrinsics.Arm._Sve2.SimpleTernaryOpTest__Sve2_AbsoluteDifferenceAddWideningLower_long_int:RunBasicScenario_Load():this (FullOpts)
          16 (2.53 % of base) : 567416.dasm - JIT.HardwareIntrinsics.Arm._Sve2.SimpleTernaryOpTest__Sve2_AbsoluteDifferenceAddWideningUpper_int_short:RunBasicScenario_Load():this (FullOpts)
          16 (2.53 % of base) : 567479.dasm - JIT.HardwareIntrinsics.Arm._Sve2.SimpleTernaryOpTest__Sve2_AbsoluteDifferenceAddWideningUpper_uint_ushort:RunBasicScenario_Load():this (FullOpts)
          16 (2.53 % of base) : 567847.dasm - JIT.HardwareIntrinsics.Arm._Sve2.SimpleTernaryOpTest__Sve2_BitwiseClearXor_int:RunBasicScenario_Load():this (FullOpts)

Top method improvements (bytes):
        -152 (-8.02 % of base) : 574112.dasm - JIT.HardwareIntrinsics.Arm._Sve.SimpleTernaryOpTest__Sve_CreateBreakAfterPropagateMask_byte:ConditionalSelect_ZeroOp():this (FullOpts)
        -152 (-8.02 % of base) : 574268.dasm - JIT.HardwareIntrinsics.Arm._Sve.SimpleTernaryOpTest__Sve_CreateBreakAfterPropagateMask_int:ConditionalSelect_ZeroOp():this (FullOpts)
        -152 (-8.02 % of base) : 574294.dasm - JIT.HardwareIntrinsics.Arm._Sve.SimpleTernaryOpTest__Sve_CreateBreakAfterPropagateMask_long:ConditionalSelect_ZeroOp():this (FullOpts)
        -152 (-8.02 % of base) : 574216.dasm - JIT.HardwareIntrinsics.Arm._Sve.SimpleTernaryOpTest__Sve_CreateBreakAfterPropagateMask_sbyte:ConditionalSelect_ZeroOp():this (FullOpts)
        -152 (-8.02 % of base) : 574242.dasm - JIT.HardwareIntrinsics.Arm._Sve.SimpleTernaryOpTest__Sve_CreateBreakAfterPropagateMask_short:ConditionalSelect_ZeroOp():this (FullOpts)
        -152 (-8.02 % of base) : 574164.dasm - JIT.HardwareIntrinsics.Arm._Sve.SimpleTernaryOpTest__Sve_CreateBreakAfterPropagateMask_uint:ConditionalSelect_ZeroOp():this (FullOpts)
        -152 (-8.02 % of base) : 574190.dasm - JIT.HardwareIntrinsics.Arm._Sve.SimpleTernaryOpTest__Sve_CreateBreakAfterPropagateMask_ulong:ConditionalSelect_ZeroOp():this (FullOpts)
        -152 (-8.02 % of base) : 574138.dasm - JIT.HardwareIntrinsics.Arm._Sve.SimpleTernaryOpTest__Sve_CreateBreakAfterPropagateMask_ushort:ConditionalSelect_ZeroOp():this (FullOpts)
        -152 (-8.02 % of base) : 574505.dasm - JIT.HardwareIntrinsics.Arm._Sve.SimpleTernaryOpTest__Sve_CreateBreakBeforePropagateMask_byte:ConditionalSelect_ZeroOp():this (FullOpts)
        -152 (-8.02 % of base) : 574655.dasm - JIT.HardwareIntrinsics.Arm._Sve.SimpleTernaryOpTest__Sve_CreateBreakBeforePropagateMask_int:ConditionalSelect_ZeroOp():this (FullOpts)
        -152 (-8.02 % of base) : 574680.dasm - JIT.HardwareIntrinsics.Arm._Sve.SimpleTernaryOpTest__Sve_CreateBreakBeforePropagateMask_long:ConditionalSelect_ZeroOp():this (FullOpts)
        -152 (-8.02 % of base) : 574605.dasm - JIT.HardwareIntrinsics.Arm._Sve.SimpleTernaryOpTest__Sve_CreateBreakBeforePropagateMask_sbyte:ConditionalSelect_ZeroOp():this (FullOpts)
        -152 (-8.02 % of base) : 574630.dasm - JIT.HardwareIntrinsics.Arm._Sve.SimpleTernaryOpTest__Sve_CreateBreakBeforePropagateMask_short:ConditionalSelect_ZeroOp():this (FullOpts)
        -152 (-8.02 % of base) : 574555.dasm - JIT.HardwareIntrinsics.Arm._Sve.SimpleTernaryOpTest__Sve_CreateBreakBeforePropagateMask_uint:ConditionalSelect_ZeroOp():this (FullOpts)
        -152 (-8.02 % of base) : 574580.dasm - JIT.HardwareIntrinsics.Arm._Sve.SimpleTernaryOpTest__Sve_CreateBreakBeforePropagateMask_ulong:ConditionalSelect_ZeroOp():this (FullOpts)
        -152 (-8.02 % of base) : 574530.dasm - JIT.HardwareIntrinsics.Arm._Sve.SimpleTernaryOpTest__Sve_CreateBreakBeforePropagateMask_ushort:ConditionalSelect_ZeroOp():this (FullOpts)
        -144 (-8.07 % of base) : 574842.dasm - JIT.HardwareIntrinsics.Arm._Sve.SimpleBinaryOpTest__Sve_CreateBreakPropagateMask_int:ConditionalSelect_ZeroOp():this (FullOpts)
        -144 (-8.07 % of base) : 574865.dasm - JIT.HardwareIntrinsics.Arm._Sve.SimpleBinaryOpTest__Sve_CreateBreakPropagateMask_long:ConditionalSelect_ZeroOp():this (FullOpts)
        -144 (-8.07 % of base) : 574796.dasm - JIT.HardwareIntrinsics.Arm._Sve.SimpleBinaryOpTest__Sve_CreateBreakPropagateMask_sbyte:ConditionalSelect_ZeroOp():this (FullOpts)
        -144 (-8.07 % of base) : 574819.dasm - JIT.HardwareIntrinsics.Arm._Sve.SimpleBinaryOpTest__Sve_CreateBreakPropagateMask_short:ConditionalSelect_ZeroOp():this (FullOpts)

Top method regressions (percentages):
          16 (2.53 % of base) : 575272.dasm - JIT.HardwareIntrinsics.Arm._Sve.SimpleTernaryOpTest__Sve_DotProduct_int:RunBasicScenario_Load():this (FullOpts)
          16 (2.53 % of base) : 575294.dasm - JIT.HardwareIntrinsics.Arm._Sve.SimpleTernaryOpTest__Sve_DotProduct_long:RunBasicScenario_Load():this (FullOpts)
          16 (2.53 % of base) : 575358.dasm - JIT.HardwareIntrinsics.Arm._Sve.SimpleTernaryOpTest__Sve_DotProductBySelectedScalar_int:RunBasicScenario_Load():this (FullOpts)
          16 (2.53 % of base) : 575424.dasm - JIT.HardwareIntrinsics.Arm._Sve.SimpleTernaryOpTest__Sve_DotProductBySelectedScalar_ulong:RunBasicScenario_Load():this (FullOpts)
          16 (2.53 % of base) : 567119.dasm - JIT.HardwareIntrinsics.Arm._Sve2.SimpleTernaryOpTest__Sve2_AbsoluteDifferenceAdd_int:RunBasicScenario_Load():this (FullOpts)
          16 (2.53 % of base) : 567144.dasm - JIT.HardwareIntrinsics.Arm._Sve2.SimpleTernaryOpTest__Sve2_AbsoluteDifferenceAdd_long:RunBasicScenario_Load():this (FullOpts)
          16 (2.53 % of base) : 567311.dasm - JIT.HardwareIntrinsics.Arm._Sve2.SimpleTernaryOpTest__Sve2_AbsoluteDifferenceAddWideningLower_long_int:RunBasicScenario_Load():this (FullOpts)
          16 (2.53 % of base) : 567416.dasm - JIT.HardwareIntrinsics.Arm._Sve2.SimpleTernaryOpTest__Sve2_AbsoluteDifferenceAddWideningUpper_int_short:RunBasicScenario_Load():this (FullOpts)
          16 (2.53 % of base) : 567479.dasm - JIT.HardwareIntrinsics.Arm._Sve2.SimpleTernaryOpTest__Sve2_AbsoluteDifferenceAddWideningUpper_uint_ushort:RunBasicScenario_Load():this (FullOpts)
          16 (2.53 % of base) : 567847.dasm - JIT.HardwareIntrinsics.Arm._Sve2.SimpleTernaryOpTest__Sve2_BitwiseClearXor_int:RunBasicScenario_Load():this (FullOpts)
          16 (2.53 % of base) : 567872.dasm - JIT.HardwareIntrinsics.Arm._Sve2.SimpleTernaryOpTest__Sve2_BitwiseClearXor_long:RunBasicScenario_Load():this (FullOpts)
          16 (2.53 % of base) : 568047.dasm - JIT.HardwareIntrinsics.Arm._Sve2.SimpleTernaryOpTest__Sve2_BitwiseSelect_int:RunBasicScenario_Load():this (FullOpts)
          16 (2.53 % of base) : 568072.dasm - JIT.HardwareIntrinsics.Arm._Sve2.SimpleTernaryOpTest__Sve2_BitwiseSelect_long:RunBasicScenario_Load():this (FullOpts)
          16 (2.53 % of base) : 568247.dasm - JIT.HardwareIntrinsics.Arm._Sve2.SimpleTernaryOpTest__Sve2_BitwiseSelectLeftInverted_int:RunBasicScenario_Load():this (FullOpts)
          16 (2.53 % of base) : 568272.dasm - JIT.HardwareIntrinsics.Arm._Sve2.SimpleTernaryOpTest__Sve2_BitwiseSelectLeftInverted_long:RunBasicScenario_Load():this (FullOpts)
          16 (2.53 % of base) : 568447.dasm - JIT.HardwareIntrinsics.Arm._Sve2.SimpleTernaryOpTest__Sve2_BitwiseSelectRightInverted_int:RunBasicScenario_Load():this (FullOpts)
          16 (2.53 % of base) : 568472.dasm - JIT.HardwareIntrinsics.Arm._Sve2.SimpleTernaryOpTest__Sve2_BitwiseSelectRightInverted_long:RunBasicScenario_Load():this (FullOpts)
          16 (2.53 % of base) : 569166.dasm - JIT.HardwareIntrinsics.Arm._Sve2.SimpleTernaryOpTest__Sve2_Xor_byte:RunBasicScenario_Load():this (FullOpts)
          16 (2.53 % of base) : 569088.dasm - JIT.HardwareIntrinsics.Arm._Sve2.SimpleTernaryOpTest__Sve2_Xor_short:RunBasicScenario_Load():this (FullOpts)
          16 (2.53 % of base) : 569192.dasm - JIT.HardwareIntrinsics.Arm._Sve2.SimpleTernaryOpTest__Sve2_Xor_ushort:RunBasicScenario_Load():this (FullOpts)

Top method improvements (percentages):
         -28 (-58.33 % of base) : 358603.dasm - PredicateInstructions:BitwiseClear():System.Numerics.Vector`1[short] (FullOpts)
         -24 (-54.55 % of base) : 358606.dasm - PredicateInstructions:ConditionalSelect():System.Numerics.Vector`1[int] (FullOpts)
         -28 (-43.75 % of base) : 679474.dasm - Runtime_1068867:TestEntryPoint() (FullOpts)
         -20 (-41.67 % of base) : 358602.dasm - PredicateInstructions:And():System.Numerics.Vector`1[short] (FullOpts)
         -20 (-41.67 % of base) : 358605.dasm - PredicateInstructions:Or():System.Numerics.Vector`1[short] (FullOpts)
         -20 (-41.67 % of base) : 358604.dasm - PredicateInstructions:Xor():System.Numerics.Vector`1[int] (FullOpts)
         -48 (-33.33 % of base) : 679471.dasm - Runtime_106868:TestEntryPoint() (FullOpts)
         -28 (-23.33 % of base) : 642641.dasm - ChangeMaskUse:CastMaskUseAsMask() (FullOpts)
         -20 (-22.73 % of base) : 679365.dasm - Runtime_105720:TestEntryPoint() (FullOpts)
         -28 (-21.21 % of base) : 679478.dasm - Runtime_106872:TestEntryPoint() (FullOpts)
         -20 (-19.23 % of base) : 642640.dasm - ChangeMaskUse:CastMaskUseAsVector() (FullOpts)
         -16 (-17.39 % of base) : 349058.dasm - EmbeddedLoads:CndSelectEmbeddedOp3LoadAllBits(int[],System.Numerics.Vector`1[int]) (FullOpts)
         -16 (-17.39 % of base) : 349060.dasm - EmbeddedLoads:CndSelectEmbeddedOp3LoadZero(int[],System.Numerics.Vector`1[int]) (FullOpts)
         -20 (-17.24 % of base) : 679570.dasm - Runtime_113338:Test() (FullOpts)
          -4 (-16.67 % of base) : 524461.dasm - System.Runtime.Intrinsics.Arm.Sve:CreateFalseMaskByte():System.Numerics.Vector`1[byte] (FullOpts)
          -4 (-16.67 % of base) : 106072.dasm - System.Runtime.Intrinsics.Arm.Sve:CreateFalseMaskByte():System.Numerics.Vector`1[byte] (Tier0)
          -4 (-16.67 % of base) : 524463.dasm - System.Runtime.Intrinsics.Arm.Sve:CreateFalseMaskDouble():System.Numerics.Vector`1[double] (FullOpts)
          -4 (-16.67 % of base) : 106082.dasm - System.Runtime.Intrinsics.Arm.Sve:CreateFalseMaskDouble():System.Numerics.Vector`1[double] (Tier0)
          -4 (-16.67 % of base) : 524465.dasm - System.Runtime.Intrinsics.Arm.Sve:CreateFalseMaskInt16():System.Numerics.Vector`1[short] (FullOpts)
          -4 (-16.67 % of base) : 106093.dasm - System.Runtime.Intrinsics.Arm.Sve:CreateFalseMaskInt16():System.Numerics.Vector`1[short] (Tier0)


benchmarks.run_pgo.linux.arm64.checked.mch

Summary of Code Size diffs:
(Lower is better)

Total bytes of base: 71813300 (overridden on cmd)
Total bytes of diff: 71813280 (overridden on cmd)
Total bytes of delta: -20 (-0.00 % of base)
    diff is an improvement.
    relative diff is an improvement.
Detail diffs


Top file improvements (bytes):
          -4 : 24532.dasm (-0.58 % of base)
          -4 : 76632.dasm (-0.49 % of base)
          -4 : 39357.dasm (-0.43 % of base)
          -4 : 14743.dasm (-0.38 % of base)
          -4 : 58518.dasm (-0.85 % of base)

5 total files with Code Size differences (5 improved, 0 regressed), 0 unchanged.

Top method improvements (bytes):
          -4 (-0.38 % of base) : 14743.dasm - SveBenchmarks.StrCmp:SveStrCmp():long:this (Instrumented Tier0)
          -4 (-0.43 % of base) : 39357.dasm - SveBenchmarks.StrCmp:SveTail():long:this (Instrumented Tier0)
          -4 (-0.58 % of base) : 24532.dasm - SveBenchmarks.StrIndexOf:SveIndexOf():int:this (Instrumented Tier0)
          -4 (-0.49 % of base) : 76632.dasm - SveBenchmarks.StrIndexOf:SveTail():int:this (Instrumented Tier0)
          -4 (-0.85 % of base) : 58518.dasm - SveBenchmarks.StrLen:SveStrLen():ulong:this (Instrumented Tier0)

Top method improvements (percentages):
          -4 (-0.85 % of base) : 58518.dasm - SveBenchmarks.StrLen:SveStrLen():ulong:this (Instrumented Tier0)
          -4 (-0.58 % of base) : 24532.dasm - SveBenchmarks.StrIndexOf:SveIndexOf():int:this (Instrumented Tier0)
          -4 (-0.49 % of base) : 76632.dasm - SveBenchmarks.StrIndexOf:SveTail():int:this (Instrumented Tier0)
          -4 (-0.43 % of base) : 39357.dasm - SveBenchmarks.StrCmp:SveTail():long:this (Instrumented Tier0)
          -4 (-0.38 % of base) : 14743.dasm - SveBenchmarks.StrCmp:SveStrCmp():long:this (Instrumented Tier0)

5 total methods with Code Size differences (5 improved, 0 regressed).


libraries.pmi.linux.arm64.checked.mch

Summary of Code Size diffs:
(Lower is better)

Total bytes of base: 68207660 (overridden on cmd)
Total bytes of diff: 68207620 (overridden on cmd)
Total bytes of delta: -40 (-0.00 % of base)
    diff is an improvement.
    relative diff is an improvement.
Detail diffs


Top file improvements (bytes):
          -4 : 11398.dasm (-16.67 % of base)
          -4 : 11407.dasm (-16.67 % of base)
          -4 : 11405.dasm (-16.67 % of base)
          -4 : 11400.dasm (-16.67 % of base)
          -4 : 11399.dasm (-16.67 % of base)
          -4 : 11401.dasm (-16.67 % of base)
          -4 : 11403.dasm (-16.67 % of base)
          -4 : 11402.dasm (-16.67 % of base)
          -4 : 11406.dasm (-16.67 % of base)
          -4 : 11404.dasm (-16.67 % of base)

10 total files with Code Size differences (10 improved, 0 regressed), 0 unchanged.

Top method improvements (bytes):
          -4 (-16.67 % of base) : 11398.dasm - System.Runtime.Intrinsics.Arm.Sve:CreateFalseMaskByte():System.Numerics.Vector`1[byte] (FullOpts)
          -4 (-16.67 % of base) : 11399.dasm - System.Runtime.Intrinsics.Arm.Sve:CreateFalseMaskDouble():System.Numerics.Vector`1[double] (FullOpts)
          -4 (-16.67 % of base) : 11400.dasm - System.Runtime.Intrinsics.Arm.Sve:CreateFalseMaskInt16():System.Numerics.Vector`1[short] (FullOpts)
          -4 (-16.67 % of base) : 11401.dasm - System.Runtime.Intrinsics.Arm.Sve:CreateFalseMaskInt32():System.Numerics.Vector`1[int] (FullOpts)
          -4 (-16.67 % of base) : 11402.dasm - System.Runtime.Intrinsics.Arm.Sve:CreateFalseMaskInt64():System.Numerics.Vector`1[long] (FullOpts)
          -4 (-16.67 % of base) : 11403.dasm - System.Runtime.Intrinsics.Arm.Sve:CreateFalseMaskSByte():System.Numerics.Vector`1[sbyte] (FullOpts)
          -4 (-16.67 % of base) : 11404.dasm - System.Runtime.Intrinsics.Arm.Sve:CreateFalseMaskSingle():System.Numerics.Vector`1[float] (FullOpts)
          -4 (-16.67 % of base) : 11405.dasm - System.Runtime.Intrinsics.Arm.Sve:CreateFalseMaskUInt16():System.Numerics.Vector`1[ushort] (FullOpts)
          -4 (-16.67 % of base) : 11406.dasm - System.Runtime.Intrinsics.Arm.Sve:CreateFalseMaskUInt32():System.Numerics.Vector`1[uint] (FullOpts)
          -4 (-16.67 % of base) : 11407.dasm - System.Runtime.Intrinsics.Arm.Sve:CreateFalseMaskUInt64():System.Numerics.Vector`1[ulong] (FullOpts)

Top method improvements (percentages):
          -4 (-16.67 % of base) : 11398.dasm - System.Runtime.Intrinsics.Arm.Sve:CreateFalseMaskByte():System.Numerics.Vector`1[byte] (FullOpts)
          -4 (-16.67 % of base) : 11399.dasm - System.Runtime.Intrinsics.Arm.Sve:CreateFalseMaskDouble():System.Numerics.Vector`1[double] (FullOpts)
          -4 (-16.67 % of base) : 11400.dasm - System.Runtime.Intrinsics.Arm.Sve:CreateFalseMaskInt16():System.Numerics.Vector`1[short] (FullOpts)
          -4 (-16.67 % of base) : 11401.dasm - System.Runtime.Intrinsics.Arm.Sve:CreateFalseMaskInt32():System.Numerics.Vector`1[int] (FullOpts)
          -4 (-16.67 % of base) : 11402.dasm - System.Runtime.Intrinsics.Arm.Sve:CreateFalseMaskInt64():System.Numerics.Vector`1[long] (FullOpts)
          -4 (-16.67 % of base) : 11403.dasm - System.Runtime.Intrinsics.Arm.Sve:CreateFalseMaskSByte():System.Numerics.Vector`1[sbyte] (FullOpts)
          -4 (-16.67 % of base) : 11404.dasm - System.Runtime.Intrinsics.Arm.Sve:CreateFalseMaskSingle():System.Numerics.Vector`1[float] (FullOpts)
          -4 (-16.67 % of base) : 11405.dasm - System.Runtime.Intrinsics.Arm.Sve:CreateFalseMaskUInt16():System.Numerics.Vector`1[ushort] (FullOpts)
          -4 (-16.67 % of base) : 11406.dasm - System.Runtime.Intrinsics.Arm.Sve:CreateFalseMaskUInt32():System.Numerics.Vector`1[uint] (FullOpts)
          -4 (-16.67 % of base) : 11407.dasm - System.Runtime.Intrinsics.Arm.Sve:CreateFalseMaskUInt64():System.Numerics.Vector`1[ulong] (FullOpts)

10 total methods with Code Size differences (10 improved, 0 regressed).


benchmarks.run_pgo_optrepeat.linux.arm64.checked.mch

Summary of Code Size diffs:
(Lower is better)

Total bytes of base: 16695796 (overridden on cmd)
Total bytes of diff: 16695804 (overridden on cmd)
Total bytes of delta: 8 (0.00 % of base)
    diff is a regression.
    relative diff is a regression.
Detail diffs


Top file regressions (bytes):
           8 : 6897.dasm (2.11 % of base)
           8 : 21539.dasm (5.26 % of base)

Top file improvements (bytes):
          -4 : 13109.dasm (-1.30 % of base)
          -4 : 26420.dasm (-1.05 % of base)

4 total files with Code Size differences (2 improved, 2 regressed), 0 unchanged.

Top method regressions (bytes):
           8 (2.11 % of base) : 6897.dasm - SveBenchmarks.StrCmp:SveStrCmp():long:this (FullOpts)
           8 (5.26 % of base) : 21539.dasm - SveBenchmarks.StrLen:SveStrLen():ulong:this (FullOpts)

Top method improvements (bytes):
          -4 (-1.30 % of base) : 13109.dasm - SveBenchmarks.StrCmp:SveTail():long:this (FullOpts)
          -4 (-1.05 % of base) : 26420.dasm - SveBenchmarks.StrIndexOf:SveTail():int:this (FullOpts)

Top method regressions (percentages):
           8 (5.26 % of base) : 21539.dasm - SveBenchmarks.StrLen:SveStrLen():ulong:this (FullOpts)
           8 (2.11 % of base) : 6897.dasm - SveBenchmarks.StrCmp:SveStrCmp():long:this (FullOpts)

Top method improvements (percentages):
          -4 (-1.30 % of base) : 13109.dasm - SveBenchmarks.StrCmp:SveTail():long:this (FullOpts)
          -4 (-1.05 % of base) : 26420.dasm - SveBenchmarks.StrIndexOf:SveTail():int:this (FullOpts)

4 total methods with Code Size differences (2 improved, 2 regressed).


@a74nh
Contributor Author

a74nh commented Jun 16, 2025

Diffs look the same to me, which is as expected: only PredicateInstructions.cs triggers the optimisation.

@kunalspathak
Member

> I suggest for this PR disabling the optimisation by removing the call to fgMorphTryUseAllMaskVariant() and fixing up the test cases. A follow on PR would add the cost modelling.

In that case we will have to reopen all the issues that #114438 closed when we introduced that method.

Do you think we should fix the Predicate tests that were regressed by the work that this PR introduces and then open a follow-up issue to fix it in general?

@a74nh
Contributor Author

a74nh commented Jun 16, 2025

> In that case we will have to reopen all the issues that #114438 closed when we introduced that method.

Given these were essentially dups of each other, could we just reopen one of them?

> Do you think we should fix the Predicate tests that were regressed by the work that this PR introduces and then open a follow-up issue to fix it in general?

I think fixing it will be quite a bit of work, so best in a new PR. This PR at least removes the asmcheck lines.

Labels
arch-arm64 area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI arm-sve Work related to arm64 SVE/SVE2 support community-contribution Indicates that the PR has been added by a community member