Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JIT: Add an emitter peephole for post-indexed addressing #105181

Merged
merged 4 commits into from
Jul 21, 2024

Conversation

jakobbotsch
Copy link
Member

@jakobbotsch jakobbotsch commented Jul 20, 2024

This transforms sequences like

ldr x0, [x1]
add x1, x1, #8

into the equivalent

ldr x0, [x1], #8

The second half of this change will be having lowering and strength reduction set up the IR such that this transformation kicks in.

Example codegen:

public static ref int ArrRef(int[] arr)
{
    return ref MemoryMarshal.GetArrayDataReference(arr);
}
@@ -6,15 +6,14 @@ G_M1984_IG01:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref,
 
 G_M1984_IG02:        ; bbWeight=1, gcrefRegs=0001 {x0}, byrefRegs=0000 {}, byref
                              ; gcrRegs +[x0]
-            ldrsb   wzr, [x0]
-            add     x0, x0, #16
+            ldrsb   wzr, [x0], #0x10
                              ; gcrRegs -[x0]
                              ; byrRegs +[x0]
-						;; size=8 bbWeight=1 PerfScore 3.50
+						;; size=4 bbWeight=1 PerfScore 3.00
 
 G_M1984_IG03:        ; bbWeight=1, epilog, nogc, extend
             ldp     fp, lr, [sp], #0x10
             ret     lr
 						;; size=8 bbWeight=1 PerfScore 2.00
-; Total bytes of code: 24
+; Total bytes of code: 20

No pre-indexing support yet.

This transforms sequences like
```asm
ldr x0, [x1]
add x1, x1, dotnet#8
```

into the equivalent
```asm
ldr x0, [x1], dotnet#8
```

The second half of this change will be having lowering and strength
reduction set up the IR such that this transformation kicks in.
@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Jul 20, 2024
Copy link
Contributor

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

@jakobbotsch
Copy link
Member Author

/azp run runtime-coreclr jitstress, runtime-coreclr libraries-jitstress, runtime-coreclr gcstress0x3-gcstress0xc

Copy link

Azure Pipelines successfully started running 3 pipeline(s).

@jakobbotsch
Copy link
Member Author

jakobbotsch commented Jul 20, 2024

cc @dotnet/jit-contrib - sorry for the weekend ping, but would like to get this into preview 7.

jitstress failures are #105186, #105187

libraries-jitstress failures are #105092, #105189, #102706

gcstress failures are #105186, #105187

Diffs

@jakobbotsch jakobbotsch marked this pull request as ready for review July 20, 2024 19:47
@@ -5844,6 +5844,12 @@ void emitter::emitIns_R_R_I(instruction ins,
return;
}

if ((reg1 == reg2) && (EA_SIZE(attr) == EA_PTRSIZE) && emitComp->opts.OptimizationEnabled() &&
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't necessarily have to be EA_PTRSIZE, right? Rather, we just need the offset to be a multiple of 8 in range (looks to be in the range of -4096 to 4032, inclusive)?

-- Not something I think that needs to be handled in this PR, but rather that might be a possible future improvement on top.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like it depends on the instruction. Some are multiples of 4/8/16 and some might be raw offsets. The ranges vary based on instruction too

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It has to be EA_PTRSIZE -- the register that is written is the address that was used for the load, so it is always pointer sized.

The offset is an unscaled 9-bit signed immediate for the (single register) loads that support the write-back. So -256 to 255 is supported for the combined form.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It has to be EA_PTRSIZE -- the register that is written is the address that was used for the load, so it is always pointer sized.

👍

The offset is an unscaled 9-bit signed immediate for the (single register) loads that support the write-back. So -256 to 255 is supported for the combined form.

Yeah, I was looking at the wrong instruction here. There's a few different post-indexing forms. For ldp, as an example, it's:

For the 32-bit post-index and 32-bit pre-index variant: is the signed immediate byte offset, a multiple of 4 in the range -256 to 252, encoded in the "imm7" field as /4.
For the 64-bit post-index and 64-bit pre-index variant: is the signed immediate byte offset, a multiple of 8 in the range -512 to 504, encoded in the "imm7" field as /8.

There's then ldapr which is fixed 4 or 8, ldiapp which is fixed 8 or 16, and ldpsw which has the ... a multiple of 4 in the range -256 to 252 ... text

Seems we're not handling these ones in this PR, which is fine, just had a mental disconnect due to the differences between them 😄

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, some of those forms we could definitely handle in the future. One complication is that some of those forms allow redefining the GC ness of up to 3 registers which instrDesc does not support today, so we would need to expand it in some way.

return false;
}

if ((emitLastIns->idInsFmt() != IF_LS_2A) || emitLastIns->idIsTlsGD())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are there any load/store instructions this doesn't cover for the initial work being done?

There's a lot of different instructions that support post-indexing, but not sure if all of them are IF_LS_2A or not

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this doesn't support ldp and stp. Those instructions can have 7-bit scaled offsets.

There's a lot of different instructions that support post-indexing, but not sure if all of them are IF_LS_2A or not

I don't think there are any other instructions that load or stores that support the post-indexed writeback form, but I could be wrong. ldp and stp have support for post-indexing with writeback that we aren't supporting here yet, so that's something we could look into adding in the future.

Copy link
Member

@tannergooding tannergooding left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes LGTM.

Would be nice to log an issue for LDP to be covered as well, but I don't think its important to handle in this PR

@jakobbotsch jakobbotsch changed the title JIT: Add an emitter peephole for for post-indexed addressing JIT: Add an emitter peephole for post-indexed addressing Jul 21, 2024
@jakobbotsch jakobbotsch merged commit fcb9b18 into dotnet:main Jul 21, 2024
101 of 108 checks passed
@jakobbotsch jakobbotsch deleted the post-indexed-addressing branch July 21, 2024 07:56
@jakobbotsch
Copy link
Member Author

Would be nice to log an issue for LDP to be covered as well, but I don't think its important to handle in this PR

Good idea, I opened #105192 for that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants