aarch64: Specialize constant vector shifts #5976

alexcrichton · 2023-03-10T03:24:03Z

This commit adds special lowering rules for
vector-shifts-by-constant-amounts to use dedicated instructions which cuts down on the codegen here quite a bit for constant values.

cfallin

LGTM -- thanks, this codegen is much improved!

cfallin · 2023-03-10T04:06:12Z

cranelift/filetests/filetests/isa/aarch64/simd-bitwise-compile.clif

 ;   ret
 ; 
 ; Disassembled:
 ; block0: ; offset 0x0
-;   ldr q5, #8
+;   ldr q1, #8


Orthogonal to your changes here but I'm surprised we haven't moved from inline constants we branch around to the VCodeConstant infrastructure; I guess I am confusing this with the x64 backend, where we do have that. Using the end-of-func constant pool here as well might be a nice improvement for a followup PR, especially if the data lands in the middle of an inner loop and pushes the body into another icache line or fetch cycle...

Agreed! The fallback for shuffle does an inline load like this as well, and I'm curious enough to hopefully try to give this a stab in the near future

alexcrichton · 2023-03-10T15:34:14Z

Mind re-reviewing the latest commit when you get a chance @cfallin? Turns out I was only running Cranelift's test suite and the wasm test suite exercised a case where a right-shift happened at the full bit-width of the type triggering assertions in the backend for emission of these instructions with a 0-shift. I've special-cased the ISLE lowerings to not emit an instruction with 0-shifts but I'm not sure if the ISLE is correct since it relies on rule priorities to be correct, which I don't think is how we'd like to rely on ISLE, right?

cfallin

LGTM with or without below addition. Thanks for fixing; I hadn't realized that about aarch64 vector shifts!

cfallin · 2023-03-10T19:02:10Z

cranelift/filetests/filetests/isa/aarch64/simd-arithmetic.clif

+
+; VCode:
+; block0:
+;   shl v0.16b, v0.16b, #0


I might be missing something obvious but is there any reason we can't match this left-shift-by-0 (when masked) in the same way and pass the value straight through?

This seems to me like something ideally left to an egraph-style optimization which removes the shift before codegen, and I ideally wanted to do the same thing for the other lowerings but it was just the restrictions on arm64 instruction incoded which prevented ignoring the shift-right-by-zero case. That being said I may as well add it to be consistent with the other rules!

Ah, yes, that's true -- it's a reasonable algebraic simplification that we should include in the mid-end as well (might increase reach of other opts etc).

I'm confused. We do have egraph rules for removing shifts and rotates by 0. Are they not firing here? Are these filetests running with optimization disabled, maybe?

If you need to match on shifts by zero to avoid generating illegal instructions in a particular backend, that's important since we don't necessarily run the egraph pass. But I would be inclined not to bother special-casing anything in the backend that isn't necessary for correctness if it can be done in the egraph pass.

The shifts here are by the full bit width, e.g. a shift by 8 for i8x16, which I don't think is accounted for in the egraph rules (but I just checked to see the checks for 0-shifts, so this would "just" be an extension of that rule). I don't know whether filetests run with or without optimizations, but I would guess without.

Avoiding the 0-case is required for the shift-right cases for correctness since it looks like right-shifts with zero-width can't be encoded as such an instruction (I think this has to do with how right-shift is a negative left-shift on AArch64 or something like that, so maybe this is a too-strict assertion somewhere, not sure, I didn't dig deeply enough to go that far).

I went ahead and added the case here as requested for shift-lefts even though it isn't strictly necessary as required by the backend. In some sense though it does sort of feel like if someone reads that rule it depends on the time of day as to whether they would judge whether or not the rule should be included, so in that sense I don't know whether to leave it in or not as it's not obviously correct nor obviously incorrect.

I agree with @jameysharp in general that we shouldn't pull too many opts into lowering, because of cost in non-optimizing mode and because they can't interact with the main rewriting fixpoint when we're already here. I don't feel too strongly about this case; I guess I would note that it's a nice symmetry with the right-shift case here (which is why it popped out at me originally). On the other hand, we don't special-case x + 0 in the backends, nor should we. So... I'll happily hand the baton to @jameysharp for final judgment here and only request that we extend the mid-end rule to understand masking and cover this case.

Well, if I have the baton…

I think our general principle should be that algebraic simplifications should not be in backend rules. We should save our backend complexity budget for special cases where we can select a better instruction sequence due to target-specific features.

I don't feel strongly enough about this principle to block merging this PR. I'd lean toward removing the special case for left-shifts but you're welcome to merge as-is if you want to on the basis of @cfallin's r+.

We'll have to figure out how best to handle the wrapping semantics of shifts in the mid-end rules. We should review any existing mid-end rules that match on constant shift amounts to make sure they're handling this correctly, too.

Works for me! I've removed the left-shift case and rebased as well 👍

This commit adds special lowering rules for vector-shifts-by-constant-amounts to use dedicated instructions which cuts down on the codegen here quite a bit for constant values.

github-actions bot added cranelift Issues related to the Cranelift code generator cranelift:area:aarch64 Issues related to AArch64 backend. labels Mar 10, 2023

cfallin approved these changes Mar 10, 2023

View reviewed changes

alexcrichton enabled auto-merge March 10, 2023 20:02

alexcrichton added this pull request to the merge queue Mar 10, 2023

alexcrichton removed this pull request from the merge queue due to a manual request Mar 10, 2023

alexcrichton mentioned this pull request Mar 12, 2023

aarch64: Use VCodeConstant for f64/v128 constants #5997

Merged

alexcrichton force-pushed the aarch64-constant-shift branch from 828c2ec to ea8077f Compare March 13, 2023 19:33

alexcrichton enabled auto-merge March 13, 2023 19:33

alexcrichton added this pull request to the merge queue Mar 13, 2023

github-merge-queue bot removed this pull request from the merge queue due to a conflict with the base branch Mar 13, 2023

alexcrichton added 4 commits March 13, 2023 14:59

aarch64: Specialize constant vector shifts

3908684

This commit adds special lowering rules for vector-shifts-by-constant-amounts to use dedicated instructions which cuts down on the codegen here quite a bit for constant values.

Fix codegen for 0-shift-rights

5374c0b

Special-case zero left-shifts as well

a1a87d8

Remove left-shift special case

803e188

alexcrichton force-pushed the aarch64-constant-shift branch from ea8077f to 803e188 Compare March 13, 2023 21:59

alexcrichton enabled auto-merge March 13, 2023 22:00

alexcrichton added this pull request to the merge queue Mar 13, 2023

Merged via the queue into bytecodealliance:main with commit d6ce632 Mar 13, 2023

alexcrichton deleted the aarch64-constant-shift branch March 13, 2023 23:16

jameysharp mentioned this pull request Apr 5, 2023

Add release notes for 8.0.0 #6145

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

aarch64: Specialize constant vector shifts #5976

aarch64: Specialize constant vector shifts #5976

alexcrichton commented Mar 10, 2023

cfallin left a comment

cfallin Mar 10, 2023

alexcrichton Mar 10, 2023

alexcrichton commented Mar 10, 2023

cfallin left a comment

cfallin Mar 10, 2023

alexcrichton Mar 10, 2023

cfallin Mar 10, 2023

jameysharp Mar 10, 2023

alexcrichton Mar 10, 2023

cfallin Mar 10, 2023

jameysharp Mar 13, 2023

alexcrichton Mar 13, 2023

aarch64: Specialize constant vector shifts #5976

aarch64: Specialize constant vector shifts #5976

Conversation

alexcrichton commented Mar 10, 2023

cfallin left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

alexcrichton commented Mar 10, 2023

cfallin left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment