Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

aarch64: Specialize constant vector shifts #5976

Merged

Conversation

alexcrichton
Copy link
Member

This commit adds special lowering rules for
vector-shifts-by-constant-amounts to use dedicated instructions which cuts down on the codegen here quite a bit for constant values.

@github-actions github-actions bot added cranelift Issues related to the Cranelift code generator cranelift:area:aarch64 Issues related to AArch64 backend. labels Mar 10, 2023
Copy link
Member

@cfallin cfallin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM -- thanks, this codegen is much improved!

; ret
;
; Disassembled:
; block0: ; offset 0x0
; ldr q5, #8
; ldr q1, #8
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Orthogonal to your changes here but I'm surprised we haven't moved from inline constants we branch around to the VCodeConstant infrastructure; I guess I am confusing this with the x64 backend, where we do have that. Using the end-of-func constant pool here as well might be a nice improvement for a followup PR, especially if the data lands in the middle of an inner loop and pushes the body into another icache line or fetch cycle...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed! The fallback for shuffle does an inline load like this as well, and I'm curious enough to hopefully try to give this a stab in the near future

@alexcrichton
Copy link
Member Author

Mind re-reviewing the latest commit when you get a chance @cfallin? Turns out I was only running Cranelift's test suite and the wasm test suite exercised a case where a right-shift happened at the full bit-width of the type triggering assertions in the backend for emission of these instructions with a 0-shift. I've special-cased the ISLE lowerings to not emit an instruction with 0-shifts but I'm not sure if the ISLE is correct since it relies on rule priorities to be correct, which I don't think is how we'd like to rely on ISLE, right?

Copy link
Member

@cfallin cfallin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with or without below addition. Thanks for fixing; I hadn't realized that about aarch64 vector shifts!


; VCode:
; block0:
; shl v0.16b, v0.16b, #0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I might be missing something obvious but is there any reason we can't match this left-shift-by-0 (when masked) in the same way and pass the value straight through?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to me like something ideally left to an egraph-style optimization which removes the shift before codegen, and I ideally wanted to do the same thing for the other lowerings but it was just the restrictions on arm64 instruction incoded which prevented ignoring the shift-right-by-zero case. That being said I may as well add it to be consistent with the other rules!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, yes, that's true -- it's a reasonable algebraic simplification that we should include in the mid-end as well (might increase reach of other opts etc).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm confused. We do have egraph rules for removing shifts and rotates by 0. Are they not firing here? Are these filetests running with optimization disabled, maybe?

If you need to match on shifts by zero to avoid generating illegal instructions in a particular backend, that's important since we don't necessarily run the egraph pass. But I would be inclined not to bother special-casing anything in the backend that isn't necessary for correctness if it can be done in the egraph pass.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The shifts here are by the full bit width, e.g. a shift by 8 for i8x16, which I don't think is accounted for in the egraph rules (but I just checked to see the checks for 0-shifts, so this would "just" be an extension of that rule). I don't know whether filetests run with or without optimizations, but I would guess without.

Avoiding the 0-case is required for the shift-right cases for correctness since it looks like right-shifts with zero-width can't be encoded as such an instruction (I think this has to do with how right-shift is a negative left-shift on AArch64 or something like that, so maybe this is a too-strict assertion somewhere, not sure, I didn't dig deeply enough to go that far).

I went ahead and added the case here as requested for shift-lefts even though it isn't strictly necessary as required by the backend. In some sense though it does sort of feel like if someone reads that rule it depends on the time of day as to whether they would judge whether or not the rule should be included, so in that sense I don't know whether to leave it in or not as it's not obviously correct nor obviously incorrect.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with @jameysharp in general that we shouldn't pull too many opts into lowering, because of cost in non-optimizing mode and because they can't interact with the main rewriting fixpoint when we're already here. I don't feel too strongly about this case; I guess I would note that it's a nice symmetry with the right-shift case here (which is why it popped out at me originally). On the other hand, we don't special-case x + 0 in the backends, nor should we. So... I'll happily hand the baton to @jameysharp for final judgment here and only request that we extend the mid-end rule to understand masking and cover this case.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, if I have the baton…

  1. I think our general principle should be that algebraic simplifications should not be in backend rules. We should save our backend complexity budget for special cases where we can select a better instruction sequence due to target-specific features.
  2. I don't feel strongly enough about this principle to block merging this PR. I'd lean toward removing the special case for left-shifts but you're welcome to merge as-is if you want to on the basis of @cfallin's r+.

We'll have to figure out how best to handle the wrapping semantics of shifts in the mid-end rules. We should review any existing mid-end rules that match on constant shift amounts to make sure they're handling this correctly, too.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Works for me! I've removed the left-shift case and rebased as well 👍

@alexcrichton alexcrichton added this pull request to the merge queue Mar 10, 2023
@alexcrichton alexcrichton removed this pull request from the merge queue due to a manual request Mar 10, 2023
@alexcrichton alexcrichton added this pull request to the merge queue Mar 13, 2023
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to a conflict with the base branch Mar 13, 2023
This commit adds special lowering rules for
vector-shifts-by-constant-amounts to use dedicated instructions which
cuts down on the codegen here quite a bit for constant values.
@alexcrichton alexcrichton added this pull request to the merge queue Mar 13, 2023
Merged via the queue into bytecodealliance:main with commit d6ce632 Mar 13, 2023
@alexcrichton alexcrichton deleted the aarch64-constant-shift branch March 13, 2023 23:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cranelift:area:aarch64 Issues related to AArch64 backend. cranelift Issues related to the Cranelift code generator
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants