-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add opt patterns for 3-way comparison (Ord::cmp
or <=>
)
#7636
Conversation
;; x == y ? 0 : x < y ? -1 : +1 | ||
(rule (simplify (select ty (eq rty x y) | ||
(iconst ty (u64_from_imm64 0)) | ||
(select ty (ult rty x y) | ||
(iconst ty neg_1) | ||
(iconst ty (u64_from_imm64 1))))) | ||
(if-let -1 (i64_sextend_imm64 ty neg_1)) | ||
(sextend_from_i8 ty (isub $I8 (ugt rty x y) (ult rty x y)))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For example, this is the select pair that Clang 17 appears to use today: https://cpp.godbolt.org/z/nx16oGrPj
%cmp.lt = icmp ult i32 %a, %b
%sel.lt = select i1 %cmp.lt, i8 -1, i8 1
%cmp.eq = icmp eq i32 %a, %b
%sel.eq = select i1 %cmp.eq, i8 0, i8 %sel.lt
for which it emits the surprisingly-long assembly
xor ecx, ecx
cmp edi, esi
mov eax, 0
sbb eax, eax
or al, 1
cmp edi, esi
movzx eax, al
cmove eax, ecx
whereas with this PR, cranelift can do just
function %spaceship_u32(i32, i32) -> i8 {
block0(v0: i32, v1: i32):
v2 = icmp ugt v0, v1
v3 = icmp ult v0, v1
v4 = isub v2, v3
return v4
}
; VCode:
; pushq %rbp
; movq %rsp, %rbp
; block0:
; cmpl %esi, %edi
; setnbe %al
; cmpl %esi, %edi
; setb %r10b
; subl %eax, %r10d, %eax
; movq %rbp, %rsp
; popq %rbp
; ret
(That does have an extra cmp
, but there's enough ISLE in this already that I'll leave fixing that for a future PR.)
;; Clang 17 use different sequences -- so we just match all of them. | ||
|
||
;; x < y ? -1 : x == y ? 0 : +1 | ||
;; x < y ? -1 : x != y ? +1 : 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is (after LLVM optimizations) the one that Rust started using in rust-lang/rust#63758 : https://rust.godbolt.org/z/rrrvx8xce
Subscribe to Label Action
This issue or pull request has been labeled: "cranelift", "isle"
Thus the following users have been cc'd because of the following labels:
To subscribe or unsubscribe from this label, edit the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks great to me, thanks for adding these!
I've left a few comments, although they're mostly just cosmetic. The one about (extractor)
might be interesting to explore if you're curious but don't take it as a requirement for this PR.
(rule (simplify (eq _ (sextend _ x@(value_type ty)) (iconst _ (u64_from_imm64 0)))) | ||
(eq ty x (iconst ty (imm64 0)))) | ||
(rule (simplify (ne _ (sextend _ x@(value_type ty)) (iconst _ (u64_from_imm64 0)))) | ||
(ne ty x (iconst ty (imm64 0)))) | ||
(rule (simplify (icmp _ cc (sextend _ x@(value_type ty)) (iconst _ (u64_from_imm64 0)))) | ||
(if (signed_cond_code cc)) | ||
(icmp ty cc x (iconst ty (imm64 0)))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't need to be included in this PR specifically, but to make sure I understand, these can all be duplicated for uextend
with the exception of changing signed_cond_code
, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's definitely a more interesting pattern in here trying to get out.
I started down a path of "I think this actually works for any fits-in-i8
constant", but then I think it's more like the "known bits" pattern that uextend
has above, since i8
is just the simple version of that, and I started adding new things to the prelude and such to try to match them well, then thought better of it, reverted all that, and just did the basic "compare with zero" stuff like the "zero-extended integers aren't zero" pattern just above it.
So yeah, I think there's far more possible here, but I didn't want to go there this PR.
(rule (simplify (select ty (ult rty x y) | ||
(iconst ty neg_1) | ||
(ne rty x y))) | ||
(if-let -1 (i64_sextend_imm64 ty neg_1)) | ||
(sextend_from_i8 ty (isub $I8 (ugt rty x y) (ult rty x y)))) | ||
(rule (simplify (select ty (ult rty x y) | ||
(iconst ty neg_1) | ||
(uextend ty (ne rty x y)))) | ||
(if-let -1 (i64_sextend_imm64 ty neg_1)) | ||
(sextend_from_i8 ty (isub $I8 (ugt rty x y) (ult rty x y)))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the x64 backend there's a maybe_uextend
extractor which you might be able to use here to deduplicate these rules.
Additionally the comment above includes:
;; x < y ? -1 : x == y ? 0 : +1
but these rules don't seem to match that, is that a mistake?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It does and it doesn't depending on your perspective.
That case is picked up by that pattern, as you can see in the cmp_u1a
test https://github.com/bytecodealliance/wasmtime/pull/7636/files#diff-f7401bd506e4a5000af4eed4a8d0b9a4138c423291933d35ec683ff6a88f63f8R208
But it happens via this select (icmp ...) 0 1
simplification pattern: https://github.com/bytecodealliance/wasmtime/pull/7615/files#diff-ec130b69b5b51fce365916a438a4e1178165f56a6d4f0a3ec89d6bbafec0b454R12-R16
And thus the pattern here sees it as x < y ? -1 : x != y
.
;; also matching things like `(a <=> b) < 1` or `(a <=> b) <= -1`. | ||
|
||
;; (a <=> b) == 0 --> a == b | ||
(rule (simplify (eq ty (isub ty (sgt rty x y) (slt rty x y)) (iconst ty (u64_from_imm64 0)))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One possible way you could simplify this is something like:
(decl spaceship_s (Value Value) Value)
(extractor (spaceship_s v) (isub _ (sgt rty x y) (slt rty x y)))
I forget the precise syntax for (extractor ...)
but I think it can be used along the lines of a macro-of-sorts to reduce the repetition here. That being said I've found them historically a bit cumbersome to use and I'm not sure you can pattern-match out the rty
still as well (and maybe ty
too?) so this may not amount to much after all.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll give this (and the maybe_uextend
from the other thread) a try. There's so much copy-pasting here -- and corresponding opportunities to forget to update the ugt
s to sgt
s and vice versa -- that even if it's a bit cumbersome it may well be worth it.
Adding the extractors and constructors worked great! Huge improvement, thanks.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice I agree it worked out well! Thanks again!
Failure looks unrelated to this PR, I'll debug that separately. |
This commit is an attempt to address the CI failure popping up in bytecodealliance#7636. I can reproduce the failure locally and it appears to be due to the fact that macOS is giving spawned threads via `pthread_create` a 512K stack by default. Cranelift when compiled in debug mode is taking more stack than this (but working in release mode). I was talking with Trevor and Jamey about possible ways to reduce the stack size here but for now I'm going with Jamey's suggestion of increasing the stack size as the best course of action for now.
Ok once #7651 lands I'll retry landing this |
This commit is an attempt to address the CI failure popping up in #7636. I can reproduce the failure locally and it appears to be due to the fact that macOS is giving spawned threads via `pthread_create` a 512K stack by default. Cranelift when compiled in debug mode is taking more stack than this (but working in release mode). I was talking with Trevor and Jamey about possible ways to reduce the stack size here but for now I'm going with Jamey's suggestion of increasing the stack size as the best course of action for now.
Previous zulip conversation: https://bytecodealliance.zulipchat.com/#narrow/stream/217117-cranelift/topic/3-way.20compare.20.28Rust.20.60cmp.60.2C.20C.2B.2B20.20.60.3C.3D.3E.60.29/near/404519375
This PR adds a bunch of ISLE opt patterns for three-way comparison on primitives, as exposed by
Ord::cmp
in Rust or as<=>
"spaceship" in C++20. It's split in two parts:select
s that are actually doing this, of which there are sadly many possible forms and no well-known-best way. By doing this cranelift can emit a much simpler MachInst pattern for it than if it needs to use cmovs and constants..sort_by(|a, b| b.cmp(a))
in Rust to reverse-sort something, it immediately just checks forOrdering::Less
, and if you have#[derive(PartialOrd)] struct Foo(u32);
then the generated<
operator is doing the equivalent of(a <=> b) < 0
, which this will improve toa < b
.