Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add opt patterns for 3-way comparison (Ord::cmp or <=>) #7636

Merged
merged 2 commits into from
Dec 6, 2023

Conversation

scottmcm
Copy link
Contributor

@scottmcm scottmcm commented Dec 5, 2023

Previous zulip conversation: https://bytecodealliance.zulipchat.com/#narrow/stream/217117-cranelift/topic/3-way.20compare.20.28Rust.20.60cmp.60.2C.20C.2B.2B20.20.60.3C.3D.3E.60.29/near/404519375

This PR adds a bunch of ISLE opt patterns for three-way comparison on primitives, as exposed by Ord::cmp in Rust or as <=> "spaceship" in C++20. It's split in two parts:

  1. Detecting nested selects that are actually doing this, of which there are sadly many possible forms and no well-known-best way. By doing this cranelift can emit a much simpler MachInst pattern for it than if it needs to use cmovs and constants.
  2. Adding simplifications when common comparisons are done on the result of the spaceship. This is particularly relevant for any code written against a 3-way-comparison interface but that only needs a 2-way result. For example, if you use .sort_by(|a, b| b.cmp(a)) in Rust to reverse-sort something, it immediately just checks for Ordering::Less, and if you have #[derive(PartialOrd)] struct Foo(u32); then the generated < operator is doing the equivalent of (a <=> b) < 0, which this will improve to a < b.

@scottmcm scottmcm requested a review from a team as a code owner December 5, 2023 18:48
@scottmcm scottmcm requested review from abrown and removed request for a team December 5, 2023 18:48
Comment on lines 36 to 43
;; x == y ? 0 : x < y ? -1 : +1
(rule (simplify (select ty (eq rty x y)
(iconst ty (u64_from_imm64 0))
(select ty (ult rty x y)
(iconst ty neg_1)
(iconst ty (u64_from_imm64 1)))))
(if-let -1 (i64_sextend_imm64 ty neg_1))
(sextend_from_i8 ty (isub $I8 (ugt rty x y) (ult rty x y))))
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For example, this is the select pair that Clang 17 appears to use today: https://cpp.godbolt.org/z/nx16oGrPj

  %cmp.lt = icmp ult i32 %a, %b
  %sel.lt = select i1 %cmp.lt, i8 -1, i8 1
  %cmp.eq = icmp eq i32 %a, %b
  %sel.eq = select i1 %cmp.eq, i8 0, i8 %sel.lt

for which it emits the surprisingly-long assembly

        xor     ecx, ecx
        cmp     edi, esi
        mov     eax, 0
        sbb     eax, eax
        or      al, 1
        cmp     edi, esi
        movzx   eax, al
        cmove   eax, ecx

whereas with this PR, cranelift can do just

function %spaceship_u32(i32, i32) -> i8 {
block0(v0: i32, v1: i32):
    v2 = icmp ugt v0, v1
    v3 = icmp ult v0, v1
    v4 = isub v2, v3
    return v4
}

; VCode:
;   pushq   %rbp
;   movq    %rsp, %rbp
; block0:
;   cmpl    %esi, %edi
;   setnbe  %al
;   cmpl    %esi, %edi
;   setb    %r10b
;   subl    %eax, %r10d, %eax
;   movq    %rbp, %rsp
;   popq    %rbp
;   ret

(That does have an extra cmp, but there's enough ISLE in this already that I'll leave fixing that for a future PR.)

;; Clang 17 use different sequences -- so we just match all of them.

;; x < y ? -1 : x == y ? 0 : +1
;; x < y ? -1 : x != y ? +1 : 0
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is (after LLVM optimizations) the one that Rust started using in rust-lang/rust#63758 : https://rust.godbolt.org/z/rrrvx8xce

@github-actions github-actions bot added cranelift Issues related to the Cranelift code generator isle Related to the ISLE domain-specific language labels Dec 5, 2023
Copy link

github-actions bot commented Dec 5, 2023

Subscribe to Label Action

cc @cfallin, @fitzgen

This issue or pull request has been labeled: "cranelift", "isle"

Thus the following users have been cc'd because of the following labels:

  • cfallin: isle
  • fitzgen: isle

To subscribe or unsubscribe from this label, edit the .github/subscribe-to-label.json configuration file.

Learn more.

Copy link
Member

@alexcrichton alexcrichton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great to me, thanks for adding these!

I've left a few comments, although they're mostly just cosmetic. The one about (extractor) might be interesting to explore if you're curious but don't take it as a requirement for this PR.

Comment on lines +32 to +38
(rule (simplify (eq _ (sextend _ x@(value_type ty)) (iconst _ (u64_from_imm64 0))))
(eq ty x (iconst ty (imm64 0))))
(rule (simplify (ne _ (sextend _ x@(value_type ty)) (iconst _ (u64_from_imm64 0))))
(ne ty x (iconst ty (imm64 0))))
(rule (simplify (icmp _ cc (sextend _ x@(value_type ty)) (iconst _ (u64_from_imm64 0))))
(if (signed_cond_code cc))
(icmp ty cc x (iconst ty (imm64 0))))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't need to be included in this PR specifically, but to make sure I understand, these can all be duplicated for uextend with the exception of changing signed_cond_code, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's definitely a more interesting pattern in here trying to get out.

I started down a path of "I think this actually works for any fits-in-i8 constant", but then I think it's more like the "known bits" pattern that uextend has above, since i8 is just the simple version of that, and I started adding new things to the prelude and such to try to match them well, then thought better of it, reverted all that, and just did the basic "compare with zero" stuff like the "zero-extended integers aren't zero" pattern just above it.

So yeah, I think there's far more possible here, but I didn't want to go there this PR.

Comment on lines 13 to 22
(rule (simplify (select ty (ult rty x y)
(iconst ty neg_1)
(ne rty x y)))
(if-let -1 (i64_sextend_imm64 ty neg_1))
(sextend_from_i8 ty (isub $I8 (ugt rty x y) (ult rty x y))))
(rule (simplify (select ty (ult rty x y)
(iconst ty neg_1)
(uextend ty (ne rty x y))))
(if-let -1 (i64_sextend_imm64 ty neg_1))
(sextend_from_i8 ty (isub $I8 (ugt rty x y) (ult rty x y))))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the x64 backend there's a maybe_uextend extractor which you might be able to use here to deduplicate these rules.

Additionally the comment above includes:

;; x < y ? -1 : x == y ? 0 : +1

but these rules don't seem to match that, is that a mistake?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does and it doesn't depending on your perspective.

That case is picked up by that pattern, as you can see in the cmp_u1a test https://github.com/bytecodealliance/wasmtime/pull/7636/files#diff-f7401bd506e4a5000af4eed4a8d0b9a4138c423291933d35ec683ff6a88f63f8R208

But it happens via this select (icmp ...) 0 1 simplification pattern: https://github.com/bytecodealliance/wasmtime/pull/7615/files#diff-ec130b69b5b51fce365916a438a4e1178165f56a6d4f0a3ec89d6bbafec0b454R12-R16

And thus the pattern here sees it as x < y ? -1 : x != y.

;; also matching things like `(a <=> b) < 1` or `(a <=> b) <= -1`.

;; (a <=> b) == 0 --> a == b
(rule (simplify (eq ty (isub ty (sgt rty x y) (slt rty x y)) (iconst ty (u64_from_imm64 0))))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One possible way you could simplify this is something like:

(decl spaceship_s (Value Value) Value)
(extractor (spaceship_s v) (isub _ (sgt rty x y) (slt rty x y)))

I forget the precise syntax for (extractor ...) but I think it can be used along the lines of a macro-of-sorts to reduce the repetition here. That being said I've found them historically a bit cumbersome to use and I'm not sure you can pattern-match out the rty still as well (and maybe ty too?) so this may not amount to much after all.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll give this (and the maybe_uextend from the other thread) a try. There's so much copy-pasting here -- and corresponding opportunities to forget to update the ugts to sgts and vice versa -- that even if it's a bit cumbersome it may well be worth it.

@scottmcm
Copy link
Contributor Author

scottmcm commented Dec 6, 2023

Adding the extractors and constructors worked great! Huge improvement, thanks.

maybe_uextend didn't migrate trivially to the common prelude, so I think I'll skip that one for now. It doesn't help nearly as much as the spaceship_[su] patterns anyway.

Copy link
Member

@alexcrichton alexcrichton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice I agree it worked out well! Thanks again!

@alexcrichton alexcrichton added this pull request to the merge queue Dec 6, 2023
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Dec 6, 2023
@alexcrichton
Copy link
Member

Failure looks unrelated to this PR, I'll debug that separately.

@alexcrichton alexcrichton added this pull request to the merge queue Dec 6, 2023
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Dec 6, 2023
alexcrichton added a commit to alexcrichton/wasmtime that referenced this pull request Dec 6, 2023
This commit is an attempt to address the CI failure popping up in bytecodealliance#7636.
I can reproduce the failure locally and it appears to be due to the fact
that macOS is giving spawned threads via `pthread_create` a 512K stack
by default. Cranelift when compiled in debug mode is taking more stack
than this (but working in release mode). I was talking with Trevor and
Jamey about possible ways to reduce the stack size here but for now I'm
going with Jamey's suggestion of increasing the stack size as the best
course of action for now.
@alexcrichton
Copy link
Member

Ok once #7651 lands I'll retry landing this

github-merge-queue bot pushed a commit that referenced this pull request Dec 6, 2023
This commit is an attempt to address the CI failure popping up in #7636.
I can reproduce the failure locally and it appears to be due to the fact
that macOS is giving spawned threads via `pthread_create` a 512K stack
by default. Cranelift when compiled in debug mode is taking more stack
than this (but working in release mode). I was talking with Trevor and
Jamey about possible ways to reduce the stack size here but for now I'm
going with Jamey's suggestion of increasing the stack size as the best
course of action for now.
@alexcrichton alexcrichton added this pull request to the merge queue Dec 6, 2023
Merged via the queue into bytecodealliance:main with commit 239e4a1 Dec 6, 2023
19 checks passed
@scottmcm scottmcm deleted the spaceship branch December 7, 2023 06:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cranelift Issues related to the Cranelift code generator isle Related to the ISLE domain-specific language
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants