s390x: update some regalloc metadata to remove use of reg_mod. #4856

Merged
merged 3 commits into from
Sep 9, 2022


cfallin
Member

@cfallin cfallin commented Sep 2, 2022

This is a step toward ultimately removing modify-operands, which, along
with the removal of pinned vregs, lets us move to a completely
constraint-based and fully-SSA regalloc input and eventually get some
nice advantages.

There are still a few uses of mod operands and pinned vregs remaining,
especially around the "regpair" abstraction. Those proved to be a bit
trickier to update though, so will have to be done separately.
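
As an illustration of what removing a modify-operand means, here is a minimal sketch in a toy model. The `VReg` and `Operand` types and the `split_mods` helper are hypothetical names made up for this example; they are not regalloc2's or Cranelift's actual API. The idea shown is only that a read-modify-write operand can be expressed as a plain use plus a new def constrained to reuse that input's register, so that each vreg is defined once.

```rust
// Toy model only: these types are hypothetical and are not regalloc2's API.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct VReg(u32);

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Operand {
    /// Value is read by the instruction.
    Use(VReg),
    /// Value is written; the allocator may pick any register.
    Def(VReg),
    /// Value is read and then overwritten in place (the form being removed).
    Mod(VReg),
    /// Value is written into the same register as the operand at `input_idx`.
    ReuseDef { vreg: VReg, input_idx: usize },
}

/// Rewrite each `Mod(v)` into `Use(v)` followed by a `ReuseDef` of a fresh vreg
/// tied to that use. `next_vreg` supplies fresh vreg numbers.
fn split_mods(ops: &[Operand], next_vreg: &mut u32) -> Vec<Operand> {
    let mut out = Vec::new();
    for &op in ops {
        match op {
            Operand::Mod(v) => {
                // Index of the use we are about to push; the def is tied to it.
                let input_idx = out.len();
                out.push(Operand::Use(v));
                let fresh = VReg(*next_vreg);
                *next_vreg += 1;
                out.push(Operand::ReuseDef { vreg: fresh, input_idx });
            }
            other => out.push(other),
        }
    }
    out
}
```

In this sketch the caller would still have to rename later uses of the old vreg to the fresh one; that renaming step is omitted here.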

@cfallin
Member Author

cfallin commented Sep 2, 2022

cc @uweigand to review?

I also spent the past day trying to go further in this branch, and managed to clean up a lot more, but there are some panics from undefined regs there so I don't have it quite right. @uweigand if you have time to poke at this more I'd very much appreciate it (but it's not urgent at all!). My long-term goal is to push the regalloc input toward fully-SSA (no mods, no multiple defs) and fully constraint-based (no pinned vregs) input, which allows for more flexibility in managing copies and spills, and makes for a more efficient solver generally.

@github-actions github-actions bot added the cranelift Issues related to the Cranelift code generator label Sep 2, 2022
Member

@uweigand uweigand left a comment

Looks generally good to me, with the exception of the assembler output issue (see inline comment). Thanks!

let inst = Inst::AluRR {
alu_op,
rd,
ri: rd.to_reg(),
Member

This is just a minor nit, but it would feel slightly cleaner to me to pass rn instead of rd.to_reg(). (Of course, those two expressions have the same value in this branch of the if.) In fact, I'm even wondering whether it wouldn't be cleanest to rename all those ri to rn -- that should make it more obvious that an AluRR { op, rd, rn, rm } has identical semantics to an AluRRR { op, rd, rn, rm } if rd is tied to rn.

Member Author

Updated to use rn's value, thanks. I actually lean slightly toward keeping these fields named ri, to make it clear that they are artificial instruction fields, the "input" side of the dest reg, rather than a real rn field (this also made it easier to grep for things when updating tests just now!). But I'm happy to alter the field name as well if you feel strongly about this.
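
To make the equivalence under discussion concrete, here is a small sketch in a toy model. The `Reg` and `Inst` types and the `exec_add` function are hypothetical and only borrow the field names from this thread; they are not the s390x backend's real definitions. It shows that a two-operand `AluRR` whose artificial input `ri` has been allocated to the same register as `rd` computes the same result as an `AluRRR` with `rd` tied to `rn`.

```rust
// Toy model: names mirror the review discussion, but these types are hypothetical.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Reg(u8);

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Inst {
    /// Two-operand form; `ri` is the artificial "input side" of `rd`.
    AluRR { rd: Reg, ri: Reg, rm: Reg },
    /// Three-operand form with independent sources.
    AluRRR { rd: Reg, rn: Reg, rm: Reg },
}

/// Evaluate an add on a 16-entry register file, as if after register allocation.
fn exec_add(inst: Inst, regs: &mut [u64; 16]) {
    match inst {
        Inst::AluRR { rd, ri, rm } => {
            // The tied constraint must have placed `ri` in the same register as `rd`.
            assert_eq!(rd, ri, "ri must be allocated to the same register as rd");
            regs[rd.0 as usize] = regs[ri.0 as usize].wrapping_add(regs[rm.0 as usize]);
        }
        Inst::AluRRR { rd, rn, rm } => {
            regs[rd.0 as usize] = regs[rn.0 as usize].wrapping_add(regs[rm.0 as usize]);
        }
    }
}
```

Running both forms on identical register files with `rd == ri == rn` leaves the files identical, which is the sense in which the two instructions share semantics.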

cranelift/filetests/filetests/isa/s390x/arithmetic.clif (outdated, resolved)
cranelift/codegen/src/isa/s390x/inst.isle (outdated, resolved)
@uweigand
Member

uweigand commented Sep 5, 2022

> I also spent the past day trying to go further in this branch, and managed to clean up a lot more, but there are some panics from undefined regs there so I don't have it quite right. @uweigand if you have time to poke at this more I'd very much appreciate it (but it's not urgent at all!). My long-term goal is to push the regalloc input toward fully-SSA (no mods, no multiple defs) and fully constraint-based (no pinned vregs) input, which allows for more flexibility in managing copies and spills, and makes for a more efficient solver generally.

The main problem here seems to be the tricks I had been playing with uninitialized_regpair. This was intended to solve the problem of how to initialize a register pair for those instructions that use one as input (basically, divides). My model has been that I need to allocate a register pair (uninitialized at this point), and then load up low and high parts of it. That used to work with the old regpair method, but with the new method it now exposes those uninitialized registers to regalloc, which it doesn't like.

But fortunately, with the new model we can instead load the two halves into independent vregs and construct a regpair from those two vregs. That fixes the "udiv" case. For the "sdiv" case, the instruction does not read the high half of the input regpair, so that half actually should be uninitialized. But here we can simply change the sdivmod pattern to take only a Reg instead of a RegPair as input, which is closer to the true semantics anyway.

Overall, this change simplifies the logic around regpairs anyway, so I like it. I've attached a patch to implement those changes.
regpair-patch.txt
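
The following is not the attached patch; it is only a minimal sketch of the "two independent vregs" approach described above, in a toy model. `VReg`, `RegPair`, `Step`, and `Lower` are hypothetical names, and zeroing the high half for an unsigned divide is an assumption made for the example. The point it illustrates is that each half is a fresh vreg defined exactly once before the pair is formed, so no uninitialized register is exposed to the allocator.

```rust
// Toy model only: `VReg`, `RegPair`, `Step`, and `Lower` are hypothetical names.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct VReg(u32);

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct RegPair {
    hi: VReg,
    lo: VReg,
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Step {
    LoadImm { dst: VReg, value: u64 },
    Move { dst: VReg, src: VReg },
}

struct Lower {
    next: u32,
    steps: Vec<Step>,
}

impl Lower {
    fn new(first_free: u32) -> Self {
        Lower { next: first_free, steps: Vec::new() }
    }

    /// Hand out a vreg number that has not been used yet.
    fn fresh(&mut self) -> VReg {
        let v = VReg(self.next);
        self.next += 1;
        v
    }

    /// Build the input pair for an unsigned divide (assumed here: high half is
    /// zero, low half is a copy of `x`). Each half is its own vreg with one def.
    fn udiv_input_pair(&mut self, x: VReg) -> RegPair {
        let hi = self.fresh();
        self.steps.push(Step::LoadImm { dst: hi, value: 0 });
        let lo = self.fresh();
        self.steps.push(Step::Move { dst: lo, src: x });
        RegPair { hi, lo }
    }

    /// Count how many recorded steps define `v`.
    fn def_count(&self, v: VReg) -> usize {
        self.steps
            .iter()
            .filter(|s| match s {
                Step::LoadImm { dst, .. } | Step::Move { dst, .. } => *dst == v,
            })
            .count()
    }
}
```

Under this model, the signed-divide case would need no pair at all on the input side: the lowering would pass a single `VReg`, matching the suggestion to take a Reg rather than a RegPair.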

In addition, I noticed that you've consistently swapped register numbers: the high half of the pair goes into %r0, and the low half goes into %r1 (we're big-endian, after all ...). Also, the two inputs to umul_wide were swapped (I guess the operation is commutative, but it was still a surprise). I've added those changes to the patch as well.

Now, I'm running into a new error:

FAIL filetests/filetests/isa/s390x/vec-arithmetic.clif: panicked in worker #10: Could not allocate minimal bundle, but the allocation problem should be possible to solve

This looks like a regalloc problem (at first glance, it occurs when using multiple wide multiplications in a row, so maybe regalloc runs into conflicts since they're all forced into the same physical register pair?) ... could you have a look here?

@cfallin
Member Author

cfallin commented Sep 8, 2022

Thanks @uweigand! I've updated based on feedback (and, importantly, reverted the 2-to-3-arg change in assembly printing). Thanks for looking further at the followup patch as well; I will pick that up and try to finish it next week, most likely.

Member

@uweigand uweigand left a comment

LGTM now. Note the inline comment about one minor regalloc regression, but that doesn't block this PR.

; lgr %r3, %r2
; llihf %r2, 2863311530
; iilf %r2, 2863311530
; lgr %r5, %r3
Member

These two lgr look new - this is why the new code is two instructions longer than the old code. Not sure if this is something that could still be improved in regalloc, or if this is just one of those random changes ... In any case, not a big deal, I just wanted to point it out.

Member Author

I briefly looked but it was nothing really obvious. It's possible that my changes in bytecodealliance/regalloc2#74 might help a bit, but I'm not sure; let's see if it reverts back once I update tests there with this merged :-)

@cfallin
Member Author

cfallin commented Sep 9, 2022

Ah, I think this needs an r+ from someone with write access to the repo -- anyone want to give a rubberstamp on top of Ulrich's review above?

@cfallin cfallin merged commit 96bfd4e into bytecodealliance:main Sep 9, 2022
@cfallin cfallin deleted the s390x-ra2-semantics branch September 10, 2022 00:05