Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cranelift/x64: Fix XmmRmREvex pretty-printing #8508

Merged
merged 2 commits into from
May 1, 2024

Conversation

jameysharp
Copy link
Contributor

The operand collector had these operands in src1/src2/dst order, but the pretty-printer had dst/src1/src2 order instead.

Note that fixing the pretty-printer makes it disagree with the disassembly from Capstone. I have stared at the emit code and the Intel reference manual and can't figure out how to reconcile these.

However, I have verified that the vpsraq instruction is executed on my laptop in a runtest (cranelift/filetests/filetests/runtests/simd-sshr.clif), and its runtime behavior matches the CLIF interpreter. So this does not appear to be a codegen/correctness bug.

I'm hoping @abrown or @alexcrichton can help explain the disassembly discrepancy.

@jameysharp jameysharp requested a review from a team as a code owner April 29, 2024 21:30
@jameysharp jameysharp requested review from fitzgen and removed request for a team April 29, 2024 21:30
@fitzgen fitzgen requested review from alexcrichton and abrown and removed request for fitzgen April 29, 2024 22:01
@github-actions github-actions bot added cranelift Issues related to the Cranelift code generator cranelift:area:x64 Issues related to x64 codegen labels Apr 30, 2024
Copy link
Collaborator

@abrown abrown left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand:

  • if this disagrees with the Capstone disassembly, shouldn't some test fail here?
  • why does changing the order of pretty-print invocation matter here? Is that thing still stateful?
  • are we aiming for Intel syntax here or AT&T?

@jameysharp
Copy link
Contributor Author

  • why does changing the order of pretty-print invocation matter here? Is that thing still stateful?

Right now, the order that the backend's implementation of get_operands invokes methods like collector.reg_use must match the order that pretty_print or emit invokes methods like allocs.next. The first puts operands into the input for regalloc2, and the others consume allocations from the output of regalloc2, and the only thing keeping them in sync is having all three places agree on the ordering.

So there's definitely one bug here in that x64_get_operands did the reg_use calls before reg_def, and emit called allocs.next in that same order, but pretty_print called allocs.next in a different order. And this PR fixes that bug, but then reveals that something doesn't line up, somewhere.

I have a commit locally to get rid of this ordering dependency from pretty_print and emit, so that only get_operands matters and it can use any order it wants, and we'll never have this problem again. But that commit changes the pretty-printing behavior in the same way that this PR does, by implicitly correcting the order in the pretty-printer. So I want to get this fixed and reviewed by itself first, so we can think about it without the rest of the implications of that change.

  • if this disagrees with the Capstone disassembly, shouldn't some test fail here?

As far as I know, there are no tests which compare our pretty-printer output with Capstone's output. Maybe there should be! It's tricky in cases like pseudo-instructions or synthetic address modes, but we could probably test a lot of instructions easily enough.

So the only way this shows up right now is that for precise-output compile filetests we list our pretty-printer output, and then we list Capstone's output, and a human has to decide whether those are close enough. In this PR, I used CRANELIFT_TEST_BLESS to update the one filetest which
used this instruction, and saw that the pretty-printer output changed but the Capstone output didn't, and now I don't know what to do.

The actual binary machine code that's emitted isn't changing, so it's not surprising that none of the runtests failed, or that the Capstone output didn't change either.

Note that there are currently exactly two instructions which use our Inst::XmmRmREvex representation, vpmullq and vpsraq. We don't currently have any filetests which contain vpmullq, and there's exactly one use of this non-immediate form of vpsraq. So I don't know whether both instructions are affected.

  • are we aiming for Intel syntax here or AT&T?

We're using roughly AT&T syntax in the pretty-printer output. There are good arguments for switching to Intel syntax, but Trevor tried doing that a while ago and found it's a lot of work. So instead we configure Capstone into AT&T mode to be consistent with our existing choices.

Before this PR, our pretty-printer was effectively printing {src2}, {dst}, {src1}. I don't think that makes sense for either Intel or AT&T syntax, does it?

In our sole filetest example, it happens that dst and src1 get assigned the same register, because of ABI constraints I guess, so I suppose Capstone could be printing this in {src2}, {src1}, {dst} order and we wouldn't be able to tell the difference. Would that be reasonable for AT&T syntax?

I think one possibility is that Capstone's version of AT&T syntax has the operands backwards for this particular instruction. I wouldn't blame them if the non-Intel (and non-default) printing mode on a single random SIMD instruction wasn't checked carefully.

The operand collector had these operands in src1/src2/dst order, but the
pretty-printer fetched the allocations in dst/src1/src2 order instead.

Although our pretty-printer looked like it was printing src1/src2/dst,
because it consumed operands in the wrong order, what it actually
printed was src2/dst/src1.

Meanwhile, Capstone actually uses src2/src1/dst order in AT&T mode. (GNU
objdump agrees.)

In the only filetest covering the vpsraq instruction, our output agreed
with Capstone because register allocation picked the same register for
both src1 and dst, so the two orders were indistinguishable. I've
extended the filetest to force register allocation to pick different
registers.

This format is also used for vpmullq, but we didn't have any compile
filetests covering that instruction, so I've added one with the same
register allocation pattern.

Now our pretty-printer agrees with Capstone on both instructions.
@jameysharp
Copy link
Contributor Author

Based on our discussion in the Cranelift meeting today I've updated this PR, and everything makes sense to me now. Thank you so much to @abrown and @fitzgen for helping me figure this out! I've updated the commit message to explain what happened:

The operand collector had these operands in src1/src2/dst order, but the pretty-printer fetched the allocations in dst/src1/src2 order instead.

Although our pretty-printer looked like it was printing src1/src2/dst, because it consumed operands in the wrong order, what it actually printed was src2/dst/src1.

Meanwhile, Capstone actually uses src2/src1/dst order in AT&T mode. (GNU objdump agrees.)

In the only filetest covering the vpsraq instruction, our output agreed with Capstone because register allocation picked the same register for both src1 and dst, so the two orders were indistinguishable. I've extended the filetest to force register allocation to pick different registers.

This format is also used for vpmullq, but we didn't have any compile filetests covering that instruction, so I've added one with the same register allocation pattern.

Now our pretty-printer agrees with Capstone on both instructions.

Copy link
Collaborator

@abrown abrown left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! (With a couple of emit tests to fix up...)

This test for vpmullq had what we have now determined is the wrong order
for src1 and src2.

There were no emit-tests for vpsraq, so I added one.

The vpermi2b tests used the wrong form of the Inst enum, judging by the
assertions that are in x64_get_operands (which is not exercised by emit
tests) and the fact that we never use that form for that instruction
anywhere else.

Pretty-printing vpermi2b disagreed with Capstone in the same way as
vpsraq and vpmullq. I've fixed that form to agree with Capstone as well,
aside from the duplicated src1/dst operand which are required to be
different before register allocation and equal afterward.
@jameysharp
Copy link
Contributor Author

Would you review again now that I've fixed the emit-tests? I found several more things to fix in the process.

@abrown abrown added this pull request to the merge queue May 1, 2024
Merged via the queue into bytecodealliance:main with commit c66c874 May 1, 2024
23 checks passed
@jameysharp jameysharp deleted the fix-evex-disasm branch May 1, 2024 22:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cranelift:area:x64 Issues related to x64 codegen cranelift Issues related to the Cranelift code generator
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants