cranelift/x64: Fix XmmRmREvex pretty-printing #8508

jameysharp · 2024-04-29T21:30:38Z

The operand collector had these operands in src1/src2/dst order, but the pretty-printer had dst/src1/src2 order instead.

Note that fixing the pretty-printer makes it disagree with the disassembly from Capstone. I have stared at the emit code and the Intel reference manual and can't figure out how to reconcile these.

However, I have verified that the vpsraq instruction is executed on my laptop in a runtest (cranelift/filetests/filetests/runtests/simd-sshr.clif), and its runtime behavior matches the CLIF interpreter. So this does not appear to be a codegen/correctness bug.

I'm hoping @abrown or @alexcrichton can help explain the disassembly discrepancy.

abrown

I don't understand:

if this disagrees with the Capstone disassembly, shouldn't some test fail here?
why does changing the order of pretty-print invocation matter here? Is that thing still stateful?
are we aiming for Intel syntax here or AT&T?

jameysharp · 2024-04-30T23:54:50Z

why does changing the order of pretty-print invocation matter here? Is that thing still stateful?

Right now, the order that the backend's implementation of get_operands invokes methods like collector.reg_use must match the order that pretty_print or emit invokes methods like allocs.next. The first puts operands into the input for regalloc2, and the others consume allocations from the output of regalloc2, and the only thing keeping them in sync is having all three places agree on the ordering.

So there's definitely one bug here in that x64_get_operands did the reg_use calls before reg_def, and emit called allocs.next in that same order, but pretty_print called allocs.next in a different order. And this PR fixes that bug, but then reveals that something doesn't line up, somewhere.

I have a commit locally to get rid of this ordering dependency from pretty_print and emit, so that only get_operands matters and it can use any order it wants, and we'll never have this problem again. But that commit changes the pretty-printing behavior in the same way that this PR does, by implicitly correcting the order in the pretty-printer. So I want to get this fixed and reviewed by itself first, so we can think about it without the rest of the implications of that change.

if this disagrees with the Capstone disassembly, shouldn't some test fail here?

As far as I know, there are no tests which compare our pretty-printer output with Capstone's output. Maybe there should be! It's tricky in cases like pseudo-instructions or synthetic address modes, but we could probably test a lot of instructions easily enough.

So the only way this shows up right now is that for precise-output compile filetests we list our pretty-printer output, and then we list Capstone's output, and a human has to decide whether those are close enough. In this PR, I used CRANELIFT_TEST_BLESS to update the one filetest which
used this instruction, and saw that the pretty-printer output changed but the Capstone output didn't, and now I don't know what to do.

The actual binary machine code that's emitted isn't changing, so it's not surprising that none of the runtests failed, or that the Capstone output didn't change either.

Note that there are currently exactly two instructions which use our Inst::XmmRmREvex representation, vpmullq and vpsraq. We don't currently have any filetests which contain vpmullq, and there's exactly one use of this non-immediate form of vpsraq. So I don't know whether both instructions are affected.

are we aiming for Intel syntax here or AT&T?

We're using roughly AT&T syntax in the pretty-printer output. There are good arguments for switching to Intel syntax, but Trevor tried doing that a while ago and found it's a lot of work. So instead we configure Capstone into AT&T mode to be consistent with our existing choices.

Before this PR, our pretty-printer was effectively printing {src2}, {dst}, {src1}. I don't think that makes sense for either Intel or AT&T syntax, does it?

In our sole filetest example, it happens that dst and src1 get assigned the same register, because of ABI constraints I guess, so I suppose Capstone could be printing this in {src2}, {src1}, {dst} order and we wouldn't be able to tell the difference. Would that be reasonable for AT&T syntax?

I think one possibility is that Capstone's version of AT&T syntax has the operands backwards for this particular instruction. I wouldn't blame them if the non-Intel (and non-default) printing mode on a single random SIMD instruction wasn't checked carefully.

The operand collector had these operands in src1/src2/dst order, but the pretty-printer fetched the allocations in dst/src1/src2 order instead. Although our pretty-printer looked like it was printing src1/src2/dst, because it consumed operands in the wrong order, what it actually printed was src2/dst/src1. Meanwhile, Capstone actually uses src2/src1/dst order in AT&T mode. (GNU objdump agrees.) In the only filetest covering the vpsraq instruction, our output agreed with Capstone because register allocation picked the same register for both src1 and dst, so the two orders were indistinguishable. I've extended the filetest to force register allocation to pick different registers. This format is also used for vpmullq, but we didn't have any compile filetests covering that instruction, so I've added one with the same register allocation pattern. Now our pretty-printer agrees with Capstone on both instructions.

jameysharp · 2024-05-01T16:26:37Z

Based on our discussion in the Cranelift meeting today I've updated this PR, and everything makes sense to me now. Thank you so much to @abrown and @fitzgen for helping me figure this out! I've updated the commit message to explain what happened:

The operand collector had these operands in src1/src2/dst order, but the pretty-printer fetched the allocations in dst/src1/src2 order instead.

Although our pretty-printer looked like it was printing src1/src2/dst, because it consumed operands in the wrong order, what it actually printed was src2/dst/src1.

Meanwhile, Capstone actually uses src2/src1/dst order in AT&T mode. (GNU objdump agrees.)

In the only filetest covering the vpsraq instruction, our output agreed with Capstone because register allocation picked the same register for both src1 and dst, so the two orders were indistinguishable. I've extended the filetest to force register allocation to pick different registers.

This format is also used for vpmullq, but we didn't have any compile filetests covering that instruction, so I've added one with the same register allocation pattern.

Now our pretty-printer agrees with Capstone on both instructions.

abrown

LGTM! (With a couple of emit tests to fix up...)

This test for vpmullq had what we have now determined is the wrong order for src1 and src2. There were no emit-tests for vpsraq, so I added one. The vpermi2b tests used the wrong form of the Inst enum, judging by the assertions that are in x64_get_operands (which is not exercised by emit tests) and the fact that we never use that form for that instruction anywhere else. Pretty-printing vpermi2b disagreed with Capstone in the same way as vpsraq and vpmullq. I've fixed that form to agree with Capstone as well, aside from the duplicated src1/dst operand which are required to be different before register allocation and equal afterward.

jameysharp · 2024-05-01T18:20:33Z

Would you review again now that I've fixed the emit-tests? I found several more things to fix in the process.

jameysharp requested a review from a team as a code owner April 29, 2024 21:30

jameysharp requested review from fitzgen and removed request for a team April 29, 2024 21:30

fitzgen requested review from alexcrichton and abrown and removed request for fitzgen April 29, 2024 22:01

github-actions bot added cranelift Issues related to the Cranelift code generator cranelift:area:x64 Issues related to x64 codegen labels Apr 30, 2024

abrown reviewed Apr 30, 2024

View reviewed changes

jameysharp force-pushed the fix-evex-disasm branch from 0a093ba to 8401dab Compare May 1, 2024 16:22

abrown approved these changes May 1, 2024

View reviewed changes

jameysharp force-pushed the fix-evex-disasm branch from d484153 to 444c920 Compare May 1, 2024 18:17

jameysharp force-pushed the fix-evex-disasm branch from 444c920 to f116290 Compare May 1, 2024 18:20

abrown approved these changes May 1, 2024

View reviewed changes

abrown added this pull request to the merge queue May 1, 2024

Merged via the queue into bytecodealliance:main with commit c66c874 May 1, 2024
23 checks passed

jameysharp deleted the fix-evex-disasm branch May 1, 2024 22:33

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

cranelift/x64: Fix XmmRmREvex pretty-printing #8508

cranelift/x64: Fix XmmRmREvex pretty-printing #8508

jameysharp commented Apr 29, 2024

abrown left a comment

jameysharp commented Apr 30, 2024

jameysharp commented May 1, 2024

abrown left a comment •

edited

Loading

jameysharp commented May 1, 2024

cranelift/x64: Fix XmmRmREvex pretty-printing #8508

cranelift/x64: Fix XmmRmREvex pretty-printing #8508

Conversation

jameysharp commented Apr 29, 2024

abrown left a comment

Choose a reason for hiding this comment

jameysharp commented Apr 30, 2024

jameysharp commented May 1, 2024

abrown left a comment • edited Loading

Choose a reason for hiding this comment

jameysharp commented May 1, 2024

abrown left a comment •

edited

Loading