cranelift/x64: Fix XmmRmREvex pretty-printing #8508
I don't understand:
- if this disagrees with the Capstone disassembly, shouldn't some test fail here?
- why does changing the order of pretty-print invocation matter here? Is that thing still stateful?
- are we aiming for Intel syntax here or AT&T?
Right now, the order that the backend's implementation of So there's definitely one bug here in that I have a commit locally to get rid of this ordering dependency from
As far as I know, there are no tests which compare our pretty-printer output with Capstone's output. Maybe there should be! It's tricky in cases like pseudo-instructions or synthetic address modes, but we could probably test a lot of instructions easily enough. So the only way this shows up right now is that for precise-output compile filetests we list our pretty-printer output, and then we list Capstone's output, and a human has to decide whether those are close enough. In this PR, I used The actual binary machine code that's emitted isn't changing, so it's not surprising that none of the runtests failed, or that the Capstone output didn't change either. Note that there are currently exactly two instructions which use our
We're using roughly AT&T syntax in the pretty-printer output. There are good arguments for switching to Intel syntax, but Trevor tried doing that a while ago and found it's a lot of work. So instead we configure Capstone into AT&T mode to be consistent with our existing choices. Before this PR, our pretty-printer was effectively printing In our sole filetest example, it happens that I think one possibility is that Capstone's version of AT&T syntax has the operands backwards for this particular instruction. I wouldn't blame them if the non-Intel (and non-default) printing mode on a single random SIMD instruction wasn't checked carefully.
The operand collector had these operands in src1/src2/dst order, but the pretty-printer fetched the allocations in dst/src1/src2 order instead. Although our pretty-printer looked like it was printing src1/src2/dst, because it consumed operands in the wrong order, what it actually printed was src2/dst/src1. Meanwhile, Capstone actually uses src2/src1/dst order in AT&T mode. (GNU objdump agrees.) In the only filetest covering the vpsraq instruction, our output agreed with Capstone because register allocation picked the same register for both src1 and dst, so the two orders were indistinguishable. I've extended the filetest to force register allocation to pick different registers. This format is also used for vpmullq, but we didn't have any compile filetests covering that instruction, so I've added one with the same register allocation pattern. Now our pretty-printer agrees with Capstone on both instructions.
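The ordering mismatch described in this commit message can be reproduced with a small toy model. This is only a sketch, not Cranelift's actual code: `AllocStream`, `collect_operands`, `print_buggy`, and `print_fixed` are hypothetical names, and the register names are made up. It assumes only what the message states: allocations are handed back in collector order (src1/src2/dst), the old printer fetched them as dst/src1/src2, and the AT&T form is src2/src1/dst.

```rust
/// Toy stand-in for the stream of register allocations given to a pretty-printer.
/// Allocations come back in exactly the order the operand collector visited them.
/// (Hypothetical type, not the real Cranelift API.)
struct AllocStream<'a> {
    regs: std::slice::Iter<'a, &'a str>,
}

impl<'a> AllocStream<'a> {
    fn new(regs: &'a [&'a str]) -> Self {
        Self { regs: regs.iter() }
    }

    /// Hand out the next allocation; position is the only thing that ties
    /// an allocation to an operand.
    fn take_next(&mut self) -> &'a str {
        self.regs.next().copied().expect("ran out of allocations")
    }
}

/// Collector order from the commit message: src1, src2, dst.
fn collect_operands<'a>(src1: &'a str, src2: &'a str, dst: &'a str) -> [&'a str; 3] {
    [src1, src2, dst]
}

/// Old behavior: fetches dst/src1/src2, so every name is bound to the wrong
/// allocation, then prints what looks like src1/src2/dst.
fn print_buggy(op: &str, allocs: &[&str]) -> String {
    let mut a = AllocStream::new(allocs);
    let dst = a.take_next(); // actually receives src1's allocation
    let src1 = a.take_next(); // actually receives src2's allocation
    let src2 = a.take_next(); // actually receives dst's allocation
    format!("{} {}, {}, {}", op, src1, src2, dst)
}

/// Fixed behavior: fetches in collector order, prints src2/src1/dst (AT&T order).
fn print_fixed(op: &str, allocs: &[&str]) -> String {
    let mut a = AllocStream::new(allocs);
    let src1 = a.take_next();
    let src2 = a.take_next();
    let dst = a.take_next();
    format!("{} {}, {}, {}", op, src2, src1, dst)
}
```

With three distinct registers the two printers disagree, which is what the extended filetest forces. When src1 and dst share a register, as in the original vpsraq filetest, both functions return the same string and the bug is invisible.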
Force-pushed from 0a093ba to 8401dab.
Based on our discussion in the Cranelift meeting today I've updated this PR, and everything makes sense to me now. Thank you so much to @abrown and @fitzgen for helping me figure this out! I've updated the commit message to explain what happened:
The operand collector had these operands in src1/src2/dst order, but the pretty-printer fetched the allocations in dst/src1/src2 order instead. Although our pretty-printer looked like it was printing src1/src2/dst, because it consumed operands in the wrong order, what it actually printed was src2/dst/src1. Meanwhile, Capstone actually uses src2/src1/dst order in AT&T mode. (GNU objdump agrees.) In the only filetest covering the vpsraq instruction, our output agreed with Capstone because register allocation picked the same register for both src1 and dst, so the two orders were indistinguishable. I've extended the filetest to force register allocation to pick different registers. This format is also used for vpmullq, but we didn't have any compile filetests covering that instruction, so I've added one with the same register allocation pattern. Now our pretty-printer agrees with Capstone on both instructions.
LGTM! (With a couple of emit tests to fix up...)
Force-pushed from d484153 to 444c920.
This test for vpmullq had what we have now determined is the wrong order for src1 and src2. There were no emit-tests for vpsraq, so I added one. The vpermi2b tests used the wrong form of the Inst enum, judging by the assertions that are in x64_get_operands (which is not exercised by emit tests) and the fact that we never use that form for that instruction anywhere else. Pretty-printing vpermi2b disagreed with Capstone in the same way as vpsraq and vpmullq. I've fixed that form to agree with Capstone as well, aside from the duplicated src1/dst operands, which are required to be different before register allocation and equal afterward.
Force-pushed from 444c920 to f116290.
Would you review again now that I've fixed the emit-tests? I found several more things to fix in the process.
The operand collector had these operands in src1/src2/dst order, but the pretty-printer had dst/src1/src2 order instead.
Note that fixing the pretty-printer makes it disagree with the disassembly from Capstone. I have stared at the emit code and the Intel reference manual and can't figure out how to reconcile these.
However, I have verified that the vpsraq instruction is executed on my laptop in a runtest (cranelift/filetests/filetests/runtests/simd-sshr.clif), and its runtime behavior matches the CLIF interpreter. So this does not appear to be a codegen/correctness bug.
I'm hoping @abrown or @alexcrichton can help explain the disassembly discrepancy.