Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Large number of bugs in the pretty-printer, parser, spreadsheet and other components. #240

Closed
stefanheule opened this issue Aug 22, 2017 · 11 comments

Comments

@stefanheule
Copy link
Member

stefanheule commented Aug 22, 2017

Note: The scope of this issue has been significantly expanded, see messages further below.

For instance cmpq $0x80, %rdi # SIZE=4 gets parsed incorrectly into CMP_R64_IMM32. That opcode doesn't even have size 4.

@bchurchill
Copy link
Member

bchurchill commented Aug 22, 2017 via email

@stefanheule
Copy link
Member Author

Sorry, that was a very short description, but I'm pretty sure this is a real bug. Let me try to be more explicit:

I'm trying to parse cmpq $0x80, %rdi # SIZE=4, which should only succeed if the instruction fits in at most 4 bytes (total encoding size, as you correctly say). For that, the only option is to parse as opcode CMP_R64_IMM8 (with the encoding 48 83 ff 80), but it gets parsed as CMP_R64_IMM32 instead. This is incorrect, because the CMP_R64_IMM32 opcode requires 7 bytes (48 81 ff 80 ff ff ff).

@bchurchill
Copy link
Member

bchurchill commented Aug 22, 2017 via email

@stefanheule
Copy link
Member Author

Yeah, that's what it is supposed to do, but we got it wrong. My proposed fix for this is in https://github.com/StanfordPL/x64asm/tree/issue-240

@bchurchill
Copy link
Member

bchurchill commented Aug 22, 2017 via email

@bchurchill
Copy link
Member

bchurchill commented Aug 22, 2017 via email

@stefanheule
Copy link
Member Author

Okay, I'm trying to dig into this issue a bit more, and there are more problems.

  1. We can't parse orw $0xffff, %r8w # SIZE=5 (i.e. as orw_r16_imm8), even though we should.
  2. We pretty print orw_r16_imm8 with the constant 0xff as orw $0xff, %r8w. This is not correct, or at the very least not consistent with other assemblers. Other assemblers will interpret this as or-ing r8w with 0x00ff, not 0xffff which is correct. We should be printing it as orw $0xffff, %r8w.

To fix this, we will have to take into account the following the operand size. For the orw examples above, it would be 16-bits. With this, we can realize that 0xffff can fit in the 8-bit immediate of orw_r16_imm8, but 0xff cannot and we need to use orw_r16_imm16 instead. For printing, we need to do something similar, where we can't print the 8-bit constant 0xff just as 0xff, but instead print it as 0xffff, if it would cause confusion otherwise.

Doing this properly should also allow us to completely get rid of the weird "poor" encoding thing we are doing right now in the parser, I believe.

Does this sound right? Am I missing something? @bchurchill, it would definitely good to hear your thoughts, as you know much more about this than I do.

@stefanheule
Copy link
Member Author

Okay, I've re-implemented the handling of immediates for both printing and parsing, and fixed several bugs in the process.

Changes are here: https://github.com/StanfordPL/x64asm/compare/issue-240
Together with some fixes over in STOKE: https://github.com/StanfordPL/stoke/compare/issue-974

I'll do some more testing, but so far all the cases that didn't use to work or work incorrectly seem to be okay.

@stefanheule stefanheule changed the title Parser error on some constants Large number of bugs in the pretty-printer, parser, spreadsheet and other components. Aug 24, 2017
@jrkoenig
Copy link
Contributor

I added parts to the haskell code that generates alternate instructions with the .s suffix. This should work for parsing, in that addb %cl, %dl should decode to the regular opcode whereas addb.s should be the alternate one.

Right now addb.s (%ecx), %dl won't work but the .s suffix should only be present if there is some ambiguity. The pretty printer currently does not add these suffixes, but the OPCODE= comment will disambiguate properly because the alternate coding is still in the list of possibilities for addb (no suffix), but won't be selected unless an OPCODE forces it to be.

@stefanheule
Copy link
Member Author

Okay, I fixed all these problems. If you use STOKE, you may want to update to the latest develop version. However, it is not fully backwards compatible in the following sense: x86 programs that STOKE found or disassembled (all the .s files produced by STOKE) may no longer parse correctly (because they were printed incorrectly by the old STOKE).

Here is a description of the issues that were fixed:

Printing and parsing immediates did not work correctly. Some examples:

  • We can't parse orw $0xffff, %r8w # SIZE=5 (i.e. as orw_r16_imm8), even though we should.
  • We pretty print orw_r16_imm8 with the constant 0xff as orw $0xff, %r8w. This is not correct, or at the very least not consistent with other assemblers. Other assemblers will interpret this as or-ing r8w with 0x00ff, not 0xffff which is correct (imms are sign-extended). We should be printing it as orw $0xffff, %r8w.

Under some circumstances, we would parse instructions with the wrong opcode. For instance: I'm trying to parse cmpq $0x80, %rdi # SIZE=4, which should only succeed if the instruction fits in at most 4 bytes (total encoding size, as you correctly say). For that, the only option is to parse as opcode CMP_R64_IMM8 (with the encoding 48 83 ff 80), but it gets parsed as CMP_R64_IMM32 instead. This is incorrect, because the CMP_R64_IMM32 opcode requires 7 bytes (48 81 ff 80 ff ff ff).

Could not disassemble the instructions shl, shr, sal, sar, rcl, rcr, rol, ror if they use a memory operand.

Could not disassemble cmp/vcmp pseudo-opcodes like cmpeq_sd or cmpneq_oq_ss.

Spreadsheet bugs

  • VCVTDQ2PD had the wrong operands
  • Suffix problems for VCVTPD2DQ, CVTSD2SI, VCVTSD2SI, CVTSS2SI, VCVTSS2SI, CVTTSD2SI, VCVTTSD2SI, CVTTSS2SI, VCVTTSS2SI, VMOVNTDQ, POPF, PUSHF, MOVNTI
  • Incorrect opcode for VMOVNTDQ
  • Typo in opcode for VPCMPGTQ
  • Typo in opcode for VBROADCASTI128

Disassembler had a memory and file descriptor leak. Disassembling too many programs will cause the program to run out of file descriptors.

Disassembler could parse a n-byte sequence into an instruction that takes more than n bytes.

Disassembler could parse a single instruction into several instructions (padding with no-ops). That doesn't seem right and may not be what a user expects. This is no longer the case with one or two exceptions.

We didn't support the .s suffix of some instruction variants. Thanks to Jason for fixing this one.

@bchurchill
Copy link
Member

bchurchill commented Aug 25, 2017 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants