Large number of bugs in the pretty-printer, parser, spreadsheet and other components. #240

stefanheule · 2017-08-22T00:17:04Z

Note: The scope of this issue has been significantly expanded, see messages further below.

For instance cmpq $0x80, %rdi # SIZE=4 gets parsed incorrectly into CMP_R64_IMM32. That opcode doesn't even have size 4.


bchurchill · 2017-08-22T01:29:31Z

Nah, that's correct. It's possible to encode the immediate as one byte and it will be sign extended to 64 bits. The SIZE is there to disambiguate between the encodings that take different immediate sizes. SIZE refers to total encoding size, not the opcode size.

stefanheule · 2017-08-22T06:48:32Z

Sorry, that was a very short description, but I'm pretty sure this is a real bug. Let me try to be more explicit:

I'm trying to parse cmpq $0x80, %rdi # SIZE=4, which should only succeed if the instruction fits in at most 4 bytes (total encoding size, as you correctly say). For that, the only option is to parse as opcode CMP_R64_IMM8 (with the encoding 48 83 ff 80), but it gets parsed as CMP_R64_IMM32 instead. This is incorrect, because the CMP_R64_IMM32 opcode requires 7 bytes (48 81 ff 80 ff ff ff).
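To make the size argument concrete, here is a minimal sketch that hand-encodes the two candidates and filters them by the SIZE limit. It is not the x64asm parser; the helper names `encode_cmp_rdi` and `candidates_within` are made up for illustration, and the only facts used are the standard encodings REX.W + 83 /7 ib and REX.W + 81 /7 id.

```python
# Illustrative sketch only, not x64asm code: hand-encodes `cmp $imm, %rdi`.
REX_W = 0x48          # REX prefix selecting 64-bit operand size
MODRM_CMP_RDI = 0xFF  # mod=11, reg=/7 (CMP), rm=rdi


def encode_cmp_rdi(value: int, imm_bytes: int) -> bytes:
    """Encode cmp with a sign-extended immediate of 1 or 4 bytes."""
    opcode = {1: 0x83, 4: 0x81}[imm_bytes]
    imm = value.to_bytes(imm_bytes, "little", signed=True)
    return bytes([REX_W, opcode, MODRM_CMP_RDI]) + imm


def candidates_within(value: int, max_size: int) -> dict:
    """Return the encodings of `value` that fit in at most max_size bytes."""
    result = {}
    for name, width in (("CMP_R64_IMM8", 1), ("CMP_R64_IMM32", 4)):
        try:
            enc = encode_cmp_rdi(value, width)
        except OverflowError:
            continue  # value does not fit in this immediate width
        if len(enc) <= max_size:
            result[name] = enc
    return result


value = -128  # the byte 0x80, read as a sign-extended immediate
print(encode_cmp_rdi(value, 1).hex(" "))   # 4 bytes
print(encode_cmp_rdi(value, 4).hex(" "))   # 7 bytes
print(sorted(candidates_within(value, 4)))
```

With a limit of 4 bytes only the imm8 form survives the filter, which is the behaviour the SIZE annotation is supposed to enforce.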

bchurchill · 2017-08-22T06:58:16Z

Ah, yes, you're right. I guess I was just surprised that doesn't work right! I thought that the parser actually assembles what it thinks it has parsed to make sure the length is right before accepting its input?

stefanheule · 2017-08-22T06:58:54Z

Yeah, that's what it is supposed to do, but we got it wrong. My proposed fix for this is in https://github.com/StanfordPL/x64asm/tree/issue-240

bchurchill · 2017-08-22T07:15:19Z

aha, I see. Just glancing at the code that fix looks right.


bchurchill · 2017-08-22T07:16:44Z

I wonder how many hours I've wasted because of this bug and never known? Looks pretty bad...

stefanheule · 2017-08-22T21:20:17Z

Okay, I'm trying to dig into this issue a bit more, and there are more problems.

We can't parse orw $0xffff, %r8w # SIZE=5 (i.e. as orw_r16_imm8), even though we should.
We pretty print orw_r16_imm8 with the constant 0xff as orw $0xff, %r8w. This is not correct, or at the very least not consistent with other assemblers. Other assemblers will interpret this as or-ing r8w with 0x00ff, whereas the correct value is 0xffff. We should be printing it as orw $0xffff, %r8w.

To fix this, we will have to take the operand size into account. For the orw examples above, it would be 16 bits. With this, we can realize that 0xffff can fit in the 8-bit immediate of orw_r16_imm8, but 0xff cannot and we need to use orw_r16_imm16 instead. For printing, we need to do something similar: we can't print the 8-bit constant 0xff just as 0xff, but must instead print it as 0xffff if it would otherwise cause confusion.
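The rule described here can be sketched in a few lines. This is only an illustration of the sign-extension check, not the code on the issue-240 branch; the function names `sign_extend`, `fits_imm` and `printed_constant` are hypothetical.

```python
def sign_extend(value: int, from_bits: int, to_bits: int) -> int:
    """Sign-extend the low from_bits of value to an unsigned to_bits pattern."""
    value &= (1 << from_bits) - 1
    if value >> (from_bits - 1):      # sign bit of the narrow value is set
        value -= 1 << from_bits
    return value & ((1 << to_bits) - 1)


def fits_imm(constant: int, imm_bits: int, operand_bits: int) -> bool:
    """True if sign-extending the low imm_bits reproduces the full constant."""
    mask = (1 << operand_bits) - 1
    return sign_extend(constant, imm_bits, operand_bits) == (constant & mask)


def printed_constant(imm: int, imm_bits: int, operand_bits: int) -> str:
    """Text to print for an immediate, widened to the operand size."""
    return hex(sign_extend(imm, imm_bits, operand_bits))


print(fits_imm(0xffff, 8, 16))         # 0xffff is imm8 0xff after sign extension
print(fits_imm(0x00ff, 8, 16))         # 0x00ff needs the imm16 form
print(printed_constant(0xff, 8, 16))   # what orw_r16_imm8 with 0xff should print
```

Parsing and printing both go through the same widening step, so a constant that is printed this way parses back to the same immediate width.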

Doing this properly should also allow us to completely get rid of the weird "poor" encoding thing we are doing right now in the parser, I believe.

Does this sound right? Am I missing something? @bchurchill, it would definitely be good to hear your thoughts, as you know much more about this than I do.

stefanheule · 2017-08-23T20:19:39Z

Okay, I've re-implemented the handling of immediates for both printing and parsing, and fixed several bugs in the process.

Changes are here: https://github.com/StanfordPL/x64asm/compare/issue-240
Together with some fixes over in STOKE: https://github.com/StanfordPL/stoke/compare/issue-974

I'll do some more testing, but so far all the cases that previously failed or worked incorrectly seem to be okay.

jrkoenig · 2017-08-24T23:21:19Z

I added parts to the Haskell code that generates alternate instructions with the .s suffix. This should work for parsing, in that addb %cl, %dl should decode to the regular opcode whereas addb.s should be the alternate one.

Right now addb.s (%ecx), %dl won't work but the .s suffix should only be present if there is some ambiguity. The pretty printer currently does not add these suffixes, but the OPCODE= comment will disambiguate properly because the alternate coding is still in the list of possibilities for addb (no suffix), but won't be selected unless an OPCODE forces it to be.
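For readers unfamiliar with the ambiguity, a register-to-register addb has two valid encodings: 00 /r (ADD r/m8, r8) and 02 /r (ADD r8, r/m8). The sketch below hand-encodes both. It is illustrative only; `encode_addb` is a hypothetical helper, and treating the 00 /r form as the "regular" one is an assumption, not something confirmed for x64asm.

```python
# Illustrative sketch only: the two encodings of a register-register addb.
REG8 = {"al": 0, "cl": 1, "dl": 2, "bl": 3}  # 8-bit register numbers


def modrm_reg_reg(reg: int, rm: int) -> int:
    """ModRM byte with mod=11 (register-direct addressing)."""
    return 0xC0 | (reg << 3) | rm


def encode_addb(src: str, dst: str, alternate: bool = False) -> bytes:
    """Encode AT&T-syntax `addb %src, %dst` in one of its two forms."""
    s, d = REG8[src], REG8[dst]
    if alternate:
        # 02 /r : ADD r8, r/m8 (destination goes in the reg field)
        return bytes([0x02, modrm_reg_reg(d, s)])
    # 00 /r : ADD r/m8, r8 (source goes in the reg field); assumed "regular"
    return bytes([0x00, modrm_reg_reg(s, d)])


print(encode_addb("cl", "dl").hex(" "))                  # addb   %cl, %dl
print(encode_addb("cl", "dl", alternate=True).hex(" "))  # addb.s %cl, %dl
```

Both byte sequences compute dl = dl + cl, which is why the text form alone cannot tell them apart without a suffix or an OPCODE= comment.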

stefanheule · 2017-08-25T20:55:46Z

Okay, I fixed all these problems. If you use STOKE, you may want to update to the latest develop version. However, it is not fully backwards compatible in the following sense: x86 programs that STOKE found or disassembled (all the .s files produced by STOKE) may no longer parse correctly (because they were printed incorrectly by the old STOKE).

Here is a description of the issues that were fixed:

Printing and parsing immediates did not work correctly. Some examples:

We can't parse orw $0xffff, %r8w # SIZE=5 (i.e. as orw_r16_imm8), even though we should.
We pretty print orw_r16_imm8 with the constant 0xff as orw $0xff, %r8w. This is not correct, or at the very least not consistent with other assemblers. Other assemblers will interpret this as or-ing r8w with 0x00ff, whereas the correct value is 0xffff (imms are sign-extended). We should be printing it as orw $0xffff, %r8w.

Under some circumstances, we would parse instructions with the wrong opcode. For instance, parsing cmpq $0x80, %rdi # SIZE=4 should only succeed if the instruction fits in at most 4 bytes (total encoding size). For that, the only option is to parse as opcode CMP_R64_IMM8 (with the encoding 48 83 ff 80), but it was parsed as CMP_R64_IMM32 instead. This is incorrect, because the CMP_R64_IMM32 opcode requires 7 bytes (48 81 ff 80 ff ff ff).

Could not disassemble the instructions shl, shr, sal, sar, rcl, rcr, rol, ror if they use a memory operand.

Could not disassemble cmp/vcmp pseudo-opcodes like cmpeq_sd or cmpneq_oq_ss.

Spreadsheet bugs

VCVTDQ2PD had the wrong operands
Suffix problems for VCVTPD2DQ, CVTSD2SI, VCVTSD2SI, CVTSS2SI, VCVTSS2SI, CVTTSD2SI, VCVTTSD2SI, CVTTSS2SI, VCVTTSS2SI, VMOVNTDQ, POPF, PUSHF, MOVNTI
Incorrect opcode for VMOVNTDQ
Typo in opcode for VPCMPGTQ
Typo in opcode for VBROADCASTI128

Disassembler had a memory and file descriptor leak. Disassembling too many programs will cause the program to run out of file descriptors.

Disassembler could parse an n-byte sequence into an instruction that takes more than n bytes.

Disassembler could parse a single instruction into several instructions (padding with no-ops). That doesn't seem right and may not be what a user expects. This is no longer the case with one or two exceptions.

We didn't support the .s suffix of some instruction variants. Thanks to Jason for fixing this one.

bchurchill · 2017-08-25T21:05:30Z

Thank you for finding/fixing all of these Stefan!


stefanheule mentioned this issue Aug 24, 2017

Assembler/disassembler inconsistencies StanfordPL/stoke#974

Closed

stefanheule changed the title ~~Parser error on some constants~~ Large number of bugs in the pretty-printer, parser, spreadsheet and other components. Aug 24, 2017

stefanheule closed this as completed in 4640897 Aug 25, 2017

This was referenced Aug 25, 2017

Alternate instruction encodings are not supported #239

Closed

assembler produces instruction of incorrect size #193

Closed
