check all "unpredictable" conditions and "reserved (0)" bits when decoding #1685

Closed
derekbruening opened this issue Apr 14, 2015 · 8 comments

@derekbruening
Contributor

Today our decoder does not care much about "unpredictable" conditions (typically pc or lr used as certain operands, but widely varying) and "reserved (0)" bits when decoding. Both would require adding extra info to our decode tables, for questionable benefit. Often the processor will execute "unpredictable" instructions just fine, and as far as DR is concerned we'd rather wait for a fault. However, to make a nice standalone decoder we might want to consider adding all this info. Some of this is a gray area and varies across processor versions, so we'd also want to be much more rigorous about which processor supports which instruction variant. Definitely low priority for DR itself.

@derekbruening
Contributor Author

There are many examples of different decoders doing different things in this area. Here is one example:

DR:
    +0x28b88   f9b03f82   rfeib  (%r0)[8byte] %r0 -> %r0 %cpsr 
capstone 72ee3c9:
      0x00028b88:  f9b03f82   rfeib #2! 
llvm 3.4.2:
# echo ' 0x82 0x3f 0xb0 0xf9 ' | /usr/bin/llvm-mc -arch arm --disassemble
        rfeib   #2!
gdb 7.7.1:
(gdb) set {unsigned char[400]}0x04311000 = {0x82,0x3f,0xb0,0xf9}
(gdb) set arm fallback-mode arm
(gdb) x/2i 0x04311000
   0x4311000:                   ; <UNDEFINED> instruction: 0xf9b03f82

The v8 and v7 manuals say:

Assembler syntax
RFE{<amode>}{<c>}{<q>} <Rn>{!}

The manual also says the bottom halfword should be 0x0a00 (though those bits are in
parens, thus the "reserved" of this issue), so perhaps this should be invalid.

Clear those bits and everyone agrees:

# echo ' 0x00 0x0a 0xb0 0xf9 ' | /usr/bin/llvm-mc -arch arm --disassemble
        rfeib   r0!
# echo ' 0x00 0x0a 0xb0 0xf9 ' | /extsw/pkgs/disasm/capstone/build/capstone -arm -
0x00000000:  f9b00a00   rfeib   r0!

I do not see where this #2 is coming from. And why would capstone and llvm have exactly the same behavior there?

Similarly:

    +0xc868   f83bff64   rfeda  (%r11)[8byte] %r11 -> %r11 %cpsr 
     0x0000c868:  f83bff64   rfeda  #3! 
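A minimal sketch of what a stricter standalone decoder could check for these, assuming the A32 RFE layout quoted above (Rn in bits 19:16, bits 15:0 shown as the parenthesized pattern 0x0a00). The helper names are hypothetical, not DR's decoder API:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: A32 RFE shows bits 15:0 as the reserved pattern
 * (0)(0)(0)(0)(1)(0)(1)(0)(0)(0)(0)(0)(0)(0)(0)(0), i.e. 0x0a00. */
bool rfe_reserved_bits_ok(uint32_t instr)
{
    return (instr & 0xffffu) == 0x0a00u;
}

/* Rn, the base register, is bits 19:16. */
unsigned rfe_base_reg(uint32_t instr)
{
    return (instr >> 16) & 0xfu;
}
```

With this check, f9b03f82 and f83bff64 would both be rejected, while f9b00a00 (the variant every decoder agrees on) passes; the base register still decodes as r0 and r11 respectively.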

@derekbruening
Contributor Author

I'm using this issue to document some of the corner cases.

One is a SIMD consecutive register list exceeding the max register. E.g.:

0x00004458:  dcd00ab8   vldmiale    r0, {s1, s2, s3, s4, s5, s6, s7, s8, s9, s10, s11, s12, s13, s14, s15, s16, s17, s18, s19, s20, s21, s22, s23, s24, s25, s26, s27, s28, s29, s30, s31} 

We used to consider it invalid (a rare case of our decoder being stricter than capstone):

    0x00004458:   dcd00ab8   <INVALID>

I recently had to add a missing check for >32 total regs and decided not to mark it invalid. But it is a little inconsistent of us to consider it OK if the list starts at 0 and asks for >32 regs, yet mark it invalid if it does the same but starts above 0.

I ran:

        .word    0xdcd00ab8

in suite/tests/bin/common.allasm_arm and got no SIGILL, so I'm going to remove the existing invalid marking for lists that go past s31.
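For reference, the manual's rule for the single-precision VLDM/VSTM form is "if regs == 0 || (d+regs) > 32 then UNPREDICTABLE", with d = UInt(Vd:D) and regs = UInt(imm8). A sketch of that check (helper names are hypothetical) shows that it treats a list starting at s0 and one starting above s0 the same way:

```c
#include <stdbool.h>
#include <stdint.h>

/* Single-precision A32 VLDM/VSTM (bits 11:8 == 1010): the first register is
 * S<Vd:D>, with Vd in bits 15:12 and D (bit 22) as the low bit. */
unsigned vldm_s_first_reg(uint32_t instr)
{
    return (((instr >> 12) & 0xfu) << 1) | ((instr >> 22) & 1u);
}

/* imm8 (bits 7:0) is the number of S registers transferred. */
unsigned vldm_s_count(uint32_t instr)
{
    return instr & 0xffu;
}

/* "if regs == 0 || (d+regs) > 32 then UNPREDICTABLE" */
bool vldm_s_list_unpredictable(uint32_t instr)
{
    unsigned d = vldm_s_first_reg(instr);
    unsigned regs = vldm_s_count(instr);
    return regs == 0 || d + regs > 32;
}
```

For dcd00ab8 this gives d = 1 (s1) and regs = 184, far past s31; a list starting at s0 with the same count would be flagged identically.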

@derekbruening
Contributor Author

Listing a number of other issues beyond the OP_rfe and SIMD lists above:

*** DONE many cases of capstone labeling invalid but we think it's ok => assuming all are just "unpredictable" due to PC operand or something
     CLOSED: [2015-04-16 Thu 10:06]
     - State "DONE"       from "TODO"       [2015-04-16 Thu 10:06]

I'm getting differences due to capstone thinking some things are invalid:

+0x0540   00000000   and.eq %r0 %r0  -> %r0 
vs
0x00000540:  00000000   <invalid: errcode 0>

# echo ' 0x00 0x00 0x00 0x00 ' | /usr/bin/llvm-mc -arch arm --disassemble
        andeq   r0, r0, r0

That looks like a capstone bug.

For some of the others, DR isn't complaining when maybe it should:

+0x0000   464c457f   strb.mi %r4[1byte] %r12 %pc ror 10 -> (%r12)[1byte] %r12 

# echo ' 0x7f 0x45 0x4c 0x46 ' | /usr/bin/llvm-mc -arch arm --disassemble
<stdin>:1:2: warning: invalid instruction encoding

0x00000000:  464c457f   <INVALID: errcode 0>

The manual says: "if t == 15 || m == 15 then UNPREDICTABLE;"
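The quoted condition only needs two register fields. A sketch, assuming the usual A32 load/store register-offset layout (Rt in bits 15:12, Rm in bits 3:0); the helper name is made up for illustration:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper for STRB (register):
 * "if t == 15 || m == 15 then UNPREDICTABLE". */
bool strb_reg_unpredictable(uint32_t instr)
{
    unsigned t = (instr >> 12) & 0xfu; /* Rt, the value stored */
    unsigned m = instr & 0xfu;         /* Rm, the offset register */
    return t == 15 || m == 15;
}
```

For 464c457f, Rt is r4 but Rm is 15, matching the %pc operand in DR's output, so a strict decoder would flag it.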

*** DONE OP_blx reserved bits: in general, decision is to not bother to check reserved "(0)" bits
     CLOSED: [2015-04-14 Tue 18:45]
     - State "DONE"       from "TODO"       [2015-04-14 Tue 18:45]

> grep 0x000000ce /tmp/d1
0x000000ce:  4795 3c72   <INVALID: errcode 0> @
> grep '+0x00ce' /tmp/d2
+0x00ce   4795       blx    %r2 -> %lr 
# echo ' 0x95 0x47 ' | /usr/bin/llvm-mc -arch thumb --disassemble
<stdin>:1:2: warning: invalid instruction encoding
(gdb) x/4i 0x04311000
   0x4311000:                   ; <UNDEFINED> instruction: 0x4795

Once again it's the bits marked "(0)".

Yet if I do this:

diff --git a/suite/tests/common/allasm_thumb.asm b/suite/tests/common/allasm_thumb.asm
index 83e40d9..9001fea 100644
--- a/suite/tests/common/allasm_thumb.asm
+++ b/suite/tests/common/allasm_thumb.asm
@@ -82,7 +82,8 @@ separate_bb:
         mov      r1, #13           // sizeof(hello)
         adr      r2, _print_pop_pc
         add      r2, r2, #1        // keep it Thumb
-        blx      r2
+        .short   0x4795
+//        blx      r2

It runs fine on my Chromebook with no SIGILL: it behaves as a blx. So I think
we're fine treating it as valid. These reserved "(0)" bits are a gray area
and are really there for future changes.
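For the record, the 16-bit encoding here is 010001111 followed by Rm (bits 6:3) and three "(0)" bits (2:0), so 4795 is blx r2 with bits 2:0 = 101. A sketch of where a reserved-bit check would go if we ever wanted one (hypothetical helpers):

```c
#include <stdbool.h>
#include <stdint.h>

/* Thumb BLX (register): 010001111 Rm:4 (0)(0)(0). */
bool is_thumb_blx_reg(uint16_t instr)
{
    return (instr & 0xff80u) == 0x4780u;
}

unsigned thumb_blx_rm(uint16_t instr)
{
    return (instr >> 3) & 0xfu;
}

/* Bits 2:0 are the reserved "(0)" bits that DR deliberately ignores. */
bool thumb_blx_reserved_bits_clear(uint16_t instr)
{
    return (instr & 0x7u) == 0;
}
```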

Another one w/ reserved bits.  This has 11:8 as "(0)":
0x00000114:   f82e b1c7  strh   %r11[2byte] -> (%lr,%r7)[2byte] 
0x00000114:  f82e b1c7   <INVALID: errcode 0> @

*** TODO OP_stm_priv w/ writeback not considered invalid by other decoders

0x00000058:  19e0d267   stmibne r0!, {r0, r1, r2, r5, r6, r9, r12, lr, pc} ^ 
    0x00000058:   19e0d267   <INVALID>

0x00000110:  79620acf   stmdbvc r2!, {r0, r1, r2, r3, r6, r7, r9, r11} ^ 
    0x00000110:   79620acf   <INVALID>

0x00000560:  09f35863   ldmibeq r3!, {r0, r1, r5, r6, r11, r12, lr} ^ 
    0x00000560:   09f35863   <INVALID>

See above "privileged stm writeback": supposed to be illegal.

# echo ' 0x67 0xd2 0xe0 0x19 ' | /usr/bin/llvm-mc -arch arm --disassemble
        stmibne r0!, {r0, r1, r2, r5, r6, r9, r12, lr, pc} ^

gdb also thinks it's ok (based on up above).

Strange that these other decoders, which are usually sticklers for
"unpredictable" when it doesn't matter as much, consider this one to be ok
when this one actually raises SIGILL.
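One way to express this check, as a sketch: in the A32 block-transfer encoding, S is bit 22 (the "^"), W is bit 21, L is bit 20, and bit 15 of the register list is PC. With S set, STM and LDM-without-PC are the user-registers forms, where the W position is "(0)"; LDM with PC is the exception-return form, where writeback is allowed. The function names are hypothetical:

```c
#include <stdbool.h>
#include <stdint.h>

/* STM/LDM with S set and not the LDM-with-PC exception-return form. */
bool is_user_regs_block_xfer(uint32_t instr)
{
    bool is_block_xfer = (instr & 0x0e000000u) == 0x08000000u; /* 27:25 == 100 */
    bool s = ((instr >> 22) & 1u) != 0;
    bool load = ((instr >> 20) & 1u) != 0;
    bool pc_in_list = ((instr >> 15) & 1u) != 0;
    return is_block_xfer && s && !(load && pc_in_list);
}

/* The user-registers forms show W (bit 21) as "(0)". */
bool user_regs_writeback_invalid(uint32_t instr)
{
    return is_user_regs_block_xfer(instr) && ((instr >> 21) & 1u) != 0;
}
```

All three encodings above are flagged, while ldmfd sp!, {pc}^ (e8fd8000) is not, since it is the exception-return form.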

*** TODO OP_sadd16, etc.: many opcodes with 11:8 as "(1)" when 0 not considered invalid by capstone

capstone:
  0x000020e8:  2610ba19   sadd16hs  r11, r0, r9 
DR:
  0x000020e8:   2610ba19   <INVALID>
llvm:
# echo ' 0x19 0xba 0x10 0x26 ' | /usr/bin/llvm-mc -arch arm --disassemble
<stdin>:1:2: warning: potentially undefined instruction encoding
 0x19 0xba 0x10 0x26 
 ^
        sadd16hs        r11, r0, r9

11:8 is all "(1)" in the manual.

Similar:
0x0000263c:  46350894   shadd8mi    r0, r5, r4 
    0x0000263c:   46350894   <INVALID>

0x00004070:  c61c0655   ssaxgt  r0, r12, r5 
    0x00004070:   c61c0655   <INVALID>

0x000068bc:  5650a758   usaxpl  r10, r0, r8 
    0x000068bc:   5650a758   <INVALID>

0x00008dcc:  567b2d90   uhadd8pl    r2, r11, r0 
    0x00008dcc:   567b2d90   <INVALID>

0x00008e30:  d625f270   qsub16le    pc, r5, r0 
    0x00008e30:   d625f270   <INVALID>

0x0000a714:  a62c0c3a   qasxge  r0, r12, r10 
    0x0000a714:   a62c0c3a   <INVALID>

0x0000b51c:  c63e3950   shsaxgt r3, lr, r0 
    0x0000b51c:   c63e3950   <INVALID>

0x0000bb80:  a664089f   uqadd8ge    r0, r4, pc 
    0x0000bb80:   a664089f   <INVALID>
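Since DR folds the "(1)" bits into the stored opcode mask, these fail to match at all. A sketch of that mask test for the whole parallel add/subtract group, assuming the layout cond:01100:op1:Rn:Rd:(1111):op2:1:Rm (the constant names are hypothetical, not DR's table entries):

```c
#include <stdbool.h>
#include <stdint.h>

/* Bits 27:23 == 01100, bits 11:8 == (1)(1)(1)(1), bit 4 == 1. */
#define PAR_ADDSUB_MASK   0x0f800f10u
#define PAR_ADDSUB_OPCODE 0x06000f10u

bool matches_parallel_addsub(uint32_t instr)
{
    return (instr & PAR_ADDSUB_MASK) == PAR_ADDSUB_OPCODE;
}
```

With bits 11:8 restored to 1111, 2610bf19 matches; 2610ba19 as listed does not, which is why DR reports <INVALID> while capstone accepts it.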

*** TODO OP_strd w/ odd-numbered 1st reg not considered invalid by capstone

0x00007a94:  a0c031f5   strdge  r3, r4, [r0], #0x15 
    0x00007a94:   a0c031f5   <INVALID>

if Rt<0> == `1' then UNPREDICTABLE;
The first source register. For an A32 instruction, <Rt> must be
even-numbered and not R14.

Yet another surprising case of capstone not having this "unpredictable"
behavior encoded while we do.
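A sketch of that rule, assuming Rt is in bits 15:12 as in the A32 STRD encodings (hypothetical helper name):

```c
#include <stdbool.h>
#include <stdint.h>

/* STRD stores Rt and Rt+1; Rt must be even-numbered and not R14. */
bool strd_rt_unpredictable(uint32_t instr)
{
    unsigned t = (instr >> 12) & 0xfu;
    return (t & 1u) != 0 || t == 14;
}
```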

*** TODO OP_ldm w/ sp in list not considered invalid by capstone

The manual says: "The SP cannot be in the list."

0x00000c20:  e895 60b5   ldm.w  r5, {r0, r2, r4, r5, r7, sp, lr} 
    0x00000c20:   e895 60b5  <INVALID>
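In the 32-bit Thumb LDM encoding the second halfword is P:M:(0):register_list, so the SP slot (bit 13) is itself a "(0)" bit. A sketch (hypothetical helper) of the check that makes e895 60b5 invalid for us:

```c
#include <stdbool.h>
#include <stdint.h>

/* Second halfword of T2 LDM: P (bit 15), M (bit 14), (0) (bit 13, the SP
 * slot), then the rest of the register list in bits 12:0. */
bool thumb2_ldm_sp_in_list(uint16_t hw2)
{
    return (hw2 & (1u << 13)) != 0;
}
```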

*** TODO OP_ssat with "(0)" as 1 not considered invalid by capstone

0x00001f36:  f700 7e89   ssat   lr, #0xa, r0, lsl #0x1e 
    0x00001f36:   f700 7e89  <INVALID>

In the manual, ssat should be f3, not f7: A10 is "(0)".

(gdb) x/4i 0x04311000
   0x4311000:                   ; <UNDEFINED> instruction: 0xf7007e89
(gdb) x/2hx 0x04311000
0x4311000:      0xf700  0x7e89

It doesn't raise SIGILL though.

Similar:
0x0000f0d4:  f789 2075   usat   r0, #0x15, r9, lsl #9 
    0x0000f0d4:   f789 2075  <INVALID>

0x00028dae:  f74c 10c3   sbfx   r0, r12, #7, #4 
    0x00028dae:   f74c 10c3  <INVALID>

0x000540a6:  f7c7 4098   ubfx   r0, r7, #0x12, #0x19 
    0x000540a6:   f7c7 4098  <INVALID>
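A sketch of the check, assuming (as the manual shows for their T1 encodings) that SSAT, USAT, SBFX and UBFX all have a "(0)" at bit 10 of the first halfword, which is the difference between the canonical f3xx and the f7xx seen here. The helper name is hypothetical:

```c
#include <stdbool.h>
#include <stdint.h>

/* For SSAT/USAT/SBFX/UBFX (T1), bit 10 of the first halfword is "(0)":
 * 0xf3xx keeps it clear, 0xf7xx sets it. */
bool sat_bfx_bit10_clear(uint16_t hw1)
{
    return (hw1 & 0x0400u) == 0;
}
```

Since none of these raised SIGILL, this would only make sense for a strict standalone decoder.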

*** TODO OP_vtbl, OP_vld, OP_vst past d31 not considered invalid by capstone == capstone bug

Here you can see capstone overflowing, likely past the end of its dNN
register-name array into other register names:

0x00011c5a:  ffbf fa8d   vtbl.8 d15, {d31, fpinst2, mvfr0}, d13 
    0x00011c5a:   ffbf fa8d  <INVALID>

if n+length > 32 then UNPREDICTABLE;


Similarly (look at the wraparound here):
0x0003e7b0:  f940 a159   vst4eq.16  {d26, d28, d30, d0}, [r0:0x40], r9 
    0x0003e7b0:   f940 a159  <INVALID>

if n == 15 || d4 > 31 then UNPREDICTABLE;

Similar:
0x000591ca:  f967 f100   vld4.8 {d31, d1, d3, d5}, [r7], r0 
    0x000591ca:   f967 f100  <INVALID>

0x0005e358:  f964 f235   vld1.8 {d31, fpinst2, mvfr0, mvfr1}, [r4:0x100], r5 
    0x0005e358:   f964 f235  <INVALID>
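Both quoted conditions reduce to a little field arithmetic. A sketch, assuming the 32-bit Thumb encoding is combined as hw1:hw2 and the VLD4/VST4 "multiple 4-element structures" form has already been identified (type 0000 or 0001); all names are hypothetical:

```c
#include <stdbool.h>
#include <stdint.h>

/* Combine the two halfwords of a 32-bit Thumb encoding, e.g. "ffbf fa8d". */
uint32_t t32(uint16_t hw1, uint16_t hw2)
{
    return ((uint32_t)hw1 << 16) | hw2;
}

/* VTBL/VTBX: n = UInt(N:Vn) with Vn in bits 19:16 and N in bit 7, and
 * length = UInt(len) + 1 with len in bits 9:8.
 * Manual: "if n+length > 32 then UNPREDICTABLE". */
bool vtbl_list_unpredictable(uint32_t instr)
{
    unsigned n = (((instr >> 7) & 1u) << 4) | ((instr >> 16) & 0xfu);
    unsigned length = ((instr >> 8) & 3u) + 1;
    return n + length > 32;
}

/* VLD4/VST4 (multiple 4-element structures): d = UInt(D:Vd) with D in
 * bit 22 and Vd in bits 15:12; type (bits 11:8) 0000 gives inc = 1 and
 * 0001 gives inc = 2; Rn is bits 19:16.
 * Manual: "if n == 15 || d4 > 31 then UNPREDICTABLE". */
bool vldst4_unpredictable(uint32_t instr)
{
    unsigned d = (((instr >> 22) & 1u) << 4) | ((instr >> 12) & 0xfu);
    unsigned inc = ((instr >> 8) & 0xfu) == 0 ? 1u : 2u;
    unsigned n = (instr >> 16) & 0xfu;
    unsigned d4 = d + 3 * inc;
    return n == 15 || d4 > 31;
}
```

For f940 a159 this gives d = 26 and d4 = 32, exactly where capstone wraps around to d0; for ffbf fa8d, n = 31 with a 3-register table.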

*** TODO OP_rbit with the two Rm not matching not checked by capstone

0x000b23d4:  fa93 f0af   rbit   r0, pc 
    0x000b23d4:   fa93 f0af  <INVALID>

if d == 15 || m == 15 then UNPREDICTABLE; // ARMv8-A removes UNPREDICTABLE for R13
if !Consistent(Rm) then UNPREDICTABLE;
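In the Thumb RBIT encoding Rm appears twice (bits 3:0 of each halfword), and Consistent(Rm) means the two copies must match. A sketch (hypothetical helper) that follows the ARMv8-A note above and does not treat R13 as UNPREDICTABLE:

```c
#include <stdbool.h>
#include <stdint.h>

/* RBIT (T1): 111110101001 Rm | 1111 Rd 1010 Rm */
bool thumb2_rbit_unpredictable(uint16_t hw1, uint16_t hw2)
{
    unsigned m1 = hw1 & 0xfu;       /* Rm copy in the first halfword */
    unsigned m2 = hw2 & 0xfu;       /* Rm copy in the second halfword */
    unsigned d = (hw2 >> 8) & 0xfu; /* Rd */
    if (m1 != m2)                   /* !Consistent(Rm) */
        return true;
    return d == 15 || m1 == 15;
}
```

For fa93 f0af the two Rm copies are r3 and r15, so it fails the consistency check before the pc check even matters.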

*** TODO OP_ldrsb pc with writeback not considered invalid by capstone

0x00051c32:  f91c fb7b   ldrsb  pc, [r12], #0x7b 
    0x00051c32:   f91c fb7b  <INVALID>

PUW=011
if (t == 15 && W == `1') || (wback && n == t) then UNPREDICTABLE;
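A sketch of the quoted condition for the Thumb LDRSB immediate form with P:U:W bits, assuming Rn is in bits 3:0 of the first halfword and the second halfword is Rt:1:P:U:W:imm8. The helper is hypothetical and implements only the quoted condition:

```c
#include <stdbool.h>
#include <stdint.h>

bool thumb2_ldrsb_imm8_unpredictable(uint16_t hw1, uint16_t hw2)
{
    unsigned n = hw1 & 0xfu;                /* Rn */
    unsigned t = (hw2 >> 12) & 0xfu;        /* Rt */
    bool wback = ((hw2 >> 8) & 1u) != 0;    /* W, bit 8 of hw2 */
    return (t == 15 && wback) || (wback && n == t);
}
```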

@derekbruening
Contributor Author

More:

*** DONE OP_mrs with "1" as 0 not considered invalid by capstone
    CLOSED: [2015-04-16 Thu 13:06]
    - State "DONE"       from "TODO"       [2015-04-16 Thu 13:06]

0x0001d618:  81030c01   mrshi   r0, apsr 
    +0x1d618:   81030c01   <INVALID>

# echo '0x01 0x0c 0x03 0x81' | /usr/bin/llvm-mc -arch arm -mcpu=cortex-a15 --disassemble  
<stdin>:1:1: warning: potentially undefined instruction encoding
0x01 0x0c 0x03 0x81
^
        mrshi   r0, apsr

Bit 21 has to be 1: there are no parens.

*** DONE OP_vmsr with "(0)" as 1 not considered invalid by capstone
    CLOSED: [2015-04-16 Thu 13:10]
    - State "DONE"       from "TODO"       [2015-04-16 Thu 13:10]

0x00093f0c:  8eea0a57   vmsrhi  fpinst2, r0 
    +0x93f0c:   8eea0a57   <INVALID>

7:0 is supposed to be all "(0)" except bit 4 which is a solid 1.
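A sketch separating the two kinds of bits in that low byte, assuming the layout described (bits 7:5 and 3:0 "(0)", bit 4 a fixed 1): the fixed bit can live in the ordinary opcode mask, while the "(0)" bits would need a separate check. Helper names are hypothetical:

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit 4 is a fixed 1 and can live in the ordinary opcode mask. */
bool vmsr_fixed_bit_ok(uint32_t instr)
{
    return (instr & 0x10u) != 0;
}

/* Bits 7:5 and 3:0 are "(0)": mask 0xef over the low byte. */
bool vmsr_reserved_bits_clear(uint32_t instr)
{
    return (instr & 0xefu) == 0;
}
```

For 8eea0a57 the fixed bit is set but the reserved bits are not clear (0x57 & 0xef == 0x47).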

@derekbruening
Contributor Author

xref: "unp" == DECODE_UNPREDICTABLE, which we are ignoring today.

@derekbruening
Contributor Author

Summarizing what apparently isn't clear up above: the plan is to stick with the current approach, which uses sub-tables to enforce a required "0" bit, uses the stored opcode mask to enforce a required "1" or a reserved "(1)" bit, and does not enforce a reserved "(0)" bit. This issue covers possibly adding a second mask for "(0)" but it is low priority as we have yet to see one that raises SIGILL.
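A minimal sketch of what the possible second mask could look like. The struct and function are hypothetical, not DR's real decode table format, and for brevity the fixed "0" bits are folded into the mask here rather than enforced through sub-tables as described above:

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    const char *name;
    uint32_t mask;     /* fixed bits plus reserved "(1)" bits */
    uint32_t opcode;   /* required values under mask */
    uint32_t sbz_mask; /* proposed second mask: reserved "(0)" bits */
} decode_entry_t;

/* Two entries based on examples earlier in this issue. */
const decode_entry_t example_entries[] = {
    /* Thumb blx Rm: 010001111 Rm (0)(0)(0) */
    { "blx", 0x0000ff80u, 0x00004780u, 0x00000007u },
    /* A32 sadd16: cond 01100001 Rn Rd (1111) 0001 Rm */
    { "sadd16", 0x0ff00ff0u, 0x06100f10u, 0x00000000u },
};

/* Current approach: only mask/opcode are checked.  With strict set, a
 * reserved "(0)" bit that is 1 also rejects the encoding. */
bool entry_matches(const decode_entry_t *e, uint32_t instr, bool strict)
{
    if ((instr & e->mask) != e->opcode)
        return false;
    if (strict && (instr & e->sbz_mask) != 0)
        return false;
    return true;
}
```

With strict off, 4795 still decodes as blx (today's behavior); with strict on, it would be rejected, while 2610ba19 fails the sadd16 mask either way.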

@egrimley
Contributor

It is not clear to me what the plan is here. If the decoder sees an instruction that is formally UNPREDICTABLE, but likely to read and write certain registers, then it has to decode it in a way that will allow it to be correctly mangled. So how does it do that?

@derekbruening
Copy link
Contributor Author

I don't understand the question. These encodings have clear operands and are straightforward to decode and mangle.
