This repository has been archived by the owner on Apr 28, 2023. It is now read-only.

IRSB decode error for self-modifying code? #26

Closed
patrafter1999 opened this issue Jan 8, 2016 · 5 comments


@patrafter1999

Hi Guys,

I am pretty new to angr. I think it's really cool. I wrote some basic code to test a piece of shellcode. The source is as follows:

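import angr

# Address of the jump into the decoded block; the script should stop here.
bp = 0x401010

def check(path):
    # True when the path's concrete instruction pointer equals bp.
    return path.state.ip.args[0] == bp

b = angr.Project('shellcode.exe')
state = b.factory.entry_state()
pg = b.factory.path_group(state)

pg.explore(find=check)

# Report the count before indexing, since pg.found may be empty.
print(len(pg.found))
found = pg.found[0] if pg.found else None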

The shellcode disassembly looks like this:

.text:00401000 start           proc near
.text:00401000                 jmp     short loc_401012
.text:00401000 start           endp
.text:00401000
.text:00401002
.text:00401002 ; =============== S U B R O U T I N E =======================================
.text:00401002
.text:00401002 ; Attributes: noreturn
.text:00401002
.text:00401002 sub_401002      proc near               ; CODE XREF: sub_401002:loc_401012↓p
.text:00401002                 pop     ebx
.text:00401003                 dec     ebx
.text:00401004                 xor     ecx, ecx
.text:00401006                 mov     cx, 296h
.text:0040100A
.text:0040100A loc_40100A:                             ; CODE XREF: sub_401002+C↓j
.text:0040100A                 xor     byte ptr [ebx+ecx], 9Ch
.text:0040100E                 loop    loc_40100A
.text:00401010                 jmp     short loc_401017
.text:00401012 ; ---------------------------------------------------------------------------
.text:00401012
.text:00401012 loc_401012:                             ; CODE XREF: start↑j
.text:00401012                 call    sub_401002
.text:00401017 ; ---------------------------------------------------------------------------
.text:00401017
.text:00401017 loc_401017:                             ; CODE XREF: sub_401002+E↑j
.text:00401017                 pop     ds
.text:00401018                 js      short loc_401086
.text:0040101A                 lodsd
.text:0040101B                 push    ebp
.text:0040101C
.text:0040101C loc_40101C:                             ; CODE XREF: sub_401002+5B↓j
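
For reference, the decoder loop in the listing above can be mirrored in plain Python. This is only a sketch: it assumes the `call` at 0x401012 pushes 0x401017 as the return address, and the names `decoded_range`, `xor_decode`, and the flat `bytearray` memory image are made up for illustration. Nothing here uses angr.

```python
KEY = 0x9C            # immediate from `xor byte ptr [ebx+ecx], 9Ch`
COUNT = 0x296         # value loaded by `mov cx, 296h`
RET_ADDR = 0x401017   # assumed return address pushed by `call sub_401002`


def decoded_range(ret_addr=RET_ADDR, count=COUNT):
    """Return the (first, last) addresses touched by the decoder loop."""
    ebx = ret_addr - 1            # pop ebx ; dec ebx
    return ebx + 1, ebx + count   # ecx runs from count down to 1


def xor_decode(memory, base, ret_addr=RET_ADDR, count=COUNT, key=KEY):
    """XOR the encoded bytes in place.

    `memory` is a hypothetical bytearray image whose first byte
    corresponds to address `base`.
    """
    ebx = ret_addr - 1
    ecx = count
    while ecx != 0:
        memory[ebx + ecx - base] ^= key   # xor byte ptr [ebx+ecx], 9Ch
        ecx -= 1                          # loop: decrement ecx, repeat if nonzero
    return memory


if __name__ == "__main__":
    first, last = decoded_range()
    print(hex(first), hex(last))
```

Under these assumptions the loop rewrites 0x296 bytes starting at 0x401017, so both addresses reported in the errored paths (0x4010f8 and 0x401098) fall inside the encoded region.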

The shellcode XORs the obfuscated block of code starting at 0x401017 with 0x9C. My test script should stop at the jump at 0x401010, right before control enters the deobfuscated code, so that I can inspect it. Instead, I got the following errored paths.

>> pg.errored
[<Errored Path with 667 runs (at 0x4010f8, AngrExitError)>, <Errored Path with 667 runs (at 0x401098, AngrExitError)>]

Since there are only a couple of direct jumps before 0x401010, angr shouldn't attempt to parse the obfuscated block (which looks like gibberish before deobfuscation). But that appears to be what angr is doing. I might be wrong. See more error details below.

>> pg.errored[0].error
AngrExitError('IR decoding error at 0x4010f8. You can hook this instruction with a python replacement using project.hook(0x4010f8, your_function, length=length_of_instruction).',)
>> pg.errored[1].error
AngrExitError('Cannot create run following jumpkind Ijk_SigTRAP',)

Please find the shellcode in the zip (pw: infected). Any comment will be greatly appreciated.

shellcode.exe.zip

@patrafter1999
Author

Also, do you guys have any plans to open a forum for sharing knowledge? I find it very difficult to follow the many different aspects of symbolic execution. It would also be great to share techniques among researchers.

Much appreciated,

@rhelmot
Member

rhelmot commented Jan 8, 2016

If you want angr to parse self-modifying code you need to initialize the project with support_selfmodifying_code=True.
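
A minimal sketch of that change, using the same `angr.Project` call as the script in the question. The helper names `project_options` and `load_project` are made up for illustration, and the keyword is spelled exactly as in the reply above; other angr releases may name it differently, so check the `Project` documentation for the version you have installed.

```python
def project_options():
    """Keyword arguments for angr.Project, as suggested in the reply above."""
    return {"support_selfmodifying_code": True}


def load_project(path="shellcode.exe"):
    """Load the binary with self-modifying-code support enabled."""
    # Imported here so project_options() can be inspected without angr installed.
    import angr
    return angr.Project(path, **project_options())
```

In the original script this would replace the line `b = angr.Project('shellcode.exe')` with `b = load_project('shellcode.exe')`.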

@zardus
Member

zardus commented Jan 8, 2016

On top of that, due to how angr works internally, your "check" function will only be called at the beginning of a basic block. The address you're looking for, 0x401010, isn't at the start of a basic block (according to VEX). You can see this by doing:

In [12]: project.factory.block(0x40100a).vex.pp()
IRSB {
   t0:Ity_I8 t1:Ity_I8 t2:Ity_I8 t3:Ity_I32 t4:Ity_I32 t5:Ity_I32 t6:Ity_I32 t7:Ity_I32 t8:Ity_I32 t9:Ity_I32 t10:Ity_I32 t11:Ity_I1 t12:Ity_I32 t13:Ity_I32

   00 | ------ IMark(0x40100a, 4, 0) ------
   01 | t6 = GET:I32(ecx)
   02 | t7 = GET:I32(ebx)
   03 | t4 = Add32(t7,t6)
   04 | t2 = LDle:I8(t4)
   05 | t0 = Xor8(t2,0x9c)
   06 | STle(t4) = t0
   07 | PUT(cc_op) = 0x0000000d
   08 | t8 = 8Uto32(t0)
   09 | PUT(cc_dep1) = t8
   10 | PUT(cc_dep2) = 0x00000000
   11 | PUT(cc_ndep) = 0x00000000
   12 | PUT(eip) = 0x0040100e
   13 | ------ IMark(0x40100e, 2, 0) ------
   14 | t9 = Sub32(t6,0x00000001)
   15 | PUT(ecx) = t9
   16 | t11 = CmpNE32(t9,0x00000000)
   17 | if (t11) { PUT(eip) = 0x40100a; Ijk_Boring }
   18 | ------ IMark(0x401010, 2, 0) ------
   NEXT: PUT(eip) = 0x00401017; Ijk_Boring
}

(if you want to learn more about VEX, check out https://github.com/angr/angr-doc/blob/master/ir.md)

There are two things you can do: break at 0x401017, which is the beginning of the basic block that the jump targets, or break at 0x40100a, which is the beginning of the basic block that contains 0x401010. Either way, the breakpoint, at least, should work.
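
To make the block-boundary point concrete, here is a small angr-free sketch that maps an instruction address to the start of the block containing it. The `(address, length)` pairs are copied from the `IMark` lines in the VEX dump above; the table `BLOCK_IMARKS` and the helpers `containing_block` and `is_block_start` are hypothetical names used only for this illustration.

```python
# IMark entries (address, length) from the VEX dump of the block at 0x40100a.
BLOCK_IMARKS = {
    0x40100A: [(0x40100A, 4), (0x40100E, 2), (0x401010, 2)],
}


def containing_block(addr, blocks=BLOCK_IMARKS):
    """Return the start address of the known block that covers addr, or None."""
    for start, imarks in blocks.items():
        for insn_addr, length in imarks:
            if insn_addr <= addr < insn_addr + length:
                return start
    return None


def is_block_start(addr, blocks=BLOCK_IMARKS):
    """True if addr is the first instruction of a known block."""
    return addr in blocks


if __name__ == "__main__":
    print(is_block_start(0x401010))
    print(hex(containing_block(0x401010)))
```

With this table, 0x401010 is covered by the block starting at 0x40100a but is not itself a block start, which is why a `find` callback comparing against 0x401010 never matches. Setting `bp = 0x401017` in the original script is probably the more useful of the two options, since 0x40100a is also the loop head and a check there would presumably fire before decoding has finished.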

If you really need to break at that exact instruction, SimuVEX breakpoints are more granular, and let you break at specific instructions or whenever certain conditions are met (e.g., some specific address being written to). You can read more about that at https://github.com/angr/angr-doc/blob/master/simuvex.md#breakpoints

@zardus
Member

zardus commented Jan 8, 2016

As for the forum, are you on #angr on freenode.net? That's the closest thing that we have at the moment...

@patrafter1999
Author

Thanks heaps. I'm on freenode.net now. I will ask questions there from now on. salls already helped me with a couple of things. Knowing that the find callback is invoked at the basic-block level helps!

I'm trying to do some taint analysis, aiming to identify the decryptor code and the encrypted block that it decrypts. salls advised me to use 'TRACK_ACTION_HISTORY' to record all taint info.

Thanks!

@zardus zardus closed this as completed Jun 4, 2016