Results of derived constraints #15

mrphrazer · 2015-11-05T06:12:39Z

Hi,

I assume x86_64 architecture and code similar to this one:

int bar(int a, int b)
{
  <arithmetic operations in dependence on a and b>
 if (d == 100) // d depends on a and b
   return 1;

return 0;
}

My goal is to retrieve the inputs for a and b, stored in RDI and RSI such that 1 will be returned.

My code for that looks similar to

import angr
b = angr.Project(<binary>)
s = b.factory.blank_state(addr = <function start>)

pg = b.factory.path_group(s, immutable=False)
pg.explore(find=<address of return 1>)

s = pg.found[0].state

print s.se.any_int(s.regs.rdi)
print s.se.any_int(s.regs.rsi)

The returned values of RDI and RSI do not work, in the most cases. Therefore, should this approach work similary as described above or do I misunderstand something?

The text was updated successfully, but these errors were encountered:

ltfish · 2015-11-05T07:10:55Z

s = pg.found[0].state

print s.se.any_int(s.regs.rdi)
print s.se.any_int(s.regs.rsi)

Here s is the final state, not the initial state at the entrance of this function. Values in RDI and RSI might not be the same as those in initial state since RDI and RSI might be used during the (symbolic) execution of this function. You want to call .regs.rdi and .regs.rsi on your initial state s, which is overwritten later by statement s = pg.found[0].state.

mrphrazer · 2015-11-05T17:53:03Z

Hi,

thanks for the quick response! Ok, lets change it to

initial = b.factory.blank_state(addr=start_address)
[...]
found_state = pg.found[0].state

If I get you right, then initial.se.any_int(found_state.regs.rdi) should work? Well, neither this nor, the other way round, found_state.se.any_int(initial.regs.rdi) work as expected.

Perhaps I misunderstand the logic, but I expect to derive a set of constraints that depend on the input.

For instance, assume the following pseucode:

x = 2 *  RDI_init;
y = x + 7
RDI = 5 + y

if (RDI == 100) goto A else B
[...]

Assume I want to reach C. Then, I expect the constraints to look as 2 * RDI_init + 7 + 5 == 100 for the first block. In the end, if the set of path constraints is satisfiable, RDI_init can be derived.

Or do I have to declare RDI explicitly as symbolic or keep track of the changes otherwise to retrieve a concrete input?

Thanks in advance! :)

ltfish · 2015-11-05T18:07:54Z

I think found_state.se.any_int(initial_regs.rdi) should just work. It will give you one solution from all possible solutions.

mrphrazer · 2015-11-05T20:54:21Z

found_state.se.any_int(initial_regs.rdi) and similar expressions return 0, every time. The only combination that works is using found_state registers for found_state.se. But as you stated, this won't work in the most (of my) test cases.

For instance, this C code and the corresponding assembly code. Then, applying the script below, the initial regs will be zero. Those values don't satisfy the path to the return 1.

import angr
b = angr.Project("./test")
initial_state = b.factory.blank_state(addr=0x400546)
pg = b.factory.path_group(initial_state, immutable=False)
pg.explore(find=0x400587)
found_state = pg.found[0].state

# does not work
print found_state.se.any_int(initial_state.regs.rdi)
print found_state.se.any_int(initial_state.regs.rsi)

# works for this case
print found_state.se.any_int(found_state.regs.rdi)
print found_state.se.any_int(found_state.regs.rsi)

zardus · 2015-11-05T21:58:54Z

I think the confusion stems from the fact that state.regs.rdi gets you
the value at rdi, not some semantic representation of rdi itself. So, in
your example, your initial state's rdi has nothing at all to do with the
final result of rdi. angr is telling you that the initial value of rdi can
be anything, which is true since that initial value is clobbered at
0x40054a.

Yan
On Nov 5, 2015 12:54 PM, "mrphrazer" notifications@github.com wrote:

found_state.se.any_int(initial_regs.rdi) and similar expressions return
0, every time. The only combination that works is using found_state
registers for found_state.se. But as you stated, this won't work in the
most (of my) test cases.

For instance, this
https://gist.github.com/mrphrazer/07fc64397e2614d88be3 C code and the
corresponding assembly code
https://gist.github.com/mrphrazer/b18dc1dd990d89dd4e19. Then, applying
the script below, the initial regs will be zero. Those values don't satisfy
the path to the return 1.

import angr
b = angr.Project("./test")
initial_state = b.factory.blank_state(addr=0x400546)
pg = b.factory.path_group(initial_state, immutable=False)
pg.explore(find=0x400587)
found_state = pg.found[0].state

does not workprint found_state.se.any_int(initial_state.regs.rdi)print found_state.se.any_int(initial_state.regs.rsi)

works for this caseprint found_state.se.any_int(found_state.regs.rdi)print found_state.se.any_int(found_state.regs.rsi)

—
Reply to this email directly or view it on GitHub
#15 (comment).

zardus · 2015-11-05T21:59:39Z

And I just noticed that I basically repeated what Fish said :-)
On Nov 5, 2015 1:58 PM, "Yan" zardus@gmail.com wrote:

I think the confusion stems from the fact that state.regs.rdi gets you
the value at rdi, not some semantic representation of rdi itself. So, in
your example, your initial state's rdi has nothing at all to do with the
final result of rdi. angr is telling you that the initial value of rdi can
be anything, which is true since that initial value is clobbered at
0x40054a.

Yan
On Nov 5, 2015 12:54 PM, "mrphrazer" notifications@github.com wrote:

found_state.se.any_int(initial_regs.rdi) and similar expressions return
0, every time. The only combination that works is using found_state
registers for found_state.se. But as you stated, this won't work in the
most (of my) test cases.

For instance, this
https://gist.github.com/mrphrazer/07fc64397e2614d88be3 C code and the
corresponding assembly code
https://gist.github.com/mrphrazer/b18dc1dd990d89dd4e19. Then, applying
the script below, the initial regs will be zero. Those values don't satisfy
the path to the return 1.

import angr
b = angr.Project("./test")
initial_state = b.factory.blank_state(addr=0x400546)
pg = b.factory.path_group(initial_state, immutable=False)
pg.explore(find=0x400587)
found_state = pg.found[0].state

does not workprint found_state.se.any_int(initial_state.regs.rdi)print found_state.se.any_int(initial_state.regs.rsi)

works for this caseprint found_state.se.any_int(found_state.regs.rdi)print found_state.se.any_int(found_state.regs.rsi)

—
Reply to this email directly or view it on GitHub
#15 (comment).

ltfish · 2015-11-05T22:53:34Z

@mrphrazer Can I get a copy of the binary that you are working on? I'll write an angr script for you.

@zardus When are you coming?

mrphrazer · 2015-11-05T23:15:23Z

Oh awesome guys, thanks for your help! Indeed, I expected different behaviour which caused the confusion. Since the path is satisfiable, there exist valid inputs. I should restate the questions how to parse those exactly.

So, seeing a working example will be help for sure. Here is the binary. Thanks, again! :)

ltfish · 2015-11-06T00:03:34Z

Here it is:

import angr

num_0 = None
num_1 = None

def hook_atoi(state):
    if state.se.is_true(state.ip == 0x4005b2):
        state.regs.rax = state.se.ZeroExt(32, num_0)
    elif state.se.is_true(state.ip == 0x4005c8):
        state.regs.rax = state.se.ZeroExt(32, num_1)
    else:
        # Should never get here
        raise Exception('Oops')

def main():
    p = angr.Project("test_binary", load_options={'auto_load_libs': False})
    s = p.factory.blank_state(addr=0x400595)

    global num_0, num_1

    num_0 = s.se.BVS('num_0', 32)
    num_1 = s.se.BVS('num_1', 32)

    # Set two arguments by hooking atoi and returning the corresponding variable
    p.hook(0x4005b2, hook_atoi, length=5)
    p.hook(0x4005c8, hook_atoi, length=5)

    pg = p.factory.path_group(s, immutable=False)
    pg.explore(find=0x4005ed)

    # Now we should have one path found
    assert len(pg.found) == 1
    # There are many solutions... we take an arbitrary one from them, and 
    # further constrain num_0 to be that single solution
    num_0_ints = pg.found[0].state.se.any_n_int(num_0, 20)
    print "num_0 has more than %d solutions" % len(num_0_ints)
    num_0_int = num_0_ints[0]

    # With out constraining, there are many possible solutions to num_1
    num_1_ints = pg.found[0].state.se.any_n_int(num_1, 20)
    print "num_1 has more than %d solutions" % len(num_1_ints)
    # Here is how you can further constrain it
    copied_state = pg.found[0].state.copy()
    copied_state.add_constraints(num_0 == num_0_int)
    num_1_ints = copied_state.se.any_n_int(num_1, 20)
    print "After constraining variable num_0, there are %d solutions to num_1" % len(num_1_ints)
    print "Try running ./test_binary %d %d!" % (num_0_int, num_1_ints[0])

if __name__ == "__main__":
    main()

And there is the output on my machine:

num_0 has more than 20 solutions
num_1 has more than 20 solutions
After constraining variable num_0, there are 1 solutions to num_1
Try running ./test_binary 1 2306412234!

Some words:

Because I'm running an internal branch of angr, you might want to replace s.se.BVS with s.se.BV to make this script work on the current public release version of angr. In the next release, it will be s.se.BVS.
Our support to atoi is very limited for now. There is a better version in development by @salls, but it didn't work with this example (I don't know why). We'll look into it further. Generally, supporting strings is difficult :-( If you are just interested in having a toy example to run, just set those variables by hooking atoi functions should suffice.
Everything else is pretty straightforward and well-commented. Let me know if anything is not clear for you.

mrphrazer · 2015-11-06T11:37:44Z

@ltfish Really nice, thank you very much! I got the idea and were able to apply this successfully to the other test cases :)

mrphrazer closed this as completed Nov 6, 2015

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Results of derived constraints #15

Results of derived constraints #15

mrphrazer commented Nov 5, 2015

ltfish commented Nov 5, 2015

mrphrazer commented Nov 5, 2015

ltfish commented Nov 5, 2015

mrphrazer commented Nov 5, 2015

zardus commented Nov 5, 2015

does not workprint found_state.se.any_int(initial_state.regs.rdi)print found_state.se.any_int(initial_state.regs.rsi)

works for this caseprint found_state.se.any_int(found_state.regs.rdi)print found_state.se.any_int(found_state.regs.rsi)

zardus commented Nov 5, 2015

does not workprint found_state.se.any_int(initial_state.regs.rdi)print found_state.se.any_int(initial_state.regs.rsi)

works for this caseprint found_state.se.any_int(found_state.regs.rdi)print found_state.se.any_int(found_state.regs.rsi)

ltfish commented Nov 5, 2015

mrphrazer commented Nov 5, 2015

ltfish commented Nov 6, 2015

mrphrazer commented Nov 6, 2015

Results of derived constraints #15

Results of derived constraints #15

Comments

mrphrazer commented Nov 5, 2015

ltfish commented Nov 5, 2015

mrphrazer commented Nov 5, 2015

ltfish commented Nov 5, 2015

mrphrazer commented Nov 5, 2015

zardus commented Nov 5, 2015

does not workprint found_state.se.any_int(initial_state.regs.rdi)print found_state.se.any_int(initial_state.regs.rsi)

works for this caseprint found_state.se.any_int(found_state.regs.rdi)print found_state.se.any_int(found_state.regs.rsi)

zardus commented Nov 5, 2015

does not workprint found_state.se.any_int(initial_state.regs.rdi)print found_state.se.any_int(initial_state.regs.rsi)

works for this caseprint found_state.se.any_int(found_state.regs.rdi)print found_state.se.any_int(found_state.regs.rsi)

ltfish commented Nov 5, 2015

mrphrazer commented Nov 5, 2015

ltfish commented Nov 6, 2015

mrphrazer commented Nov 6, 2015