Dumping BIL of raw x86-code binary data doesn't work #801

Closed
d0c-s4vage opened this issue Mar 22, 2018 · 2 comments · Fixed by #807

@d0c-s4vage

Either I'm profoundly misunderstanding how to use bap, or it's not working as the documentation describes:

cat <<-EOF > /tmp/test.asm
xor eax,eax
inc eax
mov ecx,eax
EOF
nasm /tmp/test.asm -o /tmp/test
bap /tmp/test -d bil --source-type x86-code --verbose

The final bap command above exits without printing the BIL for the machine code assembled with nasm.

To verify that /tmp/test.asm was assembled correctly with nasm:

bap@a21bc7ec5783:~$ objdump -D -b binary -m i386 /tmp/test

/tmp/test:     file format binary


Disassembly of section .data:

00000000 <.data>:
   0:   66 31 c0                xor    %ax,%ax
   3:   66 40                   inc    %ax
   5:   66 89 c1                mov    %ax,%cx
@ivg
Member

ivg commented Mar 22, 2018

The problem is that bap doesn't find any functions in that binary (because there are none). The intended use case of bap is analyzing real programs, which usually have functions.

If you would like to disassemble small chunks of code, you can use bap-mc, e.g.,

$ echo 66 31 c0 66 40 66 89 c1 | bap-mc --arch=x86 --show-insn=asm
xorw %ax, %ax
incw %ax
movw %ax, %cx

or, for BIL

$ echo 66 31 c0 66 40 66 89 c1 | bap-mc --arch=x86 --show-bil
{
  EAX := extract:31:16[EAX].0
  AF := unknown[AF is undefined after xor]:u1
  ZF := 1
  PF := 1
  OF := 0
  CF := 0
  SF := 0
}
{
  v1 := low:16[low:32[EAX]]
  EAX := extract:31:16[EAX].low:16[low:32[EAX]] + 1
  OF := ~high:1[v1] & (high:1[v1] ^ high:1[low:16[low:32[EAX]]])
  AF := 0x10 = (0x10 & (low:16[low:32[EAX]] ^ v1 ^ 1))
  PF := ~low:1[let v2 = low:16[low:32[EAX]] >> 4 ^ low:16[low:32[EAX]] in
    let v2 = v2 >> 2 ^ v2 in
    v2 >> 1 ^ v2]
  SF := high:1[low:16[low:32[EAX]]]
  ZF := 0 = low:16[low:32[EAX]]
}
{
  ECX := high:16[ECX].low:16[EAX]
}
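If the bytes live in a raw file such as /tmp/test, they can be turned into the single-line, space-separated hex format that the bap-mc examples above read on stdin. A minimal sketch, assuming a POSIX `od` and `awk`; `hexbytes` is a hypothetical helper name, not part of bap:

```shell
#!/bin/sh
# Hypothetical helper, not part of bap: print every byte of a file as
# lowercase hex pairs separated by single spaces on one line, e.g.
# "66 31 c0 66 40 66 89 c1", the same shape as the bap-mc input above.
hexbytes() {
  # -An: omit the offset column; -tx1: one hex byte per field;
  # -v: print repeated lines instead of collapsing them into "*".
  od -An -tx1 -v "$1" |
    awk '{ for (i = 1; i <= NF; i++) printf "%s%s", (n++ ? " " : ""), $i }
         END { print "" }'
}

# Usage (assumes bap-mc is on PATH):
#   hexbytes /tmp/test | bap-mc --arch=x86 --show-bil
```

The `-v` flag matters for real code, which often contains runs of identical bytes (for example, zero padding) that `od` would otherwise abbreviate as `*`.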

If you really need to use bap on such a file, I would suggest providing information about function starts manually, e.g.,

$ cat > start << EOF
(_start 0 1)
EOF

$ bap ./test -d --source-type=x86-code --read-symbols-from start 
00000013: program
00000012: sub _start()
00000002: 
00000003: EAX := extract:31:16[EAX].0
0000000b: EAX := extract:31:16[EAX].low:16[low:32[EAX]] + 1
00000011: ECX := high:16[ECX].low:16[EAX]

In any case, our disassembler used to default to the first available byte if no function starts were provided; judging from this issue, that is no longer true. I believe it is a sane default for such corner cases. @gitoleg, can you please restore this behavior?

@d0c-s4vage
Author

Thank you for the reply and examples! That makes sense now.

gitoleg added a commit to gitoleg/bap that referenced this issue Mar 28, 2018
fix BinaryAnalysisPlatform#801

This PR fixes a problem with pure code: since there are
no functions in it, bap just doesn't output anything.

What we do here is not only about such files, but about
reconstruction in general: the reconstructor now treats
every block that has no incoming edges as a function
start. This makes sense, e.g., in the case of libraries,
where there is no guarantee that every function will be
called by some other function from the same library.
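The rule described in this commit message can be illustrated with a toy model. This is a hypothetical sketch, not BAP's reconstructor (which is written in OCaml): it assumes the CFG is given as a file of block addresses plus a file of `src dst` edge lines, and prints every block with no incoming edge as a function start. In this model, the pure-code file from the issue is a single straight-line block at offset 0 with no predecessors, so it becomes a function start.

```shell
#!/bin/sh
# Hypothetical toy model, not BAP's actual (OCaml) reconstructor.
# find_roots BLOCKS EDGES
#   BLOCKS: one block address per line
#   EDGES:  one "src dst" control-flow edge per line
# Prints, in input order, every block that is not the destination of
# any edge; such blocks are treated as function starts.
find_roots() {
  awk '
    FILENAME == ARGV[1] { has_pred[$2] = 1; next }  # pass 1: edge file
    NF && !($1 in has_pred) { print $1 }            # pass 2: block file
  ' "$2" "$1"
}

# Demo: 0x20 models a library function that nothing in the library calls.
blocks=$(mktemp)
edges=$(mktemp)
printf '%s\n' 0x0 0x8 0x10 0x20 > "$blocks"
printf '%s\n' '0x0 0x8' '0x0 0x10' '0x8 0x10' > "$edges"
find_roots "$blocks" "$edges"   # prints 0x0 and 0x20
rm -f "$blocks" "$edges"
```

Reading the edge file first lets a single `awk` pass over the blocks decide membership with a hash lookup, so the check is linear in the number of blocks and edges.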
gitoleg added a commit to gitoleg/bap that referenced this issue Mar 28, 2018
fixes BinaryAnalysisPlatform#801

This PR fixes a problem with disassembling pure code:
since there are no functions in it, bap just doesn't
output anything.

What we do here is not only about such files, but about
reconstruction in general: the reconstructor now treats
every block that has no incoming edges as a function
start. This makes sense, e.g., in the case of libraries:
there is no guarantee that every function will be
called by some other function from the same library.
ivg closed this as completed in #807 Mar 29, 2018
ivg pushed a commit that referenced this issue Mar 29, 2018
fixes #801

3 participants