Support for decompilation of 64-bit x86 files? #9

Open
Manouchehri opened this Issue Dec 13, 2017 · 9 comments

Contributor

Manouchehri commented Dec 13, 2017

Since this question is going to get asked sooner or later, might as well ask it now: what needs to be implemented for 64-bit support?

llvmir2hll was able to decompile a simple hello world LLVM IR file compiled on 64-bit Linux and macOS hosts. I didn't test anything complex or McSema (which someone should definitely try and let us know!).

Is bin2llvmir the main roadblock?

PeterMatula Dec 13, 2017

Member

Hi,
I actually already hacked my local RetDec and tried a full x86-64 decompilation of a simple ack.c program (the one from retdec-regression-tests). It passed through the whole chain all the way to C, and the output even looked somewhat OK-ish. The biggest problem was that x86-64 uses different calling conventions, which we currently do not handle at all. Also, even if this were not an issue, we still would not add the support without at least a few regression tests.

When adding a new architecture, most of the work needs to be done in capstone2llvmir. This library already supports 64-bit for x86, mips, and ppc. However, other changes, like in this case, might be required to get some reasonable results.

So, what needs to be done to enable x86-64:

  1. Disable RetDec's defences against unsupported formats -- currently x86-64 won't even get through the scripts.
  2. Create an x86 decoder set to 64-bit mode in the decoder.
  3. Add support for x86-64 calling conventions.
  4. Test it a bit. If other major problems occur, modify other parts.
  5. Write some regression tests.
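
Step 2 matters because the same bytes decode differently depending on the decoder mode. The following is a toy sketch, not RetDec or Capstone code: the `decode` function and its register tables are hypothetical, and it handles only one-byte INC/DEC and register-to-register MOV (opcode 0x89). It shows that bytes 0x40 to 0x4F are INC/DEC instructions in 32-bit mode but REX prefixes in 64-bit mode.

```python
# Toy decoder for a tiny x86 subset, for illustration only (hypothetical helper).
REGS32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]
REGS64 = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"]


def decode(code: bytes, bits: int) -> list[str]:
    """Decode `code` in 32- or 64-bit mode; supports INC/DEC r32 and MOV r/m, r (mod=3)."""
    out = []
    i = 0
    while i < len(code):
        byte = code[i]
        rex_w = False
        if 0x40 <= byte <= 0x4F:
            if bits == 64:
                # In 64-bit mode these bytes are REX prefixes; bit 3 (W) selects
                # a 64-bit operand size. R/X/B (r8-r15) are not handled here.
                if byte & 0x07:
                    raise ValueError("REX.R/X/B not supported in this sketch")
                rex_w = bool(byte & 0x08)
                i += 1
                byte = code[i]
            else:
                # In 32-bit mode the same bytes are one-byte INC/DEC r32.
                op = "inc" if byte < 0x48 else "dec"
                out.append(f"{op} {REGS32[byte & 7]}")
                i += 1
                continue
        if byte != 0x89:
            raise ValueError(f"unsupported opcode {byte:#x}")
        modrm = code[i + 1]
        if modrm >> 6 != 3:
            raise ValueError("only register-to-register MOV is supported")
        regs = REGS64 if rex_w else REGS32
        src = regs[(modrm >> 3) & 7]  # ModRM.reg is the source for opcode 0x89
        dst = regs[modrm & 7]         # ModRM.rm is the destination
        out.append(f"mov {dst}, {src}")
        i += 2
    return out


prologue = bytes.fromhex("4889e5")
print(decode(prologue, 64))  # ['mov rbp, rsp']
print(decode(prologue, 32))  # ['dec eax', 'mov ebp, esp']
```

A decoder left in 32-bit mode would therefore silently produce plausible but wrong instructions for 64-bit code rather than fail outright.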

We would like to get to it pretty soon, but before that, some refactoring is in order:

  1. Refactor capstone2llvmir -- v1.0 is kind of a prototype.
  2. Refactor the decoder pass in bin2llvmir and merge it with the control-flow pass -- the decoder pass is a relic that predates capstone2llvmir and was simply forced to use it. It does not make sense to have these two parts separated. Once this is done, all decompilation results should be much better and adding x86-64 much easier.

@s3rvac s3rvac changed the title from 64-bit Support? to Support for decompilation of 64-bit files? Dec 13, 2017

@s3rvac s3rvac added the enhancement label Dec 14, 2017

@breznak breznak referenced this issue Dec 14, 2017

Closed

Added Dockerfile #3

@PeterMatula PeterMatula self-assigned this Dec 14, 2017

@PeterMatula PeterMatula added new-feature and removed enhancement labels Dec 19, 2017

PeterMatula Jan 31, 2018

Member

I have been asked what changes I made in order to try x86-64 decompilation. Instead of listing them somewhere, I decided to create a branch where it is enabled. So if anyone wants to play with it, here you go. I was able to decompile the simplest hello world program (hello-x86_64.zip). You can see in the hello.c.frontend.dsm file that 64-bit instructions were indeed created. I did not try anything more complex. There is a good chance it would not work.

Keep in mind that everything I wrote above still holds. This does not mean RetDec supports x86-64, or anything like that. All I did was let these files go through the scripts into decompilation (commit). Much more work will be needed to properly support this.

Please do not report any issues related to this. But if you fix or improve something while playing with it, feel free to contribute.

Mcilie Apr 28, 2018

Guys, how's it coming along? When do you think it will be done?

PeterMatula Apr 30, 2018

Member

@Mcilie I'm spending more time than I thought on #116, so this has not really moved much.

bannsec Jul 8, 2018

+1 on this. Given that the majority of binaries are 64-bit now (especially Linux ELFs), not having 64-bit decompilation support is a major usability issue.

@PeterMatula PeterMatula changed the title from Support for decompilation of 64-bit files? to Support for decompilation of 64-bit x86 files? Sep 12, 2018

PeterMatula Sep 12, 2018

Member

I changed this issue to be specific to 64-bit x86 (x64), since this architecture was mainly discussed here, and it is better to have this separated from issues dealing with 64-bit support of other architectures (e.g. #268).

@PeterMatula PeterMatula added this to the x64 support milestone Sep 12, 2018

PeterMatula Sep 12, 2018

Member

This is being worked on by a student as his bachelor's thesis; see the milestone and the referenced forked repository.

jonahharris Sep 17, 2018

@PeterMatula I've updated the Python-based replacement of the shell script with similar changes and tested it. I haven't had any x86-64 issues with the decompiler (yet). As it's been a long time since your x86_64-enabled branch was updated, it was a huge merge with master plus the Python change, so I opted to create a separate branch, x86-64-support, from which you can pull the changes yourself.

PeterMatula Sep 17, 2018

Member

@jonahharris This will let x64 files go through, and RetDec will probably generate some output, but as I wrote above, the output is not very good. What the mentioned student is doing at the moment:

  • Adding ABI specifications for supported architectures and possible ABIs (including x64 ABIs).
  • Rewriting analysis of functions' parameters and returns to uniformly use these specifications.
  • Adding support for x64 in other architecture-specific parts like static code detection, main detection, etc.
  • Maybe even adding basic support for extended instruction sets. This is a topic for another bachelor's thesis, but XMM registers might be used to pass arguments on x64, so this is needed for this topic as well.
  • Adding tests for all this.
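
To illustrate what a uniform ABI specification might look like, here is a minimal sketch. The `Abi` class and `assign_args` function are hypothetical and are not RetDec's design. The register orders are those of the System V AMD64 and Microsoft x64 conventions, but only scalar integer/pointer and floating-point arguments are modelled (no structs, varargs, or stack layout).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Abi:
    """Hypothetical ABI description: which registers carry the first arguments."""
    name: str
    int_regs: tuple
    float_regs: tuple
    shared_slots: bool  # True: int and float arguments consume the same positions


SYSTEM_V = Abi(
    "System V AMD64",
    ("rdi", "rsi", "rdx", "rcx", "r8", "r9"),
    tuple(f"xmm{i}" for i in range(8)),
    shared_slots=False,
)
MS_X64 = Abi(
    "Microsoft x64",
    ("rcx", "rdx", "r8", "r9"),
    ("xmm0", "xmm1", "xmm2", "xmm3"),
    shared_slots=True,
)


def assign_args(abi: Abi, kinds: list[str]) -> list[str]:
    """Map each argument kind ("int" or "float") to a register name or "stack"."""
    locations = []
    next_int = next_float = 0
    for position, kind in enumerate(kinds):
        regs = abi.int_regs if kind == "int" else abi.float_regs
        if abi.shared_slots:
            index = position  # the slot is fixed by the argument's position
        else:
            index = next_int if kind == "int" else next_float
        locations.append(regs[index] if index < len(regs) else "stack")
        if kind == "int":
            next_int += 1
        else:
            next_float += 1
    return locations


# f(int, double, int) under both conventions:
print(assign_args(SYSTEM_V, ["int", "float", "int"]))  # ['rdi', 'xmm0', 'rsi']
print(assign_args(MS_X64, ["int", "float", "int"]))    # ['rcx', 'xmm1', 'r8']
```

Keeping the per-ABI differences in data lets the parameter analysis stay the same code path for every architecture, which is the point of the rewrite described above. It also shows why XMM support is a prerequisite: without it, the second argument in this example cannot be recovered under either convention.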

The thesis is due at the end of the summer semester of 2019 (May), but I hope we will merge parts of it much sooner and enable experimental support for x64.
