cmd/ll2dot: R7, Produce CFGs from LLVM IR using the new Go library #97

Closed
mewmew opened this issue Jan 6, 2015 · 12 comments

@mewmew
Member

mewmew commented Jan 6, 2015

The control flow graph generation tool is currently using the Go bindings for LLVM to generate CFGs from LLVM IR.

A pure Go library for interacting with LLVM IR will replace the Go bindings for LLVM in v0.2. This library is being developed at https://github.com/llir/llvm

@mewmew mewmew added the MUST label Jan 6, 2015
@mewmew mewmew self-assigned this Jan 6, 2015
@mewmew mewmew added this to the Meeting 13 milestone Jan 6, 2015
@mewmew
Member Author

mewmew commented Feb 26, 2015

Marked as future ambition, closing for now. See the following comment for further information.

@mewmew mewmew closed this as completed Feb 26, 2015
@mewmew mewmew modified the milestones: v0.2 (was Meeting 13) May 31, 2015
@mewmew mewmew changed the title from "requirement: R7, Produce CFGs from LLVM IR basic blocks" to "ll2dot: R7, Produce CFGs from LLVM IR basic blocks" May 31, 2015
@mewmew mewmew changed the title from "ll2dot: R7, Produce CFGs from LLVM IR basic blocks" to "ll2dot: R7, Produce CFGs from LLVM IR using the new Go library" May 31, 2015
@mewmew mewmew reopened this May 31, 2015
@mewmew
Member Author

mewmew commented May 25, 2016

This work is currently being tracked by the experimental ll2dot tool developed at https://github.com/decomp/exp/tree/master/cmd/ll2dot

@mewmew mewmew changed the title from "ll2dot: R7, Produce CFGs from LLVM IR using the new Go library" to "cmd/ll2dot: R7, Produce CFGs from LLVM IR using the new Go library" May 25, 2016
@mewmew
Member Author

mewmew commented Jan 22, 2017

As of commit decomp/exp@3cfd64c, the ll2dot tool of the exp repository has become capable of producing control flow graphs in Graphviz DOT file format from LLVM IR assembly input. Only a subset of LLVM IR assembly is currently supported, and full support for parsing LLVM IR assembly is tracked by issue llir/llvm#15.
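To make the idea of "control flow graphs in Graphviz DOT file format" concrete, here is a minimal sketch of the output side only. It is not the ll2dot implementation and does not use the llir/llvm API: the `Block` type and the `cfgToDOT` function are hypothetical names invented for this illustration, and the basic blocks are assumed to have already been extracted from the LLVM IR.

```go
package main

import (
	"fmt"
	"strings"
)

// Block is a hypothetical, simplified basic block: a label plus the labels of
// the blocks its terminator may branch to. It is not an llir/llvm type.
type Block struct {
	Name  string
	Succs []string
}

// cfgToDOT renders the control flow graph of one function as Graphviz DOT.
// Each basic block becomes a node and each possible branch becomes an edge.
func cfgToDOT(funcName string, blocks []Block) string {
	var sb strings.Builder
	fmt.Fprintf(&sb, "digraph %q {\n", funcName)
	// Declare every node first so that blocks without edges still appear.
	for _, b := range blocks {
		fmt.Fprintf(&sb, "\t%q;\n", b.Name)
	}
	// Emit one edge per successor, in terminator order.
	for _, b := range blocks {
		for _, s := range b.Succs {
			fmt.Fprintf(&sb, "\t%q -> %q;\n", b.Name, s)
		}
	}
	sb.WriteString("}\n")
	return sb.String()
}

func main() {
	// CFG of a function with an if/else that joins before returning.
	blocks := []Block{
		{Name: "entry", Succs: []string{"then", "else"}},
		{Name: "then", Succs: []string{"exit"}},
		{Name: "else", Succs: []string{"exit"}},
		{Name: "exit"},
	}
	fmt.Print(cfgToDOT("f", blocks))
}
```

The printed text can be piped to Graphviz (for example `dot -Tpng`) to render the graph.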

The ll2dot tool of the exp repository will be merged with the decomp repository within the next couple of months, after which there is only one tool remaining which makes use of the old bindings for LLVM IR; namely the ll2go tool. It will also be updated within the next couple of months, after which it should once again be fun to play with the decomp project as we've finally gotten rid of the C++ library dependency, which brought with it long compile times.

@mewmew
Member Author

mewmew commented Feb 8, 2017

Status update as of 2017-02-07: a preliminary version of the ll2go tool, which relies on the llir/llvm package, has been implemented in the exp repository. Once the llir/llvm implementation matures a little, the experimental versions of the ll2dot and ll2go tools will be cleaned up and merged with the decomp repository.

@mewmew
Member Author

mewmew commented Mar 18, 2017

The ll2dot tool has now reached maturity, and will be merged from the dev branch.

@mewmew mewmew closed this as completed Mar 18, 2017
@mewmew mewmew added this to Decompilation components in v0.1 Mar 23, 2017
@pfalcon

pfalcon commented Mar 24, 2017

Once the llir/llvm implementation matures a little

Btw, a week ago I learned about 2 other pretty high-profile projects which waved LLIR bye-bye in preference of their own IRs: pfalcon/ScratchABlock@bf784f1

@mewmew
Member Author

mewmew commented Mar 24, 2017

Hej @pfalcon!

Very interesting to read about B3, it could be a contender to LLVM IR for a decompilation pipeline. I know you've previously expressed an opinion against SSA-form as the process of translating into and out of it is rather involved.

Personally, I very much enjoy working at an SSA-form level of abstraction, as it enables very interesting information retrieval, such as recovering the individual scope of local variables that end up sharing the same register in the assembly representation of the CPU architecture. Simply tracing the data flow of SSA variables is enough to provide a clear separation between two independent local variables that have ended up sharing or reusing the same CPU register. Furthermore, it vastly simplifies the type analysis stage, as the decoupled register accesses may now be analysed individually.
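The register-sharing point can be illustrated with a small sketch, under strong simplifying assumptions: it handles a single basic block only (so no phi nodes or dominance frontiers are involved), and the `Instr` type and `renameSSA` function are hypothetical names made up for this example rather than anything from decomp or llir/llvm.

```go
package main

import "fmt"

// Instr is a hypothetical three-address instruction: Dst = op(Srcs...).
// Every operand is assumed to be a register name.
type Instr struct {
	Dst  string
	Srcs []string
}

// renameSSA gives every definition of a register a fresh version number and
// rewrites each use to refer to the most recent definition. It handles
// straight-line code only, so no phi nodes are needed.
func renameSSA(code []Instr) []Instr {
	version := map[string]int{} // latest version per register
	out := make([]Instr, 0, len(code))
	for _, in := range code {
		srcs := make([]string, len(in.Srcs))
		for i, s := range in.Srcs {
			// Registers read before any definition keep version 0 (inputs).
			srcs[i] = fmt.Sprintf("%s_%d", s, version[s])
		}
		version[in.Dst]++
		dst := fmt.Sprintf("%s_%d", in.Dst, version[in.Dst])
		out = append(out, Instr{Dst: dst, Srcs: srcs})
	}
	return out
}

func main() {
	// eax first holds one value, then is reused for an unrelated one.
	code := []Instr{
		{Dst: "eax", Srcs: []string{"ebx"}}, // first variable, derived from ebx
		{Dst: "ecx", Srcs: []string{"eax"}}, // use of the first variable
		{Dst: "eax", Srcs: []string{"edx"}}, // second variable reuses eax
		{Dst: "esi", Srcs: []string{"eax"}}, // use of the second variable
	}
	for _, in := range renameSSA(code) {
		fmt.Println(in.Dst, "=", in.Srcs)
	}
}
```

After renaming, the two definitions of `eax` become `eax_1` and `eax_2`, and each use names exactly one of them, so the two variables that shared a register can be analysed separately.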

Well, for now at least, those are ideas in my mind that I've discussed with my friend Daniel, and we both believe that SSA has a use in these situations. Future exploration will tell. However, type analysis is not planned until version 0.6 of the decompilation pipeline. Before that we will explore control flow analysis in v0.4, data flow analysis in v0.5. Version 0.2 is all about getting rid of the LLVM C++ dependency, and we are almost there! Once the dev branch is merged with the master branch, we are there! This should happen within the next two weeks or so. After that version 0.3 is focused on the general robustness of the decompilation pipeline, being able to handle corner cases without crashing :)

We will see how these preliminary plans change as development continues. In either case, playing with decompilation is tremendously fun and we rejoice in the experience!

I'm happy to see that you are making steady progress on ScratchABlock. What are the major stumbling blocks (if any) that you've come across in the last three months? I'd be very interested to know what issues you are or have been struggling with, as the easy parts are, well, easy.

@pfalcon

pfalcon commented Mar 24, 2017

opinion against SSA-form as the process of translating into and out of it is rather involved

Well, the opinion was this: because SSA a) is rather involved to translate into and out of and b) breaks the "one-to-one" (well, at least "direct") correspondence between the original code and the decompilation intermediate code, my preference is not to call SSA in until all other options are explored. And I already find b) less relevant, because even without SSA there are enough transformations which skew the representation and its correspondence to the original code. And I already use a poor man's SSA, treating input registers to functions as 0-subscripted regs, which then get assigned to unsubscripted regs, as that's the prerequisite for doing proper propagation, stack var rewriting and preserveds tracking.

Personally, I very much enjoy working on a SSA-form

Did you write all the SSA code yourself? Can you explain how it works to a 5-year-old (not the trivialities, but all the dominance frontier cases and the need to perform graph coloring (== register allocation) to convert out of it)? I have yet to meet a person who can do that. Otherwise, I too enjoy somebody else's SSA being crunched with somebody else's code on my CPU, but it has little to do with me actually...

What are the major stumbling blocks (if any) that you've come across in the last three months?

It's hard to bootstrap interprocedural analysis. It requires even more complex tools and even more complex use cases, no real-world cases work because they're too complex, etc.

@mewmew
Member Author

mewmew commented Mar 24, 2017

Did you write all the SSA code yourself?

No, indeed I have not. I reckon it will take me quite some time to figure out exactly how to. Hopefully it will turn out to be a fun learning adventure, as many other parts have been so far. For me personally, that's the main reason to keep playing with these kinds of projects: for fun. If it is fun, all else naturally follows.

It's hard to bootstrap interprocedural analysis.

Do you mean constant propagation between procedures, type analysis, or what kind of interprocedural analysis?

In decomp we are still very much working on the basic block and control flow recovery level, so these stumbling blocks are still far away.

@mewmew
Member Author

mewmew commented Mar 24, 2017

Oh, and glad to hear you have revised your thoughts regarding SSA. It gives me confidence in exploring it more deeply.

@pfalcon

pfalcon commented Mar 25, 2017

Do you mean constant propagation between procedures, type analysis, or what kind of interprocedural analysis?

Interprocedural dataflow analysis, as required to recover function arguments/returns. Which is required for proper intraprocedural analysis of a function. But of course, you need to do intraprocedural analysis first before you can even approach interprocedural analysis.
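The circular dependency described here can be sketched in a few lines. This is only an illustration under heavy assumptions, not ScratchABlock's or decomp's method: it looks at straight-line code, treats "read before written" as the sole criterion for a candidate argument, and uses hypothetical names (`Op`, `candidateArgs`, and a `summaries` map from callee name to the registers that callee is already known to take).

```go
package main

import (
	"fmt"
	"sort"
)

// Op is a hypothetical instruction: it may read registers, write one
// register, and optionally call another function.
type Op struct {
	Dst  string   // register written, "" if none
	Srcs []string // registers read directly
	Call string   // callee name, "" if this is not a call
}

// candidateArgs returns the registers a function reads before writing them
// (candidate arguments) and the callees whose own arguments are still
// unknown. A call reads whatever its callee's summary says it reads, which
// is why the caller cannot be fully analysed before the callee.
func candidateArgs(code []Op, summaries map[string][]string) (args, unresolved []string) {
	written := map[string]bool{}
	seen := map[string]bool{}
	for _, op := range code {
		reads := append([]string(nil), op.Srcs...)
		if op.Call != "" {
			calleeArgs, ok := summaries[op.Call]
			if !ok {
				unresolved = append(unresolved, op.Call)
			}
			reads = append(reads, calleeArgs...)
		}
		for _, r := range reads {
			if !written[r] && !seen[r] {
				seen[r] = true
				args = append(args, r)
			}
		}
		if op.Dst != "" {
			written[op.Dst] = true
		}
	}
	sort.Strings(args)
	return args, unresolved
}

func main() {
	// f copies eax into ecx and then calls g, whose result lands in eax.
	f := []Op{
		{Dst: "ecx", Srcs: []string{"eax"}},
		{Dst: "eax", Call: "g"},
	}
	// Without a summary for g, the result for f is incomplete.
	fmt.Println(candidateArgs(f, nil))
	// Once g is known to take edx, f gains edx as a candidate argument.
	fmt.Println(candidateArgs(f, map[string][]string{"g": {"edx"}}))
}
```

In this sketch the first call reports `g` as unresolved, and the second shows that the answer for `f` changes once `g` has been analysed, which is the bootstrapping problem in miniature.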

In decomp we are still very much working on the basic block and control flow recovery level, so these stumbling blocks are still far away.

Well, we discussed that when we met - that selecting a tool is what guides how much progress one can have, and how fast :-P ;-).

@pfalcon

pfalcon commented Mar 25, 2017

Oh, and glad to hear you have revised your thoughts regarding SSA. It gives me confidence in exploring it more deeply.

Your SSA at its best: https://github.com/zneak/fcd-tests/blob/osx/output/1993-leo.c#L397. My thoughts remain the same: people who did not write the conversion into and out of SSA themselves should not use SSA; it's far too powerful a sorcery. Hiding behind the back of LLVM leads to hilarious results, sorry.

On the good-news side, I thought there was no hope of explaining SSA to a 5-year-old, and that even PhDs had given up on explaining it to other PhDs, with the epic project http://ssabook.gforge.inria.fr/latest/book.pdf not updated since 2015. However, I just pulled from their repo, and there are even fresh commits. They look mostly like styling, though, not new chapters.
