
Many improvements to make CFGFast fast again. #1092

Merged: ltfish merged 14 commits into master from re/faster_cfgfast on Jul 26, 2018.

2 participants
ltfish (Member) commented on Jun 27, 2018

This series of development work was inspired by the test binary used in #1075, which was provided by @KevOrr (thanks!). I realized that angr's CFG recovery (CFGFast, I mean) was far too slow to run on a 35-MB blob. I then did some intensive profiling and made quite a few improvements in angr, CLE, and PyVEX.

To give you a sense of the improvement, here is the benchmark I have been using over the past three days, along with how long CFGFast takes to process the first 6% of the code:

  • Setup: the 35 MB blob (ARM big endian with a few unsupported co-processor instructions), CPython 2.7.14, and Windows 10.
  • Before the refactor and enhancements: more than 8 minutes. I killed it before it finished because I did not want to waste my life.
  • After making block.statements on-demand: 228.2 sec (about 3 min 48 s).
  • Now, with further optimizations: 92 sec (about a minute and a half).
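The timings above translate into speedup factors as follows. This is only arithmetic on the reported numbers; because the original run was killed after more than 8 minutes, 480 s is treated as a lower bound, so the overall figure is a minimum.

```python
# Wall-clock times reported in this PR for the first 6% of the 35 MB blob.
# The original run was killed after more than 8 minutes, so 480 s is only a lower bound.
BEFORE_LOWER_BOUND_S = 8 * 60
ON_DEMAND_STATEMENTS_S = 228.2
CURRENT_S = 92.0


def speedup(old_seconds: float, new_seconds: float) -> float:
    """Return how many times faster the new run is than the old one."""
    return old_seconds / new_seconds


if __name__ == "__main__":
    print(f"on-demand statements -> now: {speedup(ON_DEMAND_STATEMENTS_S, CURRENT_S):.2f}x")  # 2.48x
    print(f"original -> now: at least {speedup(BEFORE_LOWER_BOUND_S, CURRENT_S):.2f}x")  # 5.22x
```

So the optimizations after on-demand statements account for roughly a 2.5x speedup on their own, and the overall improvement is at least about 5.2x.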

Here is an incomplete list of things that have been changed (improved, hopefully):

  1. Implement Block.vex_nostmt so that we can get a VEX IRSB without its statements. In CFGFast, we default to using statement-free IRSBs, unless the block contains an indirect jump.
  2. Since statement-free IRSBs no longer carry irsb.statements, we dump exits in C and save them in irsb.exit_statements.
  3. Make sure CFGFast lifts every basic block at most once. to_snippets() no longer needs to lift/re-lift blocks.
  4. Statement-free IRSBs will not be cached.
  5. Edges in function graphs (Function.transition_graph) are no longer added as soon as the source node is traversed. Instead, each edge is added only once its destination node is traversed (CFGJob has a new property, func_edges, that carries these pending edges).
  6. As a byproduct of the above change, we no longer need to remove FakeRet edges from function graphs.
  7. ._changed_functions has been renamed to ._updated_nonreturning_functions. A function that is already deemed returning is no longer added to this set.
  8. PyVEX: Reimplement the IRSB-lifting logic so that the same block is no longer unnecessarily extended/copied again and again before it is returned to the user.
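Items 1 and 2 boil down to this: successor discovery can work from a compact summary of exits, and the full statement list is only needed for indirect jumps. The sketch below is a toy model under that assumption; ToyIRSB, successors(), and needs_statements() are hypothetical names, not PyVEX or angr APIs, and the (stmt_idx, ins_addr, target) layout of each exit entry is an illustrative assumption.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical model of one exit entry: (statement index, instruction address, constant target).
ExitInfo = Tuple[int, int, int]


@dataclass
class ToyIRSB:
    """Toy stand-in for a lifted block; this is NOT the real pyvex.IRSB class."""
    addr: int
    jumpkind: str
    default_exit: Optional[int]  # None models a non-constant (indirect) default target
    exit_statements: Tuple[ExitInfo, ...] = ()
    statements: Optional[List[str]] = None  # None means "lifted without statements"


def successors(irsb: ToyIRSB) -> List[int]:
    """Collect successors from the exit summary and the default exit; no statements needed."""
    succs = [target for _, _, target in irsb.exit_statements]
    if irsb.default_exit is not None:
        succs.append(irsb.default_exit)
    return succs


def needs_statements(irsb: ToyIRSB) -> bool:
    """Only an indirect jump forces a look at the statements (e.g. to resolve a jump table)."""
    return irsb.default_exit is None


if __name__ == "__main__":
    cond_branch = ToyIRSB(0x1000, "Ijk_Boring", 0x1008, ((4, 0x1004, 0x2000),))
    print([hex(a) for a in successors(cond_branch)])  # ['0x2000', '0x1008']
    print(needs_statements(cond_branch))  # False
    print(needs_statements(ToyIRSB(0x3000, "Ijk_Boring", None)))  # True
```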
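Item 3 is an invariant ("lift every basic block at most once"), which can be illustrated with a small memoizing helper. This is a hypothetical sketch: LiftOnceCache and its methods are illustrative names, and the lifter is any callable; angr's actual bookkeeping inside CFGFast is different.

```python
from typing import Callable, Dict, Generic, TypeVar

B = TypeVar("B")


class LiftOnceCache(Generic[B]):
    """Hypothetical helper: lift each block address at most once and reuse the result."""

    def __init__(self, lift: Callable[[int], B]) -> None:
        self._lift = lift
        self._blocks: Dict[int, B] = {}
        self.lift_count = 0  # how many times the (expensive) lifter actually ran

    def get(self, addr: int) -> B:
        if addr not in self._blocks:
            self._blocks[addr] = self._lift(addr)
            self.lift_count += 1
        return self._blocks[addr]

    def snippets(self) -> Dict[int, B]:
        """Later consumers (in the spirit of to_snippets()) read cached blocks instead of re-lifting."""
        return dict(self._blocks)


if __name__ == "__main__":
    cache = LiftOnceCache(lambda addr: f"block@{addr:#x}")
    for addr in (0x1000, 0x1008, 0x1000, 0x1000):
        cache.get(addr)
    print(cache.lift_count)  # 2
```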
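Items 5 and 6 describe deferring function-graph edges until the destination is traversed. The worklist sketch below models that idea. Job loosely mirrors CFGJob's func_edges, but explore() and the never_traversed set are hypothetical simplifications: a static set stands in for whatever makes CFG recovery drop a destination (for example the fall-through after a call that turns out not to return).

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple

Edge = Tuple[int, int, str]  # (source block, destination block, edge kind)


@dataclass
class Job:
    """Toy stand-in for CFGJob: a block to visit plus the pending edges that lead to it."""
    addr: int
    func_edges: List[Edge] = field(default_factory=list)


def explore(entry: int, succs: Dict[int, List[Tuple[int, str]]], never_traversed: Set[int]) -> Set[Edge]:
    """Add a function-graph edge only when its destination block is actually traversed."""
    graph: Set[Edge] = set()
    seen: Set[int] = set()
    work = deque([Job(entry)])
    while work:
        job = work.popleft()
        if job.addr in never_traversed:
            continue  # pending edges are discarded, so there is nothing to remove later
        graph.update(job.func_edges)  # destination reached: commit the delayed edges
        if job.addr in seen:
            continue
        seen.add(job.addr)
        for dst, kind in succs.get(job.addr, []):
            work.append(Job(dst, [(job.addr, dst, kind)]))
    return graph


if __name__ == "__main__":
    cfg = {0x10: [(0x40, "call"), (0x14, "fake_ret")], 0x40: [], 0x14: []}
    print(sorted(explore(0x10, cfg, never_traversed={0x14})))  # [(16, 64, 'call')]
```

The design point is that an edge whose destination is never traversed is never inserted, so no cleanup pass (such as removing FakeRet edges) is needed afterwards.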
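Item 7 adds a guard before a function is queued in the renamed set. A minimal sketch of that check, assuming a tri-state "returning" flag (None while undecided); Func and note_updated() are hypothetical names, not angr's Function class or methods.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass
class Func:
    """Hypothetical stand-in for a function; `returning` is None while still undecided."""
    addr: int
    returning: Optional[bool] = None


def note_updated(func: Func, updated_nonreturning: Set[int]) -> None:
    """Queue a function for non-returning re-analysis unless it is already deemed returning."""
    if func.returning is True:
        return  # already deemed returning: per the PR, it is no longer added to the set
    updated_nonreturning.add(func.addr)


if __name__ == "__main__":
    pending: Set[int] = set()
    for f in (Func(0x100, True), Func(0x200), Func(0x300, False)):
        note_updated(f, pending)
    print(sorted(pending))  # [512, 768]
```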

Thanks to @rhelmot for going through much insanity to commit her code as me ;)

ltfish self-assigned this on Jun 27, 2018

ltfish changed the title from "Many adjustments to make CFGFast fast again." to "[WIP] Many adjustments to make CFGFast fast again." on Jun 27, 2018

This was referenced Jul 1, 2018

ltfish changed the title from "[WIP] Many adjustments to make CFGFast fast again." to "Many improvements to make CFGFast fast again." on Jul 17, 2018

ltfish added a commit to angr/pyvex that referenced this pull request Jul 26, 2018

Make lifting faster. (#141)
See angr/angr#1092 for more details.

* The initial commit.

* Redo the lifting logic to avoid redundant IRSB copying.

* Move get_defaultexit_target() from Python to C to speed things up.

* Add a check to make sure LibVEX_Lift() does not return NULL.

* Implement IRSB.instruction_addresses.

* Implement IRSB.has_statements.

* Fix IRSB.addr.

* Fix a NULL-deref in pyvex.c.

* Postprocessor: Do not remove NoOp statements.

Otherwise it will cause a mismatch between statement indices and the
indices in IRSB.exit_statements (which are generated in PyVEX C).

* FixesPostProcessor: Get the IRSB address correctly.

* Remove NoOp statements in the C world.

* Restore the IRSB size calculation.

* Make sure exit_statements is not None before accessing.

* Enable tests for PyVEX itself.

* Implement ARM call jumpkind fixer in C for a better performance.

* Implement MIPS32 post-processing in C world.

* Add a missing undefine.

* Fix test.py.

* Add stddef.h to pyvex.c.

* Implement data reference collection in pyvex_c.

* Lint the code.

* Expose PyVEXError.

* More linting.

* oops

* Expose get_type_size and get_type_spec_size.

* Expose IRTypeEnv.

* Postprocessor: Support NeedStatementsNotification.

* Reorganize lifting/__init__.py.

* Remove the IRSB.addr property, replace with raw attribute

* Make check against MAX_DATA_REFS more explicit

* Move from data ref tuples to a DataRef class

* Lint block.py: Define data_refs and _instruction_addresses in __init__

* Make pyvex.lift positional arguments match pyvex.IRSB

* The kosher way to do this is .lift()

* Split pyvex.c into several smaller C files

(committing as fish to preserve authorship)
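The "Postprocessor: Do not remove NoOp statements" entry above refers to an index-mismatch problem: if exit indices are recorded first and NoOps are removed afterwards, every index after a removed NoOp shifts. The toy model below shows the effect; plain strings stand in for VEX statement kinds, and remove_noops() is a hypothetical helper, not PyVEX code. Removing NoOps in C before exits are recorded (as a later entry in the list above does) avoids the stale indices.

```python
from typing import List, Tuple


def remove_noops(stmts: List[str]) -> Tuple[List[str], List[int]]:
    """Drop "NoOp" entries; also return an old-index -> new-index map (-1 if removed)."""
    kept: List[str] = []
    remap: List[int] = []
    for stmt in stmts:
        if stmt == "NoOp":
            remap.append(-1)
        else:
            remap.append(len(kept))
            kept.append(stmt)
    return kept, remap


if __name__ == "__main__":
    stmts = ["IMark", "NoOp", "Put", "Exit"]
    exit_idx = 3  # index of the Exit recorded *before* NoOps were removed
    kept, remap = remove_noops(stmts)
    print(exit_idx < len(kept))  # False: the recorded index is now stale
    print(kept[remap[exit_idx]])  # Exit
```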

ltfish merged commit 89a20b6 into master on Jul 26, 2018

1 of 4 checks passed:

  • continuous-integration/travis-ci/pr: The Travis CI build could not complete due to an error
  • continuous-integration/appveyor/branch: AppVeyor build failed
  • continuous-integration/travis-ci/push: The Travis CI build failed
  • continuous-integration/appveyor/pr: AppVeyor build succeeded

ltfish deleted the re/faster_cfgfast branch on Jul 26, 2018
