Many improvements to make CFGFast fast again. #1092

Merged: 14 commits, merged on Jul 26, 2018


@ltfish (Member) commented Jun 27, 2018

This series of development work was inspired by the testing binary used in #1075; the binary was provided by @KevOrr, thanks! I realized that angr's CFG recovery (CFGFast, I mean) was obviously too slow to run on a 35 MB blob. I then did some intensive profiling and made quite a few improvements in angr, CLE, and PyVEX.

To give you a sense of the improvement, below is the benchmark I have been using over the past three days, along with the time CFGFast takes to process the first 6% of the code.

  • Setup: the 35 MB blob (big-endian ARM, with a few unsupported coprocessor instructions), CPython 2.7.14, and Windows 10.
  • Before the refactoring and enhancements: more than 8 minutes. I killed it before it finished because I did not want to waste my life.
  • After making block.statements on-demand: 228.2 sec (almost 4 minutes).
  • Now (with more optimizations implemented): 92 sec (about a minute and a half).
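The timings above can be turned into speedup factors with a little arithmetic. This is a back-of-the-envelope sketch only; since the original run was killed, "more than 8 minutes" is a lower bound, so the overall factor computed from it is a lower bound too.

```python
# Timings reported above for CFGFast on the first 6% of the 35 MB blob.
BASELINE_LOWER_BOUND_S = 8 * 60  # "more than 8 minutes" (run was killed): a lower bound only
ON_DEMAND_STATEMENTS_S = 228.2   # after block.statements became on-demand
FINAL_S = 92.0                   # with the remaining optimizations


def speedup(before: float, after: float) -> float:
    """Return how many times faster the `after` run is than the `before` run."""
    return before / after


on_demand_gain = speedup(ON_DEMAND_STATEMENTS_S, FINAL_S)
overall_gain_at_least = speedup(BASELINE_LOWER_BOUND_S, FINAL_S)

print(f"on-demand -> final: {on_demand_gain:.2f}x")                # 2.48x
print(f"baseline -> final: at least {overall_gain_at_least:.2f}x")  # 5.22x
```

In other words, the later optimizations roughly add a 2.5x gain on top of on-demand statements, and the whole series is at least about 5x faster than the starting point.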

Here is an incomplete list of things that have been changed (improved, hopefully):

  1. Implement Block.vex_nostmt so that we can get a VEX IRSB without its statements. In CFGFast, we default to using statement-free IRSBs, unless the block contains an indirect jump.
  2. Since statement-free IRSBs have no irsb.statements, we collect the exits in C and save them in irsb.exit_statements.
  3. Make sure CFGFast lifts every basic block at most once. to_snippets() no longer needs to lift/re-lift blocks.
  4. Statement-free IRSBs will not be cached.
  5. Edges in function graphs (Function.transition_graph) are no longer added immediately after the source node is traversed. The addition of these edges is delayed until the destination nodes are traversed (CFGJob has a new property func_edges).
  6. As a by-product of the above change, we no longer need to remove FakeRet edges from function graphs.
  7. ._changed_functions is renamed to ._updated_nonreturning_functions. If a function is already deemed to be returning, it is no longer added to this set.
  8. PyVEX: Re-implemented the IRSB-lifting logic. Now we don't have to unnecessarily extend/copy the same block again and again before returning it to the user.
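Items 1 to 3 above amount to lazy lifting: keep the cheap parts of a block (such as its exits) and build the full statement list only when an indirect jump needs it, and then only once. The following is a minimal plain-Python sketch of that idea under a simplified block model. LazyBlock, fake_lift, and visit are hypothetical names, and the statements are placeholder strings rather than real VEX IR; this is not angr's or PyVEX's API (the real entry point named above is Block.vex_nostmt).

```python
from functools import cached_property  # Python 3.8+
from typing import Callable, List, Tuple

# Hypothetical model, NOT angr's or PyVEX's real API: a lifted block whose
# full statement list is produced only when something asks for it.
Stmt = str


class LazyBlock:
    def __init__(self, addr: int, exit_statements: List[Tuple[int, Stmt]],
                 has_indirect_jump: bool,
                 lift_statements: Callable[[int], List[Stmt]]) -> None:
        self.addr = addr
        # (statement index, exit) pairs gathered by the cheap lift, so CFG
        # recovery can follow exits without the full statement list.
        self.exit_statements = exit_statements
        self.has_indirect_jump = has_indirect_jump
        self._lift_statements = lift_statements

    @cached_property
    def statements(self) -> List[Stmt]:
        # Expensive full lift: runs on first access only, then it is cached.
        return self._lift_statements(self.addr)


full_lifts: List[int] = []  # records every expensive lift


def fake_lift(addr: int) -> List[Stmt]:
    """Hypothetical stand-in for a full lift; logs each call."""
    full_lifts.append(addr)
    return [f"IMark({addr:#x})", "t0 = GET:I32(r0)", "PUT(pc) = t0"]


def visit(block: LazyBlock) -> List[Stmt]:
    """Collect successors; only indirect jumps force the full statement list."""
    targets = [exit_ for _, exit_ in block.exit_statements]
    if block.has_indirect_jump:
        # An indirect-jump resolver would analyze these statements.
        _ = block.statements
    return targets


direct = LazyBlock(0x1000, [(3, "exit -> 0x1010")], False, fake_lift)
indirect = LazyBlock(0x2000, [], True, fake_lift)
for blk in (direct, indirect, indirect):  # visiting twice must not re-lift
    visit(blk)

print([hex(a) for a in full_lifts])  # ['0x2000']
```

Using functools.cached_property is a design choice for the sketch: the first access pays for the lift and later accesses reuse the result, which mirrors the "lift every basic block at most once" goal of item 3.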

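Items 5 and 6 can be illustrated with a small worklist model, assuming each callee's returning status is already known. Each job carries the function-graph edges that lead to it (loosely mirroring the new CFGJob.func_edges property), and those edges are committed only when the job itself is processed. If the job for a return site behind a non-returning call is dropped, its pending FakeRet edge is dropped with it, so no edge ever has to be removed afterwards. Job, recover, and the addresses are hypothetical; this is a sketch of the idea, not angr's implementation.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, Dict, List, Optional, Set, Tuple

# Hypothetical model, NOT angr's CFGJob/Function API.
Edge = Tuple[int, int, str]  # (source, destination, jumpkind)


@dataclass
class Job:
    addr: int
    func_edges: List[Edge] = field(default_factory=list)  # pending edges into this job
    callee: Optional[int] = None  # set for FakeRet jobs: the function being called


def recover(successors: Dict[int, List[Tuple[int, str]]],
            non_returning: Set[int], entry: int) -> Set[Edge]:
    graph: Set[Edge] = set()
    seen: Set[int] = set()
    work: Deque[Job] = deque([Job(entry)])
    while work:
        job = work.popleft()
        if job.callee in non_returning:
            # The return site is unreachable: drop the job and its pending
            # FakeRet edge. Nothing was added early, so nothing must be removed.
            continue
        graph.update(job.func_edges)  # delayed edge addition happens here
        if job.addr in seen:
            continue
        seen.add(job.addr)
        outs = successors.get(job.addr, [])
        call_targets = [dst for dst, kind in outs if kind == "Ijk_Call"]
        for dst, kind in outs:
            callee = call_targets[0] if kind == "Ijk_FakeRet" and call_targets else None
            work.append(Job(dst, [(job.addr, dst, kind)], callee))
    return graph


# 0x1000 calls 0x3000 (returns); its return site is 0x1008.
# 0x1008 calls 0x4000 (never returns, like exit()); its return site is 0x1010.
SUCCESSORS = {
    0x1000: [(0x3000, "Ijk_Call"), (0x1008, "Ijk_FakeRet")],
    0x1008: [(0x4000, "Ijk_Call"), (0x1010, "Ijk_FakeRet")],
}
edges = recover(SUCCESSORS, non_returning={0x4000}, entry=0x1000)
for src, dst, kind in sorted(edges):
    print(f"{src:#x} -> {dst:#x} ({kind})")
# No 0x1008 -> 0x1010 FakeRet edge is ever added.
```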
Thanks @rhelmot for going through much insanity committing her code as me ;)
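The filter described in item 7 can be sketched as follows, assuming a simple map from function address to returning status (True, False, or None for undecided). record_update and the map are hypothetical stand-ins for angr's internal bookkeeping, not its real code.

```python
from typing import Dict, Optional, Set

# Hypothetical sketch; returning[f] is True (returns), False (never returns),
# or None (not decided yet). These names are illustrative, not angr's.


def record_update(func: int, returning: Dict[int, Optional[bool]],
                  updated_nonreturning: Set[int]) -> None:
    """Queue func for non-returning re-analysis unless it is already known to return."""
    if returning.get(func) is True:
        # Already deemed returning: re-checking it cannot change anything.
        return
    updated_nonreturning.add(func)


returning = {0x1000: True, 0x2000: None, 0x3000: False}
pending: Set[int] = set()
for f in (0x1000, 0x2000, 0x3000):
    record_update(f, returning, pending)

print(sorted(hex(f) for f in pending))  # ['0x2000', '0x3000']
```

Skipping functions that are already known to return keeps the set small, so less follow-up work is scheduled for them.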

@ltfish added the enhancement (Some subsystem of angr needs tweaking) and refactor (Something needs to be reorganized) labels on Jun 27, 2018
@ltfish self-assigned this on Jun 27, 2018
@ltfish changed the title from "Many adjustments to make CFGFast fast again." to "[WIP] Many adjustments to make CFGFast fast again." on Jun 27, 2018
@ltfish force-pushed the re/faster_cfgfast branch 2 times, most recently from 07c3567 to 481d6e3, on July 4, 2018 06:58
@ltfish changed the title from "[WIP] Many adjustments to make CFGFast fast again." to "Many improvements to make CFGFast fast again." on Jul 17, 2018
@ltfish added a commit to angr/pyvex that referenced this pull request on Jul 26, 2018:
See angr/angr#1092 for more details.

* The initial commit.

* Redo the lifting logic to avoid redundant IRSB copying.

* Move get_defaultexit_target() from Python to C to speed things up.

* Add a check to make sure LibVEX_Lift() does not return NULL.

* Implement IRSB.instruction_addresses.

* Implement IRSB.has_statements.

* Fix IRSB.addr.

* Fix a NULL-deref in pyvex.c.

* Postprocessor: Do not remove NoOp statements.

Otherwise it will cause a mismatch between statement indices and the
indices in IRSB.exit_statements (which are generated in PyVEX C).

* FixesPostProcessor: Get the IRSB address correctly.

* Remove NoOp statements in the C world.

* Restore the IRSB size calculation.

* Make sure exit_statements is not None before accessing.

* Enable tests for PyVEX itself.

* Implement ARM call jumpkind fixer in C for a better performance.

* Implement MIPS32 post-processing in C world.

* Add a missing undefine.

* Fix test.py.

* Add stddef.h to pyvex.c.

* Implement data reference collection in pyvex_c.

* Lint the code.

* Expose PyVEXError.

* More linting.

* oops

* Expose get_type_size and get_type_spec_size.

* Expose IRTypeEnv.

* Postprocessor: Support NeedStatementsNotification.

* Reorganize lifting/__init__.py.

* Remove the IRSB.addr property, replace with raw attribute

* Make check against MAX_DATA_REFS more explicit

* Move from data ref tuples to a DataRef class

* Lint block.py: Define data_refs and _instruction_addresses in __init__

* Make pyvex.lift positional arguments match pyvex.IRSB

* The kosher way to do this is .lift()

* Split pyvex.c into several smaller C files

(committing as fish to preserve authorship)
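The two NoOp commits above are about keeping statement indices stable. A small sketch of why: if exits are recorded as (index, statement) pairs and a later pass deletes NoOp statements, the recorded indices go stale. The helper names are hypothetical and the statements are placeholder strings, not real VEX IR objects; per the commit log, PyVEX moved NoOp removal into C instead of doing it in the Python postprocessor.

```python
from typing import List, Tuple

# Hypothetical illustration, not PyVEX code: exits are stored as
# (statement index, statement) pairs, so deleting statements afterwards
# silently invalidates those indices.


def collect_exits(stmts: List[str]) -> List[Tuple[int, str]]:
    return [(i, s) for i, s in enumerate(stmts) if s.startswith("if")]


def drop_noops(stmts: List[str]) -> List[str]:
    return [s for s in stmts if s != "NoOp"]


def consistent(stmts: List[str], exits: List[Tuple[int, str]]) -> bool:
    """True if every recorded index still points at the recorded exit."""
    return all(0 <= i < len(stmts) and stmts[i] == s for i, s in exits)


stmts = ["IMark(0x100)", "NoOp", "t1 = GET(r0)", "if (t1) goto 0x200"]

# Problematic order: exits recorded first, NoOps removed afterwards.
stale = collect_exits(stmts)  # exit recorded at index 3
cleaned = drop_noops(stmts)   # now only 3 statements, so index 3 is gone
print(consistent(cleaned, stale))  # False

# Consistent order: remove NoOps first, then record the exits.
fresh = collect_exits(cleaned)
print(consistent(cleaned, fresh))  # True
```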
@ltfish merged commit 89a20b6 into master on Jul 26, 2018
@ltfish deleted the re/faster_cfgfast branch on July 26, 2018 21:48