Many improvements to make CFGFast fast again. #1092

ltfish · 2018-06-27T06:24:18Z

This series of development work is inspired by the test binary used in #1075 -- the binary is provided by @KevOrr, thanks! I realized that angr's CFG recovery (CFGFast, I mean) was obviously too slow to run on a 35-MB blob. I then did some intensive profiling and made quite a few improvements in angr, CLE, and PyVEX.

To give you a sense of what sort of improvement I am referring to, the following is the benchmark I've been using throughout the past three days, and the results of running CFGFast for the first 6% of code.

Setup: the 35 MB blob (ARM big endian with a few unsupported co-processor instructions), CPython 2.7.14, and Windows 10.
Before the refactor and enhancement: It was taking more than 8 minutes. I had to kill it before it finished running because I did not want to waste my life.
After block.statements are made on-demand: 228.2 sec (over 3 minutes).
Now (with more optimizations implemented): 92 sec (a minute and a half).
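To illustrate what "made on-demand" means in the benchmark above, here is a minimal sketch of a lazily computed statements attribute. It is not angr's Block class: `LazyBlock`, `toy_lifter`, and `lift_calls` are hypothetical names made up for this example, and the "lifter" just produces one fake statement per byte.

```python
class LazyBlock(object):
    """Toy stand-in for a lifted basic block (NOT angr's Block class)."""

    def __init__(self, addr, raw_bytes, lifter):
        self.addr = addr
        self.raw_bytes = raw_bytes
        self._lifter = lifter       # hypothetical callable: bytes -> list of statements
        self._statements = None     # filled in on first access only

    @property
    def statements(self):
        # Pay the lifting cost only when a caller actually asks for statements.
        if self._statements is None:
            self._statements = self._lifter(self.raw_bytes)
        return self._statements


lift_calls = []


def toy_lifter(raw_bytes):
    # Record every call so the laziness is observable from the outside.
    lift_calls.append(len(raw_bytes))
    # bytearray() yields integers on both Python 2 and Python 3.
    return ["stmt_%d" % b for b in bytearray(raw_bytes)]


block = LazyBlock(0x1000, b"\x01\x02\x03", toy_lifter)
calls_before_access = len(lift_calls)   # constructing the block lifts nothing
first = block.statements                # first access triggers the single lift
second = block.statements               # second access reuses the stored list
calls_after_access = len(lift_calls)
```

A caller that never touches `block.statements` never pays for the lift, which is the general idea behind deferring statement construction.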

Here is an incomplete list of things that have been changed (improved, hopefully):

Implement Block.vex_nostmt so that we can get a VEX IRSB without its statements. In CFGFast, we default to using statement-free IRSBs, unless the block contains an indirect jump.
Since irsb.statements don't exist any more, we dump exits in C, and save them in irsb.exit_statements.
Make sure CFGFast lifts every basic block at most once. to_snippets() no longer needs to lift/re-lift blocks.
Statement-free IRSBs will not be cached.
Edges in function graphs (Function.transition_graph) are no longer added immediately after the source node is traversed. The addition of these edges is delayed until the destination nodes are traversed (CFGJob has a new property func_edges).
As a side product of the above change, we do not need to remove any FakeRet edges from function graphs any more.
._changed_functions is renamed to ._updated_nonreturning_functions. If a function is already deemed as returning, it will not be added to this set any more.
PyVEX: Re-implemented the IRSB-lifting logic. Now we don't have to unnecessarily extend/copy the same block again and again before returning it to the user.
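The delayed edge addition described in the list above can be sketched roughly as follows. This is a toy worklist over a plain dict, not CFGFast's code: `build_function_graph` and `successors` are hypothetical, and the only name borrowed from the description is `func_edges`, used here for the edges a job carries until its destination is visited. In this sketch, an edge whose destination never materializes is simply never added, which is one way to see why nothing has to be removed from the graph afterwards.

```python
from collections import deque


def build_function_graph(successors, entry):
    """Traverse from `entry`, committing each edge only when its destination is visited.

    `successors` is a hypothetical dict mapping a block address to the list of
    addresses it may transfer control to. An address missing from the dict
    stands for a block that could not be recovered.
    """
    nodes, edges = set(), []
    # Each job is (address, func_edges): the edges waiting on this address.
    worklist = deque([(entry, [])])
    while worklist:
        addr, func_edges = worklist.popleft()
        if addr not in successors:
            # The destination does not exist, so its pending edges are dropped.
            continue
        # The destination is being traversed now, so commit the edges into it.
        edges.extend(e for e in func_edges if e not in edges)
        if addr in nodes:
            continue  # already expanded, do not enqueue its successors again
        nodes.add(addr)
        for dst in successors[addr]:
            worklist.append((dst, [(addr, dst)]))
    return nodes, edges


# 0x30 is deliberately absent: it plays the role of an unrecoverable target.
toy_successors = {0x10: [0x20, 0x30], 0x20: [0x10]}
toy_nodes, toy_edges = build_function_graph(toy_successors, 0x10)
```

In the toy run, the edge from 0x10 to 0x30 never enters the graph because no job for 0x30 is ever traversed successfully.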

Thanks @rhelmot for going through much insanity to commit her code as me ;)

See angr/angr#1092 for more details.
* The initial commit.
* Redo the lifting logic to avoid redundant IRSB copying.
* Move get_defaultexit_target() from Python to C to speed things up.
* Add a check to make sure LibVEX_Lift() does not return NULL.
* Implement IRSB.instruction_addresses.
* Implement IRSB.has_statements.
* Fix IRSB.addr.
* Fix a NULL-deref in pyvex.c.
* Postprocessor: Do not remove NoOp statements. Otherwise it will cause a mismatch between statement indices and the indices in IRSB.exit_statements (which are generated in PyVEX C).
* FixesPostProcessor: Get the IRSB address correctly.
* Remove NoOp statements in the C world.
* Restore the IRSB size calculation.
* Make sure exit_statements is not None before accessing.
* Enable tests for PyVEX itself.
* Implement ARM call jumpkind fixer in C for a better performance.
* Implement MIPS32 post-processing in C world.
* Add a missing undefine.
* Fix test.py.
* Add stddef.h to pyvex.c.
* Implement data reference collection in pyvex_c.
* Lint the code.
* Expose PyVEXError.
* More linting.
* oops
* Expose get_type_size and get_type_spec_size.
* Expose IRTypeEnv.
* Postprocessor: Support NeedStatementsNotification.
* Reorganize lifting/__init__.py.
* Remove the IRSB.addr property, replace with raw attribute
* Make check against MAX_DATA_REFS more explicit
* Move from data ref tuples to a DataRef class
* Lint block.py: Define data_refs and _instruction_addresses in __init__
* Make pyvex.lift positional arguments match pyvex.IRSB
* The kosher way to do this is .lift()
* Split pyvex.c into several smaller C files (committing as fish to preserve authorship)
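The NoOp item above is easier to see with a toy example. The sketch below uses a plain Python list of strings in place of real VEX statements (an assumption made purely for illustration; `find_exit_indices` and the variable names are hypothetical, not PyVEX APIs). It shows that indices recorded before filtering out NoOps no longer point at exit statements afterwards.

```python
def find_exit_indices(statements):
    # Positions of exit statements within the given statement list.
    return [i for i, stmt in enumerate(statements) if stmt == "Exit"]


# Toy statement list; real IRSBs hold statement objects, not strings.
original = ["IMark", "NoOp", "Put", "Exit", "NoOp", "Put", "Exit"]

# Indices computed on the unfiltered list (the role played by exit_statements).
recorded_indices = find_exit_indices(original)

# A post-processing pass that drops NoOps shifts every later statement.
filtered = [stmt for stmt in original if stmt != "NoOp"]

# Looking up the recorded indices in the filtered list now hits the wrong
# statement, or falls off the end of the list (represented here as None).
stale_lookup = [
    filtered[i] if i < len(filtered) else None
    for i in recorded_indices
]

# Recomputing after filtering gives the indices that are actually valid.
fresh_indices = find_exit_indices(filtered)
```

Either the indices are produced after NoOps are removed, or NoOps are left in place once the indices exist; mixing the two orders yields the mismatch.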

ltfish added the enhancement (Some subsystem of angr needs tweaking) and refactor (Something needs to be reorganized) labels Jun 27, 2018

ltfish self-assigned this Jun 27, 2018

ltfish changed the title ~~Many adjustments to make CFGFast fast again.~~ [WIP] Many adjustments to make CFGFast fast again. Jun 27, 2018

ltfish mentioned this pull request Jul 1, 2018

Make lifting faster. angr/pyvex#141

Merged

ltfish force-pushed the re/faster_cfgfast branch 2 times, most recently from 07c3567 to 481d6e3 Compare July 4, 2018 06:58

This was referenced Jul 7, 2018

Disassemble code #1116

Closed

Can I use angr to list all functions' names and addresses of ndk built binary? #1115

Closed

ltfish force-pushed the re/faster_cfgfast branch from 1601c28 to 3846f92 Compare July 16, 2018 18:56

ltfish and others added 14 commits July 17, 2018 05:27

Many adjustments to make CFGFast fast again.

136ae58

Adjust sim_unicorn.cpp accordingly.

dac8958

CFGFast: Use IRSB.instruction_addresses.

b9199b4

CFGNode: Fix a recursion bug.

5ea3bca

CFGFast: Convert IRSBs without statements to normal IRSBs when needed.

0c7b5db

Lint the code.

ca77184

VECEngine: Only initialize cache_key tuple when use_cache is True.

d504d6d

Temporarily nop-out data reference collection for CFGFast.

3fb033d

Add a smoketest for data reference collection in CFGFast.

53ff940

Out-source data reference collection to pyvex_c.

e642e1d

Fix the vex_lift() call in sim_unicorn.cpp.

a9217c7

Fix invalid argument to CFGJob

a5b1741

Use new pyvex DataRef class

562203a

Make pyvex.lift positional arguments match pyvex.IRSB

2801bd7

ltfish force-pushed the re/faster_cfgfast branch from 982a4cd to 2801bd7 Compare July 17, 2018 12:28

ltfish changed the title ~~[WIP] Many adjustments to make CFGFast fast again.~~ Many improvements to make CFGFast fast again. Jul 17, 2018

ltfish mentioned this pull request Jul 26, 2018

Make things slightly faster. angr/cle#143

Merged

ltfish merged commit 89a20b6 into master Jul 26, 2018

ltfish deleted the re/faster_cfgfast branch July 26, 2018 21:48
