Commit 070d058

Bunkerbuster release.
This update allows ARCUS to explore nearby paths not covered in the processor trace, snapshot APIs, and more.
1 parent 40462c3 commit 070d058

668 files changed, 5642 additions and 37 deletions

README.md

Lines changed: 11 additions & 2 deletions
@@ -65,15 +65,24 @@ The general setup steps are:

 * [ARCUS Paper Evaluation](https://super.gtisc.gatech.edu/arcus-dataset-public.tgz)

+* [Bunkerbuster Paper Evaluation](https://super.gtisc.gatech.edu/bunkerbuster-dataset-public.tgz)
+
 ## Unit Tests

 If you make contributions to the repository, please try to keep `tools/angr/test/test.py` up-to-date. For non-unit tests,
 create a new script in `tools/angr/test`. We currently do not have a unified framework for non-unit tests.

 ## Publications

-* C. Yagemann, M. Pruett, S. P. Chung, K. Bittick, B. Saltaformaggio, W. Lee, *ARCUS: Symbolic Root Cause Analysis of Exploits
-in Production Systems.* To appear in the 30th USENIX Security Symposium (USENIX'21). August 11--13, 2021.
+* C. Yagemann, S. Chung, B. Saltaformaggio, W. Lee,
+*Automated Bug Hunting With Data-Driven Symbolic Root Cause Analysis.*
+Appeared in the 2021 ACM Conference on Computer and Communications Security (CCS’21).
+Seoul, Republic of Korea. November 15--19, 2021.
+
+* C. Yagemann, M. Pruett, S. P. Chung, K. Bittick, B. Saltaformaggio, W. Lee,
+*ARCUS: Symbolic Root Cause Analysis of Exploits in Production Systems.*
+Appeared in the 30th USENIX Security Symposium (USENIX'21).
+August 11--13, 2021.

 ## Related Work

docs/arcus.md

Lines changed: 42 additions & 3 deletions
@@ -18,6 +18,38 @@ Upon detecting a vulnerability, the analysis will attempt to look backwards to f
 introducing the problem along with a root cause for the misbehavior. Currently, this will appear as part
 of the logging output.

+## Exploration
+
+With the release of exploration plugins (codename: Bunkerbuster), ARCUS can now explore nearby paths to
+find more bugs. Simply add the `--explore` flag to `analysis.py`.
+
+**Note:** Exploration is significantly slower and more memory expensive than simply following the trace.
+
+The "Explore Options" section in `analysis.py --help` contains additional advanced settings. For example,
+you can configure the analysis to switch to exploration after a timeout, regardless of whether the end
+of the trace has been reached. You can also use a Redis database to record which paths were explored so
+other analysis sessions won't re-explore the same paths.
+
+## Snapshots
+
+Also new in the Bunkerbuster release is the ability for Tracer to break traces down into snapshots. This
+is useful for programs that are too big to analyze symbolically from start to finish. Passing `tracer.py`
+the flag `--snapshot-api` will cause it to snapshot invocations of imported functions. If you really know
+what you're doing, you can use `--snapshot-rva` to give a virtual address (relative to the main object's
+base address) to snapshot.
+
+**Note:** When using `--snapshot-rva`, results may be unstable if the address is not the start of a function.
+
+For each snapshot, the analysis will attempt to symbolize its parameters, using either a prototype definition
+(if one is available in `tools/angr/plugins/prototypes`) or by analyzing the memory accesses from the trace.
+
+**Note:** This is an under-constrained symbolic analysis, so bugs found under these conditions may not be
+reachable in real executions. However, since snapshots are taken at API entry points, bugs found this way
+are typically of relevance to the API's developers.
+
+It is possible to generate more prototypes for the `tools/angr/plugins/prototypes` directory using C/C++
+headers. See `tools/prototype-parser/parse_function.py --help` for more details.
+
 # Development

 This section is for developers who want to contribute to the project.
@@ -34,9 +66,16 @@ types of vulnerabilities, keep tracing on track, etc. The recommended steps for

 ## Plugins

-The analysis uses a plugin system to make extending easy. There are currently two kinds of plugins: **hooks** and
-**detectors**. Hooks provide angr `SimProcedure` classes to speed up analysis. Detectors scan each state for
-vulnerabilities and then analyze detections at the end.
+The analysis uses a plugin system to make extending easy. There are currently four kinds of plugins: **hooks**,
+**detectors**, **explorers**, and **prototypes**:
+
+* Hooks provide angr `SimProcedure` classes to speed up analysis.
+
+* Detectors scan each state for vulnerabilities and then analyze detections at the end.
+
+* Explorers guide the analysis down interesting nearby paths.
+
+* Prototypes define the parameters to functions captured as snapshots.

 Adding plugins is as simple as adding a Python file to the appropriate directory. The `__init__.py` script in
 each plugin directory will automatically handle loading and validating the plugins. You should not need to modify

docs/misc-tools.md

Lines changed: 25 additions & 0 deletions
@@ -5,8 +5,33 @@ Here's a list of other tools provided and their purpose:
 * `tools/angr/memlayout.py` - Takes a trace and prints the memory layout.
 * `tools/angr/decode.py` - Decodes a GRIFFIN trace (usually named `trace.griffin`) and prints out the sequence
   of executed basic blocks along with some other details.
+* `tools/angr/rewriter.py` - Instruments a program to record run-time data in the PT trace. Explained in more detail
+  in a following subsection.
 * `tools/pt/cmppath` - Compares traces using hashmaps and checksums.

+## Instrumenting Programs
+
+Processor traces usually only record control flow, so for analysis that requires data flow, the program has to be rewritten
+to encode the data of interest. This can be done with `tools/angr/rewriter.py`. See `-h` for more details.
+
+The JSON file provided to this tool is used to specify where to place hooks and what to record at those
+places. For example, to capture the contents of the `rax` register at the code located at relative virtual
+address `0x839`:
+
+```
+{"hooks": [
+    {"addr": 2105, "src": "rax"}
+]}
+```
+
+And if we want to record the value at a memory location instead:
+
+```
+{"hooks": [
+    {"addr": 2105, "src": 123456}
+]}
+```
+
 ## Comparing Traces

 The tool `cmppath` uses hashmaps and checksums to compare traces. Setup is as easy as:
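
A note on the hook files added in this hunk: JSON has no hexadecimal literals, which is why the documentation's `0x839` appears as `2105`. The following is a minimal sketch of generating such a file from Python, assuming the only keys are the ones shown in the diff (`hooks`, `addr`, `src`). The helper name `make_hooks` is hypothetical and not part of the repository.

```python
import json

def make_hooks(entries):
    """Build a rewriter hook document from (rva, src) pairs.

    Hypothetical helper: `rva` is the relative virtual address of the hook,
    `src` is either a register name (str) or a memory address (int),
    mirroring the two examples in docs/misc-tools.md.
    """
    return {"hooks": [{"addr": int(rva), "src": src} for rva, src in entries]}

# Addresses can be written in hex here; json.dumps emits them in decimal.
doc = make_hooks([(0x839, "rax"), (0x839, 123456)])
text = json.dumps(doc, indent=4)
```

Writing `text` to a file gives something that can be passed to `tools/angr/rewriter.py` in place of a hand-written JSON file, assuming the tool accepts the same structure shown above.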

docs/scaling-arcus.md

Lines changed: 1 addition & 1 deletion
@@ -31,4 +31,4 @@ you may want to use `tools/pt/cmppath` to filter out similar traces.
 ./run.sh /path/to/traces-dir master

 # pass additional arguments to analysis.py
-./run.sh /path/to/traces-dir master --logging=40
+./run.sh /path/to/traces-dir master --explore --logging=40

tools/angr/analysis.py

Lines changed: 136 additions & 2 deletions
@@ -37,11 +37,13 @@
 import agc
 import angrpt
 import dwarf
+import explore
 from globals_deep import SimStateDeepGlobals
 import griffin
 import hooks
 import plugins.detectors
 import plugins.hooks
+import plugins.explorers
 import reporting
 import taint
 import xed
@@ -108,6 +110,20 @@ def parse_args():
             help='Save reports as JSON files to provided directory')
     parser.add_option_group(group_analysis)

+    group_explore = OptionGroup(parser, 'Explore Options')
+    group_explore.add_option('--explore', action='store_true', default=False,
+            help='Explore paths near the traced path for additional bugs')
+    group_explore.add_option('--explore-after', action='store', type='str', default=None,
+            help='Explore after given time, even if end of trace has not been reached '
+            '(supports suffixes: s/m/h, defaults to m)')
+    group_explore.add_option('--explore-db', action='store', default=None,
+            help='Use database so explorers can share data across sessions (supported: '
+            'Redis - redis://111.222.333.444:6379/0)')
+    group_explore.add_option('--explore-plugins', action='store', type='str', default=None,
+            help='Comma separated list of explorer plugins to use, by module name '
+            '(example: "arg_max,loop_bounds")')
+    parser.add_option_group(group_explore)
+
     group_logging = OptionGroup(parser, 'Logging Options')
     group_logging.add_option('-l', '--logging', action='store', type='int', default=20,
             help='Log level [10-50] (default: 20 - Info)')
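
To see how the new option group behaves in isolation, here is a small standalone sketch that rebuilds just the "Explore Options" group with the standard-library `optparse` module and parses a sample command line. The function name `build_parser`, the usage string, and the sample arguments are assumptions for illustration; the real `parse_args()` defines many more options.

```python
from optparse import OptionGroup, OptionParser

def build_parser():
    """Rebuild only the 'Explore Options' group from the hunk above."""
    parser = OptionParser(usage='%prog [options] TRACE_DIR')
    group = OptionGroup(parser, 'Explore Options')
    group.add_option('--explore', action='store_true', default=False)
    group.add_option('--explore-after', action='store', type='str', default=None)
    group.add_option('--explore-db', action='store', default=None)
    group.add_option('--explore-plugins', action='store', type='str', default=None)
    parser.add_option_group(group)
    return parser

# optparse derives attribute names by replacing dashes with underscores.
options, args = build_parser().parse_args(
    ['--explore-after', '90s', '--explore-plugins', 'arg_max,loop_bounds',
     '/path/to/trace-dir'])
```

Note that `options.explore` is still `False` at this point. It is the input validation added later in `main()` that turns exploration on when `--explore-after` is given.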
@@ -572,6 +588,10 @@ def set_log_levels(options):
     for module in list(plugins.detectors.loaded.values()):
         logging.getLogger(module.__name__).setLevel(options.logging)

+    # explorer plugins
+    for module in list(plugins.explorers.loaded.values()):
+        logging.getLogger(module.__name__).setLevel(options.logging)
+
 def get_predecessor(tech, index):
     """Helper function to get predecessors by index, ignoring None elements.

@@ -848,6 +868,19 @@ def main():

     set_log_levels(options)

+    # input validation
+    if options.explore_after:
+        if not options.explore:
+            options.explore = True  # user clearly intends to explore
+        explore_delta = parse_timedelta(options.explore_after, default_suffix='m')
+        if explore_delta is None:
+            log.error("Invalid value for --explore-after: %s" % options.explore_after)
+            return
+        if explore_delta < 1:
+            log.error("Value for --explore-after must be positive: %d" % explore_delta)
+            return
+        options.explore_after = explore_delta
+
     trace_dir = args[0]
     input_trace_candidates = ['trace.griffin', 'trace.griffin.gz']
     input_trace = None
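
The validation above calls `parse_timedelta`, whose definition is not part of the visible diff. The following is a guess at a compatible helper, not the repository's implementation. It assumes the return value is a whole number of seconds (the result is later compared against `total_seconds()` and formatted with `%d`) and that malformed input yields `None`, as the `is None` check implies.

```python
import re

# Seconds per unit for the suffixes named in the --explore-after help text.
_SUFFIX_SECONDS = {'s': 1, 'm': 60, 'h': 3600}

def parse_timedelta(value, default_suffix='m'):
    """Return the duration in whole seconds, or None if `value` is malformed.

    Hypothetical sketch: accepts an integer with an optional s/m/h suffix,
    falling back to `default_suffix` when no suffix is given.
    """
    match = re.fullmatch(r'\s*(-?\d+)\s*([smh]?)\s*', value.lower())
    if match is None:
        return None
    amount, suffix = match.groups()
    return int(amount) * _SUFFIX_SECONDS[suffix or default_suffix]
```

Under this sketch, negative and zero values are parsed rather than rejected, so the caller's `explore_delta < 1` check is what reports them as non-positive.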
@@ -955,7 +988,7 @@ def main():

     # initialize the starting state, exploration technique and simulation manager
     tech = angrpt.Tracer(bb_seq, start_address=regs['rip'])
-    init_state, init_env = parse_entry_state_json(proj, trace_dir, snapshot_dir, False,
+    init_state, init_env = parse_entry_state_json(proj, trace_dir, snapshot_dir, options.explore,
             options.override_max_argv)
     simgr = proj.factory.simgr(init_state)

@@ -1062,13 +1095,19 @@ def main():
                     raise CriticalMemoryException('Memory critically low')

             analysis_duration = (datetime.now() - analysis_start_time).total_seconds()
+            if options.explore_after and analysis_duration > options.explore_after:
+                log.warning("Analysis timeout reached, moving to exploration early")
+                break

     except KeyboardInterrupt:
         log.warning("Received interrupt, cleaning up...")
+        options.explore = False  # forcibly disable exploration
     except (CriticalMemoryException, MemoryError):
         log.error("Memory critically low, halting analysis")
+        options.explore = False
     except AssertionError:
         log.error("Failed assertion: %s" % format_exc())
+        options.explore = False
     except angr.errors.AngrTracerError as ex:
         log.error("Angr stopped early: %s" % str(ex))

@@ -1086,17 +1125,112 @@ def main():
             log.error("Stopped here: (symbolic)")
     except KeyboardInterrupt:
         log.warning("Received interrupt, cleaning up...")
+        options.explore = False  # forcibly disable exploration

     # update reports
     log.info("Updating reports with root cause analysis")
     try:
         reports = analyze(simgr, bb_seq)
     except KeyboardInterrupt:
         log.warning("Received interrupt, cleaning up...")
+        options.explore = False  # forcibly disable exploration
+
+    ################################################
+    ### PHASE 2: Optionally Explore Nearby Paths ###
+    ################################################
+
+    try:
+        if options.explore:
+            log.info("Starting explorers")
+            # backup list of predecessors for original trace
+            orig_preds = simgr._techniques[0].predecessors.copy()
+            # we no longer need the tracer exploration technique
+            simgr.remove_technique(tech)
+            assert len(simgr._techniques) == 0
+
+            # filter for if user wants us to only use a subset of plugins
+            allowed_explorers = None
+            if options.explore_plugins:
+                allowed_explorers = options.explore_plugins.split(',')
+
+            for explorer in list(plugins.explorers.loaded.values()):
+                e_short_name = explorer.__name__.split('.')[-1]
+                if not (allowed_explorers is None or e_short_name in allowed_explorers):
+                    log.info("Skipping explorer %s at user's request" % e_short_name)
+                    continue

+                log.debug("Invoking explorer: %s" % e_short_name)
+
+                # reactivate detectors because this is a new exploration
+                for detector in list(plugins.detectors.loaded.values()):
+                    detector.active = True
+
+                try:
+                    explorer_tech = explorer.explorer(orig_preds, bb_seq, options)
+                    simgr.use_technique(explorer_tech)
+                except KeyboardInterrupt:
+                    # let the outer try-except catch these
+                    raise
+                except:
+                    log.error("Uncaught exception while trying to setup explorer: %s" % format_exc())
+                    buggy_plugins.add(explorer.__name__)
+                    continue
+
+                mem_mgr.enable()
+                # step until explorer is complete
+                while not simgr.complete():
+                    try:
+                        simgr.step()
+                    except (KeyboardInterrupt, CriticalMemoryException, MemoryError) as ex:
+                        # let the outer try-except catch these
+                        raise ex
+                    except ReferenceError:
+                        # something is wrong with the active state, drop it, all explorers are
+                        # designed to react robustly to an empty active stash
+                        simgr.drop(stash='active')
+                    except:
+                        log.error("Uncaught exception raised by explorer plugin: %s" % format_exc())
+                        log.error("Stopping explorer and moving on")
+                        buggy_plugins.add(explorer.__name__)
+                        break
+
+                    if not hasattr(simgr._techniques[0], 'predecessors'):
+                        log.error("Detectors rely on explorers maintaining predecessors, which is missing, cannot continue")
+                        buggy_plugins.add(explorer.__name__)
+                        break
+
+                    if len(simgr.stashes['active']) > 0:
+                        if len(simgr.stashes['active']) > 1:
+                            log.warning("Explorer created %d active states, most detectors only examine one" % len(simgr.stashes['active']))
+
+                        check_for_vulns(simgr, proj)
+                        # this is a bit excessive, but because we don't know when an explorer is going to rewind
+                        # we have to check for new states to analyze after each step so simgr._techniques[0].predecessors
+                        # remains accurate
+                        analyze(simgr, bb_seq, reports)
+
+                    if psutil.virtual_memory().available <= min_memory:
+                        mem_mgr.reap_predecessors()
+                        if psutil.virtual_memory().available <= crit_memory:
+                            raise CriticalMemoryException('Memory critically low')
+
+                # cleanup explorer
+                log.debug("Explorer %s complete" % explorer.__name__)
+                mem_mgr.disable()
+                simgr.remove_technique(explorer_tech)
+                assert len(simgr._techniques) == 0
+
+    except KeyboardInterrupt:
+        log.warning("Received interrupt, cleaning up...")
+        mem_mgr.disable()
+    except (CriticalMemoryException, MemoryError):
+        log.error("Memory critically low, halting analysis")
+        mem_mgr.disable()
+    except AssertionError:
+        log.error("Failed assertion: %s" % format_exc())

     ################################
-    ### PHASE 2: Final Reporting ###
+    ### PHASE 3: Final Reporting ###
     ################################

     log.info("** Analysis complete, final results **")
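
The plugin filter in Phase 2 matches each loaded module's short name (the last component of `__name__`) against the comma-separated `--explore-plugins` value. The sketch below isolates that matching logic so it can be tried without angr. The function name `select_explorers` and the module name `plugins.explorers.other` are hypothetical, and unlike the diff this version also strips whitespace around each name.

```python
def select_explorers(loaded_names, explore_plugins=None):
    """Return the module names that pass the --explore-plugins filter.

    `loaded_names` stands in for the `__name__` of each loaded explorer
    module; `explore_plugins` is the raw comma-separated option value,
    or None to allow every explorer.
    """
    allowed = None
    if explore_plugins:
        # Assumption: tolerate spaces after commas (the diff does not).
        allowed = [name.strip() for name in explore_plugins.split(',') if name.strip()]

    selected = []
    for module_name in loaded_names:
        short_name = module_name.split('.')[-1]
        if allowed is None or short_name in allowed:
            selected.append(module_name)
    return selected

loaded = ['plugins.explorers.arg_max',
          'plugins.explorers.loop_bounds',
          'plugins.explorers.other']
chosen = select_explorers(loaded, 'arg_max, loop_bounds')
```

Because matching is by short name only, a value such as `plugins.explorers.arg_max` would select nothing, which is consistent with the help text's "by module name" examples.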
