adds the symbolic executor #1105

ivg · 2020-05-27T22:10:05Z

Finally, BAP can boast with its own symbolic executor! The executor
itself comprise a rather small part of this PR (roughly 25%, or 1kLoC)
the rest of the PR includes necessary changes that enable the symbolic
executor.

Before we delve into details, let's give a high-level picture of the
symbolic executor. It is a plugin (no library interface at this
release) which when enabled will track all inputs and compute their
symbolic values. Inputs are either automatically identified or
explicitly set using Lisp primitives, symbolic-value and
symbolic-memory. Two more primitives, symbolic-assert and
symbolic-assume could be used to specify verification conditions and
further refine symbolic models of various subsystems (aka symbolic
stubs). In addition to the symbolic computer we have the symbolic
executor itself that computes inputs trying to reach all parts of the
program. The algorithm is rather simple, we follow the concrete
execution path, and on each condition evaluation we enqueue a task
that refutes it. Each location is tried up to the certain number of
times, set with --primus-symbolic-executor-cutoff command-line
argument.

To get the idea of the symbolic executor capabilities you can look at
our current testsuite (see testsuite/src/symexec-*.c files), for
example this test is the hardened version of angr's fauxware (the
original version was too easy to break). In this test (as well as in
all others we can reach all parts of the program and compute inputs
that will lead to them, including the SOSNEAKY password).

Since the symbolic executor is driven by the concrete executor we rely
on the state randomization and Primus generators a lot. We implemented
a new primus-random plugin with a sophisticated interface that allows
specifying generators for registers, memory regions, and function
values. We also rewrote our random generator completely from scratch
to prevent unecessary aliasing. Another important addition is that now
we can generate words of abitrary sizes (previously we could generate
only bytes).

One of the significant limitation (if not to say a bug) of the Primus
Lisp primitives was that many operations were opaque and not
observable via the Primus interpreter observation interface, therefore
the symbolic computer wasn't able to track values to their origins. We
rewrote signficant number of primitives and expressed them in terms of
the Primus interpreter. We also added ite, branch, and repeat
operations to the Primus interpreter so that the primitives author can
express directly structured control flow. As the result, all Primus
Lisp control operators are correctly translated into path constraints,
so Primus Lisp could be now seen as an advanced interface to SMT with
theory that includes branches and loops, e.g.,

(defun main ()
  (declare (external "main"))
  (symbolic-assume (> RDI 0))
  (symbolic-assert 'a1 (= RDI 0))
  (symbolic-assert 'a2 (not (= (+ RDI RDI) 0)))
  (symbolic-assert 'a3 (not (= RDI 0)))
  (when (< RDI 16)
    (symbolic-assert 'a4 (= RDI 0))
    (symbolic-assert 'a5 (not (= (+ RDI RDI) 0)))
    (symbolic-assert 'a6 (not (= RDI 0)))))

Besides, there is now the eval-lisp command so you can run the above
code with

bap eval-lisp main --primus-lisp-load example --system bap:symbolic-executor --primus-print-obs=assert-failure,call
(call (main))
(assert-failure (a1 (RDI #x0000000000000001)))
(assert-failure (a2 (RDI #x8000000000000000)))
(assert-failure (a4 (RDI #x0000000000000002)))
(call (main))
(assert-failure (a1 (RDI #x0000000000000002)))
(assert-failure (a2 (RDI #x8000000000000000)))

Notice that the function was evaluated twice, since it has two paths.

Since the Primus Lisp interpreter and Primus Lisp primitives are now
using the main Primus interpreter for all operations and our Primus
Lisp type system is less strict than the Bil type system we decided to
relax the interpreter and injected a few coercions here and there. So
now it is ok to add two words with different sizes, the interpreter
will just coerce the to the size of the bigger word, and so one with
other operations.

While evaluation the new symbolic executor on various tests we added
many new stubs that handle various posix operations. We also
implemented the symbolic version of stdio, so now input files and
streams are treated symbolically. To enable reuse of the concrete
versions of function stubs we added an ability to overload functions
based on the system and used components. (We always had the
overloading facility, which was used to provide different
implementations for different ABI and architectures, now we added an
ability to overload based on the components that comprise the system
or system itself), for example adding

(declare (context (component bap:symbolic-computer)))

to a definition will make it applicable only in the context of a system that
has this component.

The symbolic executor uses z3 underneath the hood and is limited to
QF_FD logic. One of the drawbacks is that z3 requires custom runtime,
so baptop is currently broken. But we already fixed this issue
upstream, see Z3Prover/z3#4468, and backported
this patch via opam-repository to 4.8.8 so that we can use the latest
released version without having to wait for 4.8.9 (see
ocaml/opam-repository#16535).

Below is the summary of changes. If you have any questions about them
feel free to ask for clarifications. The commit messages are also
usually quite informative. As always reviews and comments are very
welcome.

List of changes:

adds support for modern C runtime
adds support for stack canaries
adds the default prototype
prettifies error printing (prunes backtraces)
adds the fini operation to System.run
adds ite, branch, and repeat Primus operations
complete rewrite of Primus generators (wide generators)
completely overhauls the random generators implementation
new primus-random plugin that controls Primus randomness
adds Primus.Env and Primus.Memory generated observations
extends the Priumus.Memory interface
extends the Primus.Env interface
new less heavy interface for Lisp primitives
relaxes Primus Interpreter typing rules
makes Primus Lisp interpreter more transparent
adds static and global variables to Primus Lisp
fixes Lisp msg operator
enables overloading based on systems and components
adds the Primus.Memory.add_region function
allows Primus execution from any basic block
makes values unique across different machines
adds the primus track visited library
adds the primus symbolic executor plugin
adds symblolic IO system
adds many new stubs
extends Primus Lisp's Dictionary interface
adds the default limit to Primus Limiter
reimplements Primus Lisp memory allocator (malloc)
adds the symbol-of-string primitive
adds the eval-lisp bap command
fixes the multisystem run observation subscription
splits the promiscuous mode into subcomponents
adds new primitives to the Primus region library
tweaks the core systems
fixes memcheck-malloc on strn* operations
adds incident deduplication
adds x86 non-standard registers intialization
extends the run plugin

Generators now are able to produce objects of random width modulo that the random value underneath the hood is still an integer. TODO: drop the width parameter from the byte generator.

We're now using a much more efficient MSG algorithm with the implementation that uses the new Bitvec module, so that we can now generate arbitrary number of random bits and specify any range we like.

it produces lots of false positives due to unwanted aliasing.

It uses malloc to allocate a place for the register and lisp parameters to store the location. We should also consider clearing errno as soon as possible as it is the source of aliasing and lots of UAF and DF false positives.

basically, the idea is that each function that defines a buffer or a string (e.g., malloc, strdup, memcpy, etc) imposes constraints on the buffer that we may track. For example, if we break the bounds of the strdup generated string it could be a soft alarm.

it is just a warning, there is no need to overreact

coerces boolean operations to bitvectors (we will keep booleans only for actual logic formulas, i.e., for meta logic - to answer questions whether a path is feasible and so on).

Now each generator will generate its own sequence. Note, we still need to create a separate generator for each independent data location.

TODO: document it TODO: publish bitvector interfaces

this plugin will control all the random generators... as much as they could be controlled

forgot to commit it

Not only it is possible to enable/disable it, but it is also now possible to provide alternative competing runtime support without modifying the existing code. However, the main motivaiton to move it was the desire to get access to the project data structure that enables a much more robust runtime identification that works even in the absence of symbols.

fixes a type error in strcat fixes an error in strncat

1) the greedy-promiscuous-executor was still mentioned by other systems 2) yet another type error in strings

instead of the parameters

The convention is for each concrete feature `xxx` that needs a symbolic counterpart to provide the `symbolic-xxx` feature. The stdlib is right now empty we will start to fill it in with the stuff after the release.

It is the same as the old `allocate` (which is still available, no worries), but takes two words instead of one word and int, which makes it possible to create regions whose size do not fit into OCaml int, which is quite possible during execution in a randomized mode. And makes the code much eaiser to write as now we don't need to deal with conversion errors.

it will no longer fail on big numbers

so it is possible to map the whole 2^64 memory in one run

ivg added 30 commits May 22, 2020 16:02

changes the trace representation and some initial design

2c7e14a

implements deduplication and classification incidence

9c13257

enables wide generators

2044c0d

Generators now are able to produce objects of random width modulo that the random value underneath the hood is still an integer. TODO: drop the width parameter from the byte generator.

revamps Primus Generators

d827000

We're now using a much more efficient MSG algorithm with the implementation that uses the new Bitvec module, so that we can now generate arbitrary number of random bits and specify any range we like.

enables wide generators for memory

8f25753

makes wide generators available through memory-allocate

5380457

hardcodes generators for registers and memory in promiscuous mode

e9a0087

disables alloc-on-free hack

75b333a

it produces lots of false positives due to unwanted aliasing.

fixes the memcheck rules for strn* family of functions

56ae344

EXPERIMENTAL: implements errno support

8005cf9

It uses malloc to allocate a place for the register and lisp parameters to store the location. We should also consider clearing errno as soon as possible as it is the source of aliasing and lots of UAF and DF false positives.

hushes ida error message

596a1cd

it is just a warning, there is no need to overreact

adds the generated observations for undefined vars and pointers

10bd4fb

the initial implementation of the Primus symbolic executor

5940c6c

adds the oasis file for the symbolic executor

d4ed4b9

makes z3 typechecker happy

2f4eec5

coerces boolean operations to bitvectors (we will keep booleans only for actual logic formulas, i.e., for meta logic - to answer questions whether a path is feasible and so on).

abstracts the input type (previously known as origin)

b36970b

adds the solver

821ca74

implements eq and neq using only only the BitVector logic

ccde6c2

adds salt to generators

410b3ce

Now each generator will generate its own sequence. Note, we still need to create a separate generator for each independent data location.

UNFINISHED: publishes the of_iterator function

2a9eb50

TODO: document it TODO: publish bitvector interfaces

adds the random plugin

8cdb816

this plugin will control all the random generators... as much as they could be controlled

makes the generators syntax a little bit less ugly

842bca2

implements memory randomization

2024d25

registers symbolic executor components using the new interface only

fd933b0

splits promiscuous mode into several independent components

0062cae

restructures the core system

a692126

adds the translation from formula values to words

a7334ef

don't resubscribe printers every time a new system is started

a0d134b

implements the master-forker process

97eb3e9

ivg and others added 21 commits May 22, 2020 16:02

changes the cutoff algorithm

628e54c

code cleanup

e8dd83e

adds the fflush prototype

c4d857f

adds the only-queue option to the run-entry-points parameter

385d55d

adds crc32 test to the testsuite

ade70b2

forgot to commit it

adds symbols to the test binaries to enable test passing on Travis

9168228

fixes type errors in strcat and strncat

32c24b8

fixes a type error in strcat fixes an error in strncat

fixes issues in the tests

ad48d0a

uses core_kernel in primus-random to enable older ocaml versions

4d97e13

fixes the cache test

ca058f7

fixes few small bugs

a58f717

1) the greedy-promiscuous-executor was still mentioned by other systems 2) yet another type error in strings

uses static binding for the errno-location

f7ca2b7

instead of the parameters

renames symbolic-io to symbolic-stdio adds symbolic-stdlib

922a6ef

The convention is for each concrete feature `xxx` that needs a symbolic counterpart to provide the `symbolic-xxx` feature. The stdlib is right now empty we will start to fill it in with the stuff after the release.

switches to add_region in primus-loader

542837e

switches the memory-allocate primitive to add_region

11fa2df

it will no longer fail on big numbers

primus random now also uses add_region

0724e24

so it is possible to map the whole 2^64 memory in one run

switches the symbolic memory to add_region

a83500a

adds the documentation for the Generator.of_iterator function

a2595fe

drops a bogus observation

ed2e01c

ivg added this to the 2.1.0 milestone May 27, 2020

ivg requested a review from gitoleg May 27, 2020 22:10

ivg self-assigned this May 27, 2020

ivg added the symbolic-executor label May 27, 2020

gitoleg approved these changes May 28, 2020

View reviewed changes

ivg merged commit 9d8c51d into BinaryAnalysisPlatform:master May 28, 2020

gitoleg mentioned this pull request May 29, 2020

BAP 2.1.0 release ocaml/opam-repository#16548

Merged

gitoleg mentioned this pull request Jun 25, 2020

heap-overflow check generates wrong incidents locations #1152

Closed

ivg deleted the dev branch March 9, 2022 17:16

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

adds the symbolic executor #1105

adds the symbolic executor #1105

ivg commented May 27, 2020

adds the symbolic executor #1105

adds the symbolic executor #1105

Conversation

ivg commented May 27, 2020