Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

adds the symbolic executor #1105

Merged
merged 142 commits into from
May 28, 2020
Merged

adds the symbolic executor #1105

merged 142 commits into from
May 28, 2020

Conversation

ivg
Copy link
Member

@ivg ivg commented May 27, 2020

Finally, BAP can boast with its own symbolic executor! The executor
itself comprise a rather small part of this PR (roughly 25%, or 1kLoC)
the rest of the PR includes necessary changes that enable the symbolic
executor.

Before we delve into details, let's give a high-level picture of the
symbolic executor. It is a plugin (no library interface at this
release) which when enabled will track all inputs and compute their
symbolic values. Inputs are either automatically identified or
explicitly set using Lisp primitives, symbolic-value and
symbolic-memory. Two more primitives, symbolic-assert and
symbolic-assume could be used to specify verification conditions and
further refine symbolic models of various subsystems (aka symbolic
stubs). In addition to the symbolic computer we have the symbolic
executor itself that computes inputs trying to reach all parts of the
program. The algorithm is rather simple, we follow the concrete
execution path, and on each condition evaluation we enqueue a task
that refutes it. Each location is tried up to the certain number of
times, set with --primus-symbolic-executor-cutoff command-line
argument.

To get the idea of the symbolic executor capabilities you can look at
our current testsuite (see testsuite/src/symexec-*.c files), for
example this test is the hardened version of angr's fauxware (the
original version was too easy to break). In this test (as well as in
all others we can reach all parts of the program and compute inputs
that will lead to them, including the SOSNEAKY password).

Since the symbolic executor is driven by the concrete executor we rely
on the state randomization and Primus generators a lot. We implemented
a new primus-random plugin with a sophisticated interface that allows
specifying generators for registers, memory regions, and function
values. We also rewrote our random generator completely from scratch
to prevent unecessary aliasing. Another important addition is that now
we can generate words of abitrary sizes (previously we could generate
only bytes).

One of the significant limitation (if not to say a bug) of the Primus
Lisp primitives was that many operations were opaque and not
observable via the Primus interpreter observation interface, therefore
the symbolic computer wasn't able to track values to their origins. We
rewrote signficant number of primitives and expressed them in terms of
the Primus interpreter. We also added ite, branch, and repeat
operations to the Primus interpreter so that the primitives author can
express directly structured control flow. As the result, all Primus
Lisp control operators are correctly translated into path constraints,
so Primus Lisp could be now seen as an advanced interface to SMT with
theory that includes branches and loops, e.g.,

(defun main ()
  (declare (external "main"))
  (symbolic-assume (> RDI 0))
  (symbolic-assert 'a1 (= RDI 0))
  (symbolic-assert 'a2 (not (= (+ RDI RDI) 0)))
  (symbolic-assert 'a3 (not (= RDI 0)))
  (when (< RDI 16)
    (symbolic-assert 'a4 (= RDI 0))
    (symbolic-assert 'a5 (not (= (+ RDI RDI) 0)))
    (symbolic-assert 'a6 (not (= RDI 0)))))

Besides, there is now the eval-lisp command so you can run the above
code with

bap eval-lisp main --primus-lisp-load example --system bap:symbolic-executor --primus-print-obs=assert-failure,call
(call (main))
(assert-failure (a1 (RDI #x0000000000000001)))
(assert-failure (a2 (RDI #x8000000000000000)))
(assert-failure (a4 (RDI #x0000000000000002)))
(call (main))
(assert-failure (a1 (RDI #x0000000000000002)))
(assert-failure (a2 (RDI #x8000000000000000)))

Notice that the function was evaluated twice, since it has two paths.

Since the Primus Lisp interpreter and Primus Lisp primitives are now
using the main Primus interpreter for all operations and our Primus
Lisp type system is less strict than the Bil type system we decided to
relax the interpreter and injected a few coercions here and there. So
now it is ok to add two words with different sizes, the interpreter
will just coerce the to the size of the bigger word, and so one with
other operations.

While evaluation the new symbolic executor on various tests we added
many new stubs that handle various posix operations. We also
implemented the symbolic version of stdio, so now input files and
streams are treated symbolically. To enable reuse of the concrete
versions of function stubs we added an ability to overload functions
based on the system and used components. (We always had the
overloading facility, which was used to provide different
implementations for different ABI and architectures, now we added an
ability to overload based on the components that comprise the system
or system itself), for example adding

(declare (context (component bap:symbolic-computer)))

to a definition will make it applicable only in the context of a system that
has this component.

The symbolic executor uses z3 underneath the hood and is limited to
QF_FD logic. One of the drawbacks is that z3 requires custom runtime,
so baptop is currently broken. But we already fixed this issue
upstream, see Z3Prover/z3#4468, and backported
this patch via opam-repository to 4.8.8 so that we can use the latest
released version without having to wait for 4.8.9 (see
ocaml/opam-repository#16535).

Below is the summary of changes. If you have any questions about them
feel free to ask for clarifications. The commit messages are also
usually quite informative. As always reviews and comments are very
welcome.

List of changes:

  • adds support for modern C runtime
  • adds support for stack canaries
  • adds the default prototype
  • prettifies error printing (prunes backtraces)
  • adds the fini operation to System.run
  • adds ite, branch, and repeat Primus operations
  • complete rewrite of Primus generators (wide generators)
  • completely overhauls the random generators implementation
  • new primus-random plugin that controls Primus randomness
  • adds Primus.Env and Primus.Memory generated observations
  • extends the Priumus.Memory interface
  • extends the Primus.Env interface
  • new less heavy interface for Lisp primitives
  • relaxes Primus Interpreter typing rules
  • makes Primus Lisp interpreter more transparent
  • adds static and global variables to Primus Lisp
  • fixes Lisp msg operator
  • enables overloading based on systems and components
  • adds the Primus.Memory.add_region function
  • allows Primus execution from any basic block
  • makes values unique across different machines
  • adds the primus track visited library
  • adds the primus symbolic executor plugin
  • adds symblolic IO system
  • adds many new stubs
  • extends Primus Lisp's Dictionary interface
  • adds the default limit to Primus Limiter
  • reimplements Primus Lisp memory allocator (malloc)
  • adds the symbol-of-string primitive
  • adds the eval-lisp bap command
  • fixes the multisystem run observation subscription
  • splits the promiscuous mode into subcomponents
  • adds new primitives to the Primus region library
  • tweaks the core systems
  • fixes memcheck-malloc on strn* operations
  • adds incident deduplication
  • adds x86 non-standard registers intialization
  • extends the run plugin

ivg added 30 commits May 22, 2020 16:02
Generators now are able to produce objects of random width modulo that the
random value underneath the hood is still an integer.

TODO: drop the width parameter from the byte generator.
We're now using a much more efficient MSG algorithm with
the implementation that uses the new Bitvec module, so that
we can now generate arbitrary number of random bits and specify
any range we like.
it produces lots of false positives due to unwanted aliasing.
It uses malloc to allocate a place for the register and lisp
parameters to store the location.

We should also consider clearing errno as soon as possible as it is
the source of aliasing and lots of UAF and DF false positives.
basically, the idea is that each function that defines a buffer or a
string (e.g., malloc, strdup, memcpy, etc) imposes constraints on the
buffer that we may track. For example, if we break the bounds of the
strdup generated string it could be a soft alarm.
it is just a warning, there is no need to overreact
coerces boolean operations to bitvectors (we will keep booleans only
for actual logic formulas, i.e., for meta logic - to answer questions
whether a path is feasible and so on).
Now each generator will generate its own sequence. Note, we still
need to create a separate generator for each independent data location.
TODO: document it
TODO: publish bitvector interfaces
this plugin will control all the random generators... as much as they
could be controlled
ivg and others added 21 commits May 22, 2020 16:02
Not only it is possible to enable/disable it, but it is also now
possible to provide alternative competing runtime support without
modifying the existing code.

However, the main motivaiton to move it was the desire to get access
to the project data structure that enables a much more robust runtime
identification that works even in the absence of symbols.
fixes a type error in strcat

fixes an error in strncat
1) the greedy-promiscuous-executor was still mentioned by other
systems
2) yet another type error in strings
The convention is for each concrete feature `xxx` that needs a symbolic
counterpart to provide the `symbolic-xxx` feature.

The stdlib is right now empty we will start to fill it in with the
stuff after the release.
It is the same as the old `allocate` (which is still available, no
worries), but takes two words instead of one word and int, which makes
it possible to create regions whose size do not fit into OCaml int,
which is quite possible during execution in a randomized mode.

And makes the code much eaiser to write as now we don't need to deal
with conversion errors.
it will no longer fail on big numbers
so it is possible to map the whole 2^64 memory in one run
@ivg ivg added this to the 2.1.0 milestone May 27, 2020
@ivg ivg requested a review from gitoleg May 27, 2020 22:10
@ivg ivg self-assigned this May 27, 2020
@ivg ivg merged commit 9d8c51d into BinaryAnalysisPlatform:master May 28, 2020
@ivg ivg deleted the dev branch March 9, 2022 17:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants