Primus - the Microexecution Framework #651

Merged
merged 136 commits into from Apr 25, 2017

Conversation

Projects
None yet
2 participants
@ivg
Member

ivg commented Apr 24, 2017

This PR brings lots of new stuff to BAP, but the main contribution is Primus - the microexecution framework. It also adds few libraries, that are independent of BAP, but on which Primus depends, namely the Monads library, and the Ogre library. The PR also reimplements Beagle, and provides its internals as a library, as it will be explained later.

Primus

Primus is a framework for microexecution. The documentation will be added soon, so I will provide only short description. Primus consist of an Interpreter that can be extended with components. An interpreter together with the components (such as Memory, IO, Linker, Env) is called a Primus Machine. The interpreter itself is implemented as a mutistate monad transformer. The multistate monad is implemented in the Mondads library (described) below and is basically a State Monad that can have multiple states. In other words, the Primus Machine is a non-deterministic machine that can clone itself. The idea is to make it possible to solve NP-hard problems using polynomial approach by separating the concerns - a P-hard kernel, that is applied non-deterministically (in the Turing sense) to all possible clones of a machine. Putting it more simply, a default use-case is when an analyst is implementing the logic behind his analysis assuming that machine is deterministic and the execution is totally linear (thus totally ignoring the state explosion and path enumeration problems), and then chooses a particular mode of execution that will apply the analysis to some paths of execution.

Currently, the only mode of execution that is implemented (i.e., that is available out of the box from a command line) is a deterministic execution, that follows directly the small-step semantics of the language. Not particularly useful from the analysis perspective, it is mostly intended for verification and testing of the Primus framework, as well as the whole BAP as a white-box. In particular, we are able to execute a non-trivial binary, that uses dynamic memory allocation, string parsing, and IO (see our test-suite), and get the same output as the real execution would provide.

The primus linker can dynamically link code into our program abstraction. The linked code can be loaded from a real binary, stubbed with an arbitrary OCaml (and later with C or Python) program, that interacts with the Machine components directly, or, most commonly, the code can be implemented in the Primus Lisp. The Primus Lisp is a dialect of the Common Lisp, that basically can be seen as a concrete syntax for BIL. Thus the Primus Lisp is sort of an assembler for the Primus Machine. The easiest way to demonstrate the Primus Lisp is to show an example:

(defun strcpy (dst src)
  (declare (external "strcpy"))
  (let ((dst dst))
    (while (not (points-to-null src))
      (copy-byte-shift dst src))
    (memory-write dst 0:8))
  dst)

Since the abstract machine of the language matches with the primus interpreter, that is a CPU emulator, the Primus Lisp doesn't provide any abstract datatypes, instead its typesystem includes only a family of scalar types indexed with their bitwidth. Thanks to the macro system, as well as to different techologies borrowed from the elisp, such as function advisors, the language is still quite powerfull.

Monads Library

A new monad library is implemented as a part of the Primus effort. This libary doesn't depend on BAP, and provides a set of monad transformers for most (if not all) common Monads. The library provides a rich interface in the Core style, for example, it provides List and Seq modules that contain the corresponding container interfaces lifted into the monad. The old monads and monads transformers are deprecated and will be removed in BAP.2.0.0. To use the new library, add open Monads.Std to the top of a module, and -pkg monads to the compilation string.

OGRE

Ogre is an Open Generic REpresentation - a library and a specification for the NoSQL style database. It was originally designed as a representation that can consume Elves and Dwarves, as well as the information provided by COFF and Mach-O. Eventually, it evolved to a NoSQL database engine, akin to JSON, except that it uses SEXP data representation, disallows recursive datatypes, and follows the Third Manifesto Ideas, trying to be as close to the first order predicate logic, as possible. Unlike other contributions, this library is documented rather thoroughly, so an interested reader may proceed directly to the ogre.mli file.

The Bap_string library

The Bap_strings library provides several algorithms on strings that are useful in binary analysis and reverse engineering:

  • Strings.Detector - that uses maximum aposteriori likelihood estimator (MAP) to detect code that operates with textual data (aka Bayesian inference).
  • Strings.Unscrambler - that is capable of finding all possible words, that can be built from a bag of letters in O(1)
  • Strings.Scanner - a generic algorithm for finding strings of characters (a library variant of strings tool)

Strings plugin

The strings plugin implements a functionality of the conventional strings tool

Beagle

Beagle was totally rewritten in this PR. It now uses Primus framework, and relies on the Strings.Detector to dynamically detect strings with the requires probabilities of false positives and false negatives. It will be extended soon, with few new tricks.

Other

The run plugin, that actually takes a binary and runs it from an enty point, can be seen as a white-box test for the BAP framework as a whole. Any error in a lifter, or CFG reconstructor, or the Interpreter, will lead to an incorrect result, and, most likely to an immediate crash of the emulated program. Thus, during the development, we found and fix quite a few bugs.

ivg added some commits Jul 1, 2016

a first step to a better microxecution
* Removed trace from the Biri.context

   It will not store a trace in the context anymore. The trace was used
   only for evaluation of phi nodes, as we need to know where we came
   from.

   Since, phi nodes are rarely (if ever) occurs in real programs, we
   don't need to pay for them, if we don't use them. The default biri
   interpreter will evaluate all phi-nodes to unknown values. In case if
   it is needed we can implement an interpreter capable of evaluating
   phi-nodes under some enriched context.

* Added polymorphic `enter_term`/`leave_term` methods to the
  `Biri.context`. These methods by default provides lots of
  information to the context.

* Added `current` method to the `Biri.context` that evaluates to a tid
  of a currently evaluated term.

* Rewrote microx_conqueror

  The idea is to remove code from the context and interpreter and put it
  into reusable components. The derived classes must only dispatch
  information to these components. In particular, this patch will reify
  a traversing method and early escaping into first class values, that
  can be combined and reused. This is much better than reusing code with
  inheritance that is very fragile and non-modular.

** A trace limiter defines how and when a trace should be stopped. It is
   reified into `limit` abstract type, that is a first class value, that
   can be chosen dynamically and passed as a parameter to a
   context. Loop escaping mechanism, as well as trace bounding are
   implemented as instances of this type.

** Graph traversal policy is reified into a value of type `'s crawler`,
   that stores context of type '`s` as checkpoints. Three instances are
   prototyped:

    - deterministic - no checkpoints, only one path is executed;
    - exponential - every path is executed, leading to exponential blow up;
    - linear - only linearly independent paths are visited, leading to a
      linear complexity.

   Removed callstack tracking. No more special tracking of return
   statement. In non-deterministic mode we will rely on backtracking
   mechanism for escaping from dead ends like PLT entries. In
   deterministic mode we should provide PLT summaries for that or will
   stop.

* Small fix in the Table module. Some functions were accidentally
hidden.
generalized interpeters with a monad type
This commit will not touch the API, as bap.mli is remained
unchanged. The commit will serve as a testimony that the API is
unchanged, as all changes satisfy unchanged bap.cmi.

Ideally, we should make a CI test, that will check, that new cmis are
subsests of the old one (or that new implementations satisfy old cmis).
wip
just to make sure, that is stored somewhere else
added lot's of monad transformers
Option, List, Seq, Writer, Reader, State, Fun, Lazy, Cont

forgot to add Result...
added Result monad
Error and Reader monads now have Make and Make2 functors. The first will
create a monad with one type parameter, and will concretize state/error
to the value of type passed as a first parameter of
functor. Consequently, Make2 will create a monad with two type
variables, with the second ranging over arbitrary state/error types.
state monad now also has Make and Make2
only Cont is left, but I'm afraid of her...
implemented MultiState monad
a State monad with multiple states.
moved error into separate module
started implementing interpreter.
factored observation out of machine
the observation type, and the statement mandate do not depend on
the context itself.
implemented interpreter
it should be possible to implement tainter based
on this interpreter
implemented virtual memory interface
except the brk setter/getter. Neet to rethink how
much control I'm ready to give away...
rearranged the library
gathered everything under the std umbrella
fix a bug in the IR lookup function
The caching will break, after term is transformed,
this fix, will relax the requirement, and actually
will do the lookup more efficiently.
factored out components state out of machine monad
So that now we will create state keys once and only once.

ivg and others added some commits Apr 12, 2017

fixes few storage issues in Ogre and adds docs
The signature was stored and in the reversed order, so all tuples were
written in a reverse. Also fixed an issue with double quoting of string
parameters.
fixes an empty row query
if a row is empty we shouldn't even try to access to the columns.
adds more docs and few changes to the interface
We now more or less distinguishing between an error that denotes that a
document is not well-formed and a system error. They should not be
merged. Currently we're raising an exception in case of the latter, this
however brings us into a world of doomed who raises an exception from a
function that returns a result. Probably, in the future we will switch
from the Or_error.t, to some error type, that distinguishes between
different kind of errors.

Note, so far the documentation describe the desired behavior, not the
actual one. We still have some work to do there (actually type-checking
is not implemented).

This commit also adds a [merge] function, that takes two documents and
merges them.
adds attribute values typechecking
each added attribute must have correct arity and value representation.
adds a self-join operation
This commit makes it possible to select the same attribute multiple
times (i.e., to make a self join). It also provides a new kind of
variables - the positional variables. A string-like subscript is used to
create such variable, e.g., `fld.[n]` will bind a field [fld] of the
[n]'th attribute in a selection, e.g., to find all different students
with the same name:

```
      {[select (from students $ students)
          ~join:[[field name]]
          ~where:(id.[0] <> id.[1])]}
```

This commit also fixes a bug, that was triggered when an empty attribute
was selected as a part of a bigger selection.
a bunch of query optimizations
Got a ten-factor speedup, but there is still a room for improvement.
switches to a normalized form to fasten comparison
this commit adds an equality operator that is applied to fields without
parsing them. This removes a constant factor to less than a microsecond
per entry. The algorithm is still exponential and no query optimization
(e.g., reordering and filtering before product) is done. This the field
of the future improvements.
Ogre.foreach no longer requires a monadic result
Now a user of the `foreach` function can choose between monadic result
and immediate one. For the former, an `Ogre.Seq.all` function can be
used to move the monad out of the sequence.
Merge pull request #23 from gitoleg/microflux
added description for image backends in a Ogre format
moves image module to the OGRE rails
Image no longer uses the backend abstraction, except for the legacy
backends, for wich we derive an ogre specification from the backend.
passes both virtual and physical sizes to loader
This commit adds a small workaround for the legacy loader, with which we
can pass both the virtual and the physical size of a segment. The idea,
is that for each segment in a file, we create the same named section,
that has the same base address, but the size field is filled with the
virtual size of a segment. The solution is backward compatible with the
legacy backend behavior, hence if the virtual section is not provided,
the physical size will be used.

Implementation
--------------

1. On the backend side, a vsize field was added to the segment data
structure. The old trick with negative offsets is removed, instead both
sizes are stored in the segment data structures. After an image is
created, for each segment we add a section that has the same name and
base address, but uses the vsize field as the size.

2. In the Image module, when a mapped region is created we search for
the section that has the same name and base address as the segment, that
we are constructing. If such section is found, then we use its size as a
segment size, otherwise we fallback to the size that is provided by the
segment.

P.S. All tests are passing! (At least on a machine with IDA).
adds explicit casts for segment offsets
as clang emits an error without them.
removes dependency from the kasprintf function
this function is not available in OCaml 4.02.3 or less.
adds libc runtime detection heuristics
Hacky and defnitely at the wrong place but works. On the other hand, I
would be happy to remove it.
beagle et alas cleanup
now we are ready for the merge
@ivg

This comment has been minimized.

Show comment
Hide comment
@ivg

ivg Apr 25, 2017

Member

since it is nearly impossible to review such a big commit, I will merge it without a review. However, comments are still welcome.

Member

ivg commented Apr 25, 2017

since it is nearly impossible to review such a big commit, I will merge it without a review. However, comments are still welcome.

@ivg ivg merged commit 3fb406e into BinaryAnalysisPlatform:master Apr 25, 2017

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details

@ivg ivg deleted the ivg:primus branch Mar 7, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment