Final 1.3 update #688
This PR brings multiple new features from our development branches. In general this PR focuses on three main features:
Note for reviewers and readers: I don't expect you to check the code thoroughly, but you are welcome to ask questions, request clarifications, and discuss changes. Most likely, just reading this PR message will be enough for the review.
Note: the Table of Contents doesn't work, since GitHub doesn't emit anchors on the PR page. The links will work (I hope so) once we move this page to the release page.
Table of Contents
We added a type checker and now require all BIL code to be well-typed. Lifters must produce only well-typed code. For now, we issue a warning for each piece of code that is not well-typed; during the 1.4 cycle we plan to fix all such issues, if any. The
BIL Effect Analysis
A precise effect analysis was added to ensure correctness of program transformations. The effect analysis distinguishes between different effects and coeffects:
The raises effect includes the division-by-zero CPU exception: a conservative abstract interpretation (AI) infers whether a division by zero may be present in a BIL expression.
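To illustrate what a conservative inference of the raises effect means, here is a minimal sketch in Python. It is a toy model, not BAP's OCaml API: the `Int`, `Var`, and `BinOp` classes and the `may_raise` function are hypothetical names introduced only for this example. A division is treated as safe only when the divisor is a nonzero literal; anything else is assumed to possibly raise.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Int:
    value: int


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class BinOp:
    op: str  # one of "+", "-", "*", "/", "%"
    lhs: "Exp"
    rhs: "Exp"


Exp = Union[Int, Var, BinOp]


def may_raise(exp: Exp) -> bool:
    """Conservatively report whether evaluating exp may divide by zero."""
    if isinstance(exp, (Int, Var)):
        return False
    if may_raise(exp.lhs) or may_raise(exp.rhs):
        return True
    if exp.op in ("/", "%"):
        # Only a nonzero literal divisor is provably safe (an assumption of
        # this sketch); every other divisor is treated as possibly zero.
        return not (isinstance(exp.rhs, Int) and exp.rhs.value != 0)
    return False
```

The answer is an over-approximation: `x / y` is reported as possibly raising even if `y` never happens to be zero at run time, which is what keeps a transformation that relies on the answer sound.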
BIL Simplification and Constant Propagation
The old constant propagation and simplification algorithm was buggy and didn't take effects into account when removing statements and subexpressions. It was rewritten from scratch as a sound algorithm that is parameterized by the set of effects that may be ignored during the analysis. The new analysis treats operations accurately with respect to their algebraic properties and takes the associativity of operations into account. It is also now a big-step analysis and no longer needs to be wrapped into the
BIL normal form and normalization utilities
We introduce the notion of a BIL Normal Form, a sub-language of BIL that disallows certain constructions, such as
SSA transformation plugin
This is a simple plugin that just translates all functions into SSA form. It was moved from the bap-plugins repository.
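For readers unfamiliar with SSA form, the following Python sketch shows the core renaming idea on straight-line code only. It is not the plugin's implementation: the `to_ssa` function, the tuple encoding of statements, and the `name.version` naming scheme are assumptions made for illustration, and phi-nodes (needed at control-flow joins) are left out.

```python
def to_ssa(stmts):
    """Rename variables so that each one is assigned exactly once.

    stmts is a list of (dst, op, args) tuples, where args holds variable
    names (str) or integer literals. Version 0 denotes the value a
    variable had on entry to the block.
    """
    version = {}
    out = []

    def use(arg):
        if isinstance(arg, int):
            return arg
        return f"{arg}.{version.get(arg, 0)}"

    for dst, op, args in stmts:
        # Operands are renamed before the destination gets its new version,
        # so "x := x + 1" reads the old x and defines a new one.
        new_args = tuple(use(a) for a in args)
        version[dst] = version.get(dst, 0) + 1
        out.append((f"{dst}.{version[dst]}", op, new_args))
    return out


block = [
    ("x", "add", ("x", 1)),
    ("y", "mul", ("x", "x")),
    ("x", "sub", ("y", 2)),
]
ssa_block = to_ssa(block)
```

After the transformation every use refers to exactly one definition, which is what makes later analyses such as dead code elimination simpler to write.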
Dead code elimination plugin
A sound analysis that removes all unused code. This plugin is very useful, as it removes a lot of unnecessary code that is emitted by lifters. In particular, on x86 it removes flag computations that are not used anywhere. As a result, the output on x86 shrinks to 50% of its original size, and BIL becomes much more readable. It also greatly increases the speed of interpretation, especially since many flag computations are computationally expensive.
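The effect on flag computations can be sketched with a backward liveness pass over a single straight-line block. This Python toy is only an illustration under simplifying assumptions: every statement is a side-effect-free assignment, the set of variables live at the end of the block is given, and the `eliminate_dead` function and tuple encoding are hypothetical names, not the plugin's interface.

```python
def eliminate_dead(stmts, live_out):
    """Drop assignments whose result is never read.

    stmts is a list of (dst, uses) pairs: dst is the assigned variable and
    uses is a tuple of the variables read by the right-hand side.
    live_out is the set of variables still needed after the block.
    """
    live = set(live_out)
    kept = []
    for dst, uses in reversed(stmts):
        if dst in live:
            # This definition satisfies the pending reads of dst, and its
            # own operands become live in turn.
            live.discard(dst)
            live.update(uses)
            kept.append((dst, uses))
        # Otherwise nobody reads dst before it is overwritten: drop it.
    kept.reverse()
    return kept


block = [
    ("RAX", ("RAX", "RBX")),  # RAX := RAX + RBX
    ("CF", ("RAX", "RBX")),   # carry flag, overwritten below before any read
    ("ZF", ("RAX",)),         # zero flag, never read
    ("RCX", ("RAX",)),
    ("CF", ("RCX",)),
]
optimized = eliminate_dead(block, live_out={"RCX", "CF"})
```

In this example two of the five statements disappear, both of them flag updates, which matches the kind of reduction described above.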
New pretty printer for bitvectors
Printers from the Z library proved to be slow, buggy, and ugly. Thus we decided to provide our own pretty-printer that suits all our needs. We added a generic printer that allows a user to customize the look of the textual representation of a bitvector. We also provide 9 different instantiations of the generic printer, e.g.,
New functions for bitvectors
We added the
We also added the
New pretty printer for BIL
We implemented a new pretty printer that (i) no longer emits unnecessary parentheses and (ii) uses more concise syntax whenever possible. We followed the C language (not OCaml) for the operator precedence. Here is the precedence table:
The load and store expressions are also simplified: it is no longer necessary to specify the endianness and size if an expression is byte-sized.
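The parenthesis-minimizing part of such a printer can be sketched as follows. This Python toy is not the BIL printer: the `PREC` table lists only a few binary operators with illustrative numeric levels ordered as in C, and the `show` function and tuple encoding of expressions are assumptions of the example.

```python
# Higher number binds tighter; the relative order follows C.
PREC = {
    "*": 10, "/": 10, "%": 10,
    "+": 9, "-": 9,
    "<<": 8, ">>": 8,
    "&": 5, "^": 4, "|": 3,
}


def show(exp, parent=0, right=False):
    """Print a binary expression tree with as few parentheses as possible.

    exp is either a leaf (anything printable) or a tuple (op, lhs, rhs).
    parent is the precedence of the enclosing operator and right tells
    whether exp is its right operand.
    """
    if not isinstance(exp, tuple):
        return str(exp)
    op, lhs, rhs = exp
    prec = PREC[op]
    text = f"{show(lhs, prec)} {op} {show(rhs, prec, right=True)}"
    # All operators here are left-associative, so a right operand of equal
    # precedence keeps its parentheses: a - (b - c) is not a - b - c.
    needs_parens = prec < parent or (right and prec == parent)
    return f"({text})" if needs_parens else text
```

For instance, `show(("+", "a", ("*", "b", "c")))` yields `a + b * c`, while `show(("*", ("+", "a", "b"), "c"))` keeps the parentheses and yields `(a + b) * c`.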
Identifiable values for the Primus interpreter
We used to use the word type for values in the Primus interpreter. However, it turned out to be very hard to write a data-tracking analysis without being able to identify values. For example, in the taint propagation plugin, we implemented a stack machine for tracking the taint. It is not only hard to maintain and understand but also inefficient, which defeats the main reason we chose the word type as the value representation in the interpreter: efficiency.
We now represent a computation result as a pair of a word and a unique identifier. This is basically the same as Bil.Result.t, except that we do not use a variant type to represent storages and bottom values. In the Primus model there are no such values; everything is represented by a machine word coupled with an identifier.
To make it easier to work with the new values, we lifted most of the bitvector interface into the value interface.
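A small Python sketch may help to picture the word-plus-identifier representation and what lifting a bitvector operation to values looks like. The names `Value`, `fresh`, and `add` are hypothetical, and the sketch assumes that every operation yields a value with a fresh identifier; it does not reproduce the Primus interface.

```python
import itertools
from dataclasses import dataclass

_ids = itertools.count(1)  # source of unique identifiers


@dataclass(frozen=True)
class Value:
    word: int   # the machine word itself
    width: int  # its size in bits
    id: int     # unique identifier of this particular computation result


def fresh(word, width=32):
    """Wrap a word, truncated to width bits, into a newly identified value."""
    return Value(word & ((1 << width) - 1), width, next(_ids))


def add(x, y):
    """Bitvector addition lifted to values: the result gets its own id."""
    return fresh(x.word + y.word, x.width)


a = fresh(0xFFFFFFFF)
b = fresh(1)
c = add(a, b)  # wraps around to 0
d = fresh(0)   # same word as c, but a different value
```

Here `c` and `d` carry the same word but different identifiers, so an analysis can attach information (a taint, for example) to one of them without confusing it with the other.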
New Primus interpreter
The Primus Interpreter is reimplemented and doesn't use the eval or microx code anymore (though there is still a lot of code sharing, since we factored a lot of code out of microx into the BIL library). The interpreter now tracks data much more closely and provides an interface that allows code summaries, whether implemented in Lisp, in OCaml, or in any other language, to describe all effects of the code precisely. We also revisited the observations that are made by the interpreter, and moved memory and environment observations into the interpreter.
New Primus Mark Visited plugin
This plugin will mark each term that is visited (evaluated) by the Primus interpreter with the
Diagnostic messages with better backtraces
There is an impedance mismatch between OCaml exceptions and the error monad. Sometimes we switch from one to the other and lose the backtrace information. Now the backtraces are always stored in
Caveats, Limitations, and Regressions
Fixes and updates for the x86 lifter
The shift operation representation was slightly incorrect, as it was using
The shift lifter also introduced unnecessary temporary variables and used lots of ite expressions. The former are removed, and the latter are translated into a single if statement (basically, all the ite expressions were guarded by the same condition, so we can gather them, which leads to a much better CFG).
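The gathering of ite expressions guarded by the same condition can be pictured with a small Python sketch. It is a toy under stated assumptions, not the lifter's code: each assignment is taken to have the form `dst := ite(cond, value, dst)` (the variable keeps its old value when the condition is false), the condition is assumed not to be modified by any of the assignments, and `gather_ite` and the tuple encoding are hypothetical.

```python
def gather_ite(assigns):
    """Merge conditional assignments that share one guard into a single if.

    assigns is a list of (dst, cond, value) triples, each standing for
    dst := ite(cond, value, dst). Returns ("if", cond, body), where body is
    the list of unconditional (dst, value) assignments, or None when the
    guards differ and the statements cannot be gathered this way.
    """
    conds = {cond for _, cond, _ in assigns}
    if len(conds) != 1:
        return None
    (cond,) = conds
    return ("if", cond, [(dst, value) for dst, _, value in assigns])


flags = [
    ("CF", "n <> 0", "new_cf"),
    ("ZF", "n <> 0", "new_zf"),
    ("SF", "n <> 0", "new_sf"),
]
merged = gather_ite(flags)
```

Instead of three data-dependent selections, the result has one branch with three plain assignments, so the condition is tested once and the resulting graph has a single conditional edge.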
Bug fixes in ARM lifter
The ARM lifter was emitting badly typed code. Now it is fixed.
Support for tracing and other minor fixes
We added a handler for SIGINT so that gprof and other utilities that require a correct exit call can work properly. We also added BAP_DEBUG=true to the Travis environment to ease debugging on it.
Note: this is a work in progress. There are still a few minor issues that should be resolved before this branch is ready to be merged. Here is the list:
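The general technique, turning an interrupt into a regular exit so that exit-time hooks still run, can be sketched in Python as below. This is an analogy rather than the BAP change itself (which is in OCaml); the function names and the exit status 130 (128 plus the SIGINT number, the usual shell convention) are choices made for the example.

```python
import signal
import sys


def exit_on_sigint(signum, frame):
    """Convert SIGINT into a normal process exit.

    sys.exit raises SystemExit, so atexit hooks and finalizers still run,
    which is what tools that write their data at exit rely on.
    """
    sys.exit(130)


def install_sigint_handler():
    # Must be called from the main thread, as required by signal.signal.
    signal.signal(signal.SIGINT, exit_on_sigint)
```

A program would call `install_sigint_handler()` once during startup; pressing Ctrl-C afterwards then goes through the ordinary exit path instead of killing the process abruptly.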
Yeah, what you wrote there was (almost) exactly what I was suggesting. The main difference is that I'm not suggesting that you put all basic block addresses there, but instead the list you assumed it was possible to jump to when you did your dead code analysis, e.g. only the immediate successor nodes for the indirect jump. Hopefully this is usually a short list... We could even package it as a separate pass that you do before any analysis that assumes a complete CFG. This would get us two wins:
Yeah, @maurer, I think it would be a good idea to have this pass. The main concern is that it will by default clobber the output. We can alleviate this by rewriting the test like this:
It will be easier to analyse, and easier to eliminate clauses that are not feasible.
Yeah, looks nice, we will add this cfi-or-abort pass. But let's do it later, after the release.