Commits on Dec 11, 2018
  1. makes dead code elimination less conservative (#906)

    ivg authored and gitoleg committed Dec 11, 2018
    * maked dead code elimination less conservative
    As was noted in #905 the optimization passes sometimes misses an
    opportunity to remove obvious assignments. The underlying reason
    is that we compute a very conservative set of variables which are
    used for interprocedural communication. And if a variable is a
    member of this set then we do not delete it (under assumption, that
    it could be used in other subroutines).
    What was overconservative is that all versions of the same variable
    were protected, while only the last definition one could be used for
    interprocedural communication. The propsed change relaxes the
    protected set, and removes only the last version of a variable from
    the set of the dead variables.
    Fixes #905
    * makes it sound
    Unfortunately, the original proposal was unsound, as the assumption
    that only the last definition should be preserved is implied by the
    assumption that only the last definition is used for interprocedural
    communication, which is totally wrong.
    The liveness property is propagated from each call (and indirect jump)
    to all members of the protected set. We're using the SSA form for
    def-use analysis which is inherently intraprocedural, so to preserve
    the liveness of the variables with the interprocedural scope we use
    the protected set as an overapproximation.  A tighter approximation
    would be to reinstantiate their liveness after each call or indirect
    jump. However, this would require tampering with the ssa algorithm,
    that we don't want to do right now.
    The updated solution reinstantiates liveness of protected variables
    at the end of each block, even the block is not a call or even jump.
    The liveness is killed by the first use of protection, so it is
    propagated only to the last (if any) definition.
    * removes a dead function
    note: this PR shall be merged after #907 is merged
    * drops debug info
    * drops a change that is irrelevant to the proposal
    to make it cleaner
  2. fixes the free-vars-by-dominators computation algorithm (#907)

    ivg authored and gitoleg committed Dec 11, 2018
    This algorithm kicks in in the optimization-level=3 and ... breaks
    everything. The bug was trivial, the kill set was computed before
    the live set, hence a block was defining its own variables before it
    was used.
    This bug affects only 1.5 version with optimization-level=3.
    Note, there is still a subtle problem with this algorithm that comes
    from the "Everything Dominates Unreachable Block" lemma in the
    dominators algorithm. And altough this theorem is valid wrt to the
    graph theoretic definition of unreachable, which indeed implies that a
    block is not reachable under no circumstances, in the binary analysis
    an unreachable block is just a block that might be reachable we just
    don't know by which paths. In other words, we have an
    under-approximation of a control flow graph, and the unsoundness of
    the graph is propagated to the unsoundness of the dominator tree
    computation. That lies more or less in the vein with the idea of the
    optimization-level=3 which is as as unsound as the underlying graph.
Commits on Oct 10, 2018
  1. fixes the --list-recipes option (#888)

    ivg authored and gitoleg committed Oct 10, 2018
    it was broken since after we enabled absolute path loading
Commits on Sep 18, 2018
  1. adds constant tracking analysis (#872)

    ivg authored and gitoleg committed Sep 18, 2018
    The analysis tracks static constants in a binary, i.e., values
    are hardcoded in a program.
    For the purposes of the analysis a value is a static constant if
    it was initialized from a constant value or computed from a static
    constant value.
    The constant tracker analysis can be called from Primus Lisp using
    the `all-static-constants` and `points-to-static-data` functions.
    See `--primus-lisp-documentation` for the documentation on this
    Along with the constant tracker this PR adds some sample analysis.
    For example, check-hardcoded-values reports the
    `hardcoded-socket-address` incident every time a hardcoded socket
    address is used in one of the BSD Socket API function.
    Another analysis is implemented in the `check-null-pointers` module
    that reports a `null-pointer-dereference` incident every time a static
    NULL pointer is dereferenced.
Commits on Sep 13, 2018
  1. provides information sensitivity analysis (#871)

    ivg authored and gitoleg committed Sep 13, 2018
    The analysis is based on the Primus Taint Engine and is fully
    implemented in Primus Lisp and consists of two modules:
    - taint-sources: is a module that automatically detects untrusted
    user inputs and correspondingly introduces taint into the system;
    - sensitive-sinks: monitors functions that are considered sensitive
    to untrusted/unsanitized user inputs, and reports policy violations
    if such functions are called with tainted arguments.
    It is possible to specify arbitrary sanitization procedures (and in
    general extend the policy with your own rules).
    The code is thoroughly documented, so it should be more or less
    obvious on how to use it.
Commits on Jun 8, 2018
  1. adds traps for memory and division operations (#840)

    ivg committed Jun 8, 2018
    * adds traps for memory and division operations
    Division by zero as well as faulty memory operations may terminate the
    Primus Machine. Previously we were just terminating it with a specific
    exception. However, this exceptions should actually be represented by
    an CPU/ABI-specific interrupt or trap. The proposed implementation
    provides a mechanism that allows:
    - trap the exception and translate it to a machine specific
    - ignore or fix it;
    - catch it on a low-level.
    The mechanism is based on the same idea as in the Linker that used the
    primus_unresolved_handler to trap unresolved names. For each trap we
    provide a corresponding observation, that could be used to install the
    trap. The trap itself, is a special name, that could be linked (either
    on a permanent basis, or from the observation). If the trap hanlder is
    assigned, then it is invoked. Concrete behavior depends on a
    particular trap, e.g.,
    - for linker trap - the hanlder is invoked instead of the missing code;
    - for the division by zero - if the handler returns
      then the result is undefined;
    - for memory fault - the trap should fix the problem or terminate.
    We also introduce the Pagefault exception, to represent memory
    traps. We keep segfault as a non-maskable (non-preventable)
    In addition, we have provided several new operations in the Linker
    - unlink: for code unlinking, useful for removing traps
    - lookup: useful for restoring other's traps
    As a showcase, we also reimplemented some parts of the promiscuous
    mode. Now we use the pagefault trap to prevent segmentation
    faults. Also the fixing is more efficient as instead of mapping one
    byte pages, we are mapping pages with size of up to 4k.
    Besides others, this commit will also provide a fix for #839.
    * fixes a typo in the pagefault handler function name
Commits on Jun 4, 2018
  1. Several primus fixes (#841)

    ivg committed Jun 4, 2018
    * fixes the order of the trace events
    previously we were leaving a function after the first block (and all
    other operations attached to the function leaving event were performed
    after the entry block)
    * fixes the kill operation in the Machine implementation
    The kill operation is a hard beast, as it is very hard to suicide,
    i.e., to kill the current machine. It is also hard to routinely kill
    other machines in the scheduler, as usually during the rescheduling we
    are in the context of a machine that we would like to kill, and
    switching to another machine will take control from us.
    Previous implementation of kill was just wrong, it was killing the
    state without switching the continuations. In case of the self-killing
    it basically was switching the state while leaving the machine in the
    same continuation.
    The proposed implementation, delays the machine killing if it is the
    current machine, and performs it on a next fork.
    Note, an easier to use (and to implement) primitive, would be [die x]
    that will kill the current machine and tranfer the control to the
    machine [x]. However, we can't add new primitives to the Primus
    Machine interface without bumping the version. So let's keep it for
    the future Primus 2.0
    * few documentation updates (totally irrelevant to Primus)
    sorry for putting it here, just don't want to create a separate PR for
    such a minor stuff
    * adds a couple of new functions to the posix interface
    * prevents memory leaking by killing finished machines
    Previously, we were waiting until the end and didn't kill any
    The proposed code will kill finished machines in the greedy
    scheduler. This saves tons of gigabytes of memory.
    * fixes a bug in the mark-visited plugin, adds progress reports
    The progress report will show the percentage of visited basic blocks,
    basically the code coverage.
    Also, previously marker was broken as it was marking all terms.
    * removes the memory corruption check
    memory corruption check was verifying that a freed pointer was
    allocated. In real life it has an incredible amount of false
    * fixes a bug in the primus's warn-unused analysis
    also extends it a bit.
    * adds machine numbers to the incident backtraces
    in general it is very useful to see where the backtrace changes the
    machine id, in case of the greedy scheduler and promiscuous mode those
    are exactly the points where infeasibility could be introduced.
    * changes the order in which machines a run by the run plugin
    When all-subroutines are selected, then we're running them in the
    topological order, starting with those functions that are marked as
    entry points (usually the _start function in case of an executable).
    Also added an option to run each machine in isolation (i.e., in a
    separate computation), might be useful for someone.
Commits on Apr 30, 2018
  1. a few fixes to primus et al (#831)

    ivg committed Apr 30, 2018
    * renames eval_cond to eval-cond
    All our observations, plugings, options, passes attributes, and other
    textual identifiers use lisp-style dash separated notation, so
    `eval_cond` looks very inconsistent.
    * makes parameter evaluation visible
    So far we were bypassing the interpreter when Lisp parameters were
    evaluated. However, we would like to see them as first-class
    In particular it helps us to track which values are used undefined.
    * adds a generic taint-attached observation
    this observation occurs every time a taint is attached to a value.
    * fixes the compare macro in Primus Lisp
    not that the macro was wrong, but it was not doing what was
    expected. The old version had type `'a -> 'a -> 'a`, i.e., when
    it is applied to a byte it will return a byte, and so on. This made
    using this macro a little bit hard, as we need to apply casting as
    usually we need just int, so our new type is `'a -> 'a -> int`.
    * propagates branch computations in the promiscuous mode
    This is actually a major breakthrough, as this issue was preventing
    taint analysis to work in promiscuous mode as it was killing all
    taints on the branch conditions.
    The long story - in the promiscuous mode we're flip-floping branch
    conditions to reach as much code as possible. However, since we're
    just assigning `1` to the branch condition we loose all taints
    associated with the existing condition (or any other information that
    other analyses can add).
    The solution is to derive the new value from the old one. I'm using
    `1 \/ c` if we need to assign `1` and `0 /\ c` if we need to assign
    * changes the machine spawning algorithm in the run pass
    The previous implementation was spawing machines by unhierarchical
    depth-first order and was essentially creating 2^n resumptions.
    The new aproach uses BFS, is linear, and, more important, is easier to
    reason about.
    * fixes a typo in the sign macro
Commits on Apr 12, 2018
  1. intoduces a dependency between callsites and abi passes (#822)

    ivg committed Apr 12, 2018
    I was really reluctant since the very beginning to introduce this
    dependency, since we shouldn't depend on implementations but on
    abstractions, but so far it looks like the easy solution, that, I
    hope, won't last too long.
    The Problem
    `bap ./exe --taint-reg=malloc_result` may or may not work, depending
    on the order of pass invocations. The root of the problem is that the
    taint plugin depends on the callsites plugin, that requires the
    argument information to be present in the subroutine
    definitions. Since, the `abi` pass is an autorun pass it is aplied
    before all other passes, including the `callsites` pass. However,
    since for the better user experience, (haha) we decided to mark
    the `taint` pass as an autorun pass, it is also run in the autorun
    phase along all of its dependencies. So, it is quite possible that it
    will be run before the abi pass.
    The Solution
    I've added the dependency from the `abi` pass and altough I'm
    insisting that we shouldn't depend on implementations, but on
    abstractions, we don't have a better solution. Also `abi` pass
    became an abstraction de-facto in bap, as it is actually a pass
    that has its own pipeline of passes. We may devise a better solution
    later, as currently @gitoleg is working on a generic solution with
    services and service providers that will allow us to build right
    dependencies between our analyses.
Commits on Apr 9, 2018
  1. enhances and documents recipes (#815)

    ivg committed Apr 9, 2018
    In case if you happen to know already what recipes in bap are, then
    the main chainges are summarized below:
    - the recipe is no longer requred to be a zip file, plain files, or
      directories are now accepted
    - the recipes interface is now documented and a couple of issues were
      fixed during the process
    - filenames are no longer substituted, however there is now a builtin
      parameter `$prefix` that is substituted with the path to the recipe
      root folder.
    For the rest, I'm copypasting excerpts from the bap man page, that
    describe the recipe system in detail as well as few motivational
           A recipe is either a single file or a directory (optionally zipped)
           that contains a parametrized specification of command line parameters
           and support files if necessary.
           The purpose of the recipe is to make bap runs reproducible, so that you
           can share one simple file - the recipe, instead of a bunch of scripts
           together with some verbal instructions. Since recipes introduce an
           extra layer of abstraction they actually simplify interaction with bap
           by hiding unnecessary details. Recipes also provide a mechanism for
           building ready solutions and sharing them with users.
           To use a recipe, just specify its name using the --recipe command line
           parameter. If a recipe has parameters then they could be specified as
           colon separated list of <key>=<value> pairs. See the --recipe parameter
           for more information. To read the recipe description, including the
           list of parameters and resulting command line, use the --show-recipe
           command line option. To list all installed recipes use the
           --list-recipes option.
           The main (and the only necessary) part of a recipe is the recipe
           specification, that is a file that contains a list of recipe items in
           an arbitrary order. Each item is either a command line option, a
           parameter, or a reference to another recipe. All items share the same
           syntax - they are flat s-expressions, i.e., a whitespace separated list
           of strings enclosed in parentheses. The first string in the lists
           denotes the type of the item. E.g., (option run-entry-points malloc
           calloc free).
           The option command requires one mandatory parameter, the option name,
           and an arbitrary number of arguments that will be passed to the
           corresponding command line option. If there are more than one argument
           then they will be concatenated with the comman symbol, e.g., (option
           opt a b c d) will be translated to --opt=a,b,c,d. Option arguments may
           contain substitution symbols. A subsitution symbol starts with the
           dollar sign, that is followed by a named (optionally delimited with
           curly braces, to disambiguate it from the rest of the argument). There
           is one built in parameter called prefix, that is substituted with the
           path to the recipe top folder. See the parameter command to learn how
           to introduce parameters.
           The parameter command introduces a parameter to the recipe, i.e., a
           variable ingredient that could be changed when the recipe is used. The
           parameter command has 3 arguments, all required. The first argument is
           the parameter name, the second is the default value, that is used if
           the a parameter wasn't set, and the last argument is the parameter
           description. The substitution symbol will be replaced with the default
           value of a parameter, if a value of the parameter wasn't passed through
           the command line. Example,
               (parameter depth 128 "maximum depth of analysis")
               (option analysis-depth depth)
           If the parameter is not set through the command line, then it will be
           substituted with 128 otherwise it will receive whatever value a user
           has passed. Finally, the extend command is like the include statement
           in the C preprocessor as it includes all the ingredients from another
           recipe. (Make sure that you're not introducing loops!). The command has
           one mandatory argument, the name of the recipe to include.
               recipe ::= {<recipe-item>}
               recipe-item ::= <option> | <parameter> | <extend>
               option ::= (option <atom> {<atom>})
               parameter ::= (parameter <atom> <atom> <atom>)
               extend ::= (extend <atom>)
    And here comes the description of the `--recipe` command line
               Load the specified recipe. The  parameter specifies the name of the
               recipe along with an optional colon separated list of arguments.
               The  parameter has the following syntax: <name> : <arg1>=<val1> :
               <arg2>=<val2> : .... The <name> part could be either a path to a
               valid recipe or the name of a recipe, that would be search in the
               recipe search paths. A name may omit .recipe or .scm extensions for
               brevity. The valid representations of a recipe is either the recipe
               file itself (i.e., a file consisting of a list of s-expressions), a
               directory with a valid undefined file, or zip file, that contains a
               valid recipe directory. See the RECIPES section for more
               information. If a recipe is a zip file then it will be unpacked to
               a temporary folder that will be removed after the program
               terminates. Otherwise, all recipe resources will be used as is and
               won't be restored to their previous state, e.g., if the
               input/output files were provided, and they were modified, the
               modifications will persist.
    This is how the `--list-recipes` option works:
    $ bap --list-recipe
    uaf                              Performs Use-After-Free Analysis
    fuzz                             Testing Recipe
    derive-inline                    Description was not provided
    direct                           Testing Recipe
    And this is how a description of the recipe is printed:
    $ bap --show-recipe uaf
    Performs Use-After-Free Analysis
    Uses the Primus promiscuous mode and fuzzes a binary through the specified
    $entry points with the maximum depth set to $depth RTL instructions. Outputs
    all found Use-After-Free incidents into the `incident` file.
    - depth - maximum number of instructions per path (defaults to 8192)
    - width - maximum number of visits to the same block (defaults to 128)
    - entry - entry points (defaults to all-subroutines)
    --run \
    --run-entry-points=all-subroutines \
    --primus-greedy-scheduler \
    --primus-limit-max-length=8192 \
    --primus-limit-max-visited=128 \
    --primus-promiscuous-mode \
    --primus-lisp-load=posix,memcheck-malloc,limit-malloc \
    --primus-print-obs=incident,incident-location,call-return \
    --primus-print-rules=/tmp/recipe-943c0231-491a-43a5-86de-e6c2c662396f.unzipped/select.scm \
    --primus-print-output=incidents \
    --primus-lisp-channel-redirect=<stdin>:/tmp/recipe-943c0231-491a-43a5-86de-e6c2c662396f.unzipped/stdin,<stdout>:/tmp/recipe-943c0231-491a-43a5-86de-e6c2c662396f.unzipped/stdout \
    --report-progress \
  2. fixes an incorrect store operation introduced by #806 (#816)

    ivg committed Apr 9, 2018
    * fixes an incorrect store operation introduced by #806
    It was possible to store a word of arbitrary width in our byte
    storage. It is fixed now, and I've also updated documentation and
    added few sanity checks.
    * fixes the endianess bug in the memory primitives
    even for one byte write the endianness matters, when the argument is
    not a byte.
  3. fixes a few bugs in Primus (#818)

    ivg committed Apr 9, 2018
    * skips evaluations of the mem variable in the store operation
    since this variable doesn't have any meaningfull value
    * fixes the cleanup procedure after term evaluation
    in case of abnormal termination the cleanup procedure wasn't called.
    * ignores return statements
    this is a big change actually, and we will return to it later, to make
    it more robust.
    If we won't ignore them, then we will have double returns, the first
    one when a function is returned based on the IR return statement (if
    such is present and is well defined) and the second one, when function
    exec finishes and we finally call the return destination. That's was a
    disaster, and this is the fast solution. I will evolve it later
    * adds machine-switch and machine-fork observations
    also simplifies swith and fork operators, for some reason we were
    holding a machine identifier in the stored continuation, probably
    this is a reminiscent of the debugging stage, in any case we dont
    need it anymore.
    * halts machine after switch ion the greedy scheduler
    it shouldn't have any observable effect, though it is better to make
    it explicit.
    * disables call bypassing
    When we make a call we add a failsafe machine that will resume the
    computation in case if the call doesn't return. We shouldn't use
    this failsafe machine if everything went allright.
    * handles empty blocks with branches correctly
    * few cosmetic changes
    * fixes the parameter order in linked lisp stubs
    One of the previous commits [1] in PR #798 was claiming that we do not
    need to preserve the order of the arguments, so we can provide a more
    efficient frame allocation.
    Apparently, we do need to preserve the order, at least of the proper
    signaling the `call` and `call-return` messages. This commit reversed
    the order of the arguments, since they were reversed.
    [1]: 6d5afcd
    * fixes strncpy, strncmp and adds strpbrk summaries
    * parametrizes memory allocator with the zero-sentinel
    that's a value that is returned when malloc is called with the zero
    * implements a buffer overflow check
    This check is a part of the memcheck check suite and it verifies that
    functions from the string.h API are called correctly. The correctness
    property is the following, if the pointer to the begining of a string
    that is passed to the function, belongs to some heap region, then the
    pointer to the end of this string must belong to the same heap region
  4. hardens IDA plugin (#817)

    ivg committed Apr 9, 2018
    To run IDA we cd to a newly created temporary folder. If IDA fails,
    then we end up there. This not only breaks the rest of BAP, but
    produces extremely confusing error messages, such as `bap ./exe` will
    produce `Sys error: ./exe no such file or directory`.
Commits on Mar 28, 2018
  1. changes the Primus memory representation to preserve value identity (#…

    ivg committed Mar 28, 2018
    Historically, our implementation of the memory storage was a mapping
    from bitvectors to bitvectors, thus any store operation was forgetful
    as the identity of a value was discarded (where a value in Primus is a
    tuple of data and unique identifier). For a long time we were
    considering this as a feature, articulating that `x` and `load
    a (store a x)` are two different values, even though, they are usually
    structuraly equal. However, it was also observed, that in many cases
    during the analysis we would like to preserve this identities, thus we
    were forced to maintain our own mappings between words and value
    identities in this or that way.
    The proposed solution should satisfy both camps, those who believe
    that `x` and `load a (store a x)` are in general different, and those
    who say, that in practice they are equal as in Leibniz equality. The
    memory component will now store values instead of bitvectors, so it
    won't loose any information. At the same time components may define
    semantics of the load/store operations as they wish, e.g., to
    implement a mapping of control registers to memory, and other
    non-trivial memory representations and mechanisms.
    By default, however, the interpreter will ensure that `load a (store a
    x)` and `x` has the same idenity if `x` is a byte. If `x` is a word of
    an arbitrary length then it will be projected into several bytes with
    the extract operation during the store operation, and assembled back
    with the concat operation durintg the load operation.
Commits on Mar 27, 2018
  1. increments the step number on each iteration (#802)

    ivg authored and gitoleg committed Mar 27, 2018
Commits on Mar 19, 2018
  1. Primus Lisp enhancements (#798)

    ivg committed Mar 19, 2018
    * publishes the interrupt operation
    * provides jumping observation and resolve_xxx functions
    The jumping observation is made just before a jump is taken and has
    two parameters: jump condition and jump destination. This is a much
    more precise and useful observation than the `{enter,leave}-term`
    one, as it doesn't loose the origin of values that define the
    jump (i.e., we can track (taint) both the condition and the
    We also added three functions to the Linker interface, mostly for
    convenience. These functions resolve code names into addresses,
    symbolic names, and tid.
    * rectifies the primus-lisp plugin
    So far the plugin loads some stuff during the link time, some after
    the configuration is available, and the rest during the analysis phase
    when it is invoked as a pass. It may even be invoked  twice.
    Now it is rectified. Everything that doesn't require an access to the
    project data structure is loaded automatically just after the
    configuration phase has finished. The rest (e.g., linking with the
    program) is done during the analysis.
    * adds an optional fill parameter to the allocate-memory primitive
    if the parameter is not set, then the allocated memory chunk is not
    filled (i.e., filled with random values), if it is set, then we fill
    the chunk with the specified value.
    The motivation, is to make calloc more efficient.
    * adds the parameter definition abstract representation
    * adds parameters to the program data structure
    and applies the reindexer to all runtime definitions.
    * prints methods and parameters when printing a program
    previously only definitions were printed
    * adds defparameters form to the parser specification
    * modifies defconstant grammar to fit with C. Lisp and defparameter
    To be in line with Common Lisp we've decided to put docstring and
    declarations after the value of defparameter and defconstant. This
    indeed looks more natural.
    * more efficient Lisp stack and a bug fix in signal handlers
    1. Provides an efficient stack that now is equipped with a map data
    structure that allows us effeciently query for locally bound
    In Primus Lisp variables live in two scope:
     1. the Primus global such as CPU registers, parameters, etc.
     2. the Primus Lisp local scope that is populated by lexically scoped
        local variables such as function parameters and let bound
    Any Lisp operation on a variable involves checking whether this is a
    local variable or global variable. Since we're using lexical scoping
    we should be able to distinguish between them in compile time. And
    that's what we will eventually do in the future. But so far it
    requires to many changes to the interpreter. Previously, we were using
    an assoc list, that was traversed every time we read or set a
    variable. The new implementation uses a balanced tree that maps
    variables to the total number of their occurences on the stack. That
    makes the check to be logarithmic in the number of unique variable
    names on the stack.
    Few other small optimizations:
    a) faster function frame costructor, that is not monadic (as it
    shouldn't be), and that is not preserving the order of arguments (so
    that we can append them in the reversed order, using fold left), and
    that counts the total number of arguments, so that we don't call
    List.length when we're poping the frame off.
    b) a faster frame pushing mechanism - instead of just appending
    arguments, we are prepending them in the reverse order. This is not
    only faster, but since the frame is in the reverse order already, it
    preserves the existing order (so no changes in semantics)
    2. Fixes an nasty bug in the signal dispatcher. When a signal is
    dispatched to several methods each method ignores effects from other
    methods as it resets the state as it was before the dispatcher was
    * keeps primitives during linking and moves init_env to loader
    1. we do not want primitives, that were registered by other components
    to be dropped when a new program is linked. In fact, primitives is a
    complete different beast than other definitions, as they are defined
    by OCaml modules (i.e., plugins) rather than by the Lisp code, so
    probably they shouldn't be a part of the program data structure at
    all. Though having them in the program makes things much easier.
    Besides, when we drop primitives during the program linking time, we
    are discarding all primitives that were registered prior to the call
    to program_link. So far, just by accident, the link_program function
    was called before any other components (since it was registered last -
    after all other lisp components). This is so fragile, that I consider
    this is a bug. Nasty one.
    2. The `init_env` function in the Primus.Lisp module was adding
    special variables of the form `%<id>` and `@<id>` that hold addresses
    of the correspondingly named program terms. It was using the local
    variables stack. Neither should it be using the local stack for
    storing this variables (as they are globals by their nature), nor this
    should be done during the linking time of the program. This function
    is now moved to the loader (that is the component that's responsible
    for setting up the environemnt).
    * adds the runtime support for parameters
    parameters are lazy, and are not evaluated unless are used, and not
    set before.
    * parametrize malloc allocator
    This implementation adds the following features, that are useful
    during the analysis:
    - the upper limit to the maximum size of the allocated memory chunk
    - the upper limit to the maximum size of the malloc heap
    - malloc guards with optional coloring
    - efficient calloc that doesn't
      a) take memory
      b) take time
    * publishes Primus Lisp eval functions
    There are two entities in Primus Lisp that could be evaluated -
    functions and methods. Both are now runable directly from OCaml via
    the `eval_fun` and `eval_method`.
    * adds Primus Lisp autodoc generator
    so far we can generate only index, but still not that bad.
    * adds the documentation option that renders primus lisp documentation
    * normalizes the generated documentation
    - merges different definitions of the same entity
    - removes quotations
    - removes duplicating whitespaces
    * adds documentation to various primus lisp definitions
    * fixes the name initialization in the loader
    an address can have multiple addresses, but a name shall have only
    one, not vice verse
    * drops the primus-lisp dependency
Commits on Mar 12, 2018
  1. removes the bogus `open Re_perl` statement (#788)

    ivg committed Mar 12, 2018
    it's not necessary and breaks buils in the newer jbuilder compiled
    `re` library, as it no longer installs cmi files to the same folder.
    also fixes the .merlin file by adding the build files location
  2. produces correct callgraphs if there are no calls in a program (#791)

    ivg committed Mar 12, 2018
    An empty call graph was produced in a degenerative case, when there
    are no calls at all, i.e., when there are no edges in the callgraph.
    Now it will produce a graph that will have only nodes, in that case.
  3. fixes BIL normalization procedure (#789)

    ivg committed Mar 12, 2018
    * intentation changes
    apparently ocp-indent was broken, when I was editingt this file the
    last time.
    * adds hoisting non-generatives from specific points
    This commit extends BNF1 with an extra condition, that no load or
    store expressions in the following positions:
     1. the right-hand side of the let expression;
     2. address or value subexpressions of the store expression;
     3. storage or address subexpressions of the load expression.
    Those positions are the specific positions where the BNF2
    normalization will involve rearragement and copying of
    expressions. Thus we need to ensure that in those positions all
    expressions are generative (i.e., do not produce observable (by other
    expressions) effects). We enforce this restriction by hoisting all
    non-generative expressions into statements. Since this transformation
    produces new statements its impossible to apply it on the exression
    level, so we are putting BNF1 as a precondition for the
    `Exp.normalize` function.
  4. bumps the bap version to 1.5.0-dev (#790)

    ivg committed Mar 12, 2018
    * bumps the bap version to 1.5.0-dev
    to prevent the confusion with the already released version
    * removes signature update from the test script
    The signatures are already installed in previous stages, so they are
    not needed. Also the development version doesn't have released
    artifacts, so by default the update won't work unless an explicit url
    is specified. So it is easier just to remove it.
Commits on Mar 7, 2018
  1. updates CI tools (#787)

    ivg committed Mar 7, 2018
    * updates CI tools
    This PR adds 7 docker files and 3 vagrant files that are using in our
    CI system to test bap installability.
    Also this PR updates our `` script, that is now capable
    of performing the whole update procedure of the OPAM repo in fully
    automated way (you need to create only PR manually)
    * adds a script to delete all non-master packages
    it keeps the conf-* packages so far, but probably we can get rid of
    them too, they all landed up in the upstream long time ago, and most
    likely we do not need to update them often, neither do they point to
    the git repositories.
Commits on Mar 1, 2018
  1. updates documentation, copyrights, readme, and changes (#785)

    ivg committed Mar 1, 2018
    Documented new functions, deprecated those that are superceeded by
    Primus. Enumerated all new features and bug fixes in the CHANGES,
    bumped all versions and copyright dates.
    Also added a sed script to our document generator ocamldoc now started
    to generate lots of `include ??` entries (probably from the ppx
    derived stuff).
    Also I've reindented bap.mli. The ocp-indent tool was chocking on it,
    because of attributes that were put inside of the recursive type
    definition. Fine by OCaml, not fine to ocp-indent. Once I've fixed it,
    it moved quite a significant part of bap.mli... so sorry for this big
Commits on Feb 28, 2018
  1. fixes taint frameworks compatibility layer (#784)

    ivg committed Feb 28, 2018
    * fixes powerpc dependency list
    Surprisingly powerpc depends on bap. If the dependency is not
    specified explicitly then oasis won't compute dependencies correctly
    and won't rebuild some modules, when the bap library is changed.
    * prevents the TCF pass from running many times
    It doesn't hurt, though there is definitely no reasons to run it more
    than once.
    * enables compatibility between different taint analysis engines
    Now we have two approaches to taint analysis:
     - The new Taint Analysis Framework
     - The legacy Taint Propagation Framework
    While the former is more precise and versatile then the latter we
    still have a bunch of legacy software that we want to support, e.g.,
    Saluki and our IDA Pro integration still relies on the old
    framework. We may update the downstream tools to use the new
    Taint Analysis Framework, but right now we don't have enough recourses
    to perform such task. And I doubt that in case of Saluki this will make
    any sense.
    The `primus-propagate-taint` plugin provides a compatibility layer that
    allows legacy tools to benefit from Primus without event knowing
    anything about it. There are some trade-offs starting from the
    semantics of the propagation and ending with the desire to minimize
    breaking changes in bap command line interface, that are explained in
    detail in the man page of the plugin (provided below for the
           This plugin implements a compatibility layer between the new Primus
           Taint Analysis Framework and the old taint propagation framework (the
           propagate-taint plugin). The new framework uses the
           pubslisher-subscriber pattern, provides sanitization operations, and
           tracks the taints liveness, that enables more conventional and online
           taint analysis. However it represents taints as abstract objects
           associated with computations (values), while the old taint propagation
           framework uses a pipeline approach, with taints represented as
           attributes attached to program terms. Since the new representation of
           taints is much more precise and there is no bijection between terms and
           values, this layer will loose information due to this impendance
           mismatch. The trade-offs of the translation and described below. New
           analysis, if possible, shall rely on the new framework.
           The translation is achieved by mapping the tainted-ptr and tainted-reg
           attributes to corresponding taint introduction operations of the Primus
           Taint Analysis Framework, and by reflecting the taint state of the
           analysis into the tainted-regs and tainted-ptrs attributes. Both steps
           are optional, and could be enabled and disabled individually.
           Since an attribute is attached to the whole term not to an individual
           expression or value we need some rule that prescribes how terms maps to
           values. If a term is marked as a term that introduces a taint, then we
           assume that a value, computed in this term, references the tainted
           object either directly (in case of tainted-reg) or indirectly (in case
           of tainted-ptr). We always taint a value contained in the left-hand
           side of a definition. In addition, we also try to taint values on the
           right hand side. If there is a load or store operation, then we taint
           address as a pointer to the object that will track, if it was marked
           with the tainted-reg attribute. If it was marked with the tainted-ptr
           attribute then we dereference this pointer and taint the dereferenced
           address. If the right hand side is an abritrary expression, then we
           assume that all variables that are used in this expression contain
           values that are referencing directly or indirectly the tainted object.
               Introduces taint in terms that are marked with the tainted-ptr and
               tainted-reg attribute.
           --help[=FMT] (default=auto)
               Show this help in format FMT. The value FMT must be one of `auto',
               `pager', `groff' or `plain'. With `auto', the format is `pager` or
               `plain' whenever the TERM env var is `dumb' or undefined.
               Disables the projection of the taint engine state to term
               attributes. The option is only valid when the run option is
               specified. This option is left for compatibility with the old
               interface and is not compatible with the from-attributes or
               to-attrbutes options. It is an error to mix options from the new
               and old interfaces.
               Enables propagating taint from term attributes and back to
               attributes, unless the latter is disabled with the no-marks option.
               This option is left for compatibility with the old interface and is
               not compatible with the from-attributes or to-attrbutes options. It
               is an error to mix options from the new and old interfaces.
               Reflects the state of the taint propagation engine to the
               tainted-ptrs and tainted-regs term attributes.
    If you read that far, then you deserve the bonus track, this is how we
    can run Saluki using the new Taint Analysis Framework as a taint
    propagation engine:
    bap ./exe --saluki-print-models --propagate-taint-print-coverage \
      	  --passes=trivial-condition-form,saluki-taint,run,saluki-solve \
    	  --primus-propagate-taint-from-attr --primus-propagate-taint-to-attr \
    	  --primus-promiscuous-mode --primus-greedy-scheduler \
    	  --primus-limit-max-visited=64 --primus-limit-max-length=4096
    Yep, that's scary... I will later provide a recipe, and will also update
    Saluki's Makefile to facilitate Saluki experimentation with the new
    * do not taint the memory variable
    it doesn't hurt, as memory is never valuated by Primus, but still a
    little bit ugly and introduce extra weight to the tainter state.
Commits on Feb 23, 2018
  1. prevents x86 lifter from botching bap (#782)

    ivg committed Feb 23, 2018
    There are quite a few `_exn` functions in the modern x86 that make
    some assumptions about the structure of data, that shouldn't be done,
    as we shall not make any hard assumptions about the output of the
    disassembler or the binary input itself.
    This commit fixes an issue that arises when bap is applied to binaries
    that are compiled for the latest version of Mac OS X that leads to an
    abnormal termination with an unhandled exception. The failure is
    indeed provoked by the llvm backend that loses the segment prefix,
    though under no circumstances shall we do any preconditioning on the
    LLVM output. I've grepped the code to find other possible issues and
    found quite a lot uses of the `_exn` that are not justified or at
    least questionable.
    So far, I've added a try/catch clause around the lifter, that will
    prevent those exceptions from crashing the whole process. Though in
    future we shall revisit this issue and carefully review all mournings
    that x86 lifter is dumping into the log file. This commit also makes
    those messages more actionable by adding binary representations of
    failed instructions.
Commits on Feb 16, 2018
  1. Revert "renames oasis files (#777)" (#778)

    ivg committed Feb 16, 2018
    This reverts commit e8d2675.
Commits on Feb 15, 2018
  1. Update

    ivg committed Feb 15, 2018
  2. primus updates and many more (#734)

    ivg committed Feb 15, 2018
    See PR #734 for the detailed description, below is a dump of all commits:
    * changes the default warnings set
    OCaml officially disables warning `40` (type disambiguation of record
    fields and variant constructors)by default. See ocaml/ocaml#1333. So do
    we. This commit will set presets warning to the same set as it is now in
    OCaml trunk. Moreover, the same set is used in jbuilder by default, and
    the modern OCaml actually started to rely on the absence of the warning
    40. For example, new version of Core heavily rely on by using data
    constructors that are defined deeply in the internals.
    I personally believe that this is a good thing, since the neccessity to
    qualify each record or data constructor with its full path usually leads
    to lots of open statements that pollute the namespace (unsafe) or to
    just to unreadable code. The type disambiguation is a safe way to make
    the code much more concise and gives us an opportunity to write more
    readable code.
    From now on, warning 40 is officially disabled in BAP.
    This commit also sync merlin warnings with the compiler warnings. And
    let's keep it this way.
    * suppresses the Image.register_backend warning
    * introduces progress reports
    This commit provides a new feature: progress reports. A new mechanism
    that allows plugins to provide a feedback to the system. A rendering
    plugin is added that prints this reports in a way that shows the
    time profile.
    Progress event
    The progress event is a new kind of logging event that allows different
    components to provide a feedback about their progress. The supporting
    functions like `report_progress` are added to facilitate the usage of
    the new event.
    Progress reporting plugin
    In fact the report plugin is planned to provide more functionality, but
    so far it renders progress bars from different components. It requires
    ANSI compatible terminal and may clobber output.
    * passes the return value of a function to advisors
    The after advisors will now get the returned value, and can even change
    * updates to lisp stubs
    - fixes puts - outputs '\n' at the end
    - stubs [exit]
    - fixes `memset` and `memcmp` stubs
    * changes --primus-limit-blocks-max-visited behavior
    Previously the option counted the state transitions of a linker, now it
    will count only real basic blocks
    * adds `Pos.get` to extract attribute from a pos
    Since program posistion is a term we can directly extract attributes
    associated with a position, e.g., to get an address of a position:
    get pos address
    * simplifies expression before evaluation in Primus
    * [docs] several documentation fixes
    1. updates Stmt.normalize docs
    The previous version didn't reflect the role of the `normalize_exp`
    flag. So now we intoduce the `BNF1` and `BNF2` form to distinguish
    between BIL with normalized and not-normalized expressions.
    2. better descriptions for Primus schedulers
    so far there were bukazoids
    3. documents better the advice mechanism
    * [build] [WIP] a set of build system optimizations
    1. switches build system to compiles
    2. builds only those plugins that are needed
    our build plugin subsystem of the build system is performing lots of
    unnecessary work, by rebuilding all plugins every time. The proposed
    version builds a plugin only if it is not yet build or if it is older
    then its library.
    Note, it is a work on progress and contains a hardcoded path to mine
    3. a tool that deletes bap
    in case if you messed with the oasis state and can't use uninstall any
    * revised sexpual representation of Primus events
    1. BIL expressions are printed in a more lispish notation, I think that
    it would be worthwile to extract this code later from Primus and provide
    it as a pretty-printing format for BIL expressions.
    2. value representation is changed, now it is an lexeme of the following
    format: `value:type#id`, a little bit too noise, but we don't want to
    loose the id information, since we can use it to derive data dependency
    * [WIP] adds monadic watchers
    Previously, watcher were only used for monitoring the event stream and
    introspecting, and so far there is a comment that warns users that it
    should be only for this (though the comment is superfluous, as the type
    system will not allow to affect the machine state in the watcher
    This commit proposes a watcher interface that allows machine
    modifications, i.e., the watcher observer will be run the machine
    monad. The idea, is that we can subscribe to the observation by name,
    from the Lisp machine (using the upcomming rule engine), and generate
    observations from observations without any OCaml code. Though the idea
    sounds nice, there are several drawbacks. The main is the performance,
    as we need to transform each observation into the sexp expression. The
    code tries to alleviate this issue by not transforming unwatched
    observations, though the core of the problem remains.
    Warning: so far, this commit has no dependencies, so it is possible that
    it will be discarded or not merged until it is really needed.
    * [WIP] adds an unresolved observation
    it is easier to detect non-stubbed functions using it.
    * [DIRTY] [WIP] debugging the round-robin scheduler
    totally work in a progress, it still doesn't work as I want it to work.
    * [WIP] [UNTESTED] do not mark terms on every block
    this may break things and I didn't check that it works right. The idea
    is not to update the visited attribute on every block, as it is done
    now, but do this once the machine is finished, as every term update
    involves the whole rewrite of the program term.
    This may have issues as the [finished] stage is heavily overloaded with
    different other observers, and it might be the case, that the order of
    their invocations matter.
    This is a general problem with the [finished] event - as components
    may commit their results at this stage, though other components may
    collect these results at the same stage, given that there is no ordering
    between handlers, the latter might be called before the former. So this
    problem should formally addressed.
    * adds the BAP Rule Engine
    BARE is a production system (a forward chaining rule system) is a
    pattern matching stream processing rule engine that features non-linear
    pattern matching.
    Basically, this can be seen as a generalization of the Saluki
    solver. Implementation wise, there are two main differences:
    1. All facts are represented as a Sexp.t tuples and matching is
    performed purely syntactically. In Saluki rules were uninterpreted
    functions with arbitrary behavior, though in the paper they were
    represented as tuples, except the predicates. So this is closer to
    formal behavior of Saluki.
    2. Bare operates on the rule level, not on the whole specification
    level. In Saluki we were sharing the partial matching state between all
    the rules, so if two different rules rely on a the same pattern it will
    be matched only once. Though it sounds nice, I still not sure whether
    the Saluki solver implemented this stuff correctly. Also, since Bare
    doesn't have any a priori information about the input domain, it is
    impossible to perform any query optimization, thus we are relying on
    Bare users to build efficient rules, based on their local domain
    knowledge, and performing this on a rule level, probably, is easier.
    Note, though the implementation may look trivial, in comparison with
    Rete for example, it is not that naive as it may look like, thanks to
    the inherited data sharing of OCaml heap. Although we have just a list
    of hypothesis that represent the partial matching state, if you will
    look deeper into the representation, then the actual representation, is
    that we have a list of patterns, and a list of pairs pointers, where the
    first points somewhere in the patterns list, indicating the patterns
    that are not yet matched, and the second pointer points to the list of
    bindings. Bindings are also shared by different hypotheses that share
    the same root hypothesis. So in the memory it is a prefix tree with lots
    of data sharing. With all this said, there is a room for optimization,
    so far we are performing lots of unnecessary list reordering, though we
    can employ some zipper-like structure with O(1) merging.
    * moves primus-print on BARE
    The `--primus-print-rules` parameter specifies a list of files with
    printing rules (though the name is probably bad, as it sounds like the
    rules are printed, may be `-by-rules` or `-with-rules` will be better
    Each file may contain zero or more BARE rule specifications. If any rule
    is not well-formed then a nice (its really nice thanks to the new Sexpar
    library) error message is printed and the program is terminated.
    * adds Primus memcheck tool
    The memcheck rule is akin to the Valgrind memcheck and detects the same
    kind of bugs, i.e., use-after-free, corrupted pointer, memory leaks and
    so on.
    The tool is memory allocator agnositc and is controlled from the Lisp
    machine via two three primitives:
    - `(memcheck-acquire ptr len)`
    - `(memcheck-release ptr len)`
    TODO: make the primitives available to OCaml components
    The tool reports all violation via the `memcheck-violate` observation,
    that has the following form:
    - `(memcheck-violate pos kind)` where `pos` is a program
      location (address) of the violation, and kind is a specific kind of
      violation, e.g., corrupted, use-after-free, def-after-free, and so
    The tool also provides observations of the allocator state, that have
    the same name as primitives (probably this is a bad idea to give the
    same names to observations and Lisp primitives...):
    - `(memcheck-acquire pos (ptr len))` - made every time the
      `memcheck-acquire` primitive is called at the given program position
    - `(memcheck-release pos (ptr len))` - made every time the
      `memcheck-release` primitive is called at the given program position
    1. Currently the analysis is broken, due to incorrect comparison of
    values, I will fix it soon.
    2. Although the tool is allocator agnostic, it doesn't allow checking
    several allocators in parallel, as there is no way to distinguish
    between different allocators. It would be nice to have this feature, as
    it will allow us to handle such bugs as freeing memory using an
    incorrect dealloctor.
    * ignores value id during the comparison
    value represents a result of a computation with an attached identifier,
    that can be used to attach different properties to the result, using
    externalized mapping. Thus the id is not the identity of a value (like
    an term identifier that implies that it was a bad idea to name it `id`),
    but rather a tag, that allows selection of properties of this particular
    This should be addressed carefully in documentation.
    * [DELETE] looback derived facts into facts
    So the problem is that it is now the event processor is not only
    generating reports now, but generates facts, that are also processed,
    so now it is a fact-processor. Probably it shouldn't be in the print
    pluing, that should just print.
    Also, the current implementation produces facts derived from the derived
    facts before the derived facts.
    Another issue - rules currently can communicate with each other as they
    do not share the fact stream - thus we need to look at the right hand
    side of each rule, and treat them as observation providers. Should we do
    this at all, or it would be better to be able to provide only those
    facts, that are registered in the system?
    I think that this is a bad idea. For example, if the observation
    processor create new observations from thin area, we basically create
    observations that are not visible from OCaml realm.
    I believe, that this commit should be discarded, and instead the
    observation processor should be moved in the Lisp machine.
    The question is:
    How to represent it in the Lisp machine?
    * [SQUASH] documentation fix
    * [WIP] moves primus lisp parser to parsexp
    It is a first step in moving the Lisp parser to parsexp to deliver
    better error handling and diagnostic. Still a work in progress.
    * [WIP] adds promiscuity to each call
    This change will enable a promiscuous jump at each call, that will
    ensure coverage of the code, that follows a call that didn't return.
    Currently, the implementation just forks at each call, and creates
    a clone that will follow directly the fallthrough edge.
    * [WIP] split Primus Lisp into many many modules
    Trying to order the things.
    * tracks all freed bytes in the memcheck tool
    The right way would be to use a segmentation tree, though currently we
    just add all bytes into the dead heap.
    * FIXUP: fixes the promiscuity mode
    * [WIP] switches to indexed notation
    Instead of trying to use parsexp, we can use sexp that is indexed with
    conventional numbers. Such indexing, not only solves a proble with
    getting the location information of a parse tree and an AST, but, in
    general, allows us to attach arbitrary semantic information to a tree
    node. We are using dual-indexing, one index represents identity, that
    denotes a node identity, and another index represents equality, such
    that all equal nodes have equal equality index.
    The main drawback, is that we can't index an arbitrary recursive data
    type non-intrusively, as we need to attach the index to each node, thus
    we need to define our own data representation for the s-expression,
    type tree = token indexed
    and token = Atom of string | List of tree list
    The solution above is still partial, as list is a recusive data type on
    itself, thus the right solution would be to introduce an indexed list
    data type, e.g.,
    type 'a ilist = 'a node indexed
    and 'a node = Nil | Cons of 'a * 'a ilist
    However, switching from the regular list to the indexed list will
    complicate the code base a lot, as we can't use any more the syntactic
    sugar for List deconstruction, afaik, OCaml will soon get an ability to
    extend the list notation to arbitrary user data type, that will make
    this solution viable.
    The proposed implementations implements a multipass processing:
    1. we take a set of features and parse all source files into a source
    repository. The parse tree is an indexed s-expression. The source
    repository implements hashconsing and computes equality
    relationships. It also records location and file mapping.
    2. After all required features are loaded into the source repository,
    the repository is parsed in several steps. The first step loads the meta
    program, that consists of term-rewriting definitions. The second step
    loads the definitions while applying all term-rewriting rules. The
    second step can be seen as a symbolic computation step.
    * [WIP] separates parsing steps
    The steps should be applied on the per source repository basis, not on
    per module basis. Now, when we are parsing definitions the meta program
    is totally ready and we have all meta definitions. Since we need to
    apply macro definitions during the parsing we need all of them. Since we
    do not apply regular lambda applications at this stage, the order of
    parsing is not relevant.
    We may later add a pass that will resolve calls, so far, we will use the
    late binding.
    * [WIP] moves `set` back to special forms
    since the var field shouldn't be interpreted it can't be a function.
    * factored a program into an abstract type
    * updates Primus Lisp to new data representation
    * rectified the Lisp Machine interface
    Instead of ugly initialization and parsing during component
    initialization, we will parse the program outside of the Machine
    computation (that makes total sense), and then use the `load_program`
    operation to finally link the Lisp program into the Primus Machine.
    Also, the new primitive interface is made public, and the old one is
    Also, textualized lots of error messages, though a bridge is still too
    * fixes Lisp library intialization
    * fixes context ordering
    * nice static errors in primus lisp
    * implements all operations using primitives
    * updates lisp codes to the new interfaces
    * updates memcheck to the new primitives inteface
    * fixes ambiguity in the grammar
    also adds a pretty printer for generated code as well as the
    --primus-lisp-dump option that dumps a genreated program
    * adds back the error form
    * updates ascii functions to the new syntax of chars
    * removes duplication from the or/and macros
    the previous definition was duplicating expressions, and if a duplicated
    expression is not expansive, then the result of the or form, for
    example, could be false, e.g.,
    (or (f x) (g x))
    was expanded to
    (if (f x) (f x) (g x))
    if `f` is not pure, the it will produce two different values, one of
    which could be truth value, and another false.
    The new implementation makes sure that each expression in the or form is
    evaluated no more than once, e.g., an example above is expanded to:
    (let ((r (f x))) (if r r (g x)))
    * a new definition of memcheck-malloc
    that uses the new advice syntax
    * adds gradual type inference algorithm to P.Lisp
    Also implements a dataflow framework, that I will later move to
    * adds type annotations to primitives
    * wip: added tree reindexing
    * adds type variables to annotation
    fixed a few issues, still a few bugs are lurking
    * fixes some unification problems
    some terms were accidentally having the same id, and were, thus
    * fixes word-read function
    * reduces the amount of meets in gamma
    apparently we can perform substitution instead of creating a meet
    * performs join on a set of possible applications
    when we do type inference we shouldn't pick the first applicable
    definition and use it as a source of ground types, instead we should
    join all applicable definitions. A better approximation will also take
    the context into the consideration.
    * fixes malloc definition
    accidentally deleted the external specification.
    * adds a generic data flow framework
    * fixes the fixpoint description
    * moves Lisp typer to the graphlib's fixpoint
    * adds an indexing library for interned strings
    * adds some docstrings to the index module
    * adds empty to the index signature
    it is hardly usable without it ;)
    * adds symbols to Primus Lisp
    Adds a new term kind - symbol. A symbol is a first class value, that
    has the same runtime representation as a word (we use interned strings
    as underlying representation). A symbol is introduced in a program
    using a regular `'<sym>` literal, e.g., `'cat`.
    This commit also reverts character literals back to `?<char>` Emacs
    Lispish notation (instead of '<char>' that was used for some time in
    this branch).
    * allows memcheck users to handle mutliple allocators
    Now all primitives take a symbolic argument that denotes an allocator
    Note - the argument is ignored right now, since the next step would be
    to introduce the region manager and deprecate memcheck, since soon we
    will be able to represent memory checking policy without writing any
    OCaml code.
    * adds interval trees to Bap.Std
    Previously, we had interval trees with the key (interval) type
    hardcoded to `Memory.t`. Now the tree data structure is properly
    functorized and the old implementation is using the new functor.
    * adds the symbol type to the Primus Lisp type system
    * makes the sym type available in the public API
    * redefines the type system interface
    instead of the low-level signature, this provides a DSL for specifying
    primitive types. The implentation is architecture agnostic and doesn't
    require any machine computations (that totaly makes sense).
    * unrestricts shift primitives
    there is no need to unify left and right arguments
    * adds has_loc operation to the Source module
    * fixes the int-width operator name
    accidentaly renamed it during batch renaming
    * Use value as the runtime representation of a symbol
    instead of word, to minimize translations
    * revised the typeinference
    it fixes some issues, but introduces a bigger one - now it doesn't terminate.
    * rewrote the type inference engine
    I don't know whether it works or not, I don't have any time to work on
    it anymore.
    * final polishing with the Primus Lisp typing
    Still far away from being production ready, but as a base it should
    * [SQUASH] adds interval trees to Bap.Std
    lost files (the interval tree itself)
    * adds the interval trees library to Primus Lisp
    * adds signaling mechanism to Primus Lisp
    We reflect Primus Observations onto signals. Every time an observation
    is done, it is mapped to a signal, that is dispatched to
    handlers. A handler is defined with the `defmethod` stanza and can
    perform aribtrary side effects.
    * exposes the symbol interface
    now it is possible to translate a symbol from symbolic to numeric
    representation and vice verse
    * moves TCF into a separate pass
    and makes run dependent on it.
    * adds call tracing capabilities
    This commit introduces a unified interface for Lisp and regular BIR
    calls. Now we have covenient `call` and `call-return`
    observatiosn (and signals) that act as intuitively.
    Also fixes quite a few bugs.
    * fixes a stupidiest error in the index library
    * parser now dequotes an symbol
    * fixes docstring loosage in the regions library
    * adds primus lisp key value storage library
    * makes merlin to warn about innocuous unused variables
    By default Merlin warns only about unused let-bound variables. With
    this warning all unused variables, including those that are bound as
    function parameters or during matching will be warned. This is very
    when writing code in the monadic style, e.g.,
    f x >>= y ->
    f y >>= z ->
    where instead of the `let x` we need to use a more general `fun x`.
    We will enable this warning globally in the compilation at some point
    of time.
    * SQUASH TRAILS dictionary
    * expresses lisp primitives via the interpreter primitives
    * fixes the write-word macro
    it didn't work for the little endian, since the comparison to zero
    wasn't signed (and so far we even do not provide the signed comparison
    the big endian was also broken - off by a byte.
    * [SQUASH] cleanups program
    * registers primus-dictionary
    * fixes compare and adds min,max,fold functions
    * adds strdup,strcmp, and strncmp functions
    * increases precision of the taint analysis
    if a reference already points to the tainted object we overwrite it
    with the new taint, while previously we were merging the old set of
    referenced tainted objects with the new set.
    It is still not precise, as it doesn't account for the assignments
    that are not tainted.
    Also fixes a couple of bugs, and introduces a nicer mechanism for
    generalization tainting functions (i.e., selectors).
    * introduces garbage collection to the taint analysis
    this will improve performance and precision, and will also enable
    tracking the existence of the tainted objects - that will allow us
    to express properties that dependend on a taint being killed.
    * improves performance of taint propagation
    First of all, the previous note on the taint propagation precision was
    wrong, it is precise, as when we rewrite a reference with an untainted
    object we are associating it with an empty set of tainted objects. To
    minimize the number of references that we track, we can just remove a
    references instead of associating it with an empty set. This is
    effectively what this commit is doing.
    * removes unnecessary stuff from the propagator
    now more introduction, marking, or mapping -- all these stuff will be
    moved to other plugins.
    * moves symbolic index to the value module
    We would like to intern symbols into values in other modules.
    * WIP: moving the tainting module to the library
    * makes the concat observation public
    it was a sever undertainting in the taint propagation engine due to
    the absence of this event.
    * makes policy selectable on per taint kind basis
    * moves bap_taint to the library universe
    * adds a conservative GC to the Taint analysis framework
    in future we may provide several implementaiton of GC that will have
    different tradeoffs, so that a user may choose a GC to her/his taste.
    * removes assertions about taint object correctess
    We can't actually enforce by the abstract interface the invariant of
    the taint object, that it must have an associated kind, as we still
    expose the `of_value` function, that injects any value into the object
    type. So it is possible to associate with a value anything besides the
    taint object. Thus instead of doing the assertion false, we assume
    this and silently ignore all other values. Honestly, I don't like the
    silent part of this approach, so in future it may change.
    * publishes Interpreter's `exp` operation
    I was very reluctant to make it public, though it looks like that I
    need it too often.
    * rectifies the interrupt handling
    Previously a interrupt was halting a machine with an exception
    without any option to handle it. The new implementation provides an
    observation when an interrupt happens and, by default, proceeds to
    the interrupt fall-through destination. It is possible now, to attach
    arbitrary behavior to the interrupt using the observation mechanism,
    or by providing a method for the `interrupt` signal in the Lisp
    interpreter, e.g.,
    (defmethod interrupt (n)
       (when (= AH 8)
         (msg "DOS print_string is called")))
    * new taint introduction mechanism
    This mechanism is more precise, efficient, and easier to undertand.
    Also started exposing the taint framework to the Lisp land.
    * deprecates primus-propagate-taint in favor of primus-taint
    The new interface makes much more sense with Primus and is much easier
    to use, as there is no need to explicitly enable anything. The
    propagation is enabled by default. It is possible to choose a default
    policy out of provided two policies or implement a custom policy. The
    policy is selectable on the per taint kind basis, thus at the same time we
    can propagate different taints with different policies.
    I still didn't expose all the functions to the lisp. And didn't expose
    the `taint-policy-select` function, as well as the
    `taint-policy-set-default`. This would be done in the next commit, I hope.
    * exposes the rest of the taint interface to lisp
    * switches to the finalization events
    instead of killed, we now expose the taint-finalize event, that is
    used for both cases (i) when a taint is removed by a GC when it is no
    longer reachable and (ii) when the machine terminates (so there is no
    more chance to sanitize it).
    * revives the primus-propagate-taint plugin
    We shouldn't really deprecated it until we deprecate the old taint
    propagation pipeline (and I'm not sure whether we will do this, or
    need to do this). The (new) purpose of the primus-propagate-taint
    plugin is to enable integration between the new Taint Analysis
    Framework and the old taint propagation pipeline, i.e., it establishes
    mapping between the taint attributes attached to program term, and
    taints attached to computations.
    * erases the taint introduction component from primus-taint
    now its new place is in the propagate-taint as it connects the old
    taint representation with the new one.
    * fixes a couple of dozens of bugs
    looks like it works!
    * rectifies the policy based taint propagation interface
    There should be a surjection from kinds onto taint policies, not vice
    Now the default policy works, though it is not possible to set several
    policies to the same kind (that actually should make sense).
    * introduces recipes
    This commit introduces a new experimental feature called "recipe".
    On a high level a recipe is a collection of command line options and
    artifacts that were used to run bap, that can be used to replicate a
    Underneath the hood the recipe is a zip file with a flat structure:
    M Filemode      Length  Date         Time      File
    - ----------  --------  -----------  --------  ----------
      -rw-rw-r--        15   9-Feb-2018  16:33:26  descr
      -rw-rw-r--       362   9-Feb-2018  17:38:20  recipe.scm
    - ----------  --------  -----------  --------  ----------
    where `descr` contains a description of the recipe, and `recipe.scm`
    the recipe itself, e.g.,
    (parameter depth 1000 "how far to search")
    (option run)
    (option run-entry-points all-subroutines)
    (option primus-greedy-scheduler)
    (option primus-limit-max-length $depth)
    (option primus-promiscuous-mode)
    (option dump bir:out.bir)
    (option dump asm:out.asm)
    (option run-argv 'test-1,Joe,CS,Bart,ECE,Ana,PP')
    (option primus-lisp-load run)
    (option report-progress)
    So far, we support three stanzas:
    - (parameter <name> <default> <desc>) - enables a simple parametrization of a recipe;
    - (option <name> <args>) - passes this option to bap
    - (extend <recipe>) - loads the specified <recipe>
    With the `extend` command it is possible to build recipes from
    existing recipes, via extension.
    Last but not least, it is possible to pack files with recipes like
    schemes, signatures, symbol tables, etc.
    * relaxes assumptions in the TCF transformer
    The TCF transformation was only applied to blocks that have two or
    more branches, under assumption, that a block with less than two jumps
    may not have any conditions at all. Unfortunately, as #769 shows its
    not true.
    * adds more options to the primus-limiter
    Now it is possible to set the maximum length of a trace (path) in
    blocks, instructions, terms, and expressions. The kind of a counter
    that should be used is defined via the suffix, where `b` stands for
    blocks, `i` for instrucitons, and so on, e.g., `1000b` means a
    thousand of basic blocks.
    By default, if no prefix is set, we use expressions. Since the Lisp
    Interpreter, as well as any other linked code, is not required to
    express its computation in terms of blocks, it is better to use
    expressions, since the Lisp machine expresses its computation on this
    * removes debugging messages from the promiscuous mode.
    * adds the `--log-dir` option to bap
    Now it is possible to specify the log destination with a command
    line parameter, not only via the environment variables.
    * counts expression as primitive operations
    instead of relying on expressions, we now count primitive, load,
    store, unop, binop, and exec operations instead.
    This commit also adds progress reports from the limiter.
    * erases debug printing
    * handles overflows in memory manager
    Handles the special case when a memory region stretches across the
    null address (i.e., when lower bound + length results in the
    overflow). This is usually needed in the promiscuous mode, when we map
    a negative page.
    * enables negative literals in Primus lisp
    I think it's too odd to write (-1 0) instead of -1 :)
    * adds a simple IO emulation level
    now it is possible to redirect channels and files
    This is rough, but shall work as an initial approximation
    * implements many stdio stubs
    include open, fopen, write, fwrite, read, fread, fput, fget and the
    * adds the missing _exit stub
    so far it is the same as the exit.
    * SQUASH me onto channels
    lost oasis files
    * drops the old memcheck implementation
    * introduces primus-test plugin and new memcheck
    * fixes a bug in the promiscuous mode
    the recently added fork point at a call site was missing the check for
    already visited, thus we were spawning an exponential number of paths.
    * disables printing non messages into log
    * adds the check-value analysis
    also included few examples, like unchecked malloc, etc, though later
    we will extend it with more examples.
    * adds init and fini signals (and init observation)
    * adds options for printing recipes
    --list-recipes - shows all available recipes
    --show-recipe=<recipe> - shows detailed information about the recipe
      including the list of command line arguments
    * makes the build system optimization more portable
    * first attempt to pass tests
    * fixes context constraining during parsing
    When we parse a function that is defined in some context we are
    applying this context to all macro definitions, but we fogrot to
    include the global context.
    * updated tessuite and opam file
    * adds TCF library to opam/opam
    * removes a trailing whitespace after the package name
    * fixes context ordering operation
    The comparison was one sided :(
    * allows quoted strings in the context declaration
    Features can be written with and without quotes and will be treated
    equally (i.e., quotes are ignored).
    * handles mips32 startup procedure more correctly
    Problem: is that main returns to the address that is stored in the
    RA register, which is a nop instruction, that is implemented as a `b
    0` operation, that doesn't terminate.
    Solution: abuse `__libc_csu_fini` as a termination procedure, and set
    RA to its address.
    * updates testsuite
    * changed int printing and thrown away abs symbols
    * revives recipes that were accidentally killed with merge
    * fixes error printing
    1. print errors in one place
    2. do not clear screen when progress report is done as it clears the
    error message
    * nicer error messages from recipes
    * declare all program variables in promiscuous mode
    probably, we shall do this in any mode...
Commits on Oct 3, 2017
  1. Bap with core v0.9 (#704)

    ivg authored and gitoleg committed Oct 3, 2017
    * makes it compilable with the new core
    Most of the changes are due to the following breaking changes in the
    1. change of the monad interface, now `bind` takes a function as a
    labeled argument. This is the most annoying change, as we need to update
    our Base interface to be consistent with Core, but this will force us to
    release the 2.0 version.
    2. The `equal` parameter for different finders now became mandatory (to
    discourage people from using the polymorphic compare).
    This will fix #648.
    * completely disables inline tests
    The new inline tests from Janestreet doesn't work with the plugin
    system. It is not even our tests, but core's own tests, that give
    a runtime error.
    * switches opam/opam to the newer version of core
    * drops 4.03, adds 4.05 to the CI
    * picks llvm-3.8 for the CI opam file
    I hope there is such on Travis.
    * maybe which doesn't work
    so why it works for the cxx?
    * we need to set custom_ppx to prevent ppx_deriving
    otherwise it will become a dependency, that is not really used.
    * supports building without ppx_deriving
    the ppx_deriving is an optional depedency of many ppx_X packages from
    Janestreet. The optionality is specified using a predicated
    requirement. The problem is mostly in ocamlfind, that if there is no
    `-pp` option will perform a query is made to build a preprocessor
    command that doesn't include the predicates passed through the command
    line, that lead to a failure, since ppx_deriving is not installed.
    The workaround is simple - we need to pass the `-pp` option to the
    linking phase - kind of stupid, but it will prevent the ocamlfind driver
    from making this query.
    The bap_build library was also update, I've added a set of default
    predicates that are used when the set of bap dependencies is inferred.
    * pass the `-pp` flag to the c compler
    Now it becomes more and more weird - as we need to pass the `-pp` flag
    even when we are compiling a C file since the syntax command is still
    build even there.
    * cleans the ppx dependencies
    we are depending only on ppx_jane, and the dependcy is put into the
    common section. Everything else is removed.
    * disables the bap_elf library
    So due to issues with [ocamlfind][1] and the new generation of the ppx derivers
    from the Janestreet we can't build the bap_elf library without a bogus
    dependecy on ppx_deriving. A trick to supress this dependency from
    jumping in was to specify a `-pp` flag, and we can't use this trick with
    a library that uses camlp4. So we can only build it, if ppx_deriving is
    installed (it is not actually used, it's needed just to keep ocamlfind
    Since `ppx_deriving` is not really needed, and it usually lags behind
    the compiler development, we decided to disable this library by
    default, until we will rewrite it using `ppx_bitstrings`.
    * updates test runner
    * sets version to numeric and disables BAP_DEBUG
    The latter probably is the cause of the "No space left on device"
    error (since BAP_DEBUG will fill the /tmp folder with lots of trash).
    The former will prevent the version test from passing.
    * solved warnings
    Lots of warning occured because of new core_kernel version, mainly
    connected with {in/out}_channel types, that should be now
    {In/Out}_channel.t, according to Jane Street point of view
    * edited a travis env matrix
    so now we will have 6 tests for 4.03, 4.04 and 4.05 compilers
    with either BAP_RUN_TEST or BAP_RUN_CHECK set to true
    * updated testsuite
Commits on Aug 25, 2017
  1. fixes bap installation instructions

    ivg committed Aug 25, 2017
    OPAM acts like stupid and always tries to upgrade conf-llvm package to the latest.
  2. updates to the front page

    ivg committed Aug 25, 2017
    * adds a TOC for the ease of navigation
    * rewrites the introduction
    * adds the sponsors information
    * updates the example output
    * updates versions
  3. updates the BAP version

    ivg committed Aug 25, 2017
  4. adds changes to

    ivg committed Aug 25, 2017
Commits on Aug 24, 2017
  1. Final 1.3 update (#688)

    ivg authored and gitoleg committed Aug 24, 2017
    * makes main state global and few bug fixes
    The main change is that now there is only one main state in the
    machine (aka project), not a separate copy per machine.
    This commit also fixes leave-{blk,sub} events and adds several new
    observations, such as exn-raised, that occurs every time a machine
    switches to an exceptional control flow.
    The commit also adds few common method to the monad interface.
    * adds attributes to bitvectors
    Now it is possible to attach arbitrary values to a bitvector.
    * names the input interface of the Regular functor
    * wip
    * wip
    * wip
    * wip
    * wip
    * wip - now it compiles at least
    * ensures normalization and well-typedness
    also fixes a bug in Eval.binop.
    * fixes a bug in arm lifter
    This bug leads to a type error, as a result of a byte load and a half
    word load is stored in a 32 bit variable.
    * published Exp.normalization function.
    * fixes a bug in type checker and exp normalizer
    a type checker didn't held correctly shifts
    and the normalizer didn't properly recurse
    * do not force constant folding
    allow a backend to choose whether to propagate consts or not.
    * so far force the normalization in the exp
    this is temporal cludge.
    * removes exp normalization
    BIR becomes unreadable. We will apply normalization when needed.
    Later, we may add a memoization or stuff like that.
    * sets BAP_DEBUG to true
    this enables better diagnostics of compilation and configuration errors
    on travis.
    * adds primus mark visted plugin
    the plugin will mark all terms that were visited by Primus with the
    visited attribute.
    * a precedence sensitive exp pretty printer
    Should do this long ago, but finally found some time. No more
    unnecessary parentheses!
    Note: operator precedences follow the C language, not OCaml.
       | x[y], <cast>:<N>[x], extract:<N>:<M>[x]   | 10 |
       | ~x, -x                                    |  9 |
       | *,/,/$, %, %$                             |  8 |
       | +,-                                       |  7 |
       | <<,>>,~>>                                 |  6 |
       | <,>,<=,>=                                 |  5 |
       | =, <>                                     |  4 |
       | &                                         |  3 |
       | ^                                         |  2 |
       | |                                         |  1 |
       | let.., x with y <- z, if.., . (dot)       |  0 |
    1) Higher precedence means that operators binds tighter.
    2) Any bil statement has precedence lower than an expression.
    * fixes simplifier, adds cast simplification
    1. at some branches the simplified didn't recurce
    2. if a cast casts to the same type it is removed
    * apply constant folding by default
    * fixes the Bitvector.unsigned function
    it was wrong.
    * removes unnecessary optimizations from the IR.lift
    * adds ssa and dead-code-elimination plugins
    The ssa just translates a program into the SSA form
    The dead-code-elimination is a conservative deadcode elimination
    plugin, that helps alot with x86 binaries by removing tons of unused
    flags calcualations.
    * makes string_of_value more consistent
    The string_of_value function now emits the `0x` prefix for hexnumbers,
    unless instructed explicitly with the `prefix:false` flag.
    The reason for this change is to enable a consitent handling of signed
    and unsinged, negative and positive values.
    * publishes Exp.substitute function
    long time ago we forgot to make it public.
    * enhances dead-code elimination plugin
    1. Now it will run until a fix point is reached
    2. A simple constant propagation is added, so that more virtuals are
    * renames mem32 and mem64 in x86 lifter to mem
    there is no need to have two different names for memory.
    * fixes and enhances lifted representation of shifts
    1. The original implementation emited the following code
        flag := flag
       to denote an unchanged flag. The code was problematic for both static
       and dynamic analysis. Static analysis treated this as a used before
       defined variable and treated flag as a free variable in the eclosing
       definition. The dynamic analysis treated `flag` as an undefined
       variable with all the consequences.
       The new implementation uses the `if` statement and changes flag only
       if it is necessary. Moreover, instead of using multiple if/then/else
       expressions, all assignment are gathered under one big if statement,
       that leads to a more simple CFG representation.
    2. The original implementation emited a temporary variable that holds a
       number of bits to which the destination should be shifted. There is
       no need for such variable (it is never changed). Moreover, the
       value of this variable is usually (if not always) a constant, so it
       is better to inline it directly in expressions. It looks like that
       this is the case of a common error -  delegating OCaml computation to
       BIL. In our case this problem is solved by constant propagation.
    * applies simplification to assoc operations
    If in operator is associative and expression is left recursive, then
    recurse to the right in hope to meet constants there. In particular,
    this will simplify `x + 1 + 1 + 1` to `x + 3`.
    * few optimizations and refactoring in old tainter
    1. do not store values with just empty sets of values
    2. special handling for the one byte storage
    3. remove a key from the taint set if it is set to empty
    * normalizes and simplifies expression before eval
    * switches propagator to a sparse storage
    * removes unused code
    * calls exit if bap is interupted by Ctrl-C
    so that profiler will work, as well as other at exit handlers.
    * implements new Bitvector pretty printer
    and rolls back the [Word.string_of_value] behavior.
    The pretty printing function was totally rewritten from scratch, as
    Z.format and other Z printing functions are broken. Now we have a
    generic printer that suits all tastes, as well as 9 preinstantiated
    pretty printers.
    We also changed the bitvector default string representation. The new
    representation doesn't used [true] or [false] for the [1] and [0] (this
    is binary analysis anyway), and doesn't loose the signedness
    * docs update and few renamings
    Updated bap.mli with a documentation for new functions.
    Also, renamed Image.Scheme.reference to Image.Scheme.relocation.
    * updated testsuite
    * fix dash/underscore issue
    * automates documentation generation
    Now it is enough just to type `make doc` and everything will work out of
    box, if a correct version of OCaml is used, i.e., 4.03. The Makefile
    recipe will pull the latest version of argot (git is required) and use
    it to build the documentation. BAP should be installed either from the
    source tree, or from opam, doesn't matter.
    * updates Primus docs
    also removes some unnecessary stuff from it.
    * adds monads interfaces
    * updates monads docs
    * cleans and documents the monads library
    I've removed few unnecessary funciton, e.g., `State.modify` and also
    fixed the type of the call/cc function.
    * removes extra blank lines from bap.mli
    * adds type checker to backends
    also fixes few bugs that were found as a result
    1) a zero width word was created (should be a one bit width)
    2) a bug in the typechecker that led to a stackoverflow
    So far no more type errors
    * removes unnecessary calls to Bil.fixpoint
    * adds missing manpages
    also added parameters to the primus loader.
    * typo in the docs
    * gone crazy and rewrote the taint propagation plugin
    because my last update to primus broke it(((
    * makes oasy little bit less hungry
    * dispatches linker unresolved calls to a lisp stub
    The __primus_linker_unresolved_call function is called in case if a
    linker can resolved a call.
    Also adds the `pc` method to the interpreter interface and publish it
    as a primitive `get-current-program-counter`.
    * fixes a bug in random number generation
    not actually in the random number generator, but rather in a casting it
    to a word.
    * don't fail the whole program on a division by zero
    just fail the machine.
    * enhances the run plugin with the multi-entry mode
    Now we can specify a list of entry points, or a magic `all-subroutines`
    to start execution in parallel from all specified entry points. If no
    specified, then all subroutine terms marked with the `entry_point`
    attribute are entered.
    * flushes output from primus after each character
    * adds a Primus component that ensures termination
    The primus-limit plugin will terminate a machine after a certain amount
    of computations has happened.
Commits on Aug 1, 2017
  1. reduces the memory representation RAM consumption (#682)

    ivg committed Aug 1, 2017
    As a continuation of #680, this PR will reduce the memory footprint of
    the memory representation in bap, the `mem` type. As an optimization, we
    were precomputing a memory accessors during the construction of a memory
    object. Although, it might save some time and didn't at all, as the
    memory accessor operations are rarely used if used at all. Thus, this PR
    removes the precomputed getters from the memory, and moves all the check
    to the time of getter invokation.
    As a result, this reduces the size of memory representation by 10
    words (80 bytes on amd64). In total with the #680, this will reduce
    memory representation from 21 words to only 7.
Commits on Jul 31, 2017
  1. new bitvector representation (#680)

    ivg committed Jul 31, 2017
    The proposed representation uses the Z.t as a container for both: the
    payload, and the metadata.
    We reserve 15 bits for the metadata. The first bit represents whether a
    bitvector should be treated as signed or unsigned. The rest 14 bits
    contain the size of a bitvector. With such representation we can encode
    only bitvectors of size 2^14 or less. This is, however, enough in
    practice. If more would be needed we may switch to a more complex
    prefixed encoding akin to UTF.
         |  payload  | size | s |
          size+15  15 14       0
    Given this representation all bitvectors that are smaller than
    0x1_000_000_000_000 are represented as an OCaml int, and are thus
    unboxed, so creating new bitvector, and performing bitvector
    arithmentics no longer perfoms allocation.
     # let deadbeef = Word.of_int32 0xDEADBEEFl;;
     val deadbeef : Bap.Std.Word.t = 0xdeadbeef:32
     # Obj.is_int (Obj.repr deadbeef);;
     - : bool = true
    The previous representation is still available under
    `Word.Stable.V1`. It is useful for serialization and deserialization of
    data stored in previous representation. In particular, we are using the
    old representation to read byteweight signatures.