Beagle: obfuscated string solver (#627)
* first prototype of beagle

Idea
====

At a high level, Beagle tracks everything that smells to him. If
something smells, he picks up the trail and follows it until the
trail temperature rises above the given threshold; at that point he
grabs the target and marks it as his prey.

The temperature is estimated with the following recurrence:
   $$$
   t = t * (1 - alpha) + alpha * phi,
   $$$

   where $t$ - the temperature, between 0 and 1;
               it is our confidence
               that whatever we're sniffing is our target;
         $alpha$ - the sensitivity parameter (defines how fast
               the trail is losing temperature);
         $phi$ - the confidence that something smells
                (again between 0 and 1).
                Defines how fast we will hunt the prey.
                Currently, it can be either 0 if something doesn't
                smell, or 1-1e4 if it smells.
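
The recurrence is an exponential moving average of the smell
confidence. A minimal standalone sketch follows; it is not the code
from this commit, and the names `step`, `hunt` and `threshold`, as
well as the initial temperature of 0, are assumptions made for the
illustration:

```ocaml
(* One step of the recurrence: t <- t * (1 - alpha) + alpha * phi. *)
let step ~alpha t phi = t *. (1. -. alpha) +. alpha *. phi

(* Feed a list of smell confidences, starting from temperature 0
   (an assumption), and return the index of the first observation
   at which the temperature rises above the threshold, if any. *)
let hunt ~alpha ~threshold phis =
  let rec loop i t = function
    | [] -> None
    | phi :: rest ->
      let t = step ~alpha t phi in
      if t > threshold then Some i else loop (i + 1) t rest
  in
  loop 0 0. phis
```

With a large alpha the trail heats up and cools down quickly; with a
small alpha a single non-smelling observation barely changes the
temperature.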

Implementation
==============

Beagle uses microexecution to run each subroutine. Every time a word is
seen by the interpreter, the beagle is targeted on it.

Current status
==============

Currently all the parameters are hardcoded, and many parameters should
be estimated rather than specified by a user. The only predicate that
defines whether something smells is whether a character is_ascii or
is_null.

Usage
=====

1. Build and install with

   make

2. Run from IDA with `Shift-S` and specify option `--beagle`
   and the attribute name `beagle-prey`

3. Use the text search (Alt-T) to navigate through all preys.

* beagle update

The new features:

1. searching for static strings in the binary
2. identifying references to the static strings
3. signal processing of char data to identify word-like patterns
4. word recovery based on a dictionary

Attributes:

beagle-words -- words that can be built by the term
beagle-chars -- a sequence of characters that were computed by the term
                and nearby terms
beagle-strings -- static strings that were explicitly referenced by this
                  term (either directly or indirectly)

Basically, here is the description of the algorithm:

1. identify all null-terminated strings
2. microexecute each subroutine and each block
   and try to detect word-like signals
3. microexecute each subroutine and each block
   and detect any direct or indirect accesses to the known
   null-terminated strings
4. use the user-provided dictionary, plus the strings that were found
   in the binary, and recover all possible words that can be built
   from the set of strings.
5. annotate the project with whatever was found.
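
Step 4 can be illustrated with a small standalone sketch. It assumes
that a word "can be built" when every one of its characters occurs
among the observed characters (set semantics, ignoring multiplicity);
the commit message does not spell out the exact rule, and the names
`chars_of_string`, `can_build` and `recover` are hypothetical:

```ocaml
module CharSet = Set.Make (Char)

(* The set of distinct characters occurring in a string. *)
let chars_of_string s = CharSet.of_seq (String.to_seq s)

(* Assumption: a non-empty word can be built if all of its
   characters are among the observed ones. *)
let can_build observed word =
  word <> "" && CharSet.subset (chars_of_string word) observed

(* Keep the dictionary words that can be built, in dictionary order. *)
let recover ~dictionary observed =
  List.filter (can_build observed) dictionary
```

For example, `recover ~dictionary:["cat"; "dog"] (chars_of_string "tac")`
keeps only `"cat"`.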

* matures beagle

This is a big commit, reflecting several days of beagle training.

First of all, I've trained him to understand a few new commands:

--beagle-print-strings - will print all found static strings;
--beagle-print-words - will print all the generated words;
--beagle-print-chars - print the detected char sequences;
--beagle-ignore-strings - will ignore found static strings;
--beagle-no-words - will not try to build words from a dictionary;
--beagle-alphabet - will force beagle to use different alphabet;
--beagle-words - will add words to beagle's dictionary.

The default use case is still the same - run beagle via `--pass=beagle`
and annotate the program tree with references to strings.

The print options are mostly for debugging and standalone use.

Speaking of the static strings: compared with the `strings` utility,
beagle currently outputs slightly fewer strings. This is neither a bug
nor a limitation. By design, Beagle finds only strings that are
null-terminated, whereas `strings` finds all strings, even if the
terminating character is an arbitrary non-printable byte. In my
experiments, this behavior only adds more trash. However, the `strings`
authors probably had a good reason to accept any strings, for example,
to work with Pascal binaries, where strings are not required to have a
null character at the end. Given this consideration, we will probably
give up our allegiance to the null-terminator in the future.
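
The difference can be seen in a minimal standalone sketch of a
null-terminated string scanner. It is not the code from this commit:
the printable range (0x20 to 0x7e), the default minimal length of 4
and the names `is_printable` and `null_terminated_strings` are all
assumptions made for the illustration:

```ocaml
(* Assumption: "printable" means the ASCII range 0x20..0x7e. *)
let is_printable c =
  let n = Char.code c in
  n >= 0x20 && n < 0x7f

(* Return (offset, string) for every run of at least [min_len]
   printable characters that is immediately followed by a null byte.
   Runs terminated by any other byte, or by the end of the data,
   are dropped. *)
let null_terminated_strings ?(min_len = 4) data =
  let n = String.length data in
  let rec scan start i acc =
    if i >= n then List.rev acc
    else if is_printable data.[i] then scan start (i + 1) acc
    else
      let len = i - start in
      let acc =
        if data.[i] = '\000' && len >= min_len
        then (start, String.sub data start len) :: acc
        else acc in
      scan (i + 1) (i + 1) acc
  in
  scan 0 0 []
```

A run such as `"world\001"` is skipped by this scanner, while a
scanner that accepts any non-printable terminator would report it.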

* prepares beagle for the release

prefixed names, packaged in oasis, and other boring stuff.
ivg authored and gitoleg committed Feb 9, 2017
1 parent 5a82d0e commit 139b7cb
Showing 6 changed files with 631 additions and 0 deletions.
33 changes: 33 additions & 0 deletions lib/beagle/beagle_prey.ml
@@ -0,0 +1,33 @@
open Core_kernel.Std
open Bap.Std
open Format

module Words = struct
  type t = String.Set.t [@@deriving bin_io, compare, sexp]

  (* maximum number of characters printed for a set *)
  let max = 80

  (* prints the words separated by commas, truncated to [max] chars *)
  let pp ppf set =
    let words = Set.to_list set |> String.concat ~sep:", " in
    let words = if String.length words < max then words
      else String.subo ~len:max words in
    fprintf ppf "%s" (String.escaped words)

  let to_string set = asprintf "%a" pp set
end

type words = Words.t


let chars = Value.Tag.register (module Words)
    ~uuid:"ff83ee29-1f58-4dc4-840c-4249de04a977"
    ~name:"beagle-chars"

let words = Value.Tag.register (module Words)
    ~uuid:"08e1ca88-eab9-4ac3-8fa8-3b08735a30e5"
    ~name:"beagle-words"


let strings = Value.Tag.register (module Words)
    ~uuid:"386efa37-85b0-4b48-b04d-8bafd5160670"
    ~name:"beagle-strings"
26 changes: 26 additions & 0 deletions lib/beagle/beagle_prey.mli
@@ -0,0 +1,26 @@
open Core_kernel.Std
open Bap.Std

module Words : sig
  type t = String.Set.t
  include Value.S with type t := t
  val to_string : t -> string
end
type words = Words.t

(** Attributes that are added by beagle analysis. *)

(** Each string in a set is a sequence of characters
that were detected by Beagle on each emulation (it is possible,
that beagle will sniff the same term more than once).
The characters are specified in an order in which they were
observed.*)
val chars : words tag

(** a set of static strings that were directly or indirectly referenced
during the emulation of a term.*)
val strings : words tag

(** a set of words that can be built from a specified alphabet with
the observed characters. *)
val words : words tag
