From 139b7cbb19234c008306669ea4a0e67a6e55faa3 Mon Sep 17 00:00:00 2001 From: Ivan Gotovchits Date: Thu, 9 Feb 2017 15:29:07 -0500 Subject: [PATCH] Beagle: obfuscated string solver (#627) * first prototype of beagle Idea ==== On a high level, Beagle is tracking everything that smells to him. If something smells, then it will grab the track and follow, until the trail temperature is above the given threshold, in that case it will grab the target and mark it as his prey. The temperature is estimated with the following recurrences: $$$ t = t * (1 - alpha) + alpha * phi, $$$ where $t$ - is a temperature, between 0 and 1, is our confidence, that whatever we're sniffing is our target; $alpha$ - sensitivity parameter (defines how fast the trail is loosing the temperature); $phi$ - a confidence that something smells (again between 0 and 1). Defines how fast we will hunt the prey. Currently, it can be either 0 if something doesn't smell, or 1-1e4 if it smells. Implementation ============== Beagle uses microexecution, to run each subroutine. Every time a word is seen by the interpreter the beagle is targeted on it. Current status ============= Currently all the parameters are hardcoded, and many parameters should be estimated, not specified by a user. The only predicate, that defines that something smells is whether a character is_ascii or is_null. Usage ===== 1. Build and install with make 2. Run from IDA with `Shift-S` and specify option `--beagle` and the attribute name `beagle-prey` 3. Use the text search (Alt-T) to navigate through all preys. * beagle update The new features: 1. searching for static strings in the binary 2. identifying references to the static strings 3. signal processing of a char data to identify word-like patterns 4. word recovering based on dictionary Attributes: beagle-words -- words, that can be built by the term beagle-chars -- a sequence of characters that were computed by the term and nearby terms beagle-strings -- static strings that were explicitly referenced by this term (either directly or indirectly) Basically, here is the description of the algorithm: 1. identify all null-terminated strings 2. microexecute each subroutine and each block and try to detect word like signals 3. microexecute each subroutine and each block and detect any direct or indirect accesses to the known null terminated string 4. use user provided dictionary, plus strings that were find in the binary and recover all possible words, that can be built from the set of strings. 5. annotate project with whatever was found. * matures beagle This is a big commit, reflecting several days of beagle training. First of all, I've trained him to understand few new commands: --beagle-print-strings - will print all found static strings; --beagle-print-words - will print all the generated words; --beagle-print-chars - print the detected char sequences; --beagle-ignore-strings - will ignore found static strings; --beagle-no-words - will not try to build words from a dictionary; --beagle-alphabet - will force beagle to use different alphabet; --beagle-words - will add words to beagle's dictionary. The default use case is still the same - run beagle via `--pass=beagle` and annotate program tree with references to strings. The print options are mostly for the debugging and standalone using. Speaking of the static strings, i.e., if we will compare it with the `strings` utility, currently beagle outputs slightly less strings than `strings`. This is not a bug, or limitation. As by design Beagle finds only strings that are null-terminated, where `strings` will find all strings even if the last character is an arbitrary non-printabl byte. To my experiments, this behavior only adds more trash. However, probably, there was a good reason why `strings` authors decided to use any strings. For example, to work with pascal binaries, where strings are not required to have a null character at the end. Given this consideration, we will probably resign from our allegiency to the null-terminator in the future. * prepares beagle for the release prefixed names, packaged in oasis, and other boring stuff. --- lib/beagle/beagle_prey.ml | 33 +++ lib/beagle/beagle_prey.mli | 26 ++ plugins/beagle/beagle_main.ml | 416 ++++++++++++++++++++++++++++++ plugins/beagle/beagle_main.mli | 0 plugins/beagle/beagle_trapper.ml | 122 +++++++++ plugins/beagle/beagle_trapper.mli | 34 +++ 6 files changed, 631 insertions(+) create mode 100644 lib/beagle/beagle_prey.ml create mode 100644 lib/beagle/beagle_prey.mli create mode 100644 plugins/beagle/beagle_main.ml create mode 100644 plugins/beagle/beagle_main.mli create mode 100644 plugins/beagle/beagle_trapper.ml create mode 100644 plugins/beagle/beagle_trapper.mli diff --git a/lib/beagle/beagle_prey.ml b/lib/beagle/beagle_prey.ml new file mode 100644 index 000000000..dbb7d3a83 --- /dev/null +++ b/lib/beagle/beagle_prey.ml @@ -0,0 +1,33 @@ +open Core_kernel.Std +open Bap.Std +open Format + +module Words = struct + type t = String.Set.t [@@deriving bin_io, compare, sexp] + + let max = 80 + + let pp ppf set = + let words = Set.to_list set |> String.concat ~sep:", " in + let words = if String.length words < max then words + else String.subo ~len:max words in + fprintf ppf "%s" (String.escaped words) + + let to_string set = asprintf "%a" pp set +end + +type words = Words.t + + +let chars = Value.Tag.register (module Words) + ~uuid:"ff83ee29-1f58-4dc4-840c-4249de04a977" + ~name:"beagle-chars" + +let words = Value.Tag.register (module Words) + ~uuid:"08e1ca88-eab9-4ac3-8fa8-3b08735a30e5" + ~name:"beagle-words" + + +let strings = Value.Tag.register (module Words) + ~uuid:"386efa37-85b0-4b48-b04d-8bafd5160670" + ~name:"beagle-strings" diff --git a/lib/beagle/beagle_prey.mli b/lib/beagle/beagle_prey.mli new file mode 100644 index 000000000..3275882be --- /dev/null +++ b/lib/beagle/beagle_prey.mli @@ -0,0 +1,26 @@ +open Core_kernel.Std +open Bap.Std + +module Words : sig + type t = String.Set.t + include Value.S with type t := t + val to_string : t -> string +end +type words = Words.t + +(** Attributes that are added by beagle analysis. *) + +(** Each string in a set is a sequence of characters + that were detected by Beagle on each emulation (it is possible, + that beagle will sniff the same term more than once). + The characters are specified in an order in which they were + observed.*) +val chars : words tag + +(** a set of static strings that we directly or indirectly referenced + the emulation of a term.*) +val strings : words tag + +(** a set of words that can be built from a specified alphabet with + the observed characters. *) +val words : words tag diff --git a/plugins/beagle/beagle_main.ml b/plugins/beagle/beagle_main.ml new file mode 100644 index 000000000..08cdbe493 --- /dev/null +++ b/plugins/beagle/beagle_main.ml @@ -0,0 +1,416 @@ +open Core_kernel.Std +open Bap.Std +open Microx.Std +open Regular.Std +open Format + +open Beagle_prey + +module Trapper = Beagle_trapper + +include Self() + +module Param = struct + open Config;; + + manpage [ + `S "DESCRIPTION"; + + `P "Beagle is an obfuscated string solver that uses CPU emulation + for discovering and decoding strings. Beagle can also be used as + an advanced data cross-referrence tool. Beagle combines + Microexecution with digital signal processing techniques and + efficient dictionary search. Data that pass through a CPU emulator + is processed with a low-pass filter and string-like patterns are + detected. At the final step, words are recovered from a detected + shuffled sequence of characters. For this we employ a + scrabble-like algorithm, that will detect all possible words, that + can be built from a given sequence of characters. The algorithm is + parameterized by a dictionary. It can be just an English + dictionary, but it can be also a dictionary built from signatures + obtained from a known compromised software. The dictionary search + algorithm uses a special trie data structure with a search + procedure that doesn't depend on the size of a dictionary or the + size of a character sequence. " ];; + + let no_strings = flag "ignore-strings" + ~doc:"don't put static strings into the initial dictionary" + + let dicts = param (list file) "dictionary" + ~doc:"Add dictionary file(s)." + + let words = param (list string) "words" + ~doc:"Add specified words to the dictionary." + + let pwords = flag "print-words" + ~doc:"Print all buildable words." + + let pchars =flag "print-chars" ~doc:"Print all observed letters." + + let pstrings = flag "print-strings" ~doc:"Print static strings" + + let no_words = flag "no-words" + ~doc:"Don't try to build words from a dictionary" + + let alphabet : + (module Trapper.Alphabet) param = + param Trapper.(enum [ + "printable", (module Ascii.Printable : Trapper.Alphabet); + "ascii", (module Ascii); + "alpha", (module Ascii.Alpha); + "alpha.caseless", (module Ascii.Alpha.Caseless); + "alphanum", (module Ascii.Alphanum); + "alphanum.caseless", (module Ascii.Alphanum.Caseless); + "digits", (module Ascii.Digits); + ]) "alphabet" + ~doc:"Build words from the specified alphabet"; + +end + + +let memory_lookup proj addr = + let memory = Project.memory proj in + Memmap.lookup memory addr |> Seq.hd |> function + | None -> None + | Some (mem,_) -> match Memory.get ~addr mem with + | Ok w -> Some w + | _ -> None + +let register_lookup proj = + let arch = Project.arch proj in + let width = Arch.addr_size arch |> Size.in_bits in + let mem_start = Word.of_int64 ~width 0x40000000L in + let module Target = (val target_of_arch arch) in + fun var -> Option.some_if (Target.CPU.is_sp var) mem_start + + +module Strings = struct + type state = + | String of addr * int * char list + | Data + + let to_ascii word = match Word.to_int word with + | Error _ -> assert false + | Ok n -> match Char.of_int_exn n with + | '\x00' -> Some '\x00' + | ch when Char.is_print ch || Char.is_whitespace ch -> Some ch + | _ -> None + + let make_string len chars = + let bytes = Bytes.create len in + List.iteri chars ~f:(fun i c -> bytes.[len-i-1] <- c); + Bytes.to_string bytes + + let scan (mem,sec) = + let addr = Memory.min_addr mem in + let rec next strings state disp = + match Memory.get mem ~disp ~addr with + | Error _ -> strings + | Ok byte -> match to_ascii byte,state with + | None,_ -> next strings Data (disp+1) + | Some '\x00',Data -> next strings Data (disp+1) + | Some '\x00',String (_,len,_) when len < 4 -> + next strings Data (disp+1) + | Some '\x00',String (base,len,chars) -> + let data = make_string len chars in + next (Map.add strings ~key:base ~data) Data (disp+1) + | Some ch,Data -> + let base = Addr.nsucc addr disp in + next strings (String (base,1,[ch])) (disp+1) + | Some ch,String (base,n,chars) -> + next strings (String (base,n+1,ch::chars)) (disp+1) in + next Addr.Map.empty Data 0 + + let union = Map.merge ~f:(fun ~key -> function + | `Both (s1,s2) -> + Option.some @@ + if String.length s1 > String.length s2 then s1 else s2 + | `Left s | `Right s -> Some s) + + let extract memmap = + let ms = Memmap.to_sequence memmap in + Seq.(ms >>| scan |> reduce ~f:union) |> function + | None -> Addr.Map.empty + | Some m -> m + +end + +module Beagle = struct + let len = 8 + + let p0 = 0.0 + let p1 = 1. /. float Trapper.Ascii.Alphanum.length + let threshold = p1 + let alpha = 0.1 + + type target = { + len : int; + tids : tid list; + chars : char list; + } + + type hypot = { + target : target; + accepted : bool; + } + + type t = { + temp : float; + targets : target list; + hypot : hypot; + } + + + let empty_target = {tids=[]; chars=[]; len=0} + let h0 = {target = empty_target; accepted=false} + + let empty = { + temp = p0; + hypot = h0; + targets = []; + } + + let targets t = t.targets + let uniform _ = p1 + let pdf c = uniform c + + let smells ch = Char.is_print ch + + let phi c = match Char.of_int c with + | Some ch when smells ch -> Some (ch,1. -. pdf ch) + | _ -> None + + let smooth t x = t *. (1. -. alpha) +. alpha *. x + + type action = + | Done + | Keep + | Lock + + let classify temp hypot = + if hypot.accepted then + if temp < threshold then Done else Keep + else + if temp >= threshold then Lock else Keep + + + let step_char t tid char = + let p,hypot = match phi char with + | None -> 0.0, t.hypot + | Some (char,p) -> + (if t.hypot.target.len = 3 * len then 0.0 else p), { + t.hypot with target = { + len = t.hypot.target.len + 1; + tids = tid :: t.hypot.target.tids; + chars = char :: t.hypot.target.chars; + }} in + let temp = max p0 (smooth t.temp p) in + match classify temp hypot with + | Done -> {temp; hypot=h0; targets=hypot.target::t.targets} + | Keep -> {t with temp; hypot} + | Lock -> {t with temp; hypot = { + accepted=true; + target = { + len = 8; + tids = List.take hypot.target.tids 8; + chars = List.take hypot.target.chars 8; + }}} + + let step t tid x = + if Word.bitwidth x = 1 then t + else Word.enum_bytes x LittleEndian |> + Seq.map ~f:(fun w -> ok_exn (Word.to_int w)) |> + Seq.fold ~init:t ~f:(fun t c -> step_char t tid c) + +end + +module Interpreter = struct + open Monad.State.Monad_infix + + let max_steps = 100 + let max_loop = 10 + + class context prog = object + inherit Conqueror.context ~max_steps ~max_loop prog + val beagle = Beagle.empty + val strings : string list Tid.Map.t = Tid.Map.empty + method with_beagle b = {< beagle = b >} + method beagle = beagle + method strings = strings + method with_strings s = {< strings = s >} + end + + let main proj strings = + let prog = Project.program proj in + let memory = memory_lookup proj in + let lookup = register_lookup proj in + let arch = Project.arch proj in + let endian = Arch.endian arch in + let size = (Arch.addr_size arch :> size) in + let module Target = (val target_of_arch arch) in + let mem = Bil.var Target.CPU.mem in + object(self) + constraint 'a = #context + inherit ['a] Conqueror.main prog as super + inherit! ['a] Concretizer.main ~memory ~lookup () + + method! eval_def def = + super#eval_def def >>= fun () -> + self#sniff_def (Def.lhs def) + + method! eval_arg arg = + super#eval_arg arg >>= fun () -> + self#sniff_def (Arg.lhs arg) + + + method! eval_load ~mem ~addr e s = + self#update_strings addr >>= fun () -> + super#eval_load ~mem ~addr e s + + method private sniff_def var = + super#lookup var >>| Bil.Result.value >>= function + | Bil.Imm x -> + self#update_context (fun ctxt tid -> + ctxt#with_beagle (Beagle.step ctxt#beagle tid x)) >>= fun () -> + self#update_strings (Bil.Int x) + | _ -> Monad.State.return () + + method private update_strings addr = + super#eval_exp addr >>| Bil.Result.value >>= function + | Bil.Mem _ | Bil.Bot -> Monad.State.return () + | Bil.Imm addr -> match Map.find strings addr with + | Some data -> self#update_with_string data + | None -> + super#eval_load ~mem ~addr:(Bil.Int addr) endian size >>| + Bil.Result.value >>= function + | Bil.Mem _ | Bil.Bot -> Monad.State.return () + | Bil.Imm addr -> match Map.find strings addr with + | Some data -> self#update_with_string data + | None -> Monad.State.return () + + method private update_with_string data = + self#update_context (fun ctxt tid -> + Map.add_multi ctxt#strings ~key:tid ~data |> + ctxt#with_strings) + + method private update_context f = + Monad.State.update (fun ctxt -> match List.hd ctxt#trace with + | None -> assert false + | Some tid -> f ctxt tid) + + end +end + +type prey = { + chars : string list Tid.Map.t; + strings : string list Tid.Map.t; +} + +let collect_chars targets ctxt = + Beagle.targets ctxt#beagle |> + List.fold ~init:targets ~f:(fun targets {Beagle.tids; chars} -> + let data = String.of_char_list chars in + List.fold ~init:targets tids ~f:(fun targets key -> + Map.add_multi targets ~key:key ~data)) + +let collect_strings existing ctxt = + Map.merge existing ctxt#strings ~f:(fun ~key -> function + | `Both (s1,s2) -> Some (s1 @ s2) + | `Left s | `Right s -> Some s) + + +let run proj strings = + let prog = Project.program proj in + let interp = Interpreter.main proj strings in + let start = new Interpreter.context prog in + let init = {chars = Tid.Map.empty; strings = Tid.Map.empty} in + Term.enum sub_t prog |> + Seq.fold ~init ~f:(fun prey sub -> + let ctxt = Monad.State.exec (interp#eval_sub sub) start in + let prey = { + chars = collect_chars prey.chars ctxt; + strings = collect_strings prey.strings ctxt + } in + Term.enum blk_t sub |> Seq.fold ~init:prey ~f:(fun prey blk -> + let ctxt = Monad.State.exec (interp#eval_blk blk) start in + { + chars = collect_chars prey.chars ctxt; + strings = collect_strings prey.strings ctxt + })) + +let create_marker {Config.get=(!!)} statics {chars=cs; strings=ss} = + let module Alphabet = (val !!Param.alphabet) in + let module Trapper = Trapper.Make(Alphabet) in + let add_statics trapper = + Map.fold statics ~init:trapper ~f:(fun ~key ~data dict -> + Trapper.add_word dict data) in + let dict = Trapper.of_files !!Param.dicts in + let dict = if !!Param.no_strings then dict else add_statics dict in + let dict = List.fold !!Param.words ~f:Trapper.add_word ~init:dict in + let add_chars t = + match Map.find cs (Term.tid t) with + | None -> t + | Some sets -> + let sets = String.Set.of_list sets in + let t = Term.set_attr t chars sets in + let ws = + Set.fold sets ~init:String.Set.empty ~f:(fun words cs -> + Trapper.build dict cs |> + Seq.filter ~f:(fun w -> String.length w > 3) |> + Seq.fold ~init:words ~f:Set.add) in + if Set.is_empty ws then t + else Term.set_attr t words ws in + let add_string t = match Map.find ss (Term.tid t) with + | None -> t + | Some ss -> + String.Set.of_list ss |> + Term.set_attr t strings in + object(self) + inherit Term.mapper as super + method! map_term cls t = + let t = super#map_term cls t in + add_chars t |> + add_string + + end + +let print_strings = Map.iteri ~f:(fun ~key ~data -> + if String.length data > 3 then printf "%s\n" (String.escaped data)) + + +let create_printer {Config.get=(!!)} = + let pchars = !!Param.pchars + and pwords = !!Param.pwords + and pstrings = !!Param.pstrings in + + let pr guard tag t = + if guard then Option.iter (Term.get_attr t tag) + ~f: (fun t -> printf "%s\n" @@ Words.to_string t) in + object + inherit [unit] Term.visitor as self + method! enter_term cls t () = + pr pchars chars t; + pr pwords words t; + pr pstrings strings t; + method print_program prog = + if pchars || pwords || pstrings then + self#run prog () + + method print_strings strings = + if pstrings then print_strings strings + end + +let main conf proj = + let strings = Strings.extract (Project.memory proj) in + let printer = create_printer conf in + printer#print_strings strings; + let preys = run proj strings in + let marker = create_marker conf strings preys in + let prog = marker#run (Project.program proj) in + printer#print_program prog; + Project.with_program proj prog + + + + +let () = Config.when_ready (fun conf -> Project.register_pass (main conf)) diff --git a/plugins/beagle/beagle_main.mli b/plugins/beagle/beagle_main.mli new file mode 100644 index 000000000..e69de29bb diff --git a/plugins/beagle/beagle_trapper.ml b/plugins/beagle/beagle_trapper.ml new file mode 100644 index 000000000..becfdf0cd --- /dev/null +++ b/plugins/beagle/beagle_trapper.ml @@ -0,0 +1,122 @@ +open Core_kernel.Std +open Format + + +module type Alphabet = sig + val length : int + val index : char -> int +end + +module Ascii = struct + let (-) x y = Char.(to_int x - Char.to_int y) + module Alpha = struct + module Caseless = struct + let length = 26 + let index c = Char.uppercase c - 'A' + end + let length = 26 * 2 + let index c = + if Char.is_uppercase c then c - 'A' + else c - 'a' + 26 + end + + module Digits = struct + let length = 10 + let index c = c - '0' + end + + module Alphanum = struct + module Caseless = struct + let length = 26 + 10 + let index c = + if Char.is_alpha c then Alpha.Caseless.index c + else Digits.index c + 26 + end + let length = 26 * 2 + 10 + let index c = + if Char.is_alpha c then Alpha.index c + else Digits.index c + 26 * 2 + end + + module Printable = struct + let length = 95 + let index c = c - ' ' + end + let length = 128 + let index = Char.to_int +end + +module Make(Alpha : Alphabet) = struct + type t = { + dict : t Int.Map.t; + data : string list; + } [@@deriving bin_io, compare, sexp] + + + let spectrum word = + Array.init Alpha.length ~f:(fun i -> + String.count word ~f:(fun c -> Alpha.index c = i)) + + let empty = {dict = Int.Map.empty; data=[]} + + + let add_word dict word = + let count = spectrum word in + let rec add {dict; data} i = + if i < Array.length count then { + data; + dict = Map.update dict count.(i) ~f:(function + | None -> add empty (i+1) + | Some sub -> add sub (i+1)) + } else {empty with data = word :: data} in + add dict 0 + + let build dict word = + let count = spectrum word in + let rec find {dict; data} i = + if i < Array.length count then + Sequence.range 0 count.(i) ~stop:`inclusive |> + Sequence.concat_map ~f:(fun cnt -> match Map.find dict cnt with + | None -> Sequence.empty + | Some dict -> find dict (i+1)) + else Sequence.of_list data in + find dict 0 + + let is_buildable dict word = + let count = spectrum word in + let rec find {dict} i = + i >= Array.length count || + Sequence.range 0 count.(i) ~stop:`inclusive |> + Sequence.exists ~f:(fun cnt -> match Map.find dict cnt with + | None -> false + | Some dict -> find dict (i+1)) in + find dict 0 + + let add_from_file dict name = + In_channel.(with_file name ~f:(fold_lines ~init:dict ~f:add_word)) + + let of_file name = add_from_file empty name + + let of_files files = + List.fold files ~init:empty ~f:add_from_file + +end + + +(* module Test = struct *) +(* module Dict = Make(Ascii.Printable) *) +(* let run () = *) +(* let dict = *) +(* In_channel.(with_file Sys.argv.(1) *) +(* ~f:(fold_lines ~init:Dict.empty ~f:Dict.add_word)) in *) +(* printf "Dictionary is %d MB big\n" (Dict.bin_size_t dict / 1024 / 1024); *) +(* let prompt () = *) +(* printf "Enter characters and hit enter (or Ctrl-D to stop): %!" in *) +(* prompt (); *) +(* In_channel.iter_lines stdin ~f:(fun set -> *) +(* Dict.build dict set |> Sequence.iter ~f:print_endline; *) +(* prompt ()) *) + +(* end *) + +(* let () = Test.run () *) diff --git a/plugins/beagle/beagle_trapper.mli b/plugins/beagle/beagle_trapper.mli new file mode 100644 index 000000000..c45b56ac6 --- /dev/null +++ b/plugins/beagle/beagle_trapper.mli @@ -0,0 +1,34 @@ +open Bap.Std + +module type Alphabet = sig + val length : int + val index : char -> int +end + +module Ascii : sig + module Alpha : sig + module Caseless : Alphabet + include Alphabet + end + module Alphanum : sig + module Caseless : Alphabet + include Alphabet + end + module Digits : Alphabet + module Printable : Alphabet + include Alphabet +end + +module Make(A : Alphabet) : sig + type t [@@deriving bin_io, compare, sexp] + + val empty : t + + val of_file : string -> t + val of_files : string list -> t + + val add_word : t -> string -> t + val build : t -> string -> string seq + val is_buildable : t -> string -> bool + +end