first commit

JonathanSalwan · Oct 28, 2016 · commit 020e8bc
Showing 211 changed files with 84,802 additions and 0 deletions.
diff --git a/.gitignore b/.gitignore
@@ -0,0 +1,3 @@
+.~*
+private/*
+*.pyc
diff --git a/README.md b/README.md
@@ -0,0 +1,160 @@
+# Tigress Protections
+
+> [Tigress](http://tigress.cs.arizona.edu/) is a diversifying virtualizer/obfuscator for the C language that supports many novel defenses against both static and dynamic reverse engineering and de-virtualization attacks.
+
+> In particular, Tigress protects against static de-virtualization by generating virtual instruction sets of arbitrary complexity and diversity, by producing interpreters with multiple types of instruction dispatch, and by inserting code for anti alias analysis. Tigress protects against dynamic de-virtualization by merging the real code with bogus functions, by inserting implicit flow, and by creating slowly-executing reentrant interpreters. Tigress implements its own version of code packing through the use of runtime code generation. Finally, Tigress' dynamic transformation provides a generalized form of continuous runtime code modification.
+
+# VM descriptions
+
+The Tigress team provides a set of [challenges](http://tigress.cs.arizona.edu/challenges.html#current) that exercise different kinds of protection:
+
+* **VM-0**: One level of virtualization, random dispatch.
+* **VM-1**: One level of virtualization, superoperators, split instruction handlers.
+* **VM-2**: One level of virtualization, bogus functions, implicit flow.
+* **VM-3**: One level of virtualization, instruction handlers obfuscated with arithmetic encoding, virtualized function is split and the split parts merged.
+* **VM-4**: Two levels of virtualization, implicit flow.
+* **VM-5**: One level of virtualization, one level of jitting, implicit flow.
+* **VM-6**: Two levels of jitting, implicit flow.
+
+# Challenge
+
+All challenges take as input a number and return a hash. Example:
+
+<pre>
+$ ./obfuscated_binaries/tigress-2-challenge-2 1234
+202180712448
+
+$ ./obfuscated_binaries/tigress-2-challenge-2 823748
+50564355584
+
+$ ./obfuscated_binaries/tigress-2-challenge-2 2834723
+50714072576
+</pre>
+
+The hash computation function is obfuscated. Types of possible attacks:
+
+* In a source recovery attack the task is to identify the algorithm that computes SECRET.
+* In a data recovery attack the task is to extract a specific run-time or compile-time data item.
+* In a metadata recovery attack the task is to identify the sequence of transformations that resulted in SECRET, along with arguments to those transformations, such as the dispatch method used in a virtualization.
+* In a location attack the task is to identify the code bytes of the program that comprise the obfuscated SECRET function.
+
+# Automatic deobfuscation
+
+Our goals were to:
+
+* Symbolically extract the hash algorithm
+* Simplify these symbolic expressions
+* Provide Python scripts that compute the hash for a given input **and** recover colliding inputs for a given hash
+* Provide a new simplified version of the binary
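To make the "inputs from a given hash" goal concrete: once the hash is available as a plain expression, recovering an input means inverting that expression. The sketch below is not the code produced from `templates.py`. It uses a hypothetical toy hash (`toy_hash`, with made-up constants `ADD` and `MUL`) that is a bijection modulo 2^64, so it can be inverted in closed form. The real challenge expressions may not be invertible this way and would then call for a solver.

```python
MASK64 = (1 << 64) - 1

# Made-up constants for illustration only; MUL must be odd to be invertible mod 2**64.
ADD = 0x1234
MUL = 0x9E3779B97F4A7C15


def toy_hash(value: int) -> int:
    """Hypothetical 64-bit hash: add a constant, then multiply by an odd constant."""
    return ((value + ADD) * MUL) & MASK64


def toy_unhash(digest: int) -> int:
    """Invert toy_hash using the modular inverse of MUL (requires Python 3.8+)."""
    inverse = pow(MUL, -1, 1 << 64)
    return (digest * inverse - ADD) & MASK64


if __name__ == "__main__":
    digest = toy_hash(1234)
    print(toy_unhash(digest))
```

Because every odd number has an inverse modulo a power of two, `toy_unhash(toy_hash(x))` returns `x` for any 64-bit `x`.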
+
+All of this with only one generic script :). To do so, we proceed in the following order:
+
+* Symbolically emulate the obfuscated binary with [Triton](https://github.com/JonathanSalwan/Triton)
+* Concretize everything that is not related to the user input.
+* Extract the hash algorithm and create `input->hash` and `hash->inputs` using [templates](templates.py)
+* Convert Triton's expressions to [Arybo](https://github.com/quarkslab/arybo) expressions
+* Convert Arybo's expressions to the LLVM-IR representation
+* Apply LLVM optimizations (O2)
+* Rebuild a simplified binary version
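The concretization step in the list above can be illustrated without Triton. The sketch below is a toy, not the Triton API: expressions are hypothetical nested tuples (`("const", n)`, `("sym", name)`, or `(op, lhs, rhs)`), and `concretize` folds every subtree that does not depend on a symbolic input into a single constant, leaving only the input-dependent part.

```python
import operator

MASK64 = (1 << 64) - 1

# Binary operators supported by this toy expression format (an assumption of the sketch).
OPS = {
    "add": operator.add,
    "mul": operator.mul,
    "and": operator.and_,
    "or": operator.or_,
}


def concretize(node):
    """Fold constant-only subtrees; keep anything that depends on a symbol."""
    kind = node[0]
    if kind in ("const", "sym"):
        return node
    lhs = concretize(node[1])
    rhs = concretize(node[2])
    if lhs[0] == "const" and rhs[0] == "const":
        # Both sides are concrete: evaluate now, with 64-bit wraparound.
        return ("const", OPS[kind](lhs[1], rhs[1]) & MASK64)
    return (kind, lhs, rhs)


if __name__ == "__main__":
    expr = ("add", ("sym", "input"), ("mul", ("const", 6), ("const", 7)))
    print(concretize(expr))
```

Here the `mul` subtree has no symbolic operand, so it collapses to `("const", 42)`, while the outer `add` is kept because it involves `input`.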
+
+For more information, check out our [solve-vm.py](solve-vm.py) script.
+
+# solve-vm.py
+
+**Prerequisites**: you must clone the branch `dev-319-bis` of Triton, the branch `feature/exprs` of Arybo and the [llvmlite](https://github.com/numba/llvmlite) project.
+
+All of our results are already pushed to this repository, but if you want to reproduce the analysis
+yourself, you only have to run `solve-vm.py` like this:
+
+<pre>
+$ ./solve-vm.py ./obfuscated_binaries/_binary_
+</pre>
+
+Example:
+
+<pre>
+$ ./solve-vm.py ./obfuscated_binaries/tigress-0-challenge-0
+./solve-vm.py:441: SyntaxWarning: name 'VM_INPUT' is assigned to before global declaration
+  global VM_INPUT
+[+] Loading 0x400040 - 0x400238
+[+] Loading 0x400238 - 0x400254
+[+] Loading 0x400000 - 0x400f14
+[+] Loading 0x601e28 - 0x602550
+[+] Loading 0x601e50 - 0x601fe0
+[+] Loading 0x400254 - 0x400298
+[+] Loading 0x400dc4 - 0x400e08
+[+] Loading 0x000000 - 0x000000
+[+] Loading 0x601e28 - 0x602000
+[+] Hooking printf
+[+] Hooking __libc_start_main
+[+] Hooking strtoul
+[+] Starting emulation.
+[+] __libc_start_main hooked
+[+] argv[0] = ./obfuscated_binaries/tigress-0-challenge-0
+[+] argv[1] = 1234
+[+] strtoul hooked
+[+] Symbolizing the strtoul return
+[+] printf hooked
+3035321144166078008
+[+] Slicing end-point user expression
+[+] Instruction executed: 39817
+[+] PC len: 0
+[+] Emulation done.
+[+] Generating symbolic_expressions/./tigress-0-challenge-0_input_to_hash.py
+[+] Generating symbolic_expressions/./tigress-0-challenge-0_hash_to_input.py
+[+] Converting symbolic expressions to an LLVM module...
+warning: overriding the module target triple with x86_64-pc-linux-gnu [-Woverride-module]
+1 warning generated.
+[+] LLVM module wrote in llvm_expressions/./tigress-0-challenge-0.ll
+[+] Recompiling deobfuscated binary...
+warning: overriding the module target triple with x86_64-pc-linux-gnu [-Woverride-module]
+1 warning generated.
+[+] Deobfuscated binary recompiled: deobfuscated_binaries/./tigress-0-challenge-0.deobfuscated
+$
+</pre>
+
+The symbolic expressions can be found [here](symbolic_expressions), the LLVM representations [here](llvm_expressions),
+and the recompiled binaries [here](deobfuscated_binaries).
+
+# Testing our simplified binaries
+
+Since we simplified and recompiled new binaries, they must behave exactly like the original binaries. To check this, we test our binary versions with this [script](testing_equality.py).
+
+<pre>
+$ ./testing_equality.py ./obfuscated_binaries/tigress-0-challenge-0 ./deobfuscated_binaries/tigress-0-challenge-0.deobfuscated
+[...]
+[+] Success with 272966812638982633
+[+] Success with 2304147855662358786
+[+] Success with 15697842028176298504
+[+] Success with 15273138908025273913
+[+] Success with 17329851347176088980
+[+] Success with 12160831137213706322
+[+] Success with 3489058267725840982
+[+] Success with 6474275930952607745
+[+] Success with 7363567981237584398
+[+] Success with 3685039181436704621
+[+] Success: 100.00
+</pre>
+
+Basically, this script runs the obfuscated and the deobfuscated binaries with random inputs and checks that they produce the same output.
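The idea behind this check is plain differential testing. The sketch below is not `testing_equality.py` itself. It is a minimal version written under assumptions: the helper names `run_binary` and `success_rate` are hypothetical, inputs are drawn as random 64-bit integers, and each binary is assumed to take the number as its only argument and print the hash on stdout.

```python
import random
import subprocess


def run_binary(path, value):
    """Run `path` with `value` as its only argument and return its stripped stdout."""
    result = subprocess.run(
        [path, str(value)], capture_output=True, text=True, check=True
    )
    return result.stdout.strip()


def success_rate(reference, candidate, trials=100, seed=0):
    """Percentage of random 64-bit inputs on which both callables agree."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        value = rng.getrandbits(64)
        if reference(value) == candidate(value):
            hits += 1
    return 100.0 * hits / trials


if __name__ == "__main__":
    # Stand-ins for the two binaries so the sketch runs without them.
    print(success_rate(lambda n: n * 2, lambda n: n + n))
```

To compare real binaries, the two callables would wrap `run_binary`, for example `lambda n: run_binary("./obfuscated_binaries/tigress-0-challenge-0", n)`. Taking callables rather than paths keeps the comparison logic testable on its own.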
+
+# Benchmarks
+
+## Results with only one trace
+
+![With one trace](pictures/result_with_one_trace.png)
+
+## Results with the union of two traces
+
+![With two traces](pictures/result_with_two_traces.png)
+
+## Time of extraction per trace
+
+![Time per trace](pictures/time_per_trace.png)
+
+# Credits
+
+* [Adrien Guinet](https://twitter.com/adriengnt) for the Arybo and LLVM parts (Quarkslab)
+* [Romain Thomas](https://twitter.com/rh0main) for the Triton part (Quarkslab)
+* [Jonathan Salwan](https://twitter.com/JonathanSalwan) for the Triton part (Quarkslab)
+
diff --git a/deobfuscated_binaries/run.c b/deobfuscated_binaries/run.c
@@ -0,0 +1,22 @@
+#include <stdlib.h>
+#include <inttypes.h>
+#include <stdio.h>
+#include <errno.h>
+
+extern uint64_t __arybo(uint64_t a);
+
+int main(int argc, char** argv)
+{
+  if (argc < 2) {
+    fprintf(stderr, "Usage: %s n\n", argv[0]);
+    return 1;
+  }
+  errno = 0;  /* strtoull only sets errno on failure, so clear it first */
+  uint64_t n = strtoull(argv[1], NULL, 0);
+  if (errno != 0) {
+    perror("unable to represent the input as an uint64");
+    return 1;
+  }
+  printf("%" PRIu64 "\n", __arybo(n));
+  return 0;
+}
diff --git a/deobfuscated_binaries/run.cpp b/deobfuscated_binaries/run.cpp
@@ -0,0 +1,19 @@
+#include <cstdlib>
+#include <cstdint>
+#include <iostream>
+#include <sstream>
+
+extern "C" uint64_t __arybo(uint64_t a);
+
+int main(int argc, char** argv)
+{
+  if (argc < 2) {
+    std::cerr << "Usage: " << argv[0] << " n" << std::endl;
+    return 1;
+  }
+  uint64_t n;
+  std::istringstream ss(argv[1]);
+  ss >> n;
+  std::cout << __arybo(n) << std::endl;
+  return 0;
+}
diff --git a/deobfuscated_binaries/tigress-0-challenge-0.deobfuscated b/deobfuscated_binaries/tigress-0-challenge-0.deobfuscated
diff --git a/deobfuscated_binaries/tigress-0-challenge-1.deobfuscated b/deobfuscated_binaries/tigress-0-challenge-1.deobfuscated
diff --git a/deobfuscated_binaries/tigress-0-challenge-3.deobfuscated b/deobfuscated_binaries/tigress-0-challenge-3.deobfuscated
diff --git a/deobfuscated_binaries/tigress-0-challenge-4.deobfuscated b/deobfuscated_binaries/tigress-0-challenge-4.deobfuscated
diff --git a/deobfuscated_binaries/tigress-1-challenge-0.deobfuscated b/deobfuscated_binaries/tigress-1-challenge-0.deobfuscated
diff --git a/deobfuscated_binaries/tigress-1-challenge-1.deobfuscated b/deobfuscated_binaries/tigress-1-challenge-1.deobfuscated
diff --git a/deobfuscated_binaries/tigress-1-challenge-2.deobfuscated b/deobfuscated_binaries/tigress-1-challenge-2.deobfuscated
diff --git a/deobfuscated_binaries/tigress-1-challenge-3.deobfuscated b/deobfuscated_binaries/tigress-1-challenge-3.deobfuscated
diff --git a/deobfuscated_binaries/tigress-1-challenge-4.deobfuscated b/deobfuscated_binaries/tigress-1-challenge-4.deobfuscated
diff --git a/deobfuscated_binaries/tigress-2-challenge-1.deobfuscated b/deobfuscated_binaries/tigress-2-challenge-1.deobfuscated
diff --git a/deobfuscated_binaries/tigress-2-challenge-2.deobfuscated b/deobfuscated_binaries/tigress-2-challenge-2.deobfuscated
diff --git a/deobfuscated_binaries/tigress-2-challenge-3.deobfuscated b/deobfuscated_binaries/tigress-2-challenge-3.deobfuscated
diff --git a/deobfuscated_binaries/tigress-2-challenge-4.deobfuscated b/deobfuscated_binaries/tigress-2-challenge-4.deobfuscated
diff --git a/deobfuscated_binaries/tigress-3-challenge-0.deobfuscated b/deobfuscated_binaries/tigress-3-challenge-0.deobfuscated
diff --git a/deobfuscated_binaries/tigress-3-challenge-1.deobfuscated b/deobfuscated_binaries/tigress-3-challenge-1.deobfuscated
diff --git a/deobfuscated_binaries/tigress-3-challenge-2.deobfuscated b/deobfuscated_binaries/tigress-3-challenge-2.deobfuscated
diff --git a/deobfuscated_binaries/tigress-3-challenge-3.deobfuscated b/deobfuscated_binaries/tigress-3-challenge-3.deobfuscated
diff --git a/deobfuscated_binaries/tigress-3-challenge-4.deobfuscated b/deobfuscated_binaries/tigress-3-challenge-4.deobfuscated
diff --git a/deobfuscated_binaries/tigress-4-challenge-0.deobfuscated b/deobfuscated_binaries/tigress-4-challenge-0.deobfuscated
diff --git a/deobfuscated_binaries/tigress-4-challenge-1.deobfuscated b/deobfuscated_binaries/tigress-4-challenge-1.deobfuscated
diff --git a/deobfuscated_binaries/tigress-4-challenge-2.deobfuscated b/deobfuscated_binaries/tigress-4-challenge-2.deobfuscated
diff --git a/deobfuscated_binaries/tigress-4-challenge-3.deobfuscated b/deobfuscated_binaries/tigress-4-challenge-3.deobfuscated
diff --git a/deobfuscated_binaries/tigress-4-challenge-4.deobfuscated b/deobfuscated_binaries/tigress-4-challenge-4.deobfuscated
diff --git a/llvm_expressions/tigress-0-challenge-0.O2.ll b/llvm_expressions/tigress-0-challenge-0.O2.ll
@@ -0,0 +1,33 @@
+; ModuleID = 'llvm_expressions/./tigress-0-challenge-0.ll'
+target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
+target triple = "x86_64-pc-linux-gnu"
+
+; Function Attrs: norecurse nounwind readnone
+define i64 @__arybo(i64 %SymVar_0) #0 {
+.3:
+  %.5 = or i64 %SymVar_0, 74171520
+  %.6 = add i64 %SymVar_0, 886599889
+  %.7 = and i64 %.6, 6
+  %.8 = or i64 %.7, 1
+  %.14 = shl i64 %.5, %.8
+  %.15 = shl i64 %.14, 4
+  %.18 = and i64 %.15, 1008
+  %.19 = add i64 %SymVar_0, 500810693
+  %.22 = mul i64 %.6, 951885855
+  %.24 = and i64 %.22, 14
+  %.25 = or i64 %.24, 1
+  %.26 = sub nsw i64 64, %.25
+  %.32 = shl i64 %.19, %.26
+  %.45 = lshr i64 %.19, %.25
+  %.46 = or i64 %.32, %.45
+  %.47 = or i64 %.46, %.18
+  %.56 = or i64 %.6, %SymVar_0
+  %.57 = or i64 %.56, -637752949
+  %.58 = add i64 %.57, %.6
+  %.49 = mul i64 %.5, 746348727
+  %.53 = mul i64 %.49, %.58
+  %.60 = mul i64 %.53, %.47
+  ret i64 %.60
+}
+
+attributes #0 = { norecurse nounwind readnone }
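The optimized function above is short enough to transcribe by hand. The sketch below is such a transcription into Python. The function name `arybo_challenge_0` and the variable names `v5`, `v6`, and so on are chosen here to mirror the IR value numbers, and every operation is masked to 64 bits to imitate `i64` wraparound. For the input 1234 it returns 3035321144166078008, the value printed in the README's example run.

```python
MASK64 = (1 << 64) - 1


def arybo_challenge_0(x: int) -> int:
    """Hand transcription of @__arybo from tigress-0-challenge-0.O2.ll."""
    x &= MASK64
    v5 = x | 74171520
    v6 = (x + 886599889) & MASK64
    v8 = (v6 & 6) | 1                      # shift amount, always in {1, 3, 5, 7}
    v14 = (v5 << v8) & MASK64
    v15 = (v14 << 4) & MASK64
    v18 = v15 & 1008
    v19 = (x + 500810693) & MASK64
    v22 = (v6 * 951885855) & MASK64
    v25 = (v22 & 14) | 1                   # odd amount in 1..15
    v26 = 64 - v25
    v32 = (v19 << v26) & MASK64
    v45 = v19 >> v25                       # lshr: logical shift right
    v46 = v32 | v45                        # together: rotate v19 right by v25
    v47 = v46 | v18
    v56 = v6 | x
    v57 = v56 | (-637752949 & MASK64)      # constant as an unsigned 64-bit pattern
    v58 = (v57 + v6) & MASK64
    v49 = (v5 * 746348727) & MASK64
    v53 = (v49 * v58) & MASK64
    return (v53 * v47) & MASK64


if __name__ == "__main__":
    print(arybo_challenge_0(1234))  # prints 3035321144166078008
```

The shift amounts never reach 64, so the transcription does not have to model LLVM's poison semantics for oversized shifts.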
diff --git a/llvm_expressions/tigress-0-challenge-0.ll b/llvm_expressions/tigress-0-challenge-0.ll
@@ -0,0 +1,67 @@
+; ModuleID = ""
+target triple = "unknown-unknown-unknown"
+target datalayout = ""
+
+define i64 @"__arybo"(i64 %"SymVar_0") nounwind
+{
+.3:
+  %".4" = sext i64 746348727 to i128
+  %".5" = or i64 74171520, %"SymVar_0"
+  %".6" = add i64 886599889, %"SymVar_0"
+  %".7" = and i64 7, %".6"
+  %".8" = or i64 1, %".7"
+  %".9" = trunc i64 %".8" to i32
+  %".10" = zext i32 %".9" to i64
+  %".11" = trunc i64 %".10" to i8
+  %".12" = zext i8 %".11" to i64
+  %".13" = and i64 %".12", 63
+  %".14" = shl i64 %".5", %".13"
+  %".15" = and i64 63, %".14"
+  %".16" = zext i8 4 to i64
+  %".17" = and i64 %".16", 63
+  %".18" = shl i64 %".15", %".17"
+  %".19" = add i64 500810693, %"SymVar_0"
+  %".20" = sext i64 951885855 to i128
+  %".21" = sext i64 %".6" to i128
+  %".22" = mul i128 %".20", %".21"
+  %".23" = trunc i128 %".22" to i64
+  %".24" = and i64 15, %".23"
+  %".25" = or i64 1, %".24"
+  %".26" = sub i64 64, %".25"
+  %".27" = trunc i64 %".26" to i32
+  %".28" = zext i32 %".27" to i64
+  %".29" = trunc i64 %".28" to i8
+  %".30" = zext i8 %".29" to i64
+  %".31" = and i64 %".30", 63
+  %".32" = shl i64 %".19", %".31"
+  %".33" = add i64 500810693, %"SymVar_0"
+  %".34" = sext i64 951885855 to i128
+  %".35" = sext i64 %".6" to i128
+  %".36" = mul i128 %".34", %".35"
+  %".37" = trunc i128 %".36" to i64
+  %".38" = and i64 15, %".37"
+  %".39" = or i64 1, %".38"
+  %".40" = trunc i64 %".39" to i32
+  %".41" = zext i32 %".40" to i64
+  %".42" = trunc i64 %".41" to i8
+  %".43" = zext i8 %".42" to i64
+  %".44" = and i64 %".43", 63
+  %".45" = lshr i64 %".33", %".44"
+  %".46" = or i64 %".32", %".45"
+  %".47" = or i64 %".18", %".46"
+  %".48" = sext i64 %".47" to i128
+  %".49" = mul i128 %".4", %".48"
+  %".50" = trunc i128 %".49" to i64
+  %".51" = sext i64 %".50" to i128
+  %".52" = sext i64 %".5" to i128
+  %".53" = mul i128 %".51", %".52"
+  %".54" = trunc i128 %".53" to i64
+  %".55" = sext i64 %".54" to i128
+  %".56" = or i64 %".6", %"SymVar_0"
+  %".57" = or i64 18446744073071798667, %".56"
+  %".58" = add i64 %".57", %".6"
+  %".59" = sext i64 %".58" to i128
+  %".60" = mul i128 %".55", %".59"
+  %".61" = trunc i128 %".60" to i64
+  ret i64 %".61"
+}
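One reason the O2 version is so much shorter: the unoptimized module widens each multiplication to `i128` with `sext`, multiplies, and truncates back to `i64`. The low 64 bits of a product do not depend on how the operands were extended, so the optimizer can use a plain `i64` multiply. The sketch below checks that equivalence in Python; the helper names are hypothetical and only model the three IR instructions involved.

```python
MASK64 = (1 << 64) - 1
MASK128 = (1 << 128) - 1


def sext64(value: int) -> int:
    """Read a 64-bit pattern as a signed integer, as `sext i64 ... to i128` does."""
    value &= MASK64
    return value - (1 << 64) if value >> 63 else value


def mul_via_i128(a: int, b: int) -> int:
    """sext both operands, `mul i128`, then `trunc i128 ... to i64`."""
    wide = (sext64(a) * sext64(b)) & MASK128
    return wide & MASK64


def mul_i64(a: int, b: int) -> int:
    """Plain `mul i64` with wraparound."""
    return (a * b) & MASK64


if __name__ == "__main__":
    a, b = 951885855, 0xFFFFFFFFFFFFFFF0  # second operand is negative when signed
    print(mul_via_i128(a, b) == mul_i64(a, b))
```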
diff --git a/llvm_expressions/tigress-0-challenge-1.O2.ll b/llvm_expressions/tigress-0-challenge-1.O2.ll
@@ -0,0 +1,72 @@
+; ModuleID = 'llvm_expressions/./tigress-0-challenge-1.ll'
+target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
+target triple = "x86_64-pc-linux-gnu"
+
+; Function Attrs: norecurse nounwind readnone
+define i64 @__arybo(i64 %SymVar_0) #0 {
+.3:
+  %.5 = and i64 %SymVar_0, 573319932
+  %.6 = add nsw i64 %.5, -341319700
+  %.8 = mul nsw i64 %.6, 502412191
+  %.11 = add i64 %SymVar_0, 584234876
+  %.12 = add i64 %.11, %.8
+  %.13 = shl i64 %.12, 2
+  %.16 = and i64 %.13, 28
+  %.17 = lshr i64 %.6, 48
+  %.32 = and i64 %.17, 255
+  %.3412 = lshr i64 %.6, 56
+  %.37 = shl nuw nsw i64 %.3412, 8
+  %.38 = or i64 %.32, %.37
+  %0 = lshr i64 %.6, 16
+  %.61 = and i64 %0, 16711680
+  %.62 = or i64 %.38, %.61
+  %.67 = and i64 %0, 4278190080
+  %.68 = or i64 %.62, %.67
+  %1 = shl nsw i64 %.6, 16
+  %.85 = and i64 %1, 1095216660480
+  %.86 = or i64 %.68, %.85
+  %.91 = and i64 %1, 262783279038464
+  %.92 = or i64 %.86, %.91
+  %.113 = shl i64 %.6, 48
+  %.114 = and i64 %.113, 70931694131085312
+  %.115 = or i64 %.92, %.114
+  %.117 = lshr i64 %.6, 8
+  %.120 = shl i64 %.117, 56
+  %.121 = or i64 %.115, %.120
+  %.122 = or i64 %.121, %.16
+  %.125 = lshr i64 %.12, 1
+  %.126 = and i64 %.125, 14
+  %.127 = or i64 %.126, 1
+  %.128 = sub nsw i64 64, %.127
+  %.134 = shl i64 %.122, %.128
+  %.145 = lshr i64 %.122, %.127
+  %.146 = or i64 %.134, %.145
+  %.147 = and i64 %SymVar_0, 335886564
+  %.148 = add nsw i64 %.147, -1595821287
+  %.149 = and i64 %.6, 12
+  %.150 = or i64 %.149, 1
+  %.151 = sub nsw i64 64, %.150
+  %.157 = lshr i64 %.148, %.151
+  %.165 = shl i64 %.148, %.150
+  %.166 = or i64 %.157, %.165
+  %.167 = shl nsw i64 %.166, 2
+  %.170 = and i64 %.167, 48
+  %.173 = lshr i64 %.148, 3
+  %.174 = and i64 %.173, 6
+  %.175 = or i64 %.174, 1
+  %.181 = lshr i64 %.12, %.175
+  %.182 = add i64 %.181, %SymVar_0
+  %.183 = lshr i64 %.182, 16
+  %.254 = or i64 %.170, %.183
+  %.255 = xor i64 %.254, %.166
+  %.258 = lshr i64 %.255, 3
+  %.259 = and i64 %.258, 14
+  %.260 = or i64 %.259, 1
+  %.261 = sub nsw i64 64, %.260
+  %.267 = lshr i64 %.146, %.261
+  %.303 = shl i64 %.146, %.260
+  %.304 = or i64 %.267, %.303
+  ret i64 %.304
+}
+
+attributes #0 = { norecurse nounwind readnone }
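Several instruction groups in the function above follow the usual rotate idiom: a value is shifted one way by `r`, the other way by `64 - r`, and the halves are OR-ed. For example, `%.134`, `%.145` and `%.146` rotate `%.122` right by `%.127`, and `%.157`, `%.165` and `%.166` rotate `%.148` left by `%.150`. The sketch below spells the idiom out in Python, with hypothetical helper names. It assumes `0 < r < 64`, which holds in the IR because each amount is built as `(... & mask) | 1` with a small mask.

```python
MASK64 = (1 << 64) - 1


def rotr64(value: int, r: int) -> int:
    """Rotate a 64-bit value right by r bits (0 < r < 64)."""
    value &= MASK64
    return ((value >> r) | (value << (64 - r))) & MASK64


def rotl64(value: int, r: int) -> int:
    """Rotate a 64-bit value left by r bits (0 < r < 64)."""
    value &= MASK64
    return ((value << r) | (value >> (64 - r))) & MASK64


if __name__ == "__main__":
    word = 0x0123456789ABCDEF
    print(hex(rotl64(rotr64(word, 13), 13)))
```

Rotating right and then left by the same amount returns the original word, which is a quick sanity check when transcribing such sequences by hand.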