- Requirements
- Installation
- Usage
- Experiments
- Synthesize all blocks from a execution trace
- Synthesize a specific output
- References
On debian like systems, run the following command:
sudo apt install libgmp3-dev gcc-multilib gdb python3 python3-pip python3-tqdm z3
You can install all the packages above with opam.
Xyntia is a standalone tool which takes I/O example as input and synthesize a corresponding expression. Still, in practice, you do not want to give these I/O examples by hand. Thus we give scripts to automatically sample them from a given binary. It relies on the Binsec symbolic execution engine:
- Binsec (version >= 0.6.3)
First, if you do not have a opam switch for an ocaml version >= 4.09 do:
$ opam switch create . 4.09 # or any version >= 4.09
$ eval $(opam env)
Then you can install ocaml dependencies presented previously using opam install <package-name>
.
Finally, to install Xyntia, execute:
$ make
$ make install
The help of Xyntia is available through xyntia -help
. In the following we will explain the two ways for using Xyntia.
The module Fun from src/fun.ml contains a list of sample functions that can be used to test Xyntia without the need of processing a sampling file. To run Xyntia on one of the functions in Fun, run the following command:
$ xyntia [-ops <grammar>] [-time <time>] [-heur <heur>] -nosamp <id> <nargs>
where grammar is the abbreviation of the grammar used to define the search space (see below), time is the time budget of the synthesis in seconds, heur is the abbreviation for the search heuristic to be used (see below), id is the index of the function in the list, and nargs is the number of arguments that the function takes as input.
To synthesize a function from a sampling file, execute the following command:
$ xyntia [-ops <grammar>] [-time <time>] [-heur <heur>] <file.json>
where grammar is the abbreviation of the grammar used to define the search space (see below), heur is the abbreviation for the search heuristic to be used (see below), time is the time budget of the synthesis in seconds, and file is the path of the sampling file.
The sampling file should be in the format produced by Syntia's random sampling module. The file examples/samples/example.json
is an example of a sampling file.
grammar | abbreviation |
---|---|
Mixed Boolean Arithmetic (MBA) | mba |
MBA+Division | expr |
MBA+Division+Mod+Shift | full |
MBA+Shift | mba_shift |
MBA+If then else | mba_ite |
heuristic | abbreviation |
---|---|
Iterated Local Search | ils |
Hill Climbing | hc |
Random Walk | rw |
Simulated Annealing | sa |
Metropolis-Hastings | mh |
All datasets and scripts are given to reproduce expriments presented in [1]. Especially, it contains the B1 dataset from the Syntia paper [2] (Thank you Tim Blazytko for sharing it with us), our B2 dataset and the datasets used to evaluate anti-black-box deobfuscation.
To facilitate installation, we also give the requirements.txt
to easily install the python dependencies.
To create and activate a python environment, execute the following commands (optional):
$ python3 -m venv <name-virtualenv> # create a virtual environment for python3
$ source <name-virtualenv>/bin/activate # active the virtual environment
Then, install dependencies:
$ pip install -r requirements.txt
Datasets used in [1] can be found in the ./datasets
directory.
To launch Xyntia over a dataset (e.g., B2) with a given timeout (e.g., 1s) execute the following commands:
$ python3 ./scripts/bench/sample.py --bench ./datasets/b2 --out <resdir>
$ python3 ./scripts/bench/bench.py --bench ./datasets/b2 --timeout 1 --out <resdir>
The option and their meanings can be found through the --help
option.
We also give ./scripts/utils/all_from_trace.sh
which traces code execution with GDB, extracts each code block executed, samples and synthesizes them.
The manual is available through ./scripts/utils/all_from_trace.sh --help
and it can be run as follows:
$ ./scripts/utils/all_from_trace.sh --outdir <resdir> --all -- binary arg1 arg2 ...
Here is an example:
$ cd examples/bin && make && cd -
$ ./scripts/utils/all_from_trace.sh --outdir <resdir> --all -- ./examples/bin/add
The previous section explained how to synthesize each output of each basic block. However, you may be interested in only one output of one basic block.
We explain how to do it for the add
function of ./examples/bin/add
and the eax
output.
First, you need to trace the code:
$ ./scripts/utils/all_from_trace.sh --outdir <resdir> --gdb -- ./examples/bin/add
Then you need to find the basic block of interest in <resdir>/add/cfgs
. In our case, this is the file <resdir>/add/cfgs/0x56556191.bin
.
Now you can sample the eax
output and run Xyntia over it:
$ python3 ./scripts/utils/binsec/sample.py --bin <resdir>/add/cfgs/0x56556191.bin --reg_out eax --out 0x56556191_eax
$ xyntia 0x56556191_eax/out_0.json
[1] Menguy, G., Bardin, S., Bonichon, R., & Lima, C. D. S. (2021, November). Search-Based Local Black-Box Deobfuscation: Understand, Improve and Mitigate. In Proceedings of the 2021 ACM SIGSAC Conference on Computer and Communications Security (pp. 2513-2525).
[2] Blazytko, T., Contag, M., Aschermann, C., & Holz, T. (2017). Syntia: Synthesizing the semantics of obfuscated code. In 26th USENIX Security Symposium (USENIX Security 17) (pp. 643-659).