This file is part of Sibyl.
Copyright 2014 - 2017 Camille MOUGEY
A Miasm2 based function divination.
In reverse engineering, stripped binaries are common (malware, firmware, ...). They often embed common libraries, such as libc or openssl. Identifying such libraries and their functions can be an interesting starting point, but it is a time-consuming task. Moreover, the task is made harder by optimizations, the diversity of architectures and compilers, custom implementations, obfuscation, ...
Tools have been developed to automate this task. Some are based on CFG (Control Flow Graph) signatures (Bindiff), others on magic constants (FindCrypt) or enhanced pattern matching (FLIRT).
Sibyl is one of these tools, dynamic analysis oriented and based on Miasm2 (https://github.com/cea-sec/miasm). The idea is to identify functions from their side effects. That way, identification is independent of the implementation used.
Identification proceeds through the following steps:
- Initialize a minimalist VM for the targeted architecture, with only needed elements
- Prepare the function call using the correct ABI and API
- Run the target function code inside the VM
- If the function crashes (null dereference, not enough stack arguments, ...), switch to the next test case
- If the function ends correctly, compare the final VM state with the expected one. If they match, consider the test case as a candidate
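The steps above can be pictured with a toy model. The sketch below does not use Miasm2 or Sibyl's real classes: `ToyVM`, `run_test_case`, and the candidate functions are hypothetical names introduced only for illustration, and the "VM" is a plain Python object rather than an emulator.

```python
class ToyVM:
    """Hypothetical stand-in for an emulated machine: flat memory plus a return value."""

    def __init__(self):
        self.memory = {}   # address -> bytes
        self.ret = None

    def read_cstring(self, addr):
        if addr not in self.memory:
            # Models a crash such as a null dereference
            raise MemoryError("unmapped address 0x%x" % addr)
        data = self.memory[addr]
        return data[:data.index(b"\x00")]


def run_test_case(target, init, check):
    """Return True if `target` is a candidate for this test case."""
    vm = ToyVM()                      # initialize a minimal VM
    args = init(vm)                   # prepare the call
    try:
        vm.ret = target(vm, *args)    # run the target
    except Exception:
        return False                  # crash: switch to the next test case
    return check(vm)                  # compare the final state with the expected one


def toy_strlen(vm, ptr):
    return len(vm.read_cstring(ptr))


def toy_crasher(vm, ptr):
    return len(vm.read_cstring(0))    # always dereferences address 0


def init_strlen(vm):
    vm.memory[0x1000] = b"Hello %sworld!\x00"
    return (0x1000,)


def check_strlen(vm):
    return vm.ret == 14
```

Here `run_test_case(toy_strlen, init_strlen, check_strlen)` reports a candidate, while `toy_crasher` is silently discarded because it faults before returning.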
For instance, to identify a strlen, the test is as follows:
- Allocate a string containing "Hello %sworld!" in a read-only memory page
- Call the function with a pointer to the string as first argument
- Compare the result with 14
- Execute the same test with a different string to avoid false positives (detecting a function which always returns 14)
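The role of the second string can be shown with a few lines of Python. This is only a sketch of the idea, not Sibyl's test code: `matches_strlen` and the two candidate functions are hypothetical, the second probe string `"Short"` is an arbitrary choice, and immutable `bytes` objects loosely stand in for the read-only memory page.

```python
def matches_strlen(func):
    """Return True if `func` behaves like strlen on both probe strings."""
    probes = [b"Hello %sworld!", b"Short"]   # second probe is an arbitrary assumption
    for text in probes:
        buf = text + b"\x00"                 # NUL-terminated, as a C string would be
        if func(buf) != len(text):
            return False
    return True


def real_strlen(buf):
    # Length up to, but not including, the first NUL byte
    return buf.index(b"\x00")


def always_14(buf):
    # Passes the first probe by accident, fails the second one
    return 14
```

With a single probe, `always_14` would be reported as a strlen; the second probe is what rejects it.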
Sibyl test cases are written independently of the architecture and ABI.
In general, Sibyl suffers from false positives (identifying a function that is not strlen as strlen) and false negatives (misidentifying or skipping a real strlen). Under the hypothesis that the ABI is exactly the one used by the function, Sibyl becomes complete (no more false negatives).
As a sideline, Sibyl can be used to bruteforce a program's ABI.
Long story short, this is an enhanced API bruteforcing tool.
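The ABI bruteforcing mentioned above can be pictured as follows: run a test for a function whose behavior is known under each candidate calling convention, and keep the conventions under which the test passes. The sketch is a toy model under stated assumptions; the two "ABIs", the `state` dictionary, and `guess_abi` are hypothetical and do not reflect Sibyl's actual ABI classes.

```python
def abi_regs(args):
    """Hypothetical ABI: arguments passed in registers r0, r1, ..."""
    return {"regs": {"r%d" % i: a for i, a in enumerate(args)}, "stack": []}


def abi_stack(args):
    """Hypothetical ABI: arguments passed on the stack."""
    return {"regs": {}, "stack": list(args)}


ABIS = {"regs": abi_regs, "stack": abi_stack}


def guess_abi(target, args, expected):
    """Return the names of the ABIs under which `target` yields `expected`."""
    found = []
    for name, place_args in ABIS.items():
        state = place_args(args)
        try:
            result = target(state)
        except Exception:
            continue              # wrong ABI: the function crashed
        if result == expected:
            found.append(name)
    return found


def stack_strlen(state):
    # A strlen-like function that reads its first argument from the stack
    return len(state["stack"][0])
```

For example, `guess_abi(stack_strlen, ["Hello %sworld!"], 14)` keeps only the stack-based convention, since the register-based one leaves the stack empty and the function faults.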
Sibyl comes with a CLI, named sibyl, and an IDA (https://www.hex-rays.com) stub.
The sibyl tool is a wrapper around several sub-actions.
$ sibyl
Usage: /usr/local/bin/sibyl [action]

Actions:
    config   Configuration management
    find     Function guesser
    func     Function discovering
    learn    Learn a new function
The main usage of Sibyl, function recognition, is done through the
action. This action comes with several options, to specify ABI, architecture,
test cases, ...
To launch function recognition on the ARMv6 binary
1.21.1 http://www.busybox.net/downloads/binaries/1.21.1/), targeting address
0x8550 and using included test cases:
$ sibyl find binaries/busybox-armv6l 0x00008550 0x00008230
0x00008230 : strlen
0x00008550 : memmove
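For scripting, the textual result can be turned into a mapping from address to function name. The helper below is a minimal sketch, assuming the output contains one `address : name` pair per line as in the example; `parse_find_output` is a hypothetical name, and the output format may differ between Sibyl versions.

```python
import re

# One result per line, e.g. "0x00008230 : strlen" (format assumed from the example)
LINE_RE = re.compile(r"^(0x[0-9a-fA-F]+)\s*:\s*(\w+)$")


def parse_find_output(text):
    """Map each reported address (as an int) to the identified function name."""
    results = {}
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:                       # ignore lines that are not results
            results[int(match.group(1), 16)] = match.group(2)
    return results


sample = """0x00008230 : strlen
0x00008550 : memmove"""
```

Calling `parse_find_output(sample)` gives a dictionary keyed by integer addresses, which is convenient for renaming or commenting functions in another tool.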
The IDA stub is located in
sibyl is installed on the
system, no other action is needed to have it running (see section Installation
for more details)
Once the script has been loaded by IDA, the user is asked to launch Sibyl either on the current function or on all functions detected by IDA.
The architecture and ABI are provided by IDA. Optionally, the set of tests to use can be modified.
And the associated result:
Python> Launch identification on 3085 function(s)
Found memcpy at 0x8057120
Found memmove at 0x805714c
Found memset at 0x8057174
Found strcat at 0x80571a8
Found strchr at 0x80571cc
Found strcmp at 0x8057208
Found strcpy at 0x8057228
Found strlen at 0x8057244
Found strncmp at 0x8057258
Found strncpy at 0x8057280
Found strnlen at 0x80572a8
Found strrchr at 0x80572c0
Found memcmp at 0x80572ff
Found strsep at 0x80576ac
Found strspn at 0x8057704
Found stricmp at 0x805799c
Found strpbrk at 0x8057ab8
Found strtok at 0x8057b30
Found strcmp at 0x8057b48
Found atoi at 0x805df1c
Current: 64.83% (sub_0x80b4ab3)| Estimated time remaining: 14.45s
Found atoi at 0x80f1cf3
Current: 100.00% (sub_0x80f7a93)| Estimated time remaining: 0.00s
Finished ! Found 21 candidates in 42.70s
Results are also available in 'sibyl_res'
The corresponding functions get an additional comment like
Additionally, a method
launch_on_funcs is provided for scripting purposes, and
the result of the last run, in addition to the human-readable output on the console, is
Binary Ninja stub
More detailed documentation is available in
Current version is v0.1. See changelog for more details.
Sibyl requires at least Miasm2 version
67117bf and the corresponding version of Elfesteem.
qemu engine, the
unicorn Python package must be installed (refer to the Unicorn documentation for more details).
Sibyl comes as a Python module, and the installation follows the standard procedure:
$ python setup.py build
# Add the resulting build directory in your PYTHONPATH, or:
$ python setup.py install
In addition to the
sibyl Python module, a CLI tool is provided, named
sibyl. See the usage documentation for more information.
If needed, consult testing documentation to check your Sibyl installation.
The IDA stub is located in
ext/ida. To benefit from multiprocessing, Sibyl is
invoked through the CLI as a subprocess. Thus, there is no need to have the
sibyl module in the IDA Python namespace.
Long story short, it should work out of the box once the
sibyl CLI is available.
Sibyl is also available as a Docker automated build. Use:
$ docker run -i -t commial/sibyl
Usage: /usr/local/bin/sibyl [action]

Actions:
    config   Configuration management
    find     Function guesser
    func     Function discovering
    learn    Learn a new function
Sibyl comes with several test cases, located in
sibyl/test. These tests are
based on functions from string.h, stdlib.h and ctype.h.
Architectures by engine
Sibyl comes with support for multiple architectures and multiple engines.
Do not hesitate to consult and open an issue if clarifications are still needed.
How are infinite loops managed?
Behaviors close to infinite loops happen quite often, especially when the arguments are not formatted as expected by the function (for instance, when a test case for another function is being tried). To avoid these behaviors, there is a timeout on each sub-test. The -i/--timeout argument adjusts this parameter (2 by default, 0 to disable the timeout).
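The per-sub-test timeout can be illustrated with a small deadline-based runner. This is a sketch under stated assumptions, not Sibyl's implementation: the emulated function is modelled as a Python generator that yields after each step, and `run_with_timeout`, `finite`, and `endless` are hypothetical names. As with the option described above, a timeout of 0 disables the check.

```python
import time


def run_with_timeout(steps, timeout):
    """Drive the generator `steps` until it returns or the deadline passes.

    Returns ("ok", value) on normal termination and ("timeout", None) otherwise.
    A `timeout` of 0 disables the deadline.
    """
    deadline = None if timeout == 0 else time.monotonic() + timeout
    while True:
        try:
            next(steps)                       # execute one step of the target
        except StopIteration as done:
            return ("ok", done.value)         # the function ended correctly
        if deadline is not None and time.monotonic() > deadline:
            return ("timeout", None)          # give up on this sub-test


def finite():
    yield
    yield
    return 42


def endless():
    while True:                               # behaves like an infinite loop
        yield
```

With these definitions, `run_with_timeout(endless(), 0.05)` gives up after roughly 50 ms, whereas `finite()` completes normally with or without a deadline.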
How to run the tool on a custom architecture?
Once the architecture and the corresponding semantics are implemented in Miasm2, one just needs to implement the wanted ABI in sibyl/abi/.
If writing the jitter engine part is an issue, one can directly use the Python jitter option with the -j/--jitter argument.
If the semantics are not complete enough, one can add the corresponding bridge with qemu in
sibyl/engine/qemu.py, if available.
Is sibyl func frozen?
Sibyl may take time due to the number of functions to consider and the test set size (Sibyl's time complexity is approximately O(number of functions * test set size)).
In addition, library functions are often located in the same region of the binary, giving the impression that Sibyl produces results in bursts.
A convenient way to observe its progress is the use of the
How many coffees could I take while Sibyl is running?
| binary | architecture | test set size | addresses to check | number of functions found | elapsed time |
These tests have been done on a standard, 4 i7 CPU laptop, using the default
qemu jitter) and addresses provided by IDA.
Please note that, by design, Sibyl is embarrassingly parallel.
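Because each address is tested independently, the work can be split into chunks and distributed. The sketch below only illustrates that partitioning: `split_chunks`, `analyse_chunk`, and `analyse_all` are hypothetical names, the worker is a placeholder that performs no real analysis, and a thread pool is used purely to keep the example self-contained (separate processes would be the natural choice for CPU-bound emulation).

```python
from concurrent.futures import ThreadPoolExecutor


def split_chunks(items, n_chunks):
    """Split `items` into at most `n_chunks` nearly equal, contiguous slices."""
    size, extra = divmod(len(items), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        end = start + size + (1 if i < extra else 0)
        if end > start:                 # skip empty slices
            chunks.append(items[start:end])
        start = end
    return chunks


def analyse_chunk(addresses):
    """Placeholder worker: a real one would run the test set on each address."""
    return {addr: None for addr in addresses}


def analyse_all(addresses, workers=4):
    """Run the placeholder analysis on every address, one chunk per worker."""
    results = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for partial in pool.map(analyse_chunk, split_chunks(addresses, workers)):
            results.update(partial)     # chunks never overlap, so merging is safe
    return results
```

No chunk depends on another one, so the partial results can be merged in any order.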