Beagle: obfuscated string solver (#627)
* first prototype of beagle

Idea
====

On a high level, Beagle tracks everything that smells to him. If something smells, he picks up the trail and follows it until the trail temperature rises above the given threshold; at that point he grabs the target and marks it as his prey.

The temperature is estimated with the following recurrence:

$$$
t = t * (1 - alpha) + alpha * phi,
$$$

where
$t$ - the temperature, between 0 and 1: our confidence that whatever we are sniffing is our target;
$alpha$ - the sensitivity parameter (defines how fast the trail loses temperature);
$phi$ - the confidence that something smells (again between 0 and 1). It defines how fast we will hunt the prey. Currently, it is either 0 if something doesn't smell, or 1-1e4 if it smells.

Implementation
==============

Beagle uses microexecution to run each subroutine. Every time the interpreter sees a word, the beagle is pointed at it.

Current status
==============

Currently all the parameters are hardcoded, and many of them should be estimated rather than specified by a user. The only predicate that decides whether something smells is whether a character is_ascii or is_null.

Usage
=====

1. Build and install with make
2. Run from IDA with `Shift-S` and specify the option `--beagle` and the attribute name `beagle-prey`
3. Use the text search (Alt-T) to navigate through all preys.

* beagle update

The new features:

1. searching for static strings in the binary
2. identifying references to the static strings
3. signal processing of char data to identify word-like patterns
4. word recovery based on a dictionary

Attributes:

beagle-words -- words that can be built by the term
beagle-chars -- a sequence of characters that were computed by the term and nearby terms
beagle-strings -- static strings that were explicitly referenced by this term (either directly or indirectly)

Basically, here is the description of the algorithm:

1. identify all null-terminated strings
2. microexecute each subroutine and each block and try to detect word-like signals
3. microexecute each subroutine and each block and detect any direct or indirect accesses to the known null-terminated strings
4. use the user-provided dictionary, plus the strings that were found in the binary, and recover all possible words that can be built from the set of strings
5. annotate the project with whatever was found.

* matures beagle

This is a big commit, reflecting several days of beagle training. First of all, I've trained him to understand a few new commands:

--beagle-print-strings - will print all found static strings;
--beagle-print-words - will print all the generated words;
--beagle-print-chars - will print the detected char sequences;
--beagle-ignore-strings - will ignore found static strings;
--beagle-no-words - will not try to build words from a dictionary;
--beagle-alphabet - will force beagle to use a different alphabet;
--beagle-words - will add words to beagle's dictionary.

The default use case is still the same: run beagle via `--pass=beagle` and annotate the program tree with references to strings. The print options are mostly for debugging and standalone use.

As for static strings: compared with the `strings` utility, beagle currently outputs slightly fewer strings. This is neither a bug nor a limitation. By design, Beagle finds only strings that are null-terminated, whereas `strings` will find all strings, even if the last character is an arbitrary non-printable byte. In my experiments, that behavior only adds more trash. However, there was probably a good reason why the `strings` authors decided to accept any strings, for example, to work with Pascal binaries, where strings are not required to have a null character at the end. Given this consideration, we will probably give up our allegiance to the null terminator in the future.

* prepares beagle for the release

Prefixed names, packaged in oasis, and other boring stuff.
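
For reference, the temperature recurrence from the first prototype can be sketched in a few lines of Python. This is only an illustration of the formula `t = t * (1 - alpha) + alpha * phi`, not the plugin's actual code: the names `smells` and `first_catch`, the default `alpha` and `threshold`, the use of 1.0 for a smelly byte, and the reading of is_ascii as "printable ASCII" are all assumptions made for the example.

```python
from typing import Optional


def smells(byte: int) -> bool:
    """Hypothetical smell predicate: a null byte or printable ASCII.

    The commit message only says "is_ascii or is_null"; treating
    is_ascii as printable ASCII (0x20..0x7E) is an assumption.
    """
    return byte == 0 or 0x20 <= byte <= 0x7E


def first_catch(data: bytes, alpha: float = 0.5,
                threshold: float = 0.9) -> Optional[int]:
    """Return the index at which the trail temperature first exceeds
    the threshold, or None if it never does.

    alpha and threshold are illustrative defaults, not the values
    hardcoded in the plugin.
    """
    t = 0.0  # the trail starts cold
    for i, byte in enumerate(data):
        # phi is 1.0 for a smelly byte here (an assumption), 0.0 otherwise.
        phi = 1.0 if smells(byte) else 0.0
        # The recurrence from the commit message: exponential smoothing.
        t = t * (1 - alpha) + alpha * phi
        if t > threshold:
            return i
    return None


if __name__ == "__main__":
    print(first_catch(b"hello"))         # 3
    print(first_catch(b"\xff\xfe\x80"))  # None
```

With `alpha = 0.5` the temperature after consecutive smelly bytes goes 0.5, 0.75, 0.875, 0.9375, so the fourth character of `hello` is the first to cross 0.9. A larger `alpha` makes the beagle catch faster but also lose the trail faster on a single non-smelly byte.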