
faster root cause analysis for when gdb and unicorn have a conflict
2xic committed Jul 2, 2019
1 parent f75be9a commit cbe620e
Showing 36 changed files with 686 additions and 677 deletions.
2 changes: 1 addition & 1 deletion .travis.yml
@@ -19,7 +19,7 @@ compiler:
script:
- python -m pip install --upgrade pip setuptools
# - python3.7 --version
- sudo ./install.sh
- sudo ./install.sh travis
#script:
# - ./test.sh
#after_success:
File renamed without changes.
10 changes: 6 additions & 4 deletions README.md
@@ -7,11 +7,11 @@
<img src="README/new_version.png" width="800px" />
(web interface, click for better resolution)

# status
# Status
There is now a branch called version 0.1; it shows the idea and part of the vision. Master is maybe already version 0.2: its web component is a lot faster than the previous version (still some work left to do). However, I want more work done on the emulator before I do a version bump. I feel I had a bit too much focus on the web interface for version 0.1. Now the big focus will be on extending the dynamic side, which is key to making this software good. That is why I have made a gdb-like interface for the terminal; features will come to the terminal before the web interface to make design iterations faster.

# Note
Mostly tested on CTF (elf)binaries + some glibc binary, I'm sure you can make the program do weird things if you try it on something big and complicated. *This program is still under construction*.
Mostly tested on CTF (ELF) binaries; I'm sure you can make the program do weird things if you try it on something big and complicated. *This program is still under construction*.

# Features
- flat view (see all the sections with code)
@@ -23,11 +23,11 @@ Mostly tested on CTF (elf)binaries + some glibc binary, I'm sure you can make th


# Design philosophy
What do you expect from a reverse engineering tool? You want quick insight into a program. How do you get insight? The best way is to get static data with the aid of dynamic information. The dynamic data show you where you have been, the static can help you get where you want to be. If the binary has been obfuscated the dynamic part will guide the static part. You want to be able to move around in the program flawlessly. This tool will have focus on speed, you want to do things like root cause analysis fast and this tool should work fast in environments like CTFs.
What do you expect from a reverse engineering tool? You want quick insight into a program. How do you get insight? The best way is to combine static data with dynamic data. The dynamic data shows you where you have been; the static data can help you get where you want to be. If the binary has been obfuscated, the dynamic part will guide the static part. You want to be able to move around in the program flawlessly. This tool focuses on speed: you want to do things like root cause analysis fast, and it should work well in environments like CTFs.


# Status with unicorn (dynamic side)
Finally got a static binary with glibc to run from start to finish! I will continue to improve the emulator, still many syscalls and other funconality to implement, the future is bright. However, first I want to take some time to better integrate the emulator into the software before extending it's functionality. Like doing some root cause analysis on the debugging hooks to figure if and how to remove them and improve things like memory mapping and the stack handler for the emulator.
Finally got a static binary with glibc to run from start to finish! I will continue to improve the emulator; there are still many syscalls and other functionality to implement, but the future is bright. However, first I want to take some time to better integrate the emulator into the software before extending its functionality, like doing some root cause analysis on the debugging hooks to figure out if and how to remove them, and improving things like memory mapping and the stack handler for the emulator.

Lately there has been a big focus on making tools to speed up development on the emulator, like quickly determining why unicorn and gdb disagree on something. This is key to getting the emulator right. Having to re-run the binary and cross-check unicorn and gdb by hand many times to resolve a bug does not scale. That is also why the emulator has not gotten more syscalls implemented yet.
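
The commit is about exactly this kind of cross-checking, so here is a minimal sketch of the idea in Python. The unicorn calls are the real API; the `gdb` object with `registers()` and `step()` is a hypothetical stand-in for however the gdb side is driven, and this is an illustration of the approach, not the tool's actual implementation.

```python
from unicorn import Uc
from unicorn.x86_const import UC_X86_REG_RAX, UC_X86_REG_RIP, UC_X86_REG_RSP

# Registers compared on every step (extend as needed).
WATCHED = {"rip": UC_X86_REG_RIP, "rsp": UC_X86_REG_RSP, "rax": UC_X86_REG_RAX}


def find_first_divergence(mu: Uc, gdb, max_steps=100_000):
    """Step unicorn and a gdb-controlled process in lockstep until they disagree.

    `gdb` is a hypothetical wrapper exposing registers() -> dict and step();
    how it is implemented (gdb/MI, pwntools, ...) is outside this sketch.
    """
    for step in range(max_steps):
        uc_regs = {name: mu.reg_read(const) for name, const in WATCHED.items()}
        gdb_regs = gdb.registers()
        diff = {name: (uc_regs[name], gdb_regs[name])
                for name in WATCHED if uc_regs[name] != gdb_regs[name]}
        if diff:
            return step, diff                      # first point of disagreement
        mu.emu_start(uc_regs["rip"], 0, count=1)   # emulate exactly one instruction
        gdb.step()                                 # single-step the real process
    return None, {}
```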


# Do I think I can make a better tool than IDA?
1 change: 0 additions & 1 deletion common/benchmark.py
@@ -15,7 +15,7 @@ def wrap(*args):
def test(sleep_duration=5):
time.sleep(sleep_duration)


if __name__ == "__main__":
test()

11 changes: 0 additions & 11 deletions common/web.py
@@ -129,17 +129,6 @@ def event_code(json_data, methods=["GET", "POST"]):
#print(target.custom_comments)
#socketio.emit("control", target.get_cfg())

@socketio.on("give_me_dynamic_data")
def event_code(methods=["GET", "POST"]):
# print("im happy")
socketio.emit('dynamic_view', {"data":target.dynamic.address_register})

# print(json_data)
# print(json_data)
# target.save_model(json_data["data"]["file_name"])



@app.after_request
def add_header(request):
request.headers["Cache-Control"] = "no-cache, no-store, must-revalidate"
21 changes: 21 additions & 0 deletions db/README.md
@@ -0,0 +1,21 @@
# DB

Need a database for all the registers and access patterns; extending Python with C is something I wanted to try.

The database is basically a key-value store, what the cool kids call a hash table. It uses some clever techniques to avoid storing all of memory for each state. The register design will be changed in the future to use the memory design, since it saves a lot of memory this way (or maybe I will add it as an option: O(1) access with O(n) memory, or O(log(n)) access with a lot less memory usage, since only a new memory cell is stored for each edit)!
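
To make that tradeoff concrete, here is a tiny Python sketch of the "only store a new cell per edit" idea (the class and method names are made up for illustration; this is not the C implementation): each address keeps a list of (op_count, value) commits, and a read binary-searches for the latest commit at or before a given op count.

```python
import bisect


class DeltaStore:
    """Per-address commit log: O(log n) lookups, memory only grows on edits."""

    def __init__(self):
        self.commits = {}   # address -> list of (op_count, value), op_count increasing
        self.op_count = 0

    def write(self, address, value):
        log = self.commits.setdefault(address, [])
        if not log or log[-1][1] != value:      # the add_unique_only behaviour
            log.append((self.op_count, value))
        self.op_count += 1

    def read_at(self, address, op_count):
        """Value of `address` as it was at `op_count`, or None if never written."""
        log = self.commits.get(address, [])
        i = bisect.bisect_right(log, (op_count, float("inf"))) - 1
        return log[i][1] if i >= 0 else None
```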

# Regarding Valgrind
There might be some memory leaks lurking somewhere. I haven't gotten [valgrind working for python](https://github.com/python/cpython/blob/master/Misc/README.valgrind) yet. Valgrind reports problems even when passed an empty script, so I can't actually run the test script to debug.

# What do we want to store?
- Register values
    - at different points in time / at different runs
    - you should be able to see the difference between all the registers at different runs
    - A dynamic array can point to each run, and each run can be stored as a hash table.
      (want that O(1) lookup when checking for a specific address)

    - this is basically what the current db has to offer.

- Memory layout?
    - Unicorn reports changes to the memory layout
    - Look up a value at a state with a binary search (a usage sketch of the new db methods follows this list).
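
As promised above, a hedged usage sketch of the Python-facing methods this commit adds in db/db.c (see the diff below). The method names and argument order follow the C keyword lists in the diff; the constructor call, the run setup, and the concrete addresses and values are assumptions for illustration only.

```python
import triforce_db  # the C extension module built from db/db.c

db = triforce_db.db_init()  # type name as registered in PyInit_triforce_db; starting a
                            # new execution/run is assumed to happen outside this diff

# Record a memory write: `value` stored at `address`, committed by the
# instruction at `address_exectuion` (argument order follows the C kwlist).
db.add_memory_trace("0x7ffd00000010", 0xdeadbeef, 0x400123)

# Bump the op counter without recording anything.
db.force_increment_op()

# Debug helper: pretty-print the commit log of one address in run 0.
db.view_memory_commits("0x7ffd00000010", 0)

# Which instruction last committed to this address at or before op_count 5?
last_writer = db.latest_memory_commit("0x7ffd00000010", 0, 5)

# Rebuild the full memory view of run 0 as it was at the 0th hit of this address.
state = db.rebuild_memory_full("0x7ffd00000010", 0, 0)
```
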
191 changes: 176 additions & 15 deletions db/db.c
@@ -79,9 +79,12 @@ static void clean_object(db_object *self) {

static PyObject *add_memory_trace(db_object *self, PyObject *args, PyObject *kwargs) {
char *address; // register is a keyword
int value;
static char *kwlist[] = {"address", "value", NULL};
if (!PyArg_ParseTupleAndKeywords(args, kwargs, "si", kwlist, &address, &value)){
unsigned long value;
unsigned long address_exectuion;
int add_unique_only = 1;
int increment_op = 1;
static char *kwlist[] = {"address", "value", "address_exectuion", "increment_op", "add_unique_only", NULL};
if (!PyArg_ParseTupleAndKeywords(args, kwargs, "sKK|ii", kwlist, &address, &value, &address_exectuion, &increment_op, &add_unique_only)){
return NULL;
}

@@ -92,13 +92,28 @@ static PyObject *add_memory_trace(db_object *self, PyObject *args, PyObject *kwa
if(memory_table == NULL){
printf("memory table is null...\n");
}else{
int add = 1;

if(add_unique_only == 1){
struct vector_stucture_pointer *memory_values = get_hash_table_value(memory_table, address);
if(!(memory_values == NULL)){
if(*latest_cell_value(memory_values) == value){
add = 0;
}
}
}

struct memory *memory_cell = malloc(sizeof(struct memory));
memory_cell->value = int_deepcopy(value);
memory_cell->op_count = int_deepcopy(*current_op_count);
if(add == 1){
struct memory *memory_cell = malloc(sizeof(struct memory));
memory_cell->value = unsinged_deep_copy(value);
memory_cell->op_count = int_deepcopy(*current_op_count);
memory_cell->adress_execution = unsinged_deep_copy(address_exectuion);

add_hash_table_value(memory_table, address, memory_cell, VALUE_MEMORY);
(*current_op_count)++;
add_hash_table_value(memory_table, address, memory_cell, VALUE_MEMORY);
if(increment_op == 1){
(*current_op_count)++;
}
}
}
}else{
PyErr_SetString(PyExc_TypeError, "Something is wrong. Register table is NULL");
@@ -107,6 +125,12 @@ static PyObject *add_memory_trace(db_object *self, PyObject *args, PyObject *kwa
return PyLong_FromLong(19);
}

static PyObject *force_increment_op(db_object *self, PyObject *Py_UNUSED(ignored)){
int *current_op_count = vector_get_pointer(self->op_count, self->execution_time->size - 1);
(*current_op_count)++;
Py_RETURN_NONE;
}

// (in the future this will check all address keys and then rebuild the entire memory at that given op_count state.)
static PyObject *rebuild_memory(db_object *self, PyObject *args, PyObject *kwargs) {
char *address; // register is a keyword
@@ -127,8 +151,10 @@ static PyObject *rebuild_memory(db_object *self, PyObject *args, PyObject *kwarg
PyErr_SetString(PyExc_TypeError, "Something is wrong. No memory cells.");
return NULL;
}


// the memory cell value at that given op_count state.
int *state_value = findClosest(memory_values, op_count);
unsigned long long *state_value = find_closest(memory_values, op_count, MEMORY_VALUE);
if(state_value == NULL){
Py_RETURN_NONE;
}
@@ -140,6 +166,134 @@ static PyObject *rebuild_memory(db_object *self, PyObject *args, PyObject *kwarg
return NULL;
}

static PyObject *rebuild_memory_full(db_object *self, PyObject *args, PyObject *kwargs) {
char *address; // register is a keyword
int execution_count;
int op_count;
int hit_count;
static char *kwlist[] = {"address", "execution_count", "hit_count" , NULL};
if (!PyArg_ParseTupleAndKeywords(args, kwargs, "sii", kwlist, &address, &execution_count, &hit_count)){
return NULL;
}

if(self->memory_trace != NULL){
struct hash_table_structure *memory_table = vector_get_pointer(self->memory_trace, execution_count);

if(memory_table == NULL){
PyErr_SetString(PyExc_TypeError, "Something is wrong. Memory table is NULL");
return NULL;
}

struct vector_stucture_pointer *based_off_memory_state = get_hash_table_value(memory_table, address);
if(based_off_memory_state == NULL){
PyErr_SetString(PyExc_TypeError, "Did not find address, have it been added?");
return NULL;
}

if(based_off_memory_state->size < hit_count || hit_count < 0){
PyErr_SetString(PyExc_TypeError, "Hit count exceeds the actual hit count. (overflow)");
return NULL;
}

struct memory *target_cell_state = vector_get_pointer(based_off_memory_state, hit_count);
if(target_cell_state == NULL){
PyErr_SetString(PyExc_TypeError, "Something is wrong with the cell state.");
return NULL;
}
op_count = *target_cell_state->op_count;

struct vector_stucture_pointer *keys = memory_table->keys;
PyObject *results = PyDict_New();
if(keys != NULL){
for(int i = 0; i < keys->size; i++){

struct vector_stucture_pointer *memory_values = get_hash_table_value(memory_table, vector_get_pointer(keys, i));
if(memory_values == NULL){
PyErr_SetString(PyExc_TypeError, "Something is wrong. No memory cells.");
return NULL;
}
// the memory cell value at that given op_count state.
unsigned long long *state_value = find_closest(memory_values, op_count, MEMORY_VALUE);
if(state_value == NULL){
continue;
}else{
// printf("hit == %s, state value == %i\n", (char*)vector_get_pointer(keys, i), *state_value);
PyDict_SetItem(results, PyUnicode_FromString(vector_get_pointer(keys, i)) , PyLong_FromLong(*state_value));
}
}
}
return results;
}
PyErr_SetString(PyExc_TypeError, "Something is wrong. Memory table is NULL");
return NULL;
}

static PyObject *latest_memory_commit(db_object *self, PyObject *args, PyObject *kwargs) {
char *address; // register is a keyword
int execution_count;
int op_count;

static char *kwlist[] = {"address", "execution_count", "op_count", NULL};
if (!PyArg_ParseTupleAndKeywords(args, kwargs, "sii", kwlist, &address, &execution_count, &op_count)){
return NULL;
}

if(self->memory_trace != NULL){
struct hash_table_structure *memory_table = vector_get_pointer(self->memory_trace, execution_count);

if(memory_table == NULL){
PyErr_SetString(PyExc_TypeError, "Something is wrong. Memory table is NULL");
return NULL;
}

struct vector_stucture_pointer *based_off_memory_state = get_hash_table_value(memory_table, address);
if(based_off_memory_state == NULL){
PyErr_SetString(PyExc_TypeError, "Did not find address, have it been added?");
return NULL;
}

// unsigned long long *address_exectuion = latest_state_adress(based_off_memory_state);
unsigned long long *address_exectuion = find_closest(based_off_memory_state, op_count, MEMORY_EXECUTION);
if(address_exectuion == NULL){
Py_RETURN_NONE;
}else{
return PyLong_FromLong(*address_exectuion);
}
}
PyErr_SetString(PyExc_TypeError, "Something is wrong. Memory table is NULL");
return NULL;
}


static PyObject *view_memory_commits(db_object *self, PyObject *args, PyObject *kwargs) {
char *address; // register is a keyword
int execution_count;

static char *kwlist[] = {"address", "execution_count", NULL};
if (!PyArg_ParseTupleAndKeywords(args, kwargs, "si", kwlist, &address, &execution_count)){
return NULL;
}

if(self->memory_trace != NULL){
struct hash_table_structure *memory_table = vector_get_pointer(self->memory_trace, execution_count);

if(memory_table == NULL){
PyErr_SetString(PyExc_TypeError, "Something is wrong. Memory table is NULL");
return NULL;
}

struct vector_stucture_pointer *based_off_memory_state = get_hash_table_value(memory_table, address);
if(based_off_memory_state == NULL){
PyErr_SetString(PyExc_TypeError, "Did not find address, have it been added?");
return NULL;
}
pretty_print(based_off_memory_state);
Py_RETURN_NONE;
}
PyErr_SetString(PyExc_TypeError, "Something is wrong. Memory table is NULL");
return NULL;
}


static PyObject *get_memory_trace(db_object *self, PyObject *args, PyObject *kwargs) {
char *address; // register is a keyword
@@ -452,6 +606,18 @@ static PyMethodDef Custom_methods[] = {
{"rebuild_memory", (PyCFunction) rebuild_memory, METH_VARARGS,
"want to see the memory at a given state?"
},
{"rebuild_memory_full", (PyCFunction) rebuild_memory_full, METH_VARARGS,
"want to see the full memory at a given state?"
},
{"latest_memory_commit", (PyCFunction) latest_memory_commit, METH_VARARGS,
"what was the last instruction that comitted to a part of memory?"
},
{"view_memory_commits", (PyCFunction) view_memory_commits, METH_VARARGS,
"easy to debug"
},
{"force_increment_op", (PyCFunction) force_increment_op, METH_NOARGS,
"easy to debug"
},
{NULL} /* Sentinel */
};

@@ -479,8 +645,6 @@ static PyTypeObject CustomType = {


PyMODINIT_FUNC PyInit_triforce_db(void){
// init();

PyObject *m;
if (PyType_Ready(&CustomType) < 0){
return NULL;
@@ -493,10 +657,7 @@ PyMODINIT_FUNC PyInit_triforce_db(void){

Py_INCREF(&CustomType);
PyModule_AddObject(m, "db_init", (PyObject *) &CustomType);
/*
if (Py_AtExit(clean)) {
return NULL;
}*/
return m;
}


16 changes: 15 additions & 1 deletion db/hash_table.c
@@ -26,7 +26,7 @@ struct hashtable_item_pointer{
struct hash_table_structure {
char *keyword;
void **items;
struct vector_stucture *keys;
struct vector_stucture_pointer *keys;
int max_capacity;
int size;
};
@@ -56,13 +56,22 @@ void debug_print(char *string){
}
}

void *deepcopy_char(char *input){
void *copy = malloc(sizeof(char) * (strlen(input) + 1));
strncpy(copy, input, strlen(input));
((char *)copy)[strlen(input)] = '\0'; // strncpy does not null-terminate here
return copy;
}

struct hash_table_structure *init_table(char *keyword){
struct hash_table_structure *hash_table = malloc(sizeof(struct hash_table_structure));
hash_table->max_capacity = TABLE_SIZE;

hash_table->items = malloc(sizeof(void *) * hash_table->max_capacity);
memset(hash_table->items, 0, sizeof(void *) * hash_table->max_capacity);

hash_table->keys = init_vector_pointer("keys");
//init_vector_pointer(hash_table->keys);

hash_table->size = 0;
hash_table->keyword = keyword;

@@ -76,6 +85,9 @@ void *add_hash_table_value(struct hash_table_structure *hash_table, char *keywor
printf("neeed to resize capacity %i, index %i \n", hash_table->max_capacity, index);
exit(0);
}
// printf("%s\n", deepcopy_char(keyword));
// printf("%p\n", hash_table->keys);
// printf("%i\n", hash_table->keys->size);

struct vector_stucture_pointer *value_vector = NULL;
if(hash_table->items[index] == NULL){
@@ -103,6 +115,7 @@ void *add_hash_table_value(struct hash_table_structure *hash_table, char *keywor
value_vector->malloc_keyword = 1;
vector_add_pointer(value_vector, init_vector_pointer(value));
}
vector_add_pointer(hash_table->keys, deepcopy_char(keyword));
}else{
struct hashtable_item_pointer *address = hash_table->items[index];
struct vector_stucture_pointer *vector_table = address->value;
@@ -131,6 +144,7 @@ void *add_hash_table_value(struct hash_table_structure *hash_table, char *keywor
value_vector->malloc_keyword = 1;
vector_add_pointer(value_vector, init_vector_pointer(value));
}
vector_add_pointer(hash_table->keys, deepcopy_char(keyword));
}else{
value_vector = found_entry;
if(type == VALUE_INT || type == VALUE_MEMORY){
