
faster root cause analysis for when gdb and unicorn have a conflict
2xic committed Jul 2, 2019
1 parent f75be9a commit cbe620e
Showing 36 changed files with 686 additions and 677 deletions.
2 changes: 1 addition & 1 deletion .travis.yml
@@ -19,7 +19,7 @@ compiler:
script:
- python -m pip install --upgrade pip setuptools
# - python3.7 --version
- sudo ./install.sh
- sudo ./install.sh travis
#script:
# - ./test.sh
#after_success:
File renamed without changes.
10 changes: 6 additions & 4 deletions README.md
@@ -7,11 +7,11 @@
<img src="README/new_version.png" width="800px" />
(web interface, click for better resolution)

# status
# Status
There is now a branch called version 0.1; it shows the idea and part of the vision. Master is maybe already version 0.2: its web component is a lot faster than the previous version (still some work left to do). However, I want more work done on the emulator before I do a version bump. I feel I had a bit too much focus on the web interface for version 0.1. Now the big focus will be on extending the dynamic side, which is key to making this software good. That is why I have made a gdb-like interface for the terminal; features will come to the terminal before the web interface to make design iterations faster.

# Note
Mostly tested on CTF (elf)binaries + some glibc binary, I'm sure you can make the program do weird things if you try it on something big and complicated. *This program is still under construction*.
Mostly tested on CTF (ELF) binaries; I'm sure you can make the program do weird things if you try it on something big and complicated. *This program is still under construction*.

# Features
- flat view (see all the sections with code)
@@ -23,11 +23,11 @@ Mostly tested on CTF (elf)binaries + some glibc binary, I'm sure you can make th


# Design philosophy
What do you expect from a reverse engineering tool? You want quick insight into a program. How do you get insight? The best way is to get static data with the aid of dynamic information. The dynamic data show you where you have been, the static can help you get where you want to be. If the binary has been obfuscated the dynamic part will guide the static part. You want to be able to move around in the program flawlessly. This tool will have focus on speed, you want to do things like root cause analysis fast and this tool should work fast in environments like CTFs.
What do you expect from a reverse engineering tool? You want quick insight into a program. How do you get insight? The best way is to combine static data with dynamic data. The dynamic data shows you where you have been; the static data can help you get where you want to be. If the binary has been obfuscated, the dynamic part will guide the static part. You want to be able to move around in the program flawlessly. This tool focuses on speed: you want to do things like root cause analysis fast, and it should work well in environments like CTFs.


# Status with unicorn (dynamic side)
Finally got a static binary with glibc to run from start to finish! I will continue to improve the emulator, still many syscalls and other funconality to implement, the future is bright. However, first I want to take some time to better integrate the emulator into the software before extending it's functionality. Like doing some root cause analysis on the debugging hooks to figure if and how to remove them and improve things like memory mapping and the stack handler for the emulator.
Finally got a static binary with glibc to run from start to finish! I will continue to improve the emulator; there are still many syscalls and other functionality to implement, but the future is bright. However, first I want to take some time to better integrate the emulator into the software before extending its functionality, like doing some root cause analysis on the debugging hooks to figure out if and how to remove them, and improving things like memory mapping and the stack handler for the emulator.

Lately there has been a big focus on making tools to speed up development on the emulator, like quickly determining why unicorn and gdb disagree on something. This is key to getting the emulator right. Having to re-run the binary and cross-check unicorn and gdb by hand many times to resolve a bug does not scale. That is also why the emulator has not gotten more syscalls implemented yet.
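
The commit is about exactly this kind of cross-checking, so here is a minimal sketch of the idea in Python. The unicorn calls are the real API; the `gdb` object with `registers()` and `step()` is a hypothetical stand-in for however the gdb side is driven, and this is an illustration of the approach, not the tool's actual implementation.

```python
from unicorn import Uc
from unicorn.x86_const import UC_X86_REG_RAX, UC_X86_REG_RIP, UC_X86_REG_RSP

# Registers compared on every step (extend as needed).
WATCHED = {"rip": UC_X86_REG_RIP, "rsp": UC_X86_REG_RSP, "rax": UC_X86_REG_RAX}


def find_first_divergence(mu: Uc, gdb, max_steps=100_000):
    """Step unicorn and a gdb-controlled process in lockstep until they disagree.

    `gdb` is a hypothetical wrapper exposing registers() -> dict and step();
    how it is implemented (gdb/MI, pwntools, ...) is outside this sketch.
    """
    for step in range(max_steps):
        uc_regs = {name: mu.reg_read(const) for name, const in WATCHED.items()}
        gdb_regs = gdb.registers()
        diff = {name: (uc_regs[name], gdb_regs[name])
                for name in WATCHED if uc_regs[name] != gdb_regs[name]}
        if diff:
            return step, diff                      # first point of disagreement
        mu.emu_start(uc_regs["rip"], 0, count=1)   # emulate exactly one instruction
        gdb.step()                                 # single-step the real process
    return None, {}
```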


# Do I think I can make a better tool than IDA?
1 change: 0 additions & 1 deletion common/benchmark.py
@@ -15,7 +15,7 @@ def wrap(*args):
def test(sleep_duration=5):
time.sleep(sleep_duration)


if __name__ == "__main__":
test()

11 changes: 0 additions & 11 deletions common/web.py
@@ -129,17 +129,6 @@ def event_code(json_data, methods=["GET", "POST"]):
#print(target.custom_comments)
#socketio.emit("control", target.get_cfg())

@socketio.on("give_me_dynamic_data")
def event_code(methods=["GET", "POST"]):
# print("im happy")
socketio.emit('dynamic_view', {"data":target.dynamic.address_register})

# print(json_data)
# print(json_data)
# target.save_model(json_data["data"]["file_name"])



@app.after_request
def add_header(request):
request.headers["Cache-Control"] = "no-cache, no-store, must-revalidate"
21 changes: 21 additions & 0 deletions db/README.md
@@ -0,0 +1,21 @@
# DB

Need a database for all the registers and access patterns; extending Python with C is something I wanted to try.

The database is basically a key-value store, what the cool kids call a hash table. It uses some clever techniques to avoid storing all of memory for each state. The register design will be changed in the future to use the memory design, since it saves a lot of memory this way (or maybe I will add it as an option: O(1) access with O(n) memory, or O(log(n)) access with a lot less memory usage, since only a new memory cell is stored for each edit)!
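
To make that tradeoff concrete, here is a tiny Python sketch of the "only store a new cell per edit" idea (the class and method names are made up for illustration; this is not the C implementation): each address keeps a list of (op_count, value) commits, and a read binary-searches for the latest commit at or before a given op count.

```python
import bisect


class DeltaStore:
    """Per-address commit log: O(log n) lookups, memory only grows on edits."""

    def __init__(self):
        self.commits = {}   # address -> list of (op_count, value), op_count increasing
        self.op_count = 0

    def write(self, address, value):
        log = self.commits.setdefault(address, [])
        if not log or log[-1][1] != value:      # the add_unique_only behaviour
            log.append((self.op_count, value))
        self.op_count += 1

    def read_at(self, address, op_count):
        """Value of `address` as it was at `op_count`, or None if never written."""
        log = self.commits.get(address, [])
        i = bisect.bisect_right(log, (op_count, float("inf"))) - 1
        return log[i][1] if i >= 0 else None
```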

# Regarding Valgrind
There might be some memory leaks lurking somewhere. I haven't gotten [valgrind working for python](https://github.com/python/cpython/blob/master/Misc/README.valgrind) yet. Valgrind reports problems even when passed an empty script, so I can't actually run the test script to debug.

# What do we want to store?
- Register values
    - at different points in time / at different runs
    - you should be able to see the difference between all the registers at different runs
    - A dynamic array can point to each run, and each run can be stored as a hash table.
      (want that O(1) lookup when checking for a specific address)

    - this is basically what the current db has to offer.

- Memory layout?
    - Unicorn reports changes to the memory layout
    - Look up a value at a state with a binary search (a usage sketch of the new db methods follows this list).
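
As promised above, a hedged usage sketch of the Python-facing methods this commit adds in db/db.c (see the diff below). The method names and argument order follow the C keyword lists in the diff; the constructor call, the run setup, and the concrete addresses and values are assumptions for illustration only.

```python
import triforce_db  # the C extension module built from db/db.c

db = triforce_db.db_init()  # type name as registered in PyInit_triforce_db; starting a
                            # new execution/run is assumed to happen outside this diff

# Record a memory write: `value` stored at `address`, committed by the
# instruction at `address_exectuion` (argument order follows the C kwlist).
db.add_memory_trace("0x7ffd00000010", 0xdeadbeef, 0x400123)

# Bump the op counter without recording anything.
db.force_increment_op()

# Debug helper: pretty-print the commit log of one address in run 0.
db.view_memory_commits("0x7ffd00000010", 0)

# Which instruction last committed to this address at or before op_count 5?
last_writer = db.latest_memory_commit("0x7ffd00000010", 0, 5)

# Rebuild the full memory view of run 0 as it was at the 0th hit of this address.
state = db.rebuild_memory_full("0x7ffd00000010", 0, 0)
```
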
191 changes: 176 additions & 15 deletions db/db.c
@@ -79,9 +79,12 @@ static void clean_object(db_object *self) {

static PyObject *add_memory_trace(db_object *self, PyObject *args, PyObject *kwargs) {
char *address; // register is a keyword
int value;
static char *kwlist[] = {"address", "value", NULL};
if (!PyArg_ParseTupleAndKeywords(args, kwargs, "si", kwlist, &address, &value)){
unsigned long value;
unsigned long address_exectuion;
int add_unique_only = 1;
int increment_op = 1;
static char *kwlist[] = {"address", "value", "address_exectuion", "increment_op", "add_unique_only", NULL};
if (!PyArg_ParseTupleAndKeywords(args, kwargs, "sKK|ii", kwlist, &address, &value, &address_exectuion, &increment_op, &add_unique_only)){
return NULL;
}

@@ -92,13 +92,28 @@ static PyObject *add_memory_trace(db_object *self, PyObject *args, PyObject *kwa
if(memory_table == NULL){
printf("memory table is null...\n");
}else{
int add = 1;

if(add_unique_only == 1){
struct vector_stucture_pointer *memory_values = get_hash_table_value(memory_table, address);
if(!(memory_values == NULL)){
if(*latest_cell_value(memory_values) == value){
add = 0;
}
}
}

struct memory *memory_cell = malloc(sizeof(struct memory));
memory_cell->value = int_deepcopy(value);
memory_cell->op_count = int_deepcopy(*current_op_count);
if(add == 1){
struct memory *memory_cell = malloc(sizeof(struct memory));
memory_cell->value = unsinged_deep_copy(value);
memory_cell->op_count = int_deepcopy(*current_op_count);
memory_cell->adress_execution = unsinged_deep_copy(address_exectuion);

add_hash_table_value(memory_table, address, memory_cell, VALUE_MEMORY);
(*current_op_count)++;
add_hash_table_value(memory_table, address, memory_cell, VALUE_MEMORY);
if(increment_op == 1){
(*current_op_count)++;
}
}
}
}else{
PyErr_SetString(PyExc_TypeError, "Something is wrong. Register table is NULL");
@@ -107,6 +125,12 @@ static PyObject *add_memory_trace(db_object *self, PyObject *args, PyObject *kwa
return PyLong_FromLong(19);
}

static PyObject *force_increment_op(db_object *self, PyObject *Py_UNUSED(ignored)){
int *current_op_count = vector_get_pointer(self->op_count, self->execution_time->size - 1);
(*current_op_count)++;
Py_RETURN_NONE;
}

// (in the future this will check all address keys and then rebuild the entire memory at that given op_count state.)
static PyObject *rebuild_memory(db_object *self, PyObject *args, PyObject *kwargs) {
char *address; // register is a keyword
@@ -127,8 +151,10 @@ static PyObject *rebuild_memory(db_object *self, PyObject *args, PyObject *kwarg
PyErr_SetString(PyExc_TypeError, "Something is wrong. No memory cells.");
return NULL;
}


// the memory cell value at that given op_count state.
int *state_value = findClosest(memory_values, op_count);
unsigned long long *state_value = find_closest(memory_values, op_count, MEMORY_VALUE);
if(state_value == NULL){
Py_RETURN_NONE;
}
@@ -140,6 +166,134 @@ static PyObject *rebuild_memory(db_object *self, PyObject *args, PyObject *kwarg
return NULL;
}

static PyObject *rebuild_memory_full(db_object *self, PyObject *args, PyObject *kwargs) {
char *address; // register is a keyword
int execution_count;
int op_count;
int hit_count;
static char *kwlist[] = {"address", "execution_count", "hit_count" , NULL};
if (!PyArg_ParseTupleAndKeywords(args, kwargs, "sii", kwlist, &address, &execution_count, &hit_count)){
return NULL;
}

if(self->memory_trace != NULL){
struct hash_table_structure *memory_table = vector_get_pointer(self->memory_trace, execution_count);

if(memory_table == NULL){
PyErr_SetString(PyExc_TypeError, "Something is wrong. Memory table is NULL");
return NULL;
}

struct vector_stucture_pointer *based_off_memory_state = get_hash_table_value(memory_table, address);
if(based_off_memory_state == NULL){
PyErr_SetString(PyExc_TypeError, "Did not find address, have it been added?");
return NULL;
}

if(based_off_memory_state->size < hit_count || hit_count < 0){
PyErr_SetString(PyExc_TypeError, "Hit count exceeds the actual hit count. (overflow)");
return NULL;
}

struct memory *target_cell_state = vector_get_pointer(based_off_memory_state, hit_count);
if(target_cell_state == NULL){
PyErr_SetString(PyExc_TypeError, "Something is wrong with the cell state.");
return NULL;
}
op_count = *target_cell_state->op_count;

struct vector_stucture_pointer *keys = memory_table->keys;
PyObject *results = PyDict_New();
if(keys != NULL){
for(int i = 0; i < keys->size; i++){

struct vector_stucture_pointer *memory_values = get_hash_table_value(memory_table, vector_get_pointer(keys, i));
if(memory_values == NULL){
PyErr_SetString(PyExc_TypeError, "Something is wrong. No memory cells.");
return NULL;
}
// the memory cell value at that given op_count state.
unsigned long long *state_value = find_closest(memory_values, op_count, MEMORY_VALUE);
if(state_value == NULL){
continue;
}else{
// printf("hit == %s, state value == %i\n", (char*)vector_get_pointer(keys, i), *state_value);
PyDict_SetItem(results, PyUnicode_FromString(vector_get_pointer(keys, i)) , PyLong_FromLong(*state_value));
}
}
}
return results;
}
PyErr_SetString(PyExc_TypeError, "Something is wrong. Memory table is NULL");
return NULL;
}

static PyObject *latest_memory_commit(db_object *self, PyObject *args, PyObject *kwargs) {
char *address; // register is a keyword
int execution_count;
int op_count;

static char *kwlist[] = {"address", "execution_count", "op_count", NULL};
if (!PyArg_ParseTupleAndKeywords(args, kwargs, "sii", kwlist, &address, &execution_count, &op_count)){
return NULL;
}

if(self->memory_trace != NULL){
struct hash_table_structure *memory_table = vector_get_pointer(self->memory_trace, execution_count);

if(memory_table == NULL){
PyErr_SetString(PyExc_TypeError, "Something is wrong. Memory table is NULL");
return NULL;
}

struct vector_stucture_pointer *based_off_memory_state = get_hash_table_value(memory_table, address);
if(based_off_memory_state == NULL){
PyErr_SetString(PyExc_TypeError, "Did not find address, have it been added?");
return NULL;
}

// unsigned long long *address_exectuion = latest_state_adress(based_off_memory_state);
unsigned long long *address_exectuion = find_closest(based_off_memory_state, op_count, MEMORY_EXECUTION);
if(address_exectuion == NULL){
Py_RETURN_NONE;
}else{
return PyLong_FromLong(*address_exectuion);
}
}
PyErr_SetString(PyExc_TypeError, "Something is wrong. Memory table is NULL");
return NULL;
}


static PyObject *view_memory_commits(db_object *self, PyObject *args, PyObject *kwargs) {
char *address; // register is a keyword
int execution_count;

static char *kwlist[] = {"address", "execution_count", NULL};
if (!PyArg_ParseTupleAndKeywords(args, kwargs, "si", kwlist, &address, &execution_count)){
return NULL;
}

if(self->memory_trace != NULL){
struct hash_table_structure *memory_table = vector_get_pointer(self->memory_trace, execution_count);

if(memory_table == NULL){
PyErr_SetString(PyExc_TypeError, "Something is wrong. Memory table is NULL");
return NULL;
}

struct vector_stucture_pointer *based_off_memory_state = get_hash_table_value(memory_table, address);
if(based_off_memory_state == NULL){
PyErr_SetString(PyExc_TypeError, "Did not find address, have it been added?");
return NULL;
}
pretty_print(based_off_memory_state);
Py_RETURN_NONE;
}
PyErr_SetString(PyExc_TypeError, "Something is wrong. Memory table is NULL");
return NULL;
}


static PyObject *get_memory_trace(db_object *self, PyObject *args, PyObject *kwargs) {
char *address; // register is a keyword
@@ -452,6 +606,18 @@ static PyMethodDef Custom_methods[] = {
{"rebuild_memory", (PyCFunction) rebuild_memory, METH_VARARGS,
"want to see the memory at a given state?"
},
{"rebuild_memory_full", (PyCFunction) rebuild_memory_full, METH_VARARGS,
"want to see the full memory at a given state?"
},
{"latest_memory_commit", (PyCFunction) latest_memory_commit, METH_VARARGS,
"what was the last instruction that comitted to a part of memory?"
},
{"view_memory_commits", (PyCFunction) view_memory_commits, METH_VARARGS,
"easy to debug"
},
{"force_increment_op", (PyCFunction) force_increment_op, METH_NOARGS,
"easy to debug"
},
{NULL} /* Sentinel */
};

@@ -479,8 +645,6 @@ static PyTypeObject CustomType = {


PyMODINIT_FUNC PyInit_triforce_db(void){
// init();

PyObject *m;
if (PyType_Ready(&CustomType) < 0){
return NULL;
@@ -493,10 +657,7 @@ PyMODINIT_FUNC PyInit_triforce_db(void){

Py_INCREF(&CustomType);
PyModule_AddObject(m, "db_init", (PyObject *) &CustomType);
/*
if (Py_AtExit(clean)) {
return NULL;
}*/
return m;
}


16 changes: 15 additions & 1 deletion db/hash_table.c
@@ -26,7 +26,7 @@ struct hashtable_item_pointer{
struct hash_table_structure {
char *keyword;
void **items;
struct vector_stucture *keys;
struct vector_stucture_pointer *keys;
int max_capacity;
int size;
};
@@ -56,13 +56,22 @@ void debug_print(char *string){
}
}

void *deepcopy_char(char *input){
void *copy = malloc(sizeof(char) * (strlen(input) + 1));
strncpy(copy, input, strlen(input));
((char *)copy)[strlen(input)] = '\0'; // strncpy does not null-terminate here
return copy;
}

struct hash_table_structure *init_table(char *keyword){
struct hash_table_structure *hash_table = malloc(sizeof(struct hash_table_structure));
hash_table->max_capacity = TABLE_SIZE;

hash_table->items = malloc(sizeof(void *) * hash_table->max_capacity);
memset(hash_table->items, 0, sizeof(void *) * hash_table->max_capacity);

hash_table->keys = init_vector_pointer("keys");
//init_vector_pointer(hash_table->keys);

hash_table->size = 0;
hash_table->keyword = keyword;

@@ -76,6 +85,9 @@ void *add_hash_table_value(struct hash_table_structure *hash_table, char *keywor
printf("neeed to resize capacity %i, index %i \n", hash_table->max_capacity, index);
exit(0);
}
// printf("%s\n", deepcopy_char(keyword));
// printf("%p\n", hash_table->keys);
// printf("%i\n", hash_table->keys->size);

struct vector_stucture_pointer *value_vector = NULL;
if(hash_table->items[index] == NULL){
@@ -103,6 +115,7 @@ void *add_hash_table_value(struct hash_table_structure *hash_table, char *keywor
value_vector->malloc_keyword = 1;
vector_add_pointer(value_vector, init_vector_pointer(value));
}
vector_add_pointer(hash_table->keys, deepcopy_char(keyword));
}else{
struct hashtable_item_pointer *address = hash_table->items[index];
struct vector_stucture_pointer *vector_table = address->value;
@@ -131,6 +144,7 @@ void *add_hash_table_value(struct hash_table_structure *hash_table, char *keywor
value_vector->malloc_keyword = 1;
vector_add_pointer(value_vector, init_vector_pointer(value));
}
vector_add_pointer(hash_table->keys, deepcopy_char(keyword));
}else{
value_vector = found_entry;
if(type == VALUE_INT || type == VALUE_MEMORY){
