► rev

Writeups by topic

Constraint solving:

Reverse engineering VM-based obfuscation:

Resources

Books

  • Reverse Engineering for Beginners
  • Practical Reverse Engineering
  • Practical Malware Analysis
  • Linux Device Drivers
  • Windows Internals

Recon

Run these commands on the given binary:

  1. file
  2. strings
  3. readelf
  4. md5sum
  5. objdump
  6. checksec
  7. ghex (patching, diffing)
  8. ltrace (library calls)
  9. strace (system calls)

Tools

Basics

Translation process:

We often have multiple high-level source files to compile into a single program, so we first compile each file into an object file and then we link them using a linker.

To see the various steps in the compilation process, you can use certain flags in gcc.

  • To see the preprocessor stage: gcc -E main.c
  • To see the assembly stage: gcc -S main.c
  • To see the object stage: gcc -c main.c

Compiling a C file with -c produces an object file that already contains machine code, but the addresses of external functions and variables are not yet known. That is why such files are reported as “relocatable”: the code is there, and the linker still has to fill in the missing addresses.
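To watch each stage, a tiny translation unit with a macro is enough. The sketch below is a hypothetical stand-in for main.c (the names SQUARE and square_plus_one are made up for illustration): gcc -E should show the macro already expanded, gcc -S the generated assembly, and gcc -c a relocatable object.

```c
#include <assert.h>

/* Function-like macro: `gcc -E` prints the source with this already expanded. */
#define SQUARE(x) ((x) * (x))

/* Hypothetical helper: `gcc -S` emits its assembly, `gcc -c` its machine code. */
int square_plus_one(int v)
{
    /* Keep the input small enough that v * v cannot overflow an int. */
    assert(v > -1000 && v < 1000);
    return SQUARE(v) + 1;
}
```

Because the file has no main, it can be compiled up to the object stage but not linked into an executable on its own.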

Linking:

Once these object files are created, we use a linker to combine them into a single executable. We may also link libraries that we did not write ourselves. These can be static libraries (.a), whose code is copied into the executable, or dynamic libraries (.so), whose symbols stay unresolved until the program is loaded.

In the case of objects, you can check things to be relocated using:

$ readelf --relocs main.o
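A call into another object or library is what typically produces a relocation entry. As a minimal sketch (greet is a hypothetical function, not from any challenge), the call to puts below has no known address at compile time, so the object built from it should list an entry for puts in the readelf --relocs output. The exact relocation type depends on the architecture and compiler flags.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical function whose body references an external symbol. */
int greet(const char *name)
{
    assert(name != NULL);
    /* puts lives in libc, so the compiler leaves its address for the
       linker (or the dynamic loader) to fill in later. */
    return puts(name);
}
```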

Symbols:

  • ELF has DWARF. (in the binary)
  • PE has PDB. (separate file)

Interpreter:

  • Mapped into process's virtual memory.
  • Performs relocations. (Lazy bindings)
  • ld-linux.so and ntdll.dll

ELF

Used for:

  • Executables
  • Objects
  • Shared libraries
  • Core dumps

Components:

  • Executable header
  • Program headers
  • Sections
  • Section headers

Executable header:

  • struct can be found in /usr/include/elf.h
  • Parse it using: readelf -h a.out
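The first bytes of the executable header can also be checked by hand. The following is a minimal sketch, assuming the file's first 16 bytes are already in a buffer; the function name elf_class_bits and the enum names are hypothetical, and the numeric values mirror the EI_CLASS and ELFCLASS definitions in elf.h.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Values mirror the definitions in /usr/include/elf.h, repeated here so
   the sketch also builds on systems that lack that header. */
enum { IDENT_SIZE = 16, IDX_CLASS = 4, CLASS_32 = 1, CLASS_64 = 2 };

/* Returns 64 or 32 for an ELF image of that class, 0 if it is not ELF. */
int elf_class_bits(const unsigned char *buf, size_t len)
{
    static const unsigned char magic[4] = { 0x7f, 'E', 'L', 'F' };

    assert(buf != NULL);
    if (len < IDENT_SIZE || memcmp(buf, magic, sizeof magic) != 0)
        return 0;                 /* too short, or magic bytes missing */
    if (buf[IDX_CLASS] == CLASS_64)
        return 64;
    if (buf[IDX_CLASS] == CLASS_32)
        return 32;
    return 0;                     /* unknown class byte */
}
```

This only looks at the identification bytes; readelf -h decodes the rest of the struct (type, machine, entry point, header offsets).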

ARM assembly

Compile ARM assembly using this:

aarch64-linux-gnu-gcc -static -c chall_1.S

Addressing modes

  1. Immediate
MOV R0, #0x0C
MOV R0, #12
  2. Direct
LDR R0, MEM
  3. Register direct
MOV R0, R1
  4. Register indirect
LDR R0, [R1]
  5. Pre-indexed
LDR R0, [R1, #4]
  • Loads R0 with the word pointed at by R1+4.
  6. Pre-indexed with write-back
LDR R0, [R1, #4]!
  • Loads R0 with the word pointed at by R1+4.
  • Then updates the pointer by adding 4 to R1.
  7. Post-indexed
LDR R0, [R1], #4
  • Loads R0 with the word pointed at by R1.
  • Then updates the pointer by adding 4 to R1.

x86 Assembly

Basics:

  • Byte (8 bits), word (16 bits) and double word (32 bits)
  • RAX is the full 64-bit register, EAX is its lower 32 bits, AX is the lower 16 bits, AL is the lower 8 bits, and AH is bits 8 through 15 (zero-based).
  • Passing arguments: https://ctf101.org/binary-exploitation/what-are-calling-conventions/
    • 64-bit:
      • Linux: RDI, RSI, RDX, RCX, R8, R9, stack.
      • Windows: RCX, RDX, R8, R9, stack.
    • 32-bit: push arguments on to the stack (when exploiting, include them in the payload).
    • Arguments are pushed in reverse order (right-to-left), before CALL pushes the return address (the saved EIP).
  • .bss segment is used for statically-allocated variables that are not explicitly initialized to any value.
  • The least significant three nibbles (hex digits) of an address are the offset within a 4 KB page: 3 * 4 = 12 bits, and 2^12 = 4 * (2^10) = 4096 bytes.

Instructions:

  • LEAVE: equivalent to mov esp,ebp; pop ebp
  • CALL: push address of next instruction and change eip to given address.
  • MOVS/MOVSB/MOVSW/MOVSD: move data from string to string.
  • MOVSX: move with signed extension.
  • BND: a prefix (Intel MPX) indicating that the return target should be checked against the bounds specified in the BND0 to BND3 registers.
  • ENDBR64: marks a valid target for indirect branches. Compilers place it at the start of functions such as main when building with CET support. If the CPU does not support CET, it's just a nop.
  • Indirect branches are things like call rax, which __libc_start_main uses to call your main after _start passes it a pointer to main.
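MOVSX is easiest to recognise by comparing it with its zero-extending counterpart, MOVZX. The sketch below uses hypothetical helper names; compilers typically emit MOVSX for the first cast and MOVZX for the second, though the exact instruction selection is up to the compiler.

```c
#include <assert.h>
#include <stdint.h>

/* Sign extension: the top bit of the byte is copied into the upper bits.
   Converting a value above 127 to int8_t is implementation-defined in C;
   gcc and clang wrap it as two's complement. */
int32_t extend_signed(uint8_t byte)
{
    return (int32_t)(int8_t)byte;
}

/* Zero extension: the upper bits are always filled with zeros. */
uint32_t extend_unsigned(uint8_t byte)
{
    return (uint32_t)byte;
}

/* Small self-check, assuming a gcc- or clang-style compiler. */
void extension_demo(void)
{
    assert(extend_signed(0xFF) == -1);
    assert(extend_unsigned(0xFF) == 255u);
    assert(extend_signed(0x7F) == 127);
}
```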

Coding in assembly:

  • 64-bit: $ nasm -felf64 hello.asm && ld hello.o && ./a.out
  • 32-bit: $ nasm -felf32 -g -F dwarf eip.asm && ld -m elf_i386 -o eip eip.o
  • In gdb, you can set a breakpoint in the asm program using its labels, for example b _start
  • Running assembly in C
  • We usually use _start in assembly similar to how we use main in C.
  • Therefore we often break at _start within gdb.
  • Use fin (short for finish) to continue until the current function finishes.

C

32-bit compilation:

$ sudo apt install gcc-multilib
$ gcc -m32 test.c -o test
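A quick way to confirm which mode a build targets is to look at the pointer width. This is a minimal sketch of what a hypothetical test.c could contain; pointer_bits is a made-up name, and the assert only covers the 32-bit and 64-bit x86 targets discussed here.

```c
#include <assert.h>
#include <limits.h>

/* Width of a data pointer in bits: 32 when built with -m32,
   64 for a default build on x86-64. */
int pointer_bits(void)
{
    int bits = (int)(sizeof(void *) * CHAR_BIT);
    assert(bits == 32 || bits == 64);
    return bits;
}
```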

Signedness:

  • Make sure to use %u format specifier for unsigned data-types.
  • The CPU does not care about signed and unsigned representations.
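  • The difference shows up when shifting integers (arithmetic vs. logical right shift) and when values overflow.
  • The conditional opcodes reveal the signedness: for example, JL/JG follow a signed comparison while JB/JA follow an unsigned one.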
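The same bit pattern gives different answers depending on the type the compiler attaches to it. A short sketch with hypothetical function names; the two comparisons usually compile to the same CMP followed by a signed or an unsigned condition code.

```c
#include <assert.h>
#include <stdint.h>

/* Signed comparison: typically a signed condition (e.g. JL or SETL). */
int less_signed(int32_t a, int32_t b)      { return a < b; }

/* Unsigned comparison: typically an unsigned condition (e.g. JB or SETB). */
int less_unsigned(uint32_t a, uint32_t b)  { return a < b; }

/* Unsigned addition wraps modulo 2^32; this is well-defined in C. */
uint32_t wrap_add(uint32_t a, uint32_t b)  { return a + b; }

void signedness_demo(void)
{
    uint32_t all_ones = 0xFFFFFFFFu;        /* same bits as (int32_t)-1 */

    assert(less_signed(-1, 1));             /* -1 < 1 */
    assert(!less_unsigned(all_ones, 1u));   /* 4294967295 is not < 1 */
    assert(wrap_add(all_ones, 1u) == 0u);   /* wraps around to zero */
}
```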

Datatypes:

  • For the fixed-width uintN_t datatypes (uint8_t, uint32_t, and so on) you need to #include <stdint.h>

Debugging stripped binaries

  • (gdb) info file
  • gef> entry
  • gef> disas _start

Further reading:

64-bit code interpreted as 32-bit code.


TODO

  • Known constants while reversing.
  • Idek CTF 2021: {lights out, exponential}

Binary Analysis

  • Binary Loaders
  • Dynamic Instrumentation
  • Dynamic Taint Analysis

Code coverage

Download and run DynamoRIO on your binary:

$ ./drrun -t drcov -dump_text -- ~/Desktop/ctf/dragon21/runofthemill

Install Dragon Dance:

You will need to follow build instructions from the README since the last release is not compatible with the latest version of Ghidra.


How to be a full-stack reverse engineer [1]

Year 1:

  • Reversing by Eldad Eilam
  • Learn assembly:
    • Hand decompile
    • Floating point
    • Vector code
  • Reverse a game:
    • 3D game, late 90s to mid-2000s, custom engine.
    • Reverse data archive format and write an unpacker.
    • Reverse model format and write a renderer.
  • Compilers by Aho et al.
  • Write a source-to-source compiler. (Scheme to Python)
  • Consider making your own source language. (not that hard)
  • Write an assembler. (not x86: pick mips, 32-bit ARM, CIL)

Year 2:

  • Write compiler to assembly. (subset of C)
  • Reverse Compilation Techniques by Cristina Cifuentes
  • Write a bytecode decompiler. (Dalvik or CIL)
    • Start with go-to based flows.
    • Reconstruct flow based on graph.
    • Transform to SSA for opt and clean.
  • Write a machine code decompiler. (ARM to pseudo-C)
  • Read the osdev wiki.
  • Write a toy kernel.
    • C x86 protected.
    • Text, input, basic graphics.
  • Read the osdev wiki.
  • Rewrite your kernel. (in rust)
  • Write a microkernel. (L4)

Year 3:

  • Write an interpreting emulator. (NES, SNES, Gameboy, PS)
  • Write a recompiling emulator.
  • Write an emulator for a black box platform.