# Lecture 05: ELF and Program Anatomy

**Executable and Linkable Format (ELF)**: common format for executable and library files on modern *nix systems

All ELF files begin with a 4-byte 'magic number': 0x7F followed by the ASCII characters 'E', 'L', 'F'
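A minimal sketch of checking for the magic number. The helper name ```is_elf``` is made up for illustration, and the byte strings are hand-written samples rather than real files:

```python
ELF_MAGIC = b"\x7fELF"  # 0x7F followed by ASCII 'E', 'L', 'F'


def is_elf(data: bytes) -> bool:
    """Return True if data starts with the 4-byte ELF magic number."""
    return data[:4] == ELF_MAGIC


print(is_elf(b"\x7fELF\x02\x01\x01"))  # True
print(is_elf(b"#!/bin/sh\n"))          # False
```

On a real system the same check could be applied to the first 4 bytes read from a file opened in binary mode.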

**Sections**: organize binary into logical sections used by the linker and loaders. Some examples are: 

- **.bss**: uninitialized data (global and static variables, zero-filled at load time)
- **.text**: code 

**Segments**: define the parts that should be loaded into memory and how

The ```readelf``` command examines ELF file data. The file header (```readelf -h```) reports, among other fields:

- little endian: the least significant byte is stored first (at the lowest address)
- 2's complement: the encoding used for signed (negative) integers
- **entrypoint address**: where program will start running
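As a rough illustration of where these fields live, the sketch below reads the class, byte order, and entry point from an ELF header. It assumes the standard header layout (class at byte 4, data encoding at byte 5, entry point at offset 24). The function name and the synthetic header bytes are made up for the example:

```python
import struct


def parse_elf_header(data: bytes) -> dict:
    """Extract word size, endianness and entry point from an ELF header."""
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    ei_class, ei_data = data[4], data[5]
    bits = {1: 32, 2: 64}[ei_class]          # EI_CLASS: 1 = 32-bit, 2 = 64-bit
    order = {1: "<", 2: ">"}[ei_data]        # EI_DATA: 1 = little, 2 = big endian
    fmt = order + ("I" if bits == 32 else "Q")
    (entry,) = struct.unpack_from(fmt, data, 24)  # e_entry follows type/machine/version
    return {
        "bits": bits,
        "endianness": "little" if ei_data == 1 else "big",
        "entry": entry,
    }


# Synthetic 64-bit little-endian header (not taken from a real binary):
# 16-byte e_ident, then e_type, e_machine, e_version, e_entry.
header = (
    b"\x7fELF" + bytes([2, 1, 1]) + bytes(9)
    + struct.pack("<HHIQ", 2, 0x3E, 1, 0x401000)
)
info = parse_elf_header(header)
print(info["bits"], info["endianness"], hex(info["entry"]))  # 64 little 0x401000
```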

Process: instance of a running program. Provides each program with 2 key abstractions:

- logical control flow
- private address space

Virtual memory decouples address space from physical memory:

- can be larger than physical memory
- not all pages need to be in physical memory to run a program (pages can be swapped out or mapped/unmapped on demand)
- kernel is always mapped

The stack grows from **top to bottom**, i.e. from higher addresses toward lower addresses

The heap holds dynamically allocated data, managed via ```malloc()```, ```calloc()```, ```realloc()```, ```free()```

If the most significant bit is 1, then an address is kernel space. Else it is user/application space
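The rule above can be sketched as a one-line check. The function name and the sample addresses are illustrative only, and the default of 64-bit addresses is an assumption:

```python
def is_kernel_address(addr: int, bits: int = 64) -> bool:
    """Return True if the most significant bit of a bits-wide address is 1."""
    return ((addr >> (bits - 1)) & 1) == 1


print(is_kernel_address(0xFFFFFFFF81000000))  # True
print(is_kernel_address(0x00007FFD12345000))  # False
```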

The ```/proc``` pseudo-filesystem exposes different aspects of running programs (e.g. the memory layout of a running process in ```/proc/<pid>/maps```)

- global data in ```/bin/cat``` is mapped with rw- (read/write) permissions
- constant data in ```/bin/cat``` is mapped with r-- (read-only) permissions
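A small sketch of reading one line of a process's memory map. The sample lines are invented for illustration (addresses, offsets, and inode numbers are not from a real process), and ```parse_maps_line``` is a hypothetical helper:

```python
def parse_maps_line(line: str) -> dict:
    """Split a maps line: address range, perms, offset, dev, inode, path."""
    fields = line.split(maxsplit=5)
    start, end = (int(part, 16) for part in fields[0].split("-"))
    return {
        "start": start,
        "end": end,
        "perms": fields[1],
        "path": fields[5].strip() if len(fields) > 5 else "",
    }


# Invented sample lines in the usual maps format.
samples = [
    "55d0c8a38000-55d0c8a3a000 r--p 00006000 08:01 1234   /bin/cat",
    "55d0c8a3b000-55d0c8a3c000 rw-p 00008000 08:01 1234   /bin/cat",
]
for entry in map(parse_maps_line, samples):
    print(entry["perms"], hex(entry["end"] - entry["start"]), entry["path"])
```

On Linux the same helper could be applied to each line of ```/proc/self/maps```.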

## (Dis)assembly

Intel x86 processors are **Complex Instruction Set Computers (CISCs)**:

- instructions may have variable sizes and formats
- general purpose registers (EAX, EDI, ESP etc.)
- instruction pointer/program counter (IP/PC)
- important registers:
  - ESP: stack pointer
  - EBP: frame pointer

There are 8 general-purpose registers in 32-bit mode and 16 in 64-bit mode

Assembly instructions:

- **push**: push word to stack
- **mov**: move word
- **imul**: signed multiply
- **add**: add

These base instructions are combined with a suffix naming the operand size (b = byte, w = word, l = long, q = quad word). The operands must match that size, e.g.:

- pushq: push quad word to stack
- imull: signed multiply long
- etc.

Memory operands: parentheses indicate a memory operand. Each memory address can be defined as:

- Base + Index * Scale + Displacement
- AT&T syntax: disp(base, index, scale)
  - disp, index, scale are optional
  - ex: -20(%rdx,%rdi,4) = rdx - 20 + (rdi*4)
- ex: movl %edi, -20(%rbp): store %edi at the address 20 bytes below %rbp
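The address formula and the store example can be modelled roughly as follows. The register values and the 64-byte "memory" are toy assumptions, not real machine state:

```python
import struct


def effective_address(disp=0, base=0, index=0, scale=1):
    """Compute Base + Index * Scale + Displacement for disp(base,index,scale)."""
    if scale not in (1, 2, 4, 8):
        raise ValueError("scale must be 1, 2, 4 or 8")
    return base + index * scale + disp


# -20(%rdx,%rdi,4) with assumed values rdx = 0x1000, rdi = 3
print(hex(effective_address(disp=-20, base=0x1000, index=3, scale=4)))  # 0xff8

# movl %edi, -20(%rbp): store the 4-byte value of %edi 20 bytes below %rbp
memory = bytearray(64)  # toy memory, address 0 is the first byte
rbp, edi = 48, 0x2A     # assumed register contents
addr = effective_address(disp=-20, base=rbp)
struct.pack_into("<I", memory, addr, edi)  # little-endian 32-bit store
print(addr, memory[addr:addr + 4].hex())   # 28 2a000000
```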

**Constants** (aka immediates) are defined using ```$``` and are in decimal unless:

- 0x prefix: hexadecimal
- 0 prefix: octal
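A sketch of how such immediates could be interpreted. ```parse_immediate``` is a hypothetical helper, not part of any assembler:

```python
def parse_immediate(token: str) -> int:
    """Interpret an AT&T-style immediate such as $42, $0x2a, $052 or $-20."""
    if not token.startswith("$"):
        raise ValueError("immediates start with $")
    text = token[1:]
    sign = 1
    if text.startswith("-"):
        sign, text = -1, text[1:]
    if text.lower().startswith("0x"):
        value = int(text, 16)          # 0x prefix: hexadecimal
    elif len(text) > 1 and text.startswith("0"):
        value = int(text, 8)           # leading 0: octal
    else:
        value = int(text, 10)          # otherwise decimal
    return sign * value


for tok in ("$42", "$0x2a", "$052", "$-20"):
    print(tok, parse_immediate(tok))
```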

Endianness (big/little) determines byte ordering in memory:

- big: least significant byte is last (at the highest address)
- little: least significant byte is first (at the lowest address)
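The two orderings can be seen by serializing the same 32-bit value both ways (the value is an arbitrary example):

```python
value = 0x12345678

little = value.to_bytes(4, byteorder="little")  # least significant byte first
big = value.to_bytes(4, byteorder="big")        # least significant byte last

print(little.hex())  # 78563412
print(big.hex())     # 12345678

# Decoding with the matching byte order recovers the original value.
decoded = int.from_bytes(little, byteorder="little")
print(hex(decoded))  # 0x12345678
```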

Load effective address:

- leaq src,dest (compute an address and store it in dest, without accessing memory)
  - src is an address mode expression
  - dest is set to address denoted by the expression
- computing addresses without a memory reference:
  - e.g. translate p = &x[i];
- computing arithmetic expressions of the form x + k*y
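Both uses can be sketched with a small model of ```leaq```. The array base address, index, and element size are assumed values, and the instruction forms in the comments are typical examples rather than guaranteed compiler output:

```python
MASK64 = (1 << 64) - 1


def lea(disp=0, base=0, index=0, scale=1):
    """Model leaq disp(base,index,scale), dest: return the address, read no memory."""
    return (base + index * scale + disp) & MASK64


# p = &x[i] for an array of 4-byte ints, e.g. leaq (%rdi,%rsi,4), %rax
x_base, i = 0x7000, 5          # assumed array address and index
p = lea(base=x_base, index=i, scale=4)
print(hex(p))                  # 0x7014

# x + 2*x = 3*x, e.g. leaq (%rdi,%rdi,2), %rax
x = 7
triple = lea(base=x, index=x, scale=2)
print(triple)                  # 21
```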

## Control Flow Instructions

**Jumps**: sets the PC and redirects the control flow

- ```jmp rel (jmp 0x6)```: rel can be an 8, 16, or 32-bit immediate
  - jump relative to the next instruction
  - target = next instruction address + rel
- ```jmp *%reg (jmp *%rax)```: absolute jump to the address held in the register
- ```jmp *(%reg) (jmp *(%rax))```: absolute jump to the address stored in memory at the location the register points to
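The target computation for a relative jump can be sketched as below. The instruction address and length are assumed example values (a short jump with an 8-bit offset is taken to be 2 bytes long), and the function name is made up:

```python
MASK64 = (1 << 64) - 1


def relative_jump_target(insn_addr: int, insn_len: int, rel: int, rel_bits: int = 8) -> int:
    """target = address of the next instruction + sign-extended rel."""
    rel &= (1 << rel_bits) - 1
    if rel & (1 << (rel_bits - 1)):   # sign bit set: negative offset
        rel -= 1 << rel_bits
    next_insn = insn_addr + insn_len
    return (next_insn + rel) & MASK64


# Forward jump: a 2-byte jmp at 0x401000 with rel = 0x6
print(hex(relative_jump_target(0x401000, 2, 0x06)))  # 0x401008

# Backward jump: rel = 0xFE is -2, so the jump lands on itself
print(hex(relative_jump_target(0x401000, 2, 0xFE)))  # 0x401000
```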

**Conditional jumps**: sets the PC and redirects the control flow only if the condition is true

- ```jle```: jump if less or equal
- ```jcc rel (jle 0x8)```: if the condition is met, jump to next instruction address + rel
- normally preceded by a compare instruction (```cmp```, e.g. for if/else) or a test instruction (```testq```)
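A rough model of how a compare followed by ```jle``` decides whether to jump. It assumes the usual x86 semantics: ```cmp src, dst``` computes dst - src and sets the zero, sign, and overflow flags, and ```jle``` is taken when ZF is set or SF differs from OF. The function names are illustrative:

```python
def cmp_flags(src: int, dst: int, bits: int = 64):
    """Model cmp src, dst (AT&T order): flags of dst - src, result discarded."""
    mask = (1 << bits) - 1
    sign = 1 << (bits - 1)
    a, b = dst & mask, src & mask
    result = (a - b) & mask
    zf = result == 0
    sf = bool(result & sign)
    # Signed overflow: operands differ in sign and the result's sign differs from a.
    of = bool((a ^ b) & (a ^ result) & sign)
    return zf, sf, of


def jle_taken(src: int, dst: int, bits: int = 64) -> bool:
    """True if jle would jump after cmp src, dst, i.e. dst <= src (signed)."""
    zf, sf, of = cmp_flags(src, dst, bits)
    return zf or (sf != of)


# cmp $5, %rax ; jle target
for rax in (3, 5, 7, -1):
    print(rax, jle_taken(src=5, dst=rax))
```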

**Calls**: calls function at the address

**Returns**: returns from the current function

**nop**: no operation (does nothing)

**sal**: shift arithmetic left (each shift by one bit multiplies by 2)
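The effect of a left shift can be checked with a small model that truncates to the register width (64 bits is assumed here, and the function name is made up):

```python
def sal(value: int, count: int, bits: int = 64) -> int:
    """Shift left by count bits, keeping only the low bits of the result."""
    return (value << count) & ((1 << bits) - 1)


print(sal(5, 1))  # 10, one shift multiplies by 2
print(sal(5, 3))  # 40, three shifts multiply by 2**3 = 8

# Bits shifted past the register width are lost.
print(hex(sal(0x8000000000000001, 1)))  # 0x2
```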

### Memory Usage

Immediate (constant) jumps, conditional jumps, calls etc. use **relative addressing**

Memory operations always use **absolute addressing**

When loading addresses into registers:

- 32-bit systems: use constant
- 64-bit systems: use RIP-relative addressing

### Disassembling

Disassembling is the process of recovering the assembly code of a binary. Tools for this include the ```disas``` command of ```gdb``` and the ```objdump -d``` utility

