Abstraction
Abstraction is good, but know the reality.
I took these notes from the book [ Computer Systems: A Programmer's Perspective ].
Compilation Process [ from Think-OS ]
Pre-processing
- C is one of several languages that include preprocessing directives that take effect before the program is compiled.
Parsing
- The compiler reads the source code and builds an internal representation of the program, called an abstract syntax tree. Errors detected during this step are generally syntax errors.
Static checking
- The compiler checks whether variables and values have the right type, whether functions are called with the right number and type of arguments, etc.
Code generation
- The compiler reads the internal representation of the program and generates machine code or byte code.
Linking
- If the program uses values and functions defined in a library, the compiler has to find the appropriate library and include the required code.
Optimization
- The compiler can transform the program to generate code that runs faster or uses less space.
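A minimal sketch of the pre-processing stage, using two made-up macros (BUFFER_SIZE and SQUARE are hypothetical names, not from the book): the preprocessor substitutes their text before parsing begins, so the compiler itself never sees those names. `gcc -E file.c` prints a file as it looks after this stage.

```c
/* Object-like macro: every BUFFER_SIZE below is replaced by the literal 64. */
#define BUFFER_SIZE 64
/* Function-like macro: pure text substitution, hence the parentheses. */
#define SQUARE(x) ((x) * (x))

/* After preprocessing, the body reads: return ((n + 1) * (n + 1)); */
int square_of_next(int n) {
    return SQUARE(n + 1);
}

/* After preprocessing, the declaration reads: char buf[64]; */
int buffer_size(void) {
    char buf[BUFFER_SIZE];
    return (int)sizeof buf;
}
```

The extra parentheses in SQUARE matter: written as `x * x`, SQUARE(n + 1) would expand to `n + 1 * n + 1`, which is 2n + 1 rather than (n + 1) * (n + 1).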
Code Generation
Source text ( C ) -> Compiler -> Assembly Code -> Assembler -> Object Code -> Linker -> Executable
Object Code
luna@luna-LOL:~/Desktop/pwn$ gcc -c -o hello.o hello.c
luna@luna-LOL:~/Desktop/pwn$ nm hello.o
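00000000 T main
U puts
(T marks main as defined in this object's text section; U marks puts as undefined, to be resolved by the linker. The source calls printf, but GCC replaced it with puts, because printf with a constant string that ends in a newline and contains no format specifiers does the same job.)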
- Object files can in turn be linked to form an executable file or a library file; object code has to end up in one of these files before it can be used.
- Machine code is a computer program written in machine language instructions that can be executed directly by a computer's central processing unit (CPU). Each instruction causes the CPU to perform a very specific task, such as a load, a store, a jump, or an arithmetic logic unit (ALU) operation on one or more units of data in the CPU's registers or memory.
If you want a visual explanation of how machine code and the CPU work, watch this short video from Crash Course: [ CPU & Instructions ]
Assembly Code
luna@luna-LOL:~/Desktop/pwn$ gcc -S -o hello.s hello.c
luna@luna-LOL:~/Desktop/pwn$ cat hello.s
.file "hello.c"
.section .rodata
.LC0:
.string "Hello World"
.text
.globl main
.type main, @function
main:
.LFB0:
.cfi_startproc
leal 4(%esp), %ecx
.cfi_def_cfa 1, 0
andl $-16, %esp
pushl -4(%ecx)
pushl %ebp
.cfi_escape 0x10,0x5,0x2,0x75,0
movl %esp, %ebp
pushl %ecx
.cfi_escape 0xf,0x3,0x75,0x7c,0x6
subl $4, %esp
subl $12, %esp
pushl $.LC0
call puts
addl $16, %esp
movl $0, %eax
movl -4(%ebp), %ecx
.cfi_def_cfa 1, 0
leave
.cfi_restore 5
leal -4(%ecx), %esp
.cfi_def_cfa 4, 4
ret
.cfi_endproc
.LFE0:
.size main, .-main
.ident "GCC: (Ubuntu 5.4.0-6ubuntu1~16.04.12) 5.4.0 20160609"
.section .note.GNU-stack,"",@progbits
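In the listing above, `andl $-16, %esp` rounds the stack pointer down to a multiple of 16, presumably to give main's frame the 16-byte stack alignment GCC assumes on i386 Linux. The C sketch below performs the same bit operation on a plain number (align_down_16 is a hypothetical name, and the sample address in the note is made up).

```c
#include <stdint.h>

/* Same operation as `andl $-16, %esp`: -16 is 0xFFFFFFF0 in 32-bit two's
   complement, so the AND clears the low 4 bits and rounds the address down
   to a multiple of 16. */
uint32_t align_down_16(uint32_t esp) {
    return esp & (uint32_t)-16;
}
```

For example, align_down_16(0xbffff3ac) gives 0xbffff3a0. The original esp is not lost: `leal 4(%esp), %ecx` saved it (plus 4) in ecx beforehand, and `leal -4(%ecx), %esp` restores it just before ret.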
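This is the assembly code (32-bit x86, AT&T syntax) generated by the compiler: gcc -S stops after compilation, before the assembler runs.
Let's write some basic assembly code of our own to understand how it works. The lessons below use NASM's Intel syntax for 32-bit Linux.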
Before you write
System Calls
- The interface between an application and the kernel [ Ref ]
Interrupts
- An event that alters the normal execution flow of a program; it can be generated by hardware devices or by the CPU itself ( int 0x80 is the system call interrupt on 32-bit Linux )
Registers
Stack
Sections
[ ELF Intro ]
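To see that a system call (the first item in the list above) is just a numbered request to the kernel, this sketch uses glibc's generic syscall() function to invoke getpid by its number instead of calling the getpid() wrapper. It assumes Linux with glibc; raw_getpid is a hypothetical name. syscall() uses the entry mechanism of the architecture it was built for (int 0x80 is the legacy 32-bit one), so the numbers and mechanism differ between 32-bit and 64-bit x86.

```c
#define _GNU_SOURCE          /* exposes syscall() in <unistd.h> on glibc */
#include <sys/syscall.h>     /* SYS_getpid and the other call numbers */
#include <unistd.h>

/* Ask the kernel for our process ID by system-call number. */
long raw_getpid(void) {
    return syscall(SYS_getpid);
}
```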
Assemble the code into an object file, then link it into an executable
nasm -f elf32 -o filename.o filename.asm
ld -m elf_i386 -o filename filename.o
Registers [ Ref ]
#General registers
EAX EBX ECX EDX
#Segment registers
CS DS ES FS GS SS
#Index and pointers
ESI EDI EBP EIP ESP
#Indicator
EFLAGS
32 bits : EAX EBX ECX EDX
16 bits : AX BX CX DX
8 bits : AH AL BH BL CH CL DH DL
H = high 8 bits and L = low 8 bits of the 16-bit register (AX is the low 16 bits of EAX)
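The sketch below shows how the smaller register names are views of the same 32 bits, using plain C bit masks on a value standing in for EAX (the function names are hypothetical). There is no separate name for the upper 16 bits of EAX.

```c
#include <stdint.h>

/* How the 16- and 8-bit names map onto a 32-bit EAX value. */
uint16_t reg_ax(uint32_t eax) { return (uint16_t)(eax & 0xFFFFu); }      /* bits 0-15 */
uint8_t  reg_ah(uint32_t eax) { return (uint8_t)((eax >> 8) & 0xFFu); }  /* bits 8-15 */
uint8_t  reg_al(uint32_t eax) { return (uint8_t)(eax & 0xFFu); }         /* bits 0-7  */

/* Writing AL (e.g. `mov al,0x41`) replaces only bits 0-7; the rest of EAX is kept. */
uint32_t write_al(uint32_t eax, uint8_t al) {
    return (eax & 0xFFFFFF00u) | al;
}
```

With EAX = 0x12345678, AX is 0x5678, AH is 0x56 and AL is 0x78.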
General Registers
EAX,AX,AH,AL : Called the Accumulator register.
It is used for I/O port access, arithmetic, interrupt calls,
etc...
EBX,BX,BH,BL : Called the Base register
It is used as a base pointer for memory access
Gets some interrupt return values
ECX,CX,CH,CL : Called the Counter register
It is used as a loop counter and for shifts
Gets some interrupt values
EDX,DX,DH,DL : Called the Data register
It is used for I/O port access, arithmetic, some interrupt
calls.
Segment Registers
CS : Holds the Code segment in which your program runs.
Changing its value might make the computer hang.
DS : Holds the Data segment that your program accesses.
Changing its value might give erroneous data.
ES,FS,GS : These are extra segment registers available for
far pointer addressing like video memory and such.
SS : Holds the Stack segment your program uses.
Sometimes has the same value as DS.
Changing its value can give unpredictable results,
mostly data related.
Pointer Registers
ES:EDI EDI DI : Destination index register
Used for string, memory array copying and setting and
for far pointer addressing with ES
DS:ESI ESI SI : Source index register
Used for string and memory array copying
SS:EBP EBP BP : Stack base pointer register
Holds the base address of the current stack frame
SS:ESP ESP SP : Stack pointer register
Holds the top address of the stack
CS:EIP EIP IP : Instruction pointer
Holds the offset of the next instruction
It cannot be read or written directly with mov; jmp, call and ret change it
EFLAGS registers
Bit     Label   Description
---------------------------
0       CF      Carry flag
2       PF      Parity flag
4       AF      Auxiliary carry flag
6       ZF      Zero flag
7       SF      Sign flag
8       TF      Trap flag
9       IF      Interrupt enable flag
10      DF      Direction flag
11      OF      Overflow flag
12-13   IOPL    I/O privilege level
14      NT      Nested task flag
16      RF      Resume flag
17      VM      Virtual 8086 mode flag
18      AC      Alignment check flag (486+)
19      VIF     Virtual interrupt flag
20      VIP     Virtual interrupt pending flag
21      ID      ID flag
Those that are not listed are reserved by Intel.
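A small C sketch of reading flags out of an EFLAGS value, using the bit positions from the table above. The enum and function names are hypothetical, and 0x246 in the note is just a sample value, not one captured from a real program.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions from the EFLAGS table. */
enum eflags_bit {
    EFLAGS_CF = 0, EFLAGS_PF = 2, EFLAGS_AF = 4, EFLAGS_ZF = 6,
    EFLAGS_SF = 7, EFLAGS_TF = 8, EFLAGS_IF = 9, EFLAGS_DF = 10,
    EFLAGS_OF = 11
};

/* True if the given single-bit flag is set in an EFLAGS value. */
bool flag_is_set(uint32_t eflags, enum eflags_bit bit) {
    return (eflags >> bit) & 1u;
}

/* IOPL is a 2-bit field (bits 12-13), not a single flag. */
unsigned iopl(uint32_t eflags) {
    return (eflags >> 12) & 3u;
}
```

0x246 has bits 1, 2, 6 and 9 set: bit 1 is a reserved bit that always reads as 1, and the others are PF, ZF and IF.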
We also need some instructions [ Cheatsheet ]
Lesson 1 - exit program ( understand syscalls )
l1.asm
mov eax,1    ; eax = system call number (1 = exit)
mov ebx,2    ; ebx = first argument (the exit status)
int 0x80     ; interrupt: hand control to the kernel to run the system call
Testing l1.asm ( echo $? prints the exit status of the last command )
luna@luna-LOL:~/Desktop/pwn/86_ASM$ nasm -f elf32 -o l1.o l1.asm
luna@luna-LOL:~/Desktop/pwn/86_ASM$ ld -o l1 l1.o
ld: warning: cannot find entry symbol _start; defaulting to 0000000008048060
luna@luna-LOL:~/Desktop/pwn/86_ASM$ ./l1
luna@luna-LOL:~/Desktop/pwn/86_ASM$ echo $?
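2
(Note)
* The exit status is 2, the value we put in ebx
* The ld warning appears because l1.asm defines no _start label; ld falls back to the start of the .text section, so the program still runs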
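The same exit system call can be issued from C. This sketch (Linux/glibc only; child_exit_status is a hypothetical helper) forks a child that calls glibc's syscall() with SYS_exit, then reads the status the parent sees, which is what `echo $?` prints in the shell. SYS_exit is 1 on 32-bit x86, matching `mov eax,1`, but 60 on x86-64, which is why the code uses the macro instead of a literal number.

```c
#define _GNU_SOURCE          /* exposes syscall() in <unistd.h> on glibc */
#include <sys/syscall.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run a child that exits through the raw exit system call with `status`
   and return the exit status the parent observes, or -1 on error. */
int child_exit_status(int status) {
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* Child: the same request as mov eax,1 / mov ebx,status / int 0x80. */
        syscall(SYS_exit, (long)status);
        _exit(127);          /* not reached */
    }
    int wstatus;
    if (waitpid(pid, &wstatus, 0) < 0 || !WIFEXITED(wstatus))
        return -1;
    return WEXITSTATUS(wstatus);
}
```

Only the low 8 bits of the status survive, so child_exit_status(263) returns 7.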
Lesson 2 - Hello world program ( understand the write() syscall and strings )
l2.asm
section .data
msg db "Hello, world!", 0x0a   ; the bytes "Hello, world!" followed by a newline (0x0a), labeled msg
len equ $ - msg                ; $ is the current address, so $ - msg is the length of msg (14)
section .text
global _start
_start:
mov eax,4                      ; write system call number (4)
mov ebx,1                      ; first argument: file descriptor 1 (stdout)
mov ecx,msg                    ; second argument: address of the message
mov edx,len                    ; third argument: length of the message
int 0x80                       ; call the interrupt
mov eax,1                      ; exit system call
mov ebx,0                      ; exit status 0
int 0x80
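A C sketch of the same write call, assuming Linux with glibc's syscall(). The string and the length computation mirror l2.asm: `sizeof` plays the role of `$ - msg`, minus one because a C string literal carries a trailing '\0' that the assembly version does not have. hello_msg and write_hello are hypothetical names.

```c
#define _GNU_SOURCE          /* exposes syscall() in <unistd.h> on glibc */
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Same bytes as: msg db "Hello, world!", 0x0a */
static const char hello_msg[] = "Hello, world!\n";
/* Same idea as: len equ $ - msg (sizeof also counts the '\0', so subtract 1) */
enum { MSG_LEN = sizeof hello_msg - 1 };

/* Issue the write system call with fd, buffer address and length, in the
   same order as ebx, ecx and edx in l2.asm. Returns bytes written or -1. */
long write_hello(int fd) {
    return syscall(SYS_write, (long)fd, hello_msg, (size_t)MSG_LEN);
}
```

write_hello(1) prints the greeting on stdout and returns 14.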
Lesson 3 - Comparison program ( conditional jump )
l3.asm
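global _start
_start:
mov ecx,99     ; ecx = 99
mov ebx,42     ; ebx = 42 (the exit status)
mov eax,1      ; eax = exit system call
cmp ecx,100    ; compare ecx (99) with 100
jl skip        ; jump to skip if ecx < 100 (signed)
skip:
int 0x80       ; system call interrupt
(Note)
* 99 < 100, so the jump is taken and the program exits with status 42
* Here skip: directly follows jl, so any instruction placed between them would run only when ecx >= 100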
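The jl in l3.asm does not compare numbers by itself: cmp subtracts 100 from ecx, discards the result and keeps only the flags, and jl then jumps when the sign flag differs from the overflow flag. The sketch below (jl_taken is a hypothetical name) models exactly that flag logic for 32-bit signed operands.

```c
#include <stdbool.h>
#include <stdint.h>

/* Model of `cmp a, b` followed by `jl`. */
bool jl_taken(int32_t a, int32_t b) {
    uint32_t ua = (uint32_t)a, ub = (uint32_t)b;
    uint32_t res = ua - ub;                           /* wraps like the CPU's subtraction */
    bool sf = (res >> 31) & 1u;                       /* sign flag: top bit of the result */
    bool of = (((ua ^ ub) & (ua ^ res)) >> 31) & 1u;  /* signed overflow in a - b */
    return sf != of;                                  /* jl: "less than" for signed values */
}
```

jl_taken(99, 100) is true, matching the branch in l3.asm. The OF term keeps the answer correct when the subtraction overflows, e.g. INT32_MIN compared with 1.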
Lesson 4 - Loop Program
l4.asm
global _start
_start:
mov ebx,1      ; ebx = 1
mov ecx,6      ; ecx = 6 (ecx is the counter register)
label:
add ebx,ebx    ; ebx = ebx + ebx
dec ecx        ; ecx = ecx - 1
cmp ecx,0      ; compare ecx with 0
jg label       ; jump back to label while ecx > 0
mov eax,1      ; exit system call; the exit status is ebx
int 0x80
(Note)
* The loop repeats until ecx reaches 0, because jg only jumps back while ecx is greater than 0
* first round ebx=2 , ecx=5
* second round ebx=4 , ecx=4
* third round ebx=8 , ecx=3
* fourth round ebx=16 , ecx=2
* fifth round ebx=32 , ecx=1
* sixth round ebx=64 , ecx=0
* When ecx is no longer greater than 0, the program calls the exit interrupt with ebx = 64 as the exit status
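The loop in l4.asm is a do-while: the body runs before ecx is tested. A C rendering that keeps the register names as variable names (doubling_loop is a hypothetical name):

```c
/* Mirrors l4.asm; count plays the role of the initial ecx (6 in the lesson). */
int doubling_loop(int count) {
    int ebx = 1;             /* mov ebx,1 */
    int ecx = count;         /* mov ecx,6 */
    do {                     /* label: (the body runs before the first test) */
        ebx += ebx;          /* add ebx,ebx */
        ecx--;               /* dec ecx */
    } while (ecx > 0);       /* cmp ecx,0 / jg label */
    return ebx;              /* the exit status after mov eax,1 / int 0x80 */
}
```

doubling_loop(6) returns 64, as traced in the note. With a count of 0 the body still runs once, just as in the assembly, because the test comes after the body.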
Lesson 5 - Data Movement Program
l5.asm
addr db "yellow"        ; define the bytes "yellow" at label addr
mov [addr],byte 'H'     ; overwrite the first byte of addr with 'H'
mov [addr+5],byte '!'   ; overwrite the sixth byte (offset 5) with '!', so addr now reads "Hello!"
(Note)
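* After changing the bytes, the program writes addr to stdout and exits (as in Lesson 2)
* addr must be in a writable section such as .data; writing into .text, which is mapped read-only, crashes with a segmentation fault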
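A C sketch of the same byte patching (patch_yellow is a hypothetical helper). One difference: the assembly string has no terminator, while the C copy keeps a '\0' so it can be printed as a string.

```c
#include <string.h>

/* Copy "yellow" into buf and patch it the way l5.asm patches addr.
   buf must hold at least 7 bytes (6 letters + the '\0' terminator). */
void patch_yellow(char *buf) {
    memcpy(buf, "yellow", 7);   /* addr db "yellow" (plus '\0' for C) */
    buf[0] = 'H';               /* mov [addr],byte 'H'   */
    buf[5] = '!';               /* mov [addr+5],byte '!' */
}
```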
Lesson 6 - Stack Usage Program
l6.asm
Lesson 7 - Function Call
l7.asm
global _start
_start:
call func      ; push the return address (the next instruction) onto the stack, then jump to func
mov eax,1      ; eax = exit system call
int 0x80       ; call the interrupt
func:
mov ebx,42     ; ebx = 42
ret            ; pop the address on top of the stack into eip
(Note)
* When func is called, the address of the next instruction (mov eax,1) is pushed onto the stack
* ret pops that address from the top of the stack into eip
* So execution returns to mov eax,1, and the program exits with status 42
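To make the call/ret mechanics concrete, this toy model treats the stack as an array and instruction addresses as plain numbers. struct call_cpu, do_call and do_ret are hypothetical names, and the addresses in the note are made up.

```c
#include <stdint.h>

/* Toy machine: a 16-slot stack that grows toward index 0, plus eip. */
struct call_cpu {
    uint32_t stack[16];
    int esp;            /* index of the top element */
    uint32_t eip;
};

/* call: push the return address, then jump to the target. */
void do_call(struct call_cpu *c, uint32_t target, uint32_t next_insn) {
    c->stack[--c->esp] = next_insn;
    c->eip = target;
}

/* ret: pop the address on top of the stack into eip. */
void do_ret(struct call_cpu *c) {
    c->eip = c->stack[c->esp++];
}
```

Calling with target 100 and next instruction 1 leaves 1 on top of the stack and eip at 100; do_ret then pops 1 back into eip, like returning to `mov eax,1`.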
Lesson 8 - Function Prologue Program
l8.asm
func:
push ebp       ; save the caller's ebp on top of the stack
mov ebp,esp    ; ebp = esp: the base of the new stack frame
sub esp,2      ; reserve 2 bytes below ebp (the stack grows downward)
....
mov esp,ebp    ; esp = ebp: release the reserved bytes
pop ebp        ; restore the caller's saved ebp
ret
(Note)
* The function prologue reserves 2 bytes of stack space for func
* When the function finishes, mov esp,ebp discards those 2 bytes and pop ebp restores the saved ebp (the leave instruction does both steps); ret then returns to the caller
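A toy model of the prologue and epilogue on a small byte array (struct frame_cpu, prologue and epilogue are hypothetical names; ret is not modeled). It shows the property the note describes: whatever the prologue reserves, `mov esp,ebp` plus `pop ebp` put both registers back where they were.

```c
#include <stdint.h>
#include <string.h>

/* Toy byte-addressed stack: esp and ebp are offsets into mem, and the
   stack grows toward lower offsets like the real one. */
struct frame_cpu {
    uint8_t mem[64];
    uint32_t esp, ebp;
};

static void push32(struct frame_cpu *c, uint32_t v) {
    c->esp -= 4;
    memcpy(&c->mem[c->esp], &v, 4);
}

static uint32_t pop32(struct frame_cpu *c) {
    uint32_t v;
    memcpy(&v, &c->mem[c->esp], 4);
    c->esp += 4;
    return v;
}

void prologue(struct frame_cpu *c, uint32_t locals) {
    push32(c, c->ebp);   /* push ebp */
    c->ebp = c->esp;     /* mov ebp,esp */
    c->esp -= locals;    /* sub esp,locals (2 in l8.asm) */
}

void epilogue(struct frame_cpu *c) {
    c->esp = c->ebp;     /* mov esp,ebp: drop the locals */
    c->ebp = pop32(c);   /* pop ebp: restore the caller's ebp */
}
```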
Lesson 9 - Push onto the Stack and Return a Value
l9.asm
global _start
_start:
push 21          ; push the argument 21 onto the stack
call times2      ; call times2 (pushes the return address)
mov ebx,eax      ; ebx = the return value in eax (42)
mov eax,1
int 0x80
times2:
push ebp         ; save the caller's ebp
mov ebp,esp      ; ebp = esp
mov eax,[ebp+8]  ; eax = 21: [ebp] = saved ebp, [ebp+4] = return address, [ebp+8] = argument
add eax,eax      ; eax = 21 + 21 = 42
mov esp,ebp      ; restore esp
pop ebp          ; restore the caller's ebp
ret              ; pop the return address into eip
(Note)
* We push 21 onto the stack and call times2
* times2 runs the function prologue, after which 21 sits at [ebp+8]
* It moves 21 into eax and computes 21+21, so eax becomes 42
* The stack frame is torn down and execution returns to _start
* eax is still 42, and that value is moved into ebx
* On exit the exit status is 42, because ebx = 42
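The sketch below replays l9.asm on a toy 64-byte stack to show why the argument is at [ebp+8]: between it and ebp sit the return address pushed by call and the saved ebp pushed by the prologue, 4 bytes each. The names and the return address 0xCAFE are made up; the epilogue is omitted because it does not change eax.

```c
#include <stdint.h>
#include <string.h>

/* Toy model of l9.asm's stack: esp and ebp are offsets into mem. */
struct l9_cpu {
    uint8_t mem[64];
    uint32_t esp, ebp, eax;
};

static void l9_push(struct l9_cpu *c, uint32_t v) {
    c->esp -= 4;
    memcpy(&c->mem[c->esp], &v, 4);
}

static uint32_t l9_load(const struct l9_cpu *c, uint32_t addr) {
    uint32_t v;
    memcpy(&v, &c->mem[addr], 4);
    return v;
}

/* Replays _start and times2 up to add eax,eax and returns eax. */
uint32_t run_times2(uint32_t arg) {
    struct l9_cpu c = { .esp = 64, .ebp = 0 };
    l9_push(&c, arg);                 /* push 21 */
    l9_push(&c, 0xCAFE);              /* call times2: push a made-up return address */
    l9_push(&c, c.ebp);               /* push ebp */
    c.ebp = c.esp;                    /* mov ebp,esp */
    /* Now [ebp] = saved ebp, [ebp+4] = return address, [ebp+8] = arg. */
    c.eax = l9_load(&c, c.ebp + 8);   /* mov eax,[ebp+8] */
    c.eax += c.eax;                   /* add eax,eax */
    return c.eax;
}
```

run_times2(21) returns 42, the value l9.asm hands to ebx as the exit status.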
Reference