CPU, OP_Code & Assembly

Abstraction

Abstraction is good, but know the reality underneath it.

These notes are taken from the book [ Computer Systems: A Programmer's Perspective ].

Compilation Process [ from Think-OS ]

  • Pre-processing - C is one of several languages that include preprocessing directives, which take effect before the program is compiled.
  • Parsing - The compiler reads the source code and builds an internal representation of the program, called an abstract syntax tree. Errors detected during this step are generally syntax errors.
  • Static checking - The compiler checks whether variables and values have the right type, whether functions are called with the right number and type of arguments, etc.
  • Code generation - The compiler reads the internal representation of the program and generates machine code or byte code.
  • Linking - If the program uses values and functions defined in a library, the compiler has to find the appropriate library and include the required code.
  • Optimization - At several points in the process, the compiler can transform the program to generate code that runs faster or uses less space.

Code Generation

Text ( C ) -> Compiler -> Assembly Code -> Assembler -> Object Code -> Linker -> Executable

Object Code

luna@luna-LOL:~/Desktop/pwn$ gcc -c -o hello.o hello.c
luna@luna-LOL:~/Desktop/pwn$ nm hello.o
00000000 T main
         U puts  // hello.c calls printf, but GCC replaced the call with puts

If you want to see how machine code and the CPU work visually, please watch this short video from Crash Course. [ CPU & Instructions ]

Assembly Code

luna@luna-LOL:~/Desktop/pwn$ gcc -S -o hello.s hello.c
luna@luna-LOL:~/Desktop/pwn$ cat hello.s
        .file   "hello.c"
        .section        .rodata
.LC0:
        .string "Hello World"
        .text
        .globl  main
        .type   main, @function
main:
.LFB0:
        .cfi_startproc
        leal    4(%esp), %ecx
        .cfi_def_cfa 1, 0
        andl    $-16, %esp
        pushl   -4(%ecx)
        pushl   %ebp
        .cfi_escape 0x10,0x5,0x2,0x75,0
        movl    %esp, %ebp
        pushl   %ecx
        .cfi_escape 0xf,0x3,0x75,0x7c,0x6
        subl    $4, %esp
        subl    $12, %esp
        pushl   $.LC0
        call    puts
        addl    $16, %esp
        movl    $0, %eax
        movl    -4(%ebp), %ecx
        .cfi_def_cfa 1, 0
        leave
        .cfi_restore 5
        leal    -4(%ecx), %esp
        .cfi_def_cfa 4, 4
        ret
        .cfi_endproc
.LFE0:
        .size   main, .-main
        .ident  "GCC: (Ubuntu 5.4.0-6ubuntu1~16.04.12) 5.4.0 20160609"
        .section        .note.GNU-stack,"",@progbits

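This is the assembly code that the compiler generates ( gcc -S stops before the assembler runs ). GCC prints it in AT&T syntax, while the NASM examples below use Intel syntax.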


x86 Assembly

Let's write some basic assembly code to understand how it works.

Before you write

  • System Calls - The interface between an application and the kernel [ Ref ]
  • Interrupts - An event that alters the normal execution flow of a program; it can be generated by hardware devices or by the CPU itself [ int 0x80 is the system call interrupt on 32-bit Linux ]
  • Registers
  • Stack
  • Sections [ ELF Intro ]

Assemble the code into an object file, then link it into an executable

nasm -f elf32 -o filename.o filename.asm

ld -m elf_i386 -o filename filename.o

Registers [ Ref ]

#General registers
EAX EBX ECX EDX

#Segment registers
CS DS ES FS GS SS

#Index and pointers
ESI EDI EBP EIP ESP

#Indicator
EFLAGS

32 bits : EAX EBX ECX EDX
16 bits : AX BX CX DX
 8 bits : AH AL BH BL CH CL DH DL

H = higher 8 bits of the 16-bit register , L = lower 8 bits

General Registers

EAX,AX,AH,AL : Called the Accumulator register. 
               It is used for I/O port access, arithmetic, interrupt calls,
               etc...

EBX,BX,BH,BL : Called the Base register
               It is used as a base pointer for memory access
               Gets some interrupt return values

ECX,CX,CH,CL : Called the Counter register
               It is used as a loop counter and for shifts
               Gets some interrupt values

EDX,DX,DH,DL : Called the Data register
               It is used for I/O port access, arithmetic, some interrupt 
               calls.

Segment Registers

CS         : Holds the Code segment in which your program runs.
             Changing its value might make the computer hang.

DS         : Holds the Data segment that your program accesses.
             Changing its value might give erroneous data.

ES,FS,GS   : These are extra segment registers available for
             far pointer addressing like video memory and such.

SS         : Holds the Stack segment your program uses.
             Sometimes has the same value as DS.
             Changing its value can give unpredictable results,
             mostly data related.

Pointer Registers

ES:EDI EDI DI : Destination index register
                Used for string, memory array copying and setting and
                for far pointer addressing with ES

DS:ESI ESI SI : Source index register
                Used for string and memory array copying

SS:EBP EBP BP : Stack Base pointer register
                Holds the base address of the current stack frame
                
SS:ESP ESP SP : Stack pointer register
                Holds the top address of the stack

CS:EIP EIP IP : Instruction pointer
                Holds the offset of the next instruction
                It cannot be written directly; only control-flow
                instructions such as jmp, call and ret change it

EFLAGS registers

Bit   Label    Description
---------------------------
0      CF      Carry flag
2      PF      Parity flag
4      AF      Auxiliary carry flag
6      ZF      Zero flag
7      SF      Sign flag
8      TF      Trap flag
9      IF      Interrupt enable flag
10     DF      Direction flag
11     OF      Overflow flag
12-13  IOPL    I/O privilege level
14     NT      Nested task flag
16     RF      Resume flag
17     VM      Virtual 8086 mode flag
18     AC      Alignment check flag (486+)
19     VIF     Virtual interrupt flag
20     VIP     Virtual interrupt pending flag
21     ID      ID flag

Bits that are not listed are reserved by Intel.

We also need some instructions [ Cheatsheet ]

Lesson 1 - exit program ( understand syscalls )
l1.asm

mov eax,1    ; eax = system call number ( 1 = exit )
mov ebx,2    ; ebx = first argument ( the exit status )
int 0x80     ; trigger the system call interrupt

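Testing l1.asm ( echo $? prints the exit status ). The ld warning appears because l1.asm defines no _start label, so the linker falls back to the start of the code.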

luna@luna-LOL:~/Desktop/pwn/86_ASM$ nasm -f elf32 -o l1.o l1.asm
luna@luna-LOL:~/Desktop/pwn/86_ASM$ ld -o l1 l1.o
ld: warning: cannot find entry symbol _start; defaulting to 0000000008048060
luna@luna-LOL:~/Desktop/pwn/86_ASM$ ./l1
luna@luna-LOL:~/Desktop/pwn/86_ASM$ echo $?
2 // exit status

Lesson 2 - helloworld program ( understand write() syscall and strings )
l2.asm

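global _start
section .data
msg db "Hello, world!", 0x0a   ; define the bytes of the string, ending with a newline
len equ $ - msg                ; len = current address - address of msg = length of the string
section .text
_start:
mov eax,4      ; system call number 4 = write
mov ebx,1      ; first argument: file descriptor 1 ( stdout )
mov ecx,msg    ; second argument: address of the message
mov edx,len    ; third argument: number of bytes to write
int 0x80       ; trigger the system call interrupt
mov eax,1      ; system call number 1 = exit
mov ebx,0      ; exit status 0
int 0x80       ; exit, so execution does not run past the end of the code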

Lesson 3 - Comparison program ( conditional jump )
l3.asm

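global _start
section .text
_start:
mov ecx,99    ; set ecx to 99
mov ebx,42    ; set ebx to 42 ( the exit status )
mov eax,1     ; system call number 1 = exit
cmp ecx,100   ; compare ecx ( 99 ) with 100
jl skip       ; jump to skip if ecx is less than 100 ( signed comparison )
              ; instructions placed here are skipped when the jump is taken
skip:
int 0x80      ; trigger the system call interrupt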

Lesson 4 - Loop Program
l4.asm

global _start
section .text
_start:
mov ebx,1     ; set ebx to 1
mov ecx,6     ; set ecx to 6 ( ecx is the counter register )

label:
add ebx,ebx   ; ebx = ebx + ebx
dec ecx       ; ecx = ecx - 1
cmp ecx,0     ; compare ecx with 0
jg label      ; jump back to label while ecx is greater than 0
mov eax,1     ; system call number 1 = exit
int 0x80      ; exit with status ebx
(Note)
* ecx counts down to 0 , and the cmp / jg pair ends the loop when it gets there
* first round ebx=2 , ecx=5
* second round ebx=4 , ecx=4
* third round ebx=8 , ecx=3
* fourth round ebx=16 , ecx=2
* fifth round ebx=32 , ecx=1
* sixth round ebx=64 , ecx=0
* When ecx is no longer greater than 0 , the program makes the exit system call , so the exit status is 64

Lesson 5 - Data Movement Program
l5.asm

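global _start
section .data
addr db "yellow"        ; define the bytes "yellow" at the label addr
section .text
_start:
mov [addr],byte 'H'     ; overwrite the first byte : "yellow" becomes "Hellow"
mov [addr+5],byte '!'   ; overwrite the sixth byte : "Hellow" becomes "Hello!"
(Note)
* after changing the bytes , write the string and exit as in Lessons 1 and 2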

Lesson 6 - Stack Usage Program
l6.asm


Lesson 7 - Function Call
l7.asm

global _start
section .text
_start:
call func     ; push the address of the next instruction , then jump to func
mov eax,1     ; system call number 1 = exit
int 0x80      ; exit with status ebx ( 42 )

func:
mov ebx,42    ; set ebx to 42
ret           ; pop the return address from the top of the stack into eip
(Note)
* when func is called , the address of the next instruction ( mov eax,1 ) is pushed onto the stack
* on ret , the address on top of the stack becomes the next instruction to execute
* execution returns to mov eax,1

Lesson 8 - Function Prologue Program
l8.asm

func:
push ebp      ; save the caller's ebp on top of the stack
mov ebp,esp   ; ebp now marks the base of this function's stack frame
sub esp,2     ; reserve 2 bytes of stack space for local data
; ....
mov esp,ebp   ; release the reserved space by restoring esp
pop ebp       ; restore the caller's ebp
ret
(Note)
* the function prologue saves ebp and reserves 2 bytes of stack space for func
* when the function has finished its instructions , the 2 reserved bytes are released and the saved ebp is restored ( mov esp,ebp followed by pop ebp does the same as the leave instruction )

Lesson 9 - Push into stack and Return value
l9.asm

global _start
section .text
_start:
push 21          ; push the argument 21 onto the top of the stack
call times2      ; call times2 ; the result comes back in eax
mov ebx,eax      ; copy the result into ebx ( the exit status )
mov eax,1        ; system call number 1 = exit
int 0x80

times2:
push ebp         ; save the caller's ebp
mov ebp,esp      ; set up the stack frame
mov eax,[ebp+8]  ; load the argument ( 21 ) into eax
add eax,eax      ; 21 + 21 = 42
mov esp,ebp      ; tear down the stack frame
pop ebp          ; restore the caller's ebp
ret              ; pop the return address into eip
(Note)
* we push 21 onto the stack and call times2
* times2 runs the function prologue ; [ebp] holds the saved ebp and [ebp+4] the return address , so the argument 21 is at [ebp+8]
* 21 is loaded into eax and 21+21 is computed , so eax is now 42
* the stack frame is destroyed and execution returns to _start
* eax is still 42 , and this value is moved to ebx
* on exit , the exit status is 42 because ebx=42

Reference