arm64 assembly crash course
Note: May or may not be specific to iOS.
Registers
- General purpose registers 0 through 30 with two addressing modes:
w0
= 32-bitx0
= 64-bit
- Zero register
wzr
orxzr
. Write to = discard, read from =0
. - Stack pointer
sp
- unlike other instruction sets, never modified implicitly (e.g. nopush
/pop
). - Instruction pointer
pc
, not modifiable directly. - A lot of float, vector and system registers, look up as needed.
- First register in assembly is usually destination, rest are source (except for
str
).
Register manipulation
mov
to copy one register to another, e.g.mov x0, x1
->x0 = x1
.- Constant
0
loaded fromwzr
/xzw
. - Small constants usually OR'ed with zero register, e.g.
orr x0, xzr, 5
. - Big constants usually loaded with
movz
+movk
, e.g.:->movz x0, 0x1234, lsl 32 movk x0, 0x5678, lsl 16 movk x0, 0x9abc
x0 = 0x123456789abc
. movn
for negative values, e.g.movn x0, 1
->x0 = -1
.lsl
andlsr
instructions = logic-shift-left and logic-shift-right, e.g.lsl x0, x0, 8
->x0 <<= 8
.lsl
andlsr
not only used as instructions, but also as operands to other instructions (seemovz
above).asl
for arithmetic shift also exists, but less frequently used.
- Lots of arithmetic, logic and bitwise instructions, look up as needed.
Memory
ldr
andstr
with multiple variations and addressing modes:ldr x0, [x1]
->x0 = *x1
str x0, [x1]
->*x1 = x0
ldr x0, [x1, 0x10]
->x0 = *(x1 + 0x10)
ldp
/stp
to load/store two registers at once behind each other, e.g.:
stp x0, x1, [x2]
->*x2 = x0; *(x2 + 8) = x1;
- Multiple variations for load/store size:
- Register names
xN
for 64-bit,wN
for 32-bit ldrh
/srth
for 16-bitldrb
/strb
for 8-bit
- Register names
- Multiple variations for sign-extending registers smaller than 64-bit:
ldrsw x0, [x1]
-> load 32-bit int, sign extend to 64-bitldrsh x0, [x1]
-> load 16-bit int, sign extend to 64-bitldrsb x0, [x1]
-> load 8-bit int, sign extend to 64-bit- (No equivalent
str
instructions)
- Three register addressing modes:
- Normal:
ldr x0, [x1, 0x10]
- Pre-indexing:
ldr x0, [x1, 0x10]!
(notice the!
) ->x1 += 0x10; x0 = *x1;
- Post-indexing:
ldr x0, [x1], 0x10
->x0 = *x1; x1 += 0x10;
- Normal:
- Memory addresses usually computed by PC-relative instructions:
adr x0, 0x12345
(only works for small offset from PC)- Bigger ranges use
adrp
+add
:adrp x0, 0xffffff8012345000 ; "address of page", last 12 bits are always zero add x0, x0, 0x678
- Even bigger ranges usually stored as pointers in data segment, offset by linker and loaded with
ldr
.
Calling convention
Note: Only dealing with integral types here. The rules change when floating-point is involved.
x0
-x7
first 8 arguments, rest on the stack (low address to high) with natural alignment (as if they were members of a struct)x8
pointer to where to write the return value if >128 bits, otherwise scratch registerx9
-x17
scratch registersx18
platform register (reserved, periodically zeroed by XNU)x19
-x28
callee-savedx29
frame pointer (basically also just callee-saved)x30
return address- Functions that save anything in
x19
-x28
usually start like this:and end like this:stp x24, x23, [sp, -0x40]! stp x22, x21, [sp, 0x10] stp x20, x19, [sp, 0x20] stp x29, x30, [sp, 0x30] add x29, sp, 0x30
The stack for local variables is usually managed separately though, withldp x29, x30, [sp, 0x30] ldp x20, x19, [sp, 0x20] ldp x22, x21, [sp, 0x10] ldp x24, x23, [sp], 0x40 ret
add sp, sp, 0x...
andsub sp, sp, 0x...
. - Variadic arguments are passed on the stack (low address to high), each promoted to 8 bytes. Structs that don't fit into 8 bytes have a pointer passed instead.
Fixed arguments that don't fit intox0
-x7
come before variadic arguments on the stack, naturally aligned. - The return value is passed as follows:
- If it fits into 64 bits, in
x0
. - If it fits into 128 bits, the first/lower half in
x0
, the second/upper half inx1
. - If it is larger than 128 bits, the caller passes a pointer in
x8
to where the result is written.
- If it fits into 64 bits, in
Conditions
- System register
nzcv
holds condition flags (Negative, Zero, Carry, oVerflow).
Set by one instruction and acted upon by a subsequent one, the latter using condition codes.
(Could be accessed as normal system register, but usually isn't.) - Some instructions use condition codes as suffixes (
instr.cond
), others as source operands (instr ..., cond
). List of condition codes:eq
/ne
= equal/not equallt
/le
/gt
/ge
= less than/less or equal/greater than/greater or equal (signed)lo
/ls
/hi
/hs
= lower/lower or same/higher/higher or same (unsigned)- A few more weird flags, seldom used.
- Unlike many other instruction sets, arm64 sets carry on no-borrow rather than borrow.
Thus,cs
/cc
= carry set/carry clear are aliases ofhs
/lo
.
cmp
= most common/basic compare instruction, sets condition flags. Examples:cmp x0, x1 cmp x0, 3
- Other instructions that set condition flags:
cmn
= compare negativetst
= bitwise testadds
/adcs
= add/add with carrysubs
/sbcs
= subtract/subtract with carrynegs
/ngcs
= negate/negate with carry- Some more bitwise and float instructions.
- Some instructions that act on condition flags:
cset
= conditional set, e.g.:->cmp x0, 3 cset x0, lo
x0 = (x0 < 3)
csel
= conditional select, e.g.:->cmp x0, 3 csel x0, x1, x2, lo
x0 = (x0 < 3) ? x1 : x2
(Translates nicely to ternary conditional.)ccmp
= conditional compare, e.g.:->cmp x0, 3 ccmp x0, 7, 2, hs b.hi 0xffffff8012345678
hi
condition will be true ifx0 < 3 || x0 > 7
(thirdccmp
operand is rawnzcv
data).
Often generated by compiler for logical and/or of two conditions.- Many, many more.
Branches
b
= simple branch, jump to PC-relative address.
Can be unconditional:or conditional:b 0xffffff8012345678
Used primarily within function for flow control.cmp x0, 3 b.lo 0xffffff8012345678 ; jump to 0xffffff8012345678 if x < 3
- Shortcuts
cbz
/cbnz
= compare-branch-zero and compare-branch-non-zero.
Just shorter ways to writeorcmp xN, 0 b.eq 0x...
(Translate nicely to Ccmp xN, 0 b.ne 0x...
if(x)
orif(!x)
.) - Shortcurts
tbz
/tbnz
= test single bit and branch if zero/non-zero.
E.g.tbz x0, 3, ...
translates toif((x0 & (1 << 3)) == 0) goto ...
. bl
= branch-and-link (e.g.bl 0xffffff8012345678
)
Store return address tox30
and jump to PC-relative address. Used for static function calls.blr
= branch-and-link to register (e.g.blr x8
)
Store return address tox30
and jump to address inx8
. Used for calls with function pointers or C++ virtual methods.br
= branch to register (e.g.br x8
)
Jump to address inx8
. Used for tail calls.ret
= return to address in register, default:x30
Can in theory use registers other thanx30
(e.g.ret x8
), but compiler doesn't usually generate that.