#### ARM Instruction Set

Computer Organization and Assembly Languages Yung-Yu Chuang

#### Introduction



- The ARM processor is easy to program at the assembly level (it is a RISC architecture).
- We will learn ARM assembly programming at the user level and run it on a GBA emulator.

## ARM programmer model



- The state of an ARM system is determined by the content of visible registers and memory.
- A user-mode program can see 15 32-bit general-purpose registers (R0-R14), the program counter (PC) and the CPSR.
- The instruction set defines the operations that can change the state.

## Memory system



- Memory is a linear array of bytes addressed from 0 to 2<sup>32</sup>-1
- Word, half-word, byte
- Little-endian



## Byte ordering



- Big Endian
  - Least significant byte has highest address

Word address 0x00000000

Value: 00102030

- Little Endian
  - Least significant byte has lowest address

Word address 0x00000000

Value: 30201000
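The two byte orders can be checked with a short Python sketch. It is only an illustration: the four bytes are the ones shown at addresses 0x0 to 0x3 in these slides, and the variable names are made up for the example.

```python
# Bytes stored at addresses 0x0..0x3, lowest address first.
memory = bytes([0x00, 0x10, 0x20, 0x30])

# Big endian: the byte at the lowest address is the most significant one.
big = int.from_bytes(memory, byteorder="big")
# Little endian: the byte at the lowest address is the least significant one.
little = int.from_bytes(memory, byteorder="little")

print(f"big-endian word:    0x{big:08X}")
print(f"little-endian word: 0x{little:08X}")
```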



# ARM programmer model



| R0  | R1  | R2  | R3  |
|-----|-----|-----|-----|
| R4  | R5  | R6  | R7  |
| R8  | R9  | R10 | R11 |
| R12 | R13 | R14 | PC  |

| Address    | Byte |
|------------|------|
| 0x00000000 | 00   |
| 0x00000001 | 10   |
| 0x00000002 | 20   |
| 0x00000003 | 30   |
| 0x00000004 | FF   |
| 0x00000005 | FF   |
| 0x00000006 | FF   |

| 31 | 30 | 29 | 28 | 27 | 26-8 | 7 | 6 | 5 | 4  | 3  | 2  | 1  | 0  |
|----|----|----|----|----|------|---|---|---|----|----|----|----|----|
| N  | Z  | C  | V  | Q  |      | I | F | T | M4 | M3 | M2 | M1 | M0 |


#### Instruction set



ARM instructions are all 32 bits long (except in Thumb mode). There are 2<sup>32</sup> possible machine instructions. Fortunately, they are structured.

| Instruction class                                        | Bits 31-28 | Bits 27-25 | Bits 24-0 (field[bit range])                                                          |
|----------------------------------------------------------|------------|------------|---------------------------------------------------------------------------------------|
| Data processing immediate shift                          | cond [1]   | 0 0 0      | opcode[24:21] S[20] Rn[19:16] Rd[15:12] shift amount[11:7] shift[6:5] 0[4] Rm[3:0]    |
| Miscellaneous instructions: see Figure 3-3               | cond [1]   | 0 0 0      | 1[24] 0[23] x[22:21] 0[20] x[19:5] 0[4] x[3:0]                                        |
| Data processing register shift [2]                       | cond [1]   | 0 0 0      | opcode[24:21] S[20] Rn[19:16] Rd[15:12] Rs[11:8] 0[7] shift[6:5] 1[4] Rm[3:0]         |
| Miscellaneous instructions: see Figure 3-3               | cond [1]   | 0 0 0      | 1[24] 0[23] x[22:21] 0[20] x[19:8] 0[7] x[6:5] 1[4] x[3:0]                            |
| Multiplies, extra load/stores: see Figure 3-2            | cond [1]   | 0 0 0      | x[24:8] 1[7] x[6:5] 1[4] x[3:0]                                                       |
| Data processing immediate [2]                            | cond [1]   | 0 0 1      | opcode[24:21] S[20] Rn[19:16] Rd[15:12] rotate[11:8] immediate[7:0]                   |
| Undefined instruction [3]                                | cond [1]   | 0 0 1      | 1[24] 0[23] x[22] 0[21] 0[20] x[19:0]                                                 |
| Move immediate to status register                        | cond [1]   | 0 0 1      | 1[24] 0[23] R[22] 1[21] 0[20] Mask[19:16] SBO[15:12] rotate[11:8] immediate[7:0]      |
| Load/store immediate offset                              | cond [1]   | 0 1 0      | P[24] U[23] B[22] W[21] L[20] Rn[19:16] Rd[15:12] immediate[11:0]                     |
| Load/store register offset                               | cond [1]   | 0 1 1      | P[24] U[23] B[22] W[21] L[20] Rn[19:16] Rd[15:12] shift amount[11:7] shift[6:5] 0[4] Rm[3:0] |
| Undefined instruction                                    | cond [1]   | 0 1 1      | x[24:5] 1[4] x[3:0]                                                                   |
| Undefined instruction [4,7]                              | 1 1 1 1    | 0 x x      | x[24:0]                                                                               |
| Load/store multiple                                      | cond [1]   | 1 0 0      | P[24] U[23] S[22] W[21] L[20] Rn[19:16] register list[15:0]                           |
| Undefined instruction [4]                                | 1 1 1 1    | 1 0 0      | x[24:0]                                                                               |
| Branch and branch with link                              | cond [1]   | 1 0 1      | L[24] 24-bit offset[23:0]                                                             |
| Branch and branch with link and change to Thumb [4]      | 1 1 1 1    | 1 0 1      | H[24] 24-bit offset[23:0]                                                             |
| Coprocessor load/store and double register transfers [6] | cond [5]   | 1 1 0      | P[24] U[23] N[22] W[21] L[20] Rn[19:16] CRd[15:12] cp_num[11:8] 8-bit offset[7:0]     |
| Coprocessor data processing                              | cond [5]   | 1 1 1      | 0[24] opcode1[23:20] CRn[19:16] CRd[15:12] cp_num[11:8] opcode2[7:5] 0[4] CRm[3:0]    |
| Coprocessor register transfers                           | cond [5]   | 1 1 1      | 0[24] opcode1[23:21] L[20] CRn[19:16] Rd[15:12] cp_num[11:8] opcode2[7:5] 1[4] CRm[3:0] |
| Software interrupt                                       | cond [1]   | 1 1 1      | 1[24] swi number[23:0]                                                                |
| Undefined instruction [4]                                | 1 1 1 1    | 1 1 1      | 1[24] x[23:0]                                                                         |

#### Features of ARM instruction set



- Load-store architecture
- 3-address instructions
- Conditional execution of every instruction
- Possible to load/store multiple registers at once
- Possible to combine shift and ALU operations in a single instruction

#### Instruction set



- Data processing
- Data movement
- Flow control

### Data processing



- They are move, arithmetic, logical, comparison and multiply instructions.
- Most data processing instructions can process one of their operands using the barrel shifter.
- General rules:
  - All operands are 32-bit, coming from registers or literals.
  - The result, if any, is 32-bit and placed in a register (except for long multiply, which produces a 64-bit result)
  - 3-address format



#### Instruction set



MOV<cc><S> Rd, <operands>

MOVCS R0, R1 @ if carry is set then R0:=R1

MOVS R0, #0 @ R0:=0; Z=1, N=0; C, V unaffected

#### Conditional execution



Almost all ARM instructions have a condition field which allows them to be executed conditionally.

movcs R0, R1

| Mnemonic | Condition               | Mnemonic | Condition              |
|----------|-------------------------|----------|------------------------|
| CS       | Carry Set               | CC       | Carry Clear            |
| EQ       | Equal (Zero Set)        | NE       | Not Equal (Zero Clear) |
| VS       | Overflow Set            | VC       | Overflow Clear         |
| GT       | Greater Than            | LT       | Less Than              |
| GE       | Greater Than or Equal   | LE       | Less Than or Equal     |
| PL       | Plus (Positive)         | MI       | Minus (Negative)       |
| HI       | Higher Than             | LO       | Lower Than (aka CC)    |
| HS       | Higher or Same (aka CS) | LS       | Lower or Same          |

## Register movement



Syntax: <instruction>{<cond>}{S} Rd, N

N can be an immediate, a register, or a shifted register.

| Instruction | Description                                      | Operation |
|-------------|--------------------------------------------------|-----------|
| MOV         | move a 32-bit value into a register              | Rd = N    |
| MVN         | move the NOT of the 32-bit value into a register | Rd = ~N   |

### Addressing modes



Register operands

 ADD R0, R1, R2

Immediate operands: a literal; most can be represented by (0..255)×2<sup>2n</sup>, 0≤n≤12

```
ADD R3, R3, #1    @ R3:=R3+1
AND R8, R7, #0xff @ R8:=R7[7:0]; 0xff is a hexadecimal literal
```

This is assembler-dependent syntax.
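Whether a constant fits the immediate form can be sketched in Python. This is a model, not assembler output: it assumes the usual encoding of an 8-bit value rotated right by an even number of bits, which covers the (0..255)×2<sup>2n</sup> family above plus the wrap-around cases. The function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF

def rotate_left(value, amount):
    """Rotate a 32-bit value left by `amount` bits."""
    amount %= 32
    return ((value << amount) | (value >> (32 - amount))) & MASK32

def is_arm_immediate(value):
    """True if value is an 8-bit constant rotated right by an even amount."""
    value &= MASK32
    # Undo every possible even right-rotation and look for an 8-bit result.
    return any(rotate_left(value, 2 * n) <= 0xFF for n in range(16))

for constant in (0xFF, 0x3FC, 0xFF000000, 0x101, 0x1FE):
    print(f"0x{constant:08X}: {is_arm_immediate(constant)}")
```

Constants that fail this test have to be built another way, for example loaded from memory or composed from two instructions.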



One operand to the ALU is routed through the barrel shifter, so the operand can be modified before it is used. This is useful for fast multiplication and for dealing with lists, tables and other complex data structures (similar to the displacement addressing mode in CISC).

Some instructions (e.g. MUL, CLZ, QADD) do not use the barrel shifter.



| Mnemonic | Description            | Shift   | Result                                                    |
|----------|------------------------|---------|-----------------------------------------------------------|
| LSL      | logical shift left     | x LSL y | $x \ll y$                                                 |
| LSR      | logical shift right    | x LSR y | $(\text{unsigned})x \gg y$                                |
| ASR      | arithmetic shift right | x ASR y | $(\text{signed})x \gg y$                                  |
| ROR      | rotate right           | x ROR y | $((\text{unsigned})x \gg y) \mid (x \ll (32 - y))$        |
| RRX      | rotate right extended  | x RRX   | $(c \text{ flag} \ll 31) \mid ((\text{unsigned})x \gg 1)$ |

#### Logical shift left





MOV R0, R2, LSL #2 @ R0:=R2<<2, R2 unchanged

Example: 0...0 0011 0000

Before: R2=0x00000030

After: R0=0x000000C0, R2=0x00000030

## Logical shift right





MOV R0, R2, LSR #2 @ R0:=R2>>2, R2 unchanged

Example: 0...0 0011 0000

Before: R2=0x00000030

After: R0=0x0000000C, R2=0x00000030

### Arithmetic shift right





MOV R0, R2, ASR #2 @ R0:=R2>>2 (sign-extended), R2 unchanged

Example: 1010 0...0 0011 0000

Before: R2=0xA0000030

After: R0=0xE800000C, R2=0xA0000030

#### Rotate right



MOV R0, R2, ROR #2 @ R0:=R2 rotated right by 2, R2 unchanged

Example: 0...0 0011 0001

Before: R2=0x00000031

After: R0=0x4000000C, R2=0x00000031
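The shift examples above can be reproduced with a small Python model of the barrel shifter. It is a sketch that assumes shift amounts from 0 to 31 and does not model the carry-out; the function names simply mirror the mnemonics.

```python
MASK32 = 0xFFFFFFFF

def lsl(x, y):
    return (x << y) & MASK32

def lsr(x, y):
    return (x & MASK32) >> y

def asr(x, y):
    x &= MASK32
    # Reinterpret the 32-bit pattern as a signed value so >> copies the sign bit.
    signed = x - (1 << 32) if x & 0x80000000 else x
    return (signed >> y) & MASK32

def ror(x, y):
    x &= MASK32
    y %= 32
    return ((x >> y) | (x << (32 - y))) & MASK32

def rrx(x, carry):
    # The old carry flag enters at bit 31; the operand moves right by one.
    return ((carry << 31) | ((x & MASK32) >> 1)) & MASK32

print(hex(lsl(0x00000030, 2)), hex(lsr(0x00000030, 2)))
print(hex(asr(0xA0000030, 2)), hex(ror(0x00000031, 2)), hex(rrx(0x00000031, 1)))
```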

### Rotate right extended















- It is possible to use a register to specify the number of bits to be shifted; only the bottom 8 bits of the register are significant.
  - @ array index calculation
    ADD R0, R1, R2, LSL R3 @ R0:=R1+R2\*2<sup>R3</sup>

```
@ fast multiply R2=35xR0
ADD R0, R0, R0, LSL #2 @ R0'=5xR0
RSB R2, R0, R0, LSL #3 @ R2 =7xR0'
```
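The decomposition 35 = 5 × 7 used above can be checked with a short Python sketch. It models 32-bit wrap-around with a mask; the function name is hypothetical.

```python
MASK32 = 0xFFFFFFFF

def times_35(r0):
    """Model the ADD/RSB pair; note that R0 itself is overwritten with 5*R0."""
    r0 = (r0 + (r0 << 2)) & MASK32   # ADD R0, R0, R0, LSL #2 -> R0' = 5*R0
    r2 = ((r0 << 3) - r0) & MASK32   # RSB R2, R0, R0, LSL #3 -> R2 = 7*R0'
    return r2

for n in (0, 1, 7, 12345, 0xFFFFFFFF):
    print(n, times_35(n), times_35(n) == (35 * n) & MASK32)
```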

#### Multiplication



```
MOV R1, #35
MUL R2, R0, R1

@ or

ADD R0, R0, R0, LSL #2 @ R0'=5xR0
RSB R2, R0, R0, LSL #3 @ R2 =7xR0'
```



| N shift operations                  | Syntax             |
|-------------------------------------|--------------------|
| Immediate                           | #immediate         |
| Register                            | Rm                 |
| Logical shift left by immediate     | Rm, LSL #shift_imm |
| Logical shift left by register      | Rm, LSL Rs         |
| Logical shift right by immediate    | Rm, LSR #shift_imm |
| Logical shift right by register     | Rm, LSR Rs         |
| Arithmetic shift right by immediate | Rm, ASR #shift_imm |
| Arithmetic shift right by register  | Rm, ASR Rs         |
| Rotate right by immediate           | Rm, ROR #shift_imm |
| Rotate right by register            | Rm, ROR Rs         |
| Rotate right with extend            | Rm, RRX            |

# Encoding data processing instructions







#### Add and subtraction

Syntax: <instruction>{<cond>}{S} Rd, Rn, N

| ADC | add two 32-bit values and carry                  | Rd = Rn + N + carry         |
|-----|--------------------------------------------------|-----------------------------|
| ADD | add two 32-bit values                            | Rd = Rn + N                 |
| RSB | reverse subtract of two 32-bit values            | Rd = N - Rn                 |
| RSC | reverse subtract with carry of two 32-bit values | Rd = N - Rn - !(carry flag) |
| SBC | subtract with carry of two 32-bit values         | Rd = Rn - N - !(carry flag) |
| SUB | subtract two 32-bit values                       | Rd = Rn - N                 |



- ADD R0, R1, R2 @ R0 = R1+R2
- SUB R0, R1, R2 @ R0 = R1-R2
- SBC R0, R1, R2 @ R0 = R1-R2-!C
- RSB R0, R1, R2 @ R0 = R2-R1
- RSC R0, R1, R2 @ R0 = R2-R1-!C

- ADC R0, R1, R2 @ R0 = R1+R2+C



$$3-5 = 3+(-5) \rightarrow \text{sum} \le 255 \rightarrow C=0 \rightarrow \text{borrow}$$

$$5-3 = 5+(-3) \rightarrow \text{sum} > 255 \rightarrow C=1 \rightarrow \text{no borrow}$$
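The carry rule for subtraction can be made concrete with a Python sketch of a flag-setting subtract. It is a model under the assumption that the adder computes a + NOT(b) + 1; the `bits` parameter is added only so that the 8-bit sums above can be reproduced, and the function name is hypothetical.

```python
def subs(a, b, bits=32):
    """Return (result, N, Z, C, V) for a - b computed as a + NOT(b) + 1."""
    mask = (1 << bits) - 1
    sign = 1 << (bits - 1)
    a &= mask
    b &= mask
    total = a + (~b & mask) + 1          # what the adder actually sums
    result = total & mask
    n = int(bool(result & sign))         # negative: top bit of the result
    z = int(result == 0)                 # zero
    c = int(total > mask)                # carry out = 1 means "no borrow"
    v = int(((a ^ b) & (a ^ result) & sign) != 0)  # signed overflow
    return result, n, z, c, v

print(subs(3, 5, bits=8))   # sum 254 <= 255, so C=0 (borrow)
print(subs(5, 3, bits=8))   # sum 258 > 255, so C=1 (no borrow)
print(subs(1, 1))           # the SUBS r1, r1, #1 case: Z=1 and C=1
```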





```
PRE  cpsr = nzcvqiFt_USER
     r1 = 0x00000001

     SUBS r1, r1, #1

POST cpsr = nZCvqiFt_USER
     r1 = 0x00000000

PRE  r0 = 0x00000000
     r1 = 0x00000005

     ADD r0, r1, r1, LSL #1

POST r0 = 0x0000000f
     r1 = 0x00000005
```

## Setting the condition codes



Any data processing instruction can set the condition codes if the programmer wishes it to.

64-bit addition: [R3 R2] := [R3 R2] + [R1 R0]

ADDS R2, R2, R0 @ add the low words and set the carry flag

ADC R3, R3, R1 @ add the high words plus the carry
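The way the carry flag links the two halves of a 64-bit addition can be sketched in Python. This is an illustrative model with a hypothetical function name; register values are plain integers masked to 32 bits.

```python
MASK32 = 0xFFFFFFFF

def add64(r3, r2, r1, r0):
    """Model ADDS R2,R2,R0 followed by ADC R3,R3,R1; returns the new (R3, R2)."""
    low = (r2 & MASK32) + (r0 & MASK32)
    carry = low >> 32                     # C flag produced by ADDS
    r2 = low & MASK32
    r3 = (r3 + r1 + carry) & MASK32       # ADC folds the carry into the high word
    return r3, r2

high, low = add64(0x00000000, 0xFFFFFFFF, 0x00000000, 0x00000001)
print(f"0x{high:08X}{low:08X}")
```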

# Logical



Syntax: <instruction>{<cond>}{S} Rd, Rn, N

| AND | logical bitwise AND of two 32-bit values  | $Rd = Rn \& N$      |
|-----|-------------------------------------------|---------------------|
| ORR | logical bitwise OR of two 32-bit values   | $Rd = Rn \mid N$    |
| EOR | logical exclusive OR of two 32-bit values | $Rd = Rn \oplus N$  |
| BIC | logical bit clear (AND NOT)               | $Rd = Rn \& \sim N$ |

#### Logical



```
AND R0, R1, R2 @ R0 = R1 and R2
EOR R0, R1, R2 @ R0 = R1 xor R2
BIC R0, R1, R2 @ R0 = R1 and (~R2)
```

Bit clear: **R2** is a mask identifying which bits of **R1** will be cleared to zero.

R1=0x11111111, R2=0x01100101

BIC R0, R1, R2

R0=0x10011010

### Logical



```
PRE r0 = 0x00000000
      r1 = 0x02040608
      r2 = 0x10305070
      ORR r0, r1, r2
POST r0 = 0x12345678
PRE r1 = 0b1111
      r2 = 0b0101
      BIC r0, r1, r2
POST r0 = 0b1010
```

### Comparison



 These instructions do not generate a result, but set condition code bits (N, Z, C, V) in CPSR.
 Often, a branch operation follows to change the program flow.

Syntax: <instruction>{<cond>} Rn, N

| CMN | compare negated                        | flags set as a result of $Rn + N$      |
|-----|----------------------------------------|----------------------------------------|
| CMP | compare                                | flags set as a result of $Rn - N$      |
| TEQ | test for equality of two 32-bit values | flags set as a result of $Rn \oplus N$ |
| TST | test bits of a 32-bit value            | flags set as a result of $Rn \& N$     |

#### Comparison



- compare: CMP R1, R2 @ set cc on R1-R2
- compare negated: CMN R1, R2 @ set cc on R1+R2
- bit test: TST R1, R2 @ set cc on R1 and R2
- test equal: TEQ R1, R2 @ set cc on R1 xor R2

# Comparison



```
PRE  cpsr = nzcvqiFt_USER
     r0 = 4
     r9 = 4

     CMP r0, r9

POST cpsr = nZCvqiFt_USER   ; Z set (equal), C set (no borrow)
```



Syntax: MLA{<cond>}{S} Rd, Rm, Rs, Rn

MUL{<cond>}{S} Rd, Rm, Rs

| MLA | multiply and accumulate | Rd = (Rm*Rs) + Rn |
|-----|-------------------------|-------------------|
| MUL | multiply                | Rd = Rm*Rs        |

Syntax: <instruction>{<cond>}{S} RdLo, RdHi, Rm, Rs

| SMLAL | signed multiply accumulate long   | [RdHi, RdLo] = [RdHi, RdLo] + (Rm*Rs) |
|-------|-----------------------------------|---------------------------------------|
| SMULL | signed multiply long              | [RdHi, RdLo] = Rm*Rs                  |
| UMLAL | unsigned multiply accumulate long | [RdHi, RdLo] = [RdHi, RdLo] + (Rm*Rs) |
| UMULL | unsigned multiply long            | [RdHi, RdLo] = Rm*Rs                  |



- MUL R0, R1, R2 @ R0 = (R1×R2)[31:0]

#### Features:

- Second operand can't be immediate
- The result register must be different from the first operand
- The cycle count depends on the core type
- If S bit is set, C flag is meaningless
- See the reference manual (4.1.33)



Multiply-accumulate (2D array indexing)

```
MLA R4, R3, R2, R1 @ R4 = R3xR2+R1
```

Multiplication by a constant can often be implemented more efficiently using a shifted register operand.

```
MOV R1, #35
MUL R2, R0, R1

@ or

ADD R0, R0, R0, LSL #2 @ R0'=5xR0
RSB R2, R0, R0, LSL #3 @ R2 =7xR0'
```



```
PRE  r0 = 0x00000000
     r1 = 0x00000002
     r2 = 0x00000002

     MUL r0, r1, r2 ; r0 = r1*r2

POST r0 = 0x00000004
     r1 = 0x00000002
     r2 = 0x00000002
```



```
PRE r0 = 0x00000000

r1 = 0x00000000

r2 = 0xf0000002

r3 = 0x00000002

UMULL r0, r1, r2, r3 ; [r1,r0] = r2*r3

POST r0 = 0xe0000004 ; = RdLo

r1 = 0x00000001 ; = RdHi
```
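The long multiplies can be modeled in Python to see how the 64-bit product is split across RdLo and RdHi. This is a sketch with hypothetical function names; `smull` is included only to contrast signed with unsigned interpretation of the same operands.

```python
MASK32 = 0xFFFFFFFF

def umull(rm, rs):
    """Return (RdLo, RdHi) of the unsigned 64-bit product Rm*Rs."""
    product = (rm & MASK32) * (rs & MASK32)
    return product & MASK32, (product >> 32) & MASK32

def smull(rm, rs):
    """Return (RdLo, RdHi) of the signed 64-bit product Rm*Rs."""
    def signed(x):
        x &= MASK32
        return x - (1 << 32) if x & 0x80000000 else x
    product = signed(rm) * signed(rs)
    return product & MASK32, (product >> 32) & MASK32

lo, hi = umull(0xF0000002, 0x00000002)
print(f"UMULL: RdLo=0x{lo:08x} RdHi=0x{hi:08x}")
lo, hi = smull(0xF0000002, 0x00000002)
print(f"SMULL: RdLo=0x{lo:08x} RdHi=0x{hi:08x}")
```

The low words agree; only the high word depends on whether the operands are treated as signed.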

#### Flow control instructions



#### Determine the instruction to be executed next

```
Syntax: B{<cond>} label
BL{<cond>} label
BX{<cond>} Rm
BLX{<cond>} label | Rm
```

| Instruction | Description               | Operation                                                                                                      |
|-------------|---------------------------|----------------------------------------------------------------------------------------------------------------|
| B           | branch                    | pc = label (pc-relative offset within 32MB)                                                                    |
| BL          | branch with link          | pc = label; lr = address of the next instruction after the BL                                                  |
| BX          | branch exchange           | pc = Rm & 0xfffffffe, T = Rm & 1                                                                               |
| BLX         | branch exchange with link | pc = label, T = 1; or pc = Rm & 0xfffffffe, T = Rm & 1<br>lr = address of the next instruction after the BLX   |

#### Flow control instructions



Branch instruction

B label

...

label: ...

Conditional branches

MOV R0, #0

loop: ...

ADD R0, R0, #1

CMP R0, #10

BNE loop

### **Branch conditions**



| Mnemonic | Name                              | Condition flags |
|----------|-----------------------------------|-----------------|
| EQ       | equal                             | Z               |
| NE       | not equal                         | z               |
| CS HS    | carry set/unsigned higher or same | C               |
| CC LO    | carry clear/unsigned lower        | c               |
| MI       | minus/negative                    | N               |
| PL       | plus/positive or zero             | n               |
| VS       | overflow                          | V               |
| VC       | no overflow                       | v               |
| HI       | unsigned higher                   | zC              |
| LS       | unsigned lower or same            | Z or c          |
| GE       | signed greater than or equal      | NV or nv        |
| LT       | signed less than                  | Nv or nV        |
| GT       | signed greater than               | NzV or nzv      |
| LE       | signed less than or equal         | Z or Nv or nV   |
| AL       | always (unconditional)            | ignored         |

An uppercase letter means the flag is set; a lowercase letter means it is clear.
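The flag combinations in the table can be written out as predicates. The following Python sketch is one way to express them; the dictionary and function names are hypothetical, and flags are passed as 0 or 1.

```python
# Each entry maps a condition mnemonic to a test on the N, Z, C, V flags.
CONDITIONS = {
    "EQ": lambda n, z, c, v: z,
    "NE": lambda n, z, c, v: not z,
    "CS": lambda n, z, c, v: c,
    "CC": lambda n, z, c, v: not c,
    "MI": lambda n, z, c, v: n,
    "PL": lambda n, z, c, v: not n,
    "VS": lambda n, z, c, v: v,
    "VC": lambda n, z, c, v: not v,
    "HI": lambda n, z, c, v: c and not z,
    "LS": lambda n, z, c, v: z or not c,
    "GE": lambda n, z, c, v: n == v,
    "LT": lambda n, z, c, v: n != v,
    "GT": lambda n, z, c, v: not z and n == v,
    "LE": lambda n, z, c, v: z or n != v,
    "AL": lambda n, z, c, v: True,
}
CONDITIONS["HS"] = CONDITIONS["CS"]   # HS is another name for CS
CONDITIONS["LO"] = CONDITIONS["CC"]   # LO is another name for CC

def condition_passed(cond, n, z, c, v):
    """True if an instruction with this condition suffix would execute."""
    return bool(CONDITIONS[cond](n, z, c, v))

# Flags after comparing two equal values: Z=1 and C=1.
for cond in ("EQ", "NE", "HS", "HI", "GE", "GT"):
    print(cond, condition_passed(cond, n=0, z=1, c=1, v=0))
```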

### **Branches**



| Branch | Interpretation   | Normal uses                                       |
|--------|------------------|---------------------------------------------------|
| B      | Unconditional    | Always take this branch                           |
| BAL    | Always           | Always take this branch                           |
| BEQ    | Equal            | Comparison equal or zero result                   |
| BNE    | Not equal        | Comparison not equal or non-zero result           |
| BPL    | Plus             | Result positive or zero                           |
| BMI    | Minus            | Result minus or negative                          |
| BCC    | Carry clear      | Arithmetic operation did not give carry-out       |
| BLO    | Lower            | Unsigned comparison gave lower                    |
| BCS    | Carry set        | Arithmetic operation gave carry-out               |
| BHS    | Higher or same   | Unsigned comparison gave higher or same           |
| BVC    | Overflow clear   | Signed integer operation; no overflow occurred    |
| BVS    | Overflow set     | Signed integer operation; overflow occurred       |
| BGT    | Greater than     | Signed integer comparison gave greater than       |
| BGE    | Greater or equal | Signed integer comparison gave greater or equal   |
| BLT    | Less than        | Signed integer comparison gave less than          |
| BLE    | Less or equal    | Signed integer comparison gave less than or equal |
| BHI    | Higher           | Unsigned comparison gave higher                   |
| BLS    | Lower or same    | Unsigned comparison gave lower or same            |

#### Branch and link



The BL instruction saves the return address to R14 (lr).

```
BL sub @ call sub

CMP R1, #5 @ return to here

MOVEQ R1, #0

...

sub: ... @ sub entry point

...

MOV PC, LR @ return
```

#### Branch and link



Use the stack to save/restore the return address and registers:

```
BL sub1                        @ call sub1
...

sub1: STMFD R13!, {R0-R2,R14}  @ save work registers and return address
BL sub2
...
LDMFD R13!, {R0-R2,PC}         @ restore registers and return

sub2: ...
...
MOV PC, LR
```

#### Conditional execution



```
CMP R0, #5
BEQ bypass         @ if (R0!=5) {
ADD R1, R1, R0     @ R1=R1+R0-R2
SUB R1, R1, R2     @ }
bypass: ...

@ smaller and faster:
CMP R0, #5
ADDNE R1, R1, R0
SUBNE R1, R1, R2
```

Rule of thumb: if the conditional sequence is three instructions or less, it is better to use conditional execution than a branch.

#### Conditional execution



```
@ if ((R0==R1) && (R2==R3)) R4++

@ with branches
CMP R0, R1
BNE skip
CMP R2, R3
BNE skip
ADD R4, R4, #1
skip: ...

@ with conditional execution
CMP R0, R1
CMPEQ R2, R3
ADDEQ R4, R4, #1
```

#### Data transfer instructions



- Move data between registers and memory
- Three basic forms
  - Single register load/store
  - Multiple register load/store
  - Single register swap: **SWP(B)**, atomic instruction for semaphore

## Single register load/store



Syntax: <LDR|STR>{<cond>}{B} Rd, addressing<sup>1</sup>

LDR{<cond>}SB|H|SH Rd, addressing<sup>2</sup>

STR{<cond>}H Rd, addressing<sup>2</sup>

| LDR  | load word into a register         | Rd <- mem32[address] |
|------|-----------------------------------|----------------------|
| STR  | save byte or word from a register | Rd -> mem32[address] |
| LDRB | load byte into a register         | Rd <- mem8[address]  |
| STRB | save byte from a register         | Rd -> mem8[address]  |

## Single register load/store



| LDRH  | load halfword into a register        | Rd <- mem16[address]                 |
|-------|--------------------------------------|--------------------------------------|
| STRH  | save halfword from a register        | Rd -> mem16[address]                 |
| LDRSB | load signed byte into a register     | Rd <- SignExtend<br>(mem8[address])  |
| LDRSH | load signed halfword into a register | Rd <- SignExtend<br>(mem16[address]) |

There is no STRSB/STRSH, since STRB/STRH store both signed and unsigned values.

## Single register load/store



The data items can be an 8-bit byte, a 16-bit half-word or a 32-bit word. Addresses must be boundary aligned (e.g. a multiple of 4 for LDR/STR).

```
LDR R0, [R1] @ R0 := mem32[R1]
STR R0, [R1] @ mem32[R1] := R0
```

- LDR, LDRH, LDRB for 32, 16, 8 bits
- STR, STRH, STRB for 32, 16, 8 bits

## Addressing modes



Memory is addressed by a register and an offset.

 LDR R0, [R1] @ mem[R1]

- Three ways to specify offsets: immediate, register, and scaled register

```
LDR R0, [R1, #4]         @ immediate:       mem[R1+4]
LDR R0, [R1, R2]         @ register:        mem[R1+R2]
LDR R0, [R1, R2, LSL #2] @ scaled register: mem[R1+4*R2]
```

## Addressing modes



- Pre-index addressing (LDR R0, [R1, #4]): calculation before accessing, without a writeback
- Auto-indexing addressing (LDR R0, [R1, #4]!): pre-index with writeback; calculation before accessing, with a writeback
- Post-index addressing (LDR R0, [R1], #4): calculation after accessing, with a writeback

| Index method            | Data               | Base address<br>register | Example         |
|-------------------------|--------------------|--------------------------|-----------------|
| Preindex with writeback | mem[base + offset] | base + offset            | LDR r0,[r1,#4]! |
| Preindex                | mem[base + offset] | not updated              | LDR r0,[r1,#4]  |
| Postindex               | mem[base]          | base + offset            | LDR r0,[r1],#4  |

## Pre-index addressing



LDR R0, [R1, #4] @ R0=mem[R1+4], R1 unchanged



## Auto-indexing addressing



```
LDR R0, [R1, #4]! @ R0=mem[R1+4]
                  @ R1=R1+4
```

No extra time; fast.



### Post-index addressing



LDR R0, [R1], #4 @ R0=mem[R1], then R1=R1+4



### Comparisons



Pre-indexed addressing

```
LDR R0, [R1, R2] @ R0=mem[R1+R2]
@ R1 unchanged
```

Auto-indexing addressing

```
LDR R0, [R1, R2]! @ R0=mem[R1+R2]
@ R1=R1+R2
```

Post-indexed addressing

```
LDR R0, [R1], R2 @ R0=mem[R1]
@ R1=R1+R2
```

### Example



```
PRE  r0 = 0x00000000
     r1 = 0x00009000
     mem32[0x00009000] = 0x01010101
     mem32[0x00009004] = 0x02020202

     LDR r0, [r1, #4]!
```

Preindexing with writeback:

**POST(1)** r0 = 0x02020202, r1 = 0x00009004

### Example



```
PRE  r0 = 0x00000000
     r1 = 0x00009000
     mem32[0x00009000] = 0x01010101
     mem32[0x00009004] = 0x02020202

     LDR r0, [r1, #4]
```

#### Preindexing:

**POST(2)** r0 = 0x02020202, r1 = 0x00009000

### Example



```
PRE  r0 = 0x00000000
     r1 = 0x00009000
     mem32[0x00009000] = 0x01010101
     mem32[0x00009004] = 0x02020202

     LDR r0, [r1], #4
```

#### Postindexing:

**POST(3)** r0 = 0x01010101, r1 = 0x00009004
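The three indexing examples differ only in which address is used and whether the base register is updated. A Python sketch of that rule follows; memory is modeled as a dictionary from address to word, and the function name and mode strings are hypothetical.

```python
def ldr(memory, regs, base, offset, mode):
    """Load a word; mode is 'pre', 'pre_wb' (pre-index with writeback) or 'post'."""
    # Post-index reads at the old base; both pre-index forms read at base+offset.
    address = regs[base] if mode == "post" else regs[base] + offset
    value = memory[address]
    # Only the writeback forms update the base register.
    if mode in ("pre_wb", "post"):
        regs[base] += offset
    return value

memory = {0x00009000: 0x01010101, 0x00009004: 0x02020202}
for mode in ("pre_wb", "pre", "post"):
    regs = {"r1": 0x00009000}
    r0 = ldr(memory, regs, "r1", 4, mode)
    print(f"{mode:6}: r0 = 0x{r0:08x}, r1 = 0x{regs['r1']:08x}")
```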



Syntax: <LDR|STR>{<cond>}{B} Rd, addressing<sup>1</sup>

LDR{<cond>}SB|H|SH Rd, addressing<sup>2</sup>

STR{<cond>}H Rd, addressing<sup>2</sup>

| Addressing<sup>1</sup> mode and index method   | Addressing<sup>1</sup> syntax  |
|------------------------------------------------|--------------------------------|
| Preindex with immediate offset                 | [Rn, #+/-offset_12]            |
| Preindex with register offset                  | [Rn, +/-Rm]                    |
| Preindex with scaled register offset           | [Rn, +/-Rm, shift #shift_imm]  |
| Preindex writeback with immediate offset       | [Rn, #+/-offset_12]!           |
| Preindex writeback with register offset        | [Rn, +/-Rm]!                   |
| Preindex writeback with scaled register offset | [Rn, +/-Rm, shift #shift_imm]! |
| Immediate postindexed                          | [Rn], #+/-offset_12            |
| Register postindex                             | [Rn], +/-Rm                    |
| Scaled register postindex                      | [Rn], +/-Rm, shift #shift_imm  |



|                         | Instruction              | r0 =                   | r1 +=        |
|-------------------------|--------------------------|------------------------|--------------|
| Preindex with writeback | LDR r0,[r1,#0x4]!        | mem32[r1+0x4]          | 0x4          |
|                         | LDR r0,[r1,r2]!          | mem32[r1+r2]           | r2           |
|                         | LDR r0,[r1,r2,LSR#0x4]!  | mem32[r1+(r2 LSR 0x4)] | (r2 LSR 0x4) |
| Preindex                | LDR r0,[r1,#0x4]         | mem32[r1+0x4]          | not updated  |
|                         | LDR r0,[r1,r2]           | mem32[r1+r2]           | not updated  |
|                         | LDR r0,[r1,-r2,LSR #0x4] | mem32[r1-(r2 LSR 0x4)] | not updated  |
| Postindex               | LDR r0,[r1],#0x4         | mem32[r1]              | 0x4          |
|                         | LDR r0,[r1],r2           | mem32[r1]              | r2           |
|                         | LDR r0,[r1],r2,LSR #0x4  | mem32[r1]              | (r2 LSR 0x4) |



Syntax: <LDR|STR>{<cond>}{B} Rd, addressing<sup>1</sup>

LDR{<cond>}SB|H|SH Rd, addressing<sup>2</sup>

STR{<cond>}H Rd, addressing<sup>2</sup>

| Addressing <sup>2</sup> mode and index method | Addressing <sup>2</sup> syntax |
|-----------------------------------------------|--------------------------------|
| Preindex immediate offset                     | [Rn, #+/-offset_8]             |
| Preindex register offset                      | [Rn, +/-Rm]                    |
| Preindex writeback immediate offset           | [Rn, #+/-offset_8]!            |
| Preindex writeback register offset            | [Rn, +/-Rm]!                   |
| Immediate postindexed                         | [Rn], #+/-offset_8             |
| Register postindexed                          | [Rn], +/-Rm                    |



|                         | Instruction        | Result           | r1 +=       |
|-------------------------|--------------------|------------------|-------------|
| Preindex with writeback | STRH r0,[r1,#0x4]! | mem16[r1+0x4]=r0 | 0x4         |
|                         | STRH r0,[r1,r2]!   | mem16[r1+r2]=r0  | r2          |
| Preindex                | STRH r0,[r1,#0x4]  | mem16[r1+0x4]=r0 | not updated |
|                         | STRH r0,[r1,r2]    | mem16[r1+r2]=r0  | not updated |
| Postindex               | STRH r0,[r1],#0x4  | mem16[r1]=r0     | 0x4         |
|                         | STRH r0,[r1],r2    | mem16[r1]=r0     | r2          |

## Load an address into a register



Note that all addressing modes are based on a register plus an offset. Can we issue LDR R0, table? The pseudo-instruction ADR loads a register with an address:

table: .word 10

ADR R0, table

The assembler translates the pseudo-instruction into a sequence of appropriate instructions, e.g.

sub r0, pc, #12

### **Application**



@ operations on R0

•••

## Multiple register load/store



- Transfer a block of data more efficiently.
- Used for procedure entry and exit for saving and restoring workspace registers and the return address
- For ARM7, it takes 2+Nt cycles (N: number of words, t: time for a sequential word access). It increases interrupt latency since it cannot be interrupted.

Registers are arranged in increasing order; see the manual.

```
LDMIA R1, {R0, R2, R5} @ R0 = mem[R1]
                       @ R2 = mem[R1+4]
                       @ R5 = mem[R1+8]
```

## Multiple load/store register



- LDM: load multiple registers
- STM: store multiple registers

| Suffix | Meaning         |
|--------|-----------------|
| IA     | increase after  |
| IB     | increase before |
| DA     | decrease after  |
| DB     | decrease before |

## Addressing modes



Syntax: <LDM|STM>{<cond>}<addressing mode> Rn{!},<registers>{^}

| Addressing mode | Description      | Start address  | End address  | Rn!        |
|-----------------|------------------|----------------|--------------|------------|
| IA              | increment after  | Rn             | Rn + 4*N - 4 | Rn + 4*N   |
| IB              | increment before | Rn + 4         | Rn + 4*N     | Rn + 4*N   |
| DA              | decrement after  | Rn - 4*N + 4   | Rn           | Rn - 4*N   |
| DB              | decrement before | Rn - 4*N       | Rn - 4       | Rn - 4*N   |



```
LDM<mode> Rn{!}, {<registers>}
N := #<registers>
IA: addr := Rn
IB: addr := Rn + 4
DA: addr := Rn - 4*N + 4
DB: addr := Rn - 4*N
For each Ri in <registers>, in increasing register order
  Ri := M[addr]
  addr := addr + 4
<!>: IA, IB: Rn := Rn + 4*N
     DA, DB: Rn := Rn - 4*N
```
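The start addresses and writeback values from the addressing-mode table can be checked with a small Python model. It is a sketch under stated assumptions: `ldm` is a hypothetical helper, memory is a dictionary keyed by word address, and the case where the base register also appears in the register list is not modelled.

```python
def ldm(mode, rn, reg_list, regs, mem, writeback=False):
    """Model LDM<mode> rn{!}, {reg_list}.

    mode     : "IA", "IB", "DA" or "DB"
    rn       : name of the base register, e.g. "R0"
    reg_list : names of the registers to load, e.g. ["R1", "R2", "R3"]
    regs     : dict of register values, updated in place and returned
    mem      : dict mapping word addresses to values
    """
    n = len(reg_list)
    base = regs[rn]
    start = {
        "IA": base,
        "IB": base + 4,
        "DA": base - 4 * n + 4,
        "DB": base - 4 * n,
    }[mode]
    # The lowest-numbered register is always loaded from the lowest address.
    ordered = sorted(reg_list, key=lambda name: int(name[1:]))
    for i, reg in enumerate(ordered):
        regs[reg] = mem[start + 4 * i]
    if writeback:
        regs[rn] = base + 4 * n if mode in ("IA", "IB") else base - 4 * n
    return regs
```

For example, with memory words 10, 20, ..., 60 at addresses 0x010 to 0x024, `ldm("IB", "R0", ["R1", "R2", "R3"], {"R0": 0x10}, mem, writeback=True)` loads 20, 30 and 40 and leaves R0 at 0x1C.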






Examples, using the following memory contents:

| addr  | data |
|-------|------|
| 0x010 | 10   |
| 0x014 | 20   |
| 0x018 | 30   |
| 0x01C | 40   |
| 0x020 | 50   |
| 0x024 | 60   |

| Instruction                                  | R0 before | R1 | R2 | R3 | R0 after |
|----------------------------------------------|-----------|----|----|----|----------|
| LDMIA R0, {R1,R2,R3} or LDMIA R0, {R1-R3}    | 0x010     | 10 | 20 | 30 | 0x010    |
| LDMIA R0!, {R1,R2,R3}                        | 0x010     | 10 | 20 | 30 | 0x01C    |
| LDMIB R0!, {R1,R2,R3}                        | 0x010     | 20 | 30 | 40 | 0x01C    |
| LDMDA R0!, {R1,R2,R3}                        | 0x024     | 40 | 50 | 60 | 0x018    |
| LDMDB R0!, {R1,R2,R3}                        | 0x024     | 30 | 40 | 50 | 0x018    |



```
PRE mem32[0x80018] = 0x03
mem32[0x80014] = 0x02
mem32[0x80010] = 0x01
r0 = 0x00080010
r1 = 0x00000000
r2 = 0x00000000
r3 = 0x00000000
```



| Address pointer | Memory address | Data       |                 |
|-----------------|----------------|------------|-----------------|
|                 | 0x80020        | 0x00000005 |                 |
|                 | 0x8001c        | 0x00000004 |                 |
|                 | 0x80018        | 0x00000003 | r3 = 0x00000000 |
|                 | 0x80014        | 0x00000002 | r2 = 0x00000000 |
| r0 = 0x80010 →  | 0x80010        | 0x00000001 | r1 = 0x00000000 |
|                 | 0x8000c        | 0x00000000 |                 |

### LDMIA r0!, {r1-r3}

| Address pointer | Memory address | Data       |                 |
|-----------------|----------------|------------|-----------------|
|                 | 0x80020        | 0x00000005 |                 |
| r0 = 0x8001c →  | 0x8001c        | 0x00000004 |                 |
|                 | 0x80018        | 0x00000003 | r3 = 0x00000003 |
|                 | 0x80014        | 0x00000002 | r2 = 0x00000002 |
|                 | 0x80010        | 0x00000001 | r1 = 0x00000001 |
|                 | 0x8000c        | 0x00000000 |                 |



| Address pointer | Memory address | Data       |                 |
|-----------------|----------------|------------|-----------------|
|                 | 0x80020        | 0x00000005 |                 |
|                 | 0x8001c        | 0x00000004 |                 |
|                 | 0x80018        | 0x00000003 | r3 = 0x00000000 |
|                 | 0x80014        | 0x00000002 | r2 = 0x00000000 |
| r0 = 0x80010 →  | 0x80010        | 0x00000001 | r1 = 0x00000000 |
|                 | 0x8000c        | 0x00000000 |                 |

### LDMIB r0!, {r1-r3}

| Address pointer | Memory address | Data       |                 |
|-----------------|----------------|------------|-----------------|
|                 | 0x80020        | 0x00000005 |                 |
| r0 = 0x8001c →  | 0x8001c        | 0x00000004 | r3 = 0x00000004 |
|                 | 0x80018        | 0x00000003 | r2 = 0x00000003 |
|                 | 0x80014        | 0x00000002 | r1 = 0x00000002 |
|                 | 0x80010        | 0x00000001 |                 |
|                 | 0x8000c        | 0x00000000 |                 |

## **Application**



- Copy a block of memory
  - R9: address of the source
  - R10: address of the destination
  - R11: end address of the source

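loop: LDMIA R9!, {R0-R7}

STMIA R10!, {R0-R7}

CMP R9, R11

BNE loop

(Figure: the source block runs from R9 up to R11, and R10 points to the start of the destination block; both blocks are copied from low memory towards high memory.)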
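The copy loop can be traced with a short Python model. This is a sketch, assuming that each iteration first loads eight words with LDMIA R9!, {R0-R7} before the STMIA shown above; `copy_block` and the dictionary-based memory are hypothetical names used only for illustration.

```python
def copy_block(mem, src, dst, end, words_per_iter=8):
    """Copy the words in [src, end) to dst, words_per_iter words at a time.

    Mirrors the loop:
        loop: LDMIA R9!,  {R0-R7}
              STMIA R10!, {R0-R7}
              CMP   R9, R11
              BNE   loop
    Returns the final values of R9 and R10.
    """
    r9, r10, r11 = src, dst, end
    while True:
        # LDMIA R9!, {R0-R7}: load a block and advance the source pointer
        block = [mem[r9 + 4 * i] for i in range(words_per_iter)]
        r9 += 4 * words_per_iter
        # STMIA R10!, {R0-R7}: store the block and advance the destination pointer
        for i, word in enumerate(block):
            mem[r10 + 4 * i] = word
        r10 += 4 * words_per_iter
        # CMP R9, R11 / BNE loop: stop once the source pointer reaches the end
        if r9 == r11:
            break
    return r9, r10
```

Like the assembly, the model only terminates when the source length is a non-zero multiple of the block size (32 bytes for eight registers), because the loop tests for equality rather than for "greater than or equal".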

## **Application**



- Stack (full: the stack pointer points to the last used location; ascending: the stack grows towards increasing memory addresses)

| mode                  | POP   | =LDM  | PUSH  | =STM  |
|-----------------------|-------|-------|-------|-------|
| Full ascending (FA)   | LDMFA | LDMDA | STMFA | STMIB |
| Full descending (FD)  | LDMFD | LDMIA | STMFD | STMDB |
| Empty ascending (EA)  | LDMEA | LDMDB | STMEA | STMIA |
| Empty descending (ED) | LDMED | LDMIB | STMED | STMDA |

```
STMFD R13!, {R2-R9} @ push R2-R9; full descending, used for ATPCS
...                 @ modify R2-R9
LDMFD R13!, {R2-R9} @ pop: restore R2-R9
```



| PRE  | Address | Data       |
|------|---------|------------|
|      | 0x80018 | 0x00000001 |
| sp → | 0x80014 | 0x00000002 |
|      | 0x80010 | Empty      |
|      | 0x8000c | Empty      |

STMFD sp!, {r1,r4}

| POST | Address | Data        |
|------|---------|-------------|
|      | 0x80018 | 0x00000001  |
|      | 0x80014 | 0x00000002  |
|      | 0x80010 | value of r4 |
| sp → | 0x8000c | value of r1 |
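A full descending stack can be modelled in Python as follows. This is a sketch only: `stmfd` and `ldmfd` are hypothetical helper names, memory is a dictionary keyed by word address, and the register values used in the usage note are made up for illustration.

```python
def stmfd(sp, values, mem):
    """Push `values` (listed in increasing register order) on a full descending stack.

    Behaves like STMDB sp!, {...}: sp is decremented first, then the words are
    stored upwards from the new sp. Returns the new sp.
    """
    sp -= 4 * len(values)
    for i, value in enumerate(values):
        mem[sp + 4 * i] = value   # lowest-numbered register at the lowest address
    return sp


def ldmfd(sp, count, mem):
    """Pop `count` words from a full descending stack, like LDMIA sp!, {...}.

    Returns the new sp and the popped values in increasing register order.
    """
    values = [mem[sp + 4 * i] for i in range(count)]
    return sp + 4 * count, values
```

Starting from sp = 0x80014, pushing two registers moves sp to 0x8000c, with the first register stored at 0x8000c and the second at 0x80010; popping two words restores sp to 0x80014.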

## Swap instruction



- Swaps a word or a byte between memory and a register. The operation is atomic: no other instruction can read or write that location until the swap completes.

Syntax: SWP{B}{<cond>} Rd,Rm,[Rn]

| Instruction | Description                               | Operation                                     |
|-------------|-------------------------------------------|-----------------------------------------------|
| SWP         | swap a word between memory and a register | tmp = mem32[Rn]<br>mem32[Rn] = Rm<br>Rd = tmp |
| SWPB        | swap a byte between memory and a register | tmp = mem8[Rn]<br>mem8[Rn] = Rm<br>Rd = tmp   |



```
PRE  mem32[0x9000] = 0x12345678
     r0 = 0x00000000
     r1 = 0x11112222
     r2 = 0x00009000

     SWP r0, r1, [r2]

POST mem32[0x9000] = 0x11112222
     r0 = 0x12345678
     r1 = 0x11112222
     r2 = 0x00009000
```

## **Application**



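```
spin
        LDR r1, =semaphore   ; address of the semaphore (MOV cannot take =label)
        MOV r2, #1
        SWP r3, r2, [r1]     ; hold the bus until complete
        CMP r3, #1
        BEQ spin             ; semaphore was already taken: try again

; Figure: under the OS, process A and process B both run the loop below
; on a shared semaphore s (0 or 1), so the test and the set must be atomic.
;   while (1) {
;     if (s==0) {
;       s=1;
;       // use the resource
;     }
;   }
```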
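The logic of the spin loop can be followed in a small Python model. It is only a sketch: `swp` and `try_acquire` are hypothetical names, memory is a dictionary, and a single-threaded model cannot show the atomicity that the real SWP instruction provides in hardware.

```python
def swp(mem, addr, new_value):
    """Model SWP Rd, Rm, [Rn]: write Rm to memory and return the old word (Rd)."""
    old = mem[addr]
    mem[addr] = new_value & 0xFFFFFFFF
    return old


def try_acquire(mem, semaphore_addr):
    """One pass through the spin loop.

    Returns True if the semaphore was free (and is now taken by the caller),
    False if it was already taken and the caller has to spin again.
    """
    r3 = swp(mem, semaphore_addr, 1)   # SWP r3, r2, [r1] with r2 = 1
    return r3 != 1                     # CMP r3, #1 / BEQ spin
```

Writing 1 unconditionally is harmless: if the semaphore was already 1 it stays 1, and if it was 0 the caller both sees the 0 and claims the semaphore in the same step.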

## Software interrupt



 A software interrupt instruction causes a software interrupt exception, which provides a mechanism for applications to call OS routines.

Syntax: SWI{<cond>} SWI_number

| Instruction | Description        | Operation                                                                                                                                           |
|-------------|--------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------|
| SWI         | software interrupt | lr_svc = address of instruction following the SWI<br>spsr_svc = cpsr<br>pc = vectors + 0x8<br>cpsr mode = SVC<br>cpsr I = 1 (mask IRQ interrupts) |



```
PRE  cpsr = nzcVqift USER
     pc = 0x00008000
     lr = 0x003fffff ; lr = r14
     r0 = 0x12

     0x00008000 SWI 0x123456

POST cpsr = nzcVqIft SVC
     spsr = nzcVqift USER
     pc = 0x00000008
     lr = 0x00008004
     r0 = 0x12
```
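The state change on SWI entry can be written out as a small Python model. This is a sketch under assumptions: `swi_entry`, the dictionary layout of the state, and a vector table based at address 0 (`VECTORS`) are illustrative choices, and only the fields listed in the table above are modelled.

```python
VECTORS = 0x00000000  # assumed base address of the exception vector table


def swi_entry(state):
    """Return the processor state after an SWI.

    `state["pc"]` is taken to be the address of the SWI instruction itself,
    and `state["cpsr"]` is a dict holding the mode and the I, F, T bits.
    """
    new = dict(state)
    new["lr_svc"] = state["pc"] + 4                      # instruction following the SWI
    new["spsr_svc"] = dict(state["cpsr"])                # saved copy of the old cpsr
    new["pc"] = VECTORS + 0x8                            # SWI exception vector
    new["cpsr"] = dict(state["cpsr"], mode="SVC", I=1)   # enter SVC mode, mask IRQ
    return new
```

Applied to the PRE state above (pc = 0x8000, USER mode, I = 0), the model yields pc = 0x8, lr_svc = 0x8004 and SVC mode with I = 1, while r0 is carried over unchanged so that it can pass a parameter to the handler.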

#### Load constants



- No single ARM instruction can load an arbitrary 32-bit constant into a register, because the instructions themselves are only 32 bits long. There are pseudo-instructions for this.

Syntax: LDR Rd, =constant

Syntax: ADR Rd, label

| Instruction | Description                     | Operation                    |
|-------------|---------------------------------|------------------------------|
| LDR         | load constant pseudoinstruction | Rd = 32-bit constant         |
| ADR         | load address pseudoinstruction  | Rd = 32-bit relative address |

### Immediate numbers





### Load constants



- Assemblers usually implement this with one of two options, depending on the number you try to load.

| Pseudoinstruction  | Actual instruction       |
|--------------------|--------------------------|
| LDR r0, =0xff      | MOV r0, #0xff            |
| LDR r0, =0x5555555 | LDR r0, [pc, #offset_12] |

## Loading the constant 0xff00ffff

```
LDR r0, [pc, #constant_number-8-{PC}] ; option 1: load the literal from memory
:
constant_number
DCD 0xff00ffff

MVN r0, #0x00ff0000                   ; option 2: r0 = NOT 0x00ff0000 = 0xff00ffff
```

#### Load constants



- Assume that you want to load 511 into R0
  - Construct in multiple instructions

```
mov r0, #256
add r0, r0, #255
```

- Load from memory; declare L511: .word 511
  - ldr r0, L511 is assembled as a PC-relative load such as ldr r0, [pc, #0]
- Guideline: if you can construct it in two instructions, do it; otherwise, load it.
- The assembler decides for you

```
ldr r0, =255 → mov r0, #255
ldr r0, =511 → ldr r0, [pc, #4]
```
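How an assembler might make this choice can be sketched in Python. The rule used here is that an ARM data-processing immediate is an 8-bit value rotated right by an even number of bits. The function names and the exact output strings are assumptions for illustration, and real assemblers may apply further rules.

```python
MASK32 = 0xFFFFFFFF


def is_arm_immediate(value):
    """True if `value` is an 8-bit number rotated right by an even amount."""
    value &= MASK32
    for rot in range(0, 32, 2):
        # Rotating left by `rot` undoes a rotate-right by `rot`.
        rotated = ((value << rot) | (value >> (32 - rot))) & MASK32
        if rotated < 256:
            return True
    return False


def expand_ldr_constant(value):
    """Pick an instruction for the pseudo-instruction `ldr r0, =value`."""
    value &= MASK32
    if is_arm_immediate(value):
        return f"mov r0, #{value:#x}"
    inverted = ~value & MASK32
    if is_arm_immediate(inverted):
        return f"mvn r0, #{inverted:#x}"
    return "ldr r0, [pc, #offset]"   # fall back to a literal stored in memory
```

With this rule, 255 and 256 are valid immediates, 511 is not (it needs nine contiguous bits), and 0xff00ffff is reachable through MVN because its complement 0x00ff0000 is a valid immediate.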

### PC-relative modes





## PC-relative addressing



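```
main:
      MOV R0, #0
      ADR R1, a       @ add r1, pc, #4
      STR R0, [R1]
      SWI #11         @ <- PC points here while ADR executes
a:    .word 100
      .end

@ Three-stage pipeline: while ADR is in exec, STR is in decode and
@ SWI is being fetched, so PC = address of ADR + 8.
@   ADR:  fetch  decode  exec
@   STR:         fetch   decode  exec
@   SWI:                 fetch   decode  exec
```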
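The offset that ADR encodes follows directly from the pipeline: when an instruction executes, PC reads as that instruction's address plus 8. A minimal Python sketch, where `pc_relative_offset` is a hypothetical helper and the addresses assume the program is placed at address 0:

```python
def pc_relative_offset(instr_addr, target_addr):
    """Offset to encode so that pc + offset == target_addr.

    During execute, pc reads as the instruction's own address + 8.
    """
    return target_addr - (instr_addr + 8)


# Layout of the example: MOV at 0, ADR at 4, STR at 8, SWI at 12, a at 16.
adr_offset = pc_relative_offset(4, 16)       # the "add r1, pc, #4" case

# Layout of the earlier ADR example: table at 0, ADR at 4.
table_offset = pc_relative_offset(4, 0)      # the "sub r0, pc, #12" case
```

A positive result becomes an ADD from PC and a negative one becomes a SUB, which is why the same pseudo-instruction expands to `add r1, pc, #4` in one program and to `sub r0, pc, #12` in the other.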

## Instruction set



| Mnemonic | Meaning                       | Mnemonic | Meaning                     |
|----------|-------------------------------|----------|-----------------------------|
| ADC      | Add with Carry                | MVN      | Logical NOT                 |
| ADD      | Add                           | ORR      | Logical OR                  |
| AND      | Logical AND                   | RSB      | Reverse Subtract            |
| BAL      | Unconditional Branch          | RSC      | Reverse Subtract with Carry |
| B⟨cc⟩    | Branch on Condition           | SBC      | Subtract with Carry         |
| BIC      | Bit Clear                     | SMLAL    | Mult Accum Signed Long      |
| BLAL     | Unconditional Branch and Link | SMULL    | Multiply Signed Long        |
| BL⟨cc⟩   | Conditional Branch and Link   | STM      | Store Multiple              |
| CMP      | Compare                       | STR      | Store Register (Word)       |
| EOR      | Exclusive OR                  | STRB     | Store Register (Byte)       |
| LDM      | Load Multiple                 | SUB      | Subtract                    |
| LDR      | Load Register (Word)          | SWI      | Software Interrupt          |
| LDRB     | Load Register (Byte)          | SWP      | Swap Word Value             |
| MLA      | Multiply Accumulate           | SWPB     | Swap Byte Value             |
| MOV      | Move                          | TEQ      | Test Equivalence            |
| MRS      | Load SPSR or CPSR             | TST      | Test                        |
| MSR      | Store to SPSR or CPSR         | UMLAL    | Mult Accum Unsigned Long    |
| MUL      | Multiply                      | UMULL    | Multiply Unsigned Long      |