## Goals for today

- The little button that wouldn't :(
  - the volatile keyword
- Pointer operations => ARM addressing modes
- Implementation of C function calls
- Management of runtime stack, register use









### The little button that wouldn't

button.c



Want cool diagrams like this? Check out <u>fritzing.org</u>



```
// This program waits until a button is pressed (GPIO 10)
// and turns on GPIO 20, then waits until the button is
// released and turns off GPIO 20
unsigned int * const FSEL1 = (unsigned int *)0x20200004;
unsigned int * const FSEL2 = (unsigned int *)0x20200008;
unsigned int * const SET0  = (unsigned int *)0x2020001C;
unsigned int * const CLR0  = (unsigned int *)0x20200028;
unsigned int * const LEV0  = (unsigned int *)0x20200034;

void main(void)
{
    *FSEL1 = 0; // configure GPIO 10 as input
    *FSEL2 = 1; // configure GPIO 20 as output
    while (1) {
        // wait until GPIO 10 is low (button press)
        while ((*LEV0 & (1 << 10)) != 0);
        // set GPIO 20 high
        *SET0 = 1 << 20;
        // wait until GPIO 10 is high (button release)
        while ((*LEV0 & (1 << 10)) == 0);
        // clear GPIO 20
        *CLR0 = 1 << 20;
    }
}
```

### Compiling with -O2:

```
Disassembly of section .text.startup:

00000000 <main>:
   0:   ldr   r3, [pc, #28]   ; 24 <main+0x24>
   4:   ldr   r0, [r3, #52]   ; 0x34
   8:   mov   r1, #0
   c:   mov   r2, #1
  10:   tst   r0, #1024       ; 0x400
  14:   stmib r3, {r1, r2}
  18:   bne   20 <main+0x20>
  1c:   b     1c <main+0x1c>
  20:   b     20 <main+0x20>
  24:   .word 0x20200000
```

# What happened to our testing loops??

### Peripheral Registers

These registers are mapped into the address space of the processor (memory-mapped IO).

These registers may behave differently than memory.

For example: writing a 1 into a bit in a SET register causes 1 to be output; writing a 0 into a bit in a SET register does not affect the output value. Writing a 1 into a bit in a CLR register sets the output to 0; writing a 0 into a bit in a CLR register has no effect. Neither SET nor CLR can be read. To read the current value, use the LEV (level) register.
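The SET/CLR/LEV behavior described above can be modeled in plain C and tried on a laptop. This is only a sketch of the semantics, not the real hardware: the `fake_gpio` struct and the `write_set`, `write_clr`, and `read_lev` helpers are hypothetical names invented for illustration.

```c
#include <stdint.h>

// Hypothetical software model of one bank of GPIO output pins (not real hardware).
struct fake_gpio {
    uint32_t level;   // current level of each pin, one bit per pin
};

// Writing 1 bits to SET drives those pins high; 0 bits have no effect.
static void write_set(struct fake_gpio *g, uint32_t bits)
{
    g->level |= bits;
}

// Writing 1 bits to CLR drives those pins low; 0 bits have no effect.
static void write_clr(struct fake_gpio *g, uint32_t bits)
{
    g->level &= ~bits;
}

// LEV is the only register that reports the current pin values.
static uint32_t read_lev(const struct fake_gpio *g)
{
    return g->level;
}
```

Because 0 bits are ignored, a store such as `*SET0 = 1 << 20` changes pin 20 without a read-modify-write and without disturbing the other pins.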

### volatile

For an ordinary variable, the compiler can use its knowledge of when it is read/written to optimize accesses as long as it keeps the same externally visible behavior.

However, for a variable that can be read/written externally (by another process, by peripheral), these optimizations will not be valid.

The **volatile** qualifier applied to a variable informs the compiler that it cannot remove, coalesce, cache, or reorder references. The generated assembly must faithfully execute each access to the variable as given in the C code.

GPIO pin levels on the Raspberry Pi can change externally to the program, so we must tell the C compiler not to optimize away pin reads.

To do this, we add the volatile qualifier to the declarations of the pointers to hardware registers:

```
volatile unsigned int * const FSEL1 = (unsigned int *)0x20200004;
volatile unsigned int * const FSEL2 = (unsigned int *)0x20200008;
volatile unsigned int * const SET0 = (unsigned int *)0x2020001C;
volatile unsigned int * const CLR0 = (unsigned int *)0x20200028;
volatile unsigned int * const LEV0 = (unsigned int *)0x20200034;
```
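As a rough sketch of how a volatile-qualified pointer is used for polling, the helper below tests the button pin through a pointer parameter. The name `button_is_pressed` and the `BUTTON_PIN` constant are assumptions made for this example; taking the register as a parameter lets an ordinary variable stand in for LEV0 when trying the code off the Pi.

```c
#include <stdbool.h>

#define BUTTON_PIN 10   // the button is wired to GPIO 10 in button.c

// Returns true when the button pin reads low (pressed).
// Because lev points to a volatile unsigned int, the load below is
// performed on every call and cannot be cached by the compiler.
static bool button_is_pressed(volatile unsigned int *lev)
{
    return (*lev & (1u << BUTTON_PIN)) == 0;
}
```

On the Pi, the argument would be the `LEV0` pointer declared above, and a wait loop becomes `while (!button_is_pressed(LEV0));`.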

There are other times to use volatile, too — delays have a similar problem:

```
#define DELAY 500000000
int main()
{
   for (int i=0; i < DELAY; i++);
   return 0;
}
```

```
$ objdump -d testLoop.o

testLoop.o: file format elf32-littlearm

Disassembly of section .text.startup:

00000000 <main>:
    0: e3a00000 mov r0, #0
    4: e12fff1e bx lr
```

### No loop — it has been optimized out!

Declaring the loop counter volatile forces the compiler to keep every access to it:

```
#define DELAY 500000000
int main()
{
    for (volatile int i=0; i < DELAY; i++);
    return 0;
}
```

```
Disassembly of section .text.startup:

00000000 <main>:
   0: e24dd008   sub   sp, sp, #8
   4: e3a03000   mov   r3, #0
   8: e58d3004   str   r3, [sp, #4]
   c: e59d3004   ldr   r3, [sp, #4]
  10: e59f2028   ldr   r2, [pc, #40]   ; 40 <main+0x40>
  14: e1530002   cmp   r3, r2
  18: ca000005   bgt   34 <main+0x34>
  1c: e59d3004   ldr   r3, [sp, #4]
  20: e2833001   add   r3, r3, #1
  24: e58d3004   str   r3, [sp, #4]
  28: e59d3004   ldr   r3, [sp, #4]
  2c: e1530002   cmp   r3, r2
  30: dafffff9   ble   1c <main+0x1c>
  34: e3a00000   mov   r0, #0
  38: e28dd008   add   sp, sp, #8
  3c: e12fff1e   bx    lr
  40: 1dcd64ff   .word 0x1dcd64ff
```

The loop remains when we use volatile.
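The same idea can be packaged as a reusable busy-wait helper. This is a minimal sketch; `spin` is a hypothetical name, and how long each iteration takes depends on the processor clock and the generated code, so the count is not a time unit.

```c
// Busy-wait for count iterations and return how many were executed.
// The volatile counter forces a real load and store on each iteration,
// so the compiler must keep the loop even when optimizing.
static unsigned int spin(unsigned int count)
{
    volatile unsigned int i;
    for (i = 0; i < count; i++) {
        // intentionally empty
    }
    return i;
}
```

Returning the final counter value is only there to make the helper easy to check; a delay routine on the Pi could just as well return void.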

### What is 'bare metal'?

The default build process for C assumes a hosted environment. It provides standard libraries, all the stuff that happens before main.

To build bare-metal, our makefile disables these defaults; we must supply our own versions when needed.

```
int getRandomNumber()
{
    return 4; // chosen by fair dice roll.
    // guaranteed to be random.
}
```

### Makefile settings

Compile freestanding

CFLAGS = -ffreestanding

Link without standard libs and start files

LDFLAGS = -nostdlib

Link with gcc to support division (violates

LDLIBS = -lgcc

Must supply own replacement for libs/start

That's where the fun is...!
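As one example of supplying your own replacement, here is a minimal sketch of a byte-filling routine in the style of the standard `memset`. The name `my_memset` is hypothetical, chosen so the sketch does not collide with the hosted library when tried on a laptop; it is not taken from the course code.

```c
#include <stddef.h>   // size_t; this header is available even when freestanding

// Fill the first n bytes at dst with the byte value c, like the standard memset.
static void *my_memset(void *dst, int c, size_t n)
{
    unsigned char *p = dst;
    while (n-- > 0) {
        *p++ = (unsigned char)c;
    }
    return dst;
}
```

A byte-at-a-time loop is the simplest correct version; a tuned library version would typically store whole words where alignment allows.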

### Pointers: more gain than pain!

"The fault, dear Brutus, is not in our stars But in ourselves, that we are underlings." Julius Caesar (I, ii, 140-141)

# Referring to data by address or relative position is very useful!

- Sharing instead of copying
- Access to fields of a struct
- Array elements accessed by index
- Construct linked structures (lists, trees, graphs)
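The uses listed above can be sketched in a few lines of C. The `node` struct and the `sum_array` and `sum_list` functions are hypothetical names made up for this illustration.

```c
#include <stddef.h>

struct node {
    int value;
    struct node *next;   // link to the next node, or NULL at the end
};

// Sharing instead of copying: the caller's array is reached through a pointer.
static int sum_array(const int *arr, size_t n)
{
    int total = 0;
    for (size_t i = 0; i < n; i++) {
        total += arr[i];          // arr[i] means *(arr + i)
    }
    return total;
}

// Linked structure: follow next pointers until NULL.
static int sum_list(const struct node *head)
{
    int total = 0;
    for (const struct node *cur = head; cur != NULL; cur = cur->next) {
        total += cur->value;      // field access at a fixed offset from cur
    }
    return total;
}
```

A compiler will typically turn `arr[i]` into a load with a register offset and `cur->next` into a load with a constant offset, though the exact instructions depend on the compiler and flags.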



### Excerpted from blink.s

```
loop:
  ldr r0, =0x2020001C // set pin
  str r1, [r0]
  mov r2, #0x3F0000 // delay loop
  wait1:
     subs r2, #1
     bne wait1
  ldr r0, =0x20200028 // clear pin
  str r1, [r0]
  mov r2, #0x3F0000 // delay loop
  wait2:
     subs r2, #1
     bne wait2
b loop
```

```
1dr r0, =0x2020001C
  str r1, [r0]
  b delay
  1dr r0, =0x20200028
  str r1, [r0]
  b delay
  b loop
delay:
  mov r2, #0x3F0000
  wait:
    subs r2, #1
    bne wait
// but... where to go next?
```

```
loop:
   ldr r0, =0x2020001C
   str r1, [r0]
   mov r14, pc
   b delay
   ldr r0, =0x20200028
   str r1, [r0]
   mov r14, pc
   b delay
   b loop
delay:
   mov r2, #0x3F0000
   wait:
     subs r2, #1
     bne wait
   mov pc, r14
```

We've just invented our own link register!

```
loop:
   ldr r0, =0x2020001C
   str r1, [r0]
   mov r0, #0x3F0000
   mov r14, pc
   b delay
   ldr r0, =0x20200028
   str r1, [r0]
   mov r0, #0x3F0000 >> 2
   mov r14, pc
   b delay
   b loop
delay:
   wait:
     subs r0, #1
     bne wait
   mov pc, r14
```

# We've just invented our own parameter passing!

# Anatomy of C function call

```
int sum(int n)
{
    int total = 0;
    for (int i = 1; i < n; i++)
        total += i;
    return total;
}
```

- Call and return
- Pass arguments
- Local variables
- Return value
- Scratch/work space

Complication: nested function calls, recursion

# Application binary interface

ABI specifies how code interoperates:

- Mechanism for call/return
- How parameters passed
- How return value communicated
- Use of registers (ownership/preservation)
- Stack management (up/down, alignment)

arm-none-eabi is ARM embedded ABI ("none" refers to no hosting OS)

### Mechanics of call/return

Caller puts up to 4 arguments in r0-r3
Call instruction is bl (branch and link)

Callee puts return value in r0
Return instruction is bx (branch exchange)

```
add r0, r0, r1
bx lr // pc=lr
```

### Caller and Callee

caller - function doing the calling

callee - function called

main is <u>caller</u> of binky

binky is <u>callee</u> of main + <u>caller</u> of winky

```
int winky(int x, int y);
void binky(int a);

void main(void) {
   binky(3);
}

void binky(int a) {
   winky(10, a);
}

int winky(int x, int y) {
   return x + y;
}
```

# Register Ownership

r0-r3 are callee-owned registers

- Callee can change these registers
- Caller cedes to callee, cannot assume value will be preserved across call to callee

r4-r13 are caller-owned registers

- Callee must preserve values in these registers
- Caller retains ownership, expects value to be same after call as it was before call

### Discuss

1. If the callee needs scratch space for an intermediate value, which type of register should it choose?
2. What must a callee do when it wants to use a caller-owned register?
3. What is the advantage in having some registers callee-owned and others caller-owned? Why not treat all the same?
4. How can we implement nested calls when we only have a single shared lr register?

### The stack to the rescue!

Region in memory to store local variables, scratch space, <u>save register values</u>

- LIFO: push adds a value on top of the stack, pop removes the most recently pushed value
- r13 (alias sp) points to topmost value
- stack grows down
  - newer values at lower addresses
  - push subtracts from sp
  - pop adds to sp
- push/pop are aliases for a general instruction (load/store multiple with writeback)

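```
// start.s
mov sp, #0x8000000
bl main

// main.c
void main(void)
{
    binky(3);
}

int binky(int a)
{
    int arr[100];
    return winky(arr, 100);
}

// Memory layout (diagram not to scale)
//
// 0x20200000   gpio
//   ...
// 0x8000000    sp ->  (initial value, before main)
//              main   frame
//              sp ->  (while main runs)
//              binky  frame
//              sp ->  (while binky runs)
//   ...
//              code   <- pc
// 0x8000
// 0x0
```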

# Single stack frame

```
int winky(int a, int b)
{
  int c = 2*a;
  ...
  return c;
}
```

Frame layout, from higher to lower addresses:

- caller's frame
- saved regs
- locals/scratch (sp → points at the lowest word)

# Stack operations

```
// PUSH (store reg to stack)
// *--sp = r0
// decrement sp before store
push {r0}
// equivalent to:
         str r0, [sp, #-4]!
// POP (restore reg from stack)
// r0 = *sp++
// increment sp after load
pop {r0}
// equivalent to:
         ldr r0, [sp], #4
```
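The push and pop pseudocode above can be tried on a laptop with a small array standing in for stack memory. This is a sketch under that assumption: `stack_mem`, `STACK_WORDS`, and the C variable `sp` are hypothetical stand-ins for real memory and the r13 register, and there are no overflow or underflow checks.

```c
#include <stdint.h>

#define STACK_WORDS 8

// Hypothetical model of a full-descending stack: sp points at the topmost
// value, and the stack grows toward lower addresses.
static uint32_t stack_mem[STACK_WORDS];
static uint32_t *sp = stack_mem + STACK_WORDS;   // empty stack: one past the end

// push: decrement sp, then store (like str r0, [sp, #-4]!)
static void push(uint32_t value)
{
    *--sp = value;
}

// pop: load, then increment sp (like ldr r0, [sp], #4)
static uint32_t pop(void)
{
    return *sp++;
}
```

Pushing two values and popping twice returns them in reverse order and leaves `sp` where it started, which is the property a callee relies on when it saves and restores registers.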

```
sp →
saved r0
```

```
int winky(int a, int b)
{
  int c = binky(a);
  return b + c;
}
```

If winky calls binky...

Why do they collide on use of lr?

Is there similar collision for r0? r1?

What do we do about it?

use stack as temp storage!

# example.c

[Diagram: step-by-step trace of example.c, showing the stack memory words at descending addresses from the top of the stack, alongside the registers r0, r1, r2, r3, lr, sp, and pc.]

# sp in constant motion

Access values on stack using sp-relative addressing, but ....

sp is constantly changing! (push, pop, add sp, sub sp)

Frame layout, from higher to lower addresses:

- caller's frame
- saved regs
- locals/scratch (sp → points at the lowest word, and keeps moving)

# Add frame pointer (fp)

Dedicate fp register to be used as fixed anchor

Offsets relative to fp stay constant!



### APCS "full frame"

APCS = ARM Procedure Call Standard

Conventions for use of frame pointer + frame layout that allows for reliable stack introspection

gcc CFLAGS to enable: -mapcs-frame

r11 used as fp

Adds a prolog/epilog to each function that sets up/tears down the standard frame and manages fp

### **Trace APCS**

#### Prolog

push fp, r13, lr, pc
set fp to first word of stack frame

#### **Body**

fp stays anchored
access data on stack fp-relative
offsets won't vary even if sp changing

#### **Epilog**

pop fp, r13, lr
can't pop pc (**why not**?), manually adjust stack

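Frame layout, from higher to lower addresses:

- caller's frame (sp → points here on entry)
- pc (fp → points here)
- lr
- r13/ip
- fp
- locals/scratch/call other fns (sp → points at the lowest word)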

### FPs form linked chain

other = additional saved regs, locals, scratch



```
// start.s

// Need to initialize fp = NULL
// to terminate end of chain

mov sp, #0x8000000
mov fp, #0 // fp = NULL
bl main
```
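The linked chain of frame pointers is what makes a backtrace possible. The sketch below models the walk with a simplified, hypothetical `frame` struct. A real APCS frame also holds the saved sp and pc and is reached through fixed offsets from fp, so this shows the idea rather than the actual layout.

```c
#include <stddef.h>

// Simplified, hypothetical model of a saved frame record.
struct frame {
    const struct frame *caller_fp;   // saved fp of the caller, NULL at the outermost frame
    unsigned int saved_lr;           // return address into the caller
};

// Walk the chain of frame pointers and count the frames.
// Stops at the NULL that start.s stored in fp before calling main.
static int backtrace_depth(const struct frame *fp)
{
    int depth = 0;
    while (fp != NULL) {
        depth++;
        fp = fp->caller_fp;
    }
    return depth;
}
```

A debugger-style backtrace would report `saved_lr` for each frame during the same walk instead of only counting. Without the `fp = NULL` initialization in start.s, the loop would have no reliable place to stop.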

### APCS Pros/Cons

- + Anchored fp, offsets are constant
- + Standardized frame layout enables introspection
- + Backtrace for debugging
- + Unwind stack on exception
- − Expensive, every function call affected
  - prolog/epilog add ~5 instructions
  - 4 registers push/pop => add 16 bytes per frame
  - consumes one of our precious registers