# Introduction to Microcontrollers Notes

James Gowans

September 8, 2014

## Licence

This work is licensed under the Creative Commons Attribution-ShareAlike 3.0 Unported License. To view a copy of this license, visit http://creativecommons.org/licenses/by-sa/3.0/ or send a letter to Creative Commons, 444 Castro Street, Suite 900, Mountain View, California, 94041, USA.

# Contents

| 1 |                                             | 5  |
|---|---------------------------------------------|----|
|   | 1.1 What is a Microcontroller?              |    |
|   | 1.1.1 Development board block diagram       | 6  |
| 2 | Memory Model                                | 10 |
|   | 2.1 Data Types and Endianness               | 10 |
| 3 | The ARM Cortex-M0                           | 13 |
|   | 3.1 Programmer's Model of the CPU           | 13 |
|   | 3.2 CPU Architecture                        |    |
|   | 3.3 Program Counter                         | 14 |
|   | 3.3.1 Three stage pipeline                  | 15 |
|   | 3.4 Reset Vector                            | 15 |
| 4 | Coding                                      | 17 |
|   | 4.1 Assembly                                |    |
|   | 4.1.1 Instruction Sets                      |    |
|   | 4.2 Linking                                 | 18 |
|   | 4.2.1 Executing Code                        | 18 |
| 5 | Loading and Storing                         | 19 |
|   | 5.1 Immediate Offset Loading                | 19 |
|   | 5.1.1 Offset restrictions                   |    |
|   | 5.2 Program Counter Relative Loading        | 20 |
|   | 5.3 Register Offset Loading                 | 21 |
|   | 5.4 Storing                                 | 21 |
|   | 5.5 Accessing of Datatypes Other Than Words | 21 |
| 6 | Branching                                   | 22 |
|   | 6.1 Implementation of a Branch              | 22 |
|   | 6.2 Using labels                            | 22 |
| 7 | Perihperals                                 | 24 |
|   | 7.1 Internal Peripherals                    | 24 |
| 8 | Clock Distribution                          | 27 |
| _ | 8.1 Reset and Clock Control                 |    |
|   | 8.1.1 Clock Source                          |    |
|   | 8.2 CPU Instruction Cycles                  |    |

| 9  | Gene  | eral Purpose Input/Outputs                                    | 29 |
|----|-------|---------------------------------------------------------------|----|
|    | 9.1   | Pin Mode                                                      | 29 |
|    |       | 9.1.1 Input Mode                                              | 29 |
|    |       | 9.1.2 Output Mode                                             | 29 |
|    | 9.2   | Pull resistors                                                | 30 |
| 10 | Con   | ditional Branching                                            | 32 |
|    | 10.1  | Application Program Status Register                           | 32 |
|    | 10.2  | Overflow Flags                                                | 32 |
|    |       | 10.2.1 Unsigned Numbers and the C Flag                        | 33 |
|    |       | 10.2.2 Signed Numbers and the V Flag                          | 33 |
|    | 10.3  | Compare Instruction                                           | 33 |
|    |       | 10.3.1 A note on the implementation of the subtract operation | 34 |
|    | 10.4  | Condition Code Suffixes                                       | 34 |
|    |       | Branching Based on Individual Bits                            | 34 |
| 11 | Instr | ruction Sets                                                  | 36 |
|    | 11.1  | ARM                                                           | 36 |
|    | 11.2  | Thumb                                                         | 36 |
|    | 11.3  | Thumb-2                                                       | 36 |
|    |       | Implementation of Interworking                                | 38 |
| 12 | Exce  | eptions                                                       | 39 |
|    | 12.1  | Reset                                                         | 39 |
|    | 12.2  | HardFault                                                     | 41 |
| 13 | Stac  | :k                                                            | 42 |
|    | 13.1  | Stack Pointer                                                 | 42 |
|    |       | Stack Access Instructions                                     |    |

# 1 System Overview

#### 1.1 What is a Microcontroller?

The microcontroller can be understood by comparing it to something you are already very familiar with: the computer. Both a microcontroller and a computer can be modeled as a black box which takes in data and instructions, performs processing, and provides output. In order to do this, a micro has some of the same internals as a computer, shown graphically in Figure 1.1 and discussed now:

- CPU: The section of the microcontroller which does the processing. It executes instructions which allows it to do arithmetic and logic operations, amongst other forms of operations.
- Volatile memory (RAM:) This is general purpose memory. It can be used for storing whatever you want to store in it. Typically it stores variables which are created or changed during the course of execution of a program.
- Non-volatile memory (Flash): This non-volatile memory is used to store any date which must not be lost when the power to the micro is removed. Typically this would include the program code and any constants or initial values of data.
- Ports: Interfaces for data to move in and out of the micro. This allow it to communicate with the outside world.

These resources are typically orders of magnitude smaller or a micro than on a conventional computer. A micro makes up for this lack of resources with a small size, low power and low cost. A comparison of the characteristics can be seen in Table 1.1. A computer is typically defined as a multi-purpose, flexible unit able to do computation. A microcontroller on the other hand typically is hard-coded to do one specific job.

The terms *microcontroller* and *microprocessor* are different and should not be used interchangeably. A microprocessor is an IC which is able to perform computation, but requires external memory and peripherals to function. A microcontroller has the memory and peripherals built into it, allowing it to be fully independent. Furthermore, the interface in and out of a microprocessor is mainly just an address and data bus. In a microcontroller, these busses are internal to the device. The interfaces in and out of a microcontroller are configurable to be a wide variety of communication standards. This self-contained nature and ability to deal with a wide variert of signals allows a microcontroller to (as the name suggest) be embedded in a larger

|          | CPU         | $\mathbf{R}\mathbf{A}\mathbf{M}$ | Non-volatile | Power  | ${\bf Size/Mass}$      | $\mathbf{Cost}$ |
|----------|-------------|----------------------------------|--------------|--------|------------------------|-----------------|
| Computer | Dual, 3 GHz | 4 GiB                            | 500 GB       | 100 W  | Large                  | R 3000          |
| Micro    | 48 MHz      | 8 KiB                            | 32  KiB      | 50  mW | $\operatorname{Small}$ | R 15            |

Table 1.1: Comparison of specs of entry level computer to STM32F051C6.



Figure 1.1: The most simplified view of the internals of the STM32F051

system and perform control and monitoring functions.

The micro we will be using is the STM32F051C6. It is manufactured by ST Microelectronics, but has an ARM Cortex-M0 CPU. ARM designed the CPU (specified how the transistors connect together). ST then takes this CPU design, adds it to their design for all of the other bits of the micro (flash, RAM, ports and much much more) and then produces the chip.

#### 1.1.1 Development board block diagram

The development board consists of modules which connect to the microcontroller. Most of these modules are optional in that they are not required for the microcontroller to run. We will develop code later in the course to interface with some of these modules. Those which are not optional are the voltage regulator and the debugger. Following is a brief discussion of the purpose of each of the dev board modules (peripherals).

- STM32F051C6: This is the target microcontroller. It is connected to everything else on the board and it is where the code which we develop will execute.
- Debugger: this is essentially another microcontroller running special code on it which allows it to be able to pass information between a computer and the target microcontroller. The interface to the computer is a USB connection, and the interface to the target is a protocol called Serial Wire Debug (SWD) which is similar to JTAG. The specific type of debugger which we have is a ST-Link.
- Regulator: A MCP1702-33/T0 chip. This converts the 5 V provided by the USB port into 3.3 V suitable for running most of the circuitry on the board.
- LEDs: One byte of LEDs, active high connected to the lower byte of port B.
- Push buttons: Active low push buttons connected to the lower nibble of port A.
- Pots: 2 x 10K (or there abouts) potentiometers connected to PA5 and PA6.
- LCD Screen: A 16x2 screen connected to the micro in 4-bit mode. Used to display text.



Figure 1.2: Modules on the dev board as seen when top boards unplugged or plugged in.



Figure 1.3: Highly simplified diagram showing how micro and computer communicate

- LCD contrast pot: The output of this potentiometer connects to the contrast pin of the LCD screen, hence allowing contrast adjustment.
- MAX232: This chips translates between TTL or CMOST logic level UART traffic and bi-polar higher voltage RS-232 traffic. Used for industrial communications links.
- USB for comms: The header allows intercepting of the UART traffic before it gets to the MAX232 and converting it to USB traffic through a small board which plugs into that header. When this facility is not being used, the jumpers on the header should be placed to allow the UART traffic to make its way to the MAX232.
- Temperature sensor: A TC74-A0  $I^2C$  temperature sensor.
- Crystal: 8 MHz quartz oscillator with 10 pF caps for removing high frequency harmonics.
- EEPROM: A 25LC640A 64Kb Electronically Erasable and Programmable Read Only Memory (EEPROM) chip which communicates over SPI.
- RG LED: Common cathode Red/Green LED.

The full circuit schematic for the board follows. For now, we will forget about all of the other modules on the dev board and consider our system to be a computer talking to a debugger talking to a target micro, as shown in Figure 1.3. This is the most basic system which must be understood to allow us to load code onto the target microcontroller.



# 2 Memory Model

We will now beging to expand on some of the block is Figure 1.1. Before starting to explore how the CPU works, it's useful to have an understanding of how memory is laid out. We will start looking at the flash and RAM blocks. Together with another block called perihperals (which we will explore later), these blocks make up memory. It's important to note that this memory is located *outside* of the CPU, but still inside the microcontroller IC.

The memory of a device can be though of as a very long row of post boxes along a street. Each post box has an address, and each post box can have data put into it or taken out. The amount of data that each post box can hold is 8 bits, or one byte. Therefore, each memory address is said to address one byte. The address of each post box is 32 bits long, meaning that addresses range from 0 (0x00000000) to just over 4 billion (0xFFFFFFFF). In actual fact, the vast majority of these addresses do not have a post box at them. These addresses are said to be unimplemented. Only very small sections of this address space are implemented and can actually be read from or written to. Flash and RAM are contiguous blocks of memory, with a start address and an end address. A simplified memory map of the STM32F051 is shown in Figure 2.1. From this, we can see that if we want to use changeable variables in our programs, the variables should be located at addresses between 0x2000 0000 and 0x2000 1FFF. If we want to load code onto the micro which should not be lost when the device loses power, the code should be loaded into addresses between 0x0800 0000 and 0x0800 7FFF.

## 2.1 Data Types and Endianness

Very often we will need to work with clumps of data which are larger than 1 byte. ARM defines datatypes for a 32 bit CPU as follows:

• byte: 8 bits

• halfword: 16 bits

• word: 32 bits

• doubleword: 64 bits

Each memory address only addresses one byte of memory, so how can something like a word (four bytes) be stored in memory? Obviously, the four bytes have to come after each other to form a four byte block, or word. However, it is not obvious which order they should come in. For example, consider the case of wanting to store the word 0xAABBCCDD in address 0. The two possible ways of doing it are shown in Table 2.1. It doesn't really matter which one of these schemes is used - they each have their pros and cons and different processors use different methods. It is important to know which one our processor has chosen to use. Our processor uses little endian. A more abstract view of how data is stored in our processor is given in Figure 2.2



Figure 2.1: Simplified STM32F051C6 memory map. Note how all addresses are 32 bits. The blocks are very much not to scale. Source: datasheet, Figure 9

| Little E | ndian |   | Big Endian |      |  |  |
|----------|-------|---|------------|------|--|--|
| Address  | Data  | - | Address    | Data |  |  |
| 3        | 0xAA  | - | 3          | 0xDD |  |  |
| 2        | 0xBB  |   | 2          | 0xCC |  |  |
| 1        | 0xCC  |   | 1          | 0xBB |  |  |
| 0        | 0xDD  |   | 0          | 0xAA |  |  |

Table 2.1: Layouts of the word 0xAABBCCDD in memory at effective address 0, according to little or big endian format



Figure 2.2: More abstract view of little endian layout. Source: Prog Man, page 28

## 3 The ARM Cortex-M0

At the core of a microcontroller is the CPU. Our CPU is called the Cortex-M0 and is designed by Advanced RISC Machines (ARM). The ARM Cortex-M0 CPU is certainly the most interesting block inside the STM32F051C6. This is where all processing happens, hence this is where the instructions which we write will run. It is therefore essential that we have an intricate understanding of the CPU so that we may write useful code for it. This chapter seeks to explore the CPU in some detail.

### 3.1 Programmer's Model of the CPU

A programmer's model is a representation of the inner workings of the CPU with sufficient detail to allows us to develop code for the CPU, but no unnecessary detail. The expanded view of the CPU which will now be discussed can be seen in Figure 3.1. This simple model of a CPU is a set of CPU registers, an Arithmetic and Logic Unit (ALU) and a control Unit. The CPU registers are blocks of storage each 32 bits wide which the CPU has the ability to operate on. Only data which is inside a CPU register can be operated on by the CPU. The ARM Cortex-M0 has 16 such registers.

The ALU is that which performs the operations on the registers. It can take data from registers as inputs, do very basic processing and store the result in CPU registers.

The control unit manages execution by telling the ALU what to do. Together, the registers, ALU and control are able to execute instructions. Examples of instructions which the CPU is able to execute:

- 1. adding the contents of R0 and R1 and storing the result in R6
- 2. copying the contents of R3 into R0
- 3. doing a logical XOR of the contents of R3 with the contents of R4 and storing the result in R3
- 4. moving the number 42 into R5

#### 3.2 CPU Architecture

This section will explore some CPU architectures and compare them to the architecture of the Cortex-M0.

The Cortex-M0 makes use of a Von Neumann architecture. This means that there is a single bus which connects all of the parts (such as CPU, RAM, flash) inside the microcontroller. The implication of this is that the CPU cannot fetch an instruction from flash at the same time as it moves data in or out of RAM. This limitation allows for a much simpler architecture, but at the expense of performance.

Other microcontrollers (even others in the Cortex-M series like the Cortex-M3) follow a Harvard architecture, meaning that there are separate buses used for fetching instructions and



Figure 3.1: A view of the internals of the STM32F051 with the ARM Cortex-M0 expanded

moving data around. This allows faster execution as instructions can be fetched at the same time as data is loaded or stored. However, it necessitates greater complexity and more transistors.

It's been said that the ARM Cortex-M0 is a 32-bit processor. For comparison, the processor which we used in this course previously (MC9S08GT16A) was an 8-bit processor. Your personal computer probably has a 64-bit CPU. 16-bit CPUs are also quite common. So what exactly does it mean when we say that the processor is 32-bits? Essentially, the number of bits which a processor is said to be referrers to the size of the data bus. In other words: the amount of data which the processor is able to move around internally or perform arithmetic and logic operations on. Hence, with a 32-bit processor, we can move 32 bits of data from one spot in memory to another in just once instruction. If you had a 8-bit processor, it would cost 4 instructions to move 32 bits of data around.

## 3.3 Program Counter

The Program Counter is a special register in the CPU, specifically: R15. It's called "special" because it has a specific, fixed purpose and cannot be used as a general purpose register like the other registers can. It's purpose is keeping track of were we are in the execution of a program. All instructions which need to be executed are laid out sequentially in flash, each instruction occupying a halfword of memory. Hence, each instruction has a defined address. The PC points to (ie: hold the address of) the instruction which is about to be fetched from flash and executed.

Typically, the value of the PC is simply incremented by 2 in order to cause it to point to the next instruction in memory. However, it's possible to alter the flow of execution of a program by issuing a *branch* instruction which will cause the PC to be incremented or decremented by a different amount.

#### 3.3.1 Three stage pipeline

There is a bit more complexity to the program counter than initially apparent. It's worth understanding this extra intricacy as it affects how other instructions which depend on the program counter work. The ARM Cortex-M0 implements a three stage pipeline. This means that an instruction is broken up into three parts, and executed over the course of three clock cycles. The parts are:

- fetch: the instruction which the program counter points to is pulled into the CPU.
- **decode:** the CPU control unit "looks" at the 16 bits which represent the instruction, and figures out what action it must take.
- execute: the CPU runs the instruction, causing data to be modified.

The fact that the CPU is pipelined means that different instructions can be going through different phases at the same time. In other words, one instruction can be being fetched while another is being decoded while another is being executed. As an example, assume we have three instructions which we want to execute, instruction A, instruction B and instruction C. The three instructions being run through the pipeline is shown graphically in Figure 3.2. It's critical to note how the program counter is always pointing to the instruction being fetched. This makes sense as the job of the program counter after all is to facilitate keeping track of which instruction must be fetched. For this reason, when an instruction is being executed, the PC is actually pointing to two instructions (four bytes) further ahead in memory, and not at the address of the instruction in execution. Hence, when an instruction in execution uses the PC, the value which will be used is the address of the instruction plus four.

#### 3.4 Reset Vector

Initial address of PC loaded from 0x0800 0004.

|            |            | \       |         |         |         |
|------------|------------|---------|---------|---------|---------|
| PC         | Cycle 1    | Cycle 2 | Cycle 3 | Cycle 4 | Cycle 5 |
| Instruc. A | Fetch      | Decode  | Execute |         |         |
| Instruc. B |            | Fetch   | Decode  | Execute |         |
| Instruc. C |            |         | Fetch   | Decode  | Execute |
|            |            |         |         |         |         |
|            |            |         | )       |         |         |
|            | Cycle 1    | Cycle 2 | Cycle 3 | Cycle 4 | Cycle 5 |
| Instruc. A | Fetch      | Decode  | Execute |         |         |
| Instruc. B |            | Fetch   | Decode  | Execute |         |
| Instruc. C |            |         | Fetch   | Decode  | Execute |
|            |            |         |         |         |         |
|            |            |         |         | \       |         |
|            | Cycle 1    | Cycle 2 | Cycle 3 | Cycle 4 | Cycle 5 |
| Instruc. A | Fetch      | Decode  | Execute |         |         |
| Instruc. B |            | Fetch   | Decode  | Execute |         |
| Instruc. C | Instruc. C |         | Fetch   | Decode  | Execute |
|            | -          |         |         | /       |         |

Figure 3.2: Showing three instructions being run through a three stage pipeline, as well as where the PC is pointing every cycle

# 4 Coding

### 4.1 Assembly

In order to get the CPU to do some of what we've discussed above, it needs to have code loaded onto it to run. We write code in a language called assembly. Assembly is a human-readable language. A program is made up of a sequence of instruction; each instruction gets executed by the CPU. It's quite easy to see what each instruction does by reading the program. The complete instruction set is located in the Programming Manual. You must be familiar with this document! Examples of instruction which carry out the tasks listed above are:

- 1. ADDS R6, R0, R1
- 2. MOV RO, R3
- 3. EORS R3, R3, R4
- 4. MOVS R5, #42

The CPU does not have the ability to understand our nice English words like *ADD* or *MOV*. The CPU only has the ability to understand binary data. Assembly code must be compiled to machine code. A machine code instruction is a binary string, 16 bits long consisting of the operation code (opcode) and the data which it must operate on (operand). For example, assume that we wanted to ascertain the machine code representation of the instruction ADDS R6, R0, R1. An extact from the ARMv6-M Architecture Reference Manual is shown in Figure 4.1 where Rd is the destination register and Rm and Rn are the source registers of the add. It can easily be seen that the instruction would compile to 0001100 001 000 110 = 0x1846. The opcodes for each instruction are detailed in the ARMv6-M Architecture Reference Manual. All of the instructions in the program are 16 bits long and are stored sequentially after one another in flash memory.

#### 4.1.1 Instruction Sets

An instruction set is the collection of all of the instructions which a processor can execute. The ARM Cortex-M0 uses the ARMv6-M architecture and this architecture supports the Thumb instruction set (as opposed to Thumb-2 or ARM). Thumb contains about XXX instructions,

| 15 |   |   |   |   |   |   |    |   | 5 | 4  | 3 | 2 | 1  | 0 |
|----|---|---|---|---|---|---|----|---|---|----|---|---|----|---|
| 0  | 0 | 0 | 1 | 1 | 0 | 0 | Rm | l |   | Rn |   |   | Rd |   |

Figure 4.1: An encoding of the ADDS instruction

each of which is 16 bits long.

Higher end ARM processors such as the Cortex-M3 or Cortex-M4 support the ARMv7-M architecture which allows multiple instruction sets to be supported by the processor. The ability to support multiple instruction sets requires *interworking*. Interworking is the ability to specify to the CPU which instruction set to use. While our ARM Cortex-M0 only supports the Thumb instruction set, there is no need for interworking, yet the cabability has still been incorperated into the architecture to allow for compatability to other processors. This means that although our processor only supports one instruction set (Thumb), we have to explicitly tell it that we are using that instruction set.

### 4.2 Linking

Once our assembly code has been written and compiled to machine code, the computer which loads the code onto the micro has to be told what addresses to place the code at. The code should be placed starting at the beginning of flash.

#### 4.2.1 Executing Code

The PC always points to the instruction which is about to be fetched. Hence, when your micro boots up, before it has executed anything, the PC will point to the first instruction to be fetched/decoded/executed. By "point to" we mean that it holds the address of the instruction.

As each instruction in the ARM Cortex-M0 instruction set it 16 bits (aka: half a word) long, ARM have implemented a rule that all instructions must be half word alligned. In other words, the address of the instruction must be divisible by 2 bytes. Legal addresses for instructions are hence, 0x02, 0x04, 0x06, 0x08 ... etc. This means that the least significant bit (bit 0) of the PC register is unused in specifically the address of an instruction. Hence, it has been assigned another use. Specifically, to indicate the instruction set which is being executed.

# 5 Loading and Storing

Loading is the process of getting data from somewhere in the memory space into the CPU registers so that it can be used in processing. Storing is the process of getting data which is in the CPU registers into memory. Remember that seeing as flash is read-only memory, we cannot store data to flash address, but we can store to RAM.

The general format for a load is that a destination register, a register containing a base address, and an offset are supplied. An effective address is then calculated as the base address plus the offset. The contents of memory at the effective address are then copied from memory into the destination CPU register. When we do this we are treating a register as a *pointer*. When we regard the contents of a register as a memory address and use that register to access data in memory we are dereferencing a pointer: accessing the data pointed to by a pointer. This is an important concept!

A store operation is very similar. Again, a register containing a base address and an offset are supplied, but this time it is a source register not a destination register which is supplied. Again, and effective address of base plus offset is calculated. The contents of the source register is copied into the effective address.

Note that most of the load/store operations which we will be doing are 32-bit (word) load or stores. This is because the CPU registers are 32 bits. So far we have only spoken of a single effective address. As you know, each address can only hold 8 bits. Hence, in order to load or store 32 bits, four sequential addresses are used. The effective address specifies the *lowest* in the sequence of the addresses. For example, if we wanted to store the contents of R0 in 0x20000000, the word would be placed into the address range 0x20000000, 0x200000001, 0x200000002 and 0x200000003. Remember that our processor uses little endian format, so the LSB is placed at 0x200000000 and the MSB at 0x200000003.

We will now explore some implementations of loading and storing.

## 5.1 Immediate Offset Loading

In this format, the base address is supplied in one of high CPU registers (R0 - R7), and the offset is supplied as an immediate number. The instruction format for loading data into a register is

where Rt is the target register for the load, Rn contains the base address and #imm is the offset from the base address.

The way that this instruction works is that it calculates an *effective address* which is equal to the contents of the base address register plus whatever number is supplied as an immediate operand. There is, however, a slight complexity in how the offset is dealt with.

#### 5.1.1 Offset restrictions

Remember that all instructions are limited to 16 bits. The format of the LDR instruction in machine code is shown in Figure 5.1. We can see that after 5 bits of opcode and  $2 \times 3 = 6$  bits

```
Encoding T1 All versions of the Thumb instruction set. LDR <Rt>, [<Rn>{,#<imm5>}]
```

| 15 | 14 | 13 | 12 | 11 | 10 | 9    | 8 | 7 | 6 | 5  | 4 | 3 | 2  | 1 | 0 |
|----|----|----|----|----|----|------|---|---|---|----|---|---|----|---|---|
| 0  | 1  | 1  | 0  | 1  |    | imm5 |   |   |   | Rn |   |   | Rt |   |   |

```
t = UInt(Rt); n = UInt(Rn); imm32 = ZeroExtend(imm5:'00', 32);
```

Figure 5.1: Machine Code representation of LDR instruction. Source: ARMv6-M Architecture Reference Manual

of register specifications, we are only left with 5 bits of offset. Normally, these 5 bits would only allow us to provide an offset of  $2^5 - 1 = 31$  bytes. This is not very much! In order to extend the range of the 5 offset bits, the actual offset used is equal to the 5 bit immediate number multiplied by four. This multiplication by four is the same as appending two zeros to the end of the binary value, which you can see is being done in Figure 5.1. This means that the amount which we are able to offset a base address by is now  $(2^5 - 1) \times 4 = 124$ , which is significantly more useful. However, seeing as we are multiplying to immediate number by four to get the actual offset, the implication is that all offsets must be a multiple of four. The compiler automatically takes care of dividing whatever offset we supply in our assembly instruction by four in order to get it to fit into the 5 bit immediate number, and the CPU then multiplies the immediate number by four to get the offset.

For example: if we wanted an offset of 12, the immediate number which would be placed in the instruction by the compiler would be 3.

## 5.2 Program Counter Relative Loading

There is another format of the LDR instruction which takes the Program Counter as a base register, and allows for an 8-bit immediate offset. If you wish to load data from flash into a CPU register, it makes sense to use the PC as a base register due to the fact that the PC is already initialised to be pointing to an address in flash. Specifically, it is pointing to the instruction which is being fetched (not executed - remember the three stage pipeline!). The format of the LDR instruction for PC relative loading can either be specified in the same was as the general LDR instruction, or it can have a label provided as an operand, as follows:

```
LDR Rt, [PC, #imm]
LDR Rt, <label>
```

If one supplies a label as an operand, all that the compiler does is calculate the correct immediate offset value to insert, and compiles the instruction as if it were in the first format. It's important to note that these instructions are exactly equivalent: all that using a label does is cause the compiler to do the hard work of calculating the correct offset so you don't have to. It would really be a lot of hard work; every time you changed something in the structure of your program which caused instructions to be moved to different memory addresses (like writing a new line of code!) you'd potentially have to re-calculate your offsets. The ability to use labels is one of the most useful features of the compiler.

### 5.3 Register Offset Loading

So far all offsets have been supplied as immediate numbers to the load instructions. However, there is another format of the load instruction called a register-offset load. Here, the offset is contained in another register. This is useful as the offset can be set at run-time by modifying the contents of a register, rather than at compile time. In this case, the effective address is calculated as the contents of the base register (Rn) plus the contents of the offset register (Rm).

```
LDR Rt, [Rn, Rm]
```

### 5.4 Storing

The storing commands are so similar to the loading that they will barely be discussed. One difference is that there is no PC-relative store, as there would be no point trying to store data to read-only memory. The store instruction takes moves the contents of a source register, Rt, and places it at the effective memory address equal to the base address, Rn, plus an offset either supplied as a 5-bit immediate number, #imm5, or in an offset register, Rm.

```
| STR Rt, [Rn, #imm5]
| STR Rt, [Rn, Rm]
```

### 5.5 Accessing of Datatypes Other Than Words

So far we have only loaded or stored words. While it is useful to be able to move an entire 32 bits of data around at once we will sometimes only want to move bytes of half-words around. There are instructions which allow us to do this. There is a version of the LDR instruction which loads only 1 byte: LDRB. Similarly, there is a version which loads 2 bytes or half a word: LDRH.

# 6 Branching

Branching refers to the ability to alter the order of execution of code. Ordinarily the instructions which are coded and then placed into flash are executed sequentially: one after the other in the order which they appear in flash. However, this is highly limiting. Branching allows us to execute instructions which can cause the CPU to jump to executing any instruction in the program (sort of).

### 6.1 Implementation of a Branch

Seeing as the program counter (PC) entirely specifies which instruction is going to be executed next (by holding the address of the instruction) it is relatively simple in concept to get the CPU to execute a specific instruction: write the address of that instruction to the PC. Unfortunately there is a complication.

Due to our instructions being 16 bits wide, it is not possible to hold the address of an instruction to branch to as immediate data seeing due to addresses being 32 bits (you can't fit 32 bits of operand into a 16 bit instruction!). To overcome this, a technique called relative branching is employed. This means that the address of the instruction which the CPU branches to is equal to the contents of a certain register plus or minus a certain amount. Seeing as the PC is already pointing to the general area in memory where instructions live, the PC is most often use as the base address register. This means that the branch instruction causes the PC to take on a value equal to the current value of the PC plus/minus some amount.

## 6.2 Using labels

We could manually calculate the difference between the addresses of instructions which we wanted to branch to/from and use that as our offset address. However, just as in the case of load/store, this would be exceptionally tedious. We can use labels to get the compiler to do the laborious work calculating offsets for us. Similar to load/store instructions, we can label an instruction and then use that label as a operand for a branch instruction. The compiler then works out the address of the instruction which has been labeled, works out the address of the instruction which is doing the branch and creates a PC relative branch instruction with the correct offset equal to the difference in addresses of the two instructions.

For example, consider something like this:

```
foo: LDR R0, [R1]

ADDS R0, R0, #1

...

... @ a whole lot of other instructions

B foo
```

That would work by calculating the difference between the branch instruction and the instruction labeled foo and then subtract that amount from the PC when the branch took place. There are

| slight complications around things like the three stage pipeline and data alignment optimisations but in principle that's how it works. |
|-----------------------------------------------------------------------------------------------------------------------------------------|
|                                                                                                                                         |
|                                                                                                                                         |
|                                                                                                                                         |
|                                                                                                                                         |
|                                                                                                                                         |

# 7 Perihperals

Peripherals in our context can have two meanings. Either, they could be the devices around the microcontroller on the development board like the LCD, push-buttons, potentiometers, temperature sensor or EEPROM which the microcontroller is able to interact with, usually for the purpose of getting input, displaying output or storing data. Alternatively, peripherals could refer to the blocks of circuitry inside of the microcontroller which provides some additional functionality which the CPU does not have. Examples would include circuitry for providing precise timing, or circuitry to interact with the pins of the microcontroller. To distinguish between the two, we call those peripherals which are outside of the microcontroller development board peripherals, and those which are inside the microcontroller chip we call internal peripherals.

The general structure is that the CPU interfaces with internal peripherals which in turn interface with dev board peripherals through the pins on the microcontroller.

### 7.1 Internal Peripherals

All internal peripherals are organised in a bus architecture which allows the CPU to interact with them. The full diagram of all of the peripherals in the STM32F051 is shown on the following page. In order for the CPU to interface with them, each peripheral has a block of memory associated with it. Recall the address space of the microcontroller as shown in Figure 2.1. The block called *peripherals* running from address 0x4000 0000 to 0x4800 17FF is the range of addresses which is available to have peripherals associated with it. The full memory map can be seen in Figure 2 of the Reference Manual.

Out of that large peripherals block of memory, each peripheral has a specific block of memory associated with it. The starting and ending address for each peripheral in the microcontroller can be seen in Table 1 of the Reference Manual. Note how the vast majority of the peripherals address space is unimplemented (or "reserved"). This allows there to be lots of space for expansion: fancier microcontrollers can have more peripherals and make use of this unimplemented address space.

Inside each block of memory assigned to a specific peripheral is further sub-divisions of the block into registers. Registers are blocks of memory (typically one word big on our processor) which provide a specific, well defined element of functionality, typically configuring how the peripherals works or providing some status information about the peripheral. The CPU is able to write data to a register to configure the peripheral or read data from a register to get information about the peripheral. Sometimes a register simply holds a number (for example: for use in a counter) but more frequently each individual bit in a register as a specific meaning. For example, a bit can be set high to enable some sort of functionality or set low to disable some functionality.

Each register has an address which must be known when interacting with that register. The way that the address is calculated is using a (base address) plus (offset) system. The base address is the start of the address range for the peripheral as seen in Table 1 of the Reference Manual, and the offset is the number which must be address to the base address to get the effective

STM32F051xx Description



Figure 1. Block diagram

address of the register. This is a very convenient system as our load and store operations in the CPU also work on a base plus offset system.

A description of what each register does (and indeed what each bit in the register does) as well as the offset for that specific register can be found at the end of the chapter of the Reference Manual which deals with the peripheral (or class of peripherals) which you're trying to interact with.

A register is like RAM in that it is volatile memory, but it is different to RAM in that while RAM is general purpose memory which can be used for storing whatever you like, each register has a specific function and very specific, meaningful data must be written to or read from the peripheral which will configure the microcontroller in some way.

The following chapters serve to describe the operation of some of the key peripherals in the microcontroller.

# 8 Clock Distribution

In order for a block of circuitry to function inside the microcontroller it needs to be clocked. Clocking circuitry provides it with well defined timing which allows the circuitry to take data from the bus or place data onto the bus exactly in sync with all other circuitry in the microcontroller. This essentially enables the circuitry for use. However, as soon as circuitry inside the micro is clocked/enabled, it draws power. For that reason, each internal peripheral can selectively be enabled or disabled by providing or removing (better known as gating) the clock to that peripheral. Most importantly the clock is default OFF for all peripherals in order to make the default power consumption as low as possible. The exact amount of power consumed by each peripheral is different depending on which peripheral it is. Furthermore, power consumption is approximately linear with clock speed. At maximum clock speed the power consumption is roughly half a milliamp per peripheral.

#### 8.1 Reset and Clock Control

The RCC is a peripheral. As the name implies, one of its key functions in the management of the clocking system of the microcontroller. This involves both generating or altering the clock frequency and selectively gating or allowing clock to the other peripherals of the micro.

The peripherals are divided up into a bus structure as shown earlier and as such the structure of the clock distribution is also based on a bus structure. The RCC has a register for each bus. The register controls the state of the clock of the devices connected to that bus. These registers include the RCC AHBENR, RCC APB1ENR and RCC APB2ENR.

#### 8.1.1 Clock Source

As well as managing the gating of clocks for peripherals the RCC also selects the oscillator which should be the source of the system clock (sysclock). The default source is an internal 8 MHz RC oscillator. Optionally, the external crystal quartz oscillator can be selected as clock source. The advantage of using an internal oscillator is that it does not require an extra component to be connected to the micro. The disadvantage is the tolerance: it is around 1% at room temperature, but can be more than 4% at more extreme temperatures. The tolerance of an external crystal quartz oscillator is typically much better than 0.01%. Hence, for applications which are not timing sensitive the internal oscillator can be used. For timing sensitive applications the external oscillator should be used.

## 8.2 CPU Instruction Cycles

As well as driving the peripherals the system clock drives the CPU. The instructions which we write each take a certain number of CPU cycles to execute. Instructions typically take one cycle to execute, but this is not true for all instructions. The exact number of cycles which can instruction takes is detailed in Section 3.3 of the ARM Cortex-M0 Technical Reference Manual. By knowing how many cycles each instruction takes to execute we can know how many cycles a

specific block of code takes to execute. By knowing how long each cycles takes in real time (this is the inverse of frequency) we know the real time which a block of code takes to execute.

# 9 General Purpose Input/Outputs

One of the simplest ways to interface the microcontroller with external circuitry is via GPIO. The ability for the microcontroller to communicate with external devices via GPIO pins is one of the defining differences between microcontrollers and microprocessors. Most pins on the microcontroller are able to operate in GPIO mode. As the name implies, a GPIO pin can be either an input or an output. Additionally, a pin can be placed into an alternate function or analogue mode; these will be discussed later.

The microcontroller's GPIO pins are divided up into groups. Each group is known as a port and each port has a letter associated with it (PortA, PortB etc). Each port contains pins. The maximum number of pins which a port can contain is 16, but some ports contain as few as 2 pins. This means that the name which we assign to a pin is a combination of the port letter and the pin's number in that port. For example: Port A pin 7 refers to a specific pin (shortened to PA7). This naming scheme is useful as the name makes it clear how we interact with that pin. The ports are both a logical and physical division of the pins: all of the pins which belong to a certain port are controlled by a certain block of circuitry which manages that port. The name immediately tells us which block of circuitry our code should interface with in order to control that pin.

A diagram showing how the pin is structured electrically inside the microcontroller is shown in Figure 9.1.

#### 9.1 Pin Mode

As mentioned, the pin can be in one of four possible modes: input, output, alternate function, analogue. There is a register which controlled which mode the pin operates in, known as the GPIOx\_MODER. The 32 bits of the register are divided up into pairs of bits where each pair of pits sets the mode for the associated pin.

#### 9.1.1 Input Mode

Input mode is the default mode for most pins. In this mode, the pin is measuring the voltage applied to it and ascertaining whether it is a logic 0 or a logic 1. This 'decision' is made by a Schmitt Trigger which has useful characteristics such as well defined high and low levels, hysteresis and high impedance. The logic level of each pin is latched on each clock cycle and written to the Input Data Register (GPIOx\_IDR). As each pin can only be considered to be either a logic high or a logic low, there is only 1 bit necessary to represent the state of a pin.

#### 9.1.2 Output Mode

Here, the pin does not measure a logic level, but rather asserts a logic level. When in output mode, the pin will either assert a logic 0 allowing it to sink current from an external source, or assert a logic 1 allowing it to source current into an external sink. The logic level which is asserted is controlled by the Output Data Register (GPIOx\_ODR).



Figure 9.1: Internal structure of pin. Source: Figure 17, Reference Manual

Each bit in this register can be set by writing to the register. Additionally, the bits in this register can be set via the Bit Set and Reset Register (GPIOx\_BSSR). This register allows atomic (done in a single instruction) setting or clearing of individual bits in the ODR.

#### 9.2 Pull resistors

When a pin is set to input mode and there is no logic level applied to it, what value will the bit for that pin in the IDR take on? A logic 1 or a logic 0? Due to the high impedance nature of the pin and the presence of environmental noise, the level which is read from the pin will probably jump randomly between a logic 1 and logic 0. The fact that it's a high impedance input means that even very weak EM signals will cause a voltage to appear on the pin which will cause it to oscillate between logic levels. This is generally bad. In order to define a sort of 'default' level which the pin will read when no external signal is applied to it, internal pull up or pull down resistors are used. These resistors are selectively turned on or off using the

#### How to set or clear individual bits

There is often a case where you wish to modify only one or two of the bits of a port, leaving the rest of the pins unchanged. If you simply write a pre-defined value to the pins, it will force all of them to take on a specific value. The way to modify only a single bit is to do a logic AND or OR of the contents of the register with a pre-defined pattern. An OR has the ability to set specific bits while leaving others unchanged, while and AND has the ability to clear certain bits while leaving the others unchanged. For example, say we wanted to set bits 1 and 2, while clearing bits 0, 3, 4 and 5, leaving the other bits of the port unchanged:

```
@ assuming Rn contains the address of the register to modify: LDR R0, [Rn]
```

**ANDS** R0, R0, R1

**MOVS** R1, #0x06

**ORRS** R0, R0, R1

**STR** R0, [Rn]

# 10 Conditional Branching

The branching we have done up until now has been unconditional branching: the branch instruction is always executed. This is highly limiting as the program can have only one flow. Conditional branching refers to the ability of the CPU to either take or ignore a branch instruction depending on some condition. This is very powerful as it allows the flow of the program to by variable depending on dynamic conditions.

### 10.1 Application Program Status Register

The APSR is a special CPU register. It does not have a register number like the other registers and cannot be read or written by normal instructions. However this is a critically important register as it is the source of the conditions for the conditional branching. The APSR holds 4 flags:

**Negative (N):** Set if the result of the last operations has was negative. In other words, the msb was a 1. This flag only has a meaning when treating data as signed numbers.

**Zero (Z):** Set if all bits of the last operations were 0.

Carry/Borrow (C): Set if an *unsigned* overflow occurred. Ie: the actual result of the computation exceeded the bounds of the 32-bit register when treated as an unsigned number.

**Two's Compliment Overflow (V):** Set if a *signed* overflow occurred. Ie: the actual result of the computation exceeded the bounds of the 32-bit register when treated as a signed number.

Together, these flags provide us with an abundance of information about the result of computations. We are able to ascertain basically any information about the relationship between arbitrary numbers by examining these flags. Not all instructions set the APSR flags. It is necessary to examine the details of the instruction in the Programming Manual in order to see whether the instruction sets the flags. Furthermore it may be necessary to examine the detailed workings of the instruction in the ARMv6-M Reference Manual in order to see which flags are set and how the settings of those flags is determined. However, in general instructions which set the flags have an S at the end of their name. Again (in general) arithmetic operations set/clear all APSR flags while logic operations set/clear only the N or Z flags.

## 10.2 Overflow Flags

While the Z and N flags are simple to understand, the overflow flags (especially signed but also unsigned) are more tricky. Let's explore them in a bit of detail.

Our CPU registers contain a limited number of bits: 32. This places a limit on the range of numbers which can be held in the CPU. Note that the register only holds a sequence of bits. That sequence of bits is only interpreted as a number when we assign some sort of encoding scheme to the number.

#### 10.2.1 Unsigned Numbers and the C Flag

The typical scheme used to convert a binary string into an integer is that the weight of each bit is equal to  $2^n$  where n is the position of the bit starting at 0. Each bit is multiplied by its weight and summed. This is one interpretation (arguably the most common) which converts a sequence of bits into an actual number which can be represented on a number line and have a meaning. This interpretation is called *unsigned*.

For 32 bits the maximum value obtainable is when all of the bits are set. This is equal to the value  $2^{32} - 1 = 4\,294\,967\,295$ . The minimum value is when all bits are 0, resulting in a value of 0. It's important to realise that these limits are only true when we are treating the sequence of bits as an unsigned number.

What happens if we attempt to exceed these limits? An overflow occurs. If we attempt to perform a computation where the true result of the computation is outside of the limits imposed by the finite number of bits in the CPU an overflow occurs. This overflow of the unsigned limits is signaled by the CPU through setting the C flag high.

### 10.2.2 Signed Numbers and the V Flag

We've just seen that when interpreting a sequence of bits as unsigned the minimum value is 0. This is often not sufficient as we may want the capability to represent negative numbers. Enter signed numbers. Here, the weights of the msb is  $-(2^n)$  while all the other bit keep their positive weights.

This means we have different limits. The largest value which can be represented by a 32-bit signed number is when all of the positive bits are set and the negative bit is clear: 0x7FFF FFFF or 2 147 483 647. The smallest value which can be represented by a 32-bit signed number is when all of the positive bits are clear and the negative bit is set: 0x8000 0000 or -2 147 483 648.

Again, what happens if we attempt to execute a computation where the actual result is outside of the limits of what the 32 bits can hold when interpreted as signed numbers? The CPU signals this error to us with the Two's compliment overflow flag: V.

The CPU itself has absolutely no idea whether you as the programmer want to treat your data as signed or unsigned numbers. It just takes sequences of bits and performs arithmetic or logic operations on the bits. Hence, to cater for both possible cases (the bits should be treated as signed or the bits should be treated as unsigned) the CPU sets or clears both the C and V flag after computations. If you want your numbers to be treated as unsigned you should be interested in the state of the C flag. If you want your numbers to be treated as signed you should be interested in the V flag.

### 10.3 Compare Instruction

One of the key instructions used in the context of conditional branching is the compare (CMP) instruction. This instruction essentially subtracts two values from each other, disregards the result but updates the flags depending on the result. CMP takes either two registers or a register and an immediate value as operands. The CMP instruction is most often used to set the conditions which the conditional branch will depend on. This is due to the fact that a subtraction tells us a lot about the relationship between two numbers. For example, if the result of a subtraction sets the zero flag we know that the numbers being compared (subtracted) have the

same value. Similarly, if the result of the subtraction of B from A clears the V flag it tell us that A is larger than B when viewed as signed numbers.

The format of the CMP instruction is one of:

```
CMP Rn, Rm
CMP Rn, #imm8
```

In the first case, the value of Rm is subtracted from Rn. In the seconds case, the 8-bit immediate number is subtracted from Rn.

#### 10.3.1 A note on the implementation of the subtract operation

In order to minimize the hardware cost of the ALU circuitry, the subtract operations is implemented by adding the bitwise inverse of Rm to Rn, plus 1. You don't really have to worry about this other than to note that this implementation explains why the C or V flag is set when the numbers being compared are equal. For example, the subtraction of the number 42 from the number 42 corresponds to the addition of the numbers 42 and 4294967253 and 1. It should be apparent to you that this result is zero, but sets the carry flag.

#### 10.4 Condition Code Suffixes

The branch (B) instruction is able to take optional condition code suffixes which specify whether or not the instruction will be executed depending on the state of the flags in the APRS. These suffixes are shown in Figure 10.1. A suffix can be appended to the B instruction to turn it into a conditional branch. For example, BEQ will be taken if the result of the last computation produced a zero result. Similarly, BNE will be taken if the result was non-zero.

The mnemonics for the suffixes are closely related to the compare operation. For example, the BGT (branch if greater than when treated as signed numbers) will be taken if the Rm operand of the CMP instruction is greater than the Rn operand when treated as signed numbers. This is why the CMP and  $B\{cc\}$  instructions go so well together.

## 10.5 Branching Based on Individual Bits

Consider the case where we want to take a branch conditional on the case of a push button being pressed or not pressed. A push button is connected to a single pin which constitutes a single bit in the GPIO\_IDR. Hence, we need a way to make our branch conditional on a single bit being high or low. Put another way, we want to exclude all of the other bits in the IDR from influencing the branch.

In order to achieve this we have to do two steps:

- 1. Mask out the bits which we are not interested in. Specifically, set them all to zero. This is done just as we saw earlier in section 9.2. We AND all of the bits with 0 except for the bit which we are interested in which we AND with 1.
- 2. Compare the result of the mask with 0. If the bit which we are interested in was 0 then the result of the AND will be 0. If the bit that we are interested in was 1 then the result of the AND will be non-zero. Note that this compare does not actually have to be done as the AND instruction sets or clears the zero flag.

| Suffix   | Flags              | Meaning                                                  |
|----------|--------------------|----------------------------------------------------------|
| EQ       | Z = 1              | Equal, last flag setting result was zero                 |
| NE       | Z = 0              | Not equal, last flag setting result was non-zero         |
| CS or HS | C = 1              | Higher or same, unsigned ≥                               |
| CC or LO | C = 0              | Lower, unsigned <                                        |
| MI       | N = 1              | Negative                                                 |
| PL       | N = 0              | Positive or zero                                         |
| VS       | V = 1              | Overflow                                                 |
| VC       | V = 0              | No overflow                                              |
| HI       | C = 1 and Z = 0    | Higher, unsigned >                                       |
| LS       | C = 0 or Z = 1     | Lower or same, unsigned ≤                                |
| GE       | N = V              | Greater than or equal, signed ≥                          |
| LT       | N ! = V            | Less than, signed <                                      |
| GT       | Z = 0 and $N = V$  | Greater than, signed >                                   |
| LE       | Z = 1 and N! = V   | Less than or equal, signed ≤                             |
| AL       | Can have any value | Always. This is the default when no suffix is specified. |

Figure 10.1: Condition code suffixes and meanings. Source: Table 17, Programming Manual

After those two steps (which can actually just be one step) we can take a conditional branch dependant on whether a single bit (a single push button) was set or cleared.

# 11 Instruction Sets

An instruction set refers to a collection of instructions which a CPU is able to execute. This is a combination of the assembly instruction names and the machine code which the assembly language is compiled down to and which is placed into memory for execution. There are three different instruction sets which various ARM processors use. These are Thumb, Thumb-2 and ARM. A graphical representation of the instructions which are available on the various Cortex processors is shown in Figure 11.1. Here we see that the Cortex-M0 and -M1 use the Thumb instruction set while the -M3 and -M4 use the Thumb-2 instruction set. Following is a short discussion on each of the three instruction set which ARM supports. Note while reading that our processor (the Cortex-M0) only supports Thumb instructions 1.

#### 11.1 ARM

The original instruction set used by ARM processors was called the ARM instruction set. This instruction set contains only 32-bit instructions. This is a powerful instruction set as almost all instruction can be conditionally executed. However, seeing as all instructions are fixed to 32 bits wide, the code density is fairly poor. This is due to comparatively simple instructions like a simple add or PC relative branch using wasteful 32 bits of flash.

#### 11.2 Thumb

In 1994 the ARM7TDMI architecture was released which featured the Thumb instruction set. This instruction set was limited to only 16-bit instructions. Obviously these instructions were less powerful as there was less room to specify information about the actions which an instruction should perform. However for simple instructions this was not an issue and resulted in programs being much smaller. For more complicated operations multiple Thumb instructions would be needed to perform the job of a single ARM instruction.

In order to have a combination of the performance of the 32-bit ARM instruction set and the code density of the 16-bit Thumb instruction set, an ability called *interworking* was provided. Interworking allows for the CPU to switch between executing Thumb instructions or ARM instructions. This is a useful ability but introduces some additional complexity into the system.

#### 11.3 Thumb-2

Thumb was extended to Thumb-2 in 2003. Thumb-2 allows 16-bit and 32-bit instructions to be freely mixed together without requiring interworking. Essentially Thumb-2 is a combination of the 16-bit instructions provided in Thumb as well as a whole lot of extra 32-bit instructions. This instruction set allows performance similar to the ARM instruction set while providing code density even better than Thumb.

 $<sup>^1\</sup>mathrm{Not}$  quite true. It supports 3 Thumb-2 instructions. Why 3? I don't know...



Figure 11.1: Cortex instruction set architecture

It's important to note the structuring of the instruction sets: As shown in Figure 11.1, Thumb-2 (Cortex-M3 and -M4) entirely contains all of the 16-bit instructions of Thumb (Cortex-M0 and -M1). Any Thumb code will run on a Thumb-2 capable processor. Any Thumb-2 code will probably NOT run on a Thumb processor. This is called backward compatibility (sort of). The ARM instruction set is completely distinct from that figure. It is a completely different instruction set which does not run on the Cortex series of processors. Processor architectures which support Thumb and ARM (such as the ARM7TDMI) require interworking to switch between the two entirely distinct instruction sets.

### 11.4 Implementation of Interworking

So we understand that Thumb processors can only execute 16-bit instructions. Thumb-2 processors can execute the Thumb instructions as well 32-bit instructions. ARM processors can execute a totally different set of 32-bit instructions. Some processors can run both the ARM and Thumb instruction sets. So, how do we tell one of these interworking capable processors whether an instruction is ARM or Thumb?

Firstly, note that data accesses must be aligned. Seeing as the minimum width of an instruction is 2 bytes, all instructions must be placed on addresses which are multiples of 2. As all addresses of instructions are multiples of 2, the lsb of the PC is always a 0. The bit is therefor sort of wasted. Hence, we assign a different purpose to this bit: when it is a 0 it indicates that the instruction pointed to by the PC is an ARM instruction. When it is a 1 it indicates that the instruction pointed to by the PC is a Thumb instruction.

Although the Cortex series of CPUs does not support the ARM instruction set, is still requires that this rule of using the lsb of the PC to specify instruction set type is adhered to. Seeing as all instructions for the Cortex series (including our CPU) are Thumb or Thumb-2, this lsb of the PC should always be set to a 1. That is why our reset vector needs to point to the address of \_start +1. The +1 forces the lsb to a 1 indicating that the instruction at \_start is a Thumb instruction.

If a vector attempts to set the lsb of the PC to 0, the CPU will HardFault as it would be trying to execute an instruction from an instruction set which is not supported.

# 12 Exceptions

An exception is a fairly generic term for an event which occurs which the CPU needs to deal with in some way. Examples of exceptions would be the microcontroller resetting or the CPU attempting to access invalid memory or a peripheral generating an interrupt. Typically we write a block of instruction which we want to execute when an exception occurs. That block of code is called an *exception handler*. We then place the address of the start of that exception handler into a special location in memory called a vector. Each exception has a vector address associated with it. The block of all vectors is called the vector table. A summarised version of it is shown in Figure 12.1.

When an exception occurs, the CPU performs a few tasks in order to service the exception:

- Save the current 'system state' to the stack in the form of a stack frame. This is basically just pushing a few important registers to the stack. The exact format of the stack frame is shown in Figure 12.2.
- Fetch the data from the vector associated with that exception that occurred and load that data into the PC.
- Start executing the block of instructions pointed to by the vector

Following is a discussion on some of the key exceptions.

#### **12.1** Reset

There are a number of possible causes of a reset all detailed in section 7.1.2 of the Reference Manual. They key ones are a power reset where the power to the micro is cycles or a NRST pin reset where the Negative ReSeT pin is pulled low and then released.

When this exception occurs the microcontroller:

- aborts execution of code,
- sets all registers to their default values,
- fetches the data from the reset vector,
- places that data into the PC and starts execution.

The reset exception is fairly specialised in that is the only exception which does not cause the previous system state to be stacked. Quite the opposite in fact, it clears all state and begins fresh.

| Exception number <sup>(1)</sup> | IRQ number <sup>(1)</sup> | Exception type  | Priority                    | Vector address<br>or offset <sup>(2)</sup> | Activation   |
|---------------------------------|---------------------------|-----------------|-----------------------------|--------------------------------------------|--------------|
| 1                               | -                         | Reset           | -3, the highest             | 0x00000004                                 | Asynchronous |
| 2                               | -14                       | NMI             | -2                          | 0x00000008                                 | Asynchronous |
| 3                               | -13                       | Hard fault      | -1                          | 0x000000C                                  | Synchronous  |
| 4-10                            | -                         | Reserved        | -                           | -                                          | -            |
| 11                              | -5                        | SVCall          | Configurable (3)            | 0x0000002C                                 | Synchronous  |
| 12-13                           | -                         | Reserved        | -                           | -                                          | -            |
| 14                              | -2                        | PendSV          | Configurable <sup>(3)</sup> | 0x00000038                                 | Asynchronous |
| 15                              | -1                        | SysTick         | Configurable (3)            | 0x0000003C                                 | Asynchronous |
| 16 - 47                         | 0 - 31                    | Interrupt (IRQ) | Configurable (3)            | 0x00000040 and above <sup>(4)</sup>        | Asynchronous |

Figure 12.1: Summarised version of the vector table showing the CPU exceptions only. Source: Table 12, Programming Manual



Figure 12.2: Stack frame. These registers are push to the stack thereby saving their state when an exception occurs. Source: Figure 9, Programming Manual

### 12.2 HardFault

A HardFault occurs when an instruction attempts to do something illegal or a peripheral attempts to do an illegal memory transfer. This includes attempting to access unimplemented memory addresses or trying to perform unaligned memory access or trying to execute an instruction which has a non-existent opcode. The full list of events which cause a HardFault exception are detailed in Table B1-6 of the ARMv6-M Architecture Reference Manual.

Typically HardFaults are unrecoverable: when a HardFault happens there is generally something broken in the code and we do not want the code to carry on running. Rather we want to be made aware of the issue so that the code can be corrected.

When a HardFault happens the standard exception handling procedure takes place: the current state is stacked, the exception handler vector is fetched an executed. Due to the fact that the state is saved on the stack it is possible to return from the handler and resume execution of the main code but it would be unusual to want to do this due to the severity of a HardFault.

# 13 Stack

A stack is a concept. The concept is a data structure which implements a Last In / First Out queue. It has two interfaces namely:

**PUSH:** Take a value and places it at the top of the stack, on top of whatever already exists in the stack.

**POP:** Removes the top element from the stack and puts it into a register. The next element down then becomes the top of the stack.

That's basically all there is to a stack: a LIFO queue. An animation of this queue in operations is shown at http://www.csanimated.com/animation.php?t=Stack

The stack is such a useful thing for computer systems what we dedicate specific hardware in the CPU to implementing one of these LIFO queues. The stack has two main uses:

- 1. Saving system state. When an exception occurs we want to back up the system state somewhere and then have the ability to recover the backed up state later if we return from the exception. The stack provides us an always-accessible place in memory where this information can be placed and recovered later.
- 2. General data storage. We have a limited number of registers yet frequently need to work with more data than our CPU can hold. While we could just pick addresses in RAM for use as data storage, we'd have to keep a list of what locations are used for what in our different blocks of code, hard code the addresses into the program and make sure that we don't overwrite our data in RAM. With a stack we can simply push data to the stack and pop it later when we want to get it back. The stack implementation keeps track of the actual memory addresses which data goes to.

#### 13.1 Stack Pointer

Clearly a well implemented stack is highly beneficial to a system. In order to implement the stack, one of our registers, R13 is assigned the special job of being the stack pointer (SP). The purpose of the stack pointer is to point to (hold the address of) the item most recently placed on the stack. In that way it keeps track of the stack. Typically a stack is implemented starting at the end of RAM (highest address) and working it's way down RAM. Hence, the SP should be initialised pointing to the end of RAM.

Well, that's not quite true! As discussed in section 2.1.2 of the Programming Manual, the order of operations for a stack push is to first decrement the pointer and then place the data at the new address pointed to by the SP. That means that if we want to place our first word pushed onto the stack, the SP must be initialised to point to one word AFTER the end of RAM.

The reason we want to start the stack right at the end of RAM is to allow it as much space as possible to grow. Typically computer systems have another data structure called a heap with starts at the beginning of RAM and grows upwards. These data structures should be as far away from each other as possible to prevent stack overflow, when the stack and the heap collide.

Stack/Heap collision is about the worst think that can happen to a program. This is why lots of RAM is good: the more RAM we have the more data we can store before collision happens.

#### 13.2 Stack Access Instructions

The two instructions which give direct access to the top of the stack are the PUSH and POP instructions. Both of these instructions take something called a register list as an argument. This is sort of an array of registers, enclosed in curly brackets such as {R0, R2, R5}. This is very powerful as it allows us to push or pop multiple registers at one!

Seeing as the SP is a CPU register like any other, you can also use it for load/store operations enabling the random access of any element on the stack. For example, to load the 5th last element on the stack by 42 without touching any of the other elements you could do:

```
LDR R0, [SP, #16]
ADDS R0, R0, #42
STR R0, [SP, #16]
```

Without this ability you'd have to pop off all 5 values into registers, add 42 to the specific register and then push the results back.