# COMP 273 Study guide

Francis Piché, April 19, 2018

# Contents

| #   | Section                            |
|-----|------------------------------------|
| I   | Preliminaries                      |
| 1   | Introduction                       |
| 2   | System Board                       |
| 3   | Number Formats                     |
| II  | Basic Circuits                     |
| 4   | Intro to Circuits                  |
| 5   | RAM & Registers                    |
| 6   | Arithmetic Logic Unit              |
| 7   | Control Unit                       |
| 8   | Classical & MIPS Pipeline CPU      |
| III | Assembler Programming              |
| 9   | Basic Assembler                    |
| 10  | Complex Data                       |
| 11  | Recursion, Stack & Heap            |
| IV  | Advanced Circuitry                 |
| 12  | Polling & Interrupts               |
|     | 12.1 Peripheral Devices            |
|     | 12.2 IO & Communication            |
|     | 12.3 Memory Mapping                |
|     | 12.4 Polling                       |
|     | 12.5 MIPS I/O Communication        |
|     | 12.6 Interrupts                    |
|     | 12.6.1 Implementation of Interrupt |
|     | 12.7 The Exception Co-Processor    |
|     | 12.7.1 Status & Cause Registers    |
|     | 12.8 Where Interrupts Go           |


| #   | Section                       |
|-----|-------------------------------|
| 13  | Cache & Performance           |
|     | 13.1 Cache-Loading            |
|     | 13.1.1 Locality & Measurement |
|     | 13.1.2 Wide-Bus Method        |
|     | 13.2 Handling Misses          |
|     | 13.3 Cache Addressing         |
# Part I Preliminaries

# 1 Introduction

This guide is based on the lectures and slides by Professor Joseph Vybihal at McGill University, Winter 2018. Images are taken from Prof. Vybihal's lecture slides.

This guide is my best attempt to express the ideas of the course in a clear and concise way, given that there is less than a week until the final. That being said, I'll work backwards, starting from the latest material (and in my opinion the hardest) to ensure that the most difficult material will get covered in time for the exam.

- 2 System Board
- 3 Number Formats

# Part II

# **Basic Circuits**

- 4 Intro to Circuits
- 5 RAM & Registers
- 6 Arithmetic Logic Unit
- 7 Control Unit
- 8 Classical & MIPS Pipeline CPU

# Part III

# **Assembler Programming**

- 9 Basic Assembler
- 10 Complex Data
- 11 Recursion, Stack & Heap

# Part IV

# **Advanced Circuitry**

## 12 Polling & Interrupts

#### 12.1 Peripheral Devices

Peripherals are external devices such as keyboards, mice, screens, and network adapters.

The devices each have a **controller** (simple CPU). There are two main types of controllers:

- On-Board: controlling registers are integrated into the system board
- External: controlling registers are part of the card plugged into slots on the system board. These slots are connected to the bus.

A controller chip is made up of:

- Status register (ready, on/off, error-codes)
- Data register (information to be processed)
- Command register (in binary)
- ROM (to hold basic information)

Often the registers are combined into one.

Note that integrated, on-board controllers are faster, since they skip the slot. Their registers are directly connected to the bus and have addresses.

Also note that if we don't read the registers before the next key press, the previous data is overwritten and lost.

#### 12.2 IO & Communication

There are two main techniques for communication:

- Interrupt Driven: Device signals the CPU when state changes.
- Polling Driven: CPU looks at the device's status register

Within each of these techniques, there are two ways to exchange data:

- Synchronous I/O: The CPU monitors the device, sending and reading byte by byte
- Asynchronous I/O: CPU signals when to start, device signals back when finished.

Asynchronous I/O works as follows: first the CPU loads the start address, limit, and command into the controller's registers. The CPU is then free to do anything else until the device sends an interrupt to signal that the task is complete.

The registers are accessed either using a general data path (for example the RAM zero page via the bus), or by using a specialized path such as a DMA or interrupt wire.

# 12.3 Memory Mapping

There is a special portion of RAM directly allocated to peripheral registers, called the zero page. This means these special addresses are actually wired to go to the peripheral registers instead of ordinary memory cells.
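To make the "wired to the peripheral registers" idea concrete, here is a minimal sketch in Python. It assumes a hypothetical machine where addresses below 0x100 form the mapped region; the class name, region size, and RAM size are all made up for illustration and are not from the lectures.

```python
class MemoryMappedBus:
    """Toy bus: low addresses reach device registers, the rest reach RAM."""

    DEVICE_LIMIT = 0x100  # assumed size of the mapped region (hypothetical)

    def __init__(self, ram_size=0x1000):
        self.ram = [0] * ram_size
        self.device_regs = [0] * self.DEVICE_LIMIT

    def _target(self, address):
        # The address decoder picks the backing storage from the address alone.
        if address < self.DEVICE_LIMIT:
            return self.device_regs
        return self.ram

    def load(self, address):
        return self._target(address)[address]

    def store(self, address, value):
        self._target(address)[address] = value


bus = MemoryMappedBus()
bus.store(0x04, 0x41)   # lands in a device register, not in RAM
bus.store(0x200, 7)     # lands in ordinary RAM
```

The program uses the same load/store operations in both cases; only the address decides where the data ends up.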

### 12.4 Polling

This is basically accomplished using a busy loop.

```
while(status != 0); // assume 0 means it's ready
```

Now in assembler:

```
LOOP:
  lw   $t0, STATUS        # read the device's status word
  bne  $t0, $zero, LOOP   # non-zero means not ready: poll again
  # otherwise check the flags and handle the device
```

The problem is that the busy loop uses 100% of the CPU's capacity while it waits.
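A rough way to see the wasted work is to count the loop iterations. The sketch below uses a made-up `FakeDevice` (not a real API) whose status stays non-zero for a fixed number of reads and then drops to 0, following the "0 means ready" convention used in this section.

```python
class FakeDevice:
    """Hypothetical device whose status drops to 0 (ready) after some reads."""

    def __init__(self, busy_reads):
        self.remaining = busy_reads

    def read_status(self):
        if self.remaining > 0:
            self.remaining -= 1
            return 1  # still busy
        return 0      # ready


def poll_until_ready(device):
    """Busy-wait on the status register and count the wasted iterations."""
    wasted = 0
    while device.read_status() != 0:
        wasted += 1  # the CPU does nothing useful during this pass
    return wasted


wasted_iterations = poll_until_ready(FakeDevice(busy_reads=5))
```

Every counted iteration is CPU time that could have gone to another task, which is the motivation for interrupts.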

## 12.5 MIPS I/O Communication

MARS is only able to do simulation for the keyboard and screen (text).

The keyboard is commonly referred to as the **receiver**, and the screen the **transmitter**. The receiver control register's address is at 0xffff0000, followed by the receiver data at 0xffff0004, transmitter control at 0xffff0008, and finally the transmitter data at 0xffff000c.

The first (lowest-order) bit of each control register, bit 0, is the "isReady" bit: 1 means ready, 0 means not ready.

The second bit of each control register, bit 1, controls whether the device is allowed to send an interrupt or not: 1 for yes, 0 for the default (depends on the machine).

#### 12.5.1 GETCHAR and PUTCHAR functions

```
GETCHAR:
  lui  $a3, 0xffff        # load base address of the I/O registers (0xffff0000)
ISREADY:
  lw   $t1, 0($a3)        # read from the receiver control register
  andi $t1, $t1, 1        # isolate the first bit (the ready bit)
  beqz $t1, ISREADY       # if it is 0 then check again
  lw   $v0, 4($a3)        # if 1, load the contents of the data register into $v0
  jr   $ra                # return

PUTCHAR:
  lui  $a3, 0xffff        # load base address of the I/O registers (0xffff0000)
CHECK:
  lw   $t1, 8($a3)        # check the transmitter control register this time
  andi $t1, $t1, 1        # isolate the ready bit
  beqz $t1, CHECK         # if 0 then try again
  sw   $a0, 12($a3)       # if 1 then send character to data register of device
  jr   $ra                # return
```

Notice the similarities and differences between the two programs. In getChar, we load FROM the data register to GET the character, whereas in putChar we need to store TO the data register to PUT the character. Notice the offsets when accessing the base address.

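There was also the process of using logical AND to extract a bit. This is called **bit masking**. When an AND is performed between an unknown sequence of bits and a mask, the result keeps the unknown bits wherever the mask is 1 and is 0 everywhere else. So a mask of 1 isolates the first bit, and a mask of all 1's gives back whatever the unknown sequence was!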
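Here is a small sketch of bit masking in Python, using the two control-register bits described in section 12.5. The constant and function names are my own, chosen just for illustration.

```python
READY_MASK = 0b01             # bit 0: the "isReady" bit
INTERRUPT_ENABLE_MASK = 0b10  # bit 1: the interrupt-enable bit


def is_ready(control):
    """True when bit 0 of the control register value is set."""
    return (control & READY_MASK) != 0


def interrupts_enabled(control):
    """True when bit 1 of the control register value is set."""
    return (control & INTERRUPT_ENABLE_MASK) != 0


# The mask keeps bits where it is 1 and clears the rest.
low_two_bits = 0b1011 & 0b0011
```

This is the same test that `andi $t1, $t1, 1` followed by `beqz` performs in the assembler above.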

### 12.6 Interrupts

First, some definitions:

- Exception: Any event that stops the normal execution of the CPU. For example, stopping the program due to divide by zero, stack overflow etc.
- Interrupt: There are two kinds of interrupt: Signal, an event purposely triggered by a program to re-route the CPU flow to another process (think throw in Java), and trap, an event purposely triggered by a device.

So just remember: signal interrupt = program throw, trap = device throw. The difference between exceptions and interrupts is that the former handle instruction faults (division by zero, undefined op-code, etc.) while the latter handle external events.

This is a picture of what an interrupt would look like on the register's end:



So basically there's an AND gate on the last two bits. One of the bits is the "enable interrupts bit" and the other is the "interrupt bit". The former is turned to 1 by the programmer (or by default), and then when an interrupt is to occur, the other bit will turn to 1, allowing the interrupt signal to flow out the AND gate.
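The AND gate can be sketched in a few lines of Python. The bit positions below are assumptions (the picture is what defines them), so treat `ENABLE_BIT` and `INTERRUPT_BIT` as placeholders.

```python
ENABLE_BIT = 0     # assumed position of the "enable interrupts" bit
INTERRUPT_BIT = 1  # assumed position of the "interrupt" bit


def interrupt_signal(register):
    """Output of the AND gate: 1 only if both bits are 1."""
    enable = (register >> ENABLE_BIT) & 1
    interrupt = (register >> INTERRUPT_BIT) & 1
    return enable & interrupt


# All four combinations of (interrupt, enable), as a small truth table.
truth_table = [interrupt_signal(value) for value in (0b00, 0b01, 0b10, 0b11)]
```

Only the last combination lets the signal through, so a device with interrupts disabled can raise its interrupt bit without disturbing the CPU.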

#### 12.6.1 Implementation of Interrupt

The hardware view of this is a bit strange:



What happens here, is that when the device is not sending a trap signal (like in the previous picture), the PC can increment happily as normal. When the trap signal is turned on, this causes the PC to "jump" to the location of the "trap pointer", causing a halt of the normal program execution. Note that the AND/OR gates are multi-bit.
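One plausible reading of this circuit, sketched in Python: the trap signal is widened into a multi-bit mask that selects between the incremented PC and the trap pointer. The 32-bit width, the step of 4 bytes per instruction, and the example addresses are assumptions for illustration, not values from the slide.

```python
WORD_MASK = 0xFFFFFFFF  # assumed 32-bit PC


def next_pc(pc, trap, trap_pointer, step=4):
    """Select the next PC using multi-bit AND/OR gates."""
    mask = WORD_MASK if trap else 0                 # trap bit copied to every wire
    from_trap = trap_pointer & mask                 # passes only when trap = 1
    from_increment = (pc + step) & ~mask & WORD_MASK  # passes only when trap = 0
    return from_trap | from_increment


normal = next_pc(0x00400000, trap=0, trap_pointer=0x80000180)
trapped = next_pc(0x00400000, trap=1, trap_pointer=0x80000180)
```

With `trap=0` the PC just moves to the next instruction; with `trap=1` the increment is masked out and the trap pointer wins.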

## 12.7 The Exception Co-Processor

This is co-processor 0 (recall that each co-processor is identified by a number). This co-processor contains:

- Error PC: Contains the address of where the exception occurred (which instruction).
- Cause Register: Contains information about the exception type, and what may have been the cause.
- BadVAddr: Has the address that caused the bad memory reference (the non-existent address that caused the error).
- Status Register: More on this below.

#### 12.7.1 Status & Cause Registers

The status register contains the **interrupt mask**, which is used to check which devices have interrupt enabled/disabled. It also contains a mini "stack", where it holds some information about the interrupt.



The interrupt stack can hold information about 3 interrupts. For each one it records whether the interrupt occurred while in kernel mode (for privileged, low-level stuff) or in user mode, and it contains a bit for whether interrupts were enabled for that device. Every time an interrupt occurs, the "stack" shifts left two bits, losing the oldest data.
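The shifting behaviour can be sketched as follows. I'm assuming the stack is 6 bits wide (three 2-bit entries) and that each entry is laid out as (mode bit, enable bit); the exact bit order inside an entry is a guess for illustration.

```python
STACK_MASK = 0b111111  # three 2-bit entries


def push_interrupt(stack, kernel_mode, interrupt_enabled):
    """Shift the stack left by two bits and insert the new entry on the right."""
    entry = (kernel_mode << 1) | interrupt_enabled
    return ((stack << 2) & STACK_MASK) | entry  # the oldest entry falls off


stack = 0
stack = push_interrupt(stack, 0, 1)    # entry 01
stack = push_interrupt(stack, 1, 0)    # entry 10
stack = push_interrupt(stack, 1, 1)    # entry 11 -> stack is 01 10 11
after_fourth = push_interrupt(stack, 0, 0)  # entry 01 is lost
```

After the fourth push, the first entry (01) has been shifted out, which is the "losing the old data" part.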

The cause register contains pending interrupts (interrupts that haven't been processed yet), and some exception code. The exception code is a binary encoding for what kind of exception occurred.

# 12.8 Where Interrupts Go

All interrupts get routed to a special kernel address called the **interrupt handler** that processes the interrupts. Programmers can put their own code in this location to change what happens when an interrupt occurs.

This interrupt handling code is generally just a switch statement that handles depending on the cause found in the cause register.
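As a sketch of that switch statement, here is a dispatch on the exception code in Python. The field position (bits 2 to 6 of the cause register) and the code numbers follow the standard MIPS cause register layout, which these notes don't spell out, and the handler descriptions are placeholders.

```python
EXC_CODE_SHIFT = 2    # exception code starts at bit 2 (standard MIPS layout)
EXC_CODE_MASK = 0x1F  # the field is 5 bits wide

HANDLERS = {
    0: "hardware interrupt",
    8: "syscall",
    12: "arithmetic overflow",
}


def exception_code(cause):
    """Extract the exception code field from a cause register value."""
    return (cause >> EXC_CODE_SHIFT) & EXC_CODE_MASK


def handle(cause):
    """The 'switch': pick what to do based on the exception code."""
    return HANDLERS.get(exception_code(cause), "unhandled exception")


overflow_cause = 12 << EXC_CODE_SHIFT
result = handle(overflow_cause)
```

A real handler would be written in assembler and would jump to different routines instead of returning strings, but the structure is the same.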

## 13 Cache & Performance

In computing, we often need large amounts of storage, and it needs to be accessed very quickly. The issue is that large storage and speed usually conflict.

Size: Disk > RAM > cache > registers.

Speed: Disk < RAM < cache < registers.

(Note that a > b implies a is better than b.) So cache is a necessary part of fast computation.

This comes with some issues:

- Cache is generally smaller than a program
- Most instructions are stored in RAM.

#### 13.1 Cache-Loading

#### 13.1.1 Locality & Measurement

In programs, items are typically referenced several times (i.e., you call printf() several times, recursive functions, etc.). This idea is called **temporal locality**. And typically adjacent items are executed in sequence (loops, functions, etc.). This is called **spatial locality**.

When trying to optimize cache-loading, this idea of locality will help a lot.

The measure of how good or bad a loading method is, is the **hit-to-miss ratio**, often reported instead as the "cache miss rate" (misses divided by total accesses). A hit means that we found the instruction in cache when we needed it, and a miss means we had to go to RAM to find it.
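Both measures are simple to compute from hit and miss counts. A minimal sketch (function names are my own):

```python
def miss_rate(hits, misses):
    """Fraction of all accesses that missed the cache."""
    total = hits + misses
    return misses / total if total else 0.0


def hit_to_miss_ratio(hits, misses):
    """Number of hits per miss; infinite when nothing missed."""
    return hits / misses if misses else float("inf")


example_miss_rate = miss_rate(hits=90, misses=10)
example_ratio = hit_to_miss_ratio(hits=90, misses=10)
```

So 90 hits and 10 misses give a miss rate of 10% and a hit-to-miss ratio of 9.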

We associate a **cost** to missing and needing to refill the cache, since it takes time.

#### 13.1.2 Wide-Bus Method

Often a wide bus is used to load data into cache. This works by: whenever we need to go into RAM to get an instruction, we just load that instruction, plus the next 16 bytes after it, in hopes that we'll need them soon (taking advantage of locality).

This takes one clock cycle to send the address to RAM, 1 cycle to find the block of data in RAM, and 1 cycle to send the data back to the cache. So the cost of missing with this method is 3. But that's the cost of loading a single byte anyway, so if we use the others that we loaded, we save lots of time.
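To see how much time this can save, here is a rough comparison in Python. It assumes every trip to RAM costs the 3 cycles described above, that the bytes are accessed sequentially, and that the wide bus brings in 16 bytes per trip; the block size is my reading of the description.

```python
import math

MISS_COST = 3  # cycles per trip to RAM: send address, find data, send back


def cycles_one_byte_at_a_time(num_bytes):
    """Every byte needs its own 3-cycle trip to RAM."""
    return MISS_COST * num_bytes


def cycles_wide_bus(num_bytes, block_bytes=16):
    """One 3-cycle trip brings in a whole block of sequential bytes."""
    return MISS_COST * math.ceil(num_bytes / block_bytes)


narrow = cycles_one_byte_at_a_time(32)
wide = cycles_wide_bus(32)
```

The saving only materializes if the extra bytes actually get used, which is why locality matters.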

# 13.2 Handling Misses

When the IR tries to receive an instruction from cache[PC] and misses, the MDR loads it from RAM[PC] (this takes a few ticks). Then cache[PC] is updated from the MDR, and we start the instruction over again.

Now a few definitions:

- Miss penalty = cycles to upload data into cache
- Cost of missing = miss frequency × penalty
- Program speed = n + m × penalty (n = number of instructions, m = number of instructions that miss)
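These definitions translate directly into code. A small sketch with made-up numbers (1000 instructions, 50 misses, penalty of 3 cycles):

```python
def cost_of_missing(miss_frequency, penalty):
    """Average extra cycles per instruction caused by misses."""
    return miss_frequency * penalty


def program_speed(n, m, penalty):
    """Total cycles: one per instruction, plus the penalty for each miss."""
    return n + m * penalty


total_cycles = program_speed(n=1000, m=50, penalty=3)
average_cost = cost_of_missing(miss_frequency=50 / 1000, penalty=3)
```

Here the 50 misses add 150 cycles on top of the 1000 needed if everything hit.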

### 13.3 Cache Addressing

Cache is smaller than RAM, so we need a way of shrinking the addresses down. This is done using modulo! In binary, taking a number modulo a power of two is the same as keeping only the last bits and chopping off the leading ones. For example  $(1011)_2 \mod (100)_2 = (11)_2$ , which is the same as if we had just cut off the first 10. In hardware this is done by just grounding wires.

So all addresses in RAM ending in, say 001, so 000001, 101001, 111001 etc would all map to the address 001 in cache!
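A quick check of this in Python, assuming a cache with 8 entries (3 index bits) as in the example:

```python
CACHE_INDEX_BITS = 3  # 8 cache entries, as in the example above


def cache_index(address):
    """Keep only the last CACHE_INDEX_BITS bits of the RAM address."""
    return address & ((1 << CACHE_INDEX_BITS) - 1)


indexes = [cache_index(a) for a in (0b000001, 0b101001, 0b111001)]
same_as_modulo = all(cache_index(a) == a % 8 for a in range(64))
```

All three addresses land on index 001, and masking gives the same answer as `% 8` for every address tried.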

This results in some overlaps. These are handled by cache having this structure:

| Index | V | Tag               | Data                          |
|-------|---|-------------------|-------------------------------|
| 000   | N |                   |                               |
| 001   | N |                   |                               |
| 010   | N |                   |                               |
| 011   | N |                   |                               |
| 100   | N |                   |                               |
| 101   | N |                   |                               |
| 110   | Y | 10 <sub>two</sub> | Memory(10110 <sub>two</sub> ) |
| 111   | N |                   |                               |
The index is what was mentioned above, the "cache address", or the last bits of the RAM address.

The column V keeps track of whether that table entry is valid, meaning whether it has any data in it or not. (or else how would we distinguish 00000 (not initialized), from 00000 (the number zero)?).

The tag column keeps track of the bits that we "cut off", as a unique identifier of the address we came from. This helps deal with overlaps.



So when we try to access the data at this index, we check to see if the tag matches the first (usually 20) bits that were cut from the RAM address. If it does, and the valid column (V) reads Y, then we know the data is correct, and we have a hit. If not, then we missed. Maybe the data was overwritten by an overlap, maybe it was never there to begin with.
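Putting the index, valid bit, and tag together, here is a minimal sketch of this lookup in Python. It uses the small 8-entry cache from the table (3 index bits, the rest of the address as the tag); the class and method names are my own.

```python
class DirectMappedCache:
    """Toy cache with one entry per index: (valid, tag, data)."""

    def __init__(self, index_bits=3):
        self.index_bits = index_bits
        size = 1 << index_bits
        self.valid = [False] * size
        self.tag = [0] * size
        self.data = [None] * size

    def _split(self, address):
        index = address & ((1 << self.index_bits) - 1)  # last bits
        tag = address >> self.index_bits                # the bits we cut off
        return tag, index

    def lookup(self, address):
        """Return (hit, data); a hit needs a valid entry with a matching tag."""
        tag, index = self._split(address)
        if self.valid[index] and self.tag[index] == tag:
            return True, self.data[index]
        return False, None

    def fill(self, address, value):
        """After a miss, store the value fetched from RAM."""
        tag, index = self._split(address)
        self.valid[index] = True
        self.tag[index] = tag
        self.data[index] = value


cache = DirectMappedCache()
first = cache.lookup(0b10110)         # miss: entry 110 is not valid yet
cache.fill(0b10110, "Memory(10110)")
second = cache.lookup(0b10110)        # hit: valid and tag 10 matches
third = cache.lookup(0b00110)         # miss: same index 110, different tag
```

After the fill, the cache looks like the table above: index 110 is valid with tag 10.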

# 14 Virtual Memory & Performance