# Computer Architecture (ECS 154A) Study Guide

# Contents

| 1        | Boo               | blean Algebra 3                        |
|----------|-------------------|----------------------------------------|
|          | 1.1               | Operations                             |
|          | 1.2               | Equivalence Laws                       |
|          | 1.3               | Truth Table                            |
|          |                   | 1.3.1 NOT                              |
|          |                   | 1.3.2 AND                              |
|          |                   | 1.3.3 OR                               |
|          |                   | 1.3.4 XOR                              |
|          |                   | 1.3.5 Example                          |
|          | 1.4               | Canonical Disjunctive Normal Form      |
|          |                   | 1.4.1 Minterm                          |
|          |                   | 1.4.2 Example                          |
|          | 1.5               | Canonical Conjunctive Normal Form      |
|          | 1.0               | 1.5.1 Maxterm                          |
|          |                   | 1.5.2 Example                          |
|          | 1.6               | Karnaugh Maps                          |
|          | 1.0               | 1.6.1 Example (Three Variables)        |
|          |                   | - \                                    |
|          |                   | 1 (                                    |
|          | 1 7               | 1 /                                    |
|          | 1.7               | Quine-McClucksey Algorithm             |
| <b>2</b> | Cor               | nbinational Logic Circuits 8           |
| _        | 2.1               | Gates                                  |
|          | 2.2               | Timing Diagrams                        |
|          | 2.3               | Multiplexers, Decoders, Shifters       |
|          | $\frac{2.5}{2.4}$ | Adders and Subtracters                 |
|          | 2.4               | Designing Combinational Logic Circuits |
|          | ۷.0               | Designing Combinational Logic Circuits |
| 3        | Fini              | ite State Automata 8                   |
|          | 3.1               | Moore Model                            |
|          | 3.2               | Mealy Model                            |
|          |                   |                                        |
| 4        | $\mathbf{Seq}$    | uential Logic Circuit 9                |
|          | 4.1               | Latches                                |
|          | 4.2               | Flip Flops                             |
|          | 4.3               | Registers and Counters                 |
|          | 4.4               | Designing Sequential Logic Circuits    |
|          |                   | 4.4.1 Steps                            |
|          | 4.5               | Example                                |
|          |                   | 4.5.1 State Transition Diagram         |
|          |                   | 4.5.2 State Minimization               |
|          |                   | 4.5.3 Binary Code Assignment           |
|          |                   | 10.0 2 mar j Codo 110018 mm cm         |
|          |                   | 4.5.4 State Transition Table           |

| 5 | Sing         | gle Cycle CPU Design             | <b>12</b> |
|---|--------------|----------------------------------|-----------|
| 6 | Cac          | che                              | <b>12</b> |
|   | 6.1          | Memory Hierarchy                 | 12        |
|   | 6.2          | Direct Mapped Cache              | 13        |
|   |              | 6.2.1 Format                     | 13        |
|   |              | 6.2.2 Example                    | 13        |
|   | 6.3          | Fully Associative                | 14        |
|   |              | 6.3.1 Format                     | 14        |
|   |              | 6.3.2 Example                    | 14        |
|   | 6.4          | Set Associative                  | 15        |
|   |              | 6.4.1 Format                     | 15        |
|   |              | 6.4.2 Example                    | 15        |
|   |              | 6.4.3 Associativity Remarks      | 16        |
|   | 6.5          | Replacement Strategies           | 16        |
|   |              | 6.5.1 Least Recently Used (LRU)  | 16        |
|   |              | 6.5.2 First-In, First-Out (FIFO) | 16        |
|   |              | 6.5.3 Most Recently Used (MRU)   | 17        |
|   | 6.6          | Access Types                     | 17        |
|   |              | 6.6.1 Physically Addressed Cache | 17        |
|   |              | 6.6.2 Virtually Addressed Cache  | 17        |
|   | 6.7          | Cache Miss Classification        | 18        |
|   |              | 6.7.1 Compulsory                 | 18        |
|   |              | 6.7.2 Capacity                   | 18        |
|   |              | 6.7.3 Conflict                   | 18        |
|   |              | 6.7.4 Coherence                  | 18        |
|   | 6.8          | Fundamental Cache Parameters     | 18        |
|   |              | 6.8.1 Cache Size                 | 18        |
|   |              | 6.8.2 Block Size                 | 18        |
|   |              | 6.8.3 Associativity              | 19        |
|   |              | 6.8.4 Access Type                | 19        |
|   | 6.9          | Advance Caches                   | 19        |
| 7 | Virt         | tual Memory                      | 19        |
| • | 7.1          | Virtual Address Format           | 19        |
|   | $7.1 \\ 7.2$ | Page Table Entry Format          | 19        |
|   | 7.2          | Example                          | 19        |
|   |              |                                  | 19        |
| 8 | Mul          | lti-Cycle CPU Design             | 20        |
| 9 | Pipe         | eline CPU Design                 | 20        |

# 1 Boolean Algebra

# 1.1 Operations

| Operation | Symbol   | Example      |
|-----------|----------|--------------|
| NOT       | _        | $\bar{A}$    |
|           | !        | !A           |
|           | _        | $\neg A$     |
|           | $\sim$   | $\sim A$     |
| AND       | $\wedge$ | $A \wedge B$ |
|           | *        | A*B          |
|           |          | AB           |
| OR        | V        | $A \vee B$   |
|           | +        | A + B        |
| XOR       | $\oplus$ | $A \oplus B$ |

# 1.2 Equivalence Laws

| Name         | OR Version                             | AND Version                            |
|--------------|----------------------------------------|----------------------------------------|
| Commutative  | A + B = B + A                          | A * B = B * A                          |
| Associative  | (A+B) + C = A + (B+C)                  | (A * B) * C = A * (B * C)              |
| Distributive | A + (B * C) = (A + B) * (A + C)        | A * (B + C) = (A * B) + (A * C)        |
| Identity     | A + 0 = A                              | A*1=A                                  |
| Annulment    | A+1=1                                  | A * 0 = 0                              |
| Idempotent   | A + A = A                              | A * A = A                              |
| Complement   | $A + \bar{A} = 1$                      | $A * \bar{A} = 0$                      |
| De Morgan's  | $\overline{(A+B)} = \bar{A} * \bar{B}$ | $\overline{(A*B)} = \bar{A} + \bar{B}$ |

# 1.3 Truth Table

Truth tables are mathematical tables composing of every combination of inputs and the resulting function. The number of combinations or rows in the truth table is  $2^N$ , where N is the number of inputs.

# 1.3.1 NOT

$$\begin{array}{c|c}
A & f(A) = !A \\
\hline
0 & 1 \\
1 & 0
\end{array}$$

#### 1.3.2 AND

| A | В | f(A,B) = A * B |
|---|---|----------------|
| 0 | 0 | 0              |
| 0 | 1 | 0              |
| 1 | 0 | 0              |
| 1 | 1 | 1              |

#### 1.3.3 OR

| A | В | f(A,B) = A + B |
|---|---|----------------|
| 0 | 0 | 0              |
| 0 | 1 | 1              |
| 1 | 0 | 1              |
| 1 | 1 | 1              |

### 1.3.4 XOR

| A | В | $f(A,B) = A \oplus B$ |
|---|---|-----------------------|
| 0 | 0 | 0                     |
| 0 | 1 | 1                     |
| 1 | 0 | 1                     |
| 1 | 1 | 0                     |

# **1.3.5** Example

Create a function that takes three inputs — x2, x1, x0 — and produces a high or on signal when only two input signals are on.

| x2 | x1 | x0 | f(x2, x1, x0) |
|----|----|----|---------------|
| 0  | 0  | 0  | 0             |
| 0  | 0  | 1  | 0             |
| 0  | 1  | 0  | 0             |
| 0  | 1  | 1  | 1             |
| 1  | 0  | 0  | 0             |
| 1  | 0  | 1  | 1             |
| 1  | 1  | 0  | 1             |
| 1  | 1  | 1  | 0             |

# 1.4 Canonical Disjunctive Normal Form

### 1.4.1 Minterm

The canonical disjunctive normal form is also known as a minterm, typically represented by a lower case 'm'. A minterm is a logical product term such that it uses each input variable once and it

uses only the complement operator and conjunction operation. In other words, if there are n input variables, the minterm must keep to the following constraints:

- 1. be a logical product that evaluates to logically true
- 2. use each of the n variables once
- 3. only use NOT operators and AND operators

For three inputs, the minterms are as follows:

| $x_2$ | $x_1$ | $x_0$ |       | Minterm (Product Term)  |
|-------|-------|-------|-------|-------------------------|
| 0     | 0     | 0     | $m_0$ | $ar{x_2}ar{x_1}ar{x_0}$ |
| 0     | 0     | 1     | $m_1$ | $ar{x_2}ar{x_1}x_0$     |
| 0     | 1     | 0     | $m_2$ | $ar{x_2}x_1ar{x_0}$     |
| 0     | 1     | 1     | $m_3$ | $ar{x_2}x_1x_0$         |
| 1     | 0     | 0     | $m_4$ | $x_2ar{x_1}ar{x_0}$     |
| 1     | 0     | 1     | $m_5$ | $x_2 \bar{x_1} x_0$     |
| 1     | 1     | 0     | $m_6$ | $x_2x_1ar{x_0}$         |
| 1     | 1     | 1     | $m_7$ | $x_2 x_1 x_0$           |

Note: Product terms all evaluate to be logically true.

For a given function, f, it is possible to express the function as a "sum of products". This is a special form of the canonical normal form in that it only includes terms such that it makes the function logically true.

# **1.4.2** Example

Create a function that takes three inputs — x2, x1, x0 — and produces a high or on signal when only two input signals are on. Express this function as a sum of products.

|       | x2 | x1 | x0 | f(x2, x1, x0) |
|-------|----|----|----|---------------|
|       | 0  | 0  | 0  | 0             |
|       | 0  | 0  | 1  | 0             |
|       | 0  | 1  | 0  | 0             |
| $m_3$ | 0  | 1  | 1  | 1             |
|       | 1  | 0  | 0  | 0             |
| $m_5$ | 1  | 0  | 1  | 1             |
| $m_6$ | 1  | 1  | 0  | 1             |
|       | 1  | 1  | 1  | 0             |

$$f(x_2, x_1, x_0) = m_3 + m_5 + m_6$$
  
=  $(\bar{x_2}x_1x_0) + (x_2\bar{x_1}x_0) + (x_2x_1\bar{x_0})$ 

# 1.5 Canonical Conjunctive Normal Form

#### 1.5.1 Maxterm

The canonical conjunctive normal form is also known as a maxterm, typically represented by a lower case 'M'. A maxterm is a logical sum term such that it uses each input variable once and it uses only the complement operator and disjunction operation. In other words, if there are n input variables, the minterm must keep to the following constraints:

- 1. be a logical sum that evaluates to logically false
- 2. use each of the n variables once
- 3. only use NOT operators and OR operators

For three inputs, the maxterms are as follows:

| $x_2$ | $x_1$ | $x_0$ |       | Maxterm (Sum Term)                  |
|-------|-------|-------|-------|-------------------------------------|
| 0     | 0     | 0     | $M_0$ | $x_2 + x_1 + x_0$                   |
| 0     | 0     | 1     | $M_1$ | $x_2 + x_1 + \bar{x_0}$             |
| 0     | 1     | 0     | $M_2$ | $x_2 + \bar{x_1} + x_0$             |
| 0     | 1     | 1     | $M_3$ | $x_2 + \bar{x_1} + \bar{x_0}$       |
| 1     | 0     | 0     | $M_4$ | $\bar{x_2} + x_1 + x_0$             |
| 1     | 0     | 1     | $M_5$ | $\bar{x_2} + x_1 + \bar{x_0}$       |
| 1     | 1     | 0     | $M_6$ | $\bar{x_2} + \bar{x_1} + x_0$       |
| 1     | 1     | 1     | $M_7$ | $\bar{x_2} + \bar{x_1} + \bar{x_0}$ |

Note: Sum terms all evaluate to be logically false.

For a given function, f, it is possible to express the function as a "product of sums". This is a special form of the canonical normal form in that it only includes values such that it makes the function logically false.

### **1.5.2** Example

Create a function that takes three inputs — x2, x1, x0 — and produces a high or on signal when only two input signals are on. Express this function as a product of sums.

|       | x2 | x1 | x0 | f(x2, x1, x0) |
|-------|----|----|----|---------------|
| $M_0$ | 0  | 0  | 0  | 0             |
| $M_1$ | 0  | 0  | 1  | 0             |
| $M_2$ | 0  | 1  | 0  | 0             |
|       | 0  | 1  | 1  | 1             |
| $M_4$ | 1  | 0  | 0  | 0             |
|       | 1  | 0  | 1  | 1             |
|       | 1  | 1  | 0  | 1             |
| $M_7$ | 1  | 1  | 1  | 0             |

$$f(x_2, x_1, x_0) = M_0 * M_1 * M_2 * M_4 * M_7$$
  
=  $(x_2 + x_1 + x_0) * (x_2 + x_1 + \bar{x_0}) * (x_2 + \bar{x_1} + x_0) * (\bar{x_2} + x_1 + x_0) * (\bar{x_2} + \bar{x_1} + \bar{x_0})$ 

# 1.6 Karnaugh Maps

Karnaugh maps are a visual methodology to simplifying boolean expressions. When setting up the Karnaugh map, input values cannot differ more than one hamming distance. The resulting expressions are either a sum of products or a product of sums.

# 1.6.1 Example (Three Variables)

Find a minimum sum of products expression for  $f(x_2, x_1, x_0) = m_1 + m_4 + m_5 + m_7$ .



$$f(x_2, x_1, x_0) = (\bar{x_1}x_0) + (x_2\bar{x_1}) + (x_2x_0)$$

#### 1.6.2 Example (Four Variables)

Find a minimum product of sums expression for  $f(x_3, x_2, x_1, x_0) = m_0 + m_2 + m_6 + m_8 + m_9 + m_{10} + m_{14} + D_{15}$ .



$$f(x_3, x_2, x_1, x_0) = (x_3 + \bar{x_0}) * (\bar{x_2} + x_1) * (\bar{x_1} + \bar{x_0})$$

- 1.6.3 Example (Five Variables)
- 1.7 Quine-McClucksey Algorithm

# 2 Combinational Logic Circuits

- 2.1 Gates
- 2.2 Timing Diagrams
- 2.3 Multiplexers, Decoders, Shifters
- 2.4 Adders and Subtracters
- 2.5 Designing Combinational Logic Circuits

# 3 Finite State Automata

#### 3.1 Moore Model

A Moore model is a finite state machine whose output value is solely dependent on the current state.



# 3.2 Mealy Model

A Mealy model is a finite state machine whose output value is dependent on the current state and the input.



# 4 Sequential Logic Circuit

- 4.1 Latches
- 4.2 Flip Flops
- 4.3 Registers and Counters
- 4.4 Designing Sequential Logic Circuits

# 4.4.1 Steps

- 1. State Transition Diagram
- 2. State Minimization
- 3. Binary Code Assignment
- 4. State Transition Table
- 5. Boolean Algebra Minimization
- 6. Build Circuit

# 4.5 Example

Design a circuit that will receive a continuous bitstream that produces a high signal whenever either of the following sequence appears: 01, 10.

# 4.5.1 State Transition Diagram



# 4.5.2 State Minimization



# 4.5.3 Binary Code Assignment

$$x_1x_0$$
 $x_2$ 
00 01 11 10
0  $S_i$   $S_{1s}$   $S_{10}$   $X$ 
1  $S_{0s}$   $S_{01}$   $X$   $X$ 

# 4.5.4 State Transition Table

| Current State | Current State $(x_2, x_1, x_0)$ | Input $(x_3)$ | Next State $(x'_2, x'_1, x'_0)$ | Output $Z$ |
|---------------|---------------------------------|---------------|---------------------------------|------------|
| i             | 0 0 0                           | 0             | 1 0 0                           | 0          |
|               | 0 0 0                           | 1             | 0 0 1                           | 0          |
| 1s            | 0 0 1                           | 0             | 0 1 1                           | 0          |
|               | 0 0 1                           | 1             | 0 0 1                           | 0          |
| 10            | 0 1 1                           | 0             | 100                             | 1          |
|               | 0 1 1                           | 1             | 1 0 1                           | 1          |
| 0s            | 100                             | 0             | 100                             | 0          |
|               | 1 0 0                           | 1             | 1 0 1                           | 0          |
| 01            | 1 0 1                           | 0             | 0 1 1                           | 1          |
|               | 1 0 1                           | 1             | 0 0 1                           | 1          |

# 4.5.5 Boolean Algebra Minimization



 $x_2' = (\bar{x_3}\bar{x_0}) + x_1 + (x_2\bar{x_0})$ 





Next State  $x_1'$ 

$$x_1' = (\bar{x_3}x_2x_0) + (x_3\bar{x_2}\bar{x_1}x_0)$$

Next State  $x'_0$ 

$$x_0' = (\bar{x_1}x_0) + (x_3)$$

# Output

| $x_1$    |    |    |    |    |
|----------|----|----|----|----|
| $x_3x_2$ | 00 | 01 | 11 | 10 |
| 00       | 0  | 0  | 1  | X  |
| 01       | 0  | 1  | X  | X  |
| 11       | 0  | 1  | X  | X  |
| 10       | 0  | 0  | 1  | X  |

$$Z = (x_2 x_0) + (x_1)$$

# 5 Single Cycle CPU Design

# 6 Cache

Effectiveness is based on the concept of information reuse: temporal locality and spatial locality.

# 6.1 Memory Hierarchy

Goal: Make memory perform as if it was made of the most expensive and fastest type, but cost as if made of the cheapest type.

- 1. Fast, Hot, Expensive
- 2. Static RAM
- 3. Dynamic RAM
- 4. Disk

A cache is smaller than main memory and is composed of numerous cache lines. Cache lines consist of a dirty bit, a tag, and block(s) of data. If the dirty bit is on, then it signals the CPU to write the data from this cache line into main memory when this cache line is freed. Caches also contain a valid bit which signifies whether there is loaded data into the cache — imagine starting up the computer for the first time, the cache is going to be empty. The issue is that even when a cache is empty (bits are all 0), it still holds some signal. As we continue, when we mention the size of a cache line, we refer to the size of the data blocks in the cache line only (excluding flags and tag).

| Flags | Tag | Block(s) |
|-------|-----|----------|
|-------|-----|----------|

### 6.2 Direct Mapped Cache

A given address is partitioned into three components: Tag, line number, and offset. The line number directly accesses a specific cache line. It then compares the tag partitioned from the address to the tag stored in the cache line. If the tags match, then it is a cache hit. Otherwise, it becomes a cache miss. Assuming that it was a cache hit, it then proceeds to use the offset to select which block to read or write. The block of data in the cache line can be thought of as an array and the offset as an index into this "array".

#### **6.2.1** Format

| Tag   | Line Number | Offset |
|-------|-------------|--------|
| 1 -48 | Line rannon | OHECU  |

### 6.2.2 Example

A CPU is using 24-bit addresses and is byte-addressable. Each line in cache holds 16 bytes of data and each block is 1 byte. Assuming that the tag is 12 bits wide, find the following: number of lines in cache, size of cache, size of tag, and sizes of each partition in the format.

$$Size(Cache\ Line) = 16\ bytes$$

$$Bit\_Width(Offset) = Number\ of\ bits\ needed\ to\ address\ 16\ blocks$$

$$= \log_2 16$$

$$= 4\ bits$$

$$Bit\_Width(Line\ Number) = Size(Address) - Bit\_Width(Tag) - Bit\_Width(Offset)$$

$$= 24\ bits - 12\ bits - 4\ bits$$

$$= 8\ bits$$

$$Number\ of\ Cache\ Lines = 2^{Bit\_Width(Line\ Number)}$$

$$= 2^8$$

$$= 256$$

$$Size\ of\ Cache = Number\ of\ Cache\ Lines * Size\ of\ Cache\ Lines$$

$$= 2^8 * 2^4$$

$$= 2048\ bytes$$

$$= 2\ KB$$

Format Partition Sizes and Bit Range

|           | Tag     | Line Number | Offset |
|-----------|---------|-------------|--------|
| Size      | 12 Bits | 8 Bits      | 4 Bits |
| Bit Range | [12,23] | [4,11]      | [0,3]  |

The bit range represents what bits that partition occupies. Ex: Offset occupies bits 0, 1, 2, and 3. The four least-significant bits.

# 6.3 Fully Associative

A given address is partitioned into two components: Tag and offset. The reason why a line number is unused is because the CPU will perform a linear search through all cache lines looking for either a cache hit, an unused cache line, or a cache line to replace. Because of this, using full association has the lowest miss rates. The tag and offset performs the same as the tag and offset in direct mapping.

#### **6.3.1** Format

| Tag | Offset |
|-----|--------|
|-----|--------|

# 6.3.2 Example

A CPU is using 11-bit addresses and is byte-addressable. The cache size is 128 bytes and the size of each cache line is 8 bytes. Find the following: Bit width of offset, bit width of tag, number of cache lines, and sizes and bit ranges of the partitions.

$$\begin{array}{rcl} \operatorname{Bit\_Width}(\operatorname{Offset}) &=& \operatorname{Number\ of\ bits\ needed\ to\ address\ 8\ blocks} \\ &=& \log_2 8 \\ &=& 3\ \operatorname{bits} \end{array}$$
 
$$\begin{array}{rcl} \operatorname{Bit\_Width}(\operatorname{Tag}) &=& \operatorname{Size}(\operatorname{Address}) - \operatorname{Bit\_Width}(\operatorname{Offset}) \\ &=& 11\ \operatorname{bits} - 3\ \operatorname{bits} \\ &=& 8\ \operatorname{bits} \end{array}$$

Format Partition Sizes and Bit Range

|           | Tag    | Offset |
|-----------|--------|--------|
| Size      | 8 Bits | 3 Bits |
| Bit Range | [3,10] | [0,2]  |

#### 6.4 Set Associative

Set associative caching is a hybrid between direct-mapped and fully associative. A given address is partitioned into three components: Tag, set number, and offset. What makes set associative different from direct-mapped caching is that a given address will access a specific set of cache lines instead of a single cache line. Within that set of cache lines, it will perform a linear search like the fully associative cache. The tag and offset perform the same as the direct-mapped cache and fully associative cache. Important terminology: N-way associative cache means that there are N cache lines per set. This is also known as the associativity.

#### **6.4.1** Format

| Tag | Set Number | Offset |
|-----|------------|--------|

#### **6.4.2** Example

A CPU is using 64-bit addresses and is byte-addressable. It also uses a 3-way set associative cache with a cache size of 98,304 bytes and 32 sets. Find the following: Number of lines per set, size of each set, size of cache line, bit width of offset, bit width of set number, bit width of tag, and partition sizes and bit ranges.

```
Number of Lines per Set = 3 (Given)
```

Size(Cache Line) = Size(Set)/ Number of lines per set  
= 
$$3072 \text{ bytes/3}$$
  
=  $1024 \text{ bytes}$ 

```
Bit_Width(Offset) = Number of bits needed to represent 1024 bytes

= log_2 Size(Cache Line)

= log_2 1024

= 10 bits
```

```
\begin{array}{lll} {\rm Bit\_Width(Set\ Number)} &=& {\rm Number\ of\ bits\ needed\ to\ represent\ 32\ sets} \\ &=& {\log_2{\rm Number\ of\ sets}} \\ &=& {\log_2{32}} \\ &=& 5\ {\rm bits} \end{array}
```

```
Bit_Width(Tag) = Size(Address) - Bit_Width(Set Number) - Bit_Width(Offset)
= 64 bits - 5 bits - 10 bits
= 49 bits
```

Format Partition Sizes and Bit Range

|           | Tag     | Set Number | Offset  |
|-----------|---------|------------|---------|
| Size      | 12 Bits | 5 Bits     | 10 Bits |
| Bit Range | [15,63] | [10,14]    | [0,9]   |

# 6.4.3 Associativity Remarks

It can actually be noted that direct-mapped caching and fully associative caching are a form of set associative cache. Suppose that our cache has N lines and we use an 1-way set associative cache. In this scenario, it would imply that each set consists of one cache line. This would mean that the set number is the exact same as a line number. Therefore, 1-way set associative caches are also direct mapped caches.

Suppose that our cache has N lines and we use an N-way set associative cache. This implies that there is a single set with N lines. Recall the format of the set associative — tag, set number, and offset. If there is only one set, how many bits are required to represent one set.  $\log_2 1 = 0$ . Therefore, there is no need for any bits to represent that one set. In this case, the format becomes – tag and offset. Also, recall that in set associative, it first uses the set number to access a specific set and then performs a linear search among the cache lines within that set. Since there is only one set, it performs the linear search over that entire set by default. Therefore, an N-way set associative cache is also a fully associative cache.

# 6.5 Replacement Strategies

- Least Recently Used (LRU)
- First-In, First-Out (FIFO)
- Last-In, First-Out (LIFO)
- Random
- Most Recently Used (MRU)

### 6.5.1 Least Recently Used (LRU)

Each cache line is required to contain "age bits". The least recently used cache line is tracked through these age bits. Every time a cache line is used, the age bits of all other cache lines change.

#### 6.5.2 First-In, First-Out (FIFO)

Requires a queue to store references to the cache lines. Oldest cache line will be the first to be discarded.

### 6.5.3 Most Recently Used (MRU)

Opposite of least recently used. Most useful for situations where there is a positive correlation between the age of data and probability of access.

# 6.6 Access Types

Assuming that the CPU uses virtual addresses, the translation look-aside buffer (TLB) is used as a cache of recent virtual to physical translations. There are four components needed to keep in consideration: The CPU, TLB, Cache, and RAM. RAM must be accessed using physical addresses, but the three other components may use virtual addresses.

# 6.6.1 Physically Addressed Cache

In physically address caches, the CPU sends a virtual address to the translation look-aside buffer and converts it immediately to a physical address. This physical address is then used to access the cache. On a cache miss, the physical address will be used immediately into RAM. It is important to note that to access cache, it must always go through the translation look-aside buffer. This implies that a physically addressed may run slower, but is safer.



#### 6.6.2 Virtually Addressed Cache

In a virtually address cache, the CPU accesses cache immediately with the virtual address. On cache-hit, it returns the block of data. On cache-miss, however, the virtual address is sent to the translation look-aside buffer and converted to a physical address. The physical address is then used to access RAM. Since the translation look-aside buffer is not always accessed, virtually addressed caches are faster in the average case. However, there are quite a few issues with a virtually addressed cache. Whenever there is a context switch, the entire cache may need to be flushed — two processes may use the same virtual address, but their contents are completely different.



#### 6.7 Cache Miss Classification

#### 6.7.1 Compulsory

For a compulsory miss, it means that the data was never seen before. A simple example of this is when the CPU first starts and the cache is empty. Any data accessed will have never been seen before.

#### 6.7.2 Capacity

For a capacity miss, it means that the cache cannot contain all blocks for which the program requires. Blocks are being swapped in and out constantly.

#### 6.7.3 Conflict

For a conflict miss, it pertains to the mapping strategy used. Take for example direct-mapped caches. If all addresses used in the program only accessed one cache line because of the partition bit range, it means that one cache line will be constantly changing while other cache lines are unused.

#### 6.7.4 Coherence

Coherence misses come from false sharing due to multi-processors. Suppose there are two processors, each of which has loaded the same block into their cache. Now if one processor updates that block in its own cache, it invalidates the other processor's block.

#### 6.8 Fundamental Cache Parameters

#### 6.8.1 Cache Size

Larger caches produces lower miss rates, but also longer access times.

#### 6.8.2 Block Size

Small block sizes do not exploit the locality principle and it does not help with compulsory misses. Larger block sizes, however, may pollute the cache in that it may be grabbing more chunks than necessary. These extra chunks become useless and occupy cache space.

#### 6.8.3 Associativity

Larger associativity reduces miss rates, but it becomes slower and harder to build.

#### 6.8.4 Access Type

Virtually addressed caches provide lower access times, but on context-switches will require the entire cache to be flushed. Physically addressed caches do not have issues with context switches, but increases it access time as it needs to be go through the translation look-aside buffer each time.

#### 6.9 Advance Caches

- Small and Simple Cache
- Way Prediction Cache
- Pseudo Association Cache
- Non-Blocking Cache
- Multi-Bank Cache
- Critical Word First Cache

# 7 Virtual Memory

#### 7.1 Virtual Address Format

| Virtual Page Number | Offset |
|---------------------|--------|
|---------------------|--------|

# 7.2 Page Table Entry Format

| Flag Bits | Protection Bits | Physical Frame Number |
|-----------|-----------------|-----------------------|
| 0         |                 | •                     |

#### 7.3 Example

Given a 1 Megabyte physical memory, a 30 bit virtual address, and a page size of 8 KB, find the following: number of page frames, bit width of offset, number of entries in the page table and the width of each entry.

Number of Page Frames = Size(Physical Memory)/Size(Page)  
= 
$$2^{30}$$
 bytes/ $2^{13}$  bytes  
=  $2^{17}$  page frames  
Bit\_Width(Offset) = Number of bits needed to reference 8 KB  
=  $\log_2 2^{13}$   
= 13 bits

# Virtual Address Partition Sizes

|      | Virtual Page Number | Offset  |
|------|---------------------|---------|
| Size | 49 Bits             | 13 Bits |

Number of Page Table Entries = Number of Virtual Pages =  $2^{\text{Bit\_Width(Virtual Page Number)}}$ =  $2^{49}$ 

Bit\_Width (Page Table Entry) = Number of needed to reference  $2^{17}$  page frames =  $\log_2 2^{17}$ = 17 bits

- 8 Multi-Cycle CPU Design
- 9 Pipeline CPU Design