

## **Ve370 Introduction to Computer Organization**

## Homework 7

## 楊毅文 519370910053

Assigned: November 23, 2021

Due: 2:00pm on November 30, 2021

Submit a PDF file on Canvas

1. (20 points) For a 2-way set associative cache with a 32-bit address and write back mechanism, the partition of the 32 bits are as follows:

- Offset: bit 6 to 0

- Index: bit 11-7

(1) What is the size of the cache? (5 points)

$$2^{5}*2^{5}*32+2^{5}*2^{5}*2^{5}*2^{5}*2^{5}*2^{5}*2^{5}*2^{5}*1+2^{5}*2^{5}*1=2092$$
 word =66944 bit

Starting from power on, the following byte addresses were used to access the cache memory: 0, 4, 20, 136, 232, 164, 1024, 30, 140, 3100, 176, 2180

- (2) What is the hit ratio? (5 points) 7/12=58.3%
- (3) Show the final state of the cache, with each valid line represented as <index, tag, data>. (10 points)

<1, 0, Mem[32~63]>

<1000, 0, Mem[256~287]>

<11000, 0, Mem[768~799]>

<10001, 0, Mem[544~575]>

2. (20 points) In general, cache access time is proportional to its capacity. Assume that main memory accesses take 70 ns and that memory accesses are 36% of all instructions. The following table shows data for caches attached to each of two processors, P1 and P2.

|    | Size | Miss Rate | Hit Time |
|----|------|-----------|----------|
| P1 | 16KB | 4.3%      | 1.18 ns  |
| P2 | 32KB | 2.7%      | 2.22 ns  |

(1) Assuming that the cache hit time determines the cycle times for P1 and P2, what are their respective clock rates? (5 points)

Clock rate 
$$P1 = 1/1.18 \text{ ns} = 0.847 \text{ GHz}$$

Clock rate 
$$P2 = 1/2.22 \text{ ns} = 0.450 \text{ GHz}$$

(2) What is the AMAT for P1 and P2? AMAT (Average Memory Access Time) is defined as follows: AMAT = Hit time + Miss rate × Miss penalty (5 points).

AMAT P1 = 
$$1.18 \text{ ns} + 4.3\% * 70 \text{ ns} = 4.19 \text{ ns}$$

AMAT 
$$P2 = 2.22 \text{ ns} + 2.7\% * 70 \text{ ns} = 4.11 \text{ ns}$$

(3) Assuming a base CPI of 1.0 without any memory stalls, what is the actual CPI for P1 and P2? Which processor is faster? (10 points)

Actual CPI P1 = 
$$1.0 + 100\% * 4.3\% * 70$$
 ns /  $1.18$  ns per cycle +  $36\% * 4.3\% * 70$  ns /  $1.18$  ns per cycle =  $4.47$ 

Actual CPI P2 = 
$$1.0 + 100\% * 2.7\% * 70$$
 ns /  $2.22$  ns per cycle +  $36\% * 2.7\% * 70$  ns /  $2.22$  ns per cycle =  $2.16$ 

The P2 is faster.

- 3. (60 points) Given the following byte addresses for memory access:
  - 3, 180, 43, 3, 191, 89, 190, 14, 181, 44, 186, 252
  - (1) Show the final cache contents for a 3-way set associative cache with two-word blocks and a total size of 24 words. Use LRU replacement. For each reference identify the index bits, the tag bits, the offset bits, and if it is a hit or a miss. (20 points)

|        | 3    | 180  | 43   | 3   | 191  | 89    | 190 | 14   | 181 | 44  | 186 | 252  |
|--------|------|------|------|-----|------|-------|-----|------|-----|-----|-----|------|
| tag    | 0    | 101  | 1    | 0   | 101  | 10110 | 101 | 0    | 101 | 1   | 101 | 111  |
| index  | 00   | 10   | 01   | 00  | 11   | 0     | 11  | 01   | 10  | 01  | 11  | 11   |
| offset | 011  | 100  | 011  | 011 | 111  | 001   | 110 | 110  | 101 | 100 | 010 | 100  |
| h/m    | miss | miss | miss | hit | miss | miss  | hit | miss | hit | hit | hit | miss |

## Final cache:

| index | tag | data1   | data2   |  |
|-------|-----|---------|---------|--|
| 00    | 0   | Mem[0]  | Mem[1]  |  |
|       |     |         |         |  |
|       |     |         |         |  |
| 01    | 1   | Mem[10] | Mem[11] |  |
|       | 0   | Mem[2]  | Mem[3]  |  |
|       |     |         |         |  |
| 10    | 101 | Mem[44] | Mem[45] |  |
|       |     |         |         |  |
|       |     |         |         |  |
| 11    | 101 | Mem[46] | Mem[47] |  |
|       | 10  | Mem[22] | Mem[23] |  |
|       | 111 | Mem[62] | Mem[63] |  |

(2) Show the final cache contents for a fully associative cache with one-word blocks and a total size of 8 words. Use LRU replacement. For each reference identify the index bits, the tag bits, and if it is a hit or a miss. (20 points)

|        | 3    | 180        | 43   | 3   | 191    | 89    | 190    | 14   | 181        | 44   | 186    | 252    |
|--------|------|------------|------|-----|--------|-------|--------|------|------------|------|--------|--------|
| tag    | 0    | 10110<br>1 | 1010 | 0   | 101111 | 10110 | 101111 | 11   | 10110<br>1 | 1011 | 101110 | 111111 |
| index  | ı    | ı          | 1    | 1   | ı      | Ī     | ı      | 1    | ı          | 1    | ı      | -      |
| offset | 11   | 00         | 11   | 11  | 11     | 01    | 10     | 10   | 01         | 00   | 10     | 00     |
| h/m    | miss | miss       | miss | hit | miss   | miss  | hit    | miss | hit        | miss | miss   | miss   |

Final cache:

| tag    | data    |
|--------|---------|
| 0      | Mem[0]  |
| 101101 | Mem[45] |
| 111111 | Mem[63] |
| 101111 | Mem[47] |
| 10110  | Mem[22] |
| 11     | Mem[3]  |
| 1011   | Mem[11] |
| 101110 | Mem[21] |

(3) What is the miss rate for a fully associative cache with two-word blocks and a total size of 8 words, using LRU replacement? What is the miss rate using MRU (most recently used) replacement? Finally what is the best possible miss rate for this cache, given any replacement policy? (20 points)

LRU miss rate = 9/12 = 75%

MRU miss rate = 9/12 = 75%

Then best possible miss rate of this cache is 75%.