## Homework 2

1. The following table gives the parameters for a number of different caches. Fill in the missing fields in the table for each cache. Recall that m is the number of memory address bits, C is the cache size (number of data bytes), B is the block size in bytes, E is the associativity, S is the number of cache sets, t is the number of tag bits, s is the number of set index bits, and b is the number of block offset bits.

| Cache | C     | S   | E  | В  | m  | t  | S | b |
|-------|-------|-----|----|----|----|----|---|---|
| A     | 1024  | 16  | 16 | 4  | 64 | 58 | 4 | 2 |
| В     | 32768 | 128 | 8  | 32 | 32 | 20 | 7 | 5 |
| C     | 2048  | 32  | 1  | 64 | 32 | 21 | 5 | 6 |
| D     | 4096  | 256 | 2  | 8  | 64 | 53 | 8 | 3 |

2. Bob has a machine with 4-way cache. The cache line size is 64 bytes. There are 4 sets in the cache. Alice executes the code below on this machine. Suppose sum, i, j, k are stored in registers and sizeof(int) will return 4.

```
#define M 16
  #define N 16
 3
   int a[M][N];
 5
 6 int sum()
 7 {
 8
        int i, j;
 9
        int sum = 0;
10
       for (i = 0; i < M; i++)
11
            for (j = 0; j < N; j++)
12
                sum += a[i][j];
13
14
15
       return sum;
16 }
```

- (a) How many memory accesses in total are there in the loop between line 11 and line 13?  $16 \times 16 = 256$
- (b) How many cache misses in the loop in total?

  16, one cache miss on each iteration of the outer loop.
- (c) Suppose the latency of cache hit is 4 cycles. An access to main memory requires 200 cycles for 64 bytes. What is the average latency of accessing array elements in cycles when executing the loop between line 11 and line 13?

$$4 \times \frac{15}{16} + (200 + 4) \times \frac{1}{16} = 16.5$$

(d) Bob wants to execute this program on a new machine, whose cache doubles in size, to get better performance. The larger cache comes in different styles. Which one(s) of the followings will help?

- 1. Double the associativity of a set
- 2. Double the number of sets
- 3. Double the size of cache line

## Only 3 will reduce the miss rate into half.

3. Suppose we have a 16-bit machine whose cache is two-way set associative (E=2), with a 4-byte block size (B=4) and four sets (S=4). The contents of the cache are shown as follows, with all addresses, tags, and values given in hexadecimal notation.

| Set index | Tag | Valid | Byte 3 | Byte 2 | Byte 1 | Byte 0 |
|-----------|-----|-------|--------|--------|--------|--------|
| 0         | 281 | 0     | 43     | 42     | 41     | 40     |
| 0         | 361 | 1     | DO     | CC     | 97     | FE     |
| 1         | 98E | 0     | 47     | 46     | 45     | 44     |
| 1         | 361 | 1     | 15     | 05     | 20     | 15     |
| 2         | 0B5 | 1     | 48     | 4A     | 49     | 48     |
| 2         | 412 | 0     | 25     | E6     | 68     | 06     |
| 3         | 5FF | 1     | FF     | 03     | CO     | 9A     |
| 3         | 084 | 1     | 9F     | 62     | 47     | 5B     |

For each of the following memory accesses, indicate if it will be a cache hit or miss when carried out in sequence as listed. Also give the value of a read if it can be inferred from the information in the cache.

| Operation | Address | Hit (Y or N) | Read value (or unknown) |  |  |
|-----------|---------|--------------|-------------------------|--|--|
| Read      | 0x3615  | Y            | 20                      |  |  |
| Read      | 0x2816  | N            | <u>unkno</u> wn         |  |  |
| Read      | 0x084F  | Y            | 9F                      |  |  |
| Read      | 0x412B  | N            | <u>unkno</u> wn         |  |  |
| Read      | 0x2817  | <u> </u>     | <u>unkno</u> wn         |  |  |