1. (15 points)

(a) (5 points) sd, beq

(b) (5 points) All instructions will work correctly.

(c) (5 points) sub

2. (10 points)

 $FF4288E3 = 1111\ 1111\ 0100\ 0010\ 1000\ 1000\ 1110\ 0011 = beq\ x5,\ x20,\ -16$ 

(a) (7 points)

Branch = 1, MemRead = 0, MemtoReg = X, ALUOp = 01, MemWrite = 0, ALUSrc = 0, RegWrite = 0

(b) (3 points) reg[5], reg[20]
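The decoding of the instruction word in problem 2 can be checked mechanically. The following is a minimal sketch, assuming the standard RISC-V B-type field layout; the helper name `decode_b_type` is hypothetical and only covers conditional branches.

```python
def decode_b_type(word):
    """Split a 32-bit RISC-V B-type (conditional branch) word into its fields."""
    opcode = word & 0x7F
    funct3 = (word >> 12) & 0x7
    rs1 = (word >> 15) & 0x1F
    rs2 = (word >> 20) & 0x1F
    # The branch immediate is scattered: imm[12|10:5] in bits 31:25,
    # imm[4:1|11] in bits 11:7; imm[0] is always 0.
    imm = (
        (((word >> 31) & 0x1) << 12)
        | (((word >> 7) & 0x1) << 11)
        | (((word >> 25) & 0x3F) << 5)
        | (((word >> 8) & 0xF) << 1)
    )
    if imm & 0x1000:  # sign-extend the 13-bit immediate
        imm -= 0x2000
    return {"opcode": opcode, "funct3": funct3, "rs1": rs1, "rs2": rs2, "imm": imm}


fields = decode_b_type(0xFF4288E3)
print(fields)  # opcode 0x63 (branch), funct3 0 (beq), rs1 5, rs2 20, imm -16
```

The result matches `beq x5, x20, -16`, so the two registers read are x5 and x20.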

3. (10 points)

The ld instruction has the longest path through the datapath, so it determines the clock period. ld: PC Read + I-Mem + Register File + ALU + D-Mem + Mux + Register Setup = 30 + 200 + 140 + 160 + 200 + 30 + 20 = 780 ps

The shortest possible clock period is 780 ps.
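The sum above can be reproduced with a few lines. This is only a sketch: the dictionary name `LD_PATH_PS` is an assumption, and the latencies are the ones listed in the solution.

```python
# Component latencies (in ps) along the ld critical path, as listed above.
LD_PATH_PS = {
    "PC read": 30,
    "I-Mem": 200,
    "Register file": 140,
    "ALU": 160,
    "D-Mem": 200,
    "Mux": 30,
    "Register setup": 20,
}

# In a single-cycle datapath the clock must cover the slowest instruction.
min_clock_period_ps = sum(LD_PATH_PS.values())
print(min_clock_period_ps)  # 780
```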

4. (15 points)

(a) (10 points)

When the instruction in the MEM stage is a load or store, the instruction fetch in the IF stage must stall so that the MEM stage accesses memory first. The memory address input takes the load/store address, next\_pc keeps the current pc, and IF/ID is flushed.

When the instruction in the MEM stage is a load or store and a branch is taken in the ID stage, the MEM stage again accesses memory first, and both the IF and ID stages must stall. The memory address input takes the load/store address, next\_pc keeps the current pc, and the contents of IF/ID are held unchanged.

When the instruction in the MEM stage is not a load or store, the pipeline operates as usual.

## Sample solution 1:



## Sample solution 2:



(b) (5 points)

| Instructions      | Cyc1 | Cyc2 | Cyc3 | Cyc4       | Cyc5       | Cyc6       | Cyc7 | Cyc8 | Cyc9 | Cyc10 | Cyc11 | Cyc12 | Cyc13 | Cyc14 |
|-------------------|------|------|------|------------|------------|------------|------|------|------|-------|-------|-------|-------|-------|
| sd x31, 16(x18)   | IF   | ID   | EX   | MEM        | WB         |            |      |      |      |       |       |       |       |       |
| ld x31, 8(x18)    |      | IF   | ID   | EX         | MEM        | WB         |      |      |      |       |       |       |       |       |
| ld x28, 0(x18)    |      |      | IF   | ID         | EX         | MEM        | WB   |      |      |       |       |       |       |       |
| add x17, x15, x11 |      |      |      | IF (stall) | IF (stall) | IF (stall) | IF   | ID   | EX   | MEM   | WB    |       |       |       |
| beq x16, x0, label |     |      |      |            |            |            |      | IF   | ID   | EX    | MEM   | WB    |       |       |
| and x29, x11, x14 |      |      |      |            |            |            |      |      | IF   | ID    | EX    | MEM   | WB    |       |
| or x29, x30, x14  |      |      |      |            |            |            |      |      |      | IF    | ID    | EX    | MEM   | WB    |
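The timing in the table can be reproduced with a small simulation. This is a sketch under simplifying assumptions: a single memory shared by IF and MEM, a five-stage pipeline, no data hazards, and the branch case from part (a) is not modeled. The function name `fetch_cycles` is hypothetical.

```python
def fetch_cycles(is_mem_access):
    """Return the cycle in which each instruction completes IF.

    is_mem_access[i] is True when instruction i is a load or store.
    A fetch is delayed while an earlier load/store occupies the memory
    in its MEM stage (three cycles after its own fetch).
    """
    fetch = []
    busy = set()  # cycles in which a data access owns the single memory
    cycle = 0
    for is_mem in is_mem_access:
        cycle += 1
        while cycle in busy:  # structural hazard: IF must wait
            cycle += 1
        fetch.append(cycle)
        if is_mem:
            busy.add(cycle + 3)  # IF at c, so ID c+1, EX c+2, MEM c+3
    return fetch


# sd, ld, ld, add, beq, and, or
program = [True, True, True, False, False, False, False]
fetched = fetch_cycles(program)
total_cycles = fetched[-1] + 4  # the last instruction still needs ID, EX, MEM, WB
print(fetched, total_cycles)  # [1, 2, 3, 7, 8, 9, 10] 14
```

The add instruction is fetched successfully only in cycle 7, after the three memory accesses in cycles 4 to 6, which matches the three stall cycles in the table.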

5. (20 points)

(a) (4 points)

ld x5, -32(x4)

ld x6, -16(x4)

NOP

NOP

add x6, x5, x6

NOP

NOP

add x6, x6, x6

Need 4 NOPs
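The NOP count can be cross-checked with a short scheduling sketch. It assumes a five-stage pipeline in which the register file is written in the first half of a cycle and read in the second half, so without forwarding a consumer must issue at least 3 slots after its producer; with forwarding the gap is 2 slots after a load and 1 slot otherwise. The function name `count_nops` and the tuple encoding of instructions are hypothetical.

```python
def count_nops(program, forwarding):
    """Count the NOPs needed to resolve data hazards in a straight-line program.

    program: list of (opcode, dest_register, source_registers) tuples.
    """
    last_writer = {}  # register -> (issue slot, opcode) of its latest producer
    slot = -1
    for opcode, dest, sources in program:
        slot += 1  # earliest slot: right behind the previous instruction
        for reg in sources:
            if reg in last_writer:
                p_slot, p_op = last_writer[reg]
                if not forwarding:
                    gap = 3  # wait for the producer's WB
                elif p_op == "ld":
                    gap = 2  # load-use: data is available only after MEM
                else:
                    gap = 1  # ALU result is forwarded from EX
                slot = max(slot, p_slot + gap)
        if dest is not None:
            last_writer[dest] = (slot, opcode)
    return slot - (len(program) - 1)  # empty slots are the NOPs


PROGRAM = [
    ("ld", "x5", ["x4"]),
    ("ld", "x6", ["x4"]),
    ("add", "x6", ["x5", "x6"]),
    ("add", "x6", ["x6", "x6"]),
]
print(count_nops(PROGRAM, forwarding=False))  # 4
print(count_nops(PROGRAM, forwarding=True))   # 1
```

Under the same assumptions, the forwarding case gives the single NOP used in part (b).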

(b) (4 points)

| Instructions   | Cyc 1 | Cyc 2 | Cyc 3 | Cyc 4 | Cyc 5 | Cyc 6 | Cyc 7 | Cyc 8 | Cyc 9 |
|----------------|-------|-------|-------|-------|-------|-------|-------|-------|-------|
| ld x5, -32(x4) | IF    | ID    | EX    | MEM   | WB    |       |       |       |       |
| ld x6, -16(x4) |       | IF    | ID    | EX    | MEM   | WB    |       |       |       |
| NOP            |       |       | IF    | ID    | EX    | MEM   | WB    |       |       |
| add x6, x5, x6 |       |       |       | IF    | ID    | EX    | MEM   | WB    |       |
| add x6, x6, x6 |       |       |       |       | IF    | ID    | EX    | MEM   | WB    |

ld x5, -32(x4)

ld x6, -16(x4)

NOP

add x6, x5, x6

add x6, x6, x6

Need 1 NOP

(c) (9 points)

| Instructions   | Cyc 1 | Cyc 2 | Cyc 3 | Cyc 4 | Cyc 5 | Cyc 6 | Cyc 7 | Cyc 8 | Cyc 9 |
|----------------|-------|-------|-------|-------|-------|-------|-------|-------|-------|
| ld x5, -32(x4) | IF    | ID    | EX    | MEM   | WB    |       |       |       |       |
| ld x6, -16(x4) |       | IF    | ID    | EX    | MEM   | WB    |       |       |       |
| add x6, x5, x6 |       |       | IF    | ID    | ID    | EX    | MEM   | WB    |       |
| add x6, x6, x6 |       |       |       | IF    | IF    | ID    | EX    | MEM   | WB    |

- (i) (3 points) cycle 4
- (ii) (3 points) cycle 7
- (iii) (3 points) cycle 6

(d) (3 points)

9 cycles

6. (10 points)

Clock period =  $\frac{10}{3}$  ns.

(a) (5 points)

$$(7-1) + S = 30 / \frac{10}{3} = 9 \text{ (clock cycles)}, \quad S = 3$$

$$(7-1) + 5 \times S = 6 + 5 \times 3 = 21 \text{ (clock cycles)}, \quad 21 \times \frac{10}{3} = 70 \text{ (ns)}$$

(b) (5 points)

$$(N-1) + S = 90 / \frac{10}{3} = 27 \text{ (clock cycles)} \tag{1}$$

$$(N-1) + 6 \times S = 290 / \frac{10}{3} = 87 \text{ (clock cycles)} \tag{2}$$

$$(2) - (1): \quad 5 \times S = 60, \quad S = 12$$

$$(N-1) + 12 = 27, \quad N = 16$$

$$S = 12, N = 16$$
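The elimination in part (b) can be checked with exact arithmetic. This is a sketch only: the function name `solve_n_and_s` and its parameters are hypothetical, and the two equations are taken as written above, $(N-1) + k \times S$ with $k = 1$ and $k = 6$.

```python
from fractions import Fraction

CLOCK_NS = Fraction(10, 3)  # clock period in ns, as given above


def solve_n_and_s(t1_ns, t2_ns, k1=1, k2=6):
    """Solve (N-1) + k1*S = t1/clock and (N-1) + k2*S = t2/clock for N and S."""
    c1 = Fraction(t1_ns) / CLOCK_NS  # execution time in clock cycles
    c2 = Fraction(t2_ns) / CLOCK_NS
    s = (c2 - c1) / (k2 - k1)  # subtracting the equations removes N
    n = c1 - k1 * s + 1
    return n, s


n, s = solve_n_and_s(90, 290)
print(n, s)  # 16 12
```

Using `Fraction` avoids rounding error from the non-terminating clock period of 10/3 ns.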

7. (10 points)

(a) (4 points)

always taken: $\frac{3}{8} = 37.5\%$, always not taken: $\frac{5}{8} = 62.5\%$

(b) (3 points)

Accuracy: $\frac{4}{8} = 50\%$

| Ground truth | T | NT | NT | NT | NT | T  | NT | T  |
|--------------|---|----|----|----|----|----|----|----|
| State        | T | T  | NT | NT | NT | NT | T  | NT |
| Decision     | T | T  | NT | NT | NT | NT | T  | NT |
| Correctness  | O | X  | O  | O  | O  | X  | X  | X  |

(c) (3 points)

Accuracy: $\frac{5}{8} = 62.5\%$

ST: Strongly predict taken, WT: Weakly predict taken, SNT: Strongly predict not taken, WNT: Weakly predict not taken

| Ground truth | T   | NT  | NT  | NT  | NT  | T   | NT  | T   |
|--------------|-----|-----|-----|-----|-----|-----|-----|-----|
| State        | SNT | WNT | SNT | SNT | SNT | SNT | WNT | SNT |
| Decision     | NT  | NT  | NT  | NT  | NT  | NT  | NT  | NT  |
| Correctness  | X   | O   | O   | O   | O   | X   | O   | X   |
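The accuracies in problem 7 can be reproduced by simulating each predictor on the outcome sequence. This is a minimal sketch: the function names are hypothetical, the 1-bit predictor is assumed to start in T and the 2-bit predictor in SNT (as in the tables), and the 2-bit scheme is modeled as a saturating counter.

```python
OUTCOMES = ["T", "NT", "NT", "NT", "NT", "T", "NT", "T"]


def static_correct(outcomes, always):
    """Correct predictions when the same direction is always predicted."""
    return sum(actual == always for actual in outcomes)


def one_bit_correct(outcomes, start="T"):
    """1-bit predictor: predict whatever the branch did last time."""
    state, correct = start, 0
    for actual in outcomes:
        correct += state == actual
        state = actual
    return correct


def two_bit_correct(outcomes, start=0):
    """2-bit saturating counter: 0 = SNT, 1 = WNT, 2 = WT, 3 = ST."""
    counter, correct = start, 0
    for actual in outcomes:
        prediction = "T" if counter >= 2 else "NT"
        correct += prediction == actual
        counter = min(3, counter + 1) if actual == "T" else max(0, counter - 1)
    return correct


print(static_correct(OUTCOMES, "T"), static_correct(OUTCOMES, "NT"))  # 3 5
print(one_bit_correct(OUTCOMES))  # 4
print(two_bit_correct(OUTCOMES))  # 5
```

Each count is out of 8 branches, giving 37.5%, 62.5%, 50%, and 62.5% respectively.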

8. (10 points)

|         | IF               | ID               | EX             | MEM            | WB            |
|---------|------------------|------------------|----------------|----------------|---------------|
| Cycle 1 | ld x5, 0(x28)    | -                | -              | -              | -             |
| Cycle 2 | sub x6, x6, x5   | ld x5, 0(x28)    | -              | -              | -             |
| Cycle 3 | beq x28, x29, L1 | sub x6, x6, x5   | ld x5, 0(x28)  | -              | -             |
| Cycle 4 | beq x28, x29, L1 | sub x6, x6, x5   | bubble         | ld x5, 0(x28)  | -             |
| Cycle 5 | sd x28, 0(x29)   | beq x28, x29, L1 | sub x6, x6, x5 | bubble         | ld x5, 0(x28) |
| Cycle 6 | exception        | bubble           | bubble         | sub x6, x6, x5 | bubble        |