## **Homework 4 Solution**

- 1) LWI Rd, Rm(Rn)
  - a) No new functional blocks are needed.
  - b) Only the control unit needs modification.

\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*

- 2) Based on the latencies of individual stages:
  - a) Clock cycle time: pipelined processor 400 ps; non-pipelined processor 1300 ps
  - b) Pipelined latency: 400 \* 5 = 2000 ps; non-pipelined latency: 1300 ps
  - c) The stage chosen to split is Memory, and the new latencies are:

| IF     | ID     | EX     | MEM1   | MEM2   | WB     |
|--------|--------|--------|--------|--------|--------|
| 250 ps | 300 ps | 150 ps | 200 ps | 200 ps | 200 ps |

Pipelined processor: 300 ps

Non-pipelined processor: 1300 ps

d) 35%
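The figures in parts a) to c) follow directly from the stage latencies. Below is a minimal Python sketch of that arithmetic. It assumes the unsplit MEM stage took 400 ps (inferred from MEM1 + MEM2 in the table and from the 1300 ps total) and ignores pipeline-register overhead; the function names are illustrative only.

```python
def clock_period(stages):
    """Pipelined clock period: the slowest stage sets the cycle time."""
    return max(stages.values())


def single_cycle_latency(stages):
    """Non-pipelined cycle time (and latency): the sum of all stage latencies."""
    return sum(stages.values())


def pipelined_latency(stages):
    """Latency of one instruction in the pipeline: one clock per stage."""
    return clock_period(stages) * len(stages)


# Stage latencies in ps. MEM = 400 is an assumption inferred from the split table.
original = {"IF": 250, "ID": 300, "EX": 150, "MEM": 400, "WB": 200}
split = {"IF": 250, "ID": 300, "EX": 150, "MEM1": 200, "MEM2": 200, "WB": 200}

print(clock_period(original))          # 400
print(pipelined_latency(original))     # 2000
print(single_cycle_latency(original))  # 1300
print(clock_period(split))             # 300
print(single_cycle_latency(split))     # 1300
```

After the split, ID (300 ps) becomes the slowest stage, which is why the pipelined clock drops to 300 ps while the non-pipelined figure stays at 1300 ps.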

\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*

3) With NOPs inserted:

ADDI X1, X2, #5

NOP

NOP

ADD X3, X1, X2

ADDI X4, X1, #15

NOP

ADD X5, X3, X2
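The NOP placement above can be reproduced mechanically. The following is a minimal Python sketch, assuming no forwarding and a register file that is written in the first half of a cycle and read in the second half, so that a consumer needs at least two instructions between itself and its producer. The helper names `regs` and `insert_nops` are hypothetical, and the parser only covers the few LEGv8 opcodes used in this homework.

```python
import re


def regs(instr):
    """Return (dest, sources) for the small LEGv8 subset used here."""
    opcode = instr.split()[0]
    names = re.findall(r"X\d+", instr)
    if opcode == "NOP":
        return None, []
    if opcode == "STUR":  # a store reads both registers and writes none
        return None, names
    return names[0], names[1:]


def insert_nops(program, gap=2):
    """Pad with NOPs so each consumer is at least gap + 1 slots after its producer."""
    out = []
    last_write = {}  # register -> index in `out` of its most recent writer
    for instr in program:
        dest, sources = regs(instr)
        # Earliest slot at which every source register has been written back.
        earliest = max(
            (last_write[r] + gap + 1 for r in sources if r in last_write),
            default=0,
        )
        while len(out) < earliest:
            out.append("NOP")
        if dest is not None:
            last_write[dest] = len(out)
        out.append(instr)
    return out


program = [
    "ADDI X1, X2, #5",
    "ADD X3, X1, X2",
    "ADDI X4, X1, #15",
    "ADD X5, X3, X2",
]
for line in insert_nops(program):
    print(line)
```

Run on this program, the sketch inserts two NOPs before `ADD X3, X1, X2` and one before `ADD X5, X3, X2`, matching the sequence above.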

4) The following sequence:

LDUR X1, [X6, #40]

ADD X2, X3, X1

ADD X1, X6, X4

STUR X2, [X4, #20]

AND X1, X1, X4

a) With NOPs inserted:

LDUR X1, [X6, #40]

NOP

NOP

ADD X2, X3, X1

ADD X1, X6, X4

NOP

STUR X2, [X4, #20]

AND X1, X1, X4

b) We can move an instruction up by swapping it with a neighboring instruction it has no dependence on, and in this way fill some of the NOP slots with useful instructions. Renaming the load's destination register to X7 eliminates the WAW and WAR dependences on X1.

LDUR X7, [X6, #40]

ADD X1, X6, X4

NOP

ADD X2, X3, X7

AND X1, X1, X4

NOP

STUR X2, [X4, #20]
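One way to check the reordered sequence is to confirm that every read-after-write pair is still far enough apart. The sketch below is a hedged Python illustration under the same assumptions as the rest of this solution (no forwarding, at least two instructions between producer and consumer). The names `regs` and `raw_violations` are hypothetical, and only RAW timing is checked, not whether the reordering preserves program semantics.

```python
import re


def regs(instr):
    """Return (dest, sources) for the small LEGv8 subset used here."""
    opcode = instr.split()[0]
    names = re.findall(r"X\d+", instr)
    if opcode == "NOP":
        return None, []
    if opcode == "STUR":  # a store reads both registers and writes none
        return None, names
    return names[0], names[1:]


def raw_violations(schedule, gap=2):
    """List (producer_index, consumer_index) pairs that sit too close together."""
    bad = []
    last_write = {}  # register -> index of its most recent writer
    for i, instr in enumerate(schedule):
        dest, sources = regs(instr)
        for r in sources:
            if r in last_write and i - last_write[r] <= gap:
                bad.append((last_write[r], i))
        if dest is not None:
            last_write[dest] = i
    return bad


reordered = [
    "LDUR X7, [X6, #40]",
    "ADD X1, X6, X4",
    "NOP",
    "ADD X2, X3, X7",
    "AND X1, X1, X4",
    "NOP",
    "STUR X2, [X4, #20]",
]
print(raw_violations(reordered))   # []
print(reordered.count("NOP"))      # 2
```

The reordered sequence needs two NOPs instead of the three in part a), so it occupies seven slots rather than eight.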

- 5) Branch outcomes:
  - a) Always-taken: 3/5 = 60%; always-not-taken: 2/5 = 40%
  - b) Predictor outputs: NT, NT, NT, NT, T

Prediction results: IC, C, IC, IC, IC (IC: incorrect, C: correct)

Accuracy: 1/5 = 20%
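The accuracies above can be checked with a short simulation. This Python sketch assumes the branch outcomes are T, NT, T, T, NT (inferred from the predictor outputs and their correct/incorrect markers, and consistent with 3 of 5 taken) and that the predictor in part b) is a 2-bit saturating counter starting in the strongly-not-taken state. The function names are illustrative only.

```python
def always_accuracy(outcomes, guess):
    """Accuracy of a static predictor that always answers `guess`."""
    return sum(o == guess for o in outcomes) / len(outcomes)


def two_bit_predictor(outcomes, state=0):
    """Simulate a 2-bit saturating counter.

    States 0 and 1 predict not-taken (NT); states 2 and 3 predict taken (T).
    """
    predictions = []
    for actual in outcomes:
        predictions.append("T" if state >= 2 else "NT")
        if actual == "T":
            state = min(state + 1, 3)
        else:
            state = max(state - 1, 0)
    return predictions


# Assumed outcome sequence, reconstructed from the answer above.
outcomes = ["T", "NT", "T", "T", "NT"]

predictions = two_bit_predictor(outcomes)
marks = ["C" if p == o else "IC" for p, o in zip(predictions, outcomes)]

print(always_accuracy(outcomes, "T"))    # 0.6
print(always_accuracy(outcomes, "NT"))   # 0.4
print(predictions)                       # ['NT', 'NT', 'NT', 'NT', 'T']
print(marks)                             # ['IC', 'C', 'IC', 'IC', 'IC']
print(marks.count("C") / len(marks))     # 0.2
```

The counter only reaches a taken-predicting state after the fourth branch, just in time to mispredict the final not-taken outcome, which is why it scores below both static predictors on this short sequence.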