

# Ve370 Introduction to Computer Organization

# Homework 4

Assigned: October 21, 2021

Due: 2:00pm on October 28, 2021

Submit a PDF file on Canvas

1. (15 points) Given this instruction:

lw x5, -4(x2)

As the instruction goes through the pipeline, what will be stored in the pipeline registers:

IF: what's in PC

## **Answer:**

- address of the lw instruction (1 point)

ID: what's in IF/ID

## **Answer:**

- address of the lw instruction (PC)
- instruction word: 111111111100 00010 010 00101 0000011 (imm[11:0], rs1, funct3, rd, opcode), i.e. 0xFFC12283 (1 point)
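As a cross-check of the IF/ID contents, here is a minimal Python sketch that packs the RV32I I-type fields of `lw x5, -4(x2)`. The helper name `encode_i_type` is hypothetical. The field layout (imm[11:0], rs1, funct3, rd, opcode) and the lw opcode/funct3 values are the standard RV32I ones.

```python
def encode_i_type(imm: int, rs1: int, funct3: int, rd: int, opcode: int) -> int:
    """Pack RV32I I-type fields: imm[11:0] | rs1 | funct3 | rd | opcode (hypothetical helper)."""
    if not -2048 <= imm <= 2047:
        raise ValueError("immediate must fit in 12 signed bits")
    # Masking with 0xFFF keeps the two's-complement low 12 bits of a negative immediate.
    return ((imm & 0xFFF) << 20) | (rs1 << 15) | (funct3 << 12) | (rd << 7) | opcode


LW_OPCODE = 0b0000011  # LOAD major opcode
LW_FUNCT3 = 0b010      # word-sized load

word = encode_i_type(imm=-4, rs1=2, funct3=LW_FUNCT3, rd=5, opcode=LW_OPCODE)
print(f"{word:032b}")  # 11111111110000010010001010000011
print(hex(word))       # 0xffc12283
```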

EX: what's in ID/EX?

## **Answer:**

- address of the lw instruction
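- content of x2 (Read Data 1, since rs1 = x2) (1 point)
- content of x28 (Read Data 2: bits 24-20 of the instruction are imm[4:0] = 11100 = 28, so the register file reads x28 even though lw does not use it) (1 point)
- 32-bit immediate number: 0xFFFFFFFC (1 point)
- Instruction[30, 14-12]: 0bX010 (1 point)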
- Rd: 5 (1 point)
- ALUSrc: 1 (1 point)
- **ALUOp: 00**
- Branch: 0
- MemWrite: 0 (1 point)
- MemRead: 1 (1 point)
- MemtoReg: 1 (1 point)
- RegWrite: 1 (1 point)
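The ID/EX register fields above can be reproduced by slicing the instruction word. The sketch below uses illustrative helper names (`field`, `imm_gen_i`), not ones from the course material. It extracts the register fields and sign-extends the I-type immediate. It shows that the rs2 field of this lw is 28 and that the immediate becomes 0xFFFFFFFC. It also shows that Instruction[30] is actually 1 here (an immediate bit), which is harmless because ALUOp = 00 makes the ALU control ignore it.

```python
WORD = 0xFFC12283  # lw x5, -4(x2)


def field(word: int, hi: int, lo: int) -> int:
    """Return word[hi:lo] (inclusive), like a hardware bit slice."""
    return (word >> lo) & ((1 << (hi - lo + 1)) - 1)


def imm_gen_i(word: int) -> int:
    """I-type immediate generator: sign-extend word[31:20] to 32 bits."""
    imm12 = field(word, 31, 20)
    return (imm12 | 0xFFFFF000) if imm12 & 0x800 else imm12


rs1 = field(WORD, 19, 15)  # Read register 1 -> x2
rs2 = field(WORD, 24, 20)  # Read register 2 -> imm[4:0] = 28, so x28 is read anyway
rd = field(WORD, 11, 7)    # destination register -> x5
alu_ctrl_bits = (field(WORD, 30, 30) << 3) | field(WORD, 14, 12)  # Instruction[30, 14-12]
imm = imm_gen_i(WORD)

print(rs1, rs2, rd, f"{alu_ctrl_bits:04b}", hex(imm))
```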

MEM: what's in EX/MEM?

## **Answer:**

- Address of lw + (Imm << 1), the branch-adder output (1 point)
- Content of x2 - 4, the ALU result (load address) (1 point)
- Content of x28
- Rd: 5
- Branch: 0
- MemWrite: 0
- MemRead: 1
- MemtoReg: 1
- RegWrite: 1
- Zero (ALU zero flag)
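The two EX-stage adders can be sketched in 32-bit arithmetic. The PC and x2 values below (0x00400010 and 0x7FFFFFF0) are hypothetical, since the problem gives none. Adding the sign-extended immediate modulo 2^32 is the same as subtracting 4, and the branch adder produces PC - 8, which lw ignores because Branch = 0.

```python
MASK32 = 0xFFFFFFFF
IMM = 0xFFFFFFFC  # sign-extended immediate held in ID/EX (-4)


def ex_stage(pc: int, rs1_value: int) -> tuple[int, int]:
    """Return (branch-adder output, ALU result) for the lw in EX, modulo 2**32."""
    branch_target = (pc + ((IMM << 1) & MASK32)) & MASK32  # PC + (imm << 1); unused since Branch = 0
    alu_result = (rs1_value + IMM) & MASK32                # x2 + (-4): the load address
    return branch_target, alu_result


# Hypothetical PC and x2 values, not given in the problem.
target, addr = ex_stage(pc=0x00400010, rs1_value=0x7FFFFFF0)
print(hex(target), hex(addr))  # 0x400008 0x7fffffec
```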

WB: what's in MEM/WB?

## **Answer:**

- Mem[content of x2 - 4] (1 point)
- Content of x2 - 4 (ALU result)
- Rd: 5
- MemtoReg: 1
- RegWrite: 1

2. (20 points) Assume that individual stages of the RISC-V pipelined datapath have the following latencies:

| IF     | ID     | EX     | MEM    | WB     |
|--------|--------|--------|--------|--------|
| 250 ps | 350 ps | 150 ps | 300 ps | 200 ps |

Also, assume that instructions executed by the processor are broken down as follows:

| ALU/Logic | Jump/Branch | Load | Store |
|-----------|-------------|------|-------|
| 45%       | 20%         | 20%  | 15%   |

(1) What is the clock cycle time? (2 points)

Answer: 350 ps, the latency of the slowest stage (ID)

(2) What is the execution time of a sw instruction in the pipelined processor? (3 points)

Answer: 5 × 350 ps = 1750 ps, since the sw passes through all five stages and each stage takes one 350 ps clock cycle.

(3) If we can split one stage of the pipelined datapath into two new stages, each with half the latency of the original stage, which stage would you split and what is the new clock cycle time of the processor? (5 points)

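Answer: ID stage (3 points), 300 ps (2 points). ID is the slowest stage; splitting it gives two 175 ps stages, so MEM (300 ps) becomes the slowest stage and sets the new clock cycle time.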
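Parts (1) to (3) reduce to taking a maximum over stage latencies. Here is a minimal Python sketch. It assumes an ideal pipeline, and it assumes that a split stage becomes two stages of exactly half the latency, ignoring pipeline-register overhead. The function names are illustrative.

```python
STAGES = {"IF": 250, "ID": 350, "EX": 150, "MEM": 300, "WB": 200}  # latencies in ps


def cycle_time(stages: dict[str, float]) -> float:
    """The clock period must cover the slowest stage."""
    return max(stages.values())


def best_split(stages: dict[str, float]) -> tuple[str, float]:
    """Halve each stage in turn; return the stage whose split gives the shortest clock."""
    options = {}
    for name, latency in stages.items():
        rest = [v for k, v in stages.items() if k != name]
        options[name] = max(rest + [latency / 2])  # new slowest stage after this split
    best = min(options, key=options.get)
    return best, options[best]


tcc = cycle_time(STAGES)
print(tcc)                 # 350
print(5 * tcc)             # 1750: a sw still passes through all 5 stages
print(best_split(STAGES))  # ('ID', 300)
```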

(4) Using the processor to run a program of 1,000 instructions, what is the total execution time? What is the CPI? (10 points)

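Answer: the first instruction takes 5 cycles to fill the pipeline and each of the other 999 instructions completes one cycle later, so the program needs 5 + 999 = 1004 cycles.

```
CPI = (1000 + 4) / 1000 = 1.004 (5 points)
Total execution time = IC * CPI * Tcc = 1004 * 350 = 351,400 ps (5 points)
```

Or, for the 6-stage pipeline with the 300 ps clock from (3), which needs 6 + 999 = 1005 cycles:

Total execution time = 1005 × 300 = 301,500 ps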
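The cycle count follows from the fill-then-stream behaviour of an ideal pipeline: k cycles for the first instruction, then one cycle per remaining instruction. This sketch assumes no stalls and compares the original 5-stage, 350 ps design with a 6-stage, 300 ps design obtained by splitting ID as in (3). The function name is illustrative.

```python
def pipelined_cycles(n_instructions: int, n_stages: int) -> int:
    """Ideal pipeline (no stalls): n_stages cycles for the first instruction, then one per instruction."""
    return n_stages + (n_instructions - 1)


IC = 1000
for n_stages, tcc_ps in [(5, 350), (6, 300)]:
    cycles = pipelined_cycles(IC, n_stages)
    print(f"{n_stages} stages: {cycles} cycles, CPI = {cycles / IC}, time = {cycles * tcc_ps:,} ps")
```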

3. (10 points) Assume that x11 is initialized to 11 and x12 is initialized to 22. Suppose you executed the code below on a pipelined processor that does not handle data hazards at all.

L1: addi x11, x12, 5

L2: add x13, x12, x11

L3: addi x14, x11, 15

(1) Indicate data dependencies, if any, in the instruction sequence above (which register, between which instructions). (5 points)

## **Answer:**

```
x11 between L1 and L2 (2.5 points)
x11 between L1 and L3 (2.5 points)
```

(2) What would the final values of registers x13 and x14 be? (5 points)

## **Answer:**

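Without hazard handling, L2 and L3 both read x11 in their ID stage before L1 writes its result (22 + 5 = 27) back in WB, so both use the stale value 11:

```
x13 will be 22 + 11 = 33 (2.5 points)
x14 will be 11 + 15 = 26 (2.5 points)
```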
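The stale reads can be reproduced with a tiny timing model. This sketch assumes classic five-stage timing: sources are read in ID, results are written in WB, and the register file writes in the first half of a cycle and reads in the second half. It also assumes no forwarding or stalling. The function name and the program encoding are illustrative only.

```python
from operator import add


def run_without_hazard_handling(program, regs):
    """Execute `program` (a list of (rd, op, sources)) with no forwarding and no stalls.

    Assumption: instruction i (0-based) reads its sources in ID during cycle i + 2 and
    writes rd in WB during cycle i + 5. A write-back is visible to a read in the same
    cycle (write first half, read second half); a later write-back is missed.
    """
    regs = dict(regs)
    pending = []  # (wb_cycle, register, value) not yet written back
    for i, (rd, op, sources) in enumerate(program):
        id_cycle = i + 2
        for wb_cycle, reg, value in sorted(pending):
            if wb_cycle <= id_cycle:
                regs[reg] = value
        pending = [p for p in pending if p[0] > id_cycle]
        # Register names are looked up; plain ints are immediates.
        operands = [regs[s] if isinstance(s, str) else s for s in sources]
        pending.append((i + 5, rd, op(*operands)))
    for _, reg, value in sorted(pending):  # drain the pipeline
        regs[reg] = value
    return regs


PROGRAM = [
    ("x11", add, ["x12", 5]),      # L1: addi x11, x12, 5
    ("x13", add, ["x12", "x11"]),  # L2: add  x13, x12, x11
    ("x14", add, ["x11", 15]),     # L3: addi x14, x11, 15
]
final = run_without_hazard_handling(PROGRAM, {"x11": 11, "x12": 22})
print(final["x13"], final["x14"])  # 33 26
```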

4. (30 points) Given the following instructions:

```
L1: sw x18,-12(x8)

L2: lw x3,8(x18)

L3: add x6,x3,x3

L4: or x8,x9,x6
```

a) Assume there is no forwarding in this pipelined processor. Indicate hazards and add NOP instructions to eliminate them. How many clock cycles will it take to execute the instructions? (10 points)

## **Answer:**

Hazards: load-use hazard between L2 and L3 on x3 (2 points), and EX hazard between L3 and L4 on x6 (2 points)

**Adding NOP:** 

```
L1: sw x18,-12(x8)
L2: lw x3,8(x18)
NOP (2 points)
NOP
L3: add x6,x3,x3
NOP (2 points)
NOP
L4: or x8,x9,x6
```

It will take 12 clock cycles to complete the instructions: 8 instructions including NOPs, so 5 cycles for the first and 1 for each of the other 7. (2 points)

b) Assume there is ALU-ALU forwarding. Indicate hazards and add NOP instructions to eliminate them. How many clock cycles will it take to execute the instructions? (10 points)

## **Answer:**

If there is ALU-ALU forwarding, EX hazards and some MEM hazards can be eliminated, but not the load-use hazard. (3 points)

**Adding NOP:** (5 points)

```
L1: sw x18,-12(x8)
L2: lw x3,8(x18)
NOP
NOP
L3: add x6,x3,x3
L4: or x8,x9,x6
```

It will take 10 clock cycles to complete the instructions: 6 instructions including NOPs, so 5 + 5 cycles. (2 points)



c) Assume there is full forwarding. Indicate hazards and add NOP instructions to eliminate them. How many clock cycles will it take to execute the instructions? (10 points)

## **Answer:**

If there is full forwarding, the EX hazard is removed, but the load-use hazard cannot be completely eliminated. (3 points)

One NOP has to be added. (5 points)

L1: sw x18,-12(x8)

L2: lw x3,8(x18)

NOP

L3: add x6,x3,x3

L4: or x8,x9,x6

It will take 9 clock cycles to complete: 5 instructions including the NOP, so 5 + 4 cycles. (2 points)
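The NOP counts in parts a) to c) can be checked with a simple scheduler. It inserts NOPs until every consumer is far enough from its producer. The minimum distances are assumptions chosen to match the answers above:

- With no forwarding, the distance is 3 (the register file writes before it reads).
- With "ALU-ALU" forwarding, ALU results are forwarded but loaded data is not.
- With full forwarding, a load still needs one slot before its consumer.

The class and function names are hypothetical, and control hazards are ignored.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Instr:
    text: str
    dest: Optional[int]    # register written (None for sw)
    srcs: tuple[int, ...]  # registers read
    is_load: bool = False


def min_distance(mode: str, producer_is_load: bool) -> int:
    """Assumed smallest (consumer index - producer index) that avoids a stale operand."""
    if mode == "none":
        return 3                             # wait for the producer's WB
    if mode == "alu-alu":
        return 3 if producer_is_load else 1  # ALU results forwarded, loaded data is not
    if mode == "full":
        return 2 if producer_is_load else 1  # load-use still needs one slot
    raise ValueError(f"unknown forwarding mode: {mode}")


def schedule(program: list[Instr], mode: str) -> list[str]:
    """Insert the fewest NOPs so every source is read far enough from its producer."""
    stream: list[str] = []
    last_writer: dict[int, tuple[int, bool]] = {}  # reg -> (stream index, is_load)
    for ins in program:
        nops = 0
        for src in ins.srcs:
            if src in last_writer:
                pos, is_load = last_writer[src]
                nops = max(nops, min_distance(mode, is_load) - (len(stream) - pos))
        stream.extend(["NOP"] * nops)
        if ins.dest:  # no destination (or x0) never creates a dependence
            last_writer[ins.dest] = (len(stream), ins.is_load)
        stream.append(ins.text)
    return stream


def total_cycles(stream: list[str], n_stages: int = 5) -> int:
    return n_stages + len(stream) - 1


PROGRAM = [
    Instr("sw x18,-12(x8)", dest=None, srcs=(18, 8)),
    Instr("lw x3,8(x18)", dest=3, srcs=(18,), is_load=True),
    Instr("add x6,x3,x3", dest=6, srcs=(3, 3)),
    Instr("or x8,x9,x6", dest=8, srcs=(9, 6)),
]

for mode in ("none", "alu-alu", "full"):
    stream = schedule(PROGRAM, mode)
    print(f"{mode}: {stream.count('NOP')} NOPs, {total_cycles(stream)} cycles")
```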

5. (25 points) Given this assembly instruction sequence executed by the pipelined processor:

```
L1: sub x6, x2, x1

L2: lw x3, 8(x6)

L3: lw x2, 0(x6)

L4: or x3, x5, x3

L5: sw x3, 0(x5)
```

## Hazards:

- EX hazard: L1 and L2 on x6
- MEM hazard: L1 and L3 on x6
- MEM hazard: L2 and L4 on x3
- EX hazard: L4 and L5 on x3

a) If the processor has forwarding, but we forgot to implement the hazard detection unit, what happens when this code executes? (5 points)

## **Answer:**

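There is no load-use hazard: the value loaded by L2 (x3) is first used by L4, two instructions later, and the value loaded by L3 (x2) is never used. Forwarding alone resolves every hazard, so the pipeline never needs to stall. The missing hazard detection unit therefore has no effect on this instruction sequence, and the code executes correctly.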

b) If there is forwarding, for the first five cycles during the execution of this code, specify which signals are asserted in each cycle by hazard detection and forwarding units. (10 points)

## **Answer:**

Signals generated by the hazard detection unit: PCWrite, IF/IDWrite, Hazard

Signals generated by the forwarding unit: ForwardA, ForwardB

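In CC4, L2 is in EX and takes x6 from EX/MEM (L1's ALU result), so ForwardA=10. In CC5, L3 is in EX and takes x6, its first ALU operand (rs1), from MEM/WB because L1 has moved on to WB, so ForwardA=01. L3's rs2 field is x0, so ForwardB stays 00.

```
CC1: PCWrite=1, IF/IDWrite=1, Hazard=0; ForwardA=00, ForwardB=00
CC2: PCWrite=1, IF/IDWrite=1, Hazard=0; ForwardA=00, ForwardB=00
CC3: PCWrite=1, IF/IDWrite=1, Hazard=0; ForwardA=00, ForwardB=00
CC4: PCWrite=1, IF/IDWrite=1, Hazard=0; ForwardA=10, ForwardB=00
CC5: PCWrite=1, IF/IDWrite=1, Hazard=0; ForwardA=01, ForwardB=00
```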

(2 points each)
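Here is a sketch of the usual textbook forwarding and load-use hazard-detection conditions, applied cycle by cycle to this sequence. It assumes that no stalls occur (true for this code), so instruction i is in ID in cycle i + 2 and in EX in cycle i + 3. Register fields are the raw instruction bits, so lw's rs2 field and sw's rd field hold imm[4:0]. The names `slot`, `forward` and `signals` are illustrative.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Instr:
    text: str
    rd: int  # bits 11-7 (for sw these bits are imm[4:0])
    rs1: int  # bits 19-15
    rs2: int  # bits 24-20 (for lw these bits are imm[4:0])
    reg_write: bool
    mem_read: bool


PROGRAM = [
    Instr("sub x6, x2, x1", rd=6, rs1=2, rs2=1, reg_write=True, mem_read=False),
    Instr("lw x3, 8(x6)", rd=3, rs1=6, rs2=8, reg_write=True, mem_read=True),
    Instr("lw x2, 0(x6)", rd=2, rs1=6, rs2=0, reg_write=True, mem_read=True),
    Instr("or x3, x5, x3", rd=3, rs1=5, rs2=3, reg_write=True, mem_read=False),
    Instr("sw x3, 0(x5)", rd=0, rs1=5, rs2=3, reg_write=False, mem_read=False),
]


def slot(index: int) -> Optional[Instr]:
    """Instruction with this program index, or None for an empty pipeline slot."""
    return PROGRAM[index] if 0 <= index < len(PROGRAM) else None


def forward(src: int, ex_mem: Optional[Instr], mem_wb: Optional[Instr]) -> str:
    if ex_mem and ex_mem.reg_write and ex_mem.rd != 0 and ex_mem.rd == src:
        return "10"  # EX hazard: take the prior ALU result (has priority)
    if mem_wb and mem_wb.reg_write and mem_wb.rd != 0 and mem_wb.rd == src:
        return "01"  # MEM hazard: take the value being written back
    return "00"      # no hazard: use the register file output


def signals(cycle: int) -> dict[str, str]:
    """Control signals in a clock cycle (counted from 1), assuming no stalls occur."""
    if_id, id_ex = slot(cycle - 2), slot(cycle - 3)
    ex_mem, mem_wb = slot(cycle - 4), slot(cycle - 5)
    # Load-use check: a load in EX whose rd matches a source of the instruction in ID.
    stall = bool(if_id and id_ex and id_ex.mem_read and id_ex.rd in (if_id.rs1, if_id.rs2))
    return {
        "PCWrite": "0" if stall else "1",
        "IF/IDWrite": "0" if stall else "1",
        "Hazard": "1" if stall else "0",
        "ForwardA": forward(id_ex.rs1, ex_mem, mem_wb) if id_ex else "00",
        "ForwardB": forward(id_ex.rs2, ex_mem, mem_wb) if id_ex else "00",
    }


for cc in range(1, 6):
    print(f"CC{cc}:", signals(cc))
```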



c) If there is no forwarding, what new inputs and output signals do we need for the hazard detection unit? Using this instruction sequence as an example, explain why each signal is needed. (10 points)

## **Answer:**

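If there is no forwarding, data hazards can only be resolved by stalling the dependent instruction in ID until the instruction producing its operand reaches WB. For this sequence the required stalls are equivalent to the following NOPs: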

```
L1: sub x6, x2, x1
NOP
NOP
L2: lw x3, 8(x6)
L3: lw x2, 0(x6)
NOP
L4: or x3, x5, x3
NOP
NOP
L5: sw x3, 0(x5)
```

So in the hazard detection unit, we also need to detect:

- ID/EX.RegisterRd == IF/ID.RegisterRs1
- ID/EX.RegisterRd == IF/ID.RegisterRs2
- ID/EX.RegWrite == 1
- EX/MEM.RegisterRd == IF/ID.RegisterRs1
- EX/MEM.RegisterRd == IF/ID.RegisterRs2
- EX/MEM.RegWrite == 1

EX/MEM.RegisterRd, ID/EX.RegWrite, EX/MEM.RegWrite are the new inputs. (2 points each signal)

No new output signals are needed. (2 points)

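**Explanation: use the code above or a similar example. (2 points)**

For example, when L2 (lw x3, 8(x6)) is in ID, L1 (sub x6, x2, x1) is in EX. Because ID/EX.RegisterRd == IF/ID.RegisterRs1 == x6 and ID/EX.RegWrite = 1, L2 must stall. One cycle later L1 is in MEM. Now EX/MEM.RegisterRd == IF/ID.RegisterRs1 with EX/MEM.RegWrite = 1, which keeps L2 stalled for a second cycle; this is why EX/MEM.RegisterRd and EX/MEM.RegWrite are needed. The RegWrite checks prevent false stalls behind instructions such as sw, whose bits 11-7 hold part of the immediate rather than a destination register.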
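The extended hazard-detection condition can be sketched as a predicate over the IF/ID, ID/EX and EX/MEM pipeline registers, plus a loop that stalls the instruction in ID while the predicate holds. The sketch assumes a five-stage pipeline with no forwarding and a register file that writes in the first half of a cycle; the names are illustrative. Under these assumptions it reproduces the stall counts implied by the NOPs above for this code: 2 before L2, 1 before L4, and 2 before L5.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Instr:
    text: str
    rd: int  # bits 11-7 (for sw these bits are imm[4:0])
    rs1: int  # bits 19-15
    rs2: int  # bits 24-20 (for lw these bits are imm[4:0])
    reg_write: bool


def writes(stage: Optional[Instr], reg: int) -> bool:
    """True if the instruction held in this pipeline register will write `reg`."""
    return stage is not None and stage.reg_write and stage.rd != 0 and stage.rd == reg


def must_stall(if_id: Instr, id_ex: Optional[Instr], ex_mem: Optional[Instr]) -> bool:
    """No-forwarding check: ID/EX.RegisterRd and EX/MEM.RegisterRd (each gated by its
    RegWrite) against IF/ID.RegisterRs1 and IF/ID.RegisterRs2. MEM/WB is not checked
    because the register file is assumed to write in the first half of the cycle."""
    return any(writes(stage, src) for stage in (id_ex, ex_mem) for src in (if_id.rs1, if_id.rs2))


def stall_cycles(program: list[Instr]) -> dict[str, int]:
    """Count the stall cycles each instruction spends in ID."""
    stalls = {ins.text: 0 for ins in program}
    id_ex: Optional[Instr] = None
    ex_mem: Optional[Instr] = None
    i = 0
    while i < len(program):
        in_id = program[i]
        if must_stall(in_id, id_ex, ex_mem):
            stalls[in_id.text] += 1
            id_ex, ex_mem = None, id_ex   # PC and IF/ID hold; a bubble enters EX
        else:
            id_ex, ex_mem = in_id, id_ex  # the instruction advances to EX
            i += 1
    return stalls


PROGRAM = [
    Instr("sub x6, x2, x1", rd=6, rs1=2, rs2=1, reg_write=True),
    Instr("lw x3, 8(x6)", rd=3, rs1=6, rs2=8, reg_write=True),
    Instr("lw x2, 0(x6)", rd=2, rs1=6, rs2=0, reg_write=True),
    Instr("or x3, x5, x3", rd=3, rs1=5, rs2=3, reg_write=True),
    Instr("sw x3, 0(x5)", rd=0, rs1=5, rs2=3, reg_write=False),
]
print(stall_cycles(PROGRAM))
```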