# Homework 4

### 12.14 Exercises

#### 12.14.5 Question 5

In this exercise, we examine in detail how an instruction is executed in a single-cycle datapath. Problems in this exercise refer to a clock cycle in which the processor fetches the following instruction word: 0xadac0014.

For context, the encoded instruction is `sw $t4, 20($t5)`.
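The decoding can be checked by splitting the word into MIPS I-format fields (op[31:26], rs[25:21], rt[20:16], imm[15:0]). This is a minimal sketch; `decode_i_format` and the two-entry `REG_NAMES` table are illustrative helpers, not part of the exercise. In the textbook's MIPS design, the ALU control unit receives the 2-bit ALUOp from the main control (00, meaning add, for loads and stores) together with instruction bits [5:0]. For an I-format word those six bits are just the low bits of the immediate, which the sketch reports as `funct_bits`.

```python
# Decode a 32-bit MIPS I-format instruction word into its fields.
# Layout assumed: op[31:26] rs[25:21] rt[20:16] imm[15:0].

# Illustrative register-number-to-name table covering only the registers used here.
REG_NAMES = {12: "$t4", 13: "$t5"}


def decode_i_format(word: int) -> dict:
    """Split an I-format word into op, rs, rt, and a sign-extended immediate."""
    op = (word >> 26) & 0x3F
    rs = (word >> 21) & 0x1F
    rt = (word >> 16) & 0x1F
    imm = word & 0xFFFF
    if imm & 0x8000:  # sign-extend the 16-bit immediate
        imm -= 1 << 16
    # Instruction[5:0] is what the ALU control unit sees as its "funct" input.
    return {"op": op, "rs": rs, "rt": rt, "imm": imm, "funct_bits": word & 0x3F}


fields = decode_i_format(0xADAC0014)
print(fields)
# Rebuild the assembly form: sw rt, imm(rs)
print(f"sw {REG_NAMES[fields['rt']]}, {fields['imm']}({REG_NAMES[fields['rs']]})")
```

Here op = 43 (0b101011) is the `sw` opcode, rs = 13 is `$t5`, rt = 12 is `$t4`, and the immediate is 20.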

- (a) What are the values of the ALU control unit's inputs for this instruction?
- (b) What is the new PC address after this instruction is executed? Highlight the path through which this value is determined.
- (c) For each mux, show the values of its inputs and outputs during the execution of this instruction. List values that are register outputs as `Reg[xn]`.
- (d) What are the input values for the ALU and the two add units?
- (e) What are the values of all inputs for the registers unit?
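Parts (b) and (d) depend on the two adders: one computes PC + 4, and the other adds PC + 4 to the sign-extended immediate shifted left by 2. The fetch address is not given, so this sketch takes `pc` as a parameter, and 0x00400000 is a hypothetical example value. It assumes the textbook MIPS single-cycle datapath and that the PC mux selects PC + 4 because `sw` is not a branch. The ALU itself receives `Reg[$t5]` (unknown here) and the sign-extended immediate 20.

```python
# Sketch of the adder inputs/outputs for the sw word 0xadac0014.
# Assumes the MIPS single-cycle datapath with a PC + 4 adder and a branch adder.


def sign_extend16(value: int) -> int:
    """Interpret the low 16 bits of value as a two's-complement number."""
    value &= 0xFFFF
    return value - (1 << 16) if value & 0x8000 else value


def adder_inputs(pc: int, word: int) -> dict:
    """Return (input A, input B, output) for each adder, plus the next PC."""
    imm = sign_extend16(word)
    pc_plus_4 = (pc + 4) & 0xFFFFFFFF
    branch_offset = imm << 2  # output of the shift-left-2 unit
    branch_target = (pc_plus_4 + branch_offset) & 0xFFFFFFFF
    return {
        "pc_adder": (pc, 4, pc_plus_4),
        "branch_adder": (pc_plus_4, branch_offset, branch_target),
        # sw is not a taken branch, so the PC mux passes PC + 4 through.
        "next_pc": pc_plus_4,
    }


# 0x00400000 is a hypothetical fetch address, used only for illustration.
for name, value in adder_inputs(0x00400000, 0xADAC0014).items():
    print(name, value)
```

The branch adder still computes a target (PC + 4 + 80) even though its result is discarded by the PC mux for this instruction.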

#### 12.14.7 Question 7

Problems in this exercise assume that the logic blocks used to implement a processor's datapath (COD Figure 4.21) have the following latencies:

| I-Mem / D-Mem | Register File | Mux  | ALU   | Adder | Single gate | Register Read | Register Setup | Sign extend | Control |
|---------------|---------------|------|-------|-------|-------------|---------------|----------------|-------------|---------|
| 250ps         | 150ps         | 25ps | 200ps | 150ps | 5ps         | 30ps          | 20ps           | 50ps        | 50ps    |

"Register read" is the time needed after the rising clock edge for the new register value to appear on the output. This value applies to the PC only. "Register setup" is the amount of time a register's data input must be stable before the rising edge of the clock. This value applies to both the PC and Register File.

- (a) What is the latency of an R-type instruction (i.e., how long must the clock period be to ensure that this instruction works correctly)?
- (b) What is the latency of lw? (Check your answer carefully. Many students place extra muxes on the critical path.)
- (c) What is the latency of sw? (Check your answer carefully. Many students place extra muxes on the critical path.)
- (d) What is the latency of beq?
- (e) What is the latency of an arithmetic, logical, or shift I-type (non-load) instruction?
- (f) What is the minimum clock period for this CPU?
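A way to check these answers is to add the block latencies along every path an instruction exercises and take the longest. This is a sketch under stated assumptions: the labels I-Mem/D-Mem, Mux, Adder, Single gate, and Sign extend are an assumed reading of the latency table's columns, and the two R-type paths below are a hypothetical tracing of COD Figure 4.21 that you should verify against the figure. The same helpers can be reused with your own paths for lw, sw, beq, and I-type instructions.

```python
# Block latencies in ps. Names other than Register File, ALU, Register Read,
# Register Setup, and Control are an assumed reading of the table's columns.
LATENCY_PS = {
    "I-Mem": 250, "D-Mem": 250, "Register File": 150, "Mux": 25,
    "ALU": 200, "Adder": 150, "Single gate": 5, "Register Read": 30,
    "Register Setup": 20, "Sign extend": 50, "Control": 50,
}


def path_latency(path):
    """Sum the latencies of the blocks along one path through the datapath."""
    return sum(LATENCY_PS[block] for block in path)


def instruction_latency(paths):
    """An instruction is limited by its slowest (critical) path."""
    return max(path_latency(p) for p in paths)


# Hypothetical R-type data path: PC clock-to-output, instruction fetch,
# register read, ALU-input mux, ALU, write-back mux, register-file setup.
R_TYPE_DATA = ["Register Read", "I-Mem", "Register File", "Mux", "ALU", "Mux", "Register Setup"]
# Hypothetical control path: the decoded mux select must also arrive in time.
R_TYPE_CONTROL = ["Register Read", "I-Mem", "Control", "Mux", "ALU", "Mux", "Register Setup"]

print(instruction_latency([R_TYPE_DATA, R_TYPE_CONTROL]))  # 700 under these assumed paths
```

Listing parallel paths explicitly (here, register read versus control decode) is what keeps extra muxes off the critical path: only the longest path counts.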

#### 12.14.10 Question 10

When processor designers consider a possible improvement to the datapath, the decision usually depends on the cost/performance trade-off. In the following three problems, assume that we are beginning with the datapath from COD Figure 4.21, the latencies from Exercise 4.7, and the following costs:

| I-Mem | Register File | Mux | ALU | Adder | D-Mem | Single Register | Sign extend | Single gate | Control |
|-------|---------------|-----|-----|-------|-------|-----------------|-------------|-------------|---------|
| 1000  | 200           | 10  | 100 | 30    | 2000  | 5               | 100         | 1           | 500     |

Suppose doubling the number of general-purpose registers from 32 to 64 would reduce the number of lw and sw instructions by 12%, but increase the latency of the register file from 150 ps to 160 ps and double its cost from 200 to 400. (Use the instruction mix from Exercise 4.8 and ignore the other effects on the ISA discussed in Exercise 2.18.)

- (a) What is the speedup achieved by adding this improvement?
- (b) Compare the change in performance to the change in cost.
- (c) Given the cost/performance ratios you just calculated, describe a situation where it makes sense to add more registers and describe a situation where it doesn't make sense to add more registers.
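In a single-cycle CPU, CPI = 1, so execution time is instruction count times clock period, and the speedup in (a) depends only on how much the lw/sw count shrinks and how much the clock period grows. This sketch assumes the 12% reduction applies to the lw + sw count only and that the 10 ps register-file increase adds 10 ps to the clock period (true if the register file appears once on the critical path). The baseline period, lw + sw share, and total datapath cost are hypothetical placeholders, since the Exercise 4.8 mix is not reproduced here.

```python
# Cost/performance sketch, assuming a single-cycle CPU (CPI = 1), so
# execution time = instruction count x clock period.


def speedup(t_old_ps, t_new_ps, ls_fraction, ls_reduction=0.12):
    """Old execution time divided by new execution time."""
    # Only lw/sw instructions disappear: IC_new / IC_old = 1 - ls_fraction * ls_reduction.
    ic_ratio = 1.0 - ls_fraction * ls_reduction
    return t_old_ps / (t_new_ps * ic_ratio)


def cost_ratio(cost_old, extra_cost=200.0):
    """New cost over old cost when the register file goes from 200 to 400."""
    return (cost_old + extra_cost) / cost_old


# Hypothetical placeholders: replace with your Exercise 4.7 clock period,
# the lw + sw share from Exercise 4.8, and your datapath's total cost.
T_OLD_PS = 950.0
T_NEW_PS = T_OLD_PS + 10.0  # assumes one register-file access on the critical path
LS_FRACTION = 0.30
DATAPATH_COST = 5000.0

perf = speedup(T_OLD_PS, T_NEW_PS, LS_FRACTION)
cost = cost_ratio(DATAPATH_COST)
print(f"speedup = {perf:.3f}, cost ratio = {cost:.3f}")
```

Comparing the two ratios directly answers (b): the change is worthwhile on a cost/performance basis only if the speedup exceeds the cost ratio.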

#### 12.14.16 Question 16

In this exercise, we examine how pipelining affects the clock cycle time of the processor. Problems in this exercise assume that individual stages of the datapath have the following latencies:

| IF    | ID    | EX    | MEM   | WB    |
|-------|-------|-------|-------|-------|
| 250ps | 350ps | 150ps | 300ps | 200ps |

Also, assume that instructions executed by the processor are broken down as follows:

| ALU/Logic | Jump/Branch | Load | Store |
|-----------|-------------|------|-------|
| 45%       | 20%         | 20%  | 15%   |

- (a) What is the clock cycle time in a pipelined and non-pipelined processor?
- (b) What is the total latency of an lw instruction in a pipelined and non-pipelined processor?
- (c) If we can split one stage of the pipelined datapath into two new stages, each with half the latency of the original stage, which stage would you split and what is the new clock cycle time of the processor?
- (d) Assuming there are no stalls or hazards, what is the utilization of the data memory?
- (e) Assuming there are no stalls or hazards, what is the utilization of the write-register port of the "Registers" unit?
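The arithmetic behind (a) through (e) can be sketched directly from the two tables. Assumptions: the non-pipelined clock must cover all five stages in sequence, the pipelined clock must fit the slowest stage, lw passes through all five stages, only loads and stores access data memory, and only ALU/logic instructions and loads write a register (jumps and branches are assumed not to). The helper names are illustrative.

```python
# Stage latencies (ps) and instruction mix from the tables above.
STAGES_PS = {"IF": 250, "ID": 350, "EX": 150, "MEM": 300, "WB": 200}
MIX = {"ALU/Logic": 0.45, "Jump/Branch": 0.20, "Load": 0.20, "Store": 0.15}


def clock_cycle_times(stages):
    """Return (non-pipelined, pipelined) clock cycle times in ps."""
    return sum(stages.values()), max(stages.values())


def lw_latency(stages):
    """Return (non-pipelined, pipelined) latency of lw, which uses every stage."""
    single, pipe = clock_cycle_times(stages)
    return single, len(stages) * pipe  # each pipeline stage takes a full clock


def split_slowest_stage(stages):
    """Split the slowest stage into two halves; return (stage name, new clock)."""
    slowest = max(stages, key=stages.get)
    new_stages = {name: t for name, t in stages.items() if name != slowest}
    new_stages[slowest + "1"] = stages[slowest] / 2
    new_stages[slowest + "2"] = stages[slowest] / 2
    return slowest, max(new_stages.values())


def utilization(mix, classes):
    """Fraction of cycles in which a unit is busy, with no stalls or hazards."""
    return sum(mix[c] for c in classes)


print(clock_cycle_times(STAGES_PS))    # (1250, 350)
print(lw_latency(STAGES_PS))           # (1250, 1750)
print(split_slowest_stage(STAGES_PS))  # ('ID', 300)
print(f"data memory: {utilization(MIX, ['Load', 'Store']):.0%}")            # 35%
print(f"register write port: {utilization(MIX, ['ALU/Logic', 'Load']):.0%}")  # 65%
```

Splitting the slowest stage is the only split that can shorten the clock; after halving ID, the new bottleneck is MEM at 300 ps.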