#### **Q1**

Suppose a new CPU has 70% of the capacitive load of the previous generation, a 15% lower supply voltage, and a 20% slower clock. By how much does the new CPU reduce power consumption compared to the previous generation?
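One way to set this up is the usual dynamic power relation, power ∝ capacitive load × voltage² × frequency. The sketch below assumes that only dynamic power matters (static leakage is ignored) and that the percentages in the question translate directly into the three ratios; the function name `relative_dynamic_power` is made up for illustration.

```python
def relative_dynamic_power(cap_ratio, voltage_ratio, freq_ratio):
    """New/old dynamic power, assuming P is proportional to C * V^2 * f."""
    return cap_ratio * voltage_ratio ** 2 * freq_ratio


# Ratios taken from the question: 70% load, 15% lower voltage, 20% slower clock.
ratio = relative_dynamic_power(0.70, 0.85, 0.80)
print(f"new/old power: {ratio:.4f}, reduction: {1 - ratio:.1%}")
# prints: new/old power: 0.4046, reduction: 59.5%
```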

#### **Q2**

The following measurements have been made using a simulator for a design. What is the design's CPI?

| Instruction class | CPI | Frequency |
|-------------------|-----|-----------|
| ALU               | 1   | 40%       |
| Load              | 4   | 30%       |
| Branches          | 2   | 20%       |
| Stores            | 3   | 10%       |
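The overall CPI is the frequency-weighted average of the per-class CPIs. A minimal sketch, with the table above copied into a dictionary (the names `MIX` and `average_cpi` are illustrative, not part of the question):

```python
# (CPI, fraction of executed instructions) per class, copied from the table.
MIX = {
    "ALU": (1, 0.40),
    "Load": (4, 0.30),
    "Branches": (2, 0.20),
    "Stores": (3, 0.10),
}


def average_cpi(mix):
    """Weighted average: sum of CPI_i * frequency_i over all classes."""
    return sum(cpi * freq for cpi, freq in mix.values())


print(f"CPI = {average_cpi(MIX):.2f}")
# prints: CPI = 2.30
```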

#### **Q3**

Using the measurements in Question 2, how much faster would the machine be if a better data cache reduced the CPI of loads and stores by one cycle each?
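With instruction count and clock rate unchanged, the speedup is simply the ratio of old CPI to new CPI. The sketch below assumes that "one cycle each" means the load CPI drops from 4 to 3 and the store CPI from 3 to 2; the helper names are illustrative.

```python
def average_cpi(mix):
    """Weighted average CPI for a {class: (cpi, frequency)} mapping."""
    return sum(cpi * freq for cpi, freq in mix.values())


old_mix = {"ALU": (1, 0.40), "Load": (4, 0.30), "Branches": (2, 0.20), "Stores": (3, 0.10)}

# Assumption: the better data cache removes one cycle from every load and store.
new_mix = dict(old_mix, Load=(3, 0.30), Stores=(2, 0.10))

old_cpi = average_cpi(old_mix)
new_cpi = average_cpi(new_mix)
speedup = old_cpi / new_cpi
print(f"old CPI {old_cpi:.2f}, new CPI {new_cpi:.2f}, speedup {speedup:.3f}")
```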

#### **Q4**

Write the MIPS assembly code that creates the 32-bit constant `0001 0000 0000 0010 0100 1001 0010 0100` and stores it in register `$r1`.
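A 32-bit constant does not fit in a single 16-bit immediate, so a common approach is to load the upper half with `lui` and then OR in the lower half with `ori`. The Python sketch below only splits the bit pattern into those two halves and prints one candidate instruction pair; it is a helper for checking the immediates, not a MIPS assembler.

```python
BITS = "0001 0000 0000 0010 0100 1001 0010 0100"  # constant from the question

value = int(BITS.replace(" ", ""), 2)
upper = value >> 16       # goes into the lui immediate
lower = value & 0xFFFF    # goes into the ori immediate (ori zero-extends)

print(f"lui $r1, 0x{upper:04X}")
print(f"ori $r1, $r1, 0x{lower:04X}")
# prints:
# lui $r1, 0x1002
# ori $r1, $r1, 0x4924
```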

#### **Q5**

Suppose that we build a 32-bit ripple-carry adder from 32 1-bit full adders. The delay from the input to the carry out of a 1-bit full adder is 1 cycle, and the delay from the input to the result of a 1-bit full adder is 2 cycles. What is the delay of this 32-bit ripple-carry adder from the input to the result?
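In a ripple-carry adder, sum bit i cannot be computed until the carry has rippled through the i lower-order full adders. The sketch below assumes all operand bits and the initial carry-in arrive at time 0, and that a full adder needs the same 2 cycles to produce its sum whether the last-arriving input is an operand bit or the carry-in. The function name is illustrative.

```python
def ripple_carry_result_delay(bits, carry_delay, sum_delay):
    """Cycles until the slowest sum bit is valid.

    Sum bit i waits for i carry propagations, then for its own sum logic.
    """
    return max(i * carry_delay + sum_delay for i in range(bits))


print(ripple_carry_result_delay(bits=32, carry_delay=1, sum_delay=2))
# prints: 33
```

Under these assumptions the critical path runs through 31 carry stages plus the sum logic of the most significant bit.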

#### **Q6**

How many adders do we need to build a faster multiplier using a parallel tree, given 32-bit inputs and a 64-bit output?
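One way to reason about this is to count the adders needed to reduce the partial products pairwise. The sketch below assumes one partial product per multiplier bit (32 in total) and that each adder combines exactly two operands into one; other tree organizations (for example, carry-save adders) would give a different count.

```python
def tree_adder_count(partial_products):
    """Return (adders, levels) for a pairwise reduction tree."""
    adders = 0
    levels = 0
    while partial_products > 1:
        pairs = partial_products // 2   # adders working in parallel at this level
        adders += pairs
        partial_products -= pairs       # each adder turns two operands into one
        levels += 1
    return adders, levels


adders, levels = tree_adder_count(32)
print(f"{adders} adders arranged in {levels} levels")
# prints: 31 adders arranged in 5 levels
```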

#### **Q7**

For the following C statement, give the corresponding MIPS assembly code. Assume that the variables f, g, h, and i are given and can be treated as 32-bit integers, as declared in a C program. Use a minimal number of MIPS assembly instructions.

`f = g + (h - 5);`

#### **Q8**

In this question, we examine how resource hazards, control hazards, and Instruction Set Architecture (ISA) design can affect pipelined execution. Problems in this exercise refer to the following fragment of MIPS code:

    sw  r16, 12(r6)
    lw  r16, 8(r6)
    beq r5, r4, Label   # assume r5 != r4
    add r5, r1, r4
    slt r5, r15, r4

Assume that individual pipeline stages have the following latencies:

| IF    | ID    | EX    | MEM   | WB   |
|-------|-------|-------|-------|------|
| 150ps | 100ps | 120ps | 140ps | 80ps |

Assume that all branches are perfectly predicted (effectively eliminating all control hazards) and that no delay slots are used. If we only have one memory (for both instruction and data), there is a structural hazard every time we need to fetch an instruction in the same cycle in which another instruction accesses data. To guarantee "forward progress", this hazard must always be resolved in favor of the instruction that accesses data. What is the total execution time of this instruction sequence in the 5-stage pipeline that only has one memory?
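The stall pattern can be checked with a small scheduling sketch. It assumes the clock period equals the slowest stage, that only `lw` and `sw` use the memory in their MEM stage, that a stalled fetch does not hold up instructions already in the pipeline, and that no data-hazard stalls occur in this sequence. All names in the code are illustrative.

```python
STAGE_PS = {"IF": 150, "ID": 100, "EX": 120, "MEM": 140, "WB": 80}
PROGRAM = ["sw", "lw", "beq", "add", "slt"]   # opcodes of the fragment, in order
DATA_ACCESS = {"lw", "sw"}                    # instructions that use memory in MEM


def schedule_fetches(program):
    """Return the 1-based cycle in which each instruction is fetched."""
    mem_busy = set()      # cycles in which an earlier lw/sw owns the single memory
    fetch_cycles = []
    cycle = 0
    for op in program:
        cycle += 1
        while cycle in mem_busy:      # the data access wins, so the fetch waits
            cycle += 1
        fetch_cycles.append(cycle)
        if op in DATA_ACCESS:
            mem_busy.add(cycle + 3)   # IF, ID, EX, then MEM three cycles later
    return fetch_cycles


fetches = schedule_fetches(PROGRAM)
total_cycles = fetches[-1] + 4        # the last instruction still needs ID, EX, MEM, WB
clock_ps = max(STAGE_PS.values())
print(fetches, total_cycles, total_cycles * clock_ps, "ps")
# prints: [1, 2, 3, 6, 7] 11 1650 ps
```

In this model the fetch of `add` is delayed by two cycles, once for the data access of `sw` and once for that of `lw`.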

#### **Q9**

This question uses the latencies and the assembly code from Question 8. Continue to assume that all branches are perfectly predicted (this eliminates all control hazards) and that no delay slots are used. If we change load/store instructions to use a register (without an offset) as the address, these instructions no longer need to use the ALU. As a result, the MEM and EX stages can be overlapped and the pipeline has only 4 stages. Change the code to accommodate this changed ISA. Assuming this change does not affect the clock cycle time, what speedup is achieved for this instruction sequence?

#### **Q10**

For the following repeating pattern (e.g., in a loop) of branch outcomes:

N, N, N, T, N, T, T, T, T, N, T, N, T, N, T

- a. What is the accuracy of always-taken and always-not-taken predictors for this sequence of branch outcomes?
- b. What is the accuracy of a one-bit predictor, assuming that the predictor starts off in the predict-taken state? What is the accuracy of this predictor if this pattern repeats forever?
- c. What is the accuracy of a two-bit dynamic branch predictor assuming that the predictor starts off in the weakly predict-not-taken state? What is the accuracy of this predictor if this pattern is repeated forever?
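The three parts can be explored with a small simulator. The sketch below assumes the one-bit predictor simply predicts the previous outcome, and models the two-bit predictor as a saturating counter (0 = strongly not-taken, 1 = weakly not-taken, 2 = weakly taken, 3 = strongly taken); other two-bit state machines exist and may give different numbers. Function names are illustrative.

```python
from fractions import Fraction

PATTERN = "NNNTNTTTTNTNTNT"   # the 15 outcomes listed in the question


def static_accuracy(pattern, predict_taken):
    """Accuracy of an always-taken or always-not-taken predictor."""
    guess = "T" if predict_taken else "N"
    return Fraction(sum(o == guess for o in pattern), len(pattern))


def run_one_bit(pattern, state):
    """state is the current prediction; returns (correct count, final state)."""
    correct = 0
    for outcome in pattern:
        correct += state == outcome
        state = outcome               # remember only the last outcome
    return correct, state


def run_two_bit(pattern, counter):
    """Saturating counter; predicts taken when counter >= 2."""
    correct = 0
    for outcome in pattern:
        prediction = "T" if counter >= 2 else "N"
        correct += prediction == outcome
        counter = min(counter + 1, 3) if outcome == "T" else max(counter - 1, 0)
    return correct, counter


print(static_accuracy(PATTERN, True), static_accuracy(PATTERN, False))
print(run_one_bit(PATTERN, "T"))   # first pass, starting in predict-taken
print(run_two_bit(PATTERN, 1))     # first pass, starting in weakly not-taken
```

For the "repeats forever" parts, feed the final state returned by one pass back in as the starting state of the next pass until the state stops changing; the correct count of that pass is the steady-state figure.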