## EE116C/CS151B Homework 6

## **Problem 1**

The performance advantage of both the multicycle and the pipelined designs is limited by the longer time required to access memory compared with using the ALU. Suppose the memory access became 2 clock cycles long. Draw the modified pipeline. List all the possible new forwarding situations and all possible new hazards and their lengths.

## **Problem 2**

We examine how data dependencies affect execution in the basic five-stage pipeline. Problems in this exercise refer to the following sequence of instructions:

|    | Instruction Sequence |
|----|----------------------|
| a. | lw \$1, 40(\$6)      |
|    | add \$6, \$2, \$2    |
|    | sw \$6, 50(\$1)      |
| b. | lw \$5, -16(\$5)     |
|    | sw \$5, -16(\$5)     |
|    | add \$5, \$5, \$5    |

- i. Indicate dependencies in the above instruction sequence.
- ii. Assume there is no forwarding in this pipelined processor. Indicate hazards and add nop instructions to eliminate them.
- iii. Assuming there is full forwarding, indicate hazards and add nop instructions to eliminate them.
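One way to check an answer to part i is to compare register names pairwise. The sketch below is a minimal illustration, not the intended solution method: the helper names `parse` and `dependences` are hypothetical, it handles only the `lw`, `sw`, and `add` forms used in this problem, and it reports every pairwise name match without checking whether an intervening write hides the dependence.

```python
import re

def parse(instr):
    """Return (dest, sources) for a lw/sw/add instruction string.

    dest is a register number or None; sources is a set of register numbers.
    """
    op, rest = instr.split(None, 1)
    regs = [int(r) for r in re.findall(r"\$(\d+)", rest)]
    if op == "lw":    # lw rt, off(rs): writes rt, reads rs
        return regs[0], {regs[1]}
    if op == "sw":    # sw rt, off(rs): reads rt and rs, writes nothing
        return None, {regs[0], regs[1]}
    if op == "add":   # add rd, rs, rt: writes rd, reads rs and rt
        return regs[0], {regs[1], regs[2]}
    raise ValueError(f"unsupported instruction: {instr}")

def dependences(seq):
    """List (kind, register, earlier, later) with 1-based instruction numbers."""
    parsed = [parse(i) for i in seq]
    found = []
    for j in range(len(parsed)):
        for i in range(j):
            dest_i, src_i = parsed[i]
            dest_j, src_j = parsed[j]
            if dest_i is not None and dest_i in src_j:
                found.append(("RAW", dest_i, i + 1, j + 1))
            if dest_j is not None and dest_j in src_i:
                found.append(("WAR", dest_j, i + 1, j + 1))
            if dest_i is not None and dest_i == dest_j:
                found.append(("WAW", dest_i, i + 1, j + 1))
    return found

seq_a = ["lw $1, 40($6)", "add $6, $2, $2", "sw $6, 50($1)"]
seq_b = ["lw $5, -16($5)", "sw $5, -16($5)", "add $5, $5, $5"]

for name, seq in (("a", seq_a), ("b", seq_b)):
    for kind, reg, earlier, later in dependences(seq):
        print(f"{name}: {kind} on ${reg} from I{earlier} to I{later}")
```

Only the RAW entries can cause hazards in the basic five-stage pipeline; how many nops each one needs in parts ii and iii depends on the forwarding and register-file assumptions, which this sketch does not model.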

## **Problem 3**

In this exercise, we make several assumptions. First, we assume that an N-issue superscalar processor can execute any N instructions in the same cycle, regardless of their types. Second, we assume that every instruction is independently chosen, without regard for the instruction that precedes or follows it. Third, we assume that there are no stalls due to data dependences, that no delay slots are used, and that branches execute in the EX stage of the pipeline. Finally, we assume that instructions executed in the program are distributed as follows:

|    | ALU | Correctly predicted beq | Incorrectly predicted beq | lw  | sw  |
|----|-----|-------------------------|---------------------------|-----|-----|
| a. | 50% | 18%                     | 2%                        | 20% | 10% |
| b. | 40% | 10%                     | 5%                        | 35% | 15% |

- a. What is the CPI achieved by a 2-issue static superscalar processor on this program?
- b. In a 2-issue static superscalar processor that only has one register write port, what speedup is achieved by adding a second register write port?
- c. For a 2-issue static superscalar processor with a classic five-stage pipeline, what speedup is achieved by making the branch prediction perfect?
- d. Repeat part c, but for a 4-issue processor. What conclusion can you draw about the importance of good branch prediction when the issue width of the processor is increased?
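To build intuition for parts c and d, a first-order CPI model can be sketched as follows. It is a deliberately simplified illustration, not the expected solution: the names `cpi`, `perfect_prediction_speedup`, and `PENALTY_CYCLES` are hypothetical, and the model assumes each mispredicted branch costs a fixed 2 cycles (the IF and ID stages behind a branch resolved in EX) regardless of issue width. It ignores issue slots wasted in the same cycle as the branch and any register write port limits.

```python
def cpi(issue_width, mispredict_fraction, penalty_cycles):
    """First-order CPI: ideal 1/N plus a fixed stall per mispredicted branch."""
    return 1.0 / issue_width + mispredict_fraction * penalty_cycles

def perfect_prediction_speedup(issue_width, mispredict_fraction, penalty_cycles):
    """Ratio of CPI with mispredictions to CPI with none."""
    with_mispredicts = cpi(issue_width, mispredict_fraction, penalty_cycles)
    without_mispredicts = cpi(issue_width, 0.0, penalty_cycles)
    return with_mispredicts / without_mispredicts

# Assumption: branch resolved in EX, so two cycles of fetch are discarded.
PENALTY_CYCLES = 2

# Fraction of incorrectly predicted beq instructions, from the table above.
MISPREDICT_FRACTION = {"a": 0.02, "b": 0.05}

for mix, fraction in MISPREDICT_FRACTION.items():
    for width in (2, 4):
        s = perfect_prediction_speedup(width, fraction, PENALTY_CYCLES)
        print(f"mix {mix}, {width}-issue: speedup from perfect prediction = {s:.3f}")
```

Under this model the misprediction term stays constant while the ideal term 1/N shrinks, so the same misprediction rate accounts for a larger share of total CPI as the issue width grows.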