## The University of Alabama in Huntsville Electrical & Computer Engineering Department CPE 431 01 Test 2 November 20, 2003

Name: \_\_\_\_\_\_\_\_\_\_

1. (15 points) Consider the case of a four-deep pipeline where the branch is resolved at the end of the second stage for unconditional branches and at the end of the third cycle for conditional branches. The program run on this pipeline has the following branch frequencies (as percentages of all instructions):

- Conditional branches: 20%
- Jumps and calls: 5%
- Of the conditional branches, 60% are taken

Assuming that the CPI of the program, neglecting branch hazards, is 1.0, how much slower is the program in practice when branch hazards are considered?
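The general relationship behind this question is that the effective CPI equals the base CPI plus the average number of branch stall cycles per instruction. The sketch below is illustrative only: the stall counts and the two branch-handling policies are assumptions that the problem statement does not fix, and the function name `cpi_with_branch_stalls` is hypothetical.

```python
def cpi_with_branch_stalls(base_cpi, classes):
    """Return base CPI plus the average stall cycles per instruction.

    `classes` is a list of (fraction_of_all_instructions, stall_cycles) pairs.
    """
    return base_cpi + sum(freq * stalls for freq, stalls in classes)


# Frequencies taken from the problem statement.
COND = 0.20    # conditional branches, as a fraction of all instructions
JUMP = 0.05    # jumps and calls, as a fraction of all instructions
TAKEN = 0.60   # fraction of conditional branches that are taken

# ASSUMPTION: resolving at the end of stage k costs k - 1 stall cycles.
JUMP_STALL = 1
COND_STALL = 2

# ASSUMPTION (policy A): the pipeline stalls on every branch until it resolves.
cpi_stall_always = cpi_with_branch_stalls(
    1.0, [(JUMP, JUMP_STALL), (COND, COND_STALL)])

# ASSUMPTION (policy B): predict not taken, so only taken conditional
# branches pay the conditional-branch penalty.
cpi_predict_not_taken = cpi_with_branch_stalls(
    1.0, [(JUMP, JUMP_STALL), (COND * TAKEN, COND_STALL)])

print(f"stall always:      CPI = {cpi_stall_always:.2f}")
print(f"predict not taken: CPI = {cpi_predict_not_taken:.2f}")
```

Because the base CPI is 1.0, each resulting CPI is also the slowdown factor under the corresponding assumption.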

2. (15 points) You have been given 50 32K x 16-bit SRAMs to build an instruction cache for a processor with a 32-bit address. You do not have a byte offset. What is the largest two-way set associative instruction cache (measured by the size of the data storage area in bytes) that you can build with two-word blocks? Show the breakdown of the address into its cache access components and describe how the various SRAM chips will be used.

3. (1 point) A \_\_\_\_\_ means that the hardware cannot support the combination of instructions that we want to execute in the same clock cycle.
4. (1 point) The one case where forwarding cannot save the day is the case where a \_\_\_\_\_\_\_ instruction is followed by \_\_\_\_\_\_\_.
5. (1 point) A \_\_\_\_\_ computer can launch multiple instructions in a single cycle.
6. (1 point) \_\_\_\_\_\_ states that if an item is referenced, items whose addresses are close by will tend to be referenced soon.
7. (20 points) Consider executing the following code on a pipelined datapath like the one shown, except that it has forwarding:



If the sw \$s3 instruction two instructions after the sort label begins executing in cycle 1 and the beq \$t0, \$zero, exit1 is taken, what are the values stored in the following fields of the ID/EX pipeline register in the 8th cycle? Assume that before the instructions are executed, the state of the machine was as follows:

- The PC has the value $500_{10}$, the address of the addi instruction (with the label sort).
- Every register has the initial value $20_{10}$ plus the register number.
- Every memory word accessed as data has the initial value $10000_{10}$ plus the byte address of the word.

Fill in all of the fields, even if the current instruction in that state is not using them.

ID/EX.WB =

ID/EX.MEM =

ID/EX.EX =

ID/EX.PCInc =

ID/EX.Read1 =

ID/EX.Read2 =

ID/EX.SignEx =

ID/EX.Writert =

ID/EX.Writerd =

8. (5 points) For an m-stage pipeline, how many cycles does it take to execute n instructions if the pipeline is empty when these instructions begin to execute? \_\_\_\_\_\_
9. (1 point) In the case where memory and cache have different values for the same memory location, the cache and memory are said to be \_\_\_\_\_\_.
10. (15 points) Consider a memory hierarchy using one of the three organizations for main memory shown below. Assume that the cache block size is 16 words, that the width of organization b of the figure is eight words, and that the number of banks in organization c is four. If the main memory latency for a new access is 30 cycles and the transfer time is 3 cycles, what are the miss penalties for organizations a, b and c?



a. One-word-wide memory organization
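One simple way to reason about the miss penalties in question 10 is to count access rounds: each round pays the memory latency once and then one transfer per bank. The sketch below assumes this non-overlapped model, assumes no separate cycle for sending the address, and assumes that organization b is a single eight-word-wide memory and organization c is four one-word-wide interleaved banks. The function name `miss_penalty` and its parameters are hypothetical.

```python
import math


def miss_penalty(block_words, latency, transfer, width_words=1, banks=1):
    """Cycles to fetch one cache block under a simple, non-overlapped model.

    ASSUMPTION: every access round pays the full latency once (banks are
    accessed in parallel), followed by one bus transfer per bank.
    """
    words_per_round = width_words * banks
    rounds = math.ceil(block_words / words_per_round)
    return rounds * (latency + banks * transfer)


# Parameters from the problem statement.
BLOCK_WORDS, LATENCY, TRANSFER = 16, 30, 3

penalty_a = miss_penalty(BLOCK_WORDS, LATENCY, TRANSFER)                 # one word wide
penalty_b = miss_penalty(BLOCK_WORDS, LATENCY, TRANSFER, width_words=8)  # eight words wide
penalty_c = miss_penalty(BLOCK_WORDS, LATENCY, TRANSFER, banks=4)        # four banks

print(penalty_a, penalty_b, penalty_c)
```

A model that overlaps the latency of one round with the transfers of the previous round would give smaller numbers for organization c.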

11. (15 points) (a) Identify all of the data dependencies in the following code. (b) How is each data dependency either handled or not handled by forwarding? Draw a multiple clock cycle style diagram to support your answer.

```
add $2, $5, $4
lw $4, 28($2)
add $3, $2, $5
sw $4, 100($2)
add $3, $3, $4
```
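For part (a), the read-after-write dependencies in the listing above can be found mechanically by tracking the most recent writer of each register. The sketch below covers true (read-after-write) dependencies only. The operand tuples are transcribed by hand from the listing, and the function name `raw_dependencies` is hypothetical.

```python
# Each entry: (instruction text, destination register or None, source registers).
program = [
    ("add $2, $5, $4", "$2", ["$5", "$4"]),
    ("lw $4, 28($2)",  "$4", ["$2"]),
    ("add $3, $2, $5", "$3", ["$2", "$5"]),
    ("sw $4, 100($2)", None, ["$4", "$2"]),   # sw reads both registers, writes none
    ("add $3, $3, $4", "$3", ["$3", "$4"]),
]


def raw_dependencies(program):
    """Return (producer, consumer, register, distance) tuples, 1-indexed."""
    last_writer = {}   # register -> number of the latest instruction that wrote it
    deps = []
    for i, (_, dest, sources) in enumerate(program, start=1):
        # Read sources before recording this instruction's own write.
        for reg in sources:
            if reg in last_writer:
                producer = last_writer[reg]
                deps.append((producer, i, reg, i - producer))
        if dest is not None:
            last_writer[dest] = i
    return deps


deps = raw_dependencies(program)
for producer, consumer, reg, distance in deps:
    print(f"instruction {consumer} reads {reg} written by "
          f"instruction {producer} (distance {distance})")
```

The distance between producer and consumer, together with whether the producer is a load, is what determines how forwarding treats each dependency in part (b).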

12. (10 points) Here is a series of address references given as word addresses: 1, 4, 8, 5, 20, 17, 19, 56, 9, 11, 4, 43, 5, 6. Assuming a direct-mapped cache with two-word blocks and a total size of 16 words that is initially empty, (a) label each reference in the list as a hit or a miss and (b) show the final contents of the cache.
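A short simulation is one way to check a hand trace for this kind of question. The sketch below assumes the usual direct-mapped placement rule: the block address is the word address divided by the block size, and the cache index is the block address modulo the number of blocks. The function name `simulate_direct_mapped` is hypothetical.

```python
def simulate_direct_mapped(refs, total_words=16, block_words=2):
    """Trace word-address references through an initially empty cache."""
    num_blocks = total_words // block_words
    cache = [None] * num_blocks   # block address held at each index, None = empty
    outcomes = []
    for addr in refs:
        block = addr // block_words
        index = block % num_blocks
        if cache[index] == block:
            outcomes.append((addr, "hit"))
        else:
            outcomes.append((addr, "miss"))
            cache[index] = block   # replace whatever was at this index
    return outcomes, cache


refs = [1, 4, 8, 5, 20, 17, 19, 56, 9, 11, 4, 43, 5, 6]
outcomes, cache = simulate_direct_mapped(refs)

for addr, result in outcomes:
    print(f"{addr:2d}: {result}")

for index, block in enumerate(cache):
    if block is None:
        print(f"index {index}: empty")
    else:
        print(f"index {index}: words {block * 2} and {block * 2 + 1}")
```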