## The University of Alabama in Huntsville Electrical & Computer Engineering Department CPE 431 01 Test II November 15, 2001

|    | Name:                                                                                                                                                                                                                                                                                                                                               |
|----|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| 1. | (5 points) For a twelve-stage pipeline, how many cycles does it take to execute n instructions if the pipeline is empty when these instructions begin to execute?                                                                                                                                                                                   |
| 2. | (1 point) A occurs when an instruction depends on the results of a previous instruction still in the pipeline.                                                                                                                                                                                                                                      |
| 3. | (15 points) Here is a series of address references given as word addresses: 1, 4, 8, 5, 20, 17, 19, 56, 9, 11, 4, 43, 5, 6, 9, 17. Assuming a direct mapped cache with four-word blocks and a total size of 32 words that is initially empty, (a) label each reference in the list as a hit or a miss and (b) show the final contents of the cache. |

4. (15 points) Consider a memory hierarchy using one of the three organizations for main memory shown below. Assume that the cache block size is 8 words, that the width of organization b of the figure is four words, and that the number of banks in organization c is four. If the main memory latency for a new access is 30 cycles and the transfer time is 3 cycles, what are the miss penalties for these organizations?



a. One-word-wide memory organization

| 5. | (15 points) (a) Identify all of the data dependencies in the following code. (b) How is each data dependency either handled or not handled by forwarding? If forwarding occurs, state the source of the data. |
|----|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
|    | add \$2, \$5, \$4  lw \$2, 28(\$2)  add \$4, \$2, \$5  sw \$4, 100(\$2)  add \$3, \$2, \$4                                                                                                                    |
|    |                                                                                                                                                                                                               |
|    |                                                                                                                                                                                                               |
|    |                                                                                                                                                                                                               |
|    |                                                                                                                                                                                                               |
|    |                                                                                                                                                                                                               |
| 6. | (1 point) Pipelining improves performance by increasing                                                                                                                                                       |
|    | , as opposed to decreasing the execution time of an individual instruction.                                                                                                                                   |

7. (3 points).The three C's of cache memories are: \_\_\_\_\_\_.

\_\_\_\_\_, and \_\_\_\_\_.

8. (10 points) Consider a virtual memory system with the following properties:

34-bit virtual byte address8 Kbyte pages18-bit physical address

What is the total size of the page table for each process on this machine, assuming that the valid, protection, dirty, and use bits take a total of 6 bits and that all the virtual pages are in use? (Assume that disk addresses are not stored in the page table.)

9. (20 points) Consider executing the following code on a pipelined datapath like the one in Chapter 6, which has no forwarding, which has hazard detection, in which branches complete in the MEM stage, and in which there is no branch prediction:

```
$t4, 4($t2)
sort:
        addi$sp, $sp, -20
                                                ٦w
                                                slt
        sw $ra, 16($sp)
                                                      $t0, $t4, $t3
                                                beq
                                                      $t0, $zero, exit2
        sw $s3, 12($sp)
        sw $s2, 8($sp)
                                                add
                                                      $a0, $s2, $zero
        sw $s1, 4($sp)
                                                add
                                                       $a1, $s1, $zero
        sw $s0, 0($sp)
                                                jal
                                                      swap
        add $s2, $a0, $zero
                                                addi $s1, $s1, -1
        add $s3, $a1, $zero
                                                      for2tst
                                        exit2: addi $s0, $s0, 1
        add $s0, $zero, $zero
for1tst: slt $t0, $s0, $s3
                                                      for1tst
       beq $t0, $zero, exit1
                                        exit1: lw
                                                      $s0, 0($sp)
       addi$s1, $s0, -1
                                                lw
                                                      $s1, 4($sp)
for2tst: slt $s1, $s0, $zero
                                                      $s2, 8($sp)
                                                lw
        bne $t0, $zero, exit2
                                                lw
                                                      $s3, 12($sp)
        add $t1, $s1, $s1
                                                lw
                                                      $ra, 16($sp)
        add $t1, $t1, $t1
                                                addi $sp, $sp, 20
                                                jr
        add $t2, $s2, $t1
                                                      $ra
        lw $t3, 0($t2)
```

If the addi instruction with the label sort begins executing in cycle 1 and the beq \$t0, \$zero, exit1 is taken, what are the values stored in the following fields of the ID/EX pipeline register in the 19<sup>th</sup> cycle? Assume that before the instructions are executed, the state of the machine was as follows:

The PC has the value  $300_{10}$ , the address of the addi instruction.

Every register has the initial value  $20_{10}$  plus the register number.

Every memory word accessed as data has the initial value 4000<sub>10</sub> plus the byte address of the word.

ID/EX.EX =
ID/EX.PCInc =
ID/EX.ReadRegister1 =
ID/EX.ReadRegister2 =
ID/EX.WriteRt =
ID/EX.WriteRd =

10. (10 points) Using the figures provided below, indicate which portions of the datapath are active (performing an operation used later by the instruction's execution) and which are inactive (performing an operation discarded during the instruction's execution) in each of the five stages of a lw instruction.





## 11. (5 points) Use the following code fragment:

```
loop: lw $1, 0($2)
addi $1, $1, 4
sw $1, 0($2)
addi $2, $2, 4
bne $2, $3, loop
```

Consider the pipelined datapath of Chapter 6 with forwarding. Insert any necessary stalls, or reorder, if possible, to prevent stalling.