## The University of Alabama in Huntsville Electrical & Computer Engineering Department CPE 431 01 Final Exam Solution Fall 2010

- 1. (1 point) A \_symmetric multiprocessor (SMP)\_ is one that offers the programmer a single physical address space across all processors.
- 2. (1 point) As processors operating in parallel normally share data, they need to coordinate when operating on shared data, this coordination is called <u>\_synchronization\_</u>.
- (1 point) \_Pipelining\_ allows sharing of functional units of a single processor in an overlapping fashion.
- 4. (1 point) A virtual memory miss is called a \_page fault\_.
- 5. (1 point) \_True\_ (True or False) Overhead for communication is one of the reasons that it is difficult to write fast parallel processing programs.
- 6. (10 points) Consider two different implementations of the same instruction set architecture. There are four classes of instructions, A, B, C, and D. The clock rate and CPI of each implementation are given in the following table. Given a program with 10<sup>6</sup> instructions divided into classes as follows" 10% class A, 20% class B, 50% class C and 20% class D, which implementation is faster?

|   |    | Clock rate | CPI Class A | CPI Class B | CPI Class C | CPI Class D |
|---|----|------------|-------------|-------------|-------------|-------------|
| ] | P1 | 1.5 GHz    | 1           | 2           | 3           | 4           |
| ] | P2 | 2 GHz      | 2           | 2           | 2           | 2           |

$$\frac{P_{P2}}{P_{P1}} = \frac{ET_{P1}}{ET2} = \frac{\frac{IC_{P1} * CPI_{P1}}{CR_{P1}}}{\frac{IC_{P2} * CPI_{P2}}{CR_{P2}}} = \frac{IC * CPI_{P1} * CR_{P2}}{IC * CPI_{P2} * CR_{P1}} = \frac{(0.1*1 + 0.2*2 + 0.5*3 + 0.2*4)*2 \times 109}{(0.1*2 + 0.2*2 + 0.5*2 + 0.2*2)*1.5 \times 109}$$

$$= \frac{(0.1 + 0.4 + 1.5 + 0.8)*2}{(0.2 + 0.4 + 1.0 + 0.4)*1.5} = \frac{2.8*2}{2*1.5}$$

$$= \frac{5.6}{3} = 1.87$$

P2 is faster than P1 by a factor of 1.87

7. (10 points) Consider the following instruction: swi Rd, Rs (Rt) whose operation is defined by the following register transfer statement Mem[Rs + Rt] = Rd. What must be changed in the single cycle datapath to add this instruction to the MIPA ISA?



The register file has to be modified so that there are three read ports. A MUX has to be added with an attendant control line.

8. (10 points) Consider executing the following code on a pipelined datapath like the one shown which (a)has no forwarding, (b)does support jump and (c) has branch completion in the ID stage. A jump instruction completes in the ID stage.



```
sort:
         addi$sp, $sp, -20
                                             for1tst: slt $t0, $s0, $s3
         sw $ra, 16($sp)
                                                      beg $t0, $zero, exit1
         sw $s3, 12($sp)
                                                      addi$s1, $s0, -1
         sw $s2, 8($sp)
                                             for2tst: slt $t0, $s1, $zero
         sw $s1, 4($sp)
                                                      bne $t0, $zero, exit2
         sw $s0, 0($sp)
                                                      add $t1, $s1, $s1
         add $s2, $a0, $zero
                                                      add $t1, $t1, $t1
         add $s3, $a1, $zero
                                                      add $t2, $s2, $t1
         add $s0, $zero, $zero
                                                      lw $t3, 0($t2)
```

```
$t4, 4($t2)
      lw
                                                           for1tst
      slt
             $t0, $t4, $t3
                                             exit1: lw
                                                           $s0, 0($sp)
      beq
             $t0, $zero, exit2
                                                    lw
                                                           $s1, 4($sp)
             $a0, $s2, $zero
      add
                                                    lw
                                                           $s2, 8($sp)
                                                           $s3, 12($sp)
      add
             $a1, $s1, $zero
                                                    lw
                                                    lw
                                                           $ra, 16($sp)
      jal
             swap
             $s1, $s1, -1
      addi
                                                    addi
                                                           $sp, $sp, 20
             for2tst
                                                    jr
                                                           $ra
      j
exit2: addi
             $s0, $s0, 1
```

If the addi \$s1 instruction one instruction before the for2tst label begins executing in cycle 1 and the bne \$t0, \$zero, exit2 is taken, what are the instructions in each stage of the pipeline in the 14th cycle? If there is a bubble in any stage of the pipeline, also indicate which instruction was there before it became a bubble.

```
IF:
     _addi $s1_
ID:
     beg $t0
EX:
     __slt $t0_
MEM: __bubble (lw $s0)__
    ___j for1tst ____
WB
```

| Cycle | IF        | ID        | EX        | MEM       | WB        |
|-------|-----------|-----------|-----------|-----------|-----------|
| 1     | addi \$s1 |           |           |           |           |
| 2     | slt \$t0  | addi \$s1 |           |           |           |
| 3     | bne \$t0  | slt \$t0  | addi \$s1 |           |           |
| 4     | bne \$t0  | slt \$t0  | bubble    | addi \$s1 |           |
| 5     | bne \$t0  | slt \$t0  | bubble    | bubble    | addi \$s1 |
| 6     | add \$t1  | bne \$t0  | slt \$t0  | bubble    | bubble    |
| 7     | add \$t1  | bne \$t0  | bubble    | slt \$t0  | bubble    |
| 8     | add \$t1  | bne \$t0  | bubble    | bubble    | slt \$t0  |
| 9     | addi \$s0 | bubble    | bne \$t0  | bubble    | bubble    |
| 10    | j for1tst | addi \$s0 | bubble    | bne \$t0  | bubble    |
| 11    | lw \$s0   | j for1tst | addi \$s0 | bubble    | bne \$t0  |
| 12    | slt \$t0  | bubble    | j for1tst | addi \$s0 | bubble    |
| 13    | beq \$t0  | slt \$t0  | bubble    | j for1tst | addi \$s0 |
| 14    | addi \$s1 | beq \$t0  | slt \$t0  | bubble    | j for1tst |

9. (10 points) Assume that the variables f, q, h, i, and j are assigned to registers \$50, \$51, \$52, \$s3, and \$s4, respectively. Assume that the base address of the arrays A and B are in registers \$56 and \$57, respectively. Additionally, A and B are arrays of integers. What is the corresponding MIPS assembly code for the following C statement?

$$f = g - A[B[4]]$$

```
$t0, 16($s7)
lw
      $t0, $t0, 2
sll
      $t0, $t0, $s6
add
      $t0, 0($t0)
lw
      $s0, $s1, $t0
sub
```

10. (5 points) What MIPS instruction does the following collection of bits represent? 0x8D080040

```
1000 1101 0000 1000 0000 0000 0100 0000 Opcode = 10 0011 = lw

Rs = 01000 = $t0

Rt = 01000 = $t0

Offset = 0000 0000 0100 0000 = 64

lw $t0, 64($t0)
```

11. (5 points) What is the minimum number of bits needed to represent -150 in 2's complement? What is that representation in hexadecimal?

With 8 bits, the numbers that can be represented are  $-2^7 - +2^7 - 1$  or -128 - +127, can't represent -150. With 9 bits, the numbers that can be represented are  $-2^8 - +2^8 - 1$  or -256 = +255, so 9 bits is enough and the minimum.

12. (10 points) Here is a series of address references given as word addresses: 1, 4, 8, 5, 20, 17, 19, 56, 9, 11, 4, 43, 5, 6. Assuming a two-way set associative cache with two word blocks and a total size of 16 words that is initially empty, (a) label each reference in the list as a hit or a miss and (b) show the entire history of the cache

13. (10 points) Mean Time Between Failures (MTBF), Mean Time To Replacement (MTTR), and Mean Time To Failure (MTTF) are useful metrics for evaluating the reliability and availability of a storage resource. Explore these concepts by answering the questions about a device with the following metrics.

- (a) Calculate the MTBF for the devices.
- (b) Calculate the availability for this device.

MTBF = MTTF + MTTR = 5\*365 + 10 = 1835 days = 44,040 hours

$$Availability = \frac{MTTF}{MTTF + MTTR} = \frac{5Years \times \frac{365Days}{1Year}}{5Years \times \frac{365Days}{1Year} + 10Days} = \frac{1825}{1835} = 0.9986$$

14. (10 points) Consider the following portions of two different programs running at the same time on three processors in a symmetric multicore processor (SMP). Assume that before this code is run, w is 2, x is -2 and y and z are 1. w, x, y, and z are type int.

```
Core 1: y = 5/z;

Core 2: w = x + y + 1;

Core 3: z = w*x + y;
```

What are all the possible resulting values of w, x, y, and z? Show all possible interleavings of instructions and the resulting values of w, x, y, and z.

|                    | Resulting Values |    |    |    |
|--------------------|------------------|----|----|----|
| Possible Sequences | W                | X  | y  | Z  |
|                    | 2                | -2 | 1  | 1  |
| 1, 2, 3            | 4                | -2 | 5  | 3  |
| 1, 3, 2            | 4                | -2 | 5  | 1  |
| 2, 1, 3            | 0                | -2 | 5  | 5  |
| 2, 3, 1            | 0                | -2 | 5  | 1  |
| 3, 1, 2            | -2               | -2 | -1 | -3 |
| 3, 2, 1            | 0                | -2 | -1 | -3 |

15. (5 points) Represent -16.0 in single precision floating-point format.

```
-16.0_{10} = 10000_2 = 1.0 \times 2^4
Sign = 1, negative number
Exponent = 4 + 127 = 131_{10} = 1000\_0001_2
Fraction = 000\ 0000\ 0000\ 0000\ 0000\ 0000
```

Putting them together  $1100\ 0000\ 1000\ 0000\ 0000\ 0000\ 0000\ 0000 = 0xC080\ 0000$ 

- 2 0 1 2 1 0 2 2 0 2 4 0 2 8 0 2 16
- 16. (10 points) The following code is written in MATLAB, where elements within the same column are stored contiguously. References to which variables exhibit spatial locality? I and J are both arrays of integers 8000 by 8000.

```
for J=1:8;
for I=1:8000;
A(I, J) = B(J, 1) + A(J, I);
```

| Variable | I           | J           | A(I,J)       | B(J, 1)      | A(J, I)      |
|----------|-------------|-------------|--------------|--------------|--------------|
| Spatial  | Unknown     | Unknown     | Yes,         | Yes,         | No, elements |
|          |             |             | elements are | elements are | are accessed |
|          |             |             | accessed by  | accessed by  | by row       |
|          |             |             | column       | column       |              |
| Temporal | Yes, used   | Yes, used   | No, a        | Yes, used    | No, a        |
|          | 64000 times | 64000 times | different    | 8000 times,  | different    |
|          | to access   | to access   | element is   | each         | element is   |
|          | array       | array       | accessed     |              | accessed     |
|          | elements    | elements    | each time    |              | each time    |