## The University of Alabama in Huntsville Electrical & Computer Engineering Department CPE 431 01 Final Exam December 4, 2014

Name: ______________________

1. (1 point) A ______________ is a set of computers connected over a local area network that function as a single large multiprocessor.

2. (1 point) ______________ is the speedup achieved on a multiprocessor without increasing the size of the problem.

3. (1 point) ______________ parallelism achieved by performing the same operation on independent data.

4. (1 point) A ______________ includes one or more threads, the address space, and the operating system state.

5. (1 point) A ______________ is a function that processes a data structure and returns a single value.

6. [...] binary representation of the decimal number 159.375 assuming the [...] format. Express your answer in hexadecimal.
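The format name in question 6 did not survive in this copy of the exam. As a sketch only, and assuming the intended format is IEEE 754 single precision (an assumption, not something the text confirms), the encoding can be checked with Python's standard `struct` module. The helper name `float_to_hex` is made up for this illustration.

```python
import struct


def float_to_hex(value: float) -> str:
    """Return the 32-bit encoding of value as a hexadecimal string.

    Assumption: IEEE 754 single precision; the exam text does not
    name the format legibly.
    """
    # ">f" packs a big-endian single-precision float into 4 bytes.
    return "0x" + struct.pack(">f", value).hex().upper()


# 159.375 = 10011111.011 (binary) = 1.0011111011 x 2^7,
# so the biased exponent is 127 + 7 = 134.
print(float_to_hex(159.375))
```

Working the conversion by hand first (sign, biased exponent, fraction) and then comparing against the helper is a quick way to catch an off-by-one in the exponent.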

7. (15 points) Here is a series of address references given as hexadecimal word addresses: 1, 4, 8, 5, 20, 17, 19, 56, 209, 11, 4, 43, 5, 36, 8, 16, 59, 187. Assuming a direct-mapped cache with four-word blocks and a total size of 16 words that is initially empty, (a) label each reference in the list as a hit or a miss and (b) show the entire history of the cache.

| Reference | Hit or miss |
|-----------|-------------|
| 0x1   |  |
| 0x4   |  |
| 0x8   |  |
| 0x5   |  |
| 0x20  |  |
| 0x17  |  |
| 0x19  |  |
| 0x56  |  |
| 0x209 |  |
| 0x11  |  |
| 0x4   |  |
| 0x43  |  |
| 0x5   |  |
| 0x36  |  |
| 0x8   |  |
| 0x16  |  |
| 0x59  |  |
| 0x187 |  |
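The reference stream in question 7 can be traced with a small simulator. This is a minimal sketch, assuming word addressing as stated, so that a 16-word cache with four-word blocks has four lines, with the index taken from the low bits of the block address. The function name `simulate` and the tuple layout are illustrative choices.

```python
REFS = [0x1, 0x4, 0x8, 0x5, 0x20, 0x17, 0x19, 0x56, 0x209,
        0x11, 0x4, 0x43, 0x5, 0x36, 0x8, 0x16, 0x59, 0x187]


def simulate(addresses, block_words=4, total_words=16):
    """Trace a direct-mapped cache over a list of word addresses.

    Returns one tuple per reference:
    (address, line index, tag, "hit" or "miss", tags held after the access).
    """
    num_lines = total_words // block_words
    lines = [None] * num_lines          # tag stored in each line; None = empty
    history = []
    for addr in addresses:
        block = addr // block_words     # drop the word-within-block offset
        index = block % num_lines       # which cache line the block maps to
        tag = block // num_lines        # remaining upper bits
        hit = lines[index] == tag
        if not hit:
            lines[index] = tag          # a miss replaces whatever was there
        history.append((addr, index, tag, "hit" if hit else "miss", tuple(lines)))
    return history


for addr, index, tag, outcome, state in simulate(REFS):
    print(f"{addr:#06x}  index={index}  tag={tag:#x}  {outcome:4}  lines={state}")
```

The last column printed for each reference is the cache contents (by tag) after that access, which corresponds to the history asked for in part (b).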

8. (8 points) Consider the following portions of three programs running at the same time on three processors in a symmetric multicore processor (SMP). Assume that before this code is run, w is 2, x is 4, y is 3, and z is 1. The variables w, x, y, and z are of type int.

```
Core 1: y = 5/(z + w);

Core 2: x = x + y/w + 1;

Core 3: z = w*(x - y);
```
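The part of question 8 that states what to determine is not present in this copy. As a sketch, assuming the interest is in the possible final values of w, x, y, and z, and assuming each statement executes atomically (both are assumptions), the six possible orderings of the three statements can be enumerated. The helper `c_div` models C's truncating `int` division; all function names are hypothetical.

```python
from itertools import permutations


def c_div(a, b):
    """Integer division that truncates toward zero, as C does for int."""
    q = abs(a) // abs(b)
    return q if (a >= 0) == (b >= 0) else -q


def core1(s):
    s["y"] = c_div(5, s["z"] + s["w"])


def core2(s):
    s["x"] = s["x"] + c_div(s["y"], s["w"]) + 1


def core3(s):
    s["z"] = s["w"] * (s["x"] - s["y"])


def outcomes():
    """Return the set of final (w, x, y, z) tuples over all statement orders."""
    results = set()
    for order in permutations((core1, core2, core3)):
        state = {"w": 2, "x": 4, "y": 3, "z": 1}   # initial values from the question
        for statement in order:
            statement(state)
        results.add((state["w"], state["x"], state["y"], state["z"]))
    return results


for w, x, y, z in sorted(outcomes()):
    print(f"w={w} x={x} y={y} z={z}")
```

If individual loads and stores from different cores may interleave within a statement, more outcomes are possible than this statement-level enumeration shows.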

9. (7 points) Consider a computer running a program that requires 750 s, with 70 s spent executing FP instructions, 85 s executing L/S instructions, 40 s executing branch instructions, and the rest executing R-type instructions. By how much must we improve the CPI of R-type instructions if we want the program to run two times faster?
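The arithmetic behind question 9 follows Amdahl's law. The sketch below assumes that the instruction counts and clock rate stay fixed, so the time spent in a class of instructions scales directly with its CPI, and that only the R-type time changes. The function name `required_improvement` is illustrative.

```python
def required_improvement(total, other_times, speedup):
    """Factor by which the remaining class must speed up.

    total       -- original run time in seconds
    other_times -- times of the classes that are NOT improved
    speedup     -- desired overall speedup
    """
    unchanged = sum(other_times)
    improved_before = total - unchanged       # time in the class being improved
    target_total = total / speedup
    improved_after = target_total - unchanged
    if improved_after <= 0:
        raise ValueError("target is unreachable by improving this class alone")
    return improved_before / improved_after


# FP, L/S, and branch times from the question; R-type takes the rest.
factor = required_improvement(750.0, [70.0, 85.0, 40.0], 2.0)
print(f"R-type time must shrink by a factor of {factor:.3f}")
print(f"new R-type CPI is {100 / factor:.1f}% of the original")
```

The `ValueError` branch covers the case where the unimproved time alone already exceeds the target run time, in which case no CPI improvement for one class can reach the goal.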

10. (15 points) Add to the single-cycle datapath shown in the figure below a variant of the lw instruction that uses the R format and sums the contents of two registers to obtain the address of the data. Add any necessary datapaths and control signals and show the necessary additions to the table of control signals given.



| Instruction | RegDst | ALUSrc | MemtoReg | RegWrite | MemRead | MemWrite | Branch | ALUOp1 | ALUOp0 |  |
|-------------|--------|--------|----------|----------|---------|----------|--------|--------|--------|--|
| R-format    | 1      | 0      | 0        | 1        | 0       | 0        | 0      | 1      | 0      |  |
| lw          | 0      | 1      | 1        | 1        | 1       | 0        | 0      | 0      | 0      |  |
| sw          | d      | 1      | d        | 0        | 0       | 1        | 0      | 0      | 0      |  |
| beq         | d      | 0      | d        | 0        | 0       | 0        | 1      | 0      | 1      |  |
|             |        |        |          |          |         |          |        |        |        |  |

d – don't care

11. (12 points) The following data constitutes a stream of virtual addresses as seen on a system.

Assume 8 KiB pages, a 4-entry fully associative TLB, and true LRU replacement. If pages must be brought in from disk, increment the next largest page number.

4669, 2227, 13916, 34587, 18885, 12608, 49225, 9226, 46390

TLB

| Valid | Tag | Physical Page Number |
|-------|-----|----------------------|
| 1     | 0   | 5                    |
| 1     | 7   | 4                    |
| 1     | 3   | 6                    |
| 0     | 4   | 9                    |

Page table

| Valid | Physical page or in disk |
|-------|--------------------------|
| 1     | 5                        |
| 0     | Disk                     |
| 0     | Disk                     |
| 1     | 6                        |
| 1     | 9                        |
| 1     | 11                       |
| 0     | Disk                     |
| 1     | 4                        |
| 0     | Disk                     |
| 0     | Disk                     |
| 1     | 3                        |
| 1     | 12                       |

Given the address stream, and the shown initial state of the TLB and page table, show the final state of the system. Also list for each reference if it is a hit in the TLB, a hit in the page table, or a page fault.
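The first step in question 11 is splitting each virtual address into a virtual page number and a page offset. A minimal sketch, assuming the listed addresses are decimal byte addresses (an assumption; the text does not state the base or unit). It does not model TLB replacement, since the initial LRU ordering of the TLB entries is not given. The name `split_address` is illustrative.

```python
PAGE_SIZE = 8 * 1024    # 8 KiB pages, as stated in the question

ADDRESSES = [4669, 2227, 13916, 34587, 18885, 12608, 49225, 9226, 46390]


def split_address(addr, page_size=PAGE_SIZE):
    """Return (virtual page number, offset within the page)."""
    return addr // page_size, addr % page_size


pages = [split_address(a)[0] for a in ADDRESSES]

for addr in ADDRESSES:
    vpn, offset = split_address(addr)
    print(f"{addr:6d} -> page {vpn}, offset {offset}")
```

Each virtual page number is what gets compared against the TLB tags and, on a TLB miss, used as the index into the page table.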

12. (15 points) (a) Identify all of the data dependencies in the following code. (b) How is each data dependency either handled or not handled by forwarding? Draw a multiple clock cycle style diagram to support your answer.

```
a add $5, $5, $4
b lw $4, 28($2)
c add $2, $4, $5
d sw $4, 100($2)
e add $3, $2, $4
```
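For part (a) of question 12, the read-after-write dependences can be listed mechanically by tracking the most recent writer of each register. This sketch encodes the destination and source registers of the five instructions by hand rather than parsing the assembly, and it reports only true (RAW) dependences. The names `PROGRAM` and `raw_dependences` are illustrative.

```python
# (label, text, destination register or None, source registers)
PROGRAM = [
    ("a", "add $5, $5, $4", "$5", ["$5", "$4"]),
    ("b", "lw $4, 28($2)",  "$4", ["$2"]),
    ("c", "add $2, $4, $5", "$2", ["$4", "$5"]),
    ("d", "sw $4, 100($2)", None, ["$4", "$2"]),   # sw writes memory, not a register
    ("e", "add $3, $2, $4", "$3", ["$2", "$4"]),
]


def raw_dependences(program):
    """Return (producer, consumer, register) for each read-after-write pair."""
    last_writer = {}                    # register -> label of most recent writer
    deps = []
    for label, _text, dest, sources in program:
        for reg in sources:
            if reg in last_writer:
                deps.append((last_writer[reg], label, reg))
        if dest is not None:
            last_writer[dest] = label   # record after reading, so add $5,$5,$4 is not self-dependent
    return deps


for producer, consumer, reg in raw_dependences(PROGRAM):
    print(f"{consumer} reads {reg} written by {producer}")
```

In the classic five-stage MIPS pipeline, a dependence whose producer is a lw and whose consumer is the very next instruction is the load-use case, which forwarding alone cannot cover without a stall.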

13. (8 points) Pseudoinstructions are not part of the MIPS instruction set but often appear in MIPS programs. For the pseudoinstruction listed, produce a minimal sequence of actual MIPS instructions to accomplish the same thing. You may need to use \$at for some of the sequences. In the table, big refers to a specific number that requires 32 bits to represent and small to a number that can fit in 16 bits.

| Pseudoinstruction  | What it accomplishes      |
|--------------------|---------------------------|
| lw \$t5, big(\$t2) | \$t5 = Memory[\$t2 + big] |

14. (8 points) Consider a SEC code that protects 4-bit words with 3 parity bits. If we read the value 0x57, is there an error? If so, correct the error.
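The check in question 14 can be sketched as a Hamming (7,4) syndrome computation. The question does not fix the bit layout, so the code assumes the following: 0x57 is read as the 7-bit string 1010111, bit position 1 is the leftmost bit, parity bits sit at positions 1, 2, and 4, and parity is even. A different convention would give a different result. The name `check_sec` is illustrative.

```python
def check_sec(word):
    """Return (syndrome, corrected word) for a 7-bit Hamming code word.

    Assumptions: position 1 is the most significant of the 7 bits,
    parity bits are at positions 1, 2, 4, and parity is even.
    """
    # bits[0] holds position 1, bits[6] holds position 7.
    bits = [(word >> (7 - pos)) & 1 for pos in range(1, 8)]
    syndrome = 0
    for parity_pos in (1, 2, 4):
        parity = 0
        for pos in range(1, 8):
            if pos & parity_pos:        # this parity bit covers position pos
                parity ^= bits[pos - 1]
        if parity:                      # odd parity means this check failed
            syndrome |= parity_pos
    if syndrome:
        word ^= 1 << (7 - syndrome)     # flip the bit the syndrome points at
    return syndrome, word


syndrome, corrected = check_sec(0x57)
print(f"syndrome = {syndrome}, corrected value = {corrected:#04x}")
```

A syndrome of zero means all three parity checks pass; a nonzero syndrome is the position of the single bit assumed to be in error.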