## The University of Alabama in Huntsville ECE Department CPE 431 01 Fall 2012 Test 1

Name: ____________________

1. (1 point) A program selected for use in comparing computer performance is called a ____________.

2. (1 point) A ____________ is a link to the calling site that allows a procedure to return to the proper address.

3. (1 point) A ____________ is a systems program that places an object program in main memory so that it is ready to execute.

4. (1 point) An unscheduled event that disrupts program execution is called an ____________.

5. (1 point) ____________ is an implementation technique in which multiple instructions are overlapped in execution, much like an assembly line.

6. (10 points) Consider a computer running programs with CPU times shown in the following table.

| FP Instructions | INT Instructions | L/S Instructions | Branch Instructions | Total Time |
|-----------------|------------------|------------------|---------------------|------------|
| 50 s            | 80 s             | 50 s             | 30 s                | 210 s      |

If the time spent on INT instructions is reduced by 30%, what speedup is achieved?
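One way to check the arithmetic: only the INT portion shrinks, so the new total is 210 - 0.30 × 80 = 186 s and the speedup is the ratio of old to new total time. The sketch below assumes the table's times are additive (no overlap between categories); the helper name `speedup_after_reduction` is hypothetical.

```python
# CPU times in seconds, taken from the table above.
times = {"FP": 50, "INT": 80, "L/S": 50, "Branch": 30}


def speedup_after_reduction(times, category, reduction):
    """Speedup when only one category's time shrinks by `reduction` (a fraction)."""
    old_total = sum(times.values())
    new_total = old_total - times[category] * reduction
    return old_total / new_total


speedup = speedup_after_reduction(times, "INT", 0.30)
print(f"{speedup:.3f}")  # 210 / 186
```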

7. (10 points) In a von Neumann architecture, groups of bits have no intrinsic meaning by themselves; what a bit pattern represents depends entirely on how it is used. If the bit pattern 0xAF19F329 (in hexadecimal notation) is placed into the Instruction Register, what MIPS instruction will be executed?
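A quick way to verify a hand decoding is to slice the 32-bit word into the I-type fields (op = bits 31 to 26, rs = 25 to 21, rt = 20 to 16, immediate = 15 to 0) and sign-extend the immediate. This is a minimal sketch that only recognizes the `lw`/`sw` opcodes (35 and 43); the function `decode` and the table `I_TYPE_MEM` are hypothetical names, not a full disassembler.

```python
# Register names indexed by register number (standard MIPS naming).
REG_NAMES = ["$zero", "$at", "$v0", "$v1", "$a0", "$a1", "$a2", "$a3",
             "$t0", "$t1", "$t2", "$t3", "$t4", "$t5", "$t6", "$t7",
             "$s0", "$s1", "$s2", "$s3", "$s4", "$s5", "$s6", "$s7",
             "$t8", "$t9", "$k0", "$k1", "$gp", "$sp", "$fp", "$ra"]

# Only the memory-access opcodes this sketch handles.
I_TYPE_MEM = {35: "lw", 43: "sw"}


def decode(word):
    op = (word >> 26) & 0x3F
    rs = (word >> 21) & 0x1F
    rt = (word >> 16) & 0x1F
    imm = word & 0xFFFF
    if imm & 0x8000:  # sign-extend the 16-bit immediate
        imm -= 0x10000
    if op in I_TYPE_MEM:
        return f"{I_TYPE_MEM[op]} {REG_NAMES[rt]}, {imm}({REG_NAMES[rs]})"
    raise ValueError(f"opcode {op} not handled in this sketch")


print(decode(0xAF19F329))  # sw $t9, -3287($t8)
```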

8. (10 points) Write down the binary representation of 120.125 in IEEE single precision format.
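By hand: 120.125 = 1111000.001 (binary) = 1.111000001 × 2^6, so the sign bit is 0, the biased exponent is 6 + 127 = 133 = 10000101, and the fraction is the bits after the leading 1, padded to 23 bits. The sketch below cross-checks that result with Python's standard `struct` module; the helper name `ieee_single_bits` is hypothetical.

```python
import struct


def ieee_single_bits(x):
    # Pack as big-endian IEEE 754 binary32, then reinterpret the 4 bytes as an unsigned int.
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    s = f"{bits:032b}"
    return s[0], s[1:9], s[9:], bits  # sign, exponent, fraction, raw word


sign, exponent, fraction, bits = ieee_single_bits(120.125)
print(sign, exponent, fraction)  # 0 10000101 11100000100000000000000
print(hex(bits))                 # 0x42f04000
```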

9. (10 points) In this exercise, we examine how pipelining affects the clock cycle time of the processor. Problems in this exercise assume that individual stages of the datapath have the following latencies:

| IF     | ID     | EX     | MEM    | WB     |
|--------|--------|--------|--------|--------|
| 200 ps | 170 ps | 220 ps | 210 ps | 150 ps |

What is the total latency of an sw instruction in a pipelined processor and in a non-pipelined processor?
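A sketch of the usual reasoning, assuming the stage latencies are in picoseconds (IF 200, ID 170, EX 220, MEM 210, WB 150): a non-pipelined (single-cycle) processor needs one cycle long enough for all five stages, so even sw, which skips write-back, takes the full cycle; a pipelined processor clocks at the slowest stage and every instruction passes through all five stages. The variable names are illustrative only.

```python
# Stage latencies in picoseconds (units assumed).
stages = {"IF": 200, "ID": 170, "EX": 220, "MEM": 210, "WB": 150}

# Non-pipelined: the single clock cycle must fit every stage, so sw
# takes the whole cycle even though it does no write-back.
non_pipelined = sum(stages.values())

# Pipelined: the clock is set by the slowest stage, and the instruction
# occupies all five stages one cycle each.
pipelined = len(stages) * max(stages.values())

print(non_pipelined, pipelined)  # 950 1100
```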

10. (10 points) The table below shows the number of instructions per processor core on a multicore processor, as well as the average CPI for executing the program on 1, 2, 4, or 8 cores. Using this data, you will explore the speedup of applications on multicore processors.

| Cores per Processor | Instructions per Core | Average CPI |
|---------------------|-----------------------|-------------|
| 1                   | 1.00E+10              | 1.3         |
| 2                   | 5.00E+09              | 1.5         |
| 4                   | 2.50E+09              | 1.9         |
| 8                   | 1.25E+09              | 2.7         |

Assuming a 2.4 GHz clock frequency, what is the execution time of the program using 1, 2, 4, or 8 cores?
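Since each core runs its share of the instructions in parallel, the execution time is one core's cycle count divided by the clock rate: time = instructions per core × CPI / 2.4 GHz. This sketch assumes the work is split evenly and ignores any synchronization overhead beyond what the CPI already captures; `exec_time` and `rows` are hypothetical names.

```python
CLOCK_HZ = 2.4e9
# (cores, instructions per core, average CPI), from the table above.
rows = [(1, 1.00e10, 1.3), (2, 5.00e9, 1.5), (4, 2.50e9, 1.9), (8, 1.25e9, 2.7)]


def exec_time(instructions, cpi, clock_hz):
    # Cores run in parallel, so wall-clock time is one core's cycles / clock rate.
    return instructions * cpi / clock_hz


times = {cores: exec_time(n, cpi, CLOCK_HZ) for cores, n, cpi in rows}
for cores, t in times.items():
    print(f"{cores} core(s): {t:.3f} s")
```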

11. (15 points) In this exercise, we examine how the latencies of individual components of the datapath affect the clock cycle time of the entire datapath, and how these components are utilized by instructions. For problems in this exercise, assume the following latencies for logic blocks in the datapath; any components not listed have zero delay. Show all of your work.

| I-Mem  | Add    | ALU    | Regs   | D-Mem  | SignExtend | Shift-Left-2 |
|--------|--------|--------|--------|--------|------------|--------------|
| 230 ps | 100 ps | 150 ps | 180 ps | 300 ps | 20 ps      | 10 ps        |

What is the clock cycle for this datapath?
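The clock cycle of a single-cycle datapath is the delay of its longest path. This sketch assumes the standard single-cycle MIPS datapath and, following the common textbook convention, does not add the register-file write at the end of the path (adding it would give 860 + 180 = 1040 ps). Muxes and the PC are treated as zero delay, as stated above; the path names are illustrative.

```python
# Latencies in ps; components not listed (muxes, PC) have zero delay.
lat = {"I-Mem": 230, "Add": 100, "ALU": 150, "Regs": 180,
       "D-Mem": 300, "SignExtend": 20, "Shift-Left-2": 10}

# Candidate serial paths through a single-cycle datapath (register write not added).
paths = {
    "R-type":        ["I-Mem", "Regs", "ALU"],
    "lw":            ["I-Mem", "Regs", "ALU", "D-Mem"],
    "sw":            ["I-Mem", "Regs", "ALU", "D-Mem"],
    "beq (compare)": ["I-Mem", "Regs", "ALU"],
    "beq (target)":  ["I-Mem", "SignExtend", "Shift-Left-2", "Add"],
}

delays = {name: sum(lat[c] for c in chain) for name, chain in paths.items()}
critical = max(delays, key=delays.get)  # first path with the largest delay
print(critical, delays[critical])  # lw 860
```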

12. (15 points) Consider an architecture that is similar to MIPS except that it supports update addressing for data transfer instructions. If we run gcc using this architecture, some percentage of the data transfer instructions will be able to make use of the new instructions, and for each instruction changed, one arithmetic instruction can be eliminated. If 20% of the data transfer instructions can be changed, which will be faster for gcc, the modified MIPS architecture or the unmodified architecture? Assume the CPI values shown below and that the modified architecture has its cycle time increased by 15% to accommodate the new instructions.

| Instruction Class  | Average CPI | Frequency in gcc |
|--------------------|-------------|------------------|
| Arithmetic         | 1.0         | 48%              |
| Data transfer      | 1.3         | 33%              |
| Conditional branch | 1.7         | 17%              |
| Jump               | 1.1         | 2%               |
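A sketch of the comparison, measuring CPU time per original instruction with the unmodified cycle time set to 1.0. It assumes the changed data transfer instructions keep their CPI of 1.3, so the only effects are removing 0.20 × 0.33 = 0.066 arithmetic instructions per original instruction and stretching the cycle by 15%. The helper `cycles_per_orig_instr` is a hypothetical name.

```python
# class: (average CPI, fraction of gcc's original instruction count)
mix = {
    "arith":  (1.0, 0.48),
    "data":   (1.3, 0.33),
    "branch": (1.7, 0.17),
    "jump":   (1.1, 0.02),
}


def cycles_per_orig_instr(mix):
    return sum(cpi * frac for cpi, frac in mix.values())


base = cycles_per_orig_instr(mix)  # relative time with cycle time 1.0

changed = 0.20 * mix["data"][1]    # data transfers converted to update addressing
modified_mix = dict(mix)
modified_mix["arith"] = (1.0, mix["arith"][1] - changed)  # one arithmetic op removed per change
modified = cycles_per_orig_instr(modified_mix) * 1.15      # 15% longer cycle

print(f"unmodified {base:.3f}, modified {modified:.3f}")
```

Under these assumptions the modified design needs fewer cycles (1.154 vs. 1.22 per original instruction), but the longer cycle makes its total time larger, so the unmodified architecture is faster.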

13. (15 points) For each MIPS instruction, show the value of the opcode (op), source register (rs), and target register (rt) fields. For the I-type instructions, show the value of the immediate field, and for the R-type instructions, show the value of the destination register (rd) field.

| Instruction              | Type | op | rs | rt | rd | Immediate |
|--------------------------|------|----|----|----|----|-----------|
| lui \$t0, 4              |      |    |    |    |    |           |
| add \$t1, \$s6, \$zero   |      |    |    |    |    |           |
| sw \$t5, 16(\$s7)        |      |    |    |    |    |           |
| lw \$t2, -8(\$a2)        |      |    |    |    |    |           |
| beq \$s3, \$t8, -100     |      |    |    |    |    |           |
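The field values can be checked against the standard MIPS opcodes (add 0, beq 4, lui 15, lw 35, sw 43) and register numbers. This sketch covers only the registers in the table, treats the beq operand -100 as the immediate field value exactly as written, and uses a hypothetical helper `encode_fields`; lw/sw operands are passed as (rt, imm, rs) to mirror `imm(rs)`.

```python
# Register numbers for just the registers used in the table.
REGS = {"$zero": 0, "$a2": 6, "$t0": 8, "$t1": 9, "$t2": 10, "$t5": 13,
        "$s3": 19, "$s6": 22, "$s7": 23, "$t8": 24}
OPCODES = {"add": 0, "beq": 4, "lui": 15, "lw": 35, "sw": 43}


def encode_fields(mnemonic, *ops):
    """Return the instruction fields for one table row (hypothetical helper)."""
    op = OPCODES[mnemonic]
    if mnemonic == "add":             # R-type: add rd, rs, rt
        rd, rs, rt = ops
        return {"type": "R", "op": op, "rs": REGS[rs], "rt": REGS[rt], "rd": REGS[rd]}
    if mnemonic == "lui":             # I-type: lui rt, imm (rs field is 0)
        rt, imm = ops
        return {"type": "I", "op": op, "rs": 0, "rt": REGS[rt], "imm": imm}
    if mnemonic in ("lw", "sw"):      # I-type: op rt, imm(rs)
        rt, imm, rs = ops
        return {"type": "I", "op": op, "rs": REGS[rs], "rt": REGS[rt], "imm": imm}
    if mnemonic == "beq":             # I-type: beq rs, rt, imm
        rs, rt, imm = ops
        return {"type": "I", "op": op, "rs": REGS[rs], "rt": REGS[rt], "imm": imm}
    raise ValueError(f"{mnemonic} not handled in this sketch")


rows = [
    ("lui", "$t0", 4),
    ("add", "$t1", "$s6", "$zero"),
    ("sw", "$t5", 16, "$s7"),
    ("lw", "$t2", -8, "$a2"),
    ("beq", "$s3", "$t8", -100),
]
for row in rows:
    print(row[0], encode_fields(*row))
```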