## The University of Alabama in Huntsville Electrical & Computer Engineering Department CPE 431 01 Test 1 Solution Fall 2009

- 1. (1 point) State one of the design principles: \_Simplicity favors regularity or Make the common case fast or Good design demands good compromises or Smaller is faster\_.
- (1 point) A form of representation of an instruction composed of fields of binary numbers is an \_instruction format\_.
- 3. (1 point) The program that instigates a procedure and provides the necessary parameter values is the caller .
- 4. (1 point) The register that is reserved to point to the static area is called the <u>\_global pointer\_</u>.
- 5. (1 point) A systems program that places an object program in main memory so that it is ready to execute is a \_loader\_.
- 6. (10 points) For the following assembly code fragment, what is the total number of MIPS instructions executed?

```
(1)
            addi
                   $t1, $zero, 100
(2)
      LOOP: lw
                   $s1, 0($s0)
(3)
                   $s2, $s2, $s1
            add
(4)
            addi $s0, $s0, 4
(5)
            subi
                  $t1, $t1, 1
(6)
            bne
                   $t1, $zero, LOOP
```

Instruction (1) is executed once and sets the value of \$t1 to 100. Every time through the loop, t1 is decremented by 1. The bne is evaluated 100 times for the values of t1 = 99, 98, ..., 0. So, instructions (2)-(6) are executed 100 times, resulting in 500 instructions being executed. The total number of instructions executed is 501.

7. (5 points) Show the IEEE 754 binary representation for the floating-point number 1.5<sub>ten</sub> in single precision.

```
S=0 \text{ for positive number} \\ 1.5_{10}=1.1_2 \times 2^0 \\ \text{Exponent}=0+127=0111\ 1111 \\ \text{Fraction}=100\ 0000\ 0000\ 0000\ 0000\ 0000 \\ \text{Complete representation}=0011\ 1111\ 1100\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000 \\ \text{0000}\ 0000\ 0000\ 0000\ 0000\ 0000 \\ \text{0000}\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000 \\ \text{0000}\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000 \\ \text{0000}\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 000
```

8. (20 points) Consider two different implementations, P1 and P2, of the same instruction set. There are five classes of instructions (A, B, C, D, and E) in the instruction set. P1 has a clock rate of 4GHz, and P2 has a clock rate of 6 GHz. The average number of cycles for each instruction class for P1 and P2 are listed in the following table

| Class | CPI on P1 | CPI on P2 |
|-------|-----------|-----------|
| A     | 1         | 2         |
| В     | 2         | 2         |
| С     | 3         | 2         |
| D     | 4         | 4         |
| Е     | 5         | 4         |

If the number of instructions executed in a certain program is divided equally among the classes of instructions except for class A, which occurs twice as often as each of the others, how much faster is P2 than P1?

$$ET = \frac{IC * CPI}{CR}$$
,  $CPI = \sum_{i} w_{i} CPI_{i}$ 

$$\begin{split} w_A &= 2/6, \, w_B = 1/6, \, w_C = 1/6, \, w_D = 1/6, \, w_E = 1/6 \\ CPI_{P1} &= (2*1 + 2*1 + 3*1 + 4*1 + 5*1)/6 = 16/6 \\ CPI_{P2} &= (2*2 + 2*1 + 2*1 + 4*1 + 4*1)/6 = 16/6 \end{split}$$

$$\frac{P_{P2}}{P_{P1}} = \frac{ET_{P1}}{ET_{P2}} = \frac{\frac{IC_{P1} * CPI_{P1}}{CR_{P1}}}{\frac{IC_{P2} * CPI_{P2}}{CR_{P2}}} = \frac{\frac{CPI_{P1}}{CR_{P1}}}{\frac{CPI_{P2}}{CR_{P2}}} = \frac{CPI_{P1} * CR_{P2}}{CPI_{P2} * CR_{P1}} = \frac{16/6 * 6 \times 10^9}{16/6 * 4 \times 10^9} = 1.5$$

9. (10 points) What IEEE 754 floating point number does the following bit pattern represent?

1100 1101 1001 0100 0000 0000 0000 0001

S = 1, so the number is negative

Exponent = 
$$1001\ 1011 = 9*16+11 = 155$$
  
Fraction =  $2^{-3} + 2^{-5} + 2^{-23} = 0.125 + 0.03125 + 0.00000011921 = 0.15625011921$   
Number =  $(-1)^{S}*(1 + Fraction)* 2^{Exponent - 127} = (-1)1*(1.15625011921)* 2^{155-127}$   
=  $-1.15625011921 * 2^{28} = -310,378,528$ 

10. (10 points) The CPI of the different instruction types is given in the following table.

| Arithmetic | Load/Store | Branch |  |
|------------|------------|--------|--|
| 1          | 10         | 4      |  |

Assume the following instruction breakdown given for executing a program.

|            | Instructions (in millions) |  |  |  |
|------------|----------------------------|--|--|--|
| Arithmetic | 500                        |  |  |  |
| Load/Store | 300                        |  |  |  |
| Branch     | 100                        |  |  |  |

Suppose that we find a way to double the performance of arithmetic instructions. What is the overall speed-up of our machine?

$$2 = \frac{P_{arith\_new}}{P_{arith\_old}} = \frac{ET_{arith\_old}}{ET_{arith\_new}}, \qquad ET_{arith\_new} = 0.5*ET_{arith\_old}$$

$$Speedup = \frac{P_{new}}{P_{old}} = \frac{ET_{old}}{ET_{new}}$$

$$ET_{old} = ET_{arith\_old} + ET_{load/store} + ET_{branch} = (IC_{arith\_old} * CPI_{arith\_old} + IC_{load/store} * CPI_{load/store} + IC_{branch} * CPI_{branch}) * CT = (500 \text{ x } 10^6 * 1 + 300 \text{ x } 10^6 * 10 + 100 \text{ x } 10^6 * 4) * CT = 3.9 \text{ x } 10^9 * CT$$

$$ET_{new} = 0.5*ET_{arith\_old} + ET_{load/store} + ET_{branch} = (0.5*500 \text{ x } 10^6*1 + 300 \text{ x } 10^6*10 + 100 \text{ x } 10^6*4) \\ * CT = 3.65 \text{ x } 109*CT$$

Speedup = 
$$\frac{P_{new}}{P_{old}} = \frac{ET_{old}}{ET_{new}} = \frac{3.9 \times 10^9 * CT}{3.65 \times 10^9 * CT} = 1.07$$

11. (15 points) Assume that the logic blocks used to implement the datapath have the following latencies:

| I-Mem  | Add    | Mux    | ALU    | Regs   | D-Mem   | Sign-Extend | Shift-left-2 | ALU Ctrl |
|--------|--------|--------|--------|--------|---------|-------------|--------------|----------|
| 500 ps | 150 ps | 100 ps | 180 ps | 220 ps | 1000 ps | 90 ps       | 20 ps        | 55 ps    |

Further, assume that the time needed by the control unit to generate individual control signals is as follows:

| RegDst | Jump   | Branch | MemRead | MemtoReg | ALUOp  | MemWrite | ALUsrc | RegWrite |
|--------|--------|--------|---------|----------|--------|----------|--------|----------|
| 720 ps | 730 ps | 600 ps | 400 ps  | 700 ps   | 200 ps | 710 ps   | 200 ps | 800 ps   |



What is the clock cycle time of the processor?

The clock cycle must be long enough to complete all instructions:

beq 1330 ps lw 2320 ps sw 2210 ps R-type 1540 ps jump 1330 ps

12. (15 points) Assume that individual stages of the datapath have the following latencies:

| IF     | ID     | EX     | MEM    | WB     |
|--------|--------|--------|--------|--------|
| 200 ps | 150 ps | 120 ps | 190 ps | 140 ps |

Instead of a single-cycle organization, we can use a multi-cycle organization where each instruction takes multiple cycles but one instruction finishes before another is fetched. In this organization, an instruction only goes through the stages it actually needs (e.g., sw only takes four cycles because it does not need the WB stage). What are the clock cycle times for a single-cycle organization, a multiple-cycle organization and a pipelined organization?

Single-cycle: all stages must complete, so sum = (200 + 150 + 120 + 190 + 140) = 800 ps Pipelined: the clock cycle must be as long as the longest stage, 200 ps, and all instructions take 5 cycles to execute

Multi-cycle: the clock cycle must be as long as the longest stage, 200 ps, and instructions take different numbers of cycles: beq -3, sw -4, lw -5, R-type -4 (skip MEM), jump -1

13. (10 points) For a color display using 16 bits for each of the primary colors (red, green, blue) per pixel and with a resolution of 1280 x 800 pixels, what should be the size (in bytes) of the frame buffer to store a frame? If a computer has a main memory of 8 GB, how many frames could it store, assuming the memory contains no other information?

$$1280 \times 800 \, pixels \times \frac{3 colors}{pixel} \times \frac{16 bits}{color} \times \frac{1b yte}{8 bits} = 6144000 bytes$$

$$2^3 \times 2^{30} bytes \times \frac{1 frame}{6144000 bytes} = 1398 frames$$