# CSCI 524 – Computer Architecture Homework 1

Due: September 8, 2016

# Alexander Powell

1. (4 pts) Computer A executes the MIPS ISA and computer B executes the x86 ISA. On average, programs execute 1.5 times as many MIPS instructions as x86 instructions. Computer A has an average CPI of 1.5 and computer B an average CPI of 3. If computer B runs at a 3GHz clock frequency, what speed does computer A have to run at to be at least as fast as computer B?

**Solution:** Since the equation for CPU time is:

CPU Time = Instruction Count  $\times$  CPI  $\times$  Clock Cycle,

and the clock cycle for computer B can be converted to 333.3 psec, we can set up the following equality:

$$1.5 \times I \times 1.5 \times CC = I \times 3 \times 333.3$$
 psec

The first 1.5 on the left side of the equation is added to account for the fact that the MIPS ISA executes 1.5 times as many instructions as x86. Also, I stands for the number of instructions that each computer will execute. By simplifying the equation we end up with the following which is trivial to solve:

$$1.5 \times 1.5 \times CC = 3 \times 333.3$$
$$2.25 \times CC = 3 \times 333.3$$
$$CC = \frac{3 \times 333.3}{2.25} = 444.4$$

Again, by taking the reciprocal we have  $\frac{1}{444.4 \text{ psec}} = 2.25 \text{ GHz}$ . Therefore, computer A must run at 2.25 GHz to be at least as fast as computer B.

2. (7 pts) Assume a typical program has the following instruction type breakdown:

30% loads

10% stores

40% adds (adds, compares, and, sub, etc.)

8% multiplies

2% divides

8% conditional branches (assume perfect prediction)

2% unconditional branches (jumps, calls, returns, etc.)

Assume the processor that this program will be running on has the following instruction latencies (CPIs):

loads: 4 cycles stores: 6 cycles adds: 2 cycles multiplies: 12 cycles divides: 40 cycles

conditional branches: 4 cycles unconditional branches: 2 cycles

If you could pick one type of instruction to make twice as fast (half as many cycles to complete) in the next-generation of this processor, which instruction type would you pick? Why?

#### **Solution:**

First, let's begin by calculating the baseline summation of frequencies multiplied by their respective CPIs. The results are shown in the following table.

| Operation            | Frequency | CPI | $Freq \times CPI$ |
|----------------------|-----------|-----|-------------------|
| Loads                | 30%       | 4   | 1.2               |
| Stores               | 10%       | 6   | 0.6               |
| Adds                 | 40%       | 2   | 0.8               |
| Multiplies           | 8%        | 12  | 0.96              |
| Divides              | 2%        | 40  | 0.8               |
| Conditional Branch   | 8%        | 4   | 0.32              |
| Unconditional Branch | 2%        | 2   | 0.04              |
| Σ                    |           |     | 4.72              |

The next table shows the results of the summation after each respective operation's cycles are halved.

| Operation            | $\Sigma \ Freq \times CPI$ |  |
|----------------------|----------------------------|--|
| Load                 | 4.12                       |  |
| Store                | 4.42                       |  |
| Adds                 | 4.32                       |  |
| Multiplies           | 4.24                       |  |
| Divides              | 4.32                       |  |
| Conditional Branch   | 4.56                       |  |
| Unconditional Branch | 4.7                        |  |

Based on the results of this table, you should choose to make the load instructions twice as fast in the next generation of processor because it will increase performance the most (and therefore decrease execution time the most).

- 3. (9 pts) Consider three different processors P1, P2, and P3 executing the same instruction set. P1 has a 3.0 GHz clock rate and a CPI of 1.5. P2 has a 2.5 GHz clock rate and a CPI of 1.0. P3 has a 4.0 GHz clock rate and has a CPI of 2.2.
  - a Which processor has the highest performance expressed in instructions per second?

#### Solution:

$$P_1: \frac{3 \text{ GHz}}{1.5} \times 10^9 = 2 \times 10^9$$
 instructions per second

$$P_2: \frac{2.5 \text{ GHz}}{1} \times 10^9 = 2.5 \times 10^9 \text{ instructions per second}$$

$$P_3: \frac{4 \text{ GHz}}{2.2} \times 10^9 \approx 1.82 \times 10^9 \text{ instructions per second}$$

Therefore, based on the calculations above,  $P_2$  has the highest performance expressed in instructions per second.

b If the processors each execute a program in 10 seconds, find the number of cycles and the number of instructions.

#### Solution:

Cycles:

$$P_1: 10 \sec \times 3 \text{ GHz} = 30 \times 10^9 \text{ Cycles}$$

$$P_2: 10 \sec \times 2.5 \text{ GHz} = 25 \times 10^9 \text{ Cycles}$$

$$P_3: 10 \sec \times 4 \text{ GHz} = 40 \times 10^9 \text{ Cycles}$$

Number of Instructions:

$$P_1: \frac{3 \text{ GHz} \times 10 \text{ sec}}{1.5} = 20 \times 10^9 \text{ instructions}$$

P<sub>1</sub>: 
$$\frac{3 \text{ GHz} \times 10 \text{ sec}}{1.5} = 20 \times 10^9 \text{ instructions.}$$
P<sub>2</sub>:  $\frac{2.5 \text{ GHz} \times 10 \text{ sec}}{1} = 25 \times 10^9 \text{ instructions.}$ 

$$P_3: \frac{4 \text{ GHz} \times 10 \text{ sec}}{2.2} \approx 18.2 \times 10^9 \text{ instructions.}$$

c We are trying to reduce the execution time by 30% but this leads to an increase of 20% in the CPI. What clock rate should we have to get this time reduction?

#### **Solution:**

Based on the percentages given, we can model this with the following equation:

$$0.7 \times \text{CPU Time} = \frac{\text{Instruction Count} \times \text{CPI} \times 1.2}{\text{Clock Rate}}$$

So, to get the new clock rates for each processor we must multiply the old clock rate with the ratio  $\frac{1.2}{0.7}$ . Therefore, we have:

$$P_1: \frac{1.2}{0.7} \times 3 = 5.14286 \text{ GHz}$$

$$P_2: \frac{1.2}{0.7} \times 2.5 = 4.28571 \text{ GHz}$$

$$P_3: \frac{1.2}{0.7} \times 4 = 6.85714 \text{ GHz}$$

- 4. (15 pts) Consider two different implementations of the same instruction set architecture. The instructions can be divided into four classes according to their CPI (class A, B, C, and D). P1 with a clock rate of 2.5 GHz and CPIs of 1, 2, 3, and 3, and P2 with a clock rate of 3.0 GHz and CPIs of 2, 2, 2, and 2. Given a program with a dynamic instruction count of 1.0E6 instructions divided into classes as follows: 10% class A, 20% class B, 50% class C, and 20% class D, which is faster: P1 or P2?
  - a What is the global CPI for each implementation?

#### Solution:

For  $P_1$ , the global CPI can be calculated as follows:

$$1 \times 0.1 + 2 \times 0.2 + 3 \times 0.5 + 3 \times 0.2 = 2.6.$$

For  $P_2$ , the global CPI can be calculated as follows:

$$2 \times 0.1 + 2 \times 0.2 + 2 \times 0.5 + 2 \times 0.2 = 2.$$

b Find the clock cycles required in both cases.

For  $P_1$ , the time per cycle can be expressed as  $\frac{1}{2.5} \times 10^9 = 400$  psec. Using this, the total execution time is

$$400 \times 10^{-12} \times 10^6 \times 2.6 = 1040$$
 seconds.

For  $P_2$ , the time per cycle can be expressed as  $\frac{1}{3} \times 10^9 = 333.3$  psec. Using this, the total execution

$$333.3 \times 10^{-12} \times 10^{6} \times 2 = 666.6$$
 seconds.

Therefore,  $P_2$  is the faster processor.

5. (15 pts) Assume a program requires the execution of 5010<sup>6</sup> FP instructions, 11010<sup>6</sup> INT instructions, 8010<sup>6</sup> L/S instructions, and 1610<sup>6</sup> branch instructions. The CPI for each type of instruction is 1, 1, 4, and 2, respectively. Assume that the processor has a 2.0 GHz clock rate.

3

a By how much must we improve the CPI of FP instructions if we want the program to run two times faster?

# Solution:

First, we need to calculate how long it would normally take the processor to run the program. We use the performance equation to do this. So, the execution time of the program can be expressed as:

$$\frac{(50 \times 10^6) + (110 \times 10^6) + (4 \times 80 \times 10^6) + (2 \times 16 \times 10^6)}{2 \text{ GHz}} = 256 \times 10^6 \text{ psec.}$$

So, two find the new CPI for FP instructions for the program to run twice as fast, we set the original equation equal to  $128 \times 10^6$ , since that's half of the execution time, and solve for the new CPI:

$$\frac{(CPI_{NEW} \times 50 \times 10^6) + (110 \times 10^6) + (4 \times 80 \times 10^6) + (2 \times 16 \times 10^6)}{2 \text{ GHz}} = 128 \times 10^6$$

$$CPI_{NEW} = \frac{-103}{25}$$

Since the new CPI value we found is negative, and that's the only solution to the equation, we conclude that it is not possible to reduce the CPI enough to make the program run twice as fast.

b By how much must we improve the CPI of L/S instructions if we want the program to run two times faster?

# **Solution:**

Again, twice as fast would mean running the program in  $128 \times 10^6$  psec.

$$\frac{(50\times10^6)+(110\times10^6)+(CPI_{NEW}\times80\times10^6)+(2\times16\times10^6)}{2~\mathrm{GHz}}=128\times10^6$$
 
$$CPI_{NEW}=0.8$$

Therefore, the CPI of the L/S instructions would need to go from 4 to 0.8, for the program to run twice as fast.

c By how much is the execution time of the program improved if the CPI of INT and FP instructions is reduced by 40% and the CPI of L/S and Branch is reduced by 30%?

# Solution:

If the CPI of INT and FP instructions is reduced by 40%, those two new CPIs will both be 0.6 and if L/S and Branch CPIs are reduced by 30% the new CPIs will be 2.8 and 1.4 respectively.

Using these new values, the execution time can be calculated as follows:

$$\frac{(0.6\times50\times10^6)+(0.6\times110\times10^6)+(2.8\times80\times10^6)+(1.4\times16\times10^6)}{2~\mathrm{GHz}}=171.2\times10^6~\mathrm{psec}.$$

Therefore, we can say that this new time is

$$\frac{256\times10^6~\mathrm{psec}}{171.2\times10^6~\mathrm{psec}}\approx1.5$$

times faster than the original time.