**CprE 381 – Computer Organization and**

**Assembly Level Programming**

**Fall 2013**

**Homework #1**

**Assigned: Friday Aug 30**

**Due: Friday Sep 6, Midnight**

Notes:

* Do this week’s reading before you start on the homework.
* Complete the assignment electronically and submit it on Blackboard Learn.
* Late homework is accepted within three days from the due date. Late penalty is 10% per day.

**Textbook Reading:** Ch. 1 Computer Abstractions, Sec. 1.1-1.4

1. [15] Textbook **exercise 1.2 revised**.

Consider the two configurations shown in the table, and answer the following questions for each. Note: for memory size, 1K = 2^10, 1M = 2^20, and 1G = 2^30, but for processor and network speed, 1K = 10^3, 1M = 10^6, and 1G = 10^9. Assume file size uses the same K/M/G convention as memory size.

| Configuration | Resolution | Main Memory | Ethernet Network |
| --- | --- | --- | --- |
| A | 1280 x 800 | 4 Gbytes | 100 Mbit/s |
| B | 3200 x 1800 | 8 Gbytes | 1 Gbit/s |

(a) [5] For a color display using 8 bits for each primary color (red, green, blue) per pixel, what is the minimum frame-buffer size, in bytes, needed to store one frame?

**A: 3 (RGB) \* 8 bits \* 1280 \* 800 pixels = 24,576,000 bits = 3,072,000 bytes**

**B: 3 \* 8 bits \* 3200 \* 1800 pixels = 138,240,000 bits = 17,280,000 bytes**

(b) [5] How many complete frames can the memory store, assuming it holds no other data?

**A: 4 \* 2^30 / 3,072,000 = 1398.1, i.e., 1398 complete frames**

**B: 8 \* 2^30 / 17,280,000 = 497.1, i.e., 497 complete frames**

(c) [5] How long would it take to send a 256 Kbyte file over the Ethernet connection?

**A: 256 Kbytes = 256 \* 2^10 bytes = 262,144 bytes = 2,097,152 bits; 2,097,152 bits / (100 \* 10^6 bits/s) = 20.97 ms**

**B: 2,097,152 bits / (1 \* 10^9 bits/s) = 2.097 ms**

2. [15] Textbook **exercise 1.3 revised**.

Consider three processors P1, P2, and P3 that implement the same instruction set and execute the same program, with the clock rates and CPIs (for a given benchmark program) shown in the following table.

| Processor | Clock rate | CPI |
| --- | --- | --- |
| P1 | 3.0 GHz | 1.8 |
| P2 | 2.4 GHz | 0.9 |
| P3 | 4.0 GHz | 2.5 |

(a) [5] If each processor executes the program for 10 seconds, find the number of cycles and the number of instructions executed. (Assume the CPI is constant throughout execution.)

**P1: 3.0 GHz \* 10 seconds = 30 Gcycles**

**P2: 24 Gcycles**

**P3: 40 Gcycles**

**P1: 30 Gcycles / (1.8 cycles/Instruction) = 16.67 G-instructions (16.67 billion)**

**P2: 24 G / 0.9 = 26.67 billion instructions**

**P3: 40 G / 2.5 = 16 billion instructions**

(b) [5] Which processor has the highest performance, expressed in instructions per second?

**Processor P2, at about 2.67 billion instructions per second (2.4 GHz / 0.9), versus about 1.67 billion for P1 and 1.6 billion for P3**

(c) [5] We want to reduce the execution time by 20%, but doing so increases the CPI by 30%. By what percentage must the clock rate increase to achieve this time reduction?

**T = CPI \* Ni / Fc**

**Normalize every quantity of the original design to 1. The target time is then 0.8, the new CPI is 1.3, and the instruction count is unchanged:**

**.8 = 1.3 / Fc**

**Fc = 1.3 / .8 = 1.625**

* **Need a 62.5% increase in clock speed.**

3. [15] Textbook **exercise 1.8**. **Read textbook Section 1.5 before you work on this problem.**

Suppose we have developed a new version of a processor with the following characteristics.

| Version | Voltage | Clock rate |
| --- | --- | --- |
| Old | 1.75 V | 1.5 GHz |
| New | 1.2 V | 2.0 GHz |

(a) [5] Assume the dynamic power has been reduced by 10%. How much has the capacitive load changed?

**Old: Po = Clo \* 1.75^2 \* 1.5 G**

**New: Pn = Cln \* 1.2^2 \* 2 G**

**Pn/Po = 0.9 (power reduced by 10%), so Cln/Clo = 0.9 \* (1.75^2 \* 1.5) / (1.2^2 \* 2) = 0.9 / 0.627 = 1.436 => the capacitive load has increased by about 43.6%**

(b) [5] Assume the capacitive load has not changed. How much has the dynamic power been reduced?

**Old P = 1.75^2 \* 1.5 G**

**New P = 1.2^2 \*2 G**

**Pn/Po = (1.2^2 \* 2) / (1.75^2 \* 1.5) = 0.627 => power reduced by about 37.3%**

(c) [5] Assume the capacitive load of the new version is 80% of that of the old version, and the dynamic power has been reduced by 40%. What is the voltage of the new version?

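**Using the clock rates from the table (1.5 GHz to 2.0 GHz), as in parts (a) and (b):**

**Pn/Po = (Cln/Clo) \* (Vn/Vo)^2 \* (fn/fo)**

**0.6 = 0.8 \* (Vn/Vo)^2 \* (2.0/1.5) => (Vn/Vo)^2 = 0.5625 => Vn/Vo = 0.75**

**New voltage = 0.75 \* 1.75 V = 1.31 V**

**If the clock rate is instead assumed unchanged: 0.6 = 0.8 \* (Vn/Vo)^2 => Vn/Vo = 0.866, so the new voltage is 86.6% of the old voltage, about 1.52 V.**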

4. [15] In this exercise, you will see how dramatically cache performance can affect program execution time.

Download the programs dmm1.c and dmm2.c (attached), compile them on a machine of your choice, run them, and record their execution times. For reference, both programs implement matrix multiplication.

If you want to use a Linux computer, you may remotely log on to a college/department server. The server names and login method are described here: <http://it.engineering.iastate.edu/remote/>. You can compile the programs with a command such as “gcc -O dmm1.c -o dmm1” in a terminal, time the execution with a command such as “time ./dmm1”, and report the “user” time component as the execution time.

(a) [10] **Report** the execution times of the two programs running on a computer of your choice. **Also report** the ratio of the two times. **Describe** the computer (make/model) and its processor (make/model/frequency).

(b) [5] The two programs are identical except that two lines are swapped. **Find** those two lines and paste them here. Then **explain** why the swap may cause such a difference in execution time. You will receive credit as long as your reasoning has merit.