

# **FACULTY OF ENGINEERING**

### DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING

B.Sc. Engineering
2014 Intake Semester 2 Examination

### **CS2052 COMPUTER ARCHITECTURE**

Time allowed: 2 Hours January 2016

## **ADDITIONAL MATERIAL: None**

### **INSTRUCTIONS TO CANDIDATES:**

- 1. This paper consists of 4 questions in 5 pages.
- 2. Answer **ALL** questions.
- 3. Applicable Assembly instructions are given as **Annex**.
- 4. Start answering each of the main questions on a new page.
- 5. For MCQ and True/False questions, select the most appropriate answer, and write the corresponding sub-question number and the answer number in your answer book.
- 6. The maximum attainable mark for each question is given in brackets.
- 7. This examination accounts for 60% of the module assessment.
- 8. This is a closed book examination.

NB: It is an offence to be in possession of unauthorised material during the examination.

- 9. Only calculators approved by the Faculty of Engineering are permitted.
- 10. Assume reasonable values for any data not given in or with the examination paper. Clearly state such assumptions made on the script.
- 11. In case of any doubt as to the interpretation of the wording of a question, make suitable assumptions and clearly state them on the script.
- 12. This paper should be answered only in English.

Question 1 [25 marks]

Figure 1 shows the block diagram of a simple microprocessor (a.k.a. nano-processor). Answer following questions based on Figure 1.



Figure 1 – High-level diagram of the microprocessor.

- (i) "This microprocessor design is based on the Von Neumann architecture". [1 mark]
  True False
- (ii) What is the size of the Program Counter (i.e., number of bits *n* in Figure 1)? [2 marks]
  - a) 4.52 bits

b) 5 bits

c) 5.61 bits

d) 6 bits

| (iii                                                                                 | )           | How many bits are required to select a register from the Register Bank (i.e., number of bits $k$ in Figure 1)? [2 marks                                              |         |                                                                                                                           |                            |            |                           |                             |                            |  |  |  |
|--------------------------------------------------------------------------------------|-------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------|---------|---------------------------------------------------------------------------------------------------------------------------|----------------------------|------------|---------------------------|-----------------------------|----------------------------|--|--|--|
|                                                                                      |             | a)                                                                                                                                                                   | 4.52 b  | its                                                                                                                       |                            | b)         |                           | 5 bits                      |                            |  |  |  |
|                                                                                      |             | c)                                                                                                                                                                   | 5.61 b  | its                                                                                                                       |                            | d)         |                           | 6 bits                      |                            |  |  |  |
| One of the instructions supported by the microprocessor can be defined as follows:   |             |                                                                                                                                                                      |         |                                                                                                                           |                            |            |                           |                             |                            |  |  |  |
|                                                                                      | Ins         | truction                                                                                                                                                             | ı       | SUB R1, R2                                                                                                                |                            |            |                           |                             |                            |  |  |  |
| -                                                                                    | Description |                                                                                                                                                                      |         | Subtract registers $R_1$ and $R_2$ and store results on $R_1$ , i.e., $R_1 \leftarrow R_1 - R_2$ . $R_1, R_2 \in [0, 31]$ |                            |            |                           |                             |                            |  |  |  |
|                                                                                      | m-l         | oit instr                                                                                                                                                            | uction  | 100                                                                                                                       | <i>k</i> -bits to indicate | Register R | $k_1 \mid k$              | z-bits to indicate Register | $R_2 \mid x \mid x \mid x$ |  |  |  |
| (iv                                                                                  | ·)          | What i                                                                                                                                                               | s the w | ord leng                                                                                                                  | gth of an instruct         | tion?      |                           |                             | [2 marks]                  |  |  |  |
|                                                                                      |             | a)                                                                                                                                                                   | 10 bits | }                                                                                                                         |                            | b)         |                           | 14 bits                     |                            |  |  |  |
|                                                                                      |             | c)                                                                                                                                                                   | 16 bits | }                                                                                                                         |                            | d)         |                           | 18 bits                     |                            |  |  |  |
| (v) If the Program ROM can store 50 such instructions, what is Program ROM in bytes? |             |                                                                                                                                                                      |         |                                                                                                                           |                            |            | , what is the total capac | ity of the [2 marks]        |                            |  |  |  |
|                                                                                      |             | a)                                                                                                                                                                   | 100 by  | rtes                                                                                                                      |                            | b)         |                           | 113 bytes                   |                            |  |  |  |
|                                                                                      |             | c)                                                                                                                                                                   | 800 by  | es                                                                                                                        |                            | d)         |                           | 900 bytes                   |                            |  |  |  |
| (vi                                                                                  | i)          | Above 16-bit Add/Subtract unit <b>cannot</b> be built using: [2 marks]                                                                                               |         |                                                                                                                           |                            |            |                           |                             |                            |  |  |  |
|                                                                                      |             | a)                                                                                                                                                                   | 16 Ful  | l Addei                                                                                                                   | 's                         | b)         |                           | Four 4-bit RCAs             |                            |  |  |  |
|                                                                                      |             | c)                                                                                                                                                                   | 15 Ful  | l Addei                                                                                                                   | s and 1 Half Ad            | der d)     |                           | 32 Half Adders              |                            |  |  |  |
| (vii) Write the machine code for the following instruc                               |             |                                                                                                                                                                      |         |                                                                                                                           |                            | ictic      | on.                       | [3 marks]                   |                            |  |  |  |
| SUB 0x12, 0x07                                                                       |             |                                                                                                                                                                      |         |                                                                                                                           |                            |            |                           |                             |                            |  |  |  |
| (viii                                                                                | 1)          | Briefly explain how the above instruction can be implemented on the microprocessor. Clearly identify what components to activate and how to activate them. [7 marks] |         |                                                                                                                           |                            |            |                           |                             |                            |  |  |  |
| (ix                                                                                  | <b>(</b> )  | Briefly discuss to whether we can extend the proposed microprocessor to have a 2-stage pipeline with "instruction fetch" and "instruction execute" stages. [4 marks] |         |                                                                                                                           |                            |            |                           |                             |                            |  |  |  |
| Question 2 [25 marks                                                                 |             |                                                                                                                                                                      |         |                                                                                                                           |                            |            |                           |                             |                            |  |  |  |
| (i                                                                                   | i)          | Show the schematic diagram of a 1-to-8 demultiplexer built using two 1-to-4 demultiplexers and other basic logic components. [3 marks]                               |         |                                                                                                                           |                            |            |                           |                             |                            |  |  |  |
| (ii                                                                                  | )           | A 3-bit counter has the following state transitions:                                                                                                                 |         |                                                                                                                           |                            |            |                           |                             |                            |  |  |  |
|                                                                                      |             | $001 \rightarrow 010 \rightarrow 100 \rightarrow 000 \rightarrow 001 \rightarrow 010 \rightarrow 100 \rightarrow 000 \rightarrow \dots$                              |         |                                                                                                                           |                            |            |                           |                             |                            |  |  |  |
|                                                                                      |             | Show the schematic diagram of this counter built using T Flip Flops. Your answer should include the truth table, Karnaugh Maps, and schematic diagram. [11 marks]    |         |                                                                                                                           |                            |            |                           |                             |                            |  |  |  |
| (iii                                                                                 | )           | Using the 2's Complement calculate <b>63 – 71</b> . Note that registers are 8-bit. [3 marks]                                                                         |         |                                                                                                                           |                            |            |                           |                             |                            |  |  |  |
| (iv                                                                                  | ·)          | Represent 2.718 using Single Precision, IEEE Floating Point standard. [4 marks]                                                                                      |         |                                                                                                                           |                            |            |                           |                             | [4 marks]                  |  |  |  |

Hint: Single Precision standard has a 23-bit mantissa, 8-bit exponent, and 1-bit sign. The exponent is calculated as E = e + 127.

- (v) For a given application, 40% of the instructions require memory access. Instruction miss rate is 2% and data miss rate is 4%. An instruction can be executed in 1 clock cycle. L1 cache access time is approximately 5 clock cycles while the L1 miss penalty is 64 clock cycles.
  - 1. What is the average memory access time for instructions? [2 marks]
    - a) 1.28 clock cycles

b) 5 clock cycles

c) 6.28 clock cycles

- d) 64 clock cycles
- 2. What is the average memory access time for data?

[2 marks]

a) 1.024 clock cycles

b) 2 clock cycles

c) 3.024 clock cycles

d) 5 clock cycles

Question 3 [25 marks]

(i) Given two integers x and y, write an Assembly program to identify the minimum value, i.e., min(x, y). For example, if x = 3 and y = 7 a designated register should contain 3. Provide comments for your code. [13 marks]

Assume microprocessor has 20, 8-bit general purpose registers labelled as R1, R2, ... R20.  $x, y \in [1, 127]$ . It is ok to assume x and y are already stored in two of the general purpose registers. Use only the instruction set given as Annex.

- (ii) Briefly discuss which I/O technique among Pooling, Interrupt Driven, or Direct Memory Access (DMA) is most suitable for the following applications. [4 x 2 marks]
  - 1. When you order food at a fast food restaurant you are given a number. Orders are processed on the First Come First Serve (i.e., FIFO) basis. Then once your order is ready, your order number is to be displayed using a *counter* that uses three 7-segement displays. When your number is displayed you go and pick up the food.

You are required to design a simple embedded system to control this *counter*, which typically needs an increment button and a reset button.

- 2. A Time and Attendance (T&A) system to be developed to record the arrival and leaving time of employees. When an employee arrives, he/she enters the 4-digit employee ID (via a key pad) and press the *arrive* button. When he/she leaves, again the 4-digit ID is entered followed by the *leave* button. Rest of the time the T&A system is expected to update relevant database records such as whether an employee is present on a given day, how many hours he/she worked, number of leaves taken, and calculate overtime.
- (iii) How would you improve the following program to benefit from spatial and/or temporal locality in caching? Discuss while presenting an optimized program.

A and b are two  $n \times n$  matrices. Memory layout is row major.

[4 marks]

```
for (i= 0; i < n; i++)

for (j = 0; j < n; j++)

a[j][i] = b[j][i] + 2
```

Question 4 [25 marks]

Suppose, just after graduation you were hired by Intel Inc. Your team is responsible for designing the 6<sup>th</sup> generation of Intel Core processors. Intel is hoping that the chips will kill off the tablet trend and persuade consumers to invest in hybrid or all-in-one Windows-based devices instead.

6<sup>th</sup> generation of Intel Core processors are expected to be capable of starting up in 0.5 seconds and have 2.5 times the performance of previous generations. It is also possible that devices running on the chip will offer users up to 3x more battery life than what they are currently used to.

- Part of the write up is extracted from www.wired.com

- (i) Briefly explain what type of a design you would recommend for each of the following items. Provide at least 2 justifications for each of your selections.
  - 1. Number of processing elements (e.g., single core or multi core). [3 marks]
  - 2. No of pipeline stages. [3 marks]
  - 3. Memory hierarchy (e.g., number of levels, cache size, and cache associativity). [4 marks]
  - 4. What is the expected performance-to-power gain? [2 marks]
- (ii) Based on performance tests it has been identified that the memory sub-system on the 5<sup>th</sup> generation Intel Core processors is a major performance bottleneck. This is due to memory system not being able to provide the required bandwidth when each CPU core is running at its full speed. Waiting time for memory was 65% of the total time. Therefore, one of your team members suggested that by further increasing the memory bandwidth, it would be possible to gain the desired overall speedup of 2.5.
  - 1. How much speed up in the memory sub-system will be required to achieve the desired overall speed up? [ 3 marks]

You may use the Amdahl's law given below for the calculation:

$$Speedup_{Overall} = \frac{1}{(1 - Fraction_{Enhanced}) + \frac{Fraction_{Enhanced}}{Speedup_{Enhanced}}}$$

- 2. Do you agree with your team member's idea of gaining such a speedup by improving only the bandwidth of the memory sub-system? Explain. [3 marks]
- "6th generation processor will be designed for Solid State Disk (SSD) based systems".
   Will this improve the overall performance and energy efficiency? Discuss. [4 marks]
- (iv) "6<sup>th</sup> generation processor to have even more I/O pins supporting more Parallel Connections among components on a motherboard such as RAM, video card, and network card. This will further speed up the system."

Do you agree or disagree with this suggestion? Briefly discuss. [3 marks]

| E | ND OF | THE P. | APER |  |
|---|-------|--------|------|--|
|---|-------|--------|------|--|