

## United International University Department of Computer Science and Engineering

CSE 313: Computer Architecture

Final Examination

Time: 1 Hour 45 Minutes

Date:

Name:

ID:

Answer any 4 (Four) questions

1. You have three processors **P1**, **P2** and **P3**. **P1** executes instructions sequentially, while **P2** and **P3** support pipelining. **P1** has a cycle time of 500 ps, **P2** 100 ps and **P3** 150 ps. Now consider the following table:

Table 1: Instruction Clock Cycles

| Instruction | Fetch  | Register Read | ALU Op | Memory Access | Register Write |
|-------------|--------|---------------|--------|---------------|----------------|
| lw          | 200 ps | 50 ps         | 250 ps | 100 ps        | 50 ps          |
| add         | 200 ps | 100 ps        | 250 ps |               | 50 ps          |
| sw          | 200 ps | 50 ps         | 250 ps | 100 ps        |                |

The program consists of the following instructions -

lw \$1, 12(\$8)

lw \$2, 20(\$8)

add \$3, \$2, \$1

sw \$3, 100(\$10)

(a) How many cycles do **P1**, **P2** and **P3** need to execute the code? Show with a proper diagram. [7]

(b) Which processor will perform better? Explain with the help of execution time. [3]

2. (a) Draw a datapath with control signals that can support both R-type and J-type instructions. [5]

(b) Explain the control signals used in Question 2(a). [2]

(c) Explain the following instruction using the design of Question 2(a). Assume register \$7 has value 10 and \$10 has value 20.

add \$2, \$7, \$10
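As a rough illustration of what this instruction carries through an R-type datapath, the sketch below packs its fields into a 32-bit word and applies the ALU result to a toy register file. It assumes the standard MIPS32 R-type layout (opcode 0, funct 0x20 for `add`), which the paper does not state explicitly; the helper name `encode_r_type` and the dictionary register file are hypothetical.

```python
def encode_r_type(rs, rt, rd, shamt=0, funct=0x20, opcode=0):
    """Pack R-type fields into a 32-bit word (assumed MIPS32 layout)."""
    return (
        (opcode << 26)   # bits 31..26: opcode (0 for R-type)
        | (rs << 21)     # bits 25..21: first source register
        | (rt << 16)     # bits 20..16: second source register
        | (rd << 11)     # bits 15..11: destination register
        | (shamt << 6)   # bits 10..6:  shift amount (unused by add)
        | funct          # bits 5..0:   function code (0x20 = add)
    )

# Toy register file holding the values given in the question.
registers = {7: 10, 10: 20}

# add $2, $7, $10  ->  rd = 2, rs = 7, rt = 10
word = encode_r_type(rs=7, rt=10, rd=2)

# The ALU adds the two source operands; the result is written back to rd.
registers[2] = registers[7] + registers[10]

print(f"machine word: 0x{word:08X}")
print(f"$2 = {registers[2]}")
```

The field order in the function mirrors the wires a datapath diagram would show: `rs` and `rt` select the two register-read ports, `rd` selects the write port, and `funct` feeds ALU control.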

3. The following program was written by a student -

lw \$4, 200

lw \$5, 100

sub \$7, \$4, \$5

lw \$1, 500

beq \$1, \$5, 11

11: add \$3, \$4, \$5

Use the given CPI Table to calculate the necessary performance metrics.

|     | R Type | I Type | J Type |
|-----|--------|--------|--------|
| CPI | 6      | 7      | 4      |

Table 2: CPI Table
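A minimal sketch of how a per-class CPI table such as Table 2 combines with an instruction mix is given below. It assumes that `lw` and `beq` are counted as I-type and `add` and `sub` as R-type, and that all six instructions of the student's program execute once (the branch target is the very next instruction, so the count is the same whether or not the branch is taken). The function name `cycle_summary` and the 10 ms cycle time passed to it are illustrative choices.

```python
from collections import Counter

# CPI per instruction class, taken from Table 2.
CPI = {"R": 6, "I": 7, "J": 4}

# Assumed classification of the mnemonics used in the program.
INSTRUCTION_CLASS = {"lw": "I", "beq": "I", "add": "R", "sub": "R"}

# The student's program, one mnemonic per executed instruction.
program = ["lw", "lw", "sub", "lw", "beq", "add"]

def cycle_summary(mnemonics, cycle_time_s):
    """Return (total cycles, average CPI, CPU time in seconds)."""
    counts = Counter(INSTRUCTION_CLASS[m] for m in mnemonics)
    total_cycles = sum(CPI[cls] * n for cls, n in counts.items())
    avg_cpi = total_cycles / len(mnemonics)
    cpu_time_s = total_cycles * cycle_time_s
    return total_cycles, avg_cpi, cpu_time_s

total_cycles, avg_cpi, cpu_time_s = cycle_summary(program, cycle_time_s=10e-3)
print(total_cycles, round(avg_cpi, 3), round(cpu_time_s, 3))
```

The same function works for any instruction mix, as long as every mnemonic appears in `INSTRUCTION_CLASS`.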

(a) Find the total clock cycles, the average CPI (AVG<sub>CPI</sub>) and the CPU time, given that the clock cycle time is 10 ms. [3]

(b) Assume the above code spends 15% of its time executing I-type instructions, and we add a mechanism that speeds up the execution of I-type instructions by a factor of 4. Calculate the improved execution time. [3]

(c) Now consider a program containing 1200 million instructions and two processors, A and B. A has a clock rate of 4 GHz and B has a clock rate of 5 GHz. A needs 7 million cycles and B needs 15 million cycles to execute the same program. Compare their performance in terms of MIPS. [4]

4. As you have learned that pipelining speeds up processor operation dramatically, you wanted to incorporate pipelining in the most advanced nuclear reactor, which you wanted to use for peaceful purposes. There were some old-fashioned parts that were not affected by pipelining at all.

(a) Show the differences between Sequential and Combinatorial logic units with examples. [1]

(b) Will pipelining affect the performance of both types of logic units? Justify your answer. [1]

(c) When you were about to celebrate the success, you suddenly found out that some instructions gave incorrect results even though you had checked and double-checked the code. What kinds of hazards may arise from pipelining? Describe briefly. [2]

(d) After tiresome debugging, some clever HCI-man found out that Data Hazards were causing lots of trouble for our reactor. Give one case, with proper figures, where such hazards can arise. [2]

(e) Now that you know what kind of hazards have occurred, you need to come up with prompt solutions. Our project manager always favors hardware solutions. But our deputy manager is from a CS background and prefers software solutions over anything else. Show two different solutions, one that you can present to each of them. [2]

(f) Finally, we have recovered from the hazardous situations. Now we want to give the ever-enthusiastic crowd an estimate of when our peaceful nuclear reactor will start working. For that we need to calculate the Average Memory Access Time for a CPU with a 1 ns clock cycle time, using the following data. [2]

| Parameter              | Value      |
|------------------------|------------|
| I-cache miss rate      | 1%         |
| D-cache miss rate      | 10%        |
| Miss penalty           | 200 cycles |
| Base CPI (ideal cache) | 2          |
| Load instructions      | 10%        |
| Store instructions     | 20%        |
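One way to turn data of this shape into numbers is sketched below. It rests on two assumptions that the table does not settle: that both loads and stores access the D-cache (30% of instructions in total), and that a cache hit costs 1 cycle, since no hit time is listed. The function names and the `HIT_TIME_CYCLES` constant are hypothetical.

```python
CYCLE_TIME_NS = 1.0       # clock cycle time from the question
MISS_PENALTY = 200        # cycles
BASE_CPI = 2              # CPI with an ideal cache
I_MISS_RATE = 0.01
D_MISS_RATE = 0.10
LOAD_FRACTION = 0.10
STORE_FRACTION = 0.20
HIT_TIME_CYCLES = 1       # ASSUMPTION: not given in the table

def amat_cycles(hit_time, miss_rate, miss_penalty):
    """Average memory access time = hit time + miss rate * miss penalty."""
    return hit_time + miss_rate * miss_penalty

def stall_cycles_per_instruction(i_miss, d_miss, penalty, data_access_fraction):
    """Memory stall cycles added to every instruction, on average."""
    instruction_stalls = i_miss * penalty                      # every instruction is fetched
    data_stalls = data_access_fraction * d_miss * penalty      # only loads/stores touch the D-cache
    return instruction_stalls + data_stalls

# ASSUMPTION: both loads and stores count as D-cache accesses.
data_fraction = LOAD_FRACTION + STORE_FRACTION

stalls = stall_cycles_per_instruction(I_MISS_RATE, D_MISS_RATE, MISS_PENALTY, data_fraction)
effective_cpi = BASE_CPI + stalls

i_amat_ns = amat_cycles(HIT_TIME_CYCLES, I_MISS_RATE, MISS_PENALTY) * CYCLE_TIME_NS
d_amat_ns = amat_cycles(HIT_TIME_CYCLES, D_MISS_RATE, MISS_PENALTY) * CYCLE_TIME_NS

print(round(effective_cpi, 3), round(i_amat_ns, 3), round(d_amat_ns, 3))
```

If stores are treated as non-blocking (for example, with a write buffer), `data_fraction` would shrink to the load fraction alone and the effective CPI would drop accordingly.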

5. As a proud computer architect of the Xavier Institute for Mutant Education and Outreach, you are given a special task: organize the memory of a computer that is needed to stop Magneto. There are only two constraints: i) it has to be cost-effective and ii) it has to perform better than the old hard disks of the Logan era. So, you decided to incorporate a Cache Memory to speed up performance in your hierarchical organization.

(a) Explain the principles that this Cache Memory can exploit for faster memory access. Provide necessary examples. [2]

(b) Professor X, the head of the institute, can only control 1024 blocks in the Main Memory. Each block contains 16 bytes. You have the budget to design a Cache Memory with only 64 entries. Draw a Direct Mapped Cache Memory with a 32-bit Cache Address Field. Clearly state the sizes of the Byte Offset, Index and Tag fields in the Cache Address. Calculate the amount of data that can be stored in the Cache Memory at a time. [2]

(c) Byte number 200 contains the Mind Stone information that is crucial to stopping Magneto's plan. In your designed Cache Memory, where would the Mind Stone be mapped?

(d) Now, with the help of the Mind Stone, Prof X learned about set associativity. He wants you to implement 2-way set associativity instead of Direct Mapping. Will it improve the overall performance? Justify your answer.

(e) Like many other mutant students, you did not bother to change the Cache Address field while implementing 2-way set associativity. What would be the amount of data that can be stored in the Cache Memory now?
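The address arithmetic behind parts (b), (c) and (e) can be sketched as follows. The helper names are hypothetical, and the treatment of part (e) is one possible reading: keeping the address fields unchanged is taken to mean the 6-bit index still selects one of 64 sets, each of which now holds two blocks.

```python
def log2_exact(n):
    """Return log2(n) for a power of two; reject anything else."""
    assert n > 0 and n & (n - 1) == 0, "size must be a power of two"
    return n.bit_length() - 1

def cache_fields(address_bits, block_bytes, num_sets):
    """Split an address into (offset, index, tag) field widths in bits."""
    offset_bits = log2_exact(block_bytes)
    index_bits = log2_exact(num_sets)
    tag_bits = address_bits - index_bits - offset_bits
    return offset_bits, index_bits, tag_bits

def locate(byte_address, block_bytes, num_sets):
    """Find where a byte address lands in a cache with num_sets sets."""
    block_number = byte_address // block_bytes
    return {
        "tag": block_number // num_sets,
        "index": block_number % num_sets,
        "offset": byte_address % block_bytes,
    }

ADDRESS_BITS = 32
BLOCK_BYTES = 16
ENTRIES = 64

# Direct mapped: one block per entry.
offset_bits, index_bits, tag_bits = cache_fields(ADDRESS_BITS, BLOCK_BYTES, ENTRIES)
direct_mapped_bytes = ENTRIES * BLOCK_BYTES

# Byte number 200 from part (c).
mind_stone = locate(200, BLOCK_BYTES, ENTRIES)

# ASSUMED reading of part (e): same 6-bit index, so 64 sets with 2 ways each.
two_way_bytes = ENTRIES * 2 * BLOCK_BYTES

print(offset_bits, index_bits, tag_bits, direct_mapped_bytes)
print(mind_stone, two_way_bytes)
```

Under the other common reading of 2-way associativity, where the total number of blocks stays at 64 and the set count halves to 32, the index would shrink to 5 bits and the data capacity would stay the same as in the direct-mapped design.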