

## United International University

Department of CSE

CSE 313: Computer Architecture

Final Examination, Summer 2022

Time: 2 Hours

Full Marks: 40

Any examinee found adopting unfair means will be expelled from the trimester/program as per UIU disciplinary rules.

[N.B.: Answer all the questions. Assume any data if it is not mentioned explicitly.]

1. a) Modify the block diagram for the single-cycle datapath so that it can execute the following instruction "X". Note that the instruction takes three operands and checks whether the value of the second register is less than the value of the third register. If so, it sets the value of the first register to 1. [5]

The instruction format is given as: X \$s0, \$s1, \$s2

[Figure: instruction format of X]

Also write down the control unit values for this instruction.

b) Modify the block diagram for the single-cycle datapath so that it can execute the following instruction "Y". The instruction calls a function: it first saves the return address in register 31, then jumps to the target address (the starting address of the called function).

The instruction format is given as: Y 1024

[Figure: instruction format of Y]

Also write down the control unit values for this instruction.
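
The intended behaviour of the two instructions can be sketched in software before touching the datapath. The following is a minimal illustration, not a datapath design. The function names `execute_x` and `execute_y` are hypothetical. It assumes that X clears the destination to 0 when the comparison fails (as MIPS `slt` does; the question does not say), that instructions are 4 bytes wide, and that the operand of Y is already a byte address.

```python
def execute_x(regs, rd, rs, rt):
    """X rd, rs, rt: set regs[rd] to 1 when regs[rs] < regs[rt].

    Assumption: rd is cleared to 0 otherwise, as MIPS slt does.
    """
    regs[rd] = 1 if regs[rs] < regs[rt] else 0


def execute_y(regs, pc, target):
    """Y target: save the return address in register 31, then jump.

    Assumptions: instructions are 4 bytes wide, so the return address is
    pc + 4, and `target` is already a byte address.
    """
    regs[31] = pc + 4
    return target  # value loaded into the program counter


regs = [0] * 32
regs[17], regs[18] = 3, 7                       # $s1 = 3, $s2 = 7
execute_x(regs, 16, 17, 18)                     # X $s0, $s1, $s2
new_pc = execute_y(regs, pc=400, target=1024)   # Y 1024 (pc value is arbitrary)
```

The sketch shows what the modified datapath has to provide: X writes a comparison result to a register, while Y needs a path that writes the return address into register 31 and a path that loads the jump target into the program counter.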

2. Consider a processor that goes through the following six stages while executing an instruction. The duration of each stage (in ps) is given underneath it:

| Instruction Fetch | Instruction Decode | Register Read | ALU Operation | Memory Access | Register Write |
|-------------------|--------------------|---------------|---------------|---------------|----------------|
| 250               | 50                 | 150           | 300           | 250           | 150            |



LOOP: beq \$s5, \$zero, EXIT
sll \$t0, \$t0, 2
lw \$s2, \$t0(\$s0)
add \$s1, \$s1, \$s2
addi \$s5, \$s5, -1
j LOOP
EXIT:

Consider the following code snippet.

add \$s0, \$s1, \$s2
lw \$s1, 16(\$s0)
sub \$s2, \$s4, \$s6
sw \$s7, 12(\$s0)
add \$s2, \$s7, \$s8



Now answer the following questions:

- b) If basic pipelining is implemented in your processor, calculate the execution time of the instruction snippet with a proper timing diagram. [2.5]
- c) Suggest a hardware change that you can implement in your processor to improve the execution time of the instruction snippet. What would be the execution time after this change? Include a timing diagram in your answer. [5]
- d) Explain how an optimizing compiler can improve the execution time of the instruction snippet further. In your answer, clearly show any changes made to the given instruction snippet.
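
The timing questions above depend on two things: the clock period implied by the stage durations, and the read-after-write dependences in the snippet. The sketch below computes both. It is a hedged starting point rather than a model answer. The stage abbreviations and function names are hypothetical, the snippet's registers are taken to be `$s0` through `$s8`, and the pipelined figure is a hazard-free baseline that ignores any stall cycles.

```python
# Stage durations in ps, taken from the table in the question.
STAGE_PS = {"IF": 250, "ID": 50, "RR": 150, "ALU": 300, "MEM": 250, "WB": 150}


def single_cycle_time(n_instr):
    """Each instruction takes the sum of all stage durations."""
    return n_instr * sum(STAGE_PS.values())


def ideal_pipelined_time(n_instr):
    """Hazard-free baseline: the clock period is the slowest stage."""
    clock = max(STAGE_PS.values())
    cycles = len(STAGE_PS) + n_instr - 1
    return cycles * clock


# (destination, sources) for each instruction of the snippet.
SNIPPET = [
    ("$s0", ("$s1", "$s2")),   # add $s0, $s1, $s2
    ("$s1", ("$s0",)),         # lw  $s1, 16($s0)
    ("$s2", ("$s4", "$s6")),   # sub $s2, $s4, $s6
    (None, ("$s7", "$s0")),    # sw  $s7, 12($s0)
    ("$s2", ("$s7", "$s8")),   # add $s2, $s7, $s8
]


def raw_dependences(instrs):
    """Return (producer, consumer, register) for each read-after-write pair."""
    deps = []
    for j, (_, sources) in enumerate(instrs):
        for src in sources:
            # Walk backwards to the most recent earlier writer of src.
            for i in range(j - 1, -1, -1):
                if instrs[i][0] == src:
                    deps.append((i, j, src))
                    break
    return deps


print(single_cycle_time(len(SNIPPET)))      # total ps without pipelining
print(ideal_pipelined_time(len(SNIPPET)))   # total ps, ignoring stalls
print(raw_dependences(SNIPPET))             # dependences that may cause stalls
```

Each dependence reported by `raw_dependences` is a candidate for stall cycles in the basic pipeline. Those stalls must be added to the hazard-free baseline, and they are what a hardware change or a compiler reordering would try to remove.
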
3.

- a) Consider a cache memory of size 8 KB with a block size of 8 words (1 word = 4 bytes). Determine the miss rate if the following byte addresses are accessed sequentially: 21, 64, 2058, 128, 2078. [5]
- b) If the block size in Q3(a) is changed to 4 words, find the miss rate for the same sequence of memory accesses, and explain the principle of locality. [5]
- c) Consider a 512 KB cache memory with 4 words per block. Determine the actual size of the cache. [2]
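
The address arithmetic behind these cache questions can be checked with a short simulation. The question does not state the mapping scheme, so the sketch below assumes a direct-mapped cache that starts empty. For the total-size calculation it further assumes 32-bit addresses and one valid bit per block. The function names are hypothetical, and all sizes must be powers of two.

```python
def miss_rate(addresses, cache_bytes, block_bytes):
    """Simulate a direct-mapped, initially empty cache (assumption)."""
    num_blocks = cache_bytes // block_bytes
    tags = [None] * num_blocks          # None marks an invalid entry
    misses = 0
    for addr in addresses:
        block = addr // block_bytes     # block address
        index = block % num_blocks      # cache line the block maps to
        tag = block // num_blocks
        if tags[index] != tag:
            misses += 1
            tags[index] = tag
    return misses / len(addresses)


def total_cache_bits(data_bytes, block_bytes, address_bits=32):
    """Data bits plus tag and valid bits, assuming 32-bit addresses."""
    num_blocks = data_bytes // block_bytes
    index_bits = num_blocks.bit_length() - 1     # log2 for powers of two
    offset_bits = block_bytes.bit_length() - 1
    tag_bits = address_bits - index_bits - offset_bits
    return num_blocks * (block_bytes * 8 + tag_bits + 1)


ACCESSES = [21, 64, 2058, 128, 2078]
print(miss_rate(ACCESSES, 8 * 1024, 8 * 4))      # 8-word blocks
print(miss_rate(ACCESSES, 8 * 1024, 4 * 4))      # 4-word blocks
print(total_cache_bits(512 * 1024, 4 * 4) / 8 / 1024, "KB")
```

Comparing the two `miss_rate` results shows whether addresses 2058 and 2078 fall in the same block, which is the spatial-locality effect part (b) asks about.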

4. Consider the following code. We will parallelize only the nested loop part.



- a) The time for a single arithmetic operation is 2 ns. Find the speedup in the runtime of the given code for a hexa-core processor versus a single-core processor for x=4, y=4, and z=4. [2]
- b) Now find the speedup for the same code using the values x=5, y=5, and z=5. [2]
- c) Compare the speedup obtained in part (a) with that in part (b). Is it strong scaling or weak scaling? Give a brief description. [2]
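
The speedup calculation can be sketched as a small function. The code listing for this question is not reproduced here, so the operation counts in the example call are purely hypothetical placeholders. The sketch also assumes that the parallel operations are divided evenly across the cores, with no communication or scheduling overhead.

```python
import math


def speedup(serial_ops, parallel_ops, cores, op_time_ns=2):
    """Single-core runtime divided by multi-core runtime.

    Assumptions: only `parallel_ops` are spread over the cores, the split
    is as even as possible, and there is no parallelization overhead.
    """
    t_single = (serial_ops + parallel_ops) * op_time_ns
    t_multi = (serial_ops + math.ceil(parallel_ops / cores)) * op_time_ns
    return t_single / t_multi


# Hypothetical counts for illustration only; derive the real ones from the
# code in the question for each choice of x, y, and z.
print(speedup(serial_ops=10, parallel_ops=64, cores=6))
```

Because of the `ceil`, a parallel operation count that is not a multiple of 6 leaves some cores idle in the last round. This is one reason the speedup can differ between the two problem sizes.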
