

**MID-TERM QUESTION SOLUTIONS** 

# COMPUTER ARCHITECTURE CSE 3313

**SOLUTION BY** 

MD. NURUL ALAM ADOR

**UPDATED TILL SPRING 2024** 

# Index

| Trimester   | Page |
|-------------|------|
| Spring 2024 | 3    |
| Fall 2023   | 9    |
| Summer 2023 | 15   |
| Spring 2023 | 20   |

# **Spring 2024**

1. Consider the following MIPS code:

```
1
     block:
          add $sp, $sp, -8
 2
          sw $ra, $zero($sp)
 3
 4
          sw $s0, 4($sp)
 5
          addi $v0, 0, 0
 6
 7
          addi $s0, $zero, 1
 8
 9
     loop:
          slti $t0, $a0, s0
bne $t0, $zero, end_loop
10
11
          add $v0, $v0, $s0
12
          addi $s0, $s0, 1
13
14
          jump loop
15
16
     end_loop:
          lw $ra, 4($sp)
17
18
          lw $s0, 8($sp)
19
          addi $sp, $sp, 12
20
          jr $ra
```

| Instruction | CPI    |
|-------------|--------|
| add         | 2      |
| addi        | 3      |
| sub         | 2      |
| lw          | 4<br>4 |
| SW          | 4      |
| sll         | 1      |
| srl         | 1      |
| slt         | 4      |
| slti        | 8      |
| beq         | 4      |
| bne         | 4      |
| j           | 8      |
| jr          | 4<br>4 |
| jal         | 4      |

Table 1: CPI for each instruction for program la

# [N.B. You should assume the value of \$a0 = 3]

1. a) Find the errors and fix them in the above code.

# **Solution:**

The code has been rewritten below with error correction:

```
block:
    addi $sp, $sp,-8
                                    # addi should use for constant operand
         $ra, 0($sp)
                                    # offset of sw should be constant
         $s0, 4($sp)
    SW
    addi $v0, $zero, 0
                                    # only last operand is constant in addi
    addi $s0, $zero, 1
loop:
         $t0, $a0, $s0
    slt
                                    # last operands of slti cannot be register
    bne $t0, $zero, end_loop
         $v0, $v0, $s0
    add
    addi $s0, $s0, 1
    j
         loop
                                    # jump is not a valid instruction
end_loop:
         $ra, 0($sp)
    lw
                                    # lw addresses doesn't match with sw
          $s0, 4($sp)
    lw
    addi $sp, $sp, 8
                                    # 8 was allocated for $sp
    jr
          $ra
```

1. b) As you get error-free code from 1a, **Write** the value of \$vo and \$s0 registers after executing the program.

# **Solution:**

The value of the following registers has been written below:

$$v0 = 6$$
  
 $s0 = 4$ 

1. c) Consider a CPU with a *4GHz* clock rate and CPI in the following table-1. **Calculate** the CPU time for executing the program 1a.

# **Solution:**

Given,

CPU Clock Rate = 
$$4 GHz$$
  
=  $4 \times 10^9 Hz$ 

Here,

| Instruction              | addi | SW | slt | bne | add | j | lw | jr |
|--------------------------|------|----|-----|-----|-----|---|----|----|
| Instuction<br>Count (IC) | 7    | 2  | 4   | 4   | 3   | 3 | 2  | 1  |
| CPI                      | 3    | 4  | 4   | 4   | 2   | 8 | 4  | 4  |

Now,

Clock Cycle = 
$$IC \times CPI$$
  
=  $7 \times 3 + 2 \times 4 + 4 \times 4 + 4 \times 4 + 3 \times 2 + 3 \times 8 + 2 \times 4 + 1 \times 4$   
=  $103$ 

We know,

CPU Time = Clock Cycle × Clock Time
$$= \frac{\text{Clock Cycle}}{\text{Clock Rate}}$$

$$= \frac{103}{4 \times 10^9}$$

$$= 2.575 \times 10^{-8} \text{ s}$$

1. d) Consider program la running on another computer that requires 160ns, with 40ns spent executing FP instructions, 90ns executed L/S instructions, and 30ns spent executing branch instructions. What is the improvement factor using Amdahl's law if we only improve the performance of L/S instructions using a better ALU to get the program completion time improved by 2x?

# **Solution:**

Here,

$$T_{affected} = 90 \text{ ns}$$
 [L/S instructions time]  
 $T_{uneffected} = (160 - 90) \text{ ns}$   
 $= 70 \text{ s}$   
 $T_{improved} = \frac{160}{2} \text{ ns}$   
 $= 80 \text{ ns}$ 

Improvement factor, n = ?

We know,  $T_{improved} = \frac{T_{affected}}{n} + T_{unaffected}$  or,  $80 = \frac{90}{n} + 70$  or,  $\frac{90}{n} = 10$  or,  $\frac{90}{10} = n$   $\therefore n = 9$ 

 Consider the following C function. Compiler will assign a (base address) into \$s0, b (base address) into \$s1, x into \$s2 and i into \$s3.

```
1
     int main() {
 2
          int x=1,i=32,a[10],b[10];
 3
 4
              if(a[2*i]==b[2*(i+1)]){
 5
                  x += a[2*i] - b[2*(i+1)];
 6
                  break;
 7
              else if(a[2*i] & b[2*(i+1)]){
 8
 9
                  i = i+x;
10
                  continue;
11
12
              else{
                i = i- 3;
13
14
15
              i = i+1;
          }while(i%4);
16
17
          return 0;
     }
18
```

# [N.B. You should assume the value of \$a0 = 3]

2. a) Convert the code to the corresponding MIPS assembly instructions.

# **Solution:**

```
main:
   addi $sp, $sp, -16
   sw $s0, 12($sp)
   sw $s1, 8($sp)
   sw $s2, 4($sp)
   sw $s3, 0($sp)
   addi $s2, $zero, 1
   addi $s3, $zero, 32

loop:
   sll $t0, $s3, 2
```

```
add
          $t0, $s0, $t0
          $t1, 0($t0)
    lw
    addi $t0, $s3, 1
    sll $t0, $t0, 2
add $t0, $s1, $t0
          $t2, 0($t0)
    lw
if:
          $t1, $t2, else_if
    bne
         $t0, $t1, $t2
    sub
    add $s2, $s2, $t0
          end
    j
else_if:
          $t0, $t1, $t2
$t0, $zero, else
$s3, $s3, $s2
    and
    beg
    add
    j
          loop
else:
    addi $s3, $s3, -3
    addi $s3, $s3, 1
and $t0, $s3, 3
          $t0, $zero, loop
    bne
end:
          $s3, 0($sp)
    lw
          $s2, 4($sp)
    lw
          $s1, 8($sp)
    lw
          $s0, 12($sp)
    lw
    addi $sp, $sp, 16
    jr $ra
```

2. b) Convert the first 8 lines of MIPS assembly instructions to the corresponding machine code after entering the loop body. No need to convert it to binary.

# **Solution:**

| 1. | ор | rs | rt | rd  | shamt | funct |
|----|----|----|----|-----|-------|-------|
|    | 0  | X  | 19 | 8   | 2     | 0     |
| 2. | ор | rs | rt | rd  | shamt | funct |
|    | 0  | 16 | 8  | 8   | X     | 32    |
| 3. | ор | rs | rt | C/A |       |       |
|    | 35 | 8  | 9  | 0   |       |       |
| 4. | op | rs | rt | C/A |       |       |
|    | 8  | 19 | 8  | 1   |       |       |
| 5. | ор | rs | rt | rd  | shamt | funct |
|    | 0  | X  | 8  | 8   | 2     | 0     |

| 6.  | ор | rs | rt | rd        | shamt | funct |
|-----|----|----|----|-----------|-------|-------|
|     | 0  | 17 | 8  | 8         | X     | 32    |
| 7.  | op | rs | rt | C/A       |       |       |
|     | 35 | 8  | 10 | 0         |       |       |
| 8.  | ор | rs | rt | C/A       |       |       |
|     | 5  | 10 | 9  | else_if/4 | 4     |       |
| 9.  | ор | rs | rt | rd        | shamt | funct |
|     | 0  | 9  | 10 | 8         | X     | 34    |
| 10. | ор | rs | rt | rd        | shamt | funct |
|     | 0  | 18 | 8  | 18        | Х     | 32    |

2. c) Assume that the processor has 64 registers. The size of MIPS instruction is 32 bits and 6 bits are reserved for opcode. The structure for addi instruction is given in the table-2. Find out the maximum constant value for addi instruction in MIPS. Note that, MIPS supports negative constants.

| opcode | rs | rt | constant |
|--------|----|----|----------|
|--------|----|----|----------|

Table 2: Structure of addi instruction

# **Solution:**

We know,

Structure of addi instruction,

| opcode | rs     | rt     | constant |
|--------|--------|--------|----------|
| 6 bits | 5 bits | 5 bits | 16 bits  |

We have 16 bits space for constants. Since MIPS supports negative constant, 1 bit used as sign bit and remaining 15 bits store constant value.

$$\therefore$$
 Maximum constant value =  $2^{15} - 1$   
= 32767

3. Using the division algorithm, **show** each step of the division of 15 by 4.

# **Solution:**

$$15 \div 4$$

0100 (DR)

Initially: A, Q: 0000 1111 
$$M = 0100$$
  $1011$   $M = 0100$   $-M$ : 1100  $-M = \frac{+1}{1100}$ 

Step 1: A, Q: 0001 111\_ A = A-MA, Q: 1101 111<u>0</u> = 0001+1100 A, Q: 0001 1110 = 1101 Step 2: A, Q: 0011 110\_ A = A-MA, Q: 1111 110<u>0</u> = 0011+1100 A, Q: 0011 1100 = 1111 A, Q: 0111 100\_ Step 3: A = A-M= 0111+1100 A, Q: 0011 100<u>1</u> = 0011 **Step 4:** A, Q: 0111 001\_ A = A-M**= 0111+1100** A, Q: 0011 001<u>1</u> = 0011

- ∴ Reminder, A = 0011 (3)
- $\therefore$  Quotient, Q = 0011 (3)

# **Fall 2023**

- a) Consider three different processors P1, P2, and P3 executing the same instruction set architecture (ISA). P1 has a 3GHz clock rate and a CPI of 1.5. P2 has a 2.5GHz clock rate and a CPI of 1.0. P3 has a 4.0GHz clock rate and has a CPI of 2.2.
  - i) Find out which processor performs better by calculating CPU time.
  - ii) If each processor's execute a program in 10 seconds, find the number of clock cycles and the number of instructions.

# **Solution:**

CPU Time 
$$_{Pl} = \frac{IC_{Pl} \times CPI_{Pl}}{Clock Rate_{Pl}}$$
$$= \frac{I \times 1.5}{3 \times 10^{9}}$$
$$= I \times 5 \times 10^{10} \text{ s}$$

For processor P2,

CPU Time 
$$_{P2} = \frac{IC_{P2} \times CPI_{P2}}{Clock Rate_{P2}}$$

$$= \frac{I \times 1}{2.5 \times 10^9}$$

$$= I \times 4 \times 10^{-10} \text{ s}$$

For processor P3,

CPU Time 
$$_{P3} = \frac{IC_{P3} \times CPI_{P3}}{Clock Rate_{P3}}$$
$$= \frac{I \times 2.2}{4 \times 10^9}$$
$$= I \times 5.5 \times 10^{10} s$$

Here,

CPI 
$$_{Pl} = 1.5$$
  
Clock Rate  $_{Pl} = 3 GHz$   
 $= 3 \times 10^9 Hz$ 

CPI 
$$_{P2} = 1$$
  
Clock Rate  $_{P2} = 2.5 GHz$   
 $= 2.5$   
 $\times 10^9 Hz$ 

$$\times 10^9 Hz$$

CPI  $_{P3} = 2.2$ 

Clock Rate  $_{P3} = 4 GHz$ 
 $= 4 \times 10^9 Hz$ 

IC  $_{P1} = IC$   $_{P2} = IC$   $_{P3} = I$ 

Since CPU Time  $p_1 > CPU$  Time  $p_2 > CPU$  Time  $p_3$ , : Processor P2 perform better.

ii) For processor P1,

Clock Cycle 
$$_{\text{Pl}}$$
 = CPU Time × Clock Rate  $_{\text{Pl}}$  =  $10 \times 3 \times 10^9$  =  $30 \times 10^9$  | Here,  $_{\text{CPl}}$   $_{\text{Pl}}$  =  $\frac{\text{Clock Cycle}}{\text{CPl}}$   $_{\text{Pl}}$  =  $\frac{30 \times 10^9}{1.5}$  =  $2 \times 10^{10}$  | CPU Time =  $10 \text{ s}$ 

For processor P2,

Clock Cycle 
$$_{P2}$$
 = CPU Time × Clock Rate  $_{P2}$   
=  $10 \times 2.5 \times 10^9$   
=  $25 \times 10^9$   
IC  $_{P2}$  =  $\frac{\text{Clock Cycle}_{P2}}{\text{CPI}_{P2}}$  =  $\frac{25 \times 10^9}{1}$  =  $2.5 \times 10^{10}$ 

For processor P3,

Clock Cycle 
$$_{P3}$$
 = CPU Time × Clock Rate  $_{P3}$   
=  $10 \times 4 \times 10^9$   
=  $40 \times 10^9$   
IC  $_{P3}$  =  $\frac{\text{Clock Cycle}_{P3}}{\text{CPl}_{P3}}$  =  $\frac{40 \times 10^9}{2.2}$  =  $1.8 \times 10^{10}$ 

- Consider a program running on a computer that requires 280ns, with 80ns spent executing FP instructions, 180ns executed L/S instructions, and 20ns spent executing branch instructions..
  - i) By how much is the total time reduced if the time for FP operations is reduced by 20?
  - ii) What is the improvement factor using Amdahl's law if we only improve the performance of L/S instructions using a better ALU to get the program completion time improved by 2x?

# **Solution:**

i) Here,

Previous total time =  $280 \ ns$ L/S instruction execution time =  $180 \ ns$ Branch instruction execution time =  $20 \ ns$ 

After reducing by 20 ns,

New FP instruction execution time = 
$$80 - 20 ns$$
  
=  $60 ns$ 

$$\therefore \text{ New total time} = 180 + 20 + 60 \text{ ns}$$
$$= 260 \text{ ns}$$

$$\therefore$$
 Total time reduced =  $280 - 260 ns$   
=  $20 ns$ 

ii) Here,

$$T_{affected} = 180 \ ns$$
 [L/S instructions time]  $T_{uneffected} = (280 - 180) \ ns$   $= 100 \ ns$   $T_{improved} = \frac{280}{2} \ ns$   $= 140 \ s$ 

Improvement factor, n = ?

We know,

$$T_{improved} = \frac{T_{affected}}{n} + T_{unaffected}$$
 or,  $140 = \frac{180}{n} + 100$  or,  $\frac{180}{n} = 40$ 

```
or, \frac{180}{40} = n

\therefore n = 4.5
```

2. Consider the following C function. Assume necessary registers.

```
1
          2000: int check_func(int a[],int x){
 2
                     if(x%2){
 3
                         a[x] = a[x]-x;
 4
                         return a[x];
 5
                     }
 6
 7
          3000: int get_sum(int a[],int b[],int k){
                     int sum = 0;
 8
 9
                     for(int i=0;i<k;i++){</pre>
10
                         if(check_func(a[i+1],i)){
                              b[i] = sum + (a[i]-b[i])/2;
11
12
                             sum = b[i];
                         }
13
                     }
14
15
                     return sum;
16
17
                int main() {
     PC->5000:
18
                     int res=0, n=10;
19
                     int a[n],b[n];
20
                     res = get_sum(a,b,n);
21
                     res = res*9;
22
                     return 0;
23
                }
```

2. a) Convert the code to the corresponding MIPS assembly instructions.

# **Solution:**

```
5000:
         addi $s0, $zero, 0
         addi $s1, $zero, 10
add $a0, $s2, $zero
  5004:
                                                Assuming,
  5008:
                                                res in $s0
  5012:
         add $a1, $s3, $zero
                                                n in $s1
         add $a2, $s1, $zero
  5016:
                                                a in $s2
         jal
  5020:
              get_sum 3000
                                                b in $s3
         add $s0, $v0, $zero
  5024:
              $t0, $s0, 3
  5028: sll
  5032:
         add $s0, $t0, $s0
         addi $v0, $zero, 0
  5036:
get_sum:
  3000:
         addi $sp, $sp, -8
                                                Assuming,
  3004: sw
              $s0, 4($sp)
                                                sum in $s0
  3008:
              $s1, 0($sp)
         SW
                                                i in $s1
         addi $s0, $zero, 0
  3012:
         addi $s1, $zero, 0
  3016:
loop:
```

```
slt
              $t0, $s1, $a2
  3020:
              $t0, $zero, exit_loop 3144
  3024:
         beg
         addi $t0, $s1, 1
  3028:
         sll
              $t0, $t0, 2
  3032:
              $t0, $a0, $t0
  3036:
         add
              $t1, 0($t0)
  3040:
        lw
  3044:
         addi $sp, $sp, −12
              $a0, 8($sp)
  3048:
         \mathsf{SW}
              $a1, 4($sp)
  3052:
         SW
  3056: sw
              $ra, 0($sp)
  3060:
             $a0, $t1, $zero
         add
         add $a1, $s1, $zero
  3064:
  3068:
         jal check_func 2000
  3072: lw
              $ra, 0($sp)
              $a1, 4($sp)
  3076: lw
              $a0, 8($sp)
  3080:
        lw
         addi $sp, $sp, 12
  3084:
              $v0, $zero, <del>inc_i</del> 3132
  3088:
         beq
              $t0, $s1, 2
  3092:
         sll
              $t1, $a0, $t0
  3096:
         add
  3100:
             $t2, $a1, $t0
         add
  3104: lw
              $t3, 0($t1)
              $t4, 0($t2)
  3108: lw
  3112: sub
              $t0, $t3, $t4
              $t0, $t0, 1
  3116:
         srl
         add $t0, $s0, $t0
  3120:
              $t0, 0($t2)
  3124: sw
  3128: lw
              $t0, 0($t2)
  3132: add $s0, $t0, $zero
inc_i:
  3136:
         addi $s1, $s1, 1
              <del>loop</del> 3020
  3140: j
exit_loop:
  3144: add $v0, $s0, $zero
              $s1, 0($sp)
  3148: lw
             $s0, 4($sp)
  3152: lw
         addi $sp, $sp, 8
  3156:
  3160:
        jr
              $ra
check_func:
  2000:
         andi $t0, $a1, 1
              $t0, $zero, exit_check 2036
  2004:
         beg
              $t0, $a1, 2
  2008:
         sll
  2012:
             $t1, $a0, $t0
         add
              $t2, 0($t1)
  2016: lw
              $t0, $t2, $a1
  2020: sub
  2024: sw
              $t0, 0($t1)
  2028: lw
              $t0, 0($t1)
         add $v0, $t0, $zero
  2032:
exit_check:
  2036:
         jr
              $ra
```

Convert the first 12 lines of get\_sum() function's assembly instructions to the corresponding machine code. No need to convert it to binary.

# **Solution:**

The code has been converted into MIPS assembly instructions:

- 1. op rs rt C/A

  8 29 29 -8

  2. op rs rt C/A
- 2. op rs rt C/A 43 29 16 4
- 3. op rs rt C/A 43 29 17 0
- 4. op rs rt C/A 8 0 16 0
- 5. op rs rt C/A 8 0 17 0
- 6. op rs rt rd shamt funct
  0 17 6 8 X 42
- 7. op rs rt C/A
  4 0 8 786
- 8. op rs rt C/A 8 17
- 9. op rs rt rd shamt funct
  0 X 8 8 2 0
- 10.
   op
   rs
   rt
   rd
   shamt
   funct

   0
   4
   8
   8
   X
   32
- 11. op rs rt C/A
  35 8 9 0
- 12. op rs rt C/A 8 29 29 -12

# 2. c) Monica claims that the **sll \$t0**, **\$s1**, **40** instruction is correct in the MIPS architecture, but Joey disagrees with Monica's claim. **Justify** your opinion with a proper explanation.

# **Solution:**

We know,

Structure of sll instruction,

| opcode | rs     | rt     | rd     | shamt  | funct  |
|--------|--------|--------|--------|--------|--------|
| 6 bits | 5 bits | 5 bits | 5 bits | 5 bits | 6 bits |

Here,

Maximum value for shamt =  $2^5 - 1 = 31$ 

Given instruction:

Since shamt (shift amount) of given instruction is greater than maximum value of shamt (40 > 31), therefore this is not a correct instructions for MIPS.

- ∴ Joey is correct.
- Using the optimized multiplication algorithm, show each step of the multiplication of 3. 12 by 3.

# **Solution:**

 $12 \times 3$ 

1100 (MN)

0011 (MR)

Initially: MN: 1100

P: 0000 0011

Step 1: P: 1100 0011

 $P_{LS} = P_{LS} + MN$ =0000+1100<u>0</u>110 0001 P:

= 1100

Step 2: P: 0010 0001  $P_{LS} = P_{LS} + MN$ 

= 0110 + 1100<u>1</u>001 0000 P:

= 0010 (Carry 1)

Step 3: <u>0</u>100 1000 P:

Step 4: P: <u>0</u>010 0100

: Product, P =  $0010\ 0100\ (36)$ 

# **Summer 2023**

1.

You are the most renowned computer architect in the world. Your friend asks you to find the answer to the following question for the given scenario: Consider computer A with a CPU speed of 2 GHz. The program P which has number of instruction and CPI as shown in Table 1.

| Instruction | IC | CPI |
|-------------|----|-----|
| addi        | 3  | 2   |
| add         | 2  | 1   |
| beq         | 1  | 4   |
| bne         | 0  | 8   |
| slt         | 1  | 4   |
| sll         | 1  | 4   |
| and         | 1  | 4   |
| j           | 1  | 8   |

Table 1: Program P's number of instructions and CPI in Computer A architecture

1. a) Find the execution time of program P in Computer A.

### **Solution:**

Given,  
CPU Clock Rate = 
$$2 GHz$$
  
=  $2 \times 10^9 Hz$ 

Now,

Clock Cycle = IC × CPI  
= 
$$3 \times 2 + 2 \times 1 + 1 \times 4 + 0 \times 8 + 1 \times 4 + 1 \times 4 + 1 \times 4 + 1 \times 8$$
  
=  $32$ 

We know,

CPU Time = Clock Cycles × Clock Time  
= 
$$\frac{32}{2 \times 10^9}$$
  
=  $16 \times 10^{-9} s$   
=  $16 ns$ 

- $\therefore$  Execution time program P in computer A is 16 ns.
- Computer B with a CPU speed of 10 GHz has the execution time of the program P, 2 times faster than computer A and same instruction set architecture (ISA), calculate the average CPI of computer B.

# **Solution:**

From 1(a), Clock Cycle 
$$_{\rm A}=32$$

Here,

$$\therefore \text{ Average CPI}_{\text{A}} = \frac{32}{10} = 3.2$$

For computer A,

CPU Time <sub>A</sub> = 
$$\frac{IC_A \times Average CPI_A}{Clock Rate_A}$$
$$= \frac{I \times 3.2}{2 \times 10^9}$$
$$= I \times 1.6 \times 10^{-9} s$$

For computer B,

CPU Time B = 
$$\frac{IC_A \times Average CPI_B}{Clock Rate_B}$$
$$= \frac{I \times Average CPI_B}{2 \times 10^9}$$
$$= I \times Average CPI_B \times 5 \times 10^{-10} s$$

According to question,

$$\frac{\text{CPU Time}_{\text{B}}}{\text{CPU Time}_{\text{B}}} = 2$$
or, 
$$\frac{I \times 1.6 \times 10^{-9}}{I \times \text{Average CPI}_{\text{B}} \times 5 \times 10^{-10}} = 2$$
or, 
$$\frac{1.6 \times 10^{-9}}{2 \times 5 \times 10^{-10}} = \text{Average CPI}_{\text{B}}$$

$$\therefore \text{Average CPI}_{\text{B}} = 1.6$$

: Average CPI of computer B is 1.6.

Here, Clock Rate  $_{A} = 2 GHz$  $= \frac{I \times \text{Average CPI}_{\text{B}}}{2 \times 10^{9}}$   $= I \times \text{Average CPI}_{\text{B}} \times 5 \times 10^{-10} \text{ s}$   $= 10 \times 10^{9} \text{ Hz}$   $= 10 \times 10^{9} \text{ Hz}$ 

c) As we get the total execution time of the given program for Computer A from 1. question (a). The Arithmetic unit takes 62.5%, the logical unit takes 25%, and the branch operation takes 12.5% time of total execution. Find the improvement factor of the given program if we replace the arithmetic unit with a better Arithmetic unit, which improves total completion time by 2x.

# **Solution:**

From 1(a),

Total execution time = 16 ns

 $\therefore$  Arithmetic unit time =  $(16 \times 62.5\%)$  ns

 $= 10 \, ns$ 

∴ Logical unit time =  $(16 \times 25\%)$  ns =4 ns

: Branch operation time =  $(16 \times 12.5\%)$  ns = 2 ns

Here.

 $T_{affected} = 10 ns$  [Arithmetic unit time]

$$T_{uneffected} = (16 - 10) ns$$

$$= 6 ns$$

$$T_{improved} = \frac{16}{2} ns$$

$$= 8 s$$
Improvement factor,  $n = ?$ 
We know,
$$T_{improved} = \frac{T_{affected}}{n} + T_{unaffected}$$
or,  $8 = \frac{10}{n} + 6$ 
or,  $\frac{10}{n} = 2$ 
or,  $\frac{10}{2} = n$ 

$$\therefore n = 5$$

2. Consider the following C function. Assume necessary registers.

```
1
     2000: int find_series_sum(int n,int x){
 2
                intres=0;
                if(n==0){
 3
 4
                    res = x;
 5
                else{
 6
                     for(int i=1;i<=n;i++){
 7
                         res = res + (x*5);
 8
 9
10
                }
11
                return res;
12
13
            int main(){
14
     1000:
                int s=10, x=2, r=0;
                r = find_serise_sum(s,x);
15
            }
16
```

2. a) Convert the code to the corresponding MIPS assembly instructions.

# **Solution:**

```
1000: addi $s0, $zero, 10
1004: addi $s1, $zero, 2
1008: addi $s2, $zero, 0
1012: add $a0, $s0, $zero
1016: add $a1, $s1, $zero
1020: jal find_series_sum
1024: add $s2, $v0, $zero

.
find_series_sum:
```

```
addi $sp, $sp, -8
sw $s0, 4($sp)
  2000:
  2004: sw
  2008: sw
               $s1, 0($sp)
                                                  Assuming,
  2012: addi $s0, $zero, 0
                                                  res in $s0
         bne $a0, $zero, else 2024
add $s0, $a1, $zero
  2016:
                                                  i in $s1
  2020:
else:
         addi $s1, $zero, 1
  2024:
loop:
                                                  Changing
  2028:
         slt $t0, $s1, $a0
         beq $t0, $zero, exit_loop 2056
  2032:
                                                  for(int i=1;i<=n;i++)</pre>
  2036: sll $t0, $a1, 2
2040: add $t0, $t0, $a1
                                                   for(int i=0;i<n;i++)</pre>
         add $s0, $s0, $t0
  2044:
         addi $s1, $s1, 1
  2048:
  2052: j
               <del>loop</del> 2028
exit_loop:
  2056: add $v0, $s0, $zero
               $s1, 0($sp)
  2060: lw
               $s0, 4($sp)
  2064: lw
  2068: addi $sp, $sp, 8
  2072: jr
               $ra
```

2. b) Convert the first 8 lines of your assembly instructions to the corresponding machine code. No need to convert it to binary.

# Solution:

| 1. | ор | rs  | rt   | C/A |       |       |
|----|----|-----|------|-----|-------|-------|
|    | 8  | 0   | 16   | 10  |       |       |
| 2. | ор | rs  | rt   | C/A | _     |       |
|    | 8  | 0   | 17   | 2   |       |       |
| 3. | ор | rs  | rt   | C/A |       |       |
|    | 8  | 0   | 18   | 0   |       |       |
| 4. | ор | rs  | rt   | rd  | shamt | funct |
|    | 0  | 16  | 0    | 4   | X     | 32    |
| 5. | ор | rs  | rt   | rd  | shamt | funct |
|    | 0  | 17  | 0    | 5   | X     | 32    |
| 6. | ор | add | ress |     |       |       |
|    | 3  | 50  | 00   |     |       |       |
| 7. | ор | rs  | rt   | rd  | shamt | funct |
|    | 0  | 2   | 0    | 18  | X     | 32    |
| 8. | ор | rs  | rt   | C/A |       |       |
|    | 8  | 29  | 29   | -8  |       |       |

2. c) Assume we have a new instruction type available in MIPS architecture which is S-type. Only load/store instructions can be executed using the S-type MIPS field. The structure of the Stype is given in table 2. Please find the maximum number of indexes that can be possible in an array. Explain your answer.

| ор     | rs     | rt     | C/A     |
|--------|--------|--------|---------|
| 8 bits | 8 bits | 8 bits | 40 bits |

Table 2: S-type format for MIPS

# **Solution:**

Given,

Structure of S-Type instruction,

| ор     | rs     | rt     | C/A     |
|--------|--------|--------|---------|
| 8 bits | 8 bits | 8 bits | 40 bits |

Here,

Maximum value for 
$$C/A = 2^{40} - 1$$
  
=  $1.09951163 \times 10^{12}$ 

This is the maximum value that we can store in the C/A. But it is not the maximum index number because we cannot use consecutive number for array index location since each location is size of 4 byte. Therefore if we divide maximum value with 4, then we will get maximum index for array.

: Maximum index for array = 
$$\frac{1.09951163 \times 10^{12}}{4}$$
  
= 2.74877907 × 10<sup>11</sup>

- $\therefore$  Maximum number of indexes that can be possible in an array is  $2.74877907 \times 10^{11}$ .
- **3.** a) Assuming 4-bit architecture and using the division algorithm show each step of the division of 13 by 5.

# **Solution:**

$$13 \div 5$$

1101 (DN)

0101 (DR)

Initially: A, Q: 0000 1101 
$$M = 0101$$
  $1010$   $M = 0101$   $1010$   $M = 0101$   $M = 011$   $M = 0101$   $M = 011$   $M = 011$ 

A, Q: 0011 0100

A, Q: 0011 0010

- $\therefore$  Reminder, A = 0011 (3)
- $\therefore$  Quotient, Q = 0010 (2)
- **3.** b) If we want to multiply 32 by 32 using the multiplication algorithm then what will be the minimum size of the product register?

= 1110

# **Solution:**

Here,

32

 $\frac{\times 32}{1024}$ 

Let, n bits size required for product register.

Now,

$$2^{n} - 1 = 1024$$
  
or,  $2^{n} = 1024 + 1$   
or,  $n = \log_{2} 1025$   
or,  $n = 10.001$   
 $\therefore n \approx 11$ 

: Minimum size of the product register should be 11 bits.

# **Spring 2023**

 Alternative compiled code sequences are using same instructions as add, sub and beq. A table is given to show the required number of cycles per instruction (CPI) and the instruction count (IC) on each code sequence.

Find out the number of clock cycles and average CPI for all the code sequences.

| Instuction       | add | sub | beq |
|------------------|-----|-----|-----|
| CPI              | 3   | 4   | 7   |
| Code Sequences 1 | 240 | 300 | 500 |
| Code Sequences 2 | 320 | 100 | 150 |

# **Solution:**

For Code Sequence 1,

Clock Cycle<sub>1</sub> = IC × CPI  
= 
$$3 \times 240 + 4 \times 300 + 7 \times 500$$
  
=  $5420$   
Total Instruction Count =  $240 + 300 + 500$   
=  $1040$   
 $\therefore$  Average CPI<sub>1</sub> =  $\frac{5420}{1040}$  = 5.21

For Code Sequence 2,

Clock Cycle 
$$_2$$
 = IC × CPI  
= 3 × 320 + 4 × 100 + 7 × 150  
= 2410  
Total Instruction Count = 320 + 100 + 150  
= 570  
 $\therefore$  Average CPI  $_2$  =  $\frac{2410}{570}$  = 4.23

1. b) Consider a computer running a program that requires 400 s, with 90 s spent executing FP instructions, 180 s executed L/S instructions, and 60 s spent executing branch instructions. Find out the affected and unaffected times for Amdahl's law. What is the improvement factor using Amdahl's law if we get the program completion time improved by 4x?

### **Solution:**

Here,  

$$T_{affected} = (90 + 180 + 60) ns$$
  
 $= 330 ns$   
 $T_{uneffected} = (400 - 330) ns$   
 $= 70 s$ 

$$T_{improved} = \frac{400}{4} ns$$

$$= 100 ns$$
Improvement factor,  $n = ?$ 
We know,
$$T_{improved} = \frac{T_{affected}}{n} + T_{unaffected}$$
or,  $100 = \frac{330}{n} + 70$ 
or,  $\frac{330}{n} = 30$ 
or,  $\frac{330}{30} = n$ 

$$\therefore n = 11$$

1. c) What is power wall? Discuss the necessity of multi-core processors.

# **Solution:**

The power wall refers to the physical limitations in increasing processor speed due to excessive power consumption and heat generation. The power wall is a point where increasing processor speed becomes impractical due to the insurmountable challenges of power consumption and cooling.

To overcome the power wall, the computer industry shifted its focus from increasing clock speeds to increasing the number of cores on a single chip. The power wall necessitated a shift in processor design, leading to the development of multi-core processors as a viable solution to overcome the limitations of single-core processors performance. Multi-core processors offer several advantages such as improved performance, enhanced energy efficiency, parallel processing and scalability.

Consider the following C function. The starting MIPS assembly instruction is 1000. Assume necessary registers.

```
int function(int n1, int n2) {
    int i,s=1;
    for(i=n1;i<n2;i++) {
        if(arr[i]<5) {
            arr[i]=arr[i]+(s*5);
            s=s+i;
        }
        else {
            s++;
        }
    }
    return s;
}</pre>
```

2. a) Convert the code to the corresponding MIPS assembly instructions.

# **Solution:**

The code has been rewritten below with the error correction:

```
function:
  1000:
         addi $sp, $sp, -8
              $s0, 4($sp)
  1004:
         SW
              $s1, 0($sp)
  1008:
         SW
  1012:
         addi $s0, $zero, 1
         add $s1, $a0, $zero
  1016:
loop :
  1020:
         slt
              $t0, $s1, $a1
  1024:
              $t0, $zero, exit_loop 1084
         beg
              $t0, $s1, 2
  1028:
         sll
  1032:
              $t1, $s2, $t0
         add
  1036: lw
              $t2, 0($t1)
         slti $t0, $t2, 5
beq $t0, $zero, else
  1040:
  1044:
 1048:
         sll
              $t0, $s0, 2
         add $t3, $t0, $s0
 1052:
         add $t0, $t2, $t3
  1056:
              $t0, 0($t1)
  1060:
         SW
  1064:
         add $s0, $s0, $s1
  1068: j <del>loop_end</del> 1076
else:
  1072:
         addi $s0, $s0, 1
loop_end:
  1076: add $s1, $s1, 1
              <del>loop</del> 1020
  1080:
         j
exit_loop:
  1084: add
              $v0, $s0, $zero
              $s1, 0($sp)
$s0, 4($sp)
  1088: lw
  1092: lw
         addi $sp, $sp, 8
  1096:
  1100:
         jr
              $ra
```

Assuming, n1 in \$a0 n2 in \$a1 s in \$s0 i in \$s1 arr[] in \$s2

funct 32

[ Assuming arr[] was declared outside of function before ]

2. b) Convert the first 10 lines of your assembly instructions to the corresponding machine code. No need to convert it to binary..

# **Solution:**

| 1. | ор | rs | rt | C/A |       |
|----|----|----|----|-----|-------|
|    | 8  | 29 | 29 | -8  |       |
| 2. | ор | rs | rt | C/A | 1     |
|    | 43 | 29 | 16 | 4   |       |
| 3. | ор | rs | rt | C/A | •     |
|    | 43 | 29 | 17 | 0   |       |
| 4. | ор | rs | rt | C/A | '     |
|    | 8  | 0  | 17 | 1   |       |
| 5. | ор | rs | rt | rd  | shamt |
|    | 0  | 4  | 0  | 17  | ×     |

| 6.  | ор | rs | rt | rd  | shamt | funct |
|-----|----|----|----|-----|-------|-------|
|     | 0  | 17 | 5  | 8   | X     | 42    |
| 7.  | op | rs | rt | C/A |       |       |
|     | 4  | 0  | 8  | 271 |       |       |
| 8.  | ор | rs | rt | rd  | shamt | funct |
|     | 0  | X  | 17 | 8   | 2     | 0     |
| 9.  | ор | rs | rt | rd  | shamt | funct |
|     | 0  | 18 | 8  | 9   | X     | 32    |
| 10. | ор | rs | rt | C/A |       |       |
|     | 35 | 9  | 10 | 0   |       |       |

2. c) Assume we have a new instruction type available in MIPS architecture which is K-type. Only jump instruction can be executed using the K-type MIPS field. Structure of the Ktype is given below. Please find the maximum jump address. Explain your answer.

| ор      | rs      | rt      | C/A     |
|---------|---------|---------|---------|
| 12 bits | 10 bits | 10 bits | 32 bits |

# **Solution:**

Given,

Structure of K-Type instruction,

| ор          | rs      | rt      | C/A     |
|-------------|---------|---------|---------|
| <br>12 bits | 10 bits | 10 bits | 32 bits |

Here,

Maximum value for 
$$C/A = 2^{32} - 1$$
  
= 4294967295

This is the maximum value that we can store in the C/A. But it is not the jump address because we store jump address in C/A by dividing it by 4. Therefore if we multiply maximum value with 4, then we will get maximum jump address.

: Maximum index for array = 
$$4294967295 \times 4$$
  
=  $17179869180$ 

- ∴ Maximum jump address is 17179869180.
- **3.** a) Assuming 4-bit architecture and using the division algorithm show each step of the division of 11 by 6.

# **Solution:**

$$11 \div 6$$

1011 (DN)

0110 (DR)

- $\therefore$  Reminder, A = 0101 (5)
- $\therefore$  Quotient, Q = 0001 (1)
- 3. b) Optimized multiplication is better than the normal multiplication algorithm. Why? Explain.

# **Solution:**

Given,

Structure of K-Type instruction,

| ор      | rs      | rt      | C/A     |
|---------|---------|---------|---------|
| 12 bits | 10 bits | 10 bits | 32 bits |

Here,

Maximum value for 
$$C/A = 2^{32} - 1$$
  
= 4294967295

This is the maximum value that we can store in the C/A. But it is not the jump address because we store jump address in C/A by dividing it by 4. Therefore if we multiply maximum value with 4, then we will get maximum jump address.

: Maximum index for array = 
$$4294967295 \times 4$$
  
=  $17179869180$ 

∴ Maximum jump address is 17179869180.