## Solution to Practice Final Examination

Instructor: Cholwich Nattee

## 1 Multi-Cycle Implementation

1. The control signals for the given sequence of instructions.

Cycle 1: Fetch rmmov %rax, 0x100(%rcx)

| nRegid | nValC  | regA  | regB  | regE  | regM  | aluA  | aluB  |
|--------|--------|-------|-------|-------|-------|-------|-------|
| 1      | 1      | XX    | XX    | 00    | 00    | XX    | X     |
| aluF   | setCnd | mAddr | mData | mRd   | mWrt  | newPC | vPWrt |
| XX     | 0      | X     | X     | X     | 0     | XX    | 1     |
| irWrt  | vAWrt  | vBWrt | vEWrt | vMWrt | pcWrt |       |       |
| 1      | 0      | 0     | 0     | 0     | 0     |       |       |

Cycle 2: Decode

| nRegid | nValC  | regA  | regB  | regE  | regM  | aluA  | aluB  |
|--------|--------|-------|-------|-------|-------|-------|-------|
| X      | X      | 10    | 10    | 00    | 00    | XX    | X     |
| aluF   | setCnd | mAddr | mData | mRd   | mWrt  | newPC | vPWrt |
| XX     | 0      | X     | X     | X     | 0     | XX    | 0     |
| irWrt  | vAWrt  | vBWrt | vEWrt | vMWrt | pcWrt |       |       |
| 0      | 1      | 1     | 0     | 0     | 0     |       |       |

Cycle 3: Execute

| nRegid | nValC  | regA  | regB  | regE  | regM  | aluA  | aluB  |
|--------|--------|-------|-------|-------|-------|-------|-------|
| X      | X      | XX    | XX    | 00    | 00    | 10    | 1     |
| aluF   | setCnd | mAddr | mData | mRd   | mWrt  | newPC | vPWrt |
| 00     | 0      | X     | X     | X     | 0     | XX    | 0     |
| irWrt  | vAWrt  | vBWrt | vEWrt | vMWrt | pcWrt |       |       |
| 0      | 0      | 0     | 1     | 0     | 0     |       |       |

Cycle 4: Memory

| nRegid | nValC  | regA  | regB  | regE  | regM  | aluA  | aluB  |
|--------|--------|-------|-------|-------|-------|-------|-------|
| X      | X      | XX    | XX    | 00    | 00    | XX    | X     |
| aluF   | setCnd | mAddr | mData | mRd   | mWrt  | newPC | vPWrt |
| XX     | 0      | 0     | 0     | 0     | 1     | XX    | 0     |
| irWrt  | vAWrt  | vBWrt | vEWrt | vMWrt | pcWrt |       |       |
| 0      | 0      | 0     | 0     | 0     | 0     |       |       |

Cycle 5: PC Update

| nRegid | nValC  | regA  | regB  | regE  | regM  | aluA  | aluB  |
|--------|--------|-------|-------|-------|-------|-------|-------|
| X      | X      | XX    | XX    | 00    | 00    | XX    | X     |
| aluF   | setCnd | mAddr | mData | mRd   | mWrt  | newPC | vPWrt |
| XX     | 0      | X     | X     | X     | 0     | 00    | 0     |
| irWrt  | vAWrt  | vBWrt | vEWrt | vMWrt | pcWrt |       |       |
| 0      | 0      | 0     | 0     | 0     | 1     |       |       |

Cycle 6: Fetch

| nRegid | nValC  | regA  | regB  | regE  | regM  | aluA  | aluB  |
|--------|--------|-------|-------|-------|-------|-------|-------|
| 1      | 0      | XX    | XX    | 00    | 00    | XX    | X     |
| aluF   | setCnd | mAddr | mData | mRd   | mWrt  | newPC | vPWrt |
| XX     | 0      | X     | X     | X     | 0     | XX    | 1     |
| irWrt  | vAWrt  | vBWrt | vEWrt | vMWrt | pcWrt |       |       |
| 1      | 0      | 0     | 0     | 0     | 0     |       |       |

Cycle 7: Decode

| nRegid | nValC  | regA  | regB  | regE  | regM  | aluA  | aluB  |
|--------|--------|-------|-------|-------|-------|-------|-------|
| X      | X      | 10    | 10    | 00    | 00    | XX    | X     |
| aluF   | setCnd | mAddr | mData | mRd   | mWrt  | newPC | vPWrt |
| XX     | 0      | X     | X     | X     | 0     | XX    | 0     |
| irWrt  | vAWrt  | vBWrt | vEWrt | vMWrt | pcWrt |       |       |
| 0      | 1      | 1     | 0     | 0     | 0     |       |       |

Cycle 8: Execute

| nRegid | nValC  | regA  | regB  | regE  | regM  | aluA  | aluB  |
|--------|--------|-------|-------|-------|-------|-------|-------|
| X      | X      | XX    | XX    | 00    | 00    | 11    | 1     |
| aluF   | setCnd | mAddr | mData | mRd   | mWrt  | newPC | vPWrt |
| 00     | 1      | X     | X     | X     | 0     | XX    | 0     |
| irWrt  | vAWrt  | vBWrt | vEWrt | vMWrt | pcWrt |       |       |
| 0      | 0      | 0     | 1     | 0     | 0     |       |       |

Cycle 9: Write Back

| nRegid | nValC  | regA  | regB  | regE  | regM  | aluA  | aluB  |
|--------|--------|-------|-------|-------|-------|-------|-------|
| X      | X      | XX    | XX    | 10    | 00    | XX    | X     |
| aluF   | setCnd | mAddr | mData | mRd   | mWrt  | newPC | vPWrt |
| XX     | 0      | X     | X     | X     | 0     | XX    | 0     |
| irWrt  | vAWrt  | vBWrt | vEWrt | vMWrt | pcWrt |       |       |
| 0      | 0      | 0     | 0     | 0     | 0     |       |       |

Cycle 10: PC Update

| nRegid | nValC  | regA  | regB  | regE  | regM  | aluA  | aluB  |
|--------|--------|-------|-------|-------|-------|-------|-------|
| X      | X      | XX    | XX    | 00    | 00    | XX    | X     |
| aluF   | setCnd | mAddr | mData | mRd   | mWrt  | newPC | vPWrt |
| XX     | 0      | X     | X     | X     | 0     | 00    | 0     |
| irWrt  | vAWrt  | vBWrt | vEWrt | vMWrt | pcWrt |       |       |
| 0      | 0      | 0     | 0     | 0     | 1     |       |       |

Cycle 11: Fetch push %rdx

| nRegid | nValC  | regA  | regB  | regE  | regM  | aluA  | aluB  |
|--------|--------|-------|-------|-------|-------|-------|-------|
| 1      | 0      | XX    | XX    | 00    | 00    | XX    | X     |
| aluF   | setCnd | mAddr | mData | mRd   | mWrt  | newPC | vPWrt |
| XX     | 0      | X     | X     | X     | 0     | XX    | 1     |
| irWrt  | vAWrt  | vBWrt | vEWrt | vMWrt | pcWrt |       |       |
| 1      | 0      | 0     | 0     | 0     | 0     |       |       |

Cycle 12: Decode

| nRegid | nValC  | regA  | regB  | regE  | regM  | aluA  | aluB  |
|--------|--------|-------|-------|-------|-------|-------|-------|
| X      | X      | 10    | 01    | 00    | 00    | XX    | X     |
| aluF   | setCnd | mAddr | mData | mRd   | mWrt  | newPC | vPWrt |
| XX     | 0      | X     | X     | X     | 0     | XX    | 0     |
| irWrt  | vAWrt  | vBWrt | vEWrt | vMWrt | pcWrt |       |       |
| 0      | 1      | 1     | 0     | 0     | 0     |       |       |

Cycle 13: Execute

| nRegid | nValC  | regA  | regB  | regE  | regM  | aluA  | aluB  |
|--------|--------|-------|-------|-------|-------|-------|-------|
| X      | X      | XX    | XX    | 00    | 00    | 00    | 1     |
| aluF   | setCnd | mAddr | mData | mRd   | mWrt  | newPC | vPWrt |
| 00     | 0      | X     | X     | X     | 0     | XX    | 0     |
| irWrt  | vAWrt  | vBWrt | vEWrt | vMWrt | pcWrt |       |       |
| 0      | 0      | 0     | 1     | 0     | 0     |       |       |

Cycle 14: Memory

| nRegid | nValC  | regA  | regB  | regE  | regM  | aluA  | aluB  |
|--------|--------|-------|-------|-------|-------|-------|-------|
| X      | X      | XX    | XX    | 00    | 00    | XX    | X     |
| aluF   | setCnd | mAddr | mData | mRd   | mWrt  | newPC | vPWrt |
| XX     | 0      | 0     | 0     | 0     | 1     | XX    | 0     |
| irWrt  | vAWrt  | vBWrt | vEWrt | vMWrt | pcWrt |       |       |
| 0      | 0      | 0     | 0     | 0     | 0     |       |       |

Cycle 15: Write Back

| nRegid | nValC  | regA  | regB  | regE  | regM  | aluA  | aluB  |
|--------|--------|-------|-------|-------|-------|-------|-------|
| X      | X      | XX    | XX    | 01    | 00    | XX    | X     |
| aluF   | setCnd | mAddr | mData | mRd   | mWrt  | newPC | vPWrt |
| XX     | 0      | X     | X     | X     | 0     | XX    | 0     |
| irWrt  | vAWrt  | vBWrt | vEWrt | vMWrt | pcWrt |       |       |
| 0      | 0      | 0     | 0     | 0     | 0     |       |       |

Cycle 16: PC Update

| nRegid | nValC  | regA  | regB  | regE  | regM  | aluA  | aluB  |
|--------|--------|-------|-------|-------|-------|-------|-------|
| X      | X      | XX    | XX    | 00    | 00    | XX    | X     |
| aluF   | setCnd | mAddr | mData | mRd   | mWrt  | newPC | vPWrt |
| XX     | 0      | X     | X     | X     | 0     | 00    | 0     |
| irWrt  | vAWrt  | vBWrt | vEWrt | vMWrt | pcWrt |       |       |
| 0      | 0      | 0     | 0     | 0     | 1     |       |       |

2. The control signals for imrmov V, rA (C|0|rA|F|V).

Cycle 1: Fetch

| nRegid | nValC  | regA  | regB  | regE  | regM  | aluA  | aluB  |
|--------|--------|-------|-------|-------|-------|-------|-------|
| 1      | 1      | XX    | XX    | 00    | 00    | XX    | X     |
| aluF   | setCnd | mAddr | mData | mRd   | mWrt  | newPC | vPWrt |
| XX     | 0      | X     | X     | X     | 0     | XX    | 1     |
| irWrt  | vAWrt  | vBWrt | vEWrt | vMWrt | pcWrt |       |       |
| 1      | 0      | 0     | 0     | 0     | 0     |       |       |

Cycle 2: Execute

| nRegid | nValC  | regA  | regB  | regE  | regM  | aluA  | aluB  |
|--------|--------|-------|-------|-------|-------|-------|-------|
| X      | X      | XX    | XX    | 00    | 00    | 10    | 0     |
| aluF   | setCnd | mAddr | mData | mRd   | mWrt  | newPC | vPWrt |
| 00     | 0      | X     | X     | X     | 0     | XX    | 0     |
| irWrt  | vAWrt  | vBWrt | vEWrt | vMWrt | pcWrt |       |       |
| 0      | 0      | 0     | 1     | 0     | 0     |       |       |

Cycle 3: Memory

| nRegid | nValC  | regA  | regB  | regE  | regM  | aluA  | aluB  |
|--------|--------|-------|-------|-------|-------|-------|-------|
| X      | X      | XX    | XX    | 00    | 00    | XX    | X     |
| aluF   | setCnd | mAddr | mData | mRd   | mWrt  | newPC | vPWrt |
| XX     | 0      | 0     | X     | 1     | 0     | XX    | 0     |
| irWrt  | vAWrt  | vBWrt | vEWrt | vMWrt | pcWrt |       |       |
| 0      | 0      | 0     | 0     | 0     | 0     |       |       |

Cycle 4: Write Back

| nRegid | nValC  | regA  | regB  | regE  | regM  | aluA  | aluB  |
|--------|--------|-------|-------|-------|-------|-------|-------|
| X      | X      | XX    | XX    | 00    | 10    | XX    | X     |
| aluF   | setCnd | mAddr | mData | mRd   | mWrt  | newPC | vPWrt |
| XX     | 0      | X     | X     | X     | 0     | XX    | 0     |
| irWrt  | vAWrt  | vBWrt | vEWrt | vMWrt | pcWrt |       |       |
| 0      | 0      | 0     | 0     | 0     | 0     |       |       |

Cycle 5: PC Update

| nRegid | nValC  | regA  | regB  | regE  | regM  | aluA  | aluB  |
|--------|--------|-------|-------|-------|-------|-------|-------|
| X      | X      | XX    | XX    | 00    | 00    | XX    | X     |
| aluF   | setCnd | mAddr | mData | mRd   | mWrt  | newPC | vPWrt |
| XX     | 0      | X     | X     | X     | 0     | 00    | 0     |
| irWrt  | vAWrt  | vBWrt | vEWrt | vMWrt | pcWrt |       |       |
| 0      | 0      | 0     | 0     | 0     | 1     |       |       |

## 2 Pipelining

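1. The sequence of instructions below is executed in the pipeline as shown in the table that follows it. The valA and valB columns give the forwarding source used in the decode stage, and X marks a stall cycle.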

```
1) rrmov %rax, %rbx
2) rrmov %rax, %rcx
3) rmmov %rbx, 4(%rdx)
4) mrmov 8(%rcx), %rdx
5) add %rdx, %rbx
6) irmov 5, %rax
7) add %rax, %rdx
```

| Inst | valA   | valB   | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
|------|--------|--------|---|---|---|---|---|---|---|---|---|----|----|----|
| 1    |        |        | F | D | E | M | W |   |   |   |   |    |    |    |
| 2    |        |        |   | F | D | E | M | W |   |   |   |    |    |    |
| 3    | M_valE |        |   |   | F | D | E | M | W |   |   |    |    |    |
| 4    |        | M_valE |   |   |   | F | D | E | M | W |   |    |    |    |
| 5    | m_valM |        |   |   |   |   | F | X | D | E | M | W  |    |    |
| 6    |        |        |   |   |   |   |   |   | F | D | E | M  | W  |    |
| 7    | e_valE |        |   |   |   |   |   |   |   | F | D | E  | M  | W  |

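2. Program 1 (the first listing below) runs faster than Program 2 (the second listing). The loop in Program 1 is arranged so that the condition for jge is *true* only in the last iteration, so the branch falls through on every other iteration. This matches the *Assume Branch Not Taken* technique for handling control hazards: the prediction is wrong only once. In Program 2, jl is taken on every iteration except the last, so it is mispredicted on almost every iteration.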

```
irmov $10, %rsi
irmov $0, %rax
irmov $1, %rdi
L: cmp %rsi, %rax
jge E
add %rdi, %rax
jmp L
E: hlt
```

```
irmov $10, %rsi
irmov $0, %rax
irmov $1, %rdi
L: add %rdi, %rax
cmp %rsi, %rax
jl L
hlt
```

## 3 Cache Memory

1. The hit rate is 75% when the block size is 16 bytes (4 integers), since one access in every four misses: $\frac{1000}{4} = 250$ misses in 1000 accesses.
2. The hit rate is 93.7% when the block size is 64 bytes (16 integers), since there are $\lceil \frac{1000}{16} \rceil = 63$ misses in 1000 accesses.

## 4 Vector Processing

```
#include <stdio.h>
#include <immintrin.h>

#define ALIGN __attribute__ ((aligned (32)))

int main() {
    int i, n;
    double ALIGN A[] = {1, 2, 3, 4};   /* the next four terms to add */
    double ALIGN B[] = {4, 4, 4, 4};   /* step between groups of four terms */
    double ALIGN S[] = {0, 0, 0, 0};   /* four partial sums */
    double sum;
    __m256d a, b, s;

    printf("Enter n: ");
    scanf("%d", &n);

    a = _mm256_load_pd(A);
    b = _mm256_load_pd(B);
    s = _mm256_load_pd(S);

    /* Add the terms 1..(n/4)*4, four at a time. */
    for (i = 0; i < n/4; i++) {
        s = _mm256_add_pd(s, a);
        a = _mm256_add_pd(a, b);
    }

    /* Horizontal sum: swap the 128-bit halves and add,
       then swap the two elements in each half and add. */
    s = _mm256_add_pd(s, _mm256_permute2f128_pd(s, s, 1));
    s = _mm256_add_pd(s, _mm256_permute_pd(s, 5));
    _mm256_store_pd(S, s);
    sum = S[0];

    /* Add the remaining terms when n is not a multiple of 4. */
    for (i = n/4*4 + 1; i <= n; i++) {
        sum += i;
    }

    printf("Sum = %.01f\n", sum);
    return 0;
}
```
```