# Architetture dei Sistemi di Elaborazione 02GOLOV

Delivery date: October 18th 2023

Expected delivery of lab\_01.zip including:

- program\_0.s
- lab\_01.pdf (fill and export this file to pdf)

Please configure the winMIPS64 processor architecture with the *Base Configuration* given below:

• Integer ALU: 1 clock cycle

• Data memory: 1 clock cycle

• Branch delay slot: 1 clock cycle

• Code address bus: 12

• Data address bus: 12

• Pipelined FP arithmetic unit (latency): 6 stages

• Pipelined FP multiplier unit (latency): 8 stages

• FP divider unit (latency): not pipelined unit, 28 clock cycles

• Forwarding optimization is disabled

• Branch prediction is disabled

• Branch delay slot optimization is disabled.

#### Use the Configure menu:

- Run the *WinMIPS* simulator by launching the graphical user interface: (folder\_to\_simulator)...\winMIPS64\winmips64.exe
- Disable <u>ALL</u> the optimizations (a mark appears when they are enabled)
- Browse the Architecture menu (Ctrl-A)



- Modify the default architectural parameters (where needed)



- Verify in the Pipeline window that the configuration is effective (usually the bottom-left window)



#### 1) Exercise your assembly skills.

Write and run an assembly program called **program\_0.s** (to be delivered) for the *MIPS64* architecture.

#### The program must:

- 1. Given three arrays of 9 8-bit integer numbers (v1, v2, v3), check **for each one of them** whether its content is a **palindrome** sequence of numbers. Store the results in three 8-bit unsigned variables (flags): each flag is 1 if the corresponding sequence is a palindrome, 0 otherwise.
- 2. Only for the palindrome arrays, compute the element-by-element sum and place the result in another array v4 (e.g. v4[i] = v2[i] + v3[i], supposing that only v2 and v3 are palindromes).

Example of vector sequences, each containing 9 numbers:

```
v1:     .byte   2, 6, -3, 11, 9, 11, -3, 6, 2
v2:     .byte   4, 7, -10, 3, 11, 9, 7, 6, 4
v3:     .byte   9, 22, 5, -1, 9, -1, 5, 22, 9
flag1:  .space  1
flag2:  .space  1
flag3:  .space  1
v4:     .space  9
```
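To cross-check the values that program\_0.s leaves in the Data window, a small host-side reference model can help. The sketch below is written in Python, not MIPS64 assembly, and is not part of the delivery. It assumes that the three data rows of the example belong to v1, v2 and v3 in that order, that v4 is the element-wise sum of every array flagged as a palindrome (all zeros if none is), and that results are stored as signed 8-bit values. The function names `to_int8`, `is_palindrome` and `reference_model` are hypothetical.

```python
def to_int8(value: int) -> int:
    """Wrap an integer to the signed 8-bit range, as a byte store would."""
    return (value + 128) % 256 - 128


def is_palindrome(seq: list[int]) -> bool:
    """Compare mirrored positions, moving from both ends towards the middle."""
    left, right = 0, len(seq) - 1
    while left < right:
        if seq[left] != seq[right]:
            return False
        left += 1
        right -= 1
    return True


def reference_model(vectors: list[list[int]]) -> tuple[list[int], list[int]]:
    """Return (flags, v4): one 0/1 flag per vector and the element-wise
    sum of the palindrome vectors only (assumption: zeros if none)."""
    flags = [1 if is_palindrome(v) else 0 for v in vectors]
    v4 = [0] * len(vectors[0])
    for flag, vec in zip(flags, vectors):
        if flag:
            v4 = [to_int8(acc + x) for acc, x in zip(v4, vec)]
    return flags, v4


# Example data from the handout (row-to-vector mapping is an assumption).
v1 = [2, 6, -3, 11, 9, 11, -3, 6, 2]
v2 = [4, 7, -10, 3, 11, 9, 7, 6, 4]
v3 = [9, 22, 5, -1, 9, -1, 5, 22, 9]

flags, v4 = reference_model([v1, v2, v3])
print(flags)  # [1, 0, 1]
print(v4)     # [11, 28, 2, 10, 18, 10, 2, 28, 11]
```

With this data only v1 and v3 are palindromes, so under the stated assumptions v4[i] = v1[i] + v3[i].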

#### 2) Use the *WinMIPS* simulator.

Identify and use the main components of the simulator:

- a. Running the WinMIPS simulator
  - Launch the graphic interface ...\winMIPS64\winmips64.exe
- b. Load your program in the simulator:
  - Load the program from the **File Open** menu (*CTRL-O*). In case of errors, you may run the following command from the command line to assemble the program and check the errors:
  - ...\winMIPS64\asm program\_0.s
- c. Run your program step by step (F7), identifying the whole processor behavior in the six simulator windows: Pipeline, Code, Data, Register, Cycles and Statistics
- d. Collect the clock cycles to fill the following table (fill all required data in the table before exporting this file to pdf format to be delivered).

Table 1: Program performance for the specific processor configurations

| Program   | Clock cycles | Number of<br>Instructions | Clocks per instruction (CPI) | Instructions<br>per Clock<br>(IPC) |
|-----------|--------------|---------------------------|------------------------------|------------------------------------|
| program_0 | 328          | 188                       | 1.745                        | 0.573                              |
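The two derived columns of Table 1 follow directly from the cycle and instruction counts: CPI is cycles divided by instructions, and IPC is its reciprocal. A minimal sketch of that arithmetic, using the counts reported in the table (the helper name `cpi_and_ipc` is hypothetical):

```python
def cpi_and_ipc(clock_cycles: int, instructions: int) -> tuple[float, float]:
    """Return (CPI, IPC) for one run, given the simulator's two counts."""
    if clock_cycles <= 0 or instructions <= 0:
        raise ValueError("counts must be positive")
    cpi = clock_cycles / instructions   # clocks per instruction
    ipc = instructions / clock_cycles   # instructions per clock
    return cpi, ipc


cpi, ipc = cpi_and_ipc(328, 188)  # values reported in Table 1
print(f"CPI = {cpi:.3f}, IPC = {ipc:.3f}")  # CPI = 1.745, IPC = 0.573
```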

#### 3) Perform execution and time measurements.

Measure the processor performance by running a benchmark of programs. Change the weights of the programs as indicated in the following to evaluate how these variations may produce different performance results.

Find the following benchmark programs in the winMIPS64 folder:

- a. testio.s
- b. mult.s
- c. series.s
- d. program\_0.s (your program)

Starting from the basic configuration with no optimizations, compute by simulation the number of cycles required to execute these programs; in this initial scenario, assume that all programs have the same weight (25%). Assume a processor frequency of 1.75 kHz (a very old technology node).
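Converting a cycle count into execution time only needs the clock frequency: time = cycles / frequency. The sketch below applies this at 1.75 kHz to the 328 cycles reported for program\_0 in Table 1; the names `CLOCK_HZ` and `execution_time_ms` are hypothetical.

```python
CLOCK_HZ = 1750.0  # 1.75 kHz, the frequency stated in the text


def execution_time_ms(clock_cycles: int, clock_hz: float = CLOCK_HZ) -> float:
    """Execution time in milliseconds for a given number of clock cycles."""
    if clock_hz <= 0:
        raise ValueError("clock frequency must be positive")
    return clock_cycles / clock_hz * 1000.0


print(f"{execution_time_ms(328):.1f} ms")  # 187.4 ms
```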

Then, change the processor configuration and vary the programs' weights as follows. Compute the performance again for every case and fill the table below (fill all required data in the table before exporting this file to pdf format to be delivered):

- 1) Configuration 1
  - a. Enable Forwarding
  - b. Disable branch target buffer
  - c. Disable Delay Slot

Assume that the weight of all programs is the same (25%).

- 2) Configuration 2
  - a. Enable Forwarding
  - b. Enable branch target buffer
  - c. Disable Delay Slot

Assume that the weight of all programs is the same (25%).

- 3) Configuration 3

Configuration 1, but assume that the weight of your program (program\_0.s) is 43.33%.

- 4) Configuration 4

Configuration 1, but assume that the weight of the program series.s is 60%.

Table 2: Processor performance for different weighted programs

| Program                   | No opt | Conf. 1 | Conf. 2 | Conf. 3 | Conf. 4 |
|---------------------------|--------|---------|---------|---------|---------|
| testio.s                  | 422ms  | 272ms   | 247ms   | 272ms   | 272ms   |
| mult.s                    | 1074ms | 560ms   | 526ms   | 560ms   | 560ms   |
| series.s                  | 314ms  | 133ms   | 134ms   | 133ms   | 133ms   |
| program_0.s               | 187ms  | 149ms   | 146ms   | 149ms   | 149ms   |
| TOTAL Time<br>(@ 1.75kHz) | 499ms  | 278.5ms | 263ms   | 247ms   | 210.6ms |
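The TOTAL row is a weighted sum of the per-program times. The text does not say how the remaining weight is distributed in Configurations 3 and 4; the sketch below assumes it is split equally among the other three programs, which reproduces the totals reported in Table 2. The helper names `weighted_total` and `weights_with_focus` are hypothetical.

```python
def weighted_total(times_ms: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted sum of per-program execution times (weights must sum to 1)."""
    if abs(sum(weights.values()) - 1.0) > 1e-6:
        raise ValueError("weights must sum to 1")
    return sum(weights[name] * times_ms[name] for name in times_ms)


def weights_with_focus(names: list[str], focus: str, focus_weight: float) -> dict[str, float]:
    """Give `focus` the stated weight and split the rest equally
    among the other programs (an assumption, not stated in the text)."""
    rest = (1.0 - focus_weight) / (len(names) - 1)
    return {n: (focus_weight if n == focus else rest) for n in names}


# Per-program times for Configuration 1, taken from Table 2 (in ms).
conf1 = {"testio.s": 272, "mult.s": 560, "series.s": 133, "program_0.s": 149}
names = list(conf1)

equal = {n: 0.25 for n in names}
conf3 = weights_with_focus(names, "program_0.s", 0.4333)
conf4 = weights_with_focus(names, "series.s", 0.60)

print(weighted_total(conf1, equal))            # Conf. 1
print(round(weighted_total(conf1, conf3), 1))  # Conf. 3
print(round(weighted_total(conf1, conf4), 1))  # Conf. 4
```

Configurations 3 and 4 reuse the Configuration 1 times because only the weights change, not the processor settings.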

# Appendix: winMIPS64 Instruction Set

The following assembler directives are supported:

| Directive            | Description                            |
|----------------------|----------------------------------------|
| .data                | start of data segment                  |
| .text                | start of code segment                  |
| .code                | start of code segment (same as .text)  |
| .org \<n\>           | start address                          |
| .space \<n\>         | leave n empty bytes                    |
| .asciiz \<s\>        | enters zero terminated ascii string    |
| .ascii \<s\>         | enter ascii string                     |
| .align \<n\>         | align to n-byte boundary               |
| .word \<n1\>,\<n2\>   | enters word(s) of data (64-bits)       |
| .byte \<n1\>,\<n2\>   | enter bytes                            |
| .word32 \<n1\>,\<n2\> | enters 32 bit number(s)                |
| .word16 \<n1\>,\<n2\> | enters 16 bit number(s)                |
| .double \<n1\>,\<n2\> | enters floating-point number(s)        |

where \<n\> denotes a number like 24, \<s\> denotes a string like "fred", and \<n1\>,\<n2\> denotes numbers separated by commas.

The following instructions are supported:

| Instruction | Description                                         |
|-------------|-----------------------------------------------------|
| lb          | load byte                                           |
| lbu         | load byte unsigned                                  |
| sb          | store byte                                          |
| lh          | load 16-bit half-word                               |
| lhu         | load 16-bit half-word unsigned                      |
| sh          | store 16-bit half-word                              |
| lw          | load 32-bit word                                    |
| lwu         | load 32-bit word unsigned                           |
| sw          | store 32-bit word                                   |
| ld          | load 64-bit double-word                             |
| sd          | store 64-bit double-word                            |
| l.d         | load 64-bit floating-point                          |
| s.d         | store 64-bit floating-point                         |
| halt        | stops the program                                   |
| daddi       | add immediate                                       |
| daddui      | add immediate unsigned                              |
| andi        | logical and immediate                               |
| ori         | logical or immediate                                |
| xori        | exclusive or immediate                              |
| lui         | load upper half of register immediate               |
| slti        | set if less than immediate                          |
| sltiu       | set if less than immediate unsigned                 |
| beq         | branch if pair of registers are equal               |
| bne         | branch if pair of registers are not equal           |
| beqz        | branch if register is equal to zero                 |
| bnez        | branch if register is not equal to zero             |
| j           | jump to address                                     |
| jr          | jump to address in register                         |
| jal         | jump and link to address (call subroutine)          |
| jalr        | jump and link to address in register (call subroutine) |
| dsll        | shift left logical                                  |
| dsrl        | shift right logical                                 |
| dsra        | shift right arithmetic                              |
| dsllv       | shift left logical by variable amount               |
| dsrlv       | shift right logical by variable amount              |
| dsrav       | shift right arithmetic by variable amount           |
| movn        | move if register not equal to zero                  |
| nop         | no operation                                        |
| and         | logical and                                         |
| or          | logical or                                          |
| xor         | logical xor                                         |
| slt         | set if less than                                    |
| sltu        | set if less than unsigned                           |
| dadd        | add integers                                        |
| daddu       | add integers unsigned                               |
| dsub        | subtract integers                                   |
| dsubu       | subtract integers unsigned                          |
| add.d       | add floating-point                                  |
| sub.d       | subtract floating-point                             |
| mul.d       | multiply floating-point                             |
| div.d       | divide floating-point                               |
| mov.d       | move floating-point                                 |
| cvt.d.l     | convert 64-bit integer to a double FP format        |
| cvt.l.d     | convert double FP to a 64-bit integer format        |
| c.lt.d      | set FP flag if less than                            |
| c.le.d      | set FP flag if less than or equal to                |
| c.eq.d      | set FP flag if equal to                             |
| bc1f        | branch to address if FP flag is FALSE               |
| bc1t        | branch to address if FP flag is TRUE                |
| mtc1        | move data from integer register to FP register      |
| mfc1        | move data from FP register to integer register      |