

# THE IMAGINATION UNIVERSITY PROGRAMME

# RVfpgaEL2 Lab 18 Adding New Instructions to the VeeR EL2 core



# 1. Adding Instructions fadd.s, fmul.s and fdiv.s to VeeR EL2

# A. Introduction

In this lab, you will apply the knowledge acquired in previous labs to modify the VeeR EL2 processor to add three floating-point instructions that belong to the RISC-V Single-Precision Floating-Point Zfinx extension: fadd.s, fmul.s and fdiv.s. At <a href="https://wiki.riscv.org/display/HOME/Zfinx+TG">https://wiki.riscv.org/display/HOME/Zfinx+TG</a> and at <a href="https://github.com/riscv/riscv-zfinx">https://github.com/riscv/riscv-zfinx</a>, you can find more information about this extension, which in many aspects is similar to the RISC-V Single-Precision Floating-Point F extension, for which you can find details at <a href="https://five-embeddev.com/riscv-isa-manual/latest/f.html">https://five-embeddev.com/riscv-isa-manual/latest/f.html</a>.

The Zfinx extension provides single-precision instructions similar to those in the standard floating-point F extension, but they operate on the **x** (integer) registers instead of the **f** (floating-point) registers. This simplifies the implementation, because the VeeR EL2 processor does not include a Floating-Point Register File and adding one would be more complex.

Given the complexity of adding new instructions, we guide you through the process. Once you've learned how to add new instructions to the core, you can practice by adding other instructions from this same RISC-V extension or from any other RISC-V extension.

# B. Floating point instructions fadd, fmul and fdiv

As stated by the Zfinx specification, the variants of these F-extension instructions have the same semantics, except that whenever such an instruction would have accessed an **f** register, it instead accesses the **x** register with the same number.

We next summarize some features of these three RISC-V instructions:

- fadd.s
  - Instruction fadd.s rd, rs1, rs2 adds the two floating-point values in rs1 and rs2 and stores the result in rd.
  - The instruction format, as defined in the RISC-V F extension, is the following: 0000000 | rs2 | rs1 | Rounding-Mode | rd | 1010011
- fmul.s
  - Instruction fmul.s rd, rs1, rs2 multiplies the two floating-point values in rs1 and rs2 and stores the result in rd.
  - The instruction format, as defined in the RISC-V F extension, is the following: 0001000 | rs2 | rs1 | Rounding-Mode | rd | 1010011
- fdiv.s
  - Instruction fdiv.s rd, rs1, rs2 divides the floating-point value in rs1 by the floating-point value in rs2 and stores the result in rd.
  - The instruction format, as defined in the RISC-V F extension, is the following: 0001100 | rs2 | rs1 | Rounding-Mode | rd | 1010011
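The three formats differ only in their upper seven bits. As a quick check of an encoding, the following Python sketch packs the fields into a 32-bit word. The function name encode\_fp\_rtype and the FUNCT7 table are illustrative names chosen here; they are not part of any tool used in this lab.

```python
# Illustrative helper (hypothetical name): pack the fields of an R-type
# floating-point instruction into a 32-bit word.
OPCODE_OP_FP = 0b1010011  # opcode shared by fadd.s, fmul.s and fdiv.s

# Upper seven bits of each instruction, taken from the formats listed above.
FUNCT7 = {
    "fadd.s": 0b0000000,
    "fmul.s": 0b0001000,
    "fdiv.s": 0b0001100,
}

def encode_fp_rtype(mnemonic, rd, rs1, rs2, rm=0):
    """Return funct7 | rs2 | rs1 | rm | rd | opcode as an integer."""
    for reg in (rd, rs1, rs2):
        if not 0 <= reg <= 31:
            raise ValueError("register numbers must be in 0..31")
    if not 0 <= rm <= 7:
        raise ValueError("rounding mode must fit in 3 bits")
    return ((FUNCT7[mnemonic] << 25) | (rs2 << 20) | (rs1 << 15)
            | (rm << 12) | (rd << 7) | OPCODE_OP_FP)

# x30 = x29 + x28 with rounding mode 000
print(hex(encode_fp_rtype("fadd.s", rd=30, rs1=29, rs2=28)))  # 0x1ce8f53
```

Words produced this way can be placed in a program with the .word directive when the toolchain does not assemble the mnemonics directly.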

Floating-point instructions assume that the operands are represented in single-precision IEEE 754 floating-point format (see https://people.eecs.berkeley.edu/~wkahan/ieee754status/IEEE754.PDF). To represent floating-point numbers, the register is logically divided into three fields: **Sign** (1 bit long), **Exponent** (E<sub>7:0</sub>, 8 bits) and **Mantissa** (M<sub>22:0</sub>, 23 bits).

```
Sign | E7 ... E0 | M22 ... M0
```
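To make the field layout concrete, the following Python sketch splits a 32-bit register value into the three fields and reinterprets it as a single-precision number. The helper names split\_fields and bits\_to\_float are illustrative only.

```python
import struct

def split_fields(bits):
    """Split a 32-bit pattern into (sign, exponent, mantissa)."""
    sign = (bits >> 31) & 0x1        # bit 31
    exponent = (bits >> 23) & 0xFF   # bits 30:23, biased by 127
    mantissa = bits & 0x7FFFFF       # bits 22:0
    return sign, exponent, mantissa

def bits_to_float(bits):
    """Reinterpret a 32-bit pattern as a single-precision value."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]

sign, exponent, mantissa = split_fields(0x40800000)
print(sign, exponent, mantissa)   # 0 129 0
print(bits_to_float(0x40800000))  # 4.0
```

Here the exponent field 129 corresponds to 2^(129-127) = 4, and the all-zero mantissa field stands for a significand of exactly 1.0.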

# C. Extension of the VeeR EL2 processor to support the new instructions

We next describe in detail how to perform the required changes for including the three instructions in VeeR EL2. Specifically, you must make changes in two parts of the core: the Execution Unit and the Control Unit.

# **Changes in the Control Unit:**

Modify/create new control signals to support the new instructions. You must make changes in two files:

In file [RVfpgaEL2NexysA7DDRPath]/src/VeeRwolf/VeeR\_EL2CoreComplex/include/el2\_def.sv, perform the following changes:

1) Create a new structure type called fp\_pkt\_t which includes 3 bits: fp\_add, fp\_mul, and fp\_div; these bits indicate, respectively, if the processor is executing a floating-point addition, a floating-point multiplication or a floating-point division.

2) Create three new bits, called fp\_add, fp\_mul, and fp\_div, that are part of the structure type el2\_dec\_pkt\_t. Remember that this is the main structure type used in the Control Unit.

```
typedef struct packed {
    logic fp_add;
    logic fp_mul;
    logic fp_div;
    logic ctz;
    logic pcnt;
    logic sext_b;
    logic sext_h;
```



In file [RVfpgaEL2NexysA7DDRPath]/src/VeeRwolf/VeeR\_EL2CoreComplex/dec/el2\_dec\_decode\_ctl.sv, perform the following changes:

- 1) Assign the value to the new bits in the D Stage, using signal i0\_dp\_raw. To do so, you must modify the equations of module el2\_dec\_dec\_ctl (lines 1541-1867 of file el2\_dec\_decode\_ctl.sv), as explained next (note that these explanations are summarized in lines 1526-1539 of module el2\_dec\_decode\_ctl, from where we have obtained and extended them):
  - a. Generate the new equations:
    - File [RVfpgaEL2NexysA7DDRPath]/src/VeeRwolf/VeeR\_EL2CoreComplex/dec/decode is a human-readable file that defines all of the instruction decodes in the VeeR EL2 processor, and that you must modify as explained next.
      - In section .definition, create a new line for each of the new instructions according to their format, shown above.

```
.definition

fadd   = [0000000..................1010011]
fmul   = [0001000..................1010011]
fdiv   = [0001100..................1010011]

clz    = [011000000000.....001.....0010011]
ctz    = [011000000001.....001.....0010011]
cpop   = [011000000010.....001.....0010011]
sext_b = [011000000100.....001.....0010011]
sext_h = [011000000101.....001.....0010011]
```

- In section .output, create a new bit for each instruction.

- In section .decode, create a new line for each instruction that indicates the control bits enabled by each of them. As we explain below, for the sake of simplicity we treat the new instructions similarly to the div instructions. Thus, each new line must enable the same signals as a division plus the specific new signal created before for that instruction (fp\_add, fp\_mul, or fp\_div).

- In the same folder ([RVfpgaEL2NexysA7DDRPath]/src/VeeRwolf/VeeR\_EL2CoreComplex/dec), generate the *general equations*, which, after the modification of the *decode* file, will include the instructions supported by VeeR EL2 plus the three floating-point instructions.

```
./coredecode -in decode > coredecode.e
./espresso.linux -Dso -oeqntott coredecode.e |
./addassign -pre out. > equations
```

These two commands will generate files coredecode.e and equations.

- In the same folder ([RVfpgaEL2NexysA7DDRPath]/src/VeeRwolf/VeeR\_EL2CoreComplex/dec), generate the *legal equations*.

```
./coredecode -in decode -legal > legal.e
./espresso.linux -Dso -oeqntott legal.e |
./addassign -pre out. > legal_equations
```

These two commands will generate files legal.e and legal\_equations.

- b. Replace the old equations with the new ones generated in the previous step. Specifically, you must replace the existing equations in lines 1554-1862 of file el2\_dec\_decode\_ctl.sv with the new ones defined in files equations and legal\_equations.
- 2) In module el2\_dec\_decode\_ctl, assign a value to the new floating-point control bits in signal fp\_p, using signal i0\_dp.

```
assign div_p.valid = div_decode_d;
assign div_p.unsign = i0_dp.unsign;
assign div_p.rem = i0_dp.rem;

// FP
assign fp_p.fp_add = i0_dp.fp_add;
assign fp_p.fp_mul = i0_dp.fp_mul;
assign fp_p.fp_div = i0_dp.fp_div;

assign mul_p.valid = mul_decode_d;
```
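Before regenerating the equations, it can help to confirm that each test word matches exactly one of the new decode patterns. The Python sketch below models only how a 32-character pattern is read (a dot means "don't care"); it does not reproduce what coredecode or espresso generate. The patterns are written here from the formats in Section B, and the function names are illustrative.

```python
def matches(pattern, instr):
    """True if the 32-bit word 'instr' matches 'pattern' ('.' = don't care)."""
    if len(pattern) != 32:
        raise ValueError("pattern must have 32 characters")
    bits = format(instr & 0xFFFFFFFF, "032b")
    return all(p in (".", b) for p, b in zip(pattern, bits))

# Assumed patterns: funct7, then 18 don't-care bits (rs2, rs1, rounding
# mode and rd), then the opcode.
PATTERNS = {
    "fadd": "0000000" + "." * 18 + "1010011",
    "fmul": "0001000" + "." * 18 + "1010011",
    "fdiv": "0001100" + "." * 18 + "1010011",
}

def classify(instr):
    """Return the names of all patterns matched by 'instr'."""
    return [name for name, pat in PATTERNS.items() if matches(pat, instr)]

print(classify(0x01CE8F53))  # ['fadd']
print(classify(0x02E6C7B3))  # [] (an integer div, not a new instruction)
```

A word that matched two patterns, or none when one was expected, would point to a miscounted field in the corresponding .definition line.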

# **Changes in the Execution Unit:**

The Execution Unit is implemented in modules el2\_exu, el2\_exu\_alu\_ctl, el2\_exu\_mul\_ctl, and el2\_exu\_div\_ctl (the files that contain these modules are named after the modules). You will add hardware for computing floating-point addition, multiplication, and division (you may find some sources on the Internet, as we detail below). The processor will then use this hardware when a fadd.s, fmul.s, or fdiv.s instruction is executed. To do so, complete the following steps:



- Obtain the units for the floating-point computations: Download the multi-cycle floating-point Adder, Multiplier, and Divider provided at: <a href="https://github.com/dawsonjon/fpu">https://github.com/dawsonjon/fpu</a>. These are non-pipelined multi-cycle units, like the Integer Divider available in VeeR EL2. You do not need to understand their internal design, but you do need to understand their interface (their input and output signals) so that you can integrate the new units into the processor. For example:
  - Signals input\_a and input\_b provide the two input operands to the FP unit.
  - Signal output\_z\_stb indicates whether the operation has finished.
- Integrate the new instructions in the processor: For the sake of simplicity, we recommend treating the new instructions similarly to the div instructions (for example, the result of the new instructions will also be written through port 2 of the Register File) and instantiating the floating-point units inside the el2\_exu\_div\_ctl module. That is why, in the previous section ("Changes in the Control Unit"), we set the same control bits for the new floating-point instructions as for the div instructions.

The el2\_exu\_div\_ctl module provides some signals that are useful for integrating the new instructions:

- Signals dividend and divisor: These signals carry the input operands to the divider. In the new implementation we will reuse them to provide the input operands both to the divider and to the FPU. Note that you do not need to change the Verilog code for this; you can simply use these two signals as they are.
- Signal out: This signal carries the result of the division. In the new implementation we will redefine it to provide both the result of the div instruction and the result of the floating-point instructions. This requires a new multiplexer inside the el2\_exu\_div\_ctl module to select the correct output for the different instructions supported in the module (div, fadd.s, fmul.s and fdiv.s).
- Signal finish\_dly: This signal goes high when the division ends and it is used as the enable signal of write port 2 of the Register File. In the new implementation we will redefine it to signal both the completion of a div instruction and the completion of a floating-point instruction.
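The selection described for out and finish\_dly can be sketched as a small behavioural model in Python. This is only a model of the intended multiplexing, not the hardware itself: the function name, the dictionary layout and the div\_done and div\_result arguments are assumptions made for illustration, and the model ignores timing (in the core, finish\_dly is a registered signal).

```python
def div_ctl_outputs(fp_p, div_done, div_result, fpu):
    """Model of the extended 'out' and completion selection.

    fp_p : dict with the fp_add / fp_mul / fp_div control bits
    fpu  : dict mapping 'add' / 'mul' / 'div' to (output_z, output_z_stb)
    Returns (out, finish).
    """
    for bit, unit in (("fp_add", "add"), ("fp_mul", "mul"), ("fp_div", "div")):
        if fp_p[bit]:
            output_z, output_z_stb = fpu[unit]
            # The FP unit raises output_z_stb when its result is valid.
            return output_z, bool(output_z_stb)
    # No floating-point bit set: behave like the original integer divider.
    return div_result, bool(div_done)

fpu = {"add": (0x40C00000, 1), "mul": (0, 0), "div": (0, 0)}
out, finish = div_ctl_outputs(
    {"fp_add": 1, "fp_mul": 0, "fp_div": 0}, div_done=0, div_result=0, fpu=fpu)
print(hex(out), finish)  # 0x40c00000 True
```

In hardware, the control bits would also have to stay stable for the whole duration of the multi-cycle operation, which this stateless model does not capture.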

# **D. Experiments**

After modifying the hardware, we will perform a simulation in RVfpgaEL2-Trace that illustrates the use of the new instructions. You can use the program provided in Figure 1, or you can create your own. The program in Figure 1 is an endless loop that executes the three new instructions: floating-point add, multiply, and divide. It also includes an integer division in order to confirm that this instruction keeps working correctly.

```
main:
    li t3, 0x40800000
    li t4, 0x40000000
    li a3, 20
    li a4, 4
REPEAT:
    INSERT_NOPS_4
    .word 0x01ce8f53        # fadd.s 0000000 | 11100 | 11101 | 000 | 11110 | 1010011
    INSERT_NOPS_4
    .word 0x11ce8f53        # fmul.s 0001000 | 11100 | 11101 | 000 | 11110 | 1010011
    INSERT_NOPS_4
    .word 0x19ce8f53        # fdiv.s 0001100 | 11100 | 11101 | 000 | 11110 | 1010011
    INSERT_NOPS_4
    div a5, a3, a4          # just to confirm that div keeps working correctly
    INSERT_NOPS_10
    beq zero, zero, REPEAT  # Repeat the loop
```

Figure 1. Program for testing the new floating point instructions
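The values that the three floating-point instructions in Figure 1 should write to x30 can be computed in software before looking at the simulation. The Python sketch below does this for the operands loaded by the program (rs1 = t4 = 0x40000000 and rs2 = t3 = 0x40800000). The helper names are illustrative, and the arithmetic is done by Python and then rounded to single precision, which is exact for these particular operands.

```python
import struct

def bits_to_float(bits):
    """Reinterpret a 32-bit pattern as a single-precision value."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]

def float_to_bits(value):
    """Round a Python float to single precision and return its bit pattern."""
    return struct.unpack("<I", struct.pack("<f", value))[0]

rs1 = bits_to_float(0x40000000)  # t4 (x29) = 2.0
rs2 = bits_to_float(0x40800000)  # t3 (x28) = 4.0

expected = {
    "fadd.s": float_to_bits(rs1 + rs2),
    "fmul.s": float_to_bits(rs1 * rs2),
    "fdiv.s": float_to_bits(rs1 / rs2),
}
for name, bits in expected.items():
    print(f"{name}: 0x{bits:08x}")
# fadd.s: 0x40c00000
# fmul.s: 0x41000000
# fdiv.s: 0x3f000000
```

Comparing these values with wd2 in the trace gives a quick pass/fail check for each instruction.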

Figure 2 shows the results of the RVfpgaEL2-Trace simulation for the add instruction. To check the results, you can use a floating-point converter, such as the one available at: <a href="https://www.h-schmidt.net/FloatConverter/IEEE754.html">https://www.h-schmidt.net/FloatConverter/IEEE754.html</a>.



Figure 2. Simulation of fadd

In the first cycle, signal dec\_i0\_instr\_d holds the fadd instruction (0x01ce8f53), which is in the D Stage. The two operands are read from the Register File and provided to the Floating-Point Unit (dividend\_fp = 0x40000000 and divisor\_fp = 0x40800000). After a few cycles the finish\_dly signal goes high, indicating that the floating-point operation has finished, and the result is written to the Register File through Write Port 2: wen2 = 1, waddr2 = 0x1E and wd2 = 0x40C00000.

Figure 3 shows the results of the RVfpgaEL2-Trace simulation for the div instruction.





Figure 3. Simulation of div

In the first cycle, signal dec\_i0\_instr\_d holds the div instruction (0x02e6c7b3), which is in the D Stage. The two operands are read from the Register File and provided to the Integer Divider through the same operand signals used by the Floating-Point Unit (dividend\_fp = 0x14 and divisor\_fp = 0x4). After a few cycles the result is written to the Register File through Write Port 2: wen2 = 1, waddr2 = 0x0F and wd2 = 0x5.
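When reading the trace, it is useful to relate the instruction word to the register numbers seen on the write port. The following Python sketch extracts the R-type fields from the two instruction words discussed above; decode\_rtype is an illustrative name.

```python
def decode_rtype(instr):
    """Extract the fields of a 32-bit R-type instruction word."""
    return {
        "funct7": (instr >> 25) & 0x7F,
        "rs2": (instr >> 20) & 0x1F,
        "rs1": (instr >> 15) & 0x1F,
        "funct3": (instr >> 12) & 0x7,  # rounding mode for the FP instructions
        "rd": (instr >> 7) & 0x1F,
        "opcode": instr & 0x7F,
    }

for word in (0x01CE8F53, 0x02E6C7B3):
    f = decode_rtype(word)
    print(f"0x{word:08x}: rd=x{f['rd']} rs1=x{f['rs1']} rs2=x{f['rs2']} "
          f"opcode={f['opcode']:07b}")
# 0x01ce8f53: rd=x30 rs1=x29 rs2=x28 opcode=1010011
# 0x02e6c7b3: rd=x15 rs1=x13 rs2=x14 opcode=0110011
```

The destination registers x30 and x15 agree with the waddr2 values 0x1E and 0x0F reported for the two simulations.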

# 2. Exercises

- 1) Modify the SoC to include the fadd, fmul and fdiv instructions, as explained in Section C. Generate the bitstream in Vivado, and generate the RVfpgaEL2-Trace, RVfpgaEL2-ViDBo and RVfpgaEL2-Pipeline binaries with Verilator.
- 2) Test the program from Figure 1, both in RVfpgaEL2-Trace and on the board. Analyse the fmul and fdiv instructions.
- 3) Modify the provided program to test other cases and test if the instructions work correctly. For example, test negative numbers, data dependencies with previous/subsequent instructions, etc.
- 4) Implement the example *DotProduct\_C-Lang* provided in the GSG, using the new fmul and fadd instructions for performing the floating-point computations. Compare the execution of this algorithm when floating-point instructions are emulated vs. when these instructions are implemented in hardware.
- 5) Implement the Bisection Method. You can find a lot of information about this root-finding algorithm on the internet, for example, at: <a href="https://en.wikipedia.org/wiki/Bisection\_method">https://en.wikipedia.org/wiki/Bisection\_method</a>. Compare the execution of this algorithm when floating-point instructions are emulated vs. when these instructions are implemented in hardware.
- 6) Replace the FPU (floating-point unit) with the following one: <a href="https://github.com/openhwgroup/cvfpu">https://github.com/openhwgroup/cvfpu</a>. The Final Degree Project "Extensiones de punto flotante para el core SweRV EH1" [Floating-point extensions for the SweRV EH1 core] should be helpful, as it performs the same extension on VeeR EH1. You will find the project on the Internet and the sources at: <a href="https://github.com/aperea01/TFG-SweRV-EH1-FP">https://github.com/aperea01/TFG-SweRV-EH1-FP</a>
- 7) Add more functionality, such as providing support for: other floating-point formats (such as *double precision*), other floating-point rounding modes, a new register file for the floating-point values (note that floating point instructions that use a Floating-Point Register File are described in the RISC-V F extension), your own FP unit implementation, etc.
- 8) Add instructions from other RISC-V Extensions that are not available in the VeeR EL2 processor.
- 9) Verify the processor, including the new instructions. The Final Degree Project "Extensiones de punto flotante para el core SweRV EH1" [Floating-point extensions for the SweRV EH1 core] should be helpful, as it performs the same extension on VeeR EH1. You will find the project on the Internet and the sources at: <a href="https://github.com/aperea01/TFG-SweRV-EH1-FP">https://github.com/aperea01/TFG-SweRV-EH1-FP</a>
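For Exercise 5, a software reference can help when checking the values produced by the hardware. The Python sketch below runs the Bisection Method on f(x) = x·x − 2 using only operations that correspond to fadd.s, fmul.s and fdiv.s, rounding every intermediate result to single precision. The choice of function, the interval, the iteration count and all helper names are assumptions made for illustration; the sign test would be done with integer instructions on the core, since Zfinx keeps the values in the x registers.

```python
import struct

def f32(x):
    """Round a Python float to single precision."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

# Software stand-ins for the three new instructions.
def fadd(a, b): return f32(a + b)
def fmul(a, b): return f32(a * b)
def fdiv(a, b): return f32(a / b)

def f(x):
    # f(x) = x*x - 2, written as a multiplication plus an addition of -2.0
    return fadd(fmul(x, x), -2.0)

def bisect(lo, hi, iterations=30):
    """Shrink [lo, hi] around a sign change of f; returns the final bounds."""
    lo, hi = f32(lo), f32(hi)
    f_lo = f(lo)
    for _ in range(iterations):
        mid = fdiv(fadd(lo, hi), 2.0)
        f_mid = f(mid)
        if (f_lo < 0) == (f_mid < 0):  # same sign: the root is in [mid, hi]
            lo, f_lo = mid, f_mid
        else:                          # sign change: the root is in [lo, mid]
            hi = mid
    return lo, hi

lo, hi = bisect(1.0, 2.0)
print(lo, hi)  # both bounds end up close to the square root of 2
```

Because every step is rounded to single precision, the final bounds are at best adjacent single-precision values, so the hardware result should be compared against these bounds rather than against a double-precision root.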