# Creating a Vivado HLS Core


This notebook will walk through the process of creating a Shared-Memory Vivado HLS core. It assumes that you are familiar with Vivado HLS and have already used it to write, simulate, and debug a C/C++ core for hardware.


If you have not already done so, add Vivado HLS to your executable path. In Cygwin, you can do this by running:

``` bash
    source C:/Xilinx/Vivado/2017.1/settings64.sh
```

Or on Linux: 

``` bash
    source /opt/Xilinx/Vivado/2017.1/settings64.sh
```

These commands assume that Vivado has been installed in `C:/Xilinx/Vivado` or `/opt/Xilinx/Vivado/`. If that is not the case, modify the commands above to match your installation path. 

This notebook assumes that you have cloned the [PYNQ-HLS repository](https://github.com/drichmond/PYNQ-HLS) to the home directory (`~`) on your computer. On our computer, this is the `/home/xilinx/` directory. 

To skip this notebook, run the following commands on your host computer: 

``` bash
     cp ~/PYNQ-HLS/pynqhls/sharedmem/ip/mmult/* ~/PYNQ-HLS/tutorial/pynqhls/sharedmem/ip/mmult/
     make -C ~/PYNQ-HLS/pynqhls/sharedmem/ip/mmult/
```

## Objectives: 

We will be creating a Shared-Memory Matrix-Multiply accelerator as a High-Level Synthesis core. Unlike the Streaming Filter, this core will not use a DMA engine; instead, it will read and write memory shared with the ARM processor directly.

This notebook will teach you how to:
1. Create an AXI-Lite Interface for HLS Core configuration
2. Create an AXI-Master Interface for reading and writing memory shared by the ARM PS

The AXI-Lite interface will be connected to the AXI-Master port of the ARM processor, which uses it to configure the core. The AXI-Master interface will be connected to the AXI-Slave ports of the ARM processor, which give the core access to shared memory.

## Creating a Vivado HLS Project


We will begin by creating a Vivado HLS project. On your host computer, navigate to the following folder of the PYNQ-HLS repository using your terminal:

```bash
    cd ~/PYNQ-HLS/tutorial/pynqhls/sharedmem/ip/mmult/
```

In this directory we have provided a makefile that will: 

1. Create a `mmult` directory with a Vivado HLS project
2. Add `mmult.cpp`, `mmult.hpp`, and `main.cpp` files to the project
3. Run tests for the `mmult.cpp` file (**These will fail initially**)
4. If the tests pass, synthesize the core.

To run the makefile, run make from your current directory:

``` bash
    make
```

This will build the Vivado HLS project, but the testbench will fail because the method `mmult` is not implemented. Open the project in the Vivado HLS tool by running the command:

```bash
    vivado_hls -p ~/PYNQ-HLS/tutorial/pynqhls/sharedmem/ip/mmult/
```

This will open the following window:

<img src="pictures/vivadohls_mmult_splash.png" alt="mmult Project in Vivado HLS" style="width: 768px;"/>

## Writing Your Core

The next step is to implement the `mmult` core. Open the file `mmult.cpp`. You will see the following method body: 

``` C++

#include "mmult.hpp" // Defines mata_t, A_ROWS, B_ROWS, etc.

// mmult()
//     Implements a simple matrix-multiply function in HLS
// Parameters:
//     A - mata_t
//         A 2-dimensional array of mata_t values to be multiplied
//                  
//     BT - matb_t
//         A 2-dimensional array of matb_t values to be multiplied
//         BT is the transpose of B
//
//     C - matc_t
//         Matrix multiply output definition
// 
// The dimensions of the arrays are defined in mmult.hpp.
void mmult(const mata_t A [A_ROWS][A_COLS],
	const matb_t BT [B_COLS][B_ROWS],
	matc_t C [A_ROWS][B_COLS]){

	// Your code goes here!

}

```

As you can see, the body of the function `mmult` is empty - **this is okay**. To pass the testbench we have provided in `main.cpp`, you will need to fill in the functionality. 

To pass the testbench, you will need to: 

- Multiply matrix A and matrix BT (Transpose of B) and write the result in Matrix C
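
As a point of comparison, the multiplication with a transposed `B` can be sketched in plain C++ without any HLS types or pragmas. This is only a software reference under assumed names: `mmult_reference` and the template parameters `T`, `R`, `K`, and `N` are hypothetical and do not come from `mmult.hpp`.

```cpp
#include <cstddef>

// Software-only reference: C = A * B, where B is supplied as its transpose BT.
// T, R (rows of A), K (columns of A = rows of B), and N (columns of B) are
// placeholders; the tutorial's real types and sizes are defined in mmult.hpp.
template <typename T, std::size_t R, std::size_t K, std::size_t N>
void mmult_reference(const T (&A)[R][K], const T (&BT)[N][K], T (&C)[R][N]) {
    for (std::size_t i = 0; i < R; ++i) {
        for (std::size_t j = 0; j < N; ++j) {
            T sum = 0;
            // Row i of A and row j of BT are walked with the same index k,
            // so both operands are read from consecutive memory locations.
            for (std::size_t k = 0; k < K; ++k) {
                sum += A[i][k] * BT[j][k];
            }
            C[i][j] = sum;
        }
    }
}
```

Because `BT[j][k]` holds `B[k][j]`, the inner loop computes the usual dot product of row `i` of `A` with column `j` of `B`.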

To implement the core in hardware you will need to create the following interfaces: 

- AXI-Lite Slave on the control registers (Also known as `return`)
- AXI-Master for the matrix `A` argument
- AXI-Master for the matrix `BT` argument
- AXI-Master for the matrix `C` argument
- Combine the three AXI-Master interfaces into a single interface

Go ahead and try to implement your own `mmult` function!

### Our Implementation

You can define your own implementation, or fill `mmult.cpp` with the implementation below:

``` C++
#include "mmult.hpp"

// mmult()
//     Implements a simple matrix-multiply function in HLS
// Parameters:
//     A - mata_t
//         A 2-dimensional array of mata_t values to be multiplied
//                  
//     BT - matb_t
//         A 2-dimensional array of matb_t values to be multiplied
//         BT is the transpose of B
//
//     C - matc_t
//         Matrix multiply output definition
// 
// The dimensions of the arrays are defined in mmult.hpp.
void mmult(const mata_t A [A_ROWS][A_COLS],
	const matb_t BT [B_COLS][B_ROWS],
	matc_t C [A_ROWS][B_COLS]){
/* Define a new AXI-Lite bus named CTRL for offset arguments, and HLS
   Status/Control registers (return)*/
#pragma HLS INTERFACE s_axilite port=return bundle=CTRL
/* Define a new AXI4 Master bus named DATA for memory ports A, BT, and C.  The
   argument offset=slave specifies that the pointers (offset) of A, BT, and
   C can be set using register writes in the CTRL axi slave port */
#pragma HLS INTERFACE m_axi port=A offset=slave bundle=DATA
#pragma HLS INTERFACE m_axi port=BT offset=slave bundle=DATA
#pragma HLS INTERFACE m_axi port=C offset=slave bundle=DATA

	// We use the log2 functions in mmult.hpp to determine the correct size
	// of the index variables i, j, and k. Typically, vivado will size these
	// correctly
	ap_uint<pynq::log2(A_ROWS) + 1> i = 0;
	ap_uint<pynq::log2(B_COLS) + 1> j = 0;
	ap_uint<pynq::log2(A_COLS) + 1> k = 0;

	// Perform a simple matrix-multiply with three nested for-loops
	for(i = 0; i < A_ROWS; ++i){
		for(j = 0; j < B_COLS; ++j){
			matc_t sum = 0;
			// k walks the shared dimension (columns of A, rows of B)
			for(k = 0; k < A_COLS; ++k){
#pragma HLS PIPELINE
				sum += A[i][k]*BT[j][k];
			}
			C[i][j] = sum;
		}
	}
}
```
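
The `log2(...) + 1` sizing of the index variables can be checked with a small compile-time sketch. The helper `floor_log2` below is a hypothetical stand-in for the `log2` function in `mmult.hpp` (its real implementation is not shown in this notebook), and `index_bits` and `max_value` are illustrative names. The extra bit matters because a loop such as `for(i = 0; i < A_ROWS; ++i)` only exits once the index actually reaches the bound.

```cpp
#include <cstddef>

// Hypothetical stand-in for the log2 helper in mmult.hpp:
// returns floor(log2(n)) for n >= 1.
constexpr unsigned floor_log2(std::size_t n) {
    return (n <= 1) ? 0u : 1u + floor_log2(n / 2);
}

// Bits given to a loop index that counts from 0 up to and including `bound`.
constexpr unsigned index_bits(std::size_t bound) {
    return floor_log2(bound) + 1;
}

// Largest value an unsigned integer with `bits` bits can hold.
constexpr std::size_t max_value(unsigned bits) {
    return (std::size_t{1} << bits) - 1;
}

// A bound of 100 needs 7 bits (values 0..127), so the index can reach 100.
static_assert(index_bits(100) == 7, "floor(log2(100)) + 1 == 7");
static_assert(max_value(index_bits(100)) >= 100, "index can reach the bound");
// A power-of-two bound needs the extra bit: 7 bits would wrap 128 back to 0.
static_assert(index_bits(128) == 8, "floor(log2(128)) + 1 == 8");
```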

## Compiling

Once you have filled in the implementation, click the **Run C Simulation** button and then the **Synthesize** button. This will produce the window shown below: 

<img src="pictures/vivadohls_mmult_synth.png" alt="Synthesized mmult function in Vivado HLS" style="width: 768px;"/>


## Interfaces

In the center window, scroll down to view the ports. The ports list shows the interface signals for our HLS core. The port names are unimportant in this example; what matters is the protocol field. In this core, the protocol field takes three values: **s_axi_lite (AXI Lite, Slave Interface)**, **m_axi (AXI Master Interface)**, and **ap_ctrl_hs (Control Signals)**. 

For best results in Vivado (and PYNQ) your core should provide **s_axi_lite**, **axis**, or **m_axi** interfaces for data transfer. These ports are automagically recognized by Vivado and can be used in the Block Diagram editor in the **[Building a Bitstream](3-Building-A-Bitstream.ipynb)** notebook.


### AXI-Lite Interface

The AXI-Lite interface is declared using the following pragma: 

``` C
#pragma HLS INTERFACE s_axilite port=return bundle=CTRL
```

This defines an AXI-Lite interface that will be named `s_axi_CTRL` in Vivado. AXI-Lite is used for configuration data. It is low-performance and uses few resources. The argument `port=return` places the core's control registers on this interface.

<img src="pictures/vivadohls_mmult_axilite.png" alt="AXI Lite Control Port in Vivado HLS" style="width: 512px;"/>

This AXI-Lite interface will allow the ARM PS to specify the address offsets of the `A`, `BT`, and `C` arrays described below. 

### AXI-Master Interface

The AXI-Master interfaces on matrices `A`, `BT`, and `C` are declared using the pragmas below: 

``` C
#pragma HLS INTERFACE m_axi port=A offset=slave bundle=DATA
#pragma HLS INTERFACE m_axi port=BT offset=slave bundle=DATA
#pragma HLS INTERFACE m_axi port=C offset=slave bundle=DATA
```

Each pragma defines an AXI-Master interface for `A`, `BT`, and `C`. The interfaces are combined using the `bundle=DATA` argument, which defines a single AXI-Master interface that will be named `m_axi_DATA` in Vivado.
AXI-Master is a memory-mapped bus interface driven by the HLS Core. Accesses to the arrays will appear as reads and writes to byte addresses on the `m_axi_DATA` bus interface of the hardware core. 

<img src="pictures/vivadohls_mmult_aximaster.png" alt="AXI Master Port in Vivado HLS" style="width: 512px;"/>

Each array is associated with an offset address that is programmed before the HLS core starts computation. The read/write address of any access on `A`, `BT`, or `C` is computed by adding each array's offset to the byte-offset of the index. The argument `offset=slave` specifies that the offset is a register that can be accessed by the AXI-Slave interface. The location of each offset register can be found in the header files generated by High-Level Synthesis. For example, here is the header file `xmmult_hw.h` produced when this file is compiled: 


``` C
// CTRL
// 0x00 : Control signals
//        bit 0  - ap_start (Read/Write/COH)
//        bit 1  - ap_done (Read/COR)
//        bit 2  - ap_idle (Read)
//        bit 3  - ap_ready (Read)
//        bit 7  - auto_restart (Read/Write)
//        others - reserved
// 0x04 : Global Interrupt Enable Register
//        bit 0  - Global Interrupt Enable (Read/Write)
//        others - reserved
// 0x08 : IP Interrupt Enable Register (Read/Write)
//        bit 0  - Channel 0 (ap_done)
//        bit 1  - Channel 1 (ap_ready)
//        others - reserved
// 0x0c : IP Interrupt Status Register (Read/TOW)
//        bit 0  - Channel 0 (ap_done)
//        bit 1  - Channel 1 (ap_ready)
//        others - reserved
// 0x10 : Data signal of A_V
//        bit 31~0 - A_V[31:0] (Read/Write)
// 0x14 : reserved
// 0x18 : Data signal of BT_V
//        bit 31~0 - BT_V[31:0] (Read/Write)
// 0x1c : reserved
// 0x20 : Data signal of C_V
//        bit 31~0 - C_V[31:0] (Read/Write)
// 0x24 : reserved
// (SC = Self Clear, COR = Clear on Read, TOW = Toggle on Write, COH = Clear on Handshake) 
#define XMMULT_CTRL_ADDR_AP_CTRL   0x00
#define XMMULT_CTRL_ADDR_GIE       0x04
#define XMMULT_CTRL_ADDR_IER       0x08
#define XMMULT_CTRL_ADDR_ISR       0x0c
#define XMMULT_CTRL_ADDR_A_V_DATA  0x10
#define XMMULT_CTRL_BITS_A_V_DATA  32
#define XMMULT_CTRL_ADDR_BT_V_DATA 0x18
#define XMMULT_CTRL_BITS_BT_V_DATA 32
#define XMMULT_CTRL_ADDR_C_V_DATA  0x20
#define XMMULT_CTRL_BITS_C_V_DATA  32
```
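
The address arithmetic described above (array offset plus the byte offset of the index) can be sketched as follows. This is a minimal illustration, assuming a row-major layout; the function name `element_address` and the example values (base address, row length, element size) are hypothetical and are not taken from the generated core.

```cpp
#include <cstddef>
#include <cstdint>

// Byte address of element [row][col] of a row-major array whose first element
// sits at `base` (the value written to that array's offset register).
// `cols` is the row length and `elem_bytes` is the size of one element.
constexpr std::uint32_t element_address(std::uint32_t base, std::size_t row,
                                        std::size_t col, std::size_t cols,
                                        std::size_t elem_bytes) {
    return base + static_cast<std::uint32_t>((row * cols + col) * elem_bytes);
}

// Assumed example: base 0x10000000, rows of 100 elements, 4 bytes per element.
// Element [2][5] is the 205th element, 820 (0x334) bytes past the base.
static_assert(element_address(0x10000000u, 2, 5, 100, 4) == 0x10000334u,
              "offset + (row * cols + col) * element size");
```

With `offset=slave`, the `base` value for each array is whatever the processor wrote to the corresponding register (`0x10`, `0x18`, or `0x20` in the header above) before starting the core.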

### ap_ctrl_hs ports

**ap_ctrl_hs** signals provide clock, reset, and interrupt ports.

<img src="pictures/vivadohls_mmult_ap_ctrl.png" alt="AP Ctrl Port in Vivado HLS" style="width: 512px;"/>
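
Because the pragma `s_axilite port=return` maps the block-level handshake bits (`ap_start`, `ap_done`, `ap_idle`) into the control register at `0x00`, a host can start the core and check for completion with ordinary register accesses. The sketch below shows one possible sequence using the register offsets from the `xmmult_hw.h` listing above. The helper names (`write_reg`, `read_reg`, `start_mmult`, `mmult_done`) are hypothetical, and `regs` is assumed to point at the start of the core's AXI-Lite address range; in a test it can simply be an ordinary array.

```cpp
#include <cstdint>

// Byte offsets copied from the generated xmmult_hw.h listing.
constexpr std::uint32_t ADDR_AP_CTRL = 0x00;
constexpr std::uint32_t ADDR_A       = 0x10;
constexpr std::uint32_t ADDR_BT      = 0x18;
constexpr std::uint32_t ADDR_C       = 0x20;

// Bit masks for the control register at 0x00.
constexpr std::uint32_t AP_START = 1u << 0;
constexpr std::uint32_t AP_DONE  = 1u << 1;

// Hypothetical helpers: access one 32-bit register by its byte offset.
inline void write_reg(volatile std::uint32_t* regs, std::uint32_t byte_addr,
                      std::uint32_t value) {
    regs[byte_addr / 4] = value;
}

inline std::uint32_t read_reg(const volatile std::uint32_t* regs,
                              std::uint32_t byte_addr) {
    return regs[byte_addr / 4];
}

// Program the three array offsets first, then set ap_start.
inline void start_mmult(volatile std::uint32_t* regs, std::uint32_t a,
                        std::uint32_t bt, std::uint32_t c) {
    write_reg(regs, ADDR_A, a);
    write_reg(regs, ADDR_BT, bt);
    write_reg(regs, ADDR_C, c);
    write_reg(regs, ADDR_AP_CTRL, AP_START);
}

// True once ap_done is set. The header marks ap_done as clear-on-read,
// so on real hardware each read consumes the flag.
inline bool mmult_done(const volatile std::uint32_t* regs) {
    return (read_reg(regs, ADDR_AP_CTRL) & AP_DONE) != 0;
}
```

The offsets are written before `ap_start` so that the core never begins a run with stale addresses.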


## Testing and Recompiling the Core

Once this process has been completed, you can re-compile the HLS core and run the tests by executing the following commands:

```bash
    cd ~/PYNQ-HLS/pynqhls/sharedmem/ip/mmult
    make clean mmult
```

If the tests pass, proceed to the **[Building a Bitstream](3-Building-A-Bitstream.ipynb)** notebook.