

# VectorBlox MXP Quickstart for Altera

September 02, 2015 VectorBlox

# **Contents**

| Section | Title                          |
|---------|--------------------------------|
| 1       | Installation                   |
| 1.1     | Prerequisites                  |
| 1.2     | Installing                     |
| 2       | Hardware                       |
| 2.1     | VectorBlox MXP Qsys Parameters |
| 2.2     | Qsys Connections               |
| 2.3     | Qsys System Generation         |
| 3       | Software                       |
| 3.1     | Prerequisites                  |
| 3.2     | Configuring the board          |
| 3.3     | Running the software           |

If you do not have an MXP license, or are only interested in software development, skip over the hardware-specific sections.

# 1 Installation

This guide describes installing VectorBlox MXP to run with Qsys and the Altera Nios II Embedded Design Suite (EDS).

# 1.1 Prerequisites

Before you begin, make sure you have:

- Altera Quartus II, version 13.0 or later. Download it from Altera.
- For releases containing the SDK and hardware IP: The VectorBlox MXP Software Development Kit (SDK) and encrypted MXP IP core, provided as a .zip file.
- For releases containing only the SDK: The VectorBlox MXP Software Development Kit (SDK), provided as a .zip file.
- One of the supported development boards, if you want to use one of the pre-built FPGA bitstreams included in the SDK.

Note: Reference designs for the Altera/Terasic DE4 board were created with Quartus II 14.1. The UniPHY memory controller constraints in the QSF file are not backwards compatible with the UniPHY from older versions of Quartus II.

# 1.2 Installing

- Install the Altera Quartus II software according to Altera's instructions.

http://www.altera.com/download/dnl-index.jsp

Note the pathname where these tools are installed. For example, under Windows, the default install location for version 14.0 is C:\ALTERA\14.0\quartus and the environment variable QUARTUS\_ROOTDIR will be set to this location.

- Follow Altera's instructions for installing the USB Blaster device driver.
  http://www.altera.com/download/drivers/dri-index.html
- Copy vectorblox/repository/altera/mxp and vectorblox/repository/altera/ncs\_shim into QUARTUS\_ROOTDIR/../ip/vectorblox. You may have to create that directory.
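The copy step can also be scripted. The following is a minimal sketch, assuming the release was extracted so that a vectorblox directory sits under a source root that you pass in. The function name `install_mxp_ip` and its arguments are hypothetical and not part of the SDK.

```shell
# Hypothetical helper: copy the MXP and shim IP into the Quartus IP directory.
#   $1: directory that contains the extracted "vectorblox" tree (assumption)
#   $2: Quartus root, i.e. the value of QUARTUS_ROOTDIR
install_mxp_ip() {
    src="$1/vectorblox/repository/altera"
    dest="$2/../ip/vectorblox"
    # Refuse to continue if either component is missing from the release.
    for comp in mxp ncs_shim; do
        [ -d "$src/$comp" ] || { echo "missing $src/$comp" >&2; return 1; }
    done
    mkdir -p "$dest" || return 1    # create the directory if it does not exist
    cp -R "$src/mxp" "$src/ncs_shim" "$dest/"
}
```

From a Nios II Command Shell it could be called as `install_mxp_ip /path/to/extracted/release "$QUARTUS_ROOTDIR"`.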

To install the license file (provided separately) for the hardware IP, please refer to the Altera Software Installation and Licensing manual.

# 2 Hardware

A VectorBlox MXP system always uses a Nios II/f as a host CPU. This host CPU provides the MXP with instructions and runs the unaccelerated portions of your C/C++ application. You should always start with a working Nios II/f system in Qsys, then add MXP to it.

This section describes how to instantiate the VectorBlox MXP processor into a Qsys system. Some familiarity with Qsys is assumed; please refer to the *System Design with Qsys* section of the Quartus II Handbook for further details on Qsys.

You must ensure that the VectorBlox components are in the Qsys IP Search Path. If you extracted the VectorBlox MXP distribution into QUARTUS\_ROOTDIR/../ip or into the ip subdirectory of your Quartus project, Qsys will be able to find the VectorBlox components automatically. Otherwise, you can add a directory to the IP Search Path in Qsys; from the *Tools* menu, open the *Options* dialog box.



Figure 1: Qsys Component Library

The figure above shows the VectorBlox components in the Qsys Component Library.

The **Custom Instruction Slave Shim** component is glue logic required to connect a Nios II CPU to the VectorBlox MXP.

# 2.1 VectorBlox MXP Qsys Parameters

To add a VectorBlox MXP instance to your Qsys system, double-click on *VectorBlox MXP*. Qsys will open the parameter editor shown in the figure. The parameters are described below.

#### **Number of Vector Lanes**

The number of 32-bit vector lanes. This must be a power of 2. Each vector lane requires about the same amount of resources as the Nios II/f host CPU.

#### **Number of Memory Lanes**

The data bus width of the MXP DMA Engine's Avalon-MM master interface expressed in terms of 32-bit lanes. The number of memory lanes must be a power of two and no larger than the number of vector lanes. For best performance, the width of this interface should match the width of the Avalon slave port of the external memory controller.

#### **Maximum Burst Size in Beats**

The maximum number of beats per burst issued by the DMA Engine's Avalon-MM master interface. A beat is a clock cycle in which data is transferred between a master and slave interface. Using a larger value can improve the bandwidth between the MXP and slave devices. However, a very large value could potentially starve other masters (such as the Nios II/f host) from accessing a slave. Likewise, very small values will divide long transfers into very short bursts, losing potential bandwidth during the gaps.

#### **Scratchpad Size**

The Scratchpad RAM size in kilobytes. VectorBlox recommends 4KB per vector lane for most applications.
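The constraints on the lane parameters and the scratchpad recommendation can be checked before opening the parameter editor. This is a minimal sketch; the function names `is_pow2`, `check_lanes`, and `recommended_scratchpad_kb` are hypothetical helpers, not part of Qsys or the SDK.

```shell
# Succeeds if $1 is a positive power of two.
is_pow2() {
    n=$1
    [ "$n" -ge 1 ] 2>/dev/null || return 1
    while [ $((n % 2)) -eq 0 ]; do n=$((n / 2)); done
    [ "$n" -eq 1 ]
}

# Check a lane configuration against the rules stated above.
#   $1: number of vector lanes, $2: number of memory lanes
check_lanes() {
    is_pow2 "$1" || { echo "vector lanes must be a power of 2"; return 1; }
    is_pow2 "$2" || { echo "memory lanes must be a power of 2"; return 1; }
    [ "$2" -le "$1" ] || { echo "memory lanes exceed vector lanes"; return 1; }
    echo "ok"
}

# Recommended scratchpad size in KB: 4 KB per vector lane.
recommended_scratchpad_kb() {
    echo $(( $1 * 4 ))
}
```

For example, `check_lanes 16 4` accepts a 16-lane MXP with a 128-bit memory interface, and `recommended_scratchpad_kb 16` suggests a 64 KB scratchpad for it.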

#### **Instruction Port Type**

The MXP can be controlled using several different instruction ports. To connect an ARM host processor to the MXP, use the AXI Memory Mapped Port. To connect a Nios II host processor to the MXP, use the Nios Custom Instruction Port. The Nios Custom Instruction (with shim) is provided as a workaround if the Qsys or BSP generation tools fail while creating a multiprocessing system.

#### **Multiplier Performance**

This sets the minimum multiplier size. This can be used to reduce the number of hard multipliers used, at the cost of performance. If set to *Byte*, then byte, halfword, and word multipliers are instantiated and multiplication of any element size runs at full speed. If set to *Halfword*, only word and halfword multipliers are instantiated; byte-width multiplication will be executed with the halfword multiplier and run at half speed. If set to *Word*, only word multipliers are instantiated; halfword-width multiplication will run at half speed and byte-width multiplication will run at quarter speed.
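The trade-off described above can be summarized as a lookup from the setting and the element size to a slowdown factor. This is only an illustrative sketch of the text; `mul_slowdown` is a hypothetical helper, and the lowercase setting names are chosen here for convenience.

```shell
# Slowdown factor for a multiply of the given element size under a given
# Multiplier Performance setting. 1 = full speed, 2 = half, 4 = quarter.
#   $1: setting      (byte | halfword | word)
#   $2: element size (byte | halfword | word)
mul_slowdown() {
    case "$1:$2" in
        byte:byte|byte:halfword|byte:word)  echo 1 ;;
        halfword:byte)                      echo 2 ;;
        halfword:halfword|halfword:word)    echo 1 ;;
        word:byte)                          echo 4 ;;
        word:halfword)                      echo 2 ;;
        word:word)                          echo 1 ;;
        *) echo "unknown setting or size" >&2; return 1 ;;
    esac
}
```

For instance, `mul_slowdown word byte` reports a factor of 4, matching the quarter-speed case.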

#### **Fixed-Point Multiply Format**

These parameters affect the fixed-point multiply operation. They specify the number of least-significant bits that will be used to represent the fractional part of 32-bit, 16-bit, and 8-bit fixed-point numbers.

The fixed-point formats are also displayed in Q notation, where the first number specifies the number of integer bits and the second number specifies the number of fractional bits.

Fixed-point formats with no integer bits (Q0.X) are not allowed. Multiply high (VMULHI) is equivalent to fixed-point multiply for these formats and should be used instead.
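To make the Q notation concrete, the sketch below derives the format string from an element width and a fractional bit count, and models a fixed-point multiply on raw integer values. It assumes the integer and fractional bits together fill the element width, non-negative operands, truncation, and no saturation; the hardware's rounding and overflow behaviour is not described here. `q_format` and `fxp_mul` are hypothetical names.

```shell
# Print the Q notation for a fixed-point format.
#   $1: element width in bits (8, 16, or 32)
#   $2: number of fractional bits
q_format() {
    int_bits=$(( $1 - $2 ))
    # Q0.X is not allowed; VMULHI should be used for that case instead.
    if [ "$int_bits" -lt 1 ] || [ "$2" -lt 0 ]; then
        echo "invalid format" >&2
        return 1
    fi
    echo "Q${int_bits}.$2"
}

# Reference model of a fixed-point multiply on raw values: the full
# product is shifted right by the number of fractional bits.
#   $1, $2: raw operands, $3: number of fractional bits
fxp_mul() {
    echo $(( ($1 * $2) >> $3 ))
}
```

With 16 fractional bits in a 32-bit word, `q_format 32 16` prints Q16.16. In that format 1.5 is stored as 98304 and 2.25 as 147456, and `fxp_mul 98304 147456 16` prints 221184, which is 3.375 scaled by 65536.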

Below the editable parameters are the Derived Avalon Parameters, which show the inferred widths of various Avalon ports:

#### **DMA Master Data Bus Width**

The data bus width, in bits, of the DMA Engine's Avalon-MM master interface. This is derived from the number of memory lanes.

#### **DMA Master Burst Size**

The Avalon-MM master interface's maximum burst size in bytes, as determined by the memory bus width and the maximum number of beats per burst.

#### **DMA Master Burstcount Width**

The width of the Avalon-MM master interface's burstcount output, as determined by the maximum burst size and the memory bus width.

#### **Slave Address Width**

The MXP's Avalon-MM slave address width as determined by the scratchpad size.
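The derived DMA values follow from the editable parameters by simple arithmetic. The sketch below reproduces them under stated assumptions: each memory lane is 32 bits wide (from the parameter description), and the burstcount width follows the usual Avalon-MM convention that a burstcount signal of width N can express bursts of up to 2^(N-1) beats. The function names are hypothetical, and the values Qsys displays remain authoritative.

```shell
# Integer log2 of a power of two.
log2_pow2() {
    n=$1
    bits=0
    while [ "$n" -gt 1 ]; do
        n=$((n / 2))
        bits=$((bits + 1))
    done
    echo "$bits"
}

# DMA master data bus width in bits: 32 bits per memory lane.
#   $1: number of memory lanes
dma_bus_width_bits() { echo $(( $1 * 32 )); }

# DMA master maximum burst size in bytes.
#   $1: number of memory lanes, $2: maximum burst size in beats
dma_burst_bytes() { echo $(( $1 * 4 * $2 )); }

# Burstcount width (ASSUMPTION: width N encodes up to 2^(N-1) beats).
#   $1: maximum burst size in beats (a power of two)
dma_burstcount_width() { echo $(( $(log2_pow2 "$1") + 1 )); }
```

With 4 memory lanes and a maximum burst of 16 beats, this gives a 128-bit data bus, a 256-byte maximum burst, and a 5-bit burstcount.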

# 2.2 Qsys Connections

This section describes the MXP processor's interfaces, shown in the figure below.

## instr\_conduit

In some multiprocessing systems, the Qsys and BSP generation tools fail. VectorBlox provides the following workaround:

The MXP connects to the Nios II CPU's Custom Instruction Master interface via a separate component, the *Custom Instruction Slave Shim*. The Nios CPU's Custom Instruction Master interface connects to the shim's Custom Instruction Slave interface, and the shim connects to the MXP core via an Avalon Conduit interface we call an instruction conduit. The figure above shows how a Nios CPU, MXP core, and Custom Instruction Slave Shim should be connected together in Qsys.

## ncs

The MXP connects to the Nios II CPU through a custom instruction port. Make sure the Opcode base is set to 0.

## axi\_instr\_slave

When using an ARM as the host processor, connect the MXP to the processor via the AXI memory-mapped slave.

## dma\_master

This is the MXP DMA Engine's interface to external memory. The data bus width is determined by the number of memory lanes selected in the MXP Parameter Editor.

## scratchpad\_slave

This slave interface provides external access to the MXP's scratchpad memory. The address width is determined by the scratchpad size.

## core\_clk

This is the main MXP clock. The MXP's Avalon Memory Mapped Master and Slave interfaces run synchronously to **core\_clk**. **core\_clk** must be synchronous to the Nios II CPU clock.

## core\_clk\_2x

This clock must be double the frequency of **core\_clk** and must be synchronous to **core\_clk**. It should be generated from the same PLL that provides **core\_clk**.

## core\_reset

This is the active high reset input. Assertion and deassertion must be synchronous to **core\_clk** and **core\_clk\_2x**.

# 2.3 Qsys System Generation

Once all of the components in your Qsys system have their interfaces connected correctly, you can proceed to Qsys generation by clicking on the *Generation* tab, setting options, then clicking *Generate*.

To generate synthesizable HDL, check *Create HDL design files for synthesis* in the Synthesis subsection.

To generate a simulation model, you must have a VectorBlox license that supports IP Functional Simulation (IPFS) model generation in either Verilog or VHDL (or both). If your license supports it, select either Verilog or VHDL from the *Create Simulation Model* drop-down menu in the Simulation subsection.

# 3 Software

This section describes how to download one of the provided pre-built FPGA bitstreams to a development board and how to compile and run a test program on it.

If you have a VectorBlox MXP hardware IP release, the string **EXAMPLES** refers to the <code>examples</code> subdirectory of the extracted release.

If you have downloaded the VectorBlox MXP preview release from GitHub, the string **EXAMPLES** below refers to the top-level directory of the extracted download.

# 3.1 Prerequisites

Before you begin, make sure you have:

- Installed Altera Quartus II 13.0 or later.

Quartus II 14.1 is the currently recommended version, as this is the version that was used to create the pre-built bitstreams and BSPs.

Version 14.1 or later of the Quartus II Programmer is required to download the pre-built bitstreams to the FPGA. You can download the latest version of the Quartus II Programmer as a standalone program from the Altera web site if you do not wish to install a full upgrade of Quartus II.

- Installed and tested the USB Blaster device driver.
- A supported development board. See the contents of EXAMPLES/boards/.
- Connected your development board to your computer via USB cable.

# 3.2 Configuring the board

#### 1. Start a Nios II Command Shell

This sets up the proper environment variables for using the Nios II compiler system. Under Windows, this is a bash shell running in the Cygwin environment.

#### 2. Connect to the board

Run jtagconfig, which finds the USB Blaster and connects to the development board. If the response is "No JTAG hardware available," be sure the USB cable is connected and the board is turned on.

If more than one board is connected, the target cable will have to be specified in all the commands below, adding -c CABLE\_NUM to each command.
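Deciding whether the -c option is needed can be automated by counting the cables that jtagconfig reports. This sketch assumes that each cable in the jtagconfig output is introduced by a line such as "1) USB-Blaster [USB-0]"; check the output of your tool version before relying on it. `count_cables` and `cable_option` are hypothetical helpers.

```shell
# Count the cables listed in jtagconfig output read from stdin.
# ASSUMES each cable starts a line of the form "N) <cable name> ...".
count_cables() {
    grep -c '^[0-9][0-9]*) '
}

# Print the "-c N" option to add to later commands, or nothing when a
# single cable makes it unnecessary.
#   $1: number of cables found, $2: cable number to use
cable_option() {
    if [ "$1" -gt 1 ]; then
        echo "-c $2"
    fi
}
```

A possible use is `n=$(jtagconfig | count_cables)` followed by `opt=$(cable_option "$n" 1)`, then passing `$opt` unquoted to the commands that follow.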

#### 3. Configure the FPGA

Navigate to one of the prebuilt VectorBlox MXP systems for your development board, located in EXAMPLES/boards/<board\_name>/prebuilt\_\*.

For example, for the DE2-115 board, change directory to EXAMPLES/boards/de2\_115/prebuilt\_de2\_115\_v16 for a 16-lane MXP system.

From here, download the .sof file to the FPGA board using the command line:

```
quartus_pgm -m JTAG -o P\;vblox1.sof
```

Alternatively, you may start the GUI using <code>quartus\_pgmw</code>, press <code>Add File...</code> and choose the <code>.sof</code> file, press <code>Hardware Setup...</code> and choose <code>USB Blaster</code> in the pulldown list, and finally press the <code>Start</code> button.

Now, assuming you are still in the directory containing the SOF, store the absolute path of the pre-built system's Board Support Package in a shell variable. This will save typing later. In Linux, run

```
BSP_ROOT_DIR=`pwd`/bsp
```

In Windows, Windows-style paths are expected by Altera's software Makefiles, so instead use

```
BSP_ROOT_DIR=`cygpath -wa bsp`
```
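If the same script has to work on both platforms, the choice between the two forms can be made at run time. This is a minimal sketch that treats the presence of the cygpath command as the sign of a Cygwin-based Nios II Command Shell, which is an assumption; `bsp_root_dir` is a hypothetical helper name.

```shell
# Print the absolute path of ./bsp in the form Altera's Makefiles expect:
# a Windows-style path where cygpath exists, a plain absolute path otherwise.
bsp_root_dir() {
    if command -v cygpath >/dev/null 2>&1; then
        cygpath -wa bsp
    else
        echo "$(pwd)/bsp"
    fi
}
```

Run it from the directory containing the SOF, for example `BSP_ROOT_DIR=$(bsp_root_dir)`.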

# 3.3 Running the software

#### 1. Make the executable

Navigate to a test application such as vbw\_vec\_add\_t, located in EXAMPLES/software/bmark/vbw\_vec\_add\_t.

To compile the program for the FPGA bitstream you previously selected, you need to pass the absolute path of the BSP to make. Assuming you have already saved this path in the BSP\_ROOT\_DIR shell variable as described earlier, you can run

```
make clean_all all BSP_ROOT_DIR=$BSP_ROOT_DIR
```

After a long list of messages and a few seconds, the file test.elf should have been created. If you receive the error "../vblox1.sopcinfo has been modified since the BSP was generated.", run the command touch \$BSP\_ROOT\_DIR/public.mk to resolve it.

Note that we used the <code>clean\_all</code> target first to ensure that any libraries that might have been compiled against a different BSP in the past were cleaned and re-compiled. If you know that the libraries were already compiled for the selected BSP, you can omit the <code>clean\_all</code> target.
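The build and the sopcinfo workaround can be combined in one wrapper. The sketch below runs the make command, and if the output contains the "has been modified since the BSP was generated" message it touches public.mk and builds once more. `build_example` is a hypothetical helper, and matching on the message text is an assumption about how the error appears in the make output.

```shell
# Hypothetical wrapper around the make invocation described above.
#   $1: absolute path of the BSP (the value of BSP_ROOT_DIR)
build_example() {
    if log=$(make clean_all all BSP_ROOT_DIR="$1" 2>&1); then
        printf '%s\n' "$log"
        return 0
    fi
    printf '%s\n' "$log"
    case "$log" in
        *"has been modified since the BSP was generated"*)
            # Apply the documented workaround, then build once more.
            touch "$1/public.mk" && make clean_all all BSP_ROOT_DIR="$1"
            ;;
        *)
            return 1
            ;;
    esac
}
```

From the test application's directory, call it as `build_example "$BSP_ROOT_DIR"`.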

#### 2. Download the executable

Download and run the executable from the command line:

```
nios2-download -g -r test.elf
```

After the download completes, the flag -r resets the Nios II computer system, and the flag -g tells the computer to 'go' or execute the program.

#### 3. View execution output

The executable may use stdin or stdout for printf(), scanf(), and similar functions. A terminal emulator can connect with the Nios II computer system to provide this I/O behaviour by running

```
nios2-terminal
```

If the Nios II program wants to disconnect from the terminal, it can print a Control-D character with

```
putchar(4);
```

To disconnect manually from the host side, press Control-C. Note that Control-C does not actually stop the application from running. However, with nios2-terminal disconnected, the application may eventually block if it fills its printf() buffer and cannot flush the output device.