# Hardware acceleration of video processing

**Pieter BERTELOOT** 

Supervisor: Sammy Verslype

Academic year: 2017-2018

# **Abstract**

¾ of a page

# Contents

- Abstract
- 1 Introduction
  - 1.1 Objective
- 2 Overlays
  - 2.1 Base overlay
  - 2.2 Rebuilding the base overlay
  - 2.3 Creating our first IP
- 3 Video processing
  - 3.1 Video signal
    - 3.1.1 Frontend
    - 3.1.2 VDMA
    - 3.1.3 HDMI out
    - 3.1.4 Video pipeline
  - 3.2 Processing the signal
    - 3.2.1 Passthrough

# 1 Introduction

#### 1.1 Objective

The objective of this project is to study how FPGA hardware resources can be used to accelerate software processes, with a focus on video processing. This report discusses every element of the final project and how it is realized, so it can also serve as a beginner's guide for new PYNQ users. All HLS, BIT, TCL and Python files are available on GitHub:

https://github.com/Pieter-Berteloot/PYNQ\_Projects

This project uses the PYNQ-Z1 board, the hardware platform for the PYNQ open-source framework. The board includes ARM A9 CPUs, which run the following software:

- Linux
- Python
- Jupyter Notebook
- Hardware libraries and API for the FPGA

These are used to create a user-friendly and customizable video processing system.

Hardware libraries are programmable logic circuits, called overlays. Overlays are like software libraries: the programmer selects the overlay that best matches the application. The advantage of overlays is that once an overlay is built, it can be reused in other applications. Overlays are discussed in detail in the next chapter.

# 2 Overlays

Overlays are programmable and configurable FPGA designs that are used to accelerate software applications. PYNQ provides a Python interface that allows overlays to be controlled from the processing system.

An overlay includes:

- Bitstream file: contains the programming information for the FPGA.
- TCL file: determines the available IPs.
- Python API: handles the configuration of, and the communication with, the IPs.
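Because an overlay consists of several files, it can be useful to check that they are all present before loading it. The following is a minimal sketch of such a check. The helper name `missing_overlay_files` is hypothetical, and the sketch assumes that the TCL file sits next to the bitstream and shares its base name (as with the `top.bit` and `top.tcl` pair used later in this report).

```python
from pathlib import Path
import tempfile


def missing_overlay_files(bit_path):
    """Return the overlay files that are missing for the given bitstream path.

    Assumption: the TCL file has the same base name as the bitstream.
    """
    bit = Path(bit_path)
    expected = [bit, bit.with_suffix(".tcl")]
    return [str(p) for p in expected if not p.is_file()]


# Demonstration in a temporary directory (a stand-in for the overlays folder).
with tempfile.TemporaryDirectory() as tmp:
    bit = Path(tmp) / "top.bit"
    bit.touch()
    print(missing_overlay_files(bit))  # the .tcl file is still missing
    bit.with_suffix(".tcl").touch()
    print(missing_overlay_files(bit))  # nothing is missing any more
```

On the board, the same function could be called with the path of the bitstream before creating the `Overlay` object.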

The default base overlay is loaded at boot time on the PYNQ board. This overlay can be replaced with other overlays while the system is running. To gain a better understanding of overlays, we will take a closer look at the base overlay.

#### 2.1 Base overlay

The base overlay allows the PYNQ to use the peripherals (video, audio, GPIOs, ...) on the board. It connects the IP blocks to the Zynq processing system so that these peripherals can be used from the Python environment. Let's now take a look at what's inside this base overlay. To do this, we must rebuild the overlay with the following steps:

1. First clone or download the board files and overlays from the PYNQ GitHub page: https://github.com/Xilinx/PYNQ.
2. Open Vivado Design Suite (Vivado 2016.2 is used for this project) and run the following code in the TCL console:

```
cd <PYNQ repository>/boards/Pynq-Z1/base
vivado -mode batch -source build_base_ip.tcl
vivado -mode batch -source base.tcl
```

3. Wait until both scripts have finished (this will take some time). When this is done the base overlay can be found in:

```
<PYNQ repository>/boards/Pynq-Z1/base/base
```

This base overlay will be used as the starting point for our project because it already defines all the configuration needed for the processing system interface and the peripherals. An important part of the overlay's block design is the processing system AXI peripherals. This is a General-Purpose AXI-Lite interface (GP0) that controls and configures the IP blocks in the design and runs on a 100 MHz clock.

#### 2.2 Rebuilding the base overlay

The base overlay contains every peripheral. Routing all these blocks costs build time and hardware resources, so to reduce both we keep only the components that are needed for video streaming and processing. Our edited base overlay is shown further below.

The following blocks are needed for video streaming:

- AXI interface
- ZYNQ processing system
- System interrupts
- Reset processing system fclk0
- Reset processing system fclk1
- Video

The remaining IP blocks have been removed from the block design. To prevent errors, the input and output signals of the deleted blocks are also removed from the top.v file, which can be found in the Project Manager.

The **Address Editor** is another important part of the IP Integrator. It shows the offset address of each IP. This address will later be used for **memory-mapped I/O** (MMIO). When a new IP with an AXI-Lite interface is added, the user has to map it in the Address Editor to give it an address.



Figure 2-1



Figure 2-2

#### 2.3 Creating our first IP

Now that we know how to create, edit and communicate with the overlay, we can create our own IP. Let's start with a simple adder. The objective is to make an IP that has 2 integer values as input and 1 integer value as output. The user provides these 2 integers and the IP calculates the sum.

This IP is built with Vivado High-Level Synthesis (HLS). HLS accelerates IP creation by transforming C, C++ and SystemC code, including complex algorithms, into VHDL code.

When creating a new project for the PYNQ-Z1 board, select the xc7z020clg400-1 part.

Let's analyze the following code:

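```
#include <ap_fixed.h>
#include <ap_int.h>

// Top function: its arguments become the ports of the generated IP.
void add_function(int a, int b, int *c) {
// Map the block-level control signals (port=return) and all arguments
// onto one AXI-Lite slave interface named "control".
#pragma HLS INTERFACE s_axilite port=return bundle=control
#pragma HLS INTERFACE s_axilite port=a bundle=control
#pragma HLS INTERFACE s_axilite port=b bundle=control
#pragma HLS INTERFACE s_axilite port=c bundle=control

    // Functionality of the IP: the sum of both inputs.
    *c = a + b;
}
```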

First, we include the C++ headers that provide the fixed-point and integer data types.

Then we have our top function. This function is very important because its arguments define the interfaces of the IP: they become ports in the RTL design, and directives can be applied to them to specify the I/O protocol of each port. We use the AXI-Lite protocol for communication, which is selected by the `s_axilite` pragmas.

Finally, the functionality of the IP is programmed. When this is done, C synthesis and RTL export can be performed.

Now we can import the IP into our overlay. To do this, add the IP's repository to the IP Catalog (Project Manager -> IP Catalog). Once this is done, open the block design and add the IP as shown in Figure 2-3. Connect the AXI control input to an open AXI connection on the PS AXI periph.



Figure 2-3

Now we can assign an address to our IP in the **Address Editor**. Figure 2-4 shows how this is done.



Figure 2-4

In my case the **offset address** is 0x43C8\_0000. Now let's check which registers are located at this address. In the HLS project, open **add\_function\_control\_s\_axi.vhd** (solution1 -> syn -> vhdl) and scroll down until you see the Address Information. Table 2-1 shows the signals that are important for us.

Table 2-1

| Address | Name             | Function                                                                      |
|---------|------------------|-------------------------------------------------------------------------------|
| 0x00    | Control signals  | Controls the IP: bit 0 starts the IP, bit 1 is high when it is done.          |
| 0x10    | Data signal of a | Stores integer a                                                              |
| 0x18    | Data signal of b | Stores integer b                                                              |
| 0x20    | Data signal of c | Stores integer c                                                              |
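To make the relation between the offset address and Table 2-1 concrete, the short sketch below computes the absolute address of each register. It only combines values already given in the text (the offset address 0x43C8\_0000 and the register offsets of Table 2-1). The names `BASE_ADDRESS`, `REGISTER_OFFSETS` and `absolute_addresses` are chosen for illustration.

```python
# Offset address assigned to the adder IP in the Address Editor.
BASE_ADDRESS = 0x43C80000

# Register offsets taken from Table 2-1.
REGISTER_OFFSETS = {"control": 0x00, "a": 0x10, "b": 0x18, "c": 0x20}


def absolute_addresses(base, offsets):
    """Return the absolute address of every register: base + offset."""
    return {name: base + offset for name, offset in offsets.items()}


for name, address in absolute_addresses(BASE_ADDRESS, REGISTER_OFFSETS).items():
    print(f"{name:8s} 0x{address:08X}")
```

With memory-mapped I/O, software only passes the register offset; the base address is added to it to reach the register inside the IP.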

Now we are ready to generate our BIT and TCL files. To do this, generate the bitstream and run the following command in the Tcl console:

`write_bd_tcl top.tcl`

**Note**: generating the BIT-file can take some time.

Once this is done, copy the BIT and TCL files to the following location:

\\192.168.2.99\xilinx\pynq\overlays\base

Then run the code shown in Figure 2-5 in a notebook.

```
from pynq import Overlay
from pynq import MMIO
from pynq.lib.video import *

# Load the rebuilt overlay onto the FPGA
base = Overlay("/home/xilinx/pynq/overlays/base/top.bit")
base.download()

# Map the adder IP: offset address from the Address Editor, range 0x10000
add_example = MMIO(0x43C80000, 0x10000)

# Write both operands and read them back
add_example.write(0x10, 3)
print("Integer a:", add_example.read(0x10))  # Integer a: 3
add_example.write(0x18, 5)
print("Integer b:", add_example.read(0x18))  # Integer b: 5

# Set bit 0 of the control register to start the IP
add_example.write(0x00, 1)

# Read the result
print("Integer c=", add_example.read(0x20))  # Integer c= 8
```

Figure 2-5
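The notebook code in Figure 2-5 reads the result directly after starting the IP. A slightly more defensive variant first waits until bit 1 of the control register (the done bit from Table 2-1) is set. The sketch below illustrates this idea. Both `hw_add` and `FakeAdderMMIO` are hypothetical names: `FakeAdderMMIO` is only a software stand-in that imitates the register behaviour described in Table 2-1, so that the example runs without a board. On the PYNQ it would be replaced by the `MMIO` object from Figure 2-5, which offers the same `read` and `write` calls.

```python
import time


class FakeAdderMMIO:
    """Hypothetical stand-in for the MMIO object; imitates Table 2-1."""

    def __init__(self):
        self.regs = {0x00: 0, 0x10: 0, 0x18: 0, 0x20: 0}

    def write(self, offset, value):
        self.regs[offset] = value & 0xFFFFFFFF
        if offset == 0x00 and value & 0x1:
            # Start bit set: compute the sum and raise the done bit (bit 1).
            self.regs[0x20] = (self.regs[0x10] + self.regs[0x18]) & 0xFFFFFFFF
            self.regs[0x00] = 0x2

    def read(self, offset):
        return self.regs[offset]


def hw_add(mmio, a, b, timeout_s=1.0):
    """Write both operands, start the IP, wait for done and return the sum."""
    mmio.write(0x10, a)              # integer a
    mmio.write(0x18, b)              # integer b
    mmio.write(0x00, 1)              # bit 0: start
    deadline = time.monotonic() + timeout_s
    while not mmio.read(0x00) & 0x2:  # bit 1: done
        if time.monotonic() > deadline:
            raise TimeoutError("adder IP did not signal done")
    return mmio.read(0x20)           # integer c


print(hw_add(FakeAdderMMIO(), 3, 5))
```

For an IP as small as this adder the result is available almost immediately, so the polling loop mainly matters for IPs that take longer to finish.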

Now that we have successfully created and used an overlay, we can build on this to create more complex systems. The next chapter focuses on creating a Sobel edge detection filter in a video stream.

# 3 Video processing

#### 3.1 Video signal

Before the video can be processed, we take a closer look at how the video signal is transmitted in the base overlay. The video path can be divided into the following parts:

- HDMI in
  - Frontend
    - Dvi2RGB decoder
    - Video in to axi4-stream
  - Color\_convert
  - Pixel\_pack
- VDMA
- HDMI out
  - Pixel\_unpack
  - Color\_convert
  - Frontend

#### 3.1.1 Frontend

The HDMI signal is transmitted using transition-minimized differential signaling (TMDS). The DVI to RGB video decoder decodes this signal and transforms it into an RGB signal. This IP outputs a 24-bit RGB signal with VSYNC and HSYNC signals. The video in to axi4-stream IP converts this signal to the **Xilinx video protocol**.

| Function       | Width               | Direction | AXI4-Stream Signal Name | Video Specific Name |
|----------------|---------------------|-----------|-------------------------|---------------------|
| Video Data     | Any number of bytes | Out       | m_axis_video_tdata      | DATA                |
| Valid          | 1                   | Out       | m_axis_video_tvalid     | VALID               |
| Ready          | 1                   | In        | m_axis_video_tready     | READY               |
| Start Of Frame | 1                   | Out       | m_axis_video_tuser      | SOF                 |
| End Of Line    | 1                   | Out       | m_axis_video_tlast      | EOL                 |

The following signals are important for us:

- Video data: contains the video data, which is 24 bits wide (8 bits for each color).
- Start of Frame: indicates that the first pixel of a new frame is being transmitted.
- End of Line: indicates that the last pixel of a line is being transmitted.

More information about this protocol can be found at: https://www.xilinx.com/support/documentation/ip\_documentation/axi\_videoip/v1\_0/ug934\_axi\_videoIP.pdf
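The role of the Start of Frame and End of Line signals can be illustrated with a small software model. The sketch below walks over a frame pixel by pixel and produces the value of `tdata`, `tuser` (SOF) and `tlast` (EOL) for each transfer. The function name `frame_to_beats` is hypothetical, and the way the three colors are packed into the 24-bit data word is only an illustrative assumption; the actual component order is defined in the Xilinx document referenced above.

```python
def frame_to_beats(frame):
    """Yield (tdata, tuser, tlast) for every pixel of a frame.

    frame is a list of lines; each line is a list of (r, g, b) tuples.
    """
    for y, line in enumerate(frame):
        for x, (r, g, b) in enumerate(line):
            tdata = (r << 16) | (g << 8) | b  # illustrative packing only
            tuser = int(x == 0 and y == 0)    # SOF: first pixel of the frame
            tlast = int(x == len(line) - 1)   # EOL: last pixel of each line
            yield tdata, tuser, tlast


# A tiny test frame of 2 lines with 3 pixels each.
frame = [
    [(255, 0, 0), (0, 255, 0), (0, 0, 255)],
    [(1, 2, 3), (4, 5, 6), (7, 8, 9)],
]
beats = list(frame_to_beats(frame))
for tdata, tuser, tlast in beats:
    print(f"tdata=0x{tdata:06X} SOF={tuser} EOL={tlast}")
```

SOF is set exactly once per frame, while EOL is set once per line. A downstream IP can use these two flags to recover the frame dimensions without any separate sync signals.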

#### 3.1.2 VDMA

The video direct memory access (VDMA) IP is designed to provide efficient high-bandwidth access between the AXI4-Stream video interface and the AXI4 interface. This IP reads frames from and writes frames to memory.
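To get a feeling for what "high-bandwidth" means here, the following sketch estimates the size of one frame in memory and the resulting data rate. The resolution (1280x720), the frame rate (60 frames per second) and the 3 bytes per pixel are assumptions chosen for illustration; they are not taken from the overlay configuration. The function names are hypothetical.

```python
def frame_bytes(width, height, bytes_per_pixel=3):
    """Memory needed to store one frame."""
    return width * height * bytes_per_pixel


def stream_bandwidth(width, height, fps, bytes_per_pixel=3):
    """Bytes per second moved in one direction (read or write)."""
    return frame_bytes(width, height, bytes_per_pixel) * fps


# Assumed video mode: 1280x720 at 60 frames per second, 24-bit color.
size = frame_bytes(1280, 720)
rate = stream_bandwidth(1280, 720, 60)
print(f"{size} bytes per frame, {rate / 1e6:.1f} MB/s per direction")
```

Under these assumptions every frame occupies several megabytes, which is why the frames are moved by a dedicated DMA engine instead of by the CPU.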

#### 3.1.3 HDMI out

HDMI out is the counterpart of HDMI in: it transforms the Xilinx video protocol back into an HDMI signal.

#### 3.1.4 Video pipeline

More information about the video pipeline and how to use it in the PYNQ notebooks can be found in the hdmi\_video\_pipeline notebook.

#### 3.2 Processing the signal

The video processing will take place in the HDMI-in block shown in Figure 3-1.



Figure 3-1

#### 3.2.1 Passthrough