<table align="center"><tr><th> <div style="width:600px"> <h4>AGH University of Krakow <br><br> Faculty of Computer Science, Electronics and Telecommunication <br><br> Institute of Electronics </h4></div></th><th> <div style="width:200px"><img src="./img/logo_agh.png" width="68" height="136"/></div> </th></tr></table>

---

<div style="text-align:center"><h3>CUSTOM SYSTEM DESIGN IN FPGA LABORATORY</h3></div>
<br>
<div style="text-align:center"><h1> PYNQ - introduction </h1></div>
<br>

> **NOTE:** This tutorial shows how to start your adventure with the PYNQ system using the [AMD-Xilinx KRIA KV260](https://www.amd.com/en/products/system-on-modules/kria/k26/kv260-vision-starter-kit.html) development platform.<br>After rebuilding the project in Vivado, all examples can also be run on other Zynq-based FPGA SoC platforms, such as the [Zedboard](https://digilent.com/reference/programmable-logic/zedboard/start) or [Zybo](https://digilent.com/reference/programmable-logic/zybo/start).


## Contents

* [Introduction](#Introduction)
* [What is PYNQ?](#What-is-PYNQ?)
* [Connection and communication with the development board](#Connection-and-communication-with-the-development-board)
* [First steps with PYNQ](#First-steps-with-PYNQ)
* [Creating own overlay](#Creating-own-overlay)
* [Using a new overlay with PYNQ](#Using-a-new-overlay-with-PYNQ)

<div style="text-align:right"><h5> ver 0.2.1 </h5></div><br>

# Introduction

## Objectives

In this tutorial, the PYNQ environment is introduced and its basic functionality is presented using a simple project. The following sections show how to create your own Vivado project that works with PYNQ.

## Prerequisites

A background in digital design is necessary for this tutorial. Prior knowledge of Verilog/SystemVerilog will be helpful but is not strictly required. The Vivado 2022.1 design tool is used to build the design files. Basic Python skills will also be helpful.


# What is PYNQ?

PYNQ stands for Python Productivity for Zynq. It is an open-source project from AMD-Xilinx that aims to make it easier to use Xilinx Zynq Systems-on-Chip (SoCs) with Python.

Zynq SoCs combine a processing system based on ARM Cortex-A cores with programmable logic fabric, allowing users to create customized hardware accelerators. PYNQ provides a framework for software developers and engineers to leverage the benefits of programmable logic in their applications using the familiar Python language and its libraries.

Key features of PYNQ include:

- Python Integration: PYNQ allows users to control and interact with the programmable logic fabric using Python scripts, making it accessible to a wider audience of software developers and engineers.

- Hardware Abstraction: PYNQ abstracts away many of the complexities of working with programmable logic, providing higher-level abstractions that simplify hardware design and integration.

- Jupyter Notebooks: PYNQ uses Jupyter Notebooks as the primary development environment, enabling interactive and collaborative development experiences.

- Community and Ecosystem: PYNQ has a growing community of developers contributing to the project, as well as a range of available libraries and resources to support development.

<img src="./img/pynq_layers.png" width="800"/>

### PYNQ install

During the laboratory classes, students receive SD cards with Ubuntu and PYNQ already set up.<br>
To install PYNQ on your own SD card, follow the instructions in the official [GitHub repository](https://github.com/Xilinx/Kria-PYNQ).

# Connection and communication with the development board


> **STEP 1:** Connect cables to the KRIA
>
>> A. Insert the microSD card into slot J11<br>
>> B. Connect the micro-USB cable to J4<br>
>> C. Connect the Ethernet (RJ45) cable<br>
>> D. Connect the power supply to J12<br>
>
> **WARNING:** After the power supply is connected, the fan on the KRIA board starts at full speed. It slows down once Ubuntu has booted correctly, after approx. 3 minutes.

<img src="./img/kria_plugs.png" width="300"/>


> **STEP 2:** Connect to the network
>
> PYNQ works best when it has access to the Internet. If possible, connect your board to a network or router with Internet access. This allows you to update your board and easily install new packages. There are two options:
>
>> LEFT option: Connect the KRIA with an RJ45 cable directly to the PC (built-in network card, or the **USB<->Ethernet** dongle included in the lab box)<br>
>> RIGHT option: Connect the KRIA to the router<br>

<img src="./img/kria_setup_network.png" width="800"/>

> You will need to have an Ethernet port available on your computer, and you will need to have permissions to configure your network interface. With a direct connection, you will be able to use PYNQ, but unless you can bridge the Ethernet connection to the board to an Internet connection on your computer, your board will not have Internet access. You will be unable to update or load new packages without Internet access.<br>
>
>> In the example below, **Ethernet** is the PC's connection to the LAN, and **Ethernet 3** is the connection to the KRIA. Use **View Network Connections** to set the **Internet Connection Sharing** option.<br>If **Ethernet 3** shows no received bytes, the most reliable fix is to *disable and re-enable* network sharing in the **Ethernet** Properties tab.

<img src="./img/setup_ethernet.png" width="800"/>

> **STEP 3:** Open a USB Serial Terminal
>
>  You can use the terminal to check the network connection of the board. [PuTTY](https://www.putty.org/) is one application that can be used. To open a terminal, you need to know the COM port of the board; use a baud rate of 115200. Log in with:
>
>> username: ubuntu<br>
>> password: kriasdup<br>
>
> You can check the IP address of the board with *ifconfig*; the HOSTNAME is shown in the shell prompt (or by the *hostname* command).
> In this example: 
>
>> HOSTNAME: kriaSDUP-0<br>
>> IPaddress: 192.168.137.227

<img src="./img/kria_putty.png" width="500"/>

> **STEP 4:** Connecting to Jupyter Notebook (LAB)
>
> Once your board is set up, open a web browser on your PC and navigate to Jupyter:
>
>> `http://HOSTNAME:9090/lab` or <br>
>> `http://IPaddress:9090/lab`<br>
>
> Log in with:
>
>> password: xilinx<br>
>
> You can drag and drop files to copy them to the board.<br>
> To download a file from the board, use File -> Download.

<img src="./img/kria_jupyter.png" width="600"/>

<div class="alert alert-success"><strong>NOTE:</strong> From now on, you can work on a copy of this document.<br> Create your work folder (e.g. /sdup/student) and copy into it the extracted archive (T8_pynq_notebook.zip from the UPEL webpage), which contains the Jupyter notebook file and a folder with images.<br>Then open the notebook file on the KRIA and go to the next step.</div>

# First steps with PYNQ

Start by downloading the training overlay file **kv260_sdup.xsa** from the UPEL webpage. Move this file to your work directory on the KRIA (e.g. /sdup/student/), the same directory that holds your Jupyter notebook.

A simple Vivado design was created with an AXI Timer, a BRAM, an AXI GPIO controller and a DMA connected to a data FIFO (diagram below). The AXI GPIO is connected to the Pmod connector on the KRIA KV260.

A Xilinx Support Archive (XSA) file was then exported. It includes the bitstream file (.bit) and the Hardware Handoff file (.hwh). The bitstream is used to program the Zynq Programmable Logic (PL) with your custom hardware design. The .hwh file is normally used by the Xilinx software tools (SDK, now Vitis) to build software applications for the system. Together (bundled in the .xsa file), these files describe the system, including clocks, settings, IPs and the system memory map. PYNQ can parse them and use this information to make the design easy to use from Python.

<img src="./img/vivado_diagram_base.png" width="1400"/>

<div class="alert alert-info"><strong>NOTE:</strong> During the laboratory classes, use the mini LED board provided by the instructor. Connect it to the PMOD port on the KRIA. Pay attention to where the GND pin is on both boards.</div>
<img src="./img/pmod.png" width="500"/>

### Instantiate the Overlay

In [None]:
from pynq import Overlay

kv260_sdup_ov = Overlay("kv260_sdup.xsa")

<div class="alert alert-warning"><strong>NOTE:</strong> The fan on the KRIA board will go to full speed. This is because the newly loaded bitstream does not include fan speed control.</div>

### Check list of IPs in the design
Before using the design, you need to know which IPs are available in the loaded bitstream. You could determine this from the Vivado design you created.

The Overlay can be queried to determine the available IPs.

In [None]:
kv260_sdup_ov?

Notice that the AXI Timer has been assigned the pynq.overlay.DefaultIP driver, and the AXI GPIO IP has been assigned the pynq.lib.axigpio.AxiGPIO driver. The BRAM is plain memory, so only MMIO is needed to read and write its locations.

Overlay IP information can also be read from `ip_dict`. This prints a complete list of the IPs in the design and their properties.

In [None]:
kv260_sdup_ov.ip_dict
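
`ip_dict` is a plain Python dictionary keyed by IP name, and each value is itself a dictionary. A minimal sketch of turning it into a compact address map is shown below. The sample entries (names, addresses, sizes) are hypothetical stand-ins so that the snippet runs without a board; only the `phys_addr`, `addr_range` and `type` keys are used, and on the KRIA you would pass `kv260_sdup_ov.ip_dict` instead. The helper name `summarize_ips` is also hypothetical.

```python
def summarize_ips(ip_dict):
    """Return (name, base address, size in bytes, IP type) tuples sorted by base address."""
    rows = []
    for name, info in ip_dict.items():
        rows.append((name, info["phys_addr"], info["addr_range"], info["type"]))
    return sorted(rows, key=lambda row: row[1])

# Hypothetical sample entries; on the board use kv260_sdup_ov.ip_dict instead
sample_ip_dict = {
    "axi_gpio_0": {"phys_addr": 0xA0010000, "addr_range": 0x10000,
                   "type": "xilinx.com:ip:axi_gpio:2.0"},
    "axi_timer_0": {"phys_addr": 0xA0000000, "addr_range": 0x10000,
                    "type": "xilinx.com:ip:axi_timer:2.0"},
}

for name, base, size, ip_type in summarize_ips(sample_ip_dict):
    print(f"{name:<16} 0x{base:08X} {size:>8} B  {ip_type}")
```

The base addresses listed this way are the ones the MMIO and register-map accesses in the next cells use under the hood.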

Once you know the IP names, and hierarchical paths, you can set up more convenient handles.

In [None]:
timer = kv260_sdup_ov.axi_timer_0
bram = kv260_sdup_ov.axi_bram_ctrl_0
pmod = kv260_sdup_ov.axi_gpio_0

### MMIO read / write operations
This example shows how to access an AXI IP using the read() and write() methods provided for MMIO objects.

We will modify the content of the BRAM memory.
Start by checking the help for the AXI BRAM methods.

In [None]:
bram.read?

In [None]:
bram.write?

<div class="alert alert-success"><strong>TASK:</strong> Using read/write methods write a value 0x321 to the 0x124 location in BRAM memory, and check the result.</div>

In [None]:
bram.write(0x124, 0x321)

In [None]:
bram.read(0x124)

Another way is to access the memory as an array, which is usually faster when several words are transferred. PYNQ exposes the mapped memory as a NumPy array (`mmio.array`) of 32-bit words.

Let's write an array and read it back.

In [None]:
bram.mmio.array[0:4] = [(17 + x) for x in range(4)]

In [None]:
bram.mmio.array[0:4]
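
Note that `read()` and `write()` take byte offsets, while `mmio.array` is indexed in 32-bit words, so word index `i` corresponds to byte offset `4 * i` (the value written to offset 0x124 earlier sits at index 0x124 // 4 = 73). The sketch below illustrates this mapping on a plain NumPy array standing in for the BRAM, so it runs without the board; `write_word` and `read_word` are hypothetical helpers that only mimic the MMIO methods.

```python
import numpy as np

# A plain NumPy array standing in for the BRAM as seen through mmio.array (32-bit words)
memory = np.zeros(1024, dtype=np.uint32)

def write_word(mem, byte_offset, value):
    """Mimic MMIO.write(): byte offset in, must be 4-byte aligned."""
    if byte_offset % 4:
        raise ValueError("offset must be a multiple of 4")
    mem[byte_offset // 4] = value

def read_word(mem, byte_offset):
    """Mimic MMIO.read(): return the 32-bit word at a byte offset."""
    if byte_offset % 4:
        raise ValueError("offset must be a multiple of 4")
    return int(mem[byte_offset // 4])

write_word(memory, 0x124, 0x321)           # same call pattern as bram.write(0x124, 0x321)
memory[0:4] = [17 + x for x in range(4)]   # same pattern as bram.mmio.array[0:4] = ...

print(hex(read_word(memory, 0x124)))       # 0x321
print(memory[0x124 // 4])                  # 801, i.e. 0x321 stored at word index 73
print([read_word(memory, 4 * i) for i in range(4)])  # [17, 18, 19, 20]
```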

### Register Map operations

This example shows how to access and use the PYNQ register map information for an AXI IP. We will experiment with the [AXI Timer IP](https://docs.amd.com/v/u/en-US/pg079-axi-timer).

Let's start by displaying the content of the register map and its help, and finally start the timer.

In [None]:
timer.register_map

In [None]:
help(timer.register_map)

In [None]:
timer.register_map.TCSR0.ENT0 = 1

<div class="alert alert-success"><strong>TASK:</strong> Check whether the timer is actually counting (read the TCR0 register). Using the RegisterMap and the AXI Timer IP documentation, set the timer to Auto Reload mode.</div>
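
The named fields of `timer.register_map.TCSR0` are individual bits of one 32-bit control word. As a sketch, the hypothetical helper below assembles a TCSR0 value from field names, assuming the TCSR0 bit positions given in the AXI Timer product guide (PG079): MDT0 = bit 0, UDT0 = 1, GENT0 = 2, CAPT0 = 3, ARHT0 = 4, LOAD0 = 5, ENIT0 = 6, ENT0 = 7. Double-check them against the guide before writing such a value to the timer, e.g. with `timer.write(0x0, value)` (TCSR0 is at offset 0x0).

```python
# Bit positions of the TCSR0 fields (assumed from the AXI Timer product guide, PG079)
TCSR0_BITS = {
    "MDT0": 0,   # timer mode: 0 = generate, 1 = capture
    "UDT0": 1,   # count direction: 0 = up, 1 = down
    "GENT0": 2,  # enable external generate signal
    "CAPT0": 3,  # enable external capture trigger
    "ARHT0": 4,  # auto reload (generate mode) / auto hold (capture mode)
    "LOAD0": 5,  # load the counter from TLR0
    "ENIT0": 6,  # enable interrupt
    "ENT0": 7,   # enable the timer
}

def tcsr0_value(*fields):
    """Return the TCSR0 word with the named fields set to 1 and all others 0."""
    value = 0
    for name in fields:
        value |= 1 << TCSR0_BITS[name]
    return value

auto_reload_running = tcsr0_value("ARHT0", "ENT0")
print(hex(auto_reload_running))  # 0x90
```

Comparing this table with the fields printed by `timer.register_map.TCSR0` is a quick way to confirm the bit layout before touching the hardware.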

### AXI GPIO
Because the AXI GPIO IP has been assigned the pynq.lib.axigpio.AxiGPIO driver, we can use both MMIO operations and the RegisterMap.

In [None]:
pmod.register_map

This is an example of using an interactive slider to control the LEDs.

In [None]:
from ipywidgets import interact
import ipywidgets as widgets

def click(x):
    # Write the slider value to the GPIO data register (offset 0x0); its low 4 bits drive the LEDs
    pmod.write(0, x)

In [None]:
interact(click, x=widgets.IntSlider(min=0, max=15, step=1, value=5));
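
The slider simply writes an integer whose four low bits drive the four LEDs. As a further sketch, the hypothetical generator below produces a "bouncing" single-LED pattern; on the board, each value could be passed to `click()` with a short `time.sleep()` in between. It assumes 4 LEDs on bits 0-3, matching the slider range 0-15.

```python
import itertools

def bounce_pattern(n_leds=4):
    """Yield values that light one LED at a time, moving back and forth forever."""
    forward = [1 << i for i in range(n_leds)]   # 0b0001, 0b0010, 0b0100, 0b1000
    backward = forward[-2:0:-1]                 # inner LEDs in reverse, so the ends are not repeated
    yield from itertools.cycle(forward + backward)

first_steps = list(itertools.islice(bounce_pattern(), 8))
print([format(v, "04b") for v in first_steps])

# On the board (not run here):
# import time
# for value in itertools.islice(bounce_pattern(), 40):
#     click(value)
#     time.sleep(0.1)
```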

### AXI DMA
This example shows how to use a DMA in PYNQ. We will send data from the PS DRAM to an IP in the PL, read the data back from the IP, and write it to the PS DRAM.

The DMA has an AXI-Lite slave control interface and two AXI master connections to the PS HP ports, which give access to the PS DRAM.<br>
The AXI Stream FIFO is connected to the DMA so that it forms a loop-back channel. The FIFO has only stream interfaces, so it is not visible to PYNQ.

Let's start by creating some data to send and a buffer to receive it.

In [None]:
from pynq import allocate
import numpy as np

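data_size = 1000000
in_buffer = allocate(shape=(data_size,), dtype=np.uint32)   # contiguous buffer for the send (MM2S) channel
out_buffer = allocate(shape=(data_size,), dtype=np.uint32)  # contiguous buffer for the receive (S2MM) channel

# Fill the input with 0xcafe0000, 0xcafe0001, ... (vectorized; much faster than a Python loop)
in_buffer[:] = 0xcafe0000 + np.arange(data_size, dtype=np.uint32)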

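Now we are ready to start the DMA transfer from DDR to the FIFO. `transfer()` only starts the transfer and returns immediately; since nothing reads from the FIFO yet, the send will stall once the FIFO is full.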

In [None]:
dma = kv260_sdup_ov.axi_dma_0
dma.sendchannel.transfer(in_buffer)

While the transfer is in progress, let's look at the DMA control registers, which are accessible via the AXI-Lite interface.

In [None]:
dma.register_map

Before reading the data back, let's check the content of the output buffer.

In [None]:
for i in range(10):
    print('0x' + format(out_buffer[i], '02x'))

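Now start the receive transfer, wait until both channels have finished, and read the buffer once again. After that, compare the buffers and finally clean up.

In [None]:
dma.recvchannel.transfer(out_buffer)
dma.sendchannel.wait()   # block until the send (MM2S) transfer has completed
dma.recvchannel.wait()   # block until the receive (S2MM) transfer has completed
for i in range(10):
    print('0x' + format(out_buffer[i], '02x'))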

In [None]:
print("Buffers are equal: {}".format(np.array_equal(in_buffer, out_buffer)))
del in_buffer
del out_buffer

<div class="alert alert-success"><strong>TASK:</strong> Write a simple script to measure speed of data transfer via FIFO.</div>
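
One possible starting point for this task: time the loop-back transfer with `time.perf_counter()` and convert bytes per second into MB/s. The helper `throughput_mb_s` is hypothetical; the board-specific part is shown only as comments and assumes that `dma`, `in_buffer` and `out_buffer` exist (re-allocate the buffers if they were deleted above).

```python
import time

def throughput_mb_s(n_bytes, seconds):
    """Convert a byte count and an elapsed time into megabytes per second (1 MB = 10**6 B)."""
    if seconds <= 0:
        raise ValueError("elapsed time must be positive")
    return n_bytes / seconds / 1e6

# Board-specific part (not run here):
# start = time.perf_counter()
# dma.sendchannel.transfer(in_buffer)
# dma.recvchannel.transfer(out_buffer)
# dma.sendchannel.wait()
# dma.recvchannel.wait()
# elapsed = time.perf_counter() - start
# print(throughput_mb_s(in_buffer.nbytes, elapsed), "MB/s")

print(throughput_mb_s(4_000_000, 0.25))  # 16.0 (4 MB moved in 250 ms)
```

Waiting on both channels inside the timed region matters: without the `wait()` calls, only the time needed to start the transfers would be measured.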

# Creating own overlay

In this section, we will create a new overlay step by step using AMD-Xilinx Vivado 2022.1.

Start by downloading the IP archive from the UPEL webpage. Uncompress it, locate the **ip_repo** folder, and copy it to your Vivado working directory.

We will connect two IPs to the Zynq Processing System:
* a sequential implementation of CORDIC, based on tutorial 3 - Implementation of the system with a sequential accelerator
* a pipelined CORDIC accelerator, based on tutorial 4 - The pipelined sine/cosine cordic processor

The sequential version of CORDIC uses an AXI-Lite interface. We will connect it to the Zynq PS and interact with it exactly the same way as with the AXI Timer or AXI GPIO.
The pipelined accelerator, on the other hand, will be connected directly to the DMA through AXI Stream interfaces, just like the FIFO in the previous section.


> **STEP 1:** Create a Vivado project.
>
>> *File -> Project -> New*
>
>> Set **Project name** to *kv260_cordic*
>
>> In **Project type** leave option *RTL project*
>
>> As **Default Part** in *Boards* tab choose *Kria KV260 Vision AI Starter Kit*
>
>> *Tools -> Settings -> IP -> Repository*: add **ip_repo**; Vivado should recognize two IPs

> **STEP 2:** Create a block diagram.
>
>> *Flow -> Create Block Design* and name it *kv260_cordic*
>
>> Add IPs:
>> * Zynq UltraScale+ MPSoC **zynq_ultra_ps_e_0**
>> * cordic_seq_v1.0 **cordic_seq_0**
>> * cordic_pipe_v1.0 **cordic_pipe_0**
>> * AXI Direct Memory Access **axi_dma_0**
>> * 2 x AXI Interconnect **axi_interconnect_0** **axi_interconnect_1**
>
>> *Run Block Automation*<br>

> **STEP 3:** Set the IP properties (double-click the IP in the diagram).
>
>> **zynq_ultra_ps_e_0**
>>
>>> PS-PL Configuration -> PS-PL Interfaces -> Master Interface -> Enable only **AXI HPM0 FPD**<br>
>>> PS-PL Configuration -> PS-PL Interfaces -> Slave Interface -> AXI HP -> Enable only **AXI HP0 FPD**
>
>> **axi_dma_0**
>>
>>> Disable Scatter Gather Engine
>
>> **axi_interconnect_0**
>>
>>> Number of Slave Interfaces = 1<br>
>>> Number of Master Interfaces = 2
>
>> **axi_interconnect_1**
>>
>>> Number of Slave Interfaces = 2<br>
>>> Number of Master Interfaces = 1

> **STEP 4:** Make the interface connections.
>
>> **zynq_ultra_ps_e_0** M_AXI_HPM0_FPD <-> S00_AXI **axi_interconnect_0**<br>
>> **zynq_ultra_ps_e_0** S_AXI_HP0_FPD <-> M00_AXI **axi_interconnect_1**<br>
>> **axi_interconnect_0** M00_AXI <-> S_AXI_LITE **axi_dma_0**<br>
>> **axi_interconnect_0** M01_AXI <-> S00_AXI **cordic_seq_0**<br>
>> **axi_interconnect_1** S00_AXI <-> M_AXI_MM2S **axi_dma_0**<br>
>> **axi_interconnect_1** S01_AXI <-> M_AXI_S2MM **axi_dma_0**<br>
>> **cordic_pipe_0** M00_AXIS <-> S_AXIS_S2MM **axi_dma_0**<br>
>> **cordic_pipe_0** S00_AXIS <-> M_AXIS_MM2S **axi_dma_0**<br>
>
>> *Run Connection Automation* -> choose *All automation*

> **STEP 5:** Assigning addresses.
>
>> Switch from the *Diagram* tab to the *Address Editor* tab.
>
>> Right-click *Network 0* -> *Assign All*
>>
>> As a result, we should get 6 addresses assigned and 4 addresses excluded:
>>
>>> Network 0: /axi_dma_0 (AXI connections)<br>
>>> Network 1: /zynq_ultra_ps_e_0 (AXI_LITE connections)

After these steps, the block diagram should look like the one below (Interfaces View).

<img src="./img/vivado_diagram_cordic.png" width="1400"/>

**Create HDL Wrapper** for the block diagram *kv260_cordic* (right-click the kv260_cordic.bd file in the Hierarchy view of the Sources window).

Select **Generate Bitstream** and wait for the result.

After successful bitstream generation, select *File -> Export -> Export Hardware*. Choose to include the bitstream. Name the XSA file *kv260_cordic*.

Locate the file **kv260_cordic.xsa** in the Vivado project directory and copy it to the KRIA (drag and drop it into the folder containing this notebook).

# Using a new overlay with PYNQ
Start by loading the new overlay and checking the IP list.

In [None]:
from pynq import Overlay
from pynq import allocate
import numpy as np

kv260_cordic_ov = Overlay("kv260_cordic.xsa")

In [None]:
kv260_cordic_ov.ip_dict

### Sequential cordic test

In [None]:
cordic_seq = kv260_cordic_ov.cordic_seq_0.register_map

Select an angle value and compute its sine and cosine with NumPy on the ARM core.

In [None]:
angle_deg = 11
angle_rad = np.deg2rad(angle_deg)
print("ARM results:")
print("sin", np.sin(angle_rad))
print("cos", np.cos(angle_rad))

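Convert the angle to radians in the fixed-point format fxp(12:10) used by the CORDIC IPs: 12 bits in total, 10 of them fractional, i.e. the value multiplied by 2^10 = 1024.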

In [None]:
# Angle in radians scaled by 2**10 = 1024 and truncated to an integer
angle_fxp = int(np.deg2rad(angle_deg) * 1024)
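
The fxp(12:10) format stores a value as a 12-bit two's-complement integer with 10 fractional bits, so the real value is the integer divided by 2^10 = 1024 and the representable range is [-2, 2 - 1/1024]. The hypothetical helpers `to_fxp` and `from_fxp` below make the round trip explicit; `to_fxp` truncates toward zero. Whether the IP expects the masked 12-bit pattern or a sign-extended 32-bit word for negative angles is an assumption to check against the IP sources; all angles used in this notebook are positive.

```python
import numpy as np

FRAC_BITS = 10    # fractional bits in fxp(12:10)
TOTAL_BITS = 12   # total width, two's complement

def to_fxp(value):
    """Real value -> raw 12-bit two's-complement pattern (truncated toward zero)."""
    code = int(value * (1 << FRAC_BITS))
    lo, hi = -(1 << (TOTAL_BITS - 1)), (1 << (TOTAL_BITS - 1)) - 1
    if not lo <= code <= hi:
        raise OverflowError(f"{value} does not fit in fxp({TOTAL_BITS}:{FRAC_BITS})")
    return code & ((1 << TOTAL_BITS) - 1)

def from_fxp(code):
    """Raw 12-bit two's-complement pattern -> real value."""
    code &= (1 << TOTAL_BITS) - 1
    if code & (1 << (TOTAL_BITS - 1)):   # sign bit set -> negative number
        code -= 1 << TOTAL_BITS
    return code / (1 << FRAC_BITS)

angle = to_fxp(np.deg2rad(11))
print(angle, from_fxp(angle))   # 196 0.19140625
print(from_fxp(to_fxp(-0.5)))   # -0.5
```

The quantization step is 1/1024 ≈ 0.001 rad, which bounds how precisely the angle can be passed to the hardware.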

Send data and read back result.

In [None]:
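cordic_seq.ANGLE_REG = angle_fxp
cordic_seq.CONTROL_REG = 1                    # start the CORDIC
result = int(cordic_seq.RESULT_REG)           # read the packed result word once
# sin is in bits 11..0, cos in bits 27..16; both are signed 12-bit values,
# so sign-extend them explicitly (Python ints are not limited to 32 bits)
sin = ((result & 0xFFF) ^ 0x800) - 0x800
cos = (((result >> 16) & 0xFFF) ^ 0x800) - 0x800
print("CORDIC SEQ results:")
print("sin", sin/1024)
print("cos", cos/1024)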
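
Based on the bit masks used in this notebook, the 32-bit result word carries sin in bits 11..0 and cos in bits 27..16, each as a signed fxp(12:10) value. In C, the common `(x << 20) >> 20` trick sign-extends a 12-bit field of a 32-bit word, but Python integers have unlimited width, so that trick does not sign-extend; a small negative result (e.g. cos close to 90°) would then be read as a value close to 4.0. A sketch of explicit unpacking under that bit-layout assumption, with hypothetical helper names and an illustrative (not measured) result word:

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement number."""
    sign = 1 << (bits - 1)
    value &= (1 << bits) - 1
    return (value ^ sign) - sign

def unpack_result(word):
    """Split a result word into (sin, cos) floats; fxp(12:10) in bits 11..0 and 27..16."""
    sin_code = sign_extend(word, 12)
    cos_code = sign_extend(word >> 16, 12)
    return sin_code / 1024, cos_code / 1024

print(unpack_result(0x03EE00C4))   # (0.19140625, 0.982421875)
print(unpack_result(0x0FFF0000))   # (0.0, -0.0009765625)
```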

### Pipelined cordic test
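Prepare the data buffers. The pipeline has a latency of 15 samples, so 15 extra words are sent to flush it, and the result for the first input word appears at index `cordic_pipe_latency` of the output buffer.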

In [None]:
cordic_pipe_latency = 15
data_size = 1 + cordic_pipe_latency
in_buffer = allocate(shape=(data_size,), dtype=np.uint32)
out_buffer = allocate(shape=(data_size,), dtype=np.uint32)

for i in range(data_size):
    in_buffer[i] = angle_fxp

In [None]:
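dma = kv260_cordic_ov.axi_dma_0
dma.sendchannel.transfer(in_buffer)
dma.recvchannel.transfer(out_buffer)
dma.sendchannel.wait()   # wait until all input words have been sent
dma.recvchannel.wait()   # wait until the output buffer has been filled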

In [None]:
result = int(out_buffer[cordic_pipe_latency])      # first valid output word
sin = ((result & 0xFFF) ^ 0x800) - 0x800           # sign-extend bits 11..0
cos = (((result >> 16) & 0xFFF) ^ 0x800) - 0x800   # sign-extend bits 27..16
print("CORDIC PIPE results:")
print("sin", sin/1024)
print("cos", cos/1024)

In [None]:
del in_buffer
del out_buffer

<div class="alert alert-success"><strong>TASK:</strong> Write a simple script to compare precision and speed of operations sin/cos from different architectures (arm/seq/pipe).</div>
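
For the precision part of this task, a compact numerical summary can complement the plots. A minimal sketch with a hypothetical helper `error_stats`, assuming the hardware and ARM results are equal-length sequences of floats (such as `seq_cos` and `arm_cos`, which are computed in the cells that follow); the toy input values are made up.

```python
import numpy as np

def error_stats(hw_values, ref_values):
    """Return max absolute error, RMS error and mean (signed) error of hw_values vs ref_values."""
    hw = np.asarray(hw_values, dtype=float)
    ref = np.asarray(ref_values, dtype=float)
    if hw.shape != ref.shape:
        raise ValueError("inputs must have the same length")
    diff = hw - ref
    return {
        "max_abs": float(np.max(np.abs(diff))),
        "rms": float(np.sqrt(np.mean(diff ** 2))),
        "mean": float(np.mean(diff)),
    }

# Toy example with made-up numbers; on the board use e.g. error_stats(seq_cos, arm_cos)
print(error_stats([0.5, 0.25, 1.0], [0.5, 0.5, 1.0]))
```

Since one fxp(12:10) step is 1/1024 ≈ 0.001, expressing `max_abs` in multiples of that step is a convenient way to judge whether the hardware error stays within a few LSBs.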

## Timing comparison
Our own overlay is working! Or at least it produces some output. Now let's check how the overlay performs.

The cell below generates the input data and imports a few libraries that will be needed later on. `timeit` offers a neat way of benchmarking, and Matplotlib is used to generate plots.

The generated data is spread evenly between 1 and 90 degrees.

`number_of_points` sets the number of data points in our test data. **Don't change it yet: some of the following cells require at least 4000 points to work properly.** You can experiment with it once you reach the end of the notebook.

In [None]:
import matplotlib.pyplot as plt
import timeit

number_of_points = 4080 # Must not exceed 4080: (4080 + 15 latency words) * 4 B = 16380 B, just under the DMA's maximum transfer length
points = np.linspace(1, 90, number_of_points) # Generate evenly spaced angles between 1 and 90 degrees

angles_rad = np.deg2rad(points) # Convert degrees to radians

## Convert radians to the fxp(12:10) format expected by our overlay (scaled by 2**10, truncated to int)
angles_fxp = [int(a * 1024) for a in angles_rad]
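
Where does the limit of 4080 come from? A plausible explanation, assuming the AXI DMA in this overlay was left at its default 14-bit "Width of Buffer Length Register": one transfer can then move at most 2^14 - 1 = 16383 bytes, each word is 4 bytes, and `pipe_process` sends 15 extra words for the pipeline latency. The hypothetical helper below computes the largest number of angles that fits:

```python
def max_points(length_width_bits=14, latency=15, word_bytes=4):
    """Largest angle count n such that (n + latency) * word_bytes <= 2**length_width_bits - 1."""
    max_bytes = (1 << length_width_bits) - 1
    return max_bytes // word_bytes - latency

print(max_points())                       # 4080
print(max_points(length_width_bits=26))   # 16777200, with the widest (26-bit) length register
```

By the same reasoning, the 1,000,000-word transfer in the first overlay suggests that its DMA was configured with a wider length register (at least 22 bits, since 4,000,000 bytes exceed 2^21 - 1).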

OK, we've got the data! Now it's time for some comparisons. Below, we declare 3 functions, each accepting an array of angle values and returning sine and cosine values. We then use these functions to calculate sine and cosine values for our test angles. Finally, we compare the cosine values produced by our overlay with the values calculated on the ARM and plot the error.

Some error is expected, so don't be alarmed. We are not trying to mimic the ARM sine/cosine calculation; we just need values that are reasonably close.

In [None]:
## Calculate sin and cos values using ARM processor
def arm_process(angles_rad):
    arm_sin = np.sin(angles_rad)
    arm_cos = np.cos(angles_rad)
    return (arm_sin, arm_cos)

## Calculate sin and cos values using the sequential approach
def seq_calculation(angles_fxp_slice):
    seq_sin = []
    seq_cos = []
    for angle_fxp in angles_fxp_slice:
        cordic_seq.ANGLE_REG = angle_fxp
        cordic_seq.CONTROL_REG = 1
        result = int(cordic_seq.RESULT_REG)                # read the packed result word once
        sin = ((result & 0xFFF) ^ 0x800) - 0x800           # sign-extend bits 11..0
        cos = (((result >> 16) & 0xFFF) ^ 0x800) - 0x800   # sign-extend bits 27..16
        seq_sin.append( sin / 1024 )
        seq_cos.append( cos / 1024 )
    return (seq_sin, seq_cos)
    
## Calculate sin and cos values using the pipelined approach
def pipe_process(angles_fxp_slice):
    pipe_sin = []
    pipe_cos = []
    cordic_pipe_latency = 15
    data_size = len(angles_fxp_slice) + cordic_pipe_latency   # extra words flush the pipeline
    in_buffer = allocate(shape=(data_size,), dtype=np.uint32)
    out_buffer = allocate(shape=(data_size,), dtype=np.uint32)
    for i, angle in enumerate(angles_fxp_slice):
        in_buffer[i] = angle

    dma = kv260_cordic_ov.axi_dma_0
    dma.sendchannel.transfer(in_buffer)
    dma.recvchannel.transfer(out_buffer)
    dma.sendchannel.wait()   # block until all angles have been sent
    dma.recvchannel.wait()   # block until all results are in out_buffer

    # The result for input i is at index i + cordic_pipe_latency
    for i in range(cordic_pipe_latency, data_size):
        result = int(out_buffer[i])
        sin = ((result & 0xFFF) ^ 0x800) - 0x800           # sign-extend bits 11..0
        cos = (((result >> 16) & 0xFFF) ^ 0x800) - 0x800   # sign-extend bits 27..16
        pipe_sin.append(sin / 1024)
        pipe_cos.append(cos / 1024)

    del in_buffer
    del out_buffer

    return (pipe_sin, pipe_cos)


## Call the functions to calculate values
arm_sin, arm_cos = arm_process(angles_rad)
seq_sin, seq_cos = seq_calculation(angles_fxp)
pipe_sin, pipe_cos = pipe_process(angles_fxp)


## Calculate absolute error against ARM calculated values
seq_error = []
pipe_error = []

for arm, seq in zip(arm_cos, seq_cos):
    seq_error.append(seq - arm)
    
for arm, pipe in zip(arm_cos, pipe_cos):
    pipe_error.append(pipe - arm)


## Plot the results and error function
plt.plot(angles_rad, arm_cos)
plt.plot(angles_rad, seq_cos)
plt.plot(angles_rad, pipe_cos)
plt.grid()
plt.legend(["ARM", "seq", "pipe"])
plt.title("Cosine values comparison")
plt.xlabel("Angle [Rad]")
plt.ylabel("value")
plt.show()

plt.plot(angles_rad, seq_error)
plt.plot(angles_rad, pipe_error)
plt.grid()
plt.legend(["seq", "pipe"])
plt.title("Absolute error of cosine value against ARM")
plt.xlabel("Angle [Rad]")
plt.ylabel("Error")
plt.show()

Having tested our sequential and pipelined sine/cosine calculations, we can move on to testing their performance. The cell below uses the `timeit` library to benchmark both calculations. **Don't be alarmed that the function calls are passed as strings; this is one of the ways timeit accepts the code to run!** Timeit executes the given statement in a controlled manner; by default, it temporarily disables garbage collection during the timing, so we mostly measure the actual execution time.



Timeit can also run the benchmarked code several times. Set the `iters` value to change how many runs are performed.

<div class="alert alert-warning"><strong>CAUTION:</strong> Setting the iters value too high will cause the cell to run for a long time.</div>

In [None]:
iters = 10 # Number of iterations to average the time benchmark over

## Time the sequential and pipelined calculations using timeit
seq_t = timeit.timeit('seq_calculation(angles_fxp)', globals=globals(), number=iters)
pipe_t = timeit.timeit('pipe_process(angles_fxp)', globals=globals(), number=iters)


## Display results
print("=================")
print("Processing time [ms]")
print("seq:\t", seq_t/iters * 1000)
print("pipe:\t", pipe_t/iters * 1000)
print("------------------")
print("Avg. element processing time [ms]")
print("seq:\t", seq_t/iters * 1000 / number_of_points)
print("pipe:\t", pipe_t/iters * 1000 / number_of_points)
print("=================")

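As you can see, the pipeline significantly outperforms the sequential approach for a large amount of data. This is expected: although the pipeline is more complex to build, it introduces parallelism. Instead of waiting for all 15 steps to finish for each angle, we can insert new data as soon as the first stage is free. This lets pipelines process large batches or frequent streams of data faster. Note that in this measurement a large part of the difference likely also comes from data movement: the sequential version performs several MMIO accesses from Python for every angle, while the pipelined version moves all angles with a single DMA transfer.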
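
A simple cycle-count model makes the parallelism argument concrete. Assuming each CORDIC needs L = 15 clock cycles per angle and the pipeline accepts one new angle every cycle, N angles take N·L cycles sequentially but only L + N - 1 cycles in the pipeline. This idealized sketch (hypothetical helper names) ignores the Python, MMIO and DMA overheads that are included in the measured times:

```python
def sequential_cycles(n_items, latency=15):
    """Each item waits for the previous one to pass through all stages."""
    return n_items * latency

def pipelined_cycles(n_items, latency=15):
    """First result after `latency` cycles, then one result per cycle."""
    return latency + n_items - 1 if n_items > 0 else 0

for n in (1, 10, 4080):
    seq, pipe = sequential_cycles(n), pipelined_cycles(n)
    print(f"N={n:5d}: seq={seq:6d} cycles, pipe={pipe:5d} cycles, speed-up={seq / pipe:.2f}x")
```

For a single angle both take 15 cycles; as N grows, the ideal speed-up approaches L = 15.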

Let's see how the gain depends on the amount of data for our overlay. The cell below benchmarks the speed for different amounts of input data. We will again use timeit, but this time we run the test for each input data count and compare the overall time and the average time per element.

<div class="alert alert-warning"><strong>CAUTION:</strong> You can still change iters value to get more or less precise benchmark, but keep in mind that each iteration will make each test slower. The effect is cumulative!</div>

In [None]:
iters = 10 # Number of iterations to average the time benchmark over

mes = [10, 20, 50, 100, 200, 500, 1000, 2000, 4000] # Count of data points in each consecutive test

## Containers for output data
seq_times = []
seq_single_times = []
pipe_times = []
pipe_single_times = []

## Do tests for each input data count in mes array
for val in mes:
    angles_slice = angles_fxp[0:val] # Takes a slice of test data of proper size
    
    # Time the sequential calculation
    seq_t = timeit.timeit('seq_calculation(angles_slice)', globals=globals(), number=iters)

    # Time the pipeline calculation
    pipe_t = timeit.timeit('pipe_process(angles_slice)', globals=globals(), number=iters)
    
    ## Average the time and scale it to ms
    seq_times.append(seq_t/iters * 1000)
    seq_single_times.append(seq_t/iters * 1000 / val)
    pipe_times.append(pipe_t/iters * 1000)
    pipe_single_times.append(pipe_t/iters * 1000 / val)
    
## Plot overall processing time in function of input data count
plt.plot(mes, seq_times)
plt.plot(mes, pipe_times)
plt.grid()
plt.title("Processing time")
plt.legend(["seq", "pipe"])
plt.ylabel("t [ms]")
plt.xlabel("number of points")
plt.show()

## Plot average time of processing per element in function of input data count
plt.plot(mes, seq_single_times)
plt.plot(mes, pipe_single_times)
plt.grid()
plt.title("Avg. time per data point")
plt.legend(["seq", "pipe"])
plt.ylabel("t [ms]")
plt.xlabel("number of points")
plt.show()

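The results should show a linear increase in the overall sequential time and a much gentler increase in the overall pipeline time (a fixed overhead for buffer allocation and DMA setup plus a smaller cost per element). Consequently, the sequential time per data point stays roughly constant, while the pipeline time per data point decreases as the fixed overhead is shared among more points, roughly like a/N + b (a hyperbola), as expected.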
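
To check this quantitatively, one option is to fit the model t(N) = a + b·N to the measured totals: `a` estimates the fixed per-call overhead and `b` the cost per element, so the average time per point is a/N + b. A sketch using `np.polyfit` with a hypothetical helper name; the synthetic numbers below are not measurements and only demonstrate the call. On the board, pass `mes` together with `seq_times` or `pipe_times`.

```python
import numpy as np

def fit_overhead_and_slope(counts, total_times):
    """Least-squares fit of total_time = overhead + per_item * count; returns (overhead, per_item)."""
    per_item, overhead = np.polyfit(np.asarray(counts, dtype=float),
                                    np.asarray(total_times, dtype=float), 1)
    return overhead, per_item

# Synthetic data (not measurements): 2 ms overhead + 0.01 ms per point
counts = [10, 100, 1000, 4000]
times = [2.0 + 0.01 * n for n in counts]
overhead, per_item = fit_overhead_and_slope(counts, times)
print(f"overhead ~ {overhead:.3f} ms, per point ~ {per_item:.4f} ms")
```

A near-zero overhead for the sequential version and a clearly positive one for the pipelined version would match the explanation above.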