# ARRAY INPUTS - DOT - OPTIMIZATIONS

This notebook shows how to use the ```pynq``` framework to test the optimized, hardware-accelerated dot multiplication kernel on an ```Alveo U200```.

In [None]:
import numpy as np
from pynq import Overlay
from pynq import allocate
from pynq import DefaultIP
from pynq import DefaultHierarchy
from pynq import MMIO
from pynq.pl import *
import pynq.lib.dma

The function below initializes the FPGA: it loads the synthesized design into an overlay object (```ol```), which holds all the information needed to run the IP module, and stores a reference to the IP itself (```dot```). At the end of initialization it prints the signature of the C function.

In [None]:
def init_hw(filepath):
    global ol, dot
    ol = Overlay(filepath)
    dot = ol.dot_matrix_1
    print(dot.signature)

In [None]:
init_hw("/path/to/binary_container_1.xclbin")

This block allocates and initializes the buffers used later. Each allocation must specify the shape and data type declared for the corresponding kernel argument in Vivado HLS. Using ```numpy``` data types is recommended.

In [None]:
DIM = 300

a = allocate(shape=(DIM, DIM), dtype=np.int32, cacheable=True)
b = allocate(shape=(DIM, DIM), dtype=np.int32, cacheable=True)
c = allocate(shape=(DIM, DIM), dtype=np.int32, cacheable=True)

# Fill both inputs with 3 and clear the output buffer.
a[:] = np.full((DIM, DIM), 3, dtype=np.int32)
b[:] = np.full((DIM, DIM), 3, dtype=np.int32)
c[:] = np.zeros((DIM, DIM), dtype=np.int32)
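The shape and dtype given at allocation time determine how many bytes are moved to and from the card, so they should agree with the kernel's argument declaration. The following is a minimal host-only sketch: it uses a plain NumPy array as a stand-in for a device buffer (an assumption, so that it runs without the board), and the name `host_a` is purely illustrative.

```python
import numpy as np

DIM = 300

# Plain NumPy stand-in for a buffer returned by pynq.allocate
# (assumption: same shape and dtype as the buffers allocated above).
host_a = np.full((DIM, DIM), 3, dtype=np.int32)

# Each int32 element occupies 4 bytes, so the buffer size is DIM * DIM * 4.
bytes_per_buffer = host_a.nbytes
print(host_a.dtype, host_a.shape, bytes_per_buffer)
```

With `DIM = 300` each buffer holds 360000 bytes, so the three buffers together occupy a little over 1 MB of global memory.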

The buffers allocated above are now copied to the global memory of the Alveo card.

In [None]:
a.sync_to_device()
b.sync_to_device()
c.sync_to_device()

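The ```call``` function starts the IP module and returns only once execution has finished, so the buffers are never read or written while the kernel is still running, which could otherwise lead to race conditions. For asynchronous execution, ```start``` returns a handle whose ```wait``` function provides the same synchronization.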

In [None]:
dot.call(a, b, c)
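Since the goal of the notebook is to test the acceleration, it can help to time the kernel call against a software baseline. The sketch below is one possible way to do that, not the notebook's own method: `time_call` is a hypothetical helper, and the baseline assumes the kernel computes an ordinary matrix product, which the source does not state explicitly.

```python
import time

import numpy as np


def time_call(fn, *args, repeats=3):
    """Return the best wall-clock time in seconds over `repeats` runs of fn(*args)."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        best = min(best, time.perf_counter() - start)
    return best


DIM = 300
sw_a = np.full((DIM, DIM), 3, dtype=np.int32)
sw_b = np.full((DIM, DIM), 3, dtype=np.int32)

# Software baseline: NumPy matrix product on the host CPU.
sw_seconds = time_call(np.matmul, sw_a, sw_b)
print(f"software baseline: {sw_seconds:.6f} s")

# On the board, the same helper could wrap the kernel call, for example:
#   hw_seconds = time_call(dot.call, a, b, c)
```

Taking the best of several runs reduces the influence of one-off delays such as first-call setup. The measured hardware time covers only the kernel execution, not the host-to-device transfers.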

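The ```sync_from_device``` function is called on the output buffer to copy the result from the global memory of the Alveo card back to the host (```invalidate``` serves the same purpose). No further computation is run on the device, so only ```c``` needs to be transferred.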

In [None]:
c.sync_from_device()

In [None]:
# Copy the result into ordinary host memory before the buffers are freed.
result = np.array(c)
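A simple way to gain confidence in the hardware result is to compare it with a reference computed on the host. The sketch below assumes the kernel implements a standard matrix product (not confirmed by the source); `matches_reference` and the `ref_*` arrays are hypothetical names, and plain NumPy arrays stand in for the device buffers so that it runs without the board.

```python
import numpy as np


def matches_reference(result, a, b):
    """Return True if `result` equals the NumPy matrix product of `a` and `b`."""
    expected = np.matmul(np.asarray(a), np.asarray(b))
    return np.array_equal(np.asarray(result), expected)


DIM = 300
ref_a = np.full((DIM, DIM), 3, dtype=np.int32)
ref_b = np.full((DIM, DIM), 3, dtype=np.int32)

# With every input element equal to 3, each output element is 3 * 3 * DIM.
ref_c = np.matmul(ref_a, ref_b)
print(ref_c[0, 0], matches_reference(ref_c, ref_a, ref_b))
```

On the board, the check would take the synchronized output buffer and the two input buffers as arguments. For the inputs used in this notebook, every element of a correct matrix product equals 2700.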

In [None]:
del a
del b
del c