# Matrix multiplication (2)

This example runs two 32x32 single-precision floating-point matrix multiplications in parallel in hardware. The data is streamed to and from the IPs using AXI DMA components. An alternative is to use a multi-channel DMA.

First, the bitstream (.bit) is loaded onto the FPGA; PYNQ reads the associated hardware description (.hwh) alongside it:

In [12]:
from pynq import Overlay

overlay = Overlay("/home/xilinx/overlays/matmul_2.bit")
overlay.download()

The components in the design and all associated metadata can be found in the `ip_dict`.

In [13]:
list(overlay.ip_dict.keys())

['axi_dma_0', 'axi_dma_1', 'matmul_0', 'matmul_1', 'processing_system7_0']

Next, the input and output matrices are allocated and populated with random data.

In [14]:
from pynq import allocate
import numpy as np

A = allocate(shape=(32,32), dtype=np.single)
B = allocate(shape=(32,32), dtype=np.single)
C = allocate(shape=(32,32), dtype=np.single)
D = allocate(shape=(32,32), dtype=np.single)

A[:] = np.random.rand(32, 32)
B[:] = np.random.rand(32, 32)
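
As a small aside, the buffer dimensions can be checked with plain NumPy. The sketch below uses an ordinary array as a stand-in for a `pynq.allocate` buffer (an assumption, made so that it runs off the board). `np.single` is NumPy's 32-bit float, and `np.random.rand` produces 64-bit floats that are cast down on assignment.

```python
import numpy as np

# Ordinary NumPy array standing in for a pynq.allocate buffer (assumption).
A_host = np.zeros((32, 32), dtype=np.single)

# np.random.rand returns float64; slice assignment casts to the buffer dtype.
A_host[:] = np.random.rand(32, 32)

print(A_host.dtype)   # float32
print(A_host.nbytes)  # 4096 bytes = 32 * 32 * 4
```

Each DMA transfer of one matrix therefore moves 4096 bytes.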

Start the `matmul` IPs. Because the IPs' registers are memory mapped, this is done by writing the start and auto-restart bits (`0x81`) to the control register at offset `0x0`.

In [15]:
overlay.matmul_0.mmio.write(0x0, 0x81)
overlay.matmul_1.mmio.write(0x0, 0x81)
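
The value `0x81` can be read as a combination of individual control bits. The sketch below assumes the `matmul` IPs use the standard Vivado HLS `ap_ctrl_hs` control register layout, which the notebook does not state explicitly; the constant names are illustrative only.

```python
# Bit positions of the control register at offset 0x0, assuming the
# standard Vivado HLS ap_ctrl_hs layout (not confirmed by this design).
AP_START = 1 << 0      # write 1 to start the IP
AP_DONE = 1 << 1       # status: set when a run has finished
AP_IDLE = 1 << 2       # status: set while the IP is idle
AP_READY = 1 << 3      # status: set when the IP can accept new input
AUTO_RESTART = 1 << 7  # restart automatically after each run

# Start the IP and keep it restarting, as the cell above does.
ctrl = AP_START | AUTO_RESTART
print(hex(ctrl))  # 0x81
```

With auto-restart set, the IP would not need to be started again before each new pair of matrices.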

Stream the A and B matrices to both IPs and wait until the results have been streamed back into matrices C and D.

In [16]:
overlay.axi_dma_0.sendchannel.transfer(A)
overlay.axi_dma_0.sendchannel.transfer(B)
overlay.axi_dma_1.sendchannel.transfer(A)
overlay.axi_dma_1.sendchannel.transfer(B)
overlay.axi_dma_0.recvchannel.transfer(C)
overlay.axi_dma_1.recvchannel.transfer(D)
overlay.axi_dma_0.recvchannel.wait()
overlay.axi_dma_1.recvchannel.wait()
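
A more defensive ordering is sketched below: it arms the receive channels first and waits for each send to finish before queuing the next operand. The helper `stream_matmul` is hypothetical and not part of PYNQ; it only assumes that each DMA object exposes `sendchannel` and `recvchannel` with `transfer()` and `wait()`, as used in the cell above.

```python
def stream_matmul(dmas, a, b, outputs):
    """Send a and b to every DMA in dmas and collect one result per DMA.

    Hypothetical helper: dmas and outputs are equal-length sequences, and
    each DMA must offer sendchannel/recvchannel with transfer() and wait().
    """
    # Arm the receive channels first so the results have a destination.
    for dma, out in zip(dmas, outputs):
        dma.recvchannel.transfer(out)

    # Send each operand to all DMAs, then wait for the send channels
    # to finish before queuing the next operand.
    for operand in (a, b):
        for dma in dmas:
            dma.sendchannel.transfer(operand)
        for dma in dmas:
            dma.sendchannel.wait()

    # Block until every result has been streamed back.
    for dma in dmas:
        dma.recvchannel.wait()

# Usage on the board (not run here):
# stream_matmul([overlay.axi_dma_0, overlay.axi_dma_1], A, B, [C, D])
```

Both DMAs still receive each operand before either is waited on, so the two multiplications can proceed in parallel.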

Now we can check whether the hardware results (C and D) differ from the regular software version (computed with `@`):

In [17]:
print(A@B)
print(C)
print(D)

[[7.8404207 8.229252  6.598717  ... 6.9341702 7.431403  9.261841 ]
 [6.195579  7.4207535 5.545655  ... 7.7409186 6.985519  7.663156 ]
 [6.9762025 8.182006  6.3304424 ... 7.8341413 7.0130005 8.248719 ]
 ...
 [6.3516955 7.0457754 5.5058703 ... 7.9323773 6.3141127 7.1473174]
 [6.2141557 7.0981708 5.555508  ... 7.455466  6.7106276 7.2171183]
 [6.3661427 7.1328535 5.00865   ... 8.134221  7.45771   7.40928  ]]
[[7.8404207 8.229252  6.598717  ... 6.9341702 7.431403  9.261841 ]
 [6.195579  7.4207535 5.545655  ... 7.7409186 6.985519  7.663156 ]
 [6.9762025 8.182006  6.3304424 ... 7.8341413 7.0130005 8.248719 ]
 ...
 [6.3516955 7.0457754 5.5058703 ... 7.9323773 6.3141127 7.1473174]
 [6.2141557 7.0981708 5.555508  ... 7.455466  6.7106276 7.2171183]
 [6.3661427 7.1328535 5.00865   ... 8.134221  7.45771   7.40928  ]]
[[7.8404207 8.229252  6.598717  ... 6.9341702 7.431403  9.261841 ]
 [6.195579  7.4207535 5.545655  ... 7.7409186 6.985519  7.663156 ]
 [6.9762025 8.182006  6.3304424 ... 7.8341413 7.01

As expected, the hardware results in C and D match the software result.
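
Comparing printed arrays by eye only covers the values that are displayed. A numerical check could look like the sketch below, which uses `np.allclose` on stand-in data so that it runs without the board. The helper name `matches_reference` and the tolerances are assumptions; on the board, `hw_result` would be C or D.

```python
import numpy as np

def matches_reference(a, b, result, rtol=1e-5, atol=1e-6):
    """Return True if result is numerically close to a @ b.

    The tolerances are illustrative; hardware may accumulate the
    products in a different order than NumPy does.
    """
    return bool(np.allclose(a @ b, result, rtol=rtol, atol=atol))

# Stand-in data so the sketch runs off the board.
rng = np.random.default_rng(0)
a = rng.random((32, 32), dtype=np.float32)
b = rng.random((32, 32), dtype=np.float32)
hw_result = a @ b  # placeholder for the hardware output C or D

print(matches_reference(a, b, hw_result))  # True
```

A tolerance-based comparison is used instead of exact equality because floating-point results can differ in the last bits depending on summation order.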