# Overlay Design Methodology

## Developing a scalar multiplier IP using HLS

Suppose we, or someone else, develop a new overlay and want to reuse the existing IP. As long as they import the Python file containing the driver class, the drivers will be created automatically. As an example, consider the next design which, among other things, includes a renamed version of the `scalar_add` IP.

![Second Block Diagram](../images/attribute2.png)

In [13]:
from pynq import Overlay
overlay = Overlay('/home/xilinx/overlay_tutorial/overlays/scalar_mult.bit')
#overlay?

## IP Hierarchies

The block diagram above also contains a hierarchy `const_multiply`, which looks like this:

![Hierarchy](../images/hierarchy.png)

The hierarchy contains a custom IP with an input stream, an output stream and an AXI4-Lite interface, as well as a DMA engine for transferring the data. The custom IP multiplies the value of `data` in the input stream by `constant` and outputs the result, leaving the remaining signals unmodified. Because streams are involved, we need to handle `TLAST` appropriately for the DMA engine. The HLS code is a little more complex, with additional pragmas and types, but the complete code is still relatively short.

```C
#include "ap_axi_sdata.h"
typedef ap_axiu<32,1,1,1> stream_type;

void mult_constant(stream_type* in_data, stream_type* out_data, ap_int<32> constant) {
#pragma HLS INTERFACE s_axilite register port=constant
#pragma HLS INTERFACE ap_ctrl_none port=return
#pragma HLS INTERFACE axis port=in_data
#pragma HLS INTERFACE axis port=out_data
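	// Scale the payload; every side-channel signal passes through unchanged.
	out_data->data = in_data->data * constant;
	out_data->dest = in_data->dest;
	out_data->id = in_data->id;
	out_data->keep = in_data->keep;
	// Forwarding TLAST lets the DMA engine see the end of each packet.
	out_data->last = in_data->last;
	out_data->strb = in_data->strb;
	out_data->user = in_data->user;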
}
```
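
To make the stream behaviour concrete, here is a minimal software model of the IP in Python. It is a sketch for reasoning on the host, not something generated by HLS or provided by PYNQ. The `Beat` class and the `mult_constant_model` function are hypothetical names introduced here. The model assumes that the 32-bit `data` field keeps only the low 32 bits of the product and that every other signal, including `last`, is copied through.

```python
from dataclasses import dataclass, replace
from typing import Iterable, List

MASK32 = 0xFFFFFFFF  # the data field is 32 bits wide


@dataclass(frozen=True)
class Beat:
    """One AXI4-Stream transfer (hypothetical host-side model)."""
    data: int
    last: bool = False
    keep: int = 0xF
    strb: int = 0xF
    user: int = 0
    id: int = 0
    dest: int = 0


def mult_constant_model(beats: Iterable[Beat], constant: int) -> List[Beat]:
    """Multiply `data` by `constant`; copy every other field unchanged."""
    return [replace(b, data=(b.data * constant) & MASK32) for b in beats]


# Five-word packet; TLAST is set on the final beat only.
packet = [Beat(data=i, last=(i == 4)) for i in range(5)]
result = mult_constant_model(packet, 3)
print([b.data for b in result])   # [0, 3, 6, 9, 12]
print([b.last for b in result])   # [False, False, False, False, True]
```

In this model the end-of-packet marker appears on the same beat at the output as at the input, which is the property the DMA engine relies on.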

Looking at the HLS-generated documentation, we again discover that the constant is set through the register at offset `0x10`, so we can write a simple driver for this purpose:

In [14]:
from pynq import DefaultIP

class ConstantMultiplyDriver(DefaultIP):
    def __init__(self, description):
        super().__init__(description=description)
    
    bindto = ['xilinx.com:hls:mult_constant:1.0']
    
    @property
    def constant(self):
        return self.read(0x10)
    
    @constant.setter
    def constant(self, value):
        self.write(0x10, value)
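
The property pattern can be tried without a board. The sketch below replaces `DefaultIP` with a hypothetical `FakeRegisterIP` stand-in that keeps register values in a dictionary. Only the `read(offset)` and `write(offset, value)` calls mirror the methods used above; the class names, the `CONSTANT_OFFSET` attribute and the 32-bit masking are assumptions made for illustration.

```python
class FakeRegisterIP:
    """Hypothetical stand-in for DefaultIP: registers live in a dict."""

    def __init__(self):
        self._regs = {}

    def read(self, offset):
        # Unwritten registers read back as zero in this model.
        return self._regs.get(offset, 0)

    def write(self, offset, value):
        # Keep only 32 bits, as a 32-bit register would.
        self._regs[offset] = value & 0xFFFFFFFF


class FakeConstantMultiply(FakeRegisterIP):
    CONSTANT_OFFSET = 0x10  # offset taken from the HLS-generated documentation

    @property
    def constant(self):
        return self.read(self.CONSTANT_OFFSET)

    @constant.setter
    def constant(self, value):
        self.write(self.CONSTANT_OFFSET, value)


ip = FakeConstantMultiply()
ip.constant = 3  # attribute assignment becomes a register write
print(hex(FakeConstantMultiply.CONSTANT_OFFSET), ip.constant)  # 0x10 3
```

Naming the offset once keeps the getter and setter in step if the register map changes.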

The DMA engine driver is already included in the PYNQ library, so nothing special is needed other than ensuring the module is imported. Reloading the overlay makes sure that our newly written driver is available for use.

In [15]:
import pynq.lib.dma

overlay = Overlay('/home/xilinx/overlay_tutorial/overlays/scalar_mult.bit')

dma = overlay.const_multiply.multiply_dma
multiply = overlay.const_multiply.multiply

The DMA driver transfers numpy arrays allocated using `pynq.allocate`. Let's test the system by multiplying 5 numbers by 3.

In [18]:
from pynq import allocate
import numpy as np

in_buffer = allocate(shape=(5,), dtype=np.uint32)
out_buffer = allocate(shape=(5,), dtype=np.uint32)

for i in range(5):
    in_buffer[i] = i

multiply.constant = 3
dma.sendchannel.transfer(in_buffer)
dma.recvchannel.transfer(out_buffer)
dma.sendchannel.wait()
dma.recvchannel.wait()

out_buffer

PynqBuffer([ 0,  3,  6,  9, 12], dtype=uint32)

While this is one way to use the IP, it still isn't exactly user-friendly. It would be preferable to treat the entire hierarchy as a single entity and write a driver that hides the implementation details. The overlay class allows for drivers to be written against hierarchies as well as IP but the details are slightly different.

Hierarchy drivers are subclasses of `pynq.DefaultHierarchy` and, like `DefaultIP` drivers, have a constructor that takes a description of the hierarchy. To determine whether the driver should bind to a particular hierarchy, the class must also contain a static `checkhierarchy` method, which takes the description of a hierarchy and returns `True` if the driver should be bound or `False` if not. As with `DefaultIP`, any class that subclasses `DefaultHierarchy` and has a `checkhierarchy` method is registered automatically.

For our constant multiply hierarchy this would look something like:

In [19]:
from pynq import DefaultHierarchy

class StreamMultiplyDriver(DefaultHierarchy):
    def __init__(self, description):
        super().__init__(description)
        
    def stream_multiply(self, stream, constant):
        self.multiply.constant = constant
        with allocate(shape=(len(stream),), dtype=np.uint32) as in_buffer, \
             allocate(shape=(len(stream),), dtype=np.uint32) as out_buffer:
            for i, v in enumerate(stream):
                in_buffer[i] = v
            self.multiply_dma.sendchannel.transfer(in_buffer)
            self.multiply_dma.recvchannel.transfer(out_buffer)
            self.multiply_dma.sendchannel.wait()
            self.multiply_dma.recvchannel.wait()
            result = out_buffer.copy()
        return result

    @staticmethod
    def checkhierarchy(description):
        if 'multiply_dma' in description['ip'] \
           and 'multiply' in description['ip']:
            return True
        return False
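
The binding predicate can be checked on its own, without loading a bitstream. The sketch below applies the same `checkhierarchy` test to plain dictionaries. The `'ip'` key is the one the driver above inspects; the simplified descriptions, the `some_gpio` name, the `FakeStreamMultiplyDriver` class and the `pick_driver` helper are hypothetical and do not reflect how PYNQ performs the lookup internally.

```python
class FakeStreamMultiplyDriver:
    """Hypothetical driver exposing only the static predicate."""

    @staticmethod
    def checkhierarchy(description):
        ips = description['ip']
        return 'multiply_dma' in ips and 'multiply' in ips


def pick_driver(description, candidates):
    """Return the first candidate whose predicate accepts the hierarchy."""
    for cls in candidates:
        if cls.checkhierarchy(description):
            return cls
    return None


# Simplified descriptions holding only the key the predicate inspects.
const_multiply = {'ip': {'multiply_dma': {}, 'multiply': {}}}
other_block = {'ip': {'some_gpio': {}}}

matched = pick_driver(const_multiply, [FakeStreamMultiplyDriver])
unmatched = pick_driver(other_block, [FakeStreamMultiplyDriver])
print(matched.__name__)  # FakeStreamMultiplyDriver
print(unmatched)         # None
```

A predicate that tests for the presence of named IP blocks, rather than for the hierarchy's own name, keeps working if the hierarchy is renamed in a later design.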

We can now reload the overlay and ensure the higher-level driver is loaded

In [20]:
overlay = Overlay('/home/xilinx/overlay_tutorial/overlays/scalar_mult.bit')
overlay?

and use it

In [21]:
overlay.const_multiply.stream_multiply([1,2,3,4,5], 0)

PynqBuffer([0, 0, 0, 0, 0], dtype=uint32)