## Convolution Demo Run
___
<div class="alert bg-primary">This notebook shows a single run of a convolution on the CNNDataflow IP. For one convolution command, the input feature map is read from memory and processed, and the output feature map is captured. The cycle count for the full operation is read and displayed at the end.
The input data in memory is filled with random integers to exercise the convolution run.</div>


### Set the arguments for the convolution in CNNDataflow IP

In [1]:
# Input Feature Map (IFM) dimensions
ifm_height = 14
ifm_width = 14
ifm_depth = 64

# Kernel Window dimensions
kernel_height = 3
kernel_width = 3

# Other arguments
pad = 0
stride = 1

# Channels
channels = 32

print(
    "HOST CMD: CNNDataflow IP Arguments set are - IH %d, IW %d, ID %d, KH %d,"
    " KW %d, P %d, S %d, CH %d"
    % (ifm_height, ifm_width, ifm_depth, kernel_height, kernel_width,
       pad, stride, channels))

HOST CMD: CNNDataflow IP Arguments set are - IH 14, IW 14, ID 64, KH 3, KW 3, P 0, S 1, CH 32
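
With these arguments, the output feature map (OFM) size follows from the standard convolution size formula, `(in + 2 * pad - kernel) // stride + 1`. The sketch below is only an illustration: `conv_output_dim` is a hypothetical helper, not part of the darius library, and it assumes `channels` is the number of output planes (the weight volume later in the notebook is sized `channels * ifm_depth * kernel_height * kernel_width`).

```python
def conv_output_dim(in_dim, kernel_dim, pad, stride):
    """Standard convolution output size along one spatial dimension."""
    return (in_dim + 2 * pad - kernel_dim) // stride + 1


# Values set in the cell above
ofm_height = conv_output_dim(14, 3, pad=0, stride=1)
ofm_width = conv_output_dim(14, 3, pad=0, stride=1)
ofm_depth = 32  # assumed: one output plane per channel

print("OFM dimensions: %d x %d x %d" % (ofm_height, ofm_width, ofm_depth))
# prints: OFM dimensions: 12 x 12 x 32
```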


### Download `Convolution IP` bitstream

In [2]:
from pynq import Overlay

overlay = Overlay(
    "/opt/python3.6/lib/python3.6/site-packages/pynq/overlays/darius/"
    "convolution.bit")
overlay.download()
print(f'Bitstream download status: {overlay.is_loaded()}')

Bitstream download status: True


### Create MMIO object to access the CNNDataflow IP
[MMIO Documentation](http://pynq.readthedocs.io/en/latest/overlay_design_methodology/pspl_interface.html#mmio)

In [3]:
from pynq import MMIO

# Constants
CNNDATAFLOW_BASEADDR = 0x43C00000
NUM_COMMANDS_OFFSET = 0x60
CMD_BASEADDR_OFFSET = 0x70
CYCLE_COUNT_OFFSET = 0xd0

cnn = MMIO(CNNDATAFLOW_BASEADDR, 65536)
print(f'Idle state: {hex(cnn.read(0x0, 4))}')

Idle state: 0x4


### Create Xlnk object 
__Xlnk object (Memory Management Unit) for allocating contiguous arrays in memory for data transfer between software and hardware__

<div class="alert alert-danger">Note: You may run into problems if you exhaust memory buffers without freeing them. Only 128 MB of contiguous memory is available, so running the allocation twice (160 MB in total) leads to a "failed to allocate memory" error. Call xlnk_reset() before re-allocating memory or running the allocation cell a second time.</div>
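
The arithmetic behind this note can be checked directly. This is a small sketch, assuming four buffers of 5,000,000 `uint32` elements each (as the comment in the allocation cell states) and decimal megabytes, which is how the 20 MB figure in that comment is counted.

```python
SIZE = 5000000         # elements per buffer, as in the allocation cell
BYTES_PER_ELEMENT = 4  # numpy.uint32
NUM_BUFFERS = 4        # cmd, ifm, weights, ofm
CMA_TOTAL_MB = 128     # contiguous memory quoted in the note

one_pass_mb = NUM_BUFFERS * SIZE * BYTES_PER_ELEMENT / 1e6
two_passes_mb = 2 * one_pass_mb

print("One allocation pass: %.0f MB" % one_pass_mb)
print("Two allocation passes: %.0f MB (limit %d MB)"
      % (two_passes_mb, CMA_TOTAL_MB))
```

One pass fits within the limit; a second pass without a reset does not.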

In [4]:
from pynq import Xlnk
import numpy as np

# Constant
SIZE = 5000000  # 20 MB of numpy.uint32s

mmu = Xlnk()

# Contiguous memory buffers for CNNDataflow IP convolution command, IFM Volume,
# Weights and OFM Volume. These buffers are shared memories that are used to 
# transfer data between software and hardware
cmd = mmu.cma_array(SIZE)
ifm = mmu.cma_array(SIZE)
weights = mmu.cma_array(SIZE)
ofm = mmu.cma_array(SIZE)

# Save the base physical address of the command, IFM, weights, and OFM
# buffers. These addresses are used later to copy and transfer data
# between hardware and software
cmd_baseaddr = cmd.physical_address
ifm_baseaddr = ifm.physical_address
weights_baseaddr = weights.physical_address
ofm_baseaddr = ofm.physical_address

### Functions to print Xlnk statistics

In [5]:
def get_kb(mmu):
    return int(mmu.cma_stats()['CMA Memory Available'] // 1024)


def get_bufcount(mmu):
    return int(mmu.cma_stats()['Buffer Count'])


def print_kb(mmu):
    print("Available Memory (KB): " + str(get_kb(mmu)))
    print("Available Buffers: " + str(get_bufcount(mmu)))


print_kb(mmu)

Available Memory (KB): 52664
Available Buffers: 4


### Construct convolution command
__Check that the arguments are in the supported range and construct the convolution command for the hardware__

In [6]:
from darius import cnndataflow_lib

conv = cnndataflow_lib.CNNDataflow(ifm_height, ifm_width, ifm_depth,
                                   kernel_height, kernel_width, pad, stride,
                                   channels, ifm_baseaddr, weights_baseaddr,
                                   ofm_baseaddr)

conv.construct_conv_cmd(ifm_height, ifm_width, ifm_depth, kernel_height,
                        kernel_width, pad, stride, channels, ifm_baseaddr,
                        weights_baseaddr, ofm_baseaddr)

All convolution arguments are in supported range
Convolution command to CNNDataflow IP: 
b'\x0e\x00\x0e\x00\x03\x00\x03\x00\x01\x00\x00\x00\x0c\x00\x0c\x00\x08\x00\x04\x00\x01\x00\x01\x00\x00\x00\xd0\x17 \x06\x00\x00\x001\x00\x00\x00\x00\x00\x00\x00\x00P\x1a@\x02\x00\x00\x00\x00\x10\x19@\x02\x00\x00\x00\x12\x00\x00\x00\x00\x00\x00'
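
For a quick sanity check, the start of the printed command can be decoded by hand. The sketch below reads the first 16 bytes as little-endian unsigned 16-bit words. Mapping those words to specific arguments is an inference from the values, not a documented field layout of the CNNDataflow command.

```python
import struct

# First 16 bytes of the command printed above
cmd_prefix = (b'\x0e\x00\x0e\x00\x03\x00\x03\x00'
              b'\x01\x00\x00\x00\x0c\x00\x0c\x00')

# Interpret as eight little-endian unsigned 16-bit words
words = struct.unpack('<8H', cmd_prefix)
print(words)
# prints: (14, 14, 3, 3, 1, 0, 12, 12)
```

The values line up with the IFM height and width (14, 14), the kernel height and width (3, 3), a stride of 1, a pad of 0, and 12 x 12, which is the output size a 3x3 kernel produces on a 14x14 input with no padding and stride 1.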


### Create IFM volume and weight volume.
__Volumes are created in software and populated with random values in a row-major format.__

In [7]:
from random import randint

ifm_sw = np.empty(ifm_width * ifm_height * ifm_depth, dtype=np.int16)

for i in range(0, ifm_depth):
    for j in range(0, ifm_height):
        for k in range(0, ifm_width):
            index = i * ifm_height * ifm_width + j * ifm_width + k
            ifm_sw[index] = randint(0, 255)

weights_sw = np.empty(channels * ifm_depth * kernel_height * kernel_width,
                      dtype=np.int16)

for i in range(0, channels):
    for j in range(0, ifm_depth):
        for k in range(0, kernel_height * kernel_width):
            addr = i * ifm_depth * kernel_height * kernel_width + \
                   j * kernel_height * kernel_width + k
            weights_sw[addr] = randint(0, 255)
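
The nested loops above fill each volume element by element using a hand-computed row-major index. As an alternative sketch (not what the notebook runs), NumPy can generate the same kind of volume in one call, and `np.ravel_multi_index` confirms that the manual index matches NumPy's default C-order flattening. The sample coordinates `(5, 7, 3)` are arbitrary.

```python
import numpy as np

ifm_depth, ifm_height, ifm_width = 64, 14, 14

# NumPy's upper bound is exclusive, so 256 matches random.randint(0, 255)
ifm_3d = np.random.randint(0, 256,
                           size=(ifm_depth, ifm_height, ifm_width),
                           dtype=np.int16)
ifm_flat = ifm_3d.ravel()  # C order (row-major) by default

# Manual index from the loops vs. NumPy's row-major flattening
i, j, k = 5, 7, 3
index = i * ifm_height * ifm_width + j * ifm_width + k
assert index == np.ravel_multi_index((i, j, k), ifm_3d.shape)
assert ifm_flat[index] == ifm_3d[i, j, k]
```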

__Run the following in a code cell for debug:__
```Python
for i in range(0, ifm_width*ifm_height*ifm_depth, 4):
    print(hex(ifm_sw[i]))

for i in range(0, channels*ifm_depth*kernel_height*kernel_width, 4):
    print(hex(weights_sw[i]))
```

### Reshape IFM volume and weights 
__Volumes are reshaped from row-major format to the IP format, and the data is copied to the respective shared buffers__

In [8]:
conv.reshape_and_copy_ifm(ifm_height, ifm_width, ifm_depth, ifm_sw, ifm)
conv.reshape_and_copy_weights(kernel_height, kernel_width, ifm_depth,
                              weights_sw, weights)

__Run the following in a code cell for debug:__   
```Python
for i in range(0, ifm_width*ifm_height*ifm_depth, 4):
    print(hex(ifm[i]))

for i in range(0, channels*ifm_depth*kernel_height*kernel_width, 4):
    print(hex(weights[i]))
```

### Send convolution command to CNNDataflow IP

In [9]:
conv.load_conv_cmd(cmd_baseaddr)

### Start IP

In [10]:
# Write the number of commands and the command buffer's physical address
# to their register offsets
cnn.write(NUM_COMMANDS_OFFSET, 1)
cnn.write(CMD_BASEADDR_OFFSET, cmd_baseaddr)

# Start Convolution if CNNDataflow IP is in Idle state
state = cnn.read(0x0)
if state == 4: # Idle state
    print("state: IP IDLE; HENCE STARTING IP")
    cnn.write(0x0, 1)  # Start IP
else:
    print("state %x: IP BUSY" % state)

state: IP IDLE; HENCE STARTING IP


### Check status of the CNNDataflow IP

In [11]:
# Check if Convolution IP is in Done state
state = cnn.read(0x0)
if state == 6: # Done state
    print("state: IP DONE")
else:
    print("state %x: IP BUSY" % state)

state: IP DONE
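
The cell above reads the state once, so it reports busy if it runs before the IP finishes. A polling loop with a timeout is one way to wait for completion. This is a sketch under stated assumptions: `wait_for_done` is a hypothetical helper, the done value 0x6 is taken from the check above, and the busy value 0x0 used by the stand-in is made up so the example runs without a board.

```python
import time

DONE_STATE = 0x6  # value the cell above treats as done


def wait_for_done(read_state, timeout_s=1.0, poll_interval_s=0.001):
    """Poll read_state() until it returns DONE_STATE or the timeout expires.

    read_state is any zero-argument callable returning the control
    register value, e.g. lambda: cnn.read(0x0) in this notebook.
    Returns True if the done state was seen, False on timeout.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if read_state() == DONE_STATE:
            return True
        time.sleep(poll_interval_s)
    return False


# Stand-in for the hardware: reports an assumed busy value twice, then done
fake_states = iter([0x0, 0x0, DONE_STATE])
print(wait_for_done(lambda: next(fake_states)))
# prints: True
```

On the board, the call would be `wait_for_done(lambda: cnn.read(0x0))`, with the timeout chosen to suit the expected run time.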


__Run the following in a code cell for debug (reads back the first few words of the OFM):__
```Python
for i in range(0, 15, 4):
    print(hex(ofm[i]))
```

### Read cycle count and efficiency of the complete run

In [12]:
hw_cycles = cnn.read(CYCLE_COUNT_OFFSET, 4)
efficiency = conv.calc_efficiency(kernel_height, kernel_width, ifm_depth,
                                  hw_cycles)
print("CNNDataflow IP cycles: %d, efficiency: %.2f%%"
      % (hw_cycles, efficiency))

CNNDataflow IP cycles: 44141, efficiency: 93.95%


### Reset Xlnk

In [13]:
mmu.xlnk_reset()
print_kb(mmu)
print("Cleared Memory!")

Available Memory (KB): 130040
Available Buffers: 0
Cleared Memory!
