## Convolution Demo Run
___
<div class="alert bg-primary">This notebook demonstrates a single convolution run on the CNNDataflow IP. The input feature map (IFM) is read from memory and processed, and the output feature map (OFM) is captured for a single convolution command. The cycle count for the full operation is read and displayed at the end.
The input data in memory is filled with random integers to exercise the convolution.</div>


### Set the arguments for the convolution in CNNDataflow IP

In [1]:
# Input Feature Map (IFM) dimensions
ifm_height = 14
ifm_width = 14
ifm_depth = 64

# Kernel Window dimensions
kernel_height = 3
kernel_width = 3

# Other arguments
pad = 0
stride = 1

# Channels
channels = 32

print(
    "HOST CMD: CNNDataflow IP Arguments set are - IH %d, IW %d, ID %d, KH %d,"
    " KW %d, P %d, S %d, CH %d"
    % (ifm_height, ifm_width, ifm_depth, kernel_height, kernel_width,
       pad, stride, channels))

HOST CMD: CNNDataflow IP Arguments set are - IH 14, IW 14, ID 64, KH 3, KW 3, P 0, S 1, CH 32
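
The output feature map (OFM) size implied by these arguments follows from standard convolution arithmetic. The sketch below is not part of the CNNDataflow library; `conv_output_dim` is a hypothetical helper name used only for illustration, and it assumes a plain, non-dilated convolution.

```python
def conv_output_dim(in_dim, kernel_dim, pad, stride):
    """Output size along one axis for a standard (non-dilated) convolution."""
    return (in_dim + 2 * pad - kernel_dim) // stride + 1


# Same values as the arguments set in the cell above.
ofm_height = conv_output_dim(14, 3, pad=0, stride=1)
ofm_width = conv_output_dim(14, 3, pad=0, stride=1)
ofm_depth = 32  # one output plane per channel

print(f"OFM: {ofm_height} x {ofm_width} x {ofm_depth}")
```

For a 14 x 14 input, a 3 x 3 kernel, no padding and stride 1, each spatial dimension shrinks to 12.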


### Download `Convolution IP` bitstream

In [5]:
from pynq import Overlay

overlay = Overlay(
    "/opt/python3.6/lib/python3.6/site-packages/pynq/overlays/darius/"
    "convolution.bit")
overlay.download()
print(f'Bitstream download status: {overlay.is_loaded()}')

Bitstream download status: True


### Create MMIO object to access the CNNDataflow IP
[MMIO Documentation](http://pynq.readthedocs.io/en/latest/overlay_design_methodology/pspl_interface.html#mmio)

In [6]:
from pynq import MMIO

# Constants
CNNDATAFLOW_BASEADDR = 0x43C00000
NUM_COMMANDS_OFFSET = 0x60
CMD_BASEADDR_OFFSET = 0x70
CYCLE_COUNT_OFFSET = 0xd0

cnn = MMIO(CNNDATAFLOW_BASEADDR, 65536)
print(f'Idle state: {hex(cnn.read(0x0, 4))}')

Idle state: 0x4


### Create Xlnk object
__The Xlnk object (Memory Management Unit) allocates contiguous arrays in memory for data transfer between software and hardware.__

In [7]:
from pynq import Xlnk
import numpy as np

# Constant
SIZE = 5000000  # 20 MB of numpy.uint32s

mmu = Xlnk()

# Contiguous memory buffers for CNNDataflow IP convolution command, IFM Volume,
# Weights and OFM Volume. These buffers are shared memories that are used to 
# transfer data between software and hardware
cmd = mmu.cma_array(SIZE)
ifm = mmu.cma_array(SIZE)
weights = mmu.cma_array(SIZE)
ofm = mmu.cma_array(SIZE)

# Save the base physical address of the command, ifm, weights, and
# ofm buffers. These addresses are used later to copy and transfer data
# between hardware and software
cmd_baseaddr = cmd.physical_address
ifm_baseaddr = ifm.physical_address
weights_baseaddr = weights.physical_address
ofm_baseaddr = ofm.physical_address

### Functions to print Xlnk statistics

In [8]:
def get_kb(mmu):
    return int(mmu.cma_stats()['CMA Memory Available'] // 1024)


def get_bufcount(mmu):
    return int(mmu.cma_stats()['Buffer Count'])


def print_kb(mmu):
    print("Available Memory (KB): " + str(get_kb(mmu)))
    print("Available Buffers: " + str(get_bufcount(mmu)))


print_kb(mmu)

Available Memory (KB): 46212
Available Buffers: 4


### Construct convolution command
__Check that the arguments are in the supported range and construct the convolution command for the hardware.__

In [9]:
from darius import cnndataflow_lib

conv = cnndataflow_lib.CNNDataflow(ifm_height, ifm_width, ifm_depth,
                                   kernel_height, kernel_width, pad, stride,
                                   channels, ifm_baseaddr, weights_baseaddr,
                                   ofm_baseaddr)

conv.construct_conv_cmd(ifm_height, ifm_width, ifm_depth, kernel_height,
                        kernel_width, pad, stride, channels, ifm_baseaddr,
                        weights_baseaddr, ofm_baseaddr)

All convolution arguments are in supported range
Convolution command to CNNDataflow IP: 
b'\x0e\x00\x0e\x00\x03\x00\x03\x00\x01\x00\x00\x00\x0c\x00\x0c\x00\x08\x00\x04\x00\x01\x00\x01\x00\x00\x00\xd0\x17 \x06\x00\x00\x001\x00\x00\x00\x00\x00\x00\x00\x00P\x1a@\x02\x00\x00\x00\x00\x10\x19@\x02\x00\x00\x00\x12\x00\x00\x00\x00\x00\x00'
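
The leading bytes of the printed command can be inspected with the standard `struct` module. This is only a sketch: it assumes the command begins with little-endian unsigned 16-bit fields, which is an inference from the byte values and not a layout documented in this notebook.

```python
import struct

# First 16 bytes of the command printed above.
cmd_prefix = b'\x0e\x00\x0e\x00\x03\x00\x03\x00\x01\x00\x00\x00\x0c\x00\x0c\x00'

# Assumption: eight little-endian unsigned 16-bit fields.
fields = struct.unpack('<8H', cmd_prefix)
print(fields)
```

The decoded values 14, 14, 3, 3, 1, 0, 12, 12 line up with the IFM size, kernel size, stride, pad and a 12 x 12 output, but that field mapping is a guess based on the numbers alone.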


### Create IFM volume and weight volume.
__Volumes are created in software and populated with random values in a row-major format.__

In [10]:
from random import randint

ifm_sw = np.empty(ifm_width * ifm_height * ifm_depth, dtype=np.int16)

for i in range(0, ifm_depth):
    for j in range(0, ifm_height):
        for k in range(0, ifm_width):
            index = i * ifm_height * ifm_width + j * ifm_width + k
            ifm_sw[index] = randint(0, 255)

weights_sw = np.empty(channels * ifm_depth * kernel_height * kernel_width,
                      dtype=np.int16)

for i in range(0, channels):
    for j in range(0, ifm_depth):
        for k in range(0, kernel_height * kernel_width):
            addr = i * ifm_depth * kernel_height * kernel_width + \
                   j * kernel_height * kernel_width + k
            weights_sw[addr] = randint(0, 255)

__Run the following in a code cell for debug:__
```Python
for i in range(0, ifm_width*ifm_height*ifm_depth, 4):
    print(hex(ifm_sw[i]))

for i in range(0, channels*ifm_depth*kernel_height*kernel_width, 4):
    print(hex(weights_sw[i]))
```
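
The nested loops visit every flat index exactly once, so the same volumes can be filled in one vectorized call. The following is a minimal sketch using `numpy.random.randint` rather than the `random` module used in the cell above; the dimensions are repeated here only to keep the block self-contained.

```python
import numpy as np

ifm_height, ifm_width, ifm_depth = 14, 14, 64
kernel_height, kernel_width, channels = 3, 3, 32

# numpy's randint excludes the upper bound, so high=256 yields 0..255,
# matching random.randint(0, 255), which includes both ends.
ifm_sw = np.random.randint(
    0, 256, size=ifm_depth * ifm_height * ifm_width, dtype=np.int16)
weights_sw = np.random.randint(
    0, 256, size=channels * ifm_depth * kernel_height * kernel_width,
    dtype=np.int16)

print(ifm_sw.size, weights_sw.size)
```

The resulting arrays have the same length, dtype and value range as the loop-built ones, though the random sequence itself differs.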

### Reshape IFM volume and weights
__Volumes are reshaped from row-major format to the IP format, and the data is copied to the respective shared buffers.__

In [11]:
conv.reshape_and_copy_ifm(ifm_height, ifm_width, ifm_depth, ifm_sw, ifm)
conv.reshape_and_copy_weights(kernel_height, kernel_width, ifm_depth,
                              weights_sw, weights)

__Run the following in a code cell for debug:__
```Python
for i in range(0, ifm_width*ifm_height*ifm_depth, 4):
    print(hex(ifm[i]))

for i in range(0, channels*ifm_depth*kernel_height*kernel_width, 4):
    print(hex(weights[i]))
```

### Send convolution command to CNNDataflow IP

In [12]:
conv.load_conv_cmd(cmd_baseaddr)

### Start IP

In [13]:
# Load the number of commands and command physical address to offset addresses
cnn.write(NUM_COMMANDS_OFFSET, 1)
cnn.write(CMD_BASEADDR_OFFSET, cmd_baseaddr)

# Start Convolution if CNNDataflow IP is in Idle state
state = cnn.read(0x0)
if state == 4:  # Idle state
    print("state: IP IDLE")
    cnn.write(0x0, 1)  # Start IP
else:
    print("state %x: IP BUSY" % state)

state: IP IDLE


### Check status of the CNNDataflow IP

In [14]:
# Check if Convolution IP is in Done state
state = cnn.read(0x0)
if state == 6: # Done state
    print("state: IP DONE")
else:
    print("state %x: IP BUSY" % state)

state: IP DONE
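
A single read reports BUSY whenever the cell runs before the IP has finished. One way to handle this is to poll the status register with a timeout, sketched below. `wait_for_state` is a hypothetical helper, not part of pynq or the darius library. The done value `0x6` is taken from the check above, and the busy value `0x1` in the stand-in reader is made up so the sketch runs without a board.

```python
import time

DONE_STATE = 0x6  # value checked for "IP DONE" above


def wait_for_state(read_state, target, timeout_s=1.0, poll_s=0.001):
    """Poll read_state() until it returns target or timeout_s elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        state = read_state()
        if state == target:
            return state
        if time.monotonic() >= deadline:
            raise TimeoutError(f"state {state:#x}, expected {target:#x}")
        time.sleep(poll_s)


# Stand-in for the hardware register: returns a hypothetical busy value
# twice, then DONE_STATE.
_fake_reads = iter([0x1, 0x1, DONE_STATE])
final_state = wait_for_state(lambda: next(_fake_reads), DONE_STATE)
print(f"state: {final_state:#x}")
```

On the board, the reader would be `lambda: cnn.read(0x0)`, using the `cnn` MMIO object created earlier.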


### Read back the convolved OFM buffer

In [15]:
conv.read_ofm(ofm_baseaddr)

0x2bb
0xe6f1
0xc3c2
0x8739
0xdda5
0x5451
0x1d90
0x9eef
0xa13d
0x7469
0x2dd2
0xdb7b
0x814c
0xfc22
0xa9c1
0xae5b
0x7a98
0xac91
0xe41f
0x52d6
0xc6b0
0x210
0xf4b9
0xb4e6
0x3a0d
0x7ddf
0xa522
0x53cb
0x8401
0xcc80
0x4580
0xbcb7
0xa24d
0xcf8b
0xe870
0x9d8a
0x89f2
0x43d3
0x963a
0x1ef5
0x70b9
0xdcb9
0xd611
0x14db
0xc5dd
0x6311
0x216d
0xb140
0x1010
0xcaf9
0xa037
0xad7c
0xad3c
0x8c34
0x8599
0xb61b
0xd422
0xf3ae
0xf2dc
0x48db
0x4d06
0x91d8
0xfc96
0x5d21
0xe495
0xfd93
0xe08d
0x6787
0x9bd1
0x6932
0x81cd
0x2fb3
0xbcd6
0x8725
0xfcd8
0x7299
0x197d
0xb331
0xac84
0x9392
0xea02
0x6cd8
0x6945
0x8573
0xb406
0xc357
0xbfef
0xd1bc
0x387d
0x982
0x8414
0x8b2f
0x1ca6
0xf3f7
0xcaa4
0x3f6
0xea03
0x2635
0xe3b2
0xd342
0x5191
0x910b
0x569e
0xfee2
0xee3f
0x227
0x1816
0x51b3
0x7838
0x9631
0x293c
0xc439
0x8311
0xa406
0xdd4d
0xf700
0x5334
0x8189
0x25b6
0x5a32
0x35c0
0xf143
0x8a12
0xb77
0x75d0
0xa88c
0xebd3
0x50ad
0x7c44
0x9c59
0x8c7a
0x5d5c
0xf7b0
0xcb51
0xffc7
0xfd0f
0xb71e
0x34c8
0xc561
0xc1a7
0x86c2
0x2300
0xc70b
0x647

### Read cycle count and efficiency of the complete run

In [16]:
hw_cycles = cnn.read(CYCLE_COUNT_OFFSET, 4)
conv.calc_efficiency(kernel_height, kernel_width, ifm_depth, hw_cycles)

CNNDataflow IP cycles: 44440, effciency: 93.32%
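
The reported figure can be reproduced by comparing the multiply-accumulate (MAC) count of the convolution with the cycle count. This is a sketch only: the peak of 64 MACs per cycle is an assumption chosen because it matches the 93.32% above, and how `calc_efficiency` actually computes its result is not shown in this notebook.

```python
# Values from this notebook's configuration and the reported cycle count.
ofm_height, ofm_width = 12, 12  # (14 - 3) // 1 + 1 with pad 0
channels, ifm_depth = 32, 64
kernel_height, kernel_width = 3, 3
hw_cycles = 44440

# Assumption (inferred, not documented here): 64 MACs per cycle at peak.
PEAK_MACS_PER_CYCLE = 64

# One MAC per output element, per input channel, per kernel tap.
total_macs = (ofm_height * ofm_width * channels
              * ifm_depth * kernel_height * kernel_width)
efficiency = total_macs / (PEAK_MACS_PER_CYCLE * hw_cycles)

print(f"MACs: {total_macs}, efficiency: {efficiency:.2%}")
```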


### Reset Xlnk

In [17]:
mmu.xlnk_reset()
print_kb(mmu)
print("Cleared Memory!")

Available Memory (KB): 123640
Available Buffers: 0
Cleared Memory!
