## Convolution Single Run
<div class="alert bg-primary">This notebook demonstrates a single convolution run on the CNNDataflow IP. For one convolution command, the input feature map (IFM) is read from memory and processed, and the output feature map (OFM) is captured. The cycle count for the full operation is read and displayed at the end.
The input data in memory is filled with random integers to exercise the convolution.</div>


### Set the Arguments for the Convolution in CNNDataflow IP

In [16]:
# Input Feature Map (IFM) dimensions
ifm_height = 14
ifm_width = 14
ifm_depth = 64

# Kernel Window dimensions
kernel_height = 3
kernel_width = 3

# Other arguments
pad = 0
stride = 1

# Channels
channels = 32

print(
    "HOST CMD: CNNDataflow IP Arguments set are - IH %d, IW %d, ID %d, KH %d,"
    " KW %d, P %d, S %d, CH %d"
    % (ifm_height, ifm_width, ifm_depth, kernel_height, kernel_width,
       pad, stride, channels))

HOST CMD: CNNDataflow IP Arguments set are - IH 14, IW 14, ID 64, KH 3, KW 3, P 0, S 1, CH 32
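
For reference, the output feature map (OFM) size implied by these arguments can be worked out with the standard convolution size formula. The sketch below is not part of the CNNDataflow library: `conv_output_dim` is a hypothetical helper, and it assumes the IP follows the usual floor-based formula.

```python
def conv_output_dim(in_dim, kernel_dim, pad, stride):
    """Output size along one axis: floor((in + 2*pad - kernel) / stride) + 1."""
    return (in_dim + 2 * pad - kernel_dim) // stride + 1


# Values taken from the argument cell above
ofm_height = conv_output_dim(14, 3, 0, 1)
ofm_width = conv_output_dim(14, 3, 0, 1)
print("OFM: %d x %d x %d" % (ofm_height, ofm_width, 32))
```

With the values set in this notebook, the formula gives a 12 x 12 x 32 OFM.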


### Download `Convolution IP` bitstream

In [17]:
from pynq import Overlay

overlay = Overlay(
    "/opt/python3.6/lib/python3.6/site-packages/pynq/overlays/darius/"
    "convolution.bit")
overlay.download()
overlay.is_loaded()

True

### Create MMIO object to access the CNNDataflow IP
[MMIO Documentation](http://pynq.readthedocs.io/en/latest/overlay_design_methodology/pspl_interface.html#mmio)

In [18]:
from pynq import MMIO

# Constants
CNNDATAFLOW_BASEADDR = 0x43C00000
NUM_COMMANDS_OFFSET = 0x60
CMD_BASEADDR_OFFSET = 0x70
CYCLE_COUNT_OFFSET = 0xd0

cnn = MMIO(CNNDATAFLOW_BASEADDR, 65536)
print(f'Idle state: {hex(cnn.read(0x0, 4))}')

Idle state: 0x4
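
The status values used in this notebook (`0x4` for idle, and later `0x6` for done) are consistent with the control register layout commonly generated by Vivado HLS, where bit 0 is start, bit 1 is done, bit 2 is idle and bit 3 is ready. Whether CNNDataflow IP follows exactly this layout is an assumption. The hypothetical helper below only illustrates how such a register value would be decoded.

```python
# Assumed bit layout (Vivado HLS style control register); not confirmed for this IP
CTRL_BITS = {"start": 0x1, "done": 0x2, "idle": 0x4, "ready": 0x8}


def decode_ctrl(value):
    """Return the names of the flags set in a control register value."""
    return [name for name, mask in CTRL_BITS.items() if value & mask]


print(decode_ctrl(0x4))  # idle only
print(decode_ctrl(0x6))  # done and idle
```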


### Create Xlnk Object 
__The Xlnk object (memory management unit) allocates contiguous arrays in memory for data transfer between software and hardware.__

In [19]:
from pynq import Xlnk
import numpy as np

# Constant
SIZE = 5000000  # 20 MB of numpy.uint32s

mmu = Xlnk()

# Contiguous memory buffers for CNNDataflow IP convolution command, IFM Volume,
# Weights and OFM Volume. These buffers are shared memories that are used to 
# transfer data between software and hardware
cmd = mmu.cma_array(SIZE)
ifm = mmu.cma_array(SIZE)
weights = mmu.cma_array(SIZE)
ofm = mmu.cma_array(SIZE)

# Save the base physical address of the command, ifm, weights, and
# ofm buffers. These addresses are used later to copy and transfer data
# between hardware and software
cmd_baseaddr = cmd.physical_address
ifm_baseaddr = ifm.physical_address
weights_baseaddr = weights.physical_address
ofm_baseaddr = ofm.physical_address
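
As a rough check of how much of each 20 MB buffer this run needs, the sketch below estimates the payload sizes. It assumes 16-bit elements (matching the `np.int16` software volumes used later in the notebook) and a 12 x 12 output for the 14 x 14 input with a 3 x 3 kernel, no padding and stride 1. The packing the IP actually uses may differ.

```python
import numpy as np

BYTES_PER_ELEM = np.dtype(np.int16).itemsize  # assumption: 16-bit data

ifm_elems = 14 * 14 * 64        # IH * IW * ID
weight_elems = 32 * 64 * 3 * 3  # CH * ID * KH * KW
ofm_elems = 12 * 12 * 32        # OH * OW * CH

sizes = {name: n * BYTES_PER_ELEM for name, n in
         [("ifm", ifm_elems), ("weights", weight_elems), ("ofm", ofm_elems)]}

# Each buffer holds SIZE = 5000000 uint32 words
buffer_bytes = 5000000 * np.dtype(np.uint32).itemsize

for name, nbytes in sizes.items():
    print("%s: %d bytes (%.3f%% of one buffer)"
          % (name, nbytes, 100.0 * nbytes / buffer_bytes))
```

Under these assumptions every volume uses well under 1% of its buffer, so `SIZE` leaves ample headroom for larger layers.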

### Functions to print Xlnk statistics

In [20]:
def get_kb(mmu):
    return int(mmu.cma_stats()['CMA Memory Available'] // 1024)


def get_bufcount(mmu):
    return int(mmu.cma_stats()['Buffer Count'])


def print_kb(mmu):
    print("Available Memory (KB): " + str(get_kb(mmu)))
    print("Available Buffers: " + str(get_bufcount(mmu)))


print_kb(mmu)

Available Memory (KB): 9744
Available Buffers: 4


### Construct Convolution Command
__Check that arguments are in supported range and construct convolution command for hardware__

In [21]:
from darius import cnndataflow_lib

conv = cnndataflow_lib.CNNDataflow(ifm_height, ifm_width, ifm_depth,
                                   kernel_height, kernel_width, pad, stride,
                                   channels, ifm_baseaddr, weights_baseaddr,
                                   ofm_baseaddr)

conv.construct_conv_cmd(ifm_height, ifm_width, ifm_depth, kernel_height,
                        kernel_width, pad, stride, channels, ifm_baseaddr,
                        weights_baseaddr, ofm_baseaddr)

All convolution arguments are in supported range
Convolution command to CNNDataflow IP: 
b'\x0e\x00\x0e\x00\x03\x00\x03\x00\x01\x00\x00\x00\x0c\x00\x0c\x00\x08\x00\x04\x00\x01\x00\x01\x00\x00\x00\xd0\x17 \x06\x00\x00\x001\x00\x00\x00\x00\x00\x00\x00\x00\x90\x1b@\x02\x00\x00\x00\x00\x10\x19@\x02\x00\x00\x00\x12\x00\x00\x00\x00\x00\x00'
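
To sanity-check the command, its leading bytes can be unpacked as little-endian 16-bit fields. The field names in the final comment are a guess, based only on the values matching the arguments set earlier. The actual command layout is defined by `cnndataflow_lib` and is not documented in this notebook.

```python
import struct

# First 16 bytes of the command printed above
cmd_head = b'\x0e\x00\x0e\x00\x03\x00\x03\x00\x01\x00\x00\x00\x0c\x00\x0c\x00'

# Interpret them as eight little-endian unsigned 16-bit values
fields = struct.unpack('<8H', cmd_head)
print(fields)  # (14, 14, 3, 3, 1, 0, 12, 12)

# Plausible reading (assumption): IH, IW, KH, KW, stride, pad, OH, OW
```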


### Create IFM volume and Weight volume.
__Volumes are created in software and populated with random values in a row-major format.__

In [22]:
from random import randint

ifm_sw = np.empty(ifm_width * ifm_height * ifm_depth, dtype=np.int16)

for i in range(0, ifm_depth):
    for j in range(0, ifm_height):
        for k in range(0, ifm_width):
            index = i * ifm_height * ifm_width + j * ifm_width + k
            ifm_sw[index] = randint(0, 255)

weights_sw = np.empty(channels * ifm_depth * kernel_height * kernel_width,
                      dtype=np.int16)

for i in range(0, channels):
    for j in range(0, ifm_depth):
        for k in range(0, kernel_height * kernel_width):
            addr = i * ifm_depth * kernel_height * kernel_width + \
                   j * kernel_height * kernel_width + k
            weights_sw[addr] = randint(0, 255)
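
The nested loops make the row-major indexing explicit but are slow for larger volumes. A vectorized alternative, sketched below with `np.random.randint` in place of `random.randint`, produces volumes with the same layout and value range. The dimensions are repeated here so the snippet runs on its own.

```python
import numpy as np

ifm_depth, ifm_height, ifm_width = 64, 14, 14
channels, kernel_height, kernel_width = 32, 3, 3

# The upper bound is exclusive in np.random.randint, so 256 gives values in [0, 255]
ifm_3d = np.random.randint(
    0, 256, size=(ifm_depth, ifm_height, ifm_width)).astype(np.int16)
ifm_sw = ifm_3d.ravel()  # row-major (C order) flattening

weights_sw = np.random.randint(
    0, 256, size=channels * ifm_depth * kernel_height * kernel_width
).astype(np.int16)

# Element (i, j, k) lands at the same flat index the loops compute
i, j, k = 5, 3, 7
index = i * ifm_height * ifm_width + j * ifm_width + k
assert ifm_sw[index] == ifm_3d[i, j, k]
```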

__Run the following in a code cell to debug:__
```Python           
for i in range(0, ifm_width*ifm_height*ifm_depth, 4):
    print(hex(ifm_sw[i]))

for i in range(0, channels*ifm_depth*kernel_height*kernel_width, 4):
    print(hex(weights_sw[i]))
```    

### Reshape IFM volume and Weights 
__Volumes are reshaped from row-major format to IP format and data is copied to their respective shared buffer__

In [23]:
conv.reshape_and_copy_ifm(ifm_height, ifm_width, ifm_depth, ifm_sw, ifm)
conv.reshape_and_copy_weights(kernel_height, kernel_width, ifm_depth,
                              weights_sw, weights)

__Run the following in a code cell to debug:__
```Python
for i in range(0, ifm_width*ifm_height*ifm_depth, 4):
    print(hex(ifm[i]))

for i in range(0, channels*ifm_depth*kernel_height*kernel_width, 4):
    print(hex(weights[i]))
```

### Send Convolution Command to CNNDataflow IP

In [24]:
conv.load_conv_cmd(cmd_baseaddr)

### Start IP

In [25]:
# Load the number of commands and command physical address to offset addresses
cnn.write(NUM_COMMANDS_OFFSET, 1)
cnn.write(CMD_BASEADDR_OFFSET, cmd_baseaddr)

# Start Convolution if CNNDataflow IP is in Idle state
state = cnn.read(0x0)
if state == 4: # Idle state
    print("state: IP IDLE")
    cnn.write(0x0, 1)  # Start IP
else:
    print("state %x: IP BUSY" % state)

state: IP IDLE


### Check Status of the CNNDataflow IP

In [26]:
# Check if Convolution IP is in Done state
state = cnn.read(0x0)
if state == 6: # Done state
    print("state: IP DONE")
else:
    print("state %x: IP BUSY" % state)

state: IP DONE
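
A single status read can report busy if the cell runs before the IP finishes. One way to handle this is a polling loop with a timeout. The helper below is hypothetical: it takes any zero-argument callable that returns the register value, so on the board it could be called as `wait_for_state(lambda: cnn.read(0x0), 6)`. The busy value `0x0` used by the stand-in is an assumption for illustration only.

```python
import time


def wait_for_state(read_state, expected, timeout_s=1.0, poll_s=0.001):
    """Poll read_state() until it returns expected; return False on timeout."""
    deadline = time.monotonic() + timeout_s
    while True:
        if read_state() == expected:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_s)


# Stand-in for the hardware register: busy twice, then done (0x6)
fake_reads = iter([0x0, 0x0, 0x6])
print(wait_for_state(lambda: next(fake_reads), 6))
```

Returning a boolean rather than raising keeps the helper easy to use in a notebook cell, where the caller can decide whether a timeout is an error.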


### Read the Convolved OFM Buffer

In [30]:
conv.read_ofm(ofm_baseaddr)

0x721e
0x8f52
0x9d7d
0x8c89
0x1e12
0x9f0e
0x3929
0x5bae
0xdab7
0xede5
0x567b
0x4a
0xede3
0xa0fa
0xf91
0xdbb1
0x85ee
0xfd74
0xb04a
0x7af2
0x2db8
0xa8fe
0x3d72
0x7266
0xd0d
0xace
0xc838
0x2c7d
0xd3bf
0xe0e4
0xda5d
0x63eb
0xe355
0xc60c
0x11c9
0x14e0
0x439f
0xadb7
0x8772
0x66ee
0x9df9
0xbd1e
0x252c
0xd676
0x7aad
0xd086
0x5cbb
0x975c
0xff1a
0x13ff
0xe15e
0xc481
0x5846
0x88b5
0x729f
0x6f7
0x59c2
0x49f2
0x7b1f
0xd16f
0x205
0x38fc
0x4b95
0x16
0x9b11
0x656f
0x83f
0x66fd
0x3fde
0xdc3f
0x3526
0x9883
0x9bd5
0x619e
0xc6d7
0x42bb
0xf830
0x934b
0xa9a5
0x9302
0xd2f2
0xac6c
0x777a
0x9159
0xc9d9
0x4625
0x7705
0x6ae8
0xa888
0x26c1
0x1d0e
0xb4a5
0xab88
0x3c95
0x4ff1
0xf6c3
0xa610
0xeee1
0xcf8e
0xb3c0
0xe56a
0xeab
0x1bca
0x3099
0x56d3
0x9de7
0x2eb6
0xc4c3
0x8cc9
0x7e56
0x61eb
0x94a5
0xfea1
0x1542
0xd1f0
0xd67a
0x9218
0x278e
0xf268
0x8c4e
0x19d0
0x87bc
0xede4
0xc3d5
0x38eb
0x2297
0x4db9
0xa551
0x259d
0xd45b
0xddf4
0xa5cf
0x21eb
0x9037
0x96ac
0xa539
0x2ab4
0xa97e
0x4d56
0x13d8
0x96c1
0xfa2e
0x51dd
0x298e
0x5

### Read Cycle Count and Efficiency of the Run

In [14]:
hw_cycles = cnn.read(CYCLE_COUNT_OFFSET, 4)
conv.calc_efficiency(kernel_height, kernel_width, ifm_depth, hw_cycles)

CNNDataflow IP cycles: 44539, effciency: 93.11%
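
The reported efficiency can be reproduced by comparing the measured cycles with an ideal cycle count. The sketch below assumes the IP performs 64 multiply-accumulate (MAC) operations per cycle. This figure is not stated in the notebook and is inferred only because it reproduces the percentage printed above. `calc_efficiency` in `cnndataflow_lib` may compute the value differently, and `conv_efficiency` is a hypothetical name.

```python
def conv_efficiency(ofm_h, ofm_w, k_h, k_w, ifm_depth, channels,
                    hw_cycles, macs_per_cycle=64):
    """Percentage of ideal throughput; macs_per_cycle is an assumed value."""
    total_macs = ofm_h * ofm_w * k_h * k_w * ifm_depth * channels
    ideal_cycles = total_macs / macs_per_cycle
    return 100.0 * ideal_cycles / hw_cycles


# 12 x 12 output, 3 x 3 kernel, 64 input channels, 32 output channels
eff = conv_efficiency(12, 12, 3, 3, 64, 32, hw_cycles=44539)
print("efficiency: %.2f%%" % eff)
```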


### Reset Xlnk

In [15]:
mmu.xlnk_reset()
print_kb(mmu)
print("Cleared Memory!")

Available Memory (KB): 85812
Available Buffers: 0
Cleared Memory!
