## Convolution Single Run Test
This notebook demonstrates a single convolution run on the CNNDataflow IP. For one convolution command, the input feature map (IFM) is read from memory and processed, and the resulting output feature map (OFM) is captured. The cycle count for the full operation is read and displayed at the end.
The input data in memory is filled with random integers to exercise the convolution.

### Set the convolution arguments for the CNNDataflow IP

In [1]:
# Input Feature Map (IFM) dimensions
ifm_height = 14
ifm_width = 14
ifm_depth = 64
# Kernel Window dimensions
kernel_height = 3
kernel_width = 3
# Other arguments
pad = 0
stride = 1
# Channels
channels = 32

print("HOST CMD: CNNDataflow IP Arguments set are - IH %d, IW %d, ID %d, KH %d, KW %d, P %d, S %d, CH %d" 
          % (ifm_height, ifm_width, ifm_depth, kernel_height, kernel_width, pad, stride, channels))

HOST CMD: CNNDataflow IP Arguments set are - IH 14, IW 14, ID 64, KH 3, KW 3, P 0, S 1, CH 32
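As a quick sanity check on these arguments, the output feature map size can be worked out with the usual formula for a non-dilated convolution. This is a minimal sketch: `conv_output_size` is a hypothetical helper, not part of the CNNDataflow library, and the values are hard-coded copies of the arguments set above.

```python
def conv_output_size(in_size, kernel_size, pad, stride):
    """Output length along one axis for a standard (non-dilated) convolution."""
    return (in_size + 2 * pad - kernel_size) // stride + 1

# Values copied from the arguments cell above
ofm_height = conv_output_size(14, 3, pad=0, stride=1)
ofm_width = conv_output_size(14, 3, pad=0, stride=1)
print("OFM: %d x %d x %d" % (ofm_height, ofm_width, 32))  # OFM: 12 x 12 x 32
```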


### Download `Convolution IP` bitstream

In [2]:
from pynq import Overlay

overlay = Overlay("/opt/python3.6/lib/python3.6/site-packages/pynq/overlays/darius/convolution.bit")
overlay.download()
overlay.is_loaded()

True

### Create MMIO object to access the CNNDataflow IP

In [3]:
from pynq import MMIO

# Constants
CNNDATAFLOW_BASEADDR = 0x43C00000
NUM_COMMANDS_OFFSET = 0x60
CMD_BASEADDR_OFFSET = 0x70
CYCLE_COUNT_OFFSET = 0xd0

cnn = MMIO(CNNDATAFLOW_BASEADDR, 65536)
print(f'Idle state: {hex(cnn.read(0x0, 4))}')

Idle state: 0x4
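The values `0x4` (idle) and `0x6` (done) that this notebook reads from offset `0x0` are consistent with the control register layout that Vivado HLS generates for AXI4-Lite blocks (`ap_start`, `ap_done`, `ap_idle`, `ap_ready` in bits 0 to 3). Whether the CNNDataflow IP actually uses that layout is an assumption here. Under that assumption, a small hypothetical helper can name the bits:

```python
# Assumed bit layout (Vivado HLS ap_ctrl_hs style); not confirmed for this IP
CTRL_BITS = (
    ("ap_start", 0x1),
    ("ap_done", 0x2),
    ("ap_idle", 0x4),
    ("ap_ready", 0x8),
)

def decode_ctrl(value):
    """Return the names of the control bits that are set in `value`."""
    return [name for name, mask in CTRL_BITS if value & mask]

print(decode_ctrl(0x4))  # ['ap_idle']
print(decode_ctrl(0x6))  # ['ap_done', 'ap_idle']
```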


### Create an Xlnk object (memory manager) to allocate contiguous memory buffers for data transfer between software and hardware

In [4]:
from pynq import Xlnk
import numpy as np

# Constant
SIZE = 5000000  # 20 MB of numpy.uint32s

mmu = Xlnk()

# Contiguous memory buffers for CNNDataflow IP convolution command, IFM Volume, Weights and OFM Volume
# These buffers are shared memories that are used to transfer data between software and hardware
cmd = mmu.cma_array(SIZE)
ifm = mmu.cma_array(SIZE)
weights = mmu.cma_array(SIZE)
ofm = mmu.cma_array(SIZE)

# Save the base physical address of the command, ifm, weights, and ofm buffers
# These addresses will be used later to copy and transfer data between hardware and software
cmd_baseaddr     = cmd.physical_address
ifm_baseaddr     = ifm.physical_address
weights_baseaddr = weights.physical_address
ofm_baseaddr     = ofm.physical_address
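To put the 20 MB buffer size in perspective, the sketch below estimates how many bytes this particular run needs. It assumes 16-bit elements for all three volumes (the notebook uses `np.int16` for the IFM and weights; the OFM element size is an assumption) and a 12 x 12 x 32 OFM, which follows from the arguments with no padding and stride 1.

```python
import numpy as np

ELEM_BYTES = np.dtype(np.int16).itemsize  # 2 bytes per element (assumed for OFM)

ifm_bytes = 14 * 14 * 64 * ELEM_BYTES          # IFM volume
weights_bytes = 32 * 64 * 3 * 3 * ELEM_BYTES   # one 3x3x64 filter per output channel
ofm_bytes = 12 * 12 * 32 * ELEM_BYTES          # OFM volume

# Each cma_array(SIZE) buffer holds SIZE uint32 values
buffer_bytes = 5000000 * np.dtype(np.uint32).itemsize

print(ifm_bytes, weights_bytes, ofm_bytes, buffer_bytes)
# 25088 36864 9216 20000000
```

Each buffer is therefore far larger than this single run requires, which leaves headroom for bigger layers.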

### Functions to print Xlnk statistics

In [5]:
def get_kb(mmu):
    return int(mmu.cma_stats()['CMA Memory Available'] // 1024)

def get_bufcount(mmu):
    return int(mmu.cma_stats()['Buffer Count'])

def print_kb(mmu):
    print("Available Memory (KB): " + str(get_kb(mmu)))
    print("Available Buffers: " + str(get_bufcount(mmu)))

print_kb(mmu)

Available Memory (KB): 52664
Available Buffers: 4


### Check arguments range and construct convolution command for hardware

In [6]:
from darius import cnndataflow_lib
conv = cnndataflow_lib.CNNDataflow(ifm_height, ifm_width, ifm_depth, kernel_height, kernel_width, pad, stride, channels, ifm_baseaddr, weights_baseaddr, ofm_baseaddr)

conv.construct_conv_cmd(ifm_height, ifm_width, ifm_depth, kernel_height, kernel_width, pad, stride, channels, ifm_baseaddr, weights_baseaddr, ofm_baseaddr)

All convolution arguments are in supported range
Convolution command to CNNDataflow IP: 
b'\x0e\x00\x0e\x00\x03\x00\x03\x00\x01\x00\x00\x00\x0c\x00\x0c\x00\x08\x00\x04\x00\x01\x00\x01\x00\x00\x00\xd0\x17 \x06\x00\x00\x001\x00\x00\x00\x00\x00\x00\x00\x00P\x1a@\x02\x00\x00\x00\x00\x10\x19@\x02\x00\x00\x00\x12\x00\x00\x00\x00\x00\x00'
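The leading bytes of the command can be inspected with the standard `struct` module. The sketch below unpacks the first 16 bytes as little-endian unsigned 16-bit fields. The field labels are guesses made by matching the values against the arguments above; the actual command layout is defined by the CNNDataflow library and is not documented in this notebook.

```python
import struct

# First 16 bytes of the command printed above
cmd_prefix = b'\x0e\x00\x0e\x00\x03\x00\x03\x00\x01\x00\x00\x00\x0c\x00\x0c\x00'

# '<8H' = eight little-endian unsigned 16-bit integers
fields = struct.unpack('<8H', cmd_prefix)

# Hypothetical labels, inferred from the values only
labels = ["ifm_height", "ifm_width", "kernel_height", "kernel_width",
          "stride?", "pad?", "ofm_height?", "ofm_width?"]
for label, value in zip(labels, fields):
    print("%-14s %d" % (label, value))
```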


### Create the IFM and weight volumes in software and populate them with random values in row-major format

In [7]:
from random import randint

ifm_sw = np.empty(ifm_width*ifm_height*ifm_depth, dtype=np.int16)

for i in range (0, ifm_depth):
    for j in range (0, ifm_height):
        for k in range (0, ifm_width):
            index = i*ifm_height*ifm_width + j*ifm_width + k
            ifm_sw[index] = randint(0,255)
            
weights_sw = np.empty(channels*ifm_depth*kernel_height*kernel_width, dtype=np.int16)

for i in range (0, channels):
    for j in range (0, ifm_depth):
        for k in range (0, kernel_height*kernel_width):
            addr = i*ifm_depth*kernel_height*kernel_width + j*kernel_height*kernel_width + k
            weights_sw[addr] = randint(0,255)
            
#for i in range(0, ifm_width*ifm_height*ifm_depth, 4):
    #print(hex(ifm_sw[i]))

#for i in range(0, channels*ifm_depth*kernel_height*kernel_width, 4):
    #print(hex(weights_sw[i]))
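The nested loops above compute a row-major (C-order) flat index by hand. As an alternative sketch, NumPy can generate the same kind of volume in one call and flatten it, and the manual index formula can be checked against ordinary 3-D indexing. The seed and the sample coordinates are arbitrary choices for illustration.

```python
import numpy as np

ifm_depth, ifm_height, ifm_width = 64, 14, 14

rng = np.random.RandomState(0)  # fixed seed, only so the sketch is repeatable
# randint's upper bound is exclusive, so 256 gives values in 0..255
ifm_3d = rng.randint(0, 256, size=(ifm_depth, ifm_height, ifm_width)).astype(np.int16)
ifm_sw = ifm_3d.ravel()  # C order, i.e. row-major

# The flat index used in the loops matches 3-D indexing
i, j, k = 5, 3, 7
index = i * ifm_height * ifm_width + j * ifm_width + k
print(index, ifm_sw[index] == ifm_3d[i, j, k])  # 1029 True
```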

### Reshape IFM volume and Weights from row-major format to IP format and copy the data to their respective shared buffer

In [8]:
conv.reshape_and_copy_ifm(ifm_height, ifm_width, ifm_depth, ifm_sw, ifm)
conv.reshape_and_copy_weights(kernel_height, kernel_width, ifm_depth, weights_sw, weights)

#for i in range(0, ifm_width*ifm_height*ifm_depth, 4):
    #print(hex(ifm[i]))

#for i in range(0, channels*ifm_depth*kernel_height*kernel_width, 4):
    #print(hex(weights[i]))

### Send convolution command to CNNDataflow IP

In [9]:
conv.load_conv_cmd(cmd_baseaddr)

### Write the number of commands and the command buffer's physical address to their register offsets, then start the IP

In [10]:
cnn.write(NUM_COMMANDS_OFFSET, 1)
cnn.write(CMD_BASEADDR_OFFSET, cmd_baseaddr)

# Start Convolution if CNNDataflow IP is in Idle state
state = cnn.read(0x0)
if state == 4: # Idle state
    print("state: IP IDLE")
    cnn.write(0x0, 1) # Start IP
else:
    print("state %x: IP BUSY" % state)

state: IP IDLE


### Check status of the CNNDataflow IP

In [11]:
# Check if Convolution IP is in Done state
state = cnn.read(0x0)
if state == 6: # Done state
    print("state: IP DONE")
else:
    print("state %x: IP BUSY" % state)

state: IP DONE
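The cell above reads the status register once, which works here because the run finishes before the cell executes. For longer runs, a polling loop with a timeout is a common pattern. The sketch below is hypothetical and not part of the CNNDataflow library; it takes a `read_state` callable so it can be tried without hardware. On the board it could be called as `wait_for_state(lambda: cnn.read(0x0), 6)`.

```python
import time

def wait_for_state(read_state, target, timeout_s=1.0, poll_interval_s=0.001):
    """Poll read_state() until it returns target, or raise TimeoutError."""
    deadline = time.monotonic() + timeout_s
    while True:
        state = read_state()
        if state == target:
            return state
        if time.monotonic() >= deadline:
            raise TimeoutError("state 0x%x after %.3f s, expected 0x%x"
                               % (state, timeout_s, target))
        time.sleep(poll_interval_s)

# Stand-in for the hardware: made-up busy values followed by the done value
fake_states = iter([0x0, 0x0, 0x6])
print(hex(wait_for_state(lambda: next(fake_states), target=0x6)))  # 0x6
```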


### Read back the convolved OFM buffer

In [12]:
conv.read_ofm(ofm_baseaddr)

0x79d6
0x445d
0x6ab8
0xe02
0x5e87
0x6818
0xe769
0x941f
0xe809
0x7c62
0x6100
0xc219
0x2047
0x1faf
0x6fc3
0x5c5f
0xfbf8
0xa700
0x248b
0x1248
0x1477
0xbb43
0x6bee
0x6371
0x46c5
0xecff
0x10df
0xc95a
0xdde5
0x5c89
0x4d05
0xea2c
0x5645
0x947b
0xb7e2
0x83a2
0xdbe7
0xd449
0x6c44
0x520b
0xf190
0x37c1
0x1b10
0x7e2d
0x40dd
0x5f43
0x663f
0x743a
0x782a
0xb24c
0x13ad
0xed98
0x4d6f
0xdb14
0xd029
0x7c51
0xe3db
0x69c2
0x2d26
0xe95b
0xd8fb
0xde51
0x375f
0x616
0x3828
0x8261
0x79d
0x22a7
0xa40c
0x77c2
0xdde2
0x1321
0x55e0
0x8ada
0x3371
0xca36
0x77de
0x8a27
0xcc6d
0x282b
0x6a2
0xf4a8
0x2250
0x3400
0xf814
0xac3e
0xd26a
0x9b1a
0xd8b7
0xe777
0xda5f
0xa76d
0x53e9
0xf865
0x410b
0x74c0
0x4879
0xc9da
0xb31b
0x4994
0xaa8b
0x1a4
0xbe6a
0x2a95
0x380b
0x3141
0x82a8
0x2a6e
0x5e82
0x115f
0x8d18
0xce34
0xa581
0xd688
0xd318
0xe5c
0xa77d
0x249d
0x3f6a
0xf897
0xf864
0x9b47
0x720e
0x1124
0xf62c
0xf5da
0x3cbf
0xdbe5
0x9a59
0x9f80
0x2a46
0x6718
0x6c53
0xa86
0xd8c6
0xad17
0x86fd
0xfab1
0x30ba
0xb9fd
0xcfb8
0x5356
0x142a
0x57a8

### Read cycle count of the complete run

In [13]:
hw_cycles = cnn.read(CYCLE_COUNT_OFFSET, 4)
conv.calc_efficiency(kernel_height, kernel_width, ifm_depth, hw_cycles)

CNNDataflow IP cycles: 44501, effciency: 93.19%
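The reported efficiency can be reproduced by hand under one assumption: that the IP performs 64 multiply-accumulate (MAC) operations per cycle at peak. That figure is not stated in this notebook; it is chosen here because it matches the printed result, and the formula inside `calc_efficiency` may differ.

```python
# Values from this run
ofm_height, ofm_width, channels = 12, 12, 32
ifm_depth, kernel_height, kernel_width = 64, 3, 3
hw_cycles = 44501

MACS_PER_CYCLE = 64  # assumption, not taken from the IP documentation

# One MAC per kernel element, per input channel, per output element
total_macs = (ofm_height * ofm_width * channels
              * ifm_depth * kernel_height * kernel_width)
ideal_cycles = total_macs / MACS_PER_CYCLE
efficiency = 100.0 * ideal_cycles / hw_cycles

print("ideal cycles: %d, efficiency: %.2f%%" % (ideal_cycles, efficiency))
# ideal cycles: 41472, efficiency: 93.19%
```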


### Free the contiguous memory buffers

In [14]:
mmu.xlnk_reset()
print_kb(mmu)
print("Cleared Memory!")

Available Memory (KB): 130092
Available Buffers: 0
Cleared Memory!
