In [1]:
import os, warnings
from pynq import PL
from pynq import Overlay
from pynq import allocate
import numpy as np
from AMCCNN import AMCCNN
import scipy.io
import pickle
import matplotlib.pyplot as plt
import matplotlib.ticker as mtick

In [2]:
ol = Overlay("amc_dma_rfsoc.bit")
dma = ol.axi_dma_0
amc = ol.amc_cnn_0
testset_file = 'AMC_dataset_demo.pkl'
# Load the demo test set; the context manager closes the file handle afterwards
with open(testset_file, 'rb') as f:
    X = pickle.load(f, encoding='latin')
from AMCWidget import AMCWidget
amc_widget=AMCWidget(X, amc, dma)

# Streaming-CNN FPGA Architecture for Communications-based Applications
----
This demonstration presents a modulation classification application for wireless communications modulation schemes running on an **AMD-Xilinx RFSoC 2x2 development board**.

## Modulation Classification
This demo showcases the proposed streaming-CNN architecture running on an RFSoC 2x2 development board. Currently, the IP resides on-chip, with inputs transferred over AXI4-Stream from the Processing System to the Programmable Logic. A future aim is to connect the IP directly to the RF Data Converters on the RFSoC.
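
As a rough illustration of the Processing System to Programmable Logic transfer described above, the sketch below pushes one frame through a PYNQ AXI DMA and reads back one score per class. The helper name `classify_over_dma`, the 32-bit word format and the eight-word output are assumptions made for illustration; the notebook's `AMCWidget` and `AMCCNN` classes may pack data differently or require extra IP configuration. The `dma` and `allocate` objects are passed in so the function can be exercised without hardware.

```python
import numpy as np

def classify_over_dma(dma, allocate, samples, n_classes=8,
                      in_dtype=np.int32, out_dtype=np.int32):
    """Send one frame to the PL and return the raw words it sends back.

    Assumptions (not confirmed by the notebook): samples are already
    packed as 32-bit words and the IP returns n_classes words per frame.
    """
    # Physically contiguous buffers that the DMA engine can access
    in_buf = allocate(shape=(len(samples),), dtype=in_dtype)
    out_buf = allocate(shape=(n_classes,), dtype=out_dtype)
    in_buf[:] = samples

    # Start both directions, then block until each transfer completes
    dma.sendchannel.transfer(in_buf)
    dma.recvchannel.transfer(out_buf)
    dma.sendchannel.wait()
    dma.recvchannel.wait()

    # Copy the result out of the DMA buffer into an ordinary array
    return np.array(out_buf)

# On the board this might be called as (hypothetical input name):
# scores = classify_over_dma(ol.axi_dma_0, allocate, packed_frame)
```

Passing `dma` and `allocate` as arguments keeps the sketch independent of the overlay, so the same function can be checked on a desktop with stand-in objects.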

**Modulation Classification** is the task of identifying which modulation scheme a received signal has been encoded with. The possible modulation schemes are:
* 8PSK
* BPSK
* CPFSK
* GFSK
* PAM4
* QAM16
* QAM64
* QPSK

## Neural Network Structure

![](NN.png)

----

## CNN Architecture
The CNN architecture has been built to support a constant stream of inputs, much as a classical communications pipeline is built from filters that process a stream of samples. The aim is to allow deep learning solutions to be inserted into an existing communications pipeline. We assume that samples are constantly being received over the air, so the architecture is designed to process the stream without interruption.

### Weight and Input Sample Restructure

To facilitate a streaming input convention, the order in which the neural network calculations are processed must be revisited. In a typical sliding window convolutional layer approach, the kernel weights may be processed over the input data multiple times. To simplify the calculations being performed on chip, the input data is transformed into a matrix equivalent of the sliding kernel approach. Similarly, the kernel weights are transformed into their matrix equivalent before deployment. The **left** figure below indicates how this is possible. The **right** figure shows how the input and kernel weights are transformed from 3D values to a matrix.

Convolutions to Matrix Multiplies  |  Transforming Inputs and Weights
:---------------------------------:|:-----------------------------------------:
![](GEMMCalculations.png)             |  ![](GEMM_inputs_weights.png)
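
The equivalence between a sliding-kernel convolution and a single matrix multiply can be checked numerically. The sketch below is a minimal NumPy illustration for a 1D convolution, not the on-chip implementation; the shapes (2 input channels, 4 filters, kernel size 3) and the function names are chosen purely for the example.

```python
import numpy as np

def im2col_1d(x, kernel_size):
    """Unfold a (channels, length) input into a matrix whose columns
    are the flattened sliding windows."""
    channels, length = x.shape
    n_windows = length - kernel_size + 1
    cols = np.empty((channels * kernel_size, n_windows), dtype=x.dtype)
    for t in range(n_windows):
        cols[:, t] = x[:, t:t + kernel_size].reshape(-1)
    return cols

def conv1d_direct(x, w):
    """Reference sliding-window convolution (no padding, stride 1)."""
    filters, channels, kernel_size = w.shape
    n_windows = x.shape[1] - kernel_size + 1
    out = np.zeros((filters, n_windows))
    for f in range(filters):
        for t in range(n_windows):
            out[f, t] = np.sum(w[f] * x[:, t:t + kernel_size])
    return out

def conv1d_as_matmul(x, w):
    """Same convolution expressed as one matrix multiply."""
    filters = w.shape[0]
    w_mat = w.reshape(filters, -1)           # (filters, channels * kernel_size)
    return w_mat @ im2col_1d(x, w.shape[2])  # (filters, n_windows)

rng = np.random.default_rng(0)
x = rng.standard_normal((2, 16))     # e.g. I and Q channels, 16 samples
w = rng.standard_normal((4, 2, 3))   # 4 filters, 2 channels, kernel size 3

# Both formulations agree to floating-point tolerance
assert np.allclose(conv1d_direct(x, w), conv1d_as_matmul(x, w))
```

Because the weight matrix is fixed after training, its reshape can be done once before deployment; only the input unfolding has to happen at run time.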

### Overall Structure
The streaming-CNN architecture's overall structure is shown **below**. The architecture accepts a streaming input, with one sample entering at a time and being stored in a 'Block RAM buffer'. The 'Read and Write Controller' performs the matrix conversion and passes columns of the resulting matrix to the 'Matrix-Vector Multiplier'. The result is then passed through a 'ReLU' activation, after which it is ready for the next layer to process.

![](overall_architecture.png)
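
The data flow described in this section can be mimicked in software. The following is a behavioural sketch only, assuming a single convolutional layer with stride 1 and no bias; the class name `StreamingConvLayer` is hypothetical, and a `deque` stands in for the Block RAM buffer. It does not model timing or the hardware controller.

```python
from collections import deque
import numpy as np

class StreamingConvLayer:
    """Accepts one sample at a time and emits one output vector per
    sample once the buffer holds a full kernel-width window."""

    def __init__(self, weights):
        self.filters, self.channels, self.kernel_size = weights.shape
        # Kernel weights flattened once, before any samples arrive
        self.w_mat = weights.reshape(self.filters, -1)
        # Stand-in for the Block RAM buffer: keeps the newest samples only
        self.buffer = deque(maxlen=self.kernel_size)

    def push(self, sample):
        self.buffer.append(np.asarray(sample, dtype=float))
        if len(self.buffer) < self.kernel_size:
            return None  # not enough samples buffered yet
        # Stand-in for the read/write controller: form one matrix column
        window = np.stack(list(self.buffer), axis=1)  # (channels, kernel_size)
        column = window.reshape(-1)
        # Matrix-vector multiply followed by ReLU
        return np.maximum(self.w_mat @ column, 0.0)

rng = np.random.default_rng(1)
weights = rng.standard_normal((4, 2, 3))  # 4 filters, 2 channels, kernel 3
samples = rng.standard_normal((16, 2))    # 16 time steps of (I, Q)

layer = StreamingConvLayer(weights)
outputs = [y for y in (layer.push(s) for s in samples) if y is not None]
outputs = np.stack(outputs, axis=1)       # (filters, valid time steps)
```

Each call to `push` does a fixed amount of work, which is the property that lets the layer keep pace with an uninterrupted sample stream.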

<!-- #### Optimisations in Matrix-Vector Multipliers

Some of the matrix-vector multiplications can become quite large. We can take advantage of using a faster clock to time-share some of our resources. Below are two optimisations for both Convolutional and Dense neural network layers.

On the **left** there is a fully parallel Matrix-Vector Multiplier for when the resulting input vector is small enough to run calculations in parallel. On the **right** is the serial-parallel Matrix-Vector Multiplier where a subset of the multiply-accumulates are time-shared to reduce the resources used.

Fully Parallel Matrix-Vector Multiplier  |  Serial-Parallel Matrix-Vector Multiplier
:---------------------------------------:|:-----------------------------------------:
![](fullyparallel.png)                   |  ![](serial-parallel.png) -->

### Quantisation
The whole model runs with **18-bit fixed-point** arithmetic for both inputs and weights. This word length was chosen because it is the maximum fixed-point length accepted by the on-chip DSP48s while still preserving the precision of the original floating-point weights well.
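
To make the effect of an 18-bit word concrete, the sketch below quantises floating-point values to signed 18-bit codes and back. The split between integer and fractional bits is not stated in this notebook, so `FRAC_BITS = 12` is an assumption made for illustration, as are the function names and the round-to-nearest, saturating behaviour.

```python
import numpy as np

WORD_BITS = 18   # total word length, as stated in the text
FRAC_BITS = 12   # assumed number of fractional bits (illustrative only)

def quantise(values, word_bits=WORD_BITS, frac_bits=FRAC_BITS):
    """Round to the nearest representable step and saturate to the
    signed word_bits range; returns integer codes."""
    scale = 1 << frac_bits
    lo = -(1 << (word_bits - 1))
    hi = (1 << (word_bits - 1)) - 1
    scaled = np.round(np.asarray(values, dtype=np.float64) * scale)
    return np.clip(scaled, lo, hi).astype(np.int32)

def dequantise(codes, frac_bits=FRAC_BITS):
    """Convert integer codes back to floating point."""
    return codes.astype(np.float64) / (1 << frac_bits)

weights = np.array([0.5, -0.25, 0.123456, 31.99999, -40.0])
codes = quantise(weights)
restored = dequantise(codes)

# In-range values are recovered to within half a quantisation step;
# out-of-range values saturate at the limits of the 18-bit word.
step = 1.0 / (1 << FRAC_BITS)
assert abs(restored[2] - weights[2]) <= step / 2
```

With these assumed settings the representable range is roughly -32 to +32 in steps of 1/4096; a different fractional width trades range against resolution.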

----

# Interactive Demo
### Choose from 8 different modulation schemes and test out the CNN.
First, the input waveform is plotted. Next, the model's prediction is shown alongside the actual label.
Finally, the CNN's prediction confidence is displayed as a bar graph.

In [3]:
amc_widget.display()

VBox(children=(Dropdown(description='Mods:', options=('8PSK', 'BPSK', 'CPFSK', 'GFSK', 'PAM4', 'QAM16', 'QAM64…

In [None]:
# To activate Voila dashboard, run the following command
# voila /home/xilinx/jupyter_notebooks/rfsoc_amc/voila_modulation_classification.ipynb --ExecutePreprocessor.timeout=180 --port=8866 --TagRemovePreprocessor.remove_cell_tags='{"ignore_me"}'

----