# Machine Learning for EM Data Classification

This notebook includes the following activities.

- High-level use of SVM and neural network classifiers
- Classifying dummy signal data
- Classifying real EM data

In [None]:
import matplotlib.pyplot as plt
import numpy as np
from emvincelib import iq, ml, stat
from sklearn import svm
from sklearn.neural_network import MLPClassifier
from scipy.fft import fft  # scipy.fftpack is the legacy interface
from sklearn import preprocessing

%matplotlib inline

### 1. Machine Learning Concepts

Machine learning (ML) is a broad field that develops algorithms and statistical models that learn patterns from data and then make predictions. We cannot cover ML in full within this workshop. Instead, this Jupyter notebook looks at the basic concepts of ML and how they can help us learn patterns in EM data.

ML is commonly divided into two main types: **supervised** and **unsupervised** learning. In supervised learning, we give the algorithm example data together with the expected classification output for each example. Once trained, the algorithm can produce classification output for new data. In unsupervised learning, we provide only the data, so that the algorithm learns patterns on its own and becomes capable of classifying new data.

When performing ML on EM data, supervised learning is the approach we need to focus on. We can provide example EM data for specific, known activities occurring on computing devices and train ML models to recognize similar activities in unknown EM data. The training data is provided as a 2-D array named **X**, with one row per example and one column per feature. The target classes for that data are provided as a 1-D array named **y**, with one label per row of **X**. With this goal in mind, let's focus on the following two supervised ML approaches for the time being.

   1. Support vector machines (SVM)
   2. Neural networks
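As a minimal sketch of the **X** and **y** layout described above, the following builds a tiny toy dataset with NumPy. The values are made up for illustration and have nothing to do with real EM data.

```python
import numpy as np

# Hypothetical toy data: 4 examples (rows), each with 3 features (columns).
X = np.array([
    [0.1, 0.9, 0.2],
    [0.2, 0.8, 0.1],
    [0.9, 0.1, 0.7],
    [0.8, 0.2, 0.9],
])

# One class label per row of X.
y = np.array([0, 0, 1, 1])

# X is 2-D with shape (n_samples, n_features); y is 1-D with shape (n_samples,).
print(X.shape)
print(y.shape)
```

Both classifiers used in this notebook accept data in this layout through their `fit(X, y)` method.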

#### Support Vector Machines

![alt text](./images/svm-intro.png "Support vector machines (SVM)")

Reference: https://scikit-learn.org/stable/modules/svm.html

In [None]:
X = [[0., 0.], [1., 1.]]
y = [0, 1]

clf = svm.SVC()

clf.fit(X, y)

clf.predict([[2., 2.], [-1., -2.]])
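To see a little more of what the trained SVM holds, here is a small sketch on a made-up, linearly separable dataset. It uses a linear kernel (an assumption made here so the result is easy to interpret; the cell above uses the default kernel) and inspects the support vectors and the signed decision values.

```python
import numpy as np
from sklearn import svm

# Hypothetical toy data: two well-separated clusters in 2-D.
X = np.array([[0., 0.], [0., 1.], [1., 0.],
              [3., 3.], [3., 4.], [4., 3.]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = svm.SVC(kernel="linear")
clf.fit(X, y)

# The training points that define the separating boundary.
print(clf.support_vectors_)

# Signed score per new point: negative means class 0, positive means class 1.
new_points = np.array([[0.5, 0.5], [3.5, 3.5]])
scores = clf.decision_function(new_points)
predictions = clf.predict(new_points)
print(scores)
print(predictions)
```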

#### Neural Networks

![alt text](./images/neural-network-intro.png "Neural Networks")

Reference: https://scikit-learn.org/stable/modules/neural_networks_supervised.html

In [None]:
X = [[0., 0.], [1., 1.]]
y = [0, 1]

clf = MLPClassifier(solver='lbfgs', alpha=1e-5, hidden_layer_sizes=(5, 2), random_state=1)

clf.fit(X, y)

clf.predict([[2., 2.], [-1., -2.]])
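Besides a hard class label, `MLPClassifier` can also report how confident it is in each class. The sketch below reuses the same toy data and settings as the cell above and calls `predict_proba`, which returns one row per input and one column per class.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

X = [[0., 0.], [1., 1.]]
y = [0, 1]

clf = MLPClassifier(solver='lbfgs', alpha=1e-5, hidden_layer_sizes=(5, 2), random_state=1)
clf.fit(X, y)

# Each row holds the estimated probability of class 0 and class 1 for one input.
proba = clf.predict_proba([[2., 2.], [-1., -2.]])
print(proba)

# The probabilities in each row sum to 1.
print(np.sum(proba, axis=1))
```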

### 2. Pre-processing Signals for Machine Learning

We cannot feed the EM data files we have acquired directly into an ML model. We first have to pre-process them and build the **X** and **y** data structures. Let's learn how by using a dummy dataset. A GRC (GNU Radio Companion) flowgraph emulates two signal sources that generate signals with different frequency components. Our task is to train an ML model that can distinguish between the two signal sources when given a new signal file.
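The GRC flowgraph itself is not reproduced in this notebook. As a rough stand-in, the sketch below generates two dummy complex signals with different frequency components using NumPy only. The tone frequencies, noise level, duration, and the helper name `make_signal` are all assumptions chosen for illustration; they are not the settings of the actual flowgraph.

```python
import numpy as np

sample_rate = 32e3   # samples per second, matching iq.sampleRate in this notebook
duration = 1.0       # seconds (hypothetical)
t = np.arange(int(sample_rate * duration)) / sample_rate
rng = np.random.default_rng(0)

def make_signal(freqs_hz, noise_std=0.1):
    """Sum of complex tones at the given frequencies plus complex Gaussian noise."""
    tones = sum(np.exp(2j * np.pi * f * t) for f in freqs_hz)
    noise = noise_std * (rng.standard_normal(t.size) + 1j * rng.standard_normal(t.size))
    return (tones + noise).astype(np.complex64)

# Two hypothetical signal sources with different frequency components.
class1 = make_signal([2e3, 5e3])
class2 = make_signal([3e3, 9e3])

# Locate the strongest frequency component of each signal.
freqs = np.fft.fftfreq(t.size, d=1 / sample_rate)
peak1 = freqs[np.argmax(np.abs(np.fft.fft(class1)))]
peak2 = freqs[np.argmax(np.abs(np.fft.fft(class2)))]
print(peak1, peak2)
```

The two signals look similar in the time domain, but their spectra peak at different frequencies, which is the kind of difference a classifier can learn.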

#### Dataset

You can find the two data files, **class-1.npy** and **class-2.npy**, in the `./data/ml-for-signal-classification` folder. Let's look at some basic details of these two files.

In [None]:
file1 = "./data/ml-for-signal-classification/class-1.npy"
file2 = "./data/ml-for-signal-classification/class-2.npy"

In [None]:
iq.sampleRate = 32e3  # samples per second

duration1 = iq.getTimeDuration(file1, fileType="npy")
print("Time duration of the numpy file: " + str(duration1) + " seconds")

data1 = iq.getSegmentData(file1, 0, duration1, fileType='npy')
length = len(data1)
print("Number of samples in numpy data: " + str(length))

In [None]:
duration2 = iq.getTimeDuration(file2, fileType="npy")
print("Time duration of the numpy file: " + str(duration2) + " seconds")

data2 = iq.getSegmentData(file2, 0, duration2, fileType='npy')
length = len(data2)
print("Number of samples in numpy data: " + str(length))

Let's visualize them and see if we can visually recognize differences between the two signals.

In [None]:
iq.plotFFT(data1)

iq.plotFFT(data2)

In [None]:
iq.plotSpectrogram(data1)

iq.plotSpectrogram(data2)

### 3. Training and Testing a Machine Learning Model

With the insights gained in the previous step, we can now generate the **X** and **y** training dataset for a classifier. We do that by moving a sliding window over each data file and converting the data segment in each window into a feature vector. Each feature vector is appended to the **X** matrix, and its label is appended to the **y** vector.
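The sliding-window idea can be sketched with plain NumPy. This is not how `ml.loadTrainingData` is implemented (its internals are not shown in this notebook); it is one plausible approach, assuming non-overlapping windows, FFT magnitudes as features, and averaging of adjacent FFT bins to reach a fixed feature vector size. The helper name `window_features` and the two test tones are hypothetical.

```python
import numpy as np

def window_features(samples, sample_rate, window_seconds, feature_vector_size):
    """Cut samples into non-overlapping windows and turn each into a feature vector."""
    window_len = int(round(sample_rate * window_seconds))
    features = []
    for start in range(0, len(samples) - window_len + 1, window_len):
        segment = samples[start:start + window_len]
        magnitude = np.abs(np.fft.fft(segment))
        # Average adjacent FFT bins so every window yields the same number of features.
        bins = np.array_split(magnitude, feature_vector_size)
        features.append([b.mean() for b in bins])
    return np.array(features)

# Hypothetical stand-in signals: one second of a single tone per class.
sample_rate = 32e3
t = np.arange(int(sample_rate)) / sample_rate
signal1 = np.exp(2j * np.pi * 2e3 * t)
signal2 = np.exp(2j * np.pi * 9e3 * t)

X1 = window_features(signal1, sample_rate, 0.1, 50)
X2 = window_features(signal2, sample_rate, 0.1, 50)

# Stack the feature vectors into X and build the matching label vector y.
X = np.vstack([X1, X2])
y = np.array([0] * len(X1) + [1] * len(X2))
print(X.shape, y.shape)
```

With a 0.1-second window at 32,000 samples per second, each window holds 3,200 samples, so one second of data per class yields 10 feature vectors per class.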

In [None]:
iq.sampleRate = 32e3
sliding_window = 0.1
feature_vector_size = 50

ml.loadTrainingData(file1, iq.sampleRate, feature_vector_size, sliding_window, duration1, "Class 1")
ml.loadTrainingData(file2, iq.sampleRate, feature_vector_size, sliding_window, duration2, "Class 2")

clf = ml.createClassifier()
ml.trainAndTest(clf)
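What `ml.createClassifier()` and `ml.trainAndTest()` do internally is not shown in this notebook. As a point of comparison, a typical hold-out evaluation written directly with scikit-learn might look like the sketch below. The synthetic feature data, the 70/30 split, and the choice of `svm.SVC` are assumptions made for illustration.

```python
import numpy as np
from sklearn import svm
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Hypothetical feature vectors: 100 per class, 50 features each,
# drawn from two Gaussians with different means.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (100, 50)),
               rng.normal(3.0, 1.0, (100, 50))])
y = np.array([0] * 100 + [1] * 100)

# Hold out 30% of the data for testing, keeping the class balance.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)

clf = svm.SVC()
clf.fit(X_train, y_train)

# Accuracy on data the classifier has not seen during training.
accuracy = accuracy_score(y_test, clf.predict(X_test))
print("Test accuracy:", accuracy)
```

Evaluating on held-out data matters because accuracy measured on the training set alone can hide a model that has simply memorized its examples.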