
## Classify Radio Signals from Space with Keras

We use 2D spectrograms of deep-space radio signals collected by the antennas at the SETI Institute.

The SETI Institute is a non-profit research organization whose mission is to explore, understand, and explain the origin and nature of life in the universe, and apply the knowledge gained to inspire and guide present and future generations. Their goal is the discovery and exchange of knowledge as scientific ambassadors for the public, the press and the government. SETI stands for "search for extraterrestrial intelligence".

The Allen Telescope Array (also abbreviated as ATA) represents the joint effort of the SETI Institute and the Radio Astronomy Laboratory at the University of California, Berkeley to build an interferometry radio telescope (or "radiointerferometer") that is dedicated to both astronomical observations and the simultaneous search for Extraterrestrial Intelligence.

The ATA is currently under construction at the Hat Creek Astronomical Observatory, 450 kilometers from San Francisco, California. When finished, it will consist of 350 antennas. The first phase with 42 antennas (ATA-42) is already complete and began operating on October 11, 2007.

The goal of the ATA is to search for faint but persistent signals.

The current signal detection system is programmed to find only particular kinds of signals, such as narrow-band carrier waves. However, it is sometimes triggered by signals that are neither narrow-band nor known radio-frequency interference, and its detection efficiency for them is unknown. Several categories of these events have been observed in the recent past, so our goal is to build an image classification model that classifies them accurately.

The original signals were not 2D spectrograms but time-series data collected by [SETI](https://www.seti.org/). We work with 2D spectrograms that were created by transforming that time-series data, and we use them as images to train our classification model.
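
To make the time-series-to-spectrogram step concrete, here is a minimal sketch of a short-time Fourier transform in NumPy. It is not the pipeline SETI used to produce this dataset: the synthetic signal, the window size, the hop length, and the helper name `spectrogram` are all assumptions chosen for illustration.

```python
import numpy as np

def spectrogram(signal, window_size=128, hop=64):
    """Magnitude spectrogram from a short-time Fourier transform (illustrative helper)."""
    window = np.hanning(window_size)
    # Slice the signal into overlapping frames and taper each one.
    n_frames = 1 + (len(signal) - window_size) // hop
    frames = np.stack([
        signal[i * hop : i * hop + window_size] * window
        for i in range(n_frames)
    ])
    # One FFT per frame: rows are time steps, columns are frequency bins.
    return np.abs(np.fft.rfft(frames, axis=1))

# Synthetic stand-in for a narrow-band carrier buried in noise (assumed data).
rng = np.random.default_rng(42)
t = np.arange(8192)
signal = np.sin(2 * np.pi * 0.1 * t) + 0.5 * rng.standard_normal(t.size)

spec = spectrogram(signal)
print(spec.shape)                           # (time frames, frequency bins)
print(int(np.argmax(spec.mean(axis=0))))    # frequency bin with the most energy
```

A steady carrier shows up as one bright frequency bin across all time frames, which is the kind of structure an image classifier can pick up.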

### First, We Import Libraries

In [3]:
#from livelossplot.tf_keras import PlotLossesCallback
import warnings

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

import tensorflow as tf
import keras

from sklearn.metrics import confusion_matrix
from sklearn import metrics

np.random.seed(42)  # make the random image selection reproducible
warnings.simplefilter('ignore')
%matplotlib inline
print('Tensorflow version:', tf.__version__)

Tensorflow version: 2.2.0


Using TensorFlow backend.


## Task 2: Load and Preprocess SETI Data

The data are stored in my GitHub repository as compressed archives, so we need to download and extract them. The files are then read with the read_csv function from pandas.

We have two datasets, one for training and one for validation. Each set has two files. The images.csv file contains the spectrogram images as raw pixel intensity values, normalized to lie between 0 and 1. Each image is flattened into a single vector, so each row in the CSV file corresponds to one image.
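
The flattening described above can be sketched as a NumPy round trip. The 64 x 128 image size matches the reshape used later in this notebook; the random array standing in for real spectrograms is an assumption for illustration only.

```python
import numpy as np

height, width = 64, 128

# Five fake spectrograms with values in [0, 1), standing in for real data.
images = np.random.default_rng(0).random((5, height, width))

# Flatten: each image becomes one row of 64 * 128 = 8192 values,
# which is the layout of a row in images.csv.
rows = images.reshape(len(images), -1)

# Restore: back to (n_images, height, width, channels) for the CNN.
restored = rows.reshape(-1, height, width, 1)

print(rows.shape, restored.shape)
```

Using `-1` lets NumPy infer the number of images, so the same line works for both the training and the validation set.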

The labels.csv file contains the class of each image, so each row corresponds to one image.

The labels are one-hot encoded as vectors of length 4 (the number of classes).

* [1,0,0,0] is squiggle
* [0,1,0,0] is Narrow-band signal
* [0,0,1,0] is Noise
* [0,0,0,1] is Narrow-band-drd signal
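
As a small illustration of the encoding listed above, a one-hot vector can be turned back into a class name by finding the position of its largest entry. The helper name `decode_one_hot` is hypothetical; the class names are the ones used for the confusion matrix at the end of this notebook.

```python
CLASS_NAMES = ["squiggle", "narrowband", "noise", "narrowbanddrd"]

def decode_one_hot(vector):
    """Return the class name for a one-hot (or probability) vector."""
    values = list(vector)
    # The index of the largest entry identifies the class.
    return CLASS_NAMES[values.index(max(values))]

print(decode_one_hot([0, 0, 1, 0]))
print(decode_one_hot([0.1, 0.7, 0.1, 0.1]))
```

The same lookup works on the softmax output of the model, since the predicted class is the one with the highest probability.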

In [5]:
# Use the /raw/ URL: the /blob/ URL returns GitHub's HTML page, not the archive itself.
!wget https://github.com/SarahLares/Classify_Radio_Signals_from_Space_with_keras./raw/master/train.tar.xz


In [15]:
# Plain wget with the /raw/ URL; -i would treat the downloaded file as a list of URLs to fetch.
!wget https://github.com/SarahLares/Classify_Radio_Signals_from_Space_with_keras./raw/master/validation.tar.xz

In [16]:
# The archives are xz-compressed, so use -J (xz) rather than -z (gzip).
!tar -xJvf train.tar.xz
!tar -xJvf validation.tar.xz


In [14]:
!ls

sample_data  train.tar.xz  validation.tar.xz


In [0]:
train_images = pd.read_csv('dataset/train/images.csv', header=None)
train_labels = pd.read_csv('dataset/train/labels.csv', header=None)

val_images = pd.read_csv('dataset/validation/images.csv', header=None)
val_labels = pd.read_csv('dataset/validation/labels.csv', header=None)

In [0]:
train_images.head()

In [0]:
train_labels.head()

In [0]:
print("Training set shape:", train_images.shape, train_labels.shape)
print("Validation set shape:", val_images.shape, val_labels.shape)

In [0]:
x_train = train_images.values.reshape(3200, 64, 128, 1)
x_val = val_images.values.reshape(800, 64, 128, 1)

y_train = train_labels.values
y_val = val_labels.values

## Task 3: Plot 2D Spectrograms

In [0]:
plt.figure(0, figsize=(12,12))
for i in range(1,4):
    plt.subplot(1,3,i)
    img = np.squeeze(x_train[np.random.randint(0, x_train.shape[0])])
    plt.xticks([])
    plt.yticks([])
    plt.imshow(img)

In [0]:
plt.imshow(np.squeeze(x_train[3]), cmap="gray");

## Task 4: Create Training and Validation Data Generators

In [0]:
from tensorflow.keras.preprocessing.image import ImageDataGenerator

datagen_train = ImageDataGenerator(horizontal_flip=True)
datagen_train.fit(x_train)

datagen_val = ImageDataGenerator(horizontal_flip=True)
datagen_val.fit(x_val)
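
With `horizontal_flip=True`, the generator randomly mirrors images along the width axis. The sketch below shows the mirroring itself on a tiny array in plain NumPy; it only illustrates the transformation and is not a replacement for `ImageDataGenerator`, which applies the flip at random to each image in a batch.

```python
import numpy as np

# Tiny stand-in image with shape (height, width, channels).
image = np.arange(12).reshape(3, 4, 1)

# A horizontal flip reverses the width axis and leaves rows and channels alone.
flipped = image[:, ::-1, :]

print(image[0, :, 0])    # first row before the flip
print(flipped[0, :, 0])  # first row after the flip
```

Flipping twice returns the original image, so the augmentation changes the orientation of a pattern but not its pixel values.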

## Task 5: Creating the CNN Model

In [0]:
from tensorflow.keras.layers import Dense, Input, Dropout,Flatten, Conv2D
from tensorflow.keras.layers import BatchNormalization, Activation, MaxPooling2D

from tensorflow.keras.models import Model, Sequential
from tensorflow.keras.optimizers import Adam, SGD
from tensorflow.keras.callbacks import ModelCheckpoint, ReduceLROnPlateau
from tensorflow.keras.utils import plot_model

In [0]:
# Initialising the CNN
model = Sequential()

# 1st Convolution
model.add(Conv2D(32,(5,5), padding='same', input_shape=(64, 128,1)))
model.add(BatchNormalization())
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))

# 2nd Convolution layer
model.add(Conv2D(64,(5,5), padding='same'))
model.add(BatchNormalization())
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))

# Flattening
model.add(Flatten())

# Fully connected layer
model.add(Dense(1024))
model.add(BatchNormalization())
model.add(Activation('relu'))
model.add(Dropout(0.4))

model.add(Dense(4, activation='softmax'))
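
To see how the tensor shapes evolve through the layers above, the arithmetic can be traced by hand. This is a sketch with hypothetical helper names (`conv_same`, `max_pool`); it relies on the facts that `padding='same'` keeps height and width, that 2 x 2 pooling halves them, and that BatchNormalization, Activation, and Dropout leave the shape unchanged.

```python
def conv_same(shape, filters):
    """A 'same'-padded convolution keeps height and width; channels become `filters`."""
    height, width, _ = shape
    return (height, width, filters)

def max_pool(shape, pool=2):
    """Max pooling divides height and width by the pool size."""
    height, width, channels = shape
    return (height // pool, width // pool, channels)

shape = (64, 128, 1)                       # input spectrogram
shape = max_pool(conv_same(shape, 32))     # after the 1st convolution block
print(shape)
shape = max_pool(conv_same(shape, 64))     # after the 2nd convolution block
print(shape)

flat = shape[0] * shape[1] * shape[2]      # size of the Flatten output
conv1_params = 5 * 5 * 1 * 32 + 32         # kernel weights + biases
conv2_params = 5 * 5 * 32 * 64 + 64
dense_params = flat * 1024 + 1024          # weights + biases of Dense(1024)
print(flat, conv1_params, conv2_params, dense_params)
```

Almost all trainable parameters sit in the first fully connected layer, which is worth keeping in mind when reading `model.summary()`.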

## Task 6: Learning Rate Scheduling and Compile the Model

In [0]:
initial_learning_rate = 0.005
lr_schedule = tf.keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate,
    decay_steps=5,
    decay_rate=0.96,
    staircase=True)

optimizer = Adam(learning_rate=lr_schedule)
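
The staircase schedule configured above follows the formula `initial_learning_rate * decay_rate ** floor(step / decay_steps)`, where `step` counts optimizer updates (batches), not epochs. The sketch below reimplements that formula in plain Python to show how quickly the rate falls; the function name is hypothetical, and the 100 steps per epoch assume the 3200 training images and batch size of 32 used in this notebook.

```python
def staircase_exponential_decay(step, initial_lr=0.005, decay_steps=5, decay_rate=0.96):
    """Learning rate after `step` optimizer updates, with staircase decay."""
    return initial_lr * decay_rate ** (step // decay_steps)

steps_per_epoch = 3200 // 32  # training images // batch size

for epoch in (0, 1, 2, 6, 12):
    step = epoch * steps_per_epoch
    print(f"epoch {epoch:2d}  step {step:4d}  lr {staircase_exponential_decay(step):.3e}")
```

Because the rate drops by 4% every 5 batches, it shrinks by several orders of magnitude within the 12 training epochs. A larger `decay_steps` would be one way to slow the decay if training stalls.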

In [0]:
model.compile(optimizer=optimizer, loss='categorical_crossentropy', metrics=['accuracy'])
model.summary()

## Task 7: Training the Model

In [0]:
checkpoint = ModelCheckpoint("model_weights.h5", monitor='val_loss',
                             save_weights_only=True, mode='min', verbose=0)
# To plot live training curves, install livelossplot, enable its import above,
# and add PlotLossesCallback() to this list.
callbacks = [checkpoint]
batch_size = 32
history = model.fit(
    datagen_train.flow(x_train, y_train, batch_size=batch_size, shuffle=True),
    steps_per_epoch=len(x_train)//batch_size,
    validation_data = datagen_val.flow(x_val, y_val, batch_size=batch_size, shuffle=True),
    validation_steps = len(x_val)//batch_size,
    epochs=12,
    callbacks=callbacks
)

## Task 8: Model Evaluation

In [0]:
model.evaluate(x_val, y_val)

In [0]:
import seaborn as sns

y_true = np.argmax(y_val, 1)
y_pred = np.argmax(model.predict(x_val), 1)
print(metrics.classification_report(y_true, y_pred))
print("Classification accuracy: %0.6f" % metrics.accuracy_score(y_true, y_pred))

In [0]:
labels = ["squiggle", "narrowband", "noise", "narrowbanddrd"]

ax= plt.subplot()
sns.heatmap(metrics.confusion_matrix(y_true, y_pred, normalize='true'), annot=True, ax = ax, cmap=plt.cm.Blues); #annot=True to annotate cells

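# Axis labels, title and ticks (rows are true classes, columns are predictions)
ax.set_xlabel('Predicted label'); ax.set_ylabel('True label');
ax.set_title('Confusion Matrix');
ax.xaxis.set_ticklabels(labels); ax.yaxis.set_ticklabels(labels);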
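
The `normalize='true'` option used above divides each row of the confusion matrix by the number of true samples in that class, so the diagonal holds the per-class recall. Here is a minimal pure-Python sketch of that computation; the function name and the toy labels are assumptions for illustration, not output from the trained model.

```python
def row_normalized_confusion_matrix(y_true, y_pred, n_classes):
    """Confusion matrix whose rows (true classes) each sum to 1."""
    counts = [[0] * n_classes for _ in range(n_classes)]
    for true_label, predicted_label in zip(y_true, y_pred):
        counts[true_label][predicted_label] += 1
    normalized = []
    for row in counts:
        total = sum(row)
        # A class with no true samples yields a row of zeros.
        normalized.append([count / total if total else 0.0 for count in row])
    return normalized

# Toy labels for three classes (made up for this example).
toy_true = [0, 0, 1, 1, 1, 2]
toy_pred = [0, 1, 1, 1, 0, 2]

for row in row_normalized_confusion_matrix(toy_true, toy_pred, n_classes=3):
    print([round(value, 2) for value in row])
```

Row normalization makes classes with different sample counts directly comparable in the heatmap.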