 # <div  style="color:#303030;font-family:'arial blACK', sans-serif,monospace; text-align: center; padding: 50px 0; vertical-align:middle;" > <img src="https://www.nicepng.com/png/full/204-2043038_white-lightbulb-icon-light-bulb-icon-white.png" style=" background:#00a0e4;border-radius:10px;width:150px;text-align:left"  /> <span style="position:relative; bottom:75px; left:20px">  Morphological Analysis of Time Series Biosignals using Autoencoders </span> </div>

# I. Introduction
<div style="width:100%; background:#00a0e4;font-family:'arial black',monospace; text-align: center; padding: 7px 0; border-radius: 5px 50px;margin-top:-15px" >  </div>


In this example we will perform apnea detection using autoencoders to learn respiration morphology. This notebook guides you through a morphological approach to analyzing biosignals and extracting information from them.

## <div style="color:#00a0e4"> 1. Background </div>


Autoencoders are neural networks with encoding and decoding layers that try to find a compact representation of the data by comparing the input with the output of the network. The encoding layers represent the input with progressively fewer nodes until they reach the smallest layer, usually called the bottleneck. The decoding layers then take the bottleneck representation and reconstruct the original data from it.

To learn more about autoencoders:
  - [Theoretical explanation](https://towardsdatascience.com/auto-encoder-what-is-it-and-what-is-it-used-for-part-1-3e5c6f017726)
  - [More info](https://towardsdatascience.com/generating-images-with-autoencoders-77fd3a8dd368)
  
Applying autoencoders to biosignals is useful for finding a good representation of the data based solely on its morphological structure. Then, when there is a significant change in morphology, we can assess whether it is an anomaly, an artifact, or a sign of the particular condition we are evaluating.
  
 ![alt text ](https://i.ibb.co/F7Hkfzm/ae-scheme.png)

## <div style="color:#00a0e4"> 2. Objectives</div>
* Create an autoencoder to learn the respiratory pattern 
* Perform apnea detection using correlation and supervised learning

# II. Experimental

<div style="width:100%; background:#00a0e4;color:#282828;font-family:'arial black'; text-align: center; padding: 7px 0; border-radius: 5px 50px; margin-top:-15px" > </div>

This section walks through the experimental procedure step by step and contains the most relevant content.

### <div style="color:#00a0e4">  1. Requirements (optional) </div>

The required libraries are installed with the following commands:

In [None]:
!pip install biosppy >/dev/null 2>&1
!pip install tensorflow >/dev/null 2>&1
!pip install keras >/dev/null 2>&1
!pip install scikit-learn >/dev/null 2>&1
!pip install mpld3 >/dev/null 2>&1

and imported:

In [None]:
import matplotlib.pyplot as plt
import numpy as np
import pickle
import pandas as pd
import biosppy
import tensorflow
import keras
import sklearn
import mpld3 

### <div style="color:#00a0e4">  2. Data Loading </div>

To perform this task we will use the files 'data_sep' and 'label_sep' provided [here](https://github.com/MarianaAbreu/BioSPPy/tree/master/examples).

The 'data_sep' file contains the respiration signal segmented into one-minute windows, each already filtered, normalized to [-1,1] and resampled to 1000 points.

The 'label_sep' file contains the label of each 'data_sep' sample according to its breathing pattern: 'N' for normal breathing and 'A' for a breath-hold period (apnea).

Both files are divided into a training set with 600 samples (40 users) and a testing set with 75 samples (5 users).
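
The notebook does not state how the windows were resampled and normalized, so the following is only a minimal, dependency-free sketch of one plausible pipeline: linear interpolation onto 1000 points followed by min-max scaling to [-1, 1]. The function names, the 100 Hz sampling rate and the synthetic breathing signal are assumptions made for illustration (with NumPy, `np.interp` could replace the interpolation loop).

```python
import math

def resample_linear(signal, n_out=1000):
    """Linearly interpolate `signal` onto `n_out` evenly spaced points."""
    n_in = len(signal)
    out = []
    for k in range(n_out):
        pos = k * (n_in - 1) / (n_out - 1)   # fractional index in the input
        lo = int(math.floor(pos))
        hi = min(lo + 1, n_in - 1)
        frac = pos - lo
        out.append(signal[lo] * (1 - frac) + signal[hi] * frac)
    return out

def normalize_pm1(signal):
    """Min-max scale `signal` to the range [-1, 1]."""
    s_min, s_max = min(signal), max(signal)
    if s_max == s_min:                       # flat window: avoid division by zero
        return [0.0] * len(signal)
    return [2 * (v - s_min) / (s_max - s_min) - 1 for v in signal]

# Hypothetical one-minute window: 0.25 Hz breathing sampled at 100 Hz (assumed values)
fs = 100
raw = [3.0 + 0.5 * math.sin(2 * math.pi * 0.25 * n / fs) for n in range(60 * fs)]
window = normalize_pm1(resample_linear(raw, n_out=1000))
print(len(window), min(window), max(window))
```

Whatever the original pipeline was, any new recording passed to the trained autoencoder would need the same length and amplitude range as the samples in 'data_sep'.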

In [None]:
# Each pickle holds [training set, testing set]
with open('data_sep', 'rb') as f:
    data = pickle.load(f)
with open('label_sep', 'rb') as f:
    labels = pickle.load(f)
print('\nTesting set: ' + str(len(labels[1])) + ' samples, labels: \n')
print(labels[1])

### <div style="color:#00a0e4">   3. Data Visualization </div>

Visualize normal and apnea samples to understand the differences in morphologies.

#### <div style="color:#00a0e4">   3.1. Plot Example  </div>

In [None]:
# Data for plotting
index_1 = 2
index_2 = 12
x1 = data[0][index_1]
x2 = data[0][index_2]
# plot
plt.title('Sample of breathing and apnea', {'size':20})
plt.xlabel('Sample index', color="#00a0e4")  # each window has 1000 points, not seconds
plt.ylabel('Amplitude', color="#00a0e4")
plt.plot(x1, label=labels[0][index_1], color="#00bfc2")
plt.plot(x2, label=labels[0][index_2], color="#5756d6")
plt.legend()
plt.savefig('resp.png')
plt.show()



### <div style="color:#00a0e4">   4. Autoencoder Creation </div>

To train the autoencoder we will need to choose the following parameters: 
* Number of layers and number of nodes in each layer 
* Activation function
* Optimization function
* Error function
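
The choice of error function determines what "a good reconstruction" means. As a rough illustration, the sketch below compares mean squared error with a cosine-based loss on toy sine waves; Keras' `cosine_similarity` loss is the negative cosine similarity, so values close to -1 indicate matching shapes. The helper functions and the toy signals are assumptions written for this example, not part of the notebook.

```python
import math

def mse(y_true, y_pred):
    """Mean squared error: sensitive to amplitude differences."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def cosine_loss(y_true, y_pred):
    """Negative cosine similarity: -1 means identical shape, 0 means unrelated."""
    dot = sum(t * p for t, p in zip(y_true, y_pred))
    norm_t = math.sqrt(sum(t * t for t in y_true))
    norm_p = math.sqrt(sum(p * p for p in y_pred))
    if norm_t == 0 or norm_p == 0:
        return 0.0
    return -dot / (norm_t * norm_p)

# Toy signals: same shape at half amplitude, and the same wave shifted by a quarter cycle
target = [math.sin(2 * math.pi * k / 100) for k in range(1000)]
half_amplitude = [0.5 * v for v in target]
shifted = [math.sin(2 * math.pi * k / 100 + math.pi / 2) for k in range(1000)]

for name, pred in [("half amplitude", half_amplitude), ("quarter-cycle shift", shifted)]:
    print(name, round(mse(target, pred), 3), round(cosine_loss(target, pred), 3))
```

The cosine-based loss ignores the amplitude change but penalizes the shape mismatch, which is one reason it can suit a morphology-oriented analysis.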

#### <div style="color:#00a0e4">   4.1. Autoencoder creation   </div>

The function "autoencoder_" receives the common parameters of a neural network and creates the autoencoder. Its advantage is that it adapts to different numbers of layers and nodes.
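
To make the mirrored architecture concrete, the sketch below computes the sizes of the Dense layers that the default arguments produce, together with the number of trainable weights. `autoencoder_layout` is a hypothetical helper written for this illustration; it only does arithmetic and does not build a Keras model.

```python
def autoencoder_layout(input_len=1000, list_layers=(500, 250, 50)):
    """Return the Dense layer sizes (encoder + mirrored decoder) and the weight count."""
    layers = list(list_layers)
    decoder = layers[::-1][1:] + [input_len]   # mirror the encoder, skipping the bottleneck
    sizes = layers + decoder
    # each Dense layer has (inputs * outputs) weights plus one bias per output
    n_params = 0
    prev = input_len
    for n in sizes:
        n_params += prev * n + n
        prev = n
    return sizes, n_params

sizes, n_params = autoencoder_layout()
print(sizes)      # [500, 250, 50, 250, 500, 1000]
print(n_params)   # 1277550
```

With the defaults, a 1000-point sample is compressed by a factor of 20 at the bottleneck (1000 / 50).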

In [None]:
from keras.layers import Input, Dense
from keras.models import Model

def autoencoder_(encoding_dim=50, input_len=1000, list_layers=[500,250,50], a_fun='tanh', 
                optimizer='adam', loss='cosine_similarity', x_train=None, x_test=None, label='', epochs=200):
    """
    encoding_dim: number of nodes in the smallest layer (bottleneck); must equal list_layers[-1]
    input_len: size of input samples
    list_layers: number of nodes in each encoding layer, in decreasing order; the decoding layers mirror these sizes
    a_fun: activation function 'tanh', 'sigmoid', 'relu',...
    optimizer: usually 'adam'
    loss: loss function 'cosine_similarity', 'mse', 'binary_crossentropy', ...
    """

    if sorted(list_layers, key=int, reverse=True) != list_layers:
        print('\nList should be in order!\n')
    #Encoder layers creation    
    input_sig = Input(shape=(input_len,))
    
    encoded = Dense(list_layers[0], activation = a_fun)(input_sig)
    for nt in list_layers[1:]:
        encoded = Dense(nt, activation=a_fun)(encoded)

    #Decoder layers creation
    #"decoded" is the lossy reconstruction of the input
    inverse_layers = list_layers[::-1][1:] + [input_len]
    decoded = Dense(inverse_layers[0], activation=a_fun)(encoded)
    for ll in inverse_layers[1:]:
        decoded = Dense(ll, activation=a_fun)(decoded)

    autoencoder = Model(input_sig, decoded)
    ##Let's also create a separate encoder model as well as the decoder model:
    encoder = Model(input_sig, encoded)
    encoded_input = Input(shape=(encoding_dim,))

    dec_layers = autoencoder.layers[len(list_layers)+1:]
    decoder_output = dec_layers[0](encoded_input)
    for dl in dec_layers[1:]:
        decoder_output = dl(decoder_output)
        
    decoder = Model(encoded_input,decoder_output)

    ##Compile the model with the optimizer and loss function passed as arguments
    autoencoder.compile(optimizer=optimizer,loss=loss)

    print('\nAutoencoder Created:')
    print('Layers: ' + str(list_layers))
    print('Input Length: ' + str(input_len))
    print('Compression: ' + str(encoding_dim))
    print('Activation: ' + str(a_fun))
    print('Optimizer: ' + str(optimizer))
    print('Loss: ' + str(loss) +'\n')

    autoencoder.fit(x_train,x_train,
                    epochs=epochs,
                    batch_size=100,
                    shuffle=True,
                    validation_data=(x_test, x_test))
    return encoder, decoder



#### <div style="color:#00a0e4">   4.2. Autoencoder train   </div>

To train the autoencoder we will use only the 'N' samples from data[0], the training set, and split them into a training and a validation set with a 70/30 proportion.


In [None]:
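# positions of the normal-breathing ('N') samples in the training set
index_N = np.flatnonzero(np.asarray(labels[0]).ravel() == 'N')
data_N = np.asarray(data[0])[index_N]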

<div style="background:#00bfc2;font-family:'arial', monospace; text-align: center; padding: 10px 0; border-radius:10px; width:70%; margin:auto " >
  <span style="font-size:20px;position:relative;color:white; ">  Note </span> <br>
  <div style="background:#a8e0e0;"> 
    The autoencoder training prints one line per epoch with the loss on the training and validation sets. To hide this output, uncomment the lines between the '#---' markers, which define a context manager that silences standard output, and indent the call to autoencoder_ so that it runs inside the 'with' block.     
</div>

In [None]:
from sklearn.model_selection import train_test_split
#separate the train data in the train and validation sets for the autoencoder
data_train, data_val = train_test_split(data_N, test_size=0.3)

#---
#import sys, os
#from contextlib import contextmanager
#@contextmanager
#def suppress_stdout():
#    with open(os.devnull, "w") as devnull:
#        old_stdout = sys.stdout
#        sys.stdout = devnull
#        try:  
#            yield
#        finally:
#            sys.stdout = old_stdout
#
#with suppress_stdout():
#---
encoder, decoder = autoencoder_(x_train=data_train, x_test=data_val, epochs = 200)

<div style="background:#62d321;font-family:'arial', monospace; text-align: center; padding: 10px 0; border-radius:10px; width:70%; margin:auto " >
  <span style="font-size:20px;position:relative;color:white; "> Explore </span> <br>
  <div style="background:#c2e8ac;"> 
    The default parameters were tuned for this analysis, but other choices might improve the results even further. Experiment with the options available at:  
</div>


[Loss functions](https://keras.io/api/losses/)

[Activation functions](https://keras.io/api/layers/activations/)

[Optimizers](https://keras.io/api/optimizers/)


#### <div style="color:#00a0e4">   4.3. Morphological Analysis   </div>

Using the encoder and decoder returned by the 'autoencoder_' function, we now pass the original training and testing data through the network. Ideally only unseen data would be used at this stage; however, since our dataset is small, we will reuse the training set (data[0]) to train the supervised learning model, whereas the testing set (data[1]) is used for the first time to evaluate the overall algorithm.

The plot below shows how an autoencoder trained on normal breathing samples reconstructs a sample from the testing set; apnea samples, which it never saw during training, are expected to be reconstructed less faithfully.

In [None]:
enc_test = encoder.predict(data[1])
dec_test = decoder.predict(enc_test)
enc_train = encoder.predict(data[0])
dec_train = decoder.predict(enc_train)

In [None]:
# Data for plotting
index = 12
x1 = data[1][index]   # original test sample
x2 = dec_test[index]  # reconstruction produced by the autoencoder
# plot
plt.title('Original sample and its reconstruction', {'size':20})
plt.xlabel('Sample index', color="#00a0e4")
plt.ylabel('Amplitude', color="#00a0e4")
plt.plot(x1, label='Original (' + str(np.ravel(labels[1])[index]) + ')', color="#00bfc2")
plt.plot(x2, label='Reconstruction', color="#5756d6")
plt.legend()
plt.savefig('resp_reconstruction.png')
plt.show()

<div style="background:#62d321;font-family:'arial', monospace; text-align: center; padding: 10px 0; border-radius:10px; width:70%; margin:auto " >
  <span style="font-size:20px;position:relative;color:white; "> Explore </span> <br>
  <div style="background:#c2e8ac;"> 
    What happens if the autoencoder is trained with the entire training data instead of only the 'N' samples? And what if it is trained only on 'A' samples? Change the first cell of section 4.2, Autoencoder train, to explore these options.     
</div>

#### <div style="color:#00a0e4">   4.4. Input/Output Correlation   </div>

Using the Pearson correlation, we will compare the data used as input (data[0], data[1]) with the corresponding reconstructed output (dec_train, dec_test). Instead of computing a single correlation value for the entire sample, we will divide each sample into 10 segments of 100 points and compute the correlation within each segment.

![alt_text](https://i.ibb.co/qM8xPq6/correlation-scheme2.jpg)
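
As a dependency-free illustration of the segment-wise comparison, the sketch below computes the Pearson correlation per segment on toy signals. `pearson` and `segment_correlations` are hypothetical helpers written for this example (the notebook itself relies on `scipy.stats.pearsonr`), and the "reconstruction" is an artificial signal that follows the original for the first half and is inverted in the second half.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return float("nan")   # undefined when one segment is constant
    return cov / math.sqrt(vx * vy)

def segment_correlations(original, reconstructed, n_segments=10):
    """Split both signals into n_segments equal parts and correlate each pair."""
    seg_len = len(original) // n_segments
    return [pearson(original[i:i + seg_len], reconstructed[i:i + seg_len])
            for i in range(0, seg_len * n_segments, seg_len)]

# Toy signals (assumed): the reconstruction is inverted in the second half
original = [math.sin(2 * math.pi * k / 100) for k in range(1000)]
reconstructed = original[:500] + [-v for v in original[500:]]
corr = segment_correlations(original, reconstructed)
print([round(c, 2) for c in corr])
```

The resulting vector localizes where the reconstruction fails, which a single whole-sample correlation would average away. A constant segment makes the coefficient undefined, which is why samples with non-finite correlations need to be discarded before classification.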

In [None]:
corr_size = 10 
cs = int(1000/corr_size)

from scipy.stats import pearsonr
train_cl, test_cl, y_train_new, y_test_new = [], [], [], []

#correlation in training set
for d in range(len(dec_train)):
    corr_train = [pearsonr(dec_train[d][i:i+cs],data[0][d][i:i+cs])[0] for i in range(0,len(dec_train[d]),cs)]
    if np.isfinite(corr_train).all():
        train_cl += [corr_train]
        y_train_new += [labels[0][d]]

#correlation in the testing set
for d in range(len(dec_test)):
    corr = [pearsonr(dec_test[d][i:i+cs],data[1][d][i:i+cs])[0] for i in range(0,len(dec_test[d]),cs)]
    if np.isfinite(corr).all():
        test_cl += [corr]
        y_test_new += [labels[1][d]]


<div style="background:#62d321;font-family:'arial', monospace; text-align: center; padding: 10px 0; border-radius:10px; width:70%; margin:auto " >
  <span style="font-size:20px;position:relative;color:white; "> Explore </span> <br>
  <div style="background:#c2e8ac;"> 
    The correlation size = 10 is good for this signal since it seems to encapsulate breathing cycles, however other correlation sizes might also be interesting to evaluate. Change the correlation size and see its effect on the classifier performance in the next section.    
</div>

### <div style="color:#00a0e4">   5. Supervised Learning   </div>

Train a supervised learning classifier to discriminate between 'N' and 'A' samples based solely on the correlation vectors previously computed (train_cl, test_cl) and the respective labels (y_train_new, y_test_new), which exclude the samples discarded for having non-finite correlations.

In [None]:
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score
classifier = GaussianNB()
# use the labels that match the samples kept after the correlation step
classifier.fit(train_cl, np.ravel(y_train_new))

y_pred = classifier.predict(test_cl)
score_acc = accuracy_score(np.ravel(y_test_new), y_pred)

print(" --- Accuracy: " + str(round(100 * score_acc, 2)) + '%')
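
Accuracy alone can be misleading when one class is much rarer than the other, so it may help to also report how many apneas are actually detected. The sketch below computes sensitivity and specificity with 'A' as the positive class; `apnea_metrics` and the label lists are hypothetical and only illustrate the calculation.

```python
def apnea_metrics(y_true, y_pred, positive="A"):
    """Accuracy, sensitivity and specificity with `positive` as the target class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),
        "specificity": tn / (tn + fp) if tn + fp else float("nan"),
    }

# Hypothetical labels: one of the two apneas is missed
y_true_demo = ["N", "N", "N", "N", "N", "N", "A", "A", "N", "N"]
y_pred_demo = ["N", "N", "N", "N", "N", "N", "A", "N", "N", "N"]
metrics = apnea_metrics(y_true_demo, y_pred_demo)
print(metrics)
```

In this toy case the accuracy is 90% even though only half of the apneas are detected.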

<div style="background:#62d321;font-family:'arial', monospace; text-align: center; padding: 10px 0; border-radius:10px; width:70%; margin:auto " >
  <span style="font-size:20px;position:relative;color:white; "> Explore </span> <br>
  <div style="background:#c2e8ac;"> 
    Other classifiers might also produce solid results; experiment with them and their parameters to achieve better results.      
</div>
    
    


<div style="background:#fada5e;font-family:'arial', monospace; text-align: center; padding: 10px 0; border-radius:10px; width:70%; margin:auto " >
  <span style="font-size:20px;position:relative;color:white; ">  Warning! </span> <br>
  <div style="background:#fff4c9;"> 
    Our initial data was already separated into a training set and a testing set. However, for robust experiments we should repeat the process with different partitions of the data between the two sets, so that the performance estimate is more representative of the problem.   
</div>
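
One way to repeat the experiment with different partitions is to rotate which users form the testing set while keeping all samples of a user together. The sketch below generates such index splits under two assumptions that the notebook does not confirm: that the 45 users contribute 15 samples each (consistent with 600/40 and 75/5) and that the samples of each user are stored contiguously once both sets are concatenated. `user_folds` is a hypothetical helper; scikit-learn's `GroupKFold` offers similar functionality.

```python
def user_folds(n_users=45, samples_per_user=15, n_folds=9):
    """Yield (train_idx, test_idx) pairs with all samples of a user in the same set.

    Assumes n_users is divisible by n_folds and that samples are ordered by user.
    """
    users = list(range(n_users))
    fold_size = n_users // n_folds
    for f in range(n_folds):
        test_users = set(users[f * fold_size:(f + 1) * fold_size])
        train_idx, test_idx = [], []
        for u in users:
            idx = range(u * samples_per_user, (u + 1) * samples_per_user)
            (test_idx if u in test_users else train_idx).extend(idx)
        yield train_idx, test_idx

folds = list(user_folds())
for train_idx, test_idx in folds[:2]:
    print(len(train_idx), len(test_idx))
```

Each fold reproduces the original 40/5 user proportion, and the autoencoder and classifier would be retrained from scratch on every fold before averaging the scores.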

# III. Explore
<div class='h1'  style="width:100%; background:#00a0e4;color:#282828;font-family:'arial black'; text-align: center; padding: 7px 0; border-radius: 5px 50px;margin-top:-15px" > </div>

### <div style="color:#00a0e4">  1. Final Notes </div>

In this notebook we created an autoencoder to learn the morphology of the respiratory signal and used it to distinguish apneas from normal breathing samples. The same approach can be applied to other time series and classification tasks, for instance using the bottleneck representation as features instead of the correlation analysis. Autoencoders and similar algorithms can therefore serve as substitutes for manual feature engineering.

### <div style="color:#00a0e4">  2. Quiz  </div>
1. Use the bottleneck representation (enc_train and enc_test) as input to the classifier.
2. Use cross validation to evaluate the performance critically.
3. Apply this technique to another dataset and problem of your choice.

### <div style="color:#00a0e4">  3. Further Reading  </div>

1. Check the full paper where this technique is explained thoroughly (link paper)
2. Explore supervised learning through feature engineering (link Signal Classification using SL)
3. Explore other neural networks applied to biosignals (link prof.André Martins notebook)
4. Explore how to extract datasets from public databases to extend these challenges (link Tiago)

<div style="height:115px; background:white;border-radius:10px;text-align:center"> 

<img src="https://www.lx.it.pt/~asmc/predict/images/IT.png" alt="it" style="position: relative; margin-left: 10px; bottom:-55px;max-width:150px;height:auto;"/> 
<img src="https://cqe.tecnico.ulisboa.pt/files/files/logos/IST_A_RGB_POS.png"
         alt="alternate text" 
         style="position: relative; margin-left: 10px;  bottom:-50px; width:150px;height:auto;"/>
</div> 

<div style="width: 100%; ">
    <div style="background:#00a0e4;color:white;font-family:'arial', monospace; text-align: center; padding: 50px 0; border-radius:10px; height:10px; width:100%; float:left " >
  <span style="font-size:20px;position:relative; top:-25px">  Suggestions are welcome! </span> <br>
 <span style="font-size:12px;position:relative; top:-25px">  Please provide us your feedback at jehdwne@it.lx.pt</span> 
</div>