<a href="https://colab.research.google.com/github/djsabelo/BiosignalsDeepLearningWorkshop/blob/main/SignalProcessingForSynthesis.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Processing:

1. Get the file and load it into numpy arrays
2. Plot the data with two or more signals
3. Crop the data
4. Remove the mean and normalize by the absolute maximum
5. Subsample (decimate)
6. Plot the data with two or more signals
7. Remove noise (smoothing)
8. Plot the data
9. Show and correct the baseline wander
10. Remove the minimum
11. Quantization
12. Plot the data
13. Segmentation with a sliding window (window sizes of 2^x + 1); each window must contain at least 2 cycles
14. Plot many windows





# Signal Processing for Synthesis

How well Deep Learning can synthesize biosignals depends on the quality of the signal processing and, before that, on the quality of the raw signal.

In this workshop, we will go through the signal processing steps that are usually applied before biosignal synthesis, using Python. We will use **numpy** for the numerical operations, **matplotlib** to visualize the result of each step and **os** to find the data files.
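
As a rough preview of these steps, the sketch below runs mean removal, normalization, subsampling, smoothing, quantization and sliding-window segmentation on a synthetic sine wave rather than on real ECG. The sampling rate, subsampling factor, window lengths and number of quantization levels are all assumed values chosen for illustration. The subsampling is plain slicing with no anti-aliasing filter, which `scipy.signal.decimate` would apply first, and baseline wander correction is left out.

```python
import numpy as np

# Synthetic stand-in for one ECG channel (assumed: 250 Hz, 10 s, 1.2 Hz "beats")
fs = 250
t = np.arange(10 * fs) / fs
rng = np.random.default_rng(0)
signal = np.sin(2 * np.pi * 1.2 * t) + 0.5 + 0.05 * rng.standard_normal(t.size)

# Remove the mean and normalize by the absolute maximum
signal = signal - np.mean(signal)
signal = signal / np.max(np.abs(signal))

# Naive subsampling: keep every 2nd sample (no anti-aliasing filter)
factor = 2
signal = signal[::factor]

# Smooth with a 5-sample moving average
window = 5
smoothed = np.convolve(signal, np.ones(window) / window, mode="same")

# Remove the minimum, rescale to [0, 1] and quantize to integer levels
smoothed = smoothed - np.min(smoothed)
smoothed = smoothed / np.max(smoothed)
levels = 256
quantized = np.around(smoothed * (levels - 1)).astype(int)

# Sliding-window segmentation with windows of 2^x + 1 samples
win_size = 2 ** 8 + 1  # 257 samples, about 2.5 cycles at 125 Hz and 1.2 Hz
step = 64
starts = range(0, quantized.size - win_size + 1, step)
windows = np.array([quantized[s:s + win_size] for s in starts])

print(windows.shape)
```

Each row of `windows` is one training segment. A real signal would need a window length chosen from its own sampling rate and cycle duration so that every segment still holds at least two cycles.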

In [1]:
import numpy as np
import matplotlib.pyplot as plt
import os

In [2]:
!git clone https://github.com/djsabelo/BiosignalsDeepLearningWorkshop.git
%cd BiosignalsDeepLearningWorkshop

fatal: destination path 'BiosignalsDeepLearningWorkshop' already exists and is not an empty directory.


## Reading Data

The first step in any signal processing pipeline is to read the files that contain the data. In this case, we will use ECG files from the Fantasia dataset, which were downloaded beforehand and can therefore be located with **os**.

In [None]:
# Find files in folder
folder = './data/'
files = os.listdir(folder)

print(files)
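
Once the files are listed, each one has to be loaded into a numpy array. The sketch below shows one way this might look, assuming whitespace-delimited text files with one sample per row. That format, the `.txt` extension and the file name `record_01.txt` are assumptions made for illustration, not a description of the workshop's data files. To keep the example self-contained, it writes a tiny file to a temporary folder first.

```python
import os
import tempfile
import numpy as np

# Hypothetical stand-in for the data folder: a temporary directory with one small text file
folder = tempfile.mkdtemp()
samples = np.arange(5)
example = np.column_stack([samples / 250, np.sin(samples), np.cos(samples)])
np.savetxt(os.path.join(folder, "record_01.txt"), example)

# Keep only the .txt files, sorted so the order is reproducible
files = sorted(f for f in os.listdir(folder) if f.endswith(".txt"))

# Load each file into a numpy array of shape (samples, columns)
records = [np.loadtxt(os.path.join(folder, f)) for f in files]

print(files, records[0].shape)
```

Sorting the listing matters because `os.listdir` returns entries in arbitrary order, which would otherwise change which record ends up at which index from run to run.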

In [None]:
import scipy.interpolate as itp

def interpolate_signal(time, new_time, signal):
    """
    Linearly interpolates the input signal, sampled at the timestamps in the "time" vector, onto the
    timestamps in the "new_time" vector. Values outside the original time range are extrapolated.
     
    :param time: vector containing the timestamps of the input signal
    :param new_time: vector containing the new timestamps for interpolation
    :param signal: vector with the values associated with each timestamp of the "time" vector
    :return: 
        The interpolated version of the input signal at the "new_time" timestamps
    """

    f = itp.interp1d(time, signal, fill_value="extrapolate")
    return f(new_time)


def process_hr(data, quantization_size=1000):
    """
        Preprocesses the HR data and stores the maximum and minimum HR values across all records in the
        data dictionary. The preprocessing consists of interpolation, minimum subtraction, maximum
        normalization and quantization.

        :param data: data dictionary containing all information
        :param quantization_size: number of steps to be used in the quantization process
        :return:
            data: updated version of the dictionary (the input dictionary is modified in place, so the caller's
            copy changes as well; for a later publication of this code, it should be protected)
            normalized_array: list containing the processed HR records as numpy arrays
    """
    # To synchronize both signals, the HR values are interpolated onto the PPG timestamps.
    # Since the HR signal is piecewise constant (square-like), linear interpolation stays close to the original
    interpolated_hr = [interpolate_signal(hr_time, ppg_time, hr)
                       for hr_time, ppg_time, hr in zip(data["hr_times"], data["ppg_times"], data["hr_values"])]

    
    # Check if the interpolation was successful
    # for i in range(data["size"]):
    #    plt.plot(data["hr_times"][i], data["hr_values"][i], "k", alpha=0.7, label="HR (original)")
    #    plt.plot(data["ppg_times"][i], interpolated_hr[i], "b", alpha=0.7, label="HR (interpolated)")
    #    plt.legend()
    #    plt.show()

    # To preserve the distribution of values across all subjects, the data is normalized using the
    # minimum and maximum values across all records

    # Named hr_max / hr_min to avoid shadowing the built-in max() and min()
    hr_max = np.max([np.max(record) for record in interpolated_hr])
    hr_min = np.min([np.min(record) for record in interpolated_hr])
    
    data["hr_max"] = hr_max
    data["hr_min"] = hr_min

    # The minimum is removed and the set is scaled
    normalized_array = [record - hr_min for record in interpolated_hr] # Remove minimum
    # Normalization step. Dividing by the raw maximum (not by hr_max - hr_min) bounds the values by
    # (hr_max - hr_min) / hr_max, so the top of the [0, 1] range is not reached when hr_min > 0.
    normalized_array = [record / hr_max for record in normalized_array]

    # Quantization step
    # This step reduces the number of distinct values in the signal, which helps the selected model (GRU) learn
    normalized_array = [np.around(record * (quantization_size-1)).astype(int) for record in normalized_array]


    return data, normalized_array

def process_ppg(records_dictionary, quantization_size=1000):
    """
    Crops each PPG record so that it starts after the first HR timestamp, removes the per-channel minimum
    and normalizes by the absolute maximum of the record.

    :param records_dictionary: data dictionary with the "ppg_values", "ppg_times" and "hr_times" lists
    :param quantization_size: currently unused in this function
    :return:
        times: cropped PPG timestamp vectors
        normalized_array: cropped and normalized PPG records
        updated_ppg: cropped PPG records before normalization
    """
    times = []
    normalized_array = []
    updated_ppg = []
    for ppg_values, hr_time, ppg_time in \
            zip(records_dictionary["ppg_values"], records_dictionary["hr_times"], records_dictionary["ppg_times"]):
        # The PPG and HR data are not synchronized, so drop the leading PPG samples that have no
        # corresponding HR value. This makes the interpolation of the HR data more reliable.
        first_index = np.where(ppg_time > hr_time[0])[0][0]
        cropped = ppg_values[:, first_index:]
        updated_ppg.append(cropped)
        times.append(ppg_time[first_index:])

        # Remove the per-channel minimum. Broadcasting replaces the explicit np.ones product, and the
        # out-of-place subtraction leaves the arrays in records_dictionary and updated_ppg untouched.
        normalized_record = cropped - np.min(cropped, axis=1, keepdims=True)
        normalized_record = normalized_record / np.max(np.abs(normalized_record))  # Normalization step
        # normalized_record = normalized_record / np.std(np.abs(normalized_record))  # Alternative normalization
        normalized_array.append(normalized_record)

    return times, normalized_array, updated_ppg
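
To see what the interpolation, global scaling and quantization do to the values, here is a minimal sketch on two made-up HR records. All numbers and sampling rates are invented for illustration. `np.interp` is swapped in for `interp1d` so the example depends only on numpy; both are linear inside the sampled range, but `np.interp` clamps instead of extrapolating outside it. Unlike the cell above, which divides by the raw maximum, this sketch divides by the range `global_max - global_min`, so the quantized values use every level from 0 to `quantization_size - 1`.

```python
import numpy as np

# Made-up records: HR sampled at 1 Hz, PPG timestamps at 4 Hz (assumed rates)
hr_times = [np.arange(0, 5, 1.0), np.arange(0, 4, 1.0)]
hr_values = [np.array([60.0, 62.0, 65.0, 63.0, 61.0]),
             np.array([70.0, 75.0, 80.0, 72.0])]
ppg_times = [np.arange(0, 4.01, 0.25), np.arange(0, 3.01, 0.25)]

# Resample each HR record onto the denser PPG time base
interpolated = [np.interp(pt, ht, hv)
                for ht, pt, hv in zip(hr_times, ppg_times, hr_values)]

# Global extremes across all records, so every record shares one scale
global_min = min(np.min(r) for r in interpolated)
global_max = max(np.max(r) for r in interpolated)
value_range = global_max - global_min

# Scale to [0, 1] and round to integer levels
quantization_size = 1000
quantized = [np.around((r - global_min) / value_range * (quantization_size - 1)).astype(int)
             for r in interpolated]

# Approximate inverse: map the integer levels back to HR units
recovered = [q / (quantization_size - 1) * value_range + global_min for q in quantized]

worst_error = max(np.max(np.abs(r - i)) for r, i in zip(recovered, interpolated))
print(global_min, global_max, worst_error)
```

The round trip loses at most half a quantization step, which here is `value_range / (quantization_size - 1) / 2`. Storing the global minimum and maximum, as `process_hr` does in the data dictionary, is what makes this inverse mapping possible later.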