## Notebook Description

This notebook implements the instantaneous pulse rate estimation algorithm described in:
> Xu, Ke, et al. "Deep Recurrent Neural Network for Extracting Pulse Rate Variability from Photoplethysmography During Strenuous Physical Exercise." 2019 IEEE Biomedical Circuits and Systems Conference (BioCAS). IEEE, 2019.

The dataset comes from the authors' [GitHub repository](https://github.com/WillionXU/CIME-PPG-dataset-2018/).

In [1]:
import os
from pathlib import Path

import numpy as np
import pandas as pd
import plotly.graph_objects as go
from plotly.subplots import make_subplots
from scipy import io as sci_io


DATA_PATH = Path('/Volumes/Samsung_SSD/CIME-PPG-dataset-2018')

## Prepare Data

In [2]:
train_all_noise = sci_io.loadmat(DATA_PATH / 'train_all_noise.mat')
test_all_noise = sci_io.loadmat(DATA_PATH / 'test_all_noise.mat')

In [3]:
train_all_noise.keys()

dict_keys(['__header__', '__version__', '__globals__', 'noise_input_all', 'noise_output_all'])

In [4]:
train_input = train_all_noise['noise_input_all']
train_output = train_all_noise['noise_output_all']
test_input = test_all_noise['noise_input_all']
test_output = test_all_noise['noise_output_all']

### Dataset Description

In [5]:
print(f'{train_input.shape=}, {train_output.shape=}, {test_input.shape=}, {test_output.shape=}')

train_input.shape=(5, 1000), train_output.shape=(5, 1000), test_input.shape=(5, 380), test_output.shape=(5, 380)


According to the authors' description, the dataset is organized as follows:
In the training and test sets, with shapes (5, 1000) and (5, 380) respectively, each element is a MATLAB cell holding 30 seconds of data, and each column of 5 elements forms one sequence. The cells in a column can be concatenated in order, which gives 30 * 5 = 150 seconds of data per sequence.
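As a minimal sketch of that sequence-wise concatenation, the snippet below builds a synthetic stand-in for one column of `noise_output_all` and joins its five cells along the time axis. The names `FS_HZ`, `SEGMENT_SEC`, `N_SEGMENTS`, and `fake_sequence` are illustrative assumptions, and the segment length of 3000 samples assumes exactly 30 seconds at the 100 Hz rate given in the dataset description; the real cells may differ.

```python
import numpy as np

FS_HZ = 100        # assumed sampling rate of the down-sampled data
SEGMENT_SEC = 30   # each cell is described as holding 30 s of data
N_SEGMENTS = 5     # cells per sequence (rows of the cell matrix)

# Synthetic stand-in for one column of `noise_output_all`: an object array of
# five (1, 3000) arrays, similar to how scipy.io.loadmat returns MATLAB cells.
rng = np.random.default_rng(0)
fake_sequence = np.empty(N_SEGMENTS, dtype=object)
for i in range(N_SEGMENTS):
    fake_sequence[i] = rng.standard_normal((1, FS_HZ * SEGMENT_SEC))

# Concatenate the segments along the time axis (axis=1).
full_signal = np.concatenate(list(fake_sequence), axis=1)

duration_sec = full_signal.shape[1] / FS_HZ
print(full_signal.shape, duration_sec)  # (1, 15000) 150.0
```

With the real data, the same call would presumably apply to a column such as `train_output[:, 0]`, provided each cell has the same number of rows.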

For the **input data (X)**, each MATLAB cell contains 3 rows of 30-second signals: the corrupted PPG in the 1st row, the x-axis accelerometer in the 2nd row, and the y-axis gyroscope in the 3rd row.

For the **output data (y)**, each MATLAB cell contains a single row of 30-second signal, which is the clean PPG.
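The row layout of an input cell can be made explicit with a small helper. This is only a sketch: `split_input_cell` is a hypothetical name, not part of the dataset or the paper, and the demo cell is synthetic data shaped like the description (3 rows, 3000 samples assumed).

```python
import numpy as np


def split_input_cell(cell):
    """Split one input cell of shape (3, n_samples) into its three channels.

    Row order follows the dataset description:
    corrupted PPG, x-axis accelerometer, y-axis gyroscope.
    """
    cell = np.asarray(cell, dtype=float)
    if cell.ndim != 2 or cell.shape[0] != 3:
        raise ValueError(f"expected shape (3, n_samples), got {cell.shape}")
    corrupted_ppg, x_acce, y_gyro = cell
    return corrupted_ppg, x_acce, y_gyro


# Synthetic cell: each row is filled with a distinct constant so the
# channels are easy to tell apart.
demo_cell = np.vstack([np.zeros(3000), np.ones(3000), np.full(3000, 2.0)])
ppg, acc, gyro = split_input_cell(demo_cell)
print(ppg.shape, acc[0], gyro[0])  # (3000,) 1.0 2.0
```

The shape check is there because a cell with a different layout would otherwise be unpacked silently into the wrong channels.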

The sensors sample at 200 Hz, but the signals in this dataset have been down-sampled to 100 Hz.
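Given the 100 Hz rate, sample indices convert to seconds by dividing by the sampling frequency. The sketch below shows this for a full 150-second sequence; `FS_HZ` and `time_axis` are illustrative names introduced here, not taken from the dataset.

```python
import numpy as np

FS_HZ = 100  # sampling rate of the down-sampled data, per the description


def time_axis(n_samples, fs_hz=FS_HZ):
    """Return sample timestamps in seconds, starting at 0."""
    return np.arange(n_samples) / fs_hz


# Five 30-second segments at 100 Hz give 15000 samples.
t = time_axis(5 * 30 * FS_HZ)
print(len(t), t[0], t[1])  # 15000 0.0 0.01
```

An array like `t` can be passed as the `x` argument of `go.Scatter` so that plots are labelled in seconds rather than sample indices.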

For more details, please refer to the [data description page](https://github.com/WillionXU/CIME-PPG-dataset-2018/blob/master/dataset_description.pdf).



### Visualize a Signal Sequence in the Training Set

In [8]:
visualize_seq_input = train_input[:, 0]
visualize_seq_output = train_output[:, 0]

corrupted_ppg = []
x_acce = []
y_gyro = []
clean_ppg = []

for _input, _output in zip(visualize_seq_input, visualize_seq_output):
    corrupted_ppg.extend(_input[0].tolist())
    x_acce.extend(_input[1].tolist())
    y_gyro.extend(_input[2].tolist())
    clean_ppg.extend(_output[0].tolist())
    
# PPG is usually displayed inverted, so reflect each signal about its mean:
# x -> 2 * mean(x) - x flips the waveform while keeping the mean unchanged.
corrupted_ppg = np.array(corrupted_ppg)
clean_ppg = np.array(clean_ppg)
corrupted_ppg = 2 * corrupted_ppg.mean() - corrupted_ppg
clean_ppg = 2 * clean_ppg.mean() - clean_ppg
    
fig = make_subplots(specs=[[{}], [{'secondary_y': True}]], rows=2, shared_xaxes=True, vertical_spacing=0.02)
fig.add_trace(go.Scatter(y=corrupted_ppg, name='corrupted_ppg'), col=1, row=1)
fig.add_trace(go.Scatter(y=clean_ppg, name='clean_ppg'), col=1, row=1)
fig.add_trace(go.Scatter(y=x_acce, name='x_acce'), col=1, row=2)
fig.add_trace(go.Scatter(y=y_gyro, name='y_gyro'), col=1, row=2, secondary_y=True)