# Analysis of Heart Rate Estimation Dataset

- Data from a single participant (S1)

---

In [1]:
import pandas as pd
import numpy as np

## Load Data

In [2]:
def load_data(path):
    """Read the pickled dataset (a dict of signals and labels) from `path`."""
    return pd.read_pickle(path)

In [3]:
file = "S1.pkl"
path = "../../2-pre_autoTS_and_wings_analysis/heart_rate_estimation/PPG_FieldStudy/S1/"
data = path + file
dataset = load_data(data)
dataset

{'rpeaks': array([    240,    1174,    2128, ..., 6447292, 6447748, 6448224],
       dtype=int32),
 'signal': {'chest': {'ACC': array([[ 0.85640001, -0.06779999, -0.36559999],
          [ 0.8556    , -0.06900001, -0.3646    ],
          [ 0.852     , -0.07020003, -0.3642    ],
          ...,
          [ 0.88759995, -0.1038    , -0.27920002],
          [ 0.88619995, -0.10159999, -0.27819997],
          [ 0.88680005, -0.10140002, -0.27380002]]),
   'ECG': array([[0.01560974],
          [0.01812744],
          [0.01753235],
          ...,
          [0.21368408],
          [0.21556091],
          [0.21702576]]),
   'EMG': array([[-1.5],
          [-1.5],
          [-1.5],
          ...,
          [-1.5],
          [-1.5],
          [-1.5]]),
   'EDA': array([[0.],
          [0.],
          [0.],
          ...,
          [0.],
          [0.],
          [0.]]),
   'Temp': array([[-273.15],
          [-273.15],
          [-273.15],
          ...,
          [-273.15],
          [-273.15],
    

In [4]:
# Shallow copy: the outer dict is new, but the NumPy arrays inside are shared with
# `dataset`, so in-place changes to them also change `dataset` (use copy.deepcopy
# if a fully independent copy is needed)
dataset_copy = dataset.copy()

# Top-level keys of the dataset dictionary
keys = list(dataset_copy.keys())
keys

['rpeaks', 'signal', 'label', 'activity', 'questionnaire', 'subject']

In [5]:
# set each key
rpeaks = dataset_copy['rpeaks']
signal = dataset_copy['signal']
label = dataset_copy['label']
activity = dataset_copy['activity']
questionnaire = dataset_copy['questionnaire']
subject = dataset_copy['subject']
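
A quick way to see what each key holds, without printing the full arrays, is a small recursive helper that reports shapes and dtypes. This is a sketch: `summarize` is a hypothetical helper, and it only assumes the pickle is a nested `dict` whose leaves are NumPy arrays or other plain values, as the `dataset` output above suggests.

```python
import numpy as np


def summarize(obj, prefix=""):
    """Return 'path: description' strings for a nested dict of arrays (hypothetical helper)."""
    lines = []
    if isinstance(obj, dict):
        # Recurse into sub-dictionaries, extending the key path
        for key, value in obj.items():
            lines.extend(summarize(value, f"{prefix}{key}/"))
    elif isinstance(obj, np.ndarray):
        # Arrays: report shape and dtype instead of the values themselves
        lines.append(f"{prefix.rstrip('/')}: shape={obj.shape}, dtype={obj.dtype}")
    else:
        # Anything else (e.g. strings): report the Python type
        lines.append(f"{prefix.rstrip('/')}: {type(obj).__name__}")
    return lines


# Usage with a small synthetic dict that mimics the pickle's layout
example = {
    "rpeaks": np.array([240, 1174, 2128], dtype=np.int32),
    "signal": {"wrist": {"BVP": np.zeros((4, 1))}},
    "subject": "S1",
}
for line in summarize(example):
    print(line)
```

In the notebook this would be called as `for line in summarize(dataset): print(line)`.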

## End of Load Data

---

## Sensors

### Chest (RespiBAN)
- ECG
- ACC
- RESP

### Wrist (Empatica E4)
- BVP
- ACC

In [6]:
def format_data_to_df(sensor, cols):
    """Wrap a sensor array (one row per sample) in a DataFrame with the given column names."""
    return pd.DataFrame(sensor, columns=cols)

In [7]:
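def about_data(data):
    """Print and return the NaN count and per-column min/max of a DataFrame."""
    check_nan = data.isnull().values.sum()
    print("check_nan : ", check_nan)

    min_measurement = data.min()
    print("min_measurement : ", min_measurement)

    max_measurement = data.max()
    print("max_measurement : ", max_measurement)

    # Return the values as well, so `x_characteristics = about_data(...)` is not None
    return {"nan_count": check_nan, "min": min_measurement, "max": max_measurement}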

### 1. Chest (RespiBAN)

In [9]:
chest = signal['chest']
chest

{'ACC': array([[ 0.85640001, -0.06779999, -0.36559999],
        [ 0.8556    , -0.06900001, -0.3646    ],
        [ 0.852     , -0.07020003, -0.3642    ],
        ...,
        [ 0.88759995, -0.1038    , -0.27920002],
        [ 0.88619995, -0.10159999, -0.27819997],
        [ 0.88680005, -0.10140002, -0.27380002]]),
 'ECG': array([[0.01560974],
        [0.01812744],
        [0.01753235],
        ...,
        [0.21368408],
        [0.21556091],
        [0.21702576]]),
 'EMG': array([[-1.5],
        [-1.5],
        [-1.5],
        ...,
        [-1.5],
        [-1.5],
        [-1.5]]),
 'EDA': array([[0.],
        [0.],
        [0.],
        ...,
        [0.],
        [0.],
        [0.]]),
 'Temp': array([[-273.15],
        [-273.15],
        [-273.15],
        ...,
        [-273.15],
        [-273.15],
        [-273.15]], dtype=float32),
 'Resp': array([[ 4.4418335 ],
        [ 4.45098877],
        [ 4.47387695],
        ...,
        [-3.05023193],
        [-3.05328369],
        [-3.056335

#### 1.1. Electrocardiogram (ECG)

- Provides the heart rate ground truth
- The RespiBAN Professional ECG signal was acquired via a standard three-point ECG
- The signal contains artefacts, so the R-peak detections were manually inspected and corrected
    - Instantaneous heart rate (IHR) is computed from the R-peaks: an R-R interval of RR seconds gives an IHR of 60 / RR beats per minute
- The signal is segmented with a shifted window approach (window length: 8 seconds, window shift: 2 seconds)
    - Ground truth heart rate is defined as the mean IHR within each 8-second window
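
The path from R-peaks to IHR to the windowed ground truth can be sketched as follows. This is not the dataset authors' code: it assumes `rpeaks` holds sample indices into the chest signal sampled at 700 Hz (the RespiBAN rate given in the PPG-DaLiA documentation), time-stamps each IHR value with the later R-peak of its interval, and averages the IHR values that fall inside each window. The exact rule used for the published labels may differ; `instantaneous_hr` and `windowed_mean_hr` are hypothetical helpers.

```python
import numpy as np

FS_CHEST = 700  # assumed RespiBAN (chest) sampling rate in Hz


def instantaneous_hr(rpeaks, fs=FS_CHEST):
    """Return (timestamps in s, IHR in bpm) from R-peak sample indices."""
    peak_times = np.asarray(rpeaks) / fs  # R-peak times in seconds
    rr = np.diff(peak_times)              # R-R intervals in seconds
    ihr = 60.0 / rr                       # beats per minute
    # Each IHR value is time-stamped with the later R-peak of its interval
    return peak_times[1:], ihr


def windowed_mean_hr(times, ihr, window=8.0, shift=2.0):
    """Mean IHR per window [start, start + window), with start = 0, shift, 2*shift, ..."""
    means = []
    start = 0.0
    # Stop once a full window would extend past the last R-peak
    while start + window <= times[-1]:
        in_window = (times >= start) & (times < start + window)
        means.append(ihr[in_window].mean() if in_window.any() else np.nan)
        start += shift
    return np.array(means)


# Synthetic check: one beat every 560 samples (0.8 s at 700 Hz), i.e. 75 bpm
rpeaks_demo = np.arange(0, 700 * 20, 560)
t, hr = instantaneous_hr(rpeaks_demo)
print(windowed_mean_hr(t, hr))
```

On the notebook data the first step would be `instantaneous_hr(rpeaks)`.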

---
#### Questions/Further Explore
   1. What does "labels" mean in "The labels were extracted from" [this sensor]?
   2. RespiBAN sensor
   3. R-peak: why is it needed?
   4. ECG: mathematical notation of the R-peak and other measurements
   5. Why 8 s / 2 s for the window length and shift? Read about it in the reference list
       - Ex: if 1 - 8 secs is the 1st average (IHR), will the next average (IHR) be within timestamps 6 - 14 secs? 8 - 16 secs? 6 - 16 secs?
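
For question 5, the window layout implied by an 8 s length and a 2 s shift can be listed directly: consecutive windows overlap by 6 s, so if the first window covers 0-8 s, the next covers 2-10 s, then 4-12 s, and so on. The sketch below also converts window and shift into sample counts. The sampling rates (700 Hz chest, 64 Hz wrist BVP, 32 Hz wrist ACC) are assumptions taken from the PPG-DaLiA documentation, not read from S1.pkl here; `window_bounds` and `n_windows` are hypothetical helpers.

```python
import math

WINDOW_S = 8  # window length in seconds
SHIFT_S = 2   # window shift in seconds

# Assumed sampling rates in Hz (PPG-DaLiA documentation)
SAMPLING_RATES = {"chest": 700, "wrist_BVP": 64, "wrist_ACC": 32}


def window_bounds(count, window=WINDOW_S, shift=SHIFT_S):
    """(start, end) times in seconds of the first `count` windows."""
    return [(i * shift, i * shift + window) for i in range(count)]


def n_windows(duration_s, window=WINDOW_S, shift=SHIFT_S):
    """Number of full windows that fit into a recording of duration_s seconds."""
    if duration_s < window:
        return 0
    return math.floor((duration_s - window) / shift) + 1


print(window_bounds(3))  # [(0, 8), (2, 10), (4, 12)]
for name, fs in SAMPLING_RATES.items():
    # Samples per window and per shift at this sampling rate
    print(name, WINDOW_S * fs, SHIFT_S * fs)
```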

In [14]:
ecg = chest['ECG']
columns = ['ECG Measurements']
ecg_measurements = format_data_to_df(ecg, columns)
ecg_measurements

|         | ECG Measurements |
|---------|------------------|
| 0       | 0.015610         |
| 1       | 0.018127         |
| 2       | 0.017532         |
| 3       | 0.013412         |
| 4       | 0.011948         |
| ...     | ...              |
| 6448395 | 0.212082         |
| 6448396 | 0.210159         |
| 6448397 | 0.213684         |
| 6448398 | 0.215561         |


In [11]:
ecg_characteristics = about_data(ecg_measurements)
# ecg_characteristics

check_nan :  0
min_measurement :  ECG Measurements   -1.499908
dtype: float64
max_measurement :  ECG Measurements    1.499954
dtype: float64


#### 1.2. 3-Axis Accelerometer (ACC)

- Three-axis acceleration was acquired via a 3D-accelerometer integrated into the RespiBAN Professional wearable device
- The 3 data columns refer to the 3 accelerometer channels (x, y, z)
- The Empatica E4 (wrist) has its own 3D-accelerometer; that one is embedded in the same device as the PPG sensor, is used to compensate for motion artefacts, and its raw data is provided in units of 1/64g (see 2.2)

---
#### Questions/Further Explore
1. Why 3D?
2. What are some differences between each axis/channel?
3. Why is the accelerometer configured to measure acceleration in the range [-2g, 2g]?
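
Related to question 2: each axis depends on how the device sits on the body, whereas the Euclidean norm of the three axes does not. A common orientation-independent summary of motion is therefore the magnitude sqrt(x^2 + y^2 + z^2), which stays near 1 at rest if the values are in g (gravity only). The sketch below adds it to a DataFrame with `x`, `y`, `z` columns such as `acc_c_measurements`; `add_magnitude` is a hypothetical helper, and whether the chest ACC values are in g is an assumption not confirmed here.

```python
import numpy as np
import pandas as pd


def add_magnitude(acc_df):
    """Return a copy of acc_df with a 'magnitude' column (Euclidean norm of x, y, z)."""
    out = acc_df.copy()
    out["magnitude"] = np.sqrt(out["x"] ** 2 + out["y"] ** 2 + out["z"] ** 2)
    return out


# Usage with a tiny synthetic frame: gravity along z, then split between x and z
demo = pd.DataFrame({"x": [0.0, 0.6], "y": [0.0, 0.0], "z": [1.0, 0.8]})
print(add_magnitude(demo))
```

On the notebook data this would be `add_magnitude(acc_c_measurements)`; the input frame is left unchanged.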

In [17]:
acc_c = chest['ACC']
columns = ['x', 'y', 'z']
acc_c_measurements = format_data_to_df(acc_c, columns)
acc_c_measurements

|         | x      | y       | z       |
|---------|--------|---------|---------|
| 0       | 0.8564 | -0.0678 | -0.3656 |
| 1       | 0.8556 | -0.0690 | -0.3646 |
| 2       | 0.8520 | -0.0702 | -0.3642 |
| 3       | 0.8526 | -0.0690 | -0.3640 |
| 4       | 0.8526 | -0.0698 | -0.3654 |
| ...     | ...    | ...     | ...     |
| 6448395 | 0.8862 | -0.1022 | -0.2760 |
| 6448396 | 0.8866 | -0.1036 | -0.2766 |
| 6448397 | 0.8876 | -0.1038 | -0.2792 |
| 6448398 | 0.8862 | -0.1016 | -0.2782 |


In [20]:
acc_c_characteristics = about_data(acc_c_measurements)

check_nan :  0
min_measurement :  x   -0.6956
y   -1.4830
z   -4.2490
dtype: float64
max_measurement :  x    3.9512
y    1.2442
z    2.9578
dtype: float64


#### 1.3. Respiration (RESP)

- Signal was acquired with an inductive respiration sensor embedded in the RespiBAN chest strap.
- Raw data is contained in SX_RespiBAN.h5, which is organised as a dictionary keyed by sensor modality.
- In the pickle, ‘chest’ holds the RespiBAN data for all modalities (ACC, ECG, EDA, EMG, Resp, Temp). The modalities ‘EDA’, ‘EMG’ and ‘Temp’ only contain dummy data and should be ignored; the constant values in the `chest` output above (EDA 0, EMG -1.5, Temp -273.15) are consistent with this.
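
One way to confirm and drop the dummy modalities before further analysis is to flag channels whose values never change. This is a sketch: `constant_channels` is a hypothetical helper, and a constant signal is only a heuristic for dummy data, so the result should be checked against the dataset documentation.

```python
import numpy as np


def constant_channels(modalities):
    """Names of modalities whose array holds a single repeated value (likely dummy data)."""
    return [
        name
        for name, values in modalities.items()
        if values.size > 0 and np.all(values == values.flat[0])
    ]


# Usage with synthetic arrays mimicking the chest dictionary
chest_demo = {
    "ECG": np.array([[0.0156], [0.0181], [0.0175]]),
    "EMG": np.full((3, 1), -1.5),
    "Temp": np.full((3, 1), -273.15, dtype=np.float32),
}
dummy = constant_channels(chest_demo)
# Keep only the modalities that actually vary
chest_clean = {k: v for k, v in chest_demo.items() if k not in dummy}
print(dummy)              # ['EMG', 'Temp']
print(list(chest_clean))  # ['ECG']
```

On the notebook data the call would be `constant_channels(chest)`.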

---
#### Questions/Further Explore
1. Does manually synchronizing the data affect it at all?
2. [ ] Clearly distinguish the differences between S1_RespiBAN.h5 and S1.pkl $\rightarrow$ signal $\rightarrow$ chest $\rightarrow$ Resp

In [22]:
resp = chest['Resp']
columns = ['Resp Measurements']
resp_measurements = format_data_to_df(resp, columns)
resp_measurements

|         | Resp Measurements |
|---------|-------------------|
| 0       | 4.441833          |
| 1       | 4.450989          |
| 2       | 4.473877          |
| 3       | 4.478455          |
| 4       | 4.510498          |
| ...     | ...               |
| 6448395 | -3.062439         |
| 6448396 | -3.051758         |
| 6448397 | -3.050232         |
| 6448398 | -3.053284         |


In [None]:
about_data(resp_measurements)

### 2. Wrist (Empatica E4)

In [27]:
wrist = signal['wrist']
wrist

{'ACC': array([[-0.765625, -0.078125,  0.671875],
        [-0.765625, -0.078125,  0.65625 ],
        [-0.765625, -0.078125,  0.671875],
        ...,
        [-0.375   , -0.015625,  0.9375  ],
        [-0.390625,  0.      ,  0.9375  ],
        [-0.375   ,  0.      ,  0.9375  ]]),
 'BVP': array([[  7.28],
        [  6.33],
        [  5.46],
        ...,
        [105.02],
        [109.44],
        [111.06]]),
 'EDA': array([[4.722437],
        [4.728843],
        [4.718594],
        ...,
        [3.170867],
        [3.159336],
        [3.151649]]),
 'TEMP': array([[32.13],
        [32.16],
        [32.16],
        ...,
        [34.37],
        [34.37],
        [34.37]])}

#### 2.1. Blood Volume Pulse (BVP)
- Raw data file: BVP.csv
- Blood volume pulse signal from the photoplethysmograph (PPG) sensor
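
Because the chest and wrist signals are sampled at different rates, comparing the ECG-derived ground truth with the BVP signal requires mapping sample indices through time. A minimal sketch, assuming 700 Hz for the chest signals and 64 Hz for wrist BVP (rates from the PPG-DaLiA documentation, not read from S1.pkl) and that both streams start at the same instant; `chest_to_wrist_index` is a hypothetical helper.

```python
import numpy as np

FS_CHEST = 700  # assumed RespiBAN sampling rate (Hz)
FS_BVP = 64     # assumed Empatica E4 BVP sampling rate (Hz)


def chest_to_wrist_index(chest_idx, fs_chest=FS_CHEST, fs_wrist=FS_BVP):
    """Map chest sample indices to the wrist sample covering the same time.

    Computes floor(idx * fs_wrist / fs_chest) in integer arithmetic,
    which avoids floating-point rounding at sample boundaries.
    """
    idx = np.asarray(chest_idx, dtype=np.int64)
    return (idx * fs_wrist) // fs_chest


# Usage: the first R-peak indices from the rpeaks output above
print(chest_to_wrist_index([240, 1174, 2128]).tolist())  # [21, 107, 194]
```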

In [28]:
bvp = wrist['BVP']
columns = ['BVP Measurements']
bvp_measurements = format_data_to_df(bvp, columns)
bvp_measurements

|        | BVP Measurements |
|--------|------------------|
| 0      | 7.28             |
| 1      | 6.33             |
| 2      | 5.46             |
| 3      | 4.60             |
| 4      | 3.74             |
| ...    | ...              |
| 589563 | 85.88            |
| 589564 | 97.30            |
| 589565 | 105.02           |
| 589566 | 109.44           |


In [29]:
bvp_characteristics = about_data(bvp_measurements)

check_nan :  0
min_measurement :  BVP Measurements   -1647.39
dtype: float64
max_measurement :  BVP Measurements    1557.58
dtype: float64


#### 2.2. ACC

- 3D-accelerometer embedded in the same device as the PPG sensor (Empatica E4), used to compensate for motion artefacts
- The 3 data columns refer to the 3 accelerometer channels (x, y, z). In the raw ACC.csv, data is provided in units of 1/64g.

---
#### Questions/Further Explore
1. Why 3D?
2. What are some differences between each axis/channel?
3. Why is the accelerometer configured to measure acceleration in the range [-2g, 2g]?
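
On question 3 and the 1/64g unit: a [-2g, 2g] range stored in steps of 1/64 g is consistent with a signed 8-bit count, since -128 and 127 map to -2.0 and 127/64 = 1.984375 g. Those are exactly the min and max reported for `acc_w_measurements` below, which suggests (an inference, not something the dataset description states) that the wrist ACC in S1.pkl has already been converted from counts to g. The sketch shows the conversion and a check for values on the 1/64 g grid; `counts_to_g` and `on_1_64g_grid` are hypothetical helpers.

```python
import numpy as np


def counts_to_g(counts):
    """Convert raw accelerometer counts (units of 1/64 g) to g."""
    return np.asarray(counts, dtype=float) / 64.0


def on_1_64g_grid(values):
    """True if every value is an integer multiple of 1/64 g."""
    scaled = np.asarray(values, dtype=float) * 64.0
    return bool(np.all(scaled == np.round(scaled)))


# Extremes of a signed 8-bit range, plus the x and z counts behind the first wrist ACC row
raw = np.array([-128, 127, -49, 43])
print(counts_to_g(raw))

# Values copied from the outputs in this notebook: wrist ACC row vs chest ACC row
print(on_1_64g_grid([-0.765625, -0.078125, 0.671875]))  # True
print(on_1_64g_grid([0.8564, -0.0678, -0.3656]))        # False
```

On the notebook data the check would be `on_1_64g_grid(acc_w_measurements.to_numpy())`.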

In [31]:
acc_w = wrist['ACC']
columns = ['x', 'y', 'z']
acc_w_measurements = format_data_to_df(acc_w, columns)
acc_w_measurements

|        | x         | y         | z        |
|--------|-----------|-----------|----------|
| 0      | -0.765625 | -0.078125 | 0.671875 |
| 1      | -0.765625 | -0.078125 | 0.656250 |
| 2      | -0.765625 | -0.078125 | 0.671875 |
| 3      | -0.765625 | -0.078125 | 0.671875 |
| 4      | -0.750000 | -0.078125 | 0.671875 |
| ...    | ...       | ...       | ...      |
| 294779 | -0.375000 | 0.000000  | 0.937500 |
| 294780 | -0.375000 | 0.000000  | 0.937500 |
| 294781 | -0.375000 | -0.015625 | 0.937500 |
| 294782 | -0.390625 | 0.000000  | 0.937500 |


In [32]:
acc_w_characteristics = about_data(acc_w_measurements)

check_nan :  0
min_measurement :  x   -2.0
y   -2.0
z   -2.0
dtype: float64
max_measurement :  x    1.984375
y    1.984375
z    1.984375
dtype: float64
