## HRV Feature Extraction Pipeline from ECG
In this notebook, we present an example pipeline that uses existing Python packages to process ECG and extract HRV features. The example ECG data used in this repository is based on a 5-minute recording.


This tutorial is organised in the following way:
1. [Loading ECG Data](#ecg_data_loading)
2. [ECG Raw Data Visualisation](#ecg_visualisation)
3. [Cleaning ECG and Detecting R-points](#ecg_cleaning_rpoints_detection)
4. [RR Interval Visualisation](#rri_visualisation)
5. [RR Interval Data Cleaning](#rri_cleaning)
6. [HRV Feature Extraction](#hrv_feature_extraction)
    1. [Time Domain Feature Extraction](#time_domain_feature)
    2. [Frequency Domain Feature Extraction](#frequency_domain_feature)
    3. [Poincaré Feature](#poincaré_feature)
    4. [Non-linear Domain Feature](#non_linear_domain_feature)
    5. [Geometrical Time Domain Feature](#geometrical_time_domain_feature)
    

At the start of the Jupyter notebook, we import the relevant Python packages. To install the NeuroKit2 and hrv-analysis packages, run:

`pip install hrv-analysis neurokit2`

In [160]:
%matplotlib inline
%matplotlib notebook
import pandas as pd
from matplotlib import style
import matplotlib.pyplot as plt
import hrvanalysis as hrvana # RR interval processing package
import numpy as np
import neurokit2 as nk  # This package can process ECG

<a id="ecg_data_loading"></a>
## Loading ECG Data
Let's load the data to pandas DataFrame.

In [186]:
bio_df = pd.read_csv(r"../example_data/bio_100Hz.txt")

In [187]:
bio_df.head()

Unnamed: 0,ECG,EDA,Photosensor,RSP
0,-0.015869,13.196868,5.0,0.778931
1,-0.011703,13.197173,5.0,0.777588
2,-0.009766,13.19702,5.0,0.777435
3,-0.013321,13.197631,5.0,0.777557
4,-0.009583,13.196715,5.0,0.775299


<a id="ecg_visualisation"></a>
## ECG Raw Data Visualisation

In [188]:
plt.rcParams['figure.figsize'] = [15, 9]
# Default processing pipeline
signals, info = nk.ecg_process(bio_df["ECG"].values, sampling_rate=100)

# Visualize
plot = nk.ecg_plot(signals)

<a id="ecg_cleaning_rpoints_detection"></a>
## Cleaning ECG and Detecting R-points 

In this case, we use the method proposed by Pan and Tompkins (**pantompkins1985**) to clean the ECG and then detect the R-points.

In [189]:
cleaned = nk.ecg_clean(bio_df['ECG'], sampling_rate=100, method="pantompkins1985")
pantompkins1985 = nk.ecg_findpeaks(cleaned, method="pantompkins1985")

The **ECG_R_Peaks** column contains the position of each R peak, given as a sample index counted from the **start** of the recording. We calculate the RR intervals as the difference between successive R peaks.

In [190]:
hrv_df = pd.DataFrame(pantompkins1985)

In [191]:
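# Difference between successive R-peak positions
hrv_df["RR Intervals"] = hrv_df["ECG_R_Peaks"].diff()
# The first peak has no predecessor, so use its distance from the start of the recording
hrv_df.loc[0, "RR Intervals"] = hrv_df.loc[0, "ECG_R_Peaks"]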

In [192]:
hrv_df.head()

Unnamed: 0,ECG_R_Peaks,RR Intervals
0,306,306.0
1,614,308.0
2,926,312.0
3,1236,310.0
4,1546,310.0
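
When R-peak positions are sample indices, the differences between them are counted in samples, not milliseconds. The following is a minimal sketch of the conversion, assuming a known sampling rate. The helper name `peaks_to_rri_ms` and the example peak positions are hypothetical and are not part of NeuroKit2 or hrv-analysis.

```python
import numpy as np

def peaks_to_rri_ms(r_peaks, sampling_rate):
    """Convert R-peak sample indices to RR intervals in milliseconds."""
    r_peaks = np.asarray(r_peaks, dtype=float)
    # Distance between successive peaks, counted in samples
    rri_samples = np.diff(r_peaks)
    # One sample lasts 1000 / sampling_rate milliseconds
    return rri_samples * 1000.0 / sampling_rate

# Hypothetical peak positions for a signal sampled at 100 Hz
example_peaks = [80, 165, 250, 338]
example_rri_ms = peaks_to_rri_ms(example_peaks, sampling_rate=100)
print(example_rri_ms)
```

Unlike the cell above, this sketch returns one value fewer than the number of peaks, because the first peak has no preceding peak to measure from.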


<a id="rri_visualisation"></a>
## RR Interval Visualisation 

Let's have a look at the raw RR intervals.

In [193]:
hrvana.plot_timeseries(hrv_df["RR Intervals"].values.tolist())

<a id="rri_cleaning"></a>
## RR Interval Data Cleaning
To clean the HRV data, we take the following steps:
1. Remove outliers: we accept RR intervals between 300 ms and 2000 ms as valid.
2. Fill the removed values (NaN) by forward linear interpolation, where values are calculated using future RR intervals. We ignore the index and treat the values as equally spaced. For more details, see [Pandas Interpolation](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.interpolate.html) and [HRV-Analysis Interpolation](https://github.com/Aura-healthcare/hrvanalysis/blob/master/hrvanalysis/preprocessing.py).
3. Remove ectopic beats using the method proposed by Malik M et al. in [Heart Rate Variability](https://doi.org/10.1111/j.1542-474X.1996.tb00275.x).
4. Fill the removed values (NaN) by forward linear interpolation again.
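
The steps above can be sketched with NumPy alone. This is an illustrative approximation, not the hrv-analysis implementation: the function names are hypothetical, the ectopic rule is a simplified 20% threshold against the last accepted interval, and the first interval is assumed to be valid.

```python
import numpy as np

def remove_out_of_range(rri, low=300, high=2000):
    """Replace RR intervals outside [low, high] ms with NaN."""
    rri = np.asarray(rri, dtype=float)
    return np.where((rri >= low) & (rri <= high), rri, np.nan)

def interpolate_nan_linear(rri):
    """Fill NaN values linearly, treating the intervals as equally spaced."""
    rri = np.asarray(rri, dtype=float)
    idx = np.arange(len(rri))
    valid = ~np.isnan(rri)
    return np.interp(idx, idx[valid], rri[valid])

def remove_ectopic_simple(rri, max_change=0.2):
    """Set an interval to NaN when it differs from the last accepted
    interval by more than max_change (simplified Malik-style rule)."""
    rri = np.asarray(rri, dtype=float).copy()
    last_valid = rri[0]  # assumption: the first interval is valid
    for i in range(1, len(rri)):
        if abs(rri[i] - last_valid) > max_change * last_valid:
            rri[i] = np.nan
        else:
            last_valid = rri[i]
    return rri

# Hypothetical RR intervals in ms: 2500 is out of range, 1200 is ectopic
raw_rri = [800, 810, 2500, 805, 1200, 815, 820]
step1 = interpolate_nan_linear(remove_out_of_range(raw_rri))
cleaned_example = interpolate_nan_linear(remove_ectopic_simple(step1))
print(cleaned_example)
```

Interpolating after each removal step keeps the series the same length as the input, which is why the cleaned values can later be written back into the same DataFrame column.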


In [194]:
clean_rri = hrv_df['RR Intervals'].values
clean_rri = hrvana.remove_outliers(rr_intervals=clean_rri, low_rri=300, high_rri=2000)
clean_rri = hrvana.interpolate_nan_values(rr_intervals=clean_rri, interpolation_method="linear")
clean_rri = hrvana.remove_ectopic_beats(rr_intervals=clean_rri, method="malik")
clean_rri = hrvana.interpolate_nan_values(rr_intervals=clean_rri, interpolation_method="linear") # default is linear

0 outlier(s) have been deleted.
13 ectopic beat(s) have been deleted with malik rule.


Now let's visualise the processed NN intervals

In [198]:
hrvana.plot_timeseries(clean_rri)
plt.title("Cleaned NN intervals")

Text(0.5, 1.0, 'Cleaned NN intervals')

In [199]:
hrv_df["RR Intervals"] = clean_rri 
hrv_df["RR Intervals"].isna().any()

False

<a id="hrv_feature_extraction"></a>
## HRV Feature Extraction
The following code shows how to extract HRV features from a 5-minute window of NN intervals.

In [200]:
rr_epoch = hrv_df['RR Intervals'].values

In [201]:
rr_epoch

array([306. , 308. , 312. , 310. , 310. , 324. , 301. , 305.5, 310. ,
       302. , 340. , 378. , 384. , 346. , 349. , 303. , 353.5, 404. ,
       384. , 364. , 326. , 384. , 368. , 352. , 301. , 309. , 325. ,
       316. , 303. , 318. , 307. , 310. , 326. , 315. , 306. , 347. ,
       332. , 317. , 312. , 332. , 326. , 301. , 339.5, 378. , 314. ,
       310. , 372. , 395. , 318. , 328. , 360. , 392. , 347. , 302. ,
       340. , 378. , 384. , 346. , 349. , 303. , 353.5, 404. , 384. ,
       364. , 326. , 384. , 368. , 352. , 301. , 309. , 325. , 316. ,
       303. , 318. , 307. , 310. , 326. , 315. , 306. , 347. , 332. ,
       317. , 312. , 332. , 326. , 301. , 339.5, 378. ])

<a id="time_domain_feature"></a>
### Extract Time Domain Features
- **mean_nni**: The mean of NN-intervals.
- **sdnn** : The standard deviation of the time interval between successive normal heart beats (i.e. the NN-intervals).
- **sdsd**: The standard deviation of differences between adjacent NN-intervals
- **rmssd**: The square root of the mean of the squared differences between adjacent NN-intervals. Reflects high frequency (fast or parasympathetic) influences on HRV (*i.e.*, those influencing larger changes from one beat to the next).
- **median_nni**: The median of the NN-intervals.
- **nni_50**: Number of interval differences of successive NN-intervals greater than 50 ms.
- **pnni_50**: The proportion derived by dividing nni_50 (The number of interval differences of successive NN-intervals greater than 50 ms) by the total number of NN-intervals. (%)
- **nni_20**: Number of interval differences of successive NN-intervals greater than 20 ms.
- **pnni_20**: The proportion derived by dividing nni_20 (The number of interval differences of successive NN-intervals greater than 20 ms) by the total number of NN-intervals. (%)
- **range_nni**: Difference between the maximum and minimum NN_interval.
- **cvsd**: Coefficient of variation of successive differences equal to the rmssd divided by mean_nni.
- **cvnni**: Coefficient of variation equal to the ratio of sdnn divided by mean_nni.
- **mean_hr**: The mean Heart Rate.
- **max_hr**: Max heart rate.
- **min_hr**: Min heart rate.
- **std_hr**: Standard deviation of heart rate.

Note: we measure NN Intervals in ms
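
To make a few of these definitions concrete, here is a minimal NumPy sketch. It is not the hrv-analysis code: the function name `basic_time_domain` is hypothetical, and the use of the sample standard deviation (`ddof=1`) and of the mean of instantaneous heart rates for `mean_hr` are assumptions.

```python
import numpy as np

def basic_time_domain(nn_ms):
    """Compute a handful of time domain features from NN intervals in ms."""
    nn = np.asarray(nn_ms, dtype=float)
    diff = np.diff(nn)  # differences between adjacent NN intervals
    nni_50 = int(np.sum(np.abs(diff) > 50))
    return {
        "mean_nni": float(np.mean(nn)),
        "sdnn": float(np.std(nn, ddof=1)),  # sample standard deviation (assumed)
        "rmssd": float(np.sqrt(np.mean(diff ** 2))),
        "nni_50": nni_50,
        # proportion relative to the total number of NN intervals, in percent
        "pnni_50": 100.0 * nni_50 / len(nn),
        # mean of the instantaneous heart rates in beats per minute (assumed)
        "mean_hr": float(np.mean(60000.0 / nn)),
    }

# Hypothetical NN intervals in ms
example_features = basic_time_domain([800, 860, 790, 810, 805])
print(example_features)
```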

In [202]:
hrvana.get_time_domain_features(rr_epoch)

{'mean_nni': 334.52840909090907,
 'sdnn': 29.405933988937598,
 'sdsd': 29.781563724085476,
 'nni_50': 11,
 'pnni_50': 12.5,
 'nni_20': 35,
 'pnni_20': 39.77272727272727,
 'rmssd': 29.793060211760878,
 'median_nni': 326.0,
 'range_nni': 103.0,
 'cvsd': 0.08905988072201225,
 'cvnni': 0.08790265098515579,
 'mean_hr': 180.66111723666245,
 'max_hr': 199.33554817275748,
 'min_hr': 148.5148514851485,
 'std_hr': 14.96269156274295}

<a id="frequency_domain_feature"></a>
### Extract Frequency Domain Features
- **total_power** : Total spectral power of all NN intervals (LF + HF + VLF)
- **vlf** : Variance ( = power ) in HRV in the Very Low Frequency (.003 to .04 Hz by default). Reflects an intrinsic rhythm produced by the heart which is modulated primarily by sympathetic activity.
- **lf** : Variance ( = power ) in HRV in the Low Frequency (.04 to .15 Hz). Reflects a mixture of sympathetic and parasympathetic activity, but in long-term recordings, it reflects sympathetic activity and can be reduced by the beta-adrenergic antagonist propranolol.
- **hf**: Variance ( = power ) in HRV in the High Frequency (.15 to .40 Hz by default). Reflects fast changes in beat-to-beat variability due to parasympathetic (vagal) activity. Sometimes called the respiratory band because it corresponds to HRV changes related to the respiratory cycle and can be increased by slow, deep breathing (about 6 or 7 breaths per minute) and decreased by anticholinergic drugs or vagal blockade.
- **lf_hf_ratio** : The lf/hf ratio is used by some investigators as a quantitative mirror of the sympatho/vagal balance.
- **lfnu** : Normalized lf power. Units: normalized units = $LF / (\text{total power} - VLF) \times 100$
- **hfnu** : Normalized hf power. Units: normalized units = $HF / (\text{total power} - VLF) \times 100$

Note: Spectral power is measured in $ms^2$.

In [203]:
hrvana.plot_psd(rr_epoch)
hrvana.plot_psd(rr_epoch, method="lomb")


In [49]:
hrvana.get_frequency_domain_features(rr_epoch)


{'lf': 40.90000696657587,
 'hf': 36.88707675165494,
 'lf_hf_ratio': 1.1087895970162747,
 'lfnu': 52.5794322290427,
 'hfnu': 47.42056777095731,
 'total_power': 88.36695884884966,
 'vlf': 10.579875130618854}
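
As a worked check of the normalized-unit formulas, the short sketch below recomputes `total_power`, `lfnu`, `hfnu` and `lf_hf_ratio` from the `lf`, `hf` and `vlf` values in the output above. It only reproduces the arithmetic, not the spectral estimation itself.

```python
# Band powers in ms^2, copied from the output above
lf = 40.90000696657587
hf = 36.88707675165494
vlf = 10.579875130618854

# Total power is the sum of the three bands
total_power = vlf + lf + hf

# Normalized units exclude the VLF band from the denominator
lfnu = lf / (total_power - vlf) * 100
hfnu = hf / (total_power - vlf) * 100
lf_hf_ratio = lf / hf

print(total_power, lfnu, hfnu, lf_hf_ratio)
```

Because the denominator is LF + HF, `lfnu` and `hfnu` always sum to 100.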

<a id="poincaré_feature"></a>
### Extract Features from Poincaré plot (Non-linear Domain)
Note: The known practice is to use this function on short-term recordings, using a 5-minute window.

In [204]:
hrvana.plot_poincare(rr_epoch)

In [51]:
hrvana.get_poincare_plot_features(rr_epoch)

{'sd1': 19.223540302898627,
 'sd2': 48.12328372513283,
 'ratio_sd2_sd1': 2.5033517742762785}
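
For reference, SD1 and SD2 are commonly derived from the variance of the NN intervals and the variance of their successive differences. The sketch below uses those standard relations; the function name `poincare_sd` is hypothetical, and the use of sample variances (`ddof=1`) is an assumption rather than a statement about the hrv-analysis internals.

```python
import numpy as np

def poincare_sd(nn_ms):
    """Return (sd1, sd2) of the Poincaré plot for NN intervals in ms."""
    nn = np.asarray(nn_ms, dtype=float)
    diff = np.diff(nn)
    var_nn = np.var(nn, ddof=1)      # variance of the NN intervals
    var_diff = np.var(diff, ddof=1)  # variance of successive differences
    # SD1: spread perpendicular to the line of identity (short-term variability)
    sd1 = np.sqrt(0.5 * var_diff)
    # SD2: spread along the line of identity (longer-term variability)
    sd2 = np.sqrt(2.0 * var_nn - 0.5 * var_diff)
    return float(sd1), float(sd2)

# Hypothetical NN intervals in ms
example_nn = [800, 810, 825, 815, 805, 790, 800, 820]
sd1_example, sd2_example = poincare_sd(example_nn)
print(sd1_example, sd2_example, sd2_example / sd1_example)
```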

<a id="non_linear_domain_feature"></a>
### Extract Non-linear Domain Features 
These features include the CVI (cardiovagal index), the CSI (cardiosympathetic index) and the Modified CSI, an alternative measure used in research on [seizure detection](https://doi.org/10.1109/embc.2014.6944639).


In [205]:
hrvana.get_csi_cvi_features(rr_epoch)

{'csi': 1.6896476513786713,
 'cvi': 4.0838019318995755,
 'Modified_csi': 241.87734308256628}
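
These indices are usually defined from the Poincaré descriptors through the transverse axis T = 4·SD1 and the longitudinal axis L = 4·SD2. The sketch below applies those definitions to the `sd1` and `sd2` values shown in the summary table at the end of this notebook. The function name `csi_cvi` is hypothetical, and the formulas are the commonly cited ones rather than a copy of the library code.

```python
import math

def csi_cvi(sd1, sd2):
    """Compute CSI, CVI and Modified CSI from Poincaré SD1 and SD2."""
    transverse = 4 * sd1    # T: axis perpendicular to the line of identity
    longitudinal = 4 * sd2  # L: axis along the line of identity
    return {
        "csi": longitudinal / transverse,
        "cvi": math.log10(longitudinal * transverse),
        "modified_csi": longitudinal ** 2 / transverse,
    }

# sd1 and sd2 as shown in the final feature table of this notebook
nonlinear_example = csi_cvi(sd1=21.180826, sd2=35.788134)
print(nonlinear_example)
```

With these inputs the results agree with the `csi`, `cvi` and `Modified_csi` values in the output above to about three decimal places.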

<a id="geometrical_time_domain_feature"></a>
### Extract Geometrical Time Domain Features 
The known practice is to use these features on recordings with windows from 20 minutes to 24 hours. We discard the triangular interpolation of the NN-interval histogram (TINN), which is returned as `None`.

In [206]:
hrvana.get_geometrical_features(rr_epoch)

{'triangular_index': 5.176470588235294, 'tinn': None}
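
The triangular index is the total number of NN intervals divided by the height of the NN-interval histogram. The sketch below illustrates that ratio with NumPy. The function name, the 8 ms bin width and the 300 to 2000 ms histogram range are assumptions chosen for illustration.

```python
import numpy as np

def triangular_index(nn_ms, bin_width=8, low=300, high=2000):
    """Number of NN intervals divided by the count in the fullest histogram bin."""
    nn = np.asarray(nn_ms, dtype=float)
    # Bin edges in ms; intervals outside [low, high] fall outside the histogram
    edges = np.arange(low, high + bin_width, bin_width)
    counts, _ = np.histogram(nn, bins=edges)
    return len(nn) / counts.max()

# Hypothetical NN intervals in ms; four of them share the 796-804 ms bin
example_ti = triangular_index([800, 802, 805, 810, 820, 830, 801, 803])
print(example_ti)
```

A short recording produces a sparse histogram, which is one reason this feature is normally reserved for longer windows.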

The next step is to extract all HRV features and put them in a dataframe

In [207]:
feature_list = []
all_hr_features = {}
all_hr_features.update(hrvana.get_time_domain_features(rr_epoch))
all_hr_features.update(hrvana.get_frequency_domain_features(rr_epoch))
all_hr_features.update(hrvana.get_poincare_plot_features(rr_epoch))
all_hr_features.update(hrvana.get_csi_cvi_features(rr_epoch))
all_hr_features.update(hrvana.get_geometrical_features(rr_epoch))
feature_list.append(all_hr_features)
hrv_feature_df = pd.DataFrame(feature_list)


In [208]:
hrv_feature_df

Unnamed: 0,mean_nni,sdnn,sdsd,nni_50,pnni_50,nni_20,pnni_20,rmssd,median_nni,range_nni,...,total_power,vlf,sd1,sd2,ratio_sd2_sd1,csi,cvi,Modified_csi,triangular_index,tinn
0,334.528409,29.405934,29.781564,11,12.5,35,39.772727,29.79306,326.0,103.0,...,308.37294,32.996208,21.180826,35.788134,1.689648,1.689648,4.083802,241.877343,5.176471,


Now you have the **hrv_feature_df**