# Algorithm Outline/Plan
*After developing the full algorithm and testing it with the datasets, go back through the code and refactor it into functions, etc., to make it more efficient and Pythonic.

## Data Preprocessing
### 1. Noise Removal
A Fast Fourier Transform (FFT) converts the signal from the time domain into the frequency domain. <br/>
Let x[n], 0<=n<=N-1, represent the PPG, and let X[k], 0<=k<=N-1, denote the FFT of x[n]. The frequency components that are lower than 0 Hz or higher than 8 Hz are removed (to reduce noise and baseline wandering) by the following: Xr[k] = X[k] where the frequency of bin k lies between 0 Hz and 8 Hz; 0 otherwise. For a sampling rate fs, bin k corresponds to k*fs/N Hz. <br/>
<br/>An inverse FFT (IFFT) then restores the PPG data to the time domain.
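
The FFT, zeroing, and IFFT steps above could be sketched as follows. This is a minimal illustration using NumPy, not a tuned filter; the function name `fft_bandlimit`, its parameters, and the use of the one-sided `rfft` (which only holds the non-negative frequencies of a real signal) are assumptions made for the example.

```python
import numpy as np

def fft_bandlimit(x, fs, f_low=0.0, f_high=8.0):
    """Zero every FFT bin outside [f_low, f_high] Hz and return the time-domain result.

    x  : 1-D real PPG signal
    fs : sampling rate in Hz (assumed known for the dataset)
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    X = np.fft.rfft(x)                      # one-sided spectrum of a real signal
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)  # frequency in Hz of each bin
    keep = (freqs >= f_low) & (freqs <= f_high)
    Xr = np.where(keep, X, 0.0)             # Xr[k] = X[k] inside the band, 0 otherwise
    return np.fft.irfft(Xr, n=n)            # inverse FFT back to the time domain
```

Raising `f_low` slightly above 0 Hz would also drop the DC bin; whether that is wanted depends on how baseline wander is handled, so it is left as a parameter here.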

### 2. Normalization - 1st/2nd Derivative Calculations
Every segment is normalized to the range [0, 1] using x' = (x - Xm)/(XM - Xm), where x is the data point, Xm is the segment minimum, and XM is the segment maximum.<br/>
The 1st (dPPG) and 2nd (sdPPG) derivatives are then calculated.
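
A minimal sketch of the normalization and derivative steps, assuming NumPy and a known sampling rate `fs`. The function names are hypothetical, and `np.gradient` (central differences) is one possible choice of numerical derivative, not necessarily the one the final pipeline will use.

```python
import numpy as np

def normalize_segment(x):
    """Min-max scale a segment to [0, 1]: x' = (x - Xm) / (XM - Xm)."""
    x = np.asarray(x, dtype=float)
    x_min, x_max = x.min(), x.max()
    if x_max == x_min:
        raise ValueError("segment is constant; min-max scaling is undefined")
    return (x - x_min) / (x_max - x_min)

def derivatives(x, fs):
    """Return (dPPG, sdPPG): first and second time derivatives of x."""
    dt = 1.0 / fs
    dppg = np.gradient(x, dt)      # first derivative
    sdppg = np.gradient(dppg, dt)  # second derivative
    return dppg, sdppg
```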

### 3. Feature Point Detection
https://github.com/paulvangentcom/heartrate_analysis_python
<br/>The following points should be labeled/extracted from the PPG/dPPG/sdPPG:<br/>
    a. systolic peaks of PPG<br/>
    b. onset and offset valley points of PPG by finding the min between 2 consecutive systolic peaks<br/>
    c. locations with maximal and minimal slope values of PPG and dPPG by computing the gradients of the valley points<br/>
    d. dicrotic notch points from secondary peaks of sdPPG contour
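
Points a through c could be located roughly as sketched below. This is a deliberately simple NumPy-only stand-in, not the method used by the toolkit linked above: the function names, the `min_height` threshold (which assumes a signal normalized to [0, 1]), and the `min_interval_s` refractory gap are all assumptions. Dicrotic notch detection (point d) is omitted.

```python
import numpy as np

def find_systolic_peaks(ppg, fs, min_height=0.5, min_interval_s=0.3):
    """Indices of local maxima above min_height, at least min_interval_s apart."""
    ppg = np.asarray(ppg, dtype=float)
    mid = ppg[1:-1]
    is_peak = (mid > ppg[:-2]) & (mid >= ppg[2:]) & (mid > min_height)
    candidates = np.flatnonzero(is_peak) + 1
    min_gap = int(min_interval_s * fs)
    peaks = []
    for idx in candidates:
        if peaks and idx - peaks[-1] < min_gap:
            # Two candidates too close together: keep the taller one.
            if ppg[idx] > ppg[peaks[-1]]:
                peaks[-1] = idx
        else:
            peaks.append(idx)
    return np.array(peaks, dtype=int)

def find_valleys(ppg, peaks):
    """Index of the minimum between each pair of consecutive systolic peaks."""
    ppg = np.asarray(ppg, dtype=float)
    return np.array(
        [a + int(np.argmin(ppg[a:b])) for a, b in zip(peaks[:-1], peaks[1:])],
        dtype=int,
    )

def find_max_slope_points(dppg, valleys, peaks):
    """Index of the largest dPPG value between each valley and the next peak."""
    dppg = np.asarray(dppg, dtype=float)
    peaks = np.asarray(peaks, dtype=int)
    points = []
    for v in valleys:
        later = peaks[peaks > v]
        if later.size:
            p = later[0]
            points.append(v + int(np.argmax(dppg[v:p + 1])))
    return np.array(points, dtype=int)
```

On real recordings a more robust detector, for example `scipy.signal.find_peaks` with its `distance` and `prominence` arguments, would likely be a better starting point than the plain threshold used here.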
    
### 4. Data Partitioning
Each PPG data segment and its corresponding dPPG and sdPPG waves are partitioned into fragments, with each fragment running from one valley point of the PPG to the next consecutive valley point. <br/>
At this point, abnormal heart cycles may also be removed.
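
A sketch of the valley-to-valley split, assuming the valley indices are already available as an integer array. The outline does not define what makes a cycle "abnormal", so the duration-based rule in `drop_abnormal_cycles` (and its `tolerance` parameter) is purely a placeholder assumption.

```python
import numpy as np

def partition_cycles(ppg, dppg, sdppg, valleys):
    """Split the three waves into fragments running from one valley to the next."""
    cycles = []
    for start, end in zip(valleys[:-1], valleys[1:]):
        cycles.append({
            "start": int(start),
            "end": int(end),
            "ppg": ppg[start:end + 1],
            "dppg": dppg[start:end + 1],
            "sdppg": sdppg[start:end + 1],
        })
    return cycles

def drop_abnormal_cycles(cycles, fs, tolerance=0.3):
    """Keep cycles whose duration is within tolerance (a fraction) of the median.

    Placeholder criterion only; the real abnormality rule is still to be chosen.
    """
    if not cycles:
        return []
    durations = np.array([(c["end"] - c["start"]) / fs for c in cycles])
    median = np.median(durations)
    keep = np.abs(durations - median) <= tolerance * median
    return [c for c, k in zip(cycles, keep) if k]
```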

## Feature Extraction - Variables & Equations
QRS_time:  Time for a full QRS wave (ECG)
<br/> RP_time:   Time from P peak to R peak (ECG)
<br/> RT_time:   Time from R peak to T peak (ECG)
<br/> PQ_time:   Time from P peak to Q peak (ECG)
<br/> ST_time:   Time from S peak to T peak (ECG)
<br/> PT_time:   Time from P peak to T peak (ECG)
<br/> P_amp:     P peak amplitude
<br/> R_amp:     R peak amplitude
<br/> T_amp:     T peak amplitude
<br/> RT_ratio:  T_amp/R_amp
<br/> RP_diff:   R_amp - P_amp
<br/> PTT_p:     Time from R peak of ECG to systolic peak of PPG
<br/> PTT_d:     Time from R peak of ECG to max slope point of PPG (dPPG)
<br/> PTT_f:     Time from R peak of ECG to foot of PPG signal
<br/> HR:        Heart Rate (Peak-to-Peak time --> BPM)
<br/> AS:        Ascending slope of PPG (slope from onset point to max peak)
<br/> DS:        Descending slope of PPG (slope from max peak to offset point)
<br/> S1:        Area under curve between onset and max slope point
<br/> S2:        Area under curve between max slope point and max peak
<br/> S3:        Area under curve between max peak and dicrotic notch
<br/> S4:        Area under curve between dicrotic notch and offset point
<br/> AA:        Ascending area of PPG
<br/> DA:        Descending area of PPG
<br/> dAA:       Ascending area of dPPG
<br/> dDA:       Descending area of dPPG
<br/> sdAA:      Ascending area of sdPPG
<br/> sdDA:      Descending area of sdPPG
<br/> PI:        Peak intensity of PPG
<br/> dPI:       Peak intensity of dPPG
<br/> sdPI:      Peak intensity of sdPPG
<br/> dVI:       Valley intensity of dPPG
<br/> sdVI:      Valley intensity of sdPPG
<br/> AID:       Intensity diff between max peak and onset point (PPG)
<br/> dAID:      Intensity diff between max peak and onset point (dPPG)
<br/> sdAID:     Intensity diff between max peak and onset point (sdPPG)
<br/> dDID:      Intensity diff between offset point and max peak (dPPG)
<br/> sdDID:     Intensity diff between offset point and max peak (sdPPG)
<br/> dRIPV:     Ratio of max peak to valley intensity (dPPG)
<br/> sdRIPV:    Ratio of max peak to valley intensity (sdPPG)
<br/> AT:        Ascending time interval of PPG
<br/> Slope_a:   Slope from max peak to dicrotic notch of PPG
<br/> NI:        Dicrotic notch intensity
<br/> AI:        Augmentation index = NI/PI
<br/> AI1:       Augmentation index 1 = (PI-NI)/PI
<br/> RSD:       Ratio of systolic to diastolic duration
<br/> RSC:       Ratio of diastolic duration to cardiac cycle
<br/> RDC:       Ratio of systolic duration to cardiac cycle
<br/> Other features to include: Gender, Weight, Age, Activity, etc. 

The selected features for each segment will be stored in a pandas DataFrame, ready to be passed to the machine learning algorithm(s).
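
As an illustration, a handful of the PPG features above could be computed per cycle as sketched below. The exact definitions are assumptions read off the one-line descriptions (for example, AA is taken as the area from onset to systolic peak and DA as the area from peak to offset), and the helper names are hypothetical. Only PI, AID, AT, AS, DS, AA, and DA are covered.

```python
import numpy as np

def trapezoid_area(y, dt):
    """Area under uniformly sampled y using the trapezoidal rule."""
    y = np.asarray(y, dtype=float)
    return float(np.sum((y[1:] + y[:-1]) * 0.5) * dt)

def cycle_features(ppg_cycle, fs):
    """Features for one valley-to-valley PPG cycle (onset = first sample, offset = last)."""
    y = np.asarray(ppg_cycle, dtype=float)
    dt = 1.0 / fs
    peak = int(np.argmax(y))  # systolic peak taken as the cycle maximum
    last = y.size - 1
    if peak == 0 or peak == last:
        raise ValueError("systolic peak must lie strictly inside the cycle")
    ascending_time = peak * dt
    descending_time = (last - peak) * dt
    return {
        "PI": float(y[peak]),                                # peak intensity
        "AID": float(y[peak] - y[0]),                        # peak minus onset
        "AT": ascending_time,                                # onset-to-peak time
        "AS": float((y[peak] - y[0]) / ascending_time),      # ascending slope
        "DS": float((y[last] - y[peak]) / descending_time),  # descending slope
        "AA": trapezoid_area(y[:peak + 1], dt),              # ascending area
        "DA": trapezoid_area(y[peak:], dt),                  # descending area
    }
```

A list of such dictionaries, one per cycle, can be passed directly to `pandas.DataFrame(...)` to obtain one row per cycle and one column per feature.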

## Machine Learning Algorithm(s)
Techniques:<br/>
Random Forest <br/>
Decision Tree <br/>
Multivariate Regression <br/>
Support Vector Machine <br/>
Neural Network <br/>
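
Of the techniques listed, multivariate regression is the simplest to sketch without extra dependencies. The example below is an ordinary least-squares fit with NumPy, assuming a feature matrix with one row per cycle and a vector of reference target values; the function names are hypothetical, and the other techniques would normally come from a dedicated machine learning library.

```python
import numpy as np

def fit_linear_regression(features, targets):
    """Least-squares fit of targets ~ features @ w + b; returns [w..., b]."""
    X = np.asarray(features, dtype=float)
    y = np.asarray(targets, dtype=float)
    design = np.column_stack([X, np.ones(len(X))])  # last column is the intercept
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

def predict(coef, features):
    """Apply fitted coefficients (weights followed by intercept) to new rows."""
    X = np.asarray(features, dtype=float)
    return np.column_stack([X, np.ones(len(X))]) @ coef
```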


## Calibration Algorithm
TBD - currently researching existing calibration equations in the literature to determine whether any viable ones exist.

## Performance Metrics
Pearson correlation: 95% CI + prediction ellipses<br/>
Bland-Altman plot with error distribution
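
The numbers behind both metrics could be computed as sketched below, assuming paired arrays of reference and predicted values. The 95% CI uses the standard Fisher z-transform approximation and the Bland-Altman limits use the usual bias ± 1.96 SD convention; the function names are hypothetical, and plotting and prediction ellipses are left out.

```python
import numpy as np

def pearson_with_ci(reference, predicted):
    """Pearson r with an approximate 95% CI from the Fisher z-transform (needs n > 3)."""
    x = np.asarray(reference, dtype=float)
    y = np.asarray(predicted, dtype=float)
    r = float(np.corrcoef(x, y)[0, 1])
    z = np.arctanh(r)
    half_width = 1.96 / np.sqrt(x.size - 3)
    return r, (float(np.tanh(z - half_width)), float(np.tanh(z + half_width)))

def bland_altman(reference, predicted):
    """Bias (mean difference) and 95% limits of agreement (bias +/- 1.96 SD)."""
    diff = np.asarray(predicted, dtype=float) - np.asarray(reference, dtype=float)
    bias = float(diff.mean())
    sd = float(diff.std(ddof=1))  # sample standard deviation of the differences
    return bias, (bias - 1.96 * sd, bias + 1.96 * sd)
```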



