# Atrial Fibrillation Features

As with activity classification, we will featurize our time series signal and feed those features into a classifier. For AF we will build a two-class classifier using the inter-beat-interval time series. This time series can be derived by taking the difference between successive QRS complex locations, as provided by the Pan-Tompkins algorithm covered in the previous videos. We source our features from a couple of submissions to the [Computing in Cardiology Challenge 2017](https://physionet.org/content/challenge-2017/1.0.0/):

>[Behar, Rosenberg, Yaniv, Oster. Rhythm and Quality Classification from Short ECGs Recorded Using a Mobile Device. Computing in Cardiology Challenge 2017.](http://www.cinc.org/archives/2017/pdf/165-056.pdf)  

and   
>[Bonizzi, Driessens, Karel. Detection of Atrial Fibrillation Episodes from Short Single Lead Recordings by Means of Ensemble Learning. Computing in Cardiology Challenge 2017.](http://www.cinc.org/archives/2017/pdf/169-313.pdf)
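
To make the inter-beat-interval series concrete, here is a minimal sketch of how it falls out of a list of QRS locations. The QRS sample indices below are made up for illustration; only the 300 Hz sampling rate matches the value used later in this notebook.

```python
import numpy as np

fs = 300  # sampling rate in Hz, matching the value used later in this notebook

# Hypothetical QRS sample indices (made up for illustration)
qrs_inds = np.array([150, 390, 645, 870, 1140])

# Successive differences give the inter-beat (RR) intervals in samples
rr_samples = np.diff(qrs_inds)

# Dividing by the sampling rate converts the intervals to seconds
rr_seconds = rr_samples / fs

print(rr_samples)
print(rr_seconds)
```

Note that N QRS locations yield N - 1 intervals, so very short records may not contain enough beats to compute every feature.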

Let's see how our features are implemented.

## Imports

In [None]:
import glob
import os

import numpy as np
import pandas as pd

## Load Data

In [None]:
fs = 300
data_dir = '/data/cinc/'
ref = pd.read_csv(data_dir + 'REFERENCE.csv')
ref = dict(zip(ref.record, ref.rhythm))
base = lambda f: os.path.splitext(os.path.basename(f))[0]
files = sorted(glob.glob(data_dir + '*.npz'))
qrs = []
labels = []
for f in files:
    with np.load(f) as npz:
        qrs.append(npz['qrs'])
    labels.append(ref[base(f)])

## Features

The features for the AF detection algorithm are computed from the RR interval time series. We use the time domain and frequency domain features listed below.

### Time domain
 - minimum RR interval
 - maximum RR interval
 - median RR interval
 - average RR interval
 - standard deviation of RR intervals
 - number of RR interval outliers
   - An RR interval is an outlier if it is greater than 1.2x the average RR interval in the ECG record
 - root-mean-square of the difference between adjacent RR intervals
 - percent of RR interval differences greater than 50 milliseconds
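
As a rough illustration of the less obvious time domain features, the sketch below computes the outlier count, the root-mean-square of successive differences, and the share of successive differences above 50 ms on a small RR series. The RR values are invented and expressed in seconds, and the names `rmssd` and `frac_over_50ms` are just labels chosen for this example. The 50 ms test uses signed differences here; taking the absolute difference is another common convention.

```python
import numpy as np

# Hypothetical RR intervals in seconds (made up for illustration)
rr = np.array([0.80, 0.82, 0.78, 1.10, 0.79, 0.81])

mean_rr = np.mean(rr)

# Intervals longer than 1.2x the record's mean count as outliers
n_outliers = int(np.sum(rr > 1.2 * mean_rr))

# Differences between adjacent RR intervals
d_rr = np.diff(rr)

# Root-mean-square of the adjacent differences
rmssd = np.sqrt(np.mean(d_rr ** 2))

# Fraction of adjacent differences that exceed 50 ms (signed comparison)
frac_over_50ms = np.mean(d_rr > 0.05)

print(n_outliers, rmssd, frac_over_50ms)
```

In this toy series only the 1.10 s interval exceeds 1.2 times the mean, and only the jump into that interval is larger than 50 ms.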

### Frequency domain
The RR interval time series is not sampled regularly in time: we only get one data point per heartbeat. Before we can compute any frequency domain features, the time series must be resampled so that the data points are uniformly spaced. Resample the RR interval time series to 4 Hz before computing the features below.

 - peak magnitude between 0.04 Hz and 0.15 Hz in the regularized RR interval time series
 - main frequency between 0.04 Hz and 0.15 Hz in the regularized RR interval time series
 - peak magnitude between 0.15 Hz and 0.4 Hz in the regularized RR interval time series
 - main frequency between 0.15 Hz and 0.4 Hz in the regularized RR interval time series
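
The resampling and band-peak steps can be sketched on synthetic data as follows. This is only an illustration under stated assumptions: the beat times are generated so that the RR interval oscillates at 0.25 Hz around 0.8 s, linear interpolation via `np.interp` is one possible way to resample, and all variable names are chosen for this example.

```python
import numpy as np

fs_resample = 4  # Hz, the resampling rate named in the text

# Synthetic beat times: RR intervals of about 0.8 s, modulated at 0.25 Hz
beat_times = [0.0]
while beat_times[-1] < 60:
    t = beat_times[-1]
    beat_times.append(t + 0.8 + 0.05 * np.sin(2 * np.pi * 0.25 * t))
beat_times = np.array(beat_times)
rr = np.diff(beat_times)  # RR intervals in seconds

# Time-stamp each RR interval at the beat that starts it, then
# linearly interpolate onto a uniform 4 Hz grid
t_uniform = np.arange(beat_times[0], beat_times[-2], 1 / fs_resample)
rr_uniform = np.interp(t_uniform, beat_times[:-1], rr)

# Magnitude spectrum of the regularly sampled series
freq = np.fft.rfftfreq(len(rr_uniform), 1 / fs_resample)
mag = np.abs(np.fft.rfft(rr_uniform))

# Peak magnitude and its frequency inside the 0.15-0.4 Hz band.
# The argmax is relative to the band, so index the band-limited frequencies.
hf_band = (freq >= 0.15) & (freq <= 0.4)
hf_mag = np.max(mag[hf_band])
hf_freq = freq[hf_band][np.argmax(mag[hf_band])]

print(hf_freq)
```

Because the modulation was placed at 0.25 Hz, the main frequency in the 0.15 Hz to 0.4 Hz band should land close to that value, within the frequency resolution of a roughly 60 second record.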

In [None]:
def Featurize(qrs_inds, fs):
    """Featurize the qrs complex locations time series.

    Args:
        qrs_inds: (np.array of number) the sample indices of the QRS complex locations
        fs: (number) the sampling rate

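    Returns:
        12-tuple of features in the order (min_rr, max_rr, median_rr, n_outliers,
        mean_rr, std_rr, rmssd, pdrr_50, lf_mag, lf_freq, hf_mag, hf_freq).
        Interval features are in samples; pdrr_50 is a fraction and
        lf_freq / hf_freq are in Hz.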
    """
    # Too few beats to compute every feature: return an all-zero feature vector
    if len(qrs_inds) < 3:
        return (0,) * 12

    # Compute the RR interval time series (in samples)
    rr = np.diff(qrs_inds)

    # Compute time domain features
    min_rr = np.min(rr)
    max_rr = np.max(rr)
    median_rr = np.median(rr)
    mean_rr = np.mean(rr)
    std_rr = np.std(rr)
    n_outliers = np.sum(rr > 1.2 * np.mean(rr))
    rmssd = np.sqrt(np.mean(np.square(np.diff(rr))))
    pdrr_50 = np.mean(np.diff(rr) / fs > 0.05)

    # Regularly resample the RR interval time series
    rr_ts = np.arange(qrs_inds[0] / fs, qrs_inds[-2] / fs, 1 / 4)
    rr_interp = np.interp(rr_ts, qrs_inds[:-1] / fs, rr)

    # Compute the Fourier transform of the regular RR interval time series
    freq = np.fft.rfftfreq(len(rr_interp), 1 / 4)
    fft_mag = np.abs(np.fft.rfft(rr_interp))

    # Compute frequency domain features
    lf_band = (freq >= 0.04) & (freq <= 0.15)
    hf_band = (freq >= 0.15) & (freq <= 0.4)
    lf_mag = np.max(fft_mag[lf_band])
    # argmax is relative to the band, so index the band-limited frequencies
    lf_freq = freq[lf_band][np.argmax(fft_mag[lf_band])]
    hf_mag = np.max(fft_mag[hf_band])
    hf_freq = freq[hf_band][np.argmax(fft_mag[hf_band])]


    return (min_rr, max_rr, median_rr, n_outliers, mean_rr, std_rr, rmssd, pdrr_50,
            lf_mag, lf_freq, hf_mag, hf_freq)

## Create Feature Matrix

In [None]:
feats = [Featurize(qrs_inds, fs) for qrs_inds in qrs]
X, y = np.array(feats), np.array(labels)

## Build a Classifier

In [None]:
# TODO:
# Write code that builds and trains a classifier to classify our ECG records.
# Again, a random forest with 100 trees and a depth of 4 works well.
# Evaluate the performance of the classifier using cross validation.
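
One possible shape for this step, sketched with scikit-learn on stand-in data so that it runs on its own. The random feature matrix, the `'A'` / `'N'` labels, the `X_demo` / `y_demo` names, and the choice of 5-fold cross validation are all assumptions made for illustration; in the notebook you would pass the real `X` and `y` instead.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

# Stand-in data: 200 records x 12 features with two made-up class labels
X_demo = rng.normal(size=(200, 12))
y_demo = np.where(X_demo[:, 0] + X_demo[:, 1] > 0, 'A', 'N')

# Random forest with 100 trees and a maximum depth of 4, as suggested above
clf = RandomForestClassifier(n_estimators=100, max_depth=4, random_state=0)

# 5-fold cross validation (stratified by default for classifiers);
# the default score for a classifier is accuracy
scores = cross_val_score(clf, X_demo, y_demo, cv=5)

print(scores.mean())
```

Accuracy is only one way to score the folds. If the classes are imbalanced, a metric such as F1 (via the `scoring` argument of `cross_val_score`) may be more informative.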