# Using DSP to improve the accuracy of an EEG classifier

**Authors**
1. Kasra Lekan (kl5sq)
2. Derek Johnson (dej3tc)
3. Fiji Marcelin (fm4cg)

## Experimental Setup
**Signal Data**: We applied a peak-finding algorithm along with a band-pass filter and a Butterworth filter to sleep-stage EEG data. The data comes from a study of "slow-wave microcontinuity" during sleep [1].

**MNE Package**: The MNE package ...

In [16]:
from mne_data_setup import *

## Method 1: Peak Finder
### Theory
I implemented three peak finder algorithms (`naive_logical_find_peaks`, `naive_mathematical_find_peaks`, `peak_typing_finder`). The two naive implementations were designed by me, while `peak_typing_finder` follows from algorithms written by others. All of these algorithms can also identify valleys if the signal data is multiplied by -1 before the algorithm is run.
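The negation trick can be illustrated with a short sketch. It uses `scipy.signal.find_peaks` as a stand-in detector (not one of the three implementations above) on the same synthetic sine-based signal used for the baselines; the variable names are illustrative only.

```python
import numpy as np
from scipy.signal import find_peaks

# Synthetic test signal: difference of two sinusoids, sampled every 0.01 on [0, 3).
t = np.arange(0, 3, 0.01)
signal = np.sin(np.pi * t) - np.sin(0.5 * np.pi * t)

# Peaks of the original signal.
peak_idx, _ = find_peaks(signal)

# Peaks of the negated signal are the valleys of the original signal.
valley_idx, _ = find_peaks(-signal)

print("peaks at t =", t[peak_idx])
print("valleys at t =", t[valley_idx])
```

Any detector that returns indices of local maxima can be reused this way, because negating the signal turns each local minimum into a local maximum at the same index.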

Each algorithm was compared against a baseline, the MNE implementation of peak finding, on a sine-wave signal and on the EEG data described above.

1. `naive_logical_find_peaks` – Compares if values are greater than their neighbors in the signal. The number of neighbors in the comparison is controlled by a parameter.
2. `naive_mathematical_find_peaks` – Performs peak detection in three steps: (1) compute the root mean square (RMS) of the signal, (2) compute peak-to-average ratios, and (3) apply first-order logic. The method therefore assumes that the underlying data follows a particular distribution, i.e. that a peak occurs where the squared signal value divided by the RMS is larger than the same quantity at neighboring samples. A threshold is used to reduce the effect of noise in the dataset.
3. `peak_typing_finder` – This algorithm is notable for its time efficiency and for handling different kinds of peaks based on the edge type (`None`, `'rising'`, `'falling'`, `'both'`). Otherwise, it is quite similar to `naive_logical_find_peaks`.
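As a rough illustration of the neighbor-comparison idea behind the first algorithm, here is a minimal sketch. The function name `neighbor_peaks` and its parameter `n_neighbors` are hypothetical and are not the actual `naive_logical_find_peaks` code; the sketch only assumes that a peak must be strictly greater than every sample within a fixed number of neighbors on each side.

```python
import numpy as np

def neighbor_peaks(x, n_neighbors=1):
    """Return indices of samples strictly greater than all samples
    within `n_neighbors` positions on each side (hypothetical sketch)."""
    x = np.asarray(x, dtype=float)
    peaks = []
    # Samples closer than n_neighbors to either end are skipped.
    for i in range(n_neighbors, len(x) - n_neighbors):
        window = x[i - n_neighbors : i + n_neighbors + 1]
        # x[i] must be the unique maximum of its window.
        if x[i] == window.max() and np.count_nonzero(window == x[i]) == 1:
            peaks.append(i)
    return np.array(peaks, dtype=int)

demo = [0, 2, 1, 3, 1, 0, 1, 0]
print(neighbor_peaks(demo, n_neighbors=1))  # indices 1, 3 and 6
print(neighbor_peaks(demo, n_neighbors=2))  # index 3 only
```

Widening the neighborhood discards small local bumps, which is the trade-off the neighbor-count parameter controls.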

An examination of the MNE algorithm's code shows that it proceeds as follows. Some steps that handle edge cases have been omitted.

0. Initialize a minimum threshold between peaks
    - Note that the thresholds I implemented were not between peaks. Rather, they were minimum values for peaks, or how much greater a peak had to be than its neighboring values.
1. Use `np.diff` to calculate the change between each point in the signal (derivative).
2. Find the indices where the derivative changes sign (peaks and valleys).
3. Create a temporary array with only the values at these indices along with the start and end points of the signal.
4. Find the minimum value in the array (`min_mag`).
5. Check if start and end points should be candidates for peaks.
6. Loop through peak candidates (peak, valley list)
    0. Initialize the minimum magnitude for peak (`temp_mag`) as `min_mag`.
    1. If the last value was a true peak, reset `temp_mag` to `min_mag`.
    2. If the candidate value is greater than its peer by the threshold amount, set a new `temp_mag` and add the candidate to the true peak list.
7. Return chosen candidates.

Step 6 is why MNE's algorithm handles noisy data better than any of my algorithms. If step 6 were removed, MNE's algorithm would be equivalent to `peak_typing_finder`.
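Steps 1 to 3 of the outline can be sketched in a few lines of NumPy. This is a simplified illustration and not MNE's code: the function name `extrema_candidates` is hypothetical, flat segments are ignored, and the thresholded loop of step 6 is deliberately left out.

```python
import numpy as np

def extrema_candidates(x):
    """Return indices and values of candidate peaks and valleys plus endpoints
    (hypothetical sketch of steps 1 to 3 only)."""
    x = np.asarray(x, dtype=float)
    # Step 1: first difference as a discrete derivative.
    dx = np.diff(x)
    # Step 2: a sign change between consecutive differences marks a peak or valley.
    sign_change = np.where(dx[:-1] * dx[1:] < 0)[0] + 1
    # Step 3: keep only those samples, together with the start and end points.
    idx = np.concatenate(([0], sign_change, [len(x) - 1]))
    return idx, x[idx]

demo = [0, 2, 1, 3, 1, 0, 1, 0]
idx, values = extrema_candidates(demo)
print(idx)     # alternating peak/valley candidates with the endpoints
print(values)
```

Without a thresholded pass over these candidates, every small fluctuation in a noisy signal survives as a candidate.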

### Implementation & Performance

#### MNE Baselines

In [17]:
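import mne
import numpy as np
from peak_finder import PeakFinder as pf

# Synthetic test signal: difference of two sinusoids, sampled every 0.01 on [0, 3).
t = np.arange(0, 3, 0.01)
signal_sin = np.sin(np.pi*t) - np.sin(0.5*np.pi*t)
mne_sin_peak_locs, mne_sin_peak_mags = mne.preprocessing.peak_finder(signal_sin)

# First channel of the EEG recording loaded in the setup cell.
signal_eeg = raw_train.get_data()[0]
mne_eeg_peak_locs, mne_eeg_peak_mags = mne.preprocessing.peak_finder(signal_eeg)

# Ratio of two lengths rounded to four decimals (a fraction, not a percentage).
format_percent = lambda x, y: np.round(len(x)/len(y), 4)

def success_metrics(results, signal='eeg', string=""):
    """Print how the peak indices in `results` compare with the MNE baseline."""
    if signal == 'eeg':
        signal = signal_eeg
        mne_peak_locs = mne_eeg_peak_locs
    elif signal == 'sin':
        signal = signal_sin
        mne_peak_locs = mne_sin_peak_locs
    else:
        raise ValueError("signal must be 'eeg' or 'sin'")

    # Peaks found by both the tested algorithm and MNE.
    common_peaks = np.intersect1d(results, mne_peak_locs)
    common_peaks_len = len(common_peaks)

    # Fraction of MNE's peaks that the tested algorithm also found.
    true_positive_rate = format_percent(common_peaks, mne_peak_locs)

    # Fraction of all samples flagged as peaks.
    results_len = len(results)
    peak_to_signal_ratio = format_percent(results, signal)

    # Computed as len(results) / len(mne_peak_locs), i.e. tested count over MNE count.
    actual_to_predicted_peak_count_ratio = format_percent(results, mne_peak_locs)

    print(string + f"Peaks: {results_len}, peak/signal: {peak_to_signal_ratio}, (actual)/(predicted) peaks : {actual_to_predicted_peak_count_ratio}, Intersect Num: {common_peaks_len} ({true_positive_rate})")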

Found 2 significant peaks
Found 29454 significant peaks


**MNE Commentary**: The MNE algorithm performs well on noisy data while keeping time complexity low.
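The comparison metrics printed in the subsections that follow can be illustrated on toy data. The index arrays below are made up for illustration and do not come from the EEG recording; the calculation mirrors the `np.intersect1d`-based comparison against the MNE baseline.

```python
import numpy as np

# Hypothetical peak indices, for illustration only.
baseline = np.array([10, 50, 90, 130])            # reference detector (plays the role of MNE)
candidate = np.array([10, 30, 50, 70, 90, 110])   # detector under test

# Indices reported by both detectors.
common = np.intersect1d(candidate, baseline)

# Share of baseline peaks that the candidate also found ("Intersect Num" fraction).
coverage = len(common) / len(baseline)

# Number of candidate peaks per baseline peak (values above 1 mean over-detection).
count_ratio = len(candidate) / len(baseline)

print(f"common={common.tolist()}, coverage={coverage}, count_ratio={count_ratio}")
```

A detector can score high coverage simply by flagging many samples, so coverage is best read together with the count ratio.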

#### scipy.signal.find_peaks

In [24]:
from scipy.signal import find_peaks

success_metrics(mne_eeg_peak_locs, signal='eeg', string="MNE: ")
scipy_peaks_eeg, _ = find_peaks(signal_eeg)
success_metrics(scipy_peaks_eeg, signal='eeg')

print('\n')

success_metrics(mne_sin_peak_locs, signal='sin', string="MNE: ")
scipy_peaks_sin, _ = find_peaks(signal_sin)
success_metrics(scipy_peaks_sin, signal='sin')

MNE: Peaks: 29454, peak/signal: 0.0037, (actual)/(predicted) peaks : 1.0, Intersect Num: 29454 (1.0)
Peaks: 2069676, peak/signal: 0.2603, (actual)/(predicted) peaks : 70.2681, Intersect Num: 29451 (0.9999)


MNE: Peaks: 2, peak/signal: 0.0067, (actual)/(predicted) peaks : 1.0, Intersect Num: 2 (1.0)
Peaks: 2, peak/signal: 0.0067, (actual)/(predicted) peaks : 1.0, Intersect Num: 2 (1.0)


**Results Commentary**: In terms of time, `scipy.signal.find_peaks` clearly beats all other algorithms tested. However, it performed poorly at distinguishing peaks in the noisy EEG dataset, identifying 70.27 times as many peaks as MNE. The SciPy documentation suggests smoothing the signal before finding peaks to avoid this problem.
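The effect of smoothing before peak detection can be sketched on synthetic data. The example below assumes a Gaussian smoother (`scipy.ndimage.gaussian_filter1d`) and an arbitrary noise level and `sigma`; these are illustrative choices and were not tuned for, or run on, the EEG data.

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d
from scipy.signal import find_peaks

rng = np.random.default_rng(0)

# Clean sine-based signal plus white noise (noise level chosen arbitrarily).
t = np.arange(0, 3, 0.001)
clean = np.sin(np.pi * t) - np.sin(0.5 * np.pi * t)
noisy = clean + 0.05 * rng.standard_normal(t.size)

# Without smoothing, nearly every noise wiggle is reported as a peak.
raw_peaks, _ = find_peaks(noisy)

# Smooth first (sigma is in samples, an assumed value), then detect peaks.
smoothed = gaussian_filter1d(noisy, sigma=25)
smooth_peaks, _ = find_peaks(smoothed)

print(f"peaks without smoothing: {len(raw_peaks)}")
print(f"peaks after smoothing:   {len(smooth_peaks)}")
```

Smoothing trades temporal precision for robustness: a larger `sigma` removes more spurious peaks but can also shift or merge genuine ones.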

#### naive_logical_find_peaks

In [18]:
peaks_eeg = {}
success_metrics(mne_eeg_peak_locs, signal='eeg', string="MNE: ")
distances = [15, 35, 50, 100, 155]
for distance in distances:
    peaks_eeg[distance] = pf.naive_logical_find_peaks(signal_eeg, min_distance=distance)
    success_metrics(peaks_eeg[distance], signal='eeg', string=f"Distance: {distance}, ")

print('\n')

peaks_sin = {}
success_metrics(mne_sin_peak_locs, signal='sin', string="MNE: ")
for distance in distances:
    peaks_sin[distance] = pf.naive_logical_find_peaks(signal_sin, min_distance=distance)
    success_metrics(peaks_sin[distance], signal='sin', string=f"Distance: {distance}, ")

MNE: Peaks: 29454, peak/signal: 0.0037, (actual)/(predicted) peaks : 1.0, Intersect Num: 29454 (1.0)
Distance: 15, Peaks: 188047, peak/signal: 0.0237, (actual)/(predicted) peaks : 6.3844, Intersect Num: 28906 (0.9814)
Distance: 35, Peaks: 87758, peak/signal: 0.011, (actual)/(predicted) peaks : 2.9795, Intersect Num: 27407 (0.9305)
Distance: 50, Peaks: 63595, peak/signal: 0.008, (actual)/(predicted) peaks : 2.1591, Intersect Num: 25730 (0.8736)
Distance: 100, Peaks: 34871, peak/signal: 0.0044, (actual)/(predicted) peaks : 1.1839, Intersect Num: 21150 (0.7181)
Distance: 155, Peaks: 26413, peak/signal: 0.0033, (actual)/(predicted) peaks : 0.8968, Intersect Num: 18193 (0.6177)


MNE: Peaks: 2, peak/signal: 0.0067, (actual)/(predicted) peaks : 1.0, Intersect Num: 2 (1.0)
Distance: 15, Peaks: 2, peak/signal: 0.0067, (actual)/(predicted) peaks : 1.0, Intersect Num: 2 (1.0)
Distance: 35, Peaks: 2, peak/signal: 0.0067, (actual)/(predicted) peaks : 1.0, Intersect Num: 2 (1.0)
Distance: 50, Peaks

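**Results Commentary**: In terms of time, `naive_logical_find_peaks` did not perform as well as MNE. It also performed poorly at distinguishing peaks in the noisy EEG dataset. With a distance of 15 it identified 6.38 times as many peaks as MNE while covering 98.1% of MNE's peaks. Raising the distance to 155 brought the peak count down to 0.90 times MNE's, but coverage fell to 61.8%.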
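The trade-off between a minimum-distance setting and the number of detected peaks can be reproduced on synthetic data. The sketch below uses the `distance` argument of `scipy.signal.find_peaks` as a stand-in; it is an assumption that this behaves similarly to the `min_distance` parameter of `naive_logical_find_peaks`, and the noisy sine signal is invented for illustration.

```python
import numpy as np
from scipy.signal import find_peaks

rng = np.random.default_rng(0)

# Three sine periods with additive white noise (arbitrary noise level).
noisy = np.sin(np.linspace(0, 6 * np.pi, 3000)) + 0.1 * rng.standard_normal(3000)

# Count detected peaks for several minimum distances between peaks (in samples).
counts = {d: len(find_peaks(noisy, distance=d)[0]) for d in (1, 15, 50, 155)}

for d, n in counts.items():
    print(f"distance={d:>3}: {n} peaks")
```

A larger minimum distance suppresses noise-induced peaks, but it also discards genuine peaks that fall close together, which is consistent with the falling coverage reported above.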

#### naive_mathematical_find_peaks

In [19]:
success_metrics(mne_eeg_peak_locs, signal='eeg', string="MNE: ")
ind_eeg_naive_mathematical_find_peaks = pf.naive_mathematical_find_peaks(signal_eeg)
success_metrics(ind_eeg_naive_mathematical_find_peaks, signal='eeg')

print('\n')

success_metrics(mne_sin_peak_locs, signal='sin', string="MNE: ")
ind_sin_naive_mathematical_find_peaks = pf.naive_mathematical_find_peaks(signal_sin)
success_metrics(ind_sin_naive_mathematical_find_peaks, signal='sin')

MNE: Peaks: 29454, peak/signal: 0.0037, (actual)/(predicted) peaks : 1.0, Intersect Num: 29454 (1.0)
Peaks: 2128087, peak/signal: 0.2677, (actual)/(predicted) peaks : 72.2512, Intersect Num: 29070 (0.987)


MNE: Peaks: 2, peak/signal: 0.0067, (actual)/(predicted) peaks : 1.0, Intersect Num: 2 (1.0)
Peaks: 2, peak/signal: 0.0067, (actual)/(predicted) peaks : 1.0, Intersect Num: 1 (0.5)


**Results Commentary**: In terms of time, naive_mathematical_find_peaks performed similarly to MNE. However, it performed poorly at distinguishing peaks in the noisy EEG dataset with 72.25 times as many peaks identified compared to MNE. 

#### peak_typing_finder

In [20]:
minimum_height = 4e-5
edges = ['rising', 'falling', 'both', None]
success_metrics(mne_eeg_peak_locs, signal='eeg', string="MNE: ")
for edge in edges:
    ind_eeg_peak_typing_finder = pf.peak_typing_finder(signal_eeg, minimum_height=minimum_height, minimum_distance=1, edge=edge)
    success_metrics(ind_eeg_peak_typing_finder, signal='eeg', string=f"Edge: {edge}, ")

print('\n')

success_metrics(mne_sin_peak_locs, signal='sin', string="MNE: ")
for edge in edges:
    ind_sin_peak_typing_finder = pf.peak_typing_finder(signal_sin, minimum_height=minimum_height, minimum_distance=1, edge=edge)
    success_metrics(ind_sin_peak_typing_finder, signal='sin', string=f"Edge: {edge}, ")

MNE: Peaks: 29454, peak/signal: 0.0037, (actual)/(predicted) peaks : 1.0, Intersect Num: 29454 (1.0)
Edge: rising, Peaks: 161038, peak/signal: 0.0203, (actual)/(predicted) peaks : 5.4674, Intersect Num: 25484 (0.8652)
Edge: falling, Peaks: 161118, peak/signal: 0.0203, (actual)/(predicted) peaks : 5.4702, Intersect Num: 25323 (0.8597)
Edge: both, Peaks: 162401, peak/signal: 0.0204, (actual)/(predicted) peaks : 5.5137, Intersect Num: 25484 (0.8652)
Edge: None, Peaks: 159755, peak/signal: 0.0201, (actual)/(predicted) peaks : 5.4239, Intersect Num: 25323 (0.8597)


MNE: Peaks: 2, peak/signal: 0.0067, (actual)/(predicted) peaks : 1.0, Intersect Num: 2 (1.0)
Edge: rising, Peaks: 2, peak/signal: 0.0067, (actual)/(predicted) peaks : 1.0, Intersect Num: 2 (1.0)
Edge: falling, Peaks: 2, peak/signal: 0.0067, (actual)/(predicted) peaks : 1.0, Intersect Num: 2 (1.0)
Edge: both, Peaks: 2, peak/signal: 0.0067, (actual)/(predicted) peaks : 1.0, Intersect Num: 2 (1.0)
Edge: None, Peaks: 2, peak/signal:

**Results Commentary**: In terms of time, `peak_typing_finder` performed as well as or better than MNE. However, it performed only moderately well at distinguishing peaks in the noisy EEG dataset: with the rising edge it identified 5.47 times as many peaks as MNE while covering 86.5% of MNE's peaks. The edge type did not significantly affect performance on this data, likely due to the noise in the EEG dataset.
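To make the `edge` options concrete, here is a minimal sketch of how edge types can change which samples are reported when a peak is flat. The function `edge_peaks` is hypothetical and is not the `peak_typing_finder` implementation; it only assumes the common convention that `'rising'` and `'falling'` select the first and last sample of a flat peak.

```python
import numpy as np

def edge_peaks(x, edge=None):
    """Return peak indices under different edge rules (hypothetical sketch)."""
    x = np.asarray(x, dtype=float)
    dx = np.diff(x)
    after = np.hstack((dx, 0))    # change from each sample to the next one
    before = np.hstack((0, dx))   # change from the previous sample to this one
    strict = np.where((after < 0) & (before > 0))[0]    # strictly above both neighbors
    rising = np.where((after <= 0) & (before > 0))[0]   # reached by a rise, no rise after
    falling = np.where((after < 0) & (before >= 0))[0]  # followed by a fall, no fall before
    if edge is None:
        idx = strict
    elif edge == 'rising':
        idx = rising
    elif edge == 'falling':
        idx = falling
    elif edge == 'both':
        idx = np.union1d(rising, falling)
    else:
        raise ValueError("edge must be None, 'rising', 'falling' or 'both'")
    # The first and last samples are never reported as peaks.
    return idx[(idx > 0) & (idx < len(x) - 1)]

demo = [0, 1, 1, 0, 2, 0]  # flat peak at indices 1-2, sharp peak at index 4
for edge in (None, 'rising', 'falling', 'both'):
    print(edge, edge_peaks(demo, edge=edge).tolist())
```

On signals with few exactly flat segments the four options mostly agree, which is one possible reason the edge type made little difference in the results above.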

@Derek and @Fiji

## Classification Experiment Results

## Challenges
...

## Work Breakdown

Kasra Lekan: 
- Coding the experimental setup (including background research on how to modify the underlying data in MNE)
- Peak Detection Algorithm

Derek Johnson: 
- Butterworth Filter

Fiji Marcelin: 
- Band Pass Filter

All:
- Combining filters and testing classification accuracy

## References
1. B. Kemp, A. H. Zwinderman, B. Tuk, H. A. C. Kamphuisen, and J. J. L. Oberyé. Analysis of a sleep-dependent neuronal feedback loop: the slow-wave microcontinuity of the EEG. IEEE Transactions on Biomedical Engineering, 47(9):1185–1194, 2000. doi:10.1109/10.867928.
    Dataset for analysis
2. https://mne.tools/stable/index.html
    Implementation of the foundational EEG signal pipeline
3. https://neuraldatascience.io/intro.html
    E-book that covers analysis of EEG data in the frequency domain.
4. https://en.wikipedia.org/wiki/Band-pass_filter
5. https://en.wikipedia.org/wiki/Butterworth_filter
6. https://docs.scipy.org/doc/scipy/reference/generated/scipy.signal.find_peaks.html
7. http://www.scholarpedia.org/article/Electroencephalogram
    An overview of electroencephalogram (EEG) collection.
