## Introduction

In this phase, our team learns how to preprocess several types of time-series data using a mix of digital signal processing (DSP) and data science techniques.

### Time Series Data Types
- EMG (electromyography) data
- EOG (electrooculography) data
- EEG (electroencephalography) data

### Task
Follow this guide to learn the basics of data analysis.

### Completion Criteria
Demonstrate the ability to manipulate a dataset and preprocess it for training ML models:
- Able to parse and manipulate data files
- Able to preprocess and create visuals from raw data
- Able to visualize features and trends from preprocessed data

## Methods
The four steps of data processing, with tools for each step:

Load, Visualize, Filter, Normalize

### 1. Load and Parse Data
Datasets are commonly loaded from CSV files, SQL databases, or plain-text files.
Use tools like pandas, NumPy, SQLAlchemy, or MNE to load and manipulate structured datasets.

#### Pandas
**Best Use Case:** Loading, manipulating, and analyzing tabular data from CSV, Excel, and other formats.  
**Data Type:** EMG, EOG, EEG  
[Documentation](https://pandas.pydata.org/pandas-docs/stable/)
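
As a minimal sketch of the load-and-parse step, the snippet below reads a small CSV with `pandas.read_csv` and drops rows with missing sensor values. The CSV text, the column names (`Time`, `EMG Signal`, borrowed from Example 1), and the helper `load_signal_csv` are assumptions for illustration, not a real recording.

```python
import io

import pandas as pd

# Hypothetical CSV contents standing in for a real recording file.
CSV_TEXT = """Time,EMG Signal
0.000,0.01
0.001,-0.02
0.002,
0.003,0.05
0.004,0.03
"""


def load_signal_csv(source):
    """Read a two-column recording and return a cleaned DataFrame."""
    df = pd.read_csv(source)
    # Drop rows where the sensor value is missing, then renumber the index.
    df = df.dropna(subset=["EMG Signal"]).reset_index(drop=True)
    return df


# read_csv accepts a file path or any file-like object, so StringIO works here.
emg = load_signal_csv(io.StringIO(CSV_TEXT))
print(emg.describe())
```

With a real file, pass its path instead of the `StringIO` object, for example `load_signal_csv('emg_data.csv')`.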

#### NumPy
**Best Use Case:** Handling large datasets and performing complex mathematical operations.  
**Data Type:** EMG, EOG, EEG  
[Documentation](https://numpy.org/doc/)
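
A small sketch of the kind of vectorized math NumPy is good for: removing a DC offset and computing the RMS of fixed-size windows. The 1000 Hz sample rate, the synthetic sine signal, and the helper `windowed_rms` are assumptions made for this example.

```python
import numpy as np

fs = 1000  # assumed sample rate in Hz
t = np.arange(fs) / fs  # one second of timestamps

# Synthetic signal: a 10 Hz sine riding on a 0.5 DC offset.
signal = 0.5 + np.sin(2 * np.pi * 10 * t)

# Remove the DC offset by subtracting the mean.
centered = signal - signal.mean()


def windowed_rms(x, window):
    """RMS of consecutive non-overlapping windows.

    Trailing samples that do not fill a whole window are dropped.
    """
    n_windows = len(x) // window
    trimmed = x[: n_windows * window]
    return np.sqrt(np.mean(trimmed.reshape(n_windows, window) ** 2, axis=1))


rms = windowed_rms(centered, window=100)
print(rms)
```

Each 100-sample window here covers exactly one period of the sine, so every RMS value comes out close to 1/√2.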

#### MNE
**Best Use Case:** Processing EEG, MEG, and other electrophysiological data.  
**Data Type:** EEG  
[Documentation](https://mne.tools/stable/index.html)

### 2. Visualize
The following libraries turn *pandas DataFrames* or *NumPy arrays* into graphs and charts.

#### Matplotlib
**Best Use Case:** Creating static, animated, and interactive visualizations.  
**Data Type:** EMG, EOG, EEG  
[Documentation](https://matplotlib.org/stable/contents.html)
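
One plot that tends to be useful at every step is the raw signal overlaid with its processed version. The sketch below is one way to wrap that in a reusable function; the helper name `plot_raw_vs_processed`, the synthetic data, and the moving average used as a stand-in for "processing" are all assumptions.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; drop this line inside a notebook
import matplotlib.pyplot as plt
import numpy as np


def plot_raw_vs_processed(t, raw, processed, title):
    """Overlay a raw and a processed signal on one set of axes."""
    fig, ax = plt.subplots(figsize=(10, 4))
    ax.plot(t, raw, label="Raw", alpha=0.5)
    ax.plot(t, processed, label="Processed")
    ax.set_title(title)
    ax.set_xlabel("Time (s)")
    ax.set_ylabel("Amplitude")
    ax.legend()
    return fig, ax


# Synthetic data: a 5 Hz sine plus random noise.
t = np.arange(1000) / 1000
rng = np.random.default_rng(0)
raw = np.sin(2 * np.pi * 5 * t) + 0.3 * rng.standard_normal(t.size)

# A 25-sample moving average stands in for a real filter here.
processed = np.convolve(raw, np.ones(25) / 25, mode="same")

fig, ax = plot_raw_vs_processed(t, raw, processed, "Raw vs. smoothed signal")
```

In a notebook, the figure renders on its own at the end of the cell; in a script, call `plt.show()` or `fig.savefig(...)`.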

#### Seaborn
**Best Use Case:** Drawing attractive and informative statistical graphics.  
**Data Type:** EMG, EOG, EEG  
[Documentation](https://seaborn.pydata.org/)

#### Plotly
**Best Use Case:** Creating interactive plots and dashboards.  
**Data Type:** EMG, EOG, EEG  
[Documentation](https://plotly.com/python/)

### 3. Filters
The following libraries specialize in filtering and preprocessing time-series data.

#### SciPy
**Best Use Case:** Applying various filtering techniques for signal processing tasks.  
**Data Type:** EMG, EOG, EEG  
[Documentation](https://docs.scipy.org/doc/scipy/)
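
Besides the Butterworth filters used in the examples later in this guide, `scipy.signal` also provides a notch filter, which is a common way to remove power-line hum. The sketch below assumes 60 Hz hum, a 1000 Hz sample rate, and a synthetic test signal; the helpers `notch_filter` and `amplitude_at` are made up for this example.

```python
import numpy as np
from scipy.signal import filtfilt, iirnotch

fs = 1000  # assumed sample rate in Hz
t = np.arange(2 * fs) / fs

# Synthetic signal: a 10 Hz component we want plus 60 Hz hum we do not.
noisy = np.sin(2 * np.pi * 10 * t) + 0.5 * np.sin(2 * np.pi * 60 * t)


def notch_filter(data, notch_freq, fs, quality=30.0):
    """Remove a narrow band around notch_freq with zero phase shift."""
    b, a = iirnotch(notch_freq, quality, fs=fs)
    return filtfilt(b, a, data)


def amplitude_at(x, freq, fs):
    """Approximate amplitude of the sinusoid at freq, read from the FFT."""
    spectrum = np.abs(np.fft.rfft(x)) / (len(x) / 2)
    freqs = np.fft.rfftfreq(len(x), d=1 / fs)
    return spectrum[np.argmin(np.abs(freqs - freq))]


filtered = notch_filter(noisy, 60.0, fs)

hum_before = amplitude_at(noisy, 60, fs)
hum_after = amplitude_at(filtered, 60, fs)
print(f"60 Hz amplitude before: {hum_before:.3f}, after: {hum_after:.3f}")
```

Use 50 Hz instead of 60 Hz where the mains frequency is 50 Hz. A higher `quality` value makes the notch narrower.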

#### PyWavelets
**Best Use Case:** Noise reduction, signal compression, and feature extraction using wavelet transforms.  
**Data Type:** EMG, EOG, EEG  
[Documentation](https://pywavelets.readthedocs.io/en/latest/)
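
A rough sketch of wavelet denoising with PyWavelets: decompose the signal, soft-threshold the detail coefficients, and reconstruct. The `db4` wavelet, the decomposition level, and the threshold rule (the so-called universal threshold with a median-based noise estimate) are choices made for this illustration, not settings recommended by this guide.

```python
import numpy as np
import pywt


def wavelet_denoise(data, wavelet="db4", level=4):
    """Soft-threshold the detail coefficients and rebuild the signal."""
    coeffs = pywt.wavedec(data, wavelet, level=level)
    # Estimate the noise level from the finest detail coefficients.
    sigma = np.median(np.abs(coeffs[-1])) / 0.6745
    threshold = sigma * np.sqrt(2 * np.log(len(data)))
    # Keep the approximation coefficients, shrink all detail coefficients.
    shrunk = [coeffs[0]] + [
        pywt.threshold(c, threshold, mode="soft") for c in coeffs[1:]
    ]
    # waverec can return one extra sample, so trim to the input length.
    return pywt.waverec(shrunk, wavelet)[: len(data)]


# Synthetic data: a 5 Hz sine plus random noise.
rng = np.random.default_rng(0)
t = np.arange(1024) / 1000
clean = np.sin(2 * np.pi * 5 * t)
noisy = clean + 0.3 * rng.standard_normal(t.size)

denoised = wavelet_denoise(noisy)

mse_noisy = np.mean((noisy - clean) ** 2)
mse_denoised = np.mean((denoised - clean) ** 2)
print(f"MSE before: {mse_noisy:.4f}, after: {mse_denoised:.4f}")
```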

#### NeuroDSP
**Best Use Case:** Digital signal processing of neural time series, including EEG, ECoG, and LFP data.  
**Data Type:** EEG  
[Documentation](https://neurodsp-tools.github.io/neurodsp/)

### 4. Normalizers
The following libraries provide tools and data science algorithms for normalizing preprocessed data and extracting trends and features from it.

#### scikit-learn
**Best Use Case:** Normalizing data for machine learning models.  
**Data Type:** EMG, EOG, EEG  
[Documentation](https://scikit-learn.org/stable/documentation.html)
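
The three examples later in this guide each use a different scaler (`StandardScaler`, `MinMaxScaler`, `RobustScaler`). As a small illustration of how they differ, the sketch below runs all three on the same made-up values, where the final value plays the role of an artifact spike.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, RobustScaler, StandardScaler

# One feature column; the last value is a deliberate outlier.
signal = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 100.0]).reshape(-1, 1)

scaled = {
    # Zero mean, unit variance.
    "standard": StandardScaler().fit_transform(signal).ravel(),
    # Squeezed into the [0, 1] range.
    "minmax": MinMaxScaler().fit_transform(signal).ravel(),
    # Centered on the median, scaled by the interquartile range.
    "robust": RobustScaler().fit_transform(signal).ravel(),
}

for name, values in scaled.items():
    print(f"{name:>8}: {np.round(values, 2)}")
```

With min-max scaling, the spike pins the other values near 0. Robust scaling keeps the ordinary values spread out, which is one reason to consider it for recordings with artifacts.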

#### tsfresh
**Best Use Case:** Extracting and normalizing features from time-series data.  
**Data Type:** EMG, EOG, EEG  
[Documentation](https://tsfresh.readthedocs.io/en/latest/)
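
To give a feel for what feature extraction produces, here is a hand-rolled sketch that computes a few per-window features with NumPy and pandas. It does not use tsfresh; tsfresh automates this idea across a much larger set of features. The helper `window_features`, the window size, and the synthetic signal are assumptions.

```python
import numpy as np
import pandas as pd


def window_features(signal, window):
    """Compute simple features for consecutive non-overlapping windows."""
    n_windows = len(signal) // window
    rows = []
    for i in range(n_windows):
        chunk = signal[i * window : (i + 1) * window]
        # Count sign changes between neighbouring samples.
        zero_crossings = int(
            np.sum(np.signbit(chunk[:-1]) != np.signbit(chunk[1:]))
        )
        rows.append(
            {
                "window": i,
                "mean": chunk.mean(),
                "std": chunk.std(),
                "rms": np.sqrt(np.mean(chunk**2)),
                "zero_crossings": zero_crossings,
            }
        )
    return pd.DataFrame(rows)


fs = 1000  # assumed sample rate in Hz
t = np.arange(fs) / fs
# Synthetic 10 Hz sine; the small phase offset keeps samples off exact zero.
signal = np.sin(2 * np.pi * 10 * t + 0.1)

features = window_features(signal, window=250)
print(features)
```

The result has one row per window and one column per feature, which is the tabular shape most ML models expect.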

#### PyCaret
**Best Use Case:** Automating machine learning workflows, including data preprocessing and normalization.  
**Data Type:** EMG, EOG, EEG  
[Documentation](https://pycaret.org/)

### Installation
To install all necessary packages, run the following command:

In [None]:
# %pip installs into the environment of the running notebook kernel
%pip install pandas numpy mne matplotlib seaborn plotly scipy PyWavelets neurodsp scikit-learn tsfresh pycaret

## Examples
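
The examples below read files such as `emg_data.csv` that are not included in this guide. If you do not have a recording yet, a sketch like the following can generate a stand-in with the same column names (`Time`, `EMG Signal`). The signal is synthetic (modulated noise plus 60 Hz hum) and is not meant to be physiologically realistic; `make_synthetic_emg` and its parameters are made up for this example.

```python
import numpy as np
import pandas as pd


def make_synthetic_emg(n_seconds=2.0, fs=1000, seed=0):
    """Build a DataFrame shaped like the file Example 1 expects."""
    rng = np.random.default_rng(seed)
    n = int(n_seconds * fs)
    t = np.arange(n) / fs
    # Slow envelope so the noise comes in bursts, loosely like contractions.
    envelope = 0.5 * (1 + np.sin(2 * np.pi * 0.5 * t))
    # Burst noise plus a small 60 Hz component to give the filter work to do.
    signal = envelope * rng.standard_normal(n) + 0.1 * np.sin(2 * np.pi * 60 * t)
    return pd.DataFrame({"Time": t, "EMG Signal": signal})


emg_data = make_synthetic_emg()
print(emg_data.head())

# Uncomment to write the file that Example 1 reads:
# emg_data.to_csv("emg_data.csv", index=False)
```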

### Example 1: EMG Data Preprocessing

In [None]:
# Import necessary libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from scipy.signal import butter, filtfilt
from sklearn.preprocessing import StandardScaler

# Load and Parse Data
emg_data = pd.read_csv('emg_data.csv')
print(emg_data.head())

# Visualize
plt.figure(figsize=(10, 4))
plt.plot(emg_data['Time'], emg_data['EMG Signal'])
plt.title('Raw EMG Signal')
plt.xlabel('Time')
plt.ylabel('Amplitude')
plt.show()

# Filters - Apply a Butterworth filter
def butter_lowpass_filter(data, cutoff, fs, order=5):
    nyquist = 0.5 * fs
    normal_cutoff = cutoff / nyquist
    b, a = butter(order, normal_cutoff, btype='low', analog=False)
    y = filtfilt(b, a, data)
    return y

fs = 1000  # Sample rate in Hz (assumed; set this to your recording's actual rate)
cutoff = 50  # Desired cutoff frequency of the filter, Hz
emg_data['Filtered Signal'] = butter_lowpass_filter(emg_data['EMG Signal'], cutoff, fs)

plt.figure(figsize=(10, 4))
plt.plot(emg_data['Time'], emg_data['Filtered Signal'])
plt.title('Filtered EMG Signal')
plt.xlabel('Time')
plt.ylabel('Amplitude')
plt.show()

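# Normalizers - standardize to zero mean and unit variance
scaler = StandardScaler()
# fit_transform returns an (n_samples, 1) array; ravel() flattens it into a single column
emg_data['Normalized Signal'] = scaler.fit_transform(emg_data[['Filtered Signal']]).ravel()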

plt.figure(figsize=(10, 4))
plt.plot(emg_data['Time'], emg_data['Normalized Signal'])
plt.title('Normalized EMG Signal')
plt.xlabel('Time')
plt.ylabel('Amplitude')
plt.show()

### Example 2: EOG Data Preprocessing

In [None]:
# Import necessary libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from scipy.signal import butter, sosfiltfilt
from sklearn.preprocessing import MinMaxScaler

# Load and Parse Data
eog_data = pd.read_csv('eog_data.csv')
print(eog_data.head())

# Visualize
plt.figure(figsize=(10, 4))
plt.plot(eog_data['Time'], eog_data['EOG Signal'])
plt.title('Raw EOG Signal')
plt.xlabel('Time')
plt.ylabel('Amplitude')
plt.show()

# Filters - Apply a Butterworth band-pass filter
def butter_bandpass_filter(data, lowcut, highcut, fs, order=5):
    nyquist = 0.5 * fs
    low = lowcut / nyquist
    high = highcut / nyquist
    # Second-order sections stay numerically stable when the low cutoff is tiny
    # relative to the sample rate; the (b, a) form can become unstable here.
    sos = butter(order, [low, high], btype='band', output='sos')
    # Forward-backward filtering gives zero phase shift
    y = sosfiltfilt(sos, data)
    return y

fs = 1000  # Sample rate in Hz (assumed; set this to your recording's actual rate)
lowcut = 0.5  # Lower cutoff frequency of the filter, Hz
highcut = 30  # Upper cutoff frequency of the filter, Hz
eog_data['Filtered Signal'] = butter_bandpass_filter(eog_data['EOG Signal'], lowcut, highcut, fs)

plt.figure(figsize=(10, 4))
plt.plot(eog_data['Time'], eog_data['Filtered Signal'])
plt.title('Filtered EOG Signal')
plt.xlabel('Time')
plt.ylabel('Amplitude')
plt.show()

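# Normalizers - rescale to the [0, 1] range
scaler = MinMaxScaler()
# fit_transform returns an (n_samples, 1) array; ravel() flattens it into a single column
eog_data['Normalized Signal'] = scaler.fit_transform(eog_data[['Filtered Signal']]).ravel()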

plt.figure(figsize=(10, 4))
plt.plot(eog_data['Time'], eog_data['Normalized Signal'])
plt.title('Normalized EOG Signal')
plt.xlabel('Time')
plt.ylabel('Amplitude')
plt.show()
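
Plots in the time domain do not always make it obvious what a filter removed. One way to check is to compare band power before and after filtering with `scipy.signal.welch`. The sketch below assumes a 1000 Hz sample rate and uses a synthetic signal rather than `eog_data.csv`; the helper `band_power` is made up for this example, and the filter is built in second-order-section form.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, welch

fs = 1000  # assumed sample rate in Hz
t = np.arange(10 * fs) / fs

# Synthetic stand-in: a 5 Hz in-band part plus a 100 Hz out-of-band part.
raw = np.sin(2 * np.pi * 5 * t) + np.sin(2 * np.pi * 100 * t)

# 0.5-30 Hz Butterworth band-pass, applied forward and backward.
sos = butter(5, [0.5, 30], btype="band", fs=fs, output="sos")
filtered = sosfiltfilt(sos, raw)


def band_power(x, fs, f_lo, f_hi):
    """Approximate signal power between f_lo and f_hi from the Welch PSD."""
    freqs, psd = welch(x, fs=fs, nperseg=2048)
    mask = (freqs >= f_lo) & (freqs <= f_hi)
    return psd[mask].sum() * (freqs[1] - freqs[0])


in_band_before = band_power(raw, fs, 3, 7)
in_band_after = band_power(filtered, fs, 3, 7)
out_band_before = band_power(raw, fs, 90, 110)
out_band_after = band_power(filtered, fs, 90, 110)

print(f"3-7 Hz power:    {in_band_before:.4f} -> {in_band_after:.4f}")
print(f"90-110 Hz power: {out_band_before:.4f} -> {out_band_after:.2e}")
```

If the in-band power drops noticeably or the out-of-band power barely moves, the cutoffs or the sample rate are probably set wrong.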


### Example 3: EEG Data Preprocessing

In [None]:
# Import necessary libraries
import mne
from sklearn.preprocessing import RobustScaler

# Load and Parse Data
eeg_data = mne.io.read_raw_fif('eeg_data.fif', preload=True)
eeg_data.pick_types(eeg=True)  # Keep only the EEG channels

# Visualize
eeg_data.plot(duration=5, n_channels=10, title='Raw EEG Signal')

# Filters - Apply a high-pass filter with MNE's built-in filtering
# MNE reads the sample rate from eeg_data.info['sfreq'], so no hand-written helper is needed.
# The default design is FIR; pass method='iir' to get a Butterworth filter instead.
cutoff = 1  # Desired cutoff frequency of the filter, Hz
eeg_data_filtered = eeg_data.copy().filter(l_freq=cutoff, h_freq=None)

eeg_data_filtered.plot(duration=5, n_channels=10, title='Filtered EEG Signal')

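# Normalizers - scale each channel by its median and interquartile range
scaler = RobustScaler()
data = eeg_data_filtered.get_data()  # shape: (n_channels, n_samples)
# scikit-learn scales columns, so transpose to (n_samples, n_channels) and back
data_normalized = scaler.fit_transform(data.T).T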

# Create a new RawArray with the normalized data
eeg_data_normalized = mne.io.RawArray(data_normalized, eeg_data_filtered.info)

eeg_data_normalized.plot(duration=5, n_channels=10, title='Normalized EEG Signal')
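
The double transpose in the normalization step is easy to get wrong, so here is a small standalone check on made-up data (no MNE or `.fif` file needed). It assumes the same `(n_channels, n_samples)` layout that `get_data()` returns; the channel count, offsets, and scales are arbitrary.

```python
import numpy as np
from sklearn.preprocessing import RobustScaler

rng = np.random.default_rng(0)
n_channels, n_samples = 3, 500

# Three fake channels with very different scales and offsets.
scales = np.array([[1.0], [10.0], [100.0]])
offsets = np.array([[0.0], [5.0], [-50.0]])
data = rng.standard_normal((n_channels, n_samples)) * scales + offsets

# scikit-learn treats each column as a feature, so put channels in columns,
# scale, then transpose back to (n_channels, n_samples).
data_normalized = RobustScaler().fit_transform(data.T).T

channel_medians = np.median(data_normalized, axis=1)
channel_iqrs = np.percentile(data_normalized, 75, axis=1) - np.percentile(
    data_normalized, 25, axis=1
)
print("Medians:", np.round(channel_medians, 6))
print("IQRs:   ", np.round(channel_iqrs, 6))
```

After scaling, every channel should have a median near 0 and an interquartile range near 1. Without the transposes, the scaler would treat each time sample as a feature and scale across channels instead.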

## Going Forward
**Play with different datasets**, aiming to create visuals that show explainable trends in the data.

**Get EXP:** explore different combinations of tools and algorithms for preprocessing, filtering, and normalizing

**Tips for farming EXP:**
- **be lazy:** look things up *when you can't figure something out*. Someone has already done it, and the longer you wait, the more stupid you'll feel when you find it
- **be curious:** ask yourself *what kind of info can be extracted* through data analysis. Think about how each dataset was recorded during the trial
- **try to break things:** tinker with familiar algorithms or ask GPT how to extract different properties from raw data
- **reuse code:** start building a list of tools, and collect the snippets and URLs from every guide you follow

```text
EX: 
COLLECTING "fire ball jutsu"
LOADING "good job delsys :/"
PREPROCESS "I used Z filter to remove the noise"
VISUAL "you can see at X time on trial Y, the patient got fatigued"
NORMALIZE "The patient used more force on 3/10 trials"

```