# Welcome to the Modular Multimodal Data Fusion ML Pipeline for Stress Detection on the WESAD Dataset

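## Table of contents

- Getting Started
- Data Installation
- Data Preprocessing
- Signal Preprocessing Steps
  - Chest Signals
  - Wrist Signals
- Traditional Machine Learning: Manual Feature Extraction
- Traditional Machine Learning: Automatic Feature Extraction
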
## Getting Started
First, install the required packages. If you are using a virtual environment (for example, Conda or venv), activate it before running the cell below.

In [None]:
%pip install -r requirements.txt

## Data Installation
If you are on Linux, run the cell below to download and extract the WESAD dataset automatically. Otherwise, download it manually [here](https://uni-siegen.sciebo.de/s/HGdUkoNlW1Ub0Gx/download) and unzip it so that the `WESAD` folder sits inside the `src/wesad` directory, i.e. `src/wesad/WESAD/`.

In [None]:
%cd src/wesad
!bash download_database.sh
%cd ../..

## Data Preprocessing

This step extracts the biosensor data from the WESAD directory and merges it into several `.pkl` files.

It takes around 10 minutes, depending on the machine.

In [None]:
from src.wesad.data_preprocessing import WESADDataPreprocessor

preprocessor = WESADDataPreprocessor('src/wesad/WESAD/')
preprocessor.preprocess()
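For reference, the raw per-subject files that this step reads can also be inspected directly. The sketch below is not part of `WESADDataPreprocessor`; it assumes the standard WESAD layout (`WESAD/S<id>/S<id>.pkl`), where each file is a dictionary with `signal`, `label` and `subject` keys. The files appear to have been written with Python 2, which is why they are usually loaded with `encoding='latin1'`. The helper name `load_wesad_subject` is hypothetical.

```python
import pickle
from pathlib import Path

# Label codes from the WESAD readme: 0 = transient/undefined, 1 = baseline,
# 2 = stress, 3 = amusement, 4 = meditation, 5-7 = to be ignored.
STUDY_LABELS = {1: "baseline", 2: "stress", 3: "amusement", 4: "meditation"}


def load_wesad_subject(path):
    """Return (chest, wrist, labels) from one raw WESAD subject pickle (hypothetical helper)."""
    with Path(path).open("rb") as f:
        # latin1 decodes byte strings stored by the original Python 2 pickler
        data = pickle.load(f, encoding="latin1")
    chest = data["signal"]["chest"]  # e.g. ACC, ECG, EMG, EDA, Temp, Resp at 700 Hz
    wrist = data["signal"]["wrist"]  # e.g. ACC (32 Hz), BVP (64 Hz), EDA/TEMP (4 Hz)
    labels = data["label"]           # one label per 700 Hz chest sample
    return chest, wrist, labels
```

Usage, assuming the dataset was extracted as described above: `chest, wrist, labels = load_wesad_subject('src/wesad/WESAD/S2/S2.pkl')`.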

## Signal Preprocessing Steps
Each signal is preprocessed with its own steps:

### Chest Signals

#### ECG
- **Smoothing**: Savitzky–Golay filter with window size 11 and order 3.
- **Filtering**: Butterworth band-pass filter of order 3 with cutoff frequencies 0.7 Hz and 3.7 Hz.

#### EMG
- **Smoothing**: Savitzky–Golay filter with window size 11 and order 3.
- **Filtering**: Butterworth low-pass filter of order 3 with cutoff frequency 0.5 Hz.

#### EDA
- **Smoothing**: Savitzky–Golay filter with window size 11 and order 3.
- **Filtering**: Butterworth low-pass filter of order 2 with cutoff frequency 5 Hz.

#### TEMP
- **Smoothing**: Savitzky–Golay filter with window size 11 and order 3.

#### RESP
- **Smoothing**: Savitzky–Golay filter with window size 11 and order 3.
- **Filtering**: Butterworth band-pass filter of order 3 with cutoff frequencies 0.1 Hz and 0.35 Hz.

#### ACC
- **Smoothing**: Savitzky–Golay filter with window size 31 and order 5.
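As an illustration of the chest ECG steps listed above, here is a minimal SciPy sketch. It assumes the RespiBAN chest sampling rate of 700 Hz and zero-phase (forward-backward) filtering via `sosfiltfilt`; whether `SignalPreprocessor` filters forward-backward or in a single pass is not stated in this notebook, so treat that as an assumption. `clean_ecg` is a hypothetical name.

```python
import numpy as np
from scipy.signal import butter, savgol_filter, sosfiltfilt

CHEST_FS = 700  # assumed RespiBAN chest sampling rate in Hz


def clean_ecg(ecg, fs=CHEST_FS):
    """Savitzky-Golay smoothing (window 11, order 3), then a 3rd-order
    Butterworth band-pass between 0.7 and 3.7 Hz (hypothetical helper)."""
    smoothed = savgol_filter(np.asarray(ecg, dtype=float), window_length=11, polyorder=3)
    # Second-order sections are numerically safer than (b, a) at low normalized cutoffs
    sos = butter(3, [0.7, 3.7], btype="bandpass", fs=fs, output="sos")
    # Forward-backward filtering gives zero phase shift (assumed, see above)
    return sosfiltfilt(sos, smoothed)
```

The same pattern covers the other chest signals by swapping the cutoff frequencies, the order, or `btype="lowpass"`.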

### Wrist Signals

#### BVP
- **Filtering**: Butterworth band-pass filter of order 3 with cutoff frequencies 0.7 Hz and 3.7 Hz.

#### TEMP
- **Smoothing**: Savitzky–Golay filter with window size 11 and order 3.

#### ACC
- **Filtering**: Finite impulse response (FIR) filter of length 64 with a cutoff frequency of 0.4 Hz.
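The wrist ACC step can be sketched similarly with `scipy.signal.firwin`. The list above gives only the length and the cutoff, so this sketch assumes a low-pass response, the default Hamming window and the Empatica E4 accelerometer rate of 32 Hz; `clean_wrist_acc` is a hypothetical helper.

```python
import numpy as np
from scipy.signal import filtfilt, firwin

WRIST_ACC_FS = 32  # assumed Empatica E4 accelerometer sampling rate in Hz


def clean_wrist_acc(acc, fs=WRIST_ACC_FS, numtaps=64, cutoff=0.4):
    """Low-pass FIR filter applied along time (axis 0) of an (N, 3) ACC array."""
    # Assumed low-pass design; firwin uses a Hamming window by default
    taps = firwin(numtaps, cutoff, fs=fs)
    # An FIR filter has denominator [1.0]; filtfilt needs N > 3 * numtaps samples
    return filtfilt(taps, [1.0], np.asarray(acc, dtype=float), axis=0)
```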


In [None]:
# Paths to the chest and wrist configuration files
CHEST_CONFIG = 'src/wesad/wesad_chest_configuration.json'
WRIST_CONFIG = 'src/wesad/wesad_wrist_configuration.json'

In [None]:
from src.ml_pipeline.preprocessing import SignalPreprocessor

# preprocess the chest data
signal_preprocessor = SignalPreprocessor('src/wesad/WESAD/raw/merged_chest.pkl', 'src/wesad/WESAD/cleaned/chest_preprocessed.pkl', CHEST_CONFIG)
signal_preprocessor.preprocess_signals()

# preprocess the wrist data
signal_preprocessor = SignalPreprocessor('src/wesad/WESAD/raw/merged_wrist.pkl', 'src/wesad/WESAD/cleaned/wrist_preprocessed.pkl', WRIST_CONFIG, wrist=True)
signal_preprocessor.preprocess_signals()
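The configuration files determine which steps run on which signal. Their actual schema is not shown in this notebook, so the sketch below uses a made-up JSON layout purely to illustrate how a config-driven preprocessor could dispatch per-signal steps. `EXAMPLE_CONFIG`, `apply_step` and `preprocess` are hypothetical and do not mirror `SignalPreprocessor`.

```python
import json

import numpy as np
from scipy.signal import butter, savgol_filter, sosfiltfilt

# Hypothetical schema for illustration only; the real
# wesad_*_configuration.json files may be structured differently.
EXAMPLE_CONFIG = json.loads("""
{
  "sampling_rate": 700,
  "signals": {
    "temp": [{"step": "savgol", "window": 11, "order": 3}],
    "resp": [
      {"step": "savgol", "window": 11, "order": 3},
      {"step": "bandpass", "order": 3, "low": 0.1, "high": 0.35}
    ]
  }
}
""")


def apply_step(x, step, fs):
    """Apply one configured step to a 1-D signal sampled at fs Hz."""
    if step["step"] == "savgol":
        return savgol_filter(x, step["window"], step["order"])
    if step["step"] == "bandpass":
        sos = butter(step["order"], [step["low"], step["high"]],
                     btype="bandpass", fs=fs, output="sos")
        return sosfiltfilt(sos, x)
    if step["step"] == "lowpass":
        sos = butter(step["order"], step["cutoff"], btype="lowpass", fs=fs, output="sos")
        return sosfiltfilt(sos, x)
    raise ValueError(f"unknown step: {step['step']}")


def preprocess(signals, config):
    """Run each signal through its configured steps; unlisted signals pass through unchanged."""
    fs = config["sampling_rate"]
    out = {}
    for name, x in signals.items():
        for step in config["signals"].get(name, []):
            x = apply_step(np.asarray(x, dtype=float), step, fs)
        out[name] = x
    return out
```

Keeping the step parameters in data rather than code lets the chest and wrist pipelines share one implementation, which is presumably why the notebook passes separate configuration files.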

## Traditional Machine Learning: Manual Feature Extraction

During feature extraction, the data is segmented (augmented) with a 60-second window that slides forward in 5-second steps.
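The window arithmetic can be made concrete with NumPy. This sketch uses a hypothetical `segment` helper (not the `DataAugmenter` API) that converts the window and slide from seconds to samples, so the same 60 s / 5 s setting works for signals with different sampling rates. A signal of `N` samples yields `floor((N - window) / step) + 1` windows.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def segment(signal, fs, window_s=60, slide_s=5):
    """Return overlapping windows of `signal` (time along axis 0)."""
    window = int(window_s * fs)  # window length in samples
    step = int(slide_s * fs)     # slide length in samples
    # Build every window with a 1-sample stride (a view, no copy),
    # then keep every `step`-th one. For (N, C) input the result is
    # (n_windows, C, window); for 1-D input it is (n_windows, window).
    return sliding_window_view(signal, window, axis=0)[::step]
```

For example, 10 minutes of a 4 Hz signal (2400 samples) gives windows of 240 samples with a step of 20, i.e. `(2400 - 240) // 20 + 1 = 109` windows.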

The manual feature extraction derives the following features:

In [1]:
from src.ml_pipeline.data_loader import DataAugmenter
from src.ml_pipeline.feature_extraction import ManualFE

WINDOW_LENGTH = 60
SLIDING_LENGTH = 5

CHEST_CONFIG = 'src/wesad/wesad_chest_configuration.json'
WRIST_CONFIG = 'src/wesad/wesad_wrist_configuration.json'

wrist_augmenter = DataAugmenter('src/wesad/WESAD/cleaned/wrist_preprocessed.pkl', WRIST_CONFIG)
batches = wrist_augmenter.segment_data(WINDOW_LENGTH, SLIDING_LENGTH)

manual_fe = ManualFE(batches, 'src/wesad/WESAD/manual_fe/wrist_manual_fe.hdf5', WRIST_CONFIG)
manual_fe.extract_features()

Segmenting data...
Segmentation complete.
Extracting features from batch 1/17202 | ETA: 0.08 seconds
Extracting features from batch 101/17202 | ETA: 15506.26 seconds
Extracting features from batch 201/17202 | ETA: 15453.80 seconds
Extracting features from batch 301/17202 | ETA: 15234.80 seconds



In [None]:
from src.ml_pipeline.data_loader import DataAugmenter
from src.ml_pipeline.feature_extraction import ManualFE

chest_augmenter = DataAugmenter('src/wesad/WESAD/cleaned/chest_preprocessed.pkl', CHEST_CONFIG)
# Use the same window and sliding lengths as for the wrist data
batches = chest_augmenter.segment_data(WINDOW_LENGTH, SLIDING_LENGTH)

manual_fe = ManualFE(batches, 'src/wesad/WESAD/manual_fe/chest_manual_fe.hdf5', CHEST_CONFIG)
manual_fe.extract_features()
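The concrete feature list computed by `ManualFE` is not reproduced in this notebook. Purely as a hypothetical illustration of per-window feature extraction, the sketch below computes a few generic statistics for each window (one window per row); it is not the pipeline's feature set, and `basic_window_features` is a made-up name.

```python
import numpy as np


def basic_window_features(windows):
    """Generic statistics per window; `windows` has shape (n_windows, window_len)."""
    windows = np.asarray(windows, dtype=float)
    return {
        "mean": windows.mean(axis=1),
        "std": windows.std(axis=1),          # population standard deviation
        "min": windows.min(axis=1),
        "max": windows.max(axis=1),
        "range": np.ptp(windows, axis=1),    # max - min
    }
```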

## Traditional Machine Learning: Automatic Feature Extraction

The automatic feature extraction uses autoencoders and takes the features from their latent space.
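To illustrate the idea of latent-space features, here is a deliberately small stand-in: a linear autoencoder trained with full-batch gradient descent in NumPy. The pipeline's own autoencoders are not shown in this notebook and are likely nonlinear and framework-based, so the architecture, learning rate and function names here are all assumptions. After training, `X @ W_enc` gives the latent features for each row of `X`.

```python
import numpy as np


def train_linear_autoencoder(X, latent_dim, lr=0.01, epochs=500, seed=0):
    """Fit encoder/decoder weights by gradient descent on the mean (over rows)
    squared reconstruction error. Returns (W_enc, W_dec)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W_enc = rng.normal(scale=0.1, size=(d, latent_dim))
    W_dec = rng.normal(scale=0.1, size=(latent_dim, d))
    for _ in range(epochs):
        Z = X @ W_enc                  # latent codes, shape (n, latent_dim)
        err = Z @ W_dec - X            # reconstruction error, shape (n, d)
        grad_out = 2.0 * err / n       # d(loss)/d(reconstruction)
        grad_dec = Z.T @ grad_out
        grad_enc = X.T @ (grad_out @ W_dec.T)
        W_dec -= lr * grad_dec
        W_enc -= lr * grad_enc
    return W_enc, W_dec


def reconstruction_mse(X, W_enc, W_dec):
    """Mean over rows of the summed squared reconstruction error."""
    return float(np.mean(np.sum((X @ W_enc @ W_dec - X) ** 2, axis=1)))
```

A linear autoencoder can at best match PCA, which is why practical feature extractors add nonlinear layers; the training loop and the "encode, then keep the bottleneck" idea stay the same.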

Next, the extracted features (the `.hdf5` files written above) are wrapped in data loaders for leave-one-subject-out cross-validation (LOSOCV): in each fold, one subject is held out for testing and the remaining subjects form the training set. The augmented samples are used in the training set but excluded from the test set.
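The fold logic can be sketched independently of `LOSOCVDataLoader`. This minimal version assumes each sample carries a subject ID and a boolean flag marking it as augmented (both hypothetical inputs here): each fold holds out one subject, trains on every sample from the other subjects, and tests only on the held-out subject's non-augmented samples.

```python
import numpy as np


def losocv_splits(subject_ids, is_augmented):
    """Yield (held_out_subject, train_idx, test_idx), one fold per subject."""
    subject_ids = np.asarray(subject_ids)
    is_augmented = np.asarray(is_augmented, dtype=bool)
    for subject in np.unique(subject_ids):
        held_out = subject_ids == subject
        # Training fold: all samples (augmented or not) from the other subjects
        train_idx = np.flatnonzero(~held_out)
        # Test fold: only the held-out subject's non-augmented samples
        test_idx = np.flatnonzero(held_out & ~is_augmented)
        yield subject, train_idx, test_idx
```

Splitting by subject rather than by sample keeps windows from the same person out of both sides of a fold, so the test score reflects generalization to unseen subjects.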

In [1]:
from src.ml_pipeline.data_loader import LOSOCVDataLoader

# Set Parameters
params = {
    'batch_size': 32,
    'shuffle': True,
    # 'num_workers': 4
}

CHEST_CONFIG = 'src/wesad/wesad_chest_configuration.json'
WRIST_CONFIG = 'src/wesad/wesad_wrist_configuration.json'

losocv_loader = LOSOCVDataLoader('src/wesad/WESAD/manual_fe/wrist_manual_fe.hdf5', WRIST_CONFIG, **params)
dataloaders = losocv_loader.get_dataloaders()

for i, (train_loader, val_loader) in enumerate(dataloaders):
    print(f'Fold {i}')
    print(f'Train: {len(train_loader.dataset)}')
    print(f'Val: {len(val_loader.dataset)}')
    print()



In [None]:
import pandas as pd

df = pd.read_pickle('src/wesad/WESAD/raw/merged_chest.pkl')

df

In [None]:
import pandas as pd

df = pd.read_pickle('src/wesad/WESAD/raw/merged_wrist.pkl')


df

In [None]:
df['w_acc_x']

In [None]:
import pandas as pd

df = pd.read_pickle('src/wesad/WESAD/cleaned/chest_preprocessed.pkl')

df

In [None]:
import pandas as pd

# df = pd.read_pickle('src/wesad/WESAD/cleaned/chest_preprocessed.pkl')

df = pd.read_pickle('src/wesad/WESAD/augmented/chest_augmented.pkl')

type(df)

In [None]:
import pandas as pd

df = pd.read_pickle('src/wesad/WESAD/raw/merged_wrist.pkl')

df.head()

In [None]:
import pandas as pd

# Define file paths (relative to the repository root)
pkl_path1 = 'src/wesad/WESAD/raw/subj_merged_acc_w.pkl'
pkl_path2 = 'src/wesad/WESAD/raw/subj_merged_eda_temp_w.pkl'
pkl_path3 = 'src/wesad/WESAD/raw/subj_merged_bvp_w.pkl'

# Load the data from pickle files
df1 = pd.read_pickle(pkl_path1)
df2 = pd.read_pickle(pkl_path2)
df3 = pd.read_pickle(pkl_path3)

# Inner-join the dataframes on their shared index (rows missing from any file are dropped);
# adjust the join type or keys if the files are aligned differently
merged_df = df1.merge(df2, left_index=True, right_index=True, how='inner')
merged_df = merged_df.merge(df3, left_index=True, right_index=True, how='inner')

# Save the merged dataframe to a new pickle file
merged_df.to_pickle('src/wesad/WESAD/raw/merged_wrist.pkl')
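A toy example of what the inner index merge above does: only index labels present in all three frames survive, and here the two chained `merge` calls produce the same result as a single `pd.concat(..., axis=1, join='inner')`. The frames and values are made up. This alignment is only meaningful if the per-modality files already share a common index (for example, after resampling to one rate), which is assumed rather than shown here.

```python
import pandas as pd

# Toy stand-ins for the per-modality wrist files (hypothetical values)
acc = pd.DataFrame({"w_acc_x": [0.1, 0.2, 0.3]}, index=[0, 1, 2])
eda_temp = pd.DataFrame({"w_eda": [1.0, 1.1], "w_temp": [33.0, 33.1]}, index=[1, 2])
bvp = pd.DataFrame({"w_bvp": [5.0, 6.0, 7.0]}, index=[0, 1, 2])

# Chained inner merges on the index: index label 0 is missing from eda_temp, so it is dropped
merged = acc.merge(eda_temp, left_index=True, right_index=True, how="inner")
merged = merged.merge(bvp, left_index=True, right_index=True, how="inner")

# Column-wise concatenation that keeps only the shared index labels
same = pd.concat([acc, eda_temp, bvp], axis=1, join="inner")
```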
