# Welcome to the Modular Multimodal Data Fusion ML Pipeline for Stress Detection on the WESAD Database

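## Table of contents

- Getting Started
- Data Installation
- Data Preprocessing
- Signal Preprocessing Steps
- Traditional Machine Learning: Manual Feature Extraction
- Traditional Machine Learning: Automatic Feature Extraction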
## Getting Started
First, install the required packages. If you are using a virtual environment (for example, a Conda environment), activate it first.

```python
%pip install -r requirements.txt
```

## Data Installation
If you are on Linux, run the cell below to download and extract the WESAD dataset automatically. Otherwise, download it manually [here](https://uni-siegen.sciebo.de/s/HGdUkoNlW1Ub0Gx/download) and unzip the `WESAD` archive into `src/wesad/`, so that the data ends up in `src/wesad/WESAD/`.

```python
# `!` runs the command in a subshell, so the notebook's working directory is unchanged
!cd src/wesad && bash download_database.sh
```

## Data Preprocessing

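This step extracts the biosensor data from the WESAD directory and writes it into merged chest and wrist files in `.pkl` format (`src/wesad/WESAD/raw/merged_chest.pkl` and `src/wesad/WESAD/raw/merged_wrist.pkl`, which the signal preprocessing step reads).

This takes around 10 minutes, depending on the machine.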
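To illustrate what "merging" involves, here is a minimal sketch that flattens the chest signals of raw WESAD subject files into one table. It assumes, based on our understanding of the published dataset, that each raw `SX/SX.pkl` is a Python 2 pickle holding a dict with `signal` (containing a `chest` dict with `ACC`, `ECG`, `EMG`, `EDA`, `Temp`, `Resp`), `label`, and `subject` keys. The helper names are hypothetical; this is not the code of `WESADDataPreprocessor`.

```python
import pickle

import numpy as np
import pandas as pd

# Assumed channel names inside data["signal"]["chest"] of a raw WESAD file
CHEST_CHANNELS = ["ACC", "ECG", "EMG", "EDA", "Temp", "Resp"]


def load_subject_chest(path):
    """Load one raw subject pickle into a DataFrame (hypothetical helper)."""
    with open(path, "rb") as f:
        # The raw files were pickled under Python 2, hence encoding="latin1"
        data = pickle.load(f, encoding="latin1")
    chest = data["signal"]["chest"]
    columns = {}
    for name in CHEST_CHANNELS:
        arr = np.asarray(chest[name])
        if arr.ndim == 1 or arr.shape[1] == 1:
            columns[name] = arr.reshape(-1)
        else:
            # Multi-axis channel (3-axis accelerometer): one column per axis
            for axis, values in zip("xyz", arr.T):
                columns[f"{name}_{axis}"] = values
    df = pd.DataFrame(columns)
    df["label"] = np.asarray(data["label"])  # one label per chest sample
    df["subject"] = data["subject"]
    return df


def merge_subjects(paths):
    """Concatenate several subjects into one table (hypothetical helper)."""
    return pd.concat([load_subject_chest(p) for p in paths], ignore_index=True)
```

Keeping a `subject` column is what later makes leave-one-subject-out splits possible.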

```python
from src.wesad.data_preprocessing import WESADDataPreprocessor

preprocessor = WESADDataPreprocessor('src/wesad/WESAD/')
preprocessor.preprocess()
```

## Signal Preprocessing Steps
Each signal is preprocessed with its own steps:

### Chest Signals

#### ECG
- **Smoothing**: Savitzky–Golay filter with window size 11 and order 3.
- **Filtering**: Butterworth band-pass filter of order 3 with cutoff frequencies 0.7 Hz and 3.7 Hz.

#### EMG
- **Smoothing**: Savitzky–Golay filter with window size 11 and order 3.
- **Filtering**: Butterworth low-pass filter of order 3 with cutoff frequency 0.5 Hz.

#### EDA
- **Smoothing**: Savitzky–Golay filter with window size 11 and order 3.
- **Filtering**: Butterworth low-pass filter of order 2 with cutoff frequency 5 Hz.

#### TEMP
- **Smoothing**: Savitzky–Golay filter with window size 11 and order 3.

#### RESP
- **Smoothing**: Savitzky–Golay filter with window size 11 and order 3.
- **Filtering**: Butterworth band-pass filter of order 3 with cutoff frequencies 0.1 Hz and 0.35 Hz.

#### ACC
- **Smoothing**: Savitzky–Golay filter with window size 31 and order 5.
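The chest steps above can be sketched with SciPy as follows. This is a minimal sketch under several assumptions, not the pipeline's `SignalPreprocessor`: the chest signals are taken to be sampled at 700 Hz (the chest device rate in WESAD), smoothing is applied before filtering, and filtering is zero-phase (`sosfiltfilt`), whereas the pipeline may filter causally. The function names are hypothetical.

```python
import numpy as np
from scipy.signal import butter, savgol_filter, sosfiltfilt

FS_CHEST = 700  # Hz, assumed chest sampling rate


def smooth(x, window=11, order=3):
    # Savitzky-Golay: fit a polynomial of `order` over a sliding `window` of samples
    return savgol_filter(x, window_length=window, polyorder=order)


def butter_filter(x, cutoff, btype, order, fs=FS_CHEST):
    # Second-order sections stay numerically stable at very low cutoffs
    # (e.g. 0.1 Hz at 700 Hz), where the (b, a) form can break down
    sos = butter(order, cutoff, btype=btype, fs=fs, output="sos")
    # Forward-backward filtering: zero phase shift (an assumption)
    return sosfiltfilt(sos, x)


def preprocess_ecg(x):
    return butter_filter(smooth(x), [0.7, 3.7], "bandpass", order=3)


def preprocess_emg(x):
    return butter_filter(smooth(x), 0.5, "lowpass", order=3)


def preprocess_eda(x):
    return butter_filter(smooth(x), 5.0, "lowpass", order=2)


def preprocess_resp(x):
    return butter_filter(smooth(x), [0.1, 0.35], "bandpass", order=3)


def preprocess_temp(x):
    return smooth(x)


def preprocess_acc(x):
    # ACC uses a wider window and a higher polynomial order
    return smooth(x, window=31, order=5)
```

For example, a 1.2 Hz sine (72 bpm) passes through `preprocess_ecg` almost unchanged, while 50 Hz mains interference is suppressed by several orders of magnitude.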

### Wrist Signals

#### BVP
- **Filtering**: Butterworth band-pass filter of order 3 with cutoff frequencies 0.7 Hz and 3.7 Hz.

#### TEMP
- **Smoothing**: Savitzky–Golay filter with window size 11 and order 3.

#### ACC
- **Filtering**: Finite Impulse Response (FIR) filter of length 64 with a cutoff frequency of 0.4 Hz.
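A minimal SciPy sketch of the wrist steps follows. It assumes the Empatica E4 rates used in WESAD (64 Hz for BVP, 32 Hz for ACC, 4 Hz for TEMP), assumes zero-phase filtering, and assumes the 64-tap ACC FIR is a low-pass filter, which the list above does not state. The function names are hypothetical.

```python
import numpy as np
from scipy.signal import butter, filtfilt, firwin, savgol_filter, sosfiltfilt

# Assumed wrist (E4) sampling rates, in Hz
FS_BVP, FS_ACC, FS_TEMP = 64, 32, 4


def preprocess_bvp(x, fs=FS_BVP):
    # Order-3 Butterworth band-pass, 0.7-3.7 Hz (roughly 42-222 beats per minute)
    sos = butter(3, [0.7, 3.7], btype="bandpass", fs=fs, output="sos")
    return sosfiltfilt(sos, x)


def preprocess_temp(x):
    # Savitzky-Golay smoothing, window 11, order 3
    return savgol_filter(x, window_length=11, polyorder=3)


def preprocess_acc(x, fs=FS_ACC, numtaps=64, cutoff=0.4):
    # Windowed-sinc FIR design (Hamming window by default), assumed low-pass;
    # firwin scales the taps so the DC gain is 1
    taps = firwin(numtaps, cutoff, fs=fs)
    # Filter each axis of an (n_samples, 3) array along time
    return filtfilt(taps, [1.0], x, axis=0)
```

With only 64 taps at 32 Hz, the transition band of a Hamming-window FIR is roughly 3.3 × 32 / 64 ≈ 1.65 Hz wide, so the 0.4 Hz cutoff is a soft boundary rather than a sharp one.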


```python
# Paths to the chest and wrist configuration files
CHEST_CONFIG = 'src/wesad/wesad_chest_configuration.json'
WRIST_CONFIG = 'src/wesad/wesad_wrist_configuration.json'
```

```python
from src.ml_pipeline.preprocessing import SignalPreprocessor

# preprocess the chest data
signal_preprocessor = SignalPreprocessor('src/wesad/WESAD/raw/merged_chest.pkl', 'src/wesad/WESAD/cleaned/chest_preprocessed.pkl', CHEST_CONFIG)
signal_preprocessor.preprocess_signals()

# preprocess the wrist data
signal_preprocessor = SignalPreprocessor('src/wesad/WESAD/raw/merged_wrist.pkl', 'src/wesad/WESAD/cleaned/wrist_preprocessed.pkl', WRIST_CONFIG, wrist=True)
signal_preprocessor.preprocess_signals()
```

## Traditional Machine Learning: Manual Feature Extraction

During feature extraction, the data is segmented into overlapping 60-second windows that slide forward in 5-second steps; the overlapping windows act as data augmentation.
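A minimal sketch of this sliding-window segmentation with NumPy, assuming a 1-D signal at a known sampling rate; `segment` is a hypothetical helper, not `DataAugmenter.segment_data`. A signal of `n` samples yields `(n - window) // step + 1` windows.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def segment(signal, fs, window_s=60, step_s=5):
    """Cut a 1-D signal into overlapping windows of `window_s` seconds,
    starting a new window every `step_s` seconds."""
    win, step = int(window_s * fs), int(step_s * fs)
    signal = np.asarray(signal)
    if len(signal) < win:
        # Too short for a single window
        return np.empty((0, win), dtype=signal.dtype)
    # View of every possible window start, then keep one start per step
    return sliding_window_view(signal, win)[::step]
```

For example, 10 minutes of a 4 Hz signal (2400 samples) gives (2400 - 240) // 20 + 1 = 109 windows of 240 samples, where consecutive windows share 55 of their 60 seconds.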

The manual feature extraction derives the following features:

```python
from src.ml_pipeline.data_loader import DataAugmenter
from src.ml_pipeline.feature_extraction import ManualFE

WINDOW_LENGTH = 60  # seconds
SLIDING_LENGTH = 5  # seconds

wrist_augmenter = DataAugmenter('src/wesad/WESAD/cleaned/wrist_preprocessed.pkl', WRIST_CONFIG)
batches = wrist_augmenter.segment_data(WINDOW_LENGTH, SLIDING_LENGTH)

manual_fe = ManualFE(batches, 'src/wesad/WESAD/manual_fe/wrist_manual_fe.hdf5', WRIST_CONFIG)
manual_fe.extract_features()
```

```python
from src.ml_pipeline.data_loader import DataAugmenter
from src.ml_pipeline.feature_extraction import ManualFE

CHEST_CONFIG = 'src/wesad/wesad_chest_configuration.json'
WRIST_CONFIG = 'src/wesad/wesad_wrist_configuration.json'

chest_augmenter = DataAugmenter('src/wesad/WESAD/cleaned/chest_preprocessed.pkl', CHEST_CONFIG)
batches = chest_augmenter.segment_data()

manual_fe = ManualFE(batches, 'src/wesad/WESAD/manual_fe/chest_manual_fe.hdf5', CHEST_CONFIG)
manual_fe.extract_features()
```

## Traditional Machine Learning: Automatic Feature Extraction

The automatic feature extraction uses autoencoders and takes the representation in their latent space as the features.
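As an illustration of the idea, here is a minimal PyTorch sketch of a fully connected autoencoder whose encoder output serves as the feature vector. The layer sizes, latent size, optimizer, and full-batch training loop are all assumptions for illustration; the pipeline's actual autoencoder architecture is not described here.

```python
import torch
from torch import nn


class FeatureAutoencoder(nn.Module):
    """Hypothetical autoencoder: compress n_inputs values to n_latent and back."""

    def __init__(self, n_inputs, n_latent=16):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_inputs, 64), nn.ReLU(), nn.Linear(64, n_latent))
        self.decoder = nn.Sequential(nn.Linear(n_latent, 64), nn.ReLU(), nn.Linear(64, n_inputs))

    def forward(self, x):
        return self.decoder(self.encoder(x))


def train_autoencoder(model, x, epochs=50, lr=1e-3):
    """Full-batch training to reconstruct x; returns the last reconstruction loss."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    model.train()
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(model(x), x)  # reconstruction error: output vs. input
        loss.backward()
        opt.step()
    return loss.item()


@torch.no_grad()
def extract_latent_features(model, x):
    """Use the encoder's latent code as the extracted features."""
    model.eval()
    return model.encoder(x)
```

Because the network is trained only to reconstruct its input, the bottleneck forces it to keep the most informative structure of each window, and that compressed code can then be fed to a classical classifier.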

Next, the extracted features (the `.hdf5` files written above) are wrapped in dataloaders for leave-one-subject-out cross-validation (LOSOCV): each fold holds out one subject for validation and trains on all remaining subjects. Augmented samples are used in the training set but excluded from the held-out validation set.
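The splitting logic can be sketched in plain NumPy as below. This is an illustrative sketch, not `LOSOCVDataLoader`: it assumes each sample carries a subject ID and a hypothetical boolean `is_augmented` flag marking windows that exist only because of the overlapping segmentation.

```python
import numpy as np


def losocv_splits(subject_ids, is_augmented):
    """Yield (held_out_subject, train_idx, val_idx), one fold per subject.

    subject_ids:  per-sample subject label
    is_augmented: hypothetical per-sample flag, True for augmented windows
    """
    subject_ids = np.asarray(subject_ids)
    is_augmented = np.asarray(is_augmented, dtype=bool)
    for subject in np.unique(subject_ids):
        held_out = subject_ids == subject
        # Training: every sample from the other subjects, augmented ones included
        train_idx = np.flatnonzero(~held_out)
        # Validation: only the non-augmented samples of the held-out subject
        val_idx = np.flatnonzero(held_out & ~is_augmented)
        yield subject, train_idx, val_idx
```

Since WESAD has 15 subjects, this produces 15 folds, which matches the Fold 0 to Fold 14 output below; no subject ever appears in both the training and validation indices of the same fold.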

```python
from src.ml_pipeline.data_loader import LOSOCVDataLoader

# Set Parameters
params = {
    'batch_size': 32,
    'shuffle': True,
    # 'num_workers': 4
}

CHEST_CONFIG = 'src/wesad/wesad_chest_configuration.json'
WRIST_CONFIG = 'src/wesad/wesad_wrist_configuration.json'

losocv_loader = LOSOCVDataLoader('src/wesad/WESAD/manual_fe/wrist_manual_fe.hdf5', WRIST_CONFIG, **params)
dataloaders = losocv_loader.get_dataloaders()

for i, (subject_id, loaders) in enumerate(dataloaders.items()):
    train_loader = loaders['train']
    val_loader = loaders['val']

    print(f'Fold {i}')
    print(f'Train: {len(train_loader.dataset)}')
    print(f'Val: {len(val_loader.dataset)}')
    print()
```

Output:

```
Fold 0
Train: 49640
Val: 288

Fold 1
Train: 49592
Val: 288

Fold 2
Train: 49584
Val: 288

Fold 3
Train: 49488
Val: 288

Fold 4
Train: 49520
Val: 296

Fold 5
Train: 49528
Val: 304

Fold 6
Train: 49504
Val: 304

Fold 7
Train: 49520
Val: 280

Fold 8
Train: 49400
Val: 304

Fold 9
Train: 49472
Val: 304

Fold 10
Train: 49480
Val: 304

Fold 11
Train: 49480
Val: 304

Fold 12
Train: 49464
Val: 312

Fold 13
Train: 49488
Val: 296

Fold 14
Train: 49400
Val: 304
```

```python
from src.ml_pipeline.data_loader import LOSOCVDataLoader

# Set Parameters
params = {
    'batch_size': 32,
    'shuffle': True,
    # 'num_workers': 4
}

CHEST_CONFIG = 'src/wesad/wesad_chest_configuration.json'
WRIST_CONFIG = 'src/wesad/wesad_wrist_configuration.json'

losocv_loader = LOSOCVDataLoader('src/wesad/WESAD/manual_fe/chest_manual_fe.hdf5', CHEST_CONFIG, **params)
dataloaders = losocv_loader.get_dataloaders()

for i, (subject_id, loaders) in enumerate(dataloaders.items()):
    train_loader = loaders['train']
    val_loader = loaders['val']

    print(f'Fold {i}')
    print(f'Train: {len(train_loader.dataset)}')
    print(f'Val: {len(val_loader.dataset)}')
    print()
```

Output:

```
Fold 0
Train: 72552
Val: 420

Fold 1
Train: 72480
Val: 420

Fold 2
Train: 72444
Val: 432

Fold 3
Train: 72324
Val: 432

Fold 4
Train: 72360
Val: 432

Fold 5
Train: 72372
Val: 432

Fold 6
Train: 72348
Val: 432

Fold 7
Train: 72372
Val: 432

Fold 8
Train: 72180
Val: 444

Fold 9
Train: 72300
Val: 444

Fold 10
Train: 72300
Val: 444

Fold 11
Train: 72300
Val: 444

Fold 12
Train: 72288
Val: 444

Fold 13
Train: 72312
Val: 444

Fold 14
Train: 72180
Val: 444
```

```python
from h2o.estimators import H2ORandomForestEstimator

from src.ml_pipeline.data_loader import LOSOCVDataLoader

# NOTE: H2OTrainer is not part of the h2o package; it is assumed to be a
# project-level wrapper defined elsewhere in this pipeline.

# Use the actual chest configuration file, not a placeholder path
CHEST_CONFIG = 'src/wesad/wesad_chest_configuration.json'
params = {'batch_size': 32, 'shuffle': True}

# One train/validation dataloader pair per held-out subject (LOSOCV)
losocv_loader = LOSOCVDataLoader('src/wesad/WESAD/manual_fe/chest_manual_fe.hdf5', CHEST_CONFIG, **params)
dataloaders = losocv_loader.get_dataloaders()

trainer = None
for i, (subject_id, loaders) in enumerate(dataloaders.items()):
    train_loader = loaders['train']
    val_loader = loaders['val']

    print(f'Fold {i}')
    print(f'Train: {len(train_loader.dataset)}')
    print(f'Val: {len(val_loader.dataset)}')
    print()

    # Train a random forest (50 trees, max depth 20) on this fold with H2O
    rf_model = H2ORandomForestEstimator(ntrees=50, max_depth=20)
    trainer = H2OTrainer(rf_model, train_loader, val_loader, target_column='target')
    trainer.train()

    # Predict on the held-out subject's validation data
    predictions = trainer.predict(val_loader)
    print(predictions)

# Shut down the H2O cluster once, after all folds; shutting it down inside
# the loop would leave no cluster for the next fold
if trainer is not None:
    trainer.shutdown()
```
