# Welcome to the Modular Multimodal Data Fusion ML Pipeline for Stress Detection on the WESAD Dataset


## Getting Started
First, install the required packages. If you are using a virtual environment (e.g. `venv` or Conda), activate it before running this cell.

In [None]:
%pip install -r requirements.txt

## Data Installation
If you are on Linux, run this cell to download and extract the WESAD dataset automatically. Otherwise, download it manually [here](https://uni-siegen.sciebo.de/s/HGdUkoNlW1Ub0Gx/download) and unzip the `WESAD` folder into the `src/wesad` directory, i.e. `src/wesad/WESAD/`.

In [None]:
%cd src/wesad
!bash download_database.sh
%cd ../..

## Data Preprocessing

This step extracts the biosensor data from the WESAD directory into several merged `.pkl` files, including `raw/merged_chest.pkl` and `raw/merged_wrist.pkl`, which are used in the next section.

It takes around 10 minutes, depending on the machine.

In [None]:
from src.wesad.data_preprocessing import WESADDataPreprocessor

preprocessor = WESADDataPreprocessor('src/wesad/WESAD/')
preprocessor.preprocess()

## Signal Preprocessing Steps
Each signal is preprocessed with its own steps:

### Chest Signals

#### ECG
- **Smoothing**: Savitzky–Golay filter with window size 11 and order 3.
- **Filtering**: Butterworth band-pass filter of order 3 with cutoff frequencies 0.7 Hz and 3.7 Hz.

#### EMG
- **Smoothing**: Savitzky–Golay filter with window size 11 and order 3.
- **Filtering**: Butterworth low-pass filter of order 3 with cutoff frequency 0.5 Hz.

#### EDA
- **Smoothing**: Savitzky–Golay filter with window size 11 and order 3.
- **Filtering**: Butterworth low-pass filter of order 2 with cutoff frequency 5 Hz.

#### TEMP
- **Smoothing**: Savitzky–Golay filter with window size 11 and order 3.

#### RESP
- **Smoothing**: Savitzky–Golay filter with window size 11 and order 3.
- **Filtering**: Butterworth band-pass filter of order 3 with cutoff frequencies 0.1 Hz and 0.35 Hz.

#### ACC
- **Smoothing**: Savitzky–Golay filter with window size 31 and order 5.
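The chest ECG steps above can be sketched with SciPy as follows. This is a minimal illustration under stated assumptions, not the `SignalPreprocessor` implementation: it assumes the 700 Hz sampling rate of the RespiBAN chest device, smoothing before filtering (the order listed above), and zero-phase filtering with `sosfiltfilt`. `preprocess_chest_ecg` is a hypothetical helper.

```python
import numpy as np
from scipy.signal import butter, savgol_filter, sosfiltfilt

CHEST_FS = 700  # assumed RespiBAN chest sampling rate in Hz


def preprocess_chest_ecg(ecg, fs=CHEST_FS):
    """Hypothetical helper: smooth, then band-pass filter a raw chest ECG trace."""
    # Savitzky-Golay smoothing: window of 11 samples, polynomial order 3
    smoothed = savgol_filter(ecg, window_length=11, polyorder=3)
    # Order-3 Butterworth band-pass between 0.7 Hz and 3.7 Hz, as second-order sections
    sos = butter(3, [0.7, 3.7], btype='bandpass', fs=fs, output='sos')
    # Forward-backward filtering gives zero phase shift (an assumption, see above)
    return sosfiltfilt(sos, smoothed)


# Usage: a 1.2 Hz component plus 50 Hz interference and a DC offset
t = np.arange(0, 10, 1 / CHEST_FS)
clean = np.sin(2 * np.pi * 1.2 * t)
raw = clean + 0.5 * np.sin(2 * np.pi * 50 * t) + 5.0
filtered = preprocess_chest_ecg(raw)
```

Storing the filter as second-order sections (`output='sos'`) is generally more numerically robust than transfer-function coefficients for band-pass designs with low cutoffs relative to the sampling rate.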

### Wrist Signals

#### BVP
- **Filtering**: Butterworth band-pass filter of order 3 with cutoff frequencies 0.7 Hz and 3.7 Hz.

#### TEMP
- **Smoothing**: Savitzky–Golay filter with window size 11 and order 3.

#### ACC
- **Filtering**: Finite impulse response (FIR) filter of length 64 with a cutoff frequency of 0.4 Hz.
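Likewise, here is a minimal SciPy sketch of the wrist ACC filter. It assumes the 32 Hz sampling rate of the Empatica E4 accelerometer, a low-pass response (the description above gives only a single cutoff), a Hamming-windowed design from `firwin`, and zero-phase application with `filtfilt`. `filter_wrist_acc` is a hypothetical helper, not part of the pipeline.

```python
import numpy as np
from scipy.signal import filtfilt, firwin

WRIST_ACC_FS = 32  # assumed Empatica E4 accelerometer sampling rate in Hz


def filter_wrist_acc(acc, fs=WRIST_ACC_FS):
    """Hypothetical helper: FIR low-pass filter each axis of an (n_samples, 3) ACC array."""
    # 64-tap windowed-sinc FIR design (Hamming window by default) with a 0.4 Hz cutoff
    taps = firwin(64, 0.4, fs=fs)
    # Zero-phase filtering along the time axis; an FIR filter has denominator 1.0
    return filtfilt(taps, 1.0, acc, axis=0)


# Usage: slow arm motion (0.05 Hz) plus fast jitter (5 Hz) on three axes
t = np.arange(0, 60, 1 / WRIST_ACC_FS)
slow = np.sin(2 * np.pi * 0.05 * t)
acc = np.stack([slow + 0.3 * np.sin(2 * np.pi * 5 * t)] * 3, axis=1)
smoothed_acc = filter_wrist_acc(acc)
```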


In [None]:
# Paths to the chest and wrist dataset configuration files
CHEST_CONFIG = 'config_files/dataset/wesad_chest_configuration.json'
WRIST_CONFIG = 'config_files/dataset/wesad_wrist_configuration.json'

In [None]:
from src.ml_pipeline.preprocessing import SignalPreprocessor

# preprocess the chest data
signal_preprocessor = SignalPreprocessor('src/wesad/WESAD/raw/merged_chest.pkl', 'src/wesad/WESAD/cleaned/chest_preprocessed.pkl', CHEST_CONFIG)
signal_preprocessor.preprocess_signals()

# preprocess the wrist data
signal_preprocessor = SignalPreprocessor('src/wesad/WESAD/raw/merged_wrist.pkl', 'src/wesad/WESAD/cleaned/wrist_preprocessed.pkl', WRIST_CONFIG, wrist=True)
signal_preprocessor.preprocess_signals()

## Data Augmentation and Splitting
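Data augmentation uses an overlapping sliding window. In the non-autoregressive case, each augmented window is then split into smaller segments; in the autoregressive case, the windows are not sub-split. The traditional ML, self-attention and BCSA sections below use the 60 s / 5 s / 10 s configuration (`AUTOREGRESSIVE = False`), while MARCONet uses the 10 s / 2 s configuration (`AUTOREGRESSIVE = True`).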

In [None]:
from src.ml_pipeline.data_loader import DataAugmenter

AUTOREGRESSIVE = True
WRIST_CONFIG = 'config_files/dataset/wesad_wrist_configuration.json'

if AUTOREGRESSIVE:
    WINDOW_LENGTH = 10
    SLIDING_LENGTH = 2  # a new window every 2 s gives 10/2 = 5 windows per 10 s, i.e. a 4:1 ratio of synthetic to real samples
    SPLIT_LENGTH = WINDOW_LENGTH  # no sub-splitting
else:
    WINDOW_LENGTH = 60
    SLIDING_LENGTH = 5  # a new window every 5 s gives 60/5 = 12 windows per minute, i.e. an 11:1 ratio of synthetic to real samples
    SPLIT_LENGTH = 10  # sub-split each 60-second window into 6 x 10-second segments

wrist_augmenter = DataAugmenter('src/wesad/WESAD/cleaned/wrist_preprocessed.pkl', WRIST_CONFIG)
batches = wrist_augmenter.augment_data(WINDOW_LENGTH, SLIDING_LENGTH)
wrist_splitted_segments = wrist_augmenter.split_segments(batches, WINDOW_LENGTH // SPLIT_LENGTH)
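The window arithmetic in the comments above can be checked with a small NumPy sketch. It is not `DataAugmenter`: `sliding_windows` and `split_window` are hypothetical helpers that assume a 1-D signal at a fixed sampling rate, a new window every `SLIDING_LENGTH` seconds, and equal, non-overlapping sub-splits.

```python
import numpy as np


def sliding_windows(signal, fs, window_s, step_s):
    """Hypothetical helper: overlapping windows of window_s seconds, one every step_s seconds."""
    win, step = int(window_s * fs), int(step_s * fs)
    starts = range(0, len(signal) - win + 1, step)
    return np.stack([signal[s:s + win] for s in starts])


def split_window(window, n_splits):
    """Hypothetical helper: cut one window into n_splits equal, non-overlapping segments."""
    return np.stack(np.split(window, n_splits))


fs = 4  # e.g. a 4 Hz wrist signal
signal = np.arange(10 * 60 * fs, dtype=float)  # 10 minutes of dummy samples

# A window starts every 5 s, so each 60 s of signal yields 60 / 5 = 12 windows:
# roughly 1 non-overlapping ("real") window and 11 overlapping ("synthetic") ones
windows = sliding_windows(signal, fs, window_s=60, step_s=5)
segments = split_window(windows[0], n_splits=60 // 10)
print(windows.shape, segments.shape)  # (109, 240) (6, 40)
```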

## Traditional Machine Learning

The manual feature extraction derives features in the time, frequency and non-linear domains.

In [None]:
from src.ml_pipeline.feature_extraction import ManualFE

WRIST_CONFIG = 'config_files/dataset/wesad_wrist_configuration.json'
WRIST_FE = f'src/wesad/WESAD/manual_fe/wrist_manual_fe/{WINDOW_LENGTH}s_{SLIDING_LENGTH}s_{SPLIT_LENGTH}s/wrist_features.hdf5'

manual_fe = ManualFE(wrist_splitted_segments, WRIST_FE, WRIST_CONFIG)
manual_fe.extract_features()
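To illustrate what time-, frequency- and non-linear-domain features can look like for a single segment, here is a minimal NumPy/SciPy sketch. The specific features (mean, standard deviation, range, Welch dominant frequency and total power, sample entropy) are illustrative assumptions and not the list computed by `ManualFE`; `extract_features` and `sample_entropy` are hypothetical helpers.

```python
import numpy as np
from scipy.signal import welch


def sample_entropy(x, m=2, r_factor=0.2):
    """Sample entropy (SampEn) with tolerance r = r_factor * std(x) and Chebyshev distance."""
    x = np.asarray(x, dtype=float)
    r = r_factor * np.std(x)
    n_templates = len(x) - m  # same number of templates for lengths m and m + 1

    def matching_pairs(length):
        templates = np.array([x[i:i + length] for i in range(n_templates)])
        # Pairwise Chebyshev distances between all templates
        dist = np.max(np.abs(templates[:, None, :] - templates[None, :, :]), axis=2)
        # Count unordered pairs within tolerance, excluding self-matches on the diagonal
        return (np.count_nonzero(dist <= r) - n_templates) / 2

    return -np.log(matching_pairs(m + 1) / matching_pairs(m))


def extract_features(segment, fs):
    """Hypothetical example features for one 1-D segment sampled at fs Hz."""
    freqs, psd = welch(segment, fs=fs, nperseg=min(256, len(segment)))
    return {
        # Time domain
        'mean': float(np.mean(segment)),
        'std': float(np.std(segment)),
        'range': float(np.max(segment) - np.min(segment)),
        # Frequency domain (Welch power spectral density)
        'dominant_freq': float(freqs[np.argmax(psd)]),
        'total_power': float(np.sum(psd) * (freqs[1] - freqs[0])),
        # Non-linear domain
        'sample_entropy': float(sample_entropy(segment)),
    }


fs = 64  # e.g. a 64 Hz BVP signal
t = np.arange(0, 10, 1 / fs)
sine_features = extract_features(np.sin(2 * np.pi * 1.2 * t), fs)
noise_features = extract_features(np.random.default_rng(0).normal(size=t.size), fs)
```

A regular signal such as a sine yields a much lower sample entropy than white noise, which is why entropy-style features are often used to capture signal complexity.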

### Prepare LOSOCV Datasets

Next, we turn the extracted features into dataloaders for LOSOCV (leave-one-subject-out cross-validation). Augmented samples are used in the training sets but excluded from the held-out test sets.

In [None]:
from src.ml_pipeline.data_loader import LOSOCVDataLoader

WINDOW_LENGTH = 60
SLIDING_LENGTH = 5
SPLIT_LENGTH = 10  # split each 60-second segment into 6 x 10-second segments
WRIST_CONFIG = 'config_files/dataset/wesad_wrist_configuration.json'
WRIST_FE = f'src/wesad/WESAD/manual_fe/wrist_manual_fe/{WINDOW_LENGTH}s_{SLIDING_LENGTH}s_{SPLIT_LENGTH}s/wrist_features.hdf5'

dataloader_params = {
    'batch_size': 32,
    'shuffle': True,
    # 'num_workers': 4
}
losocv_loader = LOSOCVDataLoader(WRIST_FE, WRIST_CONFIG, **dataloader_params)

# Prepare the datasets
DATASETS_PATH = losocv_loader.prepare_datasets()
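A minimal pure-Python sketch of the LOSOCV rule described above, assuming each sample carries its subject ID and an "augmented" flag. `losocv_splits` is hypothetical and is not the `LOSOCVDataLoader` API; it only shows which samples land in each fold.

```python
from collections import defaultdict


def losocv_splits(samples):
    """Hypothetical helper: map each held-out subject to its train/val sample lists.

    samples: iterable of (subject_id, is_augmented, sample) tuples.
    Training folds keep every sample (real and augmented) from the other subjects;
    the validation fold keeps only the held-out subject's real samples.
    """
    by_subject = defaultdict(list)
    for subject_id, is_augmented, sample in samples:
        by_subject[subject_id].append((is_augmented, sample))

    splits = {}
    for held_out in sorted(by_subject):
        train = [s for sid, items in by_subject.items() if sid != held_out for _, s in items]
        val = [s for is_aug, s in by_subject[held_out] if not is_aug]
        splits[held_out] = {'train': train, 'val': val}
    return splits


# WESAD subjects S2-S11 and S13-S17 (S1 and S12 were discarded by the dataset authors)
SUBJECTS = [2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 13, 14, 15, 16, 17]
# Toy data: 2 real and 3 augmented samples per subject
toy = [(sid, aug, f'S{sid}_{i}') for sid in SUBJECTS for i, aug in enumerate([False, False, True, True, True])]
splits = losocv_splits(toy)
```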

### Train and Validate Models with LOSOCV

We now build dataloaders from the prepared datasets and use them to evaluate the models with LOSOCV. The models to test and their hyperparameters are set in the model configuration file.

In [None]:
from src.ml_pipeline.data_loader import LOSOCVDataLoader
from src.ml_pipeline.train import TraditionalMLTrainer
from src.utils import save_var

WINDOW_LENGTH = 60
SLIDING_LENGTH = 5
SPLIT_LENGTH = 10  # split each 60-second segment into 6 x 10-second segments
WRIST_CONFIG = 'config_files/dataset/wesad_wrist_configuration.json'
WRIST_FE = f'src/wesad/WESAD/manual_fe/wrist_manual_fe/{WINDOW_LENGTH}s_{SLIDING_LENGTH}s_{SPLIT_LENGTH}s/wrist_features.hdf5'

dataloader_params = {
    'batch_size': 32,
    'shuffle': True
    # 'num_workers': 4
}

# Load the LOSOCV dataloaders (DATASETS_PATH is set by the previous cell)
losocv_loader = LOSOCVDataLoader(WRIST_FE, WRIST_CONFIG, **dataloader_params)
dataloaders = losocv_loader.get_data_loaders(DATASETS_PATH)

# Traditional model configuration (models and hyperparameters)
TRADITIONAL_ML_CONFIG = 'config_files/model_training/traditional/traditional_models.json'

results = []
for i, (subject_id, loaders) in enumerate(dataloaders.items()):
    train_loader = loaders['train']
    val_loader = loaders['val']

    print(f'\nFold {i} (held-out subject: {subject_id})')
    print(f'Train: {len(train_loader.dataset)}')
    print(f'Val: {len(val_loader.dataset)}')
    print()

    # Initialize trainer
    trainer = TraditionalMLTrainer(TRADITIONAL_ML_CONFIG, train_loader, val_loader)

    # trained_models = trainer.tune_hyperparameters(n_jobs=4, cv=None, verbose=2)
    trained_models = trainer.train()

    result = trainer.validate(trained_models)
    print(result)
    results.append(result)

# save the results to pkl
save_var(results, 'src/wesad/WESAD/results/traditional_models/wrist_results.pkl')
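To cross-check a subject-wise fold loop like the one above outside the pipeline, scikit-learn's `LeaveOneGroupOut` produces the same leave-one-subject-out splits when the subject ID is passed as the group label. This sketch uses random stand-in features and a `RandomForestClassifier` chosen only for illustration; it does not read `traditional_models.json`, and unlike the pipeline it does not exclude augmented samples from validation.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import LeaveOneGroupOut

rng = np.random.default_rng(0)
n_subjects, per_subject, n_features = 5, 40, 8
X = rng.normal(size=(n_subjects * per_subject, n_features))  # stand-in feature matrix
y = rng.integers(0, 2, size=len(X))  # stand-in binary labels (e.g. stress vs. non-stress)
groups = np.repeat(np.arange(n_subjects), per_subject)  # subject ID of every row

fold_scores = {}
for train_idx, val_idx in LeaveOneGroupOut().split(X, y, groups):
    held_out = groups[val_idx][0]  # every validation row belongs to one subject
    clf = RandomForestClassifier(n_estimators=50, random_state=0)
    clf.fit(X[train_idx], y[train_idx])
    fold_scores[held_out] = f1_score(y[val_idx], clf.predict(X[val_idx]), average='macro')

print({int(k): round(v, 3) for k, v in fold_scores.items()})
```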

### Visualize Results

In [None]:
from src.ml_pipeline.analysis import ModelResultsAnalysis
from src.utils import load_var

results = load_var('src/wesad/WESAD/results/traditional_models/wrist_results.pkl')

analysis = ModelResultsAnalysis(results)
analysis.analyze_collective()

# Deep Learning Methods


## Self-Attention Network Alongside Manual Feature Extraction
This network applies self-attention within each modality (intra-modality feature extraction) and then combines the modalities with late fusion.
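The idea of per-modality self-attention followed by late fusion can be sketched in NumPy. This is an illustration under assumptions, not `ModularModalityFusionNet`: it uses single-head scaled dot-product attention with random weights standing in for learned projections, mean-pools each modality over time, and takes late fusion to mean concatenating the pooled embeddings before a shared linear head. The modality shapes and the 3-class output are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)


def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over the time axis of x (T, d_in)."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])  # (T, T) similarity between time steps
    return softmax(scores, axis=-1) @ v  # (T, d_model)


d_model, n_classes = 16, 3
# Hypothetical per-modality inputs: (time steps, features per step)
modalities = {'bvp': rng.normal(size=(20, 4)), 'eda': rng.normal(size=(5, 2)), 'temp': rng.normal(size=(5, 1))}

embeddings = []
for name, x in modalities.items():
    d_in = x.shape[1]
    # Random weights stand in for learned, modality-specific projections
    w_q, w_k, w_v = (rng.normal(scale=d_in ** -0.5, size=(d_in, d_model)) for _ in range(3))
    attended = self_attention(x, w_q, w_k, w_v)
    embeddings.append(attended.mean(axis=0))  # mean-pool over time -> (d_model,)

# Late fusion: concatenate the per-modality embeddings, then apply a shared linear head
fused = np.concatenate(embeddings)  # (n_modalities * d_model,)
w_head = rng.normal(scale=fused.size ** -0.5, size=(fused.size, n_classes))
probs = softmax(fused @ w_head)
```

Because each modality is encoded independently, modalities with different sequence lengths (here 20 vs. 5 steps) can be combined without resampling.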

### Prepare LOSOCV Datasets on a Per Sensor Basis

Next, we turn the extracted features into per-sensor dataloaders for LOSOCV (leave-one-subject-out cross-validation). Augmented samples are used in the training sets but excluded from the held-out test sets.

In [None]:
from src.ml_pipeline.data_loader import LOSOCVSensorDataLoader

WINDOW_LENGTH = 60
SLIDING_LENGTH = 5
SPLIT_LENGTH = 10  # split each 60-second segment into 6 x 10-second segments
WRIST_CONFIG = 'config_files/dataset/wesad_wrist_configuration.json'
WRIST_FE = f'src/wesad/WESAD/manual_fe/wrist_manual_fe/{WINDOW_LENGTH}s_{SLIDING_LENGTH}s_{SPLIT_LENGTH}s/wrist_features.hdf5'
dataloader_params = {
    'batch_size': 32,
    'shuffle': True,
    # 'num_workers': 4
}
losocv_loader = LOSOCVSensorDataLoader(WRIST_FE, WRIST_CONFIG, **dataloader_params)

# Prepare the datasets
DATASETS_PATH = losocv_loader.prepare_datasets(f'src/wesad/WESAD/datasets/{WINDOW_LENGTH}s_{SLIDING_LENGTH}s_{SPLIT_LENGTH}s')

### Train and Validate Models with LOSOCV

We now build dataloaders from the prepared datasets and use them to evaluate the models with LOSOCV. The models to test and their hyperparameters are set in the model configuration file.

In [None]:
from src.ml_pipeline.train import PyTorchTrainer
from src.ml_pipeline.models.attention_models import ModularModalityFusionNet
from src.ml_pipeline.data_loader import LOSOCVSensorDataLoader
from src.ml_pipeline.utils import get_active_key, get_key, load_json, copy_json, get_values
from src.utils import save_var
from datetime import datetime
import torch

WINDOW_LENGTH = 60
SLIDING_LENGTH = 5
SPLIT_LENGTH = 10  # split each 60-second segment into 6 x 10-second segments

DATASETS_PATH = f'src/wesad/WESAD/datasets/{WINDOW_LENGTH}s_{SLIDING_LENGTH}s_{SPLIT_LENGTH}s/losocv_datasets.pkl'
WRIST_CONFIG = 'config_files/dataset/wesad_wrist_configuration.json'
WRIST_FE = f'src/wesad/WESAD/manual_fe/wrist_manual_fe/{WINDOW_LENGTH}s_{SLIDING_LENGTH}s_{SPLIT_LENGTH}s_mini/wrist_features.hdf5'
SAN_MODEL_CONFIG = 'config_files/model_training/deep/modular_modality_fusion_net_config.json'

dataloader_params = {
    'batch_size': get_values(SAN_MODEL_CONFIG, 'batch_size'),
    'shuffle': True,
    # 'num_workers': 4
}

# Load Dataloaders for LOSOCV
losocv_loader = LOSOCVSensorDataLoader(WRIST_FE, WRIST_CONFIG, **dataloader_params)
dataloaders, input_dims = losocv_loader.get_data_loaders(DATASETS_PATH)

# Load Model Parameters
model_config = load_json(SAN_MODEL_CONFIG)
model_config = {
    **model_config,
    'input_dims': input_dims
}

# Set device
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f'Using device: {device}')

results = []
for i, (subject_id, loaders) in enumerate(dataloaders.items()):
    train_loader = loaders['train']
    val_loader = loaders['val']
    
    print(f'\nSubject: {subject_id}')
    print(f'Train: {len(train_loader.dataset)}')
    print(f'Val: {len(val_loader.dataset)}')
    print()

    # Initialize model
    model = ModularModalityFusionNet(**model_config)

    # Initialize trainer
    trainer = PyTorchTrainer(model, train_loader, val_loader, SAN_MODEL_CONFIG, device)
    trainer.save_path = trainer.save_path.format(fold=f'subject_{subject_id}')
    if i == 0:
        trainer.print_model_summary()
    trained_model_ckpt = trainer.train()
    print(f'Model checkpoint saved to: {trained_model_ckpt}\n')

    result = trainer.validate(trained_model_ckpt)
    results.append(result)

# save the results to pkl
current_time = datetime.now().strftime("%Y_%m_%d_%H_%M_%S")
save_path = f'src/wesad/WESAD/results/san/wrist_results/{WINDOW_LENGTH}s_{SLIDING_LENGTH}s_{SPLIT_LENGTH}s/{current_time}/generalized'
save_var(results, f'{save_path}/results.pkl', 'Results')
copy_json(SAN_MODEL_CONFIG, f'{save_path}/config.json')

### Visualize Results

In [None]:
from src.ml_pipeline.analysis import ModelResultsAnalysis
from src.utils import load_var

WINDOW_LENGTH = 60
SLIDING_LENGTH = 5
SPLIT_LENGTH = 10  # split each 60-second segment into 6 x 10-second segments

# save_path is set by the training cell above
results_path = f'{save_path}/results.pkl'
results = load_var(results_path)

analysis = ModelResultsAnalysis(results)
analysis.analyze_collective()

In [None]:
import pickle

# Rebuild the LOSOCV dataset index for the 10s_2s_10s configuration.
# WESAD subjects S1 and S12 are not part of the released dataset.
BASE_DIR = 'src/wesad/WESAD/datasets/10s_2s_10s'
SUBJECT_IDS = [2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 13, 14, 15, 16, 17]

losocv_datasets = {
    sid: {
        'train': f'{BASE_DIR}/losocv/train_{sid}.hdf5',
        'val': f'{BASE_DIR}/losocv/val_{sid}.hdf5',
    }
    for sid in SUBJECT_IDS
}

# Save the updated losocv_datasets to the same file
with open(f'{BASE_DIR}/losocv_datasets.pkl', 'wb') as f:
    pickle.dump(losocv_datasets, f)

### Prepare Cross Validation Dataloader for Personalization

First, prepare a dataloader that contains only one subject's data; it is used to cross-validate the personalized model for that subject.

In [None]:
from src.ml_pipeline.data_loader import PersonalSensorDataLoader

WINDOW_LENGTH = 60
SLIDING_LENGTH = 5
SPLIT_LENGTH = 10  # split each 60-second segment into 6 x 10-second segments
WRIST_CONFIG = 'config_files/dataset/wesad_wrist_configuration.json'
WRIST_FE = f'src/wesad/WESAD/manual_fe/wrist_manual_fe/{WINDOW_LENGTH}s_{SLIDING_LENGTH}s_{SPLIT_LENGTH}s/wrist_features.hdf5'
dataloader_params = {
    'batch_size': 32,
    'shuffle': True,
    # 'num_workers': 4
}

SUBJECT_ID = 2
personal_loader = PersonalSensorDataLoader(WRIST_FE, WRIST_CONFIG, **dataloader_params)

# Prepare the datasets
DATASETS_PATH = personal_loader.prepare_datasets(f'src/wesad/WESAD/datasets/{WINDOW_LENGTH}s_{SLIDING_LENGTH}s_{SPLIT_LENGTH}s', SUBJECT_ID)
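The per-subject cross-validation can be pictured as a k-fold split over one subject's samples. This NumPy sketch assumes shuffled k-fold splitting with `k=5`; how `PersonalSensorDataLoader` actually forms its folds is not shown in this notebook, and `personal_kfold` is hypothetical.

```python
import numpy as np


def personal_kfold(n_samples, k=5, seed=0):
    """Hypothetical helper: k (train_idx, val_idx) pairs over one subject's samples."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)  # shuffle once before splitting
    folds = np.array_split(order, k)  # k nearly equal, disjoint index groups
    return [(np.concatenate(folds[:i] + folds[i + 1:]), folds[i]) for i in range(k)]


splits = personal_kfold(n_samples=120, k=5)
```

Because augmented windows overlap in time, shuffling before splitting can place overlapping windows in both the training and validation folds; contiguous (unshuffled) folds avoid that leakage.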

### Adding Personalization via Transfer Learning

This personal dataset is then used to fine-tune the generalized model (loaded from its LOSOCV checkpoint) with the new multi-head attention block architecture.

In [None]:
from src.ml_pipeline.train import PyTorchTrainer
from src.ml_pipeline.models.attention_models import PersonalizedModalityFusionNet, ModularModalityFusionNet
from src.ml_pipeline.data_loader import PersonalSensorDataLoader
from src.ml_pipeline.utils import get_active_key, load_json, copy_json, get_values
from src.utils import save_var
from datetime import datetime
import torch

WINDOW_LENGTH = 60
SLIDING_LENGTH = 5
SPLIT_LENGTH = 10  # split each 60-second segment into 6 x 10-second segments

SUBJECT_ID = 2

DATASETS_PATH = f'src/wesad/WESAD/datasets/{WINDOW_LENGTH}s_{SLIDING_LENGTH}s_{SPLIT_LENGTH}s/subject_{SUBJECT_ID}/personal_dataset.pkl'
WRIST_CONFIG = 'config_files/dataset/wesad_wrist_configuration.json'
WRIST_FE = f'src/wesad/WESAD/manual_fe/wrist_manual_fe/{WINDOW_LENGTH}s_{SLIDING_LENGTH}s_{SPLIT_LENGTH}s/wrist_features.hdf5'
PERSONALIZED_SAN_MODEL_CONFIG = 'config_files/model_training/deep/personalized_modality_fusion_net_config.json'
# Generalized checkpoint from the LOSOCV fold in which this subject was held out
MFN_CKPT_PATH = f'src/wesad/WESAD/ckpts/san/wrist_manual_fe/{WINDOW_LENGTH}s_{SLIDING_LENGTH}s_{SPLIT_LENGTH}s/generalized/subject_{SUBJECT_ID}/checkpoint_5.pth'

dataloader_params = {
    'batch_size': get_values(PERSONALIZED_SAN_MODEL_CONFIG, 'batch_size'),
    'shuffle': True,
    # 'num_workers': 4
}

# Load the dataloaders for each personal cross-validation fold
personal_loader = PersonalSensorDataLoader(WRIST_FE, WRIST_CONFIG, **dataloader_params)
dataloaders, input_dims = personal_loader.get_data_loaders(DATASETS_PATH)

# Load Model Parameters
model_config = load_json(PERSONALIZED_SAN_MODEL_CONFIG)
model_config = {
    **model_config,
    'input_dims': input_dims
}

# Set device
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f'Using device: {device}')

results = []
for i, loaders in enumerate(dataloaders):
    train_loader = loaders['train']
    val_loader = loaders['val']    
    print(f'\nFold: {i}')
    print(f'Train: {len(train_loader.dataset)}')
    print(f'Val: {len(val_loader.dataset)}')
    print()

    # Initialize model
    model = PersonalizedModalityFusionNet(MFN_CKPT_PATH, ModularModalityFusionNet, **model_config)

    # Initialize trainer
    trainer = PyTorchTrainer(model, train_loader, val_loader, PERSONALIZED_SAN_MODEL_CONFIG, device)
    trainer.save_path = trainer.save_path.format(fold=f'fold_{i}')
    if i == 0:
        trainer.print_model_summary()
    trained_model_ckpt = trainer.train()
    print(f'Model saved at: {trained_model_ckpt}')

    result = trainer.validate(trained_model_ckpt)
    results.append(result)

# save the results to pkl
current_time = datetime.now().strftime("%Y_%m_%d_%H_%M_%S")
save_path = f'src/wesad/WESAD/results/san/wrist_results/{WINDOW_LENGTH}s_{SLIDING_LENGTH}s_{SPLIT_LENGTH}s/{current_time}/personalized'
save_var(results, f'{save_path}/results.pkl', 'results')
copy_json(PERSONALIZED_SAN_MODEL_CONFIG, f'{save_path}/config.json')

### Visualize Results

In [None]:
from src.ml_pipeline.analysis import ModelResultsAnalysis
from src.utils import load_var

WINDOW_LENGTH = 60
SLIDING_LENGTH = 5
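SPLIT_LENGTH = 10  # split each 60-second segment into 6 x 10-second segments

# Results of an earlier run; use f'{save_path}/results.pkl' to load the run above instead
results_path = f'src/wesad/WESAD/results/san/wrist_results/{WINDOW_LENGTH}s_{SLIDING_LENGTH}s_{SPLIT_LENGTH}s/2024_06_22_14_20_25/personalized/results.pkl'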
results = load_var(results_path)

analysis = ModelResultsAnalysis(results)
analysis.analyze_collective()

## Bidirectional Cross- and Self-modal Attention (BCSA)

This model adds cross-attention to the self-attention network, so that it extracts inter-modality (cross-attention) as well as intra-modality (self-attention) features.
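The difference between self-attention and bidirectional cross-attention can be sketched in NumPy for two modalities. This is an illustration, not `ModularBCSA`: the embedded sequences are random stand-ins, a single set of projection weights is shared by every attention call for brevity (a real model would typically learn separate projections per direction), and combining the views by summation and mean-pooling is an assumption.

```python
import numpy as np

rng = np.random.default_rng(1)


def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def attention(queries_from, keys_from, w_q, w_k, w_v):
    """Scaled dot-product attention; self-attention when both inputs are the same array."""
    q, k, v = queries_from @ w_q, keys_from @ w_k, keys_from @ w_v
    weights = softmax(q @ k.T / np.sqrt(k.shape[-1]), axis=-1)  # (T_q, T_k)
    return weights @ v  # one output row per query time step


d = 8
bvp = rng.normal(size=(20, d))  # hypothetical embedded BVP sequence (20 steps)
eda = rng.normal(size=(5, d))  # hypothetical embedded EDA sequence (5 steps)
w = {name: rng.normal(scale=d ** -0.5, size=(d, d)) for name in ('q', 'k', 'v')}

bvp_self = attention(bvp, bvp, w['q'], w['k'], w['v'])  # intra-modality, (20, d)
eda_self = attention(eda, eda, w['q'], w['k'], w['v'])  # intra-modality, (5, d)
bvp_to_eda = attention(bvp, eda, w['q'], w['k'], w['v'])  # BVP queries EDA, (20, d)
eda_to_bvp = attention(eda, bvp, w['q'], w['k'], w['v'])  # EDA queries BVP, (5, d)

# Combine per modality (self- plus cross-attended view), then pool and concatenate
fused = np.concatenate([(bvp_self + bvp_to_eda).mean(axis=0), (eda_self + eda_to_bvp).mean(axis=0)])
```

In cross-attention the output keeps the query modality's sequence length, so each modality is enriched with information from the other without resampling either one.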

### Train and Validate Model with LOSOCV

In [None]:
from src.ml_pipeline.train import PyTorchTrainer
from src.ml_pipeline.models.attention_models import ModularBCSA
from src.ml_pipeline.data_loader import LOSOCVSensorDataLoader
from src.ml_pipeline.utils import load_json, copy_json, get_values
from src.utils import save_var
from datetime import datetime
import torch

WINDOW_LENGTH = 60
SLIDING_LENGTH = 5
SPLIT_LENGTH = 10  # sub-split each 60-second segment into 6 x 10-second segments

DATASETS_PATH = f'src/wesad/WESAD/datasets/{WINDOW_LENGTH}s_{SLIDING_LENGTH}s_{SPLIT_LENGTH}s/losocv_datasets.pkl'
WRIST_CONFIG = 'config_files/dataset/wesad_wrist_configuration.json'
WRIST_FE = f'src/wesad/WESAD/manual_fe/wrist_manual_fe/{WINDOW_LENGTH}s_{SLIDING_LENGTH}s_{SPLIT_LENGTH}s_mini/wrist_features.hdf5'
BCSA_MODEL_CONFIG = 'config_files/model_training/deep/bsca_config.json'

dataloader_params = {
    'batch_size': get_values(BCSA_MODEL_CONFIG, 'batch_size'),
    'shuffle': True,
}

# Load Dataloaders for LOSOCV
losocv_loader = LOSOCVSensorDataLoader(WRIST_FE, WRIST_CONFIG, **dataloader_params)
dataloaders, input_dims = losocv_loader.get_data_loaders(DATASETS_PATH)

# Load Model Parameters
model_config = load_json(BCSA_MODEL_CONFIG)
model_config = {
    **model_config,
    'input_dims': input_dims
}

# Set device
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f'Using device: {device}')

results = []
for i, (subject_id, loaders) in enumerate(dataloaders.items()):
    train_loader = loaders['train']
    val_loader = loaders['val']
    
    print(f'\nSubject: {subject_id}')
    print(f'Train: {len(train_loader.dataset)}')
    print(f'Val: {len(val_loader.dataset)}')
    print()

    # Initialize model
    model = ModularBCSA(**model_config)

    # Initialize trainer
    trainer = PyTorchTrainer(model, train_loader, val_loader, BCSA_MODEL_CONFIG, device)
    trainer.save_path = trainer.save_path.format(fold=f'subject_{subject_id}')
    
    # if i == 0:
    #     trainer.print_model_summary()

    trained_model_ckpt = trainer.train()
    print(f'Model checkpoint saved to: {trained_model_ckpt}\n')

    result = trainer.validate(trained_model_ckpt)
    results.append(result)

# save the results to pkl
current_time = datetime.now().strftime("%Y_%m_%d_%H_%M_%S")
save_path = f'src/wesad/WESAD/results/bcsa/wrist_results/{WINDOW_LENGTH}s_{SLIDING_LENGTH}s_{SPLIT_LENGTH}s/{current_time}/generalized'
save_var(results, f'{save_path}/results.pkl', 'Results')
copy_json(BCSA_MODEL_CONFIG, f'{save_path}/config.json')

### Visualize the Results

In [None]:
from src.ml_pipeline.analysis import ModelResultsAnalysis
from src.utils import load_var

WINDOW_LENGTH = 60
SLIDING_LENGTH = 5
SPLIT_LENGTH = 10  # split each 60-second segment into 6 x 10-second segments

# results_path = f'{save_path}/results.pkl'
# results_path = 'src/wesad/WESAD/results/bcsa/wrist_results/60s_5s_10s/2024_06_21_15_50_27/generalized/results.pkl'
results_path = 'src/wesad/WESAD/results/bcsa/wrist_results/60s_5s_10s/2024_06_27_17_23_40/generalized/results.pkl'
results = load_var(results_path)

analysis = ModelResultsAnalysis(results)
analysis.analyze_collective()

## Modular Autoregressive Co-Attention Network (MARCONet)  

This model adds modularity to the self- and cross-attention network for inter-modality and intra-modality feature extraction, using early fusion of pairwise modalities and late fusion of the ensemble learning branches.
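The fusion scheme described above can be sketched in NumPy. This is an assumption-laden illustration rather than `MARCONet` itself: it treats every pair of modalities as one branch (early fusion by concatenation), replaces each branch's co-attention network with a random linear layer, and performs late fusion by averaging the branches' class probabilities. The feature sizes and the 3-class output are hypothetical.

```python
from itertools import combinations

import numpy as np

rng = np.random.default_rng(2)


def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()


n_classes = 3
# Hypothetical pooled per-modality feature vectors for one sample
features = {'acc': rng.normal(size=6), 'bvp': rng.normal(size=8), 'eda': rng.normal(size=4), 'temp': rng.normal(size=2)}

branch_probs = {}
for a, b in combinations(sorted(features), 2):
    # Early fusion: concatenate the two modalities' features into one branch input
    pair_input = np.concatenate([features[a], features[b]])
    # A random linear layer stands in for the branch's learned co-attention network
    w = rng.normal(scale=pair_input.size ** -0.5, size=(pair_input.size, n_classes))
    branch_probs[(a, b)] = softmax(pair_input @ w)

# Late fusion: average the class probabilities of all pairwise branches (ensemble)
ensemble_probs = np.mean(list(branch_probs.values()), axis=0)
prediction = int(np.argmax(ensemble_probs))
```

With four wrist modalities this gives 4 choose 2 = 6 branches; the modular layout means a branch can be dropped or added without retraining the others, at the cost of the branch count growing quadratically with the number of modalities.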

### Prepare LOSOCV Datasets on a Per Sensor Basis

Next, we turn the extracted features into per-sensor dataloaders for LOSOCV (leave-one-subject-out cross-validation). Augmented samples are used in the training sets but excluded from the held-out test sets.

In [None]:
from src.ml_pipeline.data_loader import LOSOCVSensorDataLoader

WINDOW_LENGTH = 10
SLIDING_LENGTH = 2  # a new window every 2 s gives 10/2 = 5 windows per 10 s, i.e. a 4:1 ratio of synthetic to real samples
SPLIT_LENGTH = WINDOW_LENGTH # this will not sub-split the data

WRIST_CONFIG = 'config_files/dataset/wesad_wrist_configuration.json'
WRIST_FE = f'src/wesad/WESAD/manual_fe/wrist_manual_fe/{WINDOW_LENGTH}s_{SLIDING_LENGTH}s_{SPLIT_LENGTH}s/wrist_features.hdf5'
dataloader_params = {
    'batch_size': 32,
    'shuffle': True,
    # 'num_workers': 4
}
losocv_loader = LOSOCVSensorDataLoader(WRIST_FE, WRIST_CONFIG, **dataloader_params)

# Prepare the datasets
DATASETS_PATH = losocv_loader.prepare_datasets(f'src/wesad/WESAD/datasets/{WINDOW_LENGTH}s_{SLIDING_LENGTH}s_{SPLIT_LENGTH}s')

### Train and Validate Models with LOSOCV

We now build dataloaders from the prepared datasets and use them to evaluate the models with LOSOCV. The models to test and their hyperparameters are set in the model configuration file.

In [1]:
from src.ml_pipeline.train import PyTorchTrainer
from src.ml_pipeline.models.attention_models import MARCONet
from src.ml_pipeline.data_loader import LOSOCVSensorDataLoader
from src.ml_pipeline.utils import get_active_key, get_key, load_json, copy_json, get_values
from src.utils import save_var
from datetime import datetime
import torch

WINDOW_LENGTH = 10
SLIDING_LENGTH = 2  # a new window every 2 s gives 10/2 = 5 windows per 10 s, i.e. a 4:1 ratio of synthetic to real samples
SPLIT_LENGTH = WINDOW_LENGTH # this will not sub-split the data

DATASETS_PATH = f'src/wesad/WESAD/datasets/{WINDOW_LENGTH}s_{SLIDING_LENGTH}s_{SPLIT_LENGTH}s/losocv_datasets.pkl'
WRIST_CONFIG = 'config_files/dataset/wesad_wrist_configuration.json'
WRIST_FE = f'src/wesad/WESAD/manual_fe/wrist_manual_fe/{WINDOW_LENGTH}s_{SLIDING_LENGTH}s_{SPLIT_LENGTH}s/wrist_features.hdf5'
MARCO_CONFIG = 'config_files/model_training/deep/marco_config.json'

dataloader_params = {
    'batch_size': get_values(MARCO_CONFIG, 'batch_size'),
    'shuffle': False,
    'drop_last': True
}

# Load Dataloaders for LOSOCV
losocv_loader = LOSOCVSensorDataLoader(WRIST_FE, WRIST_CONFIG, **dataloader_params)
dataloaders, input_dims = losocv_loader.get_data_loaders(DATASETS_PATH)

# Load Model Parameters
model_config = load_json(MARCO_CONFIG)
model_config = {
    **model_config,
    'input_dims': input_dims
}

# Set device
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f'Using device: {device}')

results = []
for i, (subject_id, loaders) in enumerate(dataloaders.items()):
    train_loader = loaders['train']
    val_loader = loaders['val']
    
    print(f'\nSubject: {subject_id}')
    print(f'Train: {len(train_loader.dataset)}')
    print(f'Val: {len(val_loader.dataset)}')
    print()

    # Initialize model
    model = MARCONet(**model_config)

    # Initialize trainer
    trainer = PyTorchTrainer(model, train_loader, val_loader, MARCO_CONFIG, device)
    trainer.save_path = trainer.save_path.format(fold=f'subject_{subject_id}')
    # if i == 0:
    #     trainer.print_model_summary()
    trained_model_ckpt = trainer.train()
    print(f'Model checkpoint saved to: {trained_model_ckpt}\n')

    result = trainer.validate(trained_model_ckpt)
    results.append(result)

# save the results to pkl
current_time = datetime.now().strftime("%Y_%m_%d_%H_%M_%S")
save_path = f'src/wesad/WESAD/results/co_attention/wrist_results/{WINDOW_LENGTH}s_{SLIDING_LENGTH}s_{SPLIT_LENGTH}s/{current_time}/generalized'
save_var(results, f'{save_path}/results.pkl', 'Results')
copy_json(MARCO_CONFIG, f'{save_path}/config.json')

Using device: cuda

Subject: 2
Train: 12762
Val: 174

Storing tensorboard log to: src/wesad/WESAD/ckpts/co_attention/wrist_manual_fe/10s_2s_10s/generalized/subject_2/tensorboard


Epoch 1/1:  17%|█▋        | 2140/12762 [02:47<06:41, 26.49it/s, loss=1.97e-8]   

### Visualize Results

In [None]:
from src.ml_pipeline.analysis import ModelResultsAnalysis
from src.utils import load_var

results_path = f'{save_path}/results.pkl'
results = load_var(results_path)

analysis = ModelResultsAnalysis(results)
analysis.analyze_collective()