## Try 1: this went nowhere, but it's kept here for documentation purposes

To help you prepare your EEG data for input into a ResNet model, we can go through the steps of loading the .edf files, preprocessing the data, and setting up the format for input into ResNet. Here's an outline of the process:

1. **Load .edf files**: We'll use the `pyEDFlib` library to load the EEG data.
2. **Preprocess the EEG data**: This may involve normalization or other preprocessing specific to EEG signals.
3. **Label encoding**: We'll ensure that the labels (normal or abnormal) are mapped into numerical values.
4. **Reshape data for ResNet**: Keras ResNet models expect image-like 4D input, by default `(batch_size, height, width, channels)`, so we'll format the EEG data accordingly.

In [None]:
import pyedflib
import numpy as np
from sklearn.preprocessing import LabelEncoder
from tensorflow.keras.utils import to_categorical

# Paths to your 10 .edf files
file_paths = ['file1.edf', 'file2.edf', 'file3.edf', 'file4.edf', 'file5.edf',
              'file6.edf', 'file7.edf', 'file8.edf', 'file9.edf', 'file10.edf']

# Labels for the files: 'normal' or 'abnormal'
labels = ['normal', 'abnormal', 'normal', 'normal', 'abnormal', 
          'abnormal', 'normal', 'abnormal', 'normal', 'abnormal']

# Prepare label encoding
label_encoder = LabelEncoder()
encoded_labels = label_encoder.fit_transform(labels)
categorical_labels = to_categorical(encoded_labels)

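# Function to load and preprocess each .edf file
def load_and_preprocess_edf(file_path):
    f = pyedflib.EdfReader(file_path)
    try:
        num_signals = f.signals_in_file
        # Channels can have different sample counts, so truncate to the shortest one
        num_samples = int(np.min(f.getNSamples()))
        signal_data = np.zeros((num_signals, num_samples))

        for i in range(num_signals):
            signal_data[i, :] = f.readSignal(i)[:num_samples]
    finally:
        f.close()  # Always release the file handle, even if reading fails

    # Preprocess data (e.g., normalization)
    # Here we simply normalize to 0-1 range as an example
    data_min, data_max = np.min(signal_data), np.max(signal_data)
    if data_max > data_min:  # Avoid dividing by zero on a constant recording
        signal_data = (signal_data - data_min) / (data_max - data_min)
    else:
        signal_data = np.zeros_like(signal_data)

    return signal_data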

# Load and preprocess all the files
eeg_data = []
for path in file_paths:
    eeg_data.append(load_and_preprocess_edf(path))

# Convert the list to a numpy array
# Note: this assumes every file has the same number of channels and samples;
# recordings with different shapes must be truncated or padded first (see Try 3)
eeg_data = np.array(eeg_data)

# Add a trailing channel dimension: (files, eeg_channels, samples) -> (files, eeg_channels, samples, 1)
# ResNet takes 2D images, so each recording is treated as a one-channel "image"
eeg_data = np.expand_dims(eeg_data, axis=-1)  # Add channel dimension

print("EEG data shape:", eeg_data.shape)
print("Labels shape:", categorical_labels.shape)

### Key Steps:
1. **Loading .edf files**: We use `pyedflib` to load the EEG signals.
2. **Preprocessing**: The data is normalized, but you can modify the preprocessing step based on your needs.
3. **Label Encoding**: The labels (normal or abnormal) are encoded into categorical values for training.
4. **ResNet Input Shape**: We ensure the EEG data is reshaped to the required input dimensions for ResNet (e.g., adding a channel dimension).
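
The normalization in the cell above uses one global min and max per file, so a single high-amplitude channel or artifact can squash every other channel toward a constant. As one possible alternative, here is a minimal sketch of per-channel z-score normalization on synthetic data. The function name `zscore_per_channel` and the `eps` guard are assumptions for illustration, not part of the original code.

```python
import numpy as np

def zscore_per_channel(signal_data, eps=1e-8):
    """Standardize each channel (row) to zero mean and unit variance.

    signal_data: array of shape (num_channels, num_samples).
    eps: small constant (an assumption) that avoids dividing by zero on flat channels.
    """
    mean = signal_data.mean(axis=1, keepdims=True)
    std = signal_data.std(axis=1, keepdims=True)
    return (signal_data - mean) / (std + eps)

# Synthetic example: two channels with very different amplitude ranges
rng = np.random.default_rng(0)
demo = np.vstack([rng.normal(0, 1, 500), rng.normal(50, 200, 500)])
normalized = zscore_per_channel(demo)
print(normalized.mean(axis=1).round(6), normalized.std(axis=1).round(3))
```

Whether per-channel or global scaling is the better fit depends on the model and the data, so treat this as something to experiment with rather than a fixed recommendation.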

Once the data is preprocessed, you can split it into training and test sets and feed it into the ResNet model.
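
With only 10 files, a purely random split can leave one class out of the test set entirely. The sketch below shows one way to do a stratified split with plain NumPy. `stratified_split` and `test_fraction` are hypothetical names, and the arrays are synthetic stand-ins for `eeg_data` and the encoded labels.

```python
import numpy as np

def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx), holding out a share of every class.

    Classes with at least two examples contribute at least one test example.
    """
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    train_idx, test_idx = [], []
    for cls in np.unique(labels):
        cls_idx = rng.permutation(np.flatnonzero(labels == cls))
        # Hold out at least one example per class, but never the whole class
        n_test = min(max(1, int(round(test_fraction * len(cls_idx)))), len(cls_idx) - 1)
        test_idx.extend(cls_idx[:n_test])
        train_idx.extend(cls_idx[n_test:])
    return np.array(train_idx, dtype=int), np.array(test_idx, dtype=int)

# Synthetic stand-ins: 10 recordings, 5 per class
demo_labels = np.array([1, 0, 1, 1, 0, 0, 1, 0, 1, 0])
demo_data = np.zeros((10, 4, 100, 1))

train_idx, test_idx = stratified_split(demo_labels)
x_train, x_test = demo_data[train_idx], demo_data[test_idx]
print(x_train.shape, x_test.shape)
```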


## Try 2 with pickle formatting

In [5]:
import os
import numpy as np
import pickle as pkl
import torch
from torch.utils.data import DataLoader, Dataset

# Define a custom dataset class to handle loading and processing of .edf files
class EEGDataset(Dataset):
    def __init__(self, data_folder, label):
        self.data_folder = data_folder
        self.label = label
        self.file_list = [f for f in os.listdir(data_folder) if f.endswith('.edf')]

    def __len__(self):
        return len(self.file_list)

    def __getitem__(self, idx):
        file_path = os.path.join(self.data_folder, self.file_list[idx])
        try:
            # Attempt to unpickle the file. This is the step that fails below:
            # .edf files are EDF binaries, not pickles
            with open(file_path, 'rb') as f:
                data_pkl = pkl.load(f)
                signals = data_pkl['RAW_DATA'][0]  # Extract EEG signals from the pickled data
            return signals, self.label
        except (pkl.UnpicklingError, KeyError) as e:
            print(f"Error loading {file_path}: {e}")
            return None, None

# Prepare data loader function
def prepare_dataloaders(normal_folder, abnormal_folder, batch_size=16):
    normal_data = EEGDataset(normal_folder, label=0)
    abnormal_data = EEGDataset(abnormal_folder, label=1)

    # Combine datasets
    dataset = torch.utils.data.ConcatDataset([normal_data, abnormal_data])

    # Create DataLoader for batching
    data_loader = DataLoader(dataset, batch_size=batch_size, shuffle=True, collate_fn=collate_fn)

    return data_loader

# Custom collate function to handle None values
def collate_fn(batch):
    batch = [item for item in batch if item[0] is not None]
    if len(batch) == 0:
        return torch.Tensor(), torch.Tensor()
    data, labels = zip(*batch)
    return torch.stack([torch.Tensor(d) for d in data]), torch.Tensor(labels)

# Define paths to normal and abnormal folders
normal_folder = '/Users/User/Downloads/TUH EEG Corpus random files/normal'
abnormal_folder = '/Users/User/Downloads/TUH EEG Corpus random files/abnormal'

# Prepare the data loaders
data_loader = prepare_dataloaders(normal_folder, abnormal_folder)

# Iterate through the data loader
for batch_idx, (data, labels) in enumerate(data_loader):
    if data.size(0) == 0:
        print(f"Batch {batch_idx + 1} is empty due to loading errors.")
        continue
    print(f"Batch {batch_idx + 1} - Data Shape: {data.shape}, Labels: {labels}")

Error loading /Users/User/Downloads/TUH EEG Corpus random files/abnormal/aaaaaaaq_s004_t000.edf: unpickling stack underflow
Error loading /Users/User/Downloads/TUH EEG Corpus random files/normal/aaaaaabn_s005_t000.edf: unpickling stack underflow
Error loading /Users/User/Downloads/TUH EEG Corpus random files/abnormal/aaaaaacq_s008_t001.edf: unpickling stack underflow
Error loading /Users/User/Downloads/TUH EEG Corpus random files/abnormal/aaaaaacq_s009_t000.edf: unpickling stack underflow
Error loading /Users/User/Downloads/TUH EEG Corpus random files/normal/aaaaaaat_s002_t001.edf: unpickling stack underflow
Error loading /Users/User/Downloads/TUH EEG Corpus random files/abnormal/aaaaaaav_s004_t000.edf: unpickling stack underflow
Error loading /Users/User/Downloads/TUH EEG Corpus random files/normal/aaaaaaff_s002_t000.edf: unpickling stack underflow
Error loading /Users/User/Downloads/TUH EEG Corpus random files/abnormal/aaaaaaaq_s005_t001.edf: unpickling stack underflow
Error loading 
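
These errors are expected: `.edf` is the European Data Format, a binary format with a fixed-size ASCII header, not a Python pickle, so `pickle.load` can't parse it. The leading `0` byte of an EDF header is likely what triggers the "stack underflow" message, since pickle treats it as an instruction to pop from a stack that is still empty. The sketch below checks the first bytes of a file before choosing a loader. It assumes the standard EDF header, whose first 8 bytes are the version field `0` followed by seven spaces; `looks_like_edf` is a hypothetical helper name.

```python
import os
import pickle
import tempfile

EDF_VERSION_FIELD = b"0       "  # 8-byte version field at the start of an EDF header

def looks_like_edf(path):
    """Return True if the file starts with the EDF version field."""
    with open(path, "rb") as f:
        return f.read(8) == EDF_VERSION_FIELD

# Demo with two temporary files: a fake EDF header and a real pickle
with tempfile.TemporaryDirectory() as tmp:
    edf_path = os.path.join(tmp, "fake.edf")
    pkl_path = os.path.join(tmp, "data.pkl")
    with open(edf_path, "wb") as f:
        f.write(EDF_VERSION_FIELD + b" " * 248)  # pad to the 256-byte fixed header size
    with open(pkl_path, "wb") as f:
        pickle.dump({"RAW_DATA": [[0.0, 1.0]]}, f)
    edf_check = looks_like_edf(edf_path)
    pkl_check = looks_like_edf(pkl_path)

print(edf_check, pkl_check)
```

Files that pass this check should go to an EDF reader such as `pyedflib`, which is what Try 3 does.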

## Try 3 using pyedflib (this one worked!)

Important note: this won't run for you as-is. To save time, I pulled the 10 data files from my Downloads folder, because pulling them from Box wasn't working for me.

So either change the paths to wherever the folder is on your machine (it's in our shared Box folder, and it's called TUH EEG Corpus random files), or figure out how to load it from Box directly, because eventually we need a way to read all the data from one shared location.
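
Until the shared-location question is solved, one way to avoid editing hardcoded paths is to read the data root from an environment variable. This is only a sketch: `EEG_DATA_ROOT` and `resolve_data_folders` are hypothetical names, and the default simply mirrors the Downloads location used in the cell below.

```python
import os
from pathlib import Path

def resolve_data_folders(default_root="~/Downloads/TUH EEG Corpus random files"):
    """Return (normal_folder, abnormal_folder) under a configurable root.

    EEG_DATA_ROOT is a hypothetical environment variable; if it is unset,
    the default root is used instead.
    """
    root = Path(os.environ.get("EEG_DATA_ROOT", default_root)).expanduser()
    return root / "normal", root / "abnormal"

normal_folder, abnormal_folder = resolve_data_folders()
print(normal_folder)
print(abnormal_folder)
```

Each person can then point `EEG_DATA_ROOT` at wherever their copy of the folder lives. The returned `Path` objects work with `os.listdir` and `os.path.join`, so the rest of the code would not need to change.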

In [2]:
import os
import numpy as np
import pyedflib
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset
from torchvision import models

# Function to find the length of the shortest signal and the maximum number of channels
def find_min_length_and_max_channels(folder_paths):
    min_length = float('inf')
    max_channels = 0
    
    for folder in folder_paths:
        for file_name in os.listdir(folder):
            if file_name.endswith('.edf'):
                file_path = os.path.join(folder, file_name)
                with pyedflib.EdfReader(file_path) as f:
                    n_signals = f.signals_in_file
                    if n_signals > max_channels:
                        max_channels = n_signals  # Track maximum number of channels
                    for i in range(n_signals):
                        signal_length = len(f.readSignal(i))
                        if signal_length < min_length:
                            min_length = signal_length  # Update if a smaller signal length is found
    
    return min_length, max_channels

# Function to load signals from the .edf files and pad if necessary
def load_edf_signals(file_path, target_length, target_channels):
    with pyedflib.EdfReader(file_path) as f:
        n_signals = f.signals_in_file
        signal_data = []
        for i in range(n_signals):
            signal = f.readSignal(i)
            truncated_signal = signal[:target_length]  # Truncate signal to target length
            signal_data.append(truncated_signal)
        
        # Pad channels with zeros if fewer than the target number of channels
        while len(signal_data) < target_channels:
            signal_data.append(np.zeros(target_length))
        
        signal_data = np.array(signal_data)  # Convert list to numpy array for consistency
    return signal_data

# Prepare data and labels function
def prepare_data(normal_folder, abnormal_folder, target_length, target_channels):
    data = []
    labels = []
    
    # Load normal files
    for file_name in os.listdir(normal_folder):
        if file_name.endswith('.edf'):
            file_path = os.path.join(normal_folder, file_name)
            signal_data = load_edf_signals(file_path, target_length, target_channels)
            data.append(signal_data)
            labels.append(0)  # Label 0 for normal
    
    # Load abnormal files
    for file_name in os.listdir(abnormal_folder):
        if file_name.endswith('.edf'):
            file_path = os.path.join(abnormal_folder, file_name)
            signal_data = load_edf_signals(file_path, target_length, target_channels)
            data.append(signal_data)
            labels.append(1)  # Label 1 for abnormal
    
    return np.array(data), np.array(labels)

# Paths to the folders
normal_folder = '/Users/User/Downloads/TUH EEG Corpus random files/normal'
abnormal_folder = '/Users/User/Downloads/TUH EEG Corpus random files/abnormal'

# Find the minimum signal length and maximum number of channels across both folders
folders = [normal_folder, abnormal_folder]
min_signal_length, max_channels = find_min_length_and_max_channels(folders)
print(f"Minimum signal length found: {min_signal_length}")
print(f"Maximum number of channels found: {max_channels}")

# Load and prepare the dataset with the minimum signal length and maximum channels
data, labels = prepare_data(normal_folder, abnormal_folder, min_signal_length, max_channels)

# Add a trailing singleton axis so each recording becomes a one-channel 2D "image":
# (n_files, num_channels, signal_length) -> (n_files, num_channels, signal_length, 1)
data = np.expand_dims(data, axis=-1)

# Check the shapes of data and labels
print("Data shape:", data.shape)
print("Labels shape:", labels.shape)

# PyTorch conv layers expect (batch_size, channels, height, width), so the target shape is (batch_size, 1, num_channels, signal_length)
data = torch.tensor(data, dtype=torch.float32)  # Data is (batch_size, num_channels, signal_length, 1)
data = data.permute(0, 3, 1, 2)  # Reorder axes to (batch_size, 1, num_channels, signal_length)

labels = torch.tensor(labels, dtype=torch.long)  # Labels as long tensor

# Create a dataset and dataloader
dataset = TensorDataset(data, labels)
dataloader = DataLoader(dataset, batch_size=2, shuffle=True)

Minimum signal length found: 1184
Maximum number of channels found: 36
Data shape: (10, 36, 1184, 1)
Labels shape: (10,)
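
The printed shape `(10, 36, 1184, 1)` is channels-last, while PyTorch's `Conv2d` layers expect `(batch, channels, height, width)`, which is why the cell permutes the axes after converting to a tensor. As a small sketch, the same layout can be reached in one step by inserting the singleton axis at position 1. The array below is random noise with the same shape as the stacked recordings, not real EEG.

```python
import numpy as np

# Synthetic stand-in with the same layout as the stacked recordings:
# (n_files, num_channels, signal_length)
stacked = np.random.default_rng(0).normal(size=(10, 36, 1184))

# Route used in the notebook: append a trailing axis, then move it to position 1
channels_last = np.expand_dims(stacked, axis=-1)         # (10, 36, 1184, 1)
via_permute = np.transpose(channels_last, (0, 3, 1, 2))  # (10, 1, 36, 1184)

# One-step alternative: insert the singleton axis where PyTorch expects it
direct = np.expand_dims(stacked, axis=1)                 # (10, 1, 36, 1184)

print(via_permute.shape, direct.shape, np.array_equal(via_permute, direct))
```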


In [20]:
# Use a prebuilt ResNet model and adjust the input for 2D data
class ResNet2D(nn.Module):
    def __init__(self):
        super(ResNet2D, self).__init__()
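        # Load a pretrained ResNet (we'll modify it for EEG input)
        # Note: newer torchvision versions deprecate `pretrained=True` in favor of the `weights=` argument
        self.resnet = models.resnet18(pretrained=True)
        # Replace the first convolutional layer so it accepts 1 input channel (from EEG);
        # the new layer is randomly initialized, so the pretrained conv1 weights are not reused
        self.resnet.conv1 = nn.Conv2d(1, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)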
        # Change the final fully connected layer to match the number of classes (normal vs abnormal)
        num_ftrs = self.resnet.fc.in_features
        self.resnet.fc = nn.Linear(num_ftrs, 2)  # 2 classes (normal and abnormal)

    def forward(self, x):
        return self.resnet(x)

# Initialize the model
model = ResNet2D()
# Loss function and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

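# Training loop
num_epochs = 10
model.train()  # Make sure layers such as batch norm are in training mode
for epoch in range(num_epochs):
    running_loss = 0.0  # Accumulate the loss for the epoch
    
    # Use a batch-specific name so the full `labels` tensor defined earlier is not overwritten
    for inputs, batch_labels in dataloader:
        optimizer.zero_grad()  # Zero the parameter gradients
        
        # Forward pass
        outputs = model(inputs)
        loss = criterion(outputs, batch_labels)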
        
        # Backward pass and optimization
        loss.backward()
        optimizer.step()
        
        # Accumulate the loss for this batch
        running_loss += loss.item()
    
    # Calculate the average loss for the epoch
    avg_loss = running_loss / len(dataloader)
    
    # Print the average loss for the epoch
    print(f"[Epoch {epoch + 1}] Average Loss: {avg_loss:.3f}")

print("Finished Training")

[Epoch 1] Average Loss: 2.006
[Epoch 2] Average Loss: 1.562
[Epoch 3] Average Loss: 0.415
[Epoch 4] Average Loss: 0.459
[Epoch 5] Average Loss: 0.366
[Epoch 6] Average Loss: 0.479
[Epoch 7] Average Loss: 0.956
[Epoch 8] Average Loss: 1.192
[Epoch 9] Average Loss: 0.726
[Epoch 10] Average Loss: 0.514
Finished Training
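
These are training losses on 10 files, and they bounce around rather than settling, so they say little about how well the model generalizes. Accuracy on held-out files would be more informative. The sketch below computes accuracy from a batch of class scores with NumPy. `accuracy_from_logits` is a hypothetical helper and the numbers are made up for illustration; with the model above, the scores would come from something like `model(inputs).detach().numpy()`.

```python
import numpy as np

def accuracy_from_logits(logits, labels):
    """Fraction of rows whose highest-scoring class matches the label."""
    predictions = np.argmax(np.asarray(logits), axis=1)
    return float(np.mean(predictions == np.asarray(labels)))

# Made-up scores for 4 recordings and 2 classes (0 = normal, 1 = abnormal)
demo_logits = np.array([[2.0, -1.0], [0.1, 0.3], [1.5, 1.4], [-0.5, 0.7]])
demo_labels = np.array([0, 1, 1, 1])
demo_accuracy = accuracy_from_logits(demo_logits, demo_labels)
print(f"Accuracy: {demo_accuracy:.2f}")
```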


In [1]:
rsync -auxvL --delete nedc-tuh-eeg@www.isip.piconepress.com:data/tuh_eeg/TEST .

SyntaxError: invalid syntax (3648138051.py, line 1)
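
The `SyntaxError` comes from running a shell command as Python: `rsync` is a command-line tool, so the cell is parsed as Python code and fails on the first line. In a Jupyter notebook, prefixing the line with `!` hands it to the shell instead. Another option is Python's `subprocess` module, sketched below. `build_rsync_command` and `sync_test_data` are hypothetical helper names; the flags and remote path are copied from the cell above, and the transfer still needs `rsync` installed and valid credentials for the server.

```python
import shutil
import subprocess

REMOTE = "nedc-tuh-eeg@www.isip.piconepress.com:data/tuh_eeg/TEST"

def build_rsync_command(remote=REMOTE, destination="."):
    """Build the argument list for the rsync call used in the notebook cell."""
    return ["rsync", "-auxvL", "--delete", remote, destination]

def sync_test_data(destination="."):
    """Run rsync; raises if rsync is missing or exits with an error."""
    if shutil.which("rsync") is None:
        raise RuntimeError("rsync is not installed or not on PATH")
    subprocess.run(build_rsync_command(destination=destination), check=True)

print(" ".join(build_rsync_command()))
# Call sync_test_data() to start the actual transfer
```

Note that `--delete` removes files in the destination that don't exist on the remote side, so it is safer to sync into a dedicated data folder than into the notebook's working directory.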