### **Autoencoder-Based Latent Feature Extraction and Classification for EEG Signals: A Report**

---

### **Background and Motivation**

EEG signals provide valuable information about brain activity and are often used in medical research to predict outcomes, such as the recovery potential of patients. However, the high dimensionality and complexity of raw EEG data present significant challenges for effective analysis. To address these challenges, we explored the use of a CNN-based Variational Autoencoder (VAE) to extract low-dimensional latent features from 5-minute smoothed EEG signals. These features are subsequently classified using machine learning models (e.g., SVM, XGBoost) to predict patient recovery outcomes.

The motivation for this approach stems from:
1. The need for dimensionality reduction to simplify complex data while retaining critical information.
2. Improving classification performance by identifying a compact and representative feature space.
3. Leveraging unsupervised learning (VAE) to enhance feature generalizability across patient data.

---

### **Methodology**

#### **1. Data Preprocessing**

- **Input Data**: EEG signals were collected and preprocessed into 5-minute smoothed segments, with extracted features including EEG and spike Hz values.
- **Normalization**: Signals were normalized to ensure consistent scaling across samples.
- **Padding/Truncation**: To handle varying signal lengths, all signals were either padded or truncated to a uniform length of 1139 data points.
- **Feature Extraction**: Initial experiments used the raw padded data; additional methods such as wavelet transforms and power spectral density (PSD) analysis were available but not fully utilized at this stage.

#### **2. Model Architecture**

- **Encoder**:
  - A stack of four 1D convolutional layers with decreasing channel sizes, interleaved with Leaky ReLU activations.
  - Fully connected layers to map the final convolutional outputs to latent mean and log variance vectors.
- **Decoder**:
  - Fully connected layers to reshape latent vectors into feature maps.
  - A mirrored stack of transposed convolutional layers to reconstruct the input signal.
- **Latent Space**:
  - Dimensionalities tested: [1, 2, 3, 5, 7, 10, 20].
  - Latent representations were extracted and used as input features for downstream classification.
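
The encoder outputs a mean and a log-variance vector for each signal. A VAE usually draws its latent sample from these with the reparameterization trick. The sketch below is a minimal NumPy illustration, assuming a diagonal Gaussian posterior; the report does not state how latent features were extracted, so using the mean vector as the classifier input is an assumption.

```python
import numpy as np

def reparameterize(mu, logvar, rng):
    """Draw z = mu + sigma * eps with eps ~ N(0, I) (diagonal Gaussian posterior)."""
    std = np.exp(0.5 * logvar)           # log-variance -> standard deviation
    eps = rng.standard_normal(mu.shape)  # noise is sampled outside the network
    return mu + std * eps

rng = np.random.default_rng(0)
mu = np.zeros((4, 7))                    # illustrative batch of 4, latent dimension 7
logvar = np.full((4, 7), -2.0)
z = reparameterize(mu, logvar, rng)      # stochastic sample used during training
features = mu                            # assumption: the mean serves as the downstream feature
```

Using the mean rather than a random sample keeps the extracted features deterministic, which makes classifier results easier to reproduce.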

#### **3. Training and Loss Function**

- **Loss Components**:
  - Reconstruction loss: Mean squared error between input and reconstructed signals.
  - KL divergence loss: Regularization that pulls the encoder's approximate posterior toward the latent prior (typically a standard normal), which encourages a smooth latent space.
- **Optimization**:
  - Adam optimizer with a learning rate of 1e-3 and weight decay of 1e-4.
  - Early stopping with a patience of 10 epochs.
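
The two loss components can be written out as follows. This is a sketch under stated assumptions: a standard normal prior, the closed-form Gaussian KL term, and a hypothetical `beta` weight (the report does not say how the two terms were weighted).

```python
import numpy as np

def vae_loss(x, x_recon, mu, logvar, beta=1.0):
    """Mean squared reconstruction error plus beta-weighted KL divergence to N(0, I)."""
    recon = np.mean((x - x_recon) ** 2)
    # Closed-form KL(N(mu, sigma^2) || N(0, I)): summed over latent dims, averaged over the batch.
    kl = np.mean(-0.5 * np.sum(1.0 + logvar - mu ** 2 - np.exp(logvar), axis=1))
    return recon + beta * kl, recon, kl

x = np.ones((2, 1139))                   # two signals of the padded length
total, recon, kl = vae_loss(x, x, np.zeros((2, 7)), np.zeros((2, 7)))
# A perfect reconstruction with mu = 0 and logvar = 0 gives zero for both terms.
```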

#### **4. Classification**

- **Models Used**:
  - SVM (linear kernel) and XGBoost classifiers.
  - Balanced class weights to address imbalanced dataset issues.
- **Evaluation Metrics**:
  - Classification accuracy, confusion matrix, and classification report (precision, recall, F1-score).
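
The report does not say how the balanced class weights were computed. One common heuristic, the one scikit-learn documents for `class_weight="balanced"`, weights each class by `n_samples / (n_classes * class_count)`. The sketch below reproduces that formula in NumPy; the label counts are illustrative, not the study's.

```python
import numpy as np

def balanced_class_weights(y):
    """Weight each class inversely to its frequency: n_samples / (n_classes * count)."""
    y = np.asarray(y)
    classes, counts = np.unique(y, return_counts=True)
    weights = len(y) / (len(classes) * counts)
    return dict(zip(classes.tolist(), weights.tolist()))

y = [0] * 80 + [1] * 20                  # illustrative labels: 80 Bad, 20 Good
weights = balanced_class_weights(y)
print(weights)                           # {0: 0.625, 1: 2.5}
```

With these weights, each misclassified minority sample costs four times as much as a majority sample, so both classes contribute equally to the total loss.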

#### **5. Visualization**

- PCA and t-SNE were applied to latent features for dimensionality reduction and visualization.
- Labels (Good/Bad Outcome) were used to assess feature separability in the latent space.
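
As a rough illustration of the PCA step, the latent features can be projected onto their first two principal components with a singular value decomposition. The random `latents` array below is a stand-in for the real features, which are not part of this report.

```python
import numpy as np

def pca_2d(features):
    """Project features onto their first two principal components."""
    centered = features - features.mean(axis=0)
    # Rows of vt are principal directions, ordered by decreasing singular value.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    explained = (s ** 2) / np.sum(s ** 2)    # fraction of variance per component
    return centered @ vt[:2].T, explained[:2]

rng = np.random.default_rng(0)
latents = rng.standard_normal((60, 7))       # placeholder for 60 signals, latent dimension 7
coords, ratio = pca_2d(latents)              # coords can be scatter-plotted, colored by outcome label
```

The explained-variance ratio is worth checking alongside the plot: if the first two components capture little variance, overlap in the 2D view may not reflect overlap in the full latent space.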

---

### **Results**

#### **1. Classification Performance**

| Latent Dimension | SVM Accuracy | XGBoost Accuracy |
|------------------|--------------|-------------------|
| 1                | 0.6667       | 0.6800            |
| 2                | 0.7000       | 0.7167            |
| 3                | 0.7333       | 0.7500            |
| 5                | 0.7667       | 0.7833            |
| 7                | **0.8000**   | **0.8300**        |
| 10               | 0.7833       | 0.7900            |
| 20               | 0.7833       | 0.7800            |

- **Optimal Dimension**: A latent dimension of 7 achieved the highest classification accuracy for both classifiers: 0.8000 for SVM and 0.8300 for XGBoost.
- **Dimensionality Trade-Off**: Lower dimensions (1-2) lacked sufficient representational capacity (accuracy at or below 0.7167), while higher dimensions (10-20) did not improve on dimension 7 and scored slightly lower (0.7800-0.7900).

#### **2. Visualization**

- **PCA**: Latent features showed partial clustering of Good and Bad outcomes, with some overlap.
- **t-SNE**: Improved separability between classes, suggesting meaningful latent representations.

#### **3. Training Metrics**

- Reconstruction loss and KL divergence both converged within 50 epochs.
- Early stopping prevented overfitting during training.

---

### **Discussion**

- **Key Findings**:
  - Latent features effectively reduced data dimensionality while preserving critical information for classification.
  - A balanced latent dimension (e.g., 7) provided the best trade-off between feature expressiveness and model complexity.

- **Challenges**:
  - Class imbalance: Positive (Good Outcome) samples were underrepresented, leading to more false negatives.
  - Computational cost: Higher latent dimensions significantly increased training time.

---

### **Next Steps**

1. **Feature Engineering**:
   - Incorporate wavelet and PSD-based features into the input pipeline.
   - Experiment with multimodal data integration (e.g., combining EEG with other biomarkers).

2. **Model Optimization**:
   - Test advanced architectures like Transformer-based autoencoders.
   - Use hyperparameter tuning (e.g., grid search) to optimize classifier settings.

3. **Class Balancing**:
   - Employ oversampling or data augmentation for underrepresented classes.
   - Use focal loss to mitigate class imbalance during training.

4. **Validation**:
   - Extend evaluation to external datasets for robustness testing.
   - Perform cross-validation to ensure generalizability.

---




# Contrastive Learning-Based Feature Extraction and Classification for EEG Signals: Report

---

## Background and Motivation

EEG (electroencephalogram) signals are a valuable source of information about brain activity, especially for predicting patient recovery outcomes. However, the high dimensionality and complexity of EEG data pose significant challenges for analysis and modeling. Initially, we explored Autoencoder-based approaches, but their reconstruction-oriented objectives did not directly improve classification performance, particularly in scenarios with severe class imbalance.

The limitations of Autoencoder methods are as follows:
1. **Unclear task alignment**: Autoencoders optimize for signal reconstruction, which does not directly enhance classification performance.
2. **Limited class differentiation**: The latent space does not explicitly encode differences between classes, resulting in poor separability for Good Outcome and Bad Outcome samples.
3. **Weak adaptation to class imbalance**: Autoencoders tend to focus on majority class features, neglecting minority class characteristics.

To address these issues, we turned to contrastive learning, which explicitly optimizes the similarity between positive pairs (samples from the same class) and the dissimilarity between negative pairs (samples from different classes). This approach enhances the discriminative power of the latent features, particularly under class imbalance conditions.

---

## Methods and Implementation Details

### 1. Data Preparation

1. **Data Loading and Preprocessing**:
   - **Input data**: 5-minute EEG signal segments labeled based on patient recovery outcomes (Good Outcome or Bad Outcome).
   - **Normalization**: Signals were standardized to ensure consistent scaling across samples.
   - **Length adjustment**: Signals shorter than 1139 data points were padded with `-1`, while longer signals were truncated to 1139 points for uniform input length.
   - **Class distribution**: The dataset exhibited significant class imbalance, with fewer Good Outcome samples than Bad Outcome samples.

2. **Label Mapping**:
   - Recovery outcomes were binarized: Good Outcome labeled as `1` and Bad Outcome labeled as `0`.
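
The normalization and length-adjustment steps might look like the sketch below. It assumes single-channel 1D signals, per-signal z-scoring, and padding or truncation at the end of the signal; none of these details are stated in the report. Standardizing before padding keeps the `-1` fill values out of the mean and standard deviation.

```python
import numpy as np

TARGET_LEN = 1139     # uniform input length from the report
PAD_VALUE = -1.0      # padding value from the report

def standardize(signal, eps=1e-8):
    """Z-score a single signal (assumption: statistics are computed per signal)."""
    signal = np.asarray(signal, dtype=np.float64)
    return (signal - signal.mean()) / (signal.std() + eps)

def fix_length(signal, target_len=TARGET_LEN, pad_value=PAD_VALUE):
    """Truncate to target_len, or right-pad with pad_value (assumption: padding at the end)."""
    signal = np.asarray(signal, dtype=np.float64)[:target_len]
    out = np.full(target_len, pad_value, dtype=np.float64)
    out[: len(signal)] = signal
    return out

short = fix_length(standardize(np.arange(1000)))   # padded with 139 values of -1
long_ = fix_length(standardize(np.arange(1500)))   # truncated to the first 1139 points
```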

---

### 2. Contrastive Learning Dataset Construction

- **Sample Generation**:
  - Each signal was treated as an anchor, with one positive sample (from the same class) and one negative sample (from a different class) dynamically generated for training.
  - Random sampling ensured diversity in positive and negative pairs.
- **Class Imbalance Mitigation**:
  - Due to the lower number of Good Outcome samples, negative pairs involving the majority class were generated more frequently to enhance minority class feature learning.
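
A minimal sketch of the anchor/positive/negative sampling described above, using uniform random choice. The function name `sample_triplet` and the toy labels are hypothetical, and the class-dependent sampling frequency mentioned in the report is not reproduced here because its details are not given.

```python
import random

def sample_triplet(anchor_idx, labels, rng):
    """Pick one same-class positive and one other-class negative for the anchor."""
    anchor_label = labels[anchor_idx]
    positives = [i for i, y in enumerate(labels) if y == anchor_label and i != anchor_idx]
    negatives = [i for i, y in enumerate(labels) if y != anchor_label]
    # Requires at least two samples in the anchor's class and one in the other class.
    return anchor_idx, rng.choice(positives), rng.choice(negatives)

labels = [0] * 8 + [1] * 3          # toy labels: 8 Bad Outcome, 3 Good Outcome
rng = random.Random(0)
triplets = [sample_triplet(i, labels, rng) for i in range(len(labels))]
```

Calling the sampler again each epoch yields new pairs for the same anchor, which is one way to read "dynamically generated".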

---

### 3. Model Architecture

1. **Contrastive Encoder**:
   - **Objective**: Map high-dimensional EEG signals to a low-dimensional latent space while enhancing class separability.
   - **Architecture**:
     - Two 1D convolutional layers:
       - First layer: 128 channels, kernel size of 3, ReLU activation.
       - Second layer: 64 channels, kernel size of 3, ReLU activation.
     - Fully connected layer to flatten the convolutional outputs and map them to a 10-dimensional latent space.
   - **Latent Dimension**: Set to 10, balancing feature expressiveness and model complexity.

2. **Loss Function (NT-Xent Loss)**:
   - **Objective**: Maximize similarity between positive pairs and minimize similarity between negative pairs.
   - **Formula** (for a single anchor-positive-negative triplet):
     \[
     L = -\log\frac{\exp(\text{sim}(\mathbf{z}_a, \mathbf{z}_p)/\tau)}{\exp(\text{sim}(\mathbf{z}_a, \mathbf{z}_p)/\tau) + \exp(\text{sim}(\mathbf{z}_a, \mathbf{z}_n)/\tau)}
     \]
     where \(\tau\) is the temperature parameter (set to 0.5 in this study), and \(\text{sim}(\cdot, \cdot)\) denotes cosine similarity.
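
The formula above can be evaluated directly. The NumPy sketch below implements it for a batch of triplets with cosine similarity and the reported temperature of 0.5; the function names are hypothetical, and the small `eps` in the denominator is an added safeguard against zero-length vectors.

```python
import numpy as np

def cosine_sim(a, b, eps=1e-8):
    """Row-wise cosine similarity between two batches of vectors."""
    num = np.sum(a * b, axis=-1)
    den = np.linalg.norm(a, axis=-1) * np.linalg.norm(b, axis=-1) + eps
    return num / den

def triplet_nt_xent(z_a, z_p, z_n, tau=0.5):
    """Temperature-scaled loss with one positive and one negative per anchor."""
    pos = cosine_sim(z_a, z_p) / tau
    neg = cosine_sim(z_a, z_n) / tau
    # -log(e^pos / (e^pos + e^neg)) == logaddexp(pos, neg) - pos, computed stably.
    return np.mean(np.logaddexp(pos, neg) - pos)

z_a = np.array([[1.0, 0.0]])
easy = triplet_nt_xent(z_a, z_a, -z_a)   # positive aligned, negative opposite: about 0.018
hard = triplet_nt_xent(z_a, z_a, z_a)    # negative as similar as the positive: log(2), about 0.693
```

The two cases show the range of the loss at this temperature: it approaches zero when the negative points away from the anchor and equals log(2) when the encoder cannot tell positive from negative.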

---

### 4. Training Process

1. **Training Configuration**:
   - Optimizer: Adam with a learning rate of 1e-3.
   - Batch size: 32.
   - Epochs: 100, with early stopping to prevent overfitting.

2. **Progress Monitoring and Visualization**:
   - The average loss per epoch was recorded, and a loss convergence curve was plotted.
   - Dynamic generation of positive and negative pairs ensured diverse training data in each batch.
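
Early stopping can be sketched as a small counter over the monitored loss. The patience of 10 is borrowed from the autoencoder experiments; the report does not state the patience or the monitored quantity for the contrastive model, so both are assumptions, and the loss values below are made up for illustration.

```python
class EarlyStopping:
    """Signal a stop after `patience` consecutive epochs without improvement."""

    def __init__(self, patience=10, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, loss):
        """Record one epoch's loss; return True when training should stop."""
        if loss < self.best - self.min_delta:
            self.best = loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience

losses = [1.0, 0.8, 0.7] + [0.7] * 20    # hypothetical per-epoch losses
stopper = EarlyStopping(patience=10)
stopped_at = None
for epoch, loss in enumerate(losses, start=1):
    if stopper.step(loss):
        stopped_at = epoch               # epoch 13: ten epochs after the last improvement
        break
```

Monitoring a held-out validation loss rather than the training loss is the usual choice when the goal is to limit overfitting.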

---

### 5. Feature Extraction and Classification

1. **Feature Extraction**:
   - The trained contrastive encoder was used to extract low-dimensional latent features from the test set.
   - PCA was applied to reduce the 10-dimensional latent features to 2D for visualization of Good Outcome and Bad Outcome distributions.

2. **Classification Tasks**:
   - SVM (linear kernel) and XGBoost classifiers were employed to evaluate the extracted features.
   - Evaluation metrics included accuracy, classification reports (Precision, Recall, F1-score), and confusion matrices.

---

## Results

### 1. Classification Performance

| Classifier  | Accuracy | Precision (Good) | Recall (Good) | F1-Score (Good) |
|-------------|----------|------------------|---------------|-----------------|
| **SVM**     | 0.7955   | 0.36             | 0.27          | 0.31            |
| **XGBoost** | 0.7500   | 0.11             | 0.07          | 0.08            |

- **Analysis**:
  - SVM outperformed XGBoost on every reported metric, most notably Recall for the minority class (Good Outcome): 0.27 versus 0.07.
  - Class imbalance significantly impacted classification metrics for the Good Outcome class: its F1-score stayed at or below 0.31 even though overall accuracy was 0.75 or higher.
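
To see how high accuracy and low minority-class recall can coexist, the metrics can be computed from confusion-matrix counts. The counts below are hypothetical: they were chosen only because they round to the SVM row of the table, and the report does not give the actual confusion matrix or test-set size.

```python
def binary_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall, and F1 for the positive (Good Outcome) class."""
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": (tp + tn) / total, "precision": precision, "recall": recall, "f1": f1}

# Hypothetical counts, NOT taken from the report (Good Outcome = positive class).
metrics = binary_metrics(tp=4, fp=7, fn=11, tn=66)
print({k: round(v, 4) for k, v in metrics.items()})
# {'accuracy': 0.7955, 'precision': 0.3636, 'recall': 0.2667, 'f1': 0.3077}
```

In this illustration most of the accuracy comes from correctly classified majority-class samples, which is why accuracy alone is a weak summary under class imbalance.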

---

### 2. Visualization

- **PCA Results**:
  - Good Outcome and Bad Outcome samples showed partial separability in the 2D latent space, with some overlap.
  - Further optimization of sample generation strategies may improve separability.

---

### 3. Loss Convergence Curve

- The loss converged within 50 epochs, indicating stable training; early stopping was used to guard against overfitting.

---

## Discussion and Improvements

### Key Findings
1. Contrastive learning yielded latent features with partial class separability in the PCA projection, and an SVM trained on them reached 0.7955 accuracy, although Good Outcome recall remained low (0.27).
2. Dynamic generation of positive and negative pairs effectively mitigated class imbalance to some extent.

### Main Challenges
1. **Class Imbalance**: Limited samples for the Good Outcome class constrained classification performance.
2. **Feature Overlap**: Despite improvements, some overlap between Good Outcome and Bad Outcome features remained.

### Future Directions
1. **Data Augmentation**:
   - Employ oversampling or noise addition techniques to increase the representation of minority class samples.
2. **Model Optimization**:
   - Explore advanced contrastive learning architectures (e.g., SimCLR, BYOL) and integrate multimodal features.
3. **External Validation**:
   - Test the model’s generalizability on additional datasets.

---

## Conclusion

This study explored contrastive learning for extracting discriminative features from EEG signals to classify patient recovery outcomes. The learned features showed partial class separability and supported an SVM accuracy of 0.7955, although class imbalance kept Good Outcome recall low. Future work will focus on data augmentation and model optimization to further improve performance and generalizability.