### 0. Explanation of the approach (Theory, reasoning)
**Pre-trained CLIP Embeddings vs. Raw EEG Data**

- CLIP embeddings: CLIP is a model that has been pre-trained on a massive dataset of images and text, so its embeddings contain rich semantic and visual information. These embeddings are already in a high-dimensional space that captures complex features of images and their associated textual descriptions.
- EEG data: EEG signals, on the other hand, are raw brain activity measurements that are highly noisy and subject-specific. The signal varies depending on individual neurological patterns, making it difficult to directly extract meaningful high-level features from EEG.

*Why EEG → CLIP works better:*

- Pre-trained semantic knowledge: CLIP embeddings already provide a well-established representation of visual and textual information. Mapping EEG signals onto these embeddings leverages the model's learned structure and ties brain activity to meaningful, high-level features (for example, whether the brain is processing visual or semantic information). The reverse direction, CLIP → EEG, would require generating noisy, subject-specific EEG signals from abstract representations, which is much harder because EEG does not directly encode the structured visual or semantic information that CLIP does.

**Meaningful Associations (Visual vs. Semantic)**

The goal is to identify *when* the EEG data corresponds to visual vs. semantic information from CLIP embeddings. EEG reflects brain activity related to perception, memory, and cognition, so it should carry traces of both visual and semantic processing.

*EEG → CLIP* lets us train a model to learn which brain patterns correspond to which CLIP embeddings (visual or semantic), making it easier to match brain activity to its associated high-level features in CLIP.

*Why EEG → CLIP is preferred:*

The brain's neural responses are grounded in perception and cognition (which are visual or semantic in nature). This means that mapping EEG to CLIP allows you to uncover how different brain states (such as processing visual vs. semantic information) are reflected in the EEG data.

**!!** CLIP → EEG would not have a clear mapping because CLIP is designed to produce high-level embeddings, not brain signals. The relationship between EEG data and CLIP embeddings is more about mapping the brain's interpretation of images or text, rather than generating EEG signals from abstract embeddings.

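**Implementation**

Points to consider about the design matrix X in a linear regression:
- X is N x F (samples x features): N should be larger than F, otherwise the fit is underdetermined and relies entirely on regularization
- I have 4 images x 150 reps x 64 channels x 900 time points
- y needs to be 4 x 150 x 768, i.e. each image's 768-dimensional CLIP embedding repeated for each of its 150 reps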
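
The shape bookkeeping above can be sketched on random data. This is only an illustration: the number of time points is set to 90 here as an assumption (to keep the array small), a single time point is used as the feature window, and all variable names are placeholders.

```python
import numpy as np

# Sizes from the notes above; n_times = 90 is an assumption made for this sketch
n_images, n_trials, n_channels, n_times, clip_dim = 4, 150, 64, 90, 768

rng = np.random.default_rng(0)
eeg = rng.standard_normal((n_images, n_trials, n_channels, n_times))
clip = rng.standard_normal((n_images, clip_dim))

# X for one time point: images and trials collapse into samples, channels are features
t = 0
X = eeg[:, :, :, t].reshape(n_images * n_trials, n_channels)  # (600, 64): N = 600 > F = 64

# y: each image's embedding repeated once per trial, in the same image-major order as X
y = np.repeat(clip, n_trials, axis=0)  # (600, 768)

# Using every time point as a feature at once would give F = 64 * 90 = 5760 > N
X_all = eeg.reshape(n_images * n_trials, -1)  # (600, 5760)

print(X.shape, y.shape, X_all.shape)
```

Fitting one model per time point (or per small window) keeps F well below N, which is the reason for the per-window loop in the skeleton in section 4.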



### 1. Simulate captions (LLaVA rich)
### 2. Extract CLIP embeddings

### 3. Use a Simple Model
Start with a linear regression or a simple multi-layer perceptron (MLP):
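- **Input:** Flattened EEG data: 4 images × 150 trials × 64 channels × 30, collapsed to 4*150 = 600 samples × 64 channels per time window [90 time points, chunked in pairs]. Train on 75% and test on 25%, i.e. leave-one-image-out (LOIO): 3 images for training, 1 for testing.
- **Output:** CLIP embeddings (4 × 768), one 768-dimensional vector per image.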

A simple model is less likely to overfit your limited data and can still learn a basic mapping.
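
As a rough sketch of the two candidate models, the snippet below fits scikit-learn's `Ridge` and `MLPRegressor` on random stand-in data with the shapes of one LOIO fold and one time point. The hidden layer size, regularization values, and iteration count are arbitrary assumptions, not tuned settings.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)

# Stand-in data: 450 training trials, 150 test trials, 64 channel features
X_train = rng.standard_normal((450, 64))
y_train = rng.standard_normal((450, 768))
X_test = rng.standard_normal((150, 64))

# Linear baseline with L2 regularization
ridge = Ridge(alpha=1.0).fit(X_train, y_train)

# Small MLP with one hidden layer; alpha is its L2 penalty (values are assumptions).
# With so few iterations scikit-learn may emit a ConvergenceWarning.
mlp = MLPRegressor(hidden_layer_sizes=(128,), alpha=1e-2, max_iter=50, random_state=0)
mlp.fit(X_train, y_train)

ridge_preds = ridge.predict(X_test)  # (150, 768)
mlp_preds = mlp.predict(X_test)      # (150, 768)
print(ridge_preds.shape, mlp_preds.shape)
```

Both estimators share the `fit`/`predict` interface, so the MLP can later be swapped in wherever the ridge model is used.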


### 4. Code Skeleton idea (Ridge Regression, L2)
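
For reference, this is the standard ridge objective written for the EEG → CLIP setting. The symbols are introduced here for illustration only: $X \in \mathbb{R}^{N \times F}$ is the EEG design matrix for one time window, $Y \in \mathbb{R}^{N \times 768}$ holds the target CLIP embeddings, $W \in \mathbb{R}^{F \times 768}$ are the weights, and $\alpha$ is the regularization strength. Ignoring the intercept (which scikit-learn fits separately and does not penalize):

$$
\hat{W} = \arg\min_{W} \; \lVert XW - Y \rVert_F^2 + \alpha \lVert W \rVert_F^2
        = \left( X^\top X + \alpha I_F \right)^{-1} X^\top Y
$$

For $\alpha > 0$ the matrix $X^\top X + \alpha I_F$ is always invertible, which is why ridge still gives a unique solution when N is not much larger than F.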

In [22]:
import numpy as np
from tqdm import tqdm
from sklearn.linear_model import Ridge
from sklearn.model_selection import LeaveOneOut

# -----------------------------
# Configurable Parameters
# -----------------------------
chunk_size = 2  # Number of time points to group together (set to 1 for no chunking)
alpha = 1.0     # Regularization strength for Ridge regression

# -----------------------------
# Simulated Data (Replace with real EEG & CLIP embeddings)
# -----------------------------
num_images = 4
num_trials = 150
num_channels = 64
num_timepoints = 90
clip_dim = 768  # CLIP embedding size

# EEG Data: (Images x Trials x Channels x Time)
EEG_data = np.random.rand(num_images, num_trials, num_channels, num_timepoints)  
# CLIP Embeddings: (Images x 768)
CLIP_embeddings = np.random.rand(num_images, clip_dim)  

# -----------------------------
# Leave-One-Image-Out CV
# -----------------------------
loo = LeaveOneOut()

for train_image_idxs, test_image_idx in loo.split(np.arange(num_images)):
    # LeaveOneOut yields index arrays, so test_image_idx has shape (1,)
    print(f"\n--- Testing on Image {test_image_idx} ---")

    # -----------------------------
    # Prepare Training Data
    # -----------------------------
    train_EEG = EEG_data[train_image_idxs]  # (3, 150, 64, 90)
    train_CLIP = CLIP_embeddings[train_image_idxs]  # (3, 768)

    # Reshape: Collapse images and trials into one dimension (image-major order)
    train_EEG = train_EEG.reshape(-1, num_channels, num_timepoints)  # (3*150=450, 64, 90)

    # TODO: Whiten data: Guggenmos approach (create a function)

    # -----------------------------
    # Prepare Test Data
    # -----------------------------
    # Indexing with an array keeps a leading axis of length 1
    test_EEG = EEG_data[test_image_idx]  # (1, 150, 64, 90)
    test_CLIP = CLIP_embeddings[test_image_idx]  # (1, 768)

    # Targets (CLIP embeddings) do not depend on time, so build them once per fold
    train_y = np.repeat(train_CLIP, num_trials, axis=0)  # (450, 768), same order as train_EEG
    test_y = np.tile(test_CLIP, (num_trials, 1))  # (150, 768)

    fold_similarity = []  # Mean cosine similarity per time window for this fold

    # -----------------------------
    # Train Separate Models for Each Time Window
    # -----------------------------
    for t in tqdm(range(0, num_timepoints - chunk_size + 1, chunk_size), desc = "Time points"):
        # Chunk time points if needed; channels and chunked time points become features
        train_X = train_EEG[:, :, t:t+chunk_size].reshape(len(train_EEG), -1)  # (450, chunk_size * 64)
        test_X = test_EEG[:, :, :, t:t+chunk_size].reshape(num_trials, -1)  # (150, chunk_size * 64)

        # Ridge Regression (Can switch to MLP later)
        model = Ridge(alpha=alpha)
        model.fit(train_X, train_y)

        # Predictions
        test_preds = model.predict(test_X)  # (150, 768)

        # Compute Cosine Similarity (averaged over test trials)
        cosine_similarity = np.mean(np.sum(test_preds * test_y, axis=1) /
                                    (np.linalg.norm(test_preds, axis=1) * np.linalg.norm(test_y, axis=1)))
        fold_similarity.append(cosine_similarity)

        #print(f"Time {t}-{t+chunk_size-1}: Cosine Similarity = {cosine_similarity:.4f}")




--- Testing on Image [0] ---


Time points: 100%|██████████| 45/45 [00:02<00:00, 15.33it/s]



--- Testing on Image [1] ---


Time points: 100%|██████████| 45/45 [00:02<00:00, 20.61it/s]



--- Testing on Image [2] ---


Time points: 100%|██████████| 45/45 [00:00<00:00, 62.86it/s] 



--- Testing on Image [3] ---


Time points: 100%|██████████| 45/45 [00:02<00:00, 15.93it/s]
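
The whitening step left as a placeholder in the skeleton could look roughly like the sketch below. It follows one common reading of multivariate noise normalization: estimate a shrunk channel covariance per condition and time point, average these, and multiply the data by the inverse square root of the result. The function names `fit_whitener` and `apply_whitener` are hypothetical, and the use of Ledoit-Wolf shrinkage is an assumption; the exact estimator should be checked against the reference method before use.

```python
import numpy as np
from sklearn.covariance import LedoitWolf

def fit_whitener(train_eeg, labels):
    """Estimate a whitening matrix from training trials only.

    train_eeg: (n_trials, n_channels, n_times); labels: (n_trials,) condition ids.
    Returns the (n_channels, n_channels) inverse square root of the mean covariance.
    """
    n_trials, n_channels, n_times = train_eeg.shape
    conditions = np.unique(labels)
    sigma = np.zeros((n_channels, n_channels))
    for c in conditions:
        cond_trials = train_eeg[labels == c]
        for t in range(n_times):
            # Shrunk covariance across trials at this time point (trials x channels)
            sigma += LedoitWolf().fit(cond_trials[:, :, t]).covariance_
    sigma /= len(conditions) * n_times
    # Inverse matrix square root via eigendecomposition of the symmetric matrix
    evals, evecs = np.linalg.eigh(sigma)
    return (evecs * evals ** -0.5) @ evecs.T

def apply_whitener(eeg, W):
    """Apply W to the channel axis of eeg with shape (n_trials, n_channels, n_times)."""
    return np.einsum("ij,njt->nit", W, eeg)

# Small random example: 3 conditions x 20 trials, 8 channels, 5 time points
rng = np.random.default_rng(0)
train = rng.standard_normal((60, 8, 5))
labels = np.repeat(np.arange(3), 20)
test = rng.standard_normal((20, 8, 5))

W = fit_whitener(train, labels)
train_white = apply_whitener(train, W)
test_white = apply_whitener(test, W)  # test data reuses the training whitener
print(W.shape, train_white.shape, test_white.shape)
```

Fitting the whitener on the training images of each fold and reusing it for the held-out image avoids leaking test data into the noise estimate.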


In [19]:
test_EEG[:, :, :, t:t+chunk_size].shape

(1, 150, 64, 1)

In [18]:
train_EEG[:, :, t:t+chunk_size].shape

(450, 64, 1)