# Notebook 03: ZuCo Data Alignment
This notebook processes data from the **Zurich Cognitive Language Processing Corpus (ZuCo)**, versions 1.0 and 2.0:
- **Input:** Raw eye-tracking and EEG recordings for each subject (e.g., MATLAB `.mat` files and associated sentence text files in `data/raw_public/zuco/`).
- **Task:** Align each word of every sentence a subject read with its eye-tracking metrics (first fixation duration, gaze duration, etc.) and EEG features (e.g., mean band power during the fixations on that word).
- **Output:** A tidy dataset (CSV/Parquet) in which each row is one word instance, identified by subject and sentence, with columns for the reading-time metrics and EEG features.
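
To make the eye-tracking side of the task concrete, here is a minimal sketch of how the four reading-time measures could be derived from a sentence's fixation sequence. It assumes the fixations have already been mapped to word indices and are given in temporal order as `(word_index, duration_ms)` pairs. That input format and the function name `reading_measures` are hypothetical, not part of the ZuCo files. The sketch also treats the first run of fixations on a word as its first pass, even if the word was initially skipped.

```python
def reading_measures(fixations):
    """Compute per-word reading-time measures from an ordered fixation list.

    fixations: list of (word_index, duration_ms) tuples in temporal order
               (hypothetical input format, assumed for illustration).
    Returns {word_index: {"FFD", "GD", "TRT", "GPT"}} with durations in ms.
    """
    measures = {}
    n = len(fixations)
    for i, (word, duration) in enumerate(fixations):
        if word in measures:
            # Any later fixation on the word only adds to total reading time.
            measures[word]["TRT"] += duration
            continue

        # Gaze duration: consecutive fixations on the word, starting at the first one.
        gd = 0
        j = i
        while j < n and fixations[j][0] == word:
            gd += fixations[j][1]
            j += 1

        # Go-past time: everything from the first fixation on the word until
        # the eyes move to a word further to the right (regressions included).
        gpt = 0
        k = i
        while k < n and fixations[k][0] <= word:
            gpt += fixations[k][1]
            k += 1

        measures[word] = {"FFD": duration, "GD": gd, "TRT": duration, "GPT": gpt}
    return measures


# Toy sequence: word 1 is fixated twice, followed by a regression to word 0.
example = [(0, 200), (1, 150), (1, 100), (0, 120), (2, 180), (1, 90)]
for word, values in sorted(reading_measures(example).items()):
    print(word, values)
```

Words that never receive a fixation do not appear in the result, so the alignment step still has to decide how to represent skipped words (for example, as missing values).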

In [1]:
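# Sketch for loading and aligning ZuCo data (the alignment loop is still a placeholder)
from pathlib import Path

import pandas as pd
import scipy.io

# Example: load subject 1 data from ZuCo 1.0
subject_id = 'S1'
mat_file = Path(f'data/raw_public/zuco/v1/{subject_id}.mat')
# squeeze_me / struct_as_record=False expose nested MATLAB structs as attributes.
# Note: scipy.io.loadmat cannot read MATLAB v7.3 (HDF5) files; those need h5py instead.
data = scipy.io.loadmat(str(mat_file), squeeze_me=True, struct_as_record=False)
# Assume 'data' contains EEG signals and timestamps, plus sentence and word indices.

# Output columns; extend the list with further EEG features as needed.
columns = ['Subject', 'SentenceID', 'Word', 'FFD', 'GD', 'TRT', 'GPT', 'ThetaPower', 'AlphaPower']

# Placeholder: parse fixations and EEG
aligned_records = []
# for each sentence in data:
#    for each word in sentence:
#         extract FFD (first fixation duration), GD (gaze duration),
#                 TRT (total reading time), GPT (go-past time) from eye-tracking
#         extract EEG band power during word fixation interval
#         append to aligned_records with subject, sentence, word, metrics

aligned_df = pd.DataFrame(aligned_records, columns=columns)
out_path = Path('data/processed/zuco_aligned.csv')
out_path.parent.mkdir(parents=True, exist_ok=True)  # create data/processed/ if missing
aligned_df.to_csv(out_path, index=False)
print(f"Aligned data saved: {len(aligned_df)} word-level records.")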
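
The EEG step in the placeholder loop ("extract EEG band power during word fixation interval") could look roughly like the sketch below. It assumes the EEG is available as a `(n_channels, n_samples)` NumPy array with a known sampling rate, and that each fixation has onset and offset times in seconds relative to the start of that array. The function name `band_power`, the band edges in `BANDS`, and the 500 Hz rate in the demo are illustrative assumptions, not values read from the ZuCo files. The power spectrum is estimated with `scipy.signal.welch`.

```python
import numpy as np
from scipy.signal import welch

# Illustrative band edges in Hz (assumed; adjust to the bands used in your analysis).
BANDS = {"theta": (4.0, 8.0), "alpha": (8.0, 13.0)}


def band_power(eeg, fs, start_s, end_s, bands=BANDS):
    """Mean power spectral density per band within one fixation window.

    eeg:     array of shape (n_channels, n_samples)
    fs:      sampling rate in Hz
    start_s: fixation onset in seconds from the start of `eeg`
    end_s:   fixation offset in seconds from the start of `eeg`
    Returns {band_name: mean PSD over channels and frequency bins in the band}.
    """
    i0, i1 = int(round(start_s * fs)), int(round(end_s * fs))
    segment = eeg[:, i0:i1]
    if segment.shape[1] < 2:
        raise ValueError("Fixation window is too short to estimate a spectrum.")

    # Use the whole window as one Welch segment, capped at one second of data.
    nperseg = min(segment.shape[1], int(fs))
    freqs, psd = welch(segment, fs=fs, nperseg=nperseg, axis=-1)

    powers = {}
    for name, (low, high) in bands.items():
        mask = (freqs >= low) & (freqs < high)
        # Short windows have coarse frequency resolution; a band may contain no bin.
        powers[name] = float(psd[:, mask].mean()) if mask.any() else float("nan")
    return powers


# Demo on synthetic data: two channels carrying a pure 10 Hz (alpha-range) sine.
fs = 500.0
t = np.arange(0, 2.0, 1.0 / fs)
eeg = np.vstack([np.sin(2 * np.pi * 10.0 * t), 0.5 * np.sin(2 * np.pi * 10.0 * t)])
demo_powers = band_power(eeg, fs, start_s=0.5, end_s=0.9)
print(demo_powers)
```

Because single fixations often last only a few hundred milliseconds, the frequency resolution of such a window is coarse (about 2.5 Hz for a 400 ms window), so narrow bands may be covered by one or two bins at most. Pooling all fixations on a word before estimating the spectrum is one possible alternative.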