## CLASSIFICATION (EEGNet notebook explained)

Goal: compare classifier performance when the training set is:

* Real training data
* Generated training data

STEP 1 — Load original training data
- For every subject and every class (0..3):
  - Read the CSV file.
  - Drop the first column (channel names).
  - Keep 1000 time samples.
  - Scale values to the 0–1 range (MinMax scaling).
  - Append the trial to the training list.
- Repeat the same steps for the test files:
  - Original/Axx/test/...
- This yields:
  - train_dataset
  - train_label
  - test_dataset
  - test_label
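
As a rough illustration of the per-file steps in Step 1, the sketch below reads one trial, drops the first column, crops to 1000 samples, and rescales to 0–1. The function name `load_trial` is hypothetical, and several details are assumptions rather than facts about the notebook: the CSV is comma-delimited with no header row, each row is one channel, "keep 1000 samples" means the first 1000 columns, and scaling is done per trial.

```python
import csv
import numpy as np

def load_trial(path, n_samples=1000, drop_first_column=True):
    """Read one trial CSV into a (channels, n_samples) array scaled to [0, 1].

    Hypothetical helper: assumes comma-delimited rows, one row per channel,
    and no header row.
    """
    with open(path, newline="") as f:
        rows = [row for row in csv.reader(f) if row]  # skip empty lines
    if drop_first_column:
        rows = [row[1:] for row in rows]  # first column holds channel names
    # Assumption: "keep 1000 samples" means the first n_samples columns.
    trial = np.asarray(rows, dtype=np.float64)[:, :n_samples]
    lo, hi = trial.min(), trial.max()
    if hi == lo:  # constant trial: avoid dividing by zero
        return np.zeros_like(trial)
    return (trial - lo) / (hi - lo)  # per-trial MinMax scaling
```

Calling `load_trial` on each file under a subject/class folder and appending the result (plus the class index) would build `train_dataset` and `train_label`; the same loop over the test folder would build the test pair.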

STEP 2 — Load generated training data
For every subject and every class:
- Read the CSV file.
- Keep 1000 time samples.
- Scale values.
- Append the trial to the training list.

The generated set provides no test data.
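
The code cell in this notebook scales by fitting on the training data only and then applying the same transform to the test data, through helpers named `fit_minmax_scaler` and `apply_scaler`. Their bodies are not shown, so the following is only one plausible version, assuming a single global minimum and maximum over the whole training array (a per-channel variant would be equally reasonable).

```python
import numpy as np

def fit_minmax_scaler(X_train):
    """Return (min, max) of the training array. Assumed: one global range."""
    return float(np.min(X_train)), float(np.max(X_train))

def apply_scaler(X, scaler):
    """Rescale X with the training range; training data lands in [0, 1]."""
    lo, hi = scaler
    X = np.asarray(X, dtype=np.float64)
    if hi == lo:  # degenerate training range: avoid dividing by zero
        return np.zeros_like(X)
    return (X - lo) / (hi - lo)
```

With this design, test values can fall slightly outside 0–1 when the test data exceeds the training range. That is expected and avoids leaking test statistics into preprocessing.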

STEP 3 — Prepare shapes
The 2D convolution layers in EEGNet expect 4D input:
(trials, channels, time, 1)
so one dimension of size 1 is added at the end of each array.
Labels are converted to one-hot vectors, for example:
Class 2 → [0, 0, 1, 0]
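
The two shape operations in Step 3 can be sketched with NumPy as follows. The names `add_last_dim` and `one_hot` mirror the helpers called in the code cell, but these bodies are assumptions, since the notebook's own definitions are not shown here.

```python
import numpy as np

def add_last_dim(X):
    """(N, channels, time) -> (N, channels, time, 1)."""
    return np.asarray(X)[..., np.newaxis]

def one_hot(labels, num_classes=4):
    """Integer labels 0..num_classes-1 -> one-hot rows."""
    labels = np.asarray(labels, dtype=int)
    # Row i of the identity matrix is the one-hot vector for class i.
    return np.eye(num_classes, dtype=np.float32)[labels]

X = np.zeros((5, 22, 1000))      # five dummy trials
print(add_last_dim(X).shape)     # (5, 22, 1000, 1)
print(one_hot([2]))              # [[0. 0. 1. 0.]]
```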

STEP 4 — Define EEGNet
EEGNet is a small CNN specialized for EEG.
Conceptually it does:
1. Temporal convolution: learns frequency/time patterns in each channel.
2. Depthwise spatial convolution: learns relationships between channels.
3. Pooling: reduces temporal resolution.
4. Separable convolution: learns more complex features efficiently.
5. Flatten: converts feature maps to a vector.
6. Dense + Softmax: outputs class probabilities.
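
To make the stages above concrete, the sketch below traces how the shape of a single trial might change through such a network. It builds no model. The hyperparameters (8 temporal filters, depth multiplier 2, 16 separable filters, pooling factors 4 and 8, "same" padding) and the second pooling step after the separable convolution are assumptions based on commonly used EEGNet settings. The notebook's `define_eegnet_like_model` may differ.

```python
def trace_eegnet_shapes(channels=22, samples=1000, f1=8, d=2, f2=16,
                        pool1=4, pool2=8, num_classes=4):
    """Return (stage, shape) pairs for one trial in channels-last layout.

    All default hyperparameters are assumed, not taken from the notebook.
    """
    shapes = [("input", (channels, samples, 1))]
    # 1. Temporal convolution, 'same' padding: time length kept, f1 filters.
    shapes.append(("temporal conv", (channels, samples, f1)))
    # 2. Depthwise spatial convolution spanning all channels:
    #    the channel axis collapses to 1, filters grow by the multiplier d.
    shapes.append(("depthwise conv", (1, samples, f1 * d)))
    # 3. Average pooling over time (non-overlapping windows, floor division).
    t1 = samples // pool1
    shapes.append(("pool 1", (1, t1, f1 * d)))
    # 4. Separable convolution, 'same' padding: f2 output filters.
    shapes.append(("separable conv", (1, t1, f2)))
    # Assumed extra pooling step before flattening.
    t2 = t1 // pool2
    shapes.append(("pool 2", (1, t2, f2)))
    # 5. Flatten the feature maps to a vector.
    shapes.append(("flatten", (t2 * f2,)))
    # 6. Dense + softmax: one probability per class.
    shapes.append(("dense + softmax", (num_classes,)))
    return shapes

for name, shape in trace_eegnet_shapes():
    print(f"{name:>16}: {shape}")
```

Changing `samples` changes the flattened length, which is why the dense layer's input size depends on the trial length chosen during loading.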

STEP 5 — Train two models
Model 1:
- Train on original training data.
Model 2:
- Train on generated training data.
Both are evaluated on the same real test dataset.

STEP 6 — Evaluation
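Predict labels on the real test set, then compute:
- Precision
- Recall
- F1-score
- Accuracy

Compare the model trained on original data with the model trained on generated data.
If training on generated data improves performance on the real test set → the generated data helped.
Note that Model 2 is trained on generated data only, so this compares generated against real training data, not real plus generated.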
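
For reference, the four metrics can be derived from a confusion matrix as in the sketch below. The code cell calls `classification_report` instead, so the hypothetical `per_class_metrics` function is not the notebook's method; it is only meant to show what the reported numbers mean.

```python
import numpy as np

def per_class_metrics(y_true, y_pred, num_classes=4):
    """Per-class precision, recall, F1 and overall accuracy (illustrative)."""
    y_true = np.asarray(y_true, dtype=int)
    y_pred = np.asarray(y_pred, dtype=int)
    # cm[i, j] counts trials of true class i predicted as class j.
    cm = np.zeros((num_classes, num_classes), dtype=int)
    np.add.at(cm, (y_true, y_pred), 1)
    tp = np.diag(cm).astype(np.float64)
    predicted = cm.sum(axis=0)  # how often each class was predicted
    actual = cm.sum(axis=1)     # how often each class truly occurs
    # Classes that are never predicted (or never occur) get a score of 0.
    precision = np.divide(tp, predicted, out=np.zeros_like(tp), where=predicted > 0)
    recall = np.divide(tp, actual, out=np.zeros_like(tp), where=actual > 0)
    denom = precision + recall
    f1 = np.divide(2 * precision * recall, denom,
                   out=np.zeros_like(tp), where=denom > 0)
    accuracy = tp.sum() / cm.sum()
    return {"precision": precision, "recall": recall,
            "f1": f1, "accuracy": float(accuracy)}
```

Running this once on the predictions of each model against the same `y_test_real` gives directly comparable numbers.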

IMPORTANT CONSISTENCY RULE
- Your trials must all have the same shape.
- If you extract 1000 time points, EEGNet must be built with Samples = 1000.
- If you choose 500 instead, every loading and model-definition step must use 500.
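
One way to enforce this rule is to check every trial before stacking, in the spirit of the `expected_shape=(22, 1000)` argument passed to the loaders in the code cell. The function `check_trial_shapes` below is a hypothetical helper, not part of the notebook.

```python
import numpy as np

def check_trial_shapes(trials, expected_shape=(22, 1000)):
    """Stack trials into (N, channels, time), failing fast on any mismatch."""
    expected_shape = tuple(expected_shape)
    if len(trials) == 0:
        raise ValueError("no trials to check")
    for i, trial in enumerate(trials):
        shape = np.shape(trial)
        if shape != expected_shape:
            raise ValueError(
                f"trial {i} has shape {shape}, expected {expected_shape}"
            )
    return np.stack(trials)
```

Passing the same `expected_shape` here and the matching `samples` value to the model definition keeps the data and the network in agreement.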

In [None]:
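# ============================================================
# PART 3: CLASSIFICATION (train EEGNet twice; evaluate on real test)
# ============================================================

# Assumption: classification_report is scikit-learn's; the other helpers
# (load_dataset_tree, fit_minmax_scaler, ...) are defined elsewhere in the notebook.
from sklearn.metrics import classification_report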

def classification_experiment() -> None:
    """
    Train and evaluate classifier twice:
      A) Train on Original training set, test on Original test set
      B) Train on Generated training set, test on Original test set
    """

    # -----------------------
    # Step 1: Load datasets
    # -----------------------
    X_train_orig, y_train_orig = load_dataset_tree(
        root=ORIGINAL_ROOT, split="train",
        drop_first_column=True,             # original CSVs have channel-name column
        expected_shape=(22, 1000),
    )

    X_test_real, y_test_real = load_dataset_tree(
        root=ORIGINAL_ROOT, split="test",
        drop_first_column=True,
        expected_shape=(22, 1000),
    )

    X_train_gen, y_train_gen = load_dataset_tree_generated(
        root=GENERATED_ROOT, split="train",
        drop_first_column=False,            # generated CSVs should be numeric only
        expected_shape=(22, 1000),
    )

    # -----------------------
    # Step 2: Scale features
    # -----------------------
    # IMPORTANT: fit scaler on training data only, then apply to train+test.
    scaler_orig = fit_minmax_scaler(X_train_orig)
    X_train_orig = apply_scaler(X_train_orig, scaler_orig)
    X_test_real_A = apply_scaler(X_test_real, scaler_orig)

    scaler_gen = fit_minmax_scaler(X_train_gen)
    X_train_gen = apply_scaler(X_train_gen, scaler_gen)
    X_test_real_B = apply_scaler(X_test_real, scaler_gen)

    # -----------------------
    # Step 3: Shape formatting
    # -----------------------
    # Many CNN implementations expect (N, channels, samples, 1)
    X_train_orig = add_last_dim(X_train_orig)  # -> (N, 22, 1000, 1)
    X_train_gen = add_last_dim(X_train_gen)
    X_test_real_A = add_last_dim(X_test_real_A)
    X_test_real_B = add_last_dim(X_test_real_B)

    # Convert integer labels 0..3 -> one-hot vectors
    y_train_orig_oh = one_hot(y_train_orig, num_classes=4)
    y_train_gen_oh = one_hot(y_train_gen, num_classes=4)

    # -----------------------
    # Step 4: Train model A (original)
    # -----------------------
    model_A = define_eegnet_like_model(num_classes=4, channels=22, samples=1000)
    model_A = train_model(
        model=model_A,
        X=X_train_orig,
        y=y_train_orig_oh,
        validation_split=0.2,
        early_stopping_metric="val_accuracy",
    )

    # Evaluate on real test
    yhat_A = predict_classes(model_A, X_test_real_A)
    report_A = classification_report(y_true=y_test_real, y_pred=yhat_A)
    print("Model A (trained on Original) performance:")
    print(report_A)

    # -----------------------
    # Step 5: Train model B (generated)
    # -----------------------
    model_B = define_eegnet_like_model(num_classes=4, channels=22, samples=1000)
    model_B = train_model(
        model=model_B,
        X=X_train_gen,
        y=y_train_gen_oh,
        validation_split=0.2,
        early_stopping_metric="val_accuracy",
    )

    # Evaluate on real test
    yhat_B = predict_classes(model_B, X_test_real_B)
    report_B = classification_report(y_true=y_test_real, y_pred=yhat_B)
    print("Model B (trained on Generated) performance:")
    print(report_B)
