# Notebook 07: Microexpression Modeling, CASME II + SMIC Fusion
### Project: Trauma-Informed AI Framework  
### Author: Michelle Lynn George (Elle)  
### Institution: Vanderbilt University, School of Engineering  
### Year: 2025  
### Version: 1.0  
### Date of last run: 2025-11-24
### Last polished on: 2025-10-15
---

### Purpose
This notebook launches the first emotion modeling phase of the trauma-informed AI framework.  
Using the fused metadata from CASME II and SMIC, we will:

- Frame the difference between **micro** and **macro** expressions
- Engineer features based on **duration, modality, and action units**
- Train early classifiers to **predict emotion labels**
- Prepare for downstream Z3 symbolic verification (Notebook 08)

---

### Input:
- `fused_microexpression_metadata.parquet` (from Notebook 06)

### Output:
- Feature-engineered modeling data (`microexpression_features.parquet`)
- Model predictions for Notebook 08 (`z3_ready_input.parquet`, `z3_ready_input.csv`)
- Visuals: emotion distribution plot and confusion matrices

---

### Reminder:
All saves must go to:
- `outputs/checks/` → for `.parquet`, `.csv`, `.joblib`
- `outputs/visuals/` → for plots and diagrams
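
The routing rule above can be sketched as a small helper. This is only an illustration: the function name `output_path_for` and the set of plot extensions are assumptions, not part of the notebook.

```python
from pathlib import Path

# Extensions named in the reminder above go to outputs/checks/.
CHECK_EXTS = {".parquet", ".csv", ".joblib"}
# Assumption: typical plot formats; the notebook itself only saves .png files.
VISUAL_EXTS = {".png", ".svg", ".pdf"}


def output_path_for(filename: str, root: Path) -> Path:
    """Return the full save path for `filename` under the project root."""
    ext = Path(filename).suffix.lower()
    if ext in CHECK_EXTS:
        return root / "outputs" / "checks" / filename
    if ext in VISUAL_EXTS:
        return root / "outputs" / "visuals" / filename
    raise ValueError(f"No output folder configured for extension {ext!r}")


# Hypothetical project root, used only to show the resulting paths.
demo_root = Path("project")
print(output_path_for("z3_ready_input.parquet", demo_root))
print(output_path_for("07_confmat_RF.png", demo_root))
```

Raising on an unknown extension, rather than guessing a folder, keeps stray files from landing in the wrong place unnoticed.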


In [None]:
# =============================================================================
# 7.0 Microexpression Modeling Kickoff 
# =============================================================================
# Purpose:
#   - Begin emotion modeling using fused CASME II + SMIC metadata
#   - Engineer emotion features, temporal windows, and AU tags
#   - Build early exploratory models (baseline classifiers, timelines, flags)
# =============================================================================

from pathlib import Path
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# --- Define root paths (project-level consistency) ----------------------------
ROOT = Path.cwd().parent  # From /notebooks/, go up to project root
DATA_DIR = ROOT / "data"
PROCESSED_DIR = DATA_DIR / "processed"
RAW_DIR = DATA_DIR / "raw"
CHECKS_DIR = ROOT / "outputs" / "checks"
VIS_DIR = ROOT / "outputs" / "visuals"

# --- Create output folders if missing -----------------------------------------
CHECKS_DIR.mkdir(parents=True, exist_ok=True)
VIS_DIR.mkdir(parents=True, exist_ok=True)

# --- Confirm notebook init ----------------------------------------------------
print("✅ Notebook 07 initialized successfully")
print(f"📂 Root:       {ROOT}")
print(f"📂 Checks:     {CHECKS_DIR}")
print(f"📂 Visuals:    {VIS_DIR}")


# =============================================================================
# 7.1 Load Fused Metadata
# -----------------------------------------------------------------------------
# Load the cleaned metadata that includes both CASME II and SMIC microexpression records.
# This will be the foundation for all modeling and AU-based augmentation.
# =============================================================================

FUSED_PATH = CHECKS_DIR / "fused_microexpression_metadata.parquet"

try:
    fusion_df = pd.read_parquet(FUSED_PATH)
    print(f"✅ Loaded fused metadata: {fusion_df.shape}")
except FileNotFoundError:
    print(f"❌ Fused metadata not found at: {FUSED_PATH}")
    fusion_df = None

# --- Preview structure and distribution ---------------------------------------
if fusion_df is not None:
    display(fusion_df.head(3))
    fusion_df.info()  # info() prints directly and returns None, so no display() wrapper
    print("✅ Emotion distribution:")
    print(fusion_df["Emotion"].value_counts())



In [None]:
# =============================================================================
# 🕷️ Spider Check - verify SMIC & CASME2 are both in the intended fused dataset
#  🧠✨ data integrity triumph!
# =============================================================================

print("🔢 Distribution of samples by source dataset:")
print(fusion_df["SourceDataset"].value_counts())


---
## 7.2 Engineer Microexpression Features

This step sets the stage for any model to learn patterns by creating useful, numeric features from raw metadata.

Goals:

- Convert Onset, Peak, Offset, and Duration to numeric

- Compute Latency (time between Onset and Peak)

- Compute Intensity Window (Peak to Offset)

- Count ActionUnits (AU count from string list like "4+L10" → 2)

- Normalize casing in Modality, handle missing values if needed

- Confirm feature distribution
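
As a worked example of these goals, here is a minimal sketch in plain Python for a single record. The onset, peak, and offset values are hypothetical frame indices chosen for illustration, and `derive_features` is an illustrative name rather than a function from the notebook.

```python
def count_aus(entry):
    """Count Action Units in a '+'-separated string such as '4+L10'."""
    # Treat None and NaN (NaN is the only value not equal to itself) as no AUs.
    if entry is None or (isinstance(entry, float) and entry != entry):
        return 0
    parts = [part.strip() for part in str(entry).split("+")]
    return sum(1 for part in parts if part)


def derive_features(onset, peak, offset, action_units):
    """Compute the three engineered features for one microexpression record."""
    return {
        "Latency": peak - onset,      # time from Onset to Peak
        "Intensity": offset - peak,   # Peak-to-Offset window
        "AU_Count": count_aus(action_units),
    }


# Hypothetical record: onset at frame 46, peak at 59, offset at 86.
example = derive_features(onset=46, peak=59, offset=86, action_units="4+L10")
print(example)  # {'Latency': 13, 'Intensity': 27, 'AU_Count': 2}
```

Note that "Intensity" here is a timing window, so its units are the same as the timing columns (frames or milliseconds, depending on the source dataset).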

In [None]:
# =============================================================================
# 7.2 Engineer Microexpression Features
# -----------------------------------------------------------------------------
# Convert timing columns to numeric and compute derived features:
#   - Latency = Peak - Onset
#   - Intensity = Offset - Peak
#   - AU_Count = number of Action Units (e.g., "4+L10" → 2)
# Also standardize modality and handle missing values.
# =============================================================================

# --- Convert columns to numeric ------------------------------------------------
cols_to_numeric = ["Onset", "Peak", "Offset", "Duration"]
for col in cols_to_numeric:
    fusion_df[col] = pd.to_numeric(fusion_df[col], errors="coerce")

# --- Derive latency and intensity ---------------------------------------------
fusion_df["Latency"] = fusion_df["Peak"] - fusion_df["Onset"]
fusion_df["Intensity"] = fusion_df["Offset"] - fusion_df["Peak"]

# --- Count Action Units -------------------------------------------------------
# Handles values like "4+L10", "12", etc.; missing or empty entries count as 0.
def count_aus(entry):
    if pd.isna(entry):
        return 0
    parts = [p.strip() for p in str(entry).split("+")]
    return sum(1 for p in parts if p)

fusion_df["AU_Count"] = fusion_df["ActionUnits"].apply(count_aus)

# --- Normalize modality casing ------------------------------------------------
fusion_df["Modality"] = fusion_df["Modality"].str.upper()

# --- Check nulls and structure ------------------------------------------------
display(fusion_df[["Onset", "Peak", "Offset", "Latency", "Intensity", "AU_Count"]].describe())
print("✅ Feature engineering complete, ready for modeling!")


In [None]:
# =============================================================================
# 7.2.1 Save Feature-Engineered Microexpression Metadata
# -----------------------------------------------------------------------------
# Purpose:
#   - Save the updated DataFrame after computing Latency, Intensity, AU_Count
#   - Stored as safe .parquet format for reuse in 7.3 modeling pipeline
# =============================================================================

FEATURES_PATH = CHECKS_DIR / "microexpression_features.parquet"

fusion_df.to_parquet(FEATURES_PATH, index=False)
print(f"✅ Saved engineered features → {FEATURES_PATH.name}")


In [None]:
# 🕷️ Spider Check - Confirm save worked ----------------------------------------
if FEATURES_PATH.exists():
    print("📂 Feature file contents:")
    display(pd.read_parquet(FEATURES_PATH).sample(3))
else:
    print("❌ Save failed: file not found!")


---
## 7.3 Microexpression Emotion Modeling Kickoff
Purpose:
   Build baseline classifiers to predict emotion labels using facial-action 
   metadata features (Latency, Intensity, AU_Count).
   Evaluate model performance with accuracy, F1-score, and confusion matrix.
   Save predictions for integration with Z3 rule logic in Notebook 08.


In [None]:
# =============================================================================
# 7.3 Microexpression Emotion Modeling Kickoff
# -----------------------------------------------------------------------------
# Train baseline classifiers (LogReg, RF, KNN) on facial metadata.
# Use a pipeline with imputation + scaling.
# Evaluate with classification metrics and confusion matrices.
# Save Z3-ready predictions as a separate cell to ensure image output finalizes.
# =============================================================================

# --- Import Packages ---------------------------------------------------------
import pandas as pd
import numpy as np
from pathlib import Path
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import (
    accuracy_score, f1_score, confusion_matrix,
    classification_report, ConfusionMatrixDisplay
)
import matplotlib.pyplot as plt
import seaborn as sns

# --- Load Features -----------------------------------------------------------
ROOT = Path.cwd().parent
FEATURE_PATH = ROOT / "outputs" / "checks" / "microexpression_features.parquet"
df = pd.read_parquet(FEATURE_PATH)

print(f"✅ Loaded features: {df.shape}")
display(df.head())

# --- Visualize Emotion Distribution ------------------------------------------
plt.figure(figsize=(8, 4))
sns.countplot(x="Emotion", data=df, order=df["Emotion"].value_counts().index)
plt.title("Emotion Class Distribution")
plt.xlabel("Emotion")
plt.ylabel("Count")
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()

# --- Train/Test Split --------------------------------------------------------
X = df[["Latency", "Intensity", "AU_Count"]]
y = df["Emotion"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, test_size=0.2, random_state=42
)

print(f"📊 Train set: {X_train.shape}, Test set: {X_test.shape}")



In [None]:
# =============================================================================
# 7.3.1 Load Microexpression Features (saved in Section 7.2.1)
# =============================================================================

from pathlib import Path
import pandas as pd

ROOT = Path.cwd().parent
FEATURE_PATH = ROOT / "outputs" / "checks" / "microexpression_features.parquet"
df = pd.read_parquet(FEATURE_PATH)

print(f"✅ Loaded features: {df.shape}")
display(df.head())


In [None]:
# =============================================================================
# 7.3.2  Train/Test Split: Microexpression Classifier Setup
# -----------------------------------------------------------------------------
# This cell prepares X and y features from engineered microexpression metadata.
# It splits data into training and test sets for model evaluation.
# Target = Emotion class. Features = Latency, Intensity, AU_Count
# =============================================================================

# Define input and output columns
X = df[["Latency", "Intensity", "AU_Count"]]
y = df["Emotion"]

# Split into train and test sets (80/20 stratified)
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, test_size=0.2, random_state=42
)

print(f"📊 Train set: {X_train.shape}, Test set: {X_test.shape}")


In [None]:
# =============================================================================
# 7.3.3  Train & Evaluate: Baseline Classifiers (LogReg, RF, KNN)
# -----------------------------------------------------------------------------
# This cell trains three baseline classifiers on microexpression features:
#   - Logistic Regression
#   - Random Forest
#   - K-Nearest Neighbors
#
# Each model uses a pipeline with:
#   - Median imputation for missing values
#   - Standard scaling of features
#   - A classifier specified in the model loop
#
# Evaluation includes:
#   - Accuracy and macro-averaged F1 score
#   - Confusion matrix visualization
#   - PNG export of matrix for inclusion in Appendix and README
#
# Final predictions are collected into a list for saving in Section 7.3.4.
# =============================================================================

# --- Extract features (X) and labels (y) from engineered metadata ------------
X = df[["Latency", "Intensity", "AU_Count"]]
y = df["Emotion"]

# --- Re-import dependencies for reproducibility ------------------------------
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score, f1_score, confusion_matrix, ConfusionMatrixDisplay
import matplotlib.pyplot as plt

# --- Define baseline classifiers ---------------------------------------------
models = {
    "LogReg": LogisticRegression(max_iter=1000),
    "RF": RandomForestClassifier(n_estimators=100, random_state=42),
    "KNN": KNeighborsClassifier(n_neighbors=5)
}

# --- Initialize list to store predictions from all models --------------------
all_preds = []

# --- Train, Evaluate, and Visualize for Each Model ---------------------------
for name, model in models.items():

    # Build modeling pipeline: Imputation → Scaling → Classifier
    pipe = Pipeline([
        ("impute", SimpleImputer(strategy="median")),
        ("scale", StandardScaler()),
        ("clf", model)
    ])

    # Fit model on training data and generate predictions on test data
    pipe.fit(X_train, y_train)
    preds = pipe.predict(X_test)

    # Print classification metrics to console
    print("="*60)
    print(f"✅ Model: {name}")
    print(f"🎯 Accuracy: {accuracy_score(y_test, preds):.3f}")
    print(f"🧮 F1 Score (Macro): {f1_score(y_test, preds, average='macro'):.3f}")
    print("="*60)

    # Generate confusion matrix and visualize results
    cm = confusion_matrix(y_test, preds, labels=y.unique())
    disp = ConfusionMatrixDisplay(confusion_matrix=cm, display_labels=y.unique())
    disp.plot(xticks_rotation=45)
    plt.title(f"{name}: Confusion Matrix")

    # Save confusion matrix as PNG to visuals folder
    fig = disp.figure_
    img_path = ROOT / "outputs" / "visuals" / f"07_confmat_{name}.png"
    fig.savefig(img_path, bbox_inches="tight")
    print(f"📸 Saved confusion matrix to: {img_path.name}")

    # Show the matrix inline
    plt.show()

    # Append predictions to all_preds list for future saving
    all_preds.append(pd.DataFrame({
        "Model": name,
        "y_true": y_test.values,
        "y_pred": preds
    }))




---
## 7.3.3 | Results Summary: Baseline Emotion Classifiers (LogReg, RF, KNN)

This section evaluates three traditional classifiers that use engineered microexpression features (**Latency**, **Intensity**, and **AU Count**) to predict emotion class labels derived from CASME II and SMIC.

## Confusion Matrix Observations
- Strong prediction clusters emerged around broad categories such as **'others'**, **'negative'**, and **'sadness'**.
- **Nuanced classes** (e.g., *repression*, *disgust*, *fear*) were often confused with one another, indicating poor separation in the feature space.
- Some emotions (e.g., *happiness*) were **underrepresented or misclassified entirely**, suggesting the taxonomy fails to capture expressive distinctiveness.

##  Performance Overview

| Model   | Accuracy | F1 (Macro) |
|---------|----------|------------|
| LogReg  | ~0.375   | ~0.233     |
| RF      | ~0.487   | ~0.282     |
| KNN     | ~0.428   | ~0.244     |

> 🧾 Note: All metrics reflect stratified 80/20 splits across 560 combined microexpression records.
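
The split sizes implied by that note can be worked out directly. This sketch assumes all 560 records reach the split with usable labels; if any rows were dropped upstream, the counts would shift accordingly.

```python
# Figures from the note above: 560 fused records, 20% held out for testing.
n_total = 560
test_fraction = 0.2

n_test = round(n_total * test_fraction)  # records held out for evaluation
n_train = n_total - n_test               # records used for fitting

# Each model contributes one prediction row per test record to `all_preds`.
rows_after_three_models = 3 * n_test     # LogReg, RF, KNN
rows_after_five_models = 5 * n_test      # plus SVC and MLP

print(n_train, n_test)                                   # 448 112
print(rows_after_three_models, rows_after_five_models)   # 336 560
```

With only 112 test records spread over several emotion classes, each rare class is represented by a handful of samples, so per-class scores (and therefore macro F1) are sensitive to a few misclassifications.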

##  Interpretation

Despite imputation and scaling, accuracy ranged from roughly **38% to 49%** across the three models, and macro F1 scores stayed below 0.30. These outcomes reinforce our hypothesis that **surface-level metadata alone is insufficient** for distinguishing trauma-informed emotional states.

The failure patterns observed in these confusion matrices support a deeper insight:  
> **The current emotion taxonomy may be too vague to enable meaningful classification**, especially when expressions are masked, suppressed, or dissociative in nature.

This aligns with the motivation behind our symbolic Z3 verification pipeline, which is designed to address exactly these limitations.




In [None]:
# =============================================================================
# 7.3.4 Save Final Predictions for Z3 Symbolic Analysis
# -----------------------------------------------------------------------------
# This cell concatenates predictions from all 3 baseline models into a single
# DataFrame (`all_preds`) and saves them to disk for use in Notebook 08.
# These predictions will be cross-checked using symbolic empathy rules in Z3.
# =============================================================================

import pandas as pd
from pathlib import Path

# --- Define export path ------------------------------------------------------
ROOT = Path.cwd().parent
OUTPUT_PATH = ROOT / "outputs" / "checks" / "z3_ready_input.parquet"

# --- Concatenate all predictions and export ----------------------------------
z3_ready_df = pd.concat(all_preds, axis=0).reset_index(drop=True)
z3_ready_df.to_parquet(OUTPUT_PATH, index=False)

# --- 🕷️ Spider Check - Confirm save and display preview ----------------------
print("="*60)
print(f"✅ Z3-ready predictions saved to: {OUTPUT_PATH.name}")
print(f"📐 Final shape: {z3_ready_df.shape}")
print("📄 Sample rows:")
display(z3_ready_df.sample(5))
print("="*60)



In [None]:
# =============================================================================
# 7.3.5 Additional Classifier Benchmarks: SVC & MLP
# -----------------------------------------------------------------------------
# This cell extends the baseline evaluation by adding:
#   - Support Vector Classifier (SVC)
#   - Multi-Layer Perceptron (MLP)
#
# Both models follow the same pipeline: imputation + scaling + classification.
# Results include accuracy, macro F1, and saved confusion matrices.
# Final predictions are appended to `all_preds` for symbolic use in Notebook 08.
# =============================================================================

# --- Re-import dependencies (if needed) --------------------------------------
from sklearn.svm import SVC
from sklearn.neural_network import MLPClassifier

# --- Define additional models ------------------------------------------------
additional_models = {
    "SVC": SVC(probability=False, kernel="rbf", C=1.0, random_state=42),
    "MLP": MLPClassifier(hidden_layer_sizes=(100,), max_iter=1000, random_state=42)
}

# --- Loop through additional models ------------------------------------------
for name, model in additional_models.items():

    # Build modeling pipeline with imputation + scaling
    pipe = Pipeline([
        ("impute", SimpleImputer(strategy="median")),
        ("scale", StandardScaler()),
        ("clf", model)
    ])

    # Fit and predict
    pipe.fit(X_train, y_train)
    preds = pipe.predict(X_test)

    # Print performance
    print(f"🧠 Model: {name}")
    print("📊 Accuracy:", accuracy_score(y_test, preds))
    print("🎯 F1 (macro):", f1_score(y_test, preds, average="macro"))

    # Generate and save confusion matrix
    cm = confusion_matrix(y_test, preds, labels=y.unique())
    disp = ConfusionMatrixDisplay(confusion_matrix=cm, display_labels=y.unique())
    disp.plot(xticks_rotation=45)
    plt.title(f"{name}: Confusion Matrix")

    # Save figure
    fig = disp.figure_
    img_path = ROOT / "outputs" / "visuals" / f"07_confmat_{name}.png"
    fig.savefig(img_path, bbox_inches="tight")
    print(f"📸 Saved confusion matrix to: {img_path.name}")
    plt.show()

    # Store predictions for export
    all_preds.append(pd.DataFrame({
        "Model": name,
        "y_true": y_test.values,
        "y_pred": preds
    }))


---

### 7.3.5 | Additional Model Results Summary: SVC & MLP

Despite architectural differences, both SVC and MLP models produced **lower macro F1 scores** than previous classifiers. Their confusion matrices revealed:

- Continued overlap in *others*, *sadness*, *repression*, and *fear*.
- Frequent misclassification of *happiness* and *positive* emotions.
- Low precision across nuanced emotional states.

These results reinforce the core hypothesis of this framework: **surface-level metadata cannot resolve emotional ambiguity** without **semantic scaffolding.**

---

### Implication:
Even nonlinear or boundary-aware models fail when emotional categories are vague or overlapping, highlighting the need for symbolic logic (Z3) and trauma-aware verification rules.


In [None]:
# =============================================================================
# 7.3.6 Save Final Predictions for Z3 Symbolic Analysis
# -----------------------------------------------------------------------------
# Concatenates predictions from all 5 baseline classifiers:
#   - Logistic Regression (LogReg)
#   - Random Forest (RF)
#   - K-Nearest Neighbors (KNN)
#   - Support Vector Classifier (SVC)
#   - Multi-Layer Perceptron (MLP)
#
# Output files:
#   - `z3_ready_input.parquet`: for downstream symbolic reasoning in Notebook 08
#   - `z3_ready_input.csv`: human-readable format for GitHub review and audit
# Both formats preserve predicted labels and true labels for rule contradiction analysis.
# =============================================================================


import pandas as pd
from pathlib import Path

# --- Define export paths -----------------------------------------------------
ROOT = Path.cwd().parent
PARQUET_PATH = ROOT / "outputs" / "checks" / "z3_ready_input.parquet"
CSV_PATH     = ROOT / "outputs" / "checks" / "z3_ready_input.csv"

# --- Concatenate and save ----------------------------------------------------
z3_ready_df = pd.concat(all_preds, axis=0).reset_index(drop=True)
z3_ready_df.to_parquet(PARQUET_PATH, index=False)
z3_ready_df.to_csv(CSV_PATH, index=False)

# --- 🕷️ Spider Check (Sanity) -------------------------------------------------
print("=" * 60)
print("✅ Z3-ready predictions saved:")
print(f"   📁 {PARQUET_PATH.name}")
print(f"   📄 {CSV_PATH.name}")
print(f"✅ Final shape: {z3_ready_df.shape}")
print(f"✅ Included models: {z3_ready_df['Model'].nunique()} ({z3_ready_df['Model'].unique().tolist()})")
print("=" * 60)

# --- Display sample rows -----------------------------------------------------
display(z3_ready_df.sample(5))



---

## Executive Summary: Symbolic Verification Milestone


This notebook analyzed a fused dataset composed of 305 samples from **SMIC** and 255 samples from **CASME II** (560 in total), two widely used microexpression databases. All records were cleaned, aligned, and fused in Notebook 06 before being passed into this notebook for modeling. Feature engineering and emotion classification were conducted on this combined corpus using three engineered metadata features:

- ✅ **Latency**: time from Onset to Peak, i.e., how fast the microexpression emerged
- ✅ **Intensity**: the Peak-to-Offset window (`Offset - Peak`), a timing feature rather than a direct measure of emotional magnitude
- ✅ **AU_Count**: count of activated Action Units per clip

---

### Baseline Classifier Overview

Five classifiers were trained to predict emotion classes:
- Logistic Regression (LogReg)
- Random Forest (RF)
- K-Nearest Neighbors (KNN)
- Support Vector Classifier (SVC)
- Multi-Layer Perceptron (MLP)

All models used a standardized pipeline: `Impute → Scale → Classify`. The results were benchmarked using **Accuracy** and **F1 (Macro)** scores, and visualized via saved **Confusion Matrices**.
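
To make the first two pipeline stages concrete, here is a minimal plain-Python sketch of median imputation followed by standard scaling on one feature column. The values are hypothetical, and the helper names are illustrative; the notebook itself relies on scikit-learn's `SimpleImputer` and `StandardScaler`.

```python
import statistics


def median_impute(values):
    """Replace missing entries (None) with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = statistics.median(observed)
    return [fill if v is None else v for v in values]


def standardize(values):
    """Rescale to mean 0 and unit variance using the population std (ddof=0)."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return [0.0 for _ in values]  # constant column: nothing to scale
    return [(v - mean) / std for v in values]


# Hypothetical Latency column with one missing value.
latency = [1.0, None, 3.0, 5.0]
imputed = median_impute(latency)
scaled = standardize(imputed)

print(imputed)                         # [1.0, 3.0, 3.0, 5.0]
print([round(v, 3) for v in scaled])   # [-1.414, 0.0, 0.0, 1.414]
```

Inside a scikit-learn `Pipeline`, the median, mean, and standard deviation are learned from the training split only and then reused on the test split, which avoids leaking test statistics into training.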

---

### Findings & Milestone Framing

Across all five classifiers, accuracy stayed below roughly **50%**, with **low macro F1**. Confusion matrices showed:

- ❌ Persistent misclassification in nuanced emotions like *repression*, *disgust*, *sadness*, *fear*
- ❌ Over-clustering in vague categories like *others* and *negative*
- ❌ Frequent mislabeling of *positive* and *happiness* cases

These findings **support the central hypothesis** of this trauma-informed AI framework:

> **Surface-level metadata alone is insufficient** to detect trauma-influenced emotional states or subtle affective shifts.

---

### Implication & Handoff to Notebook 08

These limitations motivated the shift to **symbolic logic verification** (Z3). All predictions have been exported to:

📂 `z3_ready_input.parquet`

This file now serves as the input to **23 symbolic empathy rules**, which will be used in Notebook 08 to cross-check ML predictions and uncover latent affective structures such as:

- Suppression  
- Dissociation  
- Semantic Absence (*The Haunting Problem*)  
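
The exported file is in long format, with one row per model per test record and the columns `Model`, `y_true`, and `y_pred`. As a rough sketch of how such a table can be inspected before any symbolic rules are applied, the snippet below recomputes per-model accuracy and collects the mismatched rows. The rows are hypothetical and the helper names are illustrative; this is not the rule logic of Notebook 08.

```python
from collections import defaultdict

# Hypothetical rows mirroring the exported columns: Model, y_true, y_pred.
rows = [
    {"Model": "LogReg", "y_true": "fear", "y_pred": "others"},
    {"Model": "LogReg", "y_true": "others", "y_pred": "others"},
    {"Model": "RF", "y_true": "fear", "y_pred": "fear"},
    {"Model": "RF", "y_true": "others", "y_pred": "others"},
]


def accuracy_by_model(rows):
    """Return {model name: fraction of rows where y_pred equals y_true}."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for row in rows:
        totals[row["Model"]] += 1
        hits[row["Model"]] += int(row["y_true"] == row["y_pred"])
    return {model: hits[model] / totals[model] for model in totals}


def mismatches(rows):
    """Return the rows where the prediction disagrees with the true label."""
    return [row for row in rows if row["y_true"] != row["y_pred"]]


print(accuracy_by_model(rows))  # {'LogReg': 0.5, 'RF': 1.0}
print(mismatches(rows))
```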

---

## ➡️ Next Steps: Notebook 08

- 🔄 Load and evaluate predictions from `z3_ready_input.parquet`
- 🧠 Apply 23 empathy rules and log activation frequencies
- 📊 Cross-check symbolic flags vs. ML predictions
- 🚩 Flag contradictions and analyze failure cases
- 🕵️ Identify "false passes" where symbolic rules catch emotional masking missed by traditional models
- ✍️ Update Glossary with emerging symbolic concepts (e.g., "masked sadness", "defensive detachment")

> This marks the turning point:
> From prediction to verification.  
> From surface signals to symbolic safety.  
> From unseen to **unforgotten**.

---

## Appendix A | Baseline Classifier Confusion Matrices


**Context:**  
These matrices show how the baseline models (LogReg, RF, KNN, SVC, MLP) classified emotion labels using engineered metadata features.

**Key Legend:**
- Rows = Ground truth emotion labels  
- Columns = Predicted emotion labels  
- Diagonal = Correct predictions  
- Off-diagonal = Misclassifications (ideally minimal)  
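
The legend can be illustrated with a tiny hand-built matrix. The labels and predictions below are hypothetical and chosen only to show the row/column convention; `confusion_matrix_rows` is an illustrative helper, not the scikit-learn function used in the notebook.

```python
def confusion_matrix_rows(y_true, y_pred, labels):
    """Build a matrix where rows are true labels and columns are predictions."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for truth, pred in zip(y_true, y_pred):
        matrix[index[truth]][index[pred]] += 1
    return matrix


# Hypothetical labels: most mistakes drift toward the generic "others" class.
labels = ["fear", "others", "sadness"]
y_true = ["fear", "fear", "others", "others", "sadness"]
y_pred = ["others", "fear", "others", "others", "others"]

cm = confusion_matrix_rows(y_true, y_pred, labels)
for label, row in zip(labels, cm):
    print(f"{label:>8}: {row}")

# The diagonal holds the correct predictions.
correct = sum(cm[i][i] for i in range(len(labels)))
print("correct:", correct, "of", len(y_true))  # correct: 3 of 5
```

In this toy case the "others" column absorbs every error, which is the same over-clustering pattern described in the observations for this appendix.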

**Notable Observations:**
- ❌ High error in subtle classes (*e.g., disgust, repression, fear*)
- ❌ Over-clustering in generic categories (*e.g., others, negative*)
- ⚠️ Strong evidence the current emotion taxonomy is too coarse
- ✅ Supports symbolic logic for finer affect classification

**Linked Visuals (Saved):**
- `07_confmat_LogReg.png`  
- `07_confmat_RF.png`  
- `07_confmat_KNN.png`  
- `07_confmat_SVC.png`  
- `07_confmat_MLP.png`  

---

## Additional Model Results Summary: SVC & MLP


To ensure a fair baseline before symbolic logic, two additional models were tested:

| Model | Rationale |
|-------|-----------|
| **SVC** (Support Vector Classifier) | Evaluates class separability with kernel-based boundaries |
| **MLP** (Multi-Layer Perceptron) | Tests nonlinear learning via neural architecture |

**Result Highlights:**
- 📉 Lower F1 scores than other models
- 😵 Continued confusion among *others*, *sadness*, *repression*
- 😢 Misclassification of *happiness* and *positive* emotions
- ✅ Reinforces core hypothesis: metadata alone can't resolve affective ambiguity

**Implication:**  
Even boundary-aware or nonlinear models fail when emotional categories are **semantically vague or overlapping**, validating the need for **Z3 symbolic logic** and trauma-aware verification rules.

---

## Glossary of Metrics, Models, and Concepts


| Term                     | Definition |
|--------------------------|------------|
| **Accuracy**             | Proportion of correct predictions |
| **F1 Score (Macro)**     | Unweighted mean of per-class F1 scores, where each class F1 is the harmonic mean of its precision and recall |
| **Imputation (Median)**  | Fills missing values using the median of each feature |
| **StandardScaler**       | Standardizes features to mean = 0, std = 1 |
| **Logistic Regression**  | Linear model for classification |
| **Random Forest**        | Ensemble of decision trees |
| **K-Nearest Neighbors**  | Assigns class based on distance to nearest neighbors |
| **SVC**                  | Classifies using maximum-margin hyperplanes (kernel-based) |
| **MLP**                  | Feedforward neural net for nonlinear classification |
| **AU_Count**             | Number of activated Action Units (facial muscle groups) |
| **Latency**              | Time from onset to peak of a microexpression |
| **Z3 Symbolic Logic**    | Use of the Z3 SMT solver to check predicted emotional states against trauma-informed logical rules |
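
To show why accuracy and macro F1 can diverge as much as they do in this notebook, here is a small plain-Python sketch on a hypothetical imbalanced label set. The helper names are illustrative; the notebook itself uses scikit-learn's `accuracy_score` and `f1_score`.

```python
def f1_for_class(y_true, y_pred, label):
    """F1 for one class: harmonic mean of that class's precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)
    fp = sum(t != label and p == label for t, p in pairs)
    fn = sum(t == label and p != label for t, p in pairs)
    if tp == 0:
        return 0.0  # no true positives: F1 is 0 by convention
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all labels seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    return sum(f1_for_class(y_true, y_pred, label) for label in labels) / len(labels)


# Hypothetical imbalanced case: the model always predicts the majority class.
y_true = ["others"] * 8 + ["fear"] * 2
y_pred = ["others"] * 10

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(round(accuracy, 3), round(macro_f1(y_true, y_pred), 3))  # 0.8 0.444
```

Because every class counts equally in the macro average, a model that ignores a minority class is penalized heavily even when its accuracy looks acceptable.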

---

