# Phase 2: Model Training & Evaluation
## Feature Selection, Model Complexity, and Data Comparison

**Goal:** Compare several classifiers and assess feature importance for facial expression recognition.

**Models Explored:**
1. Logistic Regression
2. Random Forest
3. Decision Tree

**Analysis Goals:**
1. Study the importance of feature subsets (e.g., facial points related to the mouth, eyes, etc.) using Random Forest
2. Demonstrate learning curves as a function of model complexity
3. Compare models trained on 'geometric' data with 'motion' data
4. Measure model training and inference timing

In [None]:
# Common Libraries
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix, f1_score
import matplotlib.pyplot as plt
import seaborn as sns
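from time import perf_counter as time  # monotonic, high-resolution clock; better suited to short timings than time.time()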
import warnings
warnings.filterwarnings('ignore')

## 1. Load Data
### Load both 'motion' and 'geometric' datasets

The motion dataset contains temporal differences in facial landmark positions between consecutive frames.
The geometric dataset contains normalized spatial positions of facial landmarks.
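
To make the relationship between the two representations concrete, the sketch below derives frame-to-frame differences from landmark positions. The column names (`Subject`, `Frame`, `x1`, `y1`) and the toy values are assumptions for illustration only; the preprocessing that actually produced `data_motion.csv` is not shown in this notebook and may differ.

```python
import pandas as pd

# Toy landmark positions: two subjects, three frames each (hypothetical values).
frames = pd.DataFrame({
    "Subject": ["s1", "s1", "s1", "s2", "s2", "s2"],
    "Frame":   [0, 1, 2, 0, 1, 2],
    "x1":      [0.10, 0.12, 0.15, 0.30, 0.30, 0.28],
    "y1":      [0.50, 0.50, 0.47, 0.60, 0.62, 0.62],
})

landmark_cols = ["x1", "y1"]

# Difference each landmark coordinate against the previous frame of the same subject.
frames = frames.sort_values(["Subject", "Frame"])
motion = frames.groupby("Subject")[landmark_cols].diff()

# The first frame of every subject has no predecessor, so its difference is NaN; drop it.
motion_features = motion.dropna().reset_index(drop=True)
print(motion_features)
```

Grouping by subject before differencing keeps the last frame of one subject from being subtracted from the first frame of the next.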

In [None]:
# Load datasets
data_motion = pd.read_csv('data_motion.csv')
data_geometric = pd.read_csv('data_geometric.csv')

print("Motion dataset shape:", data_motion.shape)
print("Geometric dataset shape:", data_geometric.shape)
print("\nLabel distribution in motion dataset:")
print(data_motion['Emotion'].value_counts().sort_index())

# Prepare data - separate features and labels
# Remove non-feature columns: Subject, Subject_Clean, Session_Clean, Emotion
feature_cols = [col for col in data_motion.columns if col not in ['Subject', 'Subject_Clean', 'Session_Clean', 'Emotion']]

X_motion = data_motion[feature_cols]
y_motion = data_motion['Emotion']

X_geometric = data_geometric[feature_cols]
y_geometric = data_geometric['Emotion']

print("\nNumber of features:", len(feature_cols))
print("Feature columns (first 10):", feature_cols[:10])

In [None]:
# Split motion data into train and test sets
X_train_motion, X_test_motion, y_train_motion, y_test_motion = train_test_split(
    X_motion, y_motion, test_size=0.2, random_state=42, stratify=y_motion
)

# Split geometric data into train and test sets
X_train_geometric, X_test_geometric, y_train_geometric, y_test_geometric = train_test_split(
    X_geometric, y_geometric, test_size=0.2, random_state=42, stratify=y_geometric
)

print("Motion - Training set size:", X_train_motion.shape[0])
print("Motion - Test set size:", X_test_motion.shape[0])
print("\nGeometric - Training set size:", X_train_geometric.shape[0])
print("Geometric - Test set size:", X_test_geometric.shape[0])
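
The split above is stratified by emotion but ignores the `Subject` column, so samples from one person can land in both the training and test sets, which may inflate test scores. A minimal sketch of a subject-wise split follows, using synthetic data in place of the real table. Whether this matters here depends on how many samples each subject contributes, which this notebook does not report.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import GroupShuffleSplit

rng = np.random.default_rng(42)

# Synthetic stand-in for the real table: 10 subjects, 20 samples each.
demo = pd.DataFrame({
    "Subject": np.repeat([f"s{i}" for i in range(10)], 20),
    "x1": rng.normal(size=200),
    "y1": rng.normal(size=200),
    "Emotion": rng.integers(0, 4, size=200),
})

X_demo = demo[["x1", "y1"]]
y_demo = demo["Emotion"]
groups = demo["Subject"]

# Hold out 20% of the subjects (not 20% of the rows).
splitter = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=42)
train_idx, test_idx = next(splitter.split(X_demo, y_demo, groups=groups))

train_subjects = set(groups.iloc[train_idx])
test_subjects = set(groups.iloc[test_idx])
print("Train subjects:", len(train_subjects), "Test subjects:", len(test_subjects))
print("Overlap:", train_subjects & test_subjects)
```

`GroupShuffleSplit` does not stratify by label, so the class balance of the held-out subjects should be checked separately.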

## 2. Logistic Regression

### Comparison with and without StandardScaler

Logistic Regression can benefit from feature standardization, especially when features have different scales.
We will compare performance with and without StandardScaler to evaluate its impact.

In [None]:
# Logistic Regression WITHOUT StandardScaler
print("=== Logistic Regression WITHOUT StandardScaler ===")

start_time = time()
lr_no_scale = LogisticRegression(max_iter=1000, random_state=42)
lr_no_scale.fit(X_train_motion, y_train_motion)
train_time_lr_no_scale = time() - start_time

# Predictions
y_pred_lr_no_scale = lr_no_scale.predict(X_test_motion)

# Calculate metrics
acc_lr_no_scale = accuracy_score(y_test_motion, y_pred_lr_no_scale)
f1_lr_no_scale = f1_score(y_test_motion, y_pred_lr_no_scale, average='weighted')

print(f"Training time: {train_time_lr_no_scale:.3f} seconds")
print(f"Accuracy: {acc_lr_no_scale:.4f}")
print(f"F1-score (weighted): {f1_lr_no_scale:.4f}")

# Measure inference time for single prediction
single_sample = X_test_motion.iloc[0:1]
inference_times = []
for _ in range(100):
    start = time()
    _ = lr_no_scale.predict(single_sample)
    inference_times.append((time() - start) * 1000)  # Convert to milliseconds

avg_inference_lr_no_scale = np.mean(inference_times)
print(f"Average inference time (single sample): {avg_inference_lr_no_scale:.3f} ms")

In [None]:
# Logistic Regression WITH StandardScaler (using Pipeline)
print("=== Logistic Regression WITH StandardScaler ===")

start_time = time()
lr_pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('classifier', LogisticRegression(max_iter=1000, random_state=42))
])
lr_pipeline.fit(X_train_motion, y_train_motion)
train_time_lr_scaled = time() - start_time

# Predictions
y_pred_lr_scaled = lr_pipeline.predict(X_test_motion)

# Calculate metrics
acc_lr_scaled = accuracy_score(y_test_motion, y_pred_lr_scaled)
f1_lr_scaled = f1_score(y_test_motion, y_pred_lr_scaled, average='weighted')

print(f"Training time: {train_time_lr_scaled:.3f} seconds")
print(f"Accuracy: {acc_lr_scaled:.4f}")
print(f"F1-score (weighted): {f1_lr_scaled:.4f}")

# Measure inference time for single prediction
inference_times = []
for _ in range(100):
    start = time()
    _ = lr_pipeline.predict(single_sample)
    inference_times.append((time() - start) * 1000)  # Convert to milliseconds

avg_inference_lr_scaled = np.mean(inference_times)
print(f"Average inference time (single sample): {avg_inference_lr_scaled:.3f} ms")

print("\n=== Comparison ===")
print(f"Accuracy improvement: {(acc_lr_scaled - acc_lr_no_scale):.4f}")
print(f"F1-score improvement: {(f1_lr_scaled - f1_lr_no_scale):.4f}")
print("\nJustification: StandardScaler helps normalize features, which can improve convergence and performance.")
print("However, if the improvement is minimal, we can omit it to reduce computational overhead.")
if abs(acc_lr_scaled - acc_lr_no_scale) < 0.01:
    print("In this case, the improvement is minimal, so StandardScaler may not be necessary.")
else:
    print("In this case, StandardScaler provides noticeable improvement and should be retained.")
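
Single-prediction timings are short enough that the first call and occasional outliers can dominate the mean. The sketch below shows one way to make the measurement more robust: a warm-up call, `time.perf_counter`, and the median alongside the mean. The helper name `time_single_prediction` is hypothetical, and synthetic data stands in for the motion dataset.

```python
from time import perf_counter

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression


def time_single_prediction(model, sample, n_runs=100):
    """Return (median, mean) latency in milliseconds for predicting one sample."""
    model.predict(sample)  # warm-up call so one-off setup cost is not measured
    timings_ms = []
    for _ in range(n_runs):
        start = perf_counter()
        model.predict(sample)
        timings_ms.append((perf_counter() - start) * 1000)
    return float(np.median(timings_ms)), float(np.mean(timings_ms))


# Synthetic data stands in for X_train_motion / X_test_motion.
X_demo, y_demo = make_classification(
    n_samples=300, n_features=20, n_informative=5, random_state=42
)
demo_model = LogisticRegression(max_iter=1000).fit(X_demo, y_demo)

median_ms, mean_ms = time_single_prediction(demo_model, X_demo[:1])
print(f"median: {median_ms:.3f} ms, mean: {mean_ms:.3f} ms")
```

The same helper could be reused for every model in this notebook so that all inference timings are measured identically.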

In [None]:
# Detailed classification report for Logistic Regression (with scaler)
print("\nClassification Report (Logistic Regression with StandardScaler):")
print(classification_report(y_test_motion, y_pred_lr_scaled))
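
A difference measured on a single train/test split can be within noise. As a sketch of a more stable comparison, `cross_val_score` (already imported above) reports a mean and spread over several folds. The data here is synthetic, with one feature deliberately rescaled so that standardization has something to correct; results on the motion dataset may look quite different.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data stands in for X_motion / y_motion.
X_demo, y_demo = make_classification(
    n_samples=400, n_features=20, n_informative=8, n_classes=3, random_state=42
)
X_demo[:, 0] *= 1000  # give one feature a much larger scale than the rest

unscaled = LogisticRegression(max_iter=1000, random_state=42)
scaled = Pipeline([
    ("scaler", StandardScaler()),
    ("classifier", LogisticRegression(max_iter=1000, random_state=42)),
])

# The scaler sits inside the pipeline, so it is refit on each training fold only.
scores_unscaled = cross_val_score(unscaled, X_demo, y_demo, cv=5, scoring="accuracy")
scores_scaled = cross_val_score(scaled, X_demo, y_demo, cv=5, scoring="accuracy")

print(f"Unscaled: {scores_unscaled.mean():.3f} +/- {scores_unscaled.std():.3f}")
print(f"Scaled:   {scores_scaled.mean():.3f} +/- {scores_scaled.std():.3f}")
```

If the two intervals overlap heavily, a 0.01 accuracy threshold on one split is probably too fine a criterion for keeping or dropping the scaler.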

## 3. Random Forest Classifier

Random Forest is an ensemble method that combines multiple decision trees to improve prediction accuracy.
It is particularly useful for feature importance analysis.

In [None]:
# Train Random Forest
print("=== Random Forest Classifier ===")

start_time = time()
rf_model = RandomForestClassifier(n_estimators=100, max_depth=10, random_state=42)
rf_model.fit(X_train_motion, y_train_motion)
train_time_rf = time() - start_time

# Predictions
y_pred_rf = rf_model.predict(X_test_motion)

# Calculate metrics
acc_rf = accuracy_score(y_test_motion, y_pred_rf)
f1_rf = f1_score(y_test_motion, y_pred_rf, average='weighted')

print(f"Training time: {train_time_rf:.3f} seconds")
print(f"Accuracy: {acc_rf:.4f}")
print(f"F1-score (weighted): {f1_rf:.4f}")

# Measure inference time for single prediction
inference_times = []
for _ in range(100):
    start = time()
    _ = rf_model.predict(single_sample)
    inference_times.append((time() - start) * 1000)  # Convert to milliseconds

avg_inference_rf = np.mean(inference_times)
print(f"Average inference time (single sample): {avg_inference_rf:.3f} ms")

print("\nClassification Report:")
print(classification_report(y_test_motion, y_pred_rf))

## 4. Feature Importance Analysis (Random Forest)

We analyze which facial landmark features contribute most to the model's predictions.
Features are grouped by facial regions (e.g., mouth, eyes, eyebrows) to understand their relative importance.

In [None]:
# Get feature importances
feature_importance = pd.DataFrame({
    'feature': feature_cols,
    'importance': rf_model.feature_importances_
}).sort_values('importance', ascending=False)

print("Top 20 most important features:")
print(feature_importance.head(20))

# Plot top 20 features
plt.figure(figsize=(8, 6))
plt.barh(range(20), feature_importance.head(20)['importance'].values)
plt.yticks(range(20), feature_importance.head(20)['feature'].values)
plt.xlabel('Importance')
plt.title('Top 20 Feature Importances (Random Forest)')
plt.gca().invert_yaxis()
plt.tight_layout()
plt.show()
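
`feature_importances_` is computed from impurity reductions on the training data, which can overstate features that the trees happened to split on often. A hedged cross-check is permutation importance on held-out data, sketched below with synthetic data and hypothetical feature names (`f0`, `f1`, ...). It is offered as a complement to the ranking above, not a replacement.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X_demo, y_demo = make_classification(
    n_samples=400, n_features=10, n_informative=4, random_state=42
)
demo_cols = [f"f{i}" for i in range(X_demo.shape[1])]  # hypothetical feature names
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.25, random_state=42, stratify=y_demo
)

forest = RandomForestClassifier(n_estimators=50, random_state=42).fit(X_tr, y_tr)

# Shuffle one column at a time on held-out data and record the drop in accuracy.
result = permutation_importance(forest, X_te, y_te, n_repeats=5, random_state=42)

importance_table = pd.DataFrame({
    "feature": demo_cols,
    "impurity_importance": forest.feature_importances_,
    "permutation_importance": result.importances_mean,
}).sort_values("permutation_importance", ascending=False)
print(importance_table)
```

Features that rank high in both columns are the safer candidates to call important.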

In [None]:
# Group features by facial regions based on landmark indices
# Typical facial landmark groupings (based on 68-point model):
# Landmarks 1-17: Jaw line
# Landmarks 18-22: Right eyebrow
# Landmarks 23-27: Left eyebrow
# Landmarks 28-36: Nose
# Landmarks 37-42: Right eye
# Landmarks 43-48: Left eye
# Landmarks 49-68: Mouth

def categorize_feature(feature_name):
    """Categorize feature by facial region based on landmark number"""
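    if not feature_name or feature_name[0] not in ('x', 'y'):
        return 'other'

    try:
        landmark_num = int(feature_name[1:])
    except ValueError:
        # The text after the axis prefix is not an integer, so this is not a landmark column
        return 'other'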
    
    if 1 <= landmark_num <= 17:
        return 'jaw'
    elif 18 <= landmark_num <= 27:
        return 'eyebrows'
    elif 28 <= landmark_num <= 36:
        return 'nose'
    elif 37 <= landmark_num <= 48:
        return 'eyes'
    elif 49 <= landmark_num <= 68:
        return 'mouth'
    else:
        return 'other'

# Categorize and group importances
feature_importance['region'] = feature_importance['feature'].apply(categorize_feature)
region_importance = feature_importance.groupby('region')['importance'].sum().sort_values(ascending=False)

print("\nFeature importance by facial region:")
print(region_importance)

# Plot region importances
plt.figure(figsize=(8, 5))
plt.bar(region_importance.index, region_importance.values)
plt.xlabel('Facial Region')
plt.ylabel('Total Importance')
plt.title('Feature Importance by Facial Region')
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()

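print("\nInterpretation (check against the totals printed above):")
print("- Mouth region features typically have the highest importance for emotion recognition.")
print("- Eyes and eyebrows are also typically significant contributors.")
print("- Jaw and nose features typically contribute less to emotion classification.")
print("- Totals sum over unequal landmark counts (mouth 20, jaw 17, eyes 12, eyebrows 10, nose 9),")
print("  so regions with more landmarks are favored; compare mean importance per feature as well.")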
print("\nRecommendation for feature selection:")
print("- Retain: mouth, eyes, and eyebrow features (primary emotional indicators)")
print("- Consider removing: jaw features if computational efficiency is prioritized")
print("- Nose features can be retained as they provide structural context")
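
Before dropping a region on the basis of summed importance, it is worth retraining without it and comparing scores directly. The sketch below does this for the jaw region on synthetic data. The column names follow the `x1..x68` / `y1..y68` pattern that `categorize_feature` assumes, and the toy label is constructed to depend on one mouth landmark, so the numbers say nothing about the real dataset.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(42)

# Synthetic stand-in using the assumed naming pattern: x1..x68, y1..y68.
names = [f"{axis}{i}" for i in range(1, 69) for axis in ("x", "y")]
X_demo = pd.DataFrame(rng.normal(size=(300, len(names))), columns=names)
# Make the toy label depend on one mouth landmark so that an effect is visible.
y_demo = (X_demo["y52"] + 0.5 * rng.normal(size=300) > 0).astype(int)

# Landmarks 1-17 form the jaw line in the 68-point layout.
jaw_cols = [c for c in names if 1 <= int(c[1:]) <= 17]
subsets = {
    "all features": names,
    "without jaw": [c for c in names if c not in jaw_cols],
}

ablation = {}
for label, cols in subsets.items():
    forest = RandomForestClassifier(n_estimators=50, max_depth=10, random_state=42)
    ablation[label] = cross_val_score(forest, X_demo[cols], y_demo, cv=3).mean()
    print(f"{label:>13}: {len(cols):3d} features, CV accuracy = {ablation[label]:.3f}")
```

If accuracy is unchanged after removing a region, the removal is supported by the data rather than by the importance ranking alone.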

## 5. Learning Curves for Random Forest

### Effect of max_depth on model performance

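max_depth controls the maximum depth of each decision tree in the forest.
Deeper trees can capture more complex patterns but may overfit.
The curves in this section vary a hyperparameter rather than the training set size, so they correspond to what scikit-learn calls validation curves.
Because they are scored on the test set, any value chosen from them should be confirmed on data that was not used for the selection.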

In [None]:
# Learning curve: varying max_depth
max_depth_values = [2, 5, 10, 15, 20, 25, 30, None]
train_accuracies_depth = []
test_accuracies_depth = []

print("Testing different max_depth values...")
for depth in max_depth_values:
    rf_temp = RandomForestClassifier(n_estimators=50, max_depth=depth, random_state=42)
    rf_temp.fit(X_train_motion, y_train_motion)
    
    train_acc = rf_temp.score(X_train_motion, y_train_motion)
    test_acc = rf_temp.score(X_test_motion, y_test_motion)
    
    train_accuracies_depth.append(train_acc)
    test_accuracies_depth.append(test_acc)
    
    depth_str = 'None' if depth is None else str(depth)
    print(f"max_depth={depth_str:>4}: Train Acc={train_acc:.4f}, Test Acc={test_acc:.4f}")

# Plot learning curve
plt.figure(figsize=(8, 5))
x_labels = [str(d) if d is not None else 'None' for d in max_depth_values]
x_pos = range(len(max_depth_values))
plt.plot(x_pos, train_accuracies_depth, marker='o', label='Train Accuracy')
plt.plot(x_pos, test_accuracies_depth, marker='s', label='Test Accuracy')
plt.xlabel('max_depth')
plt.ylabel('Accuracy')
plt.title('Random Forest: Effect of max_depth on Accuracy')
plt.xticks(x_pos, x_labels, rotation=45)
plt.legend()
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

print("\nInterpretation:")
print("- Low max_depth values (2-5) typically underfit: both train and test accuracy stay low.")
print("- As max_depth increases, training accuracy improves significantly.")
print("- Test accuracy initially improves but may plateau or decrease if trees become too deep (overfitting).")
print("- The optimal max_depth balances model complexity with generalization ability.")
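
Scoring each `max_depth` on the test set means the test set is also being used to pick the hyperparameter. One hedged alternative is scikit-learn's `validation_curve`, which scores every setting by cross-validation on the training data only. The sketch uses synthetic data and a reduced depth grid to stay fast.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import validation_curve

# Synthetic data stands in for X_train_motion / y_train_motion.
X_demo, y_demo = make_classification(
    n_samples=400, n_features=20, n_informative=6, random_state=42
)

depth_range = [2, 5, 10, 20]
train_scores, val_scores = validation_curve(
    RandomForestClassifier(n_estimators=30, random_state=42),
    X_demo, y_demo,
    param_name="max_depth",
    param_range=depth_range,
    cv=3,
    scoring="accuracy",
)

# Each row corresponds to one max_depth value, each column to one CV fold.
for depth, tr, va in zip(depth_range, train_scores.mean(axis=1), val_scores.mean(axis=1)):
    print(f"max_depth={depth:>2}: train={tr:.3f}, validation={va:.3f}")
```

With this setup the test set is touched only once, after `max_depth` has been fixed.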

### Effect of n_estimators on model performance

n_estimators is the number of decision trees in the forest.
More trees generally improve performance but increase computational cost.

In [None]:
# Learning curve: varying n_estimators
n_estimators_values = [10, 25, 50, 75, 100, 150, 200]
train_accuracies_est = []
test_accuracies_est = []

print("Testing different n_estimators values...")
for n_est in n_estimators_values:
    rf_temp = RandomForestClassifier(n_estimators=n_est, max_depth=10, random_state=42)
    rf_temp.fit(X_train_motion, y_train_motion)
    
    train_acc = rf_temp.score(X_train_motion, y_train_motion)
    test_acc = rf_temp.score(X_test_motion, y_test_motion)
    
    train_accuracies_est.append(train_acc)
    test_accuracies_est.append(test_acc)
    
    print(f"n_estimators={n_est:>3}: Train Acc={train_acc:.4f}, Test Acc={test_acc:.4f}")

# Plot learning curve
plt.figure(figsize=(8, 5))
plt.plot(n_estimators_values, train_accuracies_est, marker='o', label='Train Accuracy')
plt.plot(n_estimators_values, test_accuracies_est, marker='s', label='Test Accuracy')
plt.xlabel('n_estimators')
plt.ylabel('Accuracy')
plt.title('Random Forest: Effect of n_estimators on Accuracy')
plt.legend()
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

print("\nInterpretation:")
print("- With few trees (10-25), the model's performance is lower due to high variance.")
print("- As n_estimators increases, both train and test accuracy improve and stabilize.")
print("- Beyond a certain point (typically 100-150), additional trees provide diminishing returns.")
print("- The ensemble effect reduces variance by averaging predictions from multiple trees.")
print("- Higher n_estimators values increase training time linearly but improve model robustness.")
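
Refitting a fresh forest for every `n_estimators` value repeats work. A possible shortcut, sketched below on synthetic data, is to grow one forest incrementally with `warm_start=True` and read the out-of-bag (OOB) accuracy after each step, which also avoids using the test set for this choice. OOB accuracy is an estimate from the bootstrap samples and need not match test accuracy exactly.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic data stands in for X_train_motion / y_train_motion.
X_demo, y_demo = make_classification(
    n_samples=400, n_features=20, n_informative=6, random_state=42
)

# warm_start=True keeps the already-fitted trees and only adds new ones on each fit call.
forest = RandomForestClassifier(
    max_depth=10, warm_start=True, oob_score=True, random_state=42
)

oob_accuracy = {}
for n_trees in [25, 50, 100, 150]:
    forest.set_params(n_estimators=n_trees)
    forest.fit(X_demo, y_demo)
    oob_accuracy[n_trees] = forest.oob_score_
    print(f"n_estimators={n_trees:>3}: OOB accuracy={forest.oob_score_:.4f}")
```

The point where the OOB accuracy stops changing is a reasonable place to stop adding trees.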

## 6. Decision Tree Classifier

A Decision Tree is a simpler model than a Random Forest.
It makes decisions with a single tree, which is easier to interpret but may be less accurate.

In [None]:
# Train Decision Tree
print("=== Decision Tree Classifier ===")

start_time = time()
dt_model = DecisionTreeClassifier(max_depth=10, random_state=42)
dt_model.fit(X_train_motion, y_train_motion)
train_time_dt = time() - start_time

# Predictions
y_pred_dt = dt_model.predict(X_test_motion)

# Calculate metrics
acc_dt = accuracy_score(y_test_motion, y_pred_dt)
f1_dt = f1_score(y_test_motion, y_pred_dt, average='weighted')

print(f"Training time: {train_time_dt:.3f} seconds")
print(f"Accuracy: {acc_dt:.4f}")
print(f"F1-score (weighted): {f1_dt:.4f}")

# Measure inference time for single prediction
inference_times = []
for _ in range(100):
    start = time()
    _ = dt_model.predict(single_sample)
    inference_times.append((time() - start) * 1000)  # Convert to milliseconds

avg_inference_dt = np.mean(inference_times)
print(f"Average inference time (single sample): {avg_inference_dt:.3f} ms")

print("\nClassification Report:")
print(classification_report(y_test_motion, y_pred_dt))
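
The interpretability of a single tree can be made concrete by printing its decision rules with `export_text`. The sketch below fits a deliberately shallow tree on synthetic data with hypothetical feature names; a depth-10 tree like `dt_model` would produce far longer output.

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier, export_text

X_demo, y_demo = make_classification(
    n_samples=300, n_features=6, n_informative=3, random_state=42
)
demo_cols = [f"f{i}" for i in range(X_demo.shape[1])]  # hypothetical feature names

# A shallow tree keeps the printed rules short enough to read.
small_tree = DecisionTreeClassifier(max_depth=3, random_state=42).fit(X_demo, y_demo)

rules = export_text(small_tree, feature_names=demo_cols)
print(rules)
```

Each printed branch is a threshold test on one feature, and each leaf ends in the predicted class.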

## 7. Model Comparison (Motion Dataset)

Compare the three models on the motion dataset (Logistic Regression both with and without scaling) in terms of accuracy, F1-score, training time, and inference time.

In [None]:
# Create comparison table
comparison_motion = pd.DataFrame({
    'Model': ['Logistic Regression (no scaler)', 'Logistic Regression (with scaler)', 'Random Forest', 'Decision Tree'],
    'Accuracy': [acc_lr_no_scale, acc_lr_scaled, acc_rf, acc_dt],
    'F1-Score': [f1_lr_no_scale, f1_lr_scaled, f1_rf, f1_dt],
    'Training Time (s)': [train_time_lr_no_scale, train_time_lr_scaled, train_time_rf, train_time_dt],
    'Inference Time (ms)': [avg_inference_lr_no_scale, avg_inference_lr_scaled, avg_inference_rf, avg_inference_dt]
})

print("\nModel Comparison on Motion Dataset:")
print(comparison_motion.to_string(index=False))

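print("\nObservations (computed from the table above):")
for metric, pick in [('Accuracy', 'idxmax'), ('F1-Score', 'idxmax'),
                     ('Training Time (s)', 'idxmin'), ('Inference Time (ms)', 'idxmin')]:
    row = comparison_motion.loc[getattr(comparison_motion[metric], pick)()]
    print(f"- Best {metric}: {row['Model']} ({row[metric]:.4f})")
print("Expected pattern: Random Forest is usually the most accurate but the slowest to train,")
print("while Logistic Regression and a single Decision Tree are usually faster at inference.")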
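
Accuracy and weighted F1 hide which emotions are confused with each other. A row-normalized confusion matrix (using `confusion_matrix`, which is imported above but not otherwise used) makes that visible. The sketch runs on synthetic four-class data, so the class labels are placeholders rather than the notebook's emotion labels.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

X_demo, y_demo = make_classification(
    n_samples=500, n_features=20, n_informative=8, n_classes=4, random_state=42
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)
forest = RandomForestClassifier(n_estimators=50, random_state=42).fit(X_tr, y_tr)

labels = sorted(set(y_te))
# normalize="true" divides each row by the number of true samples in that class,
# so the diagonal holds per-class recall.
cm = confusion_matrix(y_te, forest.predict(X_te), labels=labels, normalize="true")
cm_table = pd.DataFrame(
    cm,
    index=[f"true_{label}" for label in labels],
    columns=[f"pred_{label}" for label in labels],
)
print(cm_table.round(2))
```

Passing `cm_table` to `sns.heatmap(cm_table, annot=True)` would give a visual version of the same table.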

## 8. Comparison Between Data Types: Motion vs Geometric

Train all models on both motion and geometric datasets to determine which data representation is better suited for emotion classification.

In [None]:
# Train all models on geometric dataset
print("=== Training models on Geometric dataset ===")

# Logistic Regression (with scaler) on geometric
lr_geo = Pipeline([
    ('scaler', StandardScaler()),
    ('classifier', LogisticRegression(max_iter=1000, random_state=42))
])
lr_geo.fit(X_train_geometric, y_train_geometric)
y_pred_lr_geo = lr_geo.predict(X_test_geometric)
acc_lr_geo = accuracy_score(y_test_geometric, y_pred_lr_geo)
f1_lr_geo = f1_score(y_test_geometric, y_pred_lr_geo, average='weighted')
print(f"Logistic Regression - Accuracy: {acc_lr_geo:.4f}, F1-score: {f1_lr_geo:.4f}")

# Per-class F1 scores for Logistic Regression (geometric)
f1_lr_geo_per_class = f1_score(y_test_geometric, y_pred_lr_geo, average=None)

# Random Forest on geometric
rf_geo = RandomForestClassifier(n_estimators=100, max_depth=10, random_state=42)
rf_geo.fit(X_train_geometric, y_train_geometric)
y_pred_rf_geo = rf_geo.predict(X_test_geometric)
acc_rf_geo = accuracy_score(y_test_geometric, y_pred_rf_geo)
f1_rf_geo = f1_score(y_test_geometric, y_pred_rf_geo, average='weighted')
print(f"Random Forest - Accuracy: {acc_rf_geo:.4f}, F1-score: {f1_rf_geo:.4f}")

# Per-class F1 scores for Random Forest (geometric)
f1_rf_geo_per_class = f1_score(y_test_geometric, y_pred_rf_geo, average=None)

# Decision Tree on geometric
dt_geo = DecisionTreeClassifier(max_depth=10, random_state=42)
dt_geo.fit(X_train_geometric, y_train_geometric)
y_pred_dt_geo = dt_geo.predict(X_test_geometric)
acc_dt_geo = accuracy_score(y_test_geometric, y_pred_dt_geo)
f1_dt_geo = f1_score(y_test_geometric, y_pred_dt_geo, average='weighted')
print(f"Decision Tree - Accuracy: {acc_dt_geo:.4f}, F1-score: {f1_dt_geo:.4f}")

# Per-class F1 scores for Decision Tree (geometric)
f1_dt_geo_per_class = f1_score(y_test_geometric, y_pred_dt_geo, average=None)

In [None]:
# Per-class F1 scores for motion dataset
f1_lr_motion_per_class = f1_score(y_test_motion, y_pred_lr_scaled, average=None)
f1_rf_motion_per_class = f1_score(y_test_motion, y_pred_rf, average=None)
f1_dt_motion_per_class = f1_score(y_test_motion, y_pred_dt, average=None)

# Create comprehensive comparison table
comparison_datasets = pd.DataFrame({
    'Model': ['Logistic Regression', 'Logistic Regression', 'Random Forest', 'Random Forest', 'Decision Tree', 'Decision Tree'],
    'Dataset': ['Motion', 'Geometric', 'Motion', 'Geometric', 'Motion', 'Geometric'],
    'Accuracy': [acc_lr_scaled, acc_lr_geo, acc_rf, acc_rf_geo, acc_dt, acc_dt_geo],
    'F1-Score (weighted)': [f1_lr_scaled, f1_lr_geo, f1_rf, f1_rf_geo, f1_dt, f1_dt_geo]
})

print("\n=== Comparison: Motion vs Geometric Datasets ===")
print(comparison_datasets.to_string(index=False))

# Per-class F1 scores comparison
emotion_classes = sorted(y_test_motion.unique())
print("\n=== Per-Class F1-Scores ===")

per_class_comparison = pd.DataFrame({
    'Emotion': emotion_classes,
    'LR_Motion': f1_lr_motion_per_class,
    'LR_Geometric': f1_lr_geo_per_class,
    'RF_Motion': f1_rf_motion_per_class,
    'RF_Geometric': f1_rf_geo_per_class,
    'DT_Motion': f1_dt_motion_per_class,
    'DT_Geometric': f1_dt_geo_per_class
})

print(per_class_comparison.to_string(index=False))

# Visualize comparison
fig, axes = plt.subplots(1, 2, figsize=(12, 5))

# Accuracy comparison
models = ['LR', 'RF', 'DT']
motion_accs = [acc_lr_scaled, acc_rf, acc_dt]
geometric_accs = [acc_lr_geo, acc_rf_geo, acc_dt_geo]

x = np.arange(len(models))
width = 0.35

axes[0].bar(x - width/2, motion_accs, width, label='Motion')
axes[0].bar(x + width/2, geometric_accs, width, label='Geometric')
axes[0].set_ylabel('Accuracy')
axes[0].set_title('Accuracy Comparison')
axes[0].set_xticks(x)
axes[0].set_xticklabels(models)
axes[0].legend()
axes[0].grid(True, alpha=0.3, axis='y')

# F1-score comparison
motion_f1s = [f1_lr_scaled, f1_rf, f1_dt]
geometric_f1s = [f1_lr_geo, f1_rf_geo, f1_dt_geo]

axes[1].bar(x - width/2, motion_f1s, width, label='Motion')
axes[1].bar(x + width/2, geometric_f1s, width, label='Geometric')
axes[1].set_ylabel('F1-Score (weighted)')
axes[1].set_title('F1-Score Comparison')
axes[1].set_xticks(x)
axes[1].set_xticklabels(models)
axes[1].legend()
axes[1].grid(True, alpha=0.3, axis='y')

plt.tight_layout()
plt.show()

print("\n=== Justification: Which Dataset is Better? ===")
avg_motion = np.mean([acc_lr_scaled, acc_rf, acc_dt])
avg_geometric = np.mean([acc_lr_geo, acc_rf_geo, acc_dt_geo])

print(f"Average accuracy across all models - Motion: {avg_motion:.4f}, Geometric: {avg_geometric:.4f}")

if avg_motion > avg_geometric:
    print("\nConclusion: Motion dataset performs better on average.")
    print("Possible reason: motion features capture temporal changes in facial expressions.")
    print("Dynamic changes (e.g., mouth movement, eyebrow raising) may provide stronger")
    print("signals than static geometric positions.")
elif avg_motion < avg_geometric:
    print("\nConclusion: Geometric dataset performs better on average.")
    print("Possible reason: geometric features provide stable spatial relationships between facial landmarks,")
    print("which may be more robust to noise and individual variations than motion features.")
else:
    print("\nConclusion: Both datasets reach the same average accuracy.")

## 9. Summary and Conclusions

### Key Findings:

1. **Feature Importance**: Mouth and eye regions are typically the most important for emotion recognition; Section 4 reports the totals per region for this dataset.
2. **Model Complexity**: Random Forest benefits from moderate depth and multiple trees, with diminishing returns beyond roughly 100-150 trees.
3. **Model Performance**: Random Forest typically achieves the best accuracy, followed by Decision Tree and Logistic Regression; the table in Section 7 gives the measured ranking.
4. **Data Type**: Which representation performs better is decided empirically in Section 8 by comparing accuracy averaged over the three models.
5. **Standardization**: StandardScaler is worth keeping for Logistic Regression only if it changes accuracy by at least 0.01 (Section 2).
6. **Timing**: Logistic Regression is typically among the fastest at inference, while Random Forest requires the most training time.

### Recommendations:

- For highest accuracy: Use Random Forest with optimized hyperparameters.
- For real-time applications: Use Logistic Regression for faster inference.
- For feature selection: Focus on mouth and eye regions; consider removing jaw features.
- For data representation: Choose based on the specific characteristics of your dataset (motion vs geometric).