# Facial Expression Expressiveness Recognition System

This notebook implements a CNN-based facial expression recognition system that classifies facial expressions into three categories based on expressiveness levels rather than traditional emotions:

- **Reserved Expression**: Low facial expressiveness (facial expression score below -0.3)
- **Balanced Expression**: Neutral facial expressiveness (facial expression score from -0.3 to 0.3)
- **Expressive**: High facial expressiveness (facial expression score above 0.3)

This approach differs from Ekman's 7 basic emotions by focusing on the intensity of facial expressiveness rather than specific emotional states.

The system uses:
- **MediaPipe** for face detection (in place of OpenCV's face detectors; OpenCV is still used for video reading and image processing)
- **TensorFlow/Keras** for the CNN model (alternative to PyTorch)
- **RecruitView_Data** dataset for training

## 1. Setup and Imports

In [None]:
# Install required packages if not already installed
# !pip install tensorflow mediapipe opencv-python pandas numpy scikit-learn matplotlib seaborn

import numpy as np
import pandas as pd
import tensorflow as tf
from tensorflow import keras
import mediapipe as mp
import cv2
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.preprocessing import LabelEncoder
import json
import os
from tqdm import tqdm
import warnings
warnings.filterwarnings('ignore')

print("Libraries imported successfully!")

## 2. Data Loading and Analysis

In [None]:
# Load metadata
metadata_path = 'FYP/RecruitView_Data/metadata.jsonl'

# Read the JSONL file
data = []
with open(metadata_path, 'r') as f:
    for line in f:
        data.append(json.loads(line))

df = pd.DataFrame(data)
print(f"Dataset shape: {df.shape}")
print("\nColumns:", list(df.columns))
df.head()

In [None]:
# Analyze facial expression scores distribution
plt.figure(figsize=(12, 6))
plt.subplot(1, 2, 1)
sns.histplot(df['facial_expression'], bins=50, kde=True)
plt.title('Distribution of Facial Expression Scores')
plt.xlabel('Facial Expression Score')
plt.ylabel('Count')

plt.subplot(1, 2, 2)
sns.boxplot(y=df['facial_expression'])
plt.title('Box Plot of Facial Expression Scores')
plt.ylabel('Facial Expression Score')

plt.tight_layout()
plt.show()

print("Facial Expression Score Statistics:")
print(df['facial_expression'].describe())

## 3. Design Expressiveness Classification System

In [None]:
# Define expressiveness categories based on facial expression scores
def categorize_expressiveness(score):
    """
    Categorize facial expression scores into expressiveness levels
    
    - Reserved Expression: score < -0.3 (low expressiveness)
    - Balanced Expression: -0.3 <= score <= 0.3 (neutral expressiveness)
    - Expressive: score > 0.3 (high expressiveness)
    """
    if score < -0.3:
        return 'Reserved Expression'
    elif score <= 0.3:
        return 'Balanced Expression'
    else:
        return 'Expressive'

# Apply categorization
df['expressiveness_category'] = df['facial_expression'].apply(categorize_expressiveness)

# Display category distribution
plt.figure(figsize=(10, 6))
category_counts = df['expressiveness_category'].value_counts()
sns.barplot(x=category_counts.index, y=category_counts.values)
plt.title('Distribution of Expressiveness Categories')
plt.xlabel('Expressiveness Category')
plt.ylabel('Count')
plt.xticks(rotation=45)
plt.show()

print("\nExpressiveness Category Distribution:")
print(category_counts)
print(f"\nTotal samples: {len(df)}")
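
The thresholds above put both boundary values (-0.3 and 0.3) in the Balanced band. A minimal sketch of a vectorized equivalent, useful for checking that boundary behavior on whole arrays, is given here. The names `categorize_scores`, `LOW`, and `HIGH` are illustrative and not part of the notebook.

```python
import numpy as np

# Thresholds copied from categorize_expressiveness (assumed unchanged).
LOW, HIGH = -0.3, 0.3

def categorize_scores(scores):
    """Vectorized equivalent of categorize_expressiveness for an array of scores."""
    scores = np.asarray(scores, dtype=float)
    # Conditions are checked in order; the first match wins.
    conditions = [scores < LOW, scores <= HIGH]
    choices = ['Reserved Expression', 'Balanced Expression']
    return np.select(conditions, choices, default='Expressive')

# Both boundary values, -0.3 and 0.3, fall into the balanced band.
example_scores = [-1.2, -0.3, 0.0, 0.3, 0.31]
example_labels = categorize_scores(example_scores).tolist()
print(example_labels)
```

As in the scalar function, a NaN score fails both comparisons and ends up labelled Expressive, so missing scores may be worth filtering out beforehand.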

## 4. Face Detection and Feature Extraction Setup

In [None]:
# Initialize MediaPipe Face Detection
mp_face_detection = mp.solutions.face_detection
mp_drawing = mp.solutions.drawing_utils

# Initialize face detection model
face_detection = mp_face_detection.FaceDetection(
    model_selection=1,  # 1 = full-range model (faces up to about 5 m away); 0 = short-range model (about 2 m)
    min_detection_confidence=0.5
)

print("MediaPipe Face Detection initialized!")

In [None]:
def extract_face_from_frame(frame):
    """
    Extract face from a video frame using MediaPipe
    
    Args:
        frame: Video frame as numpy array
    
    Returns:
        cropped_face: Cropped face image (48x48 grayscale) or None if no face detected
    """
    # Convert BGR to RGB
    rgb_frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    
    # Process the frame
    results = face_detection.process(rgb_frame)
    
    if results.detections:
        # Get the first detected face
        detection = results.detections[0]
        
        # Get bounding box
        bbox = detection.location_data.relative_bounding_box
        
        # Convert relative coordinates to absolute
        h, w, _ = frame.shape
        x_min = int(bbox.xmin * w)
        y_min = int(bbox.ymin * h)
        width = int(bbox.width * w)
        height = int(bbox.height * h)
        
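        # Add padding on every side. Both corners are derived from the
        # unclamped box, so clamping x_min/y_min to 0 does not shift x_max/y_max.
        padding = int(0.1 * width)
        x_max = min(w, x_min + width + padding)
        y_max = min(h, y_min + height + padding)
        x_min = max(0, x_min - padding)
        y_min = max(0, y_min - padding)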
        
        # Crop face
        face_crop = frame[y_min:y_max, x_min:x_max]
        
        if face_crop.size > 0:
            # Convert to grayscale
            face_gray = cv2.cvtColor(face_crop, cv2.COLOR_BGR2GRAY)
            
            # Resize to 48x48
            face_resized = cv2.resize(face_gray, (48, 48))
            
            return face_resized
    
    return None

print("Face extraction function defined!")
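
The coordinate arithmetic in the function above can be checked without MediaPipe by isolating it in a pure function. The following is a minimal sketch; `padded_crop_box` and `pad_ratio` are hypothetical names introduced for illustration. It assumes the same 10% padding (relative to box width) and computes both corners from the unclamped box before clamping them to the frame.

```python
def padded_crop_box(xmin, ymin, width, height, frame_w, frame_h, pad_ratio=0.1):
    """Convert a relative bounding box to padded pixel coordinates inside the frame.

    Returns (x0, y0, x1, y1) with x1/y1 exclusive, so frame[y0:y1, x0:x1] is the crop.
    """
    box_w = int(width * frame_w)
    box_h = int(height * frame_h)
    x = int(xmin * frame_w)
    y = int(ymin * frame_h)
    pad = int(pad_ratio * box_w)

    # Derive each corner from the unclamped box, then clamp it to the frame.
    x0 = max(0, x - pad)
    y0 = max(0, y - pad)
    x1 = min(frame_w, x + box_w + pad)
    y1 = min(frame_h, y + box_h + pad)
    return x0, y0, x1, y1

# A centered box, and a box that starts slightly outside the left edge.
print(padded_crop_box(0.25, 0.25, 0.5, 0.5, 640, 480))
print(padded_crop_box(-0.125, 0.0, 0.5, 0.5, 640, 480))
```

Relative coordinates from a detector can be slightly negative when a face touches the frame edge, which is why the second call is included.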

## 5. Video Frame Extraction and Data Preparation

In [None]:
def extract_frames_from_video(video_path, max_frames=10, frame_skip=5):
    """
    Extract frames from video for facial expression analysis
    
    Args:
        video_path: Path to video file
        max_frames: Maximum number of frames to extract
        frame_skip: Sample every Nth frame (the frames in between are skipped)
    
    Returns:
        faces: List of extracted face images
    """
    faces = []
    
    try:
        cap = cv2.VideoCapture(video_path)
        
        if not cap.isOpened():
            print(f"Could not open video: {video_path}")
            return faces
        
        frame_count = 0
        extracted_count = 0
        
        while extracted_count < max_frames:
            ret, frame = cap.read()
            
            if not ret:
                break
            
            # Only process every frame_skip-th frame so the samples are spread out
            if frame_count % frame_skip == 0:
                face = extract_face_from_frame(frame)
                if face is not None:
                    faces.append(face)
                    extracted_count += 1
            
            frame_count += 1
            
            # Stop after scanning 1000 frames so a long video with no detectable face does not stall processing
            if frame_count > 1000:
                break
        
        cap.release()
        
    except Exception as e:
        print(f"Error processing video {video_path}: {str(e)}")
    
    return faces

print("Video frame extraction function defined!")
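
With `frame_skip=5` and a small `max_frames`, the function above takes its samples from the first frames of each clip. One possible alternative, sketched here under the assumption that the total frame count is known (for example from `cap.get(cv2.CAP_PROP_FRAME_COUNT)`, which can be approximate for some containers), is to pick indices spread across the whole video. The helper name `evenly_spaced_frame_indices` is hypothetical.

```python
def evenly_spaced_frame_indices(total_frames, max_frames):
    """Return up to max_frames frame indices spread across the whole clip."""
    if total_frames <= 0 or max_frames <= 0:
        return []
    n = min(total_frames, max_frames)
    # Split the clip into n equal segments and take the midpoint of each one.
    segment = total_frames / n
    return [int(segment * (i + 0.5)) for i in range(n)]

# A 300-frame clip sampled three times, and a clip shorter than max_frames.
print(evenly_spaced_frame_indices(300, 3))
print(evenly_spaced_frame_indices(2, 3))
```

Each index could then be read by seeking with `cap.set(cv2.CAP_PROP_POS_FRAMES, index)` before calling `cap.read()`.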

In [None]:
# Prepare dataset by extracting faces from videos
# Only a small subset is processed here to keep the demonstration fast;
# raise or remove the limit to train on more of the dataset.
sample_df = df.head(50)  # First 50 videos

X_faces = []
y_labels = []

print("Extracting faces from videos...")
for idx, row in tqdm(sample_df.iterrows(), total=len(sample_df)):
    video_path = f"FYP/RecruitView_Data/{row['file_name']}"
    
    if os.path.exists(video_path):
        faces = extract_frames_from_video(video_path, max_frames=3)
        
        for face in faces:
            X_faces.append(face)
            y_labels.append(row['expressiveness_category'])
    else:
        print(f"Video not found: {video_path}")

print(f"\nExtracted {len(X_faces)} face images from {len(sample_df)} videos")

# Convert to numpy arrays
if X_faces:
    X = np.array(X_faces)
    X = X.reshape(-1, 48, 48, 1)  # Add channel dimension
    X = X.astype('float32') / 255.0  # Normalize
    
    # Encode labels
    label_encoder = LabelEncoder()
    y_encoded = label_encoder.fit_transform(y_labels)
    y = keras.utils.to_categorical(y_encoded, num_classes=3)
    
    print(f"X shape: {X.shape}")
    print(f"y shape: {y.shape}")
    print(f"Classes: {label_encoder.classes_}")
else:
    print("No faces extracted. Please check video paths and face detection.")
    X = None
    y = None
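
`LabelEncoder` learns its classes from the labels it sees, so if the 50-video subset happens to contain only two categories, `label_encoder.classes_` will have two entries while the model still has three outputs. A minimal sketch of an encoding with a fixed class list follows. `CLASS_NAMES` and `one_hot_labels` are illustrative names, and the alphabetical order is chosen to match what `LabelEncoder` would produce when all three categories are present.

```python
import numpy as np

# Fixed class order (alphabetical, as LabelEncoder would sort them).
CLASS_NAMES = ['Balanced Expression', 'Expressive', 'Reserved Expression']

def one_hot_labels(labels, class_names=CLASS_NAMES):
    """Map label strings to integer ids and one-hot rows using a fixed class list."""
    index = {name: i for i, name in enumerate(class_names)}
    ids = np.array([index[label] for label in labels], dtype=np.int64)
    one_hot = np.zeros((len(ids), len(class_names)), dtype=np.float32)
    one_hot[np.arange(len(ids)), ids] = 1.0
    return ids, one_hot

# A sample that contains only two of the three categories.
sample_labels = ['Expressive', 'Expressive', 'Balanced Expression']
sample_ids, sample_one_hot = one_hot_labels(sample_labels)
print(sample_ids)
print(sample_one_hot.shape)
```

The one-hot matrix keeps three columns even though one category is absent from the sample.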

## 6. Build CNN Model with TensorFlow/Keras

In [None]:
def create_expressiveness_model():
    """
    Create CNN model for facial expressiveness recognition
    """
    model = keras.Sequential([
        # Input layer
        keras.layers.Input(shape=(48, 48, 1)),
        
        # First convolutional block
        keras.layers.Conv2D(32, (3, 3), padding='same'),
        keras.layers.BatchNormalization(),
        keras.layers.Activation('relu'),
        keras.layers.MaxPooling2D((2, 2)),
        keras.layers.Dropout(0.25),
        
        # Second convolutional block
        keras.layers.Conv2D(64, (3, 3), padding='same'),
        keras.layers.BatchNormalization(),
        keras.layers.Activation('relu'),
        keras.layers.MaxPooling2D((2, 2)),
        keras.layers.Dropout(0.25),
        
        # Third convolutional block
        keras.layers.Conv2D(128, (3, 3), padding='same'),
        keras.layers.BatchNormalization(),
        keras.layers.Activation('relu'),
        keras.layers.MaxPooling2D((2, 2)),
        keras.layers.Dropout(0.25),
        
        # Flatten and dense layers
        keras.layers.Flatten(),
        keras.layers.Dense(256, activation='relu'),
        keras.layers.BatchNormalization(),
        keras.layers.Dropout(0.5),
        
        # Output layer
        keras.layers.Dense(3, activation='softmax')  # 3 classes; output index order follows label_encoder.classes_ (alphabetical)
    ])
    
    return model

# Create model
model = create_expressiveness_model()

# Compile model
model.compile(
    optimizer=keras.optimizers.Adam(learning_rate=0.001),
    loss='categorical_crossentropy',
    metrics=['accuracy']
)

# Display model summary
model.summary()
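
The figures reported by `model.summary()` can be cross-checked by hand. The sketch below recomputes the flattened feature size and the parameter counts for the architecture defined above, assuming the standard counting rules (a bias per convolution or dense unit, and four values per batch-normalization channel, two of which are non-trainable moving statistics). The helper names are illustrative.

```python
def conv2d_params(kernel_size, in_channels, out_channels):
    # One kernel per (input, output) channel pair, plus one bias per output channel.
    return kernel_size * kernel_size * in_channels * out_channels + out_channels

def batchnorm_params(channels):
    # gamma, beta, moving mean, moving variance.
    return 4 * channels

def dense_params(in_units, out_units):
    return in_units * out_units + out_units

# 'same' padding keeps the spatial size; each 2x2 max pool halves it: 48 -> 24 -> 12 -> 6.
side = 48
for _ in range(3):
    side //= 2
flat_units = side * side * 128

layer_params = [
    ('conv1', conv2d_params(3, 1, 32)),
    ('bn1', batchnorm_params(32)),
    ('conv2', conv2d_params(3, 32, 64)),
    ('bn2', batchnorm_params(64)),
    ('conv3', conv2d_params(3, 64, 128)),
    ('bn3', batchnorm_params(128)),
    ('dense', dense_params(flat_units, 256)),
    ('bn_dense', batchnorm_params(256)),
    ('output', dense_params(256, 3)),
]
total_params = sum(count for _, count in layer_params)
non_trainable = 2 * (32 + 64 + 128 + 256)  # moving mean and variance only

for name, count in layer_params:
    print(f"{name:>9}: {count:,}")
print(f"flattened units: {flat_units}")
print(f"total: {total_params:,} (non-trainable: {non_trainable:,})")
```

Most of the parameters sit in the first dense layer, which is a common place to look when the model overfits a small dataset.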

## 7. Model Training

In [None]:
if X is not None and len(X) > 0:
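    # Split data into train and test sets.
    # Note: the split is made over individual frames, so frames from the same
    # video can end up in both sets.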
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42, stratify=y_encoded
    )
    
    print(f"Training set shape: {X_train.shape}")
    print(f"Test set shape: {X_test.shape}")
    
    # Data augmentation
    datagen = keras.preprocessing.image.ImageDataGenerator(
        rotation_range=10,
        width_shift_range=0.1,
        height_shift_range=0.1,
        horizontal_flip=True,
        zoom_range=0.1
    )
    
    # Train the model
    history = model.fit(
        datagen.flow(X_train, y_train, batch_size=32),
        validation_data=(X_test, y_test),
        epochs=50,
        callbacks=[
            keras.callbacks.EarlyStopping(
                monitor='val_accuracy',
                patience=10,
                restore_best_weights=True
            ),
            keras.callbacks.ReduceLROnPlateau(
                monitor='val_loss',
                factor=0.5,
                patience=5,
                min_lr=1e-6
            )
        ]
    )
    
    print("\nTraining completed!")
else:
    print("No data available for training. Please check the data extraction process.")
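
Because several frames come from each video and share one label, a frame-level split can place near-identical faces in both the training and test sets. One way to avoid this is to split by video. The sketch below is a NumPy-only illustration, not the notebook's method. `split_by_group` is a hypothetical helper, and it assumes a `video_ids` array with one entry per extracted face (which the extraction loop would need to record). scikit-learn's `GroupShuffleSplit` serves the same purpose.

```python
import numpy as np

def split_by_group(groups, test_size=0.2, seed=42):
    """Return boolean (train_mask, test_mask) so that no group appears in both."""
    groups = np.asarray(groups)
    unique_groups = np.unique(groups)
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(unique_groups)
    # Hold out at least one whole group.
    n_test = max(1, int(round(test_size * len(unique_groups))))
    test_mask = np.isin(groups, shuffled[:n_test])
    return ~test_mask, test_mask

# Hypothetical ids: three frames from each of ten videos.
video_ids = np.repeat(np.arange(10), 3)
train_mask, test_mask = split_by_group(video_ids)
print(train_mask.sum(), test_mask.sum())
```

The masks can index `X` and `y` directly, for example `X[train_mask]`. This sketch does not stratify by class, so the class balance of each side should be checked separately.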

## 8. Model Evaluation

In [None]:
if 'history' in locals():
    # Plot training history
    plt.figure(figsize=(12, 4))
    
    plt.subplot(1, 2, 1)
    plt.plot(history.history['accuracy'], label='Training Accuracy')
    plt.plot(history.history['val_accuracy'], label='Validation Accuracy')
    plt.title('Model Accuracy')
    plt.xlabel('Epoch')
    plt.ylabel('Accuracy')
    plt.legend()
    
    plt.subplot(1, 2, 2)
    plt.plot(history.history['loss'], label='Training Loss')
    plt.plot(history.history['val_loss'], label='Validation Loss')
    plt.title('Model Loss')
    plt.xlabel('Epoch')
    plt.ylabel('Loss')
    plt.legend()
    
    plt.tight_layout()
    plt.show()
    
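    # Evaluate on the held-out set. This is the same data passed as validation_data
    # for early stopping, so the reported accuracy is likely to be optimistic.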
    test_loss, test_accuracy = model.evaluate(X_test, y_test, verbose=0)
    print(f"\nTest Accuracy: {test_accuracy * 100:.2f}%")
    print(f"Test Loss: {test_loss:.4f}")
    
    # Predictions
    y_pred = model.predict(X_test)
    y_pred_classes = np.argmax(y_pred, axis=1)
    y_true_classes = np.argmax(y_test, axis=1)
    
    # Classification report
    print("\nClassification Report:")
    print(classification_report(y_true_classes, y_pred_classes, 
                               target_names=label_encoder.classes_))
    
    # Confusion matrix
    cm = confusion_matrix(y_true_classes, y_pred_classes)
    plt.figure(figsize=(8, 6))
    sns.heatmap(cm, annot=True, fmt='d', cmap='Blues',
                xticklabels=label_encoder.classes_,
                yticklabels=label_encoder.classes_)
    plt.title('Confusion Matrix')
    plt.xlabel('Predicted')
    plt.ylabel('True')
    plt.show()
else:
    print("No training history available. Model was not trained.")
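
The labels in this dataset are assigned per video, while the metrics above are computed per frame. A video-level prediction can be obtained by averaging the softmax outputs of all frames from the same video. The following is a minimal sketch. `aggregate_by_video` is a hypothetical helper and the probabilities are made-up illustrative values, not model output.

```python
import numpy as np

def aggregate_by_video(frame_probs, video_ids):
    """Average per-frame class probabilities within each video."""
    frame_probs = np.asarray(frame_probs, dtype=float)
    video_ids = np.asarray(video_ids)
    unique_ids, inverse = np.unique(video_ids, return_inverse=True)
    # Sum the probability rows that belong to each video, then divide by the frame count.
    sums = np.zeros((len(unique_ids), frame_probs.shape[1]))
    np.add.at(sums, inverse, frame_probs)
    counts = np.bincount(inverse, minlength=len(unique_ids))
    mean_probs = sums / counts[:, None]
    return unique_ids, mean_probs, mean_probs.argmax(axis=1)

# Illustrative values: two frames from video 'a', two from video 'b'.
frame_probs = [
    [0.6, 0.3, 0.1],
    [0.4, 0.5, 0.1],
    [0.1, 0.2, 0.7],
    [0.2, 0.2, 0.6],
]
video_ids = ['a', 'a', 'b', 'b']
ids, mean_probs, video_pred = aggregate_by_video(frame_probs, video_ids)
print(ids, video_pred)
```

In the example, the second frame of video `a` would be classified differently on its own, but the averaged prediction for the video follows the stronger overall evidence.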

## 9. Save Model

In [None]:
# Create models directory if it doesn't exist
os.makedirs('models', exist_ok=True)

# Save the model
model.save('models/Facial_Expressiveness_Recognition_Model.h5')

# Save model architecture as JSON
model_json = model.to_json()
with open('models/Facial_Expressiveness_Recognition_Model.json', 'w') as json_file:
    json_file.write(model_json)

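# Save weights
# Note: Keras 3 only accepts weights filenames ending in `.weights.h5`;
# the name below works with Keras 2 (bundled with TensorFlow 2.15 and earlier).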
model.save_weights('models/expressiveness_model_weights.h5')

# Save label encoder classes
np.save('models/label_encoder_classes.npy', label_encoder.classes_)

print("Model saved successfully!")
print("Files saved:")
print("- models/Facial_Expressiveness_Recognition_Model.h5")
print("- models/Facial_Expressiveness_Recognition_Model.json")
print("- models/expressiveness_model_weights.h5")
print("- models/label_encoder_classes.npy")

## 10. Real-time Testing (Optional)

In [None]:
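# Real-time facial expressiveness recognition
# To test with a webcam, remove the surrounding triple quotes and run this cell.
# It relies on extract_face_from_frame() from Section 4 being defined.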

"""
import cv2
import numpy as np
from tensorflow.keras.models import model_from_json

# Load the model architecture and weights
with open("models/Facial_Expressiveness_Recognition_Model.json", "r") as json_file:
    model = model_from_json(json_file.read())
model.load_weights('models/expressiveness_model_weights.h5')

# Load label encoder classes
label_classes = np.load('models/label_encoder_classes.npy')

# Initialize webcam
cap = cv2.VideoCapture(0)

while True:
    ret, frame = cap.read()
    if not ret:
        break
    
    # Extract face
    face = extract_face_from_frame(frame)
    
    if face is not None:
        # Preprocess face
        face_input = face.reshape(1, 48, 48, 1).astype('float32') / 255.0
        
        # Predict expressiveness
        prediction = model.predict(face_input, verbose=0)
        predicted_class = np.argmax(prediction)
        expressiveness_label = label_classes[predicted_class]
        confidence = prediction[0][predicted_class]
        
        # Display result on frame
        cv2.putText(frame, f'{expressiveness_label}: {confidence:.2f}', 
                   (10, 30), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)
    
    # Display frame
    cv2.imshow('Facial Expressiveness Recognition', frame)
    
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()
"""
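
Per-frame predictions from a webcam tend to flicker between classes. One simple remedy, sketched here as an optional addition rather than part of the original loop, is to average the last few probability vectors before taking the argmax. `PredictionSmoother` and its `window` parameter are hypothetical names, and the probabilities in the example are illustrative.

```python
from collections import deque

import numpy as np

class PredictionSmoother:
    """Average the most recent `window` probability vectors."""

    def __init__(self, window=5):
        self.history = deque(maxlen=window)  # the oldest entries drop out automatically

    def update(self, probs):
        """Add one probability vector and return (class_index, mean_confidence)."""
        self.history.append(np.asarray(probs, dtype=float))
        mean_probs = np.stack(list(self.history)).mean(axis=0)
        return int(np.argmax(mean_probs)), float(np.max(mean_probs))

# Illustrative stream: the raw argmax switches at the second frame,
# while the smoothed label switches one frame later.
smoother = PredictionSmoother(window=3)
stream = [[0.1, 0.8, 0.1], [0.7, 0.2, 0.1], [0.7, 0.2, 0.1]]
smoothed_labels = [smoother.update(p)[0] for p in stream]
print(smoothed_labels)
```

In the webcam loop, `smoother.update(prediction[0])` could replace the direct `np.argmax(prediction)` call, at the cost of a short delay before the label changes.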

## Summary

This notebook implements a facial expression recognition system that focuses on **expressiveness levels** rather than traditional emotions:

### Key Features:
1. **Alternative to Ekman's 7 emotions**: Classifies into Reserved, Balanced, and Expressive categories
2. **MediaPipe instead of OpenCV for face detection**: Uses Google's MediaPipe to locate faces; OpenCV is still used for video reading, color conversion, and resizing
3. **TensorFlow/Keras instead of PyTorch**: Uses TensorFlow/Keras for the CNN model
4. **RecruitView_Data dataset**: Uses the provided interview video dataset

### Model Architecture:
- 3 convolutional blocks (32, 64, and 128 filters), each with batch normalization, ReLU, max pooling, and dropout
- Dense layer with 256 neurons
- Output layer with 3 classes (softmax)

### Categories:
- **Reserved Expression**: Low expressiveness (facial expression score < -0.3)
- **Balanced Expression**: Neutral expressiveness (-0.3 ≤ score ≤ 0.3)
- **Expressive**: High expressiveness (score > 0.3)

### Usage:
The model can be used to analyze facial expressiveness in interview videos or in a real-time webcam feed, providing insights into candidates' communication styles during interviews.