# Emotional State Identification Using BVP Signals

This notebook demonstrates how to identify emotional states using Blood Volume Pulse (BVP) signals through signal processing and machine learning techniques.

## Import Required Libraries

Import NumPy, pandas, SciPy, matplotlib, seaborn, and scikit-learn for data processing, signal analysis, visualization, and machine learning.

In [None]:
# Import Required Libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import signal
from scipy.stats import skew, kurtosis
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, classification_report, confusion_matrix
import warnings
warnings.filterwarnings('ignore')

# Set plotting style
plt.style.use('seaborn-v0_8-darkgrid')
sns.set_palette("husl")

print("Libraries imported successfully!")

## Load and Preprocess BVP Signal Data

Load the BVP signal data from a CSV file, falling back to synthetic signals if the file is not found, and drop rows with missing values. Feature scaling is applied later, just before training.
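
Dropping rows is the simplest way to handle missing values. As a minimal sketch of an alternative, assuming gaps appear as NaN samples inside a window, the hypothetical helper `clean_bvp_window` below interpolates over the gaps and then z-scores the window. It is illustrative only and is not used elsewhere in this notebook.

```python
import numpy as np

def clean_bvp_window(window):
    """Fill NaN gaps by linear interpolation, then z-score the window.

    Hypothetical helper for illustration; not part of the notebook's pipeline.
    """
    x = np.array(window, dtype=float)  # Copy so the caller's data is untouched
    missing = np.isnan(x)
    if missing.all():
        raise ValueError("window contains no valid samples")
    if missing.any():
        idx = np.arange(x.size)
        # np.interp holds the nearest valid value for gaps at the edges
        x[missing] = np.interp(idx[missing], idx[~missing], x[~missing])
    std = x.std()
    # Guard against a flat window to avoid division by zero
    return (x - x.mean()) / std if std > 0 else x - x.mean()

raw = [1.0, np.nan, 3.0, 4.0, np.nan]
cleaned = clean_bvp_window(raw)
print(cleaned)
```

Interpolation keeps the window length fixed, which matters because the feature extraction later in the notebook assumes windows sampled at a constant rate.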

In [None]:
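# Load BVP signal data
# Note: Replace 'bvp_data.csv' with your actual data file path
# Expected format: CSV with columns 'timestamp', 'bvp_signal', 'emotion_label'
# Each 'bvp_signal' cell is expected to hold one window as a list literal,
# e.g. "[0.1, 0.2, 0.3]", because later cells parse string values into arrays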

def load_bvp_data(filepath):
    """
    Load BVP signal data from a CSV file
    """
    try:
        data = pd.read_csv(filepath)
        print(f"Data loaded successfully! Shape: {data.shape}")
        print(f"Columns: {data.columns.tolist()}")
        return data
    except FileNotFoundError:
        print("File not found. Creating sample data for demonstration...")
        return create_sample_data()

def create_sample_data():
    """
    Create sample BVP data for demonstration purposes
    """
    np.random.seed(42)
    n_samples = 1000
    
    # Simulate BVP signals for different emotional states
    # 0: Neutral, 1: Happy, 2: Stressed, 3: Sad
    emotions = []
    bvp_signals = []
    
    for emotion in range(4):
        for _ in range(n_samples // 4):
            # Base signal with different characteristics for each emotion
            t = np.arange(640) / 64  # 10 seconds sampled at exactly 64 Hz
            base_freq = 1.2 + emotion * 0.2  # Pulse rate in Hz (1.2 Hz = 72 bpm)
            amplitude = 100 + emotion * 20
            
            bvp = amplitude * np.sin(2 * np.pi * base_freq * t)
            bvp += np.random.normal(0, 10, len(t))  # Add noise
            
            bvp_signals.append(bvp)
            emotions.append(emotion)
    
    return pd.DataFrame({
        'bvp_signal': bvp_signals,
        'emotion_label': emotions
    })

# Load or create sample data
df = load_bvp_data('bvp_data.csv')

# Handle missing values
if df.isnull().sum().sum() > 0:
    print(f"Missing values found: {df.isnull().sum()}")
    df = df.dropna()
    print(f"Data after removing missing values: {df.shape}")

print("\nData preprocessing completed!")
print(f"Emotion distribution:\n{df['emotion_label'].value_counts()}")

## Visualize BVP Signal

Plot the BVP signal to observe patterns and identify any anomalies in the data.
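
Visual inspection can be complemented by a simple automatic check. The sketch below, which is not part of the original pipeline, flags samples whose robust z-score (based on the median absolute deviation) exceeds a threshold. The helper name `flag_outliers` and the threshold of 5 are illustrative assumptions.

```python
import numpy as np

def flag_outliers(signal_values, threshold=5.0):
    """Return a boolean mask marking samples with a large robust z-score.

    Hypothetical helper; the threshold is an assumption to tune per dataset.
    """
    x = np.asarray(signal_values, dtype=float)
    median = np.median(x)
    mad = np.median(np.abs(x - median))  # Median absolute deviation
    if mad == 0:
        return np.zeros(x.shape, dtype=bool)
    # 0.6745 rescales the MAD so the score is comparable to a normal z-score
    robust_z = 0.6745 * (x - median) / mad
    return np.abs(robust_z) > threshold

t = np.arange(640) / 64                 # 10 seconds at 64 Hz
demo = np.sin(2 * np.pi * 1.2 * t)      # Clean 1.2 Hz pulse-like wave
demo[100] = 25.0                        # Inject one artificial spike
mask = flag_outliers(demo)
print(f"Flagged sample indices: {np.flatnonzero(mask)}")
```

Flagged indices can be overlaid on the plots (for example with `ax.scatter`) to make motion artifacts easier to spot.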

In [None]:
# Visualize BVP signals for different emotional states

fig, axes = plt.subplots(2, 2, figsize=(15, 10))
emotion_labels = {0: 'Neutral', 1: 'Happy', 2: 'Stressed', 3: 'Sad'}

for idx, (emotion_id, emotion_name) in enumerate(emotion_labels.items()):
    ax = axes[idx // 2, idx % 2]
    
    # Get sample signal for this emotion
    sample_signal = df[df['emotion_label'] == emotion_id]['bvp_signal'].iloc[0]
    
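    # Signals loaded from CSV arrive as strings; parse them without eval(),
    # which would execute arbitrary code found in the file
    if isinstance(sample_signal, str):
        import ast  # Local import keeps this cell self-contained
        sample_signal = ast.literal_eval(sample_signal)
    signal_data = np.asarray(sample_signal, dtype=float)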
    
    time = np.arange(len(signal_data)) / 64  # Assuming 64 Hz sampling rate
    
    ax.plot(time, signal_data, linewidth=0.8)
    ax.set_title(f'BVP Signal - {emotion_name}', fontsize=14, fontweight='bold')
    ax.set_xlabel('Time (seconds)', fontsize=12)
    ax.set_ylabel('BVP Amplitude', fontsize=12)
    ax.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

print("BVP signals visualized for all emotional states!")

## Extract Features from BVP Signal

Extract features such as mean, standard deviation, frequency domain features, and other statistical measures from the BVP signal.
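
Beyond the statistics listed above, pulse timing is a common source of BVP features. As a rough sketch (not one of the features this notebook computes), the hypothetical helper `estimate_heart_rate` below detects pulse peaks with `scipy.signal.find_peaks` and derives the mean heart rate and the spread of inter-beat intervals. The minimum peak spacing and the prominence rule are assumptions; real recordings usually need band-pass filtering first.

```python
import numpy as np
from scipy.signal import find_peaks

def estimate_heart_rate(bvp_window, fs=64):
    """Estimate mean heart rate (bpm) and inter-beat interval spread (s).

    Hypothetical helper; parameter choices are illustrative assumptions.
    """
    x = np.asarray(bvp_window, dtype=float)
    peaks, _ = find_peaks(
        x,
        distance=int(0.3 * fs),      # Peaks at least 0.3 s apart (max 200 bpm)
        prominence=0.5 * x.std(),    # Ignore small noise-induced bumps
    )
    if len(peaks) < 2:
        return {'mean_hr_bpm': np.nan, 'ibi_std_s': np.nan}
    ibi = np.diff(peaks) / fs        # Inter-beat intervals in seconds
    return {'mean_hr_bpm': 60.0 / ibi.mean(), 'ibi_std_s': ibi.std()}

t = np.arange(640) / 64
hr = estimate_heart_rate(np.sin(2 * np.pi * 1.2 * t))  # 1.2 Hz is 72 bpm
print(hr)
```

The returned dictionary has the same shape as the feature dictionaries built in the next cell, so it could be merged into them if these features prove useful.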

In [None]:
# Feature extraction from BVP signals

def extract_time_domain_features(bvp_signal):
    """
    Extract time-domain features from BVP signal
    """
    features = {
        'mean': np.mean(bvp_signal),
        'std': np.std(bvp_signal),
        'min': np.min(bvp_signal),
        'max': np.max(bvp_signal),
        'median': np.median(bvp_signal),
        'range': np.max(bvp_signal) - np.min(bvp_signal),
        'skewness': skew(bvp_signal),
        'kurtosis': kurtosis(bvp_signal),
        'rms': np.sqrt(np.mean(bvp_signal**2))
    }
    return features

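def extract_frequency_domain_features(bvp_signal, fs=64):
    """
    Extract frequency-domain features from BVP signal
    """
    # Compute power spectral density. Use the whole window (up to 1024 samples):
    # with nperseg=256 the resolution is 0.25 Hz and the LF band holds no bins
    nperseg = min(1024, len(bvp_signal))
    freqs, psd = signal.welch(bvp_signal, fs=fs, nperseg=nperseg)
    bin_width = freqs[1] - freqs[0]  # Width of one frequency bin in Hz
    
    # Define frequency bands (conventional HRV bands). Note: a 10 s window gives
    # 0.1 Hz resolution, so each band covers only one or two bins; these bands
    # are normally computed from inter-beat intervals over minutes of data.
    lf_band = (0.04, 0.15)  # Low frequency
    hf_band = (0.15, 0.4)   # High frequency
    
    # Band power = sum of the PSD bins inside the band times the bin width
    lf_power = psd[(freqs >= lf_band[0]) & (freqs < lf_band[1])].sum() * bin_width
    hf_power = psd[(freqs >= hf_band[0]) & (freqs < hf_band[1])].sum() * bin_width
    
    features = {
        'lf_power': lf_power,
        'hf_power': hf_power,
        'lf_hf_ratio': lf_power / hf_power if hf_power > 0 else 0,
        'total_power': psd.sum() * bin_width,
        'peak_frequency': freqs[np.argmax(psd)]
    }
    return features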

def extract_all_features(bvp_signal):
    """
    Extract all features from BVP signal
    """
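    # Convert to numpy array if needed. Signals loaded from CSV arrive as
    # strings; parse them with ast.literal_eval rather than eval(), which
    # would execute arbitrary code found in the file
    if isinstance(bvp_signal, str):
        import ast  # Local import keeps this cell self-contained
        bvp_signal = ast.literal_eval(bvp_signal)
    bvp_signal = np.asarray(bvp_signal, dtype=float)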
    
    # Extract features
    time_features = extract_time_domain_features(bvp_signal)
    freq_features = extract_frequency_domain_features(bvp_signal)
    
    # Combine all features
    all_features = {**time_features, **freq_features}
    return all_features

# Extract features for all samples
print("Extracting features from BVP signals...")
features_list = []

for idx, row in df.iterrows():
    features = extract_all_features(row['bvp_signal'])
    features['emotion_label'] = row['emotion_label']
    features_list.append(features)

# Create feature dataframe
features_df = pd.DataFrame(features_list)

print("Feature extraction completed!")
print(f"Total features extracted: {len(features_df.columns) - 1}")
print(f"\nFeature columns: {[col for col in features_df.columns if col != 'emotion_label']}")
print(f"\nFeature statistics:\n{features_df.describe()}")

## Train a Classifier to Identify Emotional States

Use a machine learning model (e.g., SVM, Random Forest) to classify emotional states based on the extracted features.
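
A single train/test split gives one estimate of performance. As a hedged sketch of a complementary check, the example below wraps scaling and an SVM in a scikit-learn pipeline and scores it with stratified 5-fold cross-validation, so the scaler is refit on each training fold. The feature matrix here is random stand-in data (an assumption for self-containment); in the notebook, `X` and `y` would take its place.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Stand-in data: 4 classes, 30 samples each, 5 features with shifted means
labels = np.repeat(np.arange(4), 30)
features = rng.normal(size=(120, 5)) + labels[:, None] * 3.0

# The pipeline refits the scaler inside every fold, avoiding leakage
model = make_pipeline(StandardScaler(), SVC(kernel='rbf', C=1.0, gamma='scale'))
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(model, features, labels, cv=cv, scoring='accuracy')

print(f"Accuracy per fold: {np.round(scores, 3)}")
print(f"Mean accuracy: {scores.mean():.3f}")
```

The spread of the per-fold scores gives a rough sense of how much the accuracy depends on which samples land in the test set.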

In [None]:
# Prepare data for training

# Separate features and labels
X = features_df.drop('emotion_label', axis=1)
y = features_df['emotion_label']

# Split data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Standardize features
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

print(f"Training set size: {X_train.shape}")
print(f"Testing set size: {X_test.shape}")
print(f"\nTraining emotion distribution:\n{y_train.value_counts()}")

In [None]:
# Train multiple classifiers

# 1. Random Forest Classifier
print("Training Random Forest Classifier...")
rf_model = RandomForestClassifier(
    n_estimators=100,
    max_depth=10,
    random_state=42,
    n_jobs=-1
)
rf_model.fit(X_train_scaled, y_train)
print("Random Forest training completed!")

# 2. Support Vector Machine
print("\nTraining Support Vector Machine...")
svm_model = SVC(
    kernel='rbf',
    C=1.0,
    gamma='scale',
    random_state=42
)
svm_model.fit(X_train_scaled, y_train)
print("SVM training completed!")

# Feature importance from Random Forest
feature_importance = pd.DataFrame({
    'feature': X.columns,
    'importance': rf_model.feature_importances_
}).sort_values('importance', ascending=False)

print("\nTop 10 Most Important Features:")
print(feature_importance.head(10))

# Visualize feature importance
plt.figure(figsize=(10, 6))
plt.barh(feature_importance['feature'][:10], feature_importance['importance'][:10])
plt.xlabel('Importance', fontsize=12)
plt.ylabel('Feature', fontsize=12)
plt.title('Top 10 Most Important Features for Emotion Classification', fontsize=14, fontweight='bold')
plt.gca().invert_yaxis()
plt.tight_layout()
plt.show()

## Evaluate the Model

Evaluate the performance of the classifier using metrics such as accuracy, precision, recall, and F1-score.
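
The evaluation cell below uses `average='weighted'`, which averages per-class scores weighted by the number of true samples in each class. The small worked example here uses made-up labels (an assumption for illustration) to show how that differs from the unweighted macro average, and that weighted recall coincides with accuracy.

```python
import numpy as np
from sklearn.metrics import accuracy_score, precision_score, recall_score

# Toy labels: class 0 has four samples, classes 1 and 2 have two each
y_true = np.array([0, 0, 0, 0, 1, 1, 2, 2])
y_pred = np.array([0, 0, 0, 1, 1, 1, 2, 0])

per_class = precision_score(y_true, y_pred, average=None)
support = np.bincount(y_true)  # Number of true samples per class

# Weighted average computed by hand: sum(score * support) / total
manual_weighted = np.sum(per_class * support) / support.sum()
weighted = precision_score(y_true, y_pred, average='weighted')
macro = precision_score(y_true, y_pred, average='macro')

weighted_recall = recall_score(y_true, y_pred, average='weighted')
accuracy = accuracy_score(y_true, y_pred)

print(f"Per-class precision: {np.round(per_class, 3)}")
print(f"Weighted: {weighted:.3f}  Manual: {manual_weighted:.3f}  Macro: {macro:.3f}")
print(f"Weighted recall: {weighted_recall:.3f}  Accuracy: {accuracy:.3f}")
```

With balanced classes the weighted and macro averages agree; they diverge as the class distribution becomes skewed, which is worth keeping in mind for real recordings.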

In [None]:
# Evaluate both models

def evaluate_model(model, X_test, y_test, model_name):
    """
    Evaluate a classification model
    """
    # Make predictions
    y_pred = model.predict(X_test)
    
    # Calculate metrics
    accuracy = accuracy_score(y_test, y_pred)
    precision = precision_score(y_test, y_pred, average='weighted')
    recall = recall_score(y_test, y_pred, average='weighted')
    f1 = f1_score(y_test, y_pred, average='weighted')
    
    print(f"\n{'='*50}")
    print(f"{model_name} Performance Metrics")
    print(f"{'='*50}")
    print(f"Accuracy:  {accuracy:.4f}")
    print(f"Precision: {precision:.4f}")
    print(f"Recall:    {recall:.4f}")
    print(f"F1-Score:  {f1:.4f}")
    print(f"\nClassification Report:")
    print(classification_report(y_test, y_pred, 
                                target_names=['Neutral', 'Happy', 'Stressed', 'Sad']))
    
    return y_pred

# Evaluate Random Forest
rf_predictions = evaluate_model(rf_model, X_test_scaled, y_test, "Random Forest")

# Evaluate SVM
svm_predictions = evaluate_model(svm_model, X_test_scaled, y_test, "Support Vector Machine")

In [None]:
# Visualize confusion matrices

fig, axes = plt.subplots(1, 2, figsize=(16, 6))
emotion_names = ['Neutral', 'Happy', 'Stressed', 'Sad']

# Random Forest confusion matrix
cm_rf = confusion_matrix(y_test, rf_predictions)
sns.heatmap(cm_rf, annot=True, fmt='d', cmap='Blues', 
            xticklabels=emotion_names, yticklabels=emotion_names, ax=axes[0])
axes[0].set_title('Random Forest - Confusion Matrix', fontsize=14, fontweight='bold')
axes[0].set_ylabel('True Label', fontsize=12)
axes[0].set_xlabel('Predicted Label', fontsize=12)

# SVM confusion matrix
cm_svm = confusion_matrix(y_test, svm_predictions)
sns.heatmap(cm_svm, annot=True, fmt='d', cmap='Greens', 
            xticklabels=emotion_names, yticklabels=emotion_names, ax=axes[1])
axes[1].set_title('SVM - Confusion Matrix', fontsize=14, fontweight='bold')
axes[1].set_ylabel('True Label', fontsize=12)
axes[1].set_xlabel('Predicted Label', fontsize=12)

plt.tight_layout()
plt.show()

## Test on New Data

Run the trained models on individual BVP signals to predict emotional states. Only signals that were not used for training give a meaningful check of performance.
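
New recordings typically arrive as one continuous stream rather than as ready-made windows. A minimal sketch, assuming the same 64 Hz sampling rate and 10-second windows used elsewhere in this notebook, is to cut the stream into overlapping windows first. The helper name `segment_signal` and the 5-second step are illustrative assumptions.

```python
import numpy as np

def segment_signal(stream, fs=64, window_s=10, step_s=5):
    """Split a 1-D stream into overlapping fixed-length windows.

    Hypothetical helper; returns an array of shape (n_windows, window_s * fs).
    """
    x = np.asarray(stream, dtype=float)
    win = int(window_s * fs)
    step = int(step_s * fs)
    if len(x) < win:
        return np.empty((0, win))  # Stream too short for a single window
    starts = range(0, len(x) - win + 1, step)
    return np.stack([x[s:s + win] for s in starts])

# 30 seconds of a synthetic 1.2 Hz wave sampled at 64 Hz
stream = np.sin(2 * np.pi * 1.2 * np.arange(64 * 30) / 64)
windows = segment_signal(stream)
print(windows.shape)
```

Each row of `windows` has the same length as a training signal, so it can be passed to the `predict_emotion` function defined in the next cell.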

In [None]:
# Function to predict emotion from new BVP signal

def predict_emotion(bvp_signal, model, scaler, model_name="Random Forest"):
    """
    Predict emotion from a new BVP signal
    """
    # Extract features
    features = extract_all_features(bvp_signal)
    features_df = pd.DataFrame([features])
    
    # Remove emotion label if present
    if 'emotion_label' in features_df.columns:
        features_df = features_df.drop('emotion_label', axis=1)
    
    # Ensure feature order matches training data
    features_df = features_df[X.columns]
    
    # Scale features
    features_scaled = scaler.transform(features_df)
    
    # Predict
    prediction = model.predict(features_scaled)[0]
    
    # Get probability if available
    if hasattr(model, 'predict_proba'):
        probabilities = model.predict_proba(features_scaled)[0]
    else:
        probabilities = None
    
    emotion_map = {0: 'Neutral', 1: 'Happy', 2: 'Stressed', 3: 'Sad'}
    
    return {
        'predicted_emotion': emotion_map[prediction],
        'emotion_id': prediction,
        'probabilities': probabilities
    }

# Test on a few held-out samples (row positions taken from the test split,
# so none of these signals were seen during training)
print("Testing on held-out BVP signals...\n")

for i, pos in enumerate(X_test.index[:5]):
    test_signal = df['bvp_signal'].iloc[pos]
    true_emotion = df['emotion_label'].iloc[pos]
    emotion_map = {0: 'Neutral', 1: 'Happy', 2: 'Stressed', 3: 'Sad'}
    
    # Predict using Random Forest
    result_rf = predict_emotion(test_signal, rf_model, scaler, "Random Forest")
    
    # Predict using SVM
    result_svm = predict_emotion(test_signal, svm_model, scaler, "SVM")
    
    print(f"Sample {i+1}:")
    print(f"  True Emotion: {emotion_map[true_emotion]}")
    print(f"  Random Forest Prediction: {result_rf['predicted_emotion']}")
    if result_rf['probabilities'] is not None:
        print(f"  RF Probabilities: {dict(zip(emotion_map.values(), result_rf['probabilities']))}")
    print(f"  SVM Prediction: {result_svm['predicted_emotion']}")
    print()

In [None]:
# Visualize prediction results on test samples

fig, axes = plt.subplots(2, 2, figsize=(15, 10))
emotion_map = {0: 'Neutral', 1: 'Happy', 2: 'Stressed', 3: 'Sad'}

for i in range(4):
    ax = axes[i // 2, i % 2]
    
    # Get test sample
    test_idx = i * 10  # Sample every 10th test point
    if test_idx < len(X_test):
        test_signal = df.iloc[X_test.index[test_idx]]['bvp_signal']
        true_emotion = y_test.iloc[test_idx]
        
        # Convert signal to array if needed. CSV-loaded signals arrive as
        # strings; parse them safely instead of calling eval()
        if isinstance(test_signal, str):
            import ast  # Local import keeps this cell self-contained
            test_signal = ast.literal_eval(test_signal)
        signal_data = np.asarray(test_signal, dtype=float)
        
        # Predict emotion
        result = predict_emotion(test_signal, rf_model, scaler)
        
        # Plot signal
        time = np.arange(len(signal_data)) / 64
        ax.plot(time, signal_data, linewidth=0.8)
        
        # Add title with prediction results
        title = f"True: {emotion_map[true_emotion]} | Predicted: {result['predicted_emotion']}"
        color = 'green' if emotion_map[true_emotion] == result['predicted_emotion'] else 'red'
        ax.set_title(title, fontsize=12, fontweight='bold', color=color)
        ax.set_xlabel('Time (seconds)', fontsize=10)
        ax.set_ylabel('BVP Amplitude', fontsize=10)
        ax.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

print("Prediction visualization completed!")
print("\n" + "="*50)
print("Emotion Identification from BVP Signals - Complete!")
print("="*50)

## Summary and Next Steps

This notebook demonstrated:
1. **Data Loading**: Loaded BVP signal data (or generated synthetic signals when no file was found) and dropped rows with missing values
2. **Visualization**: Visualized BVP signals for different emotional states
3. **Feature Extraction**: Extracted time-domain and frequency-domain features
4. **Model Training**: Trained Random Forest and SVM classifiers
5. **Evaluation**: Measured accuracy, precision, recall, and F1-score on a held-out 20% test split
6. **Prediction**: Predicted emotions for individual BVP signals

When the synthetic fallback data is used, the metrics reflect how separable the simulated classes are, not real-world performance.

### Next Steps:
- Collect real BVP data from wearable devices (e.g., Empatica E4)
- Experiment with deep learning models (LSTM, CNN)
- Implement real-time emotion detection
- Add more emotional states for finer-grained classification
- Reduce or select features using techniques like PCA or recursive feature elimination
- Cross-validate results across multiple subjects
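
The last item can be sketched with scikit-learn's `LeaveOneGroupOut`, which holds out all windows from one subject per fold so the model is always tested on a person it has not seen. The subject IDs and features below are random stand-in data (assumptions for illustration); the notebook's dataset has no subject column.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score

rng = np.random.default_rng(0)
n_subjects, per_subject = 4, 40

# Hypothetical subject ID for every window (40 windows per subject)
subject_ids = np.repeat(np.arange(n_subjects), per_subject)
# Each subject contributes 10 windows for each of the 4 emotion classes
labels = np.tile(np.repeat(np.arange(4), per_subject // 4), n_subjects)
features = rng.normal(size=(n_subjects * per_subject, 5)) + labels[:, None] * 3.0

clf = RandomForestClassifier(n_estimators=50, random_state=0)
scores = cross_val_score(
    clf, features, labels,
    groups=subject_ids,          # Keeps each subject's windows in one fold
    cv=LeaveOneGroupOut(),
)
print(f"Accuracy per held-out subject: {np.round(scores, 3)}")
```

On real data, per-subject scores are often lower and more variable than scores from a random split, because physiological signals differ between people.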