# Deep learning

A deep neural network (multilayer perceptron, MLP) will be used to classify audio recordings as either urban or natural environments. The following plans have been made:

1. I will use the BayesianOptimization hyperparameter tuning procedure to search for the best depth of the model, number of units at each layer, learning rate, and activation function.
2. To avoid overfitting, a dropout layer will be inserted between any two layers, and the dropout rate will also be tuned.
3. Since pre-trained CNN models such as VGGish, YAMNet, and PANNs may represent similar information, I will only use the embeddings of one model at a time, along with features of soundscape indices and the spectrotemporal modulation spectrum.
4. I will train a deep neural network on the features of each of the raw, background (bg), and foreground (fg) signals, as well as on the joint features of these three signals.

### **Code Functionality Breakdown**

### **1. Purpose of the Code**
This code implements the **training and evaluation pipeline** for a **speech emotion recognition model** using selected features (e.g., MPS, YAMNet, VGGish, Acoustic Indices). It uses **Keras Tuner** to optimize hyperparameters, then evaluates the tuned model with classification metrics and visualizations.

---

### **2. Key Components**

#### **2.1 `ModelConfig`**
This dataclass defines the configuration for the model:
- **General Parameters:**
  - `num_classes`: Number of emotion classes (7 in this case).
  - `patience`: Early stopping patience.
  - `max_trials`: Maximum trials for hyperparameter tuning.
  - `max_epochs`: Maximum training epochs.
  - `batch_size`: Batch size during training.
- **Feature Selection:**
  - Flags (`use_indices`, `use_mps`, etc.) to enable/disable specific features.
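
As a minimal sketch of how the feature flags could be used to try one embedding model at a time, the snippet below uses an abbreviated stand-in for `ModelConfig` that holds only the four flags. The names `FeatureFlags` and `one_embedding_configs` are hypothetical and are not part of the code in this notebook.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass
class FeatureFlags:
    """Abbreviated, hypothetical stand-in for ModelConfig: feature flags only."""
    use_indices: bool = True
    use_mps: bool = True
    use_vggish: bool = True
    use_yamnet: bool = True


def one_embedding_configs(base: FeatureFlags) -> List[FeatureFlags]:
    """Return one config per embedding model, with the other embedding disabled."""
    return [
        replace(base, use_vggish=True, use_yamnet=False),
        replace(base, use_vggish=False, use_yamnet=True),
    ]


for cfg in one_embedding_configs(FeatureFlags()):
    print(cfg)
```

Each returned config keeps the acoustic indices and MPS features enabled and differs only in which embedding is switched on.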

---

#### **2.2 `EmotionRecognitionModel` Class**
The main class orchestrating the workflow for model training, tuning, and evaluation.

---

### **3. Class Methods**

#### **3.1 Initialization**
- **Attributes:**
  - `data_path`: Path to the aggregated `.pkl` file containing PCA-transformed features and labels.
  - `output_dir`: Directory to store model files and plots.
  - `label_mapping`: Maps emotion names (e.g., "anger") to integer labels.

#### **3.2 Data Loading and Preprocessing**
- **`load_and_preprocess_data`:**
  - Loads features and labels from the aggregated `.pkl` file.
  - Combines selected features based on the flags in `ModelConfig`.
  - Processes emotion labels using `label_mapping`.
  - Ensures that data lengths match and features are finite.
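
The feature combination and validation steps above can be sketched on toy data as follows. The array shapes, the three-class label subset, and the random values are assumptions made for illustration; only the `<split>_<suffix>` key pattern and the column-wise concatenation mirror the loader.

```python
import numpy as np

# Toy stand-in for the aggregated pickle: keys follow the
# '<split>_<suffix>' pattern used by the loader; the shapes are made up.
rng = np.random.default_rng(0)
data = {
    "train_mps_pca": rng.normal(size=(6, 3)),
    "train_yamnet_pca": rng.normal(size=(6, 4)),
}
raw_labels = ["anger", "neutral", "sadness", "anger", "neutral", "sadness"]
label_mapping = {"anger": 0, "neutral": 1, "sadness": 2}  # toy subset

# Concatenate the selected feature blocks column-wise (one row per clip).
selected = [data["train_mps_pca"], data["train_yamnet_pca"]]
features = np.concatenate(selected, axis=1)

# Map string labels to integer class ids.
labels = np.array([label_mapping[name] for name in raw_labels])

# The same kind of checks the loader performs: matching lengths, finite values.
assert len(features) == len(labels)
assert np.all(np.isfinite(features))
print(features.shape, labels)
```

Because the blocks are joined along `axis=1`, every selected feature set must have the same number of rows for a given split.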

#### **3.3 Model Building**
- **`build_model`:**
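  - Builds a sequential neural network with:
    - Input layer followed by batch normalization and a fixed input dropout (`input_dropout`).
    - Configurable number of dense hidden layers (`num_layers`, 1 to 5), each with its own tunable number of units and dropout rate, L2 weight regularization (`l2_lambda`), and batch normalization. One activation function (`relu` or `tanh`) is chosen for all hidden layers.
    - Output layer with softmax activation for classification.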
  - Compiles the model with:
    - Sparse categorical cross-entropy as the loss.
    - Accuracy as the evaluation metric.
    - Adam optimizer with tunable learning rate.
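
To make the output layer and loss concrete, here is a minimal pure-Python sketch of softmax and sparse categorical cross-entropy for a single sample. It is a simplified illustration rather than the Keras implementation (for example, it does no probability clipping), and the logits are toy values.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sparse_categorical_crossentropy(true_class: int, probs: List[float]) -> float:
    """Negative log-probability assigned to the integer-coded true class."""
    return -math.log(probs[true_class])


probs = softmax([2.0, 1.0, 0.1])  # toy logits for a 3-class problem
loss = sparse_categorical_crossentropy(0, probs)
print([round(p, 3) for p in probs], round(loss, 3))
```

"Sparse" refers to the labels being integer class ids rather than one-hot vectors. A uniform guess over 7 classes gives a loss of ln 7 ≈ 1.946; the loss Keras reports during training also includes the L2 penalty terms, so it can exceed that value.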

#### **3.4 Callbacks**
- **`create_callbacks`:**
  - Includes:
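    - Early stopping on validation loss (with `patience` from `ModelConfig`), restoring the best weights, to limit overfitting.
    - Model checkpointing to save the model with the best validation accuracy during training.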
    - ReduceLROnPlateau to adjust the learning rate when validation loss stagnates.
    - CSVLogger to log training metrics for each epoch.
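
The patience mechanism behind early stopping can be sketched in a few lines. This is a simplified illustration, not the Keras callback itself (it ignores options such as `min_delta`); the function name `early_stopping_epoch` and the loss curve are made up.

```python
from typing import List, Tuple


def early_stopping_epoch(val_losses: List[float], patience: int) -> Tuple[int, int]:
    """Return (stop_epoch, best_epoch), both 0-based.

    Training stops once `patience` consecutive epochs fail to improve on the
    best validation loss seen so far. If that never happens, stop_epoch is
    the last epoch.
    """
    best = float("inf")
    best_epoch = 0
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    return len(val_losses) - 1, best_epoch


# Toy validation-loss curve (made up for illustration).
losses = [2.8, 2.5, 2.2, 2.3, 2.25, 2.4, 2.1]
print(early_stopping_epoch(losses, patience=3))
```

With `patience=3` the toy run stops before it reaches the later, lower loss, which is the trade-off a small patience value makes. With `restore_best_weights=True`, the model is rolled back to the weights from the best epoch.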

#### **3.5 Training and Hyperparameter Tuning**
- **`train_and_evaluate`:**
  - **Hyperparameter Search:** Uses Keras Tuner's `BayesianOptimization` to find the best hyperparameters.
  - **Training:** Rebuilds the model from the best hyperparameters and fits it on the training set, using the validation set for the callbacks.
  - **Evaluation:** Evaluates the model on the test set and generates metrics and plots.
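
To show what the tuner searches over, the sketch below draws one random configuration from the same ranges that `build_model` declares. Note that this is plain random sampling, not Bayesian optimization: Bayesian optimization uses the results of earlier trials to choose later ones. The function name `sample_hyperparameters` is hypothetical.

```python
import random
from typing import Any, Dict


def sample_hyperparameters(rng: random.Random) -> Dict[str, Any]:
    """Draw one random configuration from the search space of build_model."""
    num_layers = rng.randint(1, 5)
    hp: Dict[str, Any] = {
        "num_layers": num_layers,
        "activation": rng.choice(["relu", "tanh"]),
        # Log-uniform draw: uniform in log10(lr) between 1e-4 and 1e-2.
        "lr": 10 ** rng.uniform(-4.0, -2.0),
    }
    for i in range(num_layers):
        hp[f"units_{i}"] = rng.randrange(16, 513, 16)  # 16, 32, ..., 512
        hp[f"dropout_{i}"] = rng.uniform(0.0, 0.5)
    return hp


print(sample_hyperparameters(random.Random(0)))
```

Sampling the learning rate on a log scale spreads the draws evenly across orders of magnitude, so values near 1e-4 are as likely as values near 1e-2.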

#### **3.6 Evaluation**
- **`_evaluate_model`:**
  - Evaluates the model on the test set, logs the accuracy, and generates:
    - **Training History Plot:** Accuracy and loss over epochs.
    - **Confusion Matrix:** Visualization of true vs. predicted labels.
    - **Classification Report:** Precision, recall, and F1-score for each class.

- **`_plot_training_history`:** Creates accuracy and loss plots for training and validation.
- **`_plot_confusion_matrix`:** Generates a confusion matrix heatmap with annotations.
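
The relationship between the confusion matrix and the per-class metrics in the classification report can be sketched without any libraries. The helper names `confusion_counts` and `precision_recall_f1` are hypothetical, and the labels are toy values, not results from this model.

```python
from typing import List, Tuple


def confusion_counts(y_true: List[int], y_pred: List[int], n: int) -> List[List[int]]:
    """cm[i][j] counts samples whose true class is i and predicted class is j."""
    cm = [[0] * n for _ in range(n)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def precision_recall_f1(cm: List[List[int]], k: int) -> Tuple[float, float, float]:
    """Per-class metrics read off the confusion matrix."""
    tp = cm[k][k]
    predicted_k = sum(row[k] for row in cm)  # column sum
    actual_k = sum(cm[k])                    # row sum (the 'support')
    precision = tp / predicted_k if predicted_k else 0.0
    recall = tp / actual_k if actual_k else 0.0
    f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
    return precision, recall, f1


# Toy labels for a 3-class problem (made up for illustration).
y_true = [0, 0, 0, 1, 1, 2, 2, 2]
y_pred = [0, 0, 1, 1, 1, 2, 0, 2]
cm = confusion_counts(y_true, y_pred, 3)
for k in range(3):
    print(k, precision_recall_f1(cm, k))
```

Recall is computed along a row (all clips that truly belong to a class) and precision along a column (all clips predicted as that class). In the report at the end of this section, for example, a recall of 0.50 for `fear` with a support of 14 corresponds to 7 correctly classified clips.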

#### **3.7 Saving Model Configuration**
- **`_save_model_config`:**
  - Saves the model architecture and hyperparameters to a text file for reproducibility.
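
One thing to keep in mind when reading the saved hyperparameters: the logged best values list `units_2` through `units_4` even though `num_layers` is 2. Since `build_model` only loops over `range(num_layers)`, those entries appear to be unused by the final model. The sketch below filters them out; `active_hyperparameters` is a hypothetical helper, not part of the notebook code.

```python
from typing import Any, Dict


def active_hyperparameters(values: Dict[str, Any]) -> Dict[str, Any]:
    """Keep only the per-layer entries for layers below `num_layers`."""
    n = values["num_layers"]
    active = {}
    for name, value in values.items():
        prefix, _, index = name.rpartition("_")
        if prefix in ("units", "dropout") and index.isdigit() and int(index) >= n:
            continue  # belongs to a layer that build_model never creates
        active[name] = value
    return active


# Values abbreviated from the logged best hyperparameters.
logged = {"num_layers": 2, "units_0": 64, "activation": "tanh", "units_1": 96,
          "units_2": 112, "units_3": 480, "units_4": 256,
          "lr": 0.003982481621996575}
print(active_hyperparameters(logged))
```

Writing only the active entries to `model_config.txt` would make the file match the architecture shown in the model summary.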

---

### **4. Workflow**

#### **4.1 Main Function**
- Configures feature usage and hyperparameters using `ModelConfig`.
- Initializes `EmotionRecognitionModel`.
- Executes `train_and_evaluate` to complete the training and evaluation workflow.

---

### **5. Outputs**

1. **Model and Hyperparameters:**
   - Best checkpoint saved as `best_model.keras`, and the final trained model saved as `final_model.keras`.
   - Model configuration and hyperparameters saved in `model_config.txt`.

2. **Metrics and Reports:**
   - Classification report with precision, recall, and F1-score for all emotion classes.
   - Test accuracy logged.

3. **Visualizations:**
   - **Training History:** Accuracy and loss plots.
   - **Confusion Matrix:** Annotated heatmap showing classification performance.

---

### **6. Role in the Project**
This code is the **final stage** of the speech emotion recognition pipeline:
1. **Uses Preprocessed Features:** Combines PCA-transformed features (e.g., MPS, YAMNet) as inputs.
2. **Hyperparameter Tuning:** Optimizes the model's architecture and learning rate using Bayesian optimization.
3. **Model Training:** Trains the model on the training set with early stopping and adaptive learning rate adjustments.
4. **Evaluation:** Generates metrics and visualizations to assess performance.


In [1]:
import os
import pickle
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import tensorflow as tf
from tensorflow.keras import layers, regularizers
import keras_tuner as kt
from sklearn.metrics import classification_report, confusion_matrix
from typing import Dict, Tuple, Any, List
import logging
from pathlib import Path
from dataclasses import dataclass

# Set up logging
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(levelname)s - %(message)s',
    handlers=[
        logging.FileHandler('emotion_recognition.log'),
        logging.StreamHandler()
    ]
)

# Visualization settings
plt.rcParams.update({
    'figure.dpi': 300,
    'savefig.dpi': 300,
    'figure.figsize': (10, 6),
    'font.size': 12,
    'axes.grid': True,
    'grid.alpha': 0.3
})

@dataclass
class ModelConfig:
    """Configuration for the emotion recognition model"""
    num_classes: int = 7
    input_dropout: float = 0.2
    l2_lambda: float = 0.01
    patience: int = 10
    max_trials: int = 20
    num_initial_points: int = 5
    max_epochs: int = 50
    batch_size: int = 32
    # Feature selection flags
    use_indices: bool = True
    use_mps: bool = True
    use_vggish: bool = True
    use_yamnet: bool = True

class EmotionRecognitionModel:
    """Class to handle emotion recognition model training and evaluation"""
    
    def __init__(self, 
                 data_path: str,
                 output_dir: str,
                 config: ModelConfig = ModelConfig()):
        """
        Initialize the emotion recognition model
        
        Args:
            data_path: Path to aggregated data pickle file
            output_dir: Directory for saving model outputs
            config: Model configuration
        """
        self.data_path = Path(data_path)
        self.output_dir = Path(output_dir)
        self.config = config
        
        # Create output directories
        self.model_dir = self.output_dir / 'models'
        self.plot_dir = self.output_dir / 'plots'
        for directory in [self.model_dir, self.plot_dir]:
            directory.mkdir(parents=True, exist_ok=True)
            
        # Emotion label mapping
        self.label_mapping = {
            'anger': 0,
            'boredom': 1,
            'disgust': 2,
            'fear': 3,
            'happiness': 4,
            'neutral': 5,
            'sadness': 6
        }
        
        logging.info("Model initialized")
        
    def load_and_preprocess_data(self) -> Tuple[Dict[str, np.ndarray], Dict[str, np.ndarray]]:
        """Load and preprocess the data with configurable feature selection"""
        try:
            logging.info(f"Loading data from {self.data_path}")
            with open(self.data_path, 'rb') as f:
                data = pickle.load(f)
                
            # Define feature types and their usage flags
            feature_types = {
                'indices': ('indices_raw_pca', self.config.use_indices),
                'mps': ('mps_pca', self.config.use_mps),
                'vggish': ('vggish_pca', self.config.use_vggish),
                'yamnet': ('yamnet_pca', self.config.use_yamnet)
            }
            
            # Extract and combine selected features
            features = {}
            for split in ['train', 'valid', 'test']:
                selected_features = []
                for feature_type, (feature_suffix, use_feature) in feature_types.items():
                    if use_feature:
                        feature_name = f'{split}_{feature_suffix}'
                        if feature_name in data:
                            selected_features.append(data[feature_name])
                            logging.info(f"Added {feature_type} features for {split} split: "
                                       f"shape {data[feature_name].shape}")
                        else:
                            logging.warning(f"Feature {feature_name} not found in data")
                
                if not selected_features:
                    raise ValueError("No features selected for processing")
                    
                features[split] = np.concatenate(selected_features, axis=1)
                logging.info(f"Combined {split} features shape: {features[split].shape}")
            
            # Process labels
            labels = {}
            for split in ['train', 'valid', 'test']:
                labels[split] = data[f'y_{split}'].map(self.label_mapping).values
                logging.info(f"{split} labels shape: {labels[split].shape}")
                
            self._validate_data(features, labels)
            return features, labels
            
        except Exception as e:
            logging.error(f"Error loading data: {str(e)}")
            raise

    def _validate_data(self, 
                      features: Dict[str, np.ndarray], 
                      labels: Dict[str, np.ndarray]) -> None:
        """Validate the loaded data"""
        for split in ['train', 'valid', 'test']:
            if len(features[split]) != len(labels[split]):
                raise ValueError(f"Mismatch in {split} set sizes")
            
            if not np.all(np.isfinite(features[split])):
                raise ValueError(f"Non-finite values in {split} features")
            
            unique_labels = np.unique(labels[split])
            expected_labels = np.arange(self.config.num_classes)
            if not np.array_equal(np.sort(unique_labels), expected_labels):
                raise ValueError(f"Missing classes in {split} set")

    def build_model(self, hp: kt.HyperParameters) -> tf.keras.Model:
        """Build the model with given hyperparameters"""
        model = tf.keras.Sequential()
        
        # Input layer
        model.add(layers.Input(shape=(self.input_shape,)))
        
        # Batch-normalize the inputs, then apply input dropout
        model.add(layers.BatchNormalization())
        model.add(layers.Dropout(rate=self.config.input_dropout))
        
        # Hidden layers
        n_layers = hp.Int("num_layers", 1, 5)
        for i in range(n_layers):
            model.add(layers.Dense(
                units=hp.Int(f"units_{i}", 16, 512, step=16),
                activation=hp.Choice("activation", ["relu", "tanh"]),
                kernel_regularizer=regularizers.l2(self.config.l2_lambda)
            ))
            model.add(layers.BatchNormalization())
            model.add(layers.Dropout(rate=hp.Float(f"dropout_{i}", 0.0, 0.5)))
        
        # Output layer
        model.add(layers.Dense(self.config.num_classes, activation='softmax'))
        
        # Compile
        model.compile(
            optimizer=tf.keras.optimizers.Adam(
                learning_rate=hp.Float("lr", 1e-4, 1e-2, sampling="log")
            ),
            loss='sparse_categorical_crossentropy',
            metrics=['accuracy']
        )
        
        return model
    
    

    def create_callbacks(self) -> List[tf.keras.callbacks.Callback]:
        """Create training callbacks"""
        return [
            tf.keras.callbacks.EarlyStopping(
                monitor='val_loss',
                patience=self.config.patience,
                restore_best_weights=True
            ),
            tf.keras.callbacks.ModelCheckpoint(
                filepath=str(self.model_dir / 'best_model.keras'),
                monitor='val_accuracy',
                save_best_only=True,
                save_weights_only=False  # Save the entire model
            ),
            tf.keras.callbacks.ReduceLROnPlateau(
                monitor='val_loss',
                factor=0.5,
                patience=3,
                min_lr=1e-6
            ),
            tf.keras.callbacks.CSVLogger(
                str(self.model_dir / 'training_history.csv'),
                separator=',',
                append=False
            )
        ]
    
    def _save_model_config(self, model: tf.keras.Model, hyperparameters: Dict) -> None:
        """Save model configuration and hyperparameters"""
        config_path = self.model_dir / 'model_config.txt'
        with open(config_path, 'w') as f:
            # Save model summary
            model.summary(print_fn=lambda x: f.write(x + '\n'))
            f.write('\n\nHyperparameters:\n')
            for param, value in hyperparameters.items():
                f.write(f'{param}: {value}\n')
            

    def train_and_evaluate(self) -> None:
        """Train and evaluate the model"""
        try:
            # Load and preprocess data
            features, labels = self.load_and_preprocess_data()
            self.input_shape = features['train'].shape[1]
            
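            # Create tuner (num_initial_points sets how many random trials
            # run before the Bayesian model starts guiding the search)
            tuner = kt.BayesianOptimization(
                self.build_model,
                objective='val_accuracy',
                max_trials=self.config.max_trials,
                num_initial_points=self.config.num_initial_points,
                directory=str(self.model_dir),
                project_name='emotion_recognition'
            )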
            
            # Search for best hyperparameters
            logging.info("Starting hyperparameter search...")
            tuner.search(
                features['train'],
                labels['train'],
                validation_data=(features['valid'], labels['valid']),
                epochs=self.config.max_epochs,
                batch_size=self.config.batch_size,
                callbacks=self.create_callbacks()
            )
            
            # Get best hyperparameters
            best_hps = tuner.get_best_hyperparameters(1)[0]
            logging.info(f"Best hyperparameters: {best_hps.values}")
            
            # Build and train final model
            best_model = tuner.hypermodel.build(best_hps)
            
            # Save model configuration
            self._save_model_config(best_model, best_hps.values)
            
            # Train model
            history = best_model.fit(
                features['train'],
                labels['train'],
                validation_data=(features['valid'], labels['valid']),
                epochs=self.config.max_epochs,
                batch_size=self.config.batch_size,
                callbacks=self.create_callbacks()
            )
            
            # Evaluate model
            self._evaluate_model(best_model, features, labels, history)
            
            # Save final model
            final_model_path = self.model_dir / 'final_model.keras'
            best_model.save(final_model_path)
            logging.info(f"Saved final model to {final_model_path}")
            
        except Exception as e:
            logging.error(f"Error in training: {str(e)}")
            raise

    def _evaluate_model(self, 
                       model: tf.keras.Model,
                       features: Dict[str, np.ndarray],
                       labels: Dict[str, np.ndarray],
                       history: tf.keras.callbacks.History) -> None:
        """Evaluate the model and create visualizations"""
        # Plot training history
        self._plot_training_history(history)
        
        # Evaluate on test set
        test_loss, test_acc = model.evaluate(features['test'], labels['test'])
        logging.info(f"Test accuracy: {test_acc:.4f}")
        
        # Generate predictions
        y_pred = np.argmax(model.predict(features['test']), axis=1)
        
        # Plot confusion matrix
        self._plot_confusion_matrix(labels['test'], y_pred)
        
        # Log classification report (class names in label-id order)
        report = classification_report(
            labels['test'],
            y_pred,
            target_names=list(self.label_mapping.keys())
        )
        logging.info(f"\nClassification Report:\n{report}")

    def _plot_training_history(self, history: tf.keras.callbacks.History) -> None:
        """Plot training history"""
        fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 5))
        
        # Accuracy plot
        ax1.plot(history.history['accuracy'], label='Train')
        ax1.plot(history.history['val_accuracy'], label='Validation')
        ax1.set_title('Model Accuracy')
        ax1.set_xlabel('Epoch')
        ax1.set_ylabel('Accuracy')
        ax1.legend()
        
        # Loss plot
        ax2.plot(history.history['loss'], label='Train')
        ax2.plot(history.history['val_loss'], label='Validation')
        ax2.set_title('Model Loss')
        ax2.set_xlabel('Epoch')
        ax2.set_ylabel('Loss')
        ax2.legend()
        
        plt.tight_layout()
        plt.savefig(self.plot_dir / 'training_history.png')
        plt.close()

    def _plot_confusion_matrix(self, y_true: np.ndarray, y_pred: np.ndarray) -> None:
        """Plot confusion matrix"""
        cm = confusion_matrix(y_true, y_pred)
        plt.figure(figsize=(10, 8))
        sns.heatmap(
            cm,
            annot=True,
            fmt='d',
            cmap='Blues',
            xticklabels=list(self.label_mapping.keys()),
            yticklabels=list(self.label_mapping.keys())
        )
        plt.title('Confusion Matrix')
        plt.xlabel('Predicted')
        plt.ylabel('True')
        plt.tight_layout()
        plt.savefig(self.plot_dir / 'confusion_matrix.png')
        plt.close()

def main():
    """Main execution function"""
    try:
        data_path = "/Users/huangjuhua/文档文稿/NYU/Time_Series/data/processed/aggregated_data.pkl"
        output_dir = "/Users/huangjuhua/文档文稿/NYU/Time_Series/results"
        
        # Configure which features to use
        config = ModelConfig(
            max_trials=20,
            max_epochs=50,
            batch_size=32,
            patience=10,
            # Feature selection
            use_indices=True,
            use_mps=True,
            use_vggish=True,
            use_yamnet=True
        )
        
        # Initialize and run model
        model = EmotionRecognitionModel(data_path, output_dir, config)
        logging.info("Starting model training and evaluation...")
        model.train_and_evaluate()
        
    except Exception as e:
        logging.error(f"Processing failed: {str(e)}")
        raise  # re-raise so a failed run is not reported as a success

if __name__ == "__main__":
    main()

2024-12-07 17:28:54,573 - INFO - Best hyperparameters: {'num_layers': 2, 'units_0': 64, 'activation': 'tanh', 'dropout_0': 0.048025717537726054, 'lr': 0.003982481621996575, 'units_1': 96, 'dropout_1': 0.03913898423227741, 'units_2': 112, 'dropout_2': 0.10396428308681199, 'units_3': 480, 'dropout_3': 0.11056798437483073, 'units_4': 256, 'dropout_4': 0.22641850754909082}


Trial 20 Complete [00h 00m 05s]
val_accuracy: 0.8317757248878479

Best val_accuracy So Far: 0.8598130941390991
Total elapsed time: 00h 01m 27s


Epoch 1/50
11/11 ━━━━━━━━━━━━━━━━━━━━ 1s 40ms/step - accuracy: 0.2415 - loss: 3.7959 - val_accuracy: 0.7103 - val_loss: 2.8116 - learning_rate: 0.0040
Epoch 2/50
11/11 ━━━━━━━━━━━━━━━━━━━━ 0s 3ms/step - accuracy: 0.6843 - loss: 2.5523 - val_accuracy: 0.7383 - val_loss: 2.4748 - learning_rate: 0.0040
Epoch 3/50
11/11 ━━━━━━━━━━━━━━━━━━━━ 0s 3ms/step - accuracy: 0.8545 - loss: 2.0248 - val_accuracy: 0.7664 - val_loss: 2.2059 - learning_rate: 0.0040
Epoch 4/50
11/11 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.9205 - loss: 1.7688 - val_accuracy: 0.7570 - val_loss: 1.9934 - learning_rate: 0.0040
Epoch 5/50
11/11 ━━━━━━━━━━━━━━━━━━━━ 0s 3ms/step - accuracy: 0.9053 - loss: 1.6152 - val_accuracy: 0.7757 - val_loss: 1.8177 - learning_rate: 0.0040
Epoch 6/50
11/11 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/

2024-12-07 17:28:58,189 - INFO - Test accuracy: 0.7664


4/4 ━━━━━━━━━━━━━━━━━━━━ 0s 12ms/step


2024-12-07 17:28:58,764 - INFO - 
Classification Report:
              precision    recall  f1-score   support

       anger       0.69      0.80      0.74        25
     boredom       0.89      0.94      0.91        17
     disgust       1.00      0.78      0.88         9
        fear       0.78      0.50      0.61        14
   happiness       0.50      0.57      0.53        14
     neutral       0.81      0.81      0.81        16
     sadness       0.92      0.92      0.92        12

    accuracy                           0.77       107
   macro avg       0.80      0.76      0.77       107
weighted avg       0.78      0.77      0.77       107

2024-12-07 17:28:58,776 - INFO - Saved final model to /Users/huangjuhua/文档文稿/NYU/Time_Series/results/models/final_model.keras
