# Signal processing pipeline

/Users/huangjuhua/文档文稿/NYU/Time_Series/data/EMODB/03a01Fa.wav

/Users/huangjuhua/文档文稿/NYU/Time_Series/data/EMODB_test

### Signal Processing Pipeline
1. Extracts the audio features used for emotion detection (mel spectrograms, acoustic indices, and the modulation power spectrum, MPS), using `librosa` for mel spectrograms, `maad` for indices, and `soundsig` for MPS calculations.
2. Loads each audio signal with `librosa`, resampled to 16 kHz and converted to mono.
3. Removes the DC offset and scales each signal to an RMS amplitude of 0.1 so that samples are comparable.


### **Signal Processing Pipeline**
#### **1. Feature Extraction for Emotion Detection**
   - **Purpose:** Extract meaningful representations from raw audio signals that capture patterns and nuances linked to different emotions. These representations become the input features for your machine learning models.
   - **Steps:**
     - **Mel Spectrograms:**
       - Converts audio into a time-frequency representation by mimicking the human ear's perception of sound frequencies.
       - Extracted using `librosa`, it highlights emotional cues such as pitch, rhythm, and timbre changes.
       - Why? Emotions often influence voice pitch and energy patterns, which are well-captured by mel spectrograms.
     - **Indices (Temporal and Spectral):**
       - Extracted using `maad`, these features quantify variations in temporal and spectral characteristics of audio.
       - Includes indices like zero-crossing rate and spectral flatness, which correlate with speech intensity and sound "roughness."
       - Why? Temporal and spectral indices are useful for distinguishing between emotions like anger (high intensity) and sadness (low intensity).
     - **Modulation Power Spectrum (MPS):**
       - Captures amplitude modulations in the audio signal using `soundsig`.
       - Why? Amplitude modulations are linked to prosody, rhythm, and phrasing, all of which are critical for emotional expression.
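
To make the two example indices above concrete, here is a minimal NumPy sketch of zero-crossing rate and spectral flatness. It is an illustration under simple textbook definitions, not the `maad` implementation; the function names and the test signals are assumptions made for this example.

```python
import numpy as np

def zero_crossing_rate(y: np.ndarray) -> float:
    """Fraction of adjacent sample pairs whose signs differ."""
    signs = np.signbit(y)
    return float(np.mean(signs[1:] != signs[:-1]))

def spectral_flatness(y: np.ndarray, eps: float = 1e-12) -> float:
    """Geometric mean divided by arithmetic mean of the power spectrum (0 to 1)."""
    power = np.abs(np.fft.rfft(y)) ** 2 + eps
    return float(np.exp(np.mean(np.log(power))) / np.mean(power))

fs = 16000
t = np.arange(fs) / fs
tone = np.sin(2 * np.pi * 220 * t)                     # pure 220 Hz tone
noise = np.random.default_rng(0).standard_normal(fs)   # white noise

zcr_tone, zcr_noise = zero_crossing_rate(tone), zero_crossing_rate(noise)
flat_tone, flat_noise = spectral_flatness(tone), spectral_flatness(noise)
```

A noisy signal scores higher than a pure tone on both measures, which is the kind of contrast these indices are meant to expose.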

---

#### **2. Audio Signal Loading**
   - **Purpose:** Ensure the audio data is consistently processed, which is essential for reproducibility and model performance.
   - **Steps:**
     - **Sampling Rate:** Resamples audio to 16 kHz.
       - Why? Many emotion recognition tasks operate effectively at 16 kHz, which balances computational efficiency and sufficient audio resolution.
     - **Mono Conversion:**
       - Converts stereo audio to mono using `librosa`.
       - Why? Emotion-related features are typically present in both channels of stereo recordings. Mono processing simplifies computation without losing critical information.
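
As a rough illustration of the mono conversion step, the sketch below averages the channels of a stereo array, which is how `librosa` documents its mono conversion. The helper name `to_mono` and the sample values are assumptions made for this example; resampling itself is left to `librosa.load`.

```python
import numpy as np

def to_mono(audio: np.ndarray) -> np.ndarray:
    """Average channels; expects shape (channels, samples) or (samples,)."""
    if audio.ndim == 1:
        return audio
    return audio.mean(axis=0)

fs_target = 16000
nyquist_hz = fs_target / 2  # highest frequency representable at 16 kHz

stereo = np.array([[0.2, 0.4, -0.6],
                   [0.0, 0.2, -0.2]])
mono = to_mono(stereo)
```

At a 16 kHz sampling rate only content below 8 kHz survives, so the choice of rate is a trade-off between bandwidth and compute.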

---

#### **3. Audio Normalization and Transformation**
   - **Purpose:** Standardize the audio data to minimize noise and inconsistencies, making the input signals comparable across samples.
   - **Steps:**
     - **Remove DC Offset:**
       - Subtracts the mean of the audio signal.
       - Why? A DC offset is a constant bias (a zero-frequency component), usually introduced by recording equipment, that can skew feature extraction.
     - **Normalization:**
       - Scales the audio signal to have a Root Mean Square (RMS) amplitude of 0.1.
       - Why? Ensures uniform loudness across all samples, which helps the model focus on emotional patterns rather than volume differences.
     - **Handling Inconsistencies:**
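       - Truncates signals to at most 10 seconds (`duration=10` in `librosa.load`); shorter signals keep their original length and are not padded.
       - Why? Capping the duration bounds memory and computation per file. Because clips can still differ in length, any model that needs fixed-size inputs has to pad or crop the features downstream.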
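
The normalization steps can be sketched in a few lines of NumPy. This is a minimal illustration rather than the notebook's exact code: the silent-clip guard and the `fix_length` helper (which zero-pads as well as truncates) are assumptions added for the example.

```python
import numpy as np

def remove_dc_and_normalize(y: np.ndarray, target_rms: float = 0.1) -> np.ndarray:
    """Subtract the mean, then scale so the RMS equals target_rms."""
    centered = y - np.mean(y)
    rms = np.sqrt(np.mean(centered ** 2))
    if rms == 0:
        # Silent clip: nothing to scale (assumed handling for this sketch).
        return centered
    return centered * (target_rms / rms)

def fix_length(y: np.ndarray, fs: int, seconds: float = 10.0) -> np.ndarray:
    """Hypothetical helper: zero-pad or truncate to exactly `seconds` of audio."""
    n = int(round(fs * seconds))
    if y.size >= n:
        return y[:n]
    return np.pad(y, (0, n - y.size))

fs = 16000
rng = np.random.default_rng(0)
clip = 0.5 + 0.3 * rng.standard_normal(3 * fs)  # 3 s clip with a DC offset of 0.5
y = remove_dc_and_normalize(clip)
y_fixed = fix_length(y, fs)
```

Note that zero-padding after normalization lowers the RMS of the padded array, so the order of these two steps matters if a fixed length is required.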

---

By following this pipeline, you're not only standardizing the audio data but also extracting **emotion-rich features** that can help the model differentiate between subtle emotional cues in speech.

### Code Functionality Breakdown

#### **1. Importing Libraries and Setting Up Logging**
- **Purpose:** 
  - Loads libraries like `librosa`, `numpy`, `pandas`, and `matplotlib` for audio processing, feature extraction, data manipulation, and visualization.
  - Logging is configured to capture the flow and errors, storing logs in a file (`signal_processing.log`) and displaying them on the console.

#### **2. `FeatureConfig` Class**
- **Purpose:** 
  - Provides a configuration to specify which features to compute during the feature engineering process.
  - Default settings:
    - **`compute_melspectrogram = True`**: Compute mel spectrograms.
    - **`compute_maad_indices = True`**: Extract temporal and spectral indices using the `maad` library.
    - **`compute_mps = True`**: Calculate Modulation Power Spectrum (MPS).
    - **`compute_power_spectrum = False`**: Skip power spectrum as it's only for reference.
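
One way to express the same defaults is a dataclass, which makes it easy to derive variants for experiments. The sketch below is an alternative formulation, not the class used in the notebook; the name `FeatureConfigSketch` is hypothetical.

```python
from dataclasses import dataclass, replace

@dataclass
class FeatureConfigSketch:
    """Same flags and defaults as FeatureConfig, expressed as a dataclass."""
    compute_power_spectrum: bool = False  # reference only
    compute_melspectrogram: bool = True
    compute_maad_indices: bool = True
    compute_mps: bool = True
    visualization: bool = False

default_cfg = FeatureConfigSketch()
# Derive a cheaper configuration that computes only the mel spectrogram.
mel_only = replace(default_cfg, compute_maad_indices=False, compute_mps=False)
```

Because `replace` returns a new object, the default configuration stays untouched when a variant is created.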

#### **3. `feature_eng` Function**
- **Purpose:** Extracts features from an audio signal based on the configurations in `FeatureConfig`.
- **Functionality:**
  - Computes power spectrum, mel spectrogram, temporal/spectral indices, and modulation power spectrum.
  - Uses libraries like `librosa` for mel spectrograms, `maad` for indices, and `soundsig` for MPS calculations.
  - **Visualization:** Optional feature to visualize waveform, mel spectrogram, and MPS.
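
For readers unfamiliar with the MPS, the general idea is the squared magnitude of the 2D Fourier transform of a log spectrogram. The sketch below illustrates that idea in plain NumPy. It is not the `soundsig` implementation (which also applies windowing and normalization); the function names, FFT sizes, and test signal are assumptions chosen for the example.

```python
import numpy as np

def stft_log_power(y, n_fft=256, hop=64, eps=1e-10):
    """Log-power spectrogram, shape (freq_bins, frames), from Hann-windowed FFTs."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(y) - n_fft) // hop
    frames = np.stack([y[i * hop: i * hop + n_fft] * window for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    return np.log(power.T + eps)

def modulation_power_spectrum(log_spec):
    """Squared magnitude of the 2D FFT of a mean-removed log spectrogram."""
    centered = log_spec - log_spec.mean()
    return np.abs(np.fft.fftshift(np.fft.fft2(centered))) ** 2

fs = 16000
t = np.arange(fs) / fs
# 1 kHz carrier whose amplitude is modulated at 8 Hz
y = (1.0 + 0.8 * np.sin(2 * np.pi * 8 * t)) * np.sin(2 * np.pi * 1000 * t)
mps = modulation_power_spectrum(stft_log_power(y))
```

Because the log spectrogram is real-valued, the result is point-symmetric about its center, so one half of the plane carries all the information.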

#### **4. `preproc` Function**
- **Purpose:** Handles preprocessing for each audio file.
- **Steps:**
  1. **Audio Loading:** Uses `librosa.load()` to load each audio file, resample it to 16 kHz, convert it to mono, and keep at most the first 10 seconds.
  2. **Normalization:** Removes the DC offset and scales the audio to an RMS amplitude of 0.1.
  3. **Feature Extraction:** Calls `feature_eng` to extract features like mel spectrogram, indices, and MPS.
  4. **Output:** Returns a dictionary containing the sampling frequency, processed audio, and features.

#### **5. `visualize_features` Function**
- **Purpose:** Visualizes the waveform, mel spectrogram, and MPS if available.

#### **6. `main` Function**
- **Purpose:** Main entry point for batch processing all audio files listed in a CSV file.
- **Steps:**
  1. **Reads CSV:** Reads metadata (file paths, emotion labels, and gender) from a CSV file.
  2. **Iterates Through Files:** Processes each audio file using the `preproc` function.
  3. **Save Results:** Saves the processed features in `.pkl` format with the emotion label and gender appended.
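
The save step, including the way `main` skips files that already have a `.pkl`, can be sketched with the standard library alone. The helper names `output_path_for` and `save_once` and the record values below are placeholders invented for this example.

```python
import os
import pickle
import tempfile

def output_path_for(audio_path: str, output_dir: str) -> str:
    """Map e.g. 'data/03a01Fa.wav' to '<output_dir>/03a01Fa.pkl'."""
    stem = os.path.splitext(os.path.basename(audio_path))[0]
    return os.path.join(output_dir, stem + ".pkl")

def save_once(record: dict, audio_path: str, output_dir: str) -> bool:
    """Write the record unless its .pkl already exists; return True if written."""
    os.makedirs(output_dir, exist_ok=True)
    target = output_path_for(audio_path, output_dir)
    if os.path.isfile(target):
        return False
    with open(target, "wb") as f:
        pickle.dump(record, f)
    return True

out_dir = tempfile.mkdtemp()
# Placeholder record; real records also hold the audio and feature arrays.
record = {"fs": 16000, "emotion_label": "example_label", "gender": "example_gender"}
first = save_once(record, "data/03a01Fa.wav", out_dir)
second = save_once(record, "data/03a01Fa.wav", out_dir)  # skipped: already saved
with open(output_path_for("data/03a01Fa.wav", out_dir), "rb") as f:
    loaded = pickle.load(f)
```

Since the output name is derived from the base name only, two audio files with the same name in different folders would map to the same `.pkl`, and the second would be skipped.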

#### **7. CSV File and Output Directory**
- **CSV File:** 
  - Path: `emodb_features.csv`
  - Contains columns like `path` (file path), `emotion_label`, and `gender`.
- **Output Directory:** 
  - Path: `EMODB_preprocessed_pkl`
  - Stores processed `.pkl` files with features, emotion labels, and gender for each audio file.
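
Before a long batch run it can help to confirm that the CSV has the expected columns. The following is a minimal sketch using the standard `csv` module instead of pandas; `read_metadata` and the sample rows are hypothetical and serve only to illustrate the check.

```python
import csv
import io

REQUIRED_COLUMNS = ("path", "emotion_label", "gender")

def read_metadata(handle) -> list:
    """Read metadata rows, failing early if a required column is missing."""
    reader = csv.DictReader(handle)
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"CSV is missing columns: {missing}")
    return list(reader)

# In-memory stand-in for the metadata file, with made-up rows.
sample_csv = io.StringIO(
    "path,emotion_label,gender\n"
    "clips/a.wav,label_a,gender_a\n"
    "clips/b.wav,label_b,gender_b\n"
)
rows = read_metadata(sample_csv)
```

Failing early on a missing column avoids discovering the problem as a `KeyError` partway through the batch.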

#### **8. Usage Flow in `__main__`**
- Reads the dataset metadata from `emodb_features.csv`.
- Extracts features from each audio file and saves the results in the `EMODB_preprocessed_pkl` directory.
- Logs each file's processing time and any errors, while a `tqdm` progress bar shows overall progress.

---

### **How This Fits in Your Project**
This code is a **signal processing pipeline** for your **deep learning speech emotion recognition project**. Here's how it contributes:

1. **Feature Extraction:** 
   - Prepares the dataset for training by extracting audio features (e.g., mel spectrograms, indices, and MPS) crucial for emotion detection.
   - These features serve as the input for your CNN or DNN model.

2. **Data Preprocessing:**
   - Normalizes audio and handles necessary transformations (e.g., removing DC offset, standardizing the signal) to ensure consistency.

3. **Batch Processing:**
   - Efficiently processes and organizes all audio files, saving the features in `.pkl` format for future training or testing.

4. **Error Handling & Logging:**
   - Ensures robust error detection and debugging with a well-defined logging system.


In [1]:
from __future__ import division
import os
import pickle
import time
import pandas as pd
import librosa
import librosa.display  # specshow lives in this submodule; import it explicitly
import numpy as np
import matplotlib.pyplot as plt
import warnings
import gc
from tqdm import tqdm
from typing import Dict, Any, Optional
import logging

# Set up logging
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(levelname)s - %(message)s',
    handlers=[
        logging.FileHandler('signal_processing.log'),
        logging.StreamHandler()
    ]
)

class FeatureConfig:
    """Configuration class for feature extraction"""
    def __init__(self):
        self.compute_power_spectrum = False  # Since it's for reference only
        self.compute_melspectrogram = True
        self.compute_maad_indices = True
        self.compute_mps = True
        self.visualization = False

def feature_eng(y: np.ndarray, fs: int, config: FeatureConfig) -> Dict[str, Any]:
    """
    Enhanced feature engineering with configurable components
    
    Args:
        y: Audio signal
        fs: Sampling frequency
        config: FeatureConfig object specifying which features to compute
    
    Returns:
        Dictionary containing computed features
    """
    features = {}
    
    try:
        if config.compute_power_spectrum:
            # Power spectrum computation
            ps = np.abs(np.fft.fft(y))**2
            time_step = 1/fs
            freqs = np.fft.fftfreq(y.size, time_step)
            features['ps'] = ps[0:int((len(ps)/2)-1)]
            features['freqs'] = freqs[0:int((len(freqs)/2)-1)]

        if config.compute_melspectrogram:
            # Mel spectrogram computation
            S = librosa.feature.melspectrogram(y=y, sr=fs)
            features['S_dB'] = librosa.power_to_db(S, ref=np.max)

        if config.compute_maad_indices:
            # Compute indices using maad
            import maad
            Sxx_power, tn, fn, ext = maad.sound.spectrogram(y, fs, mode='psd')
            df_temporal_indices = maad.features.all_temporal_alpha_indices(y, fs)
            df_spectral_indices, _ = maad.features.all_spectral_alpha_indices(
                Sxx_power, tn, fn, extent=ext
            )
            features['indices'] = pd.concat(
                [df_temporal_indices, df_spectral_indices], 
                axis=1
            )

        if config.compute_mps:
            # Modulation Power Spectrum computation
            from soundsig.sound import BioSound
            myBioSound = BioSound(soundWave=y, fs=fs)
            myBioSound.mpsCalc(window=1, Norm=True)
            
            # Reduce dimension of MPS to one quadrant
            # Midpoint index of each modulation axis
            len1 = len(myBioSound.wf) // 2
            len2 = len(myBioSound.wt) // 2
            quad1 = myBioSound.mps[len1:, len2:]
            quad2 = np.fliplr(myBioSound.mps[len1:, :len2+1])
            features['mps'] = (quad1 + quad2) / 2
            features['wf'] = myBioSound.wf[len1:]
            features['wt'] = myBioSound.wt[len2:]

        if config.visualization:
            visualize_features(y, fs, features)

    except Exception as e:
        logging.error(f"Error in feature engineering: {str(e)}")
        raise

    return features

def preproc(file: str, fs: int = 16000, config: Optional[FeatureConfig] = None) -> Dict[str, Any]:
    """
    Enhanced preprocessing function with error handling
    
    Args:
        file: Path to audio file
        fs: Sampling frequency
        config: FeatureConfig object
    
    Returns:
        Dictionary containing processed audio and features
    """
    if config is None:
        config = FeatureConfig()
    
    try:
        # Load and preprocess audio
        raw_y, fs = librosa.load(file, sr=fs, duration=10, mono=True)
        
        # Remove DC offset and normalize
        y_mono_rs = raw_y - np.mean(raw_y)
        rms = np.sqrt(np.mean(y_mono_rs**2))
        y = y_mono_rs/(rms/0.1)
        
        # Extract features
        features = feature_eng(y, fs, config)
        
        output = {
            'fs': fs,
            'y': y,
            **features
        }
        
        return output
    
    except Exception as e:
        logging.error(f"Error processing file {file}: {str(e)}")
        raise

def visualize_features(y: np.ndarray, fs: int, features: Dict[str, Any]) -> None:
    """Visualize extracted features"""
    plt.figure(figsize=(15, 5))
    
    # Plot waveform
    plt.subplot(131)
    plt.plot(np.arange(0, y.size/fs, 1/fs), y)
    plt.title('Waveform')
    
    # Plot mel spectrogram if available
    if 'S_dB' in features:
        plt.subplot(132)
        librosa.display.specshow(
            features['S_dB'], 
            x_axis='time', 
            y_axis='mel', 
            sr=fs, 
            fmax=fs/2
        )
        plt.colorbar(format='%+2.0f dB')
        plt.title('Mel-frequency spectrogram')
    
    # Plot MPS if available
    if 'mps' in features:
        plt.subplot(133)
        plt.imshow(
            10.0 * np.log10(features['mps']), 
            aspect='auto', 
            origin='lower'
        )
        plt.colorbar()
        plt.title('Modulation Power Spectrum')
    
    plt.tight_layout()
    plt.show()

def main(csv_path: str, output_dir: str):
    """Main processing function"""
    # Create output directory if it doesn't exist
    os.makedirs(output_dir, exist_ok=True)
    
    # Read CSV file
    try:
        df = pd.read_csv(csv_path)
    except Exception as e:
        logging.error(f"Error reading CSV file: {str(e)}")
        return
    
    # Initialize feature configuration
    config = FeatureConfig()
    
    # Process each file
    for idx, row in tqdm(df.iterrows(), total=len(df), desc="Processing files"):
        file_path = row['path']
        save_file_name = os.path.join(
            output_dir, 
            os.path.splitext(os.path.basename(file_path))[0] + '.pkl'
        )
        
        # Skip if already processed
        if os.path.isfile(save_file_name):
            continue
            
        try:
            start_time = time.time()
            
            # Process file
            output = preproc(file_path, config=config)
            output.update({
                'emotion_label': row['emotion_label'],
                'gender': row['gender']
            })
            
            # Save results
            with open(save_file_name, 'wb') as f:
                pickle.dump(output, f)
                
            processing_time = time.time() - start_time
            logging.info(f"Processed {file_path} in {processing_time:.2f} seconds")
            
        except Exception as e:
            logging.error(f"Error processing {file_path}: {str(e)}")
            continue
        
        # Clean up memory
        gc.collect()

if __name__ == "__main__":
    csv_path = "/Users/huangjuhua/文档文稿/NYU/Time_Series/data/emodb_features.csv"
    output_dir = "/Users/huangjuhua/文档文稿/NYU/Time_Series/data/EMODB_preprocessed_pkl"
    
    main(csv_path, output_dir)

Processing files:   0%|          | 0/535 [00:00<?, ?it/s]2024-12-07 15:30:32,171 - INFO - Processed /Users/huangjuhua/文档文稿/NYU/Time_Series/data/EMODB_preprocessed_wav/16a02Lb.wav in 1.41 seconds
Processing files:   0%|          | 1/535 [00:01<12:59,  1.46s/it]2024-12-07 15:30:32,437 - INFO - Processed /Users/huangjuhua/文档文稿/NYU/Time_Series/data/EMODB_preprocessed_wav/14a07Wc.wav in 0.22 seconds
Processing files:   0%|          | 2/535 [00:01<06:43,  1.32it/s]2024-12-07 15:30:32,681 - INFO - Processed /Users/huangjuhua/文档文稿/NYU/Time_Series/data/EMODB_preprocessed_wav/10a07Ad.wav in 0.19 seconds
Processing files:   1%|          | 3/535 [00:01<04:38,  1.91it/s]2024-12-07 15:30:33,129 - INFO - Processed /Users/huangjuhua/文档文稿/NYU/Time_Series/data/EMODB_preprocessed_wav/13a05Ea.wav in 0.40 seconds
Processing files:   1%|          | 4/535 [00:02<04:20,  2.04it/s]2024-12-07 15:30:33,568 - INFO - Processed /Users/huangjuhua/文档文稿/NYU/Time_Series/data/EMODB_preprocessed_wav/14a05Wa.wav in 0.40 s