# 🎙️ Notebook 01: Audio Signal Processing Demo

**Speech Emotion Recognition | PBL Project - Semester 2**

---

## Goals
1. Load a `.wav` audio file using `librosa`
2. Visualise the **waveform** (time-domain)
3. Visualise the **spectrogram** (time-frequency domain)
4. Apply preprocessing: silence trimming, normalization, pre-emphasis
5. Compare waveforms before/after preprocessing

---
> **Note:** Make sure you've downloaded a small RAVDESS subset into `data/ravdess_subset/`  
> See `data/README.md` for download instructions.

In [None]:
# Standard imports
import sys
import os

# Add project root to path so we can import src/
sys.path.insert(0, os.path.abspath('..'))

import numpy as np
import librosa
import librosa.display
import matplotlib.pyplot as plt
import IPython.display as ipd

from src.audio_processing import AudioProcessor

%matplotlib inline
plt.rcParams['figure.dpi'] = 100
plt.rcParams['figure.figsize'] = (12, 4)

print('✅ Imports OK')

## 1. Pick a sample audio file

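Edit `SAMPLE_WAV` to point to any `.wav` file in your dataset. If the file is not found, the cell falls back to a synthetic 440 Hz tone so the rest of the notebook still runs.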

In [None]:
# -----------------------------------------------------------------------
# EDIT THIS PATH to point to one of your RAVDESS .wav files
# Example: '../data/ravdess_subset/Actor_01/03-01-03-01-01-01-01.wav'
# -----------------------------------------------------------------------
SAMPLE_WAV = '../data/ravdess_subset/Actor_01/03-01-03-01-01-01-01.wav'

# Fallback: generate a synthetic tone if file not found (for demo)
if not os.path.exists(SAMPLE_WAV):
    print(f'⚠️  File not found: {SAMPLE_WAV}')
    print('   Generating a synthetic 440 Hz tone for demo purposes...')
    sr = 22050
    t  = np.arange(2 * sr) / sr  # exact 1/sr spacing (linspace would include the endpoint)
    raw_signal = 0.5 * np.sin(2 * np.pi * 440 * t) + 0.1 * np.random.randn(len(t))
    USING_SYNTHETIC = True
else:
    raw_signal, sr = librosa.load(SAMPLE_WAV, sr=22050)
    USING_SYNTHETIC = False
    print(f'✅ Loaded: {SAMPLE_WAV}')

print(f'   Sample rate : {sr} Hz')
print(f'   Duration    : {len(raw_signal)/sr:.2f} s')
print(f'   Samples     : {len(raw_signal)}')

## 2. Listen to the audio

In [None]:
ipd.Audio(raw_signal, rate=sr)

## 3. Time-domain Waveform

In [None]:
fig, ax = plt.subplots(figsize=(12, 3))
times = np.arange(len(raw_signal)) / sr  # sample n occurs at n / sr seconds
ax.plot(times, raw_signal, color='royalblue', linewidth=0.6)
ax.set_xlabel('Time (seconds)')
ax.set_ylabel('Amplitude')
ax.set_title('🔊 Raw Audio Waveform')
ax.grid(alpha=0.3)
plt.tight_layout()
plt.show()

## 4. Frequency-domain: Spectrogram

A spectrogram shows **how a signal's frequency content changes over time**. It is computed with the Short-Time Fourier Transform (STFT), which applies an FFT to short, overlapping windows of the signal.
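
To make the STFT concrete, here is a minimal NumPy sketch of the idea. It is not librosa's implementation: it slices the signal into overlapping frames, applies a Hann window, and takes the FFT of each frame. The function name `simple_stft` is made up for this illustration, and the frame parameters (`n_fft=2048`, `hop_length=512`) are chosen to match librosa's defaults. Unlike `librosa.stft`, this sketch does no centering or padding.

```python
import numpy as np

def simple_stft(signal, n_fft=2048, hop_length=512):
    """Return a (1 + n_fft // 2, n_frames) complex matrix (no padding)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop_length
    frames = np.stack([
        signal[i * hop_length : i * hop_length + n_fft] * window
        for i in range(n_frames)
    ])
    # rfft keeps only the non-negative frequencies of each real-valued frame
    return np.fft.rfft(frames, axis=1).T

sr = 22050
t = np.arange(sr) / sr                      # one second of samples
tone = np.sin(2 * np.pi * 440 * t)          # pure 440 Hz test tone

S = np.abs(simple_stft(tone))
freqs = np.fft.rfftfreq(2048, d=1 / sr)     # centre frequency of each bin
peak_hz = freqs[S.mean(axis=1).argmax()]
print(S.shape, f'peak near {peak_hz:.1f} Hz')
```

Each column of `S` is the spectrum of one frame. With these settings the bins are `sr / n_fft` (about 10.8 Hz) apart, so the peak can only be located to within one bin of 440 Hz.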

In [None]:
# Compute and plot spectrogram
D = librosa.stft(raw_signal)              # Complex STFT
D_db = librosa.amplitude_to_db(np.abs(D), ref=np.max)  # Convert to dB scale

fig, ax = plt.subplots(figsize=(12, 4))
img = librosa.display.specshow(D_db, sr=sr, x_axis='time', y_axis='hz',
                                cmap='magma', ax=ax)
fig.colorbar(img, ax=ax, format='%+2.0f dB')
ax.set_title('📊 Log-Power Spectrogram')
plt.tight_layout()
plt.show()

## 5. Mel Spectrogram

The **Mel scale** warps the frequency axis to approximate human pitch perception: roughly linear below 1 kHz and logarithmic above. Mel spectrograms are the starting point for MFCC features.
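
As a rough illustration of that warping, the sketch below converts between Hz and mel with the widely used HTK-style formula, `m = 2595 * log10(1 + f / 700)`. The helpers `hz_to_mel` and `mel_to_hz` are local to this sketch. librosa has functions with the same names, but they default to the Slaney variant of the scale, so their numbers can differ slightly unless `htk=True` is passed.

```python
import numpy as np

def hz_to_mel(hz):
    """HTK-style Hz -> mel conversion."""
    return 2595.0 * np.log10(1.0 + np.asarray(hz, dtype=float) / 700.0)

def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (np.asarray(mel, dtype=float) / 2595.0) - 1.0)

# Equal steps in mel correspond to increasingly wide steps in Hz
mel_points = np.linspace(hz_to_mel(0), hz_to_mel(11025), 6)
band_edges_hz = mel_to_hz(mel_points)
for lo, hi in zip(band_edges_hz[:-1], band_edges_hz[1:]):
    print(f'{lo:8.1f} Hz -> {hi:8.1f} Hz  (width {hi - lo:7.1f} Hz)')
```

The printed band widths grow with frequency, which is why a Mel spectrogram devotes more of its rows to the low frequencies where most speech energy lies.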

In [None]:
mel = librosa.feature.melspectrogram(y=raw_signal, sr=sr, n_mels=128)
mel_db = librosa.power_to_db(mel, ref=np.max)

fig, ax = plt.subplots(figsize=(12, 4))
img = librosa.display.specshow(mel_db, sr=sr, x_axis='time', y_axis='mel',
                                cmap='viridis', ax=ax)
fig.colorbar(img, ax=ax, format='%+2.0f dB')
ax.set_title('🎛️ Mel Spectrogram')
plt.tight_layout()
plt.show()

## 6. Preprocessing Pipeline

Using our `AudioProcessor` class:
- **Trim silence**: remove leading and trailing quiet parts
- **Normalize**: scale amplitudes to [-1, 1]
- **Pre-emphasis**: boost high frequencies before MFCC extraction
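
The internals of `AudioProcessor` are not shown in this notebook, so the following is only a sketch of what the three steps typically compute, written in plain NumPy. The function names, the amplitude threshold used for trimming, and the peak-based normalization are assumptions for illustration, not the project's actual implementation.

```python
import numpy as np

def trim_silence(x, threshold=0.02):
    """Drop leading/trailing samples whose magnitude stays below threshold."""
    loud = np.flatnonzero(np.abs(x) >= threshold)
    if loud.size == 0:          # entirely silent: nothing sensible to keep
        return x[:0]
    return x[loud[0] : loud[-1] + 1]

def peak_normalize(x, eps=1e-8):
    """Scale so the largest absolute sample is (almost exactly) 1."""
    return x / (np.max(np.abs(x)) + eps)

def preemphasis(x, coeff=0.97):
    """First-order high-pass filter: y[n] = x[n] - coeff * x[n-1]."""
    return np.append(x[0], x[1:] - coeff * x[:-1])

# Toy signal: silence, a short 220 Hz burst, then silence again
sr = 22050
burst = 0.3 * np.sin(2 * np.pi * 220 * np.arange(sr // 2) / sr)
x = np.concatenate([np.zeros(1000), burst, np.zeros(1000)])

trimmed_x = trim_silence(x)
y = preemphasis(peak_normalize(trimmed_x))
print(len(x), '->', len(y), 'samples')
```

Pre-emphasis subtracts most of the previous sample from the current one, so slowly varying (low-frequency) content is attenuated while rapid changes pass through largely intact.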

In [None]:
proc = AudioProcessor(sample_rate=22050, duration=3.0)

# Step 1: Trim silence
trimmed = proc.remove_silence(raw_signal)

# Step 2: Normalize
normalized = proc.normalize(trimmed)

# Step 3: Pre-emphasis
preemph = proc.apply_preemphasis(normalized, coeff=0.97)

# Visualise each step
fig, axes = plt.subplots(3, 1, figsize=(12, 8), sharex=False)

for ax, sig, title, color in zip(
    axes,
    [raw_signal, normalized, preemph],
    ['Raw Signal', 'After Trim + Normalize', 'After Pre-emphasis'],
    ['gray', 'royalblue', 'tomato']
):
    t = np.arange(len(sig)) / sr  # sample n occurs at n / sr seconds
    ax.plot(t, sig, color=color, linewidth=0.6)
    ax.set_title(title)
    ax.set_ylabel('Amplitude')
    ax.grid(alpha=0.3)

axes[-1].set_xlabel('Time (seconds)')
plt.suptitle('Preprocessing Steps', fontsize=14, fontweight='bold')
plt.tight_layout()
plt.show()

print(f'Original length : {len(raw_signal)/sr:.2f}s ({len(raw_signal)} samples)')
print(f'After trimming  : {len(trimmed)/sr:.2f}s ({len(trimmed)} samples)')

## 7. Compare: Happy vs Sad

Load two different emotion clips and compare their waveforms & spectrograms side by side.
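
Picking clips by emotion is easier if the filename is decoded programmatically. RAVDESS filenames consist of seven two-digit fields (modality, vocal channel, emotion, intensity, statement, repetition, actor). The helper below, whose name `ravdess_emotion` is made up for this sketch, reads the third field and maps it to a label.

```python
from pathlib import Path

# Emotion codes from the RAVDESS naming convention (third filename field)
RAVDESS_EMOTIONS = {
    '01': 'neutral', '02': 'calm', '03': 'happy', '04': 'sad',
    '05': 'angry', '06': 'fearful', '07': 'disgust', '08': 'surprised',
}

def ravdess_emotion(path):
    """Return the emotion label encoded in a RAVDESS filename."""
    fields = Path(path).stem.split('-')
    if len(fields) != 7:
        raise ValueError(f'Not a RAVDESS-style filename: {path}')
    return RAVDESS_EMOTIONS[fields[2]]

for wav in ['../data/ravdess_subset/Actor_01/03-01-03-01-01-01-01.wav',
            '../data/ravdess_subset/Actor_01/03-01-04-01-01-01-01.wav']:
    print(Path(wav).name, '->', ravdess_emotion(wav))
```

The helper only inspects the filename, so it works even when the audio files themselves have not been downloaded yet.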

In [None]:
# -----------------------------------------------------------------------
# EDIT these paths to point to one HAPPY and one SAD file in your dataset
# -----------------------------------------------------------------------
HAPPY_WAV  = '../data/ravdess_subset/Actor_01/03-01-03-01-01-01-01.wav'  # emotion code 03 = happy
SAD_WAV    = '../data/ravdess_subset/Actor_01/03-01-04-01-01-01-01.wav'  # emotion code 04 = sad

clips = {'Happy': HAPPY_WAV, 'Sad': SAD_WAV}
loaded = {}

for label, path in clips.items():
    if os.path.exists(path):
        sig, _ = librosa.load(path, sr=22050)
        loaded[label] = sig
        print(f'  {label}: loaded {len(sig)/22050:.2f}s')
    else:
        print(f'  {label}: file not found ({path}). Skipping.')

if len(loaded) == 2:
    fig, axes = plt.subplots(2, 2, figsize=(14, 6))
    colors = ['seagreen', 'crimson']
    labels = list(loaded.keys())
    signals = list(loaded.values())

    for i, (label, sig, color) in enumerate(zip(labels, signals, colors)):
        # Waveform
        t = np.arange(len(sig)) / 22050  # sample n occurs at n / 22050 seconds
        axes[i, 0].plot(t, sig, color=color, linewidth=0.6)
        axes[i, 0].set_title(f'{label} - Waveform')
        axes[i, 0].set_ylabel('Amplitude')

        # Mel spectrogram
        mel = librosa.feature.melspectrogram(y=sig, sr=22050, n_mels=64)
        mel_db = librosa.power_to_db(mel, ref=np.max)
        librosa.display.specshow(mel_db, sr=22050, x_axis='time', y_axis='mel',
                                  ax=axes[i, 1], cmap='viridis')
        axes[i, 1].set_title(f'{label} - Mel Spectrogram')

    for ax in axes[-1]:
        ax.set_xlabel('Time (s)')

    plt.suptitle('Happy vs Sad - Waveform & Spectrogram', fontsize=13, fontweight='bold')
    plt.tight_layout()
    plt.show()
else:
    print('⚠️  Both files are needed for comparison; update the paths above.')

## 8. Key Takeaways

| Concept | What we saw |
|---------|-------------|
| Waveform | Amplitude over time; energy bursts are visible |
| Spectrogram | Frequency content over time (STFT-based) |
| Mel Spectrogram | Perceptually scaled; better suited to speech |
| Pre-emphasis | Boosts higher frequencies before feature extraction |
| Silence trimming | Removes non-speech regions to focus the model |

👉 **Next:** Notebook 02: Feature Extraction (MFCCs, Pitch, Energy)
