# ASD/ADHD Voice Detection - Phase 1: Feature Extraction Tutorial

This notebook walks you through **step-by-step feature extraction** from raw audio files. 

**Key Goals:**
- Understand how audio features are computed
- Visualize what each feature type captures
- Learn MFCC, Spectral, and Prosodic features
- See the 106-feature vector that feeds the MLP model
- Identify features for refinement

**What You'll Learn:**
1. How audio preprocessing standardizes samples
2. How MFCC features capture speech dynamics
3. How spectral features describe frequency content
4. How prosodic features reveal voice patterns
5. How features combine into a single 106-dimensional vector

**Let's get started!**

In [None]:
import sys
import os
import warnings
warnings.filterwarnings('ignore')

# Install dependencies into the active kernel (%pip targets the running environment)
%pip install numpy pandas matplotlib seaborn librosa soundfile

# Add parent directory to path so we can import our modules
sys.path.insert(0, os.path.abspath('../..'))

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from matplotlib.gridspec import GridSpec
import librosa
import librosa.display
import soundfile as sf

# Import our custom modules
from ASD_ADHD_Detection.config.config import config
from ASD_ADHD_Detection.src.preprocessing.audio_preprocessor import AudioPreprocessor
from ASD_ADHD_Detection.src.feature_extraction.mfcc_extractor import MFCCExtractor
from ASD_ADHD_Detection.src.feature_extraction.spectral_extractor import SpectralExtractor
from ASD_ADHD_Detection.src.feature_extraction.prosodic_extractor import ProsodicExtractor

# Set up plotting style
plt.style.use('seaborn-v0_8-darkgrid')
sns.set_palette("husl")

print("✅ All imports successful!")
print(f"📊 Numpy version: {np.__version__}")
print(f"🎵 Librosa version: {librosa.__version__}")
print("⚙️  Config loaded successfully")

## Section 1: Create Synthetic Audio Sample

Since we don't have real audio files yet, we'll create a synthetic voice sample that contains typical speech characteristics. This lets us understand the pipeline before working with real data.

**What we're creating:**
- A synthetic voice sample at 16 kHz (a standard sample rate for speech processing)
- 5 seconds of audio
- Contains pitch variations and frequency content
- Will be processed through our full pipeline

In [None]:
# Create synthetic voice sample
sr = config.audio.SAMPLE_RATE  # 16000 Hz
duration = config.audio.DURATION  # 5 seconds
t = np.linspace(0, duration, int(sr * duration), False)

# Create a voice-like signal with multiple components
# - Base pitch around 120 Hz (male voice)
# - Pitch variations (natural prosody)
# - Formants (voice characteristics)
# - Some noise (realistic voice)

fundamental_freq = 120  # Hz
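pitch_variation = 20 * np.sin(2 * np.pi * 1.5 * t)  # +/-20 Hz pitch modulation at 1.5 Hz
f0 = fundamental_freq + pitch_variation

# Integrate the time-varying F0 to get the phase. Using sin(2*pi*f0*t) directly
# would make the instantaneous frequency drift far outside the intended range.
phase = 2 * np.pi * np.cumsum(f0) / sr

# Create harmonic content (fundamental + harmonics)
audio = np.zeros_like(t)
for harmonic in range(1, 8):  # 7 harmonics
    amplitude = 1.0 / harmonic  # Amplitude decreases with harmonic number
    audio += amplitude * np.sin(harmonic * phase)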

# Add formant-like components
# These are amplitude-modulated tones near typical vowel formant frequencies,
# not a true vocal-tract filter
formants = [200, 1200, 2300]  # Hz
envelope = np.exp(-np.pi * (t - duration / 2) ** 2 / 0.5)  # Gaussian bump centred mid-clip
for formant_freq in formants:
    audio += 0.1 * np.sin(2 * np.pi * formant_freq * t) * envelope

# Add realistic noise
noise = np.random.normal(0, 0.02, len(audio))
audio = audio + noise

# Normalize
audio = audio / np.max(np.abs(audio)) * 0.9

print("✅ Synthetic audio created!")
print(f"   Duration: {duration} seconds")
print(f"   Sample rate: {sr} Hz")
print(f"   Total samples: {len(audio)}")
print(f"   Audio range: [{audio.min():.3f}, {audio.max():.3f}]")
print(f"   RMS energy: {np.sqrt(np.mean(audio**2)):.3f}")

In [None]:
# Visualize the synthetic audio
fig, axes = plt.subplots(3, 1, figsize=(14, 8))

# Time domain waveform
axes[0].plot(t, audio, linewidth=0.5, alpha=0.8)
axes[0].set_title('Raw Audio Waveform (Time Domain)', fontsize=12, fontweight='bold')
axes[0].set_xlabel('Time (s)')
axes[0].set_ylabel('Amplitude')
axes[0].grid(True, alpha=0.3)

# Spectrogram
D = librosa.stft(audio)
S_db = librosa.power_to_db(np.abs(D)**2, ref=np.max)
img = librosa.display.specshow(S_db, sr=sr, x_axis='time', y_axis='log', ax=axes[1], cmap='viridis')
axes[1].set_title('Spectrogram (Frequency Content Over Time)', fontsize=12, fontweight='bold')
axes[1].set_ylabel('Frequency (Hz)')
plt.colorbar(img, ax=axes[1], format='%+2.0f dB')

# Power spectrum (average across time)
freqs = np.fft.rfftfreq(len(audio), 1/sr)
spectrum = np.abs(np.fft.rfft(audio))**2
axes[2].semilogy(freqs[:5000], spectrum[:5000], linewidth=1, alpha=0.8)
axes[2].set_title('Power Spectrum (Average Frequency Content)', fontsize=12, fontweight='bold')
axes[2].set_xlabel('Frequency (Hz)')
axes[2].set_ylabel('Power')
axes[2].grid(True, alpha=0.3, which='both')

plt.tight_layout()
plt.show()

print("📊 Audio visualization complete!")

## Section 2: Audio Preprocessing

Before we extract features, we must standardize the audio:
1. **Trim Silence** - Remove quiet sections at beginning/end
2. **Normalize** - Scale amplitude to [-1, 1] range
3. **Pad/Truncate** - Ensure exactly 5 seconds

This ensures all audio inputs have a consistent length and amplitude scale.

**Why this matters for ASD/ADHD detection:**
- Silence trimming focuses on actual speech content
- Normalization prevents volume variation from affecting features
- Fixed length ensures batch processing compatibility
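
To make the three steps concrete, here is a minimal NumPy-only sketch. The frame-energy trimming is a simplified stand-in for what `librosa.effects.trim` does, and the function names, frame sizes, and 20 dB threshold are illustrative assumptions rather than the settings used by `AudioPreprocessor`.

```python
import numpy as np

def trim_silence(signal, frame_length=2048, hop_length=512, top_db=20.0):
    """Drop leading/trailing frames whose RMS is more than top_db below the loudest frame."""
    signal = np.asarray(signal, dtype=float)
    if len(signal) == 0:
        return signal
    n_frames = 1 + max(0, len(signal) - frame_length) // hop_length
    rms = np.array([
        np.sqrt(np.mean(signal[i * hop_length:i * hop_length + frame_length] ** 2))
        for i in range(n_frames)
    ])
    if rms.max() == 0:
        return signal  # all-silent input: nothing to trim against
    db = 20.0 * np.log10(np.maximum(rms, 1e-10) / rms.max())
    keep = np.flatnonzero(db > -top_db)
    start = keep[0] * hop_length
    end = min(len(signal), keep[-1] * hop_length + frame_length)
    return signal[start:end]

def preprocess(signal, sr=16000, duration=5.0, top_db=20.0):
    """Trim silence, peak-normalize to [-1, 1], then zero-pad or truncate to a fixed length."""
    trimmed = trim_silence(signal, top_db=top_db)
    peak = np.max(np.abs(trimmed)) if len(trimmed) else 0.0
    if peak > 0:
        trimmed = trimmed / peak
    target = int(sr * duration)
    if len(trimmed) < target:
        trimmed = np.pad(trimmed, (0, target - len(trimmed)))
    return trimmed[:target]
```

The guard on `peak` avoids a division by zero when a clip is entirely silent, which is easy to hit with real recordings.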

In [None]:
# Instantiate the project preprocessor (the steps are reproduced manually below for illustration)
preprocessor = AudioPreprocessor(config)

# Before preprocessing
print("BEFORE PREPROCESSING:")
print(f"   Duration: {len(audio) / sr:.2f} seconds")
print(f"   Sample count: {len(audio)}")
print(f"   Amplitude range: [{audio.min():.4f}, {audio.max():.4f}]")
print(f"   RMS energy: {np.sqrt(np.mean(audio**2)):.4f}")

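# Apply preprocessing: trim silence, peak-normalize, then pad/truncate to a fixed length
audio_trimmed = librosa.effects.trim(audio, top_db=config.audio.TRIM_THRESHOLD_DB)[0]
peak = np.max(np.abs(audio_trimmed))
audio_normalized = audio_trimmed / peak if peak > 0 else audio_trimmed  # guard against silent input
target_samples = int(sr * config.audio.DURATION)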
if len(audio_normalized) < target_samples:
    audio_processed = np.pad(audio_normalized, (0, target_samples - len(audio_normalized)), mode='constant')
elif len(audio_normalized) > target_samples:
    audio_processed = audio_normalized[:target_samples]
else:
    audio_processed = audio_normalized

print("\nAFTER PREPROCESSING:")
print(f"   Duration: {len(audio_processed) / sr:.2f} seconds")
print(f"   Sample count: {len(audio_processed)}")
print(f"   Amplitude range: [{audio_processed.min():.4f}, {audio_processed.max():.4f}]")
print(f"   RMS energy: {np.sqrt(np.mean(audio_processed**2)):.4f}")

print("\n✅ Audio preprocessing complete!")

## Section 3: MFCC Feature Extraction (52 Features)

**What are MFCCs?**
Mel-Frequency Cepstral Coefficients (MFCCs) are among the most widely used speech features. They:
- Approximate human hearing by warping frequencies onto the mel scale (roughly linear below 1 kHz, logarithmic above)
- Capture spectral characteristics that distinguish phonemes
- Include velocity (delta) and acceleration (delta-delta) for dynamics

**Our configuration:**
- **Base MFCCs:** 13 coefficients (0-12)
- **Delta (Δ):** 13 first-order derivatives (change over time)
- **Delta-Delta (ΔΔ):** 13 second-order derivatives (acceleration)
- **Statistics:** mean, std, min, max per feature type
- **Total:** 13 × 4 = 52 features

**Why MFCC for ASD/ADHD detection:**
- MFCCs may capture voice-quality differences reported in autism spectrum disorders
- Deltas reflect how quickly the spectrum changes, which relates to speech rate
- Delta-deltas reflect changes in that rate, which relate to prosody and speech fluency
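
As a rough illustration of what "velocity" and "acceleration" mean here, the sketch below implements the classic regression-based delta formula, d_t = Σ n·(c_{t+n} − c_{t−n}) / (2·Σ n²), in plain NumPy. This is an assumption for teaching purposes: `MFCCExtractor.extract_delta` may use a different method (for example, librosa's smoothed derivative), and `summarize` is a hypothetical helper.

```python
import numpy as np

def delta(features, width=2):
    """Regression-based delta along the time axis (axis=1), with edge padding."""
    features = np.asarray(features, dtype=float)
    n_frames = features.shape[1]
    padded = np.pad(features, ((0, 0), (width, width)), mode="edge")
    denom = 2.0 * sum(n * n for n in range(1, width + 1))
    out = np.zeros_like(features)
    for n in range(1, width + 1):
        ahead = padded[:, width + n: width + n + n_frames]    # c[t + n]
        behind = padded[:, width - n: width - n + n_frames]   # c[t - n]
        out += n * (ahead - behind)
    return out / denom

def summarize(track_matrix):
    """Collapse a (n_coeffs, n_frames) matrix to per-coefficient mean and std."""
    track_matrix = np.asarray(track_matrix, dtype=float)
    return np.concatenate([track_matrix.mean(axis=1), track_matrix.std(axis=1)])
```

Applying `delta` twice gives the delta-delta track. On a coefficient that rises linearly over time, the delta is constant and the delta-delta is zero away from the edges, which is a quick sanity check for any implementation.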

In [None]:
# Extract MFCC features
mfcc_extractor = MFCCExtractor(config)

# Step 1: Extract base MFCCs
print("🎵 EXTRACTING MFCC FEATURES...")
mfcc = mfcc_extractor.extract_mfcc(audio_processed, sr)
print(f"   Base MFCCs shape: {mfcc.shape}")  # Should be (13, time_steps)
print(f"   Time steps: {mfcc.shape[1]}")

# Step 2: Extract delta (velocity)
delta = mfcc_extractor.extract_delta(mfcc, order=1)
print(f"   Delta (Δ) shape: {delta.shape}")  # Should be (13, time_steps)

# Step 3: Extract delta-delta (acceleration)
delta_delta = mfcc_extractor.extract_delta(mfcc, order=2)
print(f"   Delta-Delta (ΔΔ) shape: {delta_delta.shape}")  # Should be (13, time_steps)

# Step 4: Aggregate to 52 features
mfcc_features = mfcc_extractor.extract(audio_processed, sr)
mfcc_names = mfcc_extractor.get_feature_names()

print("\n✅ MFCC extraction complete!")
print(f"   Total MFCC features: {len(mfcc_features)}")
print(f"   Features shape: {mfcc_features.shape}")
print(f"   Feature vector (first 10 values):\n   {mfcc_features[:10]}")
print(f"\n   Feature names (first 10):")
for i, name in enumerate(mfcc_names[:10]):
    print(f"     [{i:2d}] {name:30s} = {mfcc_features[i]:8.4f}")

In [None]:
# Visualize MFCC components
fig, axes = plt.subplots(3, 1, figsize=(14, 10))

# Plot base MFCCs
im1 = axes[0].imshow(mfcc, aspect='auto', origin='lower', cmap='viridis', interpolation='nearest')
axes[0].set_title('Base MFCCs (13 Coefficients × Time)', fontsize=12, fontweight='bold')
axes[0].set_xlabel('Time Frame')
axes[0].set_ylabel('MFCC Coefficient')
axes[0].set_yticks(range(13))
plt.colorbar(im1, ax=axes[0])

# Plot delta
im2 = axes[1].imshow(delta, aspect='auto', origin='lower', cmap='viridis', interpolation='nearest')
axes[1].set_title('Delta - MFCC Velocity (Rate of Change)', fontsize=12, fontweight='bold')
axes[1].set_xlabel('Time Frame')
axes[1].set_ylabel('MFCC Coefficient')
axes[1].set_yticks(range(13))
plt.colorbar(im2, ax=axes[1])

# Plot delta-delta
im3 = axes[2].imshow(delta_delta, aspect='auto', origin='lower', cmap='viridis', interpolation='nearest')
axes[2].set_title('Delta-Delta - MFCC Acceleration (Rate of Change of Rate)', fontsize=12, fontweight='bold')
axes[2].set_xlabel('Time Frame')
axes[2].set_ylabel('MFCC Coefficient')
axes[2].set_yticks(range(13))
plt.colorbar(im3, ax=axes[2])

plt.tight_layout()
plt.show()

print("📊 MFCC visualization complete!")

In [None]:
# Visualize MFCC feature statistics
fig, axes = plt.subplots(2, 2, figsize=(14, 8))
fig.suptitle('MFCC Feature Statistics (52 Features)', fontsize=14, fontweight='bold')

# Extract statistics groups
base_stats = mfcc_features[:13]
delta_stats = mfcc_features[13:26]
delta_delta_stats = mfcc_features[26:39]
other_stats = mfcc_features[39:52]

# Plot 1: Base MFCCs
axes[0, 0].bar(range(13), base_stats, alpha=0.7, color='steelblue')
axes[0, 0].set_title('Base MFCC Coefficients (Mean)', fontsize=11)
axes[0, 0].set_xlabel('Coefficient Index')
axes[0, 0].set_ylabel('Mean Value')
axes[0, 0].grid(True, alpha=0.3)

# Plot 2: Delta
axes[0, 1].bar(range(13), delta_stats, alpha=0.7, color='coral')
axes[0, 1].set_title('Delta - Velocity (Mean)', fontsize=11)
axes[0, 1].set_xlabel('Coefficient Index')
axes[0, 1].set_ylabel('Mean Value')
axes[0, 1].grid(True, alpha=0.3)

# Plot 3: Delta-Delta
axes[1, 0].bar(range(13), delta_delta_stats, alpha=0.7, color='lightgreen')
axes[1, 0].set_title('Delta-Delta - Acceleration (Mean)', fontsize=11)
axes[1, 0].set_xlabel('Coefficient Index')
axes[1, 0].set_ylabel('Mean Value')
axes[1, 0].grid(True, alpha=0.3)

# Plot 4: Additional statistics
stat_labels = ['Std', 'Min', 'Max'] + ['Reserved'] * 10
axes[1, 1].bar(range(len(other_stats)), other_stats, alpha=0.7, color='mediumpurple')
axes[1, 1].set_title('Additional Statistics', fontsize=11)
axes[1, 1].set_xlabel('Statistic Index')
axes[1, 1].set_ylabel('Value')
axes[1, 1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

print("✅ MFCC statistics visualization complete!")

## Section 4: Spectral Feature Extraction (24 Features)

**What are Spectral Features?**
Spectral features describe the frequency content of speech:
- **Centroid:** Where is the "center of mass" of the spectrum?
- **Rolloff:** Below what frequency does 95% of the spectral energy lie?
- **Bandwidth:** How spread out is the energy around the centroid?
- **Zero-Crossing Rate (ZCR):** How often does the waveform change sign?
- **Energy:** How much power does the signal carry?
- **Chroma:** Distribution across 12 pitch classes

**Our configuration:**
- **6 base spectral features** (centroid, rolloff, bandwidth, ZCR, RMS energy, log-energy)
- **12 chroma features** (pitch class distribution)
- **Statistics:** mean, std, min, max per feature
- **Total:** 24 features in this simplified version (computing all 4 statistics for all 18 tracks would give (6 + 12) × 4 = 72)

**Why for ASD/ADHD detection:**
- Spectral centroid changes with voice tension and emotional state
- Energy variations reveal speech rate and breathing patterns
- Chroma features reflect pitch variations associated with prosodic disorders
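
The definitions above can be sketched for a single frame with NumPy alone. This is a simplified illustration, not the `SpectralExtractor` implementation: the function name, the Hann window, and the use of magnitude (rather than power) weights are assumptions, and the 95% rolloff simply mirrors the figure quoted above.

```python
import numpy as np

def spectral_descriptors(frame, sr, roll_percent=0.95):
    """Centroid, bandwidth, rolloff (Hz) and zero-crossing rate for one audio frame."""
    frame = np.asarray(frame, dtype=float)
    magnitude = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sr)
    total = magnitude.sum()
    if total == 0:
        return {"centroid": 0.0, "bandwidth": 0.0, "rolloff": 0.0, "zcr": 0.0}
    weights = magnitude / total
    # Centroid: magnitude-weighted mean frequency
    centroid = float(np.sum(freqs * weights))
    # Bandwidth: magnitude-weighted standard deviation around the centroid
    bandwidth = float(np.sqrt(np.sum(weights * (freqs - centroid) ** 2)))
    # Rolloff: first frequency bin where the cumulative magnitude reaches roll_percent
    idx = np.searchsorted(np.cumsum(magnitude), roll_percent * total)
    rolloff = float(freqs[min(idx, len(freqs) - 1)])
    # ZCR: fraction of adjacent sample pairs with different signs
    zcr = float(np.mean(np.abs(np.diff(np.signbit(frame).astype(int)))))
    return {"centroid": centroid, "bandwidth": bandwidth, "rolloff": rolloff, "zcr": zcr}
```

For a pure tone, the centroid and rolloff both sit near the tone frequency and the bandwidth is small; broadband noise pushes all three up, which is why these descriptors are sensitive to breathiness and voice tension.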

In [None]:
# Extract spectral features
spectral_extractor = SpectralExtractor(config)

print("🎵 EXTRACTING SPECTRAL FEATURES...")

# Extract individual spectral components
spectral_centroid = spectral_extractor.extract_spectral_centroid(audio_processed)
spectral_rolloff = spectral_extractor.extract_spectral_rolloff(audio_processed)
spectral_bandwidth = spectral_extractor.extract_spectral_bandwidth(audio_processed)
zcr = spectral_extractor.extract_zcr(audio_processed)
rms_energy = spectral_extractor.extract_rms_energy(audio_processed)
chroma = spectral_extractor.extract_chroma(audio_processed)

print(f"   Spectral Centroid shape: {spectral_centroid.shape}")
print(f"   Spectral Rolloff shape: {spectral_rolloff.shape}")
print(f"   Spectral Bandwidth shape: {spectral_bandwidth.shape}")
print(f"   Zero-Crossing Rate shape: {zcr.shape}")
print(f"   RMS Energy shape: {rms_energy.shape}")
print(f"   Chroma shape: {chroma.shape}")

# Extract complete spectral features
spectral_features = spectral_extractor.extract(audio_processed, sr)
spectral_names = spectral_extractor.get_feature_names()

print("\n✅ Spectral extraction complete!")
print(f"   Total spectral features: {len(spectral_features)}")
print(f"   Features shape: {spectral_features.shape}")
print(f"   Feature vector (first 10 values):\n   {spectral_features[:10]}")
print(f"\n   Feature names (first 10):")
for i, name in enumerate(spectral_names[:10]):
    print(f"     [{i:2d}] {name:30s} = {spectral_features[i]:8.4f}")

In [None]:
# Visualize spectral components
fig, axes = plt.subplots(2, 3, figsize=(15, 8))
fig.suptitle('Spectral Features Over Time', fontsize=14, fontweight='bold')

time_axis = np.linspace(0, config.audio.DURATION, len(spectral_centroid))

# Plot 1: Spectral Centroid
axes[0, 0].plot(time_axis, spectral_centroid, linewidth=1.5, color='steelblue')
axes[0, 0].set_title('Spectral Centroid (Hz)', fontsize=11, fontweight='bold')
axes[0, 0].set_xlabel('Time (s)')
axes[0, 0].set_ylabel('Frequency (Hz)')
axes[0, 0].grid(True, alpha=0.3)
axes[0, 0].fill_between(time_axis, spectral_centroid, alpha=0.3)

# Plot 2: Spectral Rolloff
axes[0, 1].plot(time_axis, spectral_rolloff, linewidth=1.5, color='coral')
axes[0, 1].set_title('Spectral Rolloff (Hz)', fontsize=11, fontweight='bold')
axes[0, 1].set_xlabel('Time (s)')
axes[0, 1].set_ylabel('Frequency (Hz)')
axes[0, 1].grid(True, alpha=0.3)
axes[0, 1].fill_between(time_axis, spectral_rolloff, alpha=0.3, color='coral')

# Plot 3: Spectral Bandwidth
axes[0, 2].plot(time_axis, spectral_bandwidth, linewidth=1.5, color='lightgreen')
axes[0, 2].set_title('Spectral Bandwidth (Hz)', fontsize=11, fontweight='bold')
axes[0, 2].set_xlabel('Time (s)')
axes[0, 2].set_ylabel('Bandwidth (Hz)')
axes[0, 2].grid(True, alpha=0.3)
axes[0, 2].fill_between(time_axis, spectral_bandwidth, alpha=0.3, color='lightgreen')

# Plot 4: Zero-Crossing Rate
axes[1, 0].plot(time_axis, zcr, linewidth=1.5, color='mediumpurple')
axes[1, 0].set_title('Zero-Crossing Rate', fontsize=11, fontweight='bold')
axes[1, 0].set_xlabel('Time (s)')
axes[1, 0].set_ylabel('ZCR')
axes[1, 0].grid(True, alpha=0.3)
axes[1, 0].fill_between(time_axis, zcr, alpha=0.3, color='mediumpurple')

# Plot 5: RMS Energy
axes[1, 1].plot(time_axis, rms_energy, linewidth=1.5, color='goldenrod')
axes[1, 1].set_title('RMS Energy', fontsize=11, fontweight='bold')
axes[1, 1].set_xlabel('Time (s)')
axes[1, 1].set_ylabel('Energy')
axes[1, 1].grid(True, alpha=0.3)
axes[1, 1].fill_between(time_axis, rms_energy, alpha=0.3, color='goldenrod')

# Plot 6: Chroma features
im = axes[1, 2].imshow(chroma, aspect='auto', origin='lower', cmap='viridis', interpolation='nearest')
axes[1, 2].set_title('Chromagram (12 Pitch Classes)', fontsize=11, fontweight='bold')
axes[1, 2].set_xlabel('Time Frame')
axes[1, 2].set_ylabel('Pitch Class')
axes[1, 2].set_yticks(range(12))
axes[1, 2].set_yticklabels(['C', 'C#', 'D', 'D#', 'E', 'F', 'F#', 'G', 'G#', 'A', 'A#', 'B'])
plt.colorbar(im, ax=axes[1, 2])

plt.tight_layout()
plt.show()

print("📊 Spectral features visualization complete!")

## Section 5: Prosodic Feature Extraction (19+ Features)

**What are Prosodic Features?**
Prosody is the "music" of speech: pitch, timing, and emphasis patterns. Prosodic differences are frequently reported in ASD and ADHD speech, which makes these features **especially relevant for ASD/ADHD detection**. The features we extract are:

- **Fundamental Frequency (F0):** The pitch of the voice
- **Formants:** Resonances of the vocal tract (F1, F2, F3)
- **Jitter:** Cycle-to-cycle pitch instability (elevated in some voice disorders; a candidate **autism marker**)
- **Shimmer:** Cycle-to-cycle amplitude instability (elevated in some voice disorders; a candidate **autism marker**)
- **Harmonic-to-Noise Ratio (HNR):** Voice quality measure

**Our configuration:**
- **F0 statistics:** 8 features (mean, std, min, max, median, range, coefficient of variation, voiced rate)
- **Formants:** 6 features (F1, F2, F3 mean magnitudes)
- **Jitter:** 1 feature (pitch perturbation) - **candidate autism marker**
- **Shimmer:** 1 feature (amplitude perturbation) - **candidate autism marker**
- **HNR:** 1 feature (harmonic-to-noise ratio)
- **Voice Quality:** 2 features (voice activity rate, voice breaks)
- **Total:** 19 features

**Why Prosodic for ASD/ADHD:**
- **Autism:** Associated with increased jitter/shimmer, monotone pitch, reduced F0 variation
- **ADHD:** Associated with faster speech rate, irregular timing, energy variability
- **Typical speech:** Moderate pitch variation (adult male F0 usually falls around 85-180 Hz), stable voice quality
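
Jitter and shimmer are easier to reason about with the arithmetic written out. The sketch below computes the common "local" variants: the mean absolute difference between consecutive glottal periods (or peak amplitudes), divided by the mean. It assumes you already have per-cycle periods and amplitudes from a pitch tracker; the function names are hypothetical and `ProsodicExtractor` may define these measures differently.

```python
import numpy as np

def local_perturbation(values):
    """Mean absolute difference of consecutive values, relative to their mean."""
    values = np.asarray(values, dtype=float)
    if len(values) < 2 or values.mean() == 0:
        return 0.0
    return float(np.mean(np.abs(np.diff(values))) / values.mean())

def local_jitter(periods_s):
    """Local jitter from a sequence of glottal cycle durations (seconds)."""
    return local_perturbation(periods_s)

def local_shimmer(peak_amplitudes):
    """Local shimmer from a sequence of per-cycle peak amplitudes."""
    return local_perturbation(peak_amplitudes)

# Example: a voice alternating between 10 ms and 11 ms cycles
periods = [0.010, 0.011, 0.010, 0.011]
print(f"jitter = {local_jitter(periods):.4f}")
```

A perfectly steady voice gives 0 for both measures, so the values are best read as relative instability rather than absolute quantities.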

In [None]:
# Extract prosodic features
prosodic_extractor = ProsodicExtractor(config)

print("🎵 EXTRACTING PROSODIC FEATURES...")

# Extract F0 (fundamental frequency)
f0_result = prosodic_extractor.extract_f0_librosa(audio_processed, sr)
if f0_result is not None:
    f0 = f0_result
    print(f"   F0 extraction successful: {f0.shape}")
else:
    # Create placeholder F0 if extraction fails
    f0 = np.linspace(100, 150, len(audio_processed) // 512)
    print(f"   Using synthetic F0: {f0.shape}")

# Extract complete prosodic features
try:
    prosodic_features = prosodic_extractor.extract(audio_processed, sr)
    prosodic_names = prosodic_extractor.get_feature_names()
    print("\n✅ Prosodic extraction complete!")
    print(f"   Total prosodic features: {len(prosodic_features)}")
    print(f"   Features shape: {prosodic_features.shape}")
    print(f"   Feature vector (all values):\n   {prosodic_features}")
    print(f"\n   All feature names:")
    for i, name in enumerate(prosodic_names):
        print(f"     [{i:2d}] {name:30s} = {prosodic_features[i]:8.4f}")
except Exception as e:
    print(f"⚠️  Error in prosodic extraction: {e}")
    # Create synthetic prosodic features for demonstration
    prosodic_features = np.random.randn(19) * 10 + 50
    prosodic_names = [f'Prosodic_feature_{i}' for i in range(19)]
    print(f"   Using synthetic prosodic features for demonstration")

In [None]:
# Visualize prosodic features
fig, axes = plt.subplots(2, 2, figsize=(14, 10))
fig.suptitle('Prosodic Features Analysis', fontsize=14, fontweight='bold')

# Plot 1: F0 contour over time
time_f0 = np.linspace(0, config.audio.DURATION, len(f0))
axes[0, 0].plot(time_f0, f0, linewidth=2, color='steelblue', marker='o', markersize=3)
axes[0, 0].fill_between(time_f0, f0, alpha=0.3, color='steelblue')
axes[0, 0].set_title('F0 Contour (Fundamental Frequency Over Time)', fontsize=11, fontweight='bold')
axes[0, 0].set_xlabel('Time (s)')
axes[0, 0].set_ylabel('Frequency (Hz)')
axes[0, 0].grid(True, alpha=0.3)
axes[0, 0].set_ylim([50, 300])

# Plot 2: Prosodic feature breakdown
# Index ranges assume the layout listed in the configuration above
# (8 F0 statistics, 6 formant features, then jitter/shimmer/HNR/voice quality)
feature_groups = {
    'F0 Stats': prosodic_features[0:8] if len(prosodic_features) >= 8 else prosodic_features[0:3],
    'Formants': prosodic_features[8:14] if len(prosodic_features) >= 14 else [prosodic_features[3]],
    'Voice Quality': prosodic_features[14:19] if len(prosodic_features) >= 19 else prosodic_features[4:]
}

colors_dict = {'F0 Stats': 'steelblue', 'Formants': 'coral', 'Voice Quality': 'lightgreen'}
positions = []
labels = []
values = []
colors = []
pos = 0

for group_name, group_values in feature_groups.items():
    for val in group_values:
        positions.append(pos)
        labels.append(group_name)
        values.append(val)
        colors.append(colors_dict[group_name])
        pos += 1

axes[0, 1].bar(positions, values, color=colors, alpha=0.7, edgecolor='black', linewidth=1)
axes[0, 1].set_title('Prosodic Features Values', fontsize=11, fontweight='bold')
axes[0, 1].set_ylabel('Value')
axes[0, 1].grid(True, alpha=0.3, axis='y')
axes[0, 1].set_xticks(range(len(values)))
axes[0, 1].set_xticklabels(range(len(values)), fontsize=8)

# Plot 3: Feature statistics
feature_stats = {
    'Mean': np.mean(prosodic_features),
    'Std': np.std(prosodic_features),
    'Min': np.min(prosodic_features),
    'Max': np.max(prosodic_features),
    'Median': np.median(prosodic_features)
}

axes[1, 0].bar(feature_stats.keys(), feature_stats.values(), color='mediumpurple', alpha=0.7, edgecolor='black')
axes[1, 0].set_title('Prosodic Features Statistics', fontsize=11, fontweight='bold')
axes[1, 0].set_ylabel('Value')
axes[1, 0].grid(True, alpha=0.3, axis='y')

# Plot 4: Feature importance indicators
axes[1, 1].text(0.1, 0.9, 'Key ASD/ADHD Markers:', fontsize=12, fontweight='bold', transform=axes[1, 1].transAxes)
markers_text = """
🔴 Jitter: Pitch instability
   (↑ in Autism Spectrum Disorders)

🔴 Shimmer: Amplitude instability
   (↑ in Autism Spectrum Disorders)

🟡 F0 Variation: Pitch range
   (↓ monotone in Autism)

🟡 Speech Rate: From timing
   (↑ irregular in ADHD)

🟢 Voice Quality: HNR
   (↓ noisy in voice disorders)
"""
axes[1, 1].text(0.1, 0.45, markers_text, fontsize=10, family='monospace', transform=axes[1, 1].transAxes,
                verticalalignment='top', bbox=dict(boxstyle='round', facecolor='wheat', alpha=0.5))
axes[1, 1].axis('off')

plt.tight_layout()
plt.show()

print("📊 Prosodic features visualization complete!")

## Section 6: Aggregate All Features (106 Total)

Now we combine all three feature types into a single **106-dimensional feature vector**:

| Feature Type | Count | Description |
|---|---|---|
| **MFCC** | 52 | Speech spectral characteristics (base + velocity + acceleration) |
| **Spectral** | 24 | Frequency content & energy distribution |
| **Prosodic** | 19+ | Pitch, voice quality, and timing patterns |
| **TOTAL** | **106** | Complete acoustic profile |

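This vector is the **input to our MLP neural network classifier**. Note that the counts listed in the table sum to 95 (52 + 24 + 19), so reaching 106 requires the prosodic extractor to return more than the 19 features described in Section 5; the next cell prints the actual dimensionality.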

**Key insights:**
- Each feature type captures different aspects of voice
- Together they provide comprehensive acoustic profile
- The MLP learns which combinations are diagnostic for ASD/ADHD
- You can disable specific feature types to test their importance
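
One way to keep the groups separable after concatenation is to record an index slice per group, so a feature type can be inspected or dropped later. The helper below is a hypothetical sketch with toy values; it is not part of the project's extractor API.

```python
import numpy as np

def combine_feature_groups(groups):
    """Concatenate named feature groups.

    groups maps a group name to (values, names). Returns the combined vector,
    the combined name list, and a dict of slices locating each group.
    """
    vectors, names, slices = [], [], {}
    start = 0
    for group, (values, value_names) in groups.items():
        values = np.asarray(values, dtype=float).ravel()
        if len(values) != len(value_names):
            raise ValueError(f"{group}: {len(values)} values but {len(value_names)} names")
        vectors.append(values)
        names.extend(value_names)
        slices[group] = slice(start, start + len(values))
        start += len(values)
    return np.concatenate(vectors), names, slices

# Toy example with made-up values
vector, names, slices = combine_feature_groups({
    "mfcc": ([0.1, 0.2, 0.3], ["mfcc_0", "mfcc_1", "mfcc_2"]),
    "spectral": ([1500.0, 0.05], ["centroid", "zcr"]),
    "prosodic": ([120.0], ["f0_mean"]),
})
without_spectral = np.delete(vector, np.arange(len(vector))[slices["spectral"]])
```

Checking that values and names have equal length at this stage catches extractor mismatches before they silently shift every downstream feature index.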

In [None]:
# Combine all features into single 106-dimensional vector
print("🎵 AGGREGATING ALL FEATURES...\n")

# Create the complete feature vector
complete_features = np.concatenate([mfcc_features, spectral_features, prosodic_features])
complete_names = mfcc_names + spectral_names + prosodic_names

print(f"   MFCC features:      {len(mfcc_features):3d} dimensions")
print(f"   Spectral features:  {len(spectral_features):3d} dimensions")
print(f"   Prosodic features:  {len(prosodic_features):3d} dimensions")
print(f"   " + "=" * 40)
print(f"   TOTAL FEATURES:     {len(complete_features):3d} dimensions")

print("\n✅ Feature aggregation complete!")
print(f"   Complete feature vector shape: {complete_features.shape}")
print(f"   Total feature names: {len(complete_names)}")
print(f"\n   Feature vector (complete):")
for i in range(0, len(complete_features), 10):
    end_idx = min(i + 10, len(complete_features))
    print(f"   [{i:3d}-{end_idx-1:3d}]: {complete_features[i:end_idx]}")

# Statistics on the complete feature vector
print("\n📊 Feature Vector Statistics:")
print(f"   Mean:   {np.mean(complete_features):8.4f}")
print(f"   Std:    {np.std(complete_features):8.4f}")
print(f"   Min:    {np.min(complete_features):8.4f}")
print(f"   Max:    {np.max(complete_features):8.4f}")
print(f"   Median: {np.median(complete_features):8.4f}")
print(f"   Q25:    {np.percentile(complete_features, 25):8.4f}")
print(f"   Q75:    {np.percentile(complete_features, 75):8.4f}")

In [None]:
# Visualize complete feature vector
fig = plt.figure(figsize=(16, 10))
gs = GridSpec(3, 2, figure=fig)

fig.suptitle(f'Complete {len(complete_features)}-Dimensional Feature Vector', fontsize=14, fontweight='bold')

# Plot 1: All features as bar chart
ax1 = fig.add_subplot(gs[0, :])
colors_feat = (['steelblue'] * len(mfcc_features) + 
               ['coral'] * len(spectral_features) + 
               ['lightgreen'] * len(prosodic_features))
bars = ax1.bar(range(len(complete_features)), complete_features, color=colors_feat, alpha=0.7, edgecolor='black', linewidth=0.5)
ax1.set_title(f'All {len(complete_features)} Features (MFCC | Spectral | Prosodic)', fontsize=12, fontweight='bold')
ax1.set_xlabel('Feature Index')
ax1.set_ylabel('Feature Value')
ax1.grid(True, alpha=0.3, axis='y')
ax1.axvline(x=len(mfcc_features), color='red', linestyle='--', linewidth=2, alpha=0.5, label='MFCC|Spectral')
ax1.axvline(x=len(mfcc_features) + len(spectral_features), color='purple', linestyle='--', linewidth=2, alpha=0.5, label='Spectral|Prosodic')
ax1.legend()

# Plot 2: Feature distribution
ax2 = fig.add_subplot(gs[1, 0])
ax2.hist(complete_features, bins=30, alpha=0.7, color='steelblue', edgecolor='black')
ax2.axvline(x=np.mean(complete_features), color='red', linestyle='--', linewidth=2, label=f'Mean: {np.mean(complete_features):.2f}')
ax2.set_title('Feature Distribution', fontsize=12, fontweight='bold')
ax2.set_xlabel('Feature Value')
ax2.set_ylabel('Frequency')
ax2.legend()
ax2.grid(True, alpha=0.3, axis='y')

# Plot 3: Feature type breakdown
ax3 = fig.add_subplot(gs[1, 1])
feature_types = [f'MFCC\n({len(mfcc_features)})', f'Spectral\n({len(spectral_features)})', f'Prosodic\n({len(prosodic_features)})']
feature_counts = [len(mfcc_features), len(spectral_features), len(prosodic_features)]
colors_pie = ['steelblue', 'coral', 'lightgreen']
wedges, texts, autotexts = ax3.pie(feature_counts, labels=feature_types, autopct='%1.1f%%', 
                                     colors=colors_pie, startangle=90, textprops={'fontsize': 11, 'weight': 'bold'})
ax3.set_title('Feature Type Distribution', fontsize=12, fontweight='bold')

# Plot 4: Cumulative contribution
ax4 = fig.add_subplot(gs[2, :])
sorted_features = np.sort(np.abs(complete_features))[::-1]
cumsum = np.cumsum(sorted_features) / np.sum(np.abs(complete_features))
ax4.plot(range(len(cumsum)), cumsum, linewidth=2, color='steelblue', marker='o', markersize=4)
ax4.axhline(y=0.8, color='red', linestyle='--', linewidth=2, alpha=0.5, label='80% threshold')
ax4.axhline(y=0.95, color='orange', linestyle='--', linewidth=2, alpha=0.5, label='95% threshold')
ax4.fill_between(range(len(cumsum)), cumsum, alpha=0.3, color='steelblue')
ax4.set_title('Cumulative Feature Contribution (Sorted by Absolute Value)', fontsize=12, fontweight='bold')
ax4.set_xlabel('Feature Index (sorted)')
ax4.set_ylabel('Cumulative Contribution')
ax4.set_ylim([0, 1.05])
ax4.grid(True, alpha=0.3)
ax4.legend()

plt.tight_layout()
plt.show()

print("📊 Feature vector visualization complete!")

## Section 7: Feature Importance & Correlation

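Now let's look at which features dominate the vector and how the three feature types compare. With a single audio sample we can only compare magnitudes; measuring variance across speakers or correlation between features requires a feature matrix built from many recordings. Once you have that dataset, this kind of analysis helps identify: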
- Which features have meaningful variation
- Which features are redundant (highly correlated)
- Which features might be most diagnostic

**Interpretation Guide:**
- **High variance features:** More discriminative potential
- **Correlated features:** May be redundant (could use dimensionality reduction)
- **Low correlation features:** Capture different aspects of voice
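
Variance and correlation only become measurable once you have a feature matrix with one row per recording. The sketch below shows how that analysis might look on such a matrix; the function names, the 0.9 threshold, and the random demo data are illustrative assumptions, not results from this project.

```python
import numpy as np

def rank_by_variance(X, names):
    """Return (name, variance) pairs sorted from most to least variable feature."""
    variances = np.asarray(X, dtype=float).var(axis=0)
    order = np.argsort(variances)[::-1]
    return [(names[i], float(variances[i])) for i in order]

def correlated_pairs(X, names, threshold=0.9):
    """Return feature pairs whose absolute Pearson correlation meets the threshold."""
    X = np.asarray(X, dtype=float)
    corr = np.corrcoef(X, rowvar=False)  # columns are features
    pairs = []
    for i in range(X.shape[1]):
        for j in range(i + 1, X.shape[1]):
            # NaN correlations (constant columns) fail this comparison and are skipped
            if abs(corr[i, j]) >= threshold:
                pairs.append((names[i], names[j], float(corr[i, j])))
    return pairs

# Demo on synthetic data: b is almost a scaled copy of a, c is independent
rng = np.random.default_rng(0)
a = rng.normal(size=200)
b = 2.0 * a + 0.01 * rng.normal(size=200)
c = 10.0 * rng.normal(size=200)
X_demo = np.column_stack([a, b, c])
ranked = rank_by_variance(X_demo, ["a", "b", "c"])
pairs = correlated_pairs(X_demo, ["a", "b", "c"])
```

Raw variance depends on each feature's units, so compare variances only after standardizing or within a group of features that share a scale.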

In [None]:
# Create a feature importance DataFrame for analysis
feature_df = pd.DataFrame({
    'Feature': complete_names,
    'Value': complete_features,
    'Abs_Value': np.abs(complete_features),
    'Type': (['MFCC'] * len(mfcc_features) + 
             ['Spectral'] * len(spectral_features) + 
             ['Prosodic'] * len(prosodic_features))
})

# Sort by absolute value
feature_df_sorted = feature_df.sort_values('Abs_Value', ascending=False)

print("📊 TOP 20 LARGEST FEATURES (by magnitude):\n")
print(feature_df_sorted.head(20).to_string(index=False))

print("\n\n📊 FEATURE STATISTICS BY TYPE:\n")
print(feature_df.groupby('Type')['Value'].describe())

# Feature variance by type
print("\n📊 FEATURE VARIANCE BY TYPE:\n")
for ftype in ['MFCC', 'Spectral', 'Prosodic']:
    vals = feature_df[feature_df['Type'] == ftype]['Value'].values
    print(f"{ftype:10s}: Mean={np.mean(vals):8.4f}, Std={np.std(vals):8.4f}, Min={np.min(vals):8.4f}, Max={np.max(vals):8.4f}")

In [None]:
# Visualize feature importance and statistics
fig, axes = plt.subplots(2, 2, figsize=(15, 10))
fig.suptitle('Feature Importance & Statistics Analysis', fontsize=14, fontweight='bold')

# Plot 1: Top 20 features
ax = axes[0, 0]
top_20 = feature_df_sorted.head(20)
colors_top = [{'MFCC': 'steelblue', 'Spectral': 'coral', 'Prosodic': 'lightgreen'}[t] for t in top_20['Type']]
ax.barh(range(len(top_20)), top_20['Abs_Value'].values, color=colors_top, alpha=0.7, edgecolor='black')
ax.set_yticks(range(len(top_20)))
ax.set_yticklabels(top_20['Feature'].values, fontsize=9)
ax.set_xlabel('Absolute Value')
ax.set_title('Top 20 Features by Magnitude', fontsize=12, fontweight='bold')
ax.invert_yaxis()
ax.grid(True, alpha=0.3, axis='x')

# Plot 2: Feature variance by type
ax = axes[0, 1]
types = ['MFCC', 'Spectral', 'Prosodic']
variances = [feature_df[feature_df['Type'] == t]['Value'].std() for t in types]
means = [feature_df[feature_df['Type'] == t]['Value'].mean() for t in types]
x_pos = np.arange(len(types))
ax.bar(x_pos, variances, color=['steelblue', 'coral', 'lightgreen'], alpha=0.7, edgecolor='black', linewidth=2)
ax.set_xticks(x_pos)
ax.set_xticklabels(types)
ax.set_ylabel('Standard Deviation')
ax.set_title('Feature Variance by Type', fontsize=12, fontweight='bold')
ax.grid(True, alpha=0.3, axis='y')

# Add value labels on bars
for i, (var, mean) in enumerate(zip(variances, means)):
    ax.text(i, var, f'{var:.2f}\n(μ={mean:.2f})', ha='center', va='bottom', fontweight='bold', fontsize=10)

# Plot 3: Box plot by feature type
ax = axes[1, 0]
data_by_type = [feature_df[feature_df['Type'] == t]['Value'].values for t in types]
bp = ax.boxplot(data_by_type, patch_artist=True)
ax.set_xticklabels(types)  # set labels separately; the `labels=` keyword is deprecated in newer Matplotlib
for patch, color in zip(bp['boxes'], ['steelblue', 'coral', 'lightgreen']):
    patch.set_facecolor(color)
    patch.set_alpha(0.7)
ax.set_ylabel('Feature Value')
ax.set_title('Feature Value Distribution by Type', fontsize=12, fontweight='bold')
ax.grid(True, alpha=0.3, axis='y')

# Plot 4: Feature count and summary stats
ax = axes[1, 1]
summary_text = f"""
📊 FEATURE EXTRACTION SUMMARY

Total Features: {len(complete_features)}

Feature Breakdown:
  • MFCC ({len(mfcc_features)}):      {np.mean(np.abs(mfcc_features)):.4f} avg magnitude
  • Spectral ({len(spectral_features)}):  {np.mean(np.abs(spectral_features)):.4f} avg magnitude
  • Prosodic ({len(prosodic_features)}):  {np.mean(np.abs(prosodic_features)):.4f} avg magnitude

Statistics:
  • Mean: {np.mean(complete_features):.4f}
  • Std:  {np.std(complete_features):.4f}
  • Min:  {np.min(complete_features):.4f}
  • Max:  {np.max(complete_features):.4f}

Key Insights:
  ✓ Features are in different scales
  ✓ Will need normalization before MLP
  ✓ Top features: {', '.join(feature_df_sorted.head(3)['Feature'].values)}

Next Step:
  Normalize features for neural network training
"""
ax.text(0.05, 0.95, summary_text, transform=ax.transAxes, fontsize=10, family='monospace',
        verticalalignment='top', bbox=dict(boxstyle='round', facecolor='lightyellow', alpha=0.8))
ax.axis('off')

plt.tight_layout()
plt.show()

print("✅ Feature importance analysis complete!")
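
The summary panel notes that the features are on different scales and need normalization before the MLP. A minimal sketch of z-score standardization in plain NumPy follows. The helper names `fit_standardizer` and `apply_standardizer` and the random `X_demo` matrix are hypothetical and not part of this project; the statistics are fitted on the training rows only so that no information leaks from held-out samples.

```python
import numpy as np

def fit_standardizer(X_train, eps=1e-8):
    """Return per-feature mean and std computed on the training matrix only."""
    feat_mean = X_train.mean(axis=0)
    feat_std = X_train.std(axis=0)
    # Guard against constant features, whose std is zero
    feat_std = np.where(feat_std < eps, 1.0, feat_std)
    return feat_mean, feat_std

def apply_standardizer(X, feat_mean, feat_std):
    """Scale each feature to zero mean and unit variance (w.r.t. training data)."""
    return (X - feat_mean) / feat_std

# Hypothetical data: 40 samples x 106 features on very different scales
rng = np.random.default_rng(0)
scales = np.logspace(-2, 3, 106)
X_demo = rng.normal(size=(40, 106)) * scales
X_demo[:, 5] = 3.0  # a constant column, to exercise the zero-std guard

feat_mean, feat_std = fit_standardizer(X_demo[:30])
X_train_z = apply_standardizer(X_demo[:30], feat_mean, feat_std)
X_test_z = apply_standardizer(X_demo[30:], feat_mean, feat_std)

print(X_train_z.shape, X_test_z.shape)
```

A single extracted vector cannot be standardized on its own; the mean and std must come from a matrix of many samples, which is why this step belongs with dataset preparation.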

## Section 8: Feature Refinement & Customization

**You can refine the feature extraction by:**

1. **Adjusting MFCC Parameters** in `config.audio`:
   - `N_MFCC`: Number of MFCC coefficients (default: 13)
   - `N_MEL`: Number of mel bands (default: 128)
   - `FMIN`, `FMAX`: Frequency range (default: 80-7600 Hz)

2. **Adjusting Spectral Parameters** in `config.spectral`:
   - Include/exclude specific spectral features
   - Adjust statistics (mean, std, min, max)
   - Modify chroma extraction

3. **Adjusting Prosodic Parameters** in `config.prosodic`:
   - Change F0 extraction method (librosa vs Parselmouth)
   - Adjust jitter/shimmer window sizes
   - Enable/disable specific prosodic markers

4. **Feature Selection**:
   - Use only MFCC (52 features) - simpler model
   - Use MFCC + Spectral (76 features) - balanced
   - Use all 106 features - comprehensive (current)
   - Apply PCA to reduce to 80 features

**Recommendation for your refinement:**
Start with all 106 features to get baseline results. If you see overfitting, gradually:
1. Reduce MFCC coefficients (13 → 10)
2. Remove highly correlated features
3. Apply PCA dimensionality reduction
4. Focus on ASD-specific markers (jitter, shimmer, F0 variation)
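
Steps 2 and 3 above can be sketched with NumPy alone. This is an illustrative sketch, not the project's implementation: `drop_correlated` and `pca_reduce` are hypothetical helpers, the 0.95 threshold is an arbitrary starting point, and `X_demo` is random data standing in for a real samples-by-features matrix. It assumes no constant columns (their correlation is undefined) and that features have been standardized beforehand.

```python
import numpy as np

def drop_correlated(X, threshold=0.95):
    """Greedily keep a feature only if its |correlation| with every
    previously kept feature is below the threshold."""
    corr = np.abs(np.corrcoef(X, rowvar=False))
    keep = []
    for j in range(X.shape[1]):
        if all(corr[j, k] < threshold for k in keep):
            keep.append(j)
    return keep

def pca_reduce(X, n_components):
    """Project centered data onto the top principal components via SVD.
    Returns the projected data and the explained-variance ratio per component."""
    X_centered = X - X.mean(axis=0)
    _, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    explained = (S ** 2) / np.sum(S ** 2)
    return X_centered @ Vt[:n_components].T, explained[:n_components]

# Hypothetical data: 200 samples x 106 features
rng = np.random.default_rng(42)
X_demo = rng.normal(size=(200, 106))
# Make column 1 a near-duplicate of column 0 so the filter has something to drop
X_demo[:, 1] = 2.0 * X_demo[:, 0] + 0.01 * rng.normal(size=200)

kept = drop_correlated(X_demo, threshold=0.95)
X_reduced, explained = pca_reduce(X_demo[:, kept], n_components=80)

print(f"kept {len(kept)} of {X_demo.shape[1]} features")
print(f"PCA output shape: {X_reduced.shape}, variance retained: {explained.sum():.3f}")
```

PCA needs at least as many samples as requested components, so reducing to 80 dimensions only makes sense once the dataset has well over 80 recordings.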

In [None]:
# Create a feature refinement guide and function
def extract_features_with_options(audio, sr, use_mfcc=True, use_spectral=True, use_prosodic=True):
    """
    Extract features with selective feature types.
    
    Parameters:
    -----------
    audio : array
        Audio signal
    sr : int
        Sample rate
    use_mfcc : bool
        Include MFCC features
    use_spectral : bool
        Include spectral features
    use_prosodic : bool
        Include prosodic features
    
    Returns:
    --------
    features : array
        Concatenated feature vector
    feature_names : list
        Feature names
    feature_counts : dict
        Count of each feature type used
    """
    
    features_list = []
    names_list = []
    counts = {'MFCC': 0, 'Spectral': 0, 'Prosodic': 0}
    
    # MFCC features
    if use_mfcc:
        mfcc_ext = MFCCExtractor(config)
        mfcc_feat = mfcc_ext.extract(audio, sr)
        features_list.append(mfcc_feat)
        names_list.extend(mfcc_ext.get_feature_names())
        counts['MFCC'] = len(mfcc_feat)
    
    # Spectral features
    if use_spectral:
        spec_ext = SpectralExtractor(config)
        spec_feat = spec_ext.extract(audio, sr)
        features_list.append(spec_feat)
        names_list.extend(spec_ext.get_feature_names())
        counts['Spectral'] = len(spec_feat)
    
    # Prosodic features
    if use_prosodic:
        prost_ext = ProsodicExtractor(config)
        try:
            prost_feat = prost_ext.extract(audio, sr)
            features_list.append(prost_feat)
            names_list.extend(prost_ext.get_feature_names())
            counts['Prosodic'] = len(prost_feat)
        except:
            print("‚ö†Ô∏è  Prosodic extraction failed, skipping")
    
    return np.concatenate(features_list), names_list, counts

# Test different feature combinations
print("🧪 TESTING FEATURE COMBINATIONS:\n")

combinations = [
    (True, False, False, "MFCC only"),
    (True, True, False, "MFCC + Spectral"),
    (True, False, True, "MFCC + Prosodic"),
    (False, True, False, "Spectral only"),
    (True, True, True, "All features (default)")
]

results = []
for mfcc, spec, prost, label in combinations:
    try:
        feat, names, counts = extract_features_with_options(audio_processed, sr, mfcc, spec, prost)
        total = len(feat)
        results.append({
            'Configuration': label,
            'Total Features': total,
            'MFCC': counts['MFCC'],
            'Spectral': counts['Spectral'],
            'Prosodic': counts['Prosodic']
        })
        print(f"✓ {label:30s} → {total:3d} features")
    except Exception as e:
        print(f"✗ {label:30s} → Error: {str(e)[:50]}")

results_df = pd.DataFrame(results)
print("\n" + "="*70)
print(results_df.to_string(index=False))
print("="*70)

## Section 9: Summary & Next Steps

### What You Learned in This Notebook:

✅ **Audio Preprocessing**: Standardize audio to 5 seconds, 16 kHz
✅ **MFCC Features (52)**: Speech spectrum with velocity & acceleration
✅ **Spectral Features (24)**: Frequency content & energy distribution  
✅ **Prosodic Features (19)**: Pitch, formants, voice quality
✅ **Feature Aggregation (106)**: Combining all features
✅ **Feature Analysis**: Understanding variance and importance
✅ **Feature Refinement**: Customizing for your needs

### Key ASD/ADHD Markers:
- **Jitter** (pitch instability) - ↑ in Autism
- **Shimmer** (amplitude instability) - ↑ in Autism
- **F0 Variation** (pitch range) - ↓ in Autism (monotone)
- **Energy Patterns** - Irregular in ADHD
- **Speech Rate** - Variations in ADHD
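
To make the first three markers concrete, here is a minimal sketch of the commonly used "local" definitions: the mean absolute difference between consecutive cycles divided by the mean. It is not necessarily how `ProsodicExtractor` computes them. The function names and the toy period, amplitude, and F0 arrays are hypothetical, and the sketch assumes the glottal periods and peak amplitudes have already been measured by a pitch tracker.

```python
import numpy as np

def local_jitter(periods):
    """Mean absolute difference of consecutive pitch periods, relative to the mean period."""
    periods = np.asarray(periods, dtype=float)
    return np.mean(np.abs(np.diff(periods))) / np.mean(periods)

def local_shimmer(amplitudes):
    """Mean absolute difference of consecutive peak amplitudes, relative to the mean amplitude."""
    amplitudes = np.asarray(amplitudes, dtype=float)
    return np.mean(np.abs(np.diff(amplitudes))) / np.mean(amplitudes)

def f0_variation(f0_hz):
    """Coefficient of variation of F0 over voiced frames (NaN marks unvoiced frames)."""
    f0 = np.asarray(f0_hz, dtype=float)
    f0 = f0[~np.isnan(f0)]
    return np.std(f0) / np.mean(f0)

# Toy inputs: a perfectly steady voice versus one with alternating cycle lengths
steady_periods = np.full(50, 0.005)                              # 200 Hz, no perturbation
perturbed_periods = 0.005 + 0.0001 * (-1.0) ** np.arange(50)     # +/- 0.1 ms alternation
perturbed_amps = 1.0 + 0.05 * (-1.0) ** np.arange(50)
f0_track = np.array([200.0, np.nan, 210.0, 190.0, np.nan, 200.0])

print(f"jitter (steady):    {local_jitter(steady_periods):.4f}")
print(f"jitter (perturbed): {local_jitter(perturbed_periods):.4f}")
print(f"shimmer:            {local_shimmer(perturbed_amps):.4f}")
print(f"F0 variation:       {f0_variation(f0_track):.4f}")
```

A lower `f0_variation` corresponds to the flatter, more monotone pitch contour listed above, while higher jitter and shimmer indicate less stable cycle-to-cycle phonation.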

### What's Next:

**Phase 2: Data Preparation**
- Load or generate training dataset
- Create labeled samples (ASD, ADHD, Healthy)
- Handle class imbalance

**Phase 3: Model Training**
- Build MLP classifier (106 → 128 → 64 → 32 → 3)
- Train with K-fold cross-validation
- Monitor loss and accuracy

**Phase 4: Model Evaluation**
- Compute metrics (accuracy, precision, recall, F1)
- Generate confusion matrix and ROC curves
- Analyze misclassified samples
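
As a preview of the metrics step, the sketch below derives accuracy and per-class precision, recall, and F1 from a confusion matrix using only NumPy. The label encoding (0 = ASD, 1 = ADHD, 2 = Healthy), the helper names, and the toy predictions are assumptions for illustration, not results from any model.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes):
    """Rows are true classes, columns are predicted classes."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm

def per_class_metrics(cm):
    """Return overall accuracy plus per-class precision, recall, and F1."""
    tp = np.diag(cm).astype(float)
    predicted = cm.sum(axis=0)   # column sums: how often each class was predicted
    actual = cm.sum(axis=1)      # row sums: how often each class truly occurs
    # Use 0.0 where a class was never predicted or never present
    precision = np.divide(tp, predicted, out=np.zeros_like(tp), where=predicted > 0)
    recall = np.divide(tp, actual, out=np.zeros_like(tp), where=actual > 0)
    denom = precision + recall
    f1 = np.divide(2 * precision * recall, denom, out=np.zeros_like(tp), where=denom > 0)
    accuracy = tp.sum() / cm.sum()
    return accuracy, precision, recall, f1

# Toy labels under the assumed encoding 0 = ASD, 1 = ADHD, 2 = Healthy
y_true = [0, 0, 0, 1, 1, 1, 2, 2, 2, 2]
y_pred = [0, 0, 1, 1, 1, 2, 2, 2, 2, 0]

cm = confusion_matrix(y_true, y_pred, n_classes=3)
accuracy, precision, recall, f1 = per_class_metrics(cm)

print(cm)
print(f"accuracy: {accuracy:.2f}")
for name, p, r, f in zip(["ASD", "ADHD", "Healthy"], precision, recall, f1):
    print(f"{name:8s} precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```

The off-diagonal cells of `cm` show which classes are confused with each other, which is the starting point for analyzing misclassified samples.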

**Phase 5: Refinement**
- Test feature combinations
- Adjust model architecture
- Perform hyperparameter tuning

### How to Use This Notebook:
1. **Understanding Mode**: Run all cells to see feature extraction
2. **Customization Mode**: Modify feature extractors and test
3. **Reference Mode**: Use functions for your own audio files
4. **Documentation**: Read markdown cells for theoretical background

## Appendix: Reusing existing repository scripts and data

This section shows how we can reuse the helper scripts and data already present in the repository root (`f:/AIML`) to speed up processing and to connect real datasets.

We'll attempt to:
- Detect and import high-value scripts: `mfcc_extract.py`, `extract_audio.py`, `ser_preprocessing.py`, `spectrogram_conversion.py`, `extractBERT.py`, `predictor.py`, `model.py`.
- Demonstrate examples using functions from those scripts (if available).
- Fall back to our in-notebook implementations when external scripts are missing or incompatible.

Run the next cell to auto-detect available helpers and print a short usage summary.

In [None]:
# Auto-detect helper scripts in repository root and summarize their quick usage
import importlib.util
import inspect

root_dir = r'f:/AIML'
helper_files = [
    'mfcc_extract.py',
    'extract_audio.py',
    'ser_preprocessing.py',
    'spectrogram_conversion.py',
    'extractBERT.py',
    'predictor.py',
    'model.py'
]

available_helpers = {}

for hf in helper_files:
    path = os.path.join(root_dir, hf)
    if os.path.exists(path):
        available_helpers[hf] = path

print(f"Detected {len(available_helpers)} helper scripts in {root_dir}:")
for k, v in available_helpers.items():
    print(f" - {k}: {v}")

# Try to import and list top-level functions/classes for each helper
helper_summaries = {}
for name, path in available_helpers.items():
    spec = importlib.util.spec_from_file_location(name.replace('.py',''), path)
    try:
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)
        members = inspect.getmembers(module, predicate=inspect.isfunction)
        classes = inspect.getmembers(module, predicate=inspect.isclass)
        helper_summaries[name] = {
            'functions': [m[0] for m in members],
            'classes': [c[0] for c in classes]
        }
    except Exception as e:
        helper_summaries[name] = {'error': str(e)}

print('\nHelper script summaries:')
for name, summary in helper_summaries.items():
    print(f"\n{name}:")
    if 'error' in summary:
        print(f"  ⚠️  Import error: {summary['error']}")
    else:
        print(f"  Functions: {summary['functions']}")
        print(f"  Classes:   {summary['classes']}")

# Provide recommended next actions for each detected helper
print('\nRecommended quick actions:')
for name in helper_summaries.keys():
    if name == 'mfcc_extract.py':
        print(' - Use mfcc_extract.extract_mfcc_features(filepath) to extract raw MFCC arrays and compare with our MFCCExtractor outputs.')
    if name == 'extract_audio.py':
        print(' - Use extract_audio.extract_audio_features to run openSMILE configs (if opensmile is available on your system).')
    if name == 'ser_preprocessing.py':
        print(' - Use ser_preprocessing.extract_features and ser_preprocessing.load_data() for ready-made pipelines; can replace synthetic dataset generation when real files are available in `data/` folder.')
    if name == 'predictor.py' or name == 'model.py':
        print(' - Inspect model/predictor for saved models and prediction helpers; can use for inference demonstration.')
    if name == 'extractBERT.py':
        print(' - Use extractBERT.extract_text_features(text) to generate textual embeddings for multimodal experiments.')
    if name == 'spectrogram_conversion.py':
        print(' - Use spectrogram_conversion code for converting long audio files to spectrogram images if you want image-based models.')

