# Phase 1: EDF Audio Extraction and Baseline Establishment

## Research Goal
Extract clean audio from EDF files and establish baseline performance for the denoising evaluation pipeline.

## This Notebook:
1. **EDF to WAV Conversion**: Extract audio from clinical EDF files
2. **Baseline Model Testing**: Validate the trained model on the extracted audio
3. **Clean Audio Performance**: Establish clean audio baseline metrics
4. **Test Set Preparation**: Create standardized test set for noise experiments

## Output:
- WAV files organized by patient for denoising pipeline
- Baseline performance metrics (F1, sensitivity, specificity)
- Feature extraction validation
- Clean audio test set ready for Phase 2 noise injection

---

In [1]:
# Cell 1: Imports and Configuration
print("=== Phase 1: EDF Audio Extraction and Baseline Establishment ===")

import os
import time
import numpy as np
import pandas as pd
import librosa
import soundfile as sf
import mne
from pathlib import Path
import warnings
warnings.filterwarnings('ignore')

# Model loading for baseline testing
import joblib
from sklearn.metrics import classification_report, confusion_matrix, f1_score

# Add custom XML parser
import sys
sys.path.append('../src')

try:
    from working_with_xml import extract_apnea_events
    print("✅ XML parser imported successfully")
except Exception as e:
    print(f"❌ XML parser import failed: {e}")

# Configuration
PATIENT_DATA_DIR = "F:/Solo All In One Docs/Scidb Sleep Data"
AUDIO_OUTPUT_DIR = "F:/Solo All In One Docs/Scidb Sleep Data/processed"  # Where WAV files will be saved
MODEL_PATH = "../models/sleep_apnea_model.pkl"  # Trained model path

# Test patients (subset for Phase 1)
TEST_PATIENTS = [f"patient_{i:02d}" for i in range(1, 11)]  # patient_01 ... patient_10

# Audio processing settings (same as training)
AUDIO_CHANNEL = 'Mic'
TARGET_SAMPLE_RATE = 16000  # 16kHz as used in training
FRAME_DURATION = 30.0  # 30-second windows
OVERLAP_RATIO = 0.5
APNEA_THRESHOLD = 0.1

print(f"\n📁 Patient data directory: {PATIENT_DATA_DIR}")
print(f"🎵 Audio output directory: {AUDIO_OUTPUT_DIR}")
print(f"🤖 Model path: {MODEL_PATH}")
print(f"👥 Test patients: {TEST_PATIENTS}")
print(f"🎯 Audio settings: {TARGET_SAMPLE_RATE}Hz, {FRAME_DURATION}s frames, {OVERLAP_RATIO} overlap")

# Create output directories
os.makedirs(AUDIO_OUTPUT_DIR, exist_ok=True)
print(f"✅ Directories created/verified")

=== Phase 1: EDF Audio Extraction and Baseline Establishment ===
✅ XML parser imported successfully

📁 Patient data directory: F:/Solo All In One Docs/Scidb Sleep Data
🎵 Audio output directory: F:/Solo All In One Docs/Scidb Sleep Data/processed
🤖 Model path: ../models/sleep_apnea_model.pkl
👥 Test patients: ['patient_01', 'patient_02', 'patient_03', 'patient_04', 'patient_05', 'patient_06', 'patient_07', 'patient_08', 'patient_09', 'patient_10']
🎯 Audio settings: 16000Hz, 30.0s frames, 0.5 overlap
✅ Directories created/verified
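
As a quick sanity check on the framing settings, the sketch below counts how many 30-second frames a recording yields at 50% overlap. The helper name `count_frames` and the one-hour example duration are illustrative assumptions; the arithmetic mirrors the `range(0, n_times - frame_samples + 1, step_samples)` loop used in Cell 3.

```python
def count_frames(n_samples, sample_rate, frame_duration=30.0, overlap_ratio=0.5):
    """Number of full frames produced by a sliding window (hypothetical helper)."""
    frame_samples = int(frame_duration * sample_rate)
    step_samples = int(frame_samples * (1 - overlap_ratio))
    if n_samples < frame_samples:
        return 0  # recording is shorter than one frame
    # Same count as len(range(0, n_samples - frame_samples + 1, step_samples))
    return (n_samples - frame_samples) // step_samples + 1

# Example: a one-hour recording at 16 kHz (assumed duration, for illustration)
n = count_frames(n_samples=3600 * 16000, sample_rate=16000)
print(n)  # 239
```

With a 15-second step, each hour of audio contributes 239 frames, and partial frames at the end of a recording are dropped.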


In [2]:
# Cell 2: Feature Extraction Functions (Same as Training Pipeline)
def extract_comprehensive_features(audio_frame, sample_rate):
    """Extract the same 27 features used in training pipeline"""
    try:
        if len(audio_frame) == 0:
            return None
            
        # Basic acoustic features
        rms = float(librosa.feature.rms(y=audio_frame).mean())
        zcr = float(librosa.feature.zero_crossing_rate(y=audio_frame).mean())
        centroid = float(librosa.feature.spectral_centroid(y=audio_frame, sr=sample_rate).mean())
        bandwidth = float(librosa.feature.spectral_bandwidth(y=audio_frame, sr=sample_rate).mean())
        rolloff = float(librosa.feature.spectral_rolloff(y=audio_frame, sr=sample_rate).mean())
        
        # MFCCs (first 8 coefficients)
        mfccs = librosa.feature.mfcc(y=audio_frame, sr=sample_rate, n_mfcc=8)
        mfcc_means = mfccs.mean(axis=1)
        mfcc_stds = mfccs.std(axis=1)
        
        # Temporal features for breathing patterns (5-second windows)
        window_size = int(5 * sample_rate)  # 5 seconds
        num_windows = len(audio_frame) // window_size
        
        if num_windows >= 2:
            rms_windows = []
            zcr_windows = []
            
            for i in range(num_windows):
                start_idx = i * window_size
                end_idx = start_idx + window_size
                window = audio_frame[start_idx:end_idx]
                
                rms_windows.append(librosa.feature.rms(y=window).mean())
                zcr_windows.append(librosa.feature.zero_crossing_rate(y=window).mean())
            
            rms_variability = float(np.std(rms_windows))
            zcr_variability = float(np.std(zcr_windows))
            breathing_regularity = float(1.0 / (1.0 + rms_variability))  # Higher = more regular
        else:
            rms_variability = 0.0
            zcr_variability = 0.0
            breathing_regularity = 0.5
        
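        # Silence detection
        # Note: the threshold is this frame's own 20th percentile of |amplitude|,
        # so silence_ratio is close to 0.2 by construction (kept to match training)
        silence_threshold = np.percentile(np.abs(audio_frame), 20)  # Bottom 20% as silence
        silence_mask = np.abs(audio_frame) < silence_threshold
        silence_ratio = float(np.mean(silence_mask))
        
        # Breathing pause detection (continuous silence periods)
        # Caveat: if the frame starts in silence, the first pause end precedes the
        # first pause start and the pairing below misaligns (kept to match training)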
        silence_changes = np.diff(silence_mask.astype(int))
        pause_starts = np.where(silence_changes == 1)[0]
        pause_ends = np.where(silence_changes == -1)[0]
        
        if len(pause_starts) > 0 and len(pause_ends) > 0:
            if len(pause_ends) < len(pause_starts):
                pause_ends = np.append(pause_ends, len(audio_frame))
            pause_durations = (pause_ends[:len(pause_starts)] - pause_starts) / sample_rate
            avg_pause_duration = float(np.mean(pause_durations))
            max_pause_duration = float(np.max(pause_durations))
        else:
            avg_pause_duration = 0.0
            max_pause_duration = 0.0
        
        # Combine all features (same structure as training)
        features = {
            'clean_rms': rms,
            'clean_zcr': zcr,
            'clean_centroid': centroid,
            'clean_bandwidth': bandwidth,
            'clean_rolloff': rolloff,
            'clean_rms_variability': rms_variability,
            'clean_zcr_variability': zcr_variability,
            'clean_breathing_regularity': breathing_regularity,
            'clean_silence_ratio': silence_ratio,
            'clean_avg_pause_duration': avg_pause_duration,
            'clean_max_pause_duration': max_pause_duration
        }
        
        # Add MFCCs
        for i, (mean_val, std_val) in enumerate(zip(mfcc_means, mfcc_stds), 1):
            features[f'clean_mfcc_{i}_mean'] = float(mean_val)
            features[f'clean_mfcc_{i}_std'] = float(std_val)
        
        return features
        
    except Exception as e:
        print(f"   ⚠️  Feature extraction error: {e}")
        return None

def get_apnea_label(timestamp, duration, apnea_events, threshold=0.1):
    """Calculate apnea label based on overlap with events (same as training)"""
    try:
        frame_end = timestamp + duration
        apnea_seconds = 0
        
        for start, end in apnea_events:
            overlap_start = max(timestamp, start)
            overlap_end = min(frame_end, end)
            if overlap_start < overlap_end:
                apnea_seconds += (overlap_end - overlap_start)
        
        proportion = apnea_seconds / duration
        label = 1 if proportion > threshold else 0
        return label, proportion
    except Exception:  # avoid a bare except, which would also swallow KeyboardInterrupt
        return 0, 0.0

print("✅ Feature extraction functions defined (identical to training pipeline)")

✅ Feature extraction functions defined (identical to training pipeline)
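
The pause-duration features pair each silence onset with a silence offset, which can misalign when a frame begins in silence (an offset then comes before the first onset). The sketch below shows one way to measure silent runs that handles both frame edges by padding the mask. It is an alternative for comparison only, not the method used in training; `silent_run_durations` is a hypothetical name, and swapping it in would change feature values relative to the trained model.

```python
import numpy as np

def silent_run_durations(silence_mask, sample_rate):
    """Durations (seconds) of each contiguous True run in a boolean mask."""
    mask = np.asarray(silence_mask, dtype=bool)
    # Pad with False on both sides so runs touching either edge get a start and an end
    padded = np.concatenate(([False], mask, [False])).astype(int)
    changes = np.diff(padded)
    starts = np.where(changes == 1)[0]   # index where each run begins
    ends = np.where(changes == -1)[0]    # index just past where each run ends
    return (ends - starts) / sample_rate

# Toy mask that starts and ends in silence (sample_rate=1 so durations are in samples)
mask = np.array([True, True, False, False, True, False, True, True, True])
print(silent_run_durations(mask, sample_rate=1))  # [2. 1. 3.]
```

Because every run gets exactly one start and one end, the durations are never negative and no trailing run is lost.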


In [3]:
# Cell 3: EDF to WAV Conversion Function
def extract_audio_from_patient(patient_id, data_dir, output_dir):
    """Extract audio from all EDF files for a patient and save as WAV files"""
    
    print(f"🎵 Starting audio extraction for {patient_id}...")
    
    try:
        # Check patient directory
        patient_dir = os.path.join(data_dir, patient_id)
        if not os.path.exists(patient_dir):
            print(f"❌ {patient_id}: Directory not found: {patient_dir}")
            return None
        
        # Find EDF and RML files
        edf_files = sorted([f for f in os.listdir(patient_dir) if f.endswith('.edf')])
        rml_files = [f for f in os.listdir(patient_dir) if f.endswith('.rml')]
        
        if not edf_files or not rml_files:
            print(f"❌ {patient_id}: Missing EDF ({len(edf_files)}) or RML ({len(rml_files)}) files")
            return None
        
        print(f"   📁 {patient_id}: Found {len(edf_files)} EDF and {len(rml_files)} RML files")
        
        # Load apnea events
        rml_path = os.path.join(patient_dir, rml_files[0])
        apnea_data = extract_apnea_events(rml_path, output_csv=None)
        apnea_events = [(float(start), float(end)) for event_type, start, end in apnea_data]
        print(f"   📋 {patient_id}: Found {len(apnea_events)} apnea events")
        
        # Create patient audio output directory
        patient_audio_dir = os.path.join(output_dir, f"{patient_id}_wav")
        os.makedirs(patient_audio_dir, exist_ok=True)
        
        # Process each EDF file
        extracted_data = []
        total_wav_files = 0
        
        for edf_idx, edf_file in enumerate(edf_files, 1):
            print(f"   🎧 {patient_id}: Processing EDF {edf_idx}/{len(edf_files)} - {edf_file}")
            
            try:
                edf_path = os.path.join(patient_dir, edf_file)
                raw = mne.io.read_raw_edf(edf_path, preload=False, verbose=False)
                
                if AUDIO_CHANNEL not in raw.ch_names:
                    print(f"   ⚠️  {patient_id}: No {AUDIO_CHANNEL} channel in {edf_file}, skipping")
                    continue
                
                raw.pick([AUDIO_CHANNEL])  # pick() supersedes the legacy pick_channels() in recent MNE releases
                original_sample_rate = int(raw.info['sfreq'])
                duration_minutes = raw.n_times / original_sample_rate / 60
                print(f"      ⏱️  Duration: {duration_minutes:.1f} min, SR: {original_sample_rate} Hz → {TARGET_SAMPLE_RATE} Hz")
                
                # Frame parameters
                original_frame_samples = int(FRAME_DURATION * original_sample_rate)
                original_step_samples = int(original_frame_samples * (1 - OVERLAP_RATIO))
                
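                # Time offset for multi-EDF recordings
                # Assumes every EDF segment lasts exactly 1 hour; if segment lengths differ,
                # frame timestamps drift relative to the RML event times
                time_offset = (edf_idx - 1) * 60 * 60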
                
                # Extract and save WAV files
                edf_wav_count = 0
                
                for frame_start in range(0, raw.n_times - original_frame_samples + 1, original_step_samples):
                    frame_end = frame_start + original_frame_samples
                    timestamp = (frame_start / original_sample_rate) + time_offset
                    
                    # Load audio frame
                    try:
                        audio_frame, _ = raw[:, frame_start:frame_end]
                        audio_frame = audio_frame.flatten()
                        
                        # Downsample to target sample rate
                        if original_sample_rate != TARGET_SAMPLE_RATE:
                            audio_frame = librosa.resample(audio_frame, 
                                                         orig_sr=original_sample_rate, 
                                                         target_sr=TARGET_SAMPLE_RATE)
                        
                        # Create WAV filename with timestamp
                        edf_base = os.path.splitext(edf_file)[0]
                        wav_filename = f"{patient_id}_{edf_base}_frame_{int(timestamp):06d}.wav"
                        wav_path = os.path.join(patient_audio_dir, wav_filename)
                        
                        # Save as WAV file
                        sf.write(wav_path, audio_frame, TARGET_SAMPLE_RATE)
                        
                        # Get apnea label for metadata
                        apnea_label, apnea_proportion = get_apnea_label(timestamp, FRAME_DURATION, apnea_events, APNEA_THRESHOLD)
                        
                        # Store metadata
                        metadata = {
                            'patient_id': patient_id,
                            'edf_file': edf_file,
                            'wav_file': wav_filename,
                            'wav_path': wav_path,
                            'timestamp': float(timestamp),
                            'frame_duration': FRAME_DURATION,
                            'sample_rate': TARGET_SAMPLE_RATE,
                            'apnea_label': int(apnea_label),
                            'apnea_proportion': float(apnea_proportion)
                        }
                        
                        extracted_data.append(metadata)
                        edf_wav_count += 1
                        total_wav_files += 1
                        
                    except Exception as e:
                        print(f"      ⚠️  Frame extraction failed: {e}")
                        continue
                
                print(f"      ✅ {edf_file}: {edf_wav_count} WAV files created")
                del raw  # Free memory
                
            except Exception as e:
                print(f"   ❌ {patient_id}: Failed processing {edf_file}: {e}")
                continue
        
        print(f"✅ {patient_id}: {total_wav_files} total WAV files extracted")
        return extracted_data
        
    except Exception as e:
        print(f"❌ {patient_id}: Critical error: {e}")
        return None

print("✅ EDF to WAV conversion function defined")

✅ EDF to WAV conversion function defined
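
Cell 3 places each EDF segment on the patient timeline by assuming every file lasts exactly one hour. If segment lengths vary, a cumulative offset built from each file's actual duration may keep frame timestamps better aligned with the RML event times. A minimal sketch, assuming the segments are contiguous and their durations are known in seconds (for example from `raw.n_times / raw.info['sfreq']`); `cumulative_offsets` is a hypothetical helper, not part of the notebook:

```python
from itertools import accumulate

def cumulative_offsets(durations_sec):
    """Start time of each segment when segments follow one another without gaps."""
    durations = [float(d) for d in durations_sec]
    if not durations:
        return []
    # Each offset is the running total of all previous durations, starting at 0
    return [0.0] + list(accumulate(durations))[:-1]

# Illustrative durations: three full hours and a shorter final segment
print(cumulative_offsets([3600, 3600, 3600, 1250.5]))
# [0.0, 3600.0, 7200.0, 10800.0]
```

For recordings where every segment really is one hour long, this gives the same offsets as `(edf_idx - 1) * 3600`.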


In [4]:
# Cell 4: Execute Audio Extraction
print("🎵 STARTING EDF TO WAV AUDIO EXTRACTION")
print(f"Time started: {time.strftime('%Y-%m-%d %H:%M:%S')}")
print(f"{'='*60}")

start_time = time.time()
all_audio_metadata = []

# Extract audio for each test patient
for patient_id in TEST_PATIENTS:
    patient_metadata = extract_audio_from_patient(patient_id, PATIENT_DATA_DIR, AUDIO_OUTPUT_DIR)
    
    if patient_metadata:
        all_audio_metadata.extend(patient_metadata)
        print(f"🎯 Progress: {patient_id} completed - {len(patient_metadata)} WAV files")
    else:
        print(f"❌ {patient_id}: Audio extraction failed")
    
    print(f"{'-'*40}")

total_time = time.time() - start_time

# Summary
print(f"\n{'='*60}")
print(f"🏁 AUDIO EXTRACTION COMPLETE!")
print(f"⏱️  Total time: {total_time:.1f} seconds ({total_time/60:.1f} minutes)")
print(f"🎵 Total WAV files: {len(all_audio_metadata):,}")

if all_audio_metadata:
    # Convert to DataFrame and save metadata
    audio_df = pd.DataFrame(all_audio_metadata)
    metadata_path = os.path.join(AUDIO_OUTPUT_DIR, "audio_metadata.csv")
    audio_df.to_csv(metadata_path, index=False)
    
    # Statistics
    apnea_count = audio_df['apnea_label'].sum()
    apnea_rate = audio_df['apnea_label'].mean() * 100
    
    print(f"\n📋 AUDIO EXTRACTION STATISTICS:")
    print(f"📄 Metadata saved: {metadata_path}")
    print(f"📊 Total files: {len(audio_df):,}")
    print(f"👥 Patients: {audio_df['patient_id'].nunique()}")
    print(f"🚨 Apnea files: {apnea_count:,} ({apnea_rate:.1f}%)")
    print(f"😴 Normal files: {len(audio_df) - apnea_count:,} ({100-apnea_rate:.1f}%)")
    
    # Per-patient breakdown
    print(f"\n👤 PER-PATIENT WAV FILES:")
    patient_stats = audio_df.groupby('patient_id').agg({
        'apnea_label': ['count', 'sum', 'mean']
    }).round(3)
    
    for patient in audio_df['patient_id'].unique():
        count = patient_stats.loc[patient, ('apnea_label', 'count')]
        apnea = patient_stats.loc[patient, ('apnea_label', 'sum')]
        rate = patient_stats.loc[patient, ('apnea_label', 'mean')] * 100
        print(f"   {patient}: {count} files, {apnea} apnea ({rate:.1f}%)")
    
    print(f"\n🎉 SUCCESS! Clean audio extracted and ready for denoising pipeline.")
    
else:
    print(f"❌ No audio files extracted. Check error messages above.")

print(f"\nTime finished: {time.strftime('%Y-%m-%d %H:%M:%S')}")

🎵 STARTING EDF TO WAV AUDIO EXTRACTION
Time started: 2025-07-30 01:24:59
🎵 Starting audio extraction for patient_01...
   📁 patient_01: Found 5 EDF and 1 RML files
Hypopnea: 4329.0s to 4340.0s (duration: 11.0s)
Hypopnea: 4354.5s to 4368.5s (duration: 14.0s)
Hypopnea: 4389.5s to 4401.0s (duration: 11.5s)
Hypopnea: 4434.0s to 4446.0s (duration: 12.0s)
ObstructiveApnea: 4657.5s to 4670.5s (duration: 13.0s)
Hypopnea: 4897.5s to 4910.0s (duration: 12.5s)
Hypopnea: 4936.5s to 4948.0s (duration: 11.5s)
Hypopnea: 4995.5s to 5008.0s (duration: 12.5s)
ObstructiveApnea: 5202.5s to 5215.0s (duration: 12.5s)
Hypopnea: 5279.5s to 5292.0s (duration: 12.5s)
Hypopnea: 5445.0s to 5460.0s (duration: 15.0s)
Hypopnea: 5543.0s to 5558.0s (duration: 15.0s)
ObstructiveApnea: 5634.0s to 5649.0s (duration: 15.0s)
Hypopnea: 5713.5s to 5731.0s (duration: 17.5s)
ObstructiveApnea: 7777.0s to 7788.0s (duration: 11.0s)
ObstructiveApnea: 7856.5s to 7866.5s (duration: 10.0s)
ObstructiveApnea: 7877.5s to 7890.0s (duration: 12.5s)
... [output truncated]
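
To see how events like these turn into frame labels, the sketch below applies the same overlap rule as `get_apnea_label` to the first two hypopneas in the log above. The frame start times are chosen for illustration, and `overlap_label` is a standalone restatement so the example runs on its own.

```python
def overlap_label(frame_start, duration, events, threshold=0.1):
    """Label a frame 1 if events cover more than `threshold` of its duration."""
    frame_end = frame_start + duration
    covered = 0.0
    for start, end in events:
        # Length of the intersection between the frame and this event
        overlap = min(frame_end, end) - max(frame_start, start)
        if overlap > 0:
            covered += overlap
    proportion = covered / duration
    return int(proportion > threshold), proportion

# First two hypopneas from the log (seconds from recording start)
events = [(4329.0, 4340.0), (4354.5, 4368.5)]

for t in (4320.0, 4335.0, 4470.0):
    label, proportion = overlap_label(t, 30.0, events)
    print(f"frame at {t:.0f}s: label={label}, proportion={proportion:.3f}")
```

The frame starting at 4320 s contains 11 s of the first event (about 37% of the frame), so it is labeled apnea under the 0.1 threshold; the frame at 4470 s overlaps neither of these two events and is labeled normal.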

In [5]:
# Cell 5: Load Trained Model and Validate Feature Extraction
print("🤖 LOADING TRAINED MODEL FOR BASELINE VALIDATION")
print(f"{'='*60}")

# Load the trained Random Forest model
try:
    model_data = joblib.load(MODEL_PATH)
    
    if isinstance(model_data, dict):
        model = model_data['model']
        feature_columns = model_data.get('feature_columns', None)
        print(f"✅ Model loaded from: {MODEL_PATH}")
        print(f"📊 Model type: {type(model).__name__}")
        if feature_columns:
            print(f"🎯 Expected features: {len(feature_columns)}")
    else:
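        # Fallback if the model was saved directly (no metadata dict)
        model = model_data
        # scikit-learn sets feature_names_in_ only when the model was fitted on a
        # DataFrame with string column names; otherwise no column order is available
        saved_names = getattr(model, 'feature_names_in_', None)
        feature_columns = list(saved_names) if saved_names is not None else None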
        print(f"✅ Model loaded (direct): {MODEL_PATH}")
        print(f"📊 Model type: {type(model).__name__}")
        
except Exception as e:
    print(f"❌ Failed to load model: {e}")
    print(f"🔍 Available model files:")
    model_dir = os.path.dirname(MODEL_PATH)
    if os.path.exists(model_dir):
        model_files = [f for f in os.listdir(model_dir) if f.endswith('.pkl')]
        for f in model_files:
            print(f"   - {f}")
    model = None

# Test feature extraction on a sample WAV file
if all_audio_metadata and len(all_audio_metadata) > 0:
    print(f"\n🧪 TESTING FEATURE EXTRACTION ON SAMPLE AUDIO")
    
    # Get first WAV file for testing
    sample_metadata = all_audio_metadata[0]
    sample_wav_path = sample_metadata['wav_path']
    print(f"📁 Sample file: {os.path.basename(sample_wav_path)}")
    
    try:
        # Load audio and extract features
        audio_data, sr = librosa.load(sample_wav_path, sr=TARGET_SAMPLE_RATE)
        print(f"🎵 Audio loaded: {len(audio_data)} samples at {sr} Hz")
        print(f"⏱️  Duration: {len(audio_data)/sr:.1f} seconds")
        
        # Extract features
        features = extract_comprehensive_features(audio_data, sr)
        
        if features:
            print(f"✅ Features extracted: {len(features)} features")
            # Prefix check only: this does not compare names against the model's feature list
            print(f"🎯 Feature names match training: {all(name.startswith('clean_') for name in features)}")
            
            # Display sample features
            print(f"\n📋 SAMPLE FEATURES:")
            for i, (name, value) in enumerate(list(features.items())[:5]):
                print(f"   {name}: {value:.6f}")
            print(f"   ... and {len(features)-5} more features")
            
            # Test model prediction if model is loaded
            if model is not None:
                print(f"\n🔮 TESTING MODEL PREDICTION")
                
                # Convert features to DataFrame
                feature_df = pd.DataFrame([features])
                
                # Ensure feature order matches training if available
                if feature_columns:
                    missing_features = set(feature_columns) - set(feature_df.columns)
                    extra_features = set(feature_df.columns) - set(feature_columns)
                    
                    if missing_features:
                        print(f"⚠️  Missing features: {missing_features}")
                    if extra_features:
                        print(f"⚠️  Extra features: {extra_features}")
                    
                    # Reorder columns to match training
                    feature_df = feature_df.reindex(columns=feature_columns, fill_value=0)
                
                try:
                    # Make prediction
                    prediction = model.predict(feature_df)[0]
                    prediction_proba = model.predict_proba(feature_df)[0]
                    
                    actual_label = sample_metadata['apnea_label']
                    
                    print(f"🎯 Actual label: {actual_label} ({'Apnea' if actual_label else 'Normal'})")
                    print(f"🤖 Predicted: {prediction} ({'Apnea' if prediction else 'Normal'})")
                    print(f"📊 Probability: Normal={prediction_proba[0]:.3f}, Apnea={prediction_proba[1]:.3f}")
                    print(f"✅ Prediction {'correct' if prediction == actual_label else 'incorrect'}")
                    
                except Exception as e:
                    print(f"❌ Model prediction failed: {e}")
            
        else:
            print(f"❌ Feature extraction failed")
            
    except Exception as e:
        print(f"❌ Audio loading/processing failed: {e}")

print(f"\n✅ Feature extraction validation complete")

🤖 LOADING TRAINED MODEL FOR BASELINE VALIDATION
✅ Model loaded (direct): ../models/sleep_apnea_model.pkl
📊 Model type: RandomForestClassifier

🧪 TESTING FEATURE EXTRACTION ON SAMPLE AUDIO
📁 Sample file: patient_01_00001206-100507[001]_frame_000000.wav
🎵 Audio loaded: 480000 samples at 16000 Hz
⏱️  Duration: 30.0 seconds
✅ Features extracted: 27 features
🎯 Feature names match training: True

📋 SAMPLE FEATURES:
   clean_rms: 0.004864
   clean_zcr: 0.020327
   clean_centroid: 1072.004802
   clean_bandwidth: 1889.160558
   clean_rolloff: 2834.621535
   ... and 22 more features

🔮 TESTING MODEL PREDICTION
🎯 Actual label: 0 (Normal)
🤖 Predicted: 1 (Apnea)
📊 Probability: Normal=0.457, Apnea=0.543
✅ Prediction incorrect

✅ Feature extraction validation complete
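
Because the model was loaded directly (no `feature_columns` entry), column order is not checked before prediction. scikit-learn estimators fitted on a pandas DataFrame with string column names expose the training order as `feature_names_in_`; assuming that attribute is available on the loaded model, a stricter alignment step might look like the sketch below. `align_features` is a hypothetical helper, and it raises on missing columns instead of silently filling zeros.

```python
import pandas as pd

def align_features(feature_df, expected_columns):
    """Reorder columns to the training order; fail loudly if any are missing."""
    expected = list(expected_columns)
    missing = [c for c in expected if c not in feature_df.columns]
    if missing:
        raise ValueError(f"Missing features: {missing}")
    # Selecting by the expected list both reorders and drops extra columns
    return feature_df[expected]

# Illustrative use with made-up values; in the notebook, `expected_columns` could be
# getattr(model, "feature_names_in_", None) when that attribute exists.
df = pd.DataFrame([{"clean_zcr": 0.02, "clean_rms": 0.005, "extra": 1.0}])
aligned = align_features(df, ["clean_rms", "clean_zcr"])
print(list(aligned.columns))  # ['clean_rms', 'clean_zcr']
```

Raising on a missing column is a deliberate choice here: a zero-filled feature would still produce a prediction, which can hide a mismatch between extraction and training.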


In [6]:
# Cell 6: Establish Clean Audio Baseline Performance
print("📊 ESTABLISHING CLEAN AUDIO BASELINE PERFORMANCE")
print(f"{'='*60}")

if model is not None and all_audio_metadata:
    print(f"🧪 Testing model on {len(all_audio_metadata)} extracted audio files...")
    
    # Extract features from all audio files
    all_features = []
    all_labels = []
    processed_count = 0
    failed_count = 0
    
    print(f"\n🔄 Processing audio files for baseline evaluation...")
    
    for i, metadata in enumerate(all_audio_metadata):
        try:
            # Load audio
            audio_data, sr = librosa.load(metadata['wav_path'], sr=TARGET_SAMPLE_RATE)
            
            # Extract features
            features = extract_comprehensive_features(audio_data, sr)
            
            if features:
                all_features.append(features)
                all_labels.append(metadata['apnea_label'])
                processed_count += 1
            else:
                failed_count += 1
                
        except Exception as e:
            failed_count += 1
            if failed_count <= 5:  # Show first 5 errors
                print(f"   ⚠️  Error processing {os.path.basename(metadata['wav_path'])}: {e}")
        
        # Progress update
        if (i + 1) % 50 == 0:
            print(f"   🔄 Processed {i + 1}/{len(all_audio_metadata)} files...")
    
    print(f"\n📊 Feature extraction complete:")
    print(f"   ✅ Successful: {processed_count}")
    print(f"   ❌ Failed: {failed_count}")
    
    if processed_count > 0:
        # Convert to DataFrame
        features_df = pd.DataFrame(all_features)
        labels = np.array(all_labels)
        
        # Ensure feature order matches training
        if feature_columns:
            features_df = features_df.reindex(columns=feature_columns, fill_value=0)
        
        print(f"\n🤖 Making predictions on clean audio...")
        
        try:
            # Make predictions
            predictions = model.predict(features_df)
            prediction_probas = model.predict_proba(features_df)
            
            # Calculate performance metrics
            f1 = f1_score(labels, predictions)
            
            # Detailed classification report
            print(f"\n📈 CLEAN AUDIO BASELINE PERFORMANCE:")
            print(f"🎯 F1-Score: {f1:.3f}")
            
            print(f"\n📋 Detailed Classification Report:")
            report = classification_report(labels, predictions, 
                                         target_names=['Normal', 'Apnea'], 
                                         output_dict=True)
            
            print(f"   Normal Breathing:")
            print(f"     Precision: {report['Normal']['precision']:.3f}")
            print(f"     Recall: {report['Normal']['recall']:.3f}")
            print(f"     F1-Score: {report['Normal']['f1-score']:.3f}")
            
            print(f"   Sleep Apnea:")
            print(f"     Precision: {report['Apnea']['precision']:.3f}")
            print(f"     Recall (Sensitivity): {report['Apnea']['recall']:.3f}")
            print(f"     F1-Score: {report['Apnea']['f1-score']:.3f}")
            
            print(f"   Overall:")
            print(f"     Accuracy: {report['accuracy']:.3f}")
            print(f"     Macro F1: {report['macro avg']['f1-score']:.3f}")
            print(f"     Weighted F1: {report['weighted avg']['f1-score']:.3f}")
            
            # Confusion matrix (labels fixed so the matrix is 2x2 even if one class is absent)
            cm = confusion_matrix(labels, predictions, labels=[0, 1])
            print(f"\n📊 Confusion Matrix:")
            print(f"                 Predicted")
            print(f"Actual    Normal  Apnea")
            print(f"Normal    {cm[0,0]:6d}  {cm[0,1]:5d}")
            print(f"Apnea     {cm[1,0]:6d}  {cm[1,1]:5d}")
            
            # Calculate medical metrics
            tn, fp, fn, tp = cm.ravel()
            sensitivity = tp / (tp + fn) if (tp + fn) > 0 else 0
            specificity = tn / (tn + fp) if (tn + fp) > 0 else 0
            
            print(f"\n🏥 Medical Metrics:")
            print(f"   Sensitivity (True Positive Rate): {sensitivity:.3f}")
            print(f"   Specificity (True Negative Rate): {specificity:.3f}")
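            # Guard against division by zero when a class has no samples
            fpr = fp / (fp + tn) if (fp + tn) > 0 else 0.0
            fnr = fn / (fn + tp) if (fn + tp) > 0 else 0.0
            print(f"   False Positive Rate: {fpr:.3f}")
            print(f"   False Negative Rate: {fnr:.3f}")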
            
            # Save baseline results
            baseline_results = {
                'clean_f1_score': f1,
                'clean_sensitivity': sensitivity,
                'clean_specificity': specificity,
                'clean_accuracy': report['accuracy'],
                'total_samples': len(labels),
                'apnea_samples': int(labels.sum()),
                'normal_samples': int(len(labels) - labels.sum()),
                'confusion_matrix': cm.tolist(),
                'classification_report': report
            }
            
            # Save to file
            baseline_path = os.path.join(AUDIO_OUTPUT_DIR, "clean_audio_baseline_results.json")
            import json
            with open(baseline_path, 'w') as f:
                json.dump(baseline_results, f, indent=2)
            
            print(f"\n💾 Baseline results saved: {baseline_path}")
            print(f"\n🎉 CLEAN AUDIO BASELINE ESTABLISHED!")
            print(f"📊 Ready for Phase 2: Noise injection and degradation analysis")
            
        except Exception as e:
            print(f"❌ Model evaluation failed: {e}")
            import traceback
            traceback.print_exc()
    
    else:
        print(f"❌ No features extracted successfully")

else:
    if model is None:
        print(f"⚠️  Skipping baseline evaluation - model not loaded")
    if not all_audio_metadata:
        print(f"⚠️  Skipping baseline evaluation - no audio files extracted")

print(f"\n✅ Phase 1 complete - Ready for noise injection experiments!")

📊 ESTABLISHING CLEAN AUDIO BASELINE PERFORMANCE
🧪 Testing model on 10972 extracted audio files...

🔄 Processing audio files for baseline evaluation...
   🔄 Processed 50/10972 files...
   🔄 Processed 100/10972 files...
   🔄 Processed 150/10972 files...
   🔄 Processed 200/10972 files...
   🔄 Processed 250/10972 files...
   🔄 Processed 300/10972 files...
   🔄 Processed 350/10972 files...
   🔄 Processed 400/10972 files...
   🔄 Processed 450/10972 files...
   🔄 Processed 500/10972 files...
   🔄 Processed 550/10972 files...
   🔄 Processed 600/10972 files...
   🔄 Processed 650/10972 files...
   🔄 Processed 700/10972 files...
   🔄 Processed 750/10972 files...
   🔄 Processed 800/10972 files...
   🔄 Processed 850/10972 files...
   🔄 Processed 900/10972 files...
   🔄 Processed 950/10972 files...
   🔄 Processed 1000/10972 files...
   🔄 Processed 1050/10972 files...
   🔄 Processed 1100/10972 files...
   🔄 Processed 1150/10972 files...
   🔄 Processed 1200/10972 files...
   🔄 Processed 1250/10972 files...
... [output truncated]
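
The medical metrics in Cell 6 follow directly from the four confusion-matrix counts. The sketch below restates them as a small function with zero-division guards; `binary_metrics` is a hypothetical helper, and the counts in the example are made up for illustration, not results from this notebook.

```python
def binary_metrics(tn, fp, fn, tp):
    """Sensitivity, specificity, precision and F1 from confusion-matrix counts."""
    def safe_div(num, den):
        return num / den if den > 0 else 0.0

    sensitivity = safe_div(tp, tp + fn)   # recall for the apnea class
    specificity = safe_div(tn, tn + fp)   # recall for the normal class
    precision = safe_div(tp, tp + fp)
    f1 = safe_div(2 * precision * sensitivity, precision + sensitivity)
    return {"sensitivity": sensitivity, "specificity": specificity,
            "precision": precision, "f1": f1}

# Made-up counts, for illustration only
print(binary_metrics(tn=80, fp=20, fn=10, tp=40))
```

The argument order matches `cm.ravel()` for a 2x2 scikit-learn confusion matrix with labels `[0, 1]`, so `binary_metrics(*cm.ravel())` would work on the matrix computed in Cell 6.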

---

# Phase 1 Summary

## Completed:
1. ✅ **EDF to WAV Conversion**: Extracted clean audio from clinical EDF files
2. ✅ **Feature Extraction Validation**: Extracted all 27 features per frame with the `clean_` naming used in the training pipeline
3. ✅ **Model Loading**: Successfully loaded trained Random Forest model
4. ✅ **Baseline Performance**: Established clean audio performance metrics
5. ✅ **Test Set Preparation**: Created standardized WAV files for denoising pipeline

## Key Outputs:
- **WAV Files**: Patient audio organized in `{AUDIO_OUTPUT_DIR}/patient_XX_wav/`
- **Metadata**: `audio_metadata.csv` with file paths and apnea labels
- **Baseline Results**: `clean_audio_baseline_results.json` with performance metrics
- **Feature Validation**: Feature extraction ran end to end on a sample WAV file and produced input the model accepted
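
A later phase can pick up these outputs without re-running extraction. The sketch below writes a tiny stand-in for `audio_metadata.csv` to a temporary directory (the rows are invented) and then loads and summarizes it the way a Phase 2 notebook might. Only the column names `patient_id`, `wav_path`, and `apnea_label` are taken from Cell 3; `summarize_metadata` is a hypothetical helper.

```python
import os
import tempfile

import pandas as pd

def summarize_metadata(csv_path):
    """Per-patient frame count and apnea rate from the metadata CSV."""
    df = pd.read_csv(csv_path)
    return df.groupby("patient_id")["apnea_label"].agg(frames="count", apnea_rate="mean")

# Stand-in file with invented rows, so the example runs anywhere
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "audio_metadata.csv")
    pd.DataFrame({
        "patient_id": ["patient_01", "patient_01", "patient_02"],
        "wav_path": ["a.wav", "b.wav", "c.wav"],
        "apnea_label": [0, 1, 0],
    }).to_csv(path, index=False)
    summary = summarize_metadata(path)

print(summary)
```

In practice the path would be `os.path.join(AUDIO_OUTPUT_DIR, "audio_metadata.csv")`, and the per-patient apnea rate gives a quick check of class balance before noise is injected.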

## Next Steps:
- **Phase 2**: Noise injection and baseline degradation analysis
- **Phase 3**: Comprehensive denoising method evaluation
- **Phase 4**: Multi-criteria analysis and smartphone deployment recommendations

---