<a href="https://www.kaggle.com/code/sheemamasood/birdclef2025-mel-generation?scriptVersionId=245829056" target="_blank"><img align="left" alt="Kaggle" title="Open in Kaggle" src="https://kaggle.com/static/images/open-in-kaggle.svg"></a>

# 🐦 BirdCLEF 2025 - Data Pipeline for Precomputed Features Extraction

This notebook implements a data pipeline for bird sound classification with deep learning. The workflow speeds up training by precomputing audio features once and reusing them, which makes it well suited to large-scale or multi-phase training experiments.

---

## **Key Features**

- **Multiple Data Quality Splits:**  
  Supports high-quality, medium-quality, and all-data splits, each with its own train/validation sets.

- **Precomputed Feature Pipeline:**  
  Audio files are processed **once** to extract Mel-spectrograms, YAMNet embeddings, and label vectors. These are saved as `.npz` files, drastically speeding up model training and reducing CPU bottlenecks.

- **Precomputed Augmentation for Training:**  
  For each training sample, multiple augmented versions (e.g., time-stretch, pitch-shift, noise) are generated and stored during precomputation rather than at training time. This adds diversity to the training data while keeping throughput high.

- **Fast DataLoader:**  
  During training/validation, the DataLoader simply reads precomputed `.npz` files—no audio decoding or augmentation overhead at runtime.
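
As a rough illustration of the save-once, read-many idea behind the `.npz` files, the sketch below writes a small dictionary of arrays to a compressed archive and reads it back. The helper names `save_batch` and `load_batch`, the keys, and the random arrays are hypothetical stand-ins, not part of this notebook.

```python
import os
import tempfile

import numpy as np


def save_batch(path, features):
    """Write a dict of {key: array} to one compressed .npz file."""
    np.savez_compressed(path, **features)


def load_batch(path):
    """Read every array of an .npz file back into a plain dict."""
    with np.load(path) as npz:
        return {key: npz[key] for key in npz.files}


# Hypothetical keys and random data standing in for real spectrograms.
rng = np.random.default_rng(0)
features = {
    "fileA_chunk0": rng.random((3, 256, 256), dtype=np.float32),
    "fileA_chunk1": rng.random((3, 256, 256), dtype=np.float32),
}

with tempfile.TemporaryDirectory() as tmp:
    batch_path = os.path.join(tmp, "demo_batch_0.npz")
    save_batch(batch_path, features)
    restored = load_batch(batch_path)

print(sorted(restored))
print(restored["fileA_chunk0"].shape, restored["fileA_chunk0"].dtype)
```

A training-time loader built on this idea only needs to map each sample key to the batch file that contains it; no audio decoding is involved.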

---


## 🔗 BirdCLEF 2025 - Project Notebook Links

Here are the different stages of my BirdCLEF 2025 pipeline, organized by functionality:

### 📊 Data Preparation
- [BirdCLEF 2025 - Data Preparation](https://www.kaggle.com/code/sheemamasood/birdclef-2025-data-prepartion)

### 🎛️ Mel Spectrogram Generation
- [BirdCLEF 2025 - Mel Generation](https://www.kaggle.com/code/sheemamasood/birdclef2025-mel-generation)

### 🏷️ Pseudo Labelling for SSL
- [BirdCLEF 2025 - Pseudo Labelling for SSL](https://www.kaggle.com/code/sheemamasood/birdclef2025-psedolabelling-for-ssl)

### 🧠 Model Training
- [BirdCLEF 2025 - Model Training (Phase 1)](https://www.kaggle.com/code/sheemamasood/birdclef2025-model-training-phase1)

### 📦 Inference & Submissions
- [BirdCLEF 2025 - Submissions](https://www.kaggle.com/code/sheemamasood/birdclef2025-submissions)


In [1]:
# 📦 Basic Utilities
import os
import math
import time
import random
import logging
import warnings
from pathlib import Path

# 📊 Data Handling & Evaluation
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split, StratifiedKFold
from sklearn.metrics import roc_auc_score, classification_report, accuracy_score
import pickle

# 🎧 Audio Processing
import librosa
import librosa.display
import torchaudio
import torchaudio.transforms as T
import torchaudio.functional as F

# 🔥 PyTorch and Model Utilities
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import Dataset, DataLoader, WeightedRandomSampler
from dataclasses import dataclass
from typing import List

# transformers
from transformers import Wav2Vec2Processor, Wav2Vec2Model, Wav2Vec2ForSequenceClassification
from torch.optim import AdamW

# 🖼️ Visualization
import matplotlib.pyplot as plt
import seaborn as sns
import cv2

# 🔁 Progress Tracking
from tqdm.auto import tqdm  # picks the notebook or console progress bar automatically

# 🧠 Pretrained Models
import timm

# ✅ Confirm librosa
print(f"librosa version : {librosa.__version__}")
print(f"librosa files   : {librosa.__file__}")

print("✅ All libraries successfully imported!")




librosa version : 0.11.0
librosa files   : /usr/local/lib/python3.11/dist-packages/librosa/__init__.py
✅ All libraries successfully imported!


In [2]:
import torch

# Check for CUDA
if torch.cuda.is_available():
    device = torch.device("cuda")
    print("✅ CUDA is available. Using GPU.")
else:
    device = torch.device("cpu")
    print("❌ CUDA not available. Using CPU.")


✅ CUDA is available. Using GPU.


In [3]:
class Config:
    # Audio settings
    FS = 32000  # Sampling rate in Hz

    # Mel spectrogram parameters (for converting audio to image)
    N_FFT = 1024       # FFT window size
    HOP_LENGTH = 512   # Step size for each frame
    FMIN = 50          # Minimum Mel frequency (Hz)
    FMAX = 14000       # Maximum Mel frequency (Hz)
    N_MELS = 128       # Number of Mel bands

    # Clip length and output image shapes
    TARGET_DURATION = 10.0        # Seconds of audio per clip
    MEL_SHAPE = (256, 256)        # (H, W) after resizing
    TARGET_SHAPE = (3, 256, 256)  # RGB image shape (C, H, W)


    # No limit on the number of samples during training (full dataset)
    N_MAX = None  

    # flag for training mode
    TRAINING_MODE = True  
    
    # Additional training-specific configurations
    EPOCHS = 10  
    BATCH_SIZE = 32  
    LEARNING_RATE = 0.001  

# Create the config object
config = Config()

In [4]:
# Root path where all files and folders are stored
DATA_ROOT = '/kaggle/input/birdclef-2025'

# Load CSVs
train_df = pd.read_csv(os.path.join(DATA_ROOT, 'train.csv'))
taxonomy_df = pd.read_csv(os.path.join(DATA_ROOT, 'taxonomy.csv'))
location_df = pd.read_csv(os.path.join(DATA_ROOT, 'recording_location.txt'), delimiter='\t')
sample_submission = pd.read_csv(os.path.join(DATA_ROOT, 'sample_submission.csv'))



print(f"✅ Loaded train_df: {train_df.shape}")
print(f"✅ Loaded taxonomy_df: {taxonomy_df.shape}")
print(f"✅ Loaded location_df: {location_df.shape}")
print(f"✅ Loaded sample_submission: {sample_submission.shape}")


✅ Loaded train_df: (28564, 13)
✅ Loaded taxonomy_df: (206, 5)
✅ Loaded location_df: (4, 1)
✅ Loaded sample_submission: (3, 207)


In [5]:
# Load your VAD-cleaned CSV file
clean_train_df = pd.read_csv('/kaggle/input/birdcleft-clean-and-vad-filtered-data/train_audio_10sec_chunks_VAD_filtered.csv')
chunked_train_df = pd.read_csv("/kaggle/input/birdcleft-clean-and-vad-filtered-data/train_audio_10sec_chunks.csv")
print(f"✅ Loaded chunked_train_df: {chunked_train_df.shape}")
print(f"✅ Loaded clean_train_df: {clean_train_df.shape}")


working_df = pd.read_csv("/kaggle/input/melspectrogramofbirdclef-2025/working_df.csv")
print(f"✅ Loaded working_df: {working_df.shape}")

soundscape_chunked_df = pd.read_csv("/kaggle/input/birdcleft-clean-and-vad-filtered-data/soundscape_10sec_chunks.csv")
clean_soundscape_df = pd.read_csv("/kaggle/input/birdcleft-clean-and-vad-filtered-data/clean_soundscapes_chunks_10sec_vad_filtered.csv")

print(f"✅ Loaded soundscape_chunked_df: {soundscape_chunked_df.shape}")
print(f"✅ Loaded clean_soundscape_df: {clean_soundscape_df.shape}")

✅ Loaded chunked_train_df: (86915, 15)
✅ Loaded clean_train_df: (47342, 15)
✅ Loaded working_df: (28564, 10)
✅ Loaded soundscape_chunked_df: (58356, 9)
✅ Loaded clean_soundscape_df: (21854, 9)


In [6]:
# Load the master label list and set NUM_CLASSES
with open("/kaggle/input/birdcleft-clean-and-vad-filtered-data/master_label_list.pkl", "rb") as f:
    master_labels = pickle.load(f)
NUM_CLASSES = len(master_labels)  # should be 206


print(f"total number of labels in full data : {len(master_labels)}")  # Should print 206


total number of labels in full data : 206


In [7]:
def validate_df(df, name):
    print(f"\n🔍 Validating: {name}")
    print("-" * 50)
    
    # Check for NaNs
    nan_summary = df.isna().sum()
    if nan_summary.sum() == 0:
        print("✅ No NaNs found.")
    else:
        print("⚠️ NaNs found:")
        display(nan_summary[nan_summary > 0])
    
    # Show data types
    print("\n📊 Data Info:")
    display(df.info())
    
    # Sample rows
    print("\n🧾 Sample Rows:")
    display(df.sample(3, random_state=42))
    
    # Check essential columns (just an example set — adjust as needed)
    expected_cols = ['chunk_id', 'filepath', 'start_sample', 'end_sample']
    missing_cols = [col for col in expected_cols if col not in df.columns]
    if missing_cols:
        print(f"❌ Missing essential columns: {missing_cols}")
    else:
        print("✅ All essential columns are present.")
        
    # Number of unique audio files and mean recording duration
    print(f"Unique audio files in df: {df['filename'].nunique()}")
    print(f"Average duration in df: {df['duration'].mean()}")
    
    # Sanity check: duration and sample range
    if 'duration' in df.columns and 'start_sample' in df.columns and 'end_sample' in df.columns:
        duration_errors = df[df['end_sample'] <= df['start_sample']]
        if len(duration_errors) > 0:
            print(f"❌ {len(duration_errors)} rows have invalid sample ranges!")
        else:
            print("✅ Sample ranges are valid.")

# Run validation
validate_df(working_df, "working_df (raw/rating filtered)")
validate_df(clean_train_df, "clean_df (VAD filtered)")
validate_df(clean_soundscape_df, "clean_soundscape_df(VAD FILTERED)")


🔍 Validating: working_df (raw/rating filtered)
--------------------------------------------------
✅ No NaNs found.

📊 Data Info:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 28564 entries, 0 to 28563
Data columns (total 10 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   primary_label     28564 non-null  object 
 1   rating            28564 non-null  float64
 2   filename          28564 non-null  object 
 3   target            28564 non-null  int64  
 4   filepath          28564 non-null  object 
 5   samplename        28564 non-null  object 
 6   class             28564 non-null  object 
 7   secondary_labels  28564 non-null  object 
 8   secondary_target  28564 non-null  object 
 9   duration          28564 non-null  float64
dtypes: float64(2), int64(1), object(7)
memory usage: 2.2+ MB


None


🧾 Sample Rows:


Unnamed: 0,primary_label,rating,filename,target,filepath,samplename,class,secondary_labels,secondary_target,duration
27685,yeofly1,4.0,yeofly1/XC250351.ogg,203,/kaggle/input/birdclef-2025/train_audio/yeofly...,yeofly1-XC250351,Aves,[''],[0],5.433469
25019,whbman1,5.0,whbman1/XC268585.ogg,187,/kaggle/input/birdclef-2025/train_audio/whbman...,whbman1-XC268585,Aves,[''],[0],53.784
96,21211,3.0,21211/XC913998.ogg,12,/kaggle/input/birdclef-2025/train_audio/21211/...,21211-XC913998,Amphibia,[''],[0],0.87875


❌ Missing essential columns: ['chunk_id', 'start_sample', 'end_sample']
Unique audio files in df: 28564
Average duration in df: 35.35245972859193

🔍 Validating: clean_df (VAD filtered)
--------------------------------------------------
✅ No NaNs found.

📊 Data Info:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 47342 entries, 0 to 47341
Data columns (total 15 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   chunk_id          47342 non-null  object 
 1   primary_label     47342 non-null  object 
 2   rating            47342 non-null  float64
 3   filename          47342 non-null  object 
 4   target            47342 non-null  int64  
 5   filepath          47342 non-null  object 
 6   samplename        47342 non-null  object 
 7   class             47342 non-null  object 
 8   secondary_labels  47342 non-null  object 
 9   secondary_target  47342 non-null  object 
 10  duration          47342 non-null  float64
 11  start_se

None


🧾 Sample Rows:


Unnamed: 0,chunk_id,primary_label,rating,filename,target,filepath,samplename,class,secondary_labels,secondary_target,duration,start_sec,end_sec,start_sample,end_sample
16115,gohman1-XC417778_chunk1,gohman1,5.0,gohman1/XC417778.ogg,107,/kaggle/input/birdclef-2025/train_audio/gohman...,gohman1-XC417778,Aves,[''],[0],37.608,10,20,320000,640000
2464,banana-XC166399_chunk1,banana,4.0,banana/XC166399.ogg,66,/kaggle/input/birdclef-2025/train_audio/banana...,banana-XC166399,Aves,[''],[0],22.439187,10,20,320000,640000
46575,yercac1-XC446946_chunk4,yercac1,4.5,yercac1/XC446946.ogg,204,/kaggle/input/birdclef-2025/train_audio/yercac...,yercac1-XC446946,Aves,['creoro1'],[44],128.0,40,50,1280000,1600000


✅ All essential columns are present.
Unique audio files in df: 14232
Average duration in df: 112.57249427833635
✅ Sample ranges are valid.

🔍 Validating: clean_soundscape_df(VAD FILTERED)
--------------------------------------------------
✅ No NaNs found.

📊 Data Info:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 21854 entries, 0 to 21853
Data columns (total 9 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   chunk_id      21854 non-null  object 
 1   filename      21854 non-null  object 
 2   filepath      21854 non-null  object 
 3   samplename    21854 non-null  object 
 4   start_sec     21854 non-null  int64  
 5   end_sec       21854 non-null  int64  
 6   start_sample  21854 non-null  int64  
 7   end_sample    21854 non-null  int64  
 8   duration      21854 non-null  float64
dtypes: float64(1), int64(4), object(4)
memory usage: 1.5+ MB


None


🧾 Sample Rows:


Unnamed: 0,chunk_id,filename,filepath,samplename,start_sec,end_sec,start_sample,end_sample,duration
4489,H11_20230506_231500_chunk0,H11_20230506_231500.ogg,/kaggle/input/birdclef-2025/train_soundscapes/...,H11_20230506_231500,0,10,0,320000,60.0
17682,H93_20230510_044000_chunk1,H93_20230510_044000.ogg,/kaggle/input/birdclef-2025/train_soundscapes/...,H93_20230510_044000,10,20,320000,640000,60.0
18129,H79_20230505_003000_chunk2,H79_20230505_003000.ogg,/kaggle/input/birdclef-2025/train_soundscapes/...,H79_20230505_003000,20,30,640000,960000,60.0


✅ All essential columns are present.
Unique audio files in df: 4403
Average duration in df: 60.0
✅ Sample ranges are valid.


In [8]:
print("Total rows:", len(clean_train_df))
print("Unique samplenames:", clean_train_df['samplename'].nunique())
print(clean_train_df['samplename'].value_counts().head(10))


Total rows: 47342
Unique samplenames: 14232
samplename
saffin-XC879442     95
greegr-XC558126     86
52884-CSA18804      86
compau-XC837459     76
compau-XC833301     71
pirfly1-XC695270    61
speowl1-XC525219    61
bkcdon-XC703631     55
banana-XC214521     53
yebsee1-XC879859    53
Name: count, dtype: int64


In [9]:
print("Total rows:", len(clean_soundscape_df))
print("Unique samplenames:", clean_soundscape_df['samplename'].nunique())
print(clean_soundscape_df['samplename'].value_counts().head(10))


Total rows: 21854
Unique samplenames: 4403
samplename
H71_20230430_115000     6
H65_20230511_232500     6
H92_20230508_022000     6
H63_20230422_095500     6
H11_20230429_233000     6
O202_20230506_232500    6
H79_20230513_221000     6
H71_20230520_194500     6
H37_20230510_141500     6
H99_20230504_014000     6
Name: count, dtype: int64


# 🎧 MEL Spectrogram Generation

- **Input**: Raw audio waveform (numpy array)
- **Output**: Normalized mel spectrogram (numpy array with shape `[n_mels, time]`)
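
To make the `[n_mels, time]` shape concrete, the short calculation below estimates how many time frames a 10-second clip produces before it is resized. The constants are copied from the notebook's `Config`. The frame formula assumes centered (padded) framing, which is the default for `torchaudio.transforms.MelSpectrogram`; treat it as a sanity check rather than a guarantee for other settings.

```python
# Values copied from the notebook's Config
FS = 32000              # sampling rate in Hz
TARGET_DURATION = 10.0  # seconds per clip
HOP_LENGTH = 512        # samples between consecutive frames
N_MELS = 128            # number of Mel bands

n_samples = int(TARGET_DURATION * FS)

# Assumption: centered framing, so frames = 1 + floor(n_samples / hop_length)
n_frames = 1 + n_samples // HOP_LENGTH

print((N_MELS, n_frames))  # (128, 626)
```

The resulting `(128, 626)` array is what later gets resized to `MEL_SHAPE` and stacked into three channels.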

## 🧼 Audio Preparation Function

Ensures each audio file is exactly `target_len` samples:

- 🔹 If too short → center **zero-padded**
- 🔹 If too long → center **trimmed**
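
The two rules above can be sketched on tiny arrays so the padding and trimming offsets are easy to check by hand. The function name `center_fit` is a hypothetical stand-in used only for this illustration; it follows the same arithmetic as the rules described here.

```python
import numpy as np


def center_fit(audio, target_len):
    """Return `audio` center-padded with zeros or center-trimmed to `target_len`."""
    current_len = len(audio)
    if current_len < target_len:
        # Split the missing samples between both sides; the extra one goes right
        pad_left = (target_len - current_len) // 2
        pad_right = target_len - current_len - pad_left
        return np.pad(audio, (pad_left, pad_right), mode="constant")
    if current_len > target_len:
        # Keep the middle `target_len` samples
        start = (current_len - target_len) // 2
        return audio[start:start + target_len]
    return audio


short_clip = np.array([1, 2, 3])
long_clip = np.arange(10)

print(center_fit(short_clip, 6).tolist())  # [0, 1, 2, 3, 0, 0]
print(center_fit(long_clip, 4).tolist())   # [3, 4, 5, 6]
```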


In [10]:
# Function that converts audio to a mel spectrogram
# def audio2melspec(audio_data, config):
#     # If there are NaNs, replace them with the mean of the valid samples
#     if np.isnan(audio_data).any():
#         mean_val = np.nanmean(audio_data)
#         audio_data = np.nan_to_num(audio_data, nan=mean_val)

#     # Mel spectrogram
#     mel = librosa.feature.melspectrogram(
#         y=audio_data,
#         sr=config.FS,
#         n_fft=config.N_FFT,
#         hop_length=config.HOP_LENGTH,
#         n_mels=config.N_MELS,
#         fmin=config.FMIN,
#         fmax=config.FMAX,
#         power=2.0
#     )

#     # Convert it to decibels
#     mel_db = librosa.power_to_db(mel, ref=np.max)

#     # Normalize to [0, 1]
#     mel_db = (mel_db - mel_db.min()) / (mel_db.max() - mel_db.min() + 1e-8)

#     return mel_db


# # Function to process audio dataset
# def process_audio(df, label, config):
#     print(f"Processing {label} audio data...")
#     start_time = time.time()

#     bird_data = {}
#     errors = []
#     target_len = int(config.TARGET_DURATION * config.FS)

#     for k, row in tqdm(df.iterrows(), total=len(df)):
#         try:
#             audio, _ = librosa.load(row.filepath, sr=config.FS)
#             audio = prepare_audio(audio, target_len)
#             mel = audio2melspec(audio, config)

#             if mel.shape != config.TARGET_SHAPE:
#                 mel = cv2.resize(mel, config.TARGET_SHAPE)

#             bird_data[row.samplename] = mel.astype(np.float32)

#         except Exception as e:
#             errors.append((row.filepath, str(e)))

#     end_time = time.time()

#     print(f"\nFinished processing '{label}' in {end_time - start_time:.1f} seconds")
#     print(f" Successfully processed: {len(bird_data)} files of {label}")
#     print(f"Failed: {len(errors)} files")

#     np.savez_compressed(f'{label}.npz', **bird_data)
#     print(f"Saved data as '{label}.npz'\n")

#     return bird_data, errors


In [11]:
# ========== MEL FUNCTION ===================
# config object has these:
mel_transform = T.MelSpectrogram(
    sample_rate=config.FS,
    n_fft=config.N_FFT,
    hop_length=config.HOP_LENGTH,
    n_mels=config.N_MELS,
    f_min=config.FMIN,
    f_max=config.FMAX,
    power=2.0,
).to(device)  # 👈 move the transform to the selected device (GPU if available)

# GPU-compatible mel spectrogram function
def audio2melspec_gpu(audio_data):
    # Replace NaNs with the mean of the valid samples
    if np.isnan(audio_data).any():
        mean_val = np.nanmean(audio_data)
        audio_data = np.nan_to_num(audio_data, nan=mean_val)

    # Convert numpy to a torch tensor and add a channel dim
    waveform = torch.tensor(audio_data, dtype=torch.float32).unsqueeze(0).to(device)  # shape: (1, samples)

    # Apply mel spectrogram
    mel = mel_transform(waveform)

    # Convert to decibel scale
    mel_db = F.amplitude_to_DB(mel, multiplier=10.0, amin=1e-10, db_multiplier=0.0)

    # Normalize to [0, 1]
    mel_db = (mel_db - mel_db.min()) / (mel_db.max() - mel_db.min() + 1e-8)

    return mel_db.squeeze(0).cpu().numpy()  # shape: (n_mels, time)

def is_valid_5sec_audio(audio, sample_rate=32000, duration_sec=5):
    expected_len = sample_rate * duration_sec
    return len(audio) == expected_len

    
# ====== PROCESS FUNCTION ===================


def prepare_audio(audio, target_len):
    """
    Ensure audio is exactly `target_len` samples.
    - If too short: zero-pad
    - If too long: center-trim
    """
    current_len = len(audio)

    if current_len < target_len:
        # Zero pad (centered)
        pad_left = (target_len - current_len) // 2
        pad_right = target_len - current_len - pad_left
        audio = np.pad(audio, (pad_left, pad_right), mode='constant')

    elif current_len > target_len:
        # Center trim
        start = (current_len - target_len) // 2
        audio = audio[start: start + target_len]

    return audio




In [12]:
print(config.FS, config.TARGET_DURATION, config.TARGET_SHAPE)


32000 10.0 (3, 256, 256)


In [13]:
def process_audio_with_batches(df, label, config, batch_size=1000):
    print(f"🔄 Processing {label} audio data with batch size {batch_size}...\n")
    start_time = time.time()

    bird_data = {}
    errors = []
    target_len = int(config.TARGET_DURATION * config.FS)
    batch_num = 0
    success_count = 0

    for i, row in tqdm(df.iterrows(), total=len(df), leave=True, dynamic_ncols=True):
        try:
            # Load and resample if needed
            audio, sr = torchaudio.load(row.filepath)
            audio = audio.mean(dim=0).numpy()  # mono

            if sr != config.FS:
                audio = torchaudio.functional.resample(
                    torch.tensor(audio), orig_freq=sr, new_freq=config.FS
                ).numpy()

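            # Cut out this row's chunk (offsets are in samples at config.FS),
            # then pad/trim to the fixed length and convert to mel
            audio = audio[int(row.start_sample):int(row.end_sample)]
            audio = prepare_audio(audio, target_len)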
            mel = audio2melspec_gpu(audio)
            mel = cv2.resize(mel, config.MEL_SHAPE[::-1])
            mel = cv2.cvtColor((mel * 255).astype(np.uint8), cv2.COLOR_GRAY2RGB)
            mel = mel.transpose(2, 0, 1)

            # Save with unique key
            bird_data[row.chunk_id] = mel.astype(np.float32)
            success_count += 1

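            # Save a batch once enough samples have accumulated
            # (independent of the DataFrame index and of failed rows)
            if len(bird_data) >= batch_size: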
                batch_filename = f'{label}_batch_{batch_num}.npz'
                np.savez_compressed(batch_filename, **bird_data)
                #print(f"💾 Batch {batch_num} saved with {len(bird_data)} samples ✅")
                tqdm.write(f"💾 Batch {batch_num} saved with {len(bird_data)} samples ✅")
                bird_data.clear()
                batch_num += 1

        except Exception as e:
            #print(f"❌ Error on {row.chunk_id}: {e}")
            tqdm.write(f"❌ Error on {row.chunk_id}: {e}")
            errors.append((row.chunk_id, row.filepath, str(e)))

    # Save remaining samples
    if bird_data:
        batch_filename = f'{label}_batch_{batch_num}.npz'
        np.savez_compressed(batch_filename, **bird_data)
        print(f"💾 Final batch {batch_num} saved with {len(bird_data)} samples ✅")

    end_time = time.time()
    print(f"\n✅ Finished processing '{label}' in {end_time - start_time:.1f} seconds")
    print(f"🟢 Total successful: {success_count}")
    print(f"🔴 Total failed: {len(errors)}")

    return errors


In [14]:
if torch.cuda.is_available():
    torch.cuda.empty_cache()

In [15]:
errors = process_audio_with_batches(clean_soundscape_df, label='clean_soundscape_mel_specs', config=config, batch_size=1000)


🔄 Processing clean_soundscape_mel_specs audio data with batch size 1000...




💾 Batch 0 saved with 1000 samples ✅
💾 Batch 1 saved with 1000 samples ✅
💾 Batch 2 saved with 1000 samples ✅
💾 Batch 3 saved with 1000 samples ✅
💾 Batch 4 saved with 1000 samples ✅
💾 Batch 5 saved with 1000 samples ✅
💾 Batch 6 saved with 1000 samples ✅
💾 Batch 7 saved with 1000 samples ✅
💾 Batch 8 saved with 1000 samples ✅
💾 Batch 9 saved with 1000 samples ✅
💾 Batch 10 saved with 1000 samples ✅
💾 Batch 11 saved with 1000 samples ✅
💾 Batch 12 saved with 1000 samples ✅
💾 Batch 13 saved with 1000 samples ✅
💾 Batch 14 saved with 1000 samples ✅
💾 Batch 15 saved with 1000 samples ✅
💾 Batch 16 saved with 1000 samples ✅
💾 Batch 17 saved with 1000 samples ✅
💾 Batch 18 saved with 1000 samples ✅
💾 Batch 19 saved with 1000 samples ✅
💾 Batch 20 saved with 1000 samples ✅
💾 Final batch 21 saved with 854 samples ✅

✅ Finished processing 'clean_soundscape_mel_specs' in 2609.9 seconds
🟢 Total successful: 21854
🔴 Total failed: 0


In [16]:
if torch.cuda.is_available():
    torch.cuda.empty_cache()

In [17]:
# ========== APPLY TO DFs ==========
#clean_mel_specs, errors = process_audio(df=clean_train_df, label="clean_train_audio_mel_specs", config=config)

errors = process_audio_with_batches(clean_train_df, label='clean_train_mel_specs', config=config, batch_size=1000)


🔄 Processing clean_train_mel_specs audio data with batch size 1000...




💾 Batch 0 saved with 1000 samples ✅
💾 Batch 1 saved with 1000 samples ✅
💾 Batch 2 saved with 1000 samples ✅
💾 Batch 3 saved with 1000 samples ✅
💾 Batch 4 saved with 1000 samples ✅
💾 Batch 5 saved with 1000 samples ✅
💾 Batch 6 saved with 1000 samples ✅
💾 Batch 7 saved with 1000 samples ✅
💾 Batch 8 saved with 1000 samples ✅
💾 Batch 9 saved with 1000 samples ✅
💾 Batch 10 saved with 1000 samples ✅
💾 Batch 11 saved with 1000 samples ✅
💾 Batch 12 saved with 1000 samples ✅
💾 Batch 13 saved with 1000 samples ✅
💾 Batch 14 saved with 1000 samples ✅
💾 Batch 15 saved with 1000 samples ✅
💾 Batch 16 saved with 1000 samples ✅
💾 Batch 17 saved with 1000 samples ✅
💾 Batch 18 saved with 1000 samples ✅
💾 Batch 19 saved with 1000 samples ✅
💾 Batch 20 saved with 1000 samples ✅
💾 Batch 21 saved with 1000 samples ✅
💾 Batch 22 saved with 1000 samples ✅
💾 Batch 23 saved with 1000 samples ✅
💾 Batch 24 saved with 1000 samples ✅
💾 Batch 25 saved with 1000 samples ✅
💾 Batch 26 saved with 1000 samples ✅
💾 Batch 27 

In [18]:
def process_full_audio_with_batches(df, label, config, batch_size=1000):
    print(f"🔄 Processing {label} audio data with batch size {batch_size}...")
    start_time = time.time()

    bird_data = {}
    errors = []
    target_len = int(config.TARGET_DURATION * config.FS)
    batch_num = 0
    success_count = 0

    for i, row in tqdm(df.iterrows(), total=len(df), leave=True, dynamic_ncols=True):
        try:
            # Load audio with torchaudio (decoding happens on the CPU)
            audio, sr = torchaudio.load(row.filepath)
            audio = audio.mean(dim=0).numpy()  # convert to mono

            # Resample if necessary
            if sr != config.FS:
                audio = torchaudio.functional.resample(
                    torch.tensor(audio), orig_freq=sr, new_freq=config.FS
                ).numpy()

            # Pad/trim to fixed length
            audio = prepare_audio(audio, target_len)

            # Convert to mel spectrogram (GPU)
            mel = audio2melspec_gpu(audio)

            # Resize and convert to RGB
            mel = cv2.resize(mel, config.MEL_SHAPE[::-1])
            mel = cv2.cvtColor((mel * 255).astype(np.uint8), cv2.COLOR_GRAY2RGB)
            mel = mel.transpose(2, 0, 1)  # (3, H, W)

            # Use filename as key
            bird_data[row.filename] = mel.astype(np.float32)
            success_count += 1

            # Save a batch once enough samples have accumulated
            # (independent of the DataFrame index and of failed rows)
            if len(bird_data) >= batch_size:
                batch_filename = f'{label}_batch_{batch_num}.npz'
                np.savez_compressed(batch_filename, **bird_data)
                tqdm.write(f"💾 Batch {batch_num} saved with {len(bird_data)} samples ✅")
                bird_data.clear()
                batch_num += 1

        except Exception as e:
            tqdm.write(f"❌ Error on {row.filename}: {e}")
            errors.append((row.filename, row.filepath, str(e)))

    # Save any remaining data
    if bird_data:
        batch_filename = f'{label}_batch_{batch_num}.npz'
        np.savez_compressed(batch_filename, **bird_data)
        print(f"💾 Final batch {batch_num} saved with {len(bird_data)} samples ✅")

    end_time = time.time()
    print(f"\n✅ Finished processing '{label}' in {end_time - start_time:.1f} seconds")
    print(f"🟢 Total successful: {success_count}")
    print(f"🔴 Total failed: {len(errors)}")

    return errors


In [19]:
errors = process_full_audio_with_batches(working_df, label='working_data_mel_specs', config=config, batch_size=1000)

🔄 Processing working_data_mel_specs audio data with batch size 1000...



💾 Batch 0 saved with 1000 samples ✅
💾 Batch 1 saved with 1000 samples ✅
💾 Batch 2 saved with 1000 samples ✅
💾 Batch 3 saved with 1000 samples ✅
💾 Batch 4 saved with 1000 samples ✅
💾 Batch 5 saved with 1000 samples ✅
💾 Batch 6 saved with 1000 samples ✅
💾 Batch 7 saved with 1000 samples ✅
💾 Batch 8 saved with 1000 samples ✅
💾 Batch 9 saved with 1000 samples ✅
💾 Batch 10 saved with 1000 samples ✅
💾 Batch 11 saved with 1000 samples ✅
💾 Batch 12 saved with 1000 samples ✅
💾 Batch 13 saved with 1000 samples ✅
💾 Batch 14 saved with 1000 samples ✅
💾 Batch 15 saved with 1000 samples ✅
💾 Batch 16 saved with 1000 samples ✅
💾 Batch 17 saved with 1000 samples ✅
💾 Batch 18 saved with 1000 samples ✅
💾 Batch 19 saved with 1000 samples ✅
💾 Batch 20 saved with 1000 samples ✅
💾 Batch 21 saved with 1000 samples ✅
💾 Batch 22 saved with 1000 samples ✅
💾 Batch 23 saved with 1000 samples ✅
💾 Batch 24 saved with 1000 samples ✅
💾 Batch 25 saved with 1000 samples ✅
💾 Batch 26 saved with 1000 samples ✅
💾 Batch 27 