# ESC-50 Dataset Generator for BaseAL

This notebook converts the ESC-50 environmental sound dataset into a BaseAL-friendly format.

**ESC-50 Dataset:**
- 2000 audio recordings (5-second clips at 44.1 kHz)
- 50 environmental sound classes
- Pre-segmented (no onset/offset annotations needed)
- 5 pre-defined folds for cross-validation

**Pipeline:**
1. Load ESC-50 metadata using the adapter system
2. Generate embeddings per audio file using pretrained models (BirdNET, etc.)
3. Convert file-level metadata to per-segment format (each model windows audio with its own segment length)
4. Package into BaseAL format with fold-based validation split

**Output Format:**
```
ESC50_BASEAL/
├── data/
│   └── birdnet/
│       ├── 1-100032-A-0_000_003.wav
│       ├── 1-100032-A-0_003_005.wav
│       └── ...
├── embeddings/
│   └── birdnet/
│       ├── 1-100032-A-0_000_003_birdnet.npy
│       └── ...
├── labels.csv        # filename, label, validation
└── metadata.csv      # All segment metadata
```
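The layout above implies a simple naming rule: an embedding file reuses its segment's stem and appends a `_<model>` suffix. A minimal sketch of that mapping follows, assuming the rule holds for every model. The helper name `embedding_path_for` is hypothetical and not part of BaseAL.

```python
from pathlib import Path


def embedding_path_for(segment_filename: str, model: str, root: Path) -> Path:
    """Map a segment filename to the embedding path implied by the layout above."""
    stem = Path(segment_filename).stem  # drop the ".wav" extension
    return root / "embeddings" / model / f"{stem}_{model}.npy"


path = embedding_path_for("1-100032-A-0_000_003.wav", "birdnet", Path("ESC50_BASEAL"))
print(path.as_posix())
```

Keeping the model name in both the directory and the filename lets embeddings from several models sit side by side without collisions.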

In [1]:
from pathlib import Path
import json
import pandas as pd

from utils.helpers import convert_for_json
from utils.embeddings import initialise, generate_embeddings
from utils.adapters import ESC50Adapter, AdapterConfig
from utils.segment_labels import (
    split_metadata_with_adapter,
    create_labels_csv_with_adapter,
    SegmentConfig
)

## Configuration

Configure the model, paths, and validation fold.

In [2]:
# Model selection (BirdNET segments audio into 3-second windows at 48 kHz)
MODEL = "birdnet"

# ESC-50 paths (AUDIO_PATH points at a small subset of the audio in this run)
AUDIO_PATH = Path("ESC50/subset")
METADATA_PATH = Path("ESC50/meta/esc50.csv")

# Output dataset path
DATASET_PATH = Path("ESC50_BASEAL")
DATASET_PATH.mkdir(exist_ok=True)

SEG_PATH = DATASET_PATH / "data" / MODEL
EMB_PATH = DATASET_PATH / "embeddings" / MODEL
SEG_PATH.mkdir(exist_ok=True, parents=True)
EMB_PATH.mkdir(exist_ok=True, parents=True)

# Validation configuration
# ESC-50 defines 5 folds; segments from the chosen fold are flagged as validation
VALIDATION_FOLD = 5

print(f"Audio path: {AUDIO_PATH}")
print(f"Output path: {DATASET_PATH}")
print(f"Validation fold: {VALIDATION_FOLD}")

Audio path: ESC50\subset
Output path: ESC50_BASEAL
Validation fold: 5


## Generate Segments and Embeddings

Embeddings are generated with bacpipe. BirdNET automatically segments audio into 3-second windows.

For ESC-50's 5-second clips, this means each file produces 2 segments:
- `*_000_003.wav` (0-3 seconds)
- `*_003_005.wav` (3-5 seconds, padded to 3s by the model)
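The windowing arithmetic can be sketched as follows. This is an illustration only, not bacpipe's implementation: the function names are hypothetical, and the filename convention (zero-padded whole-second offsets, with the last end offset clipped to the clip length) is an assumption chosen to match the `_003_005` name above. Code that names segments by the nominal window end would produce `_003_006` instead, as seen in the metadata preview later in this notebook.

```python
def segment_windows(clip_seconds: float, window_seconds: float) -> list[tuple[float, float]]:
    """Return (start, end) pairs covering the clip; the last end is clipped to the clip length."""
    windows = []
    start = 0.0
    while start < clip_seconds:
        windows.append((start, min(start + window_seconds, clip_seconds)))
        start += window_seconds
    return windows


def segment_name(stem: str, start: float, end: float) -> str:
    """Build a segment filename from whole-second offsets (assumed convention)."""
    return f"{stem}_{int(start):03d}_{int(end):03d}.wav"


windows = segment_windows(5.0, 3.0)
names = [segment_name("1-100032-A-0", start, end) for start, end in windows]
print(windows)
print(names)
```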

In [3]:
# Initialize the embedding model
embedder = initialise(model_name=MODEL)

Checking if the selected models require a checkpoint, and if so, if the checkpoint already exists.

birdnet checkpoint exists.

Using device='cpu'

Skipping model.eval() because model is from tensorflow.

Model: birdnet
Sample rate: 48000 Hz
Segment length: 144000 samples (3.0s)


In [4]:
# Generate audio segments and embeddings for every audio file in AUDIO_PATH
# (5 files in this subset run; the full ESC-50 dataset has 2000)
embeddings = generate_embeddings(
    audio_dir=AUDIO_PATH,
    embedder=embedder,
    model_name=MODEL,
    segments_dir=SEG_PATH,
    output_dir=EMB_PATH
)

print(f"\nProcessed {len(embeddings)} audio files")


Found 5 audio files
Processing 1/5: 1-137-A-32.wav
Processing 2/5: 1-1791-A-26.wav
Processing 3/5: 1-4211-A-12.wav
Processing 4/5: 1-5996-A-6.wav
Processing 5/5: 1-977-A-39.wav

Processed 5 audio files




## Labels and Metadata

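Use the `ESC50Adapter` to load metadata and create segment-level labels. Metadata is read from the full ESC-50 CSV, so the counts in this section cover all 2000 clips even when only a subset of the audio was embedded above.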

The adapter handles:
- Loading ESC-50's CSV metadata format
- Extracting the `category` column as labels
- Using the `fold` column for validation split
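The three responsibilities listed above can be illustrated without the adapter itself. The sketch below is not `ESC50Adapter`'s code: it uses only the standard library, a hypothetical `load_rows` helper, and a two-row CSV that keeps a subset of ESC-50's columns. The second row is made up for illustration.

```python
import csv
import io

# Illustrative rows in ESC-50's CSV layout; the fold-5 row is invented.
SAMPLE_CSV = """filename,fold,target,category
1-100032-A-0.wav,1,0,dog
5-000001-A-14.wav,5,14,chirping_birds
"""


def load_rows(csv_text: str, validation_fold: int) -> list[dict]:
    """Read metadata rows, taking `category` as the label and `fold` for the split."""
    rows = []
    for record in csv.DictReader(io.StringIO(csv_text)):
        rows.append({
            "filename": record["filename"],
            "label": record["category"],
            # A clip is a validation clip when its fold matches the held-out fold.
            "validation": int(record["fold"]) == validation_fold,
        })
    return rows


rows = load_rows(SAMPLE_CSV, validation_fold=5)
for row in rows:
    print(row)
```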

In [5]:
# Configure the adapter
adapter_config = AdapterConfig(
    validation_fold=VALIDATION_FOLD,
    no_event_label="unknown"  # Unused in practice for ESC-50 (every clip has a label)
)

# Create the ESC-50 adapter
adapter = ESC50Adapter(config=adapter_config)

# Load metadata
df = adapter.load_metadata(METADATA_PATH)
print(f"Loaded metadata for {len(df)} files")
print(f"Categories: {df['category'].nunique()} unique classes")
print(f"\nSample categories: {df['category'].unique()[:10]}")

Loaded metadata for 2000 files
Categories: 50 unique classes

Sample categories: ['dog' 'chirping_birds' 'vacuum_cleaner' 'thunderstorm' 'door_wood_knock'
 'can_opening' 'crow' 'clapping' 'fireworks' 'chainsaw']


In [6]:
# Get segment duration from model
duration = embedder.model.segment_length / embedder.model.sr
print(f"Model segment duration: {duration:.1f}s")

# Configure segmentation
config = SegmentConfig(
    segment_duration=duration,
    min_overlap=0.0,
    no_event_label="unknown"
)

# Split into segments using the adapter
segment_df = split_metadata_with_adapter(df, adapter, config)
print(f"\nCreated {len(segment_df)} segments from {len(df)} files")
print(f"Segments per file: {len(segment_df) / len(df):.1f}")

Model segment duration: 3.0s

Created 4000 segments from 2000 files
Segments per file: 2.0


In [7]:
# Preview segment metadata
print("Sample segment metadata:")
segment_df.head()

Sample segment metadata:


| | filename | original_filepath | segment_start | segment_end | label | has_event | segment_events | segment_event_clusters | target | category | fold | esc10 | src_file | take |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1-100032-A-0_000_003.wav | 1-100032-A-0.wav | 0.0 | 3.0 | dog | True | [[0.0, 3.0]] | [] | 0 | dog | 1 | True | 100032 | A |
| 1 | 1-100032-A-0_003_006.wav | 1-100032-A-0.wav | 3.0 | 6.0 | dog | True | [[0.0, 3.0]] | [] | 0 | dog | 1 | True | 100032 | A |
| 2 | 1-100038-A-14_000_003.wav | 1-100038-A-14.wav | 0.0 | 3.0 | chirping_birds | True | [[0.0, 3.0]] | [] | 14 | chirping_birds | 1 | False | 100038 | A |
| 3 | 1-100038-A-14_003_006.wav | 1-100038-A-14.wav | 3.0 | 6.0 | chirping_birds | True | [[0.0, 3.0]] | [] | 14 | chirping_birds | 1 | False | 100038 | A |
| 4 | 1-100210-A-36_000_003.wav | 1-100210-A-36.wav | 0.0 | 3.0 | vacuum_cleaner | True | [[0.0, 3.0]] | [] | 36 | vacuum_cleaner | 1 | False | 100210 | A |


In [8]:
# Convert numpy arrays to JSON strings for CSV compatibility
csv_df = segment_df.copy()
for col in ['segment_events', 'segment_event_clusters']:
    if col in csv_df.columns:
        csv_df[col] = csv_df[col].apply(lambda x: json.dumps(convert_for_json(x)))

# Save metadata.csv
csv_df.to_csv(DATASET_PATH / "metadata.csv", index=False, encoding='utf-8')
print(f"Saved metadata to {DATASET_PATH / 'metadata.csv'}")

Saved metadata to ESC50_BASEAL\metadata.csv
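The internals of `convert_for_json` are not shown in this notebook. A helper of that kind might look roughly like the sketch below, which is an assumption rather than the project's code: the name `to_jsonable` is hypothetical, and it relies on NumPy arrays and scalars exposing `tolist()` instead of importing NumPy directly.

```python
import json


def to_jsonable(value):
    """Recursively convert array-like values into plain lists, dicts, and scalars."""
    if hasattr(value, "tolist"):  # NumPy arrays and NumPy scalars expose tolist()
        return to_jsonable(value.tolist())
    if isinstance(value, dict):
        return {str(key): to_jsonable(item) for key, item in value.items()}
    if isinstance(value, (list, tuple)):
        return [to_jsonable(item) for item in value]
    return value


# Round trip: what is written to the CSV cell, and how to read it back.
cell = json.dumps(to_jsonable([(0.0, 3.0)]))
restored = json.loads(cell)
print(cell)
print(restored)
```

When `metadata.csv` is read back, these two columns arrive as strings, so each cell needs a `json.loads` call before the event lists can be used.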


In [9]:
# Create labels.csv with fold-based validation split
labels_df = create_labels_csv_with_adapter(segment_df, adapter)
labels_df.to_csv(DATASET_PATH / "labels.csv", index=False, encoding='utf-8')

print(f"Saved labels to {DATASET_PATH / 'labels.csv'}")
print(f"\nValidation split (fold {VALIDATION_FOLD}):")
print(f"  Train: {(~labels_df['validation']).sum()} segments")
print(f"  Validation: {labels_df['validation'].sum()} segments")

Saved labels to ESC50_BASEAL\labels.csv

Validation split (fold 5):
  Train: 3200 segments
  Validation: 800 segments
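The 3200/800 split is what holding out one fold in five gives: 4000 × 1/5 = 800. Because the split is assigned per fold, both segments of a clip should fall on the same side. A quick leakage check could look like the sketch below. It assumes each row carries the `original_filepath` column from `metadata.csv` alongside the boolean `validation` flag; the helper name and the sample rows are hypothetical.

```python
def files_in_both_splits(rows):
    """Return original files that have segments in both train and validation."""
    train, validation = set(), set()
    for row in rows:
        target = validation if row["validation"] else train
        target.add(row["original_filepath"])
    return sorted(train & validation)


# Made-up rows for illustration: two training segments of one clip and one validation segment.
rows = [
    {"original_filepath": "1-100032-A-0.wav", "validation": False},
    {"original_filepath": "1-100032-A-0.wav", "validation": False},
    {"original_filepath": "5-000001-A-14.wav", "validation": True},
]
leaks = files_in_both_splits(rows)
print(leaks)
```

In the notebook, such rows could be built by merging the `filename` and `original_filepath` columns of `segment_df` with `labels_df` on `filename`, assuming both frames expose those columns as shown in this document.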


## Summary

In [10]:
print("=" * 50)
print("ESC-50 Dataset Generation Complete")
print("=" * 50)
print(f"\nOutput directory: {DATASET_PATH}")
print(f"Model: {MODEL}")
print("\nStatistics:")
print(f"  Original files: {len(df)}")
print(f"  Total segments: {len(segment_df)}")
print(f"  Unique labels: {segment_df['label'].nunique()}")
print("\nLabel distribution (top 10):")
print(segment_df['label'].value_counts().head(10))

ESC-50 Dataset Generation Complete

Output directory: ESC50_BASEAL
Model: birdnet

Statistics:
  Original files: 2000
  Total segments: 4000
  Unique labels: 50

Label distribution (top 10):
label
dog                 80
glass_breaking      80
drinking_sipping    80
rain                80
insects             80
laughing            80
hen                 80
engine              80
breathing           80
crying_baby         80
Name: count, dtype: int64


## Verify Output Structure

In [11]:
# Verify output structure
print("Output directory structure:")
print(f"  {DATASET_PATH}/")
print(f"  ├── data/{MODEL}/ ({len(list(SEG_PATH.glob('*.wav')))} files)")
print(f"  ├── embeddings/{MODEL}/ ({len(list(EMB_PATH.glob('*.npy')))} files)")
print("  ├── labels.csv")
print("  └── metadata.csv")

Output directory structure:
  ESC50_BASEAL/
  ├── data/birdnet/ (10 files)
  ├── embeddings/birdnet/ (10 files)
  ├── labels.csv
  └── metadata.csv
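File counts alone do not show whether every labeled segment has matching files on disk. In this run only 10 segment files exist while the metadata describes 4000 segments. The sketch below lists label rows that lack a segment file or an embedding. It assumes the `<stem>_<model>.npy` naming from the output layout, and `missing_outputs` is a hypothetical helper. The demo uses a temporary directory with made-up filenames.

```python
import tempfile
from pathlib import Path


def missing_outputs(filenames, seg_dir: Path, emb_dir: Path, model: str) -> list[str]:
    """Return label filenames that lack a segment file or an embedding file."""
    missing = []
    for name in filenames:
        stem = Path(name).stem
        has_segment = (seg_dir / name).is_file()
        has_embedding = (emb_dir / f"{stem}_{model}.npy").is_file()
        if not (has_segment and has_embedding):
            missing.append(name)
    return missing


# Demo: one complete segment/embedding pair and one label row with no files.
with tempfile.TemporaryDirectory() as tmp:
    seg_dir = Path(tmp) / "data"
    emb_dir = Path(tmp) / "embeddings"
    seg_dir.mkdir()
    emb_dir.mkdir()
    (seg_dir / "a_000_003.wav").touch()
    (emb_dir / "a_000_003_birdnet.npy").touch()
    demo_missing = missing_outputs(
        ["a_000_003.wav", "a_003_006.wav"], seg_dir, emb_dir, "birdnet"
    )

print(demo_missing)
```

In the notebook this could be called as `missing_outputs(labels_df["filename"], SEG_PATH, EMB_PATH, MODEL)`. With the 5-file subset most rows would be reported, and the same check would surface any naming mismatch between the label filenames and the generated files, such as `_003_005` versus `_003_006`.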
