# **Sonic Vault** Audio Dataset

This dataset was automatically collected using a Python script that records short audio clips from streaming radio stations. The recordings capture live audio from selected stations, with each clip having a random duration between 30 and 90 seconds.

## Dataset Structure

    Sonic_Vault/
    ├── metadata.csv                  # CSV file logging metadata of each recording
    ├── NPR<timestamp>1.mp3           # Example audio file from NPR
    ├── Classic_FM<timestamp>2.mp3    # Example audio file from Classic FM

- **Audio Files**:  
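  Each audio file is saved in MP3 format. The filename concatenates the station name, a `YYYYMMDD_HHMMSS` timestamp, and an incremental index with no separators between them (e.g., `NPR20250309_1430121.mp3` is station `NPR`, timestamp `20250309_143012`, index `1`).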

- **metadata.csv**:  
  This file logs the following information for each recording:
  - **Station Name**: Name of the radio station.
  - **Filename**: The name of the recorded audio file.
  - **Timestamp**: The date and time the recording started, formatted as `YYYYMMDD_HHMMSS`.
  - **Duration (s)**: The requested duration (in seconds) of the clip, as passed to `ffmpeg`.
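
Since the filename parts run together, splitting a name back into station, timestamp, and index takes a little care. The following is a minimal sketch, assuming the timestamp always has the fixed `YYYYMMDD_HHMMSS` width shown in the example above. `ClipName` and `parse_clip_name` are hypothetical names for illustration and are not part of the collection script.

```python
import re
from datetime import datetime
from typing import NamedTuple


class ClipName(NamedTuple):
    """Hypothetical container for the three parts of a clip filename."""
    station: str
    recorded_at: datetime
    index: int


# Station name, then a fixed-width YYYYMMDD_HHMMSS timestamp, then the index.
# Assumption: station names never contain a digit run that looks like a timestamp.
_CLIP_RE = re.compile(r"^(?P<station>.+?)(?P<ts>\d{8}_\d{6})(?P<index>\d+)\.mp3$")


def parse_clip_name(filename: str) -> ClipName:
    """Split a bare filename (no directory part) into its components."""
    match = _CLIP_RE.match(filename)
    if match is None:
        raise ValueError(f"Unrecognized clip filename: {filename!r}")
    return ClipName(
        station=match["station"],
        recorded_at=datetime.strptime(match["ts"], "%Y%m%d_%H%M%S"),
        index=int(match["index"]),
    )


print(parse_clip_name("NPR20250309_1430121.mp3"))
```

The fixed-width timestamp is what makes the split unambiguous: whatever digits follow it belong to the index, so two-digit indices parse the same way as single-digit ones.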

## Dataset Size

- **Total Recordings**: 30 audio files.
- **Recording Duration**: Each recording has a duration randomly chosen between 30 and 90 seconds.
- **Storage Format**: All audio files are stored in MP3 format within the `Sonic_Vault` folder.
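
The figures above can be checked against `metadata.csv`. The sketch below is one possible way to do that, assuming the column headers listed under Dataset Structure. `summarize_metadata` is a hypothetical helper, and the two sample rows only mirror the first two entries of the run log in this document so that the example runs without the dataset on disk.

```python
import csv
import io


def summarize_metadata(csv_file) -> dict:
    """Count recordings per station and list files whose duration is outside 30-90 s."""
    per_station = {}
    out_of_range = []
    total = 0
    for row in csv.DictReader(csv_file):
        total += 1
        station = row["Station Name"]
        per_station[station] = per_station.get(station, 0) + 1
        if not 30 <= int(row["Duration (s)"]) <= 90:
            out_of_range.append(row["Filename"])
    return {"total": total, "per_station": per_station, "out_of_range": out_of_range}


# Illustrative rows in the same column layout as metadata.csv.
sample = io.StringIO(
    "Station Name,Filename,Timestamp,Duration (s)\n"
    "NPR,NPR20250310_0215061.mp3,20250310_021506,63\n"
    "Classic_FM,Classic_FM20250310_0216072.mp3,20250310_021607,75\n"
)
summary = summarize_metadata(sample)
print(summary)
```

To run it on the real file, pass an open file handle instead of the sample, for example `open("Sonic_Vault/metadata.csv", newline="", encoding="utf-8")`. A complete dataset should report a total of 30 and an empty `out_of_range` list.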

## Data Collection Process

1. **Station Selection**:  
   The script records from two primary radio stations:
   - **NPR**: `http://npr-ice.streamguys1.com/live.mp3`
   - **Classic_FM**: `http://media-ice.musicradio.com/ClassicFMMP3`  
   *(Additional stations are provided as comments for reference.)*

2. **Recording Mechanism**:  
   - The script uses the `ffmpeg` command-line tool to capture audio streams.
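   - For each clip, a pseudo-random duration is derived from the fractional part of the current system time (`30 + int(60 * (time.time() % 1))`), which gives a whole number of seconds from 30 to 89.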
   - Audio clips are recorded and saved with unique filenames in the `Sonic_Vault` directory.

3. **Metadata Logging**:  
   - After each successful recording, metadata (station name, filename, timestamp, and duration) is logged into `metadata.csv`.
   - The process continues until 30 audio files have been recorded.

4. **Error Handling and Timing**:  
   - Errors during recording are caught and printed without halting the overall process. A failed attempt is not logged and does not count toward the 30 files.
   - A 2-second pause follows every recording attempt, successful or not, before the script moves on to the next station.
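
The duration and command-building steps can be sketched in isolation as two small functions. This is an illustrative rearrangement, not the script itself: `pick_duration` and `build_ffmpeg_command` are hypothetical names, and `random.randint` is swapped in as an assumption because it includes both endpoints, whereas the script in this document derives the duration from the system clock. The `ffmpeg` flags are the ones the script uses.

```python
import random


def pick_duration(rng: random.Random, low: int = 30, high: int = 90) -> int:
    """Return a whole number of seconds in [low, high], both ends included."""
    return rng.randint(low, high)


def build_ffmpeg_command(stream_url: str, duration: int, output_path: str) -> list:
    """Build the argument list for recording `duration` seconds of a stream."""
    return [
        "ffmpeg",
        "-y",                  # overwrite the output file if it exists
        "-i", stream_url,      # input stream
        "-t", str(duration),   # stop after this many seconds
        "-acodec", "copy",     # copy the audio without re-encoding
        output_path,
    ]


rng = random.Random()  # pass random.Random(seed) for reproducible durations
duration = pick_duration(rng)
command = build_ffmpeg_command(
    "http://npr-ice.streamguys1.com/live.mp3", duration, "Sonic_Vault/example.mp3"
)
print(command)
```

Keeping the command as a list, rather than a single shell string, lets `subprocess.run` pass each argument to `ffmpeg` unchanged, so URLs and paths need no quoting.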

In [1]:
import os
import subprocess
import time
import csv
from datetime import datetime

In [None]:
def main():
    radio_stations = {
        "NPR": "http://npr-ice.streamguys1.com/live.mp3",
        "Classic_FM": "http://media-ice.musicradio.com/ClassicFMMP3",
    }
    # Other radio stations that are only available part of the time (not 24/7):
    #             "KEXP": "http://live-aacplus-64.kexp.org/kexp64.aac",
    #             "BBC_Radio": "http://bbcmedia.ic.llnwd.net/stream/bbcmedia_radio1_mf_p",
    #             "Jazz24": "http://live.wostreaming.net/direct/ppm-jazz24mp3-ibc1",
    #             "KCRW": "http://kcrw.streamguys1.com/kcrw_192k_mp3",
    #             "WNYC": "https://fm939.wnyc.org/wnycfm",

    num_files = 30  

    dataset_name = "Sonic_Vault"
    os.makedirs(dataset_name, exist_ok=True)

    metadata_file = os.path.join(dataset_name, "metadata.csv")
    with open(metadata_file, mode='w', newline='', encoding='utf-8') as csv_file:
        csv_writer = csv.writer(csv_file)
        csv_writer.writerow(["Station Name", "Filename", "Timestamp", "Duration (s)"])
        file_count = 0

        while file_count < num_files:
            for station_name, stream_url in radio_stations.items():
                if file_count >= num_files:
                    break
                # Pseudo-random duration from the clock's fractional second: 30 to 89 s.
                random_duration = 30 + int(60 * (time.time() % 1))
                timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
                filename = f"{station_name}{timestamp}{file_count+1}.mp3"
                output_path = os.path.join(dataset_name, filename)

                print(f"Recording from {station_name} for {random_duration} seconds...")
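                command = [
                    "ffmpeg",
                    "-y",                        # overwrite the output file if it already exists
                    "-i", stream_url,            # input: the station's stream URL
                    "-t", str(random_duration),  # stop after this many seconds
                    "-acodec", "copy",           # copy the audio stream without re-encoding
                    output_path
                ]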
                try:
                    subprocess.run(command, check=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
                    print(f"Saved recording: {output_path}")
                    csv_writer.writerow([station_name, filename, timestamp, random_duration])
                    file_count += 1
                except Exception as e:
                    print(f"Error recording from {station_name}: {e}")
                time.sleep(2)

    print("\nAudio recording process completed.")
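
The `except Exception` branch in the script prints only the exception object, which for a failed `ffmpeg` run names the command and exit status but not the tool's own error text. The sketch below shows one possible way to surface that text. `run_and_report` is a hypothetical helper, the `timeout` parameter is an assumption that the script does not use, and the demonstration runs the Python interpreter as a stand-in for `ffmpeg` so it works without a network stream.

```python
import subprocess
import sys


def run_and_report(command, timeout=None):
    """Run a command and return (ok, message), including stderr text on failure."""
    try:
        subprocess.run(command, check=True, capture_output=True, timeout=timeout)
    except FileNotFoundError:
        return False, f"executable not found: {command[0]}"
    except subprocess.TimeoutExpired:
        return False, f"timed out after {timeout} seconds"
    except subprocess.CalledProcessError as exc:
        stderr_text = exc.stderr.decode("utf-8", errors="replace").strip()
        return False, f"exit code {exc.returncode}: {stderr_text}"
    return True, "ok"


# Stand-in for a failing ffmpeg call: write to stderr and exit non-zero.
failing = [sys.executable, "-c", "import sys; sys.stderr.write('boom'); sys.exit(3)"]
print(run_and_report(failing))
```

A timeout somewhat longer than the requested clip duration would also keep a stalled stream from blocking the loop indefinitely, though the right margin depends on the connection.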


In [3]:
if __name__ == '__main__':
    main()


Recording from NPR for 63 seconds...
Saved recording: Sonic_Vault\NPR20250310_0215061.mp3
Recording from Classic_FM for 75 seconds...
Saved recording: Sonic_Vault\Classic_FM20250310_0216072.mp3
Recording from NPR for 66 seconds...
Saved recording: Sonic_Vault\NPR20250310_0216373.mp3
Recording from Classic_FM for 66 seconds...
Saved recording: Sonic_Vault\Classic_FM20250310_0217414.mp3
Recording from NPR for 66 seconds...
Saved recording: Sonic_Vault\NPR20250310_0218355.mp3
Recording from Classic_FM for 68 seconds...
Saved recording: Sonic_Vault\Classic_FM20250310_0219396.mp3
Recording from NPR for 53 seconds...
Saved recording: Sonic_Vault\NPR20250310_0220357.mp3
Recording from Classic_FM for 48 seconds...
Saved recording: Sonic_Vault\Classic_FM20250310_0221268.mp3
Recording from NPR for 48 seconds...
Saved recording: Sonic_Vault\NPR20250310_0222029.mp3
Recording from Classic_FM for 38 seconds...
Saved recording: Sonic_Vault\Classic_FM20250310_02224810.mp3
Recording from NPR for 73 sec