# Creation of datasets for the "dog_bark" and "siren" classes from the UrbanSound8K dataset

This Jupyter notebook builds two class-specific audio datasets from UrbanSound8K. It filters the UrbanSound8K metadata for the "dog_bark" and "siren" classes, keeps up to 1,500 rows per class, and copies the corresponding audio files into separate output directories. The resulting datasets can be used to train audio classification models.

In [1]:
import os
import shutil
import pandas as pd

### Load metadata and define directories

This section loads the UrbanSound8K metadata file and defines the directories for the source UrbanSound8K dataset as well as the output directories where the new datasets will be created.

In [2]:
# Paths to the UrbanSound8K audio folders and metadata file
data_dir = 'UrbanSound8K/audio'
metadata_path = 'UrbanSound8K/metadata/UrbanSound8K.csv'

In [3]:
# Output directories for the new datasets
output_dir_dog_bark = "dog_bark_dataset"
output_dir_siren = "siren_dataset"

In [4]:
# Create output directories if they don't exist
os.makedirs(output_dir_dog_bark, exist_ok=True)
os.makedirs(output_dir_siren, exist_ok=True)

In [5]:
# Load the UrbanSound8K metadata
metadata = pd.read_csv(metadata_path)
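
Before filtering, it can help to confirm that the metadata contains the columns the later cells read (`slice_file_name`, `fold`, and `class`). The following is only a sketch: `check_columns` is a hypothetical helper, and the two-row CSV is made-up data standing in for `UrbanSound8K.csv`.

```python
import io
import pandas as pd

# Columns that the filtering and copying cells read from the metadata.
REQUIRED_COLUMNS = {"slice_file_name", "fold", "class"}

def check_columns(df):
    """Return the sorted list of required columns missing from df (hypothetical helper)."""
    return sorted(REQUIRED_COLUMNS - set(df.columns))

# Made-up rows standing in for the real metadata file.
sample_csv = io.StringIO(
    "slice_file_name,fold,class\n"
    "a.wav,1,dog_bark\n"
    "b.wav,2,siren\n"
)
sample_metadata = pd.read_csv(sample_csv)
missing = check_columns(sample_metadata)
print("Missing columns:", missing)
```

In the notebook itself, the same check could be applied to `metadata` right after it is loaded.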

### Filter samples

The metadata is then filtered to keep only samples belonging to the "dog_bark" and "siren" classes. For each class, `head(1500)` selects at most the first 1,500 matching rows; if a class has fewer rows than that, all of them are kept.

In [6]:
# Filter samples for the "dog_bark" class
dog_bark_samples = metadata[metadata["class"] == "dog_bark"].head(1500)

In [7]:
# Filter samples for the "siren" class
siren_samples = metadata[metadata["class"] == "siren"].head(1500)
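
Because `head(n)` only caps the row count, the filtered DataFrames may be smaller than the requested size. The sketch below illustrates this on a toy DataFrame with made-up rows; `toy`, `limit`, `toy_dog`, and `toy_siren` are illustrative names, and `limit = 2` stands in for the 1500 used above.

```python
import pandas as pd

# Toy metadata: three "dog_bark" rows and one "siren" row (made-up data).
toy = pd.DataFrame({
    "slice_file_name": ["a.wav", "b.wav", "c.wav", "d.wav"],
    "fold": [1, 1, 2, 3],
    "class": ["dog_bark", "dog_bark", "dog_bark", "siren"],
})

limit = 2  # stands in for the 1500 used in the notebook

toy_dog = toy[toy["class"] == "dog_bark"].head(limit)
toy_siren = toy[toy["class"] == "siren"].head(limit)

# head() truncates to `limit` rows but never pads a smaller class.
print(len(toy_dog), len(toy_siren))
```

Checking `len(dog_bark_samples)` and `len(siren_samples)` in the same way shows how many files each dataset will actually contain.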

### Copy audio files

Finally, a function is defined to copy over the audio files for each sample in the filtered metadata DataFrames. This copies the samples from the UrbanSound8K folders into the output directories, creating the finalized datasets for the "dog_bark" and "siren" classes.

In [8]:
# Function to copy audio files to the specified directory
def copy_audio_files(samples, output_dir):
    """
    Copy the audio files listed in the samples DataFrame to the output directory.

    Source paths are built from the global data_dir and each row's fold number.

    Args:
    - samples: DataFrame containing metadata of audio samples.
    - output_dir: Directory to copy the audio files into.
    """
    for _, row in samples.iterrows():
        # Each clip lives in UrbanSound8K/audio/fold<N>/<slice_file_name>
        file_path = os.path.join(data_dir, f'fold{row["fold"]}', row["slice_file_name"])
        shutil.copy(file_path, output_dir)


# Copy audio files for the "dog_bark" class
copy_audio_files(dog_bark_samples, output_dir_dog_bark)

# Copy audio files for the "siren" class
copy_audio_files(siren_samples, output_dir_siren)

print("Datasets created successfully.")

Datasets created successfully.
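
The copy step above stops with an error if any listed file is missing, and it does not report how many files were copied. One possible variant, sketched here under assumptions, takes the source directory as a parameter, skips missing files, and returns counts. `copy_audio_files_checked` is a hypothetical name, and the demonstration uses a temporary directory with made-up file names rather than the real dataset.

```python
import os
import shutil
import tempfile
import pandas as pd

def copy_audio_files_checked(samples, source_dir, output_dir):
    """Copy the files listed in samples; return (copied, missing) counts."""
    os.makedirs(output_dir, exist_ok=True)
    copied, missing = 0, 0
    for _, row in samples.iterrows():
        src = os.path.join(source_dir, f'fold{row["fold"]}', row["slice_file_name"])
        if os.path.isfile(src):
            shutil.copy(src, output_dir)
            copied += 1
        else:
            # Count the missing file instead of raising an error.
            missing += 1
    return copied, missing

# Demonstration on a throwaway directory tree (made-up file names).
with tempfile.TemporaryDirectory() as tmp:
    source = os.path.join(tmp, "audio")
    os.makedirs(os.path.join(source, "fold1"))
    with open(os.path.join(source, "fold1", "a.wav"), "wb") as f:
        f.write(b"")  # empty placeholder file

    demo = pd.DataFrame({"slice_file_name": ["a.wav", "b.wav"], "fold": [1, 1]})
    out = os.path.join(tmp, "out")
    copied, missing = copy_audio_files_checked(demo, source, out)
    copied_names = sorted(os.listdir(out))
    print(copied, missing, copied_names)
```

Returning the counts makes it easy to compare the number of copied files against the length of the filtered DataFrame before training on the result.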
