Some sounds are distinct and instantly recognizable, like a baby’s laugh or the strum of a guitar.

Other sounds aren’t clear and are difficult to pinpoint. If you close your eyes, can you tell which of the sounds below is a chainsaw versus a blender?

Moreover, we often experience a mix of sounds that create an ambience – like the clamoring of construction, a hum of traffic from outside the door, blended with loud laughter from the room, and the ticking of the clock on your wall. The sound clip below is of a busy food court in the UK.

Partly because of the sheer variety of sounds we experience, no reliable general-purpose automatic audio tagging system exists. As a result, tasks such as annotating sound collections and captioning non-speech events in audiovisual content still require a lot of manual effort.

To tackle this problem, Freesound (an initiative by MTG-UPF that maintains a collaborative database with over 370,000 Creative Commons Licensed sounds) and Google Research’s Machine Perception Team (creators of AudioSet, a large-scale dataset of manually annotated audio events with over 500 classes) have teamed up to develop the dataset for this competition.

You’re challenged to build a general-purpose automatic audio tagging system using a dataset of audio files covering a wide range of real-world environments. Sounds in the dataset include things like musical instruments, human sounds, domestic sounds, and animals from Freesound’s library, annotated using a vocabulary of more than 40 labels from Google’s AudioSet ontology. To succeed in this competition your systems will need to be able to recognize an increased number of sound events of very diverse nature, and to leverage subsets of training data featuring annotations of varying reliability (see Data section for more information).

In [7]:
# set path

import sys
sys.path.insert(0,'../src')

Import necessary packages

In [8]:
from dotenv import load_dotenv
import os
import pandas as pd
from information import Information
from pre_processing import PreProcessing
from prepare_data import PrepareData
from sound_oop import SoundObjectOriented
from utils.sound_features import get_mfcc_features_2

Load environment settings

In [9]:
# Load envs

load_dotenv()  # read variables from a .env file, if one is present
ENV = os.getenv("ENV")
TRAIN_PATH = os.getenv("TRAIN_PATH")
TEST_PATH = os.getenv("TEST_PATH")
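`os.getenv` returns `None` when a variable is not set, which can surface later as a confusing path error. A minimal sketch of a fallback is shown below. The helper name `path_from_env` is hypothetical, the variable name `TEST_PATH` is an assumption, and the defaults are simply the relative paths used in the extraction cell further down.

```python
import os


def path_from_env(name, default):
    """Return the environment variable `name`, or `default` if it is unset or empty."""
    value = os.getenv(name)
    return value if value else default


# Defaults mirror the relative paths used later in this notebook (an assumption).
train_path = path_from_env("TRAIN_PATH", "../data/train")
test_path = path_from_env("TEST_PATH", "../data/test")
```

Falling back on empty strings as well as missing variables is a design choice here, since an empty value in a `.env` file is rarely an intentional path.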

Load data

In [10]:
train = pd.read_csv("../data/train.csv")
test = pd.read_csv("../data/test_post_competition.csv")

Extract labels

In [11]:
LABELS = list(train.label.unique())
label_idx = {label: i for i, label in enumerate(LABELS)}
train.set_index("fname", inplace=True)
test.set_index("fname", inplace=True)
train["label_idx"] = train.label.apply(lambda x: label_idx[x])
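To make the label encoding concrete, here is a small sketch on toy data. The file names and labels are made up for illustration, and the reverse mapping `idx_label` is an addition that is not part of the notebook; it would be one way to turn predicted indices back into label names.

```python
import pandas as pd

# Toy stand-in for train.csv; file names and labels are invented.
toy_train = pd.DataFrame(
    {
        "fname": ["a.wav", "b.wav", "c.wav", "d.wav"],
        "label": ["Saxophone", "Cello", "Saxophone", "Bark"],
    }
).set_index("fname")

# Same scheme as above: labels are numbered in order of first appearance.
labels = list(toy_train.label.unique())
label_idx = {label: i for i, label in enumerate(labels)}

# Reverse mapping (hypothetical addition) for decoding predictions.
idx_label = {i: label for label, i in label_idx.items()}

toy_train["label_idx"] = toy_train.label.map(label_idx)
decoded = toy_train.label_idx.map(idx_label)
```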

Extract MFCC for both train/test audio files

In [13]:
prepare_data = PrepareData()
train_extracted = prepare_data.extract_features(
    "../data/train", "train", loadPreComputed=False
)
test_extracted = prepare_data.extract_features(
    "../data/test", "test", loadPreComputed=False
)


pre-processing object is created



OSError: Cannot save file into a non-existent directory: 'data'
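The message above matches the error pandas raises when asked to write a file into a directory that does not exist. One possible workaround, assuming `PrepareData` saves its output to a `data` directory relative to the current working directory (the message suggests this, but the class source is not shown here), is to create that directory before running the extraction. The helper name `ensure_output_dir` is hypothetical.

```python
import os
import tempfile


def ensure_output_dir(path):
    """Create `path` (and any missing parents) and return its absolute form."""
    os.makedirs(path, exist_ok=True)  # no error if it already exists
    return os.path.abspath(path)


# Demonstrated inside a temporary directory so nothing in the project is touched.
with tempfile.TemporaryDirectory() as workdir:
    target = ensure_output_dir(os.path.join(workdir, "data"))
    created = os.path.isdir(target)
```

In the notebook itself, the equivalent would be a call such as `ensure_output_dir("data")` placed before `prepare_data.extract_features(...)`.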

Extract corresponding labels

In [14]:
y_train = train.loc[train_extracted.index.to_numpy()]

NameError: name 'train_extracted' is not defined
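The `NameError` follows from the failed extraction cell, which never assigned `train_extracted`. Once that cell runs, the alignment step selects label rows in the same order as the extracted features. A toy sketch of that behaviour, with invented file names and a single made-up feature column `mfcc_0`:

```python
import pandas as pd

# Toy label table indexed by file name, as in the notebook.
toy_labels = pd.DataFrame(
    {"fname": ["a.wav", "b.wav", "c.wav"], "label": ["Bark", "Cello", "Bark"]}
).set_index("fname")

# Toy feature table covering a subset of files, in a different order.
toy_features = pd.DataFrame(
    {"mfcc_0": [0.3, 0.1]},
    index=pd.Index(["c.wav", "a.wav"], name="fname"),
)

# .loc with an array of index values returns rows in that same order,
# so labels line up row by row with the features.
toy_y = toy_labels.loc[toy_features.index.to_numpy()]
```

Note that `.loc` raises a `KeyError` if any feature file name is missing from the label table, which is a useful early check that the two tables agree.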

Create the main Sound Classifier Object and train

sound_oop = SoundObjectOriented()
sound_oop.add_data(train_extracted, test_extracted, y_train, index_name="fname")
# sound_oop.information()
sound_oop.pre_processing()
# sound_oop.information()

ML = sound_oop.ml(sound_oop)
ML.show_available_algorithms()
ML.init_regressors("all")
ML.train_test_validation()