# Loading the FSD50K Dataset

> Example code showing how to load the FSD50K data

## Installation

This notebook was tested on an Ubuntu machine.

**Dependencies:** `wget`, `zip`, `unzip`

## Important details

#### Basic characteristics:

* FSD50K contains 51,197 audio clips from Freesound, totalling 108.3 hours of multi-labeled audio
* The dataset encompasses 200 sound classes (144 leaf nodes and 56 intermediate nodes) hierarchically organized with a subset of the AudioSet Ontology.
* The audio content is composed mainly of sound events produced by physical sound sources and production mechanisms, including human sounds, sounds of things, animals, natural sounds, musical instruments and more. The vocabulary can be inspected in vocabulary.csv (see Files section below).
* The acoustic material has been manually labeled by humans following a data labeling process using the Freesound Annotator platform [2]. 
* Clips are of variable length from 0.3 to 30s, due to the diversity of the sound classes and the preferences of Freesound users when recording sounds.
* All clips are provided as uncompressed PCM 16 bit 44.1 kHz mono audio files.
* Ground truth labels are provided at the clip-level (i.e., weak labels).
* The dataset mainly poses a large-vocabulary multi-label sound event classification problem, but it also allows development and evaluation of a variety of machine listening approaches (see Sec. 4D of the FSD50K paper).
* In addition to audio clips and ground truth, additional metadata is made available (including raw annotations, sound predominance ratings, Freesound metadata, and more), allowing a variety of analyses and sound event research tasks (see Files section below).
* The audio clips are grouped into a development (dev) set and an evaluation (eval) set such that the two sets share no clips from the same Freesound uploader.
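
The clip format stated above (uncompressed PCM, 16 bit, 44.1 kHz, mono) can be checked on any downloaded file. The following is a minimal sketch using only Python's standard `wave` module; the helper names `describe_wav` and `matches_fsd50k_format` are illustrative and not part of the dataset tooling.

```python
import wave


def describe_wav(path):
    """Return (channels, bits_per_sample, sample_rate_hz, duration_s) of a PCM WAV file."""
    with wave.open(str(path), "rb") as wf:
        channels = wf.getnchannels()
        bits = wf.getsampwidth() * 8          # sample width is reported in bytes
        rate = wf.getframerate()
        duration = wf.getnframes() / rate     # frames / (frames per second)
    return channels, bits, rate, duration


def matches_fsd50k_format(path):
    """True if the file is mono, 16 bit, 44.1 kHz, as the dataset description states."""
    channels, bits, rate, _ = describe_wav(path)
    return (channels, bits, rate) == (1, 16, 44100)
```

Calling `describe_wav` on a few clips is also a quick way to confirm the stated duration range of 0.3 to 30s.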

#### Dev set:

* 40,966 audio clips totalling 80.4 hours of audio
* Avg duration/clip: 7.1s
* 114,271 smeared labels (i.e., labels propagated in the upwards direction to the root of the ontology)
* Labels are correct but may occasionally be incomplete
* A train/validation split is provided (Sec. 3H). If a different split is used, it should be specified for reproducibility and fair comparability of results (see Sec. 5C of the FSD50K paper)
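
To make the term "smeared labels" concrete, here is a small sketch of upward label propagation. It assumes each class has a single parent, and the `TOY_PARENTS` hierarchy is a made-up illustration rather than the actual FSD50K ontology; the released ground truth is already smeared, so this is only meant to show the idea.

```python
def smear_labels(labels, parent_of):
    """Return the labels plus all of their ancestors, walking up to the root."""
    smeared = set()
    for label in labels:
        node = label
        # Stop at the root (no parent) or once we reach an ancestor already added.
        while node is not None and node not in smeared:
            smeared.add(node)
            node = parent_of.get(node)
    return smeared


# Hypothetical toy hierarchy (child -> parent), for illustration only.
TOY_PARENTS = {
    "Bark": "Dog",
    "Dog": "Domestic_animals",
    "Domestic_animals": "Animal",
    "Meow": "Cat",
    "Cat": "Domestic_animals",
}

print(sorted(smear_labels({"Bark", "Meow"}, TOY_PARENTS)))
```

A clip annotated only with leaf labels therefore ends up with more labels after smearing, which is why the label counts above exceed the clip counts.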

#### Eval set:

* 10,231 audio clips totalling 27.9 hours of audio
* Avg duration/clip: 9.8s
* 38,596 smeared labels
* Eval set is labeled exhaustively (labels are correct and complete for the considered vocabulary)

_Note: All classes in FSD50K are represented in AudioSet, except Crash cymbal, Human group actions, Human voice, Respiratory sounds, and Domestic sounds, home sounds._

> For more details on the dataset see https://zenodo.org/record/4060432 

## Downloading the Data

> Run only when necessary

In [8]:
%env DATA_PATH=data/fsd50k

env: DATA_PATH=data/fsd50k


In [20]:
# Create the target directory first; wget -O fails if it does not exist
!mkdir -p ${DATA_PATH}
# Metadata
!wget -q -c -O ${DATA_PATH}/metadata.zip https://zenodo.org/record/4060432/files/FSD50K.metadata.zip?download=1
!wget -q -c -O ${DATA_PATH}/ground_truth.zip https://zenodo.org/record/4060432/files/FSD50K.ground_truth.zip?download=1
!wget -q -c -O ${DATA_PATH}/doc.zip https://zenodo.org/record/4060432/files/FSD50K.doc.zip?download=1 

!unzip -q ${DATA_PATH}/metadata.zip -d ${DATA_PATH}/
!unzip -q ${DATA_PATH}/ground_truth.zip -d ${DATA_PATH}/
!unzip -q ${DATA_PATH}/doc.zip -d ${DATA_PATH}/

In [21]:
# Dev Audio
!wget -q -c -O ${DATA_PATH}/dev_audio.z01 https://zenodo.org/record/4060432/files/FSD50K.dev_audio.z01?download=1
!wget -q -c -O ${DATA_PATH}/dev_audio.z02 https://zenodo.org/record/4060432/files/FSD50K.dev_audio.z02?download=1
!wget -q -c -O ${DATA_PATH}/dev_audio.z03 https://zenodo.org/record/4060432/files/FSD50K.dev_audio.z03?download=1
!wget -q -c -O ${DATA_PATH}/dev_audio.z04 https://zenodo.org/record/4060432/files/FSD50K.dev_audio.z04?download=1
!wget -q -c -O ${DATA_PATH}/dev_audio.z05 https://zenodo.org/record/4060432/files/FSD50K.dev_audio.z05?download=1
!wget -q -c -O ${DATA_PATH}/dev_audio.zip https://zenodo.org/record/4060432/files/FSD50K.dev_audio.zip?download=1

!zip -s 0 ${DATA_PATH}/dev_audio.zip --out ${DATA_PATH}/dev_unsplit.zip # merge split files
!unzip -q ${DATA_PATH}/dev_unsplit.zip -d ${DATA_PATH}/

In [None]:
# Eval Audio
!wget -q -c -O ${DATA_PATH}/eval_audio.z01 https://zenodo.org/record/4060432/files/FSD50K.eval_audio.z01?download=1
!wget -q -c -O ${DATA_PATH}/eval_audio.zip https://zenodo.org/record/4060432/files/FSD50K.eval_audio.zip?download=1

!zip -s 0 ${DATA_PATH}/eval_audio.zip --out ${DATA_PATH}/eval_unsplit.zip # merge split files
!unzip -q ${DATA_PATH}/eval_unsplit.zip -d ${DATA_PATH}/
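
The repeated `wget` lines above follow one naming pattern, so the part list can also be generated in Python. This is a sketch, not a replacement for the cells above: the helper names `split_archive_urls` and `download_all` are hypothetical, and unlike `wget -c` the download helper does not resume partial files; it only skips files that already exist.

```python
import os
import urllib.request

BASE_URL = "https://zenodo.org/record/4060432/files"


def split_archive_urls(name, n_parts):
    """Return (local_filename, url) pairs for a split zip: .z01 .. .zNN, then .zip."""
    suffixes = [f"z{i:02d}" for i in range(1, n_parts + 1)] + ["zip"]
    return [
        (f"{name}.{s}", f"{BASE_URL}/FSD50K.{name}.{s}?download=1")
        for s in suffixes
    ]


def download_all(pairs, data_path):
    """Download every (filename, url) pair into data_path, skipping existing files."""
    os.makedirs(data_path, exist_ok=True)
    for filename, url in pairs:
        target = os.path.join(data_path, filename)
        if not os.path.exists(target):  # note: a partial file would be skipped too
            urllib.request.urlretrieve(url, target)


# The dev audio has five .zNN parts and the eval audio has one, as in the cells above.
dev_parts = split_archive_urls("dev_audio", 5)
eval_parts = split_archive_urls("eval_audio", 1)
```

Calling `download_all(dev_parts, "data/fsd50k")` would then fetch the dev archive parts; merging and unzipping still happen with `zip -s 0` and `unzip` as shown.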

In [None]:
# Clean up download files (the .zip archives and the .z01-.z05 split parts)
!rm -f ${DATA_PATH}/*.zip ${DATA_PATH}/*.z0*
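
Once the archives are extracted, a quick sanity check is to count the clips against the totals listed earlier (40,966 dev and 10,231 eval). The sketch below assumes the archives extract to folders named `FSD50K.dev_audio` and `FSD50K.eval_audio` containing `.wav` files; adjust the names if your extraction differs. `count_wavs` and `check_extraction` are illustrative helper names.

```python
from pathlib import Path

# Expected clip counts from the dataset description; folder names are an assumption.
EXPECTED_CLIPS = {"FSD50K.dev_audio": 40966, "FSD50K.eval_audio": 10231}


def count_wavs(folder):
    """Count the .wav files directly inside a folder (0 if the folder is missing)."""
    return sum(1 for _ in Path(folder).glob("*.wav"))


def check_extraction(data_path, expected=EXPECTED_CLIPS):
    """Return {folder_name: (found, expected)} for every folder whose count differs."""
    mismatches = {}
    for name, n_expected in expected.items():
        found = count_wavs(Path(data_path) / name)
        if found != n_expected:
            mismatches[name] = (found, n_expected)
    return mismatches
```

An empty result from `check_extraction("data/fsd50k")` suggests that both sets were extracted completely.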

## Let's load the data!

First, set up the environment.

In [3]:
# Run only when needed!
!pip install numpy pandas

Collecting numpy
  Downloading numpy-1.23.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (17.0 MB)
Collecting pandas
  Downloading pandas-1.4.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (11.6 MB)
Collecting pytz>=2020.1
  Using cached pytz-2022.1-py2.py3-none-any.whl (503 kB)
Installing collected packages: pytz, numpy, pandas
Successfully installed numpy-1.23.0 pandas-1.4.3 pytz-2022.1
