openmic-2018

Tools and tutorials for the OpenMIC-2018 dataset.

Overview

This repository contains companion source code for working with the OpenMIC-2018 dataset, a collection of audio and crowd-sourced instrument labels produced in a collaboration between Spotify and New York University's MARL and Center for Data Science. The cost of annotation was sponsored by Spotify, whose contributions to open-source research can be found online at the developer site, engineering blog, and public GitHub.

If you use this dataset, please cite the following work:

Humphrey, Eric J., Durand, Simon, and McFee, Brian. "OpenMIC-2018: An Open Dataset for Multiple Instrument Recognition." In Proceedings of the 19th International Society for Music Information Retrieval Conference (ISMIR), 2018.

Download the Dataset

The OpenMIC-2018 dataset is made available on Zenodo. After downloading, decompress it with your favorite command-line tar utility (the target directory must already exist):

$ tar xvzf openmic-2018-v1.0.0.tgz -C some/dir

This will expand into some/dir/openmic-2018, with the following structure:

openmic-2018/
  acknowledgement.md
  audio/
    000/
      000046_3840.ogg
      ..
    ..
  checksums
  class-map.json
  license-cc-by.txt
  openmic-2018-aggregated-labels.csv
  openmic-2018-individual-responses.csv
  openmic-2018-metadata.csv
  openmic-2018.npz
  partitions/
    train01.txt
    test01.txt
  vggish/
    000/
      000046_3840.json
      ..
    ..
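
The layout above suggests that each clip's audio and VGGish files can be located from its sample key alone. The following is a minimal sketch, not part of the openmic library: the helper name `sample_paths` is hypothetical, and it assumes the subdirectory name is always the first three characters of the sample key, as in the `000/000046_3840` example.

```python
from pathlib import Path


def sample_paths(root, sample_key):
    """Return (audio_path, vggish_path) for one sample key.

    Hypothetical helper; assumes the subdirectory is the first three
    characters of the key, as in audio/000/000046_3840.ogg.
    """
    root = Path(root)
    subdir = sample_key[:3]
    audio = root / "audio" / subdir / f"{sample_key}.ogg"
    vggish = root / "vggish" / subdir / f"{sample_key}.json"
    return audio, vggish


# Build the paths for the example key shown in the directory listing.
audio_path, vggish_path = sample_paths("some/dir/openmic-2018", "000046_3840")
print(audio_path)
print(vggish_path)
```

Checking `audio_path.exists()` before use is a cheap way to confirm the assumption against your local copy.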

The openmic-2018.npz is a Python-friendly composite of the vggish features and the openmic-2018-aggregated-labels.csv. An example of how to train and evaluate a model is provided in a tutorial notebook.
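
This README does not list the array names stored in openmic-2018.npz, so rather than assume them, the sketch below lists them. It relies only on the fact that an `.npz` file is a zip archive holding one `.npy` member per array; the helper name `list_npz_arrays` is hypothetical.

```python
import zipfile


def list_npz_arrays(npz_path):
    """Return the names of the arrays stored in an .npz archive."""
    suffix = ".npy"
    with zipfile.ZipFile(npz_path) as archive:
        # Each array is saved as a zip member named "<array name>.npy".
        return [
            name[: -len(suffix)]
            for name in archive.namelist()
            if name.endswith(suffix)
        ]
```

For example, `list_npz_arrays("some/dir/openmic-2018/openmic-2018.npz")` returns the names that can then be used to index the object returned by `numpy.load` on the same file.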

Installing

To use the provided openmic Python library, first clone the repository and change directory into it:

$ git clone https://github.com/cosmir/openmic-2018.git
$ cd ./openmic-2018

Next, you'll want to pull down the VGGish model parameters via the following script.

$ ./scripts/download-deps.sh

Finally, install the Python library, e.g. with pip:

$ pip install .

Errata

When initially collecting data, ten audio files were corrupted due to an issue in the source FMA dataset:

'071826', '071827', '087435', '095253', '095259',
'095263', '102144', '113025', '113604', '138485'
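
If you want to flag clips that come from these files, a check like the following may help. It is a sketch under one assumption: that a sample key has the form `<track id>_<offset>` (as in `000046_3840`), so that the identifiers above match the part of the key before the underscore. The function names are hypothetical.

```python
# The ten identifiers listed in the errata above.
CORRUPTED_TRACK_IDS = frozenset({
    "071826", "071827", "087435", "095253", "095259",
    "095263", "102144", "113025", "113604", "138485",
})


def track_id(sample_key):
    """Return the part of the sample key before the first underscore.

    Assumption: keys look like "<track id>_<offset>".
    """
    return sample_key.split("_", 1)[0]


def is_from_corrupted_track(sample_key):
    """True if the sample key belongs to one of the listed files."""
    return track_id(sample_key) in CORRUPTED_TRACK_IDS
```

Whether to drop or merely mark such clips depends on your experiment; the sketch only identifies them.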

Of the 41k responses obtained, only three resulted in erroneous labels by annotators. The following rows have been manually corrected:

| Sample Key | Instrument | True Label |
| --- | --- | --- |
| 095253_134400 | piano | yes |
| 095263_96000 | mallet percussion | yes |
| 113025_99840 | trumpet | yes |