This is the companion repository of the Saraga collections. It contains a dump of the data collections, except for the audio and pitch files, which can be downloaded using the provided scripts. It also provides detailed documentation on the format and organization of the data, instructions for accessing the data, Python notebooks and code snippets illustrating different ways to access the data, and ways for the community to contribute to the collections. The repository also hosts the companion website for the Saraga collections.
virtualenv -p python3 env
source env/bin/activate
pip install -r requirements.txt
You can register and get an API token at https://dunya.compmusic.upf.edu/
That's it; the scripts are ready to use.
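
As a minimal sketch of how a script might pick up the token without hard-coding it, the snippet below reads it from an environment variable and builds a request header. The variable name `DUNYA_TOKEN` and the helper `auth_headers` are hypothetical, and the `Authorization: Token ...` header format is an assumption about the Dunya API rather than something stated in this repository; check the scripts and notebooks here for the actual mechanism.

```python
import os


def auth_headers(env_var="DUNYA_TOKEN"):
    """Build an HTTP header dict from a token stored in the environment.

    Both the environment variable name and the "Token <value>" header
    format are assumptions made for illustration.
    """
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(
            f"No API token found; set the {env_var} environment variable."
        )
    return {"Authorization": f"Token {token}"}
```

Keeping the token in the environment avoids committing it to version control alongside the notebooks.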
The following notebooks are available to interact with the collections. Their primary purpose is to provide examples of how to download the data and metadata available in the collections. The notebooks also include some basic examples of working with the collections for analysis.
- dataset_statistics.ipynb : to get the statistics of metadata fields and files in the collections
- concept_statistics.ipynb : to get the number of recordings (MBIDs) corresponding to each metadata field
- download_by_filtering.ipynb : to filter the dataset using metadata fields and associated file types, and subsequently download the filtered dataset
- download_in_bulk.ipynb : to download the complete dataset at once
- MBID : MusicBrainz identifier for the recording
- slug : An identifier for a concept (file or metadata)
The tradition slug is a machine-readable identifier for the tradition that you are analysing.
The possible values are:
Tradition | slug |
---|---|
Hindustani | dunya-hindustani-cc |
Carnatic | dunya-carnatic-cc |
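
The table above can be kept in code as a small lookup, so that a script accepts a human-readable tradition name and resolves it to the slug. This is only an illustrative sketch: the names `TRADITION_SLUGS` and `tradition_slug` are hypothetical and are not taken from the repository's scripts.

```python
# Tradition name -> slug, copied from the table above.
TRADITION_SLUGS = {
    "Hindustani": "dunya-hindustani-cc",
    "Carnatic": "dunya-carnatic-cc",
}


def tradition_slug(name):
    """Return the slug for a tradition name, ignoring case and whitespace."""
    wanted = name.strip().lower()
    for tradition, slug in TRADITION_SLUGS.items():
        if tradition.lower() == wanted:
            return slug
    known = ", ".join(TRADITION_SLUGS)
    raise ValueError(f"Unknown tradition {name!r}; expected one of: {known}")
```

For example, `tradition_slug("carnatic")` resolves to `dunya-carnatic-cc`, while an unrecognised name raises a `ValueError` listing the known traditions.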
The file-type slug is an identifier for the type of file that you want to process.
The possible values are:
Name | slug | type of file |
---|---|---|
audio recording | mp3 | audio |
pitch | pitch | annotation |
tonic | ctonic | annotation |
sama | sama-manual | annotation |
bpm | bpm-manual | annotation |
tempo | tempo-manual | annotation |
sections | sections-manual-p | annotation |
melodic phrases | mphrases-manual | annotation |
vocal recording (multitrack) | multitrack-vocal | audio |
vocal second channel recording (multitrack) | multitrack-vocal-s | audio |
violin recording (multitrack) | multitrack-violin | audio |
ghatam recording (multitrack) | multitrack-ghatam | audio |
mridangam_left recording (multitrack) | multitrack-mridangam-left | audio |
mridangam_right recording (multitrack) | multitrack-mridangam-right | audio |
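
When filtering the dataset by file type, it can help to separate the audio slugs from the annotation slugs. The sketch below encodes the table above as a dictionary and selects slugs by kind; the names `FILE_TYPE_SLUGS` and `slugs_of_kind` are hypothetical helpers for illustration, not part of the provided scripts.

```python
# File-type slug -> type of file, copied from the table above.
FILE_TYPE_SLUGS = {
    "mp3": "audio",
    "pitch": "annotation",
    "ctonic": "annotation",
    "sama-manual": "annotation",
    "bpm-manual": "annotation",
    "tempo-manual": "annotation",
    "sections-manual-p": "annotation",
    "mphrases-manual": "annotation",
    "multitrack-vocal": "audio",
    "multitrack-vocal-s": "audio",
    "multitrack-violin": "audio",
    "multitrack-ghatam": "audio",
    "multitrack-mridangam-left": "audio",
    "multitrack-mridangam-right": "audio",
}


def slugs_of_kind(kind):
    """List the slugs whose type of file is "audio" or "annotation"."""
    if kind not in ("audio", "annotation"):
        raise ValueError(f"Unknown kind {kind!r}; use 'audio' or 'annotation'")
    # Dictionaries preserve insertion order, so the table order is kept.
    return [slug for slug, k in FILE_TYPE_SLUGS.items() if k == kind]
```

Calling `slugs_of_kind("annotation")` returns the seven annotation slugs in table order, starting with `pitch`.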
The metadata slug is an identifier for a metadata type (e.g. release, raga, tala, etc.).
The slugs for the different metadata types in the Hindustani and Carnatic traditions are:
Metadata | Hindustani (slug) | Carnatic (slug) |
---|---|---|
Rāga | raags | raaga |
Tāla | taals | taala |
Form | forms | form |
Laya | layas | NA |
Work | works | work |
Release | release | concert |
Album artist | album_artists | album_artists |
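
Because the same metadata type has different slugs in the two traditions, and Laya has no Carnatic slug, a lookup keyed by both metadata type and tradition avoids mixing them up. The following is a sketch only: the ASCII keys (`"raga"`, `"tala"`, and so on) and the function `metadata_slug` are hypothetical names chosen for this example.

```python
# Metadata type -> {tradition: slug}, copied from the table above.
# None marks a metadata type that is not available (NA) for a tradition.
METADATA_SLUGS = {
    "raga": {"hindustani": "raags", "carnatic": "raaga"},
    "tala": {"hindustani": "taals", "carnatic": "taala"},
    "form": {"hindustani": "forms", "carnatic": "form"},
    "laya": {"hindustani": "layas", "carnatic": None},
    "work": {"hindustani": "works", "carnatic": "work"},
    "release": {"hindustani": "release", "carnatic": "concert"},
    "album_artist": {"hindustani": "album_artists", "carnatic": "album_artists"},
}


def metadata_slug(metadata_type, tradition):
    """Return the slug for a metadata type in the given tradition."""
    try:
        slug = METADATA_SLUGS[metadata_type.lower()][tradition.lower()]
    except KeyError:
        raise ValueError(
            f"Unknown metadata type or tradition: {metadata_type!r}, {tradition!r}"
        ) from None
    if slug is None:
        raise ValueError(f"{metadata_type!r} is not available for {tradition!r}")
    return slug
```

For instance, `metadata_slug("release", "carnatic")` gives `concert`, whereas asking for `"laya"` in the Carnatic tradition raises a `ValueError`.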