# MedleyDB dataset
This project uses the [MedleyDB dataset](https://medleydb.weebly.com/) V1 (and V2). To get access to the dataset, you need to request permission through Zenodo. \
This notebook explores the MedleyDB dataset. An overview of the dataset is available here: https://medleydb.weebly.com/description.html \
Fortunately, the authors provide a sample of the dataset here: https://zenodo.org/record/1438309#.Y9dniuxBy3I. The sample set contains 2 songs from V1:
1. LizNelson_Rainfall
2. Phoenix_ScotchMorris

## Setup
To set up the medleydb library correctly, the `medleydb/medleydb/data/` folder needs to have the structure below:
```
medleydb #sub-module
├── medleydb
│   ├── data
│   │   ├── Annotations
│   │   ├── Audio
│   │   └── Metadata
...
```
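A quick way to confirm the layout before importing the library is to check for the three subfolders shown in the tree. The sketch below assumes only those folder names; `missing_medleydb_folders` is a hypothetical helper, not part of medleydb.

```python
import os

# Subfolders expected inside medleydb/medleydb/data/ (taken from the tree above)
REQUIRED_SUBFOLDERS = ("Annotations", "Audio", "Metadata")


def missing_medleydb_folders(data_path):
    """Return the required subfolders that are absent from data_path (hypothetical helper)."""
    return [
        name
        for name in REQUIRED_SUBFOLDERS
        if not os.path.isdir(os.path.join(data_path, name))
    ]
```

An empty list means the layout matches the tree; otherwise the result names the folders still to be created or copied.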
After extracting, rename the `V1` or `V2` folder to `Audio` and put it inside the `data/` folder above. \
For the MedleyDB Sample (2 songs), simply copy the `Audio` folder into `data/`.
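The rename-and-move step can also be scripted. This is a minimal sketch using only the standard library; `install_audio_folder` and its arguments are hypothetical, and it deliberately refuses to overwrite an existing `Audio` folder rather than trying to merge V1 and V2.

```python
import os
import shutil


def install_audio_folder(extracted_dir, data_path):
    """Move an extracted V1/V2 folder to <data_path>/Audio (hypothetical helper).

    Raises FileExistsError instead of overwriting an existing Audio folder.
    """
    target = os.path.join(data_path, "Audio")
    if os.path.exists(target):
        raise FileExistsError(f"{target} already exists")
    os.makedirs(data_path, exist_ok=True)
    # shutil.move renames when source and target share a filesystem,
    # otherwise it copies the tree and then deletes the source
    shutil.move(extracted_dir, target)
    return target
```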

For accessibility, the examples use only the 2 songs in the sample set.

In [None]:
%load_ext autoreload
%autoreload 2

In [None]:
import numpy as np
import pandas as pd
import os
import random

If `import medleydb` raises `TypeError: load() missing 1 required positional argument: 'Loader'`, you need to downgrade pyyaml:
```
pyyaml==5.4.1
```
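The error comes from PyYAML 6.0, which made the `Loader` argument of `yaml.load` mandatory; code written for PyYAML 5.x that calls `yaml.load(stream)` fails with exactly this `TypeError`. The sketch below uses hypothetical helper names and checks the installed version with `importlib.metadata` to decide whether the pin is needed.

```python
from importlib import metadata


def needs_pyyaml_pin(version):
    """True for PyYAML 6.0 and later, where yaml.load requires an explicit Loader."""
    major = int(version.split(".")[0])
    return major >= 6


def installed_pyyaml_version():
    """Return the installed PyYAML version string, or None if PyYAML is not installed."""
    try:
        return metadata.version("PyYAML")
    except metadata.PackageNotFoundError:
        return None
```

For example, if `installed_pyyaml_version()` returns `'6.0.1'`, then `needs_pyyaml_pin` returns `True` and the `pyyaml==5.4.1` pin applies.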

In [None]:
# path to a forked medleydb repo https://github.com/duotien/medleydb
medleydb_repo_path = f"{os.getcwd()}/../medleydb/"
medleydb_data_path = os.path.join(medleydb_repo_path, 'medleydb/data/')
print(medleydb_data_path)
# MEDLEYDB_PATH must be set before importing medleydb, which reads it at import time
os.environ['MEDLEYDB_PATH'] = medleydb_data_path
import medleydb
from pitch_tracker.utils.medleydb_melody import gen_label

# Generate annotations
The medleydb library also has tools for customizing the annotations. However, they only work when the audio files are present.
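`gen_label` is part of this project's `pitch_tracker` package and its implementation is not shown here. To give a rough idea of what the `hop`, `convert_to_midi=True` and `round_method='round'` options involve, the sketch below assumes that `hop` counts samples at MedleyDB's 44.1 kHz sample rate and uses the standard conversion `m = 69 + 12 * log2(f / 440)`. Both function names are hypothetical; treat this as an illustration, not as what `gen_label` does internally.

```python
import math

SAMPLE_RATE = 44100  # MedleyDB audio is 44.1 kHz


def hop_to_seconds(hop, sample_rate=SAMPLE_RATE):
    """Time between frames in seconds, assuming `hop` is measured in samples."""
    return hop / sample_rate


def hz_to_midi(freq_hz, rounded=True):
    """Convert a frequency in Hz to a MIDI note number (A4 = 440 Hz = 69).

    Returns None for non-positive frequencies, e.g. frames annotated as 0 Hz.
    """
    if freq_hz <= 0:
        return None
    midi = 69 + 12 * math.log2(freq_hz / 440.0)
    # Python's round() sends exact halves to the nearest even integer
    return int(round(midi)) if rounded else midi
```

Under these assumptions, `HOP = 512*5` gives frames about 58 ms apart (`hop_to_seconds(2560)` is roughly 0.058), and `hz_to_midi(440.0)` returns 69.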

In [None]:
HOP = 512*5  # hop size passed to gen_label (512 * 5 = 2560)
output_dir = f'../content/gen_label/{HOP}'
mtracks = medleydb.load_all_multitracks()

# Generate MIDI-converted labels as CSV files for every multitrack
for mtrack in mtracks:
    gen_label(mtrack.track_id, output_dir, hop=HOP, overwrite=True, convert_to_midi=True, round_method='round', to_csv=True)
    # Alternative: the same labels with to_csv=False
    # gen_label(mtrack.track_id, output_dir, hop=HOP, overwrite=True, convert_to_midi=True, round_method='round', to_csv=False)

In [None]:
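# os.listdir returns entries in arbitrary order; sort so the two paths are assigned deterministically
melody_1_path, melody_2_path, *_ = [os.path.join(output_dir, folder_path) for folder_path in sorted(os.listdir(output_dir))]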

In [None]:
melody_2_annotation_paths = []
for root, folders, files in os.walk(melody_2_path):
    melody_2_annotation_paths.extend([os.path.join(root, fpath) for fpath in files])

In [None]:
sample_annotation_path = random.choice(melody_2_annotation_paths)
print(sample_annotation_path)
df = pd.read_csv(sample_annotation_path, names=['start', 'end', 'midi_value'])
df.head(5)

# Visualise
We recommend using Sonic Visualiser to visualize the annotations alongside the audio.

In [None]:
# code to visualize using matplotlib
from matplotlib import pyplot as plt
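The cell above only imports matplotlib. As a starting point, the sketch below draws each `(start, end, midi_value)` row of an annotation as a horizontal segment, similar to a piano roll. It assumes `start` and `end` are in seconds and that every row is a voiced note; `segments_to_xy` and `plot_annotation` are hypothetical helpers.

```python
import math


def segments_to_xy(rows):
    """Turn (start, end, midi_value) rows into x/y lists for a single plot call.

    Each note becomes a horizontal segment; a NaN point after each note makes
    matplotlib break the line instead of joining consecutive notes.
    """
    xs, ys = [], []
    for start, end, midi_value in rows:
        xs += [start, end, math.nan]
        ys += [midi_value, midi_value, math.nan]
    return xs, ys


def plot_annotation(df, ax=None):
    """Plot a DataFrame with columns start, end, midi_value on a matplotlib Axes."""
    # Imported here so segments_to_xy can be used without matplotlib installed
    from matplotlib import pyplot as plt

    if ax is None:
        _, ax = plt.subplots(figsize=(12, 4))
    rows = df[["start", "end", "midi_value"]].itertuples(index=False, name=None)
    xs, ys = segments_to_xy(rows)
    ax.plot(xs, ys, linewidth=2)
    ax.set_xlabel("Time (s)")  # assumption: start/end are in seconds
    ax.set_ylabel("MIDI note")
    return ax
```

With `df` loaded as in the earlier cell, `plot_annotation(df)` followed by `plt.show()` displays the randomly chosen annotation. Separating segments with NaN keeps the whole annotation in one `plot` call instead of one call per note.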