In [1]:
import os
import data

import numpy as np
import librosa as lib
import IPython.display as ipd

# Dataset Creation Example
This notebook gives examples of using the dataset creation tools included in this repo. The LSTM model needs a few minutes of input and output audio for training, but the IDMT-SMT-Audio-Effects dataset stores each effect as single notes that last only a few seconds. The dataset creation tools combine these notes into longer audio that meets the LSTM's requirements. The functions create the clean audio and the audio with effects at the same time, so both contain the same notes with the same durations.
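The pairing idea can be sketched with NumPy alone. In this minimal sketch the "notes" are synthetic sine tones, and the "effect" is a hypothetical `np.tanh` soft clipper standing in for a real effect recording from the dataset. `build_pair` is an illustrative helper, not the API of this repo's `data` module. The point is that both tracks are built from the same note order, so every clean sample lines up with its processed counterpart.

```python
import numpy as np

SRATE = 22050  # assumed sampling rate for this illustration


def sine_note(freq_hz, seconds=2.0, srate=SRATE):
    """Synthetic stand-in for one clean single-note recording."""
    t = np.arange(int(seconds * srate)) / srate
    return 0.3 * np.sin(2 * np.pi * freq_hz * t)


def fake_distortion(note, drive=8.0):
    """Hypothetical effect: tanh soft clipping, scaled so an input of 1.0 maps to 1.0."""
    return np.tanh(drive * note) / np.tanh(drive)


def build_pair(freqs, seconds=2.0, srate=SRATE):
    """Concatenate notes in the same order for the clean and the effect track."""
    clean_notes = [sine_note(f, seconds, srate) for f in freqs]
    effect_notes = [fake_distortion(n) for n in clean_notes]
    return np.concatenate(clean_notes), np.concatenate(effect_notes)


clean, effect = build_pair([196.0, 220.0, 246.94])
print(clean.shape, effect.shape)  # (132300,) (132300,)
```

Because each effect note is derived from (or, in the real dataset, recorded alongside) the clean note at the same position, the two arrays stay sample-aligned however many notes are chained.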

## Loading Data with Librosa
To start, we look at a few Librosa features that make training the model easier.

In [2]:
# Edit to change folder paths
mono_sample_path = os.path.join("dataset", "monophonic", "Samples")

# Loading sound at librosa's default sampling rate (22050 Hz); pass sr=None to keep the file's native rate
audio, srate = lib.load(os.path.join(mono_sample_path, "Distortion", "G61-41101-4412-38066.wav"))
print("Original Sample Length: %d" % audio.shape[0])
print("Original Sample Rate: %d" % srate)
# Uncomment to hear
# ipd.display(ipd.Audio(audio, rate=srate))

# Loading sound at a lower sampling rate
# A lower sampling rate means fewer data points, making training faster at the expense of quality
ds_audio, ds_srate = lib.load(os.path.join(mono_sample_path, "Distortion", "G61-41101-4412-38066.wav"), sr=4000)
print("\nDownsampled Sample Length: %d" % ds_audio.shape[0])
print("Downsampled Sample Rate: %d" % ds_srate)
# Uncomment to hear
# ipd.display(ipd.Audio(ds_audio, rate=ds_srate))

# Mu-law companding compresses the amplitude range and quantizes it
# This reduces the values the model has to predict to only 256 discrete levels
# Once a prediction is made, the values can be expanded back, with some information loss
comp_ds_audio = lib.mu_compress(ds_audio, mu=255)
decomp_ds_audio = lib.mu_expand(comp_ds_audio, mu=255)
print("\nOriginal: %f" % max(ds_audio))
print("Mu Compressed: %d" % max(comp_ds_audio))
print("Mu Expanded: %f" % max(decomp_ds_audio))

Original Sample Length: 44101
Original Sample Rate: 22050

Downsampled Sample Length: 8001
Downsampled Sample Rate: 4000

Original: 0.203066
Mu Compressed: 91
Mu Expanded: 0.198179
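To see what `lib.mu_compress` and `lib.mu_expand` do with `mu=255`, here is a NumPy-only sketch of mu-law companding. It is written to follow the quantization librosa uses as far as we understand it (the continuous mu-law curve, then binning into `mu + 1` integer levels centred on zero), and it reproduces the 91 and 0.198 values printed above for the peak sample. Treat it as an illustration, not a replacement for the library functions.

```python
import numpy as np


def mu_law_compress(x, mu=255):
    """Map samples in [-1, 1] to integers in [-(mu + 1) // 2, (mu + 1) // 2 - 1]."""
    x = np.asarray(x, dtype=float)
    # Continuous mu-law curve: stretches small amplitudes, squeezes large ones
    y = np.sign(x) * np.log1p(mu * np.abs(x)) / np.log1p(mu)
    # Quantize into mu + 1 levels; subtracting (mu + 1) // 2 centres them on zero
    edges = np.linspace(-1, 1, num=mu + 1)
    return np.digitize(y, edges, right=True) - (mu + 1) // 2


def mu_law_expand(q, mu=255):
    """Invert mu_law_compress, up to the quantization error."""
    y = np.asarray(q, dtype=float) * 2.0 / (mu + 1)
    return np.sign(y) / mu * (np.power(1 + mu, np.abs(y)) - 1)


q = mu_law_compress(0.203066)
print(q, mu_law_expand(q))  # 91 and roughly 0.198
```

The logarithmic curve spends more of the 256 levels on quiet samples, which is why the round-trip error is small near zero and largest near full scale.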


## Basic Dataset Creation
The dataset creation tool has a few options for varying the training audio: a simple scale of the available notes, a random tune drawn from all notes, or a random tune drawn from a pentatonic scale.
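The three tune types can be thought of as different ways of choosing pitches from a pool. This is a minimal sketch, not the repo's implementation: it assumes notes are identified by MIDI number, uses a hypothetical range of 40 to 76 (E2 to E5) and a major pentatonic on an assumed root, and `note_sequence` is an illustrative name.

```python
import numpy as np

# Assumed: pitches are MIDI note numbers; this range is illustrative only
AVAILABLE_NOTES = list(range(40, 77))  # hypothetical E2..E5 guitar range
PENTATONIC_STEPS = {0, 2, 4, 7, 9}     # major pentatonic intervals in semitones


def note_sequence(kind, n_notes, root=40, seed=0):
    """Return MIDI notes for a 'scale', 'random' or 'pentatonic' tune."""
    rng = np.random.default_rng(seed)
    if kind == "scale":
        # Walk up the available notes, wrapping around if more are requested
        return [AVAILABLE_NOTES[i % len(AVAILABLE_NOTES)] for i in range(n_notes)]
    if kind == "random":
        pool = AVAILABLE_NOTES
    elif kind == "pentatonic":
        # Keep only notes whose interval above the root is in the pentatonic set
        pool = [n for n in AVAILABLE_NOTES if (n - root) % 12 in PENTATONIC_STEPS]
    else:
        raise ValueError(f"unknown kind: {kind!r}")
    return [int(n) for n in rng.choice(pool, size=n_notes)]


print(note_sequence("pentatonic", 8))
```

Restricting the pool to a pentatonic scale keeps random tunes sounding musical, while the full pool gives the model a wider spread of pitch transitions.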

In [None]:
clean_audio, effect_audio = data.create_data("Distortion", "metadata.csv", mu_comp=False)

print("Clean Audio:")
ipd.display(ipd.Audio(clean_audio, rate=22050))
print("\nDistortion Audio:")
ipd.display(ipd.Audio(effect_audio, rate=22050))

The tool also lets you choose whether to apply mu-law compression. It is on by default and, as shown above, can be turned off with `mu_comp=False`, which makes it easy to test which setting performs better. The sampling rate can be set with the `srate` parameter and defaults to 22050 Hz. The duration can also be specified in seconds, with a default of 120 s (2 minutes).
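The sampling rate and duration options come down to simple arithmetic: the sample count is duration times sampling rate, and resampling scales each note's length by the rate ratio. The helpers below are hypothetical. They assume notes of about 2 s (like the 44101-sample note loaded at 22050 Hz earlier), that the tool fills the duration by chaining whole notes, and that lengths after resampling follow librosa's round-up rule, which matches the 8001 samples shown for the 4000 Hz load.

```python
import math


def total_samples(duration_s, srate):
    """Number of samples needed for duration_s seconds of audio at srate Hz."""
    return int(round(duration_s * srate))


def resampled_length(n_samples, orig_sr, target_sr):
    """Output length after resampling, assuming a ceil(n * target / orig) rule."""
    return int(math.ceil(n_samples * target_sr / orig_sr))


def notes_needed(duration_s, note_s):
    """How many fixed-length notes must be concatenated to fill duration_s."""
    return math.ceil(duration_s / note_s)


print(total_samples(120, 22050))             # 2646000
print(resampled_length(44101, 22050, 4000))  # 8001
print(notes_needed(120, 44101 / 22050))      # 60
```

At the default settings, two minutes of audio is about 2.6 million samples per track; dropping to 4000 Hz cuts that by a factor of about 5.5.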

Below are examples of the other data creation types:

In [None]:
clean_audio, effect_audio = data.create_data("Distortion", "metadata.csv", mu_comp=False, type="random")

print("Clean Random Audio:")
ipd.display(ipd.Audio(clean_audio, rate=22050))
print("\nDistortion Random Audio:")
ipd.display(ipd.Audio(effect_audio, rate=22050))