* **Author**: Andrea Ziqing Gallardo Bendito

* **Project**: Bachelor Thesis - *Separación de fuentes musicales en conjuntos de cámara de música clásica* (Music source separation in classical chamber music ensembles)

* **GitHub Repo**: [MusicSourceSep](https://github.com/andrezg98/MusicSourceSep)

In this notebook we generate training data for music source separation using the [Scaper Python library](https://github.com/justinsalamon/scaper), following this [tutorial](https://source-separation.github.io/tutorial/data/scaper.html#generating-data).



---



## **Project repository download and Library installations**

In [1]:
!pip install scaper -q
!pip install nussl -q
!pip install git+https://github.com/source-separation/tutorial -q

Collecting scaper
  Downloading scaper-1.6.5-py2.py3-none-any.whl (33 kB)
Collecting jams>=0.3.2
  Downloading jams-0.3.4.tar.gz (51 kB)
Collecting pyloudnorm
  Downloading pyloudnorm-0.1.0-py3-none-any.whl (9.3 kB)
Collecting sox==1.4.0
  Downloading sox-1.4.0-py2.py3-none-any.whl (39 kB)
Collecting soxbindings>=1.2.2
  Downloading soxbindings-1.2.3-cp37-cp37m-manylinux2010_x86_64.whl (3.3 MB)
Collecting jsonschema>=3.0.0
  Downloading jsonschema-3.2.0-py2.py3-none-any.whl (56 kB)

### **Imports**

In [2]:
# To keep things clean we'll hide all warnings
import warnings
warnings.filterwarnings('ignore')

In [3]:
# Imports
import numpy as np
import os
from scipy.io.wavfile import write
from IPython.display import Audio, display
from pathlib import Path
import shutil

# Scaper
import scaper

# Nussl
import nussl
from common import viz

SoX could not be found!

    If you do not have SoX, proceed here:
     - - - http://sox.sourceforge.net/ - - -

    If you do (or think that you should) have SoX, double-check your
    path variables.
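
The warning above appears to come from the `sox` Python package, which looks for the SoX command-line tool on the system path. A minimal sketch for checking this from the notebook follows; the `find_sox` helper is hypothetical, and the install hint in the message is an assumption about a Debian-based environment such as Colab.

```python
import shutil


def find_sox():
    """Return the full path of the `sox` executable, or None if it is not on PATH."""
    return shutil.which("sox")


sox_path = find_sox()
if sox_path is None:
    # Assumption: on Debian-based systems the binary is provided by the `sox` package.
    print("SoX not found on PATH (on Debian-based systems, try: apt-get install sox)")
else:
    print(f"SoX found at {sox_path}")
```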
    


In [4]:
# Helper to render Markdown (e.g. bold text) from code cells
from IPython.display import Markdown, display
def printmd(string):
    display(Markdown(string))

### **Project repository download**

In [None]:
from getpass import getpass

# IMPORTANT: Change this when publishing the repository!

user = 'andrezg98'
password = getpass('GitHub repo password')
u = user; p = password; 
!git clone https://$u:$p@github.com/$u/MusicSourceSep.git
%cd MusicSourceSep/lib
!ls

del p, password

GitHub repo password··········
Cloning into 'MusicSourceSep'...
remote: Enumerating objects: 68, done.
remote: Counting objects: 100% (68/68), done.
remote: Compressing objects: 100% (57/57), done.
remote: Total 68 (delta 19), reused 21 (delta 4), pack-reused 0
Unpacking objects: 100% (68/68), done.
/content/drive/MyDrive/ING. SONIDO E IMAGEN/TFG/Preview Tests/Datasets/URMP-Dataset/MusicSourceSep/lib
feature_computation.py	urmp_dataset.py




---



## **Dataset Downloads**

* ***Bach10 Dataset***
  
    Accessing the dataset stored in my Google Drive:

In [5]:
%cd /content/drive/MyDrive/ING. SONIDO E IMAGEN/TFG/Datasets

/content/drive/MyDrive/ING. SONIDO E IMAGEN/TFG/Datasets


> **Note**: To download the dataset for the first time, fill in the request form [here](https://docs.google.com/forms/d/e/1FAIpQLSfJ1IdB7Ws2_m0wkkvS1hGm5GevGS3QmqBIoxiGDbw93yoPLQ/viewform?embedded=true&formkey=dGU3cmRlb1Q4RU5zTGNZeHUyRGFwaWc6MQ). The authors ask you to complete this short form so they can keep track of how the dataset is used. Once downloaded, you can unzip it as follows; the dataset will be saved in the path you have specified.

In [None]:
# !unzip bach10_dataset_compressed.zip
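
As an alternative to the shell command, the archive could also be extracted from Python with the standard `zipfile` module. This is only a sketch: the `unzip_dataset` helper is hypothetical, and the archive name in the usage comment is the one from the cell above.

```python
import zipfile
from pathlib import Path


def unzip_dataset(zip_path, dest_dir="."):
    """Extract zip_path into dest_dir and return the sorted top-level entry names."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest)
        # Member names inside a zip archive always use forward slashes
        return sorted({name.split("/")[0] for name in archive.namelist()})


# Usage (uncomment once the archive has been downloaded):
# unzip_dataset("bach10_dataset_compressed.zip")
```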



---



##  **Prepare the source material for Scaper**
- ***Foreground files***: bassoon, clarinet, saxophone and violin (the dataset file names spell the saxophone stem `saxphone`, so the code keeps that spelling).

- ***Background files***: empty.

In [6]:
# Declare the variables
# Note: 'saxphone' is the spelling used in the Bach10 file names
instruments = ['bassoon', 'clarinet', 'saxphone', 'violin']
AuSep = {name: [] for name in instruments}    # Separate audio list of each instrument
AuMix = []                                    # Mix audio list

# Name of the pieces: every sub-folder of Bach10, in sorted order
pieces = sorted(
    d for d in os.listdir("Bach10")
    if os.path.isdir(os.path.join("Bach10", d))
)

# We go through the files in each piece folder and save them in each list
for piece in pieces:
    for name in os.listdir(os.path.join("Bach10", piece)):
        if not name.endswith('.wav'):
            continue
        filename = piece + "/" + name
        parts = filename.split('-')
        if len(parts) == 3:
            # e.g. '01-AchGottundHerr/01-AchGottundHerr.wav' is the full mix
            AuMix.append(filename)
        elif len(parts) > 3:
            # e.g. '01-AchGottundHerr/01-AchGottundHerr-bassoon.wav' is a stem
            instrument = os.path.splitext(parts[3])[0]
            if instrument in AuSep:
                AuSep[instrument].append(filename)

# Sort once at the end so every list follows the order of the pieces
AuMix.sort()
for files in AuSep.values():
    files.sort()

printmd("**Name of the pieces:**")
print(pieces)
printmd("**Separate Audio:**")
print(AuSep)
printmd("**Mix Audio:**")
print(AuMix)

**Name of the pieces:**

['01-AchGottundHerr', '02-AchLiebenChristen', '03-ChristederdubistTagundLicht', '04-ChristeDuBeistand', '05-DieNacht', '06-DieSonne', '07-HerrGott', '08-FuerDeinenThron', '09-Jesus', '10-NunBitten', 'Code']


**Separate Audio:**

{'bassoon': ['01-AchGottundHerr/01-AchGottundHerr-bassoon.wav', '02-AchLiebenChristen/02-AchLiebenChristen-bassoon.wav', '03-ChristederdubistTagundLicht/03-ChristederdubistTagundLicht-bassoon.wav', '04-ChristeDuBeistand/04-ChristeDuBeistand-bassoon.wav', '05-DieNacht/05-DieNacht-bassoon.wav', '06-DieSonne/06-DieSonne-bassoon.wav', '07-HerrGott/07-HerrGott-bassoon.wav', '08-FuerDeinenThron/08-FuerDeinenThron-bassoon.wav', '09-Jesus/09-Jesus-bassoon.wav', '10-NunBitten/10-NunBitten-bassoon.wav'], 'clarinet': ['01-AchGottundHerr/01-AchGottundHerr-clarinet.wav', '02-AchLiebenChristen/02-AchLiebenChristen-clarinet.wav', '03-ChristederdubistTagundLicht/03-ChristederdubistTagundLicht-clarinet.wav', '04-ChristeDuBeistand/04-ChristeDuBeistand-clarinet.wav', '05-DieNacht/05-DieNacht-clarinet.wav', '06-DieSonne/06-DieSonne-clarinet.wav', '07-HerrGott/07-HerrGott-clarinet.wav', '08-FuerDeinenThron/08-FuerDeinenThron-clarinet.wav', '09-Jesus/09-Jesus-clarinet.wav', '10-NunBitten/10-NunBitten-clarin

**Mix Audio:**

['01-AchGottundHerr/01-AchGottundHerr.wav', '02-AchLiebenChristen/02-AchLiebenChristen.wav', '03-ChristederdubistTagundLicht/03-ChristederdubistTagundLicht.wav', '04-ChristeDuBeistand/04-ChristeDuBeistand.wav', '05-DieNacht/05-DieNacht.wav', '06-DieSonne/06-DieSonne.wav', '07-HerrGott/07-HerrGott.wav', '08-FuerDeinenThron/08-FuerDeinenThron.wav', '09-Jesus/09-Jesus.wav', '10-NunBitten/10-NunBitten.wav']
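
The listing above relies on the Bach10 naming scheme visible in the output: the mix is named after its folder, and each stem appends `-<instrument>` to that name. A minimal sketch of that rule as a pure function could look as follows; `classify` and `INSTRUMENTS` are hypothetical names introduced for illustration, not part of the notebook.

```python
from pathlib import PurePosixPath

# 'saxphone' is the spelling used in the dataset file names
INSTRUMENTS = ("bassoon", "clarinet", "saxphone", "violin")


def classify(relative_path):
    """Classify a path like 'piece/file.wav' as ('mix', None), ('stem', name) or ('other', None)."""
    path = PurePosixPath(relative_path)
    if path.suffix != ".wav":
        return ("other", None)
    piece = path.parent.name
    if path.stem == piece:
        return ("mix", None)
    for instrument in INSTRUMENTS:
        if path.stem == f"{piece}-{instrument}":
            return ("stem", instrument)
    return ("other", None)


print(classify("01-AchGottundHerr/01-AchGottundHerr-bassoon.wav"))
```

Comparing against the folder name rather than counting hyphens would keep working if a piece title itself contained a hyphen.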


We create the `Bach10_Scaper` folder, in which all new music mixes generated with Scaper will be stored, together with its `foreground` and `background` subfolders.

In [None]:
# !mkdir Bach10_Scaper
# !mkdir Bach10_Scaper/foreground
# !mkdir Bach10_Scaper/background

Copy the separate audio files of each instrument into the foreground folder.

> **Note**: Beforehand, I also had to create an empty folder for each instrument inside the foreground folder.
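
The folder layout described here (a `background` folder plus one `foreground` subfolder per instrument) could also be created from Python instead of by hand. The following is a sketch under that assumption; `make_scaper_dirs` is a hypothetical helper and the root path in the usage comment is the one used in this notebook.

```python
from pathlib import Path


def make_scaper_dirs(root, instruments):
    """Create root/background and root/foreground/<instrument>; return the label folders."""
    root = Path(root)
    (root / "background").mkdir(parents=True, exist_ok=True)
    for instrument in instruments:
        (root / "foreground" / instrument).mkdir(parents=True, exist_ok=True)
    return sorted(p.name for p in (root / "foreground").iterdir() if p.is_dir())


# Usage:
# make_scaper_dirs("Bach10_Scaper", ["bassoon", "clarinet", "saxphone", "violin"])
```

Because `exist_ok=True` is set, re-running the cell does not fail when the folders already exist.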

In [10]:
%cd Bach10/

/content/drive/MyDrive/ING. SONIDO E IMAGEN/TFG/Datasets/Bach10


In [12]:
# Copy every stem into its instrument folder, renamed after its piece
# (e.g. foreground/bassoon/01-AchGottundHerr.wav)
for instrument, paths in AuSep.items():
    for piece, path in zip(pieces, paths):
        shutil.copy(path, "../Bach10_Scaper/foreground/" + instrument + "/" + piece + ".wav")

Assign the paths of the foreground and background folders to variables.

In [13]:
fg_folder = Path("../Bach10_Scaper/foreground")
bg_folder = Path("../Bach10_Scaper/background")

Let's check the contents of the folders.

In [14]:
for folder in os.listdir(fg_folder):
    if folder[0] != '.':  # to ignore system folders
        stem_files = os.listdir(os.path.join(fg_folder, folder))
        printmd(f"\n**{folder}**\tfolder contains **{len(stem_files)}** audio files:\n")
        for sf in sorted(stem_files)[:5]:
            print(f"\t\t{sf}")
        print("\t\t...")


**bassoon**	folder contains **10** audio files:


		01-AchGottundHerr.wav
		02-AchLiebenChristen.wav
		03-ChristederdubistTagundLicht.wav
		04-ChristeDuBeistand.wav
		05-DieNacht.wav
		...



**clarinet**	folder contains **10** audio files:


		01-AchGottundHerr.wav
		02-AchLiebenChristen.wav
		03-ChristederdubistTagundLicht.wav
		04-ChristeDuBeistand.wav
		05-DieNacht.wav
		...



**saxphone**	folder contains **10** audio files:


		01-AchGottundHerr.wav
		02-AchLiebenChristen.wav
		03-ChristederdubistTagundLicht.wav
		04-ChristeDuBeistand.wav
		05-DieNacht.wav
		...



**violin**	folder contains **10** audio files:


		01-AchGottundHerr.wav
		02-AchLiebenChristen.wav
		03-ChristederdubistTagundLicht.wav
		04-ChristeDuBeistand.wav
		05-DieNacht.wav
		...


> **Note**: The name of each stem audio file matches the name of the song to which it belongs. We will use this to create *“coherent mixtures”*, i.e., music mixtures where all the stems come from the same song and are temporally aligned.
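
Since coherent mixing depends on every instrument folder containing the same file names, it may be worth verifying that before generating data. A minimal sketch, assuming the foreground layout shown above (`missing_stems` is a hypothetical helper):

```python
from pathlib import Path


def missing_stems(fg_folder):
    """Return {label: [missing file names]} for label folders lacking a file found in another one."""
    folders = [p for p in Path(fg_folder).iterdir()
               if p.is_dir() and not p.name.startswith(".")]
    names = {p.name: {f.name for f in p.glob("*.wav")} for p in folders}
    all_names = set().union(*names.values())
    return {label: sorted(all_names - found)
            for label, found in names.items()
            if all_names - found}


# Usage: an empty dict means every piece has a stem for every instrument
# print(missing_stems("../Bach10_Scaper/foreground"))
```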





---



## **Coherent Mixing**

Stems in *“coherent mixtures”* come from the same song and are temporally aligned.

> **Note**: We set the stem duration to 25 seconds, as this is the approximate duration of our original audio samples.


In [15]:
# Create a template of probabilistic event parameters
template_event_parameters = {
    'label': ('const', 'bassoon'),           # set the label value explicitly using a constant
    'source_file': ('choose', []),           # choose the source file randomly from all files in the folder
    'source_time': ('uniform', 0, 7),        # sample the source (stem) audio starting at a time between 0 and 7 seconds
    'event_time': ('const', 0),              # always add the stem at time 0 in the mixture
    'event_duration': ('const', 25.0),       # set the stem duration to match the mixture duration
    'snr': ('uniform', -5, 5),               # choose an SNR for the stem uniformly between -5 and 5 dB
    'pitch_shift': ('uniform', -2, 2),       # apply a random pitch shift between -2 and 2 semitones
    'time_stretch': ('uniform', 0.8, 1.2)    # apply a random time stretch between 0.8 (faster) and 1.2 (slower)
}
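
To make the distribution tuples more concrete, here is a toy sampler that draws one value per parameter. It is an illustration only, not Scaper's own implementation: `sample` and `draw_event` are hypothetical helpers, only three distribution kinds are covered, and `'choose'` is given an explicit list here (in the template above, the empty list means all files in the folder).

```python
import random


def sample(spec, rng):
    """Draw one value from a ('const' | 'uniform' | 'choose', ...) tuple."""
    kind, *args = spec
    if kind == "const":
        return args[0]
    if kind == "uniform":
        low, high = args
        return rng.uniform(low, high)
    if kind == "choose":
        return rng.choice(args[0])
    raise ValueError(f"unsupported distribution: {kind}")


def draw_event(template, seed):
    """Sample every parameter of the template with a seeded random generator."""
    rng = random.Random(seed)
    return {name: sample(spec, rng) for name, spec in template.items()}


toy_template = {
    "label": ("const", "bassoon"),
    "source_file": ("choose", ["01-AchGottundHerr.wav", "06-DieSonne.wav"]),
    "source_time": ("uniform", 0, 7),
    "pitch_shift": ("uniform", -2, 2),
}
event = draw_event(toy_template, seed=0)
print(event)
```

Drawing with the same seed reproduces the same values, which is the property the mixing function below relies on.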

In [16]:
# Define a function that returns a coherent mixture.
def coherent(fg_folder, bg_folder, event_template, seed):
    """
    This function takes the paths to the dataset folders and a random seed,
    and returns a COHERENT mixture (audio + annotations).
    
    Parameters
    ----------
    fg_folder : str
        Path to the foreground source material for Bach10
    bg_folder : str
        Path to the background material for Bach10 (empty folder)
    event_template: dict
        Dictionary containing a template of probabilistic event parameters
    seed : int or np.random.RandomState()
        Seed for setting the Scaper object's random state. Different seeds will 
        generate different mixtures for the same source material and event template.
        
    Returns
    -------
    mixture_audio : np.ndarray
        Audio signal for the mixture
    mixture_jams : jams.JAMS
        JAMS annotation for the mixture
    annotation_list : list
        Simple annotation in list format
    stem_audio_list : list
        List containing the audio signals of the stems that comprise the mixture
    """
        
    # Create scaper object and seed random state
    sc = scaper.Scaper(
        duration=event_template["event_duration"][1],   # use the template passed in, not the global one
        fg_path=str(fg_folder),
        bg_path=str(bg_folder),
        random_state=seed
    )
    
    # Set sample rate, reference dB, and channels (mono)
    sc.sr = 44100
    sc.ref_db = -20
    sc.n_channels = 1
    
    # Copy the template so we can change it
    event_parameters = event_template.copy()    
    
    # Instantiate the template once to randomly choose a song, a start time for the sources, 
    # a pitch shift and a time stretch. These values must remain COHERENT across all stems
    sc.add_event(**event_parameters)
    event = sc._instantiate_event(sc.fg_spec[0])
    
    # Reset the Scaper object's event specification
    sc.reset_fg_event_spec()
    
    # Replace the distributions for source time, pitch shift and time stretch with the constant 
    # values we just sampled, to ensure our added events (stems) are coherent.              
    event_parameters['source_time'] = ('const', event.source_time)
    event_parameters['pitch_shift'] = ('const', event.pitch_shift)
    event_parameters['time_stretch'] = ('const', event.time_stretch)

    # Iterate over the four stems (bassoon, clarinet, saxphone, violin) and 
    # add COHERENT events.                                         
    labels = ['bassoon', 'clarinet', 'saxphone', 'violin']
    for label in labels:
        
        # Set the label to the stem we are adding
        event_parameters['label'] = ('const', label)
        
        # To ensure coherent source files (all from the same song), we leverage the fact that all the stems 
        # from the same song have the same filename. All we have to do is replace the stem file's parent 
        # folder name from "bassoon" to the label we are adding in this iteration of the loop, 
        # which will give the correct path to the stem source file for this current label.
        coherent_source_file = event.source_file.replace('bassoon', label)

        # print(coherent_source_file)
        event_parameters['source_file'] = ('const', coherent_source_file)
        
        # Add the event using the modified, COHERENT, event parameters
        sc.add_event(**event_parameters)
    
    # Generate and return the mixture audio, stem audio, and annotations
    return sc.generate(fix_clipping=True)
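
# The function above derives each stem path with `str.replace('bassoon', label)`, which
# rewrites every occurrence of 'bassoon' in the path. That works for the folder layout used
# here. As a hedged alternative, the sketch below swaps only the parent folder name;
# `sibling_stem` is a hypothetical helper and is not used elsewhere in this notebook.

```python
from pathlib import PurePosixPath


def sibling_stem(source_file, label):
    """Return the path of the same file name inside the sibling `label` folder."""
    path = PurePosixPath(source_file)
    return str(path.parent.parent / label / path.name)


print(sibling_stem("../Bach10_Scaper/foreground/bassoon/06-DieSonne.wav", "violin"))
```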

First, double-check that our paths and template are correct.

In [18]:
printmd("**Foreground Folder Path:**")
print(str(fg_folder))
printmd("**Background Folder Path:**")
print(str(bg_folder))
print("-------------------------------------")
printmd("**Template Event Parameters:**") 
print(str(template_event_parameters))

**Foreground Folder Path:**

../Bach10_Scaper/foreground


**Background Folder Path:**

../Bach10_Scaper/background
-------------------------------------


**Template Event Parameters:**

{'label': ('const', 'bassoon'), 'source_file': ('choose', []), 'source_time': ('uniform', 0, 7), 'event_time': ('const', 0), 'event_duration': ('const', 25.0), 'snr': ('uniform', -5, 5), 'pitch_shift': ('uniform', -2, 2), 'time_stretch': ('uniform', 0.8, 1.2)}


Let's generate some coherent mixtures with our code and save them in a new folder (Bach10_Augmented) so we can work with the data later.

In [None]:
# Generate 250 coherent mixtures
for seed in range(250):
    mixture_audio, mixture_jam, annotation_list, stem_audio_list = coherent(
        fg_folder, 
        bg_folder, 
        template_event_parameters, 
        seed)
    
    # print("Mixture: ")
    # display(Audio(data=mixture_audio.T, rate=44100))

    # Extract the annotation data from the JAMS object
    ann = mixture_jam.annotations.search(namespace='scaper')[0]

    # All stems share the same file name, so the first event identifies the piece
    first_event = next(iter(ann.data))
    source_path = os.path.basename(first_event.value['source_file'])

    mix_source = os.path.splitext(source_path)[0] + "_" + str(seed)
    print("Augmenting mix_source: " + mix_source)

    # Create the output folder (and Bach10_Augmented itself) if needed
    out_dir = os.path.join("Bach10_Augmented", mix_source)
    os.makedirs(out_dir, exist_ok=True)

    # Peak-normalize the mixture and save it as 16-bit PCM
    scaled_mix = np.int16(mixture_audio / np.max(np.abs(mixture_audio)) * 32767)
    write(os.path.join(out_dir, mix_source + ".wav"), 44100, scaled_mix)

    # Iterate over the annotation and corresponding stem audio data.
    # Note: each stem is peak-normalized on its own, so the saved stems do not
    # keep the relative levels (SNR) they had in the mixture.
    for obs, stem_audio in zip(ann.data, stem_audio_list):
        scaled_sep = np.int16(stem_audio / np.max(np.abs(stem_audio)) * 32767)
        write(os.path.join(out_dir, mix_source + "-" + obs.value['label'] + ".wav"), 44100, scaled_sep)

Augmenting mix_source: 06-DieSonne_0
Augmenting mix_source: 06-DieSonne_1
Augmenting mix_source: 09-Jesus_2
Augmenting mix_source: 09-Jesus_3
Augmenting mix_source: 08-FuerDeinenThron_4
Augmenting mix_source: 04-ChristeDuBeistand_5
Augmenting mix_source: 10-NunBitten_6
Augmenting mix_source: 05-DieNacht_7
Augmenting mix_source: 04-ChristeDuBeistand_8
Augmenting mix_source: 06-DieSonne_9
Augmenting mix_source: 10-NunBitten_10
Augmenting mix_source: 10-NunBitten_11
Augmenting mix_source: 07-HerrGott_12
Augmenting mix_source: 03-ChristederdubistTagundLicht_13
Augmenting mix_source: 09-Jesus_14
Augmenting mix_source: 09-Jesus_15
Augmenting mix_source: 10-NunBitten_16
Augmenting mix_source: 02-AchLiebenChristen_17
Augmenting mix_source: 04-ChristeDuBeistand_18
Augmenting mix_source: 06-DieSonne_19
Augmenting mix_source: 04-ChristeDuBeistand_20
Augmenting mix_source: 10-NunBitten_21
Augmenting mix_source: 06-DieSonne_22
Augmenting mix_source: 04-ChristeDuBeistand_23
Augmenting mix_source: 03
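
The saving loop converts floating-point audio to 16-bit integers by dividing each signal by its own peak. A small sketch of that conversion is given below, with two additions that are assumptions rather than part of the original cell: a guard for all-zero (silent) signals, and an optional shared `peak` so that stems can be scaled by the same factor as their mixture. `to_int16` is a hypothetical helper.

```python
import numpy as np


def to_int16(audio, peak=None):
    """Scale float audio to int16; by default the signal's own peak maps to 32767."""
    audio = np.asarray(audio, dtype=np.float64)
    if peak is None:
        peak = np.max(np.abs(audio)) if audio.size else 0.0
    if peak == 0:
        # Silent input: avoid dividing by zero
        return np.zeros(audio.shape, dtype=np.int16)
    scaled = np.clip(audio / peak, -1.0, 1.0) * 32767
    return scaled.astype(np.int16)


mixture = np.array([0.0, 0.5, -1.0])
stem = np.array([0.0, 0.25, -0.5])

print(to_int16(mixture))                                # own peak
print(to_int16(stem, peak=np.max(np.abs(mixture))))     # shared peak keeps relative levels
```

With a shared peak the stem stays at half the level of the mixture in this toy example, whereas normalizing it on its own would bring it to full scale.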

Now that we have generated several mixtures, let's see how we can connect **scaper** with the **nussl** library.



---



## **Plugging Scaper into nussl: generating training data on-the-fly**

In [19]:
def generate_mixture(dataset, fg_folder, bg_folder, event_template, seed):
    
    # hide warnings
    with warnings.catch_warnings():
        warnings.filterwarnings('ignore')
        
        # generate a coherent mixture (the only mixing mode used in this notebook)
        data = coherent(fg_folder, bg_folder, event_template, seed)
            
    # unpack the data
    mixture_audio, mixture_jam, annotation_list, stem_audio_list = data
    
    # convert mixture to nussl format
    mix = dataset._load_audio_from_array(
        audio_data=mixture_audio, sample_rate=dataset.sample_rate
    )
    
    # convert stems to nussl format
    sources = {}
    ann = mixture_jam.annotations.search(namespace='scaper')[0]
    for obs, stem_audio in zip(ann.data, stem_audio_list):
        key = obs.value['label']
        sources[key] = dataset._load_audio_from_array(
            audio_data=stem_audio, sample_rate=dataset.sample_rate
        )
    
    # store the mixture, stems and JAMS annotation in the format expected by nussl
    output = {
        'mix': mix,
        'sources': sources,
        'metadata': mixture_jam
    }
    return output

In [20]:
# Convenience class so we don't need to enter the fg_folder, bg_folder, and template each time
class MixClosure:
    
    def __init__(self, fg_folder, bg_folder, event_template):
        self.fg_folder = fg_folder
        self.bg_folder = bg_folder
        self.event_template = event_template
        
    def __call__(self, dataset, seed):
        return generate_mixture(dataset, self.fg_folder, self.bg_folder, self.event_template, seed)
    
# Initialize our mixing function with our specific source material and event template
mix_func = MixClosure(fg_folder, bg_folder, template_event_parameters)

# Create a nussl OnTheFly data generator
on_the_fly = nussl.datasets.OnTheFly(
    num_mixtures=1000,
    mix_closure=mix_func
)
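
To illustrate the calling convention that `MixClosure.__call__(dataset, seed)` is written for, here is a toy stand-in for an on-the-fly dataset. It is not nussl's implementation: `ToyOnTheFly` and `toy_mix` are hypothetical, and the assumption is simply that item `i` is produced by calling the closure with the dataset and `i` as the seed.

```python
import random


class ToyOnTheFly:
    """Stand-in dataset: item i is generated on demand by mix_closure(self, i)."""

    def __init__(self, num_mixtures, mix_closure):
        self.num_mixtures = num_mixtures
        self.mix_closure = mix_closure

    def __len__(self):
        return self.num_mixtures

    def __getitem__(self, index):
        if not 0 <= index < self.num_mixtures:
            raise IndexError(index)
        return self.mix_closure(self, index)


def toy_mix(dataset, seed):
    """Pretend mixing function: the seed fully determines the sampled values."""
    rng = random.Random(seed)
    return {"seed": seed, "source_time": rng.uniform(0, 7)}


toy = ToyOnTheFly(num_mixtures=3, mix_closure=toy_mix)
print(toy[0])
print(toy[1])
```

Under this convention nothing is stored on disk, and asking for the same index twice regenerates the same mixture because the index doubles as the random seed.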

Let’s use our on_the_fly generator to visualize and listen to some generated mixtures.

In [22]:
for i in range(3):
    item = on_the_fly[i]
    mix = item['mix']
    sources = item['sources']
    # bassoon = {'bassoon': item['sources']['bassoon']}
    # clarinet = {'clarinet': item['sources']['clarinet']}
    # violin = {'violin': item['sources']['violin']}
    # saxphone = {'saxphone': item['sources']['saxphone']}

    viz.show_sources(sources)
    # viz.show_sources(bassoon)
    # viz.show_sources(clarinet)
    # viz.show_sources(violin)
    # viz.show_sources(saxphone)




---

