
# Music Unmixing using MUSDB
## 3. The Art of Audio Track Sampling

![](https://sisec18.unmix.app/static/img/hero_header.4f28952.svg)

Even though audio is often processed as matrices in the form of spectrograms, it remains time-series data of variable length.
Music datasets typically contain _few tracks_, each several minutes long. We therefore usually want to train a DNN on __smaller excerpts__ instead of whole tracks, because...

1. non-dynamic models such as CNNs can be used without masking
2. sequence models such as LSTMs are often unable to exploit context spanning several minutes
3. the number of training samples increases, which can be seen as a form of _data augmentation_.

So let's assume we have $n$ tracks and we want to yield $k$ excerpts (with or without overlap) from these tracks. In this notebook we address the various use cases and show how they can be implemented using the [pescador](https://pescador.readthedocs.io/en/stable/) package.
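To make the relation between $n$ and $k$ concrete, assume that track $i$ has length $N_i \ge L$ and that excerpts of length $L$ start every $H$ samples (the hop size: $H = L$ gives non-overlapping excerpts, $H < L$ overlapping ones). The number of full-length excerpts is then

$$k = \sum_{i=1}^{n} \left( \left\lfloor \frac{N_i - L}{H} \right\rfloor + 1 \right)$$

With the toy tracks defined below ($n = 8$, $N_i = 10$, $L = 6$, $H = 3$), this gives $k = 8 \cdot 2 = 16$ excerpts.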


### Installs

In [0]:
import numpy as np
import random
!pip install --upgrade pescador

Requirement already up-to-date: pescador in /usr/local/lib/python3.6/dist-packages (2.0.1)


### Initialization and Parameters

We first define 8 audio tracks with distinguishable content. To make the results easier to assess, we use text data instead of audio.


In [0]:
tracks = [
    'lalalalala', 'lololololo', 'lilililili', 'lululululu', 
    'lelelelele', 'lülülülülü', 'lälälälälä', 'lölölölölö'
]

Then we define the three parameters that control how we step through the tracks.

In [0]:
#@title Set Sampling Parameters
excerpt_length = 6 #@param {type:"slider", min:1, max:10, step:1}
excerpt_hop = 3 #@param {type:"slider", min:1, max:10, step:1}
batch_size = 4 #@param {type:"slider", min:1, max:10, step:1}
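As a sanity check on these parameters, the sketch below lists the excerpt start positions on a regular hop grid and counts how many full batches one epoch contains. The helper `excerpt_starts` is hypothetical (it is not part of pescador). The values are hard-coded to the default slider settings rather than read from the variables above, so running it does not overwrite your own settings.

```python
def excerpt_starts(track_len, length, hop):
    """Start offsets of all full-length excerpts on a regular hop grid.

    The upper bound is track_len - length + 1 so that the last valid start
    position (track_len - length) is included when it lies on the grid.
    """
    return list(range(0, track_len - length + 1, hop))

# Default slider values: 8 text "tracks" of 10 characters each,
# excerpt_length = 6, excerpt_hop = 3, batch_size = 4.
starts_per_track = excerpt_starts(10, 6, 3)
total_excerpts = 8 * len(starts_per_track)
full_batches = total_excerpts // 4

print(starts_per_track)  # [0, 3]
print(total_excerpts)    # 16
print(full_batches)      # 4
```

With the defaults this gives two excerpts per track and four full batches per epoch, which matches the number of batches printed per epoch in the cells below.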


# Training

There are several ways to generate these samples for one training epoch.

## Naïve Random Sampling

![](https://sigsep.github.io/ismir2018_tutorial/assets/twsw.gif  =400x350)

One of the simplest ways is to draw `batch_size` excerpts at random for each batch (first a random track, then a random start position within it) and to repeat this procedure for a fixed number of iterations per epoch.

Since the same track, and even the same excerpt, can be drawn more than once, we call this sampling __tracks with replacement__ and __excerpts/samples with replacement__.
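The sampling loop in the next cell can also be packaged as a reusable generator. The sketch below is a hypothetical helper (`random_batches` is neither from pescador nor from the original notebook); it uses its own `random.Random(seed)` instance so that a run can be reproduced without touching Python's global random state.

```python
import random

def random_batches(tracks, n_batches, batch_size, excerpt_length, seed=None):
    """Yield n_batches batches of excerpts drawn *with replacement*.

    Track and start offset are drawn independently for every excerpt,
    so the same track (or even the same excerpt) can occur repeatedly.
    """
    rng = random.Random(seed)  # private RNG: reproducible, global state untouched
    for _ in range(n_batches):
        batch = []
        for _ in range(batch_size):
            track = rng.choice(tracks)
            # randint is inclusive, so the last full-length excerpt can be drawn too
            start = rng.randint(0, len(track) - excerpt_length)
            batch.append(track[start:start + excerpt_length])
        yield batch

demo_tracks = ['lalalalala', 'lololololo', 'lilililili', 'lululululu']
for batch in random_batches(demo_tracks, n_batches=2, batch_size=4,
                            excerpt_length=6, seed=0):
    print(batch)
```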

In [0]:
# random sampling never runs out of data, so we have to fix
# the number of batches that make up one epoch
nb_random_batches = 4
# iterate over data
for epoch in range(3):
    print("epoch", epoch + 1)

    for k in range(nb_random_batches):
        batch = []
        for sample in range(batch_size):
            # draw a random track, then a random start position within it
            track = random.choice(tracks)
            start = random.randint(0, len(track) - excerpt_length)
            batch.append(track[start:start + excerpt_length])
        print(batch)


epoch 1
['lululu', 'lololo', 'lololo', 'lälälä']
['lälälä', 'lilili', 'lululu', 'lalala']
['ülülül', 'alalal', 'ülülül', 'lalala']
['elelel', 'äläläl', 'lololo', 'elelel']
epoch 2
['lülülü', 'ölölöl', 'ululul', 'lälälä']
['ölölöl', 'lilili', 'lololo', 'lalala']
['alalal', 'lelele', 'äläläl', 'lululu']
['lelele', 'ililil', 'alalal', 'lololo']
epoch 3
['lölölö', 'lololo', 'lelele', 'lälälä']
['lelele', 'ölölöl', 'lololo', 'lilili']
['ililil', 'lululu', 'lilili', 'alalal']
['lilili', 'elelel', 'lululu', 'ululul']


### Pro

*  Simple to implement
*  Scales to large amounts of data, since it doesn't require knowing the track length

### Cons

* It is likely that we do not see all data in one epoch
* Possible redundancies within a batch
* There are [indications](https://arxiv.org/pdf/1202.4184v1.pdf) that with-replacement sampling performs worse than without-replacement sampling, but this analysis covers least-squares (convex) problems, not the non-convex problems of DNN training.
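The first two drawbacks can be quantified for the naive sampler. The sketch below is our own back-of-the-envelope estimate, assuming equally long tracks so that every possible excerpt is equally likely: after $m$ uniform draws from $K$ candidates, the expected fraction of distinct excerpts seen is $1 - (1 - 1/K)^m$, and the chance that a batch repeats a track is the classic birthday-problem probability. The helper names are hypothetical.

```python
import random

def expected_coverage(n_candidates, n_draws):
    """Expected fraction of distinct excerpts seen after n_draws uniform draws with replacement."""
    return 1.0 - (1.0 - 1.0 / n_candidates) ** n_draws

def prob_duplicate_track(n_tracks, batch_size):
    """Probability that a batch drawn with replacement contains some track twice."""
    p_unique = 1.0
    for i in range(batch_size):
        p_unique *= (n_tracks - i) / n_tracks
    return 1.0 - p_unique

# Naive sampler with the defaults: 8 tracks x 5 possible start offsets (0..4)
# = 40 distinct excerpts; 4 batches x 4 excerpts = 16 draws per epoch.
print(round(expected_coverage(40, 16), 3))
print(round(prob_duplicate_track(8, 4), 3))

# Monte Carlo cross-check of the coverage estimate
rng = random.Random(0)
trials = 20000
seen = sum(len({rng.randrange(40) for _ in range(16)})
           for _ in range(trials)) / (trials * 40)
print(round(seen, 3))
```

With the default settings, only about a third of the 40 possible excerpts are expected to appear in one epoch, and roughly 59 % of all batches contain at least one track twice.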


## Advanced Sampling using Pescador

With pescador we can easily make sure that every excerpt is seen exactly once per epoch, without writing bookkeeping code for lists of already-seen indices.
We first create a simple generator that yields __all__ excerpts of a given track, then use pescador to multiplex (mux) these per-track generators, and finally group the result into batches.
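To make the batching step less of a black box, here is a simplified pure-Python stand-in for grouping a stream of dicts into batches of stacked arrays, similar in spirit to how `pescador.buffer_stream` is used below. It is an illustration under our own assumptions, not pescador's implementation; the helpers `batch_dicts` and `demo_excerpts` are hypothetical, and this sketch simply drops an incomplete final batch.

```python
import numpy as np

def batch_dicts(stream, batch_size):
    """Group a stream of dicts into dicts of stacked numpy arrays.

    An incomplete trailing batch is dropped in this sketch.
    """
    buffer = []
    for item in stream:
        buffer.append(item)
        if len(buffer) == batch_size:
            # stack each field across the buffered samples
            yield {key: np.array([d[key] for d in buffer]) for key in buffer[0]}
            buffer = []

def demo_excerpts(track, length=6, hop=3):
    """Yield all excerpts of one toy track as dicts, like the generator below."""
    for i in range(0, len(track) - length + 1, hop):
        yield dict(Input=track[i:i + length])

for b in batch_dicts(demo_excerpts('lalalalala'), 2):
    print(b['Input'])
```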

In [0]:
# yield all excerpts of one track on a regular hop grid;
# the "+ 1" makes sure the last full-length excerpt is included
def excerpt_gen(track):
    for i in range(0, len(track) - excerpt_length + 1, excerpt_hop):
        yield dict(Input=track[i:i+excerpt_length])

Let's check this single-track generator and whether it really yields all excerpts:

In [0]:
gen = excerpt_gen(tracks[0])
for i in gen:
    print(i)

{'Input': 'lalala'}
{'Input': 'alalal'}


### Yielding samples without replacement 

Now we create a stochastic mux over the generators of all tracks. It yields excerpts in random order __without replacement__, making sure that we see every excerpt exactly once per epoch.

![](https://sigsep.github.io/ismir2018_tutorial/assets/twswo.gif =400x350)
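For comparison, "every excerpt exactly once, in random order" can also be achieved without pescador by shuffling an index of all `(track, start)` pairs. This is a different technique from the stochastic mux, sketched here under our own assumptions (the helper `shuffled_epoch` is hypothetical): it needs random access to all tracks at once, whereas a mux with a small `n_active` only keeps a few track loaders active.

```python
import random

def shuffled_epoch(tracks, batch_size, length=6, hop=3, seed=None):
    """Yield batches covering every (track, start) excerpt once, in random order."""
    rng = random.Random(seed)
    # index of all excerpts on the hop grid: (track number, start offset)
    index = [(t, s) for t, track in enumerate(tracks)
             for s in range(0, len(track) - length + 1, hop)]
    rng.shuffle(index)
    # cut the shuffled index into full batches (a trailing partial batch is dropped)
    for i in range(0, len(index) - batch_size + 1, batch_size):
        yield [tracks[t][s:s + length] for t, s in index[i:i + batch_size]]

demo_tracks = ['lalalalala', 'lololololo', 'lilililili', 'lululululu']
for batch in shuffled_epoch(demo_tracks, batch_size=4, seed=1):
    print(batch)
```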

In [0]:
import pescador

# set up one streamer per track
streams = [pescador.Streamer(excerpt_gen, track) for track in tracks]

# randomly sample from the track streamers until all are exhausted;
# n_active=len(streams) keeps every track active at once,
# a lower value reduces memory usage
mux = pescador.StochasticMux(
    streams,
    n_active=len(streams),
    rate=None,
    mode='exhaustive',
)

# group samples into batches of size `batch_size`; wrapping buffer_stream
# in a Streamer restarts the batch stream at every epoch
batches = pescador.Streamer(pescador.buffer_stream, mux, batch_size)

# iterate over data
for epoch in range(3):
    print("epoch", epoch + 1)
    for batch in batches:
        print(batch['Input'])

epoch 1
['lululu' 'lalala' 'lololo' 'lelele']
['lälälä' 'lilili' 'elelel' 'lülülü']
['ululul' 'lölölö' 'ililil' 'ülülül']
['ölölöl' 'alalal' 'äläläl' 'ololol']
epoch 2
['lölölö' 'lalala' 'lululu' 'lelele']
['elelel' 'lülülü' 'ülülül' 'ölölöl']
['ululul' 'lälälä' 'lilili' 'äläläl']
['lololo' 'alalal' 'ololol' 'ililil']
epoch 3
['lölölö' 'lololo' 'lilili' 'lelele']
['ölölöl' 'ololol' 'lululu' 'lülülü']
['elelel' 'ululul' 'lalala' 'ililil']
['ülülül' 'lälälä' 'alalal' 'äläläl']


### Pro

* Mission achieved: we saw every excerpt exactly once per epoch
* pescador makes it very easy to implement

### Cons

* Consumes a significant amount of memory when all track loaders are active at the same time (`n_active=len(streams)`)
* Still possible redundancies within a batch: several excerpts may come from the same track
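To see the remaining redundancy concretely, the toy helper below checks whether a batch contains two excerpts from the same track. It is hypothetical and only meaningful for this text data, where every excerpt occurs in exactly one track. The third batch of epoch 2 above, for example, contains `lälälä` and `äläläl`, both cut from `lälälälälä`.

```python
def owning_track(excerpt, tracks):
    """Index of the track that contains the excerpt (toy text data only)."""
    return next(i for i, t in enumerate(tracks) if excerpt in t)

def has_duplicate_track(batch, tracks):
    """True if at least two excerpts in the batch come from the same track."""
    owners = [owning_track(e, tracks) for e in batch]
    return len(set(owners)) < len(owners)

demo_tracks = [
    'lalalalala', 'lololololo', 'lilililili', 'lululululu',
    'lelelelele', 'lülülülülü', 'lälälälälä', 'lölölölölö'
]
print(has_duplicate_track(['ululul', 'lälälä', 'lilili', 'äläläl'], demo_tracks))  # True
print(has_duplicate_track(['lululu', 'lalala', 'lololo', 'lelele'], demo_tracks))  # False
```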

### Yielding tracks and samples without replacement

![](https://sigsep.github.io/ismir2018_tutorial/assets/twoswo.gif =400x350)

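Now we use the [RoundRobinMux](https://pescador.readthedocs.io/en/stable/generated/pescador.mux.RoundRobinMux.html), which cycles through the track streamers in a fixed order and therefore guarantees unique tracks within each batch (given equally long tracks and a `batch_size` no larger than the number of tracks).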
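The round-robin idea can be sketched in plain Python: take one excerpt from each track generator in turn and drop generators once they are exhausted. The helpers `round_robin` and `demo_excerpts` are hypothetical illustrations, not pescador's implementation.

```python
def round_robin(generators):
    """Take one item from each generator in turn until all are exhausted."""
    active = list(generators)
    while active:
        still_active = []
        for gen in active:
            try:
                item = next(gen)
            except StopIteration:
                continue  # drop exhausted generators, keep the order of the rest
            still_active.append(gen)
            yield item
        active = still_active

def demo_excerpts(track, length=6, hop=3):
    """Yield all excerpts of one toy track on a regular hop grid."""
    for i in range(0, len(track) - length + 1, hop):
        yield track[i:i + length]

demo_tracks = ['lalalalala', 'lololololo', 'lilililili', 'lululululu']
print(list(round_robin(demo_excerpts(t) for t in demo_tracks)))
```

Applied to all eight tracks and cut into batches of four, this sequence gives the first excerpts of tracks 1 to 4, then of tracks 5 to 8, then the second excerpts in the same order, as in the epochs printed below.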

In [0]:
# cycle through the track streamers in a fixed order
mux = pescador.RoundRobinMux(
    streams,
    mode='exhaustive',
)
batches = pescador.Streamer(pescador.buffer_stream, mux, batch_size)

# iterate over data
for epoch in range(3):
    print("epoch", epoch + 1)
    for batch in batches:
        print(batch['Input'])

epoch 1
['lalala' 'lololo' 'lilili' 'lululu']
['lelele' 'lülülü' 'lälälä' 'lölölö']
['alalal' 'ololol' 'ililil' 'ululul']
['elelel' 'ülülül' 'äläläl' 'ölölöl']
epoch 2
['lalala' 'lololo' 'lilili' 'lululu']
['lelele' 'lülülü' 'lälälä' 'lölölö']
['alalal' 'ololol' 'ililil' 'ululul']
['elelel' 'ülülül' 'äläläl' 'ölölöl']
epoch 3
['lalala' 'lololo' 'lilili' 'lululu']
['lelele' 'lülülü' 'lälälä' 'lölölö']
['alalal' 'ololol' 'ililil' 'ululul']
['elelel' 'ülülül' 'äläläl' 'ölölöl']


### Pro

* Unique tracks within each batch
* Every excerpt is seen once per epoch

### Con

* The fixed order of the streamers within the mux creates dependencies between batches: `lala...` is always followed by `lolo...`, and in this example every epoch even yields exactly the same batches.
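One way to weaken this dependency is to shuffle the order of the track generators at the start of every epoch before interleaving them. The sketch below is our own pure-Python illustration (the helpers are hypothetical, not pescador API).

```python
import random

def demo_excerpts(track, length=6, hop=3):
    """Yield all excerpts of one toy track on a regular hop grid."""
    for i in range(0, len(track) - length + 1, hop):
        yield track[i:i + length]

def round_robin(generators):
    """Take one item from each generator in turn until all are exhausted."""
    active = list(generators)
    while active:
        still_active = []
        for gen in active:
            try:
                item = next(gen)
            except StopIteration:
                continue  # drop exhausted generators
            still_active.append(gen)
            yield item
        active = still_active

def permuted_round_robin_epoch(tracks, batch_size, rng):
    """One epoch of round-robin batches with a freshly shuffled track order."""
    order = rng.sample(tracks, len(tracks))  # new random permutation per call
    stream = list(round_robin(demo_excerpts(t) for t in order))
    # keep only full batches
    return [stream[i:i + batch_size]
            for i in range(0, len(stream) - batch_size + 1, batch_size)]

demo_tracks = [
    'lalalalala', 'lololololo', 'lilililili', 'lululululu',
    'lelelelele', 'lülülülülü', 'lälälälälä', 'lölölölölö'
]
rng = random.Random(0)
for epoch in range(2):
    print("epoch", epoch + 1)
    for batch in permuted_round_robin_epoch(demo_tracks, 4, rng):
        print(batch)
```

This only removes the repetition across epochs; within one epoch, the second pass over the tracks still follows the order of the first pass.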

# Validation/Test

For validation and testing we want fully deterministic behavior: every epoch should yield exactly the same samples in the same order, for reproducibility.
We can achieve this with the `ChainMux` class, which exhausts the streamers one after another.
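Chaining can be mirrored with `itertools.chain`. The sketch below uses hypothetical helpers (not pescador's implementation) that exhaust each track generator before moving to the next, so every call produces exactly the same batches.

```python
from itertools import chain

def demo_excerpts(track, length=6, hop=3):
    """Yield all excerpts of one toy track on a regular hop grid."""
    for i in range(0, len(track) - length + 1, hop):
        yield track[i:i + length]

def chained_batches(tracks, batch_size):
    """Exhaust the track generators one after another and cut the stream into full batches."""
    stream = list(chain.from_iterable(demo_excerpts(t) for t in tracks))
    return [stream[i:i + batch_size]
            for i in range(0, len(stream) - batch_size + 1, batch_size)]

demo_tracks = [
    'lalalalala', 'lololololo', 'lilililili', 'lululululu',
    'lelelelele', 'lülülülülü', 'lälälälälä', 'lölölölölö'
]
epoch_a = chained_batches(demo_tracks, 4)
epoch_b = chained_batches(demo_tracks, 4)
print(epoch_a[0])          # ['lalala', 'alalal', 'lololo', 'ololol']
print(epoch_a == epoch_b)  # True
```

Because no randomness is involved, the batches are identical in every epoch, which matches the output of the pescador cell below.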

In [0]:
# exhaust the streamers one after another, in a fixed order
mux = pescador.ChainMux(streams, mode='exhaustive')
batches = pescador.Streamer(pescador.buffer_stream, mux, batch_size)

# iterate over data
for epoch in range(3):
    print("epoch", epoch + 1)
    for batch in batches:
        print(batch['Input'])


epoch 1
['lalala' 'alalal' 'lololo' 'ololol']
['lilili' 'ililil' 'lululu' 'ululul']
['lelele' 'elelel' 'lülülü' 'ülülül']
['lälälä' 'äläläl' 'lölölö' 'ölölöl']
epoch 2
['lalala' 'alalal' 'lololo' 'ololol']
['lilili' 'ililil' 'lululu' 'ululul']
['lelele' 'elelel' 'lülülü' 'ülülül']
['lälälä' 'äläläl' 'lölölö' 'ölölöl']
epoch 3
['lalala' 'alalal' 'lololo' 'ololol']
['lilili' 'ililil' 'lululu' 'ululul']
['lelele' 'elelel' 'lülülü' 'ülülül']
['lälälä' 'äläläl' 'lölölö' 'ölölöl']
