# Tutorial

* [Intro](#Intro)
* [Data structures](#Data-structures)
   * [Segment](#Segment)
   * [Segment container](#Segment-container)
   * [Feature container](#Feature-container)
   * [Dataset split](#Dataset-split)
* [Examples](#Examples)
    * [Create segment containers and set labels](#Create-segment-containers-and-set-labels)
    * [Generate mini-batches with mel-spectra (from a config file)](#Generate-mini-batches-with-mel-spectra-%28from-a-config-file%29)
    * [Generate mini-batches with mel-spectra (going through all steps)](#Generate-mini-batches-with-mel-spectra-%28going-through-all-steps%29)


## Intro

The main purpose of Dynibatch is to ease the creation of mini-batches of audio data + targets to be fed to some machine learning algorithm.

Typical audio datasets consist of a set of audio files and their corresponding annotation files. Every annotation file usually contains a list of annotated segments of different sizes, each described by a start time, an end time and a label. For example:

```
0.092 0.171 hh
0.170 0.240 eh
0.240 0.282 l
0.282 0.550 ow
```

Typical machine learning algorithms take as input a batch of N overlapping segments of equal shape S and their corresponding N labels.

Dynibatch is a Python library dedicated to providing mini-batches of audio data to machine learning algorithms.

Dynibatch makes it easy to create those batches of constant size segments from variable size segments. With the annotated segment examples given above, a mini-batch size of 5, a segment size S of 0.05 s and a segment overlap of 50%, Dynibatch would create mini-batches described as below (for a given segment, the label is set to the label in the ground truth having more than 50% overlap with this segment):

**Mini-batch 1**

Data
```
data_for_segment 0.100-0.150
data_for_segment 0.125-0.175
data_for_segment 0.150-0.200 
data_for_segment 0.175-0.225
data_for_segment 0.200-0.250
```

Labels
```
hh
hh
eh
eh
eh
```

**Mini-batch 2**

Data
```
data_for_segment 0.225-0.275
data_for_segment 0.250-0.300
data_for_segment 0.275-0.325
data_for_segment 0.300-0.350
data_for_segment 0.325-0.375
```

Labels
```
l
l
ow
ow
ow
```

...
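The segmentation and labelling rule used in the example above can be sketched in plain Python. This is only an illustration of the rule as described here, not Dynibatch's implementation: the function names `cut_segments` and `label_for` are hypothetical, and the start time of 0.100 s is taken from the first segment shown above.

```python
# Illustrative sketch only: these helpers are NOT part of Dynibatch.

def cut_segments(start, end, seg_duration, seg_overlap):
    """Return (start, end) pairs of fixed-length, overlapping segments."""
    hop = seg_duration * (1.0 - seg_overlap)
    segments = []
    i = 0
    # small tolerance so that the last full segment is not lost to rounding
    while start + i * hop + seg_duration <= end + 1e-9:
        seg_start = round(start + i * hop, 6)
        segments.append((seg_start, round(seg_start + seg_duration, 6)))
        i += 1
    return segments


def label_for(segment, annotations):
    """Return the label covering more than 50% of the segment, else None."""
    seg_start, seg_end = segment
    for ann_start, ann_end, label in annotations:
        overlap = min(seg_end, ann_end) - max(seg_start, ann_start)
        if overlap > 0.5 * (seg_end - seg_start):
            return label
    return None


# annotated segments from the example annotation file
annotations = [
    (0.092, 0.171, "hh"),
    (0.170, 0.240, "eh"),
    (0.240, 0.282, "l"),
    (0.282, 0.550, "ow"),
]

segments = cut_segments(0.100, 0.375, seg_duration=0.05, seg_overlap=0.5)
labels = [label_for(seg, annotations) for seg in segments]

# group segments and labels into mini-batches of 5
batch_size = 5
for i in range(0, len(segments), batch_size):
    print(segments[i:i + batch_size], labels[i:i + batch_size])
```

With these inputs, the two printed batches carry the same segment boundaries and labels as Mini-batch 1 and Mini-batch 2 above.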


In addition, Dynibatch can fill the mini-batches with audio features (e.g. mel spectra) instead of raw audio, and reject segments in which no activity is detected.

## Data structures

### Segment

Segments are the base elements to be fed to the learning algorithm: 1 segment = 1 observation.

Audio files are split into overlapping fixed-length segments, which are stored in *segment containers*.

Every segment, along with its parent segment container, contains all the data needed to feed a mini-batch (label, features, whether it contains activity or not).

### Segment container

A segment container contains the list of segments related to an audio file.
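To make the relationship between the two structures concrete, here is a minimal sketch using dataclasses. It is an assumption-laden illustration, not the library's classes: apart from `audio_path`, which appears in the examples below, every class, field and method name is hypothetical.

```python
# Illustrative sketch only: hypothetical stand-ins, NOT Dynibatch's classes.
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Segment:
    start_time: float                  # in seconds
    end_time: float                    # in seconds
    label: Optional[int] = None        # index of the label, if any
    activity: Optional[bool] = None    # whether activity was detected
    features: Dict[str, object] = field(default_factory=dict)


@dataclass
class SegmentContainer:
    audio_path: str                    # the audio file the segments come from
    segments: List[Segment] = field(default_factory=list)

    def active_segments(self) -> List[Segment]:
        """Return only the segments flagged as containing activity."""
        return [seg for seg in self.segments if seg.activity]


container = SegmentContainer(
    audio_path="dataset1/ID0132.wav",
    segments=[
        Segment(0.0, 0.2, label=2, activity=True),
        Segment(0.1, 0.3, label=2, activity=False),
    ],
)
print(len(container.active_segments()))
```

One container per audio file keeps file-level information (such as the path) in a single place, while each segment carries what is specific to one observation.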

### Feature container

A feature container is also related to an audio file. It contains all the short-term features (e.g. spectral flatness, mel-spectra...), as well as their parameters, computed from this audio file. When saved on disk, features are not directly stored in segments because that would imply a lot of duplicated data (since segments most often overlap). Instead, they are saved as feature container dumps.
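The duplication argument can be checked with a little arithmetic. The sketch below assumes a hypothetical 1 s file and reuses the frame and segment parameters from the examples further down; it only counts frames and does not touch any Dynibatch API.

```python
# Illustrative arithmetic only: no Dynibatch API is used here.
sample_rate, hop_size = 22050, 128
seg_duration, seg_overlap = 0.2, 0.5
file_duration = 1.0  # hypothetical file length, in seconds


def time_to_frame(t):
    """Index of the short-term frame that starts at (or just before) time t."""
    return int(t * sample_rate / hop_size)


# frames stored once, at the file level (the feature container approach)
n_frames = time_to_frame(file_duration)

# frame ranges covered by each overlapping segment
seg_hop = seg_duration * (1 - seg_overlap)
n_segments = int(round((file_duration - seg_duration) / seg_hop)) + 1
frame_ranges = [
    (time_to_frame(i * seg_hop), time_to_frame(i * seg_hop + seg_duration))
    for i in range(n_segments)
]

# frames that would be written if every segment stored its own copy
frames_if_duplicated = sum(last - first for first, last in frame_ranges)

print(n_frames, n_segments, frames_if_duplicated)
```

With a 50% overlap, storing features per segment writes close to twice as many frames as storing them once per file, which is the saving the feature container dumps aim for.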

### Dataset split

A dataset split describes how a dataset is split into train/validation/test sets. It is basically a dictionary with one key per set, each mapped to the list of files belonging to that set.
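As a rough illustration, such a dictionary could look like the one below. The key names and the assignment of files to sets are assumptions made for this sketch; the file names are borrowed from the test data used in the examples.

```python
# Illustrative sketch only: key names and file assignment are assumptions.
dataset_split = {
    "train": ["dataset1/ID0132.wav", "dataset2/ID1238.wav"],
    "validation": ["dataset1/ID0133.wav"],
    "test": ["dataset2/ID1322.wav"],
}


def set_of(audio_path, split):
    """Return the name of the set a file belongs to, or None if it is absent."""
    for set_name, files in split.items():
        if audio_path in files:
            return set_name
    return None


# sanity check: a file should not appear in more than one set
all_files = [f for files in dataset_split.values() for f in files]
assert len(all_files) == len(set(all_files))

print(set_of("dataset1/ID0133.wav", dataset_split))
```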


## Examples

### Create segment containers and set labels

```python
from dynibatch.utils.segment_container import create_segment_containers_from_audio_files
from dynibatch.parsers.label_parsers import CSVFileLabelParser

# Create a segment container generator
sc_gen = create_segment_containers_from_audio_files("../tests/data")

# Instantiate the label parser (file2label.csv contains the file/label pairs)
parser = CSVFileLabelParser("../tests/data/file2label.csv")

# Get all labels
labels = parser.get_labels()

# Now for every segment container, get and show the label
for sc in sc_gen:
    sc.labels = parser.get_label(sc.audio_path)
    print("Label for file {0}: {1}".format(sc.audio_path, [labels[i] for i in sc.labels]))
```

Output:
```
Label for file dataset1/ID0132.wav: ['bird_c']
Label for file dataset1/ID0133.wav: ['bird_c']
Label for file dataset2/ID1238.wav: ['bird_d']
Label for file dataset2/ID1322.wav: ['bird_d']
```


### Generate mini-batches with mel-spectra (from a config file)

```python
import json
from dynibatch.generators.minibatch_gen import MiniBatchGen

# parse the JSON config file
with open("example_config.json") as config_file:
    config = json.load(config_file)

mb_gen = MiniBatchGen.from_config(config)
mb_gen['default'].start()
mb = mb_gen['default'].execute(active_segments_only=True, with_targets=True)

# get the first mini-batch
data, targets = next(mb)

# show the label indices and the data of the first mini-batch
print("Label indices:\n{}".format(targets))
print("Mel spectra (truncated):\n{}".format(data))
```

Output:
```
Label indices:
[2 2 2 2 2 2 2 2 2 2]
Mel spectra (truncated):
[[[[-77.88076782 -74.55519867 -66.03651428 ..., -52.81373978 -54.35548019
    -61.97282028]
   [-66.65984344 -63.33428192 -51.91387939 ..., -56.42927933 -55.33501053
    -56.9071846 ]
   [-85.78934479 -82.46378326 -58.83434677 ..., -57.59656143 -53.44735718
    -58.29795074]
   ..., 
   [-61.77670288 -58.45114136 -49.6537323  ..., -45.98276901 -52.01996231
    -59.93020248]
   [-69.87345886 -66.54789734 -54.32802963 ..., -46.00500107 -51.14955139
    -59.93471146]
   [-66.31863403 -62.9930687  -62.22982788 ..., -45.56991959 -50.94314575
    -54.18180847]]]


 [[[-75.68052673 -72.35495758 -59.55133438 ..., -53.12981415 -54.07550812
    -58.15501022]
   [-72.4744339  -69.14886475 -55.01808548 ..., -50.01371765 -53.38811111
    -57.97169113]
   [-95.34684753 -92.02128601 -62.05028534 ..., -51.11210632 -53.34747696
    -56.77929306]
   ..., 
   [-92.85736847 -89.53180695 -55.01891327 ..., -54.59425354 -56.17362595
    -61.402164
```

### Generate mini-batches with mel-spectra (going through all steps)

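Config files are written in JSON and contain all the parameters needed to automatically create a mini-batch generator; an example config file is provided in `example_config.json`. The example below builds a mini-batch generator step by step instead, without going through a config file.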

```python
from dynibatch.generators.segment_container_gen import SegmentContainerGenerator
from dynibatch.generators.audio_frame_gen import AudioFrameGen
from dynibatch.features.extractors.energy import EnergyExtractor
from dynibatch.features.extractors.spectral_flatness import SpectralFlatnessExtractor
from dynibatch.features.extractors.mel_spectrum import MelSpectrumExtractor
from dynibatch.features.frame_feature_processor import FrameFeatureProcessor
from dynibatch.features.segment_feature_processor import SegmentFeatureProcessor
from dynibatch.features.extractors.frame_feature_chunk import FrameFeatureChunkExtractor
from dynibatch.features.activity_detection.simple import Simple
from dynibatch.parsers import label_parsers
from dynibatch.generators.minibatch_gen import MiniBatchGen


#################
# Configuration #
#################

# audio and short-term frames config
audio_root = "../tests/data"
sample_rate = 22050
win_size = 256
hop_size = 128

# mel spectra config
n_mels = 64
min_freq = 0
max_freq = sample_rate / 2

# segments config
seg_duration = 0.2
seg_overlap = 0.5

# activity detection config
energy_threshold = 0.2
spectral_flatness_threshold = 0.3

# mini-batches config
batch_size = 10
feature_size = n_mels
n_time_bins = int(seg_duration * sample_rate / hop_size)

##############
# Processing #
##############

# create a parser to get the labels from the file2label file
parser = label_parsers.CSVFileLabelParser("../tests/data/file2label.csv")

# create needed short-term (aka frame-based) feature extractors
en_ext = EnergyExtractor()  # needed for the activity detection
sf_ext = SpectralFlatnessExtractor()  # needed for the activity detection
mel_ext = MelSpectrumExtractor(
    sample_rate=sample_rate,
    fft_size=win_size,
    n_mels=n_mels,
    min_freq=min_freq,
    max_freq=max_freq)

# create an audio frame generator
af_gen = AudioFrameGen(sample_rate, win_size, hop_size)

# create a frame feature processor, in charge of computing all short-term features
ff_pro = FrameFeatureProcessor(af_gen,
                               [en_ext, sf_ext, mel_ext])

# create needed segment-based feature extractors
ffc_ext = FrameFeatureChunkExtractor(mel_ext.name)
act_det = Simple(energy_threshold=energy_threshold,
                 spectral_flatness_threshold=spectral_flatness_threshold)

# create a segment feature processor, in charge of computing all segment-based features
# (here the activity detection and chunks of mel spectra sequences)
sf_pro = SegmentFeatureProcessor(
    [act_det, ffc_ext],
    ff_pro=ff_pro,
    audio_root=audio_root)

# create the segment container generator that will use all the objects above to generate,
# for every audio file, a segment container containing the list of segments with the labels,
# the mel spectra and an "activity detected" boolean attribute
sc_gen = SegmentContainerGenerator(audio_root,
                                   sf_pro,
                                   label_parser=parser,
                                   seg_duration=seg_duration,
                                   seg_overlap=seg_overlap)

# create the mini-batch generator
mb_gen = MiniBatchGen(sc_gen,
                      mel_ext.name,
                      batch_size,
                      feature_size,
                      n_time_bins)

# start it and generate mini-batches
mb_gen.start()
mb = mb_gen.execute(active_segments_only=True, with_targets=True)

# get the first mini-batch
data, targets = next(mb)
```
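The sizes implied by the configuration values above can be worked out by hand. The short sketch below only repeats that arithmetic with the same values; it does not call Dynibatch, and the derived variable names other than `n_time_bins` are introduced here for illustration.

```python
# Illustrative arithmetic only, using the configuration values from the example.
sample_rate = 22050
win_size = 256
hop_size = 128
seg_duration = 0.2
seg_overlap = 0.5

# duration of one short-term frame and of the step between two frames (seconds)
frame_duration = win_size / sample_rate
frame_hop = hop_size / sample_rate

# number of short-term frames in one segment, as computed in the example
n_time_bins = int(seg_duration * sample_rate / hop_size)

# time step between the starts of two consecutive segments (seconds)
seg_hop = seg_duration * (1 - seg_overlap)

print("frame duration: {:.1f} ms".format(1000 * frame_duration))
print("frame hop: {:.1f} ms".format(1000 * frame_hop))
print("time bins per segment: {}".format(n_time_bins))
print("segment hop: {:.2f} s".format(seg_hop))
```

With these values each 0.2 s segment spans 34 short-term frames, and a new segment starts every 0.1 s.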

```python
# Show the label indices of the first mini-batch
print("Label indices:\n{}".format(targets))
```

Output:
```
Label indices:
[2 2 2 2 2 2 2 2 2 2]
```
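In the first example, label indices are turned back into label names by indexing the list returned by `parser.get_labels()`. Assuming the targets of a mini-batch use the same indexing, the same lookup can be sketched as follows. The label list and the targets below are placeholders standing in for the real values, so that the snippet runs without the library or the test data.

```python
# Illustrative sketch only: placeholder values, not the real label list.
from collections import Counter

labels = ["label_0", "label_1", "label_2", "label_3"]  # stands in for parser.get_labels()
targets = [2, 2, 2, 2, 2, 2, 2, 2, 2, 2]               # stands in for the printed targets

# map every label index of the mini-batch back to its name
target_names = [labels[i] for i in targets]

# count how often each label occurs in the mini-batch
label_counts = Counter(target_names)

print(target_names)
print(label_counts)
```

A mini-batch made of a single label, as here, is what one would expect when consecutive segments of the same file end up in the same batch.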


```python
# and the data
print("Mel spectra (truncated):\n{}".format(data))
```

Mel spectra (truncated):
[[[[-77.88076782 -74.55519867 -66.03651428 ..., -52.81373978 -54.35548019
    -61.97282028]
   [-66.65984344 -63.33428192 -51.91387939 ..., -56.42927933 -55.33501053
    -56.9071846 ]
   [-85.78934479 -82.46378326 -58.83434677 ..., -57.59656143 -53.44735718
    -58.29795074]
   ..., 
   [-61.77670288 -58.45114136 -49.6537323  ..., -45.98276901 -52.01996231
    -59.93020248]
   [-69.87345886 -66.54789734 -54.32802963 ..., -46.00500107 -51.14955139
    -59.93471146]
   [-66.31863403 -62.9930687  -62.22982788 ..., -45.56991959 -50.94314575
    -54.18180847]]]


 [[[-75.68052673 -72.35495758 -59.55133438 ..., -53.12981415 -54.07550812
    -58.15501022]
   [-72.4744339  -69.14886475 -55.01808548 ..., -50.01371765 -53.38811111
    -57.97169113]
   [-95.34684753 -92.02128601 -62.05028534 ..., -51.11210632 -53.34747696
    -56.77929306]
   ..., 
   [-92.85736847 -89.53180695 -55.01891327 ..., -54.59425354 -56.17362595
    -61.40216446]
   [-68.09340668 -64.76783752 -55