# Encoding
Encoding is the task of converting the labels of an utterance into an abstract representation (usually an array- or matrix-like structure) that can be used, for example, as targets for training a machine learning model.

In [1]:
import os

import audiomate
from audiomate import corpus
from audiomate import encoding
from audiomate.utils import units

First we load some data.

In [2]:
urbansound8k_subset = audiomate.Corpus.load('data/urbansound_subset', reader='urbansound8k')

## Hot Encoding
For every frame, we want a vector in which each element indicates whether the label it represents is present in that frame.

First we define the path where the encoded data will be stored.

In [3]:
target_path = 'output/targets.hdf5'
os.makedirs('output', exist_ok=True)

Then we define the frame settings, which determine the start and end time of every frame.

In [4]:
frame_settings = units.FrameSettings(2048, 1024)
sampling_rate = 16000
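
To make the relation between frame settings and time concrete, here is a minimal sketch in plain Python. It assumes the two arguments above are the frame size and the hop size in samples, and that frame `i` starts at sample `i * hop_size`. The helper `frame_time_range` is hypothetical and not part of audiomate.

```python
def frame_time_range(index, frame_size, hop_size, sampling_rate):
    """Return (start, end) in seconds for the frame with the given index.

    Assumption: frame i covers samples [i * hop_size, i * hop_size + frame_size).
    """
    start_sample = index * hop_size
    end_sample = start_sample + frame_size
    return start_sample / sampling_rate, end_sample / sampling_rate


# Same values as in the cell above: 2048-sample frames, 1024-sample hop, 16 kHz.
for i in range(3):
    start, end = frame_time_range(i, 2048, 1024, 16000)
    print(f'frame {i}: {start:.3f}s to {end:.3f}s')
```

With these values every frame is 0.128 seconds long and consecutive frames overlap by half.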

Next we need a list of all labels that we want to encode. The number of labels determines the length of the encoded vector for every frame.

In [5]:
labels = list(urbansound8k_subset.all_label_values(corpus.LL_SOUND_CLASS))
print(labels)

['dog_bark', 'jackhammer']
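
The idea behind the per-frame vector can be sketched without the library. The following is an illustration only: `hot_encode_frames` is a hypothetical helper, the label segments are made up, and the rule "any overlap between a label and a frame marks the label as present" is an assumption that may differ from what audiomate does internally.

```python
def hot_encode_frames(segments, labels, num_frames, frame_size, hop_size, sampling_rate):
    """Return one vector per frame with 1.0 for every label present in that frame.

    segments: list of (start_seconds, end_seconds, label_value) tuples.
    """
    # The position of a label in `labels` is its index in the encoded vector.
    index_of = {label: i for i, label in enumerate(labels)}
    encoded = []

    for frame in range(num_frames):
        frame_start = frame * hop_size / sampling_rate
        frame_end = (frame * hop_size + frame_size) / sampling_rate
        vector = [0.0] * len(labels)

        for seg_start, seg_end, value in segments:
            # Assumed rule: a label counts as present if it overlaps the frame at all.
            overlaps = seg_start < frame_end and seg_end > frame_start
            if value in index_of and overlaps:
                vector[index_of[value]] = 1.0

        encoded.append(vector)

    return encoded


labels = ['dog_bark', 'jackhammer']
# Made-up label segments for a single utterance.
segments = [(0.0, 0.1, 'dog_bark'), (0.15, 0.3, 'jackhammer')]

encoded = hot_encode_frames(segments, labels, 3, 2048, 1024, 16000)
for row in encoded:
    print(row)
```

Because several labels can be active in the same frame, a vector may contain more than one 1.0, as in the second frame of this example.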


Now we create an encoder and process the full corpus. The output is stored in an HDF5 container at the path we defined.

In [6]:
encoder = encoding.FrameHotEncoder(labels, corpus.LL_SOUND_CLASS, frame_settings, sampling_rate)
container = encoder.encode_corpus(urbansound8k_subset, target_path)

The resulting container holds the encoded data for every utterance. The following utterance only contains audio with the label "dog_bark" (index 0 in the label list), so only the first element of every frame vector is set.

In [7]:
container.open()

sample_utt_id = '18581-3-1-1'
encoded = container.get(sample_utt_id, mem_map=False)[:10]

print(encoded)

[[1. 0.]
 [1. 0.]
 [1. 0.]
 [1. 0.]
 [1. 0.]
 [1. 0.]
 [1. 0.]
 [1. 0.]
 [1. 0.]
 [1. 0.]]
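
To read such rows back as label names, each element can be mapped to the label at the same position in the label list. This is a small sketch in plain Python. `present_labels` is a hypothetical helper, and the rows are typed in by hand rather than read from the container.

```python
def present_labels(frame_vector, labels, threshold=0.5):
    """Return the labels whose element in the frame vector is at least `threshold`."""
    return [label for label, value in zip(labels, frame_vector) if value >= threshold]


labels = ['dog_bark', 'jackhammer']
# Hand-written example rows, including one frame with no label at all.
rows = [[1.0, 0.0], [1.0, 1.0], [0.0, 0.0]]

decoded = [present_labels(row, labels) for row in rows]
print(decoded)
```

The order of `labels` has to be the same list that was passed to the encoder, otherwise the indices do not line up.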
