# Audio Decoder in DALI

This tutorial shows how to set up a simple pipeline that loads and decodes audio data using DALI. We will use a sample from the Speech Commands Data Set. While this dataset consists of samples in .wav format, the same procedure works for most well-known digital audio coding formats.

## Step-by-step guide
1. Let's start by importing DALI and a handful of utils.

In [1]:
from nvidia.dali.pipeline import Pipeline
import nvidia.dali.ops as ops            
import nvidia.dali.types as types
import matplotlib.pyplot as plt
import numpy as np

batch_size = 1
audio_files = "audio"

We use a `batch_size` of `1` to keep things simple.

2. Next, let's implement the pipeline. First, we need to load the data from disk (or any other source). FileReader loads both the data and its labels. For more information, refer to the FileReader docs. As with image data, you can also use Reader operators that are specific to a given dataset or dataset format (see [CaffeReader](https://docs.nvidia.com/deeplearning/sdk/dali-developer-guide/docs/supported_ops.html#nvidia.dali.ops.CaffeReader)). After loading the input data, the pipeline decodes the audio data. As stated above, the AudioDecoder operator can decode most well-known audio formats.
   
   Note: Remember to pass the proper data type (argument `dtype`) to the operator. Supported data types are listed in the documentation. If you have 24-bit audio data and set `dtype=INT16`, some information from the samples will be lost. The default `dtype` for this operator is `INT16`.
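To make the note above concrete, here is a minimal NumPy-only sketch (it does not call DALI) of what narrowing 24-bit samples to 16 bits does. The sample values are made up, and dropping the 8 least significant bits is an assumption about how the narrowing is done; DALI's actual conversion may scale or round differently.

```python
import numpy as np

# Hypothetical 24-bit PCM samples, held in int32 because NumPy has no 24-bit type.
samples_24bit = np.array(
    [0x000001, 0x0000FF, 0x000100, 0x7FFFFF, -0x800000], dtype=np.int32
)

# Narrow to 16 bits by discarding the 8 least significant bits (arithmetic shift).
samples_16bit = (samples_24bit >> 8).astype(np.int16)

# Widen back to the 24-bit range to see what survived the round trip.
restored = samples_16bit.astype(np.int32) << 8

print("original:", samples_24bit)
print("restored:", restored)
print("lost    :", samples_24bit - restored)
```

Any non-zero entry in the last line is detail that a 16-bit output cannot represent, which is why the `dtype` should be at least as wide as the source data.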

In [2]:
class AudioDecoderExample(Pipeline):
    def __init__(self, batch_size, num_threads, device_id):
        super(AudioDecoderExample, self).__init__(batch_size, num_threads, device_id)
        # FileReader yields the encoded file contents and a label for each sample.
        self.input = ops.FileReader(device="cpu", file_root=audio_files)
        # AudioDecoder does not accept a `sampling_rate` argument here; the sampling
        # rate is returned as the operator's second output instead.
        # INT16 matches the int16 reference data used in the Verification section.
        self.decode = ops.AudioDecoder(device="cpu", dtype=types.INT16)

    def define_graph(self):
        read, _ = self.input()  # the label is not needed in this example
        audio, rate = self.decode(read)
        return audio, rate

3. Now let's just build and run the pipeline.

In [3]:
pipecpu = AudioDecoderExample(batch_size=batch_size, num_threads=1, device_id=0)
pipecpu.build()
cpu_output = pipecpu.run()


The outputs of `AudioDecoder` consist of a tensor with the decoded data and some metadata (e.g. the sampling rate). The metadata is available as the operator's second output. In addition, `AudioDecoder` returns data in interleaved format, so we need to reshape the output tensor to display it properly. Here's how to do that:

In [None]:
audio_data = cpu_output[0].at(0)
sampling_rate = cpu_output[1].at(0)[0]
print("Sampling rate:", sampling_rate, "[Hz]")
print("Audio data:", audio_data)
audio_data = audio_data.flatten()
print("Audio data flattened:", audio_data)
plt.plot(audio_data)
plt.show()
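The cell above flattens the output, which is enough for a mono recording. For multi-channel audio, interleaved data stores one sample per channel for each frame in turn, so a reshape to `(frames, channels)` separates the channels. The sketch below illustrates this with plain NumPy on made-up stereo data; the helper name `deinterleave`, the sample values, and the 16000 Hz rate are assumptions for illustration and are not part of DALI.

```python
import numpy as np

def deinterleave(flat_samples, num_channels):
    """Reshape interleaved samples [L0, R0, L1, R1, ...] to (frames, channels)."""
    flat_samples = np.asarray(flat_samples)
    if flat_samples.size % num_channels != 0:
        raise ValueError("sample count is not a multiple of the channel count")
    return flat_samples.reshape(-1, num_channels)

# Made-up interleaved stereo buffer: left channel 1..4, right channel -1..-4.
interleaved = np.array([1, -1, 2, -2, 3, -3, 4, -4], dtype=np.int16)
frames = deinterleave(interleaved, num_channels=2)
left, right = frames[:, 0], frames[:, 1]

# With the sampling rate, sample indices can be converted to seconds for plotting.
example_rate = 16000  # assumed value, in Hz
time_axis = np.arange(frames.shape[0]) / example_rate

print("left :", left)
print("right:", right)
print("duration [s]:", frames.shape[0] / example_rate)
```

Passing `time_axis` as the first argument to `plt.plot` would label the horizontal axis in seconds instead of sample indices.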

## Verification

Let's verify that the AudioDecoder actually works. The method presented here can also come in handy for debugging a DALI pipeline in case something doesn't go as planned.

We will use an external tool to decode the same data and compare the results against the data decoded by DALI.

### Important!

The following snippet installs the external dependency (`simpleaudio`). If you already have it, or don't want to install it, you might want to stop here and not run this cell.

In [None]:
import sys
!{sys.executable} -m pip install simpleaudio

Below is a side-by-side comparison of the decoded data. If you have the `simpleaudio` module installed, you can run the snippet and see it for yourself.

In [None]:
import simpleaudio as sa

wav = sa.WaveObject.from_wave_file("audio/wav/three.wav")
three_audio = np.frombuffer(wav.audio_data, dtype=np.int16)

print("src: simpleaudio")
print("shape: ", three_audio.shape)
print("data: ", three_audio)
print("\n")
print("src: DALI")
print("shape: ", audio_data.shape)
print("data: ", audio_data)
# np.array_equal also returns False (instead of warning) when the shapes differ.
print("\nAre the arrays equal?", "YES" if np.array_equal(audio_data, three_audio) else "NO")

fig, ax = plt.subplots(1,2)
ax[0].plot(three_audio)
ax[1].plot(audio_data)
plt.show()
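If you prefer not to install `simpleaudio`, Python's standard `wave` module can serve as the reference decoder for 16-bit PCM `.wav` files. The sketch below is an illustration under stated assumptions and not part of the original notebook: it writes a small synthetic file (so it runs without the dataset), reads it back, and compares the arrays with `np.array_equal`. The helper name `read_wav_int16` and the generated tone are hypothetical; to check DALI's output you would pass `"audio/wav/three.wav"` and compare against `audio_data`.

```python
import os
import tempfile
import wave

import numpy as np

def read_wav_int16(path):
    """Read a 16-bit PCM WAV file; return (interleaved int16 samples, sampling rate)."""
    with wave.open(path, "rb") as wav_file:
        if wav_file.getsampwidth() != 2:
            raise ValueError("only 16-bit PCM is handled in this sketch")
        rate = wav_file.getframerate()
        raw = wav_file.readframes(wav_file.getnframes())
    # WAV stores samples as little-endian; "<i2" makes that explicit.
    return np.frombuffer(raw, dtype="<i2"), rate

# Synthetic mono tone standing in for a real recording.
rate = 16000
tone = (np.sin(2 * np.pi * 440 * np.arange(rate // 10) / rate) * 10000).astype("<i2")

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "tone.wav")
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)
        wav_file.setsampwidth(2)
        wav_file.setframerate(rate)
        wav_file.writeframes(tone.tobytes())
    decoded, decoded_rate = read_wav_int16(path)

print("Sampling rate:", decoded_rate, "[Hz]")
print("Are the arrays equal?", "YES" if np.array_equal(decoded, tone) else "NO")
```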