This example demonstrates how to set up a simple audio decoder pipeline that loads and decodes audio data using rocAL. The input data used for this example is a sample speech dataset (available as .wav files).

The following Python packages are required to run this example:

In [None]:
!pip install opencv-python
!pip install matplotlib

In [None]:
import random
import numpy as np
from amd.rocal.plugin.pytorch import ROCALAudioIterator
import torch
np.set_printoptions(threshold=1000, edgeitems=10000)
from amd.rocal.pipeline import Pipeline
import amd.rocal.fn as fn
import amd.rocal.types as types
import math
import sys
import cv2
import matplotlib.pyplot as plt
import os

The draw_patches function visualizes a given audio tensor by plotting its flattened data. It also extracts and displays a label associated with the audio data.

In [None]:
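def draw_patches(img, idx, device):
    # Move the tensor to host memory before converting it to NumPy.
    # The device argument is currently unused.
    image = img.detach().cpu().numpy()
    audio_data = image.flatten()  # collapse all dimensions into one series of samples
    label = idx.cpu().detach().numpy()
    print("label", label)
    plt.plot(audio_data)
    plt.show()
    plt.close()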
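
When the notebook runs without a display, a numeric summary can stand in for the plot. The sketch below is a hypothetical helper (waveform_summary is not part of rocAL) that reports the sample count, peak amplitude, and RMS level of a flat sequence of samples, using only the standard library.

```python
import math


def waveform_summary(samples):
    """Return basic statistics for a flat sequence of audio samples.

    Hypothetical helper for illustration; not part of rocAL.
    """
    samples = list(samples)
    if not samples:
        raise ValueError("samples must not be empty")
    # Largest absolute sample value.
    peak = max(abs(s) for s in samples)
    # Root-mean-square level of the signal.
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return {"num_samples": len(samples), "peak": peak, "rms": rms}


# Small hand-made signal, used only to exercise the helper.
print(waveform_summary([0.0, 0.5, -0.5, 1.0, -1.0]))
```

With a PyTorch tensor, a call such as waveform_summary(audio_tensor.flatten().tolist()) should produce the same kind of summary.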

Note: Set the ROCAL_DATA_PATH environment variable before running the notebook.

In [None]:
# Check if ROCAL_DATA_PATH is set
rocal_data_path = os.environ.get('ROCAL_DATA_PATH')

if rocal_data_path is None:
    raise EnvironmentError("ROCAL_DATA_PATH environment variable is not set. Please set it to the correct path.")

print(f"ROCAL_DATA_PATH is set to: {rocal_data_path}")

rocal_audio_data_path = os.path.join(rocal_data_path, "rocal_data", "audio")
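
The check above confirms only that the variable is set, not that the audio directory exists. As a minimal sketch, the same logic can be wrapped in a helper that also verifies the directory. The function name resolve_audio_dir and the extra existence check are assumptions added for illustration; the "rocal_data/audio" layout is taken from the cell above.

```python
import os
import tempfile


def resolve_audio_dir(env=None):
    """Return the audio dataset directory derived from ROCAL_DATA_PATH.

    Hypothetical helper for illustration; `env` defaults to os.environ.
    """
    env = os.environ if env is None else env
    root = env.get("ROCAL_DATA_PATH")
    if not root:
        raise EnvironmentError("ROCAL_DATA_PATH environment variable is not set.")
    audio_dir = os.path.join(root, "rocal_data", "audio")
    # Fail early if the expected directory layout is missing.
    if not os.path.isdir(audio_dir):
        raise FileNotFoundError(f"Expected audio directory not found: {audio_dir}")
    return audio_dir


# Demo with a throwaway directory standing in for the real dataset root.
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "rocal_data", "audio"))
    print(resolve_audio_dir({"ROCAL_DATA_PATH": tmp}))
```

Passing the environment as a mapping keeps the helper easy to exercise without modifying the real process environment.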


Configuration of the rocAL Pipeline:

The rocAL pipeline is configured with the following parameters:

    batch_size: 1
    num_threads: 8
    CPU/GPU Backend: Configured to use CPU (rocal_cpu=True)

A batch size of 1 and the CPU backend are used to keep the example simple.

In [None]:
file_list = f"{rocal_audio_data_path}/wav_file_list.txt"  # Use the file list defined in the MIVisionX-data repo
rocal_cpu = True
audio_pipeline = Pipeline(batch_size=1, num_threads=8, rocal_cpu=rocal_cpu)

Reading Audio and Labels:

    The fn.readers.file function reads audio files and their labels from the provided file list.

Decoding Audio:

    The fn.decoders.audio function decodes the audio data with specified parameters.

Parameters used for decoding:

    audio: The audio data to be decoded.
    file_root: The base path where audio wav files are present.
    file_list_path: The path to the file list of audio wav files.


In [None]:
with audio_pipeline:
    audio, labels = fn.readers.file(file_root=rocal_audio_data_path, file_list=file_list)
    decoded_audio = fn.decoders.audio(
        audio,
        file_root=rocal_audio_data_path,
        file_list_path=file_list)
    audio_pipeline.set_outputs(decoded_audio)
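
To check what the reader will see, it can help to inspect the file list outside of rocAL. The sketch below assumes each non-empty line has the form "<relative wav path> <integer label>"; this layout is an assumption and should be checked against the actual wav_file_list.txt. The helper read_file_list is hypothetical and uses only the standard library.

```python
import os
import tempfile


def read_file_list(file_list_path, file_root):
    """Parse a file list into (full_path, label) tuples.

    Assumes "<relative path> <integer label>" per line (an assumption,
    not confirmed by the rocAL documentation).
    """
    entries = []
    with open(file_list_path, "r", encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue  # skip blank lines
            # Split on the last whitespace so paths containing spaces survive.
            parts = line.rsplit(maxsplit=1)
            if len(parts) != 2:
                raise ValueError(f"line {line_no}: expected '<path> <label>'")
            rel_path, label = parts
            entries.append((os.path.join(file_root, rel_path), int(label)))
    return entries


# Demo with a made-up file list; the file names are placeholders.
with tempfile.TemporaryDirectory() as tmp:
    demo_list = os.path.join(tmp, "demo_file_list.txt")
    with open(demo_list, "w", encoding="utf-8") as f:
        f.write("sample1.wav 0\nsample2.wav 1\n")
    for path, label in read_file_list(demo_list, tmp):
        print(label, os.path.basename(path))
```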


Build the pipeline and pass it to the ROCALAudioIterator.

In [None]:
audio_pipeline.build()
audioIterator = ROCALAudioIterator(audio_pipeline)

The output from the iterator includes the audio data as PyTorch tensors, the corresponding labels, and the region of interest.

In [None]:
for i, output_list in enumerate(audioIterator):
    for x in range(len(output_list[0])):
        for audio_tensor, label, roi in zip(output_list[0][x], output_list[1], output_list[2]):
            print("Audio shape: ", audio_tensor.shape)
            print("Label: ", label)
            print("Roi: ", roi)
            draw_patches(audio_tensor, label, "cpu")
audioIterator.reset()
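
To sanity-check the shapes printed by the loop above, the header of a source .wav file can be read with Python's standard wave module, independently of rocAL. This is a minimal sketch: wav_info is a hypothetical helper, the wave module handles only uncompressed PCM files, and the expectation that the frame count matches the decoded sample dimension assumes the decoder neither resamples nor pads.

```python
import os
import tempfile
import wave


def wav_info(path):
    """Return basic header information for a PCM .wav file."""
    with wave.open(path, "rb") as w:
        return {
            "channels": w.getnchannels(),
            "sample_rate": w.getframerate(),
            "sample_width_bytes": w.getsampwidth(),
            "num_frames": w.getnframes(),
        }


# Demo: write one second of 16-bit mono silence, then read its header back.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "silence.wav")
    with wave.open(demo_path, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)      # 2 bytes per sample = 16-bit PCM
        w.setframerate(16000)
        w.writeframes(b"\x00\x00" * 16000)
    print(wav_info(demo_path))
```

Pointing wav_info at one of the files named in the file list gives a reference frame count and channel count to compare against the printed audio shape.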