# Audio Speech Commands Recognition

* Data Repository: https://upatrasgr-my.sharepoint.com/:u:/g/personal/ceid3565_upatras_gr/EcUKreKCY0tHvjjfma33pkwBTU522U-WKjrRaJgnQ_ghLg?e=2w6Bdd
* The tutorial is based on: https://www.tensorflow.org/tutorials/audio/simple_audio

In [1]:
# Loads the autoreload extension in Notebook
%load_ext autoreload
# Sets the autoreload mode to reload all modules before executing code
%autoreload 2

#### Imports
Import the required libraries, select the GPU, and silence warnings.

In [2]:
import os
# Expose only GPU 2 to TensorFlow
os.environ["CUDA_VISIBLE_DEVICES"] = "2"
# Hide TensorFlow INFO and WARNING log messages
os.environ["TF_CPP_MIN_LOG_LEVEL"] = "2"

import yaml
from datetime import datetime
import time

import numpy as np
import tensorflow as tf

import IPython.display as ipd

# To plot pretty figures
%matplotlib inline
import matplotlib as mpl
import matplotlib.pyplot as plt
mpl.rc('axes', labelsize=14)
mpl.rc('xtick', labelsize=12)
mpl.rc('ytick', labelsize=12)

# Set the seed value for experiment reproducibility.
seed = 42
tf.random.set_seed(seed)
np.random.seed(seed)

# Silence all Python warnings (e.g. those raised by scikit-learn)
# by replacing warnings.warn with a no-op
import warnings

def warn(*args, **kwargs):
    """No-op replacement for warnings.warn."""
    pass

warnings.warn = warn

In [3]:
now = datetime.now()
print("date and time of NB execution =", now.strftime("%d/%m/%Y %H:%M:%S"))
start_time = time.time()

date and time of NB execution = 10/12/2023 14:05:27


## Load configuration file and dataset path

Append the parent directory to the module search path so that `utils` can be imported

In [4]:
import sys; sys.path.append('..')

Import Configuration Module 

In [5]:
from utils import load_config

Load configuration file

In [6]:
config = load_config("../config.yml")

In [7]:
# define variables for data path and sample rate
DATASET_PATH = config["TRAINING_DATA_PATH"]
SAMPLE_RATE = 16000

## Load data 
Each audio clip is approximately 1 second long and sampled at 16 kHz.
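At 16 kHz, a 1-second clip holds about 16000 samples, but clips that are only approximately 1 second long will vary in length. If a later step needs fixed-length input, one common approach is to trim or zero-pad each clip. The sketch below is illustrative only: `fix_length` and `TARGET_SECONDS` are hypothetical names, not part of crossai, and padding with zeros assumes a signal centred on zero, i.e. before any rescaling.

```python
import numpy as np

SAMPLE_RATE = 16000      # samples per second, as used in this notebook
TARGET_SECONDS = 1.0     # assumed target duration (hypothetical parameter)
TARGET_LEN = int(SAMPLE_RATE * TARGET_SECONDS)  # 16000 samples


def fix_length(signal, target_len=TARGET_LEN):
    """Trim a 1-D signal to target_len samples, or zero-pad it at the end."""
    signal = np.asarray(signal, dtype=np.float32)
    if len(signal) >= target_len:
        return signal[:target_len]
    # np.pad defaults to constant (zero) padding
    return np.pad(signal, (0, target_len - len(signal)))


short_clip = fix_length(np.ones(12000))  # shorter than 1 s: padded
long_clip = fix_length(np.ones(20000))   # longer than 1 s: trimmed
```

Both `short_clip` and `long_clip` end up with exactly `TARGET_LEN` samples.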

In [8]:
from crossai.loader import audio_loader

# All signals are resampled to `SAMPLE_RATE` and normalised to the range (0, 1)
df = audio_loader(path=DATASET_PATH, sr=SAMPLE_RATE)

Loaded data into the dataframe: 100%|██████████| 10/10 [00:06<00:00,  1.57it/s]                                                   
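The exact normalisation applied inside `audio_loader` is not shown in this notebook. As a rough illustration of what scaling a signal to the range (0, 1) can look like, here is a minimal per-signal min-max sketch; `min_max_normalise` is a hypothetical helper, and crossai may well use a different scheme.

```python
import numpy as np


def min_max_normalise(signal):
    """Rescale a 1-D signal linearly so its minimum maps to 0 and its maximum to 1."""
    signal = np.asarray(signal, dtype=np.float32)
    lo, hi = signal.min(), signal.max()
    if hi == lo:
        # A constant signal has no range to rescale; return zeros instead
        return np.zeros_like(signal)
    return (signal - lo) / (hi - lo)


example = np.array([-0.5, 0.0, 0.25, 0.5], dtype=np.float32)
normalised = min_max_normalise(example)
```

With this scheme, a waveform that is symmetric around zero has its silent samples mapped to roughly 0.5.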


Create a crossai `Audio` object, which is used to run the data processing pipeline.

`cai_audio.data` holds the signals in a dataframe, one signal per row stored as `np.float32` values,  
and `cai_audio.labels` holds the corresponding labels in a dataframe.

In [9]:
from crossai.pipelines.audio import Audio

cai_audio = Audio(df)
cai_audio.data.head()

0    [0.506821, 0.50676304, 0.50670505, 0.5071688, ...
1    [0.53376144, 0.5336966, 0.5336966, 0.5336966, ...
2    [0.6036141, 0.6044156, 0.6044885, 0.6037598, 0...
3    [0.4621682, 0.45547011, 0.4510047, 0.44058546,...
4    [0.5535465, 0.5521018, 0.55163574, 0.5524979, ...
Name: data, dtype: object

#### Show audio commands

In [10]:
np.unique(cai_audio.labels)

array(['down', 'go', 'left', 'no', 'right', 'stop', 'up', 'yes'],
      dtype=object)
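A classifier typically needs integer class indices rather than strings. The sketch below shows one way to build such a mapping from the eight commands listed above. The names `label_to_index` and `index_to_label` are illustrative assumptions, not part of crossai, and the label list is hard-coded so the example is self-contained.

```python
# The eight command labels shown in the output above
labels = ['down', 'go', 'left', 'no', 'right', 'stop', 'up', 'yes']

# Sort the unique labels so the mapping is deterministic across runs
label_to_index = {name: i for i, name in enumerate(sorted(set(labels)))}
index_to_label = {i: name for name, i in label_to_index.items()}

# Encode a few example commands as integer class indices
encoded = [label_to_index[name] for name in ['yes', 'no', 'up']]
```

Sorting mirrors the alphabetical order returned by `np.unique`, so the indices line up with that array.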