# Packages, Libraries, and Constants

- Packages and libraries used throughout the notebook
- Constants and parameters, such as the dataset directories

# Paths to the datasets

1. Wake-word (WW) dataset classes
   - `no_gaali`
   - `gaali`

2. Dataset splits
   - Augmented train data
   - Original train data
   - Test data


In [1]:
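# Wildcard import: provides shared names used in later cells,
# such as tf, ww_train_data_dir and ww_test_data_dir
from packages.utils import *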



## Directory Labels

In [2]:
from packages.utils import list_directory_contents

In [3]:
train_commands = list_directory_contents(ww_train_data_dir, 'Train')
test_commands = list_directory_contents(ww_test_data_dir, 'Test')

Train commands labels: ['no_gaali' 'gaali']
Test commands labels: ['no_gaali' 'gaali']
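
`list_directory_contents` lives in `packages.utils` and its source is not shown in this notebook. As a rough idea of what such a helper might do, the sketch below lists the class sub-folders of a dataset directory. The function name `list_class_folders` and the temporary directory layout are assumptions made for illustration, not the notebook's actual code.

```python
import tempfile
from pathlib import Path


def list_class_folders(data_dir, split_name):
    """Print and return the names of the sub-folders of data_dir (one per class)."""
    labels = [p.name for p in Path(data_dir).iterdir() if p.is_dir()]
    print(f"{split_name} commands labels: {labels}")
    return labels


# Build a throwaway directory tree that mimics the wake-word layout.
with tempfile.TemporaryDirectory() as tmp:
    for class_name in ("no_gaali", "gaali"):
        (Path(tmp) / class_name).mkdir()
    demo_labels = list_class_folders(tmp, "Train")
```

Directory iteration order depends on the file system, so a listing like this is not guaranteed to match the class-index order used by a dataset loader.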


# Dataset Pre-processing

## 1. Train and Validation Datasets

- Create the training and validation datasets

In [4]:
from packages.data_processing import create_train_val_audio_dataset

train_ds, val_ds, label_names = create_train_val_audio_dataset(ww_train_data_dir)
print(f'Labels: {label_names}')

Found 2834 files belonging to 2 classes.
Using 2268 files for training.
Using 566 files for validation.
Audio Shape: (32, 16000)
Label Shape: (32,)
Labels: ['gaali' 'no_gaali']
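
The file counts in this output are consistent with an 80/20 train/validation split and a batch size of 32. The arithmetic below reproduces them. The `validation_split` value is an assumption inferred from the counts (the helper's arguments are not shown), and the batch counts assume the last partial batch is kept.

```python
import math

total_files = 2834        # "Found 2834 files" in the output above
validation_split = 0.2    # assumed; not shown in the notebook
batch_size = 32           # first dimension of the printed audio shape

# The validation share is rounded down; the remainder goes to training.
val_files = int(validation_split * total_files)
train_files = total_files - val_files

# Number of batches per epoch if the final partial batch is kept.
train_batches = math.ceil(train_files / batch_size)
val_batches = math.ceil(val_files / batch_size)

print(train_files, val_files)       # 2268 566
print(train_batches, val_batches)   # 71 18
```

The printed label order (`gaali`, then `no_gaali`) gives the integer class indices: 0 for `gaali` and 1 for `no_gaali`.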




## 2. Test dataset

In [5]:
from packages.data_processing import create_test_audio_dataset

test_ds = create_test_audio_dataset(ww_test_data_dir)

Found 644 files belonging to 2 classes.
Audio Shape: (32, 16000)
Label Shape: (32,)




# Data Processing

- Feature extraction: convert the waveform datasets into mel-spectrogram datasets

In [6]:
from packages.data_processing import preprocess_melspec_audio_datasets

In [7]:
train_mel_spec_ds, val_mel_spec_ds, test_mel_spec_ds = preprocess_melspec_audio_datasets(train_ds, val_ds, test_ds)
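
The body of `preprocess_melspec_audio_datasets` is not shown here. The sketch below is one common way to turn a batch of waveforms into log-mel spectrograms with `tf.signal`. The function name `waveform_to_log_mel`, the STFT settings (`frame_length=255`, `frame_step=128`), the 16 kHz sample rate, and the mel band edges are all assumptions, chosen so that the output shape comes out as `(batch, 124, 128, 1)`.

```python
import tensorflow as tf


def waveform_to_log_mel(waveforms, sample_rate=16000, num_mel_bins=128):
    """Map a float32 batch of waveforms (batch, samples) to (batch, frames, mels, 1)."""
    # Short-time Fourier transform; frame_length and frame_step are assumed values.
    stft = tf.signal.stft(waveforms, frame_length=255, frame_step=128)
    magnitude = tf.abs(stft)  # (batch, frames, fft_bins)

    # Project the linear-frequency bins onto a mel filter bank (assumed band edges).
    mel_matrix = tf.signal.linear_to_mel_weight_matrix(
        num_mel_bins=num_mel_bins,
        num_spectrogram_bins=magnitude.shape[-1],
        sample_rate=sample_rate,
        lower_edge_hertz=80.0,
        upper_edge_hertz=7600.0,
    )
    mel = tf.tensordot(magnitude, mel_matrix, axes=1)

    # Log compression, then a trailing channel axis for image-style layers.
    log_mel = tf.math.log(mel + 1e-6)
    return log_mel[..., tf.newaxis]


# Random noise stands in for two one-second clips.
demo_batch = tf.random.normal([2, 16000])
demo_mel = waveform_to_log_mel(demo_batch)
print(demo_mel.shape)
```

A function like this could be applied to a batched dataset with `ds.map(lambda audio, label: (waveform_to_log_mel(audio), label))`.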

### Shape consistency

In [8]:
print(train_mel_spec_ds.element_spec)
print(val_mel_spec_ds.element_spec)
print(test_mel_spec_ds.element_spec)

(TensorSpec(shape=(None, 124, 128, 1), dtype=tf.float32, name=None), TensorSpec(shape=(None,), dtype=tf.int32, name=None))
(TensorSpec(shape=(None, 124, 128, 1), dtype=tf.float32, name=None), TensorSpec(shape=(None,), dtype=tf.int32, name=None))
(TensorSpec(shape=(None, 124, 128, 1), dtype=tf.float32, name=None), TensorSpec(shape=(None,), dtype=tf.int32, name=None))
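
All three datasets share the element shape `(None, 124, 128, 1)`: an unspecified batch size, 124 time frames, 128 mel bins, and one channel. The frame count can be checked by hand. The window length and hop size below are assumptions (the helper's settings are not shown); they are one combination that yields 124 frames from 16000 samples when the signal is not padded.

```python
samples_per_clip = 16000   # second dimension of the printed audio shape
frame_length = 255         # assumed STFT window length, in samples
frame_step = 128           # assumed hop between windows, in samples
num_mel_bins = 128         # third dimension of the element spec

# Without padding, a window is placed every frame_step samples
# for as long as a full window still fits inside the clip.
num_frames = 1 + (samples_per_clip - frame_length) // frame_step
spectrogram_shape = (num_frames, num_mel_bins, 1)
print(spectrogram_shape)   # (124, 128, 1)
```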


# Model 1

### Input shape

In [None]:
# Take one batch of spectrograms and drop the batch axis to get the per-example shape
example_spectrograms = next(iter(train_mel_spec_ds))[0]
input_shape = example_spectrograms.shape[1:]
print('Input shape:', input_shape)

num_labels = len(label_names)
print(f'Labels: {label_names}')

In [13]:
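from packages.model import model as build_model

### Model Architecture

In [None]:
# Use an alias for the builder so assigning to `model` does not overwrite the imported function
model = build_model(input_shape, num_labels)
model.summary()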

### Compile and Train the model

In [15]:
from packages.model import compile_and_train_model

In [None]:
history = compile_and_train_model(model, train_mel_spec_ds, val_mel_spec_ds)

### Plot Accuracy and Loss

In [None]:
from packages.utils import plot_training_history

In [None]:
plot_training_history(history)

### Evaluate the model performance

Run the model on the test set and check the model's performance:

In [24]:
from packages.model import evaluate_model

In [None]:
evaluate_model(model, test_mel_spec_ds)

### Confusion matrix

In [None]:
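# Predicted class = index of the largest output score for each test example
y_pred = model.predict(test_mel_spec_ds)
y_pred = tf.argmax(y_pred, axis=1)
# Collect the true labels batch by batch; they line up with y_pred only if the
# test dataset yields examples in the same order on every pass (no reshuffling)
y_true = tf.concat(list(test_mel_spec_ds.map(lambda s, lab: lab)), axis=0)
# Class names in index order (0 = 'gaali', 1 = 'no_gaali'), as printed for label_names
label_names_slice = ['gaali', 'no_gaali']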

In [27]:
from packages.model import plot_confusion_matrix

In [None]:
plot_confusion_matrix(y_true, y_pred, label_names_slice)
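
The plot is drawn by a helper whose code is not shown here. The sketch below builds the underlying count table in plain Python to show what a confusion matrix contains. The function name `confusion_counts` and the label values are made up for illustration, and the rows-are-true, columns-are-predicted layout is an assumed convention that may differ from the helper's.

```python
def confusion_counts(y_true, y_pred, num_classes):
    """Return a num_classes x num_classes table: rows are true labels, columns are predictions."""
    table = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        table[t][p] += 1
    return table


# Made-up labels for illustration only (0 = 'gaali', 1 = 'no_gaali').
demo_true = [0, 0, 0, 1, 1, 1, 1, 0]
demo_pred = [0, 0, 1, 1, 1, 0, 1, 0]

demo_matrix = confusion_counts(demo_true, demo_pred, num_classes=2)
print(demo_matrix)    # [[3, 1], [1, 3]]

# Recall for class 0: correctly detected 'gaali' clips / all true 'gaali' clips.
recall_gaali = demo_matrix[0][0] / sum(demo_matrix[0])
print(recall_gaali)   # 0.75
```

Off-diagonal cells count the mistakes: here one `gaali` clip was predicted as `no_gaali` and one `no_gaali` clip was predicted as `gaali`.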

### Save the Keras model

In [29]:
KERAS_MODEL_PATH = "model/wakeword_model_1.keras"

In [30]:
model.save(KERAS_MODEL_PATH)
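
Saving to a relative path such as `model/wakeword_model_1.keras` may fail if the `model/` folder does not exist yet. A small guard like the one below could be run before `model.save`. The helper name `ensure_parent_dir` is hypothetical, and the demo uses a temporary directory rather than the project folder.

```python
import tempfile
from pathlib import Path


def ensure_parent_dir(file_path):
    """Create the folder that will contain file_path if it does not exist yet."""
    parent = Path(file_path).parent
    parent.mkdir(parents=True, exist_ok=True)
    return parent


# Demonstrated inside a temporary directory so nothing is written to the project.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "model" / "wakeword_model_1.keras"
    created = ensure_parent_dir(demo_path)
    parent_exists = created.is_dir()
print(parent_exists)
```

In the notebook this would be called as `ensure_parent_dir(KERAS_MODEL_PATH)` before saving. The saved file can later be restored with `tf.keras.models.load_model(KERAS_MODEL_PATH)`.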