# Optimized Weather Event Classification Training Pipeline

This training pipeline is designed for processing audio data and training a convolutional neural network (CNN) model for audio classification. It specifically deals with weather-related audio data, transforming raw audio files into mel-spectrogram representations that are then used as input for the CNN. Below are the key components and steps involved in this pipeline:

## Pipeline Overview

1. **Audio Data Loading and Preprocessing**:
   - **Loading Audio**: Audio files are loaded with a defined sample rate.
   - **Padding or Trimming**: Each audio is either padded or trimmed to a uniform length based on the specified duration to maintain consistency.
   - **Mel-Spectrogram Conversion**: Converts audio into mel-spectrograms using parameters like FFT window size and hop length.

2. **Data Preparation**:
   - **Spectrogram Resizing**: Spectrograms are resized to fit the model's input dimensions.
   - **Label Encoding**: Converts categorical labels into a one-hot encoding format suitable for classification.

3. **Model Training**:
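   - **CNN Architecture**: The model stacks three convolutional blocks. Each block applies a convolution, batch normalization, max pooling, and dropout, where dropout helps prevent overfitting. The blocks are followed by a global average pooling layer, a 256-unit fully connected layer with dropout, and a softmax output layer.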
   - **Callbacks**: Includes model checkpointing, reducing learning rate on plateau, and early stopping to improve training efficiency and prevent overfitting.

4. **Data Augmentation**:
   - Applies transformations like rotation, width and height shift, zoom, etc., to artificially expand the training dataset, which helps improve model generalization.

5. **Training Execution**:
   - Uses an image data generator to feed data into the model in batches, facilitating efficient training.

6. **Model Evaluation and Saving**:
   - The training and validation loss are plotted to monitor the training process.
   - The weights with the lowest validation loss are checkpointed during training, and the trained model is saved for later use in practical applications.

## Model Deployment

After training, the model is saved as an H5 file so it can be loaded for future predictions or evaluation. The final cell also contains an export to TensorFlow's SavedModel format, but that export is commented out.

This pipeline expects audio organized into one subdirectory per class and is tailored to classifying weather-related sounds.


```python
import os
import librosa
#import librosa.display
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
from tensorflow.keras.layers import Dense, Conv2D, MaxPooling2D, Flatten, Dropout, BatchNormalization, GlobalAveragePooling2D
from tensorflow.keras.models import Sequential
from tensorflow.keras.utils import to_categorical
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from tensorflow.keras.callbacks import ModelCheckpoint, ReduceLROnPlateau, EarlyStopping
from tensorflow.keras.preprocessing.image import ImageDataGenerator
```

# Configuration Dictionary for Audio Processing and Model Input

The `SC` dictionary holds the settings for audio preprocessing and for preparing the CNN model's input. Each setting is described below:

## Audio Processing Parameters:
- **AUDIO_CLIP_DURATION**: The duration of the audio clip in seconds. Each audio file is processed to have this fixed length (2 seconds).
- **AUDIO_NFFT**: The number of FFT (Fast Fourier Transform) points used to calculate the mel-spectrogram (2048 points).
- **AUDIO_WINDOW**: The windowing function for the FFT, set to `None`. This value is not passed to `librosa.feature.melspectrogram`, so librosa's default Hann window is used.
- **AUDIO_STRIDE**: The stride (hop length) between successive FFTs during the spectrogram calculation (200 samples).
- **AUDIO_SAMPLE_RATE**: The sampling rate used for audio files (16000 Hz).
- **AUDIO_MELS**: The number of Mel bands used in the mel-spectrogram (260 bands).
- **AUDIO_FMIN**: The lowest frequency to include when generating the mel-spectrogram (20 Hz).
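- **AUDIO_FMAX**: The highest frequency to include when generating the mel-spectrogram (13000 Hz). This exceeds the Nyquist frequency of the 16000 Hz sample rate (8000 Hz), so the filters for the highest mel bands are empty (all zero) and librosa warns about empty filters in the mel basis. A value of 8000 Hz or lower avoids the empty bands.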
- **AUDIO_TOP_DB**: The `top_db` value passed to `librosa.power_to_db` (80 dB). Values more than 80 dB below the spectrogram's peak are clipped to that floor, which limits the dynamic range of the log mel-spectrogram.

## Model Input Specifications:
- **MODEL_INPUT_IMAGE_WIDTH**: The width of the input images to the CNN model (260 pixels).
- **MODEL_INPUT_IMAGE_HEIGHT**: The height of the input images to the CNN model (260 pixels).
- **MODEL_INPUT_IMAGE_CHANNELS**: The number of channels in the model's input images (3). The single-channel spectrogram is repeated across three channels to match the RGB layout commonly used by image CNNs.

Applying the same settings to every file gives all spectrograms one shape and scale, which the CNN requires for batching and consistent training.


```python
SC = {
    'AUDIO_CLIP_DURATION': 2,
    'AUDIO_NFFT': 2048,
    'AUDIO_WINDOW': None,        # not passed to librosa below; its default Hann window is used
    'AUDIO_STRIDE': 200,
    'AUDIO_SAMPLE_RATE': 16000,
    'AUDIO_MELS': 260,
    'AUDIO_FMIN': 20,
    'AUDIO_FMAX': 13000,         # above the Nyquist frequency (16000 / 2 = 8000 Hz): top mel bands are empty
    'AUDIO_TOP_DB': 80,

    'MODEL_INPUT_IMAGE_WIDTH': 260,
    'MODEL_INPUT_IMAGE_HEIGHT': 260,
    'MODEL_INPUT_IMAGE_CHANNELS': 3,
}
```
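The `SC` values determine the spectrogram geometry. As a rough check, the arithmetic below copies the relevant `SC` values into local variables (rather than importing them) and computes the samples per clip, the number of time frames librosa produces with its default centered framing (`center=True`, an assumption about the librosa defaults in use), and the Nyquist limit that `AUDIO_FMAX` is compared against. With these settings each log-mel spectrogram should be 260 by 161 before it is resized to the model input.

```python
# Values copied from the SC dictionary above.
sample_rate = 16000   # SC['AUDIO_SAMPLE_RATE']
clip_seconds = 2      # SC['AUDIO_CLIP_DURATION']
n_fft = 2048          # SC['AUDIO_NFFT']
hop_length = 200      # SC['AUDIO_STRIDE']
fmax = 13000          # SC['AUDIO_FMAX']

samples_per_clip = sample_rate * clip_seconds   # 32000 samples
# With librosa's default center=True, a signal of n samples yields 1 + n // hop frames.
n_frames = 1 + samples_per_clip // hop_length   # 161 frames
nyquist_hz = sample_rate / 2                    # 8000.0 Hz, highest representable frequency
fft_bin_hz = sample_rate / n_fft                # 7.8125 Hz between FFT bins
hop_ms = 1000 * hop_length / sample_rate        # 12.5 ms between frames

print(f"{samples_per_clip} samples -> {n_frames} frames, {hop_ms} ms hop")
print(f"FFT bin spacing: {fft_bin_hz} Hz; fmax above Nyquist: {fmax > nyquist_hz}")
```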

# Function: load_and_pad_audio

The `load_and_pad_audio` function is designed to load an audio file, ensure it has a consistent duration, and handle discrepancies in length by either padding or trimming the audio data. Below are the steps and mechanisms involved:

## Function Details:
- **Parameters**:
  - `file_path`: The path to the audio file.
  - `duration`: The target duration of the audio in seconds. Default is 2 seconds.
  - `sr`: The sample rate to use when loading the audio. Default is 16000 Hz.

- **Process**:
  1. **Loading**: The function attempts to load an audio file using `librosa.load` with the specified sample rate (`sr`).
  2. **Duration Adjustment**:
     - **Short Audio**: If the loaded audio is shorter than the required duration, it calculates the necessary padding on both ends to make the audio meet the specified duration.
     - **Long Audio**: If the audio is longer than required, it keeps the centered segment of the required length, trimming equal amounts from the start and the end.
  3. **Error Handling**: If the audio file cannot be loaded, an error message is printed, and the function returns `None` for both the audio and sample rate.

- **Output**:
  - Returns the adjusted audio array and the sample rate if the audio file is successfully processed. If there is an error, returns `None` for both.

A fixed clip length gives every spectrogram the same number of time frames, which downstream feature extraction and batched model training depend on.


```python
def load_and_pad_audio(file_path, duration=2, sr=16000):
    try:
        # librosa resamples to `sr`, so the returned rate is not needed.
        audio, _ = librosa.load(file_path, sr=sr)
        # int() keeps slicing valid if a non-integer duration is passed.
        required_samples = int(sr * duration)
        audio_length = len(audio)

        if audio_length < required_samples:
            # Zero-pad both ends; the odd leftover sample goes at the end.
            pad_length = (required_samples - audio_length) // 2
            audio = np.pad(audio, (pad_length, required_samples - audio_length - pad_length), "constant")
        elif audio_length > required_samples:
            # Keep the centered segment.
            start = (audio_length - required_samples) // 2
            audio = audio[start:start + required_samples]
        return audio, sr
    except Exception as e:
        print(f"Error loading {file_path}: {e}")
        return None, None
```
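The padding and cropping arithmetic can be checked without audio files or librosa. The sketch below uses a hypothetical helper, `center_pad_or_crop`, that reproduces only the length logic of `load_and_pad_audio` on plain NumPy arrays.

```python
import numpy as np

def center_pad_or_crop(audio, required_samples):
    """Zero-pad or center-crop a 1-D signal to exactly required_samples.

    Mirrors the length logic of load_and_pad_audio without loading a file.
    """
    audio_length = len(audio)
    if audio_length < required_samples:
        # Split the padding; any odd leftover sample goes on the right.
        left = (required_samples - audio_length) // 2
        right = required_samples - audio_length - left
        return np.pad(audio, (left, right), "constant")
    if audio_length > required_samples:
        # Keep the middle segment, dropping equal amounts from both ends.
        start = (audio_length - required_samples) // 2
        return audio[start:start + required_samples]
    return audio

short = center_pad_or_crop(np.ones(5), 8)     # [0, 1, 1, 1, 1, 1, 0, 0]
long = center_pad_or_crop(np.arange(10), 4)   # [3, 4, 5, 6]
```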

# Function: load_audio_files

The `load_audio_files` function is responsible for loading and processing audio files from a specified directory, converting them into mel-spectrograms, and organizing them along with their labels for further use in machine learning training. Below is a detailed breakdown of its functionality:

## Function Details:
- **Parameter**:
  - `path`: The directory path where audio files are organized by class labels in subdirectories.

- **Process**:
  1. **Directory Reading**: Identifies all subdirectories within the specified path, each representing a class label.
  2. **Audio Processing**:
     - For each class label, it iterates through the audio files in the corresponding subdirectory.
     - Each audio file is loaded and processed to ensure it has a uniform duration using the `load_and_pad_audio` function.
     - Converts the audio into a mel-spectrogram using librosa's `melspectrogram` function with predefined settings (FFT points, hop length, number of mel bands, etc.).
     - Converts the mel-spectrogram to a decibel scale using `librosa.power_to_db`, which compresses its dynamic range; values more than `AUDIO_TOP_DB` below the peak are clipped.
  3. **Data Collection**:
     - The processed mel-spectrogram and its corresponding label are stored.
     - Continues this process for all audio files, skipping any file that `load_and_pad_audio` fails to load.
  4. **Output Compilation**:
     - Compiles all successfully processed audio data into a list of mel-spectrograms and their corresponding labels.
     - Also returns a list of class labels identified from the directory structure.

- **Output**:
  - Returns three items:
    - `audios`: A list of mel-spectrogram arrays.
    - `labels`: A list of labels corresponding to the mel-spectrograms.
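    - `class_labels`: A list of unique class labels derived from the directory names, in the order returned by `os.listdir` (which is arbitrary). The integer ids assigned later by `LabelEncoder` follow sorted order instead, so `le.classes_` is the mapping to use when decoding predictions.

Every file passes through the same loading, length adjustment, and spectrogram settings, so the resulting features are consistently formatted and correctly labeled for training.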


```python
def load_audio_files(path):
    audios, labels = [], []
    class_labels = [d for d in os.listdir(path) if os.path.isdir(os.path.join(path, d))]
    for label in class_labels:
        class_path = os.path.join(path, label)
        for file in os.listdir(class_path):
            file_path = os.path.join(class_path, file)
            audio, sr = load_and_pad_audio(file_path, duration=SC['AUDIO_CLIP_DURATION'], sr=SC['AUDIO_SAMPLE_RATE'])
            if audio is None:
                continue
            mel_spectrogram = librosa.feature.melspectrogram(y=audio, sr=sr,
                n_fft=SC['AUDIO_NFFT'],
                hop_length=SC['AUDIO_STRIDE'],
                n_mels=SC['AUDIO_MELS'],
                fmin=SC['AUDIO_FMIN'],
                fmax=SC['AUDIO_FMAX'])
            log_mel_spectrogram = librosa.power_to_db(mel_spectrogram, top_db=SC['AUDIO_TOP_DB'])
            audios.append(log_mel_spectrogram)
            labels.append(label)
    return audios, labels, class_labels
```
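The decibel conversion in `load_audio_files` relies on `librosa.power_to_db`. The function below is a simplified re-implementation for real-valued power arrays (the name `power_to_db_sketch` is hypothetical, and librosa's version also accepts a callable `ref` and complex input), showing how `top_db` clips the quietest values. Because the pipeline keeps the default `ref=1.0`, its dB values are measured relative to a power of 1.0 rather than to each spectrogram's peak; passing `ref=np.max` to librosa would place the loudest bin at 0 dB.

```python
import numpy as np

def power_to_db_sketch(S, ref=1.0, amin=1e-10, top_db=80.0):
    """Simplified re-implementation of librosa.power_to_db for a real array."""
    # amin guards against log10(0) for silent bins.
    log_spec = 10.0 * np.log10(np.maximum(amin, S))
    log_spec -= 10.0 * np.log10(max(amin, ref))
    if top_db is not None:
        # Clip everything more than top_db below the loudest value.
        log_spec = np.maximum(log_spec, log_spec.max() - top_db)
    return log_spec

power = np.array([1.0, 1e-2, 1e-6, 1e-12])
db = power_to_db_sketch(power)   # approximately [0, -20, -60, -80]
```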

# Preparing Spectrograms and Data Splitting for Training

This section of the code is responsible for resizing spectrograms to a uniform target size, encoding labels for classification, padding spectrograms for uniformity, and splitting the data into training and testing sets. Here is a detailed breakdown of each part:

## Functions and Processes:
- **prepare_spectrograms**:
  - **Purpose**: Resizes each spectrogram to a specified target size and adjusts the channel dimension to match the input requirements of the CNN model.
  - **Process**:
    - Converts each spectrogram to a 3-channel image to mimic RGB data, which is typical input for pre-trained CNN models.
    - Uses bilinear interpolation for resizing, a standard method for image data.

- **Label Encoding**:
  - Utilizes `LabelEncoder` to transform textual class labels into unique integers.
  - Transforms these integer labels into one-hot encoded format using `to_categorical`, making them suitable for model training.

- **Spectrogram Padding**:
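  - Ensures all spectrograms have the same dimensions by zero-padding them where necessary, which batching in neural network training requires. Because every clip is already padded or trimmed to the same duration, the spectrograms normally share one shape, so this step acts as a safeguard.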

- **Data Splitting**:
  - Divides the data into training and testing sets with a split ratio of 80% training and 20% testing. This helps in evaluating the model's performance on unseen data.
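The padding and channel-replication steps can be traced on small stand-in arrays. This NumPy sketch uses made-up 4-row spectrograms and omits the `tf.image.resize` step; it only shows how the array shapes change. One detail worth noting: in a log-mel spectrogram the pad value 0 means 0 dB, not silence, so padding with the spectrogram's minimum value (via `np.pad`'s `constant_values` argument) would represent the quietest level more faithfully if padding ever occurs.

```python
import numpy as np

# Two stand-in log-mel spectrograms (n_mels x frames); real ones are larger.
spectrograms = [np.full((4, 6), -40.0), np.full((4, 5), -40.0)]

# Zero-pad every spectrogram to the largest height and width.
max_h = max(s.shape[0] for s in spectrograms)
max_w = max(s.shape[1] for s in spectrograms)
padded = np.array([
    np.pad(s, ((0, max_h - s.shape[0]), (0, max_w - s.shape[1])), "constant")
    for s in spectrograms
])                                                               # shape (2, 4, 6)

# Add a channel axis and copy it three times to mimic RGB (resize step omitted).
rgb_like = np.repeat(padded[:, :, :, np.newaxis], 3, axis=3)   # shape (2, 4, 6, 3)
```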

## Code Execution:
- Loads audio files from the specified path and processes them into mel-spectrograms.
- Encodes labels and prepares spectrograms for neural network input.
- Splits the prepared spectrogram data and corresponding labels into training and testing datasets.

## Outputs:
- **X_train, X_test**: Training and testing sets of spectrograms.
- **y_train, y_test**: Corresponding training and testing sets of labels.

Together, these steps turn the raw spectrograms and labels into fixed-size arrays that match the CNN's expected input.


```python
def prepare_spectrograms(spectrograms, target_size=(260, 260)):
    resized_spectrograms = np.array([tf.image.resize(spect[:, :, np.newaxis], target_size, method='bilinear').numpy() for spect in spectrograms])
    resized_spectrograms = np.repeat(resized_spectrograms, 3, axis=3)
    return resized_spectrograms

path = r"D:\Deakin\Project Echo\Weather_Sounds\Weather_Sounds_train_test"
audios, labels, class_labels = load_audio_files(path)

le = LabelEncoder()
labels_encoded = le.fit_transform(labels)
labels_categorical = to_categorical(labels_encoded)

max_length = max(audio.shape[1] for audio in audios)
max_height = max(audio.shape[0] for audio in audios)
audios_padded = np.array([
    np.pad(audio, ((0, max_height - audio.shape[0]), (0, max_length - audio.shape[1])), 'constant')
    for audio in audios
])

prepared_spectrograms = prepare_spectrograms(audios_padded)
X_train, X_test, y_train, y_test = train_test_split(prepared_spectrograms, labels_categorical, test_size=0.2, random_state=42)
```
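The label encoding can be illustrated without scikit-learn. The sketch uses hypothetical labels; `np.unique` with `return_inverse=True` sorts the class names and assigns integer ids the same way `LabelEncoder` does, and indexing an identity matrix produces one-hot rows like those from `to_categorical`. The sorted order is why `le.classes_`, not `class_labels` (which follows `os.listdir` order), maps model outputs back to class names.

```python
import numpy as np

# Hypothetical labels in the order files were read.
labels = ["wind", "rain", "thunder", "rain", "wind"]

# np.unique sorts the class names; return_inverse gives each label's index.
classes, encoded = np.unique(labels, return_inverse=True)
# Row i of the identity matrix is the one-hot vector for class i.
one_hot = np.eye(len(classes))[encoded]

print(classes)   # ['rain' 'thunder' 'wind']
print(encoded)   # [2 0 1 0 2]
```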



# Function: build_model

The `build_model` function constructs a convolutional neural network (CNN) tailored for image-based classification, designed to handle preprocessed audio data represented as spectrograms. Here's how the function is structured and what each part accomplishes:

## Parameters:
- **input_shape**: A tuple defining the shape of the input data, including height, width, and number of channels.
- **num_classes**: The number of unique labels or classes in the dataset, which determines the output layer's size.

## Model Architecture:
1. **Initial Convolution Layer**:
   - Applies a 32-filter convolutional layer with a (3x3) kernel, 'same' padding, and 'relu' activation. This layer extracts initial features from the input while preserving its spatial size.
2. **Batch Normalization, Pooling, and Dropout**:
   - Follows each convolution layer with batch normalization, which normalizes the activations of the previous layer, helping in faster convergence and more stable training.
   - Uses max pooling with a (2x2) window to halve the spatial dimensions of the feature maps, summarizing the features, and then applies dropout with a rate of 0.25.
3. **Additional Convolution Layers**:
   - Stacks two more convolution layers with an increasing number of filters (64 and 128), each followed by the same batch normalization, max pooling, and dropout sequence. These layers increase the model's capacity to learn more complex features, while dropout helps prevent overfitting.
4. **Global Average Pooling**:
   - Applies global average pooling to reduce each feature map to a single value, reducing the total number of parameters and decreasing the risk of overfitting.
5. **Fully Connected Layers**:
   - A dense layer with 256 units and 'relu' activation processes the pooled features, followed by a dropout layer with a rate of 0.5.
   - The final output layer has `num_classes` units with 'softmax' activation, producing a probability for each class.

## Compilation:
- The model is compiled with the 'adam' optimizer and 'categorical_crossentropy' loss function, which are suitable for multi-class classification tasks.
- It also tracks 'accuracy' as a metric to evaluate the model's performance during training.

## Output:
- Returns the fully constructed and compiled CNN model, ready to be trained on spectrogram data for audio classification.

Global average pooling keeps the number of parameters far smaller than flattening the final feature maps would, which helps limit overfitting on spectrogram data.
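The shapes and parameter counts of `build_model` can be worked out by hand. The function below is a hypothetical helper that follows the layer list above; the number of classes (5) is an assumed example, since the dataset's class count is not stated here. For a 260 by 260 by 3 input, the three pooling steps reduce the feature map to 32 by 32 by 128 before global average pooling, and the total should match what `model.summary()` reports, including 448 non-trainable batch-normalization statistics.

```python
def trace_build_model(input_hw=260, in_channels=3, num_classes=5):
    """Hand-computed output shape and parameter count for build_model's layers."""
    params = 0
    h = w = input_hw
    c = in_channels
    for filters in (32, 64, 128):
        params += 3 * 3 * c * filters + filters   # Conv2D kernel + bias ('same' keeps h, w)
        params += 4 * filters                     # BatchNorm: gamma, beta, moving mean and variance
        h, w = h // 2, w // 2                     # MaxPooling2D (2, 2) with 'valid' padding floors odd sizes
        c = filters
    # GlobalAveragePooling2D: (h, w, c) -> (c,), no parameters.
    params += c * 256 + 256                       # Dense(256)
    params += 256 * num_classes + num_classes     # Dense(num_classes)
    return (h, w, c), params

final_map, total = trace_build_model()
print(final_map, total)
```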


```python
from tensorflow.keras.layers import Input

def build_model(input_shape, num_classes):
    model = Sequential()
    # An explicit Input layer avoids Keras's warning about passing input_shape to Conv2D.
    model.add(Input(shape=input_shape))
    model.add(Conv2D(32, (3, 3), padding='same', activation='relu'))
    model.add(BatchNormalization())
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Dropout(0.25))

    model.add(Conv2D(64, (3, 3), padding='same', activation='relu'))
    model.add(BatchNormalization())
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Dropout(0.25))

    model.add(Conv2D(128, (3, 3), padding='same', activation='relu'))
    model.add(BatchNormalization())
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Dropout(0.25))

    model.add(GlobalAveragePooling2D())
    model.add(Dense(256, activation='relu'))
    model.add(Dropout(0.5))
    model.add(Dense(num_classes, activation='softmax'))

    model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
    return model


# Keras expects (height, width, channels).
input_shape = (SC['MODEL_INPUT_IMAGE_HEIGHT'], SC['MODEL_INPUT_IMAGE_WIDTH'], SC['MODEL_INPUT_IMAGE_CHANNELS'])
model = build_model(input_shape, num_classes=len(class_labels))
```



# Model Training Setup and Callbacks

This section of the code defines the training process, including data augmentation, training execution, and the use of callbacks to enhance model training. Each component is crafted to optimize the model's performance and handle the training process efficiently.

## Training Callbacks:
- **ModelCheckpoint**:
  - Saves the model's weights (`save_weights_only=True`) to `models/best_model.weights.h5` at the end of each epoch in which the validation loss (`val_loss`) improves on the best value so far.
  - `save_best_only=True` means the file is overwritten only on improvement, so it always holds the best weights seen during training.
  - `mode='min'` specifies that a lower `val_loss` counts as an improvement.
- **ReduceLROnPlateau**:
  - Reduces the learning rate when the validation loss has stopped improving.
  - `factor=0.1` reduces the learning rate to 10% of its current value.
  - `patience=10` waits for 10 epochs without improvement before reducing the learning rate.
  - `min_lr=0.00001` sets a lower bound on the learning rate to prevent it from decreasing too much.
  - `verbose=1` ensures messages about learning rate reduction are printed.
- **EarlyStopping**:
  - Stops training when the validation loss has not improved for a given number of epochs (`patience=20`).
  - `restore_best_weights=True` ensures that the model's weights are reverted to the best encountered during training upon early stop.
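The interaction between the two patience values can be traced with a simplified simulation. The function below re-implements only the counters of `ReduceLROnPlateau` (with its default `min_delta=1e-4`) and `EarlyStopping` (default `min_delta=0`), ignoring cooldown, baselines, and weight restoration; the starting rate of 0.001 assumes the Keras default for `optimizer='adam'`. If validation loss stops improving after the first epoch, the learning rate drops once to 1e-4 after 10 stalled epochs, and training stops after 21 epochs, before a second reduction can take effect.

```python
def simulate_callbacks(val_losses, lr=1e-3, factor=0.1, lr_patience=10,
                       min_lr=1e-5, min_delta=1e-4, stop_patience=20):
    """Simplified trace of ReduceLROnPlateau + EarlyStopping on val_loss.

    Returns the learning rate in effect for each epoch that actually runs.
    """
    lr_best = es_best = float("inf")
    lr_wait = es_wait = 0
    lrs_used = []
    for loss in val_losses:
        lrs_used.append(lr)  # learning rate in effect during this epoch
        # ReduceLROnPlateau: an improvement must beat the best by more than min_delta.
        if loss < lr_best - min_delta:
            lr_best, lr_wait = loss, 0
        else:
            lr_wait += 1
            if lr_wait >= lr_patience and lr > min_lr:
                lr = max(lr * factor, min_lr)
                lr_wait = 0
        # EarlyStopping (min_delta=0): any decrease resets its counter.
        if loss < es_best:
            es_best, es_wait = loss, 0
        else:
            es_wait += 1
            if es_wait >= stop_patience:
                break  # training stops after this epoch
    return lrs_used

# A run whose validation loss never improves after the first epoch.
lrs = simulate_callbacks([1.0] * 100)
print(len(lrs), lrs[0], lrs[-1])   # 21 epochs; the rate falls from 1e-3 to about 1e-4
```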

## Data Augmentation:
- Uses `ImageDataGenerator` to increase the diversity of the training data by applying random transformations to each batch as it is generated, including:
  - `rotation_range=20`: Rotates images by up to 20 degrees.
  - `width_shift_range=0.2` and `height_shift_range=0.2`: Shifts images along the width and height by up to 20% of their dimensions.
  - `shear_range=0.2`: Applies shear transformations.
  - `zoom_range=0.2`: Zooms into images by up to 20%.
  - `horizontal_flip=True`: Flips images horizontally. Because the horizontal axis of a spectrogram is time, this plays the sound backwards, which is rarely a realistic variation for audio.
  - `fill_mode='nearest'`: Uses the nearest pixels to fill in new pixels when applying transformations.
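Because rotation, shear, and flips distort the time and frequency axes of a spectrogram, one audio-specific alternative (not used in the original pipeline) is masking in the style of SpecAugment: blanking a random band of frequency rows and a random span of time columns. The sketch below is a minimal NumPy version; the function name, the default mask widths of 20, and filling with the spectrogram's mean are illustrative assumptions. Since it accepts a single (height, width, channels) image and returns one of the same shape, it could be passed to `ImageDataGenerator` through its `preprocessing_function` argument.

```python
import numpy as np

def mask_time_frequency(spec, max_freq_width=20, max_time_width=20, rng=None):
    """Return a copy of `spec` with one frequency band and one time span masked.

    Accepts (freq, time) or (freq, time, channels) arrays; masked cells are set
    to the mean of the original spectrogram.
    """
    rng = np.random.default_rng() if rng is None else rng
    out = np.array(spec, copy=True)
    fill = out.mean()
    n_freq, n_time = out.shape[0], out.shape[1]
    # Mask sizes are drawn from 1..max width, capped at the axis length.
    f = int(rng.integers(1, min(max_freq_width, n_freq) + 1))
    t = int(rng.integers(1, min(max_time_width, n_time) + 1))
    f0 = int(rng.integers(0, n_freq - f + 1))
    t0 = int(rng.integers(0, n_time - t + 1))
    out[f0:f0 + f] = fill        # frequency mask: whole rows
    out[:, t0:t0 + t] = fill     # time mask: whole columns
    return out
```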

## Training Execution:
- The model is trained using the `fit` method on batches of data provided by `train_generator`.
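- `epochs=100` sets the maximum number of passes over the training set; early stopping may end training sooner.
- The test split is passed as `validation_data`, and the callbacks above monitor its loss. Because the same split drives checkpointing and early stopping, it is not a fully independent estimate of performance on unseen data.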

## Outputs:
- `history`: Captures the training history, including metrics such as loss and accuracy for both training and validation phases.

Together, augmentation and the three callbacks aim to improve generalization, lower the learning rate when progress stalls, and stop training once the validation loss stops improving.


```python
# Make sure the checkpoint directory exists before training starts.
os.makedirs('models', exist_ok=True)
checkpoint_callback = ModelCheckpoint('models/best_model.weights.h5', save_best_only=True, monitor='val_loss', mode='min', save_weights_only=True)

reduce_lr_callback = ReduceLROnPlateau(monitor='val_loss', factor=0.1, patience=10, min_lr=0.00001, verbose=1)

early_stopping_callback = EarlyStopping(monitor='val_loss', patience=20, restore_best_weights=True)

train_datagen = ImageDataGenerator(
    rotation_range=20,
    width_shift_range=0.2,
    height_shift_range=0.2,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True,
    fill_mode='nearest')

train_generator = train_datagen.flow(X_train, y_train, batch_size=32)

history = model.fit(x=train_generator, epochs=100, validation_data=(X_test, y_test),
                    callbacks=[checkpoint_callback, reduce_lr_callback, early_stopping_callback])
```

```text
Epoch 1/100
170/294 ━━━━━━━━━━━━━━━━━━━━ 4:40 2s/step - accuracy: 0.8325 - loss: 0.4333

AbortedError: Graph execution error:

Detected at node StatefulPartitionedCall/gradient_tape/sequential_1/conv2d_2_1/convolution/Conv2DBackpropFilter defined at (most recent call last):
<stack traces unavailable>
Operation received an exception:Status: 1, message: could not create a primitive, in file tensorflow/core/kernels/mkl/mkl_conv_grad_filter_ops.cc:685
	 [[{{node StatefulPartitionedCall/gradient_tape/sequential_1/conv2d_2_1/convolution/Conv2DBackpropFilter}}]] [Op:__inference_one_step_on_iterator_74118]
```

# Plotting Training and Validation Loss

This code segment is used to visualize the training and validation loss over epochs, providing a graphical representation of the model's learning progress. Here's an overview of how it is structured:

## Code Explanation:
- `plt.plot(history.history['loss'], label='Training loss')`: Plots the training loss values stored in `history.history['loss']` across all epochs. This line adds a label "Training loss" to distinguish it in the plot.
- `plt.plot(history.history['val_loss'], label='Validation loss')`: Similarly, this line plots the validation loss values from `history.history['val_loss']`, labeled as "Validation loss".
- `plt.title('Training and Validation Loss')`: Sets the title of the plot to "Training and Validation Loss" to indicate what the graph represents.
- `plt.legend()`: Adds a legend to the plot, which helps in distinguishing between the training and validation loss lines.
- `plt.show()`: Displays the plot. This function call ensures that the plot is rendered and shown to the user.

## Purpose:
- **Visualization**: This plot is crucial for understanding how well the model is learning and generalizing over time. It helps in identifying patterns such as overfitting or underfitting based on the divergence or convergence of these two lines.
- **Monitoring**: Allows developers and researchers to monitor the training process visually, making it easier to decide whether further training is necessary or if adjustments need to be made to the training process.

## Output:
- A line graph showing the loss on the vertical axis and the number of epochs on the horizontal axis. The training loss is typically shown in one color and the validation loss in another to clearly illustrate the differences and trends during the training phase.

This visualization is an essential part of machine learning model training, as it provides immediate visual feedback on the effectiveness and progression of the training regime.


```python
plt.plot(history.history['loss'], label='Training loss')
plt.plot(history.history['val_loss'], label='Validation loss')
plt.title('Training and Validation Loss')
plt.legend()
plt.show()
```
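Once training completes, the epoch with the lowest validation loss can be read directly from the history. The helper below is a hypothetical sketch that works on the plain dictionary stored in `history.history`; that epoch's weights are the ones `ModelCheckpoint` writes with `save_best_only=True`.

```python
def best_epoch(history_dict, key="val_loss"):
    """Return (1-based epoch, value) of the lowest entry in history_dict[key]."""
    values = history_dict[key]
    # On ties, min() keeps the first index, matching a strict-improvement rule.
    idx = min(range(len(values)), key=values.__getitem__)
    return idx + 1, values[idx]

# Usage with a made-up history (in the notebook: best_epoch(history.history)).
example = {"loss": [0.9, 0.6, 0.5, 0.45], "val_loss": [0.8, 0.55, 0.58, 0.6]}
print(best_epoch(example))   # (2, 0.55)
```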

# Saving the Trained Model

This section saves the trained model so it can be reused outside the training session. The cell below saves it as an HDF5 file; an additional TensorFlow SavedModel export is included but commented out.

## Code Explanation:
- **Directory Setup** (commented out):
  - `model_dir = "weather_audio_detection_model"`: Specifies the directory where the TensorFlow SavedModel would be stored.
- **Save as TensorFlow SavedModel** (commented out):
  - `tf.saved_model.save(model, model_dir)`: Exports the model's computation graph and weights in TensorFlow's SavedModel format, the format used by TensorFlow Serving.
- **Save as HDF5 File**:
  - `model.save('WeatherAudioDetectionModel.h5')`: Saves the architecture, weights, and compile configuration in the HDF5 format, which can be reloaded with `tf.keras.models.load_model`. Recent Keras versions treat HDF5 as a legacy format and recommend the native `.keras` format instead.

## Purpose:
- **Versatility and Compatibility**: The HDF5 file can be reloaded in Keras environments, and enabling the SavedModel export would also allow deployment with TensorFlow Serving.
- **Preservation**: These methods ensure the preservation of the model's state after training, allowing for easy replication of results and further analysis at any time.

## Output:
- One file is generated: an HDF5 file named `WeatherAudioDetectionModel.h5`.
- If the commented-out lines are enabled, a directory named `weather_audio_detection_model` containing the SavedModel is also created.

Saving the model lets it be used beyond the environment in which it was trained.


```python
# model_dir = "weather_audio_detection_model"
# tf.saved_model.save(model, model_dir)
model.save('WeatherAudioDetectionModel.h5')
```
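To use the saved file, it can be reloaded with `tf.keras.models.load_model('WeatherAudioDetectionModel.h5')`, and new clips must pass through the same preprocessing (`load_and_pad_audio`, the mel-spectrogram settings in `SC`, and `prepare_spectrograms`) before `model.predict`. The sketch below covers only the final step, mapping softmax outputs to class names. `decode_predictions` is a hypothetical helper, and the class list stands in for `le.classes_`, which would need to be stored alongside the model.

```python
import numpy as np

def decode_predictions(probs, classes):
    """Map each softmax row to (class name, probability) via argmax."""
    probs = np.asarray(probs)
    idx = probs.argmax(axis=1)
    return [(str(classes[i]), float(probs[row, i])) for row, i in enumerate(idx)]

classes = np.array(["rain", "thunder", "wind"])   # stand-in for le.classes_
probs = np.array([[0.1, 0.7, 0.2], [0.6, 0.3, 0.1]])
print(decode_predictions(probs, classes))  # [('thunder', 0.7), ('rain', 0.6)]
```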