<a href="https://colab.research.google.com/github/radhakrishnan-omotec/arwan-iris-dog-repo/blob/main/ISEF_FINAL_ArwanMakhija_Prediction_EfficientNet_V2%E2%80%91L_CNN_Training.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

---

# Enhanced Python Notebook for **TailSense**: EfficientNet V2‑L based Canine Pet Audio Spectrogram Classification

### Author : ARWAAN MAKHIJA

Below is an enhanced Python notebook implementation for Google Colab that integrates both image classification and spectrogram audio classification using the EfficientNet V2‑L model, optimized for maximum accuracy and depth.

It leverages EfficientNet V2‑L’s deep architecture (~118M parameters) with residual connections for classifying dog emotions from both image and audio data derived from videos of a Cocker Spaniel.

The dataset is assumed to contain 8-10 emotion classes (e.g., "defensive," "stressed," "friendly"); this part uses the 8 classes defined in Step 2. The implementation includes data preprocessing, model training, evaluation, a Gradio interface for real-time inference, and TensorFlow Lite conversion for edge deployment.

# EfficientNet V2‑L-based Spectrogram Audio Classification

# **Part 2** : Cocker Spaniel 8 Emotions Prediction EfficientNet V2‑L-based Spectrogram Audio Classification


8-class Cocker Spaniel emotion classification using spectrogram images with EfficientNetV2-L for maximum accuracy.

---

# My Project Key Enhancements:
**EfficientNetV2-L for Maximum Accuracy:**<br>
Replaced EfficientNetB0 with EfficientNetV2-L (~118M parameters), the largest EfficientNet model in TensorFlow’s V2 family, designed for high accuracy with deeper layers and advanced scaling (width, depth, resolution).<br>
Used ImageNet weights for transfer learning, freezing the base model initially to leverage pre-trained features.<br>
Fine-tuned the last 30 layers (as in the original EfficientNet V2-L code) to adapt to spectrogram-specific patterns, balancing computational cost and accuracy.<br>
Adjusted input size to 480x480, the default for EfficientNetV2-L, to maximize feature extraction (increased from 224x224).<br>
Called tf.keras.applications.efficientnet_v2.preprocess_input for API consistency. In Keras this function is a pass-through for EfficientNetV2: rescaling is built into the model, which expects pixel values in the [0, 255] range.<br><br>
**Custom Head for Deep Features:**<br>
Increased dense layer sizes to 2048 and 1024 units (from 512 and 256) to handle the richer feature representations from EfficientNetV2-L’s deeper architecture.<br>
Retained dropout rates (0.5 and 0.3) to prevent overfitting, given the model’s high parameter count.<br><br>
**Optimized Training:**<br>
Set initial learning rate to 1e-4 and fine-tuning learning rate to 1e-5 for stable convergence with the larger model.<br>
Extended fine-tuning to 15 epochs (from 10) to fully leverage EfficientNetV2-L’s capacity, while keeping EarlyStopping to prevent overfitting.<br>
Reduced batch size to 16 (from 32) to accommodate the larger input size (480x480) and model complexity within Colab’s GPU memory constraints.<br><br>
**Retained Enhancements:**<br>
Dataset: Kept the 8-class Cocker Spaniel emotion dataset (Sad, Happy, Stress, Restless, Normal, Love, Unhappy, Tired) and spectrogram pipeline (Librosa for mel-spectrograms).<br>
Data Augmentation: Preserved advanced augmentation (rotation_range=30, vertical_flip=True, brightness_range=[0.8, 1.2]) to enhance generalization for spectrogram images.<br>
Learning Rate Scheduling: Retained ReduceLROnPlateau (factor=0.5, patience=5, min_lr=1e-6) for adaptive optimization.<br>
Grad-CAM: Kept Grad-CAM for interpretability, using the top_conv layer (last convolutional layer in EfficientNetV2-L), allowing visualization of spectrogram regions critical for emotion predictions.<br>
Gradio Interface: Maintained the audio-to-spectrogram prediction pipeline, accepting .wav inputs and displaying emotion predictions with confidence scores.<br>
TensorFlow Lite: Preserved float16 quantization for edge deployment on devices like Raspberry Pi 5.<br>
Modular Code: Retained functions (split_dataset, plot_training_metrics, plot_confusion_matrix, display_gradcam) for clarity and reusability.<br><br>
**Alignment with Previous Code:**<br>
Built on the original EfficientNet V2-L implementation from rpi_testing_isef1_efficientnet_dog_emotion_classification.py, adopting its fine-tuning strategy (last 30 layers) and dense layer structure (2048, 1024 units).<br>
Integrated the template’s (Cocker_Spaniel_Emotion_ResNet152_Training.ipynb) dataset structure, data splitting, and evaluation metrics (confusion matrix, classification report).<br>
Ensured compatibility with Google Colab’s GPU environment, optimizing for the larger model’s computational demands.<br><br>

**Notes:**<br>
The code assumes an audio dataset at /content/drive/MyDrive/Cocker_Spaniel_Emotions_Audio with subfolders for each emotion (Sad, Happy, etc.) containing .wav files. Users must provide this dataset and update paths accordingly.<br>
The Grad-CAM implementation uses the top_conv layer, the last convolutional layer in EfficientNetV2-L. Users should verify this using model.summary() if the architecture differs.<br>
The input size is set to 480x480, optimal for EfficientNetV2-L, but may require significant GPU memory. Users with limited resources can reduce to 384x384, though this may slightly impact accuracy.<br>
Training epochs are set to 50 (initial) + 15 (fine-tuning), with EarlyStopping to prevent overfitting. Users may adjust based on dataset size and convergence.<br>
The batch size is reduced to 16 to fit within Colab’s GPU memory for the larger model and input size. Users with high-end GPUs can increase it for faster training.<br>
The TFLite model is optimized with float16 quantization, but EfficientNetV2-L’s size may challenge edge devices like Raspberry Pi 5. Users may need to test performance or consider model pruning for deployment.<br>
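
To put the edge-deployment note above in numbers, here is a rough, weights-only estimate of how much storage the model needs at float32 versus float16. It is a back-of-the-envelope sketch: `APPROX_PARAMS` is the approximate parameter count quoted in this notebook (not a measured value), the helper `weights_size_mib` is hypothetical, and activations, the custom head, and file-format overhead are ignored.

```python
# Rough weights-only size estimate; the parameter count is the approximate
# figure quoted in this notebook, not a measured value.
APPROX_PARAMS = 118_000_000


def weights_size_mib(num_params: int, bytes_per_param: int) -> float:
    """Return the storage needed for the weights alone, in MiB."""
    return num_params * bytes_per_param / (1024 ** 2)


fp32_mib = weights_size_mib(APPROX_PARAMS, 4)  # float32: 4 bytes per weight
fp16_mib = weights_size_mib(APPROX_PARAMS, 2)  # float16: 2 bytes per weight

print(f"float32 weights: ~{fp32_mib:.0f} MiB")
print(f"float16 weights: ~{fp16_mib:.0f} MiB")
```

Halving the bytes per weight halves the file size, but the float16 model is still likely to be a few hundred MiB, which is why testing on the target device is worthwhile.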

---

# Cocker_Spaniel_Emotion_EfficientNetV2-L

### Enhanced for 8-class Cocker Spaniel emotion classification using spectrogram images with EfficientNetV2-L for maximum accuracy.

# 1) Setup and Import Libraries
# **Step 1**:

In [None]:
# =============================
# Step 1: Setup and Import Libraries
# =============================
import tensorflow as tf
from tensorflow.keras import models, layers
from tensorflow.keras.applications import EfficientNetV2L
from tensorflow.keras.preprocessing.image import ImageDataGenerator, load_img, img_to_array
import numpy as np
import matplotlib.pyplot as plt
import os
import cv2
import librosa
import librosa.display
import gradio as gr
from google.colab import drive
from sklearn.metrics import confusion_matrix, classification_report
import pathlib
import seaborn as sns
import shutil
import random

# 1) Enable GPU acceleration

In [None]:
# Enable GPU acceleration
physical_devices = tf.config.list_physical_devices('GPU')
if physical_devices:
    tf.config.experimental.set_memory_growth(physical_devices[0], True)
print("TensorFlow version:", tf.__version__)
# tf.test.is_gpu_available() is deprecated; reuse the device list queried above
print("GPU available:", bool(physical_devices))

# 1) Mount Google Drive

In [None]:
# Mount Google Drive
drive.mount('/content/drive')

# 2) Define Dataset Paths and Convert Audio to Spectrogram Images
# **Step 2**:

In [None]:
# =============================
# Step 2: Define Dataset Paths and Convert Audio to Spectrogram Images
# =============================
audio_dataset_dir = '/content/drive/MyDrive/Cocker_Spaniel_Emotions_Audio'
spectrogram_dataset_dir = '/content/drive/MyDrive/Cocker_Spaniel_Emotions_Spectrogram'

os.makedirs(spectrogram_dataset_dir, exist_ok=True)

# 1) Define emotion classes

In [None]:
# Define emotion classes
emotion_classes = ["Sad", "Happy", "Stress", "Restless", "Normal", "Love", "Unhappy", "Tired"]
class_labels = {cls: idx for idx, cls in enumerate(emotion_classes)}

# 1) Convert audio files to spectrogram images

In [None]:
# Convert audio files to spectrogram images
def generate_spectrogram(audio_path, output_image_path):
    """Convert an audio file into a mel-spectrogram image."""
    y, sr = librosa.load(audio_path, sr=44100)
    S = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=128)
    S_dB = librosa.power_to_db(S, ref=np.max)
    plt.figure(figsize=(10, 4))
    librosa.display.specshow(S_dB, sr=sr, x_axis='time', y_axis='mel')
    plt.axis('off')
    plt.savefig(output_image_path, bbox_inches='tight', pad_inches=0)
    plt.close()
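
The `power_to_db(S, ref=np.max)` call above expresses each spectrogram value in decibels relative to the loudest value, so the peak sits at 0 dB and everything else is negative. The pure-Python sketch below illustrates that scaling on a few numbers. It is a simplified stand-in, not librosa's code: `power_to_db_sketch` is a hypothetical helper, and its `amin` and `top_db` defaults are assumed to mirror librosa's defaults.

```python
import math


def power_to_db_sketch(power_values, amin=1e-10, top_db=80.0):
    """Simplified, pure-Python view of decibel scaling relative to the peak."""
    ref = max(power_values)
    ref_db = 10.0 * math.log10(max(amin, ref))
    # Convert each power value to dB relative to the peak value
    db = [10.0 * math.log10(max(amin, p)) - ref_db for p in power_values]
    # Clip everything more than top_db below the peak
    floor = max(db) - top_db
    return [max(v, floor) for v in db]


example_db = power_to_db_sketch([1.0, 0.1, 1e-12])
print(example_db)
```

The clipping step bounds the dynamic range, which keeps very quiet frames from dominating the colour scale of the saved image.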

# 1) Process audio dataset

In [None]:
# Process audio dataset
for emotion in emotion_classes:
    emotion_path = os.path.join(audio_dataset_dir, emotion)
    output_emotion_folder = os.path.join(spectrogram_dataset_dir, emotion)
    os.makedirs(output_emotion_folder, exist_ok=True)
    if os.path.isdir(emotion_path):
        for audio_file in os.listdir(emotion_path):
            if audio_file.endswith('.wav'):
                audio_path = os.path.join(emotion_path, audio_file)
                output_image_path = os.path.join(output_emotion_folder, f"{os.path.splitext(audio_file)[0]}.png")
                generate_spectrogram(audio_path, output_image_path)
print("✅ Spectrogram dataset generated successfully.")

# 1) Count total spectrogram images

In [None]:
# Count total spectrogram images
dataset_dir = pathlib.Path(spectrogram_dataset_dir)
total_images = len(list(dataset_dir.glob('*/*.png')))
print(f"Total spectrogram images in dataset: {total_images}")

# 1) Count images per class

In [None]:
# Count images per class
image_counts = {}
image_paths = {}
for cls in emotion_classes:
    # List the files once and derive the count from the same list
    paths = list(dataset_dir.glob(f'{cls}/*'))
    image_counts[cls] = len(paths)
    image_paths[cls] = paths
    print(f"{cls}: {len(paths)} images")

# 1) Plot class distribution

In [None]:
# Plot class distribution
plt.figure(figsize=(10, 6))
plt.bar(image_counts.keys(), image_counts.values(), color='skyblue')
plt.xlabel('Emotion Class')
plt.ylabel('Number of Spectrogram Images')
plt.title('Distribution of Spectrogram Images by Emotion')
plt.xticks(rotation=45, ha='right')
plt.tight_layout()
plt.show()

# 3) Split Dataset into Train, Validation, and Test Sets
# **Step 3**:

In [None]:
# =============================
# Step 3: Split Dataset into Train, Validation, and Test Sets
# =============================
def split_dataset(dataset_dir, output_dir, train_ratio=0.7, val_ratio=0.15, test_ratio=0.15):
    for split in ['train', 'val', 'test']:
        os.makedirs(os.path.join(output_dir, split), exist_ok=True)

    for cls in emotion_classes:
        class_path = os.path.join(dataset_dir, cls)
        images = os.listdir(class_path)
        random.shuffle(images)

        total_images = len(images)
        train_end = int(train_ratio * total_images)
        val_end = train_end + int(val_ratio * total_images)

        train_images = images[:train_end]
        val_images = images[train_end:val_end]
        test_images = images[val_end:]

        def copy_images(image_list, split):
            split_class_dir = os.path.join(output_dir, split, cls)
            os.makedirs(split_class_dir, exist_ok=True)
            for img in image_list:
                src = os.path.join(class_path, img)
                dst = os.path.join(split_class_dir, img)
                shutil.copy(src, dst)

        copy_images(train_images, 'train')
        copy_images(val_images, 'val')
        copy_images(test_images, 'test')

splitted_dataset_dir = '/content/drive/MyDrive/Cocker_Spaniel_Emotions_Splitted'
split_dataset(dataset_dir, splitted_dataset_dir)
print("✅ Dataset successfully split into training, validation, and testing sets!")
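
Because `split_dataset` truncates the train and validation sizes with `int()` and gives the remainder to the test set, small classes can end up with an empty validation split. The sketch below reproduces only that index arithmetic; `split_counts` is a hypothetical helper written for illustration and does not touch any files.

```python
def split_counts(total_images, train_ratio=0.7, val_ratio=0.15):
    """Return (train, val, test) sizes using the same truncation as split_dataset."""
    train_end = int(train_ratio * total_images)
    val_end = train_end + int(val_ratio * total_images)
    # The test split receives whatever is left after train and validation
    return train_end, val_end - train_end, total_images - val_end


for n in (3, 7, 10):
    print(n, "->", split_counts(n))
```

Checking these counts per class before training is a cheap way to spot classes that are too small for a meaningful validation or test split.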

# 1) Verify split counts

In [None]:
# Verify split counts
for split in ['train', 'val', 'test']:
    print(f"\n📂 {split.upper()} SET:")
    split_path = os.path.join(splitted_dataset_dir, split)
    for cls in emotion_classes:
        class_path = os.path.join(split_path, cls)
        num_images = len(os.listdir(class_path)) if os.path.exists(class_path) else 0
        print(f"   - {cls}: {num_images} images")

# 4) Data Augmentation and Generators
# **Step 4**:

In [None]:
# =============================
# Step 4: Data Augmentation and Generators
# =============================
IMG_HEIGHT, IMG_WIDTH, BATCH_SIZE = 480, 480, 16  # Increased input size for EfficientNetV2-L
NUM_CLASSES = 8

# EfficientNetV2 in Keras rescales inputs inside the model (include_preprocessing=True
# by default) and expects raw pixel values in [0, 255], so no rescale=1./255 here.
# preprocess_input is a pass-through, kept for API consistency.
train_datagen = ImageDataGenerator(
    rotation_range=30,
    width_shift_range=0.2,
    height_shift_range=0.2,
    shear_range=0.2,
    zoom_range=0.3,
    horizontal_flip=True,
    vertical_flip=True,
    brightness_range=[0.8, 1.2],
    fill_mode='nearest',
    preprocessing_function=tf.keras.applications.efficientnet_v2.preprocess_input
)

val_test_datagen = ImageDataGenerator(
    preprocessing_function=tf.keras.applications.efficientnet_v2.preprocess_input
)

train_generator = train_datagen.flow_from_directory(
    f'{splitted_dataset_dir}/train',
    target_size=(IMG_HEIGHT, IMG_WIDTH),
    batch_size=BATCH_SIZE,
    class_mode='categorical'
)

val_generator = val_test_datagen.flow_from_directory(
    f'{splitted_dataset_dir}/val',
    target_size=(IMG_HEIGHT, IMG_WIDTH),
    batch_size=BATCH_SIZE,
    class_mode='categorical'
)

test_generator = val_test_datagen.flow_from_directory(
    f'{splitted_dataset_dir}/test',
    target_size=(IMG_HEIGHT, IMG_WIDTH),
    batch_size=BATCH_SIZE,
    class_mode='categorical',
    shuffle=False
)

print("\nClass indices:", train_generator.class_indices)
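
The printed class indices matter because `flow_from_directory` assigns indices by sorting the subfolder names, which generally differs from the order of the hand-written `emotion_classes` list. The sketch below imitates that ordering with `sorted()`; it is an illustration of the mapping only, and `index_to_class` is a hypothetical name introduced here.

```python
emotion_classes = ["Sad", "Happy", "Stress", "Restless", "Normal", "Love", "Unhappy", "Tired"]

# Assumption: mimics the alphanumeric ordering applied to the class subfolder names
class_indices = {name: idx for idx, name in enumerate(sorted(emotion_classes))}
index_to_class = {idx: name for name, idx in class_indices.items()}

print(class_indices)
print("index 0 ->", index_to_class[0], "| emotion_classes[0] ->", emotion_classes[0])
```

When decoding predictions, looking names up through the generator's `class_indices` avoids silently mislabeling outputs.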

# 5) Define EfficientNetV2-L Model
# **Step 5**:

In [None]:
# =============================
# Step 5: Define EfficientNetV2-L Model
# =============================
def create_efficientnetv2l_model(num_classes):
    base_model = EfficientNetV2L(weights='imagenet', include_top=False, input_shape=(IMG_HEIGHT, IMG_WIDTH, 3))
    base_model.trainable = False  # Freeze base model layers

    inputs = base_model.input
    x = base_model.output
    x = layers.GlobalAveragePooling2D()(x)
    x = layers.Dense(2048, activation='relu')(x)
    x = layers.Dropout(0.5)(x)
    x = layers.Dense(1024, activation='relu')(x)
    x = layers.Dropout(0.3)(x)
    outputs = layers.Dense(num_classes, activation='softmax')(x)

    model = models.Model(inputs=inputs, outputs=outputs)
    return model

model = create_efficientnetv2l_model(NUM_CLASSES)
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
    loss='categorical_crossentropy',
    metrics=['accuracy']
)

model.summary()

# 1) Define callbacks

In [None]:
# Define callbacks
early_stopping = tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=10, restore_best_weights=True)
checkpoint = tf.keras.callbacks.ModelCheckpoint(
    '/content/drive/MyDrive/Cocker_Spaniel_Emotions/efficientnetv2l_spectrogram_best.h5',
    monitor='val_accuracy', save_best_only=True
)
lr_scheduler = tf.keras.callbacks.ReduceLROnPlateau(monitor='val_loss', factor=0.5, patience=5, min_lr=1e-6)
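
To see what the `ReduceLROnPlateau` settings above imply, the following pure-Python sketch replays a made-up validation-loss curve through a simplified reduce-on-plateau rule. It is not the Keras implementation: `simulate_reduce_lr` is a hypothetical helper, the loss values are invented, and details such as cooldown are left out.

```python
def simulate_reduce_lr(val_losses, lr=1e-4, factor=0.5, patience=5,
                       min_lr=1e-6, min_delta=1e-4):
    """Simplified reduce-on-plateau rule; returns the learning rate after each epoch."""
    best = float("inf")
    wait = 0
    lrs = []
    for loss in val_losses:
        if loss < best - min_delta:
            # Validation loss improved: remember it and reset the counter
            best, wait = loss, 0
        else:
            wait += 1
            if wait >= patience:
                # Plateau detected: scale the rate down, but never below min_lr
                lr = max(lr * factor, min_lr)
                wait = 0
        lrs.append(lr)
    return lrs


# Invented curve: two improving epochs followed by a ten-epoch plateau
lrs = simulate_reduce_lr([1.0, 0.9] + [0.95] * 10)
print(lrs)
```

With `patience=5` here and `patience=10` on EarlyStopping, the learning rate can be halved once or twice before training is stopped.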

# 1) Train the model

In [None]:
# Train the model
history = model.fit(
    train_generator,
    epochs=50,
    validation_data=val_generator,
    callbacks=[early_stopping, checkpoint, lr_scheduler]
)

# 1) Fine-tune: Unfreeze last 30 layers

In [None]:
# Fine-tune: Unfreeze last 30 layers
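# The model was built functionally from base_model.input/output, so the backbone's
# layers are flattened into model.layers (model.layers[0] is just the InputLayer).
NUM_HEAD_LAYERS = 6  # GlobalAveragePooling2D, Dense, Dropout, Dense, Dropout, Dense
backbone_layers = model.layers[:-NUM_HEAD_LAYERS]
for layer in backbone_layers[:-30]:
    layer.trainable = False
for layer in backbone_layers[-30:]:
    layer.trainable = True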

model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-5),
    loss='categorical_crossentropy',
    metrics=['accuracy']
)

fine_tune_epochs = 15
history_fine = model.fit(
    train_generator,
    epochs=fine_tune_epochs,
    validation_data=val_generator,
    callbacks=[early_stopping, checkpoint]
)

# 1) Evaluate on test set

In [None]:
# Evaluate on test set
test_loss, test_accuracy = model.evaluate(test_generator)
print(f"\n✅ Test Accuracy: {test_accuracy * 100:.2f}%")

# 6) Plot Training Metrics
# **Step 6**:

In [None]:
# =============================
# Step 6: Plot Training Metrics
# =============================
def plot_training_metrics(history, history_fine=None):
    # Copy the lists so repeated calls do not keep appending to history.history
    acc = list(history.history['accuracy'])
    val_acc = list(history.history['val_accuracy'])
    loss = list(history.history['loss'])
    val_loss = list(history.history['val_loss'])

    if history_fine:
        acc += history_fine.history['accuracy']
        val_acc += history_fine.history['val_accuracy']
        loss += history_fine.history['loss']
        val_loss += history_fine.history['val_loss']

    epochs_range = range(len(acc))

    plt.figure(figsize=(14, 5))
    plt.subplot(1, 2, 1)
    plt.plot(epochs_range, acc, label='Training Accuracy', marker='o')
    plt.plot(epochs_range, val_acc, label='Validation Accuracy', marker='x')
    plt.title('📈 Training & Validation Accuracy')
    plt.xlabel('Epochs')
    plt.ylabel('Accuracy')
    plt.legend()
    plt.grid(True)

    plt.subplot(1, 2, 2)
    plt.plot(epochs_range, loss, label='Training Loss', marker='o', linestyle='--')
    plt.plot(epochs_range, val_loss, label='Validation Loss', marker='x', linestyle='--')
    plt.title('📉 Training & Validation Loss')
    plt.xlabel('Epochs')
    plt.ylabel('Loss')
    plt.legend()
    plt.grid(True)

    plt.tight_layout()
    plt.show()

plot_training_metrics(history, history_fine)

# 7) Confusion Matrix and Classification Report
# **Step 7**:

In [None]:
# =============================
# Step 7: Confusion Matrix and Classification Report
# =============================
def plot_confusion_matrix(model, test_generator):
    class_names = list(test_generator.class_indices.keys())
    y_pred_probs = model.predict(test_generator)
    y_pred = np.argmax(y_pred_probs, axis=1)
    y_true = test_generator.classes

    cm = confusion_matrix(y_true, y_pred)
    plt.figure(figsize=(10, 8))
    sns.heatmap(cm, annot=True, fmt='d', cmap='Blues', xticklabels=class_names, yticklabels=class_names)
    plt.title('Confusion Matrix 📊')
    plt.xlabel('Predicted Labels')
    plt.ylabel('True Labels')
    plt.xticks(rotation=45, ha='right')
    plt.show()

    print("Classification Report:")
    print(classification_report(y_true, y_pred, target_names=class_names))

plot_confusion_matrix(model, test_generator)
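
For readers who want to see how the confusion matrix relates to the per-class numbers in the classification report, here is a small pure-Python sketch. The labels are invented, and `confusion_counts` and `precision_recall` are hypothetical helpers, not the scikit-learn functions used above.

```python
def confusion_counts(y_true, y_pred, num_classes):
    """Rows are true labels, columns are predicted labels."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix


def precision_recall(matrix, cls):
    """Precision and recall for one class, read off the confusion matrix."""
    tp = matrix[cls][cls]
    predicted = sum(row[cls] for row in matrix)  # column sum: predicted as cls
    actual = sum(matrix[cls])                    # row sum: truly cls
    precision = tp / predicted if predicted else 0.0
    recall = tp / actual if actual else 0.0
    return precision, recall


# Invented labels for three classes
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
cm = confusion_counts(y_true, y_pred, 3)
print(cm)
print(precision_recall(cm, 1))
```

Reading the matrix this way shows which emotions are confused with each other, which overall accuracy alone does not show.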

# 8) Grad-CAM for interpretability
# **Step 8**:

In [None]:
# =============================
# Step 8: Grad-CAM for Interpretability
# =============================
def get_gradcam_heatmap(model, img_array, last_conv_layer_name):
    grad_model = tf.keras.models.Model(
        [model.inputs], [model.get_layer(last_conv_layer_name).output, model.output]
    )

    with tf.GradientTape() as tape:
        conv_outputs, predictions = grad_model(img_array)
        predicted_class = tf.argmax(predictions[0])
        class_output = predictions[:, predicted_class]

    grads = tape.gradient(class_output, conv_outputs)
    pooled_grads = tf.reduce_mean(grads, axis=(0, 1, 2))

    conv_outputs = conv_outputs[0]
    heatmap = tf.reduce_mean(tf.multiply(conv_outputs, pooled_grads), axis=-1)
    heatmap = np.maximum(heatmap.numpy(), 0)
    heatmap = heatmap / (heatmap.max() + 1e-8)  # guard against division by zero for an all-zero map
    return heatmap

def display_gradcam(img_path, model, last_conv_layer_name):
    img = load_img(img_path, target_size=(IMG_HEIGHT, IMG_WIDTH))
    img_array = img_to_array(img)
    img_array = tf.keras.applications.efficientnet_v2.preprocess_input(img_array)
    img_array = np.expand_dims(img_array, axis=0)

    heatmap = get_gradcam_heatmap(model, img_array, last_conv_layer_name)
    heatmap = cv2.resize(heatmap, (IMG_WIDTH, IMG_HEIGHT))

    img = cv2.imread(img_path)
    img = cv2.resize(img, (IMG_WIDTH, IMG_HEIGHT))
    heatmap = np.uint8(255 * heatmap)
    heatmap = cv2.applyColorMap(heatmap, cv2.COLORMAP_JET)

    superimposed_img = heatmap * 0.4 + img
    superimposed_img = np.clip(superimposed_img, 0, 255).astype(np.uint8)

    plt.figure(figsize=(10, 5))
    plt.subplot(1, 2, 1)
    plt.imshow(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))
    plt.title('Original Spectrogram')
    plt.axis('off')

    plt.subplot(1, 2, 2)
    plt.imshow(cv2.cvtColor(superimposed_img, cv2.COLOR_BGR2RGB))
    plt.title('Grad-CAM Heatmap')
    plt.axis('off')

    plt.show()
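
The core of `get_gradcam_heatmap` is a three-step computation: average the gradients per channel, use those averages to weight the feature maps, then apply ReLU and normalize. The toy sketch below repeats those steps on a 2x2 feature map with two channels using plain lists. `gradcam_sketch` is a hypothetical helper and the input values are invented; it only mirrors the arithmetic, not the TensorFlow code.

```python
def gradcam_sketch(conv_outputs, grads):
    """conv_outputs and grads are nested lists shaped [H][W][C]."""
    height, width = len(conv_outputs), len(conv_outputs[0])
    channels = len(conv_outputs[0][0])
    # Average each channel's gradient over all spatial positions
    pooled = [
        sum(grads[h][w][c] for h in range(height) for w in range(width)) / (height * width)
        for c in range(channels)
    ]
    # Weight each channel, average across channels, then apply ReLU
    heatmap = [
        [max(sum(conv_outputs[h][w][c] * pooled[c] for c in range(channels)) / channels, 0.0)
         for w in range(width)]
        for h in range(height)
    ]
    peak = max(max(row) for row in heatmap)
    if peak == 0:
        return heatmap  # nothing to normalize
    return [[v / peak for v in row] for row in heatmap]


# Invented inputs: channel 0 has positive gradients, channel 1 negative
toy_conv = [[[1, 0], [2, 0]], [[0, 1], [0, 3]]]
toy_grads = [[[1, -1], [1, -1]], [[1, -1], [1, -1]]]
print(gradcam_sketch(toy_conv, toy_grads))
```

Positions driven by the positively weighted channel stay in the heatmap, while positions driven by the negatively weighted channel are zeroed by the ReLU.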

# 1) Grad-CAM visualization

In [None]:
# Grad-CAM visualization
img_path = '/content/drive/MyDrive/Cocker_Spaniel_Emotions_Splitted/val/Happy/sample_spec.png'  # Update with actual path
display_gradcam(img_path, model, 'top_conv')  # Last conv layer in EfficientNetV2-L

# 9) Gradio Interface for Real-Time Prediction
# **Step 9**:


In [None]:
# =============================
# Step 9: Gradio Interface for Real-Time Prediction
# =============================
def predict_emotion(audio_path):
    """Predict the emotion for an audio file path supplied by Gradio."""
    if audio_path is None:
        return "Please record or upload an audio clip first."

    # Convert audio to spectrogram
    spec_path = "temp_spec.png"
    generate_spectrogram(audio_path, spec_path)

    # Preprocess spectrogram image
    spec_img = load_img(spec_path, target_size=(IMG_HEIGHT, IMG_WIDTH))
    spec_array = img_to_array(spec_img)
    spec_array = tf.keras.applications.efficientnet_v2.preprocess_input(spec_array)
    spec_array = np.expand_dims(spec_array, axis=0)

    # Predict emotion. flow_from_directory orders classes alphabetically, so map the
    # index through the generator instead of the hand-written emotion_classes list.
    index_to_class = {idx: name for name, idx in train_generator.class_indices.items()}
    pred = model.predict(spec_array)
    emotion = index_to_class[int(np.argmax(pred))]
    confidence = float(np.max(pred)) * 100

    return f"Predicted Emotion: {emotion} ({confidence:.2f}%)"

with gr.Blocks() as demo:
    gr.Markdown("## Cocker Spaniel Emotion Classification from Audio Spectrograms")
    # type="filepath" passes the path of the recorded/uploaded file to the function
    audio_input = gr.Audio(label="Record/Upload Audio", type="filepath")
    predict_button = gr.Button("Predict")
    output = gr.Textbox(label="Prediction")

    predict_button.click(predict_emotion, inputs=audio_input, outputs=output)

demo.launch()

# 10) Convert to TensorFlow Lite
# **Step 10**:

In [None]:
# =============================
# Step 10: Convert to TensorFlow Lite
# =============================
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.target_spec.supported_types = [tf.float16]
tflite_model = converter.convert()
with open('/content/drive/MyDrive/Cocker_Spaniel_Emotions/efficientnetv2l_spectrogram.tflite', 'wb') as f:
    f.write(tflite_model)
print("✅ TFLite model saved successfully.")

# 1) Save the EfficientNetV2-L model

In [None]:
# Save the full EfficientNetV2-L model
model.save('/content/drive/MyDrive/Cocker_Spaniel_Emotions/efficientnetv2l_spectrogram_final.h5')
print("✅ Model saved successfully as 'efficientnetv2l_spectrogram_final.h5'")

# 1) Test prediction
## Test prediction on a single image

In [None]:
# Test prediction on a single image
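def predict_single_image(img_path, model):
    # load_img and img_to_array are imported in Step 1
    img = load_img(img_path, target_size=(IMG_HEIGHT, IMG_WIDTH))
    img_array = img_to_array(img)
    # Keep pixels in [0, 255]: EfficientNetV2 rescales inside the model
    img_array = tf.keras.applications.efficientnet_v2.preprocess_input(img_array)
    img_array = np.expand_dims(img_array, axis=0)

    prediction = model.predict(img_array)
    # Map the index through the generator's alphabetical class order
    index_to_class = {idx: name for name, idx in train_generator.class_indices.items()}
    predicted_class = index_to_class[int(np.argmax(prediction))]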

    plt.imshow(img)
    plt.title(f"Predicted Emotion: {predicted_class}")
    plt.axis('off')
    plt.show()

predict_single_image(img_path, model)

---
