# Overview

This project develops and evaluates deep learning models that classify the emotional expressions of pets from images of their faces. The dataset consists of 1000 images covering a variety of pet species, including dogs, cats, rabbits, hamsters, sheep, horses, and birds. The images are grouped by expression: happy, sad, angry, and a catch-all "other" class. By recognizing the emotional states of pets, this project aims to contribute to their well-being and improve the bond between pets and their owners.

# Objectives

1. **Model Development**: Build and evaluate the performance of five well-known neural network architectures:
   - Visual Geometry Group (VGG)
   - Residual Neural Network (ResNet)
   - MobileNet
   - Inception V3
   - DenseNet121

2. **Performance Metrics**: Report the accuracy, precision, recall, F1-score, and confusion matrix for each network.

3. **Data Augmentation**: Enhance the dataset using augmentation techniques to improve model robustness.

4. **Transfer Learning**: Apply transfer learning using pre-trained ImageNet weights to improve the performance of the best-performing model from the previous step.

# Approach

## 1. Dataset Preparation
- **Dataset Description**: The dataset contains 1000 pet face images with various expressions and species.
- **Preprocessing**: Normalize and resize images to a consistent dimension for model input.
- **Data Augmentation**: Apply transformations like rotation, flipping, scaling, and brightness adjustment to enrich the dataset.

## 2. Building Well-Known Networks
### Architectures
- **VGG**: A deep CNN architecture that uses small convolutional filters and is known for its simplicity and high accuracy.
- **ResNet**: Employs skip connections to mitigate vanishing gradient issues, enabling deeper networks.
- **MobileNet**: Optimized for mobile and embedded vision applications with lightweight architecture.
- **Inception V3**: A sophisticated architecture combining convolutions of varying sizes to capture features at multiple scales.
- **DenseNet121**: Encourages feature reuse by feeding each layer the feature maps of all preceding layers within a dense block.

### Evaluation Metrics
- **Accuracy**: Measures the percentage of correctly classified images.
- **Precision**: Indicates the proportion of true positive predictions among all positive predictions.
- **Recall**: Reflects the proportion of true positive predictions among all actual positives.
- **F1-score**: Harmonic mean of precision and recall, balancing both metrics.
- **Confusion Matrix**: Provides a detailed breakdown of prediction results, showing true positives, false positives, true negatives, and false negatives.
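
The metric definitions above can be sketched for the four-class case as follows. This is a minimal NumPy illustration, not the notebook's evaluation code: the helper names `confusion_matrix` and `per_class_metrics` and the toy labels are assumptions made for the example. In a multi-class setting, precision, recall, and F1 are computed per class from the rows and columns of the confusion matrix.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, num_classes):
    """Rows are true classes, columns are predicted classes."""
    cm = np.zeros((num_classes, num_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm

def per_class_metrics(cm):
    """Return overall accuracy and per-class precision, recall, and F1."""
    tp = np.diag(cm).astype(float)   # correct predictions per class
    predicted = cm.sum(axis=0)       # how often each class was predicted
    actual = cm.sum(axis=1)          # how often each class actually occurs
    precision = np.divide(tp, predicted, out=np.zeros_like(tp), where=predicted > 0)
    recall = np.divide(tp, actual, out=np.zeros_like(tp), where=actual > 0)
    denom = precision + recall
    f1 = np.divide(2 * precision * recall, denom, out=np.zeros_like(tp), where=denom > 0)
    accuracy = tp.sum() / cm.sum()
    return accuracy, precision, recall, f1

# Toy integer labels (hypothetical): 0 = happy, 1 = sad, 2 = angry, 3 = other
y_true = [0, 0, 1, 1, 2, 2, 3, 3]
y_pred = [0, 1, 1, 1, 2, 0, 3, 3]

cm = confusion_matrix(y_true, y_pred, num_classes=4)
accuracy, precision, recall, f1 = per_class_metrics(cm)
print(cm)
print("accuracy:", accuracy)
print("precision:", precision)
print("recall:", recall)
print("f1:", f1)
```

With one-hot labels and softmax outputs, as used later in this notebook, the integer class indices can be recovered with `argmax(axis=1)` before calling these helpers.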

## 3. Transfer Learning
- **Objective**: Enhance the training of the best-performing network using weights pre-trained on the ImageNet dataset.
- **Benefits**:
  - Faster convergence.
  - Better generalization by leveraging features learned from a large and diverse dataset.
- **Implementation**: Use pre-built architectures available in Keras or PyTorch with ImageNet weights.
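
A minimal Keras sketch of this step is shown below. It assumes MobileNet as the backbone purely for illustration, since the best-performing network is not determined at this point. The function name `build_transfer_model`, the dropout rate, and the single dense head are also assumptions, not the notebook's final design.

```python
from tensorflow.keras import layers, models
from tensorflow.keras.applications import MobileNet

def build_transfer_model(num_classes=4, input_shape=(224, 224, 3), weights="imagenet"):
    """Frozen pre-trained backbone plus a small trainable classification head."""
    # include_top=False drops the 1000-class ImageNet classifier;
    # pooling="avg" yields one feature vector per image.
    base = MobileNet(include_top=False, weights=weights,
                     input_shape=input_shape, pooling="avg")
    base.trainable = False  # keep the pre-trained features fixed at first

    inputs = layers.Input(shape=input_shape)
    x = base(inputs, training=False)   # run batch norm layers in inference mode
    x = layers.Dropout(0.3)(x)         # assumed regularization strength
    outputs = layers.Dense(num_classes, activation="softmax")(x)

    model = models.Model(inputs, outputs)
    model.compile(optimizer="adam",
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model, base

# Example call (downloads the ImageNet weights on first use):
# model, base = build_transfer_model()
```

Note that the ImageNet MobileNet weights were trained on inputs scaled to [-1, 1] (see `tensorflow.keras.applications.mobilenet.preprocess_input`), whereas this notebook scales images to [0, 1], so the input scaling may need adjusting when this sketch is combined with the loading code.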

## 4. Comparative Analysis
- Compare the performance of the models trained from scratch versus those trained with transfer learning.
- Highlight improvements in metrics and discuss findings.


# Importing the necessary libraries 

In [1]:
import kagglehub 
import numpy as np
import cv2
import os
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from sklearn.model_selection import train_test_split
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout,Input
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.callbacks import EarlyStopping

# Dataset preparation

## Loading the dataset

In [2]:
path = kagglehub.dataset_download("anshtanwar/pets-facial-expression-dataset")

## Loading images from folders and preprocessing them

Each image is resized to 224×224 and its pixel values are normalized to [0, 1].

In [3]:
# Load every readable image in a folder, convert BGR to RGB, resize it, and scale it to [0, 1]
def load_images_from_folder(folder):
    images = []
    for filename in os.listdir(folder):
        img = cv2.imread(os.path.join(folder, filename))
        if img is not None:
            img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
            img = cv2.resize(img, (224, 224))  # Resize to a fixed size for the model
            img = img.astype('float32') / 255.0  # Normalize to [0, 1]
            images.append(img)
    return images

# Build the class folders from the kagglehub download path instead of hardcoding it.
# On Kaggle this is expected to resolve to /kaggle/input/pets-facial-expression-dataset.
happy_folder = os.path.join(path, "happy")
sad_folder = os.path.join(path, "Sad")
angry_folder = os.path.join(path, "Angry")
other_folder = os.path.join(path, "Other")

# loading the images from folders
happy_images = load_images_from_folder(happy_folder)
sad_images = load_images_from_folder(sad_folder)
angry_images = load_images_from_folder(angry_folder)
other_images = load_images_from_folder(other_folder)

# Class indices: 0 = happy, 1 = sad, 2 = angry, 3 = other
labels_map = ["happy","sad","angry","other"]
labels = np.array([0] * len(happy_images) + [1] * len(sad_images) + [2] * len(angry_images) + [3] * len(other_images))
images = np.array(happy_images + sad_images + angry_images + other_images)

In [4]:
print("length of labels:",len(labels))
print("length of images:",len(images))

length of labels: 1000
length of images: 1000


**Converting the integer labels to one-hot encoded labels**

In [5]:
labels = to_categorical(labels, num_classes=4)
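
To illustrate what one-hot encoding produces, the following NumPy sketch builds the same kind of array by indexing an identity matrix and then recovers the class indices with `argmax`. The helper name `one_hot` and the sample labels are assumptions made for this example.

```python
import numpy as np

def one_hot(int_labels, num_classes):
    # Row i of the identity matrix is the one-hot vector for class i.
    return np.eye(num_classes, dtype="float32")[np.asarray(int_labels)]

labels_map = ["happy", "sad", "angry", "other"]

encoded = one_hot([0, 1, 2, 3, 1], num_classes=4)
decoded = encoded.argmax(axis=1)   # back to integer class indices

print(encoded)
print([labels_map[i] for i in decoded])
```

The same `argmax` step applies to softmax predictions when mapping them back to class names.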

## Splitting the data into train, validation, and test sets (80/10/10)

In [6]:
img_train, x_temp, lbl_train, y_temp = train_test_split(images, labels, test_size=0.2, random_state=42)

img_val, img_test, lbl_val, lbl_test = train_test_split(x_temp, y_temp, test_size=0.5, random_state=42)
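
The two-step split can be checked on stand-in data, as in the sketch below: holding out 20% and then halving it gives 800/100/100 samples. The balanced integer labels and the `stratify` argument are assumptions added for illustration. The cell above does not stratify, so its per-class counts can vary between splits.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Stand-in data (hypothetical): 1000 sample indices, 4 balanced integer classes.
X = np.arange(1000)
y = np.repeat(np.arange(4), 250)

# First split: 80% train, 20% held out.
X_train, X_temp, y_train, y_temp = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)
# Second split: halve the held-out 20% into validation and test.
X_val, X_test, y_val, y_test = train_test_split(
    X_temp, y_temp, test_size=0.5, random_state=42, stratify=y_temp
)

split_sizes = (len(X_train), len(X_val), len(X_test))
print(split_sizes)
```

Stratifying requires integer (or otherwise one-dimensional) class labels, so with this notebook's ordering it would have to happen before the one-hot conversion.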

## Data augmentation

In [7]:
datagen = ImageDataGenerator(
    # rescale is not used here: the images were already normalized to [0, 1] when loaded
    rotation_range=40,
    width_shift_range=0.2,
    height_shift_range=0.2,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True,
    fill_mode='nearest'
)

def augment_images(images,labels,num_copies=10):
    augmented_images = []
    aug_labels = []
    for image,label in zip(images,labels):
        augmented_images.append(image)
        aug_labels.append(label)
        image = np.expand_dims(image,axis=0)

        augmented_image_gen = datagen.flow(image, batch_size=1)
        
        # Generate 'num_copies' augmented images for each original image
        for _ in range(num_copies):
            augmented_image = next(augmented_image_gen)  # Get next augmented image
            augmented_images.append(augmented_image[0])  # Add the augmented image to the list
            aug_labels.append(label)
    augmented_images = np.array(augmented_images)
    aug_labels = np.array(aug_labels)
    return augmented_images,aug_labels

In [8]:
aug_images,aug_labels = augment_images(img_train,lbl_train)

In [9]:
print("shape of augmented images:",aug_images.shape)
print("shape of augmented labels:",aug_labels.shape)

shape of augmented images: (8800, 224, 224, 3)
shape of augmented labels: (8800, 4)
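
The reported shape follows from simple arithmetic: each of the 800 training images is kept once and joined by 10 augmented copies. The sketch below reproduces that count and also gives a rough memory estimate, assuming the array is stored as `float32` (4 bytes per value). The variable names are illustrative only.

```python
train_images = 800     # 80% of the 1000 loaded images
num_copies = 10        # default argument of augment_images

# Each original is kept once and joined by num_copies augmented versions.
total_images = train_images * (1 + num_copies)

# Approximate size of the augmented array, assuming float32 values.
bytes_per_image = 224 * 224 * 3 * 4
total_bytes = total_images * bytes_per_image
total_gib = total_bytes / 2**30

print(total_images)
print(f"{total_gib:.2f} GiB")
```

Under that assumption the augmented set occupies close to 5 GiB of RAM, which is worth keeping in mind when choosing `num_copies`.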


# Building the models

## Building the VGG model

In [10]:
input_ = Input(shape=(224, 224, 3))

conv1 = Conv2D(filters=64, kernel_size=(3,3),padding='same',activation='relu') (input_)
conv2 = Conv2D(filters=64, kernel_size=(3,3),padding='same',activation='relu') (conv1)
pool1 = MaxPooling2D(pool_size=(2,2)) (conv2)

conv3 = Conv2D(filters=128, kernel_size=(3,3),padding='same',activation='relu') (pool1)
conv4 = Conv2D(filters=128, kernel_size=(3,3),padding='same',activation='relu') (conv3)
pool2 = MaxPooling2D(pool_size=(2,2)) (conv4)

conv5 = Conv2D(filters=256, kernel_size=(3,3),padding='same',activation='relu') (pool2)
conv6 = Conv2D(filters=256, kernel_size=(3,3),padding='same',activation='relu') (conv5)
conv7 = Conv2D(filters=256, kernel_size=(3,3),padding='same',activation='relu') (conv6)
pool3 = MaxPooling2D(pool_size=(2,2)) (conv7)

conv8 = Conv2D(filters=512, kernel_size=(3,3),padding='same',activation='relu') (pool3)
conv9 = Conv2D(filters=512, kernel_size=(3,3),padding='same',activation='relu') (conv8)
conv10 = Conv2D(filters=512, kernel_size=(3,3),padding='same',activation='relu') (conv9)
pool4 = MaxPooling2D(pool_size=(2,2)) (conv10)

conv11 = Conv2D(filters=512, kernel_size=(3,3),padding='same',activation='relu') (pool4)
conv12 = Conv2D(filters=512, kernel_size=(3,3),padding='same',activation='relu') (conv11)
conv13 = Conv2D(filters=512, kernel_size=(3,3),padding='same',activation='relu') (conv12)
pool5 = MaxPooling2D(pool_size=(2,2)) (conv13)

flat = Flatten()(pool5)
fc1 = Dense(4096, activation='relu') (flat)
#fc2 = Dense(2048,activation='relu') (fc1)
fc3 = Dense (1024,activation='relu') (fc1)
#fc4 = Dense (512,activation='relu') (fc3)
fc5 = Dense (256,activation='relu') (fc3)
#fc6 = Dense (128,activation='relu') (fc5)
fc7 = Dense (64,activation='relu') (fc5)
#fc8 = Dense (32,activation='relu') (fc7)
fc9 = Dense (16,activation='relu') (fc7)
output = Dense (4,activation='softmax') (fc9)

VGG16 = Model (inputs = input_ , outputs = output)

In [11]:
early_stopping = EarlyStopping(
    monitor='val_loss',
    patience=5,
    restore_best_weights=True
)

In [12]:
VGG16.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
VGG16.summary()

In [13]:
VGG16.fit(aug_images,aug_labels,
         validation_data = (img_val,lbl_val),
         epochs= 50,
         batch_size= 16,
         callbacks= [early_stopping])

Epoch 1/50
550/550 ━━━━━━━━━━━━━━━━━━━━ 97s 144ms/step - accuracy: 0.2600 - loss: 1.4062 - val_accuracy: 0.1800 - val_loss: 1.3889
Epoch 2/50
550/550 ━━━━━━━━━━━━━━━━━━━━ 72s 130ms/step - accuracy: 0.2651 - loss: 1.3860 - val_accuracy: 0.2600 - val_loss: 1.3891
Epoch 3/50
550/550 ━━━━━━━━━━━━━━━━━━━━ 72s 131ms/step - accuracy: 0.2430 - loss: 1.3865 - val_accuracy: 0.1800 - val_loss: 1.3888
Epoch 4/50
550/550 ━━━━━━━━━━━━━━━━━━━━ 72s 130ms/step - accuracy: 0.2582 - loss: 1.3862 - val_accuracy: 0.1800 - val_loss: 1.3895
Epoch 5/50
550/550 ━━━━━━━━━━━━━━━━━━━━ 72s 131ms/step - accuracy: 0.2605 - loss: 1.3860 - val_accuracy: 0.1800 - val_loss: 1.3887
Epoch 6/50
550/550 ━━━━━━━━━━━━━━━━━━━━ 72s 130ms/step - accuracy: 0.2516 - loss: 1.3864 - val_accuracy: 0.1800 - val_loss: 1.3907
Epoch 7/50

<keras.src.callbacks.history.History at 0x7d79eb0bf5b0>
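
One way to read the log above is to compare the reported loss with the cross-entropy of a classifier that assigns equal probability to all four classes. The short calculation below is a sketch of that reference value, not part of the original training code.

```python
import math

num_classes = 4

# Categorical cross-entropy when every class gets probability 1/num_classes.
chance_loss = -math.log(1 / num_classes)
chance_accuracy = 1 / num_classes

print(f"chance-level loss: {chance_loss:.4f}")
print(f"chance-level accuracy: {chance_accuracy:.2f}")
```

The training loss in epochs 2 to 6 sits at about this value and the training accuracy stays near 0.25, which suggests the network's outputs remained close to uniform during these epochs.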