<a href="https://colab.research.google.com/github/RohitSen1235/BioSim/blob/main/catbreed_classification.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

##*Installing the necessary libraries for this project*##

In [None]:
!pip install imageio
!pip install scikit-image

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/


##*Connecting the notebook to your Google Drive*##
###*This gives the notebook access to the dataset stored in Google Drive*###

In [None]:
from google.colab import drive

drive.mount('/content/drive/')

Drive already mounted at /content/drive/; to attempt to forcibly remount, call drive.mount("/content/drive/", force_remount=True).


##*Declaring the path of the files in the drive*##
##*Also the path where the model will be saved*##

In [None]:
path="/content/drive/MyDrive/Fiverr/vascoreis753/CATS/"
model_dir='/content/drive/MyDrive/Fiverr/vascoreis753/'

##*Importing all the necessary libraries and modules for the program*##

In [None]:
import os
import numpy as np
import cv2
from sklearn.model_selection import train_test_split
from keras.preprocessing.image import ImageDataGenerator
from keras.applications import VGG16
from keras.layers import Dense, Flatten, Dropout
from keras.models import Model, Sequential,load_model
from keras.optimizers import Adam
from keras.utils import to_categorical
from keras.callbacks import ModelCheckpoint

import matplotlib.pyplot as plt
import tensorflow as tf
from imageio import imread
from skimage.transform import resize

from sklearn.preprocessing import LabelEncoder

##*Creating an empty list to store the names of all breeds*##

In [None]:
# create a list to store the breeds
breeds = []

##*The naming convention of the dataset is as follows*##
####*BreedName_number.jpg*####
###*The code below splits each file name at the "_" character and keeps the first part*###
####*1. The part before the first "_" is treated as the breed name*####
####*2. Everything after the first "_" is ignored*####
###*Each unique string found before the first "_" is appended to the list of breeds*###

In [None]:
# iterate through the images in the folder
for image in os.listdir(path):
    if image.endswith('.jpg') or image.endswith('.jpeg') or image.endswith('.png'):
        # extract the breed name from the image name
        breed = image.split("_")[0]
        # add the breed to the list if it is not already in the list
        if breed not in breeds:
            breeds.append(breed)


print(breeds)

['Maine', 'Persian', 'Bombay', 'Sphynx', 'Egyptian', 'Birman', 'British', 'Abyssinian', 'Ragdoll', 'Russian', 'Siamese', 'Bengal']


###*You can see that there are 12 breeds in the given dataset (multi-word names such as Egyptian_Mau appear by their first word only)*###
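Splitting on the first underscore keeps only the first word of a multi-word breed name, which is why a file such as `Egyptian_Mau_191.jpg` yields `Egyptian`. As a minimal sketch of an alternative, assuming every file follows `BreedName_number.ext`, the name can be split once from the right so that everything before the final underscore is kept. The helper `breed_from_filename` is hypothetical and not part of the original notebook.

```python
import os

def breed_from_filename(filename):
    """Return everything before the last underscore of the file stem."""
    stem, _ext = os.path.splitext(filename)
    # rsplit with maxsplit=1 separates only the trailing number
    return stem.rsplit("_", 1)[0]

print(breed_from_filename("Egyptian_Mau_191.jpg"))
print(breed_from_filename("Abyssinian_34.jpg"))
```

Using this helper would change the keys of the breed list and mapping, so it would have to be applied consistently wherever the breed name is extracted.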

###*creating a dictionary to map breed to an integer for computation purposes*###

In [None]:
# create a dictionary to map breed names to integers
breed_mapping = {breed: i for i, breed in enumerate(breeds)}
print(breed_mapping)

{'Maine': 0, 'Persian': 1, 'Bombay': 2, 'Sphynx': 3, 'Egyptian': 4, 'Birman': 5, 'British': 6, 'Abyssinian': 7, 'Ragdoll': 8, 'Russian': 9, 'Siamese': 10, 'Bengal': 11}
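The integer assigned to each breed depends on the order in which `os.listdir` returns the files, and that order is arbitrary. A small sketch, assuming a shortened, illustrative breed list, shows how sorting the names makes the mapping reproducible and how an inverse mapping turns a predicted index back into a name. The names `breed_to_index` and `index_to_breed` are assumptions made for this example.

```python
# Hypothetical breed list, in the order os.listdir happened to return it
breeds_unsorted = ["Maine", "Persian", "Bombay", "Sphynx"]

# Sorting makes the integer assigned to each breed reproducible
breeds_sorted = sorted(breeds_unsorted)
breed_to_index = {breed: i for i, breed in enumerate(breeds_sorted)}

# Inverse mapping: from a predicted class index back to the breed name
index_to_breed = {i: breed for breed, i in breed_to_index.items()}

print(breed_to_index)
print(index_to_breed[0])
```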


###*Two empty lists are created for storing the image data and the respective labels*###

In [None]:
# create the data and labels arrays
data = []
labels = []

###*Images in the dataset may vary in size and resolution; to train a neural network on them, all images must have the same size*###
####*- A resolution of 224 x 224 was chosen for this project because it works well with the VGG16 model we plan to use*#### 
####*- Every image is resized to this resolution before it is stored in the data list*####

In [None]:
# iterate through the images in the folder
for image in os.listdir(path):
    if image.endswith(('.jpg', '.jpeg', '.png')):
        # load the image; cv2.imread returns None if the file cannot be read
        img = cv2.imread(os.path.join(path, image))
        try:
            # resize the image to 224 x 224 (raises an error if img is None)
            img = cv2.resize(img, (224, 224))
        except Exception:
            print(f"could not resize {image}")
            continue
        # add the image to the data list
        data.append(img)
        # extract the breed name from the image name
        breed = image.split("_")[0]
        # add the corresponding label to the labels list
        labels.append(breed_mapping[breed])

could not resize Egyptian_Mau_191.jpg
could not resize Egyptian_Mau_177.jpg
could not resize Egyptian_Mau_139.jpg
could not resize Egyptian_Mau_145.jpg
could not resize Abyssinian_34.jpg
could not resize Egyptian_Mau_167.jpg


###*Please note that the six images listed above could not be resized, most likely because OpenCV could not read the files (cv2.imread returns None in that case); these images were skipped*### 

In [None]:
print(f"total number of images available : {len(data)}")

total number of images available : 2394


###*In order to train the model we require the data and labels as numpy arrays*###

####*- hence the Python lists data and labels are converted into NumPy arrays*#### 

In [None]:
data = np.array(data)
labels = np.array(labels)
print(f"Data : {data.shape} , labels : {labels.shape}")

Data : (2394, 224, 224, 3) , labels : (2394,)
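The shape printed above also indicates how much memory the dataset needs. As a rough estimate, assuming the images are held as 8-bit arrays (which is what `cv2.imread` returns) and later converted to 32-bit floats, the arithmetic can be sketched as follows.

```python
n_images, height, width, channels = 2394, 224, 224, 3

# uint8 stores one byte per value, float32 stores four
bytes_uint8 = n_images * height * width * channels
bytes_float32 = bytes_uint8 * 4

print(f"uint8:   {bytes_uint8 / 1e6:.0f} MB")
print(f"float32: {bytes_float32 / 1e9:.2f} GB")
```

The float32 copy is four times larger than the original array, which is worth keeping in mind on a Colab instance with limited RAM.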


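###*To make sure that both the training set and the test set contain representatives of all breeds, the data is split with a stratified shuffle split, which preserves the class proportions in both sets*###

####*15 random splits are generated (85% training, 15% test); the loop overwrites the variables on every pass, so the train and test sets used later come from the last split*####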

In [None]:
from sklearn.model_selection import StratifiedShuffleSplit

# Your data and labels stored in X and y respectively
X = data
y = labels

# Define the number of splits you want
n_splits = 15

# Create an instance of the StratifiedShuffleSplit class
sss = StratifiedShuffleSplit(n_splits=n_splits, test_size=0.15, random_state=42)

# Iterate over the splits; each pass overwrites the previous one, so only the last split is kept
for train_index, test_index in sss.split(X, y):
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]

print(f"training set: {len(X_train)} , test set : {len(X_test)}")

training set: 2034 , test set : 360
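To illustrate what stratification does, here is a hand-rolled NumPy sketch that samples the same fraction of every class for the test set. It is not scikit-learn's implementation; the function name `stratified_split_indices` and the toy labels are assumptions made for this example.

```python
import numpy as np

def stratified_split_indices(y, test_size=0.15, seed=42):
    """Illustration only: put test_size of each class into the test set."""
    rng = np.random.default_rng(seed)
    train_idx, test_idx = [], []
    for cls in np.unique(y):
        # Shuffle the indices belonging to this class
        cls_idx = rng.permutation(np.flatnonzero(y == cls))
        n_test = max(1, int(round(len(cls_idx) * test_size)))
        test_idx.extend(cls_idx[:n_test])
        train_idx.extend(cls_idx[n_test:])
    return np.array(train_idx), np.array(test_idx)

# Toy labels: 3 classes with 20 samples each (made-up data)
y_toy = np.repeat([0, 1, 2], 20)
train_idx, test_idx = stratified_split_indices(y_toy)

# Per-class counts in each set
print(np.bincount(y_toy[train_idx]))
print(np.bincount(y_toy[test_idx]))
```

Every class contributes the same share to the test set, so no breed can end up missing from either set as long as it has at least two images.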


###*Download the VGG16 model with ImageNet weights and define the input data shape*###

In [None]:
# Create the base model of VGG16 model
vgg16_model = VGG16(weights='imagenet', include_top=False, input_shape=(224, 224, 3))

###*- The data is converted to float32 tensors, the basic data structure TensorFlow uses for building and training a model (Keras also accepts NumPy arrays directly, so this step mainly fixes the data type).*###

###*- Normalizing the data by dividing it by 255.0 helps in scaling the values of the data between 0 and 1, which can help the model learn better.*###

###*- One-hot encoding the labels transforms the labels into a binary matrix representation where each column corresponds to a single category. This encoding is needed for multiclass classification problems as it allows the model to predict the class of a sample by providing the probability of each class.*###

In [None]:
X_train = tf.convert_to_tensor(X_train, dtype=tf.float32)

X_test = tf.convert_to_tensor(X_test, dtype=tf.float32)


X_train = X_train / 255.0
X_test = X_test / 255.0

# One-hot encode the labels
y_train = to_categorical(y_train, len(breeds))
y_test = to_categorical(y_test, len(breeds))
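
The scaling and one-hot encoding steps can be illustrated on a tiny made-up batch using NumPy only. This is a sketch: `np.eye(num_classes)[toy_labels]` is used here as a stand-in that builds the same kind of one-hot matrix as `to_categorical`, and all array names are assumptions made for this example.

```python
import numpy as np

# Made-up batch: two 2x2 RGB "images" with uint8 pixel values
images = np.array([[[[0, 128, 255]] * 2] * 2] * 2, dtype=np.uint8)
toy_labels = np.array([0, 2])
num_classes = 3

# Scale pixel values from [0, 255] to [0, 1]
images_scaled = images.astype(np.float32) / 255.0

# One-hot encode: row i has a single 1 in column toy_labels[i]
one_hot = np.eye(num_classes, dtype=np.float32)[toy_labels]

print(images_scaled.shape, images_scaled.min(), images_scaled.max())
print(one_hot)
```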


###*- This code block freezes the layers of the base VGG16 model, meaning that the weights of these layers will not be updated during training. This helps to prevent overfitting by retaining the pre-trained feature extraction capabilities of the base model.*###

In [None]:
# Freeze the layers of the base model
for layer in vgg16_model.layers:
    layer.trainable = False

###*- The code below creates a Sequential model in Keras.*###
###*- The frozen VGG16 base model is added as the first layer by calling model.add(vgg16_model).*###
###*- A Flatten layer is then added to flatten the multi-dimensional output of the base model into a one-dimensional array that the dense layers can process.*###
###*- The first Dense layer is added with 256 units and ReLU activation. The number of units determines the dimensionality of the layer's output.*###
###*- A Dropout layer is added with a rate of 0.4, which helps prevent overfitting by randomly dropping 40% of its inputs during training.*###
###*- The final Dense layer is added with the number of units equal to the number of different breeds and with a Softmax activation function. Softmax ensures that the outputs sum to 1, so that they can be interpreted as probabilities.*###
###*- The model is then compiled by specifying the optimizer (Adam), loss function (categorical crossentropy), and evaluation metric (accuracy).*###
###*- A ModelCheckpoint is created that saves the model after an epoch only when the validation loss improves, so the saved file always holds the best model so far.*###
###*- Finally, the model is fit on the training data for 20 epochs with a batch size of 128; the test set is passed to the fit method as validation data and the ModelCheckpoint as a callback.*###
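
The size of the trainable head follows from the layer shapes described above. With `include_top=False` and a 224 x 224 x 3 input, the VGG16 convolutional base ends in a 7 x 7 x 512 feature map, because its five pooling stages each halve the spatial size. The sketch below works through the arithmetic, assuming the 12 breeds found earlier; the variable names are illustrative.

```python
# VGG16 halves the spatial size five times: 224 / 2**5 = 7, with 512 channels
feature_map = (224 // 2**5, 224 // 2**5, 512)
flattened = feature_map[0] * feature_map[1] * feature_map[2]

num_breeds = 12
dense_params = flattened * 256 + 256            # weights + biases of Dense(256)
output_params = 256 * num_breeds + num_breeds   # weights + biases of the softmax layer

print(flattened, dense_params, output_params, dense_params + output_params)
```

Almost all trainable parameters sit in the first Dense layer, which is one reason Dropout is placed directly after it.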

In [None]:
# Create a Sequential model
model = Sequential()

# Add the frozen VGG16 base model as the first layer
model.add(vgg16_model)

# Flatten the feature maps into a one-dimensional vector
model.add(Flatten())
# Add a Dense layer with 256 units and ReLU activation
model.add(Dense(256, activation='relu'))
# Add a Dropout layer with a rate of 0.4
model.add(Dropout(0.4))
# Add a Dense layer with the number of units equal to the number of breeds and Softmax activation
model.add(Dense(len(breeds), activation='softmax'))

# Compile the model
model.compile(optimizer=Adam(), loss='categorical_crossentropy', metrics=['accuracy'])

# Create a checkpoint to save the best model
checkpoint = ModelCheckpoint(model_dir+'best_model.h5', save_best_only=True)

# Fit the model on the training data
model.fit(X_train, y_train, epochs=20, batch_size=128, validation_data=(X_test, y_test), callbacks=[checkpoint])


Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20


<keras.callbacks.History at 0x7f6b545c1ca0>
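
With `save_best_only=True` and the default monitored quantity (validation loss), the checkpoint file is overwritten only after epochs that improve on the best value seen so far. The plain-Python sketch below mimics that rule on made-up validation losses; the numbers are not taken from the training run above.

```python
# Made-up validation losses, one per epoch (not from the actual run)
val_losses = [1.90, 1.42, 1.47, 1.21, 1.30]

best = float("inf")
saved_epochs = []
for epoch, val_loss in enumerate(val_losses, start=1):
    # Mimic save_best_only=True: keep a checkpoint only on improvement
    if val_loss < best:
        best = val_loss
        saved_epochs.append(epoch)

print(saved_epochs)
```

Loading `best_model.h5` afterwards therefore gives the weights from the best epoch, not necessarily the last one.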

###*Loading the model to use in predicting some images*###

In [None]:
#Load the best model
model = load_model('/content/drive/MyDrive/Fiverr/vascoreis753/best_model.h5')


###*Loading the images whose breed I want to predict*###

In [None]:
#Load the image you want to classify
# Abyssinian
img1 = cv2.imread('/content/drive/MyDrive/Fiverr/vascoreis753/Abyssinian_13.jpg')
# Bengal
img2 = cv2.imread('/content/drive/MyDrive/Fiverr/vascoreis753/Bengal_109.jpg')
#Bombay
img3 = cv2.imread('/content/drive/MyDrive/Fiverr/vascoreis753/Bombay_14.jpg')
#Sphynx
img4 = cv2.imread('/content/drive/MyDrive/Fiverr/vascoreis753/spynx.jpg')

###*Defined a function for finding the breed*###

In [None]:
def predict_breed(model, image):

  # Resize the image to the input size the model was trained on
  img = cv2.resize(image, (224, 224))

  # Scale pixel values to [0, 1], matching the training data
  img = img.astype("float32") / 255.0

  # Expand the dimensions of the image to (1, 224, 224, 3)
  img = np.expand_dims(img, axis=0)

  # Predict the class probabilities and pick the most likely class
  result_prob = model.predict(img)
  predicted_class_index = np.argmax(result_prob)

  # Get the label of the predicted breed
  predicted_breed = breeds[predicted_class_index]

  print(f"The predicted breed is : {predicted_breed}\n")



In [None]:
# should predict Bengal
predict_breed(model,img2)
# should predict Bombay
predict_breed(model,img3)
# Should predict Abyssinian
predict_breed(model,img1)
# Should predict Sphynx
predict_breed(model,img4)

The predicted breed is : Bengal

The predicted breed is : Bombay

The predicted breed is : Abyssinian

The predicted breed is : Sphynx
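
Because the softmax output holds one probability per breed, it can be useful to look at more than the single best guess. The sketch below ranks a made-up probability vector with NumPy; the helper `top_k`, the shortened breed list and the probabilities are assumptions for illustration, not output of the trained model.

```python
import numpy as np

breed_names = ["Maine", "Persian", "Bombay", "Sphynx"]   # shortened, illustrative
probs = np.array([0.05, 0.15, 0.70, 0.10])               # made-up softmax output

def top_k(probs, names, k=3):
    """Return the k most probable (name, probability) pairs, best first."""
    order = np.argsort(probs)[::-1][:k]
    return [(names[i], float(probs[i])) for i in order]

for name, p in top_k(probs, breed_names):
    print(f"{name}: {p:.2f}")
```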



###*In conclusion, at least for the cases tried so far, the model predicts the correct breed*###
###*However, please note that the model accuracy is approximately 69%, hence it is bound to make some mistakes in general*###
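
An overall accuracy figure can hide large differences between breeds. As a minimal sketch, assuming integer class labels and predictions are available as NumPy arrays, overall and per-class accuracy can be computed from a confusion matrix. The arrays below are made up and do not reproduce the 69% figure.

```python
import numpy as np

# Made-up labels and predictions for three classes
y_true = np.array([0, 0, 1, 1, 2, 2, 2, 1])
y_pred = np.array([0, 1, 1, 1, 2, 0, 2, 1])

# Fraction of samples whose prediction matches the true label
overall = float(np.mean(y_true == y_pred))

# Confusion matrix: rows are true classes, columns are predicted classes
n_classes = 3
confusion = np.zeros((n_classes, n_classes), dtype=int)
np.add.at(confusion, (y_true, y_pred), 1)

# Per-class accuracy: correct predictions divided by samples of that class
per_class = confusion.diagonal() / confusion.sum(axis=1)

print(overall)
print(confusion)
print(per_class)
```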