# Problem Statement:

The objective of this project is to build an image classifier that distinguishes between cats and dogs using a convolutional neural network (CNN).

Traditional ML models such as linear regression or logistic regression are poorly suited to raw image data because they treat each pixel as a separate input and do not capture spatial structure. A convolutional neural network (CNN) was selected because such networks are designed to extract features like edges, textures, and shapes by processing the image through successive layers.
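To make the idea of "extracting edges" concrete, here is a minimal pure-Python sketch of a single convolution filter sliding over a toy image. The helper `correlate2d`, the toy image, and the hand-picked kernel are illustrative assumptions; in a CNN the kernel weights are learned during training rather than chosen by hand.

```python
def correlate2d(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and sum the products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]

# Toy 4x6 grayscale image: dark left half (0), bright right half (1).
image = [[0, 0, 0, 1, 1, 1] for _ in range(4)]

# Hand-picked vertical-edge kernel (illustrative; a CNN learns its weights).
kernel = [[-1, 0, 1],
          [-1, 0, 1],
          [-1, 0, 1]]

feature_map = correlate2d(image, kernel)
for row in feature_map:
    print(row)  # [0, 3, 3, 0] for both rows
```

The feature map is non-zero only where the window straddles the dark-to-bright boundary, which is the sense in which a filter "detects" an edge.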

# Data Collection:

The dataset (https://www.kaggle.com/competitions/dogs-vs-cats-redux-kernels-edition/data) consists of labeled images stored in the 'dataset' folder. It contains real-world photographs of cats and dogs.

The task was framed as binary classification. Labels were assigned by the 'ImageFolder' class of torchvision (PyTorch) as follows:

Cat -> 0
Dog -> 1
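The mapping above follows from how `ImageFolder` numbers classes: it sorts the class sub-folder names and assigns indices in that order. The sketch below mimics that rule in plain Python so it runs without the dataset; `folder_class_to_idx` is a hypothetical helper, and the temporary folders stand in for the real 'cats' and 'dogs' directories.

```python
import os
import tempfile

def folder_class_to_idx(root):
    """Map each class sub-folder to an index, in sorted (alphabetical) order."""
    classes = sorted(entry.name for entry in os.scandir(root) if entry.is_dir())
    return {name: idx for idx, name in enumerate(classes)}

# Stand-in layout: one sub-folder per class, created in reverse order on purpose.
with tempfile.TemporaryDirectory() as root:
    for name in ("dogs", "cats"):
        os.mkdir(os.path.join(root, name))
    mapping = folder_class_to_idx(root)

print(mapping)  # {'cats': 0, 'dogs': 1}
```

With torchvision itself, the same mapping is available as the `class_to_idx` attribute of an `ImageFolder` dataset.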

# EDA & Preprocessing:

All images were resized to 128 * 128 pixels.

To improve model generalization, a random horizontal flip was applied to the training images.

The images were converted into tensors (multi-dimensional arrays), and pixel values were rescaled from the range 0 to 255 to the range 0 to 1.

Invalid or unreadable images were automatically ignored during dataset loading.
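The report does not say which mechanism skipped the unreadable files. One possible approach, shown here as a sketch, is a validity check based on Pillow; `is_readable_image` is a hypothetical helper, and the two temporary files stand in for a good and a corrupt dataset image.

```python
import os
import tempfile
from PIL import Image

def is_readable_image(path):
    """Return True if Pillow can open and verify the file at `path`."""
    try:
        with Image.open(path) as img:
            img.verify()  # integrity check without decoding the full image
        return True
    except Exception:
        return False

with tempfile.TemporaryDirectory() as root:
    good = os.path.join(root, "good.jpg")
    bad = os.path.join(root, "bad.jpg")
    Image.new("RGB", (8, 8)).save(good)   # a valid tiny JPEG
    with open(bad, "wb") as f:
        f.write(b"not an image")          # a corrupt file with an image extension
    results = (is_readable_image(good), is_readable_image(bad))

print(results)  # (True, False)
```

A function like this could be passed to `datasets.ImageFolder` through its `is_valid_file` argument, so that only readable files are indexed.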

For each class (cats and dogs), 80% of the images were used for training and the remaining 20% for testing.
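A per-class 80/20 split can be sketched as follows. The helper `split_per_class`, the seed, and the file names are illustrative assumptions; the count of 12,500 images per class is also an assumption, chosen because it is consistent with the 2,500 test images per class in the confusion matrix of this report.

```python
import random

def split_per_class(files_by_class, train_fraction=0.8, seed=0):
    """Shuffle each class's files and split them into train and test lists."""
    rng = random.Random(seed)
    train, test = {}, {}
    for label, files in files_by_class.items():
        shuffled = list(files)
        rng.shuffle(shuffled)
        cut = int(len(shuffled) * train_fraction)
        train[label] = shuffled[:cut]
        test[label] = shuffled[cut:]
    return train, test

# Hypothetical file names, 12,500 per class (assumed).
files_by_class = {
    "cats": [f"cat.{i}.jpg" for i in range(12500)],
    "dogs": [f"dog.{i}.jpg" for i in range(12500)],
}
train, test = split_per_class(files_by_class)
print({k: (len(train[k]), len(test[k])) for k in files_by_class})
# {'cats': (10000, 2500), 'dogs': (10000, 2500)}
```

Splitting each class separately keeps the train and test sets balanced, which makes accuracy easier to interpret.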

# Model Building:

1. The input is an image with 3 channels (red, green, blue) of size 128 * 128 pixels.

2. The first convolution takes the 3-channel image and applies 32 filters (3*3 matrices) to produce 32 feature maps. ReLU introduces non-linearity so that the network can learn curves and complex shapes.

3. The second convolution takes the 32 feature maps from the first convolution as input and applies 64 filters (3*3 matrices) to produce 64 feature maps, again followed by ReLU.

4. The third convolution takes the 64 feature maps from the second convolution as input and applies 128 filters (3*3 matrices) to produce 128 feature maps, again followed by ReLU. The larger number of filters lets the network capture more patterns.

5. After each convolution, max pooling is applied to the feature maps to make them smaller while keeping the most important information. Max pooling takes the maximum value from each 2×2 region, which halves the height and width each time it is applied.

6. After 3 pooling operations (128 -> 64 -> 32 -> 16), the 128 feature maps, each of size 16 * 16, are flattened into one long vector of 128 * 16 * 16 = 32,768 numbers. The first fully connected layer (FC1) compresses this vector into 256 numbers.

7. FC2 (the second fully connected layer) compresses the 256 values into a single number, which is the model's prediction.

8. During training, dropout randomly turns off 50% of neurons, which forces the network not to rely on any single neuron and to learn more features.

9. Finally, the output is interpreted as a probability: values above 0.5 are classified as dog (label 1), and all other values as cat (label 0).
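The architecture described in the steps above can be sketched in PyTorch as follows. This is not the project's model.py (its `CNN` class is not shown in this report); `SketchCNN` is a hypothetical name, and several details are assumptions: `padding=1` on each convolution (needed for the feature maps to be exactly 16 * 16 after three poolings), a ReLU after FC1, dropout placed between FC1 and FC2, and a final sigmoid to produce a probability.

```python
import torch
from torch import nn

class SketchCNN(nn.Module):
    """Illustrative three-block CNN matching the shapes described above."""

    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            # padding=1 (assumed) keeps each 3x3 convolution size-preserving,
            # so only the pooling layers shrink the feature maps.
            nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                                  # 128 -> 64
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                                  # 64 -> 32
            nn.Conv2d(64, 128, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                                  # 32 -> 16
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),                   # 128 * 16 * 16 = 32,768 values
            nn.Linear(128 * 16 * 16, 256),  # FC1
            nn.ReLU(),                      # assumed activation after FC1
            nn.Dropout(p=0.5),              # only active in train() mode
            nn.Linear(256, 1),              # FC2
            nn.Sigmoid(),                   # assumed: score -> probability
        )

    def forward(self, x):
        return self.classifier(self.features(x))

model = SketchCNN().eval()               # eval() disables dropout
batch = torch.rand(4, 3, 128, 128)       # four random stand-in "images"
with torch.no_grad():
    feature_maps = model.features(batch)
    probs = model(batch)

print(feature_maps.shape)  # torch.Size([4, 128, 16, 16])
print(probs.shape)         # torch.Size([4, 1])
```

Printing the intermediate shape is a quick way to confirm the 32,768-value flattening step before wiring up the fully connected layers.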

# Running the Model:

#### Python version 3.11.9 was used because the latest Python versions did not support PyTorch at the time of this project.

In [11]:
#For training:
!py -3.11 train.py

True
Epoch 1
Loss: 0.6641
Accuracy: 69.04%

Epoch 2
Loss: 0.5575
Accuracy: 75.26%

Epoch 3
Loss: 0.4882
Accuracy: 77.76%

Epoch 4
Loss: 0.4409
Accuracy: 79.54%

Epoch 5
Loss: 0.4035
Accuracy: 79.98%

Epoch 6
Loss: 0.3705
Accuracy: 83.02%

Epoch 7
Loss: 0.3411
Accuracy: 83.08%

Epoch 8
Loss: 0.3143
Accuracy: 83.16%

Epoch 9
Loss: 0.2819
Accuracy: 83.24%

Epoch 10
Loss: 0.2587
Accuracy: 83.70%

Epoch 11
Loss: 0.2350
Accuracy: 83.92%

Epoch 12
Loss: 0.2120
Accuracy: 85.08%

cnn.pth created!


In [10]:
#For prediction on a single image (to try other images, add them to the folder containing model.py, test.py and cnn.pth, then edit the image name on line 17 of test.py):
!py -3.11 test.py

It is a  dog.



# Model Evaluation:

In [8]:
import torch
from torchvision import datasets, transforms
from torch.utils.data import DataLoader
from sklearn.metrics import confusion_matrix, precision_score, recall_score, f1_score
import numpy as np
from model import CNN

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")


transform = transforms.Compose([
    transforms.Resize((128, 128)),
    transforms.ToTensor()
])

test_dataset = datasets.ImageFolder(
    "dataset/test",
    transform=transform
)

test_loader = DataLoader(
    test_dataset,
    batch_size=32,
    shuffle=False
)


classes = test_dataset.classes
print("Classes:", classes)

model = CNN().to(device)
model.load_state_dict(torch.load("cnn.pth", map_location=device))
model.eval()

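all_preds = []   # predicted labels for the whole test set
all_labels = []  # ground-truth labels in the same order

with torch.no_grad():  # gradients are not needed for evaluation
    for images, labels in test_loader:
        images = images.to(device)
        labels = labels.to(device)

        outputs = model(images)  # model outputs, treated as probabilities

        # Threshold at 0.5: 1.0 means 'dogs', 0.0 means 'cats'
        preds = (outputs > 0.5).float()

        all_preds.extend(preds.cpu().numpy())
        all_labels.extend(labels.cpu().numpy())

all_preds = np.array(all_preds).flatten()    # 1-D arrays for scikit-learn
all_labels = np.array(all_labels).flatten()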

cm = confusion_matrix(all_labels, all_preds)

print("\nConfusion Matrix:")
print(cm)
print(" ")
for i, class_name in enumerate(classes):

    precision = precision_score(all_labels, all_preds, pos_label=i)
    recall = recall_score(all_labels, all_preds, pos_label=i)
    f1 = f1_score(all_labels, all_preds, pos_label=i)

    print(f"{class_name}:")
    print(f" Precision : {precision:.4f}")
    print(f" Recall    : {recall:.4f}")
    print(f" F1-Score  : {f1:.4f}\n")


Classes: ['cats', 'dogs']



Confusion Matrix:
[[2145  355]
 [ 306 2194]]
 
cats:
 Precision : 0.8752
 Recall    : 0.8580
 F1-Score  : 0.8665

dogs:
 Precision : 0.8607
 Recall    : 0.8776
 F1-Score  : 0.8691
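As a cross-check, the per-class metrics above can be recomputed by hand from the confusion matrix, whose rows are true labels and whose columns are predictions. The sketch below uses only the four numbers printed above; the variable names are illustrative.

```python
# Confusion matrix from the output above, in the order ['cats', 'dogs'].
cm = [[2145, 355],
      [306, 2194]]
classes = ["cats", "dogs"]

total = sum(sum(row) for row in cm)
accuracy = sum(cm[i][i] for i in range(len(cm))) / total

metrics = {}
for i, name in enumerate(classes):
    tp = cm[i][i]
    fp = sum(cm[r][i] for r in range(len(cm))) - tp  # predicted class i, actually another
    fn = sum(cm[i]) - tp                             # actually class i, predicted another
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    metrics[name] = (precision, recall, f1)

print(f"Accuracy: {accuracy:.4f}")  # Accuracy: 0.8678
for name, (p, r, f) in metrics.items():
    print(f"{name}: precision={p:.4f} recall={r:.4f} f1={f:.4f}")
```

The recomputed values match the scikit-learn output, and the overall test accuracy works out to 4,339 correct out of 5,000 images.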



# Interpretation & Conclusion:

#### Results:
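By the final epoch (12), the training log reported an accuracy of 85.08% with a loss of 0.2120. On the 5,000 test images, the confusion matrix corresponds to an accuracy of 86.78% ((2145 + 2194) / 5000), with F1-scores of 0.8665 for cats and 0.8691 for dogs. Dropout and random horizontal flips were included to reduce overfitting and improve generalization.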

#### Insights:
Rather than relying on manually engineered features, the model learned its features directly from the pixels; these plausibly include facial structures and shape patterns.

#### Conclusion:
Convolutional neural networks are well suited to image classification tasks. The trained model classified unseen images of cats and dogs with about 87% accuracy. The approach can be extended to multi-class image classification, for example with deeper architectures such as ResNet-18, which has 18 layers.