# Image Classification for Alpaca Animal

In this project, we classify an image as alpaca or not alpaca based on the features found in the image, using a custom CNN model. For this task we develop a novel approach in which the conv2d layer of a Convolutional Neural Network is customised. The CNN we build has only one such layer.

A **Convolutional Neural Network (CNN)** is a type of deep learning neural network designed for processing structured grid data, such as images and videos. It's particularly well-suited for tasks like image recognition and computer vision. CNNs are characterized by their use of convolutional layers, pooling layers, and fully connected layers. Here's a brief explanation of these key components:

**Convolutional Layers:** These layers apply small filters (also known as kernels) to small regions of the input data. By sliding these filters across the input, the network can detect patterns and features, such as edges, corners, and textures, at various scales.

**Pooling Layers:** Pooling layers reduce the spatial dimensions of the feature maps produced by convolutional layers. Max pooling, for example, selects the maximum value from a small region of the feature map. This helps reduce computational complexity and focus on the most important features.

**Fully Connected Layers:** After multiple convolutional and pooling layers, CNNs typically have one or more fully connected layers. These layers process the high-level features extracted from the earlier layers and produce the final output, which might be class probabilities for image classification tasks.
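
The three components above can be traced through a conventional PyTorch pipeline. This is a minimal sketch with illustrative layer sizes (assumptions, not the model built in this project); it only shows how the tensor shape changes at each stage.

```python
import torch
import torch.nn as nn

# Illustrative layer sizes (assumptions, not the project's model).
conv = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3)  # sixteen 3x3 filters
pool = nn.MaxPool2d(kernel_size=2, stride=2)                     # halves height and width
fc = nn.Linear(16 * 15 * 15, 2)                                  # two class scores

x = torch.randn(4, 3, 32, 32)               # batch of 4 RGB images, channels-first
features = torch.relu(conv(x))              # (4, 16, 30, 30): 32 - 3 + 1 = 30
pooled = pool(features)                     # (4, 16, 15, 15)
logits = fc(pooled.flatten(start_dim=1))    # (4, 2)

print(features.shape, pooled.shape, logits.shape)
```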

CNNs have revolutionized computer vision tasks by automatically learning hierarchical representations from raw pixel data, making them highly effective for tasks like image classification, object detection, and image segmentation. Their ability to capture local and global patterns within images has led to their widespread use in various applications beyond computer vision as well.

## Traditional CNN vs Proposed CNN
The key distinction between our proposed CNN and the traditional CNN lies in how we handle weight parameters. In a standard CNN, the same small kernel is shared across every position of the input. In our approach, we instead use a weight matrix of the same size as the input. For each output position, the kernel is the patch of this weight matrix located at the same position as the receptive field, and the weighted sum is computed with those position-specific weights. This allows more flexibility in capturing features that depend on where they occur in the input.
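
To make the contrast concrete, here is a small sketch on a single-channel 6x6 input. It reflects one reading of the description above (a weight matrix `W` with the input's spatial size, sliced at the same location as each receptive field); the tensor sizes and variable names are assumptions chosen for illustration.

```python
import torch
import torch.nn.functional as F

k = 3
x = torch.randn(1, 1, 6, 6)            # one single-channel 6x6 input (N, C, H, W)

# Standard convolution: one 3x3 kernel (9 weights) shared by every position.
kernel = torch.ones(1, 1, k, k)
shared = F.conv2d(x, kernel)           # (1, 1, 4, 4)

# Position-specific variant: W has the input's spatial size (36 weights), and
# each output uses the patch of W at the same location as its receptive field.
W = torch.ones(6, 6)
x_patches = x[0, 0].unfold(0, k, 1).unfold(1, k, 1)   # (4, 4, 3, 3)
w_patches = W.unfold(0, k, 1).unfold(1, k, 1)         # (4, 4, 3, 3)
local = (x_patches * w_patches).sum(dim=(-2, -1))     # (4, 4)

# With every weight equal to 1 the two computations coincide.
same_when_constant = torch.allclose(shared[0, 0], local, atol=1e-5)

# Once W varies by position, each output window sees different weights.
W2 = torch.arange(36, dtype=torch.float32).reshape(6, 6)
local2 = (x_patches * W2.unfold(0, k, 1).unfold(1, k, 1)).sum(dim=(-2, -1))

print(same_when_constant, kernel.numel(), W.numel(), local2.shape)
```

The shared kernel has 9 parameters regardless of image size, while the position-specific weight matrix grows with the input (36 here).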

## Dataset

We have two folders of animal images: `alpaca/` contains the alpaca images and `not alpaca/` contains images of other animals. We first collect the image paths from both folders, load the images, and label them as alpaca (1) or not alpaca (0).

## Preprocess Data

We read each image with the cv2 (OpenCV) library, convert it from OpenCV's default BGR channel order to RGB, resize it to 32x32 pixels, and normalize the pixel values to the range [0, 1].

In [1]:
import os
import cv2
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split


# Step 1: Load and preprocess image data

input1 = 'alpaca/'
filename1 = []
for filename in os.listdir(input1):                   # read the directory
    filename1.append(input1 + filename)               # store the file path

input2 = 'not alpaca/'
filename2 = []
for filename in os.listdir(input2):
    filename2.append(input2 + filename) 
    

# Load and preprocess images
def load_and_preprocess_image(filename):
    image = cv2.imread(filename)                                                #reading the image data
    image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)                              # Convert from BGR to RGB 
    image = cv2.resize(image, (32, 32))                                         # Resize to 32x32
    image = image.astype('float32') / 255.0                                     # Normalize to [0, 1]
    return image

X1 = np.array([load_and_preprocess_image(filename) for filename in filename1])
Y1 = np.ones(len(X1), dtype=np.int32)                                           # label 1 = alpaca

X2 = np.array([load_and_preprocess_image(filename) for filename in filename2])
Y2 = np.zeros(len(X2), dtype=np.int32)                                          # label 0 = not alpaca

# Concatenate and split data
X = np.concatenate((X1, X2), axis=0)                                            # combine the images of both classes
Y = np.concatenate((Y1, Y2), axis=0)                                            # combine the labels of both classes

X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.20, random_state=42)
X_test, X_val, Y_test, Y_val = train_test_split(X_test, Y_test, test_size=0.50, random_state=42)
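
The arrays produced above are channels-last, with shape (N, 32, 32, 3), whereas PyTorch layers such as `nn.Conv2d` and `nn.MaxPool2d` expect channels-first input of shape (N, C, H, W). The following sketch shows one way to convert, using a random stand-in array (`X_demo` and `Y_demo` are hypothetical names, not variables from this notebook).

```python
import numpy as np
import torch

# Stand-in for the preprocessed image array: 10 images, 32x32, RGB, channels-last.
X_demo = np.random.rand(10, 32, 32, 3).astype('float32')
Y_demo = np.array([1] * 5 + [0] * 5, dtype=np.int32)

# (N, H, W, C) -> (N, C, H, W); contiguous() materialises the new memory layout.
images = torch.from_numpy(X_demo).permute(0, 3, 1, 2).contiguous()
labels = torch.from_numpy(Y_demo).long()   # CrossEntropyLoss expects int64 class indices

print(images.shape, labels.dtype)
```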


### Custom Layer Implementation

In [2]:
# single layer custom model

import torch
import torch.nn as nn
import torch.nn.functional as F



class ConvolutionalLayer(nn.Module):
    def __init__(self, input_size, num_channels, filter_size):
        super(ConvolutionalLayer, self).__init__()
        self.input_size = input_size
        self.num_channels = num_channels
        self.filter_size = filter_size
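        # Make the weight tensor the same shape as one input batch (channels-last: N, H, W, C).
        # NOTE: batch_size is a global that must exist before the layer is constructed
        # (it is set in the training cell below). The leading dimension ties each
        # weight slice to a sample's position within the batch.
        self.weight_matrix = nn.Parameter(torch.randn(batch_size, 32, 32, 3))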
        self.output_size = (input_size[0] - filter_size[0] + 1, input_size[1] - filter_size[1] + 1)
        self.output_feature_map = torch.zeros((self.num_channels, self.output_size[0], self.output_size[1]))

    def forward(self, input_feature_map):
        batch_size = input_feature_map.size(0)
        output_feature_maps = []
        # NOTE: the indexing below assumes a channels-first layout (N, C, H, W), but the
        # data loaded above is channels-last (N, 32, 32, 3), so ':' spans the 32 image rows
        # and the last slice runs over the 3 colour channels.
        for i in range(batch_size):
            output_feature_map = torch.zeros((self.num_channels, self.output_size[0], self.output_size[1]))
            for k in range(self.num_channels):
                for j in range(self.output_size[0]):
                    for l in range(self.output_size[1]):
                        #the same receptive field is applied to the weights as the input
                        receptive_field = input_feature_map[i, :, j:j+self.filter_size[0], l:l+self.filter_size[1]]
                        receptive_field_weight = self.weight_matrix[i, :, j:j+self.filter_size[0], l:l+self.filter_size[1]]
                       
                       # print(l, 'rf:', np.shape(receptive_field), 'rfw: ', np.shape(receptive_field_weight))
                        weighted_output = torch.sum(receptive_field * receptive_field_weight, dim=(1,2))
                        output_feature_map[k, j, l] = weighted_output[k]
            output_feature_maps.append(output_feature_map)
        output_feature_maps = torch.stack(output_feature_maps, dim=0)
        return output_feature_maps
    
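    # NOTE: this method is never invoked. loss.backward() relies on autograd, which
    # differentiates forward() directly. It also reads self.input_feature_map, which
    # forward() never stores, so calling it as written would raise an AttributeError.
    def backward(self, grad_output):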
        batch_size = grad_output.size(0)
        grad_input = torch.zeros((batch_size, self.input_size[0], self.input_size[1], self.filter_size[0], self.filter_size[1]), device=self.weight_matrix.device)
        grad_weight = torch.zeros_like(self.weight_matrix)
        for i in range(batch_size):
            for k in range(self.num_channels):
                for j in range(self.output_size[0]):
                    for l in range(self.output_size[1]):
                        # compute the gradient of the output w.r.t. the receptive field
                        grad_weight[k] += grad_output[i, k, j, l] * self.input_feature_map[i, :, j:j+self.filter_size[0], l:l+self.filter_size[1]]
                        # compute the gradient of the output w.r.t. the input feature map
                        grad_input[i, :, j:j+self.filter_size[0], l:l+self.filter_size[1]] += grad_output[i, k, j, l] * self.weight_matrix[k]
        self.weight_matrix.grad = torch.sum(grad_weight, dim=0, keepdim=True)
        return grad_input
    

# Define the CNN architecture
class CNN(nn.Module):
    def __init__(self):
        super(CNN, self).__init__()
        self.conv1 = ConvolutionalLayer(input_size=(16,16), num_channels=16, filter_size=(3, 3))
        #self.conv2 = ConvolutionalLayer(input_size=(32,32), num_channels=12, filter_size=(3, 3))
        #self.conv3 = ConvolutionalLayer(input_size=(32,32), num_channels=8, filter_size=(3, 3))
        self.pool = nn.MaxPool2d(2,2)
        self.activation = nn.ReLU()
        self.fc1 = nn.Linear(64 * 8 * 8, 128)
        self.fc2 = nn.Linear(128, 2)
        self.output_activation = nn.Softmax(dim=1)

# leelakrishna's idea
    def forward(self, x):
        print('Input shape:', x.shape)
        x = self.pool(nn.functional.relu(self.conv1(x)))
        x = x.view(x.size(0), -1)            # flatten: 16 channels * 7 * 7 = 784 features
        print('Input shape:', x.shape)
        x = nn.functional.relu(x)
        # NOTE: fc1 and fc2 are defined above but never applied, so the model returns
        # all 784 features and CrossEntropyLoss treats them as 784 class scores.
        return x
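
Because each weight always multiplies the same input element, whichever window it falls in, the nested loops of the custom layer amount to a box sum over the element-wise product of input and weights. The sketch below uses that observation to vectorise the layer. It is not the layer defined above: `LocalWeightedSum` is a hypothetical name, and it assumes channels-first input and a weight tensor without a batch dimension. It also shows that autograd fills in the weight gradient without a hand-written backward pass.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class LocalWeightedSum(nn.Module):
    """Hypothetical vectorised variant: one weight per input element, no batch dimension."""

    def __init__(self, channels, height, width, filter_size=3):
        super().__init__()
        self.filter_size = filter_size
        self.weight = nn.Parameter(torch.randn(channels, height, width))

    def forward(self, x):                      # x: (N, C, H, W)
        k = self.filter_size
        weighted = x * self.weight             # broadcast the weights over the batch
        # avg_pool2d averages each k x k window; multiplying by k*k turns it into a sum.
        return F.avg_pool2d(weighted, kernel_size=k, stride=1) * (k * k)


torch.manual_seed(0)
layer = LocalWeightedSum(channels=3, height=8, width=8)
x = torch.randn(2, 3, 8, 8)
out = layer(x)                                 # (2, 3, 6, 6)

# Reference value for one output element, computed the explicit way.
n, c, j, l = 1, 2, 4, 3
expected = (x[n, c, j:j + 3, l:l + 3] * layer.weight[c, j:j + 3, l:l + 3]).sum()
matches = abs(out[n, c, j, l].item() - expected.item()) < 1e-4

out.sum().backward()                           # autograd populates layer.weight.grad
print(out.shape, matches, layer.weight.grad.shape)
```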

In [3]:
# Single Layer implementation


import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset
from sklearn.model_selection import train_test_split


# Convert the data to PyTorch tensors and create PyTorch datasets
train_dataset = TensorDataset(torch.Tensor(X_train), torch.Tensor(Y_train))
val_dataset = TensorDataset(torch.Tensor(X_val), torch.Tensor(Y_val))

# Define the batch size for training and validation
batch_size = 5 #16

# Create data loaders for training and validation
train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
val_loader = DataLoader(val_dataset, batch_size=batch_size)

# Define the model
model = CNN()

# Define the loss function and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)
num_classes=2
# Train the model
num_epochs = 10
for epoch in range(num_epochs):
    # Training
    train_loss = 0.0
    train_acc = 0.0
    model.train() # Set the model to training mode
    for images, labels in train_loader:
        optimizer.zero_grad()
        outputs = model(images)
        labels = labels.long()
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        train_loss += loss.item() * images.size(0)
        _, preds = torch.max(outputs, 1)
        train_acc += torch.sum(preds == labels.long())
    train_loss /= len(train_loader.dataset)
    train_acc = train_acc.float() / len(train_loader.dataset)

    # Validation
    val_loss = 0.0
    val_acc = 0.0
    model.eval() # Set the model to evaluation mode
    with torch.no_grad():
        for images, labels in val_loader:
            outputs = model(images)
            loss = criterion(outputs, labels.long())
            val_loss += loss.item() * images.size(0)
            _, preds = torch.max(outputs, 1)
            val_acc += torch.sum(preds == labels.long())
        val_loss /= len(val_loader.dataset)
        val_acc = val_acc.float() / len(val_loader.dataset)

    # Print the loss and accuracy for this epoch
    print('Epoch [{}/{}], Train Loss: {:.4f}, Train Acc: {:.4f}, Val Loss: {:.4f}, Val Acc: {:.4f}'
          .format(epoch+1, num_epochs, train_loss, train_acc, val_loss, val_acc))


Input shape: torch.Size([5, 32, 32, 3])
Input shape: torch.Size([5, 784])
...
Epoch [2/10], Train Loss: 6.4112, Train Acc: 0.0268, Val Loss: 6.4547, Val Acc: 0.0000
...

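The accuracy obtained from this model is poor: after epoch 2 the training accuracy is 0.0268 and the validation accuracy is 0.0000. The loss of about 6.4 is close to ln(784) ≈ 6.66, the value expected from near-uniform predictions over 784 classes, which is consistent with the model returning the flattened 784-feature vector rather than two class scores. We may be able to obtain better results as we improve the model.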
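
As a rough check on the scale of the reported loss, the sketch below compares the cross-entropy of uniform scores over 784 outputs with that of uniform scores over 2 outputs. The linear `head` is a hypothetical addition that maps 784 features to two class scores; it is one possible direction, not a change made in this notebook.

```python
import math
import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss()
labels = torch.tensor([1, 0, 1, 0, 1])         # five binary targets

# Uniform (all-zero) scores over 784 "classes" give a loss of ln(784).
wide_scores = torch.zeros(5, 784)
loss_wide = criterion(wide_scores, labels).item()

# A hypothetical linear head maps the 784 features to two class scores instead.
head = nn.Linear(784, 2)
features = torch.relu(torch.randn(5, 784))     # stand-in for the flattened features
logits = head(features)                        # (5, 2)

# Uniform baseline for a two-class output: ln(2).
loss_two = criterion(torch.zeros(5, 2), labels).item()

print(round(loss_wide, 4), round(math.log(784), 4), logits.shape, round(loss_two, 4))
```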

# How to change the code to work over 1-Dimensional data? 
For 1-dimensional data, the custom layer needs only one loop over output positions, sliding the window along a single axis. The pooling layer would also be 1-dimensional (for example, `nn.MaxPool1d`).
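
A minimal sketch of the 1-dimensional case, assuming input of shape (N, C, L) and one weight per input element (the shapes and names here are illustrative):

```python
import torch
import torch.nn as nn

k = 3
x = torch.randn(2, 4, 10)                      # (N, C, L): batch of 2, 4 channels, length 10
W = torch.randn(4, 10)                         # one weight per input element (assumption)

# A single loop over output positions along the one spatial axis.
out = torch.zeros(2, 4, 10 - k + 1)
for j in range(out.shape[-1]):
    out[:, :, j] = (x[:, :, j:j + k] * W[:, j:j + k]).sum(dim=-1)

pooled = nn.MaxPool1d(kernel_size=2, stride=2)(out)   # (2, 4, 4)
print(out.shape, pooled.shape)
```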

# How to change the code to work over 3-Dimensional data?
For 3-dimensional data, the custom layer needs one more nested loop to cover the extra spatial axis. The pooling layer in the combined network would also be 3-dimensional (for example, `nn.MaxPool3d`).
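
A minimal sketch of the 3-dimensional case, assuming input of shape (N, C, D, H, W) and one weight per input element (the shapes and names here are illustrative):

```python
import torch
import torch.nn as nn

k = 2
x = torch.randn(2, 1, 5, 6, 7)                 # (N, C, D, H, W)
W = torch.randn(1, 5, 6, 7)                    # one weight per input element (assumption)

out_d, out_h, out_w = 5 - k + 1, 6 - k + 1, 7 - k + 1
out = torch.zeros(2, 1, out_d, out_h, out_w)
for d in range(out_d):                         # extra loop for the depth axis
    for j in range(out_h):
        for l in range(out_w):
            patch = x[:, :, d:d + k, j:j + k, l:l + k]
            w_patch = W[:, d:d + k, j:j + k, l:l + k]
            out[:, :, d, j, l] = (patch * w_patch).sum(dim=(-3, -2, -1))

pooled = nn.MaxPool3d(kernel_size=2, stride=2)(out)   # (2, 1, 2, 2, 3)
print(out.shape, pooled.shape)
```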
