# Deep Learning - Image Classification - CNN - Fashion Image Dataset

### Author: Masood Ahmed


## Part 1: Image Classification using CNN

The dataset is stored as a zip file on Google Drive, and Google Drive is then mounted in the notebook. We use this method because it is a much more efficient way to load data when working in Google Colab.

In [None]:
# mounting google drive in order to access the dataset (takes about 1 minute for a dataset as large as 1 GB, at least for me).
from google.colab import drive
drive.mount('/content/gdrive')

Using the `os` library, we check which directory structure we have.

In [None]:
# To check the directories we have
import os

print(os.listdir())

In [None]:
# extracting the data from google drive; /dev/null is used to suppress the output
!unzip gdrive/My\ Drive/Advanced-DataAnalytics-Assignment2/Fashion-Product-Images.zip > /dev/null

We have also placed test.csv and train.csv on Google Drive, and we use their locations in Google Colab to load them into dataframes for further use.

In [None]:
# importing the train csv file from the google drive
import pandas as pd

train_path = '/content/gdrive/MyDrive/Advanced-DataAnalytics-Assignment2/train.csv'

train_df=pd.read_csv(train_path, sep='\t')
train_df

In [None]:
# importing the test csv file from the google drive

import pandas as pd

test_path = '/content/gdrive/MyDrive/Advanced-DataAnalytics-Assignment2/test.csv'

test_df=pd.read_csv(test_path, sep='\t')
test_df

Picking a random picture from the dataset and checking its dimensions and color channels.

In [None]:
import imageio
import os
import glob
from collections import Counter
import random
import matplotlib.pyplot as plt
from PIL import Image
import cv2
# setting the seed for reproducibility
myseed = 12345

from google.colab.patches import cv2_imshow

# let's take a look at one random image

random_pic_file = random.choice(os.listdir('Fashion-Product-Images/images/'))
pic = imageio.imread('Fashion-Product-Images/images/' + random_pic_file)
plt.imshow(pic)
height, width, channels = pic.shape
print(f'original height, width, and channels of each image: {height} {width} {channels}')

We are now adding an additional column which contains path to the images relative to the folder containing it.

In [None]:
# creating another column in the dataframe for image paths
train_df['image-path'] = train_df.apply(lambda row: str(row['imageid']) + ".jpg", axis=1)
test_df['image-path'] = test_df.apply(lambda row: str(row['imageid']) + ".jpg", axis=1)

Changing the datatype of `imageid` to string, which helps with processing the data and training the model.

In [None]:
train_df = train_df.astype({'imageid':'string'})
print(train_df.dtypes)
print(train_df.head())

Encoding the labels from strings/objects to integers for training. Without this step, training would crash, because the loss function expects integer class indices rather than strings.

In [None]:
from sklearn.preprocessing import LabelEncoder
import pandas as pd
import numpy as np

cat_cols = ['label']

# apply label encoder to the label column (fit once and reuse the encoded values)
le = LabelEncoder()
labels = le.fit_transform(train_df['label'])
train_df['label'] = labels

# show the transformed dataframe
print(train_df.head())

## Encodings
# {0: Bags, 1: Bottomwear, 2: Eyewear, 3: Fragrance, 4: Innerwear, 5: Jewellery, 6: Makeup, 7: Others, 8: Sandal, 9: Shoes, 10: Topwear, 11: Wallets, 12: watches}

train_df['label'].unique()
unique_labels = np.unique(labels)
print(unique_labels)
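
The integer codes listed in the encoding comment follow from the fact that `LabelEncoder` numbers the distinct labels in sorted order. A minimal sketch in plain Python, assuming the thirteen class names are exactly those in the comment above, reproduces the same numbering and shows why the lowercase `watches` ends up last:

```python
# Class names copied from the encoding comment above (order deliberately scrambled).
class_names = ["Topwear", "watches", "Bags", "Shoes", "Wallets", "Sandal",
               "Bottomwear", "Eyewear", "Fragrance", "Innerwear",
               "Jewellery", "Makeup", "Others"]

# sorted() compares strings by code point, so every name starting with an
# uppercase letter comes before the lowercase "watches".
encoding = {name: idx for idx, name in enumerate(sorted(class_names))}
decoding = {idx: name for name, idx in encoding.items()}

print(encoding["Bags"], encoding["Wallets"], encoding["watches"])
```

In the notebook itself, the fitted encoder `le` can be reused to turn predicted integers back into names instead of maintaining the mapping by hand.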

Splitting the data into training and validation sets, with 30% of the rows held out for validation and shuffling enabled.

In [None]:
from sklearn.model_selection import train_test_split

# a fixed random_state keeps the split reproducible between runs
train_df, val_df = train_test_split(train_df, shuffle = True, test_size = 0.3, random_state = myseed)
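
To illustrate what a shuffled 70/30 split does, here is a small sketch in plain Python. It is not scikit-learn's exact procedure; the helper `split_indices` and its rounding rule are assumptions made for illustration. It shows that shuffling with a fixed seed yields the same two disjoint index sets on every run:

```python
import random

def split_indices(n_rows, val_fraction=0.3, seed=12345):
    """Shuffle row indices with a fixed seed and split them into train/validation."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)   # private RNG, global state is untouched
    n_val = int(round(n_rows * val_fraction))
    return indices[n_val:], indices[:n_val]

train_idx, val_idx = split_indices(10)
print(len(train_idx), len(val_idx))
```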

Doing the same preprocessing on the test.csv as well.

In [None]:
test_df = test_df.astype({'imageid':'string'})
print(test_df.dtypes)
test_df.head()

In [None]:
# Import necessary packages.
import os
import numpy as np
import pandas as pd
import torch
from PIL import Image

import matplotlib.pyplot as plt
from torchvision.io import read_image
from torch.utils.data import Dataset
from sklearn.model_selection import train_test_split
from torch.utils.data import DataLoader, ConcatDataset, Subset, Dataset
from torchvision.datasets import DatasetFolder, VisionDataset
from torchvision.transforms import Resize
import torch.nn as nn
import torchvision.transforms as transforms

# This is for the progress bar.
from tqdm.auto import tqdm
import random


Here we do a basic setup of PyTorch for our CNN model.

**torch.backends.cudnn.deterministic = True:** This line ensures that the CuDNN backend uses deterministic algorithms for convolution operations. This can be important for ensuring reproducibility when training on GPUs, as the non-deterministic nature of GPU operations can lead to slightly different results each time the code is run.

**torch.backends.cudnn.benchmark = False:** This line disables the CuDNN benchmark mode, which is used to automatically find the best algorithms for convolution operations based on the input size and shape. While this can improve performance, it can also lead to slightly different results each time the code is run.

**np.random.seed(myseed):** This sets the seed for the NumPy random number generator, which is used for some random operations in the code.

**torch.manual_seed(myseed):** This sets the seed for the PyTorch random number generator, which is used for other random operations in the code.

**if torch.cuda.is_available(): torch.cuda.manual_seed_all(myseed):** This sets the seed for the PyTorch CUDA random number generators, which are used for random operations when running the code on a GPU. If a GPU is not available, this line is skipped. With the same seed set for all three random number generators, the code should produce the same results each time it is run, assuming that all other factors (such as the input data and model architecture) remain the same. Note that Python's built-in `random` module keeps its own state and is not covered by these calls.

In [None]:
# basic setup for PyTorch
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False
np.random.seed(myseed)
torch.manual_seed(myseed)
if torch.cuda.is_available():
    torch.cuda.manual_seed_all(myseed)
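
The seeding calls above cover NumPy and PyTorch. Python's built-in `random` module, which the notebook uses earlier for `random.choice`, has a separate generator. A minimal sketch of how seeding makes a random sequence repeatable, using only the standard library:

```python
import random

myseed = 12345  # same value as in the notebook

random.seed(myseed)
first_draw = [random.randint(0, 999) for _ in range(5)]

random.seed(myseed)  # re-seeding restarts the sequence from the same point
second_draw = [random.randint(0, 999) for _ in range(5)]

print(first_draw == second_draw)
```

Calling `random.seed(myseed)` alongside the NumPy and PyTorch calls would bring the built-in generator under the same seed.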

`transforms.Compose` is used to chain the data-processing steps, for example resizing the images and making sure each image has the same number of color channels.

In [None]:

train_tfm = transforms.Compose([
    # Resize the image into a fixed shape (height = width = 128)
    transforms.Resize((128, 128)),
    transforms.Lambda(lambda x: x.repeat(3, 1, 1) if x.size(0)==1 else x),
    transforms.ToPILImage(),
    transforms.ToTensor(), # changing the datatype to tensor because pytorch takes input as tensor
])


val_tfm = transforms.Compose([
    # Resize the image into a fixed shape (height = width = 128)
    transforms.Resize((128, 128)),
    transforms.Lambda(lambda x: x.repeat(3, 1, 1) if x.size(0)==1 else x),
    transforms.ToPILImage(),
    transforms.ToTensor(), # changing the datatype to tensor because pytorch takes input as tensor
])

test_tfm = transforms.Compose([
    transforms.Resize((128, 128)),
    transforms.Lambda(lambda x: x.repeat(3, 1, 1) if x.size(0)==1 else x),
    transforms.ToPILImage(),
    transforms.ToTensor(), # changing the datatype to tensor because pytorch takes input as tensor

])

### Custom Dataset and Dataloader

We create a custom dataset class that inherits from PyTorch's `Dataset` class. We have to do this because our images and labels are organized differently from the standard layouts, so we need to adjust the class to our needs.

In the `__init__` function, we store the dataframe that holds the labels, together with the image directory. We have also added a `transform` argument to support any transform that we apply.

In [None]:
image_path = 'Fashion-Product-Images/images/'

class CustomImageDataset_from_csv(Dataset):
    def __init__(self, dataframe , img_dir ,  transform = None , label_transform = None):
        self.img_labels = dataframe #pd.read_csv(annotations_file)
        self.img_dir = img_dir
        self.transform = transform
        self.label_transform = label_transform
        
    def __len__(self):
        return len(self.img_labels)
    
    def __getitem__(self , idx):
        # column 3 is expected to hold the image file name, column 1 the encoded label
        img_path = os.path.join(self.img_dir , self.img_labels.iloc[idx, 3])
        image = read_image(img_path)
        label = self.img_labels.iloc[idx, 1]
        if self.transform is not None:
            image = self.transform(image)
        if self.label_transform:
            label = self.label_transform(label)

        return(image, label)

In [None]:
image_path = 'Fashion-Product-Images/images/'

class CustomImageDataset_from_csv_test(Dataset):
    def __init__(self, dataframe , img_dir ,  transform = None , label_transform = None):
        self.img_labels = dataframe #pd.read_csv(annotations_file)
        self.img_dir = img_dir
        self.transform = transform
        self.label_transform = label_transform
        
    def __len__(self):
        return len(self.img_labels)
    
    def __getitem__(self , idx):
        img_path = os.path.join(self.img_dir , self.img_labels.iloc[idx, 3])
        image = read_image(img_path)
        label = self.img_labels.iloc[idx, 1]
        if self.transform is not None:
            image = self.transform(image)
        if self.label_transform:
            label = self.label_transform(label)

        return(image, label, img_path)
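
The two classes above follow the map-style dataset pattern: an object that reports its length and returns one sample per index. The sketch below shows the same pattern without PyTorch. `RowDataset`, `load_fn`, and the sample rows are hypothetical names used only for illustration. It looks fields up by column name, which keeps working if the column order of the CSV changes:

```python
class RowDataset:
    """Map-style dataset sketch: any object with __len__ and __getitem__."""

    def __init__(self, rows, load_fn, transform=None):
        self.rows = rows            # list of dicts, one per image
        self.load_fn = load_fn      # hypothetical loader: path -> image
        self.transform = transform

    def __len__(self):
        return len(self.rows)

    def __getitem__(self, idx):
        row = self.rows[idx]
        image = self.load_fn(row["image-path"])  # lookup by name, not by position
        if self.transform is not None:
            image = self.transform(image)
        return image, row["label"]

rows = [{"imageid": "1", "label": 9, "image-path": "1.jpg"},
        {"imageid": "2", "label": 0, "image-path": "2.jpg"}]
dataset = RowDataset(rows, load_fn=lambda path: f"<pixels of {path}>")
print(len(dataset), dataset[1])
```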

# Convolutional Neural Network Class

### Simple Explanation: 

We create our CNN class, which inherits from PyTorch's `nn.Module` class. In the `__init__` method, we specify all our layers. Each block applies a `Conv2d` filter, batch normalization, and a ReLU layer to introduce non-linearity, followed by a max-pooling layer with a kernel size of 2, which reduces the width and height of the convolutional output by a factor of 2. We stack several such blocks, adjusting the input and output channels.

Lastly, we add the fully connected layers, which take the flattened features as input. Then we define the `forward` function, which passes the input through the layers above.

### Motivation: 

The motivation behind this CNN model is to perform image classification on a dataset with 13 different classes. The model consists of several convolutional layers with batch normalization, ReLU activation, and max-pooling operations, followed by fully connected layers to classify the input images.

The first layer of the model is a convolutional layer with 64 filters and a 3x3 kernel size, which takes the input image with 3 channels and outputs feature maps of size 64x128x128. The output feature maps are then normalized using batch normalization, and the ReLU activation function is applied to introduce non-linearity. Then, a max-pooling layer with a 2x2 kernel size is used to downsample the feature maps, reducing the spatial dimensions by half and producing feature maps of size 64x64x64.

The same process is repeated for the second, third, fourth, and fifth convolutional layers, with the number of filters increasing up to 512 (the fifth layer keeps 512) and the spatial dimensions halving at each block. After its pooling step, the final convolutional block produces feature maps of size 512x4x4.

After the convolutional layers, the output feature maps are flattened and fed into a fully connected network consisting of three linear layers, with ReLU activations after the first two. The final output layer has 13 units, corresponding to the 13 different classes in the dataset.

Overall, the motivation behind this CNN model is to leverage the power of convolutional layers to extract meaningful features from the input images and to use fully connected layers to classify these features into different classes. The use of batch normalization and ReLU activation helps improve the stability and performance of the model.
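
The feature-map sizes quoted above can be checked with the standard output-size formula for convolution and pooling, `floor((size + 2*padding - kernel) / stride) + 1`. The following sketch traces a 128x128 input through five blocks, assuming the kernel, stride, and padding values used in this model (3/1/1 for the convolutions, 2/2/0 for the pooling layers):

```python
def conv_out(size, kernel, stride, padding):
    """Output length of a convolution or pooling layer along one spatial axis."""
    return (size + 2 * padding - kernel) // stride + 1

size = 128
channels_per_block = [64, 128, 256, 512, 512]
shapes = []
for channels in channels_per_block:
    size = conv_out(size, kernel=3, stride=1, padding=1)  # 3x3 conv with padding 1 keeps the size
    size = conv_out(size, kernel=2, stride=2, padding=0)  # 2x2 max-pooling halves it
    shapes.append((channels, size, size))

flattened = shapes[-1][0] * shapes[-1][1] * shapes[-1][2]
print(shapes)
print(flattened)
```

The flattened size is the number of input features that the first linear layer has to accept.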

In [None]:
class FirstCNN(nn.Module):
    def __init__(self):
        super(FirstCNN, self).__init__()
       
        # input size [3, 128, 128]

        self.cnn = nn.Sequential(
            nn.Conv2d(3, 64, 3, 1, 1),  # [64, 128, 128]
            nn.BatchNorm2d(64),
            nn.ReLU(),
            nn.MaxPool2d(2, 2, 0),      # [64, 64, 64]

            nn.Conv2d(64, 128, 3, 1, 1), # [128, 64, 64]
            nn.BatchNorm2d(128),
            nn.ReLU(),
            nn.MaxPool2d(2, 2, 0),      # [128, 32, 32]

            nn.Conv2d(128, 256, 3, 1, 1), # [256, 32, 32]
            nn.BatchNorm2d(256),
            nn.ReLU(),
            nn.MaxPool2d(2, 2, 0),      # [256, 16, 16]

            nn.Conv2d(256, 512, 3, 1, 1), # [512, 16, 16]
            nn.BatchNorm2d(512),
            nn.ReLU(),
            nn.MaxPool2d(2, 2, 0),       # [512, 8, 8]
            
            nn.Conv2d(512, 512, 3, 1, 1), # [512, 8, 8]
            nn.BatchNorm2d(512),
            nn.ReLU(),
            nn.MaxPool2d(2, 2, 0),       # [512, 4, 4]
        )
        self.fc = nn.Sequential(
            nn.Linear(512*4*4, 1024),
            nn.ReLU(),
            nn.Linear(1024, 512),
            nn.ReLU(),
            nn.Linear(512, 13)
        )

        
    def forward(self, x):
        out = self.cnn(x)
        out = out.view(out.size()[0], -1)
        return self.fc(out)



### Dataloaders

In PyTorch, we feed the data to the model in batches through dataloaders. We cannot feed all the images at once, since that would overload memory and is not an efficient way to train a deep network, so we use batches. In the training dataloader, we set `shuffle` to true so that the model does not see the samples in a fixed order and become biased toward any category. Our `batch_size` (a hyperparameter) is 64.

In [None]:
_exp_name = "sample_1"
batch_size = 64

train_data = CustomImageDataset_from_csv(train_df , image_path , transform = train_tfm)
train_dataloader = DataLoader(train_data, batch_size = batch_size , shuffle = True, num_workers=0, pin_memory=True)

val_data = CustomImageDataset_from_csv(val_df , image_path , transform = val_tfm)
val_dataloader = DataLoader(val_data, batch_size = batch_size , shuffle = True, num_workers=0, pin_memory=True)

test_data = CustomImageDataset_from_csv_test(test_df , image_path , transform = test_tfm)
test_dataloader = DataLoader(test_data, batch_size = batch_size , shuffle = False, num_workers=0, pin_memory=True)
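
Conceptually, a dataloader walks through the dataset in chunks of `batch_size`, optionally in shuffled order. The sketch below mimics that behaviour in plain Python. `iterate_batches` is a hypothetical helper, not part of PyTorch, and the 150 integers stand in for images:

```python
import random

def iterate_batches(items, batch_size, shuffle=False, seed=None):
    """Yield consecutive batches; the last one may be smaller than batch_size."""
    order = list(range(len(items)))
    if shuffle:
        random.Random(seed).shuffle(order)
    for start in range(0, len(order), batch_size):
        yield [items[i] for i in order[start:start + batch_size]]

samples = list(range(150))  # stand-in for 150 images
batches = list(iterate_batches(samples, batch_size=64, shuffle=True, seed=12345))
print([len(b) for b in batches])
```

Every sample appears exactly once per pass, and only the final batch is allowed to be short.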

In [None]:
# Selecting which device to use for training. It would be better to use cuda if GPU is available.
device = "cuda" if torch.cuda.is_available() else "cpu"

In [None]:
# Initializing the model, and put it on the device specified.
model = FirstCNN().to(device)

Setting up the optimizer: we use the Adam optimizer, and cross-entropy as the loss function.

In [None]:
# Initialize optimizer!
optimizer = torch.optim.Adam(model.parameters(), lr=0.0003, weight_decay=1e-5) 

# For the classification task, we use cross-entropy as the measurement of performance.
criterion = nn.CrossEntropyLoss()
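
`nn.CrossEntropyLoss` takes raw logits and integer class indices, and by default averages the per-sample losses over the batch. As a sketch of what is computed for a single sample, the following plain-Python function evaluates `-log(softmax(logits)[target])`. The example logits are made up for illustration:

```python
import math

def cross_entropy(logits, target):
    """Cross-entropy of one sample: -log(softmax(logits)[target])."""
    max_logit = max(logits)  # subtract the max for numerical stability
    log_sum_exp = max_logit + math.log(sum(math.exp(z - max_logit) for z in logits))
    return log_sum_exp - logits[target]

uniform_logits = [0.0] * 13       # the model has no preference among 13 classes
confident_logits = [0.0] * 13
confident_logits[9] = 10.0        # the model strongly prefers class 9

print(cross_entropy(uniform_logits, target=9))
print(cross_entropy(confident_logits, target=9))
```

With 13 equally likely classes the loss equals `ln(13)`, which is a useful reference value for an untrained model; a confident and correct prediction drives the loss toward zero.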

Here we train the model and save the weights from the epoch that gives the best validation accuracy!

### Brief Description:

This code trains a convolutional neural network (CNN) using PyTorch for image classification. It first sets the number of training epochs and patience (the number of epochs with no improvement before stopping early). It then initializes the CNN model and puts it on the specified device (e.g., CPU or GPU).

The code then iterates through each epoch and trains the model on the training dataset. For each batch of images and labels, the model computes the forward pass to generate logits, calculates the cross-entropy loss, computes the gradients, clips the gradient norms for stable training, updates the parameters with computed gradients, and records the loss and accuracy. The model then evaluates on the validation dataset by iterating through each batch and recording the loss and accuracy. If the validation accuracy improves, the best model is saved, and the early stop counter is reset. If there is no improvement in the validation accuracy for patience consecutive epochs, the training is stopped early.
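
The gradient-clipping step mentioned above rescales all gradients by a single factor whenever their joint L2 norm exceeds `max_norm`. A minimal sketch of that idea on a flat list of numbers follows. `clip_by_global_norm` is a hypothetical helper, and the small epsilon in the denominator is an assumption made to avoid division by zero:

```python
import math

def clip_by_global_norm(grads, max_norm):
    """Scale all gradients by one factor so their joint L2 norm is at most max_norm."""
    total_norm = math.sqrt(sum(g * g for g in grads))
    scale = min(1.0, max_norm / (total_norm + 1e-6))
    return [g * scale for g in grads], total_norm

clipped, norm_before = clip_by_global_norm([30.0, 40.0], max_norm=10)
print(norm_before, clipped)
```

Gradients whose norm is already below the threshold pass through unchanged, so clipping only intervenes on unusually large updates.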

In [None]:
# The number of training epochs (hyperparameters) and patience.
n_epochs = 4
patience = 300 # If no improvement in 'patience' epochs, early stop


# Initializing trackers
stale = 0
best_acc = 0

for epoch in range(n_epochs):

    # ---------- Training ----------
    ## Evaluation and training on training dataset.

    model.train()        # model is in train mode before training.

    # These are used to record information in training.
    train_loss = []
    train_accs = []

    for batch in tqdm(train_dataloader):

        # A batch consists of image data and corresponding labels.
        imgs, labels = batch
        imgs = imgs.to(device)  # move images to device
        labels = labels.to(device)  # labels are already a tensor after batching; move them to device

        # Forward the data
        logits = model(imgs)

        # Calculate the cross-entropy loss.
        loss = criterion(logits, labels)

        optimizer.zero_grad()

        # Compute the gradients for parameters.
        loss.backward()

        # Clip the gradient norms for stable training.
        grad_norm = nn.utils.clip_grad_norm_(model.parameters(), max_norm=10)

        # Update the parameters with computed gradients.
        optimizer.step()

        # Compute the accuracy for current batch.
        acc = (logits.argmax(dim=-1) == labels).float().mean()

        # Record the loss and accuracy.
        train_loss.append(loss.item())
        train_accs.append(acc)
        
    train_loss = sum(train_loss) / len(train_loss)
    train_acc = sum(train_accs) / len(train_accs)

    # Print the information.
    print(f"[ Train | {epoch + 1:03d}/{n_epochs:03d} ] loss = {train_loss:.5f}, acc = {train_acc:.5f}")

    # ---------- Validation ----------
    ## Evaluation and testing on validation dataset

    model.eval()       # model is in eval mode

    # These are used to record information in validation.
    valid_loss = []
    valid_accs = []

    # Iterate the validation set by batches.
    for batch in tqdm(val_dataloader):

        # A batch consists of image data and corresponding labels.
        imgs, labels = batch

        with torch.no_grad():
            logits = model(imgs.to(device))

        loss = criterion(logits, labels.to(device))

        # Compute the accuracy for current batch.
        acc = (logits.argmax(dim=-1) == labels.to(device)).float().mean()

        # Record the loss and accuracy.
        valid_loss.append(loss.item())
        valid_accs.append(acc)
        #break

    # The average loss and accuracy for entire validation set is the average of the recorded values.
    valid_loss = sum(valid_loss) / len(valid_loss)
    valid_acc = sum(valid_accs) / len(valid_accs)

    # Print the information.
    print(f"[ Valid | {epoch + 1:03d}/{n_epochs:03d} ] loss = {valid_loss:.5f}, acc = {valid_acc:.5f}")

    # update logs (write the line into the log file)
    if valid_acc > best_acc:
        with open(f"./{_exp_name}_log.txt","a") as log_file:
            print(f"[ Valid | {epoch + 1:03d}/{n_epochs:03d} ] loss = {valid_loss:.5f}, acc = {valid_acc:.5f} -> best", file=log_file)
    else:
        with open(f"./{_exp_name}_log.txt","a") as log_file:
            print(f"[ Valid | {epoch + 1:03d}/{n_epochs:03d} ] loss = {valid_loss:.5f}, acc = {valid_acc:.5f}", file=log_file)


    # save models
    if valid_acc > best_acc:
        print(f"Best model found at epoch {epoch + 1}, saving model")
        torch.save(model.state_dict(), f"{_exp_name}_best.ckpt") # only save best to prevent output memory exceed error
        best_acc = valid_acc
        stale = 0
    else:
        stale += 1
        if stale > patience:
            print(f"No improvement in {patience} consecutive epochs, early stopping")
            break
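
The checkpointing and early-stopping logic of the loop can be isolated in a few lines. The sketch below replays it on a list of made-up validation accuracies. `run_with_early_stopping` is a hypothetical helper that mirrors the `stale > patience` test used above:

```python
def run_with_early_stopping(valid_accs, patience):
    """Return (best_epoch, best_acc, epochs_run) for a list of per-epoch accuracies."""
    best_acc, best_epoch, stale = 0.0, None, 0
    epochs_run = 0
    for epoch, acc in enumerate(valid_accs, start=1):
        epochs_run = epoch
        if acc > best_acc:
            best_acc, best_epoch, stale = acc, epoch, 0  # new best: "save" and reset
        else:
            stale += 1
            if stale > patience:
                break
    return best_epoch, best_acc, epochs_run

# Made-up accuracies, purely for illustration.
history = [0.80, 0.90, 0.89, 0.88, 0.87, 0.95]
print(run_with_early_stopping(history, patience=2))
print(run_with_early_stopping(history, patience=300))
```

With `patience` larger than the number of epochs, as in the cell above (`patience = 300`, `n_epochs = 4`), the early-stopping branch can never fire, and only the best-checkpoint logic has an effect.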

#### Making a CSV file with predictions

In [None]:
model_best = FirstCNN().to(device)
model_best.load_state_dict(torch.load(f"{_exp_name}_best.ckpt"))
model_best.eval()

prediction = []
image_paths = []

## Encodings
# {0: Bags, 1: Bottomwear, 2: Eyewear, 3: Fragrance, 4: Innerwear, 5: Jewellery, 6: Makeup, 7: Others, 8: Sandal, 9: Shoes, 10: Topwear, 11: Wallets, 12: watches}


with torch.no_grad():
    for data,_, img in test_dataloader:
        test_pred = model_best(data.to(device))
        test_label = np.argmax(test_pred.cpu().data.numpy(), axis=1)
        prediction += test_label.tolist()  # tolist() keeps a list even when the last batch has one image
        image_paths += list(img)


print(prediction)
print(image_paths)
mapping = {0: "Bags", 1: "Bottomwear", 2: "Eyewear", 3: "Fragrance", 4: "Innerwear", 5: "Jewellery", 6: "Makeup", 7: "Others", 8: "Sandal", 9: "Shoes", 10: "Topwear", 11: "Wallets", 12: "watches" }

# now creating a prediction csv (built in one step, since DataFrame.append is no longer available in pandas 2.x)
df = pd.DataFrame({
    'image-path': image_paths,
    'encoded-label': prediction,
    'label-decoded': [mapping[p] for p in prediction],
})

df.to_csv("prediction_1.csv",index = False)
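
The decoding step in the cell above amounts to picking the highest-scoring class in each row of logits and looking up its name. A plain-Python sketch with made-up scores and a shortened copy of the mapping follows. `decode_predictions` and `short_mapping` are hypothetical names used only here:

```python
short_mapping = {0: "Bags", 1: "Bottomwear", 2: "Eyewear"}  # shortened copy for illustration

def decode_predictions(logit_rows, mapping):
    """Pick the highest-scoring class per row and look up its name."""
    decoded = []
    for row in logit_rows:
        class_id = max(range(len(row)), key=row.__getitem__)  # argmax over the row
        decoded.append((class_id, mapping[class_id]))
    return decoded

print(decode_predictions([[0.1, 2.3, -1.0], [4.0, 0.2, 0.5]], short_mapping))
```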

## Conclusion:

Our model reaches an accuracy of 95 percent, which was found at the 3rd epoch, so we saved that model. The training time was quite high due to the large number of images: as much as 8-10 hours, depending on the GPU/CPU used. Overall, the accuracy is high for this fashion image dataset, given that the task involves distinguishing between very similar fashion items such as shoes and sandals, or innerwear and outerwear.

## Part 2: Improved Image Classification

### Tuning one hyperparameter and explaining why it is worth tuning

## Hyperparameter Chosen:

I am choosing the batch size as the hyperparameter to tune. Given that the current experiment, with a batch size of 64, takes 8 hours to train and reaches an accuracy of 95%, I believe that increasing the batch size would reduce the training time considerably.

When adjusting the batch size, I increased it by a factor of 2 and then by a factor of 4 and observed the effect on model performance. Increasing the batch size sped up training, as more data could be processed in parallel. However, a larger batch size may also reduce model performance: there are fewer parameter updates per epoch, and the gradient estimate becomes less noisy, which can hurt generalization.
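
One concrete effect of the batch size is the number of parameter updates per epoch, which is the dataset size divided by the batch size, rounded up. The short sketch below computes it for the three batch sizes considered here. The training-set size `n_train` is a hypothetical value, not taken from the notebook:

```python
import math

def updates_per_epoch(n_samples, batch_size):
    """Number of optimizer steps needed to see every sample once."""
    return math.ceil(n_samples / batch_size)

n_train = 30_000  # hypothetical training-set size, for illustration only
for batch_size in (64, 128, 256):
    print(batch_size, updates_per_epoch(n_train, batch_size))
```

Doubling the batch size roughly halves the number of updates, so with a fixed number of epochs the model takes fewer optimization steps overall.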

##### Repeating the same training process as above with batch sizes of 128 and 256, respectively!

In [None]:
# Imports
import pandas as pd
import imageio
import os
import glob
from collections import Counter
import matplotlib.pyplot as plt
from PIL import Image
import cv2
# setting the seed for reproducibility
myseed = 12345

from google.colab.patches import cv2_imshow

from sklearn.preprocessing import LabelEncoder
import numpy as np

from sklearn.model_selection import train_test_split

import torch

from torchvision.io import read_image
from torch.utils.data import Dataset
from torch.utils.data import DataLoader, ConcatDataset, Subset, Dataset
from torchvision.datasets import DatasetFolder, VisionDataset
from torchvision.transforms import Resize
import torch.nn as nn
import torchvision.transforms as transforms

from tqdm.auto import tqdm
import random


In [None]:
test_path = '/content/gdrive/MyDrive/Advanced-DataAnalytics-Assignment2/test.csv'

test_df=pd.read_csv(test_path, sep='\t')

train_path = '/content/gdrive/MyDrive/Advanced-DataAnalytics-Assignment2/train.csv'

train_df=pd.read_csv(train_path, sep='\t')

train_df['image-path'] = train_df.apply(lambda row: str(row['imageid']) + ".jpg", axis=1)
test_df['image-path'] = test_df.apply(lambda row: str(row['imageid']) + ".jpg", axis=1)

train_df = train_df.astype({'imageid':'string'})
test_df = test_df.astype({'imageid':'string'})

cat_cols = ['label']

# apply label encoder to the label column (fit once and reuse the encoded values)
le = LabelEncoder()
labels = le.fit_transform(train_df['label'])
train_df['label'] = labels

## Encodings
# {0: Bags, 1: Bottomwear, 2: Eyewear, 3: Fragrance, 4: Innerwear, 5: Jewellery, 6: Makeup, 7: Others, 8: Sandal, 9: Shoes, 10: Topwear, 11: Wallets, 12: watches}

train_df['label'].unique()
unique_labels = np.unique(labels)

# a fixed random_state keeps the split reproducible between runs
train_df, val_df = train_test_split(train_df, shuffle = True, test_size = 0.3, random_state = myseed)

# basic setup for PyTorch
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False
np.random.seed(myseed)
torch.manual_seed(myseed)
if torch.cuda.is_available():
    torch.cuda.manual_seed_all(myseed)

  
train_tfm = transforms.Compose([
    # Resize the image into a fixed shape (height = width = 128)
    transforms.Resize((128, 128)),
    transforms.Lambda(lambda x: x.repeat(3, 1, 1) if x.size(0)==1 else x),
    transforms.ToPILImage(),
    transforms.ToTensor(), # changing the datatype to tensor because pytorch takes input as tensor
])


val_tfm = transforms.Compose([
    # Resize the image into a fixed shape (height = width = 128)
    transforms.Resize((128, 128)),
    transforms.Lambda(lambda x: x.repeat(3, 1, 1) if x.size(0)==1 else x),
    transforms.ToPILImage(),
    transforms.ToTensor(), # changing the datatype to tensor because pytorch takes input as tensor
])

test_tfm = transforms.Compose([
    transforms.Resize((128, 128)),
    transforms.Lambda(lambda x: x.repeat(3, 1, 1) if x.size(0)==1 else x),
    transforms.ToPILImage(),
    transforms.ToTensor(), # changing the datatype to tensor because pytorch takes input as tensor

])

image_path = 'Fashion-Product-Images/images/'

class CustomImageDataset_from_csv(Dataset):
    def __init__(self, dataframe , img_dir ,  transform = None , label_transform = None):
        self.img_labels = dataframe #pd.read_csv(annotations_file)
        self.img_dir = img_dir
        self.transform = transform
        self.label_transform = label_transform
        
    def __len__(self):
        return len(self.img_labels)
    
    def __getitem__(self , idx):
        # column 3 is expected to hold the image file name, column 1 the encoded label
        img_path = os.path.join(self.img_dir , self.img_labels.iloc[idx, 3])
        image = read_image(img_path)
        label = self.img_labels.iloc[idx, 1]
        if self.transform is not None:
            image = self.transform(image)
        if self.label_transform:
            label = self.label_transform(label)

        return(image, label)

class CustomImageDataset_from_csv_test(Dataset):
    def __init__(self, dataframe , img_dir ,  transform = None , label_transform = None):
        self.img_labels = dataframe #pd.read_csv(annotations_file)
        self.img_dir = img_dir
        self.transform = transform
        self.label_transform = label_transform
        
    def __len__(self):
        return len(self.img_labels)
    
    def __getitem__(self , idx):
        img_path = os.path.join(self.img_dir , self.img_labels.iloc[idx, 3])
        image = read_image(img_path)
        label = self.img_labels.iloc[idx, 1]
        if self.transform is not None:
            image = self.transform(image)
        if self.label_transform:
            label = self.label_transform(label)

        return(image, label, img_path)

class FirstCNN(nn.Module):
    def __init__(self):
        super(FirstCNN, self).__init__()
       
        # input size [3, 128, 128]

        self.cnn = nn.Sequential(
            nn.Conv2d(3, 64, 3, 1, 1),  # [64, 128, 128]
            nn.BatchNorm2d(64),
            nn.ReLU(),
            nn.MaxPool2d(2, 2, 0),      # [64, 64, 64]

            nn.Conv2d(64, 128, 3, 1, 1), # [128, 64, 64]
            nn.BatchNorm2d(128),
            nn.ReLU(),
            nn.MaxPool2d(2, 2, 0),      # [128, 32, 32]

            nn.Conv2d(128, 256, 3, 1, 1), # [256, 32, 32]
            nn.BatchNorm2d(256),
            nn.ReLU(),
            nn.MaxPool2d(2, 2, 0),      # [256, 16, 16]

            nn.Conv2d(256, 512, 3, 1, 1), # [512, 16, 16]
            nn.BatchNorm2d(512),
            nn.ReLU(),
            nn.MaxPool2d(2, 2, 0),       # [512, 8, 8]
            
            nn.Conv2d(512, 512, 3, 1, 1), # [512, 8, 8]
            nn.BatchNorm2d(512),
            nn.ReLU(),
            nn.MaxPool2d(2, 2, 0),       # [512, 4, 4]
        )
        self.fc = nn.Sequential(
            nn.Linear(512*4*4, 1024),
            nn.ReLU(),
            nn.Linear(1024, 512),
            nn.ReLU(),
            nn.Linear(512, 13)
        )

        
    def forward(self, x):
        out = self.cnn(x)
        out = out.view(out.size()[0], -1)
        return self.fc(out)



## Tuning the batch size: setting it to 128 now!

In [None]:
_exp_name = "sample_2"
batch_size = 128

train_data = CustomImageDataset_from_csv(train_df , image_path , transform = train_tfm)
train_dataloader = DataLoader(train_data, batch_size = batch_size , shuffle = True, num_workers=0, pin_memory=True)

val_data = CustomImageDataset_from_csv(val_df , image_path , transform = val_tfm)
val_dataloader = DataLoader(val_data, batch_size = batch_size , shuffle = True, num_workers=0, pin_memory=True)

test_data = CustomImageDataset_from_csv_test(test_df , image_path , transform = test_tfm)
test_dataloader = DataLoader(test_data, batch_size = batch_size , shuffle = False, num_workers=0, pin_memory=True)

In [None]:
# Selecting which device to use for training. It would be better to use cuda if GPU is available.
device = "cuda" if torch.cuda.is_available() else "cpu"

# Initializing the model, and put it on the device specified.
model = FirstCNN().to(device)

# Initialize optimizer!
optimizer = torch.optim.Adam(model.parameters(), lr=0.0003, weight_decay=1e-5) 

# For the classification task, we use cross-entropy as the measurement of performance.
criterion = nn.CrossEntropyLoss()

# The number of training epochs (hyperparameters) and patience.
n_epochs = 4
patience = 300 # If no improvement in 'patience' epochs, early stop


# Initializing trackers
stale = 0
best_acc = 0

for epoch in range(n_epochs):

    # ---------- Training ----------
    ## Evaluation and training on training dataset.

    model.train()        # model is in train mode before training.

    # These are used to record information in training.
    train_loss = []
    train_accs = []

    for batch in tqdm(train_dataloader):

        # A batch consists of image data and corresponding labels.
        imgs, labels = batch
        imgs = imgs.to(device)  # move images to device
        labels = labels.to(device)  # labels are already a tensor after batching; move them to device

        # Forward the data
        logits = model(imgs)

        # Calculate the cross-entropy loss.
        loss = criterion(logits, labels)

        optimizer.zero_grad()

        # Compute the gradients for parameters.
        loss.backward()

        # Clip the gradient norms for stable training.
        grad_norm = nn.utils.clip_grad_norm_(model.parameters(), max_norm=10)

        # Update the parameters with computed gradients.
        optimizer.step()

        # Compute the accuracy for current batch.
        acc = (logits.argmax(dim=-1) == labels).float().mean()

        # Record the loss and accuracy.
        train_loss.append(loss.item())
        train_accs.append(acc)
        
    train_loss = sum(train_loss) / len(train_loss)
    train_acc = sum(train_accs) / len(train_accs)

    # Print the information.
    print(f"[ Train | {epoch + 1:03d}/{n_epochs:03d} ] loss = {train_loss:.5f}, acc = {train_acc:.5f}")

    # ---------- Validation ----------
    ## Evaluation and testing on validation dataset

    model.eval()       # model is in eval mode

    # These are used to record information in validation.
    valid_loss = []
    valid_accs = []

    # Iterate the validation set by batches.
    for batch in tqdm(val_dataloader):

        # A batch consists of image data and corresponding labels.
        imgs, labels = batch

        with torch.no_grad():
            logits = model(imgs.to(device))

        loss = criterion(logits, labels.to(device))

        # Compute the accuracy for current batch.
        acc = (logits.argmax(dim=-1) == labels.to(device)).float().mean()

        # Record the loss and accuracy.
        valid_loss.append(loss.item())
        valid_accs.append(acc)
        #break

    # The average loss and accuracy for entire validation set is the average of the recorded values.
    valid_loss = sum(valid_loss) / len(valid_loss)
    valid_acc = sum(valid_accs) / len(valid_accs)

    # Print the information.
    print(f"[ Valid | {epoch + 1:03d}/{n_epochs:03d} ] loss = {valid_loss:.5f}, acc = {valid_acc:.5f}")

    # update logs: append the validation summary to the log file, marking a new best
    with open(f"./{_exp_name}_log.txt", "a") as log_file:
        best_marker = " -> best" if valid_acc > best_acc else ""
        print(f"[ Valid | {epoch + 1:03d}/{n_epochs:03d} ] loss = {valid_loss:.5f}, acc = {valid_acc:.5f}{best_marker}", file=log_file)


    # save models
    if valid_acc > best_acc:
        print(f"Best model found at epoch {epoch + 1}, saving model")
        torch.save(model.state_dict(), f"{_exp_name}_best.ckpt") # only save best to prevent output memory exceed error
        best_acc = valid_acc
        stale = 0
    else:
        stale += 1
        if stale > patience:
            print(f"No improvement in {patience} consecutive epochs, early stopping")
            break
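
The early-stopping rule at the end of the loop can be isolated into a small, framework-free sketch. The function name `early_stopping_epoch` and the accuracy values are hypothetical and only for illustration; the logic mirrors the `stale` and `patience` counters used above.

```python
def early_stopping_epoch(valid_accs, patience):
    """Return the 0-based epoch at which training would stop early, or None.

    Mirrors the loop above: `stale` counts consecutive epochs without a new
    best validation accuracy, and training stops once `stale > patience`.
    """
    best_acc = 0.0
    stale = 0
    for epoch, acc in enumerate(valid_accs):
        if acc > best_acc:
            best_acc = acc
            stale = 0
        else:
            stale += 1
            if stale > patience:
                return epoch
    return None


# Hypothetical validation accuracies, one per epoch.
accs = [0.80, 0.85, 0.84, 0.83, 0.82, 0.90]
print(early_stopping_epoch(accs, patience=2))
print(early_stopping_epoch(accs, patience=300))
```

Because the check is `stale > patience`, the loop stops after `patience + 1` epochs without improvement. With a patience larger than the number of epochs, early stopping never triggers.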

In [None]:
model_best = FirstCNN().to(device)
model_best.load_state_dict(torch.load(f"{_exp_name}_best.ckpt"))
model_best.eval()

prediction = []
image_paths = []

## Encodings
# {0: Bags, 1: Bottomwear, 2: Eyewear, 3: Fragrance, 4: Innerwear, 5: Jewellery, 6: Makeup, 7: Others, 8: Sandal, 9: Shoes, 10: Topwear, 11: Wallets, 12: watches}


with torch.no_grad():
    for data,_, img in test_dataloader:
        test_pred = model_best(data.to(device))
        test_label = np.argmax(test_pred.cpu().numpy(), axis=1)
        prediction += test_label.tolist()  # no squeeze(): a final batch of one image would otherwise become a bare int
        image_paths += list(img)


print(prediction)
print(image_paths)
mapping = {0: "Bags", 1: "Bottomwear", 2: "Eyewear", 3: "Fragrance", 4: "Innerwear", 5: "Jewellery", 6: "Makeup", 7: "Others", 8: "Sandal", 9: "Shoes", 10: "Topwear", 11: "Wallets", 12: "watches" }

# now creating a prediction csv
# DataFrame.append was removed in pandas 2.0, so build the frame from the lists in one step
df = pd.DataFrame({
    'image-path': image_paths,
    'encoded-label': prediction,
    'label-decoded': [mapping[p] for p in prediction],
})

df.to_csv("prediction_2.csv",index = False)

## Conclusion: 

This comparison of two CNN image-classification runs suggests that batch size (a hyperparameter) trades training time against accuracy. With a batch size of 64, training took about 8 hours and reached 95% accuracy; with a batch size of 128, training took about 6 hours and reached 93%. Although the dataset contains 40,000 data points, that was not enough to offset the accuracy lost to the larger batch size. Choosing the batch size and training time therefore matters for a CNN image classifier, and the right choice depends on the dataset and the available computational resources.

In my case, I was training on my personal machine, which was also being used for other purposes, so I believe the shorter training time is worth it as long as the accuracy of the model does not drop much.
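
One way to see why batch size matters is to count optimizer updates per epoch. The sketch below takes the 40,000 data points quoted above and assumes the 70/30 train/validation split used elsewhere in this notebook; `steps_per_epoch` is a hypothetical helper, not part of the training code.

```python
import math


def steps_per_epoch(num_samples, batch_size):
    """Number of batches (and therefore optimizer updates) in one pass over the data."""
    return math.ceil(num_samples / batch_size)


total_samples = 40_000                      # figure quoted in the conclusion
train_samples = total_samples * 7 // 10     # assumed 70/30 train/validation split

for batch_size in (64, 128, 256):
    print(batch_size, steps_per_epoch(train_samples, batch_size))
```

Doubling the batch size roughly halves the number of weight updates per epoch, so with the same learning rate and number of epochs the model takes fewer steps overall.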

#### The following script increases the batch size from 128 to 256 to see how the results change

In [None]:
# Imports
import pandas as pd
import imageio
import os
import glob
from collections import Counter
import matplotlib.pyplot as plt
from PIL import Image
import cv2
# setting the seed for reproducibility
myseed = 12345

from google.colab.patches import cv2_imshow

from sklearn.preprocessing import LabelEncoder
import numpy as np

from sklearn.model_selection import train_test_split

import torch

from torchvision.io import read_image
from torch.utils.data import DataLoader, ConcatDataset, Subset, Dataset
from torchvision.datasets import DatasetFolder, VisionDataset
from torchvision.transforms import Resize
import torch.nn as nn
import torchvision.transforms as transforms

from tqdm.auto import tqdm
import random


In [None]:
test_path = '/content/gdrive/MyDrive/Advanced-DataAnalytics-Assignment2/test.csv'

test_df=pd.read_csv(test_path, sep='\t')

train_path = '/content/gdrive/MyDrive/Advanced-DataAnalytics-Assignment2/train.csv'

train_df=pd.read_csv(train_path, sep='\t')

# build the image file name from the image id (vectorized instead of a row-wise apply)
train_df['image-path'] = train_df['imageid'].astype(str) + ".jpg"
test_df['image-path'] = test_df['imageid'].astype(str) + ".jpg"

train_df = train_df.astype({'imageid':'string'})
test_df = test_df.astype({'imageid':'string'})

cat_cols = ['label']

# apply label encoder to the categorical label column (fit once, then reuse the encoded values)
le = LabelEncoder()
train_df['label'] = le.fit_transform(train_df['label'])
labels = train_df['label'].to_numpy()

## Encodings
# {0: Bags, 1: Bottomwear, 2: Eyewear, 3: Fragrance, 4: Innerwear, 5: Jewellery, 6: Makeup, 7: Others, 8: Sandal, 9: Shoes, 10: Topwear, 11: Wallets, 12: watches}

train_df['label'].unique()
unique_labels = np.unique(labels)

train_df, val_df = train_test_split(train_df, shuffle = True, test_size = 0.3, random_state = myseed)  # fixed seed so the split is reproducible

# basic setup for PyTorch
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False
np.random.seed(myseed)
torch.manual_seed(myseed)
if torch.cuda.is_available():
    torch.cuda.manual_seed_all(myseed)

  
train_tfm = transforms.Compose([
    # Resize the image into a fixed shape (height = width = 128)
    transforms.Resize((128, 128)),
    transforms.Lambda(lambda x: x.repeat(3, 1, 1) if x.size(0)==1 else x),
    transforms.ToPILImage(),
    transforms.ToTensor(), # changing the datatype to tensor because pytorch takes input as tensor
])


val_tfm = transforms.Compose([
    # Resize the image into a fixed shape (height = width = 128)
    transforms.Resize((128, 128)),
    transforms.Lambda(lambda x: x.repeat(3, 1, 1) if x.size(0)==1 else x),
    transforms.ToPILImage(),
    transforms.ToTensor(), # changing the datatype to tensor because pytorch takes input as tensor
])

test_tfm = transforms.Compose([
    transforms.Resize((128, 128)),
    transforms.Lambda(lambda x: x.repeat(3, 1, 1) if x.size(0)==1 else x),
    transforms.ToPILImage(),
    transforms.ToTensor(), # changing the datatype to tensor because pytorch takes input as tensor

])

image_path = 'Fashion-Product-Images/images/'

class CustomImageDataset_from_csv(Dataset):
    def __init__(self, dataframe , img_dir ,  transform = None , label_transform = None):
        self.img_labels = dataframe #pd.read_csv(annotations_file)
        self.img_dir = img_dir
        self.transform = transform
        self.label_transform = label_transform
        
    def __len__(self):
        return len(self.img_labels)
    
    def __getitem__(self , idx):
        img_path = os.path.join(self.img_dir , self.img_labels.iloc[idx, 3])  # column 3 is 'image-path'
        image = read_image(img_path)
        label = self.img_labels.iloc[idx, 1]  # column 1 is the encoded label
        if self.transform is not None:
            image = self.transform(image)
        if self.label_transform:
            label = self.label_transform(label)

        return(image, label)

class CustomImageDataset_from_csv_test(Dataset):
    def __init__(self, dataframe , img_dir ,  transform = None , label_transform = None):
        self.img_labels = dataframe #pd.read_csv(annotations_file)
        self.img_dir = img_dir
        self.transform = transform
        self.label_transform = label_transform
        
    def __len__(self):
        return len(self.img_labels)
    
    def __getitem__(self , idx):
        img_path = os.path.join(self.img_dir , self.img_labels.iloc[idx, 3])
        image = read_image(img_path)
        label = self.img_labels.iloc[idx, 1]
        if self.transform is not None:
            image = self.transform(image)
        if self.label_transform:
            label = self.label_transform(label)

        return(image, label, img_path)

class FirstCNN(nn.Module):
    def __init__(self):
        super(FirstCNN, self).__init__()
       
        # input size [3, 128, 128]

        self.cnn = nn.Sequential(
            nn.Conv2d(3, 64, 3, 1, 1),  # [64, 128, 128]
            nn.BatchNorm2d(64),
            nn.ReLU(),
            nn.MaxPool2d(2, 2, 0),      # [64, 64, 64]

            nn.Conv2d(64, 128, 3, 1, 1), # [128, 64, 64]
            nn.BatchNorm2d(128),
            nn.ReLU(),
            nn.MaxPool2d(2, 2, 0),      # [128, 32, 32]

            nn.Conv2d(128, 256, 3, 1, 1), # [256, 32, 32]
            nn.BatchNorm2d(256),
            nn.ReLU(),
            nn.MaxPool2d(2, 2, 0),      # [256, 16, 16]

            nn.Conv2d(256, 512, 3, 1, 1), # [512, 16, 16]
            nn.BatchNorm2d(512),
            nn.ReLU(),
            nn.MaxPool2d(2, 2, 0),       # [512, 8, 8]
            
            nn.Conv2d(512, 512, 3, 1, 1), # [512, 8, 8]
            nn.BatchNorm2d(512),
            nn.ReLU(),
            nn.MaxPool2d(2, 2, 0),       # [512, 4, 4]
        )
        self.fc = nn.Sequential(
            nn.Linear(512*4*4, 1024),
            nn.ReLU(),
            nn.Linear(1024, 512),
            nn.ReLU(),
            nn.Linear(512, 13)
        )

        
    def forward(self, x):
        out = self.cnn(x)
        out = out.view(out.size()[0], -1)
        return self.fc(out)
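
The shape comments in `FirstCNN` follow from the standard output-size formula for convolution and pooling layers, out = floor((in + 2 * padding - kernel) / stride) + 1. The sketch below traces the spatial size through the five blocks; `conv_out` is a hypothetical helper written for this illustration, not a PyTorch function.

```python
def conv_out(size, kernel, stride, padding):
    """Output length along one spatial dimension for a conv or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1


size = 128          # input images are resized to 128 x 128
sizes = []
for _ in range(5):  # five Conv2d(3x3, stride 1, pad 1) + MaxPool2d(2, 2, 0) blocks
    size = conv_out(size, kernel=3, stride=1, padding=1)   # the convolution keeps the size
    size = conv_out(size, kernel=2, stride=2, padding=0)   # the pooling halves it
    sizes.append(size)

flattened = 512 * size * size   # channels of the last block times the spatial area
print(sizes, flattened)
```

The final value is the input width of the first `nn.Linear` layer, which is why it is written as `512*4*4`.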



In [None]:
_exp_name = "sample_3"
batch_size = 256

train_data = CustomImageDataset_from_csv(train_df , image_path , transform = train_tfm)
train_dataloader = DataLoader(train_data, batch_size = batch_size , shuffle = True, num_workers=0, pin_memory=True)

val_data = CustomImageDataset_from_csv(val_df , image_path , transform = val_tfm)
val_dataloader = DataLoader(val_data, batch_size = batch_size , shuffle = False, num_workers=0, pin_memory=True)  # no need to shuffle for validation

test_data = CustomImageDataset_from_csv_test(test_df , image_path , transform = test_tfm)
test_dataloader = DataLoader(test_data, batch_size = batch_size , shuffle = False, num_workers=0, pin_memory=True)

In [None]:
# Selecting which device to use for training. It would be better to use cuda if GPU is available.
device = "cuda" if torch.cuda.is_available() else "cpu"

# Initializing the model, and put it on the device specified.
model = FirstCNN().to(device)

# Initialize optimizer!
optimizer = torch.optim.Adam(model.parameters(), lr=0.0003, weight_decay=1e-5) 

# For the classification task, we use cross-entropy as the measurement of performance.
criterion = nn.CrossEntropyLoss()

# The number of training epochs (hyperparameters) and patience.
n_epochs = 4
patience = 300 # If no improvement in 'patience' epochs, early stop


# Initializing trackers
stale = 0
best_acc = 0

for epoch in range(n_epochs):

    # ---------- Training ----------
    ## Evaluation and training on training dataset.

    model.train()        # model is in train mode before training.

    # These are used to record information in training.
    train_loss = []
    train_accs = []

    for batch in tqdm(train_dataloader):

        # A batch consists of image data and corresponding labels.
        imgs, labels = batch
        imgs = imgs.to(device)  # move images to device
        labels = labels.to(device)  # labels already arrive as a tensor from the DataLoader; just move them to the device

        # Forward the data
        logits = model(imgs)

        # Calculate the cross-entropy loss.
        loss = criterion(logits, labels)

        optimizer.zero_grad()

        # Compute the gradients for parameters.
        loss.backward()

        # Clip the gradient norms for stable training.
        grad_norm = nn.utils.clip_grad_norm_(model.parameters(), max_norm=10)

        # Update the parameters with computed gradients.
        optimizer.step()

        # Compute the accuracy for current batch (labels are already on the device).
        acc = (logits.argmax(dim=-1) == labels).float().mean()

        # Record the loss and accuracy.
        train_loss.append(loss.item())
        train_accs.append(acc)
        
    train_loss = sum(train_loss) / len(train_loss)
    train_acc = sum(train_accs) / len(train_accs)

    # Print the information.
    print(f"[ Train | {epoch + 1:03d}/{n_epochs:03d} ] loss = {train_loss:.5f}, acc = {train_acc:.5f}")

    # ---------- Validation ----------
    ## Evaluation and testing on validation dataset

    model.eval()       # model is in eval mode

    # These are used to record information in validation.
    valid_loss = []
    valid_accs = []

    # Iterate the validation set by batches.
    for batch in tqdm(val_dataloader):

        # A batch consists of image data and corresponding labels.
        imgs, labels = batch

        with torch.no_grad():
            logits = model(imgs.to(device))

        loss = criterion(logits, labels.to(device))

        # Compute the accuracy for current batch.
        acc = (logits.argmax(dim=-1) == labels.to(device)).float().mean()

        # Record the loss and accuracy.
        valid_loss.append(loss.item())
        valid_accs.append(acc)
        #break

    # The average loss and accuracy for entire validation set is the average of the recorded values.
    valid_loss = sum(valid_loss) / len(valid_loss)
    valid_acc = sum(valid_accs) / len(valid_accs)

    # Print the information.
    print(f"[ Valid | {epoch + 1:03d}/{n_epochs:03d} ] loss = {valid_loss:.5f}, acc = {valid_acc:.5f}")

    # update logs: append the validation summary to the log file, marking a new best
    with open(f"./{_exp_name}_log.txt", "a") as log_file:
        best_marker = " -> best" if valid_acc > best_acc else ""
        print(f"[ Valid | {epoch + 1:03d}/{n_epochs:03d} ] loss = {valid_loss:.5f}, acc = {valid_acc:.5f}{best_marker}", file=log_file)


    # save models
    if valid_acc > best_acc:
        print(f"Best model found at epoch {epoch + 1}, saving model")
        torch.save(model.state_dict(), f"{_exp_name}_best.ckpt") # only save best to prevent output memory exceed error
        best_acc = valid_acc
        stale = 0
    else:
        stale += 1
        if stale > patience:
            print(f"No improvement in {patience} consecutive epochs, early stopping")
            break
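
The loop above averages per-batch accuracies, which gives the final, usually smaller, batch the same weight as a full one. A sketch of the difference against a sample-weighted average, using made-up batch sizes and correct counts (both function names are hypothetical):

```python
def mean_of_batch_means(correct_counts, batch_sizes):
    """Average of per-batch accuracies, as computed in the training loop above."""
    per_batch = [c / n for c, n in zip(correct_counts, batch_sizes)]
    return sum(per_batch) / len(per_batch)


def sample_weighted_accuracy(correct_counts, batch_sizes):
    """Fraction of all samples classified correctly, regardless of batching."""
    return sum(correct_counts) / sum(batch_sizes)


# Hypothetical epoch: two full batches of 256 and a short final batch of 8.
batch_sizes = [256, 256, 8]
correct = [240, 240, 4]

print(round(mean_of_batch_means(correct, batch_sizes), 4))
print(round(sample_weighted_accuracy(correct, batch_sizes), 4))
```

The two numbers agree when every batch has the same size; with a short last batch the gap is usually small, but it grows with larger batch sizes because there are fewer batches to average over.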

In [None]:
model_best = FirstCNN().to(device)
model_best.load_state_dict(torch.load(f"{_exp_name}_best.ckpt"))
model_best.eval()

prediction = []
image_paths = []

## Encodings
# {0: Bags, 1: Bottomwear, 2: Eyewear, 3: Fragrance, 4: Innerwear, 5: Jewellery, 6: Makeup, 7: Others, 8: Sandal, 9: Shoes, 10: Topwear, 11: Wallets, 12: watches}


with torch.no_grad():
    for data,_, img in test_dataloader:
        test_pred = model_best(data.to(device))
        test_label = np.argmax(test_pred.cpu().numpy(), axis=1)
        prediction += test_label.tolist()  # no squeeze(): a final batch of one image would otherwise become a bare int
        image_paths += list(img)


mapping = {0: "Bags", 1: "Bottomwear", 2: "Eyewear", 3: "Fragrance", 4: "Innerwear", 5: "Jewellery", 6: "Makeup", 7: "Others", 8: "Sandal", 9: "Shoes", 10: "Topwear", 11: "Wallets", 12: "watches" }

# now creating a prediction csv
# DataFrame.append was removed in pandas 2.0, so build the frame from the lists in one step
df = pd.DataFrame({
    'image-path': image_paths,
    'encoded-label': prediction,
    'label-decoded': [mapping[p] for p in prediction],
})

df.to_csv("prediction_3.csv",index = False)
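
For reference, the same three-column prediction file can also be written without a DataFrame. This is a sketch using Python's standard `csv` module instead of pandas; the image paths and predictions are made-up placeholders, and `write_predictions` is a hypothetical helper.

```python
import csv
import io

# Same label encoding as used in the notebook.
mapping = {0: "Bags", 1: "Bottomwear", 2: "Eyewear", 3: "Fragrance", 4: "Innerwear",
           5: "Jewellery", 6: "Makeup", 7: "Others", 8: "Sandal", 9: "Shoes",
           10: "Topwear", 11: "Wallets", 12: "watches"}


def write_predictions(out, image_paths, prediction, mapping):
    """Write one CSV row per image: path, encoded label, decoded label."""
    writer = csv.DictWriter(out, fieldnames=["image-path", "encoded-label", "label-decoded"])
    writer.writeheader()
    for path, label in zip(image_paths, prediction):
        writer.writerow({"image-path": path, "encoded-label": label, "label-decoded": mapping[label]})


# Placeholder inputs; in the notebook these come from the test loop.
buffer = io.StringIO()
write_predictions(buffer, ["images/1.jpg", "images/2.jpg"], [0, 9], mapping)
print(buffer.getvalue())
```

To write to disk instead of an in-memory buffer, pass a file opened with `open(path, "w", newline="")`.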

## Conclusion:

We repeated the experiment with the batch size increased by a factor of 4, from 64 to 256. The prediction accuracy with this batch size was again close to 93%, but training time dropped significantly, from 8-10 hours to approximately 4 hours. This reduction matters because it leaves room to adjust hyperparameters and further improve the model accuracy without consuming too much machine time and power.

In general, a larger batch size can lead to faster model development and better efficiency, but this must be balanced against the risk of overfitting and lower accuracy. In our case the drop in accuracy is small, so we can consider keeping the batch size of 256 for further processing and training if needed.

## Data augmentation, i.e., generating more images for training.

We will generate new data-points from the original ones by flipping the images horizontally and by converting them to grayscale.

In [None]:
# Imports
import pandas as pd
import imageio
import os
import glob
from collections import Counter
import matplotlib.pyplot as plt
from PIL import Image
import cv2
# setting the seed for reproducibility
myseed = 12345

from google.colab.patches import cv2_imshow

from sklearn.preprocessing import LabelEncoder
import numpy as np

from sklearn.model_selection import train_test_split

import torch

from torchvision.io import read_image
from torch.utils.data import DataLoader, ConcatDataset, Subset, Dataset
from torchvision.datasets import DatasetFolder, VisionDataset
from torchvision.transforms import Resize
import torch.nn as nn
import torchvision.transforms as transforms

from tqdm.auto import tqdm
import random


In [None]:
test_path = '/content/gdrive/MyDrive/Advanced-DataAnalytics-Assignment2/test.csv'

test_df=pd.read_csv(test_path, sep='\t')

train_path = '/content/gdrive/MyDrive/Advanced-DataAnalytics-Assignment2/train.csv'

train_df=pd.read_csv(train_path, sep='\t')

# build the image file name from the image id (vectorized instead of a row-wise apply)
train_df['image-path'] = train_df['imageid'].astype(str) + ".jpg"
test_df['image-path'] = test_df['imageid'].astype(str) + ".jpg"

train_df = train_df.astype({'imageid':'string'})
test_df = test_df.astype({'imageid':'string'})

cat_cols = ['label']

# apply label encoder to the categorical label column (fit once, then reuse the encoded values)
le = LabelEncoder()
train_df['label'] = le.fit_transform(train_df['label'])
labels = train_df['label'].to_numpy()

## Encodings
# {0: Bags, 1: Bottomwear, 2: Eyewear, 3: Fragrance, 4: Innerwear, 5: Jewellery, 6: Makeup, 7: Others, 8: Sandal, 9: Shoes, 10: Topwear, 11: Wallets, 12: watches}

train_df['label'].unique()
unique_labels = np.unique(labels)


Here we apply image augmentation, a technique commonly used in deep learning to artificially increase the amount of data available for training a model. Applying transformations such as flipping, rotating, scaling, or changing the color scheme to existing images produces new images that are similar but not identical to the originals. This can help the model learn more robust features and generalize better to unseen data.

Two popular augmentations for image classification tasks are horizontal flip and grayscale conversion. A horizontal flip mirrors the image about its vertical axis, so the result shows the same object facing the other way. This can help the model recognize objects regardless of their left-right orientation in the input image. Grayscale conversion turns the original RGB image into a single-channel image, removing color information while preserving the overall shape and texture of the object. This can help the model focus on features such as edges and contours rather than on color variations.

So, here we use horizontal flip and grayscale conversion to increase the number of data-points.
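
To make the two operations concrete, here is a dependency-free sketch on a tiny 2x2 RGB image stored as nested lists. The grayscale weights are the ITU-R 601 luma coefficients (0.299, 0.587, 0.114), which is an assumption about how a typical imaging library does the conversion; the notebook itself relies on torchvision's transforms, and both function names here are hypothetical.

```python
def horizontal_flip(image):
    """Mirror an image (a list of rows of pixels) about its vertical axis."""
    return [row[::-1] for row in image]


def to_grayscale(image):
    """Convert RGB pixels to single luma values using ITU-R 601 weights."""
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
            for row in image]


# A 2x2 image: red, green on the top row; blue, white on the bottom row.
image = [
    [(255, 0, 0), (0, 255, 0)],
    [(0, 0, 255), (255, 255, 255)],
]

print(horizontal_flip(image))
print(to_grayscale(image))
```

Flipping twice returns the original image, and the grayscale version keeps the spatial layout while collapsing the three channels into one value per pixel.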

In [None]:
from PIL import Image
# selecting random images from the train.csv

img_dir = 'Fashion-Product-Images/images/'
counter = 60001  # new image ids start here; this assumes no existing image id is 60001 or higher


def augment_and_save(sample_df, transform, counter):
    """Apply `transform` to every image in `sample_df`, save the results as new files, and return the new rows."""
    new_rows = []
    for _, row in sample_df.iterrows():
        img = Image.open(img_dir + row['image-path'])
        new_name = str(counter) + '.jpg'
        transform(img).save(img_dir + new_name)
        new_rows.append({'imageid' : str(counter), 'label' : row['label'], 'productname' : row['productname'], 'image-path' : new_name})
        counter += 1
    return pd.DataFrame(new_rows), counter


# taking 30 percent of random images from the train.csv to horizontally flip
random_1 = train_df.sample(frac = 0.3, random_state = myseed)
flipped_df, counter = augment_and_save(random_1, transforms.RandomHorizontalFlip(p = 1), counter)
# DataFrame.append was removed in pandas 2.0, so the new rows are added with pd.concat
train_df = pd.concat([train_df, flipped_df], ignore_index = True)

# taking 30 percent of random images (originals plus the flipped copies added above) to do grayscale
random_2 = train_df.sample(frac = 0.3, random_state = myseed)
gray_df, counter = augment_and_save(random_2, transforms.Grayscale(), counter)
train_df = pd.concat([train_df, gray_df], ignore_index = True)



In [None]:
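# Note: this split runs after augmentation, so a flipped or grayscale copy of a training image
# can land in the validation set, which may make the validation accuracy look better than it is.
train_df, val_df = train_test_split(train_df, shuffle = True, test_size = 0.3, random_state = myseed)  # fixed seed so the split is reproducible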

# basic setup for PyTorch
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False
np.random.seed(myseed)
torch.manual_seed(myseed)
if torch.cuda.is_available():
    torch.cuda.manual_seed_all(myseed)

  
train_tfm = transforms.Compose([
    # Resize the image into a fixed shape (height = width = 128)
    transforms.Resize((128, 128)),
    transforms.Lambda(lambda x: x.repeat(3, 1, 1) if x.size(0)==1 else x),
    transforms.ToPILImage(),
    transforms.ToTensor(), # changing the datatype to tensor because pytorch takes input as tensor
])


val_tfm = transforms.Compose([
    # Resize the image into a fixed shape (height = width = 128)
    transforms.Resize((128, 128)),
    transforms.Lambda(lambda x: x.repeat(3, 1, 1) if x.size(0)==1 else x),
    transforms.ToPILImage(),
    transforms.ToTensor(), # changing the datatype to tensor because pytorch takes input as tensor
])

test_tfm = transforms.Compose([
    transforms.Resize((128, 128)),
    transforms.Lambda(lambda x: x.repeat(3, 1, 1) if x.size(0)==1 else x),
    transforms.ToPILImage(),
    transforms.ToTensor(), # changing the datatype to tensor because pytorch takes input as tensor

])

image_path = 'Fashion-Product-Images/images/'

class CustomImageDataset_from_csv(Dataset):
    def __init__(self, dataframe , img_dir ,  transform = None , label_transform = None):
        self.img_labels = dataframe #pd.read_csv(annotations_file)
        self.img_dir = img_dir
        self.transform = transform
        self.label_transform = label_transform
        
    def __len__(self):
        return len(self.img_labels)
    
    def __getitem__(self , idx):
        img_path = os.path.join(self.img_dir , self.img_labels.iloc[idx, 3])  # column 3 is 'image-path'
        image = read_image(img_path)
        label = self.img_labels.iloc[idx, 1]  # column 1 is the encoded label
        if self.transform is not None:
            image = self.transform(image)
        if self.label_transform:
            label = self.label_transform(label)

        return(image, label)

class CustomImageDataset_from_csv_test(Dataset):
    def __init__(self, dataframe , img_dir ,  transform = None , label_transform = None):
        self.img_labels = dataframe #pd.read_csv(annotations_file)
        self.img_dir = img_dir
        self.transform = transform
        self.label_transform = label_transform
        
    def __len__(self):
        return len(self.img_labels)
    
    def __getitem__(self , idx):
        img_path = os.path.join(self.img_dir , self.img_labels.iloc[idx, 3])
        image = read_image(img_path)
        label = self.img_labels.iloc[idx, 1]
        if self.transform is not None:
            image = self.transform(image)
        if self.label_transform:
            label = self.label_transform(label)

        return(image, label, img_path)

class FirstCNN(nn.Module):
    def __init__(self):
        super(FirstCNN, self).__init__()
       
        # input size [3, 128, 128]

        self.cnn = nn.Sequential(
            nn.Conv2d(3, 64, 3, 1, 1),  # [64, 128, 128]
            nn.BatchNorm2d(64),
            nn.ReLU(),
            nn.MaxPool2d(2, 2, 0),      # [64, 64, 64]

            nn.Conv2d(64, 128, 3, 1, 1), # [128, 64, 64]
            nn.BatchNorm2d(128),
            nn.ReLU(),
            nn.MaxPool2d(2, 2, 0),      # [128, 32, 32]

            nn.Conv2d(128, 256, 3, 1, 1), # [256, 32, 32]
            nn.BatchNorm2d(256),
            nn.ReLU(),
            nn.MaxPool2d(2, 2, 0),      # [256, 16, 16]

            nn.Conv2d(256, 512, 3, 1, 1), # [512, 16, 16]
            nn.BatchNorm2d(512),
            nn.ReLU(),
            nn.MaxPool2d(2, 2, 0),       # [512, 8, 8]
            
            nn.Conv2d(512, 512, 3, 1, 1), # [512, 8, 8]
            nn.BatchNorm2d(512),
            nn.ReLU(),
            nn.MaxPool2d(2, 2, 0),       # [512, 4, 4]
        )
        self.fc = nn.Sequential(
            nn.Linear(512*4*4, 1024),
            nn.ReLU(),
            nn.Linear(1024, 512),
            nn.ReLU(),
            nn.Linear(512, 13)
        )

        
    def forward(self, x):
        out = self.cnn(x)
        out = out.view(out.size()[0], -1)
        return self.fc(out)

In [None]:
_exp_name = "sample_4"
batch_size = 256

train_data = CustomImageDataset_from_csv(train_df , image_path , transform = train_tfm)
train_dataloader = DataLoader(train_data, batch_size = batch_size , shuffle = True, num_workers=0, pin_memory=True)

val_data = CustomImageDataset_from_csv(val_df , image_path , transform = val_tfm)
val_dataloader = DataLoader(val_data, batch_size = batch_size , shuffle = False, num_workers=0, pin_memory=True)  # no need to shuffle for validation

test_data = CustomImageDataset_from_csv_test(test_df , image_path , transform = test_tfm)
test_dataloader = DataLoader(test_data, batch_size = batch_size , shuffle = False, num_workers=0, pin_memory=True)

In [None]:
# Selecting which device to use for training. It would be better to use cuda if GPU is available.
device = "cuda" if torch.cuda.is_available() else "cpu"

# Initializing the model, and put it on the device specified.
model = FirstCNN().to(device)

# Initialize optimizer!
optimizer = torch.optim.Adam(model.parameters(), lr=0.0003, weight_decay=1e-5) 

# For the classification task, we use cross-entropy as the measurement of performance.
criterion = nn.CrossEntropyLoss()

# The number of training epochs (hyperparameters) and patience.
n_epochs = 4
patience = 300 # If no improvement in 'patience' epochs, early stop


# Initializing trackers
stale = 0
best_acc = 0

for epoch in range(n_epochs):

    # ---------- Training ----------
    ## Evaluation and training on training dataset.

    model.train()        # model is in train mode before training.

    # These are used to record information in training.
    train_loss = []
    train_accs = []

    for batch in tqdm(train_dataloader):

        # A batch consists of image data and corresponding labels.
        imgs, labels = batch
        imgs = imgs.to(device)  # move images to device
        labels = labels.to(device)  # labels already arrive as a tensor from the DataLoader; just move them to the device

        # Forward the data
        logits = model(imgs)

        # Calculate the cross-entropy loss.
        loss = criterion(logits, labels)

        optimizer.zero_grad()

        # Compute the gradients for parameters.
        loss.backward()

        # Clip the gradient norms for stable training.
        grad_norm = nn.utils.clip_grad_norm_(model.parameters(), max_norm=10)

        # Update the parameters with computed gradients.
        optimizer.step()

        # Compute the accuracy for current batch (labels are already on the device).
        acc = (logits.argmax(dim=-1) == labels).float().mean()

        # Record the loss and accuracy.
        train_loss.append(loss.item())
        train_accs.append(acc)
        
    train_loss = sum(train_loss) / len(train_loss)
    train_acc = sum(train_accs) / len(train_accs)

    # Print the information.
    print(f"[ Train | {epoch + 1:03d}/{n_epochs:03d} ] loss = {train_loss:.5f}, acc = {train_acc:.5f}")

    # ---------- Validation ----------
    ## Evaluation and testing on validation dataset

    model.eval()       # model is in eval mode

    # These are used to record information in validation.
    valid_loss = []
    valid_accs = []

    # Iterate the validation set by batches.
    for batch in tqdm(val_dataloader):

        # A batch consists of image data and corresponding labels.
        imgs, labels = batch

        with torch.no_grad():
            logits = model(imgs.to(device))

        loss = criterion(logits, labels.to(device))

        # Compute the accuracy for current batch.
        acc = (logits.argmax(dim=-1) == labels.to(device)).float().mean()

        # Record the loss and accuracy.
        valid_loss.append(loss.item())
        valid_accs.append(acc)
        #break

    # The average loss and accuracy for entire validation set is the average of the recorded values.
    valid_loss = sum(valid_loss) / len(valid_loss)
    valid_acc = sum(valid_accs) / len(valid_accs)

    # Print the information.
    print(f"[ Valid | {epoch + 1:03d}/{n_epochs:03d} ] loss = {valid_loss:.5f}, acc = {valid_acc:.5f}")

    # update logs: append the validation summary to the log file, marking a new best
    with open(f"./{_exp_name}_log.txt", "a") as log_file:
        best_marker = " -> best" if valid_acc > best_acc else ""
        print(f"[ Valid | {epoch + 1:03d}/{n_epochs:03d} ] loss = {valid_loss:.5f}, acc = {valid_acc:.5f}{best_marker}", file=log_file)


    # save models
    if valid_acc > best_acc:
        print(f"Best model found at epoch {epoch + 1}, saving model")
        torch.save(model.state_dict(), f"{_exp_name}_best.ckpt") # only save best to prevent output memory exceed error
        best_acc = valid_acc
        stale = 0
    else:
        stale += 1
        if stale > patience:
            print(f"No improvement in {patience} consecutive epochs, early stopping")
            break
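
The `clip_grad_norm_` call in the training loop rescales all gradients together when their combined L2 norm exceeds `max_norm`. The plain-Python sketch below illustrates that idea on a flat list of numbers; it is a simplified illustration rather than PyTorch's implementation, and `clip_by_global_norm` is a hypothetical name.

```python
import math


def clip_by_global_norm(grads, max_norm, eps=1e-6):
    """Scale `grads` so their L2 norm is at most `max_norm`; return (clipped, original_norm)."""
    total_norm = math.sqrt(sum(g * g for g in grads))
    # The scale never exceeds 1, so small gradients are left unchanged.
    scale = min(1.0, max_norm / (total_norm + eps))
    return [g * scale for g in grads], total_norm


# A made-up gradient vector with norm 50, clipped to norm 10.
clipped, norm = clip_by_global_norm([30.0, 40.0], max_norm=10)
print(norm, clipped)

# A gradient vector that is already small enough passes through untouched.
unchanged, small_norm = clip_by_global_norm([3.0, 4.0], max_norm=10)
print(small_norm, unchanged)
```

Clipping keeps the direction of the update and only shortens it, which is why it helps with occasional very large gradients without changing ordinary steps.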

In [None]:
model_best = FirstCNN().to(device)
model_best.load_state_dict(torch.load(f"{_exp_name}_best.ckpt"))
model_best.eval()

prediction = []
image_paths = []

## Encodings
# {0: Bags, 1: Bottomwear, 2: Eyewear, 3: Fragrance, 4: Innerwear, 5: Jewellery, 6: Makeup, 7: Others, 8: Sandal, 9: Shoes, 10: Topwear, 11: Wallets, 12: watches}


with torch.no_grad():
    for data,_, img in test_dataloader:
        test_pred = model_best(data.to(device))
        test_label = np.argmax(test_pred.cpu().numpy(), axis=1)
        prediction += test_label.tolist()  # no squeeze(): a final batch of one image would otherwise become a bare int
        image_paths += list(img)


print(prediction)
print(image_paths)
mapping = {0: "Bags", 1: "Bottomwear", 2: "Eyewear", 3: "Fragrance", 4: "Innerwear", 5: "Jewellery", 6: "Makeup", 7: "Others", 8: "Sandal", 9: "Shoes", 10: "Topwear", 11: "Wallets", 12: "watches" }

# now creating a prediction csv
# DataFrame.append was removed in pandas 2.0, so build the frame from the lists in one step
df = pd.DataFrame({
    'image-path': image_paths,
    'encoded-label': prediction,
    'label-decoded': [mapping[p] for p in prediction],
})

df.to_csv("prediction_4.csv",index = False)

## Conclusion:

In conclusion, image augmentation with horizontal flip and grayscale conversion was an effective way to increase the amount of data available for training our deep learning model. The generated images are similar but not identical to the originals, which gives the model a more diverse set of examples to learn from and should make it more robust to variations in the input images and less prone to overfitting the training dataset.

Horizontal flip creates a mirror image of the input by flipping it about the vertical axis. This helps the model learn object features that do not depend on left-right orientation. Grayscale conversion, on the other hand, removes color information from the input image, which pushes the model toward shape features such as edges and contours.

Lastly, our model performed better than the previous models, which were trained without image augmentation. The most likely reason is that the additional training data gives the model more examples to learn from.
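
One thing worth double-checking in this setup is the order of augmentation and splitting: the cells above augment first and split afterwards, so an augmented copy of a training image can end up in the validation set. A minimal sketch of the alternative ordering, with made-up image ids; `split_then_augment` and the `_aug` naming are hypothetical stand-ins for the flip and grayscale step.

```python
import random


def split_then_augment(image_ids, val_fraction, augment_fraction, seed):
    """Split ids into train/validation first, then pick augmentation sources from train only."""
    rng = random.Random(seed)
    ids = list(image_ids)
    rng.shuffle(ids)

    n_val = round(len(ids) * val_fraction)
    val_ids, train_ids = ids[:n_val], ids[n_val:]

    # Augmented images are derived only from training images.
    n_aug = round(len(train_ids) * augment_fraction)
    sources = rng.sample(train_ids, n_aug)
    augmented = [(f"{src}_aug", src) for src in sources]
    return train_ids, val_ids, augmented


train_ids, val_ids, augmented = split_then_augment(range(100), 0.3, 0.3, seed=12345)
print(len(train_ids), len(val_ids), len(augmented))
```

With this ordering every augmented image traces back to a training image, so the validation set contains only images the model has never seen in any form.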

## Thank You! Keep Smiling and Stay Happy