In this notebook we will create an image classifier that identifies playing cards.
We will tackle this problem in 3 parts:

1. PyTorch Dataset
2. PyTorch Model
3. PyTorch Training Loop

Almost every PyTorch training pipeline follows this pattern.

In [None]:
! pip install pandas

In [None]:
##Importing essential libraries
import torch
import torch.nn as nn
import torch.optim as optim #for optimizer
from torch.utils.data import Dataset, DataLoader #for data preprocessing and loading
import torchvision
import torchvision.transforms as transforms
from torchvision.datasets import ImageFolder #for loading data
import timm

import matplotlib.pyplot as plt # For data viz
import pandas as pd
import numpy as np
import sys
from tqdm.notebook import tqdm

Step 1. PyTorch Dataset (and DataLoader)

To train any model in PyTorch, we first need to set up our dataset.
Datasets are important:
    A dataset is an organized way to structure how the data and labels are loaded into the model.
    We can then wrap the dataset in a DataLoader, and PyTorch will handle batching and shuffling the data for us when training the model!
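To make the dataset contract concrete, here is a minimal sketch that swaps the card images for random tensors. `ToyCardDataset`, its sizes, and the seed are assumptions made for illustration only; the point is that any object with `__len__` and `__getitem__` returning an (image, label) pair behaves like a dataset.

```python
import torch
from torch.utils.data import Dataset


class ToyCardDataset(Dataset):
    """Hypothetical stand-in: random tensors instead of card images on disk."""

    def __init__(self, num_samples=10, num_classes=3):
        generator = torch.Generator().manual_seed(0)
        # Fake 8x8 RGB "images" and one integer label per image.
        self.images = torch.rand(num_samples, 3, 8, 8, generator=generator)
        self.labels = torch.randint(0, num_classes, (num_samples,), generator=generator)

    def __len__(self):
        # Total number of samples in the dataset.
        return len(self.labels)

    def __getitem__(self, idx):
        # One (image, label) pair, the same shape of result ImageFolder returns.
        return self.images[idx], self.labels[idx]


toy_dataset = ToyCardDataset()
toy_image, toy_label = toy_dataset[0]
print(len(toy_dataset))   # 10
print(toy_image.shape)    # torch.Size([3, 8, 8])
```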

In [None]:
# Define a dataset class for our playing cards
class PlayingCardDataset(Dataset):  # inherits from PyTorch's Dataset class
    def __init__(self, data_dir, transform=None):  # data_dir is the root folder of the dataset
        self.data = ImageFolder(data_dir, transform=transform)  # ImageFolder (imported above) reads labels from the subfolder names
    
    def __len__(self):  # total number of samples in the dataset
        return len(self.data)
    
    def __getitem__(self, idx):  # returns the (image, label) pair at index idx
        return self.data[idx]
    
    @property
    def classes(self):  # convenience property listing the class names in the dataset
        return self.data.classes

Create a Dataset to Try It Out

In [None]:
dataset=PlayingCardDataset(data_dir='D:/education/pyhton/projects/datasets/Cards_Image_Dataset_Classification/train')

In [None]:
#check if dataset is properly loaded 
len(dataset)

In [None]:
#checking by loading a sample from dataset
image, label = dataset[5000]
print(label)
image

In [None]:
# Get a dictionary associating target values with folder names
data_dir = 'D:/education/pyhton/projects/datasets/Cards_Image_Dataset_Classification/train'
target_to_class = {v: k for k, v in ImageFolder(data_dir).class_to_idx.items()}
print(target_to_class)

In [None]:
# Apply basic preprocessing to the loaded dataset: resize each image and convert it to a tensor
transform = transforms.Compose([
    transforms.Resize((128, 128)),
    transforms.ToTensor(),
])

# data_dir was defined in the previous cell
dataset = PlayingCardDataset(data_dir, transform)

In [None]:
#checking the applied transformation
image, label = dataset[100]
image.shape

In [None]:
# iterate over dataset
for image, label in dataset:
    break

DataLoader

    1. A DataLoader batches our dataset automatically.
    2. Processing the data in batches lets samples be handled in parallel, which makes training faster.
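A small sketch of batching and shuffling, using ten made-up samples whose label is simply their index so the ordering is easy to read. `TensorDataset` and the toy data are stand-ins for the card dataset, chosen for illustration.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Ten fake samples; each label equals the sample's index.
toy_features = torch.arange(10).float().unsqueeze(1)
toy_targets = torch.arange(10)
toy_pairs = TensorDataset(toy_features, toy_targets)

ordered_loader = DataLoader(toy_pairs, batch_size=4, shuffle=False)
shuffled_loader = DataLoader(toy_pairs, batch_size=4, shuffle=True)

ordered_batches = [batch_labels.tolist() for _, batch_labels in ordered_loader]
shuffled_labels = [
    label for _, batch_labels in shuffled_loader for label in batch_labels.tolist()
]

print(ordered_batches)          # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
print(sorted(shuffled_labels))  # every sample still appears exactly once
```

Note that the final batch holds only the two leftover samples; shuffling changes the order but not which samples are seen in an epoch.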

In [None]:
# Load the dataset into a DataLoader
dataloader = DataLoader(dataset, batch_size=32, shuffle=True)

In [None]:
for images, labels in dataloader:
    break

In [None]:
images.shape, labels.shape

In [None]:
# Because we set shuffle=True above, the labels in a batch appear in random order.
labels

Step 2. PyTorch Model

PyTorch datasets give us a structured way of organizing data, and PyTorch models follow a similar paradigm.

We could create the model from scratch by defining each layer.
However, for tasks like image classification, many state-of-the-art architectures are readily available and we can import them from packages like timm.
Understanding a PyTorch model is largely about understanding the shape of the data at each layer, and the main layer we need to modify for a new task is the final one. Here we have 53 targets, so we will replace the last layer with one that outputs 53 values.
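The idea of tracking shapes and attaching a new final layer can be sketched without downloading pretrained weights. `TinyClassifier` and its 16-value feature size are hypothetical stand-ins for a real backbone, used here only to show how the shapes flow.

```python
import torch
import torch.nn as nn


class TinyClassifier(nn.Module):
    """Hypothetical stand-in for a pretrained backbone plus a new head."""

    def __init__(self, feature_size=16, num_classes=53):
        super().__init__()
        # Stand-in "backbone": conv features pooled to one vector per image.
        self.features = nn.Sequential(
            nn.Conv2d(3, feature_size, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),               # -> [batch, feature_size, 1, 1]
        )
        # Task-specific head: flatten, then map features to class scores.
        self.classifier = nn.Sequential(
            nn.Flatten(),                          # -> [batch, feature_size]
            nn.Linear(feature_size, num_classes),  # -> [batch, num_classes]
        )

    def forward(self, x):
        return self.classifier(self.features(x))


tiny_model = TinyClassifier()
fake_batch = torch.rand(4, 3, 128, 128)
feature_shape = tuple(tiny_model.features(fake_batch).shape)
output_shape = tuple(tiny_model(fake_batch).shape)
print(feature_shape)  # (4, 16, 1, 1)
print(output_shape)   # (4, 53)
```

Only the input size of the final `nn.Linear` depends on the backbone; its output size is set by the number of classes in the task.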

In [None]:
class CardClassifier(nn.Module):
    def __init__(self, num_classes=53):
        super(CardClassifier, self).__init__()  # initialize the parent class nn.Module
        # Define all parts of the model here: the base model, its layers, and any changes to them
        self.base_model = timm.create_model('efficientnet_b0', pretrained=True)  # EfficientNet-B0 pretrained on ImageNet
        # Drop the base model's final classification layer; it was built for a different set of classes than our 53
        self.features = nn.Sequential(*list(self.base_model.children())[:-1])

        # Size of the feature vector produced by EfficientNet-B0
        enet_out_size = 1280
        # Make a classifier for our task
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(enet_out_size, num_classes)
        )  # this new head takes the place of the removed last layer

    def forward(self, x):
        # Connect the parts defined above: extract features, then classify them
        x = self.features(x)
        output = self.classifier(x)
        return output

In [None]:
# Instantiate the model for our 53-class task
model = CardClassifier(num_classes=53)  # creates an object of the class defined above
print(str(model)[:500])

In [None]:
##check for correctness of model created
example_out = model(images)
example_out.shape # [batch_size, num_classes]

Step 3. The training loop

Now that we understand the general paradigm of PyTorch datasets and models, we need to create the process for training this model.
Some things to consider:
    
    We want to validate our model on data it has not been trained on, so we usually split our data into training and validation datasets. This is easy because we can just create two datasets using our existing class.

Terms:
   
    Epoch: One run through the entire training dataset.
    Step: One batch of data as defined in our dataloader
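The relationship between epochs and steps is simple arithmetic, sketched below. The sample count of 1000 is a made-up number, not the size of the cards dataset, and `steps_per_epoch` is a hypothetical helper.

```python
import math


def steps_per_epoch(num_samples, batch_size, drop_last=False):
    """Number of batches a DataLoader yields in one pass over the data."""
    if drop_last:
        return num_samples // batch_size        # incomplete final batch is skipped
    return math.ceil(num_samples / batch_size)  # final batch may be smaller


# Hypothetical numbers chosen for illustration.
demo_samples, demo_batch_size, demo_epochs = 1000, 32, 5
demo_steps = steps_per_epoch(demo_samples, demo_batch_size)
demo_total_steps = demo_steps * demo_epochs
print(demo_steps, demo_total_steps)  # 32 160
```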

This is a loop you will become familiar with when training models: you load data into the model in batches, then calculate the loss and perform backpropagation. There are packages that handle this for you, but it's good to have written it at least once to understand how it works.
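A single step of that loop can be sketched on its own. The tiny linear model and random batch below are assumptions standing in for the card classifier and real images.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)

# Hypothetical stand-ins: a tiny linear model and one random batch.
toy_model = nn.Linear(8, 3)
toy_inputs = torch.rand(16, 8)
toy_targets = torch.randint(0, 3, (16,))

toy_criterion = nn.CrossEntropyLoss()
toy_optimizer = optim.Adam(toy_model.parameters(), lr=0.001)

weights_before = toy_model.weight.detach().clone()

toy_optimizer.zero_grad()                          # clear gradients from the previous step
toy_outputs = toy_model(toy_inputs)                # forward pass
toy_loss = toy_criterion(toy_outputs, toy_targets) # compare predictions with the labels
toy_loss.backward()                                # backpropagation: compute gradients
toy_optimizer.step()                               # update the weights using those gradients

weights_changed = not torch.equal(weights_before, toy_model.weight.detach())
print(toy_loss.item(), weights_changed)
```

Training repeats these five calls for every batch, once per step, for as many epochs as requested.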

Two things to select:

    Optimizer: Adam is a good place to start for most tasks.
    Loss function: what the model will optimize for.
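For classification, a common loss is cross-entropy: the negative log of the probability the model assigns to the correct class, averaged over the batch. The sketch below checks that against `nn.CrossEntropyLoss` on made-up scores; the numbers are illustrative only.

```python
import torch
import torch.nn as nn

# Made-up raw scores (logits) for 2 samples and 3 classes, with their true labels.
demo_logits = torch.tensor([[2.0, 0.5, -1.0],
                            [0.1, 0.2, 3.0]])
demo_labels = torch.tensor([0, 2])

builtin_loss = nn.CrossEntropyLoss()(demo_logits, demo_labels)

# Manual version: log-softmax, pick the entry for the true class, negate, average.
log_probs = torch.log_softmax(demo_logits, dim=1)
manual_loss = -log_probs[torch.arange(len(demo_labels)), demo_labels].mean()

print(builtin_loss.item(), manual_loss.item())  # the two values agree
```

Because the softmax is applied inside the loss, the model should output raw scores rather than probabilities.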


In [None]:
# Loss function
criterion = nn.CrossEntropyLoss()
# Optimizer
optimizer = optim.Adam(model.parameters(), lr=0.001)

In [None]:
# Check that the loss function works
loss = criterion(example_out, labels)  # loss between the predictions and the ground-truth labels
print(loss.item())
print(example_out.shape, labels.shape)

Setup Datasets

In [None]:
transform = transforms.Compose([
    transforms.Resize((128, 128)),
    transforms.ToTensor(),
])

train_folder = 'D:/education/pyhton/projects/datasets/Cards_Image_Dataset_Classification/train/'
valid_folder = 'D:/education/pyhton/projects/datasets/Cards_Image_Dataset_Classification/valid/'
test_folder = 'D:/education/pyhton/projects/datasets/Cards_Image_Dataset_Classification/test/'

train_dataset = PlayingCardDataset(train_folder, transform=transform)
val_dataset = PlayingCardDataset(valid_folder, transform=transform)
test_dataset = PlayingCardDataset(test_folder, transform=transform)

train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)
val_loader = DataLoader(val_dataset, batch_size=32, shuffle=False)
test_loader = DataLoader(test_dataset, batch_size=32, shuffle=False)

Simple Training Loop

In [None]:
torch.cuda.is_available()

In [None]:
num_epochs = 5 ##defining the number of epochs
train_losses, val_losses = [], [] ##empty lists for storing the losses

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")  # train on the GPU when one is available, for speed
# model definition
model = CardClassifier(num_classes=53)
model.to(device)  # move the model's parameters to the chosen device
#loss and optimizer definition
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)
#training loop
for epoch in range(num_epochs):
    # Training phase
    model.train()
    running_loss = 0.0
    #for images, labels in tqdm(train_loader, desc='Training loop'): ##added a progress bar using tqdm module
    for images, labels in train_loader:    
        # Move inputs and labels to the device
        images, labels = images.to(device), labels.to(device) ##input and output are loaded onto GPU
        
        optimizer.zero_grad() 
        outputs = model(images)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        running_loss += loss.item() * labels.size(0)
    train_loss = running_loss / len(train_loader.dataset)
    train_losses.append(train_loss)
    
    # Validation phase
    model.eval()
    running_loss = 0.0
    with torch.no_grad():
        #for images, labels in tqdm(val_loader, desc='Validation loop'):
        for images, labels in val_loader:    
            # Move inputs and labels to the device
            images, labels = images.to(device), labels.to(device)
         
            outputs = model(images)
            loss = criterion(outputs, labels)
            running_loss += loss.item() * labels.size(0)
    val_loss = running_loss / len(val_loader.dataset)
    val_losses.append(val_loss)
    print(f"Epoch {epoch+1}/{num_epochs} - Train loss: {train_loss}, Validation loss: {val_loss}")
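The loop above multiplies each batch's loss by `labels.size(0)` before summing. A short sketch of why: the criterion returns a per-batch mean, and the last batch is often smaller, so a plain average of batch means would overweight it. The losses and batch sizes below are made-up numbers.

```python
# Hypothetical per-batch mean losses and batch sizes (the last batch is smaller).
batch_mean_losses = [0.50, 0.70, 2.00]
batch_sizes = [32, 32, 4]

# Naive: treat every batch equally, regardless of how many samples it holds.
naive_average = sum(batch_mean_losses) / len(batch_mean_losses)

# Weighted: undo each batch mean, then divide by the total sample count.
running_total = sum(loss * size for loss, size in zip(batch_mean_losses, batch_sizes))
weighted_average = running_total / sum(batch_sizes)

print(round(naive_average, 4))     # 1.0667
print(round(weighted_average, 4))  # 0.6824
```

The weighted value is the true mean loss per sample, which is what dividing by `len(train_loader.dataset)` produces in the loop.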

Visualize Losses

We can plot the training and validation losses that were recorded at the end of each epoch during training. We see that our accuracy on the validation dataset is x

In [None]:
plt.plot(train_losses, label='Training loss')
plt.plot(val_losses, label='Validation loss')
plt.legend()
plt.title("Loss over epochs")
plt.show()
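The training loop as written records loss only, not accuracy. One way accuracy could be measured is sketched below; `count_correct` is a hypothetical helper and the scores are made-up, so this is an illustration rather than part of the original pipeline.

```python
import torch


def count_correct(outputs, labels):
    """Return (number of correct predictions, number of samples) for one batch."""
    predictions = outputs.argmax(dim=1)  # index of the highest score per sample
    correct = (predictions == labels).sum().item()
    return correct, labels.size(0)


# Made-up scores for 4 samples and 3 classes, with their true labels.
example_outputs = torch.tensor([[2.0, 0.1, 0.3],
                                [0.2, 1.5, 0.1],
                                [0.9, 0.8, 0.7],
                                [0.1, 0.2, 3.0]])
example_labels = torch.tensor([0, 1, 2, 2])

num_correct, num_total = count_correct(example_outputs, example_labels)
example_accuracy = num_correct / num_total
print(example_accuracy)  # 0.75
```

In a validation loop, the two counts would be summed over all batches inside `torch.no_grad()` and divided once at the end.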

In [None]:
import torch
import torchvision.transforms as transforms
from PIL import Image
import matplotlib.pyplot as plt
import numpy as np

# Load and preprocess the image
def preprocess_image(image_path, transform):
    image = Image.open(image_path).convert("RGB")
    return image, transform(image).unsqueeze(0)

# Predict using the model
def predict(model, image_tensor, device):
    model.eval()
    with torch.no_grad():
        image_tensor = image_tensor.to(device)
        outputs = model(image_tensor)
        probabilities = torch.nn.functional.softmax(outputs, dim=1)
    return probabilities.cpu().numpy().flatten()

# Visualization
def visualize_predictions(original_image, probabilities, class_names):
    fig, axarr = plt.subplots(1, 2, figsize=(14, 7))
    
    # Display image
    axarr[0].imshow(original_image)
    axarr[0].axis("off")
    
    # Display predictions
    axarr[1].barh(class_names, probabilities)
    axarr[1].set_xlabel("Probability")
    axarr[1].set_title("Class Predictions")
    axarr[1].set_xlim(0, 1)

    plt.tight_layout()
    plt.show()
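The `predict` helper above turns raw model outputs into probabilities with softmax. The sketch below shows that step on made-up scores and picks the two most likely classes; the four class names are a hypothetical subset chosen for illustration.

```python
import torch

# Made-up raw scores for one image over four hypothetical classes.
demo_class_names = ["ace of clubs", "five of diamonds", "joker", "king of hearts"]
demo_scores = torch.tensor([[1.0, 3.0, 0.5, 2.0]])

demo_probabilities = torch.softmax(demo_scores, dim=1).flatten()
top_probs, top_indices = torch.topk(demo_probabilities, k=2)
top_classes = [demo_class_names[i] for i in top_indices.tolist()]

print(demo_probabilities.sum().item())  # approximately 1.0
print(top_classes)                      # ['five of diamonds', 'king of hearts']
```

Softmax preserves the ranking of the raw scores, so the predicted class is the same whether `argmax` is taken before or after it.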

In [None]:
# Example usage
test_image = "D:/education/pyhton/projects/datasets/Cards_Image_Dataset_Classification/test/five of diamonds/2.jpg"
transform = transforms.Compose([
    transforms.Resize((128, 128)),
    transforms.ToTensor()
])

original_image, image_tensor = preprocess_image(test_image, transform)
probabilities = predict(model, image_tensor, device)

# Assuming dataset.classes gives the class names
class_names = dataset.classes 
visualize_predictions(original_image, probabilities, class_names)

In [None]:
# Predict on 10 randomly sampled test images
from glob import glob
test_images = glob('D:/education/pyhton/projects/datasets/Cards_Image_Dataset_Classification/test/*/*')
test_examples = np.random.choice(test_images, 10, replace=False)  # sample without repeating an image

for example in test_examples:
    original_image, image_tensor = preprocess_image(example, transform)
    probabilities = predict(model, image_tensor, device)

    # Assuming dataset.classes gives the class names
    class_names = dataset.classes 
    visualize_predictions(original_image, probabilities, class_names)
