<a href="https://colab.research.google.com/github/ThiruvarankanM/Self-Supervised-Card-Classification/blob/main/Card_Classification_SelfSupervised.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

### Setup and Imports
This section installs the libraries the project depends on: Pillow, PyTorch, torchvision, and gdown.

In [3]:
!pip install Pillow torch torchvision



In [4]:
# Install gdown if not already installed
!pip install -U gdown



### Mount Google Drive

This code mounts your Google Drive to access files stored there.

In [5]:
# Mount Google Drive so the image folders stored there are accessible
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


### Import Libraries

Importing the necessary libraries for working with PyTorch, images, and file operations.

In [7]:
import torch  # PyTorch core library
import torch.nn as nn  # Neural network layers
import torch.optim as optim  # Optimization algorithms
from torch.utils.data import Dataset, DataLoader  # Dataset and data loading utilities
from torchvision import transforms  # Image transformations
from PIL import Image  # Image processing
import os  # File and directory operations
import random  # Random number generation

### Card Rotation Dataset Class

This class defines a custom dataset for the self-supervised learning phase. It loads images from a directory, rotates each one by a randomly chosen multiple of 90 degrees (0, 90, 180, or 270), and uses the rotation index (0 to 3) as the label, so no manual annotation is needed.

In [8]:
class CardRotationDataset(Dataset):
    def __init__(self, root_dir, image_size=(128, 128)):
        if not os.path.isdir(root_dir):
            raise FileNotFoundError(f"The folder '{root_dir}' does not exist. Please check the path.")
        self.root_dir = root_dir
        # Match extensions case-insensitively so files like 'CARD.JPG' are not skipped
        self.image_paths = [os.path.join(root_dir, f) for f in os.listdir(root_dir)
                            if f.lower().endswith(('.png', '.jpg', '.jpeg'))]
        if len(self.image_paths) == 0:
            raise RuntimeError(f"No image files found in '{root_dir}'. Please ensure the folder contains images.")
        # Transform: resize and convert images to tensor
        self.transform = transforms.Compose([
            transforms.Resize(image_size),
            transforms.ToTensor()
        ])
        self.image_size = image_size
        self.num_channels = 3
    def __len__(self):
        return len(self.image_paths)
    def __getitem__(self, idx):
        img_path = self.image_paths[idx]
        image = Image.open(img_path).convert('RGB')
        # Randomly rotate the image; the label is the rotation index (0-3)
        rotation_angle = random.choice([0, 90, 180, 270])
        label = rotation_angle // 90
        if rotation_angle:
            image = image.rotate(rotation_angle)  # Counter-clockwise, same canvas size
        image = self.transform(image)
        return image, label
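
To see what the rotation step does to a non-square card, the sketch below rotates a small synthetic image with Pillow. The 4x2 test image and the `expand=True` variant are illustrative assumptions and are not part of the dataset class above. `Image.rotate` turns the image counter-clockwise and, by default, keeps the original canvas size.

```python
from PIL import Image

# Hypothetical 4x2 (width x height) white image standing in for a card photo.
img = Image.new('RGB', (4, 2), color=(255, 255, 255))

# Default call, as in CardRotationDataset: the canvas size is unchanged.
same_canvas = img.rotate(90)

# With expand=True the canvas grows to hold the whole rotated image.
expanded = img.rotate(90, expand=True)

print(img.size, same_canvas.size, expanded.size)  # (4, 2) (4, 2) (2, 4)
```

On non-square inputs the default call crops the rotated content and fills the uncovered area with black. It may be worth checking whether a model trained this way is learning card orientation or just the padded borders.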

### Simple CNN Model

This defines a simple Convolutional Neural Network (CNN) model. It consists of an encoder part to extract features and a classifier head.

In [9]:
class SimpleCNN(nn.Module):
    def __init__(self, image_size=(128, 128)):
        super(SimpleCNN, self).__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2, 2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2, 2)
        )

        # Size of the flattened encoder output, needed to define the first Linear layer.
        # Each MaxPool2d(2, 2) halves height and width, so two of them divide each by 4.
        # For 128x128 input: (128 // 4) * (128 // 4) * 32 channels = 32 * 32 * 32 = 32768
        flattened_size = (image_size[0] // 4) * (image_size[1] // 4) * 32

        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(flattened_size, 128),
            nn.ReLU(),
            nn.Linear(128, 4) # 4 classes: 0, 90, 180, 270 degrees
        )

    def forward(self, x):
        features = self.encoder(x)
        output = self.classifier(features)
        return output
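
The flattened-size arithmetic used for the first `Linear` layer can be checked without PyTorch. The helper below is a minimal sketch; the name `flattened_size` and its parameters are hypothetical and only mirror the layer layout of `SimpleCNN`.

```python
def flattened_size(image_size, out_channels=32, num_pools=2):
    """Number of features entering the first Linear layer of SimpleCNN."""
    height, width = image_size
    for _ in range(num_pools):
        # The 3x3 convolutions with padding=1 keep height and width;
        # each MaxPool2d(2, 2) halves them (rounding down).
        height //= 2
        width //= 2
    return height * width * out_channels


print(flattened_size((128, 128)))  # 32 * 32 * 32 = 32768
print(flattened_size((64, 96)))    # 16 * 24 * 32 = 12288
```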

### Train Self-Supervised Model Function

This function trains the CNN on the self-supervised rotation task. It loads the dataset, creates the model, cross-entropy loss, and Adam optimizer, and trains for 10 epochs with a batch size of 32. Only the encoder's weights are saved afterwards, to `self_supervised_encoder.pth`; the rotation classifier head is discarded.

In [10]:
def train_self_supervised_model():
    data_folder = '/content/drive/MyDrive/my_card_images'  # Use Google Drive path
    target_image_size = (128, 128)
    try:
        dataset = CardRotationDataset(root_dir=data_folder, image_size=target_image_size)
        dataloader = DataLoader(dataset, batch_size=32, shuffle=True)
    except (FileNotFoundError, RuntimeError) as e:
        print(f"Error: {e}")
        return
    model = SimpleCNN(image_size=target_image_size)
    criterion = nn.CrossEntropyLoss()
    optimizer = optim.Adam(model.parameters(), lr=0.001)
    print(f"\n--- Starting Self-Supervised Training on images in '{data_folder}' ---")
    for epoch in range(10):
        running_loss = 0.0
        for i, (inputs, labels) in enumerate(dataloader):
            optimizer.zero_grad()
            outputs = model(inputs)
            loss = criterion(outputs, labels)
            loss.backward()
            optimizer.step()
            running_loss += loss.item()
        print(f"Epoch {epoch+1}, Loss: {running_loss/len(dataloader):.4f}")
    torch.save(model.encoder.state_dict(), 'self_supervised_encoder.pth')
    print("\nTraining finished! Saved the encoder to 'self_supervised_encoder.pth'.")
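
As a reference point for the loss values this loop prints, a classifier that assigns equal probability to every class has a cross-entropy of ln(number of classes). The helper name below is hypothetical; the sketch only computes that baseline for the four rotation classes.

```python
import math


def uniform_cross_entropy(num_classes):
    # -log(1 / num_classes): the loss of a model that predicts
    # every class with equal probability.
    return math.log(num_classes)


print(f"{uniform_cross_entropy(4):.4f}")  # 1.3863
```

An epoch loss well below this value suggests the model is doing better than guessing on the training data. It says nothing about held-out images, which this notebook does not evaluate.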

### Run Self-Supervised Training

This block calls the `train_self_supervised_model` function to start the self-supervised training process when the script is executed.

In [11]:
# Run the self-supervised training if this script is executed directly
if __name__ == "__main__":
    train_self_supervised_model()


--- Starting Self-Supervised Training on images in '/content/drive/MyDrive/my_card_images' ---
Epoch 1, Loss: 0.6587
Epoch 2, Loss: 0.0846
Epoch 3, Loss: 0.0060
Epoch 4, Loss: 0.0006
Epoch 5, Loss: 0.0001
Epoch 6, Loss: 0.0000
Epoch 7, Loss: 0.0000
Epoch 8, Loss: 0.0000
Epoch 9, Loss: 0.0000
Epoch 10, Loss: 0.0000

Training finished! Saved the encoder to 'self_supervised_encoder.pth'.


### Import glob

Importing the `glob` module, which finds files whose paths match a pattern.

In [12]:
import glob  # For file path pattern matching

### Labeled Card Dataset Class

This class defines a dataset for the supervised fine-tuning phase. It loads images from a directory organized into subfolders (each subfolder representing a class) and assigns class labels.

In [13]:
class LabeledCardDataset(Dataset):
    def __init__(self, root_dir, image_size=(128, 128)):
        self.root_dir = root_dir
        # Get class names from subfolder names
        self.classes = sorted([d for d in os.listdir(root_dir) if os.path.isdir(os.path.join(root_dir, d))])
        self.class_to_idx = {cls: i for i, cls in enumerate(self.classes)}
        # Collect image paths and their class labels
        self.image_paths = []
        for cls_name in self.classes:
            cls_path = os.path.join(root_dir, cls_name)
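            for img_name in glob.glob(os.path.join(cls_path, '*')):
                # Skip non-image files (e.g. hidden system files) that Image.open cannot read
                if img_name.lower().endswith(('.png', '.jpg', '.jpeg')):
                    self.image_paths.append((img_name, self.class_to_idx[cls_name]))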
        if len(self.image_paths) == 0:
            raise RuntimeError(f"No labeled images found in '{root_dir}'.")
        # Transform: resize and convert images to tensor
        self.transform = transforms.Compose([
            transforms.Resize(image_size),
            transforms.ToTensor()
        ])
    def __len__(self):
        return len(self.image_paths)
    def __getitem__(self, idx):
        img_path, label = self.image_paths[idx]
        image = Image.open(img_path).convert('RGB')
        image = self.transform(image)
        return image, label
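
The label assignment depends only on the sorted subfolder names. The sketch below reproduces that mapping on a temporary directory; `build_class_index` is a hypothetical helper that mirrors the two lines in `__init__`, and the folder names are the classes reported later in the fine-tuning output.

```python
import os
import tempfile


def build_class_index(root_dir):
    # Same logic as LabeledCardDataset: subfolders, sorted, numbered from 0.
    classes = sorted(d for d in os.listdir(root_dir)
                     if os.path.isdir(os.path.join(root_dir, d)))
    return {cls: i for i, cls in enumerate(classes)}


with tempfile.TemporaryDirectory() as root:
    # Create the folders in a scrambled order to show that sorting fixes the labels.
    for name in ['voter_id', 'bank_card', 'id_card', 'visiting_card']:
        os.makedirs(os.path.join(root, name))
    # A stray top-level file is ignored because it is not a directory.
    open(os.path.join(root, 'notes.txt'), 'w').close()
    mapping = build_class_index(root)

print(mapping)
# {'bank_card': 0, 'id_card': 1, 'visiting_card': 2, 'voter_id': 3}
```

Because the indices come from alphabetical order, any class list used at prediction time has to be in the same order.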

### Full Classifier Model

This class defines the full classifier model used for fine-tuning and prediction. It loads the pre-trained encoder from the self-supervised phase and adds a new classifier head for the card classification task. The encoder weights are frozen during fine-tuning.

In [14]:
class FullClassifier(nn.Module):
    def __init__(self, encoder_path, num_classes):
        super(FullClassifier, self).__init__()
        # Encoder architecture; must match SimpleCNN.encoder so the saved weights load
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2, 2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2, 2)
        )
        # Load pre-trained encoder weights
        self.encoder.load_state_dict(torch.load(encoder_path))
        for param in self.encoder.parameters():
            param.requires_grad = False  # Freeze encoder weights
        # Add classifier head for fine-tuning
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32 * 32 * 32, 128),  # Assumes 128x128 input: 32 channels * 32 * 32
            nn.ReLU(),
            nn.Linear(128, num_classes)
        )
    def forward(self, x):
        features = self.encoder(x)
        output = self.classifier(features)
        return output
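
The encoder layout has to match because a state dict for an `nn.Sequential` is keyed by layer position. The sketch below saves and reloads an encoder with the same layout; the in-memory buffer is a stand-in for `self_supervised_encoder.pth`, and `make_encoder` is a hypothetical helper.

```python
import io

import torch
import torch.nn as nn


def make_encoder():
    # Same layer order as SimpleCNN.encoder
    return nn.Sequential(
        nn.Conv2d(3, 16, kernel_size=3, padding=1),
        nn.ReLU(),
        nn.MaxPool2d(2, 2),
        nn.Conv2d(16, 32, kernel_size=3, padding=1),
        nn.ReLU(),
        nn.MaxPool2d(2, 2),
    )


pretrained = make_encoder()
buffer = io.BytesIO()  # In-memory stand-in for the saved encoder file
torch.save(pretrained.state_dict(), buffer)
buffer.seek(0)

fresh = make_encoder()
fresh.load_state_dict(torch.load(buffer))

# Only the two Conv2d layers (positions 0 and 3) carry parameters.
keys = list(fresh.state_dict().keys())
print(keys)  # ['0.weight', '0.bias', '3.weight', '3.bias']
```

If a layer were inserted or removed, these positional keys would shift and `load_state_dict` would report missing or unexpected keys.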

### Finetune Supervised Model Function

This function performs supervised fine-tuning on the labeled card dataset. It loads the labeled dataset, builds a `FullClassifier` around the pre-trained encoder, and trains only the new classifier head for 20 epochs with a batch size of 5, then saves the full model to `card_classifier.pth`.

In [17]:
def finetune_supervised_model():
    data_folder = '/content/drive/MyDrive/cards_labeled_small'  # Google Drive path
    try:
        dataset = LabeledCardDataset(root_dir=data_folder)
        dataloader = DataLoader(dataset, batch_size=5, shuffle=True)
    except (FileNotFoundError, RuntimeError) as e:  # Missing folder or no images
        print(f"Error: {e}")
        return
    num_classes = len(dataset.classes)
    model = FullClassifier(encoder_path='self_supervised_encoder.pth', num_classes=num_classes)
    criterion = nn.CrossEntropyLoss()
    optimizer = optim.Adam(model.classifier.parameters(), lr=0.001)  # Only update classifier
    print(f"\n--- Starting Supervised Finetuning on {len(dataset)} labeled images ---")
    for epoch in range(20):
        running_loss = 0.0
        for i, (inputs, labels) in enumerate(dataloader):
            optimizer.zero_grad()
            outputs = model(inputs)
            loss = criterion(outputs, labels)
            loss.backward()
            optimizer.step()
            running_loss += loss.item()
        print(f"Epoch {epoch+1}, Loss: {running_loss/len(dataloader):.4f}")
    print("\nFinetuning finished! The model is now ready for card classification.")
    print(f"Final classes: {dataset.classes}")
    torch.save(model.state_dict(), 'card_classifier.pth')
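
The loss printed per epoch is the running total divided by `len(dataloader)`, which is the number of batches rather than the number of images. The sketch below computes that count in plain Python; `batches_per_epoch` is a hypothetical helper that follows the `DataLoader` default of keeping the final partial batch (`drop_last=False`).

```python
import math


def batches_per_epoch(num_samples, batch_size, drop_last=False):
    if drop_last:
        # The incomplete final batch is discarded.
        return num_samples // batch_size
    # The incomplete final batch is kept.
    return math.ceil(num_samples / batch_size)


print(batches_per_epoch(20, 5))   # 4
print(batches_per_epoch(22, 5))   # 5
```

With a batch size of 5, a 20-image dataset gives four optimizer steps per epoch, so each epoch's average covers only four loss values. That may partly explain why the fine-tuning loss moves noisily from epoch to epoch.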

### Run Supervised Fine-tuning

This block calls the `finetune_supervised_model` function to start the supervised fine-tuning process.

In [18]:
# Run the supervised fine-tuning if this script is executed directly
if __name__ == "__main__":
    finetune_supervised_model()


--- Starting Supervised Finetuning on 20 labeled images ---
Epoch 1, Loss: 2.9179
Epoch 2, Loss: 0.8795
Epoch 3, Loss: 0.6428
Epoch 4, Loss: 0.4396
Epoch 5, Loss: 0.7948
Epoch 6, Loss: 0.6388
Epoch 7, Loss: 0.4400
Epoch 8, Loss: 0.3779
Epoch 9, Loss: 0.4198
Epoch 10, Loss: 0.4384
Epoch 11, Loss: 0.4952
Epoch 12, Loss: 0.4096
Epoch 13, Loss: 0.4351
Epoch 14, Loss: 0.4381
Epoch 15, Loss: 0.4322
Epoch 16, Loss: 0.6738
Epoch 17, Loss: 0.4227
Epoch 18, Loss: 0.5362
Epoch 19, Loss: 0.3780
Epoch 20, Loss: 0.3867

Finetuning finished! The model is now ready for card classification.
Final classes: ['bank_card', 'id_card', 'visiting_card', 'voter_id']


### Full Classifier Model (for Prediction)

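This redefines `FullClassifier` for the prediction section with the same architecture as the fine-tuning version. Unlike that version, it does not read the encoder file or freeze any weights: the `encoder_path` argument is accepted but unused, and both the encoder and the fine-tuned classifier head are restored later from the full model's state dict.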

In [19]:
class FullClassifier(nn.Module):
    def __init__(self, encoder_path, num_classes):
        super(FullClassifier, self).__init__()
        # Encoder architecture; must match the model saved in 'card_classifier.pth'.
        # encoder_path is unused here: the encoder weights are loaded later,
        # together with the classifier, from the full model's state_dict.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2, 2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2, 2)
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32 * 32 * 32, 128),  # Assumes 128x128 input: 32 channels * 32 * 32
            nn.ReLU(),
            nn.Linear(128, num_classes)
        )
    def forward(self, x):
        features = self.encoder(x)
        output = self.classifier(features)
        return output

### Predict Image Function

This function takes a trained model, an image path, and a list of class names, and predicts the class of the given image. It applies the necessary transformations and uses the model to get the predicted class label.

In [20]:
def predict_image(model, image_path, classes):
    if not os.path.exists(image_path):
        print(f"Error: Image file not found at '{image_path}'")
        return None
    # Apply same transforms as during training
    transform = transforms.Compose([
        transforms.Resize((128, 128)),
        transforms.ToTensor()
    ])
    image = Image.open(image_path).convert('RGB')
    image_tensor = transform(image).unsqueeze(0)  # Add batch dimension
    model.eval()
    with torch.no_grad():
        output = model(image_tensor)
        predicted_idx = torch.argmax(output).item()
    predicted_class = classes[predicted_idx]
    return predicted_class
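
`predict_image` returns only the top class. If a confidence score is also wanted, the model's raw outputs (logits) can be passed through a softmax. The sketch below does this in plain Python on made-up logits, so it runs without a trained model; inside the notebook the equivalent call would be `torch.softmax(output, dim=1)`. The logit values and the `softmax` helper are assumptions for illustration.

```python
import math


def softmax(logits):
    # Subtract the maximum first for numerical stability.
    shift = max(logits)
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


classes = ['bank_card', 'id_card', 'visiting_card', 'voter_id']
logits = [2.0, 0.5, 0.1, -1.0]  # Hypothetical model output for one image

probs = softmax(logits)
best = max(range(len(probs)), key=probs.__getitem__)
print(f"{classes[best]} ({probs[best]:.1%})")
```

The softmax does not change which class wins, so the predicted label is the same as with `argmax` on the logits.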

### Make a Prediction

This block loads the trained model and uses the `predict_image` function to make a prediction on a new image.

In [21]:
if __name__ == "__main__":
    # Set model and image paths
    model_path = 'card_classifier.pth'
    new_image_path = '/content/drive/MyDrive/cards_labeled_small/b_0118.png'  # Google Drive path
    # Class names in training order (sorted subfolder names, as printed after fine-tuning)
    final_classes = ['bank_card', 'id_card', 'visiting_card', 'voter_id']
    num_classes = len(final_classes)
    # Load trained model
    model = FullClassifier(encoder_path='self_supervised_encoder.pth', num_classes=num_classes)
    try:
        model.load_state_dict(torch.load(model_path))
    except FileNotFoundError:
        print(f"Error: Model file '{model_path}' not found. Please check the path.")
    except Exception as e:
        print(f"An error occurred while loading the model: {e}")
    else:
        # Predict only if the weights loaded; otherwise the model is untrained
        print("Final model loaded successfully!")
        print(f"\nMaking a prediction for the image at: {new_image_path}")
        predicted_label = predict_image(model, new_image_path, final_classes)
        if predicted_label:
            print(f"\nThe model predicts this card is a: {predicted_label}")

Final model loaded successfully!

Making a prediction for the image at: /content/drive/MyDrive/cards_labeled_small/b_0118.png

The model predicts this card is a: bank_card
