# Playing Card Classification - Advanced Analysis

This notebook provides a deep dive into the data exploration and model development process for the Playing Card Classification project.

In [None]:
import os
import torch
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from PIL import Image
from torchvision import datasets, transforms
from torch.utils.data import DataLoader

# Preprocessing for EDA
transform = transforms.Compose([
    transforms.Resize((128, 128)),
    transforms.ToTensor()
])

train_dir = 'data/train'
dataset = datasets.ImageFolder(train_dir, transform=transform)

## 1. Extensive EDA

### 1.1 Class Distribution (Target Variable)
We check if the dataset is balanced across the 53 possible classes (Ranks 2-10, J, Q, K, A for each suit + Joker).

In [None]:
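# Count images per class from the labels ImageFolder actually loaded,
# so hidden or non-image files in the class folders are not counted.
target_counts = np.bincount(dataset.targets, minlength=len(dataset.classes))
class_counts = {cls: int(target_counts[i]) for i, cls in enumerate(dataset.classes)}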
counts_df = pd.DataFrame(list(class_counts.items()), columns=['Class', 'Count'])

plt.figure(figsize=(15, 6))
sns.barplot(data=counts_df, x='Class', y='Count', palette='viridis')
plt.title("Distribution of Cards across Classes")
plt.xticks(rotation=90)
plt.show()

print(f"Total Images: {counts_df['Count'].sum()}")
print(f"Average per Class: {counts_df['Count'].mean():.2f}")
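
A bar chart shows imbalance visually, but a single number is easier to track between dataset versions. The sketch below computes a max/min imbalance ratio from a list of labels. The helper name `imbalance_summary` and the toy labels are illustrative assumptions, not part of the project code.

```python
from collections import Counter


def imbalance_summary(labels):
    """Summarize how evenly labels are spread across classes."""
    counts = Counter(labels)
    largest = max(counts.values())
    smallest = min(counts.values())
    return {
        "num_classes": len(counts),
        "min_count": smallest,
        "max_count": largest,
        # 1.0 means perfectly balanced; larger values mean more skew
        "imbalance_ratio": largest / smallest,
    }


# Toy labels for illustration only (not the project's data)
toy_labels = ["ace"] * 4 + ["king"] * 2 + ["joker"] * 1
summary = imbalance_summary(toy_labels)
print(summary)
```

In this notebook the same helper could be called as `imbalance_summary(dataset.targets)`, since `ImageFolder` exposes the integer label of every sample through `targets`.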

### 1.2 Image Content Analysis
Let's analyze the pixel intensity distributions and image dimensions to ensure consistency.

In [None]:
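def analyze_images(dataset, num_samples=100, seed=0):
    means, stds, widths, heights = [], [], [], []

    # ImageFolder orders samples by class, so draw indices at random
    # instead of taking the first N (which would cover only a class or two).
    rng = np.random.default_rng(seed)
    n = min(num_samples, len(dataset))
    indices = rng.choice(len(dataset), size=n, replace=False)

    for i in indices:
        img, _ = dataset[i]
        means.append(img.mean().item())
        stds.append(img.std().item())
        # Read the original (pre-resize) dimensions and close the file promptly
        with Image.open(dataset.samples[i][0]) as orig_img:
            widths.append(orig_img.size[0])
            heights.append(orig_img.size[1])

    return means, stds, widths, heights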

means, stds, widths, heights = analyze_images(dataset)

fig, ax = plt.subplots(1, 2, figsize=(12, 5))
sns.histplot(means, ax=ax[0], kde=True, color='blue').set_title("Mean Pixel Intensity (Brightness)")
sns.scatterplot(x=widths, y=heights, ax=ax[1], alpha=0.5).set_title("Image Dimensions (Width vs Height)")
plt.show()
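
The brightness histogram pools all three color channels. If per-channel statistics are wanted (for example, as candidate inputs to `transforms.Normalize`, which this notebook does not currently use), they could be computed roughly as follows. The function name `channel_stats` and the random toy batch are assumptions for illustration.

```python
import numpy as np


def channel_stats(images):
    """Per-channel mean and std for an array of shape (N, C, H, W)."""
    images = np.asarray(images, dtype=np.float64)
    # Reduce over the batch, height, and width axes; keep the channel axis
    mean = images.mean(axis=(0, 2, 3))
    std = images.std(axis=(0, 2, 3))
    return mean, std


# Toy batch of uniform noise in [0, 1), standing in for ToTensor() output
rng = np.random.default_rng(0)
toy_batch = rng.random((8, 3, 128, 128))
mean, std = channel_stats(toy_batch)
print("mean per channel:", mean)
print("std per channel:", std)
```

With the real data, a batch could be assembled with `torch.stack([dataset[i][0] for i in indices]).numpy()` before calling the helper.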

### 1.3 Sample Visualizations
Visualizing a few randomly chosen samples to see which visual features (suit symbols vs. rank markings) distinguish the classes.

In [None]:
def show_samples(dataset, num_samples=5):
    plt.figure(figsize=(15, 3))
    indices = np.random.choice(len(dataset), num_samples, replace=False)
    for i, idx in enumerate(indices):
        img, label = dataset[idx]
        plt.subplot(1, num_samples, i+1)
        plt.imshow(img.permute(1, 2, 0))
        plt.title(dataset.classes[label])
        plt.axis('off')
    plt.show()

show_samples(dataset)

## 2. Model Selection & Experimentation

During the development of this project, we explored several architectures and configuration variants to optimize performance.

### 2.1 Architectures Compared
| Model | Pros | Cons | Decision |
| --- | --- | --- | --- |
| **Custom CNN** | Small footprint | Higher loss, limited generalization | Baseline |
| **ResNet18** | Pre-trained weights, effective skip connections | Heavier than Custom CNN | **Selected** |
| **EfficientNet** | State-of-the-art accuracy for its size | Slower training on CPUs | Secondary |
| **Decision Trees** | Interpretable | Poor performance on raw pixel data | Rejected |

### 2.2 Variations and Experiments
- **With vs Without Dropout**: 
    - *Observation*: Without dropout, the model reached 99% training accuracy but only 85% validation accuracy, a clear sign of overfitting. Adding a Dropout layer (p=0.5) before the final FC layer improved validation accuracy to 92%.
- **Extra Inner Layers**: 
    - *Experiment*: Added two dense layers of 512 units each.
    - *Result*: No significant gain in accuracy; increased training time and risk of overfitting. Simplified back to a single FC output layer.
- **Freezing Base Layers**: 
    - Initially, all layers were frozen. Later, we unfroze the last basic block of ResNet18 to allow fine-tuning of high-level features specific to card rank icons.

### 2.3 Feature Importance for Images
In imaging tasks, 'Feature Importance' is often visualized with Grad-CAM (gradient-weighted class activation mapping) heatmaps. For cards, the model consistently focuses on:
1. **Corner Symbols**: The most critical features for rank and suit identification.
2. **Internal Symbols**: Help confirm the card suit.
3. **Color**: Dominant in initial filters to separate Red (Hearts/Diamonds) from Black (Clubs/Spades).