# 📦 Kaggle Submission Evaluation

This notebook loads a trained CIFAR-10 classification model and generates predictions on the official Kaggle test set.
The output is a `submission.csv` file compatible with the competition format.

## Workflow:
1. Select a trained model by name
2. Load model configuration and weights
3. Prepare the Kaggle test dataset
4. Run inference and collect predictions
5. Export submission file for Kaggle upload


### 🔧 Select Model to Evaluate
Set the name of the trained model you want to evaluate on the Kaggle test set.

In [1]:
# Choose the trained model you want to use for generating Kaggle submission
model_name = "cnn_mixup_cutout_SGD"

#### 📁 Set up paths and project structure, import libraries

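Add the project root (the parent of the notebook's working directory) to `sys.path` so the `utils` and `core` packages can be imported, then import the folder constants and the libraries used below.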

In [2]:
import sys
import os
# Add project root to sys.path
project_root = os.path.abspath(os.path.join(os.getcwd(), ".."))
if project_root not in sys.path:
    sys.path.append(project_root)
from utils.paths import MODELS_DIR, DATA_DIR

# Project classifier wrapper and third-party libraries
from core.cifar10_classifier import CIFAR10Classifier
import torch
import pandas as pd
import json

from torchvision import transforms
from torch.utils.data import Dataset
from PIL import Image

#### 📦 Load the trained model

Load the model and its configuration from the specified `model_name`.  
Also extract the dataset mean and standard deviation used for normalization.


In [7]:
config_path = os.path.join(MODELS_DIR, model_name,  f"{model_name}_config.json")
model_path = os.path.join(MODELS_DIR, model_name,  f"{model_name}_best_model.pth")

assert os.path.exists(config_path), f"Config not found at {config_path}"
assert os.path.exists(model_path), f"Model not found at {model_path}"

model = CIFAR10Classifier.load_model(
    model_name=model_name,
    config_path=config_path,
    model_path=model_path
)

display(model.summary())
mean, std = torch.tensor(model.mean), torch.tensor(model.std)

Layer (type:depth-idx)                   Output Shape              Param #
CIFAR10_CNN                              [1, 10]                   --
├─Sequential: 1-1                        [1, 256, 2, 2]            --
│    └─Conv2d: 2-1                       [1, 32, 32, 32]           896
│    └─BatchNorm2d: 2-2                  [1, 32, 32, 32]           64
│    └─ReLU: 2-3                         [1, 32, 32, 32]           --
│    └─MaxPool2d: 2-4                    [1, 32, 16, 16]           --
│    └─Conv2d: 2-5                       [1, 64, 16, 16]           18,496
│    └─BatchNorm2d: 2-6                  [1, 64, 16, 16]           128
│    └─ReLU: 2-7                         [1, 64, 16, 16]           --
│    └─MaxPool2d: 2-8                    [1, 64, 8, 8]             --
│    └─Conv2d: 2-9                       [1, 128, 8, 8]            73,856
│    └─BatchNorm2d: 2-10                 [1, 128, 8, 8]            256
│    └─ReLU: 2-11                        [1, 128, 8, 8]            --
│   
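The loading cell checks for the config and weights files with `assert`, which Python skips when run with the `-O` flag. A minimal sketch of the same check with an explicit exception is shown here. The helper name `resolve_model_files` and the demo model name are hypothetical; only the `<model_name>_config.json` and `<model_name>_best_model.pth` naming pattern comes from the notebook.

```python
import os
import tempfile

def resolve_model_files(models_dir, model_name):
    """Build the config and weights paths for a model and verify both exist."""
    model_dir = os.path.join(models_dir, model_name)
    paths = {
        "config": os.path.join(model_dir, f"{model_name}_config.json"),
        "weights": os.path.join(model_dir, f"{model_name}_best_model.pth"),
    }
    missing = [p for p in paths.values() if not os.path.isfile(p)]
    if missing:
        # Raised even under `python -O`, unlike an assert statement
        raise FileNotFoundError("Missing model files: " + ", ".join(missing))
    return paths

# Demo against a throwaway directory with empty placeholder files
with tempfile.TemporaryDirectory() as models_dir:
    name = "demo_model"  # hypothetical model name
    os.makedirs(os.path.join(models_dir, name))
    for suffix in ("_config.json", "_best_model.pth"):
        open(os.path.join(models_dir, name, name + suffix), "w").close()
    paths = resolve_model_files(models_dir, name)
    print(sorted(paths))  # ['config', 'weights']
```

In the notebook, the returned paths could be passed to `CIFAR10Classifier.load_model` in place of the two separately built variables.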

#### 🧾 Define Kaggle test dataset

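This custom dataset class loads and preprocesses test images from the Kaggle competition directory.
Images are sorted numerically by the ID in their filename (so `10.png` comes after `9.png`, not after `1.png`), and each item is returned together with its filename.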

In [5]:
class KaggleCIFAR10Dataset(Dataset):
    def __init__(self, image_dir, transform=None):
        self.image_dir = image_dir
        self.image_paths = sorted([
            os.path.join(image_dir, fname)
            for fname in os.listdir(image_dir)
            if fname.endswith(".png")
        ], key=lambda x: int(os.path.splitext(os.path.basename(x))[0]))
        self.transform = transform

    def __len__(self):
        return len(self.image_paths)

    def __getitem__(self, idx):
        img_path = self.image_paths[idx]
        image = Image.open(img_path).convert("RGB")
        if self.transform:
            image = self.transform(image)
        return image, os.path.basename(img_path)
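The numeric sort key in the dataset class matters because a plain string sort orders filenames character by character. The following small sketch, using made-up filenames in the same `<id>.png` style, contrasts the two orderings. The helper name `numeric_id` is introduced here for illustration only.

```python
import os

# Hypothetical filenames in the style of the Kaggle test folder
filenames = ["10.png", "2.png", "1.png", "100.png", "9.png"]

def numeric_id(path):
    """Return the integer ID encoded in a filename such as '42.png'."""
    return int(os.path.splitext(os.path.basename(path))[0])

lexicographic = sorted(filenames)             # string comparison
numeric = sorted(filenames, key=numeric_id)   # same key as the dataset class

print(lexicographic)  # ['1.png', '10.png', '100.png', '2.png', '9.png']
print(numeric)        # ['1.png', '2.png', '9.png', '10.png', '100.png']
```

With the numeric key, the rows of the submission file come out in ascending `Id` order without a separate sort step.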

#### 🌀 Define image transformations

Define the preprocessing pipeline for the Kaggle test images: resize to 32×32, convert to a tensor, and normalize with the `mean` and `std` loaded from the model,
so that the inputs match the format expected by the trained model.

In [6]:
kaggle_transform = transforms.Compose([
    transforms.Resize((32, 32)),
    transforms.ToTensor(),
    transforms.Normalize(mean, std)
])
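To make the arithmetic of the last two transforms concrete, here is a plain-Python sketch for a single RGB pixel. `ToTensor` scales 8-bit values to the range [0, 1], and `Normalize` then applies `(value - mean) / std` per channel. The statistics below are placeholders chosen for easy arithmetic, not the values stored in the model.

```python
# Placeholder per-channel statistics (assumption); the notebook uses model.mean / model.std
mean = [0.5, 0.5, 0.5]
std = [0.25, 0.25, 0.25]

def normalize_pixel(rgb, mean, std):
    """Apply (value - mean) / std to each channel of one pixel already scaled to [0, 1]."""
    return [(v - m) / s for v, m, s in zip(rgb, mean, std)]

pixel_uint8 = (255, 128, 0)                  # raw 8-bit RGB values
pixel_unit = [v / 255 for v in pixel_uint8]  # the scaling ToTensor performs
normalized = normalize_pixel(pixel_unit, mean, std)

print(normalized[0])  # 2.0
print(normalized[2])  # -2.0
```

The middle channel lands close to zero because 128/255 is just above the placeholder mean of 0.5.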

#### 📤 Generate Kaggle submission

This function runs the model over the Kaggle test `DataLoader`, maps each predicted class index to its class name,
and saves the predictions as a CSV file with two columns, `Id` and `Label`.

In [17]:
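from tqdm import tqdm

def generate_kaggle_submission(model, dataloader, class_names, output_path="submission.csv", device=None):
    # Fall back to CPU when CUDA is unavailable instead of failing on .to("cuda")
    if device is None:
        device = "cuda" if torch.cuda.is_available() else "cpu"
    model.model.to(device)  # keep the weights on the same device as the inputs
    model.model.eval()
    predictions = []

    with torch.no_grad():
        for images, image_ids in tqdm(dataloader, desc="📤 Generating predictions", unit="batch"):
            images = images.to(device)
            outputs = model.model(images)
            # Index of the highest score per image is the predicted class
            _, predicted = torch.max(outputs, 1)

            # image_ids holds file names such as "123.png"; strip the extension to get the Id
            for img_path, label in zip(image_ids, predicted.cpu().numpy()):
                img_id = os.path.splitext(os.path.basename(img_path))[0]
                predictions.append((int(img_id), class_names[label]))

    df = pd.DataFrame(predictions, columns=["Id", "Label"])
    df.to_csv(output_path, index=False, sep=",")
    print(f"\n✅ Saved submission to {output_path}")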
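The core of the function is the mapping from per-class scores to a class name and then to a CSV row. The dependency-free sketch below reproduces that mapping for two made-up score vectors. The class list is the standard CIFAR-10 order and is an assumption here, since the notebook reads its own list from `class_names.json`; the helper names `argmax` and `rows_from_batch` are hypothetical.

```python
import csv
import io

# Standard CIFAR-10 class order (assumed to match class_names.json)
class_names = ["airplane", "automobile", "bird", "cat", "deer",
               "dog", "frog", "horse", "ship", "truck"]

def argmax(scores):
    """Index of the largest value in a list of scores."""
    return max(range(len(scores)), key=scores.__getitem__)

def rows_from_batch(filenames, logits, class_names):
    """Turn one batch of file names and score vectors into (Id, Label) rows."""
    rows = []
    for fname, scores in zip(filenames, logits):
        img_id = int(fname.rsplit(".", 1)[0])  # "12.png" -> 12
        rows.append((img_id, class_names[argmax(scores)]))
    return rows

# Made-up scores for two images
filenames = ["1.png", "2.png"]
logits = [
    [0.1, 0.2, 0.1, 3.0, 0.0, 0.5, 0.1, 0.0, 0.2, 0.1],  # highest at index 3
    [0.0, 0.1, 0.0, 0.2, 0.1, 0.0, 0.3, 0.1, 2.5, 0.4],  # highest at index 8
]
rows = rows_from_batch(filenames, logits, class_names)

# Write the rows in the same two-column layout as the submission file
buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(["Id", "Label"])
writer.writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)
```

The printed text has a header line followed by `1,cat` and `2,ship`.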


#### 🚀 Run Inference and Save Results

Load the Kaggle test dataset, run the model to generate predictions,  
and save the result as a CSV file compatible with Kaggle submission.

In [18]:
# prepare dataset and loader
image_path = os.path.join(DATA_DIR, "Kaggle", "test")
dataset = KaggleCIFAR10Dataset(
    image_dir=image_path,
    transform=kaggle_transform
)
loader = torch.utils.data.DataLoader(dataset, batch_size=128, shuffle=False)

# generate predictions
class_path = os.path.join(DATA_DIR, "class_names.json")
with open(class_path, "r") as f:
    class_names = json.load(f)
    
output_path = os.path.join(DATA_DIR, "Kaggle", "submission.csv")
generate_kaggle_submission(model, loader, class_names, output_path=output_path)


📤 Generating predictions: 100%|██████████| 2344/2344 [42:59<00:00,  1.10s/batch]


✅ Saved submission to submission.csv
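Before uploading, it can be worth checking the file for structural problems. The sketch below is one possible check, not part of the original notebook. It assumes the test images are numbered consecutively from 1, and the function name `check_submission` is hypothetical.

```python
import csv
import io

def check_submission(csv_file, expected_rows, valid_labels):
    """Return a list of problems found in a submission CSV; an empty list means none were found."""
    reader = csv.DictReader(csv_file)
    problems = []
    if reader.fieldnames != ["Id", "Label"]:
        return [f"unexpected header: {reader.fieldnames}"]

    ids, bad_labels = [], set()
    for row in reader:
        ids.append(int(row["Id"]))
        if row["Label"] not in valid_labels:
            bad_labels.add(row["Label"])

    if len(ids) != expected_rows:
        problems.append(f"expected {expected_rows} rows, found {len(ids)}")
    # Assumption: Ids run consecutively from 1 to expected_rows
    if sorted(ids) != list(range(1, expected_rows + 1)):
        problems.append("Ids are not exactly 1..N")
    if bad_labels:
        problems.append(f"unknown labels: {sorted(bad_labels)}")
    return problems

# Small in-memory examples standing in for a real file
labels = {"cat", "dog", "ship"}
good = io.StringIO("Id,Label\n1,cat\n2,dog\n3,ship\n")
bad = io.StringIO("Id,Label\n1,cat\n3,plane\n")

print(check_submission(good, 3, labels))  # []
print(check_submission(bad, 3, labels))
```

For the real file, the function could be called on `open(output_path, newline="")` with `expected_rows=300_000` (the size of the Kaggle CIFAR-10 test set, consistent with the 2344 batches of 128 shown in the progress bar) and `valid_labels=set(class_names)`.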



