# Dataset

In [None]:
! kaggle competitions download -c drawn-apart-aicc-round-3
! 7z x drawn-apart-aicc-round-3.zip
! mv task_data/task_data/* task_data/

drawn-apart-aicc-round-3.zip: Skipping, found more recently modified local copy (use --force to force download)

7-Zip [64] 16.02 : Copyright (c) 1999-2016 Igor Pavlov : 2016-05-21
p7zip Version 16.02 (locale=en_US.UTF-8,Utf16=on,HugeFiles=on,64 bits,2 CPUs Intel(R) Xeon(R) CPU @ 2.20GHz (406F0),ASM,AES-NI)

Scanning the drive for archives:
  0M Scan         1 file, 1628571999 bytes (1554 MiB)

Extracting archive: drawn-apart-aicc-round-3.zip
Path = drawn-apart-aicc-round-3.zip
Type = zip
Physical Size = 1628571999


# Task Description

### Storyline

Alexander and Stevie are explorers. Together, they've been on numerous adventures - sandy Bulgarian beaches, scaling the Great Wall of China...

Now, they've boarded the AI hype train, hoping to unlock unparalleled efficiency by leveraging AGI for automated, agentic scouting. No more leg-breaking work when they can just sit back, relax, and watch the bot go.  

They've just sent their first prototype out into the wild for data collection, watching beautiful, noiseless data stream in when...

*"Hey Alex - what does that weird nondescript button do?"*  
*"Nothing much, just toggles some super secret settings. Just don't press it now since we're doing data col-*NO PLEASE DON*-"*  
*"AHH!"*

The monitor freezes.

Then - *ding!* - a flash of color. The display melts into hyper-saturated cartoons, like a Gen Alpha YouTube video on steroids.

*"..."*  
*"That wasn't too bad, right...?"*  
*"Shut up, Stevie. Let's just cycle through the other settings to get back to normal."*

Another click, and this time, the world shifts into elegant shades of penciled grey.

*"Stevie... the console hung."*  
*"...WHAT? The prototype's in the middle of nowhere, and we can't call it back?"*  
*"Yep. I think we're cooked."*  
*"..."*

Just as despair settles in, Stevie's eyes light up.

*"Hold on... I think I can do something with the data we have so far."*

---

### Problem Statement

You are to train a model that can classify *sketches* into their respective categories.

However, **none of the sketches are labelled**. Instead, you are given labelled *photographs* and *cartoons* corresponding to the categories.

---

### Input Format

You have been provided five folders and one file:
- `cartoon`, `photograph`: these contain subfolders titled with the name of each class. Within the subfolders are photos/cartoons of that specific class. For example, `cartoon/ant/xxxxxx.jpg`.
- `sketch`: this contains a single subfolder, `unlabeled`, which contains **9582** randomized, unlabeled sketches (e.g. `sketch/unlabeled/xxxxxx.jpg`).
- `sketch_val`: this contains **200** labeled sketches of the same classes as `cartoon`/`photograph`. This is purely for validation purposes, and labels are provided in the file `val.csv`. The folder contains the files directly (e.g. `sketch_val/xxxxxx.jpg`).
- `sketch_test`: this contains **4624** unlabeled sketches. You are to classify them into the classes from `cartoon`/`photograph`. The folder contains the files directly (e.g. `sketch_test/xxxxxx.jpg`).
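
After extraction, it can help to confirm that the folders match the layout described above. The following is a minimal sketch, not part of the original baseline; the helper names `count_images` and `summarize` are hypothetical, and it assumes every image uses one of the listed extensions.

```python
import os

# Assumption: images use one of these extensions (matched case-insensitively).
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png")
EXPECTED_FOLDERS = ("cartoon", "photograph", "sketch", "sketch_val", "sketch_test")


def count_images(root):
    """Count image files under `root`, recursing into any subfolders."""
    total = 0
    for _dirpath, _dirnames, filenames in os.walk(root):
        total += sum(1 for name in filenames if name.lower().endswith(IMAGE_EXTENSIONS))
    return total


def summarize(data_root, folders=EXPECTED_FOLDERS):
    """Return {folder: image count} for each listed folder that exists under `data_root`."""
    return {
        folder: count_images(os.path.join(data_root, folder))
        for folder in folders
        if os.path.isdir(os.path.join(data_root, folder))
    }
```

Calling `summarize("task_data")` should then report counts for all five folders; if the extraction completed, the `sketch`, `sketch_val` and `sketch_test` entries would be expected to match the figures quoted above.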

### Output Format

You must output a CSV file with two columns:
- `filename`: the exact file name of the prediction, e.g. `xxxxxx.jpg`
- `class_name`: the exact name of the class as per the `cartoon`/`photograph` directories, e.g. `alarm_clock`
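
Before uploading, a quick local check of the file against this format can catch simple mistakes. The sketch below uses only the standard library; `check_submission` is a hypothetical helper, and the specific checks (column order, duplicates, unknown classes, missing files) are assumptions about what a well-formed submission looks like, not the competition's actual validation.

```python
import csv
import io

EXPECTED_COLUMNS = ["filename", "class_name"]


def check_submission(csv_text, valid_classes, expected_filenames):
    """Return a list of problems found in a submission; an empty list means none were found."""
    reader = csv.DictReader(io.StringIO(csv_text))
    if reader.fieldnames != EXPECTED_COLUMNS:
        return [f"expected columns {EXPECTED_COLUMNS}, got {reader.fieldnames}"]

    problems = []
    seen = set()
    for row in reader:
        name, label = row["filename"], row["class_name"]
        if name in seen:
            problems.append(f"duplicate filename: {name}")
        seen.add(name)
        if label not in valid_classes:
            problems.append(f"unknown class for {name}: {label}")

    missing = set(expected_filenames) - seen
    if missing:
        problems.append(f"{len(missing)} expected file(s) have no prediction")
    return problems
```

In this notebook, `valid_classes` could be the class folder names and `expected_filenames` the contents of `sketch_test`, with `csv_text` read from the saved CSV file.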

---

### Scoring Method

The performance of your model will be determined by **F1 Score**.

The baseline solution in this notebook scores **0.4093**, rounded to 4 decimal places.
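
The task description does not say how the per-class F1 scores are averaged. The validation cell in this notebook uses `average='weighted'`, so the sketch below works through that variant alongside the macro average for comparison. It is a pure-Python illustration under that assumption, and the function names are hypothetical.

```python
from collections import Counter


def per_class_f1(y_true, y_pred):
    """Return {class: F1}, computed one-vs-rest for every class seen in y_true or y_pred."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[t] += 1  # true class gains a false negative
    scores = {}
    for c in set(y_true) | set(y_pred):
        denom = 2 * tp[c] + fp[c] + fn[c]
        scores[c] = 2 * tp[c] / denom if denom else 0.0
    return scores


def weighted_f1(y_true, y_pred):
    """Average per-class F1, weighting each class by its number of true samples."""
    scores = per_class_f1(y_true, y_pred)
    support = Counter(y_true)
    return sum(scores[c] * support[c] for c in support) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all classes seen in y_true or y_pred."""
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores)
```

With `y_true = ["ant", "ant", "apple", "swan"]` and `y_pred = ["ant", "apple", "apple", "ant"]`, the per-class scores are 1/2 for `ant`, 2/3 for `apple` and 0 for `swan`, giving a weighted F1 of 5/12 and a macro F1 of 7/18. The two averages diverge more when class sizes are imbalanced.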

---

### Architecture Restrictions

1. You **may not** use any pre-trained models, apart from those provided in [torchvision](https://docs.pytorch.org/vision/main/models.html).
2. You **may not** train your model directly on the validation or test samples in any way. This includes any unsupervised learning technique, calculating statistics over them, running GPT-5 agents etc.
3. You **may not** use any external dataset.
4. You **may not** label the unlabeled sketches or test set with any pre-trained model, e.g. CLIP.


# Baseline

### Loading the dataset

In [None]:
import torchvision
from torchvision.datasets import ImageFolder

In [None]:
cartoon_ds = ImageFolder(root="task_data/cartoon")
photo_ds = ImageFolder(root="task_data/photograph")
sketch_ds = ImageFolder(root="task_data/sketch")

### Preparing the data
For this baseline, we will take a very simple approach: combine all the labelled data (cartoons and photographs), train on it, and use the resulting model to predict on the test data.
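
Combining the two datasets only keeps labels consistent if both roots contain the same class folders, because `ImageFolder` assigns label indices from the sorted subfolder names. The check below is a small sketch of that idea using only the standard library; `class_folders` and `label_mismatches` are hypothetical helpers, not part of the baseline.

```python
import os


def class_folders(root):
    """Sorted subfolder names under `root`, the order ImageFolder uses for label indices."""
    return sorted(entry.name for entry in os.scandir(root) if entry.is_dir())


def label_mismatches(root_a, root_b):
    """Return class folders that exist under only one of the two roots."""
    a, b = set(class_folders(root_a)), set(class_folders(root_b))
    return sorted(a ^ b)
```

For this dataset the call would be `label_mismatches("task_data/cartoon", "task_data/photograph")`. An empty list suggests that index `i` refers to the same class in both datasets.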

In [None]:
import torch
import torch.nn as nn
from torch.utils.data import ConcatDataset, DataLoader, random_split
from torchvision.transforms import v2

# Per-sample preprocessing, run on the CPU inside the DataLoader workers
transform = v2.Compose([
    v2.ToImage(),
    v2.Resize(256),
    v2.CenterCrop(224)
])

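# Batch-level transforms, applied after the batch has been moved to `device`.
# Note: the training, validation and test loops all use this pipeline,
# so the random flip is also active at evaluation time.
gpu_transform = v2.Compose([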
    v2.RandomHorizontalFlip(p=0.5),
    v2.ToDtype(torch.float32, scale=True),
    v2.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
])

cartoon_ds.transform = transform
photo_ds.transform = transform

ds = ConcatDataset([cartoon_ds, photo_ds])
train_ds, val_ds = random_split(ds, [0.7, 0.3], generator=torch.Generator().manual_seed(42))
train_loader = DataLoader(train_ds, batch_size=128, shuffle=True, num_workers=2, prefetch_factor=1, persistent_workers=True)
val_loader = DataLoader(val_ds, batch_size=128, shuffle=False, num_workers=2, persistent_workers=True)

### Training the model
We will fine-tune a pre-trained ResNet34 for this baseline: the backbone is frozen and only a newly added final layer is trained.

In [None]:
import torchvision.models as models

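# `pretrained=True` is deprecated in recent torchvision releases;
# this weights enum selects the same ImageNet checkpoint.
model = models.resnet34(weights=models.ResNet34_Weights.IMAGENET1K_V1)
model.requires_grad_(False)  # freeze the backbone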

num_classes = len(cartoon_ds.classes)
model.fc = nn.Linear(model.fc.in_features, num_classes)

device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)



Downloading: "https://download.pytorch.org/models/resnet34-b627a593.pth" to /root/.cache/torch/hub/checkpoints/resnet34-b627a593.pth


100%|██████████| 83.3M/83.3M [00:00<00:00, 203MB/s]


In [None]:
from tqdm.auto import tqdm

# Define loss function and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

In [None]:
num_epochs = 10

best_val_loss = float('inf')
best_model_weights = None

for epoch in range(num_epochs):
    model.train()
    running_loss = 0.0
    for i, (inputs, labels) in enumerate(tqdm(train_loader, desc=f"Epoch {epoch+1} Training")):
        inputs = gpu_transform(inputs.to(device))
        labels = labels.to(device)

        optimizer.zero_grad()
        outputs = model(inputs)
        loss = criterion(outputs, labels)

        loss.backward()
        optimizer.step()

        running_loss += loss.item() * inputs.size(0)

    epoch_loss = running_loss / len(train_loader.dataset)

    # Validation phase
    model.eval()
    val_running_loss = 0.0
    with torch.no_grad():
        for inputs, labels in tqdm(val_loader, desc=f"Epoch {epoch+1} Validation"):
            inputs = gpu_transform(inputs.to(device))
            labels = labels.to(device)

            outputs = model(inputs)
            loss = criterion(outputs, labels)

            val_running_loss += loss.item() * inputs.size(0)

    val_epoch_loss = val_running_loss / len(val_loader.dataset)

    print(f"Epoch {epoch+1}/{num_epochs}, Train Loss: {epoch_loss:.4f}, Val Loss: {val_epoch_loss:.4f}")

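    # Save best model weights. state_dict() returns references to the live tensors,
    # so clone them; otherwise later epochs would overwrite the "best" snapshot.
    if val_epoch_loss < best_val_loss:
        best_val_loss = val_epoch_loss
        best_model_weights = {k: v.detach().clone() for k, v in model.state_dict().items()}
        print("Saved best model weights!")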

print("Model training finished.")

Epoch 1/10, Train Loss: 2.1458, Val Loss: 1.2119
Saved best model weights!
Epoch 2/10, Train Loss: 1.0328, Val Loss: 0.9539
Saved best model weights!
Epoch 3/10, Train Loss: 0.8522, Val Loss: 0.8676
Saved best model weights!
Epoch 4/10, Train Loss: 0.7521, Val Loss: 0.8275
Saved best model weights!
Epoch 5/10, Train Loss: 0.6906, Val Loss: 0.8014
Saved best model weights!
Epoch 6/10, Train Loss: 0.6382, Val Loss: 0.7810
Saved best model weights!
Epoch 7/10, Train Loss: 0.6012, Val Loss: 0.7842
Epoch 8/10, Train Loss: 0.5659, Val Loss: 0.7745
Saved best model weights!
Epoch 9/10, Train Loss: 0.5421, Val Loss: 0.7718
Saved best model weights!
Epoch 10/10, Train Loss: 0.5174, Val Loss: 0.7757
Model training finished.


In [None]:
# Load the best model weights after training
if best_model_weights is not None:
    model.load_state_dict(best_model_weights)
    print(f"Model restored to best validation loss ( {best_val_loss:.4f} ) state.")
else:
    print("No best model weights saved.")

Model restored to best validation loss ( 0.7718 ) state.


# Evaluation

### Loading test dataset

In [None]:
import os
from PIL import Image
from torch.utils.data import Dataset

class SketchTestDataset(Dataset):
    def __init__(self, root_dir, transform=None):
        self.root_dir = root_dir
        self.transform = transform
        self.image_files = [f for f in os.listdir(root_dir) if f.endswith(('.jpg', '.jpeg', '.png'))]

    def __len__(self):
        return len(self.image_files)

    def __getitem__(self, idx):
        filename = self.image_files[idx]
        img_path = os.path.join(self.root_dir, filename)
        image = Image.open(img_path).convert('RGB')

        if self.transform:
            image = self.transform(image)

        return image, filename

In [None]:
val_dataset = SketchTestDataset(root_dir='task_data/sketch_val', transform=transform)
val_loader = DataLoader(val_dataset, batch_size=128, shuffle=False, num_workers=1, persistent_workers=True)

test_dataset = SketchTestDataset(root_dir='task_data/sketch_test', transform=transform)
test_loader = DataLoader(test_dataset, batch_size=128, shuffle=False, num_workers=2, persistent_workers=True)

print(f"Number of images in val dataset: {len(val_dataset)}")
print(f"Number of images in test dataset: {len(test_dataset)}")

Number of images in val dataset: 200
Number of images in test dataset: 4624


### Validation

In [None]:
model.eval()
all_predictions = []
all_filenames = []

with torch.no_grad():
    for inputs, filenames in tqdm(val_loader, desc="Predicting on Validation Data"):
        inputs = gpu_transform(inputs.to(device))
        outputs = model(inputs)
        _, predicted_indices = torch.max(outputs, 1)
        all_predictions.extend(predicted_indices.cpu().numpy())
        all_filenames.extend(filenames)

print("Predictions completed.")

Predictions completed.


In [None]:
import pandas as pd
from sklearn.metrics import f1_score

class_names = cartoon_ds.classes
predicted_class_names = [class_names[idx] for idx in all_predictions]
assert len(all_filenames) == len(predicted_class_names), "Mismatch in lengths of filenames and predicted class names!"

val_submission_df = pd.DataFrame({
    'filename': all_filenames,
    'class_name': predicted_class_names
})
val_solution_df = pd.read_csv('task_data/val.csv')

merged_df = pd.merge(val_submission_df, val_solution_df, on='filename', suffixes=('_predicted', '_true'))

y_true = merged_df['class_name_true']
y_pred = merged_df['class_name_predicted']

print(f"F1 Score: {f1_score(y_true, y_pred, average='weighted')}")

F1 Score: 0.4022744495647721
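
A single score hides which classes the model confuses on sketches. One option is to count the most frequent (true, predicted) pairs among the errors. The sketch below is a hypothetical helper rather than part of the baseline; it accepts any two equal-length sequences of labels, such as the `y_true` and `y_pred` columns above.

```python
from collections import Counter


def top_confusions(y_true, y_pred, k=5):
    """Return the k most common (true, predicted) pairs among misclassified samples."""
    errors = Counter((t, p) for t, p in zip(y_true, y_pred) if t != p)
    return errors.most_common(k)
```

Each returned item has the form `((true_class, predicted_class), count)`, so `top_confusions(y_true, y_pred)` lists the five confusions that occur most often on the validation sketches.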


### Predict on test

In [None]:
model.eval()
all_predictions = []
all_filenames = []

with torch.no_grad():
    for inputs, filenames in tqdm(test_loader, desc="Predicting on Test Data"):
        inputs = gpu_transform(inputs.to(device))
        outputs = model(inputs)
        _, predicted_indices = torch.max(outputs, 1)
        all_predictions.extend(predicted_indices.cpu().numpy())
        all_filenames.extend(filenames)

print("Predictions completed.")

Predictions completed.


### Save to CSV

In [None]:
class_names = cartoon_ds.classes
predicted_class_names = [class_names[idx] for idx in all_predictions]

assert len(all_filenames) == len(predicted_class_names), "Mismatch in lengths of filenames and predicted class names!"

submission_df = pd.DataFrame({
    'filename': all_filenames,
    'class_name': predicted_class_names
})

submission_df.to_csv('submission.csv', index=False)

print("submission.csv created successfully.")
print(submission_df.head())

submission.csv created successfully.
                                   filename  class_name
0  38203cd6-8c44-4996-a32c-5187f3c9155f.jpg        swan
1  694d5198-6846-42c8-9592-f8b2efa8cb4e.jpg       shark
2  93e797b3-7127-4f72-8897-a961727334fb.jpg  headphones
3  82d12a60-7236-4292-a918-eed01a8e16ed.jpg   butterfly
4  fbac0949-cc63-4722-8271-0303622ac2f4.jpg       couch
