## Challenge 1

This Colab notebook evaluates the algorithm you developed for Challenge 1 and produces the file to submit to the Codalab leaderboard:

https://codalab.lisn.upsaclay.fr/competitions/11115?secret_key=e47c9134-ebbc-4132-8552-06b8fa3e3df3

Modify the code below to use your algorithm. It will train on 25 pre-selected seeds and produce test output, which you can then upload to see your score on the leaderboard.

Note that the output file `submission.zip` is created inside the Colab session. Click the folder icon on the left to see it and download it. This is the file you submit to the leaderboard.

For a top score to count, you must also add your version of this notebook to the zip file so that it can be inspected for violations of the contest rules. Entries that violate the rules will be removed from the leaderboard.

Violations of the contest rules and spirit include, for example:
- trying to label the test data in other ways
- violating the Challenge 1 rules (e.g. using pre-trained models)
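
Adding the notebook to the archive can be done from a cell once the notebook file is available in the session. The following is a minimal sketch, assuming the notebook has been saved as `challenge1.ipynb` (a placeholder name) and that placing it at the top level of the zip is acceptable; neither detail is specified by the contest text above.

```python
import zipfile
from pathlib import Path


def add_notebook_to_submission(notebook_path, submission_path="submission.zip"):
    """Append the notebook to the submission archive unless it is already there."""
    notebook_path = Path(notebook_path)
    # Mode "a" keeps existing members and creates the archive if it is missing.
    with zipfile.ZipFile(submission_path, mode="a") as archive:
        if notebook_path.name not in archive.namelist():
            # Assumption: the notebook is stored at the top level of the archive.
            archive.write(notebook_path, arcname=notebook_path.name)
        return archive.namelist()


# Hypothetical usage ("challenge1.ipynb" is a placeholder file name):
# add_notebook_to_submission("challenge1.ipynb")
```

Checking `namelist()` first avoids storing the notebook twice if the cell is run again.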

### Downloads

Dataset Files:

- Training splits (contain targets): https://drive.google.com/file/d/1-Olha5krpVYKOy7AIPsn20lkQftSVrDy/view?usp=share_link
- Testing splits (no labels): https://drive.google.com/file/d/1-GLipKH63MaTXj0zLqtNtMDKpFGEasB_/view?usp=share_link


Run the following pair of cells to download the files to your session.

In [36]:
!pip install gdown

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/


In [37]:
import gdown

file_ids = {'train_data.zip': '1-Olha5krpVYKOy7AIPsn20lkQftSVrDy',
            'test_data.zip': '1-GLipKH63MaTXj0zLqtNtMDKpFGEasB_'}
url_template = 'https://drive.google.com/uc?id={file_id}'

for filename, file_id in file_ids.items():
    url = url_template.format(file_id=file_id)
    gdown.download(url, filename, quiet=False)

Downloading...
From: https://drive.google.com/uc?id=1-Olha5krpVYKOy7AIPsn20lkQftSVrDy
To: /content/train_data.zip
100%|██████████| 15.4M/15.4M [00:00<00:00, 99.9MB/s]
Downloading...
From: https://drive.google.com/uc?id=1-GLipKH63MaTXj0zLqtNtMDKpFGEasB_
To: /content/test_data.zip
100%|██████████| 307M/307M [00:01<00:00, 214MB/s]


## Imports

In [38]:
import os
import re
import zipfile
import torch
import torch.nn as nn 
import torch.nn.functional as F
import numpy as np
import torch.optim as optim
import matplotlib.pyplot as plt

from tqdm.notebook import tqdm
from torch.utils.data import TensorDataset
from numpy.random import RandomState
from torch.utils.data import Subset, DataLoader
from torchvision import datasets, transforms

- Import dataset and define preprocessing functions

In [39]:
normalize = transforms.Normalize((0.4914, 0.4822, 0.4465), (0.247, 0.243, 0.261))

transform_val = transforms.Compose([transforms.ToTensor(), normalize])  # validation/test transform: keep this one unchanged
transform_train = transforms.Compose([transforms.ToTensor(), normalize])

use_cuda = torch.cuda.is_available()
device = torch.device("cuda" if use_cuda else "cpu")
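
For reference, `transforms.ToTensor()` maps 8-bit pixel values to `[0, 1]`, and `transforms.Normalize(mean, std)` then rescales each channel as `(x - mean) / std`. The plain-Python sketch below mirrors that arithmetic for a single RGB pixel using the constants from the cell above; the helper name `normalize_pixel` is made up for illustration.

```python
# Per-channel statistics copied from the `normalize` transform above.
MEAN = (0.4914, 0.4822, 0.4465)
STD = (0.247, 0.243, 0.261)


def normalize_pixel(rgb_uint8):
    """Apply the ToTensor scaling followed by per-channel normalization."""
    scaled = [value / 255.0 for value in rgb_uint8]  # ToTensor: 0..255 -> 0..1
    return tuple((x - m) / s for x, m, s in zip(scaled, MEAN, STD))


black = normalize_pixel((0, 0, 0))        # every channel becomes -mean / std
white = normalize_pixel((255, 255, 255))  # every channel becomes (1 - mean) / std
```

After normalization the values are no longer confined to `[0, 1]`, which matters if a normalized tensor is later converted back to an image.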


## Training Template

- Here we provide some code for you to get started with a baseline

In [40]:
def train(model, device, train_loader, optimizer, epoch, display=True):
    model.train()
    correct = 0
    for batch_idx, (data, target) in enumerate(train_loader):
        target = target.float()
        data, target = data.to(device), target.to(device)
        
        optimizer.zero_grad()
        output = model(data).squeeze()

        loss = F.binary_cross_entropy_with_logits(output, target)
        loss.backward()
        optimizer.step()

        pred = output > 0.5 
        correct += pred.eq(target.view_as(pred)).sum().item()
    if display:
        print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
              epoch, batch_idx * len(data), len(train_loader.dataset),
              100. * batch_idx / len(train_loader), loss.item()))
    return correct / len(train_loader.dataset)

def predict(model,
            test_data,
            device=None,
            data_transform=transform_val,
            data_desc='prediction set'):
    # Use the requested device if given; otherwise prefer CUDA when it is available
    device = torch.device(device or ("cuda" if torch.cuda.is_available() else "cpu"))
    model.eval()
    to_image = transforms.ToPILImage()
    predictions = []
    with torch.no_grad():
        for data in tqdm(test_data, desc=data_desc):
            data = torch.tensor(data).to(device)
            data = to_image(data)
            data = data_transform(data)
            data = data.to(device)
            output = model(data.unsqueeze(0)).squeeze(0)
            pred = 1 if output.item() > 0.5 else 0
            #pred = torch.argmax(output_sm1)
            predictions.append(pred)
    return predictions
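
`predict` turns the model output into a label by comparing it with 0.5. That matches a model whose last layer is a sigmoid, so that outputs lie in (0, 1). If a model returns raw logits instead, the equivalent decision boundary is 0, because sigmoid(z) > 0.5 exactly when z > 0. A small plain-Python sketch of that equivalence (the function names are illustrative, not part of the notebook):

```python
import math


def sigmoid(z):
    """Logistic function mapping a logit to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def label_from_probability(p, threshold=0.5):
    """Decision rule for models that output probabilities."""
    return 1 if p > threshold else 0


def label_from_logit(z):
    """Equivalent decision rule for models that output raw logits."""
    return 1 if z > 0.0 else 0


# Both rules agree on every logit in this sample.
agreement = [label_from_probability(sigmoid(z)) == label_from_logit(z)
             for z in (-2.0, -0.1, 0.3, 4.0)]
```

Mixing the two conventions changes the decision: a logit of 0.3 corresponds to a probability of about 0.57 (label 1), but comparing the raw logit with 0.5 would give label 0.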

In [41]:
class SmallNet(nn.Module):
    def __init__(self):
        super(SmallNet, self).__init__()
        self.conv1 = nn.Conv2d(3, 16, kernel_size=3, stride=1, padding=1)
        self.bn1 = nn.BatchNorm2d(16)
        self.pool1 = nn.MaxPool2d(kernel_size=2, stride=2)
        self.conv2 = nn.Conv2d(16, 32, kernel_size=3, stride=1, padding=1)
        self.conv3 = nn.Conv2d(32, 32, kernel_size=3, stride=1, padding=1)
        self.bn2 = nn.BatchNorm2d(32)
        self.pool2 = nn.MaxPool2d(kernel_size=2, stride=2)
        self.fc1 = nn.Linear(32 * 8 * 8, 1024)
        self.drop = nn.Dropout(p=0.5)
        self.fc2 = nn.Linear(1024, 1)

    def forward(self, x):
        x = self.pool1(nn.functional.relu(self.bn1(self.conv1(x))))
        x = nn.functional.relu(self.conv2(x))
        x = self.pool2(nn.functional.relu(self.bn2(self.conv3(x))))
        x = x.view(-1, 32 * 8 * 8)
        x = nn.functional.relu(self.fc1(x))
        x = self.drop(x)
        x = self.fc2(x)
        return torch.nn.functional.sigmoid(x)

# Define the augmentation network architecture
class AugNet(nn.Module):
    def __init__(self):
        super(AugNet, self).__init__()
        self.conv1 = nn.Conv2d(6, 16, kernel_size=3, stride=1, padding=1)
        self.conv2 = nn.Conv2d(16, 16, kernel_size=3, stride=1, padding=1)
        self.conv3 = nn.Conv2d(16, 16, kernel_size=3, stride=1, padding=1)
        self.conv4 = nn.Conv2d(16, 16, kernel_size=3, stride=1, padding=1)
        self.conv5 = nn.Conv2d(16, 3, kernel_size=3, stride=1, padding=1)

    def forward(self, x):
        x = nn.functional.relu(self.conv1(x))
        x = nn.functional.relu(self.conv2(x))
        x = nn.functional.relu(self.conv3(x))
        x = nn.functional.relu(self.conv4(x))
        x = self.conv5(x)
        return x
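
The `32 * 8 * 8` input size of `fc1` in `SmallNet` follows from the layer settings if the input images are 32x32 (an assumption here; it is the size for which the arithmetic works out). The sketch below traces one spatial dimension through the layers using the standard output-size formula; `conv_out` and `smallnet_flat_features` are illustrative helper names.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output length of a conv or pooling layer along one spatial dimension."""
    return (size + 2 * padding - kernel) // stride + 1


def smallnet_flat_features(image_size=32, channels=32):
    """Number of features entering fc1 for a square input of `image_size`."""
    s = conv_out(image_size, kernel=3, stride=1, padding=1)  # conv1 keeps the size
    s = conv_out(s, kernel=2, stride=2)                      # pool1 halves it
    s = conv_out(s, kernel=3, stride=1, padding=1)           # conv2 keeps the size
    s = conv_out(s, kernel=3, stride=1, padding=1)           # conv3 keeps the size
    s = conv_out(s, kernel=2, stride=2)                      # pool2 halves it again
    return channels * s * s


flat = smallnet_flat_features(32)
```

The same formula shows that every layer of `AugNet` (3x3 kernels, stride 1, padding 1) preserves the spatial size, so its 3-channel output has the same height and width as its 6-channel input.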

- The cells below assume you have uploaded the provided `train_data.zip` file, which contains the given training splits. It will validate on random samples.

    Keep in mind that for small-sample problems the variance is high, so continuously evaluating on several subsets will be important.
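
One way to evaluate on several subsets is repeated random hold-out: shuffle the sample indices a few times, hold out a fraction each time, and report the mean and spread of the scores. The sketch below uses only the standard library; the function names, the 20% validation fraction, and the five repeats are illustrative choices, not part of the challenge template.

```python
import random
import statistics


def repeated_holdout_splits(n_samples, val_fraction=0.2, repeats=5, seed=0):
    """Yield (train_indices, val_indices) pairs from independent shuffles."""
    rng = random.Random(seed)
    n_val = max(1, int(round(n_samples * val_fraction)))
    for _ in range(repeats):
        indices = list(range(n_samples))
        rng.shuffle(indices)
        yield indices[n_val:], indices[:n_val]


def summarize(scores):
    """Return the mean and the population standard deviation of the scores."""
    return statistics.mean(scores), statistics.pstdev(scores)


# Hypothetical usage: `train_and_score` stands in for your own training and
# validation code and is not defined in this notebook.
# scores = [train_and_score(tr, va) for tr, va in repeated_holdout_splits(n_samples)]
# mean_acc, std_acc = summarize(scores)
```

A large standard deviation across repeats is a sign that a single validation split would give a misleading picture.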

In [42]:
# Load datasets
train_splits = []
train_targets = []
with zipfile.ZipFile('train_data.zip', mode='r') as archive:
    for a in archive.namelist():
        with archive.open(a) as f:
            d = np.load(f, allow_pickle=True)
            train_splits.append(d[0])
            train_targets.append(d[1])

test_splits = []
with zipfile.ZipFile('test_data.zip', mode='r') as archive:
    for a in archive.namelist():
        with archive.open(a) as f:
            d = np.load(f, allow_pickle=True)
            test_splits.append(d)

- You may use the following script to run your training experiments.

In [43]:
models = []
train_accuracies = []
eval_predictions = []
for i, (train_X, train_Y, test_X) in enumerate(zip(tqdm(train_splits,
                                                        desc='seeds loop'),
                                                   train_targets,
                                                   test_splits)):

    config = dict(batch_size=64,
                  epochs=100,
                  learning_rate=0.0001,
                  momentum=0.9,
                  weight_decay=5e-4)
    small_net = SmallNet()
    small_net.to(device)
    net_aug = AugNet()
    net_aug.to(device)

    small_net_loss = nn.CrossEntropyLoss()
    to_image = transforms.ToPILImage()

    optimizer = optim.Adam(list(small_net.parameters()) + list(net_aug.parameters()), lr=config["learning_rate"], weight_decay=config["weight_decay"])

 
    small_net.train()
    net_aug.train()

    train_loader = DataLoader(TensorDataset(train_X, train_Y),
                              batch_size=config['batch_size'],
                              shuffle=True)
    
    correct = 0
    total = 0
    for epoch in tqdm(range(config['epochs']), desc=f'training model {i+1}'):
      
      for j, data in enumerate(train_loader):

          inputs, labels = data
          inputs = inputs.to(device)
          labels = labels.to(device)
          iter = 0
          for input, label in zip(inputs, labels):
            

            #Original Image
            input = to_image(input)
            input = transform_train(input)
            input = input.to(device)
            output_sm1 = small_net(input.unsqueeze(0)).squeeze(0)

            #print(output_sm1)

            pred = 1 if output_sm1.item() > 0.5 else 0
            #pred = torch.argmax(output_sm1)
            #print(pred)

            if pred == label:
              correct += 1
            total += 1   


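            #loss_sm1 = small_net_loss(output_sm1, label)
            # SmallNet already ends in a sigmoid, so use the probability form of BCE
            # (the *_with_logits variant would apply a second sigmoid)
            loss_sm1 = F.binary_cross_entropy(output_sm1[0], label.float())
            #loss_sm1.backward()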


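            #Generate Aug Image

            # Positions in this batch whose label matches the current sample
            class_indices = [idx for idx, _ in enumerate(inputs) if labels[idx] == label]

            # Draw three of those positions at random and map them back to batch
            # indices (needs at least three same-class samples in the batch)
            sampled_indices = [class_indices[k] for k in torch.randperm(len(class_indices))[:3].tolist()]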


            aug_image = torch.cat([inputs[sampled_indices[0]], inputs[sampled_indices[1]]], dim=0).to(device)
            aug_image_compare = inputs[sampled_indices[2]]

           
            output_aug = net_aug(aug_image)
            

            C, W, H = output_aug.shape

            content_loss = torch.sum((output_aug - aug_image_compare) ** 2) / (C * W * H)

            output_aug = to_image(output_aug)
            output_aug = transform_train(output_aug)
            output_aug = output_aug.to(device)

            output_sm2 = small_net(output_aug.unsqueeze(0)).squeeze(0)

            pred = 1 if output_sm2.item() > 0.5 else 0

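            #loss_sm2 = small_net_loss(output_sm2, label)
            # SmallNet already ends in a sigmoid, so use the probability form of BCE
            # (the *_with_logits variant would apply a second sigmoid)
            loss_sm2 = F.binary_cross_entropy(output_sm2[0], label.float())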


            

            total_loss = 0.75 * (loss_sm1 + loss_sm2) + 0.25 * content_loss

            
            optimizer.zero_grad()
            total_loss.backward()
            optimizer.step()
 

      acc = (correct) / total      
 

    print('Seed {}/{} Loss: {} Accuracy: {}'.format(i + 1, len(train_splits), total_loss, acc))

    # evaluate model

    model = small_net

    train_accuracies.append(acc)
    models.append(model)

    pred_Y = predict(model, test_X, 
                     data_desc=f'CODALab eval instance {i+1}')
    eval_predictions.append(pred_Y)

print(f'Mean Train Acc over {len(train_splits)} models: '\
      f'{np.mean(train_accuracies):.2%} '\
      f'+- {np.std(train_accuracies):.2}')

Seed 1/25 Loss: 0.9691410064697266 Accuracy: 0.9354
Seed 2/25 Loss: 1.3290446996688843 Accuracy: 0.9506
Seed 3/25 Loss: 1.1279195547103882 Accuracy: 0.9488
Seed 4/25 Loss: 1.3183926343917847 Accuracy: 0.7812
Seed 5/25 Loss: 1.247666835784912 Accuracy: 0.9714
Seed 6/25 Loss: 1.2359743118286133 Accuracy: 0.9038
Seed 7/25 Loss: 1.3093739748001099 Accuracy: 0.8324
Seed 8/25 Loss: 1.2494412660598755 Accuracy: 0.6726
Seed 9/25 Loss: 1.0283678770065308 Accuracy: 0.9434
Seed 10/25 Loss: 1.2638731002807617 Accuracy: 0.9356
Seed 11/25 Loss: 1.0265114307403564 Accuracy: 0.9588
Seed 12/25 Loss: 1.2600263357162476 Accuracy: 0.9358
Seed 13/25 Loss: 1.1215975284576416 Accuracy: 0.9542
Seed 14/25 Loss: 1.3086133003234863 Accuracy: 0.9454
Seed 15/25 Loss: 1.068359375 Accuracy: 0.957
Seed 16/25 Loss: 0.8545812368392944 Accuracy: 0.969
Seed 17/25 Loss: 1.3347742557525635 Accuracy: 0.9556
Seed 18/25 Loss: 0.9147590398788452 Accuracy: 0.9666
Seed 19/25 Loss: 0.8458008766174316 Accuracy: 0.9318
Seed 20/25 Loss: 0.8869202136993408 Accuracy: 0.9588
Seed 21/25 Loss: 1.2399486303329468 Accuracy: 0.94
Seed 22/25 Loss: 1.3944847583770752 Accuracy: 0.9064
Seed 23/25 Loss: 1.2514731884002686 Accuracy: 0.9278
Seed 24/25 Loss: 1.1340711116790771 Accuracy: 0.964
Seed 25/25 Loss: 0.9614386558532715 Accuracy: 0.9538

Mean Train Acc over 25 models: 92.40% +- 0.066
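
The summary line can be checked against the per-seed training accuracies logged above. The sketch below recomputes it with the standard library; `statistics.pstdev` is used because `np.std` defaults to the population standard deviation. The list is copied from the log, and the variable names are illustrative.

```python
import statistics

# Per-seed training accuracies copied from the log above (seeds 1 to 25).
logged_accuracies = [
    0.9354, 0.9506, 0.9488, 0.7812, 0.9714,
    0.9038, 0.8324, 0.6726, 0.9434, 0.9356,
    0.9588, 0.9358, 0.9542, 0.9454, 0.957,
    0.969, 0.9556, 0.9666, 0.9318, 0.9588,
    0.94, 0.9064, 0.9278, 0.964, 0.9538,
]

mean_acc = statistics.mean(logged_accuracies)
std_acc = statistics.pstdev(logged_accuracies)  # population std, like np.std

print(f"Mean Train Acc over {len(logged_accuracies)} models: "
      f"{mean_acc:.2%} +- {std_acc:.2}")
```

Note that the mean is formatted as a percentage while the spread is printed as a plain fraction, so `0.066` corresponds to about 6.6 percentage points. These are training accuracies, so they say little about how the models generalize.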


## Test and Submit to CODALab

**RUN THE FOLLOWING CELL** to save the test predictions.

Results are saved using the CODALab path convention and zipped at the end.

Download the generated `submission.zip` file and upload it to CODALab.

You may _only_ change the `models` argument of `zip_evaluation` to test your own neural network (if you change anything else, your submission might crash in CODALab!).
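
Before uploading, it can help to sanity-check the archive once the cell below has produced `submission.zip`. The sketch below assumes one `predictions_<i>.txt` file per seed, as written by the cell below, with 25 seeds and 1000 test samples each (the counts shown in this notebook's progress output), and labels stored as 0 or 1. The function `check_submission` is an illustrative helper, not part of the CODALab tooling.

```python
import re
import zipfile


def check_submission(path="submission.zip", n_seeds=25, n_test=1000):
    """Return a list of problems found in the archive (empty if none)."""
    pattern = re.compile(r"predictions_(\d+)\.txt$")
    problems = []
    with zipfile.ZipFile(path) as archive:
        # Group member names by seed index; duplicates show up as extra entries.
        members = {}
        for name in archive.namelist():
            match = pattern.search(name)
            if match:
                members.setdefault(int(match.group(1)), []).append(name)
        for seed in range(n_seeds):
            names = members.get(seed, [])
            if len(names) != 1:
                problems.append(f"seed {seed}: expected 1 file, found {len(names)}")
                continue
            values = [float(tok) for tok in archive.read(names[0]).decode().split()]
            if len(values) != n_test:
                problems.append(f"seed {seed}: expected {n_test} predictions, found {len(values)}")
            if any(v not in (0.0, 1.0) for v in values):
                problems.append(f"seed {seed}: predictions must be 0 or 1")
    return problems


# Hypothetical usage after the archive has been created:
# print(check_submission("submission.zip") or "submission looks consistent")
```

Duplicate entries are worth checking for because the archive is opened in append mode, so running the zipping cell twice adds each prediction file a second time.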

In [44]:
def zip_evaluation(models,
                   test_data_location = 'test_data',
                   test_data_file='test_data.zip',
                   submission_file='submission.zip',
                   predpath='./submission/input/res',
                   verbose=False):
    verb = lambda *s: print('[INFO] ', *s) if verbose else None
    
    verb(f'Files will be inside {predpath}')
    if not os.path.exists(predpath):
        os.makedirs(predpath)

    verb(f'Compressing in {submission_file}')
    with zipfile.ZipFile(submission_file, mode='a') as archive:
        for i, pred_Y in tqdm(enumerate(eval_predictions),
                              total=len(eval_predictions),
                              desc='seed loop'):
            np.savetxt(f'{predpath}/predictions_{i}.txt', pred_Y)
            archive.write(f'{predpath}/predictions_{i}.txt')
    verb('All Done Successfully :)')

zip_evaluation(model, verbose=True)

[INFO]  Files will be inside ./submission/input/res
[INFO]  Compressing in submission.zip


[INFO]  All Done Successfully :)
