# Train a model with Classical Training

Although episodic training has attracted a lot of interest in the early years of Few-Shot Learning research, more recent works suggest that competitive results can be achieved with a simple cross entropy loss across all training classes. Therefore, it is becoming more and more common to use this classical process to train the backbone, that will be common to all methods compared at test time.

This is in fact more representative of real use cases: episodic training assumes that, at training time, you have access to the shape of the few-shot tasks that will be encountered at test time (indeed you choose a specific number of ways for episodic training). You also "force" your inference method into the training of the network. Switching the few-shot learning logic to inference (i.e. no episodic training) allows methods to be agnostic of the backbone.

Nonetheless, if you need to perform episodic training, we also provide [an example notebook](episodic_training.ipynb) for that.

## Getting started
First we're going to do some imports (this is not the interesting part).

In [1]:
%cd ..

D:\iib\project\github\fsl_gesture


  self.shell.db['dhist'] = compress_dhist(dhist)[-100:]


In [2]:
import copy
from pathlib import Path
import random
from statistics import mean

import numpy as np
import torch
from torch import nn
from tqdm import tqdm
import pandas as pd
from data_utils import *
from get_dataset import *

Then we're gonna do the most important thing in Machine Learning research: ensuring reproducibility by setting the random seed. We're going to set the seed for all random packages that we could possibly use, plus some other stuff to make CUDA deterministic (see [here](https://pytorch.org/docs/stable/notes/randomness.html)).

I strongly encourage that you do this in **all your scripts**.

In [3]:
random_seed = 0
np.random.seed(random_seed)
torch.manual_seed(random_seed)
random.seed(random_seed)
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False

Then we're gonna create our data loader for the training set. You can see that I chose tu use CUB in this notebook, because it’s a small dataset, so we can have good results quite quickly. I set a batch size of 128 but feel free to adapt it to your constraints.

Note that we're not using the `TaskSampler` for the train data loader, because we won't be sampling training data in the shape of tasks as we would have in episodic training. We do it **normally**.

In [4]:
def train_val_split(train_data, train_label, n_class, train_size, val_size, length=300):
    train = np.zeros((train_size*n_class, 3, length, 50))
    t_label = np.zeros(train_size*n_class)
    val = np.zeros((val_size*n_class,3,length,50))
    v_label = np.zeros(val_size*n_class)
    total = train_size + val_size
    for i in range(n_class):
        j = i*total
        k = i*train_size
        w = i*val_size
        train[k:k+train_size] = train_data[j:j+train_size]
        t_label[k:k+train_size] = train_label[j:j+train_size]
        val[w:w+val_size] = train_data[j+train_size:j+total]
        v_label[w:w+val_size] = train_label[j+train_size:j+total]
    return train, t_label,val,v_label

In [5]:
train_path = 'D:/iib_project/data/Gesture_Dataset/gestures/data/train/'
annotation_path = 'D:/iib_project/data/Gesture_Dataset/gestures/annotations/'
trainpart1_path = 'train part1-'
trainpart2_path = 'train part2-'
trainpart3_path = 'train part3-'

p_ids = [2,5,6,7,8,9,10,11,23,24]
single_frame_gestures = {'Thumb Up': trainpart1_path, 'Stop':trainpart1_path, 
                         'paper':trainpart2_path, 'Rock':trainpart3_path,
                         'Heart':trainpart2_path,'Circle':trainpart2_path}

multiframe_gestures = {'Scissor':trainpart2_path,'Gun':trainpart1_path, 
                                   'Grab things':trainpart2_path, 'Nozzle rotation':trainpart1_path,
                                   'Swipe':trainpart1_path, "Drive car":trainpart3_path, 
                                   "Teleport":trainpart3_path, "Two hands scale":trainpart3_path, 
                                   "Two hands delete":trainpart3_path, "Two hands flick":trainpart3_path}

In [6]:
d_set, d_label = get_data_singlestate(train_path, single_frame_gestures, p_ids,50)
#train, t_label,val,v_label = train_val_split(d_set,d_label, 16,30,20)

In [7]:
d_set.shape

(300, 3, 1, 50)

In [9]:
testpart1_path = 'test 1-'
testpart2_path = 'test 2-'
testpart3_path = 'test 3-'
test_path = 'D:/iib_project/data/Gesture_Dataset/gestures/data/test/'
single_frame_gestures_test = {'Thumb Up': testpart1_path, 'Stop':testpart1_path, 
                         'paper':testpart2_path, 'Rock':testpart3_path,
                         'Heart':testpart2_path,'Circle':testpart2_path}

multiframe_gestures_test = {'Scissor':testpart2_path,'Gun':testpart1_path, 
                                   'Grab things':testpart2_path, 'Nozzle rotation':testpart1_path,
                                   'Swipe':testpart1_path, "Drive car":testpart3_path, 
                                   "Teleport":testpart3_path, "Two hands scale":testpart3_path, 
                                   "Two hands delete":testpart3_path, "Two hands flick":testpart3_path,
                           'Null': testpart1_path}

In [60]:
test, test_label = get_dataset(test_path, single_frame_gestures_test,multiframe_gestures_test, p_ids,10)

In [62]:
test.shape

(170, 3, 300, 50)

In [38]:
import torch
from torch.utils.data import Dataset

class GestureDataset(Dataset):
    def __init__(self, gesture_data, labels):
        self.data = gesture_data
        self.labels = labels

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        gesture_sample = torch.tensor(self.data[idx],dtype=torch.float)
        label = torch.tensor(self.labels[idx].astype(np.int64))
        
        return (gesture_sample, label)
    def get_labels(self):
        return self.labels

In [66]:
train_dataset = GestureDataset(train, t_label)
val_dataset = GestureDataset(val, v_label)
test_dataset = GestureDataset(test, test_label)

In [44]:
len(train_dataset)

480

In [67]:
test_label.shape

(170,)

Now, we are going to create the model that we want to train. Here we choose the ResNet12 that is very often used in Few-Shot Learning research. Note that the default setting of these networks in EasyFSL is to not have a last fully connected layer (as it is usual for most Few-Shot Learning methods), but for classical training we need this layer! We also force it to output a vector which size is the number of different classes in the training set.

In [47]:
from easyfsl.modules import resnet12

DEVICE = "cpu"

model = resnet12(
    use_fc=True,
    num_classes=16,
).to(DEVICE)

Now, we still need validation ! Since we're training a model to perform few-shot classification, we will validate on few-shot tasks, so now we'll use the `TaskSampler`. We arbitrarily set the shape of the validation tasks. Ideally, you'd like to perform validation on various shapes of tasks, but we didn't implement this yet (feel free to contribute!).

We also need to define the few-shot classification method that we will use during validation of the neural network we're training.
Here we choose Prototypical Networks, because it's simple and efficient, but this is still an arbitrary choice.

In [58]:
from easyfsl.methods import PrototypicalNetworks
from easyfsl.samplers import TaskSampler
from torch.utils.data import DataLoader

batch_size = 32
n_workers = 0

train_loader = DataLoader(
    train_dataset,
    batch_size=batch_size,
    num_workers=n_workers,
    pin_memory=True,
    shuffle=True,
)
few_shot_classifier = PrototypicalNetworks(model).to(DEVICE)

In [57]:
n_way = 5
n_shot = 5
n_query = 10
n_validation_tasks = 5

val_sampler = TaskSampler(
    val_dataset, n_way=n_way, n_shot=n_shot, n_query=n_query, n_tasks=n_validation_tasks
)
val_loader = DataLoader(
    val_dataset,
    batch_sampler=val_sampler,
    num_workers=n_workers,
    pin_memory=True,
    collate_fn=val_sampler.episodic_collate_fn,
)


## Training

Now let's define our training helpers ! I chose to use Stochastic Gradient Descent on 200 epochs with a scheduler that divides the learning rate by 10 after 120 and 160 epochs. The strategy is derived from [this repo](https://github.com/fiveai/on-episodes-fsl).

We're also gonna use a TensorBoard because it's always good to see what your training curves look like.

An other thing: we're doing 200 epochs like in [the episodic training notebook](notebooks/episodic_training.ipynb), but keep in mind that an epoch in classical training means one pass through the 6000 images of the dataset, while in episodic training it's an arbitrary number of episodes. In the episodic training notebook an epoch is 500 episodes of 5-way, 5-shot, 10-query tasks, so 37500 images. TL;DR you may want to monitor your training and increase the number of epochs if necessary.

In [53]:
from torch.optim import SGD, Optimizer
from torch.optim.lr_scheduler import MultiStepLR
from torch.utils.tensorboard import SummaryWriter


LOSS_FUNCTION = nn.CrossEntropyLoss()

n_epochs = 2
scheduler_milestones = [150, 180]
scheduler_gamma = 0.1
learning_rate = 1e-01
tb_logs_dir = Path(".")

train_optimizer = SGD(
    model.parameters(), lr=learning_rate, momentum=0.9, weight_decay=5e-4
)
train_scheduler = MultiStepLR(
    train_optimizer,
    milestones=scheduler_milestones,
    gamma=scheduler_gamma,
)

tb_writer = SummaryWriter(log_dir=str(tb_logs_dir))

And now let's get to it! Here we define the function that performs a training epoch.

We use tqdm to monitor the training in real time in our logs.

In [54]:
def training_epoch(model_: nn.Module, data_loader: DataLoader, optimizer: Optimizer):
    all_loss = []
    model_.train()
    with tqdm(data_loader, total=len(data_loader), desc="Training") as tqdm_train:
        for images, labels in tqdm_train:
            optimizer.zero_grad()

            loss = LOSS_FUNCTION(model_(images.to(DEVICE)), labels.to(DEVICE))
            loss.backward()
            optimizer.step()

            all_loss.append(loss.item())

            tqdm_train.set_postfix(loss=mean(all_loss))

    return mean(all_loss)

And we have everything we need! This is now the time to **start training**.

A few notes:

- We only validate every 10 epochs (you may set an even less frequent validation) because a training epoch is much faster than 500 few-shot tasks, and we don't want validation to be the bottleneck of our training process.

- I also added something to log the state of the model that gave the best performance on the validation set.

In [59]:
best_state = model.state_dict()
best_validation_accuracy = 0.0
validation_frequency = 1
for epoch in range(n_epochs):
    print(f"Epoch {epoch}")
    average_loss = training_epoch(model, train_loader, train_optimizer)

    if epoch % validation_frequency == validation_frequency - 1:

        # We use this very convenient method from EasyFSL's ResNet to specify
        # that the model shouldn't use its last fully connected layer during validation.
        model.set_use_fc(False)
        validation_accuracy = evaluate(
            few_shot_classifier, val_loader, device=DEVICE, tqdm_prefix="Validation"
        )
        model.set_use_fc(True)

        if validation_accuracy > best_validation_accuracy:
            best_validation_accuracy = validation_accuracy
            best_state = copy.deepcopy(few_shot_classifier.state_dict())
            # state_dict() returns a reference to the still evolving model's state so we deepcopy
            # https://pytorch.org/tutorials/beginner/saving_loading_models
            print("Ding ding ding! We found a new best model!")

        tb_writer.add_scalar("Val/acc", validation_accuracy, epoch)

    tb_writer.add_scalar("Train/loss", average_loss, epoch)

    # Warn the scheduler that we did an epoch
    # so it knows when to decrease the learning rate
    train_scheduler.step()

Epoch 0


Training: 100%|█████████████████████████████████████████████████████| 15/15 [04:00<00:00, 16.03s/it, loss=2.82]
Validation: 100%|████████████████████████████████████████████████| 5/5 [00:45<00:00,  9.17s/it, accuracy=0.652]


Ding ding ding! We found a new best model!
Epoch 1


Training: 100%|█████████████████████████████████████████████████████| 15/15 [03:56<00:00, 15.79s/it, loss=2.06]
Validation: 100%|████████████████████████████████████████████████| 5/5 [00:51<00:00, 10.28s/it, accuracy=0.816]

Ding ding ding! We found a new best model!





Yay we successfully performed Classical Training! Now if you want to you can retrieve the best model's state.

In [63]:
best_state = model.state_dict()
model.load_state_dict(best_state)

<All keys matched successfully>

## Evaluation

Now that our model is trained, we want to test it.

First step: we fetch the test data. Note that we'll evaluate on the same shape of tasks as in validation. This is malicious practice, because it means that we used *a priori* information about the evaluation tasks during training. This is still less malicious than episodic training, though.

In [69]:
n_test_tasks = 100

test_sampler = TaskSampler(
    test_dataset, n_way=5, n_shot=5, n_query=5, n_tasks=n_test_tasks
)
test_loader = DataLoader(
    test_dataset,
    batch_sampler=test_sampler,
    num_workers=n_workers,
    pin_memory=True,
    collate_fn=test_sampler.episodic_collate_fn,
)

In [40]:
d = torch.tensor(val,dtype=torch.float)
l = v_label.astype(np.int64)
few_shot_classifier.process_support_set(d, torch.tensor(l))

In [68]:
test_loader.dataset

<__main__.GestureDataset at 0x24d91e0b700>

Second step: we instantiate a few-shot classifier using our trained ResNet as backbone, and run it on the test data. We keep using Prototypical Networks for consistence, but at this point you could basically use any few-shot classifier that takes no additional trainable parameters.

Like we did during validation, we need to tell our ResNet to not use its last fully connected layer.

In [36]:
def evaluate_val(test_loader):
    model.set_use_fc(True)
    #classifier.process_support_set(val_dataset, v_label)
    few_shot_classifier.process_support_set()
    total_predictions = 0
    correct_predictions = 0
    
    # eval mode affects the behaviour of some layers (such as batch normalization or dropout)
    # no_grad() tells torch not to keep in memory the whole computational graph
    model.eval()
    with torch.no_grad():
        # We use a tqdm context to show a progress bar in the logs
        with tqdm(test_loader,total=len(test_loader)) as tqdm_eval:
            for data in tqdm_eval:
                gestures = data['gesture_data']
                labels = data['label']
                #predictions = model(gestures).detach().data
                outputs = model(gestures)
                _, predictions = torch.max(outputs, 1)
                number_of_correct_predictions = (predictions == labels).sum().item()
                correct_predictions += number_of_correct_predictions
                total_predictions += len(labels)
    
    # accuracy = evaluate(few_shot_classifier, test_loader, device=DEVICE)
    accuracy = correct_predictions/total_predictions
    print(f"Average accuracy : {(100 * accuracy):.2f} %")
    return accuracy

In [63]:
evaluate_val(test_loader)

100%|█████████████████████████████████████████████████████████████████████████████████████████████████| 9/9 [00:10<00:00,  1.18s/it]

Average accuracy : 16.91 %





0.16911764705882354

In [56]:
def evaluate_on_one_task(model, support_images,support_labels, query_images, query_labels):
    """
    Returns the number of correct predictions of query labels, and the total number of
    predictions.
    """
    model.process_support_set(support_images, support_labels)
    predictions = model(query_images).detach().data
    number_of_correct_predictions = int(
        (torch.max(predictions, 1)[1] == query_labels).sum().item()
    )
    return number_of_correct_predictions, len(query_labels)

In [69]:
from easyfsl.methods import FewShotClassifier
def evaluate(model: FewShotClassifier, support_dataloader: DataLoader, query_dataloader:DataLoader):
    total_predictions = 0
    correct_predictions = 0
    model.eval()
    with torch.no_grad():
        support = support_dataloader.dataset[:]['gesture_data']
        s_labels = support_dataloader.dataset[:]['label']
        query = query_dataloader.dataset[:]['gesture_data']
        q_labels = query_dataloader.dataset[:]['label']
        support_data = torch.tensor(support,dtype=torch.float)
        support_labels = torch.tensor(s_labels.astype(np.int64))
        query_data = torch.tensor(query,dtype=torch.float)
        query_labels = torch.tensor(q_labels.astype(np.int64))
        correct, total = evaluate_on_one_task(
            model,
            support_data,
            support_labels,
            query_data,
            query_labels,
        )
        total_predictions += total
        correct_predictions += correct

    return correct_predictions / total_predictions

In [70]:
evaluate(few_shot_classifier,val_loader,test_loader)

  support_data = torch.tensor(support,dtype=torch.float)
  query_data = torch.tensor(query,dtype=torch.float)


KeyboardInterrupt: 

In [70]:
model.set_use_fc(False)

accuracy = evaluate(few_shot_classifier, test_loader, device=DEVICE)
print(f"Average accuracy : {(100 * accuracy):.2f} %")

  3%|█▋                                                        | 3/100 [00:24<13:18,  8.23s/it, accuracy=0.693]


KeyboardInterrupt: 