# CE-40959: Advanced Machine Learning
## HW2 - Optimization-based Meta Learning (100 points)

#### Name: 
#### Student No.: 

In this notebook, you are going to implement an optimization-based meta-learner using the `Omniglot` dataset.

Please write your code in the specified sections and do not change anything else. If you have a question regarding this homework, please ask it on Quera.

It is also recommended to use Google Colab for this homework. You can mount your Google Drive using the code below:

In [None]:
from google.colab import drive
drive.mount('/content/drive')

## Import Required libraries

In [None]:
import numpy as np
import os
import matplotlib.pyplot as plt

import torch
import torchvision
import random
import torch.nn as nn
import math

import torch.nn.functional as F
import torchvision.transforms as transforms
import torch.optim as optim
import torch.utils.data as data

## Introduction

In the meta-learning literature, during the meta-training phase you are given batches (episodes) that consist of a `support` set and a `query` set. You train your model so that, using the support set, it can correctly predict the labels of the query set.

The pioneering method in this branch is Model-Agnostic Meta-Learning (MAML).

First, we build the dataset so that each batch returns N*(k+k') images, where `N` is the number of classes per batch, `k` is the number of support images per class, and `k'` is the number of query images per class.
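The dataloader returns global Omniglot character labels (e.g. 0..963 on the train split), while the classifier head has only `N` outputs, so each episode has to be split into its support and query parts and its labels remapped to 0..N-1. Below is a minimal pure-Python sketch, assuming the sampler places the N*k support items first and the N*k' query items last; `split_episode` is a hypothetical helper, not part of the notebook.

```python
def split_episode(images, labels, n_way, k_support):
    """Hypothetical helper: return (support_x, support_y, query_x, query_y).

    Assumes the first n_way * k_support items are the support set and the rest
    are the query set. `labels` must hold plain ints (use `.tolist()` on a tensor).
    """
    n_support = n_way * k_support
    # Map each global class id to an episode-local id 0..n_way-1 (sorted order).
    classes = sorted(set(labels))
    assert len(classes) == n_way, "an episode must contain exactly n_way classes"
    local = {c: i for i, c in enumerate(classes)}
    local_labels = [local[c] for c in labels]
    return (images[:n_support], local_labels[:n_support],
            images[n_support:], local_labels[n_support:])


# N=5, k=1, k'=15: one episode holds 5 * (1 + 15) = 80 items.
N, k, k_query = 5, 1, 15
global_classes = [12, 407, 3, 951, 88]
labels = global_classes * k + [c for c in global_classes for _ in range(k_query)]
images = list(range(len(labels)))  # stand-ins for image tensors
xs, ys, xq, yq = split_episode(images, labels, N, k)
print(len(labels), len(xs), len(xq))  # 80 5 75
print(ys)                             # [1, 3, 0, 4, 2]
```

Slicing works the same way on tensors, so `images` can be the batched image tensor returned by the dataloader.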

The Omniglot dataset is designed for developing more human-like learning algorithms. It contains 1623 different handwritten characters from 50 different alphabets. Each of the 1623 characters was drawn online via Amazon's Mechanical Turk by 20 different people.

The train and test splits contain 964 and 659 classes, respectively. The torchvision Omniglot dataset is ordered, so every 20 consecutive images belong to one class.

In [None]:
# Meta learning parameters.

N = 5
support_size = 1
query_size = 15
meta_inner_lr = 0.4
meta_outer_lr = 0.001

## Prepare dataset (5 points)

In [None]:
transform = transforms.Compose([
    transforms.Resize(28),
    transforms.ToTensor(),
    transforms.Normalize(0.5, 0.5)
])

train_dataset = torchvision.datasets.Omniglot('./data/omniglot/', download = True, background = True, transform = transform)
test_dataset = torchvision.datasets.Omniglot('./data/omniglot/', download = True, background = False, transform = transform)

train_labels = np.repeat(np.arange(964), 20)
test_labels = np.repeat(np.arange(659), 20)

To build a dataloader, we need a class that yields the indexes of the selected data in the dataset at every iteration; an instance of it is then passed to the `batch_sampler` argument of the `DataLoader`.

Complete the code below based on this pseudocode:


1.   Select `N` classes randomly from all classes.
2.   Select `support_size + query_size` images from each class, independently and at random.
3.   Shuffle the dataset indexes, but keep all query indexes at the end of the list.
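The three steps above can be sketched for a single episode with NumPy. This is an illustration under stated assumptions, not the required `BatchSampler` body: `sample_episode_indexes` is a hypothetical name, support and query indexes are shuffled separately so the query block stays at the end, and how `batch_size` episodes are combined per iteration is left to your implementation.

```python
import numpy as np


def sample_episode_indexes(labels, n_way, n_support, n_query, rng):
    """Hypothetical sketch: dataset indexes for one episode, query indexes last."""
    labels = np.asarray(labels)
    # 1. pick N distinct classes at random
    classes = rng.choice(np.unique(labels), size=n_way, replace=False)
    support, query = [], []
    for c in classes:
        # 2. pick n_support + n_query distinct images of class c
        idx = rng.choice(np.flatnonzero(labels == c),
                         size=n_support + n_query, replace=False)
        support.append(idx[:n_support])
        query.append(idx[n_support:])
    support = np.concatenate(support)
    query = np.concatenate(query)
    # 3. shuffle inside each part so that all query indexes remain at the end
    rng.shuffle(support)
    rng.shuffle(query)
    return np.concatenate([support, query]).astype(int)


rng = np.random.default_rng(0)
train_labels = np.repeat(np.arange(964), 20)
batch = sample_episode_indexes(train_labels, n_way=5, n_support=1, n_query=15, rng=rng)
print(batch.shape)  # (80,)
```

Using `np.flatnonzero(labels == c)` does not depend on the 20-images-per-class ordering, so the same code works for both splits.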

In [None]:
class BatchSampler(object):
    '''
    BatchSampler: yield a batch of indexes at each iteration.
    __len__ returns the number of episodes per epoch (same as 'self.iterations').
    '''

    def __init__(self, labels, classes_per_it, num_samples, iterations, batch_size):
        '''
        Initialize the BatchSampler object
        Args:
        - labels: array of labels of dataset.
        - classes_per_it: number of random classes for each iteration
        - num_samples: number of samples for each iteration for each class
        - iterations: number of iterations (episodes) per epoch
        - batch_size: number of batches per iteration
        '''
        super(BatchSampler, self).__init__()
        self.labels = labels
        self.classes_per_it = classes_per_it
        self.sample_per_class = num_samples
        self.iterations = iterations
        self.batch_size = batch_size

    def __iter__(self):
        '''
        yield a batch of indexes
        '''

        for it in range(self.iterations):
            total_batch_indexes = np.array([])

            #################################################################################
            #                  COMPLETE THE FOLLOWING SECTION (5 points)                    #
            #################################################################################
            # feel free to add/edit initialization part of sampler.
            #################################################################################




            #################################################################################
            #                                   THE END                                     #
            #################################################################################

            yield total_batch_indexes.astype(int)

    def __len__(self):
        return self.iterations

In [None]:
iterations = 5000
batch_size = 32

# Each class contributes support_size + query_size images per episode.
train_sampler = BatchSampler(labels=train_labels, classes_per_it=N,
                             num_samples=support_size + query_size,
                             iterations=iterations, batch_size=batch_size)

test_sampler = BatchSampler(labels=test_labels, classes_per_it=N,
                            num_samples=support_size + query_size,
                            iterations=iterations, batch_size=batch_size)

train_dataloader = torch.utils.data.DataLoader(train_dataset, batch_sampler=train_sampler)
test_dataloader = torch.utils.data.DataLoader(test_dataset, batch_sampler=test_sampler)

## Model (45 points)

Let's build our model. The whole model is the `ProtoNet` feature extractor used in the [Prototypical Network paper](https://arxiv.org/abs/1703.05175). Due to limited computational resources, for the first part of the question we give you the first part of the network pretrained, and you will meta-train only the last layers of the network.
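A quick shape check shows why the meta-network's linear layer takes 64 input features. This pure-Python sketch assumes 28x28 inputs (from `transforms.Resize(28)`) and blocks shaped like `conv_block`: a 3x3 convolution with padding 1 keeps the spatial size, and `MaxPool2d(2)` halves it, rounding down.

```python
def conv_block_out(size):
    # Conv2d(kernel_size=3, stride=1, padding=1): size + 2*1 - 3 + 1 = size.
    size = size + 2 * 1 - 3 + 1
    # MaxPool2d(2): kernel 2, stride 2, halves the size and rounds down.
    return size // 2


def spatial_sizes(size, n_blocks):
    sizes = [size]
    for _ in range(n_blocks):
        sizes.append(conv_block_out(sizes[-1]))
    return sizes


# Three pretrained blocks (28 -> 14 -> 7 -> 3) plus the meta-network's block (3 -> 1).
print(spatial_sizes(28, 4))  # [28, 14, 7, 3, 1]
```

With 64 channels at 1x1, flattening yields 64 * 1 * 1 = 64 features, matching the linear layer's `in_features=64`.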

In [None]:
def conv_block(in_channels, out_channels):
    return nn.Sequential(
            nn.Conv2d(in_channels, out_channels, 3, padding=1),
            nn.BatchNorm2d(out_channels, momentum=1),
            nn.ReLU(),
            nn.MaxPool2d(2)
        )

class Feature_extractor(nn.Module):
    '''
    source: https://github.com/jakesnell/prototypical-networks/blob/f0c48808e496989d01db59f86d4449d7aee9ab0c/protonets/models/few_shot.py#L62-L84
    '''
    def __init__(self, x_dim=1, hid_dim=64):
        super(Feature_extractor, self).__init__()
        self.encoder = nn.Sequential(
            conv_block(x_dim, hid_dim),
            conv_block(hid_dim, hid_dim),
            conv_block(hid_dim, hid_dim)
        )

    def forward(self, x):
        return self.encoder(x)

In [None]:
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
feature_extractor = Feature_extractor()
feature_extractor = feature_extractor.to(device)
# load the pretrained weights into the feature extractor
feature_extractor.load_state_dict(torch.load('./pretrained_model.pt', map_location=device))

To be specific, you extract the features of each image with the feature extraction network and give its output as input to your meta-learner. At the end of initialization, your network parameters should be initialized and stored in the given `ParameterList` for future forward passes.

The `Learner` class is a module that initializes your meta-parameters from a config given as input. The format of the config is up to you; it should contain the parameters required to initialize your submodules. Take a quick look at the layers of the meta-network to implement your `Learner` class.

In the forward pass, you pass your input and two optional arguments:

1.   **vars**: defaults to `None`, which means the meta-learner uses its own parameters (`self.vars`) for the forward pass; alternatively, you can pass your own parameters (e.g., adapted ones) to compute the output.
2.   **bn_training**: if `True`, batch normalization layers behave as they do at training time, normalizing with the statistics of the current batch.


In the `zero_grad` method, set to zero the gradients of the parameters passed as the `vars` argument or, if none are given, of the class parameters (`self.vars`).
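One way to meet these requirements is a config-driven module that never owns `nn.Conv2d`/`nn.Linear` submodules and instead calls `torch.nn.functional` ops with tensors taken from `vars`. The sketch below is hypothetical: `FunctionalLearner`, the `(name, shape)` config format, Kaiming initialization, and the hard-coded conv `stride=1, padding=1` are assumptions chosen to match the meta-network spec, not a required design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class FunctionalLearner(nn.Module):
    """Hypothetical sketch of a config-driven learner whose weights live in self.vars."""

    def __init__(self, config):
        super().__init__()
        self.config = config
        self.vars = nn.ParameterList()     # tensors optimized by MAML
        self.vars_bn = nn.ParameterList()  # running_mean / running_var (not optimized)
        for name, shape in config:
            if name in ('conv2d', 'linear'):
                weight = nn.Parameter(torch.empty(*shape))
                nn.init.kaiming_normal_(weight)
                self.vars.extend([weight, nn.Parameter(torch.zeros(shape[0]))])
            elif name == 'bn':
                self.vars.extend([nn.Parameter(torch.ones(shape[0])),    # gamma
                                  nn.Parameter(torch.zeros(shape[0]))])  # beta
                self.vars_bn.extend([
                    nn.Parameter(torch.zeros(shape[0]), requires_grad=False),
                    nn.Parameter(torch.ones(shape[0]), requires_grad=False),
                ])

    def forward(self, x, vars=None, bn_training=True):
        # Use the learner's own parameters unless other (e.g. adapted) ones are given.
        vars = self.vars if vars is None else vars
        idx = bn_idx = 0
        for name, shape in self.config:
            if name == 'conv2d':  # assumes stride=1, padding=1 as in the spec
                x = F.conv2d(x, vars[idx], vars[idx + 1], stride=1, padding=1)
                idx += 2
            elif name == 'bn':
                x = F.batch_norm(x, self.vars_bn[bn_idx], self.vars_bn[bn_idx + 1],
                                 weight=vars[idx], bias=vars[idx + 1],
                                 training=bn_training)
                idx += 2
                bn_idx += 2
            elif name == 'relu':
                x = F.relu(x)
            elif name == 'max_pool2d':
                x = F.max_pool2d(x, kernel_size=shape[0], stride=shape[1])
            elif name == 'flatten':
                x = x.flatten(start_dim=1)
            elif name == 'linear':
                x = F.linear(x, vars[idx], vars[idx + 1])
                idx += 2
        return x

    def zero_grad(self, vars=None):
        # Zero the gradients of the given parameters, or of self.vars by default.
        with torch.no_grad():
            for p in (self.vars if vars is None else vars):
                if p.grad is not None:
                    p.grad.zero_()

    def parameters(self):
        return self.vars


N = 5
config = [('conv2d', [64, 64, 3, 3]), ('bn', [64]), ('relu', []),
          ('max_pool2d', [2, 2]), ('flatten', []), ('linear', [N, 64])]
learner = FunctionalLearner(config)
features = torch.randn(10, 64, 3, 3)  # stand-in for feature_extractor outputs
logits = learner(features)
print(logits.shape)  # torch.Size([10, 5])
```

Because `forward` reads every weight from `vars`, passing a list of adapted tensors evaluates the same architecture with those weights while keeping them connected to the autograd graph.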





In [None]:
class Learner(nn.Module):

    def __init__(self, *args, **kwargs):
        super(Learner, self).__init__()

        # this list contains all tensors that need to be optimized
        self.vars = nn.ParameterList()
        # running_mean and running_var
        self.vars_bn = nn.ParameterList()

        #################################################################################
        #                  COMPLETE THE FOLLOWING SECTION (10 points)                   #
        #################################################################################
        # initialize your meta-network parameters based on given config.
        #################################################################################




        #################################################################################
        #                                   THE END                                     #
        #################################################################################


    def forward(self, x, vars=None, bn_training=True):

        #################################################################################
        #                  COMPLETE THE FOLLOWING SECTION (10 points)                   #
        #################################################################################
        # compute output of input with given parameters or class parameters
        #################################################################################




        #################################################################################
        #                                   THE END                                     #
        #################################################################################


    def zero_grad(self, vars=None):

        #################################################################################
        #                  COMPLETE THE FOLLOWING SECTION (5 points)                    #
        #################################################################################
        # set gradient of given parameters as attribute or class parameters to zero
        #################################################################################




        #################################################################################
        #                                   THE END                                     #
        #################################################################################

    def parameters(self):
        return self.vars

Next, implement your meta-learner in the `Meta` module. You pass all of your support and query data to this module, and it updates the `Learner` parameters based on the MAML loss.
To clarify: pass your support data to the `Learner`, compute the loss on it, and update the parameters; repeat this for the given number of inner-loop updates; finally, compute the loss on the query data with the adapted parameters and use it to update the original `Learner` parameters.
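The procedure above is the standard MAML update. In the notation below (introduced here for illustration), a task $\mathcal{T}_i$ has support set $S_i$ and query set $Q_i$, $\alpha$ is `meta_inner_lr`, $\beta$ is `meta_outer_lr`, and $m$ is the number of inner-loop updates:

$$
\theta_i^{(0)} = \theta, \qquad
\theta_i^{(j)} = \theta_i^{(j-1)} - \alpha \, \nabla_{\theta_i^{(j-1)}} \mathcal{L}_{S_i}\!\left(f_{\theta_i^{(j-1)}}\right), \quad j = 1, \dots, m
$$

$$
\theta \leftarrow \theta - \beta \, \nabla_{\theta} \sum_{i} \mathcal{L}_{Q_i}\!\left(f_{\theta_i^{(m)}}\right)
$$

The outer gradient is taken with respect to the original $\theta$, so it flows back through all $m$ inner updates; dropping those second-order terms gives first-order MAML. The outer step is written here as plain gradient descent, although an optimizer such as Adam with learning rate $\beta$ can be used instead.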

In [None]:
class Meta(nn.Module):
    def __init__(self, *args, **kwargs):

        super(Meta, self).__init__()

        #################################################################################
        #                  COMPLETE THE FOLLOWING SECTION (5 points)                   #
        #################################################################################
        # initialize your meta-learner
        #################################################################################




        #################################################################################
        #                                   THE END                                     #
        #################################################################################


    def forward(self, *args, **kwargs):

        #################################################################################
        #                  COMPLETE THE FOLLOWING SECTION (15 points)                   #
        #################################################################################
        # meta-train your parameters.
        #################################################################################




        #################################################################################
        #                                   THE END                                     #
        #################################################################################






## With Feature Extractor

For the first part of the question, initialize your `Learner` with the following meta-network:


1.   **Conv2d layer**: in_channels=64, out_channels=64, kernel_size=3, stride=1, padding=1
2.   **BatchNorm2d layer**: num_features=64
3.   **ReLU activation**
4.   **Max Pooling layer**: kernel_size = 2, stride = 2
5.   **Flatten layer**
6.   **Linear layer**: in_features=64, out_features=N (number of classes in meta-learning)
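Since the config format is up to you, here is one possible (hypothetical) encoding of the six layers above as `(name, shape)` pairs, together with a count of the trainable scalars a `Learner` built from it would hold. Batch-norm running statistics are excluded because they are not optimized.

```python
import math

N = 5  # classes per episode, as in the meta-learning parameters

# One possible (hypothetical) config format: (layer name, shape arguments).
config = [
    ('conv2d', [64, 64, 3, 3]),  # [out_channels, in_channels, kernel_h, kernel_w]
    ('bn', [64]),                # [num_features]
    ('relu', []),
    ('max_pool2d', [2, 2]),      # [kernel_size, stride]
    ('flatten', []),
    ('linear', [N, 64]),         # [out_features, in_features]
]


def count_meta_parameters(config):
    """Number of trainable scalars a Learner built from `config` would hold."""
    total = 0
    for name, shape in config:
        if name in ('conv2d', 'linear'):
            total += math.prod(shape) + shape[0]  # weight + bias
        elif name == 'bn':
            total += 2 * shape[0]  # gamma + beta; running stats are not trained
    return total


print(count_meta_parameters(config))  # 37381
```

The total breaks down as 36928 (conv) + 128 (batch norm) + 325 (linear).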




Meta-train **three** different networks, with the number of inner-loop updates set to 1, 2, and 3, respectively. After a reasonable number of epochs, plot the meta-test accuracy of each network against its number of inner-loop updates.
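The only thing that differs between the three networks is the number of inner-loop updates. Below is a minimal sketch of one second-order MAML meta-update with a configurable `inner_steps`. To stay short, it replaces the `Learner` with a hypothetical one-layer linear head (`functional_head`), uses random tensors in place of extracted features, and uses Adam for the outer update; all three are assumptions, not the notebook's prescribed setup.

```python
import torch
import torch.nn.functional as F


def functional_head(x, params):
    # Hypothetical stand-in for Learner.forward(x, vars=params): one linear layer.
    w, b = params
    return F.linear(x.flatten(1), w, b)


def maml_outer_loss(params, tasks, inner_steps, inner_lr):
    """Adapt on each support set, then return the mean query loss over tasks."""
    query_losses = []
    for x_spt, y_spt, x_qry, y_qry in tasks:
        fast = list(params)
        for _ in range(inner_steps):
            loss = F.cross_entropy(functional_head(x_spt, fast), y_spt)
            # create_graph=True keeps the inner updates differentiable, so the
            # outer gradient flows through them (omit it for first-order MAML).
            grads = torch.autograd.grad(loss, fast, create_graph=True)
            fast = [p - inner_lr * g for p, g in zip(fast, grads)]
        query_losses.append(F.cross_entropy(functional_head(x_qry, fast), y_qry))
    return torch.stack(query_losses).mean()


torch.manual_seed(0)
N, k, k_query, dim = 5, 1, 15, 64
params = [torch.nn.Parameter(torch.randn(N, dim) * 0.01),
          torch.nn.Parameter(torch.zeros(N))]
meta_opt = torch.optim.Adam(params, lr=0.001)  # meta_outer_lr


def random_task():
    # Random features stand in for feature_extractor outputs of one episode.
    y_spt = torch.arange(N).repeat_interleave(k)
    y_qry = torch.arange(N).repeat_interleave(k_query)
    return torch.randn(N * k, dim), y_spt, torch.randn(N * k_query, dim), y_qry


tasks = [random_task() for _ in range(4)]
before = [p.detach().clone() for p in params]
meta_opt.zero_grad()
loss = maml_outer_loss(params, tasks, inner_steps=2, inner_lr=0.4)  # meta_inner_lr
loss.backward()  # gradients w.r.t. the original (meta) parameters
meta_opt.step()
print(f"mean query loss: {loss.item():.4f}")
```

For the experiment, build a fresh set of parameters for each `inner_steps` value in 1, 2, 3 rather than reusing one set across settings.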

### Train (25 points)

In [None]:
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

#################################################################################
#                  COMPLETE THE FOLLOWING SECTION (25 points)                   #
#################################################################################
# Define your config and initialize model and parameters
# prepare your data as input to your model.
# train meta-network
# get the accuracy of the model in the meta-test phase
#################################################################################
lr = None
model = None
epochs = None
criterion = None













#################################################################################
#                                   THE END                                     #
#################################################################################


### Plot (2.5 points)

Plot the meta-test accuracy against the number of inner-loop updates.
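A small plotting helper is sketched below; `plot_accuracy_vs_inner_steps` and the dictionary format (inner-loop updates mapped to meta-test accuracy) are assumptions, and the accuracies must come from your own meta-test runs.

```python
import matplotlib.pyplot as plt


def plot_accuracy_vs_inner_steps(accuracies):
    """Plot meta-test accuracy (dict values) against inner-loop updates (dict keys)."""
    steps = sorted(accuracies)
    fig, ax = plt.subplots(figsize=(4, 3))
    ax.plot(steps, [accuracies[s] for s in steps], marker='o')
    ax.set_xticks(steps)
    ax.set_xlabel('number of inner-loop updates')
    ax.set_ylabel('meta-test accuracy')
    ax.grid(True)
    return fig


# Usage, with the accuracies you measured for the three networks:
# fig = plot_accuracy_vs_inner_steps({1: acc_1, 2: acc_2, 3: acc_3})
# plt.show()
```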

## Without Feature Extractor

### Train (10 points)

Now include the feature extractor layers in your meta-network, instead of using the pretrained feature extractor, and repeat the same procedure as in the cells above, only for inner loop update = 1.
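When the whole network is meta-trained, the config simply gains the three blocks of `Feature_extractor` (1 input channel for grayscale Omniglot, 64 hidden channels) in front of the block and linear head listed above. This sketch reuses a hypothetical `(name, shape)` config format; `protonet_config` is an illustrative name.

```python
def protonet_config(n_way, in_channels=1, hid_dim=64, n_blocks=4):
    """Hypothetical config for the full ProtoNet-style network plus a linear head."""
    config = []
    for i in range(n_blocks):
        config += [
            ('conv2d', [hid_dim, in_channels if i == 0 else hid_dim, 3, 3]),
            ('bn', [hid_dim]),
            ('relu', []),
            ('max_pool2d', [2, 2]),
        ]
    # 28x28 inputs shrink to 1x1 after four blocks, so the flattened size is hid_dim.
    config += [('flatten', []), ('linear', [n_way, hid_dim])]
    return config


config = protonet_config(n_way=5)
print(len(config))  # 18
print(config[0])    # ('conv2d', [64, 1, 3, 3])
```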


In [None]:
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

#################################################################################
#                  COMPLETE THE FOLLOWING SECTION (10 points)                   #
#################################################################################
# Define your config and initialize model and parameters
# prepare your data as input to your model.
# train meta-network
# get the accuracy of the model in the meta-test phase
#################################################################################
lr = None
model = None
epochs = None
criterion = None
inner_loop_update = 1













#################################################################################
#                                   THE END                                     #
#################################################################################


### Report (2.5 points)

Report the meta-test accuracy.
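A common way to report few-shot results is the mean accuracy over many meta-test episodes together with a 95% confidence interval. The helpers below are a sketch of that convention under a normal approximation (1.96 times the standard error); the function names are illustrative, and `summarize_accuracy` needs at least two episodes.

```python
import math
import statistics


def episode_accuracy(predicted, targets):
    # Fraction of query examples whose predicted class equals the target class.
    return sum(p == t for p, t in zip(predicted, targets)) / len(targets)


def summarize_accuracy(episode_accuracies):
    # Mean over meta-test episodes and the half-width of a 95% confidence
    # interval under a normal approximation: 1.96 * sample std / sqrt(n).
    n = len(episode_accuracies)
    mean = statistics.fmean(episode_accuracies)
    half_width = 1.96 * statistics.stdev(episode_accuracies) / math.sqrt(n)
    return mean, half_width


# Usage inside a meta-test loop (names are illustrative):
#   accs.append(episode_accuracy(logits.argmax(dim=1).tolist(), y_qry.tolist()))
#   mean, ci = summarize_accuracy(accs)
#   print(f"meta-test accuracy: {100 * mean:.2f}% +/- {100 * ci:.2f}%")
```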

## Compare and Explain Results

Answer:

<br>

