# Assignment 2
## Question 1: Siamese networks & one-shot learning (7pt)
The Cifar-100 dataset is similar to the Cifar-10 dataset. It also consists of 60,000 32x32 RGB images, but they are distributed over 100 classes instead of 10. Each class therefore has far fewer examples: only 500 training images and 100 test images. For more info about the dataset, see https://www.cs.toronto.edu/~kriz/cifar.html.

*HINT: Import the Cifar-100 dataset directly from Keras; there is no need to download it from the website. Use* `label_mode="fine"`

### Task 1.1: Siamese network
**a)**
* Train a Siamese Network on the first 80 classes of (the training set of) Cifar-100, i.e. let the network predict the probability that two input images are from the same class. Use 1 as a target for pairs of images from the same class (positive pairs), and 0 for pairs of images from different classes (negative pairs). Randomly select image pairs from Cifar-100, but make sure you train on as many positive pairs as negative pairs.

* Evaluate the performance of the network on 20-way one-shot learning tasks. Do this by generating 250 random tasks in each evaluation round and computing the average accuracy over them. Use the remaining 20 classes that were not used for training. The model should perform better than random guessing.

For this question you may ignore the test set of Cifar-100; it suffices to use only the training set and split this, using the first 80 classes for training and the remaining 20 classes for one-shot testing.

*HINT: First sort the data by their labels (see e.g.* `numpy.argsort()`*), then reshape the data to a shape of* `(n_classes, n_examples, width, height, depth)`*, similar to the Omniglot data in Practical 4. It is then easier to split the data by class, and to sample positive and negative image pairs for training the Siamese network.*

*NOTE: do not expect the one-shot accuracy for Cifar-100 to be similar to the accuracy for Omniglot; a lower accuracy can be expected. However, an accuracy higher than random guessing is certainly achievable.*

In [1]:
import numpy as np
from keras.models import Model, Sequential
from keras.layers import Input, Conv2D, Lambda, Dense, Flatten, MaxPooling2D, Dropout, BatchNormalization
from keras.regularizers import l2
from keras.utils import to_categorical
from keras import backend as K
from keras.losses import binary_crossentropy
import os
import matplotlib.pyplot as plt
import pickle
from sklearn.utils import shuffle
%matplotlib inline

Using TensorFlow backend.


### Load the CIFAR-100 data and create training and test set


In [2]:
# === add code here ===
from keras.datasets import cifar100
(x_cifar_train, y_cifar_train),(x_cifar_test,y_cifar_test) = cifar100.load_data(label_mode='fine')

print('x_cifar_train shape: ', x_cifar_train.shape)
print('y_cifar_train shape: ', y_cifar_train.shape)

# Sort the indices by label; since every class has exactly 500 training images, the first 80% of the sorted indices are classes 0-79 (train) and the last 20% are classes 80-99 (test)
ordered_indexes = np.argsort(y_cifar_train.flatten())
train_indexes = ordered_indexes[:int(0.8*len(ordered_indexes))]
test_indexes = ordered_indexes[int(0.8*len(ordered_indexes)):]

# x_train, x_test, y_train, y_test
x_train = x_cifar_train[train_indexes]
x_test = x_cifar_train[test_indexes]
y_train = y_cifar_train[train_indexes]
y_test = y_cifar_train[test_indexes]
print('Class ranges of training: %d to %d' % (int(y_train[0]), int(y_train[-1])))
print('Class ranges of testing: %d to %d' % (int(y_test[0]), int(y_test[-1])))

# Reshape as (n_class, n_example, width, height, depth)
x_train = np.reshape(x_train, (80,-1,32,32,3))
y_train = np.reshape(y_train, (80,-1,1))
x_test = np.reshape(x_test, (20,-1,32,32,3))
y_test = np.reshape(y_test, (20,-1,1))

print('\nx_train shape: ', x_train.shape)
print('x_test shape: ', x_test.shape)
print('y_train shape: ', y_train.shape)
print('y_test shape: ', y_test.shape)

x_cifar_train shape:  (50000, 32, 32, 3)
y_cifar_train shape:  (50000, 1)
Class ranges of training: 0 to 79
Class ranges of testing: 80 to 99

x_train shape:  (80, 500, 32, 32, 3)
x_test shape:  (20, 500, 32, 32, 3)
y_train shape:  (80, 500, 1)
y_test shape:  (20, 500, 1)


### Create batch of pair images and target (balanced dataset)

In [0]:
def get_batch(batch_size, X):
    """Create batch of n pairs, half same class, half different class"""
    n_classes, n_examples, w, h, d = X.shape
    # randomly sample distinct classes to use in the batch (replace=False requires batch_size <= n_classes)
    categories = np.random.choice(n_classes, size=(batch_size,), replace=False)
    # initialize 2 empty arrays for the input image batch
    pairs = [np.zeros((batch_size, h, w, d)) for i in range(2)]
    # initialize vector for the targets, and make one half of it '1's, so 2nd half of batch has same class
    targets = np.zeros((batch_size,))
    targets[batch_size//2:] = 1
    for i in range(batch_size):
        category = categories[i]
        idx_1 = np.random.randint(0, n_examples)
        pairs[0][i, :, :, :] = X[category, idx_1].reshape(w, h, d)
        idx_2 = np.random.randint(0, n_examples)
        # second half of the batch: same class (positive pair, target 1)
        if i >= batch_size // 2:
            category_2 = category
        # first half of the batch: different class (negative pair, target 0)
        else:
            # add a random offset in [1, n_classes) modulo n_classes to ensure the 2nd image has a different category
            category_2 = (category + np.random.randint(1,n_classes)) % n_classes
        pairs[1][i, :, :, :] = X[category_2,idx_2].reshape(w, h, d)
    return pairs, targets

### Siamese network

In [26]:
input_shape = (32, 32, 3)
left_input = Input(input_shape)
right_input = Input(input_shape)

# build convnet to use in each siamese 'leg'
convnet = Sequential()
convnet.add(Conv2D(64, (5,5), activation='relu', input_shape=input_shape, kernel_regularizer=l2(2e-4)))
convnet.add(MaxPooling2D())
convnet.add(BatchNormalization())
convnet.add(Dropout(0.25))

convnet.add(Conv2D(128, (4,4), activation='relu', kernel_regularizer=l2(2e-4)))
convnet.add(MaxPooling2D())
convnet.add(BatchNormalization())
convnet.add(Dropout(0.25))

convnet.add(Conv2D(128, (3,3), activation='relu', kernel_regularizer=l2(2e-4)))
#convnet.add(MaxPooling2D())
#convnet.add(BatchNormalization())
#convnet.add(Dropout(0.25))

#convnet.add(Conv2D(256, (3,3), activation='relu', kernel_regularizer=l2(2e-4)))
convnet.add(Flatten())
#convnet.add(BatchNormalization())
#convnet.add(Dropout(0.25))
convnet.add(Dense(1152, activation="sigmoid", kernel_regularizer=l2(1e-3)))
convnet.summary()

# encode each of the two inputs into a vector with the convnet
encoded_l = convnet(left_input)
encoded_r = convnet(right_input)

# merge two encoded inputs with the L1 distance between them, and connect to prediction output layer
L1_distance = lambda x: K.abs(x[0]-x[1])
both = Lambda(L1_distance)([encoded_l, encoded_r])
prediction = Dense(1, activation='sigmoid')(both)
siamese_net = Model(inputs=[left_input,right_input], outputs=prediction)


siamese_net.compile(loss="binary_crossentropy", optimizer="adam")

siamese_net.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d_34 (Conv2D)           (None, 28, 28, 64)        4864      
_________________________________________________________________
max_pooling2d_20 (MaxPooling (None, 14, 14, 64)        0         
_________________________________________________________________
batch_normalization_23 (Batc (None, 14, 14, 64)        256       
_________________________________________________________________
dropout_23 (Dropout)         (None, 14, 14, 64)        0         
_________________________________________________________________
conv2d_35 (Conv2D)           (None, 11, 11, 128)       131200    
_________________________________________________________________
max_pooling2d_21 (MaxPooling (None, 5, 5, 128)         0         
_________________________________________________________________
batch_normalization_24 (Batc (None, 5, 5, 128)         512       
__________

### Batch generator and Train function


In [0]:
def batch_generator(batch_size, X):
    """a generator for batches, so model.fit_generator can be used. """
    while True:
        pairs, targets = get_batch(batch_size, X)
        yield (pairs, targets)

def train(model, X_train, batch_size=32, steps_per_epoch=100, epochs=1, verbose=1):
    # sample batches from the X_train argument rather than from the global x_train
    model.fit_generator(batch_generator(batch_size, X_train), steps_per_epoch=steps_per_epoch, epochs=epochs, verbose=verbose)

### One-shot functions


In [0]:
# X and Y are the held-out test classes (Y is accepted for symmetry but is not used).
# N is the number of classes per task (20-way); test_oneshot runs k such tasks (250).
def make_oneshot_task(N, X, Y):
    """Create pairs of (test image, support set image) with ground truth, for testing N-way one-shot learning.

    Exactly one pair shares a class; it is created at index 0 and then shuffled with the rest.
    """
    n_classes, n_examples, w, h, d = X.shape
    # one random example index (out of n_examples) per support-set class
    indices = np.random.randint(0, n_examples, size=(N,))
    # N distinct classes (replace=False), so only one support image can match the test image
    categories = np.random.choice(range(n_classes), size=(N,), replace=False)
    # the first sampled class is the true class of the test image
    true_category = categories[0]
    # two different examples of the true class: ex1 is the test image, ex2 its match in the support set
    ex1, ex2 = np.random.choice(n_examples, replace=False, size=(2,))
    # the same test image repeated N times, shape (N, w, h, d), so that it is paired with every support image
    test_image = np.asarray([X[true_category, ex1, :, :]]*N).reshape(N, w, h, d)
    # one image from each sampled class, shape (N, w, h, d)
    support_set = X[categories, indices, :, :]
    # overwrite the first support image with ex2, which is guaranteed to differ from the test image
    support_set[0, :, :] = X[true_category, ex2]
    support_set = support_set.reshape(N, w, h, d)
    # only the pair at index 0 is a same-class pair
    targets = np.zeros((N,))
    targets[0] = 1
    targets, test_image, support_set = shuffle(targets, test_image, support_set)
    # after shuffling, the single same-class pair (target 1) sits at a random position
    pairs = [test_image, support_set]
    return pairs, targets

def test_oneshot(model, X, Y, N=20, k=250, verbose=True):
    """Test average N-way oneshot learning accuracy of a siamese neural net over k one-shot tasks."""
    # number of correct tasks
    n_correct = 0
    if verbose:
        print("Evaluating model on {} random {}-way one-shot learning tasks ...".format(k, N))
    for i in range(k):
        # inputs is the pair of images
        inputs, targets = make_oneshot_task(N, X, Y)
        # in probs we store the predictions - probs has float from 0 to 1
        probs = model.predict(inputs)
        # the task counts as correct when the highest-scoring pair is the true same-class pair
        # (np.argmax(targets) is the index of that pair after shuffling)
        #print('Np.argmax probs Index:', np.argmax(probs))
        #print('Np.argmax Targets Index:', np.argmax(targets))
        if np.argmax(probs) == np.argmax(targets):
            n_correct += 1
            #print('CORRECT!!!!')
    percent_correct = (100.0*n_correct / k)
    if verbose:
        print("Got an average of {}% accuracy for {}-way one-shot learning during this loop".format(percent_correct, N))
    return percent_correct

In [35]:
# Train and Test the architecture!
loops = 10
best_acc = 0
epochs = 1
for i in range(loops):
    print("=== Training loop {} ===".format(i+1))
    train(siamese_net, x_train, epochs=epochs)
    test_acc = test_oneshot(siamese_net, x_test, y_test)
    if test_acc >= best_acc:
        print("New best one-shot accuracy of {}%, saving model ...".format(test_acc))
        siamese_net.save("siamese_cifar100.h5")
        best_acc = test_acc
        
print('\nBest accuracy obtained during whole training {}%.'.format(best_acc))

=== Training loop 1 ===
Epoch 1/1
Evaluating model on 250 random 20-way one-shot learning tasks ...
Got an average of 12.4% accuracy for 20-way one-shot learning during this loop
New best one-shot accuracy of 12.4%, saving model ...
=== Training loop 2 ===
Epoch 1/1
Evaluating model on 250 random 20-way one-shot learning tasks ...
Got an average of 11.2% accuracy for 20-way one-shot learning during this loop
=== Training loop 3 ===
Epoch 1/1
Evaluating model on 250 random 20-way one-shot learning tasks ...
Got an average of 14.8% accuracy for 20-way one-shot learning during this loop
New best one-shot accuracy of 14.8%, saving model ...
=== Training loop 4 ===
Epoch 1/1
Evaluating model on 250 random 20-way one-shot learning tasks ...
Got an average of 12.0% accuracy for 20-way one-shot learning during this loop
=== Training loop 5 ===
Epoch 1/1
Evaluating model on 250 random 20-way one-shot learning tasks ...
Got an average of 12.8% accuracy for 20-way one-shot learning during this lo

***

**b)** Compare the performance of your Siamese network for Cifar-100 to the Siamese network from Practical 4 for Omniglot. Name three fundamental differences between the Cifar-100 and Omniglot datasets. How do these differences influence the difference in one-shot accuracy?

**Answer:**

*=== write your answer here ===*

***

### Task 1.2: One-shot learning with neural codes
**a)**
* Train a CNN classifier on the first 80 classes of Cifar-100. Make sure it achieves at least 40% classification accuracy on those 80 classes (use the test set to validate this accuracy).
* Then use neural codes from one of the later hidden layers of the CNN with L2-distance to evaluate one-shot learning accuracy for the remaining 20 classes of Cifar-100 with 250 random tasks. I.e. for a given one-shot task, obtain neural codes for the test image as well as the support set. Then pick the image from the support set that is closest (in L2-distance) to the test image as your one-shot prediction.
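The nearest-neighbour step described above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: `predict_from_codes` is a hypothetical helper name, and the 2-D "codes" below are hand-made values rather than real network activations.

```python
import numpy as np

def predict_from_codes(test_code, support_codes):
    """Return the index of the support code closest to test_code in L2 distance.

    test_code has shape (D,), support_codes has shape (N, D).
    """
    dists = np.linalg.norm(support_codes - test_code, axis=1)
    return int(np.argmin(dists))

# Toy example with hand-made 2-D codes (hypothetical values, not real network output).
support_codes = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])
test_code = np.array([0.9, 1.2])
predicted_index = predict_from_codes(test_code, support_codes)
print(predicted_index)  # index of the nearest support code
```

In Keras, the codes themselves could for instance be obtained by wrapping the trained classifier in a second `Model` whose output is the output of a hidden layer; which layer gives the most useful codes is something to determine experimentally.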

In [0]:
# === add code here ===

***

**b)** Briefly motivate your CNN architecture, and discuss the difference in one-shot accuracy between the Siamese network approach and the CNN neural codes approach.

**Answer:**

*=== write your answer here ===*

***
## Question 2: Triplet networks & one-shot learning (10pt)

### Task 2.1: Train a triplet network
**a)**
* Train a triplet network on the first 80 classes of (the training set of) Cifar-100.
 
* Make sure the network achieves a loss smaller than the margin and does not collapse all representations to zero vectors. *HINT: If you have trouble achieving this, it might help to tinker with the learning rate.*

* You are provided with a working example of triplet loss implementation for Keras below. You may directly use it.

You may ignore the test set of Cifar-100 for this question as well. It suffices to use only the training set and split this, using the first 80 classes for training and the remaining 20 classes for one-shot testing.
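Training requires batches of (anchor, positive, negative) images drawn from the 80 training classes. The following is a minimal sketch of such a sampler, assuming the data has the `(n_classes, n_examples, width, height, depth)` layout used in Question 1. The name `get_triplet_batch` and the toy data are hypothetical, and the sampled class ids are returned only so the result can be inspected.

```python
import numpy as np

def get_triplet_batch(batch_size, X, rng=None):
    """Sample (anchor, positive, negative) image batches from X.

    X is assumed to have shape (n_classes, n_examples, width, height, depth).
    """
    rng = np.random.default_rng() if rng is None else rng
    n_classes, n_examples = X.shape[:2]
    # Class shared by the anchor and the positive image of each triplet.
    pos_classes = rng.integers(0, n_classes, size=batch_size)
    # Shift by 1..n_classes-1 (mod n_classes) so the negative class always differs.
    neg_classes = (pos_classes + rng.integers(1, n_classes, size=batch_size)) % n_classes
    # Anchor and positive use two different examples of the same class.
    anchor_idx = rng.integers(0, n_examples, size=batch_size)
    positive_idx = (anchor_idx + rng.integers(1, n_examples, size=batch_size)) % n_examples
    negative_idx = rng.integers(0, n_examples, size=batch_size)
    anchors = X[pos_classes, anchor_idx]
    positives = X[pos_classes, positive_idx]
    negatives = X[neg_classes, negative_idx]
    return [anchors, positives, negatives], (pos_classes, neg_classes)

# Toy data: 5 classes, 4 examples each, 2x2 RGB "images" filled with the class id.
toy_X = np.stack([np.full((4, 2, 2, 3), c, dtype=np.float32) for c in range(5)])
triplets, (pos_c, neg_c) = get_triplet_batch(8, toy_X, rng=np.random.default_rng(0))
print([t.shape for t in triplets])
```

Random sampling like this is the simplest choice; it makes no attempt to pick hard triplets.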

```python
# Note that the ground-truth variable is not used in the loss calculation. It is only a function argument required by the Keras loss interface. The network structure already implies the ground truth: the "positive" image has the same class as the anchor image.
import tensorflow as tf
def triplet_loss(ground_truth, network_output):

    anchor, positive, negative = tf.split(network_output, num_or_size_splits=3, axis=1)        
    
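    # L2-normalize each embedding along the feature axis; the results must be reassigned to take effect
    anchor, positive, negative = [
        tf.math.l2_normalize(embedding, axis=1) for embedding in (anchor, positive, negative)
    ]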

    pos_dist = tf.reduce_sum(tf.square(tf.subtract(anchor, positive)), axis=1)
    neg_dist = tf.reduce_sum(tf.square(tf.subtract(anchor, negative)), axis=1)
    
    margin = # define your margin
    basic_loss = tf.add(tf.subtract(pos_dist, neg_dist), margin)
    loss = tf.reduce_mean(tf.maximum(basic_loss, 0.0), axis=0)

    return loss
```
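The requirement that the loss ends up below the margin follows from the formula above: if the network maps every image to the same vector, both distances are zero and the loss equals the margin exactly. The NumPy sketch below illustrates this. `triplet_loss_np` is a hypothetical helper, the margin of 0.5 is chosen only for this illustration, and the normalization step is left out because it does not change the collapsed case.

```python
import numpy as np

def triplet_loss_np(anchor, positive, negative, margin):
    """NumPy version of the hinge-style triplet loss, for inspection only."""
    pos_dist = np.sum((anchor - positive) ** 2, axis=1)
    neg_dist = np.sum((anchor - negative) ** 2, axis=1)
    return float(np.mean(np.maximum(pos_dist - neg_dist + margin, 0.0)))

margin = 0.5  # hypothetical value chosen for this illustration

# Collapsed network: every image is mapped to the same (zero) vector.
collapsed = np.zeros((4, 8))
collapsed_loss = triplet_loss_np(collapsed, collapsed, collapsed, margin)

# Well-separated embeddings: the positive is near the anchor, the negative far away.
anchor = np.tile([1.0, 0.0], (4, 1))
positive = np.tile([0.9, 0.1], (4, 1))
negative = np.tile([-1.0, 0.0], (4, 1))
separated_loss = triplet_loss_np(anchor, positive, negative, margin)

print(collapsed_loss, separated_loss)
```

A training loss that stays pinned at the margin is therefore a sign of collapse rather than of learning.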


In [0]:
# === add code here ===

***

### Task 2.2: One-shot learning with triplet neural codes
**a)**
* Use neural codes from the triplet network with L2-distance to evaluate one-shot learning accuracy for the remaining 20 classes of Cifar-100 with 250 random tasks. I.e. for a given one-shot task, obtain neural codes for the test image as well as the support set. Then pick the image from the support set that is closest (in L2-distance) to the test image as your one-shot prediction.
* Explicitly state the accuracy.
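One possible shape for the evaluation loop is sketched below. It assumes an `embed` callable that maps a batch of images to a batch of code vectors (for example a wrapper around the trained network's prediction function), and data in the `(n_classes, n_examples, width, height, depth)` layout. The function name, the flattening stand-in for the network and the toy data are all hypothetical.

```python
import numpy as np

def oneshot_accuracy_from_codes(embed, X, N=20, k=250, rng=None):
    """Percentage of k random N-way one-shot tasks solved by nearest-neighbour search on codes."""
    rng = np.random.default_rng() if rng is None else rng
    n_classes, n_examples = X.shape[:2]
    n_correct = 0
    for _ in range(k):
        # N distinct classes; the true class sits at a random position in the support set,
        # so a network with constant codes cannot score well through tie-breaking.
        classes = rng.choice(n_classes, size=N, replace=False)
        true_pos = rng.integers(0, N)
        # Two different examples of the true class: the test image and its support image.
        test_idx, true_idx = rng.choice(n_examples, size=2, replace=False)
        support_idx = rng.integers(0, n_examples, size=N)
        support_idx[true_pos] = true_idx
        support_codes = embed(X[classes, support_idx])
        test_code = embed(X[classes[true_pos], test_idx][None])[0]
        # Predict the support image whose code is closest in L2 distance.
        dists = np.linalg.norm(support_codes - test_code, axis=1)
        n_correct += int(np.argmin(dists) == true_pos)
    return 100.0 * n_correct / k

# Toy check with a stand-in "network" that simply flattens each image (hypothetical).
flatten_embed = lambda images: images.reshape(len(images), -1)
toy_X = np.stack([np.full((3, 2, 2, 1), c, dtype=np.float32) for c in range(4)])
toy_accuracy = oneshot_accuracy_from_codes(flatten_embed, toy_X, N=4, k=10, rng=np.random.default_rng(0))
print(toy_accuracy)
```

The toy classes are trivially separable, so this only checks the mechanics of the loop; it says nothing about the accuracy to expect on Cifar-100.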

In [0]:
# === add code here ===

***
## Question 3: Performance comparison (3pt)


**a)** What accuracy would random guessing achieve (on average) on this dataset? Motivate your answer briefly.

*=== write your answer here ===*



**b)** Discuss and compare the performance of the networks in tasks 1.1, 1.2 and 2.2. Briefly explain which approach you would expect to achieve the highest accuracy, and why. If the accuracies differ, explain the reasons for the difference. If there is almost no difference in accuracy, explain why.

*=== write your answer here ===*

***
## Question 4: Peer review (0pt)

Finally, each group member must write a single paragraph outlining their opinion on the work distribution within the group. Did every group member contribute equally? Did you split up the tasks in a fair manner, or did you work through the exercises jointly? Do you think that some members of your group deserve a different grade from others?

*=== write your answer here ===*