
***
## Question 2: Triplet networks & one-shot learning (10pt)

In practice 4b.4, we trained a Siamese network for a one-shot learning task on the Omniglot dataset. In this assignment, we work on the same dataset and the same task but extend the approach to triplet networks. We also compare model performance under different triplet selection methods. The assignment consists of the following four tasks.

### Import packages and mount data
Before anything else, we need to import the packages and mount the data.
*HINT: you can use the dataset from practice 4b.4 directly*

In [1]:
import tensorflow as tf
from tensorflow.keras import backend as K
from tensorflow.keras.layers import Input, Conv2D, Lambda, Dense, Flatten, MaxPooling2D, Dropout,Concatenate, BatchNormalization, Reshape
from tensorflow.keras.models import Model, Sequential
from tensorflow.keras.regularizers import l2
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.losses import binary_crossentropy
import numpy as np
import os
import pickle
import matplotlib.pyplot as plt
from sklearn.utils import shuffle

In [2]:
#PATH = os.path.join("drive","My Drive","Colab Notebooks", "2IMM10 - Deep Learning" ,"omniglot")
PATH = "../new_data/"
with open(os.path.join(PATH, "omniglot_train.p"), "rb") as f:
    (X_train, c_train) = pickle.load(f)

with open(os.path.join(PATH, "omniglot_test.p"), "rb") as f:
    (X_test, c_test) = pickle.load(f)

print("X_train shape:", X_train.shape)
print("X_test shape:", X_test.shape)
print("")
print("training alphabets")
print([key for key in c_train.keys()])
print("test alphabets:")
print([key for key in c_test.keys()])

X_train shape: (964, 20, 105, 105)
X_test shape: (659, 20, 105, 105)

training alphabets
['Braille', 'Anglo-Saxon_Futhorc', 'Tifinagh', 'Grantha', 'Burmese_(Myanmar)', 'Mkhedruli_(Georgian)', 'Latin', 'Ojibwe_(Canadian_Aboriginal_Syllabics)', 'Balinese', 'Malay_(Jawi_-_Arabic)', 'Early_Aramaic', 'Korean', 'Japanese_(hiragana)', 'Armenian', 'Cyrillic', 'Hebrew', 'Syriac_(Estrangelo)', 'Japanese_(katakana)', 'Blackfoot_(Canadian_Aboriginal_Syllabics)', 'N_Ko', 'Alphabet_of_the_Magi', 'Inuktitut_(Canadian_Aboriginal_Syllabics)', 'Greek', 'Bengali', 'Tagalog', 'Futurama', 'Arcadian', 'Gujarati', 'Asomtavruli_(Georgian)', 'Sanskrit']
test alphabets:
['ULOG', 'Atemayar_Qelisayer', 'Ge_ez', 'Gurmukhi', 'Tengwar', 'Keble', 'Malayalam', 'Oriya', 'Kannada', 'Mongolian', 'Angelic', 'Atlantean', 'Syriac_(Serto)', 'Aurek-Besh', 'Avesta', 'Glagolitic', 'Sylheti', 'Tibetan', 'Manipuri', 'Old_Church_Slavonic_(Cyrillic)']


### Task 2.1: Build  the triplet network (3pt)

We will define a triplet network for use with the Omniglot dataset. Each branch of the triplet network is a "convnet" model that maps the data to an embedding space.

*HINT: you may need "Concatenate" from keras.layers to merge the output layers*
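The merging step can be pictured with plain arrays. The sketch below is a NumPy stand-in for the Keras layer, not the network itself; the toy sizes `batch` and `dim` are assumptions chosen for illustration. It shows that concatenating three per-branch embeddings of shape (batch, 1, dim) along axis 1 gives one tensor of shape (batch, 3, dim), from which the three embeddings can be recovered by splitting along the same axis.

```python
import numpy as np

batch, dim = 4, 8  # toy sizes (assumed); the real embeddings are much larger

# One embedding per branch, each with an explicit "slot" axis of length 1.
anchor_emb = np.random.rand(batch, 1, dim)
pos_emb = np.random.rand(batch, 1, dim)
neg_emb = np.random.rand(batch, 1, dim)

# Merge along axis 1 so each sample carries its three embeddings together.
merged = np.concatenate([anchor_emb, pos_emb, neg_emb], axis=1)
print(merged.shape)

# Splitting along the same axis recovers the individual embeddings.
a, p, n = np.split(merged, 3, axis=1)
print(a.shape, p.shape, n.shape)
```

Keeping the three embeddings in a single output tensor is what lets a loss function with the usual (ground truth, output) signature see all of them at once.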

In [3]:
# define a convnet model that transforms data to an embedding space.
# === COMPLETE CODE BELOW ===
def build_convnet():
    input_shape = (105, 105, 1)

    convnet = Sequential(name='conv_base')
    convnet.add(Conv2D(64, (10,10), activation='relu', input_shape=input_shape, kernel_regularizer=l2(2e-4)))
    convnet.add(MaxPooling2D())
    convnet.add(BatchNormalization())
    convnet.add(Dropout(0.25))
    convnet.add(Conv2D(128, (7,7), activation='relu', kernel_regularizer=l2(2e-4)))
    convnet.add(MaxPooling2D())
    convnet.add(BatchNormalization())
    convnet.add(Dropout(0.25))
    convnet.add(Conv2D(128, (4,4), activation='relu', kernel_regularizer=l2(2e-4)))
    convnet.add(MaxPooling2D())
    convnet.add(BatchNormalization())
    convnet.add(Dropout(0.25))
    convnet.add(Conv2D(256, (4,4), activation='relu', kernel_regularizer=l2(2e-4)))
    convnet.add(Flatten())
    convnet.add(BatchNormalization())
    convnet.add(Dropout(0.25))
    convnet.add(Dense(4096, activation="sigmoid", kernel_regularizer=l2(1e-3)))
    convnet.add(Reshape((1,4096)))
    #convnet.summary()
    return convnet


In [4]:
# define a Triplet network
def build_triplet_net():
    # The anchor, positive and negative images are stacked into one input tensor, then split again so that each image gets its own embedding.
    generated = Input(shape=(3,105, 105, 1), name='input')

    anchor  = Lambda(lambda x: x[:,0])(generated)
    pos     = Lambda(lambda x: x[:,1])(generated)
    neg     = Lambda(lambda x: x[:,2])(generated)

    convnet = build_convnet()
    
    anchor_embedding    = convnet(anchor)
    pos_embedding       = convnet(pos)
    neg_embedding       = convnet(neg)  

    # merge the anchor, positive, negative embedding together, 
    # let the merged layer be the output of triplet network

    # === COMPLETE CODE BELOW ===
    merged_output = Concatenate(axis=1)([anchor_embedding, pos_embedding, neg_embedding])

    triplet_net = Model(inputs=generated, outputs=merged_output)
    #triplet_net.summary()
    
    triplet_net.compile(loss=triplet_loss, optimizer='adam')
    
    return triplet_net

### Task 2.2: Define triplet loss (2pt)

You can find the formula for the triplet loss function in our lecture notes. When training the model, make sure that the network achieves a loss smaller than the margin and that it does not collapse all representations to zero vectors.

*HINT: If you have trouble achieving this, it may help to tune the learning rate; you can also experiment with the margin value to get better performance*
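As a reference point, a common form of the triplet loss is max(d(a,p) - d(a,n) + margin, 0), computed per triplet. The NumPy sketch below illustrates that form on embeddings that have already been computed. The function name `triplet_loss_np`, the toy values and the default margin of 0.2 are assumptions for illustration; the lecture notes may define the loss slightly differently (for example with squared distances).

```python
import numpy as np

def triplet_loss_np(anchor, positive, negative, margin=0.2):
    """Per-triplet hinge loss on L2 distances; inputs have shape (batch, dim)."""
    d_ap = np.linalg.norm(anchor - positive, axis=-1)  # anchor-positive distance
    d_an = np.linalg.norm(anchor - negative, axis=-1)  # anchor-negative distance
    return np.maximum(d_ap - d_an + margin, 0.0)

# Two toy triplets with 2-D embeddings (hypothetical values).
anchor = np.array([[0.0, 0.0], [0.0, 0.0]])
positive = np.array([[0.0, 0.1], [0.0, 1.0]])
negative = np.array([[1.0, 0.0], [0.0, 0.5]])
losses = triplet_loss_np(anchor, positive, negative)
print(losses)

# A collapsed network maps every image to the same vector.
collapsed = np.zeros((3, 2))
collapsed_losses = triplet_loss_np(collapsed, collapsed, collapsed)
print(collapsed_losses)
```

In the first triplet the negative is far enough away, so the loss is zero; in the second the negative is closer than the positive and the loss is positive. In the collapsed case both distances are zero, so the loss equals the margin exactly, which is why a training loss that stays at the margin suggests collapsed representations.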

In [5]:
# Notice that the ground truth variable is not used for loss calculation. 
# It is used as a function argument to by-pass some Keras functionality.
# This is because the network structure already implies the ground truth for the anchor image with the "positive" image.
import tensorflow as tf
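def triplet_loss(ground_truth, network_output):
    # network_output has shape (batch, 3, 4096): anchor, positive, negative.
    B = 0.2  # margin

    # Normalize each embedding to unit length (last axis), not across the triplet.
    network_output = K.l2_normalize(network_output, axis=-1)

    anchor, positive, negative = tf.split(network_output, num_or_size_splits=3, axis=1)

    # === COMPLETE CODE BELOW ===
    # Per-triplet L2 distances: sum over the feature axis only, so that
    # the triplets in a batch are not mixed. K.epsilon() keeps the
    # gradient of sqrt defined when a distance is zero.
    d_pos = K.sqrt(K.sum(K.square(positive - anchor), axis=-1) + K.epsilon())
    d_neg = K.sqrt(K.sum(K.square(negative - anchor), axis=-1) + K.epsilon())
    loss = K.maximum(d_pos - d_neg + B, 0)

    return loss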

### Task 2.3: Select triplets for training (3pt)

#### Different selection methods

We have two options for the triplet selection method, and we will compare the model performance under these two methods after building our model.

(1) Random triplet selection, consisting of the following steps:
* Pick one random class for the anchor
* Pick two different random pictures from this class as the anchor and positive images
* Pick another class for the negative, different from anchor_class
* Pick one random picture from the negative class.
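The steps above can be sketched on indices alone, without touching any image data. This is a minimal illustration, not the required solution: the helper name `sample_random_triplet` and the use of `np.random.default_rng` are assumptions.

```python
import numpy as np

def sample_random_triplet(n_classes, n_examples, rng):
    """Return (anchor_class, idx_a, idx_p, negative_class, idx_n) as plain ints."""
    anchor_class = int(rng.integers(n_classes))
    # Two distinct examples of the anchor class: anchor and positive.
    idx_a, idx_p = (int(i) for i in rng.choice(n_examples, size=2, replace=False))
    # Shifting by 1..n_classes-1 (mod n_classes) can never land on anchor_class.
    negative_class = (anchor_class + int(rng.integers(1, n_classes))) % n_classes
    # Any example of the negative class will do for random selection.
    idx_n = int(rng.integers(n_examples))
    return anchor_class, idx_a, idx_p, negative_class, idx_n

rng = np.random.default_rng(0)
triplet = sample_random_triplet(n_classes=964, n_examples=20, rng=rng)
print(triplet)
```

The modular shift avoids a rejection loop: the negative class is guaranteed to differ from the anchor class in a single draw.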

(2) Hard triplet selection. For ease of implementation, we choose the hardest negative for a given anchor-positive pair: after picking the anchor and positive images, we choose the image from the negative class that is nearest to the anchor image, i.e. the one for which "- d(a,n)" is maximal. The process consists of the following steps:
* Pick one random class for the anchor
* Pick two different random pictures from this class as the anchor and positive images
* Pick another class for the negative, different from anchor_class
* Pick the hardest picture from the negative class.

*HINT: when picking the hardest negative, you may need model.predict to get the embeddings of the images and then calculate the distances*
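Once embeddings are available, choosing the hardest negative reduces to a nearest-neighbour lookup. The sketch below assumes the embeddings have already been computed (for instance with model.predict) and uses made-up 2-D vectors; the helper name `hardest_negative_index` is hypothetical.

```python
import numpy as np

def hardest_negative_index(anchor_emb, negative_embs):
    """anchor_emb: (dim,), negative_embs: (n_examples, dim).

    Returns the index of the negative example closest to the anchor,
    i.e. the one that maximizes -d(a, n).
    """
    distances = np.linalg.norm(negative_embs - anchor_emb, axis=1)
    return int(np.argmin(distances))

# Toy embeddings (hypothetical values).
anchor_emb = np.array([0.0, 0.0])
negative_embs = np.array([[3.0, 4.0], [0.5, 0.0], [1.0, 1.0]])
idx = hardest_negative_index(anchor_emb, negative_embs)
print(idx)  # 1: the distances are 5.0, 0.5 and about 1.41
```

Because the embeddings change as the network trains, the hardest negative has to be recomputed with the current weights each time a triplet is drawn.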

In [13]:
from sklearn.metrics.pairwise import euclidean_distances
# Note that the returned 1 * np.zeros(batch_size) only serves to by-pass some Keras functionality; it corresponds to ground_truth in triplet_loss.
# The variable hard_selection controls which method is used: with hard_selection == False triplets are selected randomly, with hard_selection == True hard triplets are selected.

# === COMPLETE CODE BELOW === 
def get_batch(triplet_net, batch_size, X, hard_selection=False):
    
    #extract convolutional base from triplet net
    conv_base = triplet_net.get_layer("conv_base")

    while True:
        
        n_classes, n_examples, w, h = X.shape
        # initialize result
        triplets=[]

        for i in range(batch_size):
            triplet = [[],[],[]]
            #Pick one random class for anchor
            anchor_class = np.random.randint(0, n_classes)

            #Pick two different random pics for this class => idx_A and idx_P
            [idx_A,idx_P] = np.random.choice(n_examples,size=2,replace=False)

            #Pick another class for negative, different from anchor_class
            # === COMPLETE CODE BELOW === 
            negative_class = (anchor_class + np.random.randint(1, n_classes)) % n_classes

            if not hard_selection:
                #Pick a random pic from this negative class => N

                # === COMPLETE CODE BELOW ===   
                idx_N = np.random.choice(n_examples)

            else:
                #Pick a hardest pic from this negative class => N
                # === COMPLETE CODE BELOW === 
                
                print(X[anchor_class, idx_A].shape)
                
                anchor = conv_base.predict(X[anchor_class, idx_A].reshape((1, w,h,1)))[:,0]
                negative = conv_base.predict(X[negative_class].reshape((n_examples, w,h,1)))[:,0]
                
                print(anchor.shape)
                
                distances = euclidean_distances(anchor, negative)[0]
                
                # the hardest negative is the example closest to the anchor
                idx_N = np.argmin(distances)
                
            triplet[0] = X[anchor_class][idx_A].reshape(w, h, 1)
            triplet[1] = X[anchor_class][idx_P].reshape(w, h, 1)
            triplet[2]=  X[negative_class][idx_N].reshape(w, h, 1)
            triplets.append(triplet)

        yield np.array(triplets), 1 * np.zeros(batch_size)
        
        
def train(triplet_net, X_train, hard_selection, batch_size=64, steps_per_epoch=100, epochs=1):
    
    triplet_net.fit(get_batch(triplet_net, batch_size, X_train, hard_selection),
                    steps_per_epoch=steps_per_epoch, 
                    epochs=epochs)

### Task 2.4: One-shot learning with different selection method (2pt)

The function "make_oneshot_task" randomly sets up a one-shot task from a given test set (if a language is specified, using only classes/characters from that language). It generates N pairs of images, where the first image is always the test image and the second image is one of the N reference images. The pair of images from the same class has target 1; all other targets are 0.

The function "test_oneshot" generates a number (k) of such one-shot tasks and evaluates the performance of a given model on these tasks; it reports the percentage of correctly classified test images.

In "test_oneshot", you can use embeddings extracted from the triplet network together with the L2 distance to evaluate one-shot learning: for a given one-shot task, obtain embeddings for the test image as well as the support set, then pick the image from the support set that is closest (in L2 distance) to the test image as your one-shot prediction.

*HINT: you can re-use some code from practice 4b.4*
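The prediction rule described above can be sketched on toy embeddings as follows. The vectors and the helper name `predict_oneshot` are assumptions for illustration; in the assignment the embeddings would come from the trained network.

```python
import numpy as np

def predict_oneshot(test_emb, support_embs):
    """Return the index of the support embedding closest (L2) to the test embedding."""
    distances = np.linalg.norm(support_embs - test_emb, axis=1)
    return int(np.argmin(distances))

# A 3-way toy task: the second support image belongs to the test image's class.
test_emb = np.array([1.0, 0.0])
support_embs = np.array([[0.0, 1.0], [0.9, 0.1], [-1.0, 0.0]])
targets = np.array([0.0, 1.0, 0.0])

pred = predict_oneshot(test_emb, support_embs)
is_correct = pred == int(np.argmax(targets))
print(pred, is_correct)
```

Counting how often `is_correct` holds over k tasks and dividing by k gives the one-shot accuracy.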

In [7]:
def make_oneshot_task(N, X, c, language=None):
    """Create pairs of (test image, support set image) with ground truth, for testing N-way one-shot learning."""
    n_classes, n_examples, w, h = X.shape
    indices = np.random.randint(0, n_examples, size=(N,))
    if language is not None:
        low, high = c[language]
        if N > high - low:
            raise ValueError("This language ({}) has less than {} letters".format(language, N))
        categories = np.random.choice(range(low,high), size=(N,), replace=False)
    else:  # if no language specified just pick a bunch of random letters
        categories = np.random.choice(range(n_classes), size=(N,), replace=False)            
    true_category = categories[0]
    ex1, ex2 = np.random.choice(n_examples, replace=False, size=(2,))
    test_image = np.asarray([X[true_category, ex1, :, :]]*N).reshape(N, w, h, 1)
    support_set = X[categories, indices, :, :]
    support_set[0, :, :] = X[true_category, ex2]
    support_set = support_set.reshape(N, w, h, 1)
    targets = np.zeros((N,))
    targets[0] = 1
    targets, test_image, support_set = shuffle(targets, test_image, support_set)
    pairs = [test_image, support_set]
    return np.array(pairs), np.array(targets)


In [8]:
triplet_net = build_triplet_net()


# x_sample was not defined in this session; a single training image is assumed here.
x_sample = X_train[0, 0].reshape(1, 105, 105, 1)
triplet_net.get_layer("conv_base").predict(x_sample)


In [77]:
triplet_net.get_layer("conv_base").predict(x_sample).shape

(1, 1, 4096)

In [9]:
def test_oneshot(triplet_net, X, k, c):
    # === COMPLETE CODE BELOW ===       
    n_correct = 0
    
    #extract convolutional base from triplet net
    conv_base = triplet_net.get_layer("conv_base")
    
    for i in range(k):
        imagePairs, targets = make_oneshot_task(20, X, c)
        
        test_embed = conv_base.predict(imagePairs[np.newaxis, 0, 0])[:,0]
        
        predicted_embed = conv_base.predict(imagePairs[1])[:,0]
        
        distances = euclidean_distances(test_embed, predicted_embed)[0]
        
        if np.argmin(distances)  == np.argmax(targets):
            
            n_correct += 1
    
    percent_correct = 100.0 * n_correct / k

    return percent_correct


In [13]:
triplet_net = build_triplet_net()
test_oneshot(triplet_net, X_test, 250, c_test)

24.8

With the two triplet selection methods (random and hard), we will train our model and evaluate it by one-shot learning accuracy.

* You need to explicitly state the accuracy under each triplet selection method
* When evaluating the model with the test_oneshot function, evaluate on a 20-way one-shot task and set the number (k) of one-shot evaluation tasks to 250, then calculate the average accuracy

*HINT: After training the model with random selection and before training it with hard triplet selection, re-build the model (re-run the cell in Task 2.1) to re-initialize it, so that the model trained with random selection is not re-used*

#### Evaluate one-shot learning with  random triplets selection

In [10]:
triplet_net = build_triplet_net()


# hard_selection == False, select triplets randomly
# Train our model and evaluate the model by one-shot learning accuracy.
loops = 10
best_acc = 0
k=250

for i in range(loops):
    print("=== Training loop {} ===".format(i+1))
    # === ADD CODE HERE ===
    train(triplet_net, X_train, hard_selection = False)
    test_acc = test_oneshot(triplet_net, X_test, k, c_test)
    if test_acc >= best_acc:
        print("New best one-shot accuracy ({}), saving model ...".format(test_acc))
        triplet_net.save(os.path.join(PATH, "triplet_omniglot_random.h5"))
        best_acc = test_acc
    else:
        print("Accuracy ({}) not improved.".format(test_acc))

print("Best accuracy for random triplet selection: {}".format(best_acc))

=== Training loop 1 ===
Epoch 1/1

KeyboardInterrupt: 

In [17]:
batchgen = get_batch(triplet_net, 5, X_test, hard_selection=True)

In [22]:
a, b = next(batchgen)

a.shape

(105, 105)
(1, 4096)
(105, 105)
(1, 4096)
(105, 105)
(1, 4096)
(105, 105)
(1, 4096)
(105, 105)
(1, 4096)


(5, 3, 105, 105, 1)

#### Evaluate one-shot learning with  hard triplets selection

In [14]:
triplet_net = build_triplet_net()
# hard_selection == True, select hard triplets
# Train our model and evaluate the model by one-shot learning accuracy.
loops = 10
best_acc = 0
k=250

for i in range(loops):
    print("=== Training loop {} ===".format(i+1))
    # === ADD CODE HERE ===
    train(triplet_net, X_train, hard_selection = True)
    test_acc = test_oneshot(triplet_net, X_test, k, c_test)
    if test_acc >= best_acc:
        print("New best one-shot accuracy ({}), saving model ...".format(test_acc))
        triplet_net.save(os.path.join(PATH, "triplet_omniglot_hard.h5"))
        best_acc = test_acc
    
    else:
        print("Accuracy ({}) not improved.".format(test_acc))


print("Best accuracy for hard triplet selection: {}".format(best_acc))

=== Training loop 1 ===
(105, 105)
Epoch 1/1


ValueError: Tensor Tensor("reshape_3/Reshape:0", shape=(?, 1, 4096), dtype=float32) is not an element of this graph.

In [0]:
from tensorflow.keras.models import load_model
convnet = build_convnet()
tripNN = load_model(os.path.join("drive","My Drive","models", "triplet_omniglot_hard.h5"), custom_objects={'triplet_loss': triplet_loss})
 

In [29]:
Acc = test_oneshot(convnet, X_test, k, c_test)
print(Acc)



28.0
