# One-shot Learning for the Omniglot Dataset

I reviewed the literature for one-shot learning best practices and for an architecture already verified on the Omniglot dataset. Two relevant papers are [FaceNet](https://arxiv.org/pdf/1503.03832.pdf) and [Siamese Neural Networks for One-Shot Image Recognition](http://www.cs.cmu.edu/~rsalakhu/papers/oneshot1.pdf).  
As these references show, architectures that combine a convolutional neural network with a similarity function in the final layers work well for this task. Once the model is trained, a forward pass encodes two images and measures the distance between their encodings. The probability obtained by passing that distance through a sigmoid function can be used as a metric to compare the two images.
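
As a rough illustration of the comparison step described above, the NumPy sketch below scores two already-encoded feature vectors with a sigmoid over a weighted, component-wise L1 distance. The vector length, the weights `alpha`, and the bias `b` are made-up placeholders rather than trained values.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def similarity(h1, h2, alpha, b):
    """Sigmoid of a weighted component-wise L1 distance between two encodings."""
    return sigmoid(np.dot(alpha, np.abs(h1 - h2)) + b)

demo_rng = np.random.RandomState(0)
h_a = demo_rng.rand(8)     # hypothetical encoding of image A
h_b = demo_rng.rand(8)     # hypothetical encoding of image B
alpha = -np.ones(8)        # placeholder weights: larger distance gives a lower score
b = 2.0                    # placeholder bias

p_same = similarity(h_a, h_a, alpha, b)  # identical encodings, so the distance is zero
p_diff = similarity(h_a, h_b, alpha, b)
print(p_same, p_diff)
```

With these placeholder weights, identical encodings score highest and the score falls as the encodings move apart.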

The model described in the second reference is implemented in the projects below, and the reported results look promising:

https://sorenbouma.github.io/blog/oneshot/

https://github.com/Goldesel23/Siamese-Networks-for-One-Shot-Learning

Since I do not have a cloud account, I was not able to run these models and verify the results.

Below is the model architecture with two inputs.

![alt text](Siamese_diagram_2.png "Title")



## Defining the Model
The model is defined below using the architecture and initial parameters from the [paper](http://www.cs.cmu.edu/~rsalakhu/papers/oneshot1.pdf).


In [7]:
from keras.layers import Input, Conv2D, Lambda, Dense, Flatten, MaxPooling2D
from keras.models import Model, Sequential
from keras.regularizers import l2
from keras import backend as K
from keras.optimizers import SGD,Adam
from keras.losses import binary_crossentropy
import numpy.random as rng
import numpy as np
import os
import pickle
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.utils import shuffle
%matplotlib inline
def W_init(shape,name=None):
    """Initialize weights as in paper"""
    values = rng.normal(loc=0,scale=1e-2,size=shape)
    return K.variable(values,name=name)
#//TODO: figure out how to initialize layer biases in keras.
def b_init(shape,name=None):
    """Initialize bias as in paper"""
    values=rng.normal(loc=0.5,scale=1e-2,size=shape)
    return K.variable(values,name=name)

input_shape = (105, 105, 1)
left_input = Input(input_shape)
right_input = Input(input_shape)
#build convnet to use in each siamese 'leg'
convnet = Sequential()
convnet.add(Conv2D(64,(10,10),activation='relu',input_shape=input_shape,
                   kernel_initializer=W_init,kernel_regularizer=l2(2e-4)))
convnet.add(MaxPooling2D())
convnet.add(Conv2D(128,(7,7),activation='relu',
                   kernel_regularizer=l2(2e-4),kernel_initializer=W_init,bias_initializer=b_init))
convnet.add(MaxPooling2D())
convnet.add(Conv2D(128,(4,4),activation='relu',kernel_initializer=W_init,kernel_regularizer=l2(2e-4),bias_initializer=b_init))
convnet.add(MaxPooling2D())
convnet.add(Conv2D(256,(4,4),activation='relu',kernel_initializer=W_init,kernel_regularizer=l2(2e-4),bias_initializer=b_init))
convnet.add(Flatten())
convnet.add(Dense(4096,activation="sigmoid",kernel_regularizer=l2(1e-3),kernel_initializer=W_init,bias_initializer=b_init))

#call the convnet Sequential model on each of the input tensors so params will be shared
encoded_l = convnet(left_input)
encoded_r = convnet(right_input)
#layer to merge two encoded inputs with the l1 distance between them
L1_layer = Lambda(lambda tensors:K.abs(tensors[0] - tensors[1]))
#call this layer on list of two input tensors.
L1_distance = L1_layer([encoded_l, encoded_r])
prediction = Dense(1,activation='sigmoid',bias_initializer=b_init)(L1_distance)
siamese_net = Model(inputs=[left_input,right_input],outputs=prediction)

optimizer = Adam(0.00006)
#//TODO: get layerwise learning rates and momentum annealing scheme described in paper working
siamese_net.compile(loss="binary_crossentropy",optimizer=optimizer)

siamese_net.count_params()

38951745
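
The parameter count reported above can be checked by hand. The sketch below assumes the Keras defaults that the model definition relies on: unpadded ("valid") convolutions with stride 1, and 2x2 max pooling. The helper names are illustrative only.

```python
def conv_out(size, kernel):
    """Output width of an unpadded, stride-1 convolution."""
    return size - kernel + 1

def conv_params(kernel, c_in, c_out):
    """Weights plus biases of a square convolution layer."""
    return kernel * kernel * c_in * c_out + c_out

size, channels, total = 105, 1, 0
# (filters, kernel size, followed by 2x2 max pooling?)
layers = [(64, 10, True), (128, 7, True), (128, 4, True), (256, 4, False)]
for filters, kernel, pooled in layers:
    total += conv_params(kernel, channels, filters)
    size = conv_out(size, kernel)
    if pooled:
        size //= 2
    channels = filters

flat = size * size * channels   # 6 * 6 * 256 feature values after Flatten
total += flat * 4096 + 4096     # dense encoding layer
total += 4096 + 1               # final sigmoid unit
print(size, flat, total)        # prints: 6 9216 38951745
```

Almost all of the parameters sit in the 4096-unit dense layer that follows the flattened 6x6x256 feature map.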

### Set the Path 
The path is used to load the images for training the model and for applying the model to one-shot learning.

In [11]:
PATH = "C:/Users/mvariani/Desktop/Projects/Courses/Fellowship/Omniglot" #CHANGE THIS - path where the pickled data is stored

## Data 
The handwritten data is pickled as an N_classes x n_examples x width x height array (using load_data.py, uploaded separately), and there is an accompanying dictionary that specifies which indexes belong to which languages. The pickle for the "train" data is generated from "images_background" and the one for "val" from "images_evaluation".

In [14]:

with open(PATH + "/train.pickle", "rb") as f:
    (X,c) = pickle.load(f)

with open(PATH +  "/val.pickle", "rb") as f:
    (Xval,cval) = pickle.load(f)
    
print("training alphabets")
print(c.keys())
print("validation alphabets:")
print(cval.keys())

training alphabets
dict_keys(['Alphabet_of_the_Magi', 'Anglo-Saxon_Futhorc', 'Arcadian', 'Armenian', 'Asomtavruli_(Georgian)', 'Balinese', 'Bengali', 'Blackfoot_(Canadian_Aboriginal_Syllabics)', 'Braille', 'Burmese_(Myanmar)', 'Cyrillic', 'Early_Aramaic', 'Futurama', 'Grantha', 'Greek', 'Gujarati', 'Hebrew', 'Inuktitut_(Canadian_Aboriginal_Syllabics)', 'Japanese_(hiragana)', 'Japanese_(katakana)', 'Korean', 'Latin', 'Malay_(Jawi_-_Arabic)', 'Mkhedruli_(Georgian)', 'N_Ko', 'Ojibwe_(Canadian_Aboriginal_Syllabics)', 'Sanskrit', 'Syriac_(Estrangelo)', 'Tagalog', 'Tifinagh'])
validation alphabets:
dict_keys(['Angelic', 'Atemayar_Qelisayer', 'Atlantean', 'Aurek-Besh', 'Avesta', 'Ge_ez', 'Glagolitic', 'Gurmukhi', 'Kannada', 'Keble', 'Malayalam', 'Manipuri', 'Mongolian', 'Old_Church_Slavonic_(Cyrillic)', 'Oriya', 'Sylheti', 'Syriac_(Serto)', 'Tengwar', 'Tibetan', 'ULOG'])
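
To make the array layout described above concrete, here is a small sketch using synthetic data. The array contents, the alphabet names, and the index ranges are invented for illustration. The `(low, high)` pair is treated as half-open, which is an assumption based on how the loader further down calls `range(low, high)`.

```python
import numpy as np

# Synthetic stand-in for the pickled array: 5 characters, 20 drawings each, 105x105 pixels.
X_demo = np.zeros((5, 20, 105, 105), dtype=np.uint8)

# Hypothetical index dictionary: alphabet name -> (low, high) range of class indices.
c_demo = {"Alphabet_A": (0, 3), "Alphabet_B": (3, 5)}

def classes_of(alphabet, categories):
    """Class indices that belong to one alphabet (half-open range, by assumption)."""
    low, high = categories[alphabet]
    return list(range(low, high))

n_classes, n_examples, w, h = X_demo.shape
one_drawing = X_demo[2, 7]   # drawing 7 of character 2
print(n_classes, n_examples, one_drawing.shape, classes_of("Alphabet_B", c_demo))
```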


## Batch Data and Test Files
The code below generates training batches from the "train" data set for optimization. To evaluate the training, random one-shot tasks are generated from the "val" data set.

In [15]:
class Siamese_Loader:
    """For loading batches and testing tasks to a siamese net"""
    def __init__(self, path, data_subsets = ["train", "val"]):
        self.data = {}
        self.categories = {}
        self.info = {}
        
        for name in data_subsets:
            file_path = os.path.join(path, name + ".pickle")
            print("loading data from {}".format(file_path))
            with open(file_path,"rb") as f:
                (X,c) = pickle.load(f)
                self.data[name] = X
                self.categories[name] = c

    def get_batch(self,batch_size,s="train"):
        """Create batch of n pairs, half same class, half different class"""
        X=self.data[s]
        n_classes, n_examples, w, h = X.shape

        #randomly sample several classes to use in the batch
        categories = rng.choice(n_classes,size=(batch_size,),replace=False)
        #initialize 2 empty arrays for the input image batch
        pairs=[np.zeros((batch_size, h, w,1)) for i in range(2)]
        #initialize vector for the targets, and make one half of it '1's, so 2nd half of batch has same class
        targets=np.zeros((batch_size,))
        targets[batch_size//2:] = 1
        for i in range(batch_size):
            category = categories[i]
            idx_1 = rng.randint(0, n_examples)
            pairs[0][i,:,:,:] = X[category, idx_1].reshape(w, h, 1)
            idx_2 = rng.randint(0, n_examples)
            #pick images of different class for 1st half, same class for 2nd
            if i >= batch_size // 2:
                category_2 = category  
            else: 
                #add a random number to the category modulo n classes to ensure 2nd image has
                # ..different category
                category_2 = (category + rng.randint(1,n_classes)) % n_classes
            pairs[1][i,:,:,:] = X[category_2,idx_2].reshape(w, h,1)
        return pairs, targets
    
    def generate(self, batch_size, s="train"):
        """a generator for batches, so model.fit_generator can be used. """
        while True:
            pairs, targets = self.get_batch(batch_size,s)
            yield (pairs, targets)    

    def make_oneshot_task(self,N,s="val",language=None):
        """Create pairs of test image, support set for testing N way one-shot learning. """
        X=self.data[s]
        n_classes, n_examples, w, h = X.shape
        indices = rng.randint(0,n_examples,size=(N,))
        if language is not None:
            low, high = self.categories[s][language]
            if N > high - low:
                raise ValueError("This language ({}) has less than {} letters".format(language, N))
            categories = rng.choice(range(low,high),size=(N,),replace=False)
            
        else:#if no language specified just pick a bunch of random letters
            categories = rng.choice(range(n_classes),size=(N,),replace=False)            
        true_category = categories[0]
        ex1, ex2 = rng.choice(n_examples,replace=False,size=(2,))
        test_image = np.asarray([X[true_category,ex1,:,:]]*N).reshape(N, w, h,1)
        support_set = X[categories,indices,:,:]
        support_set[0,:,:] = X[true_category,ex2]
        support_set = support_set.reshape(N, w, h,1)
        targets = np.zeros((N,))
        targets[0] = 1
        targets, test_image, support_set = shuffle(targets, test_image, support_set)
        pairs = [test_image,support_set]

        return pairs, targets
    
    def test_oneshot(self,model,N,k,s="val",verbose=0):
        """Test average N way oneshot learning accuracy of a siamese neural net over k one-shot tasks"""
        n_correct = 0
        if verbose:
            print("Evaluating model on {} random {} way one-shot learning tasks ...".format(k,N))
        for i in range(k):
            inputs, targets = self.make_oneshot_task(N,s)
            probs = model.predict(inputs)
            if np.argmax(probs) == np.argmax(targets):
                n_correct+=1
        percent_correct = (100.0*n_correct / k)
        if verbose:
            print("Got an average of {}% {} way one-shot learning accuracy".format(percent_correct,N))
        return percent_correct
    
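    def train(self, model, epochs, verbosity, batch_size=32, steps_per_epoch=100):
        """Train with fit_generator; the batch_size and steps_per_epoch defaults are placeholders"""
        model.fit_generator(self.generate(batch_size),
                            steps_per_epoch=steps_per_epoch,
                            epochs=epochs,
                            verbose=verbosity)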
    
    
#Instantiate the class
loader = Siamese_Loader(PATH)

loading data from C:/Users/mvariani/Desktop/Projects/Courses/Fellowship/Omniglot\train.pickle
loading data from C:/Users/mvariani/Desktop/Projects/Courses/Fellowship/Omniglot\val.pickle
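
The modulo trick that `get_batch` uses to pick a different class for the negative pairs can be checked exhaustively on a small example. The class count below is an arbitrary illustrative value, and the helper name is not part of the loader.

```python
def different_class(category, offset, n_classes):
    """Shift a class index by a non-zero offset, wrapping around at n_classes."""
    return (category + offset) % n_classes

n_demo_classes = 10  # illustrative value, not the real number of Omniglot characters
# rng.randint(1, n_classes) draws offsets from 1 to n_classes - 1, so try them all.
all_different = all(
    different_class(category, offset, n_demo_classes) != category
    for category in range(n_demo_classes)
    for offset in range(1, n_demo_classes)
)
print(all_different)  # prints: True
```

Because the offset is never 0 and never a multiple of the class count, the second image always comes from another class.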


## Training
The loop below trains the model and saves it whenever the validation accuracy matches or beats the best value so far. Since I do not have a cloud account, I was not able to train the model for a large number of iterations. Please note that `n_iter` is set to 2, so the loop runs a single iteration, just to make sure the code runs without errors.

In [16]:
#Training loop
print("!")
evaluate_every = 1 # interval for evaluating on one-shot tasks
loss_every=50 # interval for printing loss (iterations)
batch_size = 32
# n_iter = 90000 #orig
n_iter = 2
N_way = 20 # how many classes for testing one-shot tasks
# n_val = 250 # how many one-shot tasks to validate on (original value)
n_val = 25 # reduced for this quick run, matching the output below
best = -1
weights_path = os.path.join(PATH, "weights")
print("training")
for i in range(1, n_iter):
    (inputs,targets)=loader.get_batch(batch_size)
    loss=siamese_net.train_on_batch(inputs,targets)
    print(loss)
    if i % evaluate_every == 0:
        print("evaluating")
        val_acc = loader.test_oneshot(siamese_net,N_way,n_val,verbose=True)
        if val_acc >= best:
            print("saving")
            siamese_net.save(weights_path)
            best=val_acc

    if i % loss_every == 0:
        print("iteration {}, training loss: {:.2f},".format(i,loss))


!
training
4.523834
evaluating
Evaluating model on 25 random 20 way one-shot learning tasks ...
Got an average of 28.0% 20 way one-shot learning accuracy
saving
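
To put the accuracy above in context, it helps to compare it with uniform random guessing. The sketch below computes that baseline for an N-way task. It assumes each of the N candidates is equally likely to be picked, and the function name is illustrative.

```python
def chance_accuracy(n_way):
    """Expected accuracy (in percent) of guessing uniformly among n_way candidates."""
    return 100.0 / n_way

baseline = chance_accuracy(20)
print("chance accuracy: {:.1f}%, chance error: {:.1f}%".format(baseline, 100 - baseline))
# prints: chance accuracy: 5.0%, chance error: 95.0%
```

A single training iteration is far too little to draw conclusions from, so this comparison only shows the scale against which later results should be read.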


## Applying the model
Once the model is trained, it can be used for one-shot classification. Here the "all_runs" files are used to measure the accuracy of the model. In each run, `model.predict` gives the probability that two images match, based on the distance between their encodings. Each image in the "test" folder is compared with every image in the "train" folder, and the highest probability is used to label the image. The errors can be very large because the model has not been trained for an adequate number of iterations.

In [17]:
import numpy as np
import copy
from scipy.ndimage import imread
from scipy.spatial.distance import cdist

#Loading the model
#siamese_net = load_model(PATH + '/siamese_net.h5')

# Parameters
nrun = 20 # number of classification runs
fname_label = 'class_labels.txt' # where class labels are stored for each run

def classification_run(folder,f_load,siamese_prob,fpath):

#     assert ((ftype=='cost') | (ftype=='score'))

    # get file names
    
    with open(folder+'/'+fname_label) as f:
        content = f.read().splitlines()
    pairs = [line.split() for line in content]
    test_files  = [pair[0] for pair in pairs]
    train_files = [pair[1] for pair in pairs]
    answers_files = copy.copy(train_files)
    test_files.sort()
    train_files.sort()	
    ntrain = len(train_files)
    ntest = len(test_files)

    # load the images (and, if needed, extract features)
    train_items = [f_load(os.path.join(fpath,f)) for f in train_files]
    test_items  = [f_load(os.path.join(fpath,f)) for f in test_files ]

    # compute Probability matrix
    ProbM = np.zeros((ntest,ntrain),float)
    for i in range(ntest):
        for c in range(ntrain):
            ProbM[i,c] = siamese_prob(test_items[i],train_items[c])    
   
    YHAT = np.argmax(ProbM,axis=1)
    

    # compute the error rate
    correct = 0
    for i in range(ntest):
        if train_files[YHAT[i]] == answers_files[i]:
            correct += 1.0
    pcorrect = 100 * correct / ntest
    perror = 100 - pcorrect
    return perror

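def Prob_Calc(itemA,itemB):
    # add batch and channel axes so each image matches the network input shape
    b1 = itemA.reshape(1,105,105,1)
    b2 = itemB.reshape(1,105,105,1)
    # predict returns an array of shape (1, 1); return the scalar probability
    p = siamese_net.predict([b1, b2])
    return p[0,0]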

def LoadImgAsPoints(fn):
    I = imread(fn,flatten=True)
    return I

f_path = PATH + '/all_runs/'
print('One-shot classification demo with Siamese')
perror = np.zeros(nrun)
for r in range(1,nrun+1):
    rs = str(r)
    if len(rs)==1:
        rs = '0' + rs
    perror[r-1] = classification_run(f_path+'run'+rs, LoadImgAsPoints, Prob_Calc,f_path)
    print(" run " + str(r) + " (error " + str(perror[r-1] ) + "%)")
total = np.mean(perror)
print(" average error " + str(total) + "%")


One-shot classification demo with Siamese


`imread` is deprecated in SciPy 1.0.0.
Use ``matplotlib.pyplot.imread`` instead.


 run 1 (error 90.0%)
 run 2 (error 85.0%)
 run 3 (error 90.0%)
 run 4 (error 65.0%)
 run 5 (error 60.0%)
 run 6 (error 90.0%)
 run 7 (error 85.0%)
 run 8 (error 85.0%)
 run 9 (error 80.0%)
 run 10 (error 85.0%)
 run 11 (error 80.0%)
 run 12 (error 90.0%)
 run 13 (error 85.0%)
 run 14 (error 90.0%)
 run 15 (error 70.0%)
 run 16 (error 80.0%)
 run 17 (error 85.0%)
 run 18 (error 65.0%)
 run 19 (error 95.0%)
 run 20 (error 90.0%)
 average error 82.25%
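
The labeling rule used in `classification_run` (take the training image with the highest probability for each test image) can be illustrated with a toy probability matrix. All scores and the "correct" columns below are made up for the example.

```python
import numpy as np

# Made-up similarity scores: rows are test images, columns are training images.
prob_demo = np.array([
    [0.9, 0.2, 0.1],
    [0.3, 0.4, 0.8],
    [0.6, 0.5, 0.1],
])
true_match = np.array([0, 2, 1])  # hypothetical correct column for each row

predicted = np.argmax(prob_demo, axis=1)   # best-scoring training image per test image
error_rate = 100.0 * np.mean(predicted != true_match)
print(predicted, error_rate)
```

In this toy case the third test image is matched to the wrong training image, so one of the three predictions counts as an error.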
