# Getting to Know Siamese Networks

Siamese networks are becoming increasingly popular and have a wide range of applications. A Siamese network is an architecture that can be used to train a model to compare two things. These networks are currently used in the following applications:

- Signature verification
- Apple Photo ID
- Comparing a text with its paraphrase
- Comparing two texts to detect neutral, contradicting and entailing sentence pairs in the SNLI dataset
- DeepFace: a facial recognition research system that uses a Siamese network

The Siamese network architecture has two sister networks connected by a common stem, as shown in the diagram below.

![](figures/siamese_network-1.png)

Figure: Schematic structure of a Siamese network. It always has two sister networks connected to a common stem. The loss functions used to train this kind of network are discussed below.

It is important to note that the two arms of the network must have the same architecture and must share their weights. The arms of a Siamese network can contain various types of layers, for example:

1. Dense layers to process numerical data
2. Convolutional layers to compare two images
3. Recurrent layers to compare two sentences
4. A combination of convolutional and recurrent layers to compare two signals, such as video or audio streams

Usually, a Siamese network is used for binary classification (similar or not similar), so it can be trained with a binary cross-entropy loss function.
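To make the idea concrete before moving to the text model, here is a minimal sketch of a Siamese network on numerical data. The class `TinySiamese`, its layer sizes and the random inputs are all hypothetical and only serve as an illustration; the single `encoder` is applied to both inputs, which is one simple way to get shared weights, and the model is trained with binary cross-entropy on a single logit.

```python
import torch
from torch import nn

torch.manual_seed(0)


class TinySiamese(nn.Module):
    """Hypothetical toy model: one shared dense encoder plus a small stem."""

    def __init__(self, in_features=8, hidden=16):
        super().__init__()
        # A single encoder is applied to both inputs, so the arms share weights.
        self.encoder = nn.Sequential(nn.Linear(in_features, hidden), nn.ReLU())
        # The stem maps the concatenated arm outputs to one similarity logit.
        self.stem = nn.Linear(hidden * 2, 1)

    def forward(self, left_batch, right_batch):
        joined = torch.cat((self.encoder(left_batch), self.encoder(right_batch)), dim=1)
        return self.stem(joined).squeeze(1)


toy_model = TinySiamese()
left_batch = torch.randn(4, 8)
right_batch = torch.randn(4, 8)
labels = torch.tensor([1.0, 0.0, 1.0, 0.0])  # 1 = similar, 0 = different

logits = toy_model(left_batch, right_batch)
# BCEWithLogitsLoss applies the sigmoid internally, so the model returns raw logits.
loss = nn.BCEWithLogitsLoss()(logits, labels)
loss.backward()
print(logits.shape, loss.item())
```

The text model built in the rest of this notebook follows the same two-arms-plus-stem layout, but uses an LSTM in each arm and a two-class output.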


# Importing requirements

In [None]:
import os

import numpy as np
import torch
from tensorboardX import SummaryWriter
from torch import nn
from torch.autograd import Variable
from torchtext import data
from torchtext import vocab
from tqdm import tqdm

SEED = 1234

torch.manual_seed(SEED)
torch.cuda.manual_seed(SEED)
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

**Setting up configuration**

In [None]:
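class Config:
    embed_dim = 100          # size of each GloVe word vector
    batch_size = 32
    hidden_size = 100        # LSTM hidden units
    input_size = 10          # tokens per description (same as max_tokens)
    bidirectional = False
    n_layers = 1             # number of stacked LSTM layers
    piller_out_class = 100   # output width of each sister network
    num_class = 2            # same security / different security
    max_tokens = 10          # descriptions are padded or truncated to this length

config = Config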

**Dataset Description:** To demonstrate the effectiveness of a Siamese network in comparing two texts, we will use a small dataset located at `data/text_simillarity`. This text similarity dataset was found through Google Dataset Search and is available under the Database Contents License (DbCL) v1.0. The first few rows are shown below. Each row pairs two ticker descriptions, **description_x** and **description_y**. A stock ticker is a report of the price of certain securities, updated continuously throughout the trading session by the various stock exchanges. The **same_security** column is used as the label. The goal of our Siamese network is to take text x and text y and predict whether they describe the same security.

|               | description_x                                         | description_y                                                | ticker_x | ticker_y      | same_security |
|---------------|-------------------------------------------------------|--------------------------------------------------------------|----------|---------------|---------------|
| 0             | first trust dow jones internet                        | first trust dj internet idx                                  | FDN      | FDN           | TRUE  | 
| 1             | schwab intl large company index etf                   | schwab strategic tr fundamental intl large co index etf      | FNDF     | FNDF          | TRUE  | 
| 2             | vanguard small cap index adm                          | vanguard small-cap index fund inst                           | VSMAX    | VSCIX         | FALSE | 
| 3             | duke energy corp new com new isin #us4 sedol #b7jzsk0 | duke energy corp new com new isin #us26441c2044 sedol #b7jzs | DUK      | DUK           | TRUE  | 
| 4             | visa inc class a                                      | visa inc.                                                    | V        | V             | TRUE  | 
| 5             | ford motor co new div: 0.600                          | ford motor co                                                | F        | F             | TRUE  | 


**Loading and pre-processing data:** torchtext's `data.TabularDataset.splits` is used to load the data, and 100-dimensional GloVe vectors are used as the pre-trained embedding. `data.Iterator.splits` is then used to build the train and test iterators.



In [None]:
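def tokenize(tokens):
    """Lower-case a token list and pad or truncate it to config.max_tokens items."""
    tokens = [t.lower() for t in tokens][:config.max_tokens]
    # '0' is used as the padding token for short descriptions
    return tokens + ['0'] * (config.max_tokens - len(tokens))


def to_categorical(x):
    """Map the same_security label to a one-hot pair: [1, 0] for TRUE, [0, 1] for FALSE."""
    label = str(x).strip().upper()
    if label == "TRUE":
        return [1, 0]
    if label == "FALSE":
        return [0, 1]
    raise ValueError("unexpected same_security label: {!r}".format(x))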
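To see what the padding and truncation step does, here is a small standalone sketch. The helper `pad_or_truncate` and the constant `MAX_TOKENS` are illustrative names that do not appear in the notebook; they mirror the behaviour of `tokenize` with `config.max_tokens = 10`, and the inputs are split on whitespace by hand because this sketch runs outside torchtext.

```python
MAX_TOKENS = 10  # mirrors config.max_tokens


def pad_or_truncate(tokens, max_tokens=MAX_TOKENS, pad_token='0'):
    """Lower-case the tokens, then cut or pad the list to exactly max_tokens items."""
    tokens = [t.lower() for t in tokens][:max_tokens]
    return tokens + [pad_token] * (max_tokens - len(tokens))


# A short description is padded with '0' tokens up to 10 items.
short_desc = pad_or_truncate("Visa Inc Class A".split())
# An 11-token description loses its last token.
long_desc = pad_or_truncate(
    "duke energy corp new com new isin #us4 sedol #b7jzsk0 extra".split())

print(short_desc)
print(long_desc)
```

Every description therefore reaches the network with the same length, which is what allows a batch to be stored as a single [batch_size, max_tokens] tensor.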


In [None]:
# defining data fields
TEXT1 = data.Field(sequential=True , preprocessing=tokenize, use_vocab = True,batch_first=True)
LABEL = data.Field(is_target=True,use_vocab = False, sequential=False, preprocessing = to_categorical)

fields = [(None, None), ('description_x', TEXT1),('description_y', TEXT1), (None, None),(None, None), ('same_security', LABEL)]

# constructing tabular dataset
train_data , test_data = data.TabularDataset.splits(
                            path = 'data/text_simillarity',
                            train = 'train.csv',
                            test = 'test.csv',
                            format = 'csv',
                            skip_header=True,
                            fields = fields)

In [None]:
# Printing sample data
print ([vars(train_data[i]) for i in range (0,3)])

**Downloading embedding**

Pre-trained embeddings are available and can easily be used in our model. We will be using the GloVe embedding with 100 dimensions.

In [None]:
import zipfile

import chakin

embed_exists = os.path.isfile('../embeddings/glove.6B.zip')
if not embed_exists:
    print("Downloading GloVe embeddings; if the download fails, delete `../embeddings/glove.6B.zip` and run this cell again")
    # list the available English embeddings, then download entry 12 from that list
    chakin.search(lang='English')
    chakin.download(number=12, save_dir='../embeddings')
    with zipfile.ZipFile("../embeddings/glove.6B.zip", 'r') as zip_ref:
        zip_ref.extractall("../embeddings/")

**Constructing iterator**

In [None]:
vec = vocab.Vectors(name = 'glove.6B.100d.txt',cache = "../embeddings/glove.6B/")
TEXT1.build_vocab(train_data, test_data, max_size=400000, vectors=vec)

# making iterator
train_iter, test_iter = data.Iterator.splits(
        (train_data, test_data), sort_key=lambda x: len(x.description_x),
        batch_sizes=(config.batch_size,config.batch_size), device=device)

**Vocabulary size and embedding vectors**

In [None]:
vocab_size = len(TEXT1.vocab)
vocab_vectors = TEXT1.vocab.vectors
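The `Piller` class defined below creates its `nn.Embedding` with random initial weights and does not read `vocab_vectors`. One possible way to start from the GloVe vectors instead is `nn.Embedding.from_pretrained`. The sketch below uses a random tensor named `fake_vectors` as a stand-in for `TEXT1.vocab.vectors`, so the sizes and values are assumptions for illustration only.

```python
import torch
from torch import nn

# Stand-in for TEXT1.vocab.vectors: 5 words, 100 dimensions each (random values here).
fake_vectors = torch.randn(5, 100)

# freeze=False keeps the embedding trainable so it can be fine-tuned during training.
embedding = nn.Embedding.from_pretrained(fake_vectors, freeze=False)

token_ids = torch.tensor([[0, 3, 4]])
print(embedding(token_ids).shape)  # torch.Size([1, 3, 100])
```

Whether to freeze the vectors or fine-tune them is a design choice; with a small dataset like this one, freezing them is a reasonable option to try as well.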

# Model

**Constructing the sister network:** For text processing, I use an LSTM in each sister network. Each sister network takes an input of shape [batch_size, input_length]. After the embedding layer this becomes [batch_size, input_length, embedding_size]. The embedded sequence is fed to the LSTM, and the LSTM outputs for all time steps are flattened and passed to a dense layer, which can produce any output size. In our case each sister network outputs a tensor of shape [batch_size, 100]. The sister network is implemented as the `Piller` class.



In [None]:
class Piller(nn.Module):
    """One arm (sister network): embedding -> LSTM -> dense layer."""

    def __init__(self, config: Config, vocab_size):
        super(Piller, self).__init__()
        self.config = config
        self.embed = nn.Embedding(vocab_size, embedding_dim=config.embed_dim)
        self.lstm1 = nn.LSTM(config.embed_dim, config.hidden_size, batch_first=True)
        # The dense layer sees every LSTM output step flattened together,
        # so its input width is sequence length * hidden size.
        self.dense = nn.Linear(config.max_tokens * config.hidden_size, config.piller_out_class)

    def forward(self, input):
        embed_out = self.embed(input)  # [batch, max_tokens, embed_dim]
        # No initial state is passed, so the LSTM starts from zeros on every batch.
        # Carrying (h0, c0) over from the previous batch would tie each backward
        # pass to the graphs of all earlier batches.
        lstm_out, _ = self.lstm1(embed_out)  # [batch, max_tokens, hidden_size]
        dense_out = self.dense(lstm_out.contiguous().view(input.size(0), -1))
        return torch.softmax(dense_out, 1)
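The shape changes described for the sister network can be traced with a few standalone layers. This is only a sketch: the layer objects are created fresh with random weights, and `toy_vocab_size` is a hypothetical small vocabulary, while the other sizes follow `Config`.

```python
import torch
from torch import nn

batch_size, max_tokens, embed_dim, hidden_size, out_size = 32, 10, 100, 100, 100
toy_vocab_size = 50  # hypothetical small vocabulary

embed = nn.Embedding(toy_vocab_size, embed_dim)
lstm = nn.LSTM(embed_dim, hidden_size, batch_first=True)
dense = nn.Linear(max_tokens * hidden_size, out_size)

token_ids = torch.randint(0, toy_vocab_size, (batch_size, max_tokens))
embedded = embed(token_ids)                   # [32, 10, 100]
lstm_out, (h_n, c_n) = lstm(embedded)         # lstm_out: [32, 10, 100]
flat = lstm_out.reshape(batch_size, -1)       # [32, 1000]
arm_out = torch.softmax(dense(flat), dim=1)   # [32, 100]

for name, tensor in [("token ids", token_ids), ("embedded", embedded),
                     ("lstm output", lstm_out), ("flattened", flat),
                     ("arm output", arm_out)]:
    print(name, tuple(tensor.shape))
```

Because of the final softmax, each row of the arm output sums to 1.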
        



**The Stem**: The stem is the part of the network where the two sister networks converge. Their outputs are passed through fully connected layers, and the comparison is made by classification.


In [None]:
class Stem(nn.Module):
    def __init__(self, config):
        super(Stem, self).__init__()
        self.config = config
        self.dense1 = nn.Linear(config.piller_out_class*2, config.piller_out_class)
        self.dense2 = nn.Linear(config.piller_out_class, config.num_class)
        
    def forward(self, input):
        stem_dense1 = self.dense1(input)
        stem_dense2 = self.dense2(stem_dense1)
        return stem_dense2    

`SiameseNetwork` is constructed by connecting the **Pillers** to the **Stem**.

In [None]:
class SiameseNetwork(nn.Module):
    def __init__(self,left_arm, right_arm, stem):
        super(SiameseNetwork,self).__init__()
        self.left_arm = left_arm
        self.right_arm = right_arm
        self.stem = stem
    def forward(self, left_input, right_input):
        left_output = self.left_arm(left_input)
        right_output = self.right_arm(right_input)
        stem_input  = torch.cat((left_output,right_output), dim = 1)
        stem_output = self.stem(stem_input)
        return stem_output

Initializing the network and moving it to the device

In [None]:
left = Piller(config, vocab_size = vocab_size)
right = Piller(config,vocab_size = vocab_size)
stem  = Stem(config)
left= left.to(device)
right= right.to(device)
stem = stem.to(device)

model = SiameseNetwork(left, right, stem)
model= model.to(device)
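The cell above creates two separate `Piller` objects, so the left and right arms start from different random weights and are updated independently. The introduction states that the arms of a Siamese network share their weights, and one simple way to get that in PyTorch is to reuse a single module instance for both arms. The sketch below illustrates the difference with small `nn.Linear` layers; the helper `count_unique_parameters` is a hypothetical name introduced only for this example.

```python
import torch
from torch import nn


def count_unique_parameters(*modules):
    """Count distinct parameter tensors across the given modules."""
    seen = {id(p) for m in modules for p in m.parameters()}
    return len(seen)


# Two separately constructed arms hold independent weights.
arm_a, arm_b = nn.Linear(4, 3), nn.Linear(4, 3)
print(count_unique_parameters(arm_a, arm_b))  # 4 tensors: two weights, two biases

# Reusing one instance for both arms shares every parameter.
shared_arm = nn.Linear(4, 3)
print(count_unique_parameters(shared_arm, shared_arm))  # 2 tensors: one weight, one bias
```

In this notebook, the equivalent change would be to pass the same `Piller` instance as both `left_arm` and `right_arm` when building `SiameseNetwork`. I have not re-run the training with that change, so treat it as a suggestion to experiment with.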

**Supporting functions**

In [None]:
def binary_accuracy(preds, y):
    """
    Returns accuracy per batch, i.e. if you get 8/10 right, this returns 0.8, NOT 8
    """
    rounded_preds = torch.argmax(preds, dim=1)
    correct = (rounded_preds == torch.argmax(y, dim=1)).float() #convert into float for division 
    acc = correct.sum()/len(correct)
    return acc
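As a quick check of how `binary_accuracy` behaves with one-hot targets, here is a small worked example. The function is repeated so the snippet runs on its own, and the prediction and target values are made up for illustration.

```python
import torch


def binary_accuracy(preds, y):
    """Fraction of rows where the predicted class matches the one-hot target."""
    rounded_preds = torch.argmax(preds, dim=1)
    correct = (rounded_preds == torch.argmax(y, dim=1)).float()
    return correct.sum() / len(correct)


# Four predictions over two classes; column 0 means TRUE, column 1 means FALSE.
example_preds = torch.tensor([[0.9, 0.1], [0.2, 0.8], [0.6, 0.4], [0.3, 0.7]])
example_target = torch.tensor([[1., 0.], [0., 1.], [0., 1.], [0., 1.]])

# Predicted classes are 0, 1, 0, 1 and true classes are 0, 1, 1, 1: three of four match.
print(binary_accuracy(example_preds, example_target).item())  # 0.75
```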

In [None]:
def train(model, iterator, optimizer, criterion):
    """Run one training epoch and return the model, mean loss and mean accuracy."""
    model.train()
    epoch_loss = 0
    epoch_acc = 0
    n_batches = 0
    for batch in iterator:
        # only full batches are used; a smaller final batch is skipped
        if batch.description_x.shape[0] == config.batch_size:
            x1 = batch.description_x.long().to(device)
            x2 = batch.description_y.long().to(device)
            target = batch.same_security.float().to(device)
            optimizer.zero_grad()
            predictions = model(x1, x2)
            loss = criterion(predictions, target)
            loss.backward(retain_graph=True)
            optimizer.step()
            acc = binary_accuracy(predictions, target)
            epoch_loss += loss.item()
            epoch_acc += acc.item()
            n_batches += 1
    # average over the batches that were actually processed
    n_batches = max(n_batches, 1)
    return model, epoch_loss / n_batches, epoch_acc / n_batches

In [None]:
def test_accuracy_calculator(model, test_iterator):
    """Return the mean accuracy over the full batches of the test iterator."""
    model.eval()
    epoch_acc = 0
    n_batches = 0
    with torch.no_grad():  # no gradients are needed for evaluation
        for batch in test_iterator:
            if batch.description_x.shape[0] == config.batch_size:
                x1 = batch.description_x.long().to(device)
                x2 = batch.description_y.long().to(device)
                target = batch.same_security.float().to(device)
                predictions = model(x1, x2)
                acc = binary_accuracy(predictions, target)
                epoch_acc += acc.item()
                n_batches += 1
    # average over the batches that were actually processed
    return epoch_acc / max(n_batches, 1)

**Defining optimizer and loss**

This network is trained with mean squared error (MSE) as the loss function and SGD as the optimizer. The training loss and training accuracy are logged to TensorBoard; the resulting curves are shown in the Performance section.

In [None]:
optimizer = torch.optim.SGD(model.parameters(), lr=0.001)
criterion = nn.MSELoss()
criterion = criterion.to(device)

In [None]:
epochs  = 10
writer = SummaryWriter()

for i in tqdm(range(epochs)):
    if (i != 0 and i%10 == 0 ):
        # halve the learning rate every 10 epochs
        for param_group in optimizer.param_groups:
            param_group['lr'] = param_group['lr']/2
            
    model, epoch_loss, epoch_acc = train(model, train_iter, optimizer, criterion)
#     test_acc = test_accuracy_calculator(model, test_iter)
    writer.add_scalar('Train/Loss', epoch_loss, i)
    writer.add_scalar('Train/Accuracy', epoch_acc, i)
#     writer.add_scalar('Test', test_acc, i)
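The loop above halves the learning rate by hand whenever the epoch index is a non-zero multiple of 10. Note that with `epochs = 10` the index only runs from 0 to 9, so that branch is never reached. For longer runs, the same schedule can be expressed with PyTorch's built-in `StepLR` scheduler. The sketch below uses a throwaway `nn.Linear` model and an empty optimizer step in place of real training, so all names in it are illustrative.

```python
import torch
from torch import nn

toy_model = nn.Linear(2, 1)
toy_optimizer = torch.optim.SGD(toy_model.parameters(), lr=0.001)
# step_size=10 and gamma=0.5 halve the learning rate once every 10 epochs.
scheduler = torch.optim.lr_scheduler.StepLR(toy_optimizer, step_size=10, gamma=0.5)

learning_rates = []
for epoch in range(25):
    learning_rates.append(toy_optimizer.param_groups[0]['lr'])
    toy_optimizer.step()  # stands in for one epoch of training
    scheduler.step()      # advance the schedule by one epoch

# Learning rate used in epochs 0, 10 and 20.
print(learning_rates[0], learning_rates[10], learning_rates[20])
```

Epochs 0 to 9 run at 0.001, epochs 10 to 19 at half of that, and epochs 20 onwards at a quarter, which matches what the manual update would do over 25 epochs.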

# Performance 

![](figures/siamese.png)

Figure: Convergence of the Siamese architecture on the text comparison task. The results shown are for the training data; the training loop also contains a commented-out block that computes accuracy on the test data, which you can enable to check it yourself.