## Assignment 4 - Question Duplicates

<a name='0'></a>
## Overview
In this assignment, concretely you will: 

- Learn about Siamese networks
- Understand how the triplet loss works
- Understand how to evaluate accuracy
- Use cosine similarity between the model's outputted vectors
- Use the data generator to get batches of questions
- Predict using your own model

By now, you are familiar with trax and know how to make use of classes to define your model. We will start this homework by asking you to preprocess the data the same way you did in the previous assignments. After processing the data you will build a classifier that will allow you to identify whether two questions are the same or not. 
<img src = "images/meme.png" style="width:550px;height:300px;"/>


You will process the data first and then pad in a similar way you have done in the previous assignment. Your model will take in the two question embeddings, run them through an LSTM, and then compare the outputs of the two sub networks using cosine similarity. Before taking a deep dive into the model, start by importing the data set.

### 1. Importing the data

In [131]:
import os
import nltk
import trax
from trax import layers as tl
from trax.supervised import training
from trax.fastmath import numpy as fastnp
import numpy as np
import pandas as pd
import random as rnd
from trax import shapes

import w4_unittest

nltk.data.path.append('nltk_data')
rnd.seed(4)

In [132]:
data = pd.read_csv('data/questions.csv')
print(f"number of questions pairs: {len(data)}")
data.head()

number of questions pairs: 404351


Unnamed: 0,id,qid1,qid2,question1,question2,is_duplicate
0,0,1,2,What is the step by step guide to invest in sh...,What is the step by step guide to invest in sh...,0
1,1,3,4,What is the story of Kohinoor (Koh-i-Noor) Dia...,What would happen if the Indian government sto...,0
2,2,5,6,How can I increase the speed of my internet co...,How can Internet speed be increased by hacking...,0
3,3,7,8,Why am I mentally very lonely? How can I solve...,Find the remainder when [math]23^{24}[/math] i...,0
4,4,9,10,"Which one dissolve in water quikly sugar, salt...",Which fish would survive in salt water?,0


In [133]:
N_TRAIN = 300000
N_TEST=10*1024
data_train = data[:N_TRAIN]
data_test = data[N_TRAIN:N_TRAIN + N_TEST]

print(f"length of training set: {len(data_train)}, length of testing set: {len(data_test)}")

length of training set: 300000, length of testing set: 10240


In [134]:
is_duplicate_index = data_train[data_train['is_duplicate'] == True].index.to_list()
is_duplicate_index

[5,
 7,
 11,
 12,
 13,
 15,
 16,
 18,
 20,
 29,
 31,
 32,
 38,
 48,
 49,
 50,
 51,
 53,
 58,
 62,
 65,
 66,
 67,
 71,
 72,
 73,
 74,
 79,
 84,
 85,
 86,
 88,
 92,
 93,
 95,
 100,
 104,
 107,
 113,
 120,
 122,
 125,
 127,
 135,
 136,
 143,
 144,
 152,
 156,
 158,
 159,
 160,
 163,
 165,
 168,
 173,
 175,
 176,
 178,
 179,
 180,
 182,
 185,
 188,
 189,
 190,
 191,
 193,
 194,
 197,
 198,
 199,
 200,
 203,
 209,
 210,
 215,
 216,
 219,
 220,
 221,
 224,
 226,
 229,
 235,
 236,
 238,
 242,
 243,
 244,
 246,
 249,
 250,
 251,
 253,
 255,
 260,
 261,
 262,
 267,
 269,
 270,
 273,
 274,
 275,
 281,
 284,
 285,
 286,
 287,
 288,
 291,
 293,
 295,
 296,
 299,
 304,
 307,
 308,
 309,
 312,
 317,
 318,
 321,
 322,
 323,
 326,
 329,
 331,
 339,
 341,
 346,
 347,
 348,
 349,
 350,
 353,
 364,
 365,
 368,
 373,
 377,
 380,
 383,
 390,
 393,
 394,
 395,
 397,
 399,
 400,
 402,
 403,
 404,
 405,
 409,
 410,
 412,
 415,
 421,
 422,
 428,
 430,
 431,
 432,
 439,
 442,
 443,
 445,
 446,
 450,
 451,
 457,

In [135]:
print(f"number of duplicate questions: {len(is_duplicate_index)}, number of non duplicate questions {len(data) - len(is_duplicate_index)}")

number of duplicate questions: 111486, number of non duplicate questions 292865


In [136]:
data.loc[is_duplicate_index[:5]]

Unnamed: 0,id,qid1,qid2,question1,question2,is_duplicate
5,5,11,12,Astrology: I am a Capricorn Sun Cap moon and c...,"I'm a triple Capricorn (Sun, Moon and ascendan...",1
7,7,15,16,How can I be a good geologist?,What should I do to be a great geologist?,1
11,11,23,24,How do I read and find my YouTube comments?,How can I see all my Youtube comments?,1
12,12,25,26,What can make Physics easy to learn?,How can you make physics easy to learn?,1
13,13,27,28,What was your first sexual experience like?,What was your first sexual experience?,1


Splitting out test and train q1 and q2 words

In [137]:
q1_train_words = data_train.loc[is_duplicate_index,'question1']
q2_train_words = data_train.loc[is_duplicate_index,'question2']

q1_test_words = data_test['question1']
q2_test_words = data_test['question2']
y_test = data_test['is_duplicate']


Q1 and Q2 training breakdown

In [138]:
q1_train_words[:10], q2_train_words[:10]

(5     Astrology: I am a Capricorn Sun Cap moon and c...
 7                        How can I be a good geologist?
 11          How do I read and find my YouTube comments?
 12                 What can make Physics easy to learn?
 13          What was your first sexual experience like?
 15    What would a Trump presidency mean for current...
 16                         What does manipulation mean?
 18    Why are so many Quora users posting questions ...
 20                           Why do rockets look white?
 29               How should I prepare for CA final law?
 Name: question1, dtype: object,
 5     I'm a triple Capricorn (Sun, Moon and ascendan...
 7             What should I do to be a great geologist?
 11               How can I see all my Youtube comments?
 12              How can you make physics easy to learn?
 13               What was your first sexual experience?
 15    How will a Trump presidency affect the student...
 16                        What does manipulation means

In [139]:
print(f"number of q1 train words: {len(q1_train_words)}, and number of q2 train words {len(q2_train_words)}")

number of q1 train words: 111486, and number of q2 train words 111486


Q1 and Q2 testing breakdown.

In [140]:
print(f"number of q1 test words: {len(q1_test_words)}, number of q2 test words: {len(q2_test_words)}, number of y test labels: {len(y_test)}")

number of q1 test words: 10240, number of q2 test words: 10240, number of y test labels: 10240


In [141]:
test_sentence = "How old are you"
test_words = nltk.word_tokenize(test_sentence)
test_words

['How', 'old', 'are', 'you']

In [142]:
q1_train = q1_train_words.apply(lambda x : nltk.word_tokenize(x))
q2_train = q2_train_words.apply(lambda x : nltk.word_tokenize(x))
q1_train.head(), q2_train.head()

(5     [Astrology, :, I, am, a, Capricorn, Sun, Cap, ...
 7              [How, can, I, be, a, good, geologist, ?]
 11    [How, do, I, read, and, find, my, YouTube, com...
 12       [What, can, make, Physics, easy, to, learn, ?]
 13    [What, was, your, first, sexual, experience, l...
 Name: question1, dtype: object,
 5     [I, 'm, a, triple, Capricorn, (, Sun, ,, Moon,...
 7     [What, should, I, do, to, be, a, great, geolog...
 11    [How, can, I, see, all, my, Youtube, comments, ?]
 12    [How, can, you, make, physics, easy, to, learn...
 13      [What, was, your, first, sexual, experience, ?]
 Name: question2, dtype: object)

In [143]:
q1_train = q1_train.to_numpy()
q2_train = q2_train.to_numpy()
q1_train, q2_train

(array([list(['Astrology', ':', 'I', 'am', 'a', 'Capricorn', 'Sun', 'Cap', 'moon', 'and', 'cap', 'rising', '...', 'what', 'does', 'that', 'say', 'about', 'me', '?']),
        list(['How', 'can', 'I', 'be', 'a', 'good', 'geologist', '?']),
        list(['How', 'do', 'I', 'read', 'and', 'find', 'my', 'YouTube', 'comments', '?']),
        ...,
        list(['What', 'are', 'the', 'top', '10', 'TV', 'series', 'one', 'should', 'genuinely', 'watch', '?']),
        list(['Is', 'there', 'no', 'life', 'on', 'other', 'planets', '?']),
        list(['How', 'do', 'I', 'tell', 'the', 'difference', 'between', 'infatuation', 'and', 'love', '?'])],
       dtype=object),
 array([list(['I', "'m", 'a', 'triple', 'Capricorn', '(', 'Sun', ',', 'Moon', 'and', 'ascendant', 'in', 'Capricorn', ')', 'What', 'does', 'this', 'say', 'about', 'me', '?']),
        list(['What', 'should', 'I', 'do', 'to', 'be', 'a', 'great', 'geologist', '?']),
        list(['How', 'can', 'I', 'see', 'all', 'my', 'Youtube', 'comments'

In [144]:
q1_test = q1_test_words.apply(lambda x : nltk.word_tokenize(x))
q2_test = q2_test_words.apply(lambda x : nltk.word_tokenize(x))
q1_test.head(), q2_test.head()

(300000    [How, do, I, prepare, for, interviews, for, cs...
 300001    [What, is, the, best, bicycle, to, buy, under,...
 300002    [How, do, I, become, Mutual, funds, distribute...
 300003                  [Will, this, relationship, work, ?]
 300004                [How, does, Brexit, affect, India, ?]
 Name: question1, dtype: object,
 300000    [What, is, the, best, way, to, prepare, for, c...
 300001    [Which, is, the, best, bike, in, in, dia, to, ...
 300002    [How, do, I, become, mutual, funds, distributo...
 300003    [Relationship, :, Will, this, relationship, wo...
 300004    [Will, the, GBP/AUD, be, affected, by, Brexit, ?]
 Name: question2, dtype: object)

In [145]:
q1_test = q1_test.to_numpy()
q2_test = q2_test.to_numpy()
q1_test, q2_test

(array([list(['How', 'do', 'I', 'prepare', 'for', 'interviews', 'for', 'cse', '?']),
        list(['What', 'is', 'the', 'best', 'bicycle', 'to', 'buy', 'under', '10k', '?']),
        list(['How', 'do', 'I', 'become', 'Mutual', 'funds', 'distributer', 'for', 'all', 'company', 'mutual', 'funds', '?']),
        ...,
        list(['What', 'are', 'some', 'biblical', 'examples', 'of', 'God', 'giving', 'people', 'more', 'than', 'they', 'can', 'handle', '?']),
        list(['What', 'is', 'the', 'main', 'cause', 'of', 'typhoons', '?']),
        list(['How', 'does', 'one', 'become', 'a', 'man', 'of', 'action', '?'])],
       dtype=object),
 array([list(['What', 'is', 'the', 'best', 'way', 'to', 'prepare', 'for', 'cse', '?']),
        list(['Which', 'is', 'the', 'best', 'bike', 'in', 'in', 'dia', 'to', 'buy', 'in', 'INR', '10k', '?']),
        list(['How', 'do', 'I', 'become', 'mutual', 'funds', 'distributor', 'for', 'all', 'company', 'mutual', 'funds', '?']),
        ...,
        list(['If', 'Go

In [151]:
from collections import defaultdict

vocab = defaultdict(lambda : 0)
vocab['<PAD>']=1

for idx in range(len(q1_train)):
    q = q1_train[idx] + q2_train[idx]
    for word in q:
        if word not in vocab:
            vocab[word] = len(vocab) + 1
print('The length of the vocabulary is: ', len(vocab))

KeyError: 0

In [150]:
print(f"'<PAD>' index {vocab['<PAD>']}")
print(f"'Astrology' index {vocab['Astrology']}")
print(f"'What' index {vocab['What']}")
print(f"'Astronomy' index {vocab['Astronomy']}")

'<PAD>' index 1
'Astrology' index 2
'What' index 33
'Astronomy' index 0


<a name='1-2'></a>
### 1.2 - Converting a Question to a Tensor

You will now convert every question to a tensor, or an array of numbers, using your vocabulary built above.

In [149]:
q1_train_tensor = [[vocab[w] for w in words] for words in q1_train]
q2_train_tensor = [[vocab[w] for w in words] for words in q2_train]

q1_train_tensor, q2_train_tensor

([[2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21],
  [22, 23, 4, 24, 6, 25, 26, 21],
  [22, 27, 4, 28, 11, 29, 30, 31, 32, 21],
  [33, 23, 34, 35, 36, 37, 38, 21],
  [33, 39, 40, 41, 42, 43, 44, 21],
  [33, 45, 6, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 21],
  [33, 16, 60, 48, 21],
  [61, 62, 63, 64, 65, 66, 67, 68, 17, 62, 69, 70, 56, 71, 21],
  [61, 27, 72, 73, 74, 21],
  [22, 75, 4, 76, 49, 77, 78, 79, 21],
  [33, 62, 80, 81, 82, 49, 83, 84, 6, 85, 17, 86, 87, 88, 89, 90, 21],
  [33, 91, 92, 93, 94, 45, 24, 89, 95, 96, 37, 97, 98, 99, 21],
  [22, 27, 100, 76, 49, 101, 21],
  [33, 62, 80, 102, 92, 103, 17, 23, 24, 34, 104, 105, 106, 21],
  [22, 27, 4, 34, 107, 108],
  [109, 110, 111, 25, 49, 112, 113, 114, 115, 21],
  [116, 6, 117, 118, 119, 56, 6, 120, 121, 122, 21, 123, 63, 124, 125, 21],
  [33, 126, 89, 127, 128, 129, 98, 130, 131, 132, 11, 133, 21],
  [4,
   39,
   134,
   135,
   136,
   137,
   108,
   4,
   138,
   139,
   140,
   30,
 

In [122]:
q1_test_tensor = [[vocab[w] for w in words] for words in q1_test]
q2_test_tensor = [[vocab[w] for w in words] for words in q2_test]

q1_test_tensor, q2_test_tensor

([[22, 27, 4, 76, 49, 779, 49, 9242, 21],
  [33, 126, 89, 163, 7100, 37, 557, 221, 6633, 21],
  [22, 27, 4, 408, 1031, 574, 0, 49, 190, 1791, 1584, 574, 21],
  [116, 372, 1219, 230, 21],
  [22, 16, 1442, 726, 236, 21],
  [109,
   11749,
   3264,
   7475,
   1825,
   22698,
   2293,
   385,
   0,
   0,
   12113,
   26005,
   7475,
   1825,
   21],
  [33, 62, 80, 27115, 1624, 21],
  [218, 126, 89, 163, 1110, 940, 3236, 21],
  [33, 126, 89, 163, 11, 1653, 177, 3369, 169, 1311, 804, 1317, 55, 21],
  [61, 27, 80, 2577, 3501, 49, 806, 21],
  [33, 62, 89, 163, 766, 37, 819, 169, 856, 21],
  [33, 4655, 15623, 37, 89, 2565, 92, 1505, 23328, 21],
  [33, 62, 80, 1170, 25, 3546, 3607, 1439, 37, 1090, 56, 2162, 21],
  [22, 23, 0, 798, 20, 169, 2335, 3767, 21],
  [933, 89, 2287, 14636, 0, 21],
  [22, 27, 4, 944, 1687, 11, 1677, 169, 13152, 21],
  [33, 126, 89, 178, 16093, 2639, 2640, 21],
  [123,
   12172,
   11326,
   92,
   14467,
   299,
   298,
   28331,
   0,
   10569,
   298,
   0,
   0,
   30

In [123]:
train_cutoff = int(len(q1_train) * 0.8)
train_q1_final, train_q2_final = q1_train_tensor[:train_cutoff], q2_train_tensor[:train_cutoff]
val_q1, val_q2 = q1_train_tensor[train_cutoff:], q2_train_tensor[train_cutoff:]

<a name='1-3'></a>
### 1.3 - Understanding the Iterator 

Most of the time in Natural Language Processing, and AI in general we use batches when training our data sets. If you were to use stochastic gradient descent with one example at a time, it will take you forever to build a model. In this example, we show you how you can build a data generator that takes in $Q1$ and $Q2$ and returns a batch of size `batch_size`  in the following format $([q1_1, q1_2, q1_3, ...]$, $[q2_1, q2_2,q2_3, ...])$. The tuple consists of two arrays and each array has `batch_size` questions. Again, $q1_i$ and $q2_i$ are duplicates, but they are not duplicates with any other elements in the batch. 

<br>

The command ```next(data_generator)```returns the next batch. This iterator returns the data in a format that you could directly use in your model when computing the feed-forward of your algorithm. This iterator returns a pair of arrays of questions. 

<a name='ex-1'></a>
### Exercise 1 - data_generator

**Instructions:**  
Implement the data generator below. Here are some things you will need. 

- While true loop.
- if `index >= len_Q1`, set the `idx` to $0$.
- The generator should return shuffled batches of data. To achieve this without modifying the actual question lists, a list containing the indexes of the questions is created. This list can be shuffled and used to get random batches everytime the index is reset.
- Append elements of $Q1$ and $Q2$ to `input1` and `input2` respectively.
- if `len(input1) == batch_size`, determine `max_len` as the longest question in `input1` and `input2`. Ceil `max_len` to a power of $2$ (for computation purposes) using the following command:  `max_len = 2**int(np.ceil(np.log2(max_len)))`.
- Pad every question by `vocab['<PAD>']` until you get the length `max_len`.
- Use yield to return `input1, input2`. 
- Don't forget to reset `input1, input2`  to empty arrays at the end (data generator resumes from where it last left).

In [124]:
train_q1[1]

[22, 23, 4, 24, 6, 25, 26, 21]

In [125]:
def data_generator(Q1, Q2, batch_size : int, pad=1, shuffle=True):
    """Generator function that yields batches of data

    Args:
        Q1 (list): List of transformed (to tensor) questions.
        Q2 (list): List of transformed (to tensor) questions.
        batch_size (int): Number of elements per batch.
        pad (int, optional): Pad character from the vocab. Defaults to 1.
        shuffle (bool, optional): If the batches should be randomnized or not. Defaults to True.
    Yields:
        tuple: Of the form (input1, input2) with types (numpy.ndarray, numpy.ndarray)
        NOTE: input1: inputs to your model [q1a, q2a, q3a, ...] i.e. (q1a,q1b) are duplicates
              input2: targets to your model [q1b, q2b,q3b, ...] i.e. (q1a,q2i) i!=a are not duplicates
    """
    input1 = []
    input2 = []
    
    idx = 0
    len_q = len(Q1)
    assert len_q == len(Q2)
    question_indexes = [*range(len_q)]
    
    if shuffle:
        rnd.shuffle(question_indexes)
    
    
    while True:
        if idx >= len_q:
            idx = 0
            
            if shuffle:
                rnd.shuffle(question_indexes)
        
        cur_q1 = Q1[question_indexes[idx]]
        cur_q2 = Q2[question_indexes[idx]]
        
        idx += 1
        
        input1.append(cur_q1)
        input2.append(cur_q2)

        if len(input1) == batch_size:
            max_q1 = max(len(q) for q in input1)
            max_q2 = max(len(q) for q in input2)
            max_len = max(max_q1, max_q2)
            max_len = 2**int(np.ceil(np.log2(max_len)))
            b1 = []
            b2 = []
            
            for batch_q1, batch_q2 in zip(input1, input2):
                diff = max_len - len(batch_q1)
                batch_q1.extend([pad for _ in range(diff)])
                batch_q2.extend([pad for _ in range(diff)])
                b1.append(batch_q1)
                b2.append(batch_q1)
            
            yield np.array(b1), np.array(b2)
            input1, input2 = [], []  # reset the batches
        


In [126]:
batch_size = 2
res1, res2 = next(data_generator(train_q1_final, train_q2_final, batch_size))
print("First questions  : ",'\n', res1, '\n')
print("Second questions : ",'\n', res2)

First questions  :  
 [[   33    75     4    27   243     4   541   893   126   461   385    20
     21     1     1     1]
 [  116  5120  6079  1595 18470   169   761   598    21     1     1     1
      1     1     1     1]] 

Second questions :  
 [[   33    75     4    27   243     4   541   893   126   461   385    20
     21     1     1     1]
 [  116  5120  6079  1595 18470   169   761   598    21     1     1     1
      1     1     1     1]]


**Note**: The following expected output is valid only if you run the above test cell **_once_** (first time). The output will change on each execution.

If you think your implementation is correct and it is not matching the output, make sure to restart the kernel and run all the cells from the top again. 

**Expected Output:**
```CPP
First questions  :  
 [[  30   87   78  134 2132 1981   28   78  594   21    1    1    1    1
     1    1]
 [  30   55   78 3541 1460   28   56  253   21    1    1    1    1    1
     1    1]] 

Second questions :  
 [[  30  156   78  134 2132 9508   21    1    1    1    1    1    1    1
     1    1]
 [  30  156   78 3541 1460  131   56  253   21    1    1    1    1    1
     1    1]]
```
Now that you have your generator, you can just call it and it will return tensors which correspond to your questions in the Quora data set.<br>Now you can go ahead and start building your neural network. 



In [102]:
# Test your function
w4_unittest.test_data_generator(data_generator)

Wrong output for questions in batch 2.
	Expected [[ 4 22  6 23  7 24  8 25 26 11 27 28  7 29 30 16 31 18 19 20 21  1  1  1
   1  1  1  1  1  1  1  1]
 [30 37  4 38 39 34  6 40 36 21  1  1  1  1  1  1  1  1  1  1  1  1  1  1
   1  1  1  1  1  1  1  1]
 [32 33  4 46 47 43 48 45 21  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1
   1  1  1  1  1  1  1  1]].
	Got [[ 2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21  1  1  1  1
   1  1  1  1  1  1  1  1]
 [32 33  4 34  6 35 36 21  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1
   1  1  1  1  1  1  1  1]
 [32 38  4 41 11 42 43 44 45 21  1  1  1  1  1  1  1  1  1  1  1  1  1  1
   1  1  1  1  1  1  1  1]].
Output for questions in batch 1 has the wrong size.
	Expected (5, 32).
	Got (5, 64).
Output for questions in batch 2 has the wrong size.
	Expected (5, 32).
	Got (5, 64).
Wrong output for questions in batch 1.
	Expected [[ 2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 -1 -1 -1 -1
  -1 -1 -1 -1 -1 -1 -1 -1]
 [30 55 56 57 58

<a name='2'></a>
## 2 - Defining the Siamese Model

<a name='2-1'></a>
### 2.1 - Understanding Siamese Network 
A Siamese network is a neural network which uses the same weights while working in tandem on two different input vectors to compute comparable output vectors.The Siamese network you are about to implement looks like this:

<img src = "images/siamese.png" style="width:600px;height:300px;"/>

You get the question embedding, run it through an LSTM layer, normalize $v_1$ and $v_2$, and finally use a triplet loss (explained below) to get the corresponding cosine similarity for each pair of questions. As usual, you will start by importing the data set. The triplet loss makes use of a baseline (anchor) input that is compared to a positive (truthy) input and a negative (falsy) input. The distance from the baseline (anchor) input to the positive (truthy) input is minimized, and the distance from the baseline (anchor) input to the negative (falsy) input is maximized. In math equations, you are trying to maximize the following.

$$\mathcal{L}(A, P, N)=\max \left(\|\mathrm{f}(A)-\mathrm{f}(P)\|^{2}-\|\mathrm{f}(A)-\mathrm{f}(N)\|^{2}+\alpha, 0\right)$$

$A$ is the anchor input, for example $q1_1$, $P$ the duplicate input, for example, $q2_1$, and $N$ the negative input (the non duplicate question), for example $q2_2$.<br>
$\alpha$ is a margin; you can think about it as a safety net, or by how much you want to push the duplicates from the non duplicates. 
<br>

<a name='ex-2'></a>
### Exercise 2 - Siamese

**Instructions:** Implement the `Siamese` function below. You should be using all the objects explained below. 

To implement this model, you will be using `trax`. Concretely, you will be using the following functions.


- `tl.Serial`: Combinator that applies layers serially (by function composition) allows you set up the overall structure of the feedforward. [docs](https://trax-ml.readthedocs.io/en/latest/trax.layers.html#trax.layers.combinators.Serial) / [source code](https://github.com/google/trax/blob/1372b903bb66b0daccee19fd0b1fdf44f659330b/trax/layers/combinators.py#L26)
    - You can pass in the layers as arguments to `Serial`, separated by commas. 
    - For example: `tl.Serial(tl.Embeddings(...), tl.Mean(...), tl.Dense(...), tl.LogSoftmax(...))` 


-  `tl.Embedding`: Maps discrete tokens to vectors. It will have shape (vocabulary length X dimension of output vectors). The dimension of output vectors (also called d_feature) is the number of elements in the word embedding. [docs](https://trax-ml.readthedocs.io/en/latest/trax.layers.html#trax.layers.core.Embedding) / [source code](https://github.com/google/trax/blob/1372b903bb66b0daccee19fd0b1fdf44f659330b/trax/layers/core.py#L113)
    - `tl.Embedding(vocab_size, d_feature)`.
    - `vocab_size` is the number of unique words in the given vocabulary.
    - `d_feature` is the number of elements in the word embedding (some choices for a word embedding size range from 150 to 300, for example).


-  `tl.LSTM` The LSTM layer. It leverages another Trax layer called [`LSTMCell`](https://trax-ml.readthedocs.io/en/latest/trax.layers.html#trax.layers.rnn.LSTMCell). The number of units should be specified and should match the number of elements in the word embedding. [docs](https://trax-ml.readthedocs.io/en/latest/trax.layers.html#trax.layers.rnn.LSTM) / [source code](https://github.com/google/trax/blob/1372b903bb66b0daccee19fd0b1fdf44f659330b/trax/layers/rnn.py#L87)
    - `tl.LSTM(n_units)` Builds an LSTM layer of n_units.
    
    
- `tl.Mean`: Computes the mean across a desired axis. Mean uses one tensor axis to form groups of values and replaces each group with the mean value of that group. [docs](https://trax-ml.readthedocs.io/en/latest/trax.layers.html#trax.layers.core.Mean) / [source code](https://github.com/google/trax/blob/1372b903bb66b0daccee19fd0b1fdf44f659330b/trax/layers/core.py#L276)
    - `tl.Mean(axis=1)` mean over columns.


- `tl.Fn` Layer with no weights that applies the function f, which should be specified using a lambda syntax. [docs](https://trax-ml.readthedocs.io/en/latest/trax.layers.html#trax.layers.base.Fn) / [source doce](https://github.com/google/trax/blob/70f5364dcaf6ec11aabbd918e5f5e4b0f5bfb995/trax/layers/base.py#L576)
    - $x$ -> This is used for cosine similarity.
    - `tl.Fn('Normalize', lambda x: normalize(x))` Returns a layer with no weights that applies the function `f`
    
    
- `tl.parallel`: It is a combinator layer (like `Serial`) that applies a list of layers in parallel to its inputs. [docs](https://trax-ml.readthedocs.io/en/latest/trax.layers.html#trax.layers.combinators.Parallel) / [source code](https://github.com/google/trax/blob/37aba571a89a8ad86be76a569d0ec4a46bdd8642/trax/layers/combinators.py#L152)


In [None]:
# UNQ_C2 (UNIQUE CELL IDENTIFIER, DO NOT EDIT)
# GRADED FUNCTION: Siamese
def Siamese(vocab_size=41699, d_model=128, mode='train'):
    """Returns a Siamese model.

    Args:
        vocab_size (int, optional): Length of the vocabulary. Defaults to len(vocab).
        d_model (int, optional): Depth of the model. Defaults to 128.
        mode (str, optional): 'train', 'eval' or 'predict', predict mode is for fast inference. Defaults to 'train'.

    Returns:
        trax.layers.combinators.Parallel: A Siamese model. 
    """

    def normalize(x):  # normalizes the vectors to have L2 norm 1
        return x / fastnp.sqrt(fastnp.sum(x * x, axis=-1, keepdims=True))
    
    ### START CODE HERE (Replace instances of 'None' with your code) ###
    q_processor = tl.Serial( # Processor will run on Q1 and Q2. 
        tl.Embedding(vocab_size=vocab_size,d_feature=d_model), # Embedding layer
        tl.LSTM(d_model), # LSTM layer
        tl.Mean(axis=1), # Mean over columns
        tl.Fn('Normalize', lambda x : normalize(x)), # Apply normalize function
    )  # Returns one vector of shape [batch_size, d_model]. 
    
    ### END CODE HERE ###
    
    # Run on Q1 and Q2 in parallel.
    model = tl.Parallel(q_processor, q_processor)
    return model

In [None]:
# check your model
model = Siamese()
print(model)

**Expected output:**  

```CPP
Parallel_in2_out2[
  Serial[
    Embedding_41699_128
    LSTM_128
    Mean
    Normalize
  ]
  Serial[
    Embedding_41699_128
    LSTM_128
    Mean
    Normalize
  ]
]
```

In [None]:
# Test your function
w4_unittest.test_Siamese(Siamese)