### **INITIALIZATION:**
- I use these three lines of code at the top of each of my notebooks. The first two lines reload edited modules automatically, which helps prevent problems when re-running the same project, and the third line renders Matplotlib visualizations inline within the notebook.

In [1]:
#@ INITIALIZATION:
%reload_ext autoreload
%autoreload 2
%matplotlib inline

**DOWNLOADING LIBRARIES AND DEPENDENCIES:**
- I install (if needed) and import all the libraries and dependencies required for the project in a single cell.

In [3]:
#@ DOWNLOADING THE LIBRARIES AND DEPENDENCIES:
# !pip install -U d2l
from d2l import torch as d2l

import os
import torch     
from torch import nn                                
from torch.nn import functional as F

**GETTING THE DATASET:**
- I used Google Colab for this notebook, so the process of downloading and reading the data might differ on other platforms. I will use the Stanford Natural Language Inference (SNLI) Corpus for this notebook. The SNLI Corpus is a collection of over 500,000 labeled English sentence pairs.

In [5]:
#@ GETTING THE DATASET: 
batch_size, num_steps = 256, 50                                          # Batch size and sequence length. 
train_iter, test_iter, vocab = d2l.load_data_snli(batch_size, num_steps) # Training and test iterators plus vocabulary. 

read 549367 examples
read 9824 examples
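
- As a rough sketch of what `num_steps` controls: I am assuming here that each premise and hypothesis is truncated or padded to exactly `num_steps` tokens before batching. The helper below is a plain-Python illustration written for this note, not the library's own code, and the padding index `0` is a hypothetical choice.

```python
def truncate_pad(tokens, num_steps, pad_token):
    """Return a copy of `tokens` with exactly `num_steps` items."""
    if len(tokens) > num_steps:
        return tokens[:num_steps]                               # Truncating long sequences.
    return tokens + [pad_token] * (num_steps - len(tokens))    # Padding short sequences.

# Hypothetical token indices; 0 stands in for the padding index.
short = truncate_pad([5, 8, 2], num_steps=5, pad_token=0)
long = truncate_pad(list(range(1, 9)), num_steps=5, pad_token=0)
print(short)   # [5, 8, 2, 0, 0]
print(long)    # [1, 2, 3, 4, 5]
```

Fixed-length sequences are what allow every minibatch to be stacked into a single tensor of shape `(batch_size, num_steps)`.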




### **DECOMPOSABLE ATTENTION MODEL:**

**ATTENDING CLASS:**
- I will align each word in one text sequence with the words in the other sequence. I will implement this soft alignment using the attention mechanism.

In [6]:
#@ IMPLEMENTING MULTILAYER PERCEPTRON: 
def mlp(num_inputs, num_hiddens, flatten):                 # Function for MLP.
  net = []                                                 # Initializing Lists.
  net.append(nn.Dropout(0.2))                              # Initializing Dropout Layer. 
  net.append(nn.Linear(num_inputs, num_hiddens))           # Initializing Linear Layer. 
  net.append(nn.ReLU())                                    # Initializing RELU Activation. 
  if flatten:
    net.append(nn.Flatten(start_dim=1))                    # Initializing Flatten Layer. 
  net.append(nn.Dropout(0.2))                              # Initializing Dropout Layer. 
  net.append(nn.Linear(num_hiddens, num_hiddens))          # Initializing Linear Layer. 
  net.append(nn.ReLU())                                    # Initializing RELU Activation. 
  if flatten:
    net.append(nn.Flatten(start_dim=1))                    # Initializing Flatten Layer. 
  return nn.Sequential(*net)                               # Initializing Sequential API. 
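
- A small check of how the layers inside `mlp` treat their inputs, as a sketch on random tensors. The sizes (100 input features, 200 hidden units, 50 tokens) are assumptions chosen for illustration. `nn.Linear` acts on the last dimension only, so each token is transformed independently, and `nn.Flatten(start_dim=1)` leaves a 2-D input unchanged.

```python
import torch
from torch import nn

torch.manual_seed(0)
layer = nn.Linear(100, 200)                    # Assumed sizes, for illustration only.
X = torch.randn(2, 50, 100)                    # (batch, tokens, features).
out = layer(X)                                 # Linear layer acts on the last dimension.
print(out.shape)                               # torch.Size([2, 50, 200])

# Each token is transformed independently of the others.
single = layer(X[0, 3])
same = torch.allclose(out[0, 3], single, atol=1e-5)

flat = nn.Flatten(start_dim=1)
Y = torch.randn(2, 400)                        # Already 2-D: (batch, features).
print(flat(Y).shape)                           # torch.Size([2, 400])
```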

- I will define the `Attend` class to compute `beta`, the soft alignment of the hypotheses with each token of the input premises, and `alpha`, the soft alignment of the premises with each token of the input hypotheses.

In [7]:
#@ IMPLEMENTATION OF ATTEND CLASS: 
class Attend(nn.Module):                                            # Initializing Attend Class. 
  def __init__(self, num_inputs, num_hiddens, **kwargs):            # Initializing Constructor Function. 
    super(Attend, self).__init__(**kwargs)
    self.f = mlp(num_inputs, num_hiddens, flatten=False)            # Initialization of MLP. 
  
  def forward(self, A, B):                                          # Forward Propagation Function. 
    f_A = self.f(A)                                                 # Implementation of MLP. 
    f_B = self.f(B)                                                 # Implementation of MLP. 
    e = torch.bmm(f_A, f_B.permute(0, 2, 1))                        # Attention scores: (batch, len_A, len_B). 
    beta = torch.bmm(F.softmax(e, dim=-1), B)                       # Hypotheses softly aligned to each premise token. 
    alpha = torch.bmm(F.softmax(e.permute(0, 2, 1), dim=-1), A)     # Premises softly aligned to each hypothesis token. 
    return beta, alpha
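
- To make the shapes in the attending step concrete, here is a minimal sketch on random tensors. It skips the MLP `f` (that is, it assumes `f` is the identity), and the sizes (4 premise tokens, 6 hypothesis tokens, 8 features) are arbitrary choices for illustration.

```python
import torch
from torch.nn import functional as F

torch.manual_seed(0)
A = torch.randn(2, 4, 8)                                      # Premises: (batch, 4 tokens, 8 features).
B = torch.randn(2, 6, 8)                                      # Hypotheses: (batch, 6 tokens, 8 features).

e = torch.bmm(A, B.permute(0, 2, 1))                          # Scores: (2, 4, 6).
weights = F.softmax(e, dim=-1)                                # Each row sums to 1 over hypothesis tokens.
beta = torch.bmm(weights, B)                                  # (2, 4, 8): one aligned vector per premise token.
alpha = torch.bmm(F.softmax(e.permute(0, 2, 1), dim=-1), A)   # (2, 6, 8): one aligned vector per hypothesis token.

print(e.shape, beta.shape, alpha.shape)
```

`beta` has the same shape as `A` and `alpha` has the same shape as `B`, which is what allows them to be concatenated token by token in the comparing step.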

**COMPARING CLASS:**
- I will compare each word in one sequence with the part of the other sequence that is softly aligned with it. In soft alignment, all the words from one sequence, each with a different attention weight, are compared with a word in the other sequence.

In [8]:
#@ INITIALIZING COMPARING CLASS: 
class Compare(nn.Module):                                       # Initializing Compare. 
  def __init__(self, num_inputs, num_hiddens, **kwargs):        # Initializing Constructor Function. 
    super(Compare, self).__init__(**kwargs)
    self.g = mlp(num_inputs, num_hiddens, flatten=False)        # Implementation of MLP Function. 
  
  def forward(self, A, B, beta, alpha):                         # Forward Propagation Function. 
    V_A = self.g(torch.cat([A, beta], dim=2))                   # Initializing Concatenation. 
    V_B = self.g(torch.cat([B, alpha], dim=2))                  # Initializing Concatenation. 
    return V_A, V_B
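
- A quick sketch of the concatenation used in the comparing step, on random tensors. I am assuming 100-dimensional embeddings here, in which case the concatenated vector has 200 features, consistent with the default `num_inputs_compare=200` in the model below.

```python
import torch

torch.manual_seed(0)
A = torch.randn(2, 4, 100)                     # Premise embeddings (assumed 100 features).
beta = torch.randn(2, 4, 100)                  # Stand-in for the aligned hypotheses, same shape as A.

pair = torch.cat([A, beta], dim=2)             # Concatenating along the feature dimension.
print(pair.shape)                              # torch.Size([2, 4, 200])
```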

**AGGREGATING CLASS:**
- I will aggregate the two sets of comparison vectors to infer the logical relationship. I will sum each set over its tokens, then feed the concatenation of both summarization results into an MLP to obtain the classification result of the logical relationship.

In [9]:
#@ INITIALIZING AGGREGATING CLASS: 
class Aggregate(nn.Module):                                               # Initializing Aggregating. 
  def __init__(self, num_inputs, num_hiddens, num_outputs, **kwargs):     # Initializing Constructor Function. 
    super(Aggregate, self).__init__(**kwargs)
    self.h = mlp(num_inputs, num_hiddens, flatten=True)                   # Initializing MLP Classifier. 
    self.linear = nn.Linear(num_hiddens, num_outputs)                     # Initializing Output Layer. 
  
  def forward(self, V_A, V_B):                                            # Forward Propagation Function. 
    V_A = V_A.sum(dim=1)                                                  # Getting sum of Comparison Vectors. 
    V_B = V_B.sum(dim=1)                                                  # Getting sum of Comparison Vectors. 
    Y_hat = self.linear(self.h(torch.cat([V_A, V_B], dim=1)))             # Implementation of MLP and Linear Layer. 
    return Y_hat
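
- A sketch of the summation and concatenation in the aggregating step, on random tensors. The sizes are assumptions: 200 hidden units, with 4 premise tokens and 6 hypothesis tokens. Summing over the token dimension removes the dependence on sequence length, so sequences of different lengths still give a fixed-size vector.

```python
import torch

torch.manual_seed(0)
V_A = torch.randn(2, 4, 200)                   # Comparison vectors for 4 premise tokens.
V_B = torch.randn(2, 6, 200)                   # Comparison vectors for 6 hypothesis tokens.

pooled = torch.cat([V_A.sum(dim=1), V_B.sum(dim=1)], dim=1)   # Sum over tokens, then concatenate.
print(pooled.shape)                            # torch.Size([2, 400])
```

With 200 hidden units the concatenated vector has 400 features, consistent with the default `num_inputs_agg=400` in the model below.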

- By combining the attending, comparing, and aggregating steps, I will define the decomposable attention model to train these three steps jointly.

In [10]:
#@ IMPLEMENTATION OF DECOMPOSABLE ATTENTION MODEL: 
class DecomposableAttention(nn.Module):                                         # Initializing Decomposable Attention Model. 
  def __init__(self, vocab, embed_size, num_hiddens, num_inputs_attend=100, 
               num_inputs_compare=200, num_inputs_agg=400, **kwargs):           # Initializing Constructor Function. 
    super(DecomposableAttention, self).__init__(**kwargs)         
    self.embedding = nn.Embedding(len(vocab), embed_size)                       # Initializing Embedding Layer. 
    self.attend = Attend(num_inputs_attend, num_hiddens)                        # Initializing Attending Class. 
    self.compare = Compare(num_inputs_compare, num_hiddens)                     # Initializing Comparing Class. 
    self.aggregate = Aggregate(num_inputs_agg, num_hiddens, num_outputs=3)      # Initializing Aggregating Class. 
  
  def forward(self, X):                                                         # Forward Propagation Function. 
    premises, hypotheses = X 
    A = self.embedding(premises)                                                # Implementation of Embedding. 
    B = self.embedding(hypotheses)                                              # Implementation of Embedding. 
    beta, alpha = self.attend(A, B)                                             # Implementation of Attending. 
    V_A, V_B = self.compare(A, B, beta, alpha)                                  # Implementation of Comparing. 
    Y_hat = self.aggregate(V_A, V_B)                                            # Implementation of Aggregating. 
    return Y_hat
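
- The three `num_inputs_*` defaults follow from the concatenations in the classes above. The helper below is a hypothetical function written for this note (it is not part of `d2l`), and the values `embed_size=100` and `num_hiddens=200` are assumptions under which it reproduces the defaults.

```python
def expected_input_sizes(embed_size, num_hiddens):
    """Input sizes implied by the concatenations in Attend, Compare and Aggregate."""
    return {
        "num_inputs_attend": embed_size,         # Attend sees the raw embeddings.
        "num_inputs_compare": 2 * embed_size,    # Compare sees [A, beta] or [B, alpha].
        "num_inputs_agg": 2 * num_hiddens,       # Aggregate sees [sum(V_A), sum(V_B)].
    }

sizes = expected_input_sizes(embed_size=100, num_hiddens=200)
print(sizes)
```

If a different embedding size or hidden size is used, the three arguments would need to change accordingly, otherwise the linear layers would raise a shape mismatch.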