# **Equations 7, 8, and 9 in the Fusion Sub-network**

**Useful implementation code:**

https://sigmoidal.io/implementing-additive-attention-in-pytorch/

The paper we are following uses additive attention, also called "Bahdanau attention" after the paper by Bahdanau et al. that introduced it.

The following implementation is based on an RNN for seq2seq modeling.

Please go to the last code listing on the linked page; it should be the one that we need.
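
For orientation, the code in this notebook computes the standard form of additive attention, written out below. The symbol names are assumptions chosen to match the code (`query` is $q$, the rows of `values` are $h_i$, and `W_1`, `W_2`, `v` are the learned parameters). This is the textbook formulation and has not been checked term by term against Equations 7 to 9 of the paper.

```latex
\begin{aligned}
e_i      &= v^{\top} \tanh\left(W_1 q + W_2 h_i\right) \\
\alpha_i &= \frac{\exp(e_i)}{\sum_{j} \exp(e_j)} \\
c        &= \sum_{i} \alpha_i h_i
\end{aligned}
```

In the code, `W_1` and `W_2` are `torch.nn.Linear` layers, so each also adds a bias term that is not shown here.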

**Other good examples:**

https://bastings.github.io/annotated_encoder_decoder/

https://tomekkorbak.com/2020/06/26/implementing-attention-in-pytorch/

In [2]:
import torch

**Encoder and Decoder**

In [3]:
class AdditiveAttention_Encoder_Decoder(torch.nn.Module):
    def __init__(self, encoder_dim=100, decoder_dim=50):
        super().__init__()

        self.encoder_dim = encoder_dim
        self.decoder_dim = decoder_dim
        # v reduces each tanh-activated row to a single score
        self.v = torch.nn.Parameter(torch.rand(self.decoder_dim))
        # W_1 projects the decoder query, W_2 projects the encoder values
        self.W_1 = torch.nn.Linear(self.decoder_dim, self.decoder_dim)
        self.W_2 = torch.nn.Linear(self.encoder_dim, self.decoder_dim)

    def forward(self,
      query, # [decoder_dim]
      values # [seq_length, encoder_dim]
    ):
        weights = self._get_weights(query, values) # [seq_length]
        weights = torch.nn.functional.softmax(weights, dim=0)
        return weights @ values # [encoder_dim]

    def _get_weights(self,
      query, # [decoder_dim]
      values # [seq_length, encoder_dim]
    ):
        # Copy the query once per sequence position so it can be added row-wise
        query = query.repeat(values.size(0), 1) # [seq_length, decoder_dim]
        weights = self.W_1(query) + self.W_2(values) # [seq_length, decoder_dim]
        return torch.tanh(weights) @ self.v # [seq_length]
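
To make the tensor shapes in the class above concrete, here is a minimal standalone sketch of the same computation on random inputs. The sequence length of 7 and the random tensors are assumptions for illustration only; the layers are created outside a module just to keep the example short.

```python
import torch

torch.manual_seed(0)

# Dimensions match the class defaults; seq_length = 7 is an arbitrary choice.
encoder_dim, decoder_dim, seq_length = 100, 50, 7

# Same parameters as in the class, created standalone for this sketch.
W_1 = torch.nn.Linear(decoder_dim, decoder_dim)
W_2 = torch.nn.Linear(encoder_dim, decoder_dim)
v = torch.rand(decoder_dim)

query = torch.rand(decoder_dim)               # [decoder_dim]
values = torch.rand(seq_length, encoder_dim)  # [seq_length, encoder_dim]

# W_1(query) has shape [decoder_dim] and broadcasts over the rows of
# W_2(values), which has the same effect as query.repeat(seq_length, 1).
scores = torch.tanh(W_1(query) + W_2(values)) @ v      # [seq_length]
weights = torch.nn.functional.softmax(scores, dim=0)   # sums to 1 over the sequence
context = weights @ values                             # [encoder_dim]

print(scores.shape, weights.sum().item(), context.shape)
```

The output `context` is a weighted average of the rows of `values`, so it always has length `encoder_dim` regardless of the sequence length.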

**Encoder only**

In [4]:
class AdditiveAttention_Encoder(torch.nn.Module):  # Please adjust the encoder dim
    def __init__(self, encoder_dim=100, attention_dim=50):
        super().__init__()

        self.encoder_dim = encoder_dim
        # There is no decoder here, so the former decoder_dim only sets the size of
        # the hidden attention space. The name attention_dim is our own choice.
        self.attention_dim = attention_dim
        self.v = torch.nn.Parameter(torch.rand(self.attention_dim))
        # W_1 (the query projection) is dropped because there is no decoder query.
        # My hypothesis is that we might not need another bias term, since
        # torch.nn.Linear (y = x @ A.T + b) introduces a bias by default:
        # https://pytorch.org/docs/stable/generated/torch.nn.Linear.html
        self.W_2 = torch.nn.Linear(self.encoder_dim, self.attention_dim)

    def forward(self,
      values # [seq_length, encoder_dim]
    ):
        weights = self._get_weights(values) # [seq_length]
        weights = torch.nn.functional.softmax(weights, dim=0)
        return weights @ values # [encoder_dim]

    def _get_weights(self,
      values # [seq_length, encoder_dim]
    ):
        weights = self.W_2(values) # [seq_length, attention_dim]
        return torch.tanh(weights) @ self.v # [seq_length]

**Execute:**

In [None]:
values = torch.rand(5, 100)  # placeholder for the encoder outputs l_i stacked as rows [l0, l1, l2, l3, l4]
u = AdditiveAttention_Encoder()(values)  # attended vector, [encoder_dim]

# **Equation 10**

In [None]:
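num_classes = 2  # placeholder, set to the real number of classes
linear = torch.nn.Linear(u.size(0), num_classes)  # u_dim = u.size(0) = encoder_dim
weights = linear(u)  # unnormalized class scores, [num_classes]
p = torch.nn.functional.softmax(weights, dim=0)  # class probabilities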

# **Combine model?**
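
One possible way to combine the encoder-only attention with the linear-plus-softmax step is sketched below. This is an assumption about how the pieces could fit together, not the paper's architecture: the class name `FusionClassifier`, the `attention_dim` and `num_classes` parameters, and the random stand-ins for l0..l4 are all hypothetical.

```python
import torch


class FusionClassifier(torch.nn.Module):  # hypothetical name
    def __init__(self, encoder_dim=100, attention_dim=50, num_classes=2):
        super().__init__()
        # Encoder-only additive attention parameters
        self.W_2 = torch.nn.Linear(encoder_dim, attention_dim)
        self.v = torch.nn.Parameter(torch.rand(attention_dim))
        # Linear layer applied to the attended vector u
        self.classifier = torch.nn.Linear(encoder_dim, num_classes)

    def forward(self, values):  # values: [seq_length, encoder_dim]
        scores = torch.tanh(self.W_2(values)) @ self.v       # [seq_length]
        alpha = torch.nn.functional.softmax(scores, dim=0)   # [seq_length]
        u = alpha @ values                                   # [encoder_dim]
        return self.classifier(u)                            # logits, [num_classes]


torch.manual_seed(0)
l = [torch.rand(100) for _ in range(5)]  # random stand-ins for l0..l4
values = torch.stack(l)                  # [5, 100]

model = FusionClassifier()
logits = model(values)
p = torch.nn.functional.softmax(logits, dim=0)  # class probabilities
print(p)
```

Returning logits from `forward` and applying the softmax outside keeps the model usable with `torch.nn.CrossEntropyLoss`, which expects unnormalized scores.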

# **Loss function**

In [None]:
# In PyTorch; note that CrossEntropyLoss expects the raw (pre-softmax) scores and integer class labels
loss = torch.nn.CrossEntropyLoss()

In [None]:
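# Keras-style training call, kept for reference; `model`, `X_train` and `y_train`
# are not defined in this notebook.
model.compile(loss='binary_crossentropy',
              optimizer='sgd',
              metrics=['accuracy'])

model.fit(X_train, y_train, epochs=4, batch_size=1, verbose=1)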
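
For comparison, a rough PyTorch counterpart of the `compile`/`fit` call above might look like the following. Everything concrete here is an assumption for illustration: the data is random, the model is a placeholder linear layer, the learning rate of 0.01 is an arbitrary choice, and the binary cross-entropy is replaced by `CrossEntropyLoss` over two classes.

```python
import torch

torch.manual_seed(0)

# Hypothetical stand-ins: 8 samples with 100 features each, 2 classes.
X_train = torch.rand(8, 100)
y_train = torch.randint(0, 2, (8,))

model = torch.nn.Linear(100, 2)                            # placeholder model producing raw logits
loss_fn = torch.nn.CrossEntropyLoss()                      # applies log-softmax internally
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)   # optimizer='sgd'; lr is assumed

epoch_losses = []
for epoch in range(4):                                     # epochs=4
    total = 0.0
    for x, y in zip(X_train, y_train):                     # batch_size=1
        optimizer.zero_grad()
        logits = model(x.unsqueeze(0))                     # [1, 2]
        loss = loss_fn(logits, y.unsqueeze(0))             # target shape [1]
        loss.backward()
        optimizer.step()
        total += loss.item()
    epoch_losses.append(total / len(X_train))
    print(f"epoch {epoch + 1}: mean loss {epoch_losses[-1]:.4f}")
```

With a batch size of 1 the loop updates the weights after every sample, which is what `batch_size=1` requests in the Keras call.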

# **Example of full image classifier, different from ours**

**https://pytorch.org/tutorials/beginner/blitz/cifar10_tutorial.html#sphx-glr-beginner-blitz-cifar10-tutorial-py**