## 1 Issues with positional encoding in deep architectures that represent a sequence by stacking self-attention layers

Although positional encoding is a commonly employed technique to incorporate sequence information in self-attention layers, it has several drawbacks when utilized in deep architectures.

1) Fixed Positional Embedding: A fixed positional encoding is computed from the position index alone and does not adapt to the content of the sequence. In a deep stack of self-attention layers, this can make it harder for the model to capture complex patterns and long-range dependencies.

2) Increased Computational Complexity: Each position in the sequence needs its own encoding vector, so the memory and computation involved grow with the sequence length. In deep architectures with multiple self-attention layers, this overhead can add to the cost of both training and inference.

3) Lack of Context Information: Positional encoding treats each position in the sequence independently, failing to explicitly capture the context or relationships between positions. Consequently, the model's ability to learn long-range dependencies is limited.

4) Reliance on Prior Knowledge of Sequence Length: Positional encoding assumes that the maximum sequence length is known at training time. As a result, sentences must be padded or truncated to that length during inference, which may not generalize well to real-world inputs of varying length.

5) Redundant Information: In deep architectures employing multiple self-attention layers with fixed positional encoding, the same positional encoding is redundantly used across all layers. This redundancy limits the learning capacity of the model, whereas learnable positional encodings can enable the network to capture more intricate information in deeper layers.
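
For reference, and assuming the fixed scheme discussed above is the standard sinusoidal encoding (the one implemented by the `PositionalEncoding` class in Section 2), the value at position $pos$ and dimension pair $i$ for an embedding of size $d$ can be written as:

$$
PE_{(pos,\,2i)} = \sin\!\left(\frac{pos}{10000^{2i/d}}\right), \qquad
PE_{(pos,\,2i+1)} = \cos\!\left(\frac{pos}{10000^{2i/d}}\right)
$$

These values depend only on $pos$ and $d$, not on the tokens in the sequence, which is what "fixed" refers to in points 1) and 5).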

## 2 Design a learnable positional encoding method

In [1]:
import torch
import pandas as pd
import math
import torch.nn as nn

In [2]:
amazon_df = pd.read_csv('C:/Users/chock/Desktop/ScriptML/amazon_cells_labelled.txt',names=['sentence', 'label'], sep='\t')

In [3]:
amazon_df.head()

|   | sentence | label |
|---|---|---|
| 0 | So there is no way for me to plug it in here i... | 0 |
| 1 | Good case, Excellent value. | 1 |
| 2 | Great for the jawbone. | 1 |
| 3 | Tied to charger for conversations lasting more... | 0 |
| 4 | The mic is great. | 1 |


In [4]:
amazon_df.tail()

|   | sentence | label |
|---|---|---|
| 995 | The screen does get smudged easily because it ... | 0 |
| 996 | What a piece of junk.. I lose more calls on th... | 0 |
| 997 | Item Does Not Match Picture. | 0 |
| 998 | The only thing that disappoint me is the infra... | 0 |
| 999 | You can not answer calls with the unit, never ... | 0 |


In [5]:
sentences = amazon_df['sentence'].values

In [6]:
class PositionalEncoding(nn.Module):
    def __init__(self, embedding_dim, max_seq_length):
        super(PositionalEncoding, self).__init__()
        self.embedding_dim = embedding_dim

        # Precompute the sinusoidal table: one row per position
        pe = torch.zeros(max_seq_length, embedding_dim)
        position = torch.arange(0, max_seq_length, dtype=torch.float).unsqueeze(1)
        div_term = torch.exp(torch.arange(0, embedding_dim, 2).float() * (-math.log(10000.0) / embedding_dim))
        pe[:, 0::2] = torch.sin(position * div_term)  # even dimensions
        pe[:, 1::2] = torch.cos(position * div_term)  # odd dimensions
        # Shape (1, max_seq_length, embedding_dim) so it broadcasts over the batch
        pe = pe.unsqueeze(0)
        # Register as a buffer: stored with the module but not trained
        self.register_buffer('pe', pe)

    def forward(self, x):
        # x: (batch_size, seq_length, embedding_dim)
        x = x * math.sqrt(self.embedding_dim)
        # Slice by sequence length (dim 1), not by batch size (dim 0)
        x = x + self.pe[:, :x.size(1)]
        return x
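
The `PositionalEncoding` class above stores fixed sinusoidal values in a buffer, so nothing in it is trained. As a minimal sketch of what a learnable alternative could look like, the block below replaces the precomputed table with a trainable `nn.Embedding` indexed by position. The names `LearnablePositionalEncoding` and `position_embedding` are hypothetical and not part of the original notebook, and the input is assumed to be batch-first.

```python
import torch
import torch.nn as nn


class LearnablePositionalEncoding(nn.Module):
    """Hypothetical learnable variant: one trainable vector per position."""

    def __init__(self, embedding_dim, max_seq_length):
        super().__init__()
        # Trainable lookup table indexed by position (0 .. max_seq_length - 1)
        self.position_embedding = nn.Embedding(max_seq_length, embedding_dim)

    def forward(self, x):
        # x: (batch_size, seq_length, embedding_dim), assumed batch-first
        seq_length = x.size(1)
        positions = torch.arange(seq_length, device=x.device)
        # (seq_length, embedding_dim) broadcasts over the batch dimension
        return x + self.position_embedding(positions)


# Usage on a dummy batch: 4 sequences of length 30 with 100-dim embeddings
learnable_pe = LearnablePositionalEncoding(embedding_dim=100, max_seq_length=30)
dummy = torch.zeros(4, 30, 100)
out = learnable_pe(dummy)
print(out.shape)
```

Because the position vectors are ordinary parameters, they receive gradients during training. Like the fixed table, this sketch still cannot handle sequences longer than `max_seq_length`.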

In [7]:
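# Tokenize on whitespace and build a vocabulary with a stable order
tokenized_sentences = [sentence.split() for sentence in sentences]
vocab = sorted({word for sentence in tokenized_sentences for word in sentence})

# Index 0 is reserved for padding, so real words start at 1
word_to_idx = {word: i for i, word in enumerate(vocab, start=1)}
idx_to_word = {i: word for word, i in word_to_idx.items()}
vocab_size = len(vocab) + 1  # +1 for the padding index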

# Define the maximum sentence length
max_seq_length = max([len(sentence) for sentence in tokenized_sentences])

indexed_sentences = []
for sentence in tokenized_sentences:
    indexed_sentence = [word_to_idx[word] for word in sentence]
    indexed_sentence += [0] * (max_seq_length - len(sentence))  # Padding
    indexed_sentences.append(indexed_sentence)

# Convert indexed sentences to PyTorch tensors
indexed_sentences = torch.tensor(indexed_sentences)

# Define word embedding dimension
embedding_dim = 100


embedding = nn.Embedding(vocab_size, embedding_dim)

# Create the positional encoding layer
pos_encoding = PositionalEncoding(embedding_dim, max_seq_length)

# Apply word embeddings and positional encoding
embedded_sentences = embedding(indexed_sentences)
embedded_sentences = pos_encoding(embedded_sentences)
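
Because every sentence is padded to `max_seq_length`, the padded positions also receive a word embedding and a positional encoding. A later self-attention layer would usually need to know which positions are padding. The block below is a minimal sketch of building such a mask on a toy tensor, assuming index 0 is reserved for padding. `PAD_IDX` and `toy_indexed` are hypothetical names introduced only for illustration.

```python
import torch

# Toy stand-in for the padded index tensor; 0 is assumed to be the padding index
PAD_IDX = 0
toy_indexed = torch.tensor([
    [5, 2, 7, 0, 0],
    [3, 9, 0, 0, 0],
    [4, 1, 8, 6, 2],
])

# True where a position holds padding, False where it holds a real token
padding_mask = toy_indexed == PAD_IDX

# Number of real tokens in each sentence
lengths = (~padding_mask).sum(dim=1)

print(padding_mask)
print(lengths)
```

A boolean mask of this form, with `True` marking positions to ignore, follows the convention of the `key_padding_mask` argument of `torch.nn.MultiheadAttention`.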

In [10]:
def print_encoding(index_no):
    print(sentences[index_no])
    print("\n")
    print(embedded_sentences[index_no].shape)
    print(embedded_sentences[index_no])

In [11]:
print_encoding(0)

So there is no way for me to plug it in here in the US unless I go by a converter.


torch.Size([30, 100])
tensor([[ -9.2940, -15.5578,   2.2448,  ...,   1.2913,  -2.7797,   5.7666],
        [  7.4501,  24.4456, -10.6322,  ...,   9.5298,   2.2946,   5.5580],
        [-14.5885,   6.7318,  -4.3790,  ...,   6.5755,  15.6758,  14.7288],
        ...,
        [ -3.2380,   8.4697,  -4.1641,  ...,   8.0339,  -3.3613,  -1.4214],
        [ -3.9234,   7.7992,  -4.6775,  ...,   8.0339,  -3.3612,  -1.4214],
        [ -4.8580,   8.0138,  -4.5621,  ...,   8.0339,  -3.3611,  -1.4214]],
       grad_fn=<SelectBackward0>)
