#### **Transformers for Text Processing**

Why use Transformers for text processing?
- Transformers (such as those available through Hugging Face) are the foundation of many pre-trained models, and they train efficiently because all tokens in a sequence are processed in parallel.
- They can capture the relationship between words regardless of how far apart the words are in the text.
- They can generate fluent, human-like text.

#### **Components of a Transformer**

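- `Encoder`: processes the input sequence into contextual representations
- `Decoder`: generates the output sequence from the encoder's representations
- `Feed-forward Neural Networks`: transform each token's representation after attention, refining the model's understanding (which helps with nuances like sarcasm)
- `Positional Encoding`: adds word-order information, since attention on its own ignores order
- `Multi-Head Attention`: runs several attention operations in parallel, so each head can focus on a different relationship between words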
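
To make the `Positional Encoding` component concrete, here is a minimal sketch of the fixed sinusoidal scheme commonly used with Transformers. The function name `sinusoidal_positional_encoding` is hypothetical, and the sketch assumes an even `embed_size`. It is shown for illustration only and is not part of the model built later in this notebook.

```python
import math
import torch

def sinusoidal_positional_encoding(seq_len: int, embed_size: int) -> torch.Tensor:
    """Return a (seq_len, embed_size) tensor of fixed sinusoidal position codes.

    Assumes embed_size is even.
    """
    position = torch.arange(seq_len, dtype=torch.float32).unsqueeze(1)  # (seq_len, 1)
    # One frequency per pair of dimensions, decreasing geometrically
    div_term = torch.exp(
        torch.arange(0, embed_size, 2, dtype=torch.float32)
        * (-math.log(10000.0) / embed_size)
    )  # (embed_size / 2,)
    pe = torch.zeros(seq_len, embed_size)
    pe[:, 0::2] = torch.sin(position * div_term)  # even dimensions
    pe[:, 1::2] = torch.cos(position * div_term)  # odd dimensions
    return pe

# Identical token embeddings become distinguishable once positions are added
embeddings = torch.zeros(1, 4, 512)  # (batch, seq_len, embed_size)
with_position = embeddings + sinusoidal_positional_encoding(4, 512)
```

The encoding is added to the token embeddings before they enter the encoder, so two occurrences of the same word at different positions no longer look the same to the attention layers.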

In [71]:
# dataset
sentences = ['I love this product',
             'This is terrible',
             'Could be better',
             'This is the best']
labels = [1, 0, 0, 1]

# splitting training & testing data
# training data
train_sentences = sentences[:3]
train_labels = labels[:3]

# testing data
test_sentences = sentences[3:]
test_labels = labels[3:]
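
The training loop further down looks tokens up in a `token_embeddings` dictionary that the notebook does not define. As a stand-in, the sketch below builds one from random vectors. The helper name `build_token_embeddings` is hypothetical, and the `(1, 512)` shape is an assumption chosen to match the `torch.rand((1, 512))` fallback used in `predict`. In practice the vectors would more likely come from pre-trained embeddings.

```python
import torch

def build_token_embeddings(sentences, embed_size=512, seed=0):
    """Map every whitespace-separated token to a fixed random (1, embed_size) tensor.

    Random vectors are only a placeholder for real pre-trained embeddings.
    """
    generator = torch.Generator().manual_seed(seed)  # reproducible vectors
    vocab = sorted({token for sentence in sentences for token in sentence.split()})
    return {
        token: torch.rand(1, embed_size, generator=generator)
        for token in vocab
    }

# Small demonstration on two sentences
demo_embeddings = build_token_embeddings(['This is terrible', 'This is the best'])
```

For the notebook's own data, the call would be `token_embeddings = build_token_embeddings(sentences)`. Because tokens are split on whitespace only, `This` and `this` get separate entries.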

#### **Building Transformer Model**

In [65]:
import torch
from torch import nn, optim

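class TransformerEncoder(nn.Module):
    def __init__(self, embed_size, heads, num_layers, dropout):
        super().__init__()
        # batch_first=True so inputs are (batch, seq_len, embed_size);
        # dropout is passed through so the constructor argument takes effect
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=embed_size, nhead=heads, dropout=dropout, batch_first=True
        )
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=num_layers)
        self.fc = nn.Linear(embed_size, 2)  # two classes: negative (0), positive (1)

    def forward(self, x):
        x = self.encoder(x)   # (batch, seq_len, embed_size)
        x = x.mean(dim=1)     # average over tokens -> (batch, embed_size)
        return self.fc(x)     # class logits, (batch, 2)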

# embed_size=512 balances capacity and efficiency; heads=8 runs 8 attention heads in parallel
# (512 / 8 = 64 dimensions per head); num_layers=3 and dropout=0.5 both affect overfitting
model = TransformerEncoder(embed_size=512, heads=8, num_layers=3, dropout=0.5)
optimizer = optim.Adam(model.parameters(), lr=0.001)
criterion = nn.CrossEntropyLoss()
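
By default `nn.TransformerEncoderLayer` expects input shaped `(seq_len, batch, embed_size)`. Passing `batch_first=True` switches this to `(batch, seq_len, embed_size)`, which is the layout produced when token embeddings are stacked along `dim=1`. The following shape check is a small sketch of that behaviour. The sizes (`d_model=16`, `nhead=4`, five tokens) are arbitrary values chosen to keep it fast.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Small encoder for a quick shape check; sizes are illustrative only
layer = nn.TransformerEncoderLayer(d_model=16, nhead=4, dropout=0.1, batch_first=True)
encoder = nn.TransformerEncoder(layer, num_layers=2)
encoder.eval()  # disable dropout for the check

x = torch.randn(1, 5, 16)  # (batch=1, seq_len=5, embed_size=16)
with torch.no_grad():
    encoded = encoder(x)       # one contextual vector per token
pooled = encoded.mean(dim=1)   # average over the token dimension

print(encoded.shape, pooled.shape)
```

Without `batch_first=True`, the same tensor would be read as a sequence of length 1 with a batch of 5, so attention would never mix information between the tokens of a sentence.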



In [None]:
# Training the transformer

epochs = 5

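model.train()  # enable dropout (predict() switches the model to eval mode)
for epoch in range(epochs):
    for sentence, label in zip(train_sentences, train_labels):
        tokens = sentence.split()
        # token_embeddings is a pre-made dictionary, assumed to map each token to a (1, 512) tensor;
        # stacking along dim=1 gives a (1, seq_len, 512) input
        data = torch.stack([token_embeddings[token] for token in tokens], dim=1)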
        output = model(data)
        loss = criterion(output, torch.tensor([label]))
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        print(f'Epoch: {epoch+1}, Loss: {loss.item()}')

In [75]:
# predicting sentiments

def predict(sentence):
    model.eval()
    with torch.no_grad():
        tokens = sentence.split()
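        # unknown tokens fall back to a random (1, 512) vector, so predictions
        # for sentences with unseen words can vary between calls
        data = torch.stack([token_embeddings.get(token, torch.rand((1, 512))) for token in tokens], dim=1)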
        output = model(data)
        predicted = torch.argmax(output, dim=1)
        return 'Positive' if predicted.item() == 1 else 'Negative'

In [None]:
# predicting sentiment of a new text

sample_text = 'This product can be better'
print(f"'{sample_text}' is {predict(sample_text)}")
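
The held-out `test_sentences` and `test_labels` are defined earlier but never used. One way to use them is a small accuracy helper like the sketch below. The name `accuracy` and its signature are hypothetical, and `stub_predict` is a stand-in so the example runs without a trained model. It assumes the prediction function returns the strings `'Positive'` or `'Negative'`, as `predict` does.

```python
def accuracy(predict_fn, sentences, labels, positive_label='Positive'):
    """Fraction of sentences whose predicted sentiment matches the 0/1 label."""
    if not sentences:
        raise ValueError('sentences must not be empty')
    correct = 0
    for sentence, label in zip(sentences, labels):
        predicted = 1 if predict_fn(sentence) == positive_label else 0
        correct += int(predicted == label)
    return correct / len(sentences)

# Stand-in for a trained model, used only to demonstrate the helper
def stub_predict(sentence):
    return 'Positive' if 'best' in sentence else 'Negative'

score = accuracy(stub_predict, ['This is the best', 'This is terrible'], [1, 0])
```

With the notebook's model the call would be `accuracy(predict, test_sentences, test_labels)`. Since the test split holds a single sentence, the result can only be 0.0 or 1.0, so a larger dataset is needed for a meaningful estimate.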