# **Week 3 Hands-on Lab: Building a Simplified Transformer Encoder**

This hands-on lab helps you understand the Transformer architecture by implementing a simplified Transformer encoder. You will learn how input embeddings, positional encodings, and feedforward layers work together in an encoder block. To keep the focus on these pieces, the encoder block built here omits self-attention. We will use the PyTorch framework throughout.

# **Part 1: Input Embedding and Positional Encoding**

**1.	Generate Input Data**
Define a sample sentence as a tensor of token IDs. In a real pipeline a tokenizer would produce these IDs from text; here they are hard-coded for illustration.


In [None]:
import torch
import torch.nn as nn
import numpy as np

# Example sentence and token IDs (simplified for illustration)
token_ids = torch.tensor([[1, 2, 3, 4, 5]])  # Tokenized sentence
vocab_size = 10  # Vocabulary size
embedding_dim = 8  # Embedding size
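
The token IDs above are hard-coded. As a rough sketch of where such IDs might come from, the snippet below builds a toy whitespace vocabulary. The sentence, the `vocab` dictionary, and the `tokenize` helper are illustrative assumptions rather than part of the lab, and real tokenizers are considerably more involved.

```python
import torch

# Hypothetical example sentence (not from the lab).
sentence = "the cat sat on the mat"

# Assign each distinct word an ID in order of first appearance, starting at 1
# so that 0 stays free (for example, for padding).
vocab = {}
for word in sentence.split():
    if word not in vocab:
        vocab[word] = len(vocab) + 1

def tokenize(text, vocab):
    """Map a whitespace-separated string to a (1, seq_len) tensor of token IDs."""
    return torch.tensor([[vocab[w] for w in text.split()]])

toy_token_ids = tokenize(sentence, vocab)
print(vocab)
print(toy_token_ids)
```

Whatever scheme produces the IDs, each one must be smaller than the vocabulary size given to the embedding layer; the largest ID here is 5, so a vocabulary size of 10 would be enough.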

**2. Create an Embedding Layer**
Implement the embedding layer to convert token IDs into dense vectors.

In [None]:
embedding_layer = nn.Embedding(vocab_size, embedding_dim)
embedded_tokens = embedding_layer(token_ids)
print("Embedded Tokens:\n", embedded_tokens)
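
An embedding layer behaves like a learnable lookup table: row i of its weight matrix is the vector for token ID i. The sketch below re-creates the variables from the cells above so that it runs on its own, then checks this by indexing the weight matrix directly.

```python
import torch
import torch.nn as nn

token_ids = torch.tensor([[1, 2, 3, 4, 5]])
vocab_size, embedding_dim = 10, 8

embedding_layer = nn.Embedding(vocab_size, embedding_dim)
embedded_tokens = embedding_layer(token_ids)

# Indexing the weight matrix with the token IDs returns the same vectors.
looked_up = embedding_layer.weight[token_ids]

print(embedded_tokens.shape)  # (batch, seq_len, embedding_dim)
print(torch.equal(embedded_tokens, looked_up))
```

The weights are randomly initialized, so the printed vectors differ between runs unless a seed is fixed with `torch.manual_seed`.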

**3.	Add Positional Encoding**
The embedding alone carries no information about word order. Add a sinusoidal positional encoding so that each position in the sequence receives a distinct vector.


In [None]:
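def positional_encoding(seq_len, embedding_dim):
    # Column vector of positions 0..seq_len-1, shape (seq_len, 1)
    position = np.arange(seq_len)[:, np.newaxis]
    # One frequency per sin/cos pair: 1 / 10000^(2i / embedding_dim)
    div_term = np.exp(np.arange(0, embedding_dim, 2) * -(np.log(10000.0) / embedding_dim))
    pe = np.zeros((seq_len, embedding_dim))
    pe[:, 0::2] = np.sin(position * div_term)  # even columns
    # Slice div_term so that an odd embedding_dim does not cause a shape mismatch
    pe[:, 1::2] = np.cos(position * div_term[: embedding_dim // 2])  # odd columns
    return torch.tensor(pe, dtype=torch.float)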

seq_len = token_ids.size(1)
pos_encoding = positional_encoding(seq_len, embedding_dim)
print("Positional Encoding:\n", pos_encoding)
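
The function above follows the sinusoidal scheme in which even columns hold sin(pos / 10000^(2i/d)) and odd columns hold cos(pos / 10000^(2i/d)). As an alternative sketch, the same table can be built with torch operations only, which avoids the NumPy round trip. The name `positional_encoding_torch` is hypothetical, and this version assumes an even embedding size.

```python
import math
import torch

def positional_encoding_torch(seq_len, embedding_dim):
    """Sinusoidal positional encoding using torch ops only (assumes even embedding_dim)."""
    # Positions 0..seq_len-1 as a column vector, shape (seq_len, 1)
    position = torch.arange(seq_len, dtype=torch.float).unsqueeze(1)
    # One frequency per sin/cos pair, shape (embedding_dim / 2,)
    div_term = torch.exp(
        torch.arange(0, embedding_dim, 2, dtype=torch.float)
        * -(math.log(10000.0) / embedding_dim)
    )
    pe = torch.zeros(seq_len, embedding_dim)
    pe[:, 0::2] = torch.sin(position * div_term)  # even columns
    pe[:, 1::2] = torch.cos(position * div_term)  # odd columns
    return pe

pe = positional_encoding_torch(5, 8)
print(pe[0])  # position 0: sin(0) = 0 in even columns, cos(0) = 1 in odd columns
```

Because every entry is a sine or cosine, all values lie in [-1, 1] regardless of sequence length.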


Add the positional encoding to the embedded tokens:

In [None]:
embedded_with_pos = embedded_tokens + pos_encoding.unsqueeze(0)
print("Embedded Tokens with Positional Encoding:\n", embedded_with_pos)
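
The addition above relies on broadcasting: `unsqueeze(0)` gives the encoding a leading batch dimension of size 1, so the same encoding is added to every sequence in a batch. The sketch below illustrates this with random stand-in tensors and an assumed batch size of 2.

```python
import torch

batch_size, seq_len, embedding_dim = 2, 5, 8

# Stand-ins for the real tensors: random embeddings and an arbitrary encoding.
embedded = torch.randn(batch_size, seq_len, embedding_dim)
pos_encoding = torch.randn(seq_len, embedding_dim)

# (seq_len, dim) -> (1, seq_len, dim); broadcasting then repeats the encoding
# across the batch dimension during the addition.
summed = embedded + pos_encoding.unsqueeze(0)

print(summed.shape)
# Each sequence in the batch received exactly the same encoding.
print(torch.equal(summed[0], embedded[0] + pos_encoding))
print(torch.equal(summed[1], embedded[1] + pos_encoding))
```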


# **Part 2: Add a Feedforward Layer**

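**1.	Define a Feedforward Neural Network**
Implement a simple feedforward network as part of the encoder: two linear layers with a ReLU in between, expanding each token vector to 16 hidden units and projecting it back to the embedding size.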


In [None]:
feedforward = nn.Sequential(
    nn.Linear(embedding_dim, 16),
    nn.ReLU(),
    nn.Linear(16, embedding_dim)
)
ff_output = feedforward(embedded_with_pos)
print("Feedforward Output:\n", ff_output)
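
The linear layers act on the last dimension only, so the feedforward network processes each position independently of the others. The following sketch checks this on a random stand-in for `embedded_with_pos`; the seed and the `hidden_dim` name are assumptions made for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
embedding_dim, hidden_dim = 8, 16

feedforward = nn.Sequential(
    nn.Linear(embedding_dim, hidden_dim),
    nn.ReLU(),
    nn.Linear(hidden_dim, embedding_dim),
)

x = torch.randn(1, 5, embedding_dim)  # stand-in for embedded_with_pos

# Apply the network to the whole sequence at once...
full = feedforward(x)
# ...and to each position separately, then stack the results along seq_len.
per_position = torch.stack(
    [feedforward(x[:, t]) for t in range(x.size(1))], dim=1
)

print(full.shape)
print(torch.allclose(full, per_position, atol=1e-5))
```

This independence is why the feedforward layer alone cannot mix information between tokens.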


# **Part 3: Combine the Components into an Encoder Block**

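**1.	Define the Encoder Block**
Combine the embedding, positional encoding, and feedforward components into an encoder block. The block also wraps the feedforward network in a residual connection followed by layer normalization.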


In [None]:
class TransformerEncoderBlock(nn.Module):
    def __init__(self, vocab_size, embedding_dim):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embedding_dim)
        # Position-wise feedforward network with a hidden size of 16
        self.feedforward = nn.Sequential(
            nn.Linear(embedding_dim, 16),
            nn.ReLU(),
            nn.Linear(16, embedding_dim)
        )
        self.layer_norm = nn.LayerNorm(embedding_dim)

    def forward(self, x):
        # x: (batch, seq_len) token IDs -> embed: (batch, seq_len, embedding_dim)
        embed = self.embedding(x)
        # Build the encoding and move it to the same device as the embeddings
        pos_enc = positional_encoding(x.size(1), embed.size(2)).to(embed.device)
        embed_with_pos = embed + pos_enc.unsqueeze(0)
        ff_output = self.feedforward(embed_with_pos)
        # Residual connection followed by layer normalization
        return self.layer_norm(embed_with_pos + ff_output)

encoder = TransformerEncoderBlock(vocab_size, embedding_dim)
output = encoder(token_ids)
print("Encoder Output:\n", output)
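
The last line of `forward` adds the feedforward output back to its input (a residual connection) and then normalizes each token vector. The sketch below isolates that step using random stand-in tensors, to show what layer normalization does to the per-token statistics; the seed and tensor names are assumptions.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
embedding_dim = 8

x = torch.randn(1, 5, embedding_dim)             # stand-in for embed_with_pos
sublayer_out = torch.randn(1, 5, embedding_dim)  # stand-in for the feedforward output

layer_norm = nn.LayerNorm(embedding_dim)
normed = layer_norm(x + sublayer_out)  # residual connection, then normalization

# With the default initialization (weight 1, bias 0), each token vector ends up
# with a mean close to 0 and a (biased) variance close to 1.
token_means = normed.mean(dim=-1)
token_vars = normed.var(dim=-1, unbiased=False)
print(token_means)
print(token_vars)
```

The output shape matches the input shape, so blocks of this kind can be stacked.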


# **Part 4: Experiment with Different Inputs**

* **Test with Different Sentences:** Replace token_ids with new examples to observe how the encoder processes different inputs. Every token ID must be smaller than vocab_size, otherwise the embedding lookup fails.
* **Modify Hyperparameters:** Experiment with different embedding sizes, feedforward dimensions, or positional encoding scales to see their effect on the output.
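
One possible way to run such experiments is sketched below. It feeds a batch of two hypothetical sentences, padded with 0 to equal length, through an embedding and feedforward stack for several sizes. Positional encoding and layer normalization are left out to keep the sketch short, and the `ff_dim` name and the chosen sizes are assumptions.

```python
import torch
import torch.nn as nn

vocab_size = 10
# Two hypothetical sentences of different lengths, padded with 0.
batch = torch.tensor([
    [1, 2, 3, 4, 5],
    [6, 7, 8, 0, 0],
])
# IDs outside the vocabulary would raise an IndexError in the embedding lookup.
assert int(batch.max()) < vocab_size

shapes = {}
for embedding_dim in (4, 8, 16):
    for ff_dim in (16, 32):
        embedding = nn.Embedding(vocab_size, embedding_dim)
        feedforward = nn.Sequential(
            nn.Linear(embedding_dim, ff_dim),
            nn.ReLU(),
            nn.Linear(ff_dim, embedding_dim),
        )
        out = feedforward(embedding(batch))
        shapes[(embedding_dim, ff_dim)] = tuple(out.shape)
        print(embedding_dim, ff_dim, tuple(out.shape))
```

The output shape depends only on the batch size, sequence length, and embedding size; the feedforward dimension changes the capacity of the hidden layer but not the shape of the result.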


# **Summary**

By completing this lab, you have:

* Understood the role of embedding, positional encoding, and feedforward layers in the Transformer encoder.
* Gained hands-on experience implementing a core component of the Transformer architecture.
* Developed a deeper appreciation for the architecture’s design and functionality.

This lab builds foundational knowledge of the Transformer, preparing you for more advanced concepts like self-attention.
