
# LLM architecture

We're going to replicate the smallest version of GPT-2, with 124 million parameters, whose weights are publicly available.

In [None]:
# Configuration for our GPT model
GPT_CONFIG_124M = {
    "vocab_size": 50257,        # Number of tokens (words/sub-words) in the BPE vocabulary
    "context_length": 1024,     # Maximum number of tokens used as context to predict the next token
    "embedding_dimension": 768, # Tokens are projected into a 768-dimensional space
    "number_of_heads": 12,      # Attention heads, each with its own query, key, and value projections
    "number_of_layers": 12,     # Number of transformer blocks
    "dropout_rate": 0.1,        # Fraction of activations randomly zeroed during training
    "qkv_bias": False}          # Whether the query, key, and value projections use a bias term
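As a sanity check on the 124M figure, the sketch below estimates the parameter count from the configuration. It assumes the standard GPT-2 block layout, which the configuration itself does not spell out: a feed-forward hidden size of 4 × `embedding_dimension`, two LayerNorms per block plus a final one, and an output head that reuses (ties) the token embedding matrix. The function name `estimate_gpt2_parameters` is hypothetical, and the configuration values are repeated so the cell runs on its own.

```python
def estimate_gpt2_parameters(config, tie_output_head=True):
    """Estimate the parameter count of a GPT-2-style model (assumed layout)."""
    d = config["embedding_dimension"]
    hidden = 4 * d  # assumption: feed-forward hidden size is 4x the embedding size

    # Embedding tables: one row per token ID and one row per position
    token_embeddings = config["vocab_size"] * d
    position_embeddings = config["context_length"] * d

    # Attention: query, key, value projections plus an output projection with bias
    qkv = 3 * d * d + (3 * d if config["qkv_bias"] else 0)
    attention_output = d * d + d
    # Feed-forward: two linear layers, both with biases
    feed_forward = d * hidden + hidden + hidden * d + d
    # Two LayerNorms per block, each with a scale and a shift vector
    norms = 2 * 2 * d

    per_block = qkv + attention_output + feed_forward + norms
    total = token_embeddings + position_embeddings + config["number_of_layers"] * per_block
    total += 2 * d  # final LayerNorm
    if not tie_output_head:
        total += config["vocab_size"] * d  # separate output projection without bias
    return total


config_124m = {"vocab_size": 50257, "context_length": 1024, "embedding_dimension": 768,
               "number_of_heads": 12, "number_of_layers": 12, "dropout_rate": 0.1,
               "qkv_bias": False}

print(f"{estimate_gpt2_parameters(config_124m):,}")                         # 124,412,160
print(f"{estimate_gpt2_parameters(config_124m, tie_output_head=False):,}")  # 163,009,536
```

The number of heads does not enter the count: the heads split the 768 dimensions among themselves rather than adding new ones. With `qkv_bias` set to `True`, as in OpenAI's released checkpoint, the estimate becomes 124,439,808.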

# GPT placeholder architecture

We'll build a GPT placeholder architecture to gain intuition on how everything fits together. It takes the configuration dictionary defined above as input.

In [None]:
import torch
import torch.nn as nn

class GPTModelPlaceholder(nn.Module):
    def __init__(self, config):
        super().__init__()
        # Initialize the layers using the configuration dictionary
        # Lookup table from token IDs to token embeddings
        self.token_embedding_table = nn.Embedding(config["vocab_size"], config["embedding_dimension"])
        # Lookup table from positions to positional embeddings
        self.position_embedding_table = nn.Embedding(config["context_length"], config["embedding_dimension"])
        self.dropout_embedding = nn.Dropout(config["dropout_rate"])

    # Accept the inputs and apply the transformations
    # The inputs are fed into the model as token IDs
    def forward(self, inputs_ids):
        # Obtain the batch size and the sequence length (at most context_length)
        batch_size, sequence_length = inputs_ids.shape

        # Use the lookup table to obtain embeddings given the IDs
        # Shape: [batch_size, sequence_length, embedding_dimension]
        token_embeddings = self.token_embedding_table(inputs_ids)

        # Obtain positional embeddings
        # Create the positions 0, 1, ..., sequence_length - 1 on the same device as the inputs
        range_object = torch.arange(sequence_length, device=inputs_ids.device)
        # Look up the positional embedding for each position
        # Shape: [sequence_length, embedding_dimension]
        position_embeddings = self.position_embedding_table(range_object)

        # Add the embeddings; the positional embeddings are broadcast across the batch
        x = token_embeddings + position_embeddings

        # Apply dropout
        x = self.dropout_embedding(x)

        # Next, the data would be passed to the transformer blocks (not implemented yet),
        # so for now the placeholder returns the embeddings
        return x
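To see how the pieces could fit once a transformer stage exists, here is a minimal end-to-end skeleton, a sketch rather than the final model. The names `PlaceholderTransformerBlock` and `GPTSkeleton` are hypothetical; the blocks simply return their input, and the final `nn.LayerNorm` and the untied `nn.Linear` output head are assumptions modeled on GPT-2's layout (the released GPT-2 ties the output head to the token embeddings). The configuration and the example token IDs are repeated so the cell runs on its own.

```python
import torch
import torch.nn as nn


class PlaceholderTransformerBlock(nn.Module):
    """Hypothetical stand-in for a transformer block: returns its input unchanged."""
    def forward(self, x):
        return x


class GPTSkeleton(nn.Module):
    """Hypothetical skeleton: embeddings -> blocks -> final norm -> logits."""
    def __init__(self, config):
        super().__init__()
        d = config["embedding_dimension"]
        self.token_embedding_table = nn.Embedding(config["vocab_size"], d)
        self.position_embedding_table = nn.Embedding(config["context_length"], d)
        self.dropout_embedding = nn.Dropout(config["dropout_rate"])
        # One placeholder per transformer layer
        self.blocks = nn.Sequential(
            *[PlaceholderTransformerBlock() for _ in range(config["number_of_layers"])])
        self.final_norm = nn.LayerNorm(d)
        # Maps each embedding to one score (logit) per vocabulary entry
        self.output_head = nn.Linear(d, config["vocab_size"], bias=False)

    def forward(self, input_ids):
        _, sequence_length = input_ids.shape
        positions = torch.arange(sequence_length, device=input_ids.device)
        # [batch, seq, d] + [seq, d] broadcasts over the batch dimension
        x = self.token_embedding_table(input_ids) + self.position_embedding_table(positions)
        x = self.dropout_embedding(x)
        x = self.blocks(x)
        x = self.final_norm(x)
        return self.output_head(x)  # [batch, seq, vocab_size]


config = {"vocab_size": 50257, "context_length": 1024, "embedding_dimension": 768,
          "number_of_heads": 12, "number_of_layers": 12, "dropout_rate": 0.1,
          "qkv_bias": False}

torch.manual_seed(123)
model = GPTSkeleton(config)
model.eval()  # disable dropout for a deterministic forward pass
input_ids = torch.tensor([[6109, 3626, 6100, 345],
                          [6109, 1110, 6622, 257]])
with torch.no_grad():
    logits = model(input_ids)
print(logits.shape)  # torch.Size([2, 4, 50257])
```

The output holds one logit per vocabulary entry at every position, so its last dimension is 50257. Swapping `PlaceholderTransformerBlock` for real attention and feed-forward layers would leave every shape unchanged.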

# Example

First, let's tokenize two short texts and stack their token IDs into a batch.

In [None]:
import tiktoken

# Use the GPT-2 byte-pair encoding (BPE) tokenizer
tokenizer = tiktoken.get_encoding("gpt2")
batch = []

# Text to be used in the example
text1 = "Every effort moves you"
text2 = "Every day holds a"

# Obtain the IDs of each text
text_1_tokenized = tokenizer.encode(text1)
text_2_tokenized = tokenizer.encode(text2)

batch.append(text_1_tokenized)
batch.append(text_2_tokenized)

# Stack into a [batch_size, sequence_length] tensor (both texts have 4 tokens)
batch = torch.tensor(batch)
print(batch)

tensor([[6109, 3626, 6100,  345],
        [6109, 1110, 6622,  257]])


In [None]:
batch.shape

torch.Size([2, 4])
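The shape `[2, 4]` works out only because both texts happen to encode to four tokens; `torch.tensor` raises an error when the inner lists have different lengths. A common workaround is to right-pad the shorter sequences. The helper `pad_batch` below is hypothetical, and using the `<|endoftext|>` ID 50256 as the padding token is an assumption (a frequent convention, since GPT-2 has no dedicated padding token).

```python
import torch

PAD_ID = 50256  # assumption: GPT-2's <|endoftext|> token reused as padding


def pad_batch(sequences, pad_id=PAD_ID):
    """Right-pad lists of token IDs to a common length and stack them into a tensor."""
    max_length = max(len(sequence) for sequence in sequences)
    padded = [sequence + [pad_id] * (max_length - len(sequence)) for sequence in sequences]
    return torch.tensor(padded)


# The first text truncated to 3 tokens, next to the full 4-token second text
padded_batch = pad_batch([[6109, 3626, 6100], [6109, 1110, 6622, 257]])
print(padded_batch.shape)  # torch.Size([2, 4])
```

In a real model, an attention mask (or a loss that ignores the padding ID) would keep the padded positions from influencing the results.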