# RFT-Lab — Understanding Encoder

Before an AI system can reason or generate answers,
it must first **understand** the input.

This notebook implements a **Transformer Encoder**
responsible only for understanding:

• Context building  
• Token relationships  
• Semantic representation  

❌ No reasoning  
❌ No decoding  
❌ No text generation  

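This encoder follows the
**standard Transformer encoder architecture**
(embeddings, positional encoding, self-attention, feed-forward blocks),
simplified to a single attention head with no dropout,
and implemented in a simple and transparent way.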


## Step 0: Imports

We use only core PyTorch modules.
This keeps the code readable and interview-friendly.


In [21]:
import torch
import torch.nn as nn
import math

# RFT-Lab — Understanding Encoder (Production Version)

This notebook implements the **final Understanding Encoder**
used in the real-time RFT system.

STRICT CONSTRAINTS:
- No assumptions
- No placeholder inputs
- No future replacement logic

This encoder will receive **raw cleaned text**
from the input-handling layer and convert it into
stable, contextual representations.

This module is **locked after implementation**.


## Step 0: Imports

We use stable, widely-adopted libraries only.


In [22]:
import torch
import torch.nn as nn
import math

from transformers import GPT2TokenizerFast

## Step 1: Tokenization (Production-Fixed)

Tokenization is NOT a toy decision.

In real-time systems:
- Token IDs must be consistent across runs
- The vocabulary must be fixed
- The encoder must always see the same token distribution

We use a **pretrained GPT-2 tokenizer**
as a stable, industry-proven choice.

This tokenizer is now a **locked dependency**.


In [23]:
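# Load the pretrained GPT-2 tokenizer (fixed vocabulary of 50,257 tokens)
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")

# GPT-2 ships without a padding token, so reuse the end-of-text token for padding
tokenizer.pad_token = tokenizer.eos_token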

## Tokenization Test (Real Input)

This is exactly how user text will enter the encoder.

In [31]:
# Sample input text from user
user_text = "Analyze this resume and highlight weaknesses."

# Tokenize text into model-ready tensors
tokens = tokenizer(
    user_text,
    return_tensors="pt",
    padding=True,
    truncation=True
)

# Extract token IDs
input_ids = tokens["input_ids"]

# Extract attention mask (1 = real token, 0 = padding)
attention_mask = tokens["attention_mask"]

# Display the encoder inputs (a notebook cell shows the value of its last expression)
input_ids, attention_mask


(tensor([[37702,  2736,   428, 15294,   290,  7238, 20256,    13]]),
 tensor([[1, 1, 1, 1, 1, 1, 1, 1]]))
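
With a single sentence the attention mask is all ones, so the padding case never shows up. The sketch below illustrates what a padded batch looks like, assuming right-side padding and the GPT-2 end-of-text ID (50256) as the pad ID. `pad_batch` is a hypothetical helper written for illustration, not part of the tokenizer API, and the second sequence is a made-up shorter input.

```python
import torch

PAD_ID = 50256  # GPT-2 end-of-text token ID, reused here as the pad ID (assumption)

def pad_batch(sequences, pad_id=PAD_ID):
    """Right-pad lists of token IDs to a common length and build the mask."""
    max_len = max(len(seq) for seq in sequences)
    input_ids = torch.full((len(sequences), max_len), pad_id, dtype=torch.long)
    attention_mask = torch.zeros((len(sequences), max_len), dtype=torch.long)
    for row, seq in enumerate(sequences):
        input_ids[row, :len(seq)] = torch.tensor(seq, dtype=torch.long)
        attention_mask[row, :len(seq)] = 1  # 1 = real token, 0 = padding
    return input_ids, attention_mask

# Row 0: the token IDs printed above. Row 1: a shorter, made-up sequence.
batch_ids, batch_mask = pad_batch([
    [37702, 2736, 428, 15294, 290, 7238, 20256, 13],
    [37702, 2736, 428],
])

print(batch_ids.shape)
print(batch_mask)
```

The second row of `batch_mask` ends in zeros, and those are the positions the encoder is expected to ignore.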

## Step 2: Token Embedding

Embeddings map real token IDs into vector space.

This layer will NEVER change once deployed.


In [32]:
class TokenEmbedding(nn.Module):
    def __init__(self, vocab_size, d_model):
        super().__init__()

        # Lookup table mapping token IDs to dense vectors
        self.embedding = nn.Embedding(vocab_size, d_model)

    def forward(self, x):
        # Convert token IDs into embedding vectors
        return self.embedding(x)
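
To make the lookup concrete, here is a minimal sketch that uses `nn.Embedding` directly. It assumes the GPT-2 vocabulary size (50,257) and `d_model=128`; the weights are randomly initialized, so only the shapes and the lookup behavior are meaningful.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

vocab_size, d_model = 50257, 128  # GPT-2 vocabulary size; d_model is an illustrative choice
embedding = nn.Embedding(vocab_size, d_model)

# Token IDs for one sentence: shape (batch=1, seq_len=8)
ids = torch.tensor([[37702, 2736, 428, 15294, 290, 7238, 20256, 13]])
vectors = embedding(ids)

# One d_model-sized vector per token: (batch, seq_len, d_model)
print(vectors.shape)

# The layer is a plain table with one row per vocabulary entry
n_params = sum(p.numel() for p in embedding.parameters())
print(n_params)

# Looking up an ID returns exactly that row of the weight table
same_row = torch.equal(vectors[0, -1], embedding.weight[13])
print(same_row)
```

Because the table has one row per token ID, changing the tokenizer or its vocabulary invalidates the embedding weights, which is why the two are fixed together.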


## Step 3: Positional Encoding

* Order information is injected deterministically.
* This ensures repeatable behavior across runs.


In [33]:
class PositionalEncoding(nn.Module):
    def __init__(self, d_model, max_len=1024):
        super().__init__()

        # Create positional encoding matrix
        pe = torch.zeros(max_len, d_model)

        # Position indices (0, 1, 2, ...)
        position = torch.arange(0, max_len).unsqueeze(1)

        # Frequency scaling for sine and cosine
        div_term = torch.exp(
            torch.arange(0, d_model, 2) * (-math.log(10000.0) / d_model)
        )

        # Apply sine to even indices
        pe[:, 0::2] = torch.sin(position * div_term)

        # Apply cosine to odd indices
        pe[:, 1::2] = torch.cos(position * div_term)

        # Store as non-trainable buffer
        self.register_buffer("pe", pe)

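    def forward(self, x):
        # x: (batch, seq_len, d_model). The slice pe[:seq_len] has shape
        # (seq_len, d_model) and broadcasts over the batch dimension.
        # seq_len must not exceed max_len.
        return x + self.pe[:x.size(1)]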
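
The determinism claim can be checked directly. The sketch below rebuilds the same sinusoidal table as a standalone function (`sinusoidal_table` is a hypothetical name used only here) and confirms that two builds are identical and that position 0 alternates between sin(0) = 0 and cos(0) = 1. It assumes an even `d_model`, as the class above does.

```python
import math
import torch

def sinusoidal_table(max_len, d_model):
    """Build the (max_len, d_model) sinusoidal table; d_model must be even."""
    pe = torch.zeros(max_len, d_model)
    position = torch.arange(0, max_len, dtype=torch.float32).unsqueeze(1)
    div_term = torch.exp(
        torch.arange(0, d_model, 2, dtype=torch.float32)
        * (-math.log(10000.0) / d_model)
    )
    pe[:, 0::2] = torch.sin(position * div_term)  # even columns
    pe[:, 1::2] = torch.cos(position * div_term)  # odd columns
    return pe

table_a = sinusoidal_table(max_len=16, d_model=8)
table_b = sinusoidal_table(max_len=16, d_model=8)

# No randomness is involved, so two builds match exactly
print(torch.equal(table_a, table_b))

# Position 0: sin(0) = 0 in even columns, cos(0) = 1 in odd columns
print(table_a[0])
```

Every entry is a sine or cosine, so the table stays within [-1, 1] regardless of position.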


## Step 4: Self-Attention

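Self-attention builds contextual understanding:
each token's output is a weighted mix of every token in the sequence.
This implementation uses a single attention head.
No generation. No reasoning.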


In [34]:
class SelfAttention(nn.Module):
    def __init__(self, d_model):
        super().__init__()

        # Linear projections for queries, keys, and values
        self.W_q = nn.Linear(d_model, d_model)
        self.W_k = nn.Linear(d_model, d_model)
        self.W_v = nn.Linear(d_model, d_model)

    def forward(self, x, mask=None):
        # Project input embeddings into Q, K, V
        Q = self.W_q(x)
        K = self.W_k(x)
        V = self.W_v(x)

        # Scaled dot-product attention scores
        scores = torch.matmul(Q, K.transpose(-2, -1)) / math.sqrt(x.size(-1))

        # Apply attention mask to ignore padding tokens.
        # mask must broadcast to (batch, seq_len, seq_len), e.g. (batch, 1, seq_len)
        if mask is not None:
            scores = scores.masked_fill(mask == 0, -1e9)

        # Convert scores to attention weights
        weights = torch.softmax(scores, dim=-1)

        # Weighted sum of values
        return torch.matmul(weights, V)
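
To see what the mask does to the attention weights, the sketch below isolates the score, mask, and softmax steps in a standalone function (`masked_attention_weights` is a hypothetical name for illustration). It uses random tensors in place of the learned projections and marks the last position as padding.

```python
import math
import torch

def masked_attention_weights(Q, K, mask=None):
    """Scaled dot-product attention weights with optional padding mask."""
    scores = torch.matmul(Q, K.transpose(-2, -1)) / math.sqrt(K.size(-1))
    if mask is not None:
        # Positions where mask == 0 get a very large negative score
        scores = scores.masked_fill(mask == 0, -1e9)
    return torch.softmax(scores, dim=-1)

torch.manual_seed(0)
Q = torch.randn(1, 4, 8)  # (batch, seq_len, d_model), random stand-ins
K = torch.randn(1, 4, 8)

# Last token is padding. unsqueeze(1) gives (batch, 1, seq_len), which
# broadcasts across every query row of the (batch, seq_len, seq_len) scores.
attention_mask = torch.tensor([[1, 1, 1, 0]])
weights = masked_attention_weights(Q, K, attention_mask.unsqueeze(1))

print(weights.sum(dim=-1))  # every row is a probability distribution
print(weights[0, :, 3])     # weight placed on the padded key
```

Note that the mask hides padded positions only as keys. The padded position still acts as a query and produces an output vector, which downstream code should ignore.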


## Step 5: Encoder Block

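Each block applies self-attention and a feed-forward network.
Each sub-layer is wrapped in a residual connection followed by layer normalization (post-norm).

This block is frozen after validation.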


In [35]:
class EncoderBlock(nn.Module):
    def __init__(self, d_model):
        super().__init__()

        # Self-attention layer for contextual interaction
        self.attn = SelfAttention(d_model)

        # Layer normalization after attention
        self.norm1 = nn.LayerNorm(d_model)

        # Feed-forward network for non-linear transformation
        self.ffn = nn.Sequential(
            nn.Linear(d_model, d_model * 4),
            nn.ReLU(),
            nn.Linear(d_model * 4, d_model)
        )

        # Layer normalization after feed-forward
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x, mask):
        # Residual connection + normalization after attention
        x = self.norm1(x + self.attn(x, mask))

        # Residual connection + normalization after FFN
        x = self.norm2(x + self.ffn(x))

        # Output of one encoder block
        return x
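
The "residual connection + normalization" pattern can be illustrated in isolation. The sketch below uses a random tensor as a stand-in for the attention or feed-forward output (an assumption made purely for illustration) and shows that a freshly initialized `nn.LayerNorm` returns per-token features with mean close to 0 and variance close to 1, whatever the scale of its input.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

d_model = 16
norm = nn.LayerNorm(d_model)

# Input with an arbitrary scale and shift: (batch, seq_len, d_model)
x = torch.randn(2, 5, d_model) * 3.0 + 1.0

# Stand-in for the output of the attention or feed-forward sub-layer
sublayer_out = torch.randn(2, 5, d_model)

# Post-norm pattern: add the residual, then normalize
y = norm(x + sublayer_out)

print(y.shape)                                # shape is preserved
print(y.mean(dim=-1).abs().max())             # per-token mean close to 0
print(y.var(dim=-1, unbiased=False).mean())   # per-token variance close to 1
```

Keeping the shape unchanged is what lets blocks be stacked: the output of one block is a valid input for the next.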


## Step 6: Understanding Encoder / Contextual Representation

This is the final understanding module
used by the real-time RFT system.


In [29]:
class UnderstandingEncoder(nn.Module):
    def __init__(self, vocab_size, d_model, num_layers):
        super().__init__()

        # Token IDs → dense embeddings
        self.embedding = TokenEmbedding(vocab_size, d_model)

        # Adds sequence position information
        self.position = PositionalEncoding(d_model)

        # Stack of encoder blocks
        self.layers = nn.ModuleList(
            [EncoderBlock(d_model) for _ in range(num_layers)]
        )

    def forward(self, input_ids, attention_mask):
        # Embed tokens
        x = self.embedding(input_ids)

        # Add positional encoding
        x = self.position(x)

        # Apply encoder layers; unsqueeze(1) turns the (batch, seq_len) mask
        # into (batch, 1, seq_len) so it broadcasts over the attention scores
        for layer in self.layers:
            x = layer(x, attention_mask.unsqueeze(1))

        # Output contextual representations
        return x


## Step 7: End-to-End Understanding Test (Production)

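This is the exact path used in deployment.
The encoder below is freshly initialized (no trained weights are loaded),
so this test verifies the output shape, not the quality of the representations.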


In [30]:
# Initialize encoder with vocabulary size, embedding dimension, and number of layers
encoder = UnderstandingEncoder(
    vocab_size=tokenizer.vocab_size,
    d_model=128,
    num_layers=2
)

# Pass token IDs and attention mask through the encoder
encoded_output = encoder(input_ids, attention_mask)

# Shape: (batch_size, sequence_length, d_model)
encoded_output.shape


torch.Size([1, 8, 128])
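
The notebook describes the module as locked and frozen but does not show how. One common way to do this in PyTorch is sketched below, assuming "frozen" means no further weight updates. `freeze` is a hypothetical helper, and the small `nn.Sequential` is only a stand-in for the encoder so the example runs on its own.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for the encoder (embedding + one linear layer)
stand_in = nn.Sequential(nn.Embedding(100, 16), nn.Linear(16, 16))

def freeze(module):
    """Switch to inference mode and disable gradients for all parameters."""
    module.eval()
    for param in module.parameters():
        param.requires_grad_(False)
    return module

frozen = freeze(stand_in)

# no_grad avoids building an autograd graph during inference
with torch.no_grad():
    out = frozen(torch.tensor([[1, 2, 3]]))

print(out.shape)
print(out.requires_grad)
```

The encoder in this notebook has no dropout, so `eval()` does not change its outputs. It is still worth calling so the behavior stays the same if such layers are added later.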

## This notebook shows
- No assumptions
- Fixed tokenizer contract
- Real-time safe design
- Production discipline
- No future rewrites