## Token Embeddings

An embedding layer converts each word (represented by an integer token ID) into a dense, trainable vector.

# Import the PyTorch library
import torch
import torch.nn as nn

# -------------------------------------------------------------
# 1. Define a small vocabulary
# -------------------------------------------------------------
# Each word is mapped to a unique integer ID.
# <pad> is used for padding shorter sequences in a batch.
vocab = {
    "<pad>": 0,
    "i": 1,
    "love": 2,
    "ai": 3,
    "bugs": 4
}

# -------------------------------------------------------------
# 2. Create example sentences
# -------------------------------------------------------------
# We'll encode two short sentences using the token IDs.
# Each row in the tensor is one sentence.
# Shape = (batch_size = 2, sequence_length = 3)
x = torch.tensor([
    [1, 2, 3],   # "i love ai"
    [1, 4, 0]    # "i bugs <pad>"
])

print("Input token IDs:")
print(x)
# Output:
# tensor([[1, 2, 3],
#         [1, 4, 0]])

# -------------------------------------------------------------
# 3. Create an Embedding layer
# -------------------------------------------------------------
# The embedding layer stores a trainable vector for each word ID.
# Here, we choose an embedding dimension of 8 → each word will
# be represented as an 8-dimensional vector.
emb = nn.Embedding(num_embeddings=len(vocab), embedding_dim=8)

# -------------------------------------------------------------
# 4. Pass our token IDs through the embedding layer
# -------------------------------------------------------------
# This replaces each token ID with its corresponding vector.
out = emb(x)  # Shape = (batch_size, seq_length, embedding_dim)

print("\n Embedding output shape:", out.shape)


# -------------------------------------------------------------
# 5. Inspect the embeddings
# -------------------------------------------------------------
# For clarity, print embeddings of the first sentence
print("\n Embeddings for first sentence ('i love ai'):\n", out[0])


Input token IDs:
tensor([[1, 2, 3],
        [1, 4, 0]])

 Embedding output shape: torch.Size([2, 3, 8])

 Embeddings for first sentence ('i love ai'):
 tensor([[ 0.4743,  2.3712,  0.1601, -0.7399,  0.3953, -2.1967, -0.2220, -0.7519],
        [ 0.1516, -0.8838, -0.7086, -0.2069,  0.0796, -0.3800,  0.7540,  1.1430],
        [ 0.3600, -0.4320, -0.3064, -0.6688, -0.5681, -0.1112, -0.4809, -0.9226]],
       grad_fn=<SelectBackward0>)
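
The lookup performed by the embedding layer can be checked directly. The sketch below is an illustration rather than part of the original cell: it assumes the same 5-word vocabulary and 8-dimensional embeddings, and it fixes a seed (an added assumption) so the run is repeatable. It compares the layer's output with plain row indexing into `emb.weight` and with a one-hot matrix product, which should give the same result.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)  # assumption: fixed seed, only so the run is repeatable

vocab_size, embedding_dim = 5, 8
emb = nn.Embedding(num_embeddings=vocab_size, embedding_dim=embedding_dim)

# Same token IDs as in the cell above.
x = torch.tensor([
    [1, 2, 3],   # "i love ai"
    [1, 4, 0],   # "i bugs <pad>"
])

# 1) The layer's own forward pass.
out = emb(x)

# 2) Row indexing into the weight matrix of shape (vocab_size, embedding_dim).
indexed = emb.weight[x]

# 3) One-hot encode the IDs, then multiply by the weight matrix.
one_hot = F.one_hot(x, num_classes=vocab_size).float()  # (2, 3, 5)
matmul = one_hot @ emb.weight                           # (2, 3, 8)

lookup_matches_indexing = torch.equal(out, indexed)
lookup_matches_matmul = torch.allclose(out, matmul)
print("Matches row indexing:", lookup_matches_indexing)
print("Matches one-hot matmul:", lookup_matches_matmul)
```

Seen this way, the embedding layer is a table of learnable rows: the token ID selects a row, and the values in that row are ordinary floating-point parameters updated during training.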


### Discussion Questions 

1. Why does the embedding layer output continuous values instead of integers?
2. How would increasing the embedding dimension (e.g., from 8 → 128) affect model capacity?
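
As a starting point for question 2, the sketch below counts the trainable parameters of the embedding layer at both sizes. The helper `embedding_param_count` is a hypothetical name introduced here for illustration; the vocabulary size of 5 matches the example above. Parameter count is only one rough proxy for capacity, so treat the numbers as input to the discussion rather than a full answer.

```python
import torch.nn as nn

vocab_size = 5  # same vocabulary size as the example above


def embedding_param_count(vocab_size: int, embedding_dim: int) -> int:
    """Hypothetical helper: number of trainable values in an embedding table."""
    layer = nn.Embedding(num_embeddings=vocab_size, embedding_dim=embedding_dim)
    return sum(p.numel() for p in layer.parameters())


small = embedding_param_count(vocab_size, 8)    # vocab_size * 8
large = embedding_param_count(vocab_size, 128)  # vocab_size * 128

print("Parameters with dim 8:  ", small)
print("Parameters with dim 128:", large)
```

The table grows linearly with the embedding dimension, so a larger dimension gives each word more room to encode distinctions at the cost of more parameters to train and store.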
