### Create Word Embeddings

First, we convert each word in the input sequence to an embedding vector. An embedding vector is a dense, learned representation of a word that can capture more of its meaning than a bare index.

Suppose each embedding vector has **`512`** dimensions and our vocab size is **`100`**; the embedding matrix will then be of size **`100x512`**. This matrix is learned during training, and during inference each word is mapped to its corresponding **`512 d`** vector. Suppose we have a batch size of **`32`** and a sequence length of **`10`** (10 words). Then the output will be **`32x10x512`**.
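As a quick check of the shapes described above, here is a minimal sketch that uses `nn.Embedding` directly. The variable names and the random token indices are placeholders chosen for illustration, not part of any real dataset.

```python
import torch
import torch.nn as nn

vocab_size, embed_dim = 100, 512   # sizes taken from the text above
batch_size, seq_len = 32, 10

# The learnable lookup table: one 512-d row per vocabulary entry
embedding = nn.Embedding(num_embeddings=vocab_size, embedding_dim=embed_dim)

# Placeholder batch of token indices in [0, vocab_size)
token_ids = torch.randint(low=0, high=vocab_size, size=(batch_size, seq_len))

out = embedding(token_ids)

print(embedding.weight.shape)  # embedding matrix: (vocab_size, embed_dim)
print(out.shape)               # output: (batch_size, seq_len, embed_dim)
```

Each index in the `32x10` input is replaced by its `512`-dimensional row, which is why the output gains a trailing dimension.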

In [2]:
import torch
import torch.nn as nn

In [3]:
class Embedding(nn.Module):
    def __init__(self, vocab_size, embed_dim):
        """
        Args:
            vocab_size: size of vocabulary
            embed_dim: dimension of embeddings
        """
        super().__init__()
        self.embed = nn.Embedding(num_embeddings=vocab_size, embedding_dim=embed_dim) # Embedding layer

    def forward(self, x):
        """
        Args:
            x: tensor of token indices, e.g. shape (batch_size, seq_len)
        Returns:
            out: embedding vectors, shape (*x.shape, embed_dim)
        """
        out = self.embed(x) # Convert token indices to dense vectors
        return out

In [4]:
vocab_size = 10  # Assume we have 10 words in our vocabulary
embed_dim = 5    # Each word is embedded into a 5-dimensional vector
embeddings = Embedding(vocab_size, embed_dim)

In [9]:
# Create a sample input (batch of token indices)
sample_input = torch.tensor([1, 3, 7])  # Example token indices

# Get embeddings
output = embeddings(sample_input)
output

tensor([[-0.1667,  1.3110,  0.5141,  0.3984,  0.9671],
        [-0.5384,  1.1861, -0.2256,  0.5444, -1.4291],
        [ 0.9302,  0.7904, -1.4920,  1.0415, -1.1608]],
       grad_fn=<EmbeddingBackward0>)

In [10]:
output.shape

torch.Size([3, 5])
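To make the lookup concrete, the sketch below checks that passing token indices through an embedding layer gives the same result as selecting the corresponding rows of its weight matrix. It uses `nn.Embedding` directly rather than the wrapper class above, and the seed and variable names are arbitrary choices for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # arbitrary seed, only for repeatable values

embed = nn.Embedding(num_embeddings=10, embedding_dim=5)
token_ids = torch.tensor([1, 3, 7])

looked_up = embed(token_ids)     # forward pass through the layer
rows = embed.weight[token_ids]   # rows 1, 3 and 7 of the weight matrix

# Both paths select the same three rows of the 10x5 table
same = torch.equal(looked_up, rows)
print(same)
print(looked_up.shape)
```

Because the layer is a table lookup, only the rows that were actually indexed receive gradient updates during a training step.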