## Step 1: Create Dummy Embeddings
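First, we set up some dummy embeddings. We assume a vocabulary of 10 words, each represented by a 5-dimensional vector. `nn.Embedding` initializes these vectors randomly, which is all we need for a demonstration.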

In [1]:
import torch
import torch.nn as nn
import torch.nn.functional as F

# Assume we have a vocabulary of 10 words, each represented by a 5-dimensional vector
vocab_size = 10
embedding_dim = 5

# Create dummy embeddings for these words
embeddings = nn.Embedding(num_embeddings=vocab_size, embedding_dim=embedding_dim)

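Let's explore how `nn.Embedding` works in PyTorch. Passing it a tensor of word indices returns the embedding vector for each index. Because the weights are randomly initialized, the exact values will differ from run to run.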

In [2]:
indices = torch.tensor([2, 4])
embeddings(indices)

tensor([[-0.0403,  0.0077,  1.7520, -0.3653,  1.2519],
        [-0.5223,  0.2101, -1.4149,  1.2403, -0.8029]],
       grad_fn=<EmbeddingBackward0>)
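
To make the lookup concrete, the sketch below checks that calling the embedding layer gives the same result as selecting rows of its weight matrix directly. The seed is an arbitrary choice, assumed here only so that repeated runs produce the same numbers.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # arbitrary seed (an assumption), only for repeatability

embeddings = nn.Embedding(num_embeddings=10, embedding_dim=5)
indices = torch.tensor([2, 4])

looked_up = embeddings(indices)      # one 5-dimensional row per index
rows = embeddings.weight[indices]    # the same rows, indexed directly

print(looked_up.shape)               # torch.Size([2, 5])
print(torch.equal(looked_up, rows))  # True
```

In other words, the layer behaves like a trainable lookup table of shape `(vocab_size, embedding_dim)`.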

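## Step 2: Define a Simple Model
Next, we define a small feed-forward network that maps an embedding to one score (logit) per word in the vocabulary.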

In [3]:
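class SimpleLM(nn.Module):
    """Tiny feed-forward language model: embedding -> hidden layer -> vocabulary logits."""

    def __init__(self, embedding_dim, hidden_dim, vocab_size):
        super().__init__()
        self.fc = nn.Linear(embedding_dim, hidden_dim)  # embedding -> hidden
        self.out = nn.Linear(hidden_dim, vocab_size)    # hidden -> one logit per word

    def forward(self, x):
        x = F.relu(self.fc(x))
        x = self.out(x)  # raw logits; softmax is applied later
        return x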

hidden_dim = 10  # Size of hidden layer
model = SimpleLM(embedding_dim, hidden_dim, vocab_size)
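
As a quick sanity check, the following sketch runs a batch through the same architecture and counts its parameters. The batch of 4 random input vectors is hypothetical and serves only to verify shapes.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleLM(nn.Module):
    def __init__(self, embedding_dim, hidden_dim, vocab_size):
        super().__init__()
        self.fc = nn.Linear(embedding_dim, hidden_dim)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, x):
        return self.out(F.relu(self.fc(x)))

model = SimpleLM(embedding_dim=5, hidden_dim=10, vocab_size=10)

# Hypothetical batch of 4 input vectors, used only to check shapes
batch = torch.randn(4, 5)
logits = model(batch)
print(logits.shape)  # torch.Size([4, 10])

# fc: 5*10 weights + 10 biases = 60; out: 10*10 weights + 10 biases = 110
num_params = sum(p.numel() for p in model.parameters())
print(num_params)  # 170
```

The output has one row per input vector and one column per vocabulary word.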


## Step 3: Generate a Context Vector
Let's simulate a context of 3 words. We'll randomly pick these words for our demonstration.

In [4]:
context_size = 3  # Number of words in context
dummy_context = torch.randint(0, vocab_size, (context_size,))

# Get embeddings for the context words
context_embeddings = embeddings(dummy_context)
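
It can help to confirm the shapes at this point. The sketch below rebuilds the same setup on its own and prints them. Note that `torch.randint` samples from the half-open range `[low, high)`, so every sampled index is a valid row of the embedding table.

```python
import torch
import torch.nn as nn

vocab_size, embedding_dim, context_size = 10, 5, 3
embeddings = nn.Embedding(vocab_size, embedding_dim)

# Upper bound is exclusive, so indices fall in 0..vocab_size-1
dummy_context = torch.randint(0, vocab_size, (context_size,))
context_embeddings = embeddings(dummy_context)

print(dummy_context.shape)       # torch.Size([3])
print(context_embeddings.shape)  # torch.Size([3, 5])
```

Each of the 3 rows is the 5-dimensional vector for one context word.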


## Step 4: Predict the Next Token
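Finally, we pass these embeddings through the network to get a probability distribution over the next token. Since the model is untrained, expect the probabilities to be close to uniform.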

In [5]:
# Average the context embeddings into a single vector (a simple way to combine them)
avg_embedding = context_embeddings.mean(dim=0)

# Feed the averaged embedding into the network
network_output = model(avg_embedding.unsqueeze(0))  # Unsqueeze to add batch dimension

# Apply softmax to get probabilities
probabilities = F.softmax(network_output, dim=1)

# Inspect the resulting distribution
print("Predicted probabilities for the next word:", probabilities)


Predicted probabilities for the next word: tensor([[0.0979, 0.1299, 0.0722, 0.0899, 0.1158, 0.1111, 0.1016, 0.1065, 0.0753,
         0.0998]], grad_fn=<SoftmaxBackward0>)
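
To turn the probabilities into an actual token, one option is to take the most likely index (greedy decoding), and another is to sample from the distribution. The sketch below shows both. The logits are hypothetical values standing in for the network output, not results from the model above.

```python
import torch
import torch.nn.functional as F

# Hypothetical logits (batch of 1, vocabulary of 10), an assumption for illustration
logits = torch.tensor([[0.1, 0.9, -0.3, 0.0, 0.5, 0.4, 0.2, 0.3, -0.2, 0.1]])
probabilities = F.softmax(logits, dim=1)

# Softmax output sums to 1 along the vocabulary dimension
print(torch.allclose(probabilities.sum(dim=1), torch.tensor([1.0])))  # True

# Greedy decoding: pick the index with the highest probability
greedy_token = probabilities.argmax(dim=1)
print(greedy_token.item())  # 1

# Sampling: draw one index at random according to the probabilities
sampled_token = torch.multinomial(probabilities, num_samples=1)
print(sampled_token.shape)  # torch.Size([1, 1])
```

Greedy decoding is deterministic, while sampling can return a different token on each call.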

