# Transformer

In [2]:
import torch
import torch.nn as nn 
import math

In [3]:
class InputEmbeddings(nn.Module):
    """Token embedding layer whose output is scaled by sqrt(d_model)."""

    def __init__(self, d_model: int, vocab_size: int) -> None:
        super().__init__()
        self.d_model = d_model  # dimension of the model, i.e. the size of each token embedding
        self.vocab_size = vocab_size  # number of entries in the vocabulary (for example, GPT-3 uses a vocabulary of 50,257 tokens)
        self.embedding = nn.Embedding(vocab_size, d_model)  # lookup table with vocab_size rows, each a d_model-dimensional vector

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x holds integer token indices, e.g. shape (batch, seq_len); the output has shape (batch, seq_len, d_model).
        # The embeddings are multiplied by sqrt(d_model), as described in the "Attention Is All You Need" paper.
        return self.embedding(x) * math.sqrt(self.d_model)

This piece of code defines a class named InputEmbeddings that inherits from nn.Module. nn.Module is the base class for all neural network modules in PyTorch, a popular deep learning library. This class, InputEmbeddings, represents an embedding layer in a neural network model.

## Explanation of the Code

Let's break down the methods present in the class:

__init__(self, d_model: int, vocab_size: int) -> None:

This is the constructor method for the class which initializes the instance. It takes three arguments:

- self, which represents the instance of the class.
- d_model, which represents the dimension of the model (i.e., the size of the word embeddings).
- vocab_size, which represents the size of the vocabulary.

In the body of the constructor, it calls the constructor of the parent class (nn.Module) with super().__init__(), stores the provided d_model and vocab_size in instance variables, and then creates an embedding layer using PyTorch's nn.Embedding. nn.Embedding is a simple lookup table that stores embeddings of a fixed dictionary and size: its weight is a matrix with vocab_size rows and d_model columns. The input to the module is a tensor of integer indices, and the output contains the corresponding embedding vector for each index.
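
As a rough illustration of the lookup-table behaviour described above, the sketch below builds a small nn.Embedding and checks that calling it returns the same rows as indexing its weight matrix directly. The sizes (vocab_size = 10, d_model = 4) and the token indices are made-up values for illustration, not taken from the notebook.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # only so repeated runs draw the same random weights

# Toy sizes chosen for illustration (assumptions, not from the notebook).
vocab_size, d_model = 10, 4
embedding = nn.Embedding(vocab_size, d_model)

# A batch of 2 sequences, each holding 3 token indices in [0, vocab_size).
token_ids = torch.tensor([[1, 2, 3], [4, 5, 1]])

vectors = embedding(token_ids)
print(vectors.shape)  # torch.Size([2, 3, 4])

# Calling the layer is equivalent to selecting rows of its weight matrix.
same_as_indexing = torch.equal(vectors, embedding.weight[token_ids])
print(same_as_indexing)  # True
```

Note that index 1 appears twice in token_ids, so the same row of the weight matrix is returned in both positions.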

forward(self, x):

This method defines the forward pass of the embedding layer, that is, how the module processes its input x. It looks up the embedding for each token index in x and multiplies the result by the square root of d_model. The "Attention Is All You Need" paper specifies this multiplication for its embedding layers without a detailed justification; a common interpretation is that it keeps the embedding values from being small relative to the positional encodings that are added to them afterwards.
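
To make the forward pass concrete, here is a minimal usage sketch. It repeats the class definition so that it runs on its own, and the sizes are assumptions chosen for illustration: d_model = 512 matches the base model in the paper, while vocab_size = 1000 and the batch shape are arbitrary.

```python
import math

import torch
import torch.nn as nn


class InputEmbeddings(nn.Module):
    # Same layer as in the cell above, repeated so this snippet is self-contained.
    def __init__(self, d_model: int, vocab_size: int) -> None:
        super().__init__()
        self.d_model = d_model
        self.vocab_size = vocab_size
        self.embedding = nn.Embedding(vocab_size, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.embedding(x) * math.sqrt(self.d_model)


# Illustrative sizes (assumptions, not from the notebook).
d_model, vocab_size = 512, 1000
layer = InputEmbeddings(d_model, vocab_size)

# Random token indices with shape (batch=2, seq_len=6).
token_ids = torch.randint(0, vocab_size, (2, 6))

out = layer(token_ids)
print(out.shape)  # torch.Size([2, 6, 512])

# The output is the plain embedding lookup multiplied by sqrt(d_model).
unscaled = layer.embedding(token_ids)
scale = math.sqrt(d_model)
print(torch.allclose(out, unscaled * scale))  # True
```

The layer adds one d_model-dimensional axis to whatever index tensor it receives, so a (batch, seq_len) input becomes a (batch, seq_len, d_model) output.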

## Key Points

- InputEmbeddings is a class that defines an embedding layer in a neural network.
- It inherits from the nn.Module class, the base class for all neural network modules in PyTorch.
- The nn.Embedding layer is a simple lookup table that stores embeddings of a fixed dictionary and size.
- The forward method defines how the module processes input data.
- The embeddings are scaled by multiplying with the square root of d_model, as described in the "Attention Is All You Need" paper.