# WTF Does nn.Embedding Do?

### Concept:
nn.Embedding is a PyTorch layer that maps categorical data (such as words or characters), encoded as integer indices of dtype torch.long (torch.int is also accepted), to dense floating-point vectors (torch.float32 by default). It is essentially a lookup table from integer indices (representing discrete categories) to continuous vectors of a fixed size.

### How It Works:
- Lookup Table: The embedding layer maintains a weight matrix of size (num_embeddings, embedding_dim).
- Indexing: When you pass an index through this layer, it retrieves the corresponding row from the matrix, which is the embedding vector for that index.
- Training: These vectors are initialized randomly and are updated during training via backpropagation, just like other parameters in the model.
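
The lookup behaviour described above can be checked directly. This is a minimal sketch; the sizes (6 categories, 4 dimensions) and the name `table` are made up for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # arbitrary seed so the run is repeatable

# Hypothetical sizes: 6 categories, 4-dimensional vectors
table = nn.Embedding(num_embeddings=6, embedding_dim=4)
print(table.weight.shape)  # torch.Size([6, 4])

# Looking up indices 2 and 5 returns rows 2 and 5 of the weight matrix
indices = torch.tensor([2, 5], dtype=torch.long)
looked_up = table(indices)
rows = table.weight[indices]
print(torch.equal(looked_up, rows))

# The weight matrix is an ordinary trainable parameter
print(table.weight.requires_grad)
```

Indexing `table.weight` by hand and calling the layer give the same tensor, which is all "lookup table" means here.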

If you don't specify an initialization method, the embedding weights are drawn from a normal distribution with mean 0 and standard deviation 1.
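
A quick empirical check of the default initialization, as a rough sketch. The table size (1000 x 64) is an arbitrary choice made large enough for the sample statistics to be stable, and the uniform re-initialization at the end is just one possible alternative, not a recommendation.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # arbitrary seed so the run is repeatable

# Hypothetical large table so the sample statistics are meaningful
big_table = nn.Embedding(1000, 64)
weights = big_table.weight.detach()

mean = weights.mean().item()
std = weights.std().item()
print(f"mean={mean:.3f}, std={std:.3f}")  # should be close to 0 and 1

# Overriding the default: re-initialize in place with a small uniform range
nn.init.uniform_(big_table.weight, -0.1, 0.1)
max_abs = big_table.weight.detach().abs().max().item()
print(f"max |w| after uniform init: {max_abs:.3f}")
```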

### Differentiability:
Although it seems like a simple lookup table, nn.Embedding is differentiable and trainable.
The embedding layer can be thought of as multiplying a one-hot encoded vector by the weight matrix, effectively selecting a specific row.
During training, gradients are computed for the embeddings, allowing them to be refined based on the model’s loss function.
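
The one-hot view and the gradient flow can both be verified in a few lines. This is a sketch under assumed sizes (5 categories, 3 dimensions); the sum used as a loss is a stand-in for a real loss function.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)  # arbitrary seed so the run is repeatable

emb = nn.Embedding(5, 3)          # hypothetical sizes
idx = torch.tensor([1, 3])

# One-hot rows times the weight matrix select rows 1 and 3
one_hot = F.one_hot(idx, num_classes=5).float()
via_matmul = one_hot @ emb.weight
via_lookup = emb(idx)
same = torch.allclose(via_matmul, via_lookup)
print(same)

# Backpropagate a toy loss; only the looked-up rows receive gradient
loss = via_lookup.sum()
loss.backward()
rows_with_grad = emb.weight.grad.abs().sum(dim=1).nonzero().flatten().tolist()
print(rows_with_grad)  # [1, 3]
```

Rows that were never looked up get a zero gradient, so an optimizer step leaves them unchanged (ignoring effects such as weight decay).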

### How are these embeddings different from Sentence-BERT embeddings?:
nn.Embedding produces static embeddings: each token index maps to the same vector no matter where it appears (the vector changes only when training updates it). Models like Sentence-BERT instead produce dynamic, context-dependent embeddings. In these models, the representation of a token can vary based on the tokens surrounding it, enabling a more nuanced understanding of language.

### Contextual Understanding:
Dynamic embeddings consider the context in which a token appears, allowing its representation to adapt and capture variations in meaning. This contextual understanding enables the model to handle polysemy—words with multiple meanings—more effectively, as the embedding of a token can change depending on its neighboring tokens.
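
To make the static side of this contrast concrete, here is a small sketch showing that nn.Embedding returns the same vector for a token regardless of its neighbours. The four-word vocabulary is hypothetical and not taken from any real tokenizer; a contextual model would generally give different vectors for the two occurrences of "bank".

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # arbitrary seed so the run is repeatable

# Hypothetical toy vocabulary, for illustration only
vocab = {"river": 0, "bank": 1, "money": 2, "the": 3}
static_emb = nn.Embedding(len(vocab), 8)

# "bank" appears in two different contexts
sentence_a = torch.tensor([[vocab["the"], vocab["river"], vocab["bank"]]])
sentence_b = torch.tensor([[vocab["the"], vocab["money"], vocab["bank"]]])

bank_in_a = static_emb(sentence_a)[0, 2]
bank_in_b = static_emb(sentence_b)[0, 2]

# Identical vectors: the lookup ignores the surrounding tokens
print(torch.equal(bank_in_a, bank_in_b))
```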

# References:
- https://pytorch.org/docs/stable/generated/torch.nn.Embedding.html

In [1]:
import string
import torch
import torch.nn as nn

In [2]:
# There are 100 possible characters
chars = list(string.printable)
n_chars = len(chars)
tokenizer = dict(zip(chars, range(len(chars))))
detokenizer = dict(zip(range(len(chars)), chars))

In [3]:
input_string = "WTF does nn.Embedding do?"
# Shape: (batch_size, seq_len)
input_tensor = torch.tensor([[tokenizer[c] for c in input_string]], dtype=torch.long)
print(f"Input Tensor Shape: {input_tensor.shape}")

seq_len = input_tensor.shape[1]
# Dimension of the embedding space
embed_dim = 10

Input Tensor Shape: torch.Size([1, 25])


In [4]:
embedding = nn.Embedding(n_chars, embed_dim)
output_embedding = embedding(input_tensor)
# Shape: (batch_size, seq_len, embed_dim)
print(f"Output Tensor Shape: {output_embedding.shape}")

Output Tensor Shape: torch.Size([1, 25, 10])


In [11]:
a_tokenized = torch.tensor([[tokenizer['a']]], dtype=torch.long)
a_embedded = embedding(a_tokenized)
print(f"Embedding for 'a': {a_embedded}")

Embedding for 'a': tensor([[[-0.1633,  0.2602, -1.3810, -0.9100, -1.3273,  0.0258,  1.2236,
           0.0988,  1.5362, -0.0035]]], grad_fn=<EmbeddingBackward0>)


# nn.Embedding vs nn.Linear

Title: Exploring the Differences Between nn.Linear and nn.Embedding in Neural Networks

Hey, Reddit fam!

I've been diving deep into neural network architectures lately, and I stumbled upon an interesting question: What sets nn.Linear and nn.Embedding apart, and can one replace the other? Let's unpack this together.

Understanding the Basics
nn.Linear: This workhorse layer applies an affine transformation, y = xW^T + b. It takes a vector of continuous features and maps it to a new vector through matrix multiplication plus an optional bias.

nn.Embedding: On the other hand, nn.Embedding is the go-to for handling categorical data, such as words or characters in natural language processing tasks. It transforms discrete categories into dense vectors, known as embeddings, by mapping integer indices to continuous vectors of fixed size.

The Divergence Point: Dense Features vs. Integer Indices
Here's where it gets interesting: both layers hold a weight matrix and both output continuous vectors, so the real difference is the input they expect. nn.Linear takes a vector of floats and multiplies it by its weight matrix (plus an optional bias). nn.Embedding takes integer indices and simply looks up the matching rows of its weight matrix. Neither layer knows anything about context or token order: each position in a sequence is processed independently.

Can nn.Linear Replace nn.Embedding?
Now, for the million-dollar question: Can nn.Linear replace nn.Embedding? Mathematically, yes. Feed a one-hot vector into a bias-free nn.Linear and you get exactly the vector that an nn.Embedding holding the transposed weights would look up for that index. In practice you wouldn't want to: building one-hot vectors and running a full matrix multiplication over a large vocabulary wastes memory and compute, while nn.Embedding jumps straight to the row it needs. So for categorical inputs (think tokens in natural language processing), nn.Embedding is the right tool.

Conclusion: Embracing Diversity in Neural Networks
In the dynamic world of neural networks, diversity is key. Both nn.Linear and nn.Embedding play vital roles in shaping the architecture of our models, each bringing its unique strengths to the table. So, instead of pitting them against each other, let's celebrate their diversity and leverage them wisely to unlock the full potential of our neural networks.

What are your thoughts on nn.Linear vs. nn.Embedding? Have you encountered any interesting use cases for either? Let's keep the conversation going in the comments below!

In [50]:
# Three samples with four binary features each; one row is one sample in the training set
x = torch.tensor([
    [1, 0, 1, 0],
    [0, 0, 1, 1],
    [1, 1, 1, 0],
    ], dtype=torch.float32)

rows = torch.tensor([[0, 1, 2]], dtype=torch.long)

In [31]:
w_linear = nn.Linear(4, 3, bias = False)
w_linear.weight

Parameter containing:
tensor([[-0.0976, -0.2872, -0.1632,  0.0354],
        [ 0.2979,  0.3021,  0.4652,  0.1204],
        [-0.3786,  0.2526, -0.3794, -0.0699]], requires_grad=True)

In [52]:
print(w_linear(x[0,:]))

tensor([-0.2608,  0.7631, -0.7580], grad_fn=<SqueezeBackward4>)


In [53]:
# from_pretrained is a classmethod, so call it on the class; it freezes the weights by default
w_embedding = nn.Embedding.from_pretrained(w_linear.weight.detach().t())
w_embedding.weight

Parameter containing:
tensor([[-0.0976,  0.2979, -0.3786],
        [-0.2872,  0.3021,  0.2526],
        [-0.1632,  0.4652, -0.3794],
        [ 0.0354,  0.1204, -0.0699]])

In [54]:
# Indices from the first row (non-zero entries)
row_indices = torch.tensor([0, 2], dtype=torch.long)

# Pass indices through embedding layer and sum the embeddings
embedding_output = w_embedding(row_indices).sum(dim=0)
print(embedding_output)

tensor([-0.2608,  0.7631, -0.7580])
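
The single-row check above extends to every row of `x`. The sketch below rebuilds the setup from scratch (so the random weights differ from the outputs shown earlier) and relies on the assumption that `x` is binary: only then is "sum the embeddings of the non-zero positions" the same as the matrix product.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # arbitrary seed so the run is repeatable

# Same binary (multi-hot) samples as in the cells above
x = torch.tensor([
    [1, 0, 1, 0],
    [0, 0, 1, 1],
    [1, 1, 1, 0],
], dtype=torch.float32)

linear = nn.Linear(4, 3, bias=False)
# Embedding table holding the transposed linear weights (frozen by default)
table = nn.Embedding.from_pretrained(linear.weight.detach().t())

# Path 1: plain matrix multiplication
linear_out = linear(x).detach()

# Path 2: for each row, look up the embeddings of its non-zero positions and sum them
summed = torch.stack([table(row.nonzero().flatten()).sum(dim=0) for row in x])

all_match = torch.allclose(linear_out, summed, atol=1e-6)
print(all_match)
```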
