# Embedding Layers
---

*COSCI 223 - Machine Learning 3*

*Prepared by Sebastian C. Ibañez*

<a href="https://colab.research.google.com/github/aim-msds/msds2023-ml3/blob/main/notebooks/rnn/01-simple-rnn.ipynb"><img src="https://colab.research.google.com/assets/colab-badge.svg" style="float: left;"></a><br>

In [1]:
import torch

## What does an embedding layer do?

---

An **embedding layer** is a neural network layer that maps discrete categorical inputs (such as words) to dense, low-dimensional vectors that capture some of the meaning or structure of the input.

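For example, if we have a vocabulary of 2000 words, each word can be represented by a one-hot vector of length 2000, where only one element is 1 and the rest are 0. This representation is sparse and inefficient: each word stores 1999 zeros, and every pair of distinct words is equally far apart, so the vectors carry no information about similarity.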
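To make the sparsity concrete, here is a minimal sketch in plain Python. The vocabulary size matches the example above; the word index `42` is an arbitrary assumption chosen only for illustration.

```python
# Hypothetical setup: a 2000-word vocabulary and an arbitrary word index.
vocab_size = 2000
word_index = 42  # assumed position of some word in the vocabulary

# One-hot encoding: a single 1 at the word's index, 0 everywhere else.
one_hot = [0] * vocab_size
one_hot[word_index] = 1

nonzero = sum(one_hot)
print(f"length: {len(one_hot)}, nonzero entries: {nonzero}")
print(f"fraction of zeros: {(vocab_size - nonzero) / vocab_size:.4f}")  # 0.9995
```

Almost every entry is zero, so nearly all of the storage and computation spent on this vector is wasted.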

An embedding layer instead maps each word to a dense vector of a much smaller, freely chosen length (say 50) whose entries are real numbers.

These values are <u>learned</u> during the training process and can reflect how similar or different words are in terms of semantics or syntax. 
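As a rough illustration of what "learned" means here, the sketch below runs a single gradient step on a toy embedding layer. The sizes, the learning rate, the all-zeros target, and the squared-error loss are assumptions picked for the demo, not part of any particular model.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy sizes chosen for illustration only.
embedding = nn.Embedding(num_embeddings=6, embedding_dim=2)
optimizer = torch.optim.SGD(embedding.parameters(), lr=0.1)

# Snapshot of the table before the update
before = embedding.weight.detach().clone()

# Look up token 2 and pull its vector toward an arbitrary target of zeros.
token = torch.tensor([2])
target = torch.zeros(1, 2)
loss = ((embedding(token) - target) ** 2).mean()

optimizer.zero_grad()
loss.backward()
optimizer.step()

# Compare the table row by row against the snapshot
after = embedding.weight.detach()
changed_rows = (before != after).any(dim=1)
print(changed_rows)  # only row 2 is True
```

Only the row belonging to the token that was looked up receives a nonzero gradient, so plain SGD leaves every other row untouched.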

In [2]:
vocab = ['a', 'b', 'c', 'd', 'e', 'f']
vocab

['a', 'b', 'c', 'd', 'e', 'f']

In [3]:
# Tokens must be converted into integers
vocab.index('c')

2

In [4]:
# We place it in a torch tensor
torch.tensor([vocab.index('c')])

tensor([2])
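Calling `vocab.index` scans the list on every lookup, which is fine for six tokens but slow for a real vocabulary. One common alternative, sketched here, is to build a token-to-index dictionary once. The names `token_to_index` and `encode` are hypothetical helpers introduced for this example.

```python
vocab = ['a', 'b', 'c', 'd', 'e', 'f']

# Build the token -> index mapping once; dict lookups avoid
# the linear scan performed by list.index.
token_to_index = {token: i for i, token in enumerate(vocab)}

def encode(tokens):
    """Convert a sequence of tokens into their integer indices."""
    return [token_to_index[t] for t in tokens]

indices = encode("cafe")
print(indices)  # [2, 0, 5, 4]
```

The resulting list can be wrapped in `torch.tensor(...)` in the same way as the single index above.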

In [5]:
import torch.nn as nn

torch.manual_seed(0)

# Create an embedding layer
embedding_layer = nn.Embedding(num_embeddings=len(vocab), embedding_dim=2)

for c in vocab:
    token = torch.tensor([vocab.index(c)])
    print(embedding_layer(token).detach())

tensor([[ 1.5410, -0.2934]])
tensor([[-2.1788,  0.5684]])
tensor([[-1.0845, -1.3986]])
tensor([[0.4033, 0.8380]])
tensor([[-0.7193, -0.4033]])
tensor([[-0.5966,  0.1820]])


In [6]:
# Embedding layers are implemented as learnable lookup tables
for p in embedding_layer.parameters():
    print(p)
    print(p.shape)

Parameter containing:
tensor([[ 1.5410, -0.2934],
        [-2.1788,  0.5684],
        [-1.0845, -1.3986],
        [ 0.4033,  0.8380],
        [-0.7193, -0.4033],
        [-0.5966,  0.1820]], requires_grad=True)
torch.Size([6, 2])
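One way to check the lookup-table view is to index the weight matrix directly and compare the result with the layer's output. This is a small self-contained sketch; it builds its own layer named `embedding` with the same sizes and seed as the cell above.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
embedding = nn.Embedding(num_embeddings=6, embedding_dim=2)

index = 2
via_layer = embedding(torch.tensor([index]))  # shape (1, 2)
via_table = embedding.weight[index]           # shape (2,)

# The layer's output is simply the corresponding row of the table.
print(torch.equal(via_layer.squeeze(0), via_table))  # True
```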


In [7]:
# The weight matrix is 6x2, so its transpose is 2x6.
# (2x6 transposed weights) @ (6x1 one-hot column for token 0) -> 2x1, transposed back to 1x2
(embedding_layer.weight.T @ torch.tensor([[1., 0., 0., 0., 0., 0.]]).T).T

tensor([[ 1.5410, -0.2934]], grad_fn=<PermuteBackward0>)
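The same equivalence can be checked for several tokens at once. The sketch below, under the same toy sizes, compares the direct lookup with a one-hot matrix product; the batch of indices `[2, 0, 5]` is an arbitrary choice.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
embedding = nn.Embedding(num_embeddings=6, embedding_dim=2)

indices = torch.tensor([2, 0, 5])

# Route 1: direct table lookup
looked_up = embedding(indices)                        # shape (3, 2)

# Route 2: one-hot rows multiplied by the weight matrix
one_hot = F.one_hot(indices, num_classes=6).float()   # shape (3, 6)
multiplied = one_hot @ embedding.weight               # shape (3, 2)

print(torch.allclose(looked_up, multiplied))  # True
```

Both routes give the same vectors, but the lookup never materializes the one-hot matrix, which is why it is preferred for large vocabularies.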

## References

---

1. https://discuss.pytorch.org/t/what-is-nn-embedding-exactly-doing/12521

2. https://discuss.pytorch.org/t/how-does-nn-embedding-work/88518