## Embedding Layer Understanding

### Embedding Layer Setup

- **Define an Embedding Layer**:
  - `num_embeddings = 6`: Vocabulary size (number of unique tokens)
  - `embedding_dim = 4`: Dimension of each embedding vector
  - The embedding matrix (denoted as `E`) will have a shape of `[6, 4]`.

- **Embedding Matrix `E`**:
```
E = | 0.1  0.2  0.3  0.4 |   # Embedding for token 0
    | 0.5  0.6  0.7  0.8 |   # Embedding for token 1
    | 0.9  1.0  1.1  1.2 |   # Embedding for token 2
    | 1.3  1.4  1.5  1.6 |   # Embedding for token 3
    | 1.7  1.8  1.9  2.0 |   # Embedding for token 4
    | 2.1  2.2  2.3  2.4 |   # Embedding for token 5
```


*In PyTorch, we can create a learnable embedding matrix as follows:*
```python
from torch.nn import Embedding

emb_layers = Embedding(num_embeddings=6, embedding_dim=4)  # num_embeddings equals the vocabulary size
```

- Shape of `E`: `[6, 4]`
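
As a quick check on the shape claim, the sketch below inspects the layer's `weight` parameter, which is where PyTorch stores the embedding matrix. The variable name `emb_layer` is chosen here for illustration, and the actual values differ on every run because the weights are randomly initialized.

```python
import torch
from torch.nn import Embedding

# Same sizes as the example above
emb_layer = Embedding(num_embeddings=6, embedding_dim=4)

# The embedding matrix E is stored as the layer's `weight` parameter
print(emb_layer.weight.shape)          # torch.Size([6, 4])
print(emb_layer.weight.requires_grad)  # True: the matrix is updated during training
```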

### Sample Input and Embedding Lookup

- **Sample Input**:
  - `input_indices = [[1, 4, 3], [2, 0, 5]]`
  - Shape of `input_indices`: `[2, 3]` (2 sequences, each with 3 tokens)

- **Performing the Lookup**:
  - Each index in `input_indices` is mapped to its corresponding embedding vector (row) in `E`.

- **Output**:
  - Each index is replaced by its corresponding 4-dimensional embedding vector.
  - Output for `input_indices[0] = [1, 4, 3]`:
    ```
    | 0.5  0.6  0.7  0.8 |   # Embedding for token 1
    | 1.7  1.8  1.9  2.0 |   # Embedding for token 4
    | 1.3  1.4  1.5  1.6 |   # Embedding for token 3
    ```
  - Output for `input_indices[1] = [2, 0, 5]`:
    ```
    | 0.9  1.0  1.1  1.2 |   # Embedding for token 2
    | 0.1  0.2  0.3  0.4 |   # Embedding for token 0
    | 2.1  2.2  2.3  2.4 |   # Embedding for token 5
    ```
  - Shape of Output: `[2, 3, 4]` (2 sequences, each with 3 tokens, each represented by a 4-dimensional vector)
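
The lookup above can be reproduced with the exact matrix `E` from this example. This is a minimal sketch that loads `E` through `Embedding.from_pretrained` (which freezes the weights by default) instead of learning it; the variable names `emb` and `output` are illustrative.

```python
import torch
from torch.nn import Embedding

# The example matrix E, one row per token
E = torch.tensor([
    [0.1, 0.2, 0.3, 0.4],  # token 0
    [0.5, 0.6, 0.7, 0.8],  # token 1
    [0.9, 1.0, 1.1, 1.2],  # token 2
    [1.3, 1.4, 1.5, 1.6],  # token 3
    [1.7, 1.8, 1.9, 2.0],  # token 4
    [2.1, 2.2, 2.3, 2.4],  # token 5
])

# Wrap the fixed matrix in an embedding layer
emb = Embedding.from_pretrained(E)

input_indices = torch.tensor([[1, 4, 3], [2, 0, 5]])
output = emb(input_indices)

print(output.shape)  # torch.Size([2, 3, 4])
print(output[0])     # rows 1, 4 and 3 of E, in that order

# The lookup is equivalent to plain row indexing into E
print(torch.equal(output, E[input_indices]))  # True
```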

### Summary

- The embedding matrix `E` is shaped `[num_embeddings, embedding_dim]`, `[6, 4]` in this example.
- The input indices are shaped `[batch_size, sequence_length]`, `[2, 3]` here.
- The output of the embedding layer is shaped `[batch_size, sequence_length, embedding_dim]`, `[2, 3, 4]` in this case, with each token index replaced by its corresponding embedding vector.
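
The shape rule in this summary holds for any batch size and sequence length, not only `[2, 3]`. The sketch below checks it for a few arbitrary sizes; the sizes and variable names are chosen purely for illustration.

```python
import torch
from torch.nn import Embedding

num_embeddings, embedding_dim = 6, 4
emb = Embedding(num_embeddings, embedding_dim)

for batch_size, sequence_length in [(2, 3), (1, 7), (5, 1)]:
    # Random valid token ids in the range [0, num_embeddings)
    indices = torch.randint(0, num_embeddings, (batch_size, sequence_length))
    out = emb(indices)
    # The output keeps the input shape and appends embedding_dim
    print(tuple(indices.shape), "->", tuple(out.shape))
```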


```python
import torch

# Create a simple tensor: 2 sentences, each 3 tokens long
input_example = torch.LongTensor([[1, 3, 4], [2, 6, 8]])
print('type: ', input_example.dtype)
print('shape:  ', input_example.shape)
```

```
type:  torch.int64
shape:   torch.Size([2, 3])
```
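
Note that this sample contains the indices 6 and 8, so it would not be valid for the 6-token layer from the first section: every index must be smaller than `num_embeddings`. A minimal sketch of that failure, assuming CPU tensors; the exact exception type and message may vary across PyTorch versions, so both `IndexError` and `RuntimeError` are caught. The names `small_layer` and `big_layer` are illustrative.

```python
import torch

small_layer = torch.nn.Embedding(num_embeddings=6, embedding_dim=4)  # valid ids: 0..5
input_example = torch.LongTensor([[1, 3, 4], [2, 6, 8]])             # contains 6 and 8

try:
    small_layer(input_example)
    lookup_failed = False
except (IndexError, RuntimeError) as err:
    lookup_failed = True
    print("lookup failed:", err)

# A sufficiently large vocabulary accepts the same input
big_layer = torch.nn.Embedding(num_embeddings=100, embedding_dim=5)
print(big_layer(input_example).shape)  # torch.Size([2, 3, 5])
```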


```python
vocab_size = 100
emb_dim = 5

emb_layer = torch.nn.Embedding(num_embeddings=vocab_size, embedding_dim=emb_dim)

# Look up the embedding vector for every token index
embedded_inp = emb_layer(input_example)

print('embedded_inp:  ', embedded_inp)
print("\nembedded_inp.shape: ", embedded_inp.shape)
```

```
embedded_inp:   tensor([[[ 0.3479,  1.6453, -0.5427, -1.1865, -0.6854],
         [ 0.3126, -0.1501, -0.3753,  1.3835,  0.8567],
         [ 1.0775,  0.9205, -0.1192,  0.0075, -0.4301]],

        [[ 0.3406,  0.0350,  0.2793, -0.4562,  0.6643],
         [-0.3303,  2.1596,  1.2740,  0.6125, -0.8145],
         [ 1.3573, -0.7365,  0.5432,  1.0916, -0.1357]]],
       grad_fn=<EmbeddingBackward0>)

embedded_inp.shape:  torch.Size([2, 3, 5])
```

```python
# Average over the token dimension (dim=1): one vector per sentence
embedded_inp.mean(dim=1).shape
```

```
torch.Size([2, 5])
```
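
Averaging over `dim=1` collapses the token axis and leaves one vector per sequence, a simple form of mean pooling. The sketch below uses a freshly initialized layer and illustrative variable names (`embedded`, `pooled`, `manual`) to check that result against a manual average of the three token vectors.

```python
import torch

emb_layer = torch.nn.Embedding(num_embeddings=100, embedding_dim=5)
input_example = torch.LongTensor([[1, 3, 4], [2, 6, 8]])

embedded = emb_layer(input_example)  # shape [2, 3, 5]
pooled = embedded.mean(dim=1)        # average over the 3 tokens, shape [2, 5]

# Manual average of the three token vectors in the first sequence
manual = (embedded[0, 0] + embedded[0, 1] + embedded[0, 2]) / 3

print(pooled.shape)                                  # torch.Size([2, 5])
print(torch.allclose(pooled[0], manual, atol=1e-6))  # True
```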