![embeddings](bertembeddings.png)

In [4]:
from transformers import BertTokenizer, BertModel
import torch
import torch.nn as nn

model_name = 'bert-base-uncased'
tokenizer = BertTokenizer.from_pretrained(model_name)
model = BertModel.from_pretrained(model_name)
model.embeddings

Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertModel: ['cls.predictions.bias', 'cls.predictions.transform.dense.weight', 'cls.seq_relationship.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.decoder.weight', 'cls.seq_relationship.bias']
- This IS expected if you are initializing BertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


BertEmbeddings(
  (word_embeddings): Embedding(30522, 768, padding_idx=0)
  (position_embeddings): Embedding(512, 768)
  (token_type_embeddings): Embedding(2, 768)
  (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
  (dropout): Dropout(p=0.1, inplace=False)
)

**Summary**

- BERT input embedding: a sum of **lookup table** (embedding) operations over the vocab, segment ids, and positions
    - lookup tables
        - token embeddings: 30522 (vocab size) * 768
        - segment embeddings (token type embeddings): 2 (segment 0/1) * 768
        - position embeddings: 512 (max length) * 768
    - after the lookup and sum
        - LayerNorm
        - Dropout
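As a quick sanity check on the table sizes above, the parameter count of the embedding block can be worked out by hand. This is a minimal sketch in plain Python; the sizes are copied from the `BertEmbeddings` printout, and the variable names (`param_counts`, `total_params`) are only illustrative.

```python
# Sizes read off the BertEmbeddings printout for bert-base-uncased.
vocab_size, max_positions, type_vocab_size, hidden_size = 30522, 512, 2, 768

param_counts = {
    "word_embeddings": vocab_size * hidden_size,
    "position_embeddings": max_positions * hidden_size,
    "token_type_embeddings": type_vocab_size * hidden_size,
    # LayerNorm with elementwise_affine=True holds a weight and a bias vector.
    "LayerNorm": 2 * hidden_size,
}
total_params = sum(param_counts.values())

for name, count in param_counts.items():
    print(f"{name:>22}: {count:>10,}")
print(f"{'total':>22}: {total_params:>10,}")
```

The token table dominates: almost all of the embedding parameters sit in the 30522 * 768 lookup, while the segment table is only two rows.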

In [5]:
test_sent = 'this is a test sentence'
data_input = tokenizer(test_sent, return_tensors='pt')
input_ids = data_input['input_ids']
token_type_ids = data_input['token_type_ids']

input_ids, token_type_ids

(tensor([[ 101, 2023, 2003, 1037, 3231, 6251,  102]]),
 tensor([[0, 0, 0, 0, 0, 0, 0]]))

In [8]:
pos_ids = torch.arange(input_ids.shape[1])
pos_ids

tensor([0, 1, 2, 3, 4, 5, 6])

In [11]:
model.embeddings(input_ids, token_type_ids).shape

torch.Size([1, 7, 768])

### Token embedding

In [12]:
token_embed = model.embeddings.word_embeddings(input_ids)
token_embed.shape

torch.Size([1, 7, 768])

### Token type embedding

In [13]:
token_type_embed = model.embeddings.token_type_embeddings(token_type_ids)
token_type_embed.shape

torch.Size([1, 7, 768])

### Position embedding

In [14]:
pos_embed = model.embeddings.position_embeddings(pos_ids)
pos_embed.shape

torch.Size([7, 768])

### Input embedding

In [17]:
# pos_embed is [7, 768]; unsqueeze(0) adds the batch dimension -> [1, 7, 768]
input_embed = token_embed + token_type_embed + pos_embed.unsqueeze(0)
input_embed.shape

torch.Size([1, 7, 768])
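The position embedding has no batch dimension, so the sum relies on broadcasting. The sketch below uses random tensors as stand-ins (they are not the real BERT embeddings, and the batch size of 2 is an assumption for illustration) to show that the explicit `unsqueeze(0)` and plain broadcasting give the same result, and that one set of position vectors is shared by every sequence in the batch.

```python
import torch

torch.manual_seed(0)
batch, seq_len, hidden = 2, 7, 768   # batch of 2 is assumed; seq_len and hidden match the cells above

token_part = torch.randn(batch, seq_len, hidden)  # stand-in for token_embed + token_type_embed
pos_part = torch.randn(seq_len, hidden)           # stand-in for pos_embed (no batch dimension)

explicit = token_part + pos_part.unsqueeze(0)     # [2, 7, 768] + [1, 7, 768]
implicit = token_part + pos_part                  # [2, 7, 768] + [7, 768], broadcast over batch

print(explicit.shape, torch.equal(explicit, implicit))
```

Keeping the `unsqueeze(0)` is a readability choice: it makes the intended shape explicit even though broadcasting would handle it.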

### After operation

In [18]:
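# LayerNorm normalizes each token's 768-dim vector.
embed = model.embeddings.LayerNorm(input_embed)
# from_pretrained() returns the model in eval mode, so dropout is a no-op here.
embed = model.embeddings.dropout(embed)
embed.shape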

torch.Size([1, 7, 768])
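Putting the steps together, the whole embedding block can be sketched as one small module. `TinyBertEmbeddings` below is a hypothetical, simplified stand-in written for illustration, not the `transformers` class: it uses made-up small sizes and random weights so it runs without downloading a checkpoint, and it only covers the lookup, sum, LayerNorm, and dropout steps walked through above.

```python
import torch
import torch.nn as nn


class TinyBertEmbeddings(nn.Module):
    """Hypothetical, simplified stand-in for BertEmbeddings (not the library class)."""

    def __init__(self, vocab_size=100, max_positions=16, type_vocab_size=2,
                 hidden_size=8, dropout_prob=0.1):
        super().__init__()
        self.word_embeddings = nn.Embedding(vocab_size, hidden_size, padding_idx=0)
        self.position_embeddings = nn.Embedding(max_positions, hidden_size)
        self.token_type_embeddings = nn.Embedding(type_vocab_size, hidden_size)
        self.LayerNorm = nn.LayerNorm(hidden_size, eps=1e-12)
        self.dropout = nn.Dropout(dropout_prob)

    def forward(self, input_ids, token_type_ids):
        # Positions are simply 0..seq_len-1, shared by every sequence in the batch.
        pos_ids = torch.arange(input_ids.shape[1], device=input_ids.device)
        summed = (self.word_embeddings(input_ids)
                  + self.token_type_embeddings(token_type_ids)
                  + self.position_embeddings(pos_ids).unsqueeze(0))
        return self.dropout(self.LayerNorm(summed))


torch.manual_seed(0)
tiny = TinyBertEmbeddings().eval()        # eval mode: dropout is the identity
ids = torch.tensor([[5, 23, 42, 7, 9]])   # made-up token ids
types = torch.zeros_like(ids)

with torch.no_grad():
    out = tiny(ids, types)
    # Same computation done step by step, as in the cells above.
    manual = tiny.LayerNorm(
        tiny.word_embeddings(ids)
        + tiny.token_type_embeddings(types)
        + tiny.position_embeddings(torch.arange(ids.shape[1])).unsqueeze(0)
    )

print(out.shape, torch.allclose(out, manual))
```

The same kind of check should apply to the real model: since it is in eval mode, comparing `embed` against `model.embeddings(input_ids, token_type_ids)` with `torch.allclose` is a reasonable way to confirm the manual reconstruction.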