<a href="https://colab.research.google.com/github/avdlaan/Angular-10-Templates/blob/master/4_03_Extracting_embeddings_with_ALBERT_checkpoint.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Extracting embeddings with ALBERT
With Hugging Face transformers, we can use the ALBERT model just as we used BERT. Let's explore this with a small example. Suppose we need to get the contextual word embedding of every word in the sentence "Paris is a beautiful city". Let's see how to do that with ALBERT.

Install the required packages, then import the necessary modules:

In [1]:
!pip install transformers==3.5.1
!pip install -U torchtext==0.8.0
!pip install -q torch==1.4.0 -f https://download.pytorch.org/whl/cu101/torch_stable.html
!pip install datasets


In [2]:
from transformers import AlbertTokenizer, AlbertModel


Download and load the pre-trained ALBERT model and tokenizer. In this tutorial, we use the ALBERT-base model:


In [3]:
model = AlbertModel.from_pretrained('albert-base-v2')
tokenizer = AlbertTokenizer.from_pretrained('albert-base-v2')



Now, feed the sentence to the tokenizer and get the preprocessed input: 

In [4]:
sentence = "Paris is a beautiful city" 
inputs = tokenizer(sentence, return_tensors="pt")


Let's print the inputs:

In [5]:
print(inputs)

{'input_ids': tensor([[   2, 1162,   25,   21, 1632,  136,    3]]), 'token_type_ids': tensor([[0, 0, 0, 0, 0, 0, 0]]), 'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1]])}
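
To make the printed dictionary easier to read, here is a minimal sketch that lines up each id with the token it stands for. It uses only the values printed above, so it runs without the tokenizer. It assumes that ids 2 and 3 are the [CLS] and [SEP] tokens and that each word maps to exactly one token, which is consistent with the seven ids we see. The names labels, single_segment and no_padding are illustrative only.

```python
# Values copied from the printed tokenizer output above.
input_ids = [2, 1162, 25, 21, 1632, 136, 3]
token_type_ids = [0, 0, 0, 0, 0, 0, 0]
attention_mask = [1, 1, 1, 1, 1, 1, 1]

sentence = "Paris is a beautiful city"

# Assumption: one token per word, plus [CLS] at the start and [SEP] at the end.
labels = ["[CLS]"] + sentence.split() + ["[SEP]"]
assert len(labels) == len(input_ids)

# Print position, label and id side by side.
for position, (label, token_id) in enumerate(zip(labels, input_ids)):
    print(position, label, token_id)

# Every token belongs to segment 0 and none of them is padding.
single_segment = all(t == 0 for t in token_type_ids)
no_padding = all(m == 1 for m in attention_mask)
print(single_segment, no_padding)
```

To see the actual subword pieces instead of these hand-written labels, we could call tokenizer.convert_ids_to_tokens(inputs['input_ids'][0]) on the loaded tokenizer.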



Now we feed the inputs to the model and get the result. With the transformers 3.5.1 version installed above, the model returns a tuple of two items: hidden_rep, which contains the hidden state representation of all the tokens from the final encoder layer, and cls_head, which contains the final-layer hidden state of the [CLS] token after it has passed through the model's pooling layer (a linear layer followed by a tanh activation):


In [6]:
hidden_rep, cls_head = model(**inputs)
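
The tuple unpacking above relies on the model returning a plain tuple, which is the default in the transformers 3.5.1 release pinned in this notebook. In newer releases the default output is a dict-like object with named fields, and unpacking it directly yields the field names rather than the tensors. The following is a minimal sketch of a helper that handles both cases. split_albert_outputs is a hypothetical name, not part of the library, and the stand-in values are plain strings so that the sketch runs without downloading the model.

```python
def split_albert_outputs(outputs):
    """Return (hidden_rep, cls_head) from a tuple or a dict-like model output.

    Hypothetical helper for illustration; it is not part of transformers.
    """
    if isinstance(outputs, tuple):
        # Tuple output: first item is the per-token states, second is the pooled output.
        return outputs[0], outputs[1]
    # Dict-like output: look the two fields up by name.
    return outputs["last_hidden_state"], outputs["pooler_output"]


# Stand-in values so the sketch runs without the real model.
as_tuple = ("hidden", "pooled")
as_mapping = {"last_hidden_state": "hidden", "pooler_output": "pooled"}

print(split_albert_outputs(as_tuple))
print(split_albert_outputs(as_mapping))
```

With the real model, this would be used as hidden_rep, cls_head = split_albert_outputs(model(**inputs)). Another option is to request a tuple explicitly with model(**inputs, return_dict=False).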



We can obtain the contextual embedding of each token in the sentence, just as we did with BERT:

- hidden_rep[0][0] contains the contextual embedding of the token [CLS]
- hidden_rep[0][1] contains the contextual embedding of the token 'Paris' 
- hidden_rep[0][2] contains the contextual embedding of the token 'is' 

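Continuing in this manner, hidden_rep[0][5] contains the contextual embedding of the token 'city', and hidden_rep[0][6] contains the embedding of the final [SEP] token. The sequence has seven tokens, so the valid positions run from 0 to 6.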
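
The indexing pattern can be sketched as a small lookup from token to vector. The helper embeddings_by_token and the toy data are assumptions made for illustration: the stand-in is a nested list with 4 values per token, whereas the real hidden_rep is a tensor that, for ALBERT-base, should have shape [1, 7, 768]. The same indexing works on both.

```python
def embeddings_by_token(hidden_rep, labels):
    """Map each token label to its vector in the first (and only) sentence."""
    sequence = hidden_rep[0]  # drop the batch dimension
    if len(sequence) != len(labels):
        raise ValueError("expected one label per token position")
    return {label: sequence[i] for i, label in enumerate(labels)}


# One label per input id: [CLS], the five words, then [SEP].
labels = ["[CLS]", "Paris", "is", "a", "beautiful", "city", "[SEP]"]

# Stand-in for hidden_rep: 1 sentence x 7 tokens x 4 values,
# where the vector at position i is filled with float(i).
toy_hidden_rep = [[[float(i)] * 4 for i in range(len(labels))]]

vectors = embeddings_by_token(toy_hidden_rep, labels)
print(vectors["city"])   # the row at position 5
print(vectors["[SEP]"])  # the row at position 6, the last valid index
```

Keying the lookup by label only works here because no word repeats in the sentence; for sentences with repeated tokens, indexing by position is the safer choice.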

In this way, we can use the ALBERT model just as we used the BERT model. We can also fine-tune ALBERT on any downstream task, similar to how we fine-tuned BERT. Now that we have learned how ALBERT works, in the next section, let us explore RoBERTa, another interesting variant of BERT.