# Working with Language Models and Tokenizers

### 1. To use a language model, we first need to load its tokenizer. We start with the pretrained BERT model released by Google, as follows:

In [1]:
from transformers import BertTokenizer 
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased') 

In [2]:
text = "Using transformers is easy!" 
tokenizer(text) 

{'input_ids': [101, 2478, 19081, 2003, 3733, 999, 102], 'token_type_ids': [0, 0, 0, 0, 0, 0, 0], 'attention_mask': [1, 1, 1, 1, 1, 1, 1]}
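To make the three fields concrete, here is a minimal sketch that reproduces the shape of the tokenizer output for this one sentence. It is a toy stand-in, not the library's WordPiece algorithm: `TOY_VOCAB` and `toy_encode` are hypothetical names, the token ids are copied from the output above, and the id used for `[UNK]` is an assumption about the `bert-base-uncased` vocabulary.

```python
import re

# Toy vocabulary (hypothetical): ids copied from the output above.
# The real vocabulary is far larger than these eight entries.
TOY_VOCAB = {
    "[UNK]": 100,  # assumed id for tokens missing from the vocabulary
    "[CLS]": 101,  # added at the start of every sequence
    "[SEP]": 102,  # added at the end of every sequence
    "using": 2478,
    "transformers": 19081,
    "is": 2003,
    "easy": 3733,
    "!": 999,
}


def toy_encode(text):
    """Mimic the structure of the tokenizer output for a single sentence."""
    # "uncased" means the text is lowercased before the vocabulary lookup.
    tokens = re.findall(r"\w+|[^\w\s]", text.lower())
    ids = [TOY_VOCAB["[CLS]"]]
    ids += [TOY_VOCAB.get(tok, TOY_VOCAB["[UNK]"]) for tok in tokens]
    ids.append(TOY_VOCAB["[SEP]"])
    return {
        "input_ids": ids,
        "token_type_ids": [0] * len(ids),  # single sentence: all segment 0
        "attention_mask": [1] * len(ids),  # no padding: attend to every token
    }


encoded = toy_encode("Using transformers is easy!")
print(encoded["input_ids"])
```

The first and last ids (101 and 102) are the special tokens that wrap the sentence; the five ids in between map one-to-one to `using`, `transformers`, `is`, `easy`, and `!`.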

In [3]:
encoded_input = tokenizer(text, return_tensors="pt")
encoded_input

{'input_ids': tensor([[  101,  2478, 19081,  2003,  3733,   999,   102]]), 'token_type_ids': tensor([[0, 0, 0, 0, 0, 0, 0]]), 'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1]])}
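With `return_tensors="pt"`, each field gains a leading batch dimension, here of size 1. When a batch holds sentences of different lengths, the shorter ones are padded and the `attention_mask` marks the padded positions with 0. The sketch below shows that idea in plain Python lists; `pad_batch` is a hypothetical helper, and the pad id of 0 is an assumption based on the `padding_idx=0` that the model reports for its word embeddings.

```python
def pad_batch(sequences, pad_id=0):
    """Right-pad lists of token ids to equal length and build attention masks."""
    max_len = max(len(seq) for seq in sequences)
    input_ids = []
    attention_mask = []
    for seq in sequences:
        n_pad = max_len - len(seq)
        input_ids.append(seq + [pad_id] * n_pad)
        # 1 = real token, 0 = padding the model should ignore
        attention_mask.append([1] * len(seq) + [0] * n_pad)
    return {"input_ids": input_ids, "attention_mask": attention_mask}


batch = pad_batch([
    [101, 2478, 19081, 2003, 3733, 999, 102],  # ids from the example sentence
    [101, 3733, 999, 102],                     # a shorter, made-up sequence
])
for ids, mask in zip(batch["input_ids"], batch["attention_mask"]):
    print(ids, mask)
```

Every row now has the same length, which is what allows a batch to be stored as a single rectangular tensor.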

### 2. To run a model, for example the BERT base model, the following code downloads it from the Hugging Face model repository:

In [4]:
from transformers import BertModel 
model = BertModel.from_pretrained("bert-base-uncased") 

Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertModel: ['cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.dense.weight', 'cls.seq_relationship.bias', 'cls.seq_relationship.weight', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.bias']
- This IS expected if you are initializing BertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


In [5]:
model

BertModel(
  (embeddings): BertEmbeddings(
    (word_embeddings): Embedding(30522, 768, padding_idx=0)
    (position_embeddings): Embedding(512, 768)
    (token_type_embeddings): Embedding(2, 768)
    (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
    (dropout): Dropout(p=0.1, inplace=False)
  )
  (encoder): BertEncoder(
    (layer): ModuleList(
      (0-11): 12 x BertLayer(
        (attention): BertAttention(
          (self): BertSelfAttention(
            (query): Linear(in_features=768, out_features=768, bias=True)
            (key): Linear(in_features=768, out_features=768, bias=True)
            (value): Linear(in_features=768, out_features=768, bias=True)
            (dropout): Dropout(p=0.1, inplace=False)
          )
          (output): BertSelfOutput(
            (dense): Linear(in_features=768, out_features=768, bias=True)
            (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
            (dropout): Dropout(p=0.1, inplace=False)
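The layer shapes in this printout are enough to count parameters by hand. The sketch below does so for the embedding block and for one attention block, using only the shapes shown above; the helper names are hypothetical, and layers that the printout does not show are left out.

```python
def linear_params(in_features, out_features, bias=True):
    """Weights plus optional bias of a fully connected layer."""
    return in_features * out_features + (out_features if bias else 0)


def layer_norm_params(size):
    """LayerNorm with elementwise_affine=True has a weight and a bias vector."""
    return 2 * size


hidden = 768

embedding_params = (
    30522 * hidden                 # word_embeddings
    + 512 * hidden                 # position_embeddings
    + 2 * hidden                   # token_type_embeddings
    + layer_norm_params(hidden)    # embeddings.LayerNorm
)

attention_params = (
    3 * linear_params(hidden, hidden)  # query, key, value
    + linear_params(hidden, hidden)    # output.dense
    + layer_norm_params(hidden)        # output.LayerNorm
)

print(embedding_params)   # 23837184
print(attention_params)   # 2363904
```

Dropout layers add no parameters, so they do not appear in the sums.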
  

In [6]:
output = model(**encoded_input) 

In [8]:
output.keys()

odict_keys(['last_hidden_state', 'pooler_output'])

In [11]:
vars(output).keys()

dict_keys(['last_hidden_state', 'pooler_output', 'hidden_states', 'past_key_values', 'attentions', 'cross_attentions'])
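The two calls list different keys because `output.keys()` reports only the fields that were actually populated, while `vars(output)` exposes every attribute, including those left as `None`. The following is a simplified model of that behaviour, not the library's implementation; `ToyOutput` is a hypothetical class and the string values are placeholders standing in for tensors.

```python
from dataclasses import dataclass, fields
from typing import Any, Optional


@dataclass
class ToyOutput:
    """Hypothetical stand-in for a model output with optional fields."""
    last_hidden_state: Optional[Any] = None
    pooler_output: Optional[Any] = None
    hidden_states: Optional[Any] = None
    past_key_values: Optional[Any] = None
    attentions: Optional[Any] = None
    cross_attentions: Optional[Any] = None

    def keys(self):
        # Report only the fields that hold a value.
        return [f.name for f in fields(self) if getattr(self, f.name) is not None]


out = ToyOutput(
    last_hidden_state="<one vector per token>",   # placeholder, not a tensor
    pooler_output="<one vector per sequence>",    # placeholder, not a tensor
)
print(out.keys())
print(list(vars(out).keys()))
```

In this toy version, the remaining four fields stay `None` because nothing asked for them, which mirrors why they are absent from `output.keys()` above.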

### 3. For specific tasks such as mask filling, Hugging Face provides ready-to-use pipelines. For example, the following code snippet fills in a masked token:

In [13]:
from transformers import pipeline 
unmasker = pipeline('fill-mask', model='bert-base-uncased') 
o = unmasker("The man worked as a [MASK].") 
print(type(o))
o

Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForMaskedLM: ['cls.seq_relationship.bias', 'cls.seq_relationship.weight']
- This IS expected if you are initializing BertForMaskedLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForMaskedLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


<class 'list'>


[{'score': 0.09747521579265594,
  'token': 10533,
  'token_str': 'carpenter',
  'sequence': 'the man worked as a carpenter.'},
 {'score': 0.05238352343440056,
  'token': 15610,
  'token_str': 'waiter',
  'sequence': 'the man worked as a waiter.'},
 {'score': 0.049627047032117844,
  'token': 13362,
  'token_str': 'barber',
  'sequence': 'the man worked as a barber.'},
 {'score': 0.037886057049036026,
  'token': 15893,
  'token_str': 'mechanic',
  'sequence': 'the man worked as a mechanic.'},
 {'score': 0.037680741399526596,
  'token': 18968,
  'token_str': 'salesman',
  'sequence': 'the man worked as a salesman.'}]
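Each `score` is the probability the model assigns to a candidate token at the `[MASK]` position, and the pipeline returns the highest-scoring candidates. A minimal sketch of that ranking step follows, assuming the scores come from a softmax over per-token logits; the logits and the `top_k` helper are made up for illustration and are not model output.

```python
import math


def softmax(logits):
    """Turn a {token: logit} mapping into probabilities that sum to 1."""
    peak = max(logits.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(value - peak) for tok, value in logits.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


def top_k(logits, k=5):
    """Return the k most probable tokens, highest score first."""
    probs = softmax(logits)
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    return [{"token_str": tok, "score": score} for tok, score in ranked[:k]]


# Hypothetical logits for a five-word vocabulary (the real one is much larger).
toy_logits = {"carpenter": 2.0, "waiter": 1.4, "barber": 1.3, "pilot": 0.2, "tree": -3.0}

for row in top_k(toy_logits, k=3):
    print(row)
```

Because the probability mass is spread over the whole vocabulary, even the top candidate in the real output above receives a score below 0.1.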

The pipeline returns a list of dicts, so the result can be passed directly to the DataFrame constructor. To get a neat view with pandas, run the following code:

In [14]:
import pandas as pd
pd.DataFrame(o)

| | score | token | token_str | sequence |
|---|---|---|---|---|
| 0 | 0.097475 | 10533 | carpenter | the man worked as a carpenter. |
| 1 | 0.052384 | 15610 | waiter | the man worked as a waiter. |
| 2 | 0.049627 | 13362 | barber | the man worked as a barber. |
| 3 | 0.037886 | 15893 | mechanic | the man worked as a mechanic. |
| 4 | 0.037681 | 18968 | salesman | the man worked as a salesman. |
