# Working with Language Models and Tokenizers

In [None]:
# The cells below use both the PyTorch ("pt") and TensorFlow ("tf") backends.
!pip install transformers torch tensorflow

In [15]:
from transformers import BertTokenizer
# Load the tokenizer that matches the bert-base-uncased checkpoint.
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')

In [16]:
text = "Using transformers is easy!"
# Calling the tokenizer returns input_ids, token_type_ids, and attention_mask.
tokenizer(text)

{'input_ids': [101, 2478, 19081, 2003, 3733, 999, 102], 'token_type_ids': [0, 0, 0, 0, 0, 0, 0], 'attention_mask': [1, 1, 1, 1, 1, 1, 1]}
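The first and last IDs in the output above, 101 and 102, are the `[CLS]` and `[SEP]` special tokens in the bert-base-uncased vocabulary, and the attention mask marks every position as a real token. The mask only becomes interesting once sequences of different lengths are padded into one batch. The following is a rough, library-free sketch of that idea. The helper `pad_batch` is hypothetical and written only for illustration (it is not part of transformers), and the pad ID of 0 is an assumption based on the bert-base-uncased vocabulary.

```python
PAD_ID = 0  # assumption: the [PAD] id in the bert-base-uncased vocabulary


def pad_batch(batch_ids, pad_id=PAD_ID):
    """Right-pad each ID list to the longest length and build matching attention masks."""
    max_len = max(len(ids) for ids in batch_ids)
    input_ids, attention_mask = [], []
    for ids in batch_ids:
        n_pad = max_len - len(ids)
        input_ids.append(ids + [pad_id] * n_pad)
        # 1 marks a real token, 0 marks padding the model should ignore
        attention_mask.append([1] * len(ids) + [0] * n_pad)
    return {"input_ids": input_ids, "attention_mask": attention_mask}


# The first row is the encoding printed above; the second is a shorter,
# hand-built sequence that reuses IDs from that same output.
long_ids = [101, 2478, 19081, 2003, 3733, 999, 102]
short_ids = [101, 3733, 999, 102]

batch = pad_batch([long_ids, short_ids])
print(batch["input_ids"][1])
print(batch["attention_mask"][1])
```

With the real tokenizer, passing a list of strings together with `padding=True` should produce padded batches of this kind, so a helper like this is only needed for understanding what the mask encodes.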

In [17]:
# return_tensors="pt" returns PyTorch tensors instead of plain Python lists.
encoded_input = tokenizer(text, return_tensors="pt")

In [18]:
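from transformers import BertModel
# Load the pretrained encoder (no task-specific head) and run a forward pass.
model = BertModel.from_pretrained("bert-base-uncased")
# output.last_hidden_state holds one hidden vector per input token.
output = model(**encoded_input)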

In [19]:
# TensorFlow variant of the PyTorch cells above.
from transformers import BertTokenizer, TFBertModel
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = TFBertModel.from_pretrained("bert-base-uncased")
text = "Using transformers is easy!"
# return_tensors='tf' returns TensorFlow tensors.
encoded_input = tokenizer(text, return_tensors='tf')
output = model(**encoded_input)

Some layers from the model checkpoint at bert-base-uncased were not used when initializing TFBertModel: ['mlm___cls', 'nsp___cls']
- This IS expected if you are initializing TFBertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing TFBertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
All the layers of TFBertModel were initialized from the model checkpoint at bert-base-uncased.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFBertModel for predictions without further training.


In [20]:
from transformers import pipeline
# The fill-mask pipeline predicts the most likely tokens for the [MASK] position.
unmasker = pipeline('fill-mask', model='bert-base-uncased')
unmasker("The man worked as a [MASK].")

Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForMaskedLM: ['cls.seq_relationship.weight', 'cls.seq_relationship.bias']
- This IS expected if you are initializing BertForMaskedLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForMaskedLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


[{'score': 0.0974755585193634,
  'sequence': 'the man worked as a carpenter.',
  'token': 10533,
  'token_str': 'carpenter'},
 {'score': 0.05238321051001549,
  'sequence': 'the man worked as a waiter.',
  'token': 15610,
  'token_str': 'waiter'},
 {'score': 0.04962703585624695,
  'sequence': 'the man worked as a barber.',
  'token': 13362,
  'token_str': 'barber'},
 {'score': 0.03788604959845543,
  'sequence': 'the man worked as a mechanic.',
  'token': 15893,
  'token_str': 'mechanic'},
 {'score': 0.037680838257074356,
  'sequence': 'the man worked as a salesman.',
  'token': 18968,
  'token_str': 'salesman'}]
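Each entry in the list above carries a `score` (the probability the model assigns to that token at the masked position) along with the filled-in sequence, and the entries appear ordered from highest to lowest score. As a small sketch of post-processing such a result, the snippet below filters predictions by a score threshold. It uses the first three entries from the output above, copied by hand, and the helper `top_tokens` is hypothetical (it is not a transformers function).

```python
# First three of the five entries printed above, copied by hand
results = [
    {"score": 0.0974755585193634, "token": 10533, "token_str": "carpenter"},
    {"score": 0.05238321051001549, "token": 15610, "token_str": "waiter"},
    {"score": 0.04962703585624695, "token": 13362, "token_str": "barber"},
]


def top_tokens(predictions, min_score=0.0):
    """Return (token_str, score) pairs with score >= min_score, best first."""
    kept = [(p["token_str"], p["score"]) for p in predictions if p["score"] >= min_score]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


for token, score in top_tokens(results, min_score=0.05):
    print(f"{token:<10} {score:.3f}")
```

The same helper could be applied directly to the return value of `unmasker(...)`, since it only relies on the `token_str` and `score` keys shown in the output.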

In [21]:
from transformers import pipeline
# Zero-shot classification scores arbitrary labels with an NLI model; no fine-tuning is needed.
classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")
sequence_to_classify = "I am going to france."
candidate_labels = ['travel', 'cooking', 'dancing']
classifier(sequence_to_classify, candidate_labels)

{'labels': ['travel', 'dancing', 'cooking'],
 'scores': [0.9866883754730225, 0.007197572849690914, 0.006114061456173658],
 'sequence': 'I am going to france.'}
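In the result above, `labels` and `scores` are paired by position and ordered from most to least likely, and the three scores add up to roughly 1. A minimal sketch of turning that dictionary into a label-to-score mapping follows. The `result` value is copied by hand from the output above rather than produced by a fresh pipeline call, and the variable names are illustrative.

```python
# Result copied by hand from the output above
result = {
    "labels": ["travel", "dancing", "cooking"],
    "scores": [0.9866883754730225, 0.007197572849690914, 0.006114061456173658],
    "sequence": "I am going to france.",
}

# Pair each label with its score, then pick the highest-scoring label
label_scores = dict(zip(result["labels"], result["scores"]))
best_label = max(label_scores, key=label_scores.get)
total = sum(result["scores"])

print(best_label, round(label_scores[best_label], 3))
print(round(total, 6))
```

Looking labels up by name this way avoids depending on the order of `candidate_labels`, which differs from the order of the returned `labels`.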