# Derivatives of BERT

In [1]:
from transformers import pipeline



In [3]:
nlp = pipeline("fill-mask", model="bert-base-uncased")

print(type(nlp.model))

Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForMaskedLM: ['bert.pooler.dense.bias', 'bert.pooler.dense.weight', 'cls.seq_relationship.bias', 'cls.seq_relationship.weight']
- This IS expected if you are initializing BertForMaskedLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForMaskedLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Device set to use mps:0


<class 'transformers.models.bert.modeling_bert.BertForMaskedLM'>


In [5]:
preds = nlp("If you don't [MASK] at the sign, you will get a ticket.")

print("If you don't *** at the sign, you will get a ticket.")

for p in preds:
    print(f"Token: {p['token_str']}.  |  Score: {100*p['score']:,.2f}%")

If you don't *** at the sign, you will get a ticket.
Token: look.  |  Score: 84.78%
Token: stop.  |  Score: 10.96%
Token: stare.  |  Score: 0.64%
Token: glance.  |  Score: 0.29%
Token: appear.  |  Score: 0.21%


## Doing the same task with RoBERTa base

In [8]:
roberta_nlp = pipeline("fill-mask", model="roberta-base")

print(type(roberta_nlp.model))

preds = roberta_nlp("If you don't <mask> at the sign, you will get a ticket.")

print("If you don't *** at the sign, you will get a ticket.")

for p in preds:
    print(f"Token: {p['token_str']}.  |  Score: {100*p['score']:,.2f}%")


Device set to use mps:0


<class 'transformers.models.roberta.modeling_roberta.RobertaForMaskedLM'>
If you don't *** at the sign, you will get a ticket.
Token:  stop.  |  Score: 46.29%
Token:  look.  |  Score: 39.91%
Token:  stay.  |  Score: 2.72%
Token:  stand.  |  Score: 1.51%
Token:  pay.  |  Score: 0.86%
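
The BERT and RoBERTa cells above use different mask placeholders (`[MASK]` versus `<mask>`), so the prompt string has to be edited per model. A minimal sketch of a helper that fills the placeholder into a shared template; `build_prompt` and the `{mask}` template convention are hypothetical names introduced here for illustration, not part of transformers. With a loaded pipeline, the mask string could likely be read from the tokenizer's `mask_token` attribute instead of being hard-coded.

```python
def build_prompt(template: str, mask_token: str) -> str:
    """Replace the hypothetical {mask} placeholder with a model's mask token."""
    if "{mask}" not in template:
        raise ValueError("template must contain a {mask} placeholder")
    return template.replace("{mask}", mask_token)


TEMPLATE = "If you don't {mask} at the sign, you will get a ticket."

# Mask tokens exactly as used in the cells above. With a loaded pipeline they
# could instead be read from the tokenizer, e.g. nlp.tokenizer.mask_token.
bert_prompt = build_prompt(TEMPLATE, "[MASK]")
roberta_prompt = build_prompt(TEMPLATE, "<mask>")

print(bert_prompt)
print(roberta_prompt)
```

Keeping one template avoids retyping the sentence for each model and makes it harder to pass the wrong placeholder to a pipeline.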


## Doing the same task with Distil-RoBERTa

In [12]:
distilroberta_nlp = pipeline("fill-mask", model="distilroberta-base")

print(type(distilroberta_nlp.model))

preds = distilroberta_nlp("If you don't <mask> at the sign, you will get a ticket.")

print("If you don't *** at the sign, you will get a ticket.")

for p in preds:
    print(f"Token: {p['token_str']}.  |  Score: {100*p['score']:,.2f}%")

Some weights of the model checkpoint at distilroberta-base were not used when initializing RobertaForMaskedLM: ['roberta.pooler.dense.bias', 'roberta.pooler.dense.weight']
- This IS expected if you are initializing RobertaForMaskedLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaForMaskedLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Device set to use mps:0


<class 'transformers.models.roberta.modeling_roberta.RobertaForMaskedLM'>
If you don't *** at the sign, you will get a ticket.
Token:  stop.  |  Score: 43.30%
Token:  arrive.  |  Score: 5.89%
Token:  park.  |  Score: 5.04%
Token:  look.  |  Score: 4.78%
Token:  stare.  |  Score: 3.85%


## Doing the same task with DistilBERT

In [15]:
distil_nlp = pipeline("fill-mask", model="distilbert-base-cased")

print(type(distil_nlp.model))

preds = distil_nlp("If you don't [MASK] at the sign, you will get a ticket.")

print("If you don't *** at the sign, you will get a ticket.")

for p in preds:
    print(f"Token: {p['token_str']}.  |  Score: {100*p['score']:,.2f}%")

Device set to use mps:0


<class 'transformers.models.distilbert.modeling_distilbert.DistilBertForMaskedLM'>
If you don't *** at the sign, you will get a ticket.
Token: look.  |  Score: 57.85%
Token: stop.  |  Score: 8.69%
Token: glance.  |  Score: 4.74%
Token: arrive.  |  Score: 2.14%
Token: appear.  |  Score: 1.74%
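
To compare the four runs side by side, the sketch below looks up where one candidate word lands in each model's top-5 list. The scores are copied from the printed outputs above (rounded as shown there); `find_token` and `stop_ranks` are hypothetical names introduced for this example. It assumes the RoBERTa-family `token_str` values carry a leading space, as the extra space in their printed output suggests, and strips it before comparing.

```python
def find_token(preds, target):
    """Return (rank, score) of target among predictions, or None if absent.

    preds is a list of dicts with 'token_str' and 'score' keys, the shape
    used by the loops in the cells above. Token strings are stripped so
    that ' stop' and 'stop' compare equal.
    """
    for rank, p in enumerate(preds, start=1):
        if p["token_str"].strip() == target:
            return rank, p["score"]
    return None


# Top-5 outputs copied from the runs above (scores rounded as printed).
results = {
    "bert-base-uncased": [("look", 0.8478), ("stop", 0.1096), ("stare", 0.0064),
                          ("glance", 0.0029), ("appear", 0.0021)],
    "roberta-base": [(" stop", 0.4629), (" look", 0.3991), (" stay", 0.0272),
                     (" stand", 0.0151), (" pay", 0.0086)],
    "distilroberta-base": [(" stop", 0.4330), (" arrive", 0.0589), (" park", 0.0504),
                           (" look", 0.0478), (" stare", 0.0385)],
    "distilbert-base-cased": [("look", 0.5785), ("stop", 0.0869), ("glance", 0.0474),
                              ("arrive", 0.0214), ("appear", 0.0174)],
}

stop_ranks = {}
for name, pairs in results.items():
    preds = [{"token_str": tok, "score": score} for tok, score in pairs]
    stop_ranks[name] = find_token(preds, "stop")
    rank, score = stop_ranks[name]
    print(f"{name}: 'stop' ranked {rank} with score {100 * score:.2f}%")
```

On this sentence, the two RoBERTa-family models rank "stop" first, while the two BERT-family models place it second behind "look".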
