**Masked Language Modelling**

Masked Language Modeling (MLM) is a task used in natural language processing, particularly for training language models like BERT (Bidirectional Encoder Representations from Transformers). It is a self-supervised learning technique: the model learns to predict words that have been intentionally masked (hidden) in the input text, so the training labels come from the text itself. The key aspects of MLM are:

Masking Tokens: A fraction of the tokens in each input is selected at random to be predicted. In the original BERT setup, 15% of tokens are selected; of those, 80% are replaced with the special [MASK] token, 10% with a random token, and 10% are left unchanged. For instance, in the sentence "The quick brown fox jumps over the lazy dog," you might mask the word 'brown' so it becomes "The quick [MASK] fox jumps over the lazy dog."

Model's Task: The model's objective is to predict the original token that was masked out, using the context provided by the other, unmasked words in the sentence. In the example above, the model would try to predict the word 'brown' based on the surrounding context.

Training: During training, the model sees many such examples, with different words masked in different contexts. It learns to use bidirectional context (the words both before and after the masked token) to predict the masked word. This contrasts with traditional (causal) language models, which predict the next word in a sequence from the previous words only (unidirectional context).
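A minimal pure-Python sketch of the masking step, using the 15% selection rate and the 80/10/10 replacement split from the original BERT setup. The function name `mask_tokens` and the word-level tokens are illustrative assumptions: in practice masking is applied to subword token IDs, and Hugging Face's `DataCollatorForLanguageModeling` (with `mlm_probability=0.15`) performs this step during training.

```python
import random

MASK_TOKEN = "[MASK]"

def mask_tokens(tokens, vocab, mask_prob=0.15, seed=None):
    """Apply BERT-style masking to a list of tokens (hypothetical helper).

    Returns (masked_tokens, labels): labels[i] holds the original token at
    positions selected for prediction and None everywhere else.
    """
    rng = random.Random(seed)
    masked = list(tokens)
    labels = [None] * len(tokens)
    for i, tok in enumerate(tokens):
        if rng.random() >= mask_prob:
            continue  # not selected: no loss is computed at this position
        labels[i] = tok  # the model must recover the original token here
        r = rng.random()
        if r < 0.8:
            masked[i] = MASK_TOKEN         # 80%: replace with [MASK]
        elif r < 0.9:
            masked[i] = rng.choice(vocab)  # 10%: replace with a random token
        # remaining 10%: keep the original token unchanged
    return masked, labels

sentence = "The quick brown fox jumps over the lazy dog".split()
masked, labels = mask_tokens(sentence, vocab=sentence, seed=0)
print(masked)
print(labels)
```

Keeping some selected tokens unchanged or randomized means the model cannot assume that only `[MASK]` positions matter, which helps because `[MASK]` never appears when the model is later fine-tuned.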

In [1]:
import torch

# Check if a GPU is available and which one
if torch.cuda.is_available():
    print('Available GPUs: ', torch.cuda.device_count())
    print('Current GPU: ', torch.cuda.get_device_name(torch.cuda.current_device()))
else:
    print("No GPU detected (no CUDA device or a CPU-only PyTorch build); computations will run on the CPU")


Available GPUs:  1
Current GPU:  Tesla T4


In [2]:
import torch
# Check if GPU is available and set the device to GPU if it is, else CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
device

device(type='cuda')

In [3]:
from transformers import BertForMaskedLM, pipeline, BertTokenizer

In [4]:
bert_lm = BertForMaskedLM.from_pretrained("bert-base-cased")
bert_lm.to(device)

Some weights of the model checkpoint at bert-base-cased were not used when initializing BertForMaskedLM: ['bert.pooler.dense.weight', 'bert.pooler.dense.bias', 'cls.seq_relationship.weight', 'cls.seq_relationship.bias']
- This IS expected if you are initializing BertForMaskedLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForMaskedLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


BertForMaskedLM(
  (bert): BertModel(
    (embeddings): BertEmbeddings(
      (word_embeddings): Embedding(28996, 768, padding_idx=0)
      (position_embeddings): Embedding(512, 768)
      (token_type_embeddings): Embedding(2, 768)
      (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
      (dropout): Dropout(p=0.1, inplace=False)
    )
    (encoder): BertEncoder(
      (layer): ModuleList(
        (0-11): 12 x BertLayer(
          (attention): BertAttention(
            (self): BertSelfAttention(
              (query): Linear(in_features=768, out_features=768, bias=True)
              (key): Linear(in_features=768, out_features=768, bias=True)
              (value): Linear(in_features=768, out_features=768, bias=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
            (output): BertSelfOutput(
              (dense): Linear(in_features=768, out_features=768, bias=True)
              (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_a

In [5]:
# A pipeline bundles a model with its tokenizer and pre/post-processing, making common tasks easy to run
# "fill-mask" performs masked language modeling: it predicts the token hidden behind [MASK]
mlmp = pipeline("fill-mask", model='bert-base-cased')

Some weights of the model checkpoint at bert-base-cased were not used when initializing BertForMaskedLM: ['bert.pooler.dense.weight', 'bert.pooler.dense.bias', 'cls.seq_relationship.weight', 'cls.seq_relationship.bias']
- This IS expected if you are initializing BertForMaskedLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForMaskedLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


In [6]:
type(mlmp)

transformers.pipelines.fill_mask.FillMaskPipeline

In [7]:
mlmp.tokenizer

BertTokenizerFast(name_or_path='bert-base-cased', vocab_size=28996, model_max_length=512, is_fast=True, padding_side='right', truncation_side='right', special_tokens={'unk_token': '[UNK]', 'sep_token': '[SEP]', 'pad_token': '[PAD]', 'cls_token': '[CLS]', 'mask_token': '[MASK]'}, clean_up_tokenization_spaces=True),  added_tokens_decoder={
	0: AddedToken("[PAD]", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
	100: AddedToken("[UNK]", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
	101: AddedToken("[CLS]", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
	102: AddedToken("[SEP]", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
	103: AddedToken("[MASK]", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
}

In [8]:
print(type(mlmp.model))

<class 'transformers.models.bert.modeling_bert.BertForMaskedLM'>


In [9]:
masked_text = f"He had a famous maxim, “No ideas but in things,” which I take to mean that to speak about ideas, {mlmp.tokenizer.mask_token} , and abstractions, we must ground them firmly in the things of the world."
output = mlmp(masked_text)
# Print the input with the mask token shown as ****
print(masked_text.replace(mlmp.tokenizer.mask_token, "****"))

He had a famous maxim, “No ideas but in things,” which I take to mean that to speak about ideas, **** , and abstractions, we must ground them firmly in the things of the world.


In [10]:
for p in output:
    print(f"Token:{p['token_str']}. Score: {100*p['score']:,.2f}%")

Token:ideas. Score: 24.96%
Token:concepts. Score: 11.45%
Token:things. Score: 8.87%
Token:thoughts. Score: 3.65%
Token:objects. Score: 2.13%
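The scores above are probabilities: at the [MASK] position the model outputs one logit per vocabulary entry (28,996 for bert-base-cased), a softmax turns those logits into probabilities, and the pipeline returns the five most likely tokens by default (its `top_k` argument). The sketch below reproduces that post-processing in pure Python on a hypothetical four-word vocabulary with made-up logits; it illustrates the idea and is not the pipeline's actual code.

```python
import math

def softmax(logits):
    # Subtract the max before exponentiating for numerical stability
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def top_k_predictions(logits, id_to_token, k=5):
    """Return the k most probable tokens with their softmax scores."""
    probs = softmax(logits)
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    return [{"token_str": id_to_token[i], "score": probs[i]} for i in ranked]

# Hypothetical logits at the [MASK] position for a toy vocabulary
vocab = ["ideas", "concepts", "things", "banana"]
logits = [3.0, 2.2, 2.0, -1.0]
for p in top_k_predictions(logits, vocab, k=3):
    print(f"Token:{p['token_str']}. Score: {100*p['score']:,.2f}%")
```

Because the softmax runs over the whole vocabulary, the top five scores need not sum to 100%; the rest of the probability mass is spread over every other token.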


In [11]:
print(output)

[{'score': 0.24962514638900757, 'token': 4133, 'token_str': 'ideas', 'sequence': 'He had a famous maxim, “ No ideas but in things, ” which I take to mean that to speak about ideas, ideas, and abstractions, we must ground them firmly in the things of the world.'}, {'score': 0.11452872306108475, 'token': 8550, 'token_str': 'concepts', 'sequence': 'He had a famous maxim, “ No ideas but in things, ” which I take to mean that to speak about ideas, concepts, and abstractions, we must ground them firmly in the things of the world.'}, {'score': 0.08866363763809204, 'token': 1614, 'token_str': 'things', 'sequence': 'He had a famous maxim, “ No ideas but in things, ” which I take to mean that to speak about ideas, things, and abstractions, we must ground them firmly in the things of the world.'}, {'score': 0.036541398614645004, 'token': 3578, 'token_str': 'thoughts', 'sequence': 'He had a famous maxim, “ No ideas but in things, ” which I take to mean that to speak about ideas, thoughts, and abst

**The Next Sentence Prediction Task**

In [12]:
from transformers import BertForNextSentencePrediction, BertTokenizer
import torch

In [13]:
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model_nsp = BertForNextSentencePrediction.from_pretrained("bert-base-uncased")

In [14]:
model_nsp.to(device)

BertForNextSentencePrediction(
  (bert): BertModel(
    (embeddings): BertEmbeddings(
      (word_embeddings): Embedding(30522, 768, padding_idx=0)
      (position_embeddings): Embedding(512, 768)
      (token_type_embeddings): Embedding(2, 768)
      (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
      (dropout): Dropout(p=0.1, inplace=False)
    )
    (encoder): BertEncoder(
      (layer): ModuleList(
        (0-11): 12 x BertLayer(
          (attention): BertAttention(
            (self): BertSelfAttention(
              (query): Linear(in_features=768, out_features=768, bias=True)
              (key): Linear(in_features=768, out_features=768, bias=True)
              (value): Linear(in_features=768, out_features=768, bias=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
            (output): BertSelfOutput(
              (dense): Linear(in_features=768, out_features=768, bias=True)
              (LayerNorm): LayerNorm((768,), eps=1e-12,

In [15]:
# Note: text1 is passed to the tokenizer first (sentence A) and text2 second (sentence B)
text2 = "BERT (Bidirectional Encoder Representations from Transformers) includes a Next Sentence Prediction (NSP) task during its pre-training phase, which involves predicting whether a given sentence logically follows a preceding one."
text1 = "This task helps BERT understand the relationship between sentences, enhancing its ability to handle tasks like question answering and paragraph coherence, where understanding the sequence of sentences is crucial"

In [16]:
inputs = tokenizer(text1, text2, return_tensors='pt')
print(inputs)  # an attention_mask value of 1 means the token is attended to (i.e., it is not padding)

{'input_ids': tensor([[  101,  2023,  4708,  7126, 14324,  3305,  1996,  3276,  2090, 11746,
          1010, 20226,  2049,  3754,  2000,  5047,  8518,  2066,  3160, 10739,
          1998, 20423,  2522,  5886, 10127,  1010,  2073,  4824,  1996,  5537,
          1997, 11746,  2003, 10232,   102, 14324,  1006,  7226,  7442,  7542,
          2389,  4372, 16044,  2099, 15066,  2013, 19081,  1007,  2950,  1037,
          2279,  6251, 17547,  1006, 24978,  2361,  1007,  4708,  2076,  2049,
          3653,  1011,  2731,  4403,  1010,  2029,  7336, 29458,  3251,  1037,
          2445,  6251, 11177,  2135,  4076,  1037, 11003,  2028,  1012,   102]]), 'token_type_ids': tensor([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
         0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
         1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
         1, 1, 1, 1, 1, 1, 1, 1]]), 'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1, 

In [17]:
inputs.input_ids  # token IDs for [CLS] sentence A [SEP] sentence B [SEP] (101 = [CLS], 102 = [SEP])

tensor([[  101,  2023,  4708,  7126, 14324,  3305,  1996,  3276,  2090, 11746,
          1010, 20226,  2049,  3754,  2000,  5047,  8518,  2066,  3160, 10739,
          1998, 20423,  2522,  5886, 10127,  1010,  2073,  4824,  1996,  5537,
          1997, 11746,  2003, 10232,   102, 14324,  1006,  7226,  7442,  7542,
          2389,  4372, 16044,  2099, 15066,  2013, 19081,  1007,  2950,  1037,
          2279,  6251, 17547,  1006, 24978,  2361,  1007,  4708,  2076,  2049,
          3653,  1011,  2731,  4403,  1010,  2029,  7336, 29458,  3251,  1037,
          2445,  6251, 11177,  2135,  4076,  1037, 11003,  2028,  1012,   102]])

In [18]:
inputs.token_type_ids  # segment IDs (0 = sentence A, 1 = sentence B)

tensor([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
         0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
         1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
         1, 1, 1, 1, 1, 1, 1, 1]])

In [19]:
inputs.attention_mask  # all 1s: there is no padding, so every token is attended to

tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
         1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
         1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
         1, 1, 1, 1, 1, 1, 1, 1]])
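These three tensors follow BERT's sentence-pair layout `[CLS] A [SEP] B [SEP]`. Below is a minimal sketch of how that layout is assembled from two lists of WordPiece IDs; the function `encode_pair` is hypothetical, and 101 and 102 are the [CLS] and [SEP] IDs visible at the start and end of `input_ids` above.

```python
def encode_pair(ids_a, ids_b, cls_id=101, sep_id=102):
    """Build BERT's [CLS] A [SEP] B [SEP] inputs from two lists of token IDs."""
    input_ids = [cls_id] + ids_a + [sep_id] + ids_b + [sep_id]
    # Segment A covers [CLS], sentence A and the first [SEP];
    # segment B covers sentence B and the final [SEP]
    token_type_ids = [0] * (len(ids_a) + 2) + [1] * (len(ids_b) + 1)
    # A single unpadded sequence: every position is a real token
    attention_mask = [1] * len(input_ids)
    return {
        "input_ids": input_ids,
        "token_type_ids": token_type_ids,
        "attention_mask": attention_mask,
    }

# Placeholder IDs with the same lengths as the example: 33 for A, 44 for B
enc = encode_pair(list(range(1000, 1033)), list(range(2000, 2044)))
print(len(enc["input_ids"]), enc["token_type_ids"].count(0), enc["token_type_ids"].count(1))
```

With 33 IDs for sentence A and 44 for sentence B this yields 80 positions, 35 zeros and 45 ones, the same shape as the tensors printed above.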

In [20]:
# 0 == "isNextSentence" and 1 == "notNextSentence"
outputs = model_nsp(**inputs.to(device))

print(outputs)

NextSentencePredictorOutput(loss=None, logits=tensor([[ 6.4221, -6.3870]], device='cuda:0', grad_fn=<AddmmBackward0>), hidden_states=None, attentions=None)


In [25]:
# Pass a label to compute the loss; label 0 means "sentence B follows sentence A"
outputs = model_nsp(**inputs.to(device), labels=torch.LongTensor([0]).to(device))
outputs

NextSentencePredictorOutput(loss=tensor(2.7418e-06, device='cuda:0', grad_fn=<NllLossBackward0>), logits=tensor([[ 6.4221, -6.3870]], device='cuda:0', grad_fn=<AddmmBackward0>), hidden_states=None, attentions=None)

In [26]:
# Pass label 1 ("sentence B does not follow sentence A"); the loss is large because the model predicted isNext
outputs = model_nsp(**inputs.to(device), labels=torch.LongTensor([1]).to(device))
outputs

NextSentencePredictorOutput(loss=tensor(12.8091, device='cuda:0', grad_fn=<NllLossBackward0>), logits=tensor([[ 6.4221, -6.3870]], device='cuda:0', grad_fn=<AddmmBackward0>), hidden_states=None, attentions=None)
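The logits are unnormalized scores for [isNext, notNext]. BertForNextSentencePrediction computes its loss with cross-entropy, which PyTorch implements as log-softmax followed by negative log-likelihood (hence `NllLossBackward0` in the outputs). The pure-Python sketch below, with hypothetical helper names, reproduces both losses from the printed logits.

```python
import math

def log_softmax(logits):
    # log p_i = x_i - logsumexp(x), computed stably by factoring out the max
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]

def cross_entropy(logits, label):
    """Negative log-probability of the true class."""
    return -log_softmax(logits)[label]

logits = [6.4221, -6.3870]  # NSP logits printed above: [isNext, notNext]
probs = [math.exp(lp) for lp in log_softmax(logits)]
print(f"P(isNext) = {probs[0]:.6f}")
print(f"loss with label 0 (isNext):  {cross_entropy(logits, 0):.4e}")
print(f"loss with label 1 (notNext): {cross_entropy(logits, 1):.4f}")
```

The label-1 loss comes out at about 12.8091, essentially the gap between the two logits. The label-0 loss is a few millionths; it differs from 2.7418e-06 in the trailing digits because the printed logits are rounded to four decimals and PyTorch computes in float32.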

**Fine-Tuning BERT to Solve NLP Tasks**

In [27]:
from transformers import pipeline, BertForQuestionAnswering, BertForTokenClassification, BertForSequenceClassification

In [28]:
model_sq = BertForSequenceClassification.from_pretrained('bert-base-uncased', num_labels=3)
model_sq.to(device)

Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


BertForSequenceClassification(
  (bert): BertModel(
    (embeddings): BertEmbeddings(
      (word_embeddings): Embedding(30522, 768, padding_idx=0)
      (position_embeddings): Embedding(512, 768)
      (token_type_embeddings): Embedding(2, 768)
      (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
      (dropout): Dropout(p=0.1, inplace=False)
    )
    (encoder): BertEncoder(
      (layer): ModuleList(
        (0-11): 12 x BertLayer(
          (attention): BertAttention(
            (self): BertSelfAttention(
              (query): Linear(in_features=768, out_features=768, bias=True)
              (key): Linear(in_features=768, out_features=768, bias=True)
              (value): Linear(in_features=768, out_features=768, bias=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
            (output): BertSelfOutput(
              (dense): Linear(in_features=768, out_features=768, bias=True)
              (LayerNorm): LayerNorm((768,), eps=1e-12,

In [29]:
model_sq.classifier

Linear(in_features=768, out_features=3, bias=True)
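The classification head is a single linear layer mapping a 768-dimensional representation to `num_labels=3` logits. In BertForSequenceClassification that representation is the pooler output, a dense layer plus tanh applied to the [CLS] hidden state (dropout is also applied during training). The toy sketch below uses hidden size 4 and small random weights, all hypothetical, to show the computation. In the real model the pooler weights come from the checkpoint, but the classifier is freshly initialized, as the warning above says, so its predictions mean nothing until the model is fine-tuned.

```python
import math
import random

def linear(x, weight, bias):
    # weight has shape (out_features, in_features), as in torch.nn.Linear
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weight, bias)]

def random_matrix(rng, rows, cols, std=0.02):
    # Hypothetical small random initialization standing in for real weights
    return [[rng.gauss(0.0, std) for _ in range(cols)] for _ in range(rows)]

def classify_cls(h_cls, pooler_w, pooler_b, clf_w, clf_b):
    # Pooler: dense layer + tanh over the [CLS] hidden state
    pooled = [math.tanh(v) for v in linear(h_cls, pooler_w, pooler_b)]
    # Classifier: one logit per label (num_labels rows in clf_w)
    logits = linear(pooled, clf_w, clf_b)
    predicted = max(range(len(logits)), key=lambda i: logits[i])
    return logits, predicted

rng = random.Random(0)
hidden, num_labels = 4, 3  # toy sizes; bert-base-uncased uses hidden = 768
h_cls = [rng.gauss(0.0, 1.0) for _ in range(hidden)]
logits, predicted = classify_cls(
    h_cls,
    random_matrix(rng, hidden, hidden), [0.0] * hidden,
    random_matrix(rng, num_labels, hidden), [0.0] * num_labels,
)
print(logits, predicted)
```

Fine-tuning updates the classifier (and usually the whole encoder) so that the highest logit lands on the correct label.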

In [30]:
# Load a pre-trained classifier from the Hugging Face Model Hub
finbert = pipeline('text-classification', model='ProsusAI/finbert', tokenizer='ProsusAI/finbert')  # FinBERT: BERT adapted to financial text and fine-tuned for financial sentiment (positive, negative, neutral)

In [32]:
finbert("The order notified the formation of the 16th Finance Commission and listed Panagariya as its chief, but a notification for the appointment of other panel members would be issued separately.")

[{'label': 'neutral', 'score': 0.9421731233596802}]

In [34]:
finbert("One of the reasons for imposing the penalty is that the invoices basis which Input Tax Credit (ITC) has been claimed by MTWL are not reported by vendors in GST returns and thus are not appearing in auto populated GSTR-2A")

[{'label': 'neutral', 'score': 0.7651121020317078}]

In [36]:
finbert("Shares of Tata Motors Ltd hit a record high amid negative broader market sentiment on the last trading day of 2023 today. With today's rally, Tata Motors stock has recovered 110% from its 52-week low touched early this year. Tata Motors stock fell to a low of Rs 381 on January 6, 2023. It hit a record high of Rs 802.60 today, rising 6.41% intraday against the previous close of Rs 754.20 on BSE.")

[{'label': 'negative', 'score': 0.9598290920257568}]

In [38]:
finbert("The recent performance of small caps has brought them in the category of stocks where the desire to own them is “highest”. But the fact which gets ignored in such conditions is that “risks” are also at the highest. The reason, valuation across the board are expensive and corrections in small caps are probably the most brutal ones. Just jog your memory back to October 2021. For all those who still want to take exposure to them, in such times, it would be better to be cautious in selecting the stocks, better to go with buying in smaller quantities and buy stocks where the underlying business has good macro fundamentals. Refinitiv’s Stock Report Plus which lists stocks with high upside potential over the next 12 months, having an average recommendation rating of buy or strong buy")

[{'label': 'neutral', 'score': 0.8902133107185364}]

In [39]:
finbert.model

BertForSequenceClassification(
  (bert): BertModel(
    (embeddings): BertEmbeddings(
      (word_embeddings): Embedding(30522, 768, padding_idx=0)
      (position_embeddings): Embedding(512, 768)
      (token_type_embeddings): Embedding(2, 768)
      (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
      (dropout): Dropout(p=0.1, inplace=False)
    )
    (encoder): BertEncoder(
      (layer): ModuleList(
        (0-11): 12 x BertLayer(
          (attention): BertAttention(
            (self): BertSelfAttention(
              (query): Linear(in_features=768, out_features=768, bias=True)
              (key): Linear(in_features=768, out_features=768, bias=True)
              (value): Linear(in_features=768, out_features=768, bias=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
            (output): BertSelfOutput(
              (dense): Linear(in_features=768, out_features=768, bias=True)
              (LayerNorm): LayerNorm((768,), eps=1e-12,

**Token Classification**

In [40]:
model_tc = BertForTokenClassification.from_pretrained('bert-base-uncased')
model_tc.to(device)

Some weights of BertForTokenClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


BertForTokenClassification(
  (bert): BertModel(
    (embeddings): BertEmbeddings(
      (word_embeddings): Embedding(30522, 768, padding_idx=0)
      (position_embeddings): Embedding(512, 768)
      (token_type_embeddings): Embedding(2, 768)
      (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
      (dropout): Dropout(p=0.1, inplace=False)
    )
    (encoder): BertEncoder(
      (layer): ModuleList(
        (0-11): 12 x BertLayer(
          (attention): BertAttention(
            (self): BertSelfAttention(
              (query): Linear(in_features=768, out_features=768, bias=True)
              (key): Linear(in_features=768, out_features=768, bias=True)
              (value): Linear(in_features=768, out_features=768, bias=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
            (output): BertSelfOutput(
              (dense): Linear(in_features=768, out_features=768, bias=True)
              (LayerNorm): LayerNorm((768,), eps=1e-12, el

In [41]:
model_tc.classifier

Linear(in_features=768, out_features=2, bias=True)

In [42]:
# https://huggingface.co/savasy/bert-base-turkish-ner-cased
custom_module = 'savasy/bert-base-turkish-ner-cased'
model_ner = pipeline('ner', model=custom_module, tokenizer=custom_module)

Some weights of the model checkpoint at savasy/bert-base-turkish-ner-cased were not used when initializing BertForTokenClassification: ['bert.pooler.dense.weight', 'bert.pooler.dense.bias']
- This IS expected if you are initializing BertForTokenClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForTokenClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


'savasy/bert-base-turkish-ner-cased' is the identifier of a pre-trained model hosted on the Hugging Face Model Hub, published by the user or organization 'savasy'. It is a BERT model for Turkish, fine-tuned for Named Entity Recognition (NER). "Cased" means the model distinguishes between uppercase and lowercase letters.
The same identifier is used for both the model and the tokenizer to ensure compatibility. The tokenizer converts raw text into the token IDs the model consumes; using the tokenizer paired with the model ensures the text is split and numbered exactly as it was during training.
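BERT tokenizers, including this Turkish one, split words into subwords with WordPiece, using greedy longest-match-first lookup: take the longest prefix found in the vocabulary, then repeat on the remainder with a `##` continuation prefix. A minimal sketch over a hypothetical toy vocabulary; the real tokenizer first splits text on whitespace and punctuation and handles further special cases, which are omitted here.

```python
def wordpiece(word, vocab, unk="[UNK]"):
    """Greedy longest-match-first WordPiece tokenization of a single word."""
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        match = None
        # Try the longest remaining substring first, shrinking until a vocab entry matches
        while start < end:
            piece = word[start:end]
            if start > 0:
                piece = "##" + piece  # continuation pieces carry the ## prefix
            if piece in vocab:
                match = piece
                break
            end -= 1
        if match is None:
            return [unk]  # no valid segmentation: the whole word becomes [UNK]
        pieces.append(match)
        start = end
    return pieces

# Hypothetical toy vocabulary
vocab = {"Fran", "##cis", "##co", "San", "gel", "##iyor", "##um"}
print(wordpiece("Francisco", vocab))
print(wordpiece("geliyorum", vocab))
```

This is why the model and tokenizer must match: a different vocabulary would split the same word into different pieces with different IDs.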

In [43]:
sequence = "Merhaba! Benim adım Sinan. San Francisco'dan geliyorum"  # "Hello! My name is Sinan. I come from San Francisco."
model_ner(sequence)  # PER = person, LOC = location; B- = beginning of an entity, I- = inside (continuation of) an entity

[{'entity': 'B-PER',
  'score': 0.72424686,
  'index': 5,
  'word': 'Sinan',
  'start': 20,
  'end': 25},
 {'entity': 'B-LOC',
  'score': 0.99879956,
  'index': 7,
  'word': 'San',
  'start': 27,
  'end': 30},
 {'entity': 'I-LOC',
  'score': 0.9977099,
  'index': 8,
  'word': 'Francisco',
  'start': 31,
  'end': 40}]
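The pipeline returns one prediction per token. A sketch of merging adjacent B-/I- tags into whole entities, using the character offsets to recover each entity's text. `group_bio` is a hypothetical helper, and it assumes the pipeline has already dropped `O` (outside) tokens, as in the output above. The pipeline can also do this grouping itself through its `aggregation_strategy` argument (for example `aggregation_strategy="simple"`).

```python
def group_bio(text, tokens):
    """Merge B-/I- tagged pipeline tokens into entity spans over `text`."""
    entities = []
    for tok in tokens:
        prefix, _, label = tok["entity"].partition("-")
        prev = entities[-1] if entities else None
        continues = (
            prefix == "I"
            and prev is not None
            and prev["label"] == label
            and tok["index"] == prev["last_index"] + 1  # must directly follow the previous token
        )
        if continues:
            # I- tag: extend the open entity to cover this token
            prev["end"] = tok["end"]
            prev["last_index"] = tok["index"]
        else:
            # B- tag (or an I- tag with no matching open entity) starts a new entity
            entities.append({"label": label, "start": tok["start"],
                             "end": tok["end"], "last_index": tok["index"]})
    return [{"label": e["label"], "word": text[e["start"]:e["end"]],
             "start": e["start"], "end": e["end"]} for e in entities]

sequence = "Merhaba! Benim adım Sinan. San Francisco'dan geliyorum"
ner_output = [
    {"entity": "B-PER", "index": 5, "word": "Sinan", "start": 20, "end": 25},
    {"entity": "B-LOC", "index": 7, "word": "San", "start": 27, "end": 30},
    {"entity": "I-LOC", "index": 8, "word": "Francisco", "start": 31, "end": 40},
]
print(group_bio(sequence, ner_output))
```

Slicing the original text by offsets, rather than joining the `word` fields, also keeps the original spacing and avoids `##` fragments when an entity is split into subwords.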

**Question Answering**

In [44]:
model_qa = BertForQuestionAnswering.from_pretrained('bert-base-uncased')
model_qa.to(device)

Some weights of BertForQuestionAnswering were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['qa_outputs.weight', 'qa_outputs.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


BertForQuestionAnswering(
  (bert): BertModel(
    (embeddings): BertEmbeddings(
      (word_embeddings): Embedding(30522, 768, padding_idx=0)
      (position_embeddings): Embedding(512, 768)
      (token_type_embeddings): Embedding(2, 768)
      (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
      (dropout): Dropout(p=0.1, inplace=False)
    )
    (encoder): BertEncoder(
      (layer): ModuleList(
        (0-11): 12 x BertLayer(
          (attention): BertAttention(
            (self): BertSelfAttention(
              (query): Linear(in_features=768, out_features=768, bias=True)
              (key): Linear(in_features=768, out_features=768, bias=True)
              (value): Linear(in_features=768, out_features=768, bias=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
            (output): BertSelfOutput(
              (dense): Linear(in_features=768, out_features=768, bias=True)
              (LayerNorm): LayerNorm((768,), eps=1e-12, elem

In [45]:
model_qa.qa_outputs

Linear(in_features=768, out_features=2, bias=True)
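`qa_outputs` maps each token's 768-dimensional vector to two logits: how likely that token is to start the answer and how likely it is to end it. A sketch of turning those logits into an answer span: softmax each set, then pick the start/end pair with the highest product of probabilities, subject to start <= end and a maximum answer length. This follows the question-answering pipeline's approach in spirit only; the helper names, the toy logits, and the length limit of 15 are assumptions, and the real pipeline also restricts candidates to context tokens and maps token indices back to character offsets.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def best_span(start_logits, end_logits, max_answer_len=15):
    """Return ((start, end), score) maximizing P(start) * P(end) with start <= end."""
    p_start, p_end = softmax(start_logits), softmax(end_logits)
    best, best_score = None, 0.0
    for s in range(len(p_start)):
        # Only consider ends at or after the start, within the length limit
        for e in range(s, min(s + max_answer_len, len(p_end))):
            score = p_start[s] * p_end[e]
            if score > best_score:
                best, best_score = (s, e), score
    return best, best_score

# Hypothetical token sequence and logits
tokens = ["[CLS]", "when", "was", "it", "built", "[SEP]", "between", "1887", "and", "1889", "[SEP]"]
start_logits = [0, 0, 0, 0, 0, 0, 5, 1, 0, 0, 0]
end_logits = [0, 0, 6, 0, 0, 0, 0, 1, 0, 4, 0]
(span_start, span_end), score = best_span(start_logits, end_logits)
print(" ".join(tokens[span_start:span_end + 1]), round(score, 3))
```

The strongest end logit (position 2) is ignored because it lies before every plausible start, yet it still absorbs most of the end probability, so the chosen span gets a modest score. The same effect suggests why a correct answer can come with a low score, as in the relativity example, when probability mass is spread over several candidate starts or ends.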

In [46]:
model_name = "deepset/roberta-base-squad2"
qa = pipeline(model=model_name, tokenizer=model_name, revision="v1.0", task="question-answering")

Some weights of the model checkpoint at deepset/roberta-base-squad2 were not used when initializing RobertaForQuestionAnswering: ['roberta.pooler.dense.weight', 'roberta.pooler.dense.bias']
- This IS expected if you are initializing RobertaForQuestionAnswering from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaForQuestionAnswering from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


deepset/roberta-base-squad2 is a RoBERTa-base model (not BERT, which is why the pipeline loads RobertaForQuestionAnswering) fine-tuned on SQuAD 2.0, the Stanford Question Answering Dataset. SQuAD 2.0 combines the answerable questions of SQuAD 1.1 with over 50,000 unanswerable ones, so a model trained on it must also learn when the context contains no answer. The model is published by deepset, an NLP company.


In [47]:
context = "The Eiffel Tower is one of the most famous landmarks in the world. Located in Paris, France, it was constructed between 1887 and 1889. It was initially criticized by some of France's leading artists and intellectuals for its design, but has become a global cultural icon of France and one of the most recognizable structures in the world. The Eiffel Tower is the most-visited paid monument in the world; nearly 7 million people ascended it in 2015."
question = "When was the Eiffel Tower constructed?"

# `qa` is the question-answering pipeline created above
answer = qa(question=question, context=context)
print(answer)


{'score': 0.8510423302650452, 'start': 112, 'end': 133, 'answer': 'between 1887 and 1889'}


In [49]:
context = "The theory of relativity, developed by Albert Einstein, is a fundamental theory in physics which describes the laws of gravitation and the forces of nature. It consists of two parts: the general theory of relativity, which provides a unified description of gravity as a geometric property of space and time, and the special theory of relativity, which describes the relationship between space and time. The theory has had a profound impact on the understanding of the universe and has been tested and confirmed in many scientific experiments."
question = "What are the two parts of the theory of relativity?"

# `qa` is the question-answering pipeline created above
answer = qa(question=question, context=context)
print(answer)

{'score': 0.004713078029453754, 'start': 183, 'end': 215, 'answer': 'the general theory of relativity'}
