In [21]:
import torch
from transformers import pipeline

In [2]:
question_answerer = pipeline('question-answering')

No model was supplied, defaulted to distilbert/distilbert-base-cased-distilled-squad and revision 564e9b5 (https://huggingface.co/distilbert/distilbert-base-cased-distilled-squad).
Using a pipeline without specifying a model name and revision in production is not recommended.
Device set to use cuda:0


In [4]:
question = "Which deep learning libraries back  Transformers?"

In [5]:
long_context = """
Transformers: State of the Art NLP

 Transformers provides thousands of pretrained models to perform tasks on texts such as classification, information extraction,
question answering, summarization, translation, text generation and more in over 100 languages.
Its aim is to make cutting-edge NLP easier to use for everyone.

 Transformers provides APIs to quickly download and use those pretrained models on a given text, fine-tune them on your own datasets and
then share them with the community on our model hub. At the same time, each python module defining an architecture is fully standalone and
can be modified to enable quick research experiments.

Why should I use transformers?

1. Easy-to-use state-of-the-art models:
  - High performance on NLU and NLG tasks.
  - Low barrier to entry for educators and practitioners.
  - Few user-facing abstractions with just three classes to learn.
  - A unified API for using all our pretrained models.
  - Lower compute costs, smaller carbon footprint:

2. Researchers can share trained models instead of always retraining.
  - Practitioners can reduce compute time and production costs.
  - Dozens of architectures with over 10,000 pretrained models, some in more than 100 languages.

3. Choose the right framework for every part of a model's lifetime:
  - Train state-of-the-art models in 3 lines of code.
  - Move a single model between TF2.0/PyTorch frameworks at will.
  - Seamlessly pick the right framework for training, evaluation and production.

4. Easily customize a model or an example to your needs:
  - We provide examples for each architecture to reproduce the results published by its original authors.
  - Model internals are exposed as consistently as possible.
  - Model files can be used independently of the library for quick experiments.

 Transformers is backed by the three most popular deep learning libraries — Jax, PyTorch and TensorFlow — with a seamless integration
between them. It's straightforward to train your models with one before loading them for inference with the other.
"""

In [6]:
question_answerer(question = question, context = long_context)

{'score': 0.9700287699743058,
 'start': 1887,
 'end': 1914,
 'answer': 'Jax, PyTorch and TensorFlow'}

In [7]:
from transformers import AutoTokenizer, AutoModelForQuestionAnswering
checkpoint = 'distilbert-base-cased-distilled-squad'
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForQuestionAnswering.from_pretrained(checkpoint)

In [8]:
inputs = tokenizer(question, long_context)
print(len(inputs['input_ids']))  # 456 tokens: above the 384-token limit used below (the pipeline's default max_seq_len; the model itself accepts up to 512)

456


In [9]:
# truncation='only_second' truncates only the second sequence (the context) and leaves the question intact
inputs = tokenizer(question, long_context, max_length=384, truncation='only_second')
print(tokenizer.decode(inputs['input_ids']))

[CLS] Which deep learning libraries back Transformers? [SEP] Transformers : State of the Art NLP Transformers provides thousands of pretrained models to perform tasks on texts such as classification, information extraction, question answering, summarization, translation, text generation and more in over 100 languages. Its aim is to make cutting - edge NLP easier to use for everyone. Transformers provides APIs to quickly download and use those pretrained models on a given text, fine - tune them on your own datasets and then share them with the community on our model hub. At the same time, each python module defining an architecture is fully standalone and can be modified to enable quick research experiments. Why should I use transformers? 1. Easy - to - use state - of - the - art models : - High performance on NLU and NLG tasks. - Low barrier to entry for educators and practitioners. - Few user - facing abstractions with just three classes to learn. - A unified API for using all our pre

In [12]:
# To avoid dropping any part of the context, split it into overlapping chunks by setting return_overflowing_tokens=True
sentence = 'We will be splitting this sentence as an example although it is not very long'
inputs = tokenizer(
    sentence, truncation=True, return_overflowing_tokens=True, max_length=5, stride=2  # stride = number of tokens shared by consecutive chunks
)
for ids in inputs['input_ids']:
    print(tokenizer.decode(ids))

[CLS] We will be [SEP]
[CLS] will be splitting [SEP]
[CLS] be splitting this [SEP]
[CLS] splitting this sentence [SEP]
[CLS] this sentence as [SEP]
[CLS] sentence as an [SEP]
[CLS] as an example [SEP]
[CLS] an example although [SEP]
[CLS] example although it [SEP]
[CLS] although it is [SEP]
[CLS] it is not [SEP]
[CLS] is not very [SEP]
[CLS] not very long [SEP]
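
The chunking above can be imitated without a tokenizer. This is a minimal sketch, assuming whitespace-separated words stand in for tokens and that `max_length` includes the two special tokens; `sliding_chunks` is a hypothetical helper, not part of the Transformers API, and the real tokenizer may differ in edge cases.

```python
def sliding_chunks(tokens, max_length, stride, num_special=2):
    """Split tokens into overlapping windows (simplified imitation of return_overflowing_tokens)."""
    window = max_length - num_special      # room left for real tokens in each chunk
    step = window - stride                 # how far each new window advances
    if window <= 0 or step <= 0:
        raise ValueError("max_length must leave room for tokens and stride must be smaller than the window")
    chunks = []
    start = 0
    while True:
        chunks.append(tokens[start:start + window])
        if start + window >= len(tokens):  # the end of the input is covered
            break
        start += step
    return chunks

sentence = 'We will be splitting this sentence as an example although it is not very long'
chunks = sliding_chunks(sentence.split(), max_length=5, stride=2)
for chunk in chunks:
    print('[CLS] ' + ' '.join(chunk) + ' [SEP]')
```

With `max_length=5` and `stride=2`, each chunk holds three words and advances by one word, which is why a 15-word sentence yields 13 chunks.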


In [13]:
print(inputs.keys())

KeysView({'input_ids': [[101, 1284, 1209, 1129, 102], [101, 1209, 1129, 15601, 102], [101, 1129, 15601, 1142, 102], [101, 15601, 1142, 5650, 102], [101, 1142, 5650, 1112, 102], [101, 5650, 1112, 1126, 102], [101, 1112, 1126, 1859, 102], [101, 1126, 1859, 1780, 102], [101, 1859, 1780, 1122, 102], [101, 1780, 1122, 1110, 102], [101, 1122, 1110, 1136, 102], [101, 1110, 1136, 1304, 102], [101, 1136, 1304, 1263, 102]], 'attention_mask': [[1, 1, 1, 1, 1], [1, 1, 1, 1, 1], [1, 1, 1, 1, 1], [1, 1, 1, 1, 1], [1, 1, 1, 1, 1], [1, 1, 1, 1, 1], [1, 1, 1, 1, 1], [1, 1, 1, 1, 1], [1, 1, 1, 1, 1], [1, 1, 1, 1, 1], [1, 1, 1, 1, 1], [1, 1, 1, 1, 1], [1, 1, 1, 1, 1]], 'overflow_to_sample_mapping': [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]})


In [15]:
print(inputs['overflow_to_sample_mapping'])  # 13 zeros: all 13 chunks come from the same input sentence (index 0)

[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]


In [16]:
# Another example: with several input sentences, the mapping tells which sentence each chunk came from
sentences = ['Hey there! how are you?',
             'Large Language Models are everywhere',
             'Technology is growing lightning fast']

inputs = tokenizer(sentences, truncation=True, return_overflowing_tokens=True, max_length=6, stride = 2)

print(inputs['overflow_to_sample_mapping'])

[0, 0, 0, 1, 1, 2, 2]
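
One way to use this mapping is to group chunk indices by the sentence they came from. This is a small sketch that reuses the list printed above; the name `chunks_per_sample` is illustrative only.

```python
from collections import defaultdict

overflow_to_sample_mapping = [0, 0, 0, 1, 1, 2, 2]  # copied from the output above

# Map each original sentence index to the indices of the chunks cut from it
chunks_per_sample = defaultdict(list)
for chunk_idx, sample_idx in enumerate(overflow_to_sample_mapping):
    chunks_per_sample[sample_idx].append(chunk_idx)

print(dict(chunks_per_sample))
```

Here the first sentence was split into three chunks and the other two into two chunks each.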


# Back to Long Context

In [17]:
inputs = tokenizer(
    question,
    long_context,
    stride = 128,
    max_length = 384,
    padding= 'longest',
    truncation = 'only_second',
    return_overflowing_tokens=True,
    return_offsets_mapping=True,
)

In [18]:
_ = inputs.pop('overflow_to_sample_mapping')
offsets = inputs.pop('offset_mapping')

inputs = inputs.convert_to_tensors('pt')
print(inputs['input_ids'].shape)

torch.Size([2, 384])
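
The `offset_mapping` popped above stores, for every token, the `(start_char, end_char)` pair locating it in the original text. The sketch below shows the idea on a toy string; it assumes a simple regex split in place of the real tokenizer, so the token boundaries are illustrative.

```python
import re

text = 'Jax, PyTorch and TensorFlow'

# Hypothetical stand-in for offset_mapping: (start_char, end_char) of each word or punctuation mark
offsets = [(m.start(), m.end()) for m in re.finditer(r'\w+|[^\w\s]', text)]
tokens = [text[s:e] for s, e in offsets]
print(offsets)
print(tokens)

# A span from token 2 to token 4 maps back to characters through
# the start of the first token and the end of the last token
start_char, _ = offsets[2]
_, end_char = offsets[4]
print(text[start_char:end_char])
```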


In [19]:
# One row of start logits and one row of end logits per chunk (2 chunks here)
outputs = model(**inputs)
start_logits = outputs.start_logits
end_logits = outputs.end_logits
print(start_logits.shape, end_logits.shape)

torch.Size([2, 384]) torch.Size([2, 384])


In [22]:
# Mask every token outside the context, plus the padding tokens
sequence_ids = inputs.sequence_ids()
# True for question and special tokens (sequence id != 1)
mask = [i != 1 for i in sequence_ids]
# Leave the [CLS] token unmasked: some models use it to signal that the chunk holds no answer
mask[0] = False
# Also mask the [PAD] tokens (attention_mask == 0)
mask = torch.logical_or(torch.tensor(mask)[None], (inputs['attention_mask'] == 0))
start_logits[mask] = -10000
end_logits[mask] = -10000
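
The masking logic can be traced by hand on a toy chunk. This is a plain-Python sketch under assumed values: the `sequence_ids`, `attention_mask`, and `logits` lists below are made up for illustration and do not come from the model.

```python
# Toy chunk layout: [CLS] q q [SEP] c c c [SEP] [PAD] [PAD]
sequence_ids = [None, 0, 0, None, 1, 1, 1, None, None, None]
attention_mask = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0]

mask = [sid != 1 for sid in sequence_ids]   # True = position cannot be part of the answer
mask[0] = False                             # keep [CLS] available
mask = [m or (a == 0) for m, a in zip(mask, attention_mask)]  # also hide padding

logits = [0.3, 1.2, 0.8, 0.1, 2.5, 1.9, 0.4, 0.2, 0.0, 0.0]   # made-up values
masked_logits = [-10000 if m else x for m, x in zip(mask, logits)]
print(mask)
print(masked_logits)
```

Only the `[CLS]` position and the three context positions keep their original logits.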

In [23]:
# Now we apply softmax to convert logits to probabilities
start_probabilities = torch.nn.functional.softmax(start_logits, dim=-1)
end_probabilities = torch.nn.functional.softmax(end_logits, dim=-1)
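
Setting masked logits to -10000 works because softmax turns them into probabilities that are effectively zero. This is a standard-library sketch with made-up logits; the `softmax` helper here is a local illustration, not the PyTorch function.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

toy_logits = [2.0, -10000, 0.5, -10000]   # positions 1 and 3 are masked
probs = softmax(toy_logits)
print(probs)
```

All of the probability mass ends up on the unmasked positions.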

In [24]:
# Score every (start, end) span as start_prob * end_prob, then keep the best span with start <= end
candidate_answers = []
for start_probs, end_probs in zip(start_probabilities, end_probabilities):
    scores = start_probs[:,None]*end_probs[None,:]
    idx = torch.triu(scores).argmax().item()

    start_idx = idx//scores.shape[1]
    end_idx = idx % scores.shape[1]
    score = scores[start_idx,end_idx].item()

    candidate_answers.append((start_idx, end_idx, score))
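
The same search can be written with explicit loops, which makes the `start <= end` constraint enforced by `torch.triu` easier to see. This is a sketch with made-up probabilities; `best_span` and its optional `max_answer_len` cap are hypothetical additions for illustration and are not used in the cell above.

```python
def best_span(start_probs, end_probs, max_answer_len=None):
    """Return (start, end, score) maximising start_probs[start] * end_probs[end] with start <= end."""
    best = (0, 0, 0.0)
    for start, p_start in enumerate(start_probs):
        for end in range(start, len(end_probs)):
            # Optional cap on the number of tokens in the answer (an assumption of this sketch)
            if max_answer_len is not None and end - start + 1 > max_answer_len:
                break
            score = p_start * end_probs[end]
            if score > best[2]:
                best = (start, end, score)
    return best

toy_start = [0.1, 0.6, 0.2, 0.1]
toy_end = [0.5, 0.1, 0.15, 0.25]
print(best_span(toy_start, toy_end))
print(best_span(toy_start, toy_end, max_answer_len=2))
```

In the toy data the most likely end position (index 0) comes before the most likely start position (index 1), so taking the two argmaxes independently would give an invalid span; scoring pairs jointly avoids that.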

In [25]:
print(candidate_answers)

[(0, 0, 0.6493729948997498), (167, 178, 0.9697459936141968)]


In [26]:
for candidate, offset in zip(candidate_answers, offsets):
    start_token, end_token, score = candidate
    start_char,_ = offset[start_token]
    _, end_char= offset[end_token]
    answer = long_context[start_char:end_char]
    result = {
        'answer': answer,
        'start': start_char,
        'end': end_char,
        'score': score,
    }
    print(result)

{'answer': '', 'start': 0, 'end': 0, 'score': 0.6493729948997498}
{'answer': 'Jax, PyTorch and TensorFlow', 'start': 1887, 'end': 1914, 'score': 0.9697459936141968}
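
To finish, one candidate has to be chosen across the chunks. This is a minimal sketch that reuses the two results printed above; it assumes an empty answer (the `[CLS]` span) means the chunk contains no answer, which is a design choice of this sketch rather than something the notebook implements.

```python
results = [
    {'answer': '', 'start': 0, 'end': 0, 'score': 0.6493729948997498},
    {'answer': 'Jax, PyTorch and TensorFlow', 'start': 1887, 'end': 1914, 'score': 0.9697459936141968},
]

# Assumption: an empty answer means "nothing found in this chunk", so skip it
non_empty = [r for r in results if r['answer']]
best = max(non_empty, key=lambda r: r['score']) if non_empty else None
print(best)
```

The selected span has the same answer text and character offsets as the `question_answerer` pipeline output at the top of the notebook.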
