# Using the question-answering pipeline

In [143]:
from transformers import pipeline

question_answerer = pipeline("question-answering")
context = """
🤗 Transformers is backed by the three most popular deep learning libraries — Jax, PyTorch, and TensorFlow — with a seamless integration
between them. It's straightforward to train your models with one before loading them for inference with the other.
"""
question = "Which deep learning libraries back 🤗 Transformers?"
question_answerer(question=question, context=context, top_k=5)

No model was supplied, defaulted to distilbert/distilbert-base-cased-distilled-squad and revision 626af31 (https://huggingface.co/distilbert/distilbert-base-cased-distilled-squad).
Using a pipeline without specifying a model name and revision in production is not recommended.


[{'score': 0.9802603125572205,
  'start': 78,
  'end': 106,
  'answer': 'Jax, PyTorch, and TensorFlow'},
 {'score': 0.008247792720794678,
  'start': 78,
  'end': 108,
  'answer': 'Jax, PyTorch, and TensorFlow —'},
 {'score': 0.0013677021488547325,
  'start': 78,
  'end': 90,
  'answer': 'Jax, PyTorch'},
 {'score': 0.00038108628359623253,
  'start': 83,
  'end': 106,
  'answer': 'PyTorch, and TensorFlow'},
 {'score': 0.000216845452087, 'start': 96, 'end': 106, 'answer': 'TensorFlow'}]

# Let’s see how it does all of this!
1. Start by tokenizing our input.
2. Then send it through the model.

In [101]:
from transformers import AutoTokenizer, AutoModelForQuestionAnswering

model_checkpoint = "distilbert-base-cased-distilled-squad"
tokenizer = AutoTokenizer.from_pretrained(model_checkpoint)
model = AutoModelForQuestionAnswering.from_pretrained(model_checkpoint)

inputs = tokenizer(question, context, return_tensors="pt")
outputs = model(**inputs)

We tokenize the question and the context **as a pair** (see https://huggingface.co/learn/nlp-course/chapter6/3b?fw=pt).

The model has been trained **to predict the index of the token starting the answer (here 23, the token `Jax`) and the index of the token where the answer ends (here 35, the token `##low`)**. This is why those models don’t return one tensor of logits but **two**: one for the logits corresponding to the start token of the answer, and one for the logits corresponding to the end token of the answer.

In [102]:
outputs

QuestionAnsweringModelOutput(loss=None, start_logits=tensor([[-4.4952, -6.4454, -4.7115, -7.0968, -7.0726, -7.4981, -5.5397, -4.1368,
         -5.9199, -5.4193, -1.5920, -1.0857, -5.0981, -2.9331, -3.4070,  2.2467,
          5.1563, -1.3602, -2.2209, -0.9686, -4.8112, -2.2527,  1.4383, 10.1211,
         -1.5311,  2.2685, -1.8951, -2.2108, -4.2142, -2.5571, -2.3252, -2.6046,
          1.7047, -1.9867, -1.7211, -0.5415, -2.0239, -4.4246, -5.1012, -4.4966,
         -7.8940, -6.7200, -4.6759, -6.3278, -4.8339, -5.1839, -3.3724, -7.4120,
         -8.1542, -4.4871, -7.4659, -4.3293, -4.2293, -3.1903, -7.9467, -5.2665,
         -7.5902, -5.0570, -7.4476, -7.9083, -6.5951, -7.4061, -8.8821, -7.6749,
         -6.9879, -7.0466, -5.4193]], grad_fn=<CloneBackward0>), end_logits=tensor([[-2.3958e+00, -7.0978e+00, -7.0745e+00, -6.3676e+00, -5.9532e+00,
         -7.9585e+00, -7.1869e+00, -3.6494e+00, -6.9677e+00, -5.1421e+00,
         -3.1757e+00, -1.1649e+00, -7.0748e+00, -5.2875e+00, -6.8611e+00,
 

In [103]:
start_logits = outputs.start_logits
end_logits = outputs.end_logits
print(start_logits.shape, end_logits.shape)

torch.Size([1, 67]) torch.Size([1, 67])


Exciting! :D

To convert those logits into probabilities, we will apply a softmax function. Before that, we need to mask the indices that are not part of the context:
1. Masking strategy: before applying the softmax to the model’s output logits, mask the tokens that cannot contain the answer.
   - Masking the question and [SEP] tokens: the answer is expected to be found within the context, so the question tokens and the separator tokens are ignored.
   - Keeping the [CLS] token: although [CLS] is generally used for classification tasks, some models use it to indicate that the answer is not present in the provided context, so it stays unmasked.

2. Implementing the mask with large negative numbers: replacing a logit (the raw score the model outputs for a token) with a large negative value such as -10000 removes that token from consideration, because its softmax probability becomes effectively zero.

3. Softmax application: after masking, the softmax converts the logits into a probability distribution. Only the context tokens (and [CLS]) keep non-negligible probabilities, which indicate how likely each token is to be the start (or the end) of the answer.
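To see the three steps above on a tiny example, here is a minimal sketch in plain Python (no torch, so every step is visible). The helper name `masked_softmax` and the toy logits are made up for illustration; only the fill value -10000 comes from the notes above.

```python
import math

def masked_softmax(logits, mask, fill_value=-10000.0):
    """Softmax over `logits`, ignoring positions where `mask` is True."""
    # Replace masked logits with a large negative number.
    filled = [fill_value if m else x for x, m in zip(logits, mask)]
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(filled)
    exps = [math.exp(x - top) for x in filled]
    total = sum(exps)
    return [e / total for e in exps]

# Toy logits for five tokens: [CLS], one question token, [SEP], two context tokens.
toy_logits = [-4.5, 3.0, -5.4, 10.1, 2.2]
toy_mask = [False, True, True, False, False]  # True = masked out

probs = masked_softmax(toy_logits, toy_mask)
print(probs)
```

The masked positions end up with probability 0 even though one of them had a fairly high raw logit, and the remaining probabilities still sum to 1.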

In [104]:
inputs.sequence_ids()  # one id per token: None for special tokens, 0 for the question, 1 for the context

[None,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 None,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 None]

In [105]:
inputs

{'input_ids': tensor([[  101,  5979,  1996,  3776,  9818,  1171,   100, 25267,   136,   102,
           100, 25267,  1110,  5534,  1118,  1103,  1210,  1211,  1927,  1996,
          3776,  9818,   783, 13612,   117,   153,  1183,  1942,  1766,  1732,
           117,  1105,  5157, 21484,  2271,  6737,   783,  1114,   170,  2343,
          1306,  2008,  9111,  1206,  1172,   119,  1135,   112,   188, 21546,
          1106,  2669,  1240,  3584,  1114,  1141,  1196, 10745,  1172,  1111,
          1107, 16792,  1114,  1103,  1168,   119,   102]]), 'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
         1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
         1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]])}

In [148]:
print(inputs.tokens())
len(inputs.tokens())

['[CLS]', 'Which', 'deep', 'learning', 'libraries', 'back', '[UNK]', 'Transformers', '?', '[SEP]', '[UNK]', 'Transformers', 'is', 'backed', 'by', 'the', 'three', 'most', 'popular', 'deep', 'learning', 'libraries', '—', 'Jax', ',', 'P', '##y', '##T', '##or', '##ch', ',', 'and', 'Ten', '##sor', '##F', '##low', '—', 'with', 'a', 'sea', '##m', '##less', 'integration', 'between', 'them', '.', 'It', "'", 's', 'straightforward', 'to', 'train', 'your', 'models', 'with', 'one', 'before', 'loading', 'them', 'for', 'in', '##ference', 'with', 'the', 'other', '.', '[SEP]']


67

In [107]:
inputs.tokens()[9]

'[SEP]'

In [108]:
import torch

sequence_ids = inputs.sequence_ids()
# Mask everything apart from the tokens of the context
mask = [i != 1 for i in sequence_ids]
# Unmask the [CLS] token
mask[0] = False
mask = torch.tensor(mask)[None]
mask

tensor([[False,  True,  True,  True,  True,  True,  True,  True,  True,  True,
         False, False, False, False, False, False, False, False, False, False,
         False, False, False, False, False, False, False, False, False, False,
         False, False, False, False, False, False, False, False, False, False,
         False, False, False, False, False, False, False, False, False, False,
         False, False, False, False, False, False, False, False, False, False,
         False, False, False, False, False, False,  True]])

Now that we have built a mask for the positions we don’t want to predict, we can apply it to the logits and then apply the softmax:

In [109]:
start_logits[mask] = -10000
end_logits[mask] = -10000

In [110]:
start_logits

tensor([[-4.4952e+00, -1.0000e+04, -1.0000e+04, -1.0000e+04, -1.0000e+04,
         -1.0000e+04, -1.0000e+04, -1.0000e+04, -1.0000e+04, -1.0000e+04,
         -1.5920e+00, -1.0857e+00, -5.0981e+00, -2.9331e+00, -3.4070e+00,
          2.2467e+00,  5.1563e+00, -1.3602e+00, -2.2209e+00, -9.6861e-01,
         -4.8112e+00, -2.2527e+00,  1.4383e+00,  1.0121e+01, -1.5311e+00,
          2.2685e+00, -1.8951e+00, -2.2108e+00, -4.2142e+00, -2.5571e+00,
         -2.3252e+00, -2.6046e+00,  1.7047e+00, -1.9867e+00, -1.7211e+00,
         -5.4148e-01, -2.0239e+00, -4.4246e+00, -5.1012e+00, -4.4966e+00,
         -7.8940e+00, -6.7200e+00, -4.6759e+00, -6.3278e+00, -4.8339e+00,
         -5.1839e+00, -3.3724e+00, -7.4120e+00, -8.1542e+00, -4.4871e+00,
         -7.4659e+00, -4.3293e+00, -4.2293e+00, -3.1903e+00, -7.9467e+00,
         -5.2665e+00, -7.5902e+00, -5.0570e+00, -7.4476e+00, -7.9083e+00,
         -6.5951e+00, -7.4061e+00, -8.8821e+00, -7.6749e+00, -6.9879e+00,
         -7.0466e+00, -1.0000e+04]], g

In [111]:
end_logits

tensor([[-2.3958e+00, -1.0000e+04, -1.0000e+04, -1.0000e+04, -1.0000e+04,
         -1.0000e+04, -1.0000e+04, -1.0000e+04, -1.0000e+04, -1.0000e+04,
         -3.1757e+00, -1.1649e+00, -7.0748e+00, -5.2875e+00, -6.8611e+00,
         -5.1769e+00,  3.7892e+00, -4.4408e+00, -7.6688e-01, -3.9180e+00,
         -2.1634e+00,  1.8116e+00, -1.4678e+00,  2.0508e+00,  1.5437e-03,
         -1.5531e+00, -6.9469e-01, -1.3466e+00, -1.6879e+00,  4.0826e+00,
          1.1467e+00, -3.7881e-01,  6.0774e-01,  1.2281e+00,  5.8202e-01,
          1.0657e+01,  5.8794e+00, -5.7342e+00, -7.0719e+00, -6.8077e+00,
         -7.1513e+00, -5.3228e+00, -3.4305e+00, -4.2575e+00,  2.2268e+00,
         -4.1297e-01, -6.8944e+00, -7.9381e+00, -8.3298e+00, -5.6078e+00,
         -8.9589e+00, -5.5772e+00, -5.7309e+00, -1.9592e+00, -7.8078e+00,
         -2.3823e+00, -7.2457e+00, -6.1642e+00, -4.2830e+00, -8.0948e+00,
         -8.0364e+00, -4.5566e+00, -7.6585e+00, -7.3241e+00, -2.2402e+00,
         -1.8462e+00, -1.0000e+04]], g

In [112]:
start_probabilities = torch.nn.functional.softmax(start_logits, dim=-1)[0]
end_probabilities = torch.nn.functional.softmax(end_logits, dim=-1)[0]
start_probabilities  #torch.Size([67])

tensor([4.4531e-07, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00,
        0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 8.1185e-06, 1.3470e-05,
        2.4368e-07, 2.1236e-06, 1.3220e-06, 3.7722e-04, 6.9219e-03, 1.0237e-05,
        4.3289e-06, 1.5143e-05, 3.2463e-07, 4.1933e-06, 1.6808e-04, 9.9179e-01,
        8.6288e-06, 3.8557e-04, 5.9956e-06, 4.3725e-06, 5.8977e-07, 3.0929e-06,
        3.8998e-06, 2.9493e-06, 2.1940e-04, 5.4713e-06, 7.1354e-06, 2.3212e-05,
        5.2711e-06, 4.7788e-07, 2.4291e-07, 4.4467e-07, 1.4879e-08, 4.8133e-08,
        3.7169e-07, 7.1242e-08, 3.1735e-07, 2.2365e-07, 1.3685e-06, 2.4093e-08,
        1.1470e-08, 4.4891e-07, 2.2828e-08, 5.2562e-07, 5.8092e-07, 1.6419e-06,
        1.4114e-08, 2.0591e-07, 2.0161e-08, 2.5390e-07, 2.3251e-08, 1.4667e-08,
        5.4533e-08, 2.4235e-08, 5.5390e-09, 1.8524e-08, 3.6818e-08, 3.4721e-08,
        0.0000e+00], grad_fn=<SelectBackward0>)

In [113]:
end_probabilities  #torch.Size([67])

tensor([2.1185e-06, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00,
        0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 9.7129e-07, 7.2546e-06,
        1.9679e-08, 1.1754e-07, 2.4366e-08, 1.3130e-07, 1.0284e-03, 2.7411e-07,
        1.0801e-05, 4.6232e-07, 2.6730e-06, 1.4233e-04, 5.3586e-06, 1.8078e-04,
        2.3292e-05, 4.9208e-06, 1.1610e-05, 6.0494e-06, 4.3002e-06, 1.3790e-03,
        7.3201e-05, 1.5923e-05, 4.2704e-05, 7.9413e-05, 4.1620e-05, 9.8838e-01,
        8.3161e-03, 7.5197e-08, 1.9735e-08, 2.5704e-08, 1.8229e-08, 1.1346e-07,
        7.5278e-07, 3.2926e-07, 2.1558e-04, 1.5388e-05, 2.3568e-08, 8.2993e-09,
        5.6096e-09, 8.5332e-08, 2.9904e-09, 8.7980e-08, 7.5446e-08, 3.2784e-06,
        9.4549e-09, 2.1473e-06, 1.6586e-08, 4.8915e-08, 3.2096e-07, 7.0959e-09,
        7.5224e-09, 2.4413e-07, 1.0978e-08, 1.5336e-08, 2.4753e-06, 3.6704e-06,
        0.0000e+00], grad_fn=<SelectBackward0>)

In [114]:
start_probabilities[:, None].shape

torch.Size([67, 1])

In [115]:
end_probabilities[None, :].shape

torch.Size([1, 67])

We could take the argmax of the start and end probabilities, but **we might end up with a start index that is greater than the end index**, so we need to take a few more precautions. We will compute the probability of each possible pair (start_index, end_index) where start_index <= end_index, taken as the product start_probabilities[start_index] * end_probabilities[end_index], then keep the pair with the highest probability.
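A small sketch of why the two independent argmaxes can go wrong, and what searching over valid pairs looks like. This is a brute-force version in plain Python with made-up probabilities; `best_span` is a hypothetical helper, not something from the library.

```python
def best_span(start_probs, end_probs):
    """Return (start, end, score) maximizing start_probs[s] * end_probs[e] with s <= e."""
    best = (0, 0, 0.0)
    for s, p_start in enumerate(start_probs):
        for e in range(s, len(end_probs)):  # only ends at or after the start
            score = p_start * end_probs[e]
            if score > best[2]:
                best = (s, e, score)
    return best

# Toy case where the two independent argmaxes disagree on the order.
toy_start = [0.1, 0.2, 0.1, 0.6]
toy_end = [0.1, 0.6, 0.2, 0.1]

naive = (toy_start.index(max(toy_start)), toy_end.index(max(toy_end)))
print(naive)  # the start index (3) comes after the end index (1)

valid = best_span(toy_start, toy_end)
print(valid)
```

Here the naive pair is not a valid span, while the search over pairs with start <= end returns the span starting and ending at token 1.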

In [118]:
#test
test = torch.tensor([[1], [2], [3]]) * torch.tensor([10, 100,1000])
test

tensor([[  10,  100, 1000],
        [  20,  200, 2000],
        [  30,  300, 3000]])

New! :D (torch.triu())

In [121]:
score_test = torch.triu(test)
score_test

tensor([[  10,  100, 1000],
        [   0,  200, 2000],
        [   0,    0, 3000]])

In [122]:
max_index_test = score_test.argmax().item()
max_index_test

8

POINT: to recover the position of a value in the original 2D matrix from its index in the flattened matrix, divide the flat index by the number of columns of the original matrix. The quotient is the row number and the remainder is the column number.
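The same rule in one line of plain Python, using the built-in `divmod`, which returns the quotient and the remainder together. The helper name `unravel` is only for illustration.

```python
def unravel(flat_index, n_cols):
    """Convert an index into a flattened row-major matrix back to (row, column)."""
    return divmod(flat_index, n_cols)

print(unravel(8, 3))      # → (2, 2), the 3x3 test matrix above
print(unravel(1576, 67))  # → (23, 35), a matrix with 67 columns
```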

In [123]:
start_index_test = max_index_test // score_test.shape[1]
start_index_test

2

In [124]:
end_index_test = max_index_test % score_test.shape[1]
end_index_test

2

In [117]:
# Cool! Note how indexing with None adds a dimension to the tensors :D
scores = start_probabilities[:, None] * end_probabilities[None, :] #size(67, 67)

Then we’ll mask the values where start_index > end_index by setting them to 0 (the other probabilities are all positive numbers). The torch.triu() function returns the upper triangular part of the 2D tensor passed as an argument, so it will do that masking for us:

In [120]:
scores = torch.triu(scores)

In [146]:
max_index = scores.argmax().item()
start_index = max_index // scores.shape[1]
end_index = max_index % scores.shape[1]
print(scores[start_index, end_index])

tensor(0.9803, grad_fn=<SelectBackward0>)


In [147]:
start_index

23

#-------------------------EXAM------------------------

Try it out! Compute the start and end indices for the five most likely answers.

In [136]:
flattens = torch.flatten(scores)
# Named sorted_scores to avoid shadowing the built-in sorted()
sorted_scores, indices = torch.sort(flattens, descending=True)
indices[:5]

tensor([1576, 1577, 1107, 1570, 1710])

In [139]:
for max_index_ex in indices[:5]:
    start_index_ex = max_index_ex // scores.shape[1]
    end_index_ex = max_index_ex % scores.shape[1]
    print(scores[start_index_ex, end_index_ex])

tensor(0.9803, grad_fn=<SelectBackward0>)
tensor(0.0082, grad_fn=<SelectBackward0>)
tensor(0.0068, grad_fn=<SelectBackward0>)
tensor(0.0014, grad_fn=<SelectBackward0>)
tensor(0.0004, grad_fn=<SelectBackward0>)


#---------------------END------------------------
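For comparison, the same "top k spans" idea can be sketched without sorting the whole flattened matrix. This plain-Python version uses `heapq.nlargest` on a made-up 3x3 score matrix; `top_k_spans` and `toy_scores` are illustrative names, not part of the notebook's real data.

```python
import heapq

def top_k_spans(scores, k):
    """Return the k highest-scoring (start, end, score) triples with start <= end."""
    n_cols = len(scores[0])
    # Flatten only the upper triangle, remembering each entry's flat index.
    flat = [
        (scores[s][e], s * n_cols + e)
        for s in range(len(scores))
        for e in range(s, n_cols)
    ]
    best = heapq.nlargest(k, flat)
    # Convert each flat index back to (row, column) = (start, end).
    return [(idx // n_cols, idx % n_cols, score) for score, idx in best]

toy_scores = [
    [0.01, 0.06, 0.02],
    [0.00, 0.12, 0.04],
    [0.00, 0.00, 0.02],
]
top_two = top_k_spans(toy_scores, 2)
print(top_two)
```

On real tensors, `torch.topk(scores.flatten(), k=5)` should give the same ranking as the full sort used above.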

In [149]:
inputs_with_offsets = tokenizer(question, context, return_offsets_mapping=True)
offsets = inputs_with_offsets["offset_mapping"]

start_char, _ = offsets[start_index]
_, end_char = offsets[end_index]
answer = context[start_char:end_char]

results = {"answer": answer,
           "start": start_char,
           "end": end_char,
           "score": scores[start_index, end_index].item()}

results

{'answer': 'Jax, PyTorch, and TensorFlow',
 'start': 78,
 'end': 106,
 'score': 0.9802601933479309}
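What the offsets do, on a toy example: each token carries the (start_char, end_char) range it came from, so a token span maps back to a substring of the original text. The text and offsets below are written by hand for illustration (they are not real tokenizer output), and `span_text` is a hypothetical helper.

```python
# Hand-written toy data: five "tokens" with their (start_char, end_char) offsets.
toy_text = "Jax, PyTorch and TensorFlow"
toy_offsets = [(0, 3), (3, 4), (5, 12), (13, 16), (17, 27)]

def span_text(text, offsets, start_token, end_token):
    """Map an inclusive token span back to the substring of `text` it covers."""
    start_char, _ = offsets[start_token]  # first character of the first token
    _, end_char = offsets[end_token]      # one past the last character of the last token
    return text[start_char:end_char]

print(span_text(toy_text, toy_offsets, 2, 4))
```

Because the slice goes from the first token's start to the last token's end, the spaces and punctuation between tokens are recovered for free.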

#--------------------------EXAM----------------------------------

Try it out! Use the best scores you computed earlier to show the five most likely answers. 

In [157]:
results_ex = []
for max_index_ex in indices[:5]:
    start_index_ex = max_index_ex // scores.shape[1]
    end_index_ex = max_index_ex % scores.shape[1]

    start_char_ex, _ = offsets[start_index_ex]
    _, end_char_ex = offsets[end_index_ex]

    answer_ex = context[start_char_ex: end_char_ex]

    score_ex = scores[start_index_ex, end_index_ex]

    results_ex.append({'answer': answer_ex,
                       "start": start_char_ex,
                       "end": end_char_ex,
                       "score": score_ex})

results_ex

[{'answer': 'Jax, PyTorch, and TensorFlow',
  'start': 78,
  'end': 106,
  'score': tensor(0.9803, grad_fn=<SelectBackward0>)},
 {'answer': 'Jax, PyTorch, and TensorFlow —',
  'start': 78,
  'end': 108,
  'score': tensor(0.0082, grad_fn=<SelectBackward0>)},
 {'answer': 'three most popular deep learning libraries — Jax, PyTorch, and TensorFlow',
  'start': 33,
  'end': 106,
  'score': tensor(0.0068, grad_fn=<SelectBackward0>)},
 {'answer': 'Jax, PyTorch',
  'start': 78,
  'end': 90,
  'score': tensor(0.0014, grad_fn=<SelectBackward0>)},
 {'answer': 'PyTorch, and TensorFlow',
  'start': 83,
  'end': 106,
  'score': tensor(0.0004, grad_fn=<SelectBackward0>)}]

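Almost the same! =D Four of the five answers match the pipeline output. Only the third candidate here ('three most popular deep learning libraries — Jax, PyTorch, and TensorFlow') is missing from the pipeline's list, which has 'TensorFlow' in fifth place instead.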
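One plausible explanation for a long candidate such as 'three most popular deep learning libraries — Jax, PyTorch, and TensorFlow' not showing up in the pipeline's own top 5: the pipeline limits the answer length (its `max_answer_len` argument, 15 tokens by default as far as we can tell), while the code above does not. That candidate runs from token 16 to token 35, i.e. 20 tokens. The sketch below shows the effect of such a limit on made-up probabilities; `best_spans_with_limit` is a hypothetical helper, not the pipeline's actual code.

```python
def best_spans_with_limit(start_probs, end_probs, max_answer_len, k=3):
    """Top-k (start, end, score) spans with start <= end and at most max_answer_len tokens."""
    spans = []
    for s, p_start in enumerate(start_probs):
        # The last allowed end index keeps the span within the length limit.
        last_end = min(len(end_probs), s + max_answer_len)
        for e in range(s, last_end):
            spans.append((s, e, p_start * end_probs[e]))
    return sorted(spans, key=lambda span: span[2], reverse=True)[:k]

toy_start = [0.3, 0.1, 0.6, 0.0]
toy_end = [0.0, 0.1, 0.1, 0.8]

unlimited = best_spans_with_limit(toy_start, toy_end, max_answer_len=4, k=2)
limited = best_spans_with_limit(toy_start, toy_end, max_answer_len=2, k=2)
print(unlimited)
print(limited)
```

With the tighter limit, the long span (0, 3) drops out of the top 2 and a shorter candidate takes its place, which is the same kind of difference seen between the two lists above.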

# Handling long contexts

In [158]:
long_context = """
🤗 Transformers: State of the Art NLP

🤗 Transformers provides thousands of pretrained models to perform tasks on texts such as classification, information extraction,
question answering, summarization, translation, text generation and more in over 100 languages.
Its aim is to make cutting-edge NLP easier to use for everyone.

🤗 Transformers provides APIs to quickly download and use those pretrained models on a given text, fine-tune them on your own datasets and
then share them with the community on our model hub. At the same time, each python module defining an architecture is fully standalone and
can be modified to enable quick research experiments.

Why should I use transformers?

1. Easy-to-use state-of-the-art models:
  - High performance on NLU and NLG tasks.
  - Low barrier to entry for educators and practitioners.
  - Few user-facing abstractions with just three classes to learn.
  - A unified API for using all our pretrained models.
  - Lower compute costs, smaller carbon footprint:

2. Researchers can share trained models instead of always retraining.
  - Practitioners can reduce compute time and production costs.
  - Dozens of architectures with over 10,000 pretrained models, some in more than 100 languages.

3. Choose the right framework for every part of a model's lifetime:
  - Train state-of-the-art models in 3 lines of code.
  - Move a single model between TF2.0/PyTorch frameworks at will.
  - Seamlessly pick the right framework for training, evaluation and production.

4. Easily customize a model or an example to your needs:
  - We provide examples for each architecture to reproduce the results published by its original authors.
  - Model internals are exposed as consistently as possible.
  - Model files can be used independently of the library for quick experiments.

🤗 Transformers is backed by the three most popular deep learning libraries — Jax, PyTorch and TensorFlow — with a seamless integration
between them. It's straightforward to train your models with one before loading them for inference with the other.
"""
question_answerer(question=question, context=long_context)

{'score': 0.9714871048927307,
 'start': 1892,
 'end': 1919,
 'answer': 'Jax, PyTorch and TensorFlow'}

In [159]:
inputs = tokenizer(question, long_context)
print(len(inputs["input_ids"]))

461


In [160]:
inputs = tokenizer(
    question,
    long_context,
    stride=128,
    max_length=384,
    padding="longest",
    truncation="only_second",
    return_overflowing_tokens=True,
    return_offsets_mapping=True,
)

In [161]:
_ = inputs.pop("overflow_to_sample_mapping")
offsets = inputs.pop("offset_mapping")

inputs = inputs.convert_to_tensors("pt")
print(inputs["input_ids"].shape)

torch.Size([2, 384])
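Why two rows? A rough sketch of the sliding-window arithmetic, assuming `stride` is the number of tokens shared by two consecutive windows. The numbers are derived from the outputs above under that assumption: 461 tokens in total, of which 11 are the question plus special tokens, leaving 450 context tokens and 384 - 11 = 373 context slots per window. `chunk_spans` is a hypothetical helper, not the tokenizer's real implementation.

```python
def chunk_spans(n_context_tokens, window, stride):
    """Return (start, end) token ranges of overlapping windows over the context.

    `stride` is the number of tokens shared by two consecutive windows.
    """
    if stride >= window:
        raise ValueError("stride must be smaller than the window size")
    spans = []
    start = 0
    while True:
        end = min(start + window, n_context_tokens)
        spans.append((start, end))
        if end == n_context_tokens:
            break
        # Step forward, keeping `stride` tokens of overlap with the previous window.
        start += window - stride
    return spans

windows = chunk_spans(450, window=373, stride=128)
print(windows)
```

Under these assumptions the context splits into two windows that overlap by 128 tokens, which agrees with the two rows in the shape printed above. The second window is shorter, which is why padding is needed.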


In [162]:
inputs

{'input_ids': tensor([[  101,  5979,  1996,  3776,  9818,  1171,   100, 25267,   136,   102,
           100, 25267,   131,  1426,  1104,  1103,  2051, 21239,  2101,   100,
         25267,  2790,  4674,  1104,  3073,  4487,  9044,  3584,  1106,  3870,
          8249,  1113,  6685,  1216,  1112,  5393,   117,  1869, 16026,   117,
          2304, 10937,   117,  7584,  7317,  2734,   117,  5179,   117,  3087,
          3964,  1105,  1167,  1107,  1166,  1620,  3483,   119,  2098,  6457,
          1110,  1106,  1294,  5910,   118,  2652, 21239,  2101,  5477,  1106,
          1329,  1111,  2490,   119,   100, 25267,  2790, 20480,  1116,  1106,
          1976,  9133,  1105,  1329,  1343,  3073,  4487,  9044,  3584,  1113,
           170,  1549,  3087,   117,  2503,   118,  9253,  1172,  1113,  1240,
          1319,  2233, 27948,  1105,  1173,  2934,  1172,  1114,  1103,  1661,
          1113,  1412,  2235, 10960,   119,  1335,  1103,  1269,  1159,   117,
          1296,   185, 25669,  8613, 1

In [163]:
outputs = model(**inputs)

start_logits = outputs.start_logits
end_logits = outputs.end_logits
print(start_logits.shape, end_logits.shape)

torch.Size([2, 384]) torch.Size([2, 384])


In [164]:
sequence_ids = inputs.sequence_ids()
# Mask everything apart from the tokens of the context
mask = [i != 1 for i in sequence_ids]
# Unmask the [CLS] token
mask[0] = False
# Mask all the [PAD] tokens
mask = torch.logical_or(torch.tensor(mask)[None], (inputs["attention_mask"] == 0))

start_logits[mask] = -10000
end_logits[mask] = -10000
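The `logical_or` line combines two conditions: a position is masked if it is outside the context, or if it is padding. Here is a plain-Python sketch with a made-up 7-token layout; the same "not context" mask is reused for every row, as in the cell above. `combine_masks` is a hypothetical helper.

```python
def combine_masks(not_context, attention_mask_rows):
    """For each row, mask a position if it is outside the context OR is padding."""
    return [
        [nc or (att == 0) for nc, att in zip(not_context, row)]
        for row in attention_mask_rows
    ]

# Toy layout: [CLS] q [SEP] c c c [SEP], with [CLS] left unmasked.
toy_not_context = [False, True, True, False, False, False, True]
# The second row ends with two padding positions (attention_mask == 0).
toy_attention = [
    [1, 1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1, 0, 0],
]

combined = combine_masks(toy_not_context, toy_attention)
for row in combined:
    print(row)
```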

In [165]:
start_probabilities = torch.nn.functional.softmax(start_logits, dim=-1)
end_probabilities = torch.nn.functional.softmax(end_logits, dim=-1)

In [166]:
candidates = []
for start_probs, end_probs in zip(start_probabilities, end_probabilities):
    scores = start_probs[:, None] * end_probs[None, :]
    idx = torch.triu(scores).argmax().item()

    start_idx = idx // scores.shape[1]
    end_idx = idx % scores.shape[1]
    score = scores[start_idx, end_idx].item()
    candidates.append((start_idx, end_idx, score))

print(candidates)

[(0, 18, 0.33867067098617554), (173, 184, 0.9714868664741516)]


In [167]:
for candidate, offset in zip(candidates, offsets):
    start_token, end_token, score = candidate
    start_char, _ = offset[start_token]
    _, end_char = offset[end_token]
    answer = long_context[start_char:end_char]
    result = {"answer": answer, "start": start_char, "end": end_char, "score": score}
    print(result)

{'answer': '\n🤗 Transformers: State of the Art NLP', 'start': 0, 'end': 37, 'score': 0.33867067098617554}
{'answer': 'Jax, PyTorch and TensorFlow', 'start': 1892, 'end': 1919, 'score': 0.9714868664741516}
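The last step is presumably to keep the best candidate across all chunks. A minimal sketch, with the two results copied from the output above:

```python
# Candidate answers as printed above, one per chunk of the long context.
chunk_results = [
    {"answer": "\n🤗 Transformers: State of the Art NLP", "start": 0, "end": 37,
     "score": 0.33867067098617554},
    {"answer": "Jax, PyTorch and TensorFlow", "start": 1892, "end": 1919,
     "score": 0.9714868664741516},
]

# Keep the candidate with the highest score across all chunks.
best = max(chunk_results, key=lambda result: result["score"])
print(best["answer"])
```

The winner is the candidate from the second chunk, which matches the answer and character positions the pipeline returned for the long context.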
