<a href="https://colab.research.google.com/github/AlinZohari/InformationExtraction/blob/starlink_gen2/003_SQuAD_TuneQAmodel.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Fine-Tuning QA model

This notebook is run in Google Colab to make use of its GPU.

References:
1. Hugging Face - [Question Answering Task Guide](https://huggingface.co/docs/transformers/tasks/question_answering)
2. Hugging Face [notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/question_answering-tf.ipynb) on Question Answering on SQuAD
3. Creating Train and Validation Datasets - https://simpletransformers.ai/docs/qa-data-formats/

## Preparing the GPU in Google Colab

In [1]:
gpu_info = !nvidia-smi
gpu_info = '\n'.join(gpu_info)
if gpu_info.find('failed') >= 0:
  print('Not connected to a GPU')
else:
  print(gpu_info)

Wed Aug 30 13:24:47 2023       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.105.17   Driver Version: 525.105.17   CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  Tesla T4            Off  | 00000000:00:04.0 Off |                    0 |
| N/A   35C    P8     9W /  70W |      0MiB / 15360MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Proces

In [2]:
from psutil import virtual_memory
ram_gb = virtual_memory().total / 1e9
print('Your runtime has {:.1f} gigabytes of available RAM\n'.format(ram_gb))

if ram_gb < 20:
  print('Not using a high-RAM runtime')
else:
  print('You are using a high-RAM runtime!')

Your runtime has 54.8 gigabytes of available RAM

You are using a high-RAM runtime!


In [3]:
import os
os.environ['CUDA_LAUNCH_BLOCKING'] = '1'

Setting CUDA_LAUNCH_BLOCKING=1 makes all CUDA operations synchronous, which means the CPU waits for the GPU to finish before executing the next line of code. This makes errors easier to identify and debug, because the stack trace shows exactly where the error occurred. However, it also makes the code run slower.

In [4]:
!pip install transformers[torch]



In [5]:
!pip install accelerate -U



In [6]:
!pip show accelerate

Name: accelerate
Version: 0.22.0
Summary: Accelerate
Home-page: https://github.com/huggingface/accelerate
Author: The HuggingFace team
Author-email: sylvain@huggingface.co
License: Apache
Location: /usr/local/lib/python3.10/dist-packages
Requires: numpy, packaging, psutil, pyyaml, torch
Required-by: 


In [7]:
import torch
torch.cuda.is_available()

True

## Pretrained model capabilities

Let us first see how the pretrained deepset/roberta-base-squad2 model performs on our questions.

In [8]:
import torch
from transformers import RobertaTokenizer, RobertaForQuestionAnswering


# Load the tokenizer and model
tokenizer = RobertaTokenizer.from_pretrained("deepset/roberta-base-squad2")
model = RobertaForQuestionAnswering.from_pretrained("deepset/roberta-base-squad2")

# Read context from a .txt file
import requests

url = "https://raw.githubusercontent.com/AlinZohari/InformationExtraction/starlink_gen2/data/authorize_doc/StarlinkGen2_FCC-22-91A1.txt"
response = requests.get(url)
context = response.text

# Dictionary of questions
questions = {
    "const_name": "What's the name of the satellite constellation the company seeks to deploy or operate?",
    "date_release": "On which date was the document released?",
    "date_50": "By which date must the company launch and operate half of its satellites?",
    "date_100": "By which date is the company expected to have all its satellites operational?",
    "total_sat_const": "How many satellites is the company authorized to deploy and operate for this constellation?",
    "altitude": "At which authorized altitudes will the company deploy its satellites?",
    "inclination": "What are the authorized satellite inclinations within the corresponding altitudes?",
    "number_orb_plane": "How many orbital planes, corresponding to given altitudes and inclinations, has the company been authorized for?",
    "total_sat_per_orb_plane": "How many satellites are allocated to each orbital plane?",
    "total_sat_per_alt_incl": "How many satellites, for each altitude and inclination, are there across all matching orbital planes?",
    "operational_lifetime": "What is the satellite's expected operational lifetime in years?"
}

# Loop through each question
for key, question in questions.items():
    # Prepare the input
    inputs = tokenizer.encode_plus(question, context, return_tensors="pt", max_length=512, truncation=True)


    # Get the model's prediction
    input_ids = inputs["input_ids"]
    attention_mask = inputs["attention_mask"]
    output = model(input_ids, attention_mask=attention_mask)

    answer_start_scores = output.start_logits
    answer_end_scores = output.end_logits

    answer_start = torch.argmax(answer_start_scores)
    answer_end = torch.argmax(answer_end_scores)
    answer = tokenizer.decode(input_ids[0][answer_start:answer_end + 1])

    print(f"Question: {question}")
    print(f"Answer: {answer}")
    print()


Be aware, overflowing tokens are not returned for the setting you have chosen, i.e. sequence pairs with the 'longest_first' truncation strategy. So the returned list will always be empty even if some tokens have been removed.


Question: What's the name of the satellite constellation the company seeks to deploy or operate?
Answer: <s>



Question: On which date was the document released?
Answer:  December 1, 2022



Question: By which date must the company launch and operate half of its satellites?
Answer: <s>



Question: By which date is the company expected to have all its satellites operational?
Answer: <s>



Question: How many satellites is the company authorized to deploy and operate for this constellation?
Answer: <s>



Question: At which authorized altitudes will the company deploy its satellites?
Answer: <s>



Question: What are the authorized satellite inclinations within the corresponding altitudes?
Answer: <s>



Question: How many orbital planes, corresponding to given altitudes and inclinations, has the company been authorized for?
Answer: <s>



Question: How many satellites are allocated to each orbital plane?
Answer: <s>



Question: How many satellites, for each altitude and inclination, are there across all matching orbital planes?
Answer: <s>



Question: What is the satellite's expected operational lifetime in years?
Answer: <s>



The warning message you are seeing is due to the truncation strategy used by the tokenizer. The 'longest_first' truncation strategy truncates tokens from the longest of the two sequences (question or context) until they fit within the specified max_length. The warning is informing you that the overflowing tokens, which are the tokens removed during truncation, are not being returned in the inputs. This is expected behavior, as we are not using the overflowing tokens in this case.

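Answers that consist only of `<s>` indicate that the model did not find a suitable answer in the context for the given question. `<s>` is RoBERTa's start-of-sequence token, and a SQuAD 2.0-style model points at it to signal "no answer". This can happen because the answer is not present in the context, or because the context is too long and the relevant portion was truncated: with max_length=512, only the beginning of the document ever reaches the model.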
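The truncation problem can be made concrete with a small sketch. It is an illustration only, not part of the original notebook: it splits a token list into overlapping windows so that no part of a long context is discarded. The names `sliding_windows`, `window_size` and `overlap` are hypothetical, and plain integers stand in for token ids. Hugging Face fast tokenizers offer a related mechanism through the `stride` and `return_overflowing_tokens` arguments.

```python
def sliding_windows(tokens, window_size, overlap):
    """Split `tokens` into windows of at most `window_size` items.

    Consecutive windows share `overlap` items, so text near a window
    boundary also appears in the neighbouring window.
    """
    if not 0 <= overlap < window_size:
        raise ValueError("overlap must be >= 0 and smaller than window_size")
    step = window_size - overlap
    windows = []
    for start in range(0, len(tokens), step):
        windows.append(tokens[start:start + window_size])
        if start + window_size >= len(tokens):
            break  # the last window already reaches the end
    return windows


# Integers stand in for token ids; a real context would have thousands.
toy_tokens = list(range(10))
windows = sliding_windows(toy_tokens, window_size=4, overlap=1)
print(windows)  # [[0, 1, 2, 3], [3, 4, 5, 6], [6, 7, 8, 9]]
```

Each window would then be paired with the question and scored separately, and the best-scoring span across windows kept.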

Because the pretrained model answers so few of our questions, let us fine-tune it to fit our purpose.

## Fine-tuning the model

We are using the deepset/roberta-base-squad2 model, which is built for question-answering tasks. It is based on the RoBERTa model, which is a variant of the BERT (Bidirectional Encoder Representations from Transformers) model. BERT and RoBERTa are models designed to understand the context and relationships among words.
- RoBERTa: RoBERTa stands for "A Robustly Optimized BERT Pretraining Approach". It keeps BERT's architecture but changes the pretraining recipe: it is trained on more data and for more iterations than BERT. RoBERTa modifies key hyperparameters in BERT, including removing the next-sentence pretraining objective and training with much larger mini-batches and learning rates.
- squad2: SQuAD stands for Stanford Question Answering Dataset. Version 2.0 extends SQuAD 1.1 with unanswerable questions, so a model trained on it must not only answer questions but also decide whether a question is answerable from the provided context.
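To illustrate what answerable and unanswerable questions look like as data, here is a small hypothetical record in the nested `context` / `qas` layout that this notebook's preprocessing reads. The record contents, the ids and the helper `misaligned_ids` are made up for illustration, and the empty `answers` list for the unanswerable question is an assumption about the format. The helper checks that each `answer_start` really points at the answer text, which is worth verifying for hand-built training data.

```python
context = "The operator must launch half of its satellites by July 30, 2026."
answer_text = "July 30, 2026"

# Hypothetical record: one answerable and one unanswerable question.
record = {
    "context": context,
    "qas": [
        {
            "id": "demo-1",
            "question": "By which date must half of the satellites be launched?",
            "is_impossible": False,
            "answers": [
                {"text": answer_text, "answer_start": context.index(answer_text)}
            ],
        },
        {
            "id": "demo-2",
            "question": "At which altitude will the satellites operate?",
            "is_impossible": True,  # the context does not contain the answer
            "answers": [],
        },
    ],
}


def misaligned_ids(record):
    """Return ids of questions whose answer_start does not match the text."""
    bad = []
    for qa in record["qas"]:
        if qa["is_impossible"]:
            continue  # nothing to align for unanswerable questions
        for ans in qa["answers"]:
            start = ans["answer_start"]
            end = start + len(ans["text"])
            if record["context"][start:end] != ans["text"]:
                bad.append(qa["id"])
    return bad


print(misaligned_ids(record))  # [] because every offset lines up
```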

In [9]:
from transformers import AutoModelForQuestionAnswering, AutoTokenizer, pipeline
from transformers import RobertaTokenizerFast

#Reference: https://huggingface.co/deepset/roberta-base-squad2

model_name = "deepset/roberta-base-squad2"

#Load model & tokenizer
#model = AutoModelForQuestionAnswering.from_pretrained(model_name)
#tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForQuestionAnswering.from_pretrained(model_name)
tokenizer = RobertaTokenizerFast.from_pretrained(model_name)

# AutoModelForQuestionAnswering infers the correct model class from the checkpoint's config,
# so the same code works with other model architectures.
# RobertaTokenizerFast is the fast tokenizer for RoBERTa models. Fast tokenizers are implemented
# in Rust and are faster than the pure-Python tokenizers. They also provide extras such as
# alignment between the original text and its tokens.

In [10]:
# Inspect the RoBERTa question-answering architecture
model

RobertaForQuestionAnswering(
  (roberta): RobertaModel(
    (embeddings): RobertaEmbeddings(
      (word_embeddings): Embedding(50265, 768, padding_idx=1)
      (position_embeddings): Embedding(514, 768, padding_idx=1)
      (token_type_embeddings): Embedding(1, 768)
      (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
      (dropout): Dropout(p=0.1, inplace=False)
    )
    (encoder): RobertaEncoder(
      (layer): ModuleList(
        (0-11): 12 x RobertaLayer(
          (attention): RobertaAttention(
            (self): RobertaSelfAttention(
              (query): Linear(in_features=768, out_features=768, bias=True)
              (key): Linear(in_features=768, out_features=768, bias=True)
              (value): Linear(in_features=768, out_features=768, bias=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
            (output): RobertaSelfOutput(
              (dense): Linear(in_features=768, out_features=768, bias=True)
              (Lay

To fine-tune a QA model we need:
- a GPU
- a training script


In [11]:
# Load our own hand-built training dataset
import requests
import json

url = "https://raw.githubusercontent.com/AlinZohari/InformationExtraction/main/data/QA_model/train.json"
response = requests.get(url)
train = response.json()

In [12]:
#looking at the train dataset
train

[{'context': 'In this Order and Authorization, we grant, to the extent set forth below, the request of Kuiper Systems LLC (Kuiper or Amazon) to provide satellite services.\n            Operating 3,372 satellites in 102 orbital planes at altitudes of 590 km, 610 km, and 630 km in a circular orbit.\n            At 590 km, 30 orbital planes with 28 satellites per plane for a total of 840 satellites at inclination of 33 degree.\n            At 610 km, 42 orbital planes with 36 satellites per plane for a total of 1512 satellites at inclination of 42 degree.\n            At 630 km, 30 orbital planes with 34 satellites per plane for a total of 1020 satellite at inclination of 51.9 degree.\n            The constellation are require to launch and operate 50 percent of its satellites no later than July 30, 2026, and must launch the remaining space stations necessary to complete its authorized service constellation, place them in their assigned orbits, and operate each of them in accordance with 

In [13]:
# Load our own hand-built validation dataset
import requests
import json

url = "https://raw.githubusercontent.com/AlinZohari/InformationExtraction/main/data/QA_model/validation.json"
response = requests.get(url)
validation = response.json()

In [14]:
# Look at the validation dataset
validation

[{'context': 'Release date: October 29, 1995 In this Order and Authorization, we grant, to the extent set forth below, the request of Ligado Networks LLC to provide Fixed Satellite Services (FSS). Operating 2320 satellites in 58 orbital planes in total at altitudes of 500, 600, 700 and 800 kilometers. At an altitude of 500 km, there are 15 orbital planes, each hosting 36 satellites, resulting in a total of 540 satellites at an inclination of 36.5 degrees. For the 600 km altitude, 23 orbital planes are present, with each plane containing 50 satellites, summing up to 1150 satellites at an inclination of 49 degrees. At the 700 km mark, 15 orbital planes are equipped with 27 satellites each, leading to a total of 405 satellites at a 51.9-degree inclination. Lastly, at 800 km, there are 5 orbital planes, and each has 45 satellites, amounting to a total of 225 satellites at an inclination of 59.3 degrees. The constellation are require to launch and operate 50 percent of its satellites no lat

## Preprocess the data

Here we preprocess our data so that it fits the BERT/RoBERTa model by tokenizing our training data. The tokenizer is the same RobertaTokenizerFast loaded above.
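The core of this preprocessing is turning a character-level answer span into token positions. The sketch below shows that idea in isolation under simplifying assumptions: a whitespace splitter stands in for the real subword tokenizer, and `whitespace_offsets` and `char_span_to_token_span` are hypothetical helpers rather than library functions. The offsets play the role of the tokenizer's offset mapping.

```python
import re


def whitespace_offsets(text):
    """Return (start, end) character offsets of each whitespace-separated token."""
    return [(m.start(), m.end()) for m in re.finditer(r"\S+", text)]


def char_span_to_token_span(offsets, start_char, end_char):
    """Map a character span to (first_token, last_token), or None if no overlap."""
    hits = [
        i for i, (tok_start, tok_end) in enumerate(offsets)
        if tok_start < end_char and tok_end > start_char
    ]
    if not hits:
        return None
    return hits[0], hits[-1]


context = "At 590 km, 30 orbital planes with 28 satellites per plane"
answer = "28 satellites"
start_char = context.index(answer)
end_char = start_char + len(answer)

offsets = whitespace_offsets(context)
tokens = [context[s:e] for s, e in offsets]
span = char_span_to_token_span(offsets, start_char, end_char)

print(span)                            # (7, 8)
print(tokens[span[0]:span[1] + 1])     # ['28', 'satellites']
```

With a real tokenizer the offsets also cover the question and special tokens, which is why the notebook first locates where the context starts and ends before searching for the answer.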

In [15]:
!pip install datasets

Collecting datasets
  Downloading datasets-2.14.4-py3-none-any.whl (519 kB)
     519.3/519.3 kB 8.4 MB/s eta 0:00:00
Collecting dill<0.3.8,>=0.3.0 (from datasets)
  Downloading dill-0.3.7-py3-none-any.whl (115 kB)
     115.3/115.3 kB 15.1 MB/s eta 0:00:00
Collecting xxhash (from datasets)
  Downloading xxhash-3.3.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (194 kB)
     194.1/194.1 kB 22.5 MB/s eta 0:00:00
Collecting multiprocess (from datasets)
  Downloading multiprocess-0.70.15-py310-none-any.whl (134 kB)
     134.8/134.8 kB 17.4 MB/s eta 0:00:00
Installing collected packages: xxhash, dill, multiprocess, datasets
Successfully installed datasets-2.14.

In [16]:
# The tokenizer is already defined above:
# from transformers import RobertaTokenizerFast
# tokenizer = RobertaTokenizerFast.from_pretrained(model_name)
# A fast tokenizer (BertTokenizerFast / RobertaTokenizerFast) is required because
# return_offsets_mapping is not available with the Python tokenizers.

In [17]:
import pandas as pd
from datasets import Dataset

def preprocess_function(examples):
    questions = []
    contexts = []
    answers = []
    ids = []

    for i in range(len(examples['context'])):
        context = examples['context'][i]
        qas = examples['qas'][i]

        for qa in qas:
            questions.append(qa['question'].strip())
            contexts.append(context)
            ids.append(qa['id'])
            if not qa['is_impossible']:
                ans = qa['answers'][0]
                answer_text = ans['text'] if ans['text'] else ""
                answers.append({'answer_start': [ans['answer_start']], 'text': [answer_text]})
            else:
                answers.append({'answer_start': [None], 'text': [""]})

    inputs = tokenizer(
        questions,
        contexts,
        max_length=384,
        truncation="only_second",
        return_offsets_mapping=True,
        padding="max_length",
    )

    offset_mapping = inputs.pop("offset_mapping")

    start_positions = []
    end_positions = []

    for i, offset in enumerate(offset_mapping):
        answer = answers[i]
        start_char = answer['answer_start'][0]
        end_char = start_char + len(answer['text'][0]) if answer['text'][0] else None
        sequence_ids = inputs.sequence_ids(i)

        if start_char is None or end_char is None:
            start_positions.append(0)
            end_positions.append(0)
        else:
            idx = 0
            while sequence_ids[idx] != 1:
                idx += 1
            context_start = idx
            while sequence_ids[idx] == 1:
                idx += 1
            context_end = idx - 1

            if offset[context_start][0] > end_char or offset[context_end][1] < start_char:
                start_positions.append(0)
                end_positions.append(0)
            else:
                idx = context_start
                while idx <= context_end and offset[idx][0] <= start_char:
                    idx += 1
                start_positions.append(idx - 1)

                idx = context_end
                while idx >= context_start and offset[idx][1] >= end_char:
                    idx -= 1
                end_positions.append(idx + 1)

    inputs["start_positions"] = start_positions
    inputs["end_positions"] = end_positions
    inputs["answers"] = answers
    inputs["id"] = ids

    return inputs

# Convert lists to Dataset objects
train_dataset = Dataset.from_pandas(pd.DataFrame(train))
validation_dataset = Dataset.from_pandas(pd.DataFrame(validation))

# Apply preprocess_function
tokenized_train = train_dataset.map(preprocess_function, batched=True, remove_columns=train_dataset.column_names)
tokenized_validation = validation_dataset.map(preprocess_function, batched=True, remove_columns=validation_dataset.column_names)




Map:   0%|          | 0/13 [00:00<?, ? examples/s]

Map:   0%|          | 0/1 [00:00<?, ? examples/s]

In [18]:
tokenized_train

Dataset({
    features: ['input_ids', 'attention_mask', 'start_positions', 'end_positions', 'answers', 'id'],
    num_rows: 42
})

In [19]:
train_dataset

Dataset({
    features: ['context', 'qas'],
    num_rows: 13
})

The DefaultDataCollator is a class from the transformers library that collates samples into batches for training or evaluation. When you train a model, you usually do not pass the entire dataset at once but use mini-batches of data. The data collator takes the individual samples and combines them into these mini-batches.

The DefaultDataCollator will:

- Stack the features of the individual samples into one batch.
- Convert the batch into PyTorch tensors.

It does not pad. All samples must already have the same length, which is the case here because preprocess_function tokenizes with padding="max_length".
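As a rough mental model of collation, the sketch below groups per-sample dictionaries into one dictionary of per-key lists. It is a simplified stand-in, not the transformers implementation: the function name `collate` is hypothetical, plain lists replace tensors, and the sample values are invented. It assumes every sample already has the same length, as is the case after tokenizing with padding="max_length".

```python
def collate(samples):
    """Combine a list of per-sample dicts into one dict of per-key lists."""
    batch = {}
    for key in samples[0]:
        values = [sample[key] for sample in samples]
        # No padding happens here: list features must already be equal length.
        if isinstance(values[0], list) and len({len(v) for v in values}) > 1:
            raise ValueError(f"samples differ in length for {key!r}")
        batch[key] = values
    return batch


# Invented token ids and positions, for illustration only.
samples = [
    {"input_ids": [0, 5, 2], "attention_mask": [1, 1, 1], "start_positions": 1},
    {"input_ids": [0, 7, 2], "attention_mask": [1, 1, 1], "start_positions": 0},
]
batch = collate(samples)
print(batch["input_ids"])        # [[0, 5, 2], [0, 7, 2]]
print(batch["start_positions"])  # [1, 0]
```

A real collator would additionally convert each of these lists into a tensor.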

In [20]:
from transformers import DefaultDataCollator

data_collator = DefaultDataCollator()

## Fine-tune the model

In [21]:
from transformers import TrainingArguments, Trainer
model = AutoModelForQuestionAnswering.from_pretrained(model_name)

The call metric = load_metric("squad") loads the SQuAD (Stanford Question Answering Dataset) evaluation metric. This metric computes the Exact Match (EM) and F1 scores, which are commonly used for evaluating question answering models.

- Exact Match (EM): the simplest metric. It measures the percentage of predictions that match any one of the ground truth answers exactly.
- F1 score: a softer metric that considers the token overlap between the prediction and the ground truth answer. It is the harmonic mean of precision and recall.
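A small worked example may help to see how the two scores differ. The functions below are a simplified sketch in the spirit of the SQuAD evaluation script (lowercasing, removing punctuation and articles, then comparing tokens). They are not the code behind the "squad" metric, and the names `normalize`, `exact_match` and `f1_score` are chosen here for illustration.

```python
import re
import string
from collections import Counter


def normalize(text):
    """Lowercase, drop punctuation and articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction, truth):
    return float(normalize(prediction) == normalize(truth))


def f1_score(prediction, truth):
    pred_tokens = normalize(prediction).split()
    truth_tokens = normalize(truth).split()
    # Multiset intersection counts tokens shared by both answers.
    num_same = sum((Counter(pred_tokens) & Counter(truth_tokens)).values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(truth_tokens)
    return 2 * precision * recall / (precision + recall)


prediction = "July 30, 2026"
truth = "no later than July 30, 2026"
em = exact_match(prediction, truth)
f1 = f1_score(prediction, truth)
print(em)            # 0.0, the strings differ after normalization
print(round(f1, 3))  # 0.667: precision 3/3, recall 3/6
```

The prediction is penalized fully by EM but only partly by F1, because all of its tokens appear in the reference answer.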

In [22]:
# Define the training arguments
training_args = TrainingArguments(
    output_dir='./results',
    num_train_epochs=20,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    warmup_steps=500,
    weight_decay=0.01,
    logging_dir='./logs',
)


In [23]:
from datasets import load_metric  # from the Hugging Face datasets library
import numpy as np
from transformers import EvalPrediction
from typing import Dict

metric = load_metric("squad")

def compute_metrics(p):
    start_preds, end_preds = p.predictions
    ids = tokenized_validation['id']
    input_ids = tokenized_validation['input_ids']

    # convert scores to actual positions
    start_positions = np.argmax(start_preds, axis=1)
    end_positions = np.argmax(end_preds, axis=1)

    predictions = []
    for id, input_id, start, end in zip(ids, input_ids, start_positions, end_positions):
        prediction_text = tokenizer.decode(input_id[start:end+1], skip_special_tokens=True)
        if not prediction_text:
            prediction_text = ""
        predictions.append({'id': id, 'prediction_text': prediction_text})

    references = [{'id': id, 'answers': {'answer_start': [answer['answer_start'][0]], 'text': [answer['text'][0] if answer['text'][0] else ""]}} for id, answer in zip(ids, tokenized_validation['answers'])]
    result = metric.compute(predictions=predictions, references=references)
    return result



trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_train,
    eval_dataset=tokenized_validation,
    compute_metrics=compute_metrics,
)

trainer.train()


  metric = load_metric("squad")


Downloading builder script:   0%|          | 0.00/1.72k [00:00<?, ?B/s]

Downloading extra modules:   0%|          | 0.00/1.11k [00:00<?, ?B/s]

Step,Training Loss


TrainOutput(global_step=120, training_loss=0.7039108912150065, metrics={'train_runtime': 64.868, 'train_samples_per_second': 12.949, 'train_steps_per_second': 1.85, 'total_flos': 164616956743680.0, 'train_loss': 0.7039108912150065, 'epoch': 20.0})

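The TrainOutput object summarizes the training run: 120 optimizer steps (global_step) over 20 epochs, an average training loss of about 0.70, and a runtime of about 65 seconds (train_runtime), which is roughly 13 samples per second.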
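The global_step=120 reported in the output can be reproduced by hand from the dataset size and the training arguments. The sketch below does that arithmetic and also estimates the learning rate reached at the end of training. It assumes a single device, no gradient accumulation, a linear warmup, and the TrainingArguments default learning rate of 5e-5, none of which the notebook sets explicitly. The helper `warmup_lr` is hypothetical and models only the warmup phase.

```python
import math

num_examples = 42     # rows in tokenized_train
batch_size = 8        # per_device_train_batch_size
num_epochs = 20       # num_train_epochs
warmup_steps = 500    # warmup_steps
peak_lr = 5e-5        # assumption: default learning_rate, not set in the notebook

steps_per_epoch = math.ceil(num_examples / batch_size)
total_steps = steps_per_epoch * num_epochs


def warmup_lr(step):
    """Learning rate during linear warmup, capped at the peak value."""
    return peak_lr * min(1.0, step / warmup_steps)


print(steps_per_epoch)  # 6
print(total_steps)      # 120, matching global_step in TrainOutput
final_lr = warmup_lr(total_steps)
print(final_lr)
```

Under these assumptions training stops after 120 of the 500 warmup steps, so the learning rate only climbs to about a quarter of its peak. A smaller warmup_steps value may be worth trying on a dataset this small.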


### Evaluation of the tuned model

In [24]:
# Evaluate the model
results = trainer.evaluate()

print(results)

{'eval_loss': 3.194551944732666, 'eval_exact_match': 54.54545454545455, 'eval_f1': 53.29380764163374, 'eval_runtime': 0.2869, 'eval_samples_per_second': 38.339, 'eval_steps_per_second': 6.971, 'epoch': 20.0}


In [25]:
# Mount Google Drive
from google.colab import drive
drive.mount('/content/gdrive')


# Save model and tokenizer
model.save_pretrained("/content/gdrive/MyDrive/tuned_model")
tokenizer.save_pretrained("/content/gdrive/MyDrive/tuned_model")



Mounted at /content/gdrive


('/content/gdrive/MyDrive/tuned_model/tokenizer_config.json',
 '/content/gdrive/MyDrive/tuned_model/special_tokens_map.json',
 '/content/gdrive/MyDrive/tuned_model/vocab.json',
 '/content/gdrive/MyDrive/tuned_model/merges.txt',
 '/content/gdrive/MyDrive/tuned_model/added_tokens.json',
 '/content/gdrive/MyDrive/tuned_model/tokenizer.json')

## Using the tuned model

In [26]:
import os

# List the contents of the directory
os.listdir('/content/gdrive/MyDrive/tuned_model')


['training_args.bin',
 'config.json',
 'pytorch_model.bin',
 'special_tokens_map.json',
 'merges.txt',
 'vocab.json',
 'tokenizer_config.json',
 'tokenizer.json']

In [27]:
print(os.path.abspath('/content/gdrive/MyDrive/tuned_model'))


/content/gdrive/MyDrive/tuned_model


In [28]:
from transformers import RobertaTokenizer, RobertaTokenizerFast, RobertaForQuestionAnswering, AutoModelForQuestionAnswering
import torch
import requests
import torch.nn.functional as F #to find the score of the answer

# Load the saved model and tokenizer
model = AutoModelForQuestionAnswering.from_pretrained("/content/gdrive/MyDrive/tuned_model")
tokenizer = RobertaTokenizerFast.from_pretrained("/content/gdrive/MyDrive/tuned_model")

# Read context from a .txt file
url = "https://raw.githubusercontent.com/AlinZohari/InformationExtraction/starlink_gen2/data/authorize_doc/StarlinkGen2_FCC-22-91A1.txt"
response = requests.get(url)
context = response.text

#define the questions
questions = [
    "What's the name of the satellite constellation the company seeks to deploy or operate?",
    "On which date was the document released?",
    "By which date must the company launch and operate half of its satellites?",
    "By which date is the company expected to have all its satellites operational?",
    "How many satellites is the company authorized to deploy and operate for this constellation?",
    "At which authorized altitudes will the company deploy its satellites?",
    "What are the authorized satellite inclinations within the corresponding altitudes?",
    "How many orbital planes, corresponding to given altitudes and inclinations, has the company been authorized for?",
    "How many satellites are allocated to each orbital plane?",
    "How many satellites, for each altitude and inclination, are there across all matching orbital planes?",
    "What is the satellite's expected operational lifetime in years?"
]
# Function to ask a single question
def ask_question(question, context):
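    # Split the context into overlapping chunks. Note that the slicing below works on
    # characters of the raw string, not on tokens: each chunk is at most chunk_size
    # characters long and shares `overlap` characters with the next chunk.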
    chunk_size = 512 - tokenizer.num_special_tokens_to_add(pair=True)
    overlap = 100
    context_chunks = [context[i:i+chunk_size] for i in range(0, len(context), chunk_size - overlap)]

    answers = []

    for context_chunk in context_chunks:
        inputs = tokenizer(question, context_chunk, return_tensors='pt')
        outputs = model(**inputs)
        start_logits = F.softmax(outputs.start_logits, dim=-1)
        end_logits = F.softmax(outputs.end_logits, dim=-1)
        answer_start = torch.argmax(start_logits)
        answer_end = torch.argmax(end_logits)
        answer_score = start_logits[0, answer_start].item() * end_logits[0, answer_end].item()
        answer = tokenizer.convert_tokens_to_string(tokenizer.convert_ids_to_tokens(inputs['input_ids'][0][answer_start:answer_end+1]))
        answers.append((answer, answer_score))

    # Combine the answers from each chunk
    full_answers = [answer for answer, _ in answers]
    full_answer = ' '.join(full_answers)
    # Report the highest chunk score (the joined answer above still contains every chunk's span)
    full_scores = [score for _, score in answers]
    max_score = max(full_scores)
    return full_answer, max_score

# Ask each question
answers_and_scores = [ask_question(question, context) for question in questions]

# Print the answers and scores
for question, (answer, score) in zip(questions, answers_and_scores):
    print(f'Question: {question}')
    print(f'Answer: {answer}')
    print(f'Score: {score}\n')


Question: What's the name of the satellite constellation the company seeks to deploy or operate?
Answer:  NGSO <s> <s> <s>  Space Exploration Holdings  Starlink  SpaceX Gen2  SpaceX  Gen2 Starlink <s>  SpaceX   SpaceX  Starlink Gen2  SpaceX  Gen2 Starlink  Gen2 Starlink  SpaceX  SES Americom and O3b  Viasat T-AMD  SAT-AMD  DISH Network Corporation (DISH <s>  SpaceX  NGSO  SpaceX NGSO  SpaceX NGSO  SpaceX NGSO  Starlink  Starlink  Starlink  Starlink  Gen1 Starlink  Gen1 Starlink  Gen1 Starlink  SpaceX NGSO  Starlink  SpaceX tarlink  New Spectrum Satellite  NGSO  NGSO  Satellit  NGSO  New Spectrum Satellite   New Spectrum Satellite  Kepler ceX Gen2 Starlink  Gen2 Starlink  SpaceX  Gen2 Starlink  Kuiper Kuiper Kuiper  Viasat  Viasat  EchoStar  Kuiper  EchoStar Kuiper Eutelsat   SpaceX Gen2 Starlink  SpaceX  Gen2 Starlink  SpaceX  Gen1 Starlink  Gen2 Starlink SpaceX  Space Exploration Technologies Corp., to Marlene H. Dortch, Secretary, FCC, IBFS File Nos. SAT-LOA-20200526-00055 and SAT-AM