# Question answering (PyTorch)

Install the Transformers, Datasets, and Evaluate libraries to run this notebook.

In [1]:
!pip install datasets evaluate transformers[sentencepiece]
!pip install accelerate

!apt install git-lfs


You will need to set up git and adapt your email and name in the following cell.

You will also need to be logged in to the Hugging Face Hub. Execute the following and enter your credentials.

In [2]:
from huggingface_hub import notebook_login

notebook_login()


In [3]:
# datasets is already installed by the first cell, so only the imports are needed here
from datasets import load_dataset
import datasets



In [4]:
from transformers import AutoTokenizer

# Base checkpoint to fine-tune from
model_checkpoint = "deepset/roberta-base-squad2"
# Alternatively, uncomment the line below to continue from a previously fine-tuned checkpoint
#model_checkpoint = "dcrowleymunster/roberta-finetuned-sunderlandUni2-emergency-proj"
tokenizer = AutoTokenizer.from_pretrained(model_checkpoint)

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


tokenizer_config.json:   0%|          | 0.00/79.0 [00:00<?, ?B/s]

config.json:   0%|          | 0.00/571 [00:00<?, ?B/s]

vocab.json:   0%|          | 0.00/899k [00:00<?, ?B/s]

merges.txt:   0%|          | 0.00/456k [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/772 [00:00<?, ?B/s]

In [5]:
tokenizer.is_fast

True

In [6]:
# The maximum input length is 384 tokens, which corresponds to
# roughly 1,544 to 1,930 characters including spaces.
max_length = 384
stride = 128


def preprocess_training_examples(examples):
    questions = [q.strip() for q in examples["question"]]
    inputs = tokenizer(
        questions,
        examples["context"],
        max_length=max_length,
        truncation="only_second",
        stride=stride,
        return_overflowing_tokens=True,
        return_offsets_mapping=True,
        padding="max_length",
    )

    offset_mapping = inputs.pop("offset_mapping")
    sample_map = inputs.pop("overflow_to_sample_mapping")
    answers = examples["answers"]
    start_positions = []
    end_positions = []

    for i, offset in enumerate(offset_mapping):
        sample_idx = sample_map[i]
        answer = answers[sample_idx]
        start_char = answer["answer_start"][0]
        end_char = answer["answer_start"][0] + len(answer["text"][0])
        sequence_ids = inputs.sequence_ids(i)

        # Find the start and end of the context
        idx = 0
        while sequence_ids[idx] != 1:
            idx += 1
        context_start = idx
        while sequence_ids[idx] == 1:
            idx += 1
        context_end = idx - 1

        # If the answer is not fully inside the context, label is (0, 0)
        if offset[context_start][0] > start_char or offset[context_end][1] < end_char:
            start_positions.append(0)
            end_positions.append(0)
        else:
            # Otherwise it's the start and end token positions
            idx = context_start
            while idx <= context_end and offset[idx][0] <= start_char:
                idx += 1
            start_positions.append(idx - 1)

            idx = context_end
            while idx >= context_start and offset[idx][1] >= end_char:
                idx -= 1
            end_positions.append(idx + 1)

    inputs["start_positions"] = start_positions
    inputs["end_positions"] = end_positions
    return inputs
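The label search in `preprocess_training_examples` can be traced on a toy input without loading a tokenizer. The sketch below repeats the same two scans over hand-written offsets. The function name `char_span_to_token_span` and the toy offsets are illustrative assumptions, not part of the notebook.

```python
def char_span_to_token_span(offsets, context_start, context_end, start_char, end_char):
    """Map a character span to token indices, or (0, 0) if it falls outside the window.

    offsets: one (char_start, char_end) pair per token, as in an offset mapping.
    context_start / context_end: first and last token index of the context.
    """
    # The answer is not fully inside this window: label it (0, 0)
    if offsets[context_start][0] > start_char or offsets[context_end][1] < end_char:
        return 0, 0

    # Walk forward to the last token that starts at or before start_char
    idx = context_start
    while idx <= context_end and offsets[idx][0] <= start_char:
        idx += 1
    start_token = idx - 1

    # Walk backward to the first token that ends at or after end_char
    idx = context_end
    while idx >= context_start and offsets[idx][1] >= end_char:
        idx -= 1
    end_token = idx + 1
    return start_token, end_token


# Toy layout: <s> Where ? </s> </s> The kit is here </s>
toy_offsets = [(0, 0), (0, 5), (5, 6), (0, 0), (0, 0),
               (0, 3), (4, 7), (8, 10), (11, 15), (0, 0)]

# "kit is" covers characters 4..10 of the context "The kit is here"
inside = char_span_to_token_span(toy_offsets, 5, 8, 4, 10)
outside = char_span_to_token_span(toy_offsets, 5, 8, 20, 25)
print(inside, outside)
```

Features whose window does not contain the answer receive the label `(0, 0)`, which points at the first token of the sequence.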

In [28]:
import pandas as pd
df_train=pd.read_csv('/content/trainMulti4.csv',skip_blank_lines=True)
df_test=pd.read_csv('/content/testMulti4.csv')

In [29]:
df_train.columns

Index(['item_id', 'domain', 'nn_mod', 'nn_asp', 'query_mod', 'query_asp',
       'q_review_id', 'q_reviews_id', 'question', 'question_subj_level',
       'ques_subj_score', 'is_ques_subjective', 'review_id', 'review',
       'human_ans_spans', 'human_ans_indices', 'answer_subj_level',
       'ans_subj_score', 'is_ans_subjective'],
      dtype='object')

In [None]:
len(df_train)

2532

In [11]:
df_train.iloc[2511].answers

AttributeError: 'Series' object has no attribute 'answers'

In [12]:
df_train.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2532 entries, 0 to 2531
Data columns (total 19 columns):
 #   Column               Non-Null Count  Dtype  
---  ------               --------------  -----  
 0   item_id              2501 non-null   object 
 1   domain               2501 non-null   object 
 2   nn_mod               2501 non-null   object 
 3   nn_asp               2501 non-null   object 
 4   query_mod            2501 non-null   object 
 5   query_asp            2501 non-null   object 
 6   q_review_id          2501 non-null   object 
 7   q_reviews_id         2501 non-null   object 
 8   question             2532 non-null   object 
 9   question_subj_level  2501 non-null   float64
 10  ques_subj_score      2501 non-null   float64
 11  is_ques_subjective   2501 non-null   object 
 12  review_id            2501 non-null   object 
 13  review               2532 non-null   object 
 14  human_ans_spans      2532 non-null   object 
 15  human_ans_indices    2532 non-null   o

In [None]:
df_train.describe()

       question_subj_level  ques_subj_score  answer_subj_level  ans_subj_score
count               2501.0           2501.0             2501.0          2501.0
mean              1.823671         0.166232           1.822071        0.200875
std               1.361652         0.294166           1.358797        0.333595
min                    1.0              0.0                1.0             0.0
25%                    1.0              0.0                1.0             0.0
50%                    1.0              0.0                1.0             0.0
75%                    2.0         0.233333                2.0        0.427083
max                    5.0              1.0                5.0             1.0


In [15]:
df_train.iloc[2520].review

'Located on your App AI message text box window, located at the bottom of your cellphone screen  type in Bomb_Threat_in_Building ensuirng you type in the Building ID as well. ANSWERNOTFOUND'

In [9]:
df_train.iloc[0].review[251:265]

'ANSWERNOTFOUND'

In [30]:
df_train=df_train[['question','human_ans_indices','review','human_ans_spans']]
df_test=df_test[['question','human_ans_indices','review','human_ans_spans']]

In [11]:
df_train.shape[0]

2532

In [17]:
df_train['review']

0       Whether it be in her portrayal of a nerdy lesb...
1       An outstanding romantic comedy, 13 Going on 30...
2       To let the truth be known, I watched this movi...
3       At the time of my review, there had been 910 c...
4       "Fright Night" is great! This is how the story...
                              ...
2527    If you find that smoke is overwhelming you try...
2528    If the office or building loses electrical po...
2529    In the event of an Earthquake move away from w...
2530    During inclement weather conditions, move away...


In [31]:
import numpy as np
# Give each row a unique id; after the string cast below the ids look like "0.0", "1.0", ...
df_train['id']=np.linspace(0,len(df_train)-1,len(df_train))
df_test['id']=np.linspace(0,len(df_test)-1,len(df_test))

df_train['id']=df_train['id'].astype(str)
df_test['id']=df_test['id'].astype(str)

In [32]:
int(df_train.iloc[2503].human_ans_indices.split('(')[1].split(',')[0])

75

In [33]:
float(df_train.iloc[0].human_ans_indices.split('(')[1].split(',')[1].split(' ')[1].split(')')[0])

265.0
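The chained `split` calls above pull the start and end index out of a string such as `(251, 265)`. As a hedged alternative, the sketch below parses the same string with `ast.literal_eval` and builds the SQuAD-style `answers` dictionary in one step. The helper name `build_answer` and the toy context are assumptions made for illustration.

```python
import ast


def build_answer(context, indices):
    """Turn an index string like "(251, 265)" into a SQuAD-style answers dict."""
    # literal_eval safely parses the tuple literal into two integers
    start, end = ast.literal_eval(indices)
    return {"text": [context[start:end]], "answer_start": [start]}


# Toy context that ends with the ANSWERNOTFOUND sentinel used in this dataset
toy_context = "Some review text. ANSWERNOTFOUND"
toy_answer = build_answer(toy_context, "(18, 32)")
print(toy_answer)
```

Because the answer text is sliced from the context, `text` and `answer_start` cannot disagree, which is what the preprocessing function relies on.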

In [34]:
df_train['answers']=df_train['human_ans_spans']
df_test['answers']=df_test['human_ans_spans']

In [35]:
for i in range(0,len(df_train)):
  answer1={}
  print(df_train.iloc[i])
  si=int(df_train.iloc[i].human_ans_indices.split('(')[1].split(',')[0])
  print(si)
  ei=int(df_train.iloc[i].human_ans_indices.split('(')[1].split(',')[1].split(' ')[1].split(')')[0])
  print(ei)
  answer1['text']=[df_train.iloc[i].review[si:ei]]
  answer1['answer_start']=[si]
  df_train.at[i, 'answers']=answer1
  print(df_train.iloc[i].answers,df_train.iloc[i].human_ans_spans)

Streaming output truncated to the last 5000 lines.
question                                What is the quality of themes?
human_ans_indices                                         (2940, 2954)
review               The most important moment for a hero, whether ...
human_ans_spans                                         ANSWERNOTFOUND
id                                                              2032.0
answers                                                 ANSWERNOTFOUND
Name: 2032, dtype: object
2940
2954
{'text': ['ANSWERNOTFOUND'], 'answer_start': [2940]} ANSWERNOTFOUND
question                    How is dialogue in the animated ant movie?
human_ans_indices                                         (1448, 1462)
review               Lucas plumbed the depths of bad filmmaking wit...
human_ans_spans                                         ANSWERNOTFOUND
id                                                              2033.0
answers                                           

In [36]:
print(df_train.iloc[2524].question)
print(df_train.iloc[2524].review)
print(df_train.iloc[2524].human_ans_indices)
print(df_train.iloc[2524].human_ans_spans)

Where are fire extinguishers for qtc12345?
for building qtc12345 on each floor there are numerous fire extinguishers located as follows, Trisection A: By the Exit to the Stairs, Trisection A By the Exit to the Stairs Trisection B By the Exit to the Stairs Trisection C In the kitchen. ANSWERNOTFOUND
(47, 96)
Trisection A By the Exit to the Stairs Trisection B By the Exit to the Stairs Trisection C In the kitchen


In [18]:
len(df_train)

2532

In [37]:
for i in range(0,len(df_test)):
  answer1={}
  si=int(df_test.iloc[i].human_ans_indices.split('(')[1].split(',')[0])
  ei=int(df_test.iloc[i].human_ans_indices.split('(')[1].split(',')[1].split(' ')[1].split(')')[0])
  answer1['text']=[df_test.iloc[i].review[si:ei]]
  answer1['answer_start']=[si]
  df_test.at[i, 'answers']=answer1
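  # Note: this prints rows of df_train, not df_test, which is why the output below shows training answers
  print(df_train.iloc[i].answers,df_train.iloc[i].human_ans_spans)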

{'text': ['ANSWERNOTFOUND'], 'answer_start': [251]} ANSWERNOTFOUND
{'text': ['ANSWERNOTFOUND'], 'answer_start': [1195]} ANSWERNOTFOUND
{'text': ['ANSWERNOTFOUND'], 'answer_start': [1476]} ANSWERNOTFOUND
{'text': ['this show is OUTSTANDING'], 'answer_start': [296]} this show is OUTSTANDING
{'text': ['The costume design by Susan Matheson is great'], 'answer_start': [1254]} The costume design by Susan Matheson is great
{'text': ['ANSWERNOTFOUND'], 'answer_start': [2897]} ANSWERNOTFOUND
{'text': ['ANSWERNOTFOUND'], 'answer_start': [2723]} ANSWERNOTFOUND
{'text': ['ANSWERNOTFOUND'], 'answer_start': [855]} ANSWERNOTFOUND
{'text': ['Oh my god is this a STUPID film'], 'answer_start': [1212]} Oh my god is this a STUPID film
{'text': ['ANSWERNOTFOUND'], 'answer_start': [232]} ANSWERNOTFOUND
{'text': ['ANSWERNOTFOUND'], 'answer_start': [8473]} ANSWERNOTFOUND
{'text': ['KingKong'], 'answer_start': [9]} KingKong
{'text': ['The acting is well done'], 'answer_start': [1243]} The acting is well done
{

In [20]:
df_train.columns

Index(['question', 'human_ans_indices', 'review', 'human_ans_spans', 'id',
       'answers'],
      dtype='object')

In [38]:
df_train.columns=['question', 'human_ans_indices', 'context', 'human_ans_spans', 'id',
       'answers']

df_test.columns=['question', 'human_ans_indices', 'context', 'human_ans_spans','id',
       'answers']

In [39]:
val_dataset2 = datasets.Dataset.from_pandas(df_test)
train_dataset2 = datasets.Dataset.from_pandas(df_train)


In [40]:
train_dataset = train_dataset2.map(
    preprocess_training_examples,
    batched=True,
    remove_columns=train_dataset2.column_names,
)
len(train_dataset2), len(train_dataset)

Map:   0%|          | 0/2532 [00:00<?, ? examples/s]

(2532, 4893)
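The feature count (4893) exceeds the example count (2532) because long contexts are split into overlapping windows. The sketch below approximates that splitting for a single context. The function name `context_windows` and the budget of 370 context tokens per window are assumptions (the real budget is `max_length` minus the question and special tokens), and the tokenizer's own chunking may differ in detail.

```python
def context_windows(n_context_tokens, window, stride):
    """Return (start, end) token ranges that cover the context.

    Consecutive windows overlap by `stride` tokens, so each window
    advances by `window - stride` tokens.
    """
    if window <= stride:
        raise ValueError("window must be larger than stride")
    spans = []
    start = 0
    while True:
        end = min(start + window, n_context_tokens)
        spans.append((start, end))
        if end == n_context_tokens:
            break
        start += window - stride
    return spans


# A short context fits in one window; a long one produces several features
short_spans = context_windows(100, window=370, stride=128)
long_spans = context_windows(1000, window=370, stride=128)
print(len(short_spans), len(long_spans))
```

The overlap means an answer that is cut at one window boundary is usually intact in the neighbouring window.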

In [41]:
train_dataset2.shape

(2532, 6)

In [42]:
def preprocess_validation_examples(examples):
    questions = [q.strip() for q in examples["question"]]
    inputs = tokenizer(
        questions,
        examples["context"],
        max_length=max_length,
        truncation="only_second",
        stride=stride,
        return_overflowing_tokens=True,
        return_offsets_mapping=True,
        padding="max_length",
    )

    sample_map = inputs.pop("overflow_to_sample_mapping")
    example_ids = []

    for i in range(len(inputs["input_ids"])):
        sample_idx = sample_map[i]
        example_ids.append(examples["id"][sample_idx])

        sequence_ids = inputs.sequence_ids(i)
        offset = inputs["offset_mapping"][i]
        inputs["offset_mapping"][i] = [
            o if sequence_ids[k] == 1 else None for k, o in enumerate(offset)
        ]

    inputs["example_id"] = example_ids
    return inputs
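The validation preprocessing keeps offsets only for context tokens, so that post-processing can discard any predicted span that touches the question or a special token. A minimal sketch of that masking step on toy data follows. The helper name `mask_non_context` and the toy values are illustrative assumptions.

```python
def mask_non_context(offsets, sequence_ids):
    """Keep an offset only where the token belongs to the context (sequence id 1)."""
    return [o if sid == 1 else None for o, sid in zip(offsets, sequence_ids)]


# Sequence ids: None for special tokens, 0 for the question, 1 for the context
toy_sequence_ids = [None, 0, 0, None, None, 1, 1, None]
toy_offsets = [(0, 0), (0, 5), (5, 6), (0, 0), (0, 0), (0, 3), (4, 7), (0, 0)]

masked = mask_non_context(toy_offsets, toy_sequence_ids)
print(masked)
```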

In [43]:
validation_dataset = val_dataset2.map(
    preprocess_validation_examples,
    batched=True,
    remove_columns=val_dataset2.column_names,
)
len(validation_dataset)

Map:   0%|          | 0/582 [00:00<?, ? examples/s]

1104

In [44]:
len(validation_dataset)

1104

In [45]:
validation_dataset

Dataset({
    features: ['input_ids', 'attention_mask', 'offset_mapping', 'example_id'],
    num_rows: 1104
})

In [None]:
len(val_dataset2)

582

In [46]:
tokenizer = AutoTokenizer.from_pretrained(model_checkpoint)

In [47]:
import torch
from transformers import AutoModelForQuestionAnswering


In [48]:
import collections



In [49]:
import evaluate

metric = evaluate.load("squad")

Downloading builder script:   0%|          | 0.00/4.53k [00:00<?, ?B/s]

Downloading extra modules:   0%|          | 0.00/3.32k [00:00<?, ?B/s]

In [50]:
from tqdm.auto import tqdm


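# Relies on the globals n_best and max_answer_length (and numpy imported as np),
# which must be defined before this function is called.
def compute_metrics(start_logits, end_logits, features, examples):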
    example_to_features = collections.defaultdict(list)
    for idx, feature in enumerate(features):
        example_to_features[feature["example_id"]].append(idx)

    predicted_answers = []
    for example in tqdm(examples):
        example_id = example["id"]
        context = example["context"]
        answers = []

        # Loop through all features associated with that example
        for feature_index in example_to_features[example_id]:
            start_logit = start_logits[feature_index]
            end_logit = end_logits[feature_index]
            offsets = features[feature_index]["offset_mapping"]

            start_indexes = np.argsort(start_logit)[-1 : -n_best - 1 : -1].tolist()
            end_indexes = np.argsort(end_logit)[-1 : -n_best - 1 : -1].tolist()
            for start_index in start_indexes:
                for end_index in end_indexes:
                    # Skip answers that are not fully in the context
                    if offsets[start_index] is None or offsets[end_index] is None:
                        continue
                    # Skip answers with a length that is either < 0 or > max_answer_length
                    if (
                        end_index < start_index
                        or end_index - start_index + 1 > max_answer_length
                    ):
                        continue

                    answer = {
                        "text": context[offsets[start_index][0] : offsets[end_index][1]],
                        "logit_score": start_logit[start_index] + end_logit[end_index],
                    }
                    answers.append(answer)

        # Select the answer with the best score
        if len(answers) > 0:
            best_answer = max(answers, key=lambda x: x["logit_score"])
            predicted_answers.append(
                {"id": example_id, "prediction_text": best_answer["text"]}
            )
        else:
            predicted_answers.append({"id": example_id, "prediction_text": ""})

    theoretical_answers = [{"id": ex["id"], "answers": ex["answers"]} for ex in examples]
    return metric.compute(predictions=predicted_answers, references=theoretical_answers)
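The core of `compute_metrics` is the search over the top start and end positions. The sketch below isolates that search for one feature, using plain Python lists in place of NumPy arrays. The function name `best_span` and the toy logits are assumptions for illustration, and the sketch omits the offset check that skips non-context tokens.

```python
def best_span(start_logits, end_logits, n_best=3, max_answer_length=4):
    """Return (start, end, score) of the highest-scoring valid span, or None."""

    def top_indices(values):
        # Indices of the n_best largest logits, best first
        order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
        return order[:n_best]

    best = None
    for s in top_indices(start_logits):
        for e in top_indices(end_logits):
            # Skip spans that end before they start or are too long
            if e < s or e - s + 1 > max_answer_length:
                continue
            score = start_logits[s] + end_logits[e]
            if best is None or score > best[2]:
                best = (s, e, score)
    return best


toy_start = [0.0, 2.0, 0.5, 0.0, 1.5]
toy_end = [0.0, 0.25, 1.0, 3.0, 0.0]

unrestricted = best_span(toy_start, toy_end)
short_only = best_span(toy_start, toy_end, max_answer_length=2)
print(unrestricted, short_only)
```

Lowering `max_answer_length` forces the search to settle for a lower-scoring but shorter span, which is the trade-off the notebook sets with `max_answer_length = 300`.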

In [51]:
model = AutoModelForQuestionAnswering.from_pretrained(model_checkpoint)

model.safetensors:   0%|          | 0.00/496M [00:00<?, ?B/s]

In [None]:
# from huggingface_hub import notebook_login

# notebook_login()

In [53]:
from transformers import TrainingArguments

args = TrainingArguments(
    "roberta-finetuned-sunderlandUni-terminate-token-emergency-multitest4-proj",
    evaluation_strategy="epoch",
    logging_strategy="epoch",
    save_strategy="epoch",
    learning_rate=2e-5,
    num_train_epochs=50,
    weight_decay=0.01,
    fp16=True,
    push_to_hub=True,
    report_to="none"
)



In [None]:
!huggingface-cli login



    To login, `huggingface_hub` requires a token generated from https://huggingface.co/settings/tokens .
Enter your token (input will not be visible): 
Add token as git credential? (Y/n) y
Token is valid (permission: fineGrained).
Cannot authenticate through git-credential as no helper is defined on your machine.
You might have to re-authenticate when pushing to the Hugging Face Hub.
Run the following command in yo

In [54]:
from transformers import Trainer

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    eval_dataset=validation_dataset,
    tokenizer=tokenizer,
)




In [34]:
import numpy as np
n_best=20
max_answer_length = 300

In [35]:
predictions, _, _ = trainer.predict(validation_dataset)
start_logits, end_logits = predictions
compute_metrics(start_logits, end_logits, validation_dataset, val_dataset2)

  0%|          | 0/582 [00:00<?, ?it/s]

{'exact_match': 2.577319587628866, 'f1': 9.799049517983459}
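To make the two reported numbers concrete, the sketch below computes exact match and token-level F1 for a single prediction, following the usual SQuAD normalisation (lowercase, strip punctuation and articles, collapse whitespace). It is a simplified stand-in for the `squad` metric loaded above, not its actual implementation, and the example strings are hypothetical.

```python
import collections
import re
import string

_PUNCT = set(string.punctuation)


def normalize(text):
    """Lowercase, drop punctuation and articles, and collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in _PUNCT)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction, truth):
    return float(normalize(prediction) == normalize(truth))


def f1_score(prediction, truth):
    pred_tokens = normalize(prediction).split()
    truth_tokens = normalize(truth).split()
    # Multiset intersection counts each shared token at most as often as it appears in both
    common = collections.Counter(pred_tokens) & collections.Counter(truth_tokens)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(truth_tokens)
    return 2 * precision * recall / (precision + recall)


# Hypothetical prediction that contains the reference answer plus extra words
pred = "The medical kit is opposite the Restroom entrance"
truth = "opposite the Restroom entrance"
em = exact_match(pred, truth)
f1 = f1_score(pred, truth)
print(em, round(f1, 4))
```

A prediction that is longer than the reference scores zero on exact match but still earns partial F1, which is why F1 is always at least as high as exact match in the results here.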

In [55]:
trainer.train()

Epoch  Training Loss  Validation Loss
1      1.3001         No log
2      1.0511         No log
3      0.8896         No log
4      0.7407         No log
5      0.6533         No log
6      0.5862         No log
7      0.5426         No log
8      0.5021         No log
9      0.4772         No log
10     0.4555         No log


TrainOutput(global_step=30600, training_loss=0.385799373551911, metrics={'train_runtime': 7233.9789, 'train_samples_per_second': 33.82, 'train_steps_per_second': 4.23, 'total_flos': 4.79446886515968e+16, 'train_loss': 0.385799373551911, 'epoch': 50.0})
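The reported `global_step` can be checked against the dataset size. The arithmetic below assumes a batch size of 8 (the `TrainingArguments` default for `per_device_train_batch_size`, which this notebook does not override) and a single device with no gradient accumulation.

```python
import math

num_features = 4893   # training features after preprocessing
batch_size = 8        # assumption: default per-device batch size, one device
num_epochs = 50       # num_train_epochs from the TrainingArguments above

# The last batch of each epoch is partial, so round up
steps_per_epoch = math.ceil(num_features / batch_size)
total_steps = steps_per_epoch * num_epochs
print(steps_per_epoch, total_steps)
```

Under these assumptions the result is 612 steps per epoch and 30600 steps in total, which agrees with the `global_step=30600` shown in the training output.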

In [56]:
trainer.push_to_hub(commit_message="Training complete")

CommitInfo(commit_url='https://huggingface.co/dcrowleymunster/roberta-finetuned-sunderlandUni-terminate-token-emergency-multitest4-proj/commit/cf6081f8563a47bceae15d5b1392ab82571959fb', commit_message='Training complete', commit_description='', oid='cf6081f8563a47bceae15d5b1392ab82571959fb', pr_url=None, repo_url=RepoUrl('https://huggingface.co/dcrowleymunster/roberta-finetuned-sunderlandUni-terminate-token-emergency-multitest4-proj', endpoint='https://huggingface.co', repo_type='model', repo_id='dcrowleymunster/roberta-finetuned-sunderlandUni-terminate-token-emergency-multitest4-proj'), pr_revision=None, pr_num=None)

In [38]:
predictions, _, _ = trainer.predict(validation_dataset)
start_logits, end_logits = predictions
compute_metrics(start_logits, end_logits, validation_dataset, val_dataset2)

  0%|          | 0/582 [00:00<?, ?it/s]

{'exact_match': 62.19931271477663, 'f1': 64.10528248751693}

In [None]:
# Inference

In [None]:
# Start inference from here: load the fine-tuned model.
# Load the modified df_train dataframe, which has the review and question columns.

In [1]:
from transformers import pipeline

In [2]:
# Replace this with your own checkpoint
model_checkpoint = "dcrowleymunster/roberta-finetuned-sunderlandUni-terminate-token-emergency-multitest4-proj"
question_answerer = pipeline("question-answering", model=model_checkpoint)



config.json:   0%|          | 0.00/777 [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/496M [00:00<?, ?B/s]

tokenizer_config.json:   0%|          | 0.00/1.27k [00:00<?, ?B/s]

vocab.json:   0%|          | 0.00/798k [00:00<?, ?B/s]

merges.txt:   0%|          | 0.00/456k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/3.56M [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/957 [00:00<?, ?B/s]

In [3]:
import pandas as pd
df_train1=pd.read_csv('/content/trainMulti4.csv')
#df_test1=pd.read_csv('/content/testMulti4.csv')

In [48]:
df_train1.iloc[2501].question

'What is Emergency contact qtc12345?'

In [4]:
df_train1.iloc[2501].review

'For company employees and contractors in building qtc12345, The telephone contact numbers are 71723234567 and 9131234567. ANSWERNOTFOUND'

In [5]:
context = df_train1.iloc[2501].review
print(context)
question = df_train1.iloc[2501].question
print(question)
question_answerer(question=question, context=context)

For company employees and contractors in building qtc12345, The telephone contact numbers are 71723234567 and 9131234567. ANSWERNOTFOUND
What is Emergency contact qtc12345?


{'score': 0.9999581575393677,
 'start': 74,
 'end': 120,
 'answer': 'contact numbers are 71723234567 and 9131234567'}
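The `start` and `end` values returned by the pipeline are character offsets into the context, so slicing the context reproduces the answer. The check below uses the context and result copied from the cell above; no model is needed to run it.

```python
context = (
    "For company employees and contractors in building qtc12345, "
    "The telephone contact numbers are 71723234567 and 9131234567. ANSWERNOTFOUND"
)

# Result copied from the pipeline output above
result = {
    "score": 0.9999581575393677,
    "start": 74,
    "end": 120,
    "answer": "contact numbers are 71723234567 and 9131234567",
}

# The end offset is exclusive, as in ordinary Python slicing
span = context[result["start"]:result["end"]]
print(span == result["answer"])
```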

In [6]:
context = df_train1.iloc[2502].review
print(context)
question = df_train1.iloc[2502].question
print(question)
question_answerer(question=question, context=context)

For company employees and contractors in building qtc676767, The telephone contact numbers are 3451114567 and 3451233434. ANSWERNOTFOUND
What is Emergency  contact qtc676767?


{'score': 0.9999556541442871,
 'start': 75,
 'end': 120,
 'answer': 'contact numbers are 3451114567 and 3451233434'}

In [7]:
context = df_train1.iloc[2503].review
question = df_train1.iloc[2503].question
question_answerer(question=question, context=context)

{'score': 0.9999555349349976,
 'start': 75,
 'end': 120,
 'answer': 'contact numbers are 7031234567 and 7031234567'}
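The inference cells in this part of the notebook repeat the same four lines for each row. One way to collapse them is a small helper that loops over question and context pairs. The sketch below takes the question-answering callable as a parameter, so `question_answerer` could be passed in directly. The names `answer_rows` and `fake_qa` are hypothetical, and `fake_qa` is only a stand-in so the example runs without downloading a model.

```python
def answer_rows(qa, rows):
    """Run a question-answering callable over (question, context) pairs."""
    results = []
    for question, context in rows:
        out = qa(question=question, context=context)
        results.append(
            {"question": question, "answer": out["answer"], "score": out["score"]}
        )
    return results


def fake_qa(question, context):
    # Stand-in for the real pipeline: returns the first sentence of the context
    answer = context.split(".")[0]
    return {"score": 1.0, "start": 0, "end": len(answer), "answer": answer}


toy_rows = [("Where is the kit?", "Opposite the entrance. ANSWERNOTFOUND")]
toy_results = answer_rows(fake_qa, toy_rows)
print(toy_results)
```

With the real pipeline, the rows could be built from the dataframe, for example by zipping the `question` and `review` columns of `df_train1.iloc[2501:2525]`.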

In [8]:
context = df_train1.iloc[2504].review
print(df_train1.iloc[2504].review)
question = df_train1.iloc[2504].question
print(df_train1.iloc[2504].question)
question_answerer(question=question, context=context)

In buidlings  qtc12345,  qtc676767 and  qtc123423, on for floor 22 and suite 200, in order to exit building qtc676767, look to see which area in the office you are in: Section A follow yellow line Section B follow red line Section C follow blue line. ANSWERNOTFOUND
What is exit route from  qtc676767


{'score': 0.9999628067016602,
 'start': 168,
 'end': 249,
 'answer': 'Section A follow yellow line Section B follow red line Section C follow blue line'}

In [9]:
context = df_train1.iloc[2505].review
print(context)
question = df_train1.iloc[2505].question
print(question)
question_answerer(question=question, context=context)

In buidlings qtc12345, qtc676767 and qtc123423, on for floor 23 and suite 300, in order to exit, look to see which area in the office you are in Trisection A follow yellow line. ANSWERNOTFOUND
Wht is exit route from qtc12345 ?


{'score': 0.9999498128890991,
 'start': 145,
 'end': 176,
 'answer': 'Trisection A follow yellow line'}

In [10]:
context = df_train1.iloc[2506].review
print(context)
question = df_train1.iloc[2506].question
print(question)
question_answerer(question=question, context=context)

Each trisection line has it's own color to help employees orientate themselves for normal office nagivation but mosr importantly during an emergency. Start with orientating yourself and look for a sign that reads trisection A, B or C. Next locate and follow trisection color line to the Exit and the stairs and elevators. ANSWERNOTFOUND
Where is the closest Exit to the stairs?


{'score': 0.9996997714042664,
 'start': 251,
 'end': 306,
 'answer': 'follow trisection color line to the Exit and the stairs'}

In [11]:
context = df_train1.iloc[2507].review
print(context)
question = df_train1.iloc[2507].question
print(question)
question_answerer(question=question, context=context)

Each floor in buidlings qtc12345,  qtc676767 and  qtc123423, for floor 22/23, the general structure is that the stairs and elevators are in the middle of the office floor area, so if you keep turning left you will eventuallt reach the centre where the stairs and eleeevators are. However if you on all floors if you look for a bightly colored sign on any wall you will locate sign trisection A B or C follow arrow adjacent to it, follow the trisection sign arrrow and line from where you are, and you will reach the stairs and elevators. ANSWERNOTFOUND
How do I know what my exit route is?


{'score': 0.9999399185180664,
 'start': 369,
 'end': 413,
 'answer': 'locate sign trisection A B or C follow arrow'}

In [12]:
context = df_train1.iloc[2508].review
print(context)
question = df_train1.iloc[2508].question
print(question)
question_answerer(question=question, context=context)

Medical kits are stocked with the appropriate requirements for minor cuts/bruises. The medical kit is opposite the Restroom entrance. ANSWERNOTFOUND
Where is the medical kit for qtc12345 ?


{'score': 0.9999351501464844,
 'start': 83,
 'end': 132,
 'answer': 'The medical kit is opposite the Restroom entrance'}

In [13]:
context = df_train1.iloc[2509].review
print(context)
question = df_train1.iloc[2509].question
print(question)
question_answerer(question=question, context=context)

Medical kits are stocked with the appropriate requirements for minor cuts/bruises. In building qtc676767, on Floors 22 and 23. The medical kit is opposite main  entrance. ANSWERNOTFOUND
Where is the medical kit  for qtc676767?


{'score': 0.999934196472168,
 'start': 127,
 'end': 169,
 'answer': 'The medical kit is opposite main  entrance'}

In [14]:
context = df_train1.iloc[2510].review
print(context)
question = df_train1.iloc[2510].question
print(question)
question_answerer(question=question, context=context)

Medical kits are stocked with the appropriate requirements for minor cuts/bruises. In building qtc123423 on Floors 22 and 23.  The medical kit is opposite the kitchen entrance. ANSWERNOTFOUND
Where is the medical kit  for building qtc123423?


{'score': 0.9999241828918457,
 'start': 127,
 'end': 175,
 'answer': 'The medical kit is opposite the kitchen entrance'}

In [15]:
context = df_train1.iloc[2511].review
print(context)
question = df_train1.iloc[2511].question
print(question)
question_answerer(question=question, context=context)

for building qtc12345, On Floors 22 and 23, beside the Elevators. Behind a glass case on the wall. ANSWERNOTFOUND
Where is the defibrillator  for qtc12345?


{'score': 0.9999549388885498,
 'start': 66,
 'end': 97,
 'answer': 'Behind a glass case on the wall'}

In [16]:
context = df_train1.iloc[2512].review
print(context)
question = df_train1.iloc[2512].question
print(question)
question_answerer(question=question, context=context)

for building qtc676767 on floors 22 and 23 beside security desk close to fire extinguishers on  wall. ANSWERNOTFOUND
Where is defibrillator for qtc676767?


{'score': 0.9999605417251587,
 'start': 43,
 'end': 100,
 'answer': 'beside security desk close to fire extinguishers on  wall'}

In [17]:
context = df_train1.iloc[2513].review
print(context)
question = df_train1.iloc[2513].question
print(question)
question_answerer(question=question, context=context)

for building qtc123423 on floors 22 and 23 beside security desk close to fire extinguishers on  wall behind glass case. ANSWERNOTFOUND
Where is  defibrillator for qtc123423?


{'score': 0.9999316930770874,
 'start': 43,
 'end': 118,
 'answer': 'beside security desk close to fire extinguishers on  wall behind glass case'}

In [18]:
context = df_train1.iloc[2514].review
print(context)
question = df_train1.iloc[2514].question
print(question)
question_answerer(question=question, context=context)

for building qtc12345 there is an Evacuation chair located on Floors 22 and 23 in kitchen to right of restrooms. ANSWERNOTFOUND
Where is Evacuation chair  for qtc12345?


{'score': 0.9999574422836304,
 'start': 59,
 'end': 111,
 'answer': 'on Floors 22 and 23 in kitchen to right of restrooms'}

In [19]:
context = df_train1.iloc[2515].review
print(context)
question = df_train1.iloc[2515].question
print(question)
question_answerer(question=question, context=context)

for building qtc676767  there is an Evacuation chair located on Floors 22 and 23 in  kitchen. ANSWERNOTFOUND
Where is Evacuation chair  for qtc676767?


{'score': 0.9999653100967407,
 'start': 61,
 'end': 92,
 'answer': 'on Floors 22 and 23 in  kitchen'}

In [20]:
context = df_train1.iloc[2516].review
print(context)
question = df_train1.iloc[2516].question
print(question)
question_answerer(question=question, context=context)

for building qtc123423  there is an evacuation chair located on Floors 22 and 23 beside  front desk. ANSWERNOTFOUND
Where is the Evacuation chair  for qtc123423?


{'score': 0.9999653100967407,
 'start': 61,
 'end': 99,
 'answer': 'on Floors 22 and 23 beside  front desk'}

In [21]:
context = df_train1.iloc[2517].review
print(context)
question = df_train1.iloc[2517].question
print(question)
question_answerer(question=question, context=context)

For active shoorters scenario's the safest and quickest thing to do is to:  RUN, HIDE, FIGHT. ANSWERNOTFOUND
What should I do there is an active shooter


{'score': 0.9999390840530396,
 'start': 76,
 'end': 92,
 'answer': 'RUN, HIDE, FIGHT'}

In [22]:
context = df_train1.iloc[2518].review
print(context)
question = df_train1.iloc[2518].question
print(question)
question_answerer(question=question, context=context)

Communicating with authorities, in the text box at the bottm of this application type in: send location or send photo this will cause this app to send your location or a photo tp pre-defined numbers. ANSWERNOTFOUND
How do I message my location to authorities ?


{'score': 0.9999600648880005,
 'start': 90,
 'end': 117,
 'answer': 'send location or send photo'}

In [23]:
context = df_train1.iloc[2519].review
print(context)
question = df_train1.iloc[2519].question
print(question)
question_answerer(question=question, context=context)

For all emergencies you can call  8881234567 or 7771234567, however for immediate emergencies dial 911. ANSWERNOTFOUND
What are emergency security numbers?


{'score': 0.9999698400497437,
 'start': 28,
 'end': 58,
 'answer': 'call  8881234567 or 7771234567'}

In [24]:
context = df_train1.iloc[2520].review
print(context)
question = df_train1.iloc[2520].question
print(question)
question_answerer(question=question, context=context)

Located on your App AI message text box window, located at the bottom of your cellphone screen  message Bomb_Threat_in_Building to security  ensuirng you type in the Building ID as well. ANSWERNOTFOUND
What do I do for bomb threat ?


{'score': 0.9999363422393799,
 'start': 96,
 'end': 139,
 'answer': 'message Bomb_Threat_in_Building to security'}

In [25]:
context = df_train1.iloc[2521].review
print(context)
question = df_train1.iloc[2521].question
print(question)
question_answerer(question=question, context=context)

When you see any un-attended item, regardless, imediately Evacuate the area and contact security. ANSWERNOTFOUND
what should I do with I unattended item?


{'score': 0.9999384880065918,
 'start': 47,
 'end': 96,
 'answer': 'imediately Evacuate the area and contact security'}

In [26]:
context = df_train1.iloc[2522].review
print(context)
question = df_train1.iloc[2522].question
print(question)
question_answerer(question=question, context=context)

In the case where the elevator becomes stuck, Press the intercom button and wait to be connected to the operator. ANSWERNOTFOUND
What do i do stuck in the elevator?


{'score': 0.9999383687973022,
 'start': 46,
 'end': 112,
 'answer': 'Press the intercom button and wait to be connected to the operator'}

In [27]:
context = df_train1.iloc[2523].review
print(context)
question = df_train1.iloc[2523].question
print(question)
question_answerer(question=question, context=context)

In the event of hearing the fire alarm, Immediately call security on 6261234567 or 8581234567 and make your way to the stairs to exit the building. ANSWERNOTFOUND
What do I do fire alarm has sounded ?


{'score': 0.99991774559021,
 'start': 69,
 'end': 93,
 'answer': '6261234567 or 8581234567'}

In [28]:
context = df_train1.iloc[2524].review
print(context)
question = df_train1.iloc[2524].question
print(question)
question_answerer(question=question, context=context)

for building qtc12345 on each floor there are numerous fire extinguishers located as follows, Trisection A: By the Exit to the Stairs, Trisection A By the Exit to the Stairs Trisection B By the Exit to the Stairs Trisection C In the kitchen. ANSWERNOTFOUND
Where are fire extinguishers for qtc12345?


{'score': 0.9998230934143066,
 'start': 46,
 'end': 104,
 'answer': 'numerous fire extinguishers located as follows, Trisection'}

In [29]:
context = df_train1.iloc[2525].review
print(context)
question = df_train1.iloc[2525].question
print(question)
question_answerer(question=question, context=context)

for building qtc676767 on each floor there are 3 fire extinguishers located as follows. By the Exit to Stairs Trisection C in kitchen. ANSWERNOTFOUND
Where are fire extinguishers for qtc676767?


{'score': 0.9999499320983887,
 'start': 88,
 'end': 133,
 'answer': 'By the Exit to Stairs Trisection C in kitchen'}

In [30]:
context = df_train1.iloc[2526].review
print(context)
question = df_train1.iloc[2526].question
print(question)
question_answerer(question=question, context=context)

for building qtc123423 on each floor there 2 fire extinguishers located as follows. By the Exit to Stairs Trisection B in kitchen. ANSWERNOTFOUND
Where are fire extinguishers for qtc123423?


{'score': 0.9999505281448364,
 'start': 84,
 'end': 129,
 'answer': 'By the Exit to Stairs Trisection B in kitchen'}

In [31]:
context = df_train1.iloc[2527].review
print(context)
question = df_train1.iloc[2527].question
print(question)
question_answerer(question=question, context=context)

If you find that smoke is overwhelming you try to cover your mouth with something so you dont inhale the smoke and in the text box below send your location including floor number and trisection e.g. Help I cant breath, I am on floor 22 in trisection 3, by the window. ANSWERNOTFOUND
What to so there is so much smoke 


{'score': 0.9999192953109741,
 'start': 50,
 'end': 110,
 'answer': 'cover your mouth with something so you dont inhale the smoke'}

In [32]:
context = df_train1.iloc[2528].review
print(context)
question = df_train1.iloc[2528].question
print(question)
question_answerer(question=question, context=context)

 If the office or building loses electrical power and there is no lighting All the lights have gone and I cannot see anything  On your cellphone activate the flashlight and follow your subway exit route color to the stairs. Make sure you contact site management on 6261234567 or 8581234567 please notify the Office Manager or Security immediately using these numbers: QTC EHS (DB): 888 123 4567, Corp EHS (JK): 888 123 456. ANSWERNOTFOUND
What to do no electricity 


{'score': 0.9999194145202637,
 'start': 238,
 'end': 289,
 'answer': 'contact site management on 6261234567 or 8581234567'}

In [33]:
context = df_train1.iloc[2529].review
print(context)
question = df_train1.iloc[2529].question
print(question)
question_answerer(question=question, context=context)

In the event of an Earthquake move away from windows cover under  stable object. Once the earthquake has stopped move out and ahead with extreme caution. ANSWERNOTFOUND
What to do there is an earthquake  


{'score': 0.9999394416809082,
 'start': 30,
 'end': 79,
 'answer': 'move away from windows cover under  stable object'}

In [34]:
context = df_train1.iloc[2530].review
print(context)
question = df_train1.iloc[2530].question
print(question)
question_answerer(question=question, context=context)

During inclement weather conditions, move away from all windows and try to get to the lowest ground floor possible, however do not use the elevators. If there is no time to get to the ground floor then move to interior spaces like restrooms and corridors please notify the Office Manager or Security immediately of your location using these numbers: QTC EHS (DB): 888 123 4567, Corp EHS (JK): 888 123 456. ANSWERNOTFOUND
What to do there is a hurricane  


{'score': 0.9999029636383057,
 'start': 202,
 'end': 254,
 'answer': 'move to interior spaces like restrooms and corridors'}

In [36]:
context = df_train1.iloc[2531].review
print(context)
question = df_train1.iloc[2531].question
print(question)
question_answerer(question=question, context=context)

For chemical spill do not leave building contact 6261234599 or 8584444221. ANSWERNOTFOUND
What to do if there is a chemical spill 


{'score': 1.2215384231240023e-05, 'start': 19, 'end': 21, 'answer': 'do'}
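
This last query scores about 1.2e-05 and returns the fragment 'do', while the earlier examples all score above 0.999. One way to avoid surfacing such a low-confidence span is to gate on the score. The sketch below assumes a cut-off of 0.5 and a fallback string of "ANSWERNOTFOUND"; both are illustrative choices, and `answer_or_fallback` is a hypothetical helper, not something the notebook defines.

```python
def answer_or_fallback(result, min_score=0.5, fallback="ANSWERNOTFOUND"):
    """Return the predicted answer, or `fallback` when the score is too low.

    `min_score` and `fallback` are assumptions for illustration; a suitable
    threshold would need to be chosen on held-out data.
    """
    if result["score"] < min_score:
        return fallback
    return result["answer"]


# Two results copied from the outputs above.
low = {"score": 1.2215384231240023e-05, "start": 19, "end": 21, "answer": "do"}
high = {
    "score": 0.9999029636383057,
    "start": 202,
    "end": 254,
    "answer": "move to interior spaces like restrooms and corridors",
}

print(answer_or_fallback(low))   # ANSWERNOTFOUND
print(answer_or_fallback(high))  # move to interior spaces like restrooms and corridors
```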

In [17]:
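def find_substring_positions(text, substring):
    """Return (start, end) character offsets of the first occurrence of
    `substring` in `text`, or (-1, -1) if it does not occur.

    `end` is exclusive, so text[start:end] == substring.
    """
    start_pos = text.find(substring)  # str.find returns -1 when there is no match
    if start_pos == -1:
        return (-1, -1)
    end_pos = start_pos + len(substring)
    return (start_pos, end_pos)

# Example usage: locate the answer span inside one of the training contexts.
text = "For active shoorters scenario's the safest and quickest thing to do is to:  RUN, HIDE, FIGHT. ANSWERNOTFOUND"
substring = "RUN, HIDE, FIGHT"

start_pos, end_pos = find_substring_positions(text, substring)

if start_pos != -1:
    print(f"Substring found at positions: ({start_pos}, {end_pos})")
else:
    print("Substring not found.")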

Substring found at positions: (76, 92)
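
The offsets found this way are the kind of value a SQuAD-style `answers` field stores. As a sketch, and assuming the training data follows the SQuAD v2 layout (`text` and `answer_start` lists, both empty for an unanswerable question), the helper can be wrapped as follows. `build_answers` is a hypothetical name; the notebook's actual preprocessing may differ.

```python
def find_substring_positions(text, substring):
    """Return (start, end) offsets of `substring` in `text`, or (-1, -1)."""
    start_pos = text.find(substring)
    if start_pos == -1:
        return (-1, -1)
    return (start_pos, start_pos + len(substring))


def build_answers(context, answer_text):
    """Build a SQuAD-style `answers` dict for one example (assumed layout)."""
    start, _ = find_substring_positions(context, answer_text)
    if start == -1:
        # Answer text not present in the context: treat as unanswerable.
        return {"text": [], "answer_start": []}
    return {"text": [answer_text], "answer_start": [start]}


context = (
    "For active shoorters scenario's the safest and quickest thing to do "
    "is to:  RUN, HIDE, FIGHT. ANSWERNOTFOUND"
)

print(build_answers(context, "RUN, HIDE, FIGHT"))
# {'text': ['RUN, HIDE, FIGHT'], 'answer_start': [76]}
print(build_answers(context, "call security"))
# {'text': [], 'answer_start': []}
```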
