## ***Step 1:*** Install the Transformers Library

In [1]:
%%capture
!pip install transformers

In [2]:
import torch
from transformers import AutoTokenizer, BertForQuestionAnswering

## ***Step 2:*** Load the fine-tuned model

In [3]:
# Load the tokenizer that matches the fine-tuned checkpoint
tokenizer = AutoTokenizer.from_pretrained('bert-large-uncased-whole-word-masking-finetuned-squad')

# Load the fine-tuned model and switch it to inference mode (disables dropout)
model = BertForQuestionAnswering.from_pretrained('bert-large-uncased-whole-word-masking-finetuned-squad')
model.eval()




Some weights of the model checkpoint at bert-large-uncased-whole-word-masking-finetuned-squad were not used when initializing BertForQuestionAnswering: ['bert.pooler.dense.bias', 'bert.pooler.dense.weight']
- This IS expected if you are initializing BertForQuestionAnswering from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForQuestionAnswering from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


BertForQuestionAnswering(
  (bert): BertModel(
    (embeddings): BertEmbeddings(
      (word_embeddings): Embedding(30522, 1024, padding_idx=0)
      (position_embeddings): Embedding(512, 1024)
      (token_type_embeddings): Embedding(2, 1024)
      (LayerNorm): LayerNorm((1024,), eps=1e-12, elementwise_affine=True)
      (dropout): Dropout(p=0.1, inplace=False)
    )
    (encoder): BertEncoder(
      (layer): ModuleList(
        (0-23): 24 x BertLayer(
          (attention): BertAttention(
            (self): BertSdpaSelfAttention(
              (query): Linear(in_features=1024, out_features=1024, bias=True)
              (key): Linear(in_features=1024, out_features=1024, bias=True)
              (value): Linear(in_features=1024, out_features=1024, bias=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
            (output): BertSelfOutput(
              (dense): Linear(in_features=1024, out_features=1024, bias=True)
              (LayerNorm): LayerNorm((1024,)
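
The `position_embeddings` entry in the printout above has 512 rows, so the question and the context together cannot exceed 512 tokens. A health policy document can easily be longer than that. One common workaround is to split the context into overlapping windows and query each window separately. The following is a minimal sketch of that idea on plain lists of token ids; `split_into_windows`, `window_len`, and `step` are hypothetical names chosen for illustration, not part of this notebook or of the Transformers library.

```python
def split_into_windows(token_ids, window_len, step):
    """Split a list of token ids into overlapping windows.

    window_len: maximum number of tokens per window (assumed budget for the context)
    step: distance between the starts of consecutive windows;
          consecutive windows overlap by window_len - step tokens
    """
    if step <= 0 or step > window_len:
        raise ValueError("step must be in the range 1..window_len")

    windows = []
    start = 0
    while True:
        windows.append(token_ids[start:start + window_len])
        # Stop once the current window reaches the end of the sequence
        if start + window_len >= len(token_ids):
            break
        start += step
    return windows


# Toy example: ten "token ids", windows of four tokens, advancing two at a time
example_windows = split_into_windows(list(range(10)), window_len=4, step=2)
print(example_windows)
```

Each window could then be passed to the model together with the query, keeping the answer from the highest-scoring window. Hugging Face tokenizers also accept `return_overflowing_tokens=True` together with `stride` to produce such windows directly, which may be preferable in practice.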

## ***Step 3:*** Define the prediction and evaluation functions

In [4]:
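def predict(context, query):
  # Encode the (question, context) pair; truncate the context if the pair exceeds BERT's 512-token limit
  inputs = tokenizer(query, context, return_tensors='pt', truncation='only_second', max_length=512)

  # Inference only, so skip gradient tracking
  with torch.no_grad():
    outputs = model(**inputs)

  answer_start = torch.argmax(outputs.start_logits)  # most likely first token of the answer
  answer_end = torch.argmax(outputs.end_logits) + 1  # most likely last token (+1 so the slice includes it)

  # Convert the selected token ids back into a string
  answer = tokenizer.convert_tokens_to_string(tokenizer.convert_ids_to_tokens(inputs['input_ids'][0][answer_start:answer_end]))

  return answer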

def normalize_text(s):
  """Lowercase the text, strip punctuation and articles, and collapse whitespace (SQuAD-style answer normalization)."""
  import re
  import string

  def remove_articles(text):
    regex = re.compile(r"\b(a|an|the)\b", re.UNICODE)
    return re.sub(regex, " ", text)

  def white_space_fix(text):
    return " ".join(text.split())

  def remove_punc(text):
    exclude = set(string.punctuation)
    return "".join(ch for ch in text if ch not in exclude)

  def lower(text):
    return text.lower()

  return white_space_fix(remove_articles(remove_punc(lower(s))))

def compute_exact_match(prediction, truth):
    return int(normalize_text(prediction) == normalize_text(truth))

def compute_f1(prediction, truth):
  from collections import Counter

  pred_tokens = normalize_text(prediction).split()
  truth_tokens = normalize_text(truth).split()

  # if either the prediction or the truth is empty (no answer), f1 = 1 if they agree, 0 otherwise
  if len(pred_tokens) == 0 or len(truth_tokens) == 0:
    return int(pred_tokens == truth_tokens)

  # count shared tokens as a multiset so repeated words are not under-counted
  common_tokens = Counter(pred_tokens) & Counter(truth_tokens)
  num_common = sum(common_tokens.values())

  # if there are no common tokens then f1 = 0
  if num_common == 0:
    return 0

  prec = num_common / len(pred_tokens)
  rec = num_common / len(truth_tokens)

  return 2 * (prec * rec) / (prec + rec)
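
`predict` picks the start and end positions independently with `argmax`, so the end can land before the start and the resulting slice is empty. A common alternative is to score every valid (start, end) pair jointly and keep the best one. The sketch below illustrates this on plain Python lists of logits; `best_span` and `max_answer_len` are hypothetical names, and the toy logits are made up for illustration only.

```python
def best_span(start_logits, end_logits, max_answer_len=30):
    """Return (start, end, score) for the highest-scoring span with start <= end.

    The score of a span is start_logits[start] + end_logits[end].
    `end` is inclusive, so the answer tokens are tokens[start:end + 1].
    """
    best = (0, 0, float("-inf"))
    for start, start_score in enumerate(start_logits):
        # Only consider ends at or after the start, within the length limit
        last_end = min(start + max_answer_len, len(end_logits))
        for end in range(start, last_end):
            score = start_score + end_logits[end]
            if score > best[2]:
                best = (start, end, score)
    return best


# Toy logits: independent argmax would give start=2 and end=0, an invalid span
toy_start_logits = [0.1, 0.2, 5.0, 0.3]
toy_end_logits = [4.0, 0.1, 0.2, 3.0]
span_start, span_end, span_score = best_span(toy_start_logits, toy_end_logits)
print(span_start, span_end, span_score)
```

With real model output, the logits could be obtained from `outputs.start_logits[0].tolist()` and `outputs.end_logits[0].tolist()`. The joint score can also serve as a rough confidence signal for filtering weak answers, though any threshold would need to be tuned on labelled data.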

In [5]:
def give_an_answer(context,query):

  prediction = predict(context,query)
  # em_score = compute_exact_match(prediction, answer)
  # f1_score = compute_f1(prediction, answer)

  print(f"Insights : {query}")
  print(f"Extracted Keypoints : {prediction}")
  # print(f"True Answer: {answer}")
  # print(f"EM: {em_score}")
  # print(f"F1: {f1_score}")
  print("\n")

  return prediction

## ***Step 4:*** Test the model

For extractive textual QA tasks, we adopt two evaluation metrics, which measure ***exact match*** and ***partial overlap*** respectively. Both are computed after normalizing the text (lowercasing, removing punctuation and articles, collapsing whitespace).
- ***Exact Match***: measures whether the predicted answer exactly matches the ground-truth answer. The score is 1 if the normalized strings are identical and 0 otherwise.
- ***F1 Score***: measures the word overlap between the predicted and ground-truth answers as the harmonic mean of precision (the share of predicted words that appear in the ground truth) and recall (the share of ground-truth words that appear in the prediction), so a high score requires both to be high.
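
To make the two metrics concrete, here is a small self-contained worked example. It re-implements the normalization and scoring in compact form under the hypothetical names `normalize`, `exact_match`, and `f1`; the prediction and ground-truth strings are invented for illustration and are not taken from any dataset.

```python
import re
import string
from collections import Counter


def normalize(text):
    """Lowercase, drop punctuation and articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction, truth):
    return int(normalize(prediction) == normalize(truth))


def f1(prediction, truth):
    pred_tokens = normalize(prediction).split()
    truth_tokens = normalize(truth).split()
    if not pred_tokens or not truth_tokens:
        return float(pred_tokens == truth_tokens)
    # Multiset intersection counts each shared token as often as it appears in both
    overlap = sum((Counter(pred_tokens) & Counter(truth_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(truth_tokens)
    return 2 * precision * recall / (precision + recall)


# Illustrative strings (assumed, not real annotations)
prediction = "a co-payment of 5%"
truth = "The co-payment clause of 5%"

em_score = exact_match(prediction, truth)
f1_score = f1(prediction, truth)
print(em_score, round(f1_score, 3))
```

After normalization the prediction has three tokens, all of which appear in the four-token ground truth, so precision is 1, recall is 0.75, and F1 is 6/7 (about 0.857), while Exact Match is 0 because the strings differ.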

In [6]:
!pip install pandas



In [9]:
import pandas as pd

In [10]:
# `give_an_answer` (defined above) prints and returns the extracted keypoints for a query

# List of 15 fixed queries
fixed_queries = [
    "Add-on Covers or Riders",
    "Automatic Restoration in policy",
    "Co-morbidities or Pre-existing Diseases",
    "Co-payment in insurance",
    "Critical Illness coverage",
    "Cashless Claims process",
    "Deductibles in health insurance",
    "Policy Exclusions",
    "Coverage Inclusions",
    "No-claim bonus impact",
    "Network Hospitals importance",
    "Premium factors",
    "Sum Insured",
    "Top-up plans in insurance",
    "Waiting Period in policy"
]

# Function to input context
def get_context():
    print("Enter your health policy context:")
    return input()

# Function to handle additional queries with the same context
def additional_queries(context):
    while True:
        new_query = input("\nEnter another query (or type 'exit' to stop and start over): ")
        if new_query.strip().lower() == 'exit':
            return 'exit'
        # give_an_answer already prints the query and the extracted keypoints
        give_an_answer(context, new_query)

# Main function
def process_queries():
    while True:
        # Step 1: Get the context from the user
        context = get_context()

        # Step 2: Process the 15 fixed queries
        results = []
        for q in fixed_queries:
            extracted_keypoints = give_an_answer(context, q)
            results.append((q, extracted_keypoints))  # Append a tuple of (query, extracted_keypoints)

        # Step 3: Display the results in a DataFrame
        df = pd.DataFrame(results, columns=['Query', 'Extracted Keypoints'])
        pd.set_option('display.max_colwidth', None)  # Allow full width for extracted keypoints
        print("\nResults for the fixed queries:")
        print(df.to_string(index=False, justify='left'))

        # Step 4: Handle additional queries with the same context.
        # This returns once the user types 'exit'; the outer loop then asks for a new context.
        additional_queries(context)

# Run the main function
process_queries()


Enter your health policy context:
Insights : Add-on Covers or Riders
Extracted Keypoints : an optional add - on cover for critical illnesses like cancer and liver disease can be added to the policy at an additional premium


Insights : Automatic Restoration in policy
Extracted Keypoints : there is no age limit for policy renewals, ensuring lifelong coverage


Insights : Co-morbidities or Pre-existing Diseases
Extracted Keypoints : pre - existing conditions


Insights : Co-payment in insurance
Extracted Keypoints : the policy allows cashless claims at more than 8, 000 hospitals across the country, with a co - payment clause of 5 % applicable only for claims made by senior citizens aged 65 and above


Insights : Critical Illness coverage
Extracted Keypoints : an optional add - on cover for critical illnesses like cancer and liver disease can be added to the policy at an additional premium


Insights : Cashless Claims process
Extracted Keypoints : the policy allows cashless claims at more

KeyboardInterrupt: Interrupted by user