# **BERT for Question Answering**

**Characteristics of BERT-based Question Answering System**

1. Pretrained Transformer Model

Uses BERT (Bidirectional Encoder Representations from Transformers), pretrained on large corpora (Wikipedia + BookCorpus).

Fine-tuned on the SQuAD 2.0 dataset for QA tasks.

2. Context-Aware Understanding

Each token's representation is conditioned on both its left and right context, which helps the model resolve meaning that strictly left-to-right models can miss.

3. Extractive QA

The system predicts the start and end token positions in the context and returns the span between them as the answer.

4. Confidence Scoring

Provides a confidence score for each predicted answer, derived from the softmax probabilities of the predicted start and end positions.

5. Handles Long Texts with Chunking

Long contexts can be split into overlapping chunks, with the per-chunk answers aggregated, to avoid truncation at the 512-token input limit.

6. Domain Adaptability

Pretrained BERT can be fine-tuned further for specialized domains like medicine, law, or finance.
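
To make the extractive QA and confidence scoring ideas concrete, here is a minimal sketch on hand-made numbers. The tokens and logits are invented for illustration and do not come from the model; the confidence formula (mean of the two peak probabilities) mirrors the one used in `predict_answer` later in this notebook.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical tokens and logits, invented for illustration only
tokens = ["[CLS]", "Who", "wrote", "it", "?", "[SEP]",
          "Ada", "Lovelace", "wrote", "it", "[SEP]"]
start_logits = [0.1, -2.0, -2.0, -2.0, -2.0, -3.0, 5.0, 1.0, -1.0, -1.0, -3.0]
end_logits = [0.1, -2.0, -2.0, -2.0, -2.0, -3.0, 1.0, 5.5, -1.0, -1.0, -3.0]

# One probability per token position, for "answer starts here" and "answer ends here"
start_probs = softmax(start_logits)
end_probs = softmax(end_logits)

# Most probable start and end positions, chosen independently
start_idx = max(range(len(tokens)), key=start_probs.__getitem__)
end_idx = max(range(len(tokens)), key=end_probs.__getitem__)

# The answer is the token span between the two positions (inclusive)
answer = " ".join(tokens[start_idx:end_idx + 1])
confidence = (start_probs[start_idx] + end_probs[end_idx]) / 2
print(answer, round(confidence, 2))
```

The answer is always a substring of the context: the model selects positions, it does not generate text.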

**Applications of BERT-based Question Answering**

1. Search Engines & Information Retrieval

Improves search results by extracting direct answers instead of returning only a list of documents.

2. Chatbots & Virtual Assistants

Can power question answering in intelligent assistants (voice assistants in the style of Alexa or Siri, or customer service bots).

3. Education & E-Learning

Provides instant answers from textbooks, lecture notes, or online resources for students.

4. Healthcare

Helps doctors and patients by answering queries from medical research papers and clinical notes.

5. Legal & Finance Industry

Assists in extracting precise answers from contracts, policies, or regulations.

6. Business Intelligence

Enables employees to query large internal documents (reports, manuals, FAQs) in natural language.

7. Research & Academia

Facilitates literature review by quickly answering specific research questions from papers.

Install Required Libraries

In [1]:
!pip install transformers[torch]==4.38.2
!pip install datasets==2.13.1
!pip install plotly



Import Libraries

In [3]:
# Imports
import torch
import numpy as np
import pandas as pd
from scipy.special import softmax
import plotly.express as px
from transformers import BertForQuestionAnswering, BertTokenizerFast


Load Pretrained Model & Tokenizer

In [4]:
# Step 1: Load Model & Tokenizer
model_name = "deepset/bert-base-cased-squad2"
tokenizer = BertTokenizerFast.from_pretrained(model_name)
model = BertForQuestionAnswering.from_pretrained(model_name)



Some weights of the model checkpoint at deepset/bert-base-cased-squad2 were not used when initializing BertForQuestionAnswering: ['bert.pooler.dense.bias', 'bert.pooler.dense.weight']
- This IS expected if you are initializing BertForQuestionAnswering from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForQuestionAnswering from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


Define Context and a Sample Question

In [5]:
context = """Artificial Intelligence (AI) is the field of computer science that
focuses on creating systems capable of performing tasks that normally require
human intelligence. These tasks include reasoning, learning, problem-solving,
perception, and natural language understanding.

AI can be classified into two categories: narrow AI, which is designed for
specific tasks like speech recognition or image classification, and general AI,
which aims to perform any intellectual task that a human can do. Machine
learning and deep learning are subfields of AI that have enabled major
breakthroughs in computer vision, natural language processing, and robotics.
"""

question = "What are the two categories of AI?"


Tokenize Input

In [10]:
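# Step 3: Tokenize Inputs

# Encode the question and context together as one sequence pair
inputs = tokenizer(question, context, return_tensors="pt")
# Display the WordPiece tokens of the context alone
tokenizer.tokenize(context)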



['Art', '##ific', '##ial', 'Intelligence', '(', 'AI', ')', 'is', 'the', 'field',
 'of', 'computer', 'science', 'that', 'focuses', 'on', 'creating', 'systems',
 'capable', 'of', 'performing', 'tasks', 'that', 'normally', 'require', 'human',
 'intelligence', '.', 'These', 'tasks', 'include', 'reasoning', ',', 'learning',
 ',', 'problem', '-', 'solving', ',', 'perception', ',', 'and', 'natural',
 'language', 'understanding', '.', 'AI', 'can', 'be', 'classified', 'into', 'two',
 'categories', ':', 'narrow', 'AI', ',', 'which', 'is', 'designed', 'for',
 'specific', 'tasks', 'like', 'speech', 'recognition', 'or', 'image',
 'classification', ',', 'and', 'general', 'AI', ',', 'which', 'aims', 'to',
 'perform', 'any', 'intellectual', 'task', 'that', 'a', 'human', 'can', 'do', '.',
 'Machine', 'learning', 'and', 'deep', 'learning', 'are', 'sub', '##fields', 'of',
 'AI', 'that', 'have', 'enabled', 'major', ...
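
The token list above covers the context only, whereas `inputs` encodes the question and context together as a sequence pair. The sketch below rebuilds that pair layout by hand to show where the context starts inside the combined sequence. `build_pair_layout` is a hypothetical helper written for illustration; in the notebook the tokenizer does this internally and returns the result as `input_ids` and `token_type_ids`.

```python
def build_pair_layout(question_tokens, context_tokens):
    """Lay out a BERT sequence pair: [CLS] question [SEP] context [SEP].

    Hypothetical helper for illustration; the real tokenizer also maps
    tokens to vocabulary ids and builds an attention mask.
    """
    tokens = ["[CLS]"] + question_tokens + ["[SEP]"] + context_tokens + ["[SEP]"]
    # Segment 0 covers [CLS], the question, and the first [SEP];
    # segment 1 covers the context and the final [SEP]
    segment_ids = [0] * (len(question_tokens) + 2) + [1] * (len(context_tokens) + 1)
    return tokens, segment_ids

tokens, segment_ids = build_pair_layout(
    ["What", "is", "AI", "?"],
    ["AI", "is", "a", "field", "."],
)

# Index of the first context token inside the combined sequence
context_offset = segment_ids.index(1)

for position, (token, segment) in enumerate(zip(tokens, segment_ids)):
    print(position, segment, token)
```

Because the question comes first, the start and end indices predicted by the model refer to positions in this combined sequence, not to positions in the context-only token list.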

Run Model Inference

In [7]:
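# Step 4: Run Model Inference
with torch.no_grad():
    outputs = model(**inputs)

# Convert logits to probabilities over token positions (batch size is 1)
start_scores = softmax(outputs.start_logits.numpy(), axis=-1)[0]
end_scores = softmax(outputs.end_logits.numpy(), axis=-1)[0]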


Extract Answer

In [8]:
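# Step 5: Extract Answer
# Pick the most probable start and end positions independently
# (this does not guarantee start_idx <= end_idx)
start_idx = np.argmax(start_scores)
end_idx = np.argmax(end_scores)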

answer_ids = inputs.input_ids[0][start_idx : end_idx + 1]
answer_tokens = tokenizer.convert_ids_to_tokens(answer_ids)
answer = tokenizer.convert_tokens_to_string(answer_tokens)

print(f"Q: {question}")
print(f"A: {answer}")

Q: What are the two categories of AI?
A: narrow AI


Visualize Token Scores

In [9]:
# Step 6: Visualize Token Scores
scores_df = pd.DataFrame({
    "Token Position": list(range(len(start_scores))) * 2,
    "Score": list(start_scores) + list(end_scores),
    "Score Type": ["Start"] * len(start_scores) + ["End"] * len(end_scores),
})

px.bar(scores_df, x="Token Position", y="Score", color="Score Type",
       barmode="group", title="Start and End Scores for Tokens").show()
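
The chart indexes tokens by position, which makes it hard to tell which word a peak belongs to. A small text companion, sketched below with invented scores, pairs each score with its token and lists the highest ones. `top_k_tokens` is a hypothetical helper; in the notebook, the token strings could be obtained with `tokenizer.convert_ids_to_tokens(inputs.input_ids[0])`.

```python
def top_k_tokens(tokens, scores, k=3):
    """Return the k (token, score) pairs with the highest scores."""
    ranked = sorted(zip(tokens, scores), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]

# Invented tokens and start probabilities, for illustration only
tokens = ["[CLS]", "two", "categories", ":", "narrow", "AI"]
start_scores = [0.02, 0.05, 0.03, 0.01, 0.80, 0.09]

for token, score in top_k_tokens(tokens, start_scores, k=2):
    print(f"{token:>10s}  {score:.2f}")
```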


In [11]:
# Part 2: Predict Answers with Function + Chunking

# Function to predict answer for a given context and question
def predict_answer(context, question):
    # Truncate to the 512-token limit; longer contexts should be chunked first
    inputs = tokenizer(question, context, return_tensors="pt", truncation=True, max_length=512)
    with torch.no_grad():
        outputs = model(**inputs)
    # Probabilities over token positions (batch size is 1)
    start_scores = softmax(outputs.start_logits.numpy(), axis=-1)[0]
    end_scores = softmax(outputs.end_logits.numpy(), axis=-1)[0]
    start_idx, end_idx = np.argmax(start_scores), np.argmax(end_scores)
    # Confidence: mean of the two peak probabilities
    confidence_score = (start_scores[start_idx] + end_scores[end_idx]) / 2
    answer_ids = inputs.input_ids[0][start_idx: end_idx + 1]
    answer_tokens = tokenizer.convert_ids_to_tokens(answer_ids)
    answer = tokenizer.convert_tokens_to_string(answer_tokens)
    # An empty span (end before start) or a bare [CLS] means "no answer"
    if answer and answer != tokenizer.cls_token:
        return answer, confidence_score
    return None, confidence_score
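
`predict_answer` treats a prediction of `[CLS]` as "no answer", since models fine-tuned on SQuAD 2.0 use position 0 for unanswerable questions. A softer variant, sketched below on invented logits, compares the score of that null span with the best real span and abstains only when the gap exceeds a threshold. The function names and the `threshold` parameter are assumptions made for this example; a suitable threshold would have to be tuned on held-out data.

```python
def best_non_null_score(start_logits, end_logits):
    """Best start+end logit sum over spans that exclude position 0 ([CLS])."""
    return max(
        start_logits[s] + end_logits[e]
        for s in range(1, len(start_logits))
        for e in range(s, len(end_logits))
    )

def is_unanswerable(start_logits, end_logits, threshold=0.0):
    """Abstain when the null span beats the best real span by more than threshold."""
    null_score = start_logits[0] + end_logits[0]
    return null_score - best_non_null_score(start_logits, end_logits) > threshold

# Invented logits: in the first case [CLS] dominates, in the second a real span does
print(is_unanswerable([6.0, 1.0, 2.0], [5.5, 0.5, 2.5]))
print(is_unanswerable([0.5, 1.0, 2.0], [0.5, 0.5, 2.5]))
```

Raising the threshold makes the system answer more often; lowering it makes it abstain more often.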

In [12]:
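# Function to split long contexts into overlapping chunks
def chunk_sentences(sentences, chunk_size, stride):
    # `stride` is the number of items shared by consecutive chunks,
    # so each chunk starts `chunk_size - stride` items after the previous one
    if stride >= chunk_size:
        raise ValueError("stride must be smaller than chunk_size")
    chunks = []
    num_sentences = len(sentences)
    for i in range(0, num_sentences, chunk_size - stride):
        chunk = sentences[i: i + chunk_size]
        chunks.append(chunk)
    return chunks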
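
A quick check on a toy list shows how `chunk_size` and `stride` interact. The function is repeated here in condensed form so the example runs on its own; the placeholder strings `s0` to `s4` stand in for real sentences.

```python
def chunk_sentences(sentences, chunk_size, stride):
    """Split a list into chunks of chunk_size items that overlap by stride items."""
    chunks = []
    # Each chunk starts (chunk_size - stride) items after the previous one
    for i in range(0, len(sentences), chunk_size - stride):
        chunks.append(sentences[i:i + chunk_size])
    return chunks

sentences = ["s0", "s1", "s2", "s3", "s4"]
chunks = chunk_sentences(sentences, chunk_size=3, stride=1)
for chunk in chunks:
    print(chunk)
```

Two details are worth noting: `stride` acts as the overlap between neighbouring chunks rather than the step size, and the final chunk can be shorter than `chunk_size` and fully contained in the previous one.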


In [13]:
# Define the context (the same AI paragraph as above)

context = """Artificial Intelligence (AI) is the field of computer science that
focuses on creating systems capable of performing tasks that normally require
human intelligence. These tasks include reasoning, learning, problem-solving,
perception, and natural language understanding.

AI can be classified into two categories: narrow AI, which is designed for
specific tasks like speech recognition or image classification, and general AI,
which aims to perform any intellectual task that a human can do. Machine
learning and deep learning are subfields of AI that have enabled major
breakthroughs in computer vision, natural language processing, and robotics.
"""

# Split into lines (each line is treated as one "sentence" by the chunker)
sentences = context.split("\n")

# Create chunks (size 3 lines, overlap of 1 line between consecutive chunks)
chunked_sentences = chunk_sentences(sentences, chunk_size=3, stride=1)

# Define questions about AI
questions = [
    "What is Artificial Intelligence?",
    "What are the two categories of AI?",
    "What is narrow AI used for?",
    "What are subfields of AI?",
]

In [14]:
# Dictionary to store the highest-scoring answer per question
answers = {}

for chunk in chunked_sentences:
    sub_context = "\n".join(chunk)
    for question in questions:
        answer, score = predict_answer(sub_context, question)
        # Keep the answer only if it beats the best score seen so far
        if answer and (question not in answers or score > answers[question][1]):
            answers[question] = (answer, score)
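
The aggregation rule in the loop above can be isolated and checked without running the model. The sketch below applies the same "keep the highest-scoring non-empty answer" logic to invented predictions. `aggregate_best` is a hypothetical helper, and the tuples stand in for what `predict_answer` would return for each chunk.

```python
def aggregate_best(predictions):
    """Keep the highest-scoring non-empty answer for each question.

    predictions is an iterable of (question, answer, score) tuples,
    one per (chunk, question) pair; answer is None when nothing was found.
    """
    best = {}
    for question, answer, score in predictions:
        if not answer:
            continue  # skip "no answer" results, however confident they are
        if question not in best or score > best[question][1]:
            best[question] = (answer, score)
    return best

# Invented per-chunk predictions, for illustration only
predictions = [
    ("Q1", "short span", 0.40),
    ("Q1", "better span", 0.75),
    ("Q1", None, 0.99),
    ("Q2", "only span", 0.10),
]

answers = aggregate_best(predictions)
for question, (answer, score) in answers.items():
    print(f"{question}: {answer} ({score:.2f})")
```

One caveat with this rule is that each score comes from a softmax computed within its own chunk, so scores from different chunks are only roughly comparable.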

In [15]:
# Print answers
for q, (a, s) in answers.items():
    print(f"Q: {q}\nA: {a} (confidence: {s:.2f})\n")

Q: What is Artificial Intelligence?
A: computer science (confidence: 0.86)

Q: What are the two categories of AI?
A: narrow AI, which is designed for specific tasks like speech recognition or image classification, and general AI (confidence: 0.88)

Q: What is narrow AI used for?
A: specific tasks (confidence: 0.91)

Q: What are subfields of AI?
A: Machine learning and deep learning (confidence: 0.69)

