## Question Answering using Transformers

### Installation

In [1]:
!pip install transformers



In [2]:
!pip install datasets



## Approach-1: Using custom context

### Import required modules

In [12]:
from transformers import pipeline
import warnings
warnings.filterwarnings("ignore")

`pipeline()` function:
Provided by `transformers`, it wraps a model, its tokenizer, and the pre- and post-processing logic for a specific NLP task behind a single call.

"question-answering":
This tells Hugging Face to load a model fine-tuned for extractive question answering, where the answer is a span of text taken from a given context.
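To make "a span of text from a given context" concrete, here is a minimal sketch of how an extractive model could pick a span from per-token start and end scores. The tokens, the logits, and the helper names `softmax` and `best_span` are invented for illustration; this is not the pipeline's internal code.

```python
import math

def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def best_span(start_logits, end_logits, max_answer_len=15):
    """Return (start, end, score) for the highest-scoring valid span."""
    start_probs = softmax(start_logits)
    end_probs = softmax(end_logits)
    best = (0, 0, 0.0)
    for i, p_start in enumerate(start_probs):
        # The end token may not come before the start token.
        for j in range(i, min(i + max_answer_len, len(end_probs))):
            score = p_start * end_probs[j]
            if score > best[2]:
                best = (i, j, score)
    return best

# Hypothetical tokens and logits, made up for this sketch.
tokens = ["AI", "Planet", "has", "headquarters", "in", "Belgium", "and", "India", "."]
start_logits = [0.1, 0.0, -1.0, -0.5, 0.2, 4.0, 0.0, 1.5, -2.0]
end_logits = [-1.0, 0.3, -1.0, 0.0, -0.5, 1.0, 0.0, 4.5, -1.0]

start, end, score = best_span(start_logits, end_logits)
answer = " ".join(tokens[start:end + 1])
print(answer)  # Belgium and India
```

The answer is always a slice of the input tokens, which is why an extractive model cannot produce text that is absent from the context.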

In [13]:
# Load the Q&A pipeline
qa_pipeline = pipeline("question-answering")  # defaults to a DistilBERT (cased) model fine-tuned on SQuAD

No model was supplied, defaulted to distilbert/distilbert-base-cased-distilled-squad and revision 564e9b5 (https://huggingface.co/distilbert/distilbert-base-cased-distilled-squad).
Using a pipeline without specifying a model name and revision in production is not recommended.
Device set to use cpu
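The warning in the output recommends naming the model and revision explicitly. A minimal sketch of one way to do that follows; the model id and revision are copied from the log message, while `build_qa_pipeline` is a hypothetical helper name chosen for this example.

```python
# Model id and revision copied from the log message printed by pipeline().
MODEL_NAME = "distilbert/distilbert-base-cased-distilled-squad"
MODEL_REVISION = "564e9b5"

def build_qa_pipeline(model_name=MODEL_NAME, revision=MODEL_REVISION):
    """Create a question-answering pipeline pinned to one model revision."""
    # Imported lazily so that defining this helper does not load transformers.
    from transformers import pipeline
    return pipeline("question-answering", model=model_name, revision=revision)

# Usage (downloads the model on the first call):
# qa_pipeline = build_qa_pipeline()
```

Pinning both values should keep the notebook's results stable even if the library later changes its default model.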


> #### We will learn about BERT in depth in Module 5

In [11]:
# Example context
context = """
AI Planet is a global AI community with headquarters in Belgium and India.
It started with the vision to make AI education accessible to everyone and build AI for good to solve key challenges of humanity.
As part of our community initiatives, we provide free AI and data science courses by industry experts from large tech companies or
startups worldwide. Over 300K+ learners from 150+ countries have benefited since our inception in 2020.
"""

In [12]:
# Ask questions based on the context
question1 = "How many learners have AI Planet impacted?"
question2 = "Where is AI Planet located?"
question3 = "What is the vision of AI Planet?"

context: A paragraph or document in which the answer is expected to be found.

question: The specific question you want to ask about the context.

qa_pipeline: A pretrained question-answering model loaded earlier using pipeline("question-answering").

In [13]:
# Run the question-answering pipeline and return only the answer text
def answer_question(context, question):
    answer = qa_pipeline(context=context, question=question)
    return answer["answer"]

In [14]:
# Get the answers
answer1 = answer_question(context, question1)
answer2 = answer_question(context, question2)
answer3 = answer_question(context, question3)

In [15]:
print(f"Question:{question1}")
print(f"Answer:{answer1}")

Question:How many learners have AI Planet impacted?
Answer:Over 300K+


In [16]:
print(f"Question:{question2}")
print(f"Answer:{answer2}")

Question:Where is AI Planet located?
Answer:Belgium and India


In [17]:
print(f"Question:{question3}")
print(f"Answer:{answer3}")


Question:What is the vision of AI Planet?
Answer:to make AI education accessible to everyone


## Approach-2: Using datasets from HuggingFace




### Load the dataset using `load_dataset`

We will use the **squad_v2** dataset (SQuAD 2.0). It contains context, question, and answer features.

In [26]:
!pip install --upgrade datasets


Collecting datasets
  Downloading datasets-4.0.0-py3-none-any.whl.metadata (19 kB)
Collecting fsspec<=2025.3.0,>=2023.1.0 (from fsspec[http]<=2025.3.0,>=2023.1.0->datasets)
  Downloading fsspec-2025.3.0-py3-none-any.whl.metadata (11 kB)
Downloading datasets-4.0.0-py3-none-any.whl (494 kB)
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m494.8/494.8 kB[0m [31m8.5 MB/s[0m eta [36m0:00:00[0m
[?25hDownloading fsspec-2025.3.0-py3-none-any.whl (193 kB)
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m193.6/193.6 kB[0m [31m10.3 MB/s[0m eta [36m0:00:00[0m
[?25hInstalling collected packages: fsspec, datasets
  Attempting uninstall: fsspec
    Found existing installation: fsspec 2025.3.2
    Uninstalling fsspec-2025.3.2:
      Successfully uninstalled fsspec-2025.3.2
  Attempting uninstall: datasets
    Found existing installation: datasets 2.14.4
    Uninstalling datasets-2.14.4:
      Successfully uninstalled datasets-2.14.4
ERROR: pip's dependency re

In [1]:
from datasets import load_dataset

# Load SQuAD v2 (recent versions of `datasets` no longer accept `trust_remote_code`)
squad = load_dataset("squad_v2")

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional t

README.md: 0.00B [00:00, ?B/s]

train-00000-of-00001.parquet:   0%|          | 0.00/16.4M [00:00<?, ?B/s]

validation-00000-of-00001.parquet:   0%|          | 0.00/1.35M [00:00<?, ?B/s]

Generating train split:   0%|          | 0/130319 [00:00<?, ? examples/s]

Generating validation split:   0%|          | 0/11873 [00:00<?, ? examples/s]

### Visualize the dataset

The dataset is split into two parts:
- Training dataset
- Validation dataset

In [3]:
squad

DatasetDict({
    train: Dataset({
        features: ['id', 'title', 'context', 'question', 'answers'],
        num_rows: 130319
    })
    validation: Dataset({
        features: ['id', 'title', 'context', 'question', 'answers'],
        num_rows: 11873
    })
})

The dataset contains five columns: id, title, context, question, and answers. The training split has 130,319 rows and the validation split has 11,873 rows. Each row is one question together with its context and answers.
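Unlike the original SQuAD, SQuAD v2 also includes questions that cannot be answered from their context. As far as I know, the `datasets` version marks these with an empty `text` list under `answers`. The sketch below counts both kinds on a small hand-written sample: the first two records are copied from this dataset, the third is a hypothetical unanswerable record, and the helper names are invented for this example.

```python
def is_answerable(example):
    """True when the example carries at least one reference answer."""
    return len(example["answers"]["text"]) > 0

def count_answerable(examples):
    """Return (answerable, unanswerable) counts for a list of examples."""
    answerable = sum(1 for ex in examples if is_answerable(ex))
    return answerable, len(examples) - answerable

sample = [
    {"question": "When did Beyonce start becoming popular?",
     "answers": {"text": ["in the late 1990s"], "answer_start": [269]}},
    {"question": "In what city and state did Beyonce  grow up? ",
     "answers": {"text": ["Houston, Texas"], "answer_start": [166]}},
    # Hypothetical unanswerable record, written for this sketch.
    {"question": "Which university did Beyonce attend?",
     "answers": {"text": [], "answer_start": []}},
]

print(count_answerable(sample))  # (2, 1)
```

On the loaded dataset, something like `squad["train"].filter(is_answerable)` should keep only the answerable rows.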

In [5]:
squad['train']['title'][:10]

['Beyoncé',
 'Beyoncé',
 'Beyoncé',
 'Beyoncé',
 'Beyoncé',
 'Beyoncé',
 'Beyoncé',
 'Beyoncé',
 'Beyoncé',
 'Beyoncé']

In [6]:
squad['train']['question'][:10]

['When did Beyonce start becoming popular?',
 'What areas did Beyonce compete in when she was growing up?',
 "When did Beyonce leave Destiny's Child and become a solo singer?",
 'In what city and state did Beyonce  grow up? ',
 'In which decade did Beyonce become famous?',
 'In what R&B group was she the lead singer?',
 'What album made her a worldwide known artist?',
 "Who managed the Destiny's Child group?",
 'When did Beyoncé rise to fame?',
 "What role did Beyoncé have in Destiny's Child?"]

In [7]:
squad['train']['answers'][:10]

[{'text': ['in the late 1990s'], 'answer_start': [269]},
 {'text': ['singing and dancing'], 'answer_start': [207]},
 {'text': ['2003'], 'answer_start': [526]},
 {'text': ['Houston, Texas'], 'answer_start': [166]},
 {'text': ['late 1990s'], 'answer_start': [276]},
 {'text': ["Destiny's Child"], 'answer_start': [320]},
 {'text': ['Dangerously in Love'], 'answer_start': [505]},
 {'text': ['Mathew Knowles'], 'answer_start': [360]},
 {'text': ['late 1990s'], 'answer_start': [276]},
 {'text': ['lead singer'], 'answer_start': [290]}]
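Each `answer_start` value is a character offset into the context string, so slicing the context at that offset should reproduce the answer text. The sketch below checks this for a few of the answers listed above, using the first two sentences of the Beyoncé context. The helper name `span_matches` is invented for this example.

```python
def span_matches(context, text, start):
    """Check that `text` occurs in `context` exactly at character offset `start`."""
    return context[start:start + len(text)] == text

# First two sentences of the context for the first training example.
context = (
    "Beyoncé Giselle Knowles-Carter (/biːˈjɒnseɪ/ bee-YON-say) (born September 4, 1981) "
    "is an American singer, songwriter, record producer and actress. "
    "Born and raised in Houston, Texas, she performed in various singing and dancing "
    "competitions as a child, and rose to fame in the late 1990s as lead singer of "
    "R&B girl-group Destiny's Child."
)

# (answer text, answer_start) pairs taken from the dataset output.
answers = [
    ("in the late 1990s", 269),
    ("singing and dancing", 207),
    ("Houston, Texas", 166),
    ("Destiny's Child", 320),
]

for text, start in answers:
    print(text, span_matches(context, text, start))
```

A check like this is a cheap sanity test before training or evaluating on the data, because a shifted offset points the model at the wrong span.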

In [8]:
squad['train']['context'][:10]

['Beyoncé Giselle Knowles-Carter (/biːˈjɒnseɪ/ bee-YON-say) (born September 4, 1981) is an American singer, songwriter, record producer and actress. Born and raised in Houston, Texas, she performed in various singing and dancing competitions as a child, and rose to fame in the late 1990s as lead singer of R&B girl-group Destiny\'s Child. Managed by her father, Mathew Knowles, the group became one of the world\'s best-selling girl groups of all time. Their hiatus saw the release of Beyoncé\'s debut album, Dangerously in Love (2003), which established her as a solo artist worldwide, earned five Grammy Awards and featured the Billboard Hot 100 number-one singles "Crazy in Love" and "Baby Boy".',
 'Beyoncé Giselle Knowles-Carter (/biːˈjɒnseɪ/ bee-YON-say) (born September 4, 1981) is an American singer, songwriter, record producer and actress. Born and raised in Houston, Texas, she performed in various singing and dancing competitions as a child, and rose to fame in the late 1990s as lead s

In [9]:
context = squad["train"][0]["context"]
question = "Where is the modern stone statue of Mary"

In [14]:
answer = qa_pipeline(question=question, context=context)

In [15]:
print(context)

Beyoncé Giselle Knowles-Carter (/biːˈjɒnseɪ/ bee-YON-say) (born September 4, 1981) is an American singer, songwriter, record producer and actress. Born and raised in Houston, Texas, she performed in various singing and dancing competitions as a child, and rose to fame in the late 1990s as lead singer of R&B girl-group Destiny's Child. Managed by her father, Mathew Knowles, the group became one of the world's best-selling girl groups of all time. Their hiatus saw the release of Beyoncé's debut album, Dangerously in Love (2003), which established her as a solo artist worldwide, earned five Grammy Awards and featured the Billboard Hot 100 number-one singles "Crazy in Love" and "Baby Boy".


In [16]:
print(f"Question:{question}")

Question:Where is the modern stone statue of Mary


In [17]:
print(f"Answer:{answer}")

Answer:{'score': 0.059267304837703705, 'start': 0, 'end': 30, 'answer': 'Beyoncé Giselle Knowles-Carter'}


| Key | Meaning |
| --- | --- |
| score | The model's confidence in the answer (a probability between 0 and 1). Here it is about 5.9%. The score is low because the question asks about a statue of Mary, which this context about Beyoncé never mentions. |
| start | The start index (character position) of the answer span in the context. Here, index 0. |
| end | The end index (character position) of the answer span in the context. Here, index 30. |
| answer | The answer text extracted from the context, which is "Beyoncé Giselle Knowles-Carter". |
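Two things can be checked directly on a result dictionary: that `start` and `end` slice the answer out of the context, and whether the score is high enough to trust. The sketch below uses the result printed above. The `accept_answer` helper and its 0.3 cutoff are hypothetical choices for illustration, not values recommended by the library.

```python
# Result dictionary copied from the pipeline output.
result = {
    "score": 0.059267304837703705,
    "start": 0,
    "end": 30,
    "answer": "Beyoncé Giselle Knowles-Carter",
}

# Opening of the context string that was passed to the pipeline.
context_start = "Beyoncé Giselle Knowles-Carter (/biːˈjɒnseɪ/ bee-YON-say)"

# `start` and `end` are character positions, with `end` exclusive.
extracted = context_start[result["start"]:result["end"]]
print(extracted == result["answer"])  # True

def accept_answer(result, min_score=0.3):
    """Return the answer text, or None when the score is below min_score."""
    if result["score"] < min_score:
        return None
    return result["answer"]

print(accept_answer(result))  # None
```

Returning `None` for low-scoring spans is one simple way to avoid presenting a guess as an answer when the context does not contain one.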

| Feature | Details |
| --- | --- |
| 📍 Created by | Stanford University (2016) |
| 🧠 Task | Extractive Question Answering |
| 📄 Input | A paragraph (context) + a question |
| 🎯 Output | A span of text from the paragraph that answers the question |
| 📚 Format | JSON with context, question, answer text, and answer start index |