# Step 1: Introduction to Question Answering

Question Answering (QA) is an NLP task focused on building systems that automatically answer questions posed in natural language. For this project, we'll focus on **Extractive Question Answering**.

In this type of QA, the model is given a **question** and a **context** (a piece of text). The model's job is not to generate a new answer from scratch, but to **find and extract** the span of text from the context that contains the correct answer.

Models based on the **BERT (Bidirectional Encoder Representations from Transformers)** architecture are well suited to this. BERT processes the question and the context together, and its self-attention lets every token attend to the tokens on both its left and its right. Each token's representation therefore reflects the whole input, which helps the model judge which part of the context answers the question.
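To make span extraction concrete, here is a minimal sketch of how per-token scores can be turned into an answer. It assumes the model produces one start score and one end score for each token, and that the answer is the span whose start and end scores sum highest. The function name `best_span`, the `max_answer_len` cap, and all the scores are illustrative; they are not taken from a real model or from the internals of any library.

```python
def best_span(start_logits, end_logits, max_answer_len=15):
    """Return ((start, end), score) for the highest-scoring token span."""
    best = (0, 0)
    best_score = float("-inf")
    for i, start_score in enumerate(start_logits):
        # Only consider end positions at or after the start, within the length cap
        for j in range(i, min(i + max_answer_len, len(end_logits))):
            score = start_score + end_logits[j]
            if score > best_score:
                best_score = score
                best = (i, j)
    return best, best_score


# Toy tokens and made-up scores, for illustration only
tokens = ["iron", "oxide", "gives", "mars", "its", "reddish", "appearance"]
start_logits = [4.0, 1.0, 0.2, 0.5, 0.1, 0.3, 0.2]
end_logits = [1.5, 5.0, 0.1, 0.4, 0.2, 0.6, 0.9]

(span_start, span_end), score = best_span(start_logits, end_logits)
answer_tokens = tokens[span_start:span_end + 1]
print(answer_tokens, score)
```

Requiring the end to come at or after the start rules out spans that run backwards, and the length cap keeps the search from returning most of the context as an "answer".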

---

# ⚙️ Step 2: Setup and Library Installation

The setup is straightforward. We just need the **transformers** library and a deep learning framework like **PyTorch**, which transformers will use on the backend.


In [6]:
# Install the necessary libraries
!pip install -q transformers torch

print("✅ Libraries installed successfully.")

✅ Libraries installed successfully.
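As an optional check, the installed versions can be read back with the standard library, which is useful when reproducing results later. This is a small sketch; the helper name `installed_version` is illustrative.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string, or None if the package is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


for name in ("transformers", "torch"):
    print(name, installed_version(name) or "not installed")
```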


# 🚀 Step 3: Loading the Question Answering Pipeline

Hugging Face's `pipeline` function handles model loading, tokenization, and answer extraction in a single call. When the task is given as **"question-answering"** and no model is named, the library downloads and caches a default pre-trained checkpoint for that task. In this run the default is `distilbert/distilbert-base-cased-distilled-squad`, a **DistilBERT** model fine-tuned on **SQuAD (the Stanford Question Answering Dataset)**. DistilBERT is a smaller, faster distilled version of BERT.


In [7]:
from transformers import pipeline

# Load the question-answering pipeline
# This will download the default model and tokenizer for the task
Youtubeer = pipeline("question-answering")

print("Question-Answering pipeline is ready.")

No model was supplied, defaulted to distilbert/distilbert-base-cased-distilled-squad and revision 564e9b5 (https://huggingface.co/distilbert/distilbert-base-cased-distilled-squad).
Using a pipeline without specifying a model name and revision in production is not recommended.


Device set to use cuda:0


Question-Answering pipeline is ready.
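The log message above warns against relying on the default model in production. One way to address that, sketched below, is to pass the model name and revision explicitly, using the values the log reports. The helper name `load_qa_pipeline` is hypothetical; `model` and `revision` are keyword arguments of `pipeline`.

```python
# Model name and revision copied from the log message above
MODEL_NAME = "distilbert/distilbert-base-cased-distilled-squad"
MODEL_REVISION = "564e9b5"


def load_qa_pipeline():
    """Build a question-answering pipeline pinned to a specific checkpoint."""
    # Imported here so the constants can be reused without loading the library
    from transformers import pipeline

    return pipeline(
        "question-answering",
        model=MODEL_NAME,
        revision=MODEL_REVISION,
    )
```

Calling `load_qa_pipeline()` should return an object that is used the same way as the pipeline created above, with the checkpoint fixed so that a later change to the library's default does not change the results.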


# 📝 Step 4: Providing Context and Asking a Question

To get an answer, we need to provide two key pieces of information to our pipeline:

- **context**: The body of text where the answer can be found.  
- **question**: The specific question you want to ask about the context.

Let's start with a context about the planet **Mars**.


In [8]:
# Define the context text
context = """
Mars is the fourth planet from the Sun and the second-smallest planet in the Solar System, 
being larger than only Mercury. In English, Mars carries the name of the Roman god of war 
and is often referred to as the "Red Planet". The latter refers to the effect of the iron
oxide prevalent on Mars's surface, which gives it a reddish appearance distinctive among
the astronomical bodies visible to the naked eye.
"""

# Define the question
question = "What gives Mars its reddish appearance?"

# 💡 Step 5: Getting the Answer from the Model

Now, we simply pass the **question** and **context** to our `question-answering` pipeline.  
The model will process both and return a dictionary containing:

- **answer**: The exact text span extracted from the context.
- **score**: A value between 0 and 1 indicating the model's confidence in that span.
- **start** / **end**: The character positions of the answer within the context.
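The `start` and `end` fields make it possible to locate the answer inside the original text, for example to highlight it. The sketch below uses a hand-built dictionary in place of real pipeline output (the `score` is a placeholder) and assumes `end` is exclusive, so that slicing the context with the two offsets gives back the answer.

```python
sample_context = "The iron oxide on Mars's surface gives it a reddish appearance."
answer_text = "iron oxide"

# Hand-built stand-in shaped like the pipeline's result; not real model output
start = sample_context.find(answer_text)
sample_result = {
    "answer": answer_text,
    "score": 1.0,  # placeholder value
    "start": start,
    "end": start + len(answer_text),  # assumed to be exclusive
}

# Slicing with the offsets recovers the answer text
extracted = sample_context[sample_result["start"]:sample_result["end"]]

# Mark the answer in place with square brackets
highlighted = (
    sample_context[:sample_result["start"]]
    + "[" + extracted + "]"
    + sample_context[sample_result["end"]:]
)
print(highlighted)
```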


In [9]:
# Get the answer from the pipeline
result = Youtubeer(question=question, context=context)

# Print the result in a nice format
print(f"Question: {question}")
print(f"Answer: '{result['answer']}'")
print(f"Confidence Score: {result['score']:.4f}")

Question: What gives Mars its reddish appearance?
Answer: 'iron
oxide'
Confidence Score: 0.8649
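The answer printed above is split across two lines because the context string itself contains line breaks, and an extracted span is copied from the context verbatim. One possible fix, sketched here, is to collapse every run of whitespace into a single space before passing the context to the pipeline. The variable names are illustrative.

```python
raw_context = """
The latter refers to the effect of the iron
oxide prevalent on Mars's surface.
"""

# split() with no arguments splits on any run of whitespace, including newlines
clean_context = " ".join(raw_context.split())
print(clean_context)
```

Since any extracted span is a substring of the context, a context with no line breaks cannot yield an answer that contains one.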


# 🧠 Step 6: Trying a Different Question

The same `question-answering` pipeline object can be reused for other question-context pairs.  
Let's ask a new question about the **same** context to see how it performs.  

The model reads the question and the context together on every call,  
so the answer changes with the question rather than being a single stored response.


In [10]:
# Define a second question
question_2 = "Which planet is the second smallest in our solar system?"

# Get the new answer
result_2 = Youtubeer(question=question_2, context=context)

# Print the new result
print(f"Question: {question_2}")
print(f"Answer: '{result_2['answer']}'")
print(f"Confidence Score: {result_2['score']:.4f}")

Question: Which planet is the second smallest in our solar system?
Answer: 'Mars'
Confidence Score: 0.8532
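Because every result carries a score, several questions can be run in a loop and low-confidence answers flagged. The sketch below is one way to do that. The helper `ask_all`, the 0.5 threshold, and the `fake_qa` stand-in are all hypothetical; `fake_qa` returns canned results with made-up scores so the example runs without downloading a model. In this notebook, the real pipeline object would be passed in its place.

```python
def ask_all(qa, questions, context, min_score=0.5):
    """Ask each question and mark answers scoring below min_score as unreliable."""
    rows = []
    for q in questions:
        result = qa(question=q, context=context)
        rows.append({
            "question": q,
            "answer": result["answer"],
            "score": result["score"],
            "reliable": result["score"] >= min_score,
        })
    return rows


def fake_qa(question, context):
    # Stand-in with canned, made-up output; not a real model
    canned = {
        "What gives Mars its reddish appearance?": {"answer": "iron oxide", "score": 0.86},
        "Who discovered Mars?": {"answer": "Mercury", "score": 0.02},
    }
    return canned[question]


rows = ask_all(
    fake_qa,
    ["What gives Mars its reddish appearance?", "Who discovered Mars?"],
    "context text is ignored by the stand-in",
)
for row in rows:
    flag = "ok" if row["reliable"] else "low confidence"
    print(f"{row['question']} -> {row['answer']!r} ({flag})")
```

An extractive model always returns some span from the context, even when the context does not contain the answer, so a score check like this is a simple guard against questions the text cannot answer. A suitable threshold depends on the model and the data.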
