### **Question Answering System (Using Transformers)**
**Goal:** Build an extractive QA system using a pretrained Transformer model (like
`distilbert-base-cased-distilled-squad`) that answers questions based on a given paragraph or passage.

### **Option 1: Using Hugging Face Transformers (No Search Backend)**

**Define a Context and a Question**

In [8]:
# Define the context and question
context = """
The Eiffel Tower is a wrought-iron lattice tower on the Champ de Mars in Paris.
It is named after the engineer Gustave Eiffel, whose company designed and built the tower.
Constructed from 1887 to 1889 as the entrance to the 1889 World's Fair, it was a remarkable engineering feat.
"""

question = "Who designed the Eiffel Tower?"

**Load Pretrained QA Pipeline**

In [9]:
from transformers import pipeline

# Load the question-answering pipeline with a DistilBERT model fine-tuned on SQuAD
qa_pipeline = pipeline("question-answering", model="distilbert-base-cased-distilled-squad")

# Predict the answer (question and context are passed as keyword arguments)
result = qa_pipeline(question=question, context=context)

# Print the answer
print("Answer:", result['answer'])

Device set to use cpu


Answer: Gustave Eiffel
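
Besides `answer`, the result returned by the question-answering pipeline is a dict that also carries `score`, `start`, and `end`, where `start` and `end` are character offsets into the context. The sketch below uses a hand-written stand-in for that dict instead of a real model call (the score value is made up, and `answer_with_window` is a hypothetical helper), to show how the offsets map back onto the passage:

```python
# Hand-written stand-in for the dict returned by the question-answering pipeline.
# The keys match the pipeline output; the score is illustrative, not a real model output.
context = (
    "The Eiffel Tower is a wrought-iron lattice tower on the Champ de Mars in Paris. "
    "It is named after the engineer Gustave Eiffel, whose company designed and built the tower."
)
start = context.index("Gustave Eiffel")
example_result = {
    "score": 0.9,  # illustrative value
    "start": start,
    "end": start + len("Gustave Eiffel"),
    "answer": "Gustave Eiffel",
}


def answer_with_window(result, context, window=20):
    """Return the answer with up to `window` characters of context on each side."""
    left = max(0, result["start"] - window)
    right = min(len(context), result["end"] + window)
    return context[left:right]


# The offsets slice the answer straight out of the context.
assert context[example_result["start"]:example_result["end"]] == example_result["answer"]
print(answer_with_window(example_result, context))
```

Showing a little surrounding text is a cheap way to sanity-check an extracted answer, and a low `score` is a hint that the passage may not contain the answer at all.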


### **Option 2: Open-Domain QA with Haystack**
If you want your QA system to search across many documents (like multiple paragraphs, PDFs, or articles), you can use Haystack, which pairs a retriever that finds relevant documents with a reader that extracts the answer.

In [15]:
!pip install farm-haystack[colab]


**Basic Haystack Setup**

In [22]:
from haystack.nodes import FARMReader, BM25Retriever
from haystack.document_stores import InMemoryDocumentStore
from haystack.pipelines import ExtractiveQAPipeline

# 1. Set up document store with BM25 enabled
document_store = InMemoryDocumentStore(use_bm25=True)

# 2. Add documents
context = """
The Eiffel Tower is a wrought-iron lattice tower on the Champ de Mars in Paris.
It is named after the engineer Gustave Eiffel, whose company designed and built the tower.
Constructed from 1887 to 1889 as the entrance to the 1889 World's Fair.
"""
docs = [{"content": context}]
document_store.write_documents(docs)

# 3. Initialize BM25 retriever (no embeddings needed)
retriever = BM25Retriever(document_store=document_store)

# 4. Load the QA reader model
reader = FARMReader(model_name_or_path="deepset/roberta-base-squad2", use_gpu=False)

# 5. Create the QA pipeline
pipe = ExtractiveQAPipeline(reader, retriever)

# 6. Ask a question
question = "Who designed the Eiffel Tower?"

# 7. Run the pipeline
prediction = pipe.run(query=question, params={"Retriever": {"top_k": 2}, "Reader": {"top_k": 1}})

# 8. Print the answer
print("\nAnswer:", prediction['answers'][0].answer)


Updating BM25 representation...: 100%|██████████| 1/1 [00:00<00:00, 3379.78 docs/s]
Inferencing Samples: 100%|██████████| 1/1 [00:00<00:00,  1.02 Batches/s]


Answer: Gustave Eiffel





### 📝 Summary

This project builds an extractive Question Answering (QA) system using Transformer models: the answer is always a span of text taken from a supplied or retrieved passage.

🧠 **Approach 1: Hugging Face Transformers**
- Uses the `pipeline("question-answering")` API with a pretrained model (`distilbert-base-cased-distilled-squad`)
- Takes a **context paragraph** and a **question** as input, and returns the most likely answer span.

📦 **Model**: DistilBERT fine-tuned on SQuAD  
📤 **Input**: {question, context}  
📥 **Output**: Extracted answer from the context
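
To make "most likely answer span" concrete: readers fine-tuned on SQuAD typically produce a start score and an end score for every token, and the answer is the span whose combined score is highest. The sketch below is a simplified illustration with invented tokens and scores. It is not the pipeline's actual code, which works on subword tokens and applies further normalization; `best_span` and `max_answer_len` are names chosen here for illustration.

```python
def best_span(start_scores, end_scores, max_answer_len=15):
    """Return (start, end, score) maximizing start_scores[s] + end_scores[e]
    subject to s <= e < s + max_answer_len."""
    best = None
    for s, s_score in enumerate(start_scores):
        last = min(len(end_scores), s + max_answer_len)
        for e in range(s, last):
            score = s_score + end_scores[e]
            if best is None or score > best[2]:
                best = (s, e, score)
    return best


# Invented word-level tokens and scores, for illustration only.
tokens = ["named", "after", "Gustave", "Eiffel", ",", "whose", "company"]
start_scores = [0.1, 0.0, 4.0, 1.0, -1.0, 0.2, 0.5]
end_scores = [0.0, 0.1, 1.5, 5.0, -1.0, 0.0, 0.3]

s, e, score = best_span(start_scores, end_scores)
print(" ".join(tokens[s:e + 1]))  # prints "Gustave Eiffel"
```

The constraint that the end cannot precede the start, plus a cap on span length, keeps the search from returning nonsensical or overly long answers.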

---

🔎 **Approach 2: Open-Domain QA with Haystack**
- Stores multiple documents in a `DocumentStore`
- Uses `BM25Retriever` to find the most relevant documents and `FARMReader` (e.g., RoBERTa) to extract the answer from them
- Suitable for searching across large sets of documents

📦 **Retriever**: BM25  
📦 **Reader**: deepset/roberta-base-squad2  
🔄 **Pipeline**: Question → Retrieve relevant docs → Extract answer
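
The retrieval step can be illustrated without Haystack. The following is a minimal pure-Python sketch of Okapi BM25 scoring, assuming whitespace tokenization, common default values `k1=1.5` and `b=0.75`, and one widely used IDF variant. It is not Haystack's implementation, and the three example documents are invented for illustration.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each document against the query with Okapi BM25 (whitespace tokens)."""
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter(term for t in tokenized for term in set(t))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            # Rare terms get a higher weight than terms found in most documents.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term frequency saturates via k1 and is normalized by document length via b.
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


# Invented example documents.
docs = [
    "The Eiffel Tower is a wrought-iron lattice tower in Paris",
    "The Louvre is a museum in Paris",
    "Gustave Eiffel was an engineer whose company built the tower",
]
scores = bm25_scores("Who built the Eiffel tower", docs)
best = max(range(len(docs)), key=scores.__getitem__)
print(docs[best])
```

Because BM25 only matches exact terms, it needs no embeddings or GPU, which is why it works as a lightweight first stage before the more expensive reader model.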

This project demonstrates two QA setups built on Transformers: a single-passage pipeline and a retrieve-then-read pipeline that can search across many documents.
