
# 6.3.1 - Implementing Question Answering with Hugging Face Transformers

This notebook demonstrates how to build both extractive and generative Question Answering (QA) systems using Hugging Face Transformers.

We'll cover:
- Extractive QA using `DistilBERT` and `DeBERTa`
- Generative QA using `T5` and a causal LM (a smaller placeholder stands in for `Mistral-7B-Instruct`)
- CPU-friendly generative inference in the style of `BitNet-b1.58` (demonstrated with a tiny placeholder model)

Each example includes loading the model, preparing inputs, and running inference.


In [None]:

!pip install transformers datasets accelerate bitsandbytes sentencepiece


In [None]:

from transformers import pipeline, AutoModelForQuestionAnswering, AutoTokenizer, AutoModelForSeq2SeqLM
from transformers import T5Tokenizer, T5ForConditionalGeneration
from datasets import load_dataset
import torch


## Extractive QA using DistilBERT

In [None]:

# Load pre-trained DistilBERT QA model
qa_pipeline = pipeline("question-answering", model="distilbert-base-uncased-distilled-squad")

context = "Hugging Face is a company based in New York that specializes in Natural Language Processing."
question = "Where is Hugging Face based?"

# Run inference
result = qa_pipeline(question=question, context=context)
print("Answer:", result['answer'], "Score:", result['score'])
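
The pipeline hides how the answer span is chosen. An extractive QA model scores every token as a possible answer start and a possible answer end, and the highest-scoring valid pair becomes the answer. The following is a minimal sketch of that selection step, assuming hand-written toy logits rather than real model outputs; `best_span` and `max_answer_len` are illustrative names, not part of the Transformers API.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a plain list of floats
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def best_span(start_logits, end_logits, max_answer_len=15):
    """Return (start, end, score) maximising P(start) * P(end) with start <= end."""
    start_probs = softmax(start_logits)
    end_probs = softmax(end_logits)
    best = (0, 0, 0.0)
    for i, p_start in enumerate(start_probs):
        # Only consider ends at or after the start, within the length limit
        for j in range(i, min(i + max_answer_len, len(end_probs))):
            score = p_start * end_probs[j]
            if score > best[2]:
                best = (i, j, score)
    return best

# Toy tokens and logits, made up for illustration
tokens = ["hugging", "face", "is", "based", "in", "new", "york"]
start_logits = [0.1, 0.0, -1.0, -0.5, 0.2, 4.0, 1.0]
end_logits = [-1.0, 0.3, -1.0, -0.5, 0.0, 1.5, 4.5]

start, end, score = best_span(start_logits, end_logits)
answer = " ".join(tokens[start:end + 1])
print("Answer:", answer, "Score:", round(score, 3))
```

The `score` printed by the pipeline plays the same role as the product computed here: it reflects how confident the model is in both boundaries of the span.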


## Extractive QA using DeBERTa-v3

In [None]:

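# microsoft/deberta-v3-base has no trained QA head, so load a DeBERTa-v3 checkpoint fine-tuned on SQuAD 2.0 instead
qa_pipeline = pipeline("question-answering", model="deepset/deberta-v3-base-squad2")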

context = "Quantum computing uses the principles of quantum mechanics such as superposition and entanglement."
question = "What principles does quantum computing use?"

result = qa_pipeline(question=question, context=context)
print("Answer:", result['answer'], "Score:", result['score'])
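
Besides `answer` and `score`, the result dictionary carries `start` and `end` character offsets into the context. A small sketch of how these fields might be used to abstain on low-confidence answers follows. The result dictionaries are hand-written stand-ins for pipeline output, and both `answer_or_abstain` and the `0.3` threshold are hypothetical choices for illustration.

```python
def answer_or_abstain(result, threshold=0.3):
    """Return the answer text, or None when the model's confidence is too low."""
    if result["score"] < threshold:
        return None
    return result["answer"]

context = ("Quantum computing uses the principles of quantum mechanics "
           "such as superposition and entanglement.")
answer_text = "superposition and entanglement"
start = context.index(answer_text)

# Hand-written dictionaries with the same keys the QA pipeline returns
mock_result = {"score": 0.82, "start": start,
               "end": start + len(answer_text), "answer": answer_text}
low_confidence = dict(mock_result, score=0.05)

# The offsets slice the answer straight out of the context
span_from_offsets = context[mock_result["start"]:mock_result["end"]]

print(answer_or_abstain(mock_result))
print(answer_or_abstain(low_confidence))
print(span_from_offsets)
```

A suitable threshold depends on the model and the data, so it is best tuned on a held-out set rather than fixed in advance.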


## Generative QA using T5

In [None]:

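# Load the T5 model and its SentencePiece tokenizer
t5_model = T5ForConditionalGeneration.from_pretrained("t5-base")
t5_tokenizer = T5Tokenizer.from_pretrained("t5-base")

# T5 is a text-to-text model: the task is expressed through the "question: ... context: ..." input format
input_text = "question: What is the capital of France? context: France is a country in Europe. Its capital is Paris."
inputs = t5_tokenizer(input_text, return_tensors="pt")

# Generate the answer tokens and decode them back to text
output = t5_model.generate(**inputs, max_length=32)
print("Answer:", t5_tokenizer.decode(output[0], skip_special_tokens=True))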
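
Because the task is encoded entirely in the input string, it can help to build that string with a small helper instead of writing it by hand. The sketch below assumes the `question: ... context: ...` layout used in the cell above; `build_t5_qa_input` is a hypothetical helper name, not a library function.

```python
def build_t5_qa_input(question, context):
    """Format a question and context as a single T5 input string."""
    # Collapse newlines and repeated spaces so the prompt stays on one line
    question = " ".join(question.split())
    context = " ".join(context.split())
    return f"question: {question} context: {context}"

input_text = build_t5_qa_input(
    "What is the capital of France?",
    "France is a country in Europe.\nIts capital is Paris.",
)
print(input_text)
```

The resulting string can be passed to the tokenizer exactly like the hand-written `input_text`.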


## Generative QA using Mistral-7B-Instruct (mocked with smaller model)

In [None]:

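# Mistral-7B-Instruct needs more memory than many notebook runtimes provide,
# so this cell substitutes a smaller causal LM. The same code works for any causal LM checkpoint.
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "tiiuae/falcon-rw-1b"  # Placeholder for "mistralai/Mistral-7B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Put the context first and end with an answer cue so the model continues with an answer
input_text = "Context: Relativity is the dependence of various physical phenomena on relative motion.\nQ: What is relativity?\nA:"
inputs = tokenizer(input_text, return_tensors="pt")

# max_new_tokens limits only the generated text; max_length would also count the prompt tokens
output = model.generate(**inputs, max_new_tokens=50, pad_token_id=tokenizer.eos_token_id)

# A causal LM returns the prompt followed by its continuation, so decode only the new tokens
answer_ids = output[0][inputs["input_ids"].shape[1]:]
print("Answer:", tokenizer.decode(answer_ids, skip_special_tokens=True))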
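
A causal LM tends to keep writing after it has answered, often starting a new question of its own. One way to handle this, sketched below under the assumption that the decoded text may begin with the prompt, is to strip the prompt and cut at the first stop marker. `extract_answer` and its `stop_markers` are hypothetical, and the "generation" is a made-up string rather than real model output.

```python
def extract_answer(generated_text, prompt, stop_markers=("\nQ:", "\nContext:", "\n\n")):
    """Drop the echoed prompt and truncate the continuation at the first stop marker."""
    if generated_text.startswith(prompt):
        text = generated_text[len(prompt):]
    else:
        text = generated_text
    cut = len(text)
    for marker in stop_markers:
        idx = text.find(marker)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut].strip()

prompt = ("Q: What is relativity? Context: Relativity is the dependence of "
          "various physical phenomena on relative motion.")
# Made-up continuation standing in for decoded model output
fake_generation = prompt + " It was developed by Albert Einstein.\n\nQ: What is gravity?"

cleaned = extract_answer(fake_generation, prompt)
print("Answer:", cleaned)
```

String-level cleanup like this is model-agnostic, but the stop markers need to match the prompt format in use.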


## CPU-efficient Generative QA with BitNet

In [None]:

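# Simulate BitNet (the actual model needs a custom loader; this is a mocked example)
from transformers import AutoModelForCausalLM, AutoTokenizer

# sshleifer/tiny-gpt2 is a tiny test checkpoint: it runs quickly on CPU,
# but its output is not meaningful. It only demonstrates the inference flow.
model_name = "sshleifer/tiny-gpt2"  # Placeholder for BitNet
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

inputs = tokenizer("Q: What is quantum computing? Context: Quantum computing uses qubits.", return_tensors="pt")

# max_new_tokens limits only the generated text; max_length would also count the prompt tokens
outputs = model.generate(**inputs, max_new_tokens=50, pad_token_id=tokenizer.eos_token_id)

# Decode only the tokens generated after the prompt
answer_ids = outputs[0][inputs["input_ids"].shape[1]:]
print("Answer:", tokenizer.decode(answer_ids, skip_special_tokens=True))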
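
The "b1.58" in the name refers to ternary weights: each weight takes one of three values (-1, 0, 1), which carries log2(3) ≈ 1.58 bits of information. The sketch below illustrates the idea with a simplified absmean-style rounding on a toy weight list. It is an illustration under stated assumptions only, not the BitNet loader or its training procedure, and `absmean_ternary` is a hypothetical name.

```python
import math

def absmean_ternary(weights, eps=1e-8):
    """Scale weights by their mean absolute value, then round and clip to {-1, 0, 1}."""
    scale = sum(abs(w) for w in weights) / len(weights) + eps
    quantized = [max(-1, min(1, round(w / scale))) for w in weights]
    return quantized, scale

# Toy weights, made up for illustration
weights = [0.9, -0.05, 0.4, -1.3, 0.02, -0.6]
q, scale = absmean_ternary(weights)

print("Ternary weights:", q)
print("Scale:", round(scale, 3))
print("Bits per weight:", round(math.log2(3), 2))
```

With weights restricted to three values, matrix multiplication reduces largely to additions and subtractions, which is what makes this family of models attractive for CPU inference.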
