# Assignment 1 - Question Answering

This notebook demonstrates question answering over a single passage, using a mix of answerable and unanswerable questions.

In [43]:
# Import necessary libraries
import json
import openai
import os
from dotenv import load_dotenv
import re
from collections import Counter


In [44]:
# Set OpenAI API Key

load_dotenv()  # Load environment variables from .env file

openai.api_key = os.getenv("OPENAI_API_KEY")

## Part 1: Add a New Question and Validate

In this section, we add a new paragraph with ten questions (five answerable, five unanswerable) in a SQuAD v2.0-style format and validate that the populated template is correct using a few rules.

In [45]:
data = {
    "context": "Melatonin is a hormone primarily released by the pineal gland at night and has long been associated with control of the sleep wake cycle. As a dietary supplement, it is often used for the short-term treatment of insomnia, such as from jet lag or shift work, and is typically taken orally. Evidence of its benefit for treating insomnia is unclear, as studies have generally been of low quality. It is also used for children with autism spectrum disorders or attention deficit hyperactivity disorder. Melatonin is found in animals, plants, fungi, and bacteria. Levels of melatonin are influenced by the detection of light and darkness by the retina of the eye.",
    "qas": [
        {
            "question": "What gland releases melatonin, and when is it released?",
            "answers": [{"text": "The pineal gland releases melatonin, primarily at night.", "start_char": 0}],
            "is_impossible": False
        },
        {
            "question": "For what purpose is melatonin often used as a dietary supplement?",
            "answers": [{"text": "Melatonin is often used as a dietary supplement for the short-term treatment of insomnia caused by jet lag or shift work.", "start_char": 0}],
            "is_impossible": False
        },
        {
            "question": "How is melatonin typically taken as a supplement?",
            "answers": [{"text": "Melatonin is typically taken orally as a supplement.", "start_char": 0}],
            "is_impossible": False
        },
        {
            "question": "What influences the levels of melatonin in the body?",
            "answers": [{"text": "The levels of melatonin are influenced by the detection of light and darkness by the retina of the eye.", "start_char": 0}],
            "is_impossible": False
        },
        {
            "question": "In which living organisms is melatonin found?",
            "answers": [{"text": "Melatonin is found in animals, plants, fungi, and bacteria.", "start_char": 0}],
            "is_impossible": False
        },
        {
            "question": "Who discovered the role of melatonin in sleep regulation?",
            "answers": [],
            "is_impossible": True
        },
        {
            "question": "What are the exact mechanisms by which melatonin affects children with autism spectrum disorders?",
            "answers": [],
            "is_impossible": True
        },
        {
            "question": "What is the chemical formula of melatonin?",
            "answers": [],
            "is_impossible": True
        },
        {
            "question": "Which country has the highest production of melatonin supplements?",
            "answers": [],
            "is_impossible": True
        },
        {
            "question": "How does melatonin interact with the body circadian rhythm at the molecular level?",
            "answers": [],
            "is_impossible": True
        },
        
    ]
}
# Write the data to disk; the context manager closes the file handle
with open('squad_questions.json', 'w') as f:
    json.dump(data, f)
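
The validation rules are not spelled out in this notebook, so the following is only a minimal sketch under assumed rules: every entry has a non-empty `context`, every item in `qas` has the keys `question`, `answers` and `is_impossible`, unanswerable questions carry no answers, and answerable questions carry at least one non-empty answer text. The function name `validate_squad_entry` is hypothetical. Note that the official SQuAD v2.0 files store the answer offset under `answer_start` and use verbatim spans of the context, while the data above uses `start_char` and paraphrased answers, so this sketch deliberately does not check offsets.

```python
REQUIRED_QA_KEYS = {"question", "answers", "is_impossible"}


def validate_squad_entry(entry):
    """Return a list of rule violations for one paragraph entry (empty if valid)."""
    errors = []
    context = entry.get("context")
    if not isinstance(context, str) or not context.strip():
        errors.append("context must be a non-empty string")

    for i, qa in enumerate(entry.get("qas", [])):
        missing = REQUIRED_QA_KEYS - set(qa)
        if missing:
            errors.append(f"qas[{i}]: missing keys {sorted(missing)}")
            continue
        if not str(qa["question"]).strip():
            errors.append(f"qas[{i}]: question is empty")
        # Assumed rule: unanswerable questions must not list answers.
        if qa["is_impossible"] and qa["answers"]:
            errors.append(f"qas[{i}]: is_impossible is True but answers is not empty")
        # Assumed rule: answerable questions need at least one answer.
        if not qa["is_impossible"] and not qa["answers"]:
            errors.append(f"qas[{i}]: is_impossible is False but answers is empty")
        for j, ans in enumerate(qa["answers"]):
            if not str(ans.get("text", "")).strip():
                errors.append(f"qas[{i}].answers[{j}]: text is empty")
    return errors
```

Calling `validate_squad_entry(data)` on the dictionary defined above should return an empty list if these assumed rules hold.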

## Part 2: Prompt GPT to Answer Questions

In this section, we prompt the language model with both the answerable and the unanswerable questions and record its responses. We iterate over each question in the data structure and store the results in `responses`.

1. Load the Data

In [46]:
# Load the questions and answers from squad_questions.json
with open("squad_questions.json", "r") as file:
    squad_data = json.load(file)
    
# Extract the paragraph and questions
context = squad_data["context"]  # The paragraph
questions = squad_data["qas"]    # The list of questions


2. Define the Prompt Format

In [47]:
def create_prompt(context, question):
    """
    Create a prompt for the language model using the given context and question.
    """
    return f"""Using the following paragraph, answer the questions based on your learnings from the paragraph. If there is no answer for the given questions within the context of the paragraph, return empty array []. Learn and understand the context from the given answers and try to formulate your learnings accordingly:

Paragraph:
{context}

Question:
{question}

Answer the question using only the information from the paragraph. Do not include any additional commentary."""


3. Call the LLM API

In [48]:
# Function to send a request to the LLM
def get_llm_response(prompt, model="gpt-4o-mini"):
    """
    Send a request to the LLM and return the response.
    """
    # Note: openai.ChatCompletion is the pre-1.0 interface of the openai package.
    response = openai.ChatCompletion.create(
        model=model,
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": prompt}
        ],
        temperature=0  # Use 0 to make answers as reproducible as possible
    )
    return response["choices"][0]["message"]["content"]
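
API calls can fail transiently (for example on rate limits or network errors), and the loop in the next step would stop at the first failure. As a hedged sketch, not part of the original notebook, a small generic retry helper with exponential backoff could wrap the call. The name `call_with_retries` and its parameters are hypothetical, and it catches every exception for simplicity.

```python
import time


def call_with_retries(fn, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call fn(); on failure wait base_delay * 2**n seconds and try again.

    The last exception is re-raised once `attempts` calls have failed.
    `sleep` is injectable so the helper can be exercised without real waiting.
    """
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)
```

A call such as `call_with_retries(lambda: get_llm_response(prompt))` would then replace the direct call inside the loop.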


4. Generate and Collect Responses

In [49]:
responses = []  # List to store question-response pairs

In [50]:
for qa in questions:
    question = qa["question"]
    prompt = create_prompt(context, question)  # Create the prompt
    response = get_llm_response(prompt)       # Get the response from the LLM
    responses.append({"question": question, "response": response})  # Store the response

5. Save Responses to a JSON File

In [51]:
# Write the responses to disk; the context manager closes the file handle
with open('responses.json', 'w') as f:
    json.dump(responses, f)

In [52]:
# Step 5: Output Example Responses for Verification

for i, item in enumerate(responses):
    print(f"Question {i+1}: {item['question']}")
    print(f"Response: {item['response']}\n")

Question 1: What gland releases melatonin, and when is it released?
Response: The pineal gland releases melatonin at night.

Question 2: For what purpose is melatonin often used as a dietary supplement?
Response: Melatonin is often used as a dietary supplement for the short-term treatment of insomnia, such as from jet lag or shift work.

Question 3: How is melatonin typically taken as a supplement?
Response: Melatonin is typically taken orally.

Question 4: What influences the levels of melatonin in the body?
Response: Levels of melatonin are influenced by the detection of light and darkness by the retina of the eye.

Question 5: In which living organisms is melatonin found?
Response: [ "animals", "plants", "fungi", "bacteria" ]

Question 6: Who discovered the role of melatonin in sleep regulation?
Response: []

Question 7: What are the exact mechanisms by which melatonin affects children with autism spectrum disorders?
Response: []

Question 8: What is the chemical formula of melatoni
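
The output above shows two response shapes that differ from plain sentences: `[]` for unanswerable questions and a JSON-style list for Question 5. A possible post-processing step, shown here only as a sketch with a hypothetical helper name `normalize_model_output`, maps `[]` to an empty string and flattens a list of strings into comma-separated text before scoring. It is not applied anywhere in this notebook, so the scores reported later are computed on the raw responses.

```python
import json


def normalize_model_output(raw):
    """Turn '[]' into '' and a JSON list of strings into comma-separated text.

    Any other response is returned unchanged apart from surrounding whitespace.
    """
    text = raw.strip()
    if text.startswith("[") and text.endswith("]"):
        try:
            items = json.loads(text)
        except json.JSONDecodeError:
            return text  # looks like a list but is not valid JSON
        if isinstance(items, list) and all(isinstance(x, str) for x in items):
            return ", ".join(items)
    return text


print(normalize_model_output('[ "animals", "plants", "fungi", "bacteria" ]'))
# prints: animals, plants, fungi, bacteria
```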

## Part 3: Exact Match (EM) and F1-Score Evaluation of Responses

In this section, we evaluate the responses using Exact Match (EM) and F1-score, which compare each predicted answer to the expected answer.
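
As a compact summary of what the code in this part computes (the symbols $c$, $|p|$ and $|g|$ are notation introduced here, not names from the notebook): let $c$ be the number of tokens shared by the normalized prediction and the normalized ground truth, counted with multiplicity, and let $|p|$ and $|g|$ be their token counts. Exact Match is 1 when the two normalized strings are identical and 0 otherwise, while precision, recall and F1 are

$$
P = \frac{c}{|p|}, \qquad R = \frac{c}{|g|}, \qquad F_1 = \frac{2PR}{P + R}
$$

For the first question, the normalized prediction has 6 tokens, the normalized ground truth has 7, and all 6 predicted tokens are shared, so

$$
P = \frac{6}{6} = 1, \qquad R = \frac{6}{7} \approx 0.857, \qquad F_1 = \frac{2 \cdot 1 \cdot \tfrac{6}{7}}{1 + \tfrac{6}{7}} = \frac{12}{13} \approx 0.923
$$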


In [53]:
# Step 6: Exact Match (EM) and F1-Score Evaluation
def normalize_answer(s):
    """Lower text and remove punctuation, articles, and extra whitespace."""
    def remove_articles(text):
        return re.sub(r'\b(a|an|the)\b', ' ', text)
    
    def remove_punctuation(text):
        return re.sub(r'[^a-zA-Z0-9\s]', '', text)
    
    def whitespace_fix(text):
        return ' '.join(text.split())
    
    return whitespace_fix(remove_articles(remove_punctuation(s.lower())))

In [54]:
def exact_match(prediction, ground_truth):
    """Compute Exact Match (EM) score."""
    return int(normalize_answer(prediction) == normalize_answer(ground_truth))

def f1_score(prediction, ground_truth):
    """Compute F1-score for a given prediction and ground truth."""
    pred_tokens = normalize_answer(prediction).split()
    truth_tokens = normalize_answer(ground_truth).split()
    common = Counter(pred_tokens) & Counter(truth_tokens)
    num_same = sum(common.values())
    
    if len(pred_tokens) == 0 or len(truth_tokens) == 0:
        return int(pred_tokens == truth_tokens)
    if num_same == 0:
        return 0
    precision = num_same / len(pred_tokens)
    print("Precision: ", precision)
    recall = num_same / len(truth_tokens)
    print("Recall: ", recall)
    return 2 * (precision * recall) / (precision + recall)


In [55]:
# Step 7: Evaluate Responses
evaluation_results = []

total_em, total_f1 = 0, 0
num_questions = len(questions)

for qa, response in zip(questions, responses):
    ground_truths = [ans["text"] for ans in qa["answers"]] if qa["answers"] else [""]
    prediction = response["response"]
    
    em_scores = [exact_match(prediction, gt) for gt in ground_truths]
    f1_scores = [f1_score(prediction, gt) for gt in ground_truths]
    
    max_em = max(em_scores)
    max_f1 = max(f1_scores)
    
    total_em += max_em
    total_f1 += max_f1
    
    evaluation_results.append({
        "question": qa["question"],
        "prediction": prediction,
        "exact_match": max_em,
        "f1_score": max_f1
    })


Precision:  1.0
Recall:  0.8571428571428571
Precision:  0.85
Recall:  0.8947368421052632
Precision:  1.0
Recall:  0.7142857142857143
Precision:  1.0
Recall:  1.0
Precision:  1.0
Recall:  0.4444444444444444


In [56]:
# Calculate final EM and F1 scores
final_em = total_em / num_questions
final_f1 = total_f1 / num_questions

# Print Evaluation Summary
print(f"Final Exact Match Score: {final_em:.4f}")
print(f"Final F1 Score: {final_f1:.4f}")


Final Exact Match Score: 0.6000
Final F1 Score: 0.9244
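
Because half of the questions are unanswerable and score 1 whenever the model returns `[]`, the averages above mix two rather different cases. One way to separate them, sketched here under the assumption that each result record also carries an `is_impossible` flag (the `evaluation_results` built above do not store it), is to average per group. The function name `summarize_by_answerability` and the sample records are hypothetical and purely illustrative.

```python
def summarize_by_answerability(results):
    """Average EM and F1 separately for answerable and unanswerable questions."""
    summary = {}
    for label, flag in (("answerable", False), ("unanswerable", True)):
        group = [r for r in results if r["is_impossible"] == flag]
        if group:  # skip empty groups to avoid dividing by zero
            summary[label] = {
                "n": len(group),
                "em": sum(r["exact_match"] for r in group) / len(group),
                "f1": sum(r["f1_score"] for r in group) / len(group),
            }
    return summary


# Illustrative records only, not the notebook's actual results.
sample = [
    {"is_impossible": False, "exact_match": 1, "f1_score": 1.0},
    {"is_impossible": False, "exact_match": 0, "f1_score": 0.5},
    {"is_impossible": True, "exact_match": 1, "f1_score": 1.0},
]
print(summarize_by_answerability(sample))
```

To use it with this notebook, `"is_impossible": qa["is_impossible"]` would need to be added to each dictionary appended to `evaluation_results`.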
