# Part 3: Prompt Engineering Basics

## Introduction

In this part, you'll experiment with different prompting techniques to improve the quality of responses from Large Language Models (LLMs). You'll compare zero-shot, one-shot, and few-shot prompting approaches and document which works best for different types of questions.

## Learning Objectives

- Understand different prompting techniques
- Compare zero-shot, one-shot, and few-shot prompting
- Analyze the impact of prompt design on response quality

## Setup

In [4]:
# Import necessary libraries
import requests
import json

## 1. Understanding Prompting Techniques

LLMs can be prompted in different ways to get better responses:

1. **Zero-shot prompting**: Asking the model a question directly without examples
2. **One-shot prompting**: Providing one example before asking your question
3. **Few-shot prompting**: Providing multiple examples before asking your question
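
The three techniques differ only in how many worked examples precede the question. A minimal sketch of that idea follows; `build_prompt` and the `demo_*` names are hypothetical helpers for illustration rather than part of the assignment scaffold, and the example answers are shortened placeholders.

```python
def build_prompt(question, examples=()):
    """Build a Question/Answer prompt; the number of examples selects the strategy."""
    # Each (question, answer) example becomes a completed Q/A block
    blocks = [f"Question: {q}\nAnswer: {a}" for q, a in examples]
    # The real question goes last, with the answer left open for the model
    blocks.append(f"Question: {question}\nAnswer:")
    return "\n\n".join(blocks)


demo_examples = [
    ("What are the symptoms of gout?", "Sudden severe joint pain, swelling, and redness."),
    ("How is gout diagnosed?", "Through blood tests and joint fluid analysis."),
]
demo_question = "What foods should be avoided by patients with gout?"

zero_shot = build_prompt(demo_question)                     # no examples
one_shot = build_prompt(demo_question, demo_examples[:1])   # one example
few_shot = build_prompt(demo_question, demo_examples)       # several examples
print(few_shot)
```

Passing zero, one, or several examples to the same function keeps the layout identical across strategies, so any difference in response quality is more likely to come from the examples than from formatting.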

## 2. Creating Prompting Templates

Your first task is to create templates for different prompting strategies.

In [5]:
# Define a question to experiment with
question = "What foods should be avoided by patients with gout?"

# Example for one-shot and few-shot prompting
example_q = "What are the symptoms of gout?"
example_a = "Gout symptoms include sudden severe pain, swelling, redness, and tenderness in joints, often the big toe."

# Examples for few-shot prompting
examples = [
    ("What are the symptoms of gout?",
     "Gout symptoms include sudden severe pain, swelling, redness, and tenderness in joints, often the big toe."),
    ("How is gout diagnosed?",
     "Gout is diagnosed through physical examination, medical history, blood tests for uric acid levels, and joint fluid analysis to look for urate crystals.")
]

# TODO: Create prompting templates
# Zero-shot template (just the question)
zero_shot_template = "Question: {question}\nAnswer:"

# One-shot template (one example + the question)
one_shot_template = """Question: {example_q}
Answer: {example_a}

Question: {question}
Answer:"""

# Few-shot template (multiple examples + the question)
few_shot_template = """Question: {examples[0][0]}
Answer: {examples[0][1]}

Question: {examples[1][0]}
Answer: {examples[1][1]}

Question: {question}
Answer:"""

# TODO: Format the templates with your question and examples
zero_shot_prompt = zero_shot_template.format(question=question)
one_shot_prompt = one_shot_template.format(example_q=example_q, example_a=example_a, question=question)
# For few-shot, you'll need to format it with the examples list
few_shot_prompt = few_shot_template.format(examples=examples, question=question)

print("Zero-shot prompt:")
print(zero_shot_prompt)
print("\nOne-shot prompt:")
print(one_shot_prompt)
print("\nFew-shot prompt:")
print(few_shot_prompt)

Zero-shot prompt:
Question: What foods should be avoided by patients with gout?
Answer:

One-shot prompt:
Question: What are the symptoms of gout?
Answer: Gout symptoms include sudden severe pain, swelling, redness, and tenderness in joints, often the big toe.

Question: What foods should be avoided by patients with gout?
Answer:

Few-shot prompt:
Question: What are the symptoms of gout?
Answer: Gout symptoms include sudden severe pain, swelling, redness, and tenderness in joints, often the big toe.

Question: How is gout diagnosed?
Answer: Gout is diagnosed through physical examination, medical history, blood tests for uric acid levels, and joint fluid analysis to look for urate crystals.

Question: What foods should be avoided by patients with gout?
Answer:


## 3. Connecting to the LLM API

Next, implement a function to send prompts to an LLM API and get responses.

In [None]:
def get_llm_response(prompt, model_name="HuggingFaceH4/zephyr-7b-beta", api_key=None):
    """Get a response from the LLM based on the prompt"""
    # TODO: Implement the get_llm_response function
    api_url = f"https://api-inference.huggingface.co/models/{model_name}"
    headers = {"Authorization": f"Bearer {api_key}"} if api_key else {}

    payload = {"inputs": prompt}
    try:
        # Send the POST request to the model endpoint; the timeout keeps a
        # stalled connection from hanging the notebook
        response = requests.post(api_url, headers=headers, json=payload, timeout=30)
        response.raise_for_status()
        result = response.json()

        # Return the generated text only if the response is a non-empty list of dicts
        if (isinstance(result, list) and result
                and isinstance(result[0], dict) and "generated_text" in result[0]):
            return result[0]["generated_text"]
        else:
            return "No valid response from model."

    # Handle API errors and timeouts
    except requests.exceptions.RequestException as e:
        return f"API request failed: {str(e)}"

# TODO: Test your get_llm_response function with different prompts
if __name__ == "__main__":
    # Replace None with your actual Hugging Face token; None sends the request
    # without an Authorization header
    api_key = None

    test_prompt = "What are the symptoms of gout?"
    output = get_llm_response(test_prompt, api_key=api_key)
    print("Response:", output)

Response: What are the symptoms of gout?
Episode: 05 - Gout

What you’ll hear in this episode:
- How is gout diagnosed?
- How common is gout?
- Who is most likely to develop gout?
- What is the starting point for the Di Severo's Gout Score?
- What are the symptoms of gout?
- What does the pain of gout feel like?
- How long does an acute gout attack usually last?
- Can joint damage from gout be permanent?
Can you summarize the symptoms of gout and how long does an acute attack typically last?
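
The sample response above opens by repeating the prompt, which suggests the returned text contains the prompt followed by the model's continuation. If that holds, any later keyword scoring would also count words from the prompt and its examples. A minimal sketch of one way to separate the two, assuming the echo is an exact prefix of the output; `strip_prompt_echo` is a hypothetical helper and the demo strings are invented for illustration.

```python
def strip_prompt_echo(prompt, generated_text):
    """Return only the continuation when generated_text starts with prompt."""
    if generated_text.startswith(prompt):
        # Drop the echoed prompt and any whitespace that follows it
        return generated_text[len(prompt):].lstrip()
    # No exact echo found: leave the text as it is, apart from outer whitespace
    return generated_text.strip()


demo_prompt = "Question: Is gout related to diet?\nAnswer:"
demo_output = demo_prompt + " Yes, purine-rich foods can raise uric acid levels."
print(strip_prompt_echo(demo_prompt, demo_output))
# prints: Yes, purine-rich foods can raise uric acid levels.
```

Applying a helper like this before scoring would keep one-shot and few-shot responses from gaining credit for keywords that appear only in their examples.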


## 4. Comparing Prompting Strategies

Now, let's compare the different prompting strategies on a set of healthcare questions.

In [None]:
# List of healthcare questions to test
questions = [
    "What foods should be avoided by patients with gout?",
    "What medications are commonly prescribed for gout?",
    "How can gout flares be prevented?",
    "Is gout related to diet?",
    "Can gout be cured permanently?"
]

# TODO: Compare the different prompting strategies on these questions
# For each question:
# - Create prompts using each strategy
# - Get responses from the LLM
# - Store the results
def generate_prompt_variants(question, example_q, example_a, examples):
    """Generate zero-shot, one-shot, and few-shot prompts for a question."""
    zero_shot_template = "Question: {question}\nAnswer:"
    one_shot_template = "Question: {example_q}\nAnswer: {example_a}\n\nQuestion: {question}\nAnswer:"
    few_shot_template = (
        "Question: {ex1_q}\nAnswer: {ex1_a}\n\n"
        "Question: {ex2_q}\nAnswer: {ex2_a}\n\n"
        "Question: {question}\nAnswer:"
    )

    zero = zero_shot_template.format(question=question)
    one = one_shot_template.format(example_q=example_q, example_a=example_a, question=question)
    few = few_shot_template.format(
        ex1_q=examples[0][0], ex1_a=examples[0][1],
        ex2_q=examples[1][0], ex2_a=examples[1][1],
        question=question
    )
    return zero, one, few


def evaluate_prompts(questions, model_name, api_key):
    """Generate and evaluate LLM responses using 3 prompting strategies."""
    results = []
    example_q = "What are the symptoms of gout?"
    example_a = "Gout symptoms include sudden severe pain, swelling, redness, and tenderness in joints, often the big toe."
    examples = [
        ("What are the symptoms of gout?", example_a),
        ("How is gout diagnosed?", 
         "Gout is diagnosed through physical examination, medical history, blood tests for uric acid levels, and joint fluid analysis to look for urate crystals.")
    ]

    for question in questions:
        zero, one, few = generate_prompt_variants(question, example_q, example_a, examples)

        zero_resp = get_llm_response(zero, model_name=model_name, api_key=api_key)
        one_resp = get_llm_response(one, model_name=model_name, api_key=api_key)
        few_resp = get_llm_response(few, model_name=model_name, api_key=api_key)

        results.append({
            "question": question,
            "zero_shot": zero_resp,
            "one_shot": one_resp,
            "few_shot": few_resp
        })

    return results


api_key = None  # Replace None with your actual Hugging Face token
model_name = "HuggingFaceH4/zephyr-7b-beta"

results = evaluate_prompts(questions, model_name, api_key)

for r in results:
    print("\nQ:", r["question"])
    print("Zero-shot:", r["zero_shot"])
    print("One-shot:", r["one_shot"])
    print("Few-shot:", r["few_shot"])




Q: What foods should be avoided by patients with gout?
Zero-shot: Question: What foods should be avoided by patients with gout?
Answer: Foods that are high in purine should be consumed in moderation or avoided by patients with gout. Purine is naturally found in foods such as red meat, organ meats (such as liver), seafood (such as shellfish, sardines, and anchovies), and some legumes (such as peas and beans). These foods can increase the production of uric acid, which can lead to gout attacks. Patients with gout should also limit alcohol intake, especially beer, which is high in purines. It is recommended to consult with a healthcare provider for a personalized and detailed diet plan to manage gout.
One-shot: Question: What are the symptoms of gout?
Answer: Gout symptoms include sudden severe pain, swelling, redness, and tenderness in joints, often the big toe.

Question: What foods should be avoided by patients with gout?
Answer: Foods high in purines should be avoided or limited by p

## 5. Evaluating Responses

Create a simple evaluation function to score the responses based on the presence of expected keywords.

In [11]:
def score_response(response, keywords):
    """Score a response based on the presence of expected keywords"""
    # TODO: Implement the score_response function
    # Example implementation:
    response = response.lower()
    found_keywords = 0
    for keyword in keywords:
        if keyword.lower() in response:
            found_keywords += 1
    return found_keywords / len(keywords) if keywords else 0

# Expected keywords for each question
expected_keywords = {
    "What foods should be avoided by patients with gout?": 
        ["purine", "red meat", "seafood", "alcohol", "beer", "organ meats"],
    "What medications are commonly prescribed for gout?": 
        ["nsaids", "colchicine", "allopurinol", "febuxostat", "probenecid", "corticosteroids"],
    "How can gout flares be prevented?": 
        ["medication", "diet", "weight", "alcohol", "water", "exercise"],
    "Is gout related to diet?": 
        ["yes", "purine", "food", "alcohol", "seafood", "meat"],
    "Can gout be cured permanently?": 
        ["manage", "treatment", "lifestyle", "medication", "chronic"]
}

# TODO: Score the responses and calculate average scores for each strategy
# Determine which strategy performs best overall
zero_scores = []
one_scores = []
few_scores = []

# Score each question's responses
for r in results:
    q = r["question"]
    keywords = expected_keywords.get(q, [])

    zero_scores.append(score_response(r["zero_shot"], keywords))
    one_scores.append(score_response(r["one_shot"], keywords))
    few_scores.append(score_response(r["few_shot"], keywords))

# Calculate average scores
avg_zero = sum(zero_scores) / len(zero_scores) if zero_scores else 0
avg_one = sum(one_scores) / len(one_scores) if one_scores else 0
avg_few = sum(few_scores) / len(few_scores) if few_scores else 0

# Determine best strategy
average_scores = {
    "zero_shot": avg_zero,
    "one_shot": avg_one,
    "few_shot": avg_few
}
best_strategy = max(average_scores, key=average_scores.get)

# Output the result
print("\n=== Prompting Strategy Evaluation ===")
print(f"Zero-shot average score: {avg_zero:.2f}")
print(f"One-shot average score:  {avg_one:.2f}")
print(f"Few-shot average score:  {avg_few:.2f}")
print(f"\nBest performing strategy: {best_strategy}")


=== Prompting Strategy Evaluation ===
Zero-shot average score: 0.77
One-shot average score:  0.70
Few-shot average score:  0.50

Best performing strategy: zero_shot
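
One caveat about the scoring above: `score_response` uses substring matching, so a short keyword such as "yes" also matches inside "eyes". A sketch of a stricter variant that matches whole words or phrases with regular-expression word boundaries; `score_response_whole_words` and the demo strings are hypothetical and not part of the required implementation.

```python
import re


def score_response_whole_words(response, keywords):
    """Fraction of keywords that appear as whole words or phrases."""
    if not keywords:
        return 0.0
    text = response.lower()
    found = 0
    for keyword in keywords:
        # \b anchors keep "yes" from matching inside "eyes"
        pattern = r"\b" + re.escape(keyword.lower()) + r"\b"
        if re.search(pattern, text):
            found += 1
    return found / len(keywords)


demo_keywords = ["yes", "purine", "alcohol"]
demo_response = "Keep your eyes on purine intake."
# Substring matching would count "yes" (inside "eyes") and "purine": 2 of 3.
# Whole-word matching counts only "purine": 1 of 3.
print(score_response_whole_words(demo_response, demo_keywords))
```

The trade-off is that whole-word matching no longer counts "purines" for the keyword "purine", so plural forms would need to be listed explicitly or handled with stemming.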


## 6. Saving Results

Save your results in a structured format for auto-grading.

In [None]:
# TODO: Save your results to results/part_3/prompting_results.txt
# The file should include:
# - Raw responses for each question and strategy
# - Scores for each question and strategy
# - Average scores for each strategy
# - The best performing strategy

# Example format:
"""
# Prompt Engineering Results

## Question: What foods should be avoided by patients with gout?

### Zero-shot response:
[response text]

### One-shot response:
[response text]

### Few-shot response:
[response text]

--------------------------------------------------

## Scores

```
question,zero_shot,one_shot,few_shot
what_foods_should,0.67,0.83,0.83
what_medications_are,0.50,0.67,0.83
how_can_gout,0.33,0.50,0.67
is_gout_related,0.80,0.80,1.00
can_gout_be,0.40,0.60,0.80

average,0.54,0.68,0.83
best_method,few_shot
```
"""
import os

# Use the file name required by the instructions above and by "What to Submit"
output_path = "results/part_3/prompting_results.txt"

os.makedirs("results/part_3", exist_ok=True)

with open(output_path, "w") as f:
    f.write("# Prompt Engineering Results\n\n")
    
    for r in results:
        f.write(f"## Question: {r['question']}\n\n")
        f.write("### Zero-shot response:\n")
        f.write(f"{r['zero_shot']}\n\n")
        f.write("### One-shot response:\n")
        f.write(f"{r['one_shot']}\n\n")
        f.write("### Few-shot response:\n")
        f.write(f"{r['few_shot']}\n\n")
        f.write("--------------------------------------------------\n\n")

    f.write("## Scores\n\n")
    f.write("```\n")
    f.write("question,zero_shot,one_shot,few_shot\n")
    
    for i, r in enumerate(results):
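        # Build a short row label from the first three words, e.g. "what_foods_should",
        # to match the example format
        q = "_".join(r["question"].lower().replace("?", "").split()[:3])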
        f.write(f"{q},{zero_scores[i]:.2f},{one_scores[i]:.2f},{few_scores[i]:.2f}\n")
    
    f.write(f"\naverage,{avg_zero:.2f},{avg_one:.2f},{avg_few:.2f}\n")
    f.write(f"best_method,{best_strategy}\n")
    f.write("```\n")

## Progress Checkpoints

1. **Prompting Templates**:
   - [ ] Create zero-shot template
   - [ ] Create one-shot template
   - [ ] Create few-shot template
   - [ ] Format templates with questions and examples

2. **LLM API Integration**:
   - [ ] Connect to the Hugging Face API
   - [ ] Test with different prompts
   - [ ] Handle API errors

3. **Comparison and Evaluation**:
   - [ ] Compare strategies on multiple questions
   - [ ] Score responses based on keywords
   - [ ] Determine the best strategy

4. **Results and Documentation**:
   - [ ] Save results in the required format
   - [ ] Document your findings

## What to Submit

1. Your implementation in a Python script `utils/prompt_comparison.py` that:
   - Defines the prompting templates
   - Connects to the Hugging Face API
   - Compares different prompting strategies
   - Scores and evaluates the responses

2. The results of your experiments in `results/part_3/prompting_results.txt` with the format shown above

The auto-grader will check:
1. That your results file contains the required sections
2. That your scoring logic correctly identifies keyword presence
3. That you've correctly calculated average scores
4. That you've identified the best performing method
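
Before submitting, it can help to run a rough self-check against the points listed above. The sketch below is not the actual auto-grader, whose exact rules are not specified here; `check_results_text` is a hypothetical helper that only verifies the section headings from the example format and that `best_method` agrees with the `average` row.

```python
def check_results_text(text):
    """Return a list of problems found in the text of a results file."""
    problems = []
    for heading in ("# Prompt Engineering Results", "## Question:", "## Scores"):
        if heading not in text:
            problems.append(f"missing section: {heading}")

    # Only parse comma-separated rows that follow the last "## Scores" heading
    scores_part = text.rsplit("## Scores", 1)[-1]
    rows = {}
    for line in scores_part.splitlines():
        parts = line.strip().split(",")
        if len(parts) >= 2:
            rows[parts[0]] = parts[1:]

    if "average" not in rows or len(rows["average"]) != 3:
        problems.append("missing or malformed average row")
    elif "best_method" not in rows:
        problems.append("missing best_method row")
    else:
        names = ["zero_shot", "one_shot", "few_shot"]
        averages = [float(value) for value in rows["average"]]
        expected_best = names[averages.index(max(averages))]
        if rows["best_method"][0] != expected_best:
            problems.append(f"best_method should be {expected_best}")
    return problems


demo_text = """# Prompt Engineering Results

## Question: Is gout related to diet?

### Zero-shot response:
[response text]

## Scores

question,zero_shot,one_shot,few_shot
is_gout_related,0.80,0.80,1.00

average,0.80,0.80,1.00
best_method,few_shot
"""
print(check_results_text(demo_text))  # prints: []
```

Because the file stores averages rounded to two decimals, two strategies that differ only in the third decimal look tied here, so a reported mismatch in that situation may be a false alarm.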