# Part 3: Prompt Engineering Basics

## Introduction

In this part, you'll experiment with different prompting techniques to improve the quality of responses from Large Language Models (LLMs). You'll compare zero-shot, one-shot, and few-shot prompting approaches and document which works best for different types of questions.

## Learning Objectives

- Understand different prompting techniques
- Compare zero-shot, one-shot, and few-shot prompting
- Analyze the impact of prompt design on response quality

## Setup

In [1]:
# Import necessary libraries
import requests
import json

## 1. Understanding Prompting Techniques

LLMs can be prompted in different ways to get better responses:

1. **Zero-shot prompting**: Asking the model a question directly without examples
2. **One-shot prompting**: Providing one example before asking your question
3. **Few-shot prompting**: Providing multiple examples before asking your question
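
The three techniques above differ only in how many worked examples precede the question. A minimal sketch of that idea, assuming a hypothetical helper named `build_prompt` and shortened demo answers (neither is part of the assignment code):

```python
def build_prompt(question, examples=(), k=0):
    """Build a Q/A prompt that shows the first `k` worked examples before the question."""
    # Each example becomes a complete "Question/Answer" block
    blocks = [f"Question: {q}\nAnswer: {a}" for q, a in list(examples)[:k]]
    # The real question goes last, with the answer left open for the model
    blocks.append(f"Question: {question}\nAnswer:")
    return "\n\n".join(blocks)

# Illustrative examples only (shortened for the sketch)
demo_examples = [
    ("What are the symptoms of gout?", "Sudden severe joint pain, swelling, and redness."),
    ("How is gout diagnosed?", "Through blood tests for uric acid and joint fluid analysis."),
]
demo_question = "What foods should be avoided by patients with gout?"

zero_shot = build_prompt(demo_question)                   # k=0: no examples
one_shot = build_prompt(demo_question, demo_examples, 1)  # k=1: one example
few_shot = build_prompt(demo_question, demo_examples, 2)  # k=2: two examples
print(few_shot)
```

With `k=0` the helper reduces to zero-shot prompting, so one function can cover all three strategies.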

## 2. Creating Prompting Templates

Your first task is to create templates for different prompting strategies.

In [2]:
# Define a question to experiment with
question = "What foods should be avoided by patients with gout?"

# Example for one-shot and few-shot prompting
example_q = "What are the symptoms of gout?"
example_a = "Gout symptoms include sudden severe pain, swelling, redness, and tenderness in joints, often the big toe."

# Examples for few-shot prompting
examples = [
    ("What are the symptoms of gout?",
     "Gout symptoms include sudden severe pain, swelling, redness, and tenderness in joints, often the big toe."),
    ("How is gout diagnosed?",
     "Gout is diagnosed through physical examination, medical history, blood tests for uric acid levels, and joint fluid analysis to look for urate crystals.")
]

# TODO: Create prompting templates
# Zero-shot template (just the question)
zero_shot_template = "Question: {question}\nAnswer:"

# One-shot template (one example + the question)
one_shot_template = """Question: {example_q}
Answer: {example_a}

Question: {question}
Answer:"""

# Few-shot template (multiple examples + the question)
few_shot_template = """Question: {examples[0][0]}
Answer: {examples[0][1]}

Question: {examples[1][0]}
Answer: {examples[1][1]}

Question: {question}
Answer:"""

# TODO: Format the templates with your question and examples
zero_shot_prompt = zero_shot_template.format(question=question)
one_shot_prompt = one_shot_template.format(example_q=example_q, example_a=example_a, question=question)
# For few-shot, pass the whole examples list: str.format resolves index lookups such as {examples[0][0]}
few_shot_prompt = few_shot_template.format(examples=examples, question=question)

print("Zero-shot prompt:")
print(zero_shot_prompt)
print("\nOne-shot prompt:")
print(one_shot_prompt)
print("\nFew-shot prompt:")
print(few_shot_prompt)

Zero-shot prompt:
Question: What foods should be avoided by patients with gout?
Answer:

One-shot prompt:
Question: What are the symptoms of gout?
Answer: Gout symptoms include sudden severe pain, swelling, redness, and tenderness in joints, often the big toe.

Question: What foods should be avoided by patients with gout?
Answer:

Few-shot prompt:
Question: What are the symptoms of gout?
Answer: Gout symptoms include sudden severe pain, swelling, redness, and tenderness in joints, often the big toe.

Question: How is gout diagnosed?
Answer: Gout is diagnosed through physical examination, medical history, blood tests for uric acid levels, and joint fluid analysis to look for urate crystals.

Question: What foods should be avoided by patients with gout?
Answer:


## 3. Connecting to the LLM API

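Next, implement a function that sends a prompt to an LLM and returns its response. The cell below loads `google/flan-t5-base` locally with the `transformers` library instead of calling a remote API.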

In [3]:
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
import torch

# Load model & tokenizer globally so it's only done once
tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-base")
model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-base")

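def get_llm_response(prompt, model_name="google/flan-t5-base", api_key=None):
    """Generate a response to `prompt` with the locally loaded model.

    `model_name` and `api_key` are not used by this local implementation.
    """
    # Tokenize the prompt, truncating it if it exceeds the model's input limit
    inputs = tokenizer(prompt, return_tensors="pt", truncation=True)
    # Generate at most 150 new tokens
    outputs = model.generate(**inputs, max_new_tokens=150)
    # Decode the first (only) generated sequence back to text
    response = tokenizer.decode(outputs[0], skip_special_tokens=True)
    return response

# Quick test with the three prompts from Section 2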
if __name__ == "__main__":
    print("Zero-shot:\n", get_llm_response(zero_shot_prompt))
    print("\nOne-shot:\n", get_llm_response(one_shot_prompt))
    print("\nFew-shot:\n", get_llm_response(few_shot_prompt))


Zero-shot:
 ice cream

One-shot:
 Gout is a bacterial infection that causes inflammation of the urethra and spleen.

Few-shot:
 Gout is a bacterial infection that causes inflammation and irritability in the gastrointestinal tract.


## 4. Comparing Prompting Strategies

Now, let's compare the different prompting strategies on a set of healthcare questions.
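
The comparison described above amounts to running every question through every prompt builder. A minimal table-driven sketch, assuming hypothetical names (`compare_strategies`, `demo_builders`, `echo_llm`) and a stand-in model function rather than a real LLM:

```python
def compare_strategies(questions, builders, llm_fn):
    """Return one dict per question holding a response for each named strategy."""
    rows = []
    for q in questions:
        row = {"question": q}
        for name, build in builders.items():
            # Build the prompt for this strategy, then query the model
            row[name] = llm_fn(build(q))
        rows.append(row)
    return rows

# Stand-ins for illustration only
demo_builders = {
    "zero_shot": lambda q: f"Question: {q}\nAnswer:",
    "one_shot": lambda q: (
        "Question: What are the symptoms of gout?\n"
        "Answer: Sudden severe joint pain.\n\n"
        f"Question: {q}\nAnswer:"
    ),
}

def echo_llm(prompt):
    # Reports the prompt length instead of calling a model
    return f"[{len(prompt)} chars]"

rows = compare_strategies(["Is gout related to diet?"], demo_builders, echo_llm)
print(rows[0])
```

Keeping the strategies in a dictionary means a new strategy can be added without touching the loop.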

In [4]:
# Example QA pairs for one-shot and few-shot prompting
examples = [
    ("What are the symptoms of gout?",
     "Gout symptoms include sudden severe pain, swelling, redness, and tenderness in joints, often the big toe."),
    ("How is gout diagnosed?",
     "Gout is diagnosed through physical examination, medical history, blood tests for uric acid levels, and joint fluid analysis to look for urate crystals.")
]

# List of healthcare questions to test
questions = [
    "What foods should be avoided by patients with gout?",
    "What medications are commonly prescribed for gout?",
    "How can gout flares be prevented?",
    "Is gout related to diet?",
    "Can gout be cured permanently?"
]

# Prompt template functions
def try_zero_shot(q):
    return f"Question: {q}\nAnswer:"

def try_one_shot(q):
    return f"""Question: {examples[0][0]}
Answer: {examples[0][1]}

Question: {q}
Answer:"""

def try_few_shot(q):
    return f"""Question: {examples[0][0]}
Answer: {examples[0][1]}

Question: {examples[1][0]}
Answer: {examples[1][1]}

Question: {q}
Answer:"""

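# TODO: Implement your actual LLM API call here
# NOTE: This placeholder overrides the get_llm_response defined in Section 3,
# so the output of this cell echoes each prompt instead of showing model answers.
# Delete this definition to reuse the Section 3 model.
def get_llm_response(prompt, model_name="google/flan-t5-base", api_key=None):
    """
    Placeholder that echoes the prompt instead of calling an LLM.
    Replace this with your actual API call logic.
    """
    # For now, just echo the prompt (for testing)
    return f"Simulated response for prompt:\n{prompt}"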

# Store results here
results = []

# Loop over each question and test all prompting strategies
for q in questions:
    zero_prompt = try_zero_shot(q)
    one_prompt = try_one_shot(q)
    few_prompt = try_few_shot(q)

    zero_resp = get_llm_response(zero_prompt)
    one_resp = get_llm_response(one_prompt)
    few_resp = get_llm_response(few_prompt)

    results.append({
        "question": q,
        "zero_shot": zero_resp,
        "one_shot": one_resp,
        "few_shot": few_resp
    })

# Display the results
for r in results:
    print(f"\nQuestion: {r['question']}")
    print(f"Zero-shot response:\n{r['zero_shot']}\n")
    print(f"One-shot response:\n{r['one_shot']}\n")
    print(f"Few-shot response:\n{r['few_shot']}\n")



Question: What foods should be avoided by patients with gout?
Zero-shot response:
Simulated response for prompt:
Question: What foods should be avoided by patients with gout?
Answer:

One-shot response:
Simulated response for prompt:
Question: What are the symptoms of gout?
Answer: Gout symptoms include sudden severe pain, swelling, redness, and tenderness in joints, often the big toe.

Question: What foods should be avoided by patients with gout?
Answer:

Few-shot response:
Simulated response for prompt:
Question: What are the symptoms of gout?
Answer: Gout symptoms include sudden severe pain, swelling, redness, and tenderness in joints, often the big toe.

Question: How is gout diagnosed?
Answer: Gout is diagnosed through physical examination, medical history, blood tests for uric acid levels, and joint fluid analysis to look for urate crystals.

Question: What foods should be avoided by patients with gout?
Answer:


Question: What medications are commonly prescribed for gout?
[output truncated]

## 5. Evaluating Responses

Create a simple evaluation function to score the responses based on the presence of expected keywords.
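
Plain substring matching can over-count: the keyword "yes" also matches inside "eyes". One possible alternative, sketched here with a hypothetical function name, is to match whole words using regular-expression word boundaries. The trade-off is that inflected forms such as "purines" no longer match the keyword "purine".

```python
import re

def score_response_whole_words(response, keywords):
    """Fraction of keywords that appear in `response` as whole words or phrases."""
    if not keywords:
        return 0.0
    text = response.lower()
    hits = 0
    for kw in keywords:
        # \b anchors the match at word boundaries; re.escape protects special characters
        pattern = r"\b" + re.escape(kw.lower()) + r"\b"
        if re.search(pattern, text):
            hits += 1
    return hits / len(keywords)

demo_keywords = ["yes", "purine", "alcohol", "meat"]
demo_score = score_response_whole_words(
    "Limit alcohol and red meat; rest your eyes.", demo_keywords
)
print(demo_score)  # 0.5: "alcohol" and "meat" match, "yes" inside "eyes" does not
```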

In [1]:
# Example QA pairs for one-shot and few-shot prompting
examples = [
    ("What are the symptoms of gout?",
     "Gout symptoms include sudden severe pain, swelling, redness, and tenderness in joints, often the big toe."),
    ("How is gout diagnosed?",
     "Gout is diagnosed through physical examination, medical history, blood tests for uric acid levels, and joint fluid analysis to look for urate crystals.")
]

# List of healthcare questions to test
questions = [
    "What foods should be avoided by patients with gout?",
    "What medications are commonly prescribed for gout?",
    "How can gout flares be prevented?",
    "Is gout related to diet?",
    "Can gout be cured permanently?"
]

# Expected keywords for scoring
expected_keywords = {
    "What foods should be avoided by patients with gout?": 
        ["purine", "red meat", "seafood", "alcohol", "beer", "organ meats"],
    "What medications are commonly prescribed for gout?": 
        ["nsaids", "colchicine", "allopurinol", "febuxostat", "probenecid", "corticosteroids"],
    "How can gout flares be prevented?": 
        ["medication", "diet", "weight", "alcohol", "water", "exercise"],
    "Is gout related to diet?": 
        ["yes", "purine", "food", "alcohol", "seafood", "meat"],
    "Can gout be cured permanently?": 
        ["manage", "treatment", "lifestyle", "medication", "chronic"]
}

# Scoring function
def score_response(response, keywords):
    """Score a response based on presence of expected keywords"""
    response = response.lower()
    found_keywords = 0
    for keyword in keywords:
        if keyword.lower() in response:
            found_keywords += 1
    return found_keywords / len(keywords) if keywords else 0

# Prompt template functions
def try_zero_shot(q):
    return f"Question: {q}\nAnswer:"

def try_one_shot(q):
    return f"""Question: {examples[0][0]}
Answer: {examples[0][1]}

Question: {q}
Answer:"""

def try_few_shot(q):
    return f"""Question: {examples[0][0]}
Answer: {examples[0][1]}

Question: {examples[1][0]}
Answer: {examples[1][1]}

Question: {q}
Answer:"""

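# Placeholder for LLM API call
# NOTE: This placeholder echoes the prompt, so the scores below only measure
# whether the prompt itself contains the expected keywords.
def get_llm_response(prompt, model_name="google/flan-t5-base", api_key=None):
    # Replace this with an actual call to your LLM API
    # For now, return a dummy response for demonstration
    return f"Simulated response for prompt:\n{prompt}"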

# Store results with scores
results = []

for q in questions:
    zero_prompt = try_zero_shot(q)
    one_prompt = try_one_shot(q)
    few_prompt = try_few_shot(q)

    zero_resp = get_llm_response(zero_prompt)
    one_resp = get_llm_response(one_prompt)
    few_resp = get_llm_response(few_prompt)

    # Score responses
    zero_score = score_response(zero_resp, expected_keywords.get(q, []))
    one_score = score_response(one_resp, expected_keywords.get(q, []))
    few_score = score_response(few_resp, expected_keywords.get(q, []))

    results.append({
        "question": q,
        "zero_shot": zero_resp,
        "one_shot": one_resp,
        "few_shot": few_resp,
        "zero_shot_score": zero_score,
        "one_shot_score": one_score,
        "few_shot_score": few_score
    })

# Calculate average scores per prompting strategy
avg_zero = sum(r["zero_shot_score"] for r in results) / len(results)
avg_one = sum(r["one_shot_score"] for r in results) / len(results)
avg_few = sum(r["few_shot_score"] for r in results) / len(results)

# Display results and scores
for r in results:
    print(f"\nQuestion: {r['question']}")
    print(f"Zero-shot response:\n{r['zero_shot']}")
    print(f"Zero-shot score: {r['zero_shot_score']:.2f}\n")

    print(f"One-shot response:\n{r['one_shot']}")
    print(f"One-shot score: {r['one_shot_score']:.2f}\n")

    print(f"Few-shot response:\n{r['few_shot']}")
    print(f"Few-shot score: {r['few_shot_score']:.2f}\n")

print(f"Average zero-shot score: {avg_zero:.2f}")
print(f"Average one-shot score: {avg_one:.2f}")
print(f"Average few-shot score: {avg_few:.2f}")

# Optional: Which strategy performed best overall?
# Note: max() returns the first entry on ties, so zero-shot wins if all averages are equal
best_strategy = max(
    [("zero-shot", avg_zero), ("one-shot", avg_one), ("few-shot", avg_few)],
    key=lambda x: x[1]
)[0]

print(f"\nBest prompting strategy overall: {best_strategy}")



Question: What foods should be avoided by patients with gout?
Zero-shot response:
Simulated response for prompt:
Question: What foods should be avoided by patients with gout?
Answer:
Zero-shot score: 0.00

One-shot response:
Simulated response for prompt:
Question: What are the symptoms of gout?
Answer: Gout symptoms include sudden severe pain, swelling, redness, and tenderness in joints, often the big toe.

Question: What foods should be avoided by patients with gout?
Answer:
One-shot score: 0.00

Few-shot response:
Simulated response for prompt:
Question: What are the symptoms of gout?
Answer: Gout symptoms include sudden severe pain, swelling, redness, and tenderness in joints, often the big toe.

Question: How is gout diagnosed?
Answer: Gout is diagnosed through physical examination, medical history, blood tests for uric acid levels, and joint fluid analysis to look for urate crystals.

Question: What foods should be avoided by patients with gout?
Answer:
Few-shot score: 0.00


[output truncated]

## 6. Saving Results

Save your results in a structured format for auto-grading.

In [2]:
import os

# Define output file path
output_dir = "results/part_3"
os.makedirs(output_dir, exist_ok=True)
output_file = os.path.join(output_dir, "prompting_results.txt")

with open(output_file, "w", encoding="utf-8") as f:
    f.write("# Prompt Engineering Results\n\n")

    # Write each question and its responses
    for r in results:
        f.write(f"## Question: {r['question']}\n\n")

        f.write("### Zero-shot response:\n")
        f.write(r["zero_shot"] + "\n\n")

        f.write("### One-shot response:\n")
        f.write(r["one_shot"] + "\n\n")

        f.write("### Few-shot response:\n")
        f.write(r["few_shot"] + "\n\n")

        f.write("-" * 50 + "\n\n")

    # Write scores table header
    f.write("## Scores\n\n```\n")
    f.write("question,zero_shot,one_shot,few_shot\n")

    # Write each question's scores (with simplified question keys)
    for r in results:
        # create a short key for question for csv-friendly format
        q_key = r["question"].lower().replace(" ", "_").replace("?", "")
        f.write(f"{q_key},{r['zero_shot_score']:.2f},{r['one_shot_score']:.2f},{r['few_shot_score']:.2f}\n")

    # Write averages and best method
    f.write(f"\naverage,{avg_zero:.2f},{avg_one:.2f},{avg_few:.2f}\n")
    f.write(f"best_method,{best_strategy}\n")
    f.write("```\n")

print(f"Results saved to {output_file}")


Results saved to results/part_3/prompting_results.txt


## Progress Checkpoints

1. **Prompting Templates**:
   - [ ] Create zero-shot template
   - [ ] Create one-shot template
   - [ ] Create few-shot template
   - [ ] Format templates with questions and examples

2. **LLM API Integration**:
   - [ ] Connect to the Hugging Face API
   - [ ] Test with different prompts
   - [ ] Handle API errors

3. **Comparison and Evaluation**:
   - [ ] Compare strategies on multiple questions
   - [ ] Score responses based on keywords
   - [ ] Determine the best strategy

4. **Results and Documentation**:
   - [ ] Save results in the required format
   - [ ] Document your findings

## What to Submit

1. Your implementation in a Python script `utils/prompt_comparison.py` that:
   - Defines the prompting templates
   - Connects to the Hugging Face API
   - Compares different prompting strategies
   - Scores and evaluates the responses

2. The results of your experiments in `results/part_3/prompting_results.txt` with the format shown above

The auto-grader will check:
1. That your results file contains the required sections
2. That your scoring logic correctly identifies keyword presence
3. That you've correctly calculated average scores
4. That you've identified the best performing method
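
Before submitting, it can be useful to re-read the scores block and confirm that the averages are consistent with the per-question rows. The sketch below is a self-check only; it assumes the file layout written in Section 6, uses hypothetical names (`parse_scores`, `sample_report`), and does not reproduce the actual auto-grader, whose logic is not shown here.

```python
FENCE = "`" * 3  # the triple-backtick marker that surrounds the scores block

def parse_scores(report_text):
    """Extract per-question scores, the averages row, and the best method."""
    block = report_text.split("## Scores")[1].split(FENCE)[1]
    scores, averages, best = {}, None, None
    for line in block.strip().splitlines():
        parts = line.strip().split(",")
        if not line.strip() or parts[0] == "question":
            continue  # skip blank lines and the header row
        if parts[0] == "best_method":
            best = parts[1]
        elif parts[0] == "average":
            averages = [float(x) for x in parts[1:]]
        else:
            scores[parts[0]] = [float(x) for x in parts[1:]]
    return scores, averages, best

# Made-up sample in the Section 6 layout, for illustration only
sample_report = (
    "# Prompt Engineering Results\n\n## Scores\n\n" + FENCE + "\n"
    "question,zero_shot,one_shot,few_shot\n"
    "q_a,0.50,0.25,1.00\n"
    "q_b,0.00,0.25,0.50\n"
    "\naverage,0.25,0.25,0.75\n"
    "best_method,few-shot\n" + FENCE + "\n"
)

scores, averages, best = parse_scores(sample_report)
# Recompute each column mean from the per-question rows
recomputed = [sum(col) / len(col) for col in zip(*scores.values())]
# Allow for rounding to two decimals in the file
consistent = all(abs(a - b) <= 0.01 for a, b in zip(averages, recomputed))
print(scores, averages, best, consistent)
```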