# Part 3: Prompt Engineering Basics

## Introduction

In this part, you'll experiment with different prompting techniques to improve the quality of responses from Large Language Models (LLMs). You'll compare zero-shot, one-shot, and few-shot prompting approaches and document which works best for different types of questions.

## Learning Objectives

- Understand different prompting techniques
- Compare zero-shot, one-shot, and few-shot prompting
- Analyze the impact of prompt design on response quality

## Setup

In [1]:
# Import necessary libraries
import requests
import json

## 1. Understanding Prompting Techniques

LLMs can be prompted in different ways to get better responses:

1. **Zero-shot prompting**: Asking the model a question directly without examples
2. **One-shot prompting**: Providing one example before asking your question
3. **Few-shot prompting**: Providing multiple examples before asking your question
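
The three techniques differ only in how many worked examples precede the question, so one helper can cover all of them. The following is a minimal sketch, assuming the `Question:`/`Answer:` layout used in this part; `build_prompt` and the `demo_*` names are hypothetical and exist only for illustration.

```python
def build_prompt(question, examples=()):
    """Build a Question/Answer prompt preceded by zero or more worked examples.

    `build_prompt` is a hypothetical helper, not part of the assignment code.
    """
    # Each example becomes a completed Question/Answer block
    blocks = [f"Question: {q}\nAnswer: {a}" for q, a in examples]
    # The real question goes last, with the answer left open for the model
    blocks.append(f"Question: {question}\nAnswer:")
    return "\n\n".join(blocks)


demo_examples = [
    ("What are the symptoms of gout?",
     "Gout symptoms include sudden severe pain, swelling, redness, and tenderness in joints, often the big toe."),
    ("How is gout diagnosed?",
     "Gout is diagnosed through physical examination, medical history, blood tests for uric acid levels, and joint fluid analysis to look for urate crystals."),
]
demo_question = "What foods should be avoided by patients with gout?"

zero_shot = build_prompt(demo_question)                     # no examples
one_shot = build_prompt(demo_question, demo_examples[:1])   # one example
few_shot = build_prompt(demo_question, demo_examples)       # several examples

print(few_shot)
```

Passing a longer `examples` list extends the few-shot prompt without changing the template.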

## 2. Creating Prompting Templates

Your first task is to create templates for different prompting strategies.

In [1]:
# Define a question to experiment with
question = "What foods should be avoided by patients with gout?"

# Example for one-shot and few-shot prompting
example_q = "What are the symptoms of gout?"
example_a = "Gout symptoms include sudden severe pain, swelling, redness, and tenderness in joints, often the big toe."

# Examples for few-shot prompting
examples = [
    ("What are the symptoms of gout?",
     "Gout symptoms include sudden severe pain, swelling, redness, and tenderness in joints, often the big toe."),
    ("How is gout diagnosed?",
     "Gout is diagnosed through physical examination, medical history, blood tests for uric acid levels, and joint fluid analysis to look for urate crystals.")
]

# TODO: Create prompting templates
# Zero-shot template (just the question)
zero_shot_template = "Question: {question}\nAnswer:"

# One-shot template (one example + the question)
one_shot_template = """Question: {example_q}
Answer: {example_a}

Question: {question}
Answer:"""

# Few-shot template (multiple examples + the question)
few_shot_template = """Question: {examples[0][0]}
Answer: {examples[0][1]}

Question: {examples[1][0]}
Answer: {examples[1][1]}

Question: {question}
Answer:"""

# TODO: Format the templates with your question and examples
zero_shot_prompt = zero_shot_template.format(question=question)
one_shot_prompt = one_shot_template.format(example_q=example_q, example_a=example_a, question=question)
# For few-shot, you'll need to format it with the examples list
few_shot_prompt = few_shot_template.format(examples=examples, question=question)

print("Zero-shot prompt:")
print(zero_shot_prompt)
print("\nOne-shot prompt:")
print(one_shot_prompt)
print("\nFew-shot prompt:")
print(few_shot_prompt)

Zero-shot prompt:
Question: What foods should be avoided by patients with gout?
Answer:

One-shot prompt:
Question: What are the symptoms of gout?
Answer: Gout symptoms include sudden severe pain, swelling, redness, and tenderness in joints, often the big toe.

Question: What foods should be avoided by patients with gout?
Answer:

Few-shot prompt:
Question: What are the symptoms of gout?
Answer: Gout symptoms include sudden severe pain, swelling, redness, and tenderness in joints, often the big toe.

Question: How is gout diagnosed?
Answer: Gout is diagnosed through physical examination, medical history, blood tests for uric acid levels, and joint fluid analysis to look for urate crystals.

Question: What foods should be avoided by patients with gout?
Answer:


## 3. Connecting to the LLM API

Next, implement a function to send prompts to an LLM API and get responses.
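
It can help to keep response parsing separate from the network call so that it can be tested offline. The sketch below assumes two payload shapes: a list of `{"generated_text": ...}` dictionaries on success (the shape the code in this section reads) and a dictionary with an `"error"` key on failure (an assumption, not confirmed here). `extract_generated_text` is a hypothetical helper name.

```python
def extract_generated_text(response_data):
    """Return the generated text from a decoded JSON payload, or raise ValueError.

    Assumed shapes (the error shape is an assumption for illustration):
      success: [{"generated_text": "..."}]
      failure: {"error": "..."}
    """
    if isinstance(response_data, dict) and "error" in response_data:
        raise ValueError(f"API error: {response_data['error']}")
    if (
        isinstance(response_data, list)
        and response_data  # guard against an empty list before indexing
        and isinstance(response_data[0], dict)
        and "generated_text" in response_data[0]
    ):
        return response_data[0]["generated_text"]
    raise ValueError(f"Unexpected response shape: {response_data!r}")


# Made-up payloads, for illustration only
ok_payload = [{"generated_text": "Gout is a type of arthritis."}]
print(extract_generated_text(ok_payload))

for bad_payload in ({"error": "Model is loading"}, []):
    try:
        extract_generated_text(bad_payload)
    except ValueError as err:
        print(err)
```

Raising an exception instead of returning an error string keeps failed calls from being scored as if they were model answers.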

In [2]:
%pip install python-dotenv

import requests
from dotenv import load_dotenv
import os

load_dotenv("APIKEY.env")
api_key = os.getenv("HF_API_KEY")

def get_llm_response(prompt_text, model_name="HuggingFaceH4/zephyr-7b-beta", api_key=None):
    """Send a prompt to the Hugging Face Inference API and return the generated text."""
    endpoint_url = f"https://api-inference.huggingface.co/models/{model_name}"
    headers = {"Authorization": f"Bearer {api_key}"} if api_key else {}
    payload = {
        "inputs": prompt_text,
        "parameters": {
            "max_new_tokens": 50  # cap the length of each completion
        }
    }

    try:
        # A timeout keeps a stalled request from hanging the notebook cell
        response = requests.post(endpoint_url, headers=headers, json=payload, timeout=60)
        response.raise_for_status()
        response_data = response.json()
        # A successful call returns a non-empty list of {"generated_text": ...} dicts
        if isinstance(response_data, list) and response_data and "generated_text" in response_data[0]:
            return response_data[0]["generated_text"]
        else:
            return str(response_data)
    except Exception as error:
        return f"Error occurred: {error}"

prompt = "What is gout?"
output = get_llm_response(prompt_text=prompt, api_key=api_key)
print(output)

prompt = "What foods should be avoided by patients with gout?"
output = get_llm_response(prompt_text=prompt, api_key=api_key)
print(output)
# Never print the API key: notebook output is saved with the file and would expose the secret

Defaulting to user installation because normal site-packages is not writeable
Note: you may need to restart the kernel to use updated packages.
What is gout?
Gout is a type of arthritis that is caused by a buildup of uric acid in the body. Uric acid is a waste product that is normally eliminated in urine. However, when the body produces too much ur
What foods should be avoided by patients with gout?

Gout is a type of arthritis that is caused by the buildup of uric acid in the body. This can lead to inflammation and pain in the joints, particularly in the big toe. Patients with


## 4. Comparing Prompting Strategies

Now, let's compare the different prompting strategies on a set of healthcare questions.

In [3]:
from dotenv import load_dotenv
import os
import requests

load_dotenv("APIKEY.env")
key = os.getenv("HF_API_KEY")

# Set model
model = "HuggingFaceH4/zephyr-7b-beta"

# Set list of questions
qs = [
    "What foods should be avoided by patients with gout?",
    "What medications are commonly prescribed for gout?",
    "How can gout flares be prevented?",
    "Is gout related to diet?",
    "Can gout be cured permanently?"
]

# Set few-shot examples
exs = [
    ("What are the symptoms of gout?",
     "Gout symptoms include sudden severe pain, swelling, redness, and tenderness in joints, often the big toe."),
    ("How is gout diagnosed?",
     "Gout is diagnosed through physical exam, history, blood tests for uric acid, and joint fluid tests.")
]

# Compare zero-shot, one-shot, few-shot for each question
for q in qs:
    # Make zero-shot prompt
    z_prompt = f"Question: {q}\nAnswer:"
    
    # Make one-shot prompt
    one_prompt = f"Question: {exs[0][0]}\nAnswer: {exs[0][1]}\n\nQuestion: {q}\nAnswer:"
    
    # Make few-shot prompt
    few_prompt = (
        f"Question: {exs[0][0]}\nAnswer: {exs[0][1]}\n\n"
        f"Question: {exs[1][0]}\nAnswer: {exs[1][1]}\n\n"
        f"Question: {q}\nAnswer:"
    )

    # Send prompts to model
    url = f"https://api-inference.huggingface.co/models/{model}"
    head = {"Authorization": f"Bearer {key}"}
    
    print(f"\n----- Question: {q} -----\n")

    # Zero-shot (the timeout keeps a stalled request from hanging the cell)
    z_res = requests.post(url, headers=head, json={"inputs": z_prompt}, timeout=60)
    print("Zero-shot:\n", z_res.json()[0]['generated_text'].strip())

    # One-shot
    one_res = requests.post(url, headers=head, json={"inputs": one_prompt}, timeout=60)
    print("\nOne-shot:\n", one_res.json()[0]['generated_text'].strip())

    # Few-shot
    few_res = requests.post(url, headers=head, json={"inputs": few_prompt}, timeout=60)
    print("\nFew-shot:\n", few_res.json()[0]['generated_text'].strip())


----- Question: What foods should be avoided by patients with gout? -----

Zero-shot:
 Question: What foods should be avoided by patients with gout?
Answer: Patients with gout should limit their intake of foods that are high in purines, which can increase uric acid levels in the body and lead to gout flares. Foods to avoid or limit include organ meats (e.g., liver, kidney), game meats (e.g., venison, pheasant), anchovies, sardines, mackerel, herring, scallops, trout, mussels, and까반민 (fermented soybean product). It is, however, still okay to eat small amounts of these foods as part of a balanced diet. It is also recommended to avoid or limit alcoholic beverages, especially those with high purine content such as beer. Additionally, sugary drinks and processed foods should be consumed in moderation as they can contribute to weight gain, which is a risk factor for gout. (Produced by TOMOREN Monograph Team)

One-shot:
 Question: What are the symptoms of gout?
Answer: Gout symptoms include 

KeyboardInterrupt: 

## 5. Evaluating Responses

Create a simple evaluation function to score the responses based on the presence of expected keywords.

In [4]:
def score_response(response, keywords):
    """Score a response based on the presence of expected keywords"""
    # TODO: Implement the score_response function
    # Example implementation:
    response = response.lower()
    found_keywords = 0
    for keyword in keywords:
        if keyword.lower() in response:
            found_keywords += 1
    return found_keywords / len(keywords) if keywords else 0

# Expected keywords for each question
expected_keywords = {
    "What foods should be avoided by patients with gout?": 
        ["purine", "red meat", "seafood", "alcohol", "beer", "organ meats"],
    "What medications are commonly prescribed for gout?": 
        ["nsaids", "colchicine", "allopurinol", "febuxostat", "probenecid", "corticosteroids"],
    "How can gout flares be prevented?": 
        ["medication", "diet", "weight", "alcohol", "water", "exercise"],
    "Is gout related to diet?": 
        ["yes", "purine", "food", "alcohol", "seafood", "meat"],
    "Can gout be cured permanently?": 
        ["manage", "treatment", "lifestyle", "medication", "chronic"]
}

score_zero = []
score_one = []
score_few = []

for q in qs:
    kwords = expected_keywords[q]

    # Zero-shot
    zp = f"Question: {q}\nAnswer:"
    z = get_llm_response(prompt_text=zp, api_key=api_key)
    s_z = score_response(z, kwords)
    score_zero.append(s_z)

    # One-shot
    op = f"Question: {exs[0][0]}\nAnswer: {exs[0][1]}\n\nQuestion: {q}\nAnswer:"
    o = get_llm_response(prompt_text=op, api_key=api_key)
    s_o = score_response(o, kwords)
    score_one.append(s_o)

    # Few-shot
    fp = (
        f"Question: {exs[0][0]}\nAnswer: {exs[0][1]}\n\n"
        f"Question: {exs[1][0]}\nAnswer: {exs[1][1]}\n\n"
        f"Question: {q}\nAnswer:"
    )
    f = get_llm_response(prompt_text=fp, api_key=api_key)
    s_f = score_response(f, kwords)
    score_few.append(s_f)


print("\n--- Scores by Prompting Strategy ---")
for i in range(len(qs)):
    print(f"\nQuestion: {qs[i]}")
    print(f"Zero-shot score: {score_zero[i]:.2f}")
    print(f"One-shot score:  {score_one[i]:.2f}")
    print(f"Few-shot score:  {score_few[i]:.2f}")

avg_z = sum(score_zero) / len(score_zero)
avg_o = sum(score_one) / len(score_one)
avg_f = sum(score_few) / len(score_few)

print("\n--- Average Scores ---")
print(f"Zero-shot: {avg_z:.2f}")
print(f"One-shot:  {avg_o:.2f}")
print(f"Few-shot:  {avg_f:.2f}")

# best
best = max([("zero-shot", avg_z), ("one-shot", avg_o), ("few-shot", avg_f)], key=lambda x: x[1])[0]
print(f"\nBest strategy overall: {best}")


--- Scores by Prompting Strategy ---

Question: What foods should be avoided by patients with gout?
Zero-shot score: 0.33
One-shot score:  0.33
Few-shot score:  0.33

Question: What medications are commonly prescribed for gout?
Zero-shot score: 0.50
One-shot score:  0.50
Few-shot score:  0.67

Question: How can gout flares be prevented?
Zero-shot score: 0.00
One-shot score:  0.50
Few-shot score:  0.50

Question: Is gout related to diet?
Zero-shot score: 0.33
One-shot score:  0.67
Few-shot score:  0.83

Question: Can gout be cured permanently?
Zero-shot score: 0.80
One-shot score:  0.60
Few-shot score:  0.60

--- Average Scores ---
Zero-shot: 0.39
One-shot:  0.52
Few-shot:  0.59

Best strategy overall: few-shot
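
The sample output in Section 4 shows the model's `generated_text` beginning with the prompt itself. Keyword matching over the full text therefore also counts words that appear in the question or the examples, which can inflate scores for longer prompts. The sketch below strips an echoed prompt before scoring; `strip_prompt_echo` and `keyword_score` are hypothetical names, and the prompt and completion are made up for illustration.

```python
def strip_prompt_echo(prompt, generated_text):
    """Drop the prompt from the start of the generated text if it was echoed back."""
    if generated_text.startswith(prompt):
        return generated_text[len(prompt):].strip()
    return generated_text.strip()


def keyword_score(text, keywords):
    """Fraction of keywords found in text (case-insensitive substring match)."""
    if not keywords:
        return 0.0
    text = text.lower()
    return sum(kw.lower() in text for kw in keywords) / len(keywords)


# Made-up prompt and completion, for illustration only
prompt = "Question: Is gout related to diet?\nAnswer:"
generated = prompt + " Yes, purine-rich food can raise uric acid."
keywords = ["yes", "purine", "diet"]

# "diet" appears only in the question, so it counts only when the echo is kept
print(keyword_score(generated, keywords))
print(keyword_score(strip_prompt_echo(prompt, generated), keywords))
```

Scoring only the completion makes the three strategies comparable, because the few-shot prompt contains far more text than the zero-shot prompt.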


## 6. Saving Results

Save your results in a structured format for auto-grading.

In [8]:
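import os

# Create the output directory if it does not exist yet
os.makedirs("results/part_3", exist_ok=True)

with open("results/part_3/prompting_results.txt", "w") as f: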
    f.write("question,zero_shot,one_shot,few_shot\n")
    for i in range(len(qs)):
        f.write(f"{qs[i].lower().replace(' ', '_')},{score_zero[i]:.2f},{score_one[i]:.2f},{score_few[i]:.2f}\n")
    
    f.write(f"\naverage,{avg_z:.2f},{avg_o:.2f},{avg_f:.2f}\n")
    f.write(f"best_method,{best}\n")
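
Before submitting, it may be worth reading the file back to confirm its layout. The sketch below assumes the layout written by the cell above (a header row, one row per question, then `average` and `best_method` rows). `parse_results` is a hypothetical helper, the sample text is an excerpt for illustration, and the auto-grader's actual parsing is not described in this part.

```python
def parse_results(text):
    """Parse the results text into (per_question, averages, best_method)."""
    per_question, averages, best_method = {}, None, None
    # Ignore blank lines, such as the one written before the average row
    lines = [ln for ln in text.splitlines() if ln.strip()]
    for line in lines[1:]:  # skip the header row
        fields = line.split(",")
        if fields[0] == "best_method":
            best_method = fields[1]
        elif fields[0] == "average":
            averages = [float(x) for x in fields[1:4]]
        else:
            per_question[fields[0]] = [float(x) for x in fields[1:4]]
    return per_question, averages, best_method


# Illustrative excerpt: one question row plus the summary rows
sample = (
    "question,zero_shot,one_shot,few_shot\n"
    "is_gout_related_to_diet?,0.33,0.67,0.83\n"
    "\n"
    "average,0.39,0.52,0.59\n"
    "best_method,few-shot\n"
)
per_question, averages, best_method = parse_results(sample)
print(per_question, averages, best_method)
```

This simple comma split works only because none of the questions contain commas; a question with a comma would need quoting, for example through Python's `csv` module.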

## Progress Checkpoints

1. **Prompting Templates**:
   - [ ] Create zero-shot template
   - [ ] Create one-shot template
   - [ ] Create few-shot template
   - [ ] Format templates with questions and examples

2. **LLM API Integration**:
   - [ ] Connect to the Hugging Face API
   - [ ] Test with different prompts
   - [ ] Handle API errors

3. **Comparison and Evaluation**:
   - [ ] Compare strategies on multiple questions
   - [ ] Score responses based on keywords
   - [ ] Determine the best strategy

4. **Results and Documentation**:
   - [ ] Save results in the required format
   - [ ] Document your findings

## What to Submit

1. Your implementation in a Python script `utils/prompt_comparison.py` that:
   - Defines the prompting templates
   - Connects to the Hugging Face API
   - Compares different prompting strategies
   - Scores and evaluates the responses

2. The results of your experiments in `results/part_3/prompting_results.txt` with the format shown above

The auto-grader will check:
1. That your results file contains the required sections
2. That your scoring logic correctly identifies keyword presence
3. That you've correctly calculated average scores
4. That you've identified the best performing method