# Part 3: Prompt Engineering Basics

## Introduction

In this part, you'll experiment with different prompting techniques to improve the quality of responses from Large Language Models (LLMs). You'll compare zero-shot, one-shot, and few-shot prompting approaches and document which works best for different types of questions.

## Learning Objectives

- Understand different prompting techniques
- Compare zero-shot, one-shot, and few-shot prompting
- Analyze the impact of prompt design on response quality

## Setup

In [2]:
# Import necessary libraries
import requests
import json

## 1. Understanding Prompting Techniques

LLMs can be prompted in different ways to get better responses:

1. **Zero-shot prompting**: Asking the model a question directly without examples
2. **One-shot prompting**: Providing one example before asking your question
3. **Few-shot prompting**: Providing multiple examples before asking your question
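
The three techniques differ only in how many worked examples precede the question, so a single helper can produce all of them. The following is a minimal sketch, assuming a hypothetical `build_prompt` helper (not part of the assignment code) and the same `Question:`/`Answer:` layout used by the templates in this part:

```python
def build_prompt(question, examples=()):
    """Build a Question/Answer prompt preceded by zero or more worked examples.

    `build_prompt` is a hypothetical helper used here for illustration only.
    `examples` is a sequence of (question, answer) pairs.
    """
    blocks = [f"Question: {q}\nAnswer: {a}" for q, a in examples]
    # The final block leaves the answer open for the model to complete.
    blocks.append(f"Question: {question}\nAnswer:")
    return "\n\n".join(blocks)


demo_examples = [
    ("What are the symptoms of gout?",
     "Gout symptoms include sudden severe pain, swelling, redness, and tenderness in joints, often the big toe."),
    ("How is gout diagnosed?",
     "Gout is diagnosed through physical examination, medical history, blood tests for uric acid levels, and joint fluid analysis to look for urate crystals."),
]
demo_question = "What foods should be avoided by patients with gout?"

zero_shot = build_prompt(demo_question)                    # no examples
one_shot = build_prompt(demo_question, demo_examples[:1])  # one example
few_shot = build_prompt(demo_question, demo_examples)      # several examples
```

Building the prompt from a list keeps the number of examples flexible, whereas a fixed template string has to be edited whenever an example is added.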

## 2. Creating Prompting Templates

Your first task is to create templates for different prompting strategies.

In [3]:
# Import necessary libraries
import requests
import json # Although not used in this specific snippet, it's good practice for API interactions

# --- Part 3: Prompt Engineering Basics ---

# Define a question to experiment with
question = "What foods should be avoided by patients with gout?"

# Example for one-shot and few-shot prompting
example_q = "What are the symptoms of gout?"
example_a = "Gout symptoms include sudden severe pain, swelling, redness, and tenderness in joints, often the big toe."

# Examples for few-shot prompting
examples = [
    ("What are the symptoms of gout?",
     "Gout symptoms include sudden severe pain, swelling, redness, and tenderness in joints, often the big toe."),
    ("How is gout diagnosed?",
     "Gout is diagnosed through physical examination, medical history, blood tests for uric acid levels, and joint fluid analysis to look for urate crystals.")
]

# 2. Creating Prompting Templates

# Zero-shot template: Asking the model a question directly without examples
zero_shot_template = "Question: {question}\nAnswer:"

# One-shot template: Providing one example before asking your question
one_shot_template = """Question: {example_q}
Answer: {example_a}

Question: {question}
Answer:"""

# Few-shot template: Providing multiple examples before asking your question
# This template needs placeholders for each part of each example.
# We'll use q1, a1, q2, a2 to make formatting easier.
few_shot_template = """Question: {q1}
Answer: {a1}

Question: {q2}
Answer: {a2}

Question: {question}
Answer:"""

# Format the templates with your question and examples

# Format for zero-shot prompt
zero_shot_prompt = zero_shot_template.format(question=question)

# Format for one-shot prompt
one_shot_prompt = one_shot_template.format(
    example_q=example_q,
    example_a=example_a,
    question=question
)

# Format for few-shot prompt
few_shot_prompt = few_shot_template.format(
    q1=examples[0][0],  # First question from the first example
    a1=examples[0][1],  # First answer from the first example
    q2=examples[1][0],  # Second question from the second example
    a2=examples[1][1],  # Second answer from the second example
    question=question   # The main question for the prompt
)

# Print the formatted prompts for verification
print("Zero-shot_accuracy prompt:")
print(zero_shot_prompt)

print("\nOne-shot_accuracy prompt:")
print(one_shot_prompt)

print("\nFew-shot_accuracy prompt:")
print(few_shot_prompt)


Zero-shot_accuracy prompt:
Question: What foods should be avoided by patients with gout?
Answer:

One-shot_accuracy prompt:
Question: What are the symptoms of gout?
Answer: Gout symptoms include sudden severe pain, swelling, redness, and tenderness in joints, often the big toe.

Question: What foods should be avoided by patients with gout?
Answer:

Few-shot_accuracy prompt:
Question: What are the symptoms of gout?
Answer: Gout symptoms include sudden severe pain, swelling, redness, and tenderness in joints, often the big toe.

Question: How is gout diagnosed?
Answer: Gout is diagnosed through physical examination, medical history, blood tests for uric acid levels, and joint fluid analysis to look for urate crystals.

Question: What foods should be avoided by patients with gout?
Answer:


## 3. Connecting to the LLM API

Next, implement a function to send prompts to an LLM API and get responses.
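
The cell that follows loads `API_URL` and `HUGGINGFACE_API_KEY` from a `.env` file before sending any request. The parsing step can be sketched as a small pure function that is easy to check without touching the file system. This is a sketch under assumptions: `parse_env_lines` is a hypothetical helper, and the key and URL in the sample are placeholders, not real credentials or endpoints.

```python
def parse_env_lines(lines):
    """Parse KEY=VALUE lines into a dict, skipping blank lines and # comments.

    `parse_env_lines` is a hypothetical helper, not a library function.
    """
    settings = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        # Drop surrounding whitespace and optional quotes around the value.
        settings[key.strip()] = value.strip().strip('"').strip("'")
    return settings


# Placeholder values for illustration only.
sample_lines = [
    "# credentials",
    'HUGGINGFACE_API_KEY="hf_example"',
    "API_URL=https://example.invalid/models/{model_name}",
    "",
]
settings = parse_env_lines(sample_lines)

# The URL is treated as a template and filled in per model, as get_llm_response does.
url = settings["API_URL"].format(model_name="some-org/some-model")
```

Note that `str.format` returns the string unchanged when it contains no `{model_name}` placeholder, so a URL with a hard-coded model would silently ignore the `model_name` argument.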

In [4]:
# Import necessary libraries
import requests
import json
import os
import sys

# Read the .env file (if present) and set variables manually
if os.path.exists('.env'):
    with open('.env', 'r') as f:
        for line in f:
            line = line.strip()
            # Skip blank lines, comments, and lines without KEY=VALUE form
            if line and not line.startswith('#') and '=' in line:
                key, value = line.split('=', 1)
                key = key.strip()
                value = value.strip().strip('"').strip("'")
                os.environ[key] = value

API_URL = os.getenv("API_URL")  # Formatted with model_name inside get_llm_response
API_KEY = os.getenv("HUGGINGFACE_API_KEY")

# Default timeout for API requests in seconds.
DEFAULT_API_TIMEOUT = 60

# Both settings are required; report whichever is missing and exit.
missing = [name for name, value in (("API_URL", API_URL), ("HUGGINGFACE_API_KEY", API_KEY)) if not value]
if missing:
    print(f"Error: missing environment variable(s): {', '.join(missing)}.")
    print("Please set them in your .env file or system environment before running the script.")
    sys.exit(1)  # Use sys.exit to properly exit the script


def get_llm_response(prompt: str, model_name: str = "HuggingFaceH4/zephyr-7b-beta", api_key: str = API_KEY) -> str:
    """
    Get a response from the LLM based on the prompt.

    Args:
        prompt (str): The text prompt to send to the LLM.
        model_name (str): The name of the Hugging Face model to use.
        api_key (str): Your Hugging Face API key.

    Returns:
        str: The generated text from the LLM, or an informative error message if the request fails.
    """
    api_url = API_URL.format(model_name=model_name)
    headers = {"Authorization": f"Bearer {api_key}"}

    payload = {
        "inputs": prompt,
        "parameters": {
            "max_new_tokens": 150,  # Limit the length of the generated response
            "temperature": 0.7,     # Controls creativity (higher = more creative)
            "do_sample": True,      # Enables sampling from the model's output distribution
            "return_full_text": False # Instructs the API to only return the generated part
        }
    }

    try:
        response = requests.post(api_url, headers=headers, json=payload, timeout=DEFAULT_API_TIMEOUT)
        response.raise_for_status()  # Raise an HTTPError for bad responses (4xx or 5xx status codes)

        result = response.json()

        # Guard against an empty list before indexing into it
        if isinstance(result, list) and result and "generated_text" in result[0]:
            generated_text = result[0]["generated_text"].strip()
            # Some chat-tuned models prefix the output with a speaker label
            # such as "User:" or "AI:"; strip that label if present.
            if generated_text.startswith(("User:", "AI:")):
                generated_text = generated_text.split(":", 1)[-1].strip()

            return generated_text
        elif isinstance(result, dict) and "error" in result:
            return f"API Error: {result.get('error', 'Unknown API Error')}"
        else:
            return f"Unexpected API response format: {str(result)}"
    except requests.exceptions.Timeout:
        return f"Request failed: Read timed out after {DEFAULT_API_TIMEOUT} seconds."
    except requests.exceptions.HTTPError as http_err:
        return f"HTTP error occurred: {http_err} - Response: {http_err.response.text}"
    except requests.exceptions.RequestException as req_err:
        return f"Request failed: {req_err}"
    except Exception as e:
        return f"An unexpected error occurred: {e}"

# --- Your previously defined prompting templates and examples ---
# Define a question to experiment with
question = "What foods should be avoided by patients with gout?"

# Example for one-shot and few-shot prompting
example_q = "What are the symptoms of gout?"
example_a = "Gout symptoms include sudden severe pain, swelling, redness, and tenderness in joints, often the big toe."

# Examples for few-shot prompting
examples = [
    ("What are the symptoms of gout?",
     "Gout symptoms include sudden severe pain, swelling, redness, and tenderness in joints, often the big toe."),
    ("How is gout diagnosed?",
     "Gout is diagnosed through physical examination, medical history, blood tests for uric acid levels, and joint fluid analysis to look for urate crystals.")
]

# Zero-shot template
zero_shot_template = "Question: {question}\nAnswer:"

# One-shot template
one_shot_template = """Question: {example_q}
Answer: {example_a}

Question: {question}
Answer:"""

# Few-shot template
few_shot_template = """Question: {q1}
Answer: {a1}

Question: {q2}
Answer: {a2}

Question: {question}
Answer:"""

# Format the templates with your question and examples
zero_shot_prompt = zero_shot_template.format(question=question)
one_shot_prompt = one_shot_template.format(example_q=example_q, example_a=example_a, question=question)
few_shot_prompt = few_shot_template.format(
    q1=examples[0][0], a1=examples[0][1],
    q2=examples[1][0], a2=examples[1][1],
    question=question
)

# --- Test your get_llm_response function with different prompts ---

print("\n--- Testing Zero-shot Prompt ---")
print("Prompt:\n", zero_shot_prompt)
response_zero_shot = get_llm_response(zero_shot_prompt)
print("LLM Response:")
print(response_zero_shot)

print("\n--- Testing One-shot Prompt ---")
print("Prompt:\n", one_shot_prompt)
response_one_shot = get_llm_response(one_shot_prompt)
print("LLM Response:")
print(response_one_shot)

print("\n--- Testing Few-shot Prompt ---")
print("Prompt:\n", few_shot_prompt)
response_few_shot = get_llm_response(few_shot_prompt)
print("LLM Response:")
print(response_few_shot)


--- Testing Zero-shot Prompt ---
Prompt:
 Question: What foods should be avoided by patients with gout?
Answer:
LLM Response:
HTTP error occurred: 402 Client Error: Payment Required for url: https://router.huggingface.co/hf-inference/models/openai/whisper-large-v3 - Response: {"error":"You have exceeded your monthly included credits for Inference Providers. Subscribe to PRO to get 20x more monthly included credits."}

--- Testing One-shot Prompt ---
Prompt:
 Question: What are the symptoms of gout?
Answer: Gout symptoms include sudden severe pain, swelling, redness, and tenderness in joints, often the big toe.

Question: What foods should be avoided by patients with gout?
Answer:
LLM Response:
HTTP error occurred: 402 Client Error: Payment Required for url: https://router.huggingface.co/hf-inference/models/openai/whisper-large-v3 - Response: {"error":"You have exceeded your monthly included credits for Inference Providers. Subscribe to PRO to get 20x more monthly included credits."}
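
Both calls above returned an error message instead of an answer, because `get_llm_response` reports failures as ordinary strings. Code that consumes those strings may want to tell the two cases apart before treating the text as model output. A minimal sketch, assuming a hypothetical `is_error_response` helper; the prefixes are copied from the return statements in `get_llm_response`:

```python
# Prefixes of the error strings returned by get_llm_response.
ERROR_PREFIXES = (
    "API Error:",
    "Unexpected API response format:",
    "Request failed:",
    "HTTP error occurred:",
    "An unexpected error occurred:",
)


def is_error_response(text):
    """Return True if text starts with one of the known error prefixes.

    `is_error_response` is a hypothetical helper for illustration.
    """
    return text.startswith(ERROR_PREFIXES)


failed = "HTTP error occurred: 402 Client Error: Payment Required"
answered = "Some generated answer text."
checks = [is_error_response(failed), is_error_response(answered)]
```

Prefix matching is only a heuristic: a genuine answer that happens to begin with one of these prefixes would be misclassified. Raising an exception from the request function, or returning a separate status flag, would be a more robust design.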



## 4. Comparing Prompting Strategies

Now, let's compare the different prompting strategies on a set of healthcare questions.
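
Because every strategy follows the same steps (build a prompt, call the model, record the result), the comparison can be written as a nested loop over questions and strategies. The sketch below assumes a hypothetical `compare_strategies` function and a stub `echo_llm` in place of a real model, so the loop can be exercised without spending API credits; neither name comes from the assignment.

```python
def compare_strategies(questions, prompt_builders, llm):
    """Run every question through every strategy and collect the responses.

    prompt_builders maps a strategy name to a function question -> prompt.
    llm is any callable prompt -> response text.
    """
    results = []
    for question in questions:
        for strategy, build in prompt_builders.items():
            prompt = build(question)
            results.append({
                "question": question,
                "strategy": strategy,
                "prompt": prompt,
                "response": llm(prompt),
            })
    return results


def echo_llm(prompt):
    """Hypothetical stand-in for a real model: reports the prompt length only."""
    return f"[stub response to a {len(prompt)}-character prompt]"


builders = {
    "Zero-shot": lambda q: f"Question: {q}\nAnswer:",
    "One-shot": lambda q: (
        "Question: What are the symptoms of gout?\n"
        "Answer: Gout symptoms include sudden severe pain, swelling, redness, "
        "and tenderness in joints, often the big toe.\n\n"
        f"Question: {q}\nAnswer:"
    ),
}
demo_results = compare_strategies(
    ["Is gout related to diet?", "Can gout be cured permanently?"],
    builders,
    echo_llm,
)
```

Passing the model as a parameter means the same loop works with the real `get_llm_response` once the API is reachable.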

In [5]:
# List of healthcare questions to test
questions = [
    "What foods should be avoided by patients with gout?",
    "What medications are commonly prescribed for gout?",
    "How can gout flares be prevented?",
    "Is gout related to diet?",
    "Can gout be cured permanently?"
]

# This cell assumes that get_llm_response, API_KEY, API_URL, DEFAULT_API_TIMEOUT,
# example_q, example_a, examples, zero_shot_template, one_shot_template, and
# few_shot_template were defined by the earlier cells.

# Store the results in a list of dictionaries
comparison_results = []

print("Starting comparison of prompting strategies...")
print("------------------------------------------------------------------")

for q_idx, current_question in enumerate(questions):
    print(f"\n--- Processing Question {q_idx + 1}: {current_question} ---")

    # 1. Create prompts using each strategy
    
    # Zero-shot prompt
    zero_shot_prompt = zero_shot_template.format(question=current_question)
    
    # One-shot prompt
    one_shot_prompt = one_shot_template.format(
        example_q=example_q, 
        example_a=example_a, 
        question=current_question
    )
    
    # Few-shot prompt
    few_shot_prompt = few_shot_template.format(
        q1=examples[0][0], a1=examples[0][1],
        q2=examples[1][0], a2=examples[1][1],
        question=current_question
    )

    # 2. Get responses from the LLM and store results
    
    # Zero-shot
    print("  > Getting Zero-shot response...")
    response_zero_shot = get_llm_response(zero_shot_prompt)
    comparison_results.append({
        "question": current_question,
        "strategy": "Zero-shot",
        "prompt": zero_shot_prompt,
        "response": response_zero_shot
    })
    print(f"    Response: {response_zero_shot[:100]}..." if len(response_zero_shot) > 100 else f"    Response: {response_zero_shot}")


    # One-shot
    print("  > Getting One-shot response...")
    response_one_shot = get_llm_response(one_shot_prompt)
    comparison_results.append({
        "question": current_question,
        "strategy": "One-shot",
        "prompt": one_shot_prompt,
        "response": response_one_shot
    })
    print(f"    Response: {response_one_shot[:100]}..." if len(response_one_shot) > 100 else f"    Response: {response_one_shot}")


    # Few-shot
    print("  > Getting Few-shot response...")
    response_few_shot = get_llm_response(few_shot_prompt)
    comparison_results.append({
        "question": current_question,
        "strategy": "Few-shot",
        "prompt": few_shot_prompt,
        "response": response_few_shot
    })
    print(f"    Response: {response_few_shot[:100]}..." if len(response_few_shot) > 100 else f"    Response: {response_few_shot}")


print("\n------------------------------------------------------------------")
print("Comparison complete. Summary of results:")

# Print a summary of all results
for result in comparison_results:
    print(f"\nQuestion: {result['question']}")
    print(f"  Strategy: {result['strategy']}")
    # print(f"  Prompt:\n{result['prompt']}") # Uncomment to see full prompt
    print(f"  Response: {result['response']}")
    print("-" * 20)

Starting comparison of prompting strategies...
------------------------------------------------------------------

--- Processing Question 1: What foods should be avoided by patients with gout? ---
  > Getting Zero-shot response...
    Response: HTTP error occurred: 402 Client Error: Payment Required for url: https://router.huggingface.co/hf-in...
  > Getting One-shot response...
    Response: HTTP error occurred: 402 Client Error: Payment Required for url: https://router.huggingface.co/hf-in...
  > Getting Few-shot response...
    Response: HTTP error occurred: 402 Client Error: Payment Required for url: https://router.huggingface.co/hf-in...

--- Processing Question 2: What medications are commonly prescribed for gout? ---
  > Getting Zero-shot response...
    Response: HTTP error occurred: 402 Client Error: Payment Required for url: https://router.huggingface.co/hf-in...
  > Getting One-shot response...
    Response: HTTP error occurred: 402 Client Error: Payment Required for url: h

## 5. Evaluating Responses

Create a simple evaluation function to score the responses based on the presence of expected keywords.
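
The score is the fraction of expected keywords that appear in the response, matched as case-insensitive substrings. A small worked example may make the arithmetic concrete. In this sketch `keyword_score` is a hypothetical stand-alone function that follows the same rule, and the sample response is hand-written for illustration, not model output.

```python
def keyword_score(response, keywords):
    """Fraction of keywords found in the response (case-insensitive substring match)."""
    if not keywords:
        return 0.0  # Avoid division by zero for an empty keyword list
    text = response.lower()
    hits = sum(1 for kw in keywords if kw.lower() in text)
    return hits / len(keywords)


keywords = ["purine", "red meat", "seafood", "alcohol", "beer", "organ meats"]

# Hand-written sample text, not a real model response.
sample_response = "Limit red meat, seafood, and beer, which are high in purines."

# Matches: "purine" (inside "purines"), "red meat", "seafood", "beer" -> 4 of 6.
score = keyword_score(sample_response, keywords)

# Substring matching also produces false positives: "yes" is found inside "eyes".
false_positive = keyword_score("My eyes hurt.", ["yes"])
```

Here `score` is 4/6, about 0.67. The false-positive case shows why short keywords such as "yes" can inflate scores; matching on word boundaries would be stricter, at the cost of missing plurals such as "purines".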

In [18]:
def score_response(response, keywords):
    """Score a response based on the presence of expected keywords"""
    response = response.lower()
    found_keywords_count = 0

    if not keywords:  # Avoid division by zero if keywords list is empty
        return 0.0

    for keyword in keywords:
        if keyword.lower() in response:
            found_keywords_count += 1

    return found_keywords_count / len(keywords)

# Expected keywords for each question
expected_keywords = {
    "What foods should be avoided by patients with gout?":
        ["purine", "red meat", "seafood", "alcohol", "beer", "organ meats"],
    "What medications are commonly prescribed for gout?":
        ["nsaids", "colchicine", "allopurinol", "febuxostat", "probenecid", "corticosteroids"],
    "How can gout flares be prevented?":
        ["medication", "diet", "weight", "alcohol", "water", "exercise"],
    "Is gout related to diet?":
        ["yes", "purine", "food", "alcohol", "seafood", "meat"],
    "Can gout be cured permanently?":
        ["manage", "treatment", "lifestyle", "medication", "chronic"]
}

# Score the responses and calculate average scores for each strategy
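# Keys match the capitalized strategy names stored in comparison_results
# ("Zero-shot", "One-shot", "Few-shot") with the "_accuracy" suffix appended.
strategy_scores = {
    "Zero-shot_accuracy": [],
    "One-shot_accuracy": [],
    "Few-shot_accuracy": []
}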

for result in comparison_results:
    question = result["question"]
    response = result["response"]
    strategy = result["strategy"]
    # Accept both with and without _accuracy for backward compatibility
    if not strategy.endswith("_accuracy"):
        strategy = strategy + "_accuracy"
    keywords_for_question = expected_keywords.get(question, [])
    score = score_response(response, keywords_for_question)
    if strategy in strategy_scores:
        strategy_scores[strategy].append(score)
    else:
        # If a new strategy appears, initialize its list
        strategy_scores[strategy] = [score]
    print(f"Scored '{question}' ({strategy}): {score:.2f} (Response: '{response[:50]}...')")

print("\n--- Scoring Results ---")
average_scores = {}
for strategy, scores in strategy_scores.items():
    if scores:
        average_score = sum(scores) / len(scores)
        average_scores[strategy] = average_score
        print(f"Average score for {strategy}: {average_score:.2f}")
    else:
        print(f"No scores recorded for {strategy}.")

print("\n--- Best Performing Strategy ---")
if average_scores:
    best_strategy = max(average_scores, key=average_scores.get)
    best_score = average_scores[best_strategy]
    print(f"The best performing strategy overall is: {best_strategy} with an average score of {best_score:.2f}")
else:
    print("No strategies were scored.")


Scored 'What foods should be avoided by patients with gout?' (Zero-shot_accuracy): 0.00 (Response: 'HTTP error occurred: 402 Client Error: Payment Req...')
Scored 'What foods should be avoided by patients with gout?' (One-shot_accuracy): 0.00 (Response: 'HTTP error occurred: 402 Client Error: Payment Req...')
Scored 'What foods should be avoided by patients with gout?' (Few-shot_accuracy): 0.00 (Response: 'HTTP error occurred: 402 Client Error: Payment Req...')
Scored 'What medications are commonly prescribed for gout?' (Zero-shot_accuracy): 0.00 (Response: 'HTTP error occurred: 402 Client Error: Payment Req...')
Scored 'What medications are commonly prescribed for gout?' (One-shot_accuracy): 0.00 (Response: 'HTTP error occurred: 402 Client Error: Payment Req...')
Scored 'What medications are commonly prescribed for gout?' (Few-shot_accuracy): 0.00 (Response: 'HTTP error occurred: 402 Client Error: Payment Req...')
Scored 'How can gout flares be prevented?' (Zero-shot_accuracy): 0.00 (

## 6. Saving Results

Save your results in a structured format for auto-grading.
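
The scores section of the file is a small CSV-style table: one row per question, then a row of averages and the best method. The sketch below writes such a table to an in-memory buffer so the layout can be inspected before anything is saved. The function name `write_scores_table`, the question labels, and the score values are all placeholders for illustration; they are not measured results, and the exact layout the auto-grader expects is not confirmed here.

```python
import io


def write_scores_table(out, scores_by_question, strategies):
    """Write one row per question, then an average row and the best method.

    scores_by_question maps a question label to {strategy: score}.
    Returns the per-strategy averages.
    """
    if not scores_by_question:
        raise ValueError("no scores to write")
    out.write("question," + ",".join(strategies) + "\n")
    for question, scores in scores_by_question.items():
        row = ",".join(f"{scores.get(s, 0.0):.2f}" for s in strategies)
        out.write(f"{question},{row}\n")
    averages = {
        s: sum(sc.get(s, 0.0) for sc in scores_by_question.values()) / len(scores_by_question)
        for s in strategies
    }
    out.write("average," + ",".join(f"{averages[s]:.2f}" for s in strategies) + "\n")
    out.write(f"best_method,{max(averages, key=averages.get)}\n")
    return averages


strategies = ["zero-shot_accuracy", "one-shot_accuracy", "few-shot_accuracy"]

# Placeholder scores for illustration only.
demo_scores = {
    "question_a": {"zero-shot_accuracy": 0.50, "one-shot_accuracy": 0.50, "few-shot_accuracy": 1.00},
    "question_b": {"zero-shot_accuracy": 0.00, "one-shot_accuracy": 0.50, "few-shot_accuracy": 0.50},
}

buffer = io.StringIO()
averages = write_scores_table(buffer, demo_scores, strategies)
table_text = buffer.getvalue()
```

Using one list of strategy names for the header, the rows, and the averages keeps the column labels and the lookup keys from drifting apart.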

In [19]:
# TODO: Save your results to results/part_3/prompt_comparison.txt
# The file should include:
# - Raw responses for each question and strategy
# - Scores for each question and strategy
# - Average scores for each strategy
# - The best performing strategy

# Define the output directory and file name
output_dir = "results/part_3"
output_file = os.path.join(output_dir, "prompt_comparison.txt")

# Ensure the output directory exists
os.makedirs(output_dir, exist_ok=True)

print(f"\nSaving results to: {output_file}")

with open(output_file, "w", encoding="utf-8") as f:
    f.write("# Prompt Engineering Results\n\n")

    # Write raw responses for each question and strategy
    for current_question in questions:
        f.write(f"## Question: {current_question}\n\n")
        
        # Filter results for the current question
        question_results = [r for r in comparison_results if r["question"] == current_question]

        # Ensure consistent order (zero-shot, one-shot, few-shot).
        # Strategy names are stored capitalized ("Zero-shot"), so compare case-insensitively.
        for strategy in ["zero-shot", "one-shot", "few-shot"]:
            response_entry = next((r for r in question_results if r["strategy"].lower() == strategy), None)
            f.write(f"{strategy}_accuracy:\n")
            if response_entry:
                f.write(f"{response_entry['response']}\n\n")
            else:
                f.write("[Response not found]\n\n")  # Fallback if no response for this strategy

    f.write("--------------------------------------------------\n\n")
    f.write("Scores\n\n")
    f.write("```\n")
    
    # Write CSV header for scores
    f.write("question,zero-shot_accuracy,one-shot_accuracy,few-shot_accuracy\n")

    # Write individual question scores
    # Prepare a dictionary for easy score lookup per question and strategy
    question_strategy_scores = {}
    for result in comparison_results:
        q_norm = result["question"].lower().replace(" ", "_").replace("?", "") # Normalize question for CSV key
        if q_norm not in question_strategy_scores:
            question_strategy_scores[q_norm] = {}
        
        # Calculate the score for this specific response and store it under the
        # "<strategy>_accuracy" key that the row lookups below expect
        score = score_response(result["response"], expected_keywords.get(result["question"], []))
        question_strategy_scores[q_norm][f"{result['strategy']}_accuracy"] = score

    for q_norm, scores_map in question_strategy_scores.items():
        zero_shot_score = scores_map.get("Zero-shot_accuracy", 0.0)
        one_shot_score = scores_map.get("One-shot_accuracy", 0.0)
        few_shot_score = scores_map.get("Few-shot_accuracy", 0.0)
        f.write(f"{q_norm},{zero_shot_score:.2f},{one_shot_score:.2f},{few_shot_score:.2f}\n")

    # Write average scores
    f.write(f"\naverage,{average_scores.get('Zero-shot_accuracy', 0.0):.2f},{average_scores.get('One-shot_accuracy', 0.0):.2f},{average_scores.get('Few-shot_accuracy', 0.0):.2f}\n")
    
    # Write best performing strategy
    f.write(f"best_method,{best_strategy}\n")

    f.write("```\n")

print("Results successfully saved.")



Saving results to: results/part_3\prompt_comparison.txt
Results successfully saved.


## Progress Checkpoints

1. **Prompting Templates**:
   - [ ] Create zero-shot template
   - [ ] Create one-shot template
   - [ ] Create few-shot template
   - [ ] Format templates with questions and examples

2. **LLM API Integration**:
   - [ ] Connect to the Hugging Face API
   - [ ] Test with different prompts
   - [ ] Handle API errors

3. **Comparison and Evaluation**:
   - [ ] Compare strategies on multiple questions
   - [ ] Score responses based on keywords
   - [ ] Determine the best strategy

4. **Results and Documentation**:
   - [ ] Save results in the required format
   - [ ] Document your findings

## What to Submit

1. Your implementation in a Python script `utils/prompt_comparison.py` that:
   - Defines the prompting templates
   - Connects to the Hugging Face API
   - Compares different prompting strategies
   - Scores and evaluates the responses

2. The results of your experiments in `results/part_3/prompt_comparison.txt` with the format shown above

The auto-grader will check:
1. That your results file contains the required sections
2. That your scoring logic correctly identifies keyword presence
3. That you've correctly calculated average scores
4. That you've identified the best performing method