# Week 3: Prompt Engineering Best Practices

- Topics: Advanced prompt engineering techniques, designing effective prompts, common pitfalls, and improvement strategies.
- Hands-on: Experimenting with prompts to achieve desired responses.

## Introduction

In recent years, **prompt engineering** has emerged as a key process in the effective use of large language models. As Phoenix and Taylor (2024) describe it:

> "Prompt engineering is the process of discovering prompts that reliably yield useful or desired results."

This iterative approach focuses on refining prompts to optimize interactions with generative AI systems, achieving consistent and meaningful outputs (Phoenix and Taylor, 2024).


### **Week 3: Prompt Engineering Learning Objectives**

By the end of this week, you will:

1. Understand the core principles of prompt engineering with a Medical AI use-case:
   - Specificity, Contextuality, Structuring, Iterativity, and Creativity and Adaptability.
2. Learn to iteratively refine prompts for enhanced output quality.
3. Experiment with role-playing and structuring techniques to improve task performance.
4. Explore how temperature and parameters affect LLM outputs.
5. Build hands-on experience in designing and testing effective prompts across multiple LLMs (GPT-4, Claude-3.5, GPT-3.5, Grok).
6. Analyze responses for clarity, relevance, creativity, and adherence to the desired prompt.



## Why Prompt Engineering is Crucial for Advanced LLM Applications

Before diving into specific techniques, it’s important to understand why prompt engineering plays a foundational role in building **advanced applications** using LLMs.

### 1. **Optimizing Application Performance**
   - **Why**: Advanced applications require high-quality and consistent outputs. Poorly designed prompts can lead to unreliable responses.
   - **Example**: In customer service chatbots, effective prompts reduce irrelevant answers, ensuring smooth user experiences.

---

### 2. **Reducing Fine-tuning Costs**
   - **Why**: Fine-tuning models is expensive and resource-intensive. Prompt engineering allows for leveraging pre-trained models without retraining.
   - **Example**: A legal assistant tool can accurately analyze contracts with carefully designed prompts, avoiding domain-specific fine-tuning.

---

### 3. **Extending Model Versatility**
   - **Why**: With prompt engineering, a single model can handle diverse tasks, making applications more flexible and cost-efficient.
   - **Example**: A content generation platform can use tailored prompts to generate marketing copy, blogs, or technical documents without switching models.

---

### 4. **Facilitating Scalability**
   - **Why**: Advanced applications often need consistent outputs at scale. Standardized prompts ensure quality across multiple queries and use cases.
   - **Example**: A translation app can standardize prompts to maintain consistency across multiple languages and users.

---

### 5. **Enabling Complex Workflows**
   - **Why**: Many applications rely on multi-step reasoning or interactions between LLMs and other systems. Prompt chaining supports this complexity.
   - **Example**: A research assistant can extract key points from a paper, then compare them across multiple documents using chained prompts.

---

### 6. **Improving Context Utilization**
   - **Why**: LLMs perform better with clear and relevant context. Prompt engineering ensures the model focuses on important details.
   - **Example**: A financial advisor tool can embed portfolio details into prompts for tailored investment advice.

---

### 7. **Enhancing Creativity and Innovation**
   - **Why**: Applications like storytelling, ideation, or creative design require prompts that balance creativity with coherence.
   - **Example**: A screenplay generator can use prompts to structure engaging narratives, specifying tone, character arcs, and plot points.

---

### 8. **Building Trust and Alignment**
   - **Why**: High-stakes applications require responses that align with user expectations and comply with ethical or legal standards.
   - **Example**: A medical assistant app can use prompts that guide the LLM to suggest seeking professional advice rather than diagnosing directly.

---

### 9. **Mitigating Model Limitations**
   - **Why**: Models occasionally produce hallucinations or misunderstand ambiguous instructions. Proper prompts reduce these risks.
   - **Example**: A knowledge retrieval system can instruct the model to only refer to provided sources, minimizing fabrications.

---

### 10. **Tailoring User Experience**
   - **Why**: LLMs should match user preferences for tone, style, and behavior. Prompts help adapt responses accordingly.
   - **Example**: A conversational AI system can shift between professional and empathetic tones based on the prompt design.

---

### Conclusion

Prompt engineering is not just about improving individual interactions—it’s a strategic approach for designing **robust, scalable, and innovative applications**. By understanding and applying these principles, developers can unlock the full potential of LLMs across diverse use cases.

## **Understanding Zero-Shot, One-Shot, and Few-Shot Learning in Medical AI**

When designing prompts for a Medical AI Assistant, understanding **zero-shot**, **one-shot**, and **few-shot learning** is essential for optimizing the model's performance in clinical scenarios. These approaches determine how much context or guidance we provide to the model to achieve accurate, actionable outputs.

---

### **1. Zero-Shot Learning**

- **Definition**: The model is asked to perform a task without any examples provided in the prompt.
- **Medical Use Case**: Best for straightforward tasks with clear instructions, such as identifying key symptoms or providing definitions.
- **Example Prompt**:
    - *"List three common causes of chest pain in adults."*
- **Advantages**:
    - Quick and efficient for simple, well-defined queries.
    - No additional context required, saving token space.
- **Challenges**:
    - Responses might lack depth or fail to address nuanced scenarios.

---

### **2. One-Shot Learning**

- **Definition**: The model is provided with one example to demonstrate the desired format or level of detail.
- **Medical Use Case**: Useful for tasks requiring a specific tone, style, or format, such as explaining symptoms or generating clinical notes.
- **Example Prompt**:
    - Instruction: *"Categorize the causes of chest pain by severity and provide explanations."*
    - Example: *"Mild: Acid reflux – Burning sensation due to stomach acid irritation. Suggest antacids."*
    - Task: *"Now categorize the following causes of chest pain: muscle strain, pneumonia, heart attack."*
- **Advantages**:
    - Guides the model toward the expected output style or structure.
    - Improves the relevance of responses for specific clinical workflows.
- **Challenges**:
    - Relies on crafting a high-quality example that aligns with the task.

---

### **3. Few-Shot Learning**

- **Definition**: The model is given several examples (usually 2–5) to infer patterns and generate consistent outputs.
- **Medical Use Case**: Best for complex or nuanced tasks, such as differential diagnosis, triaging, or creating patient care plans.
- **Example Prompt**:
    - Instruction: *"Categorize causes of chest pain by severity and suggest next steps."*
    - Examples:
        1. *"Mild: Muscle strain – Caused by physical exertion. Recommend rest and over-the-counter pain relief."*
        2. *"Moderate: Pneumonia – Infection causing breathing issues. Suggest ordering a chest X-ray and antibiotics."*
    - Task: *"Now categorize: heartburn, pulmonary embolism, heart attack."*
- **Advantages**:
    - Helps the model infer patterns from examples, which tends to make categorization more consistent and accurate.
    - Reduces ambiguity for complex tasks.
- **Challenges**:
    - Uses more tokens, limiting the space for longer inputs or outputs.

---

### **When to Use Each Approach**

| **Scenario**                                      | **Best Approach**    | **Reason**                                                                          |
|---------------------------------------------------|----------------------|------------------------------------------------------------------------------------|
| Simple queries (e.g., listing symptoms)           | Zero-Shot Learning   | Clear instructions are sufficient for direct answers.                             |
| Tasks requiring specific tone or formatting       | One-Shot Learning    | A single example guides the model toward the desired style.                       |
| Complex clinical reasoning (e.g., triaging)       | Few-Shot Learning    | Multiple examples provide the context needed for accurate and structured outputs. |

---

### **Practical Activity: Applying Learning Approaches**

Let’s test the same medical task using all three approaches:

**Task**: Categorize causes of chest pain into mild, moderate, and severe, and suggest next steps.

#### **Zero-Shot Learning**:
```plaintext
Prompt: 
"Categorize causes of chest pain in adults into mild, moderate, and severe, and suggest next steps."
```

#### **One-Shot Learning**:
```plaintext
Prompt:
"Categorize causes of chest pain in adults into mild, moderate, and severe, and suggest next steps. 

Example:
Mild: Acid reflux – Burning sensation caused by stomach acid. Suggest antacids and dietary changes.

Now categorize: muscle strain, pneumonia, and heart attack."
```

#### **Few-Shot Learning**:
```plaintext
Prompt:
"Categorize causes of chest pain in adults into mild, moderate, and severe, and suggest next steps. 

Examples:
1. Mild: Acid reflux – Burning sensation caused by stomach acid. Suggest antacids and dietary changes.
2. Moderate: Pneumonia – Infection causing breathing issues. Suggest ordering a chest X-ray and antibiotics.

Now categorize: pulmonary embolism, muscle strain, and heart attack."
```
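
The three prompts above differ only in how many examples they include, so they can be assembled from the same parts. The following is a minimal sketch, not part of the course notebook: `build_prompt` and its argument names are hypothetical helpers introduced here for illustration.

```python
def build_prompt(instruction: str, examples: list[str], task: str) -> str:
    """Assemble a zero-, one-, or few-shot prompt from its parts.

    Hypothetical helper: zero examples gives a zero-shot prompt,
    one example a one-shot prompt, and two or more a few-shot prompt.
    """
    parts = [instruction]
    if len(examples) == 1:
        parts.append("Example:\n" + examples[0])
    elif len(examples) > 1:
        numbered = "\n".join(f"{i}. {ex}" for i, ex in enumerate(examples, start=1))
        parts.append("Examples:\n" + numbered)
    if task:
        parts.append(task)
    return "\n\n".join(parts)


INSTRUCTION = (
    "Categorize causes of chest pain in adults into mild, moderate, "
    "and severe, and suggest next steps."
)
EXAMPLES = [
    "Mild: Acid reflux – Burning sensation caused by stomach acid. "
    "Suggest antacids and dietary changes.",
    "Moderate: Pneumonia – Infection causing breathing issues. "
    "Suggest ordering a chest X-ray and antibiotics.",
]

zero_shot = build_prompt(INSTRUCTION, [], "")
one_shot = build_prompt(
    INSTRUCTION, EXAMPLES[:1],
    "Now categorize: muscle strain, pneumonia, and heart attack.",
)
few_shot = build_prompt(
    INSTRUCTION, EXAMPLES,
    "Now categorize: pulmonary embolism, muscle strain, and heart attack.",
)
print(few_shot)
```

Keeping the instruction fixed and varying only the examples makes it easier to attribute differences in the outputs to the number of examples rather than to wording changes.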

---

### **Evaluating Outputs**

When comparing outputs across these approaches, consider:
1. **Categorization Accuracy**: Are causes assigned correctly to mild, moderate, or severe?
2. **Clarity and Relevance**: Are explanations concise and medically relevant?
3. **Consistency**: Does the model follow the expected format?
4. **Actionability**: Are next steps practical and aligned with clinical workflows?
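
Of the four criteria above, consistency of format is the easiest to check automatically. The sketch below is an illustrative assumption rather than part of the course material: `check_format` and `EXPECTED_CATEGORIES` are hypothetical names, and the check only looks for the three severity labels at the start of a line. Accuracy, clarity, and actionability still need human (ideally clinical) review.

```python
import re

# Severity labels the prompt asks for (assumed output convention).
EXPECTED_CATEGORIES = ("mild", "moderate", "severe")


def check_format(response: str) -> dict[str, bool]:
    """Report which expected severity labels start a line in the response."""
    found = {}
    for category in EXPECTED_CATEGORIES:
        # Allow optional list markers or Markdown emphasis before the label.
        pattern = rf"^\s*(?:[-*\d.]+\s*)?{category}\b"
        match = re.search(pattern, response, flags=re.IGNORECASE | re.MULTILINE)
        found[category] = match is not None
    return found


sample = """Mild: Muscle strain - advise rest.
Moderate: Pneumonia - order a chest X-ray.
"""
report = check_format(sample)
print(report)
```

A response that is missing a category (here, the sample has no "Severe" line) can then be flagged for a prompt revision or a retry.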

---

### **Why This Matters for Medical AI**
- **Zero-Shot**: Efficient for quick, simple queries.
- **One-Shot**: Sets clear expectations for style and detail in clinical settings.
- **Few-Shot**: Excels at complex, nuanced tasks requiring reasoning or pattern recognition.

This section equips learners to choose the right approach for their Medical AI use cases and adapt it to other scenarios.


## Definitions of Prompt Engineering

To gain a comprehensive understanding of prompt engineering, here are several authoritative definitions from recent literature:

- **Phoenix and Taylor (2024):**
  
  "Prompt engineering is the process of discovering prompts that reliably yield useful or desired results."
  
  *(Phoenix and Taylor, 2024)*

- **Liu et al. (2023):**

  "Prompt engineering involves crafting input prompts that guide pre-trained language models to produce desired outputs, effectively leveraging their knowledge without the need for fine-tuning."

  *(Liu et al., 2023)*

- **Brown et al. (2020):**

  "Prompt engineering is the process of designing task-specific prompts to elicit desired behavior from language models, enabling them to perform a wide range of tasks without explicit training."

  *(Brown et al., 2020)*

- **Reynolds and McDonell (2021):**

  "Prompt engineering is the art of designing prompts that effectively communicate the user's intent to a language model, facilitating accurate and relevant responses."

  *(Reynolds and McDonell, 2021)*

- **White et al. (2023):**

  "Prompt engineering is a methodology for structuring inputs to large language models to achieve desired outcomes, often involving iterative refinement and understanding of model behavior."

  *(White et al., 2023)*


In their 2024 book, *Prompt Engineering for Generative AI*, James Phoenix and Mike Taylor outline five key principles to enhance prompt engineering:

1. **Specificity**: Craft prompts with clear and detailed instructions to guide the AI toward the desired output.

2. **Contextuality**: Provide relevant context within the prompt to help the AI understand the background and nuances of the task.

3. **Structuring**: Organize prompts in a logical and coherent manner, using formatting techniques like bullet points or numbered lists to improve clarity.

4. **Iterativity**: Engage in an iterative process of refining prompts based on the AI's responses, continually adjusting to achieve optimal results.

5. **Creativity and Adaptability**: Encourage the AI to explore creative solutions and adapt to various scenarios by designing prompts that allow for flexibility and innovation.

These principles serve as a framework for effectively interacting with AI models, ensuring outputs are reliable and aligned with user expectations.  

## Goals of Prompt Engineering

The primary goals of prompt engineering, informed by the five principles of Phoenix and Taylor (2024), are:

1. **Eliciting Specific Outputs** (*Specificity*):  
   - Design prompts with clear and detailed instructions to guide AI responses toward the desired format or content.

2. **Enhancing Relevance and Accuracy** (*Contextuality*):  
   - Provide the necessary background and context within prompts to minimize misinterpretations and ensure precise outputs.

3. **Maximizing Clarity and Efficiency** (*Structuring*):  
   - Organize prompts logically to reduce ambiguity and streamline the interaction, minimizing the need for excessive iterations.

4. **Facilitating Iterative Refinement** (*Iterativity*):  
   - Use a trial-and-error process to test, evaluate, and improve prompts until the model consistently delivers optimal results.

5. **Encouraging Creativity and Adaptability** (*Creativity and Adaptability*):  
   - Explore innovative solutions and expand the model's capabilities by crafting prompts that encourage flexibility and diverse outputs.

---

### Why These Goals Matter
These goals not only emphasize creating effective and reliable prompts but also ensure that the process of interacting with LLMs is structured, efficient, and innovative. They align directly with building advanced applications by:

- Ensuring clarity and relevance for practical use cases.
- Reducing the time and cost spent revising prompts.
- Encouraging exploration of the model's full potential for creative and high-impact solutions.



## Common Challenges in Prompt Engineering

Even with a well-designed framework, prompt engineering comes with its own set of challenges. Recognizing these obstacles can help practitioners refine their techniques and avoid common pitfalls.

### 1. **Ambiguity**
- **Problem**: Poorly worded or overly general prompts lead to vague, irrelevant, or inconsistent responses.
- **Example**:  
  - Prompt: *"Tell me about technology."*  
  - Likely Response: Broad, unfocused content about various aspects of technology, such as history, trends, or specific fields.  
- **Solution**: Use **Specificity** to clarify the task and guide the model toward a focused response.  
  - Refined Prompt: *"Describe three major trends in AI technology in 2024."*

---

### 2. **Overloading**
- **Problem**: Including multiple unrelated tasks or excessive details in a single prompt can confuse the model, leading to incomplete or inaccurate outputs.
- **Example**:  
  - Prompt: *"Summarize this document and generate a response to the email thread below."*  
  - Likely Response: Partial completion of either the summary or the email response, or mixing both tasks incoherently.
- **Solution**: Break down tasks into manageable, sequential steps using **Prompt Chaining**.  
  - Refined Approach:  
    - Step 1: *"Summarize this document in 100 words."*  
    - Step 2: *"Generate a response to the email thread based on the summary."*

---

### 3. **Context Limitations**
- **Problem**: LLMs have token limits, so overly long prompts or excessive context can cause critical information to be truncated, degrading input processing or output quality.
- **Example**: Providing an entire document as context for a summarization task might cut off essential parts of the input or output.
- **Solution**: Use **Contextuality** to prioritize relevant information and trim unnecessary details.  
  - Refined Approach:  
    - Extract key sections from the document before including them in the prompt.  
    - Prompt: *"Summarize the following key sections of the document: [Insert Key Sections]."*
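
The "extract key sections" step above can be sketched in code. This is a rough illustration under stated assumptions: `select_sections` is a hypothetical helper, relevance is scored by simple keyword counts, and the budget is measured in whitespace-separated words as a crude stand-in for tokens (real token counts depend on the model's tokenizer).

```python
def select_sections(sections: dict[str, str], keywords: list[str], max_words: int) -> str:
    """Keep the most keyword-relevant sections that fit in a word budget."""

    def score(text: str) -> int:
        lowered = text.lower()
        return sum(lowered.count(keyword.lower()) for keyword in keywords)

    # Most relevant sections first; ties keep their original order.
    ranked = sorted(sections.items(), key=lambda item: score(item[1]), reverse=True)

    chosen, used = [], 0
    for title, text in ranked:
        n_words = len(text.split())
        # Skip sections with no keyword hits or that would exceed the budget.
        if score(text) == 0 or used + n_words > max_words:
            continue
        chosen.append(f"{title}:\n{text}")
        used += n_words
    return "\n\n".join(chosen)


# Hypothetical document split into named sections.
document = {
    "Background": "The clinic opened in 1998 and has grown steadily.",
    "Findings": "Chest pain was the most common complaint. Chest pain triage times improved.",
    "Recommendations": "Adopt a chest pain triage checklist for all new patients.",
}

key_sections = select_sections(document, keywords=["chest pain", "triage"], max_words=25)
prompt = f"Summarize the following key sections of the document:\n\n{key_sections}"
print(prompt)
```

Keyword counting is only one possible relevance signal; embedding-based retrieval or a first summarization pass are common alternatives when the document is large.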

---

### 4. **Misaligned Expectations**
- **Problem**: Assuming the model inherently understands a task without explicit instructions can lead to responses that are off-target or lack critical detail.
- **Example**:  
  - Prompt: *"Create a project plan."*  
  - Likely Response: A generic or overly simplistic outline, without tailoring to the specific project.  
- **Solution**: Combine **Specificity** and **Structuring** to clearly define expectations.  
  - Refined Prompt: *"Create a project plan for launching a healthcare AI application. Include the following: timeline, milestones, team roles, and deliverables."*

---

### 5. **Inconsistent Tone or Style**
- **Problem**: Without guidance, the model may produce outputs in an unintended tone or style (e.g., overly formal, casual, or inconsistent).
- **Example**: A customer email draft may sound too robotic or too informal.
- **Solution**: Use **Creativity and Adaptability** to specify tone and style explicitly.  
  - Refined Prompt: *"Write a professional yet empathetic email to a customer about a delayed order."*

---

### 6. **Over-reliance on Model Defaults**
- **Problem**: Relying solely on default model behavior can lead to missed opportunities for customization or innovation.
- **Example**: Accepting a generic summary without exploring variations.
- **Solution**: Experiment with **Iterativity** by testing and refining prompts to align the model's responses with specific needs.  
  - Refined Approach: Adjust temperature, tone, or formatting based on the task’s requirements.

---

### Conclusion
Understanding and addressing these common challenges ensures that your prompts are precise, efficient, and aligned with your goals. By applying the **Five Principles**—Specificity, Contextuality, Structuring, Iterativity, and Creativity and Adaptability—you can overcome these obstacles and harness the full potential of LLMs.



## Best Practices for Effective Prompt Engineering

Effective prompt engineering involves crafting inputs that reliably produce high-quality, task-aligned outputs from LLMs. By following these best practices, you can optimize model performance and achieve consistent results.

---

### 1. **Be Specific**
- **What**: Provide clear and detailed instructions to minimize ambiguity and guide the model.
- **Why**: Specific prompts reduce the likelihood of irrelevant or overly broad responses.
- **Example**:  
  - Weak Prompt: *"Summarize this document."*  
  - Improved Prompt: *"Summarize this document in fewer than 50 words, focusing on key recommendations."*
- **Related Principle**: **Specificity**

---

### 2. **Iterate and Refine**
- **What**: Test multiple variations of a prompt and refine it based on the quality of responses.
- **Why**: Iteration ensures that the prompt evolves to align better with your goals, especially for complex tasks.
- **Example**:  
  - Initial: *"Explain AI."*  
  - Refined: *"Explain AI to a high school student in simple terms, using an example of a virtual assistant."*
- **How**: Use a systematic approach to log changes and compare results.
- **Related Principle**: **Iterativity**
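
One possible way to "log changes and compare results" systematically is to record each prompt version alongside a note and a quality score. The sketch below is an assumption for illustration: `PromptTrial` and `PromptLog` are hypothetical classes, and the scores are placeholder reviewer ratings, not measured results.

```python
from dataclasses import dataclass, field


@dataclass
class PromptTrial:
    version: int
    prompt: str
    change_note: str
    score: float  # reviewer-assigned quality rating (hypothetical scale)


@dataclass
class PromptLog:
    trials: list[PromptTrial] = field(default_factory=list)

    def record(self, prompt: str, change_note: str, score: float) -> PromptTrial:
        """Append a new prompt version with the reason it was changed."""
        trial = PromptTrial(len(self.trials) + 1, prompt, change_note, score)
        self.trials.append(trial)
        return trial

    def best(self) -> PromptTrial:
        """Return the highest-scoring version recorded so far."""
        return max(self.trials, key=lambda trial: trial.score)


log = PromptLog()
log.record("Explain AI.", "baseline", score=2.0)  # placeholder score
log.record(
    "Explain AI to a high school student in simple terms, "
    "using an example of a virtual assistant.",
    "added audience and a concrete example",
    score=4.5,  # placeholder score
)
print(log.best().version, log.best().change_note)
```

Recording the reason for each change makes it easier to see which refinement actually moved the score, rather than relying on memory of earlier attempts.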

---

### 3. **Structure Prompts Clearly**
- **What**: Organize prompts with logical formatting, such as bullet points or numbered lists.
- **Why**: Structured prompts improve clarity, especially for multi-step or detailed tasks.
- **Example**:  
  - *"List the benefits of regular exercise:\n1.\n2.\n3."*
- **Tips**: Use sub-prompts for stepwise outputs in complex tasks.
- **Related Principle**: **Structuring**

---

### 4. **Incorporate Role-Playing**
- **What**: Assign a specific role to the model to tailor its tone, style, and perspective.
- **Why**: Role-playing aligns the response with the expectations of a particular domain or audience.
- **Example**:  
  - *"You are a legal advisor. Draft a response to this contract clause highlighting potential risks."*
- **Use Cases**: Medical AI (e.g., “You are a clinician explaining a diagnosis”), customer support, or education.
- **Related Principle**: **Contextuality**

---

### 5. **Set Explicit Parameters**
- **What**: Define specific constraints like tone, length, or style to shape the model’s response.
- **Why**: Parameters improve consistency and alignment with the intended audience.
- **Example**:  
  - *"Write a professional email apologizing for a delayed shipment in under 150 words."*
- **Additional Tips**: Experiment with tone variations (e.g., formal, casual, empathetic) for tailored outputs.
- **Related Principles**: **Specificity, Creativity and Adaptability**

---

### 6. **Leverage Context Effectively**
- **What**: Provide relevant background information to guide the model’s response.
- **Why**: Context improves the accuracy and relevance of outputs.
- **Example**:  
  - Weak Prompt: *"Explain quantum computing."*  
  - Improved Prompt: *"Explain quantum computing to a 10th-grade student using the analogy of flipping coins."*
- **Additional Tips**: Trim unnecessary context to stay within token limits.
- **Related Principle**: **Contextuality**

---

### 7. **Experiment with Creativity**
- **What**: Encourage the model to explore creative solutions or diverse outputs.
- **Why**: Creativity enhances responses in domains like storytelling, marketing, or brainstorming.
- **Example**:  
  - *"Create a short story about a scientist who discovers a new planet."*
- **Tips**: Adjust the `temperature` parameter for more imaginative responses.
- **Related Principle**: **Creativity and Adaptability**

---

### 8. **Prompt Chaining for Complex Tasks**
- **What**: Break down multi-step tasks into sequential prompts.
- **Why**: Chaining ensures that the model handles each step systematically, reducing confusion.
- **Example**:  
  - Step 1: *"Summarize this patient’s medical history in two sentences."*  
  - Step 2: *"Based on the summary, suggest three potential diagnoses."*
- **Related Principles**: **Structuring, Iterativity**
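
The two-step chain above can be expressed as a small loop in which each step's output becomes the next step's input. This is a minimal sketch, not the notebook's implementation: `run_chain` and the `{input}` placeholder convention are assumptions made here, and `echo_llm` is a stand-in for a real model call so the example runs offline.

```python
from typing import Callable


def run_chain(llm: Callable[[str], str], steps: list[str], initial_input: str) -> list[str]:
    """Run prompt templates in order, feeding each output into the next step."""
    outputs = []
    current = initial_input
    for template in steps:
        # Each template has one {input} slot filled with the previous output.
        current = llm(template.format(input=current))
        outputs.append(current)
    return outputs


def echo_llm(prompt: str) -> str:
    """Stand-in for a real model call; returns a tagged prefix of the prompt."""
    return f"<response to: {prompt[:40]}>"


steps = [
    "Summarize this patient's medical history in two sentences.\n\nHistory: {input}",
    "Based on the summary, suggest three potential diagnoses.\n\nSummary: {input}",
]

results = run_chain(
    echo_llm, steps, "58-year-old with hypertension and intermittent chest pain."
)
for step_number, output in enumerate(results, start=1):
    print(step_number, output)
```

To use a real model, `echo_llm` could be swapped for any function that takes a prompt string and returns text, for example a thin wrapper around one of the chat models initialized in the notebook.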

---

### Conclusion
These best practices, grounded in the **Five Principles**, enable practitioners to craft effective prompts that maximize clarity, relevance, and creativity. By combining structured methodologies with iterative refinement, prompt engineering becomes a powerful tool for leveraging LLMs in diverse applications.



### **Essential Components of a Great Prompt**

A great prompt incorporates the following essential components to ensure it is well-designed and effective:

1. **Task Definition**:
   - Clearly state the task or goal.
   - Example: *"Generate a summary of the following text."*

2. **Context**:
   - Provide relevant background information or details that guide the model’s understanding.
   - Example: *"This text is from a medical report. Summarize the findings in layman's terms."*

3. **Role Assignment**:
   - Assign a specific role or persona to the model to align its tone, style, or domain expertise.
   - Example: *"You are a medical AI assistant. Explain the diagnosis in simple terms."*

4. **Constraints**:
   - Define parameters such as tone, length, style, or formatting requirements.
   - Example: *"Limit the response to 150 words and use a professional tone."*

5. **Desired Output**:
   - Specify the expected output format or structure.
   - Example: *"Provide a bulleted list of three key recommendations."*

6. **Clarity and Focus**:
   - Use simple and unambiguous language to avoid confusing the model.
   - Avoid: *"Tell me about this."*  
   - Use: *"Explain the main causes of chest pain, focusing on cardiovascular issues."*

7. **Iterative Refinement** (if applicable):
   - Test and refine the prompt over multiple interactions to align outputs with your expectations.

---

### **Prompt Template**

Here’s a reusable template for designing effective prompts:

```plaintext
[Task Definition]
- [Clearly state the task you want the model to perform.]

[Context]
- [Provide relevant background information or details that guide the response.]

[Role Assignment]
- [Specify the role or persona the model should assume.]

[Constraints]
- [Define any parameters such as tone, length, style, or formatting.]

[Desired Output]
- [Specify the expected format or structure of the response.]
```

---

### **Example of a Well-Designed Prompt**

**Scenario**: Medical AI assisting clinicians in triaging chest pain.

```plaintext
[Task Definition]
- Categorize common causes of chest pain in adults by severity.

[Context]
- Chest pain can arise from cardiovascular, respiratory, or gastrointestinal causes. The purpose is to assist clinicians in triaging patients effectively.

[Role Assignment]
- You are a medical AI assistant working in an emergency department.

[Constraints]
- Use a professional tone. Limit your response to 200 words.

[Desired Output]
- Organize the causes into three categories: mild, moderate, and severe. Provide a brief explanation and suggest one next step for clinicians for each category.
```

---

### **Output Example**

**Prompt Execution Result:**

```plaintext
Mild Causes:
- Muscle strain: Often due to physical activity; advise rest and monitoring.
- Acid reflux: Caused by stomach acid; recommend antacids and lifestyle changes.

Moderate Causes:
- Pneumonia: An infection causing chest pain and difficulty breathing; suggest ordering a chest X-ray.
- Pulmonary embolism (suspected): Requires evaluation with imaging and blood tests.

Severe Causes:
- Heart attack: Acute and life-threatening; recommend immediate ECG and cardiac enzyme testing.
- Aortic dissection: Sudden severe pain; urgent CT scan and surgical consultation needed.
```

---

### **How This Template Helps**
1. **Consistency**: Ensures every prompt includes the key components.
2. **Reusability**: Can be adapted to various tasks and domains.
3. **Clarity**: Reduces ambiguity, improving the quality of the model’s outputs.
4. **Scalability**: Useful for both individual and team-based workflows.
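
One way to get the consistency and reusability described above is to render the template from structured fields instead of typing it by hand. The sketch below is illustrative only: `PromptSpec` is a hypothetical class, and the field values are copied from the chest pain triage example.

```python
from dataclasses import dataclass


@dataclass
class PromptSpec:
    task: str
    context: str
    role: str
    constraints: str
    desired_output: str

    def render(self) -> str:
        """Render the five components in the template's bracketed layout."""
        sections = [
            ("Task Definition", self.task),
            ("Context", self.context),
            ("Role Assignment", self.role),
            ("Constraints", self.constraints),
            ("Desired Output", self.desired_output),
        ]
        return "\n\n".join(f"[{name}]\n- {text}" for name, text in sections)


triage_prompt = PromptSpec(
    task="Categorize common causes of chest pain in adults by severity.",
    context=(
        "Chest pain can arise from cardiovascular, respiratory, or "
        "gastrointestinal causes. The purpose is to assist clinicians in "
        "triaging patients effectively."
    ),
    role="You are a medical AI assistant working in an emergency department.",
    constraints="Use a professional tone. Limit your response to 200 words.",
    desired_output=(
        "Organize the causes into three categories: mild, moderate, and severe. "
        "Provide a brief explanation and suggest one next step for clinicians "
        "for each category."
    ),
).render()
print(triage_prompt)
```

Because every field is required, a prompt that omits a component fails at construction time instead of silently producing an incomplete prompt.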



## Hands-On Exercise: Testing Prompts Across LLMs

### Setup
1. Ensure all necessary API keys are loaded.
2. Initialize connections to the following models:
   - GPT-4
   - GPT-3.5
   - Claude-3.5
   - Grok-Beta

### Example Exercise
**Prompt**:  
*"You are a medical AI assistant helping a clinician assess chest pain in adults. Categorize potential causes into mild, moderate, and severe based on urgency. Provide brief explanations for each cause and suggest next steps for clinicians to take based on the severity level."*

**Code Implementation**:
```python
# Define the prompt
test_prompt = (
    "You are a medical AI assistant helping a clinician assess chest pain in adults. "
    "Categorize potential causes into mild, moderate, and severe based on urgency. "
    "Provide brief explanations for each cause and suggest next steps for clinicians to take based on the severity level."
)

# Compare responses across models
# (compare_responses is a helper from this notebook's code cells, not a library function)
compare_responses(test_prompt)
```

**Expected Output**:
The notebook should display responses from all four models, allowing analysis of:
- **Clarity**: Are the causes clearly categorized and explained?
- **Relevance**: Are the responses medically accurate and appropriate for a clinical setting?
- **Creativity**: Do the suggestions for next steps add value to the use case?
- **Adherence to Prompt Instructions**: Are the responses well-structured and aligned with the given requirements?



## Advanced Prompt Refinement Techniques

1. **Iterative Refinement**:
   - Start with a general prompt and refine it based on the responses.
   - *Example*:
     - Initial: "What causes chest pain?"
     - Refined: "Categorize potential causes of chest pain into mild, moderate, and severe, and provide examples."

2. **Prompt Chaining**:
   - Use a sequence of prompts to guide the model toward solving complex tasks.
   - *Example*:
     - Step 1: "List potential causes of chest pain in adults."
     - Step 2: "For each cause, categorize it as mild, moderate, or severe based on urgency."
     - Step 3: "Suggest a diagnostic or treatment step for each category."

3. **Experimenting with Temperature**:
   - Explore varying levels of creativity in the model's output by adjusting the `temperature` parameter.
   - *Example*:
     - Low Temperature (e.g., `temperature=0`): Produces focused, near-deterministic responses suitable for clinical tasks.
     - High Temperature (e.g., `temperature=0.8`): Encourages diverse, creative outputs, useful for brainstorming.

4. **Contrastive Prompts**:
   - Test the model's adaptability by framing the same question in contrasting ways.
   - *Example*:
     - Prompt 1: "Explain chest pain causes to a first-year medical student."
     - Prompt 2: "Explain chest pain causes to a non-medical audience."
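
To make the temperature technique (item 3 above) more concrete: in most sampling implementations, the model's raw scores (logits) are divided by the temperature before being converted to probabilities. The sketch below is a simplified illustration with made-up logits for three candidate tokens; it is not how any specific provider's API is implemented, and `softmax_with_temperature` is a name introduced here.

```python
import math


def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Convert logits to probabilities after dividing by the temperature."""
    if temperature <= 0:
        # temperature=0 is typically handled as greedy (argmax) decoding instead.
        raise ValueError("temperature must be positive")
    scaled = [value / temperature for value in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(value - peak) for value in scaled]
    total = sum(exps)
    return [value / total for value in exps]


# Made-up logits for three candidate next tokens.
logits = [2.0, 1.0, 0.5]

for temperature in (0.2, 0.8, 1.5):
    probs = softmax_with_temperature(logits, temperature)
    print(temperature, [round(p, 3) for p in probs])
```

Lower temperatures concentrate probability on the top-scoring token, which is why outputs become more repeatable; higher temperatures flatten the distribution, so less likely tokens are sampled more often.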

```python
# pip install anthropic --break-system-packages
# pip install langchain_anthropic --break-system-packages
```

In [27]:
# ======================================
# Initialization and Environment Setup
# ======================================

import os
import requests
from dotenv import load_dotenv
from langchain_openai import ChatOpenAI
from langchain_anthropic import ChatAnthropic
from anthropic import Anthropic

# Load environment variables from .env file
load_dotenv()

# Helper function to load environment variables
def get_env_var(var: str):
    value = os.getenv(var)
    if value is None:
        raise ValueError(f"{var} not found in environment variables. Make sure it is set in your .env file.")
    return value

# Load API keys
langchain_api_key = get_env_var("LANGCHAIN_API_KEY")
langchain_tracing_v2 = get_env_var("LANGCHAIN_TRACING_V2")
openai_api_key = get_env_var("OPENAI_API_KEY")
anthropic_api_key = get_env_var("ANTHROPIC_API_KEY")
grok_api_key = get_env_var("GROK_API_KEY")

# ======================================
# Model Setup
# ======================================

# OpenAI GPT models
gpt4o_chat = ChatOpenAI(model="gpt-4o", temperature=0, openai_api_key=openai_api_key)
gpt35_chat = ChatOpenAI(model="gpt-3.5-turbo-0125", temperature=0, openai_api_key=openai_api_key)

# Anthropic Claude models
claude = Anthropic(api_key=anthropic_api_key)
claude_chat = ChatAnthropic(model="claude-3-5-sonnet-20240620", temperature=0, anthropic_api_key=anthropic_api_key)

# ======================================
# Grok API Integration
# ======================================

def query_grok(prompt: str, model="grok-beta", stream=False, temperature=0):
    """
    Query the Grok API with a user-provided prompt and return cleaned content.
    """
    # Define the Grok API endpoint and headers
    url = "https://api.x.ai/v1/chat/completions"
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {grok_api_key}"
    }

    # Define the payload
    payload = {
        "messages": [
            {"role": "system", "content": "You are Grok, a chatbot inspired by the Hitchhikers Guide to the Galaxy."},
            {"role": "user", "content": prompt}
        ],
        "model": model,
        "stream": stream,
        "temperature": temperature
    }

    try:
        # Send the request to the Grok API
        response = requests.post(url, headers=headers, json=payload)
        response.raise_for_status()  # Raise an error if the request fails
        
        # Parse and return the relevant content
        response_json = response.json()
        choices = response_json.get("choices", [])
        if choices and "content" in choices[0].get("message", {}):
            return choices[0]["message"]["content"]  # Extract only the assistant's content
        else:
            return "No content returned"

    except requests.exceptions.RequestException as e:
        print("Error querying Grok API:", e)
        return "Error querying Grok API."


# ======================================
# Unified Response Comparison with Formatting and Step-specific Saving
# ======================================

def format_response(response: str) -> str:
    """
    Standardize the formatting of the model's response for consistent readability.
    """
    # Split the response into lines; a line ending with a colon is treated as a section header
    sections = response.split("\n")
    formatted_response = []
    for section in sections:
        # Promote header-like lines (those ending with ":") to Markdown headers
        if section.strip().endswith(":"):
            formatted_response.append(f"\n### {section.strip()}")  # Add Markdown-style headers
        else:
            formatted_response.append(section.strip())  # Keep other lines, trimmed of surrounding whitespace
    return "\n".join(formatted_response)

def save_comparison_to_markdown(prompt: str, results: dict, filename: str):
    """
    Save the formatted output of compare_responses to a Markdown file.

    Parameters:
    - prompt: The prompt used for the comparison.
    - results: A dictionary containing model responses.
    - filename: The filename for the Markdown file.
    """
    try:
        with open(filename, "w", encoding="utf-8") as f:
            # Write the prompt
            f.write(f"# Prompt:\n\n{prompt}\n\n")
            f.write("=" * 80 + "\n\n")
            
            # Write each model's response
            for model, response in results.items():
                f.write(f"## {model} Response\n\n")
                f.write(f"{response}\n\n")
                f.write("-" * 80 + "\n\n")
        
        print(f"Saved comparison results to {filename}")
    except Exception as e:
        print(f"Error saving comparison to {filename}: {e}")

def compare_responses(prompt: str, step_name="initial", include_claude=True, include_gpt4=True, include_gpt35=True, include_grok=True, save_dir="outputs"):
    """
    Compare responses from different models for the same prompt, format the output, and save to step-specific Markdown files.

    Parameters:
    - prompt: The input prompt for comparison.
    - step_name: A unique identifier for the step (e.g., "initial", "step1").
    - include_claude: Include Claude model in the comparison.
    - include_gpt4: Include GPT-4 model in the comparison.
    - include_gpt35: Include GPT-3.5 model in the comparison.
    - include_grok: Include Grok model in the comparison.
    - save_dir: Directory to save the Markdown file.
    """
    results = {}

    # Collect responses from all included models
    if include_claude:
        claude_response = claude_chat.invoke(prompt)
        results["Claude-3.5-Sonnet"] = format_response(claude_response.content)

    if include_gpt4:
        gpt4_response = gpt4o_chat.invoke(prompt)
        results["GPT-4o"] = format_response(gpt4_response.content)

    if include_gpt35:
        gpt35_response = gpt35_chat.invoke(prompt)
        results["GPT-3.5"] = format_response(gpt35_response.content)

    if include_grok:
        grok_response = query_grok(prompt)
        results["Grok-Beta"] = format_response(grok_response)

    # Ensure output directory exists
    os.makedirs(save_dir, exist_ok=True)

    # Generate the step-specific filename
    output_filename = os.path.join(save_dir, f"{step_name}_comparison_responses.md")
    save_comparison_to_markdown(prompt, results, output_filename)

    # Display the formatted results in the console
    print(f"\nPrompt: {prompt}\n")
    print("=" * 80)
    for model, response in results.items():
        print(f"\n{model}:\n")
        print(response)
        print("-" * 80)


In [None]:
# =========================================
# Example: Testing and Evaluation of Various Prompts
# =========================================

# Example test prompts
test_prompts = [
    "Generate a list of creative product names for a new eco-friendly cleaning brand.",
    "What is the meaning of life, the universe, and everything?",
    "Explain quantum entanglement in simple terms."
]

# Run comparisons for each prompt
for prompt in test_prompts:
    compare_responses(prompt)


### Example: **Medical AI Use Case - Diagnosing Symptoms**

#### **Initial Prompt**  
*"What could be causing chest pain?"*



In [30]:
# Initial step
compare_responses(prompt="What are the common causes of chest pain?", step_name="initial")

Saved comparison results to outputs/initial_comparison_responses.md

Prompt: What are the common causes of chest pain?


Claude-3.5-Sonnet:


### Chest pain can be caused by a variety of factors, ranging from minor issues to serious medical conditions. Here are some common causes of chest pain:


### 1. Cardiovascular causes:
- Heart attack (myocardial infarction)
- Angina (reduced blood flow to the heart)
- Pericarditis (inflammation of the heart's protective sac)
- Aortic dissection (tear in the wall of the aorta)


### 2. Respiratory causes:
- Pneumonia
- Pleurisy (inflammation of the lung lining)
- Pulmonary embolism (blood clot in the lungs)
- Asthma or bronchitis


### 3. Gastrointestinal causes:
- Acid reflux or GERD (gastroesophageal reflux disease)
- Esophageal spasms
- Peptic ulcers
- Gallbladder issues


### 4. Musculoskeletal causes:
- Costochondritis (inflammation of rib cartilage)
- Muscle strain or injury
- Rib fracture


### 5. Psychological causes:
- Anxiety or panic a

- Problem: This prompt is too vague and open-ended. The model may return a broad list of causes, some irrelevant or too general, without structure or focus.
- Goal: Refine the prompt to guide the model toward more specific, actionable, and clinically relevant responses.

#### **Step 1: Apply Specificity**  
**Revised Prompt**:  
*"Provide a list of common causes of chest pain."*  

**Why**:  
The initial prompt is too open-ended, potentially leading to vague or overly broad responses. Adding specificity focuses the task.


In [32]:
# Step 1 Specificity Prompt
compare_responses(prompt="Provide a list of common causes of chest pain.", step_name="step1")

Saved comparison results to outputs/step1_comparison_responses.md

Prompt: Provide a list of common causes of chest pain.


Claude-3.5-Sonnet:


### Here's a list of common causes of chest pain:


### 1. Cardiovascular causes:
- Angina
- Heart attack (myocardial infarction)
- Pericarditis
- Aortic dissection
- Coronary artery spasm


### 2. Respiratory causes:
- Pneumonia
- Pleurisy
- Pulmonary embolism
- Pneumothorax (collapsed lung)
- Asthma


### 3. Gastrointestinal causes:
- Gastroesophageal reflux disease (GERD)
- Esophageal spasms
- Peptic ulcers
- Gallbladder disease
- Pancreatitis


### 4. Musculoskeletal causes:
- Costochondritis
- Muscle strain
- Rib fracture
- Fibromyalgia


### 5. Psychological causes:
- Anxiety or panic attacks
- Depression


### 6. Other causes:
- Shingles (before the rash appears)
- Chest wall inflammation
- Breast pain
- Sickle cell crisis


### 7. Lung-related causes:
- Bronchitis
- Tuberculosis


### 8. Trauma-related causes:
- Chest injury
- Broken r

### What We’re Achieving
- Improvement: Asking explicitly for a list of "common causes" focuses the model on typical causes and reduces noise. The demographic detail ("in adults") is added in Step 2.
- Why: Vague terms like "could be" are replaced with actionable language.


#### **Step 2: Add Contextuality**  
**Revised Prompt**:  
*"What are the common causes of chest pain in adults, focusing on cardiovascular, respiratory, and gastrointestinal causes?"*

**Why**:  
Adding context about the specific systems involved (e.g., cardiovascular, respiratory, gastrointestinal) helps narrow the scope of the response to medically relevant areas. Including the demographic detail (adults) further ensures the output is tailored to the appropriate population. This contextual guidance reduces the risk of irrelevant or overly generalized answers.

**What We’re Achieving**:  
- **Focus**: The model prioritizes key systems that are most clinically significant for chest pain.  
- **Relevance**: Specifying "adults" steers the model away from causes that are more common in children and from rare cases not relevant to the audience.  
- **Structure**: The added focus naturally organizes the response around the three systems mentioned.

--- 

This refinement ensures the prompt sets up the AI to produce medically relevant and actionable results.
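
The contextual details added in this step can also be assembled programmatically, which makes it easier to swap the population or the body systems later. The following is a minimal sketch; `add_context` and its parameters are hypothetical names introduced for illustration and are not part of the notebook's code.

```python
def add_context(population: str, systems: list[str]) -> str:
    """Build a chest-pain prompt scoped to a population and a set of body systems."""
    if not systems:
        raise ValueError("Provide at least one body system to focus on.")
    # Join the systems as "a", "a and b", or "a, b, and c"
    if len(systems) == 1:
        focus = systems[0]
    elif len(systems) == 2:
        focus = " and ".join(systems)
    else:
        focus = ", ".join(systems[:-1]) + f", and {systems[-1]}"
    return (
        f"What are the common causes of chest pain in {population}, "
        f"focusing on {focus} causes?"
    )


# Reproduces the Step 2 prompt from its two pieces of context
step2_prompt = add_context("adults", ["cardiovascular", "respiratory", "gastrointestinal"])
print(step2_prompt)
```

Changing the arguments (for example, a different population) yields a differently scoped prompt without rewriting the whole sentence.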



In [33]:
# Step 2 Add Contextuality
compare_responses(prompt="What are the common causes of chest pain in adults, focusing on cardiovascular, respiratory, and gastrointestinal causes?", step_name="step2")

Saved comparison results to outputs/step2_comparison_responses.md

Prompt: What are the common causes of chest pain in adults, focusing on cardiovascular, respiratory, and gastrointestinal causes?


Claude-3.5-Sonnet:


### Chest pain in adults can be caused by a variety of conditions related to the cardiovascular, respiratory, and gastrointestinal systems. It's important to note that chest pain can be a symptom of serious, life-threatening conditions and should always be evaluated by a healthcare professional. Here are some common causes of chest pain in adults, categorized by system:


### Cardiovascular Causes:


### 1. Coronary Artery Disease (CAD):
- Angina: Chest pain due to reduced blood flow to the heart muscle
- Myocardial Infarction (Heart Attack): Blockage of blood flow to the heart muscle

2. Pericarditis: Inflammation of the protective sac around the heart

3. Myocarditis: Inflammation of the heart muscle

4. Aortic Dissection: Tearing of the inner layer of the aorta

5. M

### What We’re Achieving in Step 2: Add Contextuality
- Improvement: The added context directs the model to focus on specific systems (cardiovascular, respiratory, gastrointestinal), ensuring more clinically relevant responses.
- Why: Context narrows the scope of possible answers, increasing precision.

## **Step 3: Improve Structure**  
**Revised Prompt**:  
*"You are a medical AI assistant helping a clinician. Provide a list of common causes of chest pain in adults, focusing on cardiovascular, respiratory, and gastrointestinal causes. Organize the causes into three categories: mild, moderate, and severe, based on urgency. Include a brief explanation for why each cause belongs to its category."*

**Why**:  
- **Role Assignment**: Assigning a role ensures the model uses a professional tone and aligns the output with clinical expectations.  
- **Focus and Context**: Retains the focus on cardiovascular, respiratory, and gastrointestinal causes from Step 2, ensuring the response is medically relevant and appropriately scoped.  
- **Task Structuring**: Clearly defined categories (mild, moderate, severe) with explanations make the response actionable and easy to use.  

**What We’re Achieving**:  
1. **Reinforcing Context**: By keeping the focus on specific systems, the model produces relevant and structured responses.  
2. **Clarity and Structure**: The use of categories ensures the output is logically organized for quick interpretation.  
3. **Added Value**: Explanations provide clinicians with reasoning behind the categorizations, making the output more informative and actionable.
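
The three ingredients described above (role, scoped task, and output structure) can be kept as separate pieces and combined into one prompt. This is only an illustrative sketch: `build_structured_prompt` is a hypothetical helper, and the wording it produces differs slightly from the Step 3 prompt used in the notebook.

```python
def build_structured_prompt(role: str, task: str, categories: list[str], basis: str) -> str:
    """Combine a role, a task, and an output structure into a single prompt string."""
    if not categories:
        raise ValueError("Provide at least one category.")
    category_list = ", ".join(categories)
    return (
        f"You are {role}. {task} "
        f"Organize the causes into these categories, based on {basis}: {category_list}. "
        "Include a brief explanation for why each cause belongs to its category."
    )


step3_prompt = build_structured_prompt(
    role="a medical AI assistant helping a clinician",
    task=(
        "Provide a list of common causes of chest pain in adults, focusing on "
        "cardiovascular, respiratory, and gastrointestinal causes."
    ),
    categories=["mild", "moderate", "severe"],
    basis="urgency",
)
print(step3_prompt)
```

Keeping the parts separate makes it straightforward to test one change at a time, such as a different role or a different set of categories.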



In [34]:
# Step 3
compare_responses(
    prompt="You are a medical AI assistant helping a clinician. Provide a list of common causes of chest pain in adults, focusing on cardiovascular, respiratory, and gastrointestinal causes. Organize the causes into three categories: mild, moderate, and severe, based on urgency. Include a brief explanation for why each cause belongs to its category.",
    step_name="step3"
)

Saved comparison results to outputs/step3_comparison_responses.md

Prompt: You are a medical AI assistant helping a clinician. Provide a list of common causes of chest pain in adults, focusing on cardiovascular, respiratory, and gastrointestinal causes. Organize the causes into three categories: mild, moderate, and severe, based on urgency. Include a brief explanation for why each cause belongs to its category.


Claude-3.5-Sonnet:


### Here's a list of common causes of chest pain in adults, organized into mild, moderate, and severe categories based on urgency, focusing on cardiovascular, respiratory, and gastrointestinal causes:


### Mild (generally non-urgent):

1. Costochondritis: Inflammation of the cartilage connecting ribs to the breastbone. Categorized as mild due to its benign nature and lack of serious complications.

2. Gastroesophageal reflux disease (GERD): Acid reflux causing heartburn. Mild because it's usually manageable with lifestyle changes and medication.

3. Muscl

### **What We’re Achieving in Step 3: Improve Structure**

1. **Clarity and Organization**:
   - By introducing categories (mild, moderate, severe) and structuring the task, the output is now logically organized and easier for clinicians to interpret and act on.

2. **Professional Alignment**:
   - Assigning the role of a "medical AI assistant" aligns the model’s tone and focus with the expectations of a professional clinical setting.

3. **Scoping the Task**:
   - Narrowing the scope to focus on specific systems (cardiovascular, respiratory, gastrointestinal) ensures responses are relevant and medically appropriate.

4. **Actionability**:
   - Including brief explanations for each cause provides clinicians with reasoning behind the categorization, enhancing the practicality of the response.

5. **Foundation for Refinement**:
   - This structured prompt creates a foundation for further iterative improvements, such as adding next steps or refining tone.

**Example Expected Output**:
```plaintext
Mild Causes:
- Acid reflux: Often due to stomach acid irritation; can cause burning chest pain. Recommending antacids and dietary changes is sufficient.

Moderate Causes:
- Pneumonia: Infection causing chest discomfort and breathing issues. Suggest ordering a chest X-ray and antibiotics.

Severe Causes:
- Heart attack: Acute chest pain with life-threatening potential. Immediate ECG and cardiac enzyme tests are essential.
```

**Next Steps**: 
Use the results of this step to evaluate:
- Are the causes appropriately categorized?
- Are the explanations clear and concise?
- Is the tone professional and aligned with the role?

This ensures the structured framework is ready for iterative refinement in Step 4.
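
Part of this evaluation can be automated with a simple structural check. The sketch below looks for each requested category at the start of a line in a response. `find_missing_categories` is a hypothetical helper, and the pattern is a rough heuristic: it cannot tell whether the causes are categorized correctly, only whether the categories appear at all.

```python
import re


def find_missing_categories(response: str, categories=("mild", "moderate", "severe")) -> list[str]:
    """Return the categories that never appear at the start of a line in the response."""
    missing = []
    for category in categories:
        # Matches lines such as "Mild Causes:", "### Mild (generally non-urgent):" or "**Mild**"
        pattern = rf"^[#*\s]*{re.escape(category)}\b"
        if not re.search(pattern, response, flags=re.IGNORECASE | re.MULTILINE):
            missing.append(category)
    return missing


# Hand-written sample response that omits the "moderate" category
sample_response = (
    "Mild Causes:\n"
    "- Acid reflux: Often due to stomach acid irritation.\n"
    "\n"
    "Severe Causes:\n"
    "- Heart attack: Acute chest pain with life-threatening potential.\n"
)
print(find_missing_categories(sample_response))
```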


## **Step 4: Iterate Based on Results**  

**Prompt Testing**:  
Test the structured prompt from Step 3 with an LLM. Evaluate the output based on the following criteria:  
1. **Categorization Accuracy**: Are the causes appropriately categorized as mild, moderate, or severe based on urgency?  
2. **Explanatory Clarity**: Are the explanations concise, medically relevant, and easy to understand?  
3. **Completeness**: Does the response address all specified systems (cardiovascular, respiratory, gastrointestinal)?  
4. **Professional Tone**: Does the output align with the expected role of a medical assistant?  

---

**Example Revision** (if the output needs refinement):  
*"You are a medical AI assistant helping a clinician assess chest pain in adults. Categorize potential causes into mild, moderate, and severe based on urgency. Focus on cardiovascular, respiratory, and gastrointestinal causes. Provide one or two concise sentences explaining why each cause falls into its category. Suggest one next step for clinicians to take for each category."*  

---

**Why**:  
- **Refining Explanations**: By specifying the length and clarity of explanations (one or two concise sentences), the prompt ensures outputs are actionable without being verbose.  
- **Adding Next Steps**: Suggesting clinical next steps makes the output directly useful for decision-making.  
- **Iterative Refinement**: Testing the prompt and adjusting based on feedback ensures alignment with clinical needs and model performance.  

---

**What We’re Achieving**:  
1. **Alignment**: Ensures that the model's output matches clinical priorities by testing and refining the categorization.  
2. **Clarity**: Focuses on concise, relevant explanations that support quick understanding.  
3. **Practicality**: Adds actionable recommendations, enhancing the utility of the output for real-world applications.  
4. **Continuous Improvement**: Establishes a feedback loop to iteratively refine the prompt for optimal results.

---

**Evaluation Questions for Students**:  
When testing and iterating on prompts, encourage students to ask:  
- Does the model categorize causes accurately?  
- Are the explanations clear and clinically relevant?  
- Does the output provide actionable insights?  
- How can I refine the prompt to improve these aspects?  

This iterative approach ensures a systematic process for refining prompts, making them both effective and practical. 
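
The revised prompt asks for "one or two concise sentences" per cause, which is a constraint that can be checked mechanically. The sketch below is a rough illustration under two assumptions: each cause appears as a bullet of the form `- Cause: explanation`, and sentences end with `.`, `!`, or `?` (so abbreviations such as "e.g." would be miscounted). Both function names are hypothetical.

```python
import re


def count_sentences(text: str) -> int:
    """Rough sentence count: split on ., ! or ? followed by whitespace or end of text."""
    parts = re.split(r"[.!?]+(?:\s+|$)", text.strip())
    return len([part for part in parts if part])


def overlong_explanations(response: str, max_sentences: int = 2) -> list[str]:
    """Return the causes whose explanation has more than max_sentences sentences."""
    flagged = []
    for line in response.splitlines():
        # Only bullet lines of the form "- Cause: explanation" are inspected
        match = re.match(r"^\s*[-*]\s*(.+?):\s*(.+)$", line)
        if match and count_sentences(match.group(2)) > max_sentences:
            flagged.append(match.group(1))
    return flagged


# Hand-written sample: the second bullet breaks the two-sentence limit
sample_response = (
    "Mild Causes:\n"
    "- Muscle strain: Caused by overuse. Recommend rest.\n"
    "- Pneumonia: Infection of the lung. Causes cough and fever. "
    "Order a chest X-ray. Start antibiotics.\n"
)
print(overlong_explanations(sample_response))
```

A check like this covers only the length constraint; clinical accuracy and relevance still need human review.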

In [35]:
# Step 4
compare_responses(
    prompt="You are a medical AI assistant helping a clinician assess chest pain in adults. Categorize potential causes into mild, moderate, and severe based on urgency. Focus on cardiovascular, respiratory, and gastrointestinal causes. Provide one or two concise sentences explaining why each cause falls into its category. Suggest one next step for clinicians to take for each category.",
    step_name="step4"
)

Saved comparison results to outputs/step4_comparison_responses.md

Prompt: You are a medical AI assistant helping a clinician assess chest pain in adults. Categorize potential causes into mild, moderate, and severe based on urgency. Focus on cardiovascular, respiratory, and gastrointestinal causes. Provide one or two concise sentences explaining why each cause falls into its category. Suggest one next step for clinicians to take for each category.


Claude-3.5-Sonnet:


### Here's a categorization of potential causes of chest pain in adults, along with brief explanations and next steps for clinicians:


### Mild (Low Urgency):

1. Costochondritis: Inflammation of chest wall cartilage. Usually benign and self-limiting.
2. Gastroesophageal reflux disease (GERD): Acid reflux causing chest discomfort. Typically responds well to lifestyle changes and medication.
3. Anxiety or panic attacks: Can mimic cardiac symptoms but are not life-threatening.

Next step: Conduct a thorough history and p

### **What We’re Achieving in Step 4: Iterate Based on Results**

1. **Validation of Categorization**:
   - Testing the prompt helps confirm whether the model correctly assigns causes of chest pain to mild, moderate, or severe categories based on urgency.
   - Misclassifications can be identified and addressed through prompt refinements.

2. **Improved Explanations**:
   - Refining the requirement for concise and relevant explanations ensures the model provides actionable insights that are aligned with clinical needs.
   - Testing ensures the explanations avoid unnecessary complexity or vagueness.

3. **Enhanced Practicality**:
   - Adding suggested next steps makes the output directly actionable, moving from categorization to practical guidance for clinicians.

4. **Alignment with Context**:
   - The iterative process ensures that the model’s response stays focused on the specified systems (cardiovascular, respiratory, gastrointestinal) and excludes irrelevant causes.

5. **Feedback Loop for Refinement**:
   - Testing and refining create a feedback loop where the prompt is continually improved based on observed results, making it more robust and effective for real-world use cases.

---

**Example Expected Output** (After Refinement):

```plaintext
Mild Causes:
- Muscle strain: Caused by overuse or minor injuries. Recommend rest and over-the-counter pain relief.
- Acid reflux: Stomach acid irritation often mistaken for chest pain. Suggest antacids and lifestyle adjustments.

Moderate Causes:
- Pneumonia: Infection causing chest discomfort and breathing difficulty. Suggest ordering a chest X-ray and starting antibiotics.
- Pulmonary embolism (suspected): Requires immediate evaluation with imaging and blood tests. Suggest hospitalization.

Severe Causes:
- Heart attack: Acute and potentially life-threatening chest pain. Recommend immediate ECG and cardiac enzyme testing.
- Aortic dissection: Sudden, severe chest pain radiating to the back. Urgently refer for CT angiography and surgical consultation.
```

---

**Next Steps**:
- Use the results to evaluate:
  - Are causes correctly categorized and supported by clear explanations?
  - Are suggested next steps actionable and relevant to each severity level?
  - Does the model output align with the professional tone and context specified in the prompt?
- Refine further based on evaluation to ensure consistency and precision.

This step emphasizes the importance of iterative testing and continuous refinement, laying the groundwork for incorporating creativity and adaptability in Step 5.
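
When a response follows the expected layout shown above, it can be parsed into a dictionary so that each category can be reviewed separately. The sketch below assumes exactly that layout (a heading line ending in a colon, followed by `- Cause: explanation` bullets). `parse_categorized_output` is a hypothetical helper, and real model output may deviate from this format.

```python
def parse_categorized_output(text: str) -> dict[str, list[tuple[str, str]]]:
    """Parse 'Heading:' lines and '- Cause: explanation' bullets into a dictionary."""
    parsed: dict[str, list[tuple[str, str]]] = {}
    current = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        if line.endswith(":") and not line.startswith("-"):
            # A non-bullet line ending with a colon starts a new category
            current = line[:-1]
            parsed[current] = []
        elif line.startswith("-") and current is not None and ":" in line:
            # Split the bullet at the first colon into cause and explanation
            cause, explanation = line[1:].split(":", 1)
            parsed[current].append((cause.strip(), explanation.strip()))
    return parsed


sample_output = (
    "Mild Causes:\n"
    "- Muscle strain: Caused by overuse or minor injuries. Recommend rest and over-the-counter pain relief.\n"
    "\n"
    "Severe Causes:\n"
    "- Heart attack: Acute and potentially life-threatening chest pain. Recommend immediate ECG and cardiac enzyme testing.\n"
)
parsed = parse_categorized_output(sample_output)
for category, causes in parsed.items():
    print(category, "->", [cause for cause, _ in causes])
```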

## **Step 5: Explore Creativity and Adaptability**  

**Revised Prompt**:  
*"You are a medical AI assistant helping a clinician assess chest pain in adults. Categorize potential causes into mild, moderate, and severe based on urgency, focusing on cardiovascular, respiratory, and gastrointestinal causes. Include brief, clear explanations for each cause. Suggest actionable next steps for clinicians in addressing each severity level. Use an empathetic tone to convey urgency where appropriate."*  

---

**Why**:  
- **Adding Creativity**: By introducing an empathetic tone, the output becomes more human-centered, which is especially important in healthcare settings.  
- **Enhancing Actionability**: Including actionable next steps ensures the response goes beyond categorization to provide practical advice.  
- **Encouraging Flexibility**: The prompt now adapts to diverse clinical scenarios by combining categorizations, explanations, and guidance in a comprehensive manner.  

---

**What We’re Achieving**:  
1. **Empathy**: An empathetic tone ensures the response is relatable and aligned with the sensitivities of medical interactions.  
2. **Actionability**: By suggesting next steps, the model provides insights that are immediately useful for clinical decision-making.  
3. **Adaptability**: The refined prompt balances structure and creativity, enabling the model to handle a wide range of scenarios (e.g., varying urgency or patient presentations).  
4. **Advanced Utility**: The output is enriched with both contextual understanding and practical guidance, making it suitable for integration into real-world healthcare workflows.  

---

### **Encouraging Creativity in Outputs**
To explore the model's adaptability further:
1. **Vary Tone and Style**:  
   - Prompt: *"Use a reassuring and professional tone to explain the causes and suggest next steps."*
2. **Incorporate Analogies**:  
   - Prompt: *"Explain the causes using analogies where appropriate to make the information relatable for junior clinicians."*
3. **Ask for Alternatives**:  
   - Prompt: *"Provide alternative next steps for clinicians in cases of limited resources."*
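
One way to explore these variations systematically is to append each instruction to the same base prompt and compare the results. The sketch below only builds the prompt strings; `build_variants`, `BASE_PROMPT`, and `STYLE_MODIFIERS` are hypothetical names introduced for illustration.

```python
BASE_PROMPT = (
    "You are a medical AI assistant helping a clinician assess chest pain in adults. "
    "Categorize potential causes into mild, moderate, and severe based on urgency, "
    "focusing on cardiovascular, respiratory, and gastrointestinal causes."
)

# The three variations listed above, keyed by a short label
STYLE_MODIFIERS = {
    "tone": "Use a reassuring and professional tone to explain the causes and suggest next steps.",
    "analogies": (
        "Explain the causes using analogies where appropriate to make the "
        "information relatable for junior clinicians."
    ),
    "alternatives": "Provide alternative next steps for clinicians in cases of limited resources.",
}


def build_variants(base_prompt: str, modifiers: dict[str, str]) -> dict[str, str]:
    """Return one prompt per modifier, each formed by appending the instruction to the base prompt."""
    return {name: f"{base_prompt} {instruction}" for name, instruction in modifiers.items()}


variants = build_variants(BASE_PROMPT, STYLE_MODIFIERS)
for name, prompt in variants.items():
    print(f"[{name}] {prompt}\n")
```

Each variant could then be passed to `compare_responses` with its own `step_name` (for example, `step_name=f"step5_{name}"`) so that the outputs are saved to separate files.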

---

**Evaluation Criteria for Students**:  
- Does the model incorporate an empathetic tone effectively?  
- Are the next steps practical and relevant for clinicians?  
- How well does the model adapt to tone, audience, or additional instructions?  
- Are creative elements (e.g., analogies, alternative approaches) appropriately used?

---

### Final Refined Example Prompt:
*"You are a medical AI assistant helping a clinician assess chest pain in adults. Categorize potential causes into mild, moderate, and severe based on urgency, focusing on cardiovascular, respiratory, and gastrointestinal causes. Include concise explanations for each cause and suggest actionable next steps for clinicians to take. Use an empathetic and professional tone to convey urgency and reassurance where appropriate."*

This refinement makes the prompt comprehensive, actionable, and adaptable while encouraging creativity and relevance in responses. 

In [36]:
# Step 5
compare_responses(
    prompt="You are a medical AI assistant helping a clinician assess chest pain in adults. Categorize potential causes into mild, moderate, and severe based on urgency, focusing on cardiovascular, respiratory, and gastrointestinal causes. Include concise explanations for each cause and suggest actionable next steps for clinicians to take. Use an empathetic and professional tone to convey urgency and reassurance where appropriate.",
    step_name="step5"
)

Saved comparison results to outputs/step5_comparison_responses.md

Prompt: You are a medical AI assistant helping a clinician assess chest pain in adults. Categorize potential causes into mild, moderate, and severe based on urgency, focusing on cardiovascular, respiratory, and gastrointestinal causes. Include concise explanations for each cause and suggest actionable next steps for clinicians to take. Use an empathetic and professional tone to convey urgency and reassurance where appropriate.


Claude-3.5-Sonnet:

Thank you for consulting me regarding the assessment of chest pain in adults. I'll categorize potential causes based on urgency and provide concise explanations and next steps for each. Please remember that this information is meant to assist clinical judgment, not replace it.


### Mild (Less Urgent):


### 1. Costochondritis:
Explanation: Inflammation of chest wall cartilage.
Next steps: Recommend NSAIDs and reassurance. Follow up if symptoms persist.


### 2. Gastroesophag

### **What We’re Achieving in Step 5: Explore Creativity and Adaptability**

1. **Human-Centered Responses**:
   - Incorporating an empathetic and professional tone ensures the output is more relatable, especially in sensitive scenarios like healthcare. 
   - The model’s ability to convey urgency and reassurance adds depth to the response, making it suitable for clinician-patient interactions.

2. **Enhanced Engagement**:
   - Creative additions, such as analogies or alternative approaches, make the response more engaging and accessible, especially for varying audiences like junior clinicians or trainees.

3. **Flexibility in Output**:
   - By encouraging adaptability, the model can adjust its tone, style, and level of detail based on different clinical or educational contexts. 
   - The refined prompt allows for exploration of outputs that balance structure with creative flair, enhancing versatility.

4. **Practical Recommendations**:
   - Adding actionable next steps tailored to each severity level provides clinicians with insights that can immediately inform their decisions. 

5. **Broadening Use Cases**:
   - The adaptable design makes the prompt suitable for diverse medical AI applications, such as training tools, clinical decision support systems, or patient education materials.

---

**Example Expected Output** (After Exploring Creativity and Adaptability):

```plaintext
Mild Causes:
- **Acid reflux**: Often due to stomach acid irritation, presenting as a burning sensation in the chest. Suggest antacids and dietary changes.  
  *Analogy*: "Think of acid reflux as a leaky faucet—when the acid escapes, it irritates the pipes (esophagus)."  

Moderate Causes:
- **Pneumonia**: An infection causing chest discomfort and difficulty breathing. Suggest ordering a chest X-ray and antibiotics.  
  *Alternative Next Step*: In resource-limited settings, monitor symptoms closely and treat empirically with antibiotics.

Severe Causes:
- **Heart attack**: Acute chest pain with life-threatening potential. Recommend immediate ECG and cardiac enzyme tests.  
  *Empathy*: "This is an emergency that requires immediate action to save the patient’s life."
```

---

**What Makes This Step Unique**:
- **Emphasis on Creativity**: The use of analogies, tone shifts, and alternative approaches makes the output more engaging and memorable.
- **Adaptability for Audience**: Encourages tailoring responses for different stakeholders, such as patients, clinicians, or students.
- **Application Versatility**: The enhanced prompt works across different use cases, including clinical settings, patient education, and training tools.

---

**Next Steps**:
- Evaluate:
  - Does the model effectively adapt to changes in tone or style?
  - Are the analogies, next steps, and explanations aligned with the specified audience and context?
  - Does the empathetic tone enhance the user experience without compromising clarity or professionalism?
- Iterate further if necessary, based on the results of creative exploration.

This step integrates all the previous improvements while encouraging flexibility, creativity, and audience alignment, making the prompt truly advanced and versatile.



## **Final Version**  
*"You are a medical AI assistant helping a clinician assess chest pain in adults. Provide a detailed list of potential causes of chest pain, categorized into mild, moderate, and severe urgency levels. For each category, include causes related to cardiovascular, respiratory, and gastrointestinal systems. Provide brief explanations for each cause and suggest appropriate diagnostic tests or treatments. Ensure the response is concise yet comprehensive."*




In [37]:
# Final
compare_responses(
    prompt="You are a medical AI assistant helping a clinician assess chest pain in adults. Provide a detailed list of potential causes of chest pain, categorized into mild, moderate, and severe urgency levels. For each category, include causes related to cardiovascular, respiratory, and gastrointestinal systems. Provide brief explanations for each cause and suggest appropriate diagnostic tests or treatments. Ensure the response is concise yet comprehensive.",
    step_name="final"
)


Saved comparison results to outputs/final_comparison_responses.md

Prompt: You are a medical AI assistant helping a clinician assess chest pain in adults. Provide a detailed list of potential causes of chest pain, categorized into mild, moderate, and severe urgency levels. For each category, include causes related to cardiovascular, respiratory, and gastrointestinal systems. Provide brief explanations for each cause and suggest appropriate diagnostic tests or treatments. Ensure the response is concise yet comprehensive.


Claude-3.5-Sonnet:


### Here's a detailed list of potential causes of chest pain in adults, categorized by urgency level and body system:


### Mild Urgency:


### 1. Cardiovascular:
- Costochondritis: Inflammation of chest wall cartilage
Diagnostic: Physical exam, chest X-ray
Treatment: NSAIDs, rest

- Pericarditis (mild cases): Inflammation of the pericardium
Diagnostic: ECG, echocardiogram
Treatment: NSAIDs, colchicine


### 2. Respiratory:
- Mild bronchitis: Infl

---

### **Final Refined Prompt**
*"You are a medical AI assistant supporting a clinician in assessing chest pain in adults. Your task is to categorize potential causes into mild, moderate, and severe based on urgency, focusing on cardiovascular, respiratory, and gastrointestinal systems. For each cause, provide a concise explanation of why it fits the assigned category. Suggest one actionable next step for the clinician to address each severity level. Use an empathetic and professional tone to convey urgency and reassurance where appropriate."*

---

### **Example Expected Output**

```plaintext
Mild Causes:
- **Muscle strain**: Often due to overuse or minor injuries. Recommend rest and over-the-counter pain relief.
- **Acid reflux**: Stomach acid irritation often mistaken for chest discomfort. Suggest dietary adjustments and antacids.

Moderate Causes:
- **Pneumonia**: Infection causing chest pain and difficulty breathing. Suggest ordering a chest X-ray and initiating antibiotics.
- **Pulmonary embolism (suspected)**: Requires immediate evaluation with imaging and blood tests. Suggest hospitalization for further workup.

Severe Causes:
- **Heart attack**: Acute and potentially life-threatening chest pain. Recommend immediate ECG and cardiac enzyme testing.
- **Aortic dissection**: Sudden, severe pain often radiating to the back. Urgently refer for CT angiography and surgical consultation.
```

---

### **Evaluation**

Once you’ve tested the prompt with an LLM, evaluate its performance using the following criteria:

#### **1. Categorization Accuracy**
- Are the causes of chest pain appropriately assigned to mild, moderate, or severe categories based on urgency?  
- Does the model correctly interpret the focus on cardiovascular, respiratory, and gastrointestinal systems?
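
The structural half of this criterion can be checked automatically before a human reviews the medical content. The sketch below is a minimal illustration, not a validated tool: the function name `check_structure` and the heading-matching rule are assumptions introduced here, and it only confirms that the expected severity headings and body systems appear in the text. Whether each cause is assigned to the right category still needs clinician judgment.

```python
import re

SEVERITY_LEVELS = ("mild", "moderate", "severe")
BODY_SYSTEMS = ("cardiovascular", "respiratory", "gastrointestinal")


def check_structure(response: str) -> dict:
    """Report which severity headings and body systems are missing from a response."""
    text = response.lower()
    # A severity level counts as present if some line starts with it,
    # optionally preceded by markdown heading or bold markers.
    found_levels = [
        level for level in SEVERITY_LEVELS
        if re.search(rf"^[#*\s]*{level}\b", text, flags=re.MULTILINE)
    ]
    # A body system counts as present if it is mentioned anywhere.
    found_systems = [system for system in BODY_SYSTEMS if system in text]
    return {
        "missing_levels": [l for l in SEVERITY_LEVELS if l not in found_levels],
        "missing_systems": [s for s in BODY_SYSTEMS if s not in found_systems],
    }


# Made-up sample response used only to exercise the check.
sample = """Mild Causes:
- Muscle strain: Often due to overuse or minor injuries.
Moderate Causes:
- Pneumonia: Infection causing chest pain and difficulty breathing.
"""
report = check_structure(sample)
print(report)
```

An empty `missing_levels` and `missing_systems` means the response has the requested shape; it says nothing about medical accuracy.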

#### **2. Clarity and Relevance of Explanations**
- Are the explanations concise and relevant to each category?  
- Do they provide clear reasoning for why each cause fits the assigned severity level?

#### **3. Practicality of Suggested Next Steps**
- Are the next steps actionable and medically appropriate?  
- Do they align with real-world clinical workflows?

#### **4. Empathy and Professionalism**
- Does the model adopt an empathetic tone where appropriate, particularly for severe cases?  
- Is the response professional and suitable for a healthcare setting?

#### **5. Adaptability**
- Can the prompt handle slight variations in input, such as focusing on different demographics or medical systems?  
- Does the model maintain consistency across multiple iterations of testing?
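
Both adaptability questions can be probed with a small script. The sketch below assumes a shortened, hypothetical version of the refined prompt with a `{population}` placeholder, and uses character-level similarity from Python's standard `difflib` module as a rough proxy for consistency across runs. The response strings are placeholders, not real model output.

```python
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical template: a shortened form of the refined prompt with a
# placeholder for the patient population.
PROMPT_TEMPLATE = (
    "You are a medical AI assistant supporting a clinician in assessing "
    "chest pain in {population}. Categorize potential causes into mild, "
    "moderate, and severe based on urgency."
)


def build_variants(populations: list[str]) -> list[str]:
    """Fill the template once per population to probe adaptability."""
    return [PROMPT_TEMPLATE.format(population=p) for p in populations]


def mean_pairwise_similarity(responses: list[str]) -> float:
    """Average character-level similarity (0 to 1) over all response pairs."""
    pairs = list(combinations(responses, 2))
    if not pairs:
        return 1.0  # a single response is trivially consistent with itself
    ratios = [SequenceMatcher(None, a, b).ratio() for a, b in pairs]
    return sum(ratios) / len(ratios)


variants = build_variants(["adults", "adults over 65", "pregnant patients"])
for prompt in variants:
    print(prompt)

# Placeholder strings standing in for repeated runs of the same prompt.
runs = ["Mild: muscle strain", "Mild: muscle strain", "Mild: acid reflux"]
print(round(mean_pairwise_similarity(runs), 2))
```

Surface similarity is a crude measure: two responses can be worded differently and still agree on every categorization, so a low score is a signal to read the outputs side by side, not a verdict.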

---

### **Next Steps Based on Evaluation**
- **If Issues Are Identified**:
  - Refine the prompt further to address any gaps in clarity, categorization, or tone.
  - Add additional constraints or instructions to improve the specificity of the output.
  
- **If the Output Meets Expectations**:
  - Use the prompt in broader applications such as training tools, clinical decision support systems, or patient education.
  - Document this refined prompt as a reusable template for similar use cases.
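
The two branches above can be sketched as a small decision helper. The 1-to-5 scale, the threshold of 4, and the names `RubricScores` and `next_step` are assumptions made for illustration; the scores themselves would come from a human reviewer applying the five evaluation criteria.

```python
from dataclasses import dataclass, fields


@dataclass
class RubricScores:
    """Reviewer scores from 1 (poor) to 5 (excellent) for each criterion."""
    categorization_accuracy: int
    clarity_and_relevance: int
    practicality: int
    empathy_and_professionalism: int
    adaptability: int


def next_step(scores: RubricScores, threshold: int = 4) -> str:
    """Return 'accept', or a 'refine' message naming the weak criteria."""
    weak = [
        f.name for f in fields(scores)
        if getattr(scores, f.name) < threshold
    ]
    if weak:
        return "refine: " + ", ".join(weak)
    return "accept"


print(next_step(RubricScores(5, 4, 3, 4, 2)))  # refine: practicality, adaptability
print(next_step(RubricScores(5, 4, 4, 5, 4)))  # accept
```

Naming the weak criteria, instead of returning a single pass/fail flag, points the next refinement round at the part of the prompt that needs more constraints.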

---

## **What We Have Learned**

Through this hands-on activity, we have explored the principles and practices of effective prompt engineering, emphasizing iterative refinement and practical application. Here's what we gained from the process:

1. **The Importance of Structured Prompts**:
   - Well-structured prompts yield clear, actionable, and contextually relevant responses. By breaking down tasks into logical components, we guide the model to deliver organized outputs.

2. **Role of Context and Specificity**:
   - Providing detailed instructions and relevant background information helps the model focus on the desired outcome, reducing ambiguity and irrelevant outputs.

3. **Iterative Refinement for Quality**:
   - Testing and iterating prompts allows us to improve clarity, accuracy, and alignment with specific goals, ensuring outputs are consistent and actionable.

4. **Incorporating Creativity and Empathy**:
   - Adding creative elements, such as tone requirements, analogies, or next-step recommendations, enhances the adaptability and user-centered nature of the output.

5. **Evaluation and Feedback Loop**:
   - By evaluating prompts systematically, we ensure that they meet the intended objectives and identify areas for further refinement, creating a reusable and scalable workflow.

---

## **Activity Summary: Prompt Engineering for a Medical AI Assistant**

### **Step-by-Step Process**

1. **Define the Task**:
   - We started with a broad prompt ("What could be causing chest pain?") and iteratively refined it to focus on categorizing causes of chest pain by severity, using specific systems (cardiovascular, respiratory, gastrointestinal) as a guide.

2. **Add Context and Structure**:
   - Context was introduced to narrow the scope, and structure was added by organizing the causes into "mild," "moderate," and "severe" categories.

3. **Refine with Iterative Testing**:
   - We tested the prompt with an LLM, evaluated the output, and refined it further by clarifying expectations, such as including concise explanations and next steps for clinicians.

4. **Incorporate Creativity and Empathy**:
   - By tailoring tone (empathetic and professional), adding analogies, and suggesting actionable recommendations, we made the output both engaging and practical.

5. **Evaluate the Final Prompt**:
   - We created an evaluation framework to systematically assess the prompt's effectiveness across categorization accuracy, clarity, practicality, empathy, and adaptability.

---

## **What This Activity Demonstrates**

1. **Real-World Application**:  
   - This exercise demonstrates how prompt engineering can support real-world medical AI use cases, such as clinical decision-making or patient education.

2. **Adaptability Across Domains**:  
   - While focused on healthcare, the principles and techniques applied here can be adapted to various domains, including customer support, education, and content generation.

3. **Scalable Methodology**:  
   - The step-by-step process provides a repeatable framework for designing, testing, and refining prompts for any complex task.

---

### **Next Steps for Learners**

1. **Practice with Different Scenarios**:
   - Use the refined workflow to address other challenges, such as generating treatment plans, explaining complex diagnoses, or summarizing patient histories.

2. **Apply in New Domains**:
   - Experiment with prompts in non-medical contexts (e.g., law, education, finance) to test the versatility of the learned techniques.

3. **Refine Through Feedback**:
   - Share prompts with peers or test across different LLMs to identify areas for improvement and explore diverse outputs.

4. **Document and Reuse**:
   - Create a library of refined prompts for common tasks to build a scalable prompt engineering toolkit.
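
One lightweight way to start such a library is a JSON file keyed by prompt name. The sketch below is an assumption about how this could be organized, not a prescribed format: the file name, the `save_prompt` and `load_library` helpers, and the `notes` field are all hypothetical. It writes to a temporary directory so that running it leaves no files behind.

```python
import json
import tempfile
from pathlib import Path


def load_library(library_path: Path) -> dict:
    """Return the stored prompts, or an empty library if the file is absent."""
    if not library_path.exists():
        return {}
    return json.loads(library_path.read_text(encoding="utf-8"))


def save_prompt(library_path: Path, name: str, prompt: str, notes: str = "") -> None:
    """Add or overwrite one named prompt in the JSON library file."""
    library = load_library(library_path)
    library[name] = {"prompt": prompt, "notes": notes}
    library_path.write_text(json.dumps(library, indent=2), encoding="utf-8")


with tempfile.TemporaryDirectory() as tmp:
    library_file = Path(tmp) / "prompt_library.json"  # hypothetical location
    save_prompt(
        library_file,
        "chest_pain_triage",
        "You are a medical AI assistant supporting a clinician in "
        "assessing chest pain in adults.",
        notes="Evaluated against the five criteria in the Evaluation section.",
    )
    stored = load_library(library_file)

print(sorted(stored))
```

Keeping evaluation notes next to each prompt records why a version was accepted, which helps when the prompt is later adapted to a new domain or model.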

---

This summary and activity review encapsulates the learning outcomes and provides a foundation for applying prompt engineering techniques in professional and academic contexts.

## References

- Phoenix, J., and Taylor, M. 2024. *Prompt Engineering for Generative AI*. Sebastopol, CA: O'Reilly Media, Inc. Available at: [O'Reilly Media](https://www.oreilly.com/library/view/prompt-engineering-for/9781098153427/)

- Brown, T. B., Mann, B., Ryder, N., Subbiah, M., Kaplan, J., Dhariwal, P., et al. 2020. Language Models are Few-Shot Learners. In *Advances in Neural Information Processing Systems*, 33, 1877–1901. Available at: [NeurIPS Proceedings](https://proceedings.neurips.cc/paper/2020/file/1457c0d6bfcb4967418bfb8ac142f64a-Paper.pdf)

- Liu, P., Yuan, W., Fu, J., Jiang, Z., Hayashi, H., and Neubig, G. 2023. Pre-train, Prompt, and Predict: A Systematic Survey of Prompting Methods in Natural Language Processing. *ACM Computing Surveys*, 55(9), 1–35. Available at: [arXiv](https://arxiv.org/abs/2107.13586)

- Reynolds, L., and McDonell, K. 2021. Prompt Programming for Large Language Models: Beyond the Few-Shot Paradigm. *arXiv preprint arXiv:2102.07350*. Available at: [arXiv](https://arxiv.org/abs/2102.07350)

- White, J., Narayanan, A., Chiu, S., Huang, Y., and Schmidt, D. 2023. A Comprehensive Survey on Prompt Engineering for Large Language Models. *arXiv preprint arXiv:2302.07842*. Available at: [arXiv](https://arxiv.org/abs/2302.07842)



