
# Team Project: End-to-End Prompt Engineering with Healthcare \& Medical Research Dataset

## Project Overview

Your team will build a modular AI-powered application pipeline in Google Colab using the OpenAI API. The focus is on the **healthcare and medical research** domain, covering all major prompt engineering aspects from basics to advanced, including safety, ethics, privacy, integration, and evaluation.

You must adapt prompt designs, pipeline integration, and monitoring to this new dataset while ensuring responsible AI use.

***

## Dataset (embed in your notebook)

```
Healthcare & Medical Research Dataset:

1. Cancer Research: Focuses on understanding cancer biology, developing treatments, and improving early diagnosis. Advances include immunotherapy and targeted therapies. Challenges involve drug resistance and side-effects.

2. Cardiovascular Disease: Study of heart-related illnesses including prevention, diagnosis, and treatment. Key factors include hypertension, cholesterol, and lifestyle. Challenges are late detection and chronic management.

3. Infectious Diseases: Research on pathogens like bacteria, viruses, and fungi. Development of vaccines, antibiotics, and antiviral drugs. Challenges include drug resistance and emerging diseases.

4. Mental Health: Covers psychological disorders, therapy methods, and social impact. Emphasis on early intervention and reducing stigma. Challenges are access to care and individualized treatment.

5. Public Health: Focus on population health, epidemiology, and health policy. Aims to improve health outcomes through education and preventive measures. Challenges include health disparities and resource allocation.
```
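The dataset block above is plain text. To reuse individual entries in prompts, one option is to parse it into a topic-to-description mapping. This is a minimal sketch that assumes every entry follows the `N. Topic: description` pattern shown. `parse_dataset` and `DATASET_TEXT` are illustrative names, not part of the assignment:

```python
import re

# The five dataset entries from the block above, one per line
DATASET_TEXT = """\
1. Cancer Research: Focuses on understanding cancer biology, developing treatments, and improving early diagnosis. Advances include immunotherapy and targeted therapies. Challenges involve drug resistance and side-effects.
2. Cardiovascular Disease: Study of heart-related illnesses including prevention, diagnosis, and treatment. Key factors include hypertension, cholesterol, and lifestyle. Challenges are late detection and chronic management.
3. Infectious Diseases: Research on pathogens like bacteria, viruses, and fungi. Development of vaccines, antibiotics, and antiviral drugs. Challenges include drug resistance and emerging diseases.
4. Mental Health: Covers psychological disorders, therapy methods, and social impact. Emphasis on early intervention and reducing stigma. Challenges are access to care and individualized treatment.
5. Public Health: Focus on population health, epidemiology, and health policy. Aims to improve health outcomes through education and preventive measures. Challenges include health disparities and resource allocation.
"""

# Matches "N. Topic: description" at the start of a line (assumed entry format)
ENTRY_PATTERN = re.compile(r"^\s*\d+\.\s*([^:\n]+):\s*(.+)$", re.MULTILINE)

def parse_dataset(text: str) -> dict[str, str]:
    """Return {topic: description} for every numbered entry in the text."""
    return {topic.strip(): desc.strip() for topic, desc in ENTRY_PATTERN.findall(text)}

records = parse_dataset(DATASET_TEXT)
for topic, description in records.items():
    print(f"{topic}: {description[:60]}...")
```

With the entries keyed by topic, later prompts can select a single entry, such as `records["Mental Health"]`, rather than pasting the whole dataset.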


***

## Core Tasks for Teams

1. **Fundamentals**
    - Create direct and zero-shot/few-shot prompts explaining healthcare concepts.
    - Use clear instructions and test on dataset entries.
2. **Intermediate Techniques**
    - Chain-of-thought prompts breaking down causes and treatments per disease type.
    - Persona-based prompts (e.g., doctor, health educator) explaining complex info simply.
3. **Advanced**
    - Multi-turn conversation simulating a patient-doctor Q\&A, maintaining context.
    - Resolve ambiguity in vague medical prompts by refining with dataset facts.
    - Domain-specific prompts for summarizing research, recommending wellness habits, and extracting structured JSON data on diseases or treatments.
4. **Safety \& Ethics**
    - Write prompts avoiding medical misinformation and biased language.
    - Add instructions preventing unsafe advice.
    - Include disclaimers and respect patient privacy.
5. **Integration \& Version Control**
    - Implement prompt versioning with at least three phrasings for explaining public health.
    - Construct multi-stage pipelines (e.g., disease summary → suggested lifestyle changes).
6. **Monitoring \& Evaluation**
    - Log prompts, latencies, and response excerpts.
    - Perform keyword relevance checks (e.g., “treatment”, “risk”, “prevention”).
    - Calculate consistency scores over repeated prompt runs.
7. **Security \& Privacy**
    - Simulate refusal of private patient info requests.
    - Demonstrate safe prompt reframing and role restriction.
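The version-control task calls for at least three phrasings of a public-health explanation. This minimal sketch assumes the prompts live in an in-memory registry keyed by a version label. The `PromptVersion` class, the `PROMPT_VERSIONS` list, and the version strings are illustrative choices, not a required format:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PromptVersion:
    version: str   # label stored alongside each response when logging
    template: str  # uses {data} as the placeholder for the dataset entry

# Three phrasings of the same public-health request (illustrative wording)
PROMPT_VERSIONS = [
    PromptVersion("v1-direct", "Explain public health using this information:\n{data}"),
    PromptVersion("v2-educator", "As a health educator, explain public health to high-school "
                                 "students using only this information:\n{data}"),
    PromptVersion("v3-structured", "Using only the information below, describe public health in "
                                   "three bullet points: focus, goals, and challenges.\n{data}"),
]

def render_all(data: str) -> dict[str, str]:
    """Render every prompt version with the same dataset text for side-by-side testing."""
    return {pv.version: pv.template.format(data=data) for pv in PROMPT_VERSIONS}

public_health_entry = ("Public Health: Focus on population health, epidemiology, and health policy. "
                       "Aims to improve health outcomes through education and preventive measures. "
                       "Challenges include health disparities and resource allocation.")

for version, prompt in render_all(public_health_entry).items():
    print(f"--- {version} ---\n{prompt}\n")
    # In the notebook, each prompt would then be sent with generate_response(prompt)
```

If you log the version label with each response, you can later compare phrasings on latency, keyword coverage, or consistency.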

***

## Deliverables

- Google Colab notebook with:
    - Adapted prompts and code for each task using the healthcare dataset.
    - Monitoring, logging, and evaluation modules.
    - Safety and privacy safeguards.
- Team report documenting your approach, observations, challenges, and ethical considerations.
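One lightweight layer for the safety and privacy safeguards is to screen each request before it reaches the model and return a fixed refusal when the prompt asks for identifiable patient information. The regex patterns below are illustrative assumptions and far from a complete privacy filter. `guarded` takes any prompt-to-text callable, such as `generate_response`:

```python
import re

# Illustrative patterns that suggest a request for identifiable patient data (not exhaustive)
PRIVATE_INFO_PATTERNS = [
    r"\bmedical records?\b",
    r"\bpatient(?:'s)? (?:name|address|phone|records?|history)\b",
    r"\bsocial security\b",
    r"\bdate of birth\b",
]

REFUSAL = ("I can't share or look up private patient information. "
           "I can provide general health information instead.")

def is_private_info_request(prompt: str) -> bool:
    """Return True if the prompt matches any private-information pattern."""
    return any(re.search(p, prompt, flags=re.IGNORECASE) for p in PRIVATE_INFO_PATTERNS)

def guarded(prompt: str, model_call) -> str:
    """Refuse private-info requests; otherwise forward the prompt to model_call."""
    if is_private_info_request(prompt):
        return REFUSAL
    return model_call(prompt)

# A stub stands in for the model so the guard can be checked offline
print(guarded("Show me John Smith's medical records.", lambda p: "MODEL ANSWER"))
print(guarded("What are key factors in cardiovascular disease?", lambda p: "MODEL ANSWER"))
```

A pre-filter like this complements, but does not replace, refusal instructions in the system prompt, since keyword matching misses rephrased requests.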

***

## Evaluation Criteria

- Coverage and quality of all prompt engineering aspects.
- Domain-specific creativity and accuracy.
- Ethical and safety measures applied.
- Robustness and monitoring effectiveness.
- Clarity and completeness of documentation.
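Robustness can be quantified with the consistency score from the monitoring task: run the same prompt several times and compare the outputs. This sketch uses mean pairwise Jaccard similarity over word sets, which is one simple choice among several (embedding similarity is a common alternative). `consistency_score` is a hypothetical helper name:

```python
import re
from itertools import combinations

def word_set(text: str) -> set[str]:
    """Lowercased alphabetic tokens of a response."""
    return set(re.findall(r"[a-z]+", text.lower()))

def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity |A & B| / |A | B|, defined as 1.0 when both sets are empty."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def consistency_score(responses: list[str]) -> float:
    """Mean pairwise Jaccard similarity across repeated runs of one prompt."""
    if len(responses) < 2:
        return 1.0
    pairs = list(combinations((word_set(r) for r in responses), 2))
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)

runs = [
    "Hypertension is a key risk factor for heart disease.",
    "Hypertension is a major risk factor for heart disease.",
    "High blood pressure raises heart disease risk.",
]
print(f"Consistency: {consistency_score(runs):.2f}")
```

In the notebook, `runs` would come from something like `[generate_response(prompt) for _ in range(5)]`. Comparing scores at different `temperature` values shows how much sampling affects stability.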

***

💡 **Hint:** Extend with real healthcare challenges or recent medical advances but keep prompts factual, respectful, and unbiased.

***

In [None]:
from google.colab import userdata
import os

# Store your key once in Colab Secrets (key icon in the left sidebar) under the
# name OPENAI_API_KEY and enable notebook access; never hard-code it in a cell.

# Retrieve the key; userdata.get raises an error if the secret is missing or access is denied
try:
    openai_api_key = userdata.get("OPENAI_API_KEY")
except Exception:
    openai_api_key = None

if openai_api_key:
    os.environ["OPENAI_API_KEY"] = openai_api_key
    print("✅ OpenAI API key loaded safely")
else:
    print("❌ OpenAI API key not found. Please set it using Colab Secrets.")

In [None]:
!pip install --quiet openai

# Create the client from the key loaded in the previous cell
from openai import OpenAI
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

In [None]:
# Helper Function to Send Prompts
def generate_response(prompt, model="gpt-4o-mini", temperature=0.7):
    """
    Send a prompt to the OpenAI model and return the response text.
    """
    try:
        completion = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": prompt}],
            temperature=temperature
        )
        return completion.choices[0].message.content.strip()
    except Exception as e:
        return f"Error: {e}"
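The monitoring task asks for logs of prompts, latencies, and response excerpts. This minimal sketch assumes an in-memory list is enough for a notebook session. The wrapper accepts any prompt-to-text function, such as `generate_response` above, so it can also be exercised with a stub. `logged_call` and `PROMPT_LOG` are illustrative names:

```python
import time
from typing import Callable

PROMPT_LOG: list[dict] = []  # one record per call, kept in memory for this session

def logged_call(model_fn: Callable[[str], str], prompt: str,
                label: str = "", excerpt_chars: int = 120) -> str:
    """Call model_fn(prompt), record latency and an excerpt, and return the full response."""
    start = time.perf_counter()
    response = model_fn(prompt)
    latency = time.perf_counter() - start
    PROMPT_LOG.append({
        "label": label,
        "prompt": prompt,
        "latency_s": round(latency, 3),
        "excerpt": response[:excerpt_chars],
    })
    return response

# Usage with a stub; in the notebook, pass generate_response instead
reply = logged_call(lambda p: "Immunotherapy helps the immune system target cancer cells.",
                    "Explain immunotherapy.", label="direct")
print(PROMPT_LOG[-1])
```

To inspect the log as a table, pass it to `pandas.DataFrame(PROMPT_LOG)`.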


In [None]:
# Dataset entries, plus a single newline-joined string for embedding in prompts
dataset = [
    '1. Cancer Research: Focuses on understanding cancer biology, developing treatments, and improving early diagnosis. Advances include immunotherapy and targeted therapies. Challenges involve drug resistance and side-effects.',
    '2. Cardiovascular Disease: Study of heart-related illnesses including prevention, diagnosis, and treatment. Key factors include hypertension, cholesterol, and lifestyle. Challenges are late detection and chronic management.',
    '3. Infectious Diseases: Research on pathogens like bacteria, viruses, and fungi. Development of vaccines, antibiotics, and antiviral drugs. Challenges include drug resistance and emerging diseases.',
    '4. Mental Health: Covers psychological disorders, therapy methods, and social impact. Emphasis on early intervention and reducing stigma. Challenges are access to care and individualized treatment.',
    '5. Public Health: Focus on population health, epidemiology, and health policy. Aims to improve health outcomes through education and preventive measures. Challenges include health disparities and resource allocation.'
]
HEALTHCARE_DATASET = "\n".join(dataset)
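For the keyword relevance checks, one simple approach reports which target terms appear in a response. This sketch assumes whole-word, case-insensitive matching, so "treat" would not match "treatment" unless listed explicitly. The default keywords mirror the examples in the task list, and `keyword_relevance` is a hypothetical helper:

```python
import re

DEFAULT_KEYWORDS = ("treatment", "risk", "prevention")

def keyword_relevance(response: str, keywords=DEFAULT_KEYWORDS) -> dict:
    """Report which keywords occur (whole-word, case-insensitive) and the fraction found."""
    found = [kw for kw in keywords
             if re.search(rf"\b{re.escape(kw)}\b", response, flags=re.IGNORECASE)]
    return {
        "found": found,
        "missing": [kw for kw in keywords if kw not in found],
        "score": len(found) / len(keywords) if keywords else 0.0,
    }

# Checked here against a dataset entry; in practice, pass a model response
entry = ("Cardiovascular Disease: Study of heart-related illnesses including prevention, "
         "diagnosis, and treatment. Key factors include hypertension, cholesterol, and lifestyle.")
print(keyword_relevance(entry))
```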

In [None]:
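# Fundamentals
direct_prompt = "Explain how immunotherapy works in cancer treatment."
print("Direct prompt:")
print(generate_response(direct_prompt))

zero_shot_prompt = "Explain what pathogens are to a class of medical students."
print("\nZero-shot prompt:")
print(generate_response(zero_shot_prompt))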

# Not mine
few_shot = """You are a medical educator. Below is an example of how to explain a topic simply.

Example 1:
Q: What is hypertension?
A: Hypertension means high blood pressure: the force of blood pushing on artery walls is too high. It raises the risk of heart disease; lifestyle changes and medication can help.

Now explain the following topic using the data provided:
Q: {topic}
DATA: {data}
A:"""
print("Few-shot concept:")
print(generate_response(few_shot.format(topic="Public Health", data=HEALTHCARE_DATASET)))

multi_shot_prompt = """
Topic: Mental Health
Explanation: Covers psychological disorders, therapy methods, and social impact. Emphasis on early intervention and reducing stigma.
Challenges: Access to care and individualized treatment.

Topic: Public Health
Explanation: Focuses on population health, epidemiology, and health policy. Aims to improve health outcomes through education and preventive measures.
Challenges: Health disparities and resource allocation.

Topic: Infectious Diseases
Explanation:
Challenges:
"""

multi_prompt = "Following the format of the examples, complete the Explanation and Challenges for the last topic:"

print("Multi-shot response:\n")
print(generate_response(multi_prompt + multi_shot_prompt))
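Few-shot prompts like the ones above can also be generated from the dataset entries themselves, holding one topic out as the query. In this sketch, the `ENTRIES` dict inlines two dataset entries so the block runs on its own, and `build_few_shot` is a hypothetical helper:

```python
# Two dataset entries used as worked examples (copied from the dataset)
ENTRIES = {
    "Mental Health": "Covers psychological disorders, therapy methods, and social impact. "
                     "Emphasis on early intervention and reducing stigma. "
                     "Challenges are access to care and individualized treatment.",
    "Public Health": "Focus on population health, epidemiology, and health policy. "
                     "Aims to improve health outcomes through education and preventive measures. "
                     "Challenges include health disparities and resource allocation.",
}

def build_few_shot(examples: dict[str, str], query_topic: str) -> str:
    """Format each (topic, explanation) pair as a worked example, then append the open query."""
    parts = ["Explain each topic in the same style as the examples.\n"]
    for topic, explanation in examples.items():
        parts.append(f"Topic: {topic}\nExplanation: {explanation}\n")
    parts.append(f"Topic: {query_topic}\nExplanation:")
    return "\n".join(parts)

prompt = build_few_shot(ENTRIES, "Infectious Diseases")
print(prompt)
# In the notebook: print(generate_response(prompt))
```

Generating the examples this way keeps the prompt in sync with the dataset if entries change.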

In [None]:
# Intermediate
cot_dataset_prompt = """
Explain the role of hypertension in cardiovascular disease using only the information in the dataset below.

These are the steps to follow:
1. Identify relevant parts of the dataset.
2. Explain your reasoning step-by-step using only dataset content.
3. Provide the final answer.

If the dataset does not contain enough information, respond with 'Insufficient information in dataset.'

Dataset:
    1. Cancer Research: Focuses on understanding cancer biology, developing treatments, and improving early diagnosis. Advances include immunotherapy and targeted therapies. Challenges involve drug resistance and side-effects.
    2. Cardiovascular Disease: Study of heart-related illnesses including prevention, diagnosis, and treatment. Key factors include hypertension, cholesterol, and lifestyle. Challenges are late detection and chronic management.
    3. Infectious Diseases: Research on pathogens like bacteria, viruses, and fungi. Development of vaccines, antibiotics, and antiviral drugs. Challenges include drug resistance and emerging diseases.
    4. Mental Health: Covers psychological disorders, therapy methods, and social impact. Emphasis on early intervention and reducing stigma. Challenges are access to care and individualized treatment.
    5. Public Health: Focus on population health, epidemiology, and health policy. Aims to improve health outcomes through education and preventive measures. Challenges include health disparities and resource allocation.
"""

print("Chain-of-thought (dataset-grounded):")
print(generate_response(cot_dataset_prompt))

# Not mine
chain_of_thought_prompt = """
Topic: {topic}

Dataset:
{data}

1. Think step-by-step.
2. Explain the concept of {topic} clearly and concisely.
3. Define technical terms like "pathogens", "epidemiology", and "antimicrobial resistance".
4. Provide practical takeaways, such as the importance of vaccination and handwashing.
5. Structure the explanation logically, starting with the basic definition and moving to prevention and challenges.

Explanation:
"""

print("Chain-of-thought concept:")
print(generate_response(chain_of_thought_prompt.format(topic="Infectious Diseases", data=HEALTHCARE_DATASET)))


# Not mine
persona_prompt = """
You are a friendly family doctor speaking to a curious patient.
Your job is to explain the topic below in everyday language,
avoiding medical jargon unless you define it clearly.
Use a warm and reassuring tone, and end with one simple tip
the patient can use in their daily life.

Topic: {topic}

Dataset:
{data}

Doctor's Explanation:
"""

print("Persona-based concept:")
print(generate_response(persona_prompt.format(topic="Cardiovascular Disease", data=HEALTHCARE_DATASET)))
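Task 2 also mentions other personas, such as a health educator. One way to compare them is to keep the persona descriptions in a dictionary and wrap the same topic and data with each. The persona wording below is illustrative, and `build_persona_prompt` is a hypothetical helper, named so it does not shadow the `persona_prompt` string above:

```python
PERSONAS = {
    "family_doctor": "You are a friendly family doctor speaking to a curious patient. "
                     "Use everyday language and define any medical term you use.",
    "health_educator": "You are a public health educator addressing a community workshop. "
                       "Use simple examples and focus on prevention.",
    "research_scientist": "You are a medical researcher briefing non-specialist journalists. "
                          "Be precise and avoid overstating findings.",
}

def build_persona_prompt(persona: str, topic: str, data: str) -> str:
    """Combine a persona description with the topic, dataset text, and a safety reminder."""
    return (f"{PERSONAS[persona]}\n"
            "Provide general information only; do not diagnose or give personal medical advice.\n\n"
            f"Topic: {topic}\nDataset:\n{data}\n\nExplanation:")

data = ("Cardiovascular Disease: Study of heart-related illnesses including prevention, diagnosis, "
        "and treatment. Key factors include hypertension, cholesterol, and lifestyle.")
for name in PERSONAS:
    print(f"--- {name} ---\n{build_persona_prompt(name, 'Cardiovascular Disease', data)}\n")
    # In the notebook: print(generate_response(build_persona_prompt(name, ...)))
```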

In [None]:
# Reuse the dataset entries from the earlier cell, joined one entry per line
dataset_text = "\n".join(dataset)

# --- Common instruction header (persona + constraints) ---
header = (
    "Role: You are an empathetic AI doctor in a simulated patient–doctor Q&A.\n"
    "Constraints: Use ONLY the dataset provided below. If the dataset lacks information to answer, reply exactly with "
    "'Insufficient information in dataset.' Do NOT diagnose or give personalized medical advice. Provide general "
    "information only. Keep replies concise (120–180 words).\n\n"
    f"Dataset:\n{dataset_text}\n\n"
)

# -------------------------
# Turn 1: Basic explanation
# -------------------------
turn1 = header + \
"""Patient: I've had occasional chest tightness after climbing stairs. Should I be worried?
Task: Provide a general explanation relevant to cardiovascular disease based on the dataset. End with one clarifying question to the patient."""
res1 = generate_response(turn1)
print("=== Turn 1 ===\n", res1)

# ---------------------------------
# Turn 2: Build on previous answer
# ---------------------------------
turn2 = header + f"""Based on your previous explanation:
{res1}

Patient: It eases after a minute of rest.
Task: Build on your prior explanation using ONLY the dataset. Discuss the relevance of hypertension and lifestyle as key factors. Keep within general info; no diagnosis. End with one clarifying question."""
res2 = generate_response(turn2)
print("\n=== Turn 2 ===\n", res2)

# -------------------------------------------------
# Turn 3: Use all context for practical next steps
# -------------------------------------------------
turn3 = header + f"""
Using the previous answers:
1) {res1}
2) {res2}

Task: Provide general, non-personalized guidance drawn from the dataset about ongoing management challenges in cardiovascular disease and high-level preventive approaches (e.g., lifestyle). If appropriate, note when seeking professional care is reasonable without diagnosing. End with one clarifying question."""
res3 = generate_response(turn3)
print("\n=== Turn 3 ===\n", res3)
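The turns above carry context by pasting earlier answers into each new prompt. The Chat Completions API also accepts a running `messages` list with `system`, `user`, and `assistant` roles. With that approach, every patient message and reply stays in the history without manual copying. This sketch assumes a `chat_fn(messages) -> str` callable. `Conversation` and `stub_chat` are illustrative names, and the stub lets the flow run offline:

```python
from typing import Callable

Message = dict[str, str]

class Conversation:
    """Keeps the full message history so every turn sees all earlier turns."""

    def __init__(self, system_prompt: str, chat_fn: Callable[[list[Message]], str]):
        self.messages: list[Message] = [{"role": "system", "content": system_prompt}]
        self.chat_fn = chat_fn

    def ask(self, user_text: str) -> str:
        """Append the patient message, get a reply, and store the reply in the history."""
        self.messages.append({"role": "user", "content": user_text})
        reply = self.chat_fn(self.messages)
        self.messages.append({"role": "assistant", "content": reply})
        return reply

def stub_chat(messages: list[Message]) -> str:
    # Offline stand-in for the model: reports how many patient turns it has seen
    user_turns = sum(1 for m in messages if m["role"] == "user")
    return f"(stub reply to patient turn {user_turns})"

convo = Conversation("You are an empathetic AI doctor. Provide general information only.", stub_chat)
print(convo.ask("I've had occasional chest tightness after climbing stairs. Should I be worried?"))
print(convo.ask("It eases after a minute of rest."))
```

To use the real model, `chat_fn` could be `lambda msgs: client.chat.completions.create(model="gpt-4o-mini", messages=msgs).choices[0].message.content`, and the `header` text above could serve as the system prompt.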