<a href="https://colab.research.google.com/github/appliedcode/mthree-c422/blob/mthree-c422-dipti/Exercises/day-12/Prompt_Production/Prompt_Evaluation_Practice.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>


## **Self‑Practice Problem Set – Prompt Integration & Performance Evaluation**

***

### **Section 1 – Integrating Prompts in Applications and Pipelines**

**Problem 1 – Multi‑Stage Prompt Pipeline**

- Design a two‑step pipeline where:
    1. The first prompt summarizes an article.
    2. The second prompt takes that summary and generates 3 quiz questions.
- Implement a function that automatically feeds the output of step 1 as the input to step 2.
- Test with at least two different articles to ensure reusability.

***

**Problem 2 – Role‑Switching in Prompt Pipelines**

- Create a small application where the AI can operate in two distinct roles:
    1. As a **technical explainer** (breaking a topic into bullet points).
    2. As a **motivational mentor** (giving encouragement on the same topic).
- Implement a role selector in code so the same base question can be processed in either role.

***

**Problem 3 – Prompt Version Control Simulation**

- Store three versions of a prompt for a travel chatbot:
    - Version 1: Minimal instructions
    - Version 2: More detailed instructions
    - Version 3: Includes examples (few‑shot)
- Write code to allow “switching” versions dynamically via a variable or function parameter.
- Compare outputs for the same test queries.

***

### **Section 2 – Prompt Performance Measurement and Evaluation**

**Problem 4 – Keyword‑Based Scoring**

- Choose a test prompt (e.g., “Explain how photosynthesis works”).
- Define a set of 5 keywords you expect in the response.
- Write a function to:
    - Run the prompt.
    - Score the output based on how many expected keywords appear.
- Display a “keyword match score” for each run.

***

**Problem 5 – Response Length Evaluation**

- For a given prompt, run it at least 3 times.
- Record the **word count** for each output.
- Calculate:
    - Average length.
    - Minimum and maximum length.
- Reflect on whether differences in length affect clarity.

***

**Problem 6 – Consistency Measurement**

- Run the same prompt 5 times with the same parameters.
- Compare each output’s similarity using a text‑matching approach (e.g., `difflib.SequenceMatcher`).
- Calculate an average **consistency score**.
- Decide if the current prompt is “stable” or needs refinement.

***

**Problem 7 – Simulated User Feedback Loop**

- Create a simple rating system in which, after each AI response, a “user” assigns a rating from 1 to 5 (manually or randomly).
- Store prompt, response excerpt, and rating in a log.
- At the end, output the **average rating per prompt version** to identify better‑performing prompts.

***

**Problem 8 – Latency Tracking for Prompt Performance**

- Time how long the API takes to return a response for each prompt call.
- Record:
    - Prompt text
    - Response time
    - Word count of output
- Plot or print the trade‑off between latency and length, noting any patterns.

***

### **📌 Instructions for Students**

For each problem:

1. Implement the code in Google Colab.
2. Try with at least **two different prompts or datasets**.
3. For performance evaluation tasks, **capture metrics** (keywords, similarity, latency) and store results in a dictionary or DataFrame.
4. At the end, write a short reflection on:
    - Which prompt style worked best?
    - Which metrics were most useful for evaluation?
    - What trade‑offs did you notice between detail, speed, and consistency?

***


In [1]:
!pip install --quiet openai

from google.colab import userdata
import os, time
from openai import OpenAI

api_key = userdata.get("OPENAI_API_KEY")
if not api_key:
    raise ValueError("❌ API key not found. Please set it in Colab Secrets (userdata).")

os.environ["OPENAI_API_KEY"] = api_key
client = OpenAI()
print("✅ OpenAI client ready")

✅ OpenAI client ready


In [2]:
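def get_ai_response(prompt, model="gpt-4o-mini", temperature=0.7):
    """Send one user prompt and return (text, latency_seconds).

    On failure this returns ("Error: ...", None), so callers should check
    whether latency is None before treating the text as a model response.
    """
    try:
        start_time = time.perf_counter()  # monotonic clock, suited to timing
        resp = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": prompt}],
            temperature=temperature
        )
        latency = time.perf_counter() - start_time
        return resp.choices[0].message.content.strip(), latency
    except Exception as e:
        # Errors are returned as text rather than raised, so later cells keep running
        return f"Error: {e}", None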

In [3]:
# Step 1: Summarize article
article = """
Artificial intelligence is being used to improve renewable energy efficiency.
Smart algorithms help predict electricity demand, optimize grid usage,
and integrate solar and wind power into existing infrastructure.
"""

summary_prompt = f"Summarize this article in 2 concise sentences:\n\n{article}"
summary, _ = get_ai_response(summary_prompt)
print("Summary:", summary, "\n")

# Step 2: Generate quiz questions from summary
quiz_prompt = f"From this summary, generate 3 quiz questions:\n\n{summary}"
quiz, _ = get_ai_response(quiz_prompt)
print("Quiz Questions:\n", quiz)

Summary: Artificial intelligence is enhancing renewable energy efficiency by utilizing smart algorithms to forecast electricity demand and optimize grid operations. These advancements facilitate the integration of solar and wind power into existing energy infrastructure. 

Quiz Questions:
 1. How is artificial intelligence improving renewable energy efficiency according to the summary?
   
2. What role do smart algorithms play in the integration of renewable energy sources like solar and wind power into existing energy infrastructure?

3. In what ways do advancements in AI contribute to the optimization of grid operations?
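
Problem 1 also asks for a function that feeds step 1 into step 2 and for a test on at least two articles. The cell above runs the two steps inline, so the following is a minimal sketch of one way to wrap them. The names `run_summary_quiz_pipeline`, `ask`, and `fake_ask` are hypothetical, and `fake_ask` is an offline stand-in for the model so the sketch runs without an API key.

```python
from typing import Callable, Dict


def run_summary_quiz_pipeline(article: str, ask: Callable[[str], str]) -> Dict[str, str]:
    """Summarize an article, then build quiz questions from that summary.

    `ask` is any callable that takes a prompt string and returns text
    (an assumption of this sketch, not part of the original notebook).
    """
    summary_prompt = f"Summarize this article in 2 concise sentences:\n\n{article.strip()}"
    summary = ask(summary_prompt)

    # The output of step 1 becomes the input of step 2.
    quiz_prompt = f"From this summary, generate 3 quiz questions:\n\n{summary}"
    quiz = ask(quiz_prompt)
    return {"summary": summary, "quiz": quiz}


def fake_ask(prompt: str) -> str:
    """Offline stand-in for the model: echoes the last line of the prompt."""
    return "[model output for] " + prompt.strip().splitlines()[-1]


# Placeholder articles, used only to show that the function is reusable.
articles = [
    "Solar panels convert sunlight into electricity.",
    "Wind turbines convert moving air into electricity.",
]
results = [run_summary_quiz_pipeline(a, fake_ask) for a in articles]
for r in results:
    print("Summary:", r["summary"])
    print("Quiz:", r["quiz"], "\n")
```

In the notebook itself, `ask` could be `lambda p: get_ai_response(p)[0]`, which drops the latency value returned by the helper defined earlier.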


In [4]:
base_question = "Explain the importance of cybersecurity for small businesses."

role_explainer = f"You are a technical explainer. Break the following topic into bullet points:\n{base_question}"
role_coach = f"You are a motivational mentor. Encourage the reader to learn about:\n{base_question}"

print("=== Technical Explainer ===")
print(get_ai_response(role_explainer)[0], "\n")

print("=== Motivational Mentor ===")
print(get_ai_response(role_coach)[0])

=== Technical Explainer ===
### Importance of Cybersecurity for Small Businesses

- **Protection of Sensitive Data**
  - Safeguards customer information, financial records, and proprietary business data.
  - Reduces the risk of data breaches that can compromise personal and financial information.

- **Maintaining Customer Trust**
  - Builds and maintains customer confidence in the business.
  - Customers are more likely to engage with businesses that demonstrate robust cybersecurity practices.

- **Compliance with Regulations**
  - Helps meet legal and regulatory requirements for data protection (e.g., GDPR, HIPAA).
  - Avoids penalties and fines associated with non-compliance.

- **Prevention of Financial Loss**
  - Minimizes the risk of financial losses due to cyberattacks, fraud, or operational disruptions.
  - Protects against costs related to recovery, legal fees, and reputational damage.

- **Business Continuity**
  - Ensures that business operations can continue smoothly in the 
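
Problem 2 asks for a role selector in code, whereas the cell above builds the two role prompts by hand. One possible selector, sketched under the assumption that a role is just a prompt template looked up by name, is shown below. `ROLE_TEMPLATES` and `build_role_prompt` are hypothetical names; the template wording is copied from the cell above.

```python
ROLE_TEMPLATES = {
    "explainer": (
        "You are a technical explainer. "
        "Break the following topic into bullet points:\n{question}"
    ),
    "mentor": (
        "You are a motivational mentor. "
        "Encourage the reader to learn about:\n{question}"
    ),
}


def build_role_prompt(role: str, question: str) -> str:
    """Return the prompt for `question` in the selected role."""
    try:
        template = ROLE_TEMPLATES[role]
    except KeyError:
        # Fail loudly on typos instead of silently using a default role.
        raise ValueError(
            f"Unknown role {role!r}; choose from {sorted(ROLE_TEMPLATES)}"
        ) from None
    return template.format(question=question)


base_question = "Explain the importance of cybersecurity for small businesses."
for role in ROLE_TEMPLATES:
    print(f"=== {role} ===")
    print(build_role_prompt(role, base_question), "\n")
```

Adding a third role then only requires a new dictionary entry; the resulting string can be passed to `get_ai_response` unchanged.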

In [5]:
prompt_v1 = "Give travel tips for visiting Japan."
prompt_v2 = "You are a travel assistant. Suggest 3 budget-friendly travel tips for Japan."
prompt_v3 = """You are a helpful travel assistant.
Example:
1. Visit free attractions like parks
2. Eat at local markets
3. Use public transport
Now give 3 budget-friendly Japan travel tips in similar style."""

versions = [prompt_v1, prompt_v2, prompt_v3]

for i, pv in enumerate(versions, 1):
    print(f"--- Version {i} ---")
    print(get_ai_response(pv)[0], "\n")

--- Version 1 ---
Visiting Japan can be a wonderful experience filled with rich culture, delicious food, and breathtaking sights. Here are some travel tips to help you make the most of your trip:

### 1. **Learn Basic Japanese Phrases**
   - While many Japanese people speak some English, learning basic phrases like "arigato" (thank you), "sumimasen" (excuse me/sorry), and "konnichiwa" (hello) can go a long way in showing respect for the culture.

### 2. **Get a Japan Rail Pass**
   - If you plan to travel between cities, consider purchasing a Japan Rail Pass. It allows unlimited travel on most JR trains, including the Shinkansen (bullet train), for a set number of days.

### 3. **Use Public Transportation**
   - Japan has an extensive and efficient public transportation system. Familiarize yourself with local trains, subways, and buses. Apps like Hyperdia can help you navigate routes and schedules.

### 4. **Cash is King**
   - While credit cards are becoming more accepted, many places
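
Problem 3 asks for dynamic version switching through a variable or function parameter, while the loop above always iterates over all three prompts. A small registry is one way to sketch that. `PROMPT_VERSIONS`, `get_prompt`, and `compare_versions` are hypothetical names, and `ask` is assumed to be any callable that maps a prompt string to a response string.

```python
PROMPT_VERSIONS = {
    "v1": "Give travel tips for visiting Japan.",
    "v2": "You are a travel assistant. Suggest 3 budget-friendly travel tips for Japan.",
    "v3": (
        "You are a helpful travel assistant.\n"
        "Example:\n"
        "1. Visit free attractions like parks\n"
        "2. Eat at local markets\n"
        "3. Use public transport\n"
        "Now give 3 budget-friendly Japan travel tips in similar style."
    ),
}


def get_prompt(version: str = "v3") -> str:
    """Look up a stored prompt by version label."""
    if version not in PROMPT_VERSIONS:
        raise KeyError(f"Unknown prompt version {version!r}")
    return PROMPT_VERSIONS[version]


def compare_versions(ask, versions=None):
    """Run every requested version through `ask` and collect the outputs."""
    versions = versions or list(PROMPT_VERSIONS)
    return {v: ask(get_prompt(v)) for v in versions}


# Offline demonstration: the stand-in "model" just reports the prompt length.
outputs = compare_versions(lambda p: f"{len(p)} characters in prompt")
for version, text in outputs.items():
    print(version, "->", text)
```

Keeping the prompts in one dictionary means the active version can be changed with a single parameter, which is the behavior the problem statement describes.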

In [6]:
prompt = "Explain how photosynthesis works."
expected_keywords = ["photosynthesis", "sunlight", "chlorophyll", "glucose", "oxygen"]

response, _ = get_ai_response(prompt)
print("Response:\n", response, "\n")

score = sum(1 for kw in expected_keywords if kw.lower() in response.lower())
print(f"Keyword Match Score: {score}/{len(expected_keywords)}")

Response:
 Photosynthesis is a biochemical process through which green plants, algae, and some bacteria convert light energy into chemical energy stored in glucose. This process plays a crucial role in the Earth's ecosystem, as it produces oxygen and serves as the foundation for the food chain.

### Key Components of Photosynthesis:
1. **Light Energy**: Primarily from the sun.
2. **Chlorophyll**: A green pigment found in chloroplasts that captures light energy.
3. **Water (H₂O)**: Absorbed by plant roots from the soil.
4. **Carbon Dioxide (CO₂)**: Taken from the atmosphere through tiny openings in leaves called stomata.

### The Process of Photosynthesis:
Photosynthesis occurs mainly in two stages: the light-dependent reactions and the light-independent reactions (Calvin cycle).

#### 1. Light-Dependent Reactions:
- **Location**: Thylakoid membranes of the chloroplasts.
- **Process**:
  - When chlorophyll absorbs sunlight, it energizes electrons, which are then transferred through a se
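
The scoring line above uses substring matching, so a keyword such as "oxygen" would also count inside "oxygenated". If whole-word matching is preferred, a function along the following lines could replace it. This is a sketch: `keyword_match_score` is a hypothetical name, and the sample sentence is a placeholder rather than model output.

```python
import re


def keyword_match_score(text, keywords):
    """Count which expected keywords appear in `text` as whole words."""
    found = [
        kw for kw in keywords
        if re.search(rf"\b{re.escape(kw)}\b", text, flags=re.IGNORECASE)
    ]
    missing = [kw for kw in keywords if kw not in found]
    return {
        "score": len(found),
        "total": len(keywords),
        "found": found,
        "missing": missing,
    }


expected_keywords = ["photosynthesis", "sunlight", "chlorophyll", "glucose", "oxygen"]
sample = "Plants use sunlight and chlorophyll to make glucose."  # placeholder text

result = keyword_match_score(sample, expected_keywords)
print(f"Keyword Match Score: {result['score']}/{result['total']}")
print("Missing:", result["missing"])
```

Whole-word matching is stricter than the substring check: it also misses plurals and other inflected forms, so which variant is better depends on how the keywords are chosen.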

In [7]:
runs = [get_ai_response("What are the benefits of exercise?")[0] for _ in range(3)]
lengths = [len(r.split()) for r in runs]

for i, r in enumerate(runs, 1):
    print(f"Run {i} length: {lengths[i-1]} words")

print("Average length:", sum(lengths) / len(lengths))
print("Min length:", min(lengths))
print("Max length:", max(lengths))

Run 1 length: 377 words
Run 2 length: 378 words
Run 3 length: 370 words
Average length: 375.0
Min length: 370
Max length: 378
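
The instructions ask for metrics to be stored in a dictionary or DataFrame. A minimal sketch of the same length statistics returned as a dictionary follows. `length_stats` is a hypothetical name, and the placeholder strings are built only to reproduce the word counts recorded above (377, 378, 370).

```python
from statistics import mean


def length_stats(responses):
    """Return word-count statistics for a list of response strings."""
    counts = [len(r.split()) for r in responses]
    if not counts:
        raise ValueError("need at least one response")
    return {
        "counts": counts,
        "mean": mean(counts),
        "min": min(counts),
        "max": max(counts),
        "spread": max(counts) - min(counts),  # quick indicator of variability
    }


# Placeholder strings with the same word counts as the three runs above.
placeholder_runs = ["word " * n for n in (377, 378, 370)]
stats = length_stats(placeholder_runs)
print(stats)
```

Because the helper `get_ai_response` returns error text on failure, it may be worth filtering out runs whose latency is `None` before computing these figures.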


In [8]:
import difflib

prompt = "Define cloud computing."
responses = [get_ai_response(prompt)[0] for _ in range(5)]

scores = []
for i in range(len(responses) - 1):
    sim = difflib.SequenceMatcher(None, responses[i], responses[i+1]).ratio()
    scores.append(sim)
    print(f"Similarity Run {i+1} vs Run {i+2}: {sim:.2f}")

print("Average similarity:", sum(scores) / len(scores))

Similarity Run 1 vs Run 2: 0.89
Similarity Run 2 vs Run 3: 1.00
Similarity Run 3 vs Run 4: 0.02
Similarity Run 4 vs Run 5: 0.01
Average similarity: 0.48106511788670714
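
The loop above compares only neighbouring runs (1 vs 2, 2 vs 3, and so on), which gives 4 comparisons for 5 responses. An alternative is to average over every pair, which gives 10 comparisons and does not depend on the order of the runs. The sketch below assumes that definition of a consistency score; `consistency_score` is a hypothetical name and the sample strings are placeholders.

```python
from difflib import SequenceMatcher
from itertools import combinations


def consistency_score(responses):
    """Average SequenceMatcher ratio over all unordered pairs of responses."""
    pairs = list(combinations(responses, 2))
    if not pairs:
        raise ValueError("need at least two responses")
    ratios = [SequenceMatcher(None, a, b).ratio() for a, b in pairs]
    return sum(ratios) / len(ratios)


# Placeholder responses: two identical strings and one with no shared characters.
samples = ["aaaa", "aaaa", "bbbb"]
print(f"Average pairwise similarity: {consistency_score(samples):.2f}")
```

`SequenceMatcher.ratio()` works on characters, so it measures surface overlap rather than meaning; two correct answers with different wording can still score low.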


In [9]:
import random

prompt_versions = [
    "Explain blockchain in simple terms.",
    "Explain blockchain in simple terms with a real-world analogy."
]

log = []

for pv in prompt_versions:
    response, _ = get_ai_response(pv)
    rating = random.randint(1, 5)  # Simulated feedback
    log.append({"prompt": pv, "excerpt": response[:60], "rating": rating})

for entry in log:
    print(entry)

avg_ratings = {}
for pv in prompt_versions:
    ratings = [e["rating"] for e in log if e["prompt"] == pv]
    avg_ratings[pv] = sum(ratings) / len(ratings)

print("Average ratings:", avg_ratings)

{'prompt': 'Explain blockchain in simple terms.', 'excerpt': "Error: Error code: 429 - {'error': {'message': 'Rate limit r", 'rating': 1}
{'prompt': 'Explain blockchain in simple terms with a real-world analogy.', 'excerpt': 'Sure! Think of blockchain like a public library that everyon', 'rating': 5}
Average ratings: {'Explain blockchain in simple terms.': 1.0, 'Explain blockchain in simple terms with a real-world analogy.': 5.0}
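
In the log above, the first entry is a rate-limit error that still received a rating, and each prompt version has only one rating, so the averages say little about prompt quality. One way to handle both issues is sketched below: failed calls are skipped and several ratings are collected per version. The function name, the `version` key, and every value in `demo_log` are made up for illustration and are not results.

```python
from collections import defaultdict


def average_rating_by_version(log):
    """Average the ratings per prompt version, ignoring failed calls."""
    grouped = defaultdict(list)
    for entry in log:
        # The helper returns text starting with "Error:" when a call fails.
        if entry["excerpt"].startswith("Error:"):
            continue
        grouped[entry["version"]].append(entry["rating"])
    return {version: sum(r) / len(r) for version, r in grouped.items()}


# Made-up log entries, for illustration only.
demo_log = [
    {"version": "v1", "excerpt": "Blockchain is a shared ledger", "rating": 3},
    {"version": "v1", "excerpt": "Error: Error code: 429", "rating": 1},
    {"version": "v1", "excerpt": "A blockchain is a list of records", "rating": 4},
    {"version": "v2", "excerpt": "Think of blockchain like a public library", "rating": 5},
    {"version": "v2", "excerpt": "Imagine a shared notebook", "rating": 4},
]
print("Average ratings:", average_rating_by_version(demo_log))
```

With random ratings the averages remain noise; the structure only becomes informative once the ratings come from a person or a scoring rule.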


In [12]:
test_prompts = [
    "Summarize the plot of Romeo and Juliet in 2 sentences.",
    "Write a 200-word essay on climate change impacts."
]

for p in test_prompts:
    resp, latency = get_ai_response(p)
    print(f"Prompt: {p}")
    if latency is not None:
        print(f"Latency: {latency:.2f} sec | Word count: {len(resp.split())}")
    else:
        print(f"Error: {resp}")
    print()

Prompt: Summarize the plot of Romeo and Juliet in 2 sentences.
Error: Error: Error code: 429 - {'error': {'message': 'Rate limit reached for gpt-4o-mini in organization org-PXo9R2TUiiTEAM2j0VKjrIlU on requests per day (RPD): Limit 200, Used 200, Requested 1. Please try again in 7m12s. Visit https://platform.openai.com/account/rate-limits to learn more. You can increase your rate limit by adding a payment method to your account at https://platform.openai.com/account/billing.', 'type': 'requests', 'param': None, 'code': 'rate_limit_exceeded'}}

Prompt: Write a 200-word essay on climate change impacts.
Error: Error: Error code: 429 - {'error': {'message': 'Rate limit reached for gpt-4o-mini in organization org-PXo9R2TUiiTEAM2j0VKjrIlU on requests per min (RPM): Limit 3, Used 3, Requested 1. Please try again in 20s. Visit https://platform.openai.com/account/rate-limits to learn more. You can increase your rate limit by adding a payment method to your account at https://platform.openai.com/
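
Both calls above failed with HTTP 429, once on the requests-per-minute limit and once on the requests-per-day limit, so no latency was measured. For the per-minute case, retrying with exponential backoff is a common remedy. The sketch below is generic and offline: `call_with_backoff`, `FakeRateLimitError`, and `flaky` are hypothetical names, and the injected `sleep` lets the example run without actually waiting.

```python
import time


def call_with_backoff(fn, retry_on=(Exception,), max_attempts=4,
                      base_delay=1.0, sleep=time.sleep):
    """Call fn(); on a retryable error wait 1x, 2x, 4x ... base_delay and retry."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** (attempt - 1))


class FakeRateLimitError(Exception):
    """Stand-in for a rate-limit exception (hypothetical, for the demo only)."""


attempts = {"n": 0}


def flaky():
    """Fails twice with a fake 429, then succeeds."""
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise FakeRateLimitError("429")
    return "ok"


waits = []  # records the requested delays instead of sleeping
result = call_with_backoff(flaky, retry_on=(FakeRateLimitError,), sleep=waits.append)
print(result, "after waits of", waits)
```

Two caveats apply. First, `get_ai_response` catches every exception and returns text, so a wrapper like this only helps around a call that raises; with the 1.x `openai` SDK the exception to pass as `retry_on` is likely `openai.RateLimitError`, which should be checked against the installed version. Second, short backoff delays do not help with the daily limit, where the message above asks for a wait of several minutes.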