## Welcome to the Second Lab - Week 1, Day 3 (Modified)

Today we will work with multiple OpenAI models! This is a way to get comfortable with APIs and compare different model capabilities.

<table style="margin: 0; text-align: left; width:100%">
    <tr>
        <td style="width: 150px; height: 150px; vertical-align: middle;">
            <img src="../assets/stop.png" width="150" height="150" style="display: block;" />
        </td>
        <td>
            <h2 style="color:#ff7800;">Important point - please read</h2>
            <span style="color:#ff7800;">The way I collaborate with you may be different to other courses you've taken. I prefer not to type code while you watch. Rather, I execute Jupyter Labs, like this, and give you an intuition for what's going on. My suggestion is that you carefully execute this yourself, <b>after</b> watching the lecture. Add print statements to understand what's going on, and then come up with your own variations.<br/><br/>If you have time, I'd love it if you submit a PR for changes in the community_contributions folder - instructions in the resources. Also, if you have a GitHub account, use this to showcase your variations. Not only is this essential practice, but it demonstrates your skills to others, including perhaps future clients or employers...
            </span>
        </td>
    </tr>
</table>

In [126]:
# Start with imports - ask ChatGPT to explain any package that you don't know

import os
import json
from dotenv import load_dotenv
from openai import OpenAI
from IPython.display import Markdown, display

In [125]:
# Always remember to do this!
load_dotenv(override=True)

True

In [29]:
# Print the key prefix to help with any debugging

openai_api_key = os.getenv('OPENAI_API_KEY')

if openai_api_key:
    print(f"OpenAI API Key exists and begins {openai_api_key[:8]}")
else:
    print("OpenAI API Key not set - please add it to your .env file")

OpenAI API Key exists and begins sk-proj-


In [30]:
request = "Please come up with a challenging, nuanced question that I can ask a number of LLMs to evaluate their intelligence. "
request += "Answer only with the question, no explanation."
messages = [{"role": "user", "content": request}]

In [31]:
messages

[{'role': 'user',
  'content': 'Please come up with a challenging, nuanced question that I can ask a number of LLMs to evaluate their intelligence. Answer only with the question, no explanation.'}]

In [32]:
openai = OpenAI()
response = openai.chat.completions.create(
    model="gpt-5-mini",
    messages=messages,
)
question = response.choices[0].message.content
print(question)


You are an independent AI consultant hired by a mid-sized coastal city that wants to reduce traffic fatalities by 50% within five years while respecting individual privacy, promoting equity across neighborhoods, and maintaining transparency and public trust—outline a detailed, practical five-year roadmap that includes: the combination of technical interventions (software, sensors, infrastructure changes, policy levers and behavioral programs) you would deploy and why, a data-collection plan that achieves required accuracy under strict privacy constraints (including what data you would and would not collect and how you would minimize re-identification risk), concrete metrics and experimental designs (including randomized or phased rollouts, control groups, sample sizes, and statistical tests) you would use to measure progress and verify causality, governance and accountability structures (including stakeholder roles, oversight, auditability and mechanisms for redress), anticipated failu

In [33]:
competitors = []
answers = []
messages = [{"role": "user", "content": question}]

## Note - Comparing OpenAI Models

This notebook has been modified to compare different OpenAI models instead of multiple providers. We'll test:
- gpt-5-nano (smallest, fastest GPT-5 model; a reasoning model)
- gpt-4o-mini (fast and low-cost)
- gpt-4o (full-size GPT-4o)
- gpt-4-turbo (previous generation)
- gpt-4 (original GPT-4)
- gpt-3.5-turbo (older, fast model)

Note that some models may take 1-2 minutes to respond, especially reasoning models such as gpt-5-nano, which think before answering.

In [None]:
# The API we know well
# I've updated this with the latest model, but it can take some time because it likes to think!
# Replace the model with gpt-4.1-mini if you'd prefer not to wait 1-2 mins

model_name = "gpt-5-nano"

response = openai.chat.completions.create(model=model_name, messages=messages)
answer = response.choices[0].message.content

display(Markdown(answer))
competitors.append(model_name)
answers.append(answer)

In [None]:
# GPT-4o-mini - Fast and efficient

model_name = "gpt-4o-mini"

response = openai.chat.completions.create(model=model_name, messages=messages)
answer = response.choices[0].message.content

display(Markdown(answer))
competitors.append(model_name)
answers.append(answer)

In [None]:
# GPT-4o - Full-size GPT-4o model

model_name = "gpt-4o"

response = openai.chat.completions.create(model=model_name, messages=messages)
answer = response.choices[0].message.content

display(Markdown(answer))
competitors.append(model_name)
answers.append(answer)

In [None]:
# GPT-4-turbo - Previous generation flagship

model_name = "gpt-4-turbo"

response = openai.chat.completions.create(model=model_name, messages=messages)
answer = response.choices[0].message.content

display(Markdown(answer))
competitors.append(model_name)
answers.append(answer)

In [None]:
# GPT-4 - Classic flagship model

model_name = "gpt-4"

response = openai.chat.completions.create(model=model_name, messages=messages)
answer = response.choices[0].message.content

display(Markdown(answer))
competitors.append(model_name)
answers.append(answer)

## Adding one more OpenAI model for comparison

Let's add gpt-3.5-turbo to see how the older generation performs.

In [None]:
# GPT-3.5-turbo - Classic, fast model

model_name = "gpt-3.5-turbo"

response = openai.chat.completions.create(model=model_name, messages=messages)
answer = response.choices[0].message.content

display(Markdown(answer))
competitors.append(model_name)
answers.append(answer)
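The model cells above repeat the same steps: call the API, read the reply, and append to `competitors` and `answers`. Below is a minimal sketch of folding those steps into one loop, assuming you would rather skip a model that errors (for example, one your account cannot access) than stop the notebook. `ask_models` and the `FakeCompletions` stub are hypothetical names introduced here; the stub only exists so the sketch runs offline without spending API credits.

```python
from types import SimpleNamespace


def ask_models(client, model_names, messages):
    """Send the same messages to each model and collect the replies.

    Returns (competitors, answers) as two parallel lists. A model that raises
    an error is reported and skipped instead of stopping the whole run.
    """
    competitors, answers = [], []
    for model_name in model_names:
        try:
            response = client.chat.completions.create(model=model_name, messages=messages)
        except Exception as error:  # deliberately broad: any failure skips just this model
            print(f"Skipping {model_name}: {error}")
            continue
        competitors.append(model_name)
        answers.append(response.choices[0].message.content)
    return competitors, answers


class FakeCompletions:
    """Hypothetical offline stand-in for client.chat.completions (makes no API calls)."""

    def create(self, model, messages):
        if model == "unavailable-model":
            raise RuntimeError("model not found")
        message = SimpleNamespace(content=f"{model} says hello")
        return SimpleNamespace(choices=[SimpleNamespace(message=message)])


fake_client = SimpleNamespace(chat=SimpleNamespace(completions=FakeCompletions()))
demo_competitors, demo_answers = ask_models(
    fake_client,
    ["model-a", "unavailable-model", "model-b"],
    [{"role": "user", "content": "Hi"}],
)
print(demo_competitors)  # ['model-a', 'model-b']
```

With the real client you would call something like `competitors, answers = ask_models(openai, ["gpt-5-nano", "gpt-4o-mini", "gpt-4o"], messages)`. Unlike the cells above, this version does not `display` each answer as it arrives.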

In [40]:
# So where are we?

print(competitors)
print(answers)


['gpt-5-nano', 'gpt-4o-mini', 'gpt-4o', 'gpt-4-turbo', 'gpt-3.5-turbo']
['Below is a practical, five-year roadmap you can adapt for a mid-sized coastal city aiming to cut traffic fatalities by 50% within five years while prioritizing privacy, equity, transparency, and public trust. It blends technology, policy, infrastructure, and community engagement with rigorous evaluation and governance. Where numbers are needed, I provide ranges and the logic to tailor them to your city’s specifics.\n\n1) Guiding principles and outcomes\n- Primary goal: reduce fatalities from motor-vehicle crashes by 50% within five years, with strong emphasis on equity across neighborhoods and mitigation of privacy concerns.\n- Core approach: a data-driven, privacy-preserving, phased implementation that pairs engineering controls (infrastructure, sensors, ITS), policy levers (speed, enforcement, street design), and behaviorally informed programs (education, engagement). Success hinges on measurable safety outcome

In [41]:
# It's nice to know how to use "zip"
for competitor, answer in zip(competitors, answers):
    print(f"Competitor: {competitor}\n\n{answer}")


Competitor: gpt-5-nano

Below is a practical, five-year roadmap you can adapt for a mid-sized coastal city aiming to cut traffic fatalities by 50% within five years while prioritizing privacy, equity, transparency, and public trust. It blends technology, policy, infrastructure, and community engagement with rigorous evaluation and governance. Where numbers are needed, I provide ranges and the logic to tailor them to your city’s specifics.

1) Guiding principles and outcomes
- Primary goal: reduce fatalities from motor-vehicle crashes by 50% within five years, with strong emphasis on equity across neighborhoods and mitigation of privacy concerns.
- Core approach: a data-driven, privacy-preserving, phased implementation that pairs engineering controls (infrastructure, sensors, ITS), policy levers (speed, enforcement, street design), and behaviorally informed programs (education, engagement). Success hinges on measurable safety outcomes, rigorous evaluation, and community trust.
- Cross-c

In [42]:
# Let's bring this together - note the use of "enumerate"

together = ""
for index, answer in enumerate(answers):
    together += f"# Response from competitor {index+1}\n\n"
    together += answer + "\n\n"
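The loop above uses `index+1` to turn 0-based positions into competitor numbers. A small alternative sketch uses `enumerate`'s `start` argument and `str.join` instead of repeated `+=`, and builds the same string. `combine_answers` is a hypothetical name used only for this illustration.

```python
def combine_answers(answers):
    """Join answers into one string, each headed by its 1-based competitor number."""
    return "".join(
        f"# Response from competitor {number}\n\n{answer}\n\n"
        for number, answer in enumerate(answers, start=1)
    )


combined = combine_answers(["First answer", "Second answer"])
print(combined)
```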

In [43]:
print(together)

# Response from competitor 1

Below is a practical, five-year roadmap you can adapt for a mid-sized coastal city aiming to cut traffic fatalities by 50% within five years while prioritizing privacy, equity, transparency, and public trust. It blends technology, policy, infrastructure, and community engagement with rigorous evaluation and governance. Where numbers are needed, I provide ranges and the logic to tailor them to your city’s specifics.

1) Guiding principles and outcomes
- Primary goal: reduce fatalities from motor-vehicle crashes by 50% within five years, with strong emphasis on equity across neighborhoods and mitigation of privacy concerns.
- Core approach: a data-driven, privacy-preserving, phased implementation that pairs engineering controls (infrastructure, sensors, ITS), policy levers (speed, enforcement, street design), and behaviorally informed programs (education, engagement). Success hinges on measurable safety outcomes, rigorous evaluation, and community trust.
- C

In [44]:
judge = f"""You are judging a competition between {len(competitors)} competitors.
Each model has been given this question:

{question}

Your job is to evaluate each response for clarity and strength of argument, and rank them in order of best to worst.
Respond with JSON, and only JSON, with the following format:
{{"results": ["best competitor number", "second best competitor number", "third best competitor number", ...]}}

Here are the responses from each competitor:

{together}

Now respond with the JSON with the ranked order of the competitors, nothing else. Do not include markdown formatting or code blocks."""
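The judge prompt is an f-string that must also contain literal JSON braces. Inside an f-string, `{name}` interpolates a value, while `{{` and `}}` each produce a single literal brace. Here is a tiny illustration; the variable names are made up for the example.

```python
import json

n_competitors = 3
# {n_competitors} is interpolated; {{ and }} become literal { and }
instruction = f'Rank all {n_competitors} competitors. Reply as {{"results": ["1", "2", "3"]}}'
print(instruction)

# Once the f-string is evaluated, the literal part is ordinary JSON
json_part = instruction[instruction.index("{"):]
print(json.loads(json_part))  # {'results': ['1', '2', '3']}
```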


In [45]:
print(judge)

You are judging a competition between 5 competitors.
Each model has been given this question:

You are an independent AI consultant hired by a mid-sized coastal city that wants to reduce traffic fatalities by 50% within five years while respecting individual privacy, promoting equity across neighborhoods, and maintaining transparency and public trust—outline a detailed, practical five-year roadmap that includes: the combination of technical interventions (software, sensors, infrastructure changes, policy levers and behavioral programs) you would deploy and why, a data-collection plan that achieves required accuracy under strict privacy constraints (including what data you would and would not collect and how you would minimize re-identification risk), concrete metrics and experimental designs (including randomized or phased rollouts, control groups, sample sizes, and statistical tests) you would use to measure progress and verify causality, governance and accountability structures (incl

In [104]:
judge_messages = [{"role": "user", "content": judge}]

In [105]:
# Judgement time!

openai = OpenAI()
response = openai.chat.completions.create(
    model="gpt-5-mini",
    messages=judge_messages,
)
results = response.choices[0].message.content
print(results)


{"results": ["1", "2", "3", "4", "5"]}


In [55]:
# OK let's turn this into results!

results_dict = json.loads(results)
ranks = results_dict["results"]
for index, result in enumerate(ranks):
    competitor = competitors[int(result)-1]
    print(f"Rank {index+1}: {competitor}")

Rank 1: gpt-5-nano
Rank 2: gpt-4o
Rank 3: gpt-4o-mini
Rank 4: gpt-3.5-turbo
Rank 5: gpt-4-turbo
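The parsing cell assumes the judge returns clean JSON with valid competitor numbers. The sketch below is more defensive and assumes the same `{"results": [...]}` format. It strips a markdown code fence if the model adds one despite the instructions, then checks that each competitor appears exactly once before indexing into the list. `parse_ranking` is a hypothetical helper, not part of the OpenAI SDK.

```python
import json
import re


def parse_ranking(raw_text, n_competitors):
    """Convert the judge's reply into a list of 0-based competitor indices, best first."""
    text = raw_text.strip()
    # Drop a leading ```json (or ```) fence and a trailing ``` fence, if present
    text = re.sub(r"^```(?:json)?\s*|\s*```$", "", text)
    ranks = [int(rank) for rank in json.loads(text)["results"]]  # accepts "2" or 2
    if sorted(ranks) != list(range(1, n_competitors + 1)):
        raise ValueError(f"Expected each of 1..{n_competitors} exactly once, got {ranks}")
    return [rank - 1 for rank in ranks]


competitor_names = ["gpt-5-nano", "gpt-4o-mini", "gpt-4o"]
fenced_reply = '```json\n{"results": ["2", "1", "3"]}\n```'
for position, index in enumerate(parse_ranking(fenced_reply, len(competitor_names)), start=1):
    print(f"Rank {position}: {competitor_names[index]}")
```

Raising on a malformed ranking is a design choice: it surfaces a bad judge reply immediately instead of silently printing the wrong model names.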


In [90]:
# Get the winning answer
best_competitor_index = int(ranks[0]) - 1
best_competitor = competitors[best_competitor_index]
best_answer = answers[best_competitor_index]

print(f" Winner: {best_competitor}")
print(f"\nOriginal winning answer:\n{best_answer[:200]}...")  # Show first 200 chars

 Winner: gpt-5-nano

Original winning answer:
Below is a practical, five-year roadmap you can adapt for a mid-sized coastal city aiming to cut traffic fatalities by 50% within five years while prioritizing privacy, equity, transparency, and publi...


In [91]:
# Ask GPT-4o to reflect on and improve the winning answer
reflection_prompt = f"""You are an expert reviewer. Here is an answer to the question: "{question}"

ANSWER:
{best_answer}

Please:
1. Identify 2-3 ways this answer could be improved
2. Provide an enhanced version that addresses these improvements

Be specific and constructive."""

response = openai.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": reflection_prompt}],
    temperature=0.7
)

improved_answer = response.choices[0].message.content
print("Reflection complete!")
print("=" * 80)
print(improved_answer)

Reflection complete!
The initial answer provides a comprehensive and detailed roadmap for reducing traffic fatalities while respecting privacy and promoting equity. However, there are several areas where improvements can be made:

1. **Clarity and Conciseness**: The answer is lengthy and may overwhelm readers. A more concise version focusing on key points would improve readability and accessibility while maintaining detail where necessary.

2. **Community Engagement**: While community involvement is addressed, more specific strategies for ongoing engagement and feedback incorporation could be beneficial. These could include mechanisms for real-time feedback and iterative design processes.

3. **Cost and Resource Estimation**: The cost and personnel estimates could be more specific and aligned with realistic constraints. Providing a detailed breakdown of costs associated with each phase and potential funding sources would enhance the plan.

Here is an enhanced version addressing these i

In [116]:
# Fresh start - define the scenario
weekend_request = """I have a free weekend in Manchester, UK with my partner. We enjoy:
- Outdoor activities (hiking, walking)
- Good coffee and food
- Art and culture
- Sport
- Budget: £300 total

Suggest 3 different weekend plans with specific places and estimated costs."""

print("🎯 Weekend Planning Request:")
print(weekend_request)

🎯 Weekend Planning Request:
I have a free weekend in Manchester, UK with my partner. We enjoy:
- Outdoor activities (hiking, walking)
- Good coffee and food
- Art and culture
- Sport
- Budget: £300 total

Suggest 3 different weekend plans with specific places and estimated costs.


In [117]:
# Get 3 different weekend plans from different models
plan_models = ["gpt-4o", "gpt-4o-mini", "gpt-3.5-turbo"]
weekend_plans = []

client = OpenAI()  # Fresh client instance

for plan_model in plan_models:
    plan_response = client.chat.completions.create(
        model=plan_model,
        messages=[{"role": "user", "content": weekend_request}],
        temperature=0.8
    )
    plan = plan_response.choices[0].message.content
    weekend_plans.append(plan)
    print(f"✅ Plan from {plan_model}")
    print("-" * 80)

✅ Plan from gpt-4o
--------------------------------------------------------------------------------
✅ Plan from gpt-4o-mini
--------------------------------------------------------------------------------
✅ Plan from gpt-3.5-turbo
--------------------------------------------------------------------------------


In [118]:
# Judge the best weekend plan
all_plans = ""
for idx, plan in enumerate(weekend_plans):
    all_plans += f"=== PLAN {idx+1} ===\n\n{plan}\n\n"

judge_weekend = f"""You are a Manchester local and weekend planning expert. 
Review these 3 weekend plans and choose the BEST one based on:
- Realistic budget adherence
- Mix of activities
- Local knowledge
- Practicality (timing, locations)

{all_plans}

Respond with ONLY a JSON object:
{{"winner": <1, 2, or 3>, "why": "one sentence explanation"}}"""

judge_response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": judge_weekend}],
    temperature=0.2,
    response_format={"type": "json_object"} 
)

verdict = json.loads(judge_response.choices[0].message.content)
winner_index = int(verdict["winner"]) - 1  # int() also handles a quoted number such as "1"
winning_plan = weekend_plans[winner_index]
winning_model = plan_models[winner_index]

print(f"🏆 Best Plan: {winning_model}")
print(f"📝 Reason: {verdict['why']}")

🏆 Best Plan: gpt-4o
📝 Reason: Plan 1 offers a balanced mix of cultural, outdoor, and entertainment activities within a realistic budget and practical itinerary.
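JSON mode (`response_format={"type": "json_object"}`) is designed to return valid JSON, but it does not check that the fields inside make sense. Here is a small sketch of validating the verdict before using it as a list index. It assumes the same `{"winner": ..., "why": ...}` shape, and `parse_verdict` is a hypothetical helper.

```python
import json


def parse_verdict(raw_text, n_plans):
    """Return (0-based winner index, reason) from the judge's JSON reply."""
    verdict = json.loads(raw_text)
    winner = int(verdict["winner"])  # int() accepts 2 or "2"
    if not 1 <= winner <= n_plans:
        raise ValueError(f"winner must be between 1 and {n_plans}, got {winner}")
    return winner - 1, verdict.get("why", "")


demo_index, demo_reason = parse_verdict('{"winner": "2", "why": "Best balance of cost and variety."}', 3)
print(demo_index, demo_reason)  # 1 Best balance of cost and variety.
```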


In [120]:
# Reflection: Improve the winning plan
refine_request = f"""You're a Manchester expert. Take this weekend plan and add:
1. One insider tip (parking, best time to visit, etc.)
2. A backup activity in case of bad weather
3. One local food spot that's not touristy

Original Plan:
{winning_plan}

Provide the refined plan with these enhancements smoothly integrated."""

refine_response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": refine_request}],
    temperature=0.7
)

final_plan = refine_response.choices[0].message.content

print("✨ FINAL WEEKEND PLAN")
print("=" * 80)
print(final_plan)

✨ FINAL WEEKEND PLAN
Here are your refined weekend plans with the requested enhancements:

### Weekend Plan 1: Urban Exploration and Culture

**Day 1:**
- **Breakfast:** Start your day with breakfast at **Evelyn's Cafe Bar** in the Northern Quarter. Estimated cost: £30 for two.
- **Insider Tip:** Arrive early at Manchester Art Gallery to avoid crowds, especially on weekends.
- **Morning Activity:** Visit the **Manchester Art Gallery** (free entry, donations appreciated) to explore its impressive collections.
- **Lunch:** Grab a sandwich or salad at **Federal Café Bar**. Estimated cost: £20 for two.
- **Afternoon Activity:** Explore **The John Rylands Library** (free entry) and its stunning architecture.
- **Dinner:** Enjoy a meal at **Rudy's Neapolitan Pizza**. Estimated cost: £35 for two.
- **Evening Activity:** Catch a live music performance at **Band on the Wall**. Tickets usually around £15 each, total £30.
- **Backup Activity:** If it's raining, consider visiting the **People's Hi

<table style="margin: 0; text-align: left; width:100%">
    <tr>
        <td style="width: 150px; height: 150px; vertical-align: middle;">
            <img src="../assets/exercise.png" width="150" height="150" style="display: block;" />
        </td>
        <td>
            <h2 style="color:#ff7800;">Exercise</h2>
            <span style="color:#ff7800;">Which pattern(s) did this use? Try updating this to add another Agentic design pattern.
            </span>
        </td>
    </tr>
</table>

### Summary of Agentic Patterns Used

This notebook demonstrates **three agentic design patterns**:

1. **Multi-Agent Collaboration**
   - Multiple models independently answer the same question
   - Leverages diverse perspectives and capabilities

2. **Judge (Evaluator) Pattern**
   - A judge model evaluates and ranks all responses
   - Selects the best answer against stated criteria (the judge is itself an LLM, so its ranking can be biased)

3. **Reflection Pattern** ⭐ (Simple & Effective)
   - A model critiques an answer and then improves it (here, gpt-4o reviews the winning answer, which may come from a different model)
   - Adds an improvement step without running another multi-model evaluation

These patterns are widely used in production AI systems!

## Exercise Solution: Simple Reflection Pattern

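The **Reflection** pattern is simple: we ask a model to review an answer and then improve it. The model can reflect on its own output or, as in this notebook, on the winning answer from another model.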
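The reflection cell earlier combines the critique and the rewrite in a single prompt. Here is a minimal sketch of the same idea as an explicit loop. It assumes you want separate critique and revision calls, possibly for more than one round. To have a model reflect on its own work, pass the model that wrote the draft. `ask`, `reflect_and_revise`, and the `CountingCompletions` stub are hypothetical names; the stub returns canned replies so the sketch runs offline.

```python
from types import SimpleNamespace


def ask(client, model, prompt):
    """Send one user prompt and return the reply text."""
    response = client.chat.completions.create(
        model=model, messages=[{"role": "user", "content": prompt}]
    )
    return response.choices[0].message.content


def reflect_and_revise(client, model, question, draft, rounds=1):
    """Alternate a critique call and a revision call for the given number of rounds."""
    answer = draft
    for _ in range(rounds):
        critique = ask(
            client, model,
            f"Question: {question}\n\nAnswer:\n{answer}\n\n"
            "List 2-3 specific ways this answer could be improved.",
        )
        answer = ask(
            client, model,
            f"Question: {question}\n\nAnswer:\n{answer}\n\nCritique:\n{critique}\n\n"
            "Rewrite the answer to address the critique. Reply with the improved answer only.",
        )
    return answer


class CountingCompletions:
    """Offline stand-in for client.chat.completions that returns numbered canned replies."""

    def __init__(self):
        self.calls = 0

    def create(self, model, messages):
        self.calls += 1
        message = SimpleNamespace(content=f"reply {self.calls}")
        return SimpleNamespace(choices=[SimpleNamespace(message=message)])


completions = CountingCompletions()
fake_client = SimpleNamespace(chat=SimpleNamespace(completions=completions))
final_answer = reflect_and_revise(fake_client, "any-model", "What is 2+2?", "5", rounds=2)
print(final_answer, completions.calls)  # reply 4 4
```

With the notebook's client, something like `reflect_and_revise(openai, "gpt-4o", question, best_answer)` mirrors the earlier cell. The difference is that the critique and the rewrite are separate calls, so each one can be inspected on its own.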


<table style="margin: 0; text-align: left; width:100%">
    <tr>
        <td style="width: 150px; height: 150px; vertical-align: middle;">
            <img src="../assets/business.png" width="150" height="150" style="display: block;" />
        </td>
        <td>
            <h2 style="color:#00bfff;">Commercial implications</h2>
            <span style="color:#00bfff;">These kinds of patterns, sending a task to multiple models and evaluating the results,
            are common when you need to improve the quality of an LLM's response. This approach can be applied
            to many business projects where accuracy is critical.
            </span>
        </td>
    </tr>
</table>