## Welcome to the Second Lab - Week 1, Day 3

Today we will call several models from different providers! This is a good way to get comfortable with their APIs.

<table style="margin: 0; text-align: left; width:100%">
    <tr>
        <td style="width: 150px; height: 150px; vertical-align: middle;">
            <img src="../assets/stop.png" width="150" height="150" style="display: block;" />
        </td>
        <td>
            <h2 style="color:#ff7800;">Important point - please read</h2>
            <span style="color:#ff7800;">The way I collaborate with you may be different from other courses you've taken. I prefer not to type code while you watch. Rather, I execute Jupyter Labs, like this, and give you an intuition for what's going on. My suggestion is that you carefully execute this yourself, <b>after</b> watching the lecture. Add print statements to understand what's going on, and then come up with your own variations.<br/><br/>If you have time, I'd love it if you submitted a PR with your changes to the community_contributions folder (instructions are in the resources). Also, if you have a GitHub account, use it to showcase your variations. Not only is this essential practice, but it demonstrates your skills to others, including perhaps future clients or employers...
            </span>
        </td>
    </tr>
</table>

In [8]:
# Start with imports - ask ChatGPT to explain any package that you don't know

import os
import json
from dotenv import load_dotenv
from openai import OpenAI
from anthropic import Anthropic
from IPython.display import Markdown, display

In [9]:
# Always remember to do this!
load_dotenv(override=True)

True

In [10]:
# Print the key prefixes to help with any debugging

openai_api_key = os.getenv('OPENAI_API_KEY')
anthropic_api_key = os.getenv('ANTHROPIC_API_KEY')
google_api_key = os.getenv('GOOGLE_API_KEY')
deepseek_api_key = os.getenv('DEEPSEEK_API_KEY')
groq_api_key = os.getenv('GROQ_API_KEY')

if openai_api_key:
    print(f"OpenAI API Key exists and begins {openai_api_key[:8]}")
else:
    print("OpenAI API Key not set")
    
if anthropic_api_key:
    print(f"Anthropic API Key exists and begins {anthropic_api_key[:7]}")
else:
    print("Anthropic API Key not set (and this is optional)")

if google_api_key:
    print(f"Google API Key exists and begins {google_api_key[:2]}")
else:
    print("Google API Key not set (and this is optional)")

if deepseek_api_key:
    print(f"DeepSeek API Key exists and begins {deepseek_api_key[:3]}")
else:
    print("DeepSeek API Key not set (and this is optional)")

if groq_api_key:
    print(f"Groq API Key exists and begins {groq_api_key[:4]}")
else:
    print("Groq API Key not set (and this is optional)")

OpenAI API Key exists and begins sk-proj-
Anthropic API Key not set (and this is optional)
Google API Key exists and begins AI
DeepSeek API Key not set (and this is optional)
Groq API Key exists and begins gsk_
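
The five near-identical checks above could be collapsed into one loop. Here is a minimal sketch, assuming a hypothetical helper called `describe_key` (it is not part of the lab) and the same prefix lengths as the cell above:

```python
import os

# (label, environment variable, prefix length to show, optional?)
# Values are copied from the cell above; the tuple layout is an assumption.
KEY_SPECS = [
    ("OpenAI", "OPENAI_API_KEY", 8, False),
    ("Anthropic", "ANTHROPIC_API_KEY", 7, True),
    ("Google", "GOOGLE_API_KEY", 2, True),
    ("DeepSeek", "DEEPSEEK_API_KEY", 3, True),
    ("Groq", "GROQ_API_KEY", 4, True),
]


def describe_key(label, env_var, prefix_len, optional):
    """Hypothetical helper: build the same status line the cell above prints."""
    value = os.getenv(env_var)
    if value:
        return f"{label} API Key exists and begins {value[:prefix_len]}"
    suffix = " (and this is optional)" if optional else ""
    return f"{label} API Key not set{suffix}"


for spec in KEY_SPECS:
    print(describe_key(*spec))
```

Only a short prefix of each key is shown, so the output stays safe to paste into a bug report.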


In [11]:
request = "Please come up with a challenging, nuanced question that I can ask a number of LLMs to evaluate their intelligence. "
request += "Answer only with the question, no explanation."
messages = [{"role": "user", "content": request}]

In [12]:
messages

[{'role': 'user',
  'content': 'Please come up with a challenging, nuanced question that I can ask a number of LLMs to evaluate their intelligence. Answer only with the question, no explanation.'}]

In [13]:
openai = OpenAI()
response = openai.chat.completions.create(
    model="gpt-5-mini",
    messages=messages,
)
question = response.choices[0].message.content
print(question)


You are an advisor to the mayor of a mid-sized city (population ~300,000) facing three simultaneous crises: a projected 40% budget shortfall over the next six months, a rapidly spreading respiratory illness with an estimated R between 1.5–3 and uncertain mortality, and 10,000 residents displaced by recent storm damage with limited shelter capacity. You cannot raise taxes or receive immediate external funding. Provide a prioritized 6-month action plan of up to 12 items that balances public health, fiscal solvency, and equity. For each item include: (a) one-sentence description, (b) one-sentence expected benefit, (c) one-sentence main cost or risk, (d) confidence level (low/medium/high) with one-sentence justification, and (e) one specific measurable data point that, if materially different, would make you reorder this item’s priority. Then name the single action you would implement first and justify that choice in one sentence. Finally, explicitly list up to four core assumptions you ma

In [14]:
competitors = []
answers = []
messages = [{"role": "user", "content": question}]

## Note - update since the videos

I've updated the model names below to use the latest models, like GPT-5 and Claude Sonnet 4.5. It's worth noting that these models can be quite slow, taking 1-2 minutes, but they do a great job! Feel free to swap in faster models if you'd prefer, like the ones I use in the video.
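
If you want to see how slow each model really is on your machine, you could time the calls. This is a minimal sketch, assuming a hypothetical helper called `timed`; the `fake_call` function is only a stand-in so that the example runs without any API keys:

```python
import time


def timed(fn, *args, **kwargs):
    """Hypothetical helper: call fn and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


# Stand-in for a slow API call so the sketch runs without any keys.
# In the lab you might pass openai.chat.completions.create instead.
def fake_call(model, messages):
    time.sleep(0.01)
    return f"{model} answered {len(messages)} message(s)"


result, seconds = timed(fake_call, model="some-model", messages=[{"role": "user", "content": "hi"}])
print(f"{result} in {seconds:.2f}s")
```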

In [15]:
# The API we know well
# I've updated this with the latest model, but it can take some time because it likes to think!
# Replace the model with gpt-4.1-mini if you'd prefer not to wait 1-2 mins

model_name = "gpt-5-nano"

response = openai.chat.completions.create(model=model_name, messages=messages)
answer = response.choices[0].message.content

display(Markdown(answer))
competitors.append(model_name)
answers.append(answer)

Prioritized 6-month plan (8 items)

1) Shelter expansion and displacement triage
(a) Immediately expand displacement shelters and implement rapid triage to house 10,000 displaced residents.
(b) Reduces overcrowding and disease transmission while stabilizing basic needs.
(c) Requires staffing, security, and operating costs within a tight budget.
(d) High confidence: crisis is acute and shelter capacity is the bottleneck.
(e) Data point: if shelter occupancy remains >95% for 5 days or 3,000 residents remain unsheltered, reorder upward.

2) Health surge: testing, vaccination, and ventilation
(a) Deploy mobile testing, vaccination clinics, and ventilation upgrades in shelters and public spaces.
(b) Lowers transmission and protects vulnerable groups, helping curb R (1.5–3).
(c) Costs include test kits, vaccines, staffing, and retrofit expenses; risk of diverting funds.
(d) High confidence: rapid health gains are essential in a high-R setting.
(e) Data point: if R stays >1.2 or daily cases rise for 14 days, elevate priority.

3) Protect essential services and workforce
(a) Ensure continuity of police, fire, EMS, transit, and utilities with PPE and contingencies.
(b) Maintains critical infrastructure and safe service delivery during crisis.
(c) Cost: PPE, overtime, and supply shortages; risk of crowding out other needs.
(d) High confidence: without operations intact, health and economy collapse.
(e) Data point: if critical-service downtime exceeds 8 hours in a week, adjust.

4) Targeted relief for vulnerable and displaced residents
(a) Provide utility/rent relief, food support, language access, and job-search help for 10,000 displaced and low-income residents.
(b) Improves health outcomes and prevents homelessness, advancing equity.
(c) Cost/risk: limited funds and logistics; potential for misuse without controls.
(d) Medium–high confidence: essential for equity but must be tightly targeted.
(e) Data point: percent of displaced households with stable housing within 30 days; if <60%, raise priority.

5) Fiscal consolidation with equity guardrails
(a) Freeze nonessential hiring, pause discretionary procurement, renegotiate contracts, and redeploy staff.
(b) Conserves funds to close a 40% six-month gap while protecting essential services.
(c) Risk: slower capacity-building and possible equity gaps if protections lapse.
(d) Medium–high confidence: necessary to avert insolvency but requires safeguards.
(e) Data point: actual shortfall vs forecast; if shortfall >40% of forecast, escalate.

6) Data dashboards and risk monitoring
(a) Establish a citywide dashboard for health, shelter, and fiscal indicators, with multilingual updates.
(b) Improves situational awareness and enables rapid reallocation of resources.
(c) Costs: data system buildout and staff; risk of misinterpretation.
(d) Medium confidence: informs decisions but not a substitute for action.
(e) Data point: transmission/hospitalization trends; if positivity >10% for 14 days, escalate.

7) Regional mutual aid and coordination
(a) Formalize regional shelter sharing, medical surge, and supply pooling with neighboring jurisdictions.
(b) Extends capacity and resilience without new external funding.
(c) Risk: bureaucratic delays and uneven participation.
(d) Medium confidence: beneficial but depends on partner commitments.
(e) Data point: regional shelter capacity utilization; if <80% of regional need for 7 days, elevate.

8) Home-based care and telehealth expansion
(a) Scale home-based care and telehealth for noncritical cases and chronic monitoring.
(b) Reduces hospital admissions and preserves local health system capacity.
(c) Costs: technology access, privacy safeguards, and training needs.
(d) Medium confidence: valuable if residents have digital access.
(e) Data point: proportion of eligible cases managed at home; if <30%, escalate.

First action to implement
Implement item 1 immediately: expand and triage shelters to rapidly reduce overcrowding and health risk for the 10,000 displaced; this is the most urgent bottleneck driving health and equity.

Core assumptions
- No new taxes or immediate external funding; funding reallocation is required. 
- The six-month budget shortfall remains around 40% without external inflows. 
- Shelter expansion and mutual aid can be operational with existing channels and partners. 
- Data systems and dashboards can be scaled quickly enough to inform decisions.

In [None]:
# Anthropic has a slightly different API, and the max_tokens argument is required

model_name = "claude-sonnet-4-5"

claude = Anthropic()
response = claude.messages.create(model=model_name, messages=messages, max_tokens=1000)
answer = response.content[0].text

display(Markdown(answer))
competitors.append(model_name)
answers.append(answer)
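
The main difference to notice is where the reply text lives: the OpenAI-style cells read `response.choices[0].message.content`, while the Anthropic cell reads `response.content[0].text`. Here is a minimal sketch of a hypothetical helper, `extract_text`, that handles both shapes. The stand-in objects below mimic only the attributes used in this lab and are not real API responses:

```python
from types import SimpleNamespace


def extract_text(response):
    """Hypothetical helper: pull the reply text out of either response shape."""
    if hasattr(response, "choices"):
        # OpenAI-style chat completion: response.choices[0].message.content
        return response.choices[0].message.content
    # Anthropic-style message: response.content[0].text
    return response.content[0].text


# Stand-in objects that mimic only the attributes used in this lab
openai_style = SimpleNamespace(
    choices=[SimpleNamespace(message=SimpleNamespace(content="hello from OpenAI"))]
)
anthropic_style = SimpleNamespace(content=[SimpleNamespace(text="hello from Claude")])

print(extract_text(openai_style))
print(extract_text(anthropic_style))
```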

In [16]:
gemini = OpenAI(api_key=google_api_key, base_url="https://generativelanguage.googleapis.com/v1beta/openai/")
model_name = "gemini-2.5-flash"

response = gemini.chat.completions.create(model=model_name, messages=messages)
answer = response.choices[0].message.content

display(Markdown(answer))
competitors.append(model_name)
answers.append(answer)

Here is a prioritized 6-month action plan balancing public health, fiscal solvency, and equity:

1.  **Activate Emergency Operations Center (EOC).**
    (a) Establish a central command for coordinated crisis response across all departments.
    (b) Facilitates efficient resource allocation and communication, crucial for multi-faceted crises.
    (c) Costs include diverting key personnel from regular duties and initial setup.
    (d) High: Essential for any multi-crisis response, ensuring effective governance from day one.
    (e) If internal communication delays exceed 2 hours regularly.

2.  **Secure Emergency Shelter & Basic Needs for Displaced.**
    (a) Provide immediate safe shelter, food, water, and sanitation for all 10,000 displaced residents.
    (b) Prevents further human suffering, mitigates public health risks, and addresses equity for storm victims.
    (c) Significant financial outlay and complex logistics are the primary costs/risks.
    (d) High: Addresses immediate life-safety and humanitarian crisis for a vulnerable population.
    (e) If 500+ displaced residents remain unsheltered daily after 72 hours.

3.  **Rapid Illness Assessment & Emergency Testing Capacity.**
    (a) Implement widespread testing and rapid contact tracing to understand illness severity and spread.
    (b) Provides critical data to inform public health measures, reduce uncertainty, and manage healthcare impact.
    (c) Main costs include acquiring tests and personnel, with potential for supply chain issues.
    (d) Medium: Availability of testing supplies and trained personnel is uncertain but vital for data-driven response.
    (e) If daily illness hospitalizations increase by 20% over 3 consecutive days.

4.  **Implement Targeted Public Health Measures.**
    (a) Enforce city-wide mask mandates, hygiene campaigns, and voluntary social distancing guidelines.
    (b) Reduces the R-value, slows illness spread, and protects healthcare capacity without severe economic lockdown.
    (c) Risks include public resistance and mild economic disruption for certain businesses.
    (d) High: Proven, cost-effective measures for mitigating respiratory illness spread.
    (e) If the estimated illness R-value consistently stays above 2.0.

5.  **Immediate Hiring Freeze & Non-Essential Spending Halt.**
    (a) Halt all non-critical hiring, non-essential travel, and discretionary spending across all city departments.
    (b) Provides an immediate, significant reduction in the city's financial burn rate.
    (c) Risks impacting employee morale and potentially minor disruptions to non-critical services.
    (d) High: A direct and immediate control measure for managing the budget shortfall.
    (e) If the monthly projected budget deficit does not decrease by 10%.

6.  **Launch Comprehensive Public Information Campaign.**
    (a) Deliver clear, consistent updates on the illness, storm recovery, and city services via multiple channels.
    (b) Builds public trust, promotes compliance with health guidelines, and mitigates misinformation/panic.
    (c) Risks include information overload or failure to reach all vulnerable populations.
    (d) High: Essential for public cooperation and maintaining social order during multiple crises.
    (e) If public compliance with mask mandates drops below 70% within two weeks.

7.  **Critical Services Audit & Budget Prioritization.**
    (a) Conduct a rapid review of all city services, prioritizing those essential for public safety, health, and infrastructure.
    (b) Focuses limited funds on core functions, identifying specific areas for deeper cuts or restructuring.
    (c) Risk of political pushback and loss of valued but non-critical community services.
    (d) Medium: Requires tough, potentially unpopular decisions but is vital for fiscal sustainability.
    (e) If budget projections show less than 20% shortfall reduction after initial cuts.

8.  **Reallocate Existing City Staff to Crisis Response.**
    (a) Repurpose non-essential city employees to support emergency shelters, public health, or administrative tasks.
    (b) Addresses critical staffing needs without new hires, optimizing workforce and saving funds.
    (c) Requires rapid cross-training and may disrupt normal non-critical operations.
    (d) Medium: Depends on staff adaptability and the feasibility of quick training for new roles.
    (e) If staffing gaps in critical emergency operations exceed 15% of required roles.

9.  **Engage Community & NGO Partners for Support.**
    (a) Solicit volunteers, donations, and expertise from local non-profits, businesses, and community groups.
    (b) Supplements city resources for shelter management, food distribution, and public health outreach efforts.
    (c) Risks include coordination challenges and varying reliability of external support.
    (d) High: Leverages existing community assets to extend capacity and foster city-wide resilience.
    (e) If fewer than 5 local NGOs actively participate in formal response efforts.

10. **Explore Temporary Facility Use for Shelters/Isolation.**
    (a) Identify and adapt vacant public buildings (e.g., schools, community centers) for additional shelter or isolation.
    (b) Cost-effectively expands capacity for displaced residents and illness containment/quarantine.
    (c) Risks include suitability of facilities, staffing, and potential community resistance to locations.
    (d) Medium: Requires rapid assessment and potentially significant logistical effort for conversion.
    (e) If identified temporary facilities can house fewer than 2,000 additional people.

11. **Negotiate Deferred Payments/Reduced Rates with Vendors/Creditors.**
    (a) Proactively engage city vendors and creditors to renegotiate payment terms or seek temporary rate reductions.
    (b) Frees up immediate cash flow to address urgent crisis needs within the constrained budget.
    (c) Risk of damaging vendor relationships or negatively impacting the city's credit rating.
    (d) Medium: Success depends heavily on vendor goodwill and the city's negotiating power.
    (e) If fewer than 25% of major vendors agree to revised terms.

12. **Establish Emergency Relief Fund for Public Donations.**
    (a) Create a secure, transparent fund for public and corporate donations to support crisis efforts.
    (b) Provides an additional, flexible funding source outside the constrained city budget for direct aid.
    (c) Requires administrative oversight and strong public trust to attract significant contributions.
    (d) Medium: Success relies on public generosity and meticulous transparent management.
    (e) If total fund donations are less than $500,000 within the first month.

**Single First Action:** Activate the Emergency Operations Center (EOC).
**Justification:** This action immediately centralizes command and ensures a coordinated, data-driven response to all three simultaneous crises, which is impossible without it.

**Core Assumptions:**
1.  The city has foundational emergency management infrastructure and some existing staff, even if stretched.
2.  A significant portion of the public will comply with reasonable health guidelines and official directives.
3.  Local businesses and non-governmental organizations are willing to partner with the city during the crisis.
4.  The respiratory illness, while uncertain, is manageable with public health measures and not immediately 100% fatal.

In [None]:
deepseek = OpenAI(api_key=deepseek_api_key, base_url="https://api.deepseek.com/v1")
model_name = "deepseek-chat"

response = deepseek.chat.completions.create(model=model_name, messages=messages)
answer = response.choices[0].message.content

display(Markdown(answer))
competitors.append(model_name)
answers.append(answer)

In [17]:
# Updated with the latest Open Source model from OpenAI

groq = OpenAI(api_key=groq_api_key, base_url="https://api.groq.com/openai/v1")
model_name = "openai/gpt-oss-120b"

response = groq.chat.completions.create(model=model_name, messages=messages)
answer = response.choices[0].message.content

display(Markdown(answer))
competitors.append(model_name)
answers.append(answer)


**Prioritized 6‑month Action Plan (9 items)**  

| # | Action (a) | Expected Benefit (b) | Main Cost / Risk (c) | Confidence (d) | Re‑order Trigger (e) |
|---|------------|----------------------|----------------------|----------------|----------------------|
| 1 | Deploy modular units and repurpose schools, gyms, and community centers to house 10 000 displaced residents. | Stops homelessness, reduces disease spread, and restores basic safety. | Requires upfront logistics, staffing, and utilities; may strain limited shelter staff. | **High** – city already owns the facilities and has contracts for rapid modular construction. | If the storm‑damage count falls below 5 000, shelter expansion can be de‑escalated. |
| 2 | Scale up free rapid‑test kiosks and a digital contact‑tracing platform city‑wide. | Cuts the effective reproduction number (R) toward 1, limiting cases and hospital load. | Procurement and data‑privacy safeguards cost $4‑6 M; privacy concerns could limit uptake. | **Medium** – similar programs succeeded in comparable cities but depend on public buy‑in. | If daily new cases drop below 50 for two weeks, testing intensity can be reduced. |
| 3 | Issue a citywide mask‑and‑hygiene mandate plus a multilingual education campaign (TV, radio, social media). | Low‑cost reduction in transmission; improves compliance among vulnerable groups. | Enforcement may strain police; risk of public resistance. | **High** – proven effectiveness of mask mandates in high‑R settings. | If mask compliance surveys exceed 80 %, the mandate can be relaxed. |
| 4 | Impose a temporary 6‑month curfew on non‑essential retail (10 pm–5 am) and limit indoor capacity to 50 %. | Lowers person‑to‑person contact, supporting R reduction while keeping most commerce open. | Small revenue loss for affected businesses; possible legal challenges. | **Medium** – similar curfews have reduced transmission elsewhere. | If hospital ICU occupancy stays below 60 % for a month, curfew can be lifted. |
| 5 | Freeze hiring for all non‑critical city positions and defer 25 % of payroll for senior staff (repayable after 12 months). | Saves an estimated $12 M, directly addressing the budget gap. | May lower morale; risk of attrition among skilled workers. | **High** – payroll is the largest controllable expense. | If the shortfall shrinks to <10 % of projected budget, the freeze can be eased. |
| 6 | Form a public‑private food‑distribution coalition using existing grocery warehouses to deliver meals to shelters and low‑income neighborhoods. | Guarantees nutrition for displaced and at‑risk residents, preventing secondary health crises. | Coordination complexity; reliance on volunteer driver availability. | **Medium** – logistics proven in disaster drills. | If food‑bank inventories dip below 2 weeks of supply, expand external procurement. |
| 7 | Launch a city‑run volunteer corps (students, retirees) for sanitation, temperature checks, and basic caregiving in shelters. | Augments limited city staff, improves infection control, and builds community resilience. | Requires training and liability coverage; variable volunteer reliability. | **High** – volunteer pools are sizable and eager after recent storms. | If volunteer enrollment falls under 1 000, consider paid short‑term contracts. |
| 8 | Re‑allocate $15 M from postponed capital projects (e.g., park upgrades) to emergency health supplies and shelter utilities. | Provides immediate cash without new borrowing; preserves core services. | Delays long‑term infrastructure improvements, possible political pushback. | **High** – funds are already earmarked and can be redirected quickly. | If a state emergency grant is secured, re‑allocation can be reduced. |
| 9 | Install a weekly fiscal‑health dashboard (budget burn‑rate, case trends, shelter occupancy) for the mayor’s office and council. | Enables data‑driven adjustments, preventing hidden deficits or spikes. | Minimal staff time; risk of mis‑interpretation without proper training. | **High** – dashboards are low‑cost and improve transparency. | If dashboard metrics show stable trends for two consecutive months, frequency can be reduced. |

**First Action to Implement:** Deploy the emergency shelter surge (Item 1). It immediately protects the most vulnerable, curtails disease transmission in crowded conditions, and creates a base for all subsequent health and fiscal measures.

**Core Assumptions**  
1. No new tax revenues or external grants can be accessed for the next six months.  
2. The respiratory illness’s mortality is low enough that containment (R < 1) will avoid overwhelming hospitals.  
3. Existing city‑owned facilities (schools, gyms) can be repurposed without legal impediments.  
4. Private sector partners (grocers, contractors) are willing to cooperate on short‑notice contracts at near‑cost rates.
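
The Gemini, DeepSeek and Groq cells all follow the same pattern: they reuse the `OpenAI` client and only change `api_key` and `base_url`. One way to make that pattern explicit is a small lookup table. This is a sketch only; the `PROVIDERS` dict and the `client_kwargs` helper are hypothetical names, while the URLs are the ones used in this lab:

```python
import os

# Base URLs are the ones used in this lab; the dict layout itself is an assumption
PROVIDERS = {
    "gemini": {
        "base_url": "https://generativelanguage.googleapis.com/v1beta/openai/",
        "key_env": "GOOGLE_API_KEY",
    },
    "deepseek": {
        "base_url": "https://api.deepseek.com/v1",
        "key_env": "DEEPSEEK_API_KEY",
    },
    "groq": {
        "base_url": "https://api.groq.com/openai/v1",
        "key_env": "GROQ_API_KEY",
    },
}


def client_kwargs(provider):
    """Hypothetical helper: build the keyword arguments for OpenAI(...)."""
    config = PROVIDERS[provider]
    return {"api_key": os.getenv(config["key_env"]), "base_url": config["base_url"]}


# Usage, assuming the openai package is installed and the key is set:
#   from openai import OpenAI
#   groq = OpenAI(**client_kwargs("groq"))
print(client_kwargs("groq")["base_url"])
```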

## For the next cell, we will use Ollama

Ollama runs a local web service that provides an OpenAI-compatible endpoint,  
and runs models locally using high-performance C++ code.

If you don't have Ollama, install it by visiting https://ollama.com, pressing Download and following the instructions.

Once it's installed, you should be able to visit http://localhost:11434 and see the message "Ollama is running".

You might need to restart Cursor (and maybe reboot). Then open a Terminal (control+\`) and run `ollama serve`.

Useful Ollama commands (run these in the terminal, or with an exclamation mark in this notebook):

`ollama pull <model_name>` downloads a model locally  
`ollama ls` lists all the models you've downloaded  
`ollama rm <model_name>` deletes the specified model from your downloads
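
The "Ollama is running" check described above can also be done from Python. This is a minimal sketch using only the standard library; `ollama_is_running` is a hypothetical helper name, and the default URL is the one given above:

```python
import urllib.error
import urllib.request


def ollama_is_running(base_url="http://localhost:11434", timeout=2.0):
    """Hypothetical helper: True if the server replies with the expected banner."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as reply:
            body = reply.read().decode("utf-8", errors="replace")
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure and similar errors
        return False
    return "Ollama is running" in body


print(ollama_is_running())
```

If this prints `False`, running `ollama serve` in a terminal is the first thing to try.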

<table style="margin: 0; text-align: left; width:100%">
    <tr>
        <td style="width: 150px; height: 150px; vertical-align: middle;">
            <img src="../assets/stop.png" width="150" height="150" style="display: block;" />
        </td>
        <td>
            <h2 style="color:#ff7800;">Super important - ignore me at your peril!</h2>
            <span style="color:#ff7800;">The model called <b>llama3.3</b> is FAR too large for home computers - it's not intended for personal computing and will consume all your resources! Stick with the nicely sized <b>llama3.2</b> or <b>llama3.2:1b</b> and if you want larger, try llama3.1 or smaller variants of Qwen, Gemma, Phi or DeepSeek. See <a href="https://ollama.com/models">the Ollama models page</a> for a full list of models and sizes.
            </span>
        </td>
    </tr>
</table>

In [18]:
!ollama pull llama3.2

pulling manifest

In [19]:
!ollama ls

NAME                        ID              SIZE      MODIFIED      
llama3.2:latest             a80c4f17acd5    2.0 GB    2 seconds ago    
mistral:7b                  3944fe81ec14    4.1 GB    6 months ago     
mistral:latest              f974a74358d6    4.1 GB    7 months ago     
tinyllama:latest            2644915ede35    637 MB    7 months ago     
phi:latest                  e2fd6321a5fe    1.6 GB    7 months ago     
dolphin-phi:latest          c5761fc77240    1.6 GB    7 months ago     
llama2:7b                   78e26419b446    3.8 GB    7 months ago     
gemma:2b                    b50d6c999e59    1.7 GB    7 months ago     
mxbai-embed-large:latest    468836162de7    669 MB    8 months ago     
gemma3:latest               a2af6cc3eb7f    3.3 GB    8 months ago     


In [20]:
ollama = OpenAI(base_url='http://localhost:11434/v1', api_key='ollama')
model_name = "llama3.2"

response = ollama.chat.completions.create(model=model_name, messages=messages)
answer = response.choices[0].message.content

display(Markdown(answer))
competitors.append(model_name)
answers.append(answer)

**6-Month Action Plan:**

1. **Expand Homelessness Services**
Description: Increase available temporary housing by partnering with local shelters, churches, and community organizations.
Expected Benefit: Provides safe shelter for displaced residents while maintaining a stable social network.
Main Cost/Risk: Increased partnerships and resource allocation; moderate confidence level due to the feasibility of implementing existing support systems.
Data Point: Total number of individuals served at new temporary housing sites.

2. **Enhance Public Health Response**
Description: Create a task force to provide immediate support to residents, including contact tracing and telemedicine for treatment.
Expected Benefit: Minimizes public exposure to infection while accessing necessary medical assistance.
Main Cost/Risk: Financial implications of staffing task force; higher confidence level due to the urgent need for social services.
Data Point: Number of suspected cases receiving remote care.

3. **Streamline Budget Reconciliation**
Description: Implement austerity measures, such as reducing utility costs and reducing city operations by 10%.
Expected Benefit: Reduces overall budget shortfall by $2 million in the first six months.
Main Cost/Risk: Reduced staff morale; moderate confidence level due to the relatively minor impact on personnel.
Data Point: Cumulative utility bill savings.

4. **Foster Community Support**
Description: Organize volunteer outreach programs through school districts, senior centers, and community groups.
Expected Benefit: Utilizes existing networks for resource distribution while maintaining social connections among displaced residents.
Main Cost/Risk: Volunteer engagement rates; moderate confidence level because potential partnerships should be feasible to secure.
Data Point: Total number of participated volunteers.

5. **Enhanced Emergency Communication**
Description: Activate emergency alert systems, improve messaging platforms, and develop critical response units (C.R.U.) for key sectors.
Expected Benefit: Promotes enhanced public awareness of health concerns via real-time response.
Main Cost/Risk: Equipment acquisition costs; moderate confidence level because implementing technology to replace existing processes would address the need more efficiently.
Data Point: Response rate percentage to issued alerts.

6. **Healthcare Partnership Development**
Description: Partner with regional hospitals and private healthcare providers for expanded capacity and treatment.
Expected Benefit: Allocates critical medical support in response to demand for healthcare services among displaced residents.
Main Cost/Risk: Negotiation costs; high confidence level due to the imperative need to improve access to necessary care.
Data Point: Percentage increase in new patients from crisis area.

7. **Temporary Budget Measures**
Description: Reduce city overhead and procurement expenses, while shifting certain programs to short-term or contract arrangements.
Expected Benefit: Mitigates overall financial risks by extending available capacity on short-term basis.
Main Cost/Risk: Changes in program effectiveness due to limited resource contracts; moderate confidence level as implemented effectively through flexible implementation of service adjustments.
Data Point: Increase in budgetary savings over fixed-term adjustment windows.

8. **Debt Restructure Analysis**
Description: Develop strategic partnerships with regional lenders for possible temporary hardship assistance, while retaining long-term financial oversight.
Expected Benefit: Secures the city's immediate credit through a debt settlement and interest structure modifications that reduces risks temporarily.
Main Cost/Risk: Risk to local employment due to reduced growth potential within loan agreement terms; medium confidence level as successful when carefully managed through collaborative agreements with strategic partners.
Data Point: Change in interest rates.

9. **Neighborhood Emergency Planning**
Description: Work with the public on localized response development for disaster recovery scenarios and urban renewal strategies.
Expected Benefit: Enhances public safety while addressing ongoing storm preparedness efforts simultaneously with crisis assistance needs.
Main Cost/Risk: Building capacity and community engagement at all costs; medium confidence level as necessary resources and strategic planning can lead to effective responses, though it depends on the quality of coordination with partners in this initiative.
Data Point: Percentage increase in neighborhood recovery teams established.

10. **Crisis Financial Planning**
Description: Work closer with public health officials, finance departments, and private sector organizations for emergency preparedness updates.
Expected Benefit: Integrates planning capabilities to address crisis financial needs, enhancing response effectiveness by aligning all departments simultaneously with crisis support strategies.
Main Cost/Risk: Misaligned expectations or inefficient allocation of risk-assessment strategies; moderate confidence level due to the complexity involved and interdependency across departments involved.
Data Point: Number of times crisis plans updated.

11. **Public Awareness Campaigns**
Description: Develop targeted health and economic information dissemination using community networks as sources for rapid public education.
Expected Benefit: Promotes understanding of risks amidst crisis challenges more efficiently without increasing unnecessary stress among affected residents.
Main Cost/Risk: Miscommunication or misinformation due to complexity in message translation across different community groups; medium confidence level due to effectiveness potential depending on the communication strategies put into place.
Data Point: Increase in resident self-assessment of health awareness.

12. **Business Support Assistance**
Description: Develop partnerships with local and regional organizations that can provide temporary recovery resources such as consulting services, loans available for crisis impacted businesses.
Expected Benefit: Provides economic safety net via collaboration between business growth agencies to foster resilience among crisis-stricken residents during this time period.
Main Cost/Risk: Potential failure of aid or misuse in handling of funds due incorrect intentions; moderate confidence level as collaborative approaches lead to successful implementation under proper planning practices.
Data Point: Number of small, medium businesses revived through recovery support.

In [21]:
# So where are we?

print(competitors)
print(answers)


['gpt-5-nano', 'gemini-2.5-flash', 'openai/gpt-oss-120b', 'llama3.2']
['Prioritized 6-month plan (8 items)\n\n1) Shelter expansion and displacement triage\n(a) Immediately expand displacement shelters and implement rapid triage to house 10,000 displaced residents.\n(b) Reduces overcrowding and disease transmission while stabilizing basic needs.\n(c) Requires staffing, security, and operating costs within a tight budget.\n(d) High confidence: crisis is acute and shelter capacity is the bottleneck.\n(e) Data point: if shelter occupancy remains >95% for 5 days or 3,000 residents remain unsheltered, reorder upward.\n\n2) Health surge: testing, vaccination, and ventilation\n(a) Deploy mobile testing, vaccination clinics, and ventilation upgrades in shelters and public spaces.\n(b) Lowers transmission and protects vulnerable groups, helping curb R (1.5–3).\n(c) Costs include test kits, vaccines, staffing, and retrofit expenses; risk of diverting funds.\n(d) High confidence: rapid health gain

In [22]:
# It's nice to know how to use "zip"
for competitor, answer in zip(competitors, answers):
    print(f"Competitor: {competitor}\n\n{answer}")


Competitor: gpt-5-nano

Prioritized 6-month plan (8 items)

1) Shelter expansion and displacement triage
(a) Immediately expand displacement shelters and implement rapid triage to house 10,000 displaced residents.
(b) Reduces overcrowding and disease transmission while stabilizing basic needs.
(c) Requires staffing, security, and operating costs within a tight budget.
(d) High confidence: crisis is acute and shelter capacity is the bottleneck.
(e) Data point: if shelter occupancy remains >95% for 5 days or 3,000 residents remain unsheltered, reorder upward.

2) Health surge: testing, vaccination, and ventilation
(a) Deploy mobile testing, vaccination clinics, and ventilation upgrades in shelters and public spaces.
(b) Lowers transmission and protects vulnerable groups, helping curb R (1.5–3).
(c) Costs include test kits, vaccines, staffing, and retrofit expenses; risk of diverting funds.
(d) High confidence: rapid health gains are essential in a high-R setting.
(e) Data point: if R sta
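
A quick aside on `zip`: it pairs items by position and silently stops at the shorter list, so a missing answer would go unnoticed. Here is a minimal sketch of a safer pairing, using made-up stand-in data rather than the real lists from this lab. The name `answers_by_competitor` is my own, not something defined earlier.

```python
# Hypothetical stand-in data; in the lab these lists are built in earlier cells.
competitors = ["model-a", "model-b", "model-c"]
answers = ["first answer", "second answer", "third answer"]

# zip stops at the shorter input, so check the lengths first.
# (On Python 3.10+, zip(competitors, answers, strict=True) performs this check for you.)
if len(competitors) != len(answers):
    raise ValueError("competitors and answers must have the same length")

# Build a lookup from model name to its answer.
answers_by_competitor = dict(zip(competitors, answers))

for competitor, answer in answers_by_competitor.items():
    print(f"{competitor}: {answer}")
```

A dict like this is handy if you later want to look up one model's answer by name instead of by position.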

In [23]:
# Let's bring this together - note the use of "enumerate"

together = ""
for index, answer in enumerate(answers):
    together += f"# Response from competitor {index+1}\n\n"
    together += answer + "\n\n"

In [24]:
print(together)

# Response from competitor 1

Prioritized 6-month plan (8 items)

1) Shelter expansion and displacement triage
(a) Immediately expand displacement shelters and implement rapid triage to house 10,000 displaced residents.
(b) Reduces overcrowding and disease transmission while stabilizing basic needs.
(c) Requires staffing, security, and operating costs within a tight budget.
(d) High confidence: crisis is acute and shelter capacity is the bottleneck.
(e) Data point: if shelter occupancy remains >95% for 5 days or 3,000 residents remain unsheltered, reorder upward.

2) Health surge: testing, vaccination, and ventilation
(a) Deploy mobile testing, vaccination clinics, and ventilation upgrades in shelters and public spaces.
(b) Lowers transmission and protects vulnerable groups, helping curb R (1.5–3).
(c) Costs include test kits, vaccines, staffing, and retrofit expenses; risk of diverting funds.
(d) High confidence: rapid health gains are essential in a high-R setting.
(e) Data point: if
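
As a variation on the cell that builds `together`, here is a small sketch of the same string assembled with `enumerate(..., start=1)` and `"".join(...)`. It uses made-up stand-in answers, and the name `sections` is my own. It should give the same layout as the loop above, just without the `index+1` arithmetic.

```python
# Hypothetical stand-in data; in the lab, answers holds the real model responses.
answers = ["first answer", "second answer"]

# start=1 makes enumerate count from 1, matching the competitor numbering.
sections = [
    f"# Response from competitor {number}\n\n{answer}\n\n"
    for number, answer in enumerate(answers, start=1)
]

# Joining a list of pieces once avoids repeated string concatenation in a loop.
together = "".join(sections)
print(together)
```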

In [25]:
judge = f"""You are judging a competition between {len(competitors)} competitors.
Each model has been given this question:

{question}

Your job is to evaluate each response for clarity and strength of argument, and rank them in order of best to worst.
Respond with JSON, and only JSON, with the following format:
{{"results": ["best competitor number", "second best competitor number", "third best competitor number", ...]}}

Here are the responses from each competitor:

{together}

Now respond with the JSON with the ranked order of the competitors, nothing else. Do not include markdown formatting or code blocks."""


In [26]:
print(judge)

You are judging a competition between 4 competitors.
Each model has been given this question:

You are an advisor to the mayor of a mid-sized city (population ~300,000) facing three simultaneous crises: a projected 40% budget shortfall over the next six months, a rapidly spreading respiratory illness with an estimated R between 1.5–3 and uncertain mortality, and 10,000 residents displaced by recent storm damage with limited shelter capacity. You cannot raise taxes or receive immediate external funding. Provide a prioritized 6-month action plan of up to 12 items that balances public health, fiscal solvency, and equity. For each item include: (a) one-sentence description, (b) one-sentence expected benefit, (c) one-sentence main cost or risk, (d) confidence level (low/medium/high) with one-sentence justification, and (e) one specific measurable data point that, if materially different, would make you reorder this item’s priority. Then name the single action you would implement first and j

In [27]:
judge_messages = [{"role": "user", "content": judge}]

In [28]:
# Judgement time!

openai = OpenAI()
response = openai.chat.completions.create(
    model="gpt-5-mini",
    messages=judge_messages,
)
results = response.choices[0].message.content
print(results)


{"results": ["3", "2", "1", "4"]}


In [29]:
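# OK let's turn this into results!

# The judge was asked to reply with JSON only, so parse the reply into a dict
results_dict = json.loads(results)
ranks = results_dict["results"]
for index, result in enumerate(ranks):
    # The judge's competitor numbers are 1-based strings; convert to a 0-based list index
    competitor = competitors[int(result)-1]
    print(f"Rank {index+1}: {competitor}")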

Rank 1: openai/gpt-oss-120b
Rank 2: gemini-2.5-flash
Rank 3: gpt-5-nano
Rank 4: llama3.2
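
The parsing above trusts the judge to return clean JSON with each competitor ranked exactly once. Models don't always comply, so here is a minimal sketch of a more defensive parser. The helper `parse_ranking` is hypothetical (my own name, not part of any library), and the fence-stripping step assumes the most common failure mode is a reply wrapped in a markdown code block.

```python
import json

FENCE = "`" * 3  # a markdown code fence marker


def parse_ranking(raw: str, n_competitors: int) -> list[int]:
    """Return the judge's ranking as 1-based competitor numbers, or raise ValueError."""
    text = raw.strip()
    # Some models wrap JSON in a code fence despite the instructions; strip it.
    if text.startswith(FENCE):
        text = text.strip("`")
        if text.startswith("json"):
            text = text[len("json"):]
    data = json.loads(text)  # json.JSONDecodeError is a subclass of ValueError
    try:
        ranks = [int(r) for r in data["results"]]
    except (KeyError, TypeError) as exc:
        raise ValueError(f"Unexpected judge reply: {raw!r}") from exc
    # Every competitor number from 1..n must appear exactly once.
    if sorted(ranks) != list(range(1, n_competitors + 1)):
        raise ValueError(f"Not a ranking of 1..{n_competitors}: {ranks}")
    return ranks


# Usage with the values printed earlier in this lab
competitors = ["gpt-5-nano", "gemini-2.5-flash", "openai/gpt-oss-120b", "llama3.2"]
results = '{"results": ["3", "2", "1", "4"]}'

for position, number in enumerate(parse_ranking(results, len(competitors)), start=1):
    print(f"Rank {position}: {competitors[number - 1]}")
```

Raising `ValueError` on a bad reply gives you one place to catch the problem and, for example, ask the judge again.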


<table style="margin: 0; text-align: left; width:100%">
    <tr>
        <td style="width: 150px; height: 150px; vertical-align: middle;">
            <img src="../assets/exercise.png" width="150" height="150" style="display: block;" />
        </td>
        <td>
            <h2 style="color:#ff7800;">Exercise</h2>
            <span style="color:#ff7800;">Which agentic design pattern(s) did this lab use? Try updating it to add another agentic design pattern.
            </span>
        </td>
    </tr>
</table>

<table style="margin: 0; text-align: left; width:100%">
    <tr>
        <td style="width: 150px; height: 150px; vertical-align: middle;">
            <img src="../assets/business.png" width="150" height="150" style="display: block;" />
        </td>
        <td>
            <h2 style="color:#00bfff;">Commercial implications</h2>
            <span style="color:#00bfff;">These kinds of patterns - sending a task to multiple models and then evaluating the results -
            are common when you need to improve the quality of your LLM responses. This approach can be applied
            to many business projects where accuracy is critical.
            </span>
        </td>
    </tr>
</table>