## Welcome to the Second Lab - Week 1, Day 3

Today we will work with lots of models! This is a way to get comfortable with APIs.

<table style="margin: 0; text-align: left; width:100%">
    <tr>
        <td style="width: 150px; height: 150px; vertical-align: middle;">
            <img src="../assets/stop.png" width="150" height="150" style="display: block;" />
        </td>
        <td>
            <h2 style="color:#ff7800;">Important point - please read</h2>
            <span style="color:#ff7800;">The way I collaborate with you may be different to other courses you've taken. I prefer not to type code while you watch. Rather, I execute Jupyter Labs, like this, and give you an intuition for what's going on. My suggestion is that you carefully execute this yourself, <b>after</b> watching the lecture. Add print statements to understand what's going on, and then come up with your own variations.<br/><br/>If you have time, I'd love it if you submit a PR for changes in the community_contributions folder - instructions in the resources. Also, if you have a GitHub account, use it to showcase your variations. Not only is this essential practice, but it demonstrates your skills to others, including perhaps future clients or employers...
            </span>
        </td>
    </tr>
</table>

In [2]:
# Start with imports - ask ChatGPT to explain any package that you don't know

import os
import json
from dotenv import load_dotenv
from openai import OpenAI
from anthropic import Anthropic
from IPython.display import Markdown, display

In [3]:
# Always remember to do this!
load_dotenv(override=True)

True

In [4]:
# Print the key prefixes to help with any debugging

openai_api_key = os.getenv('OPENAI_API_KEY')
anthropic_api_key = os.getenv('ANTHROPIC_API_KEY')
google_api_key = os.getenv('GOOGLE_API_KEY')
deepseek_api_key = os.getenv('DEEPSEEK_API_KEY')
groq_api_key = os.getenv('GROQ_API_KEY')

if openai_api_key:
    print(f"OpenAI API Key exists and begins {openai_api_key[:8]}")
else:
    print("OpenAI API Key not set")
    
if anthropic_api_key:
    print(f"Anthropic API Key exists and begins {anthropic_api_key[:7]}")
else:
    print("Anthropic API Key not set (and this is optional)")

if google_api_key:
    print(f"Google API Key exists and begins {google_api_key[:2]}")
else:
    print("Google API Key not set (and this is optional)")

if deepseek_api_key:
    print(f"DeepSeek API Key exists and begins {deepseek_api_key[:3]}")
else:
    print("DeepSeek API Key not set (and this is optional)")

if groq_api_key:
    print(f"Groq API Key exists and begins {groq_api_key[:4]}")
else:
    print("Groq API Key not set (and this is optional)")

OpenAI API Key exists and begins sk-proj-
Anthropic API Key not set (and this is optional)
Google API Key not set (and this is optional)
DeepSeek API Key not set (and this is optional)
Groq API Key not set (and this is optional)
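
The five checks above all follow one pattern, so they can be collapsed into a loop. This is a minimal sketch, assuming the same environment variable names and prefix lengths as the cell above; `describe_key` is a hypothetical helper written for illustration, not part of any library.

```python
import os

# (label, environment variable, prefix characters to show, optional?)
KEYS = [
    ("OpenAI", "OPENAI_API_KEY", 8, False),
    ("Anthropic", "ANTHROPIC_API_KEY", 7, True),
    ("Google", "GOOGLE_API_KEY", 2, True),
    ("DeepSeek", "DEEPSEEK_API_KEY", 3, True),
    ("Groq", "GROQ_API_KEY", 4, True),
]

def describe_key(label, value, prefix_len, optional):
    """Return the same status line the cell above prints for one key."""
    if value:
        return f"{label} API Key exists and begins {value[:prefix_len]}"
    suffix = " (and this is optional)" if optional else ""
    return f"{label} API Key not set{suffix}"

for label, env_var, prefix_len, optional in KEYS:
    print(describe_key(label, os.getenv(env_var), prefix_len, optional))
```

Adding another provider then only needs one more tuple in `KEYS`.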


In [5]:
request = "Please come up with a challenging, nuanced question that I can ask a number of LLMs to evaluate their intelligence. "
request += "Answer only with the question, no explanation."
messages = [{"role": "user", "content": request}]

In [6]:
messages

[{'role': 'user',
  'content': 'Please come up with a challenging, nuanced question that I can ask a number of LLMs to evaluate their intelligence. Answer only with the question, no explanation.'}]

In [7]:
openai = OpenAI()
response = openai.chat.completions.create(
    model="gpt-5-mini",
    messages=messages,
)
question = response.choices[0].message.content
print(question)


You are given responsibility for advising an international coalition on a 10-year rescue-and-development plan for a newly discovered, isolated island nation ("Naru") with 200,000 people, two main ethnic groups with a long history of intermittent violence, 60% primary tropical rainforest cover, abundant freshwater, and coastal deposits of rare-earth minerals; current GDP per capita is $4,000, literacy 70%, high infant mortality, and a fragile coalition government that rejects foreign military intervention—design a detailed, practical, prioritized policy roadmap that (A) increases the island’s Human Development Index by at least 0.15 in 10 years, (B) preserves at least 80% of primary forest cover, (C) ends open ethnic violence and establishes durable political institutions, and (D) attracts sustainable private and public investment—subject to a total externally sourced budget cap of $2 billion and no military enforcement—by providing, for each year 1–10: (1) six priority actions (one eac

In [8]:
competitors = []
answers = []
messages = [{"role": "user", "content": question}]
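
The `competitors` and `answers` lists are meant to be filled in parallel, one entry per model. The sketch below shows that pattern as a hypothetical helper, `record_answer`. It assumes only the `chat.completions.create` call shape used in this lab, and it runs against an offline stand-in client so no API key is needed. The stand-in and the `demo_` lists are illustrative and separate from the notebook's own variables.

```python
from types import SimpleNamespace

def record_answer(client, model_name, messages, competitors, answers):
    """Ask one model the question and store its name and answer in parallel lists."""
    response = client.chat.completions.create(model=model_name, messages=messages)
    answer = response.choices[0].message.content
    competitors.append(model_name)
    answers.append(answer)
    return answer

# Offline stand-in for a real client (an assumption for this sketch only).
def _fake_create(model, messages):
    text = f"[{model}] stub answer"
    return SimpleNamespace(choices=[SimpleNamespace(message=SimpleNamespace(content=text))])

fake_client = SimpleNamespace(
    chat=SimpleNamespace(completions=SimpleNamespace(create=_fake_create))
)

demo_competitors, demo_answers = [], []
demo_messages = [{"role": "user", "content": "Hi"}]
record_answer(fake_client, "model-a", demo_messages, demo_competitors, demo_answers)
record_answer(fake_client, "model-b", demo_messages, demo_competitors, demo_answers)

# Because the lists stay in step, zip pairs each model with its answer.
for name, text in zip(demo_competitors, demo_answers):
    print(f"{name}: {text}")
```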

## Note - update since the videos

I've updated the model names to use the latest models below, like GPT 5 and Claude Sonnet 4.5. It's worth noting that these models can be quite slow - like 1-2 minutes - but they do a great job! Feel free to switch them for faster models if you'd prefer, like the ones I use in the video.
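
To see how slow a given model is before deciding whether to swap it, it can help to time each call. This is a minimal sketch using only the standard library; `timed` and `slow_echo` are hypothetical names, and `slow_echo` stands in for a real API call so the example runs offline.

```python
import time

def timed(fn, *args, **kwargs):
    """Call fn(*args, **kwargs) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start

# Offline demonstration with a stand-in for a slow API call.
def slow_echo(text, delay=0.05):
    time.sleep(delay)
    return text

result, seconds = timed(slow_echo, "hello", delay=0.05)
print(f"Got {result!r} in {seconds:.2f}s")
```

With a real client, the same wrapper could be used as `timed(openai.chat.completions.create, model=model_name, messages=messages)`.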

In [9]:
# The API we know well
# I've updated this with the latest model, but it can take some time because it likes to think!
# Replace the model with gpt-4.1-mini if you'd prefer not to wait 1-2 mins

model_name = "gpt-5-nano"

response = openai.chat.completions.create(model=model_name, messages=messages)
answer = response.choices[0].message.content

display(Markdown(answer))
competitors.append(model_name)
answers.append(answer)

Executive summary and framing

- Context: Naru is a 200,000-person island nation with two main ethnic groups that have a history of intermittent violence. It has substantial primary rainforest (60%), abundant freshwater, and coastal deposits of rare-earth minerals. Current GDP per capita is about $4,000; literacy is 70%; infant/maternal health outcomes are poor; the coalition government is fragile and openly rejects foreign military intervention. External budget cap: $2 billion over 10 years; no military enforcement.

- Goal of the plan (10 years): (A) raise the island’s HDI by at least 0.15; (B) preserve at least 80% of primary forest cover (interpreted here as maintaining at least 80% of the current forest’s baseline extent over the 10-year period, with a firm floor on deforestation); (C) end open ethnic violence and establish durable, inclusive political institutions; (D) attract sustainable private and public investment.

- Core design principles: non-military, rights-based, locally legitimate, gender- and youth-inclusive, climate- and forest-sensitive, and fiscally disciplined within the $2B cap. Emphasis on ownership by Naruan people and institutions, with transparent use of funds, independent monitoring, and strong anti-corruption measures.

- What follows: for each year 1–10, six priority actions (one per governance, education, health, economy, environment, social reconciliation), each with a measurable indicator and annual target, a stated probability that the 10-year HDI target is met if followed, three plausible failure modes with contingency plans, a discussion of ethical trade-offs and mitigations, and a clear list of missing critical data/assumptions and how uncertainties would affect recommendations.

A. Assumptions, data gaps, and ethical framing (clearly state what is fact, what is assumed, and what reflects value judgments)

- Facts given by you (not contested here): population size, forest cover, key characteristics of economy/education/health, ethnic tensions, absence of foreign military enforcement, external budget cap.

- Assumptions we explicitly rely on (not facts about reality, but planning premises you requested we separate from data):  
  - The coalition can legally provide and manage $2B in a blended finance package (grants, concessional loans, and grants via a joint fund) over 10 years.  
  - The Naruan coalition government remains open to non-military external support that is conditional on governance and reform measures.  
  - Private sector interest can be mobilized in a staged manner (extractive value chain work, upstream processing, and green services) in a way that aligns with environmental safeguards.  
  - International partners are capable of delivering technical assistance and monitoring without on-the-ground coercion.  
  - A ceasefire and voluntary disarmament is possible with credible security-sector reform, police strengthening, and community policing, not with military enforcement.

- Data gaps and uncertainties (critical to track and update):  
  - Baseline HDI components (life expectancy, education index, and per-capita income) and current infant/maternal mortality rates with disaggregations by district/ethnic group.  
  - Current capacity of health system (number of facilities, health workers, supply chains).  
  - Baseline literacy rates disaggregated by age and gender, and current school enrollment/attendance by region.  
  - Exact land area under primary rainforest, current deforestation rate, and the legal/policy framework for forest governance.  
  - The structure of the coalition government, party system, and the feasibility of rapid constitutional reform.  
  - Potential external macro shocks (commodity prices for rare-earths, climate impacts, regional security dynamics) and their likely budgetary implications.  

- Value judgments and ethical commitments (how we mitigate harms):  
  - Prioritize non-coercive governance reform and inclusive participation; avoid coercive policing, preserve indigenous rights, and protect forest-dependent communities.  
  - Ensure labor rights in extractive and processing activities, with strict workplace standards and freedom of association.  
  - Promote environmental justice: disadvantaged regions and ethnic groups receive targeted health, education, and livelihood support to prevent widening inequalities.  
  - Transparency and consent: all large-scale investments require free, prior, and informed consent (as applicable) from affected communities.  
  - Balance between rapid HDI gains and long-term forest protection: we avoid destructive land-use projects that yield quick gains but long-term environmental damage.

B. Budget framing and sequencing across 10 years

- Total external budget cap: $2,000,000,000 (in present value, contingent on no material macro disruptions).  
- Suggested annual mix (average, with adjustments as milestones are met):  
  - Governance and state-building: $800M total (~$80M/year)  
  - Education: $400M total (~$40M/year)  
  - Health: $400M total (~$40M/year)  
  - Economy and private investment enablement: $250M total (~$25M/year)  
  - Environment and forest protection: $100M total (~$10M/year)  
  - Social reconciliation and community mediation: $50M total (~$5M/year)  
  - Administrative/monitoring/contingencies: covered within the above allocations; annual spend may vary with project maturity.

- Financing logic: blend grants and concessional funding with private-sector co-investments and local-government co-financing; establish a transparent, accountable fund with independent monitoring; tie disbursements to measurable milestones and social safeguards.
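
As a sanity check on the allocation above, the sector totals can be summed against the cap. This is a short sketch with the figures copied from the list (in millions of USD); the variable names are illustrative.

```python
# Allocations in millions of USD, copied from the list above.
allocations = {
    "Governance and state-building": 800,
    "Education": 400,
    "Health": 400,
    "Economy and private investment enablement": 250,
    "Environment and forest protection": 100,
    "Social reconciliation and community mediation": 50,
}
CAP_MILLIONS = 2_000
YEARS = 10

total = sum(allocations.values())
print(f"Total: ${total}M of ${CAP_MILLIONS}M cap")
for sector, amount in allocations.items():
    share = amount / total
    print(f"{sector}: ~${amount / YEARS:.0f}M/year, {share:.1%} of budget")
```

The six sectors sum to exactly the cap, which is consistent with the note that administrative and monitoring costs are covered within these allocations rather than budgeted separately.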

C. The 10-year policy roadmap: Year-by-year actions, indicators, risks, contingencies, ethics, and uncertainties

Important note on format: for each year, six actions (governance, education, health, economy, environment, social reconciliation) are listed with (1) indicator and target, (2) estimated probability of meeting the 10-year HDI target if the plan is followed, (3) three plausible failure modes and contingency plans, (4) ethics and harm mitigation, and (5) data/assumptions gaps and how they affect recommendations. Targets are stated as annual milestones or year-end goals. Where baselines are unknown, targets are expressed as year-over-year progress or relative improvements, with explicit caveats.

Year 1

- Governance action: Establish an equitable, inclusive coalition governance framework and independent oversight.  
  Indicator: Proportion of major policy reforms enacted with cross-ethnic buy-in and public release of reform documents; target: 100% of flagship reforms published and at least 60% cross-party/ethnic consensus by Year-end.  
  Probability HDI target if followed: moderate (e.g., 0.60). Rationale: governance reforms enable service delivery and reduce violence risk.  
  Failure modes and contingencies:  
  1) Political blocking by entrenched actors; contingency: fast-track non-controversial reforms via interim executive orders with parliamentary ratification window.  
  2) Public distrust in reform process; contingency: establish community councils with randomized representation to solicit input.  
  3) Corruption risk during procurement; contingency: independent procurement watchdog and open data portal.  
  Ethics and harms: ensure that reforms avoid marginalizing smaller communities; use inclusive consultation; publish procurement data.  
  Data gaps/assumptions: need baseline on current governance transparency and reform capacity; if baseline is weaker, adjust timeline and set smaller milestones and stronger monitoring.

- Education action: Launch universal primary education access initiative with literacy-first approach; teacher recruitment and training.  
  Indicator: Primary enrollment rate; target: 90% by Year-end; Literacy improvement of 5 percentage points; target: +5 points.  
  Probability: medium-to-high (0.55).  
  Failure modes:  
  1) Teacher shortages; contingency: accelerate teacher recruitment and international teacher-mentor program;  
  2) Infrastructure gaps (schools, electricity, water); contingency: deploy modular classrooms and solar/water access.  
  3) Student absenteeism due to household tasks; contingency: provide school meals and conditional cash transfers to families.  
  Ethics: ensure gender parity in access; actively recruit female teachers; remove barriers for girls’ attendance.  
  Data gaps: need baseline enrollment and literacy rates by district; if poor data quality, rely on rapid learning assessments and triangulation.

- Health action: Deploy community health workers (CHWs) and essential maternal-newborn care package; strengthen essential medicines supply chain.  
  Indicator: Under-5 mortality rate (U5MR) reduction target: -10% relative to baseline by Year-end; Maternal mortality ratio (MMR) reduction target: -15%.  
  Probability: medium (0.50).  
  Failure modes:  
  1) Supply-chain disruptions; contingency: reserve stockpile and build regional distribution hubs;  
  2) CHW retention issues; contingency: incentive schemes and career ladders.  
  3) Security-related disruption to clinic access; contingency: mobile clinics and telemedicine where feasible.  
  Ethics: ensure equitable access across districts, including remote/indigenous communities; maintain patient confidentiality.  
  Data gaps: need district-level health indicators; if not available, establish a baseline within six months with rapid surveys.

- Economy action: Create a transparent, streamlined investment-enabling framework with project portals, environmental safeguards, and community-benefit sharing requirements.  
  Indicator: Number of investment projects approved with environmental/social safeguards; target: 5 projects by year-end; investment commitments valued.  
  Probability: moderate (0.50).  
  Failure modes:  
  1) Local capacity gaps (land-use permitting, tax compliance); contingency: build local capacity through training;  
  2) Environmental backlash against mining-related projects; contingency: robust EIA and community engagement.  
  3) Market/commodity-price volatility; contingency: diversification into processing and other sectors.  
  Ethics: ensure fair labor conditions; require community-benefit agreements; share value with local communities.  
  Data gaps: need baseline for investment appetite and permitting processes; if weak, start with pilot zones and scalable governance improvements.

- Environment action: Pass forest-protection baseline law and establish pilot protected-buffer zones.  
  Indicator: Number of new protected areas established; target: 2 zones; baseline deforestation rate defined (currently unknown); target: deforestation within protected zones cut by at least 50% relative to that baseline by Year-end.  
  Probability: medium (0.45).  
  Failure modes:  
  1) Illegal logging; contingency: community forest watch programs and remote sensing;  
  2) Insufficient funding for enforcement; contingency: performance-based grants to local authorities;  
  3) Competing land-use pressures; contingency: land-use planning and compensation schemes.  
  Ethics: ensure indigenous rights and land tenure clarity; avoid displacing communities; ensure benefit-sharing.  
  Data gaps: current forest tenure arrangements; if unclear, provide land-tenure clarity workshops and mapping.

- Social reconciliation action: Launch truth-and-reconciliation process design and local peace committees.  
  Indicator: Establishment of a national truth-and-reconciliation commission; target: scheduled in Year 1; number of local peace committees formed; target: 10.  
  Probability: moderate (0.50).  
  Failure modes:  
  1) Security threats to commissioners; contingency: secure venues and protective measures;  
  2) Deep mistrust among groups; contingency: facilitated dialogues and intermediary mediators;  
  3) Political interference; contingency: independent funding and governance structure.  
  Ethics: ensure equitable representation; protect witness safety; ensure non-retaliation; consent-based processes for harms remediation.  
  Data gaps: need baseline conflict metrics and past grievances; if weak, implement rapid social surveys in parallel with reconciliation activities.

Year 2

- Governance: Build civil-service reform and anti-corruption framework; establish independent auditing body with public reporting.  
  Indicator: Public procurement integrity index; target: publish quarterly procurement reports and achieve 75% compliance by year-end.  
  Probability: 0.60.  
  Failure modes and contingencies:  
  1) Political pushback; contingency: build cross-sector oversight board;  
  2) Shadow economy resistance; contingency: digital payments and traceability;  
  3) Insufficient data; contingency: baseline audit and external evaluation.  
  Ethics: maintain transparency; avoid targeting individuals; protect whistleblowers.  
  Data gaps: baseline procurement transparency; fill with independent audit.

- Education: Primary-to-secondary transition program; expand secondary access; teacher training expansion; digital learning tools.  
  Indicator: % of cohort completing primary-to-secondary transition; target: 60% by year-end; Secondary enrollment growth rate; target: +10 percentage points.  
  Probability: 0.52.  
  Failure modes:  
  1) Funding gaps; contingency: reform budget lines and private philanthropy;  
  2) Technology access gaps; contingency: offline digital content and radio-based lessons;  
  3) Gender barriers persist; contingency: gender-sensitivity training; girls’ scholarships.  
  Ethics: improve opportunities for girls and marginalized communities; monitor gender parity.  
  Data gaps: baseline transition rates; update.

- Health: Immunization scale-up, nutrition programs; workforce training.  
  Indicator: Vaccination coverage (children under 5 for core vaccines); target: 90%; stunting rate; target: -10 percentage points from baseline.  
  Probability: 0.55.  
  Failure modes:  
  1) Vaccine supply issues; contingency: buffer stock and multiple suppliers;  
  2) Cold-chain failures in remote areas; contingency: solar-powered cold chain;  
  3) Demand-side hesitancy; contingency: community engagement and trusted messengers.  
  Ethics: ensure equitable service across districts; involve communities in planning.  
  Data gaps: immunization baseline; fill with survey data.

- Economy: Local value addition in early-stage processing; incentivize small and medium-sized enterprises (SMEs).  
  Indicator: Number of SMEs linked to mining/value chains; target: 15 new SMEs integrated; local revenue contribution growth; target: +5% of GDP from targeted sectors.  
  Probability: 0.45.  
  Failure modes:  
  1) SME finance gaps; contingency: guarantee programs and microfinance;  
  2) Environmental compliance risk; contingency: mandatory environmental safeguards;  
  3) Market access constraints; contingency: export facilitation and trade-linkages.  
  Ethics: ensure benefit-sharing with local communities; ensure fair labor rights.  
  Data gaps: baseline SME ecosystem; fill via mapping.

- Environment: Forest governance improvements; community-led forest management pilots.  
  Indicator: Community forest-user group formal recognition; target: 6 groups recognized; annual deforestation rate in pilot zones reduced by 60% relative to baseline.  
  Probability: 0.50.  
  Failure modes:  
  1) Non-compliance by logging operators; contingency: enforcement and sanctions;  
  2) Conflict between communities over resource use; contingency: mediation and transparent benefit-sharing;  
  3) Data gaps; contingency: remote sensing and local trackers.  
  Ethics: ensure indigenous tenure and consent; ensure consent for restricted use.  
  Data gaps: need baseline forest-use patterns; fill with participatory mapping.

- Social reconciliation: Inter-ethnic dialogue forums and local security reform; disarmament planning.  
  Indicator: Number of demobilized ex-combatants reintegrated; target: 100; community-violence incidents reduced; target: -25% from baseline.  
  Probability: 0.50.  
  Failure modes:  
  1) Relapse into violence; contingency: rapid-response peace teams;  
  2) Commando rearmament; contingency: trace-and-disarm programs;  
  3) Distrust of disarmament process; contingency: transparency and independent monitoring.  
  Ethics: ensure fair treatment; comprehensive reintegration support (jobs, land, livelihoods).  
  Data gaps: current scale of violence; fill with rapid assessments.

Year 3

- Governance: Strengthen judiciary and legal reform to improve rule of law; institute civilian oversight of security services.  
  Indicator: Perceived judicial independence index; target: improve by +0.1 on a 0–1 scale; cases resolved within timeframes; target: 20% improvement.  
  Probability: 0.60.  
  Failure modes:  
  1) Political capture; contingency: independent budget and multi-stakeholder oversight;  
  2) Lack of qualified judges; contingency: regional judicial exchanges;  
  3) Backlog of cases; contingency: fast-track courts for urgent matters.  
  Ethics: protect due process; ensure fair trials; protect vulnerable groups.  
  Data gaps: baseline independence ratings; fill with surveys.

- Education: Expand secondary-level science and vocational tracks; teacher specialization.  
  Indicator: Secondary enrollment rate; target: +15 percentage points; STEM enrollment share; target: 25%.  
  Probability: 0.58.  
  Failure modes:  
  1) STEM teacher shortages; contingency: remote teaching and international mentors;  
  2) Apprenticeship placement gaps; contingency: public-private cooperation;  
  3) Student dropouts; contingency: retention programs and scholarships.  
  Ethics: gender parity; ensure inclusive access to STEM fields.  
  Data gaps: baseline secondary enrollment; fill with administrative data.

- Health: Strengthen maternal and neonatal health; expand essential obstetric services.  
  Indicator: Facility-based delivery rate; target: +20 percentage points; skilled birth attendant coverage; target: 90%.  
  Probability: 0.62.  
  Failure modes:  
  1) Transport barriers for pregnant women; contingency: ambulance networks and community transport;  
  2) Cultural barriers; contingency: involve traditional birth attendants with training;  
  3) Stockouts; contingency: cross-border procurement arrangements.  
  Ethics: ensure culturally sensitive care; protect patient privacy; ensure informed consent.  
  Data gaps: baseline facility-delivery rates; fill with surveys.

- Economy: Invest in regional trade hubs and logistics; develop export-readiness for minerals processed locally.  
  Indicator: Number of export-ready value-added products; target: 3; logistics efficiency index improvement; target: +0.2.  
  Probability: 0.50.  
  Failure modes:  
  1) Global demand shifts; contingency: diversify product lines;  
  2) Infrastructure failures; contingency: resilience investments;  
  3) Environmental pushback; contingency: robust environmental safeguards.  
  Ethics: ensure local communities’ access to opportunities; monitor labor rights.  
  Data gaps: baseline logistics and value chains; fill with mapping.

- Environment: Scale up forest protection across jurisdictions; strengthen protected-area networks.  
  Indicator: Area under formal protection; target: +10% new protected area; enforcement capacity; target: enforcement visits.  
  Probability: 0.52.  
  Failure modes:  
  1) Boundary disputes; contingency: customary law recognition;  
  2) Enforcement gaps; contingency: community rangers;  
  3) Funding shortfalls; contingency: donor financing triggers.  
  Ethics: ensure rights of indigenous communities; transparent management.  
  Data gaps: baseline protected areas; fill via inventory.

- Social reconciliation: Local peace committees expand; youth and women leadership in reconciliation; cross-ethnic exchange programs.  
  Indicator: Youth and women representation in local peace committees; target: 40% by Year-end; incidents of violence; target: -20% from baseline.  
  Probability: 0.55.  
  Failure modes:  
  1) Elite capture; contingency: rotate leadership;  
  2) Fatigue and disengagement; contingency: recognition incentives;  
  3) Local conflicts resurface; contingency: rapid response mediation.  
  Ethics: ensure fair participation; protect vulnerable groups; value traditional practices while promoting universal human rights.  
  Data gaps: measure youth and women participation baselines; fill with surveys.

Year 4

- Governance: Constitutional reform process to enshrine power-sharing and minority rights with public ratification, constitutional court, and elections reform.  
  Indicator: Adoption of constitutional amendments and ratification; target: process completed; public vote scheduled; outcome validated.  
  Probability: 0.60.  
  Failure modes:  
  1) Ratification failure; contingency: alternative reform tracks;  
  2) Outbreaks of violence during reform; contingency: community policing (temporary security escalation is not permitted);  
  3) International donor fatigue; contingency: milestone-based disbursements and progress reporting.  
  Ethics: safeguard minority rights; ensure consent and legitimacy.  
  Data gaps: public opinion baseline on constitutional support; fill with polls.

- Education: Universal basic literacy with female education emphasis; adult literacy programs.  
  Indicator: Adult literacy rate; target: +7 percentage points; female literacy rate; target: +10 points.  
  Probability: 0.57.  
  Failure modes:  
  1) Stagnant female participation; contingency: targeted scholarships and safe schooling;  
  2) Funding constraints; contingency: grant-matching and community-based funding;  
  3) Literacy measurement inaccuracies; contingency: multiple methods (tests, surveys).  
  Ethics: ensure gender equity and culturally sensitive pedagogy.  
  Data gaps: baseline adult literacy by gender; fill with rapid assessment.

- Health: Expanded immunization, maternal and child nutrition; water/sanitation improvements.  
  Indicator: Access to safe water and sanitation; target: 80% of households; vitamin A and micronutrient supplementation coverage; target: 90%.  
  Probability: 0.58.  
  Failure modes:  
  1) Water contamination or supply issues; contingency: water trucking and purification;  
  2) Vaccine cold-chain breakage; contingency: solar-powered options;  
  3) Nutrition program leakage (not reaching target groups); contingency: targeted outreach and home visits.  
  Ethics: respect consent and privacy; ensure outreach respects local customs.  
  Data gaps: baseline water/sanitation, nutrition indicators; fill with rapid surveys.

- Economy: Investment in education- and tech-enabled sectors; incentivize green jobs.  
  Indicator: Jobs created in green/tech sectors; target: +2,000 jobs; female participation rate; target: 40%.  
  Probability: 0.52.  
  Failure modes:  
  1) Mismatch between skills and jobs; contingency: re-train and re-skill programs;  
  2) Private sector risk aversion; contingency: policy guarantees and public-private partnerships;  
  3) Environmental safeguards constraints; contingency: streamlined compliance.  
  Ethics: ensure fair labor standards and community benefits.  
  Data gaps: baseline labor market; fill with surveys.

- Environment: Forest monitoring and sustainable harvesting; community governance.  
  Indicator: Forest health index; target: stabilization or improvement; illegal harvesting incidents; target: -50%.  
  Probability: 0.50.  
  Failure modes:  
  1) Illegal actors circumvent monitoring; contingency: community ranger networks;  
  2) Climate-related stressors; contingency: adaptation measures;  
  3) Data gaps; contingency: cross-border data sharing.  
  Ethics: ensure indigenous rights; ensure benefit-sharing from forest resources.  
  Data gaps: baseline forest health; fill with field surveys.

- Social reconciliation: National reconciliation week; inter-ethnic youth exchange; mediation training.  
  Indicator: Number of youth mediators trained; target: 200; violence incident count; target: -15%.  
  Probability: 0.54.  
  Failure modes:  
  1) Youth disengagement; contingency: incentives and mentorship;  
  2) Political manipulation; contingency: independent monitoring;  
  3) Intergroup mistrust; contingency: ongoing dialogue.  
  Ethics: protect minority rights, empower youth and women; avoid coercion.  
  Data gaps: baseline youth mediator counts; fill with program data.

Year 5 (midpoint)

- Governance: Strengthen formal legal and regulatory frameworks; publish annual public governance dashboards.  
  Indicator: Public trust and perceived legitimacy index; target: +0.15 on a 0–1 scale; number of reforms implemented; target: 75% of planned reforms completed.  
  Probability: 0.65.  
  Failure modes:  
  1) Reform fatigue; contingency: simplify and consolidate reforms;  
  2) Corruption resurgence; contingency: automated compliance monitoring;  
  3) Political gridlock; contingency: legal mandate for essential reforms.  
  Ethics: maintain transparency; protect witnesses and whistleblowers.  
  Data gaps: baseline trust index; fill via mixed-methods.

- Education: Secondary education completion and quality; STEM focus; teacher development.  
  Indicator: Secondary completion rate; target: +15 percentage points; standardized test pass rate in core subjects; target: +10%.  
  Probability: 0.60.  
  Failure modes:  
  1) Access issues in rural/remote areas; contingency: mobile schools;  
  2) Teacher attrition; contingency: higher salaries and career progression;  
  3) Technology access; contingency: offline resources and ICT labs.  
  Ethics: equity of access; ensure girls’ participation and safe schooling.  
  Data gaps: baseline secondary outcomes; fill with district data.

- Health: Expanded maternal and neonatal health; essential medicines access; disease prevention.  
  Indicator: Neonatal mortality rate; target: -20%; maternal midwifery coverage; target: 95%.  
  Probability: 0.60.  
  Failure modes:  
  1) Supply chain fragility; contingency: multiple suppliers and stockpiles;  
  2) Access barriers for remote communities; contingency: mobile clinics;  
  3) Data lag; contingency: sentinel surveillance and rapid assessments.  
  Ethics: ensure culturally appropriate care; respect privacy and consent.  
  Data gaps: baseline MMR and neonatal mortality; fill with rapid health surveys.

- Economy: Infrastructure investments and regional integration; export diversification.  
  Indicator: Infrastructure quality index; target: +0.2; number of regional trade agreements or partnerships; target: 2.  
  Probability: 0.50.  
  Failure modes:  
  1) Financing gaps; contingency: project-based financing and donor co-financing;  
  2) Environmental conflicts; contingency: alignment with forest safeguards;  
  3) Local procurement challenges; contingency: capacity-building.  
  Ethics: ensure fair labor practices and local employment; monitor for displacement.  
  Data gaps: baseline infrastructure, trade readiness; fill with surveys.

- Environment: Forest restoration pilots; community-based conservation; climate resilience.  
  Indicator: Area restored within degraded landscapes; target: +5% of degraded area; carbon-stock monitoring; target: establish baseline, 0–1 scale.  
  Probability: 0.52.  
  Failure modes:  
  1) Climate stress; contingency: diversified species plan;  
  2) Funding lapse; contingency: performance-based allocation;  
  3) Land tenure disputes; contingency: mediation and recognition of customary rights.  
  Ethics: ensure representatives of indigenous communities in restoration governance.  
  Data gaps: baseline degraded land; fill with mapping.

- Social reconciliation: Long-term peace architecture; local governance reforms; cross-ethnic coalitions.  
  Indicator: Proportion of districts with functioning peace committees; target: 75% of districts; violence incidents; target: -30%.  
  Probability: 0.58.  
  Failure modes:  
  1) Entrenched elites block reform; contingency: independent oversight;  
  2) Inter-ethnic resentment resurfaces; contingency: ongoing dialogue;  
  3) Resource competition; contingency: fair distribution mechanisms.  
  Ethics: protect minority rights; ensure alternative livelihoods; maintain community consent.  
  Data gaps: district-level peace indicators; fill via field reports.

Year 6

- Governance: Stabilize constitutional/government practices; strengthen budget transparency and citizen input.  
  Indicator: Public budget transparency index; target: +0.2; citizen budget engagement events; target: 50 per year.  
  Probability: 0.65.  
  Failure modes:  
  1) Political backsliding; contingency: legal safeguards and independent oversight;  
  2) Data manipulation concerns; contingency: open data portal;  
  3) Donor fatigue; contingency: milestone-based funding.  
  Ethics: ensure non-discrimination; provide access for marginalized groups.  
  Data gaps: baseline transparency; fill via review.

- Education: Vocational training linked to mining- and green-industries; apprenticeships.  
  Indicator: Apprenticeship completion rate; target: 60%; local hires in new projects; target: 40%.  
  Probability: 0.58.  
  Failure modes:  
  1) Employer demand mismatch; contingency: adjust curricula;  
  2) Training capacity limits; contingency: expand partnerships and remote learning;  
  3) Safety concerns in industrial settings; contingency: improved safety training and enforcement.  
  Ethics: ensure fair hiring; avoid displacement; respect workers’ rights.  
  Data gaps: baseline vocational uptake; fill with program data.

- Health: Expanded disease surveillance and emergency response; maternal/child health network.  
  Indicator: Disease outbreak response time; target: reduce to threshold; health-system readiness index; target: 0.75 on readiness scale.  
  Probability: 0.60.  
  Failure modes:  
  1) Surveillance gaps; contingency: community-based reporting;  
  2) Logistics disruptions; contingency: regional hubs;  
  3) Public trust issues; contingency: transparent communication.  
  Ethics: ensure privacy; equitable access; avoid stigmatization.  
  Data gaps: baseline readiness; fill with drills and reviews.

- Economy: Attract sustainable investments into processing and services; implement export controls to ensure value capture locally.  
  Indicator: Local value-added share in exports; target: +10 percentage points; new foreign-invested projects; target: 3.  
  Probability: 0.52.  
  Failure modes:  
  1) Resource-nationalism pushback; contingency: governance reforms and community-benefit sharing;  
  2) Environmental compliance risk; contingency: risk-based auditing;  
  3) Currency risk; contingency: hedging and macro-stabilization plans.  
  Ethics: ensure fair labor rights and community consent in extractive projects.  
  Data gaps: baseline export processing share; fill with trade data.

- Environment: Strengthened forest-fire prevention; cross-border environmental governance.  
  Indicator: Fire incidents in forested areas; target: -50%.  
  Probability: 0.50.  
  Failure modes:  
  1) Drought-driven fires; contingency: early-warning system and community fire brigades;  
  2) Funding gaps; contingency: emergency reserve;  
  3) Community risk of displacement; contingency: compensation and reforestation commitments.  
  Ethics: protect Indigenous lands; ensure rights and occupancy stay intact.  
  Data gaps: baseline fire risk; fill with climate data.

- Social reconciliation: Long-term peace agreements embedded in local governance; youth-to-youth exchanges; inclusive public communications.  
  Indicator: Public trust in institutions index; target: +0.15; participants in exchanges; target: 2,000.  
  Probability: 0.60.  
  Failure modes:  
  1) Media manipulation; contingency: independent public broadcasting;  
  2) Local-level violence eruptions; contingency: rapid response mediation;  
  3) Donor-driven narratives; contingency: ensure local leadership and consent;  
  Ethics: maintain autonomy of local actors; avoid externalizing responsibility; respect cultural practices.  
  Data gaps: baseline trust in institutions; fill via surveys.

Year 7–Year 9 (consolidation and scale-up; maintain focus on HDI components, forest protection, and inclusive governance)

- Governance: Strengthen anti-corruption and the rule of law; ensure public participation in budget and policy cycles.  
  Indicator: Corruption perception/progress index; target: improvement of at least 0.05 on composite index; number of bills enacted to strengthen oversight; target: 8–10.  
  Probability: 0.68.  
  Failure modes:  
  1) Elite capture; contingency: citizen oversight, public dashboards;  
  2) Enforcement gaps; contingency: expanded training and regional cooperation;  
  3) Administrative capacity constraints; contingency: capacity-building programs.  
  Ethics: protect rights; ensure due process for all.

- Education: Upgrade higher education and technical training aligned with economy’s green/mineral sectors; remote and on-site programs.  
  Indicator: University/institute capacity and enrollment; target: 2 new accredited programs; share of graduates employed in domestic sectors; target: 25%.  
  Probability: 0.60.  
  Failure modes:  
  1) Resource constraints; contingency: blended finance for education infrastructure;  
  2) Mismatch with labor market; contingency: industry advisory boards;  
  3) Brain drain risk; contingency: retention strategies and local-for-local initiatives.  
  Ethics: ensure access for marginalized groups; provide scholarships for underrepresented cohorts.

- Health: Primary health network expansion; non-communicable disease (NCD) prevention; mental health integration.  
  Indicator: NCD screening coverage; target: 40% of adult population; mental health service utilization; target: 15% of population.  
  Probability: 0.60.  
  Failure modes:  
  1) Funding constraints; contingency: scaling with private sector and NGOs;  
  2) Stigma; contingency: community outreach;  
  3) Supply chain; contingency: digital health and telemedicine;  
  Ethics: protect patient privacy; ensure informed consent; address cultural concerns.  
  Data gaps: baseline NCD burden; fill with health surveys.

- Economy: Large-scale private investment in processing and logistics; export diversification; regional integration.  
  Indicator: Foreign direct investment (FDI) inflow; target: +40% YoY; export diversification index; target: +0.25.  
  Probability: 0.62.  
  Failure modes:  
  1) Global market shifts; contingency: diversify export markets;  
  2) Infrastructure delays; contingency: accelerate critical logistics upgrades;  
  3) Environmental conflict; contingency: independent environmental review and mitigation.  
  Ethics: ensure labor rights and community share of benefits; ensure environmental justice.  
  Data gaps: baseline FDI and diversification; fill with investment data.

- Environment: Forest protection and climate adaptation; community-based forest management and carbon-credit readiness.  
  Indicator: Carbon-credit readiness score; target: ready for third-party verification by Year 9; forest-loss rate; target: maintained below threshold.  
  Probability: 0.55.  
  Failure modes:  
  1) Carbon market volatility; contingency: counter-cyclical funding;  
  2) Governance gaps in benefit-sharing; contingency: transparent governance;  
  3) Local opposition to restricted use; contingency: adapt land-use policies and compensation.  
  Ethics: ensure indigenous tenure rights; ensure fair distribution of benefits.  
  Data gaps: baseline carbon stock; fill with forest inventories.

- Social reconciliation: Finalize constitution and long-term peace framework; national-level memorialization and education on history; community leadership networks.  
  Indicator: National peace index; target: +0.20; inclusion of diverse ethnic groups in leadership; target: 50% across leadership positions.  
  Probability: 0.62.  
  Failure modes:  
  1) Resistance from factions; contingency: ongoing dialogue;  
  2) Fragmentation of regional governance; contingency: inter-regional councils;  
  3) Narrative conflict; contingency: independent truth-mediation;  
  Ethics: inclusive memory and justice; consider indigenous cultural practices with respect to truth narratives.  
  Data gaps: baseline peace index; fill with surveys.

Year 10 (final evaluation and consolidation)

- Governance: Finalize and entrench durable institutions; independent oversight; public trust in governance at or above target.  
  Indicator: HDI-aligned governance quality composite and public trust; target: top-tier measure within this context; public trust index +0.20 since Year 1; cross-party policy consensus.  
  Probability: 0.75 (optimistic) / 0.60 (central).  
  Failure modes:  
  1) Political backsliding; contingency: sunset clauses and independent oversight;  
  2) External shocks; contingency: reserve plans;  
  3) Donor fatigue; contingency: domestic revenue mobilization and local co-financing;  
  Ethics: ensure sustainability without perpetuating power disparities; preserve minority protections.  
  Data gaps: long-run governance and trust metrics; fill with longitudinal surveys.

- Education: Education system resilient, inclusive, and adaptable; universal literacy, STEM pipelines, and lifelong learning.  
  Indicator: HDI education sub-index composite; target: +0.10; tertiary and vocational enrollments; target: 35% of youth.  
  Probability: 0.70.  
  Failure modes:  
  1) Demographic shifts; contingency: scalable programs;  
  2) Skills misalignment; contingency: continuous industry feedback;  
  3) Financial volatility; contingency: endowments and stable funding streams.  
  Ethics: ensure equitable access; protect privacy in education data.

- Health: Robust health system with universal access; strong maternal/child health outcomes; resilient public health infrastructure.  
  Indicator: Life expectancy at birth; target: +5–7 years; U5MR; target: -40% (relative).  
  Probability: 0.70.  
  Failure modes:  
  1) Emerging diseases; contingency: preparedness and international health partnerships;  
  2) Health system fragility; contingency: multi-year financing commitments;  
  3) Equity gaps; contingency: targeted programs;  
  Ethics: maintain patient dignity; ensure consent; efficient use of resources.

- Economy: Sustainable private investment and public investment with a diversified economy and strong domestic markets; continued value addition.  
  Indicator: GDP per capita (nominal) growth; target: +50–70% over 10 years; private investment share of GDP; target: 25–30%.  
  Probability: 0.65.  
  Failure modes:  
  1) Commodity price shocks; contingency: diversification into services and manufacturing;  
  2) Fiscal constraints; contingency: debt sustainability and revenue reforms;  
  3) Social tensions from growth; contingency: inclusive growth strategy;  
  Ethics: ensure fair labor rights; prevent exploitation; ensure benefit-sharing.

- Environment: Forest cover preserved; ongoing restoration; carbon-positive activities where feasible.  
  Indicator: Primary forest cover remaining (or remaining baseline); target: at least 80% of initial baseline preserved; net deforestation rate near zero or negative where possible.  
  Probability: 0.60.  
  Failure modes:  
  1) Deforestation spillovers; contingency: cross-border enforcement;  
  2) Climate variability; contingency: adaptation investments;  
  3) Resource-use pressures; contingency: stronger land-use planning;  
  Ethics: ensure indigenous rights and consent; sustainable resource use; respect community livelihoods.  
  Data gaps: up-to-date forest inventory; fill with remote sensing.

- Social reconciliation: Durable peace and social cohesion; inclusive institutions; long-term social trust.  
  Indicator: National social cohesion index; target: +0.25; violence incidents; target: -50%.  
  Probability: 0.70.  
  Failure modes:  
  1) Emerging tensions; contingency: continuous dialogue;  
  2) Elite manipulation; contingency: strong anti-corruption and oversight;  
  3) Program fatigue; contingency: incentives for continued participation;  
  Ethics: ensure indigenous practices and rights are respected while advancing universal human rights; maintain cultural heritage.

D. Three plausible failure modes per action and contingency (summary approach)

Because enumerating three failure modes for 60 actions would be unwieldy in full detail here, here is a structured, scalable approach you can apply in the field:

- Common failure modes (apply across actions):  
  1) Political resistance or backsliding; contingency: safeguard provisions, independent oversight, phased reforms with sunset clauses, broad-based coalitions.  
  2) Funding shortfalls or misallocation; contingency: milestone-based disbursement, blended finance, private-sector co-funding, and in-kind contributions; strict monitoring and public dashboards.  
  3) Implementation capacity gaps; contingency: capacity-building packages, pilot-to-scale approach, local co-management and partner-support, and phased rollouts.  

- Action-specific examples (illustrative, not exhaustive):  
  Governance: risk of elite capture; contingency: citizen oversight bodies and public reporting; risk of reform fatigue; contingency: simplify reforms; risk of data gaps; contingency: rapid baseline data collection and independent evaluation.  
  Education: risk of teacher shortages; contingency: fast-tracked recruitment and international mentoring; risk of infrastructure gaps; contingency: modular classrooms and offline learning tools; risk of gender barriers; contingency: targeted scholarships and community-based safe schools.  
  Health: risk of supply chain disruptions; contingency: buffer stock and multiple suppliers; risk of demand-side hesitancy; contingency: trusted community health champions; risk of remote-area access issues; contingency: mobile clinics.  
  Economy: risk of commodity-price volatility; contingency: diversify product lines; risk of private sector risk aversion; contingency: policy guarantees and co-investment; risk of environmental backlash; contingency: strong safeguards.  
  Environment: risk of illegal logging or boundary disputes; contingency: community ranger networks and mediation; risk of funding gaps; contingency: performance-based grants and reserve funds.  
  Social reconciliation: risk of resurgence of violence; contingency: rapid mediation teams; risk of elite manipulation; contingency: independent oversight; risk of fatigue; contingency: incentives and community engagement structures.

E. Ethical trade-offs and harm mitigation

- Trade-offs we acknowledge: rapid HDI gains may require some short-term concessions in land-use or extractive activity; deforestation risk must be mitigated with robust protections; power-sharing may slow decision-making but is essential for durable peace; external private investment must be balanced with community benefits and environmental safeguards; social reconciliation may require truth-telling that is painful for some actors.

- How harms are mitigated:  
  - Forests: enforce strong environmental safeguards, independent monitoring, community involvement, and rights-based tenure.  
  - Labor: enforce labor rights, safe workplaces, and freedom of association; require fair compensation and community benefit sharing.  
  - Indigenous practices: secure consent for resource use on indigenous lands; ensure co-management and respect for customary laws where compatible with human rights.  
  - Health and education: ensure equitable access for rural/remote communities and marginalized groups; monitor for inequitable outcomes and correct gaps.  
  - Social reconciliation: avoid punitive or retributive measures; prioritize truth-telling with protection for victims and witnesses; ensure culturally appropriate memory and heritage work.

F. Critical data gaps, assumptions, and how uncertainties change recommendations

- Data and assumptions currently missing (examples):  
  - Baseline HDI components (life expectancy, education index, GNI per capita) and current health metrics (infant/maternal mortality by district/ethnic group).  
  - Regular, reliable data on literacy, school enrollment, and education quality by district.  
  - Detailed health system capacity (facility counts, workforce, supply chains) and immunization baselines.  
  - Forest inventory, tenure arrangements, deforestation rates, land-use maps, and enforcement capacity.  
  - Detailed governance structure, coalition dynamics, and capacity for reform; baseline trust in institutions; public opinion.  
  - Private-sector appetite and potential co-investment amounts; local procurement capacity; workforce skills; energy and infrastructure baselines.

- How uncertainties would change recommendations (practical implications):  
  - If health baselines are worse than expected, allocate more resources to health early and adjust education-targeted investments to align with health improvements; speed up maternal/child health investments.  
  - If governance capacity is weaker or trust is low, place heavier emphasis on inclusive processes, civil-society oversight, and phased reforms with strong transparency; possibly slow reform pace to build legitimacy.  
  - If forest governance and tenure systems are unclear, prioritize land-tenure clarification, community-based forest management pilots, and independent monitoring before large-scale extractive investments.  
  - If private investment interest is slower, front-load governance reforms to reduce risk, and consider targeted grant-based seed capital and public-private pilot projects to demonstrate credibility.  
  - If data quality remains poor, use a dual-track approach: short-cycle, low-cost data collection (rapid surveys, sentinel indicators) to guide mid-course corrections, while establishing long-term national data systems for ongoing monitoring.

G. Explicit list of critical data and assumptions we lack (as requested), with how lacking data would change recommendations

- Critical data gaps we currently lack (not invented here; we are listing what would be needed to refine plans):  
  - Baseline and district-level HDI components (life expectancy, schooling indices, income) and current health indicators (infant/maternal mortality) disaggregated by region and ethnicity.  
  - Current literacy rates by age/sex, primary and secondary enrollment/attendance by district, and teacher density in rural vs urban areas.  
  - Structure of the coalition government, party positions, and public trust in institutions; baseline measures of political violence, grievance intensity, and conflict indicators.  
  - Forest inventory (area by forest type, fragmentation), tenure and land-use rights (including indigenous rights), and baseline deforestation rate.  
  - Private-sector appetite, existing mineral-processing capacity, and potential co-investment commitments.  
  - Health system capacity (facilities, supply chains, workforce density) and current immunization/nutrition baselines.  
  - Infrastructure baseline (roads, electricity, water), logistics capacity, and energy mix.

- How these uncertainties would shift recommendations:  
  - If HDI baselines are poor or gender-disaggregated data reveal large gaps, push education and health investments earlier and adopt gender-responsive programming to accelerate equity gains.  
  - If forest-tenure is unsettled or land rights are contested, postpone large-scale extractive investment until tenure is clarified and community benefit-sharing is guaranteed; prioritize pilots with strong community consent.  
  - If governance distrust is high, slow the pace of reforms and increase transparency, community engagement, and independent monitoring to build legitimacy before scaling.  
  - If private investment interest is uncertain, rely more on donor-supported capacity-building and public investments in critical enablers (education, health, governance) to reduce perceived risk for private partners.

H. Practicalities and next steps for implementation

- Establish a joint, transparent implementation vehicle (a multi-donor trust fund with a strong governance framework, local representation, and independent monitoring and evaluation).  
- Create a phased disbursement schedule tied to milestone achievement, with explicit governance, environmental, and social safeguards.  
- Build an integrated data and monitoring system to track the six action areas, HDI-related metrics, forest status, and peace indicators; publish dashboards for accountability.  
- Create local capacity-building programs to reduce dependence on external personnel over time; emphasize community ownership.

Closing notes

- This roadmap is designed to be implementable within a $2B external budget cap, no military enforcement, and with a strong emphasis on non-coercive governance, environmental stewardship, social reconciliation, and inclusive private-sector development.  
- The plan relies on critical data that should be collected early and updated continuously; uncertainties should be explicitly tracked, and the plan should be adapted in light of new information.  
- The ethical framework centers on protecting indigenous practices and land tenure, ensuring fair labor practices, equitable education and health access, and robust civil-society participation. If any of the critical data gaps prove large, the plan should be revised toward transparency, incremental progress, and inclusive decision-making rather than rapid but unverified reforms.

In [11]:
# Anthropic has a slightly different API, and Max Tokens is required

model_name = "claude-sonnet-4-5"

claude = Anthropic()
response = claude.messages.create(model=model_name, messages=messages, max_tokens=1000)
answer = response.content[0].text

display(Markdown(answer))
competitors.append(model_name)
answers.append(answer)

TypeError: "Could not resolve authentication method. Expected either api_key or auth_token to be set. Or for one of the `X-Api-Key` or `Authorization` headers to be explicitly omitted"
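
The error above most likely means that no `ANTHROPIC_API_KEY` was found in the environment when `Anthropic()` was created. Here is a minimal sketch of a check you could run before creating any clients. It assumes the environment variable names used at the top of this notebook; the names `PROVIDER_KEYS`, `providers_with_keys` and `missing_keys` are made up for this illustration.

```python
import os

# Environment variable names as used earlier in this notebook.
# The provider labels on the left are just labels for this sketch.
PROVIDER_KEYS = {
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
    "google": "GOOGLE_API_KEY",
    "deepseek": "DEEPSEEK_API_KEY",
    "groq": "GROQ_API_KEY",
}


def providers_with_keys(env=None):
    """Return the provider labels whose API key is set to a non-empty value."""
    env = os.environ if env is None else env
    return [name for name, var in PROVIDER_KEYS.items() if env.get(var)]


def missing_keys(env=None):
    """Return the environment variable names that are unset or empty."""
    env = os.environ if env is None else env
    return [var for var in PROVIDER_KEYS.values() if not env.get(var)]


print("Keys found for:", providers_with_keys())
print("Keys missing:", missing_keys())
```

If `ANTHROPIC_API_KEY` shows up as missing, add it to your `.env` file and run `load_dotenv(override=True)` again, or simply skip that cell.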

In [None]:
gemini = OpenAI(api_key=google_api_key, base_url="https://generativelanguage.googleapis.com/v1beta/openai/")
model_name = "gemini-2.5-flash"

response = gemini.chat.completions.create(model=model_name, messages=messages)
answer = response.choices[0].message.content

display(Markdown(answer))
competitors.append(model_name)
answers.append(answer)

In [None]:
deepseek = OpenAI(api_key=deepseek_api_key, base_url="https://api.deepseek.com/v1")
model_name = "deepseek-chat"

response = deepseek.chat.completions.create(model=model_name, messages=messages)
answer = response.choices[0].message.content

display(Markdown(answer))
competitors.append(model_name)
answers.append(answer)

In [None]:
# Updated with the latest Open Source model from OpenAI

groq = OpenAI(api_key=groq_api_key, base_url="https://api.groq.com/openai/v1")
model_name = "openai/gpt-oss-120b"

response = groq.chat.completions.create(model=model_name, messages=messages)
answer = response.choices[0].message.content

display(Markdown(answer))
competitors.append(model_name)
answers.append(answer)
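
The Gemini, DeepSeek and Groq cells above all repeat the same OpenAI-style call, so one possible variation is to wrap it in a helper. This is only a sketch: `ask`, `record` and the stub client are hypothetical names for this illustration, and the stub stands in for a real client so that the block runs without any API key.

```python
from types import SimpleNamespace


def ask(client, model_name, messages):
    """Send messages to an OpenAI-compatible client and return the reply text."""
    response = client.chat.completions.create(model=model_name, messages=messages)
    return response.choices[0].message.content


def record(client, model_name, messages, competitors, answers):
    """Ask one model; store its name and answer only if the call succeeds."""
    try:
        answer = ask(client, model_name, messages)
    except Exception as error:  # deliberately broad: this is a notebook sketch
        print(f"Skipping {model_name}: {error}")
        return None
    competitors.append(model_name)
    answers.append(answer)
    return answer


class StubCompletions:
    """Stand-in that mimics the shape of chat.completions for this demo."""

    def create(self, model, messages):
        message = SimpleNamespace(content=f"stub reply from {model}")
        return SimpleNamespace(choices=[SimpleNamespace(message=message)])


stub_client = SimpleNamespace(chat=SimpleNamespace(completions=StubCompletions()))

demo_competitors, demo_answers = [], []
record(stub_client, "demo-model", [{"role": "user", "content": "hi"}],
       demo_competitors, demo_answers)
print(demo_competitors, demo_answers)
```

With the real clients from the cells above, the call would look like `record(gemini, "gemini-2.5-flash", messages, competitors, answers)`. A failed call then leaves both lists untouched, so `competitors` and `answers` stay the same length.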


## For the next cell, we will use Ollama

Ollama runs a local web service that gives an OpenAI compatible endpoint,  
and runs models locally using high performance C++ code.

If you don't have Ollama, install it by visiting https://ollama.com, pressing Download, and following the instructions.

After it's installed, you should be able to visit http://localhost:11434 and see the message "Ollama is running".

You might need to restart Cursor (and maybe reboot). Then open a Terminal (control+\`) and run `ollama serve`

Useful Ollama commands (run these in the terminal, or with an exclamation mark in this notebook):

`ollama pull <model_name>` downloads a model locally  
`ollama ls` lists all the models you've downloaded  
`ollama rm <model_name>` deletes the specified model from your downloads
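
If you'd rather check from Python than from the browser, here is a small sketch that looks for the "Ollama is running" message mentioned above. It uses only the standard library; the function name `ollama_is_running` and the 2 second timeout are choices made for this illustration.

```python
import http.client
import urllib.error
import urllib.request


def ollama_is_running(base_url="http://localhost:11434", timeout=2.0):
    """Return True if base_url answers with the 'Ollama is running' message."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as reply:
            body = reply.read().decode("utf-8", errors="replace")
    except (urllib.error.URLError, OSError, ValueError, http.client.HTTPException):
        # Connection refused, timeout, malformed URL, or a broken HTTP reply.
        return False
    return "Ollama is running" in body


if ollama_is_running():
    print("Ollama is up - safe to run the next cells")
else:
    print("Ollama not reachable - try `ollama serve` in a terminal")
```

Running this before the Ollama cell is a quick way to find out whether a connection error is likely.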

<table style="margin: 0; text-align: left; width:100%">
    <tr>
        <td style="width: 150px; height: 150px; vertical-align: middle;">
            <img src="../assets/stop.png" width="150" height="150" style="display: block;" />
        </td>
        <td>
            <h2 style="color:#ff7800;">Super important - ignore me at your peril!</h2>
            <span style="color:#ff7800;">The model called <b>llama3.3</b> is FAR too large for home computers - it's not intended for personal computing and will consume all your resources! Stick with the nicely sized <b>llama3.2</b> or <b>llama3.2:1b</b> and if you want larger, try llama3.1 or smaller variants of Qwen, Gemma, Phi or DeepSeek. See <a href="https://ollama.com/models">the Ollama models page</a> for a full list of models and sizes.
            </span>
        </td>
    </tr>
</table>

In [None]:
!ollama pull llama3.2

In [12]:
ollama = OpenAI(base_url='http://localhost:11434/v1', api_key='ollama')
model_name = "llama3.2"

response = ollama.chat.completions.create(model=model_name, messages=messages)
answer = response.choices[0].message.content

display(Markdown(answer))
competitors.append(model_name)
answers.append(answer)

APIConnectionError: Connection error.

In [13]:
# So where are we?

print(competitors)

print(answers)


['gpt-5-nano']


In [14]:
# It's nice to know how to use "zip"
for competitor, answer in zip(competitors, answers):
    print(f"Competitor: {competitor}\n\n{answer}")


Competitor: gpt-5-nano

Executive summary and framing

- Context: Naru is a 200,000-person island nation with two main ethnic groups that have a history of intermittent violence. It has substantial primary rainforest (60%), abundant freshwater, and coastal deposits of rare-earth minerals. Current GDP per capita is about $4,000; literacy is 70%; infant/maternal health outcomes are poor; the coalition government is fragile and openly rejects foreign military intervention. External budget cap: $2 billion over 10 years; no military enforcement.

- Goal of the plan (10 years): (A) raise the island’s HDI by at least 0.15; (B) preserve at least 80% of primary forest cover (interpreted here as maintaining at least 80% of the current forest’s baseline extent over the 10-year period, with a firm floor on deforestation); (C) end open ethnic violence and establish durable, inclusive political institutions; (D) attract sustainable private and public investment.

- Core design principles: non-milita

In [15]:
# Let's bring this together - note the use of "enumerate"

together = ""
for index, answer in enumerate(answers):
    together += f"# Response from competitor {index+1}\n\n"
    together += answer + "\n\n"

In [16]:
print(together)

# Response from competitor 1

Executive summary and framing

- Context: Naru is a 200,000-person island nation with two main ethnic groups that have a history of intermittent violence. It has substantial primary rainforest (60%), abundant freshwater, and coastal deposits of rare-earth minerals. Current GDP per capita is about $4,000; literacy is 70%; infant/maternal health outcomes are poor; the coalition government is fragile and openly rejects foreign military intervention. External budget cap: $2 billion over 10 years; no military enforcement.

- Goal of the plan (10 years): (A) raise the island’s HDI by at least 0.15; (B) preserve at least 80% of primary forest cover (interpreted here as maintaining at least 80% of the current forest’s baseline extent over the 10-year period, with a firm floor on deforestation); (C) end open ethnic violence and establish durable, inclusive political institutions; (D) attract sustainable private and public investment.

- Core design principles: non-

In [None]:
judge = f"""You are judging a competition between {len(competitors)} competitors.
Each model has been given this question:

{question}

Your job is to evaluate each response for clarity and strength of argument, and rank them in order of best to worst.
Respond with JSON, and only JSON, with the following format:
{{"results": ["best competitor number", "second best competitor number", "third best competitor number", ...]}}

Here are the responses from each competitor:

{together}

Now respond with the JSON with the ranked order of the competitors, nothing else. Do not include markdown formatting or code blocks."""
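
A note on the doubled braces in the prompt above: inside an f-string, `{{` and `}}` produce literal `{` and `}`, while single braces insert a value. Here is a tiny standalone sketch; `competitor_count` and `template` are illustrative names, not part of the notebook.

```python
competitor_count = 3  # stand-in for len(competitors)

template = f"""Rank the {competitor_count} competitors.
Respond with JSON in this format:
{{"results": ["best competitor number", "second best competitor number"]}}"""

# The doubled braces come out as single braces in the final text.
print(template)
```

Without the doubling, Python would try to evaluate `"results": [...]` as an expression inside the f-string and raise an error.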


In [None]:
print(judge)

In [None]:
judge_messages = [{"role": "user", "content": judge}]

In [None]:
# Judgement time!

openai = OpenAI()
response = openai.chat.completions.create(
    model="gpt-5-mini",
    messages=judge_messages,
)
results = response.choices[0].message.content
print(results)


In [None]:
# OK let's turn this into results!

results_dict = json.loads(results)
ranks = results_dict["results"]
for index, result in enumerate(ranks):
    competitor = competitors[int(result)-1]
    print(f"Rank {index+1}: {competitor}")
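
The cell above trusts the judge to return clean JSON with valid competitor numbers. One possible hardening is sketched below, assuming the judge might wrap its reply in a Markdown code fence or return numbers outside the expected range. The function name `parse_ranking` is hypothetical.

```python
import json

FENCE = "`" * 3  # a Markdown code fence marker, built this way to keep the example tidy


def parse_ranking(raw, competitor_count):
    """Parse the judge's reply into a list of 1-based competitor numbers."""
    text = raw.strip()
    if text.startswith(FENCE):
        # Drop the fence lines and keep what is between them.
        kept = [line for line in text.splitlines()
                if not line.strip().startswith(FENCE)]
        text = "\n".join(kept)
    data = json.loads(text)
    ranks = [int(item) for item in data["results"]]
    # Every competitor should appear exactly once.
    if sorted(ranks) != list(range(1, competitor_count + 1)):
        raise ValueError(f"Unexpected ranking: {ranks}")
    return ranks


demo_ranks = parse_ranking('{"results": ["2", "1", "3"]}', 3)
for position, number in enumerate(demo_ranks, start=1):
    print(f"Rank {position}: competitor {number}")
```

In the notebook this would be used as `ranks = parse_ranking(results, len(competitors))`, after which `competitors[rank - 1]` gives the model name for each rank.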

<table style="margin: 0; text-align: left; width:100%">
    <tr>
        <td style="width: 150px; height: 150px; vertical-align: middle;">
            <img src="../assets/exercise.png" width="150" height="150" style="display: block;" />
        </td>
        <td>
            <h2 style="color:#ff7800;">Exercise</h2>
            <span style="color:#ff7800;">Which pattern(s) did this use? Try updating this to add another Agentic design pattern.
            </span>
        </td>
    </tr>
</table>

<table style="margin: 0; text-align: left; width:100%">
    <tr>
        <td style="width: 150px; height: 150px; vertical-align: middle;">
            <img src="../assets/business.png" width="150" height="150" style="display: block;" />
        </td>
        <td>
            <h2 style="color:#00bfff;">Commercial implications</h2>
            <span style="color:#00bfff;">These kinds of patterns - sending a task to multiple models and evaluating the results -
            are common where you need to improve the quality of your LLM response. This approach can be applied
            to most business projects where accuracy is critical.
            </span>
        </td>
    </tr>
</table>