## Welcome to the Second Lab - Week 1, Day 3

Today we will call several different models! This is a good way to get comfortable with their APIs.

<table style="margin: 0; text-align: left; width:100%">
    <tr>
        <td style="width: 150px; height: 150px; vertical-align: middle;">
            <img src="../assets/stop.png" width="150" height="150" style="display: block;" />
        </td>
        <td>
            <h2 style="color:#ff7800;">Important point - please read</h2>
            <span style="color:#ff7800;">The way I collaborate with you may be different to other courses you've taken. I prefer not to type code while you watch. Instead, I run Jupyter Labs like this one and give you an intuition for what's going on. My suggestion is that you carefully run this yourself <b>after</b> watching the lecture. Add print statements to understand what's going on, and then come up with your own variations.<br/><br/>If you have time, I'd love it if you submitted a PR with your changes to the community_contributions folder (instructions are in the resources). Also, if you have a GitHub account, use it to showcase your variations. Not only is this essential practice, but it also demonstrates your skills to others, including perhaps future clients or employers...
            </span>
        </td>
    </tr>
</table>

In [1]:
# Start with imports - ask ChatGPT to explain any package that you don't know

import os
import json
from dotenv import load_dotenv
from openai import OpenAI
from anthropic import Anthropic
from IPython.display import Markdown, display

In [2]:
# Always remember to do this!
load_dotenv(override=True)

True

In [4]:
# Print the key prefixes to help with any debugging

openai_api_key = os.getenv('OPENAI_API_KEY')
anthropic_api_key = os.getenv('ANTHROPIC_API_KEY')
google_api_key = os.getenv('GOOGLE_API_KEY')
deepseek_api_key = os.getenv('DEEPSEEK_API_KEY')
groq_api_key = os.getenv('GROQ_API_KEY')

if openai_api_key:
    print(f"OpenAI API Key exists and begins {openai_api_key[:8]}")
else:
    print("OpenAI API Key not set")
    
if anthropic_api_key:
    print(f"Anthropic API Key exists and begins {anthropic_api_key[:7]}")
else:
    print("Anthropic API Key not set (and this is optional)")

if google_api_key:
    print(f"Google API Key exists and begins {google_api_key[:2]}")
else:
    print("Google API Key not set (and this is optional)")

if deepseek_api_key:
    print(f"DeepSeek API Key exists and begins {deepseek_api_key[:3]}")
else:
    print("DeepSeek API Key not set (and this is optional)")

if groq_api_key:
    print(f"Groq API Key exists and begins {groq_api_key[:4]}")
else:
    print("Groq API Key not set (and this is optional)")

OpenAI API Key exists and begins sk-proj-
Anthropic API Key not set (and this is optional)
Google API Key not set (and this is optional)
DeepSeek API Key not set (and this is optional)
Groq API Key not set (and this is optional)
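
The repeated if/else blocks above could also be driven from a small table. This is only a sketch: the `PROVIDERS` list and the `describe_key` helper are illustrative names that are not part of the lab, and the prefix lengths are copied from the cell above.

```python
import os

# Hypothetical table: (display name, environment variable, prefix length, optional?)
PROVIDERS = [
    ("OpenAI", "OPENAI_API_KEY", 8, False),
    ("Anthropic", "ANTHROPIC_API_KEY", 7, True),
    ("Google", "GOOGLE_API_KEY", 2, True),
    ("DeepSeek", "DEEPSEEK_API_KEY", 3, True),
    ("Groq", "GROQ_API_KEY", 4, True),
]

def describe_key(name, env_var, prefix_len, optional, env=os.environ):
    """Return the same status line the cell above prints for one provider."""
    key = env.get(env_var)
    if key:
        return f"{name} API Key exists and begins {key[:prefix_len]}"
    suffix = " (and this is optional)" if optional else ""
    return f"{name} API Key not set{suffix}"

for provider in PROVIDERS:
    print(describe_key(*provider))
```

Passing `env` explicitly makes the helper easy to try with a plain dictionary instead of your real environment.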


In [5]:
request = "Please come up with a challenging, nuanced question that I can ask a number of LLMs to evaluate their intelligence. "
request += "Answer only with the question, no explanation."
messages = [{"role": "user", "content": request}]

In [6]:
messages

[{'role': 'user',
  'content': 'Please come up with a challenging, nuanced question that I can ask a number of LLMs to evaluate their intelligence. Answer only with the question, no explanation.'}]

In [7]:
openai = OpenAI()
response = openai.chat.completions.create(
    model="gpt-5-mini",
    messages=messages,
)
question = response.choices[0].message.content
print(question)


You're an independent public‑health investigator presented with the following puzzling outbreak — please analyze and respond as requested below: Over the past 3 weeks, 128 residents of a mid‑sized riverside city (ages 4–82) developed acute encephalopathy with fever, headache, confusion and new‑onset seizures; cases cluster in two neighborhoods downstream from a small chemical plant. Timeline: the first cases began 3 weeks ago, approximately 3 weeks after an unreported “maintenance release” at the plant 6 weeks ago. Key clinical/laboratory/imaging findings: CSF in 32 tested patients shows lymphocytic pleocytosis and moderately elevated protein; PCR panel for common neurotropic viruses (HSV‑1/2, VZV, enteroviruses, West Nile, Japanese encephalitis, CMV, EBV) is negative in all; brain MRI in 18 patients shows bilateral medial temporal lobe and thalamic hyperintensities in 12; serum cholinesterase is low in 11 of 50 tested patients; three patients had transient clinical improvement after a

In [8]:
competitors = []
answers = []
messages = [{"role": "user", "content": question}]

## Note - update since the videos

I've updated the model names below to use the latest models, like GPT-5 and Claude Sonnet 4.5. It's worth noting that these models can be quite slow (around 1-2 minutes per call), but they do a great job! Feel free to switch them for faster models if you'd prefer, like the ones I use in the video.

In [9]:
# The API we know well
# I've updated this with the latest model, but it can take some time because it likes to think!
# Replace the model with gpt-4.1-mini if you'd prefer not to wait 1-2 mins

model_name = "gpt-5-nano"

response = openai.chat.completions.create(model=model_name, messages=messages)
answer = response.choices[0].message.content

display(Markdown(answer))
competitors.append(model_name)
answers.append(answer)

Below is a structured, stepwise public‑health analytic framework. It treats the data as compatible with multiple plausible etiologies and emphasizes transparent uncertainty, test‑driven updating, and actions that minimize harm while protecting the population.

1) Four or more plausible hypotheses (distinct, including single causes, mixed causes, and non‑infectious explanations)

H1. Primary organophosphate (OP) poisoning causing neurotoxic encephalopathy
- Rationale: Chemical plant maintenance release 6 weeks ago; environmental samples show OP compound in household dust (40% of tested homes) and river sediment near the plant; 32 tested patients show lymphocytic CSF pleocytosis with modest protein elevation (could reflect secondary inflammatory response to a toxin); 11/50 tested have low serum cholinesterase; transient improvement after atropine in 3 patients; two neighborhoods downstream—geographic plausibility for exposure.
- Non-infectious: This could present with encephalopathy and seizures in a toxin‑driven, inflammatory milieu rather than a classic infectious pattern.

H2. Unidentified or nonstandard neurotropic infectious encephalitis (including post‑infectious autoimmune limbic encephalitis)
- Rationale: CSF lymphocytic pleocytosis; MRI shows bilateral medial temporal lobe and thalamic hyperintensities (a pattern seen in limbic/related encephalitides); common viral panel negative; 5% seropositive for enterovirus IgM suggests possible low‑prevalence or timing‑related false negatives; clustering downstream could reflect shared exposure (e.g., to a reservoir or vector) or a localized outbreak of an unusual agent not included in standard panels.

H3. Mixed etiology: OP exposure with concurrent or sequential CNS infection or toxin
- Rationale: Presence of environmental OPs plus evidence of infectious signal (enterovirus IgM in a subset) or another neurotoxic exposure could jointly explain the constellation (toxin‑induced vulnerability plus infectious encephalitis; or toxin‑mediated inflammation predisposing to neuroinfection).

H4. Non‑OP chemical/toxic encephalopathy or autoimmune/inflammatory encephalopathy triggered by environmental exposure
- Rationale: A secondary toxin class (other pesticides, industrial solvents, heavy metals, or river‑borne toxins) or an autoimmune/inflammatory process triggered by environmental exposure could produce encephalopathy with seizures and limbic MRI findings, sometimes with variable CSF pleocytosis and modest inflammatory markers; low cholinesterase would be less central unless overlap with OP exposure is present.

Optional (less likely given current data but plausible in low signal scenarios)
H5. Fish‑related marine/river toxin (e.g., a neurotoxin from contaminated fish) causing encephalopathy
- Rationale: Dead fish reported; river exposure could suggest a toxin in aquatic food chains. However, many classic fish toxins produce prominent GI/paresthesias or early cardiopulmonary symptoms rather than the specific MRI pattern and predominant CSF picture here.

2) How to rank the hypotheses and assign probabilities (sum to 100%)

Initial plausible weights (subjective, data‑informed, and uncertain; rounded for clarity)
- H1 (OP toxin–driven encephalopathy): 45%
- H2 (unidentified/neurotropic infectious encephalitis or autoimmune encephalitis): 30%
- H3 (mixed exposure/infection/toxin etiology): 15%
- H4 (non‑OP chemical/toxic encephalopathy or autoimmune inflammatory process): 10%

Notes on justification:
- H1 has strong environmental and exposure clues (plant release timing, downstream geography, environmental OP in dust/sediment, cholinesterase depression in a subset, atropine responsiveness). It is the most parsimonious single‑cause explanation given the data, even if MRI features are not the classic OP encephalopathy pattern.
- H2 fits the clinical syndrome (encephalitis with limbic MRI pattern, CSF lymphocytosis, negative routine viral panel) and accounts for possible infections not captured by standard tests.
- H3 acknowledges that a coexistent or synergistic process (toxin plus infection) could be at work; probability is smaller but not implausible.
- H4 covers non‑OP toxins or autoimmune processes that could mimic this presentation; data are less specific but should be considered, especially if toxin panels broaden or autoantibody testing is positive.
- The sum is 100%; the numbers reflect current data emphasis but are inherently uncertain.

3) The single most informative test/observation for each hypothesis and how to update probabilities (Bayesian framework)

How to update in practice: after each pivotal data point, adjust P(H) using a qualitative likelihood ratio (LR). For illustration, we give approximate LRs, then show how to compute updated probabilities. The exact numbers will evolve with test performance in real life; use the same approach with your own calculated LRs as data accrue.

- H1: OP toxin as primary driver
  Most informative test: broad, quantitative OP exposure assessment across cases and controls
  - Test data: urinary dialkyl phosphate (DAP) metabolites; red blood cell acetylcholinesterase (AChE) activity; plant/household dust and river sediment testing for OP metabolites
  - If results are positive (high DAPs in many case households plus depressed RBC AChE in cases vs controls):
    - LR positive (LR+): e.g., ~5–10
    - Posterior: P(H1|data) = [P(H1)×LR+] / [P(H1)×LR+ + P(H2)+P(H3)+P(H4) × 1]
    - With prior P(H1)=0.45 and LR+=7, posterior ≈ (0.315) / (0.315 + 0.55) ≈ 0.364? Wait: calculation must include total denominator; better to compute generally:
      Numerator = 0.45×7 = 3.15
      Denominator = 3.15 + (0.55×1) = 3.70
      Posterior ≈ 0.852
    - Practical takeaway: If strong positive OP biomarker data emerge, P(H1) could rise toward 80–85%.
  - If results are negative:
    - LR−: e.g., 0.2
    - Posterior ≈ [0.45×0.2] / [0.45×0.2 + 0.55×1] = 0.09 / (0.09 + 0.55) ≈ 0.14
    - Practical takeaway: P(H1) could fall to ~14% or lower.
  Action: rely heavily on OP biomarker data to reweight.

- H2: Unidentified/neurotropic infectious encephalitis or autoimmune encephalitis
  Most informative test: unbiased metagenomic sequencing of CSF (and expanded targeted PCR panel)
  - If sequencing identifies a plausible neurotropic pathogen or autoantibody signature:
    - Substantial LR+: e.g., 20–100 depending on pathogen specificity
    - Posterior: e.g., with P(H2)=0.30, LR+=50 yields Posterior ≈ (0.30×50) / (0.30×50 + 0.70×1) ≈ 15 / (15 + 0.7) ≈ 0.957 → ~95%
    - If a pathogen is found with clear causal plausibility, strong support for H2; if no infectious agent identified after deep sequencing, LR− may reduce P(H2) to ~5–10%.
  Action: sequencing results dominate updates; positive etiologic agent grows H2 dramatically; negative results substantially downgrade H2.

- H3: Mixed etiology (OP plus infection/toxin)
  Most informative test: convergent evidence of both toxin exposure and infection/toxin in the same cases
  - Data: demonstration of high OP biomarkers plus detection of a neurotropic pathogen (or toxin) in CSF/blood, or simultaneous environmental exposure signals
  - If both OP and pathogen/toxin signals are present (dual evidence):
    - LR+: 10–50
    - Posterior will rise substantially (e.g., from 0.15 to 0.50–0.70 depending on strength)
  - If only OP is strong but no infection/toxin signal (or only infection/toxin but no OP signal):
    posterior will stay around the heterogeneity of priors
  Action: treat as a potential mixed etiology and cover both exposure reduction and clinical management.

- H4: Non‑OP chemical/toxin or autoimmune inflammatory encephalopathy
  Most informative test: neuronal autoantibody panel in CSF; MRI patterns consistent with autoimmune limbic encephalitis; alternative toxin panels
  - If autoantibodies (e.g., NMDA receptor, LGI1, CASPR2) or strong autoimmune signature are positive:
    LR+: 20–40
    Posterior: from 0.10 to 0.50–0.70
  - If autoantibody panel is negative and other non‑OP toxins are not detected:
    posterior may drop toward 5–10% depending on remaining signals
  Action: positive autoimmune result would shift emphasis toward immunotherapy considerations; negative results keep OP/infectious etiologies higher.

5) Immediate public‑health actions to protect the population (trade‑offs, risk of false positives/negatives, criteria to lift/intensify)

Immediate actions (iterative and proportional to evidence)
- Activate an outbreak response with a clearly defined incident command and transparent risk communication.
- Environmental controls at the plant
  - Action: halt or modify maintenance releases; require containment and safe remediation; perform occupational exposure assessment for workers and neighboring residents.
  - Trade‑offs/risks: economic impact; false alarms could lead to unnecessary shutdowns; ensure due process for plant operators.
  - Indicators to lift: if plant releases are definitively halted and environmental OP signals are not rising, consider phased return with monitoring.

- River and environmental controls
  - Action: restrict fishing in affected river sections; issue public advisories about consumption of river fish; intensify river sediment sampling; test for additional toxins.
  - Trade‑offs: economic impact on fisheries; avoid unnecessary panic; ensure advisories harm proportionate to risk.
  - Criteria to sustain: ongoing detection of OP or other toxins in sediments/dust and/or continued human cases attributable to exposure gaps.

- Household/environmental decontamination
  - Action: provide guidance for dust reduction in homes (HEPA vacuuming, wet cleaning), air/ventilation improvements; distribute protective gear to high‑exposure households; consider targeted environmental cleanup if OP compounds are confirmed.
  - Trade‑offs: cost to households; risk of false positives if the toxin is not widely present.
  - Criteria to lift: environmental sampling shows no appreciable contamination and no new cases linked to home environments.

- Medical management guidance for clinicians
  - Action: develop a clinical protocol for suspected OP exposure (airway/respiratory monitoring, atropine, pralidoxime on suspected ON exposure, seizure management); maintain readiness for advanced neuroimaging and ICU care where needed.
  - Trade‑offs: overtreatment if OP exposure is not the driver; however, early treatment for OP exposure can be life‑saving if confirmed.
  - Criteria to adapt: if OP biomarkers are strongly positive in many cases, maintain; if not, de‑escalate.

- Public communication and community engagement
  - Action: clearly explain uncertainties, what is known, what is being tested, and how residents can reduce exposure; provide hotlines and multilingual resources; hold regular town hall briefings with plant operators, public health officials, and medical providers.
  - Trade‑offs: risk of causing undue alarm; need to avoid misinformation.
  - Criteria to adjust: as new data come in (e.g., strong evidence for H1, or strong evidence for H2), messaging should shift accordingly.

- Fish and food safety advisories
  - Action: provide interim, precautionary guidance on river fish consumption; collaborate with local vendors to monitor fish harvesters.
  - Trade‑offs: economic impact; avoid over‑restriction if risk is low; use continual data to adjust.

6) Investigation plan: 0–72 hours vs 2–6 weeks (specific tests, sampling strategy, study designs, definitions, communication)

First 48–72 hours (rapid, decisive steps)
- Case definitions and surveillance
  - Define probable and confirmed cases (e.g., probable: acute encephalopathy with fever, seizures, confusion; age‑range 4–82; two or more neighborhoods; onset within last 3 weeks; CSF lymphocytic pleocytosis; imaging abnormalities).
  - Establish case line list with exposure histories (plant contact, residence downstream, household dust/dust exposure, river contact, fish consumption, pets).

- Biospecimen collection and basic testing
  - Collect from cases: serum, plasma, whole blood for cholinesterase activity (AChE), RBC AChE; plasma/serum OP metabolites (urinary DAPs); CSF when available for cell counts, protein, glucose, infectious panels, autoantibody screens; store aliquots for metagenomic sequencing.
  - Obtain environmental samples: household dust (carpets, rugs, upholstery), river sediment, surface water, plant vicinity air/soil, fish tissue if available; test for OP compounds and other neurotoxins; run broad panels for known toxins.
  - Start broad infectious disease testing: expanded PCR panels for neurotropic viruses; send CSF for metagenomic sequencing (unbiased) if available.

- Environmental and exposure assessment
  - Conduct focused environmental assessments around the plant (emission sources, maintenance records, recent releases, waste handling).
  - Map spatial clustering of cases relative to plant effluent and river dynamics (tidal, flow, downstream vs upstream).

- Interim clinical guidance
  - Provide clinicians with a protocol for management of suspected OP exposure (airway management, atropine dosing, pralidoxime if exposure suspected), seizure management, and supportive care.
  - Prepare hospital surge capacity for potential neurointensive care.

Next 2–6 weeks (expanded, hypothesis‑driven investigations)
- Epidemiologic studies
  - Case–control study: compare exposure histories (proximity to plant, dust/dirt exposure, dust in homes, river contact, fishing, fish consumption, occupational exposure) between cases and well controls drawn from the same neighborhoods.
  - Time‑series analysis: observe onset pattern relative to plant maintenance events; correlate case incidence with environmental OP levels if measured.

- Laboratory and diagnostic expansion
  - H1 support: quantify OP exposure across cases and controls (urinary DAPs, RBC AChE) with blinded analyses; test multiple environmental media for OP residues and metabolites.
  - H2 support: perform metagenomic sequencing on CSF and broad pathogen panels; expand PCR to include less common arboviruses and neurotropic agents; test for autoimmune markers (CSF autoantibodies) in a subset.
  - H3 support: perform toxin screens for other known neurotoxins; test for markers of co‑exposure; examine correlations between toxin levels and severity.
  - H4 support: autoimmune panels (e.g., NMDA receptor, LGI1, CASPR2) in CSF; MRI re‑review for patterns suggestive of autoimmune encephalitis; repeat imaging as clinically indicated.

- Environmental and ecological investigations
  - Expand river/water/soil/fish sampling to track spatial temporal trends; correlate with human case distribution.
  - Engage toxicology labs to quantify a broad suite of pesticides, solvents, heavy metals, and marine toxins in environmental and biological samples.

- Communication and stakeholder engagement
  - Maintain transparent updates to residents, plant management, environmental regulators, fishing communities, and healthcare providers.
  - Prepare risk communication materials that reflect evolving findings and avoid over‑statement.

7) Ethical, legal, and practical constraints (and how to address them)

- Testing and disclosure
  - Constraint: privacy and consent for biological testing; data sharing with the plant or industry could raise liability concerns.
  - Approach: obtain informed consent when feasible for biospecimen testing; anonymize data for public reporting; limit early dissemination of sensitive information; secure data sharing agreements with stakeholders.

- Evacuation or relocation
  - Constraint: potential displacement of residents; poor evidence base before decisive causation is established.
  - Approach: avoid mass evacuation; instead implement targeted exposure controls, protective guidance, and environmental remediation; escalate rapidly only if evidence strongly implicates a single localized source requiring urgent removal.

- Liability and regulatory actions
  - Constraint: plant responsibilities and potential legal action; political and economic implications.
  - Approach: work through regulatory channels; document evidence rigorously; engage independent third parties for environmental testing; ensure workers’ safety without undue blame.

- Public health ethics
  - Constraint: balancing precaution with proportionality; preventing stigmatization of neighborhoods or groups.
  - Approach: communicate uncertainties honestly; base actions on probability and risk assessments; minimize disruption; provide support resources to affected households.

8) Assumptions and numeric confidence interval for leading diagnosis

Assumptions used in this framework
- The outbreak is temporally and spatially linked to the river and plant operations; exposure likely via environmental media (dust, river sediments) rather than solely person‑to‑person transmission.
- The symptoms (encephalopathy with fever, seizures; CSF lymphocytosis) can be consistent with both toxin‑mediated CNS injury and certain infectious/inflammatory encephalitides.
- The initial testing panel may miss nonstandard pathogens or toxins; some tests (e.g., metagenomic sequencing, broad toxin panels) are required to resolve ambiguity.
- The MRI pattern of bilateral medial temporal lobe and thalamic hyperintensities is compatible with several etiologies (toxic, autoimmune, certain viral infections) and should prompt broadened testing rather than singular diagnostic conclusions.

Leading diagnosis (current best support)
- Leading hypothesis: H1 – Organophosphate poisoning causing neurotoxic encephalopathy.
- Point estimate: 45%
- Uncertainty range: ±15%
- Confidence interval: 30%–60%
- Rationale for leading status: strong, consistent environmental exposure clues (plant release, downstream neighborhoods, dust/sediment OP detection, depressed cholinesterase in a subset, atropine responsiveness) that cohere with the clinical picture, even if MRI findings are not exclusively characteristic of OP toxicity.

Other hypotheses with provisional weights (for context)
- H2: 30% (unidentified/neurotropic infectious encephalitis or autoimmune encephalitis)
- H3: 15% (mixed OP exposure with infection/oxidative toxin)
- H4: 10% (non‑OP toxin or autoimmune inflammatory encephalopathy)
- Rationale: While H2 remains plausible given limbic MRI and CSF pleocytosis, the weight is moderated by the repeated environmental OP signal and the cholinesterase data. H3 and H4 are plausible but less supported by current data, though they must be actively explored.

Step‑by‑step reasoning summary
- The strongest single‑cause signal is OP exposure with plant‑related release; this is supported by environmental sampling and a plausible exposure pathway to two downstream neighborhoods, plus laboratory hints (low cholinesterase in a subset, atropine response).
- Infectious etiologies are plausible given CSF findings and MRI patterns, but routine panels are negative; the 5% enterovirus IgM signal cannot be ignored, and nonstandard pathogens or autoimmune processes must be actively ruled in/out.
- Mixed etiologies are plausible; the outbreak could involve both toxin exposure and infection/inflammation.
- Autoimmune/inflammatory etiologies are plausible but currently less strongly supported; they should be considered if autoimmune markers are positive or if imaging evolves in a manner typical of autoimmune limbic encephalitis.

What you should do next (condensed plan)
- Immediately: implement OP exposure controls; expand environmental sampling; begin broad infectious/toxin testing; establish case definitions; set up a central data hub; begin risk communication.
- Within 0–72 hours: obtain baseline biospecimens and environmental samples; perform initial labs (AChE, urinary DAPs); begin expanded imaging and CSF testing where feasible; initiate a case–control design.
- Weeks 2–6: conduct extended case–control studies; perform metagenomic CSF sequencing; broaden toxin panels; perform autoantibody testing; conduct environmental and ecological analyses; refine public health actions as results emerge.
- Throughout: maintain stakeholder engagement, ethical oversight, and transparent risk communication; adjust actions as data evolve.

Key references for actions and interpretation (implicit in plan)
- Use standard toxicology and neurology protocols for suspected OP exposure management (atropine + pralidoxime where indicated; seizure management; airway support).
- Follow best practice for exposure investigations: environmental surveillance, metagenomic pathogen discovery when standard panels fail, and case–control designs in cluster outbreaks.
- Prioritize rapid, low‑cost, high‑yield tests (AChE, urinary DAPs) early, then scale to comprehensive molecular and toxin testing as results dictate.

If you’d like, I can tailor the plan to a specific real‑world setting (local regulations, lab capabilities, hospital capacity) or provide a structured data collection form and a one‑page decision tree to guide action in the field.

In [None]:
# Anthropic has a slightly different API: max_tokens is required, and the text is in response.content[0].text

model_name = "claude-sonnet-4-5"

claude = Anthropic()
response = claude.messages.create(model=model_name, messages=messages, max_tokens=1000)
answer = response.content[0].text

display(Markdown(answer))
competitors.append(model_name)
answers.append(answer)
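
One further difference worth knowing about: Anthropic's Messages API takes a system prompt as a separate top-level `system` argument and not as a message with the `system` role. The sketch below shows one way to adapt an OpenAI-style message list. The helper name `to_anthropic_args` and the example messages are made up for illustration, and the default of 1000 tokens simply mirrors the cell above.

```python
def to_anthropic_args(messages, model, max_tokens=1000):
    """Split OpenAI-style messages into keyword arguments for Anthropic's messages.create."""
    # Anthropic expects system text outside the messages list
    system_parts = [m["content"] for m in messages if m["role"] == "system"]
    chat = [m for m in messages if m["role"] != "system"]
    kwargs = {"model": model, "messages": chat, "max_tokens": max_tokens}
    if system_parts:
        kwargs["system"] = "\n\n".join(system_parts)
    return kwargs

# Illustrative messages, not taken from the lab
example = [
    {"role": "system", "content": "Answer concisely."},
    {"role": "user", "content": "What is an API?"},
]
kwargs = to_anthropic_args(example, model="claude-sonnet-4-5")
print(kwargs)
```

With a client created as in the cell above, the call would then be `claude.messages.create(**kwargs)`.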

In [None]:
gemini = OpenAI(api_key=google_api_key, base_url="https://generativelanguage.googleapis.com/v1beta/openai/")
model_name = "gemini-2.5-flash"

response = gemini.chat.completions.create(model=model_name, messages=messages)
answer = response.choices[0].message.content

display(Markdown(answer))
competitors.append(model_name)
answers.append(answer)

In [None]:
deepseek = OpenAI(api_key=deepseek_api_key, base_url="https://api.deepseek.com/v1")
model_name = "deepseek-chat"

response = deepseek.chat.completions.create(model=model_name, messages=messages)
answer = response.choices[0].message.content

display(Markdown(answer))
competitors.append(model_name)
answers.append(answer)

In [None]:
# Updated with the latest Open Source model from OpenAI

groq = OpenAI(api_key=groq_api_key, base_url="https://api.groq.com/openai/v1")
model_name = "openai/gpt-oss-120b"

response = groq.chat.completions.create(model=model_name, messages=messages)
answer = response.choices[0].message.content

display(Markdown(answer))
competitors.append(model_name)
answers.append(answer)
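
The Gemini, DeepSeek and Groq cells above all follow the same pattern: the OpenAI client with a different `api_key` and `base_url`. As a rough sketch, that pattern can be captured in a table. The base URLs and model names are copied from the cells above, while `ENDPOINTS`, `client_settings` and the example key are illustrative assumptions.

```python
import os

# provider -> (environment variable, base URL, model name), copied from the cells above
ENDPOINTS = {
    "gemini": ("GOOGLE_API_KEY", "https://generativelanguage.googleapis.com/v1beta/openai/", "gemini-2.5-flash"),
    "deepseek": ("DEEPSEEK_API_KEY", "https://api.deepseek.com/v1", "deepseek-chat"),
    "groq": ("GROQ_API_KEY", "https://api.groq.com/openai/v1", "openai/gpt-oss-120b"),
}

def client_settings(provider, env=os.environ):
    """Return (keyword arguments for the OpenAI client, model name) for one provider."""
    env_var, base_url, model_name = ENDPOINTS[provider]
    return {"api_key": env.get(env_var), "base_url": base_url}, model_name

# "gsk_example" is a placeholder, not a real key
settings, model_name = client_settings("groq", env={"GROQ_API_KEY": "gsk_example"})
print(settings["base_url"], model_name)
```

In the notebook you could then write `client = OpenAI(**settings)` and call `client.chat.completions.create(model=model_name, messages=messages)` exactly as in the cells above.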


## For the next cell, we will use Ollama

Ollama runs a local web service that exposes an OpenAI-compatible endpoint,  
and runs models locally using high-performance C++ code.

If you don't have Ollama, install it by visiting https://ollama.com, pressing Download and following the instructions.

After it's installed, you should be able to visit http://localhost:11434 and see the message "Ollama is running".

You might need to restart Cursor (and maybe reboot). Then open a Terminal (control+\`) and run `ollama serve`
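
If you'd rather check from the notebook than from a browser, here is a minimal sketch using only the standard library. The function name `ollama_is_running` and the 2 second timeout are my own choices. It looks for the "Ollama is running" message mentioned above.

```python
import urllib.error
import urllib.request

def ollama_is_running(base_url="http://localhost:11434", timeout=2.0):
    """Return True if the root URL answers with the 'Ollama is running' message."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as response:
            body = response.read().decode("utf-8", errors="replace")
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, etc. - treat as "not running"
        return False
    return "Ollama is running" in body

print(ollama_is_running())
```

If this prints `False`, try `ollama serve` in a terminal as described above and run the cell again.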

Useful Ollama commands (run these in the terminal, or with an exclamation mark in this notebook):

`ollama pull <model_name>` downloads a model locally  
`ollama ls` lists all the models you've downloaded  
`ollama rm <model_name>` deletes the specified model from your downloads

<table style="margin: 0; text-align: left; width:100%">
    <tr>
        <td style="width: 150px; height: 150px; vertical-align: middle;">
            <img src="../assets/stop.png" width="150" height="150" style="display: block;" />
        </td>
        <td>
            <h2 style="color:#ff7800;">Super important - ignore me at your peril!</h2>
            <span style="color:#ff7800;">The model called <b>llama3.3</b> is FAR too large for home computers - it's not intended for personal computing and will consume all your resources! Stick with the nicely sized <b>llama3.2</b> or <b>llama3.2:1b</b> and if you want larger, try llama3.1 or smaller variants of Qwen, Gemma, Phi or DeepSeek. See <a href="https://ollama.com/models">the Ollama models page</a> for a full list of models and sizes.
            </span>
        </td>
    </tr>
</table>

In [None]:
!ollama pull llama3.2

In [None]:
ollama = OpenAI(base_url='http://localhost:11434/v1', api_key='ollama')
model_name = "llama3.2"

response = ollama.chat.completions.create(model=model_name, messages=messages)
answer = response.choices[0].message.content

display(Markdown(answer))
competitors.append(model_name)
answers.append(answer)

In [11]:
# So where are we?

print(competitors)
print(answers)


['gpt-5-nano']
['Below is a structured, stepwise public‑health analytic framework. It treats the data as compatible with multiple plausible etiologies and emphasizes transparent uncertainty, test‑driven updating, and actions that minimize harm while protecting the population.\n\n1) Four or more plausible hypotheses (distinct, including single causes, mixed causes, and non‑infectious explanations)\n\nH1. Primary organophosphate (OP) poisoning causing neurotoxic encephalopathy\n- Rationale: Chemical plant maintenance release 6 weeks ago; environmental samples show OP compound in household dust (40% of tested homes) and river sediment near the plant; 32 tested patients show lymphocytic CSF pleocytosis with modest protein elevation (could reflect secondary inflammatory response to a toxin); 11/50 tested have low serum cholinesterase; transient improvement after atropine in 3 patients; two neighborhoods downstream—geographic plausibility for exposure.\n- Non-infectious: This could present w

In [12]:
# It's nice to know how to use "zip"
for competitor, answer in zip(competitors, answers):
    print(f"Competitor: {competitor}\n\n{answer}")


Competitor: gpt-5-nano

Below is a structured, stepwise public‑health analytic framework. It treats the data as compatible with multiple plausible etiologies and emphasizes transparent uncertainty, test‑driven updating, and actions that minimize harm while protecting the population.

1) Four or more plausible hypotheses (distinct, including single causes, mixed causes, and non‑infectious explanations)

H1. Primary organophosphate (OP) poisoning causing neurotoxic encephalopathy
- Rationale: Chemical plant maintenance release 6 weeks ago; environmental samples show OP compound in household dust (40% of tested homes) and river sediment near the plant; 32 tested patients show lymphocytic CSF pleocytosis with modest protein elevation (could reflect secondary inflammatory response to a toxin); 11/50 tested have low serum cholinesterase; transient improvement after atropine in 3 patients; two neighborhoods downstream—geographic plausibility for exposure.
- Non-infectious: This could present 

In [13]:
# Let's bring this together - note the use of "enumerate"

together = ""
for index, answer in enumerate(answers):
    together += f"# Response from competitor {index+1}\n\n"
    together += answer + "\n\n"
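
As a possible variation, the same string can be built with `str.join` and `enumerate(..., start=1)`, which avoids the `index+1` arithmetic. This is only a sketch; `demo_answers` and `demo_together` are placeholder names used so that the notebook's real variables are not overwritten.

```python
# Placeholder answers for illustration; the notebook's `answers` holds the real model responses
demo_answers = ["First answer", "Second answer"]

# start=1 makes enumerate count from 1, so no index+1 is needed
demo_together = "".join(
    f"# Response from competitor {index}\n\n{answer}\n\n"
    for index, answer in enumerate(demo_answers, start=1)
)
print(demo_together)
```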

In [14]:
print(together)

# Response from competitor 1

Below is a structured, stepwise public‑health analytic framework. It treats the data as compatible with multiple plausible etiologies and emphasizes transparent uncertainty, test‑driven updating, and actions that minimize harm while protecting the population.

1) Four or more plausible hypotheses (distinct, including single causes, mixed causes, and non‑infectious explanations)

H1. Primary organophosphate (OP) poisoning causing neurotoxic encephalopathy
- Rationale: Chemical plant maintenance release 6 weeks ago; environmental samples show OP compound in household dust (40% of tested homes) and river sediment near the plant; 32 tested patients show lymphocytic CSF pleocytosis with modest protein elevation (could reflect secondary inflammatory response to a toxin); 11/50 tested have low serum cholinesterase; transient improvement after atropine in 3 patients; two neighborhoods downstream—geographic plausibility for exposure.
- Non-infectious: This could pr

In [16]:
judge = f"""You are judging a competition between {len(competitors)} competitors.
Each model has been given this question:

{question}

Your job is to evaluate each response for clarity and strength of argument, and rank them in order of best to worst.
Respond with JSON, and only JSON, with the following format:
{{"results": ["best competitor number", "second best competitor number", "third best competitor number", ...]}}

Here are the responses from each competitor:

{together}

Now respond with the JSON with the ranked order of the competitors, nothing else. Do not include markdown formatting or code blocks."""
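
A note on the doubled braces in the format line of the prompt: because the prompt is an f-string, `{{` and `}}` produce literal braces, while single braces are substituted with values. A small sketch with a placeholder value `n`:

```python
n = 3  # placeholder value for illustration

# Single braces are substituted; doubled braces come out as literal { and }
template = f'{{"results": ["1", ..., "{n}"]}}'
print(template)
```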


In [17]:
print(judge)

You are judging a competition between 1 competitors.
Each model has been given this question:

You're an independent public‑health investigator presented with the following puzzling outbreak — please analyze and respond as requested below: Over the past 3 weeks, 128 residents of a mid‑sized riverside city (ages 4–82) developed acute encephalopathy with fever, headache, confusion and new‑onset seizures; cases cluster in two neighborhoods downstream from a small chemical plant. Timeline: the first cases began 3 weeks ago, approximately 3 weeks after an unreported “maintenance release” at the plant 6 weeks ago. Key clinical/laboratory/imaging findings: CSF in 32 tested patients shows lymphocytic pleocytosis and moderately elevated protein; PCR panel for common neurotropic viruses (HSV‑1/2, VZV, enteroviruses, West Nile, Japanese encephalitis, CMV, EBV) is negative in all; brain MRI in 18 patients shows bilateral medial temporal lobe and thalamic hyperintensities in 12; serum cholinesteras

In [18]:
judge_messages = [{"role": "user", "content": judge}]

In [19]:
# Judgement time!

openai = OpenAI()
response = openai.chat.completions.create(
    model="gpt-5-mini",
    messages=judge_messages,
)
results = response.choices[0].message.content
print(results)


{"results": ["1"]}


In [20]:
# OK let's turn this into results!

results_dict = json.loads(results)
ranks = results_dict["results"]
for index, result in enumerate(ranks):
    competitor = competitors[int(result)-1]
    print(f"Rank {index+1}: {competitor}")

Rank 1: gpt-5-nano
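
The parsing step above trusts the judge to return clean JSON. In practice a model may wrap its reply in a markdown code fence despite the instruction, or return a ranking that skips or repeats a competitor. Below is a minimal defensive sketch; `parse_ranking` is a hypothetical helper written for this illustration, not part of the notebook or of any library.

```python
import json

FENCE = "`" * 3  # markdown code-fence marker


def parse_ranking(raw: str, n_competitors: int) -> list[int]:
    """Parse the judge's reply into a list of 1-based competitor numbers."""
    text = raw.strip()
    if text.startswith(FENCE):
        # Drop the opening fence line (with optional language tag) and the closing fence
        lines = text.splitlines()[1:]
        if lines and lines[-1].strip() == FENCE:
            lines = lines[:-1]
        text = "\n".join(lines)
    ranks = [int(r) for r in json.loads(text)["results"]]
    # Every competitor should appear exactly once
    if sorted(ranks) != list(range(1, n_competitors + 1)):
        raise ValueError(f"Unexpected ranking: {ranks}")
    return ranks


# Example with a hand-written reply (not real model output)
print(parse_ranking('{"results": ["2", "1"]}', 2))
```

In the notebook this could be called as `parse_ranking(results, len(competitors))`, with the returned numbers used to index `competitors` as before.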


<table style="margin: 0; text-align: left; width:100%">
    <tr>
        <td style="width: 150px; height: 150px; vertical-align: middle;">
            <img src="../assets/exercise.png" width="150" height="150" style="display: block;" />
        </td>
        <td>
            <h2 style="color:#ff7800;">Exercise</h2>
            <span style="color:#ff7800;">Which agentic design pattern(s) did this lab use? Try extending it with another agentic design pattern.
            </span>
        </td>
    </tr>
</table>

<table style="margin: 0; text-align: left; width:100%">
    <tr>
        <td style="width: 150px; height: 150px; vertical-align: middle;">
            <img src="../assets/business.png" width="150" height="150" style="display: block;" />
        </td>
        <td>
            <h2 style="color:#00bfff;">Commercial implications</h2>
            <span style="color:#00bfff;">Patterns like this one, where you send a task to multiple models and then evaluate the results,
            are common when you need to improve the quality of an LLM response. The approach applies to many
            business projects where accuracy is critical.
            </span>
        </td>
    </tr>
</table>