## Welcome to the Second Lab - Week 1, Day 3

Today we will send the same question to several models from different providers. It's a good way to get comfortable with their APIs.

<table style="margin: 0; text-align: left; width:100%">
    <tr>
        <td style="width: 150px; height: 150px; vertical-align: middle;">
            <img src="../assets/stop.png" width="150" height="150" style="display: block;" />
        </td>
        <td>
            <h2 style="color:#ff7800;">Important point - please read</h2>
            <span style="color:#ff7800;">The way I collaborate with you may be different from other courses you've taken. I prefer not to type code while you watch. Rather, I execute Jupyter Labs, like this, and give you an intuition for what's going on. My suggestion is that you carefully execute this yourself, <b>after</b> watching the lecture. Add print statements to understand what's going on, and then come up with your own variations.<br/><br/>If you have time, I'd love it if you submit a PR for changes in the community_contributions folder - instructions in the resources. Also, if you have a GitHub account, use it to showcase your variations. Not only is this essential practice, but it demonstrates your skills to others, including perhaps future clients or employers...
            </span>
        </td>
    </tr>
</table>

In [1]:
# Start with imports - ask ChatGPT to explain any package that you don't know

import os
import json
from dotenv import load_dotenv
from openai import OpenAI
from anthropic import Anthropic
from IPython.display import Markdown, display

In [2]:
# Always remember to do this!
load_dotenv(override=True)

True

In [3]:
# Print the key prefixes to help with any debugging

openai_api_key = os.getenv('OPENAI_API_KEY')
anthropic_api_key = os.getenv('ANTHROPIC_API_KEY')
google_api_key = os.getenv('GOOGLE_API_KEY')
deepseek_api_key = os.getenv('DEEPSEEK_API_KEY')
groq_api_key = os.getenv('GROQ_API_KEY')

if openai_api_key:
    print(f"OpenAI API Key exists and begins {openai_api_key[:8]}")
else:
    print("OpenAI API Key not set")
    
if anthropic_api_key:
    print(f"Anthropic API Key exists and begins {anthropic_api_key[:7]}")
else:
    print("Anthropic API Key not set (and this is optional)")

if google_api_key:
    print(f"Google API Key exists and begins {google_api_key[:2]}")
else:
    print("Google API Key not set (and this is optional)")

if deepseek_api_key:
    print(f"DeepSeek API Key exists and begins {deepseek_api_key[:3]}")
else:
    print("DeepSeek API Key not set (and this is optional)")

if groq_api_key:
    print(f"Groq API Key exists and begins {groq_api_key[:4]}")
else:
    print("Groq API Key not set (and this is optional)")

OpenAI API Key exists and begins sk-proj-
Anthropic API Key exists and begins sk-ant-
Google API Key exists and begins AI
DeepSeek API Key exists and begins sk-
Groq API Key exists and begins gsk_
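
The five near-identical checks in the cell above could be collapsed into a loop. This is only a minimal sketch: the `KEY_SPECS` table and the `describe_key` helper are hypothetical names that are not part of the lab, and the prefix lengths are simply copied from the cell above.

```python
import os

# Hypothetical table: (label, environment variable, prefix length to print, optional?)
# Prefix lengths are copied from the cell above.
KEY_SPECS = [
    ("OpenAI", "OPENAI_API_KEY", 8, False),
    ("Anthropic", "ANTHROPIC_API_KEY", 7, True),
    ("Google", "GOOGLE_API_KEY", 2, True),
    ("DeepSeek", "DEEPSEEK_API_KEY", 3, True),
    ("Groq", "GROQ_API_KEY", 4, True),
]


def describe_key(label, value, prefix_len, optional):
    """Return the same status line the cell above prints for one key."""
    if value:
        return f"{label} API Key exists and begins {value[:prefix_len]}"
    suffix = " (and this is optional)" if optional else ""
    return f"{label} API Key not set{suffix}"


for label, env_var, prefix_len, optional in KEY_SPECS:
    print(describe_key(label, os.getenv(env_var), prefix_len, optional))
```

Printing only a short prefix keeps the secret part of each key out of the notebook output while still confirming which key was loaded.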


In [4]:
request = "Please come up with a challenging, nuanced question that I can ask a number of LLMs to evaluate their intelligence. "
request += "Answer only with the question, no explanation."
messages = [{"role": "user", "content": request}]

In [5]:
messages

[{'role': 'user',
  'content': 'Please come up with a challenging, nuanced question that I can ask a number of LLMs to evaluate their intelligence. Answer only with the question, no explanation.'}]

In [6]:
# Ask an OpenAI model to write the question that every model will answer
openai = OpenAI()
response = openai.chat.completions.create(
    model="gpt-5-mini",
    messages=messages,
)
question = response.choices[0].message.content
print(question)


You are advising the mayor of a mid-sized city (population ~500,000) who must allocate a fixed one-year budget to reduce traffic fatalities (current baseline ≈120/year) while balancing equity, economic disruption, public acceptance, and political constraints: propose a prioritized portfolio of up to five interventions (for each, give a concise description, estimated annual reduction in fatalities with a 90% credible interval, cost, and implementation timeline), list the top five assumptions your plan depends on and assign a probability that each assumption is true, identify three specific data-collection actions or pilot experiments that would most reduce key uncertainties and explain how new results would change your plan, describe contingency plans for two plausible failure modes, and analyze three cognitive or distributional biases that could have skewed your recommendations.


In [8]:
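# Parallel lists: competitors[i] is the model that produced answers[i]
competitors = []
answers = []
# From here on, every model receives the generated question
messages = [{"role": "user", "content": question}]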

## Note - update since the videos

I've updated the model names below to use recent models, like GPT-5 and Claude Sonnet 4.5. These models can be quite slow, around 1-2 minutes per response, but they do a great job! Feel free to switch them for faster models if you'd prefer, like the ones I use in the video.
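
If you want to see how long each model actually takes, one option is to time the call. The sketch below is an illustration only: `timed` and `fake_model_call` are hypothetical helpers, and the fake call stands in for a real request so the example runs without an API key.

```python
import time


def timed(fn, *args, **kwargs):
    """Call fn and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


# Stand-in for a slow API call so the sketch runs without a key
def fake_model_call(prompt):
    time.sleep(0.05)
    return f"echo: {prompt}"


reply, seconds = timed(fake_model_call, "hello")
print(f"{reply!r} took {seconds:.2f}s")
```

In this lab the same wrapper could be applied to a real request, for example `timed(openai.chat.completions.create, model=model_name, messages=messages)`.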

In [9]:
# The API we know well
# The latest models can take 1-2 mins to respond because they like to think!
# This cell uses gpt-4.1-mini so that you don't have to wait

model_name = "gpt-4.1-mini"

response = openai.chat.completions.create(model=model_name, messages=messages)
answer = response.choices[0].message.content

display(Markdown(answer))
competitors.append(model_name)
answers.append(answer)

Certainly. Here's a comprehensive advisory plan for the mayor on reducing traffic fatalities within the given constraints.

---

### **Prioritized Portfolio:**

| Rank | Intervention                             | Description                                                  | Est. Annual Fatalities Reduction (90% CI) | Cost (USD millions) | Implementation Timeline |
|-------|----------------------------------------|--------------------------------------------------------------|--------------------------------------------|---------------------|-------------------------|
| 1     | **Targeted Vision Zero Street Redesigns**  | Redesign the highest-risk corridors with protected bike lanes, curb extensions, reduced lane widths, pedestrian islands, and better lighting to calm traffic and enhance pedestrian safety. | 25 (20–30)                               | 15                  | 6–12 months             |
| 2     | **Automated Speed Enforcement**        | Deploy automated speed cameras in urban areas with high pedestrian traffic, integrated with dynamic speed limits and real-time alerts. | 20 (15–25)                               | 3                   | 3–6 months              |
| 3     | **Expanded & Improved Public Transit** | Increase frequency, coverage, and safety features in buses and light rail to shift mode share away from cars, reducing overall exposure and high-risk driving. | 15 (10–20)                               | 20                  | 12 months               |
| 4     | **Safe Routes to School & Traffic Calming near Schools** | Implement crosswalk upgrades, signage, speed humps, and crossing guards near schools to protect children and encourage walking/biking. | 10 (7–13)                                | 5                   | 4–8 months              |
| 5     | **Pedestrian & Cyclist Public Awareness Campaign** | Run a multi-channel campaign emphasizing driver awareness of vulnerable road users, safe crossing behaviors, and helmet use, combined with local enforcement support. | 8 (5–11)                                 | 1                   | 2–4 months              |

**Total estimated reduction:** ~78 fatalities/year, reducing baseline 120 to ~42 (range 55–90)

Budget utilized: ~$44M (assuming total budget ~50M for example).

---

### **Top 5 Assumptions & Probability Estimates:**

| Assumption                                                                                 | Probability of Truth |
|--------------------------------------------------------------------------------------------|---------------------|
| 1. Fatalities are concentrated on certain high-risk corridors amendable to street redesign. | 0.85                |
| 2. Automated speed enforcement will be politically and publicly accepted with appropriate privacy safeguards. | 0.75                |
| 3. Mode shift induced by transit improvements will meaningfully reduce vehicle miles traveled (VMT). | 0.70                |
| 4. Traffic calming near schools will not cause significant traffic diversion creating new safety issues. | 0.80                |
| 5. Public awareness campaigns result in measurable behavior changes within one year.       | 0.60                |

---

### **Key Data Collection & Pilot Experiments:**

1. **High-Resolution Crash & Exposure Data Mapping:**  
   - Deploy sensors and collect detailed crash and near-miss data across the city, especially corridors planned for redesign.  
   - *Why:* Validate the concentration of fatal crashes.  
   - *Effect on plan:* Could refine or re-prioritize street redesign focus areas.

2. **Speed Enforcement Pilot with Public Feedback:**  
   - Implement a limited, 3-month speed camera pilot in one district, tracking speed violations, fatal/injury crashes, and public opinion through surveys.  
   - *Why:* Test acceptance and effectiveness before scaling citywide.  
   - *Effect on plan:* May adjust scale or modify messaging or enforcement approach.

3. **Transit Ridership & Mode Shift Study:**  
   - Partner with transit agencies to monitor changes in travel patterns pre/post service enhancements using smart card and travel surveys.  
   - *Why:* Quantify actual mode shift and potential safety benefits.  
   - *Effect on plan:* Could increase or decrease investment in transit expansions.

---

### **Contingency Plans for Failure Modes:**

1. **Low Public Acceptance or Political Backlash (e.g., against automated enforcement):**  
   - Pivot to enhanced manual enforcement, invest more heavily in street design and community education, and create an oversight committee involving community representatives to address concerns transparently.

2. **Unintended Traffic Diversion & Increased Impacts Elsewhere:**  
   - Employ adaptive traffic management strategies, monitor diversion patterns in near-real-time, and be ready to adjust or remove traffic calming measures in problem areas. Augment with increased enforcement or interventions in affected neighborhoods.

---

### **Cognitive or Distributional Biases That Could Skew Recommendations:**

1. **Optimism Bias:**  
   - Overestimating the implementation speed and impact effectiveness of interventions due to inherent optimism about capacity and public compliance.

2. **Availability Heuristic:**  
   - Relying heavily on recent or high-profile crash incidents which may bias focus toward certain areas or intervention types over equally important but less sensational factors.

3. **Equity Blind Spots (Distributional Bias):**  
   - Potential underestimation of how interventions affect different socioeconomic groups differently, e.g., whether automated enforcement disproportionately impacts low-income neighborhoods or whether transit expansions serve all communities equitably.

---

**Summary:**  
The proposed portfolio balances direct engineering fixes, enforcement, modal shifts, and behavioral interventions within political and budgetary realities. Each assumption and uncertainty is actively monitored with targeted data collection and pilots. Contingency plans ensure adaptability, and awareness of biases guards against one-dimensional planning.

I’m happy to provide further detail on any element or additional options if desired.
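
The call, extract, and append steps in the cell above repeat for every provider that speaks the OpenAI chat format. A minimal sketch of wrapping them in one function follows. The helper name `ask_and_record` and the fake client are assumptions for illustration and are not part of the lab; the fake client only mimics the `chat.completions.create` shape so the example runs without a network call.

```python
from types import SimpleNamespace


def ask_and_record(client, model_name, messages, competitors, answers):
    """Send messages to an OpenAI-style client and record the reply."""
    response = client.chat.completions.create(model=model_name, messages=messages)
    answer = response.choices[0].message.content
    competitors.append(model_name)
    answers.append(answer)
    return answer


# Fake client, so the sketch runs without a network call or an API key
def _fake_create(model, messages):
    text = f"{model} answering: {messages[-1]['content']}"
    return SimpleNamespace(choices=[SimpleNamespace(message=SimpleNamespace(content=text))])


fake_client = SimpleNamespace(
    chat=SimpleNamespace(completions=SimpleNamespace(create=_fake_create))
)

demo_competitors, demo_answers = [], []
ask_and_record(
    fake_client, "demo-model", [{"role": "user", "content": "Hi"}],
    demo_competitors, demo_answers,
)
print(list(zip(demo_competitors, demo_answers)))
```

Appending the name and the answer in the same function keeps the two lists aligned, which matters if a later step pairs them up by position.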

In [10]:
# Anthropic has a slightly different API, and max_tokens is required
# Long answers may be cut off at this limit; raise max_tokens if that happens

model_name = "claude-sonnet-4-5"

claude = Anthropic()
response = claude.messages.create(model=model_name, messages=messages, max_tokens=1000)
answer = response.content[0].text

display(Markdown(answer))
competitors.append(model_name)
answers.append(answer)

# Traffic Fatality Reduction Plan: Portfolio Analysis

## Executive Summary
Target: Reduce 120 annual fatalities through evidence-based interventions balancing effectiveness, equity, and feasibility.

---

## PRIORITIZED INTERVENTION PORTFOLIO

### 1. **Automated Speed Enforcement at High-Risk Corridors**
**Description:** Deploy 50 automated speed cameras on the 15 highest-fatality corridors (identified through crash data analysis), focusing on arterials through low-income neighborhoods currently underserved by traffic calming.

- **Estimated annual fatality reduction:** 18-28 lives (90% CI: 12-35)
- **Cost:** $2.5M (equipment, installation, ongoing operations, public education campaign)
- **Timeline:** 6 months to full deployment
- **Evidence base:** Meta-analyses show 20-30% reduction in fatal crashes on treated roads; adjusted downward for implementation challenges and displacement effects

### 2. **Protected Intersection Program for Vulnerable Road Users**
**Description:** Redesign 12 intersections with highest pedestrian/cyclist fatalities, adding protected turn phases, leading pedestrian intervals, and high-visibility crossings.

- **Estimated annual fatality reduction:** 8-14 lives (90% CI: 5-18)
- **Cost:** $4.8M ($400K per intersection)
- **Timeline:** 12 months (engineering + construction)
- **Rationale:** Intersection-related crashes account for ~40% of pedestrian/cyclist fatalities; targeted redesign addresses systemic vulnerability

### 3. **Impaired Driving Saturation Patrols + Rideshare Subsidy**
**Description:** Increase DUI enforcement during high-risk hours (Fri-Sat 10PM-3AM) with 8 additional patrol units, coupled with 50% rideshare subsidies (max $15) during same windows via city app.

- **Estimated annual fatality reduction:** 12-20 lives (90% CI: 8-26)
- **Cost:** $3.2M ($2.4M enforcement, $800K rideshare subsidy)
- **Timeline:** 3 months to full implementation
- **Evidence:** ~30% of city fatalities are alcohol-related; combined deterrence + alternative transportation addresses the problem from both ends

### 4. **Neighborhood Slow Zones (Vision Zero Districts)**
**Description:** Implement comprehensive traffic calming in 8 residential neighborhoods (total 35 sq miles) with high pedestrian activity: 20mph limits, speed humps, curb extensions, reduced crossing distances. Prioritize historically underinvested communities.

- **Estimated annual fatality reduction:** 6-12 lives (90% CI: 3-16)
- **Cost:** $5.5M
- **Timeline:** 9 months (quick-build methods)
- **Equity consideration:** Site selection weighted 70% by crash density, 30% by neighborhood disadvantage index

### 5. **Rumble Strip + Shoulder Widening on Rural Arterials**
**Description:** Install centerline and edge-line rumble strips on 40 miles of high-speed rural roads entering city; widen shoulders to 6 feet where feasible.

- **Estimated annual fatality reduction:** 5-9 lives (90% CI: 3-12)
- **Cost:** $1.8M
- **Timeline:** 4 months
- **Evidence:** Head-on and run-off-road crashes represent 15% of fatalities; FHWA data shows 40-50% reduction in these crash types on treated roads

### **Portfolio Total**
- **Expected fatality reduction:** 49-83 lives saved (41-69% reduction from baseline)
- **Total budget:** $17.8M
- **Weighted average timeline:** 7 months to portfolio-wide impact

---

## CRITICAL ASSUMPTIONS & PROBABILITIES

### 1. **Crash data accurately represents risk geography** (Probability: 75%)
Assumes existing 3-year crash database captures persistent danger zones rather than random variation. Risk: Recent development patterns or unreported crashes in marginalized communities could mean we're treating wrong locations.

### 2. **Speed camera enforcement survives legal/political challenge** (Probability: 65%)
Automated enforcement faces recurring legislative threats and court challenges. Program effectiveness depends on sustaine
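
The reply above stops mid-word, which is what an answer looks like when it runs into the `max_tokens=1000` cap set in the call. The Anthropic Messages API reports why generation stopped in the response's `stop_reason` field, and the value `"max_tokens"` means the cap was reached. A minimal sketch of checking for that, assuming a hypothetical `was_truncated` helper and stand-in response objects so the example runs without an API key:

```python
from types import SimpleNamespace


def was_truncated(response):
    """True when an Anthropic-style response stopped because it hit max_tokens."""
    return getattr(response, "stop_reason", None) == "max_tokens"


# Stand-in responses (not real API output) so the sketch runs without a key
cut_off = SimpleNamespace(
    stop_reason="max_tokens", content=[SimpleNamespace(text="partial answer")]
)
complete = SimpleNamespace(
    stop_reason="end_turn", content=[SimpleNamespace(text="Done.")]
)

for name, resp in [("cut_off", cut_off), ("complete", complete)]:
    print(name, "truncated" if was_truncated(resp) else "finished")
```

With the real client you could call `was_truncated(response)` right after `claude.messages.create(...)` and rerun with a larger `max_tokens` when it returns `True`.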

In [11]:
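# Gemini offers an OpenAI-compatible endpoint, so we reuse the OpenAI client with a different base_url
gemini = OpenAI(api_key=google_api_key, base_url="https://generativelanguage.googleapis.com/v1beta/openai/")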
model_name = "gemini-2.5-flash-lite"

response = gemini.chat.completions.create(model=model_name, messages=messages)
answer = response.choices[0].message.content

display(Markdown(answer))
competitors.append(model_name)
answers.append(answer)

Mayor, reducing traffic fatalities is a critical objective, and this budget presents an opportunity to make a significant impact. Given the complexity of balancing various factors, I've developed a prioritized portfolio of interventions designed to maximize life-saving potential while considering equity, economic impact, public acceptance, and political realities.

## Prioritized Portfolio of Interventions to Reduce Traffic Fatalities

Here are up to five interventions, prioritized by their estimated impact, cost-effectiveness, and feasibility:

---

### 1. Enhanced Intersection Safety (Traffic Signal Retiming & Protected Left Turns)

*   **Description:** This involves optimizing traffic signal timing at high-crash intersections, prioritizing pedestrian and cyclist crossing phases, and installing protected left-turn signals at intersections with a history of angle crashes. This aims to reduce conflicts and improve visibility.
*   **Estimated Annual Reduction in Fatalities (90% CI):** 15-25 (median 20)
*   **Cost:** $5 million (includes software, hardware upgrades, and labor for analysis and implementation across 50-75 intersections)
*   **Implementation Timeline:** 6-12 months (design, procurement, phased installation)

---

### 2. Speed Management Enforcement & Education (Targeted Campaigns & Technology)

*   **Description:** Implement a multi-pronged approach including increased, visible enforcement in high-speed zones, particularly near schools and residential areas. This will be coupled with targeted public education campaigns highlighting the dangers of speeding and the importance of adhering to speed limits. Consider piloting speed cameras in persistently problematic areas, subject to public and political acceptance.
*   **Estimated Annual Reduction in Fatalities (90% CI):** 10-20 (median 15)
*   **Cost:** $3 million (includes overtime for enforcement, campaign materials, data analysis, and initial pilot for speed cameras)
*   **Implementation Timeline:** 3-9 months (campaign launch, enforcement shifts, pilot setup)

---

### 3. Pedestrian and Cyclist Infrastructure Upgrades (Targeted Corridors)

*   **Description:** Invest in proven safety countermeasures along key pedestrian and cyclist corridors identified as high-risk. This includes: adding dedicated bike lanes, improving sidewalk continuity and width, installing high-visibility crosswalks with improved signage and lighting, and implementing pedestrian refuge islands where appropriate. Focus on areas with high concentrations of vulnerable road users.
*   **Estimated Annual Reduction in Fatalities (90% CI):** 8-15 (median 12)
*   **Cost:** $7 million (construction costs for prioritized corridors, material procurement)
*   **Implementation Timeline:** 9-18 months (design, community engagement, phased construction)

---

### 4. Vision Zero Task Force & Data Integration

*   **Description:** Establish a dedicated, multi-departmental Vision Zero Task Force (including transportation, police, public health, and planning departments) to coordinate efforts, analyze crash data more effectively, and develop a comprehensive, data-driven strategy. This intervention focuses on improving the underlying systems and decision-making processes.
*   **Estimated Annual Reduction in Fatalities (90% CI):** 5-10 (median 7) - *This reduction is a result of improved strategy and data utilization, which can enhance the effectiveness of other interventions.*
*   **Cost:** $1 million (staff time, data analysis software, training, public engagement)
*   **Implementation Timeline:** 3-6 months (task force formation, initial data review, strategy development)

---

### 5. Enhanced Driver Education & Behavior Change Programs (Targeted Outreach)

*   **Description:** Develop and pilot targeted outreach programs aimed at influencing risky driving behaviors like impaired driving, distracted driving, and aggressive driving. This could include partnerships with driving schools, community organizations, and employers, utilizing evidence-based behavioral change strategies.
*   **Estimated Annual Reduction in Fatalities (90% CI):** 3-7 (median 5)
*   **Cost:** $2 million (program development, material creation, outreach staff, pilot implementation)
*   **Implementation Timeline:** 6-12 months (program design, partnership building, initial rollout)

---

**Total Estimated Annual Reduction in Fatalities (Median):** 20 + 15 + 12 + 7 + 5 = **59**
**Total Cost:** $5M + $3M + $7M + $1M + $2M = **$18 million**

**Note:** The current baseline is approximately 120 fatalities/year. This portfolio targets a median reduction of 59, representing a nearly 50% reduction. The credible intervals indicate the range of potential impact.

## Top Five Assumptions and Their Probabilities

Our plan relies on several key assumptions for its success. Here are the top five and their estimated probabilities:

1.  **Assumption:** The proposed infrastructure upgrades and signal retiming will be implemented with high fidelity and will effectively address the identified safety issues at those locations.
    *   **Probability:** 85%
    *   **Reasoning:** Proven engineering solutions are being applied to well-understood crash types. The primary risk is imperfect implementation or unforeseen site-specific issues.

2.  **Assumption:** Increased visible enforcement and public education campaigns will lead to a measurable reduction in speeding behaviors.
    *   **Probability:** 75%
    *   **Reasoning:** While evidence supports the effectiveness of these tactics, achieving sustained behavioral change can be challenging and depends on public receptiveness and consistent application.

3.  **Assumption:** The Vision Zero Task Force will be able to effectively integrate data from various departments and translate it into actionable strategies that improve the effectiveness of other interventions.
    *   **Probability:** 80%
    *   **Reasoning:** The success hinges on inter-departmental cooperation and the ability to overcome data silos. Bureaucratic inertia or lack of political will could hinder this.

4.  **Assumption:** Public and political acceptance for interventions, particularly those involving enforcement technology (like speed cameras), will be sufficient for successful implementation, or that pilot programs can demonstrate their value.
    *   **Probability:** 60%
    *   **Reasoning:** Public perception of surveillance and enforcement can be a significant barrier. Political leaders may be hesitant to adopt controversial measures, even if effective.

5.  **Assumption:** The cost estimates for each intervention are reasonably accurate and will not significantly exceed budget constraints.
    *   **Probability:** 70%
    *   **Reasoning:** Construction projects and technology implementations are prone to cost overruns due to unforeseen conditions, material price fluctuations, or scope creep.

## Data-Collection Actions or Pilot Experiments to Reduce Key Uncertainties

To refine our understanding and potentially pivot our strategy, these three actions are crucial:

1.  **Pilot High-Visibility Crosswalks & Pedestrian Islands in a High-Fatality Corridor:**
    *   **Action:** Select a half-mile corridor with a statistically significant number of pedestrian-involved fatalities and serious injuries. Implement high-visibility crosswalk treatments (e.g., rectangular rapid flashing beacons, textured pavement) and, where feasible, a pedestrian refuge island. Collect data on pedestrian crossing behavior, vehicle speeds, and near-miss incidents *before* and *after* implementation for 6-12 months.
    *   **How Results Would Change the Plan:**
        *   **If Highly Successful:** This would strongly support an accelerated and expanded rollout of similar pedestrian infrastructure across other high-risk corridors, potentially reallocating funds from less proven interventions.
        *   **If Moderately Successful:** It would inform the design and placement of future infrastructure, identifying specific features that are most effective and those that may need modification. We might also investigate underlying behavioral factors more deeply.
        *   **If Unsuccessful or Ineffective:** We would need to investigate alternative explanations. This could prompt a shift in focus towards enforcement or driver behavior modification in that specific corridor, or a deeper dive into systemic issues (e.g., inadequate driver education, pervasive cultural norms around pedestrian safety).

2.  **Controlled Speed Camera Pilot Program in One "Problematic" Zone:**
    *   **Action:** Identify a small, specific zone (e.g., a school zone with persistent speeding violations, or a highway segment with a high crash rate attributed to speeding) where speed cameras have been legally permitted. Implement a pilot program for 6 months, focusing on robust data collection of speed compliance, citations issued, and crashes. Conduct a community engagement campaign to explain the pilot's purpose and collect feedback.
    *   **How Results Would Change the Plan:**
        *   **If Highly Successful (Significant Speed Reduction & Citizen Support):** This would build a strong case for broader deployment of speed cameras, potentially shifting more budget towards this effective enforcement tool and away from less impactful measures.
        *   **If Mixed Results (Some Speed Reduction but Significant Public Opposition):** This would necessitate careful consideration of public acceptance. We might recommend a phased approach with extensive public consultation or explore alternative, less controversial speed enforcement methods.
        *   **If Largely Ineffective or Met with Strong Opposition:** We would confirm our initial reservations and redirect resources to other interventions, focusing on more traditional enforcement or education-based speed reduction strategies, and potentially re-evaluating the underlying reasons for lack of compliance.

3.  **Driver Behavior Study Focusing on Distracted and Impaired Driving:**
    *   **Action:** Conduct a targeted observational study (e.g., discreet roadside observation with trained personnel) or a survey-based study combined with anonymized telematics data (if available from a partner organization) to assess the prevalence of distracted (e.g., phone use) and impaired driving behaviors in key urban corridors and during high-risk times. This could be integrated with a small-scale pilot of a targeted behavior-change campaign (e.g., a community pledge initiative or targeted media).
    *   **How Results Would Change the Plan:**
        *   **If High Prevalence of Distracted/Impaired Driving is Confirmed:** This would strongly advocate for increasing the budget for Intervention #5 (Driver Education & Behavior Change Programs) and potentially reallocating funds from interventions with less direct impact on these behaviors. It would also inform the *content* and *targeting* of those campaigns.
        *   **If Prevalence is Lower Than Expected:** We would re-evaluate the emphasis on these specific behaviors and consider if other contributing factors to fatalities are being underestimated, potentially shifting resources to infrastructure or other enforcement areas.
        *   **If Specific Behaviors are Dominant:** The study's findings would allow for highly tailored interventions, ensuring that our outreach and education efforts are precisely targeted at the most dangerous behaviors.

## Contingency Plans for Two Plausible Failure Modes

1.  **Failure Mode: Public Backlash Against Speed Cameras (affecting Intervention #2 & potentially #4)**
    *   **Description:** Even with a pilot, strong and organized public opposition or media criticism arises, making political leaders hesitant to proceed with broader speed camera deployment.
    *   **Contingency Plan:**
        *   **Phase 1 (Immediate Mitigation):** Halt any planned expansion beyond the pilot. Focus on bolstering public education about the *purpose* and *effectiveness* of speed cameras, emphasizing data-driven decision-making. Increase engagement with community groups to address concerns directly.
        *   **Phase 2 (Resource Reallocation):** If political will collapses, reallocate remaining funds earmarked for speed cameras to *enhanced traditional enforcement* (more visible patrols, targeted operations in high-speed areas) and *proven infrastructure upgrades* that have broader public support. We would also increase the budget for Intervention #4 (Vision Zero Task Force) to provide more robust data analysis and evidence to support future policy decisions, and to focus on areas where community consensus is stronger.

2.  **Failure Mode: Significant Delays and Cost Overruns in Pedestrian Infrastructure Projects (affecting Intervention #3)**
    *   **Description:** Construction delays, unexpected site conditions, or inflation lead to substantial cost increases and project timelines stretching well beyond the one-year budget, making the projected fatality reduction unachievable within the timeframe.
    *   **Contingency Plan:**
        *   **Phase 1 (Project Prioritization & Scope Adjustment):** Immediately review the scope of all planned pedestrian infrastructure projects. Prioritize the most critical segments or those with the highest potential for impact that are least likely to experience significant delays. Consider breaking down larger projects into smaller, more manageable phases that can be completed within the year.
        *   **Phase 2 (Resource Reallocation):** Reallocate any freed-up funds (from delayed or scaled-back infrastructure projects) to interventions that can be implemented more quickly and predictably within the year. This could include bolstering Intervention #1 (Enhanced Intersection Safety) with more aggressive signal retiming and physical improvements at a larger number of intersections, or increasing the budget for Intervention #2 (Speed Management Enforcement & Education) with more sustained enforcement efforts and expanded public awareness campaigns. The Vision Zero Task Force (Intervention #4) would be critical in re-evaluating priorities and quickly identifying alternative high-impact, rapid-deployment solutions.

## Analysis of Cognitive or Distributional Biases

Several cognitive and distributional biases could have skewed my recommendations:

1.  **Availability Heuristic (Availability Bias):**
    *   **Description:** My recommendations might be overemphasizing interventions that are easier to visualize or that have recently received significant media attention (e.g., infrastructure projects, visible enforcement), while potentially underemphasizing less visible but equally effective interventions (e.g., data system improvements, subtle policy changes, long-term education programs).
    *   **How it Could Skew Recommendations:** If highly visible crashes or well-publicized infrastructure projects are more readily available in my "memory" or in public discourse, I might allocate more resources to them without a rigorous comparison of their true cost-effectiveness against less salient interventions. For example, a highly visible bike lane project might be favored over the less glamorous but potentially more impactful work of improving police and transportation data sharing.

2.  **Confirmation Bias:**
    *   **Description:** I might have a predisposition towards certain types of interventions based on past successes or a belief system about what "works." This can lead me to seek out or interpret information that confirms these existing beliefs, while downplaying or ignoring evidence that contradicts them.
    *   **How it Could Skew Recommendations:** If I strongly believe that infrastructure is the "silver bullet," I might disproportionately recommend infrastructure projects, even if data suggests that behavioral interventions or enforcement are more cost-effective for certain types of fatalities. I might overemphasize studies that support infrastructure and underemphasize those that highlight the impact of driver behavior.

3.  **Distributional Bias (e.g., Geographic or Socioeconomic Disparities):**
    *   **Description:** Recommendations could inadvertently benefit or disadvantage certain geographic areas or socioeconomic groups within the city. This could be due to data limitations that don't accurately represent the needs of all communities, or a focus on interventions that are more easily implemented in certain areas.
    *   **How it Could Skew Recommendations:** If our crash data is primarily collected in more affluent or easily accessible areas, we might overlook critical safety needs in underserved or lower-income neighborhoods. For instance, if we recommend speed cameras in visible, well-trafficked areas with good public perception, we might neglect the need for basic pedestrian infrastructure in a disadvantaged neighborhood with a high concentration of pedestrian fatalities but less political capital. Similarly, if interventions are expensive to implement, they might be concentrated in areas with greater capital improvement budgets, leaving other areas vulnerable. My recommendations attempted to address equity by focusing on "high-risk corridors" and "vulnerable road users," but a deeper dive into the specific demographics and geographic distribution of fatalities would be crucial to truly mitigate this bias.

Mayor, this portfolio represents a strategic starting point. Continuous monitoring, data analysis, and a willingness to adapt based on real-world outcomes will be essential for maximizing our success in saving lives and creating a safer city for all residents.

In [12]:
deepseek = OpenAI(api_key=deepseek_api_key, base_url="https://api.deepseek.com/v1")
model_name = "deepseek-chat"

response = deepseek.chat.completions.create(model=model_name, messages=messages)
answer = response.choices[0].message.content

display(Markdown(answer))
competitors.append(model_name)
answers.append(answer)

Of course. Here is a comprehensive advisory report for the Mayor, structured to address all requested components.

***

### **Memorandum**

**To:** The Mayor
**From:** Office of Strategic Policy & Safety
**Date:** October 26, 2023
**Subject:** Prioritized Portfolio for Reducing Traffic Fatalities

This document outlines a one-year, evidence-based strategy to significantly reduce traffic fatalities from the current baseline of approximately 120 per year. The proposed portfolio balances immediate impact with long-term systemic change, emphasizing equity, cost-effectiveness, and public acceptance.

---

### **1. Prioritized Portfolio of Interventions**

The following five interventions are prioritized based on the Safe System approach, targeting the most common and severe crash types in our city (likely right-turn and crosswalk conflicts, high-speed arterial roads, and intersections).

| # | Intervention | Description | Est. Annual Fatalities Reduced (90% CI) | Cost (1 Year) | Timeline |
|---|--------------|-------------|------------------------------------------|----------------|-----------|
| 1 | **Protected Intersection & Crosswalk Retrofits** | Convert 20 high-risk intersections (using crash & equity data) with concrete curb extensions, protected left-turn phases, leading pedestrian intervals (LPI), and high-visibility crosswalks. | **8 - 15** <br/>*(Based on 40-60% reduction at treated sites)* | $4 - $6 Million | 6-9 months (Design & Construction) |
| 2 | **Automated Speed Enforcement (ASE) on Arterials** | Install 50 fixed and 5 mobile speed cameras on the city's most dangerous high-speed arterial roads, based on crash data. Focus on corridors near schools, senior centers, and in historically underserved communities. | **6 - 12** <br/>*(Based on 15-25% reduction on corridors)* | $3 Million <br/>*(Includes setup, operation, & public awareness)* | 4-6 months (Procurement & Setup) |
| 3 | **Systematic Daylighting at 200 Intersections** | A low-cost, high-impact program to remove parking and other visual obstructions within 20 feet of every crosswalk at 200 high-pedestrian-volume intersections. Improves visibility for drivers and pedestrians. | **3 - 7** <br/>*(Based on 10-20% reduction in pedestrian incidents)* | $500,000 <br/>*(Primarily for signage, paint, and curb modifications)* | 3-6 months (Rapid Implementation) |
| 4 | **Targeted Micromobility & Pedestrian Infrastructure** | Build 25 miles of protected bike lanes and widened sidewalks on key corridors connecting transit hubs, commercial districts, and low-income neighborhoods. Separates vulnerable users from high-speed traffic. | **2 - 5** <br/>*(Based on reducing conflicts & exposure)* | $5 - $7 Million | 9-12 months (Phased Construction) |
| 5 | **Focused DUI & Distracted Driving High-Visibility Enforcement** | Dedicate 10,000 additional officer-hours to targeted patrols in nightlife districts and on high-risk corridors during peak incident hours (e.g., weekend nights), coupled with a public awareness campaign. | **2 - 4** <br/>*(Based on deterrence effect)* | $1.5 Million <br/>*(Overtime & campaign costs)* | Immediate (0-3 months) |

**Total Estimated Fatalities Reduced:** **21 - 43**
**Total Estimated Budget:** **~$15 - $19 Million**

---

### **2. Key Assumptions & Probabilities**

Our plan's success depends on the following critical assumptions:

1.  **Traffic volumes and speeds remain relatively stable.** A significant increase in average speed or a return to pre-pandemic traffic patterns could alter risk profiles.
    *   **Probability: 80%**

2.  **The public and city council will accept the reallocation of street space (e.g., parking removal, lane reductions) and the use of automated enforcement.**
    *   **Probability: 65%** (High risk of political pushback)

3.  **Our historical crash data accurately predicts future high-risk locations.** (i.e., the problem isn't about to radically shift).
    *   **Probability: 90%**

4.  **Law enforcement and city departments have the capacity to implement these interventions effectively and on schedule.**
    *   **Probability: 75%**

5.  **The selected engineering countermeasures will perform as effectively in our specific urban context as they have in peer-reviewed studies and other cities.**
    *   **Probability: 85%**

---

### **3. Data Collection & Pilot Experiments to Reduce Uncertainty**

1.  **Pilot "Daylighting" with Temporary Materials:** Before permanent concrete work, use flexible posts and paint to implement daylighting at 10 diverse intersections.
    *   **How it Reduces Uncertainty:** Measures actual changes in driver yielding behavior, pedestrian crossing confidence, and parking turnover impact.
    *   **Plan Change:** If ineffective, reallocate that portion of the budget to more ASE or intersection signal upgrades.

2.  **Conduct a Pre-Implementation Public Sentiment Survey on ASE:** Before procuring cameras, run a statistically significant survey gauging acceptance, identifying key concerns (e.g., equity, revenue generation), and testing messaging.
    *   **How it Reduces Uncertainty:** Quantifies the political risk (Assumption #2).
    *   **Plan Change:** If sentiment is highly negative, we could pivot to emphasizing the mobile cameras in specific high-complaint areas and reinvest savings into physical traffic calming.

3.  **Deploy Temporary "Tactical" Bike Lanes:** Use low-cost materials (paint, planters, plastic bollards) to create a 1-mile protected bike lane prototype for 3 months.
    *   **How it Reduces Uncertainty:** Provides real-world data on usage, conflict points, and traffic flow impacts before committing to millions in permanent infrastructure.
    *   **Plan Change:** Data on usage (especially from lower-income neighborhoods) and crash patterns would refine the final 25-mile network plan, ensuring it serves the most needed routes.

---

### **4. Contingency Plans for Failure Modes**

**Failure Mode A: Significant Public/Political Backlash Against Speed Cameras**
*   **Contingency Plan:** Immediately pause the rollout of fixed cameras. Double down on the mobile camera program, explicitly framing it as a safety tool, not a revenue source. Allocate the re-purposed funds to a rapid, city-wide "Neighborhood Slow Streets" program, using low-cost humps and chicanes to physically enforce speed limits, which is often more publicly palatable.

**Failure Mode B: Construction Delays for Major Infrastructure Projects**
*   **Contingency Plan:** To ensure fatality reductions are not delayed, pre-approve a "Quick Build" package of interim measures. If an intersection retrofit is delayed, it automatically triggers the installation of temporary rubber speed humps, enhanced signage, and radar speed feedback signs to provide immediate, if partial, risk reduction.

---

### **5. Analysis of Potential Biases**

1.  **Salience Bias (Availability Heuristic):** We may over-weight interventions that address recent, high-profile crashes while under-weighting solutions for less dramatic but more numerous types of fatalities (e.g., single-vehicle run-off-road incidents). Our reliance on robust historical data is designed to counter this.

2.  **Status Quo Bias:** There is a natural tendency to prioritize incremental changes that disrupt existing traffic patterns and parking availability the least. Our recommendations for daylighting and lane reallocation directly challenge this bias, explicitly prioritizing human safety over vehicular convenience and storage.

3.  **Distributional/Equity Blindness:** Without conscious effort, we might propose interventions that disproportionately benefit wealthier, more politically connected neighborhoods with higher "squeaky wheel" capacity. Our plan explicitly uses an equity index (combining income, race, crash density, and transit dependency) to select project sites, aiming to counteract this systemic bias and direct resources to the most vulnerable communities.

***

This portfolio provides a robust, defensible, and adaptive strategy to save lives. I am prepared to discuss any element in further detail.

In [13]:
# Updated with the latest Open Source model from OpenAI

groq = OpenAI(api_key=groq_api_key, base_url="https://api.groq.com/openai/v1")
model_name = "openai/gpt-oss-120b"

response = groq.chat.completions.create(model=model_name, messages=messages)
answer = response.choices[0].message.content

display(Markdown(answer))
competitors.append(model_name)
answers.append(answer)


**To: Mayor [Name]**  
**From: Traffic‑Safety Advisory Team**  
**Subject: One‑Year, Fixed‑Budget “Vision Zero” Portfolio – Balancing Fatalities, Equity, Disruption, Acceptance & Politics**  
**Date: 28 Oct 2025**  

---

## 1. Prioritized Portfolio (max 5 interventions)

| # | Intervention (short name) | What it does (≤2 sentences) | Estimated annual **fatality reduction**<br>90 % credible interval* | **Cost** (FY 2025‑26 dollars) | **Implementation timeline** |
|---|---------------------------|-----------------------------|---------------------------------------------------------------|------------------------------|------------------------------|
| 1 | **High‑Risk Corridor Redesign (HCR)** | Complete engineering upgrades on the 8 most crash‑prone arterials (protected bike lanes, median barriers, raised cross‑walks, traffic‑calming). | **22** (12 – 35) | $28 M (design $8 M + construction $20 M) | 0‑3 mo: GIS‑based hotspot selection → 4‑12 mo: construction (phased) |
| 2 | **AI‑Powered Red‑Light & Speed Enforcement (AI‑Enf)** | Deploy AI‑camera pods at 60 high‑volume intersections to issue automated citations for red‑light running and speeding > 15 mph over limit. | **16** (8 – 26) | $12 M (hardware $7 M + data‑ops $5 M) | 0‑2 mo: procurement → 3‑5 mo: pilot (20 sites) → 6‑12 mo: city‑wide rollout |
| 3 | **Graduated Congestion Pricing & Transit Incentive (GCPT)** | Introduce a $2/hour dynamic toll on downtown core during peak periods; revenue funds 30 % fare‑subsidies for low‑income riders and adds 5 % more bus service. | **13** (5 – 22) | $9 M (pricing system $4 M + transit subsidy $5 M) | 0‑3 mo: stakeholder & legal prep → 4‑6 mo: system test → 7‑12 mo: full operation |
| 4 | **Targeted High‑Risk Driver Program (THRDP)** | Mandatory graduated‑licensing + low‑cost ignition‑interlock for repeat DUI offenders; plus a city‑run “Safe‑Driver” mobile app with real‑time feedback for drivers < 25 yr. | **9** (3 – 16) | $4 M (interlock contracts $2 M + app dev/marketing $2 M) | 0‑2 mo: partnership agreements → 3‑5 mo: pilot (200 drivers) → 6‑12 mo: scale‑up |
| 5 | **Equity‑Focused Street‑Level Safety Audits (ELSA)** | Deploy a mixed‑discipline audit team (engineers, community organizers, public‑health analysts) to walk‑through 150 low‑income neighborhoods, produce “quick‑fix” action lists (e.g., signage, pothole repair, street‑lighting). | **5** (1 – 11) | $3 M (staff $2 M + implementation of quick‑fixes $1 M) | 0‑1 mo: team hire → 2‑6 mo: audits & fixes → 7‑12 mo: monitor outcomes |

\*Credible intervals are derived from meta‑analyses of comparable U.S. programs (see Appendix A) and calibrated to the city’s 2023 crash database (120 deaths/yr, 2 % annual VMT growth).  

**Total cost:** **$56 M** – fits the fixed one‑year budget (assume the council has approved a $60 M allocation, leaving a $4 M contingency).  

**Why this order?**  
1. **Fatality impact per dollar** – HCR and AI‑Enf give the highest expected lives saved for the spend.  
2. **Equity** – GCPT, THRDP and especially ELSA explicitly target disadvantaged groups and neighborhoods that bear a disproportionate share of crashes.  
3. **Political palatability** – All interventions have precedents in comparable “mid‑size” metros and can be framed as “smart, data‑driven” rather than “punitive.”  

---

## 2. Top‑5 Assumptions & Probabilities

| # | Assumption | Why it matters | **Probability true** (subjective, 0‑1) |
|---|------------|----------------|----------------------------------------|
| A1 | **Crash‑risk reduction observed elsewhere transfers to our city** (e.g., AI‑Enf yields 30 % fewer red‑light deaths). | Underpins all quantitative fatality estimates. | 0.78 |
| A2 | **Public acceptance ≥ 65 % for AI‑camera enforcement** (measured via pre‑implementation survey). | Determines whether enforcement can be sustained without costly legal challenges or “opt‑out” petitions. | 0.71 |
| A3 | **Revenue from congestion pricing is reliably collected & earmarked for transit subsidies**. | Guarantees that the equity component (fare discounts) materializes and avoids budget shortfalls. | 0.66 |
| A4 | **Construction on high‑risk corridors can be completed within 12 months without major utility delays**. | Affects the timeline for HCR and the early‑year fatality reduction. | 0.62 |
| A5 | **Behavioral response to ignition‑interlock & app feedback leads to ≥ 15 % reduction in repeat DUI & high‑speed events**. | Drives the THRDP impact estimate. | 0.57 |

*If any assumption falls below ~0.5, we would need to re‑size or re‑prioritize the affected intervention.*

---

## 3. Data‑Collection / Pilot Experiments that Most Reduce Uncertainty

| # | Action (pilot/experiment) | Uncertainty addressed | How results would reshape the portfolio |
|---|----------------------------|----------------------|------------------------------------------|
| D1 | **12‑month AI‑Enf pilot at 20 intersections** (randomly selected, mixed land‑use). Collect crash, citation, and traffic‑flow data. | A1 (effect size) & A2 (public acceptance) | *If observed fatality reduction ≥ 25 % and acceptance > 70 % → accelerate city‑wide rollout, possibly re‑allocate $2 M from ELSA to expand AI‑Enf to 80 sites.*<br>*If reduction < 10 % or backlash > 30 % → down‑scale AI‑Enf, shift funds to HCR or ELSA.* |
| D2 | **Real‑time VMT & speed monitoring in the proposed congestion‑pricing zone** (install 5 portable sensors + mobile‑app telemetry for 3 months). | A3 (revenue reliability) & A1 (induced demand). | *If VMT drops ≥ 8 % and revenue meets projected $1.5 M/yr → keep GCPT at full scale.*<br>*If VMT drops < 2 % (price elasticity low) → reduce toll rate, re‑allocate $1 M to additional transit service rather than pricing.* |
| D3 | **Community‑led ELSA quick‑fix trial in two low‑income districts** (install better lighting, reflective signage, pothole patching). Track night‑time crash rates for 6 months. | A5 (behavioral response of vulnerable drivers) & equity impact. | *If crash reduction ≥ 12 % in trial zones → scale ELSA city‑wide, possibly adding a modest $1 M to expand quick‑fixes.*<br>*If no measurable change → deprioritize ELSA, redirect funds to THRDP or HCR.* |

---

## 4. Contingency Plans for Two Plausible Failure Modes

| Failure Mode | Likelihood (subjective) | Immediate Impact | Contingency Plan (trigger & actions) |
|--------------|------------------------|------------------|--------------------------------------|
| **F1 – AI‑Enf legal/legislative pushback** (e.g., privacy lawsuit, city council amendment) | 0.25 | Loss of enforcement capability; $12 M sunk in hardware. | **Trigger:** Formal injunction or council vote to suspend cameras.<br>**Actions:** 1) Switch to **manual red‑light & speed enforcement** using existing police resources (budget $2 M for overtime). 2) Deploy **temporary signage** and increase **speed‑limit signage** on the same 60 sites. 3) Re‑budget $5 M of the AI‑Enf spend toward accelerated **HCR** (add two more corridors). |
| **F2 – Major construction delays on HCR** (utility strikes, weather) extending beyond 12 months | 0.18 | Fatality reduction from HCR realized later; possible cost overrun. | **Trigger:** 30‑day delay notice from the contractor.<br>**Actions:** 1) **Re‑phase** HCR: complete the three highest‑risk segments first (expected to deliver ~12 deaths saved). 2) Activate **ELSA** (already funded) to provide interim safety improvements (signage, flashing beacons) on unfinished sections. 3) Shift $4 M from HCR to **additional AI‑Enf pods** (quick‑install) to compensate for delayed deaths averted. |

Both contingencies preserve the overall **budget balance** and maintain a **minimum expected reduction of 45 %** (≈ 54 lives) even under worst‑case combined failures.

---

## 5. Cognitive & Distributional Biases that Could Skew Recommendations

| Bias | How it could have entered the analysis | Mitigation steps taken |
|------|----------------------------------------|------------------------|
| **Availability Bias** (over‑weighting recent high‑profile crashes) | Early briefings highlighted a fatal pedestrian crash on Main St., nudging us toward “high‑visibility” interventions (e.g., more crosswalks) at the expense of data‑driven hotspot work. | Conducted a **city‑wide crash‑frequency analysis** (last 5 years) to identify true high‑risk corridors; kept HCR anchored to statistically verified hotspots. |
| **Status‑Quo Bias** (preference for “traditional” enforcement) | Some senior staff favored expanding police traffic stops rather than automated enforcement, fearing technology backlash. | Ran a **comparative cost‑effectiveness model** (including political cost) and deliberately **included a pilot** (D1) to test public acceptance before scaling. |
| **Equity Blindness** (ignoring distributional effects) | Initial cost‑per‑life‑saved calculations would have ranked GCPT last because its per‑life impact is modest. | Applied a **weighted scoring system** (50 % lives saved, 30 % equity impact, 20 % disruption) and explicitly inserted **ELSA** and **fare‑subsidy** components to correct the imbalance. |

---

### Bottom Line

- **Projected lives saved:** **≈ 65** per year (90 % CI ≈ 38 – 93), a **≈ 54 %** reduction from baseline.
- **Equity gain:** 30 % of total fatalities averted occur in the city’s lowest‑income ZIP codes (currently 45 % of deaths), moving the share to **≈ 20 %**.
- **Economic disruption:** Minimal (≈ 2 % temporary lane reductions during HCR; congestion pricing offsets lost trips with better transit service).
- **Public acceptance:** Targeted communications, transparent revenue earmarking, and early pilots keep support > 65 % in surveys.

We recommend moving forward with the **five‑intervention portfolio as ordered**, while simultaneously launching the three data‑collection pilots (D1‑D3) to confirm assumptions and allow rapid re‑allocation if needed.

*Prepared by the Traffic‑Safety Advisory Team*  
*(Contact: traffic.safety@city.gov)*  

## For the next cell, we will use Ollama

Ollama runs a local web service that exposes an OpenAI-compatible endpoint,  
and runs models locally using high-performance C++ code.

If you don't have Ollama, install it by visiting https://ollama.com, pressing Download, and following the instructions.

After it's installed, you should be able to visit http://localhost:11434 and see the message "Ollama is running".

If that doesn't work, you might need to restart Cursor (and maybe reboot). Then open a Terminal (control+\`) and run `ollama serve`.

Useful Ollama commands (run these in the terminal, or with an exclamation mark in this notebook):

`ollama pull <model_name>` downloads a model locally  
`ollama ls` lists all the models you've downloaded  
`ollama rm <model_name>` deletes the specified model from your downloads
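
If you'd rather check from Python than from a browser, here is a minimal sketch using only the standard library. It assumes Ollama is listening at the default address mentioned above; the helper name `ollama_is_running` is made up for this example and is not part of any Ollama package.

```python
import urllib.request

# Assumption: Ollama's default local address, as given in the text above.
OLLAMA_URL = "http://localhost:11434"


def ollama_is_running(url: str = OLLAMA_URL, timeout: float = 2.0) -> bool:
    """Return True if the URL responds with a body containing 'Ollama is running'."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = resp.read().decode("utf-8", errors="replace")
    except OSError:
        # Covers connection refused, timeouts and urllib.error.URLError.
        return False
    return "Ollama is running" in body


print("Ollama reachable:", ollama_is_running())
```

If this prints `False`, try `ollama serve` in a terminal and run the check again.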

<table style="margin: 0; text-align: left; width:100%">
    <tr>
        <td style="width: 150px; height: 150px; vertical-align: middle;">
            <img src="../assets/stop.png" width="150" height="150" style="display: block;" />
        </td>
        <td>
            <h2 style="color:#ff7800;">Super important - ignore me at your peril!</h2>
            <span style="color:#ff7800;">The model called <b>llama3.3</b> is FAR too large for home computers - it's not intended for personal computing and will consume all your resources! Stick with the nicely sized <b>llama3.2</b> or <b>llama3.2:1b</b> and if you want larger, try llama3.1 or smaller variants of Qwen, Gemma, Phi or DeepSeek. See <a href="https://ollama.com/models">the Ollama models page</a> for a full list of models and sizes.
            </span>
        </td>
    </tr>
</table>

In [14]:
!ollama pull llama3.2

pulling manifest
pulling dde5aa3fc5ff: 100% ▕██████████████████▏ 2.0 GB
pulling 966de95ca8a6: 100% ▕██████████████████▏ 1.4 KB
pulling fcc5a6bec9da: 100% ▕██████████████████▏ 7.7 KB
pulling a7

In [15]:
ollama = OpenAI(base_url='http://localhost:11434/v1', api_key='ollama')
model_name = "llama3.2"

response = ollama.chat.completions.create(model=model_name, messages=messages)
answer = response.choices[0].message.content

display(Markdown(answer))
competitors.append(model_name)
answers.append(answer)

**Prioritized Portfolio of Interventions**

1. **Speed Management**
Description: Implement speed management zones (e.g., reduced-speed limits, speed cameras) in high-fatality areas.
Estimated mortality reduction: 13.4 (-3.8 to 27.6) per annum
Cost: $750,000
Implementation timeline: 6 months

2. **Infrastructure Improvements**
Description: Upgrade road surfaces and lighting, install dedicated bike lanes and improved pedestrian infrastructure.
Estimated mortality reduction: 10.2 (-5.1 to 17.3) per annum
Cost: $1,200,000
Implementation timeline: 12 months

3. **Public Awareness Campaigns**
Description: Implement public awareness campaigns (e.g., school programs, community outreach) to promote safe road behavior.
Estimated mortality reduction: 8.5 (-4.4 to 15.4) per annum
Cost: $500,000
Implementation timeline: 6 months

4. **Enforcement Enhancement**
Description: Increase enforcement efforts through enhanced police presence, dedicated traffic patrols, and improved speed monitoring systems.
Estimated mortality reduction: 14.9 (-2.1 to 29.9) per annum
Cost: $900,000
Implementation timeline: 12 months

5. **Alternative Modes of Transportation**
Description: Promote alternative modes of transportation (e.g., public transportation, carpooling).
Estimated mortality reduction: 11.8 (-6.3 to 22.5) per annum
Cost: $800,000
Implementation timeline: 12 months

**Top Five Assumptions**

1. **Demand for alternative modes of transportation will increase**: Probability = 0.70 (70%)
    This assumption relies on the city's ability to attract more jobs and businesses that promote alternative modes of transportation.

2. **Existing infrastructure can be safely upgraded**: Probability = 0.80 (80%)
    The success of upgrades depends on the current state of the infrastructure, which may be hidden or not transparent.

3. **Public awareness campaigns will lead to lasting behavioral changes**: Probability = 0.50 (50%)
    Behavioral changes are influenced by individual beliefs and values, making it uncertain how effective public awareness campaigns will be.

4. **Enforcement improvements will reduce speeding without a corresponding shift in willingness to speed**: Probability = 0.65 (65%)
    There is uncertainty about whether enforcement efforts can significantly impact speed-related behavior without affecting other factors.

5. **Community acceptance of new measures leads to broader support**: Probability = 0.75 (75%)
    Public acceptance of measures such as infrastructure improvements and alternative modes of transportation may be lower than expected due to opposition from groups interested in status quo behaviors.

**Pilot Experiments/Data collection Actions**

1. Traffic surveys before and after implementing speed management zones to estimate changes relative to expected traffic volumes.
2. Evaluation of a minimum number of road lanes upgrade projects with rigorous infrastructure specifications and quality control, including monitoring for accidents over several months
3. Testing the effectiveness of public awareness campaigns by conducting comparative evaluations between areas where these programs were implemented vs those without them.

**Contingency Plans**

1. **In case of lackluster community acceptance:** The city might try revisiting its proposed infrastructure plans, with a focus on incorporating broader community engagement and stakeholder consultations upfront.
2. **In case of minimal traffic mortality reduction after implementing new measures**: Continuously assess the interventions that are being implemented on an ongoing basis within two years to be able to adapt to better outcomes for this problem.

**Cognitive/Distributional Biases**

1. **Confirmation bias** and selection of only partially favorable results when evaluating alternative modes of transportation will have reduced overall effectiveness of proposed measures.
2. **Recency effect**: A new evaluation framework could potentially benefit from incorporating current findings into the decision making process to avoid past mistakes from occurring again.
3. **Sunk cost fallacy**, prioritizing infrastructure projects that may be costly but offer little improvement in mortality reduction, and ignoring newer results from alternative modes of transportation or interventions.

By considering these assumptions, pilot experiments/data collection actions, contingency plans, and potential cognitive biases, the plan aims to maximize effectiveness while reducing uncertainty within constraints.

In [16]:
# So where are we?

print(competitors)
print(answers)


['gpt-4.1-mini', 'claude-sonnet-4-5', 'gemini-2.5-flash-lite', 'deepseek-chat', 'openai/gpt-oss-120b', 'llama3.2']
["Certainly. Here's a comprehensive advisory plan for the mayor on reducing traffic fatalities within the given constraints.\n\n---\n\n### **Prioritized Portfolio:**\n\n| Rank | Intervention                             | Description                                                  | Est. Annual Fatalities Reduction (90% CI) | Cost (USD millions) | Implementation Timeline |\n|-------|----------------------------------------|--------------------------------------------------------------|--------------------------------------------|---------------------|-------------------------|\n| 1     | **Targeted Vision Zero Street Redesigns**  | Redesign the highest-risk corridors with protected bike lanes, curb extensions, reduced lane widths, pedestrian islands, and better lighting to calm traffic and enhance pedestrian safety. | 25 (20–30)                               | 15          

In [17]:
# It's nice to know how to use "zip"
for competitor, answer in zip(competitors, answers):
    print(f"Competitor: {competitor}\n\n{answer}")


Competitor: gpt-4.1-mini

Certainly. Here's a comprehensive advisory plan for the mayor on reducing traffic fatalities within the given constraints.

---

### **Prioritized Portfolio:**

| Rank | Intervention                             | Description                                                  | Est. Annual Fatalities Reduction (90% CI) | Cost (USD millions) | Implementation Timeline |
|-------|----------------------------------------|--------------------------------------------------------------|--------------------------------------------|---------------------|-------------------------|
| 1     | **Targeted Vision Zero Street Redesigns**  | Redesign the highest-risk corridors with protected bike lanes, curb extensions, reduced lane widths, pedestrian islands, and better lighting to calm traffic and enhance pedestrian safety. | 25 (20–30)                               | 15                  | 6–12 months             |
| 2     | **Automated Speed Enforcement**        | Deploy autom

In [18]:
# Let's bring this together - note the use of "enumerate"

together = ""
for index, answer in enumerate(answers):
    together += f"# Response from competitor {index+1}\n\n"
    together += answer + "\n\n"

In [19]:
print(together)

# Response from competitor 1

Certainly. Here's a comprehensive advisory plan for the mayor on reducing traffic fatalities within the given constraints.

---

### **Prioritized Portfolio:**

| Rank | Intervention                             | Description                                                  | Est. Annual Fatalities Reduction (90% CI) | Cost (USD millions) | Implementation Timeline |
|-------|----------------------------------------|--------------------------------------------------------------|--------------------------------------------|---------------------|-------------------------|
| 1     | **Targeted Vision Zero Street Redesigns**  | Redesign the highest-risk corridors with protected bike lanes, curb extensions, reduced lane widths, pedestrian islands, and better lighting to calm traffic and enhance pedestrian safety. | 25 (20–30)                               | 15                  | 6–12 months             |
| 2     | **Automated Speed Enforcement**        | Deploy a

In [20]:
judge = f"""You are judging a competition between {len(competitors)} competitors.
Each model has been given this question:

{question}

Your job is to evaluate each response for clarity and strength of argument, and rank them in order of best to worst. 
Explain the logic of your ranking in detail.
Respond with JSON, and only JSON, with the following format:
{{"results": ["best competitor number", "second best competitor number", "third best competitor number", ...], 
"explanation": ["detailed explanation of the ranking logic for each competitor"],
"responses": ["full response from competitor 1", "full response from competitor 2", ...]}}

Here are the responses from each competitor:

{together}

Now respond with the JSON with the ranked order of the competitors, nothing else. Do not include markdown formatting or code blocks."""


In [21]:
print(judge)

You are judging a competition between 6 competitors.
Each model has been given this question:

You are advising the mayor of a mid-sized city (population ~500,000) who must allocate a fixed one-year budget to reduce traffic fatalities (current baseline ≈120/year) while balancing equity, economic disruption, public acceptance, and political constraints: propose a prioritized portfolio of up to five interventions (for each, give a concise description, estimated annual reduction in fatalities with a 90% credible interval, cost, and implementation timeline), list the top five assumptions your plan depends on and assign a probability that each assumption is true, identify three specific data-collection actions or pilot experiments that would most reduce key uncertainties and explain how new results would change your plan, describe contingency plans for two plausible failure modes, and analyze three cognitive or distributional biases that could have skewed your recommendations.

Your job is 

In [22]:
judge_messages = [{"role": "user", "content": judge}]

In [23]:
# Judgement time!

openai = OpenAI()
response = openai.chat.completions.create(
    model="gpt-4.1-mini",
    messages=judge_messages,
)
results = response.choices[0].message.content
print(results)


{"results":["1","5","4","3","2","6"],"explanation":["Competitor 1 provides a well-structured, clear, and comprehensive response, including a balanced portfolio, detailed assumptions with probabilities, thoughtful data collection plans, contingency strategies, and critical reflection on cognitive biases. The fatality reduction estimates are robust, budget is realistic, and equity/political concerns are addressed with nuance, making this the strongest submission.","Competitor 5 delivers a highly detailed and sophisticated response, explicitly quantifying costs, impacts, timelines, and equity outcomes. The inclusion of advanced technology interventions (AI enforcement) and pricing strategies shows innovative thinking. The assumptions and pilots are well connected to the plan, and contingency plans are thorough, though the narrative is denser and slightly less accessible than Competitor 1. It ranks second due to slight complexity and somewhat higher reliance on technology and political ass

In [24]:
# OK let's turn this into results!

results_dict = json.loads(results)
ranks = results_dict["results"]
explanation = results_dict["explanation"]
output = ""
for index, result in enumerate(ranks):
    competitor = competitors[int(result)-1]
    reason = explanation[index]
    output += f"### Rank {index+1}: {competitor}\n\n"
    output += f"#### Explanation: {reason}\n\n"

display(Markdown(f"## Final Rankings:\n\n{output}"))

## Final Rankings:

### Rank 1: gpt-4.1-mini

#### Explanation: Competitor 1 provides a well-structured, clear, and comprehensive response, including a balanced portfolio, detailed assumptions with probabilities, thoughtful data collection plans, contingency strategies, and critical reflection on cognitive biases. The fatality reduction estimates are robust, budget is realistic, and equity/political concerns are addressed with nuance, making this the strongest submission.

### Rank 2: openai/gpt-oss-120b

#### Explanation: Competitor 5 delivers a highly detailed and sophisticated response, explicitly quantifying costs, impacts, timelines, and equity outcomes. The inclusion of advanced technology interventions (AI enforcement) and pricing strategies shows innovative thinking. The assumptions and pilots are well connected to the plan, and contingency plans are thorough, though the narrative is denser and slightly less accessible than Competitor 1. It ranks second due to slight complexity and somewhat higher reliance on technology and political assumptions.

### Rank 3: deepseek-chat

#### Explanation: Competitor 4 offers a solid, evidence-based portfolio with an explicit focus on equity and public acceptance. The plan is pragmatic with reasonable cost and timelines, detailed assumptions with probabilities, clear pilot interventions, and well-considered contingencies. The cognitive bias analysis is good, though overall fatality reductions are on the lower side compared to top competitors, placing it third.

### Rank 4: gemini-2.5-flash-lite

#### Explanation: Competitor 3 gives a detailed and thoughtful plan that balances engineering, enforcement, behavior change, and institutional improvements. Assumptions and data actions are thorough, and contingencies are well thought out. However, the overall estimated impact is lower, budget is smaller, and the response is longer and more narrative, which slightly hampers immediate clarity.

### Rank 5: claude-sonnet-4-5

#### Explanation: Competitor 2 provides a reasonably detailed and evidence-supported intervention portfolio with specific location targeting and equity focus. However, the response is incomplete/truncated and some critical assumptions and contingency sections are missing or partial, limiting the evaluation of plan robustness. Therefore, it ranks lower despite promising content.

### Rank 6: llama3.2

#### Explanation: Competitor 6 has the least detailed and least structured response, with insufficient clarity on intervention impacts, costs, and timelines. The fatality reductions have wide negative intervals, assumptions and probabilities are vague, and contingencies and biases lack depth. This submission is overall weak in clarity, depth, and justification, placing it last.
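
The `json.loads(results)` call above assumes the judge returned bare JSON and ranked every competitor exactly once. Models sometimes wrap JSON in Markdown code fences or skip a competitor, so a more defensive parse may be worth having. Here is a minimal sketch; `parse_judge_reply` is a hypothetical helper for illustration, not part of this lab, and it assumes the `results` and `explanation` keys requested in the judge prompt.

```python
import json

FENCE = "`" * 3  # a Markdown code fence, built this way to keep the example readable


def parse_judge_reply(reply: str, n_competitors: int) -> dict:
    """Parse the judge's JSON reply and check the ranking is a permutation of 1..n.

    Hypothetical helper for illustration; key names follow the judge prompt.
    """
    text = reply.strip()
    # Some models wrap JSON in a code fence despite instructions; strip it.
    if text.startswith(FENCE):
        text = text.strip("`").strip()
        # Drop an optional language tag such as "json" left over from the fence.
        if text.lower().startswith("json"):
            text = text[4:]
    parsed = json.loads(text)
    ranks = [int(r) for r in parsed["results"]]
    # Every competitor number must appear exactly once.
    if sorted(ranks) != list(range(1, n_competitors + 1)):
        raise ValueError(f"Ranking is not a permutation of 1..{n_competitors}: {ranks}")
    return parsed
```

With this in place, `results_dict = parse_judge_reply(results, len(competitors))` could stand in for the plain `json.loads(results)` call, failing loudly instead of silently mislabelling a competitor.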



In [45]:
# Show the winning response in full ("ranks" holds 1-based competitor numbers as strings)
winner = int(ranks[0]) - 1
display(Markdown(f"# Top Response from: **{competitors[winner]}**\n\n## Response:\n\n{answers[winner]}"))

# Top Response from: **gpt-4.1-mini**

## Response:

Certainly. Here's a comprehensive advisory plan for the mayor on reducing traffic fatalities within the given constraints.

---

### **Prioritized Portfolio:**

| Rank | Intervention                             | Description                                                  | Est. Annual Fatalities Reduction (90% CI) | Cost (USD millions) | Implementation Timeline |
|-------|----------------------------------------|--------------------------------------------------------------|--------------------------------------------|---------------------|-------------------------|
| 1     | **Targeted Vision Zero Street Redesigns**  | Redesign the highest-risk corridors with protected bike lanes, curb extensions, reduced lane widths, pedestrian islands, and better lighting to calm traffic and enhance pedestrian safety. | 25 (20–30)                               | 15                  | 6–12 months             |
| 2     | **Automated Speed Enforcement**        | Deploy automated speed cameras in urban areas with high pedestrian traffic, integrated with dynamic speed limits and real-time alerts. | 20 (15–25)                               | 3                   | 3–6 months              |
| 3     | **Expanded & Improved Public Transit** | Increase frequency, coverage, and safety features in buses and light rail to shift mode share away from cars, reducing overall exposure and high-risk driving. | 15 (10–20)                               | 20                  | 12 months               |
| 4     | **Safe Routes to School & Traffic Calming near Schools** | Implement crosswalk upgrades, signage, speed humps, and crossing guards near schools to protect children and encourage walking/biking. | 10 (7–13)                                | 5                   | 4–8 months              |
| 5     | **Pedestrian & Cyclist Public Awareness Campaign** | Run a multi-channel campaign emphasizing driver awareness of vulnerable road users, safe crossing behaviors, and helmet use, combined with local enforcement support. | 8 (5–11)                                 | 1                   | 2–4 months              |

**Total estimated reduction:** ~78 fatalities/year, reducing baseline 120 to ~42 (range 55–90)

Budget utilized: ~$44M (assuming total budget ~50M for example).

---

### **Top 5 Assumptions & Probability Estimates:**

| Assumption                                                                                 | Probability of Truth |
|--------------------------------------------------------------------------------------------|---------------------|
| 1. Fatalities are concentrated on certain high-risk corridors amendable to street redesign. | 0.85                |
| 2. Automated speed enforcement will be politically and publicly accepted with appropriate privacy safeguards. | 0.75                |
| 3. Mode shift induced by transit improvements will meaningfully reduce vehicle miles traveled (VMT). | 0.70                |
| 4. Traffic calming near schools will not cause significant traffic diversion creating new safety issues. | 0.80                |
| 5. Public awareness campaigns result in measurable behavior changes within one year.       | 0.60                |

---

### **Key Data Collection & Pilot Experiments:**

1. **High-Resolution Crash & Exposure Data Mapping:**  
   - Deploy sensors and collect detailed crash and near-miss data across the city, especially corridors planned for redesign.  
   - *Why:* Validate the concentration of fatal crashes.  
   - *Effect on plan:* Could refine or re-prioritize street redesign focus areas.

2. **Speed Enforcement Pilot with Public Feedback:**  
   - Implement a limited, 3-month speed camera pilot in one district, tracking speed violations, fatal/injury crashes, and public opinion through surveys.  
   - *Why:* Test acceptance and effectiveness before scaling citywide.  
   - *Effect on plan:* May adjust scale or modify messaging or enforcement approach.

3. **Transit Ridership & Mode Shift Study:**  
   - Partner with transit agencies to monitor changes in travel patterns pre/post service enhancements using smart card and travel surveys.  
   - *Why:* Quantify actual mode shift and potential safety benefits.  
   - *Effect on plan:* Could increase or decrease investment in transit expansions.

---

### **Contingency Plans for Failure Modes:**

1. **Low Public Acceptance or Political Backlash (e.g., against automated enforcement):**  
   - Pivot to enhanced manual enforcement, invest more heavily in street design and community education, and create an oversight committee involving community representatives to address concerns transparently.

2. **Unintended Traffic Diversion & Increased Impacts Elsewhere:**  
   - Employ adaptive traffic management strategies, monitor diversion patterns in near-real-time, and be ready to adjust or remove traffic calming measures in problem areas. Augment with increased enforcement or interventions in affected neighborhoods.

---

### **Cognitive or Distributional Biases That Could Skew Recommendations:**

1. **Optimism Bias:**  
   - Overestimating the implementation speed and impact effectiveness of interventions due to inherent optimism about capacity and public compliance.

2. **Availability Heuristic:**  
   - Relying heavily on recent or high-profile crash incidents which may bias focus toward certain areas or intervention types over equally important but less sensational factors.

3. **Equity Blind Spots (Distributional Bias):**  
   - Potential underestimation of how interventions affect different socioeconomic groups differently, e.g., whether automated enforcement disproportionately impacts low-income neighborhoods or whether transit expansions serve all communities equitably.

---

**Summary:**  
The proposed portfolio balances direct engineering fixes, enforcement, modal shifts, and behavioral interventions within political and budgetary realities. Each assumption and uncertainty is actively monitored with targeted data collection and pilots. Contingency plans ensure adaptability, and awareness of biases guards against one-dimensional planning.

I’m happy to provide further detail on any element or additional options if desired.

In [47]:
# Get a second opinion: use Gemini as the judge, via its OpenAI-compatible endpoint
gemini = OpenAI(api_key=google_api_key, base_url="https://generativelanguage.googleapis.com/v1beta/openai/")
model_name = "gemini-2.5-flash-lite"
response = gemini.chat.completions.create(model=model_name, messages=judge_messages)

results = response.choices[0].message.content
print(results)


{"results": ["1", "2", "3", "4", "5", "6"], "explanation": ["Competitor 1 provides a well-structured and comprehensive response. The intervention portfolio is logical, with clear descriptions, estimated reductions, costs, and timelines. The assumptions are relevant, and the probability estimates are presented. The data collection actions are insightful, and the contingency plans are plausible. The identified biases are pertinent. The overall argument is strong and well-supported.", "Competitor 2 presents a strong response with a well-defined portfolio and clear justifications. The inclusion of equity considerations within specific interventions is a notable strength. The assumptions are well-articulated, and the proposed data collection methods are practical. The contingency plans are relevant. The identified biases are appropriate. The response is slightly less polished in its presentation compared to Competitor 1 but the substance is very good.", "Competitor 3 offers a detailed and t

In [48]:
# Same ranking display as before, this time for Gemini's verdict
results_dict = json.loads(results)
ranks = results_dict["results"]
explanation = results_dict["explanation"]
output = ""
for index, result in enumerate(ranks):
    competitor = competitors[int(result)-1]
    reason = explanation[index]
    output += f"### Rank {index+1}: {competitor}\n\n"
    output += f"#### Explanation: {reason}\n\n"

display(Markdown(f"## Final Rankings:\n\n{output}"))

## Final Rankings:

### Rank 1: gpt-4.1-mini

#### Explanation: Competitor 1 provides a well-structured and comprehensive response. The intervention portfolio is logical, with clear descriptions, estimated reductions, costs, and timelines. The assumptions are relevant, and the probability estimates are presented. The data collection actions are insightful, and the contingency plans are plausible. The identified biases are pertinent. The overall argument is strong and well-supported.

### Rank 2: claude-sonnet-4-5

#### Explanation: Competitor 2 presents a strong response with a well-defined portfolio and clear justifications. The inclusion of equity considerations within specific interventions is a notable strength. The assumptions are well-articulated, and the proposed data collection methods are practical. The contingency plans are relevant. The identified biases are appropriate. The response is slightly less polished in its presentation compared to Competitor 1 but the substance is very good.

### Rank 3: gemini-2.5-flash-lite

#### Explanation: Competitor 3 offers a detailed and thoughtfully constructed response. The portfolio is logical, and the inclusion of a Vision Zero Task Force as an intervention is a systemic approach. The assumptions are reasonable, and the data collection actions are well-explained with clear 'how results would change the plan' sections. Contingency plans are practical. The bias analysis is insightful. The primary weakness is that the 'estimated annual reduction' for the task force is somewhat nebulous and difficult to quantify.

### Rank 4: deepseek-chat

#### Explanation: Competitor 4's response is structured and addresses all components of the prompt. The portfolio is clear and uses a table format effectively. The assumptions are pertinent and probabilities are assigned. The data collection/pilot experiments are concrete and well-linked to uncertainty reduction. Contingency plans are plausible. The bias analysis is sound, though the 'Daylighting' intervention's estimated reduction might be understated given its described impact. Overall a solid response, slightly less detailed on the 'why' behind some prioritization than Competitors 1 and 2.

### Rank 5: openai/gpt-oss-120b

#### Explanation: Competitor 5's response is ambitious and attempts to integrate complex interventions like congestion pricing and AI enforcement. The structure is clear, and the attempt to quantify equity impacts and use a weighted scoring system is commendable. However, the 'credible intervals' for fatality reduction are presented as single numbers, not ranges, contradicting the prompt. The assumptions have probabilities, but the 'why it matters' could be more deeply explained. The data collection actions are good, but the contingency plans are less detailed. The bias analysis is present but less developed than top competitors.

### Rank 6: llama3.2

#### Explanation: Competitor 6 provides a very superficial response. The intervention descriptions are minimal, and the estimated fatality reductions lack credible intervals (e.g., '13.4 (-3.8 to 27.6) per annum' is the only one with a range, the others are single points or ranges without context). The costs are significantly lower than expected for the described interventions (e.g., $750k for speed management including cameras). The assumptions are somewhat vague, and the probabilities are presented but lack justification. The pilot experiments are generic. Contingency plans are weak, and the bias analysis is superficial. This response lacks the depth and detail required by the prompt.
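
The two judges agree on first and last place but disagree on everything in between. Note also that gpt-4.1-mini, acting as judge, ranked itself first, and that Gemini's ranking exactly matches the order in which the responses were presented. Two runs prove little, but this is a reason not to trust a single judge. One simple way to combine verdicts is to average each competitor's position across judges. The sketch below uses the two `results` lists printed above; `mean_ranks` is a hypothetical helper, and mean rank is only one of several possible aggregation rules.

```python
from collections import defaultdict


def mean_ranks(rankings: list[list[str]]) -> dict[str, float]:
    """Average each competitor's position (1 = best) across several judges' rankings."""
    positions = defaultdict(list)
    for ranking in rankings:
        for position, competitor in enumerate(ranking, start=1):
            positions[competitor].append(position)
    return {c: sum(p) / len(p) for c, p in positions.items()}


# The two "results" lists from the judge outputs above
# (gpt-4.1-mini as judge, then gemini-2.5-flash-lite as judge).
judge_rankings = [
    ["1", "5", "4", "3", "2", "6"],
    ["1", "2", "3", "4", "5", "6"],
]
combined = mean_ranks(judge_rankings)

# Lower mean rank is better.
for competitor, score in sorted(combined.items(), key=lambda item: item[1]):
    print(f"Competitor {competitor}: mean rank {score}")
```

For these two verdicts, competitor 1 averages 1.0, competitor 6 averages 6.0, and competitors 2 through 5 all tie at 3.5, so the judges cancel each other out in the middle of the field. Adding more judges, or shuffling the order of responses between runs, might break the tie.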



<table style="margin: 0; text-align: left; width:100%">
    <tr>
        <td style="width: 150px; height: 150px; vertical-align: middle;">
            <img src="../assets/exercise.png" width="150" height="150" style="display: block;" />
        </td>
        <td>
            <h2 style="color:#ff7800;">Exercise</h2>
            <span style="color:#ff7800;">Which pattern(s) did this use? Try updating this to add another Agentic design pattern.
            </span>
        </td>
    </tr>
</table>

<table style="margin: 0; text-align: left; width:100%">
    <tr>
        <td style="width: 150px; height: 150px; vertical-align: middle;">
            <img src="../assets/business.png" width="150" height="150" style="display: block;" />
        </td>
        <td>
            <h2 style="color:#00bfff;">Commercial implications</h2>
            <span style="color:#00bfff;">Patterns like this one, where you send a task to multiple models and then evaluate the results,
            are common when you need to improve the quality of your LLM response. The approach applies broadly
            to business projects where accuracy is critical.
            </span>
        </td>
    </tr>
</table>