## Welcome to the Second Lab - Week 1, Day 3

Today we will work with lots of models! This is a way to get comfortable with APIs.

<table style="margin: 0; text-align: left; width:100%">
    <tr>
        <td style="width: 150px; height: 150px; vertical-align: middle;">
            <img src="../assets/stop.png" width="150" height="150" style="display: block;" />
        </td>
        <td>
            <h2 style="color:#ff7800;">Important point - please read</h2>
            <span style="color:#ff7800;">The way I collaborate with you may be different to other courses you've taken. I prefer not to type code while you watch. Rather, I execute Jupyter Labs, like this, and give you an intuition for what's going on. My suggestion is that you carefully execute this yourself, <b>after</b> watching the lecture. Add print statements to understand what's going on, and then come up with your own variations.<br/><br/>If you have time, I'd love it if you submit a PR for changes in the community_contributions folder - instructions in the resources. Also, if you have a GitHub account, use it to showcase your variations. Not only is this essential practice, but it demonstrates your skills to others, including perhaps future clients or employers...
            </span>
        </td>
    </tr>
</table>

In [6]:
# Start with imports - ask ChatGPT to explain any package that you don't know

import os
import json
from dotenv import load_dotenv
from openai import OpenAI
from anthropic import Anthropic
from IPython.display import Markdown, display

In [7]:
# Always remember to do this! override=True makes the values in .env take priority over any existing env variables
load_dotenv(override=True)

True

In [8]:
# Print the key prefixes to help with any debugging

openai_api_key = os.getenv('OPENAI_API_KEY')
anthropic_api_key = os.getenv('ANTHROPIC_API_KEY')
google_api_key = os.getenv('GOOGLE_API_KEY')
deepseek_api_key = os.getenv('DEEPSEEK_API_KEY')
groq_api_key = os.getenv('GROQ_API_KEY')
ollama_api_key = os.getenv('ollama')  # Unused below: the local Ollama client is created with the placeholder key 'ollama'

if openai_api_key:
    print(f"OpenAI API Key exists and begins {openai_api_key[:8]}")
else:
    print("OpenAI API Key not set")
    
if anthropic_api_key:
    print(f"Anthropic API Key exists and begins {anthropic_api_key[:7]}")
else:
    print("Anthropic API Key not set (and this is optional)")

if google_api_key:
    print(f"Google API Key exists and begins {google_api_key[:2]}")
else:
    print("Google API Key not set (and this is optional)")

if deepseek_api_key:
    print(f"DeepSeek API Key exists and begins {deepseek_api_key[:3]}")
else:
    print("DeepSeek API Key not set (and this is optional)")

if groq_api_key:
    print(f"Groq API Key exists and begins {groq_api_key[:4]}")
else:
    print("Groq API Key not set (and this is optional)")

OpenAI API Key exists and begins sk-proj-
Anthropic API Key not set (and this is optional)
Google API Key not set (and this is optional)
DeepSeek API Key not set (and this is optional)
Groq API Key exists and begins gsk_
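
The five near-identical `if`/`else` blocks above can be collapsed into one loop. This is only a minimal sketch: `describe_key` and the `PROVIDERS` table are names I've made up for illustration, and the prefix lengths simply mirror the ones used in the cell above.

```python
import os

# (label, environment variable, prefix length to show, optional?) - mirrors the cell above
PROVIDERS = [
    ("OpenAI", "OPENAI_API_KEY", 8, False),
    ("Anthropic", "ANTHROPIC_API_KEY", 7, True),
    ("Google", "GOOGLE_API_KEY", 2, True),
    ("DeepSeek", "DEEPSEEK_API_KEY", 3, True),
    ("Groq", "GROQ_API_KEY", 4, True),
]


def describe_key(label, value, prefix_len, optional):
    """Return a status line that reveals only the first few characters of the key."""
    if value:
        return f"{label} API Key exists and begins {value[:prefix_len]}"
    suffix = " (and this is optional)" if optional else ""
    return f"{label} API Key not set{suffix}"


for label, env_var, prefix_len, optional in PROVIDERS:
    print(describe_key(label, os.getenv(env_var), prefix_len, optional))
```

Adding another provider then means adding one row to the table rather than another four lines of code.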


In [11]:
request = "Please come up with a challenging, nuanced question that I can ask a number of LLMs to evaluate their intelligence. "
request += "Answer only with the question, no explanation."
messages = [{"role": "user", "content": request}]

In [12]:
messages

[{'role': 'user',
  'content': 'Please come up with a challenging, nuanced question that I can ask a number of LLMs to evaluate their intelligence. Answer only with the question, no explanation.'}]

In [13]:
openai = OpenAI()
response = openai.chat.completions.create(
    model="gpt-4o-mini",
    messages=messages,
)
question = response.choices[0].message.content
print(question)


How would you reconcile the ethical implications of artificial intelligence in decision-making processes with the need for accountability and transparency in those decisions?


In [14]:
competitors = []
answers = []
messages = [{"role": "user", "content": question}]

## Note - update since the videos

I've updated some of the model names below to use the latest models, like GPT 5 and Claude Sonnet 4.5. It's worth noting that these models can be quite slow, taking 1-2 minutes, but they do a great job! Feel free to switch them for faster models if you'd prefer, like the ones I use in the video.

In [17]:
# The API we know well
# This cell uses gpt-4.1-mini, which responds quickly
# The latest models can take 1-2 mins because they like to think! Change model_name if you'd like to try one

model_name = "gpt-4.1-mini"

response = openai.chat.completions.create(model=model_name, messages=messages)
answer = response.choices[0].message.content

display(Markdown(answer))
competitors.append(model_name)
answers.append(answer)

Reconciling the ethical implications of artificial intelligence (AI) in decision-making with the need for accountability and transparency involves a multifaceted approach that addresses both the technical and societal dimensions of AI deployment. Here are several key strategies:

1. **Design for Explainability and Transparency**  
   - **Explainable AI (XAI):** Develop AI systems that provide clear, understandable justifications for their decisions. This helps stakeholders, including users and regulators, comprehend how outcomes are reached.  
   - **Open Algorithms and Documentation:** Maintain comprehensive documentation of AI models, their training data, and decision logic to facilitate external audits and reviews.  

2. **Establish Clear Accountability Frameworks**  
   - **Defined Responsibility:** Identify who is responsible for decisions made or influenced by AI—developers, deployers, or users—and ensure legal and ethical standards are applied accordingly.  
   - **Governance Structures:** Implement oversight bodies or ethics committees that monitor AI operations and enforce compliance with ethical guidelines.  

3. **Incorporate Ethical Principles into AI Development**  
   - **Fairness and Bias Mitigation:** Actively detect and mitigate biases in training data and algorithms to prevent discriminatory outcomes.  
   - **Respect for Privacy:** Ensure AI systems uphold data protection rights and operate within users’ consent boundaries.  

4. **Enhance Human Oversight**  
   - **Human-in-the-Loop:** Keep humans involved in critical decisions, especially those affecting people's rights or well-being, to add judgment beyond what AI can provide.  
   - **Intervention Mechanisms:** Allow users or operators to challenge, override, or appeal AI-driven decisions when necessary.  

5. **Promote Ethical AI Culture and Stakeholder Engagement**  
   - **Continuous Training:** Educate AI developers, users, and policymakers about ethical issues and responsible practices.  
   - **Public Consultation:** Engage diverse stakeholders, including impacted communities, in designing and deploying AI systems.  

6. **Legal and Regulatory Compliance**  
   - **Adherence to Laws:** Align AI decision-making processes with existing legal frameworks such as anti-discrimination laws and consumer protection statutes.  
   - **Proactive Regulation:** Support the development of AI-specific regulations that mandate transparency, auditability, and accountability.  

By integrating these strategies, organizations can balance the benefits of AI in decision-making with ethical imperatives, ensuring that AI systems operate in a manner that is transparent, fair, and accountable. This not only builds trust but also aligns AI deployment with societal values and legal standards.

In [None]:
# Anthropic has a slightly different API, and max_tokens is required

model_name = "claude-sonnet-4-5"

claude = Anthropic()
response = claude.messages.create(model=model_name, messages=messages, max_tokens=1000)
answer = response.content[0].text

display(Markdown(answer))
competitors.append(model_name)
answers.append(answer)
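
Besides the required `max_tokens`, the other difference in the cell above is where the reply lives: `response.content` is a list of content blocks rather than `choices[0].message.content`. Here's a rough sketch of pulling the text out of such a list. The helper name `collect_text` is my own, and the response is a stand-in built with `SimpleNamespace` so the example runs without an API key; I'm assuming text blocks expose `.type == "text"` and `.text`, as the cell above relies on.

```python
from types import SimpleNamespace


def collect_text(content_blocks):
    """Join the text of every text-type block in an Anthropic-style content list."""
    return "".join(
        block.text for block in content_blocks if getattr(block, "type", None) == "text"
    )


# Stand-in for a real response, shaped like response.content (hypothetical data)
fake_response = SimpleNamespace(
    content=[
        SimpleNamespace(type="text", text="Hello"),
        SimpleNamespace(type="text", text=", world"),
    ]
)
print(collect_text(fake_response.content))
```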

In [None]:
gemini = OpenAI(api_key=google_api_key, base_url="https://generativelanguage.googleapis.com/v1beta/openai/")
model_name = "gemini-2.5-flash"

response = gemini.chat.completions.create(model=model_name, messages=messages)
answer = response.choices[0].message.content

display(Markdown(answer))
competitors.append(model_name)
answers.append(answer)

In [None]:
deepseek = OpenAI(api_key=deepseek_api_key, base_url="https://api.deepseek.com/v1")
model_name = "deepseek-chat"

response = deepseek.chat.completions.create(model=model_name, messages=messages)
answer = response.choices[0].message.content

display(Markdown(answer))
competitors.append(model_name)
answers.append(answer)
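
The Gemini and DeepSeek cells above (and the Groq and Ollama cells that follow) all repeat the same steps against an OpenAI-compatible endpoint, so the pattern can be wrapped in a function. This is a sketch under assumptions: `ask_and_record` and `make_fake_client` are hypothetical names, and the fake client exists only so the example runs offline. It imitates just the `client.chat.completions.create(...)` call used in this notebook.

```python
from types import SimpleNamespace


def ask_and_record(client, model_name, messages, competitors, answers):
    """Send messages to an OpenAI-compatible client and record the model name and reply."""
    response = client.chat.completions.create(model=model_name, messages=messages)
    answer = response.choices[0].message.content
    competitors.append(model_name)
    answers.append(answer)
    return answer


def make_fake_client(reply):
    """Build an offline stand-in exposing client.chat.completions.create (illustration only)."""
    def create(model, messages):
        message = SimpleNamespace(content=f"{reply} (from {model})")
        return SimpleNamespace(choices=[SimpleNamespace(message=message)])
    return SimpleNamespace(chat=SimpleNamespace(completions=SimpleNamespace(create=create)))


demo_competitors, demo_answers = [], []
demo_messages = [{"role": "user", "content": "What is 2 + 2?"}]
ask_and_record(make_fake_client("4"), "fake-model", demo_messages, demo_competitors, demo_answers)
print(demo_competitors, demo_answers)
```

With real clients you would pass one of the `OpenAI(...)` objects created in these cells in place of the fake one.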

In [18]:
# Updated with the latest Open Source model from OpenAI

groq = OpenAI(api_key=groq_api_key, base_url="https://api.groq.com/openai/v1")
model_name = "openai/gpt-oss-120b"

response = groq.chat.completions.create(model=model_name, messages=messages)
answer = response.choices[0].message.content

display(Markdown(answer))
competitors.append(model_name)
answers.append(answer)


## Reconciling AI Ethics with Accountability & Transparency  
*(A practical framework for responsible decision‑making)*  

---

### 1. Start with a **Principle‑Based Charter**

| Ethical Principle | What It Means for Decision‑Making | Accountability & Transparency Hook |
|-------------------|-----------------------------------|------------------------------------|
| **Beneficence**   | AI should advance human well‑being and avoid harm. | Require *impact assessments* that quantify benefits vs. risks. |
| **Non‑maleficence**| Prevent discrimination, privacy breaches, or unsafe outcomes. | Mandate *risk‑mitigation logs* and *audit trails* for every high‑impact model. |
| **Justice & Fairness**| Ensure equitable treatment across groups. | Publish *fairness metrics* (e.g., disparate impact, equalized odds). |
| **Autonomy**      | Preserve human agency; avoid opaque “black‑box” control. | Implement *human‑in‑the‑loop (HITL)* or *human‑on‑the‑loop* checkpoints. |
| **Explainability**| Provide understandable rationales for AI outputs. | Require *model cards* and *decision‑explanations* that are understandable to the affected stakeholder. |
| **Responsibility**| Assign clear legal and moral liability. | Create *responsibility matrices* (RACI) linking people, processes, and AI components. |

*Why it works*: By codifying these principles up front, you give every downstream technical and governance decision a clear ethical compass, while simultaneously establishing concrete artefacts (impact assessments, fairness dashboards, responsibility matrices) that make accountability and transparency measurable.

---

### 2. Design‑Time Controls (Embedding Ethics Early)

| Control | How It Helps Ethics | How It Enables Accountability/Transparency |
|---------|-------------------|--------------------------------------------|
| **Model Cards & Datasheets** (Mitchell et al., 2019; Gebru et al., 2021) | Document data provenance, intended use, known biases, performance limits. | Provides a *single source of truth* for auditors, regulators, and end‑users. |
| **Explainable‑by‑Design Architectures** (e.g., attention‑based, rule‑augmented, hybrid symbolic‑neural) | Generates human‑readable rationales at inference time. | Makes it possible to *trace* a decision back to specific features or rules. |
| **Ethical Guardrails / Constraint Layers** | Hard‑coded fairness constraints (e.g., “no decision can increase disparity > 5%”). | The guardrails are *verifiable* through automated compliance checks. |
| **Simulated Ethical Stress‑Testing** | Run adversarial scenarios that stress bias, privacy, safety. | Produces *stress‑test reports* that become part of the audit trail. |
| **Stakeholder‑Centric Requirement Workshops** | Directly capture values, concerns, and edge cases from those impacted. | Generates *requirements traceability matrices* linking stakeholder concerns to technical artifacts. |

---

### 3. Operational Governance (Running AI Responsibly)

1. **Governance Body (AI Ethics Board)**
   - Cross‑functional (legal, data science, domain experts, civil‑society reps).  
   - Meets regularly to review new models, monitor drift, and approve “deployment tickets.”

2. **Decision‑Lifecycle Registry**
   - **Entry**: Model card, data provenance, intended use, risk assessment.  
   - **Mid‑life**: Retraining logs, performance drift alerts, fairness re‑evaluation.  
   - **Exit**: Decommission rationale, post‑mortem analysis, impact summary.  
   - *All entries are immutable (e.g., stored in a blockchain‑style ledger) to guarantee non‑repudiation.*

3. **Human‑In‑The‑Loop (HITL) / Human‑On‑The‑Loop (HOTL)**
   - **HITL** for high‑stakes decisions (e.g., credit denial, medical triage).  
   - **HOTL** for lower‑stakes but still accountable contexts (e.g., content moderation).  
   - *Audit logs must capture who overrode or approved each AI recommendation.*

4. **Continuous Monitoring Dashboard**
   - Real‑time fairness metrics, error rates, drift detectors, and “explainability confidence scores.”  
   - Alerts trigger automatic rollback or human review.

5. **External Audits & Certification**
   - Independent third‑party audits (e.g., ISO/IEC 42001 AI governance standard, EU AI Act conformity).  
   - Publicly released **audit summaries** (redacted for IP/privacy) to demonstrate transparency.

---

### 4. Legal & Regulatory Alignment

| Regulation | Core Requirement | Practical Mapping |
|------------|------------------|-------------------|
| **GDPR Art. 22** (Automated decision‑making) | Right to obtain meaningful information about logic involved. | Provide *plain‑language explanations* and *opt‑out mechanisms* for every automated decision. |
| **EU AI Act** (High‑risk AI) | Conformity assessment, post‑market monitoring, data governance. | Adopt *CE‑mark style* conformity packages: model card, risk management file, and continuous monitoring plan. |
| **US Algorithmic Accountability Act (proposed)** | Impact assessment, bias testing, documentation. | Use the same *impact‑assessment template* for both internal compliance and external reporting. |
| **Sector‑specific rules** (e.g., HIPAA, Fair Credit Reporting Act) | Data protection, fairness, recourse. | Layer sector‑specific controls (e.g., de‑identification pipelines, credit‑score explainability) on top of the generic framework. |

*Takeaway*: By mapping each regulatory clause to a concrete artefact in the governance pipeline, you turn “legal compliance” into an **observable, auditable process** rather than a checklist at the end of the project.

---

### 5. Communication & Recourse Mechanisms

| Mechanism | Ethical Purpose | Accountability/Transparency Benefit |
|-----------|----------------|--------------------------------------|
| **Decision Explainability Portal** (user‑facing UI) | Gives affected individuals understandable reasons (“Why was my loan denied?”). | Generates *interaction logs* that can be reviewed by regulators or internal auditors. |
| **Appeal Workflow** | Allows users to contest AI‑driven outcomes. | Each appeal creates a *traceable case file* (human review, outcome, any model adjustments). |
| **Public Model‑Performance Reports** (quarterly) | Builds societal trust, shows commitment to openness. | Provides an external benchmark for **accountability** and encourages community scrutiny. |
| **Bug‑Bounty / Red‑Team Programs** | Proactively uncover hidden biases or vulnerabilities. | Findings become part of the **audit trail** and trigger mandatory remediation cycles. |

---

### 6. Balancing Trade‑offs: When Ethics and Transparency Collide

| Conflict | Example | Mitigation Strategy |
|----------|---------|---------------------|
| **Proprietary IP vs. Full Model Disclosure** | Companies may want to protect model architecture. | Use **model cards** that disclose *behavioural* properties, training data scope, and fairness metrics without exposing raw code. Provide *trusted‑party* (e.g., regulator) access to full source under NDAs. |
| **Explainability vs. Predictive Performance** | Highly accurate deep nets are less interpretable. | Deploy **hybrid models**: a high‑perform “core” model plus an interpretable surrogate that approximates decisions for explanation. Validate that the surrogate’s explanations are faithful (use metrics like **faithfulness score**). |
| **Speed of Decision‑Making vs. Human Review** | Real‑time fraud detection may not allow HITL. | Implement **HOTL** with a “pause‑and‑review” buffer for borderline cases; use risk‑based thresholds to auto‑escalate only the most uncertain predictions. |
| **Data Minimization vs. Need for Auditable Records** | GDPR encourages minimal data retention, yet audits need logs. | Store **hashed, pseudonymized** logs that preserve auditability while respecting privacy. Delete raw personal data after a defined retention period, retaining only the audit‑ready abstractions. |

---

### 7. A Simple “Reconciliation Blueprint” (Step‑by‑Step)

1. **Define Scope & Stakeholders**  
   - List all decision contexts (credit, hiring, medical triage, etc.).  
   - Identify affected parties (customers, employees, regulators).

2. **Conduct an Ethical Impact Assessment (EIA)**  
   - Map potential harms, fairness concerns, privacy risks.  
   - Score severity and likelihood; decide on mitigation priority.

3. **Select an Explainable Architecture**  
   - Choose models that can produce post‑hoc or intrinsic explanations.  
   - Document the chosen approach in the model card.

4. **Create Accountability Artefacts**  
   - Model card, datasheet, responsibility matrix, risk‑management plan.  

5. **Implement Governance Controls**  
   - Set up an AI Ethics Board, decision‑lifecycle registry, monitoring dashboard.

6. **Deploy with Human Oversight**  
   - Configure HITL/HOTL thresholds, integrate appeal workflow.

7. **Monitor & Audit Continuously**  
   - Track fairness, drift, and explainability confidence.  
   - Schedule periodic external audits.

8. **Publish Transparency Outputs**  
   - Release performance & fairness reports, explanation portal, and audit summaries.

9. **Iterate**  
   - Feed new findings back into the EIA and model retraining pipeline.

---

### 8. Illustrative Case Study (Credit‑Scoring AI)

| Phase | Ethical Safeguard | Accountability / Transparency Mechanism |
|-------|-------------------|------------------------------------------|
| **Model Development** | Use *balanced training data*; run bias‑mitigation algorithms. | Publish a **Datasheet** showing demographic breakdown, preprocessing steps, and bias‑mitigation techniques. |
| **Pre‑Deployment** | Conduct a **Fairness Impact Assessment** (disparate impact < 1.25). | Obtain **AI Ethics Board** sign‑off; store assessment in the **Decision‑Lifecycle Registry**. |
| **Live Decision** | **HITL**: every decline above a risk‑threshold must be reviewed by a loan officer. | Log the AI recommendation, human reviewer ID, and final decision; store in immutable ledger. |
| **Post‑Decision** | Provide a **plain‑language explanation** (“Your credit score is low because …”). | Record the explanation delivered to the applicant; enable a **appeal** that triggers a second human review. |
| **Ongoing** | Quarterly **fairness dashboard** monitoring for drift. | Publish a **public performance report** and submit an **external audit** certificate annually. |

*Result*: The bank meets GDPR’s right‑to‑explanation, satisfies the EU AI Act’s high‑risk requirements, and demonstrably reduces discrimination, all while retaining a clear chain of responsibility for every credit decision.

---

## Bottom Line

**Reconciling AI ethics with accountability and transparency is not a one‑off checkbox—it is a continuous, system‑wide discipline.** By:

1. **Codifying ethical principles into concrete artefacts** (model cards, impact assessments, responsibility matrices);  
2. **Designing explainable, constraint‑aware models from day one**;  
3. **Embedding governance, monitoring, and human‑oversight into the decision‑making pipeline**;  
4. **Mapping every legal requirement to an observable process or document**; and  
5. **Providing clear, user‑centric explanations and appeal pathways**,

organizations can make AI‑driven decisions that are *fair, understandable, and traceable*, while also being *legally defensible* and *socially acceptable*. The key is to treat ethics, accountability, and transparency as **interlocking components of a single governance architecture**, not as separate, competing goals.

## For the next cell, we will use Ollama

Ollama runs a local web service that gives an OpenAI compatible endpoint,  
and runs models locally using high performance C++ code.

If you don't have Ollama, install it by visiting https://ollama.com, pressing Download and following the instructions.

After it's installed, you should be able to visit here: http://localhost:11434 and see the message "Ollama is running"

You might need to restart Cursor (and maybe reboot). Then open a Terminal (control+\`) and run `ollama serve`
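
The browser check described above can also be done from Python. This is a minimal sketch using only the standard library; the function name `ollama_is_running` and the `opener` parameter (there so the function can be tried without a server) are my own inventions, and it assumes the default address http://localhost:11434.

```python
import http.client
import urllib.error
import urllib.request


def ollama_is_running(base_url="http://localhost:11434", timeout=2.0, opener=urllib.request.urlopen):
    """Return True if the server at base_url answers with the 'Ollama is running' banner."""
    try:
        with opener(base_url, timeout=timeout) as response:
            body = response.read().decode("utf-8", errors="replace")
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Connection refused, timeout, malformed reply, etc.
        return False
    return "Ollama is running" in body


print("Ollama reachable:", ollama_is_running())
```

If this prints `False`, try `ollama serve` in a terminal and run the check again.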

Useful Ollama commands (run these in the terminal, or with an exclamation mark in this notebook):

`ollama pull <model_name>` downloads a model locally  
`ollama ls` lists all the models you've downloaded  
`ollama rm <model_name>` deletes the specified model from your downloads

<table style="margin: 0; text-align: left; width:100%">
    <tr>
        <td style="width: 150px; height: 150px; vertical-align: middle;">
            <img src="../assets/stop.png" width="150" height="150" style="display: block;" />
        </td>
        <td>
            <h2 style="color:#ff7800;">Super important - ignore me at your peril!</h2>
            <span style="color:#ff7800;">The model called <b>llama3.3</b> is FAR too large for home computers - it's not intended for personal computing and will consume all your resources! Stick with the nicely sized <b>llama3.2</b> or <b>llama3.2:1b</b> and if you want larger, try llama3.1 or smaller variants of Qwen, Gemma, Phi or DeepSeek. See <a href="https://ollama.com/models">the Ollama models page</a> for a full list of models and sizes.
            </span>
        </td>
    </tr>
</table>

In [1]:
!ollama pull llama3.2

[?2026h[?25l[1Gpulling manifest ⠋ [K[?25h[?2026l[?2026h[?25l[1Gpulling manifest ⠙ [K[?25h[?2026l[?2026h[?25l[1Gpulling manifest ⠹ [K[?25h[?2026l[?2026h[?25l[1Gpulling manifest ⠸ [K[?25h[?2026l[?2026h[?25l[1Gpulling manifest ⠼ [K[?25h[?2026l[?2026h[?25l[1Gpulling manifest ⠴ [K[?25h[?2026l[?2026h[?25l[1Gpulling manifest ⠦ [K[?25h[?2026l[?2026h[?25l[1Gpulling manifest ⠧ [K[?25h[?2026l[?2026h[?25l[1Gpulling manifest ⠇ [K[?25h[?2026l[?2026h[?25l[1Gpulling manifest ⠏ [K[?25h[?2026l[?2026h[?25l[1Gpulling manifest ⠋ [K[?25h[?2026l[?2026h[?25l[1Gpulling manifest ⠙ [K[?25h[?2026l[?2026h[?25l[1Gpulling manifest ⠹ [K[?25h[?2026l[?2026h[?25l[1Gpulling manifest ⠸ [K[?25h[?2026l[?2026h[?25l[1Gpulling manifest ⠼ [K[?25h[?2026l[?2026h[?25l[1Gpulling manifest ⠴ [K[?25h[?2026l[?2026h[?25l[1Gpulling manifest ⠦ [K[?25h[?2026l[?2026h[?25l[1Gpulling manifest ⠧ [K[?25h[?2026l[?2026h[?25l[1Gpulling ma

In [19]:
ollama = OpenAI(base_url='http://localhost:11434/v1', api_key='ollama')
model_name = "llama3.2"

response = ollama.chat.completions.create(model=model_name, messages=messages)
answer = response.choices[0].message.content

display(Markdown(answer))
competitors.append(model_name)
answers.append(answer)

Reconciling the ethical implications of AI in decision-making processes requires a multifaceted approach that addresses accountability, transparency, and the potential consequences of AI-driven decisions. Here are some steps to consider:

1. **Establish clear principles and guidelines**: Develop a set of principles and guidelines for AI development and deployment, ensuring that they prioritize human values and dignity. These principles should be transparent and widely shared among stakeholders.
2. **Design for explainability**: Implement explanations and insights into AI decision-making processes, such as feature attributions, model interpretability techniques, or model-agnostic explanations. This will help identify potential biases and errors in AI decision-making.
3. **Regular auditing and testing**: Regularly audit and test AI systems to ensure they are functioning as intended, identifying potential biases, errors, or unintended consequences. This will involve human oversight and review of AI-driven decisions.
4. **Establish transparent decision-making processes**: Implement transparent and accessible decision-making procedures that allow stakeholders to understand how AI-powered decisions were made. This can include data sharing, model documentation, and clear communication about the decision-making process.
5. **Ensure inclusive and representative stakeholder groups**: Involve diverse stakeholders in decision-making processes, including those with expertise in AI ethics, social justice, advocacy group members, and community representatives to ensure a wide range of perspectives are considered.
6. **Foster an AI literacy culture**: Promote education, training, and awareness about AI, its limitations, and the potential ethical concerns associated with it. This will help build a broad community that understands the challenges and implications of AI decision-making processes.
7. **Use human oversight mechanisms**: Implement governance structures, such as appeals committees or administrative bodies, to review and resolve disputes related to AI-driven decisions.
8. **Develop standards for accountability and liability**: Establish clear standards for accountability, liability, and consequence attribution for organizations that develop and deploy AI systems.
9. **Support ongoing research and evaluation**: Continuously fund and participate in interdisciplinary research on AI ethics, ensuring that emerging evidence and best practices inform policy and practice changes.
10. **Prepare for uncertainty and errors**: Develop strategies to address potential errors or biases arising from AI decision-making processes, such as revising decisions or requesting human intervention.

By considering these steps, we can mitigate the risks associated with AI decision-making processes while harnessing the benefits of technological innovation.

**Additional Considerations:**

* The need for ethical consideration arises not only from internal concerns but also from external scrutiny and regulation.
- Collaboration between policymakers, industry leaders, civil society organizations, and academia should be essential in developing shared frameworks and principles for responsible AI development and deployment.
- Developing an understanding about the social and economic context of the problems that AI is trying to solve is necessary for truly effective, ethical solutions.

This is a start but we need a lot more work on this to effectively make society and businesses accountable.

In [20]:
# So where are we?

print(competitors)
print(answers)


['llama3.2', 'gpt-4.1-mini', 'openai/gpt-oss-120b', 'llama3.2']
['Reconciling the ethical implications of artificial intelligence (AI) in decision-making processes with the need for accountability and transparency requires a multifaceted approach. Here are some potential strategies:\n\n1. **Develop transparent AI systems**: Design AI systems that provide clear explanations for their decisions, such as model interpretability techniques or Explainable AI (XAI). This can help build trust in AI-driven decision-making.\n2. **Establish human oversight**: Ensure that human experts review and validate AI-generated decisions to prevent potential errors or biases. Human oversight can also improve accountability and transparency.\n3. **Implement robust testing and validation procedures**: Regularly test AI systems on diverse datasets to ensure they perform as intended under different scenarios. This can help identify potential issues before they lead to unfair or biased decisions.\n4. **Foster a 
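
Notice that `llama3.2` appears twice in the list above. My guess (and it is only a guess) is that this comes from running a cell that appends more than once. One way to guard against it is to overwrite an existing entry instead of appending; `record_answer` below is a hypothetical helper sketching that idea.

```python
def record_answer(competitors, answers, model_name, answer):
    """Store one answer per model, overwriting any earlier answer from the same model."""
    if model_name in competitors:
        answers[competitors.index(model_name)] = answer
    else:
        competitors.append(model_name)
        answers.append(answer)


demo_competitors, demo_answers = [], []
record_answer(demo_competitors, demo_answers, "llama3.2", "first run")
record_answer(demo_competitors, demo_answers, "gpt-4.1-mini", "an answer")
record_answer(demo_competitors, demo_answers, "llama3.2", "second run")
print(demo_competitors)  # ['llama3.2', 'gpt-4.1-mini']
print(demo_answers)      # ['second run', 'an answer']
```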

In [21]:
# It's nice to know how to use "zip"
for competitor, answer in zip(competitors, answers):
    print(f"Competitor: {competitor}\n\n{answer}")


Competitor: llama3.2

Reconciling the ethical implications of artificial intelligence (AI) in decision-making processes with the need for accountability and transparency requires a multifaceted approach. Here are some potential strategies:

1. **Develop transparent AI systems**: Design AI systems that provide clear explanations for their decisions, such as model interpretability techniques or Explainable AI (XAI). This can help build trust in AI-driven decision-making.
2. **Establish human oversight**: Ensure that human experts review and validate AI-generated decisions to prevent potential errors or biases. Human oversight can also improve accountability and transparency.
3. **Implement robust testing and validation procedures**: Regularly test AI systems on diverse datasets to ensure they perform as intended under different scenarios. This can help identify potential issues before they lead to unfair or biased decisions.
4. **Foster a culture of explainability**: Encourage open commu
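
Here's a little more on `zip`, as a small self-contained sketch with made-up names and replies. It pairs items by position, which also makes it a handy way to build a dictionary, and it stops at the shortest input.

```python
names = ["model-a", "model-b", "model-c"]  # hypothetical model names
replies = ["first reply", "second reply", "third reply"]

# zip pairs items by position; dict() turns the pairs into a lookup table
by_model = dict(zip(names, replies))
print(by_model["model-b"])  # second reply

# zip stops at the shortest input, so a missing reply is dropped silently
print(list(zip(names, replies[:2])))  # [('model-a', 'first reply'), ('model-b', 'second reply')]
```

On Python 3.10 and later, `zip(names, replies, strict=True)` raises a `ValueError` on a length mismatch instead of dropping items silently.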

In [22]:
# Let's bring this together - note the use of "enumerate"

together = ""
for index, answer in enumerate(answers):
    together += f"# Response from competitor {index+1}\n\n"
    together += answer + "\n\n"

In [23]:
print(together)

# Response from competitor 1

Reconciling the ethical implications of artificial intelligence (AI) in decision-making processes with the need for accountability and transparency requires a multifaceted approach. Here are some potential strategies:

1. **Develop transparent AI systems**: Design AI systems that provide clear explanations for their decisions, such as model interpretability techniques or Explainable AI (XAI). This can help build trust in AI-driven decision-making.
2. **Establish human oversight**: Ensure that human experts review and validate AI-generated decisions to prevent potential errors or biases. Human oversight can also improve accountability and transparency.
3. **Implement robust testing and validation procedures**: Regularly test AI systems on diverse datasets to ensure they perform as intended under different scenarios. This can help identify potential issues before they lead to unfair or biased decisions.
4. **Foster a culture of explainability**: Encourage op
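
An alternative way to write the loop from the cell above, as a sketch with stand-in answers: `enumerate` accepts a `start` argument, which removes the need for `index+1`, and `"".join(...)` builds the final string in one step.

```python
demo_answers = ["alpha", "beta"]  # hypothetical stand-ins for real model answers

# enumerate(..., start=1) yields 1-based numbers, so no index+1 is needed
sections = [
    f"# Response from competitor {number}\n\n{answer}\n\n"
    for number, answer in enumerate(demo_answers, start=1)
]
combined = "".join(sections)
print(combined)
```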

In [24]:
judge = f"""You are judging a competition between {len(competitors)} competitors.
Each model has been given this question:

{question}

Your job is to evaluate each response for clarity and strength of argument, and rank them in order of best to worst.
Respond with JSON, and only JSON, with the following format:
{{"results": ["best competitor number", "second best competitor number", "third best competitor number", ...]}}

Here are the responses from each competitor:

{together}

Now respond with the JSON with the ranked order of the competitors, nothing else. Do not include markdown formatting or code blocks."""


In [25]:
print(judge)

You are judging a competition between 4 competitors.
Each model has been given this question:

How would you reconcile the ethical implications of artificial intelligence in decision-making processes with the need for accountability and transparency in those decisions?

Your job is to evaluate each response for clarity and strength of argument, and rank them in order of best to worst.
Respond with JSON, and only JSON, with the following format:
{"results": ["best competitor number", "second best competitor number", "third best competitor number", ...]}

Here are the responses from each competitor:

# Response from competitor 1

Reconciling the ethical implications of artificial intelligence (AI) in decision-making processes with the need for accountability and transparency requires a multifaceted approach. Here are some potential strategies:

1. **Develop transparent AI systems**: Design AI systems that provide clear explanations for their decisions, such as model interpretability tech
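
One detail worth noticing in the judge prompt: the JSON example is written with doubled braces in the f-string, yet it prints with single braces. Inside an f-string, `{name}` inserts a value, while `{{` and `}}` each produce one literal brace. Here's a tiny sketch with a made-up `count` variable:

```python
count = 4  # hypothetical number of competitors

# Single braces interpolate a value; doubled braces emit one literal brace each
template = f"""Rank {count} competitors.
Respond with JSON: {{"results": ["best", "second best", ...]}}"""
print(template)
```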

In [26]:
judge_messages = [{"role": "user", "content": judge}]

In [27]:
# Judgement time!

openai = OpenAI()
response = openai.chat.completions.create(
    model="gpt-5-mini",
    messages=judge_messages,
)
results = response.choices[0].message.content
print(results)


{"results": ["3", "2", "4", "1"]}


In [28]:
# OK let's turn this into results!

results_dict = json.loads(results)
ranks = results_dict["results"]
for index, result in enumerate(ranks):
    competitor = competitors[int(result)-1]
    print(f"Rank {index+1}: {competitor}")

Rank 1: openai/gpt-oss-120b
Rank 2: gpt-4.1-mini
Rank 3: llama3.2
Rank 4: llama3.2
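
The parsing cell above trusts the judge to return clean JSON with a valid ranking. Here's a more defensive sketch, offered as one possible approach rather than the way it must be done: `parse_ranking` is a hypothetical helper that tolerates a Markdown code fence around the JSON and checks that every competitor number appears exactly once.

```python
import json

FENCE = "`" * 3  # a Markdown code fence marker


def parse_ranking(raw, competitor_count):
    """Parse the judge's reply and return 0-based competitor indexes, best first."""
    text = raw.strip()
    # Tolerate a reply wrapped in a code fence despite the instructions
    if text.startswith(FENCE):
        text = text.strip("`").strip()
        if text.lower().startswith("json"):
            text = text[4:]
    ranks = [int(item) for item in json.loads(text)["results"]]
    # A valid ranking names every competitor exactly once
    if sorted(ranks) != list(range(1, competitor_count + 1)):
        raise ValueError(f"Ranking {ranks} is not a permutation of 1..{competitor_count}")
    return [rank - 1 for rank in ranks]


print(parse_ranking('{"results": ["3", "2", "4", "1"]}', 4))  # [2, 1, 3, 0]
```

The returned indexes can be used directly with `competitors[index]`, with no `-1` adjustment.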


<table style="margin: 0; text-align: left; width:100%">
    <tr>
        <td style="width: 150px; height: 150px; vertical-align: middle;">
            <img src="../assets/exercise.png" width="150" height="150" style="display: block;" />
        </td>
        <td>
            <h2 style="color:#ff7800;">Exercise</h2>
            <span style="color:#ff7800;">Which pattern(s) did this use? Try updating this to add another Agentic design pattern.
            </span>
        </td>
    </tr>
</table>

<table style="margin: 0; text-align: left; width:100%">
    <tr>
        <td style="width: 150px; height: 150px; vertical-align: middle;">
            <img src="../assets/business.png" width="150" height="150" style="display: block;" />
        </td>
        <td>
            <h2 style="color:#00bfff;">Commercial implications</h2>
            <span style="color:#00bfff;">Patterns like this one, where you send a task to multiple models and then evaluate the results,
            are common when you need to improve the quality of your LLM responses. The approach applies to many
            business projects where accuracy is critical.
            </span>
        </td>
    </tr>
</table>