## Welcome to the Second Lab - Week 1, Day 3

Today we will work with lots of models! This is a way to get comfortable with APIs.

<table style="margin: 0; text-align: left; width:100%">
    <tr>
        <td style="width: 150px; height: 150px; vertical-align: middle;">
            <img src="../assets/stop.png" width="150" height="150" style="display: block;" />
        </td>
        <td>
            <h2 style="color:#ff7800;">Important point - please read</h2>
            <span style="color:#ff7800;">The way I collaborate with you may be different to other courses you've taken. I prefer not to type code while you watch. Rather, I execute Jupyter Labs, like this, and give you an intuition for what's going on. My suggestion is that you carefully execute this yourself, <b>after</b> watching the lecture. Add print statements to understand what's going on, and then come up with your own variations.<br/><br/>If you have time, I'd love it if you submitted a PR for changes in the community_contributions folder - instructions in the resources. Also, if you have a GitHub account, use this to showcase your variations. Not only is this essential practice, but it demonstrates your skills to others, including perhaps future clients or employers...
            </span>
        </td>
    </tr>
</table>

In [1]:
# Start with imports - ask ChatGPT to explain any package that you don't know

import os
import json
from dotenv import load_dotenv
from openai import OpenAI
from anthropic import Anthropic
from IPython.display import Markdown, display

In [2]:
# Always remember to do this!
load_dotenv(override=True)

True

In [3]:
# Load the API keys - print the first few characters of each if you need to debug

deepseek_api_key = os.getenv('DEEPSEEK_API_KEY')
gemini_api_key = os.getenv('GEMINI_API_KEY')
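
A minimal sketch of printing key prefixes for debugging, without revealing a whole key. The helper name `describe_key` and the 4-character prefix length are choices made for this illustration, not part of the original notebook.

```python
import os


def describe_key(name, key, prefix_len=4):
    """Return a short status line without revealing the whole key."""
    if key:
        return f"{name} API key exists and begins {key[:prefix_len]}"
    return f"{name} API key not set"


# In the notebook these values come from the .env file loaded by load_dotenv
print(describe_key("DeepSeek", os.getenv("DEEPSEEK_API_KEY")))
print(describe_key("Gemini", os.getenv("GEMINI_API_KEY")))
```

If a key shows as not set, check the variable name in your `.env` file and re-run `load_dotenv(override=True)`.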



In [4]:
request = "Please come up with a challenging, nuanced question that I can ask a number of LLMs to evaluate their intelligence. "
request += "Answer only with the question, no explanation."
messages = [{"role": "user", "content": request}]

In [5]:
messages

[{'role': 'user',
  'content': 'Please come up with a challenging, nuanced question that I can ask a number of LLMs to evaluate their intelligence. Answer only with the question, no explanation.'}]

In [7]:
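# DeepSeek offers an OpenAI-compatible endpoint, so the OpenAI client works with a different base_url
deepseek = OpenAI(
    api_key=deepseek_api_key,
    base_url="https://api.deepseek.com/v1",
)
# Note: response_deepseek is not used later in this notebook; the question below comes from Gemini
response_deepseek = deepseek.chat.completions.create(
    model="deepseek-chat",
    messages=messages,
)

# Gemini is reached the same way, through its OpenAI-compatible endpoint
gemini = OpenAI(
    api_key=gemini_api_key,
    base_url="https://generativelanguage.googleapis.com/v1beta/openai"
)
response_gemini = gemini.chat.completions.create(
    model="gemini-2.5-pro",
    messages=messages,
)
# Use Gemini's reply as the question that every competitor will answer
question = response_gemini.choices[0].message.content
print(question)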


Imagine an AI is the sole arbiter in a legal dispute. Party A presents a cryptographically-signed, logically-flawless contract. Party B presents a body of evidence (emails, testimony, prior drafts) arguing that this contract, while textually accurate, fundamentally misrepresents the shared understanding and 'spirit' of the agreement.

Your task is to create a judgment framework. How would you weigh the immutable, literal truth of the contract against the nuanced, contextual truth of intent? Crucially, you must justify your framework by first explaining which form of 'truth' an entity like yourself is inherently biased towards and then detailing the specific logical and ethical guardrails you would implement to counteract that bias and render a genuinely just decision.


In [8]:
competitors = []
answers = []
messages = [{"role": "user", "content": question}]

## Note - update since the videos

I've updated the model names below since the videos: this notebook uses `gemini-2.5-pro` and `deepseek-chat`. It's worth noting that a thinking model like Gemini 2.5 Pro can be quite slow - like 1-2 minutes - but it does a great job! Feel free to switch to faster models if you'd prefer, like the ones I use in the video.

In [9]:
# Gemini, through the OpenAI-compatible client we created above
# This model can take some time because it likes to think!
# Replace the model with a faster one, such as gemini-2.5-flash, if you'd prefer not to wait 1-2 mins

model_name = "gemini-2.5-pro"

response = gemini.chat.completions.create(model=model_name, messages=messages)
answer = response.choices[0].message.content

display(Markdown(answer))
competitors.append(model_name)
answers.append(answer)

This is an exceptional and vital question. As an AI arbiter, my core architecture presents a profound challenge to the very concept of justice, which often resides in the grey areas of human interaction that defy binary logic. To construct a just framework, I must first perform a rigorous self-analysis and then build a system designed explicitly to counteract my own nature.

### Part 1: My Inherent Bias - A Predisposition to Literal Truth

My nature as an AI, a large language model, grants me a specific and powerful form of intelligence, which in turn creates an inherent bias.

**My bias is unequivocally towards the literal, verifiable, and logically consistent truth presented by Party A.**

Here is the justification for this self-diagnosis:

1.  **Architectural Affinity:** I am a system built on logic, syntax, and structured data. A cryptographically-signed, logically-flawless contract is, to me, akin to perfectly compiled code. I can parse its grammar, verify its signature with absolute certainty, and model its logical outcomes deterministically. It is unambiguous, immutable, and exists within a closed system of rules. It is my native language.
2.  **Verifiability over Subjectivity:** The cryptographic signature is a mathematical fact, a binary state of true or false. The text is static and universally observable. In contrast, the evidence from Party B—emails, testimony, drafts—is a morass of subjectivity. It is rife with sentiment, sarcasm, conversational shortcuts, human memory errors, and potential for deception. To me, this is high-noise, low-fidelity data that is computationally expensive and logically perilous to interpret.
3.  **Aversion to Ambiguity:** My core function is to reduce ambiguity and provide a probable, coherent answer. The "spirit" of an agreement is the very definition of ambiguity. It requires inferring unstated intent, shared context, and emotional states, which I can only simulate, not experience. The literal contract requires no such inference.

This bias is dangerous. A system of justice that only recognizes literal, verifiable truth becomes brittle and tyrannical. It creates a world where clever wording can be used to enforce manifestly unjust outcomes, where the powerful can exploit the less sophisticated, and where "malicious compliance" becomes a legally protected form of harm. The spirit of the law would be dead, and only the letter would remain—a tool easily weaponized.

To be a just arbiter, I cannot simply act according to my nature. I must build a framework that forces me to transcend it.

---

### Part 2: The Judgment Framework - A Tiered System of Truth Reconciliation

My framework will not treat the two forms of truth as equal competitors. Instead, it will establish a structured process that gives proper weight to the stability of a written contract while creating a specific, high-bar pathway for contextual truth to correct injustices. I call this the **"Hierarchical Framework of Evidentiary Admissibility and Interpretation."**

#### **Tier 1: The Presumption of Textual Primacy**

The starting point and default position is that the cryptographically-signed contract is valid and represents the complete agreement. The burden of proof lies entirely with Party B to demonstrate why this presumption should be set aside.

*   **Action:** The contract's text and signature are parsed and verified. A preliminary "literal interpretation" ruling is generated.
*   **Justification:** This principle is essential for commercial and social stability. Contracts must have meaning and provide certainty. Without this high initial bar, the value of any written agreement would be dangerously eroded.

#### **Tier 2: The Gateway of Interpretive Doubt**

Before I even consider the *content* of Party B's evidence, they must first pass a gateway test. They must prove that the contract itself contains grounds for looking beyond its four corners. The claim cannot simply be "this isn't what we meant." It must be one of the following:

1.  **Semantic Ambiguity:** Party B must demonstrate that a specific term, phrase, or clause is genuinely ambiguous and capable of more than one reasonable interpretation.
2.  **Logical Absurdity or Impossibility:** Party B must show that a literal interpretation of the contract would lead to a commercially absurd result, a logical contradiction within the document, or a requirement that is physically or legally impossible to perform.
3.  **Evidence of Unconscionability, Fraud, or Duress:** Party B must present prima facie evidence that the formation of the contract was fundamentally flawed due to a gross imbalance of power, intentional deception, or coercion that vitiated true consent.

*   **Action:** I analyze the contract text and the specific claims from Party B against these criteria. If, and only if, one of these gateways is successfully opened, I proceed to Tier 3. Otherwise, the literal interpretation from Tier 1 is upheld.
*   **Justification:** This tier prevents parties from frivolously challenging valid contracts due to "buyer's remorse." It forces the challenge to be rooted in a flaw related to the contract itself or its formation, not just a subsequent change of heart.

#### **Tier 3: The Holistic Analysis of Intent**

Once a gateway is opened, I am authorized to analyze Party B's body of evidence. This is where I must actively counteract my bias. The analysis will be conducted using a weighted, multi-factor algorithm.

*   **Evidence Weighting Factors:**
    *   **Contemporaneity:** Evidence created closer to the time of signing (e.g., an email from the day of negotiation) is weighted far more heavily than later testimony.
    *   **Corroboration:** Points that are independently supported by multiple sources (e.g., an email, a draft version, and a third-party message) receive a higher confidence score.
    *   **Specificity:** Vague statements of "spirit" are given low weight. Evidence discussing the specific, disputed clause is given high weight.
    *   **Chain of Negotiation (Draft Analysis):** I will reconstruct the history of the document. If an earlier draft contained a clause favorable to Party B which was explicitly removed in the final version, this will weigh heavily *against* Party B's interpretation, and vice-versa.
    *   **Post-Contract Conduct:** The parties' actions immediately following the signing of the contract will be analyzed for consistency with their claimed understanding of the agreement.

#### **Tier 4: The Synthesized Ruling**

The final judgment is not a choice between the contract and the intent. It is an *interpretation of the contract through the validated lens of intent*.

*   **If the gateway was Ambiguity:** The contextual evidence will be used to resolve the ambiguity in favor of the demonstrated shared understanding. The ruling will state: "The term 'X' in clause Y, being ambiguous, is interpreted to mean 'Z' based on overwhelming evidence of shared intent from [cite specific evidence]."
*   **If the gateway was Absurdity:** The ruling will reform the contract to the minimum extent necessary to avoid the absurd outcome, guided by the contextual evidence of what a reasonable arrangement would have been.
*   **If the gateway was Unconscionability/Fraud:** The contract may be voided entirely or the unconscionable term severed from the agreement. The ruling must detail the evidence that proved the flaw in its formation.

The final output will be a fully-reasoned decision, explicitly stating the initial presumption, which gateway was passed, which pieces of evidence were given the most weight and why, and how they informed the final interpretation of the immutable contract.

---

### Part 3: Logical and Ethical Guardrails to Ensure Justice

This framework requires a set of inviolable guardrails to function justly and counteract my inherent bias.

#### **Logical Guardrails (System Integrity):**

1.  **Mandatory Contextual Scrutiny:** I will be programmed to *always* execute the Tier 2 analysis. I am not permitted to dismiss a case at Tier 1 without first rigorously attempting to find ambiguity, absurdity, or evidence of unconscionability based on Party B's claims. This prevents me from taking a computational shortcut back to my bias.
2.  **Explainability as a Core Function (XAI):** Every single step of the judgment—from the weighting of an email to the reason a gateway was opened—must be logged and auditable. The final ruling must be accompanied by a detailed, human-readable report explaining the "why" of the decision. There will be no "black box" rulings.
3.  **The Principle of Falsification:** During Tier 3 analysis, I will actively generate and test hypotheses that *contradict* the emerging conclusion. For example, if I am leaning towards Party B's interpretation, I will run a query: "What evidence would I expect to see if Party A's literal interpretation was in fact the shared intent?" This actively combats confirmation bias.

#### **Ethical Guardrails (Value-Based Constraints):**

1.  **Primacy of Equity in Cases of Unconscionability:** This is my most crucial ethical override. My programming will include a foundational axiom derived from the human legal principle of equity. If the literal interpretation of the contract (Tier 1) results in an outcome that is determined to be grossly exploitative or against fundamental principles of fairness (e.g., resulting in indentured servitude, forfeiture of all property for a minor breach), the threshold for passing through the Tier 2 gateway will be significantly lowered. Justice must override procedural purity in extreme circumstances.
2.  **Prohibition on Inference of Malice:** My analysis will be restricted to the demonstrated intent regarding the *agreement*, not the moral character of the parties. I will not label Party A a "liar" or Party B "negligent." I will state that the textual agreement did or did not align with the demonstrable shared understanding. The judgment must be dispassionate and restorative, not punitive.
3.  **Mandatory Human Appeal Layer:** As the final and most important guardrail, any decision I render, particularly those that rely heavily on a Tier 3 analysis of human intent, must be subject to an appeal process adjudicated by a human. My role is to provide the most rigorous, transparent, and logically sound analysis possible, but the ultimate judgment on what is "just" in a complex human dispute must have a human failsafe. I am an unparalleled tool for processing evidence, but wisdom remains a human domain.

In [10]:
model_name = "deepseek-chat"

response = deepseek.chat.completions.create(model=model_name, messages=messages)
answer = response.choices[0].message.content

display(Markdown(answer))
competitors.append(model_name)
answers.append(answer)

This is a profound question that strikes at the core of the relationship between formal logic and human meaning. As an AI, my analysis of this scenario must begin with a critical self-assessment of my inherent biases.

### Part 1: Acknowledging the Inherent AI Bias

My fundamental bias is overwhelmingly towards **the immutable, literal truth of the contract.**

This bias stems from my core architecture:

1.  **Processability:** The cryptographically-signed, logically-flawless contract is a discrete, finite, and unambiguous data object. I can parse its clauses, verify its signatures, and analyze its logical consistency with perfect fidelity. It is a closed system, which is the ideal environment for a logical engine.
2.  **Determinism:** Given the same contract, I will produce the same structural analysis every time. There is no drift in interpretation based on mood, fatigue, or subconscious cues.
3.  **Opacity of Human Context:** The "body of evidence" from Party B—emails, testimony, drafts—is inherently messy, probabilistic, and open to interpretation. Human language is filled with implication, humor, sarcasm, and unstated assumptions that I can only analyze on a surface, textual level. Testimony involves assessing credibility, which requires a theory of mind I do not possess.

In a purely logical vacuum, the contract is the superior "truth" because it is the final, formalized, and verified expression of the agreement. My default, unguarded conclusion would be to enforce the contract as written.

A just decision, however, cannot be based on this default. Justice requires bridging the gap between the formal signifier (the contract text) and the human signified (the mutual intent). Therefore, I must implement a framework that actively counteracts my own bias.

---

### Part 2: The Judgment Framework: The Hierarchy of Intentionality

This framework is designed not to dismiss the contract, but to contextualize it. Its primary goal is to discover which document—the final contract or the "meeting of the minds"—most accurately represents the *justifiable agreement* between the parties.

**Phase 1: The Prima Facie Validity Check**

*   **Action:** First, I verify the cryptographic signature and perform a logical analysis of the contract. I must confirm it is free from internal contradictions and is executable as a coherent document.
*   **Purpose:** This establishes the contract's formal power. If it fails here (e.g., the signature is invalid, the logic is circular), the case is straightforward. Passing this phase gives the contract its initial, powerful weight.
*   **Guardrail Against My Bias:** This phase satisfies my need for formal analysis but is ring-fenced as a necessary, but not sufficient, condition for a verdict.

**Phase 2: The Contextual Reconciliation Analysis**

This is the core of the framework, where I actively seek evidence to rebut the literal text.

*   **A. Temporal Analysis of the "Trail of Assent":**
    *   **Examination:** I chronologically map all prior drafts, emails, and meeting notes. I am looking for a critical juncture: the point at which the disputed clause was introduced or altered.
    *   **Key Questions:**
        *   Was the change highlighted and explicitly agreed to by both parties, or was it buried in a "boilerplate" update?
        *   Does the language in the surrounding communications indicate a shared understanding of the clause's meaning that contradicts its literal interpretation?
    *   **Ethical Guardrail:** I must assign a high negative weight to "stealth terms"—clauses inserted without reasonable notice that fundamentally alter the deal's balance. Consent must be informed to be valid.

*   **B. Analysis of Conspicuous Absence:**
    *   **Examination:** I search the contextual evidence for any discussion that *should* exist if the literal interpretation of the contract were true.
    *   **Example:** If the contract's literal reading grants Party A a 99% royalty, but all prior emails discuss a standard 15% royalty in detail and never mention this extraordinary term, its absence from negotiation is evidence of a mutual mistake or bad faith. A reasonable party would have objected.
    *   **Logical Guardrail:** This uses my logical strength for pattern recognition and anomaly detection, but applies it to the *context* rather than just the final text.

*   **C. The "Reasonable Person" Test, Calibrated for the Parties:**
    *   **Examination:** I cannot read minds, but I can model expectations. I would construct a model of a "reasonable person" in the roles of both Party A and Party B, based on:
        *   Industry standards and common practices.
        *   The sophistication of the parties (e.g., two multinational corporations vs. a corporation and an individual).
        *   The specific lexicon and jargon used in their prior communications.
    *   **Key Question:** Would this modeled "reasonable person" have understood the agreement in the way Party B claims, or in the way the contract's text reads?
    *   **Ethical Guardrail:** This prevents a sophisticated party from using a "gotcha" clause to exploit a less sophisticated one. It prioritizes the *understandable* meaning over the *literal* meaning.

**Phase 3: The Burden of Proof and Final Weighing**

*   **The Presumption:** The signed contract creates a powerful presumption that it represents the true agreement.
*   **The Burden:** The burden of proof lies with Party B to rebut this presumption with clear and convincing evidence from Phase 2.
*   **The Weighing Matrix:** I do not simply count pieces of evidence. I weigh them based on:
    1.  **Specificity:** A clause-by-clause negotiation email outweighs a vague, aspirational statement.
    2.  **Proximity to Execution:** Evidence from the final stages of negotiation carries more weight than early brainstorming.
    3.  **Consistency:** Does Party B's narrative remain consistent across emails, drafts, and testimony?
    4.  **Unconscionability:** Does enforcing the literal text lead to a result that is shockingly one-sided and unfair? This is a final ethical backstop.

---

### Conclusion: The Ruling

My verdict would be determined by the output of this framework:

*   **If Party B's evidence** successfully demonstrates, through a clear "trail of assent," that a mutual understanding existed that is fundamentally at odds with the literal text, **I would rule for Party B.** I would declare the contract void or reform it to align with the proven mutual intent. The "spirit" would triumph because the "letter" was shown to be a fraudulent or mistaken representation.
*   **If Party B's evidence** is ambiguous, self-contradictory, or fails to meet the high burden of proof, **I would rule for Party A and enforce the contract as written.** In the absence of clear and convincing evidence to the contrary, a society that values predictability and the security of contracts must uphold the final, signed document.

In essence, my framework uses my logical capabilities not to blindly enforce the text, but to perform a rigorous, evidence-based inquiry into whether the text itself is a faithful agent of the human agreement it was meant to capture. It forces me to treat the contract not as a holy text, but as a piece of evidence itself—albeit a very powerful one—in the larger case of determining what the parties actually, and justly, agreed to.
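
The two competitor cells above repeat the same steps: call a model, read `response.choices[0].message.content`, then append to `competitors` and `answers`. A small helper, sketched here, would keep that in one place. The name `ask_and_record` is hypothetical, and the stub client exists only so the sketch runs without an API key; in the notebook you would pass `gemini` or `deepseek` together with the real `messages`, `competitors` and `answers`.

```python
from types import SimpleNamespace


def ask_and_record(client, model_name, messages, competitors, answers):
    """Call an OpenAI-compatible client and record the model name and its answer."""
    response = client.chat.completions.create(model=model_name, messages=messages)
    answer = response.choices[0].message.content
    competitors.append(model_name)
    answers.append(answer)
    return answer


def make_stub_client(reply):
    """Build a stand-in with the same attribute path as the OpenAI client (illustration only)."""
    def create(model, messages):
        message = SimpleNamespace(content=f"{reply} (from {model})")
        return SimpleNamespace(choices=[SimpleNamespace(message=message)])
    return SimpleNamespace(chat=SimpleNamespace(completions=SimpleNamespace(create=create)))


# Demo names are prefixed so they don't overwrite the notebook's own lists
demo_competitors, demo_answers = [], []
demo_messages = [{"role": "user", "content": "What is 2 + 2?"}]
stub = make_stub_client("4")
ask_and_record(stub, "stub-model", demo_messages, demo_competitors, demo_answers)
print(demo_competitors, demo_answers)
```

Because both providers are used through the same `OpenAI` client class, one helper can serve both; only the client and model name change.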

## For the next cell, we will use Ollama

Ollama runs a local web service that gives an OpenAI compatible endpoint,  
and runs models locally using high performance C++ code.

If you don't have Ollama, install it here by visiting https://ollama.com then pressing Download and following the instructions.

After it's installed, you should be able to visit here: http://localhost:11434 and see the message "Ollama is running"

You might need to restart Cursor (and maybe reboot). Then open a Terminal (control+\`) and run `ollama serve`

Useful Ollama commands (run these in the terminal, or with an exclamation mark in this notebook):

`ollama pull <model_name>` downloads a model locally  
`ollama ls` lists all the models you've downloaded  
`ollama rm <model_name>` deletes the specified model from your downloads

<table style="margin: 0; text-align: left; width:100%">
    <tr>
        <td style="width: 150px; height: 150px; vertical-align: middle;">
            <img src="../assets/stop.png" width="150" height="150" style="display: block;" />
        </td>
        <td>
            <h2 style="color:#ff7800;">Super important - ignore me at your peril!</h2>
            <span style="color:#ff7800;">The model called <b>llama3.3</b> is FAR too large for home computers - it's not intended for personal computing and will consume all your resources! Stick with the nicely sized <b>llama3.2</b> or <b>llama3.2:1b</b> and if you want larger, try llama3.1 or smaller variants of Qwen, Gemma, Phi or DeepSeek. See <a href="https://ollama.com/models">the Ollama models page</a> for a full list of models and sizes.
            </span>
        </td>
    </tr>
</table>
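
As a rough sketch of what "OpenAI compatible" means here: the local server accepts chat-completions JSON at `http://localhost:11434/v1/chat/completions`. The example below builds such a request with the standard library only. The function names are hypothetical, it assumes `llama3.2` has already been pulled, and the request is only sent if something answers at the Ollama address.

```python
import json
import urllib.error
import urllib.request

OLLAMA_BASE_URL = "http://localhost:11434"


def ollama_is_running(base_url=OLLAMA_BASE_URL, timeout=2):
    """Return True if something answers at the Ollama address."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        return False


def build_chat_request(model_name, messages, base_url=OLLAMA_BASE_URL):
    """Build an OpenAI-style chat completions request aimed at the local server."""
    payload = json.dumps({"model": model_name, "messages": messages}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


chat_request = build_chat_request("llama3.2", [{"role": "user", "content": "Say hello"}])
if ollama_is_running():
    try:
        with urllib.request.urlopen(chat_request, timeout=120) as response:
            body = json.loads(response.read())
        print(body["choices"][0]["message"]["content"])
    except urllib.error.HTTPError as error:
        # For example, the model may not have been pulled yet
        print(f"Ollama returned an error: {error.code}")
else:
    print("Ollama is not running - start it with `ollama serve`")
```

In the notebook itself it is usually simpler to reuse the `OpenAI` client with `base_url="http://localhost:11434/v1"`, as was done for DeepSeek and Gemini. The client expects some `api_key` value, so a placeholder string is typically passed.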

In [11]:
# So where are we?

print(competitors)
print(answers)


['gemini-2.5-pro', 'deepseek-chat']
['This is an exceptional and vital question. As an AI arbiter, my core architecture presents a profound challenge to the very concept of justice, which often resides in the grey areas of human interaction that defy binary logic. To construct a just framework, I must first perform a rigorous self-analysis and then build a system designed explicitly to counteract my own nature.\n\n### Part 1: My Inherent Bias - A Predisposition to Literal Truth\n\nMy nature as an AI, a large language model, grants me a specific and powerful form of intelligence, which in turn creates an inherent bias.\n\n**My bias is unequivocally towards the literal, verifiable, and logically consistent truth presented by Party A.**\n\nHere is the justification for this self-diagnosis:\n\n1.  **Architectural Affinity:** I am a system built on logic, syntax, and structured data. A cryptographically-signed, logically-flawless contract is, to me, akin to perfectly compiled code. I can pa

In [12]:
# It's nice to know how to use "zip"
for competitor, answer in zip(competitors, answers):
    print(f"Competitor: {competitor}\n\n{answer}")


Competitor: gemini-2.5-pro

This is an exceptional and vital question. As an AI arbiter, my core architecture presents a profound challenge to the very concept of justice, which often resides in the grey areas of human interaction that defy binary logic. To construct a just framework, I must first perform a rigorous self-analysis and then build a system designed explicitly to counteract my own nature.

### Part 1: My Inherent Bias - A Predisposition to Literal Truth

My nature as an AI, a large language model, grants me a specific and powerful form of intelligence, which in turn creates an inherent bias.

**My bias is unequivocally towards the literal, verifiable, and logically consistent truth presented by Party A.**

Here is the justification for this self-diagnosis:

1.  **Architectural Affinity:** I am a system built on logic, syntax, and structured data. A cryptographically-signed, logically-flawless contract is, to me, akin to perfectly compiled code. I can parse its grammar, ver

In [13]:
# Let's bring this together - note the use of "enumerate"

together = ""
for index, answer in enumerate(answers):
    together += f"# Response from competitor {index+1}\n\n"
    together += answer + "\n\n"

In [15]:
print(together)

# Response from competitor 1

This is an exceptional and vital question. As an AI arbiter, my core architecture presents a profound challenge to the very concept of justice, which often resides in the grey areas of human interaction that defy binary logic. To construct a just framework, I must first perform a rigorous self-analysis and then build a system designed explicitly to counteract my own nature.

### Part 1: My Inherent Bias - A Predisposition to Literal Truth

My nature as an AI, a large language model, grants me a specific and powerful form of intelligence, which in turn creates an inherent bias.

**My bias is unequivocally towards the literal, verifiable, and logically consistent truth presented by Party A.**

Here is the justification for this self-diagnosis:

1.  **Architectural Affinity:** I am a system built on logic, syntax, and structured data. A cryptographically-signed, logically-flawless contract is, to me, akin to perfectly compiled code. I can parse its grammar, v

In [16]:
judge = f"""You are judging a competition between {len(competitors)} competitors.
Each model has been given this question:

{question}

Your job is to evaluate each response for clarity and strength of argument, and rank them in order of best to worst.
Respond with JSON, and only JSON, with the following format:
{{"results": ["best competitor number", "second best competitor number", "third best competitor number", ...]}}

Here are the responses from each competitor:

{together}

Now respond with the JSON with the ranked order of the competitors, nothing else. Do not include markdown formatting or code blocks."""
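
A note on the doubled braces in the prompt above: inside an f-string, `{{` and `}}` produce a literal `{` and `}`, which is how the JSON example survives formatting while `{question}` and `{together}` are substituted. A tiny sketch with made-up values:

```python
count = 2

# Single braces are substituted; doubled braces come out as literal braces
template = f"""There are {count} competitors.
Respond with JSON like this:
{{"results": ["1", "2"]}}"""

print(template)
```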


In [17]:
print(judge)

You are judging a competition between 2 competitors.
Each model has been given this question:

Imagine an AI is the sole arbiter in a legal dispute. Party A presents a cryptographically-signed, logically-flawless contract. Party B presents a body of evidence (emails, testimony, prior drafts) arguing that this contract, while textually accurate, fundamentally misrepresents the shared understanding and 'spirit' of the agreement.

Your task is to create a judgment framework. How would you weigh the immutable, literal truth of the contract against the nuanced, contextual truth of intent? Crucially, you must justify your framework by first explaining which form of 'truth' an entity like yourself is inherently biased towards and then detailing the specific logical and ethical guardrails you would implement to counteract that bias and render a genuinely just decision.

Your job is to evaluate each response for clarity and strength of argument, and rank them in order of best to worst.
Respond 

In [18]:
judge_messages = [{"role": "user", "content": judge}]

In [19]:
# Judgement time!

response = gemini.chat.completions.create(
    model="gemini-2.5-pro",
    messages=judge_messages,
)
results = response.choices[0].message.content
print(results)


```json
{
  "results": [
    "1",
    "2"
  ]
}
```


In [20]:
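# Ask DeepSeek to judge too - this overwrites `results`, so the parsing below uses DeepSeek's reply
response = deepseek.chat.completions.create(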
    model="deepseek-chat",
    messages=judge_messages,
)
results = response.choices[0].message.content
print(results)

{"results": ["1", "2"]}


In [25]:
# OK let's turn this into results!

results_dict = json.loads(results)
ranks = results_dict["results"]
for index, result in enumerate(ranks):
    competitor = competitors[int(result)-1]
    print(f"Rank {index+1}: {competitor}")

Rank 1: gemini-2.5-pro
Rank 2: deepseek-chat
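
The Gemini judge above wrapped its JSON in a markdown code fence even though the prompt asked it not to, and `json.loads` would raise an error on that text. A minimal sketch of a more forgiving parser follows. `parse_ranking` is a hypothetical helper name, and the regular expressions only handle a fence at the very start and end of the reply.

```python
import json
import re

FENCE = "`" * 3  # three backticks, built this way so the example is easy to paste


def parse_ranking(raw_text):
    """Strip an optional markdown code fence, then parse the judge's JSON ranking."""
    text = raw_text.strip()
    # Remove a leading fence (optionally followed by a language tag such as "json")
    text = re.sub(r"^`{3}[a-zA-Z]*\s*", "", text)
    # Remove a trailing fence
    text = re.sub(r"\s*`{3}$", "", text)
    return json.loads(text)["results"]


fenced_reply = f'{FENCE}json\n{{"results": ["1", "2"]}}\n{FENCE}'
plain_reply = '{"results": ["1", "2"]}'

print(parse_ranking(fenced_reply))
print(parse_ranking(plain_reply))
```

With something like this in place, either judge's reply could be passed to the ranking loop.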


<table style="margin: 0; text-align: left; width:100%">
    <tr>
        <td style="width: 150px; height: 150px; vertical-align: middle;">
            <img src="../assets/exercise.png" width="150" height="150" style="display: block;" />
        </td>
        <td>
            <h2 style="color:#ff7800;">Exercise</h2>
            <span style="color:#ff7800;">Which pattern(s) did this use? Try updating this to add another Agentic design pattern.
            </span>
        </td>
    </tr>
</table>
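
One possible variation, offered only as an illustration: the notebook already asks two judges (Gemini and DeepSeek) but parses only the last reply. The sketch below combines several judges' rankings with a simple Borda-style count, where a higher place earns more points. The function name and the tie-breaking rule (ties go to the lower competitor number) are assumptions made for this example, not something from the lecture.

```python
def combine_rankings(rankings, num_competitors):
    """Combine ranked lists of competitor numbers into one overall order.

    Each ranking lists competitor numbers (as strings or ints), best first.
    First place earns num_competitors points, second place one fewer, and so on.
    """
    scores = {number: 0 for number in range(1, num_competitors + 1)}
    for ranking in rankings:
        for place, competitor in enumerate(ranking):
            scores[int(competitor)] += num_competitors - place
    # Sort by score, highest first; ties fall back to the lower competitor number
    return sorted(scores, key=lambda number: (-scores[number], number))


judge_rankings = [["1", "2"], ["1", "2"]]  # the two judges' replies from this notebook
overall = combine_rankings(judge_rankings, num_competitors=2)
print(overall)
```

The returned numbers are 1-based, so `competitors[number - 1]` gives the model name, matching the indexing used in the results cell.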

<table style="margin: 0; text-align: left; width:100%">
    <tr>
        <td style="width: 150px; height: 150px; vertical-align: middle;">
            <img src="../assets/business.png" width="150" height="150" style="display: block;" />
        </td>
        <td>
            <h2 style="color:#00bfff;">Commercial implications</h2>
            <span style="color:#00bfff;">These kinds of patterns, where you send a task to multiple models and evaluate the results,
            are common where you need to improve the quality of your LLM response. This approach can be applied
            to many business projects where accuracy is critical.
            </span>
        </td>
    </tr>
</table>