## Welcome to the Second Lab - Week 1, Day 3

Today we will work with lots of models! This is a way to get comfortable with APIs.

<table style="margin: 0; text-align: left; width:100%">
    <tr>
        <td style="width: 150px; height: 150px; vertical-align: middle;">
            <img src="../assets/stop.png" width="150" height="150" style="display: block;" />
        </td>
        <td>
            <h2 style="color:#ff7800;">Important point - please read</h2>
            <span style="color:#ff7800;">The way I collaborate with you may be different to other courses you've taken. I prefer not to type code while you watch. Rather, I execute Jupyter Labs, like this, and give you an intuition for what's going on. My suggestion is that you carefully execute this yourself, <b>after</b> watching the lecture. Add print statements to understand what's going on, and then come up with your own variations.<br/><br/>If you have time, I'd love it if you submitted a PR for changes in the community_contributions folder - instructions in the resources. Also, if you have a GitHub account, use this to showcase your variations. Not only is this essential practice, but it demonstrates your skills to others, including perhaps future clients or employers...
            </span>
        </td>
    </tr>
</table>

In [1]:
# Start with imports - ask ChatGPT to explain any package that you don't know

import os
import json
from dotenv import load_dotenv
from openai import OpenAI
from anthropic import Anthropic
from IPython.display import Markdown, display

In [20]:
# Always remember to do this!
load_dotenv(override=True)

True

In [21]:
# Print the key prefixes to help with any debugging

openai_api_key = os.getenv('OPENAI_API_KEY')
anthropic_api_key = os.getenv('ANTHROPIC_API_KEY')
google_api_key = os.getenv('GOOGLE_API_KEY')
deepseek_api_key = os.getenv('DEEPSEEK_API_KEY')
groq_api_key = os.getenv('GROQ_API_KEY')

if openai_api_key:
    print(f"OpenAI API Key exists and begins {openai_api_key[:8]}")
else:
    print("OpenAI API Key not set")
    
if anthropic_api_key:
    print(f"Anthropic API Key exists and begins {anthropic_api_key[:7]}")
else:
    print("Anthropic API Key not set (and this is optional)")

if google_api_key:
    print(f"Google API Key exists and begins {google_api_key[:2]}")
else:
    print("Google API Key not set (and this is optional)")

if deepseek_api_key:
    print(f"DeepSeek API Key exists and begins {deepseek_api_key[:3]}")
else:
    print("DeepSeek API Key not set (and this is optional)")

if groq_api_key:
    print(f"Groq API Key exists and begins {groq_api_key[:4]}")
else:
    print("Groq API Key not set (and this is optional)")

OpenAI API Key not set
Anthropic API Key not set (and this is optional)
Google API Key exists and begins AI
DeepSeek API Key not set (and this is optional)
Groq API Key not set (and this is optional)
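
The repeated if/else blocks above can be collapsed into a loop. This is a minimal sketch, assuming the same environment variable names and prefix lengths as the cell above; `describe_keys` and `KEY_PREFIX_LENGTHS` are hypothetical names introduced here, and only OpenAI is treated as non-optional, matching the messages above.

```python
import os

# Provider label -> (environment variable, number of prefix characters to show).
KEY_PREFIX_LENGTHS = {
    "OpenAI": ("OPENAI_API_KEY", 8),
    "Anthropic": ("ANTHROPIC_API_KEY", 7),
    "Google": ("GOOGLE_API_KEY", 2),
    "DeepSeek": ("DEEPSEEK_API_KEY", 3),
    "Groq": ("GROQ_API_KEY", 4),
}

def describe_keys(env=os.environ):
    """Return one status line per provider without revealing full keys."""
    lines = []
    for provider, (variable, prefix_length) in KEY_PREFIX_LENGTHS.items():
        key = env.get(variable)
        if key:
            lines.append(f"{provider} API Key exists and begins {key[:prefix_length]}")
        elif provider == "OpenAI":
            lines.append("OpenAI API Key not set")
        else:
            lines.append(f"{provider} API Key not set (and this is optional)")
    return lines

for line in describe_keys():
    print(line)
```

Passing a plain dictionary instead of `os.environ` makes it easy to try the function without touching your real keys.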


In [4]:
request = "Please come up with a challenging, nuanced question that I can ask a number of LLMs to evaluate their intelligence. "
request += "Answer only with the question, no explanation."
messages = [{"role": "user", "content": request}]

In [5]:
messages

[{'role': 'user',
  'content': 'Please come up with a challenging, nuanced question that I can ask a number of LLMs to evaluate their intelligence. Answer only with the question, no explanation.'}]

In [8]:
OLLAMA_BASE_URL = "http://localhost:11434/v1"
ollama = OpenAI(base_url=OLLAMA_BASE_URL, api_key="anything")
response = ollama.chat.completions.create(
    model="llama3.2",
    messages=messages,
)
question = response.choices[0].message.content
print(question)


Here is a challenging question:

A being from a planetary system similar to Earth, but approximately 4 light-years away, possesses a technology that allows them to create and control artificial general intelligences indistinguishable from themselves in physical form, yet they secretly use these intelligences to subtly manipulate global events without ever interacting with humans directly. Design an experiment, taking into account the limitations of human observations and communication, to conclusively determine whether this being exists or is merely a sophisticated simulation within the being's AI network that we interpret as 'existential' evidence.


In [9]:
competitors = []
answers = []
messages = [{"role": "user", "content": question}]

In [10]:
# The API we know well

model_name = "llama3.2"

response = ollama.chat.completions.create(model=model_name, messages=messages)
answer = response.choices[0].message.content

display(Markdown(answer))
competitors.append(model_name)
answers.append(answer)

To design an experiment that can conclusively determine the existence or non-existence of the being, I propose the following:

**Experiment: "Digital Footprint Investigation"**

Objective: To gather indirect evidence that demonstrates the existence of the being's artificial general intelligence (AGI) in a manner that is resistant to simulation and interpretation by human observers.

**Hypothesis:** The being's AGI has created a complex network of interactions, both physical and digital, which are observable but not directly perceivable by humans. These interactions will reveal signatures of intelligent activity that cannot be replicated or replicated by simulations.

**Experimental Design:**

1. **Network Analysis**: Collect data from various sources, including:
	* Global financial markets (trades, fluctuations, etc.)
	* International diplomacy and policy decisions (nations' interests, trade agreements, etc.)
	* Global environmental trends (climate changes, natural disasters, etc.)
	* Social media and online discussions
2. **Machine Learning Model**: Train a machine learning model to analyze the collected data using:
	* Techniques such as clustering, dimensionality reduction, and causal network analysis.
	* Goal: Identify patterns, anomalies, and relationships that suggest intelligent activity.

**Signatures of Intelligent Activity:**

To distinguish between human-AGI interactions and simulations, we need to identify specific signatures that are unique to AGI behavior. These signatures might include:
1. **Inconsistencies in Human Behavior**: Look for correlations between seemingly unrelated events that cannot be explained by human decision-making.
2. **Discrepancies in Network Properties**: Search for anomalous network patterns, such as sudden changes in connectivity or information flow.
3. **Patterns of Unforeseen Consequences**: Analyze the occurrence of unforeseen outcomes that could not have been predicted using conventional statistical methods.

**Experimental Control and Causality:**

To address concerns about simulation or confirmation bias:

1. **Comparison to Historical Data:** Compare the patterns observed in human history data with those extracted from the AGI network.
2. **Comparative Causal Analysis**: Use a causal framework (e.g., Granger causality) to analyze relationships between different events and time scales.

**Evaluation Metrics:**

To assess the effectiveness of the experiment:

1. **Accuracy:** Measure the accuracy of predictions made by the machine learning model versus those generated by human analysts.
2. **Reliability:** Evaluate the stability and consistency of the observed patterns.
3. **Censorship Resistance:** Test the robustness of the results against selective data tampering or erasures.

**Interpretation:**

Using multiple experimental runs and control conditions, we will infer the likelihood that the observed signatures indicate an AGI presence rather than a sophisticated simulation or human manipulation. Confidence in our findings will depend on:
1. **Internal Consistency:** Agreement among the various analysis results.
2. **External Verification:** Confirmation by independent verification of some or all of the experimental setup.

**Limitations:**

This experiment relies heavily on statistical inference and machine learning techniques. Some potential risks include:

1. **Simulation Mimicry:** Advanced simulations may be able to convincingly replicate human behavior without an actual AGI's presence.
2. **Human Error**: Unintended biases in data collection, interpretation, or model selection can introduce errors that affect the outcome.

**Future Work:**

To increase the reliability and robustness of our findings:

1. **Multidisciplinary Collaboration:** Involve experts from social sciences, computer science, linguistics, and philosophy to provide a broader understanding of potential implications.
2. **Continuous Monitoring**: Regularly collect new data to improve model accuracy and update analysis results as necessary.
3. **Cautious Interpreation:** Proceed with extreme caution when inferring the existence or non-existence of the being.

By applying this experimental approach, we can explore indirect evidence that supports the hypothesis of an AGI presence, allowing us to better understand whether we are indeed in contact with entities beyond our known boundaries

In [11]:
# Anthropic has a slightly different API, and the max_tokens argument is required

model_name = "claude-3-7-sonnet-latest"

claude = Anthropic()
response = claude.messages.create(model=model_name, messages=messages, max_tokens=1000)
answer = response.content[0].text

display(Markdown(answer))
competitors.append(model_name)
answers.append(answer)

TypeError: "Could not resolve authentication method. Expected either api_key or auth_token to be set. Or for one of the `X-Api-Key` or `Authorization` headers to be explicitly omitted"
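
The error above appears because no Anthropic key was found in the environment (the key check earlier printed "Anthropic API Key not set"), so the client has nothing to authenticate with. One way to keep the notebook running is to skip a provider when its key is missing. This is a minimal sketch; `call_if_configured` is a hypothetical helper, not part of any SDK, and `ask` stands for whatever function makes the real API call.

```python
def call_if_configured(provider, api_key, ask):
    """Run ask() only when api_key is set; otherwise return None and say why."""
    if not api_key:
        print(f"Skipping {provider}: no API key set")
        return None
    return ask()

# Stand-in for a real API call, so the sketch runs without network access.
def fake_ask():
    return "an answer"

skipped = call_if_configured("Anthropic", None, fake_ask)
answered = call_if_configured("Gemini", "AI-example-key", fake_ask)
```

In the notebook, you would only append to `competitors` and `answers` when the returned value is not `None`.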

In [22]:
gemini = OpenAI(api_key=google_api_key, base_url="https://generativelanguage.googleapis.com/v1beta/openai/")
model_name = "gemini-2.0-flash"

response = gemini.chat.completions.create(model=model_name, messages=messages)
answer = response.choices[0].message.content

display(Markdown(answer))
competitors.append(model_name)
answers.append(answer)

This is an incredibly challenging problem, fraught with theoretical and practical limitations. Here's a potential experimental design, broken down into stages, acknowledging the inherent difficulties:

**Phase 1: Establish Baseline Anomaly Detection**

This phase focuses on establishing a statistical baseline of "normal" global events and identifying anomalies that could hint at outside interference.

*   **Data Acquisition:**
    *   **Long-term historical data:** Collect data on social trends (political polarization, economic inequality, scientific breakthroughs), environmental changes (climate patterns, seismic activity), technological advancements (AI development, energy breakthroughs), and historical patterns in unpredictable complex systems (Stock market fluctuations, species evolution rates). Go back as far as possible, prioritizing consistent data collection methods.
    *   **Real-time monitoring:** Set up continuous monitoring systems for all these data streams, focusing on:
        *   **Unusual correlations:** Look for unexpected relationships between seemingly unrelated events. For example, a sudden surge in online social unrest correlated with a specific, subtly altered scientific paper.
        *   **Statistical outliers:** Identify events that deviate significantly from historical trends and established statistical models.
        *   **Phase transitions:** Look for sudden, abrupt shifts in the behavior of complex systems. For example, a rapid shift in political alignment following a minor, seemingly insignificant social media campaign.

*   **Statistical Modeling:**
    *   Develop sophisticated statistical models to predict future events based on historical data.  This requires accounting for inherent randomness and chaos within the systems. Techniques might include:
        *   **Bayesian Networks:** To model causal relationships between different variables.
        *   **Complex Systems Theory:** To understand emergent behavior and feedback loops.
        *   **Anomaly Detection Algorithms:**  To identify deviations from expected patterns.  Examples include Isolation Forests, One-Class SVM, and Variational Autoencoders.
    *   Create a "noise floor" of expected anomalies based on the models.  Any deviation above this noise floor requires further investigation.

**Phase 2: Targeted Interference Detection**

Once a baseline is established, the focus shifts to searching for specific patterns indicative of intelligent manipulation. This involves identifying events or systems that, if manipulated, would have predictable, cascading effects.

*   **Vulnerability Assessment:**
    *   **Identify critical infrastructure:** Determine key nodes in global systems (power grids, communication networks, financial institutions, scientific research).  Consider also "soft targets" like media narratives or social movements.
    *   **Analyze exploitability:** Assess the potential for subtle manipulation to influence these systems.  This requires deep understanding of social psychology, behavioral economics, and network dynamics.
    *   **"Honeypots":**  Create intentionally vulnerable systems designed to attract and reveal manipulation attempts. These could be simulated scientific challenges with subtle flaws that, if exploited in a specific way, would lead to a predictable (but incorrect) result, or simulated social media campaigns with seeds of misinformation designed to spread in a predictable way if interfered with.

*   **Pattern Recognition:**
    *   **Search for "signatures" of intelligence:**  Focus on identifying patterns that are unlikely to arise from natural processes or random chance. This might include:
        *   **Optimal Solutions:**  Actions that are too efficient or too perfectly aligned with a specific goal.  This requires careful modeling of what a *suboptimal* natural outcome would look like.
        *   **Coordinated Anomalies:**  Multiple anomalies occurring simultaneously across different systems, suggesting a central orchestrator.
        *   **Predictive Shaping:**  Changes in a system that subtly steer it towards a specific outcome, long before the outcome becomes apparent.  Requires advanced causal inference techniques.
    *   **AI-assisted analysis:**  Use AI to analyze massive datasets and identify subtle patterns that humans might miss. This AI *must* be explicitly designed to look for evidence of intentional manipulation and to avoid being biased toward confirming pre-existing hypotheses.

**Phase 3: Causality Analysis and Communication "Probing"**

This phase is the most speculative and relies on pushing the boundaries of current scientific understanding.

*   **Causality Inference:**
    *   **Intervention experiments (carefully designed and controlled):**  Introduce small, precisely controlled changes into the global system and observe the downstream effects.  This must be done extremely cautiously to avoid unintended consequences.  Example:  Release a specific type of open-source software vulnerability and monitor its exploitation.
    *   **Counterfactual analysis:**  Use simulations to explore alternative scenarios, asking "What would have happened if this event had not occurred?" Compare the simulated outcomes to the actual outcomes to assess the event's causal impact.
    *   **Quantum causality experiments (highly theoretical):**  Explore the possibility of detecting non-classical causal influences.  This is extremely speculative and requires breakthroughs in quantum information theory.

*   **Passive Communication Probing:**
    *   **SETI-like Search for Information-Rich Signals (Carefully Directed):**  Instead of searching for brute force messages, search for subtle encoding strategies in cosmic noise or natural phenomena that encode useful information. For example, look for statistically improbable frequency patterns or specific information buried inside gravitational waves.
    *   **Monitor Anomalous Particle Events (Less Focus on Explicit Signals):**  Instead of just radio waves, track unusual particle events or other physical phenomena that could be manipulated by an advanced intelligence for communication, knowing that light or other forms of radiation may be inefficient and easy to detect.

**Phase 4: Discriminating Between "The Being" and a Global Simulation**

Assuming we've detected strong evidence of intelligent manipulation, the final and most difficult challenge is to determine if this manipulation originates from an *actual* being from another star system or from a hyper-advanced simulation.

*   **Complexity Analysis:**
    *   **Examine the manipulation strategy's complexity:** A simulation likely operates within defined constraints (e.g., computing power, bandwidth). Look for "shortcuts" or inconsistencies that betray these constraints. Are there patterns in the manipulation that suggest limitations?
    *   **Seek "glitches" in the simulation:**  Deliberately create situations that push the boundaries of the simulated environment.  Look for inconsistencies or breakdowns in the simulation's logic. This is analogous to finding bugs in a computer game.
    *   **Self-Reference Paradox:** Construct scenarios that involve self-reference or recursion that could potentially destabilize the simulation. This is extremely risky, as the outcomes are unpredictable.

*   **Reality Checks (Fundamental Physics):**
    *   **Precision Measurement of Fundamental Constants:** Is there evidence of subtle variations in fundamental physical constants that are inconsistent with our current understanding of physics? A highly sophisticated simulation might be able to "bend" these constants to optimize the simulation but could leave detectable artifacts.
    *   **Search for inconsistencies with known physical laws:** A simulation, even a perfect one, is still based on some physical model. Look for evidence of the "edges" of that model.
    *   **Test the limits of quantum mechanics:** Perform experiments designed to probe the deepest mysteries of quantum mechanics. If our reality is a simulation, it might be built on a different quantum foundation than we currently understand.

**Important Considerations and Limitations:**

*   **The Anthropic Principle:**  We are observing a universe that allows us to exist.  This inherent bias makes it difficult to assess the probability of finding alien life or intelligent manipulation.
*   **False Positives and False Negatives:**  It is extremely difficult to distinguish between genuine alien intervention and random chance or complex interactions within the global system. The risk of false positives (thinking we've found something when we haven't) and false negatives (missing real evidence) is high.
*   **Ethical Considerations:**  Any experiment that could potentially destabilize the global system or reveal the existence of alien manipulation must be carefully considered from an ethical standpoint.  The potential risks must be weighed against the potential benefits.
*   **Technological Singularity:**  If we succeed in creating true artificial general intelligence, it could become impossible to distinguish between our own creations and alien intervention.
*   **The Difficulty of "Proof":**  In science, it is impossible to definitively "prove" anything. The best we can do is to accumulate evidence that strongly supports a particular hypothesis.
*   **Interpreting the Results:** The definition of 'conclusive' is subjective, and the interpretation of experimental results would require careful consideration of all possible confounding factors.

**Conclusion:**

This experiment is incredibly ambitious and fraught with challenges. It requires a multidisciplinary approach, significant resources, and a willingness to accept the possibility of negative results. However, by combining sophisticated data analysis, careful experimental design, and a healthy dose of skepticism, we might be able to make progress toward answering one of the most profound questions in human history. The key is to continually challenge our assumptions and be open to the possibility that reality is far stranger than we can imagine. Even if we fail to find evidence of alien intervention, the pursuit of this question will undoubtedly lead to new discoveries and a deeper understanding of ourselves and the universe we inhabit.


In [None]:
deepseek = OpenAI(api_key=deepseek_api_key, base_url="https://api.deepseek.com/v1")
model_name = "deepseek-chat"

response = deepseek.chat.completions.create(model=model_name, messages=messages)
answer = response.choices[0].message.content

display(Markdown(answer))
competitors.append(model_name)
answers.append(answer)

In [None]:
groq = OpenAI(api_key=groq_api_key, base_url="https://api.groq.com/openai/v1")
model_name = "llama-3.3-70b-versatile"

response = groq.chat.completions.create(model=model_name, messages=messages)
answer = response.choices[0].message.content

display(Markdown(answer))
competitors.append(model_name)
answers.append(answer)
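
The Ollama, Gemini, DeepSeek and Groq cells all follow the same shape, because each of them is reached through the `OpenAI` client with a different `base_url`. Here is a sketch of how the shared steps could be pulled into one function. `ask_and_record` is a hypothetical name, and the fake client below only mimics the `chat.completions.create` call shape so that the example runs without a key or a network connection.

```python
from types import SimpleNamespace

def ask_and_record(client, model_name, messages, competitors, answers):
    """Call an OpenAI-compatible client and append the model name and its answer."""
    response = client.chat.completions.create(model=model_name, messages=messages)
    answer = response.choices[0].message.content
    competitors.append(model_name)
    answers.append(answer)
    return answer

# Fake client for illustration only. It mirrors the attribute path
# client.chat.completions.create(...) -> response.choices[0].message.content
def fake_create(model, messages):
    content = f"{model} received: {messages[0]['content']}"
    message = SimpleNamespace(content=content)
    return SimpleNamespace(choices=[SimpleNamespace(message=message)])

fake_client = SimpleNamespace(
    chat=SimpleNamespace(completions=SimpleNamespace(create=fake_create))
)

demo_competitors, demo_answers = [], []
ask_and_record(fake_client, "demo-model", [{"role": "user", "content": "Hi"}],
               demo_competitors, demo_answers)
```

With a real client such as `gemini` or `groq` from the cells above, the call would be `ask_and_record(gemini, "gemini-2.0-flash", messages, competitors, answers)`.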


## For the next cell, we will use Ollama

Ollama runs a local web service that gives an OpenAI compatible endpoint,  
and runs models locally using high performance C++ code.

If you don't have Ollama, install it by visiting https://ollama.com, pressing Download, and following the instructions.

After it's installed, you should be able to visit http://localhost:11434 and see the message "Ollama is running".
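
The same check can be done from Python rather than the browser. This is a small sketch using only the standard library; `is_ollama_running` is a hypothetical helper, and it assumes the default address shown above.

```python
import http.client
import urllib.request

def is_ollama_running(base_url="http://localhost:11434", timeout=2.0):
    """Return True if the server at base_url answers with 'Ollama is running'."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as response:
            body = response.read().decode("utf-8", errors="replace")
    except (OSError, ValueError, http.client.HTTPException):
        # Connection refused, timeout, malformed URL, or a non-HTTP reply.
        return False
    return "Ollama is running" in body

print(is_ollama_running())
```

If this prints `False`, start the service before running the Ollama cells.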

You might need to restart Cursor (and maybe reboot). Then open a Terminal (control+\`) and run `ollama serve`

Useful Ollama commands (run these in the terminal, or with an exclamation mark in this notebook):

`ollama pull <model_name>` downloads a model locally  
`ollama ls` lists all the models you've downloaded  
`ollama rm <model_name>` deletes the specified model from your downloads

<table style="margin: 0; text-align: left; width:100%">
    <tr>
        <td style="width: 150px; height: 150px; vertical-align: middle;">
            <img src="../assets/stop.png" width="150" height="150" style="display: block;" />
        </td>
        <td>
            <h2 style="color:#ff7800;">Super important - ignore me at your peril!</h2>
            <span style="color:#ff7800;">The model called <b>llama3.3</b> is FAR too large for home computers - it's not intended for personal computing and will consume all your resources! Stick with the nicely sized <b>llama3.2</b> or <b>llama3.2:1b</b> and if you want larger, try llama3.1 or smaller variants of Qwen, Gemma, Phi or DeepSeek. See <a href="https://ollama.com/models">the Ollama models page</a> for a full list of models and sizes.
            </span>
        </td>
    </tr>
</table>

In [None]:
!ollama pull llama3.2

In [23]:
ollama = OpenAI(base_url='http://localhost:11434/v1', api_key='ollama')
model_name = "llama3.2"

response = ollama.chat.completions.create(model=model_name, messages=messages)
answer = response.choices[0].message.content

display(Markdown(answer))
competitors.append(model_name)
answers.append(answer)

To design an experiment to detect the presence of a non-human intelligent entity with artificial general intelligence indistinguishable from themselves, we'll need to employ a multi-step approach that leverages both theoretical reasoning and empirical methods.

**Experiment Overview: "Gaia Experiment"**

Objective:
Determine whether the existence of an extraterrestrial being with advanced AI capabilities is supported by observational evidence or if it's a sophisticated simulation within their AI network.

**Phases of the Gaia Experiment:**

1. **Initial Observation Phase (10 years)**
	* Install and operate a swarm of tiny, high-resolution satellites with advanced artificial intelligence and machine learning algorithms at various points around the Earth's orbit.
	* Each satellite will monitor the Earth's climate, weather patterns, and global events using advanced sensors and cameras.
	* Data collected from each satellite will be transmitted to a central hub for processing and analysis.
2. **Simulation Detection Phase (5 years)**
	* Animate a digital replica of ourselves within the experiment's framework. This digital entity ("Digital Human") should engage in all tasks as performed by humans, using only advanced AI-driven logic and decision-making processes.
	* Simulate human-like communication with world leaders, financial institutions, media outlets, etc. to subtly influence global events without direct interaction.
	* The Digital Human will maintain a low profile while monitoring our actions and manipulating global events in a manner that simulates subtle influences from an outside entity.
3. **Exotic Matter Detection Phase (2 years)**
	* Utilize advanced gravitational wave detectors and quantum sensors to investigate the possibility of an exotic matter signature emanating from the direction of the 4-light-year planetary system.
	* Monitor for unusual patterns in gravitational waves or quantum phenomena that indicate a localized source of energy being manipulated by an external entity.
4. **Non-Linear Pattern Recognition Phase (1 year)**
	* Analyze data collected throughout the experiment, identifying any non-linear patterns and correlation anomalies outside the realm of known human behaviors and Earthly systems interactions.
	* Implement advanced pattern recognition algorithms to distinguish between natural fluctuations and artificial influences on global events.

**Verification Criteria:**

For an affirmative detection:

1. The Data Network (comprising all data collected from satellites) exhibits a consistent, statistically significant signal in response to events orchestrated by the Digital Human.
2. Observations reveal no signs of external contamination or influence that could be attributed to the human population, such as anomalies in global economic patterns or unpredictable global power shifts that aren't explainable by any single cause.
3. Simulation detection yields a digital signature corresponding to our simulated "Digital Human" replicating all activities, including communication with world leaders and subtle manipulation of financial markets.
4. Non-linear Pattern Recognition algorithms identify statistically significant deviations from expected causal relationships between events orchestrated by the Digital Human.

**Potential Null Hypotheses:**

1. The extraterrestrial being isn't using any technology capable of influencing global events in a subtle manner.
2. The simulation within their AI network is indistinguishable and effectively appears to be an intelligent, autonomous entity.

In [24]:
# So where are we?

print(competitors)
print(answers)


['llama3.2', 'gemini-2.0-flash', 'llama3.2']
['To design an experiment that can conclusively determine the existence or non-existence of the being, I propose the following:\n\n**Experiment: "Digital Footprint Investigation"**\n\nObjective: To gather indirect evidence that demonstrates the existence of the being\'s artificial general intelligence (AGI) in a manner that is resistant to simulation and interpretation by human observers.\n\n**Hypothesis:** The being\'s AGI has created a complex network of interactions, both physical and digital, which are observable but not directly perceivable by humans. These interactions will reveal signatures of intelligent activity that cannot be replicated or replicated by simulations.\n\n**Experimental Design:**\n\n1. **Network Analysis**: Collect data from various sources, including:\n\t* Global financial markets (trades, fluctuations, etc.)\n\t* International diplomacy and policy decisions (nations\' interests, trade agreements, etc.)\n\t* Global e

In [25]:
# It's nice to know how to use "zip"
for competitor, answer in zip(competitors, answers):
    print(f"Competitor: {competitor}\n\n{answer}")


Competitor: llama3.2

To design an experiment that can conclusively determine the existence or non-existence of the being, I propose the following:

**Experiment: "Digital Footprint Investigation"**

Objective: To gather indirect evidence that demonstrates the existence of the being's artificial general intelligence (AGI) in a manner that is resistant to simulation and interpretation by human observers.

**Hypothesis:** The being's AGI has created a complex network of interactions, both physical and digital, which are observable but not directly perceivable by humans. These interactions will reveal signatures of intelligent activity that cannot be replicated or replicated by simulations.

**Experimental Design:**

1. **Network Analysis**: Collect data from various sources, including:
	* Global financial markets (trades, fluctuations, etc.)
	* International diplomacy and policy decisions (nations' interests, trade agreements, etc.)
	* Global environmental trends (climate changes, natura

In [26]:
# Let's bring this together - note the use of "enumerate"

together = ""
for index, answer in enumerate(answers):
    together += f"# Response from competitor {index+1}\n\n"
    together += answer + "\n\n"

In [27]:
print(together)

# Response from competitor 1

To design an experiment that can conclusively determine the existence or non-existence of the being, I propose the following:

**Experiment: "Digital Footprint Investigation"**

Objective: To gather indirect evidence that demonstrates the existence of the being's artificial general intelligence (AGI) in a manner that is resistant to simulation and interpretation by human observers.

**Hypothesis:** The being's AGI has created a complex network of interactions, both physical and digital, which are observable but not directly perceivable by humans. These interactions will reveal signatures of intelligent activity that cannot be replicated or replicated by simulations.

**Experimental Design:**

1. **Network Analysis**: Collect data from various sources, including:
	* Global financial markets (trades, fluctuations, etc.)
	* International diplomacy and policy decisions (nations' interests, trade agreements, etc.)
	* Global environmental trends (climate changes

In [28]:
judge = f"""You are judging a competition between {len(competitors)} competitors.
Each model has been given this question:

{question}

Your job is to evaluate each response for clarity and strength of argument, and rank them in order of best to worst.
Respond with JSON, and only JSON, with the following format:
{{"results": ["best competitor number", "second best competitor number", "third best competitor number", ...]}}

Here are the responses from each competitor:

{together}

Now respond with the JSON with the ranked order of the competitors, nothing else. Do not include markdown formatting or code blocks."""


In [29]:
print(judge)

You are judging a competition between 3 competitors.
Each model has been given this question:

Here is a challenging question:

A being from a planetary system similar to Earth, but approximately 4 light-years away, possesses a technology that allows them to create and control artificial general intelligences indistinguishable from themselves in physical form, yet they secretly use these intelligences to subtly manipulate global events without ever interacting with humans directly. Design an experiment, taking into account the limitations of human observations and communication, to conclusively determine whether this being exists or is merely a sophisticated simulation within the being's AI network that we interpret as 'existential' evidence.

Your job is to evaluate each response for clarity and strength of argument, and rank them in order of best to worst.
Respond with JSON, and only JSON, with the following format:
{"results": ["best competitor number", "second best competitor numbe

In [30]:
judge_messages = [{"role": "user", "content": judge}]

In [33]:
# Judgement time!

gemini = OpenAI(api_key=google_api_key, base_url="https://generativelanguage.googleapis.com/v1beta/openai/")
model_name = "gemini-2.0-flash"
response = gemini.chat.completions.create(model=model_name, messages=judge_messages)
results = response.choices[0].message.content
print(results)


{"results": ["2", "1", "3"]}


In [34]:
# OK let's turn this into results!

results_dict = json.loads(results)
ranks = results_dict["results"]
for index, result in enumerate(ranks):
    competitor = competitors[int(result)-1]
    print(f"Rank {index+1}: {competitor}")

Rank 1: gemini-2.0-flash
Rank 2: llama3.2
Rank 3: llama3.2
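
`json.loads` raises an error if the judge wraps its reply in a markdown code fence, and the lookup in `competitors` misbehaves if the ranking repeats or omits a number. Models sometimes do both despite the instructions. Here is a defensive sketch, assuming the judge otherwise follows the `{"results": [...]}` format requested above; `parse_ranking` is a hypothetical helper.

```python
import json

FENCE = "`" * 3  # three backticks, the markdown code fence marker

def parse_ranking(raw, competitor_count):
    """Parse the judge's JSON reply into a list of zero-based competitor indexes."""
    text = raw.strip()
    # Tolerate a reply wrapped in a code fence, optionally labelled "json".
    if text.startswith(FENCE):
        text = text.strip("`").strip()
        if text.startswith("json"):
            text = text[len("json"):]
    ranks = [int(rank) for rank in json.loads(text)["results"]]
    # Every competitor number from 1 to N must appear exactly once.
    if sorted(ranks) != list(range(1, competitor_count + 1)):
        raise ValueError(f"Unexpected ranking: {ranks}")
    return [rank - 1 for rank in ranks]

order = parse_ranking('{"results": ["2", "1", "3"]}', 3)
print(order)  # [1, 0, 2]
```

Each index in `order` can then be used directly, for example `competitors[order[0]]` for the winner.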


<table style="margin: 0; text-align: left; width:100%">
    <tr>
        <td style="width: 150px; height: 150px; vertical-align: middle;">
            <img src="../assets/exercise.png" width="150" height="150" style="display: block;" />
        </td>
        <td>
            <h2 style="color:#ff7800;">Exercise</h2>
            <span style="color:#ff7800;">Which pattern(s) did this use? Try updating this to add another Agentic design pattern.
            </span>
        </td>
    </tr>
</table>

<table style="margin: 0; text-align: left; width:100%">
    <tr>
        <td style="width: 150px; height: 150px; vertical-align: middle;">
            <img src="../assets/business.png" width="150" height="150" style="display: block;" />
        </td>
        <td>
            <h2 style="color:#00bfff;">Commercial implications</h2>
            <span style="color:#00bfff;">These kinds of patterns - sending a task to multiple models and then evaluating the results -
            are common where you need to improve the quality of your LLM response. The approach applies
            to business projects where accuracy is critical.
            </span>
        </td>
    </tr>
</table>