"""

This script generates a challenging question using OpenAI, collects answers 
from multiple LLMs (OpenAI, Google Gemini, Groq, Ollama), and ranks them using 
an OpenAI model as a judge. It demonstrates multi-agent coordination, API 
integration, automated answer collection, and LLM evaluation workflows.
"""


In [1]:
import json
import os

from dotenv import load_dotenv
from IPython.display import Markdown, display
from openai import OpenAI

# Load API keys from a local .env file, overriding values already in the environment
load_dotenv(override=True)


True

In [6]:
openai_api_key = os.getenv('OPENAI_API_KEY')
google_api_key = os.getenv('GOOGLE_API_KEY')
groq_api_key = os.getenv('GROQ_API_KEY')

if openai_api_key:
    print(f"OpenAI API Key exists and begins with {openai_api_key[:5]}")
else:
    print("OpenAI API Key not set")

if google_api_key:
    print(f"Google API Key exists and begins with {google_api_key[:5]}")
else:
    print("Google API Key not set")

if groq_api_key:
    print(f"groq api key exists and begins with {groq_api_key[:5]}")
else:
    print("Groq API Key not set")

OpenAI API Key exists and begins with sk-pr
Google API Key exists and begins with AIzaS
groq api key exists and begins with gsk_V
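
The three checks above follow one pattern, so they can be folded into a loop. This is a minimal sketch, not part of the original notebook: the helper name `describe_key` and its `visible` parameter are hypothetical, and the environment variable names are the ones read above.

```python
import os


def describe_key(label, value, visible=5):
    """Return a status line for an API key, showing only its first few characters."""
    if value:
        return f"{label} API Key exists and begins with {value[:visible]}"
    return f"{label} API Key not set"


# Environment variable names match the ones the notebook reads above.
for label, env_name in [
    ("OpenAI", "OPENAI_API_KEY"),
    ("Google", "GOOGLE_API_KEY"),
    ("Groq", "GROQ_API_KEY"),
]:
    print(describe_key(label, os.getenv(env_name)))
```

Every provider gets the same "not set" branch this way, so a missing key is never skipped silently.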


In [7]:
request = "Please come up with a challenging, nuanced question that I can ask a number of LLMs to evaluate their intelligence. "
request += "Answer only with the question, no explanation."

messages = [{"role": "user", "content": request}]
messages

[{'role': 'user',
  'content': 'Please come up with a challenging, nuanced question that I can ask a number of LLMs to evaluate their intelligence. Answer only with the question, no explanation.'}]

In [8]:
# Create an OpenAI client (it reads OPENAI_API_KEY from the environment)
# and ask it to generate the question

openai = OpenAI()
response=openai.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=messages,
)
question=response.choices[0].message.content
print(question)


What is the significance of Gödel's incompleteness theorems in the field of mathematics?


In [9]:
messages=[{"role":"user","content":question}]

answers=[]
competitors=[]

In [10]:
# Generate answers with several providers: OpenAI, Google Gemini, Groq, and Ollama

# OpenAI

model_name = "gpt-3.5-turbo"

response = openai.chat.completions.create(
    model=model_name,
    messages=messages,
)

answer = response.choices[0].message.content


In [11]:
display(Markdown(answer))

answers.append(answer)
competitors.append(model_name)

Gödel's incompleteness theorems are considered to be some of the most important results in the field of mathematics. They have had a profound impact on the philosophy of mathematics, logic, and computer science.

The first incompleteness theorem states that in any formal system that is sufficiently complex to represent basic arithmetic, there will always be statements that are true but cannot be proven within that system. This result has challenged the idea of complete and consistent mathematical theories.

The second incompleteness theorem states that no consistent formal system can prove its own consistency. This has important implications for the foundations of mathematics, as it implies that there will always be statements that are undecidable within a given system.

Overall, Gödel's incompleteness theorems have shown that there are limits to what can be achieved with formal systems of mathematics. They have inspired new approaches to foundational questions in mathematics and have sparked debates about the nature of mathematical truth and reasoning. They have also had practical implications in computer science, as they have led to the development of formal methods for verifying the correctness of computer programs.

In [12]:
# Gemini, via Google's OpenAI-compatible endpoint

gemini = OpenAI(api_key=google_api_key, base_url="https://generativelanguage.googleapis.com/v1beta/openai/")

model_name = "gemini-2.5-flash"

response = gemini.chat.completions.create(
    model=model_name,
    messages=messages,
)

answer = response.choices[0].message.content

In [13]:
display(Markdown(answer))
answers.append(answer)
competitors.append(model_name)

Gödel's Incompleteness Theorems, published by Kurt Gödel in 1931, are arguably some of the most profound and far-reaching results in the history of mathematics and logic. They fundamentally reshaped our understanding of what mathematics is and what its limits are.

Here's a breakdown of their significance:

1.  **Shattering Hilbert's Program (The End of a Dream):**
    *   **Hilbert's Vision:** In the early 20th century, mathematician David Hilbert proposed a grand program to axiomatize all of mathematics. His goal was to find a *complete*, *consistent*, and *decidable* set of axioms for mathematics.
        *   **Complete:** Every true statement could be proven from the axioms.
        *   **Consistent:** No contradictions could be derived from the axioms (you couldn't prove both a statement and its negation).
        *   **Decidable:** There would be an algorithm to determine whether any given statement was provable or not.
    *   **Gödel's Blow:** Gödel's theorems dealt a decisive blow to this foundationalist program. They showed that such a system, encompassing arithmetic, is impossible.

2.  **The First Incompleteness Theorem: Inherent Limits of Formal Systems:**
    *   **What it states:** For any consistent formal system (like arithmetic or set theory) that is powerful enough to include basic arithmetic, there will always be true statements that cannot be proven *within that system*. These are called "undecidable" statements.
    *   **Significance:**
        *   **Truth vs. Provability:** It clearly distinguishes between *truth* and *provability*. A statement can be objectively true, but not provable from a given set of axioms.
        *   **No "Final" Axiomatic System:** It implies that no single, finite, consistent set of axioms can ever capture all mathematical truths, even for something as fundamental as arithmetic. There will always be truths that lie beyond the reach of any particular formal system.
        *   **The Gödel Sentence:** Gödel constructed a self-referential statement (often paraphrased as "This statement cannot be proven within this system"). If it *were* provable, it would be false, leading to a contradiction. Therefore, it must be unprovable; and if it's unprovable, then the statement itself is true.

3.  **The Second Incompleteness Theorem: Inability to Prove Own Consistency:**
    *   **What it states:** For any consistent formal system that is powerful enough to include basic arithmetic, the consistency of the system cannot be proven *within the system itself*.
    *   **Significance:**
        *   **No Self-Justification:** To prove a system's consistency, one needs a meta-system that is "stronger" or at least equally powerful, and whose own consistency must be assumed. This prevents a "self-contained" foundation for mathematics where a system rigorously proves its own reliability.
        *   **Impact on Foundations:** It means we can never be *absolutely certain* that our most fundamental mathematical systems (like ZFC set theory, the current foundation for most of mathematics) are entirely free of contradictions, *without assuming the consistency of an even stronger system*.

4.  **Broader Philosophical and Mathematical Impact:**
    *   **Nature of Mathematical Knowledge:** The theorems profoundly shifted our understanding of mathematical knowledge. They suggest that mathematical intuition and creativity play an irreducible role, as no formal system can fully capture all of mathematics.
    *   **Stimulated New Research:** Gödel's work didn't destroy mathematics; rather, it opened up entirely new fields and lines of inquiry, such as proof theory (investigating what can be proven with various formal systems) and model theory (studying the relationship between formal theories and their interpretations).
    *   **Continuum Hypothesis:** Later work by Paul Cohen (showing the independence of the Continuum Hypothesis from ZFC) demonstrated another aspect of incompleteness: some questions are simply undecidable within our standard axiomatic framework.
    *   **Limits of Computation/AI:** While not directly about computation, the ideas resonate with the limits of algorithmic processes (e.g., Turing's Halting Problem, which has strong parallels to Gödel's work). It suggests inherent limits to what any purely mechanical or algorithmic system can achieve in terms of knowledge acquisition and self-verification.

In essence, Gödel's Incompleteness Theorems showed that mathematics, despite its logical rigor, is fundamentally open-ended. It revealed inherent limits to formalization and demonstrated that the quest for a completely self-contained and exhaustive foundation for all mathematical truth is unattainable. This revelation, rather than diminishing mathematics, highlighted its richness, complexity, and the persistent need for human insight beyond mere algorithmic deduction.

In [14]:
# Groq, via its OpenAI-compatible endpoint
groq = OpenAI(api_key=groq_api_key, base_url="https://api.groq.com/openai/v1")

model_name ="llama-3.3-70b-versatile"

response =groq.chat.completions.create(
    model=model_name,
    messages=messages
)

answer = response.choices[0].message.content

In [15]:
answers.append(answer)
competitors.append(model_name)
display(Markdown(answer))

Gödel's incompleteness theorems are two theorems in mathematical logic that have had a profound impact on the foundations of mathematics. They were first proposed by Austrian mathematician Kurt Gödel in 1931 and have since been widely accepted as a fundamental aspect of mathematical theory.

**The First Incompleteness Theorem:**

The first incompleteness theorem states that any formal system that is powerful enough to describe basic arithmetic is either incomplete or inconsistent. In other words, if a system is consistent, it must be incomplete, and if it is complete, it must be inconsistent.

To understand this, consider a formal system as a set of axioms and rules for deducing theorems from those axioms. The system is said to be complete if every statement that can be expressed in the system can be either proved or disproved within the system. The system is said to be consistent if it is not possible to prove both a statement and its negation within the system.

Gödel showed that any formal system that is powerful enough to describe basic arithmetic (such as Peano arithmetic) cannot be both complete and consistent. This means that there must be statements in the system that cannot be proved or disproved within the system, and these statements are known as undecidable statements.

**The Second Incompleteness Theorem:**

The second incompleteness theorem states that if a formal system is consistent, then it cannot prove its own consistency. In other words, a consistent system cannot prove that it is consistent.

This theorem has far-reaching implications for the foundations of mathematics. It means that any attempt to prove the consistency of a formal system using only the axioms and rules of that system is doomed to fail. Instead, the consistency of the system must be assumed as an axiom, or it must be proved using a more powerful system that is not subject to the limitations of the original system.

**Significance of Gödel's Incompleteness Theorems:**

Gödel's incompleteness theorems have had a profound impact on the foundations of mathematics, and their significance can be summarized as follows:

1. **Limits of Formal Systems:** Gödel's theorems show that formal systems have limitations and that there are statements that cannot be proved or disproved within a given system.
2. **Undecidability:** The theorems introduce the concept of undecidability, which means that there are statements that cannot be proved or disproved within a given system.
3. **Consistency and Completeness:** The theorems highlight the trade-off between consistency and completeness in formal systems.
4. **Foundations of Mathematics:** Gödel's theorems have led to a re-evaluation of the foundations of mathematics, with a greater emphasis on rigor and formalism.
5. **Philosophical Implications:** The theorems have implications for the philosophy of mathematics, particularly with regards to the nature of truth and the limits of human knowledge.

**Impact on Mathematics:**

Gödel's incompleteness theorems have had a significant impact on various areas of mathematics, including:

1. **Number Theory:** The theorems have led to a greater understanding of the limitations of formal systems in number theory.
2. **Logic:** The theorems have influenced the development of mathematical logic, particularly with regards to the study of formal systems and undecidability.
3. **Computer Science:** The theorems have implications for the study of computability and the limits of algorithms.
4. **Cryptography:** The theorems have been used to develop secure cryptographic protocols, such as public-key cryptography.

In conclusion, Gödel's incompleteness theorems are a fundamental aspect of mathematical theory, and their significance cannot be overstated. They have had a profound impact on our understanding of the foundations of mathematics, the limits of formal systems, and the nature of truth and knowledge.

`ollama pull <model_name>` downloads a model locally  
`ollama ls` lists all the models you've downloaded 

In [16]:
# Ollama, running locally and serving an OpenAI-compatible API

ollama = OpenAI(base_url="http://127.0.0.1:11434/v1", api_key="ollama")

model_name = "llama3.1:8b"

response = ollama.chat.completions.create(
    model=model_name,
    messages=messages,
)

answer = response.choices[0].message.content



In [17]:
display(Markdown(answer))

answers.append(answer)
competitors.append(model_name)

Gödel's incompleteness theorems, proved by Kurt Gödel in 1931, are considered one of the most significant results in the history of mathematics. They have far-reaching implications for various areas of mathematics and philosophy.

**The Incompleteness Theorem:**

Gödel's first incompleteness theorem states that any formal system (such as a mathematical axiomatic system) powerful enough to describe basic arithmetic is either:

1. **Incomplete**: It cannot prove all true statements in the language, or
2. **Inconsistent**: If it proves all true statements, then it must also prove a false statement within the same system.

**The Completeness Theorem:**

Gödel's second incompleteness theorem states that if a formal system is powerful enough to describe basic arithmetic and is consistent, then:

* It cannot be proved to be consistent within that system itself.
* There will always be statements (theorems) in the system that can neither be proved nor disproved.

**Significance:**

Gödel's incompleteness theorems have profound implications for mathematics and philosophy. Some key consequences include:

1. **Limits of formalism**: Gödel's results demonstrate that it is impossible to capture all mathematical truth with a set of formal rules, axioms, or algorithms within an effectively computable system.
2. **Undecidability**: There are undecidable statements in mathematics, meaning they cannot be either proved or disproved within a given formal system.
3. **Consistency vs. Completeness**: The theorem shows that an inconsistent system can prove both true and false statements, but a consistent system cannot prove its own consistency.
4. **Impact on Foundations of Mathematics**: Gödel's incompleteness theorems challenge the idea of mathematical foundations as "complete" or "consistent," which had been an ideal since Hilbert's Program in the 19th century.

**Key Implications:**

Gödel's incompleteness theorems have influenced various areas of mathematics, philosophy, and computer science. Some notable implications include:

1. **Cognitive Science**: Challenges computationalism (the idea that mental processes can be reduced to algorithms) and raises questions about the limits of knowledge.
2. **Mathematical Philosophy**: Gödel's results have triggered a reevaluation of mathematical philosophy, highlighting the difference between syntax and semantics in mathematical reasoning.
3. **Computability Theory**: Incompleteness has influenced our understanding of computation and the limitations of algorithms.

**Legacy:**

Gödel's incompleteness theorems are considered fundamental contributions to mathematics, computer science, and philosophy. They have:

1. **Transformed how we think about formal systems**.
2. **Inspired new areas of research**, such as model theory and modal logic.
3. **Influenced subsequent mathematical theories**, such as category theory, homotopy type theory, and intuitionistic mathematics.

The significance of Gödel's incompleteness theorems continues to inspire ongoing debates in mathematics, philosophy, and computer science. They remain a cornerstone for challenging foundational assumptions and a stimulus for new developments in these fields.
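
The four provider cells above repeat the same steps: call `chat.completions.create`, read `choices[0].message.content`, then append to `answers` and `competitors`. A minimal sketch of that pattern as one helper follows. The name `collect_answer` is hypothetical, and the fake client exists only so the sketch runs offline; in the notebook the `openai`, `gemini`, `groq`, or `ollama` client would be passed instead.

```python
from types import SimpleNamespace


def collect_answer(client, model_name, messages, answers, competitors):
    """Ask one model the question and record its answer and name together."""
    response = client.chat.completions.create(model=model_name, messages=messages)
    answer = response.choices[0].message.content
    answers.append(answer)
    competitors.append(model_name)
    return answer


# Stand-in client for an offline demo; it mimics only the attributes used above.
def _fake_create(model, messages):
    text = f"[{model}] reply to: {messages[0]['content']}"
    return SimpleNamespace(choices=[SimpleNamespace(message=SimpleNamespace(content=text))])


fake_client = SimpleNamespace(
    chat=SimpleNamespace(completions=SimpleNamespace(create=_fake_create))
)

demo_answers, demo_competitors = [], []
collect_answer(
    fake_client,
    "demo-model",
    [{"role": "user", "content": "ping"}],
    demo_answers,
    demo_competitors,
)
print(demo_competitors, demo_answers)
```

Appending the answer and the model name in the same function keeps the two lists aligned, which the ranking step later relies on.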

In [18]:
print(competitors)
print(answers)

['gpt-3.5-turbo', 'gemini-2.5-flash', 'llama-3.3-70b-versatile', 'llama3.1:8b']
["Gödel's incompleteness theorems are considered to be some of the most important results in the field of mathematics. They have had a profound impact on the philosophy of mathematics, logic, and computer science.\n\nThe first incompleteness theorem states that in any formal system that is sufficiently complex to represent basic arithmetic, there will always be statements that are true but cannot be proven within that system. This result has challenged the idea of complete and consistent mathematical theories.\n\nThe second incompleteness theorem states that no consistent formal system can prove its own consistency. This has important implications for the foundations of mathematics, as it implies that there will always be statements that are undecidable within a given system.\n\nOverall, Gödel's incompleteness theorems have shown that there are limits to what can be achieved with formal systems of mathemati

In [19]:
for competitor, answer in zip(competitors, answers):
    print(f"competitor:{competitor}\n\n\n{answer}\n")

competitor:gpt-3.5-turbo


Gödel's incompleteness theorems are considered to be some of the most important results in the field of mathematics. They have had a profound impact on the philosophy of mathematics, logic, and computer science.

The first incompleteness theorem states that in any formal system that is sufficiently complex to represent basic arithmetic, there will always be statements that are true but cannot be proven within that system. This result has challenged the idea of complete and consistent mathematical theories.

The second incompleteness theorem states that no consistent formal system can prove its own consistency. This has important implications for the foundations of mathematics, as it implies that there will always be statements that are undecidable within a given system.

Overall, Gödel's incompleteness theorems have shown that there are limits to what can be achieved with formal systems of mathematics. They have inspired new approaches to foundational questio

In [20]:
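together = ""

# Number the answers instead of naming the models, so the judge cannot see which model wrote what
for index, answer in enumerate(answers):
    together += f"# Response from competitor {index+1}\n\n"
    together += f"{answer} \n\n\n\n"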
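
The concatenation above can also be written with `str.join`, which avoids repeated `+=` on a growing string. This is a sketch only: `build_together` is a hypothetical helper, and its output is not byte-for-byte identical to the notebook's string (for one, it leaves no trailing blank lines after the last answer).

```python
def build_together(answers):
    """Concatenate answers under numbered headings, one section per competitor."""
    sections = [
        f"# Response from competitor {index}\n\n{answer}"
        for index, answer in enumerate(answers, start=1)
    ]
    return "\n\n".join(sections)


print(build_together(["First answer.", "Second answer."]))
```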


In [21]:
print(together)

# Response from competitor 1

Gödel's incompleteness theorems are considered to be some of the most important results in the field of mathematics. They have had a profound impact on the philosophy of mathematics, logic, and computer science.

The first incompleteness theorem states that in any formal system that is sufficiently complex to represent basic arithmetic, there will always be statements that are true but cannot be proven within that system. This result has challenged the idea of complete and consistent mathematical theories.

The second incompleteness theorem states that no consistent formal system can prove its own consistency. This has important implications for the foundations of mathematics, as it implies that there will always be statements that are undecidable within a given system.

Overall, Gödel's incompleteness theorems have shown that there are limits to what can be achieved with formal systems of mathematics. They have inspired new approaches to foundational que

In [22]:
print(len(competitors))

4


In [23]:
judge = f"""
You are judging a competition between {len(competitors)} competitors.
Each model has been given this question:

{question}

Respond with JSON and only with JSON,with the following format
{{"results":['first best competitor number','second best competitor number',...]}}

Here are the responses from each competitor:

{together}

Now respond with JSON and nothing else. Do not use markdown formatting or code blocks.

"""

In [24]:
print(judge)


You are judging a competition between 4 competitors.
Each model has been given this question:

What is the significance of Gödel's incompleteness theorems in the field of mathematics?

Respond with JSON and only with JSON,with the following format
{"results":['first best competitor number','second best competitor number',...]}

Here are the responses from each competitor:

# Response from competitor 1

Gödel's incompleteness theorems are considered to be some of the most important results in the field of mathematics. They have had a profound impact on the philosophy of mathematics, logic, and computer science.

The first incompleteness theorem states that in any formal system that is sufficiently complex to represent basic arithmetic, there will always be statements that are true but cannot be proven within that system. This result has challenged the idea of complete and consistent mathematical theories.

The second incompleteness theorem states that no consistent formal system can
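
The format example in the judge prompt above uses single-quoted strings, which are not valid JSON, even though the judge returned valid JSON in this run. One way to make the example unambiguous is to generate it with `json.dumps`. The following is a sketch under that assumption: `build_judge_prompt` is a hypothetical helper, and its wording paraphrases the notebook's prompt rather than reproducing it.

```python
import json


def build_judge_prompt(question, together, n_competitors):
    """Assemble a judge prompt whose format example is itself valid JSON."""
    # json.dumps emits double-quoted strings, so the example parses as JSON.
    example = json.dumps(
        {"results": ["best competitor number", "second best competitor number"]}
    )
    return (
        f"You are judging a competition between {n_competitors} competitors.\n"
        f"Each model has been given this question:\n\n{question}\n\n"
        "Respond with JSON, and only JSON, in the following format "
        "(one entry per competitor, best first):\n"
        f"{example}\n\n"
        f"Here are the responses from each competitor:\n\n{together}\n\n"
        "Now respond with JSON and nothing else. "
        "Do not use markdown formatting or code blocks."
    )


demo_prompt = build_judge_prompt("What is 2 + 2?", "# Response from competitor 1\n\n4", 1)
print(demo_prompt)
```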

In [32]:
judge_messages =[{"role":"user","content":judge}]

In [33]:
print(judge_messages)

[{'role': 'user', 'content': '\nYou are judging a competition between 4 competitors.\nEach model has been given this question:\n\nWhat is the significance of Gödel\'s incompleteness theorems in the field of mathematics?\n\nRespond with JSON and only with JSON,with the following format\n{"results":[\'first best competitor number\',\'second best competitor number\',...]}\n\nHere are the responses from each competitor:\n\n# Response from competitor 1\n\nGödel\'s incompleteness theorems are considered to be some of the most important results in the field of mathematics. They have had a profound impact on the philosophy of mathematics, logic, and computer science.\n\nThe first incompleteness theorem states that in any formal system that is sufficiently complex to represent basic arithmetic, there will always be statements that are true but cannot be proven within that system. This result has challenged the idea of complete and consistent mathematical theories.\n\nThe second incompletenes

In [34]:
# Use an OpenAI model as the judge
openai = OpenAI()
content = openai.chat.completions.create(
    model="o3-mini",  # a pricier model, chosen here for better accuracy
    messages=judge_messages,
)

result = content.choices[0].message.content

In [35]:
print(result)

{"results": ["2", "4", "3", "1"]}


In [38]:
print(competitors)

['gpt-3.5-turbo', 'gemini-2.5-flash', 'llama-3.3-70b-versatile', 'llama3.1:8b']


In [None]:
# Parse the judge's JSON reply; "results" holds competitor numbers as strings, best first
result_dict = json.loads(result)
print(result_dict)
ranks = result_dict["results"]
print(ranks)


{'results': ['2', '4', '3', '1']}
['2', '4', '3', '1']
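
`json.loads` raises an error if the judge wraps its reply in a markdown code fence, and it does not check that every competitor appears exactly once. A more defensive parse might look like the sketch below. `parse_ranking` is a hypothetical helper, not part of the notebook, and the fence handling covers only the simple case of one fenced block.

```python
import json


def parse_ranking(raw, n_competitors):
    """Parse the judge's reply into a list of 1-based competitor numbers, best first."""
    fence = "`" * 3
    text = raw.strip()
    # Tolerate a markdown code fence even though the prompt asks for none.
    if text.startswith(fence):
        text = text.strip("`").strip()
        if text.lower().startswith("json"):
            text = text[4:]
    ranks = [int(r) for r in json.loads(text)["results"]]
    # A valid ranking mentions each competitor number exactly once.
    if sorted(ranks) != list(range(1, n_competitors + 1)):
        raise ValueError(f"Expected each of 1..{n_competitors} exactly once, got {ranks}")
    return ranks


print(parse_ranking('{"results": ["2", "4", "3", "1"]}', 4))
```

Validating the ranking before indexing into `competitors` turns a malformed reply into a clear error instead of an `IndexError` or a silently wrong leaderboard.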


In [48]:
print(competitors)

['gpt-3.5-turbo', 'gemini-2.5-flash', 'llama-3.3-70b-versatile', 'llama3.1:8b']


In [52]:
for index, rank in enumerate(ranks):
    competitor = competitors[int(rank) - 1]  # ranks are 1-based competitor numbers
    print(f"rank:{index+1}\tmodel_name: {competitor}")

rank:1	model_name: gemini-2.5-flash
rank:2	model_name: llama3.1:8b
rank:3	model_name: llama-3.3-70b-versatile
rank:4	model_name: gpt-3.5-turbo
