In [14]:
import json
import os
from dotenv import load_dotenv
from openai import OpenAI
from IPython.display import Markdown, display

In [15]:
load_dotenv(override=True)

True

In [16]:
openai_api_key = os.getenv("OPENAI_API_KEY")
google_api_key = os.getenv("GOOGLE_API_KEY")

if openai_api_key:
    print(f"OpenAI API Key: {openai_api_key[:8]}")
else:
    print("OpenAI API Key not found")

if google_api_key:
    print(f"Google API Key: {google_api_key[:8]}")
else:
    print("Google API Key not found")

OpenAI API Key: sk-proj-
Google API Key: AIzaSyAt
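
The two key checks above follow the same pattern, so they could be folded into one helper. A minimal sketch, assuming a hypothetical function name `describe_key` (not part of the notebook) and the same 8-character prefix used above:

```python
import os


def describe_key(label, value, visible=8):
    """Return a one-line status for an API key, showing only a short prefix."""
    # `describe_key` and `visible` are illustrative names, not from the notebook.
    if value:
        return f"{label} API Key: {value[:visible]}"
    return f"{label} API Key not found"


# Same two environment variables the notebook reads.
for label, env_var in [("OpenAI", "OPENAI_API_KEY"), ("Google", "GOOGLE_API_KEY")]:
    print(describe_key(label, os.getenv(env_var)))
```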


In [17]:
request = "Please define a complex question that can be answered by different LLMs in order to evaluate their intelligence"
messages = [{"role": "user", "content": request}]

In [18]:
openai = OpenAI()
response = openai.chat.completions.create(
    model="gpt-4o-mini",
    messages=messages
)
question = response.choices[0].message.content
print(question)

Sure! Here’s a complex question that can be used to evaluate the understanding and reasoning abilities of different LLMs:

**Question:**

"In the context of climate change, discuss the interplay between economic development, social equity, and environmental sustainability. Provide examples of current policies or initiatives in three different countries that illustrate this interplay, and analyze their effectiveness in achieving a balance among these three factors."

---

### Evaluation Criteria:

1. **Depth of Analysis**: Evaluate how thoroughly the LLM addresses the complexity of the issue, discussing the nuances of each factor.
   
2. **Use of Examples**: Look for specific, well-researched examples from credible sources that improve the answer's context and relevance.
   
3. **Critical Thinking**: Assess how well the model synthesizes the information and draws connections between economic development, social equity, and environmental sustainability, including potential trade-offs and

In [19]:
competitors = []
answers = []
messages = [{"role": "user", "content": question}]

OpenAI model

In [20]:
model_name = "gpt-4o-mini"
response = openai.chat.completions.create(
    model=model_name,
    messages=messages
)
answer = response.choices[0].message.content
display(Markdown(answer))
competitors.append(model_name)
answers.append(answer)

### Response:

The interplay between economic development, social equity, and environmental sustainability is a critical challenge in the context of climate change. Balancing these three factors requires nuanced understanding and effective policy implementation. Each factor influences the others, creating a complex web of interactions that can facilitate or hinder progress. Below, I will discuss policies and initiatives from three countries—Denmark, India, and Brazil—that illustrate this interplay and analyze their effectiveness.

#### Economic Development

Economic development often drives increased consumption of resources, contributing to environmental degradation. However, it can also provide the means for financing sustainable initiatives. For example, Denmark has invested significantly in renewable energy, particularly wind power. As a leader in wind energy production, Denmark has reduced its carbon emissions while simultaneously fostering economic growth. According to the Danish Energy Agency, over 40% of the nation's energy came from wind in 2020, highlighting the effectiveness of their green energy transition. The economic benefits have also been significant, with the wind sector providing thousands of jobs and stimulating new high-tech industries.

#### Social Equity

Social equity is essential for ensuring that all segments of society benefit from economic growth and environmental initiatives. In India, the Pradhan Mantri Ujjwala Yojana (PMUY) is a government initiative aimed at providing clean cooking fuel to women from low-income households. The initiative not only reduces air pollution from traditional biomass fuels, which can cause severe health issues, but also empowers women by saving time and improving household resources. Nevertheless, challenges remain in ensuring that this initiative leads to long-term social and economic betterment for all beneficiaries, as disparities still exist among different socioeconomic groups.

#### Environmental Sustainability

Environmental sustainability is crucial for maintaining the planet's ecosystems and resources for future generations. Brazil’s Forest Code serves as a notable example of an attempt to balance economic development with environmental conservation. The Code regulates land use in the Amazon rainforest, aiming to reduce deforestation and promote sustainable agricultural practices. While it has faced criticism for loopholes and enforcement challenges—particularly in regions where illegal logging occurs—it illustrates the difficulty of integrating economic incentives with the need for environmental protection. Moreover, successful implementation of the Forest Code can result in positive outcomes for indigenous communities that rely on the forest for their livelihoods.

### Interplay Analysis

The effectiveness of these initiatives can be analyzed through the lens of the connections between economic growth, social equity, and environmental sustainability:

1. **Denmark's Transition to Wind Energy**: Denmark's success demonstrates that economic development and environmental sustainability can go hand in hand. However, it is essential to ensure that the benefits of this transition also lead to social equity—such as ensuring job training programs are accessible to a diverse population.

2. **India's Clean Cooking Initiative**: This shows a direct link between social equity and environmental sustainability, but it raises questions about economic development—specifically, ongoing sustainability of fuel access and health benefits. The long-term effectiveness depends on continuous support and community engagement.

3. **Brazil's Forest Code**: This policy exemplifies the challenges of intertwining economic growth with environmental needs. While creating regulations that promote sustainability, the implementation reveals tensions with economic interests driven by agriculture and deforestation.

### Conclusion

In conclusion, the interplay among economic development, social equity, and environmental sustainability is multifaceted and deeply interconnected. While Denmark has demonstrated a successful model of integrating these factors through technological advancement in renewable energy, India and Brazil show the challenges that arise when social needs and environmental policies must simultaneously be addressed. Each country's specific context greatly influences the effectiveness of its policies. Ongoing evaluation and adaptive governance will be critical in achieving a sustainable balance among these three essential factors globally. 

Ultimately, a truly effective approach to climate change will recognize the necessity of valuing socio-economic diversity while harmonizing environmental stewardship, paving the way for resilient and inclusive future economies.

Gemini model

In [21]:
gemini = OpenAI(api_key=google_api_key, base_url="https://generativelanguage.googleapis.com/v1beta/openai/")
model_name = "gemini-2.0-flash"
response = gemini.chat.completions.create(
    model=model_name,
    messages=messages
)
answer = response.choices[0].message.content
display(Markdown(answer))
competitors.append(model_name)
answers.append(answer)


This is a great, multi-faceted question! It requires not just knowledge but also the ability to synthesize information, apply critical thinking, and communicate clearly. Here's a breakdown of how different LLMs might respond and how to further evaluate their answers:

**Expected Elements of a Strong Response:**

*   **Conceptual Understanding:**
    *   **Economic Development:**  Recognizes it as more than just GDP growth.  Includes aspects like innovation, job creation, infrastructure development, and poverty reduction.
    *   **Social Equity:** Understands it involves fairness in access to resources, opportunities, and rights, particularly for marginalized groups. This includes considerations like income inequality, healthcare access, education, and political participation.
    *   **Environmental Sustainability:**  Defines it as meeting the needs of the present without compromising the ability of future generations to meet their own needs.  This includes resource management, pollution reduction, biodiversity conservation, and climate change mitigation/adaptation.
    *   **Interplay:**  Recognizes the inherent tensions and synergies between these three goals.  For example:
        *   **Tensions:** Rapid economic growth can lead to environmental degradation and increased inequality.  Strict environmental regulations can sometimes hinder economic development, especially in developing countries. Policies aimed at social equity can sometimes be perceived as burdens on economic competitiveness.
        *   **Synergies:** Green technologies can drive economic growth while reducing environmental impact.  Investing in education and healthcare can improve both social equity and economic productivity.  Sustainable resource management can provide long-term economic benefits.

*   **Country-Specific Examples:**  Provides *concrete* examples of policies or initiatives, ideally mentioning specific laws, programs, or projects. The countries chosen should ideally offer diverse approaches and contexts (e.g., a developed nation, a developing nation, a resource-rich nation). For each example, the LLM should:
    *   **Describe the Policy/Initiative:**  Provide a clear summary of what the policy is designed to do.
    *   **Analyze its Effectiveness:**  This is critical. The LLM needs to go beyond simply stating the policy exists and evaluate its actual impact on economic development, social equity, and environmental sustainability. This requires citing data, studies, or reports (even if summarized).
    *   **Highlight the Trade-offs:**  Identify any potential negative consequences or unintended side effects of the policy.

*   **Critical Thinking & Synthesis:**  The LLM should be able to:
    *   **Connect the Dots:**  Explicitly explain how the chosen policies or initiatives illustrate the interplay between economic development, social equity, and environmental sustainability.
    *   **Offer a Balanced Perspective:**  Acknowledge the complexities and challenges involved in balancing these three factors.  Avoid overly simplistic or optimistic conclusions.
    *   **Suggest Improvements:**  (Optional, but highly valuable)  Propose ways to make the policies or initiatives more effective in achieving a better balance.

**How to Evaluate LLM Responses:**

1.  **Fact-Checking:**  Verify the accuracy of the information presented, especially regarding the specific policies and initiatives. Use reputable sources (e.g., government websites, academic journals, reports from international organizations like the UN, World Bank, IPCC) to confirm the details.

2.  **Source Evaluation:**  While LLMs might not explicitly cite sources (depends on the model and settings), you should be able to search for information that supports their claims. If the LLM makes a statement about the effectiveness of a policy, look for data or studies that back it up.

3.  **Depth of Understanding:**  Don't just look for the right keywords.  Assess whether the LLM truly *understands* the concepts and their interrelationships.  Look for evidence of critical thinking and nuanced analysis.  Is it just regurgitating information, or is it synthesizing it in a meaningful way?

4.  **Contextual Awareness:**  Does the LLM demonstrate an understanding of the specific economic, social, and political contexts of the countries it discusses?  For example, does it recognize that policies that work in one country may not be suitable for another?

5.  **Bias Detection:**  Be aware of potential biases in the LLM's response.  For example, does it overemphasize economic growth at the expense of social equity or environmental sustainability?  Does it present a particular political ideology as the only viable solution?

**Example Country Examples (Illustrative, you'd expect the LLM to expand on these):**

*   **Costa Rica:**  Focus on its efforts in ecotourism and reforestation.  Analyze how these initiatives have contributed to economic development, biodiversity conservation, and social equity (e.g., job creation in rural communities).  Also discuss potential challenges, such as the impact of tourism on local cultures and the potential for greenwashing.

*   **Germany:**  Examine its "Energiewende" (energy transition) policy.  Assess its impact on renewable energy adoption, carbon emissions reduction, and energy security.  Analyze the social and economic costs of the transition, including the impact on jobs in the coal industry and the potential for increased energy prices for consumers.

*   **China:**  Analyze its investments in renewable energy and its efforts to reduce air pollution in major cities.  Evaluate the effectiveness of these policies in achieving environmental sustainability.  Also consider the social and economic trade-offs, such as the potential for increased inequality and the impact on economic growth.

By carefully evaluating the LLM's response using these criteria, you can gain a valuable understanding of its capabilities and limitations.  Remember to focus on the depth of analysis, the quality of the examples, and the evidence of critical thinking. Good luck!


Ollama model

In [None]:
!ollama pull llama3.2

In [23]:
ollama = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")
model_name = "llama3.2"
response = ollama.chat.completions.create(
    model=model_name,
    messages=messages
)
# Read the new answer before displaying it; otherwise the previous model's answer is shown.
answer = response.choices[0].message.content
display(Markdown(answer))
competitors.append(model_name)
answers.append(answer)



In [24]:
print(competitors)
print(answers)

['gpt-4o-mini', 'gemini-2.0-flash', 'llama3.2']
["### Response:\n\nThe interplay between economic development, social equity, and environmental sustainability is a critical challenge in the context of climate change. Balancing these three factors requires nuanced understanding and effective policy implementation. Each factor influences the others, creating a complex web of interactions that can facilitate or hinder progress. Below, I will discuss policies and initiatives from three countries—Denmark, India, and Brazil—that illustrate this interplay and analyze their effectiveness.\n\n#### Economic Development\n\nEconomic development often drives increased consumption of resources, contributing to environmental degradation. However, it can also provide the means for financing sustainable initiatives. For example, Denmark has invested significantly in renewable energy, particularly wind power. As a leader in wind energy production, Denmark has reduced its carbon emissions while simultane

In [25]:
for competitor, answer in zip(competitors, answers):
    print(f"Competitor: {competitor}\n\n{answer}")

Competitor: gpt-4o-mini

### Response:

The interplay between economic development, social equity, and environmental sustainability is a critical challenge in the context of climate change. Balancing these three factors requires nuanced understanding and effective policy implementation. Each factor influences the others, creating a complex web of interactions that can facilitate or hinder progress. Below, I will discuss policies and initiatives from three countries—Denmark, India, and Brazil—that illustrate this interplay and analyze their effectiveness.

#### Economic Development

Economic development often drives increased consumption of resources, contributing to environmental degradation. However, it can also provide the means for financing sustainable initiatives. For example, Denmark has invested significantly in renewable energy, particularly wind power. As a leader in wind energy production, Denmark has reduced its carbon emissions while simultaneously fostering economic growth
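
The three model cells earlier repeat the same call-and-record steps, which is where an ordering slip between reading and displaying the answer can creep in. A possible way to keep the steps in one place is sketched below. The helper name `ask_and_record` is hypothetical, and it assumes only that `client` exposes the same `chat.completions.create` call used throughout this notebook:

```python
def ask_and_record(client, model_name, messages, competitors, answers):
    """Query one model, record its name and answer, and return the answer."""
    # `ask_and_record` is an illustrative helper, not part of the notebook.
    response = client.chat.completions.create(
        model=model_name,
        messages=messages,
    )
    answer = response.choices[0].message.content
    # Append both together so the two lists stay aligned by index.
    competitors.append(model_name)
    answers.append(answer)
    return answer
```

With such a helper, each model cell would reduce to one call followed by `display(Markdown(answer))` on the returned value.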

In [26]:
together = ""
for index, answer in enumerate(answers):
    together += f"#Response from competitor {index+1}\n\n"
    together += answer + "\n\n"

In [27]:
print(together)

#Response from competitor 1

### Response:

The interplay between economic development, social equity, and environmental sustainability is a critical challenge in the context of climate change. Balancing these three factors requires nuanced understanding and effective policy implementation. Each factor influences the others, creating a complex web of interactions that can facilitate or hinder progress. Below, I will discuss policies and initiatives from three countries—Denmark, India, and Brazil—that illustrate this interplay and analyze their effectiveness.

#### Economic Development

Economic development often drives increased consumption of resources, contributing to environmental degradation. However, it can also provide the means for financing sustainable initiatives. For example, Denmark has invested significantly in renewable energy, particularly wind power. As a leader in wind energy production, Denmark has reduced its carbon emissions while simultaneously fostering economic gr
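
The string built in the loop above can also be produced in a single expression. This is only a sketch of an equivalent form; `combine_answers` is a hypothetical name, and the header text is copied unchanged from the loop:

```python
def combine_answers(answers):
    """Concatenate answers under numbered headers, matching the loop above."""
    # enumerate(..., start=1) replaces the manual index+1 arithmetic.
    return "".join(
        f"#Response from competitor {number}\n\n{answer}\n\n"
        for number, answer in enumerate(answers, start=1)
    )
```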

In [38]:
judge = f"""You are judging a competition between {len(competitors)} competitors.
Each model was given the same question:

{question}

Your job is to evaluate each response for the clarity and strength of its arguments and rank them from best to worst.
Respond with JSON in the following format:
{{"results": ["1", "2", "3", ...]}}

Here are the responses from each competitor:

{together}

Now respond with JSON containing only the ranked competitor numbers. Do not include Markdown or any code block."""


In [39]:
print(judge)

You are judging a competition between 3 competitors.
Each model was given the same question:

Sure! Here’s a complex question that can be used to evaluate the understanding and reasoning abilities of different LLMs:

**Question:**

"In the context of climate change, discuss the interplay between economic development, social equity, and environmental sustainability. Provide examples of current policies or initiatives in three different countries that illustrate this interplay, and analyze their effectiveness in achieving a balance among these three factors."

---

### Evaluation Criteria:

1. **Depth of Analysis**: Evaluate how thoroughly the LLM addresses the complexity of the issue, discussing the nuances of each factor.
   
2. **Use of Examples**: Look for specific, well-researched examples from credible sources that improve the answer's context and relevance.
   
3. **Critical Thinking**: Assess how well the model synthesizes the information and draws connections between economic 

In [40]:
judge_messages = [{"role": "user", "content": judge}]

In [41]:
openai = OpenAI()
response = openai.chat.completions.create(
    model="o3-mini",
    messages=judge_messages
)
results = response.choices[0].message.content
print(results)


{"results": ["1", "3", "2"]}


In [None]:
results_dict = json.loads(results)

ranks = results_dict["results"]
for index, result in enumerate(ranks):
    competitor = competitors[int(result)-1]
    print(f"Rank {index+1}: {competitor}")



Rank 1: gpt-4o-mini
Rank 2: llama3.2
Rank 3: gemini-2.0-flash
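
The parsing cell above trusts the judge to return clean JSON with valid competitor numbers. A more defensive variant is sketched here, assuming the reply may occasionally arrive wrapped in a Markdown code fence despite the instruction. `parse_ranking` is a hypothetical helper, not part of the notebook:

```python
import json


def parse_ranking(raw, competitors):
    """Return competitor names ordered best to worst, or raise ValueError."""
    fence = "`" * 3  # Markdown code-fence marker
    text = raw.strip()
    # Assumption: some replies wrap the JSON in a fence such as "json ... ".
    if text.startswith(fence):
        text = text.strip("`").strip()
        if text.lower().startswith("json"):
            text = text[4:]
    try:
        positions = [int(item) for item in json.loads(text)["results"]]
    except (KeyError, TypeError, ValueError) as exc:
        raise ValueError(f"Unparseable judge reply: {raw!r}") from exc
    # Every competitor number must appear exactly once.
    if sorted(positions) != list(range(1, len(competitors) + 1)):
        raise ValueError(f"Ranking is not a permutation: {positions}")
    return [competitors[position - 1] for position in positions]
```

Calling `parse_ranking(results, competitors)` would then give the ordered model names, which can be printed with the same `Rank {index+1}` loop.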
