In [1]:
# Import required packages
import os
import json
from dotenv import load_dotenv
from openai import OpenAI
from anthropic import Anthropic
from IPython.display import Markdown, display


In [2]:
# Load environment variables
load_dotenv(override=True)


True
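
Before creating the clients, it can help to confirm that both keys were actually loaded. The sketch below assumes the standard environment variable names read by the two SDKs, `OPENAI_API_KEY` and `ANTHROPIC_API_KEY`; the helper name `missing_api_keys` is hypothetical and not part of either library.

```python
import os

# Standard environment variable names read by the OpenAI and Anthropic SDKs
REQUIRED_KEYS = ("OPENAI_API_KEY", "ANTHROPIC_API_KEY")


def missing_api_keys(env=None):
    """Return the required key names that are unset or empty in env."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_KEYS if not env.get(name)]


missing = missing_api_keys()
if missing:
    print(f"Missing API keys: {', '.join(missing)}")
else:
    print("All API keys found")
```

Passing a plain dict instead of `os.environ` makes the check easy to try without touching the real environment.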

In [3]:
# Initialize API clients
openai = OpenAI()
claude = Anthropic()


In [4]:
# Original question generation
request = "Please come up with a challenging, nuanced question that I can ask a number of LLMs to evaluate their intelligence. "
request += "Answer only with the question, no explanation."
messages = [{"role": "user", "content": request}]

response = openai.chat.completions.create(
    model="gpt-4o-mini",
    messages=messages,
)
question = response.choices[0].message.content
print(question)


If you could choose one historical figure to converse with for an hour, who would it be, what specific questions would you ask them, and how do you believe their responses could shape or challenge your current understanding of a contemporary issue?


In [5]:
# Get responses from multiple models
competitors = []
answers = []
messages = [{"role": "user", "content": question}]

# OpenAI
response = openai.chat.completions.create(model="gpt-4o-mini", messages=messages)
answer = response.choices[0].message.content
competitors.append("gpt-4o-mini")
answers.append(answer)
display(Markdown(answer))

# Claude
response = claude.messages.create(model="claude-3-7-sonnet-latest", messages=messages, max_tokens=1000)
answer = response.content[0].text
competitors.append("claude-3-7-sonnet-latest")
answers.append(answer)
display(Markdown(answer))


If I could choose one historical figure to converse with for an hour, it would be Mahatma Gandhi. His philosophies of non-violence, civil disobedience, and grassroots activism have had a profound impact on social justice movements worldwide.

### Specific Questions to Ask:

1. **Moral Philosophy and Tactics**: How did you reconcile the need for political action with your commitment to non-violence, especially when faced with violent opposition?
   
2. **Globalization and Modern Politics**: In today's globalized world, how would you adapt your methods and philosophies to address issues like climate change, economic inequality, and political corruption?

3. **Youth Activism**: What role do you believe young people should play in social movements today, and what advice would you give to them about sustaining their motivation and passion?

4. **Cultural Dialogue**: How do you view the relationship between different cultural identities in the pursuit of social justice? How can we foster empathy and understanding in a polarized society?

5. **Technology and Activism**: With the advent of social media and technology, how do you think these tools can be used effectively for nonviolent activism, and what potential pitfalls should activists be wary of?

### Potential Impact on Contemporary Issues:

Gandhi's responses could profoundly challenge and shape my understanding of contemporary issues, especially regarding non-violent protest and grassroots movements. For instance:

- **Non-Violent Protest**: His insights might reinforce the effectiveness of peaceful protests in the face of oppression, encouraging current movements to uphold these principles even when confronted with brutality.

- **Adaptation of Philosophy**: By discussing how his philosophies might adapt to modern problems like climate change, I could better understand the importance of integrating moral imperatives with urgent global challenges, reshaping how I view activism in various contexts.

- **Youth Empowerment**: Gandhi's perspective on youth activism could inspire innovative strategies for engaging young people in current social issues and highlight the importance of intergenerational dialogue.

- **Cultural Understanding**: His thoughts on cultural dialogue could reshape my approach to bridging divides in contemporary society, emphasizing the need for empathy in discussions around race, class, and identity.

- **Technology's Role**: His views on the use of technology in activism would provide a critical lens for evaluating both the benefits and drawbacks of digital platforms in mobilizing social movements.

Overall, a conversation with Gandhi could not only reinforce the value of his teachings but also provide new strategies and insights for addressing the complex challenges we face today.

# A Conversation with Ada Lovelace

I would choose Ada Lovelace, the 19th-century mathematician recognized as the first computer programmer. In our hour together, I would explore:

1. **Her algorithmic vision**: "When writing notes for Babbage's Analytical Engine, you foresaw that machines could manipulate symbols beyond mere calculation. What specific applications beyond mathematics did you envision for computing machines?"

2. **Creative computing**: "You described the 'poetical science' concept that married imagination with mathematical reasoning. How would you apply this framework to evaluate today's AI systems?"

3. **Gender and science**: "What strategies helped you succeed in mathematics despite societal constraints on women's intellectual pursuits?"

4. **Ethical foresight**: "Did you anticipate any potential societal consequences of mechanical thinking machines?"

I believe her responses would challenge our contemporary separation between technical and humanistic approaches to computing. Her holistic understanding might offer alternative frameworks for addressing AI ethics questions, particularly around creativity and human-machine collaboration. Her perspective could illuminate blind spots in our current technology development paradigms that emphasize efficiency over broader human values.
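
The cell above repeats the same pattern for each model: call the API, extract the text, append to both lists. One possible way to factor this out is sketched below. The helper `collect_answers` and the stand-in callables are hypothetical; in the notebook, each callable would wrap the real client call for that model.

```python
def collect_answers(question, askers):
    """Ask each model the same question.

    askers maps a model name to a callable that takes the question
    string and returns that model's answer as a string.
    """
    competitors, answers = [], []
    for name, ask in askers.items():
        competitors.append(name)
        answers.append(ask(question))
    return competitors, answers


# Stand-in callables so the sketch runs offline (no API calls are made here)
demo_askers = {
    "model-a": lambda q: f"A's answer to: {q}",
    "model-b": lambda q: f"B's answer to: {q}",
}
demo_competitors, demo_answers = collect_answers("Why?", demo_askers)
print(demo_competitors)
```

Keeping the two lists in step inside one loop avoids the risk of appending a competitor name without its answer when a new model is added.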

In [6]:
# NEW: Chain of Thought Evaluation
# First, let's create a detailed evaluation prompt that encourages step-by-step reasoning

# Build the list of responses outside the f-string: a backslash inside an f-string
# expression is a SyntaxError before Python 3.12, and '\\n' would insert a literal
# backslash-n into the prompt instead of a newline.
responses_text = "\n\n".join(
    f"Response {i+1} ({competitors[i]}):\n{answer}" for i, answer in enumerate(answers)
)

evaluation_prompt = f"""You are an expert evaluator of AI responses. Your task is to analyze and rank the following responses to this question:

{question}

Please follow these steps in your evaluation:

1. For each response:
   - Identify the main arguments presented
   - Evaluate the clarity and coherence of the reasoning
   - Assess the depth and breadth of the analysis
   - Note any unique insights or perspectives

2. Compare the responses:
   - How do they differ in their approach?
   - Which response demonstrates the most sophisticated understanding?
   - Which response provides the most practical and actionable insights?

3. Provide your final ranking with detailed justification for each position.

Here are the responses:

{responses_text}

Please provide your evaluation in JSON format with the following structure:
{{
    "detailed_analysis": [
        {{"competitor": "name", "strengths": [], "weaknesses": [], "unique_aspects": []}},
        ...
    ],
    "comparative_analysis": "detailed comparison of responses",
    "final_ranking": ["response numbers (1-based), best first"],
    "justification": "detailed explanation of the ranking"
}}"""
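
Because the prompt asks for JSON, the request that sends it could also opt into the Chat Completions JSON mode via the `response_format` parameter, which should make a markdown-fenced reply less likely. This is a sketch, not what the notebook does: the helper `build_evaluation_request` is hypothetical, and JSON mode only promises syntactically valid JSON, not that the keys match the structure requested above.

```python
def build_evaluation_request(prompt, model="gpt-4o-mini"):
    """Return keyword arguments for openai.chat.completions.create."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        # JSON mode: ask the API to return a single valid JSON object.
        # The prompt itself must still mention JSON, as the one above does.
        "response_format": {"type": "json_object"},
    }


request_kwargs = build_evaluation_request("Return your evaluation as JSON.")
print(request_kwargs["response_format"])
# In the notebook this would be sent with:
# response = openai.chat.completions.create(**request_kwargs)
```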


In [7]:
# Get the detailed evaluation
evaluation_messages = [{"role": "user", "content": evaluation_prompt}]

response = openai.chat.completions.create(
    model="gpt-4o-mini",
    messages=evaluation_messages,
)
detailed_evaluation = response.choices[0].message.content
print(detailed_evaluation)


```json
{
    "detailed_analysis": [
        {
            "competitor": "gpt-4o-mini",
            "strengths": [
                "Wide range of questions touching on multiple relevant contemporary issues.",
                "Deep understanding of Gandhi's philosophies and their applications today.",
                "Focus on youth activism and cultural dialogue, emphasizing empathy."
            ],
            "weaknesses": [
                "Responses could be more succinct and focused on fewer questions for deeper analysis."
            ],
            "unique_aspects": [
                "Integrates Gandhi's principles with modern technological challenges and social movements."
            ]
        },
        {
            "competitor": "claude-3-7-sonnet-latest",
            "strengths": [
                "Focuses on a less conventional historical figure, Ada Lovelace, providing a unique perspective.",
                "Questions engage with both technical and ethical dimensions of 

In [9]:
# Parse and display the results in a more readable format

# Clean up the JSON string by removing markdown code block markers
json_str = detailed_evaluation.replace("```json", "").replace("```", "").strip()

evaluation_dict = json.loads(json_str)

print("Detailed Analysis:")
for analysis in evaluation_dict["detailed_analysis"]:
    print(f"\nCompetitor: {analysis['competitor']}")
    print("Strengths:")
    for strength in analysis['strengths']:
        print(f"- {strength}")
    print("\nWeaknesses:")
    for weakness in analysis['weaknesses']:
        print(f"- {weakness}")
    print("\nUnique Aspects:")
    for aspect in analysis['unique_aspects']:
        print(f"- {aspect}")

print("\nComparative Analysis:")
print(evaluation_dict["comparative_analysis"])

print("\nFinal Ranking:")
for i, rank in enumerate(evaluation_dict["final_ranking"]):
    print(f"{i+1}. {competitors[int(rank)-1]}")

print("\nJustification:")
print(evaluation_dict["justification"])
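
The parsing above relies on the reply containing only a fenced JSON block, and on every ranking entry being a valid 1-based number. A more defensive variant is sketched below. Both helper names are hypothetical, and the sample reply is made up for illustration rather than taken from a real model run.

```python
import json
import re


def extract_json(text):
    """Parse JSON from a reply that may be wrapped in a markdown code fence."""
    match = re.search(r"```(?:json)?\s*(.*?)```", text, re.DOTALL)
    payload = match.group(1) if match else text
    return json.loads(payload.strip())


def resolve_ranking(final_ranking, competitors):
    """Map 1-based response numbers to competitor names, validating each entry."""
    names = []
    for rank in final_ranking:
        # int() raises ValueError if the model returned a name instead of a number
        index = int(rank) - 1
        if not 0 <= index < len(competitors):
            raise ValueError(f"Ranking entry {rank!r} is out of range")
        names.append(competitors[index])
    if len(set(names)) != len(names):
        raise ValueError("Ranking contains duplicate entries")
    return names


# Made-up reply with text around the fenced block
reply = 'Here is my evaluation:\n```json\n{"final_ranking": ["2", "1"]}\n```'
parsed = extract_json(reply)
print(resolve_ranking(parsed["final_ranking"], ["model-a", "model-b"]))
```

Raising on out-of-range or duplicate entries surfaces a malformed ranking immediately instead of printing a misleading result.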


Detailed Analysis:

Competitor: gpt-4o-mini
Strengths:
- Wide range of questions touching on multiple relevant contemporary issues.
- Deep understanding of Gandhi's philosophies and their applications today.
- Focus on youth activism and cultural dialogue, emphasizing empathy.

Weaknesses:
- Responses could be more succinct and focused on fewer questions for deeper analysis.

Unique Aspects:
- Integrates Gandhi's principles with modern technological challenges and social movements.

Competitor: claude-3-7-sonnet-latest
Strengths:
- Focuses on a less conventional historical figure, Ada Lovelace, providing a unique perspective.
- Questions engage with both technical and ethical dimensions of computing.
- Encourages a holistic understanding of technology and its societal implications.

Weaknesses:
- The scope of questions is narrower and less diverse compared to Gandhi's approach.

Unique Aspects:
- Addresses the intersection of gender and science, adding depth to the inquiry.

Comparativ