## Welcome to the Second Lab - Week 1, Day 3

Today we will call several different models. This is a good way to get comfortable with their APIs.

<table style="margin: 0; text-align: left; width:100%">
    <tr>
        <td style="width: 150px; height: 150px; vertical-align: middle;">
            <img src="../assets/stop.png" width="150" height="150" style="display: block;" />
        </td>
        <td>
            <h2 style="color:#ff7800;">Important point - please read</h2>
            <span style="color:#ff7800;">The way I collaborate with you may be different to other courses you've taken. I prefer not to type code while you watch. Rather, I execute Jupyter Labs, like this, and give you an intuition for what's going on. My suggestion is that you carefully execute this yourself, <b>after</b> watching the lecture. Add print statements to understand what's going on, and then come up with your own variations.<br/><br/>If you have time, I'd love it if you submitted a PR with your changes in the community_contributions folder - instructions are in the resources. Also, if you have a GitHub account, use it to showcase your variations. Not only is this essential practice, but it demonstrates your skills to others, including perhaps future clients or employers...
            </span>
        </td>
    </tr>
</table>

In [1]:
# Start with imports - ask ChatGPT to explain any package that you don't know

import os
import json
from dotenv import load_dotenv
from openai import OpenAI
from anthropic import Anthropic
from IPython.display import Markdown, display

In [2]:
# Always remember to do this!
load_dotenv(override=True)

True

In [3]:
# Print the key prefixes to help with any debugging

openai_api_key = os.getenv('OPENAI_API_KEY')
anthropic_api_key = os.getenv('ANTHROPIC_API_KEY')
google_api_key = os.getenv('GOOGLE_API_KEY')
deepseek_api_key = os.getenv('DEEPSEEK_API_KEY')
groq_api_key = os.getenv('GROQ_API_KEY')

if openai_api_key:
    print(f"OpenAI API Key exists and begins {openai_api_key[:8]}")
else:
    print("OpenAI API Key not set")
    
if anthropic_api_key:
    print(f"Anthropic API Key exists and begins {anthropic_api_key[:7]}")
else:
    print("Anthropic API Key not set (and this is optional)")

if google_api_key:
    print(f"Google API Key exists and begins {google_api_key[:2]}")
else:
    print("Google API Key not set (and this is optional)")

if deepseek_api_key:
    print(f"DeepSeek API Key exists and begins {deepseek_api_key[:3]}")
else:
    print("DeepSeek API Key not set (and this is optional)")

if groq_api_key:
    print(f"Groq API Key exists and begins {groq_api_key[:4]}")
else:
    print("Groq API Key not set (and this is optional)")

OpenAI API Key exists and begins sk-proj-
Anthropic API Key not set (and this is optional)
Google API Key exists and begins AI
DeepSeek API Key exists and begins sk-
Groq API Key exists and begins gsk_
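
The five checks above follow one pattern, so they can be collapsed into a loop. A minimal sketch, assuming a hypothetical helper called `describe_key` (it is not part of the original lab); the prefix lengths and messages are the ones used in the cell above:

```python
import os


def describe_key(label, value, prefix_len, optional=True):
    """Return a one-line status for an API key, showing only its prefix."""
    if value:
        return f"{label} API Key exists and begins {value[:prefix_len]}"
    suffix = " (and this is optional)" if optional else ""
    return f"{label} API Key not set{suffix}"


# (label, environment variable, prefix length, optional?) as in the cell above
KEY_CHECKS = [
    ("OpenAI", "OPENAI_API_KEY", 8, False),
    ("Anthropic", "ANTHROPIC_API_KEY", 7, True),
    ("Google", "GOOGLE_API_KEY", 2, True),
    ("DeepSeek", "DEEPSEEK_API_KEY", 3, True),
    ("Groq", "GROQ_API_KEY", 4, True),
]

for label, env_var, prefix_len, optional in KEY_CHECKS:
    print(describe_key(label, os.getenv(env_var), prefix_len, optional))
```

Printing only a short prefix keeps the full key out of the notebook output, which matters if you later share the notebook.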


In [4]:
request = "Please come up with a challenging, nuanced question that I can ask a number of LLMs to evaluate their intelligence. "
request += "Answer only with the question, no explanation."
messages = [{"role": "user", "content": request}]

In [5]:
messages

[{'role': 'user',
  'content': 'Please come up with a challenging, nuanced question that I can ask a number of LLMs to evaluate their intelligence. Answer only with the question, no explanation.'}]

In [6]:
openai = OpenAI()
response = openai.chat.completions.create(
    model="gpt-4o-mini",
    messages=messages,
)
question = response.choices[0].message.content
print(question)


How would you propose balancing the ethical implications of AI decision-making in high-stakes environments, such as healthcare or criminal justice, with the need for efficiency and accuracy in those decisions?


In [7]:
competitors = []
answers = []
messages = [{"role": "user", "content": question}]
print(messages)

[{'role': 'user', 'content': 'How would you propose balancing the ethical implications of AI decision-making in high-stakes environments, such as healthcare or criminal justice, with the need for efficiency and accuracy in those decisions?'}]


In [None]:
# The API we know well

model_name = "gpt-4o-mini"

response = openai.chat.completions.create(model=model_name, messages=messages)
answer = response.choices[0].message.content

display(Markdown(answer))
competitors.append(model_name)
answers.append(answer)



Balancing the ethical implications of AI decision-making in high-stakes environments like healthcare and criminal justice with the need for efficiency and accuracy is a complex challenge. Here are several strategies to approach this balance:

1. **Transparency and Explainability**: Ensure that AI systems are transparent and provide explanations for their decisions. Stakeholders, including patients, defendants, and practitioners, should understand how AI reached a decision. Explainability can foster trust and allow for informed consent in health or legal contexts.

2. **Robust Ethical Frameworks**: Establish clear ethical guidelines that prioritize fairness, accountability, and the rights of affected individuals. These frameworks should involve multidisciplinary teams, including ethicists, technologists, legal experts, and representatives from impacted communities to evaluate AI systems' implications.

3. **Rigorous Validation and Testing**: Implement strict validation protocols to ensure AI systems are assessed not just for efficiency and accuracy but also for bias and ethics. Use diverse datasets during testing to minimize disparities and ensure that AI performs equitably across different demographics.

4. **Human Oversight**: Maintain a human-in-the-loop approach where AI aids decision-making rather than replaces it. Qualified professionals should ultimately make final decisions, particularly in high-stakes situations, incorporating AI recommendations while applying their expertise and ethical judgement.

5. **Continuous Monitoring and Feedback Loops**: Develop mechanisms for ongoing monitoring of AI systems in practice. Establish feedback loops for users to report issues, biases, or inaccuracies, enabling iterative improvements and adjustments based on real-world outcomes.

6. **Stakeholder Involvement**: Engage with stakeholders, including community members, patients, and advocacy groups, in the design and implementation phases of AI systems. Their insights can ensure that the technology aligns with the needs and values of those it impacts.

7. **Education and Training**: Provide training for professionals in healthcare and criminal justice on the capabilities and limitations of AI. This will equip them to work alongside AI tools effectively and prepare them to interpret AI recommendations critically.

8. **Regulatory Oversight**: Advocate for regulatory frameworks that set standards for ethical AI usage across industries. Regulatory bodies can provide guidelines for compliance and accountability, ensuring that AI systems protect individual rights while also enhancing efficiency.

9. **Risk Assessment**: Identify and assess potential risks associated with AI decision-making and establish protocols for mitigation. Quantifying risks can help organizations make informed decisions about when and how to deploy AI effectively.

10. **Adaptive Learning**: Incorporate adaptive learning algorithms that can evolve based on new data and outcomes. This can help ensure that AI systems remain relevant, accurate, and sensitive to ethical concerns over time.

By combining these strategies, organizations can harness the efficiencies and accuracy of AI while addressing the ethical implications that arise from its use in high-stakes environments. Balancing these priorities will require ongoing commitment and collaboration among stakeholders to ensure the responsible deployment of AI technologies.

In [None]:
# Anthropic has a slightly different API: max_tokens is required, and the reply text is in response.content[0].text

model_name = "claude-3-7-sonnet-latest"

claude = Anthropic()
response = claude.messages.create(model=model_name, messages=messages, max_tokens=1000)
answer = response.content[0].text

display(Markdown(answer))
competitors.append(model_name)
answers.append(answer)

In [9]:
gemini = OpenAI(api_key=google_api_key, base_url="https://generativelanguage.googleapis.com/v1beta/openai/")
model_name = "gemini-2.0-flash"

response = gemini.chat.completions.create(model=model_name, messages=messages)
answer = response.choices[0].message.content

display(Markdown(answer))
competitors.append(model_name)
answers.append(answer)

Balancing the ethical implications of AI decision-making in high-stakes environments with the need for efficiency and accuracy is a complex challenge that requires a multi-faceted approach. Here's a proposed strategy:

**1. Prioritize Ethical Frameworks & Principles:**

*   **Human Rights & Dignity:**  AI systems must be designed and implemented in a way that respects and protects fundamental human rights, dignity, and autonomy.  This includes ensuring fairness, justice, and equal opportunities for all individuals, regardless of their background.
*   **Beneficence & Non-Maleficence:**  The AI should strive to maximize benefits and minimize potential harm.  Thorough risk assessments are crucial to identify potential negative consequences and develop mitigation strategies.
*   **Justice & Fairness:**  AI algorithms should be designed to avoid bias and ensure fair treatment for all individuals. This requires careful consideration of data sources, algorithm design, and potential for discriminatory outcomes.  Regular audits and evaluations are necessary to detect and correct any biases.
*   **Transparency & Explainability:**  The decision-making process of AI systems should be transparent and explainable to the extent possible.  This allows humans to understand how the AI reached a particular conclusion and to challenge or override the decision if necessary.  This is particularly important in healthcare and criminal justice, where decisions can have significant consequences for individuals' lives.
*   **Accountability & Responsibility:**  Clear lines of accountability must be established for the development, deployment, and use of AI systems. This includes identifying who is responsible for the AI's actions and who should be held accountable for any errors or biases.

**2. Data Management & Bias Mitigation:**

*   **Data Diversity & Representation:**  AI algorithms are only as good as the data they are trained on.  It is essential to use diverse and representative datasets that accurately reflect the population on which the AI will be used.  Over-sampling underrepresented groups and actively seeking out missing data can help reduce bias.
*   **Bias Detection & Mitigation Techniques:**  Implement rigorous bias detection and mitigation techniques throughout the AI development lifecycle. This includes:
    *   **Pre-processing Techniques:** Cleaning, transforming, and re-balancing data to reduce bias before training the AI model.
    *   **In-processing Techniques:** Modifying the algorithm itself to reduce bias during training (e.g., adversarial debiasing).
    *   **Post-processing Techniques:** Adjusting the AI's output to reduce bias after the model has been trained.
*   **Data Privacy & Security:**  Protecting the privacy and security of data used to train and operate AI systems is paramount.  Implement robust data security measures, including anonymization, encryption, and access controls.

**3. Algorithm Design & Evaluation:**

*   **Explainable AI (XAI) Techniques:**  Prioritize the development and use of XAI techniques to make AI decision-making more transparent and understandable.  This can involve using techniques such as:
    *   **Feature importance analysis:** Identifying the most important factors that influenced the AI's decision.
    *   **Rule-based systems:**  Expressing the AI's logic in the form of human-readable rules.
    *   **Counterfactual explanations:**  Identifying the changes in input that would have led to a different outcome.
*   **Rigorous Testing & Validation:**  Thoroughly test and validate AI systems on diverse datasets to ensure accuracy, reliability, and fairness.  This includes conducting adversarial testing to identify potential vulnerabilities and biases.
*   **Human Oversight & Control:**  Maintain human oversight and control over AI decision-making, especially in high-stakes environments.  Humans should be able to review and override the AI's decisions when necessary, and they should be responsible for making the final decision in critical cases.
*   **Performance Metrics Beyond Accuracy:**  Evaluate AI performance based on a broader range of metrics beyond just accuracy. Consider metrics such as fairness, calibration, and robustness.

**4. Regulatory Frameworks & Guidelines:**

*   **Establish clear regulatory frameworks and guidelines** for the development, deployment, and use of AI in high-stakes environments.  These frameworks should address issues such as:
    *   Data privacy and security
    *   Bias detection and mitigation
    *   Transparency and explainability
    *   Accountability and responsibility
    *   Independent audits and oversight
*   **Promote collaboration between regulators, industry experts, and ethicists** to develop best practices and standards for AI development and deployment.

**5. Education & Training:**

*   **Provide education and training to healthcare professionals, legal professionals, and other stakeholders** on the ethical implications of AI and how to use AI systems responsibly.  This includes training on how to interpret AI outputs, identify potential biases, and exercise appropriate human oversight.
*   **Promote public understanding of AI** and its potential benefits and risks.  This can help build trust in AI systems and ensure that they are used in a way that benefits society as a whole.

**6. Continuous Monitoring & Improvement:**

*   **Continuously monitor the performance of AI systems** after deployment to detect and correct any errors, biases, or unintended consequences.
*   **Establish mechanisms for feedback and reporting** to allow individuals and organizations to report concerns about AI systems and to contribute to their improvement.
*   **Regularly update and retrain AI models** with new data to ensure that they remain accurate, reliable, and fair.

**Addressing the Efficiency Trade-off:**

While prioritizing ethical considerations, we must also address the need for efficiency. Here's how:

*   **Incremental Implementation:**  Start with AI applications focused on specific tasks and areas where the risk of harm is relatively low. This allows for gradual learning and refinement of ethical practices before deploying AI in more sensitive areas.
*   **Hybrid Approach:**  Combine AI decision support with human expertise.  AI can provide insights and recommendations, while humans retain the ultimate decision-making authority. This can improve efficiency while also ensuring that ethical considerations are taken into account.
*   **Invest in Efficient XAI Techniques:**  Develop XAI techniques that are not only accurate but also computationally efficient.  The ability to quickly explain AI decisions is crucial in high-stakes environments.
*   **Optimize Algorithms for Fairness and Efficiency:**  Explore algorithm designs that optimize for both accuracy and fairness.  This may involve using techniques such as constrained optimization or regularization to penalize biased outcomes.

**In Summary:**

Balancing ethical implications with efficiency and accuracy requires a comprehensive and ongoing commitment to ethical principles, data quality, transparency, accountability, and continuous improvement.  By adopting a multi-faceted approach that addresses these key areas, we can harness the power of AI to improve decision-making in high-stakes environments while also protecting fundamental human rights and ensuring fairness for all. This is an iterative process that requires constant adaptation and refinement as AI technology continues to evolve.


In [10]:
deepseek = OpenAI(api_key=deepseek_api_key, base_url="https://api.deepseek.com/v1")
model_name = "deepseek-chat"

response = deepseek.chat.completions.create(model=model_name, messages=messages)
answer = response.choices[0].message.content

display(Markdown(answer))
competitors.append(model_name)
answers.append(answer)

Balancing the ethical implications of AI decision-making in high-stakes environments like healthcare and criminal justice with the need for efficiency and accuracy requires a multi-faceted approach. Here’s a structured proposal:

### **1. Ethical Frameworks and Principles**
   - **Adopt Established Guidelines**: Use frameworks like the *AI Ethics Guidelines* from the EU, OECD, or IEEE, emphasizing transparency, fairness, accountability, and human oversight.
   - **Human-in-the-Loop (HITL)**: Ensure AI systems support—rather than replace—human judgment, particularly in life-or-death decisions (e.g., medical diagnoses, parole decisions).
   - **Bias Mitigation**: Regularly audit AI models for biases (e.g., racial, gender, socioeconomic) using fairness-aware algorithms and diverse training data.

### **2. Transparency and Explainability**
   - **Interpretable AI**: Prioritize models that provide explainable outputs (e.g., decision trees, SHAP values) over "black-box" systems like deep neural networks where possible.
   - **Right to Explanation**: Allow stakeholders (patients, defendants) to request understandable justifications for AI-driven decisions.
   - **Documentation & Auditing**: Maintain clear records of AI decision processes for regulatory and legal scrutiny.

### **3. Accountability and Legal Compliance**
   - **Clear Liability Standards**: Define whether responsibility lies with developers, deployers, or end-users when AI makes errors.
   - **Regulatory Oversight**: Ensure compliance with sector-specific laws (e.g., HIPAA in healthcare, due process in criminal justice).
   - **Redress Mechanisms**: Establish pathways for appealing or correcting AI-driven decisions (e.g., human review boards).

### **4. Robustness and Accuracy**
   - **High-Quality Data**: Use representative, up-to-date datasets to train models, minimizing errors from outdated or skewed inputs.
   - **Continuous Validation**: Test AI systems in real-world scenarios before deployment and monitor performance post-deployment.
   - **Fallback Protocols**: Implement safeguards where AI uncertainty exceeds a threshold, deferring to human experts.

### **5. Public Trust and Stakeholder Engagement**
   - **Inclusive Development**: Involve ethicists, domain experts (doctors, judges), and affected communities in AI design.
   - **Public Transparency**: Disclose where and how AI is used, without revealing proprietary details that could enable gaming the system.
   - **Education**: Train professionals and the public on AI’s role, limitations, and ethical considerations.

### **6. Trade-offs Between Efficiency and Ethics**
   - **Context-Specific Calibration**: In emergencies (e.g., triage), faster AI decisions may be justified if error rates are low and human review is available later.
   - **Gradual Integration**: Start with low-stakes AI assistance (e.g., administrative tasks) before expanding to critical decisions.

### **Example Applications**
   - **Healthcare**: AI flags potential diagnoses for doctor review, reducing workload while ensuring human oversight.
   - **Criminal Justice**: Risk-assessment tools are used alongside judicial discretion, with mandatory explanations for high-risk classifications.

### **Conclusion**
The key is not to view ethics and efficiency as opposing forces but as complementary goals. By embedding ethical principles into AI design, ensuring accountability, and maintaining human oversight, we can harness AI’s benefits while minimizing harm in high-stakes domains.

In [11]:
groq = OpenAI(api_key=groq_api_key, base_url="https://api.groq.com/openai/v1")
model_name = "llama-3.3-70b-versatile"

response = groq.chat.completions.create(model=model_name, messages=messages)
answer = response.choices[0].message.content

display(Markdown(answer))
competitors.append(model_name)
answers.append(answer)


Balancing the ethical implications of AI decision-making in high-stakes environments, such as healthcare or criminal justice, with the need for efficiency and accuracy in those decisions requires a multifaceted approach. Here are some proposals to achieve this balance:

**Ethical Considerations**

1. **Transparency and Explainability**: AI systems should be designed to provide transparent and explainable decision-making processes. This can be achieved through techniques such as model interpretability, feature attribution, and model-agnostic explanations.
2. **Fairness and Bias Mitigation**: AI systems should be regularly audited for biases and fairness issues. This can be done by analyzing the data used to train the models, as well as the outcomes of the decision-making processes.
3. **Human Oversight and Review**: Implement human oversight and review processes to detect and correct potential errors or biases in AI decision-making.
4. **Value Alignment**: Ensure that AI systems are aligned with human values, such as respect for human dignity, autonomy, and fairness.

**Efficiency and Accuracy**

1. **Data Quality and Availability**: Ensure that high-quality, diverse, and representative data is available for training and testing AI models.
2. **Model Validation and Testing**: Regularly validate and test AI models to ensure they are performing accurately and efficiently.
3. **Continuous Learning and Improvement**: Implement continuous learning and improvement mechanisms, such as online learning, transfer learning, and human-in-the-loop feedback.
4. **Scalability and Automation**: Leverage AI to automate routine and repetitive tasks, freeing up human resources for more complex and high-value tasks.

**Balancing Ethical Implications and Efficiency**

1. **Hybrid Approaches**: Implement hybrid approaches that combine the strengths of human decision-making with the efficiency and accuracy of AI. For example, human-AI collaboration, where AI provides recommendations and humans make the final decisions.
2. **Context-Dependent Decision-Making**: Develop AI systems that can adapt to different contexts and environments, taking into account the specific ethical implications and requirements of each situation.
3. **Regulatory Frameworks**: Establish regulatory frameworks that provide clear guidelines and standards for the development and deployment of AI systems in high-stakes environments.
4. **Stakeholder Engagement**: Engage with stakeholders, including patients, defendants, and community members, to ensure that their concerns and values are taken into account in the development and deployment of AI systems.

**Implementation Roadmap**

1. **Conduct Ethical Impact Assessments**: Conduct thorough ethical impact assessments to identify potential risks and benefits of AI decision-making in high-stakes environments.
2. **Develop and Implement Guidelines**: Develop and implement guidelines and standards for the development and deployment of AI systems in high-stakes environments.
3. **Establish Oversight and Review Mechanisms**: Establish oversight and review mechanisms to detect and correct potential errors or biases in AI decision-making.
4. **Provide Education and Training**: Provide education and training for developers, deployers, and users of AI systems to ensure they understand the ethical implications and requirements of AI decision-making in high-stakes environments.

By following this multifaceted approach, we can balance the ethical implications of AI decision-making in high-stakes environments with the need for efficiency and accuracy in those decisions.
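
The Gemini, DeepSeek and Groq cells above differ only in the API key, the `base_url` and the model name, so the shared steps can be factored out. A sketch, assuming a hypothetical helper named `ask`; it works with any client object that exposes `chat.completions.create` the way the `OpenAI` client does, and it leaves out the `display(Markdown(...))` call:

```python
def ask(client, model_name, messages, competitors, answers):
    """Send messages to an OpenAI-compatible client and record the reply."""
    response = client.chat.completions.create(model=model_name, messages=messages)
    answer = response.choices[0].message.content
    # Append to both lists together so their positions stay aligned
    competitors.append(model_name)
    answers.append(answer)
    return answer
```

With the clients defined above, the Groq cell would reduce to `ask(groq, "llama-3.3-70b-versatile", messages, competitors, answers)`. The Anthropic client has a different call and response shape, so it would not fit this helper unchanged.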

## For the next cell, we will use Ollama

Ollama runs a local web service that exposes an OpenAI-compatible endpoint,  
and runs models locally using high-performance C++ code.

If you don't have Ollama, install it by visiting https://ollama.com, pressing Download, and following the instructions.

After it's installed, you should be able to visit http://localhost:11434 and see the message "Ollama is running".

You might need to restart Cursor (and maybe reboot). Then open a Terminal (control+\`) and run `ollama serve`.
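
The same check can be done from the notebook instead of the browser. A minimal sketch using only the standard library, assuming a hypothetical helper named `ollama_is_running`; the URL and the expected message are the ones mentioned above:

```python
import http.client
import urllib.request


def ollama_is_running(url="http://localhost:11434", timeout=2.0):
    """Return True if the page at url answers with the "Ollama is running" message."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            body = response.read().decode("utf-8", errors="replace")
    except (OSError, http.client.HTTPException):
        # Connection refused, timeout, or a malformed reply: treat as not running
        return False
    return "Ollama is running" in body


print("Ollama reachable:", ollama_is_running())
```

If this prints `False`, start the service with `ollama serve` and run the cell again.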

Useful Ollama commands (run these in the terminal, or prefix them with an exclamation mark to run them in this notebook):

`ollama pull <model_name>` downloads a model locally  
`ollama ls` lists all the models you've downloaded  
`ollama rm <model_name>` deletes the specified model from your downloads

<table style="margin: 0; text-align: left; width:100%">
    <tr>
        <td style="width: 150px; height: 150px; vertical-align: middle;">
            <img src="../assets/stop.png" width="150" height="150" style="display: block;" />
        </td>
        <td>
            <h2 style="color:#ff7800;">Super important - ignore me at your peril!</h2>
            <span style="color:#ff7800;">The model called <b>llama3.3</b> is FAR too large for home computers - it's not intended for personal computing and will consume all your resources! Stick with the nicely sized <b>llama3.2</b> or <b>llama3.2:1b</b>, and if you want something larger, try llama3.1 or smaller variants of Qwen, Gemma, Phi or DeepSeek. See <a href="https://ollama.com/models">the Ollama models page</a> for a full list of models and sizes.
            </span>
        </td>
    </tr>
</table>

In [None]:
!ollama pull llama3.2

In [12]:
# api_key can be any string: Ollama ignores it, but the OpenAI client requires a value.
ollama = OpenAI(base_url='http://localhost:11434/v1', api_key='gemma')
model_name = "llama3.2"

response = ollama.chat.completions.create(model=model_name, messages=messages)
answer = response.choices[0].message.content

display(Markdown(answer))
competitors.append(model_name)
answers.append(answer)

Balancing the ethical implications of AI decision-making in high-stakes environments requires a multifaceted approach that considers multiple factors. Here are some proposals to address this challenge:

1. **Transparency and Explainability**: Develop AI systems that can provide transparent and understandable explanations for their decisions. This will help build trust among stakeholders, including patients, defendants, and clinicians. Techniques such as model interpretability, feature importance, and decision tables can be used to facilitate understanding.
2. **Human Oversight and Review**: Implement a layer of human review to ensure AI-driven decisions are accurate and fair. This can include clinical validation, legal review, or peer review for AI recommendations in various domains.
3. **Data Quality and Ethical Data Sources**: Ensure that the data used to train AI models is accurate, complete, and free from bias. Additionally, consider using ethical data sources, such as data anonymization or aggregation techniques, to protect individual identities and prevent harm.
4. **Accountability Mechanisms**: Establish clear accountability mechanisms for AI-driven decisions, including audits, risk assessments, and liability frameworks. This will help ensure that AI systems are held accountable for their mistakes.
5. **Multidisciplinary Governance**: Foster multidisciplinary governance models that bring together experts from various fields, such as medicine, law, ethics, and social sciences. This will encourage a comprehensive understanding of the ethical implications of AI decision-making.
6. **Value Alignment**: Ensure that AI systems are designed to align with human values, such as dignity, respect, and fairness. Value alignment can be achieved through design principles, such as the development of value-sensitive AI frameworks.
7. **Continuous Monitoring and Evaluation**: Regularly assess the performance and impact of AI systems in high-stakes environments. This will help identify potential biases or errors and inform improvements to AI decision-making processes.
8. **Inclusive Design and Collaboration**: Engage diverse stakeholders, including patients, clinicians, defendants, and communities, in AI development processes. This collaborative approach can ensure that AI systems meet the needs of various groups and address potential ethical concerns.
9. **Regulatory Frameworks and Standards**: Develop regulatory frameworks and standards for the use of AI in high-stakes environments, such as healthcare or criminal justice. These frameworks should balance efficiency and accuracy with strict safety and ethics requirements.
10. **Education and Training**: Provide ongoing education and training programs for professionals who interact with AI-driven systems, ensuring they understand the capabilities, limitations, and potential biases of these systems.

To achieve a balance between the need for efficiency and accuracy in AI decision-making, consider the following:

1. **Hybrid Human-AI Systems**: Design hybrid systems that combine human judgment with AI-driven recommendations, providing both rapid responses and more informed decision-making.
2. **Optimization Strategies**: Utilize optimization strategies to minimize potential biases and errors in AI decision-making. This can involve techniques like data cleansing, feature engineering, or fairness metrics.
3. **Value-Based Outcomes**: Prioritize value-based outcomes over efficiency metrics, ensuring that AI-driven decisions align with human values and social norms.

Ultimately, the proposal for balancing ethical implications of AI decision-making requires a nuanced approach that addresses both technical and practical challenges in high-stakes environments.

In [13]:
# So where are we?

print(competitors)
print(answers)


['gpt-4o-mini', 'gemini-2.0-flash', 'deepseek-chat', 'llama-3.3-70b-versatile', 'llama3.2']
["Balancing the ethical implications of AI decision-making in high-stakes environments like healthcare and criminal justice with the need for efficiency and accuracy is a complex challenge. Here are several strategies to approach this balance:\n\n1. **Transparency and Explainability**: Ensure that AI systems are transparent and provide explanations for their decisions. Stakeholders, including patients, defendants, and practitioners, should understand how AI reached a decision. Explainability can foster trust and allow for informed consent in health or legal contexts.\n\n2. **Robust Ethical Frameworks**: Establish clear ethical guidelines that prioritize fairness, accountability, and the rights of affected individuals. These frameworks should involve multidisciplinary teams, including ethicists, technologists, legal experts, and representatives from impacted communities to evaluate AI systems' im

In [14]:
# It's nice to know how to use "zip"
for competitor, answer in zip(competitors, answers):
    print(f"Competitor: {competitor}\n\n{answer}")


Competitor: gpt-4o-mini

Balancing the ethical implications of AI decision-making in high-stakes environments like healthcare and criminal justice with the need for efficiency and accuracy is a complex challenge. Here are several strategies to approach this balance:

1. **Transparency and Explainability**: Ensure that AI systems are transparent and provide explanations for their decisions. Stakeholders, including patients, defendants, and practitioners, should understand how AI reached a decision. Explainability can foster trust and allow for informed consent in health or legal contexts.

2. **Robust Ethical Frameworks**: Establish clear ethical guidelines that prioritize fairness, accountability, and the rights of affected individuals. These frameworks should involve multidisciplinary teams, including ethicists, technologists, legal experts, and representatives from impacted communities to evaluate AI systems' implications.

3. **Rigorous Validation and Testing**: Implement strict val
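
`zip` stops at the shorter input without complaint, so if the two lists ever get out of step (say, one `append` was skipped), the mismatch goes unnoticed. A small sketch of a guard, using a hypothetical `paired` helper; on Python 3.10 and later, `zip(competitors, answers, strict=True)` raises `ValueError` in the same situation:

```python
def paired(competitors, answers):
    """Pair each competitor with its answer, refusing lists of different lengths."""
    if len(competitors) != len(answers):
        raise ValueError(
            f"{len(competitors)} competitors but {len(answers)} answers"
        )
    return list(zip(competitors, answers))


# Stand-in values for illustration only, not real model output
print(paired(["model-a", "model-b"], ["first answer", "second answer"]))
```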

In [15]:
# Let's bring this together - note the use of "enumerate"

together = ""
for index, answer in enumerate(answers):
    together += f"# Response from competitor {index+1}\n\n"
    together += answer + "\n\n"

In [16]:
print(together)

# Response from competitor 1

Balancing the ethical implications of AI decision-making in high-stakes environments like healthcare and criminal justice with the need for efficiency and accuracy is a complex challenge. Here are several strategies to approach this balance:

1. **Transparency and Explainability**: Ensure that AI systems are transparent and provide explanations for their decisions. Stakeholders, including patients, defendants, and practitioners, should understand how AI reached a decision. Explainability can foster trust and allow for informed consent in health or legal contexts.

2. **Robust Ethical Frameworks**: Establish clear ethical guidelines that prioritize fairness, accountability, and the rights of affected individuals. These frameworks should involve multidisciplinary teams, including ethicists, technologists, legal experts, and representatives from impacted communities to evaluate AI systems' implications.

3. **Rigorous Validation and Testing**: Implement stric

In [17]:
judge = f"""You are judging a competition between {len(competitors)} competitors.
Each model has been given this question:

{question}

Your job is to evaluate each response for clarity and strength of argument, and rank them in order of best to worst.
Respond with JSON, and only JSON, with the following format:
{{"results": ["best competitor number", "second best competitor number", "third best competitor number", ...]}}

Here are the responses from each competitor:

{together}

Now respond with the JSON with the ranked order of the competitors, nothing else. Do not include markdown formatting or code blocks."""


In [18]:
print(judge)

You are judging a competition between 5 competitors.
Each model has been given this question:

How would you propose balancing the ethical implications of AI decision-making in high-stakes environments, such as healthcare or criminal justice, with the need for efficiency and accuracy in those decisions?

Your job is to evaluate each response for clarity and strength of argument, and rank them in order of best to worst.
Respond with JSON, and only JSON, with the following format:
{"results": ["best competitor number", "second best competitor number", "third best competitor number", ...]}

Here are the responses from each competitor:

# Response from competitor 1

Balancing the ethical implications of AI decision-making in high-stakes environments like healthcare and criminal justice with the need for efficiency and accuracy is a complex challenge. Here are several strategies to approach this balance:

1. **Transparency and Explainability**: Ensure that AI systems are transparent and pro
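
The doubled braces in the `judge` template are deliberate. Inside an f-string, `{{` and `}}` produce literal braces, which is why the printed prompt shows an ordinary JSON object. A small sketch with an illustrative count (not taken from the notebook):

```python
competitor_count = 3  # illustrative value only

# {{ and }} are escapes for literal braces; {competitor_count} is interpolated
template = f"""Rank {competitor_count} competitors.
Respond with JSON like: {{"results": ["1", "2", "3"]}}"""

print(template)
```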

In [19]:
judge_messages = [{"role": "user", "content": judge}]

In [20]:
# Judgement time!

openai = OpenAI()
response = openai.chat.completions.create(
    model="o3-mini",
    messages=judge_messages,
)
results = response.choices[0].message.content
print(results)


{"results": ["2", "3", "1", "5", "4"]}
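
The judge followed the requested format here, but models sometimes wrap JSON in a markdown code fence despite instructions, which would make `json.loads` fail. Below is a minimal defensive sketch. `extract_json` is a hypothetical helper, not part of the notebook, and it handles only the simple case of a single fence around the whole reply.

```python
import json

FENCE = "`" * 3  # the markdown code fence marker

def extract_json(text):
    """Parse JSON from a model reply, tolerating one optional code fence around it."""
    cleaned = text.strip()
    if cleaned.startswith(FENCE):
        # Drop the opening fence line (which may carry a language tag)
        lines = cleaned.splitlines()[1:]
        # Drop the closing fence line if present
        if lines and lines[-1].strip() == FENCE:
            lines = lines[:-1]
        cleaned = "\n".join(lines)
    return json.loads(cleaned)

reply = '{"results": ["2", "3", "1", "5", "4"]}'
plain = extract_json(reply)
fenced = extract_json(f"{FENCE}json\n{reply}\n{FENCE}")
print(plain == fenced)
```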


In [21]:
# OK let's turn this into results!

results_dict = json.loads(results)
ranks = results_dict["results"]
for index, result in enumerate(ranks):
    competitor = competitors[int(result)-1]
    print(f"Rank {index+1}: {competitor}")

Rank 1: gemini-2.0-flash
Rank 2: deepseek-chat
Rank 3: gpt-4o-mini
Rank 4: llama3.2
Rank 5: llama-3.3-70b-versatile
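
Before trusting the ranking, it can help to check that the judge returned every competitor number exactly once. Otherwise `competitors[int(result)-1]` may raise an error or silently repeat a model. A sketch follows, with `validate_ranking` as a hypothetical helper and the competitor order inferred from the ranking output above.

```python
def validate_ranking(ranks, competitor_count):
    """Return ranks as ints, or raise ValueError if they are not a permutation of 1..n."""
    numbers = [int(rank) for rank in ranks]
    if sorted(numbers) != list(range(1, competitor_count + 1)):
        raise ValueError(
            f"Expected each of 1..{competitor_count} exactly once, got {numbers}"
        )
    return numbers

# Order inferred from the judge's ranking and the printed results
competitor_names = [
    "gpt-4o-mini",
    "gemini-2.0-flash",
    "deepseek-chat",
    "llama-3.3-70b-versatile",
    "llama3.2",
]

ranking = validate_ranking(["2", "3", "1", "5", "4"], len(competitor_names))
ordered = [competitor_names[number - 1] for number in ranking]
for position, name in enumerate(ordered, start=1):
    print(f"Rank {position}: {name}")
```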


<table style="margin: 0; text-align: left; width:100%">
    <tr>
        <td style="width: 150px; height: 150px; vertical-align: middle;">
            <img src="../assets/exercise.png" width="150" height="150" style="display: block;" />
        </td>
        <td>
            <h2 style="color:#ff7800;">Exercise</h2>
            <span style="color:#ff7800;">Which pattern(s) did this use? Try updating this to add another Agentic design pattern.
            </span>
        </td>
    </tr>
</table>

In [None]:
import os
import json
from dotenv import load_dotenv
from openai import OpenAI
from IPython.display import Markdown, display

load_dotenv(override=True)

# API Keys
openai_api_key = os.getenv('OPENAI_API_KEY')
google_api_key = os.getenv('GOOGLE_API_KEY')
deepseek_api_key = os.getenv('DEEPSEEK_API_KEY')
groq_api_key = os.getenv('GROQ_API_KEY')

# Check API keys
api_keys = {
    "openai": openai_api_key,
    "google": google_api_key, 
    "deepseek": deepseek_api_key,
    "groq": groq_api_key
}

for name, key in api_keys.items():
    status = "exists" if key else "not set"
    print(f"API Key for {name}: {status}")

print("--------------------------------")

def get_llm_response(model_name, messages, api_key=None, base_url=None):
    """Simple function to get response from any OpenAI-compatible API"""
    if base_url:
        client = OpenAI(api_key=api_key, base_url=base_url)
    else:
        client = OpenAI(api_key=api_key)
    
    response = client.chat.completions.create(model=model_name, messages=messages)
    return response.choices[0].message.content

# Coordinator: sends the question to each listed model and collects the responses
def coordinator_llm_responses(question, models):
    responses = {}
    messages = [{"role": "user", "content": question}]
    for model_name, api_key, base_url in models:
        if api_key:
            try:
                response = get_llm_response(model_name, messages, api_key, base_url)
                responses[model_name] = response
            except Exception as e:
                responses[model_name] = f"Error: {str(e)}"
        else:
            responses[model_name] = "No response (API key not set)"
    return responses



# Synthesizer: uses o3-mini to summarize and compare the responses
def synthesize_responses(responses, synthesizer_model="o3-mini", api_key=openai_api_key):
    # Count the responses actually passed in, rather than relying on a global from another cell
    prompt = f"You are a judge in a competition among {len(responses)} LLMs. Please summarize and compare the following responses from different LLMs:\n"
    for model, answer in responses.items():
        prompt += f"\nModel: {model}\nResponse: {answer}\n"
    prompt += "\nSummarize concisely and compare the responses. Your job is to evaluate each model's response and assign a score from 1 to 100. The model with the highest score is the best."
    
    messages = [{"role": "user", "content": prompt}]
    summary = get_llm_response(synthesizer_model, messages, api_key)
    return summary

# Generate question using gpt-4o-mini
request = "Please come up with a challenging, nuanced question that I can ask a number of LLMs to evaluate their intelligence. Answer only with the question, no explanation."
messages = [{"role": "user", "content": request}]
question = get_llm_response("gpt-4o-mini", messages, openai_api_key)
print("Question:", question)

# Models to test
models = [
    ("gpt-4o-mini", openai_api_key, None),
    ("gemini-2.0-flash", google_api_key, "https://generativelanguage.googleapis.com/v1beta/openai/"),
    ("llama-3.3-70b-versatile", groq_api_key, "https://api.groq.com/openai/v1"),
    ("deepseek-chat", deepseek_api_key, "https://api.deepseek.com/v1")
   # ("llama3.2", "gemma", "http://localhost:11434/v1")
]

# Orchestrate the responses from the different LLMs
llm_responses = coordinator_llm_responses(question, models)

# Display each individual response
for model_name, answer in llm_responses.items():
    display(Markdown(f"### {model_name}\n{answer}"))

# Synthesize the responses, using o3-mini to summarize and compare them
synthesized_summary = synthesize_responses(llm_responses)

display(Markdown(f"""#Síntesis Comparativa {synthesized_summary}"""))

API Key for openai: exists
API Key for google: exists
API Key for deepseek: exists
API Key for groq: exists
--------------------------------
Question: If you could design an education system from scratch that prioritizes critical thinking, creativity, and emotional intelligence over standardized testing, what fundamental principles would you incorporate, and how would you assess the success of such a system?


### gpt-4o-mini
Designing an education system that prioritizes critical thinking, creativity, and emotional intelligence involves several fundamental principles and a comprehensive assessment strategy. Here’s a detailed framework:

### Fundamental Principles

1. **Student-Centered Learning**:
   - Prioritize personalized learning plans that cater to individual interests, strengths, and learning styles.
   - Encourage self-directed learning where students take initiative in their own education.

2. **Interdisciplinary Curriculum**:
   - Develop a curriculum that integrates subjects rather than isolating them, allowing students to make connections between different fields of knowledge.
   - Emphasize real-world problems and project-based learning that require critical thinking and creativity to solve.

3. **Focus on Skills Over Content Memorization**:
   - Shift the emphasis from rote memorization to the mastery of skills such as analytical reasoning, problem-solving, collaboration, and communication.
   - Include workshops and hands-on activities that foster creativity, like design thinking or maker spaces.

4. **Emotional Intelligence Development**:
   - Incorporate social-emotional learning (SEL) programs that teach self-awareness, self-regulation, empathy, and relationship skills.
   - Create a safe and supportive environment where students can express their emotions and experience mental well-being.

5. **Collaborative Learning**:
   - Foster collaboration through group projects, peer teaching, and encouraging diversity of thought.
   - Promote a culture of dialogue where students learn to discuss and debate ideas respectfully.

6. **Teacher Training and Autonomy**:
   - Provide comprehensive training for teachers in facilitating inquiry-based learning, emotional intelligence, and differentiated instruction.
   - Allow teachers autonomy in tailoring their teaching methods and curriculum to best meet their students' needs.

7. **Community Engagement**:
   - Functionally connect with the community through service-learning projects, internships, or partnerships with local organizations.
   - Encourage students to think critically about their roles in society and the impact they can make.

8. **Lifelong Learning Mindset**:
   - Instill a passion for lifelong learning by making education relevant to students’ lives and interests.
   - Encourage curiosity, questioning, and the exploration of new ideas as a constant practice.

### Assessment Strategies

1. **Portfolio-Based Assessment**:
   - Utilize student portfolios that showcase a variety of work over time, including projects, reflections, and assessments of skills and competencies.
   - Encourage self-assessment and peer-assessment within portfolios to foster ownership of learning.

2. **Project and Performance Assessments**:
   - Assess students through project-based learning outcomes, performances, or presentations that demonstrate the application of critical thinking and creativity.
   - Include interdisciplinary projects that require collaboration and innovation.

3. **Reflective Journals**:
   - Incorporate reflective practices where students regularly write about their learning experiences, emotional insights, and personal growth.
   - Use these reflections to help students articulate their understanding and assess their development in emotional intelligence.

4. **Formative Feedback**:
   - Utilize continuous formative assessment methods, like observations, check-ins, and discussions, to provide ongoing feedback rather than relying solely on summative assessments.
   - Encourage feedback loops where students can iterate on their work based on constructive feedback.

5. **Peer and Community Feedback**:
   - Involve peers and community members in assessing student projects and presentations to provide a broader perspective on student performance and impact.
   - Develop rubrics that reflect the importance of creativity, critical thinking, and emotional intelligence.

6. **Well-Being and Engagement Metrics**:
   - Assess emotional intelligence development through surveys or tools that measure student well-being, engagement, and relationship skills.
   - Monitor attendance, participation in extracurricular activities, and community involvement as indicators of a thriving education environment.

### Conclusion

This education system is built on the principles of holistic development, prioritizing emotional and social capacities alongside academic abilities. Success measurement would not be solely through traditional metrics but by observing the growth of empowered, innovative, and emotionally intelligent individuals ready to contribute positively to society.

### gemini-2.0-flash
Okay, if I could design an education system from scratch prioritizing critical thinking, creativity, and emotional intelligence over standardized testing, here's what it would look like:

**Fundamental Principles:**

1.  **Inquiry-Based Learning:**  The core of the curriculum would revolve around posing questions, investigating, and discovering answers through hands-on activities, projects, and real-world problem-solving.  Less lecturing, more exploring.  Curiosity would be celebrated and nurtured.

2.  **Personalized and Adaptive Learning Paths:** Recognizing that every student learns differently and at a different pace, the system would adapt to individual needs, interests, and learning styles.  Technology would play a key role in identifying strengths and weaknesses and providing tailored learning experiences.  Students would have some agency in choosing their areas of focus and projects.

3.  **Interdisciplinary and Project-Based Learning:** Subjects would be interwoven to reflect the interconnectedness of the real world.  Students would engage in long-term, collaborative projects that require them to apply knowledge and skills from multiple disciplines to solve complex problems.  Examples could include designing sustainable communities, creating interactive museum exhibits, or developing social impact campaigns.

4.  **Emphasis on Process Over Product (Initially):**  While outcomes are important, the initial focus would be on the learning *process* itself.  Students would be encouraged to experiment, take risks, and learn from their mistakes without fear of failure.  Feedback would be formative and focused on growth, rather than simply assigning grades.  Later, as they mature, product quality would be emphasized, still within the frame of continuous improvement.

5.  **Cultivating Emotional Intelligence:** Emotional intelligence would be explicitly taught and practiced.  Students would learn about self-awareness, self-regulation, empathy, social skills, and motivation.  Activities could include mindfulness practices, role-playing scenarios, group discussions, and community service projects.

6.  **Developing Critical Thinking Skills:**  Critical thinking would be integrated into every aspect of the curriculum.  Students would be taught how to analyze information, evaluate arguments, identify biases, and form their own well-reasoned opinions.  Activities could include debates, simulations, case studies, and the analysis of primary sources.

7.  **Fostering Creativity and Innovation:**  Creativity would be nurtured through activities that encourage experimentation, imagination, and unconventional thinking.  Students would have opportunities to explore different art forms, design solutions to problems, and develop their own unique perspectives.  This could involve maker spaces, design thinking workshops, and opportunities to pursue personal projects.

8.  **Community Engagement and Real-World Application:**  Learning would extend beyond the classroom walls.  Students would engage in community service projects, internships, and mentorship programs that connect them with real-world problems and opportunities.  They would learn to apply their knowledge and skills to make a positive impact on their communities.

9.  **Ethical and Civic Responsibility:** The curriculum would emphasize the importance of ethical decision-making, social justice, and civic engagement. Students would learn about their rights and responsibilities as citizens and be encouraged to participate actively in their communities.

10. **Teacher as Facilitator and Mentor:** Teachers would transition from being lecturers to facilitators and mentors.  They would guide students in their learning journeys, provide personalized support, and create a supportive and collaborative classroom environment.  Ongoing professional development would be essential for teachers to stay current with best practices and emerging technologies.

**Assessment of Success:**

Instead of relying solely on standardized tests, the success of this education system would be assessed through a multi-faceted approach:

1.  **Portfolio-Based Assessment:** Students would create portfolios showcasing their work, reflections, and achievements over time. These portfolios would demonstrate their growth in critical thinking, creativity, emotional intelligence, and other key areas.

2.  **Project-Based Assessment:** Students would be assessed on their ability to complete complex projects that require them to apply knowledge and skills from multiple disciplines. Assessment criteria would include creativity, problem-solving skills, communication skills, and teamwork.  Rubrics would focus on both the process and the outcome.

3.  **Performance-Based Assessment:** Students would be assessed on their ability to perform specific tasks or demonstrate specific skills in real-world contexts. Examples could include presenting research findings, leading a group discussion, or designing a marketing campaign.

4.  **Self-Assessment and Peer Assessment:** Students would be encouraged to reflect on their own learning and provide feedback to their peers. This would help them develop self-awareness, critical thinking skills, and communication skills.

5.  **Observations and Anecdotal Records:** Teachers would observe students in the classroom and document their progress, strengths, and areas for improvement. This would provide a more holistic picture of student learning than standardized tests.

6.  **Surveys and Interviews:** Students, parents, and community members would be surveyed and interviewed to gather feedback on the effectiveness of the education system. This would help identify areas for improvement and ensure that the system is meeting the needs of all stakeholders.

7.  **Longitudinal Studies:** Track students' progress after they graduate from the system.  Measure their success in higher education, careers, and civic engagement.  This would provide valuable data on the long-term impact of the education system.  Specifically, look at:

    *   **Job satisfaction and career advancement:** Are graduates thriving in their chosen fields?
    *   **Entrepreneurial success:** Are graduates creating new businesses and innovations?
    *   **Community involvement:** Are graduates actively engaged in their communities and contributing to society?
    *   **Lifelong learning:** Are graduates continuing to learn and grow throughout their lives?
    *   **Mental and emotional well-being:** Are graduates leading fulfilling and healthy lives?

8. **Qualitative Data Collection:** Use focus groups and case studies to gather in-depth information about students' experiences and perspectives.

**Key Metrics of Success:**

*   **Student Engagement and Motivation:**  Higher levels of participation, intrinsic motivation, and a love of learning.
*   **Problem-Solving Abilities:**  Demonstrated ability to analyze complex problems and develop creative solutions.
*   **Communication Skills:**  Effective communication in both written and oral forms.
*   **Collaboration Skills:**  Ability to work effectively in teams and contribute to group goals.
*   **Emotional Intelligence:**  Demonstrated self-awareness, empathy, and social skills.
*   **Civic Engagement:**  Active participation in community service and civic activities.
*   **Innovation and Creativity:**  Generation of new ideas and solutions.
*   **Adaptability and Resilience:**  Ability to adapt to change and overcome challenges.
*   **Critical Thinking:**  The ability to analyze information objectively and make reasoned judgments.
*   **Ethical Reasoning:** Demonstrating sound moral judgment.

This education system would be a continuous work in progress, constantly evolving based on feedback and new research. It would be a system that empowers students to become lifelong learners, critical thinkers, creative problem-solvers, and engaged citizens who are prepared to thrive in a rapidly changing world. It would also require a significant investment in teacher training and ongoing professional development to ensure that educators are equipped to implement this innovative approach.


### llama-3.3-70b-versatile
If I could design an education system from scratch, I would prioritize critical thinking, creativity, and emotional intelligence over standardized testing. Here are the fundamental principles I would incorporate and how I would assess the success of such a system:

**Fundamental Principles:**

1. **Student-Centered Learning**: The education system would be centered around the individual needs, interests, and learning styles of each student, rather than a one-size-fits-all approach.
2. **Interdisciplinary Curriculum**: The curriculum would be integrated across subjects, focusing on connections and relationships between different disciplines, rather than compartmentalization.
3. **Experiential Learning**: Hands-on, project-based learning would be emphasized, allowing students to apply theoretical concepts to real-world problems and scenarios.
4. **Collaboration and Feedback**: Students would work in teams, receiving feedback from peers and instructors, to foster critical thinking, problem-solving, and effective communication.
5. **Emotional Intelligence and Well-being**: The education system would prioritize emotional intelligence, social-emotional learning, and mental health, recognizing the importance of these aspects in overall well-being and academic success.
6. **Flexibility and Autonomy**: Students would have flexibility in their learning pathways, with opportunities to explore different subjects and interests, and autonomy to make choices about their own learning.
7. **Community Engagement**: The education system would be connected to the broader community, with partnerships and collaborations that provide students with real-world learning opportunities and mentorship.

**Assessment Methods:**

1. **Competency-Based Progression**: Students would progress through the education system based on demonstrated competencies, rather than age or grade level.
2. **Project-Based Assessments**: Students would complete projects that demonstrate their mastery of skills and knowledge, rather than traditional multiple-choice tests.
3. **Peer and Self-Assessment**: Students would be encouraged to reflect on their own learning and provide feedback to peers, promoting self-awareness and metacognition.
4. **Narrative Evaluations**: Instructors would provide narrative evaluations of student progress, highlighting strengths, areas for improvement, and suggestions for future growth.
5. **Exhibition of Learning**: Students would have opportunities to exhibit their learning through presentations, exhibitions, and performances, allowing them to showcase their skills and knowledge in a more authentic and engaging way.

**Success Indicators:**

1. **Student Engagement and Motivation**: Students would be enthusiastic about learning, with high levels of engagement and motivation.
2. **Critical Thinking and Problem-Solving**: Students would demonstrate strong critical thinking and problem-solving skills, with the ability to apply knowledge to complex, real-world scenarios.
3. **Creativity and Innovation**: Students would exhibit creativity and innovation in their work, with a willingness to take risks and explore new ideas.
4. **Emotional Intelligence and Well-being**: Students would demonstrate high levels of emotional intelligence, with strong social-emotional skills, and good mental health and well-being.
5. **Community Involvement and Civic Engagement**: Students would be actively engaged in their communities, with a sense of social responsibility and a desire to make a positive impact.
6. **Graduate Outcomes**: Graduates would be well-prepared for success in their chosen fields, with high levels of employability, entrepreneurship, and further education.

**Implementation and Evaluation:**

1. **Pilot Programs**: The education system would be piloted in a small number of schools or districts, with ongoing evaluation and refinement.
2. **Teacher Training and Support**: Teachers would receive training and support to implement the new education system, with a focus on developing their own critical thinking, creativity, and emotional intelligence.
3. **Ongoing Evaluation and Improvement**: The education system would be continuously evaluated and improved, with feedback from students, teachers, parents, and the broader community.
4. **Research and Development**: The education system would be informed by ongoing research and development, with a focus on staying up-to-date with the latest advances in educational theory and practice.

By incorporating these fundamental principles and assessment methods, the education system would prioritize critical thinking, creativity, and emotional intelligence, while providing students with a more engaging, relevant, and effective learning experience.

### deepseek-chat
Designing an education system that prioritizes **critical thinking, creativity, and emotional intelligence (EQ)** over standardized testing requires a fundamental shift in philosophy, pedagogy, and assessment. Below are the key principles and methods for evaluating success:

### **Fundamental Principles**  

1. **Holistic Development Over Rote Memorization**  
   - Focus on **problem-solving, inquiry-based learning, and interdisciplinary projects** rather than memorization.  
   - Encourage students to ask questions, challenge assumptions, and explore multiple perspectives.  

2. **Student-Centered Learning**  
   - Personalized learning paths based on interests, strengths, and developmental needs.  
   - Flexible curricula that adapt to individual learning styles (e.g., project-based, experiential, or self-directed learning).  

3. **Emotional Intelligence (EQ) Integration**  
   - Teach **self-awareness, empathy, resilience, and conflict resolution** as core skills.  
   - Incorporate mindfulness, social-emotional learning (SEL), and peer collaboration into daily routines.  

4. **Creativity & Innovation as Core Competencies**  
   - Emphasize **divergent thinking, artistic expression, and open-ended problem-solving**.  
   - Provide opportunities for **design thinking, entrepreneurship, and maker-spaces** (hands-on creation).  

5. **Collaborative & Real-World Learning**  
   - Group projects, community engagement, and internships to foster teamwork and practical skills.  
   - Connect learning to real-world issues (e.g., sustainability, social justice, technology ethics).  

6. **Assessment Through Growth, Not Just Outcomes**  
   - Replace standardized tests with **portfolios, presentations, peer reviews, and self-reflections**.  
   - Measure progress through **longitudinal development** (e.g., how a student’s reasoning evolves over time).  

7. **Teacher as Facilitator, Not Authority**  
   - Educators should act as **mentors and guides**, fostering curiosity rather than dispensing facts.  
   - Professional development should focus on **SEL, adaptive teaching, and creative pedagogy**.  

### **How to Assess Success**  

1. **Qualitative & Narrative Evaluations**  
   - **Portfolios** showcasing projects, essays, and creative work.  
   - **Reflective journals** where students analyze their growth in critical thinking and EQ.  

2. **Performance-Based Assessments**  
   - **Debates, simulations, and real-world problem-solving challenges** (e.g., designing a solution to a local issue).  
   - **Peer & self-assessments** to evaluate collaboration and emotional intelligence.  

3. **Long-Term Outcomes**  
   - **Tracking alumni success** in careers, civic engagement, and personal well-being.  
   - Surveys on **life satisfaction, adaptability, and continued learning habits**.  

4. **Behavioral & Social Metrics**  
   - **Conflict resolution skills** observed in group settings.  
   - **Empathy and ethical decision-making** assessed through role-playing scenarios.  

5. **Innovation & Problem-Solving Indicators**  
   - Number of **student-led initiatives, patents, or creative works** produced.  
   - Ability to **apply knowledge in novel situations** (e.g., hackathons, design competitions).  

### **Conclusion**  
This system would move away from ranking students based on test scores and instead nurture **adaptable, empathetic, and innovative thinkers**. Success would be measured by **how well students apply their learning to real life**, not just how they perform on exams.  

Would you incorporate any additional elements, such as technology integration or global citizenship?

#Síntesis Comparativa Below is a concise summary and comparison of each model's response, along with an evaluative score from 1 to 100:

──────────────────────────────
1. gpt-4o-mini  
• Summary: Proposes a comprehensive, student-centered educational transformation with an emphasis on holistic development (critical thinking, creativity, and emotional competence). It details eight fundamental principles (from student-centered learning to community engagement) and offers varied assessment strategies (portfolios, project-based assessments, formative feedback, and well-being metrics).  
• Strengths: A well-structured approach, a clear balance between pedagogical content and assessment methods, and a robust emphasis on teacher training and autonomy.  
• Score: 88

──────────────────────────────
2. gemini-2.0-flash  
• Summary: Offers an innovative education system design grounded in ten principles (inquiry-based learning, personalized learning, interdisciplinarity, emphasis on process, emotional intelligence, among others). It also describes assessment methods in detail, including portfolios, project-based assessments, self-assessments, and longitudinal studies, highlighting both process and outcomes.  
• Strengths: Great depth and breadth in describing principles and assessment methods, concrete examples, and a comprehensive vision that spans the teacher's role, the connection with the community, and ethical development.  
• Score: 95

──────────────────────────────
3. llama-3.3-70b-versatile  
• Summary: Proposes a system focused on student-centered learning, interdisciplinarity, and hands-on experience. It includes seven principles and assessment methods based on competency-based progression, projects, and narrative evaluations, with an emphasis on self-reflection and community engagement.  
• Strengths: A clear, coherent response with a realistic approach that integrates fundamental principles and qualitative assessment, though with less depth and variety of examples than the longer responses.  
• Score: 85

──────────────────────────────
4. deepseek-chat  
• Summary: Centers the proposal on holistic development, combining experiential learning with a strong integration of emotional intelligence and creativity. It details seven key principles and innovative assessment methods, from narrative and performance-based evaluations to long-term development indicators and behavioral metrics.  
• Strengths: Presents an appealing balance between educational philosophy and assessment practice, with a focus on continuous growth and real-world application of learning; its methods are innovative and up to date.  
• Score: 90

──────────────────────────────
Comparison and Conclusion:
• All models share the vision of moving away from traditional education based on standardized tests, and they prioritize learning that fosters critical thinking, creativity, and emotional intelligence.  
• gemini-2.0-flash stands out for the breadth and depth of its proposal, offering numerous principles and assessment methods with concrete examples, which earns it the highest score.  
• gpt-4o-mini and deepseek-chat are close in quality, presenting well-structured, robust responses, while llama-3.3-70b-versatile, though clear and coherent, is somewhat more condensed and less detailed.  

The model with the highest score is gemini-2.0-flash (95), so its response is considered the best in this competition.
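
Free-text scores like these are hard to use programmatically. One option is to parse the scores and pick the winner in code. This sketch assumes the synthesizer prompt were changed to request a JSON object of scores, which the notebook does not do. The numbers are the scores reported in the synthesis above, and the JSON shape is an assumption.

```python
import json

# Assumed shape of a structured reply; scores copied from the synthesis output
structured_reply = json.dumps({
    "gpt-4o-mini": 88,
    "gemini-2.0-flash": 95,
    "llama-3.3-70b-versatile": 85,
    "deepseek-chat": 90,
})

scores = json.loads(structured_reply)

# Sort models from highest to lowest score
leaderboard = sorted(scores.items(), key=lambda item: item[1], reverse=True)
best_model, best_score = leaderboard[0]

for model_name, score in leaderboard:
    print(f"{model_name}: {score}")
print(f"Best: {best_model} ({best_score})")
```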

<table style="margin: 0; text-align: left; width:100%">
    <tr>
        <td style="width: 150px; height: 150px; vertical-align: middle;">
            <img src="../assets/business.png" width="150" height="150" style="display: block;" />
        </td>
        <td>
            <h2 style="color:#00bfff;">Commercial implications</h2>
            <span style="color:#00bfff;">Patterns like these, where you send one task to multiple models and then evaluate the results,
            are common when you need to improve the quality of an LLM response. The approach applies broadly
            to business projects where accuracy is critical.
            </span>
        </td>
    </tr>
</table>
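
The fan-out step in the code above runs sequentially, so total latency is roughly the sum of all model calls. Below is a minimal sketch of running the calls concurrently with the standard library. `fake_model_call` is a hypothetical stand-in for a real API call such as `get_llm_response`, and the model names are placeholders.

```python
from concurrent.futures import ThreadPoolExecutor

def fake_model_call(model_name, question):
    # Hypothetical stand-in for a network call to an LLM API
    return f"{model_name} answer to: {question}"

def fan_out(question, model_names, call=fake_model_call):
    """Send one question to every model concurrently; return {model_name: answer or error text}."""
    def safe_call(model_name):
        # Capture failures per model so one bad call does not sink the whole batch
        try:
            return call(model_name, question)
        except Exception as error:
            return f"Error: {error}"

    # Threads suit this workload because the calls are I/O-bound
    with ThreadPoolExecutor(max_workers=len(model_names) or 1) as pool:
        answers = list(pool.map(safe_call, model_names))  # map preserves input order
    return dict(zip(model_names, answers))

results = fan_out("What is 2 + 2?", ["model-a", "model-b"])
for model_name, answer in results.items():
    print(f"{model_name}: {answer}")
```

Because `pool.map` preserves input order, the answers stay aligned with the model names even when calls finish out of order.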