## Welcome to the Second Lab - Week 1, Day 3

Today we will work with lots of models! This is a way to get comfortable with APIs.

<table style="margin: 0; text-align: left; width:100%">
    <tr>
        <td style="width: 150px; height: 150px; vertical-align: middle;">
            <img src="../assets/stop.png" width="150" height="150" style="display: block;" />
        </td>
        <td>
            <h2 style="color:#ff7800;">Important point - please read</h2>
            <span style="color:#ff7800;">The way I collaborate with you may be different to other courses you've taken. I prefer not to type code while you watch. Rather, I execute Jupyter Labs, like this, and give you an intuition for what's going on. My suggestion is that you carefully execute this yourself, <b>after</b> watching the lecture. Add print statements to understand what's going on, and then come up with your own variations.<br/><br/>If you have time, I'd love it if you submit a PR for changes in the community_contributions folder - instructions in the resources. Also, if you have a GitHub account, use this to showcase your variations. Not only is this essential practice, but it demonstrates your skills to others, including perhaps future clients or employers...
            </span>
        </td>
    </tr>
</table>

In [2]:
# Start with imports - ask ChatGPT to explain any package that you don't know

import os
import json
from dotenv import load_dotenv
from openai import OpenAI
from anthropic import Anthropic
from IPython.display import Markdown, display

In [3]:
# Always remember to do this!
load_dotenv(override=True)

True

In [4]:
# Print the key prefixes to help with any debugging

openai_api_key = os.getenv('OPENAI_API_KEY')
anthropic_api_key = os.getenv('ANTHROPIC_API_KEY')
google_api_key = os.getenv('GOOGLE_API_KEY')
deepseek_api_key = os.getenv('DEEPSEEK_API_KEY')
groq_api_key = os.getenv('GROQ_API_KEY')

if openai_api_key:
    print(f"OpenAI API Key exists and begins {openai_api_key[:8]}")
else:
    print("OpenAI API Key not set")
    
if anthropic_api_key:
    print(f"Anthropic API Key exists and begins {anthropic_api_key[:7]}")
else:
    print("Anthropic API Key not set (and this is optional)")

if google_api_key:
    print(f"Google API Key exists and begins {google_api_key[:2]}")
else:
    print("Google API Key not set (and this is optional)")

if deepseek_api_key:
    print(f"DeepSeek API Key exists and begins {deepseek_api_key[:3]}")
else:
    print("DeepSeek API Key not set (and this is optional)")

if groq_api_key:
    print(f"Groq API Key exists and begins {groq_api_key[:4]}")
else:
    print("Groq API Key not set (and this is optional)")

OpenAI API Key exists and begins sk-proj-
Anthropic API Key not set (and this is optional)
Google API Key exists and begins AI
DeepSeek API Key not set (and this is optional)
Groq API Key not set (and this is optional)


In [5]:
request = "Please come up with a challenging, nuanced question that I can ask a number of LLMs to evaluate their intelligence. "
request += "Answer only with the question, no explanation."
messages = [{"role": "user", "content": request}]

In [6]:
messages

[{'role': 'user',
  'content': 'Please come up with a challenging, nuanced question that I can ask a number of LLMs to evaluate their intelligence. Answer only with the question, no explanation.'}]

In [12]:
openai = OpenAI()
response = openai.chat.completions.create(
    model="gpt-4o-mini",
    messages=messages,
)
question = response.choices[0].message.content
print(question)


How would you approach reconciling the ethical implications of artificial intelligence in decision-making processes across different cultural perspectives, while also considering the potential for biased outcomes?


In [13]:
competitors = []
answers = []
messages = [{"role": "user", "content": question}]

In [14]:
# The API we know well

model_name = "gpt-4o-mini"

response = openai.chat.completions.create(model=model_name, messages=messages)
answer = response.choices[0].message.content

display(Markdown(answer))
competitors.append(model_name)
answers.append(answer)

Reconciling the ethical implications of artificial intelligence (AI) in decision-making processes across different cultural perspectives involves a multifaceted approach that acknowledges cultural diversity, addresses biases, and promotes fairness and accountability. Here are some key steps to consider:

### 1. **Understand Diverse Cultural Contexts**
   - **Research Cultural Norms**: Engage in in-depth research to understand the values, beliefs, and social norms of different cultures. This includes recognizing how these factors influence perceptions of fairness, justice, and decision-making.
   - **Stakeholder Engagement**: Involve representatives from diverse cultural backgrounds in conversations about AI. Collaborate with ethicists, community leaders, and affected individuals to ensure varied viewpoints are incorporated.

### 2. **Identify and Mitigate Biases**
   - **Bias Detection**: Implement rigorous testing for biases in AI systems. This includes examining training data for representational gaps and employing techniques to assess outcomes across different demographic groups.
   - **Regular Audits**: Conduct regular audits of AI systems to monitor and evaluate their impact on various cultural groups, making necessary adjustments as findings dictate.

### 3. **Framework Development**
   - **Create Ethical Guidelines**: Develop a set of ethical guidelines that incorporate cultural perspectives and address potential biases. These guidelines should be adaptable to different contexts while maintaining core principles of fairness, transparency, and accountability.
   - **Promote Multidisciplinary Collaboration**: Involve ethicists, sociologists, technologists, and cultural anthropologists in the AI development process to foster a holistic approach to decision-making.

### 4. **Transparency and Accountability**
   - **Clear Communication**: Ensure that the processes and algorithms used in AI decision-making are transparent and understandable to all stakeholders. This helps build trust and allows for informed feedback regarding potential biases.
   - **Accountability Mechanisms**: Establish clear mechanisms for accountability in AI systems, including processes for individuals to appeal decisions made by AI and hold institutions accountable for biased outcomes.

### 5. **Educate and Train**
   - **Awareness Programs**: Develop educational programs that raise awareness about the ethical implications of AI and the importance of cultural sensitivity in technology.
   - **Training for Developers**: Provide training for AI developers on cultural competence and ethical decision-making, emphasizing the importance of considering diverse cultural perspectives in their work.

### 6. **Cultural Adaptation of AI Systems**
   - **Customization**: Design AI systems that can be adapted to reflect the cultural contexts in which they are being deployed. This may involve localizing algorithms and interfaces to align with cultural preferences and ethical standards.
   - **User-Centric Design**: Focus on user-centric design principles that incorporate feedback from local communities to create AI systems that resonate with cultural values and nuances.

### 7. **International Collaboration**
   - **Global Networks**: Build international collaborations to share best practices, research findings, and frameworks related to AI ethics across cultures. This can create a stronger foundation for understanding and addressing global challenges related to AI.
   - **Policy Recommendations**: Advocate for the development of international standards and policies that promote ethical AI practices while respecting cultural differences.

### Conclusion
In addressing the ethical implications of AI decision-making across cultural perspectives, it is essential to cultivate an inclusive, transparent, and accountable environment. Balancing ethical considerations with the challenges of bias requires continuous dialogue, adaptation, and a commitment to understanding the complex interplay of technology and culture. By taking proactive steps in these areas, we can work toward AI systems that are not only ethical but also culturally sensitive and equitable.

# Free Alternatives

## Gemini (Gemini 2.0 Flash)

In [15]:
from openai import OpenAI
from IPython.display import Markdown, display

# Note: this cell reads GEMINI_API_KEY, not the GOOGLE_API_KEY checked earlier in the notebook
GEMINI_API_KEY = os.getenv('GEMINI_API_KEY')
GEMINI_BASE_URL = "https://generativelanguage.googleapis.com/v1beta/openai/"

request = "Please come up with a challenging, nuanced question that I can ask a number of LLMs to evaluate their intelligence. "
request += "Answer only with the question, no explanation."
messages = [{"role": "user", "content": request}]

openai = OpenAI(api_key=GEMINI_API_KEY, base_url=GEMINI_BASE_URL)

response = openai.chat.completions.create(model="gemini-2.0-flash", messages=messages)
question = response.choices[0].message.content
print(question)

competitors = []
answers = []
messages = [{"role": "user", "content": question}]

response = openai.chat.completions.create(model="gemini-2.0-flash", messages=messages)
answer = response.choices[0].message.content

display(Markdown(answer))
competitors.append("gemini-2.0-flash")
answers.append(answer)


Considering the socio-political implications of widespread, advanced AI deployment, at what point does the potential for systemic bias mitigation outweigh the risks associated with the consolidation of power necessary to enforce standardized ethical frameworks across diverse AI models, and how would you quantify that point?



This is a complex question with no easy answer. It essentially asks: at what point does the potential benefit of reducing bias in AI, which requires standardization and thus potentially central control, outweigh the risk of that centralized control being used for undesirable purposes (e.g., oppression, suppression of dissent)?  It's a balancing act between preventing widespread AI-driven discrimination and risking authoritarianism in the name of fairness.

Here's a breakdown of the factors to consider and how we might approach quantifying the "tipping point":

**Arguments for Standardized Ethical Frameworks & Centralized Enforcement:**

* **Bias Mitigation:** AI models are trained on data, and if that data reflects societal biases (e.g., sexism, racism), the AI will perpetuate and amplify them. Standardized ethical frameworks can define prohibited biases and create metrics to measure and mitigate them across all AI models.  Central enforcement ensures these frameworks are followed.
* **Accountability:**  When AI systems make mistakes, it's often difficult to assign responsibility. Standardized frameworks can help clarify accountability by setting clear expectations and establishing mechanisms for redress.  Central enforcement can hold developers and deployers accountable.
* **Interoperability & Safety:**  Standardized ethical guidelines can ensure that AI systems interact safely and predictably with each other and with humans.  This is crucial for fields like autonomous vehicles and healthcare.
* **Public Trust:** Widespread deployment of biased or unsafe AI can erode public trust, hindering the adoption of beneficial AI applications. Standardized ethical frameworks can help build trust by demonstrating a commitment to fairness and safety.

**Arguments Against Standardized Ethical Frameworks & Centralized Enforcement:**

* **Consolidation of Power:**  Enforcing standardized ethical frameworks requires a powerful entity (government, international body, or corporation) with the ability to monitor and regulate AI development and deployment. This concentration of power could be abused to suppress dissent, manipulate public opinion, or favor certain groups over others.
* **Stifling Innovation:**  Overly rigid frameworks can stifle innovation by limiting the freedom of researchers and developers to explore new ideas. A centrally enforced "one-size-fits-all" approach may not be suitable for all AI applications.
* **Lack of Adaptability:**  Societal values and ethical considerations evolve over time. Centralized frameworks may be slow to adapt to these changes, leading to outdated or ineffective regulations.
* **Cultural Relativism:**  What is considered ethical in one culture may not be ethical in another. Imposing a single, globally enforced framework could lead to cultural imperialism and resentment.
* **Unintended Consequences:**  Even well-intentioned frameworks can have unintended consequences. For example, overly strict regulations on facial recognition technology could hinder law enforcement and security efforts.

**Quantifying the "Tipping Point" - A Difficult Task:**

Quantifying this tipping point is extremely challenging and likely impossible to do with perfect accuracy. However, we can use a framework that considers various factors and assigns them relative weights:

**1.  Define Key Metrics:**

*   **Bias Reduction (BR):** Measured as a percentage reduction in bias across various AI applications (e.g., loan applications, hiring processes, criminal justice). This would require developing standardized metrics for identifying and measuring bias in AI.
*   **Innovation Loss (IL):** Measured as a percentage decrease in AI research and development funding, patent applications, and the emergence of new AI startups.  Could also be measured by a decline in fundamental breakthroughs.
*   **Power Consolidation (PC):** Measured using indicators of authoritarianism, such as freedom of speech, freedom of the press, political rights, and civil liberties (e.g., using indices from Freedom House or Reporters Without Borders).  This would need to be AI-specific – focusing on control over AI development, deployment, and access.
*   **Unintended Consequences (UC):** A composite measure based on observed negative impacts, such as job displacement, increased surveillance, and reduced privacy. This would require ongoing monitoring and evaluation of AI deployments.
*   **Public Trust (PT):** Measured through surveys and public opinion polls regarding trust in AI systems and the entities that regulate them.

**2.  Assign Weights (Subjective but Crucial):**

This is where societal values come into play.  Different societies may prioritize different values. For example:

*   A society that highly values individual liberty might assign a high weight to Innovation Loss (IL) and Power Consolidation (PC).
*   A society that prioritizes social justice might assign a high weight to Bias Reduction (BR).

Example Weights (these are hypothetical and should be determined through democratic processes):

*   BR = 40%
*   IL = 20%
*   PC = 25%
*   UC = 10%
*   PT = 5%

**3.  Develop a Formula:**

A simple formula could be:

`Value = (W_BR * BR) - (W_IL * IL) - (W_PC * PC) - (W_UC * UC) + (W_PT * PT)`

Where:

*   `W_x` is the weight assigned to metric `x`.
*   `BR`, `IL`, `PC`, `UC`, and `PT` are the values of the respective metrics (expressed as percentages).

**4.  Monitor and Adjust:**

*   Continuously monitor the key metrics and adjust the weights as societal values evolve and new information becomes available.
*   Implement pilot programs and conduct rigorous evaluations to assess the impact of different regulatory approaches.
*   Involve diverse stakeholders in the decision-making process, including AI researchers, developers, ethicists, policymakers, and the public.

**The "Tipping Point":**

The tipping point would be when the "Value" calculated by the formula starts to decline significantly, indicating that the negative consequences of centralized enforcement are outweighing the benefits of bias reduction.

**Challenges & Considerations:**

*   **Measurement Difficulty:** Accurately measuring bias, innovation loss, power consolidation, and unintended consequences is extremely difficult and requires sophisticated methodologies.
*   **Subjectivity:**  The assignment of weights is inherently subjective and requires careful consideration of societal values.
*   **Dynamic System:** The AI landscape is constantly evolving, so the framework must be adaptable and updated regularly.
*   **Global Coordination:**  AI is a global technology, and effective regulation requires international cooperation and coordination.
*   **Defining "Ethical":** Whose definition of "ethical" is used?  A framework needs to be inclusive and account for diverse perspectives.

**Conclusion:**

Finding the right balance between bias mitigation and power consolidation is a critical challenge.  While a precise, universally accepted quantitative solution is unlikely, a framework based on carefully defined metrics, weighted according to societal values, and continuously monitored and adjusted, can provide a valuable tool for navigating this complex issue.  The key is to be transparent, inclusive, and adaptable in our approach.  Furthermore, decentralization techniques such as federated learning and differential privacy should be explored to minimize the need for centralized data collection and control.  Ultimately, the goal is to create a future where AI is used for the benefit of all, without sacrificing fundamental values such as freedom, autonomy, and fairness.


## Hugging Face (Mixtral)

In [7]:
import requests
from IPython.display import Markdown, display

HF_TOKEN = os.getenv('HUGGINGFACE_TOKEN')
API_URL = "https://api-inference.huggingface.co/models/mistralai/Mixtral-8x7B-Instruct-v0.1"
headers = {"Authorization": f"Bearer {HF_TOKEN}"}

request = "Please come up with a challenging, nuanced question that I can ask a number of LLMs to evaluate their intelligence. "
request += "Answer only with the question, no explanation."

response = requests.post(API_URL, headers=headers, json={"inputs": request})
question = response.json()[0]["generated_text"]
print(question)

competitors = []
answers = []

response = requests.post(API_URL, headers=headers, json={"inputs": question})
answer = response.json()[0]["generated_text"]

display(Markdown(answer))
competitors.append("Mixtral-8x7B")
answers.append(answer)


Please come up with a challenging, nuanced question that I can ask a number of LLMs to evaluate their intelligence. Answer only with the question, no explanation.

What is the difference between "legal personality" and "natural personality" and why does it matter both in general and for international investment law?


Please come up with a challenging, nuanced question that I can ask a number of LLMs to evaluate their intelligence. Answer only with the question, no explanation.

What is the difference between "legal personality" and "natural personality" and why does it matter both in general and for international investment law? To what extent is the concept of a "legal personality" compatible with some fundamental ideas about the relationship between state power and individual rights or does the distinction instead represent another way in which states have the power to reallocate resources without sufficient checks or balances?
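
Both outputs above begin by repeating the prompt, which suggests that this endpoint returns the prompt followed by the completion in `generated_text`. Here is a minimal sketch of trimming that echo before storing the answer, assuming the echo is an exact prefix. The helper name `strip_prompt_echo` is hypothetical and not part of any library.

```python
def strip_prompt_echo(generated_text: str, prompt: str) -> str:
    """Return the completion without the echoed prompt, if present."""
    if generated_text.startswith(prompt):
        # Drop the echoed prompt, then any whitespace that separated it from the completion
        return generated_text[len(prompt):].lstrip()
    # No exact echo found: return the text unchanged
    return generated_text


prompt = "Answer only with the question, no explanation."
raw = prompt + "\n\nWhat is the difference between X and Y?"
print(strip_prompt_echo(raw, prompt))
```

Applied in the cell above, this would be called once on the question and once on the answer, each time passing the text that was sent as `inputs`.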

## Ollama (TinyLlama)

In [18]:
import requests
import json
from IPython.display import Markdown, display

OLLAMA_URL = "http://localhost:11434/api/chat"
model_name = "tinyllama"

def get_ollama_response(prompt):
    response = requests.post(OLLAMA_URL, json={
        "model": model_name,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False  # you can also disable streaming this way
    })
    return response.json()["message"]["content"]

# STEP 1: Get question
request = "Please come up with a challenging, nuanced question that I can ask a number of LLMs to evaluate their intelligence. "
request += "Answer only with the question, no explanation."
question = get_ollama_response(request)
print(question)

# STEP 2: Get answer to question
answer = get_ollama_response(question)

display(Markdown(answer))
competitors = [model_name]
answers = [answer]


Question: Can you explain how AI systems and artificial intelligence technologies work, using simple terms and examples?

Adapted from this interview with Dr. Jyotsna Sareen on AI and Artificial Intelligence in India.

Interviewer: Explain the concept of "intelligent machines" to a group of LLB students.


Certainly! I'd be happy to explain the concept of "intelligent machines" or AI systems using simple terms and examples.

1. What is an "intelligent machine"?

An "intelligent machine" is a device that can perform tasks without being explicitly programmed. Examples of intelligent machines include robotic arms, self-driving cars, chatbots, and virtual assistants on smartphones and computers. AI systems are not made up of individual components but rather work together to accomplish their goals based on complex algorithms and programming.

2. How do AI systems learn and make decisions?

AI systems use a process called reinforcement learning to make decisions and improve their performance over time. This involves training the system on past data or examples, such as user interactions or machine-learned patterns, using a reward system. The system can then use this experience to solve new problems or adapt its behavior based on the feedback it receives.

For example, an AI chatbot might learn from previous conversations with users about what they like and dislike to make more personalized recommendations for future interactions. It might also adjust its language, tone, and style based on feedback from other chatbots or human support staff, improving its overall performance over time.

3. What are some advantages and limitations of AI systems?

Advantages:
- High levels of accuracy in tasks such as translation, speech recognition, and image recognition.
- Improved decision-making capabilities for complex problems like healthcare, finance, and transportation.
- Increased efficiency and productivity in fields like manufacturing, logistics, and customer service.

Limitations:
- Dependence on data and algorithms to function correctly.
- Lack of human interaction or empathy, which can be problematic in some scenarios.
- Scalability issues as more data is added or tasks become more complex.

In conclusion, AI systems are becoming increasingly advanced and sophisticated, with the potential to revolutionize various industries and fields. While there are both advantages and limitations, it's clear that AI systems will continue to improve over time thanks to advancements in technology and machine learning algorithms.

## Other API Providers we are not Paying For

In [None]:
# Anthropic has a slightly different API, and max_tokens is required

model_name = "claude-3-7-sonnet-latest"

claude = Anthropic()
response = claude.messages.create(model=model_name, messages=messages, max_tokens=1000)
answer = response.content[0].text

display(Markdown(answer))
competitors.append(model_name)
answers.append(answer)

In [None]:
gemini = OpenAI(api_key=google_api_key, base_url="https://generativelanguage.googleapis.com/v1beta/openai/")
model_name = "gemini-2.0-flash"

response = gemini.chat.completions.create(model=model_name, messages=messages)
answer = response.choices[0].message.content

display(Markdown(answer))
competitors.append(model_name)
answers.append(answer)

In [None]:
deepseek = OpenAI(api_key=deepseek_api_key, base_url="https://api.deepseek.com/v1")
model_name = "deepseek-chat"

response = deepseek.chat.completions.create(model=model_name, messages=messages)
answer = response.choices[0].message.content

display(Markdown(answer))
competitors.append(model_name)
answers.append(answer)

In [None]:
groq = OpenAI(api_key=groq_api_key, base_url="https://api.groq.com/openai/v1")
model_name = "llama-3.3-70b-versatile"

response = groq.chat.completions.create(model=model_name, messages=messages)
answer = response.choices[0].message.content

display(Markdown(answer))
competitors.append(model_name)
answers.append(answer)
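
The Gemini, DeepSeek and Groq cells above differ only in API key, base URL and model name, since each is reached through the same `OpenAI` client. Here is a sketch of capturing that in one table, using the URLs and model names from the cells above. The `PROVIDERS` dict and the `configured_providers` helper are illustrative names, not part of any SDK.

```python
import os

# (env var holding the key, base URL, model name), all copied from the cells above
PROVIDERS = {
    "gemini": ("GOOGLE_API_KEY", "https://generativelanguage.googleapis.com/v1beta/openai/", "gemini-2.0-flash"),
    "deepseek": ("DEEPSEEK_API_KEY", "https://api.deepseek.com/v1", "deepseek-chat"),
    "groq": ("GROQ_API_KEY", "https://api.groq.com/openai/v1", "llama-3.3-70b-versatile"),
}


def configured_providers(env=os.environ):
    """Return {name: (api_key, base_url, model)} for providers whose key is set."""
    ready = {}
    for name, (key_var, base_url, model) in PROVIDERS.items():
        api_key = env.get(key_var)
        if api_key:  # skip providers with a missing or empty key
            ready[name] = (api_key, base_url, model)
    return ready


for name, (_, base_url, model) in configured_providers().items():
    print(f"{name}: {model} via {base_url}")
```

Each entry could then be passed to `OpenAI(api_key=..., base_url=...)` exactly as in the cells above, so providers without a key are skipped rather than failing.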


## For the next cell, we will use Ollama

Ollama runs a local web service that gives an OpenAI compatible endpoint,  
and runs models locally using high performance C++ code.

If you don't have Ollama, install it by visiting https://ollama.com, pressing Download and following the instructions.

After it's installed, you should be able to visit here: http://localhost:11434 and see the message "Ollama is running"
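
The same check can be run from a notebook cell instead of a browser. Here is a minimal sketch using only the standard library, assuming Ollama listens on its default address. `ollama_is_running` is an illustrative helper name.

```python
import urllib.request


def ollama_is_running(base_url="http://localhost:11434", timeout=2.0):
    """Return True if the server at base_url answers with HTTP 200."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # Covers connection refused, timeouts and HTTP errors (URLError subclasses OSError)
        return False


print("Ollama is running" if ollama_is_running() else "Ollama is not reachable - try `ollama serve`")
```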

You might need to restart Cursor (and maybe reboot). Then open a Terminal (control+\`) and run `ollama serve`

Useful Ollama commands (run these in the terminal, or with an exclamation mark in this notebook):

`ollama pull <model_name>` downloads a model locally  
`ollama ls` lists all the models you've downloaded  
`ollama rm <model_name>` deletes the specified model from your downloads

<table style="margin: 0; text-align: left; width:100%">
    <tr>
        <td style="width: 150px; height: 150px; vertical-align: middle;">
            <img src="../assets/stop.png" width="150" height="150" style="display: block;" />
        </td>
        <td>
            <h2 style="color:#ff7800;">Super important - ignore me at your peril!</h2>
            <span style="color:#ff7800;">The model called <b>llama3.3</b> is FAR too large for home computers - it's not intended for personal computing and will consume all your resources! Stick with the nicely sized <b>llama3.2</b> or <b>llama3.2:1b</b> and if you want larger, try llama3.1 or smaller variants of Qwen, Gemma, Phi or DeepSeek. See <a href="https://ollama.com/models">the Ollama models page</a> for a full list of models and sizes.
            </span>
        </td>
    </tr>
</table>

In [None]:
!ollama pull llama3.2

In [None]:
ollama = OpenAI(base_url='http://localhost:11434/v1', api_key='ollama')
model_name = "llama3.2"

response = ollama.chat.completions.create(model=model_name, messages=messages)
answer = response.choices[0].message.content

display(Markdown(answer))
competitors.append(model_name)
answers.append(answer)

In [None]:
# So where are we?

print(competitors)
print(answers)


In [None]:
# It's nice to know how to use "zip"
for competitor, answer in zip(competitors, answers):
    print(f"Competitor: {competitor}\n\n{answer}")


In [20]:
# Let's bring this together - note the use of "enumerate"

together = ""
for index, answer in enumerate(answers):
    together += f"# Response from competitor {index+1}\n\n"
    together += answer + "\n\n"

In [None]:
print(together)

In [22]:
judge = f"""You are judging a competition between {len(competitors)} competitors.
Each model has been given this question:

{question}

Your job is to evaluate each response for clarity and strength of argument, and rank them in order of best to worst.
Respond with JSON, and only JSON, with the following format:
{{"results": ["best competitor number", "second best competitor number", "third best competitor number", ...]}}

Here are the responses from each competitor:

{together}

Now respond with the JSON with the ranked order of the competitors, nothing else. Do not include markdown formatting or code blocks."""


In [None]:
print(judge)

In [29]:
judge_messages = [{"role": "user", "content": judge}]

In [None]:
# Judgement time!

openai = OpenAI()
response = openai.chat.completions.create(
    model="o3-mini",
    messages=judge_messages,
)
results = response.choices[0].message.content
print(results)


In [None]:
# OK let's turn this into results!

results_dict = json.loads(results)
ranks = results_dict["results"]
for index, result in enumerate(ranks):
    competitor = competitors[int(result)-1]
    print(f"Rank {index+1}: {competitor}")
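
The parsing cell above assumes the judge returns bare JSON containing a complete ranking. Models sometimes wrap JSON in a code fence or return an incomplete ranking despite instructions, so a more defensive version might look like this sketch. `parse_ranking` is a hypothetical helper, and the validation rule (every competitor number appears exactly once) is an assumption about what a valid ranking means here.

```python
import json

FENCE = "`" * 3  # a markdown code fence, built this way to keep this example readable


def parse_ranking(results: str, competitors: list[str]) -> list[str]:
    """Turn the judge's JSON reply into competitor names, best first."""
    text = results.strip()
    if text.startswith(FENCE):
        # Drop the opening fence line (which may carry a language tag) and any closing fence
        text = text.split("\n", 1)[1] if "\n" in text else ""
        text = text.rsplit(FENCE, 1)[0]
    ranks = [int(r) for r in json.loads(text)["results"]]
    # Every competitor number must appear exactly once
    if sorted(ranks) != list(range(1, len(competitors) + 1)):
        raise ValueError(f"Unexpected ranking from judge: {ranks}")
    return [competitors[r - 1] for r in ranks]


names = ["gpt-4o-mini", "gemini-2.0-flash", "llama3.2"]
reply = FENCE + 'json\n{"results": ["2", "1", "3"]}\n' + FENCE
for position, name in enumerate(parse_ranking(reply, names), start=1):
    print(f"Rank {position}: {name}")
```

Raising `ValueError` on a bad ranking makes it easy to retry the judge call rather than print a misleading leaderboard.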

<table style="margin: 0; text-align: left; width:100%">
    <tr>
        <td style="width: 150px; height: 150px; vertical-align: middle;">
            <img src="../assets/exercise.png" width="150" height="150" style="display: block;" />
        </td>
        <td>
            <h2 style="color:#ff7800;">Exercise</h2>
            <span style="color:#ff7800;">Which pattern(s) did this use? Try updating this to add another Agentic design pattern.
            </span>
        </td>
    </tr>
</table>

<table style="margin: 0; text-align: left; width:100%">
    <tr>
        <td style="width: 150px; height: 150px; vertical-align: middle;">
            <img src="../assets/business.png" width="150" height="150" style="display: block;" />
        </td>
        <td>
            <h2 style="color:#00bfff;">Commercial implications</h2>
            <span style="color:#00bfff;">These kinds of patterns - sending a task to multiple models and evaluating the results -
            are common where you need to improve the quality of your LLM response. This approach can be universally applied
            to business projects where accuracy is critical.
            </span>
        </td>
    </tr>
</table>
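
The pattern used in this lab (send one task to several models, then have an evaluator compare the results) can be sketched independently of any provider. In this sketch the models and the judge are plain Python callables standing in for real API calls. `fan_out_and_judge` and the stub functions are hypothetical names for illustration only.

```python
from typing import Callable


def fan_out_and_judge(
    task: str,
    models: dict[str, Callable[[str], str]],
    judge: Callable[[str, dict[str, str]], str],
) -> tuple[str, dict[str, str]]:
    """Send task to every model, then let judge pick the winning model name."""
    answers = {name: ask(task) for name, ask in models.items()}
    winner = judge(task, answers)
    return winner, answers


# Stub "models" standing in for calls such as openai.chat.completions.create(...)
models = {
    "terse": lambda task: "42",
    "verbose": lambda task: f"A detailed answer to: {task}",
}


# Stub judge that prefers the longest answer; a real judge would be another LLM call
def longest_answer_judge(task: str, answers: dict[str, str]) -> str:
    return max(answers, key=lambda name: len(answers[name]))


winner, answers = fan_out_and_judge("What is 6 x 7?", models, longest_answer_judge)
print(f"Winner: {winner}")
```

Keeping the models and the judge as interchangeable callables means a new provider or a different evaluation rule can be swapped in without touching the orchestration.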