## Welcome to the Second Lab - Week 1, Day 3

Today we will call lots of different models! This is a good way to get comfortable with their APIs.

<table style="margin: 0; text-align: left; width:100%">
    <tr>
        <td style="width: 150px; height: 150px; vertical-align: middle;">
            <img src="../assets/stop.png" width="150" height="150" style="display: block;" />
        </td>
        <td>
            <h2 style="color:#ff7800;">Important point - please read</h2>
            <span style="color:#ff7800;">The way I collaborate with you may be different to other courses you've taken. I prefer not to type code while you watch. Rather, I execute Jupyter Labs, like this, and give you an intuition for what's going on. My suggestion is that you carefully execute this yourself, <b>after</b> watching the lecture. Add print statements to understand what's going on, and then come up with your own variations.<br/><br/>If you have time, I'd love it if you submit a PR for changes in the community_contributions folder - instructions in the resources. Also, if you have a GitHub account, use it to showcase your variations. Not only is this essential practice, but it demonstrates your skills to others, including perhaps future clients or employers...
            </span>
        </td>
    </tr>
</table>

In [9]:
# Start with imports - ask ChatGPT to explain any package that you don't know

import os
import json
from dotenv import load_dotenv
from openai import OpenAI
from anthropic import Anthropic
from IPython.display import Markdown, display

In [2]:
# Always remember to do this!
load_dotenv(override=True)

True

In [None]:
# Print the key prefixes to help with any debugging

openai_api_key = os.getenv('OPENAI_API_KEY')
anthropic_api_key = os.getenv('ANTHROPIC_API_KEY')
google_api_key = os.getenv('GOOGLE_API_KEY')
deepseek_api_key = os.getenv('DEEPSEEK_API_KEY')
groq_api_key = os.getenv('GROQ_API_KEY')

if openai_api_key:
    print(f"OpenAI API Key exists and begins {openai_api_key[:8]}")
else:
    print("OpenAI API Key not set")
    
if anthropic_api_key:
    print(f"Anthropic API Key exists and begins {anthropic_api_key[:7]}")
else:
    print("Anthropic API Key not set (and this is optional)")

if google_api_key:
    print(f"Google API Key exists and begins {google_api_key[:2]}")
else:
    print("Google API Key not set (and this is optional)")

if deepseek_api_key:
    print(f"DeepSeek API Key exists and begins {deepseek_api_key[:3]}")
else:
    print("DeepSeek API Key not set (and this is optional)")

if groq_api_key:
    print(f"Groq API Key exists and begins {groq_api_key[:4]}")
else:
    print("Groq API Key not set (and this is optional)")
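The five near-identical checks above can also be driven from a small table. This is only a sketch: the `KEYS` list and the `describe_keys` helper are names made up for illustration, and the prefix lengths are copied from the cell above.

```python
import os

# (environment variable, display name, prefix length, optional?) copied from the cell above
KEYS = [
    ("OPENAI_API_KEY", "OpenAI", 8, False),
    ("ANTHROPIC_API_KEY", "Anthropic", 7, True),
    ("GOOGLE_API_KEY", "Google", 2, True),
    ("DEEPSEEK_API_KEY", "DeepSeek", 3, True),
    ("GROQ_API_KEY", "Groq", 4, True),
]

def describe_keys(env=os.environ):
    """Return one status line per provider, showing only a short key prefix."""
    lines = []
    for var, name, prefix_len, optional in KEYS:
        value = env.get(var)
        if value:
            lines.append(f"{name} API Key exists and begins {value[:prefix_len]}")
        else:
            suffix = " (and this is optional)" if optional else ""
            lines.append(f"{name} API Key not set{suffix}")
    return lines

print("\n".join(describe_keys()))
```

Passing a plain dict as `env` makes it easy to try the helper without touching your real environment.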

In [3]:
request = "Please come up with a challenging, nuanced question that I can ask a number of LLMs to evaluate their intelligence. "
request += "Answer only with the question, no explanation."
messages = [{"role": "user", "content": request}]

In [4]:
messages

[{'role': 'user',
  'content': 'Please come up with a challenging, nuanced question that I can ask a number of LLMs to evaluate their intelligence. Answer only with the question, no explanation.'}]

In [None]:
openai = OpenAI()
response = openai.chat.completions.create(
    model="gpt-4o-mini",
    messages=messages,
)
question = response.choices[0].message.content
print(question)


OpenAIError: The api_key client option must be set either by passing api_key to the client or by setting the OPENAI_API_KEY environment variable

In [6]:
competitors = []
answers = []
messages = [{"role": "user", "content": question}]

NameError: name 'question' is not defined

In [None]:
# The API we know well

model_name = "gpt-4o-mini"

response = openai.chat.completions.create(model=model_name, messages=messages)
answer = response.choices[0].message.content

display(Markdown(answer))
competitors.append(model_name)
answers.append(answer)

In [None]:
# Anthropic has a slightly different API, and max_tokens is required

model_name = "claude-3-7-sonnet-latest"

claude = Anthropic()
response = claude.messages.create(model=model_name, messages=messages, max_tokens=1000)
answer = response.content[0].text

display(Markdown(answer))
competitors.append(model_name)
answers.append(answer)
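The practical difference between the two APIs shows up when reading the reply: the OpenAI cells use `response.choices[0].message.content`, while the Anthropic cell uses `response.content[0].text`. A minimal sketch of a helper that hides this difference follows. `extract_text` is a hypothetical name, and the objects below are stand-ins built with `SimpleNamespace`, not real API responses.

```python
from types import SimpleNamespace

def extract_text(response):
    """Return the reply text from either response shape used in this notebook."""
    if hasattr(response, "choices"):
        # OpenAI-style chat completion: response.choices[0].message.content
        return response.choices[0].message.content
    # Anthropic-style message: response.content[0].text
    return response.content[0].text

# Stand-in objects that mimic the two shapes (assumptions for illustration only)
openai_like = SimpleNamespace(
    choices=[SimpleNamespace(message=SimpleNamespace(content="hello from OpenAI"))]
)
anthropic_like = SimpleNamespace(content=[SimpleNamespace(text="hello from Claude")])

print(extract_text(openai_like))
print(extract_text(anthropic_like))
```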

In [None]:
gemini = OpenAI(api_key=google_api_key, base_url="https://generativelanguage.googleapis.com/v1beta/openai/")
model_name = "gemini-2.0-flash"

response = gemini.chat.completions.create(model=model_name, messages=messages)
answer = response.choices[0].message.content

display(Markdown(answer))
competitors.append(model_name)
answers.append(answer)

In [None]:
deepseek = OpenAI(api_key=deepseek_api_key, base_url="https://api.deepseek.com/v1")
model_name = "deepseek-chat"

response = deepseek.chat.completions.create(model=model_name, messages=messages)
answer = response.choices[0].message.content

display(Markdown(answer))
competitors.append(model_name)
answers.append(answer)

In [None]:
groq = OpenAI(api_key=groq_api_key, base_url="https://api.groq.com/openai/v1")
model_name = "llama-3.3-70b-versatile"

response = groq.chat.completions.create(model=model_name, messages=messages)
answer = response.choices[0].message.content

display(Markdown(answer))
competitors.append(model_name)
answers.append(answer)
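The Gemini, DeepSeek and Groq cells differ only in the API key, base URL and model name, so they could be folded into one loop. The sketch below is one possible way to do that, not the course's method: `OPENAI_COMPATIBLE`, `default_client` and `ask_all` are invented names, and providers without a key are skipped rather than raising an error.

```python
import os

# (model name, environment variable, base URL) copied from the three cells above
OPENAI_COMPATIBLE = [
    ("gemini-2.0-flash", "GOOGLE_API_KEY", "https://generativelanguage.googleapis.com/v1beta/openai/"),
    ("deepseek-chat", "DEEPSEEK_API_KEY", "https://api.deepseek.com/v1"),
    ("llama-3.3-70b-versatile", "GROQ_API_KEY", "https://api.groq.com/openai/v1"),
]

def default_client(api_key, base_url):
    # Imported lazily so the sketch can be loaded without the openai package installed
    from openai import OpenAI
    return OpenAI(api_key=api_key, base_url=base_url)

def ask_all(messages, make_client=default_client, env=os.environ):
    """Send the same messages to each configured provider; skip providers with no key."""
    competitors, answers = [], []
    for model_name, key_var, base_url in OPENAI_COMPATIBLE:
        api_key = env.get(key_var)
        if not api_key:
            print(f"Skipping {model_name}: {key_var} not set")
            continue
        client = make_client(api_key, base_url)
        response = client.chat.completions.create(model=model_name, messages=messages)
        competitors.append(model_name)
        answers.append(response.choices[0].message.content)
    return competitors, answers
```

In the notebook you would call it as `competitors, answers = ask_all(messages)`. The `make_client` parameter exists only so a fake client can be swapped in while experimenting.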


## For the next cell, we will use Ollama

Ollama runs a local web service that provides an OpenAI-compatible endpoint,  
and runs models locally using high-performance C++ code.

If you don't have Ollama, install it by visiting https://ollama.com, pressing Download and following the instructions.

After it's installed, you should be able to visit http://localhost:11434 and see the message "Ollama is running"
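The same check can be made from a notebook cell instead of the browser. This is a small sketch using only the standard library; `ollama_is_running` is a hypothetical helper name, and it assumes the server replies with the "Ollama is running" text mentioned above.

```python
import urllib.error
import urllib.request

def ollama_is_running(url="http://localhost:11434", timeout=2):
    """Return True if the local Ollama server answers with its status message."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as reply:
            body = reply.read().decode("utf-8", errors="replace")
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or any other network failure
        return False
    return "Ollama is running" in body

print(ollama_is_running())
```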

You might need to restart Cursor (and maybe reboot). Then open a Terminal (control+\`) and run `ollama serve`

Useful Ollama commands (run these in the terminal, or with an exclamation mark in this notebook):

`ollama pull <model_name>` downloads a model locally  
`ollama ls` lists all the models you've downloaded  
`ollama rm <model_name>` deletes the specified model from your downloads

<table style="margin: 0; text-align: left; width:100%">
    <tr>
        <td style="width: 150px; height: 150px; vertical-align: middle;">
            <img src="../assets/stop.png" width="150" height="150" style="display: block;" />
        </td>
        <td>
            <h2 style="color:#ff7800;">Super important - ignore me at your peril!</h2>
            <span style="color:#ff7800;">The model called <b>llama3.3</b> is FAR too large for home computers - it's not intended for personal computing and will consume all your resources! Stick with the nicely sized <b>llama3.2</b> or <b>llama3.2:1b</b> and if you want larger, try llama3.1 or smaller variants of Qwen, Gemma, Phi or DeepSeek. See <a href="https://ollama.com/models">the Ollama models page</a> for a full list of models and sizes.
            </span>
        </td>
    </tr>
</table>

In [11]:
ollama = OpenAI(base_url='http://localhost:11434/v1', api_key='ollama')
model_name = "llama3.2"

response = ollama.chat.completions.create(
    model=model_name,
    messages=messages,
)
question = response.choices[0].message.content
print(question)

Can you describe a situation in which a highly intelligent and well-intentioned person consistently makes decisions that have devastating consequences for themselves and others, without any evidence of malice or conscious intent, yet are ultimately revealed to be a direct result of their own cognitive biases, psychological vulnerabilities, and environmental influences?


In [12]:
competitors = []
answers = []
messages = [{"role": "user", "content": question}]

In [None]:
!ollama pull llama3.2

In [13]:
ollama = OpenAI(base_url='http://localhost:11434/v1', api_key='ollama')
model_name = "llama3.2"

response = ollama.chat.completions.create(model=model_name, messages=messages)
answer = response.choices[0].message.content

display(Markdown(answer))
competitors.append(model_name)
answers.append(answer)

Here's a possible scenario:

Meet Emily, a highly intelligent and well-intentioned neuroscientist in her mid-40s. She's dedicated her career to developing innovative treatments for neurological disorders, such as Alzheimer's disease and Parkinson's disease. Despite her impressive academic credentials and research experience, Emily has a tendency to consistently make decisions that have devastating consequences for herself and others.

One example is when Emily becomes involved with a new start-up project that promises to revolutionize the treatment of depression using cutting-edge gene therapy techniques. Though initially enthusiastic about contributing her expertise, Emily's involvement in the project proves detrimental to several patients under her care. The experimental treatments demonstrate reduced efficacy compared to standard interventions and expose some patients to untested side effects.

Upon hearing about these adverse outcomes, most would associate malicious intent with lack of rigor or poor research quality in scientific practices; however, that wasn't actually the case here. It was revealed through rigorous review methodologies and evidence analysis by expert team members across various professions – including Emily who was forced to confront her own actions.

The investigation reveals a pattern of thought processes, stemming from Emily's own psyche which she could not acknowledge or recognize effectively initially. As it turned out - cognitive biases such as availability fallacy where she misattributed probability based on unusual case studies towards being representative statistics; also she suffered more than minor psychological vulnerability to pressure of meeting expectations from her superiors and peer recognition fueled an unbalanced bias towards innovation in research.

What was most surprising were the indirect influences on Emily's decisions: for instance, her home environment - where relatives with dementia lived nearby, affected her propensity towards a specific area of focus when she had opportunities; environmental factors - such as working late into weekends contributed emotionally taxing situations where stress was triggered leading to decreased decision-making judgment.

However, more critically, several experts who reviewed and studied her behaviors point out her "overly simplistic assumptions" in evaluating success of her treatment. It seems that these are manifestations of 'groupthink' syndrome – Emily assumed everyone would agree with the initial outcome until confronted by alternative perspectives or results. Her inability to acknowledge failure as a constructive aspect was another contributing factor.

The psychological study conducted by experts concluded: while there is definitely evidence Emily made decisions despite malice or intent, and despite her well-intentioned research; these were entirely because of biases she couldn't possibly see herself having. A situation like this may result in repeated errors even amidst genuine well-meaning individuals due to factors which they are often unaware or less able to analyze.

In conclusion, for Emily's case - it appears she did inadvertently have malice but not by means where you would expect; but rather was due largely because of biases stemming from within Emily. She remains grateful despite struggling to understand the scope and origins for even these issues in her professional life – though some insights from psychological analyses could provide better guidance toward preventing such scenarios.

In [None]:
#!ollama pull mxbai-embed-large:335m
#!ollama rm mxbai-embed-large:335m

Error: model 'mxbai-embed-large:335m' not found


In [16]:
ollama = OpenAI(base_url='http://localhost:11434/v1', api_key='ollama')
model_name = "llama3.2"

response = ollama.chat.completions.create(model=model_name, messages=messages)
answer = response.choices[0].message.content

display(Markdown(answer))
competitors.append(model_name)
answers.append(answer)

I'd like to present you with a hypothetical scenario:

Meet Dr. Rachel Kim, a brilliant and well-intentioned neuroscientist who has dedicated her life to understanding the human brain's role in decision-making. She is driven by a desire to improve human cognition and prevent mental health disorders.

In her early twenties, Rachel became deeply invested in studying the relationship between social media use and empathy. She believed that by developing a digital platform that could detect and mitigate online manipulation of emotions, she could help create a more compassionate society.

Rachel spent years researching and iterating on her algorithm, pouring over papers and conducting extensive simulations. Her colleagues praised her work, and several promising startups were formed around her ideas.

However, as Rachel continued to experiment with her framework, its applications expanded beyond the lab setting to real-world communities. At first, they showed promising but incomplete results, sparking encouragement from influential figures in the scientific community. Yet the true extent of their influence proved more disastrous than beneficial.

Upon implementation in some social media platforms and schools, users began reporting increased polarization, echo chambers that grew denser with disinformation – all while experiencing a dramatic shift towards despair and apathy as opposed to empathy. Rachel discovered her algorithm exacerbated these issues.

Despite acknowledging this, she found herself unable to see a flaw, becoming overconfident in the system's effectiveness and attributing its failures to external factors such as users' mental health struggles rather than any innate problems within her own cognitive processes or biased assumptions. This failure stems from the same blind spots that often accompany great success.

Rachel struggled with anxiety and self-doubt. She also had developed a tendency to trust information based on persuasive voices over evidence-based data, possibly due to an upbringing focused on family loyalty rather than skepticism. Over time, she learned some coping mechanisms that temporarily silenced those feelings of unease but ultimately reinforced this distorted view of truth.

Rachel eventually faced public scrutiny, as the negative impact her platform had caused came to light in the mainstream news and forums. She was asked to admit mistakes publicly and correct her framework. It only came after suffering a series of personal breakdowns - an acute mental health emergency for herself that temporarily rendered her mute – when Rachel understood how her algorithms had inadvertently led individuals astray, using cognitive biases unwittingly crafted during the research phase.

The revelation that her ideas carried such dire consequences was disorienting and overwhelming. But finally, it marked a turning point in her life: she embarked upon an intense journey to self-discovery, revisiting her sources of power and analyzing how they influenced her values.

Rachel emerged from this period with renewed respect for the intricacies of the human mind, recognizing that no single individual can fully see their entire cognitive map at all times. She also recognized the interplay between nature (her own biases), nurture (environemental conditions she lived under), and personal development.

By acknowledging her flaws, adopting strategies to prevent bias-driven errors, developing healthier coping mechanisms, Rachel began fostering a community centered on empathy understanding. To date that shift toward compassion is the embodiment of Dr Rachels long journey toward self-discovery in recognition of personal power – both with positive impact as well negative one's inner struggles can also shape an outcome.

In [29]:
ollama = OpenAI(base_url='http://localhost:11434/v1', api_key='ollama')
model_name = "llama3.2"

response = ollama.chat.completions.create(model=model_name, messages=messages)
answer = response.choices[0].message.content

display(Markdown(answer))
competitors.append(model_name)
answers.append(answer)

Here's a scenario:

Meet Emma, a highly decorated scientist with an IQ estimated to be in the range of 160. She has spent her entire career researching and developing innovative solutions to global problems, such as disease eradication and sustainable energy. Her colleagues acknowledge her exceptional intellect and unwavering dedication to improving the human condition.

However, despite her impressive academic credentials and professional accomplishments, Emma consistently makes decisions that have disastrous consequences for herself and those around her. She is known to be a meticulous researcher, meticulous in the most literal sense - she spends hours pouring over data, revising and refining her own research, and double-checking every aspect of her work.

Despite this attention to detail, several devastating consequences emerge from Emma's decisions:

1. A pharmaceutical company invests millions into developing a new cancer treatment that Emma had designed. Although it ultimately proves effective, the initial phase shows alarming side effects in test subjects, causing unintended deaths.
2. Emma advises a prominent environmental organization on sustainable infrastructure design. However, she mistakenly assumes the environmental impact of her solutions and underemphasizes local regulations, leading to widespread destruction of natural habitats as a result.
3. While advocating for mental health awareness, Emma advocates for a novel approach that relies heavily on digital therapy apps without considering their accessibility to marginalized communities or potential social determinants influencing outcomes.

At first glance, it might seem that Emma is making rational decisions based on her expertise. She has an excellent track record professionally and has always been well-intentioned in the past. Her mistakes occur when they collide with complexity and ambiguity - something she's never experienced before.

Further exploration of Emma's thought process, personal history, and surroundings reveals the underlying sources of her cognitive biases and vulnerabilities:

1. **Cognitive confirmation bias**: Emma tends to seek out knowledge that verifies her existing ideas without adequate consideration of counterarguments or competing theories.
2. **Risk aversion**: Having spent years honing her skill in high-stakes problem-solving, Emma often avoids considering possibilities that could lead to unintended negative consequences due to her fear of failure.
3. **Social influence**: As a respected expert and advocate for social justice, Emma is susceptible to social pressure from like-minded peers who share similar biases, which she initially ignores but gradually internalizes.

Additionally, examining Emma's past experiences reveals instances where these very cognitive biases came into play:

1. Social isolation as an intellectually gifted child led her to develop strong, self-motivated behaviors that contributed to confirmation bias.
2. In a family with limited resources and emotional distress due to her father's history of mental health issues and eventual tragic accident, Emma developed excessive risk aversion to maintain stability in life.
3. Early exposure to social movements she deeply cared about (e.g., feminism, anti-war activism) had become intertwined with her confidence and authority on these topics, making it difficult for her to separate passionate advocacy from informed analysis.

Ultimately, as evidence emerges of her mistakes, Emma is left grappling with the consequences - a painful reminder that even well-intentioned and intelligent individuals are not immune to biases shaping their decision-making. Upon reflection, she recognizes the importance of self-awareness and acknowledges that understanding these inherent vulnerabilities would have prepared her for potential conflicts between intention, knowledge, environment, and circumstances.

Emma's story highlights how an intelligence gap or psychological limitations can intersect with environmental factors and personal past experiences to lead to catastrophes without intentional malevolence.

In [30]:
# So where are we?

print(competitors)
print(answers)


['llama3.2', 'llama3.2', 'llama3.2']
['Here\'s a possible scenario:\n\nMeet Emily, a highly intelligent and well-intentioned neuroscientist in her mid-40s. She\'s dedicated her career to developing innovative treatments for neurological disorders, such as Alzheimer\'s disease and Parkinson\'s disease. Despite her impressive academic credentials and research experience, Emily has a tendency to consistently make decisions that have devastating consequences for herself and others.\n\nOne example is when Emily becomes involved with a new start-up project that promises to revolutionize the treatment of depression using cutting-edge gene therapy techniques. Though initially enthusiastic about contributing her expertise, Emily\'s involvement in the project proves detrimental to several patients under her care. The experimental treatments demonstrate reduced efficacy compared to standard interventions and expose some patients to untested side effects.\n\nUpon hearing about these adverse outcom

In [31]:
# It's nice to know how to use "zip"
for competitor, answer in zip(competitors, answers):
    print(f"Competitor: {competitor}\n\n{answer}")


Competitor: llama3.2

Here's a possible scenario:

Meet Emily, a highly intelligent and well-intentioned neuroscientist in her mid-40s. She's dedicated her career to developing innovative treatments for neurological disorders, such as Alzheimer's disease and Parkinson's disease. Despite her impressive academic credentials and research experience, Emily has a tendency to consistently make decisions that have devastating consequences for herself and others.

One example is when Emily becomes involved with a new start-up project that promises to revolutionize the treatment of depression using cutting-edge gene therapy techniques. Though initially enthusiastic about contributing her expertise, Emily's involvement in the project proves detrimental to several patients under her care. The experimental treatments demonstrate reduced efficacy compared to standard interventions and expose some patients to untested side effects.

Upon hearing about these adverse outcomes, most would associate mal

In [32]:
# Let's bring this together - note the use of "enumerate"

together = ""
for index, answer in enumerate(answers):
    together += f"# Response from competitor {index+1}\n\n"
    together += answer + "\n\n"

In [33]:
print(together)

# Response from competitor 1

Here's a possible scenario:

Meet Emily, a highly intelligent and well-intentioned neuroscientist in her mid-40s. She's dedicated her career to developing innovative treatments for neurological disorders, such as Alzheimer's disease and Parkinson's disease. Despite her impressive academic credentials and research experience, Emily has a tendency to consistently make decisions that have devastating consequences for herself and others.

One example is when Emily becomes involved with a new start-up project that promises to revolutionize the treatment of depression using cutting-edge gene therapy techniques. Though initially enthusiastic about contributing her expertise, Emily's involvement in the project proves detrimental to several patients under her care. The experimental treatments demonstrate reduced efficacy compared to standard interventions and expose some patients to untested side effects.

Upon hearing about these adverse outcomes, most would assoc

In [34]:
judge = f"""You are judging a competition between {len(competitors)} competitors.
Each model has been given this question:

{question}

Your job is to evaluate each response for clarity and strength of argument, and rank them in order of best to worst.
Respond with JSON, and only JSON, with the following format:
{{"results": ["best competitor number", "second best competitor number", "third best competitor number", ...]}}

Here are the responses from each competitor:

{together}

Now respond with the JSON with the ranked order of the competitors, nothing else. Do not include markdown formatting or code blocks."""
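A note on the doubled braces in the prompt above: inside an f-string, `{{` and `}}` produce literal `{` and `}`, which is why the JSON example survives formatting while `{question}` and `{together}` are substituted. A tiny illustration with made-up values:

```python
count = 3  # illustrative value, standing in for len(competitors)

template = f"""Rank {count} competitors.
Respond with JSON: {{"results": ["1", "2", "3"]}}"""

print(template)
```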


In [21]:
print(judge)

You are judging a competition between 2 competitors.
Each model has been given this question:

Can you describe a situation in which a highly intelligent and well-intentioned person consistently makes decisions that have devastating consequences for themselves and others, without any evidence of malice or conscious intent, yet are ultimately revealed to be a direct result of their own cognitive biases, psychological vulnerabilities, and environmental influences?

Your job is to evaluate each response for clarity and strength of argument, and rank them in order of best to worst.
Respond with JSON, and only JSON, with the following format:
{"results": ["best competitor number", "second best competitor number", "third best competitor number", ...]}

Here are the responses from each competitor:

# Response from competitor 1

Here's a possible scenario:

Meet Emily, a highly intelligent and well-intentioned neuroscientist in her mid-40s. She's dedicated her career to developing innovative t

In [35]:
judge_messages = [{"role": "user", "content": judge}]

In [36]:
# Judgement time!
ollama = OpenAI(base_url='http://localhost:11434/v1', api_key='ollama')
model_name = "llama3.2"

response = ollama.chat.completions.create(model=model_name, messages=judge_messages)
results = response.choices[0].message.content
print(results)


{"results": ["2", "1", "3"]}


In [37]:
# OK let's turn this into results!

results_dict = json.loads(results)
ranks = results_dict["results"]
for index, result in enumerate(ranks):
    competitor = competitors[int(result)-1]
    print(f"Rank {index+1}: {competitor}")

Rank 1: llama3.2
Rank 2: llama3.2
Rank 3: llama3.2
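Note that in this run all three competitors are `llama3.2`, so the three rank lines cannot be told apart; printing the competitor number next to the name would help. The cell also trusts the judge to return clean JSON. Below is a hedged sketch of a more defensive parser. `parse_ranking` is an invented helper, and stripping a markdown code fence is an assumption about how a model might ignore the "no code blocks" instruction.

```python
import json

FENCE = "`" * 3  # markdown code-fence marker, built this way to keep the example readable

def parse_ranking(raw, n_competitors):
    """Parse the judge's reply into 1-based competitor numbers, or raise ValueError."""
    text = raw.strip()
    # Some models wrap JSON in a code fence despite instructions; remove it if present
    if text.startswith(FENCE):
        text = text.strip("`").strip()
        if text.lower().startswith("json"):
            text = text[4:].strip()
    try:
        ranks = [int(r) for r in json.loads(text)["results"]]
    except (KeyError, TypeError, ValueError) as err:
        raise ValueError(f"Could not read a ranking from: {raw!r}") from err
    # A valid ranking mentions every competitor exactly once
    if sorted(ranks) != list(range(1, n_competitors + 1)):
        raise ValueError(f"Ranking {ranks} is not a permutation of 1..{n_competitors}")
    return ranks

for position, number in enumerate(parse_ranking('{"results": ["2", "1", "3"]}', 3), start=1):
    print(f"Rank {position}: competitor {number}")
```

Raising a `ValueError` on a malformed ranking lets you catch the problem early and, for example, ask the judge again.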


<table style="margin: 0; text-align: left; width:100%">
    <tr>
        <td style="width: 150px; height: 150px; vertical-align: middle;">
            <img src="../assets/exercise.png" width="150" height="150" style="display: block;" />
        </td>
        <td>
            <h2 style="color:#ff7800;">Exercise</h2>
            <span style="color:#ff7800;">Which pattern(s) did this use? Try updating this to add another Agentic design pattern.
            </span>
        </td>
    </tr>
</table>

<table style="margin: 0; text-align: left; width:100%">
    <tr>
        <td style="width: 150px; height: 150px; vertical-align: middle;">
            <img src="../assets/business.png" width="150" height="150" style="display: block;" />
        </td>
        <td>
            <h2 style="color:#00bfff;">Commercial implications</h2>
            <span style="color:#00bfff;">Patterns like this one, sending a task to multiple models and then evaluating the results,
            are common where you need to improve the quality of your LLM response. The approach applies broadly
            to business projects where accuracy is critical.
            </span>
        </td>
    </tr>
</table>