<a href="https://colab.research.google.com/github/athapa785/LLM_4_Biz_Stanford/blob/main/aditya_thapa_llm4iz_homework_6_aisuite.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Aditya Thapa

SLAC National Accelerator Laboratory | Stanford University

# Homework 4: Comparing models with aisuite
## LLM for Biz with Python

This project uses the aisuite Python library to compare OpenAI's GPT-4o and Anthropic's Claude 3.5 Sonnet (called Claude-3.5 below) on a series of five probability problems. The goal is to assess their accuracy and formatting consistency.

While Claude-3.5 follows the requested `$...$` LaTeX delimiters more consistently, GPT-4o outperforms it on the more complex probability problems. Each question was run only once per model at temperature 0.75, so this suggests, but does not establish, differences in reasoning capabilities.

In [33]:
%%capture
!pip install 'aisuite[all]'

In [11]:
from google.colab import userdata
import os
open_ai_key = userdata.get('open_ai_key')
anthropic_key = userdata.get('anthropic_key')
os.environ["OPENAI_API_KEY"] = open_ai_key
os.environ["ANTHROPIC_API_KEY"] = anthropic_key

In [16]:
from IPython.display import Markdown

In [50]:
import aisuite as ai
client = ai.Client()

models = ["openai:gpt-4o", "anthropic:claude-3-5-sonnet-20240620"]

system_message = """
    You are a mathematician.
    It is extremely important that you format equations in LaTeX format within $...$ delimiters.
    """

In [51]:
def compare_models(user_message):
    """Send the same prompt to every model in `models` and collect the replies."""
    messages = [
        {"role": "system", "content": system_message},
        {"role": "user", "content": user_message},
    ]
    responses = {}
    for model in models:
        response = client.chat.completions.create(
            model=model,
            messages=messages,
            temperature=0.75,
        )
        # Key each reply by its "provider:model" identifier.
        responses[model] = response.choices[0].message.content
    return responses

In [52]:
# First question
user_message_1 = """
    A coin is thrown 3 times .what is the probability that atleast one head is obtained?
    """

In [53]:
responses_1 = compare_models(user_message_1)

In [54]:
print("OpenAI:GPT-4o")
Markdown(responses_1["openai:gpt-4o"])

OpenAI:GPT-4o



To find the probability of getting at least one head in three coin tosses, we can use the complementary probability approach. 

Firstly, calculate the probability of getting no heads at all (i.e., getting tails in all three tosses). The probability of getting tails in a single toss is \( \frac{1}{2} \). Therefore, the probability of getting tails in all three tosses is:

\[
\left( \frac{1}{2} \right)^3 = \frac{1}{8}
\]

The probability of getting at least one head is the complement of the probability of getting no heads. Thus, the probability of getting at least one head is:

\[
1 - \frac{1}{8} = \frac{7}{8}
\]

So, the probability of obtaining at least one head in three coin tosses is \( \frac{7}{8} \).

In [55]:
print("Anthropic:Claude-3.5")
Markdown(responses_1["anthropic:claude-3-5-sonnet-20240620"])

Anthropic:Claude-3.5


To solve this problem, let's approach it step-by-step:

1) First, let's consider the probability of getting a head on a single throw:
   P(Head) = 1/2
   P(Tail) = 1/2

2) Now, instead of calculating the probability of getting at least one head directly, it's easier to calculate the probability of getting no heads (all tails) and then subtract this from 1.

3) The probability of getting all tails in 3 throws is:
   P(All Tails) = $(\frac{1}{2})^3 = \frac{1}{8}$

4) Therefore, the probability of getting at least one head is:
   P(At least one Head) = 1 - P(All Tails)
                        = $1 - \frac{1}{8} = \frac{7}{8}$

5) We can also express this as a decimal or percentage:
   $\frac{7}{8} = 0.875 = 87.5\%$

Thus, the probability of getting at least one head when a coin is thrown 3 times is $\frac{7}{8}$ or 0.875 or 87.5%.

To my surprise, both answers are correct. However, GPT-4o wraps its equations in `\( ... \)` and `\[ ... \]` instead of the requested `$...$` delimiters, while Claude-3.5 follows the instruction. Let's try something mildly more complicated.
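The delimiter difference can be tallied rather than eyeballed. In the responses above, GPT-4o writes math as `\( ... \)` and `\[ ... \]`, while Claude-3.5 uses `$...$`. The sketch below is a rough check, assuming the notebook renderer only typesets dollar-delimited math. `count_delimiters` and `to_dollar_delimiters` are hypothetical helper names introduced here, not part of aisuite, and the patterns ignore edge cases such as escaped dollar signs.

```python
import re

# Rough patterns for the delimiter styles seen in the responses.
# Assumption: no escaped dollars and no nested math.
INLINE_DOLLAR = re.compile(r"\$[^$\n]+\$")
INLINE_PAREN = re.compile(r"\\\((.+?)\\\)", re.DOTALL)
DISPLAY_BRACKET = re.compile(r"\\\[(.+?)\\\]", re.DOTALL)


def count_delimiters(text):
    """Count how many math spans use each delimiter style."""
    return {
        "dollar": len(INLINE_DOLLAR.findall(text)),
        "paren": len(INLINE_PAREN.findall(text)),
        "bracket": len(DISPLAY_BRACKET.findall(text)),
    }


def to_dollar_delimiters(text):
    """Rewrite backslash-paren and backslash-bracket math as dollar math."""
    # A function replacement keeps the LaTeX backslashes literal.
    text = DISPLAY_BRACKET.sub(lambda m: "$$" + m.group(1).strip() + "$$", text)
    return INLINE_PAREN.sub(lambda m: "$" + m.group(1).strip() + "$", text)


# A fragment in the style of the GPT-4o answer to the first question.
sample = r"is \( \frac{1}{2} \). Therefore: \[ 1 - \frac{1}{8} = \frac{7}{8} \]"
print(count_delimiters(sample))
print(to_dollar_delimiters(sample))
```

In the notebook, something like `Markdown(to_dollar_delimiters(responses_1["openai:gpt-4o"]))` should then display the GPT-4o answer with dollar-delimited equations.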

In [56]:
# Second question
user_message_2 = "What is the probability of getting a sum of 7 when two dice are thrown?"

In [57]:
responses_2 = compare_models(user_message_2)

In [59]:
print("OpenAI:GPT-4o")
Markdown(responses_2["openai:gpt-4o"])

OpenAI:GPT-4o


When two six-sided dice are thrown, each die has 6 faces, resulting in a total of \(6 \times 6 = 36\) possible outcomes.

To find the probability of getting a sum of 7, we need to count the number of favorable outcomes:

1. Die 1 shows 1 and Die 2 shows 6.
2. Die 1 shows 2 and Die 2 shows 5.
3. Die 1 shows 3 and Die 2 shows 4.
4. Die 1 shows 4 and Die 2 shows 3.
5. Die 1 shows 5 and Die 2 shows 2.
6. Die 1 shows 6 and Die 2 shows 1.

There are 6 outcomes that result in a sum of 7. Therefore, the probability \( P \) of rolling a sum of 7 is given by:

\[
P(\text{sum of 7}) = \frac{\text{Number of favorable outcomes}}{\text{Total number of possible outcomes}} = \frac{6}{36} = \frac{1}{6}
\]

So, the probability of obtaining a sum of 7 when two dice are thrown is \( \frac{1}{6} \).

In [60]:
print("Anthropic:Claude-3.5")
Markdown(responses_2["anthropic:claude-3-5-sonnet-20240620"])

Anthropic:Claude-3.5


To solve this problem, let's approach it step by step:

1) First, we need to identify all the possible ways to get a sum of 7 when rolling two dice. The possible combinations are:

   (1,6), (2,5), (3,4), (4,3), (5,2), (6,1)

2) There are 6 favorable outcomes in total.

3) Now, we need to consider the total number of possible outcomes when rolling two dice. 
   Each die has 6 faces, so the total number of possible outcomes is:

   $6 \times 6 = 36$

4) The probability is calculated by dividing the number of favorable outcomes by the total number of possible outcomes:

   Probability = $\frac{\text{Favorable outcomes}}{\text{Total outcomes}}$

5) Therefore, the probability of getting a sum of 7 is:

   $P(\text{sum of 7}) = \frac{6}{36} = \frac{1}{6}$

So, the probability of getting a sum of 7 when two dice are thrown is $\frac{1}{6}$ or approximately 0.1667 or 16.67%.

Right again, and the typesetting problem persists: Claude outperforms GPT-4o in this regard. Let's move on to something more difficult. These models are bound to fail at some point, and I don't think the question has to be very difficult. Let's find the failure point (qualitatively, of course).
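The first two answers can be confirmed independently of either model by brute-force enumeration, since both sample spaces are tiny. The sketch below uses only the standard library; `probability` is a hypothetical helper name introduced here for illustration.

```python
from fractions import Fraction
from itertools import product


def probability(outcomes, event):
    """Exact probability of `event` over equally likely `outcomes`."""
    outcomes = list(outcomes)
    favorable = sum(1 for outcome in outcomes if event(outcome))
    return Fraction(favorable, len(outcomes))


# Question 1: at least one head in three coin tosses (8 outcomes).
p_head = probability(product("HT", repeat=3), lambda toss: "H" in toss)

# Question 2: two dice summing to 7 (36 outcomes).
p_seven = probability(
    product(range(1, 7), repeat=2),
    lambda roll: sum(roll) == 7,
)

print(p_head, p_seven)  # 7/8 1/6
```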

In [65]:
user_message_3 = "Two cards are drawn from the pack of 52 cards. Find the probability that both are diamonds or both are kings."

In [66]:
responses_3 = compare_models(user_message_3)

In [67]:
print("OpenAI:GPT-4o")
Markdown(responses_3["openai:gpt-4o"])

OpenAI:GPT-4o


To solve this problem, we need to calculate the probability of either drawing two diamonds or drawing two kings from a standard deck of 52 cards.

First, let's break down the two cases:

**Case 1: Both cards are diamonds**

There are 13 diamonds in a deck of 52 cards. To find the probability that both drawn cards are diamonds, we calculate as follows:

1. The probability that the first card drawn is a diamond is $\frac{13}{52}$.
2. After drawing one diamond, there are now 12 diamonds left out of a total of 51 cards. So, the probability that the second card drawn is also a diamond is $\frac{12}{51}$.

Thus, the probability that both cards are diamonds is:

\[
P(\text{both diamonds}) = \frac{13}{52} \times \frac{12}{51} = \frac{1}{4} \times \frac{4}{17} = \frac{1}{17}
\]

**Case 2: Both cards are kings**

There are 4 kings in a deck of 52 cards. To find the probability that both drawn cards are kings, we calculate as follows:

1. The probability that the first card drawn is a king is $\frac{4}{52}$.
2. After drawing one king, there are now 3 kings left out of a total of 51 cards. So, the probability that the second card drawn is also a king is $\frac{3}{51}$.

Thus, the probability that both cards are kings is:

\[
P(\text{both kings}) = \frac{4}{52} \times \frac{3}{51} = \frac{1}{13} \times \frac{1}{17} = \frac{1}{221}
\]

**Combining both cases**

Since the events "both cards are diamonds" and "both cards are kings" are mutually exclusive, we add their probabilities:

\[
P(\text{both diamonds or both kings}) = P(\text{both diamonds}) + P(\text{both kings}) = \frac{1}{17} + \frac{1}{221}
\]

To add these probabilities, we need a common denominator:

The least common multiple of 17 and 221 is 221. Convert $\frac{1}{17}$ to have this common denominator:

\[
\frac{1}{17} = \frac{13}{221}
\]

Now add the probabilities:

\[
P(\text{both diamonds or both kings}) = \frac{13}{221} + \frac{1}{221} = \frac{14}{221}
\]

Therefore, the probability that both cards drawn are either both diamonds or both kings is $\frac{14}{221}$.

In [68]:
print("Anthropic:Claude-3.5")
Markdown(responses_3["anthropic:claude-3-5-sonnet-20240620"])

Anthropic:Claude-3.5


Let's approach this step-by-step:

1) First, let's calculate the probability of drawing two diamonds:
   - There are 13 diamonds in a deck of 52 cards
   - For the first draw, the probability is $\frac{13}{52}$
   - For the second draw, there are 12 diamonds left out of 51 cards
   - So, the probability of two diamonds is:

   $P(\text{two diamonds}) = \frac{13}{52} \cdot \frac{12}{51} = \frac{156}{2652} = \frac{13}{221}$

2) Now, let's calculate the probability of drawing two kings:
   - There are 4 kings in a deck of 52 cards
   - For the first draw, the probability is $\frac{4}{52}$
   - For the second draw, there are 3 kings left out of 51 cards
   - So, the probability of two kings is:

   $P(\text{two kings}) = \frac{4}{52} \cdot \frac{3}{51} = \frac{12}{2652} = \frac{1}{221}$

3) The question asks for the probability of either two diamonds or two kings. These are mutually exclusive events (it's impossible to draw two cards that are both diamonds and kings), so we can add their probabilities:

   $P(\text{two diamonds or two kings}) = P(\text{two diamonds}) + P(\text{two kings})$

   $= \frac{13}{221} + \frac{1}{221} = \frac{14}{221}$

Therefore, the probability that both cards are diamonds or both are kings is $\frac{14}{221}$ or approximately 0.0633 or about 6.33%.

OK, I'm surprised: both models get the correct answer, $\frac{14}{221}$.
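The two-card question can also be checked by listing every unordered pair from a 52-card deck. This is a minimal sketch using only the standard library. The deck encoding (rank strings and suit names) is an arbitrary choice made here.

```python
from fractions import Fraction
from itertools import combinations, product

RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
SUITS = ["clubs", "diamonds", "hearts", "spades"]
deck = list(product(RANKS, SUITS))   # 52 (rank, suit) pairs
hands = list(combinations(deck, 2))  # all unordered two-card hands


def both_diamonds(hand):
    return all(suit == "diamonds" for _, suit in hand)


def both_kings(hand):
    return all(rank == "K" for rank, _ in hand)


n_diamonds = sum(1 for hand in hands if both_diamonds(hand))
n_kings = sum(1 for hand in hands if both_kings(hand))
# The overlap is empty because the deck has only one king of diamonds.
n_overlap = sum(1 for h in hands if both_diamonds(h) and both_kings(h))

p_either = Fraction(n_diamonds + n_kings - n_overlap, len(hands))
print(n_diamonds, n_kings, n_overlap, p_either)  # 78 6 0 14/221
```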

In [69]:
user_message_4 = "From a pack of cards, three cards are drawn at random. Find the probability that each card is from different suit."

In [70]:
responses_4 = compare_models(user_message_4)

In [72]:
print("OpenAI:GPT-4o")
Markdown(responses_4["openai:gpt-4o"])

OpenAI:GPT-4o


To solve this problem, we need to determine the probability that each of the three cards drawn from a standard deck of 52 cards is from a different suit.

A standard deck of cards has 4 suits (hearts, diamonds, clubs, and spades), each containing 13 cards.

**Step 1: Choose the first card.**
- The first card can be any card from the deck, so there are 52 possible choices.

**Step 2: Choose the second card from a different suit.**
- After drawing the first card, there are 39 cards left that are from the other three suits (since 13 cards are from the same suit as the first card).
- Thus, there are 39 possible choices for the second card.

**Step 3: Choose the third card from a yet another different suit.**
- After drawing the second card, there are 26 cards left that are from the remaining two suits (since 26 cards are from the two suits already used).
- Thus, there are 26 possible choices for the third card.

**Calculating the Probability:**

The number of favorable outcomes is calculated by multiplying the number of choices at each step:

\[
52 \times 39 \times 26
\]

The total number of possible outcomes when drawing 3 cards sequentially from the deck is:

\[
52 \times 51 \times 50
\]

Hence, the probability that each card drawn is from a different suit is given by the ratio of favorable outcomes to total possible outcomes:

\[
P(\text{different suits}) = \frac{52 \times 39 \times 26}{52 \times 51 \times 50}
\]

Simplifying this:

\[
P(\text{different suits}) = \frac{39 \times 26}{51 \times 50} = \frac{1014}{2550}
\]

Further simplifying the fraction:

\[
P(\text{different suits}) = \frac{169}{425}
\]

Therefore, the probability that each card drawn from the deck is from a different suit is \(\frac{169}{425}\).

In [73]:
print("Anthropic:Claude-3.5")
Markdown(responses_4["anthropic:claude-3-5-sonnet-20240620"])

Anthropic:Claude-3.5


To solve this problem, let's approach it step by step:

1) First, we need to calculate the total number of ways to draw 3 cards from a 52-card deck. This is given by the combination formula:

   $\binom{52}{3} = \frac{52!}{3!(52-3)!} = \frac{52!}{3!49!} = 22,100$

2) Now, we need to calculate the number of ways to draw 3 cards from different suits. We can do this as follows:
   - Choose 1 card from any suit: 13 ways
   - Choose 1 card from any of the remaining 3 suits: 13 ways
   - Choose 1 card from any of the remaining 2 suits: 13 ways
   
   So, the total number of favorable outcomes is: $13 \times 13 \times 13 = 2,197$

3) The probability is then the number of favorable outcomes divided by the total number of possible outcomes:

   $P(\text{3 cards from different suits}) = \frac{2,197}{22,100} = \frac{2,197}{22,100} \approx 0.0994$

Therefore, the probability of drawing three cards, each from a different suit, is approximately 0.0994 or about 9.94%.

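GPT-4o got the right answer and Claude-3.5 failed! Claude counts $13 \times 13 \times 13 = 2197$ favorable hands but leaves out the $\binom{4}{3} = 4$ ways to choose which three suits appear. The correct count is $4 \times 2197 = 8788$, which gives $\frac{8788}{22100} = \frac{169}{425} \approx 0.398$, matching GPT-4o.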
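With only 22,100 three-card hands, the suit question is small enough to settle by enumeration. The sketch below counts hands whose three cards come from three different suits. The deck encoding is an arbitrary choice made here.

```python
from fractions import Fraction
from itertools import combinations, product

# 13 ranks (0..12) crossed with four one-letter suit codes.
deck = list(product(range(13), "CDHS"))
hands = list(combinations(deck, 3))

# A hand qualifies when its three cards show three distinct suits.
favorable = sum(
    1 for hand in hands if len({suit for _, suit in hand}) == 3
)

p_different_suits = Fraction(favorable, len(hands))
print(len(hands), favorable, p_different_suits)  # 22100 8788 169/425
```

The favorable count is four times $13^3 = 2197$, one factor of $13^3$ for each choice of three suits out of four.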

In [74]:
user_message_5 = "What is the probability of getting a sum of 22 or more when four dice are thrown?"

In [75]:
responses_5 = compare_models(user_message_5)

In [76]:
print("OpenAI:GPT-4o")
Markdown(responses_5["openai:gpt-4o"])

OpenAI:GPT-4o


To solve this problem, we need to determine the probability of obtaining a sum of 22 or more when rolling four six-sided dice. Each die can result in an integer from 1 to 6. The smallest possible sum for four dice is 4 (if each die shows a 1), and the largest possible sum is 24 (if each die shows a 6).

To find the probability of a sum of 22 or more, we calculate the probability of each possible outcome (22, 23, and 24), then sum these probabilities.

1. **Sum of 24**: This occurs only if all four dice show a 6, so there is exactly 1 favorable outcome: (6, 6, 6, 6).

2. **Sum of 23**: This can occur in several ways, such as three dice showing 6 and one die showing 5. The permutations are:
   - (6, 6, 6, 5) which has 4 permutations (since any one of the four dice could show the 5).

3. **Sum of 22**: This can occur in several combinations, such as three dice showing 6 and one die showing 4, or two dice showing 6, one die showing 5, and one die showing 5. The permutations are:
   - (6, 6, 6, 4) which has 4 permutations.
   - (6, 6, 5, 5) which has 6 permutations (choose 2 dice to show 5 from 4 dice).

Now we count the number of favorable outcomes:
- Sum of 24: 1 outcome
- Sum of 23: 4 outcomes
- Sum of 22: \(4 + 6 = 10\) outcomes

The total number of favorable outcomes is \(1 + 4 + 10 = 15\).

The total number of possible outcomes when rolling four dice is \(6^4 = 1296\).

Thus, the probability of rolling a sum of 22 or more is:

\[
\frac{15}{1296}
\]

This fraction can be simplified:

\[
\frac{15}{1296} = \frac{5}{432}
\]

Therefore, the probability of obtaining a sum of 22 or more when rolling four dice is \(\frac{5}{432}\).

In [77]:
print("Anthropic:Claude-3.5")
Markdown(responses_5["anthropic:claude-3-5-sonnet-20240620"])

Anthropic:Claude-3.5


To solve this problem, let's approach it step by step:

1) First, we need to consider the possible outcomes when four dice are thrown. Each die has 6 faces, numbered from 1 to 6.

2) The minimum possible sum is 4 (1+1+1+1), and the maximum possible sum is 24 (6+6+6+6).

3) We're asked about the probability of getting a sum of 22 or more, which includes the sums 22, 23, and 24.

4) It's easier to calculate the probability of getting 21 or less and then subtract this from 1 to get our desired probability.

5) The total number of possible outcomes when throwing 4 dice is $6^4 = 1296$, as each die has 6 possibilities.

6) To calculate the number of ways to get a sum of 21 or less, we can use the concept of "stars and bars" or the combination with repetition formula. However, this would be quite complex for this problem.

7) Instead, we can use a computer program or a probability calculator to find that there are 1281 ways to get a sum of 21 or less with 4 dice.

8) Therefore, the probability of getting 21 or less is:

   $P(sum \leq 21) = \frac{1281}{1296}$

9) The probability of getting 22 or more is the complement of this:

   $P(sum \geq 22) = 1 - P(sum \leq 21) = 1 - \frac{1281}{1296} = \frac{15}{1296} \approx 0.0116$

Thus, the probability of getting a sum of 22 or more when four dice are thrown is $\frac{15}{1296}$ or approximately 0.0116 or about 1.16%.

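Both models arrive at the same value here, $\frac{15}{1296} = \frac{5}{432} \approx 0.0116$. GPT-4o gets there by enumerating the outcomes for each sum and simplifies the fraction. Claude-3.5 leaves the fraction unsimplified and, instead of deriving the count, asserts that a computer program gives 1281 ways to get 21 or less, so its reasoning cannot be checked from the response alone.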
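The four-dice counts can be verified directly, since there are only $6^4 = 1296$ rolls. The sketch below tallies every roll by its sum using the standard library, then reports the counts for sums 22, 23 and 24 along with the number of rolls summing to 21 or less.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Tally all ordered rolls of four six-sided dice by their sum.
sum_counts = Counter(
    sum(roll) for roll in product(range(1, 7), repeat=4)
)

total = sum(sum_counts.values())
favorable = sum(n for s, n in sum_counts.items() if s >= 22)
at_most_21 = total - favorable

p_22_or_more = Fraction(favorable, total)
print(sum_counts[22], sum_counts[23], sum_counts[24])  # 10 4 1
print(total, at_most_21, p_22_or_more)  # 1296 1281 5/432
```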