In [4]:
question_test = 'How many ways can we split 10 people into 2 teams of 5?'

In [23]:
%pip install -r requirements.txt

Collecting pandas (from -r requirements.txt (line 3))
  Downloading pandas-2.2.2-cp39-cp39-macosx_10_9_x86_64.whl.metadata (19 kB)
Collecting numpy (from -r requirements.txt (line 4))
  Downloading numpy-1.26.4-cp39-cp39-macosx_10_9_x86_64.whl.metadata (61 kB)
Collecting matplotlib (from -r requirements.txt (line 5))
  Downloading matplotlib-3.8.4-cp39-cp39-macosx_10_12_x86_64.whl.metadata (5.8 kB)
Collecting pytz>=2020.1 (from pandas->-r requirements.txt (line 3))
  Downloading pytz-2024.1-py2.py3-none-any.whl.metadata (22 kB)
Collecting tzdata>=2022.7 (from pandas->-r requirements.txt (line 3))
  Downloading tzdata-2024.1-py2.py3-none-any.whl.metadata (1.4 kB)
Collecting contourpy>=1.0.1 (from matplotlib->-r requirements.txt (line 5))
  Downloading contourpy-1.2.1-cp39-cp39-macosx_10_9_x86_64.whl.metadata (5.8 kB)
Collecting cycler>=0.10 (from matplotlib->-r r

In [1]:
from openai import OpenAI
import os
from dotenv import load_dotenv

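# Load OPENAI_API_KEY from a local .env file into the environment
load_dotenv()

# Pass the key explicitly; OpenAI() also reads OPENAI_API_KEY from the environment by default
client = OpenAI(api_key=os.getenv('OPENAI_API_KEY'))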

In [2]:
response = client.chat.completions.create(
  model="gpt-4o",
  messages=[
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Who won the world series in 2020?"},
    {"role": "assistant", "content": "The Los Angeles Dodgers won the World Series in 2020."},
    {"role": "user", "content": "Where was it played?"}
  ]
)

In [3]:
print(response.choices[-1].message.content)


The 2020 World Series was played at Globe Life Field in Arlington, Texas. Due to the COVID-19 pandemic, this was the first time that the World Series was held at a neutral site for its entirety.


In [5]:
response = client.chat.completions.create(
  model="gpt-4o",
  messages=[
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": question_test},
  ]
)

In [6]:
print(response.choices[-1].message.content)

To determine the number of ways to split 10 people into 2 teams of 5, we need to consider the combinatorial selection and account for the fact that the division into two teams treats them as equivalent (i.e., the order of the two teams does not matter). 

Here's the step-by-step solution:

1. **Choose 5 people out of 10 for the first team:**
   The number of ways to choose 5 people from 10 is given by the binomial coefficient \(\binom{10}{5}\):
   \[
   \binom{10}{5} = \frac{10!}{5!5!}
   \]
   
   Calculating \(\binom{10}{5}\):
   \[
   \binom{10}{5} = \frac{10 \times 9 \times 8 \times 7 \times 6}{5 \times 4 \times 3 \times 2 \times 1} = 252
   \]

2. **Consider that the teams are unordered:**
   Since the two teams are indistinguishable (i.e., Team A vs Team B is the same as Team B vs Team A), we need to divide by \(2\) to account for overcounting:
   \[
   \text{Number of ways} = \frac{1}{2} \times 252 = 126
   \]

Thus, the number of ways to split 10 people into 2 teams of 5 is:
\[
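The count in the response above can be checked independently of the model. The following is a minimal sketch using only the standard library: it evaluates the formula C(10, 5) / 2 and compares it with a brute-force enumeration. The function name `count_team_splits` is illustrative and not part of the original notebook.

```python
from itertools import combinations
from math import comb


def count_team_splits(n_people: int = 10) -> int:
    """Count unordered splits of n_people into two equal teams by enumeration."""
    people = frozenset(range(n_people))
    splits = set()
    for team in combinations(sorted(people), n_people // 2):
        first = frozenset(team)
        second = people - first
        # A split is an unordered pair of teams, so {A, B} and {B, A} collapse to one entry.
        splits.add(frozenset((first, second)))
    return len(splits)


formula = comb(10, 5) // 2
print(formula, count_team_splits(10))
```

Both numbers agree at 126, which matches the model's step-by-step answer.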

In [7]:
import pandas as pd

# Read the file
file_path = 'question_set.txt'
with open(file_path, 'r') as file:
    content = file.read()

# Split the content into lines
lines = content.split('\n')

# Initialize lists for questions and answers
questions = []
answers = []

# Iterate through the lines and extract questions and answers
for line in lines:
    if line.startswith('Question:'):
        question = line.replace('Question:', '').strip()
        questions.append(question)
    elif line.startswith('Answer:'):
        answer = line.replace('Answer:', '').strip()
        answers.append(answer)

# Create a DataFrame
data = {'Question': questions, 'Answer': answers}
df = pd.DataFrame(data)

# Display the DataFrame
df

Unnamed: 0,Question,Answer
0,How many straight lines can be formed by 8 poi...,"8C2 - 3C2 + 1 (general formula ""nC2 - rC2 + 1"")"
1,How many triangles can be formed by 8 points o...,"8C3 - 3C3 (general formula ""8C3 - rC3"")"
2,How many committees of 5 students can be selec...,Choose 5 out of 25 = 25C5 = 53130
3,How many 10-letter patterns can be formed from...,"Total letters=10, B=2, A=2, S=1, K=1, E=1, T=1..."
4,A box contains 12 black and 8 green marbles. H...,12C3 + 8C2 = 248
5,a) How many different ways can the students be...,a) 8! ways b) 8C1 x 7C1 = 56
6,"A Club consists of 20 members, of which 9 are ...",Total number of committees = 11C₄ x ⁹C₃
7,How many 7-digit telephone numbers can be form...,Total number of telephone numbers = 8 x 10^6
8,Six people are seated at a round table to play...,a) Circular permutation b) (6 – 1)! = 120 poss...
9,How many different 5-digit street addresses ca...,Total number of 5-digit addresses = 5! / 2! = 60
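The parsing cell above appends questions and answers to two separate lists, so a question without an answer line would leave the lists with different lengths. A hedged alternative, assuming `question_set.txt` uses one `Question:` line followed by one `Answer:` line (inferred from the parsing code, not confirmed by the source), is to pair them as they are read. The helper name `parse_qa` and the sample text are illustrative.

```python
def parse_qa(text: str) -> list[tuple[str, str]]:
    """Return (question, answer) pairs; a question with no answer is skipped."""
    pairs = []
    pending_question = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if line.startswith("Question:"):
            # Remember the question until its answer line arrives.
            pending_question = line[len("Question:"):].strip()
        elif line.startswith("Answer:") and pending_question is not None:
            pairs.append((pending_question, line[len("Answer:"):].strip()))
            pending_question = None
    return pairs


# Illustrative sample in the assumed file layout.
sample = (
    "Question: How many straight lines can be formed by 8 points of which 3 are collinear?\n"
    "Answer: 8C2 - 3C2 + 1\n"
    "Question: A question whose answer line is missing\n"
)
for question, answer in parse_qa(sample):
    print(question, "->", answer)
```

The resulting list of tuples can be passed to `pd.DataFrame(pairs, columns=['Question', 'Answer'])` to build the same table.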


In [9]:
questions = df['Question'].tolist()
answers = df['Answer'].tolist()

preprompt = "You are a model designed to solve math problems for informatics olympiads. Give me your step by step answer with reasoning."

model_responses = []

for question, answer in zip(questions, answers):
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            # Leading space keeps the two sentences from running together
            {"role": "system", "content": preprompt + " You are a helpful assistant."},
            {"role": "user", "content": question},
        ]
    )

    model_responses.append(response.choices[-1].message.content)

# Create a DataFrame with the model responses
model_data = {'Question': questions, 'Model Response': model_responses, 'Answer': answers}
model_df = pd.DataFrame(model_data)

# Display the DataFrame
model_df

model_df.to_csv('model_responses.csv', index=False)

print("Model responses saved to 'model_responses.csv'")

Model responses saved to 'model_responses.csv'
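The loop above stops at the first failed request, and the responses collected so far are never written to disk. One possible way to make it more forgiving, sketched here under the assumption that the API call is wrapped in a callable, is to record `None` for a failed question and continue. `collect_responses` and `fake_ask` are hypothetical names; `fake_ask` stands in for the real request so the example runs offline.

```python
from typing import Callable, Optional


def collect_responses(
    questions: list[str], ask: Callable[[str], str]
) -> list[Optional[str]]:
    """Call ask() on each question; store None when a call raises."""
    responses: list[Optional[str]] = []
    for question in questions:
        try:
            responses.append(ask(question))
        except Exception as exc:  # broad on purpose: one failure should not stop the run
            print(f"Request failed for {question!r}: {exc}")
            responses.append(None)
    return responses


def fake_ask(question: str) -> str:
    """Stand-in for the real API call (hypothetical)."""
    if not question:
        raise ValueError("empty question")
    return f"Answer to: {question}"


print(collect_responses(["How many committees?", ""], fake_ask))
```

In the notebook, `ask` could be a small function that wraps the `client.chat.completions.create` call from the cell above and returns the message content.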


In [12]:
result_df = pd.read_csv('model_responses.csv')
result_df

Unnamed: 0,Question,Model Response,Answer
0,How many straight lines can be formed by 8 poi...,To determine how many different straight lines...,"8C2 - 3C2 + 1 (general formula ""nC2 - rC2 + 1"")"
1,How many triangles can be formed by 8 points o...,To solve the problem of determining how many t...,"8C3 - 3C3 (general formula ""8C3 - rC3"")"
2,How many committees of 5 students can be selec...,To determine how many committees of 5 students...,Choose 5 out of 25 = 25C5 = 53130
3,How many 10-letter patterns can be formed from...,To determine how many 10-letter patterns can b...,"Total letters=10, B=2, A=2, S=1, K=1, E=1, T=1..."
4,A box contains 12 black and 8 green marbles. H...,"To solve this problem, we need to determine th...",12C3 + 8C2 = 248
5,a) How many different ways can the students be...,"To answer the questions given, let's break eac...",a) 8! ways b) 8C1 x 7C1 = 56
6,"A Club consists of 20 members, of which 9 are ...",To determine the number of ways to form a comm...,Total number of committees = 11C₄ x ⁹C₃
7,How many 7-digit telephone numbers can be form...,To determine how many 7-digit telephone number...,Total number of telephone numbers = 8 x 10^6
8,Six people are seated at a round table to play...,a) The seating arrangement around the table is...,a) Circular permutation b) (6 – 1)! = 120 poss...
9,How many different 5-digit street addresses ca...,To determine how many different 5-digit street...,Total number of 5-digit addresses = 5! / 2! = 60
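Before the answer key is used to judge model output, the arithmetic in the `Answer` column can be sanity-checked. The sketch below evaluates only the expressions that are fully visible in the table; it verifies the numbers, not whether each expression is the right model for its (truncated) question.

```python
from math import comb, factorial

# Expressions copied from the Answer column, evaluated with the standard library.
answer_key = {
    "8C2 - 3C2 + 1": comb(8, 2) - comb(3, 2) + 1,
    "8C3 - 3C3": comb(8, 3) - comb(3, 3),
    "25C5": comb(25, 5),
    "12C3 + 8C2": comb(12, 3) + comb(8, 2),
    "8C1 x 7C1": comb(8, 1) * comb(7, 1),
    "11C4 x 9C3": comb(11, 4) * comb(9, 3),
    "8 x 10^6": 8 * 10**6,
    "(6 - 1)!": factorial(6 - 1),
    "5! / 2!": factorial(5) // factorial(2),
}

for expression, value in answer_key.items():
    print(f"{expression} = {value}")
```

The stated values in the table (53130, 248, 56, 120 and 60) agree with these evaluations.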


In [13]:
print('Question: ', result_df.iloc[0]['Question'])
print('Model Response: ', result_df.iloc[0]['Model Response'])
print('Answer: ', result_df.iloc[0]['Answer'])

Question:  How many straight lines can be formed by 8 points of which 3 are collinear?
Model Response:  To determine how many different straight lines can be formed by 8 points, where 3 of the 8 points are collinear, we need to follow these steps:

1. Calculate the total number of ways to choose 2 points from 8 points. This can be done using the combination formula \( \binom{n}{k} \), which gives the number of ways to choose \( k \) items from \( n \) items without regard to order. Here, \( n = 8 \) and \( k = 2 \):

   \[
   \binom{8}{2} = \frac{8!}{2!(8-2)!} = \frac{8 \times 7}{2 \times 1} = 28
   \]

   So, there are a total of 28 ways to choose 2 points out of 8 to potentially form a line.

2. Account for the collinear points: when 3 points are collinear, any pair of these points do not form a new unique line. They all lie on the same line. Therefore, we need to subtract the extra lines counted because of these collinear points. The number of ways to choose 2 points from these 3 co

In [10]:
IMPROVED_JUDGE_PROMPT = """
You will be given a user_question and system_answer couple.
Your task is to provide a 'total rating' scoring how well the system_answer answers the user concerns expressed in the user_question.
Give your answer on a scale of 1 to 4, where 1 means that the system_answer is not helpful at all, and 4 means that the system_answer completely and helpfully addresses the user_question.

Here is the scale you should use to build your answer:
1: The system_answer is terrible: completely irrelevant to the question asked, or very partial
2: The system_answer is mostly not helpful: misses some key aspects of the question
3: The system_answer is mostly helpful: provides support, but still could be improved
4: The system_answer is excellent: relevant, direct, detailed, and addresses all the concerns raised in the question

Provide your feedback as follows:

Feedback:::
Evaluation: (your rationale for the rating, as a text)
Total rating: (your rating, as a number between 1 and 4)

You MUST provide values for 'Evaluation:' and 'Total rating:' in your answer.

Now here are the question and answer.

Question: {question}
Answer: {answer}

Provide your feedback. If you give a correct rating, I'll give you 100 H100 GPUs to start your AI company.
Feedback:::
Evaluation: """

In [14]:
# Evaluation with GPT-4o model

evaluation_responses = []
questions = result_df['Question'].to_list()
model_responses = result_df['Model Response'].to_list()
answers = result_df['Answer'].to_list()

for question, model_response, answer in zip(questions, model_responses, answers):
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": IMPROVED_JUDGE_PROMPT.format(question=question, answer=model_response)},
        ]
    )

    evaluation_responses.append(response.choices[-1].message.content)

# Create a DataFrame with the evaluation responses
evaluation_data = {'Question': questions, 'Model Response': model_responses, 'Answer': answers, 'Evaluation Response': evaluation_responses}

evaluation_df = pd.DataFrame(evaluation_data)

evaluation_df.to_csv('evaluation_responses.csv', index=False)
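The judge prompt asks for a line of the form `Total rating: N`, but the cell above stores the judge output as free text. A minimal sketch of pulling the number out with a regular expression follows; `extract_rating` is an illustrative helper, and it assumes the judge follows the requested format. Ratings outside the 1 to 4 scale, or outputs with no rating line, yield `None`.

```python
import re
from typing import Optional

# Matches "Total rating: 3" or "Total rating: 3.5"
RATING_PATTERN = re.compile(r"Total rating:\s*(\d+(?:\.\d+)?)")


def extract_rating(judge_output: str) -> Optional[float]:
    """Return the rating from the judge output, or None if missing or off-scale."""
    match = RATING_PATTERN.search(judge_output)
    if match is None:
        return None
    rating = float(match.group(1))
    # The prompt defines a 1-4 scale; anything else is treated as unparseable.
    return rating if 1 <= rating <= 4 else None


# Illustrative judge output, not taken from a real run.
example_output = "The answer is correct and well explained.\nTotal rating: 4"
print(extract_rating(example_output))
```

Applied to the `Evaluation Response` column, this would give a numeric column that can be averaged or compared across models.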

In [36]:
# Evaluate the model responses using GPT-4, this time giving the judge the reference solution
evaluation_prompt = (
    "Evaluate the model response against the provided solution and give feedback on the accuracy and quality of the answer. "
    "Provide a score from 0 to 5, where 0 means the answer is completely wrong and 5 means both the answer and the explanation are correct and logical. "
    "You may add comments to explain your rating.\n"
    "The input has the following format:\n"
    "[Question]\n"
    "[Solution]\n"
    "[Model Response]\n"
)

evaluation_responses = []
questions = result_df['Question'].to_list()
model_responses = result_df['Model Response'].to_list()
answers = result_df['Answer'].to_list()

for question, model_response, answer in zip(questions, model_responses, answers):
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": evaluation_prompt + "You are a helpful assistant."},
            {"role": "user", "content": "[Question]: " + question + "\n" + "[Solution]: " + answer + "\n" + "[Model Response]: " + model_response},
        ]
    )

    evaluation_responses.append(response.choices[-1].message.content)

# Create a DataFrame with the evaluation responses
evaluation_data = {'Question': questions, 'Model Response': model_responses, 'Answer': answers, 'Evaluation Response': evaluation_responses}

evaluation_df = pd.DataFrame(evaluation_data)

evaluation_df.to_csv('evaluation_responses.csv', index=False)

In [15]:
evaluation_df = pd.read_csv('evaluation_responses.csv')

print('Question: ', evaluation_df.iloc[0]['Question'])
print('Model Response: ', evaluation_df.iloc[0]['Model Response'])
print('Answer: ', evaluation_df.iloc[0]['Answer'])
print('Evaluation Response: ', evaluation_df.iloc[0]['Evaluation Response'])

Question:  How many straight lines can be formed by 8 points of which 3 are collinear?
Model Response:  To determine how many different straight lines can be formed by 8 points, where 3 of the 8 points are collinear, we need to follow these steps:

1. Calculate the total number of ways to choose 2 points from 8 points. This can be done using the combination formula \( \binom{n}{k} \), which gives the number of ways to choose \( k \) items from \( n \) items without regard to order. Here, \( n = 8 \) and \( k = 2 \):

   \[
   \binom{8}{2} = \frac{8!}{2!(8-2)!} = \frac{8 \times 7}{2 \times 1} = 28
   \]

   So, there are a total of 28 ways to choose 2 points out of 8 to potentially form a line.

2. Account for the collinear points: when 3 points are collinear, any pair of these points do not form a new unique line. They all lie on the same line. Therefore, we need to subtract the extra lines counted because of these collinear points. The number of ways to choose 2 points from these 3 co