### Setup

In [1]:
# reload imports.
%load_ext autoreload
%autoreload 2

In [9]:
import os
import json

# Geometry problems from the MATH test split
folder_path = "./MATH/test/geometry/"
json_files = [f for f in os.listdir(folder_path) if f.endswith('.json')]

# Store each in list
json_objects = []
for file in json_files:
    file_path = os.path.join(folder_path, file)
    with open(file_path, 'r') as f:
        json_data = json.load(f)
        json_data["file_path"] = file_path  # Add the file path so we can keep track of each problem easily
        json_objects.append(json_data)

print(len(json_objects))
filtered_json_objects = [obj for obj in json_objects if obj.get('level') == 'Level 5']
print(len(filtered_json_objects))

# Get 5 random ones
import random

random.seed(10)
samples = random.sample(filtered_json_objects, 5)

question = samples[0]['problem']
print(question)
print(samples[0]["file_path"])

479
132
Point $P$ is inside equilateral triangle $ABC$ such that the altitudes from $P$ to $\overline{AB}$, $\overline{BC}$, and $\overline{CA}$ have lengths 5, 6, and 7 respectively.  What is the area of triangle $ABC$?
./MATH/test/geometry/990.json


### Idea

To start, ask the model to solve the problem all the way to the end and return a list of steps. Then I can go back and check how many of them are correct.
We can limit this by capping the solution at a maximum of 5 steps.

Then we go through the solution step by step. For each step:

**For now, avoid the recursive breaking up described in item 1.**

1. Should we break this further into steps? is it too detailed?
    1. If so, repeat this process and recurse
2. If not, then we analyze this step, and extract the following things
    - Proven mathematical relationships - written as formulas
    - Problem relationships - written without math, just words. Some sort of logical condition in the problem
    - Intermediate math relationships - numerical relationships between parts of the problem. Only valid in this context. Uses a math formula and some specific knowledge. Break up into a proven math rel and a problem rel

3. We need to verify the above. 
    - Proven math relations can be verified by asking
    - Problem relationships verified by asking as well
    - Intermediate math relationships should have any math relations verified, then any problem relationships verified, then finally they should be executed and verified with code

4. Now the step is good to go. We can rewrite it to show the verified items above. We can also store the proven relationships and problem relationships separately.

5. If at any point in the larger step we had a problem, we are going to have to recalculate the rest of the steps. If the corrected step is the same we can continue through the steps, but if it's different it may change the next ones.
    - I think this is fine. We need some way of error correcting, and of propagating that correction

6. This is all very good for fixing problems with individual steps, but what if the whole approach is wrong in some way? Eventually we will build up a list of correct steps, but not all of them will be relevant, and some of them may belong in a better order. (Though the order should be fixed by regenerating steps each time.. hmm)
Not sure what to do here *yet*

Original step by step idea->
1. Ask what it would like to do - what is the first step, given what we have.
2. Then analyze the step it returns - what is the rationale behind it? What outside theorems does it use? What new assumptions does it make?
3. Check each assumption individually, and iterate until we get a good first step.
4. Then do this step, update the knowns, and then repeat the process until we get the objective

In [11]:
# 1. Try and solve the original problem, using max 5 steps. Have it format them w/ schema.
# Go step by step..
from utils.async_gpt import agenerate_from_gpt_with_schema, agenerate_from_gpt
from pydantic import BaseModel

class Steps(BaseModel):
    steps: list[str]
    summary: str

# Writing out the question here. I want to change the numbers to check if it's still correct.
# Raw string so that LaTeX commands such as \overline are not treated as escape sequences.
question = r"""Point $P$ is inside equilateral triangle $ABC$ such that the altitudes from $P$ to $\overline{AB}$, $\overline{BC}$, and $\overline{CA}$ have lengths 8, 6, and 7 respectively.  What is the area of triangle $ABC$?
"""

create_steps_prompt = """
Given the question:
{question}

Return a series of steps to solve the question. Return a maximum of 5 steps. Be detailed.
Additionally, give a brief summary on the overall strategy, or any key points
"""

async def create_steps(question):
    """
    Create steps to solve a question
    """
    messages = [
        {
            "role": "user",
            "content": create_steps_prompt.format(question=question),
        },
    ]
    steps: Steps = await agenerate_from_gpt_with_schema(
        messages, Steps
    )

    return steps

steps = await create_steps(question)
print(steps)

steps=['1. **Understanding the Problem: Relationship Between Altitudes and Area of Triangle**\n   - For any point P inside an equilateral triangle ABC, the sum of the perpendicular distances from P to each side of the triangle is equal to the length of the altitude of the equilateral triangle.\n   - Let h represent the altitude of triangle ABC. Then, \\( h = 8 + 6 + 7 = 21 \\).', '2. **Relating Altitude to the Side Length of Triangle ABC**\n   - The altitude (h) of an equilateral triangle with side length s is given by \\( h = \\frac{s\\sqrt{3}}{2} \\).\n   - Using the obtained altitude \\( h = 21 \\), set up the equation: \\[ 21 = \\frac{s\\sqrt{3}}{2} \\].', '3. **Solving for the Side Length (s) of Triangle ABC**\n   - Rearrange the equation: \\( s\\sqrt{3} = 42 \\ ).\n   - Solve for s: \\( s = \\frac{42}{\\sqrt{3}} = 14\\sqrt{3}/3 \\).', '4. **Finding the Area Using the Side Length**\n   - Recall the formula for the area of an equilateral triangle: \\( \\text{Area} = \\frac{s^2\\sqr
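
A quick numeric check of the arithmetic in these steps, as a minimal sketch. It assumes only Viviani's theorem (the three perpendicular distances sum to the altitude) and the standard equilateral-triangle formulas; the helper name `equilateral_area_from_distances` is made up for illustration. Note that 42/√3 simplifies to 14√3, not to 14√3/3 as printed in step 3.

```python
import math

def equilateral_area_from_distances(d1: float, d2: float, d3: float) -> float:
    """Area of an equilateral triangle from an interior point's distances to its sides."""
    h = d1 + d2 + d3              # Viviani's theorem: the distances sum to the altitude
    s = 2 * h / math.sqrt(3)      # rearranged from h = s * sqrt(3) / 2
    return math.sqrt(3) / 4 * s ** 2

modified = equilateral_area_from_distances(8, 6, 7)  # numbers used in this cell
original = equilateral_area_from_distances(5, 6, 7)  # numbers in the MATH problem

# Compare against the closed forms 147*sqrt(3) and 108*sqrt(3)
print(modified, 147 * math.sqrt(3))
print(original, 108 * math.sqrt(3))
```

Because the area depends only on the sum of the three distances, any change to the numbers that keeps the sum fixed leaves the answer unchanged.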

In [None]:
# When I change the question manually, it seems to get it wrong. I should check this by running each version 20 or so times.
# This is kind of expected, after reading that paper about how the models are likely learning common problems
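
One possible way to run the repeated check, as a rough sketch. It assumes a solver coroutine that returns a final answer as a comparable string, which the `Steps` schema above does not provide yet; `run_trials` and `fake_solve` are hypothetical names, and `fake_solve` only stands in for a real model call.

```python
import asyncio
from collections import Counter
from typing import Awaitable, Callable

async def run_trials(
    solve: Callable[[str], Awaitable[str]],
    question: str,
    n: int = 20,
) -> Counter:
    """Call `solve` n times concurrently and tally the distinct answers."""
    answers = await asyncio.gather(*(solve(question) for _ in range(n)))
    return Counter(answers)

# Stand-in solver so the sketch runs without an API key (hypothetical).
async def fake_solve(question: str) -> str:
    return "147*sqrt(3)"

# In a notebook cell:  tally = await run_trials(fake_solve, question, n=20)
# In a plain script:   tally = asyncio.run(run_trials(fake_solve, "...", n=20))
```

Comparing the tallies for the original and the modified numbers would show whether the error is consistent or just sampling noise.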

In [20]:
# 2. Let's go through step by step and see if each step is good. 
# Generate an explanation for why or why not it's correct. Then return true/false. If any step is wrong, we re-calculate the next steps
from utils.async_gpt import agenerate_from_gpt_with_schema, agenerate_from_gpt
from pydantic import BaseModel

class VerifyStep(BaseModel):
    reasoning: str
    correct: bool


verify_step_prompt = """
Given the following info, verify if the CURRENT STEP is correct. 
Assume the question and any previous steps are correct.
ONLY VERIFY the current step.
Return the reasoning for why or why not it is correct, and a bool for if it is correct or not

Existing Info ====
Question:
{question}

{prev_steps_str}

New ====
Current step:
{current_step}

"""

async def verify_step(question: str, prev_steps: list[str], current_step: str):
    """
    Verify a step given the previous ones. 
    * Note: we do not care if this step is helpful towards the objective. We're just checking the assumptions it makes.
    """
    if len(prev_steps) == 0:
        prev_steps_str = ""
    else:
        # Turn prev steps to str
        numbered_list = "\n".join([f"Step {i+1}:\n {step}" for i, step in enumerate(prev_steps)])
        prev_steps_str = f"Previous Steps:\n {numbered_list}"

    messages = [
        {
            "role": "user",
            "content": verify_step_prompt.format(question=question, prev_steps_str=prev_steps_str, current_step=current_step),
        },
    ]
    verify: VerifyStep = await agenerate_from_gpt_with_schema(
        messages, VerifyStep
    )

    return verify


In [24]:
# 2.1 If a step is wrong, we'll want to correct it.
# For now let's try giving the function all the info: the existing info, the incorrect current step, and the reasoning for why it is incorrect
from utils.async_gpt import agenerate_from_gpt


previous_info_prompt = """
Existing True Information ====
Question:
{question}

{prev_steps_str}
"""

wrong_step_prompt = """
New Information ===
Current step (incorrect):
{current_step}

Provided reason why current step is incorrect:
{reason}
"""

ask="""
The existing true information has been verified. It shows the preceding steps.
The new information shows the generated current step. This current step contains an error in it. The reason for the error is given.

Your job is to rewrite this current step so that it is CORRECT. Use the reason given to fix the current step.
Refer back to the existing true information for verified assumptions. Do not add new ideas. 
Fix the current step for the reason provided, and return ONLY the new current step.
"""


async def fix_step(question: str, prev_steps: list[str], current_step: str, reason: str):
    """
    Fix a step that was said to be wrong. Returns the new step.
    Currently we are giving all the information. In the future consider limiting scope
    """
    if len(prev_steps) == 0:
        prev_steps_str = ""
    else:
        # Turn prev steps to str
        numbered_list = "\n".join([f"Step {i+1}:\n {step}" for i, step in enumerate(prev_steps)])
        prev_steps_str = f"Previous Steps:\n {numbered_list}"

    messages = [
        {"role": "assistant",
            "content": previous_info_prompt.format(question=question, prev_steps_str=prev_steps_str)},

        {
            "role": "assistant",
            "content": wrong_step_prompt.format(current_step=current_step, reason=reason),
        },
        {
            "role": "user",
            "content": ask,
        },
    ]
    new_step = await agenerate_from_gpt(
        messages
    )

    return new_step


In [None]:
# 2.2 Once we've regenerated a step, we need to regenerate the rest of the steps as well..

In [23]:
# 3. Let's go step by step and verify the step. For now, we'll just print the verification. Once it fails, we'll stop the loop
prev_steps = []
for i, current_step in enumerate(steps.steps):
    verify = await verify_step(question, prev_steps, current_step)
    print(f"Step #{i+1}============\nCorrect: {verify.correct}\nReasoning: {verify.reasoning}", )
    if not verify.correct:
        break
    prev_steps.append(current_step) 

print("Done")

# If a step is wrong, let's correct it

Correct: True
Reasoning: In an equilateral triangle, the sum of perpendiculars from any point inside the triangle to the three sides is equal to the length of the triangle's altitude. This is a property of equilateral triangles that follows from symmetry and equal division of area by the internal perpendiculars. Given the problem statement, point P is inside triangle ABC, and the lengths of the altitudes from P to the sides are 8, 6, and 7, respectively. Therefore, the altitude h of the triangle should indeed be the sum of these altitudes: 21. 

Thus, the step correctly uses this geometric property to compute the altitude of triangle ABC.
Correct: True
Reasoning: The current step relates the altitude of the equilateral triangle to its side length using the correct formula for an equilateral triangle's altitude. For an equilateral triangle with side length $s$, the altitude is indeed given by \( h = \frac{s\sqrt{3}}{2} \). This formula correctly relates the side length to the altitude b

In [None]:
# 2. Analyze the step, and extract the following: 
# proven mathematical relationships (external)
# Problem relationships (conditions and 2nd level conditions). These need to be supported by the question.
# Intermediate math relationships - further extract from each one,
    # proven math rels
    # problem rels
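
One way the extracted relationships could be represented, as a sketch only. All class and field names here are hypothetical; standard-library dataclasses are used so the example runs on its own, though pydantic models (like `Steps` above) would be the natural choice if this is passed as a schema to the model.

```python
from dataclasses import dataclass, field

@dataclass
class ProvenMathRelation:
    """External, context-free fact, written as a formula."""
    formula: str

@dataclass
class ProblemRelation:
    """Condition stated in words that must be supported by the question."""
    statement: str

@dataclass
class IntermediateRelation:
    """Context-specific numeric relation, split into the parts it relies on."""
    expression: str
    math_relations: list[ProvenMathRelation] = field(default_factory=list)
    problem_relations: list[ProblemRelation] = field(default_factory=list)

@dataclass
class StepAnalysis:
    step: str
    math_relations: list[ProvenMathRelation] = field(default_factory=list)
    problem_relations: list[ProblemRelation] = field(default_factory=list)
    intermediate_relations: list[IntermediateRelation] = field(default_factory=list)

# Hand-filled example based on step 1 of the generated solution
viviani = ProvenMathRelation("d1 + d2 + d3 = h")
inside = ProblemRelation("P is inside equilateral triangle ABC")
altitude = IntermediateRelation("h = 8 + 6 + 7", [viviani], [inside])
analysis = StepAnalysis("Compute the altitude", [viviani], [inside], [altitude])
```

Keeping the proven and problem relations as shared objects inside each intermediate relation makes it easy to see which intermediate results are affected when one of them turns out to be wrong.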

In [None]:
# 3. Verify and *correct* all above. 
# If a math relationship was wrong, we want to correct it by asking gpt
# If a problem relationship was wrong, we want to correct it by asking gpt
# If an intermediate math relationship was wrong, we want to correct it by fixing the problem rel or the math rel, and then re-calculating w/ code assistant

# We should only calculate each relationship once. So if an intermediate math relationship relies on a proven math rel or a problem rel,
# and one of those rels is wrong, the intermediate relationship has to be re-calculated.
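
A small sketch of the "verify once, re-verify only what depends on a corrected relation" idea. `RelationCache` and its methods are hypothetical, relations are identified by plain strings for simplicity, and `check` stands in for whatever verifier is used (asking the model, or running code).

```python
from typing import Callable

class RelationCache:
    """Stores one verdict per relationship and tracks which relations depend on which."""

    def __init__(self, check: Callable[[str], bool]):
        self._check = check                          # hypothetical verifier
        self._verdicts: dict[str, bool] = {}
        self._dependents: dict[str, set[str]] = {}

    def add_dependency(self, intermediate: str, base: str) -> None:
        """Record that `intermediate` relies on the proven or problem relation `base`."""
        self._dependents.setdefault(base, set()).add(intermediate)

    def verify(self, relation: str) -> bool:
        """Return the cached verdict, calling the verifier only on first use."""
        if relation not in self._verdicts:
            self._verdicts[relation] = self._check(relation)
        return self._verdicts[relation]

    def replace(self, old: str, new: str) -> None:
        """Swap in a corrected base relation and drop the verdicts that relied on the old one."""
        self._verdicts.pop(old, None)
        for dep in self._dependents.pop(old, set()):
            self._verdicts.pop(dep, None)
            self._dependents.setdefault(new, set()).add(dep)
```

After `replace`, the next `verify` call on a dependent intermediate relation runs the verifier again, while unrelated relations keep their cached verdicts.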

In [None]:
# 4. Now we have verified the step. If it changed we need to re-run the step generation here. We will re-run # 1 but ask it to generate 4
# steps now, and give it the first step.
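
A sketch of what the re-run prompt could look like. `continue_steps_prompt`, `build_continue_prompt`, and `MAX_STEPS` are hypothetical names; the wording mirrors `create_steps_prompt` above, and the step budget shrinks by the number of steps already verified.

```python
MAX_STEPS = 5  # same budget as the first step-generation call

continue_steps_prompt = """
Given the question:
{question}

These steps have already been verified:
{verified_steps}

Return the remaining steps to solve the question. Return a maximum of {remaining} steps. Be detailed.
"""

def build_continue_prompt(
    question: str, verified_steps: list[str], max_steps: int = MAX_STEPS
) -> str:
    """Build a prompt asking only for the steps that follow the verified ones."""
    remaining = max_steps - len(verified_steps)
    if remaining <= 0:
        raise ValueError("no step budget left")
    numbered = "\n".join(
        f"Step {i + 1}:\n{step}" for i, step in enumerate(verified_steps)
    )
    return continue_steps_prompt.format(
        question=question, verified_steps=numbered, remaining=remaining
    )
```

With one verified step and the default budget, the prompt asks for at most 4 further steps, matching the plan in the comment above.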

In [None]:
# 5. We should do some other stuff here..