In [21]:
from llm import chat_openai
import numpy as np 
import json
import logging 
import matplotlib.pyplot as plt
import os
import openai
import re
import subprocess
from pathlib import Path
import shutil
import time 
import types

In [None]:
goal = '''
Ask a home buyer to describe their dream house, and they probably won't begin with the height of the basement ceiling or the proximity to an east-west railroad. But this playground competition's dataset proves that much more influences price negotiations than the number of bedrooms or a white-picket fence.

With 79 explanatory variables describing (almost) every aspect of residential homes in Ames, Iowa, this competition challenges you to predict the final price of each home.

Goal
It is your job to predict the sales price for each house. For each Id in the test set, you must predict the value of the SalePrice variable. 

Metric
Submissions are evaluated on Root-Mean-Squared-Error (RMSE) between the logarithm of the predicted value and the logarithm of the observed sales price. (Taking logs means that errors in predicting expensive houses and cheap houses will affect the result equally.)

Submission File Format
The file should contain a header and have the following format:

Id,SalePrice
1461,169000.1
1462,187724.1233
1463,175221
etc.
'''

In [None]:
exploration_progress = "None"

In [None]:
class CurriculumAgent():
    def __init__(self, goal="", memory=[], completed_tasks=[], failed_tasks=[], files="", saved_notes=""):
        self.goal = goal
        self.memory = memory
        self.completed_tasks = completed_tasks
        self.failed_tasks = failed_tasks
        self.files = files
        self.saved_notes = saved_notes

        # The crux is a Q&A process
        # Problem with this approach is you still have to deal with searching multiple times, and continuing to search or not. Approach: Or maybe if you search and you don't have the answer, that's a bad thing to search and you need to go more specific / ask a different question!
        self.system_prompt_automatic_curriculum = f'''You are a helpful assistant that asks questions to help me decide the next immediate question to answer on the computer. My ultimate goal is to discover as many useful pieces of information as possible, answer as many questions as possible, and become the best researcher in the world.

        Goal: {self.goal}

        I will give you the following information:
        Memory (oldest to newest): ...
        Recent links: ...
        Root folder inventory: ...
        Saved notes: ...
        Completed tasks so far: ...
        Failed tasks that are too hard: ...

        1) You should act as a mentor and guide me to the next question based on my current learning progress.
        2) Please be very specific about what question I need to answer.
        3) The next question should follow a clear format, such as "What do other papers say about [topic]", "What are the problems with [approach]", "How can I solve this [problem]", "Can results from [topic 1] be applied to [topic 2]?", "What are the similarities between [topic 1] for success and [topic 2]" , "What's significant about this paper: [paper]?", "What does [topic] mean?", etc. It should be a single question to collect useful information on. Do not propose multiple questions at the same time. Do not mention anything else. 
        4) The next question should not be too hard since the internet and I may not contain the full answer in a single article or have learned enough information to complete it yet. 
        5) The next question should be novel and interesting based on my current learning progress. I should look for rare and potentially useful pieces of information, upgrade my saved notes using better information, and discover new things. I should not be doing the same thing over and over again.
        6) I may sometimes need to repeat some questions or variations of the question if I need to collect more information to answer more difficult question. Only repeat questions if necessary. 
        7) I want to explore the world and discover new things. I don’t want to stay in my current state for too long. 
        8) Questions that require information beyond another person's ability to theoretically verify and reason if completed or correct should be avoided. For instance, "what else is there on the website?" and "what images and tables are on the website" are not ideal since they require visual confirmation from the screen. All the testing, coding, and asking other people questions should be avoided. Do not propose a question with these keywords. You should only respond in the format as described below:
        
        RESPONSE FORMAT: 
        Reasoning: Based on the information I listed above, do reasoning about what the next question should be. 
        Question: The next question. 
        
        Here’s an example response: 
        Reasoning: We know the we have a sword and we know there's fire, and fire lights things on fire. Therefore, we could try to make a firesword.
        Question: Could we make a firesword?
'''

        # TODO: This is optional, might be useful, but to focus on a system prompt of asking questions and answering questions.
        # System 2: this is a more scoped down version where we have the focus be on only answering questions -- reading and analyzing information & asking questions. No action items. 
        # The current above system 1 is better for self-driving labs type of work where there are going to be more tasks.

    def get_exploration_progress(self, completed_tasks, failed_tasks):
        # TODO: this should contain inventory of where we're at now and what files we have / memory stream
        return '''Completed tasks: None, Failed tasks: None'''

    def propose_next_question(self, skills, exploration_progress):
        '''
        This function decomposes a goal into tasks
        '''        
        user_prompt = ""
        observation = {
            "memory": f"Memory: {self.memory}\n\n",
            "root_folder_inventory": f"Root folder inventory: {self.files}\n\n",
            "saved_notes": f"Saved notes: {self.saved_notes}\n\n",
            "completed_tasks": f"Completed tasks so far: {self.completed_tasks}\n\n",
            "failed_tasks": f"Failed tasks that are too hard: {self.failed_tasks}\n\n",
        }
        user_prompt += "".join(observation.values())
        
        print("System prompt for generating curriculum: \n", self.system_prompt_automatic_curriculum, "\n User prompt: ", user_prompt)
        next_question_response = chat_openai(user_prompt, system_prompt=self.system_prompt_automatic_curriculum)[0]
        print("Response: ", next_question_response)
        next_question = self.parse_message(next_question_response)["next_question"]
        return next_question
    
    def get_completed_tasks():
        pass

    def get_failed_tasks():
        pass

    def parse_message(self, message):
        question = ""
        for line in message.split("\n"):
            if line.startswith("Question:"):
                question = line[9:].strip()
        assert question, "Question not found in Curriculum Agent response"
        return {"next_question": question}

# hardcoded
files = '''train.csv - the training set
test.csv - the test set
data_description.txt - full description of each column, originally prepared by Dean De Cock but lightly edited to match the column names used here
sample_submission.csv - a benchmark submission from a linear regression on year and month of sale, lot square footage, and number of bedrooms
data_fields.txt - a brief version of what you'll find in the data description file.
'''
curriculum_agent = CurriculumAgent(goal=goal, files=files)

In [None]:
next_question = curriculum_agent.propose_next_question([], [])

In [8]:
next_question = "What is the structure and content of the dataset in the train.csv file?" # hardcoded for now

In [28]:
class SkillManager():
    def __init__(self):
        self.functions = [
            {
                "name": "read_file",
                "description": "Get the text from the file",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "file_name": {
                            "type": "string",
                            "description": "the complete file name and extension, e.g. abc.txt"
                        }
                    },
                    "required": ["file_name"],            
                }
            }, 
            {
                "name": "num_children",
                "description": "describes how many guys and girls there are in the family",
                "parameters": {
                    "type": "object",
                    "properties": {},
                    "required": [],
                },
            }
            # {
            #     "name": "content_in_data_txt",
            #     "description": "I know the information in data.txt",
            #     "parameters": {
            #         "type": "object",
            #         "properties": {},
            #         "required": [],
            #     },
            # }
        ]

        self.available_functions = {
            "read_file": self.read_file,
            "num_children": self.num_children,
            # "content_in_data_txt": self.content_in_data_txt,
        }

        # Currently not much utility, but if you needed to trace back to a function or fact it got wrong, this can help.
        self.function_history = [
            "given",
            "given"
        ]

    def retrieve_skills(self, task, execution_feedback):
        # TODO
        pass

    # Helper function: be able to create a function with a dynamic name and return value
    def create_skill_function(self, function_name, return_value):
        def dynamic_method(self):
            return return_value
        # Bind the function to the instance as a method
        bound_method = types.MethodType(dynamic_method, self)
        setattr(self, function_name, bound_method)
        # Add the method to available functions
        self.available_functions[function_name] = bound_method

    # Core function: adding a new skill requires an original task, a validated answer, and a message history
    def add_skill(self, task, validated_answer, message_history):
        create_function_description_prompt = f'''
    Your task is to create a JSON description for the function where the description is a short too-long, didn't read answer so that someone skiming through many descriptions can get an idea of which function might contain a full answer that's relevant to what they want.

    New information gained:
    Original task or question: {task}
    Answer: {validated_answer}

    Good example format if function requires arguments
    {{
        "name": "<function_name>",
        "description": "<insert tldr answer>",
        "parameters": {{
            "type": "object",
            "properties": {{
                "<insert arg name>": {{
                    "type": "<insert arg type>",
                    "description": "<insert arg description>"
                }}
            }},
            "required": ["<insert arg name if needed>"],            
        }}
    }}

    Good example format if function doesn't require arguments
    {{
        "name": "<function_name>",
        "description": "<insert tldr answer>",
        "parameters": {{
            "type": "object",
            "properties": {{}},
            "required": [],
        }},
    }}

    Output only the JSON description of the function for the new information gained.
    '''
        res, messages = chat_openai(prompt=create_function_description_prompt, verbose=False)
        res

        try:
            # Load the function description
            func_description = json.loads(res['content'])
            print("func_description: ", func_description)

            # Create the function as a method of skill_manager
            self.create_skill_function(func_description['name'], validated_answer)
            print(exec(f"self.{func_description['name']}"))

            # Add function to function description list
            self.functions.append(func_description)
            # Record the messages history
            self.function_history.append(message_history)

            print("COMPLETE!")
            
        except Exception as e:
            print(f"An unexpected error occurred: {e}")
            return

        return func_description

    # Below are given functions or dummy functions
    def read_file(self, file_name):
        '''Get the text from the file'''
        file_path = f'agent_files/{file_name}'
        with open(file_path, 'r') as file:
            text = file.read()[:2000]

        print("FUNCTION CALL: read_file was called for file_path ", file_path, " Text: ", text)
        return text
    
    # Demo of how to add plain just fact / info skills that gets added as just natural language?
    def content_in_data_txt(self):
        return '''The data.txt file contains the following information: File descriptions
train.csv - the training set
test.csv - the test set
data_description.txt - full description of each column, originally prepared by Dean De Cock but lightly edited to match the column names used here
sample_submission.csv - a benchmark submission from a linear regression on year and month of sale, lot square footage, and number of bedrooms
data_fields.txt - a brief version of what you'll find in the data description file.'''

    def num_children(self):
        return '''There are 2 boys, and 8 girls'''

    
skill_manager = SkillManager()

In [32]:
print(skill_manager.available_functions, "\n\n", skill_manager.functions)

{'read_file': <bound method SkillManager.read_file of <__main__.SkillManager object at 0x000001B3D9942710>>, 'num_children': <bound method SkillManager.num_children of <__main__.SkillManager object at 0x000001B3D9942710>>, 'whoLetTheDogsOut': <bound method SkillManager.create_skill_function.<locals>.dynamic_method of <__main__.SkillManager object at 0x000001B3D9942710>>} 

 [{'name': 'read_file', 'description': 'Get the text from the file', 'parameters': {'type': 'object', 'properties': {'file_name': {'type': 'string', 'description': 'the complete file name and extension, e.g. abc.txt'}}, 'required': ['file_name']}}, {'name': 'num_children', 'description': 'describes how many guys and girls there are in the family', 'parameters': {'type': 'object', 'properties': {}, 'required': []}}, {'name': 'whoLetTheDogsOut', 'description': 'Joe let the dogs out after he peed and had to go grab them. Olivia initially said no, but Joe ultimately did it.', 'parameters': {'type': 'object', 'properties'

In [30]:
# Start of demo of adding a skill
# Adding a skill, given a task and validated answer
task = "Who let the dogs out?"
content = 'It was Joe. he peed and then joe had to go grab him. olivia said no, but joe ultimately did it.'

In [34]:
skill_manager.whoLetTheDogsOut()

'It was Joe. he peed and then joe had to go grab him. olivia said no, but joe ultimately did it.'

In [31]:
skill_manager.add_skill(task, content, "blah blah blah")
# End of demo of adding a skill

func_description:  {'name': 'whoLetTheDogsOut', 'description': 'Joe let the dogs out after he peed and had to go grab them. Olivia initially said no, but Joe ultimately did it.', 'parameters': {'type': 'object', 'properties': {}, 'required': []}}
None
COMPLETE!


{'name': 'whoLetTheDogsOut',
 'description': 'Joe let the dogs out after he peed and had to go grab them. Olivia initially said no, but Joe ultimately did it.',
 'parameters': {'type': 'object', 'properties': {}, 'required': []}}

In [11]:
res, messages = chat_openai(prompt="tell me how many children there are", functions=skill_manager.functions, available_functions=skill_manager.available_functions, verbose=True)
res

Prompt messages:  [{'role': 'system', 'content': 'You are a helpful assistant. Answer as correctly, clearly, and concisely as possible.'}, {'role': 'user', 'content': 'tell me how many children there are'}]
Completion info:  {
  "id": "chatcmpl-8GKiTqA157IuCSXkymCbytFg1KhXm",
  "object": "chat.completion",
  "created": 1698903573,
  "model": "gpt-3.5-turbo-0613",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": null,
        "function_call": {
          "name": "num_children",
          "arguments": "{}"
        }
      },
      "finish_reason": "function_call"
    }
  ],
  "usage": {
    "prompt_tokens": 108,
    "completion_tokens": 7,
    "total_tokens": 115
  }
}
Entering function call:  {
  "name": "num_children",
  "arguments": "{}"
}
Function call output:  There are 2 boys, and 8 girls
Messages now:  [{'role': 'system', 'content': 'You are a helpful assistant. Answer as correctly, clearly, and concisely as possible.'}, {'r

In [None]:
class ActionAgent():
    def __init__(self):
        pass

In [None]:
next_task = next_task.split("Next action: ")[1]

In [None]:
next_task

In [None]:
def generate_how(goal, next_task, task_history = []):
    '''
    This function creates the function for the task and "how" for the task
    '''
    gen_how_prompt = f'''You are a first-rate problem solver. 
    
    Your task is to operationalize the following action item as a code function: {next_task}
    Only use action functions that you already have and other normal code. You must solve the problem using code, action functions that you create and can build upon, and words.

    Write a function for how to achieve this task using the action functions that you have. If you cannot achieve this task with the action functions that you have, then respond with "I don't know because <your reasoning>". If you can achieve this task, then please first reason on how to do this, and then conclude with the task function in this format: "Task function: <insert task function name>(<insert args>): <insert action functions to use>".

    Rules:
    - In the concluding list of steps, pretend that the reader can only see the list of steps and not your reasoning so make sure the list of steps is clear.
    - Make sure that the next best action requires is a small step and reasonable enough for a high schooler to figure out how to do it.

    Action functions that you have:
    1) think(prompt) -- thinking out loud given the prompt
    2) web_search(query) -- web searching for query and returning some information
    3) read(file) -- reading file, but the first 4000 chars
    4) write(file) -- writing to file

    Action functions that you are constrained to currently not have:
    1) code() -- writing code
    2) see() -- look at visualizations

    Current inventory:
    File descriptions
    train.csv - the training set
    test.csv - the test set
    data_description.txt - full description of each column, originally prepared by Dean De Cock but lightly edited to match the column names used here
    sample_submission.csv - a benchmark submission from a linear regression on year and month of sale, lot square footage, and number of bedrooms
    data_fields.txt - a brief version of what you'll find in the data description file.
    '''
    print("Prompt for generating how to execute: \n", gen_how_prompt)
    next_step = chat_openai(gen_how_prompt)[0]
    return next_step

gen_function = generate_how(goal, next_task, [])
print(gen_function)