Implement structured planning & evaluation #4107

Open · Boostrix opened this issue May 11, 2023 · 4 comments
Labels: AI efficacy · AutoGPT Agent · function: planning · meta (Meta-issue about a topic that multiple issues already exist for) · re-arch · Roadmapped (Issues that were spawned by roadmap items)

@Boostrix (Contributor) commented May 11, 2023

Discussion

Before we can start managing the workflow of the agent, we have to give it some more structure. The different ways to implement this can generally be subdivided into 3 groups:

💬 Your input is appreciated!
If you think something important is missing from this comparison, please leave a comment below or bring it up on Discord!

  1. Planning mechanism that is controlled by the agent

    • a. Let the agent manage its plan and to-do list through commands

      This approach seems non-ideal for a couple of reasons:

      • Adds commands to the executive prompt, increasing its complexity
      • Requires a full step for any self-management action → adds significant overhead
    • b. Add (an) output field(s) to the executive prompt through which the agent can manage its plan / to-do list

      More efficient than 1a, but unsure how well this will work:

      • Does not require extra steps just for self-management → limited additional overhead
      • This adds yet another function to the already complex prompt. Multi-purpose prompts are usually outperformed by a combination of multiple single-purpose prompts.
  2. Planning mechanism that is part of the agent loop

    • a. Extra step in the agent’s loop to evaluate progress and update the plan / to-do list.

      Good separation of concerns, but adding a sequential LLM-powered step to a loop will slow down the process.

      • Additional step = additional latency in every cycle
      • Separate unit with its own (focused) prompt → probably easier to tune compared to 1a/1b
    • b. Parallel thread that evaluates the agent’s performance and intervenes / gives feedback when needed.

      A compromise, trading sequential (= simple to implement) evaluation for zero additional latency:

      • Separate unit with its own (focused) prompt → probably easier to tune compared to 1a/1b
      • Asynchronous/parallel evaluation
        • zero/low additional latency
        • eval for step i will arrive after executing step i+1 → “wasted” resources+time on cycle i+1 if evaluator decides to intervene
    • c. Add output fields to the executive prompt through which the agent communicates its qualitative & quantitative assessment of the previous step.

      Quantitative output can be used directly to trigger an evaluation/intervention as in 2a, saving resources when this isn’t needed

    • Possible addition to a. and b.: rewind the agent’s state to an earlier point in time and adjust its parameters if an approach proves unproductive.
       

    💡 Note
    A component that evaluates the agent’s performance and provides it with feedback or other input is also in a good position to manage its directives.

  3. Planning mechanism that controls the agent

    • a. Compose plan consisting of subtasks & employ the agent to complete subtasks

      • i. Contextless (planning without knowledge of the agent’s abilities)
      • ii. Contextful (planning with knowledge of the agent’s abilities)
         

      Implementing planning outside the agent offers interesting possibilities, and may add value regardless of the agent’s own planning capabilities and mechanism. This could be a logical next step towards multi-agent orchestration and/or tackling problems of arbitrary complexity.
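To illustrate what option [3a] could look like in practice, here is a minimal sketch of an external planner that decomposes the objective and drives the agent through the resulting subtasks. The names `plan`, `run_plan`, `llm_complete` and `run_agent` are hypothetical, not part of the current codebase; the sketch only shows the control flow, assuming the agent’s task can be swapped for the current subtask.

```python
# Hypothetical sketch of option 3a: a planner outside the agent that controls it.
# `llm_complete` and `run_agent` stand in for whatever LLM client / agent runner
# is actually available; they are not real AutoGPT APIs.
from typing import Callable

def plan(
    objective: str,
    llm_complete: Callable[[str], str],
    abilities: list[str] | None = None,
) -> list[str]:
    """Ask the LLM for an ordered list of subtasks (3a-i contextless, 3a-ii contextful)."""
    context = ""
    if abilities:
        # 3a-ii: give the planner knowledge of the agent's abilities
        context = "The agent can use these commands: " + ", ".join(abilities) + "\n"
    prompt = (
        f"{context}Break the following objective into a short, ordered list of "
        f"subtasks, one per line:\n{objective}"
    )
    return [
        line.strip("- ").strip()
        for line in llm_complete(prompt).splitlines()
        if line.strip()
    ]

def run_plan(
    objective: str,
    llm_complete: Callable[[str], str],
    run_agent: Callable[[str], str],
    abilities: list[str],
) -> list[str]:
    """Execute each subtask by handing it to the agent as its current task."""
    results = []
    for subtask in plan(objective, llm_complete, abilities=abilities):
        # e.g. replace the agent's task in its prompt with the current subtask
        results.append(run_agent(subtask))
    return results
```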

Proposal

We propose a combination of solutions [2a] and [2c] based on the comparison above.

Why

  • [2a] allows for a relatively simple implementation
  • The additional latency of [2a] can be mitigated by using faster models. With narrowly scoped prompts, this shouldn’t be a problem.
  • The average additional latency of [2a] is reduced by combining it with [2c], only running the evaluation/planning step when necessary
  • [2c] doesn’t add much complexity to the existing executive prompt
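To make the proposed [2a] + [2c] combination more concrete, below is a minimal sketch of what the amended agent loop could look like. The `approach_working` / `is_last_step` fields and the `Plan` structure are assumptions for illustration only; the actual response schema and loop shape are part of the to-do list below.

```python
# Minimal sketch of the proposed loop: the executive step returns its usual
# action plus self-assessment fields ([2c]); a separate planning/evaluation
# step ([2a]) runs only when those fields indicate it is needed.
# All callables and field names here are hypothetical, not existing AutoGPT APIs.
from dataclasses import dataclass

@dataclass
class Plan:
    subtasks: list[str]
    current: int = 0

def agent_loop(task: str, executive_step, planning_step, execute, max_cycles: int = 50):
    plan = planning_step(task, plan=None, feedback=None)  # initial plan
    for _ in range(max_cycles):
        # Executive prompt returns the next action AND its self-assessment ([2c])
        thoughts, action = executive_step(task, plan)
        result = execute(action)

        if not thoughts.get("approach_working", True):
            # Self-assessment says the approach isn't working -> re-plan ([2a])
            plan = planning_step(task, plan=plan, feedback=result)
        elif thoughts.get("is_last_step", False):
            # Verify and move on to the next subtask
            plan.current += 1
            if plan.current >= len(plan.subtasks):
                return result  # objective complete
    return None
```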

To do

  1. Work out:

    1. Once we have a list of subtasks, how do we make the agent execute it?
      e.g. do we replace the agent’s task in the prompt with the current subtask?
    2. Should subtasks be specified like actionables or like sub-objectives, or both? Does it make a difference?
  2. Sketch amended agent loop diagram

  3. Prototype planning+evaluation prompt(s) (in a Jupyter notebook)

  4. Implement [2a]

  5. Work out:

    1. What are useful indicators for how a task is progressing?
      • "Is your approach to the problem working out?"

        → if not, evaluate & adjust plan

      • "Is this the last step needed to complete the [sub]task?"

        → if so, verify & move on to next subtask

  6. Implement [2c]

    1. Add thought outputs corresponding to the indicators from [5.i] to OneShotPromptStrategy (AKA the executive prompt)
    2. Implement mechanism to only run planning/evaluation when needed
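For items 5 and 6, one possible shape for the added thought outputs and the gating mechanism is sketched below. The field names are hypothetical and are not the actual output schema of OneShotPromptStrategy.

```python
# Hypothetical additions to the executive response schema ([2c]) and the gate
# that decides whether to run the separate planning/evaluation step ([2a]).
# These field names are illustrative; the real schema lives in OneShotPromptStrategy.
ADDED_THOUGHT_FIELDS = {
    "approach_working": "boolean: is your approach to the problem working out?",
    "is_last_step": "boolean: is this the last step needed to complete the [sub]task?",
}

def needs_planning_step(thoughts: dict) -> bool:
    """Run planning/evaluation only when the executive prompt's self-assessment
    indicates a re-plan (approach not working) or a subtask transition (last step)."""
    approach_working = thoughts.get("approach_working", True)
    is_last_step = thoughts.get("is_last_step", False)
    return (not approach_working) or is_last_step
```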

Related

Earlier issues regarding improvements to tasking/planning:

This issue has been hijacked by a maintainer to replace and expand the solution proposal. The original proposal by @Boostrix is preserved below.

 
BIFs are built-in functions (a.k.a. commands). They are weighted by a chat_with_ai() instance according to the task at hand and their utility, and then recursively evaluated by adding them to a queue: #3933 (comment)
[attached images: task-planning, task2]

Motivation 🔦

At the very least, even without getting fancy, we should be able to get the LLM to provide a list of tasks and move those into a queue, then recursively call the LLM to break each task into sub-tasks and work through them, pushing and popping as needed (possibly involving sub-stacks, which becomes a logical step once you think about multiple agents: #3549). A minimal sketch follows the constraints list below.

This would need to take place with certain constraints:

  • sequentially (for now)
  • requirements
  • constraints
  • mutual dependencies
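Here is a minimal sketch of that queue-based decomposition, assuming hypothetical `decompose` and `execute` callables; the depth limit and sequential processing stand in for the constraints listed above.

```python
# Sketch of the recursive push/pop task handling from the original proposal:
# ask the LLM for tasks, queue them, and recursively expand tasks into
# sub-tasks up to a depth limit. `decompose` and `execute` are hypothetical.
from collections import deque

def work_queue(goal: str, decompose, execute, max_depth: int = 3):
    queue = deque([(goal, 0)])          # (task, depth), processed sequentially for now
    while queue:
        task, depth = queue.popleft()
        subtasks = decompose(task) if depth < max_depth else []
        if subtasks:
            # Push sub-tasks ahead of the remaining work (a sub-stack of sorts)
            for sub in reversed(subtasks):
                queue.appendleft((sub, depth + 1))
        else:
            execute(task)               # leaf task: actually do the work
```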
@Boostrix (Contributor, Author) commented Jul 3, 2023

The following blog article goes into detail about our lack of planning: https://lorenzopieri.com/autogpt_fix/

Subgoaling, the creation of intermediate tasks to achieve the user defined goals, is crucial to task-directed AIs such as AutoGPT. If even just a single key subgoal is missing, the whole plan fails. The current architecture is very primitive here: just ask the LLM to break down the goal into subtasks! There is something poetic about the simplicity of this approach, which is basically “to get the solution, just ask the LLM the solution”.

If you are familiar with the exploitation-exploration trade off, this is a case of going all-in on exploitation: the best plan should already be in the train data or very easy to extrapolate from it. This is rarely the case in interesting tasks, so we want to introduce the ability to explore different possible subgoals, the ability to plan. More on this later, now let’s focus on the creation of subgoals.

Breaking down a goal into subgoals is an active area of research, spanning approaches such as Creative Problem Solving, Hierarchical Reinforcement Learning and Goal Reasoning. For instance using Creative Problem Solving we can tackle situations where the initial concept space (the LLM knowledge base) is insufficient to lay down all the subgoals. We can use Boden’s three levels of creativity: 1. Exploration of the universal conceptual space. 2. Combining concepts within the initial conceptual space or 3. Applying functions or transformations to the initial conceptual space.

@github-actions bot commented Sep 6, 2023

This issue has automatically been marked as stale because it has not had any activity in the last 50 days. You can unstale it by commenting or removing the label. Otherwise, this issue will be closed in 10 days.

@github-actions bot added the Stale label Sep 6, 2023
@Pwuts added the meta label (Meta-issue about a topic that multiple issues already exist for) and removed the Stale label Sep 14, 2023
@Pwuts changed the title from “[WIP] Tasking and Planning” to “Basic Planning & Task Management” Mar 4, 2024
@Pwuts added the AI efficacy, function: planning, AutoGPT Agent and Roadmapped (Issues that were spawned by roadmap items) labels Mar 12, 2024
@Pwuts changed the title from “Basic Planning & Task Management” to “Implement structured planning & evaluation” Mar 12, 2024