Implement structured planning & evaluation #4107

Open · Boostrix opened this issue May 11, 2023 · 4 comments
Labels: AI efficacy · AutoGPT Agent · function: planning · meta (Meta-issue about a topic that multiple issues already exist for) · re-arch · Roadmapped (Issues that were spawned by roadmap items)

@Boostrix (Contributor) commented May 11, 2023

Discussion

Before we can start managing the workflow of the agent, we have to give it some more structure. The different ways to implement this can generally be subdivided into 3 groups:

💬 Your input is appreciated!
If you think something important is missing from this comparison, please leave a comment below or bring it up on Discord!

  1. Planning mechanism that is controlled by the agent

    • a. Let the agent manage its plan and to-do list through commands

      This approach seems non-ideal for a couple of reasons:

      • Adds commands to the executive prompt, increasing its complexity
      • Requires a full step for any self-management action → adds significant overhead
    • b. Add (an) output field(s) to the executive prompt through which the agent can manage its plan / to-do list

      More efficient than 1a, but unsure how well this will work:

      • Does not require extra steps just for self-management → limited additional overhead
      • This adds yet another function to the already complex prompt. Multi-purpose prompts are usually outperformed by a combination of multiple single-purpose prompts.
  2. Planning mechanism that is part of the agent loop

    • a. Extra step in the agent’s loop to evaluate progress and update the plan / to-do list.

      Good separation of concerns, but adding a sequential LLM-powered step to a loop will slow down the process.

      • Additional step = additional latency in every cycle
      • Separate unit with its own (focused) prompt → probably easier to tune compared to 1a/1b
    • b. Parallel thread that evaluates the agent’s performance and intervenes / gives feedback when needed.

      A compromise, trading sequential (= simple to implement) evaluation for zero additional latency:

      • Separate unit with its own (focused) prompt → probably easier to tune compared to 1a/1b
      • Asynchronous/parallel evaluation
        • zero/low additional latency
        • eval for step i will arrive after executing step i+1 → “wasted” resources+time on cycle i+1 if evaluator decides to intervene
    • c. Add output fields to the executive prompt through which the agent communicates its qualitative & quantitative assessment of the previous step.

      Quantitative output can be used directly to trigger an evaluation/intervention as in 2a, saving resources when this isn’t needed

    • Possible addition to a. and b.: rewind the agent’s state to an earlier point in time and adjust its parameters if an approach proves unproductive.
       

    💡 Note
    A component that evaluates the agent’s performance and provides it with feedback or other input is also in a good position to manage its directives.

  3. Planning mechanism that controls the agent

    • a. Compose plan consisting of subtasks & employ the agent to complete subtasks

      • i. Contextless (planning without knowledge of the agent’s abilities)
      • ii. Contextful (planning with knowledge of the agent’s abilities)
         

      Implementing planning outside the agent offers interesting possibilities, and may add value regardless of the agent’s own planning capabilities and mechanism. This could be a logical next step towards multi-agent orchestration and/or tackling problems of arbitrary complexity.
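To illustrate what option [3a] could look like in practice, here is a minimal sketch of an external planner that decomposes the objective and drives the agent through the resulting subtasks. The names `plan`, `run_plan`, `llm_complete` and `run_agent` are hypothetical, not part of the current codebase; the sketch only shows the control flow, assuming the agent’s task can be swapped for the current subtask.

```python
# Hypothetical sketch of option 3a: a planner outside the agent that controls it.
# `llm_complete` and `run_agent` stand in for whatever LLM client / agent runner
# is actually available; they are not real AutoGPT APIs.
from typing import Callable

def plan(
    objective: str,
    llm_complete: Callable[[str], str],
    abilities: list[str] | None = None,
) -> list[str]:
    """Ask the LLM for an ordered list of subtasks (3a-i contextless, 3a-ii contextful)."""
    context = ""
    if abilities:
        # 3a-ii: give the planner knowledge of the agent's abilities
        context = "The agent can use these commands: " + ", ".join(abilities) + "\n"
    prompt = (
        f"{context}Break the following objective into a short, ordered list of "
        f"subtasks, one per line:\n{objective}"
    )
    return [
        line.strip("- ").strip()
        for line in llm_complete(prompt).splitlines()
        if line.strip()
    ]

def run_plan(
    objective: str,
    llm_complete: Callable[[str], str],
    run_agent: Callable[[str], str],
    abilities: list[str],
) -> list[str]:
    """Execute each subtask by handing it to the agent as its current task."""
    results = []
    for subtask in plan(objective, llm_complete, abilities=abilities):
        # e.g. replace the agent's task in its prompt with the current subtask
        results.append(run_agent(subtask))
    return results
```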

Proposal

We propose a combination of solutions [2a] and [2c] based on the comparison above.

Why

  • [2a] allows for a relatively simple implementation
  • The additional latency of [2a] can be mitigated by using faster models. With narrowly scoped prompts, this shouldn’t be a problem.
  • The average additional latency of [2a] is reduced by combining it with [2c], only running the evaluation/planning step when necessary
  • [2c] doesn’t add much complexity to the existing executive prompt
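To make the proposed [2a] + [2c] combination more concrete, below is a minimal sketch of what the amended agent loop could look like. The `approach_working` / `is_last_step` fields and the `Plan` structure are assumptions for illustration only; the actual response schema and loop shape are part of the to-do list below.

```python
# Minimal sketch of the proposed loop: the executive step returns its usual
# action plus self-assessment fields ([2c]); a separate planning/evaluation
# step ([2a]) runs only when those fields indicate it is needed.
# All callables and field names here are hypothetical, not existing AutoGPT APIs.
from dataclasses import dataclass

@dataclass
class Plan:
    subtasks: list[str]
    current: int = 0

def agent_loop(task: str, executive_step, planning_step, execute, max_cycles: int = 50):
    plan = planning_step(task, plan=None, feedback=None)  # initial plan
    for _ in range(max_cycles):
        # Executive prompt returns the next action AND its self-assessment ([2c])
        thoughts, action = executive_step(task, plan)
        result = execute(action)

        if not thoughts.get("approach_working", True):
            # Self-assessment says the approach isn't working -> re-plan ([2a])
            plan = planning_step(task, plan=plan, feedback=result)
        elif thoughts.get("is_last_step", False):
            # Verify and move on to the next subtask
            plan.current += 1
            if plan.current >= len(plan.subtasks):
                return result  # objective complete
    return None
```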

To do

  1. Work out:

    1. Once we have a list of subtasks, how do we make the agent execute it?
      e.g. do we replace the agent’s task in the prompt with the current subtask?
    2. Should subtasks be specified like actionables or like sub-objectives, or both? Does it make a difference?
  2. Sketch amended agent loop diagram

  3. Prototype planning+evaluation prompt(s) (in a Jupyter notebook)

  4. Implement [2a]

  5. Work out:

    1. What are useful indicators for how a task is progressing?
      • "Is your approach to the problem working out?"

        → if not, evaluate & adjust plan

      • "Is this the last step needed to complete the [sub]task?"

        → if so, verify & move on to next subtask

  6. Implement [2c]

    1. Add thought outputs corresponding to the indicators from [5.i] to OneShotPromptStrategy (AKA the executive prompt)
    2. Implement mechanism to only run planning/evaluation when needed
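For items 5 and 6, one possible shape for the added thought outputs and the gating mechanism is sketched below. The field names are hypothetical and are not the actual output schema of OneShotPromptStrategy.

```python
# Hypothetical additions to the executive response schema ([2c]) and the gate
# that decides whether to run the separate planning/evaluation step ([2a]).
# These field names are illustrative; the real schema lives in OneShotPromptStrategy.
ADDED_THOUGHT_FIELDS = {
    "approach_working": "boolean: is your approach to the problem working out?",
    "is_last_step": "boolean: is this the last step needed to complete the [sub]task?",
}

def needs_planning_step(thoughts: dict) -> bool:
    """Run planning/evaluation only when the executive prompt's self-assessment
    indicates a re-plan (approach not working) or a subtask transition (last step)."""
    approach_working = thoughts.get("approach_working", True)
    is_last_step = thoughts.get("is_last_step", False)
    return (not approach_working) or is_last_step
```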

Related

Earlier issues regarding improvements to tasking/planning:

This issue has been hijacked by a maintainer to replace and expand the solution proposal. The original proposal by @Boostrix is preserved below.

 
BIFs are built-in functions (a.k.a. commands). They are weighted by a chat_with_ai() instance according to the task at hand and their utility, and then recursively evaluated by adding them to a queue: #3933 (comment)
[attached images: task-planning, task2]

Motivation 🔦

At the very least, even without getting fancy, we should be able to get the LLM to provide a list of tasks and move those into a queue, then recursively call the LLM to break each task into sub-tasks and work through them, pushing and popping as needed (possibly involving sub-stacks, which becomes a logical step once you think about multiple agents: #3549). A minimal sketch follows the constraints list below.

This would need to take place with certain constraints:

  • sequentially (for now)
  • requirements
  • constraints
  • mutual dependencies
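Here is a minimal sketch of that queue-based decomposition, assuming hypothetical `decompose` and `execute` callables; the depth limit and sequential processing stand in for the constraints listed above.

```python
# Sketch of the recursive push/pop task handling from the original proposal:
# ask the LLM for tasks, queue them, and recursively expand tasks into
# sub-tasks up to a depth limit. `decompose` and `execute` are hypothetical.
from collections import deque

def work_queue(goal: str, decompose, execute, max_depth: int = 3):
    queue = deque([(goal, 0)])          # (task, depth), processed sequentially for now
    while queue:
        task, depth = queue.popleft()
        subtasks = decompose(task) if depth < max_depth else []
        if subtasks:
            # Push sub-tasks ahead of the remaining work (a sub-stack of sorts)
            for sub in reversed(subtasks):
                queue.appendleft((sub, depth + 1))
        else:
            execute(task)               # leaf task: actually do the work
```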
@Boostrix (Contributor, Author) commented Jul 3, 2023

The following blog article goes into detail about our lack of planning: https://lorenzopieri.com/autogpt_fix/

Subgoaling, the creation of intermediate tasks to achieve the user defined goals, is crucial to task-directed AIs such as AutoGPT. If even just a single key subgoal is missing, the whole plan fails. The current architecture is very primitive here: just ask the LLM to break down the goal into subtasks! There is something poetic about the simplicity of this approach, which is basically “to get the solution, just ask the LLM the solution”.

If you are familiar with the exploitation-exploration trade off, this is a case of going all-in on exploitation: the best plan should already be in the train data or very easy to extrapolate from it. This is rarely the case in interesting tasks, so we want to introduce the ability to explore different possible subgoals, the ability to plan. More on this later, now let’s focus on the creation of subgoals.

Breaking down a goal into subgoals is an active area of research, spanning approaches such as Creative Problem Solving, Hierarchical Reinforcement Learning and Goal Reasoning. For instance using Creative Problem Solving we can tackle situations where the initial concept space (the LLM knowledge base) is insufficient to lay down all the subgoals. We can use Boden’s three levels of creativity: 1. Exploration of the universal conceptual space. 2. Combining concepts within the initial conceptual space or 3. Applying functions or transformations to the initial conceptual space.

@github-actions bot commented Sep 6, 2023

This issue has automatically been marked as stale because it has not had any activity in the last 50 days. You can unstale it by commenting or removing the label. Otherwise, this issue will be closed in 10 days.

@github-actions bot added the Stale label Sep 6, 2023
@Pwuts added the meta label (Meta-issue about a topic that multiple issues already exist for) and removed the Stale label Sep 14, 2023
@Pwuts changed the title from “[WIP] Tasking and Planning” to “Basic Planning & Task Management” Mar 4, 2024
@Pwuts added the AI efficacy, function: planning, AutoGPT Agent and Roadmapped (Issues that were spawned by roadmap items) labels Mar 12, 2024
@Pwuts changed the title from “Basic Planning & Task Management” to “Implement structured planning & evaluation” Mar 12, 2024