CodeActAgent to delegate to BrowsingAgent #2103

li-boxuan · 2024-05-28T07:58:29Z

Let CodeActAgent delegate to BrowsingAgent for browsing tasks. With this PR, CodeActAgent can also pass test_browse_internet, which it couldn't before because that test requires locating a button and clicking on it.
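Here is a minimal, self-contained sketch of delegation as this PR describes it. The parent emits a delegate action that names the child agent and its task. The child runs to completion, and only its finish outputs come back to the parent as an observation. The dataclasses and the `run_delegate` helper are simplified, hypothetical stand-ins for the concepts discussed in this thread. They are not OpenDevin's actual classes or signatures.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class DelegateAction:
    """Hypothetical stand-in: which agent to delegate to, and with what inputs."""
    agent: str
    inputs: dict


@dataclass
class FinishAction:
    """Hypothetical stand-in for a finish action: only `outputs` reaches the parent."""
    outputs: dict = field(default_factory=dict)


@dataclass
class DelegateObservation:
    """Hypothetical stand-in for what the parent observes after the child finishes."""
    outputs: dict


def run_delegate(action: DelegateAction,
                 registry: dict[str, Callable[[dict], FinishAction]]) -> DelegateObservation:
    # Look up the child agent by name and run it to completion on the task.
    child = registry[action.agent]
    finish = child(action.inputs)
    # Only the finish outputs are forwarded; the child's history stays behind.
    return DelegateObservation(outputs=finish.outputs)


def fake_browsing_agent(inputs: dict) -> FinishAction:
    # Toy child agent: pretends to browse and reports back a final message.
    return FinishAction(outputs={'content': f"Done: {inputs['task']}"})


obs = run_delegate(DelegateAction(agent='BrowsingAgent', inputs={'task': 'click the button'}),
                   {'BrowsingAgent': fake_browsing_agent})
print(obs.outputs['content'])  # Done: click the button
```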

Closes #1945

enyst

Nice! The big test for delegates? 😅

Exciting stuff. At least in theory, delegating should work better.

assertion · 2024-05-28T12:39:10Z

👍👍👍👍 I've been waiting for this feature for a long time. First, I want to confirm: does this mean CodeActAgent should no longer launch a browser env when running, and only BrowsingAgent would do so once all browse actions are switched to this delegate-agent mode? @li-boxuan

frankxu2004

This is awesome! It's clean! I have a general question: what does the calling agent obtain from the subtask agent's execution? Is it just the final text message from the subtask agent (e.g., the browsing agent in this case), or should the calling agent also have access to more detailed information (e.g., HTML observations from the subtask) as part of its final observation?

agenthub/codeact_agent/codeact_agent.py

frankxu2004 · 2024-05-28T21:02:20Z

agenthub/browsing_agent/browsing_agent.py

@@ -101,7 +103,7 @@ def step(self, state: State) -> Action:
                isinstance(prev_action, MessageAction) and prev_action.source != 'user'
            ):
                # agent has responded, task finish.
-                return AgentFinishAction()
+                return AgentFinishAction(outputs={'content': prev_action.content})
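The rule in this diff can be shown on its own. The sketch below uses simplified, hypothetical stand-ins for `MessageAction` and `AgentFinishAction` (not the real OpenDevin classes) and a made-up `maybe_finish` helper. If the previous action is a message that did not come from the user, the agent finishes and forwards that message's content as its outputs.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class MessageAction:
    """Hypothetical stand-in: a chat message and who sent it."""
    content: str
    source: str  # 'user' or 'agent'


@dataclass
class AgentFinishAction:
    """Hypothetical stand-in: outputs are what the delegator will receive."""
    outputs: dict = field(default_factory=dict)


def maybe_finish(prev_action: Optional[object]) -> Optional[AgentFinishAction]:
    # The agent itself has just responded, so the task is finished: forward that
    # response so the delegating agent receives it as the delegate's output.
    if isinstance(prev_action, MessageAction) and prev_action.source != 'user':
        return AgentFinishAction(outputs={'content': prev_action.content})
    # Otherwise there is nothing to finish yet; the caller picks the next action.
    return None


print(maybe_finish(MessageAction('Here is what I found.', source='agent')))
# AgentFinishAction(outputs={'content': 'Here is what I found.'})
print(maybe_finish(MessageAction('Find the answer', source='user')))
# None
```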


Less of a review, more of a question:

Just curious: is this AgentFinishAction's outputs becoming the AgentDelegationObservation for the calling agent? What can the calling agent see after the subtask has finished: just this LLM-generated finishing message, or also the history observations, etc.? I ask because the calling agent might want to browse some websites and expect HTML as the subtask's result, but in this case it would get an LLM summary instead, right?

is this AgentFinishAction's outputs becoming the AgentDelegationObservation for the calling agent?

Yep. It doesn't contain the history. I tried including it initially, but it didn't work well: CodeActAgent couldn't pass the test_browse_internet test. CodeActAgent likely got confused by the history because it didn't know about the browsing commands issued by BrowsingAgent.

We discussed this topic in #1910 (comment) but didn't reach a conclusion, so this might change.

I believe there are pros and cons to the parent agent knowing about the child agent's history. For this delegation in particular, though, I don't see much benefit in CodeActAgent knowing about BrowsingAgent's history.

Got it. I agree that for now we should keep it this way for simplicity and trust the child agent's ability to summarize enough useful information.

As for the use case, I could imagine the parent wanting the final browser state, i.e., the raw contents of the final observation (such as HTML or the axtree) after a series of actions. In that case it might be useful to give the parent agent that final observation instead of only letting the LLM summarize a message back to it. Of course, this has pros and cons, but it's one possible scenario where it could be needed.
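One way to support this scenario without changing the default is to let the child optionally attach its final raw observation to the finish outputs. The sketch below is hypothetical. `build_finish_outputs` and the `final_observation` and `axtree` keys are illustrative names, not part of OpenDevin.

```python
from typing import Optional


def build_finish_outputs(summary: str,
                         last_obs: Optional[dict] = None,
                         include_raw: bool = False) -> dict:
    """Hypothetical helper that assembles a delegate's finish outputs."""
    # Default: only the child's LLM-written summary, as in this PR.
    outputs = {'content': summary}
    if include_raw and last_obs is not None:
        # Opt-in: also hand back the final raw browser state (keys are illustrative).
        outputs['final_observation'] = dict(last_obs)
    return outputs


final = {'url': '<url>', 'axtree': "[10] button 'Click me'"}
print(build_finish_outputs('Clicked the button.'))
print(build_finish_outputs('Clicked the button.', final, include_raw=True))
```

With `include_raw` off by default, the parent still gets a summary only. A caller that needs the raw page can opt in.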

Feel free to merge this! This design choice should probably be another issue.

Ideally, we could save the state (e.g., the full history) of the child agent (i.e., the browsing agent) and keep it around, so the parent delegator can ask follow-up questions if needed.

Something like this from the delegator:

# In[1]
browser_agent = BrowserAgent()
response = browser_agent.run("Search google and tell me about OpenDevin")
print(response)
# Out[1]
# OpenDevin is a platform for autonomous software engineers, powered by AI and LLMs. Here are some key points about it:
# Repository Name: OpenDevin/OpenDevin
# Description: OpenDevin: Code Less, Make More
# Purpose: A platform for autonomous software engineers, powered by AI and LLMs.
# Key Features: OpenDevin agents collaborate with human developers to write code, fix bugs, and ship features. The platform runs inside a Docker container.
# Getting Started: Requires the most recent version of Docker (26.0.0). Works on Linux, Mac OS, or WSL on Windows. Commands to start the app are provided.
# ...

# In[2]
response = browser_agent.run("Can you tell me the actual command I need to use to start the APP")
# The answer to this question should actually lie in the sub-task agent's observation history
print(response)
# Out[2]
# Oh sure, it is XXX
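The follow-up pattern above can be sketched as a wrapper that keeps the child's history between calls. Everything here is hypothetical: `PersistentDelegate` and the `toy_browser` step function are illustrative stand-ins. The toy agent answers follow-ups by searching its retained observations rather than calling an LLM.

```python
from typing import Callable


class PersistentDelegate:
    """Hypothetical wrapper that keeps a child agent's history across calls."""

    def __init__(self, step_fn: Callable[[list[str], str], tuple[list[str], str]]):
        # step_fn(history, task) -> (new_events, reply); stands in for the child's agent loop.
        self.step_fn = step_fn
        self.history: list[str] = []

    def run(self, task: str) -> str:
        new_events, reply = self.step_fn(self.history, task)
        self.history.extend(new_events)  # retained so later questions can use it
        return reply


def toy_browser(history: list[str], task: str) -> tuple[list[str], str]:
    # First call: "browse" and record raw observations, but reply with a summary.
    if not history:
        events = ['obs: OpenDevin README', 'obs: Getting Started: <start command here>']
        return events, 'OpenDevin is a platform for autonomous software engineers.'
    # Follow-up: answer from the retained observation history.
    hits = [e for e in history if 'Getting Started' in e]
    return [], hits[-1] if hits else 'Not found in history.'


browser_agent = PersistentDelegate(toy_browser)
print(browser_agent.run('Search google and tell me about OpenDevin'))
print(browser_agent.run('What command do I use to start the app?'))
```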

Yeah, the more I think about it, the more I feel we should (again) include the child's history. We do need some prompting so that the parent agent doesn't get confused.
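Here is a rough sketch of the kind of prompting meant here. Before the child's events go into the parent's context, wrap them in explicit markers saying another agent produced them. The function name and the marker wording are assumptions for illustration, not prompts used in OpenDevin.

```python
def render_child_history(child: str, events: list[str], summary: str) -> str:
    """Hypothetical: format a delegate's events for the parent agent's prompt."""
    # Tell the parent up front that these steps were not its own, so it is not
    # confused by commands it never issued.
    lines = [f"[BEGIN steps executed by delegate {child}; these are not your actions]"]
    lines += [f"{child}: {event}" for event in events]
    lines.append(f"[END {child} steps; final answer: {summary}]")
    return "\n".join(lines)


print(render_child_history('BrowsingAgent',
                           ["goto('<url>')", "click('10')"],
                           'Here is what I found.'))
```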

Maybe we can use #2021 to condense actions from the child browsing agent?

Yes, it's almost ready for it. It currently identifies the 'chunks' of child events, in order to (insert here: exclude them from parent stream/summarize them in parent stream).
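The chunking idea can be sketched with a hypothetical event stream of `(kind, text)` pairs. In this stream, a `'delegate'` event opens a child chunk and a matching `'delegate_done'` event closes it. Passing no summarizer excludes the child events from the parent stream. Passing one replaces each chunk with a single summary event. This illustrates the approach only and is not the code in #2021.

```python
from typing import Callable, Optional

Event = tuple[str, str]  # (kind, text); a hypothetical, simplified event shape


def condense(events: list[Event],
             summarize: Optional[Callable[[list[Event]], str]] = None) -> list[Event]:
    """Drop or summarize the child's events between 'delegate' and 'delegate_done'."""
    out: list[Event] = []
    chunk: list[Event] = []
    depth = 0  # nesting depth, so a child that itself delegates stays in one chunk
    for kind, text in events:
        if kind == 'delegate':
            depth += 1
            if depth == 1:
                out.append((kind, text))  # keep the parent's delegation request
                continue
        elif kind == 'delegate_done' and depth > 0:
            depth -= 1
            if depth == 0:
                if summarize is not None:
                    out.append(('summary', summarize(chunk)))
                out.append((kind, text))  # keep the result handed back to the parent
                chunk = []
                continue
        if depth > 0:
            chunk.append((kind, text))  # inside a child chunk
        else:
            out.append((kind, text))  # ordinary parent event
    return out


events = [('message', 'find the answer'), ('delegate', 'BrowsingAgent'),
          ('browse', "click('10')"), ('observation', 'axtree ...'),
          ('delegate_done', 'Here is what I found.'), ('message', 'Done')]
print(condense(events))
print(condense(events, summarize=lambda c: f'{len(c)} child events hidden'))
```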

li-boxuan · 2024-05-29T03:02:40Z

if this means when running CodeActAgent should not launch a browser env?

@assertion No, it has nothing to do with the browser env. At the moment, a browser env is always launched regardless of the agent.

assertion · 2024-05-29T03:14:33Z

No it has nothing to do with browser env. At the moment, a browser env is always launched regardless of the agent.

Yes, that's true right now. What I mean is: if browse actions are delegated to BrowsingAgent after this PR, starting a browserEnv for other agents like CodeActAgent doesn't make sense.

This is actually an optimization: it would reduce the number of processes started while the agent is running.

li-boxuan · 2024-05-29T03:14:36Z

I mean if browse action delegated to BrowsingAgent, starting a browserEnv for other agents like CodeActAgent is not reasonable.

That's a good point. We could probably refactor how the runtime and agents interact. ~~A similar problem is AgentSkills are always installed even though not every agent uses it.~~ Let me create a thread on Slack so that we can discuss this more.

UPDATE: I quickly checked the AgentSkills code, and it seems it's only installed for CodeActAgent.
UPDATE2: Created https://opendevin.slack.com/archives/C06QKSD9UBA/p1716952778225289

frankxu2004 · 2024-05-29T04:04:23Z

I mean if browse action delegated to BrowsingAgent, starting a browserEnv for other agents like CodeActAgent is not reasonable.


Love this; making it optional can save a lot of resources.
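Making the browser env optional could take the form of lazy initialization, where the browser starts only the first time something asks for it. The `LazyBrowserEnv` class and its `factory` argument below are hypothetical and not OpenDevin's runtime API. The factory stands in for whatever launches the real browser process.

```python
import threading
from typing import Callable, Optional


class LazyBrowserEnv:
    """Hypothetical wrapper: start the browser env only on first use."""

    def __init__(self, factory: Callable[[], object]):
        self._factory = factory  # launches the real (expensive) browser env
        self._env: Optional[object] = None
        self._lock = threading.Lock()

    @property
    def started(self) -> bool:
        return self._env is not None

    def get(self) -> object:
        # Double-checked locking: concurrent callers launch at most one env.
        if self._env is None:
            with self._lock:
                if self._env is None:
                    self._env = self._factory()
        return self._env


launches = []


def fake_launch() -> str:
    launches.append('launched')  # record each time the "browser" starts
    return 'browser-env'


env = LazyBrowserEnv(fake_launch)
print(env.started)           # False: an agent that never browses never starts it
print(env.get(), env.get())  # browser-env browser-env
print(len(launches))         # 1
```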


xingyaoww

LGTM! And this is exciting! Tried a random query and it worked pretty well!

One minor bug: the screenshot is NOT displayed while the agent is predicting the next action (cc @frankxu2004, any ideas?)

Another minor issue: when it finishes, the frontend repeats the message (once from the Browsing Agent, once from the delegator), and there's even a small empty box in the middle.

neubig · 2024-06-01T15:43:09Z

This is exciting!

li-boxuan · 2024-06-02T22:27:46Z

I am putting this PR on hold so that it doesn't affect current evaluations.

li-boxuan · 2024-06-07T07:53:07Z

Discussed with @xingyaoww offline and we will put the second issue on hold:

when it finishes, the frontend will repeat the message (one from the Browsing Agent, one from the delegator), It even got a little empty box in the middle

I don't have a good way to solve it right now. If the browsing action is just a subtask of a larger original task, the user won't notice any redundancy. But if the user simply gives a browsing task, they would indeed feel the agent is repeating itself. We might need to think through the interaction design more carefully so we don't confuse users.
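For illustration only, one naive frontend-side mitigation would skip empty messages and collapse any message that repeats the text of the one right before it. The PR does not do this, since the issue was put on hold. The message dictionaries with `source` and `content` keys are an assumed shape.

```python
def messages_to_render(messages: list[dict]) -> list[dict]:
    """Hypothetical: messages are dicts with 'source' and 'content', in display order."""
    rendered: list[dict] = []
    for msg in messages:
        text = msg['content'].strip()
        if not text:
            continue  # don't draw an empty chat bubble
        if rendered and rendered[-1]['content'].strip() == text:
            continue  # same text as the previous bubble, e.g. delegator echoing the delegate
        rendered.append(msg)
    return rendered


chat = [
    {'source': 'BrowsingAgent', 'content': 'Here is what I found.'},
    {'source': 'CodeActAgent', 'content': ''},
    {'source': 'CodeActAgent', 'content': 'Here is what I found.'},
]
print([m['source'] for m in messages_to_render(chat)])  # ['BrowsingAgent']
```

This only catches verbatim repeats. If the delegator paraphrases the delegate's answer, the message still appears twice, which is why a better interaction design is still needed.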

tobitege · 2024-06-07T10:55:56Z

Hello @li-boxuan, this looks like it's currently causing integration tests to fail:

https://github.com/OpenDevin/OpenDevin/blame/45ce09d70ec9e46ded03df3b168f684e030c6dee/tests/integration/mock/CodeActAgent/test_browse_internet/prompt_003.log#L121

Diff:
--- /home/runner/work/OpenDevin/OpenDevin/tests/integration/mock/CodeActAgent/test_browse_internet/prompt_003.log	2024-06-07 09:06:47.574055654 +0000
+++ /tmp/tmpezug6yc1	2024-06-07 09:17:54.012024643 +0000
@@ -118,14 +118,13 @@
 	[8] heading 'The Ultimate Answer'
 	[9] paragraph ''
 		StaticText 'Click the button to reveal the answer to life, the universe, and everything.'
-	[10] button 'Click me'
+	[10] button 'Click me', clickable

Should be fixed with #2307
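The Diff above is a standard unified diff between the stored mock prompt and the prompt generated during the test run. The sketch below shows one way to produce such a comparison with Python's `difflib`, assuming both prompts are available as strings. The `prompt_diff` helper and its default file labels are illustrative.

```python
import difflib


def prompt_diff(expected: str, actual: str,
                expected_name: str = 'expected', actual_name: str = 'actual') -> str:
    # Unified diff in the same format as the CI log; an empty string means the prompts match.
    lines = difflib.unified_diff(
        expected.splitlines(keepends=True),
        actual.splitlines(keepends=True),
        fromfile=expected_name,
        tofile=actual_name,
    )
    return ''.join(lines)


expected = "\t[9] paragraph ''\n\t[10] button 'Click me'\n"
actual = "\t[9] paragraph ''\n\t[10] button 'Click me', clickable\n"
print(prompt_diff(expected, actual))
```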

li-boxuan force-pushed the codeact/delegate-to-browser branch from 6f83748 to 50a3225 on May 28, 2024 08:00

li-boxuan mentioned this pull request May 28, 2024: Refactor agent delegation and tweak micro agents #1910 (Merged)

li-boxuan force-pushed the codeact/delegate-to-browser branch from 50a3225 to 1881874 on May 28, 2024 08:14

enyst approved these changes May 28, 2024


li-boxuan requested review from frankxu2004 and rbren May 28, 2024 18:25

enyst mentioned this pull request May 28, 2024: Add CodeActSWEAgent to remove browsing & github + improvements on agentskills #2105 (Merged)

frankxu2004 reviewed May 28, 2024


Commit 6b7d114: CodeActAgent: Delegate to BrowsingAgent for browsing tasks

li-boxuan force-pushed the codeact/delegate-to-browser branch from 1881874 to 6b7d114 on May 29, 2024 03:06

yufansong approved these changes May 29, 2024


li-boxuan mentioned this pull request May 29, 2024: Lazy launching BrowseEnv / making BrowseEnv optional #2120 (Closed)

Commit fa0a3b2: Merge remote-tracking branch 'upstream/main' into codeact/delegate-to-browser

li-boxuan changed the title ~~CodeactAgent to delegate to BrowsingAgent~~ CodeActAgent to delegate to BrowsingAgent May 30, 2024

xingyaoww reviewed May 30, 2024


li-boxuan self-assigned this May 31, 2024

li-boxuan marked this pull request as draft June 2, 2024 22:27

li-boxuan marked this pull request as ready for review June 7, 2024 07:53

li-boxuan merged commit 45ce09d into OpenDevin:main Jun 7, 2024
18 checks passed

li-boxuan deleted the codeact/delegate-to-browser branch June 7, 2024 07:53

This was referenced Jun 7, 2024:

Better UI for agent delegation #2309 (Open)

[Bug]: Web page is not displayed when browsing agent predicts next action #2310 (Closed)

tobitege mentioned this pull request Jun 7, 2024: tests: more Agentskills tests; updated .gitignore #2307 (Merged)

yufansong mentioned this pull request Jun 7, 2024: Fix failed test_browse_internet CodeActAgent integration prompts #2318 (Merged)

li-boxuan mentioned this pull request Jun 8, 2024: CodeActAgent: Only delegate to BrowsingAgent as last resort #2326 (Closed)


li-boxuan commented May 28, 2024 (edited)
enyst left a comment
assertion commented May 28, 2024
frankxu2004 left a comment
frankxu2004 May 28, 2024
li-boxuan May 29, 2024
frankxu2004 May 29, 2024
xingyaoww May 30, 2024
li-boxuan Jun 7, 2024
xingyaoww Jun 7, 2024
enyst Jun 7, 2024
li-boxuan commented May 29, 2024
assertion commented May 29, 2024
li-boxuan commented May 29, 2024 (edited)
frankxu2004 commented May 29, 2024
xingyaoww left a comment (edited)
neubig commented Jun 1, 2024
li-boxuan commented Jun 2, 2024
li-boxuan commented Jun 7, 2024
tobitege commented Jun 7, 2024 (edited)
