
fix(backend) changes to improve Command-R+ behavior, plus file i/o error improvements. #1347

Merged — 14 commits merged into OpenDevin:main on Apr 27, 2024

Conversation

computer-whisperer (Contributor):

With these changes, Command-R+ runs alright. The prompt changes should also help other LLMs.

…sandbox when resolving paths in fileio operations, add customizable timeout for bash commands, mention said timeout in llm prompt.
computer-whisperer (Contributor, Author):

The absolute-path changes have been removed; instead, the current sandbox path is fetched when resolving paths for fileio actions. I also added a "SANDBOX_TIMEOUT" configuration that is shared with the LLM in its prompt, so the agent expects commands to time out if they run for too long.
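
As a rough illustration of the two ideas described above — resolving fileio paths against the sandbox's working directory and exposing a timeout setting — here is a minimal sketch. The helper name resolve_path and the default of 120 seconds are assumptions for illustration, not the PR's actual code; only the SANDBOX_TIMEOUT name comes from the discussion.

    import os
    from pathlib import Path

    # SANDBOX_TIMEOUT matches the config name mentioned above; the default value is an assumption.
    SANDBOX_TIMEOUT = int(os.environ.get("SANDBOX_TIMEOUT", "120"))  # seconds before a bash command is cut off

    def resolve_path(file_path: str, sandbox_cwd: str) -> Path:
        # Hypothetical helper: resolve a fileio path against the sandbox's current
        # working directory instead of assuming the path is absolute.
        path = Path(file_path)
        if not path.is_absolute():
            path = Path(sandbox_cwd) / path
        return path.resolve()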

…ng to delete it from the response afterwards, fixed get_working_directory for ssh_box.
@@ -169,7 +169,7 @@ def setup_user(self):

     def start_ssh_session(self):
         # start ssh session at the background
-        self.ssh = pxssh.pxssh()
+        self.ssh = pxssh.pxssh(echo=False)
Collaborator:

@xingyaoww wdyt, does this fix the "weird behavior" you saw?

xingyaoww (Collaborator), Apr 26, 2024:

I think this just prevents the command from being returned. It might break the way ssh_box.sh currently finds the response.

The issue I was seeing is that the first time you call .ssh.before, the return value might not be complete: you only get part of the output. Then in the next round, when you issue echo $? to get the exit code, you get the remaining output of the previous command along with the exit code, which is highly undesirable.

computer-whisperer (Contributor, Author):

This PR also dramatically simplifies the response-finding logic: all it does is return the text found in .before once the next command-prompt match occurs.

Do you have an example situation that can cause the return data to extend past the next command-prompt match? I ran some unit tests locally and couldn't easily come up with such a scenario.
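
For context, a minimal sketch of the flow being described, using pexpect's pxssh API (the login details are placeholders and this is not the PR's actual implementation):

    from pexpect import pxssh

    ssh = pxssh.pxssh(echo=False, encoding="utf-8")  # echo=False: the command itself is not echoed back into the output
    ssh.login("localhost", "user", "password")       # placeholder credentials
    ssh.sendline("ls -la")
    ssh.prompt()           # block until the next shell prompt is matched
    output = ssh.before    # everything printed before that prompt, i.e. the command's output
    print(output)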

Collaborator:

That's why I called it "weird": it's not easily reproducible (sometimes it happens, sometimes it works) :(
But anyway, we can go with your current logic for now and try to fix it in the future if that issue comes up again.

…r-plus-changes

# Conflicts:
#	opendevin/action/fileop.py
computer-whisperer (Contributor, Author):

Looks like the integration test failure is due to the prompt messages changing.

li-boxuan (Collaborator):

@computer-whisperer Glad to see you fixed the integration tests after the prompt changes. Did you encounter any difficulty? Did the README help? Is there anything you think could be improved, including but not limited to the docs?

computer-whisperer (Contributor, Author):

It was definitely a pain to diagnose and fix as-is. I had to get the tests running locally with a debugger before I realized what was going wrong, and I had to manually edit all of the prompt_00x.log files to match the new prompt notes. At the very least, the error should be a lot more informative about why it can't find a prompt response.

I also don't like how making small changes to the agent prompt text can force you to hunt down and edit many different test-case .log files, especially if the system is intended to grow to many more test sequences. Maybe a script to regenerate them all using an LLM would help? I'm not sure.

li-boxuan (Collaborator), Apr 26, 2024:

@computer-whisperer Did you get a chance to read https://github.com/OpenDevin/OpenDevin/blob/main/tests/integration/README.md?

You should be able to do

poetry run python ./opendevin/main.py -i 10 -t "Write a shell script 'hello.sh' that prints 'hello'." -c "MonologueAgent" -d "./workspace"

and simply replace the test folder with the new logs generated under the logs folder. This process will, however, become more complicated when we have more tests.

It shouldn't take more than a few minutes to do so. Manually editing prompt_00x.log files is definitely a huge pain.

li-boxuan (Collaborator):

Initially I tried using a local vector DB to mock the LLM, so that tiny changes to prompts wouldn't require any changes to the test files. That didn't work well: the vector DB wasn't smart enough to retrieve the correct response.

I think going forward, when we have more tests, we need a script that runs poetry run python ./opendevin/main.py -i 10 -t "<task>" -c <agent> -d "./workspace" in a batch.
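
Something along these lines could serve as a starting point; the test list and the note about copying logs are assumptions, not an existing OpenDevin script:

    import subprocess

    # Hypothetical list of (task, agent) pairs; in practice these would come from
    # the integration test definitions.
    TESTS = [
        ("Write a shell script 'hello.sh' that prints 'hello'.", "MonologueAgent"),
    ]

    for task, agent in TESTS:
        subprocess.run(
            ["poetry", "run", "python", "./opendevin/main.py",
             "-i", "10", "-t", task, "-c", agent, "-d", "./workspace"],
            check=True,
        )
        # After each run, copy the freshly generated logs over the corresponding test folder.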

rbren (Collaborator), Apr 27, 2024:

@li-boxuan it would be awesome if you could run something like

REGENERATE_TEST_FILES=true poetry run python ./opendevin/main.py -i 10 -t "Write a shell script 'hello.sh' that prints 'hello'." -c "MonologueAgent" -d "./workspace"
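
One way such a flag could be honoured is in the mocked LLM layer: replay recorded responses normally, but call the real LLM and rewrite the fixtures when REGENERATE_TEST_FILES is set. The fixture path and helper below are illustrative assumptions, not OpenDevin's actual test layout:

    import json
    import os
    from pathlib import Path

    REGENERATE = os.environ.get("REGENERATE_TEST_FILES", "false").lower() == "true"
    FIXTURE_DIR = Path("tests/integration/mock")  # assumed location, for illustration only

    def mock_completion(index: int, prompt: str, real_llm) -> str:
        # Hypothetical helper: `real_llm` is a callable that queries the actual model.
        fixture = FIXTURE_DIR / f"response_{index:03d}.json"
        if REGENERATE:
            response = real_llm(prompt)
            fixture.parent.mkdir(parents=True, exist_ok=True)
            fixture.write_text(json.dumps({"prompt": prompt, "response": response}))
            return response
        return json.loads(fixture.read_text())["response"]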

rbren enabled auto-merge (squash) on Apr 27, 2024, 11:51
rbren merged commit 44aea95 into OpenDevin:main on Apr 27, 2024
7 of 14 checks passed