fix(backend) changes to improve Command-R+ behavior, plus file i/o error improvements. #1347
Conversation
…ous file IO errors, added timeout and max return token configurations for the LLM api.
…sandbox when resolving paths in fileio operations, add customizable timeout for bash commands, mention said timeout in llm prompt.
The absolute path changes have been removed; instead, the current sandbox path is fetched when calculating paths for fileio actions. I also added a "SANDBOX_TIMEOUT" configuration that is shared with the LLM in its prompt. This way the agent expects commands to time out if they execute for too long.
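The timeout-sharing idea can be sketched as follows. This is purely illustrative, not the actual OpenDevin config code: the function names, the default value, and the prompt wording are assumptions; only the `SANDBOX_TIMEOUT` setting name comes from the PR.

```python
import os

# Assumed default; the real default in the PR may differ.
DEFAULT_TIMEOUT = 120

def get_sandbox_timeout() -> int:
    """Read the SANDBOX_TIMEOUT setting (hypothetical helper)."""
    return int(os.environ.get("SANDBOX_TIMEOUT", str(DEFAULT_TIMEOUT)))

def timeout_prompt_hint(timeout_s: int) -> str:
    """Prompt fragment warning the agent that long commands are killed."""
    return (
        f"Note: shell commands are terminated after {timeout_s} seconds, "
        "so avoid long-running or blocking commands."
    )
```

Mentioning the timeout in the prompt means the agent can plan around it (e.g. backgrounding long jobs) instead of treating a killed command as an unexplained failure.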
…ng to delete it from the response afterwards, fixed get_working_directory for ssh_box.
@@ -169,7 +169,7 @@ def setup_user(self):

     def start_ssh_session(self):
         # start ssh session at the background
-        self.ssh = pxssh.pxssh()
+        self.ssh = pxssh.pxssh(echo=False)
@xingyaoww wdyt, does this fix the "weird behavior" you saw?
I think this just prevents the command from being returned; it might break the way ssh_box.sh currently finds the response.
The issue I was seeing is that the first time you call .ssh.before, the return value you get might not be complete: you only get part of the output. Then in the next round, when you issue echo $? to get the exit code, you get the remaining part of the previous command's output along with the exit code, which is highly undesirable.
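The failure mode described above can be sketched with a toy model. This is purely illustrative and not the pexpect/pxssh API: the class and method names are invented; it only models the remote output arriving in chunks.

```python
class FakeChannel:
    """Toy stand-in for the ssh channel: remote output arrives in chunks,
    so a single read may return only part of a command's output."""

    def __init__(self, chunks):
        self._chunks = list(chunks)

    def read_available(self) -> str:
        # Return whatever has arrived so far, one chunk per call.
        return self._chunks.pop(0) if self._chunks else ""

# The command's output is split across chunks; "0" is the exit code
# that a follow-up `echo $?` would print.
chan = FakeChannel(["partial out", "put\n", "0\n"])

first = chan.read_available()  # incomplete: only part of the output
# Reading again while fetching the exit code picks up the leftover
# "put\n" mixed in with "0\n" -- the "weird behavior" described above.
leftover_plus_exit = chan.read_available() + chan.read_available()
```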
This PR also dramatically simplifies the response-finding logic: all it does is return the text found in .before up to the next command-prompt match.
Do you have an example of a situation where the return data extends past the next command-prompt match? I ran some unit tests locally and couldn't easily come up with such a scenario.
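The simplified response-finding logic can be sketched as a pure function. This is a stand-in for what pexpect does (after `expect(prompt)` matches, `.before` holds the text preceding the match), not the actual ssh_box code; the function name is invented.

```python
def response_before_prompt(buffer: str, prompt: str) -> str:
    """Return everything in `buffer` before the next prompt match.

    If the prompt has not appeared yet, the whole buffer is still
    pending output of the running command.
    """
    idx = buffer.find(prompt)
    return buffer if idx == -1 else buffer[:idx]
```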
That's why I call it "weird": it is not easily reproducible (sometimes happens, sometimes works) :(
But anyway, we can go with your current logic for now and try to fix it in the future if that issue comes up again.
…r-plus-changes

# Conflicts:
#	opendevin/action/fileop.py
Looks like the integration test failure is due to the prompt messages changing.
@computer-whisperer Glad to see you fixed the integration tests after the prompt changes. Did you encounter any difficulty? Did the README doc help? Anything you think could be improved, including but not limited to the docs?
It was definitely a pain to diagnose and fix as-is. I had to get the tests running locally with a debugger before I realized what was going wrong, and I had to manually edit all of the prompt_00x.log files to match the new prompt notes. At the very least, I think the error should be much more informative about why it can't find a prompt response. I also don't like how small changes to the agent prompt text can force you to hunt down and edit many different test-case .log files, especially if you intend the system to be expanded with many more test sequences. Maybe a script to regenerate them all using an LLM would help? I'm not sure.
@computer-whisperer Did you get a chance to read https://github.com/OpenDevin/OpenDevin/blob/main/tests/integration/README.md? You should be able to run poetry run python ./opendevin/main.py -i 10 -t "Write a shell script 'hello.sh' that prints 'hello'." -c "MonologueAgent" -d "./workspace" and simply replace the test folder with the new logs generated under … It shouldn't take more than a few minutes to do so. Manually editing …
Initially I tried using a local vector DB to mock the LLM, so that tiny changes to prompts wouldn't require any changes to test files. That didn't work well: the vector DB wasn't smart enough to retrieve the correct response. I think going forward, when we have more tests, we need a script to run …
… command-r-plus-changes

# Conflicts:
#	agenthub/monologue_agent/utils/prompts.py
#	opendevin/llm/llm.py
#	opendevin/schema/config.py
#	tests/test_fileops.py
@li-boxuan it would be awesome if you could run something like …
With these changes, Command-R+ runs alright. The prompt changes should also help other LLMs.