Use LLM to analyze ML-Bench failure cases (#2399)
* add ml-bench w/o exec env
* fix typos (#1956)
no functional change
* Refactored Logs (#1939)
* [Feat] A competitive Web Browsing agent (#1856)
* initial attempt at a browsing only agent
* add browsing agent
* update
* implement agent
* update
* fix comments
* remove unnecessary things from memory extras
* update image processing
---------
Co-authored-by: Yufan Song <33971064+yufansong@users.noreply.github.com>
* Update README.md SWE-bench score (#1959)
* Update README.md SWE-bench score
Our most recent results on swe-bench lite are 25%, so this updates the README accordingly.
* Update
* fix: llm is_local function logic error (#1961)
Co-authored-by: மனோஜ்குமார் பழனிச்சாமி <smartmanoj42857@gmail.com>
* doc: update documentation about poetry update (#1962)
* add doc
* Update Development.md
---------
Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk>
* feat: add metrics related to cost for better observability (#1944)
* add metrics for total_cost
* make lint
* refact codeact
* change metrics into llm
* add costs list, add into state
* refactor log completion
* refactor and test others
* make lint
* Update opendevin/core/metrics.py
Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk>
* Update opendevin/llm/llm.py
Co-authored-by: Xingyao Wang <xingyao6@illinois.edu>
* refactor
* add code
---------
Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk>
Co-authored-by: Xingyao Wang <xingyao6@illinois.edu>
* doc: add more cmd in unit test documentation (#1963)
* --- (#1975)
updated-dependencies:
- dependency-name: boto3
  dependency-type: direct:production
  update-type: version-update:semver-patch
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* --- (#1976)
updated-dependencies:
- dependency-name: litellm
  dependency-type: direct:production
  update-type: version-update:semver-patch
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* Logging security (#1943)
* update .gitignore
* Rename the confusing 'INFO' style to 'DETAIL'
* override str and repr
* feat: api_key desensitize
* feat: add SensitiveDataFilter in file handler
* tweak regex, add tests
* more tweaks, include other attrs
* add env vars, those with equivalent config
* fix tests
* tests are invaluable
---------
Co-authored-by: Shimada666 <649940882@qq.com>
* --- (#1967)
updated-dependencies:
- dependency-name: react-dom
  dependency-type: direct:production
  update-type: version-update:semver-minor
- dependency-name: "@types/react-dom"
  dependency-type: direct:development
  update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* --- (#1968)
updated-dependencies:
- dependency-name: "@reduxjs/toolkit"
  dependency-type: direct:production
  update-type: version-update:semver-patch
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* --- (#1969)
updated-dependencies:
- dependency-name: husky
  dependency-type: direct:development
  update-type: version-update:semver-major
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* --- (#1970)
updated-dependencies:
- dependency-name: tailwind-merge
  dependency-type: direct:production
  update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* --- (#1971)
updated-dependencies:
- dependency-name: i18next
  dependency-type: direct:production
  update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Yufan Song <33971064+yufansong@users.noreply.github.com>
* Refactor session management (#1810)
* refactor session mgmt
* defer file handling to runtime
* add todo
* refactor sessions a bit more
* remove messages logic from FE
* fix up socket handshake
* refactor frontend auth a bit
* first pass at redoing file explorer
* implement directory suffix
* fix up file tree
* close agent on websocket close
* remove session saving
* move file refresh
* remove getWorkspace
* plumb path/code differently
* fix build issues
* fix the tests
* fix npm build
* add session rehydration
* fix event serialization
* logspam
* fix user message rehydration
* add get_event fn
* agent state restoration
* change history tracking for codeact
* fix responsiveness of init
* fix lint
* lint
* delint
* fix prop
* update tests
* logspam
* lint
* fix test
* revert codeact
* change fileService to use API
* fix up session loading
* delint
* delint
* fix integration tests
* revert test
* fix up access to options endpoints
* fix initial files load
* delint
* fix file initialization
* fix mock server
* fixl int
* fix auth for html
* Update frontend/src/i18n/translation.json
Co-authored-by: Xingyao Wang <xingyao6@illinois.edu>
* refactor sessions and sockets
* avoid reinitializing the same session
* fix reconnect issue
* change up intro message
* more guards on reinit
* rename agent_session
* delint
* fix a bunch of tests
* delint
* fix last test
* remove code editor context
* fix build
* fix any
* fix dot notation
* Update frontend/src/services/api.ts
Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk>
* fix up error handling
* Update opendevin/server/session/agent.py
Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk>
* Update opendevin/server/session/agent.py
Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk>
* Update frontend/src/services/session.ts
Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk>
* fix build errs
* fix else
* add closed state
* delint
* Update opendevin/server/session/session.py
Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>
---------
Co-authored-by: Xingyao Wang <xingyao6@illinois.edu>
Co-authored-by: Graham Neubig <neubig@gmail.com>
Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk>
Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>
* fix #1960 (#1964)
* Add ruff for shared mutable defaults (B) (#1938)
* Add ruff for shared mutable defaults (B)
* Apply B006, B008 on current files, except fast API
* Update agenthub/SWE_agent/prompts.py
Co-authored-by: Graham Neubig <neubig@gmail.com>
* fix unintended behavior change
* this is correct, tell Ruff to leave it alone
---------
Co-authored-by: Graham Neubig <neubig@gmail.com>
Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk>
* Refactor integration testing CI, add optional Mac tests, and mark a few agents as deprecated (#1888)
* Add MacOS to integration tests
* Switch back to python 3.11
* Install Docker for macos pipeline
* regenerate.sh: Use environmental variable for sandbox type
* Pack different agents' tests into a single check
* Fix CodeAct tests
* Reduce file match and extensive debug logs
* Add TEST_IN_CI mode that reports codecov
* Small fix: don't quit if reusing old responses failed
* Merge codecov results
* Fix typos
* Remove coverage merge step - codecov automatically does that
* Make mac integration tests as optional - too slow
* Fix codecov args
* Add comments in yaml
* Include sandbox type in codecov report name
* Fix codecov report merge
* Revert renaming of test_matrix_success
* Remove SWEAgent and PlannerAgent from tests
* Mark planner agent and SWE agent as deprecated
* CodeCov: Ignore planner and sweagent
* Revert "Remove SWEAgent and PlannerAgent from tests"
This reverts commit 040cb3b.
* Remove all tests for SWE Agent
* Only keep basic tests for MonologueAgent and PlannerAgent
* Mark SWE Agent as deprecated, and ignore code coverage for it
---------
Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>
* Fix Repeated Responses in Chat by Adding IPythonRunCellObservation (#1987)
Co-authored-by: jianghongwei <jianghongwei@58.com>
Co-authored-by: மனோஜ்குமார் பழனிச்சாமி <smartmanoj42857@gmail.com>
* Save CI cycles for backend tests (#1985)
* Fix typo in prompt (#1992)
* Refactor monologue and SWE agent to use the messages in state history (#1863)
* Refactor monologue to use the messages in state history
* add messages, clean up
* fix monologue
* update integration tests
* move private method
* update SWE agent to use the history from State
* integration tests for SWE agent
* rename monologue to initial_thoughts, since that is what it is
* fix: catch session file not existed exception when init EventStream(maybe creating a new session with no session files stored). (#1994)
* add ml-bench in readme
* Bump boto3 from 1.34.110 to 1.34.111 (#2001)
Bumps [boto3](https://github.com/boto/boto3) from 1.34.110 to 1.34.111.
- [Release notes](https://github.com/boto/boto3/releases)
- [Changelog](https://github.com/boto/boto3/blob/develop/CHANGELOG.rst)
- [Commits](boto/boto3@1.34.110...1.34.111)
---
updated-dependencies:
- dependency-name: boto3
  dependency-type: direct:production
  update-type: version-update:semver-patch
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* Bump docker from 7.0.0 to 7.1.0 (#2002)
Bumps [docker](https://github.com/docker/docker-py) from 7.0.0 to 7.1.0.
- [Release notes](https://github.com/docker/docker-py/releases)
- [Commits](docker/docker-py@7.0.0...7.1.0)
---
updated-dependencies:
- dependency-name: docker
  dependency-type: direct:production
  update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* Bump litellm from 1.37.20 to 1.38.0 (#2005)
Bumps [litellm](https://github.com/BerriAI/litellm) from 1.37.20 to 1.38.0.
- [Release notes](https://github.com/BerriAI/litellm/releases)
- [Commits](BerriAI/litellm@v1.37.20...v1.38.0)
---
updated-dependencies:
- dependency-name: litellm
  dependency-type: direct:production
  update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* Fix SWE-Bench evaluation due to setuptools version (#1995)
* correctly setup plugins for swebench eval
* bump swe-bench version and add logging
* Revert "correctly setup plugins for swebench eval"
This reverts commit 2bd1055.
* bump version
* fix session state after resuming (#1999)
* fix state resuming
* fix session reconnection
* fix lint
* Implement `agentskills` for OpenDevin to helpfully improve edit AND including more useful tools/skills (#1941)
* add draft for skills
* Implement and test agentskills functions: open_file, goto_line, scroll_down, scroll_up, create_file, search_dir, search_file, find_file
* Remove new_sample.txt file
* add some work from opendevin w/ fixes
* Add unit tests for agentskills module
* fix some issues and updated tests
* add more tests for open
* tweak and handle goto_line
* add tests for some edge cases
* add tests for scrolling
* add tests for edit
* add tests for search_dir
* update tests to use pytest
* use pytest --forked to avoid file op unit tests to interfere with each other via global var
* update doc based on swe agent tool
* update and add tests for find_file and search_file
* move agent_skills to plugins
* add agentskills as plugin and docs
* add agentskill to ssh box and fix sandbox integration
* remove extra returns in doc
* add agentskills to initial tool for jupyter
* support re-init jupyter kernel (for agentskills) after restart
* fix print window's issue with indentation and add testcases
* add prompt for codeact with the newest edit primitives
* modify the way line number is presented (remove leading space)
* change prompt to the newest display format
* support tracking of costs via metrics
* Update opendevin/runtime/plugins/agent_skills/README.md
* Update opendevin/runtime/plugins/agent_skills/README.md
* implement and add tests for py linting
* remove extra text arg for incompatible subprocess ver
* remove sample.txt
* update test_edits integration tests
* fix all integration
* Update opendevin/runtime/plugins/agent_skills/README.md
* Update opendevin/runtime/plugins/agent_skills/README.md
* Update opendevin/runtime/plugins/agent_skills/README.md
* Update agenthub/codeact_agent/prompt.py
Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk>
* Update agenthub/codeact_agent/prompt.py
Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk>
* Update agenthub/codeact_agent/prompt.py
Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk>
* Update opendevin/runtime/plugins/agent_skills/agentskills.py
Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk>
* correctly setup plugins for swebench eval
* bump swe-bench version and add logging
* correctly setup plugins for swebench eval
* bump swe-bench version and add logging
* Revert "correctly setup plugins for swebench eval"
This reverts commit 2bd1055.
* bump version
* remove _AGENT_SKILLS_DOCS
* move flake8 to test dep
* update poetry.lock
* remove extra arg
* reduce max iter for eval
* update poetry
* fix integration tests
---------
Co-authored-by: OpenDevin <opendevin@opendevin.ai>
Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>
Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk>
* build: Add poetry command to use Python 3.11 for environment setup (#1972)
* Bump @react-types/shared from 3.23.0 to 3.23.1 in /frontend (#2006)
Bumps [@react-types/shared](https://github.com/adobe/react-spectrum) from 3.23.0 to 3.23.1.
- [Release notes](https://github.com/adobe/react-spectrum/releases)
- [Commits](https://github.com/adobe/react-spectrum/compare/@react-types/shared@3.23.0...@react-types/shared@3.23.1)
---
updated-dependencies:
- dependency-name: "@react-types/shared"
  dependency-type: direct:production
  update-type: version-update:semver-patch
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* Bump @types/react-syntax-highlighter in /frontend (#2007)
Bumps [@types/react-syntax-highlighter](https://github.com/DefinitelyTyped/DefinitelyTyped/tree/HEAD/types/react-syntax-highlighter) from 15.5.11 to 15.5.13.
- [Release notes](https://github.com/DefinitelyTyped/DefinitelyTyped/releases)
- [Commits](https://github.com/DefinitelyTyped/DefinitelyTyped/commits/HEAD/types/react-syntax-highlighter)
---
updated-dependencies:
- dependency-name: "@types/react-syntax-highlighter"
  dependency-type: direct:development
  update-type: version-update:semver-patch
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* Bump @typescript-eslint/parser from 7.9.0 to 7.10.0 in /frontend (#2008)
Bumps [@typescript-eslint/parser](https://github.com/typescript-eslint/typescript-eslint/tree/HEAD/packages/parser) from 7.9.0 to 7.10.0.
- [Release notes](https://github.com/typescript-eslint/typescript-eslint/releases)
- [Changelog](https://github.com/typescript-eslint/typescript-eslint/blob/main/packages/parser/CHANGELOG.md)
- [Commits](https://github.com/typescript-eslint/typescript-eslint/commits/v7.10.0/packages/parser)
---
updated-dependencies:
- dependency-name: "@typescript-eslint/parser"
  dependency-type: direct:development
  update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* Bump lint-staged from 15.2.2 to 15.2.4 in /frontend (#2009)
Bumps [lint-staged](https://github.com/okonet/lint-staged) from 15.2.2 to 15.2.4.
- [Release notes](https://github.com/okonet/lint-staged/releases)
- [Changelog](https://github.com/lint-staged/lint-staged/blob/master/CHANGELOG.md)
- [Commits](lint-staged/lint-staged@v15.2.2...v15.2.4)
---
updated-dependencies:
- dependency-name: lint-staged
  dependency-type: direct:development
  update-type: version-update:semver-patch
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* Update README.md
* Update README.md
* add run_infer.sh
* fix input output
* fix docker sandbox
* fix run
* update and clean run_infer.py
* add script to clean up dockers
* update repo uid
* add description
* new
* Update README.md
* use root for sandbox
* update readme
* update ml-bench conda env
* update readme
* update readme
* use try except
* modify raise exception
* add int
* update README
* longer time
* fix existing issues
* fix existing issue
* new docker image
* add metrics of cost
* add result parsing cost
* fix
* fix
* update summarize
* fix
* add analyze
* update readme
* use 4o
* add eval output
---------
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-31-157.ec2.internal>
Co-authored-by: RainRat <rainrat78@yahoo.ca>
Co-authored-by: மனோஜ்குமார் பழனிச்சாமி <smartmanoj42857@gmail.com>
Co-authored-by: Frank Xu <frankxu2004@gmail.com>
Co-authored-by: Yufan Song <33971064+yufansong@users.noreply.github.com>
Co-authored-by: Graham Neubig <neubig@gmail.com>
Co-authored-by: Shimada666 <649940882@qq.com>
Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk>
Co-authored-by: Xingyao Wang <xingyao6@illinois.edu>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>
Co-authored-by: Robert Brennan <accounts@rbren.io>
Co-authored-by: Rahul Anand <62982824+zeul22@users.noreply.github.com>
Co-authored-by: jiangleo <jiangleo@users.noreply.github.com>
Co-authored-by: jianghongwei <jianghongwei@58.com>
Co-authored-by: Jeremi Joslin <jeremi@newlogic.com>
Co-authored-by: Aaron Xia <zhhuaxia@gmail.com>
Co-authored-by: OpenDevin <opendevin@opendevin.ai>
Co-authored-by: DaxServer <7479937+DaxServer@users.noreply.github.com>
Co-authored-by: Robert <871607149@qq.com>