diff --git a/docs/agent.md b/docs/agent.md new file mode 100644 index 0000000..9af1d8a --- /dev/null +++ b/docs/agent.md @@ -0,0 +1,44 @@ +Commit0 provides a command-line `agent` for configuring and +running AI agents to assist with code development and testing. +In this example we use [Aider](https://aider.chat/) as the +baseline code completion agent. + +```bash +pip install aider-chat +``` + +First we assume there is an underlying `commit0` +project that is already configured. To create a new project, +run the commit0 `setup` command. + +```bash +commit0 setup lite +``` + +Next we need to configure the backend for the agent. +Currently we only support the aider backend. The `config` command +can also be used to pass in arguments. + +```bash +export ANTHROPIC_API_KEY="..." +agent config aider +``` + +Finally we run the underlying agent. This will create a display +that shows the current progress of the agent. + +```bash +agent run +``` + + +### Extending +Refer to `class Agents` in `agent/agents.py`. You can design your own agent by inheriting from the `Agents` class and implementing the `run` method. + +## Notes + + +* Aider automatically retries certain API errors. For details, see [here](https://github.com/paul-gauthier/aider/blob/75e1d519da9b328b0eca8a73ee27278f1289eadb/aider/sendchat.py#L17). +* When increasing `--max-parallel-repos`, be mindful of aider's [60-second retry timeout](https://github.com/paul-gauthier/aider/blob/75e1d519da9b328b0eca8a73ee27278f1289eadb/aider/sendchat.py#L39). Set this value according to your API tier to keep `RateLimitError`s from stopping processes. +* Currently, the agent skips files with more than 1500 lines. See `agent/agent_utils.py#L199` for details. +* Running a full `all` commit0 split costs approximately $100 with Claude 3.5 Sonnet. 
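Review note on the "Extending" section of `docs/agent.md`: the subclassing pattern it describes can be sketched roughly as below. The `Agents` base class here is a hypothetical stand-in, not the real class from `agent/agents.py`. The constructor argument, the `run` signature, and `EchoAgent` are all assumptions made for illustration, so check the actual base class before copying this.

```python
from abc import ABC, abstractmethod


class Agents(ABC):
    """Hypothetical stand-in for the base class in agent/agents.py.

    The real class may take different constructor arguments and may
    declare a different signature for run(); this is only a sketch.
    """

    def __init__(self, max_iteration: int) -> None:
        self.max_iteration = max_iteration

    @abstractmethod
    def run(self, message: str) -> str:
        """Run the agent on a prompt and return a summary of what it did."""


class EchoAgent(Agents):
    """Toy agent that records the prompt once per iteration instead of calling an LLM."""

    def run(self, message: str) -> str:
        steps = [f"iteration {i + 1}: {message}" for i in range(self.max_iteration)]
        return "\n".join(steps)


if __name__ == "__main__":
    print(EchoAgent(max_iteration=3).run("implement the missing functions"))
```

The abstract base class makes the contract explicit: a subclass that forgets to implement `run` cannot be instantiated, which surfaces the mistake before any API calls are made.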
diff --git a/docs/api.md b/docs/api.md new file mode 100644 index 0000000..530a46b --- /dev/null +++ b/docs/api.md @@ -0,0 +1,135 @@ +## Commit0 + +Commit0 provides several commands to facilitate the process of cloning, building, testing, and evaluating repositories. Here's an overview of the available commands: + +### Setup + +Use `commit0 setup [OPTIONS] REPO_SPLIT` to clone a repository split. +Available options include: + +| Argument | Type | Description | Default | +|----------|------|-------------|---------| +| `repo_split` | str | Split of repositories to clone | | +| `--dataset-name` | str | Name of the Huggingface dataset | `wentingzhao/commit0_combined` | +| `--dataset-split` | str | Split of the Huggingface dataset | `test` | +| `--base-dir` | str | Base directory to clone repos to | `repos/` | +| `--commit0-dot-file-path` | str | Storing path for stateful commit0 configs | `.commit0.yaml` | + +### Build + +Use `commit0 build [OPTIONS]` to build the Commit0 split chosen in the Setup stage. +Available options include: + +| Argument | Type | Description | Default | +|----------|------|-------------|---------| +| `--num-workers` | int | Number of workers | `8` | +| `--commit0-dot-file-path` | str | Path to the commit0 dot file | `.commit0.yaml` | +| `--verbose` | int | Verbosity level (1 or 2) | `1` | + +### Get Tests + +Use `commit0 get-tests REPO_NAME` to get tests for a Commit0 repository. + +| Argument | Type | Description | Default | +|----------|------|-------------|---------| +| `repo_name` | str | Name of the repository to get tests for | | + +### Test + +Use `commit0 test [OPTIONS] REPO_OR_REPO_PATH [TEST_IDS]` to run tests on a Commit0 repository. 
+Available options include: + +| Argument | Type | Description | Default | +|----------|------|-------------|---------| +| `repo_or_repo_path` | str | Directory of the repository to test | | +| `test_ids` | str | Test IDs to run | | +| `--branch` | str | Branch to test | | +| `--backend` | str | Backend to use for testing | `modal` | +| `--timeout` | int | Timeout for tests in seconds | `1800` | +| `--num-cpus` | int | Number of CPUs to use | `1` | +| `--reference` | bool | Test the reference commit | `False` | +| `--coverage` | bool | Get coverage information | `False` | +| `--rebuild` | bool | Rebuild an image | `False` | +| `--commit0-dot-file-path` | str | Path to the commit0 dot file | `.commit0.yaml` | +| `--verbose` | int | Verbosity level (1 or 2) | `1` | +| `--stdin` | bool | Read test names from stdin | `False` | + +### Evaluate + +Use `commit0 evaluate [OPTIONS]` to evaluate the Commit0 split chosen in the Setup stage. +Available options include: + +| Argument | Type | Description | Default | +|----------|------|-------------|---------| +| `--branch` | str | Branch to evaluate | | +| `--backend` | str | Backend to use for evaluation | `modal` | +| `--timeout` | int | Timeout for evaluation in seconds | `1800` | +| `--num-cpus` | int | Number of CPUs to use | `1` | +| `--num-workers` | int | Number of workers to use | `8` | +| `--reference` | bool | Evaluate the reference commit | `False` | +| `--coverage` | bool | Get coverage information | `False` | +| `--commit0-dot-file-path` | str | Path to the commit0 dot file | `.commit0.yaml` | +| `--rebuild` | bool | Rebuild images | `False` | + +### Lint + +Use `commit0 lint [OPTIONS] REPO_OR_REPO_DIR` to lint files in a repository. 
+Available options include: + +| Argument | Type | Description | Default | +|----------|------|-------------|---------| +| `repo_or_repo_dir` | str | Directory of the repository to lint | | +| `--files` | List[Path] | Files to lint (optional) | | +| `--commit0-dot-file-path` | str | Path to the commit0 dot file | `.commit0.yaml` | +| `--verbose` | int | Verbosity level (1 or 2) | `1` | + +### Save + +Use `commit0 save [OPTIONS] OWNER BRANCH` to save the Commit0 split to GitHub. +Available options include: + +| Argument | Type | Description | Default | +|----------|------|-------------|---------| +| `owner` | str | Owner of the repository | | +| `branch` | str | Branch to save | | +| `--github-token` | str | GitHub token for authentication | | +| `--commit0-dot-file-path` | str | Path to the commit0 dot file | `.commit0.yaml` | + +## Agent + +### Config + +Use `agent config [OPTIONS] AGENT_NAME` to set up the configuration for an agent. +Available options include: + +| Argument | Type | Description | Default | +|----------|------|-------------|---------| +| `agent_name` | str | Agent to use. Only [aider](https://aider.chat/) is supported for now. | `aider` | +| `--model-name` | str | LLM to use. See [here](https://aider.chat/docs/llms.html) for all supported models. | `claude-3-5-sonnet-20240620` | +| `--use-user-prompt` | bool | Use a custom prompt instead of the default prompt. | `False` | +| `--user-prompt` | str | The prompt sent to the agent. | See code for details. | +| `--run-tests` | bool | Run tests after code modifications for feedback. You need to set up `docker` or `modal` before running tests; refer to the commit0 docs. | `False` | +| `--max-iteration` | int | Maximum number of agent iterations. | `3` | +| `--use-repo-info` | bool | Include the repository information. | `False` | +| `--max-repo-info-length` | int | Maximum length of the repository information to use. | `10000` | +| `--use-unit-tests-info` | bool | Include the unit tests information. 
| `False` | +| `--max-unit-tests-info-length` | int | Maximum length of the unit tests information to use. | `10000` | +| `--use-spec-info` | bool | Include the spec information. | `False` | +| `--max-spec-info-length` | int | Maximum length of the spec information to use. | `10000` | +| `--use-lint-info` | bool | Include the lint information. | `False` | +| `--max-lint-info-length` | int | Maximum length of the lint information to use. | `10000` | +| `--pre-commit-config-path` | str | Path to the pre-commit config file. This is needed for running `lint`. | `.pre-commit-config.yaml` | +| `--agent-config-file` | str | Path to write the agent config. | `.agent.yaml` | + +### Running + +Use `agent run [OPTIONS] BRANCH` to execute an agent on a specific branch. +Available options include: + +| Argument | Type | Description | Default | +|----------|------|-------------|---------| +| `branch` | str | Name of the branch to run the agent on. | | +| `--backend` | str | Test backend to run the agent on. Ignore this option if the agent is not configured with `--run-tests`. | `modal` | +| `--log-dir` | str | Directory to store the logs in. | `logs/aider` | +| `--max-parallel-repos` | int | Maximum number of repositories for the agent to process in parallel. Runs sequentially if set to 1. | `1` | +| `--display-repo-progress-num` | int | Number of repositories whose progress is displayed while running. | `5` | diff --git a/docs/arch.png b/docs/arch.png new file mode 100644 index 0000000..3b40853 Binary files /dev/null and b/docs/arch.png differ diff --git a/docs/baseline.md b/docs/baseline.md new file mode 100644 index 0000000..6236d34 --- /dev/null +++ b/docs/baseline.md @@ -0,0 +1,7 @@ +# Baseline + +Commit0 contains a baseline system based on +the [Aider](https://aider.chat/) code generation +system. + +... 
diff --git a/docs/commit0.gif b/docs/commit0.gif new file mode 100644 index 0000000..4489037 Binary files /dev/null and b/docs/commit0.gif differ diff --git a/docs/index.md b/docs/index.md index e49b2bd..24db8f3 100644 --- a/docs/index.md +++ b/docs/index.md @@ -3,19 +3,23 @@ # -Commit-0 is a real-world AI coding challenge. -Can your agent generate a working library from commit 0? +## Overview + +Commit-0 is a from-scratch AI coding challenge. +Can you create a library from commit 0? The benchmark consists of 57 core Python libraries. -Libraries are selected based on: +The challenge is to rebuild these libraries and +pass their unit tests. All libraries have: -* Significant unit-test coverage +* Significant test coverage * Detailed specification and documentation * Lint and type checking -The [commit0 tool](setup) allows you to: +Commit-0 is an interactive environment that makes it easy +to design and test new agents. You can: -* Efficiently run interactive tests in isolated environemnts +* Efficiently run tests in isolated environments * Distribute testing and development across cloud systems * Track and log all changes made throughout. @@ -25,6 +29,14 @@ To install run: pip install commit0 ``` +## Architecture + +![](arch.png) + + +![](commit0.gif) + +## Libraries | | Name | Repo | Commit0 | Tests | | |--|--------|-------|----|----|------| diff --git a/docs/setupdist.md b/docs/setupdist.md index e04679b..28d7c8c 100644 --- a/docs/setupdist.md +++ b/docs/setupdist.md @@ -44,3 +44,40 @@ you can commit to the branch and call with the --branch command. ```bash commit0 test simpy tests/test_event.py::test_succeed --branch my_branch ``` + +## Local Mode + +To run in local mode, first make sure that you have [docker tools](https://docs.docker.com/desktop/install/mac-install/) +installed. On Debian systems: + +```bash +apt install docker.io +``` + +To get started, run the `setup` command with the dataset +split that you are interested in working with. 
+We'll start with the `lite` split. + + +```bash +commit0 setup lite +``` + +This will clone the code for a subset of libraries to your `repos/` directory. + +Next run the `build` command, which will configure Docker containers for +each of the libraries with isolated virtual environments. The command uses the +[uv](https://github.com/astral-sh/uv) package manager for efficient builds. + +```bash +commit0 build +``` + +The main operation you can do with these environments is to run tests. +Here we run [a test](https://github.com/commit-0/simpy/blob/master/tests/test_event.py#L11) in the `simpy` library. + +```bash +commit0 test simpy tests/test_event.py::test_succeed +``` + +See [distributed setup](/setupdist) for more commands. diff --git a/docs/setuplocal.md b/docs/setuplocal.md index c238b39..32b4c4b 100644 --- a/docs/setuplocal.md +++ b/docs/setuplocal.md @@ -33,4 +33,4 @@ Here we run [a test](https://github.com/commit-0/simpy/blob/master/tests/test_ev commit0 test simpy tests/test_event.py::test_succeed ``` -See [distributed setup](setupdist) for more commands. +See [distributed setup](/setupdist) for more commands. 
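Review note on the "Local Mode" section added to `docs/setupdist.md`: the three steps it walks through (setup, build, test) could be scripted roughly as follows. This is a minimal sketch that only shells out to the commands shown in the docs. The helper names `local_workflow` and `run_workflow` are hypothetical and not part of commit0. It defaults to a dry run, so nothing executes unless `dry_run=False` is passed on a machine with commit0 and Docker installed.

```python
from __future__ import annotations

import shlex
import subprocess


def local_workflow(split: str, repo: str, test_id: str) -> list[list[str]]:
    """Return the commit0 commands from the Local Mode docs, in order."""
    return [
        ["commit0", "setup", split],         # clone the repositories in the split
        ["commit0", "build"],                # build the Docker images for the split
        ["commit0", "test", repo, test_id],  # run a single test in one repository
    ]


def run_workflow(commands: list[list[str]], dry_run: bool = True) -> None:
    """Print each command and, unless dry_run is set, execute it."""
    for cmd in commands:
        print("$", shlex.join(cmd))
        if not dry_run:
            # check=True stops the workflow at the first failing step.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_workflow(local_workflow("lite", "simpy", "tests/test_event.py::test_succeed"))
```

Passing each command as an argument list rather than a single shell string avoids quoting problems with test IDs that contain `::`.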
diff --git a/docs/table.md b/docs/table.md new file mode 100644 index 0000000..94576ab --- /dev/null +++ b/docs/table.md @@ -0,0 +1,58 @@ +| | Name | Repo | Commit0 | Tests | | +|--|--------|-------|----|----|------| +| | [minitorch](https://minitorch.github.io/) | [[orig](http://github.com/minitorch/minitorch)] | [[commit0](http://github.com/commit-0/minitorch)] | 230 | | +| | [simpy](https://simpy.readthedocs.io/en/4.1.1/) | [[orig](http://github.com/wenting-zhao/simpy)] | [[commit0](http://github.com/commit-0/simpy)] | 140 | | +| | [bitstring](https://bitstring.readthedocs.io/en/stable/) | [[orig](http://github.com/scott-griffiths/bitstring)] | [[commit0](http://github.com/commit-0/bitstring)] | 834 | | +| | [tinydb](https://tinydb.readthedocs.io/_/downloads/en/v4.8.0/pdf/) | [[orig](http://github.com/msiemens/tinydb)] | [[commit0](http://github.com/commit-0/tinydb)] | 201 | | +| | [marshmallow](https://marshmallow.readthedocs.io/_/downloads/en/stable/pdf/) | [[orig](http://github.com/marshmallow-code/marshmallow)] | [[commit0](http://github.com/commit-0/marshmallow)] | 1229 | | +| | [python-prompt-toolkit](https://python-prompt-toolkit.readthedocs.io/_/downloads/en/3.0.43/pdf/) | [[orig](http://github.com/prompt-toolkit/python-prompt-toolkit)] | [[commit0](http://github.com/commit-0/python-prompt-toolkit)] | 151 | | +| | [parsel](https://parsel.readthedocs.io/_/downloads/en/latest/pdf/) | [[orig](http://github.com/scrapy/parsel)] | [[commit0](http://github.com/commit-0/parsel)] | 343 | | +| pyjwt | [pyjwt](https://pyjwt.readthedocs.io/_/downloads/en/2.8.0/pdf/) | [[orig](http://github.com/jpadilla/pyjwt)] | [[commit0](http://github.com/commit-0/pyjwt)] | 259 | | +| | [networkx](https://networkx.org/documentation/networkx-3.3/) | [[orig](http://github.com/networkx/networkx)] | [[commit0](http://github.com/commit-0/networkx)] | 5440 | | +| | [graphene](https://docs.graphene-python.org/en/stable/) | [[orig](http://github.com/graphql-python/graphene)] | 
[[commit0](http://github.com/commit-0/graphene)] | 447 | | +| tlslite-ng | [tlslite-ng](https://tlslite-ng.readthedocs.io/en/latest/) | [[orig](http://github.com/tlsfuzzer/tlslite-ng)] | [[commit0](http://github.com/commit-0/tlslite-ng)] | 1653 | | +| wcwidth | [wcwidth](https://wcwidth.readthedocs.io/en/stable/) | [[orig](http://github.com/jquast/wcwidth)] | [[commit0](http://github.com/commit-0/wcwidth)] | 38 | | +| chardet | [chardet](https://chardet.readthedocs.io/_/downloads/en/stable/pdf/) | [[orig](http://github.com/chardet/chardet)] | [[commit0](http://github.com/commit-0/chardet)] | 376 | | +| dnspython | [dnspython](https://dnspython.readthedocs.io/en/stable/) | [[orig](http://github.com/rthalley/dnspython)] | [[commit0](http://github.com/commit-0/dnspython)] | 1304 | | +| imapclient | [imapclient](https://imapclient.readthedocs.io/en/3.0.1/) | [[orig](http://github.com/mjs/imapclient)] | [[commit0](http://github.com/commit-0/imapclient)] | 267 | | +| | [virtualenv](https://virtualenv.pypa.io/en/20.26.3/) | [[orig](http://github.com/pypa/virtualenv)] | [[commit0](http://github.com/commit-0/virtualenv)] | 284 | | +| pexpect | [pexpect](https://pexpect.readthedocs.io/_/downloads/en/stable/pdf/) | [[orig](http://github.com/pexpect/pexpect)] | [[commit0](http://github.com/commit-0/pexpect)] | 255 | | +| | [web3.py](https://web3py.readthedocs.io/_/downloads/en/v6.20.2/pdf/) | [[orig](http://github.com/ethereum/web3.py)] | [[commit0](http://github.com/commit-0/web3.py)] | 40433 | | +| | [babel](https://babel.pocoo.org/_/downloads/en/stable/pdf/) | [[orig](http://github.com/python-babel/babel)] | [[commit0](http://github.com/commit-0/babel)] | 5663 | | +| | [geopandas](https://geopandas.org/en/stable/) | [[orig](http://github.com/geopandas/geopandas)] | [[commit0](http://github.com/commit-0/geopandas)] | 2196 | | +| dulwich | [dulwich](https://dulwich.readthedocs.io/_/downloads/en/latest/pdf/) | [[orig](http://github.com/jelmer/dulwich)] | 
[[commit0](http://github.com/commit-0/dulwich)] | 1522 | | +| | [flask](https://flask.palletsprojects.com/en/3.0.x/) | [[orig](http://github.com/pallets/flask)] | [[commit0](http://github.com/commit-0/flask)] | 477 | | +| voluptuous | [voluptuous](https://alecthomas.github.io/voluptuous/docs/_build/html/) | [[orig](http://github.com/alecthomas/voluptuous)] | [[commit0](http://github.com/commit-0/voluptuous)] | 149 | | +| | [jinja](https://jinja.palletsprojects.com/en/3.1.x/) | [[orig](http://github.com/pallets/jinja)] | [[commit0](http://github.com/commit-0/jinja)] | 851 | | +| | [seaborn](https://seaborn.pydata.org/) | [[orig](http://github.com/mwaskom/seaborn)] | [[commit0](http://github.com/commit-0/seaborn)] | 2362 | | +| requests | [requests](https://requests.readthedocs.io/_/downloads/en/latest/pdf/) | [[orig](http://github.com/psf/requests)] | [[commit0](http://github.com/commit-0/requests)] | 590 | | +| | [scrapy](https://docs.scrapy.org/_/downloads/en/2.11/pdf/) | [[orig](http://github.com/scrapy/scrapy)] | [[commit0](http://github.com/commit-0/scrapy)] | 2904 | | +| | [fastapi](https://fastapi.tiangolo.com/reference/) | [[orig](http://github.com/fastapi/fastapi)] | [[commit0](http://github.com/commit-0/fastapi)] | 2013 | | +| | [click](https://click.palletsprojects.com/en/8.1.x/) | [[orig](http://github.com/pallets/click)] | [[commit0](http://github.com/commit-0/click)] | 589 | | +| | [python-rsa](https://stuvel.eu/python-rsa-doc/) | [[orig](http://github.com/sybrenstuvel/python-rsa)] | [[commit0](http://github.com/commit-0/python-rsa)] | 86 | | +| | [statsmodels](https://www.statsmodels.org/stable/) | [[orig](http://github.com/statsmodels/statsmodels)] | [[commit0](http://github.com/commit-0/statsmodels)] | 17669 | | +| more-itertools | [more-itertools](https://more-itertools.readthedocs.io/en/v10.4.0/) | [[orig](http://github.com/more-itertools/more-itertools)] | [[commit0](http://github.com/commit-0/more-itertools)] | 662 | | +| | 
[moviepy](https://zulko.github.io/moviepy/) | [[orig](http://github.com/Zulko/moviepy)] | [[commit0](http://github.com/commit-0/moviepy)] | 109 | | +| deprecated | [deprecated](https://deprecated.readthedocs.io/en/latest/) | [[orig](http://github.com/laurent-laporte-pro/deprecated)] | [[commit0](http://github.com/commit-0/deprecated)] | 171 | | +| | [pydantic](https://docs.pydantic.dev/2.8/) | [[orig](http://github.com/pydantic/pydantic)] | [[commit0](http://github.com/commit-0/pydantic)] | 5091 | | +| | [loguru](https://loguru.readthedocs.io/_/downloads/en/0.7.2/pdf/) | [[orig](http://github.com/Delgan/loguru)] | [[commit0](http://github.com/commit-0/loguru)] | 1461 | | +| | [pypdf](https://pypdf.readthedocs.io/_/downloads/en/4.3.1/pdf/) | [[orig](http://github.com/py-pdf/pypdf)] | [[commit0](http://github.com/commit-0/pypdf)] | 911 | | +| | [attrs](https://www.attrs.org/en/24.2.0/) | [[orig](http://github.com/python-attrs/attrs)] | [[commit0](http://github.com/commit-0/attrs)] | 1414 | | +| | [mimesis](https://mimesis.name/en/v17.0.0/) | [[orig](http://github.com/lk-geimfari/mimesis)] | [[commit0](http://github.com/commit-0/mimesis)] | 6159 | | +| | [cookiecutter](https://cookiecutter.readthedocs.io/_/downloads/en/2.6.0/pdf/) | [[orig](http://github.com/cookiecutter/cookiecutter)] | [[commit0](http://github.com/commit-0/cookiecutter)] | 367 | | +| | [tornado](https://www.tornadoweb.org/_/downloads/en/stable/pdf/) | [[orig](http://github.com/tornadoweb/tornado)] | [[commit0](http://github.com/commit-0/tornado)] | 1150 | | +| | [imbalanced-learn](https://imbalanced-learn.org/stable/) | [[orig](http://github.com/scikit-learn-contrib/imbalanced-learn)] | [[commit0](http://github.com/commit-0/imbalanced-learn)] | 2310 | | +| | [python-progressbar](https://progressbar-2.readthedocs.io/_/downloads/en/stable/pdf/) | [[orig](http://github.com/wolph/python-progressbar)] | [[commit0](http://github.com/commit-0/python-progressbar)] | 385 | | +| | 
[PyBoy](https://docs.pyboy.dk/) | [[orig](http://github.com/Baekalfen/PyBoy)] | [[commit0](http://github.com/commit-0/PyBoy)] | 201 | | +| | [pytest](https://docs.pytest.org/_/downloads/en/8.3.x/pdf/) | [[orig](http://github.com/pytest-dev/pytest)] | [[commit0](http://github.com/commit-0/pytest)] | 3612 | | +| | [pylint](https://pylint.readthedocs.io/en/v3.2.6/) | [[orig](http://github.com/pylint-dev/pylint)] | [[commit0](http://github.com/commit-0/pylint)] | 1878 | | +| | [sphinx](https://www.sphinx-doc.org/en/master/) | [[orig](http://github.com/sphinx-doc/sphinx)] | [[commit0](http://github.com/commit-0/sphinx)] | 2187 | | +| | [joblib](https://joblib.readthedocs.io/en/stable/) | [[orig](http://github.com/joblib/joblib)] | [[commit0](http://github.com/commit-0/joblib)] | 1450 | | +| | [xarray](https://docs.xarray.dev/en/v2024.07.0/) | [[orig](http://github.com/pydata/xarray)] | [[commit0](http://github.com/commit-0/xarray)] | 15643 | | +| cachetools | [cachetools](https://cachetools.readthedocs.io/en/v5.5.0/) | [[orig](http://github.com/tkem/cachetools)] | [[commit0](http://github.com/commit-0/cachetools)] | 215 | | +| paramiko | [paramiko](https://www.paramiko.org/) | [[orig](http://github.com/paramiko/paramiko)] | [[commit0](http://github.com/commit-0/paramiko)] | 557 | | +| | [fabric](https://www.fabfile.org/) | [[orig](http://github.com/fabric/fabric)] | [[commit0](http://github.com/commit-0/fabric)] | 353 | | +| | [filesystem_spec](https://filesystem-spec.readthedocs.io/en/stable/) | [[orig](http://github.com/fsspec/filesystem_spec)] | [[commit0](http://github.com/commit-0/filesystem_spec)] | 698 | | +| jedi | [jedi](https://jedi.readthedocs.io/en/stable/) | [[orig](http://github.com/davidhalter/jedi)] | [[commit0](http://github.com/commit-0/jedi)] | 3854 | | +| sqlparse | [sqlparse](https://sqlparse.readthedocs.io/en/stable/) | [[orig](http://github.com/andialbrecht/sqlparse)] | [[commit0](http://github.com/commit-0/sqlparse)] | 461 | | +| | 
[portalocker](https://portalocker.readthedocs.io/en/stable/) | [[orig](http://github.com/wolph/portalocker)] | [[commit0](http://github.com/commit-0/portalocker)] | 38 | | diff --git a/mkdocs.yml b/mkdocs.yml index f6e64f8..863d5bd 100644 --- a/mkdocs.yml +++ b/mkdocs.yml @@ -3,10 +3,9 @@ site_name: "" docs_dir: docs nav: - Home: index.md - - Distributed: setupdist.md - - Local: setuplocal.md - - Extending: repos.md - - About: about.md + - Commit0: setupdist.md + - Agent: agent.md + - API: api.md theme: name: material logo: "logo2.webp" @@ -17,5 +16,5 @@ extra_css: markdown_extensions: - pymdownx.inlinehilite - pymdownx.snippets - - pymdownx.superfences - pymdownx.highlight + - pymdownx.superfences