Scripts

This README contains documentation for the main inference script run.sh along with some miscellaneous scripts that may be helpful.

⚠️ These scripts are written to be invoked from the root of this codebase (e.g. ./scripts/run.sh).

🏃 Inference Script

The ./scripts/run.sh script is provided as an example of how to invoke run.py.

A single run.py call will generate a trajectory/<username>/<experiment name> folder containing the trajectories and predictions generated by a <model_name> model run on every instance in the <data_path> dataset.

The following is a comprehensive guide to using the provided run.py script, detailing available command-line arguments, their purposes, and default values. Flags that you might find helpful have been marked with a 💡.

The code for configuration-based workflows, along with an explanation of how it is implemented, can be found in agent/.

Run python run.py --help to view this documentation on the command line.

Optional Arguments

-h, --help: Show the help message and exit.

Script Arguments

These arguments configure the script's behavior:

--instance_filter <str> 💡: Run instances that match this regex pattern. Default is .*.
--noskip_existing, --skip_existing: [Do not] skip instances that have been completed before.
--suffix <str>: Appends a suffix to the name of the folder containing the trajectories for an experiment run.
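
To get a feel for how a pattern passed to --instance_filter narrows a run, the sketch below filters a list of made-up instance IDs. It assumes the pattern is matched from the start of each ID (re.match semantics); that assumption, the sample IDs, and the helper name select_instances are illustrative and not taken from run.py.

```python
import re


def select_instances(instance_ids, pattern=".*"):
    """Return the IDs that the pattern matches from the start of the string.

    Assumption: matching is anchored at the start (re.match), which may
    differ from what run.py actually does.
    """
    regex = re.compile(pattern)
    return [iid for iid in instance_ids if regex.match(iid)]


# Hypothetical instance IDs written in the SWE-bench naming style.
ids = ["django__django-11099", "django__django-12345", "sympy__sympy-20590"]

print(select_instances(ids))                 # default pattern keeps every ID
print(select_instances(ids, r"django__.*"))  # keeps only the django IDs
```

Quoting the pattern on the command line (for example --instance_filter 'django__.*') keeps the shell from expanding the asterisk before run.py sees it.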

Environment Arguments

These arguments are related to the environment configuration:

--data_path <str> 💡: Path to the data file -or- a Hugging Face dataset -or- a GitHub issue URL.
--base_commit <str>: You can specify the base commit sha to checkout. This is determined automatically for instances in SWE-bench.
--image_name <str>: Name of the Docker image to use. Default is swe-agent.
--noinstall_environment, --install_environment: [Do not] install the environment. Default is True (the environment is installed).
--noverbose, --verbose: [Do not] enable verbose output. Default is False.
--timeout <int>: Timeout in seconds. Default is 35.
--container_name <str> 💡: Name of the Docker container if you would like to create a persistent container. Optional.

⚠️ If you specify a container name, do not run multiple instances of run.py with the same container name!
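
One way to respect the warning above when launching several runs side by side is to derive a distinct container name for each one. The following is a minimal sketch that only builds the command strings and does not start anything. The helper build_commands, the swe-agent-run prefix, and the data paths are hypothetical.

```python
import shlex


def build_commands(data_paths, prefix="swe-agent-run"):
    """Build one run.py command per data file, each with its own container name."""
    commands = []
    for index, data_path in enumerate(data_paths):
        # Hypothetical naming scheme: a shared prefix plus a running index,
        # so no two commands ever reuse the same container name.
        container = f"{prefix}-{index}"
        args = [
            "python", "run.py",
            "--data_path", data_path,
            "--container_name", container,
        ]
        commands.append(shlex.join(args))
    return commands


for command in build_commands(["/path/to/a.json", "/path/to/b.json"]):
    print(command)
```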

Agent Arguments

Configure agent behavior:

--config_file <Path> 💡: Path to the configuration YAML file. Default is config/default.yaml.

Model Arguments

Configure model parameters:

--model_name <str> 💡: Name of the model. Default is gpt4.
--per_instance_cost_limit <float> 💡: Per-instance cost limit (interactive loop will automatically terminate when cost limit is hit). Default is 3.0.
--temperature <float> 💡: Model temperature. Default is 0.0.
--top_p <float> 💡: Top p filtering. Default is 0.95.
--total_cost_limit <float>: Total cost limit. Default is 0.0 (unlimited).
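
The interaction between the two cost limits can be pictured with a small check like the one below. This is a sketch of the documented behavior, not the code in run.py. It assumes that a limit of 0.0 disables the check for both flags (the text above only states this for --total_cost_limit) and that a limit counts as hit once the cost reaches it. The function name limit_reached is made up.

```python
def limit_reached(instance_cost, total_cost,
                  per_instance_limit=3.0, total_limit=0.0):
    """Return True once either budget is used up.

    Assumptions: a limit of 0.0 means "no limit" for both arguments, and a
    budget is exhausted when the cost is greater than or equal to the limit.
    """
    over_instance = per_instance_limit > 0 and instance_cost >= per_instance_limit
    over_total = total_limit > 0 and total_cost >= total_limit
    return over_instance or over_total


# With the defaults, only the per-instance budget can stop a run.
print(limit_reached(instance_cost=1.0, total_cost=100.0))
# A non-zero total limit also caps spending across all instances.
print(limit_reached(instance_cost=1.0, total_cost=10.0, total_limit=5.0))
```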

📙 Example Usage

Run with custom data path and verbose mode:

python run.py --data_path /path/to/data.json --verbose

Specify a model and adjust the temperature and top_p parameters:

python run.py --model_name gpt4 --temperature 0.2 --top_p 0.9
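
When the same invocation is assembled from a script, it can help to build the argument list programmatically. The sketch below turns keyword options into a command line, following the --flag / --noflag convention used for boolean arguments in this README. The helper build_run_command is hypothetical, and it only produces a string without running anything.

```python
import shlex


def build_run_command(**options):
    """Turn keyword options into a run.py command line.

    Booleans become --flag or --noflag; any other value becomes --flag value.
    """
    args = ["python", "run.py"]
    for name, value in options.items():
        if isinstance(value, bool):
            args.append(f"--{name}" if value else f"--no{name}")
        else:
            args.extend([f"--{name}", str(value)])
    return shlex.join(args)


print(build_run_command(model_name="gpt4", temperature=0.2, top_p=0.9, verbose=True))
print(build_run_command(skip_existing=False))
```

The returned string can be pasted into a shell, or the same argument list can be handed to subprocess.run from the root of the codebase.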

🛠️ Miscellaneous Scripts

remove_all_containers.sh: Forcibly removes all Docker containers currently present on the system.
run_and_eval.sh: Runs SWE-agent inference and evaluation on a specified dataset N times. You can specify the dataset_path, num_runs, template, and suffix arguments.
run_jsonl.sh: Runs SWE-agent inference from a .jsonl file that contains a SWE-bench style task instance.
run_replay.sh: Runs SWE-agent inference from a .traj file. This is useful for automatically creating a new demonstration for a new config from an existing sequence of actions.
