adding a script to fetch and convert devin's output for evaluation#81

Merged
xingyaoww merged 5 commits into OpenHands:main from Jiaxin-Pei:main on Mar 21, 2024
@Jiaxin-Pei

Contributor

No description provided.

Member

How about we put this file in SWE-Bench/scripts?

Contributor Author

I'm not quite sure about this. To me, it seems more reasonable to keep dataset-related files in the dataset folder. @JustinLin610 @libowen2121 any thoughts on this?

Member

Oh, I suggest we do this: mv src/prepare_devin_outputs_for_evaluation.py scripts/prepare_devin_outputs_for_evaluation.py

Contributor Author

Oh, my bad, I thought we were moving it outside the evaluation folder. Will do.

Comment thread evaluation/README.md Outdated
- `devin_eval_analysis.ipynb`: notebook analyzing devin's outputs
- src
  - `prepare_devin_outputs_for_evaluation.py`: script fetching and converting devin's output into the desired json file for evaluation.
- outputs: two json files under `evaluation/SWE-bench/data/` that can be directly used for evaluation

Member

Can you upload the post-processed file to our Hugging Face datasets, and add a curl or wget command here so people can directly download it for debugging? You can request to join if you haven't already: https://huggingface.co/OpenDevin

Contributor Author

requested
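
The download step requested in this thread could be sketched in Python as follows. This is only an illustration under stated assumptions: the thread does not name the dataset repo or the uploaded file, so `REPO_ID` and `FILENAME` are hypothetical placeholders. The URL pattern is the Hugging Face Hub's direct-download (`resolve`) form for dataset repos.

```python
import urllib.request
from pathlib import Path

# Hypothetical placeholders: the PR thread names neither the dataset repo
# nor the uploaded file, so substitute the real values before use.
REPO_ID = "OpenDevin/EXAMPLE-DATASET"
FILENAME = "EXAMPLE_OUTPUT.json"


def hf_dataset_url(repo_id: str, filename: str, revision: str = "main") -> str:
    """Build the direct-download URL for a file in a Hugging Face dataset repo."""
    return f"https://huggingface.co/datasets/{repo_id}/resolve/{revision}/{filename}"


def download(url: str, dest_dir: str = ".") -> Path:
    """Fetch `url` into `dest_dir`, keeping the remote file name."""
    dest = Path(dest_dir) / url.rsplit("/", 1)[-1]
    dest.parent.mkdir(parents=True, exist_ok=True)
    urllib.request.urlretrieve(url, dest)
    return dest


if __name__ == "__main__":
    # Only prints the URL; call download(...) once real names are filled in.
    print(hf_dataset_url(REPO_ID, FILENAME))
```

Passing the printed URL to `wget` or `curl -L -O` would serve the same purpose as `download(...)`.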


with open(os.path.join(output_dir, "fail_output.json"), "w") as fail_file:
    json.dump(failed_files_info, fail_file, indent=4)

Member

I'm debating whether we want to make this two separate files, or just one file -- how about we merge them into one, and add an additional bool field like devin_pass?

Contributor Author

It only takes ~1 minute to fetch and process the files. The purpose of having two files is that you can start directly from the passed files for pilot testing. I can generate another merged file and upload it to HF.

Member

Having both options is good! Maybe we can add an argument to the script to switch that behavior, and we can upload both versions to HF and let users decide which one they want to download.

Contributor Author

sounds good!
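
The behavior agreed on in this thread (two separate files by default, or one merged file with a `devin_pass` field, selected by a script argument) might look roughly like the sketch below. It is not the script from this PR. It assumes the passed and failed results are lists of dicts. Only `fail_output.json` and `devin_pass` come from the discussion; `pass_output.json`, `merged_output.json`, and the `--merge` and `--output-dir` flags are hypothetical names.

```python
import argparse
import json
import os


def merge_outputs(passed, failed):
    """Combine both result lists, tagging each record with a `devin_pass` bool."""
    merged = [{**rec, "devin_pass": True} for rec in passed]
    merged += [{**rec, "devin_pass": False} for rec in failed]
    return merged


def write_outputs(passed, failed, output_dir, merge=False):
    """Write one merged JSON file or two separate ones; return the paths written."""
    os.makedirs(output_dir, exist_ok=True)
    if merge:
        # Hypothetical file name for the merged variant.
        targets = {"merged_output.json": merge_outputs(passed, failed)}
    else:
        # "pass_output.json" is assumed; "fail_output.json" appears in the PR.
        targets = {"pass_output.json": passed, "fail_output.json": failed}
    paths = []
    for name, records in targets.items():
        path = os.path.join(output_dir, name)
        with open(path, "w") as f:
            json.dump(records, f, indent=4)
        paths.append(path)
    return paths


def parse_args(argv=None):
    """Parse the (hypothetical) command-line switches for the output mode."""
    parser = argparse.ArgumentParser(description="Write Devin outputs for evaluation.")
    parser.add_argument("--output-dir", default="outputs")
    parser.add_argument("--merge", action="store_true",
                        help="write a single file with a devin_pass field")
    return parser.parse_args(argv)
```

With this layout, `write_outputs(passed, failed, args.output_dir, merge=args.merge)` keeps the two-file default for pilot testing on passed cases only, while `--merge` produces the single file.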

xingyaoww left a comment

Member

LGTM!

xingyaoww merged commit dc88dac into OpenHands:main on Mar 21, 2024
xingyaoww added a commit to xingyaoww/OpenHands that referenced this pull request Mar 22, 2024
JustinLin610 pushed a commit that referenced this pull request Mar 22, 2024
* a starting point for SWE-Bench evaluation with docker

* fix the swe-bench uid issue

* typo fixed

* fix conda missing issue

* move files based on new PR

* Update doc and gitignore using devin prediction file from #81

* fix typo

* add a sentence

* fix typo in path

* fix path

---------

Co-authored-by: Binyuan Hui <binyuan.hby@alibaba-inc.com>
xcodebuild pushed a commit to xcodebuild/OpenDevin that referenced this pull request Mar 31, 2024
…penHands#81)

* adding code to fetch and convert devin's output for evaluation

* update README.md

* update code for fetching and processing devin's outputs

* update code for fetching and processing devin's outputs
xcodebuild pushed a commit to xcodebuild/OpenDevin that referenced this pull request Mar 31, 2024
* a starting point for SWE-Bench evaluation with docker

* fix the swe-bench uid issue

* typo fixed

* fix conda missing issue

* move files based on new PR

* Update doc and gitignore using devin prediction file from OpenHands#81

* fix typo

* add a sentence

* fix typo in path

* fix path

---------

Co-authored-by: Binyuan Hui <binyuan.hby@alibaba-inc.com>
malhotra5 pushed a commit that referenced this pull request Mar 25, 2026
Co-authored-by: openhands <openhands@all-hands.dev>
zxkane added a commit to zxkane/OpenHands that referenced this pull request May 14, 2026
Swaps DefaultUserAuth with CognitoUserAuth.
Swaps FileSettingsStore with CognitoS3SettingsStore.
Swaps FileSecretsStore with CognitoS3SecretsStore.
All for multi-tenant user isolation.

Custom modules (cognito_user_auth.py, s3_settings_store.py, s3_secrets_store.py)
are dropped into /app/openhands/app_server/{user_auth,settings,secrets}/ at
Docker build time by openhands-infra/docker/Dockerfile (PR OpenHands#81).

V1 port of v1.6.0-fargate commit 00130ab. server_config.py is V0-tagged
upstream but the settings_store_class / secret_store_class / user_auth_class
fields are still active in v1.7.0 — they drive get_impl() in shared.py to
resolve the configured V1 ABC subclasses.

Refs: zxkane/openhands-infra#81
zxkane added a commit to zxkane/OpenHands that referenced this pull request May 16, 2026
The Fargate sandbox orchestrator stamps each DynamoDB sandbox record with
USER_ID from the start request environment so OpenResty can later enforce
cross-user runtime authorization (the runtime subdomain proxy checks that
the requesting user matches the sandbox owner).

Upstream RemoteSandboxService.start_sandbox knows the user_id (it stores
it as created_by_user_id) but never forwards it into the /start environment.
Result: DDB user_id="anonymous", OpenResty ownership check is skipped,
and any authenticated user can hit any runtime URL.

Inject environment["USER_ID"] = user_id right after _init_environment
returns. Fixes the cross-user runtime denial regression observed in PR OpenHands#81
staging E2E (TC-011).