
feat: add SWE-bench fullset support #3477

Merged
merged 7 commits into from
Sep 3, 2024

Conversation

xingyaoww
Contributor

@xingyaoww xingyaoww commented Aug 19, 2024

What is the problem that this fixes or functionality that this introduces? Does it fix any open issues?


Give a summary of what the PR does, explaining any non-trivial design decisions

  • Support setting the SWE-Bench dataset and split for evaluation via a command-line argument.
  • Update the documentation so people can run run_infer.sh directly without first pulling instance Docker images; run_infer.sh now pulls them automatically.
  • Add a script, with documentation, for pushing new SWE-Bench images to a remote registry.
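The dataset/split selection described in the first bullet can be sketched roughly as follows. This is a minimal, hypothetical sketch: the `--set` flag and the `resolve_dataset` helper are illustrative, though the `'full-test'` / `'lite-test'` values and dataset ids mirror the snippet quoted later in this conversation.

```python
import argparse

# Hypothetical mapping from a --set value to a Hugging Face dataset id
# and split; values mirror the branches quoted later in this thread.
DATASETS = {
    'full-test': ('princeton-nlp/SWE-bench', 'test'),
    'lite-test': ('princeton-nlp/SWE-bench_Lite', 'test'),
}

def resolve_dataset(set_name):
    """Return (dataset_id, split) for a chosen evaluation set."""
    try:
        return DATASETS[set_name]
    except KeyError:
        raise ValueError(f'unknown set: {set_name!r}')

parser = argparse.ArgumentParser()
parser.add_argument('--set', choices=sorted(DATASETS), default='lite-test')
```

The resolved pair would then be passed to `datasets.load_dataset(dataset_id, split=split)` to fetch the instances to evaluate.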

Other references

Collaborator

@yufansong yufansong Aug 19, 2024


QQ: does the evaluation need so many images? 🙀

Contributor Author


One image per instance, so yes 😢 - that's why we need good infra to run this at scale

Collaborator


Fine, that is crazy.


Thanks @xingyaoww . This is exactly what I was looking for...

    from datasets import load_dataset  # Hugging Face datasets library

    if args.set == 'full-test':
        dataset = load_dataset('princeton-nlp/SWE-bench', split='test')
    elif args.set == 'lite-test':
        dataset = load_dataset('princeton-nlp/SWE-bench_Lite', split='test')

It would be awesome if you could add 'princeton-nlp/SWE-bench', split='dev' and 'princeton-nlp/SWE-bench_Lite', split='dev' as well :)
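The requested dev splits could be handled by extending the same dispatch into a lookup table. This is a hypothetical sketch: the `'full-dev'` / `'lite-dev'` set names are illustrative, while the dataset ids and existing test entries come from the snippet above.

```python
# Hypothetical extension covering dev splits alongside the existing
# test splits; 'full-dev' / 'lite-dev' are illustrative set names.
SETS = {
    'full-test': ('princeton-nlp/SWE-bench', 'test'),
    'lite-test': ('princeton-nlp/SWE-bench_Lite', 'test'),
    'full-dev': ('princeton-nlp/SWE-bench', 'dev'),
    'lite-dev': ('princeton-nlp/SWE-bench_Lite', 'dev'),
}

def pick_dataset(set_name):
    """Return (dataset_id, split); raises KeyError for unknown sets."""
    return SETS[set_name]
```

A table like this keeps the if/elif chain from growing with every new dataset/split combination.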

Contributor Author

@xingyaoww xingyaoww Sep 2, 2024


Hey @jatinganhotra, I tried to look into this by adding dev set -- but was blocked by princeton-nlp/SWE-bench#199. Will try to add support for dev again once that issue is resolved.


Being addressed by #3478

@xingyaoww xingyaoww marked this pull request as ready for review September 2, 2024 18:18
@neubig
Contributor

neubig commented Sep 3, 2024

Looks good, thanks!

@neubig neubig merged commit d283420 into main Sep 3, 2024
17 checks passed
@neubig neubig deleted the xw/add-swebench-fullset branch September 3, 2024 00:28