Will ReFact also join LiveSWEBench? #796

Unanswered

BradKML asked this question in Q&A

BradKML
May 30, 2025

Since people are now accusing multiple AI tools of "test-hacking", it might be good to start testing on LiveSWEBench: https://github.com/livebench/liveswebench https://www.kprize.ai/

The same logic applies to LiveBench/LiveCodeBench vs BigCodeBench/EvalPlus https://livebench.ai/ https://livecodebench.github.io/leaderboard.html

Replies: 1 comment

JegernOUTT
May 30, 2025
Collaborator

Thank you for the links.

Right now, we're evaluating on SWE-multimodal (which also has a closed test set).
After that, we were thinking of evaluating on one of the multilingual eval sets (https://www.swebench.com/multilingual.html / https://multi-swe-bench.github.io/ / https://amazon-science.github.io/SWE-PolyBench/) to debug our pipeline with other languages.

I believe that after that we can try out https://github.com/livebench/liveswebench.

