Love the progress so far!
Will you guys test and publish results for the full SWE-bench and the 25% subset, in addition to SWE-bench Lite?
The auto-code-rover repo reports ACR at 22% on SWE-bench Lite and 16% on the full SWE-bench. However, you list ACR at 16% on SWE-bench Lite. Is that the result you got, or a typo?