@jayhack
Contributor

@jayhack jayhack commented Feb 20, 2025

Motivation

Adds a SWE Bench Harness to the codegen agent.

Content

  • Loads the SWE Bench dataset
  • For each entry in the dataset, a Modal instance is created in which an agent can run
  • The output of each agent is stored and tested on Modal using swebench
  • Adds documentation to the README
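
As a rough illustration of the flow listed above (one agent run per dataset entry, with each output stored for later testing), here is a minimal sketch. It is not the code from this PR: `run_harness` and `stub_agent` are hypothetical names, the agent runs locally as a plain callable instead of inside a Modal instance, and the entries are passed in as plain dicts rather than loaded from the real dataset. The stored fields (`instance_id`, `model_name_or_path`, `model_patch`) follow the prediction format that swebench evaluates.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable, Dict, Iterable, List

Entry = Dict[str, str]


def run_harness(
    entries: Iterable[Entry],
    run_agent: Callable[[Entry], str],
    output_dir: Path,
    model_name: str = "example-agent",  # hypothetical label, not from the PR
) -> List[Dict[str, str]]:
    """Run one agent per entry and store each result as a prediction file."""
    output_dir.mkdir(parents=True, exist_ok=True)
    predictions = []
    for entry in entries:
        # Assumption: in the PR this step runs remotely, one instance per
        # entry. Here it is a local function call for illustration.
        patch = run_agent(entry)
        prediction = {
            "instance_id": entry["instance_id"],
            "model_name_or_path": model_name,
            "model_patch": patch,
        }
        # Store each agent's output so it can be evaluated afterwards.
        out_file = output_dir / f"{entry['instance_id']}.json"
        out_file.write_text(json.dumps(prediction, indent=2))
        predictions.append(prediction)
    return predictions


def stub_agent(entry: Entry) -> str:
    """Hypothetical stand-in for the agent: returns a placeholder patch."""
    return f"# placeholder patch for {entry['instance_id']}\n"


if __name__ == "__main__":
    # Made-up entry for demonstration; real entries come from the dataset.
    demo_entries = [{"instance_id": "demo__repo-1", "problem_statement": "..."}]
    with tempfile.TemporaryDirectory() as tmp:
        results = run_harness(demo_entries, stub_agent, Path(tmp) / "predictions")
        print(f"stored {len(results)} prediction(s)")
```

Passing the agent in as a callable keeps the loop independent of where the agent actually runs, so a remote call could replace `stub_agent` without changing how predictions are stored.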

Contributions from:

Please check the following before marking your PR as ready for review

  • I have updated the documentation or added new documentation as needed

@jayhack jayhack requested review from a team and codegen-team as code owners February 20, 2025 21:12
Contributor Author

@jayhack jayhack left a comment

Nice! Left a few comments

@codecov

codecov bot commented Feb 20, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

✅ All tests successful. No failed tests found.

@jemeza-codegen jemeza-codegen enabled auto-merge (squash) February 21, 2025 00:22
@jayhack jayhack assigned jayhack and jemeza-codegen and unassigned jayhack Feb 21, 2025
Contributor

@jemeza-codegen jemeza-codegen left a comment

looks good

@jemeza-codegen jemeza-codegen merged commit 410ee85 into develop Feb 21, 2025
25 of 26 checks passed
@jemeza-codegen jemeza-codegen deleted the jmeza-swe-bench-harness branch February 21, 2025 00:33
@github-actions
Contributor

🎉 This PR is included in version 0.29.0 🎉

The release is available on GitHub release

Your semantic-release bot 📦🚀

4 participants