Integrate MultiPL-E #44
I've made some notes, but this is overall LGTM.
I ran a small number of problems with both Python and C++.
This PR pulls out code that we normally run inside the MultiPL-E evaluation container. I think the easiest way to address the dependency problem is the following:
Yes, exactly! I'll upload some code and instructions for using the container.
Re: performance issues. I have obtained the following results for Python and Java on HumanEval:
These are consistent with the previously self-reported numbers (off by less than 0.02).
After this fix, the implementation matches the original MultiPL-E on all scores, including pass@100.
Merging the PR 🥳
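For readers comparing scores such as pass@100, here is a minimal sketch of the standard unbiased pass@k estimator, 1 - C(n-c, k) / C(n, k), where n is the number of generations sampled for a problem and c is the number that pass the tests. It is not taken from this PR; the function names `pass_at_k` and `mean_pass_at_k` are hypothetical, and the harness's actual implementation may differ.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem.

    n: generations sampled, c: generations that passed, k: budget (k <= n).
    Names and signature are illustrative, not the harness API.
    """
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("expected 0 <= c <= n and 1 <= k <= n")
    # comb(n - c, k) is 0 when fewer than k samples failed, which gives 1.0.
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average pass@k over problems, given (n, c) pairs per problem."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)
```

Note that pass@100 can only be estimated when at least 100 generations are sampled per problem, so small discrepancies between implementations often come from how n and c are counted rather than from the formula itself.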
Integration of the MultiPL-E version of HumanEval in 18 programming languages