Integrate MultiPL-E #44

Merged · 27 commits · Apr 22, 2023

Conversation

@loubnabnl (Collaborator) commented Feb 8, 2023

Integration of the MultiPL-E HumanEval version in 18 programming languages.

@arjunguha (Contributor) left a comment

I've made some notes, but overall this LGTM.

I ran a small number of problems with both Python and C++.

4 resolved review comments on lm_eval/tasks/multiple.py
@arjunguha (Contributor)

This PR pulls out code that we normally run inside the MultiPL-E evaluation container.

I think the easiest way to address the dependency problem is the following:

  1. Tell the user: "you had better have the dependencies installed!" (a minimal pre-flight check is sketched below).
  2. We can give them a container with both the PL toolchains and the eval-harness dependencies installed, along with some instructions on how to run commands in a container.
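
To illustrate option 1, here is a minimal, hypothetical pre-flight check (not part of this PR); the executable names below are assumptions and would need to match whatever MultiPL-E's per-language evaluators actually invoke:

```python
import shutil

# Hypothetical mapping from a MultiPL-E language suffix to the executable its
# evaluator needs on PATH; the entries here are illustrative assumptions.
REQUIRED_TOOLCHAINS = {
    "py": "python3",
    "cpp": "g++",
    "java": "javac",
    "js": "node",
}

def check_toolchain(lang: str) -> None:
    """Fail loudly if the toolchain needed to execute `lang` generations is missing."""
    exe = REQUIRED_TOOLCHAINS.get(lang)
    if exe is None:
        return  # language not covered by the illustrative table above
    if shutil.which(exe) is None:
        raise RuntimeError(
            f"multiple-{lang} needs `{exe}` on PATH to execute generations; "
            "install it locally or run the harness inside the MultiPL-E "
            "evaluation container."
        )

# Example: check_toolchain("cpp") before executing C++ generations.
```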

@loubnabnl (Collaborator, Author)

Yes, exactly! I'll upload some code and instructions for using the container.

@loubnabnl marked this pull request as ready for review on March 14, 2023 at 16:24.
@ytzi commented Mar 31, 2023

Re: performance issues.

I have obtained the following results for Python and Java on HumanEval:

Python:
  pass@1:   0.181  (temp 0.2)
  pass@10:  0.284  (temp 0.8)
  pass@100: 0.466  (temp 0.8)
Java:
  pass@1:   0.143  (temp 0.2)
  pass@10:  0.252  (temp 0.8)
  pass@100: 0.416  (temp 0.8)

These are pretty consistent with previously self-reported numbers (off by < 0.02).
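
For context on how such numbers are computed, pass@k is typically estimated per problem with the unbiased estimator from the Codex/HumanEval paper and then averaged over problems. A minimal sketch (the harness's actual code may differ in details; the sample counts below are made up):

```python
import numpy as np

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator (Chen et al., 2021):
    1 - C(n-c, k) / C(n, k), for n samples of which c pass the tests."""
    if n - c < k:
        return 1.0
    return 1.0 - np.prod(1.0 - k / np.arange(n - c + 1, n + 1))

# Example: with n_samples = 200 per problem, average over problems.
correct_counts = [120, 37, 0, 200]  # hypothetical per-problem pass counts
print(np.mean([pass_at_k(200, c, 100) for c in correct_counts]))  # pass@100
```

This is why a large n_samples (e.g. 200) matters for pass@100: the estimator needs n well above k to have low variance.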

@loubnabnl (Collaborator, Author) commented Apr 22, 2023

This implementation now matches the original MultiPL-E on all scores, including pass@100, after this fix:

{
  "multiple-py": {
    "pass@10": 0.29917045146858745,
    "pass@100": 0.4996997700167089
  },
  "config": {
    "model": "bigcode/santacoder",
    "temperature": 0.8,
    "n_samples": 200
  }
}

merging the PR 🥳

@loubnabnl merged commit 3ad3b8d into bigcode-project:main on Apr 22, 2023 (1 check passed).
@loubnabnl mentioned this pull request on Apr 30, 2023.