Add MMLU downstream tasks #468

OyvindTafjord · 2024-02-28T01:12:13Z

Together with @yulinggu-cs, we added the following tasks from MMLU to the in-loop downstream evaluation suite:

Validation sets:

mmlu_stem (321 qs)
mmlu_humanities (518 qs)
mmlu_social_sciences (337 qs)
mmlu_other (355 qs)

Test sets:

mmlu_stem_test (3018 qs)
mmlu_humanities_test (4705 qs)
mmlu_social_sciences_test (3077 qs)
mmlu_other_test (3242 qs)

We also updated the metric_type for boolq and arc_challenge from "pmi_dc" to "acc" and "len_norm", respectively. These metrics are generally correlated, although ideally "pmi_dc" would be used for several datasets once it's properly implemented. boolq should be fine with "acc", though.
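To make the difference between the three metric types concrete, here is a minimal sketch of how each one could score a single answer choice from per-token log-probabilities. The function names, the character-length normalization for "len_norm", and the use of a separate domain-context pass for "pmi_dc" are assumptions for illustration, not necessarily how the evaluation code in this repo implements them.

```python
from typing import List, Optional, Sequence, Tuple


def score_continuation(
    metric_type: str,
    cont_logprobs: List[float],
    cont_text: str,
    dc_logprobs: Optional[List[float]] = None,
) -> float:
    """Score one answer choice; higher is better (hypothetical helper)."""
    total = sum(cont_logprobs)  # log P(continuation | context)
    if metric_type == "acc":
        # Raw summed log-likelihood.
        return total
    if metric_type == "len_norm":
        # Assumption: normalize by the continuation's length in characters.
        return total / len(cont_text)
    if metric_type == "pmi_dc":
        # Subtract log P(continuation | domain context), e.g. just "Answer:".
        if dc_logprobs is None:
            raise ValueError("pmi_dc needs domain-context log-probabilities")
        return total - sum(dc_logprobs)
    raise ValueError(f"unknown metric_type: {metric_type}")


def predict(
    metric_type: str,
    choices: Sequence[Tuple[str, List[float], List[float]]],
) -> int:
    """Return the index of the highest-scoring choice."""
    scores = [
        score_continuation(metric_type, logprobs, text, dc)
        for text, logprobs, dc in choices
    ]
    return max(range(len(scores)), key=scores.__getitem__)


# Made-up numbers: (text, logprobs given the question, logprobs given domain context)
choices = [
    ("yes", [-1.0], [-0.2]),
    ("a much longer answer", [-0.5, -0.5, -0.5], [-1.0, -1.0, -1.0]),
]
```

With these made-up numbers, "acc" prefers the short answer, while "len_norm" and "pmi_dc" both prefer the longer one. That is why the choice of metric matters more for tasks with answers of varying length than for yes/no tasks like boolq.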

Note that this adds some processing time at startup.

OyvindTafjord · 2024-02-28T01:34:17Z

There's one failing type check that I can't quite figure out: it doesn't like dataset_name: Optional[Union[str, List[str]]] = None combined with the subsequent assignment.
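One common way this kind of check fails is reassigning a parameter to a value outside the type it was declared with. The sketch below shows a pattern that usually avoids it: leave the parameter alone and build a new, explicitly annotated local. The function name and the "[None] means default config" convention are hypothetical, and this is not necessarily the fix that ended up in this PR.

```python
from typing import List, Optional, Union


def normalize_dataset_names(
    dataset_name: Optional[Union[str, List[str]]] = None,
) -> List[Optional[str]]:
    """Return a list of dataset config names (hypothetical helper).

    A single name or None becomes a one-element list, so callers can
    always iterate over the result.
    """
    # Annotate the new variable instead of reassigning `dataset_name`,
    # so the type checker never sees a value outside the declared type.
    names: List[Optional[str]] = []
    if isinstance(dataset_name, list):
        names.extend(dataset_name)
    else:
        names.append(dataset_name)
    return names
```

The isinstance check narrows the union in each branch, so no cast or type-ignore comment should be needed.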

dirkgr

That's all it takes? Is it because hails/mmlu_no_train already has the data in a workable format?

OyvindTafjord · 2024-02-28T19:09:50Z

The HF dataset (either hails/mmlu_no_train or the equivalent cais/mmlu, which is larger because it also includes random training data) has instances in the format shown here, so we just need to extract the appropriate pieces, as for other datasets (like ARC etc.).
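As a rough illustration of "extracting the appropriate pieces", the sketch below turns one MMLU instance into a context, a list of candidate continuations, and a gold label index. It assumes the instance has "question", "choices", and "answer" fields (with "answer" an integer index into "choices"). The function name and the prompt template are hypothetical and may differ from what the PR uses.

```python
from typing import Any, Dict, List


def mmlu_instance_to_example(instance: Dict[str, Any]) -> Dict[str, Any]:
    """Convert one MMLU instance into a multiple-choice example (sketch)."""
    choices: List[str] = instance["choices"]
    return {
        # Hypothetical prompt template; the real one may differ.
        "context": "Question: " + instance["question"] + "\nAnswer:",
        # Each choice is scored as a continuation of the context.
        "continuations": [" " + choice for choice in choices],
        # Assumption: "answer" is the integer index of the correct choice.
        "label": int(instance["answer"]),
    }


# Made-up instance in the assumed format.
sample = {
    "question": "What is 2 + 2?",
    "subject": "elementary_mathematics",
    "choices": ["3", "4", "5", "6"],
    "answer": 1,
}
example = mmlu_instance_to_example(sample)
```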

dirkgr · 2024-02-28T19:38:02Z

MMLU also needs to be added to the configs, right?

OyvindTafjord added 5 commits February 27, 2024 16:32

Add support for task arguments

faa0de6

Add MMLU downstream tasks

d4f311e

Update metric type for boolq and arc challenge

54adc50

Fix bug with dataset_name is None

751a15d

Fix simple bug

a13e8c5

OyvindTafjord requested a review from epwalsh February 28, 2024 01:12

OyvindTafjord added 3 commits February 27, 2024 17:22

Style fixes

5b95787

Try to make type checker happy

b308302

Update CHANGELOG.md

211de2a

Update to new label_to_task_map format

f572add

dirkgr approved these changes Feb 28, 2024

dirkgr and others added 4 commits February 28, 2024 11:10

Make mypy happy

bae59a1

Make isort happy

9b80a0f

Bruh

c074f0b

Merge branch 'mmlu-downstream' of https://github.com/allenai/LLM into…

079616b

… mmlu-downstream

OyvindTafjord merged commit 67d24f5 into main Feb 28, 2024
11 checks passed

OyvindTafjord deleted the mmlu-downstream branch February 28, 2024 21:28
