Group agg rework #1741

Merged
merged 127 commits into from
Jul 8, 2024

lintangsutawika
Contributor

@lintangsutawika commented on Apr 23, 2024

  1. By default, a group no longer reports an aggregate score over its subtasks.
  2. To report an aggregate, define a group_config in the group's YAML with two fields: aggregate_metric (True/False, default False) and weight_by_size (True/False, default False).
  3. ConfigurableGroup and ConfigurableTask now use a task_id as their identifier instead of the task name or group name.
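To make points 1 to 3 concrete, here is a minimal Python sketch of how a group-level score could be computed under the two flags. The GroupConfig and SubtaskResult classes, the aggregate_group helper, and the example subtask ids and numbers are hypothetical illustrations, not the harness's actual API. The sketch also assumes that weight_by_size means weighting each subtask by its number of evaluated samples.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class GroupConfig:
    # Hypothetical class; field names follow the PR description.
    aggregate_metric: bool = False  # report a group-level score at all?
    weight_by_size: bool = False    # weight subtasks by sample count? (assumed meaning)


@dataclass
class SubtaskResult:
    score: float  # the subtask's metric value, e.g. accuracy
    size: int     # number of evaluated samples in the subtask


def aggregate_group(
    config: GroupConfig, results: Dict[str, SubtaskResult]
) -> Optional[float]:
    """Return the group's aggregate score, or None when aggregation is off.

    `results` is keyed by a task_id string rather than a bare task name,
    mirroring point 3 (the id format used below is an assumption).
    """
    if not config.aggregate_metric or not results:
        # New default: the group shows no aggregate of its own.
        return None
    if config.weight_by_size:
        # Micro-average: larger subtasks count proportionally more.
        total = sum(r.size for r in results.values())
        return sum(r.score * r.size for r in results.values()) / total
    # Macro-average: every subtask counts equally.
    return sum(r.score for r in results.values()) / len(results)


# Hypothetical subtask ids and scores, for illustration only.
results = {
    "mmlu_abstract_algebra": SubtaskResult(score=0.30, size=100),
    "mmlu_anatomy": SubtaskResult(score=0.60, size=300),
}
print(aggregate_group(GroupConfig(), results))
print(aggregate_group(GroupConfig(aggregate_metric=True), results))
print(aggregate_group(GroupConfig(aggregate_metric=True, weight_by_size=True), results))
```

With the default config the group reports nothing. With aggregate_metric alone, the two subtasks count equally (about 0.45). Adding weight_by_size pulls the result toward the larger subtask (0.525).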

@lintangsutawika marked this pull request as ready for review on April 25, 2024 18:05
@lintangsutawika
Contributor Author

@haileyschoelkopf So far I've added the group_config only to the MMLU tasks and flan_held_in. Let me know which other benchmarks need the aggregation added back in.

lm_eval/evaluator.py (outdated, resolved)
lm_eval/api/task.py (outdated, resolved)
lm_eval/api/task.py (outdated, resolved)
@lintangsutawika mentioned this pull request on Jul 3, 2024
@haileyschoelkopf
Contributor

The test failures come from the changed printing tests; they appear to compare against the old files from main in tests/testdata. Merging!

@haileyschoelkopf merged commit 517aadc into main on Jul 8, 2024
4 of 9 checks passed
@haileyschoelkopf deleted the group-agg-rework branch on July 8, 2024 15:12