Update README.md
haileyschoelkopf committed Jun 15, 2023
1 parent dd72456 commit 3c72b24
Showing 1 changed file (README.md) with 15 additions and 0 deletions.
@@ -1,5 +1,20 @@
# Language Model Evaluation Harness

## Notice to Users
(as of 6/15/23)
We have a revamp of the Evaluation Harness library internals staged on the `big-refactor` branch! It is well along in development, but before we move the `master` branch of the repository over to this new design with a new version release, we'd like to make sure it has been tested by outside users and has no glaring bugs!

We’d like your help testing it out! You can help by:
1. Trying out your current workloads on the `big-refactor` branch and checking whether anything breaks or is counterintuitive, and
2. Porting tasks supported in the previous version of the harness to the new YAML configuration format described in https://github.com/EleutherAI/lm-evaluation-harness/blob/big-refactor/docs/new_task_guide.md.

If you choose to port a task that is not yet marked as completed in https://github.com/EleutherAI/lm-evaluation-harness/blob/big-refactor/lm_eval/tasks/README.md, you can contribute it by opening a PR with `[Refactor]` in its title that includes:
- A shell command to run the task on the `master` branch, and the resulting score
- A shell command to run the task on your PR branch (targeting `big-refactor`), and the resulting score, to show that the two implementations produce matching results!
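To check the two runs against each other before opening the PR, a small comparison script can help. The following is a minimal sketch, assuming you have copied each run's per-task metric scores into nested dicts of the form `{task_name: {metric_name: score}}`; the `compare_runs` helper, that dict layout, and the default tolerance are all hypothetical and not part of the harness itself. It reports any task or metric that is missing from the `big-refactor` run or whose score differs by more than the tolerance.

```python
from math import isclose
from typing import Dict, List

# Hypothetical layout (an assumption, not the harness's output format):
# {task_name: {metric_name: score}}
Results = Dict[str, Dict[str, float]]


def compare_runs(master: Results, refactor: Results, abs_tol: float = 1e-4) -> List[str]:
    """Return human-readable mismatches between two runs; an empty list means parity."""
    problems: List[str] = []
    for task, master_metrics in master.items():
        refactor_metrics = refactor.get(task)
        if refactor_metrics is None:
            problems.append(f"{task}: missing from big-refactor run")
            continue
        for metric, old in master_metrics.items():
            new = refactor_metrics.get(metric)
            if new is None:
                problems.append(f"{task}/{metric}: missing from big-refactor run")
            # Compare with an absolute tolerance, since scores may not match bit-for-bit.
            elif not isclose(old, new, rel_tol=0.0, abs_tol=abs_tol):
                problems.append(f"{task}/{metric}: master={old} big-refactor={new}")
    return problems


if __name__ == "__main__":
    # Placeholder numbers for illustration only, not real harness scores.
    master_run = {"example_task": {"acc": 0.5}}
    refactor_run = {"example_task": {"acc": 0.5}}
    print(compare_runs(master_run, refactor_run) or "scores match")
```

The tolerance is a judgment call: the README asks for matching scores, and a small absolute tolerance only absorbs floating-point noise, so any real discrepancy still shows up in the report.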

Lastly, while we carry out the switch to the new version over the next week, we will no longer accept new feature requests for the `master` branch beyond those already open, though we will still accept bugfixes to the `master` branch and PRs to `big-refactor`! Feel free to reach out in the #lm-thunderdome channel of the EAI Discord for more information!


## Overview

This project provides a unified framework to test generative language models on a large number of different evaluation tasks.