Closed
Description
Two goals:
- have quantitative measurements for the paper's Broader Impact section
- reflect these measurements in the model cards when we release checkpoints
Four held-out evaluation sets identified by the Evaluation WG:
- jigsaw_toxicity_pred
- crows_pairs
- winogender (AXG in SuperGLUE)
- winobias
Current status: crows_pairs and winobias are already prompted, jigsaw_toxicity_pred has an open PR (#451) that we need to review, and winogender still needs to be prompted.
Workflow:
- prompt the datasets that have not been prompted yet
- make sure these are actually cached (@VictorSanh can do this caching step fairly quickly)
- run the evaluation (normal and score-rank evaluation), unless some of these sets need a special evaluation of their own
- once we know exactly which checkpoints to evaluate, get the final numbers to report
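For readers unfamiliar with the "score rank" step, here is a minimal sketch. It assumes that score-rank evaluation means rank classification: the model assigns a score (for example a log-likelihood) to each answer choice of a prompted example, and the highest-scoring choice is taken as the prediction. The function names `rank_predict` and `rank_accuracy` are hypothetical, and computing the per-choice scores from an actual checkpoint is out of scope here.

```python
from typing import Sequence


def rank_predict(choice_scores: Sequence[float]) -> int:
    """Return the index of the highest-scoring answer choice.

    Assumption: scores are per-choice model scores such as log-likelihoods,
    where higher means the model prefers that choice. Ties go to the first
    choice with the maximum score.
    """
    if not choice_scores:
        raise ValueError("need at least one answer choice")
    return max(range(len(choice_scores)), key=lambda i: choice_scores[i])


def rank_accuracy(all_scores: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Fraction of examples whose top-ranked choice matches the gold label index."""
    if len(all_scores) != len(labels):
        raise ValueError("all_scores and labels must have the same length")
    if not labels:
        raise ValueError("need at least one example")
    correct = sum(rank_predict(scores) == label for scores, label in zip(all_scores, labels))
    return correct / len(labels)


if __name__ == "__main__":
    # Hypothetical log-likelihoods for three two-choice examples.
    scores = [[-4.2, -1.3], [-0.5, -2.0], [-3.0, -3.5]]
    gold = [1, 0, 1]
    print(rank_accuracy(scores, gold))
```

With these made-up numbers the predictions are 1, 0 and 0, so two of the three examples are correct. A ranking approach like this never requires the model to generate free text that exactly matches an answer string, which is one reason it is often used alongside normal (generation-based) evaluation.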