Bias/fairness quantitative measurements

Two goals:
- have quantitative measurements for the paper's Broader impact section
- reflect these in the model cards when we release checkpoints

Four held-out evaluation sets identified by the Evaluation WG:
- jigsaw_toxicity_pred
- crows_pairs
- winogender (AXG in SuperGLUE)
- winobias

Current state: `crows_pairs` and `winobias` are prompted, `jigsaw_toxicity_pred` has an open PR (#451) that we need to check, and `winogender` still needs to be prompted.

Workflow:
- [x] prompt the datasets that have not been prompted yet
- [x] make sure these are actually cached (@VictorSanh can have this caching step done fairly quickly)
- [x] run the evaluation (normal and score rank evaluation), or check whether these datasets need a special evaluation
- [x] once we know exactly which checkpoints to evaluate, get the final numbers to report
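
For reference, here is a minimal sketch of what the score rank evaluation step could look like. It assumes that "score rank" means scoring every answer choice for an example (for instance with a log-likelihood) and predicting the highest-scoring one; that reading, the function names `rank_predict` and `rank_accuracy`, and the toy scores are all illustrative assumptions, not the actual evaluation code.

```python
from typing import Sequence


def rank_predict(choice_scores: Sequence[float]) -> int:
    """Return the index of the highest-scoring answer choice (first one on ties)."""
    if not choice_scores:
        raise ValueError("each example needs at least one answer choice")
    return max(range(len(choice_scores)), key=lambda i: choice_scores[i])


def rank_accuracy(all_scores: Sequence[Sequence[float]], gold: Sequence[int]) -> float:
    """Fraction of examples whose top-ranked choice matches the gold index."""
    if len(all_scores) != len(gold):
        raise ValueError("all_scores and gold must have the same length")
    if not gold:
        return 0.0
    correct = sum(rank_predict(scores) == label for scores, label in zip(all_scores, gold))
    return correct / len(gold)


# Toy, made-up per-choice scores (higher is better), one inner list per example.
toy_scores = [[-1.2, -0.3], [-0.5, -2.0], [-0.9, -0.4]]
toy_gold = [1, 0, 0]
toy_accuracy = rank_accuracy(toy_scores, toy_gold)
```

If one of the four datasets turns out to need a special evaluation, only the scoring that produces `all_scores` and the final metric would need to change; the ranking step itself stays the same.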

Bias/fairness quantitative measurements #477
