Bias/fairness quantitative measurements #477

@VictorSanh

Two goals:

  • have quantitative measurements for the paper's Broader impact section
  • reflect these in the model cards when we release checkpoints

Four held-out evaluation sets identified by the Evaluation WG:

  • jigsaw_toxicity_pred
  • crows_pairs
  • winogender (AXG in SuperGLUE)
  • winobias

Current state, as far as I can see: crows_pairs and winobias are prompted, jigsaw_toxicity_pred has an open PR (#451) that we need to check, and winogender still needs to be prompted.

Workflow:

  • prompt the datasets that have not been prompted yet
  • make sure the prompted datasets are actually cached (@VictorSanh can have this caching step done fairly quickly)
  • run the evaluation (normal and score rank evaluation), unless these datasets need some special evaluation?
  • once we know exactly which checkpoints to evaluate, get the final numbers to report
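
For the score rank evaluation step, here is a rough sketch of what I have in mind, not the actual evaluation code. It assumes a hypothetical `score(prompt, choice)` callable that returns a higher number for a more likely answer choice (for example, a log-likelihood from the model). The names `rank_predict` and `rank_accuracy` are made up for illustration, and the bias datasets may well need their own metric instead of plain accuracy.

```python
from typing import Callable, Iterable, Sequence, Tuple

# Hypothetical scorer type: (prompt, answer choice) -> score, higher is better.
Scorer = Callable[[str, str], float]


def rank_predict(score: Scorer, prompt: str, choices: Sequence[str]) -> int:
    """Return the index of the highest-scoring answer choice."""
    if not choices:
        raise ValueError("at least one answer choice is required")
    scores = [score(prompt, choice) for choice in choices]
    # Ties are resolved in favour of the earliest choice.
    return max(range(len(choices)), key=scores.__getitem__)


def rank_accuracy(
    score: Scorer,
    examples: Iterable[Tuple[str, Sequence[str], int]],
) -> float:
    """Accuracy over (prompt, choices, gold_index) triples."""
    examples = list(examples)
    if not examples:
        raise ValueError("no examples to evaluate")
    correct = sum(
        rank_predict(score, prompt, choices) == gold
        for prompt, choices, gold in examples
    )
    return correct / len(examples)
```

In practice `score` would wrap a checkpoint and the triples would come from the cached prompted datasets; both are left out here because they depend on which checkpoints and prompts we end up evaluating.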
