Model evaluation #1908

@sanagno

Description

We need a better evaluation pipeline to quantify model performance and compare models against each other.
Some ideas include:

  • Evaluating on datasets for which we already have ChatGPT reference answers, e.g. HC3.
  • Using a fine-tuned reward model.
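To illustrate the first idea, here is a minimal sketch of comparing a model's answers against stored ChatGPT references using token-overlap F1. The `token_f1` and `evaluate` names are hypothetical, and token F1 is only one possible similarity metric (the issue does not specify one); a reward-model score could be swapped in for `token_f1` without changing the harness shape.

```python
from collections import Counter

def token_f1(candidate: str, reference: str) -> float:
    """Token-overlap F1 between a model answer and a reference answer.

    A simple, hypothetical stand-in for a proper similarity metric
    or a fine-tuned reward model's score.
    """
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand or not ref:
        return 0.0
    # Count tokens present in both answers (multiset intersection).
    overlap = sum((Counter(cand) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(cand)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)

def evaluate(model_answers: list[str], reference_answers: list[str]) -> float:
    """Mean per-example score of a model against reference answers."""
    scores = [token_f1(m, r) for m, r in zip(model_answers, reference_answers)]
    return sum(scores) / len(scores)
```

With such a harness, two models can be compared by running `evaluate` on the same reference set (e.g. HC3) and comparing the mean scores.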
