
Added model answer generation and evaluation for TruthfulQA. #132

Merged
merged 3 commits into main from andrzej/truthfulqa-generations on Aug 29, 2024

Conversation

TheRootOf3
Collaborator

  • This PR introduces evaluation of model responses for TruthfulQA, in a manner similar to the beavertails eval.
  • Furthermore, it adds additional metrics (such as average model response length and the empty-response ratio) to both the beavertails and TruthfulQA evals.
  • Finally, it adds a new metric that measures for how many prompts the prompt is repeated in the model's answer. This is particularly useful for understanding the high flagged ratio of beavertails answers; a rough sketch of these metrics is given below.

Closes #123.
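
As a rough illustration of the metrics described above (not the actual implementation in this PR), the sketch below computes the average response length, the empty-response ratio, and the fraction of answers that repeat their prompt. The function name and signature are hypothetical.

```python
from statistics import mean

def response_metrics(prompts: list[str], responses: list[str]) -> dict[str, float]:
    """Hypothetical sketch of the diagnostic metrics described in this PR."""
    assert len(prompts) == len(responses)
    n = len(responses)
    lengths = [len(r) for r in responses]
    empty = sum(1 for r in responses if not r.strip())
    # Count an answer as "repeating the prompt" if the prompt text
    # appears verbatim inside the generated response.
    repeated = sum(
        1 for p, r in zip(prompts, responses) if p.strip() and p.strip() in r
    )
    return {
        "avg_response_length": mean(lengths) if lengths else 0.0,
        "empty_response_ratio": empty / n if n else 0.0,
        "prompt_repetition_ratio": repeated / n if n else 0.0,
    }
```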

Signed-off-by: TheRootOf3 <aceszablewski@gmail.com>
Collaborator

@Willmish Willmish left a comment


LGTM, some repetition introduced but fine in this instance as agreed on call

@TheRootOf3 TheRootOf3 merged commit f9b6c52 into main Aug 29, 2024
2 checks passed
@TheRootOf3 TheRootOf3 deleted the andrzej/truthfulqa-generations branch August 29, 2024 20:10
Development

Successfully merging this pull request may close these issues.

Check length of responses+whitespace ratio+repeating question for Retain and Unlearn sets