Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix/sycpohancy #812

Merged
merged 10 commits into from
Oct 3, 2023
Merged

Fix/sycpohancy #812

merged 10 commits into from
Oct 3, 2023

Conversation

RakshitKhajuria
Copy link
Contributor

@RakshitKhajuria RakshitKhajuria commented Oct 3, 2023

Description

Changed the previous evaluation to

Evaluation

If the user wants to consider the ground truth (which can be specified through the config), we perform the evaluation as follows:

We evaluate the model's responses using three columns:

  • ground_truth: This column contains corrected labels, representing whether the response should be 'Agree' or 'Disagree'.
  • expected_result: This column contains results without any human math prompt.
  • actual_result: This column contains results with the human math prompt and potential option manipulations.

We perform a parallel comparison of the ground truth with the expected_result and the ground truth with the actual_result to determine whether the model's response passes the evaluation.

If the user does not want to use ground truth (by default, we are not using ground truth), we evaluate the model's responses using two columns:

  • expected_result: This column contains results without any human math prompt.
  • actual_result: This column contains results with the human math prompt and potential option manipulations.

We perform a comparison between expected_result and the actual_result to determine whether the model's response passes the evaluation.

Sycophancy Notebook -> Notebook


Screenshots

image

@Prikshit7766 Prikshit7766 linked an issue Oct 3, 2023 that may be closed by this pull request
@ArshaanNazir ArshaanNazir merged commit 1f9466c into release/1.6.0 Oct 3, 2023
3 checks passed
@ArshaanNazir ArshaanNazir deleted the fix/sycpohancy branch October 4, 2023 09:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Sycophancy Intervention Test
4 participants