Regarding decontamination #1938

Open
dsdanielpark opened this issue Jun 9, 2024 · 0 comments

First, congratulations and thank you for the amazing project and the fantastic framework that is being used in almost all LLM evaluations. I appreciate you sharing it with us.

Regarding decontamination:

  1. Is the harness's data decontamination still effective?
  2. Are there plans or ongoing efforts to apply strict data decontamination against the current evaluation datasets?

Ultimately, if we aggressively remove contaminated data with the current decontamination method using 8-gram overlap (stricter than 13-gram), can we be sure that results on the current evaluation benchmarks are free from contamination?
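To make the n-gram question concrete, here is a minimal sketch of what I mean by overlap-based decontamination. It is not the harness's implementation: the function names (`ngrams`, `build_index`, `is_contaminated`) are hypothetical, and lowercased whitespace tokenization is an assumption made only for illustration.

```python
from typing import Iterable, Set, Tuple

NGram = Tuple[str, ...]


def ngrams(text: str, n: int) -> Set[NGram]:
    """Return the set of word-level n-grams in `text`.

    Assumption: lowercased whitespace tokenization, chosen for brevity.
    A text shorter than n tokens yields an empty set.
    """
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def build_index(train_docs: Iterable[str], n: int) -> Set[NGram]:
    """Collect every n-gram that appears anywhere in the training documents."""
    index: Set[NGram] = set()
    for doc in train_docs:
        index |= ngrams(doc, n)
    return index


def is_contaminated(eval_doc: str, train_index: Set[NGram], n: int) -> bool:
    """Flag an evaluation document if it shares at least one n-gram with the index."""
    return not ngrams(eval_doc, n).isdisjoint(train_index)


# Hypothetical data: the two texts share a 10-token span.
train = ["the quick brown fox jumps over the lazy dog near the river bank today"]
leaked = "question: the quick brown fox jumps over the lazy dog near what"
paraphrase = "a fast brown fox leaps over a sleepy dog by the river"

flag_8 = is_contaminated(leaked, build_index(train, 8), 8)
flag_13 = is_contaminated(leaked, build_index(train, 13), 13)
flag_paraphrase = is_contaminated(paraphrase, build_index(train, 8), 8)
```

In this toy case the 10-token overlap is flagged at n=8 but missed at n=13, which is why a smaller n is the stricter setting. The paraphrase is not flagged at n=8, so even a strict exact-match threshold cannot by itself rule out contamination through reworded text.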

Many papers claim, each in their own way, that they did not cheat or contaminate their data, while others omit this step entirely. I believe this could ultimately call the reliability of the evaluation datasets into question. Benchmark datasets provide standardized quantitative metrics, but once data contamination occurs, a benchmark's credibility is significantly undermined. This might be fundamentally different from some forms of cheating in MMLU implementations. Additionally, post hoc methods that estimate contamination from the model's inference results could also be misleading. Ultimately, we need benchmarks that measure what a model can infer, not what it has been taught.

Are there any resources that would help inform this perspective? And if I use the harness's current decontamination implementation, can I say that it sufficiently removes data contamination?
