First, congratulations and thank you for the amazing project and the fantastic framework, which is used in almost all LLM evaluations. I appreciate you sharing it with us.
Regarding decontamination:
I wonder whether data decontamination is still effective, and whether there are plans or ongoing efforts to perform strict decontamination against the current evaluation datasets.
Ultimately, if we aggressively remove contamination using an 8-gram overlap check (stricter than the usual 13-gram) with the current decontamination methods, can we be confident that the results on the current evaluation benchmarks are free of contamination?
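To make the question concrete, here is a minimal sketch of the kind of n-gram overlap filtering I have in mind. The function names and the simple whitespace tokenization are my own illustrative assumptions, not the harness's actual implementation:

```python
# Minimal sketch of n-gram overlap decontamination (illustrative only,
# not the harness implementation). An eval example sharing at least one
# 8-gram with the training corpus is flagged as contaminated.
from typing import Iterable, Set, Tuple

N = 8  # stricter than the commonly used 13-gram threshold

def ngrams(text: str, n: int = N) -> Set[Tuple[str, ...]]:
    """Return the set of whitespace-tokenized n-grams in `text`."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def build_train_index(train_docs: Iterable[str]) -> Set[Tuple[str, ...]]:
    """Collect every n-gram seen anywhere in the training corpus."""
    index: Set[Tuple[str, ...]] = set()
    for doc in train_docs:
        index |= ngrams(doc)
    return index

def is_contaminated(eval_example: str, train_index: Set[Tuple[str, ...]]) -> bool:
    """Flag an eval example if any of its n-grams appears in the training index."""
    return not ngrams(eval_example).isdisjoint(train_index)

# Toy usage
train_index = build_train_index(
    ["the quick brown fox jumps over the lazy dog near the river"]
)
print(is_contaminated("the quick brown fox jumps over the lazy dog today", train_index))  # True
print(is_contaminated("a completely unrelated question about physics", train_index))      # False
```

Even with a filter like this, my question stands: overlap removal only catches near-verbatim matches, so does passing such a check actually guarantee the benchmark results are contamination-free?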
Many papers claim, each in their own way, that they did not cheat or contaminate the data, while others omit this step entirely. I believe this ultimately raises questions about the reliability of the evaluation datasets. Benchmark datasets can provide standardized quantitative metrics, but the moment contamination occurs, the benchmark's credibility is significantly undermined. This may be fundamentally different from some forms of cheating in MMLU implementations. Additionally, methods that measure contamination post hoc from a model's inference results can also be misleading. Ultimately, what is needed is a benchmark that tests what the model was never trained on but can still infer.
Is there any information that might be helpful for this (admittedly brief) perspective? If I use the current harness implementation for decontamination, can I claim that it sufficiently removes data contamination?