
Clarify Lambada Task #356

Closed
StellaAthena opened this issue Nov 21, 2022 · 0 comments
Labels
bug: Something isn't working.
documentation: Improvements or additions to documentation.
help wanted: Contributors and extra help welcome.

Comments

@StellaAthena
Member

When OpenAI created GPT-2, they also created a custom, non-standard LAMBADA evaluation dataset. OpenAI also changed the evaluation metric: they count how often the last BPE token is predicted incorrectly rather than the last word. This produces a large difference in reported performance, well over 10%. They used this easier version of LAMBADA to evaluate GPT-2 as well. For more details, see here and here.
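To make the distinction concrete, here is a minimal sketch of the two scoring rules (not the harness's actual code); `tokenize` and `greedy_next_token` are hypothetical stand-ins for a real BPE tokenizer and a model's greedy decoder:

```python
# Illustrative sketch only: contrasts standard LAMBADA (predict the whole final
# word) with the OpenAI variant (predict only the final BPE token).
from typing import Callable, List


def last_word_accuracy(
    examples: List[dict],
    tokenize: Callable[[str], List[str]],
    greedy_next_token: Callable[[List[str]], str],
) -> float:
    """Standard LAMBADA: the model must generate every BPE piece of the final word."""
    correct = 0
    for ex in examples:
        context = tokenize(ex["context"])      # passage without the final word
        target = tokenize(ex["final_word"])    # final word may span several BPE pieces
        generated = []
        for _ in target:
            generated.append(greedy_next_token(context + generated))
        correct += generated == target
    return correct / len(examples)


def last_token_accuracy(
    examples: List[dict],
    tokenize: Callable[[str], List[str]],
    greedy_next_token: Callable[[List[str]], str],
) -> float:
    """OpenAI variant: all but the last BPE piece of the final word are given as
    context, and only the single last piece has to be predicted."""
    correct = 0
    for ex in examples:
        context = tokenize(ex["context"])
        target = tokenize(ex["final_word"])
        # Every sub-token except the last is handed to the model for free.
        context = context + target[:-1]
        correct += greedy_next_token(context) == target[-1]
    return correct / len(examples)
```

The OpenAI rule gives the model all but the last sub-token of the final word, which is why it reports noticeably higher accuracy on the same passages.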

According to @jon-tow, we currently implement the OpenAI version, not the standard version. We should implement both and call them lambada_standard and lambada_openai respectively. In particular, we should not implement a task called simply lambada: years after the fact, the ambiguity is still causing widespread confusion, and we want to force users to notice which version they are running.

StellaAthena added the bug, documentation, and help wanted labels on Nov 21, 2022