Add templates for story_cloze #443
Thanks for opening this PR, @zaidalyafeai! I was initially suggesting that:
This has multiple advantages:
btw, you shouldn't push the |
|
Thanks @VictorSanh, I already opened a PR to add it to huggingface datasets, but because of the large number of open PRs it seems it will take some time to be reviewed and merged. I was suggesting this approach in case we need to add custom datasets quickly, so that we don't need to wait for PRs to be merged to |
Oh awesome, I didn't find it initially for some reason... Yes, please do:
Once this is done, I will ask you to cache the dataset with seqio and upload it to the Hub (I will share the command with you later). Thank you! |
|
Sounds good, what about the templates, for |
|
Yes, let's have the template here in this PR! |
|
@VictorSanh I followed your steps and this seems to work fine. I want to test the seqio cache script early on some test templates to make sure it works fine. |
|
Since the dataset requires a manual download and is not part of HF datasets, maybe it should be excluded from the build checks? |
VictorSanh left a comment
From what I can see in the data, LGTM! I corrected a few minor things.
Letting @awebson validate it one more time, and then we can cache it.
Note: I see you only did the 2016 subset. I see the recommended evaluation subset is 2018; is that a problem? (FLAN reports numbers on the 2016 subset.)
|
I discussed that with @awebson: the 2018 subset doesn't have labels for the test split, only for validation. We can copy the same templates to the 2018 validation split if you want to include that. By the way, GPT-3 uses the 2016 subset. |
Added a custom data loader script for the offline dataset story_cloze. Also, created a candidate template.
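For readers unfamiliar with what a story_cloze template does, here is a minimal sketch in plain Python. It is not the template added in this PR. The field names (`input_sentence_1` to `input_sentence_4`, `sentence_quiz1`, `sentence_quiz2`, `answer_right_ending`), the 1-indexed label, the prompt wording, and the helper name `render_story_cloze` are all assumptions made for illustration.

```python
def render_story_cloze(example):
    """Turn one story_cloze-style example into a (prompt, target) pair.

    Hypothetical helper: field names and the 1-indexed label are assumed,
    not taken from this PR.
    """
    # Join the four context sentences into a single story string.
    story = " ".join(example[f"input_sentence_{i}"] for i in range(1, 5))
    endings = [example["sentence_quiz1"], example["sentence_quiz2"]]
    prompt = (
        f"{story}\n"
        "Which ending fits the story better?\n"
        f"- {endings[0]}\n"
        f"- {endings[1]}"
    )
    # Assumed: answer_right_ending is 1 or 2, so shift to a 0-based index.
    target = endings[example["answer_right_ending"] - 1]
    return prompt, target


# Made-up example record, for illustration only.
example = {
    "input_sentence_1": "Sam bought a new kite.",
    "input_sentence_2": "He took it to the park.",
    "input_sentence_3": "The wind was very strong.",
    "input_sentence_4": "The string slipped out of his hand.",
    "sentence_quiz1": "Sam watched the kite fly away.",
    "sentence_quiz2": "Sam cooked dinner for his cat.",
    "answer_right_ending": 1,
}

prompt, target = render_story_cloze(example)
print(prompt)
print("Target:", target)
```

Copying a template to the 2018 validation split, as discussed above, would amount to applying the same rendering to that split's examples, provided the fields match.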