As a researcher, I want a set of 20-25 representative texts so that I can conduct controlled experiments on a sample corpus. #6
Comments
@rlskoeser @laurejt @WHaverals tagging for review. Here is the link to the Google Sheet.
Acceptance criteria:
Here are my review notes:
FYI / question:
@rlskoeser thank you for these helpful notes! There is in fact one source with one excerpt currently in the test set but not the other (hvd.32044050827351). I would probably filter on unique id (source id + p#), but that's tricky since we are changing the ids for stability. Would it be easier if I swapped that record out for a different one?
@mnaydan I think let's keep it in! That's a good edge case to have in mind; the filter script won't support it properly as currently written, but there are a couple of ways to handle that - do you want that kind of filtering supported this round or as a second pass? My preference would be to use unique ids once we fix them.
Let's support it in the second pass, once we fix the unique ids.
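For reference, here is a rough sketch of the difference between filtering on source id and filtering on a unique excerpt id. It is not the project's filter script: the field names (`source_id`, `first_page`), the id format, the helper functions, and the page numbers are all assumptions made up for illustration, and the real unique ids are still being reworked.

```python
# Hypothetical sketch: record fields, id format, and page numbers are
# invented for illustration and do not reflect the actual corpus schema.

def excerpt_id(source_id: str, first_page: int) -> str:
    """Build an assumed unique id from the source id plus a page number."""
    return f"{source_id}-p{first_page}"


def filter_by_source(records, wanted_sources):
    """Keep every record whose source id is wanted (all excerpts come along)."""
    return [r for r in records if r["source_id"] in wanted_sources]


def filter_by_excerpt(records, wanted_ids):
    """Keep only records whose assumed unique excerpt id is wanted."""
    return [
        r for r in records
        if excerpt_id(r["source_id"], r["first_page"]) in wanted_ids
    ]


# Two excerpts from the same source; only the first belongs in the test set.
records = [
    {"source_id": "hvd.32044050827351", "first_page": 10},
    {"source_id": "hvd.32044050827351", "first_page": 200},
    {"source_id": "example.0001", "first_page": 5},
]

by_source = filter_by_source(records, {"hvd.32044050827351"})
by_excerpt = filter_by_excerpt(records, {excerpt_id("hvd.32044050827351", 10)})
```

Filtering on source id alone pulls in both excerpts from hvd.32044050827351, while filtering on the excerpt-level id keeps only the one selected for the test set, which is the edge case discussed above.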
I'm going to close this since we discussed during standup that the set is likely good enough and is already quite large in terms of page count.
Include: