As a researcher, I want to be able to get the text corpus for a subset of record IDs so that I can conduct textual analysis within particular groups of texts. #5
Here's a screen recording of my terminal showing the corpus subset functionality: Screen.Recording.2024-04-02.at.2.34.46.PM.mov
Should an output file be created if there are no matches?
Ideally not, but I don't think there's an easy way to detect this until after the generator has been consumed. I guess we could check afterward and remove the file if it's zero size.
Acceptance testing complete. The only outstanding issue is whether the script should produce empty output files when no matches are found (either way seems reasonable).
Thanks for testing. Let's leave the empty file behavior as it is for now; we've both put enough time into this feature already. We can always revisit and tweak it later if we find it's causing problems.
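If the empty-file behavior is revisited, the "check and remove it after if it's zero size" idea from the thread could look roughly like the sketch below. It is a minimal illustration, not the project's code: the function name `save_subset`, the dict-shaped records, and the JSON-lines output format are all assumptions.

```python
import json
import os
from typing import Iterable


def save_subset(records: Iterable[dict], path: str) -> int:
    """Write records as JSON lines to `path` and return how many were written.

    Hypothetical helper: the real script's output format may differ.
    """
    count = 0
    with open(path, "w", encoding="utf-8") as outfile:
        for record in records:
            outfile.write(json.dumps(record) + "\n")
            count += 1
    # A generator can only be checked for emptiness by consuming it,
    # so clean up afterwards if nothing was written (zero-size file).
    if os.path.getsize(path) == 0:
        os.remove(path)
    return count
```

The trade-off is that the file is briefly created even when nothing matches; the returned count also lets the caller report "no matches" without re-reading the output.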
acceptance criteria
- idfile (one source_id per line; non-excerpt sources)
- idfile with leading whitespace
- idfile with trailing whitespace
- idfiles with sources with 1 or more excerpts
- idfiles with a mix of existing and non-existing sources
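The acceptance criteria above can be sketched as two small steps: read the idfile while ignoring surrounding whitespace and blank lines, then filter the corpus by `source_id`, tracking which IDs actually matched. This is a hypothetical sketch: `load_ids` and `filter_by_ids` are illustrative names, and it assumes each corpus record is a dict with a `source_id` key, where a source with several excerpts appears as several records sharing that `source_id`.

```python
from typing import Iterable, Iterator


def load_ids(lines: Iterable[str]) -> set[str]:
    """Collect one source_id per line, stripping leading/trailing whitespace
    and skipping blank lines (hypothetical helper)."""
    return {line.strip() for line in lines if line.strip()}


def filter_by_ids(
    records: Iterable[dict], ids: set[str], found: set[str]
) -> Iterator[dict]:
    """Lazily yield records whose source_id is in `ids`.

    Every matched id is added to `found`, so once the generator is consumed,
    `ids - found` gives the IDs with no matching records.
    Assumes each record is a dict with a "source_id" key.
    """
    for record in records:
        source_id = record["source_id"]
        if source_id in ids:
            found.add(source_id)
            yield record
```

For example, `with open("ids.txt") as idfile: ids = load_ids(idfile)` reads the idfile (the filename is a placeholder). Because `filter_by_ids` is a generator, the set of missing IDs is only complete after the output has been fully written, which is the same constraint discussed above for empty output files.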