As a content editor, I want to navigate the transcription export data on GitHub so that I can find exported content by PGPID. #1124
Comments
@mrustow could you weigh in on Rebecca's question? Do we want the transcriptions to go in their own directory/folder (like annotations)? And if we do, what do we want to call it (knowing this directory will have transcriptions AND translations in the future)?
I don't understand it :(
@mrustow sorry! Context: here's the test version of the new export layout, chunked by 1000s: https://github.com/Princeton-CDH/test-pgp-annotations/tree/test-tei-migration2 I have an
@rlskoeser spoke with @mrustow - no need for a separate directory for transcriptions/translations - keep the organization as-is. But is it possible to get a counter of how many files are in each of the 1000s folders, just so we get a preview of how many files are in each thousand increment?
Can you say more about the preview / counter? Where would you like to see this? We could maybe put something like that in the readme, but it would get out of date unless we recalculated it regularly... (we'd have to note when it was last updated)
Thanks for reviewing and signing off on the layout. I'm going to close this as accepted, but if you have ideas on where we could provide counts, LMK and I will think about how we might implement.
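For reference, here is a rough sketch of how the per-folder counts requested above might be computed from a local checkout of the export repository. Nothing like this exists in the thread: the function name `count_files_per_chunk` is hypothetical, and the sketch assumes that the per-thousand folders are the top-level directories with purely numeric names and that every file beneath them should be counted.

```python
from pathlib import Path


def count_files_per_chunk(root):
    """Count files under each numeric top-level directory of `root`.

    Assumption: the per-thousand export folders are the top-level
    directories whose names are all digits (e.g. "1000", "36000");
    anything else, such as an annotations directory, is skipped.
    """
    counts = {}
    for chunk_dir in Path(root).iterdir():
        if chunk_dir.is_dir() and chunk_dir.name.isdecimal():
            # Count every regular file at any depth below the chunk folder.
            counts[chunk_dir.name] = sum(
                1 for path in chunk_dir.rglob("*") if path.is_file()
            )
    # Order the result numerically by folder name for readability.
    return dict(sorted(counts.items(), key=lambda item: int(item[0])))
```

If the output were pasted into the readme, it would still need a "last updated" note, since the counts go stale whenever the export changes.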
testing notes
review the revised directory structure generated by a fresh tei migration and export in this branch:
https://github.com/Princeton-CDH/test-pgp-annotations/tree/test-tei-migration2
question: would it be better if the numbered directories were in a labeled directory (parallel to the annotations directory), instead of all at the top level? I wasn't sure what to call it since I hope it will eventually include transcription as well as translation. Could we call it text? or content?
per @mrustow:
And the second level by the full numbers, not the last two digits.
ok, so files for PGPID 1018 would be under 1000/1018/? And what about longer ids, would PGPID 36238 be 36000/36238/?
That sounds reasonable enough; it shouldn't be too hard to generate the paths. It will make the content easier to find and browse on GitHub (or rather, possible to browse on GitHub), and it should still be a usable enough structure for anyone who wants to do computational work with the text as a corpus (or for us to generate a corpus from the text content).
Originally posted by @rlskoeser in #912 (comment)
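The path scheme discussed above (first level by thousands, second level by the full PGPID) could be generated along these lines. This is only an illustrative sketch, not the project's export code: the function name `export_dir` and the `chunk_size` parameter are hypothetical, and the thread does not say how PGPIDs below 1000 should be named, so the `0/` folder used here for them is an assumption.

```python
def export_dir(pgpid, chunk_size=1000):
    """Return the relative export directory for a PGPID.

    First level: the PGPID rounded down to the nearest thousand.
    Second level: the full PGPID, not just its last digits.
    Assumption: IDs below chunk_size land in a folder named "0".
    """
    if pgpid < 0:
        raise ValueError("PGPID must be non-negative")
    chunk = (pgpid // chunk_size) * chunk_size
    return f"{chunk}/{pgpid}/"


# The two examples from the discussion:
print(export_dir(1018))   # 1000/1018/
print(export_dir(36238))  # 36000/36238/
```

Keeping the full PGPID at the second level means a file's location can be worked out from the ID alone, which is what makes lookup by PGPID on GitHub straightforward.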