As a content editor, I want to navigate the transcription export data on GitHub so that I can find exported content by PGPID. #1124

rlskoeser · 2022-09-23T18:01:01Z

testing notes

review the revised directory structure generated by a fresh tei migration and export in this branch:
https://github.com/Princeton-CDH/test-pgp-annotations/tree/test-tei-migration2

content files are organized based on PGPID in chunks of thousands
no warnings from GitHub about truncating directory listing

question: would it be better if the numbered directories were in a labeled directory (parallel to the annotations directory), instead of all at the top level? I wasn't sure what to call it, since I hope it will eventually include translation as well as transcription. Could we call it text? Or content?

per @mrustow :

OK, for me it would be helpful if the first level were named like this:

1000
2000
3000

And the second level by the full numbers, not the last two digits.

ok, so files for PGPID 1018 would be under 1000/1018/ ?

And what about longer ids, would PGPID 36238 be 36000/36238/ ?

That sounds reasonable enough; the paths shouldn't be too hard to generate. It will make the content easier to find and possible to browse on GitHub, and should still be a usable enough structure for anyone who wants to do computational work with the text as a corpus (or for us to generate a corpus from the text content).
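
A minimal sketch of how the paths discussed here could be generated, assuming integer PGPIDs and a first-level directory named for the PGPID rounded down to the nearest thousand. The function name `pgpid_export_dir` is hypothetical and not taken from the export code. How PGPIDs below 1000 are handled is also an assumption (they land in `0/` here), since that case isn't covered in the discussion.

```python
def pgpid_export_dir(pgpid: int) -> str:
    """Return the relative export directory for a PGPID, chunked by thousands.

    Hypothetical helper for illustration; not the actual export code.
    """
    if pgpid < 0:
        raise ValueError("PGPID must be non-negative")
    # first level: round down to the nearest thousand
    chunk = (pgpid // 1000) * 1000
    # second level: the full PGPID, not just the last digits
    return f"{chunk}/{pgpid}/"


print(pgpid_export_dir(1018))   # 1000/1018/
print(pgpid_export_dir(36238))  # 36000/36238/
```

Using the full PGPID at the second level means the final directory name alone identifies the document, which should make it easier to find by searching or browsing.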

Originally posted by @rlskoeser in #912 (comment)


kseniaryzhova · 2022-09-28T18:51:48Z

@mrustow could you weigh in on Rebecca's question? Do we want the transcriptions to go in their own directory/folder (like annotations)? And if we do, what do we want to call it (knowing this directory will have transcriptions AND translations in the future)?

mrustow · 2022-09-28T19:18:03Z

I don't understand it :(

rlskoeser · 2022-09-28T21:01:54Z

@mrustow sorry! context:

Here's the test version of the new export layout, chunked by 1000s: https://github.com/Princeton-CDH/test-pgp-annotations/tree/test-tei-migration2

I have an annotations directory/folder (listed at the bottom) for the annotation format exports (which I don't expect you all to refer to directly), but I put the 1000s directories at the top level — which makes it kind of long. Should those go into a directory, and if so what would you call it? Right now it is the compiled transcription content but I expect translation content to be backed up in the same way eventually and think it should be included in the same location (so you can find by PGPID and then you have text files for both transcription and translation if available).

kseniaryzhova · 2022-10-03T18:19:04Z

@rlskoeser spoke with @mrustow - no need for a separate directory for transcriptions/translations - keep the organization as-is. But is it possible to get a counter of how many files are in each of the 1000 folders, just so we get a preview of how many files are in each thousand increment?

rlskoeser · 2022-10-03T18:21:53Z

Can you say more about the preview / counter? where would you like to see this?

We could maybe put something like that in the readme, but it would get out of date unless we recalculated it regularly... (we'd have to note when it was last updated)
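
One possible way to calculate those counts, as a rough sketch only: walk a local clone of the export repository and count the files under each numbered top-level directory. The function name `count_files_by_chunk` is hypothetical, and this assumes the chunk directories are the only top-level directories with purely numeric names (so `annotations` and `.git` are skipped).

```python
from pathlib import Path


def count_files_by_chunk(repo_root):
    """Count files under each numeric top-level directory of a local clone.

    Hypothetical helper for illustration; not part of the export code.
    """
    counts = {}
    for chunk_dir in Path(repo_root).iterdir():
        # only the numbered chunk directories (e.g. 1000, 2000)
        if chunk_dir.is_dir() and chunk_dir.name.isdigit():
            # count files at any depth below the chunk directory
            counts[int(chunk_dir.name)] = sum(
                1 for path in chunk_dir.rglob("*") if path.is_file()
            )
    # sort numerically so 2000 comes before 10000
    return dict(sorted(counts.items()))
```

The result could be written into the readme along with the date it was generated, so readers can tell how current the numbers are.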

rlskoeser · 2022-10-04T15:16:50Z

Thanks for reviewing and signing off on the layout. I'm going to close this as accepted, but if you have ideas on where we could provide counts, LMK and I will think about how we might implement.

rlskoeser added this to the CDH/PGP end of grant year 2 milestone Sep 23, 2022

rlskoeser added the 🛠️ chore One-off task or update label Sep 23, 2022

rlskoeser changed the title ~~revise transcription export directory structure~~ As a content editor, I want to navigate the transcription export data on GitHub so that I can find exported content by PGPID. Sep 26, 2022

rlskoeser self-assigned this Sep 27, 2022

rlskoeser added a commit that referenced this issue Sep 27, 2022

Revise directory layout for transcription export #1124

0d6c841

rlskoeser added the 🗜️ awaiting testing Implemented and ready to be tested label Sep 28, 2022

rlskoeser closed this as completed Sep 28, 2022

rlskoeser reopened this Sep 28, 2022

rlskoeser closed this as completed Oct 4, 2022

rlskoeser removed 🗜️ awaiting testing Implemented and ready to be tested 🛠️ chore One-off task or update labels Oct 4, 2022
