As an admin, I want TEI transcription content regularly synchronized to the new database so that transcriptions are updated with changes in the current system. #321
Comments
Hello @richmanrachel @mrustow — I'm finally circling back to working on the transcription synchronization we planned for the fall MVP, and I have a number of questions. Here is the summary output the script is generating with totals for various categories:
My approach for now is to synchronize the transcription content from the TEI into the content field in the existing footnote; from the numbers above, you'll see there are some places where that's causing me some trouble. I need help investigating and resolving some of these.
notes
questions
problems
What's the best way to get help investigating these? I can generate some lists; maybe a separate list for each kind of problem?
A lower priority question, about formatting. The TEI includes a number of
Ack, sorry, I have yet another question: I had been planning to have this TEI synchronization script generate admin log entries on footnote records when it updates the transcription content, but I'm now questioning the usefulness of that, since this is an interim solution for transcriptions. Any opinions on whether this is valuable to document or would clutter the database needlessly?
@rlskoeser - thanks for the research you did and the good questions. I will set aside most of my meeting with Marina on Wednesday morning to investigate, and put a marker in the agenda to bring up the discussion points as well!
@richmanrachel that sounds great. Should I go ahead and create a list of PGPIDs and XML files for reference/investigation? I should be able to do that sometime today so you'd have it available for your meeting tomorrow.
@rlskoeser - that would be amazing. Thanks a million!
Here are some lists. Hope this is helpful for investigating!
Documents with multiple editions
9121: 9121.xml, 5299.xml
Documents with no edition footnote
2855: 2855.xml
Empty TEI files
1596.xml
Documents not found in database (displaying pgpid from TEI in case it differs)
9082.xml: 9082
tei with columns (markup contains
4523.xml
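For reference, problem lists like the ones above could come from a grouping step along these lines. This is only a sketch under assumptions: the function name `classify`, its two input mappings, the category labels, and the sample file names are all hypothetical, and the real script presumably works against database records rather than plain dictionaries.

```python
from collections import defaultdict


def classify(tei_pgpids, edition_footnotes):
    """Group TEI files by what happens when we look for a footnote to update.

    tei_pgpids: {filename: pgpid or None}; None stands for an empty TEI file.
    edition_footnotes: {pgpid: number of edition footnotes}; a pgpid missing
    from this mapping means the document was not found in the database.
    Both structures are hypothetical stand-ins for the real data.
    """
    report = defaultdict(list)
    for filename, pgpid in sorted(tei_pgpids.items()):
        if pgpid is None:
            report["empty TEI"].append(filename)
        elif pgpid not in edition_footnotes:
            report["document not found"].append(filename)
        elif edition_footnotes[pgpid] == 0:
            report["no edition footnote"].append(filename)
        elif edition_footnotes[pgpid] > 1:
            report["multiple editions"].append(filename)
        else:
            # exactly one edition footnote: the simple case that can be synced
            report["ok"].append(filename)
    return dict(report)


if __name__ == "__main__":
    # made-up sample data, not taken from the real repository
    files = {"a.xml": 1, "b.xml": 2, "c.xml": None, "d.xml": 4, "e.xml": 5}
    footnotes = {1: 1, 2: 0, 5: 2}
    for category, names in classify(files, footnotes).items():
        print(f"{category}: {', '.join(names)}")
```

Keeping the "ok" case as its own category makes it easy to sync only the simple cases first and leave the rest for manual investigation.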
@richmanrachel when you discuss with Marina, please also discuss how much of this we need to handle for a first-pass implementation; I'd like to get the transcription sync out for testing so we can build out the functionality that depends on it. I think we should be able to proceed with that while we work on resolving these problems.
@rlskoeser - sounds good. Thank you so much!
oh, sorry, that was unclear! "footnote with content" means the footnote that we attached the initial transcription text to when we did the spreadsheet import. (I think we got some of those wrong but they have been cleaned up / consolidated?)
yes, absolutely! let's plan to talk through whatever is confusing or can't be resolved asynchronously
Ah, sorry, rendition; it's an attribute that's usually used to indicate formatting
Output from running the script in qa:
Looks great! Closing :)
testing notes
Until we have a new solution for managing and editing transcriptions, we need to use the existing TEI and make sure the new database is pulling in updates regularly.
dev notes
We need a new management command that can be configured to run as a nightly cron job.
add and document settings for path to local copy of TEI git repo
clone/pull any changes from the TEI git repo to local copy
match TEI to the correct footnote based on PGPID and source note; needs to handle old PGPIDs for transcriptions associated with merged documents. Matching up properly may require some data cleanup in the source notes. (for initial implementation, handle simple cases only!)
convert TEI to html that can be used in an IIIF Annotation List; should include labels for any blocks and line numbers in the TEI. Adapt from prototype code https://github.com/Princeton-CDH/geniza/blob/experiment/search/scripts/tei_transcriptions.py
update transcription content associated with the footnote
add a Django admin log entry if the transcription has changed (would be nice to use git log entry details here, but probably not worth the effort, given that this is an interim solution)
script should include reporting to help with concerns raised in "Numbers of transcriptions aren't populating correctly into the admin site" (#295): include total number of transcription files, documents with transcriptions, number of fragments, and how many joins
update document detail admin to handle the new format
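As a rough illustration of the TEI-to-HTML step in the list above, here is a minimal sketch using only the Python standard library. It is not the prototype code linked above. It assumes each block is a TEI `div` with an optional `label` child and `l` children carrying an `n` attribute for the line number; the function names `tei_to_html` and `element_text`, the sample document, and the choice of `section`, `h3`, and `ol` as output elements are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET
from html import escape

TEI = "{http://www.tei-c.org/ns/1.0}"  # standard TEI namespace


def element_text(element):
    """Return all text inside an element, ignoring nested markup."""
    return "".join(element.itertext()).strip()


def tei_to_html(tei_xml):
    """Render each TEI div as a section with its label and numbered lines.

    Assumed structure (not confirmed against the real files):
    <div><label>...</label><l n="1">...</l>...</div>
    """
    root = ET.fromstring(tei_xml)
    parts = []
    for div in root.iter(f"{TEI}div"):
        parts.append("<section>")
        label = div.find(f"{TEI}label")
        if label is not None:
            parts.append(f"<h3>{escape(element_text(label))}</h3>")
        parts.append("<ol>")
        for line in div.findall(f"{TEI}l"):
            n = line.get("n")
            # keep the line number from the TEI when one is present
            attr = f' value="{escape(n)}"' if n else ""
            parts.append(f"<li{attr}>{escape(element_text(line))}</li>")
        parts.append("</ol>")
        parts.append("</section>")
    return "\n".join(parts)


# made-up sample document for illustration only
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0"><text><body>
<div><label>Recto</label><l n="1">first line</l><l n="2">second &amp; last</l></div>
</body></text></TEI>"""

if __name__ == "__main__":
    print(tei_to_html(SAMPLE))
```

Comparing the generated HTML with the content already stored on the footnote would be one simple way to decide whether the transcription has changed and a log entry is needed.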