Fall 2019 Publication Thread #27
I would like to add "copyist" or "scribe" to the metadata for the relevant texts in Marcion copied by Victor son of Mercurius (Onophrius, Cyrus, possibly others). See Layton's catalog https://www.dropbox.com/s/s7gdapyphgpc3mb/pLondCopt%20II%20%28Layton%29.pdf?dl=0. Everyone please let me know ASAP if you have any objections. |
No objection.
(Email reply to Caroline T. Schroeder's comment of Sep 16, 2019, 1:15 PM, quoted in full above.)
|
Also the list of materials for publication is now set. I am going to check to see if someone can review my doc from Johannes. Can everyone working on docs for this round of publication be sure that the docs are appropriately tagged in GitDox as "review"? |
Adding scribe sounds fine to me (it sounds better than copyist to my ears, but I'm fine with either).
I have reviewed all of the re-release documents (Shenoute, AP, Besa) and made sure the version is 3.0.0 + dated if they have been edited since the last release, so those should be all good. The new Budge materials are either recent additions to the treebank (onno1, cyrus1, ephraim, respose) or have been checked by either Lance or me, so I think they are OK as 'checked' and only need metadata review; no linguistic review is necessary at this point.
Now assigned for (metadata) review to @ctschroeder :
The following are assigned to others for sentences/translation, but also need metadata review:
Thanks! |
thx so much @amir-zeldes! I will deal with Proclus and Victor when others are done with them. Whoever's working on them can assign them to me when done. |
Greetings @amir-zeldes @lancealanmartin @eplatte @bkrawiec @cluckmarq. I'm working on URNs for the Marcion material. Part of the CTS URN is the "text group" and part is the "edition." A few questions have arisen. Apologies for the long post! Replies requested by the end of the week if humanly possible.
This comment contains a fair bit of info in regular text with precise questions in bold. This is PART 1. There may be a PART 2 as I work through the other texts.
For texts with known, identified authors, the "group" is the author. So urn:cts:copticLit:besa.aphthonia.monbba refers to the text "Letter to Aphthonia" in the text group "Writings of Besa" in the edition manuscript MONB.BA.
For the material edited by Budge in Marcion we have two questions: What "text group" to designate and what "edition"?
Thank you!! Possibly more tomorrow on other Marcion texts/works. |
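The URN layout described above (a namespace followed by text group, work, and edition separated by dots) can be sketched in code. This is only an illustration, not the project's tooling: `CtsUrn` and `parse_cts_urn` are hypothetical names, and the sketch assumes a URN with exactly three dot-separated parts and no trailing passage or chapter component, as in the Besa example.

```python
from typing import NamedTuple


class CtsUrn(NamedTuple):
    """Parts of a CTS URN shaped like the Besa example (hypothetical helper)."""
    namespace: str   # e.g. "copticLit"
    text_group: str  # e.g. "besa" (the author, when known)
    work: str        # e.g. "aphthonia"
    edition: str     # e.g. "monbba" (manuscript MONB.BA)


def parse_cts_urn(urn: str) -> CtsUrn:
    """Split a URN of the form urn:cts:<namespace>:<group>.<work>.<edition>."""
    parts = urn.split(":")
    # Assumption: no passage component, so exactly four colon-separated fields.
    if len(parts) != 4 or parts[0] != "urn" or parts[1] != "cts":
        raise ValueError(f"not a CTS URN of the expected shape: {urn!r}")
    work_parts = parts[3].split(".")
    if len(work_parts) != 3 or not all(work_parts):
        raise ValueError(f"expected group.work.edition, got: {parts[3]!r}")
    return CtsUrn(parts[2], *work_parts)


besa = parse_cts_urn("urn:cts:copticLit:besa.aphthonia.monbba")
```

Here `besa.text_group` is `"besa"` and `besa.edition` is `"monbba"`, so the two open questions for the Budge material map onto the `text_group` and `edition` fields.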
I like lives for the text group for Onophrius and Cyril, and Ephrem for the spelling and psephrem for the text group for the epistle. Budge also makes sense for the edition. |
Agreed on lives, budge and adding a pseudo prefix. For the spelling of Ephraim, I feel like we've been using mostly Latin spellings for some reason (onnophrius with 'u', cyrus with 'cy' and 'u'), so something like ephraem or ephrem seems more consistent than 'ai'. Whatever makes more sense as the 'Latin' form I would say. |
OK, auto sentence spans are now added to paths. Some things to note:
This all means we can now have the analytic vis for Paths. Note that because we do not have chapters, and the p tags (which seem fairly random) do not coincide with auto-sentences, we do not have a verses view for this data at the moment. |
@amir-zeldes I'll take a look this week at the chapters in PATHS. It sounds like other than that and the metadata, they are done? We are talking about Paul of Tamma, Phib, Aphou, Longinus and Luke (or no Longinus and Luke -- https://github.com/paths-erc/coptic-texts/blob/master/cc0418.xml). Thanks. |
Yes, since we're releasing this as auto NLP they are basically done. If you want to do chapters let me know, but time is getting short - if so, they should properly nest 'translation' so we can do the blockified (non-numbered) verses view. Thanks! And I think it is Longinus and Luke, the TEI header there is incorrectly copy-pasted from another file, right? |
Yes re Longinus and Luke. Re chapters: part of the issue is the document URN usually includes the chapters, but we can skip that and just use the edition namespace as the end. Am wondering if the edition should be "CMCL" since it's taken from Tito Orlandi's editions (see for example this referenced in the paths header for Paul of Tamma) http://www.cmcl.it/~cmcl/paolotamma1.PDF |
or should the edition be "paths"? I think this is the best strategy, actually. Something like urn:cts:copticLit:lives.pauloftamma.cmcl or urn:cts:copticLit:lives.pauloftamma.paths |
I also think it should be paths, since it includes paths annotations (e.g. their entity schema) and we don't actually know what processing steps happened between CMCL and their version. Saying it's paths is the simplest statement, and Paths's provenance from CMCL is something that should be described by Paths IMO |
Bingo |
I can add PATHS as the edition. What should the collection be? |
hello, @amir-zeldes. Johannes.canons is ready for viz check; any documents with to_publish or review status. Beth is reviewing the doc needing review. Thanks so much! |
version_date (and _n) has a validation, so that should get automatically flagged if someone used the wrong format. According to a SQL query on the database, there are now no longer any documents with 'Liz', so that should be fine, but yes let's remember to always do full names! Treebanking info:
Of these, everything was already in corpus metadata, except the only missing one I found was 1Cor, which had no corpus metadata. I copied it over from Mark and added all of the treebankers + Carrie, but I'm not sure who else has added 1Cor without treebanking (that's just who I'm seeing in the documents). Feel free to add if you know someone else! |
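For readers unfamiliar with this kind of check, here is a rough sketch of what a format validation for version_n and version_date could look like. It is not GitDox's actual rule set, which is not shown in this thread: the function names are hypothetical, and the sketch assumes version_n looks like 3.0.0 and version_date is an ISO date like 2019-09-30.

```python
import re
from datetime import datetime

# Assumption: version numbers are three dot-separated integers, e.g. "3.0.0".
VERSION_N = re.compile(r"\d+\.\d+\.\d+")


def is_valid_version_n(value: str) -> bool:
    """Return True if the value looks like major.minor.patch."""
    return VERSION_N.fullmatch(value) is not None


def is_valid_version_date(value: str) -> bool:
    """Return True if the value is a real, zero-padded YYYY-MM-DD date."""
    try:
        parsed = datetime.strptime(value, "%Y-%m-%d")
    except ValueError:
        return False
    # strptime also accepts unpadded months/days, so require an exact round trip.
    return parsed.strftime("%Y-%m-%d") == value
```

A check like this flags wrongly formatted values per document; it does nothing for corpus-level metadata unless it is run over those fields as well.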
Thank you Amir! (I don't believe corpus metadata errors crop up in validation.) I will check 1 Cor annotators. |
I did entity annotation for the first three chapters of both 1 Cor and Mark as well as shenoute.fox. Should I add my name to these docs? |
Yes @lancealanmartin please add your name to any document you edited, and then also add it to the corpus metadatum for annotation. Giving full credit to everyone is a major principle of ours!! Most documents have the primary annotator first, subsequent annotators in the middle, and the senior editor(s) who reviewed the document (usually Amir or me, sometimes Beth) as the last name. |
I have no issues with adding Lance to those documents, as entity annotations will one day be released, but just to clarify, those entity annotations are not currently available in the online corpora. As for annotator order: I'm embarrassed to say I seem to have had this wrong. I think anything where I added the names I did alphabetically by last name... Since Carrie and I are alphabetically relatively high, this may often match the pattern Carrie is mentioning, but anything I added annotation/translation to is probably just alphabetic. Also, in the repo interface, these things get split up and are findable separately no matter the order they are listed in inside the field. |
No worries. I think order is primarily a big deal for manually edited documents rather than the automated ones, and especially for junior folks; I try to keep an eye out for this during publication. |
@amir-zeldes the Marcion corpora are ready and should be frozen. Marcion corpora that are also in the gold treebank corpora will need metadata updated for the treebank files. TY! |
Hi @amir-zeldes I'm almost done with the johannes corpus -- checking visualizations, and I noticed that the new document is not in ANNIS. I see that there are 8 docs in the private instance and in the public one. I checked and FA215-224 is missing from the private instance. Thank you! |
Got it. Try again now |
Oh goodness that was a doozy. I think due to the page layer being labeled pb_n instead of pb_xml_id. I hope that fixed it. Also I am really sick (v sore throat) and so while Johannes is done the rest will have to wait for tomorrow. |
Oh no, it's been going around here too. Feel better! New version with on fix is already online. |
Johannes is good to go! |
Thanks - right now TEI is not validating due to having chapter_n but not verse_n. We could revert it to 'p' mode, without chapters, but is there a reason the verses are 'ignore:'ed? |
Hi. Are we talking about Johannes or everything? For Johannes they’re ignored because I started and didn’t finish once we decided we didn’t need verse numbers for this release.
Re TEI this must be common for all the documents that don’t have verses? This is odd because I don’t remember this as a problem in the past. I’m also really too sick to brainstorm at the moment. Do what you think is best.
|
The decision is per corpus, so we can either switch off verse numbers in 'verses' for all documents, or I'm happy to add consecutive numbers to verses in each chapter myself if that would solve it. Also, if only one document doesn't have verses, its TEI would have to look different from other documents in the corpus. Just give me your OK and I will add verse nums (they're mostly already there, I can easily finish). Feel better! |
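Adding consecutive verse numbers within each chapter, as proposed above, amounts to restarting a counter whenever the chapter changes. A minimal sketch follows, assuming the document can be represented as (chapter_n, text) pairs in reading order; that representation and the name `number_verses` are hypothetical, not the GitDox spreadsheet format.

```python
from typing import Iterable, List, Optional, Tuple


def number_verses(units: Iterable[Tuple[str, str]]) -> List[Tuple[str, int, str]]:
    """Attach a verse_n to each unit, restarting at 1 in every new chapter."""
    numbered: List[Tuple[str, int, str]] = []
    current_chapter: Optional[str] = None
    verse_n = 0
    for chapter_n, text in units:
        if chapter_n != current_chapter:
            # New chapter: reset the counter so verses are consecutive per chapter.
            current_chapter = chapter_n
            verse_n = 0
        verse_n += 1
        numbered.append((chapter_n, verse_n, text))
    return numbered
```

Whether the resulting units are good sentences is a separate question that still needs a human check.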
I’m not confident the numbers I have already are good sentences. If you have time to check please be my guest!
|
@amir-zeldes it looks like we messed up the language/languages consistency in corpus metadata again. Is there an easy fix, or should I go back through all of them and check manually? |
@amir-zeldes sorry to bother you again but it appears the treebank annotators have not been added to document metadata in all the items. I'm noticing this in Mark. You've listed treebankers by corpus above, but I don't know which docs belong to whom. Can you please check the document level metadata to be sure the treebankers have been added? #27 (comment) Thank you! |
(This may mean the corpora we thought should be frozen need to be fixed. I assumed the treebank folks had been added to doc level annotation.) |
OK, I will look into these tomorrow |
A few final things for this evening:
|
PS - oh, weird, now that I've manually committed, I can actually commit small changes to Longinus, presumably because the diff is small(?).. |
RE language/languages:
Was it intentional for corpora to have 'languages' to differentiate from the document level metadatum? In ANNIS, metadata queries just 'apply', so if the two fields conflict and are called the same, it's possible exact meta-based searches will actually yield zero results. Let me know your thoughts about what to do and I can try to apply it. |
Beth did some digging into this a year ago. I don’t remember the logic, but we went with language for doc/languages for corpus. It gets mangled in corpus metadata bc that can’t be validated.
|
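Since corpus metadata can't be validated automatically, one option is a small offline check for the convention mentioned above (language on documents, languages on corpora). The sketch below assumes the metadata has been exported as plain dictionaries; the export format and the name `misnamed_language_keys` are hypothetical.

```python
from typing import Dict, List

# Convention from the thread: 'languages' at corpus level, 'language' at document level.
EXPECTED_KEY = {"corpus": "languages", "document": "language"}


def misnamed_language_keys(level: str, metadata: Dict[str, str]) -> List[str]:
    """Return any language-type keys that use the wrong name for this level."""
    expected = EXPECTED_KEY[level]
    wrong = {"language", "languages"} - {expected}
    return sorted(key for key in metadata if key in wrong)
```

An empty list means the naming is consistent for that level; the sketch does not report a key that is missing altogether.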
Also thanks so much for all of this! I will be offline almost all day. I think I’ve done everything I can (except for those additional 3 paths texts). Please ping me if you need anything and I’ll check in tonight. Take care. |
Sounds good! Which 3 texts though? I think there's Longinus and Phib, which have chapter numbers from PATHS (p_n), and Aphou and Paul, which have unnumbered paragraphs that we made (just p) |
Greetings from the airport. see the corpora checked/not checked at the top of the thread. L&L should be the only one checked/ready. That checklist at the top should be the final list. I’ve done everything I can on all the docs except the 3 unchecked PATHS texts. Leave those three alone. Everything else is either ready or has items to check off that only you can do. Good luck.
|
Closing. Info in #40. |
Timeline:
Version 3.0, version date 2019-09-30
List of materials as of 16 September.
PATHS texts (see #26 @ctschroeder )
Phib
Aphou
Paul of Tamma
L&L
possibly Treebank corpus (UD release is in Nov, ANNIS release this fall) Please check corpus AND document annotation for treebank annotator before ticking off box. CTS is checking other corpus metadata fields
Marcion
more Johannes (@ctschroeder; @eplatte is reviewing)
- [x] visualization review
- [x] corpus metadata
add copyist metadatum where relevant
check "Correct the lemmas of the suffix conjug verboid, ounte" (#31) for corpora listed above
check "Morph Errors" (#30) for corpora listed above
complete this checklist https://github.com/CopticScriptorium/budge-dev/issues/1
complete this checklist https://github.com/CopticScriptorium/budge-dev/issues/2
Later in Fall
Alin has contacted us about more material and there's someone who wants to do G Philip