Add etymology section from Jack's and Laurent's Paper #26
Comments
We haven't discussed this in great detail, but I need us to jumpstart this — also because my students in Lisbon need to encode some etymologies today in TEI Lex-0. For the time being, I think we need:
We will definitely discuss this and what our final recommendation will be. This is just to start the process. |
Merci, @laurentromary . I'll take a look. One more general question — for you or anybody:
This is from Johnson's dictionary: <etym type="borrowing"><pc>[</pc><cit type="etymon">
<form xml:lang="grc"><orth>λεξικὸν</orth></form>
</cit> and <cit type="etymon">
<form xml:lang="grc"><orth>γράφω</orth></form>
</cit>; <cit type="etymon">
<form xml:lang="fr">lexicographe</form>
<pc>,</pc>
<lang value="fr">Fr.</lang>
</cit><pc>]</pc>
</etym>
|
For xml:lang, we should refer to BCP 47 and not to ISO 639 directly (it sets rules on how to use parts 2 and 3, for instance). My bible is always the IANA language subtag registry: https://www.iana.org/assignments/language-subtag-registry/language-subtag-registry |
Should not you put a |
Sure. I just don't like the fact that we have two-letter codes for modern languages and then a three-letter code for an ancient language, but I know that my 'liking' things is totally beside the point! 😃
Yes, I was rushing... I think it will be a hard sell (I can imagine the questions starting with: "why is this a label"?) but yes, we don't like mixed content etc. But, if I may ask again: are you ok with |
Do we need to take a decision on the fly now? My stomach relates this to |
We can't and don't need to make the final decision now. But I need to present something — as a temporary solution for our exercises today (we start in an hour and a half). I can put the |
Absolutely. One element is the notion of when xml:lang is used to indicate the object language (such as in entry) |
For BasNum, I always use 3-letter country codes to reduce ambiguity. I use xml:lang on entry, but I find it somewhat redundant, since even if a word is of foreign origin, Furetière/Basnage considered it a French word: see aile (pronounced ale) for the English beer enjoyed by young Parisians at the end of the 17th century.
Geoffrey
… On 3 Jul 2019, at 18:19, laurentromary ***@***.***> wrote:
Absolutely. One element is the notion of when xml:lang is used to indicate the object language (such as in entry)
|
Two remarks:
1. text nodes
Should we remove My initial thought here is that yes, we should disallow text nodes, but recommend in the narrative guidelines that those who do not go granular simply add a
<etym>
<note>[λεξικὸν and γράφω; lexicographe, Fr.]</note>
</etym>
2. default type
We will need to discuss the typing. At the moment we put the types from Laurent's and Jack's paper, but those will need narrative explanations in the context of TEI Lex-0 because they may not be self-evident. We need to leave that longer conversation for later. (@anacastrosalgado and I will try to look at how our current typology works with the Portuguese Academy dictionary and will report back.) But for the time being, with @type being required, I'm just wondering if we can come up with a default, catch-all type, which will be neither "borrowing" nor "inheritance" because those might simply be wrong in the given case. Any thoughts @laurentromary, @iljackb? |
I like the idea of the baseline provided with |
I think we should preserve Back in Berlin we were considering
I don't know what a chunk is but I like that segs are arbitrary. Whereas:
implies a complete description, not fragments of it. So, yes, I'd actually prefer |
In our TEI Lex-0 Etym paper we (@iljackb, @laurentromary and me) propose NB: To me, the whole business with avoiding mixed content feels a bit like over-engineering for prose centered texts such as many etymologies. It doesn't provide much benefit to the modeling proper. Basically you just sort of confirm that yes, I didn't forget to mark this up as something more specific, it's just any |
I just discovered that some of my Lex0 dictionaries (cf. https://gitlab.clarin.si/et/tei-lex0-sl) are no longer valid, because now etym/@type is required. I now found this issue and comment:
|
I totally agree. Our <etym> are word histories, and more story than history. I shall only try classifying, using type, once I have full encoding and talk with real etymologists.
I must say, I am wondering whether I can even attempt to stay in TLex0 as it is simply too simplistic for heritage dictionaries.
… On 5 Jul 2019, at 20:19, Tomaž Erjavec ***@***.***> wrote:
I just discovered that some of my Lex0 dictionaries (cf. https://gitlab.clarin.si/et/tei-lex0-sl) are no longer valid, because now etym/@type is required. I now found this issue and comment:
But for the time being, with @type being required, I'm just wondering if we can come up with a default, catch-all type, which will be neither "borrowing" nor "inheritance" because those might simply be wrong in the given case.
I think it is more "XML like" that if you don't know a value for some attribute, you don't write the attribute, i.e. why make it required and then have an "I don't know" value, rather than making it optional?
Note that the documentation is rife with examples of etym without @type, so right now it is pretty misleading as to what is ok and what is not. I'd also bet (1 beer) that for most legacy dictionaries it won't be clear what kind of etymology an etym represents, or at least it won't be simply machine-inferable, so the @type will be the exception rather than the rule.
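To gauge how many entries the required etym/@type actually affects in a given file, something like the following could be run over it. This is only a sketch: the sample document and the function name `entries_with_untyped_etym` are hypothetical, and it assumes the entries live in the standard TEI namespace.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical sample, not taken from any dictionary mentioned in this thread.
SAMPLE = """
<body xmlns="http://www.tei-c.org/ns/1.0">
  <entry xml:id="e1"><etym type="borrowing"><seg>from French</seg></etym></entry>
  <entry xml:id="e2"><etym><seg>a word history without a classification</seg></etym></entry>
</body>
"""

def entries_with_untyped_etym(xml_string):
    """Return the xml:id of every entry that contains an <etym> without @type."""
    root = ET.fromstring(xml_string)
    missing = []
    for entry in root.iter("{%s}entry" % TEI_NS):
        for etym in entry.iter("{%s}etym" % TEI_NS):
            if etym.get("type") is None:
                missing.append(entry.get(XML_ID))
                break  # one untyped <etym> is enough to flag the entry
    return missing

print(entries_with_untyped_etym(SAMPLE))
```

The list it returns is the set of entries that stop validating once @type is required, which is a quick way to see whether untyped etymologies are the exception or the rule in a legacy dictionary.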
|
Hi! In Portuguese dictionaries, when etymologists do not know the source of the materials they handle, "De origem obscura" [Of obscure origin] is the usual label. How do you recommend encoding this? I would appreciate your help (@ttasovac, @laurentromary, @iljackb). `<entry type="monolexicalWord" xml:lang="pt" xml:id="cota_b"> cota kˈɔtɐ :2 s. f. ` |
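If it alternates with what would be an <etym>, maybe we should be going with one here as well, but typed undefined: <etym type="undefined"> |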
So most simply I would do:
<etym>
<seg type="desc">De origem obscura</seg>
</etym>
If you want and/or think it would be useful, you could also put a value in
etym/@type such as "unknown", "undefined", "obscure", etc. But you don't
necessarily need that, as the term in <seg> is enough to search
for where the etymology isn't known.
…On Wed, Sep 18, 2019 at 6:56 AM laurentromary ***@***.***> wrote:
If it alternates with what would be an <etym>, maybe we should be going
with one here as well, but typed undefined. <etym type="undefined">
|
I would imagine the term has variants, so relying on typing would make it possible to find the appropriate content unambiguously. |
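The difference between the two retrieval strategies can be sketched as follows. Everything in the sample is hypothetical: the second label ("Origem obscura") is an invented variant, the entries are made up, and "undefined" is just the value floated above, not an agreed recommendation.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical entries; only the first label comes from the question above.
SAMPLE = """
<body xmlns="http://www.tei-c.org/ns/1.0">
  <entry xml:id="a"><etym type="undefined"><seg type="desc">De origem obscura</seg></etym></entry>
  <entry xml:id="b"><etym type="undefined"><seg type="desc">Origem obscura</seg></etym></entry>
  <entry xml:id="c"><etym type="borrowing"><seg type="desc">Do latim</seg></etym></entry>
</body>
"""

def ids_by_label(root, label):
    """Entries whose <seg> text is exactly the given label."""
    return [
        entry.get(XML_ID)
        for entry in root.iter("{%s}entry" % TEI_NS)
        if any((seg.text or "").strip() == label
               for seg in entry.iter("{%s}seg" % TEI_NS))
    ]

def ids_by_type(root, value):
    """Entries that have a child <etym> whose @type equals the given value."""
    path = "{%s}etym[@type='%s']" % (TEI_NS, value)
    return [
        entry.get(XML_ID)
        for entry in root.iter("{%s}entry" % TEI_NS)
        if entry.find(path) is not None
    ]

root = ET.fromstring(SAMPLE)
print(ids_by_label(root, "De origem obscura"))  # misses the variant wording
print(ids_by_type(root, "undefined"))           # independent of the wording
```

A string search only finds the wording it was given, whereas the typed query does not depend on how a particular dictionary phrases the label.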
I know this is kind of off-topic, but can I ask why this aversion to mixed content? |
hi @ambs, i wouldn't call it an aversion. the only concern is that sometimes mixed content is more difficult to process, I know i've run into issues with white spaces in html that were really difficult to solve (and would differ between browsers etc.) but all in all I think everybody will agree with you that mixed content is sometimes a must, is often needed in humanistic texts (i.e. narratives, not tabular data), and yes, that's an argument in favor of XML over JSON, for sure. |
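A small sketch of the whitespace problem, using a shortened version of the Johnson etymology quoted earlier in this thread. The helper names are hypothetical, and the namespace is left out for brevity. The same mixed content is serialized once compactly and once with indentation, and the extracted text differs.

```python
import xml.etree.ElementTree as ET

# The same mixed-content <etym>, compact and pretty-printed.
COMPACT = (
    '<etym>[<cit type="etymon"><form xml:lang="grc"><orth>λεξικὸν</orth></form></cit>'
    ' and <cit type="etymon"><form xml:lang="grc"><orth>γράφω</orth></form></cit>]</etym>'
)
INDENTED = """<etym>[
  <cit type="etymon"><form xml:lang="grc"><orth>λεξικὸν</orth></form></cit>
  and
  <cit type="etymon"><form xml:lang="grc"><orth>γράφω</orth></form></cit>
]</etym>"""

def flat_text(xml_string):
    """Concatenate all text nodes (element text and tails) in document order."""
    return "".join(ET.fromstring(xml_string).itertext())

def normalized_text(xml_string):
    """Same as flat_text, with runs of whitespace collapsed to single spaces."""
    return " ".join(flat_text(xml_string).split())

print(repr(flat_text(COMPACT)))
print(repr(flat_text(INDENTED)))
print(repr(normalized_text(INDENTED)))
```

Even after normalization, the indented version keeps spaces inside the brackets that the compact version does not have, because in mixed content a processor cannot tell layout whitespace from meaningful whitespace.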
I have one question concerning etymologies in TEILex-0: Thank you for your answer. Best wishes, |
Etymology has not been officially added to TEI Lex-0 yet, for no other reason than a lack of time on the part of everybody involved. When etymology is finally added and documented properly, pRef and oRef are unlikely to make a comeback: we already reached a consensus that specific elements for orthographic and pronunciation references are unnecessary from the point of view of TEI Lex-0, since we can use typed ref elements for that. |
Thank you very much for your answer. So, I will use ref instead to meet the requirements of TEI Lex-0. |
If you're not in a hurry, we need to finalise a paper on this by the end of the month. I could send you a stable draft by then.
Laurent
… On 11 Mar 2021, at 15:25, tklampfl ***@***.***> wrote:
Thank you very much for your answer. So, I will use ref instead to meet the requirements of TEI Lex-0.
|
Hi Thomas,
Just to give a preview of how it is different in Lex0 Etym: if you are
encoding a declaration of an etymon, cognate or derivative, the format is
still within <cit type="etymon"> as in the first paper, but with <form>
and <orth>/<pron>:
<cit type="etymon" xml:lang="pt">
<form>
<orth>humano</orth>
<!-- <pron> could also occur here -->
</form>
</cit>
But if it is a cross reference (such as the type that might occur in
running text), that is when you would use <ref> (within <xr>), e.g. as
follows:
....<xr type="related" subtype="etymon" xml:id="etym-dorsum" xml:lang="la">
<ref type="entry">dorsum</ref></xr>....
If this is a pronunciation form you can use @notation (as you can with
<pRef>), otherwise it is assumed to be orthographic or simply unspecified.
So whether you should use <ref> or not according to our recommendations
depends on the function of the form.
This is just to let you know the difference of how we are treating these in
the new guidelines. But I see Laurent responded so the details will best be
explained in the paper itself when you get it.
Best,
Jack
…On Thu, Mar 11, 2021 at 3:25 PM tklampfl ***@***.***> wrote:
Thank you very much for your answer. So, I will use ref instead to meet
the requirements of TEI Lex-0.
|
Jack, what's your GitHub user name? I'd like to assign this to you.