more on dictionary: The element <usg> inside <def> #1800

Open
chr-emil opened this issue Aug 1, 2018 · 12 comments

Comments

@chr-emil

chr-emil commented Aug 1, 2018

I am currently working on defining a TEI format for three modern Norwegian dictionaries (two at www.ordbok.uib.no). The dictionaries are edited in a relational database system and are published both on the web and as printed books.

For each definition text, and for each usage example (mostly created by the editors, as is usual for this kind of dictionary), the editor may add information about the area of usage. In the given system this information is taken from a predefined list (zool., bot., mil., outdated, …). In TEI, the element <sense> is used to encode the structure of definitions (meanings), which is mostly a tree structure. Each <sense> may contain a list of textual definitions expressed in <def> (e.g. separated by ‘;’), followed by a list of usage examples in <cit>. A usage marker can be added to each of these textual definitions and examples. Intuitively, these markers should be encoded with <usg>. However, <usg> cannot occur inside a <def> element.

In the Guidelines we find that <usg> can only occur inside: dictScrap entry entryFree etym form gramGrp hom re sense xr. In my case I would need to encapsulate each <def> and <cit> in a separate <sense>, which is artificial and logically wrong. Also, the element can contain almost anything, even <email>, <height>, and <climate>.
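
To make the contrast concrete, here is a minimal sketch with invented content. The definition texts and the `type="dom"` markers are illustrative assumptions, not taken from the Norwegian dictionaries. Fragment (a) is what the current content model forces; fragment (b) is the encoding requested here, which does not validate against the current Guidelines.

```xml
<!-- (a) Valid today: each <def> is wrapped in its own <sense>
     only so that it can carry a <usg> (hypothetical content) -->
<sense n="1">
  <sense>
    <usg type="dom">zool.</usg>
    <def>first definition text</def>
  </sense>
  <sense>
    <usg type="dom">mil.</usg>
    <def>second definition text</def>
  </sense>
</sense>

<!-- (b) Requested: the marker sits inside the definition it labels.
     NOT valid against the current TEI schema, because <usg> is not
     allowed as a child of <def>. -->
<sense n="1">
  <def><usg type="dom">zool.</usg> first definition text</def>
  <def><usg type="dom">mil.</usg> second definition text</def>
</sense>
```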

The element has an area of application outside dictionaries. Like <def>, it may contain a rich variety of elements, including <superEntry>!

The dictionaries I work with are real, existing dictionaries. Since TEI is not prescriptive, it should be adjusted to cover these dictionaries.

Suggestion: Extend the formal definitions of <def> and <cit> by adding <usg> as a possible child element.

@iljackb

iljackb commented Aug 1, 2018 via email

@chr-emil
Author

chr-emil commented Aug 1, 2018

@iljackb's comment that "the <usg> should generally be encoded as a child of <sense> not <def>" is a normative lexicographic statement. The dictionaries in question are not legacy dictionaries. I agree that it may be better to have one <usg> for a list of semicolon-separated definitions. However, if a lexicographer decides to allow a usage marker on each item in a list of definitions (or examples), TEI is not in a position to say "this is not allowed, reorganize your dictionary!"

@scstanley7
Contributor

Just to revisit this, does the example provided (pairing <usg> and <def> within <sense>) resolve the issue? It seems not from your initial comment, but I'm struggling to understand how "encapsulat[ing] each <def> and <cit> in a separate <sense>" would be "artificial and logically wrong" without an example. I think a real-world example would help me understand this problem much better.

@chr-emil
Author

chr-emil commented May 7, 2019 via email

@raffazizzi raffazizzi assigned raffazizzi and unassigned scstanley7 Jan 16, 2020
@raffazizzi
Contributor

raffazizzi commented Jan 16, 2020

@chr-emil have you had a chance to look at this again? Thanks!

@PFSchaffner
Member

PFSchaffner commented Jan 16, 2020

In our TEI-derived (i.e. modified TEI) schema for our admittedly very legacy Middle English Dictionary, usg is certainly allowed within def, and is widely used; it is hard to think of an alternative, since the usage labels are embedded within running prose, and usually (or at least often) do not stand separably off from them.
<def>A rung of a ladder; also <usg type="semantic" expan="figurative">fig.</usg></def>

<def>A scraping tool used in carpentry; <usg type="field" expan="medicine">med.</usg> an instrument for scraping bone.</def>

<def><usg type="field" expan="chess">Chess</usg> A rook, castle; also, a representation of a rook in a coat of arms.</def>

<def n="a">A rooftop; a housetop;</def>
<def n="b">a roof as the highest part of a building or as a high or an exposed place; also <usg type="semantic" expan="figurative">fig.</usg>, in phrase: <hi rend="b">rote and ~</hi>.</def>

@ebeshero
Member

ebeshero commented May 5, 2020

Council agrees this is good to implement with @PFSchaffner's examples.

@raffazizzi
Contributor

I can see two ways of implementing this (allowing <usg> within <def>) and would like the council's opinion before moving ahead.

  1. Brute force: allow <usg> directly within <def>.
     <alternate minOccurs="0" maxOccurs="unbounded">
       <macroRef key="macro.paraContent"/>
       <elementRef key="usg"/>
     </alternate>
  2. Add model.lexicalRefinement, the class that contains <usg>, to the content of <def>. This strikes me as more elegant, but it would also allow colloc gramGrp lbl pos subc.
     <alternate minOccurs="0" maxOccurs="unbounded">
       <macroRef key="macro.paraContent"/>
       <classRef key="model.lexicalRefinement"/>
     </alternate>
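
For anyone who wants to try option 1 locally before the Guidelines change, a project-level ODD customization could override the content of <def>. This is only a sketch: the schema ident `tei_dict_usg` and the module selection are assumptions, and it has only been thought through for RELAX NG output.

```xml
<schemaSpec ident="tei_dict_usg" start="TEI">
  <!-- Module selection is an assumption; adjust to the project's needs -->
  <moduleRef key="tei"/>
  <moduleRef key="core"/>
  <moduleRef key="header"/>
  <moduleRef key="textstructure"/>
  <moduleRef key="dictionaries"/>

  <!-- Replace the content model of <def> with option 1:
       anything from macro.paraContent, or <usg>, in any order -->
  <elementSpec ident="def" module="dictionaries" mode="change">
    <content>
      <alternate minOccurs="0" maxOccurs="unbounded">
        <macroRef key="macro.paraContent"/>
        <elementRef key="usg"/>
      </alternate>
    </content>
  </elementSpec>
</schemaSpec>
```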

@martinascholger
Member

martinascholger commented Oct 14, 2021

Council meeting 2021-10-14: green for @raffazizzi to go with proposal 1 (brute force).

@sydb
Member

sydb commented Oct 24, 2021

Nope. The brute-force approach (1) simply will not work for DTDs. Remember that macro.paraContent boils down to ( #PCDATA | g | s | cl | phr | w | m … )*. Thus approach (1) produces <!ELEMENT def ( %macro.paraContent; | usg )* >, which fails because, roughly speaking, #PCDATA can only exist as the 1st item in the content model (here a paren is 1st). See the spec if you care for the formal details.
In any case, I am not sure what the right way to do this sort of thing is. One possibility (which I have not decided I like much) is to make the Stylesheets smart enough to notice this situation and “flatten out” the corresponding DTD content model. Another is to make a new macro, “paraContentPlusUsg”, I suppose.
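
To illustrate the DTD constraint with a toy example (a hand-written three-element DTD, not the generated TEI DTD): in mixed content, #PCDATA has to be the first token of a single flat group, so the nested form is rejected while a flattened group is accepted. The flattened declaration below is roughly what the “flatten out” idea would have to produce; the sample text reuses one of the Middle English Dictionary definitions quoted earlier in the thread.

```xml
<?xml version="1.0"?>
<!DOCTYPE def [
  <!-- Rejected by XML 1.0 mixed-content rules, because #PCDATA is
       nested inside an inner group:
       <!ELEMENT def ((#PCDATA | hi)* | usg)*>
  -->

  <!-- Accepted: one flat group with #PCDATA as its first token -->
  <!ELEMENT def (#PCDATA | hi | usg)*>
  <!ELEMENT usg (#PCDATA)>
  <!ELEMENT hi  (#PCDATA)>
  <!ATTLIST usg type  CDATA #IMPLIED
                expan CDATA #IMPLIED>
]>
<def>A scraping tool used in carpentry; <usg type="field" expan="medicine">med.</usg> an instrument for scraping bone.</def>
```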

@sydb sydb reopened this Oct 24, 2021
@sydb
Member

sydb commented Nov 8, 2021

Talking this over with @martindholmes we are going to reverse this change for now, and re-visit how to accomplish <usg> in <def> if & when @chr-emil (or someone else) provides a use case example.

@ttasovac

ttasovac commented Apr 10, 2024

Hi guys. This question keeps popping up on TEI Lex-0. You've asked for some examples, and here are two from @anacastrosalgado's Portuguese dictionaries (taken from DARIAH-ERIC/lexicalresources#152):

   <sense xml:id="MOR1.DLP.ASTROLABIO.s.1">
      <usg type="domain" corresp="#domain.astronomy" resp="#Salgado"/>
      <def>inſtrumento Aſtronomico,
         de que ſe uſa para ſe tomarem a altura dos
         aſtros</def>
      <pc>.</pc>
   </sense>

   <sense xml:id="MOR1.DLP.TELESCOPIO.s.1">
      <usg type="domain" corresp="#domain.astrology" resp="#Salgado"/>
      <def>inſtrumento óptico de
         Aſtronomia que ſerve de obſervar na terra , ou
         no Ceo os objectos remotos, por meio da reflexão
         , ou refracção da luz</def>
      <pc>.</pc>
   </sense>

Things to consider:

  • A roundabout way of dealing with this is not to mark up the usage label explicitly within the definition, but to add an empty <usg> element outside the <def>, within the given sense. This is what Ana did above.
  • The argument in favor of having <usg> within <def> is that it would respect our reading of the lexicographer's intention. For instance, when Morais says "instrumento optico de Astronomia que...", the capital A indicates that he considered Astronomia in this case to be a domain label. So to respect that, the argument goes, we should allow <def>instrumento optico de <usg>Astronomia</usg> que...</def>

I agree that this is a tricky situation, but, as @iljackb says above, legacy dictionaries are often not "conveniently" organized. We have many more examples in which definitions do include collocation information, grammatical information, and so on. So the idea of "pure" definitions simply doesn't work in older dictionaries.
