Dictionary: an entry element in a sense element #1791

chr-emil · 2018-07-18T09:15:58Z

In many dictionaries, multiword expressions (collocations) are written and explained under a sense node of the definition tree, in the entry for one of the central words of the expression. The location under a sense node is fine. However, the multiword expression itself then has to be encapsulated in a cit element, while its definition is encapsulated in a sense element at the same level.

A cleaner and clearer model is to treat the multiword expression as a headword and describe its meaning and use in the same way as for any other entry. That is, in a TEI-encoded dictionary, an entry element must be allowed inside a sense element.
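To make the contrast concrete, here is a minimal Python sketch (standard-library `xml.etree.ElementTree`) of the two encodings. The headword *run*, the expression *run out of*, their definitions and the `type="multiWordExpression"` value are invented for illustration; only the `cit`/`sense`/`entry` structure comes from the description above. With `cit`, the expression and its definition are linked only by sibling order, so extraction code has to rely on position; with a nested entry they travel together.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
NS = {"tei": TEI_NS}
T = "{" + TEI_NS + "}"  # Clark-notation prefix used by ElementTree tags

# Current practice: the expression sits in <cit>, its definition in a sibling <sense>.
CIT_STYLE = f"""<entry xmlns="{TEI_NS}">
  <form type="lemma"><orth>run</orth></form>
  <sense>
    <def>move quickly on foot</def>
    <cit><quote>run out of</quote></cit>
    <sense><def>use up the supply of something</def></sense>
  </sense>
</entry>"""

# Proposed: the expression is a complete <entry> nested inside the <sense>.
NESTED_STYLE = f"""<entry xmlns="{TEI_NS}">
  <form type="lemma"><orth>run</orth></form>
  <sense>
    <def>move quickly on foot</def>
    <entry type="multiWordExpression">
      <form type="lemma"><orth>run out of</orth></form>
      <sense><def>use up the supply of something</def></sense>
    </entry>
  </sense>
</entry>"""


def mwes_from_cit(root):
    """Pair each <cit> with the <sense> immediately following it (positional coupling)."""
    pairs = []
    for sense in root.iter(T + "sense"):
        children = list(sense)
        for cit, nxt in zip(children, children[1:]):
            if cit.tag == T + "cit" and nxt.tag == T + "sense":
                pairs.append((cit.find("tei:quote", NS).text,
                              nxt.find("tei:def", NS).text))
    return pairs


def mwes_from_nested_entries(root):
    """Each nested MWE entry carries its own headword and definition."""
    return [(e.find("tei:form/tei:orth", NS).text, e.find("tei:sense/tei:def", NS).text)
            for e in root.iter(T + "entry")
            if e.get("type") == "multiWordExpression"]


print(mwes_from_cit(ET.fromstring(CIT_STYLE)))
print(mwes_from_nested_entries(ET.fromstring(NESTED_STYLE)))
```

Both calls yield the same pair, but only the second one does so without depending on the order of siblings.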

In the current TEI Guidelines, a sense element may contain:
dictionaries: def dictScrap etym form gramGrp lang oRef pRef re sense usg xr

My suggestion is to extend this to:
dictionaries: def dictScrap entry etym form gramGrp lang oRef pRef re sense usg xr
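A rough, hedged way to see what the change amounts to: treat the two lists above as sets of permitted child names for `sense` and check a sample `sense` against each. This is only a toy stand-in for real schema validation (TEI schemas are generated from ODD sources and also constrain order and non-dictionary content such as `cit`), and the sample markup is invented.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"

# Dictionaries-module children of <sense>, copied from the two lists above.
CURRENT_SENSE_CHILDREN = {
    "def", "dictScrap", "etym", "form", "gramGrp", "lang",
    "oRef", "pRef", "re", "sense", "usg", "xr",
}
PROPOSED_SENSE_CHILDREN = CURRENT_SENSE_CHILDREN | {"entry"}


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree adds to qualified tags."""
    return tag.rsplit("}", 1)[-1]


def disallowed_children(sense, allowed):
    """Names of direct children of `sense` that are not in `allowed` (order is ignored)."""
    return [local_name(child.tag) for child in sense
            if local_name(child.tag) not in allowed]


# Invented sample: a <sense> holding a nested multiword-expression entry.
SENSE = f"""<sense xmlns="{TEI_NS}">
  <def>move quickly on foot</def>
  <entry type="multiWordExpression">
    <form type="lemma"><orth>run out of</orth></form>
    <sense><def>use up the supply of something</def></sense>
  </entry>
</sense>"""

sense = ET.fromstring(SENSE)
print(disallowed_children(sense, CURRENT_SENSE_CHILDREN))   # ['entry']
print(disallowed_children(sense, PROPOSED_SENSE_CHILDREN))  # []
```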

laurentromary · 2018-07-18T10:11:10Z

In theory, this would be a perfect use case for <re>, but we have just had a meeting of the TEI Lex group where we found that we would actually support a wider application of <entry> (e.g. making it recursive), and in particular allow it to occur within <sense>. I thus support this ticket 100%, in the hope that it can be an opportunity to do even a little more for <entry>.
This would boil down to making <entry> a member of model.entryPart.top. Easy :-)

ttasovac · 2018-07-18T13:37:05Z

This is really a very lucky coincidence. As Laurent mentioned, the TEILex0 team (partly DARIAH WG Lexical Resources, partly ELEXIS: European Lexicographic Infrastructure) has spent a great deal of time discussing this issue. We've identified multiword expressions and collocations (and sometimes also idioms and other types of phraseology) as units that would greatly benefit from being treated as entries.

@laurentromary is right that we also have <re>, but in our work we are trying to simplify and streamline the options for encoding all entry-like entities, using <entry> rather than <superEntry>, <hom>, <re>, etc.

<re> is especially strange: "(related entry) contains a dictionary entry for a lexical item related to the headword, such as a compound phrase or derived form, embedded inside a larger entry". But if it really is an entry within an entry, then there is very little reason to use <re> instead of <entry>, possibly with a type indicating what it is (MWE, collocation, etc.).

So I would also like to urge the Council to consider making <entry> a member of model.entryPart.top. It would not only address @chr-emil's issue but also make <entry> itself recursive, which is what we would like to see happen.

chr-emil · 2018-07-18T14:41:30Z

Just to mention it: in a dictionary, one may write separate entries for multiword headwords and then create a link from the relevant sense to that separate entry. This is the solution we have used in the editing system for the Norwegian Dictionary (http://no2014.uib.no), and it is the typical database/linked-data solution. However, in a TEI realization of the dictionary, the solution with nested/recursive entries is very clean and useful. It will help me in making a TEI version of this very complex dictionary.

ebeshero · 2018-07-18T18:30:00Z

@martinascholger just resolved a different, perhaps related ticket on the dictionary module (#1702), so it might make sense to continue that work here...

kdepuydt · 2018-07-18T18:34:49Z

It is indeed important to have the flexibility of treating multiword headwords as part of the semantic description of the simplex they are related to, while at the same time having the opportunity to treat them as entries. All the more so since dictionaries treat MWUs either as separate entries or as part of the description of the entry they are related to.

TomazErjavec · 2018-07-19T06:31:48Z

I can see problems (as well as advantages) with making entry recursive. So I have to ask: what is wrong with using dictScrap to contain such multiword expressions? It seems to be able to contain most of the things entry can.

ttasovac · 2018-07-19T06:34:48Z

<dictScrap> is too easy-going: you can use dictionary elements in whatever order you want, whereas <entry> remains highly structured, allowing for a much tighter lexical representation.

chr-emil · 2018-07-19T06:37:18Z

The ticket #1702 mentioned by @ebeshero is related but different. It addresses a problem one frequently encounters when encoding retrodigitized dictionaries: how to deal with all the characters (and spaces) used in the original as separators and decoration while at the same time encoding the logical structure of an entry. This would require either a mixed content model (elements and character data intermixed) or some neutral element that can appear almost anywhere to encapsulate the separators, punctuation and decorative elements.

laurentromary · 2018-07-19T06:49:18Z

To further answer @TomazErjavec's remark: I think one of the issues behind @chr-emil's request is being able to encode the same object in the same way wherever it appears, i.e. as an autonomous entry or as a sub-entry somewhere (in his case within a <sense>). So we would have something like <entry type="multiWordExpression"> all over the place, rather than <entry type="multiWordExpression">, <re type="multiWordExpression">, or <dictScrap type="multiWordExpression"> depending on the context. This would indeed facilitate coherent searches across a variety of dictionary representations.
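A minimal sketch of the search argument, assuming every multiword expression is encoded as `<entry type="multiWordExpression">` (the type value and the sample data are illustrative): a single query through `ElementTree.findall` finds them at any depth, whether top-level or nested in a `<sense>`. With mixed encodings, the same query would have to enumerate `entry`, `re` and `dictScrap` separately.

```python
import xml.etree.ElementTree as ET

NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Invented sample: one top-level MWE entry and one nested inside a <sense>.
DICTIONARY = """<body xmlns="http://www.tei-c.org/ns/1.0">
  <entry type="multiWordExpression">
    <form type="lemma"><orth>by and large</orth></form>
  </entry>
  <entry>
    <form type="lemma"><orth>run</orth></form>
    <sense>
      <entry type="multiWordExpression">
        <form type="lemma"><orth>run out of</orth></form>
      </entry>
    </sense>
  </entry>
</body>"""


def mwe_headwords(root):
    """Headwords of all MWE entries, at any depth, in document order."""
    query = ".//tei:entry[@type='multiWordExpression']"
    return [e.find("tei:form/tei:orth", NS).text for e in root.findall(query, NS)]


print(mwe_headwords(ET.fromstring(DICTIONARY)))  # ['by and large', 'run out of']
```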

chr-emil · 2018-07-19T06:49:58Z

I agree with @ttasovac. There is nothing unstructured about such an <entry> under a <sense>. The <dictScrap> mentioned by @TomazErjavec is in itself fine for many purposes, for example when one cites a dictionary entry inside another, but it is not the correct solution here.

lb42 · 2018-07-19T08:48:36Z

But surely an entry has somewhat different semantics from a subentry or a nested entry? If asked "how many entries are there?", it's plausible to exclude those nested within a main entry, surely? Why did the dictionary writer organise the material in this way? So I am not convinced it really is "the same object".

laurentromary · 2018-07-19T09:12:25Z

Whether it has different semantics depends on the actual editorial stance associated with the dictionary. In many cases, the fact that an entry appears as a sub- (or super-) entry is accidental, i.e. it results from practicalities. The point is that there are use cases where we need a more homogeneous representation framework, and we are not asking for the deprecation of <re> or <dictScrap> here, just for an extra mechanism. No breaking of backward compatibility.

xlhrld · 2018-07-20T10:19:41Z

To back up @laurentromary's claim with two examples: there are many dictionaries that use nests of entries purely to compress the text in print (this is pretty common, e.g., in German dictionaries). Those nests consist of some kind of common header followed by a list of (typographically) subordinated entries. They are typically typeset just like one big entry, i.e., as one big paragraph on the surface, but all those entries still exhibit exactly the same types of lexicographic sub-elements as the »normal« entries do. Put differently (and exactly along the lines of what @ttasovac said): they fit the content model of entry perfectly. So yes, it's really just an editorial choice and not a real difference in the concept of an entry. You would of course count those nested entries just like all the other entries to get the total number of entries in the dictionary.
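The counting question can be sketched the same way: once nested entries are plain `<entry>` elements, "how many entries?" becomes a choice made at query time rather than in the markup. In this invented example (`type="nest"` is an illustrative value, not a prescribed one), counting only direct children of `<body>` gives the top-level reading, while a descendant search also counts the nested entries.

```python
import xml.etree.ElementTree as ET

NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Invented nest: a header entry with typographically subordinated entries.
BODY = """<body xmlns="http://www.tei-c.org/ns/1.0">
  <entry type="nest">
    <form type="lemma"><orth>Haus</orth></form>
    <entry><form type="lemma"><orth>Hausarzt</orth></form></entry>
    <entry><form type="lemma"><orth>Hausboot</orth></form></entry>
  </entry>
  <entry><form type="lemma"><orth>Hecke</orth></form></entry>
</body>"""

root = ET.fromstring(BODY)

# Reading 1: only entries that are direct children of <body>.
top_level = root.findall("tei:entry", NS)
# Reading 2: every <entry> at any depth, nested ones included.
all_entries = root.findall(".//tei:entry", NS)

print(len(top_level), len(all_entries))  # 2 4
print([e.find("tei:form/tei:orth", NS).text for e in all_entries])
```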

A slightly different case can be seen in etymological dictionaries that may organize entries around word families. One form (typically a simplex form) comes first, but derivatives of this first headword are often discussed further on in what really is a separate (but embedded) entry. Derivatives may also have their own extensively described etymologies, possibly deviating from that of the first headword, and can thus be considered entries in their own right. In my view, it would be really elegant to mark up such a cluster of clearly typographically grouped entries as an <entry type="word-family"> and the actual entries for the individual headwords as <entry type="word"> or something similar.

In any case, when using superEntry or re, the fact that two entries are related is already sufficiently reflected by the fact that one is embedded. In this respect, »related entry« seems rather like a misnomer anyway: entries that refer to one another via pointers (e.g. synonyms) may, with good reason, be termed »related«, too. But re could never be used there (without embedding, that is), of course.

If you need to pin down conceptual differences between types of entries, it would be much more in line with common TEI practice to use the @type attribute on entry, I think. Personally, I have always perceived superEntry as actually meaning <entry type="super"> or <entry type="entry-group">, and re as <entry type="related-in-some-way"> (with the problem of vagueness regarding the actual relation). So why not use @type on entry in the first place, with the added bonus of being able to provide a much more flexible typology than the somewhat »hard-coded« types conveyed by superEntry and re? A recursive entry could provide this flexibility.
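The reading of superEntry and re as typed entries can be sketched as a mechanical conversion. This is an illustration only: the `super` and `related` values come from the comment above rather than from any TEI typology, the sample markup is invented, and the result is not validated against a schema. The same pattern would cover a word-family cluster marked up as `<entry type="word-family">`.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"

# Hypothetical mapping taken from the comment above; these @type values are illustrative.
TYPE_FOR = {"superEntry": "super", "re": "related"}


def to_typed_entries(root):
    """Rename <superEntry> and <re> to <entry>, recording the old element name in @type."""
    for el in root.iter():
        if not isinstance(el.tag, str) or not el.tag.startswith(TEI):
            continue
        name = el.tag[len(TEI):]
        if name in TYPE_FOR:
            el.tag = TEI + "entry"
            # An existing @type (e.g. a more specific one) is kept rather than overwritten.
            el.set("type", el.get("type", TYPE_FOR[name]))
    return root


# Invented sample: two homograph entries grouped in a superEntry, one with an embedded re.
SAMPLE = """<superEntry xmlns="http://www.tei-c.org/ns/1.0">
  <entry><form><orth>lead</orth></form></entry>
  <entry>
    <form><orth>lead</orth></form>
    <re><form><orth>lead time</orth></form></re>
  </entry>
</superEntry>"""

root = to_typed_entries(ET.fromstring(SAMPLE))
print([e.get("type") for e in root.iter(TEI + "entry")])  # ['super', None, None, 'related']
```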

martinascholger · 2019-05-07T20:36:35Z

Council at F2F agrees to add entry to model.entryPart.top

ebeshero assigned peterstadler and martinascholger Jul 18, 2018

martinascholger added Status: Needs Discussion and removed Status: Needs Discussion labels May 5, 2019

martinascholger added the Status: Go label May 7, 2019

peterstadler closed this as completed in 905f307 May 8, 2019

martinascholger added this to the Guidelines 3.6.0 milestone May 8, 2019
