Dictionary: an entry element in a sense element #1791

Closed
chr-emil opened this issue Jul 18, 2018 · 14 comments

@chr-emil

In many dictionaries, multiword expressions (collocations) are written and explained under a sense node of the definition tree, in the entry of one of the central words of the multiword expression. The location under a sense node is fine. However, the multiword expression itself has to be encapsulated in a cit element, and the definition of the expression in a sense element at the same level.
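
For illustration, a minimal sketch of the encoding described above. The entry for "kick", the expression, and the `type` value are invented for the example, and the fragment is assumed to sit inside a document in the TEI namespace:

```xml
<entry xml:id="kick">
  <form type="lemma"><orth>kick</orth></form>
  <sense n="1">
    <def>to strike with the foot</def>
    <!-- the multiword expression is wrapped in cit ... -->
    <cit type="multiWordExpression">
      <quote>kick the bucket</quote>
    </cit>
    <!-- ... and its definition sits in a sibling sense -->
    <sense>
      <def>to die</def>
    </sense>
  </sense>
</entry>
```

Nothing in the markup itself ties the `cit` to the `sense` that defines it; the connection is only implied by their position.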

A cleaner and clearer model is to consider the multiword expression as a headword and describe its meaning and use in the standard way of an entry. That is, in a TEI-encoded dictionary an entry element must be allowed inside a sense element.

In the current TEI a sense element may contain:
dictionaries: def dictScrap etym form gramGrp lang oRef pRef re sense usg xr

My suggestion is to extend this to
dictionaries: def dictScrap entry etym form gramGrp lang oRef pRef re sense usg xr
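
A sketch of what the extended content model would permit, using an invented entry for "kick" and hypothetical `type` and `xml:id` values. It would only validate once entry is actually allowed inside sense, as requested in this ticket:

```xml
<entry xml:id="kick">
  <form type="lemma"><orth>kick</orth></form>
  <sense n="1">
    <def>to strike with the foot</def>
    <!-- the multiword expression is a full entry with its own headword -->
    <entry type="multiWordExpression" xml:id="kick_the_bucket">
      <form type="lemma"><orth>kick the bucket</orth></form>
      <sense>
        <def>to die</def>
      </sense>
    </entry>
  </sense>
</entry>
```

The headword and its definition are now held together by one element, with the same internal structure as a top-level entry.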

@laurentromary
Contributor

laurentromary commented Jul 18, 2018

In theory, this would be a perfect use case for <re>, but we have just had a meeting of the TEI Lex group where we would actually support a wider application of <entry> (e.g. making it recursive) and in particular allow it to occur within <sense>. I am thus 100% supporting this ticket, with the hope that it can be an opportunity to do even a little more for <entry>.
This would boil down to making <entry> a member of model.entryPart.top. Easy :-)
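
A project wanting to try this out locally could approximate it in an ODD customization. The following is a sketch using the standard class-modification mechanism; it is an assumption about how one might do it, not the change made to the Guidelines source:

```xml
<!-- placed inside the schemaSpec of a local ODD -->
<elementSpec ident="entry" module="dictionaries" mode="change">
  <classes mode="change">
    <!-- adds entry to the class whose members sense and entry already accept -->
    <memberOf key="model.entryPart.top" mode="add"/>
  </classes>
</elementSpec>
```

Because both <sense> and <entry> reference model.entryPart.top in their content models, this single class membership would allow <entry> inside <sense> and make <entry> recursive at the same time.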

@ttasovac

ttasovac commented Jul 18, 2018

This is really a very lucky coincidence. As Laurent mentioned, the TEILex0 team (partly DARIAH WG Lexical Resources, partly ELEXIS: European Lexicographic Infrastructure) has spent a great deal of time discussing this issue. We've identified multiword expressions and collocations (but sometimes also idioms and other types of phraseology) as items that would greatly benefit from being grouped as entries.

@laurentromary is right that we also have <re>, but in our work we are trying to simplify and streamline the options for encoding all entry-like elements by using <entry> and not superEntry, hom, re etc.

<re> is especially strange: "(related entry) contains a dictionary entry for a lexical item related to the headword, such as a compound phrase or derived form, embedded inside a larger entry". But if it really is an entry within an entry, then there is very little reason to use <re> instead of <entry>, possibly with a type that would indicate what it is (MWE, collocation etc.).
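
To make the comparison concrete, here is a sketch of the same invented multiword expression encoded first with <re> and then with a typed <entry>. The `type` values are hypothetical, and the second form assumes nested entries are allowed:

```xml
<!-- existing mechanism: related entry -->
<re type="multiWordExpression">
  <form type="lemma"><orth>kick the bucket</orth></form>
  <sense><def>to die</def></sense>
</re>

<!-- proposed alternative: the same content as a typed entry -->
<entry type="multiWordExpression">
  <form type="lemma"><orth>kick the bucket</orth></form>
  <sense><def>to die</def></sense>
</entry>
```

The two fragments differ only in the name of the wrapping element.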

So, I would also like to urge the Council to consider making <entry> a member of model.entryPart.top. It would not only address @chr-emil's issue but also make entry itself recursive, which is what we would like to see happen.

@chr-emil
Author

Just to mention it: in a dictionary one may write separate entries for multiword headwords and then create a link from the actual sense to this separate entry. This is the solution we have used in the editing system for the Norwegian Dictionary (http://no2014.uib.no). This is the typical database/linked data solution. However, in a TEI realization of the dictionary, the solution with nested/recursive entries is very clean and useful. It will help me in making a TEI version of this very complex dictionary.
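
In TEI terms, the linked alternative might look roughly like the sketch below. The entries, identifiers, and `type` values are invented for illustration and are not taken from the Norwegian Dictionary:

```xml
<entry xml:id="kick">
  <form type="lemma"><orth>kick</orth></form>
  <sense n="1">
    <def>to strike with the foot</def>
    <!-- the sense only points to the multiword entry -->
    <xr type="multiWordExpression">
      <ref target="#kick_the_bucket">kick the bucket</ref>
    </xr>
  </sense>
</entry>

<!-- the multiword expression lives in a separate top-level entry -->
<entry xml:id="kick_the_bucket" type="multiWordExpression">
  <form type="lemma"><orth>kick the bucket</orth></form>
  <sense><def>to die</def></sense>
</entry>
```

This works with the existing content model, but the description of the expression is no longer found where the source dictionary prints it.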

@ebeshero
Member

@martinascholger just resolved a different, perhaps related ticket on the dictionary module: #1702, so it might make sense to continue work on this...

@kdepuydt

It is indeed important to have the flexibility of treating multiword headwords as part of the semantic description of the simplex they are related to, while at the same time having the opportunity to treat them as entries. Even more so since dictionaries treat MWUs either as separate entries or as part of the description of an entry they are related to.

@TomazErjavec

I can see problems (as well as advantages) of having entry recursive. So, I have to ask: what is wrong with dictScrap for containing such multiword expressions? It seems to be able to contain most of the stuff entry does.

@ttasovac

<dictScrap> is too easy-going about things, you can use dictionary elements in whichever order you want — whereas <entry> remains highly structured, allowing for a much tighter lexical representation.

@chr-emil
Author

The ticket #1702 mentioned by @ebeshero is related but different. It addresses a problem one encounters frequently when encoding retrodigitized dictionaries: how to deal with all the characters (and spaces) used in the original as separators and decor while at the same time encoding the logical structure of an entry. This would require either a mixed content model (elements and cdata intermixed) or some neutral element that can appear almost anywhere to encapsulate the separators, punctuation and decor elements.

@laurentromary
Contributor

To further answer @TomazErjavec's remark: I think one of the issues behind @chr-emil's request is to be able to have the same object encoded in the same way wherever it appears, i.e. as an autonomous entry or as a sub-entry somewhere (in his case within a <sense>). So having something like <entry type="multiWordExpression"> all over the place rather than <entry type="multiWordExpression">, or <re type="multiWordExpression">, or <dictScrap type="multiWordExpression"> depending on the context. This would indeed facilitate coherent searches across a variety of dictionary representations.

@chr-emil
Author

chr-emil commented Jul 19, 2018

I agree with @ttasovac. There is nothing unstructured about such an <entry> under a <sense>. <dictScrap>, mentioned by @TomazErjavec, is in itself fine for many purposes, for example when one cites a dictionary entry inside another, but it is not the correct solution here.

@lb42
Member

lb42 commented Jul 19, 2018

But an entry does have somewhat different semantics from a subentry or a nested entry surely? If asked "how many entries are there?" It's plausible to exclude those which are nested within a main entry, surely? Why did the dictionary writer organise the material in this way? So I am not convinced it really is "the same object".

@laurentromary
Contributor

Having a different semantics depends on the actual editorial stance associated with the dictionary. In many cases, the fact that an entry appears as a sub- (or super-) entry is accidental, i.e. results from practicalities. The point is that there are use cases where we need a more homogeneous representation framework. We do not ask for the deprecation of <re> or <dictScrap> here, just for an extra mechanism. No breaking of backward compatibility.

@xlhrld

xlhrld commented Jul 20, 2018

To back up the claim of @laurentromary with two examples: There are many dictionaries that use nests of entries purely for reasons of text compression in print (pretty common e.g. for German dictionaries). Those nests consist of some kind of common header and then a list of (typographically) subordinated entries. They are typically typeset just like one big entry, i.e. as one big paragraph on the surface, but all those entries still exhibit exactly the same types of lexicographic sub-elements as the »normal« entries do. Put differently (and exactly along the lines of what @ttasovac said): they perfectly fit the content model of entry. So yes, it's really just an editorial choice and not a real difference in the concept of an entry. You would of course count those nested entries just like all the other entries to get the total number of entries for the dictionary.

A slightly different case can be seen in etymological dictionaries that may organize entries around word families. One form (typically a simplex form) comes first but often derivatives of this first headword may be discussed further on in what really is a separate (but embedded) entry. Derivatives may also have their own extensively described etymologies possibly deviating from the first headword and thus can be considered entries in their own right. It would be really elegant in my view to mark-up such a cluster of clearly typographically grouped entries as an <entry type="word-family"> and the actual entries for the individual headwords as <entry type="word"> or something similar.
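
A sketch of such a cluster, assuming recursive entries are allowed. The headwords, identifiers, and placeholder etymology text are invented; only the two `type` values come from the suggestion above:

```xml
<entry type="word-family" xml:id="family_act">
  <!-- base form of the family -->
  <entry type="word" xml:id="act">
    <form type="lemma"><orth>act</orth></form>
    <etym>(etymology of the base form)</etym>
    <sense><def>(definition of the base form)</def></sense>
  </entry>
  <!-- derivative with its own, possibly deviating, etymology -->
  <entry type="word" xml:id="action">
    <form type="lemma"><orth>action</orth></form>
    <etym>(etymology of the derivative)</etym>
    <sense><def>(definition of the derivative)</def></sense>
  </entry>
</entry>
```

Counting the headwords then becomes a matter of selecting entries by `type` rather than by element name or nesting depth.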

In any case, when using superEntry or re the fact of two entries being related is already sufficiently reflected in the fact that one is embedded. In this respect, »related entry« seems rather like a misnomer anyway: entries that refer to one another via pointers (e.g. synonyms) may be termed »related« with a reason, too. But re could never be used here (without embedding, that is), of course.

If you need to pin down conceptual differences between types of entries, it would be much more in line with common TEI practice to use the @type attribute on entry, I think. Personally, I always perceived superEntry as actually meaning <entry type="super"> or <entry type="entry-group"> and re as <entry type="related-in-some-way"> (with the problem of vagueness wrt. the actual relation). So why not use @type on entry in the first place, with the added bonus of being able to provide a much more flexible typology than the somewhat »hard-coded« types conveyed by superEntry and re? A recursive entry could provide this flexibility.

@martinascholger
Member

Council at F2F agrees to add entry to model.entryPart.top
