Deprel of list item enumerators #1027

Closed
nschneid opened this issue Apr 27, 2024 · 11 comments

Comments

@nschneid
Contributor

nschneid commented Apr 27, 2024

In lists like "1. an item 2. another item" or "it depends on (a) first condition, (b) second condition, (c) third condition", it is not obvious how the list item marker—which we can call an enumerator because it indicates the position in a sequence—should attach.

The list docs currently note the "de facto" approach described in #156, which is to call it nummod.

@manning in #156 (comment)

These kind of list structures occur quite a bit in the English Web Treebank, and we have discussed them. The one thing that seems very clear is that the content and not the marker of each item should be the head. De facto, we have also been using nummod – regardless of whether it is numerals or roman numerals or letters – for the dependency. But, we don't want to make a strong argument for that. It just seemed the least bad thing to do by the time we discussed this case, after the UD 1.0 dependency set was frozen.

(That is, earlier we had introduced the list dependency to have something for organizing the items of lists and had started using appos for joining the items of key-value pairs in lists like "Name: Stuart Smith", but it seemed like these couldn't also be appos, and they are a form of numbering, so, it seemed the least bad choice at the time. Better suggestions for the future welcome! But we probably do want to be careful about introducing too many extra relations for paralinguistic elements.)

But the nummod docs say explicitly that it is for an expression to "modify the meaning of the noun with a quantity", and "a number that serves as a label for an entity rather than denoting quantity is not nummod".

The core group discussed and determined that discourse is the better label for attaching list item enumerators. I will update the guidelines to reflect this.

(Bullets are treated separately from enumerators: they are tagged PUNCT and therefore must attach as punct. A proposal to change the tag to SYM and uniformly attach list item markers as discourse was discussed, but did not attract enough support to amend the current rule about bullets.)
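As a rough aid for finding annotations affected by the change from nummod to discourse, here is a minimal Python sketch that scans CoNLL-U text for enumerator-like tokens still attached as nummod. The regular expression, the function name `flag_nummod_enumerators`, and the sample sentence are all illustrative assumptions, not part of the guidelines; in particular, the sketch assumes each enumerator is a single token, which may not match a given treebank's tokenization.

```python
import re

# Hypothetical pattern for single-token enumerators such as "1.", "2)", "(a)", "iv.".
# Illustrative assumption only; the UD guidelines do not define such a pattern.
ENUMERATOR_RE = re.compile(r"^\(?(\d+|[A-Za-z]|[ivxIVX]+)[.)]$")


def flag_nummod_enumerators(conllu_text):
    """Return (token_id, form) for enumerator-like tokens attached as nummod."""
    flagged = []
    for line in conllu_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and sentence-level comments
        cols = line.split("\t")
        if len(cols) != 10 or not cols[0].isdigit():
            continue  # skip multiword-token ranges and empty nodes
        token_id, form, deprel = cols[0], cols[1], cols[7]
        if deprel == "nummod" and ENUMERATOR_RE.match(form):
            flagged.append((token_id, form))
    return flagged


# Made-up sentence; UPOS/XPOS are left unspecified ("_") on purpose.
ENUM_SAMPLE = "\n".join([
    "# text = 1. three items",
    "1\t1.\t1.\t_\t_\t_\t3\tnummod\t_\t_",
    "2\tthree\tthree\t_\t_\t_\t3\tnummod\t_\t_",
    "3\titems\titem\t_\t_\t_\t0\troot\t_\t_",
])

print(flag_nummod_enumerators(ENUM_SAMPLE))
```

In this sample only the label "1." is flagged; "three" denotes a quantity and would remain nummod. Flagged tokens are candidates for manual review rather than automatic relabeling.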

@sylvainkahane
Contributor

I have never understood the list relation and have never used it. In any case, your example "it depends on (a) first condition, (b) second condition, (c) third condition" is, for me, a coordination without any doubt. And I would not be against using cc for the markers "(a)", "(b)", and "(c)".

@nschneid
Contributor Author

The policy about using discourse is not conditioned on using list, especially considering that many list items are separate sentences.

As part of a larger sentence, list structures may be closer to coordination. In general, though, lists in a document need not be constrained to have items that are syntactically similar, and might even contain multiple sentences. So it seems too strong to say that lists are always coordination constructions.

An example like "It is based on (a) first condition, (b) second condition, and (c) third condition" certainly qualifies as coordination in my view. But I would say "and" is the cc—not "(a)", "(b)", "(c)", which are paralinguistic labels for units of text that happen to be conjuncts.

Moreover, most cc dependents are tagged CCONJ (or SYM like "&" or "/" functioning like CCONJ). I think it would be confusing to tag list item enumerators as CCONJ, or to start attaching NUMs etc. as cc.

@sylvainkahane
Contributor

OK for not using cc, but the fact remains that these elements are structuring elements. You can replace them with bullets or dashes, but you cannot completely suppress them. This property prevents us from using discourse, because discourse markers are not structuring elements at all. And we don't want to have bullets as discourse. I would like to have the same relation for every kind of list marker. Maybe a new relation is needed.

@nschneid
Contributor Author

We debated the issue of bullets and there was some support for making them attach the same way as enumerators, but the status quo for bullets is that they are PUNCT, which means they attach as punct. Bullets are not pronounced, which is characteristic of punctuation and different from enumerators.
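The constraint described here (a token tagged PUNCT attaches as punct) can be checked mechanically. The following is a minimal sketch under stated assumptions: the function name `punct_not_attached_as_punct` and the sample sentence are hypothetical, and the sample deliberately attaches a bullet as discourse to show what the check would report.

```python
def punct_not_attached_as_punct(conllu_text):
    """Return (token_id, form, deprel) for PUNCT tokens whose deprel is not punct."""
    mismatches = []
    for line in conllu_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and sentence-level comments
        cols = line.split("\t")
        if len(cols) != 10 or not cols[0].isdigit():
            continue  # skip multiword-token ranges and empty nodes
        token_id, form, upos, deprel = cols[0], cols[1], cols[3], cols[7]
        base_deprel = deprel.split(":")[0]  # ignore any relation subtype
        if upos == "PUNCT" and base_deprel != "punct":
            mismatches.append((token_id, form, deprel))
    return mismatches


# Made-up sentence: the bullet is tagged PUNCT but attached as discourse,
# which is the combination the status quo rules out.
BULLET_SAMPLE = "\n".join([
    "# text = • another item",
    "1\t•\t•\tPUNCT\t_\t_\t3\tdiscourse\t_\t_",
    "2\tanother\tanother\tDET\t_\t_\t3\tdet\t_\t_",
    "3\titem\titem\tNOUN\t_\t_\t0\troot\t_\t_",
])

print(punct_not_attached_as_punct(BULLET_SAMPLE))
```

The check only covers the direction stated in the comment above (PUNCT implies punct); it says nothing about how enumerators themselves should be tagged.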

@nschneid
Contributor Author

To the point about discourse, I think it's reasonable to say that enumerators are structuring elements: they help structure the discourse. :) The deprel discourse doesn't have to be as narrow as the traditional class of discourse markers.

The current guidelines say:

This is used for interjections and other discourse particles and elements (which are not clearly linked to the structure of the sentence, except in an expressive way). We generally follow the guidelines of what the Penn Treebanks count as an INTJ. They define this to include: interjections (oh, uh-huh, Welcome), fillers (um, ah), and non-adverbial discourse markers (well, like, but not you know or actually).

Like these examples, enumerators are syntactically omissible and extrinsic to the propositional semantics of the sentence, and they do not really fit as ADV or CCONJ in my view.

@rueter
Contributor

rueter commented Apr 28, 2024

Would it be of any help to compare enumerators with words, such as ‹finally›?

# sent_id = newsgroup-groups.google.com_jokecity_0566f0ba3b5f748f_ENG_20051125_240500-0010
# text = Finally, a boy in the back raises his hand.
1	Finally	finally	ADV	RB	_	8	advmod	8:advmod	SpaceAfter=No
2	,	,	PUNCT	,	_	1	punct	1:punct	_
3	a	a	DET	DT	Definite=Ind|PronType=Art	4	det	4:det	_
4	boy	boy	NOUN	NN	Number=Sing	8	nsubj	8:nsubj	_
5	in	in	ADP	IN	_	7	case	7:case	_
6	the	the	DET	DT	Definite=Def|PronType=Art	7	det	7:det	_
7	back	back	NOUN	NN	Number=Sing	4	nmod	4:nmod:in	_
8	raises	raise	VERB	VBZ	Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin	0	root	0:root	_
9	his	his	PRON	PRP$	Case=Gen|Gender=Masc|Number=Sing|Person=3|Poss=Yes|PronType=Prs	10	nmod:poss	10:nmod:poss	_
10	hand	hand	NOUN	NN	Number=Sing	8	obj	8:obj	SpaceAfter=No
11	.	.	PUNCT	.	_	8	punct	8:punct	_

@nschneid
Contributor Author

I think it's pretty different. "Finally" is an adverb, and does not need to go before the clause it modifies: it can go in other adverb-friendly positions ("the boy finally raises his hand"). You can use ordinal numbers to indicate sequence order as well ("First", "Second", ...) but "1.", "(a)", etc. would generally not be pronounced as ordinal.

@rueter
Contributor

rueter commented Apr 29, 2024

Thank you. I would not have equated the meanings of 'finally', but it does prove the idea.
Would you agree that 1. or (a) might take appositions? If so, they would essentially be pronouns. Of course, we also need to distinguish the enumerators from telephone number list relations. Telephone is specific, whereas the assignment of 1., 2., 3. is totally random, isn't it?

@nschneid
Contributor Author

Pronouns: I don't think the PRON category should be extended beyond closed-class grammatical items.

Could you elaborate on the point about appositions and telephone numbers? Not sure I follow.

@rueter
Contributor

rueter commented Apr 29, 2024

We have what I will refer to here as roster format:
chair=Kim
telephone=65431
address=25 S.E. Ave. Pine

This, as I understand it, is where the list relation is used. "Equals" seems to be the meaning.

Although the ordering of 1., 2., 3., ... or (a), (b), (c) might be well thought out, the information they are associated with is not predictable. They are placeholders assigned as random identifiers in numeric or alphabetical order.
We are essentially using numbers for counting and quick reference; letters definitely remove counting for me.
I would read:
(n), which is blah blah.
So, maybe, this does not fit appos, either.

I now see this as a nonrestrictive relation.

@nschneid
Contributor Author

OK I see—in the roster format, if multiple items are within one "sentence", list connects the items together, and appos connects the key with the value.

The enumerators are not a great fit because they do not contain any semantic content (maybe that's what you meant by "the information they are associated with is not predictable"). They merely signal discourse order, and assign a label that can be cross-referenced later. You can omit the enumerator without affecting the semantics, but you can't remove the content of the item. So it makes sense to treat the enumerator as dependent, not head. Note that appos is required to be left-headed, and is supposed to be a reversible relation.
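To make the left-headedness point concrete: if the item content is the head and the enumerator precedes it, attaching the enumerator as appos would give a dependent that comes before its head. A minimal sketch of a check for that configuration follows; the function name `right_headed_appos` and the two-token sample are hypothetical and for illustration only.

```python
def right_headed_appos(conllu_text):
    """Return (dependent_id, head_id) pairs where an appos dependent precedes its head."""
    violations = []
    for line in conllu_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and sentence-level comments
        cols = line.split("\t")
        if len(cols) != 10 or not cols[0].isdigit():
            continue  # skip multiword-token ranges and empty nodes
        dep_id, head, deprel = int(cols[0]), cols[6], cols[7]
        is_appos = deprel.split(":")[0] == "appos"  # ignore any relation subtype
        if is_appos and head.isdigit() and int(head) > dep_id:
            violations.append((dep_id, int(head)))
    return violations


# Made-up fragment: the enumerator "(a)" is (wrongly) attached as appos
# to the item content that follows it.
APPOS_SAMPLE = "\n".join([
    "# text = (a) condition",
    "1\t(a)\t(a)\t_\t_\t_\t2\tappos\t_\t_",
    "2\tcondition\tcondition\t_\t_\t_\t0\troot\t_\t_",
])

print(right_headed_appos(APPOS_SAMPLE))
```

The sample is reported because token 1 depends on token 2 via appos. The same tokens with discourse in place of appos would pass, since the check only concerns appos.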

@nschneid nschneid closed this as completed May 4, 2024