New prototypes for flat #974

Closed
nschneid opened this issue Oct 1, 2023 · 5 comments


@nschneid
Contributor

nschneid commented Oct 1, 2023

The current flat guidelines give four kinds of headless structures: 1) names, 2) dates, 3) complex numerals, and 4) foreign phrases. While some of these examples (e.g. "Hillary Rodham Clinton") are clearly correct, others might be better suited to a headed analysis, a question that has come up in a number of discussions (e.g. #455).

Here is a proposed alternative for discussion (this would follow the general definition of flat as a structure with no single head):

The prototypes for flat are:

  • (a) personal names (or parts thereof) that lack the hallmarks of general grammatical constructions in the language (e.g. "Hillary Rodham Clinton")
  • (b) foreign expressions that may be borrowed or quoted, but whose original grammatical structure is not necessarily accessible to speakers of the language(s) being annotated. "Foreign" includes not just natural languages but also notational systems that are considered external to natural language proper and are governed by separate rules (e.g., musical chord progressions, software code excerpts). Foreign status should additionally be indicated with the feature Foreign=Yes (the subtyped relation flat:foreign is not recommended).
  • (c) items that occur in an iconic sequence rather than in head-dependent or coordination relationships (e.g. "do re mi"), including onomatopoeia ("quack quack quack") and gibberish ("blargety blarg blarg")
  • (d) items separated into parts for readability (e.g., telephone numbers; contrast goeswith, which addresses improper spacing, and space-separated numerals like "1 000 000", which may be treated as single words)
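To make the prototypes concrete, here is a minimal Python sketch that renders a headless span as CoNLL-U rows. It follows the usual UD convention for flat (the first token is the technical head and every later token attaches to it) and assumes, for illustration only, that the span makes up the whole sentence, so its first token is the root. LEMMA, XPOS, DEPS and MISC are left unfilled. The function name and its parameters are hypothetical and not part of any UD tool.

```python
from typing import List, Optional


def flat_span_to_conllu(
    forms: List[str],
    upos: str = "_",
    foreign: bool = False,
    subtype: Optional[str] = None,
) -> List[str]:
    """Render a headless span as CoNLL-U rows (hypothetical helper).

    Assumption: the span is the whole sentence, so token 1 is the root
    and every later token attaches to token 1.
    """
    # Optional language-specific subtype, e.g. "redup" -> "flat:redup".
    deprel = "flat" if subtype is None else f"flat:{subtype}"
    # Prototype (b): foreignness is marked with a feature, not flat:foreign.
    feats = "Foreign=Yes" if foreign else "_"
    rows = []
    for i, form in enumerate(forms, start=1):
        head = 0 if i == 1 else 1
        rel = "root" if i == 1 else deprel
        # Columns: ID FORM LEMMA UPOS XPOS FEATS HEAD DEPREL DEPS MISC
        rows.append("\t".join(
            [str(i), form, "_", upos, "_", feats, str(head), rel, "_", "_"]
        ))
    return rows


if __name__ == "__main__":
    for row in flat_span_to_conllu(["Hillary", "Rodham", "Clinton"], upos="PROPN"):
        print(row)
```

For "Hillary Rodham Clinton", "Rodham" and "Clinton" both get HEAD 1 and DEPREL flat. Passing `foreign=True` adds Foreign=Yes to every token, and `subtype` covers an optional language-specific subtype such as the flat:redup suggested later in this thread.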

What is considered to be transparent linguistic syntax (as opposed to flat structure) is subject to treebank-specific policies (e.g., some treebanks might provide proper grammatical analyses in the presence of code-switching, or treat mathematical notation as following linguistic strategies like predication).

The application of flat may extend beyond the prototypical cases to, e.g., various kinds of name and number expressions. However, even if an expression is idiosyncratic or follows a specialized pattern, every effort should be made to find a head rather than employing flat. If a head can be found but no substantive dependency relation is appropriate, dep can be used.
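The preference order in the paragraph above (headed analysis first, then dep, with flat as the fallback) can be written down as a toy decision function. This is only an illustrative sketch: the boolean input stands in for the annotator's judgment about whether a head can be found, which no function can make, and the name `choose_deprel` is hypothetical.

```python
from typing import Optional


def choose_deprel(head_found: bool, substantive_rel: Optional[str] = None) -> str:
    """Toy encoding of the proposed preference order (hypothetical helper)."""
    # 1. A head was found and a substantive relation fits: use that relation.
    if head_found and substantive_rel:
        return substantive_rel
    # 2. A head was found but no substantive relation is appropriate: use dep.
    if head_found:
        return "dep"
    # 3. No single head can be identified: flat remains available.
    return "flat"
```

The point of the ordering is that flat is reached only after the headed options have been ruled out, so even idiosyncratic expressions default to a headed analysis.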

@dan-zeman
Member

  • numerals like "1 000 000"

I would not include this example in order to avoid confusion. It is actually a prototypical example (for me maybe the only example) of a legitimate word-with-spaces in languages like Czech or French. See the first paragraph here.

@nschneid
Contributor Author

nschneid commented Oct 1, 2023

Ah, I couldn't recall whether there was already a policy on that. Updated.

@nschneid
Contributor Author

Should the part about onomatopoeia be qualified: "though lexicalized combinations (tick tock) may be treated as compounds"?

@Stormur
Contributor

Stormur commented Oct 25, 2023

I would suggest flat:redup for those.

@nschneid
Contributor Author

Closing in favor of #989, which incorporates the prototypes. Per the latest discussion there, we are not making any universal recommendation about specific subtypes, but languages may wish to use them.

nschneid added a commit that referenced this issue Dec 1, 2023
- Multiword Expressions (#974, #989)
- Semi-mandatory Relation Subtypes (#990)