Unanalyzability #30

Closed
omriabnd opened this issue Sep 18, 2018 · 5 comments

Comments

@omriabnd
Member

Unify chapters 9 and 6; consider treating "the real deal" as unanalyzable.

@nschneid
Collaborator

An initial attempt to clarify the criteria for unanalyzability—thoughts?:

If somebody who hadn't heard the expression or name before would be able to figure out what/who it is based on the meanings of the individual words/parts, it is probably analyzable.
Unanalyzable expressions include:

  • personal names like "John Q. Smith" (excluding titles like "Dr.", "St.", "President", "Queen", which are E)
  • titles of works of art/literature/law: A Tale of Two Cities (book), Marbury v. Madison (court case)
  • idiomatic multiword expressions with unpredictable (opaque or not-fully-transparent) meaning: "hot dog" (food), "give up" ('quit'), "the real deal", "kick the bucket" ('die'). This includes phrases from another language: "crème de la crème" (in English).

Proper names of places, organizations, and events are often analyzable, as are many specialized/technical terms:

  • Silicon_E Valley_C
  • Microsoft_C Corporation_E
  • University_C of_R California_E
  • UC_C Berkeley_C
  • Society_C of_R [Linguistics_E Undergraduate_E Students_C]_E
  • World_E War_C II_Q
  • [natural_E language_C]_A processing_P (?)
  • time_E signature_C (music)

However, a name embedded within another name generally is not:

  • [St. Lawrence UNA]_E River_C
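
For anyone who wants to check examples like these mechanically, here is a minimal sketch of a reader for the inline notation used above (`word_CAT` for a single unit, `[ ... ]_CAT` for a bracketed unit). The names `Unit` and `parse` are hypothetical and not part of any existing tool, and treating a trailing bare `UNA` inside brackets as an "unanalyzable" marker is an assumption based only on the St. Lawrence example.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Unit:
    """One annotated unit: a single word or a bracketed group (hypothetical type)."""
    category: Optional[str] = None   # e.g. "E", "C", "R"; None if the word has no label
    text: Optional[str] = None       # surface word for leaves, None for bracketed groups
    children: List["Unit"] = field(default_factory=list)
    unanalyzable: bool = False       # assumption: set when the group ends in a bare "UNA"


# "[", a closing "]_CAT", or any run of non-space, non-bracket characters
TOKEN = re.compile(r"\[|\]_[A-Z]+|[^\s\[\]]+")


def parse(annotation: str) -> List[Unit]:
    """Parse a string such as '[St. Lawrence UNA]_E River_C' into a list of units."""
    stack: List[List[Unit]] = [[]]
    for tok in TOKEN.findall(annotation):
        if tok == "[":
            stack.append([])
        elif tok.startswith("]_"):
            if len(stack) == 1:
                raise ValueError("closing bracket without an opening bracket")
            children = stack.pop()
            unit = Unit(category=tok[2:], children=children)
            # Assumed convention: a trailing unlabeled "UNA" marks the group as unanalyzable.
            if children and children[-1].text == "UNA" and children[-1].category is None:
                children.pop()
                unit.unanalyzable = True
            stack[-1].append(unit)
        else:
            word, sep, cat = tok.rpartition("_")
            if sep and cat.isalpha() and cat.isupper():
                stack[-1].append(Unit(category=cat, text=word))
            else:
                stack[-1].append(Unit(text=tok))  # word with no category label
    if len(stack) != 1:
        raise ValueError("unbalanced brackets")
    return stack[0]


if __name__ == "__main__":
    for unit in parse("[St. Lawrence UNA]_E River_C"):
        print(unit.category, unit.unanalyzable, unit.text, [c.text for c in unit.children])
```

With this reading, the embedded name comes out as a single E unit whose words carry no categories of their own, while `River` is an ordinary C unit next to it.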

@nschneid
Collaborator

nschneid commented Sep 19, 2018

Another unanalyzable MWE: "French horn" (a kind of musical instrument, with a particular shape and physical properties, not necessarily made in France)

@omriabnd
Member Author

The criterion is whether you can recognize the semantic contribution of each part to the whole. Names are an exception.
By that token, "French horn" is analyzable.

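Add a criterion: if something is named after something else, the name of that something else is not analyzed internally. When the name refers to the person himself, it is analyzed as usual:
[St._E Lawrence_C]_A was a kind man

But when it is embedded in the name of a river or a city, it is not: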
I live [by_R the_E [St. Lawrence]_E River_C]_A
I live [in_R [St. Paul]_C]_A

So no inner analysis of St. Paul / St. Lawrence.

Dotan?

@nschneid
Collaborator

Instead of "named after something else", "named after a completely different kind of thing (e.g. city named after person)".

@dotdv
Collaborator

dotdv commented Sep 20, 2018

> I live [by_R the_E [St. Lawrence]_E River_C]_A
> I live [in_R [St. Paul]_C]_A
> So no inner analysis of St. Paul / St. Lawrence.
> Dotan?

Yes, that's what I tell annotators if they ask. Sure we can add something about it.


3 participants