Elaborate documentation of X tag #1005

Closed
nschneid opened this issue Dec 13, 2023 · 8 comments
Labels
documentation universal UPOS Universal part-of-speech tags: definitions and examples

@nschneid
Contributor

I think a few cases where X is appropriate can be spelled out in more detail (and the code-switching case should be updated in light of #1001). Will implement this as it shouldn't be too controversial, but feel free to weigh in here.

nschneid added a commit that referenced this issue Dec 13, 2023
@dan-zeman dan-zeman added UPOS Universal part-of-speech tags: definitions and examples universal documentation labels Dec 13, 2023
@dan-zeman dan-zeman added this to the v2.14 milestone Dec 13, 2023
@sylvainkahane
Contributor

For spoken corpora, X is used for unfinished words (scraps? false starts? I am not sure what they are called in English). But we are inconsistent: most of the time we can figure out what the complete word would have been, and then we use the POS of the repair. We hesitated between two strategies:

  1. using the POS of the corrected word (the repair) when we can figure it out.
  2. using X every time and putting the POS of the corrected word in ExtPos when we can figure it out.

I think I prefer Solution 2 because, even if "a~" is repaired by "after" and I know that "a~" was used here as the start of an ADP, I don't want to have "a~" among the ADPs of my corpus.

Our spoken French corpora are currently inconsistent on this point, and we should make a clear decision.
See https://universal.grew.fr/?custom=657ad60d136b4. We use "~" to indicate unfinished words, because "-" is used in orthographic words. It would be easy to change the annotation with a Grew rule as soon as we have decided what to do.
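The comment above mentions making this change with a Grew rule. As a rough illustration of strategy 2, here is a minimal Python sketch that works on raw CoNLL-U text instead (it is not a Grew rule). It assumes, as described for these corpora, that every truncated form ends in "~" and that a truncated token's current UPOS, when it is not already X, is the POS of the intended word; the function name `mark_truncated` is hypothetical.

```python
def mark_truncated(conllu_text: str, marker: str = "~") -> str:
    """Hypothetical helper: retag truncated words as X, keeping their old UPOS in ExtPos."""
    out = []
    for line in conllu_text.splitlines():
        cols = line.split("\t")
        # Pass through comments, blank lines, multiword-token ranges ("3-4")
        # and empty nodes ("5.1"); only plain word lines have an integer ID.
        if line.startswith("#") or len(cols) != 10 or not cols[0].isdigit():
            out.append(line)
            continue
        form, upos, feats = cols[1], cols[3], cols[5]
        # Assumption: a truncated word is any form ending in the marker, and
        # its current UPOS (if not already X) is the POS of the intended word.
        if form.endswith(marker) and len(form) > len(marker) and upos != "X":
            pairs = [] if feats == "_" else feats.split("|")
            pairs = [p for p in pairs if not p.startswith("ExtPos=")]
            pairs.append(f"ExtPos={upos}")
            # CoNLL-U lists features sorted by name, case-insensitively.
            pairs.sort(key=lambda p: p.split("=", 1)[0].lower())
            cols[3] = "X"
            cols[5] = "|".join(pairs)
        out.append("\t".join(cols))
    result = "\n".join(out)
    return result + "\n" if conllu_text.endswith("\n") else result


sample = (
    "# text = a~ after\n"
    "1\ta~\t_\tADP\t_\t_\t2\treparandum\t_\t_\n"
    "2\tafter\tafter\tADP\t_\t_\t0\troot\t_\t_\n"
)
# Token 1 comes out with UPOS=X and FEATS=ExtPos=ADP; token 2 is unchanged.
print(mark_truncated(sample))
```

Tokens already tagged X are left alone, since there is no recorded POS to move into ExtPos for them.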

@Stormur
Contributor

Stormur commented Dec 14, 2023

I would like the definition to stress more that this POS (non-)tag should really be a last resort and that it is actually a non-lexical one, similar to dep.

@nschneid
Contributor Author

@sylvainkahane For words truncated/unfinished due to a dysfluency, my gut feeling is that X would make sense, falling under the word fragment subcase. There are also uses of reparandum where a word is repeated, and there I would expect the regular tag to apply to both tokens.

@Stormur "It should be used very restrictively." seems to say that... are you seeing places where it is overused?

@Stormur
Contributor

Stormur commented Dec 15, 2023

Maybe I am nitpicking, but the wording seems to leave room for each treebank to create its own restrictions, which, as we know, can be arbitrarily broad, instead of specifying that X is really the last thing you should use when there is no other possibility.

@nschneid
Contributor Author

If there is general agreement I would be open to adding a sentence along the lines of "If the word is deemed a 'real' word of the language, then another tag should be used, even if that word's morphosyntactic behavior is unusual."

@nschneid
Contributor Author

Thanks @Stormur: the group agreed to emphasize that it should be used narrowly. Updated https://universaldependencies.org/u/pos/X.html

@nschneid
Contributor Author

nschneid commented Jan 10, 2024

And @sylvainkahane it now mentions truncated words. I think I agree with you about ExtPos being the right place for the intended word POS if it can be determined.

@sylvainkahane
Contributor

Thanks @nschneid. I will adopt the POS X for all truncated words in our spoken corpora and add an ExtPos feature with the POS of the expected word.
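Once truncated words follow the X + ExtPos convention agreed here, a check like the following could catch leftovers. It is a minimal Python sketch over raw CoNLL-U, not part of the UD validator. It assumes, as in these corpora, that "~" ends every truncated form; `check_truncated` is a hypothetical name, and the rule that ExtPos must be a UPOS tag other than X is an assumption of this sketch. ExtPos is treated as optional, since the intended word cannot always be determined.

```python
UPOS_TAGS = {
    "ADJ", "ADP", "ADV", "AUX", "CCONJ", "DET", "INTJ", "NOUN", "NUM",
    "PART", "PRON", "PROPN", "PUNCT", "SCONJ", "SYM", "VERB", "X",
}


def check_truncated(conllu_text: str, marker: str = "~") -> list[tuple[int, str, str]]:
    """Hypothetical checker: report (line number, form, message) for convention violations."""
    problems = []
    for lineno, line in enumerate(conllu_text.splitlines(), start=1):
        cols = line.split("\t")
        # Only plain word lines (integer ID, 10 columns) are checked.
        if line.startswith("#") or len(cols) != 10 or not cols[0].isdigit():
            continue
        form, upos, feats = cols[1], cols[3], cols[5]
        if not (form.endswith(marker) and len(form) > len(marker)):
            continue
        if upos != "X":
            problems.append((lineno, form, f"UPOS is {upos}, expected X"))
        pairs = [] if feats == "_" else feats.split("|")
        for pair in pairs:
            name, _, value = pair.partition("=")
            # Assumed rule: when present, ExtPos names the intended word's
            # category, i.e. a UPOS tag other than X.
            if name == "ExtPos" and (value not in UPOS_TAGS or value == "X"):
                problems.append((lineno, form, f"unexpected ExtPos={value}"))
    return problems


sample = (
    "1\ta~\t_\tADP\t_\t_\t2\treparandum\t_\t_\n"
    "2\tafter\tafter\tADP\t_\t_\t0\troot\t_\t_\n"
    "3\tth~\t_\tX\t_\tExtPos=DET\t2\treparandum\t_\t_\n"
)
for problem in check_truncated(sample):
    print(problem)  # (1, 'a~', 'UPOS is ADP, expected X')
```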

@nschneid nschneid closed this as completed May 5, 2024

4 participants