Language-specific PronTypes in Italian #353

Closed
dan-zeman opened this issue Nov 17, 2016 · 2 comments

@dan-zeman
Member

http://universaldependencies.org/it/feat/PronType.html
http://universaldependencies.org/v2/features.html

When looking for possible extensions of the feature set for UD v2, I came across several Italian-specific values of the PronType feature: Clit, Predet and Ord. I was wondering whether there are possibly better ways of annotating these words.

The examples in the documentation of PronType=Clit seem similar to what is labeled as personal pronoun (PronType=Prs) in other languages (including e.g. Spanish). Many languages have two possible forms of personal pronouns, short/clitic, and full/nonclitic. The usual solution is that both forms are PronType=Prs, and another, language-specific feature distinguishes between the forms. Variant=Short is used in some treebanks but maybe we could define something like Clitic=Yes. Or are there reasons in Italian to say that mi, lo, si etc. are not Prs? @msimi @SimonettaMontemagni @alessandrolenci

Another specific value is PronType=Predet. Used for (pre)determiners like tutti “all”, entrambi “both”. Again, I think that being placed before another determiner is a property orthogonal to our pronoun types. These two instances should be simply PronType=Tot.

And finally, PronType=Ord. Used for ordinal numerals like primo “first”, secondo “second”, terzo “third”. In UD, these are not pronouns or determiners but adjectives (ADJ). And their ordinal status should be marked by NumType=Ord, which is a universal feature. So unless I am missing something, I believe that PronType=Ord should be removed from Italian.
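
For illustration, here is a minimal sketch of how such language-specific values could be spotted in a treebank. It assumes the standard 10-column, tab-separated CoNLL-U layout with FEATS in the sixth column, and it assumes the universal PronType inventory is the one listed on the v2 features page linked above (please check the set against that page). The sample tokens are invented for the example and do not come from any treebank.

```python
# Assumed universal PronType values (verify against the UD v2 features page).
UNIVERSAL_PRONTYPES = {
    "Art", "Dem", "Emp", "Exc", "Ind", "Int", "Neg", "Prs", "Rcp", "Rel", "Tot",
}


def language_specific_prontypes(conllu_text):
    """Map each non-universal PronType value to the word forms carrying it."""
    found = {}
    for line in conllu_text.splitlines():
        # Skip blank lines (sentence boundaries) and comment lines.
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 10:
            continue
        form, feats = cols[1], cols[5]
        if feats == "_":
            continue
        # FEATS looks like "Name=Value|Name=Value"; a value may be "A,B".
        pairs = dict(pair.split("=", 1) for pair in feats.split("|"))
        for value in pairs.get("PronType", "").split(","):
            if value and value not in UNIVERSAL_PRONTYPES:
                found.setdefault(value, []).append(form)
    return found


# Hypothetical sample rows; HEAD, DEPREL and the other columns are placeholders.
SAMPLE = "\n".join(
    "\t".join(row)
    for row in [
        ["1", "io", "io", "PRON", "_", "PronType=Prs", "_", "_", "_", "_"],
        ["2", "mi", "mi", "PRON", "_", "PronType=Clit", "_", "_", "_", "_"],
        ["3", "tutti", "tutto", "DET", "_", "PronType=Predet", "_", "_", "_", "_"],
        ["4", "primo", "primo", "PRON", "_", "PronType=Ord", "_", "_", "_", "_"],
    ]
)

print(language_specific_prontypes(SAMPLE))
```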

@msimi
Contributor

msimi commented Nov 18, 2016

Hi Dan,

We discussed your proposal for UD v2 concerning Italian PronTypes with Simonetta, Cristina and the rest of the team, and here are our conclusions.

> The examples in the documentation of PronType=Clit seem similar to what is labeled as personal pronoun (PronType=Prs) in other languages (including e.g. Spanish). Many languages have two possible forms of personal pronouns, short/clitic, and full/nonclitic. The usual solution is that both forms are PronType=Prs, and another, language-specific feature distinguishes between the forms. Variant=Short is used in some treebanks but maybe we could define something like Clitic=Yes. Or are there reasons in Italian to say that mi, lo, si etc. are not Prs? @msimi @SimonettaMontemagni @alessandrolenci

PronType=Clit: We agree to use PronType=Prs with the addition of Clitic=Yes. Clitics are in fact personal pronouns with a special status, and the new feature Clitic=Yes accounts for this.

> Another specific value is PronType=Predet. Used for (pre)determiners like tutti “all”, entrambi “both”. Again, I think that being placed before another determiner is a property orthogonal to our pronoun types. These two instances should be simply PronType=Tot.

PronType=Predet: As long as we continue using the relation type det:predet, this feature is not really needed; we can use the more standard PronType=Tot.

> And finally, PronType=Ord. Used for ordinal numerals like primo “first”, secondo “second”, terzo “third”. In UD, these are not pronouns or determiners but adjectives (ADJ). And their ordinal status should be marked by NumType=Ord, which is a universal feature. So unless I am missing something, I believe that PronType=Ord should be removed from Italian.

PronType=Ord: We have been using PRON for ordinals only when they have a pronominal function (e.g. Il primo ad arrivare “The first to arrive”). Simonetta says that they have a “hybrid status” between pronouns and adjectives. We have no objection, then, to treating them as adjectives (ADJ) in these cases too; we have other cases of adjectives in a noun role. The feature will be NumType=Ord.
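
The three changes agreed above could be applied to existing annotations roughly as follows. This is only a sketch under stated assumptions: `convert_token` is a hypothetical helper, not part of any UD tool; it works on the UPOS and FEATS strings of a single token; and it leaves UPOS untouched for the Clit and Predet cases, since only the ordinal case involves a tag change (PRON to ADJ).

```python
def parse_feats(feats):
    """Parse a CoNLL-U FEATS string into a dict; '_' means no features."""
    if feats == "_":
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))


def format_feats(feats):
    """Serialize a feature dict, sorted case-insensitively by feature name."""
    if not feats:
        return "_"
    return "|".join(f"{name}={feats[name]}" for name in sorted(feats, key=str.lower))


def convert_token(upos, feats):
    """Hypothetical helper: rewrite the Italian-specific PronType values."""
    f = parse_feats(feats)
    pron_type = f.get("PronType")
    if pron_type == "Clit":
        # Clitics become personal pronouns marked with Clitic=Yes.
        f["PronType"] = "Prs"
        f["Clitic"] = "Yes"
    elif pron_type == "Predet":
        # Predeterminers take the standard total value.
        f["PronType"] = "Tot"
    elif pron_type == "Ord":
        # Ordinals become adjectives with the universal NumType=Ord.
        del f["PronType"]
        f["NumType"] = "Ord"
        upos = "ADJ"
    return upos, format_feats(f)


# Invented feature strings, for illustration only.
print(convert_token("PRON", "Number=Sing|Person=1|PronType=Clit"))
print(convert_token("DET", "Gender=Masc|Number=Plur|PronType=Predet"))
print(convert_token("PRON", "Gender=Masc|Number=Sing|PronType=Ord"))
```

Tokens without one of the three values pass through unchanged, so the function can be mapped over every token line of a file.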

Maria & c.

@dan-zeman
Member Author

Great, thanks. As for the pronominal ordinals, I did not know that you were actually distinguishing specific usage. I am still slightly in favor of keeping them as ADJ because I believe that this is what most UD treebanks do, and one POS category can indeed perform different functions.
