annotation of "se" and "dont" in French #530

Closed
ioan2 opened this issue Feb 19, 2018 · 7 comments

ioan2 commented Feb 19, 2018

Hello,

This issue is close to issue #461, since the annotation of the reflexive pronoun "se" has been discussed, but no decision seems to have been taken.
In fr-ud-train.conllu (v2), "se" is annotated 1487 times with the "obj" relation and 574 times with "expl". I would like to understand the motivation for this. Traditional French grammar distinguishes between "verbes pronominaux" and "verbes exclusivement pronominaux": the former can be used reflexively, while the latter always take a reflexive pronoun. However, having checked some of the examples, I do not think that this is the distinction being made here. For example, with the verb form "se voit", "se" is attached to "voit" 10 times as "obj" and 4 times as "expl":

  • sent_id = fr-ud-train_05555 "Le gouvernement de Bourassa se voit donc obligé ..."
  • sent_id = fr-ud-train_06271 "... elle se voit traitée ..." (obj)

If there is a semantic distinction, it is very subtle. For dependency analysis I would prefer to mark it with a distinctive feature or XPOS tag rather than with a different dependency relation.

A similar problem is the attachment of "dont". I fully agree when it is attached as "nmod" in sentences like

  • sent_id = fr-ud-train_00061: "....cependant un certain nombre d'artefacts caractéristiques du viatique funéraire royal de la XVIIIe dynastie ont pu être retrouvés et dont le tombeau de Toutânkhamon..."

However, there are cases where "dont" is attached (correctly, in my view) to a verb, but with the "iobj" relation:

  • sent_id = fr-ud-train_00537: "Professionnalisme et qualité, voilà les mots qui conviennent pour l'accompagnement dont j'ai bénéficié."
    There are more examples like this (sentences 543, 824, 852, 964). I think "iobj" is not the correct relation, since "dont j'ai bénéficié" can be rephrased as "j'ai bénéficié de l'accompagnement", where "de l'accompagnement" is not an "iobj" but rather an "obl" (or something different :-)).
    Is there a reason why "dont" is "iobj" in these cases?
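A minimal sketch of how such cases could be listed, assuming a standard 10-column CoNLL-U file. The function name `dont_attachments` and the sample sentence (a simplified, hand-made fragment that only mirrors the pattern in fr-ud-train_00537) are illustrative and not taken from the treebank:

```python
from collections import Counter

def dont_attachments(conllu_text):
    """Count (deprel, head UPOS) pairs for every token whose form is 'dont'."""
    counts = Counter()
    for block in conllu_text.strip().split("\n\n"):
        rows = [line.split("\t") for line in block.splitlines()
                if line and not line.startswith("#")]
        # Keep plain word lines only (skip ranges like 1-2 and empty nodes like 1.1).
        tokens = {r[0]: r for r in rows if len(r) == 10 and r[0].isdigit()}
        for tok in tokens.values():
            if tok[1].lower() == "dont":
                head = tokens.get(tok[6])
                head_upos = head[3] if head else "ROOT"
                counts[(tok[7], head_upos)] += 1
    return counts

# Hypothetical sample: "accompagnement dont j'ai bénéficié", annotated by hand.
SAMPLE_DONT = "\n".join("\t".join(row.split()) for row in [
    "1 accompagnement accompagnement NOUN _ _ 0 root _ _",
    "2 dont dont PRON _ _ 5 iobj _ _",
    "3 j' je PRON _ _ 5 nsubj _ _",
    "4 ai avoir AUX _ _ 5 aux _ _",
    "5 bénéficié bénéficier VERB _ _ 1 acl:relcl _ _",
])

for (deprel, head_upos), n in dont_attachments(SAMPLE_DONT).items():
    print(deprel, head_upos, n)
```

Run over the whole training file, the resulting counts would separate noun-attached "nmod" cases from verb-attached "iobj" cases.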

Best regards
Johannes


dseddah commented Feb 19, 2018

Hi,
Can I ask whether you noticed the same patterns in the other French treebanks? (In the Sequoia and FTB treebanks, for example, "se" was annotated manually.)

Djamé


ioan2 commented Feb 19, 2018

Hi Djamé,
not yet, but I will check as soon as I can.

Johannes

@gossebouma

I counted nsubj + verb + se in UD_French, UD_French-Sequoia and UD_French-ParTUT (v2.1). By checking for nsubj I hope I avoided including occurrences of se in passives. French-PUD and French-FTB do not have lemmas so I skipped those.

Here is an overview of the number of distinct verb lemmas occurring with se as obj, with se as expl, and the number of roots that occur with both se/expl and se/obj.

| treebank | obj | expl | overlap |
|----------|-----|------|---------|
| Sequoia  | 10  | 107  | 2       |
| French   | 251 | 163  | 16      |
| ParTUT   | 28  | 24   | 4       |

It seems there are interesting differences (i.e. only Sequoia has an overall preference for expl), but I leave that to you.

Here is some code to produce more detailed statistics if you like.
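The code originally linked here is not reproduced in this thread. As a stand-in, the following is a rough sketch of the counting described above, not the original script. It assumes 10-column CoNLL-U input, identifies "se" by word form (`SE_FORMS` is a guess at the spellings used), requires a plain `nsubj` dependent on the same head to filter out passives, and ignores relation subtypes on "se". All function names and the sample data are hypothetical:

```python
from collections import defaultdict

SE_FORMS = {"se", "s'", "s’"}  # assumption: surface forms of the reflexive clitic

def read_sentences(lines):
    """Yield each sentence as a list of 10-column CoNLL-U token rows."""
    sent = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            if sent:
                yield sent
                sent = []
        elif not line.startswith("#"):
            cols = line.split("\t")
            # Skip multiword-token ranges (1-2) and empty nodes (1.1).
            if len(cols) == 10 and cols[0].isdigit():
                sent.append(cols)
    if sent:
        yield sent

def se_relations_by_lemma(lines):
    """Map head lemma -> set of relations ('obj'/'expl') under which 'se' attaches."""
    result = defaultdict(set)
    for sent in read_sentences(lines):
        by_id = {t[0]: t for t in sent}
        heads_with_nsubj = {t[6] for t in sent if t[7] == "nsubj"}
        for t in sent:
            rel = t[7].split(":")[0]  # drop subtypes such as expl:pass
            if (t[1].lower() in SE_FORMS and rel in ("obj", "expl")
                    and t[6] in heads_with_nsubj and t[6] in by_id):
                result[by_id[t[6]][2]].add(rel)
    return result

def summarize(rel_by_lemma):
    """Return (#lemmas with obj, #lemmas with expl, #lemmas with both)."""
    sets = list(rel_by_lemma.values())
    return (sum("obj" in s for s in sets),
            sum("expl" in s for s in sets),
            sum({"obj", "expl"} <= s for s in sets))

def _sent(*rows):
    """Build one sentence from space-separated rows (sample data only)."""
    return ["\t".join(r.split()) for r in rows] + [""]

# Hypothetical, hand-annotated sample; the third sentence is deliberately inconsistent.
SAMPLE = (
    _sent("1 Il il PRON _ _ 3 nsubj _ _", "2 se se PRON _ _ 3 obj _ _",
          "3 lave laver VERB _ _ 0 root _ _")
    + _sent("1 Il il PRON _ _ 3 nsubj _ _", "2 se se PRON _ _ 3 expl _ _",
            "3 souvient souvenir VERB _ _ 0 root _ _")
    + _sent("1 Elle elle PRON _ _ 3 nsubj _ _", "2 se se PRON _ _ 3 expl _ _",
            "3 lave laver VERB _ _ 0 root _ _")
)

print(summarize(se_relations_by_lemma(SAMPLE)))  # (1, 2, 1)
```

To run it on a treebank, pass an open file object instead of `SAMPLE`, since the functions only iterate over lines.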


ioan2 commented Feb 20, 2018

Hi
I just checked UD_French-Sequoia for "dont" and there are instances of "iobj" as in UD_French:

  • sent_id = emea-fr-dev_00024: "...la perfusion dépend de la manière dont le SCA doit être traité: "
  • sent_id = Europar.550_00003: "...les producteurs de viande de boeuf et de mouton, dont on attend aujourd'hui qu'ils vendent leurs produits..."

Johannes


perrier54 commented Feb 20, 2018

Regarding the annotation of "se", there is a very simple criterion for deciding between EXPL, OBJ and IOBJ:

  • If it is possible to replace "se" with "le", without changing the meaning of the verb, put OBJ.
  • If it is possible to replace "se" with "lui", without changing the meaning of the verb, put IOBJ.
  • In other cases, put EXPL.

Some examples:
Il se lave (he is washing himself) -> OBJ
Il se lave les mains (he is washing his hands) -> IOBJ
Il se souvient (he remembers) -> EXPL
Ce livre se vend bien (this book sells well) -> EXPL
Il se fait huer (he gets booed) -> EXPL (il le fait huer has a causative meaning but not Il se fait huer)
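The criterion can be written down as a small decision function. This is only a sketch: the two boolean inputs are human judgments about the substitution tests, which code cannot supply, and the function name `classify_se` is made up for illustration. The "le" test is checked first, following the order of the list above.

```python
def classify_se(le_keeps_meaning: bool, lui_keeps_meaning: bool) -> str:
    """Pick a relation for 'se' from the outcome of the two substitution tests.

    le_keeps_meaning:  replacing 'se' with 'le' preserves the verb's meaning
    lui_keeps_meaning: replacing 'se' with 'lui' preserves the verb's meaning
    """
    if le_keeps_meaning:
        return "obj"
    if lui_keeps_meaning:
        return "iobj"
    return "expl"

# Judgments for three of the examples above, supplied by hand.
print(classify_se(True, False))   # Il se lave           -> obj
print(classify_se(False, True))   # Il se lave les mains -> iobj
print(classify_se(False, False))  # Il se souvient       -> expl
```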

Even though the criterion is simple, in some borderline cases it is difficult to choose. The two examples from @ioan2 are borderline cases:
sent_id = fr-ud-train_05555 "Le gouvernement de Bourassa se voit donc obligé ..."
sent_id = fr-ud-train_06271 "... elle se voit traitée ..." (obj)
In my opinion, "se voir + predicate" does not have the same meaning as "voir + obj + predicate"; thus, the dependency is EXPL in both examples.

The problem for "dont" is different. It comes from the unclear status of IOBJ in UD (see my comments on this issue, unfortunately in French).


ioan2 commented Feb 20, 2018

I agree with this criterion for distinguishing OBJ from EXPL. I checked the first 10 examples of "se" in fr-ud-train.conllu, and I think that 8 of them, currently annotated as OBJ, should be EXPL under this criterion.

@dan-zeman

I am tentatively closing this issue. If actual annotation in UD_French-GSD still contains bugs w.r.t. attachment of se, please open an issue in the repository of that treebank.
