Compound verbs in Marathi #386

Open
vinit-ivar opened this Issue Dec 20, 2016 · 19 comments

@vinit-ivar

Marathi has multi-verb compounds where non-main verbs are bleached of semantic value and only modify Aktionsart. Typically, the main verb has VerbForm=Conv, whilst the non-main verb has VerbForm=Fin.

There are also similar constructs, though, where the non-main verbs do have semantic value and represent a sequence of actions, similar to how the converb functions in Slavic languages. For instance, contrast:

मी बसून जेवतो. mī basūn jevto. "I sit and then eat." (PRN-1SG sit-CONV eat-1MSG)
मी बसून टाकतो. mī basūn ṭakto. "I emphatically/vehemently sit." (PRN-1SG sit-CONV put-1MSG)

The first of these is fairly trivial to parse: जेवतो jevto is the root, with an advmod between it and बसून basūn. The second is a bit trickier. I'm not too sure of the validity of having a completely semantically blank verb serve as the root - they seem to behave, in this context, similar to auxiliaries (although with some differences). What's a good way to handle this? I believe the Turkish treebank also has some similar issues, maybe somebody could weigh in?

@dan-zeman
Member

Agreed that advmod(जेवतो, बसून) is the way I would go with the first example, or at least it would occur to me as the first option. There is actually another option, to label the relation xcomp instead of advmod, because the subject of basūn is inherited from jevto. But this construction does not seem to fit very well into the (quite diverse!) set of situations where xcomp is used (http://universaldependencies.org/u/overview/complex-syntax.html#secondary-predicates, http://universaldependencies.org/u/dep/xcomp.html), thus I guess I still prefer advmod.

Now with the second example I see your point but I do not think we should change the annotation just because ṭakto is semantically weak. The syntactic structure is still the same, right?

I would not classify ṭakto as an auxiliary. The construction here does not seem to be a periphrastic verb form where ṭakto is responsible only for additional features such as tense, aspect or modality. If I understand it correctly, you could as well say something like mī basto "I sit" when you just wanted to construct a finite form in the present tense? Then it should be OK here to make ṭakto the root. The concerns about its semantic weakness are not unlike concerns about semantic weakness of take in English take a picture; but we do not grant it a different analysis from take a glass from the cupboard.
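As a rough sketch of the analysis described above, the following Python snippet writes both sentences out in CoNLL-U with the finite verb as root and the converb attached as advmod. The helper names `to_conllu` and `converb_sentence`, the UPOS tags, the nsubj relation for मी, and the blank lemma and feature columns are assumptions for illustration, not taken from any treebank.

```python
# Sketch only: tags, nsubj, and blank columns are assumptions, not treebank data.
def to_conllu(tokens):
    """tokens: list of (form, upos, head, deprel); returns 10-column CoNLL-U rows."""
    rows = []
    for idx, (form, upos, head, deprel) in enumerate(tokens, start=1):
        # Column order: ID FORM LEMMA UPOS XPOS FEATS HEAD DEPREL DEPS MISC
        rows.append("\t".join(
            [str(idx), form, "_", upos, "_", "_", str(head), deprel, "_", "_"]
        ))
    return "\n".join(rows)


def converb_sentence(finite_verb):
    """Same tree for both examples: the finite verb (token 3) is the root."""
    return [
        ("मी", "PRON", 3, "nsubj"),
        ("बसून", "VERB", 3, "advmod"),
        (finite_verb, "VERB", 0, "root"),
        (".", "PUNCT", 3, "punct"),
    ]


sequential = to_conllu(converb_sentence("जेवतो"))  # "I sit and then eat."
light = to_conllu(converb_sentence("टाकतो"))       # the semantically weak V2
print(sequential)
print()
print(light)
```

The two outputs differ only in the FORM of the third token, which is the point being made: semantic weakness of the finite verb alone does not change the tree.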

@jnivre
Contributor
jnivre commented Dec 21, 2016

I agree with Dan. The use of "auxiliary" should be restricted to grammaticalised expressions of TAME. Light verbs should not be treated as auxiliaries.

@jnivre
Contributor
jnivre commented Dec 21, 2016

Could the second case possibly be analysed as a serial verb construction using compound:svc?

@dan-zeman dan-zeman added this to the lg-specific v2 milestone Dec 21, 2016
@vinit-ivar

These seem a bit distinct from the English take a picture, though - that seems more similar to Persian-style light verbs, and take is the only verb in that context anyway. I'm using compound:lvc for similar constructs (they're very common in Marathi).

I do see the logic behind using advmod in this context, though I'm still a bit iffy about that. For instance, there's other compound verb patterns where the V2 very strongly indicates, for instance, the inchoative aspect - मी बसू लागलो. mī basū laglo. "I begin sitting." (PRN-1SG sit-INCP attach-1MSG). The V1 is (morphologically) not the same as in the first kind of compound (it's glossed as an inceptive in my analyser), but it still seems like the V2 here marks aspect. Perhaps a subtype of aux might work for verbs that mark what seems like Aktionsart?

@dan-zeman yes, mī basto is valid, it's the imperfective present.

@jnivre To be honest, I'm not entirely sure how compound:svc works - it seems like it indicates a series of actions, based on the example in the docs. What languages are using it?

@dan-zeman
Member

I admit that लागलो looks more like an auxiliary to me but I would have to know much more about Indo-Aryan languages to be able to propose a boundary between auxiliary and main verbs. Auxiliaries are supposed to be a closed, and typically rather small class. You should be able to enumerate them, together with the list of constructions in which they appear (that would be good to do anyways, for the documentation).

It is desirable to somewhat synchronize the approaches across Indo-Aryan languages and you may want to consult the UD Hindi treebank and see what is analyzed as auxiliaries there. However, please do not take it too seriously, as it seems to need a lot of cleaning (see http://hdl.handle.net/11346/PMLTQ-TLYD).

@jnivre
Contributor
jnivre commented Dec 21, 2016

The "compound:svc" is new and is not used anywhere yet, but it has been discussed for Chinese and Hebrew in #323.

@vinit-ivar

I'm aware that UD Hindi isn't very good, I've run into several annoyances with it myself. Looking through it, it looks like the UD Hindi solution to both kinds of compound verbs that I mentioned is indeed to use aux - however, if the copula follows the secondary verb, there's an additional auxpass(V2, cop). Does auxpass (or aux:pass for v2) really work here? It seems like aux(V1, V2) and cop(V1, cop) might be a better way to put it.

About using aux for the first kind - @dan-zeman, the class of possible V2s is a closed class, and not very large (perhaps around 15 unique verbs, off the top of my head). Still, I do see the issues with using aux. About compound:svc - it seems like it would work for the kind of compound verb construct where the two verbs both have semantic value, and it would perhaps work better than advmod. I'm still not sure it adequately describes a situation where one verb is semantically empty, but perhaps I haven't really understood how it works.

@MemduhG
MemduhG commented Dec 22, 2016

We've been discussing similar constructs for Turkish (and other Turkic languages, where @ftyers could probably weigh in better than me) for a while. There are some verbs that can modify the aspect (aktionsart?) of the previous verb, which is usually VerbForm=Conv. The following example is adapted from UD Turkish, still in UD1 style.

1   Dolaşıp dolaş   VERB    Verb    Aspect=Perf|Mood=Ind|Negative=Pos|Tense=Pres|VerbForm=Trans 2   advcl   _   _
2   durma   dur VERB    Verb    Aspect=Perf|Mood=Imp|Negative=Neg|Number=Sing|Person=2|Tense=Pres   0   root    _   _   

Dolaşıp durma, "don't wander (continuously)". Wander.CONV stand.NEG.IMP is the literal gloss of this sentence, and currently we have durma, "don't stand" as the head and the content word as advcl to it.

When the particular verb durmak is not used, the same construction gives a sense of consecutiveness, and I think it might make sense to use compound:svc to differentiate this from the above, but advcl is also a good description for the second case, and the first does not quite feel like advcl.
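To make the current Turkish analysis concrete, here is a minimal sketch that reads the two rows above and lists the relations as label(head, dependent). It assumes the columns are tab-separated as in standard CoNLL-U; the function name `read_conllu` and the dictionary keys are hypothetical.

```python
# The two UD1-style rows from the example above, with tabs restored (assumed).
SAMPLE = (
    "1\tDolaşıp\tdolaş\tVERB\tVerb\t"
    "Aspect=Perf|Mood=Ind|Negative=Pos|Tense=Pres|VerbForm=Trans\t2\tadvcl\t_\t_\n"
    "2\tdurma\tdur\tVERB\tVerb\t"
    "Aspect=Perf|Mood=Imp|Negative=Neg|Number=Sing|Person=2|Tense=Pres\t0\troot\t_\t_\n"
)


def read_conllu(text):
    """Parse CoNLL-U token rows into dicts, skipping blank and comment lines."""
    tokens = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        feats = {} if cols[5] == "_" else dict(f.split("=", 1) for f in cols[5].split("|"))
        tokens.append({
            "id": int(cols[0]),
            "form": cols[1],
            "lemma": cols[2],
            "feats": feats,
            "head": int(cols[6]),
            "deprel": cols[7],
        })
    return tokens


tokens = read_conllu(SAMPLE)
by_id = {t["id"]: t for t in tokens}
relations = []
for t in tokens:
    head_form = "ROOT" if t["head"] == 0 else by_id[t["head"]]["form"]
    relations.append(f'{t["deprel"]}({head_form}, {t["form"]})')
print(relations)
```

Switching the first relation to compound:svc, or reversing the head direction, would be a one-field change in this representation, which is roughly what is being weighed here.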

@vinit-ivar
vinit-ivar commented Dec 22, 2016 edited

To sum it up - "verbs" that have no semantic value but assign aspect or Aktionsart could be marked advmod or advcl but a better solution would be ideal. Perhaps something similar to aux (maybe a subtype?) as some verbs in these contexts sometimes assign only aspect (and sometimes Aktionsart). Verbs in similar contexts that do have semantic value could be marked with advmod, advcl, or compound:svc if they're consecutive actions.

@dan-zeman
Member

I agree. Just note that there is probably a continuous scale of verbs with different levels of "semantic value", while the annotation approach has to suddenly change at some point of the scale. That gives you some freedom (but also vagueness). We have had similar issues all over UD and sometimes we have been rather restrictive about what qualifies as a function word (e.g., with copulas we strongly encourage people not to include any pseudo-copulas such as "to become", although they could also be said to be semantically very weak).

@vinit-ivar

One could argue that the verbs are on a spectrum, yes - some are semantically empty, some still retain some semantic value, but which/how much is honestly up to personal interpretation and is pretty debatable. For me, seeing as these V2s are a closed class that have been enumerated in several grammars, I would tend to agree with having all of them marked with maybe something like aux:lvb (they're also referred to as light verbs in the literature). advmod could perhaps remain the way it is for temporally consecutive actions, I suppose - similar constructs exist in (East?) Slavic languages with perfective/imperfective participles, I believe.

On semantic values - off the top of my head, two V2s, the equivalents for give and take - have some fairly significant residual semantic value, and imply benefactivity (for the agent, or someone else). I'm not too sure how to handle these within this system.

@jnivre
Contributor
jnivre commented Dec 24, 2016

I don't object to using aux for the closed class, but I would advise against the subtype "lvb", since the general policy is to treat light verb constructions either as compounds (for example, in Persian) or as completely compositional. I agree that there is probably a continuum from light verbs to auxiliaries (via so-called semi-auxiliaries) but if we treat them as aux we are in fact saying that they are no longer light verbs, so a subtype like lvb would be confusing.

@dan-zeman
Member

@vinit-ivar : Yes, there are converbs (also called transgressives, gerunds or adverbial participles) in Slavic languages. Their frequency differs by language; in Czech for example they are either frozen expressions or archaic style. Past/perfective converb would correspond to the "do something and then something" sense, and I believe it would be clearly advcl. Present/imperfective converb would correspond to "do something while doing something". None of them are used in a manner that would call for auxiliary or light-verb-like analysis.

@ashwinivd

In the example मी बसून टाकतो. mī basūn ṭakto. "I emphatically/vehemently sit." (PRN-1SG sit-CONV drop-1MSG), I would argue that the sentence in this case is perhaps better translated as "I sit, somehow finishing it off", where ṭakto is contributing the meaning of 'getting it over with', rather than some kind of emphatic meaning. It is, of course, different from its 'full' verb meaning of drop as in 'I dropped the ball'.

For example, I might say this sentence in the context of not really wanting to sit (maybe I am in a hurry), but I'll sit and get it over with. The verb ṭakto in this case is a light verb. It is also not semantically empty and contributes a meaning of completeness, with the additional meaning of being unwanted. The sit+drop combination is also slightly odd for me, as I would prefer transitive verbs with light verb drop e.g. read, do.

Light verbs need not occur with just nouns in Marathi/Hindi (e.g. the take a picture case), but may occur with verbs as well (usually, all these have certain combinatorial constraints). So these cases can have the compound:lvc label. They are not aux, or serial verbs as there is a modulation of the main verb semantics by the LV. It is true that the aux/serial verb/light verb divide is not altogether clear at times. Seiss has a nice review of the three types, for reference.

To summarize some key points from her article (with the caveat that some small differences will still exist across languages):
serial verbs: "single sub-events which constitute a complex event together"
light verbs: "modify or modulate the event of the preverb" (which could be a noun/verb), may also change valency of the main verb and also result in a composite event semantics
auxiliaries: no combinatorial constraints (can appear with any main verb), unlike light/serial verbs
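Read as a rough decision procedure, the three criteria above might look like the sketch below. The function name, its boolean parameters, and the order of the checks are assumptions made for illustration; the review does not present them as an algorithm, and real cases may not separate this cleanly.

```python
def classify_v2(combines_with_any_main_verb, modulates_main_event, is_own_subevent):
    """Hypothetical helper; each boolean is a judgment an annotator would supply."""
    if combines_with_any_main_verb:
        # No combinatorial constraints: the auxiliary criterion.
        return "auxiliary"
    if modulates_main_event:
        # Modifies or modulates the event of the preverb: the light verb criterion.
        return "light verb"
    if is_own_subevent:
        # Contributes a sub-event of a complex event: the serial verb criterion.
        return "serial verb"
    return "unclear"


# The reading of ṭakto given above: restricted combinations, adds 'getting it over with'.
print(classify_v2(False, True, False))
```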

@MemduhG
MemduhG commented Jan 2, 2017

@ashwinivd : as far as I have seen, at least in Turkic and Iranian languages, compound:lvc is used for nominal + verb compounds. I think there is at least some syntactic difference between this case and using non-finite + finite verbs. It might make the nomenclature unintuitive for people approaching from this perspective with IA languages, but I think this warrants a different dependency label.

@vinit-ivar

Yeah, I'm not too comfortable with compound:lvc either - yes, Seiss calls them light verbs, but in my opinion calling both N+V constructions and V+V constructions "light verb constructions" is a bit dodgy - they're both very distinct syntactic combinations, and deserve distinct labels.

Semantic validity is also not really something I've noticed cross-linguistically - it appears that the Marathi verb rāhṇe "to stay", which indicates unbroken, continuing actions when used in compound constructs, is similar to the Turkish durmak "to stand", and Kyrgyz beru "to give", which are also used in similar constructs. It's hard to argue that all three of these are simultaneously semantically valid, particularly when Marathi deṇe "to give" is also used in a compound verb construction, to indicate benefactivity - which is completely different from how it would be used in Kyrgyz.

Whilst I'm not fluent in any Dravidian languages, it appears that Krishnamurti considers all verbs used in these kinds of compounds (which also exist in Dravidian languages) as auxiliaries, which is also something to consider.

@jnivre
Contributor
jnivre commented Jan 18, 2017

Can't you use compound:lvc for N+V and compound:svc for V+V?

@vinit-ivar

I might be incorrect, but from what I can understand from #323, compound:svc applies when the actions denoted by two verbs occur sequentially, without explicit conjunctive markers? That wouldn't really apply here as one of the verbs doesn't actually "occur".

@jnivre
Contributor
jnivre commented Jan 19, 2017

No, serial verbs do not refer to sequential events. They are complex predicates that typically refer to single events (although there are many things that have been called serial verbs).
