lemma of "comparatives" #28

livyreal · 2016-10-12T13:35:45Z

"maior", "melhor" and "ótimo" have the lemmas "maior", "melhor" and "ótimo" in the Freeling dictionary. We discussed this before (@fcbr, @arademaker and I) and decided that, since these adjectives are not always used as comparatives (especially "ótimo"), a uniform treatment would be the better choice, so the best way to tag them is to keep the words themselves as lemmas.

In Bosque [1, 2] (whose annotation comes from PALAVRAS), the lemmas of these words are "grande", "bom" and "bom".

The same applies to "péssimo", "pior" and "menor" (except "péssimo", whose Freeling lemma is "mau"; that is something I should have corrected before but didn't).

What solution should we take now?

I prefer to keep the words themselves as lemmas: maior/maior, ótimo/ótimo, etc. Mainly because I do not think these relations really exist nowadays; they are something from Latin grammar that survived in traditional grammar. It seems to me that "horrível", "terrível" and "horroroso" are as related to "mau" as "péssimo" is.

For reference:

[1] Bosque 7.5, Universal Dependencies, file bosque_CP.udep.conll.gz
[2] Bosque 7.5, Universal Dependencies, file bosque_CF.udep.conll.gz
[3] Bosque 7.3, converted by Dan Zeman, available at http://github.com/UniversalDependencies/UD_Portuguese
[4] Linguateca version of Bosque CoNLL (7.3), http://www.linguateca.pt/floresta/CoNLL-X/

vcvpaiva · 2016-10-13T22:49:22Z

I disagree, @livyreal. If we lose the information that "melhor" is the comparative of "bom", we will have difficulties when dealing with comparatives. We have not done anything of this sort yet, but throwing away this kind of information is not sensible. (I am not saying we always need to turn things into comparatives, hence the two possibilities, but losing the information is no good.)

claudiafreitas · 2016-10-14T00:12:05Z

@vcvpaiva, I get your point, but two lemmas is weird.
And since comparatives in PT are limited to these few cases, it seems easier (to me) to do some kind of post-editing, no?
On the other hand, the changes proposed by Livy (and I agree with them) imply changes to the original PALAVRAS annotation. Is that what we want (now)? (It is not clear to me whether this issue concerns only the Freeling conversion or also the UD material preparation.)

vcvpaiva · 2016-10-14T00:44:18Z

@claudiafreitas post-processing of some sort would work for me, but just changing the lemma seems to me the wrong thing to do. I don't know the size of the problem, as I haven't done much work on it, but comparatives are logically important and difficult!
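The post-processing idea could look roughly like this: the corpus keeps the surface form as lemma, and the positive adjective is recovered from a small table only when an application needs it. A minimal sketch; the table is assumed from the pairs cited in this thread, and `SUPPLETIVE_BASE`/`base_adjective` are hypothetical names.

```python
# Hypothetical post-processing table (assumed from the pairs cited in this
# thread): suppletive comparative/superlative lemmas -> positive adjective.
SUPPLETIVE_BASE = {
    "melhor": "bom",
    "ótimo": "bom",
    "pior": "mau",
    "péssimo": "mau",
    "maior": "grande",
    "menor": "pequeno",
}


def base_adjective(lemma: str) -> str:
    """Return the positive adjective for a suppletive lemma; other lemmas are returned unchanged."""
    return SUPPLETIVE_BASE.get(lemma.lower(), lemma)


print(base_adjective("melhor"))    # bom
print(base_adjective("horrível"))  # horrível (no lexical link is assumed)
```

Because the lookup runs on lemmas, plural forms are covered as long as "melhores" is lemmatized to "melhor".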

livyreal · 2016-10-14T01:57:54Z

@claudiafreitas we are now focused on correcting the UD 7.5 version, so yes, it implies changes to the PALAVRAS annotation. Dealing with it in Freeling is much easier: we just need to change the dictionary. But we are postponing that work for now and focusing on what the best possible Bosque version would be. After that, we will think about the Freeling conversion. These are two different tasks, and the choices made for each could differ. So let's think about Bosque. What do we prefer to have in Bosque?

I agree with @vcvpaiva that having this information encoded is (logically) important, but my view is closer to Claudia's: words such as "melhor" and "pior" are more often used as simple adjectives.

Maybe the better way to deal with comparatives in Portuguese would be to look at the syntactic structures they appear in ("melhor do que", "pior que", etc.). We hope (and this is why we are putting so much effort into this project) that a lot of information, including semantic information, can emerge from dependency relations. And I think we only have comparative adjectives in Portuguese when some syntactic structure produces them (maybe that is a very strong statement, but it is my feeling now). This differs from Latin (or even English), where this information is encoded in the lexical form itself.
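A rough way to test this idea is to look for the construction rather than the word: a comparative trigger followed shortly by "que" (which also covers "do que"). The sketch below is a surface heuristic over plain tokens, not a dependency-based analysis; the trigger lists, the `window` size and `find_comparatives` are assumptions for illustration, and a relative clause such as "o melhor jogador que conheço" would be a false positive that dependency relations could rule out.

```python
from typing import List, Tuple

# Assumed trigger lists (not exhaustive): the suppletive forms and the
# analytic "mais"/"menos" + adjective pattern.
SUPPLETIVE = {"melhor", "melhores", "pior", "piores", "maior", "maiores", "menor", "menores"}
ANALYTIC = {"mais", "menos"}


def find_comparatives(tokens: List[str], window: int = 3) -> List[Tuple[int, int]]:
    """Return (trigger_index, que_index) pairs where a trigger word is followed
    by "que" (which also covers "do que") within `window` tokens."""
    lowered = [t.lower() for t in tokens]
    hits = []
    for i, tok in enumerate(lowered):
        if tok not in SUPPLETIVE and tok not in ANALYTIC:
            continue
        for j in range(i + 1, min(i + 1 + window, len(lowered))):
            if lowered[j] == "que":
                hits.append((i, j))
                break
    return hits


print(find_comparatives("Ele é melhor do que o irmão".split()))  # [(2, 4)]
print(find_comparatives("Ela é mais alta que eu".split()))       # [(2, 4)]
print(find_comparatives("Ele é o melhor !".split()))             # []
```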

Let's see some examples from English:

He is the best!
He is the better...

We have a single lexical item in PT, melhor. When we want to say "better" in Portuguese, we need a syntactic structure for it (or contextual information! I love context and you know I'm a big fan of DRT, but let's consider the stage we are at in NLP, not in linguistic theory :( ), so the comparative information is not in the lexical item itself. What do you think? Am I going too far with my lexical semantics? Valeria, what can you add to this English discussion? You surely know much more about it than I do.

vcvpaiva · 2016-10-14T22:12:25Z

@livyreal "He is the better teacher/player/musician": you need to be able to infer that two teachers/players/musicians are being talked about, right?
The same holds for "ele é o melhor professor/jogador/músico": there are two of them (at least; there could be more), both are good, but one is better. If "melhor" is simply another word, unrelated to "bom", you lose all this information.
OTOH, if you know it's the comparative of "bom", you can simply ignore that, if you prefer.

arademaker · 2016-10-15T17:40:43Z

I think these discussions are somewhat scattered. I think a form can be lemmatized to another one as long as the information is preserved in the POS tag and features. Indeed, we cannot lose information.

@vcvpaiva I don't think 'both' is an option: we are annotating a corpus, and a token cannot have two annotations. Whatever we decide for the corpus will later need to be reflected in how we want the tools to behave. Having two (lemma, POS tag) pairs for the same form in the Freeling dict is possible, if necessary, and in that case the context will help pick the best option.

@claudiafreitas the issue here is the annotation, which is our priority. As for information loss, if 'melhor' is lemmatized to 'bom' but the information is transferred to the POS tag and features, I see no problem.
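To make "the information is transferred to the features" concrete, here is a minimal sketch over a single CoNLL-U token line: LEMMA becomes the positive adjective and FEATS gains a Degree value. Using Degree=Cmp is an assumption (it is the general UD value for comparatives, and whether Portuguese should use it is exactly what this thread debates); the table and function names are hypothetical.

```python
# Hypothetical form -> positive-adjective table for the comparative forms.
COMPARATIVE_BASE = {
    "melhor": "bom", "melhores": "bom",
    "pior": "mau", "piores": "mau",
    "maior": "grande", "maiores": "grande",
    "menor": "pequeno", "menores": "pequeno",
}


def add_feature(feats: str, name: str, value: str) -> str:
    """Add name=value to a CoNLL-U FEATS string, keeping features sorted by name."""
    pairs = {} if feats == "_" else dict(f.split("=", 1) for f in feats.split("|"))
    pairs[name] = value
    return "|".join(f"{k}={v}" for k, v in sorted(pairs.items(), key=lambda kv: kv[0].lower()))


def lemmatize_with_trace(row: str) -> str:
    """For an ADJ token line, set LEMMA to the positive adjective and record Degree=Cmp in FEATS."""
    cols = row.split("\t")
    form = cols[1].lower()
    if cols[3] == "ADJ" and form in COMPARATIVE_BASE:
        cols[2] = COMPARATIVE_BASE[form]
        cols[5] = add_feature(cols[5], "Degree", "Cmp")
    return "\t".join(cols)


row = "5\tmelhor\tmelhor\tADJ\t_\tGender=Fem|Number=Sing\t3\tamod\t_\t_"
# LEMMA becomes "bom"; FEATS becomes "Degree=Cmp|Gender=Fem|Number=Sing".
print(lemmatize_with_trace(row))
```

Nothing is lost this way: the surface form stays in FORM, and a later step can undo the mapping by reading Degree.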

@livyreal I imagine the change in PALAVRAS and in Freeling is the same: the lexicons of both will have to be changed. I may be wrong, since I don't know the details of the PALAVRAS implementation, but I think lemmatization and POS tagging are driven by a lexicon in both. What differs is how they disambiguate. That is, one step is for the system to retrieve the possible POS and lemma for each form; the next is to decide which option is best in context.

Quoting @livyreal: "Those words, as 'melhor', 'pior', are more often used as simple adjectives."

In that case, are you saying that we can mark these as superlative/comparative adjectives, with appropriate lemmas, when that is the case?

arademaker · 2016-10-15T18:15:58Z

@livyreal I've just gone back to @EckhardBick's thesis! It seems to me (see page 9 of his book) that the dictionary lookup process is quite similar to Freeling's. For a given form, the dictionary contains the inflection and affix rules, and the PALMORF module is responsible for retrieving from the dictionary the candidate (POS, lemma) pairs for each word form. It is a bit more complicated, because other things must be considered, such as multiword expressions, but the basic idea is that the dictionary contains the candidate pairs.
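The two-step design described above (lexicon lookup returns every candidate pair, then context picks one) can be sketched as follows. This is a toy illustration, not Freeling's or PALAVRAS's actual data or API: `LEXICON`, `candidates`, `disambiguate` and the context rule are all assumptions.

```python
from typing import Dict, List, Optional, Tuple

# Toy lexicon (illustrative only): each form lists every (lemma, UPOS) pair it can receive.
LEXICON: Dict[str, List[Tuple[str, str]]] = {
    "melhor": [("melhor", "ADJ"), ("melhor", "ADV")],
    "bem": [("bem", "ADV"), ("bem", "NOUN")],
}


def candidates(form: str) -> List[Tuple[str, str]]:
    """Step 1: lexicon lookup, returning all candidate (lemma, UPOS) pairs."""
    return LEXICON.get(form.lower(), [])


def disambiguate(form: str, next_upos: Optional[str]) -> Optional[Tuple[str, str]]:
    """Step 2: pick one candidate from context. Toy rule (assumption):
    prefer ADJ before a noun, ADV elsewhere, falling back to the first candidate."""
    cands = candidates(form)
    if not cands:
        return None
    preferred = "ADJ" if next_upos == "NOUN" else "ADV"
    for lemma, upos in cands:
        if upos == preferred:
            return lemma, upos
    return cands[0]


print(disambiguate("melhor", "NOUN"))   # ('melhor', 'ADJ'), as in "o melhor professor"
print(disambiguate("melhor", "PUNCT"))  # ('melhor', 'ADV'), as in "canta melhor."
```

Changing a lemma decision then means editing the lexicon entries in step 1, which is why the change would look the same in both systems even though their step 2 differs.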

livyreal · 2016-10-27T20:45:47Z

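One more case:

superiores alto

in "Eles tiveram que fazer isso se os escravos fossem superiores em qualidades que os próprios brancos valorizavam, onde estaria a justificativa moral para mantê-los escravizados?" (ref="CF27-4"; roughly: "They had to do this: if the slaves were superior in qualities that the whites themselves valued, where would the moral justification for keeping them enslaved be?")

It does not seem that "superiores" is the comparative of "alto".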


arademaker · 2021-09-08T19:34:34Z

awk -F'\t' '$2 ~ /^(maior|melhor|menor)$/ {print $2,$3,$4,$6}' *.conllu | sort | uniq -c
80 maior grande ADJ Gender=Fem|Number=Sing
46 maior grande ADJ Gender=Masc|Number=Sing
 2 maior grande NOUN Gender=Fem|Number=Sing
 1 maior grande NOUN Gender=Masc|Number=Sing
 1 maior maior ADJ Gender=Fem|Number=Sing
 2 melhor bem ADV _
24 melhor bom ADJ Gender=Fem|Number=Sing
32 melhor bom ADJ Gender=Masc|Number=Sing
 4 melhor bom ADJ Number=Sing
 3 melhor bom ADV _
 1 melhor bom NOUN Gender=Fem|Number=Sing
 4 melhor bom NOUN Gender=Masc|Number=Sing
 2 melhor melhor ADJ Gender=Masc|Number=Sing
 1 melhor melhor ADJ _
13 melhor melhor ADV _
 1 melhor melhor NOUN Gender=Masc|Number=Sing
12 menor pequeno ADJ Gender=Fem|Number=Sing
 9 menor pequeno ADJ Gender=Masc|Number=Sing

Related to #219. In any case, I think lemmatizing maior to grande should necessarily leave some trace in the features. But I really believe there should be two different entries, as in LR-POR/MorphoBr#88, so the lemmas should be melhor/melhor, maior/maior, etc.

(Copied from #304.)

arademaker · 2023-02-25T00:36:56Z

In UniversalDependencies/docs#889, I agreed to treat comparatives in PT as marked by syntactic processes only. Documentation at https://universaldependencies.org/pt/feat/Degree.html. Basically, Degree now only has the values Abs, Dim and Aug.

This decision is consistent with the discussion above about disregarding the anomalous comparatives of superiority (Nova Gramática do português contemporâneo, Cunha and Cintra, 7th edition, page 274). The lemmas are therefore:

melhor/melhor
pior/pior
maior/maior
menor/menor
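Applying these lemmas to existing files could be done line by line, as in the sketch below. Assumptions: plural forms ("maiores", "melhores", ...) map to the singular lemma, and only ADJ tokens are touched by default; the awk counts above also show NOUN and ADV tokens, which can be included by passing a wider `upos_filter`. All names are hypothetical.

```python
# Target lemmas from the decision above; plural forms are mapped to the
# singular (assumption: the same rule covers "melhores", "piores", etc.).
TARGET_LEMMA = {
    "melhor": "melhor", "melhores": "melhor",
    "pior": "pior", "piores": "pior",
    "maior": "maior", "maiores": "maior",
    "menor": "menor", "menores": "menor",
}


def normalize_line(line: str, upos_filter: tuple = ("ADJ",)) -> str:
    """Rewrite LEMMA on one CoNLL-U token line; other lines pass through unchanged."""
    if not line.strip() or line.startswith("#"):
        return line
    cols = line.rstrip("\n").split("\t")
    # Skip malformed lines, multiword-token ranges ("1-2") and empty nodes ("1.1").
    if len(cols) != 10 or not cols[0].isdigit():
        return line
    if cols[3] in upos_filter and cols[1].lower() in TARGET_LEMMA:
        cols[2] = TARGET_LEMMA[cols[1].lower()]
    return "\t".join(cols) + ("\n" if line.endswith("\n") else "")


def normalize_lines(lines):
    """Apply normalize_line to every line of a CoNLL-U document."""
    return [normalize_line(line) for line in lines]


sample = ["# sent_id = example\n",
          "3\tmaiores\tgrande\tADJ\t_\tGender=Fem|Number=Plur\t4\tamod\t_\t_\n"]
print("".join(normalize_lines(sample)), end="")
```

For a real file, `normalize_lines(open(path, encoding="utf-8"))` returns the rewritten lines, which can then be written back.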

But what should we do about the anomalous absolute superlatives: ótimo, péssimo, máximo, mínimo? For these, we can still use Degree=Abs. Of these, ótimo is the only one whose lemma I have not yet normalized to the form itself (ótimo), and today these are the only cases marked with Degree=Abs. We also still have several synthetic absolute superlatives that are not marked and lemmatized correctly:

documents/CP0299.conllu
57	gravíssimo	gravíssimo	ADJ	ADJ|M|S|@>N	Gender=Masc|Number=Sing	58	amod	_	_
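Cases like the one above can be listed for manual review with a suffix check. This is a heuristic sketch, not a lemmatizer: it only flags ADJ tokens ending in -íssimo/-íssima(s) that lack Degree=Abs, misses irregular forms such as "paupérrimo" or "facílimo", and assumes NFC-normalized text. The function name is hypothetical.

```python
import re

# Heuristic (assumption): synthetic absolute superlatives ending in -íssimo,
# -íssima, -íssimos, -íssimas; irregular forms are not caught.
# Assumes NFC-normalized text (precomposed "í").
SUPERLATIVE_SUFFIX = re.compile(r"íssim[oa]s?$", re.IGNORECASE)


def unmarked_superlatives(conllu_lines):
    """Yield (id, form, lemma) for ADJ tokens with an -íssimo form but no Degree=Abs."""
    for line in conllu_lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 10 or cols[3] != "ADJ":
            continue
        feats = set() if cols[5] == "_" else set(cols[5].split("|"))
        if SUPERLATIVE_SUFFIX.search(cols[1]) and "Degree=Abs" not in feats:
            yield cols[0], cols[1], cols[2]


line = "57\tgravíssimo\tgravíssimo\tADJ\tADJ|M|S|@>N\tGender=Masc|Number=Sing\t58\tamod\t_\t_"
print(list(unmarked_superlatives([line])))  # [('57', 'gravíssimo', 'gravíssimo')]
```

Choosing the base lemma (here presumably "grave") is left to a human, since the stem changes are irregular.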

I also took the opportunity to remove entries from LR-POR/MorphoBr@e5b6aef.

Also related to errors introduced by the validation change, given that Cmp was removed as a value of Degree.
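Before running the official UD validator (validate.py in the UD tools repository), leftover Degree=Cmp values can be found with a quick pre-check like the one below. The allowed set is taken from the Portuguese Degree documentation cited above; the function name is hypothetical, and this does not replace the validator.

```python
# Degree values allowed for Portuguese, per
# https://universaldependencies.org/pt/feat/Degree.html (as stated above).
ALLOWED_DEGREE = {"Abs", "Dim", "Aug"}


def invalid_degree(conllu_lines):
    """Yield (line_number, form, value) for tokens whose Degree value is not allowed."""
    for n, line in enumerate(conllu_lines, start=1):
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 10 or cols[5] == "_":
            continue
        for feat in cols[5].split("|"):
            name, _, value = feat.partition("=")
            if name == "Degree" and value not in ALLOWED_DEGREE:
                yield n, cols[1], value


lines = [
    "# sent_id = example\n",
    "1\tmelhor\tbom\tADJ\t_\tDegree=Cmp|Gender=Fem|Number=Sing\t0\troot\t_\t_\n",
    "2\tótimo\tótimo\tADJ\t_\tDegree=Abs|Gender=Masc|Number=Sing\t1\tamod\t_\t_\n",
]
print(list(invalid_degree(lines)))  # [(2, 'melhor', 'Cmp')]
```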

livyreal added freeling labels Oct 12, 2016

arademaker pushed a commit that referenced this issue Aug 27, 2021

Merge pull request #28 from wellington36/workbench

210250b

Workbench

wellington36 mentioned this issue Sep 8, 2021

melhor/bom ... #304 (closed)

arademaker removed the bosque-ud label Oct 18, 2021

arademaker added this to the release 2.9 milestone Oct 18, 2021

arademaker removed the freeling label Oct 18, 2021

arademaker modified the milestones: release 2.9, release 2.10 Nov 4, 2021

arademaker mentioned this issue Nov 4, 2021

revisão dos lemmas (lemma review) #384 (open, 6 tasks)

UniversalDependencies deleted a comment from vcvpaiva Feb 25, 2023

UniversalDependencies deleted a comment from claudiafreitas Feb 25, 2023

UniversalDependencies deleted a comment from vcvpaiva Feb 25, 2023

UniversalDependencies deleted a comment from claudiafreitas Feb 25, 2023

arademaker added a commit that referenced this issue Feb 25, 2023

related to #28

7f5651e

