systematically checking missing plural forms for compounds #65

Open
leoalenc opened this issue Feb 7, 2020 · 6 comments

@leoalenc
Contributor

leoalenc commented Feb 7, 2020

As pointed out in issues #61 and #64, some nouns lack a plural form.
This problem seems to be widespread, e.g. the plural form of the
following N+Adj compound is missing:

~/MorphoBr$ grep -E "[[:space:]]mosca\-morta\+" nouns/*.dict adjectives/*.dict
nouns/nouns.gfl.dict:mosca-morta        mosca-morta+N+F+SG
nouns/nouns.gfl.dict:mosca-morta        mosca-morta+N+M+SG

There is no linguistic reason for this gap; compare the analogous
compound below, which does have plural entries:

~/MorphoBr$ grep -E "[[:space:]]cabeça\-chata\+" nouns/*.dict adjectives/*.dict
nouns/nouns.gfl.dict:cabeça-chata       cabeça-chata+N+F+SG
nouns/nouns.gfl.dict:cabeça-chata       cabeça-chata+N+M+SG
nouns/nouns.gfl.dict:cabeças-chatas     cabeça-chata+N+F+PL
nouns/nouns.gfl.dict:cabeças-chatas     cabeça-chata+N+M+PL

So it would be very useful to automatically check the whole inventory
of nouns and adjectives for missing plural forms in order to fill in these gaps.
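
A minimal sketch of such a check, assuming every relevant line has the shape shown in the grep output above (surface form, whitespace, then `lemma+POS+GENDER+NUMBER`). The function names `parse_entry` and `missing_plurals` are hypothetical, and entries carrying any other tag layout are simply skipped:

```python
from collections import defaultdict


def parse_entry(line):
    """Split 'form<whitespace>lemma+TAG+TAG...' into (form, lemma, tags)."""
    form, analysis = line.split()
    lemma, *tags = analysis.split("+")
    return form, lemma, tuple(tags)


def missing_plurals(lines):
    """Return sorted (lemma, pos, gender) keys that have an SG entry but no PL entry."""
    numbers = defaultdict(set)
    for line in lines:
        if not line.strip():
            continue
        _form, lemma, tags = parse_entry(line)
        # Assumption: only plain POS+GENDER+NUMBER analyses are checked here.
        if len(tags) != 3 or tags[2] not in {"SG", "PL"}:
            continue
        pos, gender, number = tags
        numbers[(lemma, pos, gender)].add(number)
    return sorted(key for key, seen in numbers.items() if seen == {"SG"})


# The six entries quoted in this issue.
SAMPLE = """\
mosca-morta\tmosca-morta+N+F+SG
mosca-morta\tmosca-morta+N+M+SG
cabeça-chata\tcabeça-chata+N+F+SG
cabeça-chata\tcabeça-chata+N+M+SG
cabeças-chatas\tcabeça-chata+N+F+PL
cabeças-chatas\tcabeça-chata+N+M+PL
"""

for key in missing_plurals(SAMPLE.splitlines()):
    print(key)
```

On these six entries only the two `mosca-morta` analyses are reported, since `cabeça-chata` has both numbers for each gender.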

@leoalenc leoalenc added the bug label Feb 7, 2020
@arademaker
Contributor

For N-ADJ compounds we can certainly write a script to check this, using the existing entries for nouns and adjectives. Since mosca-morta is mosca/N plus morta/ADJ, we can look up the plural of mosca and the plural of morta and produce moscas-mortas.

But this strategy may work only for N-ADJ compounds, right? BTW, would it be better to implement it as a transducer or as a script, considering long-term maintenance?
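
As a rough illustration of the script variant, here is a sketch that builds the plural of an N-ADJ compound from the plurals of its two members. The name `compose_plural` and the two lookup tables are hypothetical; the tables are keyed by singular surface form and would in practice be filled from the noun and adjective dictionaries:

```python
def compose_plural(compound, noun_plurals, adj_plurals):
    """Return the plural of a two-member N-ADJ compound, or None if a member is unknown."""
    parts = compound.split("-")
    if len(parts) != 2:
        return None
    noun, adj = parts
    if noun not in noun_plurals or adj not in adj_plurals:
        return None
    # Both members inflect in N-ADJ compounds: mosca-morta -> moscas-mortas.
    return f"{noun_plurals[noun]}-{adj_plurals[adj]}"


# Toy lookup tables (assumption: singular form -> plural form).
NOUN_PLURALS = {"mosca": "moscas", "cabeça": "cabeças"}
ADJ_PLURALS = {"morta": "mortas", "chata": "chatas"}

print(compose_plural("mosca-morta", NOUN_PLURALS, ADJ_PLURALS))
```

Returning `None` when a member is missing from either table keeps the sketch from guessing plurals for compounds that are not N-ADJ.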

@arademaker
Contributor

@leoalenc, in your grep you search both adjectives and nouns, but entries were found only in nouns. Are you expecting them in the adjectives too? Should we add them there as well? That could potentially double the size of the repository. We need a general way to deal with the fact that almost all nouns can also be used as adjectives, right?

@leoalenc
Contributor Author

> @leoalenc, in your grep you search both adjectives and nouns, but entries were found only in nouns. Are you expecting them in the adjectives too? Should we add them there as well? That could potentially double the size of the repository. We need a general way to deal with the fact that almost all nouns can also be used as adjectives, right?

@arademaker, I searched both nouns and adjectives because I had come across other examples that showed analogous errors in diminutive generation and that belong to both classes.

@leoalenc
Contributor Author

leoalenc commented Feb 11, 2020

> For N-ADJ compounds we can certainly write a script to check this, using the existing entries for nouns and adjectives. Since mosca-morta is mosca/N plus morta/ADJ, we can look up the plural of mosca and the plural of morta and produce moscas-mortas.
>
> But this strategy may work only for N-ADJ compounds, right? BTW, would it be better to implement it as a transducer or as a script, considering long-term maintenance?

@arademaker, it seems easier to implement this as a procedural program along the lines of what you sketched. And you are right, N+N compounds behave differently: only the first member inflects. There are several other types of compounds as well. It is a complex issue; for now, it would be enough to list the cases that lack a plural and resolve them manually or with a script along the lines you suggested, depending on the number of cases found.

@leoalenc
Contributor Author

@arademaker, indeed, any adjective can occur in noun position, but that does not imply that it is a noun in the lexicon. Take "o carro vermelho é econômico, o azul consome muita gasolina" ("the red car is economical, the blue one uses a lot of gasoline"). This example does not motivate including azul as a noun in the dictionary: the word carro, mentioned in the preceding context, is understood. In the lexicon, we say that an adjective is also a noun only when it has a special meaning, for example azul as the name of the color, as in "o azul é uma cor tranquilizante" ("blue is a calming color"). In the case of última, mentioned in #57, the Houaiss dictionary does indeed give a special meaning for the word both as a feminine noun and as a masculine noun, in addition to its senses as an adjective.

@arademaker arademaker changed the title from "systematically checking for missing plural forms" to "systematically checking for missing plural forms for compounds" Feb 11, 2020
@arademaker arademaker changed the title from "systematically checking for missing plural forms for compounds" to "systematically checking missing plural forms for compounds" Feb 11, 2020
@arademaker
Contributor

arademaker commented Feb 11, 2020

OK, let's consider the scope of this issue to be just:

> for now, it would be enough to list the cases that lack a plural and resolve them manually or with a script along the lines you suggested, depending on the number of cases found.

We already know that we have ~8K hyphenated compounds, but many may not be N-ADJ. The script should filter the forms with a hyphen, check whether the first word is an N and the second an ADJ, and then look for the plural variant.
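
The three steps could be sketched roughly as follows. This is only an illustration under stated assumptions: the function names are hypothetical, any form listed in the adjectives file counts as an ADJ, and the toy entries below (including the `A` tag on the adjective lines) are made up for the example rather than copied from the repository.

```python
def read_entries(lines):
    """Yield (form, lemma, tags) from 'form<whitespace>lemma+TAG+...' lines."""
    for line in lines:
        if line.strip():
            form, analysis = line.split()
            lemma, *tags = analysis.split("+")
            yield form, lemma, tags


def n_adj_missing_plural(noun_lines, adj_lines):
    """Return hyphenated N-ADJ lemmas that have a singular but no plural entry."""
    nouns = list(read_entries(noun_lines))
    simple_nouns = {f for f, _, tags in nouns if "-" not in f and "SG" in tags}
    adjectives = {f for f, _, tags in read_entries(adj_lines) if "SG" in tags}
    # Step 1: keep hyphenated lemmas, split by whether SG and PL entries exist.
    singular = {lemma for _, lemma, tags in nouns if "-" in lemma and "SG" in tags}
    plural = {lemma for _, lemma, tags in nouns if "PL" in tags}
    missing = []
    for lemma in sorted(singular - plural):
        parts = lemma.split("-")
        # Step 2: first member must be a known noun, second a known adjective.
        if len(parts) == 2 and parts[0] in simple_nouns and parts[1] in adjectives:
            # Step 3: no plural variant was found for this compound.
            missing.append(lemma)
    return missing


# Toy data for illustration only.
NOUNS = """\
mosca\tmosca+N+F+SG
moscas\tmosca+N+F+PL
mosca-morta\tmosca-morta+N+F+SG
mosca-morta\tmosca-morta+N+M+SG
cabeça\tcabeça+N+F+SG
cabeça-chata\tcabeça-chata+N+F+SG
cabeças-chatas\tcabeça-chata+N+F+PL
guarda\tguarda+N+M+SG
guarda-chuva\tguarda-chuva+N+M+SG
""".splitlines()
ADJS = """\
morta\tmorto+A+F+SG
chata\tchato+A+F+SG
""".splitlines()

print(n_adj_missing_plural(NOUNS, ADJS))
```

In this toy data `guarda-chuva` also lacks a plural entry, but it is not reported because its second member is not in the adjective list, so it falls outside the N-ADJ scope agreed here.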
