асфальтны is not analyzed correctly #20

Closed
mansayk opened this issue Jan 16, 2019 · 16 comments
Labels
bug Something isn't working lexc twol

Comments

@mansayk
Member

mansayk commented Jan 16, 2019

According to the Tatar orthographical dictionary, it should be "асфальтны", not "асфальтне":
http://suzlek.antat.ru/words.php?txtW=%D0%B0%D1%81%D1%84%D0%B0%D0%BB%D1%8C%D1%82&submit=%D0%AD%D0%B7%D0%BB%D3%99%D2%AF

echo "асфальтны" | apertium-destxt -n | lt-proc -z -w 'apertium-tat/tat.automorf.bin' | cg-proc -z 'apertium-tat/tat.rlx.bin' | cg-proc -z -w -1 'apertium-tat/dev/mansur.bin' | apertium-retxt
^асфальтны/*асфальтны$

echo "асфальтне" | apertium-destxt -n | lt-proc -z -w 'apertium-tat/tat.automorf.bin' | cg-proc -z 'apertium-tat/tat.rlx.bin' | cg-proc -z -w -1 'apertium-tat/dev/mansur.bin' | apertium-retxt
^асфальтне/асфальт<n><sg><acc>$
mansayk added the bug label Jan 16, 2019
@mansayk
Member Author

mansayk commented Jan 16, 2019

The same thing here:

echo "ательены" | apertium-destxt -n | lt-proc -z -w 'apertium-tat/tat.automorf.bin' | cg-proc -z 'apertium-tat/tat.rlx.bin' | cg-proc -z -w -1 'apertium-tat/dev/mansur.bin' | apertium-retxt
^ательены/*ательены$

echo "ательене" | apertium-destxt -n | lt-proc -z -w 'apertium-tat/tat.automorf.bin' | cg-proc -z 'apertium-tat/tat.rlx.bin' | cg-proc -z -w -1 'apertium-tat/dev/mansur.bin' | apertium-retxt
^ательене/ателье<n><sg><acc>$
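
The outputs above use the Apertium stream format, where each lexical unit is written as `^surface/analysis$` and an analysis starting with `*` means the form was not recognised. As a rough illustration only, the following Python sketch picks out the unrecognised forms from such output. The function names are hypothetical and not part of Apertium, and the sketch assumes the output contains no escaped `/`, `^`, or `$` characters.

```python
import re

# One lexical unit in Apertium stream format: ^surface/analysis1/analysis2$
# Assumption: no escaped "/", "^" or "$" inside the unit.
LEXICAL_UNIT = re.compile(r"\^([^/$]+)/([^$]*)\$")


def parse_units(stream):
    """Return (surface, analyses) pairs for each lexical unit in the stream."""
    units = []
    for surface, rest in LEXICAL_UNIT.findall(stream):
        units.append((surface, rest.split("/")))
    return units


def unknown_surfaces(stream):
    """List surface forms whose analyses all carry the '*' unknown marker."""
    return [
        surface
        for surface, analyses in parse_units(stream)
        if all(a.startswith("*") for a in analyses)
    ]


# Outputs copied from the commands in this issue.
output = "\n".join([
    "^асфальтны/*асфальтны$",
    "^асфальтне/асфальт<n><sg><acc>$",
    "^ательены/*ательены$",
    "^ательене/ателье<n><sg><acc>$",
])

print(unknown_surfaces(output))
```

Run on the four outputs above, this reports the two forms that the orthographical dictionary treats as correct but the analyser does not yet know.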

@jonorthwash
Member

According to Tatar orthographical dictionary it should be "асфальтны", not "асфальтне":

So we should definitely generate асфальтны, but should we analyse both forms? That is, is асфальтне attested commonly enough?

(Btw, the dictionary link doesn't show any relevant information when I click on it.)

@jonorthwash
Member

Also, can you confirm how nouns that end in ль behave, like роль, руль, автомобиль? What about words that end in бль, like рубль, ансамбль, etc.?

@mansayk
Member Author

mansayk commented Jan 17, 2019

but should we analyse both forms? That is, is асфальтне attested commonly enough?

Some people may of course write "асфальтне", but that is a spelling mistake. If we analyze both forms, then it will also affect Apertium's spellchecker.

That said, the spellchecker already doesn't work as expected because the dictionary contains many archaic and dialect words. That's why I think we should add some 'Orth' tag for "good" words in the dictionary and the spellchecker would use only them...

Maybe here we should analyze both forms but add an additional tag meaning that the form is not orthographically correct. If I remember correctly, @IlnarSelimcan has already used one a couple of times...

@mansayk
Member Author

mansayk commented Jan 17, 2019

Also, can you confirm how nouns that end in ль behave, like роль, руль, автомобиль? What about words that end in бль, like рубль, ансамбль, etc.?

Most of them have affixes with front vowels, but there might be exceptions. For example, correct ones:
рольдән
рульдән
автомобильдән
ансамбльдән
but
акропольдан (I don't know why, but http://suzlek.antat.ru/words.php?txtW=%D0%B0%D0%BA%D1%80%D0%BE%D0%BF%D0%BE%D0%BB%D1%8C&submit=%D0%AD%D0%B7%D0%BB%D3%99%D2%AF)
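
The pattern described above (front-vowel affixes by default for stems ending in "ль", with listed exceptions taking back vowels) can be sketched as a default plus an exception set. This is only an illustration in Python, not how the lexc/twol sources encode it. The default-to-front rule is an assumption based on "most of them" in this comment, and the names `BACK_VOWEL_EXCEPTIONS` and `ablative` are hypothetical.

```python
# Stems this comment reports as taking back-vowel affixes despite ending in "ль".
BACK_VOWEL_EXCEPTIONS = {"акрополь"}


def ablative(stem):
    """Attach the ablative suffix (-дән front / -дан back) to a stem ending in "ль".

    Assumption: unlisted stems default to the front-vowel suffix.
    """
    if not stem.endswith("ль"):
        raise ValueError("this sketch only covers stems ending in 'ль'")
    suffix = "дан" if stem in BACK_VOWEL_EXCEPTIONS else "дән"
    return stem + suffix


# Only the stems and forms listed in this comment.
for stem in ["роль", "руль", "автомобиль", "ансамбль", "акрополь"]:
    print(stem, "->", ablative(stem))
```

Further exceptions would need to be added to the set by hand as they are confirmed against the dictionary.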

@mansayk
Member Author

mansayk commented Jan 17, 2019

And some more:
фасоль, фасолена
декольте, декольтесы
кольт, кольты
вольт, вольты

@jonorthwash
Member

The dictionary urls aren't giving me any information of the sort you seem to be describing:
[Screenshot from 2019-01-17 23-30-42]

@jonorthwash
Member

^ательены/*ательены$

Do Russian words ending in ‹е› generally take back vowel endings? That is, is this part of a larger pattern, or is it an exception?

jonorthwash added a commit that referenced this issue Jan 18, 2019
@jonorthwash
Member

Related issue: we have the lexicon set up to do both ноябрьдә and ноябрьда. Which is correct?

@jonorthwash
Member

Also, is it январенда or январендә? Once I got фасоленда working, январендә is now being produced as январенда. I'll hack it to only work with оль words for now, but this will need to be investigated.

jonorthwash added a commit that referenced this issue Jan 18, 2019
@jonorthwash
Member

I think we should add some 'Orth' tag for "good" words in the dictionary and spellchecker would use only them...

Actually, we do the reverse. We add a tag <err_orth> for words that are attested but are considered orthographic errors, and we just automatically remove them when we generate the spell checker. So what we want (and as of eb360c7 now get) is the following:

$ echo "асфальтны" | apertium -d . tat-morph
^асфальтны/асфальт<n><acc>$^./.<sent>$

$ echo "асфальтне" | apertium -d . tat-morph
^асфальтне/асфальт<n><acc><err_orth>$^./.<sent>$

Have a look at the commit—with knowledge of how the word-class categorisation works, it's pretty simple to do for many words.
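
To illustrate the effect of the `<err_orth>` tag on the two outputs above, here is a minimal Python sketch. It is not how the spell checker is actually built (the tagged paths are removed when the speller is generated). It only shows which analysed forms would still count as acceptable once `<err_orth>` analyses are discarded. The function names are hypothetical, and the sketch assumes no escaped `/` inside a lexical unit.

```python
ERR_ORTH = "<err_orth>"


def analyses_of(unit):
    """Split one lexical unit such as ^surface/analysis$ into surface and analyses."""
    body = unit.strip()
    if not (body.startswith("^") and body.endswith("$")):
        raise ValueError("not a lexical unit: %r" % unit)
    surface, *analyses = body[1:-1].split("/")
    return surface, analyses


def accepted_by_speller(unit):
    """True if at least one analysis is known and not tagged <err_orth>."""
    _, analyses = analyses_of(unit)
    return any(
        not a.startswith("*") and ERR_ORTH not in a
        for a in analyses
    )


# Lexical units taken from the tat-morph outputs above.
print(accepted_by_speller("^асфальтны/асфальт<n><acc>$"))
print(accepted_by_speller("^асфальтне/асфальт<n><acc><err_orth>$"))
```

The first call prints `True` and the second prints `False`: the analyser recognises both spellings, but only the orthographically correct one survives the filter.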

@mansayk
Member Author

mansayk commented Jan 19, 2019

"Акрополь" is strange. You can search for that word here:
http://suzlek.antat.ru
And it finds it.

@mansayk
Member Author

mansayk commented Jan 19, 2019

According to the aforementioned website, the correct form is "ноябрьдә".

@mansayk
Member Author

mansayk commented Jan 19, 2019

It also says that the correct form is "январенда".

@mansayk
Member Author

mansayk commented Jan 19, 2019

"фасоль"

  • "фасолена" is correct according to the orthographical dictionary.
  • "фасольгә" is correct according to the explanatory dictionary.

So it turns out that both of them can be treated as correct?

@mansayk
Member Author

mansayk commented Jan 19, 2019

Do Russian words ending in ‹е› generally take back vowel endings? That is, is this part of a larger pattern, or is it an exception?

I cannot say for certain right now, but I think you are right. All the words that came to my mind take endings with back vowels: ришельесы, ательесы, льесы, подпольесы.

TinoDidriksen pushed a commit that referenced this issue Feb 1, 2019
TinoDidriksen pushed a commit that referenced this issue Feb 1, 2019
@mansayk mansayk closed this as completed Feb 15, 2019