Possible inflection errors #2

Closed
ryskina opened this issue Feb 2, 2021 · 7 comments
Comments

@ryskina

ryskina commented Feb 2, 2021

I've been comparing Apertium-generated paradigms with the ones in the Iskhakov & Pal'mbakh (1961) grammar book (Ф. Г. Исхаков, А. А. Пальмбах. Грамматика тувинского языка: Фонетика и морфология.) and found some mismatches.
Disclaimer: I am not a speaker of Tuvan.

  1. Some Apertium-generated imperative forms for кел:

    келеалыңар:кел<v><iv><imp><p1><pl>
    келейн:кел<v><iv><imp><p1><sg>
    келеалы:кел<v><iv><imp><p1><du>
    

    The I&P book has келиилиңер, келийн, and келиили, respectively (pp. 391-392).

  2. Some <p3><pl> forms have a double -лер. I haven't seen this in the literature and it looked suspicious.

    келдилер:кел<v><iv><ifi><p3><pl>
    келдилерлер:кел<v><iv><ifi><p3><pl>
    

    I&P has келдилер for this analysis (I&P 365), and Harrison, 2000 has keldi(ler). The same pattern in other tenses:

    келгеннер:кел<v><iv><ger_past><nom>+э<cop><aor><p3><pl>
    келгеннерлер:кел<v><iv><ger_past><nom>+э<cop><aor><p3><pl>
    келгендирлер:кел<v><iv><ger_past><nom>+э<cop><aor><evid><p3><pl>
    келгендирлерлер:кел<v><iv><ger_past><nom>+э<cop><aor><evid><p3><pl>
    ...
    
@ryskina
Author

ryskina commented Feb 2, 2021

Update on 1: the analyzer inflects номчу and ал incorrectly:

номчуайн/номчу<v><tv><imp><p1><sg>
алайн/ал<v><tv><imp><p1><sg>

I&P have номчууйн and алыйн.

jonorthwash added a commit that referenced this issue Feb 3, 2021
@jonorthwash
Member

2. Some <p3><pl> forms have a double -лер. I haven't seen this in the literature and it looked suspicious.

    келдилер:кел<v><iv><ifi><p3><pl>
    келдилерлер:кел<v><iv><ifi><p3><pl>

This is an alternative form in the transducer, available for analysis only. If you don't want to see it, you should use the generator, not the analyser.

@ryskina
Author

ryskina commented Feb 3, 2021

Thanks! I've generated the forms using

echo "[ к е л  %<v%> [ ? - [ %+ | %<subst%> ] ]* ]" | hfst-regexp2fst -o prefix.hfst
hfst-invert .deps/tyv.LR.hfst | hfst-compose-intersect -1 - -2 prefix.hfst | hfst-fst2strings 

Is there a different approach I should use that would exclude analysis-only forms?

@ftyers
Member

ftyers commented Feb 3, 2021

You can try using .deps/tyv.RL.hfst if it exists.

@ryskina
Author

ryskina commented Feb 3, 2021

Sorry, do you mean using it without the hfst-invert?

@jonorthwash
Member

jonorthwash commented Feb 3, 2021

I think @ftyers means to use the RL file instead of the LR file with hfst-invert, yes. In the RL file, all the forms that we would normally want to analyse but not generate have been removed; it constitutes a stage in the creation of the generator.
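
As a sketch of this suggestion: the pipeline posted earlier in the thread can be wrapped so that the transducer path is a parameter, which makes it easy to substitute `.deps/tyv.RL.hfst` for `.deps/tyv.LR.hfst`. The function name `list_forms` and the default `prefix.hfst` argument are illustrative assumptions; the HFST commands and flags are copied unchanged from the command posted above.

```shell
# Hypothetical wrapper around the pipeline from earlier in the thread.
#   $1: transducer to invert (e.g. .deps/tyv.RL.hfst)
#   $2: prefix filter built with hfst-regexp2fst (defaults to prefix.hfst)
list_forms() {
  transducer="$1"
  prefix="${2:-prefix.hfst}"
  hfst-invert "$transducer" \
    | hfst-compose-intersect -1 - -2 "$prefix" \
    | hfst-fst2strings
}
```

Assuming the RL file exists in the build directory, `list_forms .deps/tyv.RL.hfst` should list only forms available for generation, while `list_forms .deps/tyv.LR.hfst` reproduces the original listing, analysis-only variants included.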

jonorthwash added a commit that referenced this issue Feb 3, 2021
additional forms found from corpus
@ryskina
Author

ryskina commented Feb 4, 2021

Thank you! Sorry, I misread @ftyers's comment. The forms are correct now, and changing from LR to RL was helpful!
