OOM in sme-smj, loops through same rules over and over again (not sure if it ever ends) #97

Open
unhammer opened this issue Jan 2, 2024 · 3 comments

@unhammer
Member

unhammer commented Jan 2, 2024

$ echo '– Lea stáhtahálddašeaddji rolla sihkkarastit ahte boazodoallit,
báikkálaš ja regionálalaš eiseválddit gulahallet ja lea maid
stáhtahálddašeaddji bargu oahpahit aktevrraide boazodoalo
areáladárbbu. Departemeanta lea 2021 vuosttaš jahkebeale gárveme
sierra bagadallama boazodoalu ja plána- ja huksenlobi birra mii galgá
nannet boazodolliid plána- ja huksenlobi gelbbolašvuođa ja mii galgá
nannet fylkkagielddaid ja gielddaid gelbbolašvuođa boazodoalus ja
boazodoallovuoigatvuođain, lohká Skogan.'  | apertium -d . sme-smj_rtx

hangs.

Or, feeding input-to-rtx.txt to rtx-proc directly (since giella-smj doesn't have updated packages to build with):

$ cat input-to-rtx.txt | rtx-proc --anaphora sme-smj.rtx.bin
^–<punct>$ ^Liehket<vblex><indic><pres><p3><sg>$ ^stáhttaháldadiddje<n><nomag><sg><gen>$ ^roalla<n><sg><nom>$ ^sihkarasstet<vblex><inf>$ ^jut<cnjsub>$ ^ælloniehkke<n><pl><nom>$^,<cm>$
^bájkálasj<adj><attr>$ ^ja<cnjcoo>$ ^regiåvnålasj<adj><attr>$ ^oajválasj<n><pl><nom>$ ^guládallat<vblex><indic><pres><p3><pl>$ ^ja<cnjcoo>$ ^liehket<vblex><indic><pres><p3><sg>$ ^stáhttaháldadiddje<n><nomag><sg><nom>$
^aj<adv>$ ^barggo<n><sg><nom>$ ^åhpadit<vblex><supn>$ ^akterra<n><pl><ill>$ ^ællosujtto<n><sg><gen>$
^areálla<n><cmp_sgnom><cmp>+dárbbo<n><sg><acc>$

and then it hangs.

With --rules we see it go through the same rules over and over again.
(Could some sort of per-sentence memoisation / dynamic programming be useful?)

@unhammer unhammer changed the title OOM in sme-smj OOM in sme-smj, loops through same rules over and over again (not sure if it ever ends) Jan 2, 2024
@mr-martian
Collaborator

I had previously concluded that caching was impossible because of shared state (global variables, destructive updates, etc.), but now I think it might be possible for the compiler to flag which rules access or update that state; then, at runtime, everything else could be cached once the input reaches some threshold.
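A minimal sketch of that idea in Python, to make the proposal concrete. This is not how rtx-proc or the rtx compiler work today: `Rule`, `touches_shared_state`, `RuleRunner`, and `cache_threshold` are all hypothetical names invented for illustration. Rules flagged as touching shared state always run; everything else is cached on (rule name, input nodes) once the input is long enough.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

Nodes = Tuple[str, ...]


@dataclass(frozen=True)
class Rule:
    name: str
    apply: Callable[[Nodes], Nodes]
    # Hypothetical flag: assumed to be set by the compiler for rules that
    # read or write global variables or otherwise update shared state.
    touches_shared_state: bool


class RuleRunner:
    def __init__(self, cache_threshold: int = 0) -> None:
        # Only cache when the input has at least this many tokens.
        self.cache_threshold = cache_threshold
        self.cache: Dict[Tuple[str, Nodes], Nodes] = {}
        self.executions = 0  # how many times a rule body actually ran

    def run(self, rule: Rule, nodes: Nodes, input_length: int) -> Nodes:
        cacheable = (
            not rule.touches_shared_state
            and input_length >= self.cache_threshold
        )
        key = (rule.name, nodes)
        if cacheable and key in self.cache:
            return self.cache[key]
        self.executions += 1
        result = rule.apply(nodes)
        if cacheable:
            self.cache[key] = result
        return result


# Usage: a pure rule is executed once per distinct input,
# while a stateful rule is executed every time.
counter = {"n": 0}


def bump(nodes: Nodes) -> Nodes:
    counter["n"] += 1  # stands in for an update to shared state
    return nodes + (str(counter["n"]),)


pure_rule = Rule("wrap_np", lambda nodes: ("NP",) + nodes, False)
stateful_rule = Rule("count", bump, True)
```

Whether this would pay off depends on how often identical (rule, nodes) pairs actually recur in a real run, which this sketch does not measure.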

In the shorter term, does adding -F help at all? (Also I just realized that the long versions of -f and -F are identical, so I should fix that.)

@unhammer
Member Author

unhammer commented Jan 2, 2024

Can you give examples of shared state? I'm not sure if we're using that or not.
EDIT: I see https://wiki.apertium.org/wiki/Apertium-recursive/Formalism#Global_Chunk_Variables is one such; pretty sure we're not using that at least. What's a destructive update?

But -F does help! Now that long sentence translates in half a second. I haven't checked tests yet for what effect it has though :)

@mr-martian
Collaborator

chunk variables, string variables, node insertion, and <let> (which only applies if you're writing rules in XML).

Though I think it's also entirely possible that the bytecode interpreter is not the bottleneck and our actual problem is allocating thousands of nodes to store the different paths.
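To give a feel for how quickly "different paths" can multiply, here is a toy model in Python. It is an assumption-laden illustration, not a description of rtx-proc's parser: it simply counts the ways to cover `n_tokens` tokens with chunks of 1 to `max_span` tokens, as if every span of that size matched some rule. `count_paths` and both parameters are hypothetical names.

```python
from functools import lru_cache


def count_paths(n_tokens: int, max_span: int) -> int:
    """Count the ways to split n_tokens into chunks of 1..max_span tokens."""

    @lru_cache(maxsize=None)
    def paths_from(start: int) -> int:
        if start == n_tokens:
            return 1  # reached the end: one complete path
        return sum(
            paths_from(start + span)
            for span in range(1, max_span + 1)
            if start + span <= n_tokens
        )

    return paths_from(0)


for n in (4, 10, 20):
    print(n, count_paths(n, 2))
```

With `max_span=2` the counts follow the Fibonacci sequence (5 paths for 4 tokens, 89 for 10), so a parser that stores each path separately would run out of memory long before the interpreter itself became the bottleneck. The count stays cheap to compute here only because `paths_from` is memoised per start position.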
