Greedy prefix method sometimes fails for complex words #19

ProfessorO · 2016-08-28T02:18:02Z

The word "unuenaskitoj" (firstborns) should parse as unu-e-nask-it-o-j, but because we grab the longest root that matches the front of the word, we instead get unu-en-as-ki-{toj} (where "toj" is an unknown root, which luckily doesn't mean anything in Esperanto, AFAIK). It looks like we're going to have to go with something like "if the greedy prefix method leaves some un-parseable sections, try something else (say, greedy suffix, or iterating through all possible parsings, which would be far slower)."
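Here is a minimal Python sketch of greedy longest-prefix splitting that reproduces this failure. The function name `greedy_prefix_parse` and the tiny root list are hypothetical. They were chosen only to reproduce the example above and are not the project's actual code or root database. An unparseable leftover is wrapped in braces, matching the notation used in this thread.

```python
def greedy_prefix_parse(word, roots):
    """Repeatedly take the longest root matching the front of `word`.

    Hypothetical sketch: any remainder that no root matches is returned
    wrapped in braces, e.g. "{toj}".
    """
    # Longest roots first; drop empty strings so the loop always makes progress.
    by_length = sorted({r for r in roots if r}, key=len, reverse=True)
    parts = []
    rest = word
    while rest:
        match = next((r for r in by_length if rest.startswith(r)), None)
        if match is None:
            parts.append("{" + rest + "}")  # unparseable remainder
            break
        parts.append(match)
        rest = rest[len(match):]
    return parts


if __name__ == "__main__":
    # Hypothetical toy root list, just enough to show the problem.
    sample_roots = ["unu", "e", "en", "nask", "it", "o", "j", "as", "ki"]
    print(greedy_prefix_parse("unuenaskitoj", sample_roots))
    # ['unu', 'en', 'as', 'ki', '{toj}']
```

Because "en" is longer than "e", the greedy step commits to it and never revisits that choice. That is why "nask" is never reached.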

ProfessorO · 2016-08-28T02:20:22Z

Genesis 4:3, BTW. :)

ProfessorO · 2016-08-28T03:08:17Z

This ties in with the parsing of kia/kiam, kio/kiom, tia/tiam, tio/tiom. Right now the shorter root is in the database. I could add "m" as a separate root, but it doesn't actually have a meaning of its own. Putting the longer forms in the list causes a different problem, though. With tiam in the list, tiamaniere (in Genesis 6:15), which should parse as tia-manier-e (in such a manner), instead parses as tiam-a-ni-er-e. That isn't even sort of right (it would mean at-that-time + adjective + we + part-of-the-whole + adverb).

I'm beginning to think the greedy algorithm is correct, but that it needs to select the biggest root that fits ANYWHERE in the word first. Then, if that root is in the middle of the word, it should recursively parse the pieces on either side as two separate words. Or perhaps try all possible parsings and pick the one with the fewest roots. Or something like that. :/
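The longest-root-anywhere idea can be sketched as a short recursive function. This is a hypothetical illustration (`longest_root_anywhere_parse` and the toy root lists are made up here), not the code from the commit referenced below. Two tie-breaking choices are arbitrary assumptions of this sketch: equal-length roots are tried alphabetically, and only the first occurrence of a root in the piece is used.

```python
def longest_root_anywhere_parse(word, roots):
    """Split `word` by extracting the longest root found anywhere in it,
    then recursively parsing the pieces to its left and right.

    Hypothetical sketch: pieces no root matches are wrapped in braces.
    """
    # Longest first; alphabetical among equal lengths (arbitrary but deterministic).
    by_length = sorted({r for r in roots if r}, key=lambda r: (-len(r), r))

    def parse(piece):
        if not piece:
            return []
        for root in by_length:
            i = piece.find(root)  # first occurrence only
            if i != -1:
                return parse(piece[:i]) + [root] + parse(piece[i + len(root):])
        return ["{" + piece + "}"]  # nothing in the root list fits

    return parse(word)


if __name__ == "__main__":
    tia_roots = ["tia", "tiam", "manier", "e", "a", "ni", "er"]
    print(longest_root_anywhere_parse("tiamaniere", tia_roots))
    # ['tia', 'manier', 'e']
```

With the toy list from the first comment, this also turns unuenaskitoj into unu-e-nask-it-o-j. It still grabs "ali" in aliris, though, giving ali-{r}-is, so extracting the longest root anywhere doesn't cover every case on its own.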

ProfessorO · 2016-08-29T18:59:38Z

Another interesting example: aliris in Genesis 19:9. It should parse as al-ir-is, but since ali is a root (meaning "other"), it instead parses as ali-{ris} (where ris doesn't parse). Being greedy won't fix this one (ali is the longest root that fits in the word). What's needed is backtracking until you find at least one collection of roots that allows parsing to complete.
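Here is a rough sketch of the backtracking idea, plus the exhaustive "fewest roots" variant suggested in an earlier comment. The function names and root lists are hypothetical. `backtrack_parse` returns the first complete parse it finds when trying longer prefixes first. `fewest_roots_parse` considers every complete parse and memoizes on the start position, so it doesn't re-parse the same suffix repeatedly. Both return `None` when no complete parse exists.

```python
from functools import lru_cache


def backtrack_parse(word, roots):
    """First complete parse found, trying longer prefixes first (hypothetical sketch)."""
    by_length = sorted({r for r in roots if r}, key=len, reverse=True)

    def solve(rest):
        if not rest:
            return []
        for root in by_length:
            if rest.startswith(root):
                tail = solve(rest[len(root):])
                if tail is not None:  # remainder parsed completely
                    return [root] + tail
        return None  # dead end: caller backtracks to its next candidate

    return solve(word)


def fewest_roots_parse(word, roots):
    """Complete parse using the fewest roots, or None (hypothetical sketch)."""
    root_set = frozenset(r for r in roots if r)

    @lru_cache(maxsize=None)
    def best(start):
        # Best parse of word[start:] as a tuple, or None if it can't be parsed.
        if start == len(word):
            return ()
        result = None
        for end in range(len(word), start, -1):  # longer pieces first
            piece = word[start:end]
            if piece in root_set:
                tail = best(end)
                if tail is not None and (result is None or 1 + len(tail) < len(result)):
                    result = (piece,) + tail
        return result

    parse = best(0)
    return list(parse) if parse is not None else None


if __name__ == "__main__":
    print(backtrack_parse("aliris", ["ali", "al", "ir", "is"]))  # ['al', 'ir', 'is']
    tia_roots = ["tia", "tiam", "manier", "e", "a", "ni", "er"]
    print(backtrack_parse("tiamaniere", tia_roots))     # ['tiam', 'a', 'ni', 'er', 'e']
    print(fewest_roots_parse("tiamaniere", tia_roots))  # ['tia', 'manier', 'e']
```

Backtracking alone fixes aliris, and with the first comment's toy list it also recovers unu-e-nask-it-o-j. For tiamaniere, however, it accepts the first complete parse, tiam-a-ni-er-e. Preferring the parse with the fewest roots picks tia-manier-e there. The plain backtracking version can take exponential time on pathological words; memoizing `solve` would avoid that.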


ProfessorO added a commit that referenced this issue Sep 1, 2016

Extract longest root anywhere first (LRAF?) instead of longest prefix first (LPF?). Fixes #19, but doesn't address #21 or #22, and actually creates #23.

98bf5c3

ProfessorO mentioned this issue Sep 1, 2016

Extract longest root anywhere first (LRAF?) instead of longest prefix first. #24

Merged

ProfessorO closed this as completed in #24 Sep 1, 2016
