Dictionary lookup / translate all readings #105
Comments
It looks like the backend has already been partly developed within APy: https://github.com/Kira-D/apertium-html-tools/tree/dictionaryLookup.
@jonorthwash I prepared these docs, but unfortunately they contain no information about preprocessing (I used the old format for the task): http://wiki.apertium.org/wiki/GSOC%2716_Kira%27s_results._Apertium_website_improvements:_Docs_diff
@Kira-D, this is helpful, thanks! Could you clarify what you mean by "the old format"?
@jonorthwash, I'm sorry, I mistakenly thought that this was not a new feature.
I am looking into this as part of this GCI Task. |
That means one of the following:
I can't be certain which it is. I suggest running your own instance of APy. Running html-tools via Docker does exactly that. |
Hey, I'm here for cdi and not sure what to do. I've cloned the master branch.
Hi @sushain97, could you please tell me where to start on this enhancement?
By dictionary lookup, do you guys mean:
The former approach is inherently lexical-form-to-lexical-form; the latter is surface-to-surface. The latter is a much more interesting feature, in my humble opinion. I did that for apertium-kaz (and kaz-tat, kaz-rus, kaz-eng) with an ad-hoc script at [1]. It is a local solution relying on modes. The problem with scaling it up to all pairs is that:
A language-agnostic solution would have to parse modes.xml and make sense of it. [1] https://github.com/apertium/apertium-kaz/blob/master/tests/vocabulary/expander.rkt
A related note (or rant) about modes.xml: even with the gendebug option set to "yes" (a wonderful feature otherwise, kudos @unhammer), all pipelines defined in the generated modes begin with morphological analysis. To my knowledge, there is no way of referring to part of a pipeline, saying something like "kaz-tat from interchunk to postgen", without copy-pasting the mode file itself (and thus breaking a CS 101 rule) and introducing yet another partial mode.
That shouldn't be too hard to do if you have access to an XML parser, assuming translators use the regular names for stages.
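To make the modes.xml idea concrete, here is a minimal Python sketch of reading a mode's pipeline and cutting out a sub-pipeline. It is only an illustration under assumptions: the inline `SAMPLE_MODES` document is a made-up kaz-tat mode, not the real one, and `read_pipeline` and `sub_pipeline` are hypothetical helper names, not part of APy or html-tools. It also assumes stages can be recognised by the start of their command string, which only holds if pairs use the regular stage commands.

```python
import xml.etree.ElementTree as ET

# Made-up example mode for illustration; real modes.xml files differ per pair.
SAMPLE_MODES = """\
<modes>
  <mode name="kaz-tat" install="yes" gendebug="yes">
    <pipeline>
      <program name="lt-proc -w">
        <file name="kaz-tat.automorf.bin"/>
      </program>
      <program name="cg-proc -w">
        <file name="kaz-tat.rlx.bin"/>
      </program>
      <program name="apertium-pretransfer"/>
      <program name="lt-proc -b">
        <file name="kaz-tat.autobil.bin"/>
      </program>
      <program name="lt-proc -g">
        <file name="kaz-tat.autogen.bin"/>
      </program>
    </pipeline>
  </mode>
</modes>
"""


def read_pipeline(modes_xml, mode_name):
    """Return the pipeline of `mode_name` as a list of (command, [files])."""
    root = ET.fromstring(modes_xml)
    for mode in root.iter("mode"):
        if mode.get("name") == mode_name:
            return [
                (prog.get("name"), [f.get("name") for f in prog.findall("file")])
                for prog in mode.findall("./pipeline/program")
            ]
    raise KeyError(mode_name)


def sub_pipeline(steps, start, stop):
    """Slice from the first step whose command starts with `start`
    through the next step whose command starts with `stop` (inclusive)."""
    names = [name for name, _ in steps]
    first = next(i for i, n in enumerate(names) if n.startswith(start))
    last = next(i for i, n in enumerate(names) if i >= first and n.startswith(stop))
    return steps[first:last + 1]
```

Calling `sub_pipeline(read_pipeline(SAMPLE_MODES, "kaz-tat"), "apertium-pretransfer", "lt-proc -g")` would give the steps from pretransfer to generation, skipping morphological analysis. Matching on command prefixes is the fragile part of this sketch; a real solution would need a more robust way to identify stages.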
I do this quite a lot in various scripts, though I typically include the other steps as well in case they do necessary things (cg-proc, pretransfer). So if -morph gives One could also give the output of bilingual transducer, e.g. I from |
Is this issue still open?
Yes, it's still an unsolved task.
Assorted thoughts: