Dictionary lookup / translate all readings #105

Open
sushain97 opened this issue Mar 12, 2017 · 15 comments

@sushain97
Member

Assorted thoughts:

  • A nice front end needs to be developed for this, with nice probability bars if there is probability data available from the back end
  • A back end also needs to be developed
    • this back end needs to be able to make use of some format of data (probably a database table), which could include word pairs, and optionally also probabilities.
  • Should give synonyms and alternative translations.
  • It should rank the translations by likelihood.
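A minimal sketch of the data side of these points, assuming a SQLite table of word pairs with an optional probability column. The table name, column names, and sample rows are all hypothetical and not taken from any existing APy code:

```python
import sqlite3


def create_store(connection):
    # Hypothetical schema: one row per word pair, probability may be NULL.
    connection.execute(
        "CREATE TABLE translations (source TEXT, target TEXT, probability REAL)"
    )


def lookup(connection, word):
    # Rank by likelihood: rows with a probability come first (highest first),
    # rows without probability data follow in alphabetical order.
    rows = connection.execute(
        "SELECT target, probability FROM translations WHERE source = ? "
        "ORDER BY probability IS NULL, probability DESC, target",
        (word,),
    )
    return rows.fetchall()


connection = sqlite3.connect(":memory:")
create_store(connection)
connection.executemany(
    "INSERT INTO translations VALUES (?, ?, ?)",
    [
        # Made-up example rows, for illustration only.
        ("house", "hogar", 0.2),
        ("house", "vivienda", None),
        ("house", "casa", 0.7),
    ],
)
ranked = lookup(connection, "house")
```

A front end could draw a probability bar only for rows where the second field is not `None`.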
@sushain97
Member Author

sushain97 commented Mar 12, 2017

It looks like the backend has already been developed a bit within APy: https://github.com/Kira-D/apertium-html-tools/tree/dictionaryLookup.

@jonorthwash
Member

@Kira-D (and @ftyers): how did the backend for dictionary lookup work? I think I remember that the data had to be preprocessed (dumping a .dix file into a sqlite db?). Is there any documentation for this, or anything you remember that might help?
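For illustration only, here is a guess at what "dumping a .dix file into a sqlite db" could look like. It is not the preprocessing that was actually used, which is undocumented in this thread. The sample dictionary, the `pairs` table, and the function names are assumptions, and the sketch ignores entry restrictions, regex entries, and paradigms:

```python
import sqlite3
import xml.etree.ElementTree as ET

# Tiny made-up bilingual dictionary in .dix format, for illustration only.
SAMPLE_DIX = """<dictionary>
  <sdefs><sdef n="n"/><sdef n="vblex"/></sdefs>
  <section id="main" type="standard">
    <e><p><l>house<s n="n"/></l><r>casa<s n="n"/></r></p></e>
    <e><p><l>eat<s n="vblex"/></l><r>comer<s n="vblex"/></r></p></e>
  </section>
</dictionary>"""


def lexical_form(side):
    """Turn an <l> or <r> element into a string such as 'house<n>'."""
    lemma, tags = side.text or "", ""
    for child in side:
        if child.tag == "b":      # <b/> marks a space inside a multiword lemma
            lemma += " "
        elif child.tag == "s":    # <s n="..."/> is a morphological tag
            tags += f"<{child.get('n')}>"
        lemma += child.tail or ""
    return lemma.strip() + tags


def dump_bidix(dix_text, connection):
    """Insert every <p> pair into a (hypothetical) pairs table; return the count."""
    connection.execute("CREATE TABLE IF NOT EXISTS pairs (source TEXT, target TEXT)")
    root = ET.fromstring(dix_text)
    rows = [
        (lexical_form(pair.find("l")), lexical_form(pair.find("r")))
        for pair in root.iter("p")
        if pair.find("l") is not None and pair.find("r") is not None
    ]
    connection.executemany("INSERT INTO pairs VALUES (?, ?)", rows)
    return len(rows)


connection = sqlite3.connect(":memory:")
inserted = dump_bidix(SAMPLE_DIX, connection)
```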

@Kira-D
Contributor

Kira-D commented Jun 4, 2017

@jonorthwash I prepared these docs, but unfortunately they contain no information about preprocessing (I used the old format for the task): http://wiki.apertium.org/wiki/GSOC%2716_Kira%27s_results._Apertium_website_improvements:_Docs_diff

@jonorthwash
Member

@Kira-D, this is helpful, thanks! Could you clarify what you mean by "the old format"?

@Kira-D
Contributor

Kira-D commented Jun 4, 2017

@jonorthwash, I'm sorry, I mistakenly thought that it was not a new feature.
But it is a new feature.
I also found a separate branch for the feature: https://github.com/Kira-D/apertium-apy/commits/dictionaryLookup
You can see the diffs there. Hope it'll be helpful.

@Androbin
Contributor

Androbin commented Dec 7, 2017

I am looking into this as part of this GCI Task.
But for some reason www.apertium.org/apy/dictionaryLookup gives me a 404 response code no matter what.

@sushain97
Member Author

> But for some reason www.apertium.org/apy/dictionaryLookup gives me a 404 response code no matter what.

That means one of the following:

  1. /dictionaryLookup is disabled on production APy
  2. dictionary lookup isn't actually merged in on the apertium-apy repo
  3. dictionary lookup is broken
  4. you're not calling it correctly :)

I can't be certain which it is. I suggest running your own instance of APy. Running html-tools via Docker does exactly that.

@Stevenjin8

Hey, I'm here for cdi and not sure what to do. I've cloned the master branch.

@aditya-prayaga
Contributor

Hi @sushain97, could you tell me where to start on this enhancement?

@IlnarSelimcan
Member

IlnarSelimcan commented Mar 28, 2019

By dictionary lookup, do you guys mean:

  • passing a lexical form through a bilingual dictionary and returning all possible translations, or
  • passing a surface form through a morphological analyser, taking all possible analyses, passing them through a bilingual transducer, taking all possible translations, passing them through transfer, and generating surface forms?

The former approach is inherently lexical-form-to-lexical-form; the latter is surface-to-surface. The latter is a much more interesting feature, in my humble opinion.

I did that for apertium-kaz (and kaz-tat, kaz-rus, kaz-eng) with an ad-hoc script at [1]. It is a local solution relying on modes.

The problem with scaling it up to all pairs is that:

  • each translator can have a different number of transfer stages
  • transfer stage mode files usually aren't installed

A language agnostic solution would have to parse modes.xml and make sense of it.

[1] https://github.com/apertium/apertium-kaz/blob/master/tests/vocabulary/expander.rkt
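As a rough sketch of what parsing modes.xml and making sense of it might involve, the Python below reads the pipeline of one mode and returns the stages that follow morphological analysis. The sample XML is a simplified, made-up mode (real pairs differ in programs and file names), and the function names are hypothetical:

```python
import xml.etree.ElementTree as ET

# Simplified, made-up modes.xml content, for illustration only.
SAMPLE_MODES = """<modes>
  <mode name="kaz-tat" install="yes" gendebug="yes">
    <pipeline>
      <program name="lt-proc -w">
        <file name="kaz-tat.automorf.bin"/>
      </program>
      <program name="apertium-pretransfer"/>
      <program name="lt-proc -b">
        <file name="kaz-tat.autobil.bin"/>
      </program>
      <program name="apertium-transfer -b">
        <file name="apertium-kaz-tat.kaz-tat.t1x"/>
        <file name="kaz-tat.t1x.bin"/>
      </program>
      <program name="lt-proc -g">
        <file name="kaz-tat.autogen.bin"/>
      </program>
    </pipeline>
  </mode>
</modes>"""


def read_pipeline(xml_text, mode_name):
    """Return [(command, [file, ...]), ...] for the named mode."""
    root = ET.fromstring(xml_text)
    for mode in root.iter("mode"):
        if mode.get("name") == mode_name:
            return [
                (program.get("name"), [f.get("name") for f in program.findall("file")])
                for program in mode.findall("./pipeline/program")
            ]
    raise KeyError(mode_name)


def stages_after(pipeline, command_prefix):
    """Return the stages following the first one whose command starts with the prefix."""
    for index, (command, _files) in enumerate(pipeline):
        if command.startswith(command_prefix):
            return pipeline[index + 1:]
    return []


pipeline = read_pipeline(SAMPLE_MODES, "kaz-tat")
after_morph = stages_after(pipeline, "lt-proc -w")
```

Because the number of transfer stages varies between pairs, slicing by stage rather than by fixed position avoids hard-coding the pipeline length. Matching on the command prefix is itself an assumption that a given pair uses the usual program names.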

@IlnarSelimcan
Member

IlnarSelimcan commented Mar 28, 2019

A related note (or rant) about modes.xml: even with the gendebug option set to "yes" (a wonderful feature otherwise, kudos @unhammer), all pipelines defined in the generated modes begin with morphological analysis. To my knowledge, there is no way to refer to part of a pipeline, e.g. "kaz-tat from interchunk to postgen", without copy-pasting the mode file itself (and thus breaking a CS 101 rule) and introducing yet another partial mode.

@unhammer
Member

That shouldn't be too hard to do if you have access to an XML parser, assuming translators use the regular stage names.

@unhammer
Member

unhammer commented May 9, 2021

> passing a surface form through a morphological analyser, taking all possible analyses, passing them through a bilingual transducer, taking all possible translations, passing them through transfer, and generating surface forms?

I do this quite a lot in various scripts, though I typically include the other steps as well in case they do necessary things (cg-proc, pretransfer). So if -morph gives ^åt/åt<pr>/ete<vblex><pret>/åt<n><nt><sg><ind>$, I transform that to ^åt/åt<pr>$ ^åt/ete<vblex><pret>$ ^åt/åt<n><nt><sg><ind>$ and pass it through the steps following morph up until lt-proc -b, and then I do the same split there.
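The splitting step described here could be sketched as follows. This is not the script actually used; the function name is hypothetical, and it assumes lexical units contain no escaped `^` or `$` characters:

```python
import re

# One lexical unit: ^surface/reading1/reading2$
# (assumes no escaped ^ or $ inside the unit).
LEXICAL_UNIT = re.compile(r"\^([^^$]*)\$")


def split_readings(stream):
    """Rewrite each ambiguous lexical unit as one unit per reading."""
    def expand(match):
        # Split on slashes that are not escaped with a backslash.
        surface, *readings = re.split(r"(?<!\\)/", match.group(1))
        if not readings:
            return match.group(0)  # nothing to split, keep the unit unchanged
        return " ".join(f"^{surface}/{reading}$" for reading in readings)

    return LEXICAL_UNIT.sub(expand, stream)


split = split_readings("^åt/åt<pr>/ete<vblex><pret>/åt<n><nt><sg><ind>$")
```

Text between lexical units is left untouched, so the result can be piped on to the later stages as usual.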

One could also give the output of the bilingual transducer, e.g. going from åt in nno→dan it might be useful to see spise, verb past-tense; til, preposition.

@chetana0070

Is this issue still open?

@TinoDidriksen
Member

Yes, it's still an unsolved task.

@unhammer changed the title from "Dictionary lookup" to "Dictionary lookup / translate all readings" on Mar 30, 2023