Add Kobo dictionary support (requires issue 5 to be done) #6

Open
jzohrab opened this issue Jun 21, 2023 · 1 comment
Labels
enhancement New feature or request

Comments

@jzohrab
Collaborator

jzohrab commented Jun 21, 2023

This is a good idea: a simple offline-style dictionary.

@jzohrab
Collaborator Author

jzohrab commented Jun 25, 2023

See #5 for initial notes.

The Kobo dictionaries at https://www.epubor.com/kobo-dictionary-download-and-install.html are a good start, but you need to change the http links to https.

Once a dictionary is downloaded and unzipped, it contains a bunch of files, e.g. co.html, but these are in fact gzip-compressed data rather than plain HTML. You can decompress them, e.g.:

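# work on a copy, since gzip -d removes its input file
cp co.html hack_co.html
# -S .html tells gzip to treat .html as the compressed suffix; output is hack_co
gzip -S .html -d hack_co.html
mv hack_co hack_co.data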

and this results in a file called hack_co.data with data like the following:

<w><p><a name="correr"/><b>correr</b> [koˈreɾ]<br/><br/>
<p>Del latín <i>currere</i></p><br/><ol><li>Desplazarse rápidamente ....</li>
...
<variant name="corra"/>
<variant name="corre"/>
...
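The same decompression and a first pass over the entries could be done in Python. This is only a rough sketch: it assumes the files are single gzip streams containing UTF-8 text, that every entry starts with `<w>`, and that the headword is the first `<a name="..."/>` in the entry, which is all inferred from the one sample above. The function names `read_kobo_file` and `parse_entries` are made up for illustration.

```python
import gzip
import re

# Patterns taken from the sample entry above; real files may have other shapes.
HEADWORD_RE = re.compile(r'<a name="([^"]+)"\s*/>')
VARIANT_RE = re.compile(r'<variant name="([^"]+)"\s*/>')


def read_kobo_file(path):
    """Return the decompressed text of one Kobo dictionary file.

    Assumes gzip compression and UTF-8 text (assumption, not verified
    against every dictionary).
    """
    with gzip.open(path, "rt", encoding="utf-8") as f:
        return f.read()


def parse_entries(text):
    """Yield (headword, [variants]) for each <w>-delimited entry."""
    # Anything before the first <w> is not an entry, so skip it.
    for chunk in text.split("<w>")[1:]:
        match = HEADWORD_RE.search(chunk)
        if match is None:
            continue  # no headword anchor found; ignore this chunk
        yield match.group(1), VARIANT_RE.findall(chunk)
```

Usage would be something like `for headword, variants in parse_entries(read_kobo_file("co.html")): ...`, reading the downloaded file directly with no copy or rename needed.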

So, these files could be pre-processed to build an initial index into the data files that maps each word, and all (??) of its variants, to the word itself. A Lute-Kobo lookup could then look like this: given the input word fui, the pre-processed file initial_index_fu.data contains something like fui: ir (fui being one of the variants of ir, we hope!), and the actual lookup is then done using ir to get the definition.
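A minimal sketch of that pre-processing step, assuming the entries have already been extracted as (headword, variants) pairs by some earlier step. The helper names `build_variant_index` and `bucket_name` are hypothetical, and the two-letter bucket is just a guess based on the `initial_index_fu.data` example; lowercasing the forms is also an assumption.

```python
from collections import defaultdict


def build_variant_index(entries):
    """Map every surface form (headword or variant) to its headword(s).

    `entries` is an iterable of (headword, [variants]) pairs.
    """
    index = defaultdict(list)
    for headword, variants in entries:
        for form in [headword, *variants]:
            key = form.lower()
            # A form can belong to several headwords; keep each one once.
            if headword not in index[key]:
                index[key].append(headword)
    return dict(index)


def bucket_name(form, prefix_len=2):
    """Name of the index file a form would live in, e.g. fui -> initial_index_fu.data."""
    return f"initial_index_{form.lower()[:prefix_len]}.data"
```

With this, `build_variant_index([("ir", ["fui", "voy"])])` gives a dict where both `fui` and `ir` point at `ir`, and `bucket_name("fui")` says which file that mapping would be written to.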

I don't know how this would/should work for ambiguous mappings. Perhaps something like gato: gato; gatar (if there is a word like gatar).
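One possible way to handle the ambiguous case, sketched under the assumption that an index line looks like `form: headword; headword` as in the gato example. Both function names are hypothetical, and this would break if a headword itself contained `:` or `;`.

```python
def format_index_line(form, headwords):
    """Serialize one index entry, e.g. 'gato: gato; gatar'."""
    return f"{form}: {'; '.join(headwords)}"


def parse_index_line(line):
    """Parse one index line back into (form, [headwords])."""
    # Split on the first colon only; everything after it is the headword list.
    form, _, rest = line.partition(":")
    headwords = [h.strip() for h in rest.split(";") if h.strip()]
    return form.strip(), headwords
```

An unambiguous mapping is then just the one-headword case of the same format, so the lookup code would not need a separate path for it.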

If nothing is found, just return 'not found'.
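Put together, the two-step lookup might look roughly like this. It is a sketch only: `variant_index` (form to list of headwords) and `definitions` (headword to definition text) are assumed to be plain dicts already loaded from the pre-processed files, and the name `lookup` and the output layout are made up here.

```python
NOT_FOUND = "not found"


def lookup(word, variant_index, definitions):
    """Resolve a word to its headword(s), then return their definitions.

    Returns NOT_FOUND if the word is unknown or none of its headwords
    has a definition.
    """
    headwords = variant_index.get(word.lower(), [])
    hits = [f"{h}: {definitions[h]}" for h in headwords if h in definitions]
    return "\n\n".join(hits) if hits else NOT_FOUND
```

For an ambiguous word this simply returns every matching definition, one after the other, and leaves the choice to the reader.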

The pre-processing could be done outside of Lute, or as a heavyweight initial load. Outside is better, I think: less crap to go wrong in the app, separate concern.

@jzohrab jzohrab transferred this issue from jzohrab/lute Nov 11, 2023
@jzohrab jzohrab changed the title Add Kobo dictionary support (requires https://github.com/jzohrab/lute/issues/39 to be done) Add Kobo dictionary support (requires issue 5 to be done) Nov 15, 2023
@jzohrab jzohrab added the enhancement New feature or request label Nov 28, 2023