
Support <hiero> mediawiki extension #703

Closed
lasconic opened this issue Feb 10, 2021 · 26 comments · Fixed by #1061

Comments

@lasconic
Collaborator

lasconic commented Feb 10, 2021

@BoboTiG
Owner

BoboTiG commented Feb 12, 2021

Should we handle it instead? It seems to be made of pictures.

@lasconic
Collaborator Author

What do you mean? Convert the pictures to GIF and embed them like we do for math?

@lasconic
Collaborator Author

@BoboTiG
Owner

BoboTiG commented Feb 12, 2021

I have not looked at the PHP file that handles the template. But I guess it is "only" a bunch of files referenced by a key (here "R11"). If that is the case, we could handle it and use inline GIFs as we do for math and chem, yes.
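If a hiero code really were just a sequence of keys, rendering it would be a plain dictionary lookup. A minimal sketch, assuming a hypothetical `HIERO_IMAGES` map from sign code to a ready-made inline `<img>` tag (the data URIs below are placeholders, not real images), and treating only `-` and spaces as separators:

```python
# Hypothetical map: sign code -> inline <img> tag (placeholder data URIs).
HIERO_IMAGES = {
    "R11": '<img src="data:image/gif;base64,R11..."/>',
    "N5": '<img src="data:image/gif;base64,N5..."/>',
}


def render_simple(code: str) -> str:
    """Render a flat sequence such as "R11" or "R11-N5" as consecutive images.

    Only "-" and whitespace are treated as separators; unknown keys are kept as text.
    """
    signs = code.replace("-", " ").split()
    return "".join(HIERO_IMAGES.get(sign, sign) for sign in signs)


print(render_simple("R11-N5"))
```

This only covers codes without stacking or grouping; anything else needs a layout step.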

@lasconic
Collaborator Author

@BoboTiG
Owner

BoboTiG commented Feb 13, 2021

Pictures are there.

WDYT of displaying a GIF for the template?

@BoboTiG
Owner

BoboTiG commented Feb 13, 2021

It seems more like several GIFs for "Ptah". I do not know if it is worth handling the template. Let me know your thoughts :)

@lasconic
Collaborator Author

It's a bit more complicated than just one GIF indeed. The extension outputs an HTML table and is able to put symbols on top of each other.
To find out whether it's worth the pain, I checked how many times hiero is used in the wikicode we currently render.
In French, 13 words (out of 1,555,588):

'Sekhmet'
'Apophis'
'Aton'
'Néfertiti'
'Pharaon'
'Ptah'
'Ramsès'
'djed'
'gomme'
'khépesh'
'oasis'
'ouchebti'
'uraeus'

In English, 63 words (out of 677,008):

'barge'
'barque'
'basalt'
'Hathor'
'Hatshepsut'
'Hatti'
'Moab'
'Ab'
'Set'
'Shemu'
'Neith'
'Nephthys'
'Akhenaten'
'Akhet'
'Sobek'
'Anubis'
'Anuket'
'Sphinx'
'Onuphrius'
'Sutekh'
'Aswan'
'Imhotep'
'Thoth'
'Peret'
'Isis'
'Djahy'
'Jerusalem'
'Tutankhamon'
'Tutankhaten'
'Tybi'
'Unas'
'adobe'
'Wadjet'
'ba'
'Wenis'
'Punt'
'alphabet'
'Ra'
'ammonia'
'Re'
'Retjenu'
'ankh'
'ebony'
'Maat'
'emerald'
'ibis'
'life, prosperity, health'
'lightland'
'lily'
'heqat'
'hieroglyph'
'hin'
'natron'
'oasis'
'plewd'
'sphinx'
'senet'
'tjaty'
'serekh'
'stibium'
'uraeus'
'ushabti'
'trona'

Could be worth it, especially if most of them are sequential and "simple"...
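The count above boils down to scanning each word's wikicode for a `<hiero>` tag. A sketch, assuming a hypothetical `pages` mapping of word to raw wikicode (the words and wikicode below are made up; how the project really iterates over pages is not shown in this thread):

```python
import re

# Matches an opening <hiero> tag, case-insensitively, with optional attributes.
HIERO_TAG = re.compile(r"<hiero\b[^>]*>", re.IGNORECASE)


def words_with_hiero(pages: dict[str, str]) -> list[str]:
    """Return the sorted list of words whose wikicode contains a <hiero> tag."""
    return sorted(word for word, code in pages.items() if HIERO_TAG.search(code))


pages = {
    "word1": "some text <hiero>R11</hiero>",
    "word2": "no hieroglyphs here",
    "word3": "<HIERO>i-t:n-N5</HIERO>",
}
found = words_with_hiero(pages)
print(f"{len(found)} words (out of {len(pages):,})")
```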

@lasconic
Collaborator Author

For French, here are the codes:

S42-G17*X1-I12
O29 Q3:Q3 I14
i-t:n-N5
pr:aA
Q3:X1-V28-C19
ra:Z1-ms-s-sw
R11
N29-W19-M17-M17*X1-N33:Z2
Aa1:Q3-N37:F23-F51
Aa2-X1:N25
w-S-b-t:y-A53
I12

Some are simple, like R11, but most of them contain * or :, which is less simple and would require a table or some CSS...
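The split between "simple" and layout-dependent codes can be checked mechanically. This sketch only looks for `:` (stacking) and `*` (signs side by side in one group) and ignores any other syntax; run over the French codes listed above:

```python
FRENCH_CODES = [
    "S42-G17*X1-I12",
    "O29 Q3:Q3 I14",
    "i-t:n-N5",
    "pr:aA",
    "Q3:X1-V28-C19",
    "ra:Z1-ms-s-sw",
    "R11",
    "N29-W19-M17-M17*X1-N33:Z2",
    "Aa1:Q3-N37:F23-F51",
    "Aa2-X1:N25",
    "w-S-b-t:y-A53",
    "I12",
]


def needs_layout(code: str) -> bool:
    """True if the code stacks (":") or groups ("*") signs, so a flat row of images is not enough."""
    return any(op in code for op in ":*")


simple = [c for c in FRENCH_CODES if not needs_layout(c)]
print(simple)  # ['R11', 'I12']
```

Only 2 of the 12 codes are flat sequences, which supports the point that a table (or CSS) is unavoidable.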

@lasconic
Collaborator Author

lasconic commented Feb 14, 2021

Converted the PNGs to GIF and stored them as base64 in a map. The resulting file is 655 KB.

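import os
from base64 import b64encode
from io import BytesIO

from PIL import Image

results = {}
for f in sorted(os.listdir(".")):
    # Only "<prefix>_<code>.png" files; the part after the first "_" is the map key.
    if not f.endswith(".png") or "_" not in f:
        continue
    code = os.path.splitext(f)[0].split("_", 1)[1]
    with Image.open(f) as png:
        buf = BytesIO()
        # Grayscale ("L") keeps the GIF small; optimize=True drops unused palette entries.
        png.convert("L").save(buf, format="gif", optimize=True)
    # Inline data URI, as already done for math and chem.
    results[code] = f'<img src="data:image/gif;base64,{b64encode(buf.getvalue()).decode()}"/>'

# Print the map as a Python dict literal, followed by the number of entries.
print("hiero = {")
for code, img in sorted(results.items()):
    print(f'    "{code}": \'{img}\',')
print(f"}}  # {len(results):,}")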

@lasconic
Collaborator Author

lasconic commented Feb 14, 2021

In short, we probably need to port the whole PHP script to get decent support.

In particular the tokenizer, https://github.com/wikimedia/mediawiki-extensions-wikihiero/blob/366b1226891e609650b4c7f7d925b718c779517c/includes/HieroTokenizer.php
and the render function at https://github.com/wikimedia/mediawiki-extensions-wikihiero/blob/366b1226891e609650b4c7f7d925b718c779517c/includes/WikiHiero.php#L259
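To give an idea of the structure the tokenizer has to recover, here is a deliberately simplified sketch that is *not* a port of `HieroTokenizer.php`. It assumes `-` and whitespace separate blocks, `:` stacks rows inside a block (top to bottom), and `*` puts signs side by side inside a row, and it ignores all the other syntax the real tokenizer supports:

```python
import re

Block = list[list[str]]  # rows (top to bottom), each a list of signs (left to right)


def tokenize(code: str) -> list[Block]:
    """Split a hiero code into blocks -> rows -> signs (simplified model)."""
    blocks = []
    for raw in re.split(r"[-\s]+", code.strip()):
        if not raw:
            continue
        rows = [[sign for sign in row.split("*") if sign] for row in raw.split(":")]
        blocks.append([row for row in rows if row])
    return blocks


print(tokenize("N29-W19-M17-M17*X1-N33:Z2"))
# [[['N29']], [['W19']], [['M17']], [['M17', 'X1']], [['N33'], ['Z2']]]
```

Each block maps naturally onto a small table: one `<tr>` per row, one `<td>` per sign.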

Also, some hiero code uses phonemes rather than the codes used in the PNG filenames, so we need a copy of https://github.com/wikimedia/mediawiki-extensions-wikihiero/blob/366b1226891e609650b4c7f7d925b718c779517c/includes/WikiHiero.php#L259
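The phoneme handling amounts to a lookup applied to each sign before picking an image. A minimal sketch with a handful of standard uniliteral values from Gardiner's sign list, used here as an assumption for illustration; WikiHiero's own table is larger and is the one to copy:

```python
# A few uniliteral phonemes and their Gardiner codes (partial, for illustration).
PHONEMES = {
    "i": "M17",  # reed
    "w": "G43",  # quail chick
    "b": "D58",  # foot
    "p": "Q3",   # stool
    "n": "N35",  # water
    "H": "V28",  # wick
    "x": "Aa1",  # placenta
    "S": "N37",  # pool
    "t": "X1",   # bread loaf
}


def resolve(sign: str) -> str:
    """Map a phoneme to its Gardiner code; codes such as "R11" pass through unchanged."""
    return PHONEMES.get(sign, sign)


print([resolve(s) for s in "i-t:n-N5".replace(":", "-").split("-")])
# ['M17', 'X1', 'N35', 'N5']
```

Keys are case-sensitive on purpose: "H" and "h" are different phonemes.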

It will be hard to unit test the output, since it's only img tags with base64 data and a bunch of HTML...
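One way around brittle byte-for-byte comparisons is to test the structure of the output instead of its exact text, for instance by parsing it with the standard library's `html.parser` and counting tables, cells and images. A sketch (the sample HTML is made up):

```python
from html.parser import HTMLParser


class TagCounter(HTMLParser):
    """Count start tags by name, ignoring attributes such as base64 data URIs."""

    def __init__(self) -> None:
        super().__init__()
        self.counts: dict[str, int] = {}

    def handle_starttag(self, tag, attrs):
        # Self-closing tags like <img/> also land here via handle_startendtag.
        self.counts[tag] = self.counts.get(tag, 0) + 1


def tag_counts(html: str) -> dict[str, int]:
    parser = TagCounter()
    parser.feed(html)
    parser.close()
    return parser.counts


sample = (
    '<table><tr><td><img src="data:image/gif;base64,AAAA"/></td>'
    '<td><img src="data:image/gif;base64,BBBB"/></td></tr></table>'
)
print(tag_counts(sample))
```

A test can then assert on the layout (e.g. two cells, two images) without depending on the image bytes.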

A bit too much for a sunday :)

@BoboTiG
Owner

BoboTiG commented Feb 14, 2021

Clearly too much, yes :)

Thanks for the analysis and pre-work ;)

lasconic added a commit to lasconic/ebook-reader-dict that referenced this issue Feb 16, 2021
@lasconic
Collaborator Author

@BoboTiG
Owner

BoboTiG commented Feb 16, 2021

Nice one!

lasconic added a commit to lasconic/ebook-reader-dict that referenced this issue Jul 1, 2021
@BoboTiG
Owner

BoboTiG commented Aug 17, 2021

I was wondering what you think about your patch. Is it worth a try on my side?

@lasconic
Collaborator Author

lasconic commented Aug 17, 2021

It's tied to the HTML table issue (#1024), since table support is needed. So I would tackle HTML tables first, to see how well they work on Kobo, before tackling this one.

lasconic added a commit to lasconic/ebook-reader-dict that referenced this issue Aug 22, 2021
lasconic added a commit to lasconic/ebook-reader-dict that referenced this issue Aug 23, 2021
@lasconic lasconic changed the title Ignore <hiero> mediawiki extension Support <hiero> mediawiki extension Aug 23, 2021
@lasconic
Collaborator Author

Attached is a dictionary containing the French words with hiero from #703 (comment)

dicthtml-fr.zip

@BoboTiG
Owner

BoboTiG commented Aug 23, 2021

Nice and clean!

I think each cell's width should match the width of the picture it contains.
See https://fr.wiktionary.org/wiki/Rams%C3%A8s for example:

  • In the 1st column, the 2nd picture takes the whole width and is distorted.
  • The 3rd column is too large.

But we can live with it as-is 👍
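A sketch of the suggestion above: give each cell an explicit width taken from its picture, so the renderer has less room to stretch the image. It assumes a hypothetical `sizes` map from sign code to pixel (width, height), which could be collected when generating the GIFs; the sizes below are made up, the `.gif` file name stands in for the inline data URI, and whether the Kobo's WebKit honors these attributes is untested:

```python
from html import escape


def render_row(signs: list[str], sizes: dict[str, tuple[int, int]]) -> str:
    """Render one table row where every cell is exactly as wide as its image."""
    cells = []
    for sign in signs:
        width, height = sizes[sign]
        img = f'<img src="{escape(sign)}.gif" width="{width}" height="{height}"/>'
        cells.append(f'<td style="width:{width}px">{img}</td>')
    return "<tr>" + "".join(cells) + "</tr>"


print(render_row(["M17", "X1"], {"M17": (12, 36), "X1": (25, 14)}))
```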

@BoboTiG
Owner

BoboTiG commented Aug 23, 2021

https://fr.wiktionary.org/wiki/Sekhmet is not displayed very well either.

@lasconic
Collaborator Author

lasconic commented Aug 23, 2021

Yes, I feel like I'm pushing the limits of the HTML renderer on the Kobo... Here is Sekhmet in Chrome (rendered bigger to match the size on the Kobo). Somehow the styling in the Kobo browser is not the same. Do we know which renderer it is? Probably WebKit, but which version? Maybe it's not the browser but a default CSS applied to tables... Any idea if we can see this CSS somewhere?

[Screenshot 2021-08-23 at 19:54:22]

And Ramsès:

[Screenshot 2021-08-23 at 19:57:21]

@BoboTiG
Owner

BoboTiG commented Aug 23, 2021

I could trace it up to https://github.com/kobolabs/qt-everywhere-opensource-src-4.6.2/blob/master/src/3rdparty/webkit/VERSION to find the WebKit version, but the hash is not helpful (69dd29fbeb12d076741dce70ac6bc155101ccd6f, I could not find it). Judging by the changelog, it is an old one, from 2009-11-30. That mirror only has history up to 2012.

I am not completely sure about this information, but I got Qt Embedded 4.6.2 from the latest Kobo firmware (https://kbdownload1-a.akamaihd.net/firmwares/kobo7/Feb2021/kobo-update-4.26.16704.zip), so it should be right.

@lasconic
Collaborator Author

OK, so if they use WebKit for dictionary rendering, it's the one included in Qt 4.6.2.

I investigated the styling. I believe I found the problem for Ramsès, but not yet for Sekhmet.

New french dictionary:
dicthtml-fr.zip

@BoboTiG
Owner

BoboTiG commented Aug 23, 2021

About the default CSS (I cannot say whether it is used in the dictionary area, though):

* {padding: 0; margin: 0; }
body { font: %1px %2; }
table, thead, tbody, tr, td, th { font-size: inherit; font-family: inherit; }

(still looking for more data)

lasconic added a commit to lasconic/ebook-reader-dict that referenced this issue Aug 23, 2021
@lasconic
Collaborator Author

Interesting page for testing: https://fr.wikipedia.org/wiki/Wikip%C3%A9dia:WikiHiero/Exemples

@BoboTiG
Owner

BoboTiG commented Aug 24, 2021

The new version is way better 💪
The rendering is great!

@BoboTiG
Owner

BoboTiG commented Aug 24, 2021

https://fr.wiktionary.org/wiki/Aton needs more space in column 2. Maybe it is a vertical alignment issue like for Sekhmet.
https://fr.wiktionary.org/wiki/Ptah and https://fr.wiktionary.org/wiki/gomme too.

lasconic added a commit to lasconic/ebook-reader-dict that referenced this issue Aug 24, 2021
lasconic added a commit to lasconic/ebook-reader-dict that referenced this issue Aug 24, 2021

2 participants