utils: <math> formulas rendered to SVGs without using LaTeX tools #1432
Sourcery Code Quality Report
❌ Merging this PR will decrease code quality in the affected files by 1.86%.
Here are some functions in these files that still need a tune-up:
Question 1: Should we use an LRU cache on
Question 2: Should we store optimized SVGs for later runs? That way we could run tests without calling the REST API.
lru_cache: My understanding is that it caches the results of a function when the function is called several times with the same arguments during a program run. It does not persist between runs. I believe we don't often call this function with the same argument, right? So I would say lru_cache will not really speed up the process, but hey, we could try it.
Store SVGs: if it doesn't make the test moot, sure, we could store them.
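To illustrate the point about lru_cache, here is a minimal sketch. `render_formula` is a hypothetical stand-in for the real conversion function (its name and body are assumptions, not code from this PR). It shows that repeated calls with the same argument are served from memory within a single run; nothing is written to disk, so a new run starts with an empty cache.

```python
from functools import lru_cache

calls = 0  # counts how many times the function body really runs


@lru_cache(maxsize=None)
def render_formula(formula: str) -> str:
    """Hypothetical stand-in for an expensive formula-to-SVG conversion."""
    global calls
    calls += 1
    return f"<svg><!-- {formula} --></svg>"


render_formula("V^n")
render_formula("V^n")  # same argument: answered from the in-memory cache
render_formula("x^2")  # new argument: the body runs again

info = render_formula.cache_info()
print(calls, info.hits, info.misses)  # prints "2 1 2"
```

If most formulas are only seen once per run, the hit count stays near zero and the cache brings little.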
wikidict/utils.py
    svg_optimized = scourString(svg_raw, options=SCOUR_OPTS)
    return (
        f'<img src="data:image/svg+xml;utf8,{quote(svg_optimized)}"/>'
Did you try embedding the SVG directly, without the img tag?
Not yet, good idea 👍
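For reference, a small sketch of the two embedding options being discussed: the percent-encoded data URI inside an img tag (as in the reviewed snippet) versus the SVG markup placed directly in the HTML. The sample SVG and the helper names `as_img_tag` and `as_inline` are made up for illustration.

```python
from urllib.parse import quote

# Made-up sample; real formulas produce much larger SVGs.
svg = (
    '<svg xmlns="http://www.w3.org/2000/svg" width="10" height="10">'
    '<circle cx="5" cy="5" r="4"/></svg>'
)


def as_img_tag(svg: str) -> str:
    """Embed the SVG as a percent-encoded data URI inside an <img> tag."""
    return f'<img src="data:image/svg+xml;utf8,{quote(svg)}"/>'


def as_inline(svg: str) -> str:
    """Embed the SVG markup directly; no encoding is needed."""
    return svg


img_html = as_img_tag(svg)
inline_html = as_inline(svg)
print(len(img_html), len(inline_html))
```

Percent-encoding makes the img variant longer than the raw markup, which is one practical difference between the two approaches.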
Maybe "cercle unité" contains raw text that scales badly. Will check the SVG content.
Checked English words: the display is just perfect using <svg>. Let's hope we can fix the display issue for "cercle unité" (and potentially other words, but we still made good progress).
Hm interesting ... Saving the SVG to a file: it is displayed properly via the OS viewer.
OK, it's Scour that is doing something with IDs, and browsers don't like it.
I was thinking about duplicate formulas, but the gain would be close to zero in the end, mostly because the cache won't be shared across locales. Let's forget that idea.
It would make sense to store them: 2 HTTP calls saved for each formula. I'll implement it right now.
Root cause was shortening IDs via Scour.
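The thread does not spell out why shortened IDs upset browsers. One plausible reading, and it is only an assumption, is that short IDs such as `a` or `b` repeat across several SVGs embedded in the same HTML page, so internal references can resolve to the wrong element, while a standalone SVG file has no such clash. The sketch below uses made-up SVG fragments and a hypothetical helper `duplicate_ids` to detect that situation.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def duplicate_ids(svgs):
    """Return the IDs that appear in more than one of the given SVG documents."""
    counts = Counter()
    for svg in svgs:
        root = ET.fromstring(svg)
        # iter() walks the root element and all of its descendants.
        ids = {el.get("id") for el in root.iter() if el.get("id")}
        counts.update(ids)  # each document contributes an ID at most once
    return sorted(i for i, n in counts.items() if n > 1)


# Made-up fragments mimicking what ID shortening could produce.
svg_1 = '<svg xmlns="http://www.w3.org/2000/svg"><defs><path id="a" d="M0 0h1"/></defs></svg>'
svg_2 = (
    '<svg xmlns="http://www.w3.org/2000/svg"><defs>'
    '<path id="a" d="M0 0v1"/><path id="b" d="M0 0h2"/></defs></svg>'
)

print(duplicate_ids([svg_1, svg_2]))  # prints "['a']"
```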
@lasconic ready for final review. Only one tiny concern about the potential size of
Let's see if other formulas are cut off too. I would say it's quite correct now (far better than what we used to have).
Do you want to dig into scaling down the SVGs, maybe?
I updated the cache with all formulas used in french. I'll update with all locales shortly.
cb61ade fixed the issue where we altered
I'm finished with the cache. Maybe before merging we could find a way to compress
Maybe not a good idea. The file is only 2 MiB, we can live with that.
Ah, about the cut-off, I didn't realize a big part of the picture was hidden. That's a shame.
Instead of MD5, maybe SHA1? It seems Wikimedia does this: https://github.com/wikimedia/restbase/blob/ecef17bda6f4efc0d6e187fb05b1eeb389bf7120/sys/post_data.js#L13
SVGZ is gzip, no? Doesn't it perform well in terms of speed?
For the cut-off, it's a pity, but the real estate on Kobo is so small... and we can't change the margins... We could change the font size, but it would need a style for every word.
Weird, I can't replicate the hash.

    from hashlib import sha1

    formula = "V^n"
    print(sha1(formula.encode()).hexdigest())

It gives
I haven't tried yet, but the issue is not about performance but storage on our side. Maybe not even an issue, I just raised the concern.
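On the SVGZ question: SVGZ is simply gzip-compressed SVG, so the storage gain can be estimated with the standard library. The SVG below is a made-up, highly repetitive sample, so the ratio it shows says nothing about the real cache file.

```python
import gzip

# Made-up, repetitive SVG; real formula SVGs will compress differently.
svg = (
    '<svg xmlns="http://www.w3.org/2000/svg">'
    + '<path d="M0 0h10v10h-10z"/>' * 50
    + "</svg>"
).encode()

svgz = gzip.compress(svg)

# Decompressing gives back the exact original bytes.
restored = gzip.decompress(svgz)
print(len(svg), len(svgz), restored == svg)
```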
What about we live with it? :)
They compute the hash on a normalized JSON string that includes the type and the formula: https://github.com/wikimedia/restbase/blob/ecef17bda6f4efc0d6e187fb05b1eeb389bf7120/sys/mathoid.js#L33
No luck so far. I would just leave the cache as-is until it becomes problematic.
I could finally replicate the hash. We need to use the normalized TeX, which is available in the 1st call (formula is

    {
      "success": true,
      "checked": "V^{n}",
      "requiredPackages": [],
      "identifiers": ["V", "n"],
      "endsWithDot": false
    }

(See the

Then, we need to use the full query, as you found out, but also without spaces between items:

    import json
    from hashlib import sha1

    d = json.dumps({"q": "V^{n}", "type": "tex"}, indent=None)
    d = d.replace(" ", "")
    print(sha1(d.encode()).hexdigest())

I know, the POC is ugly, and will not work for all cases :)
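A slightly more robust variant of the POC, as a sketch: `json.dumps` accepts `separators=(",", ":")`, which removes the spaces between items without touching spaces inside the formula itself. The function names `compact_payload`, `request_hash` and `poc_payload` are hypothetical, and whether the result matches Wikimedia's hash for every formula (non-ASCII input, for instance) has not been verified here.

```python
import json
from hashlib import sha1


def compact_payload(normalized_tex: str, kind: str = "tex") -> str:
    """Serialize the query with no spaces between items (hypothetical helper)."""
    return json.dumps({"q": normalized_tex, "type": kind}, separators=(",", ":"))


def request_hash(normalized_tex: str, kind: str = "tex") -> str:
    """SHA1 hex digest of the compact JSON query."""
    return sha1(compact_payload(normalized_tex, kind).encode()).hexdigest()


def poc_payload(normalized_tex: str) -> str:
    """The POC approach: default serialization, then strip every space."""
    return json.dumps({"q": normalized_tex, "type": "tex"}).replace(" ", "")


# Same payload for a formula without spaces...
print(compact_payload("V^{n}") == poc_payload("V^{n}"))  # True
# ...but the POC also strips spaces inside the formula itself.
print(compact_payload("a b") == poc_payload("a b"))  # False
print(request_hash("V^{n}"))
```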
Well, we "just" need to fix the display issue, then we will be good to merge.
Which display issue? If it's the cut-off formula, then I don't think there is a fix... The screen of the Kobo is just too small for some formulas. If we make them smaller, they will become unreadable.
I was hoping to have such CSS hack: I will still try one thing or two. In the event that it's not possible to fix, do you propose to close the PR and forget the SVG stuff?
Formulas are not so present in dictionaries. If some are cut, I'm OK with that.
Fixes #1427.
Fixes #1198.
Closes #1209.
Tests to pass before merging (the rendering is good, but not the display):
$ python -m wikidict fr --gen-dict "cercle unité" --output issue-1427
$ python -m wikidict en --gen-dict "Wallis product,primitive recursion,Horner's rule" --output issue-1427