Fix fuzzer-found issues #126

Draft · wants to merge 6 commits into master
Conversation

@LaurenzV (Collaborator) commented Jul 9, 2024

Still in progress; more info to follow.

@LaurenzV (Collaborator, Author)

@RazrFalcon Do you know if there is a particular reason we don't use the unicode_normalization crate for composing/decomposing characters? Or has simply no one bothered to switch to it?
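For reference, a minimal sketch of the per-character API the unicode_normalization crate exposes (a hedged illustration only; whether it is low-level enough for what the shaper needs is exactly the open question here):

```rust
use unicode_normalization::char::{compose, decompose_canonical};

fn main() {
    // Canonical composition of a base character and a combining mark.
    assert_eq!(compose('e', '\u{0301}'), Some('é'));

    // Canonical decomposition, reported through a callback.
    let mut parts = Vec::new();
    decompose_canonical('é', |c| parts.push(c));
    assert_eq!(parts, vec!['e', '\u{0301}']);
}
```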

@LaurenzV (Collaborator, Author)

Looks like you consciously removed it: f0e5a766

However, there is a newer crate from the icu4x folks; is there anything speaking against using it directly? The reason I'm asking is that there is something wrong with our current table. 😅 I presume this could be fixed by improving the generation, but I don't see why we should do that if someone else has already done it. It depends on tinyvec, but has no other dependencies.
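If it helps with evaluating the switch, here is a hedged sketch of how the icu4x normalizer might be used (based on icu_normalizer 1.x with its default compiled data; treat the exact constructor and method names as assumptions to verify against whichever version we'd pin):

```rust
use icu_normalizer::DecomposingNormalizer;

fn main() {
    // NFD normalization using the normalizer's bundled data.
    let nfd = DecomposingNormalizer::new_nfd();
    // U+01D6 (ǖ) fully decomposes to U+0075, U+0308, U+0304.
    assert_eq!(nfd.normalize("\u{01D6}"), "u\u{0308}\u{0304}");
}
```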

@RazrFalcon (Owner)

As you can guess, I do not remember; it was a long time ago. But I do remember that we had some issues with external crates: either they weren't low-level enough or they produced output that differed from HB's.

If you can replace the embedded Unicode tables, I'm all for it.

In general, a rule of thumb when it comes to RB: if something looks strange, it's because we had to match HB's output.

@RazrFalcon (Owner)

Also remember that HB/RB has its own Unicode normalization algorithm. We cannot use a third-party crate for that.

@LaurenzV (Collaborator, Author)

> Also remember that HB/RB has its own Unicode normalization algorithm. We cannot use a third-party crate for that.

Yep, that I know. But perhaps now I know the reason why: it seems like HarfBuzz always decomposes a character into 2 units, while the unicode_normalization crate always decomposes as far as possible, which could be more than 2... So I'll have to see if I can figure it out.
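To make the difference concrete, a small hedged example with the unicode_normalization crate: U+01D6 (ǖ) decomposes in a single step to U+00FC + U+0304 (the two units a HarfBuzz-style decompose callback returns), but the crate's full NFD recurses one level further and yields three code points:

```rust
use unicode_normalization::UnicodeNormalization;

fn main() {
    // Full recursive canonical decomposition of U+01D6: three code points,
    // not the single-step pair U+00FC + U+0304.
    let full: Vec<char> = "\u{01D6}".nfd().collect();
    assert_eq!(full, vec!['u', '\u{0308}', '\u{0304}']);
}
```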

@LaurenzV (Collaborator, Author)

@behdad Is it expected that HB_NO_OT_RULESETS_FAST_PATH changes the shaping result? With the attached font, when running

hb-shape NotoSerifGujarati-VariableFont_wght.ttf --no-glyph-names --unicodes U+0ABE,U+0AA8,U+0ACD,U+200D,U+0AA4,U+0ABF

I get

[414=0+596|60=0+251|61=1+251|186=1+293|3=1+0|38=1+543]

while if I enable HB_NO_OT_RULESETS_FAST_PATH, I get

[414=0+596|60=0+251|102=1+251|186=1+293|3=1+0|38=1+543]

@RazrFalcon (Owner)

> it seems like HarfBuzz always decomposes a character into 2 units

Yes, this rings a bell.
