Fix fuzzer-found issues #126

Draft · wants to merge 6 commits into master
Conversation

@LaurenzV (Collaborator) commented Jul 9, 2024

Still in progress; more info to follow.

@LaurenzV (Collaborator, Author)

@RazrFalcon Do you know if there is a particular reason we don't use the unicode_normalization crate for composing/decomposing characters? Or has simply no one bothered to switch to it?
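For reference, a minimal sketch of the per-character API the unicode_normalization crate exposes (a hedged illustration only; whether it is low-level enough for what the shaper needs is exactly the open question here):

```rust
use unicode_normalization::char::{compose, decompose_canonical};

fn main() {
    // Canonical composition of a base character and a combining mark.
    assert_eq!(compose('e', '\u{0301}'), Some('é'));

    // Canonical decomposition, reported through a callback.
    let mut parts = Vec::new();
    decompose_canonical('é', |c| parts.push(c));
    assert_eq!(parts, vec!['e', '\u{0301}']);
}
```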

@LaurenzV (Collaborator, Author)

Looks like you consciously removed it: f0e5a766

However, there is a newer crate from the icu4x folks; is there anything speaking against using it directly? The reason I'm asking is that there is something wrong with our current table. 😅 I presume this could be fixed by improving the generation, but I don't see why we should do that if someone else has already done it. It depends on tinyvec, but has no other dependencies.
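If it helps with evaluating the switch, here is a hedged sketch of how the icu4x normalizer might be used (based on icu_normalizer 1.x with its default compiled data; treat the exact constructor and method names as assumptions to verify against whichever version we'd pin):

```rust
use icu_normalizer::DecomposingNormalizer;

fn main() {
    // NFD normalization using the normalizer's bundled data.
    let nfd = DecomposingNormalizer::new_nfd();
    // U+01D6 (ǖ) fully decomposes to U+0075, U+0308, U+0304.
    assert_eq!(nfd.normalize("\u{01D6}"), "u\u{0308}\u{0304}");
}
```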

@RazrFalcon (Owner)

As you can guess, I do not remember; it was a long time ago. But I do remember that we had some issues with external crates: either they weren't low-level enough or they produced output that differed from HB's.

If you can replace the embedded Unicode tables, I'm all for it.

In general, a rule of thumb when it comes to RB: if something looks strange, it's because we had to match HB's output.

@RazrFalcon (Owner)

Also remember that HB/RB has its own Unicode normalization algorithm. We cannot use a third-party crate for that.

@LaurenzV (Collaborator, Author)

> Also remember that HB/RB has its own Unicode normalization algorithm. We cannot use a third-party crate for that.

Yep, that I know. But perhaps now I know the reason why: it seems like HarfBuzz always decomposes a character into 2 units, while the unicode_normalization crate always decomposes as far as possible, which could be more than 2... So I'll have to see if I can figure it out.
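To make the difference concrete, a small hedged example with the unicode_normalization crate: U+01D6 (ǖ) decomposes in a single step to U+00FC + U+0304 (the two units a HarfBuzz-style decompose callback returns), but the crate's full NFD recurses one level further and yields three code points:

```rust
use unicode_normalization::UnicodeNormalization;

fn main() {
    // Full recursive canonical decomposition of U+01D6: three code points,
    // not the single-step pair U+00FC + U+0304.
    let full: Vec<char> = "\u{01D6}".nfd().collect();
    assert_eq!(full, vec!['u', '\u{0308}', '\u{0304}']);
}
```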

@LaurenzV (Collaborator, Author)

@behdad Is it expected that HB_NO_OT_RULESETS_FAST_PATH changes the shaping result? With the attached font, when running

hb-shape NotoSerifGujarati-VariableFont_wght.ttf --no-glyph-names --unicodes U+0ABE,U+0AA8,U+0ACD,U+200D,U+0AA4,U+0ABF

I get

[414=0+596|60=0+251|61=1+251|186=1+293|3=1+0|38=1+543]

while if I enable HB_NO_OT_RULESETS_FAST_PATH, I get

[414=0+596|60=0+251|102=1+251|186=1+293|3=1+0|38=1+543]

@RazrFalcon (Owner)

> it seems like HarfBuzz always decomposes a character into 2 units

Yes, this rings a bell.
