Unicode normalisation #45

ngawangtrinley · 2019-06-17T23:23:48Z

Add a Unicode normalization method for bad or ambiguous Unicode:
https://github.com/Esukhia/derge-tengyur/blob/c45da1faabef28ff0b037557499bf07946e5c3ab/scripts/error-report.py#L44

eroux · 2019-06-18T05:23:05Z

Unfortunately, it's not automatable; every case is different.

For example, sometimes two drengbu appear where a double drengbu was intended, and sometimes where a single drengbu was intended. The same goes for unattached diacritics: sometimes they should attach to what follows, sometimes to what precedes. I think I've fixed everything that can be fixed fully automatically, except perhaps punctuation... unless you had something specific in mind?
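Since these cases can't be repaired automatically, one option is to only report them for manual review. Below is a minimal sketch of that idea, not the code in error-report.py. It assumes "drengbu" means U+0F7A (TIBETAN VOWEL SIGN E), and that an "unattached diacritic" is any combining mark not preceded by a letter or another mark. The function names are hypothetical.

```python
import re
import unicodedata

# Assumption: "drengbu" is U+0F7A (TIBETAN VOWEL SIGN E).
# A run of two or more is ambiguous: it may stand for one drengbu
# or for the double drengbu, so it is only reported, never rewritten.
_REPEATED_DRENGBU = re.compile("\u0f7a{2,}")


def find_repeated_drengbu(text):
    """Return (offset, matched run) for each run of two or more drengbu."""
    return [(m.start(), m.group()) for m in _REPEATED_DRENGBU.finditer(text)]


def find_unattached_marks(text):
    """Return offsets of combining marks with no letter or mark before them.

    Assumption: a mark counts as unattached when it starts the text or
    follows a character that is neither a letter (L*) nor a mark (M*).
    """
    offsets = []
    for i, ch in enumerate(text):
        if not unicodedata.category(ch).startswith("M"):
            continue
        prev_cat = unicodedata.category(text[i - 1]) if i > 0 else ""
        if not prev_cat.startswith(("L", "M")):
            offsets.append(i)
    return offsets
```

Both functions return positions only. Deciding whether a mark belongs to the preceding or following stack is left to a human reader.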

drupchen · 2019-07-02T13:43:57Z

as a start: https://github.com/Esukhia/adarsha2esukhia/blob/master/adarsha2esukhia.py#L90

eroux · 2019-07-03T09:03:54Z

Another idea: convert 0F6A to 0F62 when there is no subjoined letter, or when the subjoined letter doesn't change the shape of the rago (like nya).
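A rough sketch of this rule follows. It is one reading of the suggestion, not code from either linked script. It assumes nya (U+0F99) is the only subjoined letter that leaves the rago shape unchanged, and that subjoined letters are the code points U+0F90 to U+0FBC. The function name is hypothetical.

```python
import re

RA = "\u0f62"  # TIBETAN LETTER RA

# Match U+0F6A (TIBETAN LETTER FIXED-FORM RA) unless it is followed by a
# subjoined letter other than nya.
# Assumptions: subjoined letters are U+0F90..U+0FBC, and nya (U+0F99) is
# the only one that does not change the shape of the rago.
_FIXED_RA = re.compile("\u0f6a(?![\u0f90-\u0f98\u0f9a-\u0fbc])")


def normalize_fixed_form_ra(text):
    """Replace U+0F6A with U+0F62 where the fixed form is redundant."""
    return _FIXED_RA.sub(RA, text)
```

A fixed-form ra followed by any other subjoined letter is left untouched, since the fixed form may be intentional there.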

drupchen · 2019-11-01T07:25:41Z

closed here because this issue pertains to pybo

OpenPecha/pybo#1

ngawangtrinley assigned drupchen Jun 17, 2019

drupchen mentioned this issue Nov 1, 2019

Unicode normalization OpenPecha/pybo#1

Open

drupchen closed this as completed Nov 1, 2019
