sanskrit normalization #20

eroux · 2019-01-20T08:04:25Z

For indexing purposes, it might be worth applying some simple normalization to Sanskrit, mainly reducing r + geminate to r + single consonant. There are many examples in the canonical collections, for instance:

རྨྨ --> རྨ
རྦྦ -> རྦ
རྒྒ -> རྒ
etc.
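
A minimal sketch of how this folding could be done with a regular expression, working directly on Unicode code points. The function name `degeminate_after_ra` is hypothetical, and treating the fixed-form RA (U+0F6A) as a possible head letter is an assumption, not something decided in this issue:

```python
import re

# Head letter RA (U+0F62) or, as an assumption, fixed-form RA (U+0F6A),
# followed by the same subjoined consonant (U+0F90..U+0FBC) written twice.
_R_GEMINATE = re.compile(r"([\u0F62\u0F6A])([\u0F90-\u0FBC])\2")


def degeminate_after_ra(text: str) -> str:
    """Collapse ra + doubled subjoined consonant to ra + single consonant."""
    return _R_GEMINATE.sub(r"\1\2", text)


# ra + subjoined ma + subjoined ma  becomes  ra + subjoined ma
assert degeminate_after_ra("\u0F62\u0FA8\u0FA8") == "\u0F62\u0FA8"
```

The backreference `\2` only matches when the two subjoined letters are identical, so ordinary stacks such as ra + subjoined ga + subjoined ya are left alone.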


eroux · 2019-06-26T09:47:49Z

The graphic variants could also be normalized:

0FB0 --> 0F71
0FBB --> 0FB1
0FBC --> 0FB2
0FBA --> 0FAD
0F6A --> 0F62
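
One possible way to apply the mapping above is a plain code point translation table. This is only a sketch; the names `GRAPHIC_VARIANTS` and `fold_graphic_variants` are made up for illustration:

```python
# Code point -> code point, taken from the list above.
GRAPHIC_VARIANTS = {
    0x0FB0: 0x0F71,  # subjoined letter -A        -> vowel sign AA
    0x0FBB: 0x0FB1,  # subjoined fixed-form YA    -> subjoined YA
    0x0FBC: 0x0FB2,  # subjoined fixed-form RA    -> subjoined RA
    0x0FBA: 0x0FAD,  # subjoined fixed-form WA    -> subjoined WA
    0x0F6A: 0x0F62,  # fixed-form RA              -> RA
}


def fold_graphic_variants(text: str) -> str:
    """Replace each graphic variant with its regular counterpart."""
    return text.translate(GRAPHIC_VARIANTS)


# ka + subjoined -A  becomes  ka + vowel sign AA
assert fold_graphic_variants("\u0F40\u0FB0") == "\u0F40\u0F71"
```

Since every mapping is one code point to one code point, `str.translate` is enough and the order of application does not matter.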

eroux · 2019-07-14T19:29:52Z

as well as either ignoring or normalizing:

0F7E
0F82
0F83
0F86
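
A rough sketch of both options. The "strip" mode simply drops the four signs. The "fold" mode is purely an assumption for illustration: this issue does not say what the signs should be normalized to, so the sketch arbitrarily folds U+0F82 and U+0F83 to U+0F7E and leaves U+0F86 untouched. All names are hypothetical:

```python
SIGNS = (0x0F7E, 0x0F82, 0x0F83, 0x0F86)

# Option 1: ignore the signs entirely (map each to None, i.e. delete it).
STRIP_SIGNS = {cp: None for cp in SIGNS}

# Option 2 (ASSUMPTION, not specified in this issue): fold the two
# candrabindu-like signs to the plain anusvara-like sign U+0F7E.
FOLD_SIGNS = {0x0F82: 0x0F7E, 0x0F83: 0x0F7E}


def normalize_signs(text: str, mode: str = "strip") -> str:
    """Either delete the signs ("strip") or fold them ("fold")."""
    if mode == "strip":
        return text.translate(STRIP_SIGNS)
    if mode == "fold":
        return text.translate(FOLD_SIGNS)
    raise ValueError(f"unknown mode: {mode!r}")


assert normalize_signs("\u0F68\u0F7E") == "\u0F68"
assert normalize_signs("\u0F68\u0F83", mode="fold") == "\u0F68\u0F7E"
```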

eroux · 2020-12-07T14:43:11Z

Perhaps some sandhis like ny+ts -> ny+dz...

And also normalizing ts -> c, tsh -> ch, dz -> j? This would require some care, as it shouldn't be done for Standard Tibetan.
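
A hedged sketch of the ts/tsh/dz folding. How to tell Sanskrit from Standard Tibetan is left open here, so the sketch just takes an `is_sanskrit` flag from the caller; that flag, like the function and table names, is hypothetical:

```python
# ts -> c, tsh -> ch, dz -> j, for both base and subjoined letters.
AFFRICATE_FOLD = {
    0x0F59: 0x0F45,  # TSA            -> CA
    0x0F5A: 0x0F46,  # TSHA           -> CHA
    0x0F5B: 0x0F47,  # DZA            -> JA
    0x0FA9: 0x0F95,  # subjoined TSA  -> subjoined CA
    0x0FAA: 0x0F96,  # subjoined TSHA -> subjoined CHA
    0x0FAB: 0x0F97,  # subjoined DZA  -> subjoined JA
}


def fold_affricates(text: str, is_sanskrit: bool) -> str:
    """Fold ts/tsh/dz to c/ch/j, but only for text flagged as Sanskrit.

    The caller decides what counts as Sanskrit; this sketch does no
    detection of its own and leaves Standard Tibetan untouched.
    """
    if not is_sanskrit:
        return text
    return text.translate(AFFRICATE_FOLD)


assert fold_affricates("\u0F59", is_sanskrit=True) == "\u0F45"
assert fold_affricates("\u0F59", is_sanskrit=False) == "\u0F59"
```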
