This package extends Unicode support for Cuis Smalltalk.
This is a work in progress.
It is far from complete and adds only a few features beyond those available in the core image. In particular, collation is not yet implemented.
Documentation is also somewhat lacking; I'm publishing this package now mainly because my Regexp package depends on it.
This package provides a user-level API for accessing Unicode character data and additionally supports:
- case folding,
- categories,
- character set detection, encoding, and decoding,
- normalization, and
- properties.
Note that normalization is already supported by the core image, so for normalization this package's main contribution is a more accessible API.
Additionally, because Unicode data is quite extensive, this package tries to keep its in-memory representation compact. Using sparse arrays (implemented as a combination of prefix trees and run arrays), we're able to store the 154,998 characters defined by Unicode 16 as only 40,091 distinct data points, a reduction of about 74% compared with a naive implementation that stores one entry per character.
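To make the sparse-array idea concrete, here is a minimal Python sketch of the general technique, not the package's Smalltalk implementation. It assumes a two-level trie (the simplest prefix tree over code-point bits): the high bits of a code point select a block, identical blocks are stored only once, and each block is run-length encoded. All names (`RunArray`, `SparseTable`, `category_table`, `stored_runs`) are hypothetical, and the counts this sketch produces are not expected to match the package's figures.

```python
import unicodedata
from bisect import bisect_right


class RunArray:
    """Run-length encoded array: one (start, value) pair per run of equal values."""

    def __init__(self, values):
        self.starts = []
        self.values = []
        for i, v in enumerate(values):
            if not self.values or self.values[-1] != v:
                self.starts.append(i)
                self.values.append(v)
        self.length = len(values)

    def __getitem__(self, i):
        if not 0 <= i < self.length:
            raise IndexError(i)
        # The run containing i is the last one whose start is <= i.
        return self.values[bisect_right(self.starts, i) - 1]


class SparseTable:
    """Hypothetical two-level trie over code points with shared, run-encoded leaves."""

    def __init__(self, lookup, size=0x110000, block_bits=8):
        self.block_bits = block_bits
        self.block_size = 1 << block_bits
        if size % self.block_size:
            raise ValueError("size must be a multiple of the block size")
        self.size = size
        self.index = []   # block number -> position in self.blocks
        self.blocks = []  # distinct leaf blocks, each a RunArray
        seen = {}
        for base in range(0, size, self.block_size):
            block = tuple(lookup(cp) for cp in range(base, base + self.block_size))
            if block not in seen:  # identical blocks are stored once and shared
                seen[block] = len(self.blocks)
                self.blocks.append(RunArray(block))
            self.index.append(seen[block])

    def __getitem__(self, cp):
        if not 0 <= cp < self.size:
            raise IndexError(cp)
        leaf = self.blocks[self.index[cp >> self.block_bits]]
        return leaf[cp & (self.block_size - 1)]

    def stored_runs(self):
        """Total number of runs kept across all distinct leaf blocks."""
        return sum(len(b.values) for b in self.blocks)


def category_table(limit=0x110000):
    """General_Category lookup built from Python's own unicodedata (hypothetical helper)."""
    return SparseTable(lambda cp: unicodedata.category(chr(cp)), size=limit)
```

For example, `category_table()[0x41]` returns `'Lu'`. Sharing identical blocks collapses long uniform regions such as unassigned or private-use ranges, while run encoding compresses stretches of equal values within a block. Note that the Unicode version behind `unicodedata` depends on the Python release.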
Bug reports and feedback are welcome, preferably as issues.