Cuis-Smalltalk-Unicode

This package extends Unicode support for Cuis Smalltalk.

Status

This is a work in progress.

It is far from complete, and adds only a few features beyond those available in the core image. In particular, collation is not yet implemented.

Documentation is also somewhat lacking, since I'm publishing this package now mainly to support my Regexp package.

License

MIT License

Features

This package provides a user-level API for accessing Unicode character data and additionally supports:

case folding,
categories,
character set detection, encoding, and decoding,
normalization, and
properties.

Note that normalization is already supported by the core image, so the main contribution here is in providing a more accessible API.
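To illustrate what three of the listed operations compute, here is a short sketch using Python's standard `unicodedata` module as a stand-in. It does not show this package's Smalltalk API. The helper name `describe` is hypothetical. NFC composes characters, NFD decomposes them, case folding maps text to a caseless form (so `ß` becomes `ss`), and each character has a general category such as `Ll` (lowercase letter) or `Mn` (nonspacing mark).

```python
import unicodedata


def describe(text):
    """Return (NFC form, NFD form, case-folded form, categories of NFD chars)."""
    nfc = unicodedata.normalize("NFC", text)   # composed: é is one code point
    nfd = unicodedata.normalize("NFD", text)   # decomposed: e + U+0301
    folded = nfc.casefold()                    # full case folding: ß -> ss
    categories = [unicodedata.category(ch) for ch in nfd]
    return nfc, nfd, folded, categories


nfc, nfd, folded, cats = describe("Stra\u00dfe Caf\u00e9")
print(len(nfc), len(nfd))   # 11 12
print(folded)               # strasse café
print(cats[-2:])            # ['Ll', 'Mn']
```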

Additionally, as Unicode data is quite extensive, this package makes an effort to keep the in-memory representation compact. Using sparse arrays (implemented as a combination of prefix trees and run arrays), we're able to store the 154,998 characters defined by Unicode 16.0 as only 40,091 distinct data points. That is about 26% of the entries a naive one-entry-per-character table would need, a reduction of better than 70% (roughly 74%).
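The storage idea above can be sketched in Python. This is not the package's Smalltalk implementation, whose exact layout this README does not describe. The class names `RunArray` and `SparseTable`, the two-level split, and the 256-entry block size (`shift=8`) are illustrative assumptions. The high bits of a code point select a leaf through an index (the prefix-tree level), identical leaves are stored only once, and each leaf compresses runs of equal values into a single data point.

```python
from bisect import bisect_right


class RunArray:
    """Stores a sequence as runs of equal values; lookup is a binary search."""

    def __init__(self, values):
        self.starts = []   # index where each run begins
        self.values = []   # value shared by every element of that run
        for i, v in enumerate(values):
            # Open a new run only when the value changes.
            if not self.values or self.values[-1] != v:
                self.starts.append(i)
                self.values.append(v)
        self.length = len(values)

    def __len__(self):
        return self.length

    def __getitem__(self, i):
        if not 0 <= i < self.length:
            raise IndexError(i)
        # The run containing i is the last one starting at or before i.
        return self.values[bisect_right(self.starts, i) - 1]


class SparseTable:
    """Two-level prefix tree over code points with deduplicated run-array leaves."""

    def __init__(self, lookup, size=0x110000, shift=8):
        block = 1 << shift
        if size % block:
            raise ValueError("size must be a multiple of the block size")
        self.shift = shift
        self.mask = block - 1
        self.index = []    # high bits of a code point -> leaf number
        self.leaves = []   # distinct leaves, each stored as a RunArray
        seen = {}
        for base in range(0, size, block):
            values = tuple(lookup(cp) for cp in range(base, base + block))
            if values not in seen:          # identical blocks share one leaf
                seen[values] = len(self.leaves)
                self.leaves.append(RunArray(values))
            self.index.append(seen[values])

    def __getitem__(self, cp):
        leaf = self.leaves[self.index[cp >> self.shift]]
        return leaf[cp & self.mask]

    def data_points(self):
        """Number of runs actually stored across all distinct leaves."""
        return sum(len(leaf.values) for leaf in self.leaves)
```

For example, `SparseTable(lambda cp: unicodedata.category(chr(cp)))` would build a general-category table over the whole code space. Its `data_points()` count should not be expected to match the 40,091 figure above, which depends on the package's own data and layout.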

Contributing

Issues are preferred.

Repository: coder5506/Cuis-Smalltalk-Unicode