Encode and decode text using UTF-9.
On April 1st 2005, IEEE released the RFC4042 “UTF-9 and UTF-18 Efficient Transformation Formats of Unicode” :
The current representation formats for Unicode (UTF-7, UTF-8, UTF-16) are not storage and computation efficient on platforms that utilize the 9 bit nonet as a natural storage unit instead of the 8 bit octet.
Since there are not so many architecture that use 9 bit nonets as natural storage units and the release date was on April Fools’ Day, the beautiful UTF-9 was forgotten and no python implementation is available.
This python module is here to fill this gap! ;)
There are only two functions:
utf9encode(string)
: takes a string and returns a utf9-encoded version.utf9decode(data)
: takes utf9-encoded data and returns the corresponding string.
>>> import utf9 >>> encoded = utf9.utf9encode(u'ႹЄLᒪo, 🌍ǃ') >>> print repr(encoded) 'pxe0xb7-x0c!1xc3x92xd5x1bxc5x82x07nx83xxedxdecXxf80' >>> print utf9.utf9decode(encoded) ႹЄLᒪo, 🌍ǃ