This repository has been archived by the owner on Oct 3, 2019. It is now read-only.

Convert Unicode strings to ASCII-only strings that can be converted back to the original strings while still trying to be human-readable.

schierlm/mnemonifier

mnemonifier

Description

Mnemonifier provides methods for converting Unicode strings (containing any Unicode characters including those not in the Basic Multilingual Plane) to ASCII-only strings that can be converted back to the original strings while still trying to be human-readable.

This is achieved by using RFC 1345 mnemonics (à becomes [a!]) and decomposition mappings (Ǹ becomes [N|!]). Anything not covered by either of these two lists is represented by its hex code (€ becomes [#20AC]).
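As a rough illustration of the hex-code fallback only, the following sketch encodes every non-ASCII code point, including those outside the Basic Multilingual Plane, as [#HEX]. The class and method names (HexFallbackSketch, encodeHexOnly) are hypothetical and not part of Mnemonifier. The RFC 1345 and decomposition tables are deliberately left out, bracket escaping is ignored, and the unpadded uppercase hex format is an assumption based on the [#20AC] example.

```java
import java.util.Locale;

public class HexFallbackSketch {

    // Hypothetical helper, not the library's API: ASCII passes through,
    // every other code point becomes [#HEX] (uppercase, no padding assumed).
    public static String encodeHexOnly(String input) {
        StringBuilder out = new StringBuilder();
        // codePoints() yields full code points, so surrogate pairs
        // (characters outside the BMP) are handled as one unit.
        input.codePoints().forEach(cp -> {
            if (cp < 0x80) {
                out.appendCodePoint(cp);
            } else {
                out.append("[#")
                   .append(Integer.toHexString(cp).toUpperCase(Locale.ROOT))
                   .append(']');
            }
        });
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(encodeHexOnly("5 \u20AC"));     // prints 5 [#20AC]
        System.out.println(encodeHexOnly("\uD83D\uDE00")); // U+1F600, prints [#1F600]
    }
}
```

A real encoder would consult the mnemonic and decomposition tables first and fall back to this form only when neither applies.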

To enable round-trip conversion, any square brackets inside the original string are replaced by [[] and []], respectively.
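The bracket rule can be sketched in isolation as follows. This is a minimal illustration, not the library's code: BracketEscapeSketch, escapeBrackets and unescapeBrackets are hypothetical names, and the sketch handles only the two bracket escapes, leaving any other text untouched.

```java
public class BracketEscapeSketch {

    // Replace each literal '[' with "[[]" and each literal ']' with "[]]".
    // A single pass avoids re-escaping brackets that were just inserted.
    public static String escapeBrackets(String input) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c == '[') {
                out.append("[[]");
            } else if (c == ']') {
                out.append("[]]");
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    // Reverse the escaping by scanning for the two three-character tokens.
    public static String unescapeBrackets(String encoded) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < encoded.length()) {
            if (encoded.startsWith("[[]", i)) {
                out.append('[');
                i += 3;
            } else if (encoded.startsWith("[]]", i)) {
                out.append(']');
                i += 3;
            } else {
                out.append(encoded.charAt(i));
                i++;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String escaped = escapeBrackets("a[b]");
        System.out.println(escaped);                   // prints a[[]b[]]
        System.out.println(unescapeBrackets(escaped)); // prints a[b]
    }
}
```

Because every bracket in the encoded output belongs to a complete token, a decoder can tell an escaped literal bracket apart from the start of a mnemonic or hex token.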

The decoder is available in two versions. The strict version throws an exception if any square bracket is not encoded correctly; the lax version (for cases where the user is able to edit or type encoded strings) passes such brackets through unchanged.
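The difference between the two modes might look roughly like the sketch below. It is an assumption-laden illustration rather than the library's decoder: DecoderSketch and decode are hypothetical names, the exception type (IllegalArgumentException) is a guess since the text only says "an exception", and only the bracket escapes and plain [#HEX] tokens are understood (mnemonics, decompositions and the {info} suffix are not).

```java
public class DecoderSketch {

    // Hypothetical decoder: strict == true rejects malformed brackets,
    // strict == false copies them to the output unchanged.
    public static String decode(String encoded, boolean strict) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < encoded.length()) {
            char c = encoded.charAt(i);
            if (encoded.startsWith("[[]", i)) {          // escaped '['
                out.append('[');
                i += 3;
                continue;
            }
            if (encoded.startsWith("[]]", i)) {          // escaped ']'
                out.append(']');
                i += 3;
                continue;
            }
            if (encoded.startsWith("[#", i)) {           // hex token
                int end = encoded.indexOf(']', i);
                Integer cp = end < 0 ? null : parseHex(encoded.substring(i + 2, end));
                if (cp != null) {
                    out.appendCodePoint(cp);
                    i = end + 1;
                    continue;
                }
            }
            if ((c == '[' || c == ']') && strict) {
                throw new IllegalArgumentException("Malformed bracket at index " + i);
            }
            out.append(c);                               // lax mode, or ordinary character
            i++;
        }
        return out.toString();
    }

    // Returns the code point, or null if the text is not a valid hex code point.
    private static Integer parseHex(String hex) {
        try {
            int cp = Integer.parseInt(hex, 16);
            return Character.isValidCodePoint(cp) ? cp : null;
        } catch (NumberFormatException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(decode("a[[]b[]]", true)); // prints a[b]
        System.out.println(decode("a[b", false));     // prints a[b
    }
}
```

Calling decode("a[b", true) in this sketch throws, which mirrors the strict behavior described above.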

Subclasses can override the getCodepointInfo method to provide additional information about an unencodable character, which is added in curly braces (for example [#20AC{EUR}]).
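The override hook could be pictured along these lines. Everything here is a hypothetical stand-in: the base class HexEncoder, its encodeCodepoint method, and the signature of getCodepointInfo (an int code point in, a String or null out) are assumptions made for illustration and may not match the actual Mnemonifier class.

```java
import java.util.Locale;

public class CodepointInfoSketch {

    // Hypothetical stand-in for the encoder; not the library's real class.
    public static class HexEncoder {

        // Assumed hook: return extra info for a code point, or null for none.
        protected String getCodepointInfo(int codepoint) {
            return null;
        }

        // Builds [#HEX] or, when info is available, [#HEX{info}].
        public String encodeCodepoint(int codepoint) {
            String hex = Integer.toHexString(codepoint).toUpperCase(Locale.ROOT);
            String info = getCodepointInfo(codepoint);
            return info == null
                    ? "[#" + hex + "]"
                    : "[#" + hex + "{" + info + "}]";
        }
    }

    // Example subclass that annotates the euro sign only.
    public static class CurrencyInfoEncoder extends HexEncoder {
        @Override
        protected String getCodepointInfo(int codepoint) {
            return codepoint == 0x20AC ? "EUR" : null;
        }
    }

    public static void main(String[] args) {
        System.out.println(new HexEncoder().encodeCodepoint(0x20AC));          // prints [#20AC]
        System.out.println(new CurrencyInfoEncoder().encodeCodepoint(0x20AC)); // prints [#20AC{EUR}]
    }
}
```

The info text is meant purely as a hint for human readers of the encoded string.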

Unidecode integration

mnemonifier-unidecode uses Unidecode to provide codepoint info for codepoints that are not covered by RFC1345 or by decomposition mapping.
