Unicode Transforms


Fast Unicode 12.1.0 normalization in Haskell (NFC, NFKC, NFD, NFKD).

What is normalization?

Unicode characters with adornments (e.g. Á) can be represented in two different forms: as a single composed character (U+00C1 = Á) or as a sequence of decomposed characters (U+0041 (A) followed by U+0301 ( ́ ), which together render as Á). The two forms are encoded as different byte sequences, but to a human reader they look exactly the same.
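To make the difference concrete, the following sketch uses only `base` (no normalization library) to list the code points of both forms and compare them directly. The names `composed`, `decomposed`, and `codePoints` are illustrative and do not come from this package.

```haskell
import Data.Char (ord, toUpper)
import Numeric (showHex)

-- Composed form: the single code point U+00C1.
composed :: String
composed = "\193"

-- Decomposed form: U+0041 (A) followed by U+0301 (combining acute accent).
decomposed :: String
decomposed = "\65\769"

-- Illustrative helper: label each character as U+XXXX (at least 4 hex digits).
codePoints :: String -> [String]
codePoints = map label
  where
    label c = "U+" ++ pad (map toUpper (showHex (ord c) ""))
    pad s = replicate (4 - length s) '0' ++ s

main :: IO ()
main = do
  print (codePoints composed)     -- ["U+00C1"]
  print (codePoints decomposed)   -- ["U+0041","U+0301"]
  print (composed == decomposed)  -- False: plain equality sees two different strings
```

Plain `==` on `String` compares code points one by one, so it cannot tell that the two values denote the same visible character.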

A plain byte-wise comparison may report that two strings are different even though they are equivalent. To compare strings for equivalence, we first need to convert both of them to the same normalized form, using data from the Unicode Character Database. For example:

>> :set -XOverloadedStrings
>> import Data.Text.Normalize
>> normalize NFC "\193" == normalize NFC "\65\769"
True
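In a compiled program, the same comparison could be wrapped in a small helper. This is only a sketch: `canonicallyEqual` is a hypothetical name, not a function exported by this package. It relies solely on `normalize` and the `NFC` mode shown in the session above, and assumes the `text` and `unicode-transforms` packages are available.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Text (Text)
import Data.Text.Normalize (NormalizationMode (NFC), normalize)

-- Hypothetical helper (not part of the library): two texts are treated as
-- equal when their NFC-normalized forms are identical.
canonicallyEqual :: Text -> Text -> Bool
canonicallyEqual a b = normalize NFC a == normalize NFC b

main :: IO ()
main = do
  -- Direct comparison sees different code point sequences.
  print (("\193" :: Text) == "\65\769")
  -- Comparison after normalization treats the two forms as equal.
  print (canonicallyEqual "\193" "\65\769")
```

Normalizing both arguments to the same form is what matters here. NFC is used only because the example above uses it.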

Contributing

Please use https://github.com/harendra-kumar/unicode-transforms to raise issues or send pull requests.
