
Let's make complex Unicode stuff a piece of cake in the D programming language.

Project status aka TODO list

[DONE] Data structures for Unicode

[DONE] Codepoint set via inversion list

  Ended up as two data structures: the more compact RleBitSet and the generally faster InversionList.
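For readers new to the technique, here is a minimal sketch of an inversion list. It is written in Python only as neutral pseudocode: it is not the D implementation, and the class name is hypothetical. The set is stored as a sorted array of interval boundaries. Membership is a binary search, and the resulting index is odd exactly when the code point falls inside an interval.

```python
from bisect import bisect_right


class InversionList:
    """Hypothetical sketch: a code point set stored as sorted boundaries.

    Intervals are half-open: [b0, b1), [b2, b3), ...
    A code point is in the set iff an odd number of boundaries are <= it.
    """

    def __init__(self, intervals):
        # intervals: (start, end) pairs with end exclusive; assumed to be
        # sorted and non-overlapping in this sketch
        self.bounds = []
        for start, end in intervals:
            self.bounds.extend((start, end))

    def __contains__(self, cp):
        # bisect_right returns the number of boundaries <= cp
        return bisect_right(self.bounds, cp) % 2 == 1

    def __len__(self):
        # Number of code points: sum of the interval lengths
        return sum(self.bounds[i + 1] - self.bounds[i]
                   for i in range(0, len(self.bounds), 2))


# ASCII letters: [A-Z] and [a-z]
ascii_letters = InversionList([(0x41, 0x5B), (0x61, 0x7B)])
print(ord("Q") in ascii_letters, ord("[") in ascii_letters)  # True False
```

A lookup costs O(log n) in the number of intervals, and a set with huge contiguous ranges (such as all CJK ideographs) still needs only two integers per range.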

[DONE] Flexible n-level bit-trie
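A minimal sketch of the idea behind a bit-trie follows. It is written in Python as neutral pseudocode and fixed at two levels for brevity, whereas the D version is n-level. `PAGE_BITS` and the function names are hypothetical. The high bits of a code point select a page and the low bits select a bit inside it. Identical pages, such as long runs of unassigned code points, are stored only once.

```python
PAGE_BITS = 8                  # low bits resolved inside a page (assumed split)
PAGE_SIZE = 1 << PAGE_BITS


def build_bit_trie(predicate, max_cp=0x110000):
    """Build a 2-level trie: index[cp >> PAGE_BITS] -> page id -> bitmap.

    max_cp is assumed to be a multiple of PAGE_SIZE.
    """
    pages = []        # unique pages, each stored as one int bitmap
    page_ids = {}     # bitmap -> position in pages (deduplication)
    index = []
    for hi in range(max_cp >> PAGE_BITS):
        base = hi << PAGE_BITS
        bitmap = 0
        for lo in range(PAGE_SIZE):
            if predicate(base + lo):
                bitmap |= 1 << lo
        if bitmap not in page_ids:
            page_ids[bitmap] = len(pages)
            pages.append(bitmap)
        index.append(page_ids[bitmap])
    return index, pages


def trie_lookup(trie, cp):
    """Two array reads and a bit test, no matter how the set looks."""
    index, pages = trie
    page = pages[index[cp >> PAGE_BITS]]
    return bool((page >> (cp & (PAGE_SIZE - 1))) & 1)


digits = build_bit_trie(lambda cp: 0x30 <= cp <= 0x39, max_cp=0x1000)
print(trie_lookup(digits, ord("7")), trie_lookup(digits, ord("a")))  # True False
```

Page deduplication is what keeps such a structure small. For typical properties, most pages across the code space are identical (often all-zero), so they collapse into a handful of shared entries.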

[NEED NO WORK] Per-encoding trie generation (UTF-8, UTF-16, UTF-32)

  Even better, via reading the whole UTF sequence in one word, see
  http://forum.dlang.org/post/jveaua$2bol$1@digitalmars.com

[DONE] Universal trie data structure (at least integers, strings and arrays of structs)

  Though it could be extended in many ways.
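The same two-level layout generalizes from bits to arbitrary values. Below is a hedged sketch that maps code points to any hashable value. It uses hypothetical names, with Python again as neutral pseudocode, and is one reading of "universal". It is shown with integer canonical combining classes and string general categories taken from Python's `unicodedata` module.

```python
import unicodedata


def build_value_trie(func, page_bits=7, max_cp=0x110000):
    """Sketch of a 2-level trie mapping code points to hashable values.

    max_cp is assumed to be a multiple of the page size.
    """
    page_size = 1 << page_bits
    pages, page_ids, index = [], {}, []
    for hi in range(max_cp >> page_bits):
        base = hi << page_bits
        page = tuple(func(base + lo) for lo in range(page_size))
        if page not in page_ids:       # share identical pages
            page_ids[page] = len(pages)
            pages.append(page)
        index.append(page_ids[page])
    return page_bits, index, pages


def value_lookup(trie, cp):
    page_bits, index, pages = trie
    return pages[index[cp >> page_bits]][cp & ((1 << page_bits) - 1)]


# Integer values: canonical combining class
ccc = build_value_trie(lambda cp: unicodedata.combining(chr(cp)), max_cp=0x1000)
print(value_lookup(ccc, 0x0301))   # 230 (COMBINING ACUTE ACCENT)

# String values: general category
gc = build_value_trie(lambda cp: unicodedata.category(chr(cp)), max_cp=0x100)
print(value_lookup(gc, ord("A")))  # Lu
```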

[DONE] Normalization

[DONE] Correct NFC normalization (UTF-8, UTF-16, UTF-32)

Unexpectedly got blocked, but coming soon (i.e. formally out of GSOC scope).
In essence, NFC/NFKC are slightly harder than NFD/NFKD respectively, since they must recompose after decomposing. NFC is also the most widely used form in text interchange.
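Python's `unicodedata` module shows why composition is the harder half (it is used here for illustration only). NFD only decomposes and canonically reorders combining marks. NFC performs the same steps and then recomposes. The example is the classic "s with dot below and dot above", with the marks given in non-canonical order.

```python
import unicodedata


def code_points(s):
    """Render a string as U+XXXX code points for inspection."""
    return " ".join(f"U+{ord(c):04X}" for c in s)


# 's' + COMBINING DOT ABOVE + COMBINING DOT BELOW (non-canonical mark order)
s = "s\u0307\u0323"

nfd = unicodedata.normalize("NFD", s)  # decompose + canonical reordering
nfc = unicodedata.normalize("NFC", s)  # same steps, then canonical composition

print(code_points(nfd))  # U+0073 U+0323 U+0307
print(code_points(nfc))  # U+1E69
```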

[DONE] Version for NFKD

[DONE] Optimized all normalization forms.

Normalization takes advantage of the Quick_Check properties and other shortcuts, and high-speed trie structures are used throughout. So it should already have good baseline performance, which may be tweaked in the future.
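The sketch below shows the kind of quick-check shortcut meant here, in Python for illustration. The function name and threshold constant are hypothetical, and this is not the D code. It relies on the fact that every code point below U+0300 has NFC_Quick_Check=Yes and canonical combining class 0. Text made only of such code points is therefore already in NFC and can be returned untouched.

```python
import unicodedata

# Every code point below U+0300 has NFC_Quick_Check=Yes and combining
# class 0, so a string made only of them is already in NFC.
NFC_QC_YES_BELOW = 0x300


def to_nfc(s):
    # Fast path: nothing in the string can decompose, reorder or compose
    if all(ord(c) < NFC_QC_YES_BELOW for c in s):
        return s
    # Slow path: run the full normalization algorithm
    return unicodedata.normalize("NFC", s)


print(to_nfc("plain ASCII text"))      # returned unchanged via the fast path
print(to_nfc("e\u0301") == "\u00e9")   # True: composed on the slow path
```

A production implementation would look the property up per code point (for example in one of the tries above) instead of using a single threshold, but the principle is the same. Most real-world text never reaches the slow path.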

[DONE] NFD

[DONE] NFKC
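The four forms differ along two axes: canonical (NFC, NFD) versus compatibility (NFKC, NFKD) mappings, and composed versus decomposed output. Here is a small illustration with Python's `unicodedata` (the helper name is hypothetical):

```python
import unicodedata

FORMS = ("NFC", "NFD", "NFKC", "NFKD")


def all_forms(s):
    """Map each normalization form name to the normalized string."""
    return {form: unicodedata.normalize(form, s) for form in FORMS}


lig = all_forms("\ufb01")    # LATIN SMALL LIGATURE FI
# Canonical forms keep the ligature; compatibility forms split it.
assert lig["NFC"] == lig["NFD"] == "\ufb01"
assert lig["NFKC"] == lig["NFKD"] == "fi"

acute = all_forms("\u00e9")  # LATIN SMALL LETTER E WITH ACUTE
# Composed forms keep U+00E9; decomposed forms use 'e' + U+0301.
assert acute["NFC"] == acute["NFKC"] == "\u00e9"
assert acute["NFD"] == acute["NFKD"] == "e\u0301"
```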

[IN PROGRESS] Case conversions and case-agnostic operations

[DONE] Simple casefolding comparator (sicmp)

[DONE] Full casefolding comparator (icmp)

Indeed, it does more work in general, since full case folding can map one character to several (e.g. ß to ss).
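The difference between the two comparators can be sketched in Python. Simple folding maps each character to exactly one character, while full folding may expand it, so "Straße" matches "STRASSE" only under full folding. Python exposes only full folding (`str.casefold`), so `simple_fold` below is an approximation. It will disagree with the real simple folding from CaseFolding.txt for a few characters. The function names only mirror sicmp/icmp and are not their D signatures.

```python
def simple_fold(ch):
    # Approximation (assumption): treat a 1:1 result of full folding as the
    # simple folding, and leave characters whose full folding expands
    # (e.g. 'ß' -> 'ss') unchanged.
    folded = ch.casefold()
    return folded if len(folded) == 1 else ch


def sicmp(a, b):
    """Compare with simple (1:1) case folding; returns <0, 0 or >0."""
    fa = [simple_fold(c) for c in a]
    fb = [simple_fold(c) for c in b]
    return (fa > fb) - (fa < fb)


def icmp(a, b):
    """Compare with full case folding, which may change string length."""
    fa, fb = a.casefold(), b.casefold()
    return (fa > fb) - (fa < fb)


print(sicmp("Hello", "hELLO"))     # 0
print(icmp("Straße", "STRASSE"))   # 0
print(sicmp("Straße", "STRASSE"))  # 1: 'ß' has no 1:1 folding to 's'
```

Because simple folding never changes the length, it can compare character by character without buffering. That is why it is the cheaper of the two.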

[TODO] Fixed toUpperCase, toLowerCase etc.

[DONE] User-perceived characters (graphemes)

[DONE] Grapheme cluster data type (small-string-optimized array)
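As a rough illustration of what a user-perceived character is, the sketch below groups a base character with the combining marks (general category M*) that follow it. This is a deliberate simplification in Python (the function name is hypothetical). It is not the full UAX #29 grapheme cluster algorithm, which also handles Hangul syllables, CR LF, emoji sequences, regional indicators and more.

```python
import unicodedata


def simple_graphemes(text):
    """Split text into base-character-plus-marks clusters (simplified)."""
    clusters = []
    for ch in text:
        if clusters and unicodedata.category(ch).startswith("M"):
            clusters[-1] += ch    # attach the mark to the previous cluster
        else:
            clusters.append(ch)   # start a new cluster
    return clusters


word = "noe\u0308l"                # 'noël' spelled with a combining diaeresis
print(len(word))                   # 5 code points
print(len(simple_graphemes(word))) # 4 user-perceived characters
```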

[IN PROGRESS] Miscellaneous

[DONE] Update isXXX functions in std.uni

[DONE] An automation script to update to a fresh version of the Unicode Character Database.
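The core of such an update script is parsing the UCD's semicolon-separated files. The following is a minimal Python sketch for one line of `UnicodeData.txt`. Field positions follow the UCD documentation. The function name and returned keys are hypothetical. Ranges and the other UCD files are not handled.

```python
def parse_unicode_data_line(line):
    """Parse one line of UnicodeData.txt (15 ';'-separated fields).

    Sketch only: a real updater must also expand '<..., First>' /
    '<..., Last>' range pairs and read the other UCD files.
    """
    fields = line.rstrip("\n").split(";")
    if len(fields) != 15:
        raise ValueError(f"expected 15 fields, got {len(fields)}")
    decomposition = fields[5].split()
    tag = None
    if decomposition and decomposition[0].startswith("<"):
        tag = decomposition.pop(0)   # e.g. '<compat>' marks a compatibility mapping
    return {
        "code_point": int(fields[0], 16),
        "name": fields[1],
        "general_category": fields[2],
        "combining_class": int(fields[3]),
        "decomposition_tag": tag,
        "decomposition": [int(cp, 16) for cp in decomposition],
        "simple_uppercase": int(fields[12], 16) if fields[12] else None,
    }


rec = parse_unicode_data_line(
    "00E9;LATIN SMALL LETTER E WITH ACUTE;Ll;0;L;0065 0301;;;;N;"
    "LATIN SMALL LETTER E ACUTE;;00C9;;00C9")
print(rec["general_category"], rec["decomposition"])  # Ll [101, 769]
```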

[OUT OF SCOPE] Legacy encoding support (std.encoding?), covering the bunch of encodings commonly found in modern web browsers.

See also the current slice of documentation: http://blackwhale.github.com/phobos/uni.html#unicode
