a clean C library for processing UTF-8 Unicode data



utf8proc is a small, clean C library that provides Unicode normalization, case-folding, and other operations for data in the UTF-8 encoding. It was initially developed by Jan Behrens and the rest of the Public Software Group, who deserve nearly all of the credit for this package. With the blessing of the Public Software Group, the Julia developers have taken over development of utf8proc, since the original developers have moved to other projects.

(utf8proc is used for basic Unicode support in the Julia language, and the Julia developers became involved because they wanted to add Unicode 7 support and other features.)

(The original utf8proc package also includes Ruby and PostgreSQL plug-ins. We removed those from utf8proc in order to focus exclusively on the C library for the time being, but plan to add them back in or release them as separate packages.)

The utf8proc package is licensed under the free/open-source MIT "expat" license (plus certain Unicode data governed by the similarly permissive Unicode data license); please see the license file included in the repository for details.

Quick Start

To compile the C library, run make.

General Information

After successful compilation, the C library is found in this directory as libutf8proc.a (the static library) and as a shared library for dynamic linking, whose file name depends on the platform.

The Unicode version supported is 12.0.0.

For Unicode normalizations, the following options are used:

  • Normalization Form C: STABLE, COMPOSE
  • Normalization Form D: STABLE, DECOMPOSE
  • Normalization Form KC: STABLE, COMPOSE, COMPAT
  • Normalization Form KD: STABLE, DECOMPOSE, COMPAT

C Library

The documentation for the C library is found in the utf8proc.h header file. utf8proc_map is the function you will most likely use for mapping UTF-8 strings, unless you want to allocate memory yourself.

To Do

See the GitHub issues list.


Bug reports, feature requests, and other queries can be filed at the utf8proc issues page on GitHub.

See also

An independent Lua translation of this library, lua-mojibake, is also available.
