Java wrapper for the Compact Language Detector 2 library

The Compact Language Detector 2 (CLD2) is a native library written in C++ to detect the language of plain-text or HTML documents. Originally written for the Chromium web browser, the library is able to identify 80+ languages (or 160+ in the full version). The detector identifies a language either by script alone (e.g., Greek) or with a Naïve Bayesian classifier operating on 4-letter n-grams ("quadgrams") or, for CJK languages, single-letter "unigrams". The classifier also accepts external hints, e.g., the top-level domain of a web page or the language code sent in the HTTP header.
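To make the terms "quadgram" and "unigram" concrete, here is a minimal sketch that splits a string into overlapping n-grams. It only illustrates the notion: the class and method names are hypothetical, and CLD2's actual tokenization, normalization and scoring tables are not reproduced here.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: not part of CLD2 or of these bindings. */
class QuadgramSketch {

    /** Returns all overlapping n-grams of the text, with n counted in Unicode code points. */
    static List<String> ngrams(String text, int n) {
        int[] codePoints = text.codePoints().toArray();
        List<String> result = new ArrayList<>();
        for (int i = 0; i + n <= codePoints.length; i++) {
            // Build each n-gram from n consecutive code points
            result.add(new String(codePoints, i, n));
        }
        return result;
    }

    public static void main(String[] args) {
        // Quadgrams, as used for alphabetic scripts
        System.out.println(ngrams("language", 4)); // [lang, angu, ngua, guag, uage]
        // Unigrams, as used for CJK text
        System.out.println(ngrams("言語", 1));      // [言, 語]
    }
}
```

Counting in code points rather than in `char` units keeps characters outside the Basic Multilingual Plane intact.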


Native Library

First, the native CLD2 library (a shared object, or a .dll on Windows) needs to be installed.

  • on Debian-based systems the easiest way is to install the packages libcld2-0 and libcld2-dev:
apt-get install libcld2-0 libcld2-dev
  • to compile the CLD2 library from source:
git clone
cd cld/internal/
export CFLAGS="-Wno-narrowing -O3"

If you only want the libraries, ./ is sufficient. You may use different compiler flags, but the flag -Wno-narrowing is required for compilers that follow the C++11 standard.

Both the Debian package and the source build provide two native libraries: a standard one that supports 80+ languages and a full one that supports 160+ languages. However, the full library from the Debian package isn't a complete shared library; it only contains the tables used by the classifier. To use the larger tables for 160+ languages instead of those for 80+ languages, you must use the LD_PRELOAD trick, i.e. set the environment variable LD_PRELOAD to the library that contains the larger tables.

Java Bindings

This project is built and installed using Maven:

mvn install

and can then be used as a dependency.


To link the Java code with the native libraries, you need to make sure that Java can find the shared object:

  • either install the native library on a standard library path (already done when the Debian package is used)
  • or add the directory where the native library is installed to the environment variable LD_LIBRARY_PATH
  • or use the Java option -Djava.library.path=...
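When the native library is not found at run time, it can help to check which directories are actually searched. The following sketch looks for the library file in the directories listed in java.library.path and LD_LIBRARY_PATH. It uses only JDK classes; the class name and the base name "cld2" are assumptions made for illustration, and the sketch does not reproduce the full search performed by JNA or the system's dynamic linker.

```java
import java.io.File;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

/** Illustrative only: a simple diagnostic, not part of these bindings. */
class NativeLibraryLookup {

    /** Splits a path list such as "/usr/lib:/opt/lib" into its directories. */
    static List<Path> splitPathList(String pathList) {
        List<Path> dirs = new ArrayList<>();
        if (pathList == null) {
            return dirs;
        }
        for (String entry : pathList.split(File.pathSeparator)) {
            if (!entry.isEmpty()) {
                dirs.add(Paths.get(entry));
            }
        }
        return dirs;
    }

    /** Returns the first existing regular file named fileName in the given directories. */
    static Optional<Path> find(String fileName, List<Path> dirs) {
        for (Path dir : dirs) {
            Path candidate = dir.resolve(fileName);
            if (Files.isRegularFile(candidate)) {
                return Optional.of(candidate);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Assumption: the library's base name is "cld2" (libcld2.so on Linux)
        String fileName = System.mapLibraryName("cld2");
        List<Path> dirs = new ArrayList<>();
        dirs.addAll(splitPathList(System.getProperty("java.library.path")));
        dirs.addAll(splitPathList(System.getenv("LD_LIBRARY_PATH")));
        System.out.println(find(fileName, dirs)
                .map(p -> "found " + p)
                .orElse(fileName + " not found in " + dirs));
    }
}
```

A "not found" result here does not necessarily mean loading will fail, since the dynamic linker also consults its own default locations and cache.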

Java Native Access (JNA) and libffi

The CLD2 native functions are accessed via Java Native Access (JNA), which uses the Foreign Function Interface Library (libffi). JNA is a project dependency, but libffi needs to be present on your system. If it is not, install it, e.g.

apt-get install libffi6


This package is derived from (package com.deezer.research.cld2), see the original README.

Further inspiration is taken from CAFDataProcessing/worker-languagedetection, but that project depends on a modified version of CLD2 distributed only as a binary.


  • extended interface
  • support to pass as arguments Java objects of the classes Locale and Charset
  • proper ISO-639-3 language codes for all 160 languages
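As a rough illustration of what Locale and Charset objects carry, the sketch below uses only standard JDK calls; it does not show the methods of these bindings, and the class and helper names are hypothetical. Note that Locale.getISO3Language() is documented by the JDK as returning three-letter ISO 639-2/T codes, which coincide with ISO 639-3 for many, but not necessarily all, languages.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

/** Illustrative only: shows JDK calls, not the API of these bindings. */
class HintSketch {

    /** Three-letter language code of a locale as reported by the JDK. */
    static String iso3(Locale locale) {
        return locale.getISO3Language();
    }

    /** Canonical name of a charset, usable as an encoding label. */
    static String encodingName(Charset charset) {
        return charset.name();
    }

    public static void main(String[] args) {
        // A language hint, e.g. taken from an HTTP Content-Language header
        Locale hint = Locale.forLanguageTag("de-AT");
        System.out.println(hint.getLanguage() + " -> " + iso3(hint)); // de -> deu
        // An encoding hint for the document's bytes
        System.out.println(encodingName(StandardCharsets.UTF_8));     // UTF-8
    }
}
```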


These bindings are Apache 2.0 licensed. CLD2, weslang and all dependencies also use the Apache 2.0 license.