Skip to content
Language detection library for Crystal
Branch: master
Clone or download
Fetching latest commit…
Cannot retrieve the latest commit at this time.


Type Name Latest commit message Commit time
Failed to load latest commit information.
languages Initial commit Sep 14, 2019
spec Initial commit Sep 14, 2019
src Fixed some things Oct 4, 2019
.editorconfig Initial commit Sep 14, 2019
.gitignore Initial commit Sep 14, 2019
LICENSE Initial commit Sep 14, 2019
shard.yml Initial commit Sep 14, 2019


Lang is a language detector for Crystal. It was trained using the Bayes Classifier and is extreemely accurate (test results coming soon) even for very similar languages such as Russian and Ukranian (with a large enough sample size).

Supported languages (116)

Abkhazian (abk), Afrikaans (af), Albanian (alb), English, Old (ca.450-1100) (ang), Arabic (ar), Official Aramaic (700-300 BCE) (arc), Assamese (asm), Asturian (ast), Avaric (ava), Azerbaijani (az), Bashkir (ba), Belarusian (bel), Bengali (bn), Bhojpuri (bho), Bosnian (bs), Burmese (my), Chechen (ce), Cornish (kw), Corsican (co), Crimean Tatar (crh), Kashubian (csb), Lower Sorbian (dsb), French (fr), Northern Frisian (frr), Friulian (fur), Georgian (ka), Gaelic (gd), Galician (gl), Manx (gv), Swiss German (gsw), Hawaiian (haw), Hebrew (he), Hindi (hi), Croatian (hr), Upper Sorbian (hsb), Igbo (ig), Iloko (ilo), Javanese (jv), Lojban (jbo), Japanese (ja), Kara-Kalpak (kaa), Kabyle (kab), Kalaallisut (kl), Kannada (kn), Kabardian (kbd), Central Khmer (khm), Kinyarwanda (rw), Komi (kv), Korean (ko), Karachay-Balkar (krc), Ladino (lad), Lao (lo), Lezghian (lez), Limburgan (li), Lingala (ln), Luxembourgish (lb), Lushai (lus), Macedonian (mk), Malayalam (ml), Maori (mi), Marathi (mr), Moksha (mdf), Minangkabau (min), Malagasy (mg), Maltese (mt), Mongolian (mn), Mirandese (mwl), Erzya (myv), Neapolitan (nap), Navajo (nav), Low German (nds), Nepal Bhasa (new), Norwegian Nynorsk (nn), Norwegian (no), Oriya (ori), Ossetian (os), Turkish, Ottoman (1500-1928) (ota), Pangasinan (pag), Pampanga (pam), Papiamento (pap), Pali (pi), Portuguese (pt), Romansh (roh), Russian (ru), Yakut (sah), Sicilian (scn), Scots (sco), Shan (shn), Northern Sami (se), Shona (sna), Sindhi (snd), Somali (so), Spanish (es), Sardinian (sc), Sranan Tongo (srn), Serbian (sr), Swahili (sw), Tamil (ta), Tatar (tat), Telugu (te), Tetum (tet), Thai (th), Tibetan (bo), Tok Pisin (tpi), Turkmen (tk), Turkish (tr), Udmurt (udm), Ukrainian (uk), Urdu (ur), Vietnamese (vi), Volapük (vo), Walloon (wa), Wolof (wo), Kalmyk (xal), Yiddish (yi), Yoruba (yo)


  1. Add the dependency to your shard.yml:

        github: cadmiumcr/lang
  2. Run shards install


require "cadmium_lang"

# by default language data will be loaded from the data/languages
# directory, however if you include this in a compiled binary you may
# have to set the path manually
lang =

puts lang.detect("Название страны происходит от этнонима ") # => ru
puts lang.detect("Якщо сторінка була тут створена нещодавно") # => uk


Adding languages to the data set is easy. Put each language sample in its own file named [iso-code].txt (ex. en.txt for english) and place all examples in a folder. Then paste the following code into a crystal file and modify the constants.

require "cadmium_lang"
# or
require "./src/cadmium_lang"

DATA_PATH = "path/to/data"
OUTPUT_FOLDER = "path/to/output"
LOAD_DATA = false

classifier =

# If true previous data will be loaded from the OUTPUT_FOLDER.
# Should be true for re-trains, false for first time.

# Train the classifer on a directory full of data samples

# Save the results

In the case of the standard language set used in this library the values should be (assuming your crystal file is located at the root of the project):

  • DATA_PATH: Anything. You need to make the folder.
  • OUTPUT_FOLDER: "data/languages"
  • LOAD_DATA: true


  1. Fork it (
  2. Create your feature branch (git checkout -b my-new-feature)
  3. Commit your changes (git commit -am 'Add some feature')
  4. Push to the branch (git push origin my-new-feature)
  5. Create a new Pull Request


You can’t perform that action at this time.