Detect the language of text.
- franc can support more languages (†) than any other library
- franc is packaged with support for 82, 187, or 406 languages
- franc has a CLI
† - Based on the UDHR, the most translated document in the world.
franc supports many languages, which means it’s easily confused on small samples. Make sure to pass it big documents to get reliable results.
npm:
npm install franc
This installs the franc package, with support for 187 languages (languages which have 1 million or more speakers). franc-min (82 languages, 8 million or more speakers) and franc-all (all 406 possible languages) are also available. Finally, install franc-cli to get the CLI.
Browser builds for franc-min, franc, and franc-all are available on GitHub Releases.
var franc = require('franc')
franc('Alle menslike wesens word vry') // => 'afr'
franc('এটি একটি ভাষা একক IBM স্ক্রিপ্ট') // => 'ben'
franc('Alle menneske er fødde til fridom') // => 'nno'
franc('') // => 'und' (language code that stands for undetermined)
// You can change what’s too short (default: 10):
franc('the') // => 'und'
franc('the', {minLength: 3}) // => 'sco'
console.log(franc.all('O Brasil caiu 26 posições'))
Yields:
[ [ 'por', 1 ],
[ 'src', 0.8797557538750587 ],
[ 'glg', 0.8708313762329732 ],
[ 'snn', 0.8633161108501644 ],
[ 'bos', 0.8172851103804604 ],
... 116 more items ]
console.log(franc.all('O Brasil caiu 26 posições', {only: ['por', 'spa']}))
Yields:
[ [ 'por', 1 ], [ 'spa', 0.799906059182715 ] ]
console.log(franc.all('O Brasil caiu 26 posições', {ignore: ['src', 'glg']}))
Yields:
[ [ 'por', 1 ],
[ 'snn', 0.8633161108501644 ],
[ 'bos', 0.8172851103804604 ],
[ 'hrv', 0.8107092531705026 ],
[ 'lav', 0.810239549084077 ],
... 114 more items ]
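Because short or ambiguous samples are easily confused, it can help to accept the top guess only when it clearly beats the runner-up. The following is a minimal sketch of that idea, not part of franc: `pickConfident` and its `minGap` parameter are hypothetical names, and the input is assumed to be an array of `[code, score]` pairs sorted best-first, like the `franc.all` output shown above.

```javascript
// Hypothetical helper (not part of franc): return the top language code only
// when its score leads the second-best score by at least `minGap`.
// `guesses` is assumed to look like the output of `franc.all`:
// an array of [code, score] pairs sorted from best to worst.
function pickConfident(guesses, minGap) {
  if (guesses.length === 0) return 'und'
  if (guesses.length === 1) return guesses[0][0]
  var gap = guesses[0][1] - guesses[1][1]
  return gap >= minGap ? guesses[0][0] : 'und'
}

// Scores copied from the sample output for 'O Brasil caiu 26 posições'.
var guesses = [
  ['por', 1],
  ['src', 0.8797557538750587],
  ['glg', 0.8708313762329732]
]

console.log(pickConfident(guesses, 0.1)) // => 'por'
console.log(pickConfident(guesses, 0.2)) // => 'und'
```

A suitable `minGap` depends on the text and on how many languages are in play, so treat the thresholds here as placeholders to tune rather than recommended values.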
Install:
npm install franc-cli --global
Use:
CLI to detect the language of text
Usage: franc [options] <string>
Options:
-h, --help output usage information
-v, --version output version number
-m, --min-length <number> minimum length to accept
-o, --only <string> allow languages
-i, --ignore <string> disallow languages
-a, --all display all guesses
Usage:
# output language
$ franc "Alle menslike wesens word vry"
# afr
# output language from stdin (expects utf8)
$ echo "এটি একটি ভাষা একক IBM স্ক্রিপ্ট" | franc
# ben
# ignore certain languages
$ franc --ignore por,glg "O Brasil caiu 26 posições"
# src
# output language from stdin with only
$ echo "Alle mennesker er født frie og" | franc --only nob,dan
# nob
Package | Languages | Speakers |
---|---|---|
franc-min | 82 | 8M or more |
franc | 187 | 1M or more |
franc-all | 406 | - |
Note that franc returns ISO 639-3 codes (three-letter codes), not ISO 639-1 or ISO 639-2 codes. See also GH-10 and GH-30.
To get more info about the languages represented by ISO 639-3, use iso-639-3.
There is also an index available to map ISO 639-3 to ISO 639-1 codes, iso-639-3/to-1.json, but note that not all 639-3 codes can be represented in 639-1.
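As a rough illustration of that mapping, the sketch below converts three-letter codes to two-letter ones and falls back when no entry exists. The `to1` table is a small hand-written subset used here as a stand-in for the real index, and `toIso6391` is a hypothetical helper name, not an API of franc or iso-639-3.

```javascript
// Hand-written subset for illustration only; the real index is much larger.
var to1 = {
  afr: 'af',
  ben: 'bn',
  dan: 'da',
  nno: 'nn',
  nob: 'nb',
  por: 'pt',
  spa: 'es'
}

// Hypothetical helper: return the ISO 639-1 code for an ISO 639-3 code,
// or `fallback` when the table has no entry for it.
function toIso6391(code, fallback) {
  return Object.prototype.hasOwnProperty.call(to1, code) ? to1[code] : fallback
}

console.log(toIso6391('por', null)) // => 'pt'
console.log(toIso6391('und', null)) // => null
```

Passing an explicit fallback keeps the "no two-letter code" case visible to the caller instead of silently returning `undefined`.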
Franc has been ported to several other programming languages.
- Elixir: paasaa
- Erlang: efranc
- Go: franco, whatlanggo
- R: franc
- Rust: whatlang-rs
The works franc is derived from have themselves also been ported to other languages.
Franc is a derivative work from guess-language (Python, LGPL), guesslanguage (C++, LGPL), and Language::Guess (Perl, GPL). Their creators granted me the rights to distribute franc under the MIT license: respectively, Kent S. Johnson, Jacob R. Rideout, and Maciej Ceglowski.