Description: Language Identification with Ruby: probabilistic language identification with ruby1.9
Clone URL: git://github.com/snifty/whatlang.git
NB: Requires ruby1.9.
 
A module that identifies which of a number of human languages a given text is written in.
 
We use a simple similarity measure between frequency counts of bigrams to compare an unknown text to a set of models of known languages.
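
The comparison described above can be sketched roughly as follows. This is only an illustration, not the code in this repository: the method names (bigram_counts, cosine_similarity, identify) are hypothetical, the use of character bigrams is an assumption, and cosine similarity is swapped in as one plausible "simple similarity measure" since the README does not name the one actually used.

```ruby
# Hypothetical sketch; none of these names come from the whatlang source.

# Count overlapping character bigrams in a text (assumes character,
# not word, bigrams).
def bigram_counts(text)
  counts = Hash.new(0)
  text.downcase.chars.to_a.each_cons(2) { |a, b| counts[a + b] += 1 }
  counts
end

# Cosine similarity between two bigram count tables.
# Returns 0.0 when either table is empty.
def cosine_similarity(a, b)
  dot    = a.inject(0) { |acc, (bigram, n)| acc + n * b.fetch(bigram, 0) }
  norm_a = Math.sqrt(a.values.inject(0) { |acc, n| acc + n * n })
  norm_b = Math.sqrt(b.values.inject(0) { |acc, n| acc + n * n })
  return 0.0 if norm_a.zero? || norm_b.zero?
  dot / (norm_a * norm_b)
end

# Return the name of the model most similar to the unknown text.
# `models` maps a language name to its bigram count table and must
# not be empty.
def identify(text, models)
  unknown = bigram_counts(text)
  models.max_by { |_lang, model| cosine_similarity(unknown, model) }.first
end

# Toy models built from one sentence each; real models would be built
# from much larger samples.
models = {
  "english" => bigram_counts("the quick brown fox jumps over the lazy dog and the cat"),
  "german"  => bigram_counts("der schnelle braune fuchs springt ueber den faulen hund und die katze")
}
puts identify("the dog and the fox", models)
```

With models this small the result is fragile; longer training samples give more stable bigram frequencies.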
 
The language models are built with samples from:
 
  http://www.unicode.org/udhr/downloads.html
 
which are copied to models/.