Description: Language Identification with Ruby: probabilistic language identification with ruby1.9
Clone URL: git://github.com/snifty/whatlang.git
snifty (author)
Tue Jul 08 15:30:39 -0700 2008
commit  61ef9ea92482e52784fe0713bc79d5c918e8bb92
tree    564234e3e3ef4ce3d7f86d7ecded043e63f68ec7
parent  364cd4b2d7c683d9f1252d3e1471eaf6fb5d5c38
name                 age                              message
README
generate_models.rb
lid.rb
models/              Tue Jul 08 15:30:39 -0700 2008   better format of sample files [snifty]
README
NB: Requires ruby1.9.

A module that identifies which of a number of human languages a given text is written in.

We compare an unknown text to a set of models of known languages, using a simple similarity measure between
bigram frequency counts.
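
The comparison described above could be sketched roughly as follows. This is an illustration only, not the code in lid.rb: the README does not say whether the bigrams are character or word bigrams, nor which similarity measure is used, so the sketch assumes character bigrams and cosine similarity. The names bigram_counts, cosine_similarity, identify and MODELS are hypothetical, and the two toy models are built from a single sentence each rather than from the full samples.

```ruby
# Hypothetical sketch: NOT the code in lid.rb.
# Assumes character bigrams and cosine similarity; the README names neither.

# Count overlapping character bigrams in a string.
def bigram_counts(text)
  counts = Hash.new(0)
  chars = text.downcase.chars.to_a
  chars.each_cons(2) { |a, b| counts[a + b] += 1 }
  counts
end

# Cosine similarity between two sparse count vectors (0.0 to 1.0).
def cosine_similarity(a, b)
  dot = a.reduce(0) { |sum, (gram, n)| sum + n * b.fetch(gram, 0) }
  norm_a = Math.sqrt(a.values.reduce(0) { |s, n| s + n * n })
  norm_b = Math.sqrt(b.values.reduce(0) { |s, n| s + n * n })
  return 0.0 if norm_a.zero? || norm_b.zero?
  dot / (norm_a * norm_b)
end

# Return the name of the model most similar to the text.
def identify(text, models)
  unknown = bigram_counts(text)
  models.max_by { |_name, model| cosine_similarity(unknown, model) }.first
end

# Toy models built from one short sentence each (illustrative only).
MODELS = {
  "english" => bigram_counts("all human beings are born free and equal in dignity and rights"),
  "spanish" => bigram_counts("todos los seres humanos nacen libres e iguales en dignidad y derechos")
}
```

With these toy models, identify("the rights of human beings", MODELS) returns "english". Models built from longer samples would presumably give more stable scores on short inputs.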

The language models are built with samples from: 

  http://www.unicode.org/udhr/downloads.html

which are copied to models/.