Description: Language Identification with Ruby: probabilistic language identification with ruby1.9
Clone URL: git://github.com/snifty/whatlang.git
type       name                age                             message
file       README              Tue Jul 08 15:30:39 -0700 2008  better format of sample files [snifty]
file       generate_models.rb  Tue Jul 08 16:00:59 -0700 2008  my broken class, take pity on it [snifty]
file       lid.rb              Tue Jul 08 14:55:55 -0700 2008  some sketching... [snifty]
directory  models/             Tue Jul 08 15:30:39 -0700 2008  better format of sample files [snifty]
README
NB: Requires ruby1.9.

A module that identifies which of a number of human languages a given text is written in.

We use a simple similarity measure between frequency counts of bigrams to compare an unknown text to a set of models of 
known languages.  
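
The comparison described above could be sketched roughly as follows. This is an illustration only, not the code in lid.rb: it assumes character bigrams and cosine similarity, neither of which the README specifies, and the names bigram_counts, cosine_similarity and identify are hypothetical.

```ruby
# Hypothetical sketch; lid.rb may use a different measure and different names.
# Assumption: "bigrams" means overlapping pairs of characters.

# Count overlapping character bigrams in a text.
def bigram_counts(text)
  counts = Hash.new(0)
  text.downcase.each_char.each_cons(2) { |a, b| counts[a + b] += 1 }
  counts
end

# Cosine similarity between two bigram count hashes.
# Assumption: cosine is one possible "simple similarity measure".
# Returns 0.0 when either hash is empty.
def cosine_similarity(a, b)
  dot = a.inject(0) { |acc, (gram, n)| acc + n * b.fetch(gram, 0) }
  norm = lambda { |h| Math.sqrt(h.values.inject(0) { |acc, n| acc + n * n }) }
  denom = norm.call(a) * norm.call(b)
  denom.zero? ? 0.0 : dot / denom
end

# Return the name of the model whose bigram counts are most similar to the text.
# models is a Hash of language name => bigram count hash.
def identify(text, models)
  unknown = bigram_counts(text)
  models.max_by { |_lang, counts| cosine_similarity(unknown, counts) }.first
end
```

With models built from two short samples, identify("the dog and the fox", models) would pick whichever language shares more high-frequency bigrams with the input. Real models need far larger samples to be reliable.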

The language models are built with samples from: 

  http://www.unicode.org/udhr/downloads.html

which are copied to models/ .
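
Building the models from those samples might look something like the sketch below. The file layout is an assumption: the README does not describe the sample format, so this supposes one UTF-8 plain-text file per language named models/<language>.txt. The function name build_models is hypothetical and is not taken from generate_models.rb.

```ruby
# Hypothetical sketch; generate_models.rb may work differently.
# Assumed layout: <dir>/<language>.txt, one UTF-8 plain-text sample per language.
def build_models(dir)
  models = {}
  Dir.glob(File.join(dir, "*.txt")).sort.each do |path|
    language = File.basename(path, ".txt")
    counts = Hash.new(0)
    text = File.read(path, :encoding => "UTF-8").downcase
    # Count overlapping character bigrams (assumed meaning of "bigrams").
    text.each_char.each_cons(2) { |a, b| counts[a + b] += 1 }
    models[language] = counts
  end
  models
end
```

Calling build_models("models") would then return a Hash mapping each language name to its bigram counts, ready to be compared against the counts of an unknown text.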