## Language Detection

Several Python libraries are available for language detection:

* [langdetect](https://pypi.python.org/pypi/langdetect/1.0.1) - LangDetect is a language detection library, a Python port of [Google's language-detection](https://github.com/shuyo/language-detection/blob/wiki/ProjectHome.md) library.
* [langid](https://github.com/saffsd/langid.py) - LangId is a standalone language identification tool.
* [polyglot](https://pypi.python.org/pypi/polyglot) - Polyglot is a natural language pipeline that supports massive multilingual applications. Its `detect_language` docstring mentions the Google Translate API, although the polyglot documentation describes language detection as built on `pycld2` (CLD2).

langdetect is the fastest and smallest of the three, which makes it the best fit for this purpose. Although the libraries above are good estimators, they are not accurate enough for the informal language found in blogs and microblogs. The article [Language Detection for Twitter](https://shuyo.wordpress.com/2012/02/21/language-detection-for-twitter-with-99-1-accuracy/) presents an algorithm, implemented in Python, that improves detection for that kind of text.
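Informal posts usually carry noise that says nothing about the language itself: links, @mentions, hashtags, stretched words. The sketch below shows a generic cleanup step that could run before any of the detectors. It is not the algorithm from the article; the function name `clean_microblog_text` and the regular expressions are illustrative assumptions.

```python
import re

# Hypothetical patterns for common microblog noise.
URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#(\w+)")
REPEAT_RE = re.compile(r"(.)\1{2,}")  # three or more of the same character


def clean_microblog_text(text):
    """Strip links and mentions, unwrap hashtags, and shorten stretched characters."""
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = HASHTAG_RE.sub(r"\1", text)   # keep the hashtag word, drop the '#'
    text = REPEAT_RE.sub(r"\1\1", text)  # "beeeem" -> "beem", "???" -> "??"
    return " ".join(text.split())        # collapse leftover whitespace


print(clean_microblog_text("@ana tudo beeeem??? olha isso http://t.co/abc #partiu"))
# tudo beem?? olha isso partiu
```

The cleaned string can then be passed to whichever detector is in use. Cleanup of this kind removes noise but does not expand abbreviations, so heavy slang can still be misclassified.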


### LangDetect

In [12]:
from langdetect import detect
# Detect the single most likely language
detect(u'Eu falo Português')

u'pt'

In [45]:
from langdetect import detect_langs
# Or show the probabilities of the candidate languages
detect_langs(u'Tudo bem brother?')

[en:0.85714175108, pt:0.14285723368]

In [30]:
# Language detection only performs reasonably on text longer than one word.
detect_langs(u'Olá')

[hu:0.999996875725]


In [47]:
# It also needs formal language; slang and internet shorthand confuse it.
detect_langs(u'Tah td bem c vc?')

[vi:0.571427435325, id:0.285714212192, en:0.142858299116]
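Because `detect_langs` reports a probability for each candidate, one way to handle the doubtful cases above is to accept a result only when the top probability clears a threshold. The sketch below is a minimal illustration that works on plain `(language, probability)` pairs so it runs without langdetect installed; the name `pick_language`, the `0.9` threshold, and the `"und"` fallback are assumptions, not part of the library.

```python
def pick_language(candidates, min_prob=0.9, fallback="und"):
    """Return the most probable language code, or `fallback` if it is not confident enough.

    `candidates` is a list of (language_code, probability) pairs.
    """
    if not candidates:
        return fallback
    lang, prob = max(candidates, key=lambda pair: pair[1])
    return lang if prob >= min_prob else fallback


# Probabilities copied from the detect_langs outputs above.
mixed = [("en", 0.85714175108), ("pt", 0.14285723368)]
slang = [("vi", 0.571427435325), ("id", 0.285714212192), ("en", 0.142858299116)]
single_word = [("hu", 0.999996875725)]

print(pick_language(mixed))        # und
print(pick_language(slang))        # und
print(pick_language(single_word))  # hu
```

With langdetect, the pairs could be built as `[(l.lang, l.prob) for l in detect_langs(text)]`. Note that the single-word case still returns `hu`: a threshold filters out uncertain guesses, but it cannot catch a confident mistake.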

### LangId

In [56]:
import langid
# Langid is not very accurate on short sentences
langid.classify(u'Eu falo Português')

('ku', 0.7003455583568504)

In [57]:
langid.classify(u'Este texto é para demonstrar a assertividade do algoritmo de detecção de línguas')

('pt', 1.0)
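Since the short sentence is misclassified while the longer one is not, a simple guard is to skip classification when the text has too few words. The sketch below takes the classifier as a parameter so it runs without langid installed; `classify_if_long_enough`, the stand-in `fake_classify`, and the five-word minimum are illustrative assumptions.

```python
def classify_if_long_enough(text, classify, min_words=5):
    """Call `classify(text)` only when the text has at least `min_words` words.

    Returns None for shorter text, so the caller can decide how to handle it.
    """
    if len(text.split()) < min_words:
        return None
    return classify(text)


def fake_classify(text):
    # Stand-in for a real detector such as langid.classify (assumption for the demo).
    return ("pt", 1.0)


short = u'Eu falo Português'
longer = u'Este texto é para demonstrar a assertividade do algoritmo de detecção de línguas'

print(classify_if_long_enough(short, fake_classify))   # None
print(classify_if_long_enough(longer, fake_classify))  # ('pt', 1.0)
```

In real use, `langid.classify` would be passed in place of `fake_classify`. The right minimum length depends on the data and is worth tuning on a labelled sample.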

### Polyglot

In [62]:
from polyglot.text import Text
text = Text(u'Eu falo Português')
text.language.name

u'Portuguese'

In [58]:
help(text.detect_language)

Help on method detect_language in module polyglot.text:

detect_language(self) method of polyglot.text.Text instance
    Detect the blob's language using the Google Translate API.
    Requires an internet connection.
    Usage:
    ::
        >>> b = Text("bonjour")
        >>> b.language
        u'fr'

