aboSamoor/polyglot

polyglot

Polyglot is a natural language pipeline that supports massive multilingual applications.

Features

  • Tokenization (165 Languages)
  • Language detection (196 Languages)
  • Named Entity Recognition (40 Languages)
  • Part of Speech Tagging (16 Languages)
  • Sentiment Analysis (136 Languages)
  • Word Embeddings (137 Languages)
  • Morphological analysis (135 Languages)
  • Transliteration (69 Languages)

Developer

  • Rami Al-Rfou @ rmyeid gmail com

Quick Tutorial

Language Detection

Language Detected: Code=fr, Name=French
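A report line like the one above can be produced with a few calls. The following is a minimal sketch, assuming polyglot and its language-detection dependencies are installed and that `Text` from `polyglot.text` exposes a `language` attribute with `code` and `name` fields. The helper names `describe_language` and `detect_language` are illustrative and not part of polyglot.

```python
def describe_language(code, name):
    """Format a detected language as a single report line."""
    return "Language Detected: Code={}, Name={}".format(code, name)


def detect_language(text):
    """Detect the language of `text` and return the report line."""
    # Imported lazily so this module still loads where polyglot is absent.
    from polyglot.text import Text

    language = Text(text).language
    return describe_language(language.code, language.name)
```

Passing a French sentence to `detect_language` should give a line of the same shape as the one shown above; the input that produced that line is not given here.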

Tokenization

[u'Beautiful', u'is', u'better', u'than', u'ugly', u'.', u'Explicit', u'is', u'better', u'than', u'implicit', u'.', u'Simple', u'is', u'better', u'than', u'complex', u'.']

[Sentence("Beautiful is better than ugly."), Sentence("Explicit is better than implicit."), Sentence("Simple is better than complex.")]
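The two lists above are word-level and sentence-level views of the same text. A minimal sketch of how they might be obtained, assuming polyglot is installed and that `Text` exposes `words` and `sentences`; the input string is reconstructed from the output above, and `tokenize` is a hypothetical helper name.

```python
# Input reconstructed from the three sentences shown in the output above.
ZEN = (
    "Beautiful is better than ugly. "
    "Explicit is better than implicit. "
    "Simple is better than complex."
)


def tokenize(text):
    """Return (words, sentences) for `text` using polyglot."""
    # Lazy import: requires polyglot to be installed.
    from polyglot.text import Text

    parsed = Text(text)
    return list(parsed.words), list(parsed.sentences)
```

Note that punctuation comes back as separate tokens in the word list, which is why each full stop appears as its own entry.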

Part of Speech Tagging

Word            POS Tag
------------------------------
O               DET
primeiro        ADJ
uso             NOUN
de              ADP
desobediência   NOUN
civil           ADJ
em              ADP
massa           NOUN
ocorreu         ADJ
em              ADP
setembro        NOUN
de              ADP
1906            NUM
.               PUNCT
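The word/tag listing above pairs each token with a universal part-of-speech tag. A minimal sketch of producing such a listing, assuming polyglot and a Portuguese POS model are installed and that `Text.pos_tags` yields `(word, tag)` pairs; `format_table` and `pos_table` are hypothetical helper names.

```python
def format_table(rows, left="Word", right="POS Tag", width=16):
    """Render (word, tag) pairs as a two-column plain-text table."""
    lines = [left.ljust(width) + right, "-" * 30]
    for word, tag in rows:
        lines.append(str(word).ljust(width) + str(tag))
    return "\n".join(lines)


def pos_table(text):
    """Tag `text` with polyglot and return the formatted table."""
    # Lazy import: requires polyglot plus the POS model for the language.
    from polyglot.text import Text

    return format_table(Text(text).pos_tags)
```

The sentence behind the listing can be read off its first column: "O primeiro uso de desobediência civil em massa ocorreu em setembro de 1906."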

Named Entity Recognition

[I-LOC([u'Gro\xdfbritannien']), I-PER([u'Gandhi'])]
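Each entity above carries a tag such as `I-LOC` or `I-PER`. A minimal sketch of extracting entities and attaching a readable label, assuming polyglot and a German NER model are installed and that `Text.entities` yields chunks with a `tag` attribute. The `I-ORG` entry and the English label names are assumptions added for illustration; only `I-LOC` and `I-PER` appear in the output above.

```python
# Readable names for entity tags; I-ORG is assumed, not shown above.
ENTITY_LABELS = {
    "I-LOC": "location",
    "I-ORG": "organization",
    "I-PER": "person",
}


def label_for(tag):
    """Map an entity tag to a readable label, or 'unknown'."""
    return ENTITY_LABELS.get(tag, "unknown")


def named_entities(text):
    """Return (tag, label, entity) triples found in `text`."""
    # Lazy import: requires polyglot plus the NER model for the language.
    from polyglot.text import Text

    return [(e.tag, label_for(e.tag), e) for e in Text(text).entities]
```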

Polarity

Word            Polarity
------------------------------
Beautiful        0
is               0
better           1
than             0
ugly            -1
.                0
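Word polarity here is a per-token score of -1, 0, or 1. A minimal sketch of reading the scores and keeping only the opinion-bearing words, assuming polyglot and an English sentiment model are installed and that each word in `Text.words` has a `polarity` attribute; both function names are hypothetical.

```python
def opinion_words(scored_words):
    """Keep only the (word, polarity) pairs with a non-zero score."""
    return [(word, score) for word, score in scored_words if score != 0]


def word_polarities(text):
    """Return (word, polarity) pairs for every token in `text`."""
    # Lazy import: requires polyglot plus the sentiment model.
    from polyglot.text import Text

    return [(word, word.polarity) for word in Text(text).words]
```

Applied to the scores listed above, `opinion_words` would keep only "better" and "ugly".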

Embeddings

Neighbors (Synonyms) of Obama

Bush Reagan Clinton Ahmadinejad Nixon Karzai McCain Biden Huckabee Lula

The first 10 dimensions out of the 256 dimensions

[-2.57382345 1.52175975 0.51070285 1.08678675 -0.74386948 -1.18616164 2.92784619 -0.25694436 -1.40958667 -2.39675403]
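The neighbors and the vector both come from the same word embedding. A minimal sketch, assuming polyglot and the English embeddings are installed and that `Word` from `polyglot.text` exposes `neighbors` and `vector`; `first_dimensions` and `neighbors_and_vector` are hypothetical helper names.

```python
def first_dimensions(vector, count=10):
    """Return the first `count` components of an embedding vector."""
    return list(vector[:count])


def neighbors_and_vector(token, language="en"):
    """Return (nearest neighbors, embedding vector) for `token`."""
    # Lazy import: requires polyglot plus embeddings for the language.
    from polyglot.text import Word

    word = Word(token, language=language)
    return list(word.neighbors), word.vector
```

For the example above, `token` would be "Obama", and `first_dimensions` would be applied to the returned 256-dimensional vector.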

Morphology

[u'Pre', u'process', u'ing']
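The list above splits one word into its morphemes. A minimal sketch, assuming polyglot and an English morphology model are installed and that `Word` exposes a `morphemes` attribute; `segment` and `join_morphemes` are hypothetical helper names.

```python
def join_morphemes(morphemes, separator="+"):
    """Join morphemes into one string with a visible boundary marker."""
    return separator.join(morphemes)


def segment(token, language="en"):
    """Return the morphemes polyglot finds in `token`."""
    # Lazy import: requires polyglot plus the morphology model.
    from polyglot.text import Word

    return list(Word(token, language=language).morphemes)
```

Joining the morphemes shown above with an empty separator gives back "Preprocessing", the word that was segmented.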

Transliteration

препрокессинг
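The output above is an English word rendered in Cyrillic script. A minimal sketch, assuming polyglot and the English-to-Russian transliteration models are installed and that `Transliterator` in `polyglot.transliteration` accepts `source_lang` and `target_lang`; `is_cyrillic` is a hypothetical sanity check, not part of polyglot.

```python
def is_cyrillic(text):
    """True if `text` is non-empty and entirely in the Cyrillic block."""
    return bool(text) and all("\u0400" <= ch <= "\u04ff" for ch in text)


def transliterate(token, source="en", target="ru"):
    """Transliterate `token` from the source to the target language."""
    # Lazy import: requires polyglot plus the transliteration models.
    from polyglot.transliteration import Transliterator

    transliterator = Transliterator(source_lang=source, target_lang=target)
    return transliterator.transliterate(token)
```

Transliteration maps sounds rather than meanings, so the result is a phonetic rendering, not a translation.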