## Natural Language Processing with Polyglot

#### Installation on Unix
+ sudo apt-get install python-numpy libicu-dev
+ pip install polyglot

#### Installation on Windows

##### Download the PyCLD2 and PyICU wheels from
- https://www.lfd.uci.edu/~gohlke/pythonlibs/
- pip install pycld2-0.31-cp36-cp36m-win_amd64.whl
- pip install PyICU-1.9.8-cp36-cp36m-win_amd64.whl
- pip install Morfessor-2.0.4-py2.py3-none-any.whl
- git clone https://github.com/aboSamoor/polyglot.git
- cd polyglot
- python setup.py install
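
Once the steps above are done, it can help to confirm that the packages are importable before downloading any models. The sketch below is only a sanity check using the standard library; the import names (`polyglot`, `pycld2`, `icu` for PyICU, `morfessor`) are assumptions about how each package is imported, and `check_modules` is a hypothetical helper, not part of polyglot.

```python
import importlib.util

# Assumed import names for each dependency (PyICU is imported as `icu`).
REQUIRED_MODULES = ["polyglot", "pycld2", "icu", "morfessor"]


def check_modules(names):
    """Map each top-level module name to True if Python can locate it."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, found in check_modules(REQUIRED_MODULES).items():
        print(f"{name:10s} {'ok' if found else 'MISSING'}")
```

Any module reported as `MISSING` needs to be installed before the rest of the notebook will run.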


#### Downloading the Models
- polyglot download embeddings2.en
- polyglot download ner2.en
- polyglot download sentiment2.en
- polyglot download pos2.en
- polyglot download morph2.en
- polyglot download transliteration2.ar

#### Uses and Applications
+ Fundamentals (basics) of NLP
+ Transliteration
+ Named Entity Recognition
+ Sentiment Analysis

##### NB: The learning curve is similar to that of the TextBlob API

#### Tokenization
+ Splitting text into words

In [47]:
# Load packages
import polyglot
from polyglot.text import Text, Word

In [48]:
# Word Tokens
docx = Text(u"He likes reading and painting")


In [49]:
docx.words

WordList(['He', 'likes', 'reading', 'and', 'painting'])

In [50]:
docx2 = Text(u"He exclaimed, 'what're you doing? Reading?'.")

In [51]:
docx2.words

WordList(['He', 'exclaimed', ',', "'", "what're", 'you', 'doing', '?', 'Reading', '?', "'", '.'])
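
To make the splitting rule in the output above more concrete, here is a rough standard-library approximation. It is not how polyglot tokenizes internally; `simple_word_tokenize` and its regular expression are assumptions chosen for illustration. Words (optionally with an internal apostrophe, as in "what're") stay together, and every other non-space character becomes its own token.

```python
import re

# A word may contain internal apostrophes; any other non-space,
# non-word character is emitted as a separate token.
TOKEN_PATTERN = re.compile(r"\w+(?:'\w+)*|[^\w\s]")


def simple_word_tokenize(text):
    """Split text into word and punctuation tokens (illustrative only)."""
    return TOKEN_PATTERN.findall(text)


print(simple_word_tokenize("He exclaimed, 'what're you doing? Reading?'."))
```

On this particular sentence the sketch yields the same twelve tokens as `docx2.words`, but it should not be expected to agree with polyglot on arbitrary input.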

In [52]:
# Sentence tokens (there is no space after "painting.", so no sentence break is found there)
docx3 = Text(u"He likes reading and painting.He exclaimed, 'what're you doing? Reading?'.")

In [53]:
docx3.sentences

[Sentence("He likes reading and painting.He exclaimed, 'what're you doing?"),
 Sentence("Reading?'.")]
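
The first two sentences stay glued together because the input has no space after "painting.". A minimal sketch of that behaviour, assuming a splitter that only breaks on whitespace following `.`, `!` or `?` (this is a simplification and not polyglot's actual segmenter; `simple_sentence_split` is a hypothetical helper):

```python
import re


def simple_sentence_split(text):
    """Break text at whitespace that follows sentence-ending punctuation."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if part]


glued = "He likes reading and painting.He exclaimed, 'what're you doing? Reading?'."
spaced = "He likes reading and painting. He exclaimed, 'what're you doing? Reading?'."

print(len(simple_sentence_split(glued)))   # one break is missed
print(len(simple_sentence_split(spaced)))  # the extra space restores it
```

Under this assumption, adding the missing space is enough to recover the first sentence boundary.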

#### Parts of Speech Tagging
+ polyglot download embeddings2.en
+ polyglot download pos2.en
+ pos_tags


In [54]:
docx

Text("He likes reading and painting")

In [55]:
docx.pos_tags
    

[('He', 'PRON'),
 ('likes', 'VERB'),
 ('reading', 'VERB'),
 ('and', 'CONJ'),
 ('painting', 'NOUN')]
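
Since `pos_tags` is a plain list of `(word, tag)` pairs, it is easy to post-process. The sketch below copies the output above into a literal list so that it runs without polyglot; `group_by_tag` is a hypothetical helper name.

```python
from collections import defaultdict

# Copied from the docx.pos_tags output above.
pos_tags = [
    ("He", "PRON"),
    ("likes", "VERB"),
    ("reading", "VERB"),
    ("and", "CONJ"),
    ("painting", "NOUN"),
]


def group_by_tag(tagged_words):
    """Collect words under their part-of-speech tag, keeping input order."""
    groups = defaultdict(list)
    for word, tag in tagged_words:
        groups[tag].append(word)
    return dict(groups)


print(group_by_tag(pos_tags)["VERB"])
```

The same function can be called directly on `docx.pos_tags` when polyglot and its models are available.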

#### Language Detection
+ polyglot.detect
+ language.name
+ language.code

In [56]:
docx

Text("He likes reading and painting")

In [57]:
docx.language.name

'English'

In [58]:
docx.language.code

'en'

In [59]:
from polyglot.detect import Detector

In [60]:
en_text = "He is a student "
fr_text = "Il est un étudiant"
ru_text = "Он студент"

In [67]:
detect_en = Detector(en_text)
detect_fr = Detector(fr_text)
detect_ru = Detector(ru_text)

Detector is not able to detect the language reliably.
Detector is not able to detect the language reliably.


In [63]:
print(detect_en.language)

name: English     code: en       confidence:  94.0 read bytes:   704


In [66]:
print(detect_fr.language)

name: French      code: fr       confidence:  95.0 read bytes:   870


In [68]:
print(detect_ru.language)

name: Serbian     code: sr       confidence:  95.0 read bytes:   614
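
Note that the Russian sentence is reported as Serbian, and that the detector printed reliability warnings for these very short inputs. One plausible reason is that two words carry little evidence beyond the writing system, and both Russian and Serbian can be written in Cyrillic. The sketch below only illustrates that point with the standard library; it is not how `Detector` works, and `dominant_script` is a hypothetical helper that reads the script from the first word of each character's Unicode name.

```python
import unicodedata
from collections import Counter


def dominant_script(text):
    """Return the most common script among the letters of text, or None."""
    counts = Counter()
    for ch in text:
        if ch.isalpha():
            # e.g. 'CYRILLIC SMALL LETTER EN' -> 'CYRILLIC'
            counts[unicodedata.name(ch, "UNKNOWN").split()[0]] += 1
    return counts.most_common(1)[0][0] if counts else None


for sample in ["He is a student ", "Il est un étudiant", "Он студент"]:
    print(repr(sample), dominant_script(sample))
```

The script narrows the candidates to Cyrillic-script languages but cannot separate them, so longer input text is the safer way to get a trustworthy detection.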


#### Sentiment Analysis
+ polarity

In [71]:
docx4 = Text(u"He hates reading and playing")

In [69]:
docx

Text("He likes reading and painting")

In [70]:
docx.polarity

1.0

In [72]:
docx4.polarity

-1.0
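
Scores of exactly 1.0 and -1.0 are consistent with a simple lexicon approach in which each word is scored +1, 0 or -1 and the text polarity is the mean of the non-zero scores. The sketch below works under that assumption; the two-word `TOY_LEXICON` and the `mean_polarity` helper are hypothetical and stand in for the downloaded `sentiment2.en` model.

```python
# Hypothetical mini-lexicon: +1 for positive words, -1 for negative words.
TOY_LEXICON = {"likes": 1, "hates": -1}


def mean_polarity(words, lexicon):
    """Average the non-zero word scores; return 0.0 if no word is scored."""
    scores = [lexicon[w.lower()] for w in words if lexicon.get(w.lower(), 0) != 0]
    return sum(scores) / len(scores) if scores else 0.0


print(mean_polarity("He likes reading and painting".split(), TOY_LEXICON))
print(mean_polarity("He hates reading and playing".split(), TOY_LEXICON))
```

With one sentiment-bearing word per sentence, the mean is simply that word's score, which is why both examples land on the extremes.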

#### Named Entities
+ entities

In [73]:
docx5 = Text(u"John Jones was a FBI detector")

In [74]:
docx5.entities

[I-PER(['John', 'Jones']), I-ORG(['FBI'])]
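
Each entity in the output is a run of adjacent tokens that share a tag such as `I-PER` or `I-ORG`. A minimal sketch of that grouping step, assuming per-token tags are already known (the `tags` list below is written by hand to match the output above, and `group_entities` is a hypothetical helper, not polyglot's API):

```python
from itertools import groupby

tokens = ["John", "Jones", "was", "a", "FBI", "detector"]
# Hand-written tags for illustration; "O" marks tokens outside any entity.
tags = ["I-PER", "I-PER", "O", "O", "I-ORG", "O"]


def group_entities(tokens, tags):
    """Merge consecutive tokens with the same non-"O" tag into chunks."""
    chunks = []
    for tag, group in groupby(zip(tokens, tags), key=lambda pair: pair[1]):
        if tag != "O":
            chunks.append((tag, [token for token, _ in group]))
    return chunks


print(group_entities(tokens, tags))
```

One limitation of this simple grouping is that two different entities of the same type standing next to each other would be merged into one chunk.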

#### Morphology
+ A morpheme is the smallest grammatical unit in a language.
+ A morpheme may or may not stand alone, whereas a word, by definition, is freestanding.
+ morphemes

In [75]:
docx6 = Text(u"preprocessing")

In [76]:
docx6.morphemes

WordList(['pre', 'process', 'ing'])

#### Transliteration

In [77]:
# Load the Transliterator and set up English-to-French transliteration
from polyglot.transliteration import Transliterator
translit = Transliterator(source_lang='en',target_lang='fr')

In [78]:
translit.transliterate(u"working")

'working'

In [None]:
# Jesse JCharis
# J-Secur1ty
# Jesus Saves @JCharisTect