# Part of Speech (PoS) Tagging
Every language is made up of parts of speech such as verbs, nouns, adverbs and adjectives.
PoS tagging assigns one of the language's parts of speech to every token in a text.
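
To make the idea concrete, the sketch below tags a sentence with a hand-written lookup table. The `TOY_LEXICON` dictionary and the `toy_tag` function are hypothetical names made up for this illustration. The taggers used in the rest of this notebook are trained statistical models that look at context, not at a fixed word list.

```python
# Toy illustration only: a hand-written word -> tag table (hypothetical),
# using coarse tag names such as DET, NOUN and VERB.
TOY_LEXICON = {
    "the": "DET",
    "a": "DET",
    "cat": "NOUN",
    "book": "NOUN",
    "reads": "VERB",
    "quietly": "ADV",
}


def toy_tag(sentence):
    """Return (word, tag) pairs; words missing from the table get the tag 'X'."""
    return [(word, TOY_LEXICON.get(word.lower(), "X")) for word in sentence.split()]


tagged = toy_tag("The cat reads a book quietly")
for word, tag in tagged:
    print(f"{word:10} {tag}")
```

A lookup table cannot handle words that play different roles in different sentences, which is exactly the case a trained tagger is meant to solve.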
## Using the spaCy library

In [1]:
import spacy
nlp = spacy.load('en_core_web_sm')

In [2]:
doc1 = nlp('''The son of a salesman who later operated an electrochemical factory, 
instein was born in the German Empire, but moved to Switzerland in 1895 and 
renounced his German citizenship in 1896. Specializing in physics and
mathematics, he received his academic teaching diploma from the Swiss 
Federal Polytechnic School (German: eidgenössische polytechnische Schule) 
in Zürich in 1900. The following year, he acquired Swiss citizenship, which
he kept for his entire life. After initially struggling to find work, from 
1902 to 1909 he was employed as a patent examiner at the Swiss Patent Office
in Bern.

''')

# First method: print every POS-related attribute of each token.
# Attributes without a trailing underscore are integer IDs; those with one are strings.
for token in doc1:
    print('Words is   : ' , token.text)  # the token text
    print('POS is   : ' , token.pos ,'===',token.pos_  , '===', spacy.explain(token.pos_))  # coarse-grained POS
    print('Dep is   : ' , token.dep , '===',token.dep_, '===', spacy.explain(token.dep_))  # syntactic dependency relation
    print('Tag is   : ' , token.tag , '===',token.tag_, '===', spacy.explain(token.tag_))  # fine-grained POS tag
    print('-----------------------')

Words is   :  The
POS is   :  90 === DET === determiner
Dep is   :  415 === det === determiner
Tag is   :  15267657372422890137 === DT === determiner
-----------------------
Words is   :  son
POS is   :  92 === NOUN === noun
Dep is   :  430 === nsubjpass === nominal subject (passive)
Tag is   :  15308085513773655218 === NN === noun, singular or mass
-----------------------
Words is   :  of
POS is   :  85 === ADP === adposition
Dep is   :  443 === prep === prepositional modifier
Tag is   :  1292078113972184607 === IN === conjunction, subordinating or preposition
-----------------------
Words is   :  a
POS is   :  90 === DET === determiner
Dep is   :  415 === det === determiner
Tag is   :  15267657372422890137 === DT === determiner
-----------------------
Words is   :  salesman
POS is   :  92 === NOUN === noun
Dep is   :  439 === pobj === object of preposition
Tag is   :  15308085513773655218 === NN === noun, singular or mass
-----------------------
Words is   :  who
POS is   :  95 === PRON === pronoun

Tag is   :  15267657372422890137 === DT === determiner
-----------------------
Words is   :  following
POS is   :  84 === ADJ === adjective
Dep is   :  402 === amod === adjectival modifier
Tag is   :  10554686591937588953 === JJ === adjective (English), other noun-modifier (Chinese)
-----------------------
Words is   :  year
POS is   :  92 === NOUN === noun
Dep is   :  428 === npadvmod === noun phrase as adverbial modifier
Tag is   :  15308085513773655218 === NN === noun, singular or mass
-----------------------
Words is   :  ,
POS is   :  97 === PUNCT === punctuation
Dep is   :  445 === punct === punctuation
Tag is   :  2593208677638477497 === , === punctuation mark, comma
-----------------------
Words is   :  he
POS is   :  95 === PRON === pronoun
Dep is   :  429 === nsubj === nominal subject
Tag is   :  13656873538139661788 === PRP === pronoun, personal
-----------------------
Words is   :  acquired
POS is   :  100 === VERB === verb
Dep is   :  8206900633647566924 === ROOT === None


In [3]:
# Second method: one aligned row per token
for token in doc1:
    # 10, 8 and 6 are minimum column widths
    print(f'{token.text:{10}} {token.pos_:{8}} {token.tag_:{6}} {spacy.explain(token.tag_)}')

The        DET      DT     determiner
son        NOUN     NN     noun, singular or mass
of         ADP      IN     conjunction, subordinating or preposition
a          DET      DT     determiner
salesman   NOUN     NN     noun, singular or mass
who        PRON     WP     wh-pronoun, personal
later      ADV      RB     adverb
operated   VERB     VBD    verb, past tense
an         DET      DT     determiner
electrochemical ADJ      JJ     adjective (English), other noun-modifier (Chinese)
factory    NOUN     NN     noun, singular or mass
,          PUNCT    ,      punctuation mark, comma

          SPACE    _SP    whitespace
instein    NOUN     NN     noun, singular or mass
was        AUX      VBD    verb, past tense
born       VERB     VBN    verb, past participle
in         ADP      IN     conjunction, subordinating or preposition
the        DET      DT     determiner
German     PROPN    NNP    noun, proper singular
Empire     PROPN    NNP    noun, proper singular
,          PUNCT    ,      punctuation mark, comma
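
The format specs in the loop above, such as `{token.text:{10}}`, set a minimum field width. The inner braces are a nested replacement field, so the width can also come from a variable. A minimal sketch with a few rows copied from the output above (no spaCy needed) shows the padding and why `electrochemical` pushes its row out of alignment: a value longer than the width is never truncated. The variable names are chosen here for illustration.

```python
# Rows copied from the output above: (text, coarse POS, fine-grained tag).
rows = [
    ("The", "DET", "DT"),
    ("son", "NOUN", "NN"),
    ("electrochemical", "ADJ", "JJ"),
]

text_width = 10  # same minimum width as {token.text:{10}}

# Fixed width: the 15-character word overflows its 10-character column.
lines = [f"{text:{text_width}} {pos:{8}} {tag:{6}}" for text, pos, tag in rows]
for line in lines:
    print(line)

# Choosing the width from the data keeps every column aligned.
auto_width = max(len(text) for text, _, _ in rows)
aligned = [f"{text:{auto_width}} {pos:{8}} {tag:{6}}" for text, pos, tag in rows]
for line in aligned:
    print(line)
```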

### The fine-grained tag can also tell the tense of a verb

In [4]:
# Here "read" is in the present tense
doc = nlp('I read book now.')
r = doc[1]
print(f'{r.text:{10}} {r.pos_:{8}} {r.tag_:{6}} {spacy.explain(r.tag_)}')

read       VERB     VBP    verb, non-3rd person singular present


In [5]:
# Here "read" is in the simple past
doc = nlp('I read a book on NLP.')
r = doc[1]
print(f'{r.text:{1}} {r.pos_:{8}} {r.tag_:{6}} {spacy.explain(r.tag_)}')

read VERB     VBD    verb, past tense
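
In both sentences the word is spelled `read` and the coarse label is `VERB`. The tense comes from the fine-grained Penn Treebank tag, which the model picks from the context (and can therefore get wrong). As a quick reference, the sketch below lists the six verb tags of that tag set. The `VERB_TAGS` dictionary and the `describe_verb_tag` helper are names invented here, and the descriptions are paraphrased, not taken from `spacy.explain`.

```python
# The six Penn Treebank verb tags with short, paraphrased descriptions.
VERB_TAGS = {
    "VB": "base form",
    "VBD": "past tense",
    "VBG": "gerund or present participle",
    "VBN": "past participle",
    "VBP": "present, non-3rd person singular",
    "VBZ": "present, 3rd person singular",
}


def describe_verb_tag(tag):
    """Look up a tag; anything outside the table is reported as a non-verb tag."""
    return VERB_TAGS.get(tag, "not a Penn Treebank verb tag")


# The two tags seen in the outputs above.
for tag in ("VBP", "VBD"):
    print(f"{tag:{6}} {describe_verb_tag(tag)}")
```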


In [6]:
# Count how many tokens of each coarse POS type docm contains
docm=nlp("""I want to read,
Some beautiful book ,
like rssail gayr morssala""")
POS_counts = docm.count_by(spacy.attrs.POS)
for k,v in sorted(POS_counts.items()):
    print(f'{k}. {docm.vocab[k].text:{5}}: {v}')  # e.g. 2 verbs: want, read

84. ADJ  : 1
85. ADP  : 1
90. DET  : 1
92. NOUN : 3
94. PART : 1
95. PRON : 1
96. PROPN: 1
97. PUNCT: 2
100. VERB : 2
103. SPACE: 2
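
`count_by` returns a plain dictionary that maps integer attribute IDs to frequencies, so `sorted(POS_counts.items())` orders the rows by ID, not by count. Below is a small sketch of ordering by frequency instead. The IDs, names and counts are copied by hand from the output above so that the snippet runs without spaCy; in the notebook, the names would come from the vocabulary lookup used in the loop above.

```python
# Copied from the output above: POS ID -> count, and POS ID -> name.
pos_counts = {84: 1, 85: 1, 90: 1, 92: 3, 94: 1, 95: 1, 96: 1, 97: 2, 100: 2, 103: 2}
pos_names = {84: "ADJ", 85: "ADP", 90: "DET", 92: "NOUN", 94: "PART",
             95: "PRON", 96: "PROPN", 97: "PUNCT", 100: "VERB", 103: "SPACE"}

# Sort by count (descending), then by ID to break ties deterministically.
by_frequency = sorted(pos_counts.items(), key=lambda item: (-item[1], item[0]))

for pos_id, count in by_frequency:
    print(f"{pos_names[pos_id]:{5}}: {count}")

# The counts add up to the number of tokens, whitespace tokens included.
total_tokens = sum(pos_counts.values())
print("total tokens:", total_tokens)
```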


In [7]:
# Same count, this time for the fine-grained tags
TAG_counts = docm.count_by(spacy.attrs.TAG)
for k,v in sorted(TAG_counts.items()):
    print(f'{k}. {docm.vocab[k].text:{4}}: {v}')

1292078113972184607. IN  : 1
2593208677638477497. ,   : 2
5595707737748328492. TO  : 1
6893682062797376370. _SP : 2
9188597074677201817. VBP : 1
10554686591937588953. JJ  : 1
13656873538139661788. PRP : 1
14200088355797579614. VB  : 1
15267657372422890137. DT  : 1
15308085513773655218. NN  : 3
15794550382381185553. NNP : 1


## Using the NLTK library

In [8]:
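import nltk
from nltk.tokenize import PunktSentenceTokenizer
# word_tokenize and pos_tag need NLTK data packages; if a LookupError is raised,
# install the resource named in the error message with nltk.download(...)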

In [9]:
text = 'Moses supposes his toeses are roses but moses supposes erroneously'
print(nltk.word_tokenize(text))
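# pos_tag returns (word, tag) pairs; spacy.explain is reused here only to describe the Penn Treebank tags
for w , m in nltk.pos_tag(nltk.word_tokenize(text)):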
    print(f'word : ({w}), type : ({m}) , means :  ({spacy.explain(m)})')

['Moses', 'supposes', 'his', 'toeses', 'are', 'roses', 'but', 'moses', 'supposes', 'erroneously']
word : (Moses), type : (NNS) , means :  (noun, plural)
word : (supposes), type : (VBZ) , means :  (verb, 3rd person singular present)
word : (his), type : (PRP$) , means :  (pronoun, possessive)
word : (toeses), type : (NNS) , means :  (noun, plural)
word : (are), type : (VBP) , means :  (verb, non-3rd person singular present)
word : (roses), type : (NNS) , means :  (noun, plural)
word : (but), type : (CC) , means :  (conjunction, coordinating)
word : (moses), type : (VBZ) , means :  (verb, 3rd person singular present)
word : (supposes), type : (NNS) , means :  (noun, plural)
word : (erroneously), type : (RB) , means :  (adverb)
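
The output shows that the tagger is not reliable on this tongue twister: `Moses` is tagged as a plural noun, and the second `moses supposes` gets a verb tag followed by a noun tag. One way to spot such cases is to group the tags by lower-cased word and list the words that received more than one tag. The pairs below are copied from the output above, so the sketch needs neither NLTK nor its data files; with NLTK available, `tagged` would be the result of `nltk.pos_tag(nltk.word_tokenize(text))`.

```python
from collections import defaultdict

# (word, tag) pairs copied from the output above.
tagged = [
    ("Moses", "NNS"), ("supposes", "VBZ"), ("his", "PRP$"), ("toeses", "NNS"),
    ("are", "VBP"), ("roses", "NNS"), ("but", "CC"), ("moses", "VBZ"),
    ("supposes", "NNS"), ("erroneously", "RB"),
]

# Collect the set of tags each word received, ignoring case.
tags_by_word = defaultdict(set)
for word, tag in tagged:
    tags_by_word[word.lower()].add(tag)

# Keep only the words that were tagged in more than one way.
inconsistent = {word: sorted(tags) for word, tags in tags_by_word.items() if len(tags) > 1}

for word, tags in inconsistent.items():
    print(f"{word:{10}} {tags}")
```

A word with several tags is not necessarily an error, since many words are genuinely ambiguous, so the result is best read as a list of candidates to inspect.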


In [10]:
text = '''
Thomas Gradgrind, sir.  A man of realities.  A man of facts and calculations.  A man who proceeds upon the principle that
two and two are four, and nothing over, and who is not to be talked into allowing for anything over.  Thomas Gradgrind, 
sir—peremptorily Thomas—Thomas Gradgrind.  With a rule and a pair of scales, and the multiplication table always in his pocket, 
sir, ready to weigh and measure any parcel of human nature, and tell you exactly what it comes to.  It is a mere question of
figures, a case of simple arithmetic.  You might hope to get some other nonsensical belief into the head of George Gradgrind, or Augustus Gradgrind, or John Gradgrind, or Joseph Gradgrind (all supposititious, non-existent persons), but into the head of Thomas Gradgrind—no, sir!

In such terms Mr. Gradgrind always mentally introduced himself, whether to his private circle of acquaintance, or to the public in general.  In such terms, no doubt, substituting the words ‘boys and girls,’ for ‘sir,’ Thomas Gradgrind now presented Thomas Gradgrind to the little pitchers before him, who were to be filled so full of facts.
'''
# Sentence segmentation: train a Punkt tokenizer on the text itself, then split the text with it
custom_sent_tokenizer = PunktSentenceTokenizer(text)
tokenized = custom_sent_tokenizer.tokenize(text)
tokenized[:10]  # first ten sentences

['\nThomas Gradgrind, sir.',
 'A man of realities.',
 'A man of facts and calculations.',
 'A man who proceeds upon the principle that\ntwo and two are four, and nothing over, and who is not to be talked into allowing for anything over.',
 'Thomas Gradgrind, \nsir—peremptorily Thomas—Thomas Gradgrind.',
 'With a rule and a pair of scales, and the multiplication table always in his pocket, \nsir, ready to weigh and measure any parcel of human nature, and tell you exactly what it comes to.',
 'It is a mere question of\nfigures, a case of simple arithmetic.',
 'You might hope to get some other nonsensical belief into the head of George Gradgrind, or Augustus Gradgrind, or John Gradgrind, or Joseph Gradgrind (all supposititious, non-existent persons), but into the head of Thomas Gradgrind—no, sir!',
 'In such terms Mr. Gradgrind always mentally introduced himself, whether to his private circle of acquaintance, or to the public in general.',
 'In such terms, no doubt, substituting the words
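
Note that Punkt keeps `Mr. Gradgrind` inside one sentence in the output above. For comparison, here is a deliberately naive splitter built on a regular expression. The `naive_split` function is made up for this illustration and is not part of NLTK; it breaks after every `.`, `!` or `?` that is followed by whitespace, including the one in the abbreviation.

```python
import re


def naive_split(text):
    """Split after '.', '!' or '?' whenever whitespace follows."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if part]


# Two sentences taken from the passage above.
sample = ("A man of facts and calculations.  "
          "In such terms Mr. Gradgrind always mentally introduced himself.")

sentences = naive_split(sample)
for sentence in sentences:
    print(repr(sentence))
```

The naive version returns three pieces for these two sentences because it cuts after `Mr.`, which is the kind of mistake a trained sentence tokenizer is designed to avoid.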

In [11]:
for i in tokenized[:5]:  # first five sentences
    for w , m in nltk.pos_tag(nltk.word_tokenize(i)):  # w is the word, m is its POS tag
        print(f'word : ({w}), type : ({m}) , means :  ({spacy.explain(m)})')
        print('-----------------------------------------------')

word : (Thomas), type : (NNP) , means :  (noun, proper singular)
-----------------------------------------------
word : (Gradgrind), type : (NNP) , means :  (noun, proper singular)
-----------------------------------------------
word : (,), type : (,) , means :  (punctuation mark, comma)
-----------------------------------------------
word : (sir), type : (NN) , means :  (noun, singular or mass)
-----------------------------------------------
word : (.), type : (.) , means :  (punctuation mark, sentence closer)
-----------------------------------------------
word : (A), type : (DT) , means :  (determiner)
-----------------------------------------------
word : (man), type : (NN) , means :  (noun, singular or mass)
-----------------------------------------------
word : (of), type : (IN) , means :  (conjunction, subordinating or preposition)
-----------------------------------------------
word : (realities), type : (NNS) , means :  (noun, plural)
------------------------------------------

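## PoS tagging of Arabic text
The results are very poor and contain many obvious mistakes. Note that the cells below run the English spaCy model en_core_web_sm loaded at the top of the notebook, not NLTK.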

In [12]:
doc1 = nlp('''ضمت مؤلفات الخوارزمي كتاب الجمع والتفريق
في الحساب الهندي، وكتاب رسم الربع المعمور، وكتاب تقويم البلدان، وكتاب العمل بالأسطرلاب، 
وكتاب "صورة الأرض " الذي اعتمد فيه على كتاب المجسطي لبطليموس مع إضافات وشروح
وتعليقات، وأعاد كتابة كتاب الفلك الهندي المعروف باسم "السند هند الكبير" الذي ترجم إلى اللغة
العربية زمن الخليفة المنصور فأعاد الخوارزمي كتابته وأضاف إليه وسمي كتابه "السند هند الصغير".

وقد عرض في كتاب المختصر في حساب الجبر والمقابلة أول حل منهجي
للمعادلات الخطية والمعادلات التربيعية مستعملا في ذلك الطريقة المعروفة باسم إكمال المربع. ويعتبر مؤسس علم الجبر،
(اللقب الذي يتقاسمه مع ديوفانتوس) في القرن الثاني عشر، ولقد قدمت ترجمات اللاتينية عن حسابه على الأرقام الهندية، 
النظام العشري إلى العالم الغربي. نقح الخوارزمي كتاب الجغرافيا لكلاوديوس بطليموس وكتب في علم الفلك والتنجيم.
''')


for token in doc1:
    print('Words is   : ' , token.text)
    print('POS is   : ' , token.pos ,'===',token.pos_  , '===', spacy.explain(token.pos_))
    print('Dep is   : ' , token.dep , '===',token.dep_, '===', spacy.explain(token.dep_))
    print('Tag is   : ' , token.tag , '===',token.tag_, '===', spacy.explain(token.tag_))
    print('-----------------------')

Words is   :  ضمت
POS is   :  96 === PROPN === proper noun
Dep is   :  7037928807040764755 === compound === compound
Tag is   :  15794550382381185553 === NNP === noun, proper singular
-----------------------
Words is   :  مؤلفات
POS is   :  96 === PROPN === proper noun
Dep is   :  429 === nsubj === nominal subject
Tag is   :  15794550382381185553 === NNP === noun, proper singular
-----------------------
Words is   :  الخوارزمي
POS is   :  100 === VERB === verb
Dep is   :  8206900633647566924 === ROOT === None
Tag is   :  17109001835818727656 === VBD === verb, past tense
-----------------------
Words is   :  كتاب
POS is   :  96 === PROPN === proper noun
Dep is   :  429 === nsubj === nominal subject
Tag is   :  15794550382381185553 === NNP === noun, proper singular
-----------------------
Words is   :  الجمع
POS is   :  96 === PROPN === proper noun
Dep is   :  408 === ccomp === clausal complement
Tag is   :  15794550382381185553 === NNP === noun, proper singular
-----------------------
W

Dep is   :  416 === dobj === direct object
Tag is   :  15794550382381185553 === NNP === noun, proper singular
-----------------------
Words is   :  الخليفة
POS is   :  96 === PROPN === proper noun
Dep is   :  403 === appos === appositional modifier
Tag is   :  15794550382381185553 === NNP === noun, proper singular
-----------------------
Words is   :  المنصور
POS is   :  96 === PROPN === proper noun
Dep is   :  7037928807040764755 === compound === compound
Tag is   :  15794550382381185553 === NNP === noun, proper singular
-----------------------
Words is   :  فأعاد
POS is   :  96 === PROPN === proper noun
Dep is   :  7037928807040764755 === compound === compound
Tag is   :  15794550382381185553 === NNP === noun, proper singular
-----------------------
Words is   :  الخوارزمي
POS is   :  96 === PROPN === proper noun
Dep is   :  8206900633647566924 === ROOT === None
Tag is   :  15794550382381185553 === NNP === noun, proper singular
-----------------------
Words is   :  كتابته
POS is   : 

-----------------------
Words is   :  مؤسس
POS is   :  96 === PROPN === proper noun
Dep is   :  7037928807040764755 === compound === compound
Tag is   :  15794550382381185553 === NNP === noun, proper singular
-----------------------
Words is   :  علم
POS is   :  96 === PROPN === proper noun
Dep is   :  7037928807040764755 === compound === compound
Tag is   :  15794550382381185553 === NNP === noun, proper singular
-----------------------
Words is   :  الجبر
POS is   :  96 === PROPN === proper noun
Dep is   :  7037928807040764755 === compound === compound
Tag is   :  15794550382381185553 === NNP === noun, proper singular
-----------------------
Words is   :  ،
POS is   :  96 === PROPN === proper noun
Dep is   :  7037928807040764755 === compound === compound
Tag is   :  15794550382381185553 === NNP === noun, proper singular
-----------------------
Words is   :  

POS is   :  103 === SPACE === space
Dep is   :  429 === nsubj === nominal subject
Tag is   :  6893682062797376370 === _SP === whitespace

In [13]:
for token in doc1:
    print(f'{token.text:{10}} {token.pos_:{8}} {token.tag_:{6}} {spacy.explain(token.tag_)}')

ضمت        PROPN    NNP    noun, proper singular
مؤلفات     PROPN    NNP    noun, proper singular
الخوارزمي  VERB     VBD    verb, past tense
كتاب       PROPN    NNP    noun, proper singular
الجمع      PROPN    NNP    noun, proper singular
والتفريق   PROPN    NNP    noun, proper singular

          SPACE    _SP    whitespace
في         ADV      RB     adverb
الحساب     PROPN    NNP    noun, proper singular
الهندي     PROPN    NNP    noun, proper singular
،          PROPN    NNP    noun, proper singular
وكتاب      PROPN    NNP    noun, proper singular
رسم        PROPN    NNP    noun, proper singular
الربع      PROPN    NNP    noun, proper singular
المعمور    PROPN    NNP    noun, proper singular
،          PROPN    NNP    noun, proper singular
وكتاب      PROPN    NNP    noun, proper singular
تقويم      PROPN    NNP    noun, proper singular
البلدان    PROPN    NNP    noun, proper singular
،          PROPN    NNP    noun, proper singular
وكتاب      PROPN    NNP    noun, proper singular
ال

In [14]:
POS_counts = doc1.count_by(spacy.attrs.POS)
for k,v in sorted(POS_counts.items()):
    print(f'{k}. {doc1.vocab[k].text:{5}}: {v}')

84. ADJ  : 3
85. ADP  : 3
86. ADV  : 2
92. NOUN : 9
96. PROPN: 108
97. PUNCT: 12
100. VERB : 5
101. X    : 1
103. SPACE: 9
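
The counts make the problem measurable. The sketch below recomputes the share of each tag from the numbers printed above (copied by hand, so no model is needed). A text in which roughly seven tokens out of ten are labelled as proper nouns, commas included, is a strong sign that the model is guessing.

```python
# POS name -> count, copied from the output above.
pos_counts = {"ADJ": 3, "ADP": 3, "ADV": 2, "NOUN": 9, "PROPN": 108,
              "PUNCT": 12, "VERB": 5, "X": 1, "SPACE": 9}

total = sum(pos_counts.values())
shares = {pos: count / total for pos, count in pos_counts.items()}

# Print the tags from most to least frequent, as percentages.
for pos, share in sorted(shares.items(), key=lambda item: -item[1]):
    print(f"{pos:{5}}: {share:6.1%}")
```

This is expected behaviour, since en_core_web_sm was trained on English text only; tagging Arabic properly requires a pipeline trained on Arabic data.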
