### Part-of-Speech Tagging

Part-of-speech (POS) tagging is the process of assigning labels, known as POS tags, to the words in a sentence. Each tag tells us the word's part of speech (noun, verb, adjective, and so on).

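There are two types of POS tags:

1) Universal POS tags: a coarse tag set shared across languages (e.g. NOUN, VERB, ADJ)

2) Detailed POS tags: a fine-grained, language-specific tag set (for English, Penn Treebank-style tags such as NN, VBD, JJ)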

In [None]:
# POS tagging can be done with spaCy, NLTK, or Stanza

### POS tagging using spaCy

What does spacy.load('en_core_web_sm') do?

Once you've downloaded and installed a trained pipeline (for example with `python -m spacy download en_core_web_sm`), you can load it with `spacy.load`. This returns a `Language` object containing all the components and data needed to process text. By convention we call it `nlp`.
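A minimal sketch, assuming the pipeline was installed as a regular Python package (which is how `python -m spacy download` installs it). `pipeline_installed` is a helper name of our own: it checks whether the package is importable before calling `spacy.load`, and otherwise prints the install command instead of failing with an error.

```python
import importlib.util


def pipeline_installed(name: str) -> bool:
    """Return True if a package called `name` can be imported.

    Hypothetical helper: spaCy's trained pipelines (e.g. en_core_web_sm) are
    ordinary Python packages, so loading one by package name needs this to be True.
    """
    return importlib.util.find_spec(name) is not None


name = "en_core_web_sm"
if pipeline_installed(name):
    import spacy

    nlp = spacy.load(name)   # Language object holding the pipeline components
    print(nlp.pipe_names)    # names of the loaded components
else:
    print(f"Install it first: python -m spacy download {name}")
```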

In [17]:
import spacy

# Load the small English pipeline (install it first: python -m spacy download en_core_web_sm)
nlp = spacy.load('en_core_web_sm')
# Note: the double space in "has  took" is part of the input; it produces the SPACE token (_SP) in the output
text1 = "According to Margaret Doody, the novel has  took a continuous and comprehensive history of about two thousand years, with its origins in the Ancient Greek and Roman novel, in Chivalric romance, and in the tradition of the Italian renaissance novella.[6] The ancient romance form was revived by Romanticism, especially the historical romances of Walter Scott and the Gothic novel.[7] Some, including M. H. Abrams and Walter Scott, have argued that a novel is a fiction narrative that displays a realistic depiction of the state of a society, while the romance encompasses any fictitious narrative that emphasizes marvellous or uncommon incidents."

# token.pos_ = coarse Universal POS tag, token.tag_ = fine-grained (detailed) tag
for token in nlp(text1):
    print(token.text, ':', token.pos_, ':', token.tag_)

According : VERB : VBG
to : ADP : IN
Margaret : PROPN : NNP
Doody : PROPN : NNP
, : PUNCT : ,
the : DET : DT
novel : NOUN : NN
has : AUX : VBZ
  : SPACE : _SP
took : VERB : VBD
a : DET : DT
continuous : ADJ : JJ
and : CCONJ : CC
comprehensive : ADJ : JJ
history : NOUN : NN
of : ADP : IN
about : ADV : RB
two : NUM : CD
thousand : NUM : CD
years : NOUN : NNS
, : PUNCT : ,
with : ADP : IN
its : PRON : PRP$
origins : NOUN : NNS
in : ADP : IN
the : DET : DT
Ancient : PROPN : NNP
Greek : PROPN : NNP
and : CCONJ : CC
Roman : ADJ : JJ
novel : NOUN : NN
, : PUNCT : ,
in : ADP : IN
Chivalric : ADJ : JJ
romance : NOUN : NN
, : PUNCT : ,
and : CCONJ : CC
in : ADP : IN
the : DET : DT
tradition : NOUN : NN
of : ADP : IN
the : DET : DT
Italian : ADJ : JJ
renaissance : NOUN : NN
novella.[6 : NUM : CD
] : X : XX
The : DET : DT
ancient : ADJ : JJ
romance : NOUN : NN
form : NOUN : NN
was : AUX : VBD
revived : VERB : VBN
by : ADP : IN
Romanticism : PROPN : NNP
, : PUNCT : ,
especially : ADV : RB
the : DET

In [None]:
# XX means "unknown". It is used for partial words when the context does not make clear what the word is; here spaCy assigns it to the stray ']' left over from the citation marker '[6]'.

In [None]:
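# .pos_ returns the coarse-grained Universal POS tag (e.g. VERB, NOUN)
# .tag_ returns the fine-grained, language-specific tag (for English, Penn Treebank-style tags such as VBD, NN)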
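To see how the two tag sets relate, here is a sketch that collapses detailed tags into universal ones. The mapping table is hand-written from (tag_, pos_) pairs that appear in the spaCy output above and is partial and illustrative only: spaCy decides the universal tag in context (for example 'was' is tagged VBD but gets pos_ AUX, not VERB), and the 'X' fallback for unknown tags is our own choice.

```python
# Partial, hand-written mapping taken from (tag_, pos_) pairs in the output above.
# Illustrative only: a real tagger chooses the universal tag in context.
DETAILED_TO_UNIVERSAL = {
    "NN": "NOUN", "NNS": "NOUN", "NNP": "PROPN",
    "VBD": "VERB", "VBG": "VERB", "VBN": "VERB",
    "JJ": "ADJ", "RB": "ADV", "IN": "ADP", "DT": "DET",
    "CC": "CCONJ", "CD": "NUM", "PRP$": "PRON",
}


def to_universal(tag: str) -> str:
    """Collapse a detailed tag to a universal tag; 'X' (other) if it is not in the table."""
    return DETAILED_TO_UNIVERSAL.get(tag, "X")


def group_by_universal(tags):
    """Show which detailed tags fall under each universal tag."""
    groups = {}
    for tag in tags:
        groups.setdefault(to_universal(tag), set()).add(tag)
    return groups


# (word, tag_) pairs copied from the output above
tagged = [("novel", "NN"), ("years", "NNS"), ("Doody", "NNP"),
          ("took", "VBD"), ("According", "VBG"), ("revived", "VBN")]
for word, tag in tagged:
    print(f"{word:10} {tag:4} -> {to_universal(tag)}")
print(group_by_universal(tag for _, tag in tagged))
```

Several detailed tags (NN, NNS) collapse into one universal tag (NOUN), which is why .tag_ carries more information than .pos_.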

### Dependency Parsing
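Dependency parsing is the process of analyzing the grammatical structure of a sentence based on the dependencies between its words: each word is linked to a head word by a labeled relation such as nsubj (nominal subject) or dobj (direct object).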

In [18]:
for token in nlp(text1):
    print(token.text,':',token.dep_, ':',token.head.text)
    

According : prep : took
to : prep : According
Margaret : compound : Doody
Doody : pobj : to
, : punct : took
the : det : novel
novel : nsubj : took
has : aux : took
  : nsubj : took
took : ROOT : took
a : det : history
continuous : amod : history
and : cc : continuous
comprehensive : conj : continuous
history : dobj : took
of : prep : history
about : advmod : thousand
two : compound : thousand
thousand : nummod : years
years : pobj : of
, : punct : took
with : prep : took
its : poss : origins
origins : pobj : with
in : prep : origins
the : det : novel
Ancient : compound : Greek
Greek : amod : novel
and : cc : Greek
Roman : conj : Greek
novel : pobj : in
, : punct : took
in : prep : took
Chivalric : compound : romance
romance : pobj : in
, : punct : in
and : cc : in
in : conj : in
the : det : tradition
tradition : pobj : in
of : prep : tradition
the : det : renaissance
Italian : amod : renaissance
renaissance : pobj : of
novella.[6 : dep : took
] : punct : took
The : det : form
ancient 

In [None]:
# .dep_ returns the dependency label of a token
# .head.text returns the text of the token's head word
# The word 'took' has the dependency label ROOT. This label is assigned to the word that is the head of
# other words in the sentence but is not the child of any other word; spaCy sets the root's head to the
# token itself, which is why the output shows 'took : ROOT : took'. The root is usually the main verb, as 'took' is here.

In [19]:
from spacy import displacy
displacy.render(nlp(text1),jupyter=True)

In the rendered visualization above, the arrows represent dependencies between pairs of words: the word at the arrowhead is the child (dependent), and the word at the tail of the arrow is its head. The root word can be the head of several words but is not the child of any other word. You can see that the word ‘took’ has several outgoing arrows but no incoming ones, so it is the root. A useful property of the root is that if you follow the head links upward from any word in its sentence, you always reach the root, no matter which word you start from.
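The two properties described above (the root has no incoming arc, and following heads from any word ends at the root) can be checked on plain data. This sketch copies (text, dep_, head index) rows from the dependency output above for the opening words of text1; it is a fragment, since the space token, commas, and later tokens are left out. `find_root` and `path_to_root` are our own helper names; in spaCy itself, `sent.root` and `token.ancestors` give the same information.

```python
# (text, dep_, head index) rows copied from the dependency output above.
# Fragment only: the space token, commas and later tokens are omitted.
TOKENS = [
    ("According", "prep", 7),     # 0
    ("to", "prep", 0),            # 1
    ("Margaret", "compound", 3),  # 2
    ("Doody", "pobj", 1),         # 3
    ("the", "det", 5),            # 4
    ("novel", "nsubj", 7),        # 5
    ("has", "aux", 7),            # 6
    ("took", "ROOT", 7),          # 7: spaCy makes the root its own head
    ("a", "det", 10),             # 8
    ("continuous", "amod", 10),   # 9
    ("history", "dobj", 7),       # 10
]


def find_root(tokens):
    """Index of the only token whose head is itself (no incoming arc from another word)."""
    roots = [i for i, (_, _, head) in enumerate(tokens) if head == i]
    if len(roots) != 1:
        raise ValueError(f"expected one root, found {len(roots)}")
    return roots[0]


def path_to_root(tokens, start):
    """Follow head links upward from `start` and return the words visited."""
    path = [start]
    while tokens[path[-1]][2] != path[-1]:
        head = tokens[path[-1]][2]
        if head in path:
            raise ValueError("cycle found: not a valid dependency tree")
        path.append(head)
    return [tokens[i][0] for i in path]


root = find_root(TOKENS)
print("ROOT:", TOKENS[root][0])
print(" -> ".join(path_to_root(TOKENS, 2)))  # Margaret -> Doody -> to -> According -> took
```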

### Constituency Parsing
Constituency parsing is the process of analyzing a sentence by breaking it down into sub-phrases, also known as constituents. Each sub-phrase belongs to a grammatical category such as NP (noun phrase) or VP (verb phrase), and constituents can be nested inside one another.
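Constituency parsers such as CoreNLP typically print trees as Penn Treebank-style bracketed strings. The sketch below reads such a string into nested `(label, children)` tuples and lists the words of every constituent with a given label. The tree is hand-written for illustration (for the first sentence used with CoreNLP below) and is not actual parser output; `parse_tree`, `leaves`, and `constituents` are our own helper names.

```python
import re


def parse_tree(text):
    """Parse a bracketed tree string into nested (label, children) tuples; leaves are strings."""
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def node():
        nonlocal pos
        if tokens[pos] != "(":
            raise ValueError(f"expected '(' at token {pos}")
        label = tokens[pos + 1]
        pos += 2
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:
                children.append(tokens[pos])  # a leaf word
                pos += 1
        pos += 1  # consume ')'
        return (label, children)

    tree = node()
    if pos != len(tokens):
        raise ValueError("trailing tokens after the tree")
    return tree


def leaves(tree):
    """All words under a node, left to right."""
    if isinstance(tree, str):
        return [tree]
    return [word for child in tree[1] for word in leaves(child)]


def constituents(tree, label):
    """Text of every constituent with the given label, in pre-order."""
    if isinstance(tree, str):
        return []
    found = [" ".join(leaves(tree))] if tree[0] == label else []
    for child in tree[1]:
        found.extend(constituents(child, label))
    return found


# Hand-written example tree (illustrative, not real parser output)
TREE = parse_tree("""
(ROOT (S (NP (NP (DT The) (NNS clothes))
             (PP (IN in) (NP (DT the) (NN dressing) (NN room))))
         (VP (VBP are) (ADJP (JJ gorgeous)))
         (. .)))
""")
for label in ("NP", "VP"):
    print(label, constituents(TREE, label))
```

The nested NPs show what dependency output cannot: "The clothes" is itself a phrase inside the larger phrase "The clothes in the dressing room".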

Now that you know what constituency parsing is: spaCy does not provide an official API for it. Therefore, we use Stanford CoreNLP through the `stanfordcorenlp` Python wrapper, which starts a local CoreNLP server and returns Penn Treebank-style bracketed parse trees. Note that the parser described in "Constituency Parsing with a Self-Attentive Encoder" (ACL 2018) is implemented by the Berkeley Neural Parser, not by CoreNLP.

In [None]:
# The Berkeley Neural Parser and NLTK can also be used for this purpose

In [37]:
!pip install stanfordcorenlp

Collecting stanfordcorenlp
  Downloading stanfordcorenlp-3.9.1.1-py2.py3-none-any.whl (5.7 kB)
Installing collected packages: stanfordcorenlp
Successfully installed stanfordcorenlp-3.9.1.1


In [None]:
# Download 'stanford-corenlp-full-2018-10-05' from https://stanfordnlp.github.io/CoreNLP/download.html and unzip it; CoreNLP is written in Java, so Java must also be installed

In [38]:
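from stanfordcorenlp import StanfordCoreNLP

# Path to the unzipped CoreNLP folder; a separate name avoids overwriting the spaCy `nlp` pipeline
corenlp = StanfordCoreNLP('stanford-corenlp-full-2018-10-05', lang='en')
try:
    sentence = 'The clothes in the dressing room are gorgeous. Can I have one?'
    tree_str = corenlp.parse(sentence)  # bracketed constituency tree
    print(tree_str)
finally:
    corenlp.close()  # shut down the background Java server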

FileNotFoundError: [WinError 2] The system cannot find the file specified
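`FileNotFoundError: [WinError 2]` is Windows reporting that a file or program it was asked to open or run does not exist. With the `stanfordcorenlp` wrapper, a likely cause is that the `java` executable is not on the PATH (the wrapper starts CoreNLP as a Java server), or that the CoreNLP folder is not at the path passed to `StanfordCoreNLP`. The helper below, `corenlp_setup_problems` (a hypothetical name, standard library only), is a sketch that checks both conditions before you try again.

```python
import os
import shutil


def corenlp_setup_problems(corenlp_dir):
    """Return likely reasons the CoreNLP wrapper cannot start (empty list if none found)."""
    problems = []
    if shutil.which("java") is None:
        problems.append("Java was not found on PATH; CoreNLP runs on the JVM")
    if not os.path.isdir(corenlp_dir):
        problems.append(f"{corenlp_dir!r} is not a directory; check where CoreNLP was unzipped")
    return problems


for problem in corenlp_setup_problems("stanford-corenlp-full-2018-10-05"):
    print("-", problem)
```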

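Difference between dependency and constituency parsing:
Dependency parsing shows only the relationships between individual words (each word is linked to its head), while constituency parsing shows the full phrase structure of the sentence, i.e. how words group into nested constituents such as NP and VP.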

In [None]:
#https://spacy.io/models/en
#https://spacy.io/usage/models
#http://www.surdeanu.info/mihai/teaching/ista555-fall13/readings/PennTreebankConstituents.html
#https://arxiv.org/abs/1805.01052