
# **Part of Speech (POS) Tagging**

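For this, I used the spaCy library. First, I imported the library, loaded the `en_core_web_sm` pipeline, and passed the input text through it. Finally, I printed each token's `word.pos_` (coarse part-of-speech tag) and `word.tag_` (fine-grained tag) attributes, along with `spacy.explain(word.tag_)`, which returns a short description of the tag.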

*The spaCy pipeline comes with a pre-trained statistical model that uses the context (the surrounding words) to predict the correct POS tag for each word.

In [None]:
import spacy
sp = spacy.load('en_core_web_sm')
data = sp("Xi Jinping is a Chinese politician who has served as General Secretary of the Chinese Communist Party (CCP) and Chairman of the Central Military Commission (CMC) since 2012, and President of the People's Republic of China (PRC) since 2013. He has been the paramount leader of China, the most prominent political leader in the country, since 2012. The son of Chinese Communist veteran Xi Zhongxun, he was exiled to rural Yanchuan County as a teenager following his father's purge during the Cultural Revolution and lived in a cave in the village of Liangjiahe, where he joined the CCP and worked as the party secretary.")
for word in data:
    print(f'{word.text:{12}} {word.pos_:{10}} {word.tag_:{8}} {spacy.explain(word.tag_)}')

Xi           PROPN      NNP      noun, proper singular
Jinping      PROPN      NNP      noun, proper singular
is           AUX        VBZ      verb, 3rd person singular present
a            DET        DT       determiner
Chinese      ADJ        JJ       adjective
politician   NOUN       NN       noun, singular or mass
who          PRON       WP       wh-pronoun, personal
has          AUX        VBZ      verb, 3rd person singular present
served       VERB       VBN      verb, past participle
as           SCONJ      IN       conjunction, subordinating or preposition
General      PROPN      NNP      noun, proper singular
Secretary    PROPN      NNP      noun, proper singular
of           ADP        IN       conjunction, subordinating or preposition
the          DET        DT       determiner
Chinese      PROPN      NNP      noun, proper singular
Communist    PROPN      NNP      noun, proper singular
Party        PROPN      NNP      noun, proper singular
(            PUNCT      -LRB-    le
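The output above already shows the context effect described earlier: *Chinese* is tagged `ADJ` in "a Chinese politician" but `PROPN` in "the Chinese Communist Party". Below is a minimal plain-Python sketch that finds words with more than one tag and counts the coarse tags. It assumes the (token, `pos_`) pairs were copied by hand from the first 17 rows of the output; inside the notebook the same pairs would come from `(word.text, word.pos_) for word in data`.

```python
from collections import Counter, defaultdict

# (token, coarse POS) pairs copied from the first 17 rows of the output above
tagged = [
    ("Xi", "PROPN"), ("Jinping", "PROPN"), ("is", "AUX"), ("a", "DET"),
    ("Chinese", "ADJ"), ("politician", "NOUN"), ("who", "PRON"),
    ("has", "AUX"), ("served", "VERB"), ("as", "SCONJ"),
    ("General", "PROPN"), ("Secretary", "PROPN"), ("of", "ADP"),
    ("the", "DET"), ("Chinese", "PROPN"), ("Communist", "PROPN"),
    ("Party", "PROPN"),
]

# Collect every tag each word received; more than one tag means context mattered
tags_by_word = defaultdict(set)
for word, pos in tagged:
    tags_by_word[word].add(pos)
ambiguous = {w: sorted(t) for w, t in tags_by_word.items() if len(t) > 1}
print(ambiguous)  # {'Chinese': ['ADJ', 'PROPN']}

# Frequency of each coarse POS tag, most common first
pos_counts = Counter(pos for _, pos in tagged)
for pos, n in pos_counts.most_common():
    print(f"{pos:6} {n}")
```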

# **Named entity recognition (NER)**



In [None]:
import spacy
from spacy import displacy  # displacy.render() with style='ent' highlights the detected named entities inline
nlp = spacy.load('en_core_web_sm')
doc = nlp("Xi Jinping is a Chinese politician who has served as General Secretary of the Chinese Communist Party (CCP) and Chairman of the Central Military Commission (CMC) since 2012, and President of the People's Republic of China (PRC) since 2013. He has been the paramount leader of China, the most prominent political leader in the country, since 2012. The son of Chinese Communist veteran Xi Zhongxun, he was exiled to rural Yanchuan County as a teenager following his father's purge during the Cultural Revolution and lived in a cave in the village of Liangjiahe, where he joined the CCP and worked as the party secretary.")
displacy.render(doc, style='ent', jupyter=True)
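The `'ent'` view draws each entity over its character span in the original text. A rough text-only equivalent is sketched below in plain Python; `mark_entities` is a hypothetical helper (not part of spaCy) that assumes non-overlapping `(start_char, end_char, label)` spans, which is what `doc.ents` provides.

```python
def mark_entities(text, spans):
    """Wrap each (start_char, end_char, label) span in brackets with its label.

    Hypothetical helper: assumes the spans do not overlap.
    """
    out, last = [], 0
    for start, end, label in sorted(spans):
        out.append(text[last:start])               # plain text before the entity
        out.append(f"[{text[start:end]} {label}]")  # the entity and its label
        last = end
    out.append(text[last:])                        # plain text after the last entity
    return "".join(out)


sentence = "Xi Jinping is a Chinese politician."
spans = [(0, 10, "PERSON"), (16, 23, "NORP")]
print(mark_entities(sentence, spans))
# [Xi Jinping PERSON] is a [Chinese NORP] politician.
```

With the notebook's `doc`, the spans would be `[(e.start_char, e.end_char, e.label_) for e in doc.ents]`.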

**Another way**

In [None]:
import spacy
sp = spacy.load('en_core_web_sm')
data = sp("Xi Jinping is a Chinese politician who has served as General Secretary of the Chinese Communist Party (CCP) and Chairman of the Central Military Commission (CMC) since 2012, and President of the People's Republic of China (PRC) since 2013. He has been the paramount leader of China, the most prominent political leader in the country, since 2012. The son of Chinese Communist veteran Xi Zhongxun, he was exiled to rural Yanchuan County as a teenager following his father's purge during the Cultural Revolution and lived in a cave in the village of Liangjiahe, where he joined the CCP and worked as the party secretary.")
print(data.ents)  # doc.ents is a tuple of Span objects, one per named entity found in the text
for entity in data.ents:
    print("{:30s}\t{:30s}\t".format(entity.text,entity.label_))

(Xi Jinping, Chinese, the Chinese Communist Party, CCP, the Central Military Commission, 2012, the People's Republic of China, PRC, 2013, China, 2012, Chinese, Communist, Xi Zhongxun, Yanchuan County, the Cultural Revolution, Liangjiahe, CCP)
Xi Jinping                    	PERSON                        	
Chinese                       	NORP                          	
the Chinese Communist Party   	ORG                           	
CCP                           	ORG                           	
the Central Military Commission	ORG                           	
2012                          	DATE                          	
the People's Republic of China	GPE                           	
PRC                           	GPE                           	
2013                          	DATE                          	
China                         	GPE                           	
2012                          	DATE                          	
Chinese                       	NORP                          	
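The listing repeats some entities (`2012`, `Chinese`). A minimal sketch that groups the distinct entity texts by label is shown below. It uses only the 12 `(text, label)` pairs visible in the truncated output above; in the notebook the input would be `(entity.text, entity.label_) for entity in data.ents`. `group_by_label` is a hypothetical helper name.

```python
from collections import defaultdict

# (text, label) pairs copied from the entity listing above (output is truncated)
entities = [
    ("Xi Jinping", "PERSON"), ("Chinese", "NORP"),
    ("the Chinese Communist Party", "ORG"), ("CCP", "ORG"),
    ("the Central Military Commission", "ORG"), ("2012", "DATE"),
    ("the People's Republic of China", "GPE"), ("PRC", "GPE"),
    ("2013", "DATE"), ("China", "GPE"), ("2012", "DATE"), ("Chinese", "NORP"),
]


def group_by_label(pairs):
    """Map each entity label to its distinct entity texts, in first-seen order."""
    groups = defaultdict(list)
    for text, label in pairs:
        if text not in groups[label]:  # skip repeated mentions
            groups[label].append(text)
    return dict(groups)


for label, texts in group_by_label(entities).items():
    print(f"{label:8} {', '.join(texts)}")
```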


# **Lemmatization**

In [None]:
import nltk
nltk.download('punkt')
nltk.download('wordnet')
from nltk.stem import WordNetLemmatizer
wordnet_lemmatizer = WordNetLemmatizer()
data = "Xi Jinping is a Chinese politician who has served as General Secretary of the Chinese Communist Party (CCP) and Chairman of the Central Military Commission (CMC) since 2012, and President of the People's Republic of China (PRC) since 2013. He has been the paramount leader of China, the most prominent political leader in the country, since 2012. The son of Chinese Communist veteran Xi Zhongxun, he was exiled to rural Yanchuan County as a teenager following his father's purge during the Cultural Revolution and lived in a cave in the village of Liangjiahe, where he joined the CCP and worked as the party secretary."
nltk_tokens = nltk.word_tokenize(data)
print("{0:20}{1:30}".format("Original","Lemmatization"))
for w in nltk_tokens:
    # pos="v" makes WordNet treat every token as a verb ("is" -> "be", "has" -> "have"),
    # so nouns and adjectives are mostly returned unchanged.
    print("{0:20}{1:30}".format(w, wordnet_lemmatizer.lemmatize(w, pos="v")))

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package wordnet to /root/nltk_data...
[nltk_data]   Package wordnet is already up-to-date!
Original            Lemmatization                 
Xi                  Xi                            
Jinping             Jinping                       
is                  be                            
a                   a                             
Chinese             Chinese                       
politician          politician                    
who                 who                           
has                 have                          
served              serve                         
as                  as                            
General             General                       
Secretary           Secretary                     
of                  of                            
the                 the                           
Chinese  
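Because every token above is lemmatized as a verb, nouns and adjectives get no real lemmatization. One common alternative is to tag the tokens first and convert each Penn Treebank tag to the single-letter POS that `WordNetLemmatizer.lemmatize` accepts (`'n'`, `'v'`, `'a'`, `'r'`). A minimal sketch of that mapping follows; `penn_to_wordnet` is a hypothetical helper, and treating everything that is not an adjective, verb, or adverb as a noun is an assumption (noun is also WordNet's default).

```python
def penn_to_wordnet(tag):
    """Map a Penn Treebank tag to the POS letter used by WordNetLemmatizer.

    Hypothetical helper: a prefix-based heuristic, not an official mapping.
    """
    if tag.startswith("J"):
        return "a"  # JJ, JJR, JJS -> adjective
    if tag.startswith("V"):
        return "v"  # VB, VBD, VBG, VBN, VBP, VBZ -> verb
    if tag.startswith("R"):
        return "r"  # RB, RBR, RBS -> adverb
    return "n"      # nouns and everything else


for tag in ["VBZ", "JJ", "RB", "NNP", "DT"]:
    print(tag, penn_to_wordnet(tag))
```

In the cell above it would be used as `wordnet_lemmatizer.lemmatize(w, pos=penn_to_wordnet(t))` for each `(w, t)` in `nltk.pos_tag(nltk_tokens)`, which additionally needs the `averaged_perceptron_tagger` download used in the Parsing section.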

# **Co-reference resolution**

This cell gave me a runtime error and crashed the session because of a version incompatibility (the cell pins spaCy 2.0.12 for the `en_coref_md` neuralcoref model).

In [5]:
import numpy as np
import pandas as pd

MODEL_URL = "https://github.com/huggingface/neuralcoref-models/releases/" \
            "download/en_coref_md-3.0.0/en_coref_md-3.0.0.tar.gz"

!pip install spacy==2.0.12
!pip install {MODEL_URL}
!python -m spacy download en_core_web_md

import en_coref_md
nlp = en_coref_md.load()

test_sent = "Xi Jinping is a Chinese politician who has served as General Secretary of the Chinese Communist Party (CCP) and Chairman of the Central Military Commission (CMC) since 2012, and President of the People's Republic of China (PRC) since 2013. He has been the paramount leader of China, the most prominent political leader in the country, since 2012. The son of Chinese Communist veteran Xi Zhongxun, he was exiled to rural Yanchuan County as a teenager following his father's purge during the Cultural Revolution and lived in a cave in the village of Liangjiahe, where he joined the CCP and worked as the party secretary."
doc = nlp(test_sent)
for x in doc.ents:
    if x._.coref_cluster:
        print(x._.coref_cluster)

Collecting https://github.com/huggingface/neuralcoref-models/releases/download/en_coref_md-3.0.0/en_coref_md-3.0.0.tar.gz
  Downloading https://github.com/huggingface/neuralcoref-models/releases/download/en_coref_md-3.0.0/en_coref_md-3.0.0.tar.gz (161.3MB)
     |████████████████████████████████| 161.3MB 73kB/s 
Building wheels for collected packages: en-coref-md
  Building wheel for en-coref-md (setup.py) ... done
  Created wheel for en-coref-md: filename=en_coref_md-3.0.0-cp36-cp36m-linux_x86_64.whl size=163510688 sha256=d8bf51dffe312e79b7deb0084d08f503376dfd6174052d90ef1268c7d350f2be
  Stored in directory: /root/.cache/pip/wheels/aa/a3/8f/9df13c027b75169bcca62682563e9823bb213c72a2cc3efed8
Successfully built en-coref-md

[93m    Linking successful[0m
    /usr/local/lib/python3.6/dist-packages/en_core_web_md -->
    /usr/local/lib/python3.6/dist-packages/spacy/data/en_core_web_md

    You can now load the model via spacy.load('en_core_web_md')

Xi Jinping: [Xi Ji
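A coreference cluster groups the mentions that refer to the same entity (for example *Xi Jinping* and the later *He*) around one main mention. The plain-Python sketch below shows only the final substitution step, replacing each mention with its cluster's main mention. The cluster is written by hand in a hypothetical `(start, end)` token-index format; finding the clusters is the part a model such as neuralcoref does, and possessives like *his* are not handled.

```python
def resolve(tokens, clusters):
    """Replace every non-main mention with its cluster's main mention.

    tokens:   list of word strings
    clusters: list of (main_span, other_spans); each span is a (start, end)
              pair of token indices, end exclusive (hypothetical format).
    """
    replacements = {}  # start index of a mention -> (end index, main mention tokens)
    for (m_start, m_end), mentions in clusters:
        main_text = tokens[m_start:m_end]
        for start, end in mentions:
            replacements[start] = (end, main_text)

    out, i = [], 0
    while i < len(tokens):
        if i in replacements:
            end, main_text = replacements[i]
            out.extend(main_text)  # substitute the main mention
            i = end                # skip the tokens of the replaced mention
        else:
            out.append(tokens[i])
            i += 1
    return " ".join(out)


tokens = "Xi Jinping is a Chinese politician . He has been the paramount leader of China .".split()
clusters = [((0, 2), [(7, 8)])]  # "Xi Jinping" <- "He"
print(resolve(tokens, clusters))
# Xi Jinping is a Chinese politician . Xi Jinping has been the paramount leader of China .
```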

# **Parsing (noun-phrase chunking)**


In [None]:
import nltk
nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')
from nltk.tag import pos_tag
from nltk.tokenize import word_tokenize
data = "Xi Jinping is a Chinese politician who has served as General Secretary of the Chinese Communist Party (CCP) and Chairman of the Central Military Commission (CMC) since 2012, and President of the People's Republic of China (PRC) since 2013. He has been the paramount leader of China, the most prominent political leader in the country, since 2012. The son of Chinese Communist veteran Xi Zhongxun, he was exiled to rural Yanchuan County as a teenager following his father's purge during the Cultural Revolution and lived in a cave in the village of Liangjiahe, where he joined the CCP and worked as the party secretary."
new_token = pos_tag(word_tokenize(data))  # list of (word, Penn Treebank tag) pairs
new_token

# Chunk rule for a noun phrase: an optional determiner, then zero or more adjectives, then exactly one singular noun (NN).
grammar = r"NP: {<DT>?<JJ>*<NN>}"
chunk_parser = nltk.RegexpParser(grammar)  # RegexpParser uses regular-expression patterns over POS tags to decide which tokens to group into chunks.
result = chunk_parser.parse(new_token)
result  # In Colab, drawing the tree as an image fails with the TclError below, and the text form of the Tree is shown instead.

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     /root/nltk_data...
[nltk_data]   Unzipping taggers/averaged_perceptron_tagger.zip.


TclError: ignored

Tree('S', [Tree('NP', [('Xi', 'NN')]), ('Jinping', 'NNP'), ('is', 'VBZ'), Tree('NP', [('a', 'DT'), ('Chinese', 'JJ'), ('politician', 'NN')]), ('who', 'WP'), ('has', 'VBZ'), ('served', 'VBN'), ('as', 'IN'), ('General', 'NNP'), ('Secretary', 'NNP'), ('of', 'IN'), ('the', 'DT'), ('Chinese', 'NNP'), ('Communist', 'NNP'), ('Party', 'NNP'), ('(', '('), ('CCP', 'NNP'), (')', ')'), ('and', 'CC'), ('Chairman', 'NNP'), ('of', 'IN'), ('the', 'DT'), ('Central', 'NNP'), ('Military', 'NNP'), ('Commission', 'NNP'), ('(', '('), ('CMC', 'NNP'), (')', ')'), ('since', 'IN'), ('2012', 'CD'), (',', ','), ('and', 'CC'), ('President', 'NNP'), ('of', 'IN'), ('the', 'DT'), ('People', 'NNP'), ("'s", 'POS'), ('Republic', 'NNP'), ('of', 'IN'), ('China', 'NNP'), ('(', '('), ('PRC', 'NNP'), (')', ')'), ('since', 'IN'), ('2013', 'CD'), ('.', '.'), ('He', 'PRP'), ('has', 'VBZ'), ('been', 'VBN'), Tree('NP', [('the', 'DT'), ('paramount', 'JJ'), ('leader', 'NN')]), ('of', 'IN'), ('China', 'NNP'), (',', ','), ('the', 'DT
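The chunk rule works by matching a regular expression against the sequence of tags rather than the words. The sketch below reproduces that idea in plain Python with `re` for the same rule `<DT>?<JJ>*<NN>`. It is not NLTK's implementation; `chunk_noun_phrases` is a hypothetical helper, and the input is a subset of the (word, tag) pairs copied from the output above. It finds the same three NP chunks that appear in that tree.

```python
import re

# Regex form of the chunk rule NP: {<DT>?<JJ>*<NN>} over a "<TAG><TAG>..." string.
# "<NN>" includes the closing ">", so it does not match "<NNP>" or "<NNS>".
NP_PATTERN = re.compile(r"(?:<DT>)?(?:<JJ>)*<NN>")


def chunk_noun_phrases(tagged):
    """Return the word lists matched by <DT>?<JJ>*<NN>, left to right."""
    tag_string = ""
    starts = []  # character offset where each token's "<TAG>" begins
    for _, tag in tagged:
        starts.append(len(tag_string))
        tag_string += f"<{tag}>"

    chunks = []
    for match in NP_PATTERN.finditer(tag_string):
        # Tokens whose "<TAG>" starts inside the match belong to this chunk
        words = [word for (word, _), pos in zip(tagged, starts)
                 if match.start() <= pos < match.end()]
        chunks.append(words)
    return chunks


# A subset of the (word, tag) pairs from the output above
tagged = [
    ("Xi", "NN"), ("Jinping", "NNP"), ("is", "VBZ"), ("a", "DT"),
    ("Chinese", "JJ"), ("politician", "NN"), ("who", "WP"), ("has", "VBZ"),
    ("served", "VBN"), ("He", "PRP"), ("has", "VBZ"), ("been", "VBN"),
    ("the", "DT"), ("paramount", "JJ"), ("leader", "NN"), ("of", "IN"),
    ("China", "NNP"),
]
print(chunk_noun_phrases(tagged))
# [['Xi'], ['a', 'Chinese', 'politician'], ['the', 'paramount', 'leader']]
```

Because the rule ends in `<NN>`, sequences such as "the Chinese Communist Party" (tagged `DT NNP NNP NNP`) are not chunked; adding `<NNP>` to the pattern would be needed to capture proper-noun phrases.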