

# spaCy Basics
spaCy (https://spacy.io/) is an open-source Python library for processing large volumes of text: it tokenizes, tags, parses, and recognizes named entities. Separate trained models are available for specific languages (English, French, German, etc.).

In [0]:
import numpy as np
import pandas as pd
import spacy

In [0]:
nlp = spacy.load('en_core_web_sm')
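`spacy.load` raises an `OSError` when the named model package is not installed. The sketch below assumes that falling back to a blank English pipeline is acceptable, which keeps a notebook runnable in that case; `load_english` and `nlp_demo` are hypothetical names. A blank pipeline only tokenizes, so tags, dependencies, and entities stay unavailable until the model is installed (for example with `python -m spacy download en_core_web_sm`).

```python
import spacy


def load_english(model_name="en_core_web_sm"):
    """Load a trained English pipeline, or a tokenizer-only one as a fallback."""
    try:
        return spacy.load(model_name)
    except OSError:
        # Model package not installed: fall back to a blank pipeline.
        return spacy.blank("en")


nlp_demo = load_english()
print([token.text for token in nlp_demo("Hello world")])
```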

In [0]:
# Create a Doc object by running the pipeline on a string
doc = nlp('Punjab CM Capt Amarinder Singh has invited President Ram Nath Kovind, Prime Minister Narendra Modi and former Prime Minister Manmohan Singh to join the first all-party group that would attend the historic event of the opening of Kartarpur Corridor in Pakistan.')

In [40]:
for token in doc:
  print(f'{token.text:{10}} {token.pos_:{10}} {token.dep_:{10}} {spacy.explain(token.dep_):}')

Punjab     PROPN      compound   compound
CM         PROPN      compound   compound
Capt       PROPN      compound   compound
Amarinder  PROPN      compound   compound
Singh      PROPN      nsubj      nominal subject
has        VERB       aux        auxiliary
invited    VERB       ROOT       None
President  PROPN      compound   compound
Ram        PROPN      compound   compound
Nath       PROPN      compound   compound
Kovind     PROPN      dobj       direct object
,          PUNCT      punct      punctuation
Prime      PROPN      compound   compound
Minister   PROPN      compound   compound
Narendra   PROPN      compound   compound
Modi       PROPN      conj       conjunct
and        CCONJ      cc         coordinating conjunction
former     ADJ        amod       adjectival modifier
Prime      PROPN      compound   compound
Minister   PROPN      compound   compound
Manmohan   PROPN      compound   compound
Singh      PROPN      conj       conjunct
to         PART       aux        auxiliary
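The format specs in the loop above are plain Python f-string syntax, not spaCy features. A minimal sketch with made-up values: `{value:{10}}` pads a string to width 10. The last field carries no width because `spacy.explain` returns `None` for labels it does not know (here `ROOT`), and `None` only accepts an empty format spec.

```python
text, pos = "Singh", "PROPN"

# Nested braces supply the width: strings are left-aligned and padded to 10.
row = f"{text:{10}} {pos:{10}}|"
print(row)

# None formats fine with an empty spec ...
print(f"{None:}")

# ... but a width raises TypeError, so convert to str first if padding is needed.
try:
    f"{None:{10}}"
except TypeError as exc:
    print("TypeError:", exc)

padded_none = f"{str(None):{10}}|"
print(padded_none)
```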

In [41]:
nlp.pipeline

[('tagger', <spacy.pipeline.pipes.Tagger at 0x7f4eb57c59e8>),
 ('parser', <spacy.pipeline.pipes.DependencyParser at 0x7f4eb57165e8>),
 ('ner', <spacy.pipeline.pipes.EntityRecognizer at 0x7f4eb5716648>)]

# Tokenization

In [42]:
for token in doc:
  print(f'{token.text:{10}} {token.tag_:{10}} {token.lemma_:{10}} {token.is_alpha:{10}} {spacy.explain(token.tag_):>{25}}')

Punjab     NNP        Punjab              1     noun, proper singular
CM         NNP        CM                  1     noun, proper singular
Capt       NNP        Capt                1     noun, proper singular
Amarinder  NNP        Amarinder           1     noun, proper singular
Singh      NNP        Singh               1     noun, proper singular
has        VBZ        have                1 verb, 3rd person singular present
invited    VBN        invite              1     verb, past participle
President  NNP        President           1     noun, proper singular
Ram        NNP        Ram                 1     noun, proper singular
Nath       NNP        Nath                1     noun, proper singular
Kovind     NNP        Kovind              1     noun, proper singular
,          ,          ,                   0   punctuation mark, comma
Prime      NNP        Prime               1     noun, proper singular
Minister   NNP        Minister            1     noun, proper singular
Narendra   N
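Two details of the format string explain the layout of this table. The sketch below uses plain Python with illustrative values: a `bool` formatted with a width goes through integer formatting, which is why `token.is_alpha` prints as `1` or `0`, and `>` right-aligns the description in a 25-character field, so descriptions longer than 25 characters are not padded at all.

```python
is_alpha = True

# bool is a subclass of int, so a width spec formats it as the number 1 (right-aligned).
flag = f"{is_alpha:{10}}"
print(repr(flag))

# Converting to str first keeps the word "True" (left-aligned, as for any string).
word = f"{str(is_alpha):{10}}"
print(repr(word))

# '>' right-aligns within the width; text longer than the width is not truncated.
short = f"{'noun, proper singular':>{25}}"
long = f"{'verb, 3rd person singular present':>{25}}"
print(repr(short))
print(repr(long))
```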

In [44]:
doc2 = nlp(
    'In nutshell, Pakistan came out of the cold to shoot itself in foot and give India yet another sweet victory. '
    'The previous one for India was in the International Court of Justice (ICJ) which stayed execution of Kulbhushan Jadhav, the former Indian Navy officer, and ordered review of the judgment of a Pakistan military court.'
)
for sent in doc2.sents:
  print(sent)

In nutshell, Pakistan came out of the cold to shoot itself in foot and give India yet another sweet victory.
The previous one for India was in the International Court of Justice (ICJ) which stayed execution of Kulbhushan Jadhav, the former Indian Navy officer, and ordered review of the judgment of a Pakistan military court.


In [46]:
span_doc = doc2[30:34]
span_doc

International Court of Justice

In [49]:
print(type(span_doc))
print(type(doc2))

<class 'spacy.tokens.span.Span'>
<class 'spacy.tokens.doc.Doc'>
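A `Span` is a slice of its parent `Doc`, indexed by token position rather than by character. The sketch below uses a blank English pipeline (tokenizer only, an assumption made so that no model download is needed) and a shortened sentence, so the indices differ from `doc2[30:34]`; `nlp_blank` and `demo` are illustrative names.

```python
import spacy

nlp_blank = spacy.blank("en")  # tokenizer only; no tagger, parser, or NER
demo = nlp_blank("the International Court of Justice (ICJ)")

span = demo[1:5]  # tokens 1..4; the end index is exclusive
print(span.text)
print(span.start, span.end)            # token offsets into the Doc
print(span.start_char, span.end_char)  # character offsets into the text
print(type(demo[1]).__name__)          # a single index yields a Token
```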


In [57]:
for entity in doc2.ents:
  print(entity) 
  print(entity.label_) 
  print(spacy.explain(entity.label_))
  print('\n')

Pakistan
GPE
Countries, cities, states


India
GPE
Countries, cities, states


India
GPE
Countries, cities, states


the International Court of Justice
ORG
Companies, agencies, institutions, etc.


ICJ
ORG
Companies, agencies, institutions, etc.


Kulbhushan Jadhav
PERSON
People, including fictional


Indian
NORP
Nationalities or religious or political groups


Navy
ORG
Companies, agencies, institutions, etc.


Pakistan
GPE
Countries, cities, states
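When a document has many entities, a tally per label is often easier to read than one entity per block. A minimal sketch: the `(text, label)` pairs are copied by hand from the output above so the block runs without a model; in the notebook they would come from `[(ent.text, ent.label_) for ent in doc2.ents]`.

```python
from collections import Counter

# Entities as printed above, in document order.
entities = [
    ("Pakistan", "GPE"),
    ("India", "GPE"),
    ("India", "GPE"),
    ("the International Court of Justice", "ORG"),
    ("ICJ", "ORG"),
    ("Kulbhushan Jadhav", "PERSON"),
    ("Indian", "NORP"),
    ("Navy", "ORG"),
    ("Pakistan", "GPE"),
]

# Count how often each label occurs.
label_counts = Counter(label for _, label in entities)

# Collect the distinct surface forms seen for each label.
texts_by_label = {}
for text, label in entities:
    texts_by_label.setdefault(label, set()).add(text)

for label, count in label_counts.most_common():
    print(f"{label:{8}} {count:{3}}  {sorted(texts_by_label[label])}")
```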




In [63]:
for chunk in doc2.noun_chunks:
  print(chunk)

nutshell
Pakistan
the cold
itself
foot
India
yet another sweet victory
The previous one
India
Justice
execution
Kulbhushan Jadhav
the former Indian Navy officer
review
the judgment
a Pakistan military court


In [66]:
from spacy import displacy  # import the submodule explicitly

displacy.render(doc2, style='ent', jupyter=True)

In [71]:
spacy.displacy.render(doc2, style='dep', jupyter=True, options={'distance':80})
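Outside a notebook, `displacy.render` can return the markup as a string instead of displaying it. The sketch below passes `jupyter=False` and, as an assumption to avoid a model download, builds a blank pipeline and sets one entity by hand; with a trained pipeline the `ents` assignment is unnecessary.

```python
import spacy
from spacy import displacy
from spacy.tokens import Span

nlp_blank = spacy.blank("en")
demo = nlp_blank("Pakistan came out of the cold.")

# Hand-labelled entity for illustration; a trained pipeline would set ents itself.
demo.ents = [Span(demo, 0, 1, label="GPE")]

# jupyter=False returns the HTML markup instead of rendering it inline.
html = displacy.render(demo, style="ent", jupyter=False)
print(html[:80])
```

The returned string can be written to an `.html` file; passing `page=True` wraps it in a full HTML page.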


# Stemming

In [0]:
import nltk
from nltk.stem.porter import PorterStemmer
from nltk.stem.snowball import SnowballStemmer

In [74]:
doc2

In nutshell, Pakistan came out of the cold to shoot itself in foot and give India yet another sweet victory. The previous one for India was in the International Court of Justice (ICJ) which stayed execution of Kulbhushan Jadhav, the former Indian Navy officer, and ordered review of the judgment of a Pakistan military court.

In [81]:
# Compare the Porter and Snowball (English) stems for every token
porter = PorterStemmer()
snowball = SnowballStemmer(language='english')
for token in doc2:
  print(token.text, '----->', porter.stem(token.text), '|', snowball.stem(token.text))

In -----> In | in
nutshell -----> nutshel | nutshel
, -----> , | ,
Pakistan -----> pakistan | pakistan
came -----> came | came
out -----> out | out
of -----> of | of
the -----> the | the
cold -----> cold | cold
to -----> to | to
shoot -----> shoot | shoot
itself -----> itself | itself
in -----> in | in
foot -----> foot | foot
and -----> and | and
give -----> give | give
India -----> india | india
yet -----> yet | yet
another -----> anoth | anoth
sweet -----> sweet | sweet
victory -----> victori | victori
. -----> . | .
The -----> the | the
previous -----> previou | previous
one -----> one | one
for -----> for | for
India -----> india | india
was -----> wa | was
in -----> in | in
the -----> the | the
International -----> intern | intern
Court -----> court | court
of -----> of | of
Justice -----> justic | justic
( -----> ( | (
ICJ -----> icj | icj
) -----> ) | )
which -----> which | which
stayed -----> stay | stay
execution -----> execut | execut
of -----> of | of
Kulbhushan -----> kulbh
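The two stemmers mostly agree on this text and differ on only a few words. A minimal sketch that isolates the disagreements, using a handful of words taken from the sentence above; it needs only NLTK, since both stemmers are rule-based and require no corpus download. Exact stems can vary slightly between NLTK versions, so no output is shown.

```python
from nltk.stem.porter import PorterStemmer
from nltk.stem.snowball import SnowballStemmer

porter = PorterStemmer()
snowball = SnowballStemmer(language="english")

words = ["previous", "was", "victory", "stayed", "execution"]

# One (word, porter_stem, snowball_stem) row per input word.
rows = [(word, porter.stem(word), snowball.stem(word)) for word in words]

# Keep only the words on which the two algorithms disagree.
disagreements = [row for row in rows if row[1] != row[2]]

for word, porter_stem, snowball_stem in disagreements:
    print(f"{word:{12}} porter={porter_stem:{10}} snowball={snowball_stem}")
```

Both algorithms produce stems such as `victori` that are not dictionary words; `token.lemma_` from spaCy, shown earlier, returns dictionary forms instead.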