### spaCy Pipeline Introduction
#### Based on https://spacy.io/usage/linguistic-features

In [3]:
import spacy

In [4]:
nlp=spacy.load('en_core_web_sm')

In [6]:
# Text is taken from the spaCy website
doc = nlp("Apple is looking at buying U.K. startup for $1 billion")
# doc is now a spaCy Doc object: a sequence of annotated tokens

In [22]:
## Columns: text, lemma, POS, tag, dependency, shape, is_alpha, is_stop
for token in doc:
    print(f'{token.text:{10}}{token.lemma_:{10}} {token.pos_:{10}}{token.tag_:{10}}{token.dep_:{10}}{token.shape_:{5}}{token.is_alpha:{3}} {token.is_stop:{3}}')

Apple     Apple      PROPN     NNP       nsubj     Xxxxx  1   0
is        be         VERB      VBZ       aux       xx     1   1
looking   look       VERB      VBG       ROOT      xxxx   1   0
at        at         ADP       IN        prep      xx     1   1
buying    buy        VERB      VBG       pcomp     xxxx   1   0
U.K.      U.K.       PROPN     NNP       compound  X.X.   0   0
startup   startup    NOUN      NN        dobj      xxxx   1   0
for       for        ADP       IN        prep      xxx    1   1
$         $          SYM       $         quantmod  $      0   0
1         1          NUM       CD        compound  d      0   0
billion   billion    NUM       CD        pobj      xxxx   1   0
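
The Shape column maps each character to a class (uppercase to `X`, lowercase to `x`, digit to `d`, anything else kept as-is) and cuts runs of the same class off after four characters. The function below is a plain-Python sketch of that idea, not spaCy's own code; the name `word_shape` is chosen here for illustration, and the token list is copied from the table above.

```python
def word_shape(text):
    """Approximate a token's orthographic shape, e.g. 'Apple' -> 'Xxxxx'."""
    shape = []
    last = ""
    run = 0  # how many times the current shape character has repeated
    for char in text:
        if char.isalpha():
            shape_char = "X" if char.isupper() else "x"
        elif char.isdigit():
            shape_char = "d"
        else:
            shape_char = char  # punctuation and symbols are kept unchanged
        if shape_char == last:
            run += 1
        else:
            run = 0
            last = shape_char
        if run < 4:  # keep at most four identical shape characters in a row
            shape.append(shape_char)
    return "".join(shape)


for word in ["Apple", "looking", "U.K.", "$", "1", "billion"]:
    print(f"{word:{10}}{word_shape(word)}")
```

Truncating long runs keeps the number of distinct shapes small, which is why `looking`, `startup` and `billion` all share the shape `xxxx`.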


In [23]:
for token in doc:
    print(f'{token.text:{10}}{token.lemma_:{10}} {token.pos_:{10}}{token.is_stop:{10}}')

Apple     Apple      PROPN              0
is        be         VERB               1
looking   look       VERB               0
at        at         ADP                1
buying    buy        VERB               0
U.K.      U.K.       PROPN              0
startup   startup    NOUN               0
for       for        ADP                1
$         $          SYM                0
1         1          NUM                0
billion   billion    NUM                0
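
The `is_alpha` and `is_stop` columns print as `1` and `0` rather than `True` and `False`. This comes from Python's formatting rules, not from spaCy: `bool` is a subclass of `int`, so a non-empty format spec such as a width formats the value as an integer. A small standard-library illustration (the variable names are only for this sketch):

```python
flag = True

as_int = f'{flag:{3}}'        # width given: formatted as an int, right-aligned
as_text = f'{str(flag):{6}}'  # converted first: formatted as a str, left-aligned
no_spec = f'{flag}'           # no format spec: plain 'True'

print(repr(as_int), repr(as_text), repr(no_spec))
```

Wrapping the attribute in `str(...)` is one way to get `True`/`False` in a fixed-width column.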


### Noun Chunks (Dependency Parser)

In [33]:
for chunk in doc.noun_chunks:
    print(f'{chunk.text:{20}}{chunk.root.text:{10}} {chunk.root.dep_:{10}}')

Apple               Apple      nsubj     
U.K. startup        startup    dobj      


### Named Entity Recognition

In [38]:
for ent in doc.ents:
    print(f'{ent.text:{20}}{ent.label_:{10}}')

Apple               ORG       
U.K.                GPE       
$1 billion          MONEY     


### Sentence Segmentation 

In [45]:
doc1 = nlp('Lemmatization is the process of grouping together the different inflected forms of a word. Lemmatization is similar to stemming but it brings context to the words. So it links words with similar meaning to one word. ')

In [46]:
doc1

Lemmatization is the process of grouping together the different inflected forms of a word. Lemmatization is similar to stemming but it brings context to the words. So it links words with similar meaning to one word. 

In [47]:
for sent in doc1.sents:
    print(sent)

Lemmatization is the process of grouping together the different inflected forms of a word.
Lemmatization is similar to stemming
but it brings context to the words.
So it links words with similar meaning to one word.


In [66]:
doc2 = nlp('Hello..This is simple nlp program')

In [67]:
for sent in doc2.sents:
    print(sent)

Hello..This is simple nlp program


In [59]:
## doc2 contains two sentences, but spaCy returns it as one. We need to define a custom rule to segment it.

In [69]:
def cust_rule(doc):
    # Mark the token that follows every '..' as the start of a new sentence
    for token in doc[:-1]:
        if token.text == '..':
            doc[token.i+1].is_sent_start = True
    return doc

In [70]:
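# spaCy v2 API: the function itself is passed to add_pipe.
# In spaCy v3+, register it with @Language.component("cust_rule") and call nlp.add_pipe("cust_rule", before="parser").
nlp.add_pipe(cust_rule, before='parser')
doc2 = nlp('Hello..This is simple nlp program')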

In [71]:
for sent in doc2.sents:
    print(sent)

Hello..
This is simple nlp program
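
The effect of the custom rule can be sketched without spaCy: each token carries a "sentence start" flag, and a sentence is the run of tokens from one flagged token up to the next. The helper names below are hypothetical, and the hard-coded token list assumes the tokenizer produced `..` as its own token, as the rule above requires.

```python
def mark_sentence_starts(tokens):
    """Return one boolean per token: True where a new sentence begins."""
    starts = [False] * len(tokens)
    if tokens:
        starts[0] = True  # the first token always opens a sentence
    for i, tok in enumerate(tokens[:-1]):
        if tok == "..":
            starts[i + 1] = True  # same rule as cust_rule
    return starts


def split_sentences(tokens, starts):
    """Group tokens into sentences using the start flags."""
    sentences = []
    for tok, is_start in zip(tokens, starts):
        if is_start or not sentences:
            sentences.append([])
        sentences[-1].append(tok)
    return sentences


# Assumed tokenization of 'Hello..This is simple nlp program'
tokens = ["Hello", "..", "This", "is", "simple", "nlp", "program"]
sentences = split_sentences(tokens, mark_sentence_starts(tokens))
print(sentences)
```

In the real pipeline the component runs before the parser, so that the boundaries it sets are in place when the parser assigns the remaining ones.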


### Dependency Parse and Entity Visualization

In [73]:
from spacy import displacy

In [74]:
doc

Apple is looking at buying U.K. startup for $1 billion

In [75]:
displacy.render(doc,style='dep')

In [77]:
## displaCy dependency view with the compact layout option
displacy.render(doc,style='dep',options = {'compact':True})

In [79]:
## displaCy entity view
displacy.render(doc,style='ent')