<h2 align="center">spaCy Language Processing Pipelines</h2>

<h3>Blank pipeline</h3>

In [185]:
import spacy 

In [186]:
nlp = spacy.blank("en")
sentence = nlp("Apply the pipeline to some text. The text can span multiple sentences, and can contain arbitrary whitespace.")

In [187]:
for token in sentence:
    print(token)

Apply
the
pipeline
to
some
text
.
The
text
can
span
multiple
sentences
,
and
can
contain
arbitrary
whitespace
.


Only tokenization happens here because `nlp` is a blank pipeline. The tokenizer always runs first, and a blank pipeline has no components after it, as shown below.


In [188]:
nlp.pipe_names

[]

`nlp.pipe_names` is an empty list, which means the pipeline has no components. The tokenizer is not listed there because it always runs before the pipeline components.
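As a rough mental model of the paragraph above, the tokenizer turns the text into a document, and each named component is then applied to it in order. The sketch below is a toy illustration in plain Python, not spaCy's implementation; `ToyNLP`, `ToyDoc` and `toy_tagger` are hypothetical names made up for this example.

```python
from typing import Callable, Dict, List, Tuple

ToyDoc = List[Dict[str, str]]          # hypothetical stand-in for a spaCy Doc
Component = Callable[[ToyDoc], ToyDoc]


class ToyNLP:
    """Toy stand-in for a language object: a tokenizer plus named components."""

    def __init__(self) -> None:
        self.pipeline: List[Tuple[str, Component]] = []

    @property
    def pipe_names(self) -> List[str]:
        return [name for name, _ in self.pipeline]

    def add_pipe(self, name: str, component: Component) -> None:
        self.pipeline.append((name, component))

    def tokenize(self, text: str) -> ToyDoc:
        # Naive whitespace split; a real tokenizer also splits off punctuation.
        return [{"text": word} for word in text.split()]

    def __call__(self, text: str) -> ToyDoc:
        doc = self.tokenize(text)            # always runs, even with no components
        for _, component in self.pipeline:   # then each component, in order
            doc = component(doc)
        return doc


def toy_tagger(doc: ToyDoc) -> ToyDoc:
    # Hypothetical rule, purely for illustration: flag capitalized words.
    for token in doc:
        token["shape"] = "Title" if token["text"].istitle() else "other"
    return doc


blank = ToyNLP()
blank_doc = blank("Apply the pipeline")      # tokens only, no extra attributes

tagged = ToyNLP()
tagged.add_pipe("toy_tagger", toy_tagger)
tagged_doc = tagged("Apply the pipeline")    # tokens plus a "shape" attribute

print(blank.pipe_names, tagged.pipe_names)
```

The blank object still produces tokens, which mirrors why the loop over `sentence` prints words even though `nlp.pipe_names` is empty.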

<h3>Trained pipeline</h3>

In [189]:
nl_p=spacy.load("en_core_web_sm")

In [190]:
nl_p.pipe_names

['tok2vec', 'tagger', 'parser', 'attribute_ruler', 'lemmatizer', 'ner']

In [191]:
nl_p.pipeline

[('tok2vec', <spacy.pipeline.tok2vec.Tok2Vec at 0x1fad93470a0>),
 ('tagger', <spacy.pipeline.tagger.Tagger at 0x1fad9347b80>),
 ('parser', <spacy.pipeline.dep_parser.DependencyParser at 0x1fad59dbb50>),
 ('attribute_ruler',
  <spacy.pipeline.attributeruler.AttributeRuler at 0x1fad872cc40>),
 ('lemmatizer', <spacy.lang.en.lemmatizer.EnglishLemmatizer at 0x1fad8775d40>),
 ('ner', <spacy.pipeline.ner.EntityRecognizer at 0x1fad59dbae0>)]

In [192]:
doc = nl_p("Apply the pipeline to some text. The text can span multiple sentences, and can contain arbitrary whitespace.")

In [193]:
# `sentence` was created by the blank pipeline, so no POS tags are set.
for token in sentence:
    print(token, ' | ', spacy.explain(token.pos_))

Apply  |  None
the  |  None
pipeline  |  None
to  |  None
some  |  None
text  |  None
.  |  None
The  |  None
text  |  None
can  |  None
span  |  None
multiple  |  None
sentences  |  None
,  |  None
and  |  None
can  |  None
contain  |  None
arbitrary  |  None
whitespace  |  None
.  |  None




In [194]:
# `sentence` was created by the blank pipeline, so it has no entities.
for ent in sentence.ents:
    print(ent.text, " | ", ent.label_, " | ", spacy.explain(ent.label_))

The first loop above prints `None` for every token, and the second loop prints nothing, because `sentence` was created by the blank pipeline. With no tagger or entity recognizer, `pos_` is empty and `sentence.ents` has no entries. The code below gives the expected result because `doc` was created by the trained pipeline.

In [195]:
for token in doc:
    print(token, ' | ', spacy.explain(token.pos_))

Apply  |  verb
the  |  determiner
pipeline  |  noun
to  |  adposition
some  |  determiner
text  |  noun
.  |  punctuation
The  |  determiner
text  |  noun
can  |  auxiliary
span  |  verb
multiple  |  adjective
sentences  |  noun
,  |  punctuation
and  |  coordinating conjunction
can  |  auxiliary
contain  |  verb
arbitrary  |  adjective
whitespace  |  noun
.  |  punctuation


In [196]:
doc = nl_p("I owe First Bank $1200")
for ent in doc.ents:
    print(ent.text, " | ", ent.label_, " | ", spacy.explain(ent.label_))

First Bank  |  ORG  |  Companies, agencies, institutions, etc.
1200  |  MONEY  |  Monetary values, including unit


In [197]:
from spacy import displacy

displacy.render(doc, style='ent')

A component from a trained pipeline, for example `'ner'`, can be added to a blank pipeline by passing the trained pipeline as the `source` argument.

In [198]:
nlp.add_pipe('ner', source=nl_p)
nlp.pipe_names

['ner']

In [199]:
nlp.add_pipe('lemmatizer', source=nl_p)
nlp.add_pipe('parser', source=nl_p)
nlp.add_pipe('tagger', source=nl_p)
nlp.pipe_names

['ner', 'lemmatizer', 'parser', 'tagger']
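The components above end up in the order they were added, which differs from their order in the trained pipeline. That can matter, because some components depend on annotations set by earlier ones (the rule-based lemmatizer, for example, uses part-of-speech tags). One way to keep the original order is to sort the wanted names by their position in the trained pipeline before adding them. This is a minimal sketch in plain Python; `order_like` is a hypothetical helper, not part of spaCy, and the list of names is copied from the `nl_p.pipe_names` output shown earlier.

```python
from typing import List, Sequence


def order_like(reference: Sequence[str], wanted: Sequence[str]) -> List[str]:
    """Return the names in `wanted`, sorted by their position in `reference`."""
    missing = [name for name in wanted if name not in reference]
    if missing:
        raise ValueError(f"not in reference pipeline: {missing}")
    return sorted(wanted, key=reference.index)


# Component names copied from the trained pipeline's `pipe_names` output.
trained_order = ["tok2vec", "tagger", "parser", "attribute_ruler", "lemmatizer", "ner"]

to_source = order_like(trained_order, ["ner", "lemmatizer", "parser", "tagger"])
print(to_source)
```

On a fresh blank pipeline, each name in `to_source` could then be added in turn with `nlp.add_pipe(name, source=nl_p)`.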

Get all the proper nouns from a given text as a list, and count them, using a trained pipeline.
- **Proper Noun** means a noun that names a particular person, place, or thing.

In [200]:
text = ''' Ravi and Raju are the best friends from school days.They wanted to go for a world tour and 
visit famous cities like Paris, London, Dubai, Rome etc and also they called their another friend Mohan to take part of this world tour.
They started their journey from Hyderabad and spent next 3 months travelling all the wonderful cities in the world and cherish a happy moments!
'''

In [201]:
nl_p = spacy.load("en_core_web_sm")

In [202]:
doc = nl_p(text)
proper_nouns = []
# Keep every token whose coarse part-of-speech tag is PROPN (proper noun).
for token in doc:
    if token.pos_ == 'PROPN':
        proper_nouns.append(token)

print('The Proper nouns are: ', proper_nouns)
print('There are {} Proper nouns in the text'.format(len(proper_nouns)))

The Proper nouns are:  [Ravi, Raju, Paris, London, Dubai, Rome, Mohan, Hyderabad]
There are 8 Proper nouns in the text
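The loop above stores `Token` objects and counts every mention. If a name can appear more than once and should be counted per name, one option is to collect the token text and pass it to `collections.Counter`. The sketch below uses stand-in tokens so it runs without a model; `proper_nouns_from` and `demo_tokens` are hypothetical names, and the demo data is invented, not model output.

```python
from collections import Counter
from types import SimpleNamespace
from typing import Iterable, List


def proper_nouns_from(tokens: Iterable) -> List[str]:
    """Return the text of every token whose coarse POS tag is PROPN."""
    return [token.text for token in tokens if token.pos_ == "PROPN"]


# Stand-in tokens (invented data) with the two attributes the function reads.
demo_tokens = [
    SimpleNamespace(text="Ravi", pos_="PROPN"),
    SimpleNamespace(text="visited", pos_="VERB"),
    SimpleNamespace(text="Paris", pos_="PROPN"),
    SimpleNamespace(text="and", pos_="CCONJ"),
    SimpleNamespace(text="Paris", pos_="PROPN"),
]

names = proper_nouns_from(demo_tokens)   # every mention, in order
counts = Counter(names)                  # occurrences per distinct name
print(names)
print(len(counts), "distinct proper nouns")
```

Since the function only reads `text` and `pos_`, a document produced by the trained pipeline could be passed in the same way.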


Get all company names from a given text, along with their count.

In [203]:
text = ''' The Top 5 companies in USA are Tesla, Walmart, Amazon, Microsoft, Google and the top 5 companies in 
India are Infosys,  Reliance, HDFC Bank, Hindustan Unilever and Bharti Airtel'''


# `nlp` is the blank pipeline with the components sourced above, including 'ner'.
doc = nlp(text)
company_names = []
for ent in doc.ents:
    if ent.label_ == 'ORG':
        company_names.append(ent)

print('Company names: ', company_names)
print('Count: ', len(company_names))


Company names:  [Tesla, Walmart, Amazon, Microsoft, Google, Infosys, Reliance, HDFC Bank, Hindustan Unilever, Bharti Airtel]
Count:  10
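The filter above keeps only entities labeled `ORG`. When a company seems to be missing, it can help to see every label the model assigned, since the name may have been recognized under a different label. A minimal sketch of grouping entities by label follows; `entities_by_label` and `demo_ents` are hypothetical names, and the demo entities are invented stand-ins rather than model output.

```python
from collections import defaultdict
from types import SimpleNamespace
from typing import Dict, Iterable, List


def entities_by_label(ents: Iterable) -> Dict[str, List[str]]:
    """Group entity texts under their label, keeping the original order."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for ent in ents:
        grouped[ent.label_].append(ent.text)
    return dict(grouped)


# Stand-in entities (invented data) with the two attributes the function reads.
demo_ents = [
    SimpleNamespace(text="Tesla", label_="ORG"),
    SimpleNamespace(text="USA", label_="GPE"),
    SimpleNamespace(text="Infosys", label_="ORG"),
]

grouped = entities_by_label(demo_ents)
print(grouped.get("ORG", []))
print(sorted(grouped))
```

With a real document, `entities_by_label(doc.ents)` would give the same kind of dictionary, and `grouped.get("ORG", [])` would correspond to the list built in the cell above.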
