**spaCy Language Processing Pipeline**

**Blank NLP Pipeline**

In [2]:
import spacy

In [3]:
# A blank English pipeline contains only the tokenizer, no trained components
nlp=spacy.blank('en')
doc=nlp("Captain america ate 100$ of samosa. Then he said I can do this all day.")
for token in doc:
  print(token)

Captain
america
ate
100
$
of
samosa
.
Then
he
said
I
can
do
this
all
day
.


In [4]:
nlp.pipe_names

[]
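The empty list reflects that a blank pipeline only tokenizes: nothing assigns tags, lemmas, entities, or sentence boundaries. As a minimal sketch, assuming spaCy v3, the built-in rule-based `sentencizer` component can be added to a blank pipeline without downloading a trained model. The variable name `sentences` is only illustrative.

```python
import spacy

# Start from a blank English pipeline (tokenizer only).
nlp = spacy.blank("en")

# Add spaCy's rule-based sentence splitter; it needs no trained weights.
nlp.add_pipe("sentencizer")
print(nlp.pipe_names)

doc = nlp("Captain america ate 100$ of samosa. Then he said I can do this all day.")

# doc.sents is available now that a component sets sentence boundaries.
sentences = [sent.text for sent in doc.sents]
for sentence in sentences:
    print(sentence)
```

Without the `sentencizer` (or a parser), iterating over `doc.sents` on a blank pipeline raises an error because no sentence boundaries have been set.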

**Using a trained pipeline**

In [5]:
nlp=spacy.load('en_core_web_sm')
nlp.pipe_names

['tok2vec', 'tagger', 'parser', 'attribute_ruler', 'lemmatizer', 'ner']

In [6]:
nlp.pipeline

[('tok2vec', <spacy.pipeline.tok2vec.Tok2Vec at 0x79a11acb2ce0>),
 ('tagger', <spacy.pipeline.tagger.Tagger at 0x79a11acb23e0>),
 ('parser', <spacy.pipeline.dep_parser.DependencyParser at 0x79a11ac302e0>),
 ('attribute_ruler',
  <spacy.pipeline.attributeruler.AttributeRuler at 0x79a11ac1af00>),
 ('lemmatizer',
  <spacy.lang.en.lemmatizer.EnglishLemmatizer at 0x79a11af3e740>),
 ('ner', <spacy.pipeline.ner.EntityRecognizer at 0x79a11ac30270>)]

In [7]:
doc=nlp("Captain america ate 100$ of samosa. Then he said I can do this all day.")

In [8]:
for token in doc:
  print(token,'|',spacy.explain(token.pos_),'|',token.lemma_)

Captain | proper noun | Captain
america | proper noun | america
ate | verb | eat
100 | numeral | 100
$ | numeral | $
of | adposition | of
samosa | proper noun | samosa
. | punctuation | .
Then | adverb | then
he | pronoun | he
said | verb | say
I | pronoun | I
can | auxiliary | can
do | verb | do
this | pronoun | this
all | determiner | all
day | noun | day
. | punctuation | .


**Named Entity Recognition**

In [9]:
doc=nlp("Tesla Inc is going to acquire twitter for $45 billion")
for ent in doc.ents:
  print(ent.text,'|',ent.label_)


Tesla Inc | ORG
$45 billion | MONEY


In [17]:
from spacy import displacy
displacy.render(doc,style='ent')

'<div class="entities" style="line-height: 2.5; direction: ltr">\n<mark class="entity" style="background: #7aecec; padding: 0.45em 0.6em; margin: 0 0.25em; line-height: 1; border-radius: 0.35em;">\n    Tesla Inc\n    <span style="font-size: 0.8em; font-weight: bold; line-height: 1; border-radius: 0.35em; vertical-align: middle; margin-left: 0.5rem">ORG</span>\n</mark>\n is going to acquire twitter for \n<mark class="entity" style="background: #e4e7d2; padding: 0.45em 0.6em; margin: 0 0.25em; line-height: 1; border-radius: 0.35em;">\n    $45 billion\n    <span style="font-size: 0.8em; font-weight: bold; line-height: 1; border-radius: 0.35em; vertical-align: middle; margin-left: 0.5rem">MONEY</span>\n</mark>\n</div>'

**Adding a component to a blank pipeline**

In [26]:
source_nlp=spacy.load('en_core_web_sm')


In [27]:
nlp=spacy.blank('en')
# Copy the trained 'ner' component from the loaded pipeline into the blank one
nlp.add_pipe('ner',source=source_nlp)
nlp.pipe_names


['ner']

In [30]:
doc=nlp("Tesla Inc is going to acquire twitter for $45 billion")
for ent in doc.ents:
  print(ent.text,ent.label_)

Tesla Inc ORG
$45 billion MONEY


**Examples**

In [10]:
import spacy
nlp=spacy.load('en_core_web_sm')

Extracting all proper nouns from the text

In [13]:
text = '''Ravi and Raju are the best friends from school days.They wanted to go for a world tour and
visit famous cities like Paris, London, Dubai, Rome etc and also they called their another friend Mohan to take part of this world tour.
They started their journey from Hyderabad and spent next 3 months travelling all the wonderful cities in the world and cherish a happy moments!
'''
doc=nlp(text)
# Collect every token the tagger marked as a proper noun (PROPN)
proper_nouns=[]
for token in doc:
  if token.pos_=='PROPN':
    proper_nouns.append(token)
proper_nouns

[Raju, Paris, London, Dubai, Rome, Mohan, Hyderabad]

Extracting all company names and counting them

In [27]:
text = '''The Top 5 companies in USA are Tesla, Walmart, Amazon, Microsoft, Google and the top 5 companies in
India are Infosys, Reliance, HDFC Bank, Hindustan Unilever and Bharti Airtel'''
doc=nlp(text)
all_company_names=[]
for ent in doc.ents:
  # Keep only the entities labelled as organisations
  if ent.label_=='ORG':
    all_company_names.append(ent)
print(all_company_names)
# Total number of ORG entities found
count=len(all_company_names)
count

[Tesla, Walmart, Amazon, Microsoft, Google, Infosys, Reliance, HDFC Bank, Hindustan Unilever, Bharti]


10
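The list above holds `Span` objects and gives only a total. If a per-company tally is wanted, one possible sketch uses `collections.Counter` over the entity texts. The helper `count_orgs` is a hypothetical name, and the entities in the demo are set by hand on a blank pipeline (an assumption made so the snippet runs without a trained model); with `en_core_web_sm` you would pass the `doc` from the cell above instead.

```python
from collections import Counter

import spacy
from spacy.tokens import Span


def count_orgs(doc):
    """Return a Counter mapping each ORG entity's text to how often it occurs."""
    return Counter(ent.text for ent in doc.ents if ent.label_ == "ORG")


# Demo data: a blank pipeline has no NER, so the entities are assigned manually.
nlp = spacy.blank("en")
doc = nlp("Tesla and Walmart compete but Tesla builds cars")

# Token indices: Tesla(0) and(1) Walmart(2) compete(3) but(4) Tesla(5) ...
doc.ents = [
    Span(doc, 0, 1, label="ORG"),
    Span(doc, 2, 3, label="ORG"),
    Span(doc, 5, 6, label="ORG"),
]

org_counts = count_orgs(doc)
print(org_counts)
```

Using `ent.text` rather than the `Span` itself matters here: two spans with the same text are distinct objects, so counting the spans directly would not merge repeated mentions.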