In [2]:
import spacy

# Create a blank English pipeline: it has a tokenizer but no trained components
nlp = spacy.blank("en")

doc = nlp("Captain america ate 100$ of samosa. Then he said I can do this all day.")

for token in doc:
    print(token)

Captain
america
ate
100
$
of
samosa
.
Then
he
said
I
can
do
this
all
day
.


In [3]:
nlp.pipe_names

[]

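nlp.pipe_names is an empty list, which means the blank pipeline has no components. A pipeline always starts with a tokenizer, which is why the text above was still split into tokens. Components such as a tagger, parser, or entity recognizer run after the tokenizer, and a blank pipeline has none of them.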
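As a quick check of what an empty pipeline means in practice, here is a minimal sketch. It assumes only that spaCy is installed (no trained model is needed) and uses a shortened version of the sentence above. The tokenizer still runs, but attributes that pipeline components would normally fill in, such as part-of-speech tags and entities, stay empty.

```python
import spacy

# Blank pipeline: tokenizer only, no tagger, parser, or NER
nlp = spacy.blank("en")
doc = nlp("Captain america ate 100$ of samosa.")

# The tokenizer lives on the nlp object and is not listed in pipe_names
print(type(nlp.tokenizer).__name__)
print(nlp.pipe_names)

# With no tagger, pos_ is an empty string for every token
pos_tags = [token.pos_ for token in doc]
print(pos_tags)

# With no entity recognizer, doc.ents is empty
print(len(doc.ents))
```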

In [5]:
!python -m spacy download en_core_web_sm

Collecting en-core-web-sm==3.7.1
  Downloading https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.7.1/en_core_web_sm-3.7.1-py3-none-any.whl (12.8 MB)

In [6]:
nlp = spacy.load("en_core_web_sm")
nlp.pipe_names

['tok2vec', 'tagger', 'parser', 'attribute_ruler', 'lemmatizer', 'ner']

In [7]:
nlp.pipeline

[('tok2vec', <spacy.pipeline.tok2vec.Tok2Vec at 0x264f18da450>),
 ('tagger', <spacy.pipeline.tagger.Tagger at 0x264e9fc9430>),
 ('parser', <spacy.pipeline.dep_parser.DependencyParser at 0x264f19fa880>),
 ('attribute_ruler',
  <spacy.pipeline.attributeruler.AttributeRuler at 0x264f2d31410>),
 ('lemmatizer', <spacy.lang.en.lemmatizer.EnglishLemmatizer at 0x264f2d1b7d0>),
 ('ner', <spacy.pipeline.ner.EntityRecognizer at 0x264e9d5d380>)]

In [8]:

doc = nlp("Captain america ate 100$ of samosa. Then he said I can do this all day.")

for token in doc:
    print(token, " | ", spacy.explain(token.pos_), " | ", token.lemma_)

Captain  |  proper noun  |  Captain
america  |  proper noun  |  america
ate  |  verb  |  eat
100  |  numeral  |  100
$  |  numeral  |  $
of  |  adposition  |  of
samosa  |  proper noun  |  samosa
.  |  punctuation  |  .
Then  |  adverb  |  then
he  |  pronoun  |  he
said  |  verb  |  say
I  |  pronoun  |  I
can  |  auxiliary  |  can
do  |  verb  |  do
this  |  pronoun  |  this
all  |  determiner  |  all
day  |  noun  |  day
.  |  punctuation  |  .


**Named Entity Recognition**

In [10]:
doc = nlp("Tesla Inc is going to acquire twitter for $45 billion")
for ent in doc.ents:
    print(ent.text, " | ",  ent.label_, " | ", spacy.explain(ent.label_))

Tesla Inc  |  ORG  |  Companies, agencies, institutions, etc.
$45 billion  |  MONEY  |  Monetary values, including unit


In [11]:
from spacy import displacy

displacy.render(doc, style="ent")
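Inside a notebook, displacy.render draws the highlighted entities inline. Outside a notebook, the same call can return the markup as a string when jupyter=False is passed. The sketch below is an illustration under stated assumptions: it uses a blank pipeline and sets the ORG span by hand (token indices 0 to 2 are chosen manually for this example, not predicted by a model) so that it runs without a downloaded model.

```python
import spacy
from spacy import displacy
from spacy.tokens import Span

nlp = spacy.blank("en")
doc = nlp("Tesla Inc is going to acquire twitter for $45 billion")

# Assumption for illustration: mark the first two tokens as an ORG entity by hand
doc.ents = [Span(doc, 0, 2, label="ORG")]

# jupyter=False makes render() return the HTML markup instead of displaying it
html = displacy.render(doc, style="ent", jupyter=False)
print(html[:200])
```

The returned string can be written to an .html file and opened in a browser.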

**Adding a Component to a blank pipeline**

In [13]:
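# Load the trained pipeline that the NER component will be copied from
source_nlp = spacy.load("en_core_web_sm")

# Start from a blank pipeline and add only the trained "ner" component
nlp = spacy.blank("en")
nlp.add_pipe("ner", source=source_nlp)
nlp.pipe_names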

['ner']

In [15]:
doc = nlp("Tesla Inc is going to acquire twitter for $45 billion")
for ent in doc.ents:
    print(ent.text, " | ", ent.label_)

Tesla Inc  |  ORG
$45 billion  |  MONEY
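Components do not have to come from a trained pipeline. As a small sketch of the same add_pipe pattern, the built-in rule-based sentencizer can be added to a blank pipeline by name, with no model download assumed. It splits sentences on punctuation, so it is not a substitute for the trained parser.

```python
import spacy

nlp = spacy.blank("en")

# "sentencizer" is a built-in rule-based component, so no source pipeline is needed
nlp.add_pipe("sentencizer")
print(nlp.pipe_names)

doc = nlp("Captain america ate 100$ of samosa. Then he said I can do this all day.")

# doc.sents is available because the sentencizer set the sentence boundaries
sentences = [sent.text for sent in doc.sents]
for sentence in sentences:
    print(sentence)
```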
