<h2 align="center">spaCy Language Processing Pipelines Tutorial</h2>

<h3>Blank nlp pipeline</h3>

In [2]:
import spacy

nlp = spacy.blank("en")

doc = nlp("Mrs.Naegle graded 5 assignments. Then she said I can do this all day.")

for token in doc:
    print(token)

Mrs.
Naegle
graded
5
assignments
.
Then
she
said
I
can
do
this
all
day
.
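A tokenizer on its own cannot tag or lemmatize, but it still gives every token lexical attributes computed from the token text alone. The sketch below reuses the same sentence and filters tokens with like_num and is_punct. It assumes spaCy v3 is installed and needs no model download.

```python
import spacy

nlp = spacy.blank("en")
doc = nlp("Mrs.Naegle graded 5 assignments. Then she said I can do this all day.")

# Lexical attributes come from the token text itself, not from a trained model
numbers = [token.text for token in doc if token.like_num]
punctuation = [token.text for token in doc if token.is_punct]

print(len(doc))      # number of tokens produced by the tokenizer
print(numbers)       # tokens that look like numbers
print(punctuation)   # punctuation tokens
```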


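A blank pipeline contains only a tokenizer. It can split text into tokens, but it cannot assign part-of-speech tags, lemmas, or named entities. Listing its components confirms this: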

In [3]:
nlp.pipe_names

[]

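nlp.pipe_names is an empty list, meaning there are no components in the pipeline yet. Every pipeline starts with a tokenizer (nlp.tokenizer), which is not listed in pipe_names; any components added later run in order on the Doc that the tokenizer produces.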
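To see a component appear in pipe_names, a built-in rule-based component can be added to the blank pipeline with nlp.add_pipe. This is a minimal sketch assuming spaCy v3; the sentencizer is chosen only because it needs no trained weights. It splits sentences on punctuation tokens such as ".".

```python
import spacy

nlp = spacy.blank("en")
print(nlp.pipe_names)  # [] because the tokenizer is not listed

# Add a rule-based component; it runs after the tokenizer
nlp.add_pipe("sentencizer")
print(nlp.pipe_names)

doc = nlp("Mrs.Naegle graded 5 assignments. Then she said I can do this all day.")
sentences = [sent.text for sent in doc.sents]
print(sentences)
```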

<h3>Download trained pipeline</h3>

To download a trained pipeline, run a command such as:

python -m spacy download en_core_web_sm

This downloads the small (sm) pipeline for English.

Further instructions: https://spacy.io/usage/models#quickstart
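spacy.load raises an OSError when the requested pipeline package is not installed. A minimal sketch, assuming you want the notebook to keep running before the download step has been done, falls back to a blank English pipeline:

```python
import spacy

try:
    # Trained pipeline installed with: python -m spacy download en_core_web_sm
    nlp = spacy.load("en_core_web_sm")
except OSError:
    # Package not found: fall back to a tokenizer-only pipeline
    nlp = spacy.blank("en")

print(nlp.lang, nlp.pipe_names)
```

With the fallback, pipe_names is empty, so part-of-speech tags, lemmas, and entities will be missing; it is a convenience for keeping code runnable, not a substitute for the trained model.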

In [5]:
nlp = spacy.load("en_core_web_sm")
nlp.pipe_names

['tok2vec', 'tagger', 'parser', 'attribute_ruler', 'lemmatizer', 'ner']

In [6]:
nlp.pipeline

[('tok2vec', <spacy.pipeline.tok2vec.Tok2Vec at 0x24eda7ace90>),
 ('tagger', <spacy.pipeline.tagger.Tagger at 0x24eda7ac470>),
 ('parser', <spacy.pipeline.dep_parser.DependencyParser at 0x24eda7a7f40>),
 ('attribute_ruler',
  <spacy.pipeline.attributeruler.AttributeRuler at 0x24edbc73750>),
 ('lemmatizer', <spacy.lang.en.lemmatizer.EnglishLemmatizer at 0x24edbc73350>),
 ('ner', <spacy.pipeline.ner.EntityRecognizer at 0x24eda7a7d80>)]
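Because nlp.pipeline is a list of (name, component) tuples, components can be inspected, looked up, and switched off by name. The sketch below assumes spaCy v3 and uses a blank pipeline with two built-in components (sentencizer and entity_ruler, picked only so no download is needed) to show get_pipe and the select_pipes context manager:

```python
import spacy

# Blank pipeline with two built-in components, so no model download is needed
nlp = spacy.blank("en")
nlp.add_pipe("sentencizer")
nlp.add_pipe("entity_ruler")

# nlp.pipeline holds (name, component) tuples, in processing order
names = [name for name, _component in nlp.pipeline]
print(names)

# Look up a single component object by its name
sentencizer = nlp.get_pipe("sentencizer")
print(type(sentencizer).__name__)

# Temporarily disable a component; it is re-enabled when the block exits
with nlp.select_pipes(disable=["entity_ruler"]):
    active_inside = list(nlp.pipe_names)
print(active_inside)
print(nlp.pipe_names)
```

For a trained pipeline, spacy.load also accepts disable= and exclude= lists, for example spacy.load("en_core_web_sm", exclude=["ner"]), which skips components you do not need.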

In [7]:
doc = nlp("Mrs.Naegle graded 5 assignments. Then she said I can do this all day.")

for token in doc:
    print(token, " | ", spacy.explain(token.pos_), " | ", token.lemma_)

Mrs.  |  proper noun  |  Mrs.
Naegle  |  proper noun  |  Naegle
graded  |  verb  |  grade
5  |  numeral  |  5
assignments  |  noun  |  assignment
.  |  punctuation  |  .
Then  |  adverb  |  then
she  |  pronoun  |  she
said  |  verb  |  say
I  |  pronoun  |  I
can  |  auxiliary  |  can
do  |  verb  |  do
this  |  pronoun  |  this
all  |  determiner  |  all
day  |  noun  |  day
.  |  punctuation  |  .
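When processing many texts, nlp.pipe streams them through the pipeline in batches instead of calling nlp once per text. A sketch assuming spaCy v3; batch_size=50 is an arbitrary illustrative value, and the blank fallback exists only so the snippet runs without the model (in that case pos_ and lemma_ come back empty).

```python
import spacy

try:
    nlp = spacy.load("en_core_web_sm")
except OSError:
    nlp = spacy.blank("en")  # fallback: tokenizer only, no POS tags or lemmas

texts = [
    "Mrs. Naegle graded 5 assignments.",
    "Then she said I can do this all day.",
]

# nlp.pipe yields one Doc per input text, in the same order
results = []
for doc in nlp.pipe(texts, batch_size=50):
    results.append([(token.text, token.pos_, token.lemma_) for token in doc])

for row in results:
    print(row)
```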


<h3>Named Entity Recognition</h3>

In [8]:
doc = nlp("Tesla Inc is going to acquire twitter for $45 billion")
for ent in doc.ents:
    print(ent.text, ent.label_)

Tesla Inc ORG
$45 billion MONEY
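The statistical model did not tag the lowercase "twitter" as an organization. One way to add entities by rule is spaCy's entity_ruler component. The sketch below assumes spaCy v3 and uses a blank pipeline so it runs without a download; the two patterns are illustrative assumptions. In a trained pipeline you would typically add the ruler with nlp.add_pipe("entity_ruler", before="ner") so that the entity recognizer respects the rule-based spans.

```python
import spacy

nlp = spacy.blank("en")
ruler = nlp.add_pipe("entity_ruler")

# Illustrative patterns: an exact phrase, and a token pattern matched on lowercase text
ruler.add_patterns([
    {"label": "ORG", "pattern": "Tesla Inc"},
    {"label": "ORG", "pattern": [{"LOWER": "twitter"}]},
])

doc = nlp("Tesla Inc is going to acquire twitter for $45 billion")

# Each entity is a Span with a label and character offsets into the text
entities = [(ent.text, ent.label_, ent.start_char, ent.end_char) for ent in doc.ents]
print(entities)
```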


In [9]:
from spacy import displacy

displacy.render(doc, style="ent")
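Outside Jupyter, or when you want the markup itself, displacy.render returns the HTML as a string when called with jupyter=False. With manual=True it also accepts a plain dict of character offsets instead of a Doc, so no model is needed. The offsets below are written by hand for the sentence above and mirror the model's output; treat them as an illustrative assumption.

```python
from spacy import displacy

# Manual mode: entities given as character offsets into the text
example = {
    "text": "Tesla Inc is going to acquire twitter for $45 billion",
    "ents": [
        {"start": 0, "end": 9, "label": "ORG"},
        {"start": 42, "end": 53, "label": "MONEY"},
    ],
    "title": None,
}

# jupyter=False makes render return the HTML markup instead of displaying it
html = displacy.render(example, style="ent", manual=True, jupyter=False)
print(html[:200])
```

The returned string can be saved with pathlib.Path("entities.html").write_text(html, encoding="utf-8") (the filename is arbitrary) and opened in a browser; displacy.serve(doc, style="ent") is an alternative that starts a local web server.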