In [1]:
import spacy

# Load language library
nlp = spacy.load('en_core_web_sm')   # 'en_core_web_sm': the small English core model trained on web text

In [2]:
doc = nlp(u'Tesla is looking at buying U.S. statup for $6 Million')

What actually happens here: the `nlp` object returned by `spacy.load('en_core_web_sm')` (the language library developed by spaCy that we just loaded) parses the entire string `'Tesla is looking at buying U.S. statup for $6 Million'` into separate components called <b>Tokens</b>.

In [3]:
for token in doc:
    print(token.text)

Tesla
is
looking
at
buying
U.S.
statup
for
$
6
Million


In [5]:
for token in doc:
    print(token.text, token.pos_)

Tesla PROPN
is AUX
looking VERB
at ADP
buying VERB
U.S. PROPN
statup NOUN
for ADP
$ SYM
6 NUM
Million NUM


What's special about this:

Tesla is not treated as a <b>verb</b>; it is recognized as a <b>proper noun</b> (`PROPN`).

Million is recognized as a <b>number</b> (`NUM`), not as an ordinary word.

In [6]:
for token in doc:
    print(token.text, token.pos_, token.dep_)

Tesla PROPN nsubj
is AUX aux
looking VERB ROOT
at ADP prep
buying VERB pcomp
U.S. PROPN compound
statup NOUN dobj
for ADP prep
$ SYM quantmod
6 NUM compound
Million NUM pobj
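
The abbreviations in the output above (`PROPN`, `nsubj`, and so on) can be looked up with `spacy.explain`, which returns a short description for a label. The sketch below is an addition to the original notebook; the list of labels is simply a selection taken from the output above.

```python
import spacy

# A selection of part-of-speech tags and dependency labels from the output above.
labels = ["PROPN", "AUX", "ADP", "NUM", "nsubj", "dobj", "pobj"]

# spacy.explain looks each label up in spaCy's built-in glossary.
descriptions = {label: spacy.explain(label) for label in labels}

for label, description in descriptions.items():
    print(f"{label:>6} : {description}")
```

This needs no loaded model, since the glossary ships with spaCy itself.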


## Pipeline

In [7]:
nlp.pipeline

[('tagger', <spacy.pipeline.pipes.Tagger at 0x249b408cd08>),
 ('parser', <spacy.pipeline.pipes.DependencyParser at 0x249a98f94c8>),
 ('ner', <spacy.pipeline.pipes.EntityRecognizer at 0x249a98f9468>)]

ner: Named Entity Recognizer

In [8]:
nlp.pipe_names

['tagger', 'parser', 'ner']
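
Note that the tokenizer does not appear in this list: it runs before the pipeline components. One way to see this is a blank English pipeline, sketched below. `spacy.blank` and the name `blank_nlp` are not used in the original notebook and are assumptions made for illustration; the sentence is the one from the first example, with the typo corrected.

```python
import spacy

# A blank English pipeline: tokenizer only, no tagger, parser, or ner.
blank_nlp = spacy.blank("en")
print(blank_nlp.pipe_names)

doc = blank_nlp("Tesla is looking at buying U.S. startup for $6 Million")

# Tokenization still works without any pipeline components...
tokens = [token.text for token in doc]
print(tokens)

# ...but attributes such as pos_ stay empty, because no tagger has run.
pos_tags = [token.pos_ for token in doc]
print(pos_tags)
```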

### Tokenization

- The very first step in processing any text is to split it into its component parts, such as `words` and `punctuation`. These parts are called tokens, and each token is annotated inside the doc object with descriptive information.

In [9]:
doc2 = nlp(u"Tesla isn't looking into startups anymore.")

In [10]:
for token in doc2:
    print(token.text, token.pos_, token.dep_)

Tesla PROPN nsubj
is AUX aux
n't PART neg
looking VERB ROOT
into ADP prep
startups NOUN pobj
anymore ADV advmod
. PUNCT punct
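
Notice that `isn't` was split into `is` and `n't`. The split does not lose any of the original text: each token records its character offset (`token.idx`) and any trailing whitespace (`token.text_with_ws`). The sketch below uses a blank pipeline (`spacy.blank`, an assumption not taken from the original notebook) because only the tokenizer is needed here.

```python
import spacy

blank_nlp = spacy.blank("en")  # tokenizer only; no trained model required
text = "Tesla isn't looking into startups anymore."
doc3 = blank_nlp(text)

# Each token knows where it starts in the original string.
for token in doc3:
    print(token.idx, repr(token.text_with_ws))

# Joining every token with its trailing whitespace rebuilds the input exactly.
rebuilt = "".join(token.text_with_ws for token in doc3)
print(rebuilt == text)
```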
