In [1]:
import spacy

# Load language library
nlp = spacy.load('en_core_web_sm')   # en_core: English core model, web_sm: small version trained on web text

In [2]:
doc = nlp(u'Tesla is looking at buying U.S. statup for $6 Million')  # u : Unicode string prefix (optional in Python 3)

What actually happens here is that the language library we loaded with

`nlp = spacy.load('en_core_web_sm')`

parses the entire string

`'Tesla is looking at buying U.S. statup for $6 Million'`

into separate components for us. These components are called <b>Tokens</b>.

In [3]:
for token in doc:
    print(token.text)

Tesla
is
looking
at
buying
U.S.
statup
for
$
6
Million


Each word in the string becomes its own <b>Token</b>. Note that `$` and `6` are split into separate tokens, while `U.S.` is kept together as one.

## token.pos_

- Token Part of speech

In [4]:
for token in doc:
    print(token.text, token.pos_)

Tesla PROPN
is AUX
looking VERB
at ADP
buying VERB
U.S. PROPN
statup NOUN
for ADP
$ SYM
6 NUM
Million NUM


What's special about this:

Tesla is not tagged as a <b>verb</b>; it is recognized as a <b>proper noun</b> (`PROPN`).

Million is recognized as a <b>number</b> (`NUM`), `not` as an ordinary <b>word</b>.

## token.dep_

- Token Syntactic dependency

In [5]:
for token in doc:
    print(token.text, token.pos_, token.dep_)

Tesla PROPN nsubj
is AUX aux
looking VERB ROOT
at ADP prep
buying VERB pcomp
U.S. PROPN compound
statup NOUN dobj
for ADP prep
$ SYM quantmod
6 NUM compound
Million NUM pobj
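The tags and labels in the output above are abbreviations. As a small sketch, `spacy.explain` can look up a short description for each one. The rows below are copied by hand from the output above rather than recomputed, so no model needs to be loaded; the `rows` list and the column layout are just illustrative choices.

```python
import spacy

# (text, pos, dep) rows copied from the notebook output above
rows = [
    ("Tesla", "PROPN", "nsubj"),
    ("at", "ADP", "prep"),
    ("statup", "NOUN", "dobj"),
    ("Million", "NUM", "pobj"),
]

# Look up a human-readable description for every tag and label
pos_meaning = {pos: spacy.explain(pos) for _, pos, _ in rows}
dep_meaning = {dep: spacy.explain(dep) for _, _, dep in rows}

for text, pos, dep in rows:
    # spacy.explain returns None for a label it does not know
    print(f"{text:<8} {pos:<6} {pos_meaning[pos]!s:<14} {dep:<6} {dep_meaning[dep]}")
```

The same lookup works for any tag that appears in `token.pos_`, `token.tag_`, or `token.dep_`.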


## Pipeline

In [6]:
nlp.pipeline

[('tagger', <spacy.pipeline.pipes.Tagger at 0x21f0f995e08>),
 ('parser', <spacy.pipeline.pipes.DependencyParser at 0x21f06483fa8>),
 ('ner', <spacy.pipeline.pipes.EntityRecognizer at 0x21f0f99e048>)]

ner : Named Entity Recognizer

In [7]:
nlp.pipe_names

['tagger', 'parser', 'ner']
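To see what the pipeline components contribute, it can help to compare against a pipeline that has none. The sketch below uses `spacy.blank("en")`, which provides only the English tokenizer; the name `blank_nlp` and the sample sentence are illustrative, not part of the original notebook.

```python
import spacy

# A blank English pipeline: tokenizer only, no tagger, parser, or ner
blank_nlp = spacy.blank("en")
print(blank_nlp.pipe_names)

doc = blank_nlp("Tesla is looking at buying U.S. startup for $6 Million")
for token in doc:
    # The text comes from the tokenizer; pos_ and dep_ stay empty
    # because no component has annotated them
    print(repr(token.text), repr(token.pos_), repr(token.dep_))
```

The tokens are the same as before, but the part-of-speech and dependency fields are empty strings, which suggests that those annotations come from the `tagger` and `parser` components rather than from tokenization itself.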

### Tokenization

- The very first step in processing any text is to split it into its component parts, such as `words` and `punctuation`. These parts are the <b>Tokens</b>, and each token is annotated inside the doc object with descriptive information.

In [11]:
doc2 = nlp(u"Tesla isn't    looking into startups anymore.")

In [12]:
for token in doc2:
    print(token.text, token.pos_, token.dep_)

Tesla PROPN nsubj
is AUX aux
n't PART neg
    SPACE 
looking VERB ROOT
into ADP prep
startups NOUN pobj
anymore ADV advmod
. PUNCT punct
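The `SPACE` row above comes from the run of extra spaces in the input. A minimal sketch of how to inspect this, assuming only the tokenizer is needed (so a blank pipeline is used and the names `nlp_tok` and `rebuilt` are illustrative):

```python
import spacy

nlp_tok = spacy.blank("en")  # tokenizer only; no trained model required
text = "Tesla isn't    looking into startups anymore."
doc = nlp_tok(text)

for token in doc:
    # is_space flags whitespace-only tokens;
    # whitespace_ holds the single trailing space after a token, if any
    print(repr(token.text), token.is_space, repr(token.whitespace_))

# Joining text_with_ws for every token rebuilds the original string
rebuilt = "".join(token.text_with_ws for token in doc)
print(rebuilt == text)

# Filter out whitespace tokens when only the words matter
words = [token.text for token in doc if not token.is_space]
print(words)
```

Because each token remembers its trailing whitespace, nothing from the input is lost, and whitespace tokens can simply be skipped when they are not wanted.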


#### Using indexing to grab the `Tokens` individually.

In [14]:
print(doc2[0])

print(doc2[0].pos_)

print(doc2[0].dep_)

Tesla
PROPN
nsubj


## Spans

In [15]:
doc3 = nlp(u'Although commonly attributed to John Lennon from his song "Beautiful Boy", \
the phrase "Life is what happens to us while we are making other plans" was written by \
cartoonist Allen Saunders and published in Reader\'s Digest in 1957, when Lennon was 17.')

In [16]:
life_quotes = doc3[16:30]

In [17]:
print(life_quotes)

"Life is what happens to us while we are making other plans"


In [18]:
type(life_quotes)

spacy.tokens.span.Span

In [19]:
type(doc3)

spacy.tokens.doc.Doc
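A `Span` is a slice of a `Doc`, and the slice indices count tokens, not characters. The sketch below uses a shorter, made-up sentence and a blank pipeline (both assumptions for illustration) to show how a span relates to its parent document.

```python
import spacy

nlp_tok = spacy.blank("en")  # slicing only needs the tokenizer
doc = nlp_tok('The phrase "Life is what happens" was written by Allen Saunders.')

quote = doc[2:8]  # tokens 2 through 7; the end index is exclusive
print(quote.text)
print(quote.start, quote.end, len(quote))

# A Span is a view onto its parent Doc, not a copy of the text
print(quote.doc is doc)

# Indexing a span is relative to the span; token.i is the position in the Doc
print(quote[1].text, quote[1].i)
```

The quotation marks are tokens of their own, which is why the slice has to start and end on them to include the quotes in the printed text.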

In [20]:
doc4 = nlp(u"This is the first sentence. This is the another sentence. This is the last sentence.")

In [21]:
for sen in doc4.sents:
    print(sen)

This is the first sentence.
This is the another sentence.
This is the last sentence.


In [28]:
for sen in doc4.sents:
    print(sen[0])

This
This
This


In [29]:
doc4[6].is_sent_start

True

In [30]:
doc4[7]

is

In [31]:
doc4[8]

the

In [32]:
doc4[8].is_sent_start

This returns `None` in the spaCy version used here, because `the` is not the start of a sentence.
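Sentence boundaries can also be set without a trained parser. The sketch below assumes spaCy 3.x, where `add_pipe` accepts the component name as a string (under 2.x the equivalent would be `nlp.add_pipe(nlp.create_pipe('sentencizer'))`), and uses the rule-based `sentencizer` on the same text as `doc4`. The names `nlp_rule` and `starts` are illustrative.

```python
import spacy

nlp_rule = spacy.blank("en")
nlp_rule.add_pipe("sentencizer")  # assumption: spaCy 3.x call style

doc = nlp_rule("This is the first sentence. This is the another sentence. This is the last sentence.")

# Each sentence is a Span; .start is the index of its first token in the Doc
starts = [sent.start for sent in doc.sents]
print(starts)

for i in (6, 8):
    # Only tokens that open a sentence report a true value
    print(i, doc[i].text, doc[i].is_sent_start)
```

The exact value reported for a non-initial token can differ between spaCy versions and between the parser and the sentencizer, so testing it for truthiness (`if token.is_sent_start:`) is safer than comparing against `None` or `False` directly.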