# <h1><center>spaCy Basics</center></h1>

In [0]:
import spacy

Loading the model

In [0]:
nlp = spacy.load('en_core_web_sm')  # en_core_web_sm = the small core English pipeline

In [0]:
doc = nlp(u'Tesla is looking at buying U.S. startup for $6 millions')  # the u prefix marks a Unicode string (optional in Python 3)

In [6]:
for token in doc:
  print(token.text)

Tesla
is
looking
at
buying
U.S.
startup
for
$
6
millions
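Note that `U.S.` stays a single token while `$6` is split into `$` and `6`. Tokenization is rule-based, so it can be reproduced without loading a trained model. The sketch below assumes `spacy.blank("en")` (not used in the original notebook), which builds an English pipeline that contains only the tokenizer; the variable names are illustrative.

```python
import spacy

# Assumption: a blank English pipeline (tokenizer only, no trained components).
nlp_blank = spacy.blank("en")
doc_blank = nlp_blank("Tesla is looking at buying U.S. startup for $6 millions")

# Collect the token texts; the abbreviation is kept whole, the currency symbol is split off.
tokens = [token.text for token in doc_blank]
print(tokens)
```
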


In [9]:
for token in doc:
  print(token.text, token.pos)  # pos = part-of-speech tag as an integer ID

Tesla 96
is 87
looking 100
at 85
buying 100
U.S. 96
startup 92
for 85
$ 99
6 93
millions 92
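The numbers above are integer IDs; spaCy stores strings once and refers to them by ID. As a minimal sketch, an ID can be resolved back to its label through the vocabulary's string store. The blank pipeline and the names `vocab` and `pos_name` are assumptions made for illustration; any loaded pipeline's `vocab` should behave the same way.

```python
import spacy
from spacy.symbols import PROPN  # built-in symbol for the proper-noun tag

# Assumption: a blank pipeline is enough here, because built-in symbols
# are known to every vocabulary.
vocab = spacy.blank("en").vocab

# Look up the readable label that belongs to the integer ID.
pos_name = vocab.strings[PROPN]
print(PROPN, pos_name)
```

In the notebook itself the same lookup would be `doc.vocab.strings[token.pos]`, which yields the value that `token.pos_` returns directly.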


In [10]:
for token in doc:
  print(token.text,token.pos_)

Tesla PROPN
is AUX
looking VERB
at ADP
buying VERB
U.S. PROPN
startup NOUN
for ADP
$ SYM
6 NUM
millions NOUN


PROPN = proper noun
AUX = auxiliary verb, and so on.
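Rather than memorizing the abbreviations, the labels can be looked up with `spacy.explain`, which returns a short description for a known label and `None` otherwise. A minimal sketch (the dictionary name `descriptions` is illustrative):

```python
import spacy

# Tags that appear in the output above.
tags = ["PROPN", "AUX", "VERB", "ADP", "NOUN", "NUM", "SYM"]

# Map each tag to the description from spaCy's built-in glossary.
descriptions = {tag: spacy.explain(tag) for tag in tags}
for tag, description in descriptions.items():
    print(tag, "=", description)
```
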


In [11]:
for token in doc:
  print(token.text,token.dep_)   #dep_ = syntactic dependency

Tesla nsubj
is aux
looking ROOT
at prep
buying pcomp
U.S. compound
startup dobj
for prep
$ quantmod
6 nummod
millions pobj


Inspecting the pipeline

In [12]:
nlp.pipeline

[('tagger', <spacy.pipeline.pipes.Tagger at 0x7f8aa8901f98>),
 ('parser', <spacy.pipeline.pipes.DependencyParser at 0x7f8aa8759648>),
 ('ner', <spacy.pipeline.pipes.EntityRecognizer at 0x7f8aa87596a8>)]

In [13]:
nlp.pipe_names

['tagger', 'parser', 'ner']
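The output above comes from spaCy v2; newer releases list additional components for the same model. To see how the pipeline list changes, the sketch below starts from an empty pipeline and adds one built-in component. It assumes `spacy.blank("en")` and the built-in `sentencizer` component, neither of which appears in the original notebook, and it branches on the version because `add_pipe` takes a component name in v3 but a component object in v2.

```python
import spacy

# Assumption: start from a blank English pipeline with no components.
nlp_demo = spacy.blank("en")
names_before = list(nlp_demo.pipe_names)
print(names_before)

# Add the rule-based sentence splitter; the call differs between major versions.
if spacy.__version__.startswith("2."):
    nlp_demo.add_pipe(nlp_demo.create_pipe("sentencizer"))
else:
    nlp_demo.add_pipe("sentencizer")

names_after = list(nlp_demo.pipe_names)
print(names_after)
```
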

___
## Spans
Large Doc objects can be hard to work with at times. A **span** is a slice of a Doc object, written `doc[start:stop]`, where `start` and `stop` are token indices and `stop` is exclusive.

In [0]:
doc3 = nlp(u'Although commonly attributed to John Lennon from his song "Beautiful Boy", \
the phrase "Life is what happens to us while we are making other plans" was written by \
cartoonist Allen Saunders and published in Reader\'s Digest in 1957, when Lennon was 17.')

In [15]:
life_quote = doc3[16:30]
print(life_quote)

"Life is what happens to us while we are making other plans"


In [16]:
type(life_quote)

spacy.tokens.span.Span

In [17]:
type(doc3)

spacy.tokens.doc.Doc
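A span keeps track of where it sits in its parent Doc. As a small sketch on a shorter text, the slice boundaries can be read back from `Span.start` and `Span.end`. The blank pipeline and the names `doc_demo` and `span_demo` are assumptions for illustration; slicing works the same on a Doc produced by a trained model.

```python
import spacy
from spacy.tokens import Span

# Assumption: a blank English pipeline, since slicing only needs tokenization.
nlp_blank = spacy.blank("en")
doc_demo = nlp_blank("Life is what happens to us while we are making other plans")

# Slice the first three tokens; the stop index is exclusive.
span_demo = doc_demo[0:3]
print(span_demo.text)
print(span_demo.start, span_demo.end, len(span_demo))
print(isinstance(span_demo, Span))
```
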

___
## Sentences
Certain tokens inside a Doc object may also receive a "start of sentence" tag. While this doesn't immediately build a list of sentences, these tags enable the generation of sentence segments through `Doc.sents`. Later we'll write our own segmentation rules.

In [0]:
doc4 = nlp(u'This is the first sentence. This is another sentence. This is the last sentence.')

In [19]:
for sent in doc4.sents:
    print(sent)

This is the first sentence.
This is another sentence.
This is the last sentence.


In [20]:
doc4[6].is_sent_start

True
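Token 6 is the first word of the second sentence, which is why its flag is set. The sketch below collects every sentence-start index and turns `Doc.sents` into a list; `Doc.sents` is a generator, so it cannot be indexed directly. It assumes a blank pipeline with the rule-based `sentencizer` component instead of the notebook's `en_core_web_sm` parser, so boundaries here come from punctuation rules only.

```python
import spacy

# Assumption: blank pipeline plus the built-in rule-based sentence splitter.
nlp_rules = spacy.blank("en")
if spacy.__version__.startswith("2."):
    nlp_rules.add_pipe(nlp_rules.create_pipe("sentencizer"))
else:
    nlp_rules.add_pipe("sentencizer")

doc_demo = nlp_rules("This is the first sentence. This is another sentence. This is the last sentence.")

# Indices of the tokens flagged as sentence starts.
starts = [token.i for token in doc_demo if token.is_sent_start]
print(starts)

# Materialize the generator to allow indexing.
sentences = list(doc_demo.sents)
print(sentences[1].text)
```
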

In [0]:
doc4[7].is_sent_start