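The first step in creating a `Doc` object is to break down the incoming text into component pieces or "tokens". spaCy first splits the text on whitespace, then applies the following kinds of rules to each piece: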

-  **Prefix**: Character(s) at the beginning ▸ `$ ( “ ¿`
-  **Suffix**: Character(s) at the end ▸ `km ) , . ! ”`
-  **Infix**: Character(s) in between ▸ `- -- / ...`
-  **Exception**: Special-case rule to split a string into several tokens or prevent a token from being split when punctuation rules are applied ▸ `St. U.S.`
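
To make the interaction of the four rule types concrete, here is a minimal pure-Python sketch. It is not spaCy's implementation: the rule tables, `split_chunk`, and `toy_tokenize` are hypothetical names and values chosen for illustration, loosely based on the examples in the list above. The idea is that each whitespace-delimited chunk is checked against the exceptions, then prefixes and suffixes are peeled off one at a time, and whatever remains is split on infixes.

```python
import re

# Toy rule tables (assumptions for illustration, not spaCy's real rules).
PREFIXES = ('$', '(', '"', '¿')
SUFFIXES = ('km', ')', ',', '.', '!', '"')
INFIX_RE = re.compile(r'(--|\.\.\.|[-/])')
EXCEPTIONS = {
    "St.": ["St."],          # keep the trailing period attached
    "U.S.": ["U.S."],
    "L.A.": ["L.A."],
    "We're": ["We", "'re"],  # split a contraction into two tokens
}


def split_chunk(chunk):
    """Split one whitespace-delimited chunk into a list of token strings."""
    prefixes, suffixes, middle = [], [], []
    while chunk:
        # An exception takes priority over the punctuation rules.
        if chunk in EXCEPTIONS:
            middle = list(EXCEPTIONS[chunk])
            break
        prefix = next((p for p in PREFIXES if chunk.startswith(p)), None)
        if prefix:
            prefixes.append(prefix)
            chunk = chunk[len(prefix):]
            continue
        suffix = next((s for s in SUFFIXES if chunk.endswith(s)), None)
        if suffix:
            suffixes.append(suffix)
            chunk = chunk[:-len(suffix)]
            continue
        # Nothing left to peel off: split the remainder on infixes.
        middle = [part for part in INFIX_RE.split(chunk) if part]
        break
    # Suffixes were collected from the outside in, so reverse them.
    return prefixes + middle + suffixes[::-1]


def toy_tokenize(text):
    """Tokenize text by splitting on whitespace, then applying the toy rules."""
    tokens = []
    for chunk in text.split():
        tokens.extend(split_chunk(chunk))
    return tokens


print(toy_tokenize('"We\'re moving to L.A.!"'))
# ['"', 'We', "'re", 'moving', 'to', 'L.A.', '!', '"']
print(toy_tokenize("send snail-mail, it costs $10.30."))
```

Re-checking the exception table after every prefix or suffix is removed is what lets `L.A.` survive intact even though `.` is also a suffix.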

In [1]:
import spacy
nlp = spacy.load('en_core_web_sm')

In [2]:
mystring = '"We\'re moving to L.A.!"'
mystring

'"We\'re moving to L.A.!"'

In [4]:
print(mystring)

"We're moving to L.A.!"


In [3]:
doc = nlp(mystring)

In [4]:
for token in doc:
    print(token.text)

"
We
're
moving
to
L.A.
!
"


In [5]:
doc2 = nlp(u"We're here to help! send snail-mail, email support@oursite.com or visit us http://www.oursite.com!")

In [8]:
for t in doc2:
    print(t)

We
're
here
to
help
!
send
snail
-
mail
,
email
support@oursite.com
or
visit
us
http://www.oursite.com
!


In [11]:
doc3 = nlp(u"A 5 km NYC cab ride costs $10.30.")

In [12]:
for t in doc3:
    print(t)

A
5
km
NYC
cab
ride
costs
$
10.30
.


In [15]:
doc4 = nlp(u"Let's visit St. Louis in the U.S next year.")

In [16]:
for t in doc4:
    print(t)

Let
's
visit
St.
Louis
in
the
U.S
next
year
.


In [17]:
len(doc4)

11

In [19]:
len(doc4.vocab)

802

In [20]:
doc5 = nlp(u"It is better to give than receive.")

In [21]:
doc5[0]

It

In [22]:
doc5[2:5]

better to give

In [23]:
doc5[0] = 'test'

TypeError: 'spacy.tokens.doc.Doc' object does not support item assignment

In [24]:
doc6 = nlp(u"Apple to build a Hong Kong factory for $6 million")

In [25]:
for token in doc6:
    print(token.text, end=' | ')

Apple | to | build | a | Hong | Kong | factory | for | $ | 6 | million | 

### Named Entities

Going a step beyond tokens, *named entities* add another layer of context. The language model recognizes that certain words are organizational names while others are locations, and still other combinations relate to money, dates, etc. Named entities are accessible through the `ents` property of a `Doc` object.  

https://spacy.io/usage/linguistic-features#named-entities

In [31]:
for entity in doc6.ents:
    print(entity)
    print(entity.label_)
    print(str(spacy.explain(entity.label_)))
    print('\n')

Apple
ORG
Companies, agencies, institutions, etc.


Hong Kong
GPE
Countries, cities, states


$6 million
MONEY
Monetary values, including unit
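
Each entity printed above covers one or more of the tokens of `doc6`. In spaCy an entity is a `Span` with `start` and `end` token offsets; the plain-Python sketch below only imitates that idea. The token list copies the `doc6` output shown earlier, while the trailing-space flags, the `(start, end, label)` triples, and the helper `span_text` are hand-written assumptions for illustration rather than values read from spaCy.

```python
# (text, followed_by_space) pairs for the tokens of doc6, written by hand.
tokens = [
    ('Apple', True), ('to', True), ('build', True), ('a', True),
    ('Hong', True), ('Kong', True), ('factory', True), ('for', True),
    ('$', False), ('6', True), ('million', False),
]

# (start, end, label) token offsets with an exclusive end, mirroring the
# three entities printed above.
entities = [(0, 1, 'ORG'), (4, 6, 'GPE'), (8, 11, 'MONEY')]


def span_text(tokens, start, end):
    """Rebuild the text of tokens[start:end], restoring original spacing."""
    pieces = [text + (' ' if space else '') for text, space in tokens[start:end]]
    return ''.join(pieces).rstrip()


for start, end, label in entities:
    print(f'{span_text(tokens, start, end)!r} covers tokens {start}..{end - 1} ({label})')
```

Tracking whether each token is followed by a space is what allows `$`, `6`, and `million` to be joined back into `$6 million` rather than `$ 6 million`.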




### Noun Chunks
Like `Doc.ents`, `Doc.noun_chunks` is another property of the `Doc` object. *Noun chunks* are "base noun phrases" – flat phrases that have a noun as their head. You can think of a noun chunk as a noun plus the words describing it – for example, in [Sheb Wooley's 1958 song](https://en.wikipedia.org/wiki/The_Purple_People_Eater), a *"one-eyed, one-horned, flying, purple people-eater"* would be one long noun chunk.  

https://spacy.io/usage/linguistic-features#noun-chunks

In [32]:
doc7 = nlp(u"Autonomous cars shift insurance liability toward manufacturers.")

In [33]:
for chunk in doc7.noun_chunks:
    print(chunk)

Autonomous cars
insurance liability
manufacturers


### Built-in Visualizers

spaCy includes a built-in visualization tool called **displaCy**. displaCy is able to detect whether you're working in a Jupyter notebook, and will return markup that can be rendered in a cell right away. When you export your notebook, the visualizations will be included as HTML.

For more info visit https://spacy.io/usage/visualizers

In [34]:
from spacy import displacy

In [39]:
doc = nlp(u"Apple is going to build a U.K. factory for $6 million.")

In [44]:
displacy.render(doc, style='dep', jupyter=True, options={'distance':80})

In [49]:
doc = nlp(u"Over the last quarter Apple sold nearly 20 thousand iPods for a profit of $6 million.")

In [50]:
displacy.render(doc, style='ent', jupyter=True)

In [52]:
doc = nlp(u"This is a sentence.")
displacy.serve(doc, style='dep')


Using the 'dep' visualizer
Serving on http://0.0.0.0:5000 ...

Shutting down server on port 5000.


In [54]:
# While the server is running, view the result in a browser at 127.0.0.1:5000