Get more info in the [spaCy docs](https://spacy.io) or learn more with the [Intro to spaCy course](https://course.spacy.io/en/chapter1)! 

Install spaCy and download the small English model `en_core_web_sm`.

In [None]:
!python -m pip install spacy
!python -m spacy download en_core_web_sm

In [1]:
import spacy
from spacy import displacy

In [2]:
# load the small English model
nlp = spacy.load("en_core_web_sm")

### Lexical Attributes
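Lexical attributes refer to the entry in the vocabulary and don't depend on the token's context, for example whether a token consists of alphabetic characters (`is_alpha`), is punctuation (`is_punct`) or resembles a number (`like_num`).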

In [3]:
doc = nlp("It costs $5.")

print("Index: ", [token.i for token in doc]) 
print("Text: ", [token.text for token in doc])
 
print("is_alpha:", [token.is_alpha for token in doc]) 
print("is_punct:", [token.is_punct for token in doc]) 
print("like_num:", [token.like_num for token in doc])

Index:  [0, 1, 2, 3, 4]
Text:  ['It', 'costs', '$', '5', '.']
is_alpha: [True, True, False, False, False]
is_punct: [False, False, False, False, True]
like_num: [False, False, False, True, False]
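Lexical attributes live in the vocabulary, so they can be read from a `Lexeme` with no sentence at all, and the same word gets the same values in every context. This is a minimal sketch that assumes spaCy v3. It uses a blank English pipeline because no trained model is needed, and the second sentence is only an illustrative example:

```python
import spacy

# A blank pipeline only has a tokenizer, which is enough for lexical attributes
nlp = spacy.blank("en")

# Look up the vocabulary entry (a Lexeme) for a word, with no surrounding context
lexeme = nlp.vocab["costs"]
print(lexeme.text, lexeme.is_alpha, lexeme.like_num, lexeme.shape_)

# The same word in two different sentences gets the same lexical attributes
first = nlp("It costs $5.")[1]
second = nlp("The costs are high.")[1]
for token in (first, second):
    print(token.text, token.is_alpha, token.like_num, token.shape_)
```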


### POS tags
The trained pipeline predicts a part-of-speech tag for each token, such as `NOUN` or `VERB`, and stores it in `token.pos_`.

https://spacy.io/usage/linguistic-features#pos-tagging

In [4]:
# Process a text
doc = nlp("She ate the pizza")

# Predict part-of-speech tags
for token in doc:
    # Print the text and the predicted part-of-speech tag
    print(token.text, token.pos_)

She PRON
ate VERB
the DET
pizza NOUN
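The coarse-grained tags in `token.pos_` follow the Universal Dependencies part-of-speech scheme, and `spacy.explain` returns a short description for a tag. This sketch assumes spaCy v3. It builds the `Doc` by hand with the tags printed above, which stand in for the model's predictions, so it runs without downloading `en_core_web_sm`:

```python
import spacy
from spacy.tokens import Doc

nlp = spacy.blank("en")

# Hand-written tags standing in for the model's predictions on "She ate the pizza"
doc = Doc(
    nlp.vocab,
    words=["She", "ate", "the", "pizza"],
    pos=["PRON", "VERB", "DET", "NOUN"],
)

for token in doc:
    # spacy.explain looks the tag up in spaCy's built-in glossary
    print(token.text, token.pos_, spacy.explain(token.pos_))

# Tags make it easy to filter tokens, e.g. to keep only the nouns
nouns = [token.text for token in doc if token.pos_ == "NOUN"]
print(nouns)
```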


### Dependency parsing
In addition to the part-of-speech tags, we can also predict how the words are related, for example whether a word is the subject or the object of a verb. Each token's `dep_` attribute holds its dependency label, and `head` points to the token it attaches to.

https://spacy.io/usage/linguistic-features#dependency-parse

In [5]:
# Predict the dependency labels and syntactic heads
for token in doc:
    print(token.text, token.pos_, token.dep_, token.head.text)

displacy.render(doc, style="dep", jupyter=True)

She PRON nsubj ate
ate VERB ROOT ate
the DET det pizza
pizza NOUN dobj ate
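Because each token stores its head and dependency label, the parse can be walked like a tree, for example to find the subject and object of the main verb. This is a minimal sketch that assumes spaCy v3, where `heads` are absolute token indices. The hand-written heads and labels copy the parse printed above and stand in for the parser's output:

```python
import spacy
from spacy.tokens import Doc

nlp = spacy.blank("en")

words = ["She", "ate", "the", "pizza"]
heads = [1, 1, 3, 1]  # index of each token's head; the root ("ate") points to itself
deps = ["nsubj", "ROOT", "det", "dobj"]
doc = Doc(nlp.vocab, words=words, heads=heads, deps=deps)

# The root of the sentence is the main verb
root = [token for token in doc if token.dep_ == "ROOT"][0]

# Its direct children carry the subject and object relations
subjects = [child.text for child in root.children if child.dep_ == "nsubj"]
objects = [child.text for child in root.children if child.dep_ == "dobj"]
print(root.text, subjects, objects)
```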


### Named entities
Named entities are "real-world objects" that are assigned a name, for example a person, an organization or a country.
The `doc.ents` property lets you access the named entities predicted by the named entity recognition model.

https://spacy.io/usage/linguistic-features#named-entities

In [6]:
doc = nlp("Apple is looking at buying U.K. startup for $1 billion")

# Iterate over the predicted entities
for ent in doc.ents:
    # Print the entity text and its label
    print(ent.text, ent.label_)

displacy.render(doc, style="ent", jupyter=True)

Apple ORG
U.K. GPE
$1 billion MONEY
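Each entity is a `Span`, so it also carries token and character offsets into the original text, which helps when highlighting or exporting annotations. This sketch assumes spaCy v3 and recreates the entities above by hand with IOB tags (`B-` begins an entity, `I-` continues it, `O` is outside) instead of running the model:

```python
import spacy
from spacy.tokens import Doc

nlp = spacy.blank("en")

words = ["Apple", "is", "looking", "at", "buying", "U.K.", "startup", "for", "$", "1", "billion"]
# spaces[i] says whether words[i] is followed by a space ("$1" has none)
spaces = [True, True, True, True, True, True, True, True, False, True, False]
# Hand-written IOB tags standing in for the entity recognizer's predictions
ents = ["B-ORG", "O", "O", "O", "O", "B-GPE", "O", "O", "B-MONEY", "I-MONEY", "I-MONEY"]
doc = Doc(nlp.vocab, words=words, spaces=spaces, ents=ents)

for ent in doc.ents:
    # start_char/end_char index into doc.text; start/end index into the tokens
    print(ent.text, ent.label_, ent.start_char, ent.end_char, ent.start, ent.end)
```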


### Spans
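Unlike named entities, which have clear token boundaries and are often composed of the same kinds of syntactic units, spans can overlap and can be composed of arbitrary phrases.
The `doc.spans` property stores groups of spans under string keys, for example the spans predicted by a span categorizer (`spancat`) component. The small English pipeline doesn't include a span categorizer, so the example below adds two overlapping spans by hand under the key `"sc"`, which is the key displaCy renders by default.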

https://spacy.io/api/spancategorizer

In [7]:
from spacy.tokens import Span

doc = nlp("Welcome to the Bank of China.")

# Span(doc, start, end, label) covers tokens start..end-1
doc.spans["sc"] = [
    Span(doc, 3, 6, "ORG"),  # "Bank of China"
    Span(doc, 5, 6, "GPE"),  # "China"
]

displacy.render(doc, style="span", jupyter=True)
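The difference from named entities shows up when spans overlap: `doc.spans` accepts "Bank of China" and "China" together, while assigning the same two spans to `doc.ents` fails, because each token can belong to at most one entity. This is a minimal sketch that assumes spaCy v3, where the overlap raises a `ValueError`. It uses a blank pipeline since the spans are set by hand anyway:

```python
import spacy
from spacy.tokens import Span

nlp = spacy.blank("en")
doc = nlp("Welcome to the Bank of China.")

spans = [Span(doc, 3, 6, label="ORG"), Span(doc, 5, 6, label="GPE")]

# doc.spans can hold overlapping spans
doc.spans["sc"] = spans
for span in doc.spans["sc"]:
    print(span.text, span.label_, span.start, span.end)

# doc.ents cannot: "China" (token 5) would belong to two entities
overlap_rejected = False
try:
    doc.ents = spans
except ValueError:
    overlap_rejected = True
print("Overlapping entities rejected:", overlap_rejected)
```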

### Lemmatizer
The lemmatizer assigns base forms (lemmas) to tokens, for example "be" for "are" or "good" for "best".

https://spacy.io/usage/linguistic-features#lemmatization

In [8]:
doc = nlp("Apples are the best fruit.")

for token in doc:
    print(token.text, token.lemma_)

Apples apple
are be
the the
best good
fruit fruit
. .
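Lemmas are handy for normalizing text, for example when counting words, where "Apples" and "apple" should count as the same word. This is a minimal sketch that assumes spaCy v3. The sentences and their lemmas are hand-written stand-ins for the lemmatizer's output, so it runs without a trained model:

```python
from collections import Counter

import spacy
from spacy.tokens import Doc

nlp = spacy.blank("en")

# Hand-written lemmas standing in for the lemmatizer's predictions
words = ["Apples", "are", "good", ".", "This", "apple", "is", "the", "best", "."]
lemmas = ["apple", "be", "good", ".", "this", "apple", "be", "the", "good", "."]
doc = Doc(nlp.vocab, words=words, lemmas=lemmas)

# Counting lowercased forms treats "Apples" and "apple" as different words
form_counts = Counter(token.lower_ for token in doc if token.is_alpha)
# Counting lemmas groups the inflected forms together
lemma_counts = Counter(token.lemma_ for token in doc if token.is_alpha)

print(form_counts["apple"], lemma_counts["apple"], lemma_counts["be"])
```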


### Sentencizer
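The sentencizer is a simple rule-based component that sets sentence boundaries based on punctuation, without needing the dependency parser. Here it is added to a blank English pipeline, which replaces the `nlp` object loaded earlier.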

https://spacy.io/api/sentencizer

In [9]:
nlp = spacy.blank("en")
nlp.add_pipe("sentencizer")

<spacy.pipeline.sentencizer.Sentencizer at 0x7ffbb58c59c0>

In [10]:
doc = nlp("This is a sentence. This is another sentence.")

print("Number of sentences: ", len(list(doc.sents)))

for sent in doc.sents:
    print(sent)

Number of sentences:  2
This is a sentence.
This is another sentence.
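When punctuation-based splitting isn't what you need, the boundaries can come from your own rule in a custom pipeline component that sets `token.is_sent_start`. This is a minimal sketch that assumes spaCy v3. The component name `semicolon_sentencizer` and its split-after-semicolons rule are made up for illustration:

```python
import spacy
from spacy.language import Language


# Hypothetical rule: a new sentence starts right after every semicolon
@Language.component("semicolon_sentencizer")
def semicolon_sentencizer(doc):
    for token in doc:
        # The first token always starts a sentence; any other token starts
        # one only if the previous token is a semicolon
        token.is_sent_start = token.i == 0 or doc[token.i - 1].text == ";"
    return doc


nlp = spacy.blank("en")
nlp.add_pipe("semicolon_sentencizer")

doc = nlp("First clause; second clause; third clause.")
for sent in doc.sents:
    print(sent.text)
```

For the simpler case of changing which characters end a sentence, the built-in `sentencizer` also takes a `punct_chars` setting, e.g. `nlp.add_pipe("sentencizer", config={"punct_chars": [";"]})`.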
