
## Introduction to spaCy
<img src="https://miro.medium.com/max/1200/1*HTtQseukwrBiREJf8MSVcA.jpeg" alt="Spacy Logo" style="width: 600px;"/>


- [Main Documentation Page](https://spacy.io/)  
- [How to install spaCy](https://spacy.io/usage)
- [spaCy 101: the most important concepts, explained in simple terms](https://spacy.io/usage/spacy-101)
- [Free course: Advanced NLP with spaCy](https://course.spacy.io/)

### Without spaCy, Python processes text as a sequence of characters (called a string). We can slice a string, concatenate strings, replace sections of a string, and perform many other tasks.

See [w3schools string functions](https://www.w3schools.com/python/python_ref_string.asp)

Common examples for working with strings:

In [None]:
# Selecting a slice: select part of the string with [begin:end]
wilde = 'Be yourself; everyone else is already taken.'
print('the string "{}" has {} characters. Note that the index begins at 0.'.format(wilde, len(wilde)))

# Print each character alongside its index
for i, character in enumerate(wilde):
    print(i, character)

In [None]:
print('wilde[4:12] starts at position 4 and stops before position 12 ->', wilde[4:12])
print('or count backwards from the end [-40:-32] ->', wilde[-40:-32])
print('you can even mix forward and backward! wilde[-40:12] ->', wilde[-40:12])

In [None]:
#Find and replace
wilde = 'Be yourself; everyone else is already taken.'
wilde.replace('yourself', 'a fish').replace('everyone', 'everything')

In [None]:
#Split 
wilde = 'Be yourself; everyone else is already taken.'
print(wilde.split()) # Split on empty spaces
print(wilde.split(';'))
print(wilde.split('y'))  # Note that the character or space used to split the string is removed

In [None]:
# We can join a list of strings
print(' '.join(['Be', 'yourself;', 'everyone', 'else', 'is', 'already', 'taken.']))

import random

animals = ['fish', 'turtle', 'panther', 'parrot']
adjective = ['scary', 'green', 'overweight', 'fluffy']
print('Be a ' + ' '.join([random.choice(adjective), random.choice(animals)]) + ' everything else is taken.')


### spaCy gives the machine an understanding of text, not just as a sequence of characters, but as natural language

[A full list of base languages](https://github.com/explosion/spaCy/tree/master/spacy/lang)




In [1]:
from spacy.lang.de import German

nlp = German()
doc = nlp('Sei du selbst! Alle anderen sind bereits vergeben.')


from spacy.lang.en import English 

nlp = English()
doc = nlp('Be yourself; everyone else is already taken.')

### The document object
Once we have loaded a base language class or a language model and passed it a text, spaCy creates what is called a document (doc) object.  
The doc object typically contains:


|   [attributes](https://spacy.io/api/doc#attributes) |   | 
|---|---|
| tokens (individual parts of the text)  | doc[5]  |
|  the text  | doc.text |
| the text split into sentences  | doc.sents |
| entities detected in the text | doc.ents |


Full documentation can be found [here](https://spacy.io/api/doc).
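
As a rough illustration of the attributes in the table, the sketch below builds a doc from a blank English pipeline. It assumes spaCy v3, where `nlp.add_pipe("sentencizer")` adds a rule-based sentence splitter; without some component that sets sentence boundaries, `doc.sents` raises an error. The quote is slightly altered here so that it contains two sentences.

```python
import spacy

# A blank pipeline only tokenizes; the sentencizer adds rule-based sentence boundaries.
# Assumption: spaCy v3 (in v2 the call would be nlp.add_pipe(nlp.create_pipe("sentencizer"))).
nlp = spacy.blank("en")
nlp.add_pipe("sentencizer")

text = "Be yourself. Everyone else is already taken."
doc = nlp(text)

print(doc.text)   # the original string
print(len(doc))   # number of tokens, not characters
print(doc[5])     # a single Token object

sentences = [sent.text for sent in doc.sents]
print(sentences)

print(doc.ents)   # empty here: a blank pipeline has no entity recognizer
```

Indexing a doc returns tokens, whereas indexing a string returns characters, which is why `len(doc)` is much smaller than `len(text)`.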


In [2]:
# Define the string again so that this cell also runs in a fresh kernel
wilde = 'Be yourself; everyone else is already taken.'

print('**Note the difference between working with a slice of a doc object versus a Python string**')

print(wilde[:3])  # the first three characters
print(doc[:3])    # the first three tokens

print('**Also note how spaCy tokenization differs from Python split()**')
print('[*] Python:')
for token in wilde.split():
    print(token)

print('------')
print('[*] spaCy:')
for token in doc:
    print(token)

In [3]:
# The to_json() method is a useful way to view the information contained in the doc object

doc = nlp('''spaCy is a free, open-source library for advanced Natural Language Processing (NLP) in Python.
If you’re working with a lot of text, you’ll eventually want to know more about it. For example, what’s it about? What do the words mean in context? Who is doing what to whom? What companies and products are mentioned? Which texts are similar to each other?
spaCy is designed specifically for production use and helps you build applications that process and “understand” large volumes of text. It can be used to build information extraction or natural language understanding systems, or to pre-process text for deep learning..''')

doc.to_json()

{'text': 'spaCy is a free, open-source library for advanced Natural Language Processing (NLP) in Python.\nIf you’re working with a lot of text, you’ll eventually want to know more about it. For example, what’s it about? What do the words mean in context? Who is doing what to whom? What companies and products are mentioned? Which texts are similar to each other?\nspaCy is designed specifically for production use and helps you build applications that process and “understand” large volumes of text. It can be used to build information extraction or natural language understanding systems, or to pre-process text for deep learning..',
 'tokens': [{'id': 0, 'start': 0, 'end': 5},
  {'id': 1, 'start': 6, 'end': 8},
  {'id': 2, 'start': 9, 'end': 10},
  {'id': 3, 'start': 11, 'end': 15},
  {'id': 4, 'start': 15, 'end': 16},
  {'id': 5, 'start': 17, 'end': 21},
  {'id': 6, 'start': 21, 'end': 22},
  {'id': 7, 'start': 22, 'end': 28},
  {'id': 8, 'start': 29, 'end': 36},
  {'id': 9, 'start': 37, '

<img src="https://upload.wikimedia.org/wikipedia/commons/thumb/8/88/SpaCy_logo.svg/1200px-SpaCy_logo.svg.png" alt="Spacy Logo" style="width: 80px;"/>  
##  Tokens
As you can see above, the doc object splits the text into tokens.  Each token object has [65 attributes](https://spacy.io/api/token#attributes) that can be used during analysis.  Common tasks include:
- removing all punctuation from the text
- counting root forms of the words (lemmata)
- removing stopwords from the doc


|   [attributes](https://spacy.io/api/token#attributes) |   | 
|---|---|
| root form (lemma)  | token.lemma_  |
| Named entity type  | token.ent_type_ |
| token is punctuation  | token.is_punct |
| part of speech | token.pos_ |
| in stop words | token.is_stop |


Full documentation can be found [here](https://spacy.io/api/token#_title).
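
Two of the common tasks listed above, removing punctuation and removing stop words, can be sketched with a blank English pipeline, since `is_punct` and `is_stop` are lexical attributes that do not need a trained model. The example sentence is made up for illustration. Lemmas and parts of speech are a different matter: they generally require a trained pipeline.

```python
import spacy

nlp = spacy.blank("en")  # tokenizer only; enough for is_punct and is_stop
doc = nlp("The parrot sat on the green turtle.")

# Keep only tokens that are neither punctuation nor stop words
content_words = [
    token.text
    for token in doc
    if not token.is_punct and not token.is_stop
]
print(content_words)
```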


In [None]:
for token in doc:
    print(token.text,
         token.lemma_,
         token.pos_,
         token.dep_,
         token.shape_,
         token.is_stop)

In [None]:
# Useful function to make sense of linguistic terminology and abbreviations 
import spacy 

spacy.explain("PRON")


In [6]:
import spacy
nlp = spacy.load('ru_model0')
text = """Я, Франсуа Вийон, школяр, 
         В сем пятьдесят шестом году, 
         Поостудив сердечный жар, 
         И наложив на мысль узду, 
         И зная, что к концу иду, 
         Нашел, что время приглядеться 
         К себе и своему труду, 
         Как учит римлянин Вегеций.""" 
doc = nlp(text)

In [13]:
# This cell prints all of the attributes of a token. Change the index to inspect different tokens in the doc.

token_index = 0

import inspect

# Collect every member of the token that is not a method or function
attributes = inspect.getmembers(doc[token_index], lambda a: not inspect.isroutine(a))
for attribute in attributes:
    print(attribute)

('_', <spacy.tokens.underscore.Underscore object at 0x7f2843276310>)
('__class__', <class 'spacy.tokens.token.Token'>)
('__delattr__', <method-wrapper '__delattr__' of spacy.tokens.token.Token object at 0x7f284324c410>)
('__doc__', 'An individual token – i.e. a word, punctuation symbol, whitespace,\n    etc.\n\n    DOCS: https://spacy.io/api/token\n    ')
('__eq__', <method-wrapper '__eq__' of spacy.tokens.token.Token object at 0x7f284324c410>)
('__ge__', <method-wrapper '__ge__' of spacy.tokens.token.Token object at 0x7f284324c410>)
('__getattribute__', <method-wrapper '__getattribute__' of spacy.tokens.token.Token object at 0x7f284324c410>)
('__gt__', <method-wrapper '__gt__' of spacy.tokens.token.Token object at 0x7f284324c410>)
('__hash__', <method-wrapper '__hash__' of spacy.tokens.token.Token object at 0x7f284324c410>)
('__init__', <method-wrapper '__init__' of spacy.tokens.token.Token object at 0x7f284324c410>)
('__le__', <method-wrapper '__le__' of spacy.tokens.token.Token obje

<img src="https://upload.wikimedia.org/wikipedia/commons/thumb/8/88/SpaCy_logo.svg/1200px-SpaCy_logo.svg.png" alt="Spacy Logo" style="width: 80px;"/>  
##  Spans
When studying text, we are often interested in features that involve more than one token.  To capture them, we can create a span: a slice of the doc, such as the three tokens of "New York City".

Span [attributes](https://spacy.io/api/span#attributes)

Full documentation can be found [here](https://spacy.io/api/span#_title). 

In [None]:
from spacy.lang.en import English

text = 'I just got back from New York City.'
nlp = English()
doc = nlp(text)

nyc = doc[5:8] 

print(
    '[*] spaCy',
    nyc.start,
    nyc.end,
    doc[nyc.start:nyc.end],
)
print(  
    '[*] string',
    nyc.start_char,
    nyc.end_char,
    text[nyc.start_char:nyc.end_char]
)

# Exercise: create individualized vocabulary lists 
At Haverford, we have an application called [the Bridge](https://bridge.haverford.edu/) that generates custom vocabulary lists for learning Latin and ancient Greek.  To do this, we create a list of words from texts that the student has already read and understood.  We then use the lemma of each word to compare the list of known words against words in a new text.  We can then identify which words will be new to the reader.

<img src='https://upload.wikimedia.org/wikipedia/commons/thumb/6/63/J%C3%B3kai_M%C3%B3r_litogr%C3%A1fia.jpg/220px-J%C3%B3kai_M%C3%B3r_litogr%C3%A1fia.jpg'>

For current purposes, let's use texts in Hungarian. Let's say that I'm learning Hungarian and reading Mór Jókai's *The novel of the next century* (1872).  I have just finished book one and want to know what new words I will encounter when reading book two.   

>*Note* I am using Python sets to find the difference between the two books. I could also find the union, the intersection and other set operations.  For more on this topic, there is an excellent tutorial from [Real Python](https://realpython.com/python-sets/). 
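
The set operations mentioned in the note can be sketched on a toy example. The two sets below are hypothetical stand-ins for the lemmas of book one and book two, not data taken from the texts.

```python
# Hypothetical lemma sets standing in for the two books
known = {"ház", "víz", "város", "hajó"}       # book one
upcoming = {"víz", "város", "alagút", "híd"}  # book two

new_words = upcoming.difference(known)   # only in book two
shared = upcoming.intersection(known)    # in both books
all_words = upcoming.union(known)        # in either book

print(sorted(new_words))
print(sorted(shared))
print(len(all_words))
```

Because sets ignore order and duplicates, each lemma counts once no matter how often it occurs in a text.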

In [None]:
# Here I use the requests library to get the texts from Project Gutenberg
import requests 
vol_1 = requests.get('http://www.gutenberg.org/files/55911/55911-0.txt')
vol_2 = requests.get('http://www.gutenberg.org/files/55912/55912-0.txt')    

In [None]:
from spacy.lang.hu import Hungarian 
nlp = Hungarian()
nlp.max_length = 1070000 # This is needed given the length of the text 

vol_1_doc = nlp(vol_1.text)

# Create a set of words that are not punctuation or stop words
vol_1_words = set([token.lemma_ for token in vol_1_doc if token.is_stop is False and token.is_punct is False])

vol_2_doc = nlp(vol_2.text)
vol_2_words = set([token.lemma_ for token in vol_2_doc if token.is_stop is False and token.is_punct is False])

new_words = vol_2_words.difference(vol_1_words)
len(new_words)


## Ouch, that's far too many words to learn!  Let's only count the 100 most frequent words and then create our list.

In [None]:
from spacy.tokens import Token
from collections import Counter

# Add an extension to our tokens called "count"
Token.set_extension("count", default=False, force=True)


# Calculate the number of times that a lemma appears in the text
counts = Counter([token.lemma_ for token in vol_1_doc if not token.is_punct and not token.is_stop]).most_common(100)
counts = dict(counts)

# Add the count to each token. 
vol_1_doc = nlp(vol_1.text)
for token in vol_1_doc:
    if token.lemma_ in counts.keys():
        token._.count = counts[token.lemma_]

# Repeat for the second text and find the difference 
counts = Counter([token.lemma_ for token in vol_2_doc if not token.is_punct and not token.is_stop]).most_common(100)
counts = dict(counts)

# I don't speak Hungarian, but these are clearly not words, so let's get rid of them.
# pop() with a default avoids a KeyError if an entry is not among the top 100.
for not_a_word in ['\r\n', '\r\n\r\n', '-e']:
    counts.pop(not_a_word, None)

vol_2_doc = nlp(vol_2.text)
for token in vol_2_doc:
    if token.lemma_ in counts.keys():
        token._.count = counts[token.lemma_]

# Now we find the difference between the most common words in the two texts        
set_vol1 = set([(token.lemma_, token._.count) for token in vol_1_doc if token._.count])
set_vol2 = set([(token.lemma_, token._.count) for token in vol_2_doc if token._.count])
difference = set_vol2.difference(set_vol1)
difference
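
One thing to watch for in the cell above: the sets hold `(lemma, count)` pairs, so a lemma that is frequent in both volumes but with different counts still shows up in the difference. The sketch below uses small hypothetical lemma lists to show the effect, along with one possible alternative that compares lemmas only and looks the counts up afterwards.

```python
from collections import Counter

# Hypothetical lemma lists standing in for the two volumes
vol_1_lemmas = ["ház", "ház", "víz", "város"]
vol_2_lemmas = ["ház", "víz", "víz", "alagút"]

counts_1 = Counter(vol_1_lemmas)
counts_2 = Counter(vol_2_lemmas)

# Comparing (lemma, count) pairs: "ház" and "víz" are reported as new
# because their counts differ between the volumes.
pair_difference = set(counts_2.items()) - set(counts_1.items())

# Comparing lemmas only, then attaching the volume-two counts
new_lemmas = set(counts_2) - set(counts_1)
new_with_counts = {lemma: counts_2[lemma] for lemma in new_lemmas}

print(pair_difference)
print(new_with_counts)
```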

## Bonus  
We might also want to add a dictionary definition to our vocabulary list. 

In [None]:
from IPython.display import display, HTML
                
# PyWiktionary https://pypi.org/project/pywiktionary/
from pywiktionary.wiktionary_parser_factory import WiktionaryParserFactory

parser_factory = WiktionaryParserFactory(default_language='en')
parser_factory_result = parser_factory.get_page('tesz')
display(HTML(parser_factory_result['response']['query']['pages']['264347']['extract']))

# Models 

What if we wanted to create a list of the 100 most frequent verbs or nouns in the text?  With the base Hungarian language class, `token.pos_` returns an empty string. Also take a look at our lemmas. Are those really the root forms?  The base Hungarian class simply does not know parts of speech or lemmata.  We need a model that does. 
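
A quick way to see why the base class cannot help is to look at its pipeline. The sketch below assumes `spacy.blank("hu")`, which builds the same tokenizer-only pipeline as `Hungarian()`; the sentence is adapted from the excerpt used later in this notebook.

```python
import spacy

nlp = spacy.blank("hu")  # tokenizer only, no trained components
doc = nlp("A Duna nagy áradásban van.")

print(nlp.pipe_names)  # an empty list: no tagger, parser, or entity recognizer

pos_tags = [token.pos_ for token in doc]
print(pos_tags)        # one empty string per token
```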

The officially supported spaCy models are listed at https://spacy.io/models.
At the time of writing, there are models for:
- English
- German
- French
- Spanish
- Portuguese
- Italian
- Dutch
- Greek
- Multi-language

The spaCy documentation lists the features and capabilities of each model.  Keep in mind that there can be several models for a language.  Larger models are often slower and require more memory. In exchange, the larger models are often more accurate and have more features such as word vectors, dependency parsing and other pipelines.   If you're not using the more advanced features of a large model, then you would probably be better off using something small.  As a general rule, it's best to start small and then deliberately move up as needed. 


To add a spaCy-supported model, run `python -m spacy download <name of model>`, for example `python -m spacy download en_core_web_sm`.
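
If a model has not been downloaded, `spacy.load` raises an `OSError`. The sketch below wraps that in a small helper that falls back to a blank pipeline; `load_or_blank` is a hypothetical name introduced here for illustration, not part of spaCy.

```python
import spacy

def load_or_blank(model_name, lang_code):
    """Return the named pipeline if it is installed, otherwise a blank one."""
    try:
        return spacy.load(model_name)
    except OSError:
        # spaCy raises OSError when the model package cannot be found
        print(f"{model_name} is not installed; try: python -m spacy download {model_name}")
        return spacy.blank(lang_code)

nlp = load_or_blank("en_core_web_sm", "en")
print(nlp.lang, nlp.pipe_names)
```

The fallback pipeline only tokenizes, so attributes such as `token.pos_` stay empty until the real model is installed.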


In [None]:
import spacy

#English base language object
#nlp = English()

#English small language model
nlp = spacy.load('en_core_web_sm')


doc = nlp('Be yourself; everyone else is already taken.')
print(doc.text)
for token in doc:
    print(token.text, token.pos_, token.dep_)

There is a growing community of spaCy users.  There are dozens of spaCy-based projects in the [Universe](https://spacy.io/universe) as well as user-created language models.  If you visit [awesome-hungarian-nlp](https://github.com/oroszgy/awesome-hungarian-nlp), for example, you'll find a link to a spaCy Hungarian model [here](https://github.com/oroszgy/spacy-hungarian-models).

This is a full-featured model with
- Word vectors
- Brown clusters
- Token frequencies 
- Sentencizer
- PoS Tagger
- Lemmatizer
- Dependency parser

> If you are working locally, you'll need to install the model:  
> `pip install https://github.com/oroszgy/spacy-hungarian-models/releases/download/hu_core_ud_lg-0.2.0/hu_core_ud_lg-0.2.0-py3-none-any.whl`



In [None]:
import hu_core_ud_lg

nlp = spacy.load('hu_core_ud_lg')
doc = nlp('A jövő század regénye.')

In [None]:
print('lemma: ',[token.lemma_ for token in doc])
print('pos  : ',[token.pos_ for token in doc])

In [None]:
from collections import Counter
import pandas as pd
import spacy
import hu_core_ud_lg

nlp = spacy.load('hu_core_ud_lg')

text = '''A Szentgyörgy-Dunaágon a rakpartról egy alagút visz keresztül. Általában
mindenütt vasból öntött alagútak visznek át egyik partról a másikra, mik
könnyebben és gyorsabban elkészíthetők, mint a hidak. Hidat nem
építettek az Otthon város folyamain sehol, azért, mert mikor a Duna nagy
áradásban van, a hajókat csak eléjük kötött ærodromokkal lehet víz
ellenében felvontatni, s ezeknek járását a hidjármak akadályoznák; hanem
alagút minden utcza nyilásából vezet át a túlpartra. A Szentgyörgy-parti
fensíkon van a Duna-Delta őserdeje; most pompás népkertté átalakítva,
melyet a világ minden építési ízlése szerint alkotott nyári lakok
diszítenek. Ez a «fényűzés városa».'''

doc = nlp(text)
verb_counter = Counter(token.lemma_ for token in doc if token.pos_ == 'VERB')
df = pd.DataFrame(verb_counter.most_common(20))
df.columns = ['verb', 'count']
df

Now we can create a list of the most common new verbs we'll encounter in book 2 by adding `token.pos_=='VERB'`. Note that the large model requires 6GB of memory and cannot be run on our small virtual machine.  

```python
# Now we find the difference between the most common words in the two texts        
set_vol1 = set([(token.lemma_, token._.count) for token in vol_1_doc if token._.count and token.pos_=='VERB'])
set_vol2 = set([(token.lemma_, token._.count) for token in vol_2_doc if token._.count and token.pos_=='VERB'])
difference = set_vol2.difference(set_vol1)
difference
```

**Further Reading on Parts of Speech**  

[Jonathan Reeve, Isolating Literary Style with Raymond Queneau](https://jonreeve.com/2019/09/exercises-in-style/) ([code notebook](https://gist.github.com/JonathanReeve/cacf9d874b405b621710a7436425af49))

<img src="https://upload.wikimedia.org/wikipedia/commons/thumb/8/88/SpaCy_logo.svg/1200px-SpaCy_logo.svg.png" alt="Spacy Logo" style="width: 80px;"/>  
##  Named Entity Recognition
Most of the models in spaCy have an entity recognizer.  This is similar to identifying parts of speech in the text, but greatly expands what we can automatically identify.  The types of entities and categories will vary from model to model and should be in the model's documentation. For most languages, the categories are: 

|   [named entities](https://spacy.io/api/annotation#ner-wikipedia-scheme) |   | 
|---|---|
| PER  | Named person or family  |
| ORG  | Named corporate, governmental, or other organizational entity. |
| LOC  | Name of politically or geographically defined location (cities, provinces, countries, international regions, bodies of water, mountains).  |
| MISC | Miscellaneous entities, e.g. events, nationalities, products or works of art. |

[Here is a list of the categories in the spaCy small English model](https://spacy.io/api/annotation#named-entities)

[Here is a useful web application that can be used to assess the categories available in various spaCy models](https://explosion.ai/demos/displacy-ent)


Full documentation can be found [here](https://spacy.io/usage/linguistic-features#named-entities-101).
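
Since label sets differ from model to model, the `spacy.explain` helper used earlier is a convenient way to look up what a label means. The labels below are the ones that appear in the table above and in the cells that follow; the lookup uses spaCy's built-in glossary and does not need a model.

```python
import spacy

labels = ["PERSON", "GPE", "ORG", "LOC"]

# spacy.explain returns a short description, or None for an unknown label
descriptions = {label: spacy.explain(label) for label in labels}

for label, description in descriptions.items():
    print(label, "->", description)
```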

--- 

H.G. Wells, *The Invisible Man* (1897)
<img src="https://www.slashfilm.com/wp/wp-content/images/invisible-man-cast-new.jpg" alt="invisible man photo" style="width: 600px;"/>

In [None]:
import requests 
invisible_man = requests.get('http://www.gutenberg.org/cache/epub/5230/pg5230.txt')

In [None]:
import spacy
from spacy import displacy
import en_core_web_sm

nlp = spacy.load('en_core_web_sm')
doc = nlp(invisible_man.text)
displacy.render(doc, style="ent")

In [None]:
# list of people that appear in the text 
import pandas as pd
doc = nlp(invisible_man.text)
person_list = []
for ent in doc.ents:
    if ent.label_ == 'PERSON':
        person_list.append(ent.text.replace('\r','').replace('\n',''))

df = pd.DataFrame(set(person_list)) 
df.head(10)

In [None]:
# list of places that appear in the text 
import pandas as pd
doc = nlp(invisible_man.text)
place_list = []
for ent in doc.ents:
    if ent.label_ == 'GPE':
        place_list.append(ent.text)

df = pd.DataFrame(set(place_list)) 
df.head(10)

In [None]:
import spacy
from spacy import displacy

nlp = spacy.load("en_core_web_sm")
displacy.render(next(doc.sents), style="dep")

In [None]:
# Source https://github.com/pmbaumgartner/binder-notebooks/blob/master/holy-nlp.ipynb 

actors_and_actions = []

def token_is_subject_with_action(token):
    nsubj = token.dep_ == 'nsubj'
    head_verb = token.head.pos_ == 'VERB'
    person = token.ent_type_ == 'PERSON'
    return nsubj and head_verb and person

for token in doc:
    if token_is_subject_with_action(token):
        span = doc[token.head.left_edge.i:token.head.right_edge.i+1]
        data = dict(name=token.orth_,
                    span=span.text,
                    verb=token.head.lower_,
                    log_prob=token.head.prob,
                    )
        actors_and_actions.append(data)

print(len(actors_and_actions))

In [None]:
import pandas as pd

action_df = pd.DataFrame(actors_and_actions)

print('Unique Names:', action_df['name'].nunique())

most_common = (action_df
    .groupby(['name', 'verb'])
    .size()
    .groupby(level=0, group_keys=False)
    .nlargest(1)
    .rename('Count')
    .reset_index(level=0)
    .rename(columns={
        'verb': 'Most Common'
    })
)

# exclude log prob < -20, those indicate absence in the model vocabulary
most_unique = (action_df[action_df['log_prob'] > -20]
    .groupby(['name', 'verb'])['log_prob']
    .min()
    .groupby(level=0, group_keys=False)
    .nsmallest(1)
    .rename('Log Prob.')
    .reset_index(level = 0)
    .rename(columns={
        'verb': 'Most Unique'
    })
)

# SO groupby credit
# https://stackoverflow.com/questions/27842613/pandas-groupby-sort-within-groups

In [None]:
most_common.sort_values('Count', ascending=False).head(15)


## European Literary Text Collection (ELTeC) languages 


| Language   | spaCy base class | spaCy model                                                                               | 
|------------|------------------|-------------------------------------------------------------------------------------------| 
| Croatian   | hr               | [hr_set(spacy_stanfordnlp)](https://stanfordnlp.github.io/stanfordnlp/models.html)         | 
| Czech      | cs               | [cs_cac(spacy_stanfordnlp)](https://stanfordnlp.github.io/stanfordnlp/models.html)        | 
| Dutch      | nl               | [nl_core_news_sm](https://spacy.io/models/nl#nl_core_news_sm)                             | 
| English    | en               | [en_core_web_md](https://spacy.io/models/en#en_core_web_md)                               | 
| French     | fr               | [fr_core_news_md](https://spacy.io/models/fr#fr_core_news_md)                             | 
| German     | de               | [de_core_news_md](https://spacy.io/models/de#de_core_news_md)                              | 
| Greek      | el               | [el_core_news_md](https://spacy.io/models/el#el_core_news_md)                             | 
| Hungarian  | hu               | [hu_core_ud_lg](https://github.com/oroszgy/spacy-hungarian-models)                        | 
| Italian    | it               | [it_core_news_sm](https://spacy.io/models/it#it_core_news_sm)                             | 
| Latvian    | lv               | [lv_lvtb(spacy_stanfordnlp)](https://stanfordnlp.github.io/stanfordnlp/models.html)       | 
| Norwegian  | nb               | [no_nynorsk(spacy_stanfordnlp)](https://stanfordnlp.github.io/stanfordnlp/models.html)    | 
| Polish     | pl               | [spaCy-pl](http://spacypl.sigmoidal.io/#home)                                             | 
| Portuguese | pt               | [pt_core_news_sm](https://spacy.io/models/pt#pt_core_news_sm)                             | 
| Romanian   | ro               | [ro_rrt (spacy_stanfordnlp)](https://stanfordnlp.github.io/stanfordnlp/models.html)       | 
| Russian    | ru               | [ru_syntagrus (spacy_stanfordnlp)](https://stanfordnlp.github.io/stanfordnlp/models.html) | 
| Serbian    | sr               | [sr_set (spacy_stanfordnlp)](https://stanfordnlp.github.io/stanfordnlp/models.html)       | 
| Slovenian  | sl               | [sl_ssj (spacy_stanfordnlp)](https://stanfordnlp.github.io/stanfordnlp/models.html)       | 
| Spanish    | es               | [es_core_news_md](https://spacy.io/models/es#es_core_news_md)                             | 
| Swedish    | sv               | [🤘 Lemmy](https://github.com/sorenlind/lemmy/)                                           | 


In [None]:
# Polish language model
import spacy
from spacy import displacy 

nlp = spacy.load('pl_model')
doc = nlp("Bądź sobą; wszyscy inni są już zajęci.")
displacy.render(doc, style="dep")

## Working with StanfordNLP models ![](https://pbs.twimg.com/profile_images/897182721272799232/0CplRl36_400x400.jpg)

[Documentation](https://stanfordnlp.github.io/stanfordnlp/installation_usage.html)

```
$ pip install stanfordnlp spacy-stanfordnlp

```

```python
import stanfordnlp
stanfordnlp.download('en')   # This downloads the English models for the neural pipeline


Using the default treebank "en_ewt" for language "en".
Would you like to download the models for: en_ewt now? (Y/n)
y

Default download directory: /home/ajanco/stanfordnlp_resources
Hit enter to continue or type an alternate directory.


Downloading models for: en_ewt
Download location: /home/ajanco/stanfordnlp_resources/en_ewt_models.zip
100%|██████████| 235M/235M [00:51<00:00, 4.92MB/s] 

Download complete.  Models saved to: /home/ajanco/stanfordnlp_resources/en_ewt_models.zip
Extracting models file for: en_ewt
Cleaning up...Done.
```

In [None]:
import stanfordnlp
from spacy_stanfordnlp import StanfordNLPLanguage

snlp = stanfordnlp.Pipeline(lang="en")
nlp = StanfordNLPLanguage(snlp)

doc = nlp('Be yourself; everyone else is already taken.')
for token in doc:
    print(token.text, token.lemma_, token.pos_, token.dep_)

![](https://spacy.io/architecture-bcdfffe5c0b9f221a2f6607f96ca0e4a.svg)

In [None]:
# Language object
from spacy.lang.es import Spanish
nlp = Spanish()

# Doc object
doc = nlp("La duda es uno de los nombres de la inteligencia.")

# Tokens
for token in doc:
    print(token)
    
# Spans
span = doc[0:2]
print(span)

In [None]:
import spacy 
# python -m spacy download es_core_news_sm

# Models
nlp = spacy.load("es_core_news_sm")
doc = nlp("La duda es uno de los nombres de la inteligencia.")

# Part of speech 
for token in doc:
    print(token.pos_)
    
# Entities 
for ent in doc.ents:
    print(ent.text, ent.label_)