## Parts of Speech Tagging using spaCy

Part-of-speech (POS) tagging is the step that follows tokenization. Once the text has been tokenized, spaCy can parse and tag a given Doc. **spaCy** ships with pre-trained statistical models. Such a model consists of binary data and is trained on enough examples to make predictions that generalize across the language. For example, a word following “the” in English is most likely a noun.


It is always challenging to find the correct **part of speech** for the following reasons:

* Enabling a machine to understand and process raw text is not easy.
* The same word can play different roles depending on its context in a sentence.
* Sometimes completely different words convey almost the same meaning.
* Even splitting text into useful word-like units can be difficult in many languages.
* While it's possible to solve some problems starting from only the raw characters, it's usually better to use linguistic knowledge to add useful information.
* **That's exactly what *spaCy* is designed to do: you put in raw text and get back a Doc object that comes with a variety of annotations.**

Reference <a href = "https://spacy.io/">spacy.io/</a>

In this section we'll cover **coarse POS tags (noun, verb, adjective)**, **fine-grained tags (plural noun, past-tense verb, superlative adjective)** and **Dependency Parsing**.

## Coarse-grained Part-of-speech Tags
Every token is assigned a POS Tag from the following list:


<table><tr><th>POS</th><th>DESCRIPTION</th><th>EXAMPLES</th></tr>
    
<tr><td>ADJ</td><td>adjective</td><td>*big, old, green, incomprehensible, first*</td></tr>
<tr><td>ADP</td><td>adposition</td><td>*in, to, during*</td></tr>
<tr><td>ADV</td><td>adverb</td><td>*very, tomorrow, down, where, there*</td></tr>
<tr><td>AUX</td><td>auxiliary</td><td>*is, has (done), will (do), should (do)*</td></tr>
<tr><td>CONJ</td><td>conjunction</td><td>*and, or, but*</td></tr>
<tr><td>CCONJ</td><td>coordinating conjunction</td><td>*and, or, but*</td></tr>
<tr><td>DET</td><td>determiner</td><td>*a, an, the*</td></tr>
<tr><td>INTJ</td><td>interjection</td><td>*psst, ouch, bravo, hello*</td></tr>
<tr><td>NOUN</td><td>noun</td><td>*girl, cat, tree, air, beauty*</td></tr>
<tr><td>NUM</td><td>numeral</td><td>*1, 2017, one, seventy-seven, IV, MMXIV*</td></tr>
<tr><td>PART</td><td>particle</td><td>*'s, not*</td></tr>
<tr><td>PRON</td><td>pronoun</td><td>*I, you, he, she, myself, themselves, somebody*</td></tr>
<tr><td>PROPN</td><td>proper noun</td><td>*Mary, John, London, NATO, HBO*</td></tr>
<tr><td>PUNCT</td><td>punctuation</td><td>*., (, ), ?*</td></tr>
<tr><td>SCONJ</td><td>subordinating conjunction</td><td>*if, while, that*</td></tr>
<tr><td>SYM</td><td>symbol</td><td>*$, %, §, ©, +, −, ×, ÷, =, :), 😝*</td></tr>
<tr><td>VERB</td><td>verb</td><td>*run, runs, running, eat, ate, eating*</td></tr>
<tr><td>X</td><td>other</td><td>*sfpksdpsxmsa*</td></tr>
<tr><td>SPACE</td><td>space</td><td></td></tr>
</table>

___
## Fine-grained Part-of-speech Tags
Tokens are subsequently given a fine-grained tag as determined by morphology:
<table>
<tr><th>POS</th><th>Description</th><th>Fine-grained Tag</th><th>Description</th><th>Morphology</th></tr>
<tr><td>ADJ</td><td>adjective</td><td>AFX</td><td>affix</td><td>Hyph=yes</td></tr>
<tr><td>ADJ</td><td></td><td>JJ</td><td>adjective</td><td>Degree=pos</td></tr>
<tr><td>ADJ</td><td></td><td>JJR</td><td>adjective, comparative</td><td>Degree=comp</td></tr>
<tr><td>ADJ</td><td></td><td>JJS</td><td>adjective, superlative</td><td>Degree=sup</td></tr>
<tr><td>ADJ</td><td></td><td>PDT</td><td>predeterminer</td><td>AdjType=pdt PronType=prn</td></tr>
<tr><td>ADJ</td><td></td><td>PRP\$</td><td>pronoun, possessive</td><td>PronType=prs Poss=yes</td></tr>
<tr><td>ADJ</td><td></td><td>WDT</td><td>wh-determiner</td><td>PronType=int rel</td></tr>
<tr><td>ADJ</td><td></td><td>WP\$</td><td>wh-pronoun, possessive</td><td>Poss=yes PronType=int rel</td></tr>
<tr><td>ADP</td><td>adposition</td><td>IN</td><td>conjunction, subordinating or preposition</td><td></td></tr>
<tr><td>ADV</td><td>adverb</td><td>EX</td><td>existential there</td><td>AdvType=ex</td></tr>
<tr><td>ADV</td><td></td><td>RB</td><td>adverb</td><td>Degree=pos</td></tr>
<tr><td>ADV</td><td></td><td>RBR</td><td>adverb, comparative</td><td>Degree=comp</td></tr>
<tr><td>ADV</td><td></td><td>RBS</td><td>adverb, superlative</td><td>Degree=sup</td></tr>
<tr><td>ADV</td><td></td><td>WRB</td><td>wh-adverb</td><td>PronType=int rel</td></tr>
<tr><td>CONJ</td><td>conjunction</td><td>CC</td><td>conjunction, coordinating</td><td>ConjType=coor</td></tr>
<tr><td>DET</td><td>determiner</td><td>DT</td><td>determiner</td><td></td></tr>
<tr><td>INTJ</td><td>interjection</td><td>UH</td><td>interjection</td><td></td></tr>
<tr><td>NOUN</td><td>noun</td><td>NN</td><td>noun, singular or mass</td><td>Number=sing</td></tr>
<tr><td>NOUN</td><td></td><td>NNS</td><td>noun, plural</td><td>Number=plur</td></tr>
<tr><td>NOUN</td><td></td><td>WP</td><td>wh-pronoun, personal</td><td>PronType=int rel</td></tr>
<tr><td>NUM</td><td>numeral</td><td>CD</td><td>cardinal number</td><td>NumType=card</td></tr>
<tr><td>PART</td><td>particle</td><td>POS</td><td>possessive ending</td><td>Poss=yes</td></tr>
<tr><td>PART</td><td></td><td>RP</td><td>adverb, particle</td><td></td></tr>
<tr><td>PART</td><td></td><td>TO</td><td>infinitival to</td><td>PartType=inf VerbForm=inf</td></tr>
<tr><td>PRON</td><td>pronoun</td><td>PRP</td><td>pronoun, personal</td><td>PronType=prs</td></tr>
<tr><td>PROPN</td><td>proper noun</td><td>NNP</td><td>noun, proper singular</td><td>NounType=prop Number=sing</td></tr>
<tr><td>PROPN</td><td></td><td>NNPS</td><td>noun, proper plural</td><td>NounType=prop Number=plur</td></tr>
<tr><td>PUNCT</td><td>punctuation</td><td>-LRB-</td><td>left round bracket</td><td>PunctType=brck PunctSide=ini</td></tr>
<tr><td>PUNCT</td><td></td><td>-RRB-</td><td>right round bracket</td><td>PunctType=brck PunctSide=fin</td></tr>
<tr><td>PUNCT</td><td></td><td>,</td><td>punctuation mark, comma</td><td>PunctType=comm</td></tr>
<tr><td>PUNCT</td><td></td><td>:</td><td>punctuation mark, colon or ellipsis</td><td></td></tr>
<tr><td>PUNCT</td><td></td><td>.</td><td>punctuation mark, sentence closer</td><td>PunctType=peri</td></tr>
<tr><td>PUNCT</td><td></td><td>''</td><td>closing quotation mark</td><td>PunctType=quot PunctSide=fin</td></tr>
<tr><td>PUNCT</td><td></td><td>""</td><td>closing quotation mark</td><td>PunctType=quot PunctSide=fin</td></tr>
<tr><td>PUNCT</td><td></td><td>``</td><td>opening quotation mark</td><td>PunctType=quot PunctSide=ini</td></tr>
<tr><td>PUNCT</td><td></td><td>HYPH</td><td>punctuation mark, hyphen</td><td>PunctType=dash</td></tr>
<tr><td>PUNCT</td><td></td><td>LS</td><td>list item marker</td><td>NumType=ord</td></tr>
<tr><td>PUNCT</td><td></td><td>NFP</td><td>superfluous punctuation</td><td></td></tr>
<tr><td>SYM</td><td>symbol</td><td>#</td><td>symbol, number sign</td><td>SymType=numbersign</td></tr>
<tr><td>SYM</td><td></td><td>\$</td><td>symbol, currency</td><td>SymType=currency</td></tr>
<tr><td>SYM</td><td></td><td>SYM</td><td>symbol</td><td></td></tr>
<tr><td>VERB</td><td>verb</td><td>BES</td><td>auxiliary "be"</td><td></td></tr>
<tr><td>VERB</td><td></td><td>HVS</td><td>forms of "have"</td><td></td></tr>
<tr><td>VERB</td><td></td><td>MD</td><td>verb, modal auxiliary</td><td>VerbType=mod</td></tr>
<tr><td>VERB</td><td></td><td>VB</td><td>verb, base form</td><td>VerbForm=inf</td></tr>
<tr><td>VERB</td><td></td><td>VBD</td><td>verb, past tense</td><td>VerbForm=fin Tense=past</td></tr>
<tr><td>VERB</td><td></td><td>VBG</td><td>verb, gerund or present participle</td><td>VerbForm=part Tense=pres Aspect=prog</td></tr>
<tr><td>VERB</td><td></td><td>VBN</td><td>verb, past participle</td><td>VerbForm=part Tense=past Aspect=perf</td></tr>
<tr><td>VERB</td><td></td><td>VBP</td><td>verb, non-3rd person singular present</td><td>VerbForm=fin Tense=pres</td></tr>
<tr><td>VERB</td><td></td><td>VBZ</td><td>verb, 3rd person singular present</td><td>VerbForm=fin Tense=pres Number=sing Person=3</td></tr>
<tr><td>X</td><td>other</td><td>ADD</td><td>email</td><td></td></tr>
<tr><td>X</td><td></td><td>FW</td><td>foreign word</td><td>Foreign=yes</td></tr>
<tr><td>X</td><td></td><td>GW</td><td>additional word in multi-word expression</td><td></td></tr>
<tr><td>X</td><td></td><td>XX</td><td>unknown</td><td></td></tr>
<tr><td>SPACE</td><td>space</td><td>_SP</td><td>space</td><td></td></tr>
<tr><td></td><td></td><td>NIL</td><td>missing tag</td><td></td></tr>
</table>

**Now let's understand POS tagging with some examples**

In [1]:
# Import spaCy and load the large Portuguese language model
import spacy
nlp = spacy.load('pt_core_news_lg')

In [2]:
# Create a simple Doc object
doc_texto = '''Defendida por agentes do mercado financeiro e uma das bandeiras da equipe econômica do governo Jair Bolsonaro, o projeto de autonomia do Banco Central (BC) deve avançar na Câmara só após a reforma tributária andar, no que depender do presidente da Casa, Rodrigo Maia (DEM-RJ). Para ele, o projeto sobre a instituição presidida por Roberto Campos Neto não é urgente no curto prazo.

"Aceito votar autonomia do Banco, aceito, é claro, votar os depósitos voluntários, mas aí temos que organizar melhor a pauta até o fim do ano. É só o governo ter boa vontade na reforma tributária", disse Maia, ao participar de evento organizado pelo Itaú. "A reforma tributária tem importância muito maior que autonomia do Banco Central", comentou.

O projeto de autonomia do BC foi aprovado na terça-feira, 3, pelo Senado e agora precisa do aval dos deputados para virar lei. O texto mantém o controle dos preços como objetivo central, mas inclui ainda duas novas metas acessórias, sem prejuízo à principal: suavizar as flutuações do nível de atividade econômica e fomentar o pleno emprego no País. O governo concordou com a redação da proposta, apesar de o BC ser historicamente contrário a ampliar o escopo da atuação.

Maia já reclamou outras vezes da falta de empenho e atuação do governo para se aprovar a medida. Na semana passada, acusou o presidente do Banco Central, Roberto Campos Neto, de ter vazado informações sobre conversa que os dois tiveram no dia da decisão do Comitê de Política Monetária (Copom), que manteve a taxa Selic em 2% ao ano.

Ao jornal O Estado de S. Paulo, Maia criticou a articulação do presidente do BC em alertar sobre os reflexos para a economia da dificuldade do Congresso em avançar com as votações da pauta de ajuste fiscal. Segundo o presidente da Câmara, Campos Neto tentou fazer uma articulação política, sem combinar, o que não seria papel dele, mas dos ministros da Economia, Paulo Guedes, e da articulação política, Luiz Eduardo Ramos.

Nesta sexta-feira, o presidente da Câmara lembrou que havia uma proposta de autonomia do BC semelhante na Câmara, mas que não foi votada, e disse que não comentou até agora sobre o tema porque não foi procurado pelo governo para falar sobre o assunto. "Se eu conseguisse conversar com alguém do governo, eu poderia te responder, mas ninguém me procura. Não vou conversar com a imprensa antes de conversar com o governo", disse.

Como o Broadcast (sistema de notícias em tempo real do Grupo Estado) mostrou na quinta-feira, deputados já se articulam na Câmara para modificar o projeto aprovado pelo Senado. O partido Novo, por exemplo, quer enxugar a proposta que recebeu aval dos senadores para reduzir os chamados acessórios que foram colocados para o Banco Central.

Sobre a reforma tributária, Maia deu sinais de que quer aprovar o projeto antes de deixar a presidência da Casa e acredita que com acordo pode fazer isso rapidamente.'''

doc = nlp(doc_texto)

## View token tags
Recall from <a href = "https://ashutoshtripathi.com/2020/04/06/guide-to-tokenization-lemmatization-stop-words-and-phrase-matching-using-spacy/">Tokenization</a> that we can obtain a particular token by its index position.

* To view the **coarse POS** tag use `token.pos_`
* To view the **fine-grained** tag use `token.tag_`
* To view the description of either type of tag use `spacy.explain(tag)`

<div class="alert alert-info">
spaCy encodes all strings to hash values to reduce memory usage and improve efficiency. To get the readable string representation of an attribute, add an underscore `_` to its name.

Note that `token.pos` and `token.tag` return integer hash values; by adding the underscores we get the text equivalent that lives in **doc.vocab**.
</div>

In [3]:
# Print the full text:
print(doc.text)

Defendida por agentes do mercado financeiro e uma das bandeiras da equipe econômica do governo Jair Bolsonaro, o projeto de autonomia do Banco Central (BC) deve avançar na Câmara só após a reforma tributária andar, no que depender do presidente da Casa, Rodrigo Maia (DEM-RJ). Para ele, o projeto sobre a instituição presidida por Roberto Campos Neto não é urgente no curto prazo.

"Aceito votar autonomia do Banco, aceito, é claro, votar os depósitos voluntários, mas aí temos que organizar melhor a pauta até o fim do ano. É só o governo ter boa vontade na reforma tributária", disse Maia, ao participar de evento organizado pelo Itaú. "A reforma tributária tem importância muito maior que autonomia do Banco Central", comentou.

O projeto de autonomia do BC foi aprovado na terça-feira, 3, pelo Senado e agora precisa do aval dos deputados para virar lei. O texto mantém o controle dos preços como objetivo central, mas inclui ainda duas novas metas acessórias, sem preju

In [4]:
# Print the fifth word and associated tags:
print(doc[4].text, doc[4].pos_, doc[4].tag_, spacy.explain(doc[4].tag_))

mercado NOUN NOUN__Gender=Masc|Number=Sing None
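
The fine-grained tag printed above is a composite string: a coarse tag, a double underscore, then `Feature=Value` pairs separated by `|`. `spacy.explain()` has no glossary entry for a string like that, which is why it returns `None` here. A minimal sketch of pulling the pieces apart, assuming the tag format seen in this output (the helper name `split_tag` is hypothetical and not part of spaCy):

```python
def split_tag(tag):
    """Split a tag like 'NOUN__Gender=Masc|Number=Sing' into (pos, features)."""
    # Everything before the first double underscore is the coarse part.
    pos, _, morph = tag.partition('__')
    features = {}
    for pair in morph.split('|'):
        if '=' in pair:
            name, value = pair.split('=', 1)
            features[name] = value
    return pos, features

pos, features = split_tag('NOUN__Gender=Masc|Number=Sing')
print(pos)       # NOUN
print(features)  # {'Gender': 'Masc', 'Number': 'Sing'}

# A tag without morphology, such as 'ADP', yields an empty feature dict.
print(split_tag('ADP'))  # ('ADP', {})
```

With a live `doc`, the same helper could be applied to `token.tag_` for each token.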


**Applying this to the entire `doc` object**

In [5]:
for token in doc:
    print(f'{token.text:{10}} {token.pos_:{10}} {token.tag_:{10}} {spacy.explain(token.tag_)}')

Defendida  VERB       VERB__Gender=Fem|Number=Sing|VerbForm=Part|Voice=Pass None
por        ADP        ADP        adposition
agentes    NOUN       NOUN__Gender=Masc|Number=Plur None
do         DET        ADP_DET__Definite=Def|Gender=Masc|Number=Sing|PronType=Art None
mercado    NOUN       NOUN__Gender=Masc|Number=Sing None
financeiro ADJ        ADJ__Gender=Masc|Number=Sing None
e          CCONJ      CCONJ      coordinating conjunction
uma        NUM        NUM__NumType=Card None
das        DET        ADP_DET__Definite=Def|Gender=Fem|Number=Plur|PronType=Art None
bandeiras  NOUN       NOUN__Gender=Fem|Number=Plur None
da         DET        ADP_DET__Definite=Def|Gender=Fem|Number=Sing|PronType=Art None
equipe     NOUN       NOUN__Gender=Fem|Number=Sing None
econômica  ADJ        ADJ__Gender=Fem|Number=Sing None
do         DET        ADP_DET__Definite=Def|Gender=Masc|Number=Sing|PronType=Art None
governo    NOUN       NOUN__Gender=Masc|Number=Sing None
Jair       PROPN      PROPN__Gender

**Note:** In the example above, the `:{10}` in each f-string field sets a minimum width of 10 characters, counted from the first character of the token, so that the columns line up. It has no other purpose. You can use any number in place of `{10}` to change the spacing.
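
A small sketch of how the width specifier behaves, using plain strings rather than spaCy tokens. It also shows why the long fine-grained tags push the last column out of alignment: the width is a minimum, so longer strings are never cut.

```python
word = 'por'
long_tag = 'NOUN__Gender=Masc|Number=Sing'

# Short strings are padded with spaces on the right up to the width.
padded = f'{word:{10}}'
print(len(padded))  # 10

# {10} nested in braces is equivalent to writing the number directly.
print(padded == f'{word:10}')  # True

# Strings longer than the width are printed in full, not truncated.
print(f'{long_tag:{10}}' == long_tag)  # True
```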

## Working with POS Tags

* In English, it is very common for the same string of characters to have different meanings, even within the same sentence.
* For this reason, morphology is important.
* **spaCy** uses machine learning algorithms to best predict the use of a token in a sentence.
* Is *"I read books on NLP"* present or past tense?
* Is *wind* a verb or a noun?

Let's look at this with the help of the examples below.

In [7]:
doc = nlp(doc_texto)
r = doc[1]

print(f'{r.text:{10}} {r.pos_:{8}} {r.tag_:{6}} {spacy.explain(r.tag_)}')

por        ADP      ADP    adposition


In [8]:
doc = nlp(doc_texto)
r = doc[1]

print(f'{r.text:{10}} {r.pos_:{8}} {r.tag_:{6}} {spacy.explain(r.tag_)}')

por        ADP      ADP    adposition


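The two cells above are identical: both inspect *por* in the Portuguese text, so they do not show a tense contrast.<br>
The contrast arises with English input: for *"I read books on NLP"* spaCy assumed that ***read*** was **present tense**, whereas for a sentence whose present-tense form would be ***I am reading a book***, spaCy assigned the past tense.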

## Counting POS Tags

The `Doc.count_by()` method accepts a specific token attribute as its argument, and returns a frequency count of the given attribute as a dictionary object. Keys in the dictionary are the integer values of the given attribute ID, and values are the frequency. Counts of zero are not included.
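
`Doc.count_by()` behaves much like a frequency counter over a single attribute. A rough pure-Python analogue with `collections.Counter`, using the coarse tags of the first fifteen tokens typed in by hand from the output printed earlier. Note the difference: spaCy returns integer IDs as keys, while this sketch uses the readable strings.

```python
from collections import Counter

# Coarse POS tags of the first fifteen tokens, copied from the printed output.
pos_tags = ['VERB', 'ADP', 'NOUN', 'DET', 'NOUN', 'ADJ', 'CCONJ', 'NUM',
            'DET', 'NOUN', 'DET', 'NOUN', 'ADJ', 'DET', 'NOUN']

counts = Counter(pos_tags)
print(counts['NOUN'])     # 5
print(counts['DET'])      # 4

# Tags that never occur get no entry, just as count_by omits zero counts.
print('INTJ' in counts)   # False
```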

In [9]:
doc = nlp(doc_texto)

# Count the frequencies of different coarse-grained POS tags:
POS_counts = doc.count_by(spacy.attrs.POS)
POS_counts

{100: 70,
 85: 47,
 92: 111,
 90: 95,
 84: 25,
 89: 12,
 93: 5,
 96: 51,
 97: 70,
 86: 22,
 95: 18,
 87: 11,
 103: 7,
 98: 20,
 99: 1}

This means the tag with key 99 appears only once, while the tag with key 92 appears 111 times in the document.
This isn't very helpful until you decode the attribute ID:

In [10]:
doc.vocab[92].text

'NOUN'

### Create a frequency list of POS tags from the entire document
Since `POS_counts` is a dictionary, we can obtain its key-value pairs with `POS_counts.items()`.<br>Sorting them gives us each tag ID and its count, in ID order.

In [11]:
for k,v in sorted(POS_counts.items()):
    print(f'{k}. {doc.vocab[k].text:{5}}: {v}')

84. ADJ  : 25
85. ADP  : 47
86. ADV  : 22
87. AUX  : 11
89. CCONJ: 12
90. DET  : 95
92. NOUN : 111
93. NUM  : 5
95. PRON : 18
96. PROPN: 51
97. PUNCT: 70
98. SCONJ: 20
99. SYM  : 1
100. VERB : 70
103. SPACE: 7


`k` holds the ID of the tag and `v` holds its frequency.
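
Sorting by ID is only one option. The sketch below ranks the same counts by frequency and adds each tag's share of all tokens. The two dictionaries are copied by hand from the outputs shown in this section so that the block runs without a model; in a live session `doc.vocab[k].text` would replace the `names` lookup.

```python
# Counts and ID-to-name pairs copied from the outputs shown in this section.
POS_counts = {100: 70, 85: 47, 92: 111, 90: 95, 84: 25, 89: 12, 93: 5, 96: 51,
              97: 70, 86: 22, 95: 18, 87: 11, 103: 7, 98: 20, 99: 1}
names = {84: 'ADJ', 85: 'ADP', 86: 'ADV', 87: 'AUX', 89: 'CCONJ', 90: 'DET',
         92: 'NOUN', 93: 'NUM', 95: 'PRON', 96: 'PROPN', 97: 'PUNCT',
         98: 'SCONJ', 99: 'SYM', 100: 'VERB', 103: 'SPACE'}

total = sum(POS_counts.values())

# Sort by count (the second element of each pair), most frequent first.
ranked = sorted(POS_counts.items(), key=lambda kv: kv[1], reverse=True)

for tag_id, count in ranked:
    print(f'{names[tag_id]:{5}} {count:>4} {count / total:.1%}')
```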

## Counting Fine-grained Tags

In [12]:
# Count the different fine-grained tags:
TAG_counts = doc.count_by(spacy.attrs.TAG)

for k,v in sorted(TAG_counts.items()):
    print(f'{k}. {doc.vocab[k].text:{4}}: {v}')

85. ADP : 40
86. ADV : 16
89. CCONJ: 12
97. PUNCT: 70
98. SCONJ: 19
99. SYM : 1
19706760756614928. DET__Definite=Def|Gender=Masc|Number=Sing|PronType=Art: 23
37506411736329516. PROPN__Gender=Fem|Number=Sing: 4
200353096500367006. VERB__Mood=Ind|Number=Plur|Person=1|Tense=Pres|VerbForm=Fin: 1
241186479898401091. PROPN__Number=Sing: 19
786860628783810627. AUX__Mood=Ind|Number=Sing|Person=1|Tense=Pres|VerbForm=Fin: 1
923825831195736815. PRON__Gender=Masc|Number=Sing|Person=3|PronType=Prs: 1
1182704070608105034. ADJ__Gender=Masc|Number=Sing: 10
1204389248178916658. ADP_PRON__Gender=Masc|Number=Sing|Person=3|PronType=Prs: 1
1244853049173095517. NUM__NumType=Card: 5
1851943826059061873. VERB__Gender=Masc|Number=Plur|VerbForm=Part|Voice=Pass: 1
1966231466788530083. VERB__Mood=Ind|Number=Sing|Person=3|Tense=Past|VerbForm=Fin: 15
2432708380621179292. VERB__Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin: 10
2557482793400667203. ADP_DET__Gender=Fem|Number=Sing|PronType=Dem: 1
2576739957777

<div class="alert alert-info">**Why did the ID numbers get so big?** In spaCy, certain text values are hardcoded into `Doc.vocab` and take up the first several hundred ID numbers. Strings like 'NOUN' and 'VERB' are used frequently by internal operations. Others, like fine-grained tags, are assigned hash values as needed.</div>
<div class="alert alert-info">**Why are there so few SPACE tags?** In spaCy, a single space between words is not a token. Only extra whitespace becomes a token, such as the blank lines that separate the paragraphs of this text.</div>
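
The split between small and very large IDs can be pictured with a toy lookup: a short table of reserved IDs for common strings, and a 64-bit hash for everything else. This is only an illustration of the idea, not spaCy's implementation. The function name `toy_id` is hypothetical, the reserved values are copied from the POS output in this section, and `hashlib.blake2b` is a stand-in for whatever hash function spaCy uses internally.

```python
import hashlib

# A few reserved IDs copied from the POS output in this section.
RESERVED = {'ADJ': 84, 'ADP': 85, 'NOUN': 92, 'VERB': 100}

def toy_id(string):
    """Return a small reserved ID if one exists, else a 64-bit hash."""
    if string in RESERVED:
        return RESERVED[string]
    digest = hashlib.blake2b(string.encode('utf8'), digest_size=8).digest()
    return int.from_bytes(digest, 'big')

print(toy_id('NOUN'))  # 92

# A fine-grained tag is not in the reserved table, so it gets a hash value.
big = toy_id('NOUN__Gender=Masc|Number=Sing')
print(big)

# The same string always maps to the same ID.
print(big == toy_id('NOUN__Gender=Masc|Number=Sing'))  # True
```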

___
## Fine-grained POS Tag Examples
These are some grammatical examples (shown in **bold**) of specific fine-grained tags. We've removed punctuation and rarely used tags:
<table>
<tr><th>POS</th><th>TAG</th><th>DESCRIPTION</th><th>EXAMPLE</th></tr>
<tr><td>ADJ</td><td>AFX</td><td>affix</td><td>The Flintstones were a **pre**-historic family.</td></tr>
<tr><td>ADJ</td><td>JJ</td><td>adjective</td><td>This is a **good** sentence.</td></tr>
<tr><td>ADJ</td><td>JJR</td><td>adjective, comparative</td><td>This is a **better** sentence.</td></tr>
<tr><td>ADJ</td><td>JJS</td><td>adjective, superlative</td><td>This is the **best** sentence.</td></tr>
<tr><td>ADJ</td><td>PDT</td><td>predeterminer</td><td>Waking up is **half** the battle.</td></tr>
<tr><td>ADJ</td><td>PRP\$</td><td>pronoun, possessive</td><td>**His** arm hurts.</td></tr>
<tr><td>ADJ</td><td>WDT</td><td>wh-determiner</td><td>It's blue, **which** is odd.</td></tr>
<tr><td>ADJ</td><td>WP\$</td><td>wh-pronoun, possessive</td><td>We don't know **whose** it is.</td></tr>
<tr><td>ADP</td><td>IN</td><td>conjunction, subordinating or preposition</td><td>It arrived **in** a box.</td></tr>
<tr><td>ADV</td><td>EX</td><td>existential there</td><td>**There** is cake.</td></tr>
<tr><td>ADV</td><td>RB</td><td>adverb</td><td>He ran **quickly**.</td></tr>
<tr><td>ADV</td><td>RBR</td><td>adverb, comparative</td><td>He ran **quicker**.</td></tr>
<tr><td>ADV</td><td>RBS</td><td>adverb, superlative</td><td>He ran **fastest**.</td></tr>
<tr><td>ADV</td><td>WRB</td><td>wh-adverb</td><td>**When** was that?</td></tr>
<tr><td>CONJ</td><td>CC</td><td>conjunction, coordinating</td><td>The balloon popped **and** everyone jumped.</td></tr>
<tr><td>DET</td><td>DT</td><td>determiner</td><td>**This** is **a** sentence.</td></tr>
<tr><td>INTJ</td><td>UH</td><td>interjection</td><td>**Um**, I don't know.</td></tr>
<tr><td>NOUN</td><td>NN</td><td>noun, singular or mass</td><td>This is a **sentence**.</td></tr>
<tr><td>NOUN</td><td>NNS</td><td>noun, plural</td><td>These are **words**.</td></tr>
<tr><td>NOUN</td><td>WP</td><td>wh-pronoun, personal</td><td>**Who** was that?</td></tr>
<tr><td>NUM</td><td>CD</td><td>cardinal number</td><td>I want **three** things.</td></tr>
<tr><td>PART</td><td>POS</td><td>possessive ending</td><td>Fred**'s** name is short.</td></tr>
<tr><td>PART</td><td>RP</td><td>adverb, particle</td><td>Put it **back**!</td></tr>
<tr><td>PART</td><td>TO</td><td>infinitival to</td><td>I want **to** go.</td></tr>
<tr><td>PRON</td><td>PRP</td><td>pronoun, personal</td><td>**I** want **you** to go.</td></tr>
<tr><td>PROPN</td><td>NNP</td><td>noun, proper singular</td><td>**Kilroy** was here.</td></tr>
<tr><td>PROPN</td><td>NNPS</td><td>noun, proper plural</td><td>The **Flintstones** were a pre-historic family.</td></tr>
<tr><td>VERB</td><td>MD</td><td>verb, modal auxiliary</td><td>This **could** work.</td></tr>
<tr><td>VERB</td><td>VB</td><td>verb, base form</td><td>I want to **go**.</td></tr>
<tr><td>VERB</td><td>VBD</td><td>verb, past tense</td><td>This **was** a sentence.</td></tr>
<tr><td>VERB</td><td>VBG</td><td>verb, gerund or present participle</td><td>I am **going**.</td></tr>
<tr><td>VERB</td><td>VBN</td><td>verb, past participle</td><td>The treasure was **lost**.</td></tr>
<tr><td>VERB</td><td>VBP</td><td>verb, non-3rd person singular present</td><td>I **want** to go.</td></tr>
<tr><td>VERB</td><td>VBZ</td><td>verb, 3rd person singular present</td><td>He **wants** to go.</td></tr>
</table>

## Dependency Parsing

* Dependency parsing is the process of extracting the dependencies of a sentence to represent its grammatical structure. 
* It defines the dependency relationship between headwords and their dependents. 
* The head of a sentence has no dependency and is called the root of the sentence. 
* The verb is usually the head of the sentence. All other words are linked to the headword.

The dependencies can be mapped in a directed graph representation:

* Words are the nodes.
* The grammatical relationships are the edges.
* Dependency parsing helps you know what role a word plays in the text and how different words relate to each other. 
* It's also used in shallow parsing and named entity recognition.
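
The graph view can be sketched in plain Python. The three-word sentence and its analysis below are written by hand for illustration and are not parser output. Each word stores the index of its head, and the root points to itself, which mirrors how spaCy reports a root token's head.

```python
# Hand-written analysis of "She reads books" (illustrative, not parser output).
# Each entry: (word, index of its head, dependency label).
tokens = [('She', 1, 'nsubj'), ('reads', 1, 'ROOT'), ('books', 1, 'obj')]

# The root is the one word labelled ROOT.
root = next(word for word, head, dep in tokens if dep == 'ROOT')

# Collect the edges: head word -> list of (dependent word, relation).
children = {}
for i, (word, head, dep) in enumerate(tokens):
    if head != i:  # skip the root, which is its own head
        children.setdefault(tokens[head][0], []).append((word, dep))

print(root)      # reads
print(children)  # {'reads': [('She', 'nsubj'), ('books', 'obj')]}
```

With a parsed `doc`, `token.head` and `token.dep_` supply the same information for every token.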

In [13]:
# Count the different dependencies:
DEP_counts = doc.count_by(spacy.attrs.DEP)

for k,v in sorted(DEP_counts.items()):
    print(f'{k}. {doc.vocab[k].text:{4}}: {v}')

0.     : 7
399. advcl: 19
400. advmod: 19
402. amod: 24
403. appos: 10
405. aux : 3
407. cc  : 12
408. ccomp: 3
410. conj: 12
411. cop : 5
415. det : 45
417. expl: 1
422. iobj: 1
423. mark: 24
426. nmod: 51
429. nsubj: 26
434. obj : 38
435. obl : 21
436. parataxis: 5
445. punct: 68
450. xcomp: 10
451. acl : 7
341650274070182569. fixed: 5
6025759613719693942. acl:relcl: 5
7833439085008721140. nsubj:pass: 3
8110129090154140942. case: 90
8206900633647566924. ROOT: 22
11989517400186072643. obl:agent: 6
12837356684637874264. nummod: 2
17772752594865228322. flat:name: 17
18127512502257733094. aux:pass: 4


Here we've shown `spacy.attrs.POS`, `spacy.attrs.TAG` and `spacy.attrs.DEP`.<br>

## Visualizing Parts of Speech
spaCy offers an outstanding visualizer called **displaCy**:

In [14]:
# Import the displaCy library
from spacy import displacy

In [17]:
# Split the Doc into sentences and view the first one
sentences = list(doc.sents)
sentences[0]

Defendida por agentes do mercado financeiro e uma das bandeiras da equipe econômica do governo Jair Bolsonaro, o projeto de autonomia do Banco Central (BC) deve avançar na Câmara só após a reforma tributária andar, no que depender do presidente da Casa, Rodrigo Maia (DEM-RJ).

In [18]:
# Render the dependency parse immediately inside Jupyter:
displacy.render(sentences[0], style='dep', jupyter=True, options={'distance': 95})

The dependency parse shows the coarse POS tag for each token, as well as the **dependency tag** if given:

In [19]:
for token in doc:
    print(f'{token.text:{10}} {token.pos_:{7}} {token.dep_:{7}} {spacy.explain(token.dep_)}')

Defendida  VERB    acl     clausal modifier of noun (adjectival clause)
por        ADP     case    case marking
agentes    NOUN    obl:agent None
do         DET     case    case marking
mercado    NOUN    nmod    modifier of nominal
financeiro ADJ     amod    adjectival modifier
e          CCONJ   cc      coordinating conjunction
uma        NUM     conj    conjunct
das        DET     case    case marking
bandeiras  NOUN    nmod    modifier of nominal
da         DET     case    case marking
equipe     NOUN    nmod    modifier of nominal
econômica  ADJ     amod    adjectival modifier
do         DET     case    case marking
governo    NOUN    nmod    modifier of nominal
Jair       PROPN   appos   appositional modifier
Bolsonaro  PROPN   flat:name None
,          PUNCT   punct   punctuation
o          DET     det     determiner
projeto    NOUN    nsubj   nominal subject
de         ADP     case    case marking
autonomia  NOUN    nmod    modifier of nominal
do         DET     case    case m

___
## Handling Large Text
`displacy.render()`, like `displacy.serve()`, accepts a single Doc or a list of Doc objects. Since large texts are difficult to view in one line, you may want to pass a list of spans instead. Each span will appear on its own line:

In [21]:
doc2 = nlp(doc_texto)

# Create spans from Doc.sents:
spans = list(doc2.sents)

displacy.render(spans, style='dep',jupyter=True, options={'distance': 110})

___
## Customizing the Appearance
Besides setting the distance between tokens, you can pass other arguments to the `options` parameter:

<table>
<tr><th>NAME</th><th>TYPE</th><th>DESCRIPTION</th><th>DEFAULT</th></tr>
<tr><td>`compact`</td><td>bool</td><td>"Compact mode" with square arrows that takes up less space.</td><td>`False`</td></tr>
<tr><td>`color`</td><td>unicode</td><td>Text color (HEX, RGB or color names).</td><td>`#000000`</td></tr>
<tr><td>`bg`</td><td>unicode</td><td>Background color (HEX, RGB or color names).</td><td>`#ffffff`</td></tr>
<tr><td>`font`</td><td>unicode</td><td>Font name or font family for all text.</td><td>`Arial`</td></tr>
</table>

For a full list of options visit https://spacy.io/api/top-level#displacy_options

In [22]:
options = {'distance': 90, 'compact': True, 'color': 'black', 'bg': '#09a3e0', 'font': 'Times'}

displacy.render(doc, style='dep',jupyter=True, options=options)

References:

* https://spacy.io/
* https://www.udemy.com/course/nlp-natural-language-processing-with-python