## <font style="color:blue "> Quick Start with POS tagging</font>

In [1]:
import spacy
nlp = spacy.load('en_core_web_sm')

In [2]:
## create a Doc; the pipeline assigns POS tags while processing the text
doc = nlp(u"Tesla has become one the most famous car brand in the world")

In [3]:
## So what are tokens? Let's keep it simple.
## A token is a unit obtained by splitting a larger text: sentences from a paragraph,
## words from a sentence, or even characters from a word. In spaCy, tokens are mostly
## words and punctuation marks.
## Let's tokenize our doc.
list_of_token = []  ## empty list to store tokens
for token in doc:
    list_of_token.append(token)
list_of_token

[Tesla, has, become, one, the, most, famous, car, brand, in, the, world]

* POS basics

<table>
<tr><td>token.text</td><td>Original text of the token</td></tr>
<tr><td>token.pos_</td><td>Coarse-grained POS tag (e.g. NOUN, VERB)</td></tr>
<tr><td>token.tag_</td><td>Fine-grained tag (e.g. NNP, VBZ)</td></tr>
<tr><td>spacy.explain(tag)</td><td>Human-readable description of a tag</td></tr>
<tr><td>doc.count_by(spacy.attrs.POS)</td><td>Count tokens per POS tag, keyed by integer ID</td></tr>
<tr><td>counts.items()</td><td>Iterate over (POS ID, count) pairs</td></tr>
</table>

In [4]:
for token in doc:
    print(f'{token.text:{10}} {token.pos_:{8}} {token.tag_:{6}} {spacy.explain(token.tag_)}')

Tesla      PROPN    NNP    noun, proper singular
has        AUX      VBZ    verb, 3rd person singular present
become     VERB     VBN    verb, past participle
one        NUM      CD     cardinal number
the        DET      DT     determiner
most       ADV      RBS    adverb, superlative
famous     ADJ      JJ     adjective (English), other noun-modifier (Chinese)
car        NOUN     NN     noun, singular or mass
brand      NOUN     NN     noun, singular or mass
in         ADP      IN     conjunction, subordinating or preposition
the        DET      DT     determiner
world      NOUN     NN     noun, singular or mass
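
The output above gives each token a coarse `pos_` and a fine `tag_`. As a minimal sketch of how these pairs can be used, the snippet below groups the words by their coarse POS. The rows are copied by hand from the output above into plain tuples (an assumption made so the example runs without loading a model); `rows` and `words_by_pos` are illustrative names, not part of spaCy.

```python
from collections import defaultdict

# (text, pos_, tag_) rows copied from the printed output above.
rows = [
    ("Tesla", "PROPN", "NNP"), ("has", "AUX", "VBZ"), ("become", "VERB", "VBN"),
    ("one", "NUM", "CD"), ("the", "DET", "DT"), ("most", "ADV", "RBS"),
    ("famous", "ADJ", "JJ"), ("car", "NOUN", "NN"), ("brand", "NOUN", "NN"),
    ("in", "ADP", "IN"), ("the", "DET", "DT"), ("world", "NOUN", "NN"),
]

# Collect the words that share the same coarse POS tag, in sentence order.
words_by_pos = defaultdict(list)
for text, pos, _tag in rows:
    words_by_pos[pos].append(text)

for pos, words in words_by_pos.items():
    print(f"{pos:<6} {', '.join(words)}")
```

With a live pipeline, the same grouping should work by iterating over `doc` and reading `token.text` and `token.pos_` instead of the hard-coded tuples.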


* `All available POS tags in spaCy`

<table><tr><th>POS</th><th>DESCRIPTION</th></tr>
<tr><td>ADJ</td><td>adjective</td></tr>
<tr><td>ADP</td><td>adposition</td></tr>
<tr><td>ADV</td><td>adverb</td></tr>
<tr><td>AUX</td><td>auxiliary</td></tr>
<tr><td>CONJ</td><td>conjunction</td></tr>
<tr><td>CCONJ</td><td>coordinating conjunction</td></tr>
<tr><td>DET</td><td>determiner</td></tr>
<tr><td>INTJ</td><td>interjection</td></tr>
<tr><td>NOUN</td><td>noun</td></tr>
<tr><td>NUM</td><td>numeral</td></tr>
<tr><td>PART</td><td>particle</td></tr>
<tr><td>PRON</td><td>pronoun</td></tr>
<tr><td>PROPN</td><td>proper noun</td></tr>
<tr><td>PUNCT</td><td>punctuation</td></tr>
<tr><td>SCONJ</td><td>subordinating conjunction</td></tr>
<tr><td>SYM</td><td>symbol</td></tr>
<tr><td>VERB</td><td>verb</td></tr>
<tr><td>X</td><td>other</td></tr>
<tr><td>SPACE</td><td>space</td></tr>
</table>

* `POS Tag counting`

In [6]:
counts = doc.count_by(spacy.attrs.POS)
counts

{96: 1, 87: 1, 100: 1, 93: 1, 90: 2, 86: 1, 84: 1, 92: 3, 85: 1}

In [7]:
type(counts)

dict

- Since `counts` is a Python dict keyed by integer POS IDs, let's make it more readable by looking each key up in `doc.vocab` and printing the POS name as a string

In [10]:
for k,v in counts.items():
    print(doc.vocab[k].text,v)

PROPN 1
AUX 1
VERB 1
NUM 1
DET 2
ADV 1
ADJ 1
NOUN 3
ADP 1
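
For comparison, the same tally can be sketched with the standard library alone. This is not how `doc.count_by` works internally; it is just an equivalent count over the POS strings, which are copied by hand from the tagging output earlier in this notebook (an assumption made so the snippet runs without a model). `pos_tags` and `pos_counts` are illustrative names.

```python
from collections import Counter

# Coarse POS tags in sentence order, copied from the tagging output above.
pos_tags = ["PROPN", "AUX", "VERB", "NUM", "DET", "ADV",
            "ADJ", "NOUN", "NOUN", "ADP", "DET", "NOUN"]

# Counter maps each tag name directly to its number of occurrences,
# so no lookup through doc.vocab is needed afterwards.
pos_counts = Counter(pos_tags)

for pos, n in pos_counts.items():
    print(pos, n)
```

With a live `doc`, the list would come from `[token.pos_ for token in doc]`. The trade-off is that `Counter` works on strings, while `doc.count_by` stays on integer IDs.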


* `POS tag counts sorted by tag ID`

In [12]:
for k,v in sorted(counts.items()):
    print(f'{k}. {doc.vocab[k].text:{5}}: {v}')

84. ADJ  : 1
85. ADP  : 1
86. ADV  : 1
87. AUX  : 1
90. DET  : 2
92. NOUN : 3
93. NUM  : 1
96. PROPN: 1
100. VERB : 1
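
The loop above orders the rows by integer POS ID. To rank the tags by how often they occur instead, one option is to sort on the count. The sketch below reuses the `counts` dict exactly as printed earlier; `id_to_name` is a hypothetical lookup table filled in by hand from the output above, standing in for `doc.vocab[k].text` so the snippet runs without a model.

```python
# The dict printed by doc.count_by(spacy.attrs.POS) earlier in this notebook.
counts = {96: 1, 87: 1, 100: 1, 93: 1, 90: 2, 86: 1, 84: 1, 92: 3, 85: 1}

# Hand-written stand-in for doc.vocab[k].text, taken from the output above.
id_to_name = {84: "ADJ", 85: "ADP", 86: "ADV", 87: "AUX", 90: "DET",
              92: "NOUN", 93: "NUM", 96: "PROPN", 100: "VERB"}

# Sort by descending count; ties are broken by ascending POS ID.
ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))

for pos_id, n in ranked:
    print(f"{id_to_name[pos_id]:<5} {n}")
```

Negating the count in the sort key keeps the tie-break on the ID ascending, which `reverse=True` would not.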


## <font style="color:blue "> POS tagging Visualization</font>

In [13]:
from spacy import displacy  ## spaCy's built-in visualizer; style='dep' draws the dependency parse with each token's POS tag
displacy.render(doc, style='dep', jupyter=True, options={'distance': 110})