In [1]:
import spacy

In [2]:
nlp = spacy.load("en_core_web_sm")

In [6]:
nlp.pipe_names

['tok2vec', 'tagger', 'parser', 'attribute_ruler', 'lemmatizer', 'ner']

# **POS Tags**

In [5]:
doc = nlp("Wow! Dr. Strange made 265 million $ on the very first day")

for token in doc:
  print(token, '|', token.pos_, '|', spacy.explain(token.pos_))

Wow | INTJ | interjection
! | PUNCT | punctuation
Dr. | PROPN | proper noun
Strange | PROPN | proper noun
made | VERB | verb
265 | NUM | numeral
million | NUM | numeral
$ | NUM | numeral
on | ADP | adposition
the | DET | determiner
very | ADV | adverb
first | ADJ | adjective
day | NOUN | noun
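The loop above can be wrapped in a small reusable helper. This is a minimal sketch: `describe_pos` is a hypothetical name, and the `Doc` is hand-annotated with the labels from the output above (built on `spacy.blank("en")`) so it runs without downloading a model. With a trained pipeline you would pass `nlp("...")` instead. `spacy.explain` returns `None` for labels it has no glossary entry for, so the helper substitutes a placeholder.

```python
import spacy
from spacy.tokens import Doc

def describe_pos(doc):
    """Return (text, coarse POS, explanation) for every token in doc.

    spacy.explain returns None for labels missing from its glossary,
    so fall back to "n/a" instead of printing None.
    """
    return [(t.text, t.pos_, spacy.explain(t.pos_) or "n/a") for t in doc]

# Hand-annotated Doc (labels copied from the output above), so no model is needed.
nlp = spacy.blank("en")
doc = Doc(nlp.vocab,
          words=["Wow", "!", "Dr.", "Strange", "made"],
          pos=["INTJ", "PUNCT", "PROPN", "PROPN", "VERB"])

for text, pos, explanation in describe_pos(doc):
    print(text, "|", pos, "|", explanation)
```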


In [10]:
doc = nlp("Wow! Dr. Strange made 265 million $ on the very first day")

for token in doc:
  print(token, '|', token.pos_, '|', spacy.explain(token.pos_),
        '|', token.tag_, '|', spacy.explain(token.tag_))

Wow | INTJ | interjection | UH | interjection
! | PUNCT | punctuation | . | punctuation mark, sentence closer
Dr. | PROPN | proper noun | NNP | noun, proper singular
Strange | PROPN | proper noun | NNP | noun, proper singular
made | VERB | verb | VBD | verb, past tense
265 | NUM | numeral | CD | cardinal number
million | NUM | numeral | CD | cardinal number
$ | NUM | numeral | CD | cardinal number
on | ADP | adposition | IN | conjunction, subordinating or preposition
the | DET | determiner | DT | determiner
very | ADV | adverb | RB | adverb
first | ADJ | adjective | JJ | adjective (English), other noun-modifier (Chinese)
day | NOUN | noun | NN | noun, singular or mass
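The table shows that `tag_` is finer-grained than `pos_`: several fine tags can collapse into one coarse POS. A minimal sketch that groups them, assuming a hand-annotated `Doc` (the `PRP` and `CC` tags are standard Penn Treebank labels assumed for illustration; `coarse_to_fine` is a hypothetical helper name):

```python
from collections import defaultdict

import spacy
from spacy.tokens import Doc

def coarse_to_fine(doc):
    """Group the fine-grained tags (tag_) seen in doc under their coarse POS (pos_)."""
    groups = defaultdict(set)
    for token in doc:
        groups[token.pos_].add(token.tag_)
    return dict(groups)

# Hand-annotated Doc so the sketch runs without a trained model.
nlp = spacy.blank("en")
doc = Doc(nlp.vocab,
          words=["He", "quit", "and", "she", "quits"],
          tags=["PRP", "VBD", "CC", "PRP", "VBZ"],
          pos=["PRON", "VERB", "CCONJ", "PRON", "VERB"])

for pos, tags in sorted(coarse_to_fine(doc).items()):
    print(pos, "<-", sorted(tags))
```

Both verb forms end up under `VERB`; only the fine tag keeps the tense distinction.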


**spaCy uses context to tell whether "quit" is present or past tense (compare the fine-grained tags)**

In [13]:
doc = nlp("He quits the job")

print(doc[1].text, "|", doc[1].tag_, "|", spacy.explain(doc[1].tag_))

quits | VBZ | verb, 3rd person singular present


In [15]:
doc = nlp("He quit the job")

print(doc[1].text, '|', doc[1].tag_, '|', spacy.explain(doc[1].tag_))

quit | VBD | verb, past tense
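Besides the fine tag, spaCy v3 tokens carry Universal Dependencies morphological features in `token.morph`, which can be queried for `Tense` directly. In the English pipelines these features are, as far as I know, filled in from the fine tag by the `attribute_ruler`, so with `en_core_web_sm` you could call the helper on `nlp("He quit the job")[1]`. In this sketch the features are hand-annotated so it runs without a model; `tense_of` is a hypothetical helper name.

```python
import spacy
from spacy.tokens import Doc

def tense_of(token):
    """Return the UD Tense value(s) stored on token.morph, e.g. ['Past']."""
    return token.morph.get("Tense")

nlp = spacy.blank("en")
# Hand-annotated morphology (UD FEATS strings) standing in for model output.
past = Doc(nlp.vocab, words=["He", "quit", "the", "job"],
           tags=["PRP", "VBD", "DT", "NN"],
           morphs=["Case=Nom|Number=Sing|Person=3|PronType=Prs",
                   "Tense=Past|VerbForm=Fin",
                   "Definite=Def|PronType=Art",
                   "Number=Sing"])
present = Doc(nlp.vocab, words=["He", "quits", "the", "job"],
              tags=["PRP", "VBZ", "DT", "NN"],
              morphs=["Case=Nom|Number=Sing|Person=3|PronType=Prs",
                      "Number=Sing|Person=3|Tense=Pres|VerbForm=Fin",
                      "Definite=Def|PronType=Art",
                      "Number=Sing"])

for d in (past, present):
    print(d[1].text, "|", d[1].tag_, "|", tense_of(d[1]))
```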


In [19]:
earning_texts = """ Microsoft Corp. today announced the following results for the quarter ended December 31, 2021, as compared to the corresponding period of last fiscal year:

·         Revenue was $51.7 billion and increased 20%

·         Operating income was $22.2 billion and increased 24%

·         Net income was $18.8 billion and increased 21%

·         Diluted earnings per share was $2.48 and increased 22% """

In [23]:
doc = nlp(earning_texts)

filtered_tokens = []

for token in doc:
  # token.pos_ is the label string; token.pos is its integer ID and never equals a string
  if token.pos_ not in ['SPACE', 'PUNCT', 'X']:
    filtered_tokens.append(token)

In [24]:
filtered_tokens[:20]

[Microsoft,
 Corp.,
 today,
 announced,
 the,
 following,
 results,
 for,
 the,
 quarter,
 ended,
 December,
 31,
 2021,
 as,
 compared,
 to,
 the,
 corresponding,
 period]
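The filter only works when the comparison uses the string label `token.pos_`; `token.pos` is an integer symbol ID, so comparing it with `'SPACE'` is always false. A minimal sketch of the filter as a reusable function, assuming a hand-annotated fragment of the earnings text (`content_tokens` and `EXCLUDED_POS` are hypothetical names; the labels are illustrative, not model output):

```python
import spacy
from spacy.symbols import SPACE
from spacy.tokens import Doc

# Coarse POS labels to drop, compared against token.pos_ (the label string).
EXCLUDED_POS = {"SPACE", "PUNCT", "X"}

def content_tokens(doc, excluded=EXCLUDED_POS):
    """Keep tokens whose coarse POS label is not in `excluded`."""
    return [token for token in doc if token.pos_ not in excluded]

nlp = spacy.blank("en")
# Hand-annotated fragment so no model is needed.
doc = Doc(nlp.vocab,
          words=[" ", "Microsoft", "Corp.", "today", ",", "2021"],
          spaces=[False, True, True, False, True, False],
          pos=["SPACE", "PROPN", "PROPN", "NOUN", "PUNCT", "NUM"])

print([token.text for token in content_tokens(doc)])
# token.pos is an integer ID: it equals the symbol constant, not the string.
print(doc[0].pos == SPACE, doc[0].pos == "SPACE")
```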

In [26]:
count = doc.count_by(spacy.attrs.POS)
count

{103: 9,
 96: 3,
 92: 14,
 100: 10,
 90: 3,
 85: 4,
 93: 13,
 97: 7,
 98: 1,
 84: 4,
 87: 4,
 99: 4,
 89: 4}

In [27]:
doc.vocab[96].text

'PROPN'
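The lookup works because POS labels are symbols: each name has a fixed integer ID that the vocabulary's `StringStore` maps back to the name. A minimal sketch that translates the whole `count_by` result at once, assuming a hand-annotated `Doc` (`counts_by_name` is a hypothetical helper; `doc.vocab.strings[k]` is used instead of `doc.vocab[k].text`, which should give the same name without creating a lexeme):

```python
import spacy
from spacy.attrs import POS
from spacy.symbols import PROPN
from spacy.tokens import Doc

nlp = spacy.blank("en")

# A POS symbol's integer ID and its name are interchangeable via the StringStore.
print(PROPN, "->", nlp.vocab.strings[PROPN])

def counts_by_name(doc, attr=POS):
    """Doc.count_by keys counts by integer ID; translate the keys to label names."""
    return {doc.vocab.strings[k]: v for k, v in doc.count_by(attr).items()}

# Hand-annotated Doc so no model is needed.
doc = Doc(nlp.vocab, words=["Microsoft", "Corp.", "announced", "results"],
          pos=["PROPN", "PROPN", "VERB", "NOUN"])
print(counts_by_name(doc))
```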

In [29]:
for k, v in count.items():
  print(doc.vocab[k].text, '|', v)

SPACE | 9
PROPN | 3
NOUN | 14
VERB | 10
DET | 3
ADP | 4
NUM | 13
PUNCT | 7
SCONJ | 1
ADJ | 4
AUX | 4
SYM | 4
CCONJ | 4
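If you only need label names sorted by frequency, counting `token.pos_` with `collections.Counter` skips the ID-to-name step entirely. A minimal sketch, assuming one bullet of the earnings text with hand-assigned, illustrative labels (not necessarily what `en_core_web_sm` predicts); `pos_frequencies` is a hypothetical helper name:

```python
from collections import Counter

import spacy
from spacy.tokens import Doc

def pos_frequencies(doc):
    """(label, count) pairs for coarse POS labels, most frequent first."""
    return Counter(token.pos_ for token in doc).most_common()

nlp = spacy.blank("en")
# Hand-annotated Doc so no model is needed.
doc = Doc(nlp.vocab,
          words=["Revenue", "was", "$", "51.7", "billion", "and", "increased", "20", "%"],
          pos=["NOUN", "AUX", "SYM", "NUM", "NUM", "CCONJ", "VERB", "NUM", "NOUN"])

for label, n in pos_frequencies(doc):
    print(label, "|", n)
```

`most_common()` sorts by count; ties keep the order in which labels were first seen.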
