***Importing Library***

In [1]:
import spacy

***POS Tags***

In [2]:
nlp = spacy.load("en_core_web_sm")

In [5]:
doc = nlp("I am driving a car for four hours. I feels tired now")

# token.pos is the integer ID of the coarse-grained POS tag
for token in doc:
    print(token, " | ", token.pos)

I  |  95
am  |  87
driving  |  100
a  |  90
car  |  92
for  |  85
four  |  93
hours  |  92
.  |  97
I  |  95
feels  |  100
tired  |  84
now  |  86


In [6]:
doc = nlp("I am driving a car for four hours. I feels tired now")

# token.pos_ (trailing underscore) is the readable label for the same tag
for token in doc:
    print(token, " | ", token.pos_)

I  |  PRON
am  |  AUX
driving  |  VERB
a  |  DET
car  |  NOUN
for  |  ADP
four  |  NUM
hours  |  NOUN
.  |  PUNCT
I  |  PRON
feels  |  VERB
tired  |  ADJ
now  |  ADV


In [8]:
doc = nlp("I am driving a car for four hours. I feels tired now")

# spacy.explain() returns a short description of a tag label
for token in doc:
    print(token, " | ", token.pos_, " | ", spacy.explain(token.pos_))

I  |  PRON  |  pronoun
am  |  AUX  |  auxiliary
driving  |  VERB  |  verb
a  |  DET  |  determiner
car  |  NOUN  |  noun
for  |  ADP  |  adposition
four  |  NUM  |  numeral
hours  |  NOUN  |  noun
.  |  PUNCT  |  punctuation
I  |  PRON  |  pronoun
feels  |  VERB  |  verb
tired  |  ADJ  |  adjective
now  |  ADV  |  adverb


In [9]:
# Components of the loaded pipeline, in the order they run
nlp.pipe_names

['tok2vec', 'tagger', 'parser', 'attribute_ruler', 'lemmatizer', 'ner']

In [10]:
doc = nlp("Wow! i was wondering that how could i watch Dark Hour")

for token in doc:
    print(token, " | ", token.pos_, " | ", spacy.explain(token.pos_))

Wow  |  INTJ  |  interjection
!  |  PUNCT  |  punctuation
i  |  PRON  |  pronoun
was  |  AUX  |  auxiliary
wondering  |  VERB  |  verb
that  |  SCONJ  |  subordinating conjunction
how  |  SCONJ  |  subordinating conjunction
could  |  AUX  |  auxiliary
i  |  PRON  |  pronoun
watch  |  VERB  |  verb
Dark  |  PROPN  |  proper noun
Hour  |  PROPN  |  proper noun


***TAGS***

In [11]:
doc = nlp("Wow! i was wondering that how could i watch Dark Hour")

# pos_ is the coarse-grained tag; tag_ is the fine-grained tag
for token in doc:
    print(token, " | ", token.pos_, " | ", spacy.explain(token.pos_),
          " | ", token.tag_, " | ", spacy.explain(token.tag_))

Wow  |  INTJ  |  interjection  |  UH  |  interjection
!  |  PUNCT  |  punctuation  |  .  |  punctuation mark, sentence closer
i  |  PRON  |  pronoun  |  PRP  |  pronoun, personal
was  |  AUX  |  auxiliary  |  VBD  |  verb, past tense
wondering  |  VERB  |  verb  |  VBG  |  verb, gerund or present participle
that  |  SCONJ  |  subordinating conjunction  |  IN  |  conjunction, subordinating or preposition
how  |  SCONJ  |  subordinating conjunction  |  WRB  |  wh-adverb
could  |  AUX  |  auxiliary  |  MD  |  verb, modal auxiliary
i  |  PRON  |  pronoun  |  PRP  |  pronoun, personal
watch  |  VERB  |  verb  |  VB  |  verb, base form
Dark  |  PROPN  |  proper noun  |  NNP  |  noun, proper singular
Hour  |  PROPN  |  proper noun  |  NNP  |  noun, proper singular
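
The output above pairs each coarse `pos_` label with a fine-grained `tag_`. As a rough sketch, those pairs can be grouped to see which fine-grained tags appear under each coarse label in this sentence. The `rows` list is copied by hand from the output above so the cell runs without a model, and `tags_by_pos` is only an illustrative name, not part of spaCy.

```python
from collections import defaultdict

# (token text, pos_, tag_) rows copied by hand from the output above
rows = [
    ("Wow", "INTJ", "UH"),
    ("!", "PUNCT", "."),
    ("i", "PRON", "PRP"),
    ("was", "AUX", "VBD"),
    ("wondering", "VERB", "VBG"),
    ("that", "SCONJ", "IN"),
    ("how", "SCONJ", "WRB"),
    ("could", "AUX", "MD"),
    ("i", "PRON", "PRP"),
    ("watch", "VERB", "VB"),
    ("Dark", "PROPN", "NNP"),
    ("Hour", "PROPN", "NNP"),
]

# Collect the set of fine-grained tags seen under each coarse label
tags_by_pos = defaultdict(set)
for _text, pos, tag in rows:
    tags_by_pos[pos].add(tag)

for pos in sorted(tags_by_pos):
    print(pos, "|", sorted(tags_by_pos[pos]))
```

With a real `doc`, the same grouping could be built by looping over `(token.pos_, token.tag_)` instead of the hand-copied rows.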


***Figuring Out Past and Present Tense***

In [12]:
doc = nlp("He quits the job")

# doc[1] is the verb; its fine-grained tag encodes the tense
print(doc[1].text, "|", doc[1].tag_, "|", spacy.explain(doc[1].tag_))

quits | VBZ | verb, 3rd person singular present


In [13]:
doc = nlp("He quit the job")

# Same sentence in the past tense: the tag changes from VBZ to VBD
print(doc[1].text, "|", doc[1].tag_, "|", spacy.explain(doc[1].tag_))

quit | VBD | verb, past tense
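
The two cells above suggest that the fine-grained tag alone is enough to tell a simple past verb from a present one. A minimal sketch of that idea follows; `tense_from_tag` is a hypothetical helper, not a spaCy function, and it only covers the finite verb tags (`VBD`, `VBZ`, `VBP`). Participles, base forms and modals (`VBN`, `VBG`, `VB`, `MD`) are deliberately left unclassified.

```python
# Penn Treebank verb tags, as reported by token.tag_
PAST_TAGS = {"VBD"}             # verb, past tense
PRESENT_TAGS = {"VBZ", "VBP"}   # 3rd person singular / non-3rd person singular present


def tense_from_tag(tag):
    """Return 'past', 'present', or None when the tag carries no simple tense."""
    if tag in PAST_TAGS:
        return "past"
    if tag in PRESENT_TAGS:
        return "present"
    return None


print("quits", "|", tense_from_tag("VBZ"))
print("quit", "|", tense_from_tag("VBD"))
```

With a parsed sentence, this would be called as `tense_from_tag(doc[1].tag_)`.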


***Removing SPACE, X and PUNCT Tokens***

In [14]:
text="""Microsoft Corp. today announced the following results for the quarter ended December 31, 2021, 
as compared to the corresponding period of last fiscal year:

·         Revenue was $51.7 billion and increased 20%
·         Operating income was $22.2 billion and increased 24%
·         Net income was $18.8 billion and increased 21%
·         Diluted earnings per share was $2.48 and increased 22%
“Digital technology is the most malleable resource at the world’s disposal to overcome constraints and 
reimagine everyday work and life,” said Satya Nadella, chairman and chief executive officer of Microsoft. 
“As tech as a percentage of global GDP continues to increase, we are innovating and investing across diverse 
and growing markets, with a common underlying technology stack and an operating model that reinforces a common 
strategy, culture, and sense of purpose.” “Solid commercial execution, represented by strong bookings growth 
driven by long-term Azure commitments, increased Microsoft Cloud revenue to $22.1 billion, up 32% year over year” 
said Amy Hood, executive vice president and chief financial officer of Microsoft."""

doc = nlp(text)

In [15]:
for token in doc:
    print(token, " | ", token.pos_, " | ", spacy.explain(token.pos_))

Microsoft  |  PROPN  |  proper noun
Corp.  |  PROPN  |  proper noun
today  |  NOUN  |  noun
announced  |  VERB  |  verb
the  |  DET  |  determiner
following  |  VERB  |  verb
results  |  NOUN  |  noun
for  |  ADP  |  adposition
the  |  DET  |  determiner
quarter  |  NOUN  |  noun
ended  |  VERB  |  verb
December  |  PROPN  |  proper noun
31  |  NUM  |  numeral
,  |  PUNCT  |  punctuation
2021  |  NUM  |  numeral
,  |  PUNCT  |  punctuation

  |  SPACE  |  space
as  |  SCONJ  |  subordinating conjunction
compared  |  VERB  |  verb
to  |  ADP  |  adposition
the  |  DET  |  determiner
corresponding  |  ADJ  |  adjective
period  |  NOUN  |  noun
of  |  ADP  |  adposition
last  |  ADJ  |  adjective
fiscal  |  ADJ  |  adjective
year  |  NOUN  |  noun
:  |  PUNCT  |  punctuation


  |  SPACE  |  space
·  |  PUNCT  |  punctuation
          |  SPACE  |  space
Revenue  |  NOUN  |  noun
was  |  AUX  |  auxiliary
$  |  SYM  |  symbol
51.7  |  NUM  |  numeral
billion  |  NUM  |  numeral
and  |  CCONJ  |  coordinating conjunction

In [18]:
for token in doc:
    if token.pos_ in ["SPACE", "X", "PUNCT"]:
        print(token, " | ", token.pos_, " | ", spacy.explain(token.pos_))

,  |  PUNCT  |  punctuation
,  |  PUNCT  |  punctuation

  |  SPACE  |  space
:  |  PUNCT  |  punctuation


  |  SPACE  |  space
·  |  PUNCT  |  punctuation
          |  SPACE  |  space

  |  SPACE  |  space
·  |  PUNCT  |  punctuation
          |  SPACE  |  space

  |  SPACE  |  space
·  |  PUNCT  |  punctuation
          |  SPACE  |  space

  |  SPACE  |  space
·  |  PUNCT  |  punctuation
          |  SPACE  |  space

  |  SPACE  |  space
“  |  PUNCT  |  punctuation

  |  SPACE  |  space
,  |  PUNCT  |  punctuation
”  |  PUNCT  |  punctuation
,  |  PUNCT  |  punctuation
.  |  PUNCT  |  punctuation

  |  SPACE  |  space
“  |  PUNCT  |  punctuation
,  |  PUNCT  |  punctuation

  |  SPACE  |  space
,  |  PUNCT  |  punctuation

  |  SPACE  |  space
,  |  PUNCT  |  punctuation
,  |  PUNCT  |  punctuation
.  |  PUNCT  |  punctuation
”  |  PUNCT  |  punctuation
“  |  PUNCT  |  punctuation
,  |  PUNCT  |  punctuation

  |  SPACE  |  space
-  |  PUNCT  |  punctuation
,  |  PUNCT  |  punctuation

In [19]:
for token in doc:
    if token.pos_ not in ["SPACE", "X", "PUNCT"]:
        print(token, " | ", token.pos_, " | ", spacy.explain(token.pos_))

Microsoft  |  PROPN  |  proper noun
Corp.  |  PROPN  |  proper noun
today  |  NOUN  |  noun
announced  |  VERB  |  verb
the  |  DET  |  determiner
following  |  VERB  |  verb
results  |  NOUN  |  noun
for  |  ADP  |  adposition
the  |  DET  |  determiner
quarter  |  NOUN  |  noun
ended  |  VERB  |  verb
December  |  PROPN  |  proper noun
31  |  NUM  |  numeral
2021  |  NUM  |  numeral
as  |  SCONJ  |  subordinating conjunction
compared  |  VERB  |  verb
to  |  ADP  |  adposition
the  |  DET  |  determiner
corresponding  |  ADJ  |  adjective
period  |  NOUN  |  noun
of  |  ADP  |  adposition
last  |  ADJ  |  adjective
fiscal  |  ADJ  |  adjective
year  |  NOUN  |  noun
Revenue  |  NOUN  |  noun
was  |  AUX  |  auxiliary
$  |  SYM  |  symbol
51.7  |  NUM  |  numeral
billion  |  NUM  |  numeral
and  |  CCONJ  |  coordinating conjunction
increased  |  VERB  |  verb
20  |  NUM  |  numeral
%  |  NOUN  |  noun
Operating  |  VERB  |  verb
income  |  NOUN  |  noun
was  |  AUX  |  auxiliary
$  |  SYM  |  symbol

In [24]:
# Keep every token that is not whitespace, unclassified (X) or punctuation
filtered_tokens = []

for token in doc:
    if token.pos_ not in ["SPACE", "X", "PUNCT"]:
        filtered_tokens.append(token)

In [25]:
filtered_tokens[:21]

[Microsoft,
 Corp.,
 today,
 announced,
 the,
 following,
 results,
 for,
 the,
 quarter,
 ended,
 December,
 31,
 2021,
 as,
 compared,
 to,
 the,
 corresponding,
 period,
 of]
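
The filtering loop above can also be written as a small reusable function with a list comprehension. This is a sketch only: `drop_noise` and `EXCLUDED_POS` are illustrative names, and the `SimpleNamespace` objects are stand-ins for spaCy tokens so the cell runs without loading a model. The function relies on nothing but the `pos_` attribute.

```python
from types import SimpleNamespace

# Coarse POS labels treated as noise in the cells above
EXCLUDED_POS = {"SPACE", "X", "PUNCT"}


def drop_noise(tokens, excluded=EXCLUDED_POS):
    """Return the tokens whose coarse POS label is not in `excluded`."""
    return [tok for tok in tokens if tok.pos_ not in excluded]


# Stand-in tokens (assumption: only .text and .pos_ are needed here)
sample = [
    SimpleNamespace(text="Microsoft", pos_="PROPN"),
    SimpleNamespace(text=",", pos_="PUNCT"),
    SimpleNamespace(text="\n", pos_="SPACE"),
    SimpleNamespace(text="Revenue", pos_="NOUN"),
]

kept = drop_noise(sample)
print([tok.text for tok in kept])
```

With the real document, `drop_noise(doc)` should return the same tokens as `filtered_tokens`.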

In [26]:
# count_by returns a dict of {POS ID: number of tokens with that POS}
count = doc.count_by(spacy.attrs.POS)
count

{96: 13,
 92: 46,
 100: 24,
 90: 9,
 85: 17,
 93: 16,
 97: 27,
 103: 16,
 98: 1,
 84: 20,
 87: 6,
 99: 5,
 89: 12,
 86: 2,
 94: 3,
 95: 2}

In [27]:
doc.vocab[96].text

'PROPN'

In [28]:
# Translate each integer POS ID back to its label through the vocab
for k, v in count.items():
    print(doc.vocab[k].text, "|", v)

PROPN | 13
NOUN | 46
VERB | 24
DET | 9
ADP | 17
NUM | 16
PUNCT | 27
SPACE | 16
SCONJ | 1
ADJ | 20
AUX | 6
SYM | 5
CCONJ | 12
ADV | 2
PART | 3
PRON | 2
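
An alternative sketch that avoids the ID-to-label lookup is to count the `pos_` strings directly with `collections.Counter`. `count_pos` is a hypothetical helper, and the `SimpleNamespace` objects are stand-ins for spaCy tokens so the cell runs on its own.

```python
from collections import Counter
from types import SimpleNamespace


def count_pos(tokens):
    """Count tokens per coarse POS label, keyed by the label string."""
    return Counter(tok.pos_ for tok in tokens)


# Stand-in tokens (assumption: only the .pos_ attribute is used)
sample = [
    SimpleNamespace(text="Revenue", pos_="NOUN"),
    SimpleNamespace(text="was", pos_="AUX"),
    SimpleNamespace(text="$", pos_="SYM"),
    SimpleNamespace(text="51.7", pos_="NUM"),
    SimpleNamespace(text="billion", pos_="NUM"),
    SimpleNamespace(text="income", pos_="NOUN"),
]

pos_counts = count_pos(sample)

# most_common() lists the labels from most to least frequent
for label, n in pos_counts.most_common():
    print(label, "|", n)
```

With the real document, `count_pos(doc)` should give the same totals as `doc.count_by(spacy.attrs.POS)`, keyed by label rather than by integer ID.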
