## Part-of-Speech (POS) Tagging

Part-of-speech (POS) tagging is the process of labeling each word in a text with its grammatical category, such as noun, verb, or adjective. In spaCy, POS tagging is a built-in pipeline step that uses a statistical model to predict the tag of each token (a word or a punctuation mark) from its context in the sentence. The tagger in a pretrained pipeline such as `en_core_web_lg` is trained on a large annotated corpus. In the English pipelines, the coarse-grained tag is derived from the fine-grained tag, and that mapping can be refined by the dependency parse when one is available; named entity recognition is a separate component and is not an input to tagging. spaCy exposes two levels of tags: `token.pos_` holds the coarse-grained tag from the Universal POS tagset, a standardized set that applies across languages, and `token.tag_` holds a fine-grained, language-specific tag (for the English models, a Penn Treebank-style tag). POS tags are commonly used as features or filters in downstream NLP tasks such as named entity recognition (NER) and text classification.

In [1]:
!python -m spacy download en_core_web_lg  --quiet

[+] Download and installation successful
You can now load the package via spacy.load('en_core_web_lg')


In [2]:
import spacy
nlp = spacy.load("en_core_web_lg")

In [3]:
doc = nlp("Elon flew to mars yesterday. He carried biryani masala with him")


In [7]:
for tok in doc:
    print(tok.text, " | ", tok.pos_,  " | ", spacy.explain(tok.pos_))

Elon  |  PROPN  |  proper noun
flew  |  VERB  |  verb
to  |  ADP  |  adposition
mars  |  PROPN  |  proper noun
yesterday  |  NOUN  |  noun
.  |  PUNCT  |  punctuation
He  |  PRON  |  pronoun
carried  |  VERB  |  verb
biryani  |  PROPN  |  proper noun
masala  |  NOUN  |  noun
with  |  ADP  |  adposition
him  |  PRON  |  pronoun


### Fine-grained tags

In [8]:
doc = nlp("Wow! Dr. Strange made 265 million $ on the very first day")

In [15]:
for tok in doc:
    print(tok.text, " | ", tok.pos_, " | " , spacy.explain(tok.pos_)," | ", tok.tag_," | ", spacy.explain(tok.tag_) )

Wow  |  INTJ  |  interjection  |  UH  |  interjection
!  |  PUNCT  |  punctuation  |  .  |  punctuation mark, sentence closer
Dr.  |  PROPN  |  proper noun  |  NNP  |  noun, proper singular
Strange  |  PROPN  |  proper noun  |  NNP  |  noun, proper singular
made  |  VERB  |  verb  |  VBD  |  verb, past tense
265  |  NUM  |  numeral  |  CD  |  cardinal number
million  |  NUM  |  numeral  |  CD  |  cardinal number
$  |  SYM  |  symbol  |  $  |  symbol, currency
on  |  ADP  |  adposition  |  IN  |  conjunction, subordinating or preposition
the  |  DET  |  determiner  |  DT  |  determiner
very  |  ADV  |  adverb  |  RB  |  adverb
first  |  ADJ  |  adjective  |  JJ  |  adjective (English), other noun-modifier (Chinese)
day  |  NOUN  |  noun  |  NN  |  noun, singular or mass


### In the sentences below, spaCy distinguishes the present and past tense of "quit"

In [18]:
doc = nlp("He quits the job")

print(doc[1].text, "|", doc[1].tag_, "|", spacy.explain(doc[1].tag_))
    

quits | VBZ | verb, 3rd person singular present


In [19]:
doc = nlp("He quit the job")

print(doc[1].text, "|", doc[1].tag_, "|", spacy.explain(doc[1].tag_))

quit | VBD | verb, past tense
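
The two cells above distinguish tense purely through the fine-grained tag: `VBZ` versus `VBD`. As a minimal sketch, the Penn Treebank verb tags can be mapped to a short description of the verb form with a plain dictionary. The names `VERB_TAG_FORMS` and `verb_form` are hypothetical helpers introduced here for illustration; they are not part of spaCy.

```python
# Hypothetical helper: map Penn Treebank verb tags (as found in token.tag_
# for English pipelines) to a short description of the verb form.
VERB_TAG_FORMS = {
    "VB": "base form",
    "VBD": "past tense",
    "VBG": "gerund or present participle",
    "VBN": "past participle",
    "VBP": "present, non-3rd person singular",
    "VBZ": "present, 3rd person singular",
}


def verb_form(tag):
    """Return a description for a verb tag, or None for any other tag."""
    return VERB_TAG_FORMS.get(tag)


# Tags taken from the two outputs above.
print(verb_form("VBZ"))  # present, 3rd person singular
print(verb_form("VBD"))  # past tense
```

With a loaded pipeline, this would be called as `verb_form(doc[1].tag_)`.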


### Removing all SPACE, PUNCT and X tokens from the text

In [23]:
text = "Natural Language Processing (NLP) is a branch of artificial intelligence (AI) that deals with the interaction between computers and humans using natural language. It involves the use of algorithms and computational techniques to analyze, understand, and generate human language. NLP is used to build systems that can perform tasks such as language translation, sentiment analysis, speech recognition, and text summarization. Some common applications of NLP include chatbots, virtual assistants, and language-enabled search engines."

In [24]:
doc = nlp(text)

In [25]:
doc

Natural Language Processing (NLP) is a branch of artificial intelligence (AI) that deals with the interaction between computers and humans using natural language. It involves the use of algorithms and computational techniques to analyze, understand, and generate human language. NLP is used to build systems that can perform tasks such as language translation, sentiment analysis, speech recognition, and text summarization. Some common applications of NLP include chatbots, virtual assistants, and language-enabled search engines.

In [31]:
for tok in doc[:100]:
    print(tok.text,"|", tok.pos_)

Natural | PROPN
Language | PROPN
Processing | PROPN
( | PUNCT
NLP | PROPN
) | PUNCT
is | AUX
a | DET
branch | NOUN
of | ADP
artificial | ADJ
intelligence | NOUN
( | PUNCT
AI | PROPN
) | PUNCT
that | PRON
deals | VERB
with | ADP
the | DET
interaction | NOUN
between | ADP
computers | NOUN
and | CCONJ
humans | NOUN
using | VERB
natural | ADJ
language | NOUN
. | PUNCT
It | PRON
involves | VERB
the | DET
use | NOUN
of | ADP
algorithms | NOUN
and | CCONJ
computational | ADJ
techniques | NOUN
to | PART
analyze | VERB
, | PUNCT
understand | VERB
, | PUNCT
and | CCONJ
generate | VERB
human | ADJ
language | NOUN
. | PUNCT
NLP | NOUN
is | AUX
used | VERB
to | PART
build | VERB
systems | NOUN
that | PRON
can | AUX
perform | VERB
tasks | NOUN
such | ADJ
as | ADP
language | NOUN
translation | NOUN
, | PUNCT
sentiment | NOUN
analysis | NOUN
, | PUNCT
speech | NOUN
recognition | NOUN
, | PUNCT
and | CCONJ
text | NOUN
summarization | NOUN
. | PUNCT
Some | DET
common | ADJ
applications | NOUN
of | ADP
N

In [35]:
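# Keep only tokens whose coarse POS tag is not whitespace (SPACE),
# punctuation (PUNCT), or "other" (X).
filtered_tok = []

for tok in doc:
    if tok.pos_ not in ["SPACE", "PUNCT", "X"]:
        filtered_tok.append(tok)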

In [36]:
filtered_tok[:20]

[Natural,
 Language,
 Processing,
 NLP,
 is,
 a,
 branch,
 of,
 artificial,
 intelligence,
 AI,
 that,
 deals,
 with,
 the,
 interaction,
 between,
 computers,
 and,
 humans]
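
The filtering loop above can also be wrapped in a small reusable function. The sketch below assumes only that each token has a `pos_` attribute, so it runs here on a stand-in `FakeToken` namedtuple rather than on real spaCy tokens. Both `FakeToken` and `drop_pos` are hypothetical names used for illustration.

```python
from collections import namedtuple

# Stand-in for a spaCy token (assumption: only .text and .pos_ are needed).
FakeToken = namedtuple("FakeToken", ["text", "pos_"])


def drop_pos(tokens, excluded=("SPACE", "PUNCT", "X")):
    """Return the tokens whose coarse POS tag is not in `excluded`."""
    excluded = set(excluded)
    return [tok for tok in tokens if tok.pos_ not in excluded]


# First six tokens and tags from the output printed earlier.
sample = [
    FakeToken("Natural", "PROPN"),
    FakeToken("Language", "PROPN"),
    FakeToken("Processing", "PROPN"),
    FakeToken("(", "PUNCT"),
    FakeToken("NLP", "PROPN"),
    FakeToken(")", "PUNCT"),
]

kept = drop_pos(sample)
print([tok.text for tok in kept])  # ['Natural', 'Language', 'Processing', 'NLP']
```

With a real pipeline, the call would be `drop_pos(doc)`; passing a different `excluded` tuple changes which tags are removed.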

In [38]:
# count_by returns a dict mapping integer POS attribute IDs to token counts
count = doc.count_by(spacy.attrs.POS)
count

{96: 5,
 97: 16,
 87: 3,
 90: 4,
 92: 28,
 85: 6,
 84: 7,
 95: 3,
 100: 11,
 89: 5,
 94: 2}

In [39]:
doc.vocab[96].text

'PROPN'

In [40]:
for k,v in count.items():
    print(doc.vocab[k].text, "|",v)

PROPN | 5
PUNCT | 16
AUX | 3
DET | 4
NOUN | 28
ADP | 6
ADJ | 7
PRON | 3
VERB | 11
CCONJ | 5
PART | 2
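
An alternative that avoids the integer-ID lookup is to count the string tags directly with `collections.Counter`. This is a sketch rather than a replacement for `count_by`: it assumes each token exposes `pos_`, and it uses a hypothetical `FakeToken` stand-in (along with a hypothetical `count_pos` helper) so that it runs without a loaded model.

```python
from collections import Counter, namedtuple

# Stand-in for a spaCy token (assumption: only .text and .pos_ are needed).
FakeToken = namedtuple("FakeToken", ["text", "pos_"])


def count_pos(tokens):
    """Count tokens per coarse POS tag, keyed by the tag's string name."""
    return Counter(tok.pos_ for tok in tokens)


# Tokens and tags of "He carried biryani masala with him",
# copied from the first output in this notebook.
sample = [
    FakeToken("He", "PRON"),
    FakeToken("carried", "VERB"),
    FakeToken("biryani", "PROPN"),
    FakeToken("masala", "NOUN"),
    FakeToken("with", "ADP"),
    FakeToken("him", "PRON"),
]

counts = count_pos(sample)
for pos, n in counts.most_common():
    print(pos, "|", n)
```

With a real pipeline, `count_pos(doc)` would produce the same totals as the `count_by` approach, already keyed by tag name.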
