## **Parts of speech (POS)**

Part-of-speech (POS) tags are labels assigned to each word in a sentence to indicate its grammatical category or syntactic function. POS tagging is a common early step in natural language processing (NLP) and linguistic analysis because it exposes the structure of a sentence, which later steps use to interpret its meaning. The tags below come from the Penn Treebank tagset, which NLTK's default English tagger uses. Here are some common parts of speech and their corresponding tags:

1. **Noun (NN)**: A word that represents a person, place, thing, or idea.
   - Example: cat, dog, book, happiness

2. **Verb (VB)**: A word that describes an action or occurrence.
   - Example: run, eat, sleep, write

3. **Adjective (JJ)**: A word that describes or modifies a noun.
   - Example: beautiful, happy, tall, blue

4. **Adverb (RB)**: A word that describes or modifies a verb, adjective, or another adverb, often indicating how, when, or where an action takes place.
   - Example: quickly, loudly, very, now

5. **Pronoun (PRP)**: A word that takes the place of a noun.
   - Example: he, she, it, they

6. **Preposition (IN)**: A word that shows the relationship between a noun or pronoun and other words in a sentence.
   - Example: in, on, at, under

7. **Coordinating conjunction (CC)**: A word that connects words, phrases, or clauses of equal rank.
   - Example: and, but, or, nor

8. **Interjection (UH)**: A word or phrase expressing strong emotion or surprise.
   - Example: wow, oh, hey

9. **Determiner (DT)**: A word that introduces a noun and expresses its reference in the context.
   - Example: the, a, this, those

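10. **Numeral (CD)**: A cardinal number, written as a word or in digits. Ordinals such as "first" and "second" are not tagged CD; the Penn Treebank tagset treats them as adjectives (JJ).
    - Example: one, 2, three, 100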

In a sentence, each word is assigned a POS tag to indicate its grammatical role. For example, in the sentence "The cat is sleeping," the POS tags would be "DT (The) NN (cat) VBZ (is) VBG (sleeping)." Here VBZ and VBG are finer-grained verb tags: VBZ marks a third-person singular present-tense verb and VBG a gerund or present participle. POS tagging is used in various NLP applications, including text analysis, information retrieval, and machine translation.
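
Tags such as VBZ and VBG in the example above extend a base tag (VB) with a suffix, and the same pattern holds for nouns (NNS, NNP) and adjectives (JJR, JJS). The following is a minimal sketch of collapsing fine-grained tags back into the ten categories listed in this section. `COARSE_PREFIXES` and `coarse_category` are hypothetical names introduced for illustration, not part of NLTK, and the mapping deliberately ignores tags outside this list (for example MD for modals).

```python
# Hypothetical helper: collapse fine-grained Penn Treebank tags into the
# coarse categories listed above. Illustration only, not an NLTK API.
COARSE_PREFIXES = [
    ("NN", "Noun"),         # NN, NNS, NNP, NNPS
    ("VB", "Verb"),         # VB, VBD, VBG, VBN, VBP, VBZ
    ("JJ", "Adjective"),    # JJ, JJR, JJS
    ("RB", "Adverb"),       # RB, RBR, RBS
    ("PRP", "Pronoun"),     # PRP, PRP$
    ("IN", "Preposition"),
    ("CC", "Conjunction"),
    ("UH", "Interjection"),
    ("DT", "Determiner"),
    ("CD", "Numeral"),
]


def coarse_category(tag):
    """Return the coarse category for a tag, or "Other" if none matches."""
    for prefix, name in COARSE_PREFIXES:
        if tag.startswith(prefix):
            return name
    return "Other"


# The tagged sentence from the example above.
tagged = [("The", "DT"), ("cat", "NN"), ("is", "VBZ"), ("sleeping", "VBG")]
for word, tag in tagged:
    print(word, tag, coarse_category(tag))
```

Matching on prefixes keeps the table short, at the cost of lumping distinctions such as tense and number together; whether that loss matters depends on the application.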

In [1]:
corpus ="""could not have blamed you for being the first to lose heart if I, your commander, had not shared in your exhausting marches and your perilous campaigns;
it would have been natural enough if you had done all the work merely for others to reap the reward. 
But it is not so. You and I, gentlemen, have shared the labour and shared the danger, and the rewards are for us all. 
The conquered territory belongs to you; from your ranks the governors of it are chosen; already the greater part of its treasure passes into your hands, 
and when all Asia is overrun, then indeed I will go further than the mere satisfaction of our ambitions: the utmost hopes of riches or power which each one of you cherishes will be far surpassed, 
and whoever wishes to return home will be allowed to go, either with me or without me. I will make those who stay the envy of those who return.
"""

In [2]:
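# Requires NLTK data (tokenizer, tagger, and stopwords resources);
# if a LookupError is raised, fetch the missing resource with nltk.download().
import nltk
from nltk.corpus import stopwords

# Split the corpus into sentences.
documents = nltk.sent_tokenize(corpus)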

In [3]:
documents

['could not have blamed you for being the first to lose heart if I, your commander, had not shared in your exhausting marches and your perilous campaigns;\nit would have been natural enough if you had done all the work merely for others to reap the reward.',
 'But it is not so.',
 'You and I, gentlemen, have shared the labour and shared the danger, and the rewards are for us all.',
 'The conquered territory belongs to you; from your ranks the governors of it are chosen; already the greater part of its treasure passes into your hands, \nand when all Asia is overrun, then indeed I will go further than the mere satisfaction of our ambitions: the utmost hopes of riches or power which each one of you cherishes will be far surpassed, \nand whoever wishes to return home will be allowed to go, either with me or without me.',
 'I will make those who stay the envy of those who return.']
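
The sentences above still contain the `\n` characters from the triple-quoted string, because sentence splitting leaves whitespace inside a sentence untouched. This is harmless for word tokenization, but it clutters printed output. As an optional step, here is a minimal sketch of flattening that whitespace, where `normalize_whitespace` is a hypothetical helper and `excerpts` holds two fragments copied from the output above:

```python
def normalize_whitespace(sentence):
    """Replace every run of whitespace, including newlines, with one space."""
    # str.split() with no arguments splits on any whitespace run.
    return " ".join(sentence.split())


# Fragments copied from the sentence list above, newlines included.
excerpts = [
    "your perilous campaigns;\nit would have been natural enough",
    "passes into your hands, \nand when all Asia is overrun",
]
for text in excerpts:
    print(normalize_whitespace(text))
```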

In [4]:
print("The quick brown fox jumps over the lazy dog.")
print(nltk.pos_tag("The quick brown fox jumps over the lazy dog.".split(" ")))

The quick brown fox jumps over the lazy dog.
[('The', 'DT'), ('quick', 'JJ'), ('brown', 'NN'), ('fox', 'NN'), ('jumps', 'VBZ'), ('over', 'IN'), ('the', 'DT'), ('lazy', 'JJ'), ('dog.', 'NN')]
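
In the output above, the last token is `dog.` with the period still attached, because `split(" ")` splits only on spaces. The sketch below separates punctuation with a regular expression, as a simplified stand-in for a real tokenizer such as `nltk.word_tokenize`. `simple_tokenize` is a hypothetical helper and handles far fewer cases (contractions, hyphens, abbreviations) than a real tokenizer.

```python
import re


def simple_tokenize(text):
    """Split text into word tokens and single punctuation tokens."""
    # \w+ matches a run of letters, digits, or underscores;
    # [^\w\s] matches one punctuation character at a time.
    return re.findall(r"\w+|[^\w\s]", text)


sentence = "The quick brown fox jumps over the lazy dog."
print(sentence.split(" ")[-1])         # dog.
print(simple_tokenize(sentence)[-2:])  # ['dog', '.']
```

Passing tokens with punctuation split off gives the tagger the word `dog` itself rather than the unseen string `dog.`.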


In [5]:
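# Tag each sentence after removing stopwords.
# Caveats: the stopword list is lowercase, so capitalized words such as "The" are kept,
# and removing words before tagging strips context the tagger relies on,
# which can produce wrong tags.
stop_words = set(stopwords.words('english'))  # build the set once, not once per word
for sentence in documents:
    words = nltk.word_tokenize(sentence)
    words = [word for word in words if word not in stop_words]
    tagged = nltk.pos_tag(words)
    print(tagged)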

[('could', 'MD'), ('blamed', 'VB'), ('first', 'RB'), ('lose', 'JJ'), ('heart', 'NN'), ('I', 'PRP'), (',', ','), ('commander', 'NN'), (',', ','), ('shared', 'VBD'), ('exhausting', 'VBG'), ('marches', 'NNS'), ('perilous', 'JJ'), ('campaigns', 'NNS'), (';', ':'), ('would', 'MD'), ('natural', 'JJ'), ('enough', 'RB'), ('done', 'VBN'), ('work', 'NN'), ('merely', 'RB'), ('others', 'NNS'), ('reap', 'VBP'), ('reward', 'NN'), ('.', '.')]
[('But', 'CC'), ('.', '.')]
[('You', 'PRP'), ('I', 'PRP'), (',', ','), ('gentlemen', 'NNS'), (',', ','), ('shared', 'VBD'), ('labour', 'NN'), ('shared', 'VBN'), ('danger', 'NN'), (',', ','), ('rewards', 'NNS'), ('us', 'PRP'), ('.', '.')]
[('The', 'DT'), ('conquered', 'JJ'), ('territory', 'NN'), ('belongs', 'NNS'), (';', ':'), ('ranks', 'VBZ'), ('governors', 'NNS'), ('chosen', 'VBP'), (';', ':'), ('already', 'RB'), ('greater', 'JJR'), ('part', 'NN'), ('treasure', 'NN'), ('passes', 'VBZ'), ('hands', 'NNS'), (',', ','), ('Asia', 'NNP'), ('overrun', 'UH'), (',', ','