## Introduction 

Part-of-speech (POS) tagging, also called grammatical tagging, is a foundational task in natural language processing (NLP). Each word in a text is assigned a tag that represents its grammatical role, such as noun, verb, or adjective.

`Example:`

* Input: A sequence of words (e.g., "The dog barked loudly").
* Output: A sequence of tags (e.g., "DET NOUN VERB ADV").
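
The input/output pairing above can be sketched with a toy lookup tagger. This is only an illustration: `TOY_LEXICON` and `toy_tag` are hypothetical names, the lexicon is hand-written for this one sentence, and real taggers (such as the NLTK and spaCy models used later) rely on trained statistical models rather than a fixed dictionary.

```python
# Toy lookup tagger: a hand-written lexicon, not a trained model.
TOY_LEXICON = {
    "the": "DET",
    "dog": "NOUN",
    "barked": "VERB",
    "loudly": "ADV",
}


def toy_tag(sentence, lexicon=TOY_LEXICON, unknown="X"):
    """Return (word, tag) pairs; words missing from the lexicon get `unknown`."""
    return [(word, lexicon.get(word.lower(), unknown)) for word in sentence.split()]


tagged = toy_tag("The dog barked loudly")
print(tagged)
print(" ".join(tag for _, tag in tagged))  # DET NOUN VERB ADV
```

A dictionary lookup cannot resolve words whose tag depends on context (for example "past" as a noun or an adjective), which is why the rest of this notebook uses trained taggers.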


In [3]:
# a nice and long quote

paragraph = """
    the best way to predict the future is to create it,
    and the best way to create the future is to learn from the past.
    so don't be afraid to learn from the past, and don't be afraid to create the future."""


In [None]:
import nltk
from nltk.tokenize import word_tokenize
from nltk.corpus import stopwords


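# Step 1: tokenize the paragraph into words and punctuation
# (if the tokenizer data is missing, NLTK raises a LookupError that names
# the resource to fetch with nltk.download(...))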
words = word_tokenize(paragraph)
print(words)

In [None]:

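# Step 2: load the English stop-word list
# (stored under a new name so the imported `stopwords` module is not overwritten)

stop_words = set(stopwords.words('english'))
print(stop_words)

In [None]:
# Keep only the tokens that are not stop words
filter_words = [word for word in words if word.lower() not in stop_words]
print(filter_words)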

In [None]:
# Step 3: find the POS tag of each word
# nltk.pos_tag expects a list of tokens and returns (word, tag) pairs

for word, tag in nltk.pos_tag(words):
    print(word, tag)

In [None]:
print(nltk.pos_tag("Taj Mahal is a beautiful Monument".split()))

In [2]:
# spaCy can also be used for POS tagging

import spacy
nlp = spacy.load("en_core_web_sm")
doc = nlp("Taj Mahal is a beautiful monument.")
for token in doc:
    print(f"{token.text} -> {token.pos_} ({token.tag_})")


Taj -> PROPN (NNP)
Mahal -> PROPN (NNP)
is -> AUX (VBZ)
a -> DET (DT)
beautiful -> ADJ (JJ)
monument -> NOUN (NN)
. -> PUNCT (.)
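
spaCy reports two tag levels for each token: `token.pos_` is the coarse Universal POS tag and `token.tag_` is the fine-grained tag (Penn Treebank style for the English models). A minimal sketch of how the two levels relate, using the rows printed above as hard-coded data so it runs without spaCy installed; `fine_to_coarse` is a hypothetical helper, not part of spaCy.

```python
# (text, coarse tag, fine-grained tag) triples copied from the output above
rows = [
    ("Taj", "PROPN", "NNP"),
    ("Mahal", "PROPN", "NNP"),
    ("is", "AUX", "VBZ"),
    ("a", "DET", "DT"),
    ("beautiful", "ADJ", "JJ"),
    ("monument", "NOUN", "NN"),
    (".", "PUNCT", "."),
]


def fine_to_coarse(rows):
    """Map each fine-grained tag to the set of coarse tags it appeared with."""
    mapping = {}
    for _text, coarse, fine in rows:
        mapping.setdefault(fine, set()).add(coarse)
    return mapping


for fine, coarse in fine_to_coarse(rows).items():
    print(f"{fine:>4} -> {', '.join(sorted(coarse))}")
```

The mapping only reflects what was observed in this one sentence; on a larger text a single fine-grained tag can appear with more than one coarse tag.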


In [4]:
doc = nlp(paragraph)
for token in doc:
    print(f"{token.text} -> {token.pos_} ({token.tag_})")


     -> SPACE (_SP)
the -> DET (DT)
best -> ADJ (JJS)
way -> NOUN (NN)
to -> PART (TO)
predict -> VERB (VB)
the -> DET (DT)
future -> NOUN (NN)
is -> AUX (VBZ)
to -> PART (TO)
create -> VERB (VB)
it -> PRON (PRP)
, -> PUNCT (,)

     -> SPACE (_SP)
and -> CCONJ (CC)
the -> DET (DT)
best -> ADJ (JJS)
way -> NOUN (NN)
to -> PART (TO)
create -> VERB (VB)
the -> DET (DT)
future -> NOUN (NN)
is -> AUX (VBZ)
to -> PART (TO)
learn -> VERB (VB)
from -> ADP (IN)
the -> DET (DT)
past -> NOUN (NN)
. -> PUNCT (.)

     -> SPACE (_SP)
so -> ADV (RB)
do -> AUX (VBP)
n't -> PART (RB)
be -> AUX (VB)
afraid -> ADJ (JJ)
to -> PART (TO)
learn -> VERB (VB)
from -> ADP (IN)
the -> DET (DT)
past -> NOUN (NN)
, -> PUNCT (,)
and -> CCONJ (CC)
do -> AUX (VBP)
n't -> PART (RB)
be -> AUX (VB)
afraid -> ADJ (JJ)
to -> PART (TO)
create -> VERB (VB)
the -> DET (DT)
future -> NOUN (NN)
. -> PUNCT (.)
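
Once a text is tagged, a common next step is to summarize how often each tag occurs. The sketch below counts the coarse tags for the first line of the quote, with the (token, tag) pairs copied from the output above so that it runs without spaCy; `count_tags` is a hypothetical helper name.

```python
from collections import Counter

# (token, coarse tag) pairs for the first line of the quote, copied from the output above
first_line = [
    ("the", "DET"), ("best", "ADJ"), ("way", "NOUN"), ("to", "PART"),
    ("predict", "VERB"), ("the", "DET"), ("future", "NOUN"), ("is", "AUX"),
    ("to", "PART"), ("create", "VERB"), ("it", "PRON"), (",", "PUNCT"),
]


def count_tags(pairs):
    """Count how many tokens carry each tag."""
    return Counter(tag for _token, tag in pairs)


for tag, n in count_tags(first_line).most_common():
    print(f"{tag:<6}{n}")
```

With a live spaCy `doc`, the same count could be built from `token.pos_`, skipping the `SPACE` tokens that the newlines and indentation in `paragraph` produce.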


"""
# POS Tags and Their Descriptions

- **CC**: Coordinating conjunction  
- **CD**: Cardinal number  
- **DT**: Determiner  
- **EX**: Existential "there" (e.g., "there is")  
- **FW**: Foreign word  
- **IN**: Preposition/subordinating conjunction  
- **JJ**: Adjective (e.g., "big")  
- **JJR**: Adjective, comparative (e.g., "bigger")  
- **JJS**: Adjective, superlative (e.g., "biggest")  
- **LS**: List item marker (e.g., "1)")  
- **MD**: Modal (e.g., "could", "will")  
- **NN**: Noun, singular or mass (e.g., "desk")  
- **NNS**: Noun, plural (e.g., "desks")  
- **NNP**: Proper noun, singular (e.g., "Harrison")  
- **NNPS**: Proper noun, plural (e.g., "Americans")  
- **PDT**: Predeterminer (e.g., "all" in "all the kids")  
- **POS**: Possessive ending (e.g., "'s" in "parent's")  
- **PRP**: Personal pronoun (e.g., "I", "he", "she")  
- **PRP$**: Possessive pronoun (e.g., "my", "his", "her")  
- **RB**: Adverb (e.g., "very", "silently")  
- **RBR**: Adverb, comparative (e.g., "better")  
- **RBS**: Adverb, superlative (e.g., "best")  
- **RP**: Particle (e.g., "up" in "give up")  
- **TO**: "to" (e.g., "to go", "to the store")  
- **UH**: Interjection (e.g., "errrrrrrrm")  
- **VB**: Verb, base form (e.g., "take")  
- **VBD**: Verb, past tense (e.g., "took")  
- **VBG**: Verb, gerund/present participle (e.g., "taking")  
- **VBN**: Verb, past participle (e.g., "taken")  
- **VBP**: Verb, non-3rd person singular present (e.g., "take")  
- **VBZ**: Verb, 3rd person singular present (e.g., "takes")  
- **WDT**: Wh-determiner (e.g., "which")  
- **WP**: Wh-pronoun (e.g., "who", "what")  
- **WP$**: Possessive wh-pronoun (e.g., "whose")  
- **WRB**: Wh-adverb (e.g., "where", "when") 
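
The tag list above can also be kept as a lookup table so that tagger output is readable without memorizing the abbreviations. A minimal sketch, assuming only a handful of the tags are needed; `TAG_DESCRIPTIONS` and `describe` are hypothetical names, and the sample pairs are the fine-grained tags from the "Taj Mahal" spaCy output earlier in this notebook. spaCy ships a comparable lookup as `spacy.explain`.

```python
# A small subset of the tag list above; extend it as needed.
TAG_DESCRIPTIONS = {
    "DT": "Determiner",
    "JJ": "Adjective",
    "NN": "Noun, singular or mass",
    "NNP": "Proper noun, singular",
    "VBZ": "Verb, 3rd person singular present",
}


def describe(tagged_words, descriptions=TAG_DESCRIPTIONS):
    """Attach a human-readable description to each (word, tag) pair."""
    return [
        (word, tag, descriptions.get(tag, "unknown tag"))
        for word, tag in tagged_words
    ]


sample = [("Taj", "NNP"), ("Mahal", "NNP"), ("is", "VBZ"),
          ("a", "DT"), ("beautiful", "JJ"), ("monument", "NN"), (".", ".")]

for word, tag, meaning in describe(sample):
    print(f"{word:<10}{tag:<5}{meaning}")
```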