<a href="https://colab.research.google.com/github/Aditi-24-05/POS_tagging/blob/main/POS_Tagging.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **POS Tagging**
**POS tagging is the natural language processing (NLP) task of labeling each word in a text with its part of speech, such as noun, verb, adjective, or another grammatical category.**

**By: Aditi Rawat**

# **Importing libraries required for the project**

In [None]:
import nltk
from nltk.corpus import brown
from nltk import ne_chunk
nltk.download('brown')
nltk.download('punkt')

[nltk_data] Downloading package brown to /root/nltk_data...
[nltk_data]   Unzipping corpora/brown.zip.
[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.


True

# **Training the bigram tagger and using the unigram tagger as "backoff".**

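A "backoff" strategy chains several taggers in sequence: when a more context-sensitive tagger cannot assign a tag, it falls back on a less context-sensitive one. Here the bigram tagger, which looks at the current word together with the previous tag, falls back on the unigram tagger, which looks at the current word alone.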

In [None]:
brown_tagged_sents = brown.tagged_sents()
bigram_tagger = nltk.BigramTagger(brown_tagged_sents, backoff=nltk.UnigramTagger(brown_tagged_sents))
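
To make the backoff idea concrete, here is a minimal pure-Python sketch on a toy corpus. It is a simplified illustration and not NLTK's actual implementation: the helper names (`train_unigram`, `train_bigram`, `tag_with_backoff`) and the two training sentences are hypothetical, and the bigram context is assumed to be the previous tag plus the current word.

```python
from collections import Counter, defaultdict

def train_unigram(tagged_sents):
    """Map each word to the tag it carries most often."""
    counts = defaultdict(Counter)
    for sent in tagged_sents:
        for word, tag in sent:
            counts[word][tag] += 1
    return {word: c.most_common(1)[0][0] for word, c in counts.items()}

def train_bigram(tagged_sents):
    """Map (previous tag, word) to the most frequent tag in that context."""
    counts = defaultdict(Counter)
    for sent in tagged_sents:
        prev = None  # no previous tag at the start of a sentence
        for word, tag in sent:
            counts[(prev, word)][tag] += 1
            prev = tag
    return {ctx: c.most_common(1)[0][0] for ctx, c in counts.items()}

def tag_with_backoff(words, bigram, unigram):
    """Try the bigram table first; fall back on the unigram table."""
    tagged, prev = [], None
    for word in words:
        tag = bigram.get((prev, word))
        if tag is None:              # context never seen in training
            tag = unigram.get(word)  # back off; stays None for unknown words
        tagged.append((word, tag))
        prev = tag
    return tagged

# Hypothetical toy training data using Brown-style tags.
train = [
    [("the", "AT"), ("dog", "NN"), ("barks", "VBZ")],
    [("a", "AT"), ("cat", "NN"), ("sleeps", "VBZ")],
]
unigram = train_unigram(train)
bigram = train_bigram(train)

result = tag_with_backoff(["dog", "sleeps", "the", "cat"], bigram, unigram)
print(result)
# [('dog', 'NN'), ('sleeps', 'VBZ'), ('the', 'AT'), ('cat', 'NN')]
```

In this run, "dog" at the start of the sentence and "the" after a verb were never seen in those contexts, so both tags come from the unigram table, while "sleeps" and "cat" are resolved by the bigram table.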

# **Get user input**

In [None]:
sentence = input("Enter a sentence: ")
# Tokenize the user input and tag it using the bigram tagger
words = nltk.word_tokenize(sentence)
tagged_words = bigram_tagger.tag(words)

Enter a sentence: The name of my sweet cat is Kitty 


# **Extract named entities**

In [None]:
nltk.download('maxent_ne_chunker')
nltk.download('words')
nltk.download('averaged_perceptron_tagger')

# ne_chunk works on Penn Treebank tags, so tag the words again with nltk.pos_tag
tagged = nltk.pos_tag(words)
ne_tagged = ne_chunk(tagged)

named_entities = []
for entity in ne_tagged:
    if isinstance(entity, nltk.tree.Tree) and entity.label() == 'PERSON':
        name = ' '.join([word for word, _ in entity.leaves()])
        named_entities.append(name)

[nltk_data] Downloading package maxent_ne_chunker to
[nltk_data]     /root/nltk_data...
[nltk_data]   Package maxent_ne_chunker is already up-to-date!
[nltk_data] Downloading package words to /root/nltk_data...
[nltk_data]   Package words is already up-to-date!
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     /root/nltk_data...
[nltk_data]   Package averaged_perceptron_tagger is already up-to-
[nltk_data]       date!
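
The loop above relies on the shape of the chunker's result: a tree whose children are either plain `(word, tag)` tuples or labeled subtrees such as `PERSON`. The sketch below shows the same filtering logic on a hand-built tree, so it runs without the chunker models. The tree contents and the helper name `person_names` are made up for illustration and are not real `ne_chunk` output.

```python
from nltk.tree import Tree

# Hand-built stand-in for a chunked sentence (illustrative, not real output).
chunked = Tree('S', [
    ('My', 'PRP$'),
    ('friend', 'NN'),
    Tree('PERSON', [('John', 'NNP'), ('Smith', 'NNP')]),
    ('lives', 'VBZ'),
    ('in', 'IN'),
    Tree('GPE', [('Paris', 'NNP')]),
])

def person_names(tree):
    """Collect the text of every top-level PERSON chunk."""
    names = []
    for node in tree:
        # Plain tokens are tuples; only chunks are Tree instances.
        if isinstance(node, Tree) and node.label() == 'PERSON':
            names.append(' '.join(word for word, _ in node.leaves()))
    return names

print(person_names(chunked))
# ['John Smith']
```

Note that a multi-word name comes back as a single string, and that the `GPE` chunk is skipped because only the `PERSON` label is kept.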


# **Update POS tags considering both taggers and NER**

In [None]:
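# A multi-word name such as "John Smith" is stored as one string, so match on its individual tokens
name_tokens = {token for name in named_entities for token in name.split()}
for i, (word, tag) in enumerate(tagged_words):
    if word in name_tokens:
        # 'NNP' is the Penn Treebank proper-noun tag; the Brown tagset used by the bigram tagger writes it as 'NP'
        tagged_words[i] = (word, 'NNP')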

# **Display the output**

In [None]:
for word, tag in tagged_words:
    print(f"{word}/{tag}", end=" ")

The/AT name/NN of/IN my/PP$ sweet/JJ cat/NN is/BEZ Kitty/NNP 