## Part-of-Speech (POS) Tagging

### 1. **Context**
**Part-of-Speech (POS) Tagging** is a core task in Natural Language Processing (NLP): each word in a sentence is labeled with its part of speech, such as noun, verb, or adjective. These word-level labels expose much of a sentence's grammatical structure and feed downstream tasks like syntactic parsing, named entity recognition, and information extraction.

In this notebook, we will explore how to perform **POS tagging** using the **NLTK (Natural Language Toolkit)** library.

---

### 2. **Install NLTK**
Before we begin, ensure that NLTK is installed on your system:

```bash
!pip install nltk
```

### 3. Download NLTK Resources
We need to download two NLTK resources: the `punkt_tab` tokenizer models and the English averaged perceptron POS tagger:

```python
import nltk

# Tokenizer models used by word_tokenize
nltk.download('punkt_tab')
# English averaged perceptron tagger used by pos_tag
nltk.download('averaged_perceptron_tagger_eng')
```

```text
[nltk_data] Downloading package punkt_tab to C:\Users\IT
[nltk_data]     Support\AppData\Roaming\nltk_data...
[nltk_data]   Package punkt_tab is already up-to-date!
[nltk_data] Downloading package averaged_perceptron_tagger_eng to
[nltk_data]     C:\Users\IT Support\AppData\Roaming\nltk_data...
[nltk_data]   Unzipping taggers\averaged_perceptron_tagger_eng.zip.

True
```

### 4. POS Tagging Example
Let's start by writing a simple example to perform POS tagging on a sentence.

```python
from nltk.tokenize import word_tokenize
from nltk import pos_tag

# Example sentence
sentence = "NLTK is a leading platform for building Python programs to work with human language data."

# Tokenize the sentence
words = word_tokenize(sentence)

# Perform POS tagging
tagged_words = pos_tag(words)

print("Tagged Words:", tagged_words)
```

```text
Tagged Words: [('NLTK', 'NNP'), ('is', 'VBZ'), ('a', 'DT'), ('leading', 'VBG'), ('platform', 'NN'), ('for', 'IN'), ('building', 'VBG'), ('Python', 'NNP'), ('programs', 'NNS'), ('to', 'TO'), ('work', 'VB'), ('with', 'IN'), ('human', 'JJ'), ('language', 'NN'), ('data', 'NNS'), ('.', '.')]
```
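
Once a sentence is tagged, a common next step is to pick out words by category. The sketch below assumes a list of `(word, tag)` pairs shaped like the output above. `words_with_tag_prefix` is a hypothetical helper, not part of NLTK, and the sample list is copied by hand from that output so the snippet runs without NLTK installed.

```python
def words_with_tag_prefix(tagged, prefix):
    """Return the words whose tag starts with `prefix` (e.g. 'NN' matches NN, NNS, NNP)."""
    return [word for word, tag in tagged if tag.startswith(prefix)]


# Hand-copied excerpt of the tagged output shown above
sample_tagged = [
    ('NLTK', 'NNP'), ('is', 'VBZ'), ('a', 'DT'), ('leading', 'VBG'),
    ('platform', 'NN'), ('for', 'IN'), ('building', 'VBG'),
    ('Python', 'NNP'), ('programs', 'NNS'),
]

nouns = words_with_tag_prefix(sample_tagged, 'NN')
verbs = words_with_tag_prefix(sample_tagged, 'VB')
print("Nouns:", nouns)
print("Verbs:", verbs)
```

Matching on a prefix rather than an exact tag groups related Penn Treebank tags together, which is usually what is wanted when asking for "all nouns" or "all verbs".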


### 5. Explanation of POS Tags
The `pos_tag` function assigns each token a tag from the Penn Treebank tagset. The tags that appear in the example above are:

* NNP: Proper noun, singular
* VBZ: Verb, 3rd person singular present
* DT: Determiner
* VBG: Verb, gerund or present participle
* NN: Noun, singular or mass
* IN: Preposition or subordinating conjunction
* NNS: Noun, plural
* TO: "to"
* VB: Verb, base form
* JJ: Adjective

Other common POS tags:

* RB: Adverb
* PRP: Personal pronoun
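
For quick reference while reading tagger output, the tag descriptions can be kept in a small lookup table. This is a minimal sketch: `TAG_DESCRIPTIONS` and `describe` are hypothetical names, and the table is a hand-written subset of the Penn Treebank tagset, so tags outside it fall back to a placeholder.

```python
# Hand-written subset of the Penn Treebank tagset (not exhaustive)
TAG_DESCRIPTIONS = {
    'NN': 'noun, singular or mass',
    'NNS': 'noun, plural',
    'NNP': 'proper noun, singular',
    'VB': 'verb, base form',
    'VBZ': 'verb, 3rd person singular present',
    'VBG': 'verb, gerund or present participle',
    'DT': 'determiner',
    'IN': 'preposition or subordinating conjunction',
    'JJ': 'adjective',
    'RB': 'adverb',
    'PRP': 'personal pronoun',
}


def describe(tagged, unknown='(not in table)'):
    """Attach a human-readable description to each (word, tag) pair."""
    return [(word, tag, TAG_DESCRIPTIONS.get(tag, unknown)) for word, tag in tagged]


described = describe([('NLTK', 'NNP'), ('is', 'VBZ'), ('leading', 'VBG'), ('.', '.')])
for word, tag, description in described:
    print(f"{word:10} {tag:5} {description}")
```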


### 6. POS Tagging for Larger Texts
POS tagging can be applied to larger pieces of text as well. Here's an example for tagging a paragraph.

```python
# Example paragraph
paragraph = """
Natural Language Processing (NLP) is a subfield of artificial intelligence. It focuses on the interaction between 
computers and human language, and it's used to process and analyze large amounts of natural language data.
"""

# Tokenize and POS tag the paragraph
words_paragraph = word_tokenize(paragraph)
tagged_paragraph = pos_tag(words_paragraph)

print("Tagged Paragraph:", tagged_paragraph)
```

```text
Tagged Paragraph: [('Natural', 'JJ'), ('Language', 'NNP'), ('Processing', 'NNP'), ('(', '('), ('NLP', 'NNP'), (')', ')'), ('is', 'VBZ'), ('a', 'DT'), ('subfield', 'NN'), ('of', 'IN'), ('artificial', 'JJ'), ('intelligence', 'NN'), ('.', '.'), ('It', 'PRP'), ('focuses', 'VBZ'), ('on', 'IN'), ('the', 'DT'), ('interaction', 'NN'), ('between', 'IN'), ('computers', 'NNS'), ('and', 'CC'), ('human', 'JJ'), ('language', 'NN'), (',', ','), ('and', 'CC'), ('it', 'PRP'), ("'s", 'VBZ'), ('used', 'VBN'), ('to', 'TO'), ('process', 'VB'), ('and', 'CC'), ('analyze', 'VB'), ('large', 'JJ'), ('amounts', 'NNS'), ('of', 'IN'), ('natural', 'JJ'), ('language', 'NN'), ('data', 'NNS'), ('.', '.')]
```
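
For longer texts the raw list of pairs gets hard to read, so a summary by tag is often more useful. The sketch below counts tag frequencies with `collections.Counter`. `tag_frequencies` is a hypothetical helper, and the input is the first sentence of the output above, copied by hand so the snippet runs on its own.

```python
from collections import Counter


def tag_frequencies(tagged):
    """Count how often each POS tag occurs in a list of (word, tag) pairs."""
    return Counter(tag for _, tag in tagged)


# Hand-copied first sentence of the tagged paragraph shown above
first_sentence = [
    ('Natural', 'JJ'), ('Language', 'NNP'), ('Processing', 'NNP'),
    ('(', '('), ('NLP', 'NNP'), (')', ')'), ('is', 'VBZ'), ('a', 'DT'),
    ('subfield', 'NN'), ('of', 'IN'), ('artificial', 'JJ'),
    ('intelligence', 'NN'), ('.', '.'),
]

freqs = tag_frequencies(first_sentence)
for tag, count in freqs.most_common(3):
    print(tag, count)
```

The same function can be applied to the full `tagged_paragraph` list produced by `pos_tag`.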


### 7. Handling Ambiguities in POS Tagging
POS tagging can sometimes be ambiguous, and the same word can have different tags based on context. For example:

```python
sentence_ambiguous = "The bank is next to the river."
words_ambiguous = word_tokenize(sentence_ambiguous)
tagged_ambiguous = pos_tag(words_ambiguous)

print("Tagged Ambiguous Sentence:", tagged_ambiguous)
```

```text
Tagged Ambiguous Sentence: [('The', 'DT'), ('bank', 'NN'), ('is', 'VBZ'), ('next', 'JJ'), ('to', 'TO'), ('the', 'DT'), ('river', 'NN'), ('.', '.')]
```


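Here, the word "bank" is tagged as a noun (NN). Note that the two senses of "bank" (a riverbank or a financial institution) are both nouns, so they receive the same tag; telling them apart is word sense disambiguation, not POS tagging. Genuine POS ambiguity arises when one word form can fill different grammatical roles, as in "I bank online" (verb) versus "the bank is closed" (noun). The tagger resolves such cases from the surrounding words, and it can still make mistakes on unusual or domain-specific text.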
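
One way to see POS ambiguity concretely is to collect every tag a word form receives across several tagged sentences. The sketch below does that with a hypothetical helper, `ambiguous_words`. The tags in `hand_tagged` were assigned by hand for illustration and are not real tagger output.

```python
from collections import defaultdict


def ambiguous_words(tagged_sentences):
    """Map each lowercased word to its tag set, keeping only words with more than one tag."""
    tags_by_word = defaultdict(set)
    for sentence in tagged_sentences:
        for word, tag in sentence:
            tags_by_word[word.lower()].add(tag)
    return {word: tags for word, tags in tags_by_word.items() if len(tags) > 1}


# Tags assigned by hand for illustration; a real tagger's output may differ
hand_tagged = [
    [('I', 'PRP'), ('bank', 'VBP'), ('online', 'RB'), ('.', '.')],
    [('The', 'DT'), ('bank', 'NN'), ('is', 'VBZ'), ('closed', 'VBN'), ('.', '.')],
]

ambiguous = ambiguous_words(hand_tagged)
print(ambiguous)
```

Running the same helper over sentences tagged with `pos_tag` shows which words in a corpus the tagger treats differently depending on context.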

### 8. Conclusion

POS tagging is a fundamental task in text processing that helps in understanding the syntactic structure of a sentence. The NLTK library makes it easy to perform POS tagging with a simple API. By labeling words with their respective parts of speech, we can perform more advanced NLP tasks like syntactic parsing and information extraction.

#### Key Takeaways:
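- **POS tagging** assigns each word a grammatical category such as noun, verb, or adjective.
- The **Penn Treebank tagset** is commonly used for English and is the tagset returned by NLTK's `pos_tag` in the examples above.
- **POS tags** help disambiguate words whose grammatical role depends on context and serve as input features for downstream NLP tasks.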

---

### 9. Further Enhancements

While NLTK provides an easy-to-use POS tagging tool, there are other advanced methods for improving tagging accuracy:

- **Contextual POS Tagging**: Use neural taggers, such as those in **spaCy** pipelines or models built on **BERT**, which represent each word together with its surrounding context and often improve tagging accuracy.
- **Custom Tagging**: Train custom POS taggers for specialized domains or languages.
- **Rule-based Tagging**: Combine POS tagging with rules to handle ambiguities more effectively.
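
As a rough illustration of the rule-based idea, tagger output can be post-processed with hand-written corrections. The sketch below applies one assumed rule: a token tagged `NN` directly after a `TO` token is retagged as `VB` (as in "to book a flight"). The rule, the `apply_rules` helper, and the sample tags are all hypothetical and hand-written. The rule would misfire on phrases like "next to water", so real rules need testing on representative data.

```python
def apply_rules(tagged):
    """Return a copy of `tagged` where an NN directly after a TO token becomes VB."""
    corrected = []
    previous_tag = None
    for word, tag in tagged:
        # Assumed rule: "to" + noun is more likely an infinitive verb
        if previous_tag == 'TO' and tag == 'NN':
            tag = 'VB'
        corrected.append((word, tag))
        previous_tag = tag
    return corrected


# Hand-written input that imitates a tagging mistake; not real tagger output
mistagged = [
    ('I', 'PRP'), ('want', 'VBP'), ('to', 'TO'), ('book', 'NN'),
    ('a', 'DT'), ('flight', 'NN'), ('.', '.'),
]

fixed = apply_rules(mistagged)
print(fixed)
```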