# POS-TAGGING APPLICATION

**Part-of-Speech (POS) Tagging** is a fundamental task in Natural Language Processing (NLP) that involves **assigning a grammatical category (or "tag") to each word in a given text.**

**Objective:** To identify the lexical category of each word, such as noun, verb, adjective, adverb, pronoun, preposition, conjunction, interjection, etc., based on its definition and its context within the sentence.

**How it Works:**

* **Input:** A sequence of words (a sentence or a chunk of text).
* **Output:** A sequence of words, where each word is paired with its corresponding POS tag.

**Example:**

* **Sentence:** "The quick brown fox jumps over the lazy dog."
* **POS Tagged Output (Penn Treebank tags):**
    * "The": Determiner (DT)
    * "quick": Adjective (JJ)
    * "brown": Adjective (JJ)
    * "fox": Noun, singular (NN)
    * "jumps": Verb, third-person singular present (VBZ)
    * "over": Preposition (IN)
    * "the": Determiner (DT)
    * "lazy": Adjective (JJ)
    * "dog": Noun, singular (NN)
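In code, tagged output is commonly represented as a list of `(word, tag)` pairs. The following is a minimal sketch that rebuilds the example above by hand; the tags are typed in manually rather than produced by a tagger, and the variable names are illustrative.

```python
# Tokens and tags copied by hand from the example sentence above.
tokens = ["The", "quick", "brown", "fox", "jumps", "over", "the", "lazy", "dog"]
tags = ["DT", "JJ", "JJ", "NN", "VBZ", "IN", "DT", "JJ", "NN"]

# Pair each word with its tag: one (word, tag) tuple per token.
tagged = list(zip(tokens, tags))

# Example query: collect every word whose tag marks it as a noun.
nouns = [word for word, tag in tagged if tag.startswith("NN")]

print(tagged)
print(nouns)
```

Keeping the word and its tag together in one tuple makes it easy to filter or group tokens by category in later processing steps.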

**Importance/Applications:**

* **Foundation for Higher-Level NLP Tasks:** POS tagging is a crucial preprocessing step for many more complex NLP applications.
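* **Word Sense Disambiguation:** Narrows the candidate senses of a word when those senses differ by part of speech (e.g., "bank" as a noun, a financial institution, vs. "bank" as a verb, as in "to bank on something"). Senses that share a part of speech, such as "bank" as a financial institution vs. "bank" as a river bank, still require additional context.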
* **Syntactic Parsing:** Essential for building parse trees and understanding the grammatical structure of sentences.
* **Named Entity Recognition (NER):** Proper-noun tags are useful features for identifying the names of people, locations, organizations, etc.
* **Machine Translation:** Provides grammatical information that can guide translation.
* **Information Extraction:** Aids in extracting specific data from text.
* **Text-to-Speech Systems:** Helps determine pronunciation and intonation (e.g., "read" is pronounced differently in the present tense than in the past tense).

**Challenges:**

* **Ambiguity:** Many words can function as different parts of speech depending on the context (e.g., "book" as a noun vs. "book" as a verb).
* **New Words/Slang:** Models need to be robust enough to handle words not seen during training.
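Both challenges show up in even the simplest tagger. The sketch below is a most-frequent-tag baseline with a suffix-based fallback for unseen words. The training sentences, the suffix rules, and the function names are all hypothetical and chosen only for illustration.

```python
from collections import Counter, defaultdict

# Tiny hand-made training set (hypothetical, for illustration only).
training = [
    [("I", "PRP"), ("like", "VBP"), ("the", "DT"), ("book", "NN")],
    [("the", "DT"), ("book", "NN"), ("is", "VBZ"), ("long", "JJ")],
    [("they", "PRP"), ("book", "VBP"), ("the", "DT"), ("flight", "NN")],
]

# Count how often each (lowercased) word appears with each tag.
counts = defaultdict(Counter)
for sentence in training:
    for word, tag in sentence:
        counts[word.lower()][tag] += 1


def guess_unknown(word):
    """Crude suffix heuristics for words never seen in training."""
    if word.endswith("ing"):
        return "VBG"
    if word.endswith("ly"):
        return "RB"
    if word.endswith("ed"):
        return "VBD"
    return "NN"  # fall back to noun when no suffix rule matches


def baseline_tag(words):
    """Tag each word with its most frequent training tag, ignoring context."""
    tagged = []
    for word in words:
        seen = counts.get(word.lower())
        tag = seen.most_common(1)[0][0] if seen else guess_unknown(word)
        tagged.append((word, tag))
    return tagged


result = baseline_tag(["book", "the", "vlogging", "flight"])
print(result)
```

Because this baseline ignores context, "book" receives its most frequent tag (NN) even where it is used as a verb, which is exactly the ambiguity problem. The unseen word "vlogging" is tagged only through its "-ing" suffix.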

**Common Approaches:**

* **Rule-Based Tagging:** Uses hand-crafted rules based on suffixes, prefixes, and context.
* **Stochastic/Statistical Tagging:** Uses probabilities estimated from an annotated corpus: how frequently a word appears with a certain tag and how frequently one tag follows another (e.g., Hidden Markov Models (HMMs), Maximum Entropy Models).
* **Neural Network-Based Tagging:** Uses deep learning models (like RNNs, LSTMs, Transformers) to learn complex patterns from data.
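To make the statistical approach concrete, here is a minimal sketch of HMM tagging with the Viterbi algorithm. All probabilities are toy values picked by hand (an assumption, not estimates from a real corpus), the tag set is reduced to three tags, and unseen word/tag pairs get a small constant probability as a stand-in for proper smoothing.

```python
import math

# Hypothetical toy model: three tags with hand-picked probabilities.
tags = ["DT", "NN", "VB"]
start_p = {"DT": 0.6, "NN": 0.2, "VB": 0.2}
trans_p = {  # trans_p[previous_tag][next_tag]
    "DT": {"DT": 0.05, "NN": 0.90, "VB": 0.05},
    "NN": {"DT": 0.20, "NN": 0.30, "VB": 0.50},
    "VB": {"DT": 0.60, "NN": 0.30, "VB": 0.10},
}
emit_p = {  # emit_p[tag][word]
    "DT": {"the": 1.0},
    "NN": {"book": 0.5, "flight": 0.5},
    "VB": {"book": 0.3, "fly": 0.7},
}
UNSEEN = 1e-6  # small probability for word/tag pairs not listed above


def emission(tag, word):
    return emit_p[tag].get(word, UNSEEN)


def viterbi(words):
    """Return the most probable tag sequence for words under the toy HMM."""
    # best[i][tag] = (log-probability of the best path ending in tag, previous tag)
    best = [{}]
    for tag in tags:
        score = math.log(start_p[tag]) + math.log(emission(tag, words[0]))
        best[0][tag] = (score, None)
    for i in range(1, len(words)):
        best.append({})
        for tag in tags:
            score, prev = max(
                (best[i - 1][p][0] + math.log(trans_p[p][tag]), p) for p in tags
            )
            best[i][tag] = (score + math.log(emission(tag, words[i])), prev)
    # Backtrack from the best final tag to recover the full path.
    path = [max(tags, key=lambda t: best[-1][t][0])]
    for i in range(len(words) - 1, 0, -1):
        path.append(best[i][path[-1]][1])
    path.reverse()
    return list(zip(words, path))


print(viterbi(["book", "the", "flight"]))
print(viterbi(["the", "book"]))
```

With these toy numbers, "book" is tagged as a verb at the start of "book the flight" and as a noun after "the", because the transition probabilities let the surrounding tags influence the decision.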

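```python
import nltk

# Download the tagger model and the tokenizer data (skipped if already present).
# Newer NLTK releases may instead ask for 'averaged_perceptron_tagger_eng' and 'punkt_tab'.
nltk.download('averaged_perceptron_tagger')
nltk.download('punkt')

# Split the sentence into tokens, then tag each token.
text = nltk.word_tokenize("I will buy ice cream.")
nltk.pos_tag(text)
```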

```text
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     C:\Users\Felipe\AppData\Roaming\nltk_data...
[nltk_data]   Package averaged_perceptron_tagger is already up-to-
[nltk_data]       date!
[nltk_data] Downloading package punkt to
[nltk_data]     C:\Users\Felipe\AppData\Roaming\nltk_data...
[nltk_data]   Package punkt is already up-to-date!

[('I', 'PRP'),
 ('will', 'MD'),
 ('buy', 'VB'),
 ('ice', 'JJ'),
 ('cream', 'NN'),
 ('.', '.')]
```
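The tags in this output are Penn Treebank abbreviations. As a reading aid, the sketch below pairs each tag with a short description. The `tagged` list is copied by hand from the output above, and the `TAG_DESCRIPTIONS` dictionary is a hypothetical helper that covers only the tags occurring here.

```python
# Output of nltk.pos_tag, copied by hand from the result above.
tagged = [
    ("I", "PRP"), ("will", "MD"), ("buy", "VB"),
    ("ice", "JJ"), ("cream", "NN"), (".", "."),
]

# Descriptions for the Penn Treebank tags that occur in this output only.
TAG_DESCRIPTIONS = {
    "PRP": "personal pronoun",
    "MD": "modal",
    "VB": "verb, base form",
    "JJ": "adjective",
    "NN": "noun, singular or mass",
    ".": "sentence-final punctuation",
}

# Attach a description to every (word, tag) pair.
described = [
    (word, tag, TAG_DESCRIPTIONS.get(tag, "unknown tag")) for word, tag in tagged
]

for word, tag, description in described:
    print(f"{word:6} {tag:4} {description}")
```

Note that "ice" is tagged as an adjective (JJ), although in "ice cream" it is arguably part of a compound noun. This is a small instance of the ambiguity challenge: taggers make mistakes, so their output is worth spot-checking. If the NLTK `tagsets` resource is installed, `nltk.help.upenn_tagset('JJ')` prints the full description of a tag.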