# Part-of-Speech Tagging (POS Tagging)

## What is it?


Part-of-Speech Tagging (POS Tagging) is the process of labeling each word in a sentence with its corresponding part of speech, such as noun, verb, adjective, adverb, etc.

This helps in understanding the grammatical structure of the sentence.

Key parts of speech, with their Penn Treebank tags:

* Noun (NN): Person, place, thing, or idea (e.g., “dog”, “city”).
* Verb (VB): Action or state of being (e.g., “run”, “is”).
* Adjective (JJ): Describes or modifies a noun (e.g., “big”, “blue”).
* Adverb (RB): Describes or modifies a verb, adjective, or other adverb (e.g., “quickly”, “very”).
* Pronoun (PRP): Replaces a noun (e.g., “he”, “they”).
* Preposition (IN): Shows the relationship between a noun or pronoun and another part of the sentence (e.g., “in”, “on”).
* Conjunction (CC): Connects words, phrases, or clauses (e.g., “and”, “but”).
* Determiner (DT): Introduces a noun (e.g., “the”, “a”).
* Interjection (UH): Expresses strong emotion (e.g., “oh”, “wow”).
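
Taggers usually emit finer-grained tags than the ones listed above (for example, `NNS` for plural nouns or `VBD` for past-tense verbs). The following is a minimal sketch of how such tags could be folded back into the coarse categories from the list. The `COARSE_CATEGORIES` table and the `coarse_category` helper are illustrative names, not part of any library, and the prefix matching is a simplification that assumes Penn Treebank style tags.

```python
# Tag prefixes mapped to the coarse categories listed above (illustrative subset).
COARSE_CATEGORIES = {
    "NN": "noun",
    "VB": "verb",
    "JJ": "adjective",
    "RB": "adverb",
    "PRP": "pronoun",
    "IN": "preposition",
    "CC": "conjunction",
    "DT": "determiner",
    "UH": "interjection",
}


def coarse_category(tag):
    """Return the coarse category for a fine-grained tag, or 'other'."""
    # Check longer prefixes first so the most specific match wins.
    for prefix in sorted(COARSE_CATEGORIES, key=len, reverse=True):
        if tag.startswith(prefix):
            return COARSE_CATEGORIES[prefix]
    return "other"


for tag in ["NNS", "NNP", "VBD", "VBG", "JJR", "XYZ"]:
    print(tag, "->", coarse_category(tag))
```

Grouping by prefix is handy when you only care about "all nouns" or "all verbs" rather than each individual tag.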

## What for?

Uses of POS Tagging:

1. Syntactic Parsing: Helps build the syntactic structure of a sentence, which is essential for understanding the grammatical relationships between words.
2. Named Entity Recognition (NER): Assists in identifying proper nouns, which are often part of named entities such as people, organizations, and locations.
3. Information Retrieval: Enhances search engines by improving query understanding and matching relevant documents more effectively.
4. Machine Translation: Improves the accuracy of translating text from one language to another by preserving grammatical structure.
5. Text-to-Speech Systems: Aids in choosing the correct pronunciation of a word based on its part of speech (e.g., “lead” as a noun vs. “lead” as a verb).
6. Word Sense Disambiguation: Helps resolve ambiguity when a word has multiple meanings, by using its part of speech and context.
7. Sentiment Analysis: Improves accuracy by considering the role of each word in a sentence (e.g., adjectives often carry sentiment).
8. Coreference Resolution: Assists in identifying when different words refer to the same entity in a text, which is crucial for understanding its meaning.
9. Text Summarization: Helps identify the key elements of a sentence, which can be useful for generating summaries.
10. Grammar Checking and Correction: Used in applications that check and correct grammatical errors in text.
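
As a small illustration of the sentiment-analysis use case, the sketch below pulls the adjectives out of an already tagged sentence. The `tagged` list is copied from the NLTK output of the "quick brown fox" example in this notebook. `words_with_tag_prefix` is a hypothetical helper written for this sketch, not a library function.

```python
# (word, tag) pairs copied from the NLTK example output in this notebook.
tagged = [
    ("The", "DT"), ("quick", "JJ"), ("brown", "NN"), ("fox", "NN"),
    ("jumps", "VBZ"), ("over", "IN"), ("the", "DT"), ("lazy", "JJ"),
    ("dog", "NN"),
]


def words_with_tag_prefix(tagged_tokens, prefix):
    """Return the words whose tag starts with the given prefix."""
    return [word for word, tag in tagged_tokens if tag.startswith(prefix)]


# "JJ" also matches comparative (JJR) and superlative (JJS) adjectives.
adjectives = words_with_tag_prefix(tagged, "JJ")
print(adjectives)  # ['quick', 'lazy']
```

The same helper works for other uses in the list, for example passing `"NNP"` to collect proper-noun candidates for NER.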
  

## Example

## How to do it?

### Packages

* NLTK
* spaCy
* TextBlob
* Gensim (has no POS tagger of its own; usually paired with one of the libraries above)

In [28]:
import nltk
import spacy
from textblob import TextBlob
# Download the tagger model used by nltk.pos_tag (only needed once)
nltk.download('averaged_perceptron_tagger')

[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     /Users/aymanelsayeed/nltk_data...
[nltk_data]   Package averaged_perceptron_tagger is already up-to-
[nltk_data]       date!


True

### Examples

In [3]:
example = "The quick brown fox jumps over the lazy dog"

# Tokenize the sentence (word_tokenize needs the 'punkt' models: nltk.download('punkt'))
tokens = nltk.word_tokenize(example)

# Apply POS tagging
tagged = nltk.pos_tag(tokens)

tagged

[('The', 'DT'),
 ('quick', 'JJ'),
 ('brown', 'NN'),
 ('fox', 'NN'),
 ('jumps', 'VBZ'),
 ('over', 'IN'),
 ('the', 'DT'),
 ('lazy', 'JJ'),
 ('dog', 'NN')]
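
The tagger returns a plain list of `(word, tag)` tuples, so standard Python tools work on it directly. As a minimal sketch, the block below counts how often each tag occurs in the output above using `collections.Counter`. The `tagged` list is copied from that output so the block runs without NLTK installed.

```python
from collections import Counter

# (word, tag) pairs copied from the output above.
tagged = [
    ("The", "DT"), ("quick", "JJ"), ("brown", "NN"), ("fox", "NN"),
    ("jumps", "VBZ"), ("over", "IN"), ("the", "DT"), ("lazy", "JJ"),
    ("dog", "NN"),
]

# Count the tags only, ignoring the words themselves.
tag_counts = Counter(tag for _, tag in tagged)

# Print tags from most to least frequent.
for tag, count in tag_counts.most_common():
    print(tag, count)
```

Note that the tagger labels "brown" as `NN` here, so the count for `NN` is 3 rather than 2. Tagger output is a prediction and is worth spot-checking.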

## Practice

### Quiz 1

* Read one of the datasets from the assets folder.

For each text, count how many tokens carry each of the following tags:
* VB
* VBD
* VBG
* NNP
* etc.

In [1]:
# write answer here

In [8]:
# find number of verbs per text

### Quiz 2

Check the distribution of each part-of-speech tag in the dataset.