## Part-of-Speech Tagging

Part-of-speech (POS) tagging is a core step in Natural Language Processing (NLP). Each word (token) in a text is assigned a tag that indicates its part of speech, such as noun, verb, or adjective. A tagger cannot simply look words up in a dictionary, because many words can serve as more than one part of speech and the correct tag depends on context. For example, "book" is a noun in "read the book" but a verb in "book a flight".

Key aspects of POS tagging include:

- Tagging Accuracy: Accuracy depends on the language, the kind of text being tagged (for example, edited news versus informal social media posts), and the quality of the tagging model. Context is crucial, since many words can serve as more than one part of speech.
- Tag Set: Different POS tagging systems use different sets of tags. A widely used set for English is the Penn Treebank tag set. It has fine-grained tags for nouns, verbs, adjectives, adverbs, prepositions, conjunctions, and other parts of speech, for example separate tags for singular and plural nouns. The Universal Dependencies project defines a coarser set of 17 tags that is designed to work across languages.
- Applications: POS tagging supports NLP tasks such as parsing, named entity recognition, sentiment analysis, machine translation, and text-to-speech conversion. It helps in understanding sentence structure and meaning.
- Machine Learning Models: Modern POS taggers usually rely on machine learning models trained on annotated corpora, especially neural networks. These models learn contextual patterns and typically reach high accuracy on text similar to their training data.
- Linguistic Analysis: POS tagging is a foundational step in linguistic analysis. It provides input for deeper analyses such as dependency parsing and syntax tree construction.
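To make the accuracy and context points concrete, here is a minimal sketch of the simplest learned tagger, a most-frequent-tag baseline, trained on a tiny hypothetical corpus. It is not how spaCy or neural taggers work. The corpus, the function names (`train_most_frequent_tag`, `tag`, `accuracy`), and the fallback tag `NOUN` for unseen words are illustrative assumptions. Because the baseline ignores context, it tags "book" as a noun even when the word is used as a verb:

```python
from collections import Counter, defaultdict

# Hypothetical toy training corpus: sentences of (word, UPOS tag) pairs.
TRAIN = [
    [("I", "PRON"), ("book", "VERB"), ("a", "DET"), ("flight", "NOUN")],
    [("the", "DET"), ("book", "NOUN"), ("is", "AUX"), ("long", "ADJ")],
    [("a", "DET"), ("book", "NOUN"), ("sold", "VERB")],
]

# Hypothetical held-out sentence where "book" is a verb.
TEST = [[("I", "PRON"), ("book", "VERB"), ("a", "DET"), ("room", "NOUN")]]


def train_most_frequent_tag(sentences):
    """Count how often each word gets each tag; keep the most frequent tag per word."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        for word, pos in sentence:
            counts[word.lower()][pos] += 1
    return {word: tag_counts.most_common(1)[0][0] for word, tag_counts in counts.items()}


def tag(model, words, default="NOUN"):
    """Tag each word independently; unseen words fall back to `default` (an assumption)."""
    return [(word, model.get(word.lower(), default)) for word in words]


def accuracy(model, gold_sentences):
    """Fraction of tokens whose predicted tag matches the gold tag."""
    correct = total = 0
    for sentence in gold_sentences:
        words = [word for word, _ in sentence]
        for (_, predicted), (_, gold) in zip(tag(model, words), sentence):
            correct += predicted == gold
            total += 1
    return correct / total


model = train_most_frequent_tag(TRAIN)
print(tag(model, ["I", "book", "a", "room"]))
print(accuracy(model, TEST))  # → 0.75
```

One of the four test tokens is wrong: "book" is tagged `NOUN` because it was a noun in most training sentences. Context-aware taggers, such as the neural models mentioned above, exist to resolve exactly this kind of ambiguity.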

A part of speech (POS) is the grammatical role a word plays in a sentence. Traditional English grammar usually recognizes eight parts of speech:

- Noun
- Pronoun
- Adjective
- Verb
- Adverb
- Preposition
- Conjunction
- Interjection

Part-of-speech tagging is the process of assigning a POS tag to each token according to how it is used in its sentence. The result gives every word a syntactic category, such as noun or verb, that later processing steps can rely on.
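The eight traditional categories do not line up one-to-one with the tag sets that taggers use. As an illustration, the sketch below maps them onto the 17 Universal Dependencies (UPOS) tags, which spaCy reports in `Token.pos_`. The mapping is one plausible simplification, not an official correspondence, and the names `TRADITIONAL_TO_UPOS` and `traditional_category` are hypothetical:

```python
# The 17 Universal Dependencies (UPOS) tags.
UPOS_TAGS = {
    "ADJ", "ADP", "ADV", "AUX", "CCONJ", "DET", "INTJ", "NOUN", "NUM",
    "PART", "PRON", "PROPN", "PUNCT", "SCONJ", "SYM", "VERB", "X",
}

# Hypothetical, simplified mapping from traditional categories to UPOS tags.
TRADITIONAL_TO_UPOS = {
    "Noun": ["NOUN", "PROPN"],
    "Pronoun": ["PRON"],
    "Adjective": ["ADJ"],
    "Verb": ["VERB", "AUX"],
    "Adverb": ["ADV"],
    "Preposition": ["ADP"],
    "Conjunction": ["CCONJ", "SCONJ"],
    "Interjection": ["INTJ"],
}


def traditional_category(upos):
    """Return the traditional category for a UPOS tag, or None if there is no match."""
    for category, tags in TRADITIONAL_TO_UPOS.items():
        if upos in tags:
            return category
    return None


# UPOS tags that this simplified mapping leaves without a traditional category.
mapped = {t for tags in TRADITIONAL_TO_UPOS.values() for t in tags}
print(traditional_category("PROPN"))  # → Noun
print(sorted(UPOS_TAGS - mapped))     # → ['DET', 'NUM', 'PART', 'PUNCT', 'SYM', 'X']
```

Tags such as DET, NUM, PART, PUNCT, SYM, and X are left unmapped. The classic eight-way list either does not cover them or places them inconsistently, which is one reason modern tag sets are larger.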

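In spaCy, POS tags are available as attributes on each `Token` object. `pos_` holds the coarse-grained Universal POS tag, and `tag_` holds the fine-grained, language-specific tag (Penn Treebank style for English pipelines). The example below also prints `dep_`, the token's syntactic dependency label, and uses `spacy.explain()` to turn a tag into a readable description: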


```python
import spacy

# Load the small English pipeline (install it first with:
# python -m spacy download en_core_web_sm)
nlp = spacy.load("en_core_web_sm")
```
```python
about_text = (
    "Gus Proto is a Python developer currently"
    " working for a London-based Fintech"
    " company. He is interested in learning"
    " Natural Language Processing."
)

# Run the pipeline: it tokenizes the text and tags every token
about_doc = nlp(about_text)
for token in about_doc:
    # tag_: fine-grained tag, pos_: coarse UPOS tag, dep_: dependency label
    print(
        f"""
          TOKEN: {token.text}
          ===
          TAG: {token.tag_:10}
          ===
          POS: {token.pos_:10}
          ===
          DEP: {token.dep_:10}
          EXPLANATION: {spacy.explain(token.tag_)}
          """
    )
```


Output (only the first five tokens are shown):

```
          TOKEN: Gus
          ===
          TAG: NNP
          ===
          POS: PROPN
          ===
          DEP: compound
          EXPLANATION: noun, proper singular

          TOKEN: Proto
          ===
          TAG: NNP
          ===
          POS: PROPN
          ===
          DEP: nsubj
          EXPLANATION: noun, proper singular

          TOKEN: is
          ===
          TAG: VBZ
          ===
          POS: AUX
          ===
          DEP: ROOT
          EXPLANATION: verb, 3rd person singular present

          TOKEN: a
          ===
          TAG: DT
          ===
          POS: DET
          ===
          DEP: det
          EXPLANATION: determiner

          TOKEN: Python
          ===
          TAG: NNP
          ===
          POS: PROPN
          ===
          DEP: compound
          EXPLANATION: noun, proper singular
```

By using POS tags, you can extract a particular category of words. For example, the following code collects the nouns and adjectives:

```python
# Keep tokens whose coarse POS tag is NOUN or ADJ
nouns = [token for token in about_doc if token.pos_ == "NOUN"]
adjectives = [token for token in about_doc if token.pos_ == "ADJ"]

print("Nouns:", nouns)
print("Adjectives:", adjectives)
```

```
Nouns: [developer, company]
Adjectives: [interested]
```
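The same filtering idea extends to any number of categories. The following sketch uses a hypothetical helper, `group_by_pos`, to group token texts by tag in a single pass. It works on plain (text, POS) pairs so it can run without a loaded model. The sample pairs are a subset of the tags printed above, and with spaCy you would build the input as `[(token.text, token.pos_) for token in about_doc]`:

```python
from collections import defaultdict


def group_by_pos(tagged_tokens, wanted=None):
    """Group token texts by POS tag.

    tagged_tokens: iterable of (text, pos) pairs.
    wanted: optional set of POS tags to keep; None keeps every tag.
    """
    groups = defaultdict(list)
    for text, pos in tagged_tokens:
        if wanted is None or pos in wanted:
            groups[pos].append(text)
    return dict(groups)


# A subset of (text, POS) pairs taken from the outputs shown above.
sample = [
    ("Gus", "PROPN"), ("Proto", "PROPN"), ("is", "AUX"), ("a", "DET"),
    ("Python", "PROPN"), ("developer", "NOUN"), ("company", "NOUN"),
    ("interested", "ADJ"),
]

print(group_by_pos(sample, wanted={"NOUN", "ADJ", "PROPN"}))
```

Passing a set of wanted tags keeps the filter cheap to check, and returning a plain `dict` keyed by tag avoids writing one list and one `if` branch per category.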
