A step-by-step guide to performing POS (part-of-speech) tagging on a custom dataset using Python, with explanations and comments throughout the code. We will use the popular spaCy library, which is efficient for NLP tasks such as POS tagging.

In [13]:
import spacy

# Load the pre-trained small English pipeline.
# It is installed separately: python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")

# Custom dataset (you can replace this with your own dataset)
custom_text = """
Natural language processing (NLP) is a field of computer science, artificial intelligence,
and computational linguistics concerned with the interactions between computers and human language.
The ultimate goal of NLP is to enable computers to understand, interpret, and generate human language.
"""

# Process the text with the spaCy pipeline (the tagging happens here)
doc = nlp(custom_text)

# Print the POS tagging results
print("POS tagging results:")

# Iterate over each token (words, punctuation and whitespace) in the processed document
for token in doc:
    # Print the token text, its coarse POS tag and its fine-grained tag
    print(f"Word: {token.text}, POS Tag: {token.pos_}, Detailed Tag: {token.tag_}")

# Optional: visualize the dependency parse (displacy is included with spaCy)
# from spacy import displacy
# displacy.render(doc, style="dep")


POS tagging results:
Word: 
, POS Tag: SPACE, Detailed Tag: _SP
Word: Natural, POS Tag: ADJ, Detailed Tag: JJ
Word: language, POS Tag: NOUN, Detailed Tag: NN
Word: processing, POS Tag: NOUN, Detailed Tag: NN
Word: (, POS Tag: PUNCT, Detailed Tag: -LRB-
Word: NLP, POS Tag: PROPN, Detailed Tag: NNP
Word: ), POS Tag: PUNCT, Detailed Tag: -RRB-
Word: is, POS Tag: AUX, Detailed Tag: VBZ
Word: a, POS Tag: DET, Detailed Tag: DT
Word: field, POS Tag: NOUN, Detailed Tag: NN
Word: of, POS Tag: ADP, Detailed Tag: IN
Word: computer, POS Tag: NOUN, Detailed Tag: NN
Word: science, POS Tag: NOUN, Detailed Tag: NN
Word: ,, POS Tag: PUNCT, Detailed Tag: ,
Word: artificial, POS Tag: ADJ, Detailed Tag: JJ
Word: intelligence, POS Tag: NOUN, Detailed Tag: NN
Word: ,, POS Tag: PUNCT, Detailed Tag: ,
Word: 
, POS Tag: SPACE, Detailed Tag: _SP
Word: and, POS Tag: CCONJ, Detailed Tag: CC
Word: computational, POS Tag: ADJ, Detailed Tag: JJ
Word: linguistics, POS Tag: NOUN, Detailed Tag: NNS
Word: concerned, POS

Explanation of Code:

Importing spaCy:

We import spacy, the library that provides NLP capabilities such as tokenization, POS tagging and named entity recognition (NER).

Loading the spaCy Model:

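en_core_web_sm is spaCy's small English pipeline. It contains the trained components used here, including a tagger and a dependency parser, but unlike the medium and large English models it does not ship static word vectors. It must be installed separately (python -m spacy download en_core_web_sm); once loaded, it can POS-tag any English text.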
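Because en_core_web_sm is a separate package, spacy.load raises an OSError when it is not installed. The sketch below wraps the call so that the failure says how to fix it; load_model is a hypothetical helper name, not a spaCy API.

```python
import spacy
from spacy.language import Language


def load_model(name: str = "en_core_web_sm") -> Language:
    """Load an installed spaCy pipeline, with a clearer error if it is missing."""
    try:
        return spacy.load(name)
    except OSError as err:
        # spacy.load raises OSError when no installed package or path matches `name`.
        raise RuntimeError(
            f"spaCy pipeline {name!r} is not installed. "
            f"Install it with: python -m spacy download {name}"
        ) from err
```

Usage: nlp = load_model() behaves like spacy.load("en_core_web_sm") when the model is present.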

Custom Dataset:

Here, I've used a short multi-sentence text as an example. You can replace custom_text with your own dataset.
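A real dataset usually contains many texts rather than one string. A minimal sketch, assuming the dataset is a list of strings (for example, one per line of a text file): nlp.pipe streams the texts through the pipeline in batches, and the loop collects one row per token while skipping whitespace tokens. The function name tag_dataset and the row layout are illustrative choices, not part of spaCy.

```python
from typing import Iterable, List, Tuple

from spacy.language import Language

# (text index, token text, coarse POS tag, fine-grained tag)
Row = Tuple[int, str, str, str]


def tag_dataset(nlp: Language, texts: Iterable[str], batch_size: int = 64) -> List[Row]:
    """POS-tag every text in `texts`, returning one row per non-whitespace token."""
    rows: List[Row] = []
    # nlp.pipe processes the texts in batches, which is faster than calling nlp() in a loop.
    for text_id, doc in enumerate(nlp.pipe(texts, batch_size=batch_size)):
        for token in doc:
            if token.is_space:  # drop newline/space tokens (tagged SPACE / _SP)
                continue
            rows.append((text_id, token.text, token.pos_, token.tag_))
    return rows
```

With nlp loaded as in the first cell, a file-based dataset could be tagged with rows = tag_dataset(nlp, Path("my_dataset.txt").read_text(encoding="utf-8").splitlines()), where my_dataset.txt is a placeholder for your own file and Path comes from pathlib.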

Processing the Text:

We pass custom_text through the spaCy pipeline to get a Doc object. The Doc holds the sequence of tokens together with the annotations the pipeline added to them, such as their POS tags.

POS Tagging:

The pipeline has already assigned a tag to every token. We iterate through the tokens in the doc and print each one along with its coarse POS tag (token.pos_) and its fine-grained tag (token.tag_).

token.pos_: the coarse part of speech from the Universal POS tag set, such as NOUN, VERB or ADJ.
token.tag_: the fine-grained, language-specific tag (Penn Treebank style for English), which adds detail such as singular vs. plural noun (NN vs. NNS) or past tense verb (VBD).
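Labels such as JJ or _SP are terse. spaCy ships a glossary, and spacy.explain() returns a short description for a label, or None if the label is unknown. A small sketch that pairs each token's tags with their descriptions; describe_tags is an illustrative helper name.

```python
from typing import List, Optional, Tuple

import spacy
from spacy.tokens import Doc

# (token text, coarse tag, its description, fine-grained tag, its description)
TagInfo = Tuple[str, str, Optional[str], str, Optional[str]]


def describe_tags(doc: Doc) -> List[TagInfo]:
    """Pair each non-whitespace token's tags with spaCy's glossary descriptions."""
    return [
        (token.text, token.pos_, spacy.explain(token.pos_), token.tag_, spacy.explain(token.tag_))
        for token in doc
        if not token.is_space
    ]
```

With the doc from the first cell: for row in describe_tags(doc): print(row).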


Explanation of Output:

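Every token in the custom text is tagged, including punctuation (PUNCT) and the newline characters in custom_text, which appear as whitespace tokens (SPACE / _SP). For example: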


"Natural" is an adjective (ADJ).

"language" is a noun (NOUN).

"processing" is a noun (NOUN).
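Once each token has a tag, simple summaries follow directly. A minimal sketch, assuming the tagged text is a list of (word, POS) pairs: one function counts coarse tags while ignoring whitespace tokens, and another lists the words with a given tag. Both helper names are hypothetical, and the sample pairs are copied from the output above.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def pos_counts(tagged: Iterable[Tuple[str, str]]) -> Counter:
    """Count coarse POS tags, ignoring whitespace (SPACE) tokens."""
    return Counter(pos for _, pos in tagged if pos != "SPACE")


def words_with_pos(tagged: Iterable[Tuple[str, str]], pos: str) -> List[str]:
    """Return the words whose coarse tag equals `pos`, in document order."""
    return [word for word, tag in tagged if tag == pos]


# A few (word, POS) pairs from the tagging output above.
sample = [("\n", "SPACE"), ("Natural", "ADJ"), ("language", "NOUN"), ("processing", "NOUN"),
          ("(", "PUNCT"), ("NLP", "PROPN"), (")", "PUNCT"), ("is", "AUX")]

print(pos_counts(sample))
print(words_with_pos(sample, "NOUN"))  # → ['language', 'processing']
```

With spaCy, the pairs can be built from the processed document as tagged = [(t.text, t.pos_) for t in doc].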

Optional Visualization (Dependency Parsing):

You can visualize the syntactic structure of the text with the render() function of spaCy's displacy module. It draws the dependency parse tree, which shows the relationships between words.

In [14]:
from spacy import displacy

# In Jupyter, this draws the dependency tree inline below the cell
displacy.render(doc, style="dep")


This will display a visual diagram of the syntactic dependencies between words, helping you better understand how words are connected in the text.
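displacy.render() only draws inline inside a notebook. Outside Jupyter, passing jupyter=False makes it return the markup as a string, which can be saved and opened in a browser. A minimal sketch; the function name and default output path are placeholders.

```python
from pathlib import Path

from spacy import displacy
from spacy.tokens import Doc


def save_dependency_svg(doc: Doc, path: str = "dependencies.svg") -> Path:
    """Render the dependency parse of `doc` as SVG markup and write it to `path`."""
    # With jupyter=False, render() returns the markup instead of displaying it.
    svg = displacy.render(doc, style="dep", jupyter=False)
    out = Path(path)
    out.write_text(svg, encoding="utf-8")
    return out
```

With the doc from the first cell, save_dependency_svg(doc) writes dependencies.svg to the current working directory.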

Conclusion:

In this practical, we used spaCy to perform POS tagging on a custom dataset. You can replace custom_text with your own text and analyze the POS tags for your dataset. Additionally, visualizing the dependencies helps to understand the relationships between the words in the text.