This notebook demonstrates how to use the Natural Language Toolkit (NLTK) library in Python to run a short natural language processing (NLP) pipeline on a sample sentence. It tokenizes the text, tags each token with its part of speech, performs named entity recognition on the tagged tokens, and then displays the resulting tree.

Import NLTK and Download Required NLTK Data Files

In [None]:
import nltk
from nltk import word_tokenize, pos_tag, ne_chunk

# Download necessary NLTK data files
nltk.download('punkt')  # Tokenizer
nltk.download('averaged_perceptron_tagger')  # Part-of-speech tagger
nltk.download('maxent_ne_chunker')  # Named Entity Recognizer
nltk.download('words')  # Corpus of words for NER

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     /root/nltk_data...
[nltk_data]   Unzipping taggers/averaged_perceptron_tagger.zip.
[nltk_data] Downloading package maxent_ne_chunker to
[nltk_data]     /root/nltk_data...
[nltk_data]   Unzipping chunkers/maxent_ne_chunker.zip.
[nltk_data] Downloading package words to /root/nltk_data...
[nltk_data]   Unzipping corpora/words.zip.


True

Define Sample Text

In [None]:
# Sample text to analyze
text = "Bill Gates, the co-founder of Microsoft, was born on October 28, 1955."

Tokenize the Text / Perform Part-of-Speech Tagging

In [None]:
# Tokenize the text into individual words and punctuation marks
tokens = word_tokenize(text)

# Perform part-of-speech tagging on the tokens
tags = pos_tag(tokens)
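
`pos_tag` returns a list of `(token, tag)` pairs, which can be inspected with ordinary Python. The sketch below groups tokens by their tag. To stay runnable without the NLTK models, it uses a hard-coded list (`sample_tags`) copied from the tagged output shown later in this notebook; `group_by_tag` is a hypothetical helper name, not part of NLTK. In the notebook itself, the `tags` variable could be passed in instead.

```python
from collections import defaultdict

# (token, tag) pairs copied from the tagged output shown in this notebook,
# hard-coded here so the sketch runs without the NLTK data files.
sample_tags = [
    ("Bill", "NNP"), ("Gates", "NNP"), (",", ","), ("the", "DT"),
    ("co-founder", "NN"), ("of", "IN"), ("Microsoft", "NNP"), (",", ","),
    ("was", "VBD"), ("born", "VBN"), ("on", "IN"), ("October", "NNP"),
    ("28", "CD"), (",", ","), ("1955", "CD"), (".", "."),
]


def group_by_tag(tagged_tokens):
    """Map each part-of-speech tag to the tokens that received it, in order."""
    grouped = defaultdict(list)
    for word, tag in tagged_tokens:
        grouped[tag].append(word)
    return dict(grouped)


by_tag = group_by_tag(sample_tags)
print(by_tag["NNP"])  # proper nouns: ['Bill', 'Gates', 'Microsoft', 'October']
print(by_tag["CD"])   # cardinal numbers: ['28', '1955']
```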

Perform Named Entity Recognition and Print the Results

In [None]:
# Perform named entity recognition using the part-of-speech tagged tokens
entities = ne_chunk(tags)

# Print the named entities in a tree format
print(entities)

(S
  (PERSON Bill/NNP)
  (GPE Gates/NNP)
  ,/,
  the/DT
  co-founder/NN
  of/IN
  (ORGANIZATION Microsoft/NNP)
  ,/,
  was/VBD
  born/VBN
  on/IN
  October/NNP
  28/CD
  ,/,
  1955/CD
  ./.)
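
Note that in the output above the chunker splits "Bill Gates" into two entities and labels "Gates" as GPE (a geo-political entity) rather than PERSON, so the result should be checked rather than trusted blindly. To work with the entities as data instead of reading the printed tree, one option is to walk the top level of the tree and collect the labeled subtrees. The following is a minimal sketch: `extract_entities` is a hypothetical helper name, and it assumes a chunk tree in which entities are subtrees exposing `label()` while all other tokens are plain `(word, tag)` tuples, as in the output above.

```python
def extract_entities(tree):
    """Return (entity_text, entity_label) pairs from a chunk tree.

    Assumes entity chunks are subtrees with a label() method whose leaves
    are (word, tag) tuples, and that non-entity tokens are plain tuples.
    """
    found = []
    for node in tree:
        # Plain (word, tag) tuples have no label(), so they are skipped.
        if hasattr(node, "label"):
            words = [word for word, _tag in node]
            found.append((" ".join(words), node.label()))
    return found
```

In the notebook this would be called as `extract_entities(entities)`. For the tree printed above, it would return `[('Bill', 'PERSON'), ('Gates', 'GPE'), ('Microsoft', 'ORGANIZATION')]`.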


Pretty-Print Named Entities

In [None]:
# Pretty-print the named entity tree; with default arguments this matches print(entities)
entities.pprint()

(S
  (PERSON Bill/NNP)
  (GPE Gates/NNP)
  ,/,
  the/DT
  co-founder/NN
  of/IN
  (ORGANIZATION Microsoft/NNP)
  ,/,
  was/VBD
  born/VBN
  on/IN
  October/NNP
  28/CD
  ,/,
  1955/CD
  ./.)
