In [12]:
import os
import nltk
import nltk.corpus

In [13]:
print(os.listdir(nltk.data.find("corpora")))

['omw-1.4.zip', 'stopwords', 'stopwords.zip', 'wordnet.zip']


# NLTK, short for Natural Language Toolkit, is a Python library designed to work with human language data. It provides easy-to-use interfaces and resources for working with and analyzing text data. NLTK is widely used for educational purposes, research, and practical NLP applications.


#### 1. **Text Processing:** NLTK provides tools for tokenization (splitting text into words or sentences), stemming (reducing words to their root form), and lemmatization (reducing words to their base or dictionary form).

#### 2. **Part-of-Speech Tagging:** NLTK includes pre-trained models for part-of-speech tagging, which involves labeling words in a sentence with their grammatical parts of speech (e.g., noun, verb, adjective).

#### 3. **Named Entity Recognition (NER):** It offers named entity recognition tools that can identify entities such as names of people, organizations, locations, and more within text.

#### 4. **Sentiment Analysis:** NLTK includes resources for sentiment analysis, allowing you to determine the sentiment (positive, negative, neutral) of a piece of text.

#### 5. **Corpora and Lexical Resources:** NLTK provides access to various corpora (collections of text) and lexical resources (dictionaries, word lists) that are useful for linguistic analysis and research.

#### 6. **Text Classification:** It supports text classification, which makes it suitable for tasks such as spam email detection and sentiment classification.

#### 7. **Language Processing Pipelines:** NLTK's components can be chained (for example, tokenize, then tag, then chunk), so you can assemble custom NLP workflows from the same building blocks.
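
As a small illustration of the stemming mentioned in item 1, the sketch below uses NLTK's `PorterStemmer`, which needs no downloaded data. The word list is an arbitrary example chosen here and is not part of the original notebook.

```python
from nltk.stem import PorterStemmer

# Example words chosen for illustration (an assumption, not from the notebook)
example_words = ["running", "processing", "libraries"]

stemmer = PorterStemmer()

# Stemming strips suffixes by rule, so a stem is not always a dictionary word
stems = {word: stemmer.stem(word) for word in example_words}
print(stems)
```

Lemmatization, by contrast, looks words up in a lexical resource such as WordNet, so it needs that corpus to be installed.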



# 1. Tokenization:
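**Tokenization is the process of splitting text into words or sentences. NLTK provides tools for both word and sentence tokenization. Both functions used below rely on NLTK's Punkt sentence tokenizer models, which must be downloaded first (for example with `nltk.download('punkt')`; recent NLTK releases use `punkt_tab`).**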

In [15]:
text = "NLTK is a powerful library for natural language processing. It is widely used in research and education."

# Tokenize the text into sentences
sentences = nltk.sent_tokenize(text)

# Tokenize the text into words
words = nltk.word_tokenize(text)

print("Sentences:")
print(sentences)
print("\nWords:")
print(words)
len(words)

Sentences:
['NLTK is a powerful library for natural language processing.', 'It is widely used in research and education.']

Words:
['NLTK', 'is', 'a', 'powerful', 'library', 'for', 'natural', 'language', 'processing', '.', 'It', 'is', 'widely', 'used', 'in', 'research', 'and', 'education', '.']


19
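
The count of 19 includes the two period tokens, because `word_tokenize` returns punctuation as separate tokens. A minimal sketch of dropping punctuation and counting word frequencies, using the token list printed above and only the standard library:

```python
from collections import Counter

# Token list copied from the word_tokenize output above
words = ['NLTK', 'is', 'a', 'powerful', 'library', 'for', 'natural', 'language',
         'processing', '.', 'It', 'is', 'widely', 'used', 'in', 'research',
         'and', 'education', '.']

# Keep only alphanumeric tokens, which drops the two '.' tokens
word_tokens = [w for w in words if w.isalnum()]

# Count case-insensitive word frequencies
freq = Counter(w.lower() for w in word_tokens)

print(len(word_tokens))     # 17
print(freq.most_common(1))  # [('is', 2)]
```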

# 2. Part-of-Speech (POS) Tagging:
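**NLTK can label each word in a sentence with its grammatical part of speech. The default English tagger uses Penn Treebank tags, for example `NNP` for a proper noun and `VBZ` for a third-person singular present-tense verb.**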

In [16]:
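nltk.download('averaged_perceptron_tagger')  # tagger model, saved to the default NLTK data directory
nltk.download('averaged_perceptron_tagger', download_dir='path_to_download_directory')  # placeholder path; replace it and make sure it is on nltk.data.path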


[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     C:\Users\HP\AppData\Roaming\nltk_data...
[nltk_data]   Unzipping taggers\averaged_perceptron_tagger.zip.
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     path_to_download_directory...
[nltk_data]   Unzipping taggers\averaged_perceptron_tagger.zip.


True
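
The second call above writes to a directory literally named `path_to_download_directory`. NLTK searches only the locations listed in `nltk.data.path`, so a custom directory generally has to be registered there before the data can be loaded from it. A minimal sketch, keeping the notebook's placeholder name:

```python
import os
import nltk

# Placeholder directory name taken from the notebook; substitute a real path
custom_dir = os.path.abspath("path_to_download_directory")

# Register the directory so NLTK's resource lookup also searches it
if custom_dir not in nltk.data.path:
    nltk.data.path.append(custom_dir)

print(custom_dir in nltk.data.path)  # True
```

If the default data directory is sufficient, the second `nltk.download` call can simply be omitted.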

In [17]:
# Sample text
text = "NLTK is a powerful library for natural language processing."

# Tokenize the text into words
words = nltk.word_tokenize(text)

# Perform part-of-speech tagging
pos_tags = nltk.pos_tag(words)

print("Part-of-Speech Tags:")
print(pos_tags)

Part-of-Speech Tags:
[('NLTK', 'NNP'), ('is', 'VBZ'), ('a', 'DT'), ('powerful', 'JJ'), ('library', 'NN'), ('for', 'IN'), ('natural', 'JJ'), ('language', 'NN'), ('processing', 'NN'), ('.', '.')]
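
In this output, noun tags start with `NN` and adjective tags with `JJ`. A small sketch that filters the tagged words by tag prefix, using plain Python on a copy of the list printed above:

```python
# (word, tag) pairs copied from the pos_tag output above
pos_tags = [('NLTK', 'NNP'), ('is', 'VBZ'), ('a', 'DT'), ('powerful', 'JJ'),
            ('library', 'NN'), ('for', 'IN'), ('natural', 'JJ'),
            ('language', 'NN'), ('processing', 'NN'), ('.', '.')]

# Select words whose tag begins with the noun or adjective prefix
nouns = [word for word, tag in pos_tags if tag.startswith('NN')]
adjectives = [word for word, tag in pos_tags if tag.startswith('JJ')]

print(nouns)       # ['NLTK', 'library', 'language', 'processing']
print(adjectives)  # ['powerful', 'natural']
```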


# 3. Named Entity Recognition (NER):
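**NLTK can recognize named entities such as people, organizations, and locations. `nltk.ne_chunk` takes part-of-speech-tagged tokens and returns a tree whose subtrees carry labels such as `PERSON`, `ORGANIZATION`, and `GPE` (geo-political entity).**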

In [19]:
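nltk.download('maxent_ne_chunker')  # chunker model, saved to the default NLTK data directory
nltk.download('maxent_ne_chunker', download_dir='path_to_download_directory')  # placeholder path; replace it and make sure it is on nltk.data.path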


[nltk_data] Downloading package maxent_ne_chunker to
[nltk_data]     C:\Users\HP\AppData\Roaming\nltk_data...
[nltk_data]   Unzipping chunkers\maxent_ne_chunker.zip.
[nltk_data] Downloading package maxent_ne_chunker to
[nltk_data]     path_to_download_directory...
[nltk_data]   Unzipping chunkers\maxent_ne_chunker.zip.


True

In [21]:
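nltk.download('words')  # word list used by the chunker, saved to the default NLTK data directory
nltk.download('words', download_dir='path_to_download_directory')  # placeholder path; replace it and make sure it is on nltk.data.path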


[nltk_data] Downloading package words to
[nltk_data]     C:\Users\HP\AppData\Roaming\nltk_data...
[nltk_data]   Unzipping corpora\words.zip.
[nltk_data] Downloading package words to path_to_download_directory...
[nltk_data]   Unzipping corpora\words.zip.


True

In [22]:
import nltk

# Sample text
text = "Apple Inc. is headquartered in Cupertino, California."

# Tokenize the text into words
words = nltk.word_tokenize(text)

# Perform named entity recognition (NER)
ner_tags = nltk.ne_chunk(nltk.pos_tag(words))

print("Named Entity Recognition Tags:")
print(ner_tags)


Named Entity Recognition Tags:
(S
  (PERSON Apple/NNP)
  (ORGANIZATION Inc./NNP)
  is/VBZ
  headquartered/VBN
  in/IN
  (GPE Cupertino/NNP)
  ,/,
  (GPE California/NNP)
  ./.)
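
Note that in the output above the chunker labels "Apple" as `PERSON` and "Inc." as a separate `ORGANIZATION`, so the results are worth checking rather than taking at face value. To collect the entities programmatically, one option is to walk the returned tree. The sketch below rebuilds the same tree by hand with `nltk.tree.Tree` so that it runs without the chunker models; in the notebook, `ner_tags` could be used in place of `ner_tree`.

```python
from nltk.tree import Tree

# Tree rebuilt by hand from the printed output above (an illustration, not a new model run)
ner_tree = Tree('S', [
    Tree('PERSON', [('Apple', 'NNP')]),
    Tree('ORGANIZATION', [('Inc.', 'NNP')]),
    ('is', 'VBZ'),
    ('headquartered', 'VBN'),
    ('in', 'IN'),
    Tree('GPE', [('Cupertino', 'NNP')]),
    (',', ','),
    Tree('GPE', [('California', 'NNP')]),
    ('.', '.'),
])

# Entity chunks are subtrees; plain (word, tag) tuples are tokens outside any entity
entities = [
    (" ".join(word for word, tag in child.leaves()), child.label())
    for child in ner_tree
    if isinstance(child, Tree)
]

print(entities)
# [('Apple', 'PERSON'), ('Inc.', 'ORGANIZATION'), ('Cupertino', 'GPE'), ('California', 'GPE')]
```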


# 4. Sentiment Analysis:
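**NLTK provides resources for sentiment analysis, including the lexicon-based VADER analyzer used here. For a piece of text it returns negative, neutral, and positive proportions plus an overall `compound` score.**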

In [24]:
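nltk.download('vader_lexicon')  # VADER lexicon, saved to the default NLTK data directory
nltk.download('vader_lexicon', download_dir='path_to_download_directory')  # placeholder path; replace it and make sure it is on nltk.data.path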


[nltk_data] Downloading package vader_lexicon to
[nltk_data]     C:\Users\HP\AppData\Roaming\nltk_data...
[nltk_data] Downloading package vader_lexicon to
[nltk_data]     path_to_download_directory...


True

In [25]:
import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer

# Sample text
text = "NLTK is a fantastic library for natural language processing."

# Initialize the sentiment analyzer
analyzer = SentimentIntensityAnalyzer()

# Analyze the sentiment of the text
sentiment_scores = analyzer.polarity_scores(text)

print("Sentiment Analysis Scores:")
print(sentiment_scores)


Sentiment Analysis Scores:
{'neg': 0.0, 'neu': 0.496, 'pos': 0.504, 'compound': 0.7269}
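
The `compound` value is a normalized score between -1 and 1. A commonly used convention, suggested by VADER's authors, treats values of 0.05 or above as positive and -0.05 or below as negative. The sketch below applies that rule to the scores printed above; the helper name `label_sentiment` and the default threshold are assumptions for illustration, not part of NLTK.

```python
def label_sentiment(compound, threshold=0.05):
    """Map a VADER compound score to a coarse label (hypothetical helper)."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"

# Scores copied from the output above
sentiment_scores = {'neg': 0.0, 'neu': 0.496, 'pos': 0.504, 'compound': 0.7269}

print(label_sentiment(sentiment_scores['compound']))  # positive
```

The threshold is a tunable choice; a wider neutral band may suit texts where weak sentiment should not be labeled.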
