# NLTK

NLTK (Natural Language Toolkit) is a widely used Python library for natural language processing (NLP). It provides tools and resources for tokenization, stemming, tagging, parsing, sentiment analysis, and more. The sections below walk through some common NLTK features with examples.

### 1. Tokenization:
Tokenization splits text into individual tokens, such as words and punctuation marks.

In [2]:
import nltk

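nltk.download('punkt')  # tokenizer models; recent NLTK releases may ask for 'punkt_tab' instead
# nltk.download('all')  # alternative: download every NLTK resource (large)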

text = "This is a sample sentence for tokenization."
tokens = nltk.word_tokenize(text)
print(tokens)


[nltk_data] Downloading package punkt to
[nltk_data]     C:\Users\Barzan\AppData\Roaming\nltk_data...
[nltk_data]   Unzipping tokenizers\punkt.zip.


['This', 'is', 'a', 'sample', 'sentence', 'for', 'tokenization', '.']
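
To see why a dedicated tokenizer is useful, the sketch below compares plain whitespace splitting with a rough regular-expression approximation. It uses only the standard library, and the regex is an illustrative assumption, not the algorithm behind `nltk.word_tokenize`; it happens to match the output above for this simple sentence but will differ on contractions, abbreviations, and similar cases.

```python
import re

text = "This is a sample sentence for tokenization."

# Whitespace splitting leaves the final period glued to the last word.
whitespace_tokens = text.split()

# Rough approximation (not NLTK's algorithm): runs of word characters,
# or any single character that is neither a word character nor whitespace.
regex_tokens = re.findall(r"\w+|[^\w\s]", text)

print(whitespace_tokens)
print(regex_tokens)
```

The first list ends with `'tokenization.'`, while the second separates the period into its own token, as the NLTK output does.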


### 2. Part-of-Speech (POS) Tagging:
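POS tagging labels each word in a sentence with its part of speech, such as noun, verb, or adjective. The tags returned by `nltk.pos_tag` follow the Penn Treebank tag set (for example, `NNP` is a proper noun and `VBZ` is a third-person singular present-tense verb).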

In [5]:
import nltk

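# nltk.download('punkt')  # needed by word_tokenize if not already downloaded
nltk.download('averaged_perceptron_tagger')  # recent NLTK releases may ask for 'averaged_perceptron_tagger_eng' instead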

text = "NLTK provides various NLP tools and resources."
tokens = nltk.word_tokenize(text)
pos_tags = nltk.pos_tag(tokens)
print(pos_tags)


[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     C:\Users\Barzan\AppData\Roaming\nltk_data...
[nltk_data]   Unzipping taggers\averaged_perceptron_tagger.zip.


[('NLTK', 'NNP'), ('provides', 'VBZ'), ('various', 'JJ'), ('NLP', 'NNP'), ('tools', 'NNS'), ('and', 'CC'), ('resources', 'NNS'), ('.', '.')]
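
Once text is tagged, the result is an ordinary list of `(word, tag)` tuples, so it can be filtered and counted with plain Python. The sketch below copies the output above into a literal list (so it runs without NLTK) and pulls out the nouns; the variable names `nouns` and `tag_counts` are just illustrative.

```python
from collections import Counter

# The tagged tokens printed above, copied in as a literal list.
pos_tags = [('NLTK', 'NNP'), ('provides', 'VBZ'), ('various', 'JJ'),
            ('NLP', 'NNP'), ('tools', 'NNS'), ('and', 'CC'),
            ('resources', 'NNS'), ('.', '.')]

# Penn Treebank noun tags all start with "NN" (NN, NNS, NNP, NNPS).
nouns = [word for word, tag in pos_tags if tag.startswith('NN')]

# Count how often each tag occurs.
tag_counts = Counter(tag for _, tag in pos_tags)

print(nouns)
print(tag_counts.most_common(2))
```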


### 3. Stemming:
Stemming reduces words to a base or root form by stripping suffixes with heuristic rules. The resulting stem is not always a dictionary word.

In [6]:
from nltk.stem import PorterStemmer

stemmer = PorterStemmer()
word = "running"
stemmed_word = stemmer.stem(word)
print(stemmed_word)


run
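
The idea of rule-based suffix stripping can be sketched in a few lines of plain Python. The function below is a deliberately tiny toy, not the Porter algorithm (which applies several ordered rule phases); `toy_stem` and its suffix list are hypothetical names chosen for illustration.

```python
def toy_stem(word):
    """Strip a few common English suffixes (illustration only, not Porter)."""
    for suffix in ("ing", "ed", "es", "s"):
        # Require at least three remaining characters so short words stay intact.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            stem = word[: -len(suffix)]
            # Undo consonant doubling, e.g. "runn" -> "run".
            if stem[-1] == stem[-2] and stem[-1] not in "aeiouls":
                stem = stem[:-1]
            return stem
    return word


words = ["running", "jumped", "cats", "studies", "sing"]
stems = {w: toy_stem(w) for w in words}
print(stems)
```

Even this toy version shows the main trade-off of stemming: it is fast and needs no dictionary, but a stem such as `studi` is not a real word.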


### 4. Sentiment Analysis:
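Sentiment analysis determines the sentiment (positive, negative, or neutral) of a given text. NLTK's `SentimentIntensityAnalyzer` uses VADER, a lexicon- and rule-based model. It returns the proportions of negative, neutral, and positive content, plus a `compound` score normalized to the range -1 to 1.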

In [8]:
import nltk
from nltk.sentiment import SentimentIntensityAnalyzer

nltk.download('vader_lexicon')

sia = SentimentIntensityAnalyzer()
text = "This movie is great!"
sentiment_scores = sia.polarity_scores(text)
print(sentiment_scores)


[nltk_data] Downloading package vader_lexicon to
[nltk_data]     C:\Users\Barzan\AppData\Roaming\nltk_data...


{'neg': 0.0, 'neu': 0.406, 'pos': 0.594, 'compound': 0.6588}
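
The score dictionary is often reduced to a single label by thresholding the `compound` value. The sketch below assumes the commonly used cutoffs of +0.05 and -0.05; `label_sentiment` is a hypothetical helper, not part of NLTK, and the threshold is a convention that can be tuned for a given dataset.

```python
def label_sentiment(scores, threshold=0.05):
    """Map a VADER-style score dict to 'positive', 'negative', or 'neutral'."""
    compound = scores["compound"]
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


# The scores printed above, copied in as a literal dict.
sentiment_scores = {'neg': 0.0, 'neu': 0.406, 'pos': 0.594, 'compound': 0.6588}
print(label_sentiment(sentiment_scores))
```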


### 5. Chunking and Named Entity Recognition (NER):
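Chunking groups words into "chunks" based on their part-of-speech tags, and NER identifies named entities such as people, organizations, and locations in a sentence. `nltk.ne_chunk` takes POS-tagged tokens and returns a tree whose labeled subtrees are the recognized entities.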

In [10]:
import nltk

nltk.download('maxent_ne_chunker')
nltk.download('words')

text = "John works at Google in London."
tokens = nltk.word_tokenize(text)
pos_tags = nltk.pos_tag(tokens)

chunked = nltk.ne_chunk(pos_tags)
print(chunked)

# Extract PERSON and ORGANIZATION entities (GPE entities such as London are skipped)
named_entities = []
for subtree in chunked.subtrees(filter=lambda t: t.label() in ('PERSON', 'ORGANIZATION')):
    named_entities.append(' '.join(word for word, tag in subtree.leaves()))

print(named_entities)


[nltk_data] Downloading package maxent_ne_chunker to
[nltk_data]     C:\Users\Barzan\AppData\Roaming\nltk_data...
[nltk_data]   Unzipping chunkers\maxent_ne_chunker.zip.
[nltk_data] Downloading package words to
[nltk_data]     C:\Users\Barzan\AppData\Roaming\nltk_data...
[nltk_data]   Unzipping corpora\words.zip.


(S
  (PERSON John/NNP)
  works/VBZ
  at/IN
  (ORGANIZATION Google/NNP)
  in/IN
  (GPE London/NNP)
  ./.)
['John', 'Google']
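
The extraction loop above keeps only two entity types and discards the labels. One way to keep every entity together with its type is to work on a flat IOB representation, the `(word, POS tag, IOB tag)` triples that `nltk.chunk.tree2conlltags` produces from a chunk tree. The sketch below is plain Python: the triples are hand-written to mirror the tree printed above, and `entities_from_iob` is a hypothetical helper, not an NLTK function.

```python
# Hand-written IOB triples mirroring the chunk tree above.
iob_triples = [
    ('John', 'NNP', 'B-PERSON'), ('works', 'VBZ', 'O'), ('at', 'IN', 'O'),
    ('Google', 'NNP', 'B-ORGANIZATION'), ('in', 'IN', 'O'),
    ('London', 'NNP', 'B-GPE'), ('.', '.', 'O'),
]


def entities_from_iob(triples):
    """Group IOB-tagged tokens into (entity text, label) pairs."""
    entities = []
    words, label = [], None
    for word, _pos, iob in triples:
        if iob.startswith('B-'):
            # A new entity begins; flush the one in progress, if any.
            if words:
                entities.append((' '.join(words), label))
            words, label = [word], iob[2:]
        elif iob.startswith('I-') and words:
            # Continuation of the current entity.
            words.append(word)
        else:
            # Outside any entity; flush and reset.
            if words:
                entities.append((' '.join(words), label))
            words, label = [], None
    if words:
        entities.append((' '.join(words), label))
    return entities


print(entities_from_iob(iob_triples))
```

With this input, the `GPE` entity for London is reported alongside the person and organization.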


These examples cover only a few of NLTK's features. The official NLTK documentation describes its other capabilities and how to apply them to various NLP tasks.