# 📘 Facebook Sentiment Analysis using NLTK

## 1️⃣ Introduction

**Sentiment Analysis** is a Natural Language Processing (NLP) technique used to determine the emotional tone behind a body of text. It is widely used to analyze user opinions in posts on social media platforms such as Facebook and Twitter, and in online reviews.

In this mini project, we demonstrate how sentiment analysis can be performed on text data using **NLTK (Natural Language Toolkit)**. The project covers essential NLP steps including tokenization, stemming, lemmatization, part-of-speech tagging, and sentiment scoring using the VADER sentiment analyzer.

## 2️⃣ Dataset Description

The dataset used in this project is a text file named `kindle.txt`, which contains English text data.
Although this dataset is not directly scraped from Facebook, the techniques demonstrated in this notebook are **applicable to Facebook posts, comments, and other social media text data**.

Dataset characteristics:

* Plain text format

* Multiple sentences and paragraphs

* Used for demonstrating NLP preprocessing and sentiment analysis

## 3️⃣ Import Required Libraries

In [1]:
import nltk
import re
import string
import numpy as np
import pandas as pd

from nltk.tokenize import word_tokenize, sent_tokenize
from nltk.stem.porter import PorterStemmer
from nltk.stem.wordnet import WordNetLemmatizer
from nltk.sentiment.vader import SentimentIntensityAnalyzer


## 4️⃣ Download Required NLTK Resources

In [2]:
nltk.download('punkt')
nltk.download('punkt_tab')
nltk.download('wordnet')
nltk.download('omw-1.4')
nltk.download('vader_lexicon')
nltk.download('averaged_perceptron_tagger')
nltk.download('averaged_perceptron_tagger_eng')


[nltk_data] Downloading package punkt to
[nltk_data]     C:\Users\ASUS\AppData\Roaming\nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package punkt_tab to
[nltk_data]     C:\Users\ASUS\AppData\Roaming\nltk_data...
[nltk_data]   Package punkt_tab is already up-to-date!
[nltk_data] Downloading package wordnet to
[nltk_data]     C:\Users\ASUS\AppData\Roaming\nltk_data...
[nltk_data]   Package wordnet is already up-to-date!
[nltk_data] Downloading package omw-1.4 to
[nltk_data]     C:\Users\ASUS\AppData\Roaming\nltk_data...
[nltk_data]   Package omw-1.4 is already up-to-date!
[nltk_data] Downloading package vader_lexicon to
[nltk_data]     C:\Users\ASUS\AppData\Roaming\nltk_data...
[nltk_data]   Package vader_lexicon is already up-to-date!
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     C:\Users\ASUS\AppData\Roaming\nltk_data...
[nltk_data]   Package averaged_perceptron_tagger is already up-to-
[nltk_data]       date!

True

## 5️⃣ Load and Read the Text Data

In [3]:
with open('kindle.txt', encoding='ISO-8859-2') as f:
    raw_text = f.read()

print("Sample Text:\n")
print(raw_text[:500])


Sample Text:

Drug Runners and  a U.S. Senator have something to do with the Murder http://www.amazon.com/Circumstantial-Evidence-Getting-Florida-Bozarth-ebook/dp/B004FPZ452/ref=pd_rhf_p_t_1 The State Attorney Knows... NOW So Will You. GET Ypur Copy TODAY
Heres a single, to add, to Kindle. Just read this 19th century story: "The Ghost of Round Island". Its about a man (French/American Indian) and his dog sled transporting a woman across the ice, from Mackinac Island to Cheboygan - and the ghost that...
If you
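
The cell above hard-codes `ISO-8859-2`. When the encoding of a text file is not known in advance, one option is to try a stricter encoding first and fall back to a permissive one. The following is a minimal sketch under that assumption; the helper name `read_text_with_fallback` and the order of encodings are illustrative choices, not part of the original notebook.

```python
from pathlib import Path


def read_text_with_fallback(path, encodings=("utf-8", "ISO-8859-2")):
    """Decode a file with the first encoding that works.

    Returns a (text, encoding_used) tuple. The default encoding order is an
    assumption for illustration; adjust it to the data at hand.
    """
    data = Path(path).read_bytes()
    for encoding in encodings:
        try:
            return data.decode(encoding), encoding
        except UnicodeDecodeError:
            # This encoding cannot represent the bytes; try the next one.
            continue
    raise ValueError(f"none of {encodings} could decode {path}")
```

In the notebook this could be called as `raw_text, used = read_text_with_fallback('kindle.txt')`, which also records which encoding was actually applied.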


## 6️⃣ Text Preprocessing

In this step, we perform basic cleaning such as:

* Converting text to lowercase

* Removing digits and ASCII punctuation

In [4]:
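# Lowercase so that "Kindle" and "kindle" are treated as the same token
clean_text = raw_text.lower()
# Remove runs of digits
clean_text = re.sub(r'\d+', '', clean_text)
# Delete every ASCII punctuation character listed in string.punctuation
clean_text = clean_text.translate(str.maketrans('', '', string.punctuation))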

print(clean_text[:500])


drug runners and  a us senator have something to do with the murder httpwwwamazoncomcircumstantialevidencegettingfloridabozarthebookdpbfpzrefpdrhfpt the state attorney knows now so will you get ypur copy today
heres a single to add to kindle just read this th century story the ghost of round island its about a man frenchamerican indian and his dog sled transporting a woman across the ice from mackinac island to cheboygan  and the ghost that
if you tire of nonfiction check out httpwwwamazoncomsre
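
In the output above, each URL collapses into one long token (`httpwwwamazoncom...`) because only its punctuation is removed. A minimal sketch of one way to avoid this, assuming URLs carry no sentiment and can simply be dropped before the other cleaning steps; the pattern `URL_PATTERN` and the function name are illustrative, and the regex is deliberately simple rather than a complete URL matcher.

```python
import re
import string

# Simplified pattern (an assumption): anything starting with http(s):// or www.
URL_PATTERN = re.compile(r"https?://\S+|www\.\S+")


def clean_text_without_urls(text: str) -> str:
    """Drop URLs, then apply the same lowercase/digit/punctuation cleaning."""
    text = URL_PATTERN.sub(" ", text)
    text = text.lower()
    text = re.sub(r"\d+", "", text)
    text = text.translate(str.maketrans("", "", string.punctuation))
    # Collapse repeated spaces and tabs left behind by the removals,
    # but keep newlines so line structure is preserved.
    return re.sub(r"[ \t]+", " ", text).strip()


sample = "Check out http://www.amazon.com/dp/B004FPZ452 for 19th century stories!"
print(clean_text_without_urls(sample))
# -> check out for th century stories
```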


## 7️⃣ Sentence and Word Tokenization

### Sentence Tokenization

In [5]:
sentences = sent_tokenize(clean_text)
print("Sample Sentences:\n")
print(sentences[:5])


Sample Sentences:

['drug runners and  a us senator have something to do with the murder httpwwwamazoncomcircumstantialevidencegettingfloridabozarthebookdpbfpzrefpdrhfpt the state attorney knows now so will you get ypur copy today\nheres a single to add to kindle just read this th century story the ghost of round island its about a man frenchamerican indian and his dog sled transporting a woman across the ice from mackinac island to cheboygan  and the ghost that\nif you tire of nonfiction check out httpwwwamazoncomsrefnbsbnossurlsearchaliasdapsfieldkeywordsdanielleleezwisslerxy\nghost of round island is supposedly nonfiction\nwhy is barnes and nobles version of the kindle so much more expensive than the kindle\nmaria  do you mean the nook  be careful books you buy for the kindle are for that piece of electronics and vice versa  i love my kindle there are people that swear by the nook  they like the color screenme  i want an ereader that is a reader so i dont need color  the kindle batt
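
The first element in the output above spans several lines of the file joined by `\n`, which suggests that `sent_tokenize` found few sentence boundaries once punctuation had been removed. A minimal sketch of an alternative unit of analysis, assuming each line of `kindle.txt` holds one post or comment (an assumption based on the sample output, not something the dataset description states); `split_into_posts` is a hypothetical helper name.

```python
def split_into_posts(raw_text: str) -> list[str]:
    """Return the non-empty lines of the text, one per assumed post."""
    return [line.strip() for line in raw_text.splitlines() if line.strip()]


sample = "I love my Kindle.\n\nWhy is the Nook so expensive?\n"
posts = split_into_posts(sample)
print(len(posts), posts)
```

Applied to the uncleaned `raw_text`, this keeps punctuation intact for any later step that relies on it.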

### Word Tokenization

In [6]:
tokens = word_tokenize(clean_text)
print("Sample Tokens:\n")
print(tokens[:20])


Sample Tokens:

['drug', 'runners', 'and', 'a', 'us', 'senator', 'have', 'something', 'to', 'do', 'with', 'the', 'murder', 'httpwwwamazoncomcircumstantialevidencegettingfloridabozarthebookdpbfpzrefpdrhfpt', 'the', 'state', 'attorney', 'knows', 'now', 'so']
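
For social media text, NLTK also provides `TweetTokenizer`, which is designed to keep URLs and emoticons as single tokens and can strip `@handles`. The sketch below is an illustration on a made-up sentence, not the notebook's method; it runs on raw text rather than on `clean_text`, and it does not need any downloaded NLTK data.

```python
from nltk.tokenize import TweetTokenizer

# reduce_len shortens long character runs; strip_handles removes @mentions.
tokenizer = TweetTokenizer(reduce_len=True, strip_handles=True)

# Made-up example in the style of a social media post.
sample = "@maria I looove my Kindle!!! See http://www.amazon.com/kindle :)"
social_tokens = tokenizer.tokenize(sample)
print(social_tokens)
```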


## 8️⃣ Stemming using Porter Stemmer

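Stemming strips suffixes by rule to reduce related words to a common stem. The stem is not always a dictionary word: in the output below, `senator` becomes `senat` and `something` becomes `someth`.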

In [7]:
porter = PorterStemmer()

stemmed_words = [porter.stem(word) for word in tokens[:20]]

for original, stemmed in zip(tokens[:20], stemmed_words):
    print(f"{original} → {stemmed}")


drug → drug
runners → runner
and → and
a → a
us → us
senator → senat
have → have
something → someth
to → to
do → do
with → with
the → the
murder → murder
httpwwwamazoncomcircumstantialevidencegettingfloridabozarthebookdpbfpzrefpdrhfpt → httpwwwamazoncomcircumstantialevidencegettingfloridabozarthebookdpbfpzrefpdrhfpt
the → the
state → state
attorney → attorney
knows → know
now → now
so → so
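
The practical value of stemming is that inflected variants of a word collapse to the same stem, so they can be counted together. A small sketch on a hand-picked word family (the word list is illustrative and not taken from `kindle.txt`):

```python
from nltk.stem.porter import PorterStemmer

porter = PorterStemmer()

# Hand-picked variants of one word family, chosen for illustration.
variants = ["connect", "connected", "connecting", "connection", "connections"]
stems = {word: porter.stem(word) for word in variants}

for word, stem in stems.items():
    print(f"{word} → {stem}")
```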


## 9️⃣ Lemmatization using WordNet Lemmatizer

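Lemmatization maps words to their dictionary base forms (lemmas) using WordNet. `WordNetLemmatizer.lemmatize` treats every word as a noun unless a `pos` argument is given, which is why `us` is reduced to `u` in the output below.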

In [8]:
lemmatizer = WordNetLemmatizer()

lemmatized_words = [lemmatizer.lemmatize(word) for word in tokens[:20]]

for original, lemma in zip(tokens[:20], lemmatized_words):
    print(f"{original} → {lemma}")


drug → drug
runners → runner
and → and
a → a
us → u
senator → senator
have → have
something → something
to → to
do → do
with → with
the → the
murder → murder
httpwwwamazoncomcircumstantialevidencegettingfloridabozarthebookdpbfpzrefpdrhfpt → httpwwwamazoncomcircumstantialevidencegettingfloridabozarthebookdpbfpzrefpdrhfpt
the → the
state → state
attorney → attorney
knows → know
now → now
so → so


## 🔟 Part-of-Speech (POS) Tagging

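POS tagging labels each token with its part of speech. `nltk.pos_tag` uses Penn Treebank tags by default, for example `NN` (singular noun), `NNS` (plural noun), `VB` (base-form verb), `DT` (determiner), and `RB` (adverb). On this lowercased, punctuation-free input some tags are questionable: `knows` is tagged `NNS`, and the mangled URL is tagged `VBD`.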

In [9]:
pos_tags = nltk.pos_tag(tokens[:30])
print(pos_tags)


[('drug', 'NN'), ('runners', 'NNS'), ('and', 'CC'), ('a', 'DT'), ('us', 'PRP'), ('senator', 'NN'), ('have', 'VBP'), ('something', 'NN'), ('to', 'TO'), ('do', 'VB'), ('with', 'IN'), ('the', 'DT'), ('murder', 'NN'), ('httpwwwamazoncomcircumstantialevidencegettingfloridabozarthebookdpbfpzrefpdrhfpt', 'VBD'), ('the', 'DT'), ('state', 'NN'), ('attorney', 'NN'), ('knows', 'NNS'), ('now', 'RB'), ('so', 'RB'), ('will', 'MD'), ('you', 'PRP'), ('get', 'VB'), ('ypur', 'JJ'), ('copy', 'NN'), ('today', 'NN'), ('heres', 'VBZ'), ('a', 'DT'), ('single', 'JJ'), ('to', 'TO')]
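
POS tags like those above can also inform lemmatization, since `WordNetLemmatizer.lemmatize` accepts a `pos` argument (`'n'`, `'v'`, `'a'`, or `'r'`). A minimal sketch of the mapping step, assuming Penn Treebank tags as input; `penn_to_wordnet` is a hypothetical helper, and the tagged pairs are hand-written for illustration rather than copied from the tagger output.

```python
def penn_to_wordnet(tag: str) -> str:
    """Map a Penn Treebank tag to a WordNet POS letter, defaulting to noun."""
    if tag.startswith("J"):   # JJ, JJR, JJS -> adjective
        return "a"
    if tag.startswith("V"):   # VB, VBD, VBG, VBN, VBP, VBZ -> verb
        return "v"
    if tag.startswith("RB"):  # RB, RBR, RBS -> adverb
        return "r"
    return "n"                # everything else is treated as a noun


# Hand-written (word, tag) pairs for illustration.
tagged = [("runners", "NNS"), ("knows", "VBZ"), ("single", "JJ"), ("now", "RB")]
for word, tag in tagged:
    print(word, tag, penn_to_wordnet(tag))
```

In the notebook, the result could be passed along as `lemmatizer.lemmatize(word, pos=penn_to_wordnet(tag))` for each pair returned by `nltk.pos_tag`.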


## 1️⃣1️⃣ Sentiment Analysis using VADER

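VADER (Valence Aware Dictionary and sEntiment Reasoner) is a lexicon- and rule-based sentiment analyzer designed for social media text. Its rules also use cues such as capitalization and exclamation marks, so it is usually applied to raw text rather than to lowercased, punctuation-stripped text.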

In [10]:
sid = SentimentIntensityAnalyzer()


### Sentence-wise Sentiment Scores

In [11]:
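sentiment_results = []

for sentence in sentences:
    # Skip empty or whitespace-only sentences
    if sentence.strip():
        # polarity_scores returns 'neg', 'neu', 'pos', and 'compound'
        scores = sid.polarity_scores(sentence)
        sentiment_results.append(scores['compound'])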
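
The loop above keeps only the compound scores, so a score can no longer be traced back to its text. A minimal sketch of a variant that keeps both, written against any callable that returns a dict with a `'compound'` key (such as `sid.polarity_scores`). The `toy_scorer` below is a hypothetical stand-in so the sketch runs without the VADER lexicon; its scores are made up and do not reflect VADER's output.

```python
from typing import Callable


def score_units(units, polarity_scores: Callable[[str], dict]):
    """Pair each non-empty text unit with its compound score."""
    results = []
    for unit in units:
        text = unit.strip()
        if text:
            results.append((text, polarity_scores(text)["compound"]))
    return results


# Hypothetical stand-in scorer with made-up values, used only for this demo.
def toy_scorer(text: str) -> dict:
    return {"compound": 0.5 if "love" in text.lower() else 0.0}


pairs = score_units(["I love my Kindle", "  ", "It arrived today"], toy_scorer)
for text, compound in pairs:
    print(f"{compound:+.2f}  {text}")
```

In the notebook, the equivalent call would be `score_units(sentences, sid.polarity_scores)`.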


## 1️⃣2️⃣ Overall Sentiment Calculation

In [12]:
average_sentiment = np.mean(sentiment_results)
print("Average Sentiment Score:", average_sentiment)


Average Sentiment Score: 1.0


### Sentiment Interpretation

In [13]:
if average_sentiment > 0.05:
    print("Overall Sentiment: Positive")
elif average_sentiment < -0.05:
    print("Overall Sentiment: Negative")
else:
    print("Overall Sentiment: Neutral")


Overall Sentiment: Positive
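
The same ±0.05 thresholds can be applied to each score individually, which shows how the labels are distributed instead of reducing everything to one average. A minimal sketch; `label_from_compound` is a hypothetical helper, and the scores in `example_scores` are made up for illustration, not results from `kindle.txt`.

```python
from collections import Counter


def label_from_compound(score: float, threshold: float = 0.05) -> str:
    """Apply the same cut-offs as the interpretation cell to one score."""
    if score > threshold:
        return "Positive"
    if score < -threshold:
        return "Negative"
    return "Neutral"


# Made-up compound scores, for illustration only.
example_scores = [0.64, -0.30, 0.0, 0.02, 0.91]
labels = [label_from_compound(score) for score in example_scores]
distribution = Counter(labels)
print(distribution)
```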


## 1️⃣3️⃣ Results and Interpretation

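The compound score is VADER's normalized summary of a text's overall polarity, ranging from -1 (most negative) to +1 (most positive). This notebook interprets it as follows:

* Scores above 0.05 indicate positive sentiment.

* Scores below -0.05 indicate negative sentiment.

* Scores between -0.05 and 0.05 are treated as neutral.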

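With an average compound score of 1.0, well above the 0.05 threshold, the dataset is classified as positive overall. This figure should be read with caution: punctuation was removed before sentence tokenization, so `sent_tokenize` had few boundaries to split on (the first "sentence" in the sample output spans several lines of the file), and the average is therefore computed over very long spans of text rather than over individual sentences.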

## 1️⃣4️⃣ Conclusion

In this mini project, we successfully implemented a text sentiment analysis pipeline using NLTK. The project demonstrated:

* Text preprocessing techniques

* Tokenization (sentence and word)

* Stemming and lemmatization

* Part-of-speech tagging

* Sentiment analysis using VADER

Although the dataset was not directly sourced from Facebook, the same methodology can be applied to Facebook posts, comments, and reviews with minimal modifications.

This project provides a strong foundation for understanding Natural Language Processing and sentiment analysis concepts.