# Sentiment Analysis with NLTK

Sentiment analysis, also known as opinion mining, is the NLP task of determining the sentiment expressed in a piece of text. Its applications include social media monitoring, brand reputation management, market research, and customer feedback analysis. NLTK supports both rule-based and machine learning approaches to the task. Rule-based approaches rely on predefined linguistic rules or lexicons to determine the sentiment of words or phrases in a text.
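
To make the rule-based idea concrete, here is a minimal sketch of a lexicon scorer. The word weights, the negation list, and the function names are hypothetical and chosen only for illustration; real lexicons are far larger and handle many more cases.

```python
# Toy lexicon mapping words to polarity weights (hypothetical values)
TOY_LEXICON = {
    "love": 2.0,
    "great": 1.5,
    "fair": 0.5,
    "terrible": -2.0,
    "disappointed": -1.5,
}
NEGATIONS = {"not", "never", "no"}


def lexicon_score(text):
    """Sum word polarities, flipping the word that directly follows a negation."""
    score = 0.0
    negate = False
    for raw in text.lower().split():
        word = raw.strip(".,!?;:\"'")
        if word in NEGATIONS:
            negate = True
            continue
        weight = TOY_LEXICON.get(word, 0.0)
        score += -weight if negate else weight
        negate = False  # a negation only affects the next word
    return score


def lexicon_label(text):
    """Map the summed score to a coarse sentiment label."""
    score = lexicon_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(lexicon_label("I love this product"))
print(lexicon_label("I do not love this product"))
```

The negation rule here is deliberately crude: it only looks one word ahead, which is enough to show how rules and a lexicon combine.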

One popular rule-based approach is VADER (Valence Aware Dictionary and sEntiment Reasoner), which is included in NLTK. It scores text using a prebuilt sentiment lexicon and a set of heuristic rules, so it needs no training. Machine learning methods use labeled datasets to train models that classify text as positive, negative, or neutral. NLTK offers functionality to preprocess and prepare data for such models. It also ships classifiers such as Naive Bayes and Maximum Entropy, and it can wrap scikit-learn estimators such as Support Vector Machines.
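
As a small illustration of the machine learning route, the sketch below trains NLTK's `NaiveBayesClassifier` on a handful of made-up examples. The training sentences and the `bag_of_words` helper are assumptions for this example; a real classifier needs far more labeled data.

```python
from nltk.classify import NaiveBayesClassifier


def bag_of_words(text):
    """Represent a text as a dict of word-presence features."""
    return {word: True for word in text.lower().split()}


# Hypothetical toy training set of (features, label) pairs
train_set = [
    (bag_of_words("I love this product"), "positive"),
    (bag_of_words("great quality and great price"), "positive"),
    (bag_of_words("this product is terrible"), "negative"),
    (bag_of_words("awful quality and poor service"), "negative"),
]

classifier = NaiveBayesClassifier.train(train_set)

# Classify a new sentence using the same feature representation
prediction = classifier.classify(bag_of_words("I love the great price"))
print(prediction)
```

NLTK classifiers expect each example as a feature dictionary paired with a label, which is why the text is converted with `bag_of_words` before training and before prediction.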

To illustrate sentiment analysis end to end, consider a scenario where we want to analyze the sentiment expressed in a collection of tweets about a particular product. The example first trains an SVM classifier with scikit-learn on text preprocessed with NLTK, then scores unseen tweets with NLTK's built-in VADER analyzer:

In [1]:
import nltk
from nltk.corpus import stopwords
from nltk.sentiment import SentimentIntensityAnalyzer
from nltk.tokenize import word_tokenize
from nltk.stem import WordNetLemmatizer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
from sklearn.svm import SVC

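# Download necessary NLTK resources (only required once)
nltk.download('vader_lexicon')
nltk.download('stopwords')
nltk.download('wordnet')
nltk.download('punkt')  # tokenizer models used by word_tokenize; newer NLTK releases may need 'punkt_tab' instead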

# Labeled data for training a sentiment classifier
# Each entry is a (tweet, label) tuple, e.g. ("I love this product", "positive")
labeled_data = [
    ("I love this product", "positive"),
    ("This product is terrible", "negative"),
    ("The quality could be better", "neutral"),
    # Add more labeled data here...
]


[nltk_data] Downloading package vader_lexicon to
[nltk_data]     C:\Users\Emy\AppData\Roaming\nltk_data...
[nltk_data]   Package vader_lexicon is already up-to-date!
[nltk_data] Downloading package stopwords to
[nltk_data]     C:\Users\Emy\AppData\Roaming\nltk_data...
[nltk_data]   Package stopwords is already up-to-date!
[nltk_data] Downloading package wordnet to
[nltk_data]     C:\Users\Emy\AppData\Roaming\nltk_data...
[nltk_data]   Package wordnet is already up-to-date!


In [2]:
# Preprocess the labeled data
stop_words = set(stopwords.words('english'))
lemmatizer = WordNetLemmatizer()

preprocessed_data = []
labels = []

for tweet, label in labeled_data:
    tokens = word_tokenize(tweet.lower())
    filtered_tokens = [token for token in tokens if token not in stop_words]
    lemmatized_tokens = [lemmatizer.lemmatize(token) for token in filtered_tokens]
    preprocessed_tweet = ' '.join(lemmatized_tokens)
    
    preprocessed_data.append(preprocessed_tweet)
    labels.append(label)


In [3]:
# Split the preprocessed data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(preprocessed_data, labels, test_size=0.2, random_state=42)
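
With only three labeled examples, the split above leaves a single tweet in the test set. The sketch below shows how a stratified split behaves once each class has several examples. The twelve texts are hypothetical placeholders, and the names `toy_texts` and `toy_labels` are introduced only for this illustration.

```python
from collections import Counter

from sklearn.model_selection import train_test_split

# Hypothetical toy dataset: four examples per class
toy_texts = [
    "love product", "great value", "works perfectly", "highly recommend",
    "product terrible", "broke quickly", "waste money", "very disappointed",
    "quality could better", "arrived tuesday", "average product", "nothing special",
]
toy_labels = ["positive"] * 4 + ["negative"] * 4 + ["neutral"] * 4

# stratify keeps the class proportions the same in both splits
toy_X_train, toy_X_test, toy_y_train, toy_y_test = train_test_split(
    toy_texts, toy_labels, test_size=0.25, random_state=42, stratify=toy_labels
)

print(Counter(toy_y_train))
print(Counter(toy_y_test))
```

Stratifying guarantees that every class appears in both the training and the testing set, which avoids evaluating on a label the model never saw.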


In [10]:
preprocessed_data

['love product', 'product terrible', 'quality could better']

In [4]:
# Vectorize the preprocessed data using TF-IDF
vectorizer = TfidfVectorizer()
X_train_vectors = vectorizer.fit_transform(X_train)
X_test_vectors = vectorizer.transform(X_test)


In [5]:
# Train a Support Vector Machine (SVM) classifier
svm_classifier = SVC(kernel='linear')
svm_classifier.fit(X_train_vectors, y_train)
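
The vectorizing and training steps can also be chained into a single scikit-learn pipeline, so that raw strings go in and labels come out. This is a sketch under assumed toy data, not a replacement for the cells above; `train_texts`, `train_labels`, and `model` are names chosen for this example.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Hypothetical toy training data, already preprocessed
train_texts = ["love product", "great product", "product terrible", "awful product"]
train_labels = ["positive", "positive", "negative", "negative"]

# The pipeline fits the vectorizer and the classifier together
model = make_pipeline(TfidfVectorizer(), SVC(kernel="linear"))
model.fit(train_texts, train_labels)

# predict() applies the fitted vectorizer before calling the classifier
predictions = model.predict(["love great product", "terrible awful quality"])
print(list(predictions))
```

Bundling both steps keeps the vectorizer's vocabulary tied to the training data, which removes the risk of accidentally refitting it on test text.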


In [6]:
# Evaluate the trained classifier on the testing set
y_pred = svm_classifier.predict(X_test_vectors)
report = classification_report(y_test, y_pred)  # separate name so the imported function is not overwritten
print("Classification Report:")
print(report)

Classification Report:
              precision    recall  f1-score   support

    negative       0.00      0.00      0.00       0.0
    positive       0.00      0.00      0.00       1.0

    accuracy                           0.00       1.0
   macro avg       0.00      0.00      0.00       1.0
weighted avg       0.00      0.00      0.00       1.0



The zero scores in this report, and the scikit-learn `UndefinedMetricWarning` messages that accompany it, are expected here: with three labeled examples, the test split holds a single tweet whose label the classifier never saw during training. Add more labeled data before reading anything into these numbers.
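
The situation in the report can be reproduced in isolation. The sketch below uses one true label and one wrong prediction, mirroring the single-tweet test set, and passes `zero_division=0` so that undefined precision and recall values are reported as 0 without warnings. The `y_true_demo` and `y_pred_demo` lists are made up for this illustration.

```python
from sklearn.metrics import classification_report

# One test example, predicted incorrectly (mirrors the report above)
y_true_demo = ["positive"]
y_pred_demo = ["negative"]

# output_dict=True returns the metrics as a nested dict instead of a string
demo_report = classification_report(
    y_true_demo, y_pred_demo, zero_division=0, output_dict=True
)

print(demo_report["accuracy"])
print(demo_report["positive"]["support"])
```

Setting `zero_division` only changes how undefined metrics are displayed; it does not make a one-example evaluation any more informative.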


In [7]:
# Sentiment analysis of new, unseen tweets using the VADER sentiment analyzer
unseen_tweets = [
    "This product exceeded my expectations!",
    "I'm really disappointed with the customer service.",
    "The price seems fair for the quality.",
    # Add more unseen tweets here...
]

In [8]:
analyzer = SentimentIntensityAnalyzer()

for tweet in unseen_tweets:
    sentiment_scores = analyzer.polarity_scores(tweet)
    print(f"Tweet: {tweet}")
    print(f"Sentiment Scores: {sentiment_scores}")
    print()


Tweet: This product exceeded my expectations!
Sentiment Scores: {'neg': 0.0, 'neu': 1.0, 'pos': 0.0, 'compound': 0.0}

Tweet: I'm really disappointed with the customer service.
Sentiment Scores: {'neg': 0.361, 'neu': 0.639, 'pos': 0.0, 'compound': -0.5256}

Tweet: The price seems fair for the quality.
Sentiment Scores: {'neg': 0.0, 'neu': 0.723, 'pos': 0.277, 'compound': 0.3182}
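
The `compound` value is a single normalized score between -1 and 1, and it is often turned into a label by thresholding. The sketch below assumes the commonly used cutoffs of 0.05 and -0.05; the function name `label_from_compound` and the `threshold` parameter are introduced here for illustration and are not part of NLTK.

```python
def label_from_compound(compound, threshold=0.05):
    """Map a VADER compound score to a coarse label (assumed cutoffs)."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


# Compound scores copied from the output above
for compound in (0.0, -0.5256, 0.3182):
    print(compound, label_from_compound(compound))
```

Under these cutoffs the first tweet comes out as neutral even though a human would read it as positive, which shows the limits of a fixed lexicon: none of its words carry a sentiment weight that VADER recognizes.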

