# **Tools & Techniques for Performing Sentiment Analysis**

There are various tools and techniques used to perform Sentiment Analysis, ranging from rule-based methods to machine learning and deep learning models.

# 1️⃣ Rule-Based Techniques (Lexicon-Based)

These methods use predefined word lists (lexicons) to determine sentiment based on words and their associated scores.
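The idea can be sketched in a few lines of plain Python. This is only an illustration: `TOY_LEXICON`, `lexicon_score`, and `lexicon_label` are hypothetical names, and the word scores are made up rather than taken from any real lexicon.

```python
import re

# Hypothetical toy lexicon: word -> sentiment score (values are invented for illustration).
TOY_LEXICON = {
    "love": 2.0, "amazing": 2.0, "good": 1.0,
    "bad": -1.0, "awful": -2.0, "hate": -2.0,
}

def lexicon_score(text, lexicon=TOY_LEXICON):
    """Sum the scores of every lexicon word that occurs in the text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return sum(lexicon.get(tok, 0.0) for tok in tokens)

def lexicon_label(text, lexicon=TOY_LEXICON):
    """Turn the summed score into a coarse label."""
    score = lexicon_score(text, lexicon)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(lexicon_label("I absolutely love this phone! It's amazing!"))
```

Real lexicon-based tools go further than this sketch. VADER, for example, also accounts for negation, intensifiers, and punctuation.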

Popular Tools & Libraries:

✔ VADER (Valence Aware Dictionary and sEntiment Reasoner) – Best for social media and short text.

✔ TextBlob – Simple library for NLP tasks, including sentiment analysis.

✔ SentiWordNet – Sentiment dictionary based on WordNet.

# 1a. VADER

In [31]:
# Install package
!pip install vaderSentiment




In [33]:
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

# Instantiate the VADER analyzer
analyzer = SentimentIntensityAnalyzer()
# Example text
text = "I absolutely love this phone! It's amazing!"
# Score the text (VADER is rule-based, so there is no model fitting step)
score = analyzer.polarity_scores(text)

print(f"The score is: {score}")


The score is: {'neg': 0.0, 'neu': 0.36, 'pos': 0.64, 'compound': 0.8713}


# Comment:

* The text is highly positive with a compound score of 0.8713.
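To turn a compound score into a label, the VADER documentation suggests cutoffs of +0.05 and -0.05. A minimal sketch of that rule follows; `label_from_compound` is a hypothetical helper, not part of the `vaderSentiment` package.

```python
def label_from_compound(compound, threshold=0.05):
    """Map a VADER compound score in [-1, 1] to a coarse sentiment label."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"

print(label_from_compound(0.8713))
```

The threshold is kept as a parameter because the right cutoff may depend on the data you are scoring.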


# 1b. TextBlob

In [34]:
#install package
!pip install textblob



In [38]:
from textblob import TextBlob

# Example text
text = "I absolutely do not like this food! It's awful!"
blob = TextBlob(text)
sentiment = blob.sentiment

print(sentiment)

# Access polarity and subjectivity
polarity = sentiment.polarity
subjectivity = sentiment.subjectivity

print(f"The Polarity is : {polarity}")
print(f"The Subjectivity is : {subjectivity}")


Sentiment(polarity=-0.5625, subjectivity=0.95)
The Polarity is : -0.5625
The Subjectivity is : 0.95


# Comment:

  * Polarity -0.5625 → Negative sentiment (because of words like "awful").
  * Subjectivity 0.95 → Highly opinionated (not a neutral or factual statement).

The statement is negative and highly subjective.

# 1c. SentiWordNet


In [39]:
!pip install nltk



In [41]:
# import nltk
# nltk.download('wordnet')
# import nltk
# nltk.data.path.append('/usr/local/nltk_data')
# nltk.download('wordnet', download_dir='/usr/local/nltk_data')
# nltk.download('omw-1.4')  # Open Multilingual WordNet
# nltk.download('punkt')  # Tokenizer models
# !mkdir -p /root/nltk_data
# !python -m nltk.downloader -d /root/nltk_data wordnet

In [40]:
import nltk
nltk.download('sentiwordnet')
nltk.download('wordnet')  # senti_synsets looks words up through WordNet
from nltk.corpus import sentiwordnet as swn

# Example usage
word = "happy"
synsets = list(swn.senti_synsets(word))

if synsets:
    synset = synsets[0]  # Get the first synset
    print(f"Word: {word}")
    print(f"Positive score: {synset.pos_score()}")
    print(f"Negative score: {synset.neg_score()}")
    print(f"Objective score: {synset.obj_score()}")
else:
    print(f"No synsets found for '{word}' in SentiWordNet.")


Word: happy
Positive score: 0.875
Negative score: 0.0
Objective score: 0.125


[nltk_data] Downloading package sentiwordnet to /root/nltk_data...
[nltk_data]   Package sentiwordnet is already up-to-date!


# Comment:

* The first sense of "happy" is strongly positive, with a positive score of 0.875.
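The cell above reads only the first synset, but a word can have several senses. In SentiWordNet the positive, negative, and objective scores of a synset sum to 1. The sketch below shows that relation and one simple way to aggregate across senses by averaging the net score. `objectivity` and `average_net_score` are hypothetical helpers, and the (positive, negative) pairs in the usage line are made-up values, not real SentiWordNet entries.

```python
def objectivity(pos, neg):
    """Objective score implied by the positive and negative scores of one synset."""
    return 1.0 - (pos + neg)

def average_net_score(sense_scores):
    """Average (positive - negative) over a list of (pos, neg) pairs, one per synset."""
    if not sense_scores:
        return 0.0
    return sum(pos - neg for pos, neg in sense_scores) / len(sense_scores)

# Scores of the first synset printed above
print(objectivity(0.875, 0.0))

# Made-up scores for two senses of a hypothetical word
print(average_net_score([(0.875, 0.0), (0.25, 0.5)]))
```

With NLTK, the pairs could be collected from `pos_score()` and `neg_score()` of each synset returned by `swn.senti_synsets(word)`.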


# 💡 When to use?

* When you need a fast and lightweight approach.


* Suitable for social media, customer reviews, and news articles.




# 2️⃣ Machine Learning-Based Techniques

These methods use supervised learning to classify text into sentiment categories.

Popular Machine Learning Algorithms:

✔ Naïve Bayes (NB) – Works well for text classification.

✔ Logistic Regression – A simple and effective baseline model.

✔ Support Vector Machines (SVM) – Used for high-dimensional data.

✔ Random Forest (RF) – Can be used for feature-rich text classification.

Popular Libraries:

✔ Scikit-learn – Used for text vectorization (TF-IDF, CountVectorizer) and training ML models.

✔ NLTK (Natural Language Toolkit) – Provides NLP preprocessing tools.
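To make the TF-IDF idea concrete, here is a plain-Python sketch of the weighting. It assumes the scheme that scikit-learn's `TfidfVectorizer` documents as its default: raw term counts, a smoothed IDF of `ln((1 + n) / (1 + df)) + 1`, and L2 normalization. The helper names `tokenize` and `tfidf` are hypothetical, and the sketch is for understanding only, not a replacement for the library.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and keep tokens of two or more word characters."""
    return re.findall(r"\b\w\w+\b", text.lower())

def tfidf(docs):
    """Return one {token: weight} dict per document, L2-normalized."""
    tokenized = [tokenize(doc) for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each token appears
    df = Counter(tok for toks in tokenized for tok in set(toks))
    idf = {tok: math.log((1 + n_docs) / (1 + count)) + 1.0 for tok, count in df.items()}
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        weights = {tok: c * idf[tok] for tok, c in counts.items()}
        norm = math.sqrt(sum(w * w for w in weights.values()))
        vectors.append({tok: w / norm for tok, w in weights.items()} if norm else {})
    return vectors

docs = ["I love this product!", "This is the worst purchase ever."]
vectors = tfidf(docs)
for tok, weight in sorted(vectors[0].items()):
    print(f"{tok}: {weight:.3f}")
```

In the first document, "this" gets a lower weight than "love" because "this" occurs in both documents and is therefore less distinctive.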



# 2a.  Logistic Regression

In [43]:
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# Sample data
texts = ["I love this product!", "This is the worst purchase ever."]
labels = [1, 0]  # 1 = Positive, 0 = Negative

# Convert text to numerical features
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(texts)

# Train a simple logistic regression model
model = LogisticRegression()
model.fit(X, labels)

# Predict sentiment
test_text = ["I hate this service!"]
X_test = vectorizer.transform(test_text)
print("The prediction is", model.predict(X_test))


The prediction is [1]


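# Comment:
The model predicted 1 (Positive) instead of 0 (Negative) for the test text "I hate this service!". With only two training sentences, the model has never seen "hate" or "service". The vectorizer ignores words outside its training vocabulary, so the only feature left in the test sentence is "this", which appears in both training examples and carries no sentiment signal.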
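The vocabulary problem can be checked without scikit-learn. The sketch below mimics what I understand to be `CountVectorizer`'s default tokenization (lowercasing, tokens of two or more word characters); `tokenize` is a hypothetical helper written for this illustration.

```python
import re

def tokenize(text):
    """Lowercase the text and keep tokens of two or more word characters."""
    return re.findall(r"\b\w\w+\b", text.lower())

train_texts = ["I love this product!", "This is the worst purchase ever."]
vocabulary = {tok for text in train_texts for tok in tokenize(text)}

test_tokens = tokenize("I hate this service!")
known = [tok for tok in test_tokens if tok in vocabulary]
unknown = [tok for tok in test_tokens if tok not in vocabulary]

print("known:", known)      # known: ['this']
print("unknown:", unknown)  # unknown: ['hate', 'service']
```

Only "this" survives the transform, so the classifier has nothing sentiment-related to work with.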

- To resolve this:

  1. Lack of training data: add more training samples.

  2. Vocabulary size and feature representation: use TfidfVectorizer instead of CountVectorizer. TF-IDF down-weights words that appear in most documents, so distinctive words count for more.

  3. Model regularization bias: reduce regularization by increasing the C parameter (C is the inverse of the regularization strength).

  See improved code below:




In [44]:
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Expanded dataset
texts = [
    "I love this product!", "This is the worst purchase ever.",
    "Amazing quality, very satisfied!", "Terrible service, not recommended.",
    "I am happy with this.", "I hate this service!",
    "This is fantastic!", "Awful experience."
]
labels = [1, 0, 1, 0, 1, 0, 1, 0]  # Positive (1) and Negative (0)

# Convert text to numerical features
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(texts)

# Train logistic regression
model = LogisticRegression(C=10.0)  # Larger C means weaker regularization (default is C=1.0)
model.fit(X, labels)

# Predict sentiment for a test text
test_text = ["I hate this service!"]
X_test = vectorizer.transform(test_text)
print("The prediction is:", model.predict(X_test))


The prediction is: [0]


# Comment:

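* The prediction is 0, so the statement is classified as negative. Note that "I hate this service!" is also one of the training sentences here, so this result shows that the model fits its training data, not how it performs on unseen text.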


# 💡 When to use?

  * When you have labeled training data.
  
  * When you need a customized sentiment analysis model.

# 3️⃣ Deep Learning-Based Techniques

These models use neural networks to improve accuracy, especially for complex text.

Popular Deep Learning Models:

✔ Recurrent Neural Networks (RNN) – Good for sequential text data.

✔ Long Short-Term Memory (LSTM) – Handles long-term dependencies in text.

✔ Transformers (BERT, GPT-3, RoBERTa) – State-of-the-art models for NLP.

Popular Deep Learning Libraries:

✔ TensorFlow & Keras – Used for building deep learning models.

✔ PyTorch – Used for training transformers like BERT.

✔ Hugging Face Transformers – Pre-trained models for sentiment analysis.


# 3a. Hugging Face's Pre-trained DistilBERT Model

In [26]:
from transformers import pipeline

sentiment_model = pipeline("sentiment-analysis")
result = sentiment_model("I am extremely happy with this service!")
print("-------------------------")
print("The result is")
print(result)


No model was supplied, defaulted to distilbert/distilbert-base-uncased-finetuned-sst-2-english and revision 714eb0f (https://huggingface.co/distilbert/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.
Device set to use cpu


-------------------------
The result is
[{'label': 'POSITIVE', 'score': 0.9998718500137329}]


# Comment:

* The statement is labeled POSITIVE with a very high confidence score (about 0.9999).
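The score is a probability. As far as I understand the pipeline, for a two-label classifier like this one it is the softmax of the model's raw outputs (logits) for the predicted class. The sketch below shows that computation in plain Python; the logit values are hypothetical and were not produced by the model above.

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    largest = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - largest) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

labels = ["NEGATIVE", "POSITIVE"]
logits = [-4.3, 4.6]  # hypothetical logits, for illustration only

probs = softmax(logits)
best = max(range(len(labels)), key=lambda i: probs[i])
print({"label": labels[best], "score": probs[best]})
```

A large gap between the two logits is what pushes the score so close to 1.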


# 💡 When to use?

  * When you need high accuracy and can afford more computational power.
  
  * When working on large-scale applications.

# 4️⃣ Hybrid Approaches (Lexicon + Machine Learning)
Some models combine rule-based and machine learning methods for better results.

💡 Example:

* First, use VADER to compute a sentiment score.

* Then, use a Logistic Regression model (here, the one trained in section 2a) to refine the classification.

In [47]:
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

# Initialize VADER sentiment analyzer
analyzer = SentimentIntensityAnalyzer()

# Sample text for prediction
test_text = "This is a great product!"

# Use VADER for initial scoring
vader_score = analyzer.polarity_scores(test_text)['compound']

# Use the ML model to refine the classification
# (reuses `vectorizer` and `model` from the improved Logistic Regression cell in 2a)
X_test = vectorizer.transform([test_text])
ml_prediction = model.predict(X_test)[0]  # Get scalar value

# Final classification
if vader_score > 0.5 and ml_prediction == 1:
    sentiment = "Highly Positive"
elif ml_prediction == 1:
    sentiment = "Positive"
elif ml_prediction == 0:
    sentiment = "Negative"
else:
    # Not reached while the model only predicts 0 or 1; kept as a fallback
    sentiment = "Neutral"

print(f"VADER Score: {vader_score}, ML Prediction: {ml_prediction}, Final Sentiment: {sentiment}")


VADER Score: 0.6588, ML Prediction: 1, Final Sentiment: Highly Positive


# Comment:

* The statement is classified as Highly Positive: the VADER compound score is 0.6588 (above the 0.5 cutoff) and the ML prediction is 1.
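One possible refinement is to wrap the decision rule in a function and return "Neutral" when the two signals disagree. This is a sketch under stated assumptions: `combine_sentiment` is a hypothetical helper, the 0.5 cutoff comes from the cell above, and the 0.05 neutral band is an assumed threshold that you may need to tune.

```python
def combine_sentiment(vader_compound, ml_prediction, strong=0.5, neutral_band=0.05):
    """Combine a VADER compound score with a binary ML label (1 = positive, 0 = negative)."""
    lexicon_positive = vader_compound >= neutral_band
    lexicon_negative = vader_compound <= -neutral_band

    if ml_prediction == 1 and vader_compound > strong:
        return "Highly Positive"
    if ml_prediction == 1 and not lexicon_negative:
        return "Positive"
    if ml_prediction == 0 and not lexicon_positive:
        return "Negative"
    # The lexicon and the classifier disagree, so fall back to a cautious label
    return "Neutral"

print(combine_sentiment(0.6588, 1))
print(combine_sentiment(0.6, 0))
```

Keeping the rule in a pure function makes it easy to test on its own, without loading VADER or the trained model.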
