## Introduction
Welcome to this interactive Jupyter notebook on sentiment analysis of product reviews. In this exercise you will clean text data, classify sentiment by counting positive and negative words, and score reviews with TextBlob.

## Setup
Install the required libraries, import them, and download the NLTK tokenizer and stopword data used in later cells.

In [1]:
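%pip install nltk scikit-learn textblob
import nltk
from sklearn.feature_extraction.text import CountVectorizer  # imported here but not used in the cells below
# Tokenizer models used by word_tokenize
nltk.download('punkt')
# Newer NLTK releases load the tokenizer data from 'punkt_tab' instead
nltk.download('punkt_tab')
# English stopword list used in the cleaning step
nltk.download('stopwords')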


Note: you may need to restart the kernel to use updated packages.


[nltk_data] Downloading package punkt to
[nltk_data]     /Users/pratikshadange/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package stopwords to
[nltk_data]     /Users/pratikshadange/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!


True
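
One quick way to confirm the setup is to check whether each package can be imported before running the remaining cells. The sketch below is a convenience and not part of the original notebook: `REQUIRED` and `missing_packages` are hypothetical names, and the mapping from import name to pip name covers only the three libraries installed above.

```python
import importlib.util

# Hypothetical mapping: import name -> pip package name
REQUIRED = {"nltk": "nltk", "sklearn": "scikit-learn", "textblob": "textblob"}

def missing_packages(required=REQUIRED):
    """Return the pip names of packages whose modules cannot be found."""
    return [
        pip_name
        for module, pip_name in required.items()
        if importlib.util.find_spec(module) is None
    ]

# An empty list means every required module is importable
print(missing_packages())
```

If the list is not empty, rerun the `%pip install` line and restart the kernel.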

## Product Reviews
Below is a list of 25 product reviews, 15 positive followed by 10 negative, that we will analyze.

In [2]:
reviews = ['I absolutely love this product! Highly recommend to everyone.', "Fantastic quality! I'm very happy with my purchase.", 'This is the best thing I have bought in a long time!', 'Completely satisfied with the product and service.', 'Five stars, will buy again!', 'This product does exactly what it says, fantastic!', 'Incredible performance and very easy to use.', 'I am so pleased with this purchase, worth every penny!', 'Great value for money and quick delivery.', 'The best on the market, hands down!', 'Such a great purchase, very pleased!', 'Product is of high quality and super durable.', 'Surpassed my expectations, absolutely wonderful!', 'This is amazing, I love it so much!', 'The product works wonderfully and is well made.', 'Not what I expected, quite disappointed.', 'The quality is not as advertised, very upset.', 'This was a waste of money, would not buy again.', 'Poor quality and did not meet my expectations.', "I regret buying this, it's awful.", 'Terrible product, do not waste your money!', 'Very unsatisfied with the purchase, it broke within a week.', 'Not worth the price, very misleading.', "The worst purchase I've ever made!", "Disappointed with the product, it's not good at all."]

## Text Cleaning Exercise
Clean the text data by converting to lowercase, removing punctuation, and filtering out stopwords.

In [5]:
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize

def clean_text(reviews):
    stop_words = set(stopwords.words('english'))
    cleaned_reviews = []
    for review in reviews:
        # Split the review into word and punctuation tokens
        tokens = word_tokenize(review)
        # Lowercase each token; drop non-alphanumeric tokens (punctuation) and stopwords
        cleaned_tokens = [word.lower() for word in tokens if word.isalnum() and word.lower() not in stop_words]
        cleaned_reviews.append(' '.join(cleaned_tokens))
    return cleaned_reviews

# Clean the reviews
cleaned_reviews = clean_text(reviews)
print(cleaned_reviews)

['absolutely love product highly recommend everyone', 'fantastic quality happy purchase', 'best thing bought long time', 'completely satisfied product service', 'five stars buy', 'product exactly says fantastic', 'incredible performance easy use', 'pleased purchase worth every penny', 'great value money quick delivery', 'best market hands', 'great purchase pleased', 'product high quality super durable', 'surpassed expectations absolutely wonderful', 'amazing love much', 'product works wonderfully well made', 'expected quite disappointed', 'quality advertised upset', 'waste money would buy', 'poor quality meet expectations', 'regret buying awful', 'terrible product waste money', 'unsatisfied purchase broke within week', 'worth price misleading', 'worst purchase ever made', 'disappointed product good']
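
The output above shows that the last review, "Disappointed with the product, it's not good at all.", is reduced to `disappointed product good`: the negation disappears because "not" is treated as a stopword. The following sketch reproduces that effect with the standard library only. It is a simplified stand-in for `clean_text`, not the notebook's method: the regex tokenizer, the tiny `STOP_WORDS` set, and the `keep` parameter are all assumptions made for illustration.

```python
import re

# Hypothetical mini stopword set for illustration; NLTK's English list is much longer
STOP_WORDS = {"i", "this", "is", "the", "it", "s", "not", "at", "all", "with", "to"}

def clean_review(review, stop_words=STOP_WORDS, keep=frozenset()):
    """Lowercase, strip punctuation, and drop stopwords except those in `keep`."""
    # Lowercasing first lets a simple [a-z0-9]+ pattern act as the tokenizer
    tokens = re.findall(r"[a-z0-9]+", review.lower())
    return " ".join(t for t in tokens if t not in stop_words or t in keep)

review = "Disappointed with the product, it's not good at all."
print(clean_review(review))                # negation is removed with the stopwords
print(clean_review(review, keep={"not"}))  # negation is preserved
```

Keeping a short allow-list of negation words is one possible design choice when a later step depends on them.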


## Sentiment Analysis Exercise
Perform sentiment analysis using simple word counting. Count how many words from a positive list and a negative list appear in each cleaned review, then label the review by whichever count is larger; ties are labeled Neutral.

In [7]:
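positive_words = ['love', 'fantastic', 'best', 'incredible', 'pleased', 'great', 'amazing', 'high', 'wonderful', 'satisfied']
# Note: 'not' is an NLTK English stopword, so clean_text() has already removed it
# from cleaned_reviews and it never matches here.
negative_words = ['disappointed', 'waste', 'poor', 'regret', 'terrible', 'unsatisfied', 'broke', 'worst', 'not']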

def analyze_sentiment(reviews):
    results = []
    for review in reviews:
        # Count how many distinct positive and negative list words appear in the review
        tokens = set(review.split())
        pos_count = sum(word in tokens for word in positive_words)
        neg_count = sum(word in tokens for word in negative_words)
        # Determine sentiment as positive or negative
        if pos_count > neg_count:
            sentiment = "Positive"
        elif neg_count > pos_count:
            sentiment = "Negative"
        else:
            sentiment = "Neutral"  # Consider reviews with equal counts of pos/neg words as neutral
        
        results.append((review, sentiment))
        
    return results

# Analyze the sentiment of cleaned reviews
sentiment_results = analyze_sentiment(cleaned_reviews)
for result in sentiment_results:
    print(result)
    
# Determine whether the reviews are mostly positive or negative
positive_reviews = sum(1 for _, sentiment in sentiment_results if sentiment == "Positive")
negative_reviews = sum(1 for _, sentiment in sentiment_results if sentiment == "Negative")

print(f"Positive reviews: {positive_reviews}")
print(f"Negative reviews: {negative_reviews}")

('absolutely love product highly recommend everyone', 'Positive')
('fantastic quality happy purchase', 'Positive')
('best thing bought long time', 'Positive')
('completely satisfied product service', 'Positive')
('five stars buy', 'Neutral')
('product exactly says fantastic', 'Positive')
('incredible performance easy use', 'Positive')
('pleased purchase worth every penny', 'Positive')
('great value money quick delivery', 'Positive')
('best market hands', 'Positive')
('great purchase pleased', 'Positive')
('product high quality super durable', 'Positive')
('surpassed expectations absolutely wonderful', 'Positive')
('amazing love much', 'Positive')
('product works wonderfully well made', 'Neutral')
('expected quite disappointed', 'Negative')
('quality advertised upset', 'Neutral')
('waste money would buy', 'Negative')
('poor quality meet expectations', 'Negative')
('regret buying awful', 'Negative')
('terrible product waste money', 'Negative')
('unsatisfied purchase broke within week', '
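
Several reviews above are labeled Neutral even though they read as clearly positive or negative. One way to see why is to list which words actually matched. The sketch below copies the two word lists from the cell above; `matched_words` is a hypothetical helper added for illustration, not part of the original exercise.

```python
positive_words = ['love', 'fantastic', 'best', 'incredible', 'pleased', 'great',
                  'amazing', 'high', 'wonderful', 'satisfied']
negative_words = ['disappointed', 'waste', 'poor', 'regret', 'terrible',
                  'unsatisfied', 'broke', 'worst', 'not']

def matched_words(review):
    """Return the tokens of a cleaned review found in each word list."""
    tokens = review.split()
    pos_hits = [t for t in tokens if t in positive_words]
    neg_hits = [t for t in tokens if t in negative_words]
    return pos_hits, neg_hits

# Three reviews that the word-count approach labeled Neutral
for review in ['five stars buy',
               'product works wonderfully well made',
               'quality advertised upset']:
    print(review, matched_words(review))
```

None of the three reviews contains a listed word. Matching is exact, so "wonderfully" does not count as "wonderful", and words such as "upset" are not in either list.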

In [9]:
from textblob import TextBlob

sentiments = []

for review in reviews:
    blob = TextBlob(review)
    # Get the sentiment score (polarity) of the review
    sentiment_score = blob.sentiment.polarity

    # Classify the sentiment as positive, negative or neutral
    if sentiment_score > 0:
        sentiment = "Positive"
    elif sentiment_score < 0:
        sentiment = "Negative"
    else:
        sentiment = "Neutral"
        
    # Append the (score, label) pair to the sentiments list
    sentiments.append((sentiment_score, sentiment))
    

for review, result in zip(reviews, sentiments):
    print(f'{review} - Sentiment: {result}')
    
# Calculate the average sentiment score

average_sentiment_score = sum(score for score, _ in sentiments) / len(sentiments)
print(f'Average Sentiment Score: {average_sentiment_score:.2f}')

I absolutely love this product! Highly recommend to everyone. - Sentiment: (0.3925, 'Positive')
Fantastic quality! I'm very happy with my purchase. - Sentiment: (0.75, 'Positive')
This is the best thing I have bought in a long time! - Sentiment: (0.46875, 'Positive')
Completely satisfied with the product and service. - Sentiment: (0.5, 'Positive')
Five stars, will buy again! - Sentiment: (0.0, 'Neutral')
This product does exactly what it says, fantastic! - Sentiment: (0.375, 'Positive')
Incredible performance and very easy to use. - Sentiment: (0.7316666666666667, 'Positive')
I am so pleased with this purchase, worth every penny! - Sentiment: (0.4375, 'Positive')
Great value for money and quick delivery. - Sentiment: (0.5666666666666667, 'Positive')
The best on the market, hands down! - Sentiment: (0.4027777777777778, 'Positive')
Such a great purchase, very pleased! - Sentiment: (0.5375, 'Positive')
Product is of high quality and super durable. - Sentiment: (0.24666666666666665, 'Posit
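
The cell above labels any polarity above zero as Positive and any polarity below zero as Negative, so only a score of exactly 0.0 is Neutral. A possible variation, sketched here under stated assumptions, treats scores close to zero as Neutral too. The `classify` function and its `threshold=0.05` default are hypothetical choices for illustration, and the scores are the first six polarity values printed above.

```python
from collections import Counter

def classify(polarity, threshold=0.05):
    """Label a polarity score, treating values within +/- threshold as Neutral."""
    if polarity > threshold:
        return "Positive"
    if polarity < -threshold:
        return "Negative"
    return "Neutral"

# First six polarity scores from the output above
scores = [0.3925, 0.75, 0.46875, 0.5, 0.0, 0.375]
labels = [classify(score) for score in scores]

# Tally the labels and report the mean polarity
print(Counter(labels))
print(f"Average polarity: {sum(scores) / len(scores):.2f}")
```

The threshold is arbitrary; a suitable value depends on the data and would normally be tuned against labeled examples.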

## Conclusion
Congratulations on completing this exercise! You've learned how to clean text data and perform basic sentiment analysis.