## Introduction
Welcome to this interactive Jupyter notebook on sentiment analysis of product reviews. In this exercise you will clean text data, classify reviews with a simple word-count approach, and score sentiment with TextBlob.

## Setup
Install and import the required libraries (NLTK, scikit-learn, and TextBlob), then download the NLTK tokenizer and stopword data.

In [3]:
%pip install nltk scikit-learn textblob
import nltk
from sklearn.feature_extraction.text import CountVectorizer
nltk.download('punkt')
nltk.download('stopwords')

Collecting textblob
  Downloading textblob-0.18.0.post0-py3-none-any.whl.metadata (4.5 kB)
Downloading textblob-0.18.0.post0-py3-none-any.whl (626 kB)
Installing collected packages: textblob
Successfully installed textblob-0.18.0.post0
Note: you may need to restart the kernel to use updated packages.


[nltk_data] Downloading package punkt to
[nltk_data]     /Users/timurabdygulov/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package stopwords to
[nltk_data]     /Users/timurabdygulov/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!


True

## Product Reviews
Below is a list of 25 product reviews, some positive and some negative, that we will analyze.

In [7]:
reviews = ['I absolutely love this product! Highly recommend to everyone.', "Fantastic quality! I'm very happy with my purchase.", 'This is the best thing I have bought in a long time!', 'Completely satisfied with the product and service.', 'Five stars, will buy again!', 'This product does exactly what it says, fantastic!', 'Incredible performance and very easy to use.', 'I am so pleased with this purchase, worth every penny!', 'Great value for money and quick delivery.', 'The best on the market, hands down!', 'Such a great purchase, very pleased!', 'Product is of high quality and super durable.', 'Surpassed my expectations, absolutely wonderful!', 'This is amazing, I love it so much!', 'The product works wonderfully and is well made.', 'Not what I expected, quite disappointed.', 'The quality is not as advertised, very upset.', 'This was a waste of money, would not buy again.', 'Poor quality and did not meet my expectations.', "I regret buying this, it's awful.", 'Terrible product, do not waste your money!', 'Very unsatisfied with the purchase, it broke within a week.', 'Not worth the price, very misleading.', "The worst purchase I've ever made!", "Disappointed with the product, it's not good at all."]

## Text Cleaning Exercise
Clean the text data by converting to lowercase, removing punctuation, and filtering out stopwords.

In [9]:
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
import string

def clean_text(reviews):
    cleaned_reviews = []
    stop = set(stopwords.words('english') + list(string.punctuation))
    for review in reviews:
        # Tokenize the review
        tokens = word_tokenize(review)
        # Remove stopwords and punctuation
        cleaned_tokens = [token.lower() for token in tokens if token.lower() not in stop]
        cleaned_reviews.append(' '.join(cleaned_tokens))
    return cleaned_reviews

# Clean the reviews
cleaned_reviews = clean_text(reviews)
print(cleaned_reviews)

['absolutely love product highly recommend everyone', "fantastic quality 'm happy purchase", 'best thing bought long time', 'completely satisfied product service', 'five stars buy', 'product exactly says fantastic', 'incredible performance easy use', 'pleased purchase worth every penny', 'great value money quick delivery', 'best market hands', 'great purchase pleased', 'product high quality super durable', 'surpassed expectations absolutely wonderful', 'amazing love much', 'product works wonderfully well made', 'expected quite disappointed', 'quality advertised upset', 'waste money would buy', 'poor quality meet expectations', "regret buying 's awful", 'terrible product waste money', 'unsatisfied purchase broke within week', 'worth price misleading', "worst purchase 've ever made", "disappointed product 's good"]
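Note that the cleaned output above no longer contains "not" (for example, 'Not what I expected, quite disappointed.' became 'expected quite disappointed'), because NLTK's English stopword list includes common negations. The following is a minimal sketch of one way to keep negations during cleaning. It uses only the standard library so that it runs without NLTK data; `STOPWORDS`, `NEGATIONS`, and `clean_keep_negations` are hypothetical names, and the tiny stopword set is an illustrative stand-in for NLTK's full list, not a replacement for it.

```python
import re

# Assumption: a deliberately tiny stopword set for illustration only.
# The notebook itself uses NLTK's full English stopword list.
STOPWORDS = {"i", "is", "the", "this", "what", "it", "at", "all", "with",
             "not", "a", "as", "was", "of", "would", "very"}

# Assumption: words treated as negations that should survive cleaning.
NEGATIONS = {"not", "no", "never"}


def clean_keep_negations(review, stopwords=STOPWORDS, negations=NEGATIONS):
    """Lowercase, strip punctuation, and drop stopwords except negations."""
    # Keep runs of letters and apostrophes; everything else is discarded.
    tokens = re.findall(r"[a-z']+", review.lower())
    drop = stopwords - negations
    return " ".join(token for token in tokens if token not in drop)


print(clean_keep_negations("Not what I expected, quite disappointed."))
# not expected quite disappointed
```

With NLTK available, the same idea would presumably amount to subtracting a negation set from `stopwords.words('english')` before filtering.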


## Sentiment Analysis Exercise
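Perform sentiment analysis using simple word counting. Count how many tokens in each cleaned review appear in a list of positive words and in a list of negative words, then label the review 'Positive' or 'Negative' according to the larger count. A tie, including a review with no matches at all, is labeled 'Neutral'.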

In [11]:
positive_words = ['love', 'fantastic', 'best', 'incredible', 'pleased', 'great', 'amazing', 'high', 'wonderful', 'satisfied']
# Note: 'not' is an NLTK stopword, so it never appears in cleaned_reviews and has no effect here
negative_words = ['disappointed', 'waste', 'poor', 'regret', 'terrible', 'unsatisfied', 'broke', 'worst', 'not']

def analyze_sentiment(reviews):
    results = []
    for review in reviews:
        # Get count of positive and negative words in the review
        tokens = word_tokenize(review.lower())
        positive_count = sum(token in positive_words for token in tokens)
        negative_count = sum(token in negative_words for token in tokens)
        
        # Determine sentiment as positive or negative
        if positive_count > negative_count:
            sentiment = 'Positive'
        elif negative_count > positive_count:
            sentiment = 'Negative'
        else:
            sentiment = 'Neutral'
        
        results.append(sentiment)
    return results


# Analyze the sentiment of cleaned reviews
sentiment_results = analyze_sentiment(cleaned_reviews)
for result in sentiment_results:
    print(result)

# To determine if the reviews are mostly positive or negative
positive_count = sum(result == 'Positive' for result in sentiment_results)
negative_count = sum(result == 'Negative' for result in sentiment_results)

# Are the reviews mostly positive or negative?
if positive_count > negative_count:
    print("The reviews are mostly positive.")
elif negative_count > positive_count:
    print("The reviews are mostly negative.")
else:
    print("The reviews are balanced between positive and negative.")

Positive
Positive
Positive
Positive
Neutral
Positive
Positive
Positive
Positive
Positive
Positive
Positive
Positive
Positive
Neutral
Negative
Neutral
Negative
Negative
Negative
Negative
Negative
Neutral
Negative
Negative
The reviews are mostly positive.
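To see how well the word-count classifier did, the labels printed above can be tallied and compared against the intended sentiment of each review. The sketch below hard-codes the 25 predicted labels from the output above. It assumes, based on the order of the `reviews` list, that the first 15 reviews are meant to be positive and the last 10 negative; that ground truth is an assumption and is not stated in the notebook.

```python
from collections import Counter

# Predicted labels copied from the cell output above, in review order.
predicted = (
    ["Positive"] * 4 + ["Neutral"] + ["Positive"] * 9 + ["Neutral"]
    + ["Negative", "Neutral"] + ["Negative"] * 5 + ["Neutral"]
    + ["Negative"] * 2
)

# Assumption: the first 15 reviews are positive and the last 10 are negative.
expected = ["Positive"] * 15 + ["Negative"] * 10

# Count how often each label was predicted.
tally = Counter(predicted)

# A prediction is correct only if it matches the assumed label exactly,
# so every 'Neutral' prediction counts as a miss.
correct = sum(p == e for p, e in zip(predicted, expected))
accuracy = correct / len(expected)

print(dict(tally))
# {'Positive': 13, 'Neutral': 4, 'Negative': 8}
print(f"Correct: {correct}/{len(expected)} (accuracy {accuracy:.2f})")
# Correct: 21/25 (accuracy 0.84)
```

All four misses are 'Neutral' predictions, where none of the review's tokens matched either word list (for example, 'wonderfully' does not match 'wonderful').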


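Next, score each original (uncleaned) review with TextBlob. Its polarity score ranges from -1.0 (most negative) to 1.0 (most positive). Then compute the average score across all reviews.

In [15]: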
from textblob import TextBlob

sentiments = []        # polarity scores in the range [-1.0, 1.0]
sentiment_labels = []  # matching 'Positive' / 'Negative' / 'Neutral' labels

for review in reviews:
    blob = TextBlob(review)
    # Get the sentiment score (polarity) of the review
    polarity = blob.sentiment.polarity
    
    # Classify the sentiment as positive, negative or neutral
    if polarity > 0:
        sentiment = 'Positive'
    elif polarity < 0:
        sentiment = 'Negative'
    else:
        sentiment = 'Neutral'
    
    sentiments.append(polarity)
    sentiment_labels.append(sentiment)

for i, review in enumerate(reviews):
    print(f'{review} - Sentiment: {round(sentiments[i], 3)}')
    
# Calculate the average sentiment score
average_sentiment = sum(sentiments) / len(sentiments)
print(f"Average Sentiment Score: {round(average_sentiment, 3)}")

I absolutely love this product! Highly recommend to everyone. - Sentiment: 0.393
Fantastic quality! I'm very happy with my purchase. - Sentiment: 0.75
This is the best thing I have bought in a long time! - Sentiment: 0.469
Completely satisfied with the product and service. - Sentiment: 0.5
Five stars, will buy again! - Sentiment: 0.0
This product does exactly what it says, fantastic! - Sentiment: 0.375
Incredible performance and very easy to use. - Sentiment: 0.732
I am so pleased with this purchase, worth every penny! - Sentiment: 0.438
Great value for money and quick delivery. - Sentiment: 0.567
The best on the market, hands down! - Sentiment: 0.403
Such a great purchase, very pleased! - Sentiment: 0.537
Product is of high quality and super durable. - Sentiment: 0.247
Surpassed my expectations, absolutely wonderful! - Sentiment: 1.0
This is amazing, I love it so much! - Sentiment: 0.45
The product works wonderfully and is well made. - Sentiment: 1.0
Not what I expected, quite disappo
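Turning polarity scores into labels by comparing against exactly 0 means that any slightly non-zero score counts as positive or negative. One possible refinement, sketched below, is a small dead zone around zero. The function name `label_polarity` and the threshold of 0.1 are hypothetical choices for illustration, and the scores are the rounded values for the first 15 reviews shown in the output above (the rest of that output is cut off, so it is not used here).

```python
def label_polarity(polarity, threshold=0.1):
    """Map a polarity score in [-1.0, 1.0] to a sentiment label.

    Scores within +/- threshold of zero are treated as 'Neutral'.
    The default threshold is an illustrative assumption.
    """
    if polarity > threshold:
        return "Positive"
    if polarity < -threshold:
        return "Negative"
    return "Neutral"


# Rounded polarity scores for the first 15 reviews, copied from the output above.
scores = [0.393, 0.75, 0.469, 0.5, 0.0, 0.375, 0.732, 0.438,
          0.567, 0.403, 0.537, 0.247, 1.0, 0.45, 1.0]

labels = [label_polarity(score) for score in scores]
print(labels.count("Positive"), labels.count("Neutral"), labels.count("Negative"))
# 14 1 0
```

Only 'Five stars, will buy again!' (score 0.0) comes out as 'Neutral' for these 15 reviews, so the threshold changes nothing here. It would matter only for scores close to zero.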

## Conclusion
Congratulations on completing this exercise! You've learned how to clean text data and perform basic sentiment analysis.