# Sentiment Analysis of Amazon Product Reviews

This notebook performs sentiment analysis on Amazon product reviews using
Natural Language Processing (NLP) techniques.

The analysis uses:
- **spaCy** for text processing
- **SpacyTextBlob** for sentiment polarity analysis
- **Pandas** for data handling

The goal is to classify product reviews as **positive**, **negative**, or
**neutral**, and to explore similarity between reviews.


### Step 1: Imports and Setup

In [17]:
import pandas as pd
import spacy
# Importing this module registers the "spacytextblob" factory with spaCy
import spacytextblob.spacytextblob as spacytextblob_component


### Step 2: Load spaCy model and add SpacyTextBlob to the pipeline

In [18]:
# Load the spaCy English medium model
nlp = spacy.load("en_core_web_md")

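# Add SpacyTextBlob to the spaCy pipeline by its registered factory name
nlp.add_pipe("spacytextblob")

# Optional bookkeeping: record the component class name in the pipeline
# metadata. add_pipe() above does not depend on this entry.
nlp.meta["spacytextblob_component"] = (
    spacytextblob_component.SpacyTextBlob.__name__
)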


### Step 3: Load the Dataset

In [None]:
# Load the Amazon product reviews dataset
file_path = "Datafiniti_Amazon_Consumer_Reviews_of_Amazon_Products_May19.csv"

try:
    dataframe = pd.read_csv(file_path)
    print("Dataset loaded successfully.")
except FileNotFoundError:
    # Stop here: later cells use `dataframe` and would fail with a NameError
    print(f"Error: Dataset file not found: {file_path}")
    raise


### Step 4: Select and Clean Review Data


In [None]:
# Remove missing values from the reviews column
clean_data = dataframe.dropna(subset=["reviews.text"])

# Select the reviews.text column
reviews_data = clean_data["reviews.text"]

print(f"Total number of reviews after cleaning: {len(reviews_data)}")


Total number of reviews after cleaning: 28332


### Step 5: Text Preprocessing Function

In [None]:
def preprocess_text(text):
    """
    Cleans review text by removing stop words, punctuation,
    lemmatizing words, and converting text to lowercase.
    """
    doc = nlp(text)

    clean_tokens = [
        token.lemma_.lower().strip()
        for token in doc
        if not token.is_stop and not token.is_punct and token.text.strip()
    ]

    return " ".join(clean_tokens)


### Step 6: Sentiment Analysis Function

In [None]:
def analyze_sentiment(review_text):
    """
    Analyzes sentiment using SpacyTextBlob.

    Returns:
        polarity (float): Sentiment strength (-1 to 1)
        sentiment (str): Positive, Negative, or Neutral
    """
    doc = nlp(review_text)
    polarity = doc._.blob.polarity

    if polarity > 0.1:
        sentiment = "Positive"
    elif polarity < -0.1:
        sentiment = "Negative"
    else:
        sentiment = "Neutral"

    return polarity, sentiment
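
Because the comparisons in `analyze_sentiment` are strict, a polarity of exactly 0.1 or -0.1 falls in the neutral band. The sketch below isolates that thresholding rule so it can be checked without loading spaCy. The helper name `label_polarity` and its `threshold` parameter are hypothetical and not part of the notebook.

```python
def label_polarity(polarity: float, threshold: float = 0.1) -> str:
    """Map a polarity score in [-1, 1] to a sentiment label.

    Hypothetical helper that mirrors the if/elif/else rule in
    analyze_sentiment; `threshold` is the half-width of the neutral band.
    """
    if polarity > threshold:
        return "Positive"
    if polarity < -threshold:
        return "Negative"
    return "Neutral"


# Boundary values (0.1 and -0.1) are labeled Neutral, not Positive/Negative
for score in (0.35, 0.1, 0.0, -0.1, -0.7):
    print(f"{score:+.2f} -> {label_polarity(score)}")
```

Keeping the rule in a small function like this makes it easier to try a different neutral band later without touching the spaCy call.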


### Step 7: Test Sentiment on Sample Reviews

In [None]:
print("========== SENTIMENT ANALYSIS SAMPLES ==========")

sample_indices = [0, 50, 100]

for idx in sample_indices:
    if idx >= len(reviews_data):
        continue

    original_review = reviews_data.iloc[idx]
    processed_review = preprocess_text(original_review)

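    # Note: the preprocessed text has spaCy stop words removed, including
    # negations such as "not", which can change the polarity of negated phrases
    polarity, sentiment = analyze_sentiment(processed_review)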

    print(f"\nReview Index: {idx}")
    print(f"Original Review: {original_review[:120]}...")
    print(f"Sentiment: {sentiment}")
    print(f"Polarity Score: {polarity:.2f}")



Review Index: 0
Original Review: I order 3 of them and one of the item is bad quality. Is missing backup spring so I have to put a pcs of aluminum to mak...
Sentiment: Negative
Polarity Score: -0.70

Review Index: 50
Original Review: I definitely love the price and quantity.. My kids go tthrough them to fast. At least these will last a while.....
Sentiment: Positive
Polarity Score: 0.35

Review Index: 100
Original Review: As a teacher, I need tons of batteries, but I refused to spend excessive amounts on them, so I figured this was the best...
Sentiment: Positive
Polarity Score: 0.27


### Step 8: Review Similarity Comparison

In [None]:
print("\n========== REVIEW SIMILARITY ==========")

if len(reviews_data) >= 2:
    review_a = nlp(preprocess_text(reviews_data.iloc[0]))
    review_b = nlp(preprocess_text(reviews_data.iloc[1]))

    similarity_score = review_a.similarity(review_b)

    print(
        "Similarity score between Review 0 and Review 1: "
        f"{similarity_score:.2f}"
    )



Similarity score between Review 0 and Review 1: 0.75
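
By default, spaCy's `Doc.similarity` estimates similarity as the cosine of the angle between the two document vectors, where each document vector is the average of its token vectors. The sketch below reproduces that calculation in plain Python on made-up 3-dimensional vectors. The function names and the toy vectors are assumptions for illustration; the real `en_core_web_md` vectors have many more dimensions.

```python
import math


def average_vector(vectors):
    """Element-wise mean of equal-length vectors (stand-in for a document vector)."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Toy "word vectors" for two short documents; values invented for illustration
doc_a = average_vector([[1.0, 0.0, 1.0], [0.0, 1.0, 1.0]])
doc_b = average_vector([[1.0, 1.0, 0.0], [1.0, 0.0, 1.0]])

print(f"{cosine_similarity(doc_a, doc_b):.2f}")  # prints 0.83
```

Averaging discards word order, so a score computed this way reflects overlap in vocabulary and topic rather than agreement in sentiment.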


### Step 9: Improvements and Next Steps

Two changes would likely improve on this baseline:

- **Tune the neutral band.** The current thresholds treat any polarity between -0.1 and 0.1 as neutral. Narrowing the band around zero (for example, to **-0.05 to 0.05**) may separate neutral from polarized reviews more cleanly, but the choice should be validated against labeled samples.
- **Evaluate transformer-based models (e.g., BERT).** Because the current approach is rule-based, it may miss sarcasm and domain-specific nuance. Hugging Face provides pretrained models that can be fine-tuned with a classification layer on labeled data to predict sentiment or emotions. See: https://huggingface.co/blog/bert-101
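
Validating a neutral band against labeled samples can be as simple as comparing accuracy at each candidate threshold. The sketch below does this for the current band (0.1) and the narrower one (0.05). The labeled pairs are invented so that the two thresholds disagree, and `label_polarity` and `accuracy` are hypothetical helpers; real validation would use human-labeled reviews from the dataset.

```python
def label_polarity(polarity, threshold):
    """Label a polarity score using a symmetric neutral band of +/- threshold."""
    if polarity > threshold:
        return "Positive"
    if polarity < -threshold:
        return "Negative"
    return "Neutral"


def accuracy(samples, threshold):
    """Fraction of (polarity, label) pairs whose predicted label matches."""
    correct = sum(
        label_polarity(polarity, threshold) == label
        for polarity, label in samples
    )
    return correct / len(samples)


# Hypothetical (polarity, human label) pairs; invented for illustration only
labeled_samples = [
    (0.35, "Positive"),
    (0.08, "Positive"),
    (0.02, "Neutral"),
    (-0.03, "Neutral"),
    (-0.07, "Negative"),
    (-0.70, "Negative"),
]

for threshold in (0.1, 0.05):
    score = accuracy(labeled_samples, threshold)
    print(f"threshold={threshold:.2f} accuracy={score:.2f}")
```

On this toy data the narrower band scores higher only because the samples were chosen that way; on real reviews the result could go either direction, which is why the comparison is worth running.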