# Sentiment Analysis & Comparison: TextBlob vs Hugging Face
This notebook loads text reviews from a CSV file and runs sentiment analysis with two tools:
- **TextBlob** (rule-based sentiment analysis)
- **Hugging Face Transformers** (deep learning-based classifier)

It then compares the two outputs and optionally assigns topic labels with zero-shot text classification.

## 1. Load Data
Ensure your CSV file contains a column named `'review'` with one review per row.

## Reuse Instructions
This notebook is designed to be reusable for any kind of text-based review dataset.

### How to Adapt It
1. Ensure your CSV file has a column with the text you want to analyze (e.g., `review`, `comment`, `feedback`).
2. Update the file name and column name in the notebook where the data is loaded:
   ```python
   df = pd.read_csv('your_file.csv')
   df = df.dropna(subset=['your_column_name'])
   df = df.rename(columns={'your_column_name': 'review'})
   ```
3. You can now reuse the rest of the notebook without changes.
4. For very large datasets, consider running the Hugging Face classifier on a sample first (e.g., `df.head(100)`), since transformer inference is slow on CPU and hosted notebooks may time out.
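
The adaptation steps above can be wrapped in one helper. This is a minimal sketch, not part of the original notebook: the function name `load_reviews` and its parameters `text_column`, `sample_size`, and `seed` are illustrative assumptions. It also assumes the file does not already contain a different column named `review`.

```python
import pandas as pd


def load_reviews(path, text_column="review", sample_size=None, seed=0):
    """Load a CSV and return a DataFrame whose text lives in a 'review' column.

    Hypothetical helper: names and defaults are illustrative only.
    """
    df = pd.read_csv(path)
    if text_column not in df.columns:
        raise KeyError(
            f"Column {text_column!r} not found; available: {list(df.columns)}"
        )
    # Drop missing rows, then standardize the column name
    df = df.dropna(subset=[text_column]).rename(columns={text_column: "review"})
    # Cast to str so numeric-looking entries do not break the text models
    df["review"] = df["review"].astype(str).str.strip()
    df = df[df["review"] != ""]
    # Optionally work on a reproducible random sample of large datasets
    if sample_size is not None and len(df) > sample_size:
        df = df.sample(n=sample_size, random_state=seed)
    return df.reset_index(drop=True)
```

For example, `df = load_reviews('your_file.csv', text_column='comment', sample_size=100)` would give the rest of the notebook a `review` column with at most 100 rows.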

### Optional: Replace Zero-Shot Labels
In the final section, change the candidate labels to suit your context:
```python
labels = ['sustainability', 'product design', 'usability', 'warranty']
```

In [None]:
# Install dependencies (transformers also needs a backend such as PyTorch)
!pip install pandas textblob transformers torch --quiet
import pandas as pd
from textblob import TextBlob
from transformers import pipeline

In [None]:
# Load the CSV file (upload it manually or specify its path).
# Example: reviews.csv with a 'review' column containing the review text
df = pd.read_csv('reviews.csv')
# Remove rows where the review is missing
df = df.dropna(subset=['review'])
df.head()

## 2. Sentiment with TextBlob

In [None]:
# Map TextBlob's polarity score to a sentiment label
def analyze_textblob(text):
    # Polarity ranges from -1 (negative) to +1 (positive);
    # scores between -0.1 and 0.1 are treated as neutral
    polarity = TextBlob(str(text)).sentiment.polarity
    if polarity > 0.1:
        return "POSITIVE"
    elif polarity < -0.1:
        return "NEGATIVE"
    else:
        return "NEUTRAL"

df['sentiment_textblob'] = df['review'].apply(analyze_textblob)
df[['review', 'sentiment_textblob']].head()
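
Separating the thresholding from the TextBlob call makes the width of the neutral band easy to tune and to check on known scores. The sketch below is illustrative only: `label_from_polarity` and its `threshold` parameter are names introduced here, with the default matching the 0.1 cutoff used above.

```python
def label_from_polarity(polarity, threshold=0.1):
    """Convert a polarity score in [-1, 1] to a label.

    Scores inside [-threshold, threshold] count as NEUTRAL.
    """
    if polarity > threshold:
        return "POSITIVE"
    if polarity < -threshold:
        return "NEGATIVE"
    return "NEUTRAL"


# Boundary values fall inside the neutral band because the comparisons are strict
for score in (0.5, 0.1, 0.0, -0.1, -0.35):
    print(score, label_from_polarity(score))
```

A wider band (for example `threshold=0.25`) labels more reviews NEUTRAL, which matters later when the results are compared against a classifier that has no neutral class.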

## 3. Sentiment with Hugging Face Transformers

In [None]:
# Load the pre-trained Hugging Face sentiment pipeline.
# With no model specified, the library falls back to a default English model
# that returns only POSITIVE or NEGATIVE (there is no neutral class).
hf_sentiment = pipeline("sentiment-analysis")

# Classify each review one at a time (slow on large datasets).
# truncation=True clips reviews longer than the model's input limit
# instead of raising an error.
df['sentiment_huggingface'] = df['review'].apply(
    lambda x: hf_sentiment(x, truncation=True)[0]['label'].upper()
)
df[['review', 'sentiment_textblob', 'sentiment_huggingface']].head()
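
Calling the pipeline once per row is simple but slow. A possible alternative is to pass reviews in chunks, since the pipeline accepts a list of strings and returns one result dict per input. The helper below is a sketch under that assumption: `classify_in_batches` and `batch_size` are hypothetical names, and `classifier` is expected to be a callable such as `hf_sentiment`.

```python
def classify_in_batches(texts, classifier, batch_size=32):
    """Return one upper-cased label per text, classifying in chunks.

    `classifier` is assumed to take a list of strings and return a list
    of dicts that each contain a 'label' key.
    """
    labels = []
    for start in range(0, len(texts), batch_size):
        batch = texts[start:start + batch_size]
        # truncation=True clips inputs longer than the model's limit
        results = classifier(batch, truncation=True)
        labels.extend(result["label"].upper() for result in results)
    return labels
```

In this notebook it could be used as `df['sentiment_huggingface'] = classify_in_batches(df['review'].tolist(), hf_sentiment)`. Whether this is faster in practice depends on the hardware and the length of the reviews.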

## 4. Compare the Results

In [None]:
# Flag the rows where both methods assign the same label.
# Note: the default Hugging Face model never outputs NEUTRAL, so every
# review that TextBlob labels NEUTRAL counts as a disagreement.
df['agree'] = df['sentiment_textblob'] == df['sentiment_huggingface']
agreement_rate = df['agree'].mean()
print(f"TextBlob and Hugging Face agree on {agreement_rate:.2%} of the reviews")
df[['review', 'sentiment_textblob', 'sentiment_huggingface', 'agree']].head(10)
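
A single agreement rate hides where the two methods differ, and it is pulled down by reviews that TextBlob labels NEUTRAL when the other classifier has no such class. One way to look closer, sketched here with the hypothetical helper `agreement_summary`, is to build a cross-tabulation and a second rate that skips the NEUTRAL rows.

```python
import pandas as pd


def agreement_summary(df, col_a="sentiment_textblob", col_b="sentiment_huggingface"):
    """Return (crosstab, overall agreement, agreement excluding NEUTRAL in col_a)."""
    # Rows: labels from col_a; columns: labels from col_b
    table = pd.crosstab(df[col_a], df[col_b])
    overall = (df[col_a] == df[col_b]).mean()
    # Keep only the rows where col_a made a positive/negative call
    decided = df[df[col_a] != "NEUTRAL"]
    if len(decided) > 0:
        decided_rate = (decided[col_a] == decided[col_b]).mean()
    else:
        decided_rate = float("nan")
    return table, overall, decided_rate
```

Calling `table, overall, decided_rate = agreement_summary(df)` and displaying `table` shows, for instance, how many reviews TextBlob called NEUTRAL that the transformer model called NEGATIVE.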

## 5. Text Classification (e.g. topic prediction)

In [None]:
# Load the zero-shot classification pipeline.
# By default the labels compete with each other (scores sum to 1),
# so each review gets a single best label.
classifier = pipeline("zero-shot-classification")
# Customize these labels depending on the topic of your reviews
labels = ["product quality", "customer service", "delivery", "pricing"]

# Apply to the first 5 reviews as a demo
for review in df['review'].head(5):
    # Score the review against every candidate label (highest score first)
    result = classifier(review, candidate_labels=labels)
    # Show the best-scoring label
    print(f"\nReview: {review}\nTop Label: {result['labels'][0]} with score {result['scores'][0]:.2f}")
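
Printing works for a quick demo, but collecting the results makes them easier to filter or join back onto the reviews. The sketch below assumes the classifier returns a dict with `labels` and `scores` sorted from highest to lowest score, as the loop above relies on. The helper `top_labels` and its `min_score` cutoff are hypothetical additions, not part of the original notebook.

```python
def top_labels(reviews, classifier, labels, min_score=0.0, multi_label=False):
    """Return one dict per review with its best label and that label's score.

    If the best score is below min_score, 'top_label' is None.
    """
    rows = []
    for review in reviews:
        result = classifier(
            review, candidate_labels=labels, multi_label=multi_label
        )
        # Results are ordered by descending score, so index 0 is the best match
        best_label, best_score = result["labels"][0], result["scores"][0]
        rows.append({
            "review": review,
            "top_label": best_label if best_score >= min_score else None,
            "score": best_score,
        })
    return rows
```

For example, `pd.DataFrame(top_labels(df['review'].head(5), classifier, labels, min_score=0.5))` would produce a small table of topics. Setting `multi_label=True` scores each label independently, which may suit reviews that touch on several topics at once.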