# Sentiment Analysis with Python

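This notebook walks through three approaches to sentiment analysis in Python (a rule-based scorer, a classical machine learning classifier, and a pre-trained transformer) and compares their predictions on a small set of movie reviews.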

## What you'll learn:
- Rule-based sentiment analysis with TextBlob
- Machine learning approach with scikit-learn
- Using pre-trained models with transformers
- Evaluating sentiment analysis results

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from textblob import TextBlob
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix
import warnings
warnings.filterwarnings('ignore')

print("Libraries imported successfully!")

## Sample Data
Let's create some sample movie reviews for demonstration.

In [None]:
# Sample movie reviews
sample_reviews = [
    "This movie was absolutely fantastic! Great acting and amazing story.",
    "Terrible film. Waste of time and money. Really disappointing.",
    "Not bad, but could have been better. Average performance.",
    "Outstanding cinematography and brilliant performances by all actors.",
    "Boring and predictable. I fell asleep halfway through.",
    "Loved every minute of it! Highly recommend to everyone.",
    "The plot was confusing and the acting was poor.",
    "A masterpiece! This will be remembered for years to come.",
    "Okay movie, nothing special but watchable.",
    "Worst movie I've ever seen. Completely awful."
]

# Expected sentiments (for comparison)
expected_sentiments = ['positive', 'negative', 'neutral', 'positive', 'negative', 
                      'positive', 'negative', 'positive', 'neutral', 'negative']

# Create DataFrame
df = pd.DataFrame({
    'review': sample_reviews,
    'expected_sentiment': expected_sentiments
})

print("Sample reviews:")
for i, review in enumerate(sample_reviews):
    print(f"{i+1}. {review}")

## Method 1: Rule-based Sentiment Analysis with TextBlob
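TextBlob scores text with a built-in sentiment lexicon, so it needs no training data. It returns a polarity in [-1, 1] and a subjectivity in [0, 1]; the code below maps polarity to a label with fixed thresholds at ±0.1.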

In [None]:
def get_textblob_sentiment(text):
    """
    Get sentiment using TextBlob.
    Returns polarity (-1 to 1) and subjectivity (0 to 1)
    """
    blob = TextBlob(text)
    return blob.sentiment.polarity, blob.sentiment.subjectivity

def classify_sentiment(polarity):
    """
    Classify sentiment based on polarity score.
    """
    if polarity > 0.1:
        return 'positive'
    elif polarity < -0.1:
        return 'negative'
    else:
        return 'neutral'

# Apply TextBlob sentiment analysis
df[['polarity', 'subjectivity']] = df['review'].apply(
    lambda x: pd.Series(get_textblob_sentiment(x))
)
df['textblob_sentiment'] = df['polarity'].apply(classify_sentiment)

print("TextBlob Sentiment Analysis Results:")
print(df[['review', 'polarity', 'subjectivity', 'textblob_sentiment', 'expected_sentiment']].to_string())
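
To make the rule-based idea concrete, here is a toy sketch of lexicon scoring: look up each word's polarity, average the matches, and threshold the result the same way `classify_sentiment` does. `TOY_LEXICON`, `toy_polarity`, and `toy_classify` are hypothetical names with made-up weights; TextBlob's real analyzer uses a much larger lexicon and additional rules, so treat this as an illustration only.

```python
import re

# Hypothetical mini-lexicon (word -> polarity in [-1, 1]); values are made up.
TOY_LEXICON = {
    "fantastic": 1.0, "great": 0.8, "amazing": 0.9,
    "terrible": -1.0, "boring": -0.7, "awful": -1.0,
}

def toy_polarity(text):
    """Average the polarity of words found in the lexicon (0.0 if none match)."""
    words = re.findall(r"[a-z']+", text.lower())
    scores = [TOY_LEXICON[w] for w in words if w in TOY_LEXICON]
    return sum(scores) / len(scores) if scores else 0.0

def toy_classify(polarity, threshold=0.1):
    """Map a polarity score to a label using a symmetric neutral band."""
    if polarity > threshold:
        return "positive"
    if polarity < -threshold:
        return "negative"
    return "neutral"

for text in ["Great acting and amazing story.", "Boring and awful.", "Okay movie."]:
    score = toy_polarity(text)
    print(f"{text!r}: polarity={score:.2f} -> {toy_classify(score)}")
```

Text with no lexicon hits scores 0.0 and falls into the neutral band, which is one reason lexicon methods tend to label unfamiliar wording as neutral.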

## Visualizing Sentiment Scores

In [None]:
# Create visualizations
fig, axes = plt.subplots(2, 2, figsize=(15, 10))

# Polarity distribution
axes[0, 0].hist(df['polarity'], bins=10, color='skyblue', alpha=0.7)
axes[0, 0].set_title('Distribution of Polarity Scores')
axes[0, 0].set_xlabel('Polarity')
axes[0, 0].set_ylabel('Frequency')

# Subjectivity distribution
axes[0, 1].hist(df['subjectivity'], bins=10, color='lightcoral', alpha=0.7)
axes[0, 1].set_title('Distribution of Subjectivity Scores')
axes[0, 1].set_xlabel('Subjectivity')
axes[0, 1].set_ylabel('Frequency')

# Sentiment comparison
sentiment_comparison = pd.crosstab(df['expected_sentiment'], df['textblob_sentiment'])
sns.heatmap(sentiment_comparison, annot=True, fmt='d', cmap='Blues', ax=axes[1, 0])
axes[1, 0].set_title('Expected vs TextBlob Sentiment')

# Polarity vs Subjectivity scatter
colors = {'positive': 'green', 'negative': 'red', 'neutral': 'gray'}
for sentiment in df['expected_sentiment'].unique():
    mask = df['expected_sentiment'] == sentiment
    axes[1, 1].scatter(df.loc[mask, 'polarity'], df.loc[mask, 'subjectivity'],
                      c=colors[sentiment], label=sentiment, alpha=0.7)
axes[1, 1].set_xlabel('Polarity')
axes[1, 1].set_ylabel('Subjectivity')
axes[1, 1].set_title('Polarity vs Subjectivity')
axes[1, 1].legend()

plt.tight_layout()
plt.show()

## Method 2: Machine Learning Approach
This approach converts each review into TF-IDF features and trains a logistic regression classifier on labeled examples.

In [None]:
# For this example, let's create a larger synthetic dataset
positive_samples = [
    "Excellent movie with great acting",
    "Fantastic story and amazing visuals",
    "Loved it! Highly recommended",
    "Outstanding performance by all actors",
    "Brilliant cinematography and direction",
    "Best movie of the year",
    "Wonderful experience, great entertainment",
    "Perfect blend of action and emotion",
    "Superb acting and engaging plot",
    "Amazing special effects and sound"
]

negative_samples = [
    "Terrible movie, waste of time",
    "Boring and predictable plot",
    "Poor acting and bad direction",
    "Worst movie I've ever seen",
    "Disappointing and confusing story",
    "Awful script and terrible acting",
    "Complete disaster, avoid at all costs",
    "Poorly made with bad special effects",
    "Dull and uninteresting characters",
    "Failed to meet any expectations"
]

neutral_samples = [
    "Average movie, nothing special",
    "Okay film, could be better",
    "Not bad but not great either",
    "Decent enough for one viewing",
    "Mixed feelings about this movie",
    "Standard Hollywood production",
    "Watchable but forgettable",
    "Mediocre story with average acting",
    "Neither good nor bad",
    "Typical movie of this genre"
]

# Create training dataset
train_texts = positive_samples + negative_samples + neutral_samples
train_labels = ['positive'] * len(positive_samples) + ['negative'] * len(negative_samples) + ['neutral'] * len(neutral_samples)

print(f"Training dataset: {len(train_texts)} samples")
print(f"Positive: {len(positive_samples)}, Negative: {len(negative_samples)}, Neutral: {len(neutral_samples)}")

In [None]:
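# Split the raw texts first so the vectorizer never sees the test set
texts_train, texts_test, y_train, y_test = train_test_split(
    train_texts, train_labels, test_size=0.3, random_state=42, stratify=train_labels
)

# Fit TF-IDF on the training texts only, then transform both splits
vectorizer = TfidfVectorizer(max_features=1000, stop_words='english', lowercase=True)
X_train = vectorizer.fit_transform(texts_train)
X_test = vectorizer.transform(texts_test)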

# Train logistic regression model
model = LogisticRegression(random_state=42, max_iter=1000)
model.fit(X_train, y_train)

# Make predictions on test set
y_pred = model.predict(X_test)

print("Model Training Complete!")
print(f"Training set size: {X_train.shape[0]}")
print(f"Test set size: {X_test.shape[0]}")
print(f"Number of features: {X_train.shape[1]}")
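
For intuition about what the TF-IDF features contain, the sketch below computes the weights by hand for three made-up documents. It assumes the smoothed IDF formula `ln((1 + n) / (1 + df)) + 1` followed by L2 normalization, which is the variant scikit-learn documents as its default. The `tfidf` helper is hypothetical and skips stop-word removal and proper tokenization, so its numbers will not match the vectorizer used in this notebook.

```python
import math
from collections import Counter

def tfidf(docs):
    """Return one {term: weight} dict per document, L2-normalized."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: the number of documents each term appears in.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Term frequency times smoothed inverse document frequency.
        weights = {
            t: count * (math.log((1 + n_docs) / (1 + df[t])) + 1)
            for t, count in tf.items()
        }
        norm = math.sqrt(sum(w * w for w in weights.values())) or 1.0
        vectors.append({t: w / norm for t, w in weights.items()})
    return vectors

# Made-up documents for illustration.
docs = ["great acting great story", "boring story", "terrible acting"]
vectors = tfidf(docs)
for doc, vec in zip(docs, vectors):
    print(doc, {t: round(w, 3) for t, w in vec.items()})
```

In the first document, "great" gets the largest weight because it occurs twice there and in no other document, while "story" and "acting" are shared with other documents and are weighted down.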

In [None]:
# Evaluate the model
print("Classification Report:")
print(classification_report(y_test, y_pred))

# Confusion Matrix
cm = confusion_matrix(y_test, y_pred, labels=['positive', 'negative', 'neutral'])
plt.figure(figsize=(8, 6))
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues', 
            xticklabels=['positive', 'negative', 'neutral'],
            yticklabels=['positive', 'negative', 'neutral'])
plt.title('Confusion Matrix')
plt.xlabel('Predicted')
plt.ylabel('Actual')
plt.show()
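
The precision, recall, and F1 columns in the classification report can be reproduced from simple counts. The sketch below does this for one class at a time; `per_class_metrics` is a hypothetical helper, and the label lists are made up for illustration rather than taken from the model above.

```python
def per_class_metrics(y_true, y_pred, label):
    """Return (precision, recall, f1) for one class, treating it as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)  # correct hits
    fp = sum(1 for t, p in pairs if t != label and p == label)  # false alarms
    fn = sum(1 for t, p in pairs if t == label and p != label)  # misses
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Made-up labels for illustration only.
demo_true = ["positive", "positive", "negative", "neutral"]
demo_pred = ["positive", "negative", "negative", "neutral"]

for label in ["positive", "negative", "neutral"]:
    p, r, f1 = per_class_metrics(demo_true, demo_pred, label)
    print(f"{label:>8}: precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```

Here one positive review is misread as negative, which lowers recall for `positive` and precision for `negative` while leaving `neutral` untouched.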

In [None]:
# Test the model on our original sample reviews
sample_features = vectorizer.transform(df['review'])
df['ml_sentiment'] = model.predict(sample_features)
df['ml_confidence'] = model.predict_proba(sample_features).max(axis=1)

print("ML Model Results on Sample Reviews:")
print(df[['review', 'expected_sentiment', 'textblob_sentiment', 'ml_sentiment', 'ml_confidence']].to_string())
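
The `ml_confidence` column is the largest class probability for each review. Assuming the classifier uses a multinomial (softmax) formulation, those probabilities come from normalizing per-class scores, which the sketch below illustrates with made-up scores. The `softmax` helper and the score values are hypothetical and are not read from the trained model.

```python
import math

def softmax(scores):
    """Convert raw class scores into probabilities that sum to 1."""
    shift = max(scores)  # subtracting the max keeps exp() numerically stable
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

labels = ["negative", "neutral", "positive"]
scores = [0.2, 0.1, 0.4]  # made-up, nearly tied scores

probs = softmax(scores)
confidence = max(probs)
predicted = labels[probs.index(confidence)]
print(f"predicted={predicted}, confidence={confidence:.3f}")
```

With three classes the confidence can never fall below 1/3, so values close to that floor suggest the model had little evidence to go on, for example when a review shares few words with the training data.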

## Method 3: Using Pre-trained Transformers (Optional)
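This section uses the Hugging Face transformers library to run a pre-trained sentiment model. The first call downloads the model weights, so it needs an internet connection. The default model for this pipeline has typically been a binary classifier that returns `POSITIVE` or `NEGATIVE`, so it has no label that matches the `neutral` reviews above.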

In [None]:
# Note: This requires the transformers library to be installed
# pip install transformers torch

try:
    from transformers import pipeline
    
    # Load pre-trained sentiment analysis pipeline
    sentiment_pipeline = pipeline("sentiment-analysis")
    
    # Test on a few examples
    test_texts = df['review'].tolist()[:5]  # First 5 reviews
    
    results = sentiment_pipeline(test_texts)
    
    print("Transformer Model Results:")
    for text, result in zip(test_texts, results):
        print(f"Review: {text[:50]}...")
        print(f"Sentiment: {result['label']}, Confidence: {result['score']:.3f}")
        print("-" * 50)
        
except ImportError:
    print("Transformers library not installed. To use this section, install with:")
    print("pip install transformers torch")
except Exception as e:
    print(f"Error loading transformer model: {e}")
    print("This might be due to internet connectivity or model loading issues.")
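
To compare transformer output with the three-way labels used elsewhere in this notebook, one option is to treat low-confidence predictions as neutral. The sketch below assumes each result is a dict with `label` (`POSITIVE` or `NEGATIVE`) and `score`, as printed above. `to_three_way` and its 0.75 cutoff are hypothetical choices, not part of the transformers library.

```python
def to_three_way(result, min_confidence=0.75):
    """Map a pipeline-style result dict to 'positive', 'negative', or 'neutral'."""
    # Assumption: a low score means the model is unsure, so call it neutral.
    if result["score"] < min_confidence:
        return "neutral"
    return result["label"].lower()

# Made-up results in the same shape as the pipeline output.
example_results = [
    {"label": "POSITIVE", "score": 0.999},
    {"label": "NEGATIVE", "score": 0.998},
    {"label": "POSITIVE", "score": 0.62},
]

mapped = [to_three_way(r) for r in example_results]
print(mapped)
```

Binary models are often very confident even on lukewarm text, so a cutoff like this should be tuned on labeled examples before it is relied on.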

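## Comparing All Methods

This comparison covers TextBlob and the ML model only; the transformer is left out because Method 3 is optional and may not have run. With 10 reviews, each prediction shifts accuracy by 0.10, so read the numbers as illustrative rather than conclusive.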

In [None]:
# Calculate accuracy for each method
def calculate_accuracy(expected, predicted):
    return sum(1 for e, p in zip(expected, predicted) if e == p) / len(expected)

textblob_accuracy = calculate_accuracy(df['expected_sentiment'], df['textblob_sentiment'])
ml_accuracy = calculate_accuracy(df['expected_sentiment'], df['ml_sentiment'])

print("Method Comparison:")
print(f"TextBlob Accuracy: {textblob_accuracy:.2f}")
print(f"ML Model Accuracy: {ml_accuracy:.2f}")

# Visualization
methods = ['TextBlob', 'ML Model']
accuracies = [textblob_accuracy, ml_accuracy]

plt.figure(figsize=(8, 6))
bars = plt.bar(methods, accuracies, color=['skyblue', 'lightcoral'])
plt.title('Sentiment Analysis Method Comparison')
plt.ylabel('Accuracy')
plt.ylim(0, 1)

# Add value labels on bars
for bar, acc in zip(bars, accuracies):
    plt.text(bar.get_x() + bar.get_width()/2, bar.get_height() + 0.01, 
             f'{acc:.2f}', ha='center', va='bottom')

plt.show()
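
Accuracy is easier to interpret next to a trivial baseline. The sketch below computes the accuracy of always predicting the most frequent label, using the same expected labels as the sample reviews (repeated here so the block runs on its own). `majority_baseline_accuracy` is a hypothetical helper name.

```python
from collections import Counter

def majority_baseline_accuracy(labels):
    """Return the most frequent label and the accuracy of always predicting it."""
    most_common_label, count = Counter(labels).most_common(1)[0]
    return most_common_label, count / len(labels)

# Same values as expected_sentiments in the Sample Data section.
labels = ['positive', 'negative', 'neutral', 'positive', 'negative',
          'positive', 'negative', 'positive', 'neutral', 'negative']

baseline_label, baseline_acc = majority_baseline_accuracy(labels)
print(f"Always predicting '{baseline_label}' scores {baseline_acc:.2f}")
```

A method that does not clearly beat this baseline on the sample reviews has learned little about them.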

## Interactive Sentiment Analysis Function

In [None]:
def analyze_sentiment(text, method='all'):
    """
    Analyze sentiment of given text using different methods.
    
    Args:
        text (str): Text to analyze
        method (str): 'textblob', 'ml', or 'all'
    
    Returns:
        dict: Results from different methods
    """
    results = {}
    
    if method in ['textblob', 'all']:
        polarity, subjectivity = get_textblob_sentiment(text)
        results['textblob'] = {
            'sentiment': classify_sentiment(polarity),
            'polarity': polarity,
            'subjectivity': subjectivity
        }
    
    if method in ['ml', 'all']:
        text_features = vectorizer.transform([text])
        prediction = model.predict(text_features)[0]
        confidence = model.predict_proba(text_features).max()
        results['ml'] = {
            'sentiment': prediction,
            'confidence': confidence
        }
    
    return results

# Test the function
test_text = "This is an amazing product! I love it so much!"
result = analyze_sentiment(test_text)

print(f"Text: {test_text}")
print(f"TextBlob: {result['textblob']['sentiment']} (polarity: {result['textblob']['polarity']:.3f})")
print(f"ML Model: {result['ml']['sentiment']} (confidence: {result['ml']['confidence']:.3f})")

## Practice Exercise

Try analyzing sentiment for your own text examples:

In [None]:
# Add your own texts here and analyze their sentiment
your_texts = [
    "I had a wonderful day at the beach!",
    "The weather is so gloomy today.",
    "This restaurant serves decent food."
]

print("Your Text Analysis:")
print("=" * 50)

for text in your_texts:
    result = analyze_sentiment(text)
    print(f"Text: {text}")
    print(f"TextBlob: {result['textblob']['sentiment']} (polarity: {result['textblob']['polarity']:.3f})")
    print(f"ML Model: {result['ml']['sentiment']} (confidence: {result['ml']['confidence']:.3f})")
    print("-" * 50)

## Key Takeaways

1. **TextBlob** needs no training data and is quick to apply, but its fixed lexicon can miss sarcasm and domain-specific wording.
2. **Machine learning** approaches need labeled training data but can adapt to the vocabulary of your domain; the 30-sample dataset used here is too small to show that benefit.
3. **Pre-trained transformers** usually give the strongest accuracy but need more memory, more compute, and a model download.
4. **Context matters**: the same words can carry different sentiment in different contexts.
5. **Evaluation is important**: always test your sentiment analysis system on data that resembles what it will see in use.
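
The point about context can be shown with a small sketch. Word-count features such as single-word TF-IDF ignore word order, so two sentences with opposite meanings can look identical to the model. `bag_of_words` is a hypothetical helper, and the two sentences are made up.

```python
import re
from collections import Counter

def bag_of_words(text):
    """Count lowercase word occurrences, ignoring order and punctuation."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

a = "Not good, actually bad."
b = "Not bad, actually good."

# Opposite meanings, yet the word counts are the same.
print(bag_of_words(a) == bag_of_words(b))
```

Models that read words in sequence, such as the transformer in Method 3, are better placed to tell these two sentences apart.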

Choose the method that best fits your needs based on accuracy requirements, computational resources, and data availability.