# Amazon Product Reviews NLP Analysis with spaCy
## NER and Sentiment Analysis

This Jupyter notebook provides a detailed walkthrough of the `npl_spacify_assignment.py` script, which performs Named Entity Recognition (NER) and Sentiment Analysis on Amazon product reviews using spaCy and other NLP libraries.

The analysis includes:
- **Entity Extraction**: Identifying product names, brands, and organizations from review text.
- **Sentiment Analysis**: Determining the sentiment (positive, negative, neutral) of reviews using both TextBlob and VADER.
- **Visualization**: Displaying extracted entities in the text.
- **Batch Analysis**: Processing multiple reviews and summarizing results.

We'll break down the code step-by-step, explaining each component and demonstrating its functionality.

## Step 1: Importing Required Libraries

First, we import all necessary libraries for NLP processing, data manipulation, and sentiment analysis.

- `spacy`: Core NLP library for tokenization, POS tagging, and NER.
- `pandas`: For data manipulation and creating DataFrames.
- `nltk`: Natural Language Toolkit, used here for VADER sentiment analysis.
- `displacy`: spaCy's visualization tool for entities.
- `Matcher`: For pattern-based entity matching.
- `TextBlob`: Rule-based sentiment analysis.
- `SentimentIntensityAnalyzer`: VADER sentiment analyzer from NLTK.
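
Before running the import cell, it can help to confirm that each package is actually installed. The following is a minimal pre-flight sketch using only the standard library; the helper name `find_missing_packages` is hypothetical and not part of the original script.

```python
from importlib.util import find_spec

# Top-level import names used by the notebook
# (displacy and Matcher ship with spacy; VADER ships with nltk)
REQUIRED_PACKAGES = ["spacy", "pandas", "nltk", "textblob"]


def find_missing_packages(names):
    """Return the subset of `names` that cannot be found on the import path."""
    return [name for name in names if find_spec(name) is None]


missing = find_missing_packages(REQUIRED_PACKAGES)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are installed")
```

This check only looks for the packages themselves; the spaCy model `en_core_web_sm` and the VADER lexicon are downloaded separately, as the cells below show.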

In [1]:
# Import essential libraries for Natural Language Processing
import spacy  # Core NLP library for tokenization, POS tagging, and Named Entity Recognition
import pandas as pd  # Data manipulation library for creating and managing DataFrames
import nltk  # Natural Language Toolkit for various NLP tasks

# Import specific components from spaCy
from spacy import displacy  # Visualization tool for displaying entities and dependencies
from spacy.matcher import Matcher  # Pattern-based entity matching for custom entity recognition

# Import sentiment analysis libraries
from textblob import TextBlob  # Rule-based sentiment analysis library
from nltk.sentiment.vader import SentimentIntensityAnalyzer  # VADER sentiment analyzer (optimized for social media/reviews)

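# Download required NLTK data (run once)
# VADER lexicon contains sentiment scores for words, essential for VADER sentiment analysis
# nltk.download returns False on failure, so check the result as well as catching errors
try:
    downloaded = nltk.download('vader_lexicon')
except Exception as exc:
    downloaded = False
    print(f"NLTK download raised an error: {exc}")
print("VADER lexicon downloaded successfully" if downloaded else "VADER lexicon download failed")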

VADER lexicon downloaded successfully


[nltk_data] Downloading package vader_lexicon to C:\Users\Peter
[nltk_data]     Mwaura\AppData\Roaming\nltk_data...
[nltk_data]   Package vader_lexicon is already up-to-date!


## Step 2: Defining the AmazonReviewAnalyzer Class

We define a class `AmazonReviewAnalyzer` that encapsulates all the analysis functionality. This class will handle:
- Loading the spaCy model.
- Setting up pattern matching for product terms.
- Extracting entities.
- Analyzing sentiment.
- Visualizing results.

Let's start with the class initialization.

In [2]:
class AmazonReviewAnalyzer:
    """
    A comprehensive NLP analyzer for Amazon product reviews.
    
    This class combines Named Entity Recognition (NER) and Sentiment Analysis
    to extract meaningful insights from product review text. It uses spaCy for
    entity extraction and both TextBlob and VADER for sentiment analysis.
    """
    
    def __init__(self):
        """
        Initialize the analyzer with spaCy model and sentiment analyzer.
        
        This constructor:
        1. Loads the spaCy English model (en_core_web_sm)
        2. Initializes VADER sentiment analyzer
        3. Sets up custom pattern matcher for product recognition
        """
        # Load spaCy's English language model
        # en_core_web_sm is a small English model with NER, POS tagging, and parsing
        try:
            self.nlp = spacy.load("en_core_web_sm")
            print("spaCy model loaded successfully")
        except OSError:
            print("Please download spaCy model first: python -m spacy download en_core_web_sm")
            raise  # Re-raise so callers never receive a half-initialized analyzer
        
        # Initialize VADER sentiment analyzer
        # VADER is specifically designed for social media text and reviews
        self.sentiment_analyzer = SentimentIntensityAnalyzer()
        
        # Set up custom pattern matcher for enhanced product recognition
        self._setup_matcher()
    
    def _setup_matcher(self):
        """
        Setup pattern matcher for enhanced product recognition.
        
        This method creates custom patterns to identify common product terms
        that might not be caught by spaCy's built-in NER. It focuses on
        technology products commonly mentioned in Amazon reviews.
        """
        # Create a Matcher object using spaCy's vocabulary
        self.matcher = Matcher(self.nlp.vocab)
        
        # Define patterns for common product types in Amazon reviews
        # Each pattern is a list of token attributes to match
        product_patterns = [
            # Apple products
            [{"LOWER": "iphone"}], [{"LOWER": "macbook"}], 
            [{"LOWER": "ipad"}], [{"LOWER": "airpods"}],
            
            # Amazon products
            [{"LOWER": "kindle"}], [{"LOWER": "echo"}],
            [{"LOWER": "fire"}, {"LOWER": "tv"}],  # Fire TV
            
            # Samsung products
            [{"LOWER": "galaxy"}, {"LOWER": "phone"}],
            [{"LOWER": "galaxy"}, {"LOWER": "s"}],  # "Galaxy S" as two tokens; "Galaxy S23" is not matched because "S23" is a single token
            
            # Google products
            [{"LOWER": "pixel"}, {"LOWER": "phone"}],
            
            # Gaming consoles
            [{"LOWER": "playstation"}], [{"LOWER": "xbox"}],
            [{"LOWER": "nintendo"}, {"LOWER": "switch"}]
        ]
        
        # Add the patterns to the matcher with a label "PRODUCT_TERMS"
        self.matcher.add("PRODUCT_TERMS", product_patterns)
    
    def extract_entities(self, review_text):
        """
        Extract product names and brands from review text using spaCy NER.
        
        This method uses spaCy's built-in Named Entity Recognition to identify
        organizations (ORG) and products (PRODUCT) in the review text.
        
        Args:
            review_text (str): The review text to analyze
            
        Returns:
            list: List of dictionaries containing entity information
        """
        # Process the text with spaCy's NLP pipeline
        doc = self.nlp(review_text)
        entities = []
        
        # Extract entities using spaCy's built-in NER
        # doc.ents contains all recognized entities
        for ent in doc.ents:
            # Filter for relevant entity types
            if ent.label_ in ["ORG", "PRODUCT"]:
                entities.append({
                    'text': ent.text,        # The actual text of the entity
                    'label': ent.label_,     # The entity type (ORG, PRODUCT, etc.)
                    'start': ent.start_char, # Character start position
                    'end': ent.end_char      # Character end position
                })
        
        return entities
    
    def extract_entities_enhanced(self, review_text):
        """
        Enhanced entity extraction combining spaCy NER and pattern matching.
        
        This method combines:
        1. Custom pattern matching for specific product terms
        2. spaCy's built-in NER for general entities
        3. Geographic entities (GPE) for location mentions
        
        Args:
            review_text (str): The review text to analyze
            
        Returns:
            list: List of dictionaries containing entity information
        """
        # Process the text with spaCy's NLP pipeline
        doc = self.nlp(review_text)
        entities = []
        
        # Use pattern matcher for product recognition
        # This finds custom patterns we defined in _setup_matcher
        matches = self.matcher(doc)
        for match_id, start, end in matches:
            span = doc[start:end]  # Get the matched text span
            entities.append({
                'text': span.text,
                'label': 'PRODUCT',  # Custom label for our patterns
                'start': span.start_char,  # Character start position (same unit as the NER entries below)
                'end': span.end_char       # Character end position
            })
        
        # Add spaCy's built-in entities
        for ent in doc.ents:
            # Include organizations, products, and geographic entities
            if ent.label_ in ["ORG", "PRODUCT", "GPE"]:
                # Avoid duplicates by checking if entity already exists
                if not any(e['text'] == ent.text and e['label'] == ent.label_ for e in entities):
                    entities.append({
                        'text': ent.text,
                        'label': ent.label_,
                        'start': ent.start_char,
                        'end': ent.end_char
                    })
        
        return entities
    
    def analyze_sentiment_textblob(self, review_text):
        """
        Analyze sentiment using TextBlob (rule-based approach).
        
        TextBlob uses a rule-based approach to sentiment analysis, calculating
        polarity (-1 to 1) and subjectivity (0 to 1) scores.
        
        Args:
            review_text (str): The review text to analyze
            
        Returns:
            dict: Dictionary containing sentiment, polarity, and subjectivity
        """
        # Create TextBlob object for sentiment analysis
        blob = TextBlob(review_text)
        polarity = blob.sentiment.polarity  # Range: -1 (negative) to 1 (positive)
        
        # Classify sentiment based on polarity thresholds
        if polarity > 0.1:
            sentiment = "POSITIVE"
        elif polarity < -0.1:
            sentiment = "NEGATIVE"
        else:
            sentiment = "NEUTRAL"
        
        return {
            'sentiment': sentiment,
            'polarity': polarity,
            'subjectivity': blob.sentiment.subjectivity  # Range: 0 (objective) to 1 (subjective)
        }
    
    def analyze_sentiment_vader(self, review_text):
        """
        Analyze sentiment using VADER (Valence Aware Dictionary and sEntiment Reasoner).
        
        VADER is specifically designed for social media text and reviews. It considers:
        - Punctuation (e.g., "good!!!" vs "good.")
        - Capitalization (e.g., "GOOD" vs "good")
        - Degree modifiers (e.g., "very good" vs "good")
        
        Args:
            review_text (str): The review text to analyze
            
        Returns:
            dict: Dictionary containing sentiment and detailed scores
        """
        # Get sentiment scores from VADER
        scores = self.sentiment_analyzer.polarity_scores(review_text)
        compound_score = scores['compound']  # Overall sentiment score (-1 to 1)
        
        # Classify sentiment based on compound score thresholds
        if compound_score >= 0.05:
            sentiment = "POSITIVE"
        elif compound_score <= -0.05:
            sentiment = "NEGATIVE"
        else:
            sentiment = "NEUTRAL"
        
        return {
            'sentiment': sentiment,
            'compound_score': compound_score,
            'scores': scores  # Contains pos, neg, neu, and compound scores
        }
    
    def analyze_review_complete(self, review_text, use_enhanced_ner=True):
        """
        Complete analysis: NER + Sentiment Analysis.
        
        This method combines entity extraction and sentiment analysis into
        a single comprehensive analysis of the review text.
        
        Args:
            review_text (str): The review text to analyze
            use_enhanced_ner (bool): Whether to use enhanced NER or basic NER
            
        Returns:
            dict: Complete analysis results including entities and sentiment
        """
        # Extract entities using chosen method
        if use_enhanced_ner:
            entities = self.extract_entities_enhanced(review_text)
        else:
            entities = self.extract_entities(review_text)
        
        # Analyze sentiment using VADER, which is designed for short, informal text such as reviews
        sentiment_result = self.analyze_sentiment_vader(review_text)
        
        return {
            'review_text': review_text,
            'entities': entities,
            'sentiment': sentiment_result
        }
    
    def visualize_entities(self, review_text):
        """
        Visualize named entities in the review text.
        
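        This method uses spaCy's displacy to render entities in the text.
        With jupyter=False, displacy returns the markup as a string instead of
        displaying it, so the string is returned for the caller to show or save.
        
        Args:
            review_text (str): The review text to visualize
            
        Returns:
            str: HTML markup with the entities highlighted
        """
        # Process text and render entities to an HTML string
        doc = self.nlp(review_text)
        return displacy.render(doc, style="ent", jupyter=False)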
    
    def analyze_multiple_reviews(self, reviews):
        """
        Analyze multiple reviews and return results as DataFrame.
        
        This method processes a list of reviews and returns a pandas DataFrame
        with summarized results for easy analysis and visualization.
        
        Args:
            reviews (list): List of review texts to analyze
            
        Returns:
            pd.DataFrame: DataFrame with analysis results for each review
        """
        results = []
        
        # Process each review
        for review in reviews:
            analysis = self.analyze_review_complete(review)
            
            # Extract key information for DataFrame
            results.append({
                'review': review,
                'sentiment': analysis['sentiment']['sentiment'],
                'sentiment_score': analysis['sentiment']['compound_score'],
                'entities': [f"{ent['text']} ({ent['label']})" for ent in analysis['entities']],
                'entity_count': len(analysis['entities'])
            })
        
        # Convert results to pandas DataFrame for easy analysis
        return pd.DataFrame(results)

### Step 2.1: Setting Up the Pattern Matcher

The `_setup_matcher` method creates a `Matcher` object to identify common product terms that might not be caught by spaCy's built-in NER. This enhances entity recognition for specific product categories like phones, tablets, and gaming consoles.

In [None]:
# The _setup_matcher method is now included in the main class definition above
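
To make the `LOWER` patterns more concrete, the sketch below mimics in plain Python how a sequence of lowercase token forms is matched against a text. It is an approximation, not spaCy's `Matcher`: `tokenize` and `match_lower_patterns` are hypothetical helpers, and the regex tokenizer is a deliberate simplification of spaCy's tokenizer.

```python
import re


def tokenize(text):
    """Very rough tokenizer: runs of letters and digits (spaCy's is more sophisticated)."""
    return re.findall(r"[A-Za-z0-9]+", text)


def match_lower_patterns(tokens, patterns):
    """Return (start, end) token spans where a pattern of lowercase forms matches."""
    lowered = [token.lower() for token in tokens]
    spans = []
    for pattern in patterns:
        width = len(pattern)
        for i in range(len(lowered) - width + 1):
            if lowered[i:i + width] == pattern:
                spans.append((i, i + width))
    return sorted(spans)


# Each pattern lists the lowercase form expected at consecutive token positions
patterns = [["kindle"], ["nintendo", "switch"], ["galaxy", "s"]]

tokens = tokenize("My Nintendo Switch and KINDLE beat the Galaxy S23")
matches = [" ".join(tokens[s:e]) for s, e in match_lower_patterns(tokens, patterns)]
print(matches)  # ['Nintendo Switch', 'KINDLE']
```

Matching is case-insensitive because both sides are lowercased. In this sketch "Galaxy S23" does not match the two-token `galaxy` + `s` pattern, since `S23` is one token; it is worth checking how the real tokenizer splits model names before relying on such a pattern.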

### Step 2.2: Basic Entity Extraction

The `extract_entities` method uses spaCy's built-in Named Entity Recognition to identify organizations (ORG) and products (PRODUCT) in the review text.

In [None]:
# The extract_entities method is now included in the main class definition above
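
The `start` and `end` values returned by `extract_entities` are character offsets into the original review. As a small illustration of what they allow, the sketch below slices the review back to the entity text and pulls some surrounding context. The entity record is hand-written in the same shape as the method's output rather than produced by spaCy, and `entity_context` is a hypothetical helper.

```python
review_text = "The Sony headphones broke after just two weeks."

# Hand-written entity record in the shape extract_entities returns
# (illustrative values, not actual spaCy output)
entities = [
    {"text": "Sony", "label": "ORG", "start": 4, "end": 8},
]


def entity_context(text, entity, window=10):
    """Return the entity with up to `window` characters of context on each side."""
    left = max(0, entity["start"] - window)
    right = min(len(text), entity["end"] + window)
    return text[left:right]


for ent in entities:
    # Character offsets slice the original string back to the entity text
    assert review_text[ent["start"]:ent["end"]] == ent["text"]
    print(ent["label"], "->", repr(entity_context(review_text, ent)))
```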

### Step 2.3: Enhanced Entity Extraction

The `extract_entities_enhanced` method combines spaCy's NER with the custom pattern matcher for more comprehensive entity recognition, including geographic entities (GPE).

In [None]:
# The extract_entities_enhanced method is now included in the main class definition above
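
The duplicate check in `extract_entities_enhanced` compares the exact `(text, label)` pair. The sketch below isolates that merge step on hand-built lists so its effect is easy to see; `merge_entities` is a hypothetical helper, and the sample records are shaped like the Review 3 results in the recorded output, with offsets omitted for brevity.

```python
def merge_entities(pattern_hits, ner_hits):
    """Combine two entity lists, skipping NER hits whose (text, label) pair already exists."""
    merged = list(pattern_hits)
    seen = {(ent["text"], ent["label"]) for ent in merged}
    for ent in ner_hits:
        key = (ent["text"], ent["label"])
        if key not in seen:
            seen.add(key)
            merged.append(ent)
    return merged


# Illustrative inputs (not produced by running spaCy here)
pattern_hits = [{"text": "Kindle", "label": "PRODUCT"}]
ner_hits = [
    {"text": "Kindle Paperwhite", "label": "ORG"},
    {"text": "Kindle", "label": "PRODUCT"},  # exact duplicate, dropped
]

merged = merge_entities(pattern_hits, ner_hits)
print([f"{ent['text']} ({ent['label']})" for ent in merged])
# ['Kindle (PRODUCT)', 'Kindle Paperwhite (ORG)']
```

Because only exact pairs count as duplicates, overlapping mentions such as "Kindle" and "Kindle Paperwhite" both survive, which matches what the recorded output shows for Review 3.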

### Step 2.4: Sentiment Analysis with TextBlob

The `analyze_sentiment_textblob` method uses TextBlob for rule-based sentiment analysis, providing polarity and subjectivity scores.

In [None]:
# The analyze_sentiment_textblob method is now included in the main class definition above
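
The classification step in `analyze_sentiment_textblob` is a strict threshold on polarity with a neutral band of ±0.1. The sketch below isolates that rule so the boundary behavior can be checked without installing TextBlob; `classify_polarity` is a hypothetical helper and the scores are made-up values chosen to exercise each branch.

```python
def classify_polarity(polarity, threshold=0.1):
    """Map a polarity score in [-1, 1] to a label using a symmetric neutral band."""
    if polarity > threshold:
        return "POSITIVE"
    if polarity < -threshold:
        return "NEGATIVE"
    return "NEUTRAL"


# Made-up polarity values, one per branch plus the boundary
for score in (0.6, 0.1, -0.05, -0.4):
    print(f"{score:+.2f} -> {classify_polarity(score)}")
```

Note that a score of exactly 0.1 falls in the neutral band, because the comparisons are strict.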

### Step 2.5: Sentiment Analysis with VADER

The `analyze_sentiment_vader` method uses VADER (Valence Aware Dictionary and sEntiment Reasoner), which is specifically designed for social media and review text.

In [None]:
# The analyze_sentiment_vader method is now included in the main class definition above
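
`analyze_sentiment_vader` labels a review from the compound score alone, using inclusive thresholds at ±0.05. The sketch below reproduces only that rule; `classify_compound` is a hypothetical helper, the first three scores are the rounded values shown for Reviews 1 to 3 in the recorded run in Step 4, and the remaining values are boundary cases added for illustration.

```python
def classify_compound(compound, threshold=0.05):
    """Map a VADER-style compound score in [-1, 1] to a label (inclusive thresholds)."""
    if compound >= threshold:
        return "POSITIVE"
    if compound <= -threshold:
        return "NEGATIVE"
    return "NEUTRAL"


# 0.273, 0.891, 0.848: rounded scores from the recorded run; the rest are boundary cases
for score in (0.273, 0.891, 0.848, 0.05, 0.0, -0.05):
    print(f"{score:+.3f} -> {classify_compound(score)}")
```

Because a single number decides the label, a mixed review such as Review 1 (compound 0.273) is reported as POSITIVE even though it also contains a complaint.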

### Step 2.6: Complete Review Analysis

The `analyze_review_complete` method combines entity extraction and sentiment analysis into a single comprehensive analysis.

In [None]:
# The analyze_review_complete method is now included in the main class definition above
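
The dictionary returned by `analyze_review_complete` nests the sentiment result under `'sentiment'` and the entity records under `'entities'`. As a sketch of how a caller might consume it, the hypothetical helper `summarize` below formats one result as a single line. The sample dictionary is hand-built to mirror Review 1 in the recorded output; entity offsets and the per-class VADER scores are left out because the output does not show them.

```python
def summarize(result):
    """Format an analyze_review_complete-style result as a one-line summary."""
    sentiment = result["sentiment"]
    names = ", ".join(f"{ent['text']} ({ent['label']})" for ent in result["entities"])
    return f"{sentiment['sentiment']} ({sentiment['compound_score']:.3f}) | entities: {names or 'none'}"


# Hand-built result mirroring Review 1 (illustrative; not produced by running the analyzer here)
result = {
    "review_text": (
        "The Apple iPhone 15 has an amazing camera but the battery life is terrible. "
        "I expected better from Apple."
    ),
    "entities": [
        {"text": "iPhone", "label": "PRODUCT"},
        {"text": "Apple", "label": "ORG"},
    ],
    "sentiment": {"sentiment": "POSITIVE", "compound_score": 0.273},
}

print(summarize(result))
# POSITIVE (0.273) | entities: iPhone (PRODUCT), Apple (ORG)
```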

### Step 2.7: Entity Visualization

The `visualize_entities` method uses spaCy's displacy to render entities in the text. Note: This works best in Jupyter notebooks.

In [None]:
# The visualize_entities method is now included in the main class definition above
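
When HTML rendering is not available, for example in a terminal, a plain-text fallback can convey the same information. The sketch below is not displacy; it is a hypothetical helper, `annotate_entities`, that wraps each entity in brackets using the character offsets, assuming the entities do not overlap. The entity records are hand-written for illustration.

```python
def annotate_entities(text, entities):
    """Wrap each entity as [text|LABEL] using character offsets (non-overlapping entities)."""
    annotated = text
    # Work right to left so earlier offsets stay valid after each insertion
    for ent in sorted(entities, key=lambda e: e["start"], reverse=True):
        start, end = ent["start"], ent["end"]
        annotated = f"{annotated[:start]}[{annotated[start:end]}|{ent['label']}]{annotated[end:]}"
    return annotated


text = "Microsoft Surface Pro costs more than Apple iPad."

# Hand-written entity records (illustrative, not actual spaCy output)
entities = [
    {"text": "Microsoft", "label": "ORG", "start": 0, "end": 9},
    {"text": "Apple", "label": "ORG", "start": 38, "end": 43},
]

print(annotate_entities(text, entities))
# [Microsoft|ORG] Surface Pro costs more than [Apple|ORG] iPad.
```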

### Step 2.8: Batch Analysis of Multiple Reviews

The `analyze_multiple_reviews` method processes a list of reviews and returns a pandas DataFrame with summarized results.

In [None]:
# The analyze_multiple_reviews method is now included in the main class definition above
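
The summary that `main` later derives from the DataFrame, a sentiment distribution and a total entity count, can be sketched on plain dictionaries. The example below uses `collections.Counter` as a dependency-free stand-in for `value_counts()` and `sum()`; `summarize_rows` is a hypothetical helper. The first three rows follow the sentiment labels, rounded scores, and entity counts shown for Reviews 1 to 3 in the recorded output, and the last row is made up to add a second class.

```python
from collections import Counter

# Rows shaped like the records built in analyze_multiple_reviews (review text and entity lists omitted)
rows = [
    {"sentiment": "POSITIVE", "sentiment_score": 0.273, "entity_count": 2},
    {"sentiment": "POSITIVE", "sentiment_score": 0.891, "entity_count": 1},
    {"sentiment": "POSITIVE", "sentiment_score": 0.848, "entity_count": 2},
    {"sentiment": "NEGATIVE", "sentiment_score": -0.5, "entity_count": 1},  # made-up row
]


def summarize_rows(rows):
    """Return (sentiment counts, total entity count) for a list of result rows."""
    counts = Counter(row["sentiment"] for row in rows)
    total_entities = sum(row["entity_count"] for row in rows)
    return counts, total_entities


counts, total_entities = summarize_rows(rows)
for sentiment, count in counts.most_common():
    print(f"  {sentiment}: {count} reviews")
print(f"Total Entities Found: {total_entities}")
```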

## Step 3: Main Function and Demonstration

The `main` function demonstrates the analyzer by processing sample Amazon reviews. It shows individual analysis and summary statistics.

In [3]:
def main():
    """
    Main function to demonstrate the Amazon Review NLP Analysis.
    
    This function showcases the complete workflow of the analyzer by:
    1. Initializing the analyzer
    2. Processing sample Amazon reviews
    3. Displaying individual analysis results
    4. Creating summary statistics
    5. Demonstrating entity visualization
    """
    print("Amazon Product Reviews NLP Analysis")
    print("=" * 50)
    
    # Initialize the analyzer
    # This loads the spaCy model, sets up VADER, and configures pattern matching
    analyzer = AmazonReviewAnalyzer()
    
    # Sample Amazon reviews for demonstration
    # These reviews contain various product mentions and sentiment expressions
    sample_reviews = [
        # Mixed sentiment review mentioning Apple iPhone
        "The Apple iPhone 15 has an amazing camera but the battery life is terrible. I expected better from Apple.",
        
        # Positive review comparing Samsung and Google products
        "Samsung Galaxy S23 is fantastic! The display is brilliant and the performance is smooth. Much better than Google Pixel.",
        
        # Positive review about Amazon Kindle
        "I love my new Kindle Paperwhite for reading books at night! The backlight is perfect.",
        
        # Negative review about Sony product
        "The Sony headphones broke after just two weeks. Very disappointed with the quality.",
        
        # Mixed sentiment comparing Microsoft and Apple products
        "Microsoft Surface Pro is a great device for work, but the price is too high compared to Apple iPad.",
        
        # Positive review about Amazon Echo
        "Bought this Amazon Echo Dot and it's been working perfectly. Alexa understands all my commands.",
        
        # Mixed sentiment about Nintendo Switch
        "The battery on this Nintendo Switch doesn't last long. Otherwise, it's a good gaming console.",
        
        # Mixed sentiment about Google Pixel
        "Google Pixel camera is outstanding but the software has too many bugs.",
        
        # Positive review about MacBook
        "MacBook Air is lightweight and fast, perfect for students and professionals.",
        
        # Negative review about Dell laptop
        "The Dell laptop overheats constantly and the customer service was unhelpful."
    ]
    
    print(f"Analyzing {len(sample_reviews)} sample reviews...")
    print()
    
    # Analyze each review individually
    # This demonstrates the complete analysis workflow for each review
    for i, review in enumerate(sample_reviews, 1):
        print(f"REVIEW {i}:")
        print(f"Text: {review}")
        
        # Perform complete analysis using enhanced NER and VADER sentiment
        result = analyzer.analyze_review_complete(review)
        
        # Display sentiment analysis results
        sentiment = result['sentiment']
        print(f"Sentiment: {sentiment['sentiment']} (Score: {sentiment['compound_score']:.3f})")
        
        # Display extracted entities
        entities = result['entities']
        if entities:
            print("Entities Found:")
            for entity in entities:
                print(f"  - {entity['text']} ({entity['label']})")
        else:
            print("No relevant entities found.")
        
        print("-" * 80)
        print()
    
    # Create summary analysis using batch processing
    print("SUMMARY ANALYSIS")
    print("=" * 50)
    
    # Process all reviews in batch and create DataFrame
    df_results = analyzer.analyze_multiple_reviews(sample_reviews)
    
    # Calculate and display sentiment distribution
    print("\nSentiment Distribution:")
    sentiment_counts = df_results['sentiment'].value_counts()
    for sentiment, count in sentiment_counts.items():
        print(f"  {sentiment}: {count} reviews")
    
    # Display total entity count
    print(f"\nTotal Entities Found: {df_results['entity_count'].sum()}")
    
    # Show detailed results in a tabular format
    print("\nDetailed Results Table:")
    print(df_results[['review', 'sentiment', 'sentiment_score', 'entities']].to_string(index=False))
    
    # Demonstrate entity visualization with a specific example
    print("\n" + "="*50)
    print("ENTITY VISUALIZATION EXAMPLE")
    print("="*50)
    
    # Create an example review with multiple product mentions
    example_review = "Apple iPhone and Samsung Galaxy are both great phones, but I prefer Google Pixel for its camera."
    print(f"Review: {example_review}")
    print("\nEntity visualization (run in Jupyter for better display):")
    
    # Process the example review and display entities
    # This shows how spaCy identifies different entity types
    doc = analyzer.nlp(example_review)
    print("Entities found:")
    for ent in doc.ents:
        if ent.label_ in ["ORG", "PRODUCT", "GPE"]:
            print(f"  {ent.text} - {ent.label_}")

## Step 4: Running the Analysis

Now, let's execute the main function to see the analysis in action. This will process the sample reviews and display the results.

In [4]:
# Execute the main function when the script is run directly
# This ensures the analysis runs only when the notebook is executed, not when imported
if __name__ == "__main__":
    main()

Amazon Product Reviews NLP Analysis
spaCy model loaded successfully
Analyzing 10 sample reviews...

REVIEW 1:
Text: The Apple iPhone 15 has an amazing camera but the battery life is terrible. I expected better from Apple.
Sentiment: POSITIVE (Score: 0.273)
Entities Found:
  - iPhone (PRODUCT)
  - Apple (ORG)
--------------------------------------------------------------------------------

REVIEW 2:
Text: Samsung Galaxy S23 is fantastic! The display is brilliant and the performance is smooth. Much better than Google Pixel.
Sentiment: POSITIVE (Score: 0.891)
Entities Found:
  - Samsung Galaxy S23 (ORG)
--------------------------------------------------------------------------------

REVIEW 3:
Text: I love my new Kindle Paperwhite for reading books at night! The backlight is perfect.
Sentiment: POSITIVE (Score: 0.848)
Entities Found:
  - Kindle (PRODUCT)
  - Kindle Paperwhite (ORG)
--------------------------------------------------------------------------------

REVIEW 4:
Text: The Sony h