# Task 3 — Natural Language Processing with spaCy
### Dataset: Amazon Product Reviews
#### Objectives:
- Perform Named Entity Recognition (NER) to extract product names, brands, and other entities.
- Analyze the sentiment of each review using a simple rule-based approach.

In [1]:
# ============================
# 1. Import Dependencies
# ============================
# Import spaCy for advanced NLP tasks
import spacy
# Import pandas for data manipulation and creating DataFrames
import pandas as pd

# Print the installed spaCy version to confirm setup
print('spaCy version:', spacy.__version__)

spaCy version: 3.8.7


In [2]:
# ============================
# 2. Load English Language Model
# ============================
# Load the small English model provided by spaCy ('en_core_web_sm').
# This model is trained for various NLP tasks, including NER.
nlp = spacy.load('en_core_web_sm')
print('Model loaded successfully!')

Model loaded successfully!


In [3]:
# ============================
# 3. Create Sample Amazon Reviews
# ============================
# A list of sample product reviews to be analyzed.
reviews = [
    'I love the new Apple iPhone 14 Pro — the camera quality is amazing!',
    'The Samsung Galaxy Watch battery dies too quickly. Disappointed.',
    'Sony headphones deliver crystal clear sound, totally worth the price!',
    'Avoid the cheap Lenovo charger — it broke after two days.',
    'The Dell XPS laptop is sleek and fast, perfect for my work needs.'
]

# Create a pandas DataFrame to store and manage the reviews.
df = pd.DataFrame(reviews, columns=['Review'])
df

,Review
0,I love the new Apple iPhone 14 Pro — the camer...
1,The Samsung Galaxy Watch battery dies too quic...
2,"Sony headphones deliver crystal clear sound, t..."
3,Avoid the cheap Lenovo charger — it broke afte...
4,"The Dell XPS laptop is sleek and fast, perfect..."


In [4]:
# ============================
# 4. Perform Named Entity Recognition (NER)
# ============================
# Define a function to process text with the spaCy model and extract named entities.
def extract_entities(text):
    # Process the text with the loaded spaCy model.
    doc = nlp(text)
    # Extract the text and label for each entity found in the document.
    entities = [(ent.text, ent.label_) for ent in doc.ents]
    return entities

# Apply the function to the 'Review' column to create a new 'Entities' column.
df['Entities'] = df['Review'].apply(extract_entities)

# Display the reviews and their extracted entities.
df[['Review', 'Entities']]

,Review,Entities
0,I love the new Apple iPhone 14 Pro — the camer...,"[(Apple, ORG), (iPhone 14 Pro, PRODUCT)]"
1,The Samsung Galaxy Watch battery dies too quic...,[]
2,"Sony headphones deliver crystal clear sound, t...","[(Sony, ORG)]"
3,Avoid the cheap Lenovo charger — it broke afte...,"[(Lenovo, ORG), (two days, DATE)]"
4,"The Dell XPS laptop is sleek and fast, perfect...",[]
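
In the table above, the reviews mentioning Samsung and Dell come back with no entities, which is a common limitation of a small statistical model on product text. One possible remedy is a rule-based `entity_ruler` component. The sketch below is a minimal illustration, not part of the original notebook: it uses a blank English pipeline (`spacy.blank('en')`) so that it runs without a downloaded model, and the brand and product patterns are assumptions chosen to fit the sample reviews.

```python
import spacy

# Blank English pipeline: tokenizer only, no statistical NER (assumption for this sketch).
rule_nlp = spacy.blank('en')

# The entity ruler assigns labels from explicit patterns.
ruler = rule_nlp.add_pipe('entity_ruler')

# Hypothetical patterns covering the brands and products missed above.
patterns = [
    {'label': 'ORG', 'pattern': 'Samsung'},
    {'label': 'ORG', 'pattern': 'Dell'},
    {'label': 'PRODUCT', 'pattern': [{'LOWER': 'galaxy'}, {'LOWER': 'watch'}]},
    {'label': 'PRODUCT', 'pattern': [{'LOWER': 'xps'}]},
]
ruler.add_patterns(patterns)

missed_reviews = [
    'The Samsung Galaxy Watch battery dies too quickly. Disappointed.',
    'The Dell XPS laptop is sleek and fast, perfect for my work needs.',
]

# nlp.pipe processes texts as a stream instead of one call per row.
rule_entities = [
    [(ent.text, ent.label_) for ent in doc.ents]
    for doc in rule_nlp.pipe(missed_reviews)
]

for review, ents in zip(missed_reviews, rule_entities):
    print(ents, '<-', review)
```

In the notebook itself, the same patterns could be added to the loaded `nlp` object with `nlp.add_pipe('entity_ruler', before='ner')`, so that the rules run before the statistical component and the model fills in whatever the rules do not cover.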


In [5]:
# ============================
# 5. Perform Rule-Based Sentiment Analysis
# ============================
import re

# Define simple lists of positive and negative keywords.
positive_words = ['love', 'amazing', 'great', 'good', 'fast', 'perfect', 'worth']
negative_words = ['bad', 'disappointed', 'poor', 'broke', 'cheap', 'slow', 'terrible']

# Define a function to analyze sentiment based on the presence of keywords.
def analyze_sentiment(text):
    # Split the review into lowercase words so keywords match whole words only
    # (a plain substring check would let 'fast' match 'breakfast').
    words = set(re.findall(r"[a-z']+", text.lower()))
    # Count how many distinct positive and negative keywords appear in the review.
    pos_count = sum(word in words for word in positive_words)
    neg_count = sum(word in words for word in negative_words)
    
    # Determine sentiment based on the counts.
    if pos_count > neg_count:
        return 'Positive'
    elif neg_count > pos_count:
        return 'Negative'
    else:
        return 'Neutral'

# Apply the sentiment analysis function to the 'Review' column.
df['Sentiment'] = df['Review'].apply(analyze_sentiment)

# Display the final DataFrame with reviews, entities, and sentiment.
df[['Review', 'Entities', 'Sentiment']]

,Review,Entities,Sentiment
0,I love the new Apple iPhone 14 Pro — the camer...,"[(Apple, ORG), (iPhone 14 Pro, PRODUCT)]",Positive
1,The Samsung Galaxy Watch battery dies too quic...,[],Negative
2,"Sony headphones deliver crystal clear sound, t...","[(Sony, ORG)]",Positive
3,Avoid the cheap Lenovo charger — it broke afte...,"[(Lenovo, ORG), (two days, DATE)]",Negative
4,"The Dell XPS laptop is sleek and fast, perfect...",[],Positive
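
Keyword rules like the one above depend heavily on how a match is defined. The sketch below is an illustration rather than part of the original notebook. It compares a plain substring test with a whole-word test on a made-up sentence (an assumption for demonstration) in which the keyword 'fast' is hidden inside 'breakfast'. The helper names `substring_hits` and `whole_word_hits` are hypothetical.

```python
import re

positive_words = ['love', 'amazing', 'great', 'good', 'fast', 'perfect', 'worth']

def substring_hits(text, keywords):
    """Return keywords that occur anywhere in the text, even inside longer words."""
    text_lower = text.lower()
    return [word for word in keywords if word in text_lower]

def whole_word_hits(text, keywords):
    """Return keywords that occur as complete words in the text."""
    words = set(re.findall(r"[a-z']+", text.lower()))
    return [word for word in keywords if word in words]

# Made-up example sentence, not taken from the review data.
example = 'The breakfast tray arrived cracked.'

print('Substring match :', substring_hits(example, positive_words))
print('Whole-word match:', whole_word_hits(example, positive_words))
```

The substring test reports 'fast' for a sentence that contains no positive keyword as a word, while the whole-word test reports nothing. Neither variant handles negation ("not good") or inflected forms ("loved"), so the labels remain a rough heuristic.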


In [6]:
# ============================
# 6. Display Sample Entities
# ============================
# Select a sample review for entity display.
sample_text = reviews[0]
# Process the text with the spaCy model.
doc = nlp(sample_text)
# Display entities in a simple text format.
print('Sample Review:', sample_text)
print('\nExtracted Entities:')
for ent in doc.ents:
    print(f'  - {ent.text}: {ent.label_}')
if not doc.ents:
    print('  - No entities found in this review')

Sample Review: I love the new Apple iPhone 14 Pro — the camera quality is amazing!

Extracted Entities:
  - Apple: ORG
  - iPhone 14 Pro: PRODUCT
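
The labels printed above (ORG, PRODUCT) are abbreviations from the model's annotation scheme. As a small optional addition, `spacy.explain` returns a short description for a label. The sketch below looks up the labels that appear in this notebook's output; it needs only the spaCy package, not a loaded model.

```python
import spacy

# Labels that appear in the entity tables of this notebook.
labels = ['ORG', 'PRODUCT', 'DATE']

# spacy.explain returns a short description string, or None for an unknown label.
descriptions = {label: spacy.explain(label) for label in labels}

for label, description in descriptions.items():
    print(f'  - {label}: {description}')
```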


### Deliverables:
- Jupyter notebook: `nlp_spacy.ipynb`
- Output table showing entities and sentiments for each review.
- Text-based entity display for the sample review.