
In [5]:
import spacy
from spacy import displacy
from spacy.language import Language
from spacy.tokens import Doc

# Load spaCy's medium English model (includes a statistical NER component).
# The model must be installed first: python -m spacy download en_core_web_md
nlp = spacy.load("en_core_web_md")

# Define a custom rule-based sentiment component
@Language.component("rule_based_sentiment")
def rule_based_sentiment_component(doc):
    # Define sentiment lexicons
    POSITIVE_WORDS = {"good", "great", "excellent", "awesome", "love",
                     "amazing", "perfect", "recommend", "wonderful", "best"}
    NEGATIVE_WORDS = {"bad", "poor", "terrible", "awful", "hate",
                     "disappointing", "worst", "waste", "broken", "return"}

    # Analyze sentiment based on word presence and modifiers
    sentiment_score = 0
    negation_terms = {"not", "no", "never", "n't"}

    for token in doc:
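        # Check for negation. The negated word itself is still scored by the
        # lexicon check on the next iteration, so the net effect is -1 for a
        # negated positive word (-2 + 1) and +1 for a negated negative word (+2 - 1).
        if token.text.lower() in negation_terms:
            next_token = token.nbor() if token.i + 1 < len(doc) else None
            if next_token and next_token.text.lower() in POSITIVE_WORDS:
                sentiment_score -= 2  # Negated positive
            elif next_token and next_token.text.lower() in NEGATIVE_WORDS:
                sentiment_score += 2  # Negated negative
            continue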

        # Check sentiment words
        if token.text.lower() in POSITIVE_WORDS:
            sentiment_score += 1
        elif token.text.lower() in NEGATIVE_WORDS:
            sentiment_score -= 1

    # Determine sentiment based on score
    if sentiment_score > 0:
        doc._.sentiment_analyzer = "positive"
    elif sentiment_score < 0:
        doc._.sentiment_analyzer = "negative"
    else:
        doc._.sentiment_analyzer = "neutral"
    return doc

# Register custom attribute for sentiment
if not Doc.has_extension("sentiment_analyzer"):
    Doc.set_extension("sentiment_analyzer", default=None)


# Add sentiment function to the pipeline
nlp.add_pipe("rule_based_sentiment", last=True)

# Sample Amazon product reviews
reviews = [
    "I absolutely love my new Apple iPhone 13 Pro! The camera quality is amazing.",
    "This Samsung Galaxy S22 Ultra is the worst phone I've ever used. Battery life is terrible.",
    "The Sony WH-1000XM4 headphones are good, but the noise cancellation isn't perfect.",
    "I bought a Nike running shoes from Amazon. They are comfortable but wore out quickly.",
    "Never buying Dell laptops again! My XPS 15 arrived broken and customer service was awful."
]

# Process reviews and extract insights
for review in reviews:
    doc = nlp(review)

    print(f"\nReview: '{review}'")
    print("=" * 80)

    # Extract product names and brands using NER
    products = [ent.text for ent in doc.ents if ent.label_ in ["ORG", "PRODUCT"]]
    print(f"Extracted Products/Brands: {', '.join(products) if products else 'None'}")

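    # Display entities visualization. With jupyter=False, render() only returns
    # the HTML markup as a string; jupyter=True displays it inline in the notebook.
    print("\nNamed Entities:")
    displacy.render(doc, style="ent", jupyter=True)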

    # Show sentiment analysis result
    print(f"\nSentiment: {doc._.sentiment_analyzer.upper()}")

    # Detailed entity information
    print("\nEntity Details:")
    for ent in doc.ents:
        if ent.label_ in ["ORG", "PRODUCT"]:
            print(f"- {ent.text} ({ent.label_})")

    print("=" * 80)

# Add custom NER patterns for better product recognition.
# Raw strings keep the regex escapes intact, and each dict matches one token
# as produced by spaCy's English tokenizer (e.g. "S22" is a single token).
product_patterns = [
    {"label": "PRODUCT", "pattern": [{"LOWER": "iphone"}, {"IS_DIGIT": True}, {"LOWER": "pro"}]},
    {"label": "PRODUCT", "pattern": [{"LOWER": "galaxy"}, {"LOWER": {"REGEX": r"^s\d+$"}}, {"LOWER": "ultra"}]},
    # Assumes the tokenizer keeps "WH-1000XM4" as one token
    {"label": "PRODUCT", "pattern": [{"TEXT": {"REGEX": r"^WH-\d+[A-Z]+\d+$"}}]},
    {"label": "PRODUCT", "pattern": [{"LOWER": "xps"}, {"IS_DIGIT": True}]}
]

# Create the ruler and add patterns. The ruler only affects documents processed
# after this point, so re-run the review loop above to see its effect.
ruler = nlp.add_pipe("entity_ruler", before="ner")
ruler.add_patterns(product_patterns)


Review: 'I absolutely love my new Apple iPhone 13 Pro! The camera quality is amazing.'
Extracted Products/Brands: Apple

Named Entities:

Sentiment: POSITIVE

Entity Details:
- Apple (ORG)

Review: 'This Samsung Galaxy S22 Ultra is the worst phone I've ever used. Battery life is terrible.'
Extracted Products/Brands: None

Named Entities:

Sentiment: NEGATIVE

Entity Details:

Review: 'The Sony WH-1000XM4 headphones are good, but the noise cancellation isn't perfect.'
Extracted Products/Brands: None

Named Entities:

Sentiment: NEUTRAL

Entity Details:

Review: 'I bought a Nike running shoes from Amazon. They are comfortable but wore out quickly.'
Extracted Products/Brands: Nike, Amazon

Named Entities:

Sentiment: NEUTRAL

Entity Details:
- Nike (ORG)
- Amazon (ORG)

Review: 'Never buying Dell laptops again! My XPS 15 arrived broken and customer service was awful.'
Extracted Products/Brands: Dell

Named Entities:

Sentiment: NEGATIVE

Entity Details:
- Dell (ORG)




### Ethical Considerations and Bias Mitigation with Rule-Based Systems

Rule-based systems in spaCy, like the custom sentiment analyzer and entity ruler implemented above, can help mitigate some biases inherent in purely statistical or machine learning models. Here's how:

*   **Transparency and Explainability:** Every rule is written out explicitly (here, two word lists and a negation check), so a developer can see exactly why a review received its label and fix a biased rule at its source. In a statistical model the decision is spread across learned weights, which makes it much harder to audit.
*   **Direct Control over Bias:** By defining specific rules, you can actively counteract known biases present in training data. For example, if a dataset has a bias against certain demographic groups, you can create rules that specifically adjust sentiment scores or entity recognition to compensate for this bias.
*   **Domain Adaptation:** Rules can be adapted to a specific domain where a statistical model struggles for lack of relevant training data. In the output above, "XPS 15" was not extracted as a product or brand by the pretrained NER; the entity ruler patterns target exactly such domain-specific names.
*   **Reduced Reliance on Biased Data:** While statistical models heavily rely on large datasets that may contain societal biases, rule-based systems can be developed with less dependence on such data. The rules are based on linguistic principles and domain knowledge, which can be designed to be more neutral.
*   **Handling Edge Cases:** Rule-based systems can be effective in handling specific edge cases or nuanced language that statistical models might misinterpret, potentially leading to biased outcomes.
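
As a rough illustration of the transparency point, the sketch below re-implements the scoring rules from the component above in plain Python and records which rule fired for each token. The helper name `explain_sentiment` and the pre-tokenized input are assumptions made for this example (the token list mimics spaCy splitting "isn't" into "is" and "n't"); it is not part of spaCy's API.

```python
POSITIVE_WORDS = {"good", "great", "excellent", "awesome", "love",
                  "amazing", "perfect", "recommend", "wonderful", "best"}
NEGATIVE_WORDS = {"bad", "poor", "terrible", "awful", "hate",
                  "disappointing", "worst", "waste", "broken", "return"}
NEGATION_TERMS = {"not", "no", "never", "n't"}


def explain_sentiment(tokens):
    """Return (score, trace), where trace lists every rule that fired.

    Hypothetical helper: mirrors the notebook's rules on a plain token list.
    """
    score = 0
    trace = []
    for i, tok in enumerate(tokens):
        word = tok.lower()
        if word in NEGATION_TERMS:
            nxt = tokens[i + 1].lower() if i + 1 < len(tokens) else None
            if nxt in POSITIVE_WORDS:
                score -= 2
                trace.append((f"{tok} {tokens[i + 1]}", "negated positive", -2))
            elif nxt in NEGATIVE_WORDS:
                score += 2
                trace.append((f"{tok} {tokens[i + 1]}", "negated negative", 2))
            continue
        # The word after a negation is still scored here on the next iteration
        if word in POSITIVE_WORDS:
            score += 1
            trace.append((tok, "positive word", 1))
        elif word in NEGATIVE_WORDS:
            score -= 1
            trace.append((tok, "negative word", -1))
    return score, trace


tokens = ["The", "Sony", "WH-1000XM4", "headphones", "are", "good", ",", "but",
          "the", "noise", "cancellation", "is", "n't", "perfect", "."]
score, trace = explain_sentiment(tokens)
for span, rule, delta in trace:
    print(f"{delta:+d}  {rule:<17} {span}")
print("total:", score)
```

The trace lists three contributions (+1 for "good", -2 for "n't perfect", +1 for "perfect") that sum to 0, which is why the Sony review is labelled NEUTRAL in the output above.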

However, rule-based systems are not immune to bias: the rules encode the assumptions of whoever wrote them. The lexicon above, for instance, counts "return" as negative in every context. Rules therefore need the same careful testing as any model. A hybrid approach that combines rule-based and statistical methods, backed by rigorous evaluation and bias detection, is often the most effective way to build fairer NLP systems.
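
One simple way to test rules for unintended bias is a counterfactual check: fill the same sentence template with different brand names and confirm that the label does not change. The sketch below is a minimal version under stated assumptions: `score_label` is a hypothetical stand-in for the spaCy component that tokenizes with a plain regex (so contractions such as "isn't" are not handled), and the template and brand list are made up for illustration.

```python
import re

POSITIVE_WORDS = {"good", "great", "excellent", "awesome", "love",
                  "amazing", "perfect", "recommend", "wonderful", "best"}
NEGATIVE_WORDS = {"bad", "poor", "terrible", "awful", "hate",
                  "disappointing", "worst", "waste", "broken", "return"}
NEGATION_TERMS = {"not", "no", "never"}


def score_label(text):
    """Hypothetical stand-in for the pipeline component (same scoring rules)."""
    tokens = re.findall(r"\w+", text.lower())  # simplification: no spaCy tokenizer
    score = 0
    for i, word in enumerate(tokens):
        if word in NEGATION_TERMS:
            nxt = tokens[i + 1] if i + 1 < len(tokens) else None
            if nxt in POSITIVE_WORDS:
                score -= 2
            elif nxt in NEGATIVE_WORDS:
                score += 2
            continue
        if word in POSITIVE_WORDS:
            score += 1
        elif word in NEGATIVE_WORDS:
            score -= 1
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"


def counterfactual_check(template, brands):
    """Map each brand to the label its filled-in template receives."""
    return {brand: score_label(template.format(brand=brand)) for brand in brands}


results = counterfactual_check("I ordered a laptop from {brand} last week.",
                               ["Amazon", "Dell", "Best Buy"])
for brand, label in results.items():
    print(f"{brand:<10} {label}")
if len(set(results.values())) > 1:
    print("Labels differ across brands: inspect the rules.")
```

Here the label changes only because "Best Buy" contains the lexicon word "best", a small example of a rule introducing a bias of its own.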