Labeled sample reviews

In [None]:
reviews = [
    ("This phone is amazing, I love the camera quality!", "POSITIVE"),
    ("Terrible battery life and the screen cracks easily.", "NEGATIVE"),
    ("Very fast and reliable performance.", "POSITIVE"),
    ("The sound is awful and the device feels cheap.", "NEGATIVE"),
]


Rule-Based Sentiment Analyzer Setup

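In this section, I build a simple rule-based sentiment analyzer with spaCy.
I define lists of positive and negative keywords and add a custom pipeline component (sentiment_analyzer) that counts how many tokens appear in each list. The text is labeled POSITIVE if positive matches outnumber negative ones, NEGATIVE if the reverse holds, and NEUTRAL on a tie (including when nothing matches). The label is stored on a custom Doc attribute, doc._.sentiment.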
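Without spaCy, the rule reduces to two counts and a comparison. The following is a minimal stdlib-only sketch of that rule, handy for quick unit tests. It assumes a simple regex tokenizer (re.findall) in place of spaCy's tokenizer, so token boundaries may differ on some punctuation. The lexicons are copied from the cell below, classify is a hypothetical helper name, and the second and third example sentences are made up for illustration.

```python
import re

# Keyword lexicons copied from the notebook (sets give O(1) membership tests)
POSITIVE_WORDS = {"amazing", "great", "love", "fast", "reliable", "excellent", "perfect"}
NEGATIVE_WORDS = {"terrible", "awful", "bad", "poor", "cheap", "slow", "crack", "disappointing"}


def classify(text: str) -> str:
    """Count positive/negative keyword hits and return the majority label.

    Tokenization is a simplifying assumption: lowercase runs of letters and
    apostrophes, not spaCy's rule-based tokenizer.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    pos_score = sum(token in POSITIVE_WORDS for token in tokens)
    neg_score = sum(token in NEGATIVE_WORDS for token in tokens)
    if pos_score > neg_score:
        return "POSITIVE"
    if neg_score > pos_score:
        return "NEGATIVE"
    # Equal counts, including zero hits on both sides, fall through to NEUTRAL
    return "NEUTRAL"


print(classify("Very fast and reliable performance."))     # POSITIVE
print(classify("Great screen, but the battery is poor."))  # NEUTRAL (1 vs 1)
print(classify("It arrived on Tuesday."))                  # NEUTRAL (no hits)
```

The tie-to-NEUTRAL branch means mixed reviews and reviews with no known keywords receive the same label. The analyzer cannot distinguish between them.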

In [2]:
import spacy
from spacy.language import Language
from spacy.tokens import Doc

# Blank English pipeline: tokenizer only, so there is no lemmatizer
nlp = spacy.blank("en")

# Positive and negative keywords; sets make the membership checks O(1).
# Matching uses the exact lowercased token text, so "crack" will not match "cracks".
positive_words = {"amazing", "great", "love", "fast", "reliable", "excellent", "perfect"}
negative_words = {"terrible", "awful", "bad", "poor", "cheap", "slow", "crack", "disappointing"}

@Language.component("sentiment_analyzer")
def sentiment_analyzer(doc):
    pos_score = 0
    neg_score = 0

    # Count keyword hits per token (token.lower_ is spaCy's lowercased token text)
    for token in doc:
        word = token.lower_
        if word in positive_words:
            pos_score += 1
        elif word in negative_words:
            neg_score += 1

    # Rule-based decision: the larger count wins; ties (including zero hits) are NEUTRAL
    if pos_score > neg_score:
        doc._.sentiment = "POSITIVE"
    elif neg_score > pos_score:
        doc._.sentiment = "NEGATIVE"
    else:
        doc._.sentiment = "NEUTRAL"

    return doc

# Register the custom Doc attribute that stores the label;
# force=True avoids an "extension already exists" error when the cell is re-run
Doc.set_extension("sentiment", default=None, force=True)

# Add the component to the end of the pipeline
nlp.add_pipe("sentiment_analyzer", last=True)


<function __main__.sentiment_analyzer(doc)>
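Because the component compares each token's exact lowercased text against the keyword lists, and a blank pipeline has no lemmatizer, an inflected form such as "cracks" in the second review does not match the keyword "crack". Below is a sketch of one crude workaround. It assumes that stripping a single trailing "s" is acceptable for this small lexicon, and it falls back to the shortened form only when that form is itself a keyword. matches_keyword is a hypothetical helper, not part of spaCy. A trained pipeline with a lemmatizer would be the more principled fix.

```python
from __future__ import annotations

# Negative lexicon copied from the notebook
NEGATIVE_WORDS = {"terrible", "awful", "bad", "poor", "cheap", "slow", "crack", "disappointing"}


def matches_keyword(word: str, lexicon: set[str]) -> bool:
    """Exact match first; otherwise try the form with one trailing 's' removed.

    The shortened form is only accepted if it is itself a keyword, so ordinary
    words ending in 's' (e.g. "is", "this") do not produce spurious hits.
    """
    word = word.lower()
    if word in lexicon:
        return True
    return word.endswith("s") and word[:-1] in lexicon


print("cracks" in NEGATIVE_WORDS)                 # False: exact lookup misses it
print(matches_keyword("cracks", NEGATIVE_WORDS))  # True: falls back to "crack"
print(matches_keyword("is", NEGATIVE_WORDS))      # False
```

Inside the component, this check would replace the plain `word in negative_words` test (and likewise for the positive list).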

Evaluating the Rule-Based Model

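Here, I check the rule-based analyzer against the four labeled reviews above.
The first cell prints the predicted sentiment for each review. The second compares each prediction with its true label and reports overall accuracy. With only four examples, each misclassified review lowers accuracy by 0.25, so the score is a sanity check rather than a reliable estimate of performance.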

In [3]:
for review, _ in reviews:
    doc = nlp(review)
    print(f"Review: {review}")
    print(f"Predicted Sentiment: {doc._.sentiment}")
    print("-" * 50)


Review: This phone is amazing, I love the camera quality!
Predicted Sentiment: POSITIVE
--------------------------------------------------
Review: Terrible battery life and the screen cracks easily.
Predicted Sentiment: NEGATIVE
--------------------------------------------------
Review: Very fast and reliable performance.
Predicted Sentiment: POSITIVE
--------------------------------------------------
Review: The sound is awful and the device feels cheap.
Predicted Sentiment: NEGATIVE
--------------------------------------------------


In [4]:
correct = 0
for text, true_label in reviews:
    doc = nlp(text)
    if doc._.sentiment == true_label:
        correct += 1

accuracy = correct / len(reviews)
print(f"Model Accuracy: {accuracy:.2f}")


Model Accuracy: 0.75
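Accuracy alone hides which labels the errors fall on. The following is a minimal stdlib sketch of a per-label breakdown. It assumes predictions have already been collected into (true_label, predicted_label) pairs. evaluate and the toy pairs are illustrative, not results from this notebook.

```python
from collections import Counter


def evaluate(pairs):
    """Return overall accuracy and a confusion count keyed by (true, predicted).

    `pairs` is an iterable of (true_label, predicted_label) tuples.
    """
    pairs = list(pairs)
    confusion = Counter(pairs)
    correct = sum(count for (true, pred), count in confusion.items() if true == pred)
    accuracy = correct / len(pairs) if pairs else 0.0
    return accuracy, confusion


# Toy pairs for illustration only (one NEGATIVE review predicted as NEUTRAL)
toy = [("POSITIVE", "POSITIVE"), ("NEGATIVE", "NEUTRAL"), ("NEUTRAL", "NEUTRAL")]
accuracy, confusion = evaluate(toy)
print(f"Accuracy: {accuracy:.2f}")  # Accuracy: 0.67
for (true, pred), count in sorted(confusion.items()):
    print(f"true={true:<8} predicted={pred:<8} count={count}")
```

With the pipeline above, the pairs could be built as `evaluate((label, nlp(text)._.sentiment) for text, label in reviews)`. Any off-diagonal entry, such as (NEGATIVE, NEUTRAL), then points directly at the review whose keywords the lexicon missed.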
