1. Lexicon-Based Sentiment Analysis with VADER
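VADER (Valence Aware Dictionary and sEntiment Reasoner) is a lexicon- and rule-based model that is particularly effective on social media text. It ships with NLTK (the lexicon must be downloaded once, as below) and returns negative, neutral, positive, and compound scores for a piece of text. The text is then labeled positive, negative, or neutral by thresholding the compound score, which lies between -1 and 1.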
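As a rough illustration of what the compound score is: VADER sums the valences of the words it recognizes (after its rules adjust for boosters, negation, punctuation, and so on) and squashes that sum into [-1, 1]. The sketch below reproduces only that final normalization step, assuming the constant `alpha = 15` used in the reference implementation; the per-word valences are made up for illustration.

```python
import math

def normalize(score: float, alpha: float = 15.0) -> float:
    """Map an unbounded valence sum into [-1, 1] (VADER-style normalization)."""
    norm_score = score / math.sqrt(score * score + alpha)
    # Clamp to guard against floating-point drift outside the range.
    return max(-1.0, min(1.0, norm_score))

# Hypothetical per-word valences after rule adjustments (illustrative only).
valences = [1.9, 2.1, -0.4]
compound = normalize(sum(valences))
print(round(compound, 4))
```

Because the sum is divided by a quantity that grows with it, a few strongly positive words already push the compound score close to 1, and adding more has diminishing effect.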

In [1]:
import nltk
nltk.download('vader_lexicon')
from nltk.sentiment.vader import SentimentIntensityAnalyzer

[nltk_data] Downloading package vader_lexicon to /root/nltk_data...


In [2]:
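# Build the analyzer once: constructing it loads the lexicon, so avoid doing that per call.
sid = SentimentIntensityAnalyzer()

def classify_sentiment_vader(review):
    """Label a text as positive, negative, or neutral from VADER's compound score."""
    compound = sid.polarity_scores(review)['compound']
    # Common thresholds: compound >= 0.05 => positive, <= -0.05 => negative, otherwise neutral.
    if compound >= 0.05:
        return "positive"
    elif compound <= -0.05:
        return "negative"
    else:
        return "neutral"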
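Separating the thresholding from the scoring makes the cutoff easy to test and tune without NLTK. A minimal sketch, using a hypothetical helper `label_from_compound` with the same ±0.05 default as the cell above; note that both boundaries are inclusive, so a score of exactly 0.05 counts as positive.

```python
def label_from_compound(compound: float, threshold: float = 0.05) -> str:
    """Map a VADER compound score in [-1, 1] to a sentiment label.

    Scores strictly between -threshold and +threshold are treated as neutral.
    """
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"

# Sweep a few hypothetical compound scores, including the boundary values.
for score in (-0.6, -0.05, 0.0, 0.049, 0.05, 0.8):
    print(score, label_from_compound(score))
```

Widening `threshold` makes the classifier more conservative, sending weakly polar texts to "neutral".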

In [6]:
# Example usage
review_text = "I absolutely welled this movie, it was shitty!"
#review_text = "I absolutely loved this movie, it was fantastic!"
print(f"Review: {review_text}\nSentiment: {classify_sentiment_vader(review_text)}")

Review: I absolutely welled this movie, it was shitty!
Sentiment: negative


2. Machine Learning Approach Using Scikit-Learn
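Another common method is to train a classifier on labeled data. In this example, we use a toy dataset of six reviews, convert the text into numerical features with TF-IDF, and then train a logistic regression model. Six examples are enough to demonstrate the workflow but far too few to produce a reliable classifier, as the evaluation results below show.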
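To make the TF-IDF step less of a black box, here is a pure-Python sketch of the weighting `TfidfVectorizer` applies with its default settings: lowercasing, tokens of two or more word characters, smoothed IDF `idf(t) = ln((1 + n) / (1 + df(t))) + 1`, raw term counts, and L2 normalization of each row. It illustrates the arithmetic only; the helper names are hypothetical and it is not a replacement for the library.

```python
import math
import re
from collections import Counter

TOKEN_RE = re.compile(r"(?u)\b\w\w+\b")  # tokens of 2+ word characters

def tokenize(text):
    return TOKEN_RE.findall(text.lower())

def tfidf(docs):
    """Return (sorted vocabulary, list of L2-normalized TF-IDF dicts), smooth-idf variant."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for toks in tokenized for term in set(toks))
    idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        weights = {t: c * idf[t] for t, c in counts.items()}
        norm = math.sqrt(sum(w * w for w in weights.values()))
        vectors.append({t: w / norm for t, w in weights.items()} if norm else {})
    return sorted(df), vectors

docs = ["I loved this movie, it was amazing!", "This film was terrible and boring."]
vocab, vecs = tfidf(docs)
print(vocab)
print({t: round(w, 3) for t, w in vecs[0].items()})
```

Words shared by every document ("this", "was") get the minimum IDF of 1, so distinctive words such as "loved" dominate each vector. Single-character tokens like "I" and "a" are dropped entirely.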

In [7]:
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.pipeline import Pipeline

In [8]:
# Sample dataset: text reviews with associated sentiments.
data = {
    "review": [
        "I loved this movie, it was amazing!",
        "This film was terrible and boring.",
        "What a fantastic performance.",
        "I didn't like the movie at all.",
        "The plot was really interesting and fun.",
        "It was a waste of time, very disappointing."
    ],
    "sentiment": [
        "positive", "negative", "positive", "negative", "positive", "negative"
    ]
}

In [9]:
df = pd.DataFrame(data)

# Split the dataset into training and testing sets.
X_train, X_test, y_train, y_test = train_test_split(df['review'], df['sentiment'], test_size=0.2, random_state=42)
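With only six reviews, `test_size=0.2` leaves very little to work with: scikit-learn rounds a fractional test size up, so the test set gets ceil(0.2 × 6) = 2 reviews and the model is trained on the remaining 4. The sketch below reproduces that arithmetic and adds a hypothetical hand-rolled stratified split to show the idea behind passing `stratify=df['sentiment']` to `train_test_split`: each class keeps its share in both sets. It is not scikit-learn's exact algorithm, and its seed does not reproduce the library's shuffle.

```python
import math
import random
from collections import defaultdict

def split_sizes(n_samples, test_size):
    """Train/test sizes for a float test_size, rounding the test share up."""
    n_test = math.ceil(test_size * n_samples)
    return n_samples - n_test, n_test

def stratified_split(labels, test_size, seed=42):
    """Hypothetical helper: return (train_idx, test_idx) keeping class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    train_idx, test_idx = [], []
    for idx in by_class.values():
        rng.shuffle(idx)
        # Take the test share of each class separately (at least one per class).
        k = max(1, round(test_size * len(idx)))
        test_idx.extend(idx[:k])
        train_idx.extend(idx[k:])
    return sorted(train_idx), sorted(test_idx)

labels = ["positive", "negative", "positive", "negative", "positive", "negative"]
print(split_sizes(len(labels), 0.2))  # → (4, 2)
train_idx, test_idx = stratified_split(labels, 0.2)
print([labels[i] for i in test_idx])
```

Stratifying guarantees one review of each class in this tiny test set, but it cannot fix the underlying problem that four training examples are too few.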


In [10]:
# Create a pipeline that first vectorizes the text then trains a classifier.
pipeline = Pipeline([
    ('tfidf', TfidfVectorizer()),
    ('clf', LogisticRegression())
])

In [11]:
# Train the model.
pipeline.fit(X_train, y_train)
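Conceptually, logistic regression learns one weight per TF-IDF feature plus a bias, and predicts the positive class when the sigmoid of their weighted sum exceeds 0.5. The sketch below fits that model with plain batch gradient descent on tiny hand-made feature vectors. This is a simplified, swapped-in training method (scikit-learn's default solver is L-BFGS with L2 regularization), and the data and function names are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, lr=0.5, epochs=500):
    """Fit weights and bias by batch gradient descent on the log-loss (no regularization)."""
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            # Error between the predicted probability and the 0/1 label.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        # Step against the average gradient.
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b

def predict_proba(w, b, x):
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

# Hypothetical 2-feature vectors: [weight of "loved", weight of "boring"]; 1 = positive.
X = [[0.9, 0.0], [0.8, 0.1], [0.0, 0.9], [0.1, 0.7]]
y = [1, 1, 0, 0]
w, b = fit_logistic(X, y)
print(round(predict_proba(w, b, [1.0, 0.0]), 3), round(predict_proba(w, b, [0.0, 1.0]), 3))
```

The learned weight for the "loved" feature ends up positive and the one for "boring" negative, which is the same mechanism the pipeline relies on across its full TF-IDF vocabulary.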

In [12]:
# Evaluate the model.
predictions = pipeline.predict(X_test)
print("Accuracy:", accuracy_score(y_test, predictions))

Accuracy: 0.0
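`accuracy_score` is the fraction of test predictions that match the true labels. With a two-review test set the only possible values are 0.0, 0.5, and 1.0, so 0.0 simply means both test reviews were misclassified; it says more about the sample size than about the method. A minimal pure-Python equivalent, with hypothetical labels:

```python
def accuracy(y_true, y_pred):
    """Fraction of positions where the prediction equals the true label."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical 2-item test set where both predictions are wrong.
print(accuracy(["positive", "negative"], ["negative", "positive"]))  # → 0.0

# Every accuracy value a 2-item test set can produce.
n_test = 2
print([k / n_test for k in range(n_test + 1)])  # → [0.0, 0.5, 1.0]
```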


In [13]:
# Classify a new review.
new_review = "The movie was a brilliant display of storytelling!"
print(f"Review: {new_review}\nSentiment: {pipeline.predict([new_review])[0]}")

Review: The movie was a brilliant display of storytelling!
Sentiment: negative
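One way to see why the new review comes out negative: a TF-IDF model can only use words it saw during fitting, and none of the sentiment-bearing words in the new review occur anywhere in the six-review dataset. The sketch below checks this with the same default tokenization as `TfidfVectorizer` (lowercase, tokens of two or more word characters). It uses the whole dataset, so the vocabulary of the four-review training split can only be smaller.

```python
import re

TOKEN_RE = re.compile(r"(?u)\b\w\w+\b")  # tokens of 2+ word characters

def vocab(texts):
    """Set of lowercase tokens of two or more word characters."""
    return {tok for text in texts for tok in TOKEN_RE.findall(text.lower())}

reviews = [
    "I loved this movie, it was amazing!",
    "This film was terrible and boring.",
    "What a fantastic performance.",
    "I didn't like the movie at all.",
    "The plot was really interesting and fun.",
    "It was a waste of time, very disappointing.",
]
new_review = "The movie was a brilliant display of storytelling!"

known = vocab(reviews)
new_tokens = vocab([new_review])
print("seen:  ", sorted(new_tokens & known))  # → ['movie', 'of', 'the', 'was']
print("unseen:", sorted(new_tokens - known))  # → ['brilliant', 'display', 'storytelling']
```

The prediction therefore rests only on words such as "the", "movie", and "was", which carry no real sentiment; "the" and "movie" even appear in a negative training review. A larger, more varied training set is the usual remedy.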
