# Real-World Use Case: Movie Review Sentiment Analysis

## 1. The Problem
A streaming service wants to automatically tag user reviews as "Positive" or "Negative" to aggregate an audience score.

## 2. Why Naive Bayes?
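*   **Text Data**: It handles high-dimensional, sparse inputs such as Bag-of-Words text features well, because each word contributes its own per-class likelihood term.
*   **Speed**: Training amounts to counting word occurrences per class, and prediction is a sum of log-probabilities, so both stay fast even on large datasets.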
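
To make the two points above concrete, here is a minimal from-scratch sketch of multinomial Naive Bayes in plain Python. It is an illustration only, not scikit-learn's implementation: the helper names `train_nb` and `predict_nb`, the whitespace tokenizer, and the toy sentences are all assumptions made for this example.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs, labels):
    """Training is one counting pass over the data."""
    class_counts = Counter(labels)       # documents per class
    word_counts = defaultdict(Counter)   # word frequencies per class
    vocab = set()
    for doc, label in zip(docs, labels):
        tokens = doc.lower().split()     # naive whitespace tokenizer (assumption)
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab


def predict_nb(doc, class_counts, word_counts, vocab, alpha=1.0):
    """Pick the class with the highest log posterior (Laplace smoothing with alpha)."""
    total_docs = sum(class_counts.values())
    scores = {}
    for label, n_docs in class_counts.items():
        total_words = sum(word_counts[label].values())
        score = math.log(n_docs / total_docs)  # log prior
        for token in doc.lower().split():
            if token not in vocab:
                continue  # skip words never seen during training
            count = word_counts[label][token]
            score += math.log((count + alpha) / (total_words + alpha * len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)


# Hypothetical toy data: 1 = positive, 0 = negative
toy_docs = ["great fantastic film", "boring terrible film"]
toy_labels = [1, 0]
params = train_nb(toy_docs, toy_labels)
print(predict_nb("fantastic film", *params))
```

Because the model is only a table of counts, adding more training data means updating counters rather than re-running an iterative optimizer.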

## 3. Data Simulation (IMDB Proxy)
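We will simulate the dataset with a small, hand-written list of labeled review sentences that stands in for a real corpus such as IMDB.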

In [None]:
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import train_test_split

# 1. Data
reviews = [
    "This movie was fantastic and thrilling",
    "Absolutely terrible acting and boring plot",
    "I loved every minute of it",
    "Waste of time, don't watch",
    "A masterpiece of cinema",
    "The worst movie I have ever seen",
    "Great acting but the ending was weak",
    "Predictable and uninspired",
    "Highly recommended!",
    "Disaster from start to finish"
] * 10  # Repeat to enlarge the toy dataset (adds duplicates, not new information)

labels = [1, 0, 1, 0, 1, 0, 1, 0, 1, 0] * 10 # 1=Pos, 0=Neg

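# Caveat: every review appears 10 times, so the test split almost certainly contains
# sentences that are also in the training split. The accuracy below is therefore optimistic.
X_train, X_test, y_train, y_test = train_test_split(reviews, labels, test_size=0.2, random_state=42)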

# 2. Pipeline
# CountVectorizer turns text into a matrix of token counts
# MultinomialNB is designed for count data
model = make_pipeline(CountVectorizer(), MultinomialNB())

# 3. Train
model.fit(X_train, y_train)

# 4. Evaluate
print(f"Accuracy: {model.score(X_test, y_test):.2f}")

# 5. Inference
new_reviews = ["This was a boring disaster", "What a fantastic journey"]
preds = model.predict(new_reviews)
print(f"Predictions: {['Positive' if p == 1 else 'Negative' for p in preds]}")
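
Since the goal is an aggregate audience score, class probabilities can be more useful than hard labels. The following is a minimal, self-contained sketch using `predict_proba`. The four training sentences are a subset of the reviews above, and the names `clf`, `train_reviews`, and `positive_col` are introduced here purely for illustration.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Small illustrative training set (subset of the reviews above); 1 = positive, 0 = negative
train_reviews = [
    "This movie was fantastic and thrilling",
    "Absolutely terrible acting and boring plot",
    "A masterpiece of cinema",
    "Disaster from start to finish",
]
train_labels = [1, 0, 1, 0]

clf = make_pipeline(CountVectorizer(), MultinomialNB())
clf.fit(train_reviews, train_labels)

new_reviews = ["This was a boring disaster", "What a fantastic journey"]
proba = clf.predict_proba(new_reviews)  # one row per review, one column per class

# Columns follow the order of clf.classes_, so look up the positive class explicitly
positive_col = list(clf.classes_).index(1)
for text, row in zip(new_reviews, proba):
    print(f"{text!r}: P(positive) = {row[positive_col]:.2f}")
```

With so little training data, common words such as "this" and "was" can outweigh the sentiment-bearing words, so treat the printed probabilities as illustrative rather than reliable.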