# Sentiment Analysis API

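This notebook loads the trained TF-IDF vectorizer and logistic regression model and uses them to predict sentiment for new airline tweets.

It wraps both artifacts in a small `SentimentAPI` class and demonstrates single, batch, and confidence-scored predictions.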

## Setup

In [1]:
import pandas as pd
import numpy as np
from joblib import load
from typing import Dict, List
import warnings
warnings.filterwarnings('ignore')

import importlib
import sentiment_utils
importlib.reload(sentiment_utils)

from sentiment_utils import (
    predict_sentiment,
    get_prediction_proba,
    LABEL_MAP,
)

VECT_PATH = "tfidf_vectorizer.joblib"
MODEL_PATH = "logreg_sentiment_model.joblib"

print("✓ Imports OK")

✓ Imports OK


## Load Model

In [2]:
print("Loading artifacts...\n")

vectorizer = load(VECT_PATH)
model = load(MODEL_PATH)

print(f"✓ Vectorizer: {len(vectorizer.get_feature_names_out())} features")
print(f"✓ Model classes: {model.classes_}")
print("\nReady for predictions")

Loading artifacts...

✓ Vectorizer: 7576 features
✓ Model classes: [0 1 2]

Ready for predictions
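
If either artifact file is missing, the load fails only when that file is opened, so a deployment may prefer to check both paths up front. The sketch below assumes a hypothetical helper `load_artifacts` that receives the loader as a parameter (this notebook would pass `joblib.load`). The `pickle_loader` function is a standard-library stand-in used only to keep the example runnable on its own.

```python
import pickle
import tempfile
from pathlib import Path
from typing import Any, Callable, Tuple


def load_artifacts(vect_path: str, model_path: str,
                   loader: Callable[[str], Any]) -> Tuple[Any, Any]:
    """Load (vectorizer, model), failing early with one message listing every missing file."""
    missing = [p for p in (vect_path, model_path) if not Path(p).is_file()]
    if missing:
        raise FileNotFoundError(f"Missing artifact file(s): {', '.join(missing)}")
    return loader(vect_path), loader(model_path)


def pickle_loader(path: str) -> Any:
    """Stand-in for joblib.load so the sketch needs only the standard library."""
    with open(path, "rb") as fh:
        return pickle.load(fh)


def demo() -> Tuple[Any, Any]:
    """Write two placeholder artifacts to a temporary directory and load them back."""
    with tempfile.TemporaryDirectory() as tmp:
        vect_file = Path(tmp) / "vectorizer.pkl"
        model_file = Path(tmp) / "model.pkl"
        vect_file.write_bytes(pickle.dumps({"kind": "vectorizer"}))
        model_file.write_bytes(pickle.dumps({"kind": "model"}))
        return load_artifacts(str(vect_file), str(model_file), pickle_loader)


print(demo())
```

Joblib files are pickle-based, so they should only be loaded from a trusted source.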


## API Class

In [3]:
class SentimentAPI:
    """Simple API for sentiment prediction."""
    
    def __init__(self, vectorizer, model, label_map=None):
        self.vectorizer = vectorizer
        self.model = model
        self.label_map = label_map or LABEL_MAP
    
    def predict(self, text: str) -> str:
        """Predict sentiment for a single text."""
        return predict_sentiment(text, self.vectorizer, self.model, self.label_map)[0]
    
    def predict_batch(self, texts: List[str]) -> List[str]:
        """Predict sentiment for multiple texts."""
        return predict_sentiment(texts, self.vectorizer, self.model, self.label_map)
    
    def predict_with_confidence(self, text: str) -> Dict[str, float]:
        """Predict with probability scores."""
        return get_prediction_proba(text, self.vectorizer, self.model, self.label_map)
    
    def predict_batch_with_confidence(self, texts: List[str]) -> List[Dict[str, float]]:
        """Batch prediction with scores."""
        return [self.predict_with_confidence(text) for text in texts]


api = SentimentAPI(vectorizer, model)
print("✓ SentimentAPI ready")

✓ SentimentAPI ready


## Demo 1: Single Predictions

In [4]:
examples = [
    "I love flying with this airline! Best experience ever!",
    "The flight was cancelled with no explanation. Terrible service.",
    "The flight arrived on time. Standard service.",
    "Amazing crew and comfortable seats!",
    "Lost my luggage. Very disappointed.",
]

print("\n" + "="*60)
print("SINGLE PREDICTIONS")
print("="*60)

for i, text in enumerate(examples, 1):
    pred = api.predict(text)
    print(f"\n[{i}] {text}")
    print(f"    → {pred.upper()}")


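============================================================
SINGLE PREDICTIONS
============================================================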

[1] I love flying with this airline! Best experience ever!
    → POSITIVE

[2] The flight was cancelled with no explanation. Terrible service.
    → NEGATIVE

[3] The flight arrived on time. Standard service.
    → NEGATIVE

[4] Amazing crew and comfortable seats!
    → POSITIVE

[5] Lost my luggage. Very disappointed.
    → NEGATIVE


## Demo 2: Batch Predictions

In [5]:
print("\n" + "="*60)
print("BATCH PREDICTIONS")
print("="*60)

preds = api.predict_batch(examples)

results_df = pd.DataFrame({
    'text': examples,
    'prediction': preds
})

print("\nResults:")
for idx, row in results_df.iterrows():
    print(f"\n{idx+1}. [{row['prediction'].upper()}]")
    txt = (row['text'][:70] + "...") if len(row['text']) > 70 else row['text']
    print(f"   {txt}")

print("\n\nSummary:")
print(results_df['prediction'].value_counts())


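============================================================
BATCH PREDICTIONS
============================================================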

Results:

1. [POSITIVE]
   I love flying with this airline! Best experience ever!

2. [NEGATIVE]
   The flight was cancelled with no explanation. Terrible service.

3. [NEGATIVE]
   The flight arrived on time. Standard service.

4. [POSITIVE]
   Amazing crew and comfortable seats!

5. [NEGATIVE]
   Lost my luggage. Very disappointed.


Summary:
prediction
negative    3
positive    2
Name: count, dtype: int64


## Demo 3: With Confidence Scores

In [6]:
test_texts = [
    "Best airline ever!",
    "Worst flight ever.",
    "Flight was okay, nothing special.",
]

print("\n" + "="*60)
print("WITH CONFIDENCE SCORES")
print("="*60)

for i, text in enumerate(test_texts, 1):
    probs = api.predict_with_confidence(text)
    top_pred = max(probs, key=probs.get)
    top_conf = probs[top_pred]
    
    print(f"\n[{i}] {text}")
    print("    Probabilities:")
    for label in ['negative', 'neutral', 'positive']:
        conf = probs[label]
        bar = '█' * int(conf * 40)
        print(f"      {label:10s}: {conf:.3f} {bar}")
    print(f"    → Prediction: {top_pred.upper()} ({top_conf:.0%})")


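============================================================
WITH CONFIDENCE SCORES
============================================================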

[1] Best airline ever!
    Probabilities:
      negative  : 0.025 █
      neutral   : 0.026 █
      positive  : 0.949 █████████████████████████████████████
    → Prediction: POSITIVE (95%)

[2] Worst flight ever.
    Probabilities:
      negative  : 0.678 ███████████████████████████
      neutral   : 0.207 ████████
      positive  : 0.115 ████
    → Prediction: NEGATIVE (68%)

[3] Flight was okay, nothing special.
    Probabilities:
      negative  : 0.493 ███████████████████
      neutral   : 0.448 █████████████████
      positive  : 0.060 ██
    → Prediction: NEGATIVE (49%)
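
The third example is a borderline case: negative (0.493) and neutral (0.448) are nearly tied, yet the top label is reported as NEGATIVE. One way to surface such cases is to require a minimum gap between the two highest probabilities. The sketch below shows one possible approach and is not part of `SentimentAPI`. The function name `label_or_uncertain`, the `"uncertain"` label, and the 0.10 margin are hypothetical choices.

```python
from typing import Dict


def label_or_uncertain(probs: Dict[str, float], min_margin: float = 0.10) -> str:
    """Return the top label, or "uncertain" when the top two probabilities are too close.

    Expects at least two classes in `probs`.
    """
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    (top_label, top_p), (_, second_p) = ranked[0], ranked[1]
    return top_label if top_p - second_p >= min_margin else "uncertain"


# Probabilities copied from the outputs above.
clear_case = {"negative": 0.025, "neutral": 0.026, "positive": 0.949}
borderline = {"negative": 0.493, "neutral": 0.448, "positive": 0.060}

print(label_or_uncertain(clear_case))   # positive
print(label_or_uncertain(borderline))   # uncertain
```

A margin is used here rather than a threshold on the top probability alone because a three-class model can put under 50% on its top label and still be well separated from the runner-up.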


## Demo 4: Real Feedback Analysis

In [7]:
feedback = [
    "Great experience, booked last minute!",
    "Delayed 3 hours with no explanation.",
    "On-time arrival, comfortable seats.",
    "Food was cold and service was slow.",
    "Helpful and friendly staff!",
    "Just a regular flight.",
]

print("\n" + "="*60)
print("CUSTOMER FEEDBACK ANALYSIS")
print("="*60)

preds = api.predict_batch(feedback)

df_feedback = pd.DataFrame({
    'feedback': feedback,
    'sentiment': preds
})

print("\nAnalyzed feedback:")
print(df_feedback.to_string(index=False))

print("\n\nSummary:")
counts = df_feedback['sentiment'].value_counts()
for sentiment in ['positive', 'neutral', 'negative']:
    count = counts.get(sentiment, 0)
    pct = (count / len(df_feedback)) * 100
    print(f"  {sentiment:10s}: {count} ({pct:.0f}%)")


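============================================================
CUSTOMER FEEDBACK ANALYSIS
============================================================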

Analyzed feedback:
                             feedback sentiment
Great experience, booked last minute!  positive
 Delayed 3 hours with no explanation.  negative
  On-time arrival, comfortable seats.  positive
  Food was cold and service was slow.  negative
          Helpful and friendly staff!  positive
               Just a regular flight.   neutral


Summary:
  positive  : 3 (50%)
  neutral   : 1 (17%)
  negative  : 2 (33%)


## API Spec

In [8]:
print("""
╔════════════════════════════════════════════════════════════╗
║             SENTIMENT ANALYSIS API                          ║
╚════════════════════════════════════════════════════════════╝

Methods:

1. api.predict(text: str) -> str
   Single prediction. Returns one of: "negative", "neutral", "positive"

2. api.predict_batch(texts: List[str]) -> List[str]
   Batch prediction. Fast for multiple texts.

3. api.predict_with_confidence(text: str) -> Dict[str, float]
   Returns probabilities: {"negative": 0.7, "neutral": 0.2, ...}

4. api.predict_batch_with_confidence(texts: List[str]) -> List[Dict[str, float]]
   Batch with probabilities.

Model Info:
- Vectorizer: TF-IDF (7,576 features in the loaded vocabulary)
- Classifier: Logistic Regression
- Training: 14,640 airline tweets
- Test accuracy: ~80%

Usage Tips:
- Use predict() for single texts
- Use predict_batch() for efficiency
- Check confidence for borderline cases
- Works best on English airline-related text
""")


╔════════════════════════════════════════════════════════════╗
║             SENTIMENT ANALYSIS API                          ║
╚════════════════════════════════════════════════════════════╝

Methods:

1. api.predict(text: str) -> str
   Single prediction. Returns one of: "negative", "neutral", "positive"

2. api.predict_batch(texts: List[str]) -> List[str]
   Batch prediction. Fast for multiple texts.

3. api.predict_with_confidence(text: str) -> Dict[str, float]
   Returns probabilities: {"negative": 0.7, "neutral": 0.2, ...}

4. api.predict_batch_with_confidence(texts: List[str]) -> List[Dict[str, float]]
   Batch with probabilities.

Model Info:
- Vectorizer: TF-IDF (7,576 features in the loaded vocabulary)
- Classifier: Logistic Regression
- Training: 14,640 airline tweets
- Test accuracy: ~80%

Usage Tips:
- Use predict() for single texts
- Use predict_batch() for efficiency
- Check confidence for borderline cases
- Works best on English airline-related text
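
The spec above does not say how the methods handle empty or non-string input. A caller could normalize inputs before passing them to `predict_batch`. The following is a minimal sketch, assuming a hypothetical helper `clean_texts` (not part of this notebook) that rejects non-strings and drops blank entries.

```python
from typing import Iterable, List


def clean_texts(texts: Iterable[object]) -> List[str]:
    """Strip whitespace, drop blank strings, and reject anything that is not a str."""
    if isinstance(texts, str):
        # A bare string would otherwise be iterated character by character.
        raise TypeError("expected a list of strings, got a single str")
    cleaned: List[str] = []
    for i, text in enumerate(texts):
        if not isinstance(text, str):
            raise TypeError(f"texts[{i}] is {type(text).__name__}, expected str")
        stripped = text.strip()
        if stripped:
            cleaned.append(stripped)
    return cleaned


print(clean_texts(["  Great flight!  ", "", "   ", "Lost my luggage."]))
# ['Great flight!', 'Lost my luggage.']
```

Because blank entries are dropped, the cleaned list can be shorter than the input, so callers that need to align predictions with original rows would have to keep track of the surviving indices.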

