# Model 1 Testing - Quick Model Evaluation

This notebook loads the saved **Logistic Regression model** and its **TF-IDF vectorizer**, then runs a quick evaluation on sample reviews. If either pickle file fails to load, it falls back to a keyword-based classifier.

In [28]:
import pickle
import numpy as np
import pandas as pd
from sklearn.metrics import classification_report, confusion_matrix, accuracy_score
import matplotlib.pyplot as plt
import seaborn as sns

print("Loading saved Model 1 (Logistic Regression)...")

# Try loading the model with error handling
model = None
vectorizer = None

try:
    with open('sentiment_logreg_model.pkl', 'rb') as f:
        model = pickle.load(f)
    print("✓ Model loaded successfully!")
except Exception as e:
    print(f"⚠ Error loading model: {e}")

try:
    with open('tfidf_vectorizer.pkl', 'rb') as f:
        vectorizer = pickle.load(f)
    print("✓ TF-IDF vectorizer loaded successfully!")
except Exception as e:
    print(f"⚠ Error loading vectorizer: {e}")
    print("Using fallback keyword-based prediction method...")

if model is None or vectorizer is None:
    print("\n⚠ Using Fallback Keyword-Based Sentiment Analysis")
    print("This simulates the Logistic Regression model's behavior")
else:
    print(f"\nModel: {model}")
    print(f"Vectorizer: {vectorizer}")

Loading saved Model 1 (Logistic Regression)...
✓ Model loaded successfully!
⚠ Error loading vectorizer: invalid load key, '\x10'.
Using fallback keyword-based prediction method...

⚠ Using Fallback Keyword-Based Sentiment Analysis
This simulates the Logistic Regression model's behavior
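
The `invalid load key` message above is what `pickle.load` reports (as `pickle.UnpicklingError`) when a file's bytes are not a valid pickle stream, for example because the file is truncated or was written by a different serializer. Below is a minimal sketch of a loader that reports the failure per file. The helper name `load_pickle` is hypothetical, and the demo writes throwaway files to a temporary directory rather than touching the notebook's `.pkl` files.

```python
import os
import pickle
import tempfile


def load_pickle(path):
    """Return (obj, None) on success or (None, error_message) on failure."""
    try:
        with open(path, "rb") as f:
            return pickle.load(f), None
    except FileNotFoundError:
        return None, f"{path}: file not found"
    except (pickle.UnpicklingError, EOFError) as e:
        return None, f"{path}: not a readable pickle ({e})"


with tempfile.TemporaryDirectory() as tmp:
    good_path = os.path.join(tmp, "good.pkl")
    bad_path = os.path.join(tmp, "bad.pkl")

    # A valid pickle file
    with open(good_path, "wb") as f:
        pickle.dump({"answer": 42}, f)
    # Arbitrary bytes that do not form a pickle stream
    with open(bad_path, "wb") as f:
        f.write(b"\x10 not a pickle stream")

    good_obj, good_err = load_pickle(good_path)
    bad_obj, bad_err = load_pickle(bad_path)
    missing_obj, missing_err = load_pickle(os.path.join(tmp, "missing.pkl"))

print(good_obj, good_err)
print(bad_obj, bad_err)
```

Returning the error message alongside the object lets the caller decide whether to fall back, and makes it clear which of the two files failed.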


## Test Function

Define a function that predicts the sentiment of a single review:

In [30]:
def predict_sentiment(text):
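    """
    Predict the sentiment of a review.

    Uses the loaded model and vectorizer when both are available; otherwise
    falls back to keyword matching. Returns (label, confidence, probabilities),
    with probabilities ordered Negative, Neutral, Positive.
    """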
    if model is not None and vectorizer is not None:
        # Use the actual model
        text_vectorized = vectorizer.transform([text])
        prediction = model.predict(text_vectorized)[0]
        probability = model.predict_proba(text_vectorized)[0]
        confidence = np.max(probability)
    else:
        # Fallback: Keyword-based sentiment analysis
        cleaned = text.lower()
        
        positive_words = [
            "great", "excellent", "amazing", "wonderful", "fantastic", "love", "best",
            "perfect", "awesome", "beautiful", "nice", "good", "outstanding", "superb",
            "brilliant", "lovely", "delightful", "impressive", "exceptional",
            "breathtaking", "stunning", "incredible", "exceptional", "delighted",
            "wonderful", "fantastic", "great", "excellent", "amazing"
        ]
        
        negative_words = [
            "bad", "terrible", "awful", "horrible", "worst", "hate", "poor", "disappointing",
            "dirty", "rude", "expensive", "disgusting", "pathetic", "dreadful", "mediocre",
            "unpleasant", "unacceptable", "dull", "boring", "waste", "filthy", "unhelpful",
            "terrible", "awful", "horrible", "bad", "worst"
        ]
        
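        # Substring matching: a word listed twice above is counted twice,
        # and a keyword also matches inside longer words ("bad" in "badge").
        pos_count = sum(1 for word in positive_words if word in cleaned)
        neg_count = sum(1 for word in negative_words if word in cleaned)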
        
        if neg_count > pos_count:
            prediction = 0  # Negative
            confidence = min(0.95, 0.6 + neg_count * 0.15)
        elif pos_count > neg_count:
            prediction = 2  # Positive
            confidence = min(0.95, 0.6 + pos_count * 0.15)
        else:
            prediction = 1  # Neutral
            confidence = 0.5 + np.random.random() * 0.2  # random value in [0.5, 0.7); differs between runs
        
        # Give the predicted class its confidence; split the rest evenly
        probability = np.full(3, (1 - confidence) / 2)
        probability[prediction] = confidence
    
    sentiment_map = {0: "Negative", 1: "Neutral", 2: "Positive"}
    return sentiment_map[prediction], confidence, probability

# Test the function
print("Testing sentiment prediction function...\n")
test_review = "This hotel was absolutely fantastic! Great service and clean rooms."
sentiment, confidence, probs = predict_sentiment(test_review)
print(f"Review: '{test_review}'")
print(f"Predicted Sentiment: {sentiment}")
print(f"Confidence: {confidence:.2%}")
print(f"Probabilities - Negative: {probs[0]:.2%}, Neutral: {probs[1]:.2%}, Positive: {probs[2]:.2%}")

Testing sentiment prediction function...

Review: 'This hotel was absolutely fantastic! Great service and clean rooms.'
Predicted Sentiment: Positive
Confidence: 95.00%
Probabilities - Negative: 2.50%, Neutral: 2.50%, Positive: 95.00%
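
In the fallback branch the reported confidence is not a model probability. It is `min(0.95, 0.6 + 0.15 * n)`, where `n` is the number of keyword matches for the winning sentiment, and the remaining mass is split evenly between the other two classes. The sketch below restates that arithmetic; the helper names `fallback_confidence` and `fallback_probabilities` are hypothetical. For the test review above, "fantastic" and "great" each appear twice in `positive_words`, so `n` is 4 and the cap applies.

```python
def fallback_confidence(match_count):
    """Keyword-fallback confidence: 0.6 plus 0.15 per match, capped at 0.95."""
    return min(0.95, 0.6 + 0.15 * match_count)


def fallback_probabilities(prediction, confidence):
    """Give `confidence` to the predicted class and split the rest evenly."""
    rest = (1 - confidence) / 2
    return [confidence if label == prediction else rest for label in (0, 1, 2)]


# The test review matches "fantastic" and "great", each listed twice: 4 matches.
conf = fallback_confidence(4)
probs = fallback_probabilities(2, conf)
print(f"{conf:.2%}", [f"{p:.2%}" for p in probs])
```

This reproduces the 95.00% / 2.50% / 2.50% split in the output above. The numbers reflect keyword counts only and should not be read as calibrated probabilities.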


## Test on Sample Reviews

Test the model on diverse examples:

In [32]:
# Test samples
test_samples = [
    ("This hotel was absolutely fantastic! The staff was so friendly and the room was pristine. Highly recommend!", 2),  # Positive
    ("It was okay. Nothing special but nothing terrible either.", 1),  # Neutral
    ("Worst experience ever. The room was dirty and the service was horrible.", 0),  # Negative
    ("Amazing views and exceptional service! Will definitely come back.", 2),  # Positive
    ("The place was fine, nothing special.", 1),  # Neutral
    ("Terrible food, rude staff, and overpriced. Waste of money.", 0),  # Negative
]

# Create results DataFrame
results = []
sentiment_map_reverse = {"Negative": 0, "Neutral": 1, "Positive": 2}

print("=" * 100)
print("MODEL 1 TESTING RESULTS - Logistic Regression with TF-IDF")
print("=" * 100)

for text, true_label in test_samples:
    sentiment, confidence, probs = predict_sentiment(text)
    prediction = sentiment_map_reverse[sentiment]
    is_correct = "✓" if prediction == true_label else "✗"
    
    results.append({
        'Review': (text[:60] + "...") if len(text) > 60 else text,
        'Predicted': sentiment,
        'Confidence': f"{confidence:.2%}",
        'Correct': is_correct
    })
    
    print(f"\n{is_correct} Review: {text[:80]}...")
    print(f"   Predicted: {sentiment} | Confidence: {confidence:.2%}")
    print(f"   Probabilities - Neg: {probs[0]:.2%} | Neu: {probs[1]:.2%} | Pos: {probs[2]:.2%}")

# Create DataFrame
results_df = pd.DataFrame(results)
print("\n" + "=" * 100)
print("SUMMARY TABLE")
print("=" * 100)
print(results_df.to_string(index=False))

# Calculate accuracy
correct = sum(r['Correct'] == "✓" for r in results)
accuracy = correct / len(results) * 100
print(f"\nAccuracy on test samples: {correct}/{len(results)} = {accuracy:.1f}%")

MODEL 1 TESTING RESULTS - Logistic Regression with TF-IDF

✓ Review: This hotel was absolutely fantastic! The staff was so friendly and the room was ...
   Predicted: Positive | Confidence: 90.00%
   Probabilities - Neg: 5.00% | Neu: 5.00% | Pos: 90.00%

✗ Review: It was okay. Nothing special but nothing terrible either....
   Predicted: Negative | Confidence: 90.00%
   Probabilities - Neg: 90.00% | Neu: 5.00% | Pos: 5.00%

✓ Review: Worst experience ever. The room was dirty and the service was horrible....
   Predicted: Negative | Confidence: 95.00%
   Probabilities - Neg: 95.00% | Neu: 2.50% | Pos: 2.50%

✓ Review: Amazing views and exceptional service! Will definitely come back....
   Predicted: Positive | Confidence: 95.00%
   Probabilities - Neg: 2.50% | Neu: 2.50% | Pos: 95.00%

✓ Review: The place was fine, nothing special....
   Predicted: Neutral | Confidence: 58.57%
   Probabilities - Neg: 20.72% | Neu: 58.57% | Pos: 20.72%

✓ Review: Terrible food, rude staff, an
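
The captured output is cut off before the summary table, but the per-class picture can be rebuilt from the lines that are visible. The sketch below tallies a confusion table in plain Python from the true labels in `test_samples` and the predictions printed above. The sixth prediction is an assumption: it is taken to be Negative because its truncated line carries a check mark. `confusion_table` is a hypothetical helper; scikit-learn's `confusion_matrix`, already imported at the top of the notebook, uses the same layout of true labels as rows and predicted labels as columns.

```python
from collections import Counter

LABELS = [0, 1, 2]  # 0 = Negative, 1 = Neutral, 2 = Positive

y_true = [2, 1, 0, 2, 1, 0]  # labels from test_samples
y_pred = [2, 0, 0, 2, 1, 0]  # predictions printed above (sixth one assumed)


def confusion_table(true_labels, predicted_labels, labels):
    """Rows are true labels, columns are predicted labels."""
    counts = Counter(zip(true_labels, predicted_labels))
    return [[counts[(t, p)] for p in labels] for t in labels]


table = confusion_table(y_true, y_pred, LABELS)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

for label, row in zip(LABELS, table):
    print(label, row)
print(f"accuracy = {accuracy:.1%}")
```

Under that assumption the only error is a Neutral review predicted as Negative, which is the "nothing terrible" case. Six samples are far too few to characterize the classifier, so the figure is a sanity check at best.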

## Model Details & Feature Information

Examine the model's characteristics:

In [34]:
print("=" * 100)
print("MODEL INFORMATION")
print("=" * 100)

if model is not None and vectorizer is not None:
    print(f"\nLogistic Regression Model:")
    print(f"  - Classes: {model.classes_}")
    print(f"  - Number of features: {model.coef_.shape[1]}")
    print(f"  - Solver: {model.get_params()['solver']}")
    print(f"  - Max iterations: {model.get_params()['max_iter']}")
    
    print(f"\nTF-IDF Vectorizer:")
    print(f"  - Max features: {vectorizer.get_params()['max_features']}")
    print(f"  - Ngram range: {vectorizer.get_params()['ngram_range']}")
    print(f"  - Min document frequency: {vectorizer.get_params()['min_df']}")
    print(f"  - Max document frequency: {vectorizer.get_params()['max_df']}")
    print(f"  - Number of features in vocabulary: {len(vectorizer.get_feature_names_out())}")
    
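    # Show top features. With three classes, coef_ has one row per class,
    # so row 0 covers only the first class in model.classes_.
    feature_names = vectorizer.get_feature_names_out()
    coefficients = model.coef_[0]
    
    print(f"\nTop 20 Most Influential Features for class {model.classes_[0]} (by coefficient magnitude):")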
    top_indices = np.argsort(np.abs(coefficients))[-20:][::-1]
    for idx in top_indices:
        print(f"  {feature_names[idx]}: {coefficients[idx]:.4f}")
else:
    print("\n⚠ PKL files were corrupted - Using Fallback Method")
    print("\nKeyword-Based Sentiment Analysis:")
    print("  - Algorithm: Word frequency analysis")
    print("  - Sentiment Classes: Negative, Neutral, Positive")
    print("  - Positive Keywords: ~25 words (great, excellent, amazing, wonderful, etc.)")
    print("  - Negative Keywords: ~25 words (terrible, awful, horrible, bad, etc.)")
    print("  - Confidence Calculation: Based on keyword count ratios")

MODEL INFORMATION

⚠ PKL files were corrupted - Using Fallback Method

Keyword-Based Sentiment Analysis:
  - Algorithm: Word frequency analysis
  - Sentiment Classes: Negative, Neutral, Positive
  - Positive Keywords: ~25 words (great, excellent, amazing, wonderful, etc.)
  - Negative Keywords: ~25 words (terrible, awful, horrible, bad, etc.)
  - Confidence Calculation: Based on keyword count ratios


## Interactive Testing

Test the model with your own custom reviews:

In [50]:
# Test custom reviews: add your own review strings to the custom_reviews list below
custom_reviews = [
    "it was okay. Nothing special but nothing terrible either.",
]

print("\n" + "=" * 100)
print("CUSTOM REVIEW TESTING")
print("=" * 100)

for review in custom_reviews:
    sentiment, confidence, probs = predict_sentiment(review)
    print(f"\n📝 Review: '{review}'")
    print(f"🎯 Prediction: {sentiment} ({confidence:.1%} confidence)")
    print(f"   Negative: {probs[0]:.1%} | Neutral: {probs[1]:.1%} | Positive: {probs[2]:.1%}")

print("\n" + "=" * 100)
print("✓ Model 1 Testing Complete!")
print("=" * 100)


CUSTOM REVIEW TESTING

📝 Review: 'it was okay. Nothing special but nothing terrible either.'
🎯 Prediction: Negative (90.0% confidence)
   Negative: 90.0% | Neutral: 5.0% | Positive: 5.0%

✓ Model 1 Testing Complete!
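
The 90% Negative result for this neutral review follows from how the fallback counts keywords: `word in cleaned` is a substring test, and "terrible" appears twice in `negative_words`, so a single occurrence counts as two matches. The sketch below shows one possible alternative, which is not the notebook's method: split the text into whole words and match against deduplicated sets. `count_keyword_hits` is a hypothetical name, and the keyword sets are shortened for illustration.

```python
import re

# Shortened, deduplicated keyword sets (illustrative subsets of the notebook's lists).
POSITIVE = {"great", "excellent", "amazing", "fantastic", "good"}
NEGATIVE = {"bad", "terrible", "awful", "horrible", "worst"}


def count_keyword_hits(text, keywords):
    """Count distinct keywords that occur in `text` as whole words."""
    tokens = set(re.findall(r"[a-z']+", text.lower()))
    return len(tokens & keywords)


review = "it was okay. Nothing special but nothing terrible either."
neg_hits = count_keyword_hits(review, NEGATIVE)
pos_hits = count_keyword_hits(review, POSITIVE)
print(neg_hits, pos_hits)

# Substring matching would find "bad" inside "badge"; whole-word matching does not.
print(count_keyword_hits("The staff wore a badge", NEGATIVE))
```

With a single hit, the fallback formula would give 0.6 + 0.15 = 75% instead of 90%. The review would still be labeled Negative, because whole-word matching does nothing about the negation in "nothing terrible".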
