# 3-Class Model Inference & Evaluation on USS Reviews

**High-level summary:**  
Loads a saved TF-IDF vectorizer and a trained 3-class logistic regression model, maps star ratings to 3-class true labels, transforms the USS review text into TF-IDF features, predicts a class for each review, and reports overall accuracy plus per-class precision, recall, and F1.


In [None]:
# Mount Google Drive so the project files are reachable from Colab
import os

from google.colab import drive

drive.mount('/content/drive')

# Work from the project directory so relative paths (datastore/, model/) resolve
os.chdir('/content/drive/My Drive/CS605-NLP-Project')

Mounted at /content/drive


In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

Reads the USS reviews CSV into pandas for inference, parsing `publishedAtDate` as a datetime column.

In [None]:
uss_reviews = pd.read_csv("datastore/USS_Reviews_Silver.csv", parse_dates=["publishedAtDate"])
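A quick sanity check after loading can catch missing columns or empty review text before inference. The sketch below uses a hypothetical three-row in-memory sample in place of the real file; the sample rows and the `required` list are assumptions for illustration, with column names taken from the code in this notebook.

```python
import io

import pandas as pd

# Hypothetical three-row sample standing in for USS_Reviews_Silver.csv.
sample_csv = io.StringIO(
    "publishedAtDate,stars,review\n"
    "2024-01-05,5,Great rides and short queues\n"
    "2024-02-11,3,\n"
    "2024-03-20,1,Overpriced and crowded\n"
)
sample = pd.read_csv(sample_csv, parse_dates=["publishedAtDate"])

# Columns the later cells rely on.
required = ["publishedAtDate", "stars", "review"]
missing_cols = [c for c in required if c not in sample.columns]

# An empty CSV field is read as NaN, which is not a string.
n_empty_reviews = int(sample["review"].isna().sum())

print(sample.shape)   # (rows, columns)
print(sample.head())  # first few records
print("Missing columns:", missing_cols)
print("Rows with no review text:", n_empty_reviews)
```

If the real data contains rows with no review text, they would likely need to be dropped or filled with an empty string before vectorizing, since the vectorizer expects string input.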

Load the saved TF-IDF vectorizer and model, derive true labels from the star ratings, vectorize the review text, predict, and evaluate.
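To illustrate why a saved vectorizer is reused at inference time rather than refitted, here is a minimal sketch on a hypothetical toy corpus (the texts are invented for illustration and are not the project's data). Calling `transform` alone keeps the vocabulary and IDF weights learned at fit time, so the feature columns line up with what the model was trained on, and words never seen during fitting are simply ignored.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical toy corpus standing in for the training text.
train_texts = ["great ride", "long queue", "great food"]
toy_vectorizer = TfidfVectorizer()
toy_vectorizer.fit(train_texts)  # vocabulary and IDF weights are fixed here

# At inference: transform only, no refitting on the new reviews.
new_texts = ["great queue", "amazing fireworks"]
X_new = toy_vectorizer.transform(new_texts)

# Number of non-zero features per new document.
nonzero_per_row = (X_new.toarray() != 0).sum(axis=1).tolist()

print(sorted(toy_vectorizer.vocabulary_))
print(X_new.shape)
print(nonzero_per_row)
```

The second toy document contains only unseen words, so its feature row is all zeros. A real review dominated by vocabulary absent from the training data would be represented just as sparsely.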

In [None]:
import joblib
from sklearn.metrics import classification_report, accuracy_score

# 0. Load the saved TF-IDF vectorizer and the trained 3-class model
vectorizer = joblib.load('model/3class_logistic_fidf_vectorizer_v2.pkl')
best_lr = joblib.load('model/3class_logistic_model_v2.pkl')

# 1. 3-class mapping function (same as training)
def to_3class_label(stars):
    """Map a 1-5 star rating to 0 (Negative), 1 (Neutral), or 2 (Positive)."""
    if stars <= 2:
        return 0   # Negative (1-2 stars)
    elif stars == 3:
        return 1   # Neutral (3 stars)
    else:
        return 2   # Positive (4-5 stars)

# 2. Apply to get true labels
uss_reviews['true_3class'] = uss_reviews['stars'].map(to_3class_label)

# 3. Vectorize reviews (using the same preprocessing as training)
X_uss_tfidf = vectorizer.transform(
    uss_reviews['review']
)

# 4. Predict with your loaded 3-class model
uss_reviews['pred_3class'] = best_lr.predict(X_uss_tfidf)

# 5. Evaluation
print("Accuracy on USS reviews:",
      accuracy_score(uss_reviews['true_3class'], uss_reviews['pred_3class']))
print("\nClassification Report (3-class):")
print(classification_report(
    uss_reviews['true_3class'],
    uss_reviews['pred_3class'],
    target_names=['Negative', 'Neutral', 'Positive'],  # order matches labels 0, 1, 2
    digits=4
))

Accuracy on USS reviews: 0.5929552563579491

Classification Report (3-class):
              precision    recall  f1-score   support

    Negative     0.3106    0.7616    0.4412      2206
     Neutral     0.1111    0.4548    0.1786      2133
    Positive     0.9684    0.5899    0.7332     25073

    accuracy                         0.5930     29412
   macro avg     0.4634    0.6021    0.4510     29412
weighted avg     0.8569    0.5930    0.6711     29412
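The gap between the macro and weighted averages in the report follows from the class imbalance. As a worked check, the sketch below recomputes both from the per-class rows printed above (values copied from the report, so results match only up to rounding). The helper names `macro` and `weighted` are introduced here for illustration.

```python
# Per-class rows copied from the classification report above:
# (precision, recall, f1, support)
report = {
    "Negative": (0.3106, 0.7616, 0.4412, 2206),
    "Neutral": (0.1111, 0.4548, 0.1786, 2133),
    "Positive": (0.9684, 0.5899, 0.7332, 25073),
}
total_support = sum(row[3] for row in report.values())


def macro(idx):
    """Unweighted mean over classes: every class counts equally."""
    return sum(row[idx] for row in report.values()) / len(report)


def weighted(idx):
    """Support-weighted mean: large classes dominate."""
    return sum(row[idx] * row[3] for row in report.values()) / total_support


for name, idx in [("precision", 0), ("recall", 1), ("f1", 2)]:
    print(f"{name:9s} macro={macro(idx):.4f} weighted={weighted(idx):.4f}")

# Share of reviews that are Positive, which drives the weighted averages.
positive_share = report["Positive"][3] / total_support
print(f"Positive share of support: {positive_share:.4f}")
```

Because the Positive class makes up most of the support, the weighted averages mostly track the Positive row, while the macro averages expose the weak precision on Negative and Neutral. Support-weighted recall is by definition the same quantity as overall accuracy, which is why those two figures agree in the report.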

