# Prediction Result Analysis (CS2 Reviews)

This notebook analyzes the performance of the sentiment analysis model on Counter-Strike 2 reviews.
We move beyond basic EDA and focus on understanding *how* and *why* the model makes specific predictions.

**Theme:** Asiimov (Orange, Black, White, Grey)

In [None]:
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go
import plotly.io as pio
import numpy as np
from sklearn.metrics import confusion_matrix, classification_report
from sklearn.calibration import calibration_curve

# --- Asiimov Color Scheme ---
asiimov_colors = {
    'orange': '#FF9900',
    'black': '#1A1A1A',
    'white': '#FFFFFF',
    'grey': '#5c5c5c',
    'light_grey': '#d1d1d1'
}

# Setup Plotly Template
pio.templates["asiimov"] = go.layout.Template(
    layout=go.Layout(
        colorway=[asiimov_colors['orange'], asiimov_colors['black'], asiimov_colors['grey']],
        plot_bgcolor=asiimov_colors['white'],
        paper_bgcolor=asiimov_colors['white'],
        font={'color': asiimov_colors['black']},
        title={'font': {'color': asiimov_colors['black']}},
        xaxis={'gridcolor': asiimov_colors['light_grey'], 'linecolor': asiimov_colors['black']},
        yaxis={'gridcolor': asiimov_colors['light_grey'], 'linecolor': asiimov_colors['black']},
    )
)
pio.templates.default = "asiimov"

print("Environment Setup Complete. Asiimov Theme Applied.")

In [None]:
# Load Predictions
df = pd.read_csv('cs2_10k_predictions.csv')

# --- Preprocessing ---
# Ensure ground truth is integer (1 for True/Positive, 0 for False/Negative)
df['actual_label'] = df['voted_up'].astype(int)

# Determine correctness
df['is_correct'] = df['actual_label'] == df['predicted_label']
# Vectorized labelling; any row that is not TP, TN, or FP falls through to 'FN'
a, p = df['actual_label'], df['predicted_label']
df['result_type'] = np.select(
    [(a == 1) & (p == 1), (a == 0) & (p == 0), (a == 0) & (p == 1)],
    ['TP', 'TN', 'FP'],
    default='FN',
)

# Calculate review length in characters
# Uses 'cleaned_review' (the text shown in the error examples); missing reviews count as length 0
df['review_length'] = df['cleaned_review'].fillna('').astype(str).str.len()

print(f"Loaded {len(df)} predictions.")
df.head()

## Insight 1: Performance Overview (Confusion Matrix)
We visualize the confusion matrix to see the balance of True Positives, True Negatives, False Positives, and False Negatives.
We also print per-class precision, recall, and F1-score with `classification_report`.

In [None]:
cm = confusion_matrix(df['actual_label'], df['predicted_label'])
labels = ['Negative', 'Positive']

# Create annotated heatmap
fig = px.imshow(cm, text_auto=True, 
                labels=dict(x="Predicted Label", y="Actual Label", color="Count"),
                x=labels, y=labels,
                color_continuous_scale=[[0, asiimov_colors['white']], [1, asiimov_colors['orange']]]
               )
fig.update_layout(title_text="Confusion Matrix Heatmap")
fig.show()

# Print text report
print(classification_report(df['actual_label'], df['predicted_label'], target_names=labels))
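
To make the link between the heatmap cells and the report explicit, here is a minimal sketch that derives precision, recall, and F1 for the positive class directly from the four confusion-matrix counts. The `y_true` and `y_pred` arrays are toy stand-ins for `df['actual_label']` and `df['predicted_label']`, not values from the dataset.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Toy labels (assumed for illustration only).
y_true = np.array([1, 1, 1, 1, 0, 0, 0, 1, 0, 1])
y_pred = np.array([1, 1, 0, 1, 0, 1, 0, 1, 0, 0])

# With labels=[0, 1], rows are actual and columns are predicted,
# so the flattened matrix reads TN, FP, FN, TP.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()

precision = tp / (tp + fp)   # of everything predicted positive, how much was positive
recall = tp / (tp + fn)      # of everything actually positive, how much was found
f1 = 2 * precision * recall / (precision + recall)

print(f"TN={tn} FP={fp} FN={fn} TP={tp}")
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

These are the same quantities that `classification_report` lists in the `Positive` row; the `Negative` row follows by swapping the roles of the two classes.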

## Insight 2: Prediction Confidence Analysis
How confident is the model? We plot the distribution of predicted probabilities.
Ideally the model is confident and correct: probabilities near 1 for positive reviews and near 0 for negative ones. Predictions near 0.5 indicate uncertainty.
We split the distributions by the **Actual Label** to see how well the two classes separate.

In [None]:
# Histogram of probabilities for Positive vs Negative Ground Truth
fig = px.histogram(df, x="predicted_prob", color="voted_up", 
                   nbins=50, 
                   marginal="box", # Box plot on top
                   opacity=0.7,
                   color_discrete_map={True: asiimov_colors['orange'], False: asiimov_colors['black']},
                   labels={'voted_up': 'Actual Sentiment'},
                   title="Prediction Probability Distribution by Actual Sentiment")
fig.update_layout(barmode='overlay')
fig.update_traces(marker_line_width=0)
fig.show()
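
The histogram can be complemented with a single number: the share of predictions that fall in an "uncertain" band around 0.5, and the accuracy inside and outside that band. The sketch below uses toy probabilities; the band limits (0.4 to 0.6) and the 0.5 decision threshold are assumptions chosen for illustration, not settings taken from the model.

```python
import pandas as pd

# Toy stand-ins for df['predicted_prob'] and df['actual_label'].
probs = pd.Series([0.02, 0.97, 0.55, 0.48, 0.91, 0.10, 0.61, 0.35, 0.99, 0.45])
labels = pd.Series([0, 1, 1, 0, 1, 0, 0, 1, 1, 1])

lo, hi = 0.4, 0.6                      # assumed "uncertain" band
uncertain = probs.between(lo, hi)      # inclusive on both ends

preds = (probs >= 0.5).astype(int)     # assumed decision threshold of 0.5
correct = preds == labels

uncertain_share = uncertain.mean()
acc_uncertain = correct[uncertain].mean()
acc_confident = correct[~uncertain].mean()

print(f"Share of uncertain predictions: {uncertain_share:.1%}")
print(f"Accuracy inside the band:  {acc_uncertain:.1%}")
print(f"Accuracy outside the band: {acc_confident:.1%}")
```

A large gap between the two accuracies would suggest that the probability carries useful information about when the model can be trusted.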

## Insight 3: Accuracy vs. Review Length
Does the model perform better on longer or shorter reviews?
We bin reviews by length and calculate the accuracy for each bin.

In [None]:
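# Create bins for review length (characters).
# The last edge is open-ended so reviews longer than 5000 characters still land in '1000+',
# and include_lowest=True keeps zero-length reviews in the first bin instead of dropping them.
bins = [0, 50, 100, 200, 500, 1000, np.inf]
labels_len = ['0-50', '50-100', '100-200', '200-500', '500-1000', '1000+']
df['length_bin'] = pd.cut(df['review_length'], bins=bins, labels=labels_len, include_lowest=True)

# Calculate accuracy per bin (observed=False keeps empty bins on the axis)
acc_by_len = df.groupby('length_bin', observed=False)['is_correct'].mean().reset_index()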

fig = px.line(acc_by_len, x='length_bin', y='is_correct', markers=True,
              title="Model Accuracy by Review Length",
              labels={'length_bin': 'Review Length (Characters)', 'is_correct': 'Accuracy'},
              color_discrete_sequence=[asiimov_colors['orange']])
fig.update_yaxes(tickformat=".1%")
fig.show()
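
Per-bin accuracy is only as reliable as the number of reviews in each bin, so it can help to report the bin size next to the accuracy. The following is a small sketch on a toy frame; the column names mirror the ones used in this notebook, while the data, the open-ended last bin, and `include_lowest=True` are choices made for this illustration.

```python
import numpy as np
import pandas as pd

# Toy data (assumed for illustration only).
toy = pd.DataFrame({
    'review_length': [0, 12, 40, 75, 150, 160, 450, 900, 1200, 7000],
    'is_correct': [True, True, False, True, True, False, True, True, False, True],
})

bins = [0, 50, 100, 200, 500, 1000, np.inf]
names = ['0-50', '50-100', '100-200', '200-500', '500-1000', '1000+']
toy['length_bin'] = pd.cut(toy['review_length'], bins=bins, labels=names,
                           include_lowest=True)

# Accuracy and number of reviews per bin, side by side.
summary = (toy.groupby('length_bin', observed=False)['is_correct']
              .agg(accuracy='mean', n_reviews='count'))
print(summary)
```

Bins with only a handful of reviews can swing widely in accuracy, so a dip or spike in the line chart is worth checking against `n_reviews` before interpreting it.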

## Insight 4: Error Analysis (Confident Mistakes)
We examine cases where the model was **very confident** (predicted probability above 0.90 or below 0.10) but **incorrect**.
These "confident errors" often reveal specific weaknesses (e.g., sarcasm, misleading keywords).

In [None]:
# Filter for confident errors
confident_fp = df[(df['result_type'] == 'FP') & (df['predicted_prob'] > 0.90)]
confident_fn = df[(df['result_type'] == 'FN') & (df['predicted_prob'] < 0.10)]

print(f"Confident False Positives (Predicted Positive, Actual Negative): {len(confident_fp)}")
print(f"Confident False Negatives (Predicted Negative, Actual Positive): {len(confident_fn)}")

print("\n--- Examples of Confident False Positives (Model thought it was GOOD, but it was BAD) ---")
# str(...) guards against missing reviews (NaN), which cannot be sliced
for _, row in confident_fp.head(3).iterrows():
    print(f"[Prob: {row['predicted_prob']:.4f}] Review: {str(row['cleaned_review'])[:200]}...")

print("\n--- Examples of Confident False Negatives (Model thought it was BAD, but it was GOOD) ---")
for _, row in confident_fn.head(3).iterrows():
    print(f"[Prob: {row['predicted_prob']:.4f}] Review: {str(row['cleaned_review'])[:200]}...")
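
Instead of fixed cut-offs, errors can also be ranked by how far the predicted probability sits from 0.5, which surfaces the most confident mistakes first regardless of direction. This is a minimal sketch on toy data; the `margin` column and the 0.5 decision threshold are assumptions introduced here, not part of the prediction file.

```python
import pandas as pd

# Toy stand-ins for the prediction columns.
toy = pd.DataFrame({
    'predicted_prob': [0.98, 0.03, 0.70, 0.08, 0.55, 0.93],
    'actual_label': [0, 1, 0, 1, 1, 1],
})

# Assumed decision threshold of 0.5.
toy['predicted_label'] = (toy['predicted_prob'] >= 0.5).astype(int)

# Distance from 0.5: 0 means maximally unsure, 0.5 means maximally confident.
toy['margin'] = (toy['predicted_prob'] - 0.5).abs()

errors = toy[toy['predicted_label'] != toy['actual_label']]
worst = errors.sort_values('margin', ascending=False)
print(worst[['predicted_prob', 'actual_label', 'margin']])
```

Reading the top rows of `worst` gives the same kind of examples as the fixed 0.90 / 0.10 filter, but without having to choose the cut-offs in advance.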

## Insight 5: Calibration Curve
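A reliability diagram checks whether the predicted probabilities are well calibrated.
If the model predicts 0.7 for a set of samples, ~70% of them should actually be positive.
- **Perfectly calibrated:** the curve follows the dashed diagonal.
- **Below / above the diagonal:** points below the diagonal mean the predicted probabilities are too high for that bin; points above mean they are too low.
- **S-shape:** a sigmoid-shaped curve is typical of an under-confident model, a transposed sigmoid of an over-confident one.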

In [None]:
prob_true, prob_pred = calibration_curve(df['actual_label'], df['predicted_prob'], n_bins=10)

calibration_df = pd.DataFrame({'Mean Predicted Probability': prob_pred, 'Fraction of Positives': prob_true})

fig = px.line(calibration_df, x='Mean Predicted Probability', y='Fraction of Positives',
              markers=True, title="Calibration Curve (Reliability Diagram)",
              color_discrete_sequence=[asiimov_colors['orange']])

# Add diagonal reference line
fig.add_shape(type="line", x0=0, y0=0, x1=1, y1=1,
              line=dict(color=asiimov_colors['grey'], width=2, dash="dash"))

fig.update_layout(xaxis_range=[0, 1], yaxis_range=[0, 1])
fig.show()
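
The reliability diagram can be summarized numerically. The sketch below computes the Brier score (mean squared error between probability and outcome) and a simple expected calibration error (ECE) with equal-width bins, using plain NumPy on toy data. The arrays and the choice of 5 bins are assumptions for illustration; ECE has several variants and this is only one of them.

```python
import numpy as np

# Toy stand-ins for df['actual_label'] and df['predicted_prob'].
y_true = np.array([0, 0, 1, 1, 1, 0, 1, 1, 0, 1])
y_prob = np.array([0.1, 0.3, 0.8, 0.9, 0.6, 0.2, 0.7, 0.4, 0.6, 0.95])

# Brier score: 0 is perfect, lower is better.
brier = np.mean((y_prob - y_true) ** 2)

# Equal-width bins over [0, 1]; digitize against the inner edges gives ids 0..n_bins-1.
n_bins = 5
edges = np.linspace(0.0, 1.0, n_bins + 1)
bin_ids = np.digitize(y_prob, edges[1:-1])

# ECE: per-bin gap between observed positive rate and mean predicted probability,
# weighted by the fraction of samples in the bin.
ece = 0.0
for b in range(n_bins):
    mask = bin_ids == b
    if mask.any():
        gap = abs(y_true[mask].mean() - y_prob[mask].mean())
        ece += mask.mean() * gap

print(f"Brier score: {brier:.4f}")
print(f"ECE ({n_bins} bins): {ece:.4f}")
```

Both numbers shrink as the curve moves toward the diagonal, which makes them convenient for comparing the model before and after any recalibration step.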