# Sentiment Prediction Analysis

This notebook analyzes the results of sentiment prediction on Counter-Strike 2 reviews.
We focus on deriving insights from the model's performance and behavior.

**Theme:** Asiimov (Orange, Black, White)

In [None]:
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go
import plotly.io as pio
import numpy as np

# Define Asiimov Color Palette
# Inspired by the CS:GO Asiimov skin: Distinctive Orange, Black, and White.
asiimov_colors = {
    'orange': '#ff9d00',
    'black': '#1a1a1a',
    'white': '#ffffff',
    'grey': '#5c5c5c',
    'light_grey': '#d1d1d1'
}

# Register a custom Plotly template and make it the default for all figures
pio.templates["asiimov"] = go.layout.Template(
    layout=go.Layout(
        colorway=[asiimov_colors['orange'], asiimov_colors['black'], asiimov_colors['grey']],
        plot_bgcolor=asiimov_colors['white'],
        paper_bgcolor=asiimov_colors['white'],
        font={'color': asiimov_colors['black']},
        title={'font': {'color': asiimov_colors['black']}},
    )
)
pio.templates.default = "asiimov"

print("Libraries loaded and Asiimov theme defined.")

In [None]:
# Load the prediction results
df = pd.read_csv('cs2_10k_predictions.csv')

# Display first few rows to verify
df.head()

In [None]:
# Preprocessing for Analysis

# Create a column for correctness
# voted_up is True/False, predicted_label is 1/0.
df['actual_label'] = df['voted_up'].astype(int)
df['is_correct'] = df['actual_label'] == df['predicted_label']

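# Calculate review length in characters.
# Missing reviews are treated as empty so they are not counted as the 3-character string 'nan'.
df['review_length'] = df['clean_review'].fillna('').astype(str).str.len()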

# Map boolean and numeric labels to readable strings for plotting
df['prediction_status'] = df['is_correct'].map({True: 'Correct', False: 'Incorrect'})
df['sentiment_label'] = df['actual_label'].map({1: 'Positive', 0: 'Negative'})

print("Preprocessing complete.")

## Insight 1: Model Prediction Distribution
We investigate how the model performs across positive and negative classes. Does it have a bias towards one sentiment?

In [None]:
# Confusion Matrix-style breakdown
confusion_data = df.groupby(['sentiment_label', 'predicted_label']).size().reset_index(name='count')
confusion_data['predicted_label_str'] = confusion_data['predicted_label'].map({1: 'Predicted Positive', 0: 'Predicted Negative'})

# Plotting with Asiimov colors
fig = px.bar(
    confusion_data,
    x='sentiment_label',
    y='count',
    color='predicted_label_str',
    title='Model Prediction Distribution by Actual Sentiment',
    color_discrete_map={
        'Predicted Positive': asiimov_colors['orange'],
        'Predicted Negative': asiimov_colors['black']
    },
    barmode='group'
)

fig.update_layout(
    xaxis_title="Actual Sentiment",
    yaxis_title="Count",
    legend_title="Prediction"
)

fig.show()

**Commentary:**
This chart shows the confusion matrix as grouped bars.
- A tall orange bar for 'Positive' and a tall black bar for 'Negative' mean the model classifies most reviews correctly.
- A black bar under 'Positive' counts false negatives, and an orange bar under 'Negative' counts false positives. Whichever is taller shows the more prevalent error type.
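
Raw counts are hard to compare when one class is much larger than the other. As a minimal sketch, the same breakdown can be expressed as per-class rates with a row-normalized cross-tabulation. The `toy` frame below is made-up data standing in for `df`; only the column names `actual_label` and `predicted_label` come from this notebook.

```python
import pandas as pd

# Made-up stand-in for df; values are illustrative only.
toy = pd.DataFrame({
    'actual_label':    [1, 1, 1, 1, 0, 0, 0, 1],
    'predicted_label': [1, 1, 1, 0, 0, 1, 0, 1],
})

# Rows are actual classes, columns are predictions.
# normalize='index' rescales each row so its rates sum to 1.
rates = pd.crosstab(toy['actual_label'], toy['predicted_label'], normalize='index')

false_positive_rate = rates.loc[0, 1]  # actual negative, predicted positive
false_negative_rate = rates.loc[1, 0]  # actual positive, predicted negative

print(rates)
```

Running the same `pd.crosstab` call on `df` would show whether the model's errors lean towards one sentiment once class size is accounted for.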

## Insight 2: Confidence Distribution
Is the model confident when it's wrong? We analyze the distribution of predicted probabilities.

In [None]:
# Histogram of probabilities
fig = px.histogram(
    df,
    x='predicted_prob',
    color='prediction_status',
    nbins=50,
    title='Distribution of Prediction Probabilities (Confidence)',
    color_discrete_map={
        'Correct': asiimov_colors['orange'],
        'Incorrect': asiimov_colors['black']
    },
    opacity=0.7,
    barmode='overlay'
)

fig.update_layout(
    xaxis_title="Predicted Probability (0=Negative, 1=Positive)",
    yaxis_title="Count"
)

fig.show()

**Commentary:**
- **Correct Predictions (Orange):** Ideally cluster near 0 and 1, where the model is most confident.
- **Incorrect Predictions (Black):**
    - A cluster around 0.5 means the model was uncertain when it erred.
    - A cluster near 0 or 1 means the model was "confidently wrong".
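
The "confidently wrong" case can also be put into a single number. The sketch below assumes a decision threshold of 0.5 and uses a hypothetical cut-off of 0.9 for "confident"; both are assumptions, not values taken from the model. The `toy` frame is made-up data with the same column names as `df`.

```python
import pandas as pd

# Made-up stand-in for df; values are illustrative only.
toy = pd.DataFrame({
    'predicted_prob': [0.97, 0.03, 0.55, 0.92, 0.48, 0.08, 0.70],
    'is_correct':     [True, True, False, False, True, False, True],
})

CONFIDENT = 0.9  # hypothetical cut-off for a "confident" prediction

# Probability assigned to whichever class was predicted,
# assuming the label is chosen by thresholding predicted_prob at 0.5.
p = toy['predicted_prob']
top_prob = p.where(p >= 0.5, 1 - p)

# Share of incorrect predictions that were made with high confidence.
wrong = ~toy['is_correct']
confidently_wrong_share = (top_prob[wrong] >= CONFIDENT).mean()

print(f"{confidently_wrong_share:.1%} of errors were confident")
```

A high share on the real data would suggest systematic misreadings (sarcasm, for example) rather than borderline cases.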

## Insight 3: Playtime vs. Prediction Accuracy
Do players with more experience write reviews that are easier or harder to classify? Veterans might use more slang or sarcasm.

In [None]:
# Convert playtime (minutes) to hours
df['playtime_hours'] = df['author_playtime_at_review'] / 60

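# Create bins for playtime; the open-ended last bin catches any playtime above 5,000 hours
bins = [0, 10, 100, 500, 1000, 5000, np.inf]
labels = ['0-10h', '10-100h', '100-500h', '500-1k h', '1k-5k h', '5k+ h']
# include_lowest=True keeps reviews with exactly 0 hours in the first bin instead of dropping them
df['playtime_category'] = pd.cut(df['playtime_hours'], bins=bins, labels=labels, include_lowest=True)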

# Calculate accuracy per bin
accuracy_by_playtime = df.groupby('playtime_category', observed=True)['is_correct'].mean().reset_index()

fig = px.bar(
    accuracy_by_playtime,
    x='playtime_category',
    y='is_correct',
    title='Model Accuracy by Player Experience (Playtime)',
    color_discrete_sequence=[asiimov_colors['orange']]
)

fig.update_layout(
    xaxis_title="Playtime at Review",
    yaxis_title="Accuracy",
    yaxis_tickformat='.1%'
)

fig.show()
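
Per-bin accuracy is only as reliable as the number of reviews in each bin, and the extreme playtime bins may be small. The following is a rough sketch of how counts and a binomial standard error could be reported alongside the mean. The `toy` frame is made-up data; the standard-error formula is a simple normal approximation chosen for illustration.

```python
import numpy as np
import pandas as pd

# Made-up stand-in for df; values are illustrative only.
toy = pd.DataFrame({
    'playtime_category': ['0-10h'] * 4 + ['10-100h'] * 10,
    'is_correct': [True, True, True, False] + [True] * 8 + [False] * 2,
})

# Accuracy and sample size per bin.
summary = (
    toy.groupby('playtime_category')['is_correct']
       .agg(accuracy='mean', n='size')
       .reset_index()
)

# Binomial standard error of each accuracy estimate: sqrt(p * (1 - p) / n).
summary['std_err'] = np.sqrt(summary['accuracy'] * (1 - summary['accuracy']) / summary['n'])

print(summary)
```

A column like `std_err` could then be passed to `px.bar` through its `error_y` argument, so that bins with few reviews are visibly less certain.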

## Insight 4: Review Length vs. Model Confidence
Is the model more confident when it has more text to analyze?

In [None]:
# Calculate 'confidence' as absolute distance from 0.5 (neutral).
df['model_confidence'] = (df['predicted_prob'] - 0.5).abs() * 2  # Scale 0 to 1

# Scatter plot of length vs confidence
fig = px.scatter(
    df,
    x='review_length',
    y='model_confidence',
    color='prediction_status',
    title='Review Length vs. Model Confidence',
    color_discrete_map={
        'Correct': asiimov_colors['orange'],
        'Incorrect': asiimov_colors['black']
    },
    opacity=0.6,
    log_x=True  # Log scale for length
)

fig.update_layout(
    xaxis_title="Review Length (characters) - Log Scale",
    yaxis_title="Model Confidence (0=Unsure, 1=Sure)"
)

fig.show()
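
A scatter plot with thousands of points can hide a weak trend. One way to summarize it, sketched here on made-up data, is a rank correlation between length and confidence. It is computed as the Pearson correlation of the ranks, which is equivalent to Spearman's coefficient and needs only pandas. The confidence formula is the one used in this notebook.

```python
import pandas as pd

# Made-up stand-in for df; values are illustrative only.
toy = pd.DataFrame({
    'review_length':  [5, 20, 80, 300, 1200],
    'predicted_prob': [0.55, 0.30, 0.80, 0.10, 0.98],
})

# Same definition as above: distance from 0.5, rescaled to the range 0 to 1.
toy['model_confidence'] = (toy['predicted_prob'] - 0.5).abs() * 2

# Spearman's rank correlation, computed as Pearson correlation of the ranks.
rho = toy['review_length'].rank().corr(toy['model_confidence'].rank())

print(f"Rank correlation between length and confidence: {rho:.2f}")
```

On the real data, the same two lines applied to `df['review_length']` and `df['model_confidence']` would give a value between -1 and 1. Values near 0 would indicate that length says little about confidence.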

## Insight 5: The Impact of "Funny" Reviews
Are reviews voted as "Funny" harder to predict? These reviews often contain sarcasm, ASCII art, or jokes.

In [None]:
# Flag reviews that received at least one "Funny" vote
df['is_funny'] = df['votes_funny'] > 0
accuracy_funny = df.groupby('is_funny')['is_correct'].mean().reset_index()
accuracy_funny['is_funny_str'] = accuracy_funny['is_funny'].map({True: 'Rated Funny', False: 'Not Funny'})

fig = px.bar(
    accuracy_funny,
    x='is_funny_str',
    y='is_correct',
    title='Model Accuracy: Funny vs Normal Reviews',
    color='is_funny_str',
    color_discrete_map={
        'Rated Funny': asiimov_colors['orange'],
        'Not Funny': asiimov_colors['black']
    }
)

fig.update_layout(
    xaxis_title="Review Type",
    yaxis_title="Accuracy",
    yaxis_tickformat='.1%',
    showlegend=False
)

fig.show()
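
Reviews with "Funny" votes are likely a small minority, so a gap between the two bars may be noise. As a hedged sketch, a two-proportion z statistic gives a rough sense of whether the difference is larger than chance would suggest. The `toy` frame is made-up data, and the pooled z-test is an illustrative choice, not a method used elsewhere in this notebook.

```python
import numpy as np
import pandas as pd

# Made-up stand-in for df; values are illustrative only.
toy = pd.DataFrame({
    'votes_funny': [0, 0, 0, 0, 0, 0, 3, 1, 12, 2],
    'is_correct':  [True, True, True, True, True, False, True, False, False, True],
})
toy['group'] = np.where(toy['votes_funny'] > 0, 'Rated Funny', 'Not Funny')

# Correct predictions and sample size per group.
stats = toy.groupby('group')['is_correct'].agg(correct='sum', n='size')

n_not, n_funny = stats.loc['Not Funny', 'n'], stats.loc['Rated Funny', 'n']
p_not = stats.loc['Not Funny', 'correct'] / n_not
p_funny = stats.loc['Rated Funny', 'correct'] / n_funny

# Pooled two-proportion z statistic for the difference in accuracy.
p_pool = stats['correct'].sum() / stats['n'].sum()
z = (p_not - p_funny) / np.sqrt(p_pool * (1 - p_pool) * (1 / n_not + 1 / n_funny))

print(stats)
print(f"accuracy gap = {p_not - p_funny:.3f}, z = {z:.2f}")
```

As a rule of thumb, an absolute z value well below 2 means the gap could plausibly be due to sampling variation alone.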