# Batch Inference: Sentiment Predictions for Power BI Dashboard

**Purpose**: Apply the trained XLM-RoBERTa model to the ENTIRE dataset (all aspect-segment pairs) to generate sentiment predictions for Power BI visualization and Kano Model analysis.

**Input**: 
- Trained Model: `Modelling/models/xlm_roberta_absa_best.pt`
- Full Dataset: `Dataset/aspect_categorization_refined.pkl` (with XLM-RoBERTa aspect categorization)

**Output**:
- `Dataset/segment_level_predictions.csv` - Segment-level predictions with confidence scores
- `Dataset/restaurant_aspect_aggregates.csv` - Aggregated by restaurant + aspect (for Power BI)
- `Dataset/kano_model_input.csv` - Formatted for Kano Model categorization

---

## Academic Justification

This inference pipeline transforms weak supervision training outputs into actionable business intelligence:

1. **Full Dataset Coverage**: Unlike train/test splits, we predict on ALL segments to maximize coverage for stakeholder insights
2. **Confidence Scoring**: Softmax probabilities allow uncertain predictions to be flagged for review (threshold: p ≥ 0.7, set in `InferenceConfig`)
3. **Aspect-Level Aggregation**: Following Pontiki et al. (2016), we aggregate segment sentiments to aspect-level for restaurant profiling
4. **Kano Model Integration**: Sentiment distributions per aspect feed into Kano categorization (Must-Have, Performance, Attractive, Indifferent)
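
To make the confidence-scoring step concrete, here is a minimal, dependency-free sketch of how two logits become a label, a confidence score, and a high-confidence flag. The helper name `score_prediction` and the example logits are illustrative assumptions; the notebook itself does this with `torch.softmax` on whole batches.

```python
import math

ID2LABEL = {0: "negative", 1: "positive"}


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def score_prediction(logits, threshold=0.7):
    """Hypothetical helper: label, confidence, and flag for one pair."""
    probs = softmax(logits)
    pred_id = max(range(len(probs)), key=probs.__getitem__)
    confidence = probs[pred_id]  # probability of the predicted class
    return {
        "label": ID2LABEL[pred_id],
        "confidence": confidence,
        "is_high_confidence": confidence >= threshold,
    }


# Made-up logits: one clear-cut case and one near the decision boundary
confident = score_prediction([-1.2, 2.3])
uncertain = score_prediction([0.1, 0.3])
```

Both examples are predicted positive, but only the first clears the threshold; the second would be flagged for review.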

# STAGE 0: Environment Setup

In [None]:
# Connect to google drive
from google.colab import drive
import os

# 1. Mount Google Drive (to read the trained checkpoint and dataset, and to write outputs)
drive.mount('/content/drive')

# 2. Install Libraries 
!pip install transformers accelerate tokenizers -q

In [1]:
# ==============================================================================
# Import Required Libraries
# ==============================================================================

import os
import sys
import json
import warnings
warnings.filterwarnings('ignore')

# Data processing
import pandas as pd
import numpy as np
from pathlib import Path

# PyTorch & Transformers
import torch
from torch.utils.data import Dataset, DataLoader
from transformers import AutoTokenizer, AutoModelForSequenceClassification
from transformers import DataCollatorWithPadding

# Progress tracking
from tqdm.auto import tqdm

# Visualization (for quick sanity checks)
import matplotlib.pyplot as plt
import seaborn as sns

print("=" * 70)
print("ENVIRONMENT CHECK")
print("=" * 70)
print(f"  Python:      {sys.version.split()[0]}")
print(f"  PyTorch:     {torch.__version__}")
print(f"  CUDA Avail:  {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"  GPU Device:  {torch.cuda.get_device_name(0)}")
print("=" * 70)

# Set device
DEVICE = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"\n✓ Using device: {DEVICE}")

ENVIRONMENT CHECK
  Python:      3.10.11
  PyTorch:     2.8.0+cpu
  CUDA Avail:  False

✓ Using device: cpu


# STAGE 1: Configuration & Paths

In [None]:
# ==============================================================================
# Configuration for Batch Inference
# ==============================================================================

from dataclasses import dataclass

@dataclass
class InferenceConfig:
    """Configuration for applying trained model to full dataset.
    
    Why separate from training config:
        Inference has different requirements - no train/val split, larger
        batch sizes (no backprop = more GPU memory), and different output paths.
    """
    
    # --- Model & Tokenizer ------------------------------------------------
    model_name: str = "xlm-roberta-base"
    #model_path: str = r"C:\Users\Ong Hui Ling\Dropbox\PC\Documents\Github\Aspect-Based-Sentiment-Analysis\Modelling\models\xlm_roberta_absa_best.pt"
    
    model_path: str = "/content/drive/MyDrive/Aspect-Based-Sentiment-Analysis/Modelling/models/xlm_roberta_absa_best_after_filtering.pt"
    num_labels: int = 2  # 0=negative, 1=positive
    
    # --- Input Data -------------------------------------------------------
    # Use the FULL dataset with XLM-RoBERTa aspect categorization applied
    # (Same as training data to ensure consistency)
    #data_path: str = r"C:\Users\Ong Hui Ling\Dropbox\PC\Documents\Github\Aspect-Based-Sentiment-Analysis\Dataset\aspect_categorization_refined.pkl"
    data_path: str = "/content/drive/MyDrive/Aspect-Based-Sentiment-Analysis/Dataset/aspect_categorization_refined.pkl"
    
    # --- Inference Parameters ---------------------------------------------
    batch_size: int = 64  # Larger than training (no gradients = more memory)
    max_seq_length: int = 128
    
    # Confidence threshold: flag predictions with p < threshold for review
    confidence_threshold: float = 0.7
    # Academic Rationale (Hendrycks & Gimpel, 2017 - "A Baseline for Detecting 
    # Misclassified and Out-of-Distribution Examples in Neural Networks"):
    #   - Standard practice: 0.5 (decision boundary) to 0.9 (high precision)
    #   - Weak supervision context: Higher threshold (0.7-0.8) recommended
    #   - Trade-off: Lower threshold → more coverage, higher noise
    #                Higher threshold → less coverage, higher precision
    # 
    # Empirical Guideline (see confidence analysis below):
    #   0.5-0.6: Accept all predictions (high recall, lower precision)
    #   0.7-0.8: Balanced - flag ~20-30% for review (recommended for BI)
    #   0.9+:    High precision - flag ~50%+ for review (too conservative)
    
    # --- Output Files (for Power BI consumption) --------------------------
    #output_dir: str = r"C:\Users\Ong Hui Ling\Dropbox\PC\Documents\Github\Aspect-Based-Sentiment-Analysis\Dataset"
    output_dir: str = "/content/drive/MyDrive/Aspect-Based-Sentiment-Analysis/Dataset"
    
    # Segment-level predictions (one row per aspect-segment pair)
    segment_predictions_path: str = os.path.join(output_dir, "segment_level_predictions.csv")
    
    # Restaurant-Aspect aggregates (grouped by restaurant + aspect)
    restaurant_aggregates_path: str = os.path.join(output_dir, "restaurant_aspect_aggregates.csv")
    
    # Kano Model input (sentiment distribution per aspect category)
    kano_input_path: str = os.path.join(output_dir, "kano_model_input.csv")
    
    # Summary statistics (for quick validation)
    summary_path: str = os.path.join(output_dir, "prediction_summary.json")


CFG = InferenceConfig()

# Validate paths
print("\n" + "=" * 70)
print("CONFIGURATION")
print("=" * 70)
print(f"  Model Path:      {CFG.model_path}")
print(f"    Exists:        {os.path.exists(CFG.model_path)}")
print(f"\n  Data Path:       {CFG.data_path}")
print(f"    Exists:        {os.path.exists(CFG.data_path)}")
print(f"\n  Output Directory: {CFG.output_dir}")
print(f"    Exists:        {os.path.exists(CFG.output_dir)}")
print(f"\n  Batch Size:      {CFG.batch_size}")
print(f"  Confidence Threshold: {CFG.confidence_threshold}")
print("=" * 70)

# Label encoding (must match training)
LABEL2ID = {"negative": 0, "positive": 1}
ID2LABEL = {0: "negative", 1: "positive"}


CONFIGURATION
  Model Path:      C:\Users\Ong Hui Ling\Dropbox\PC\Documents\Github\Aspect-Based-Sentiment-Analysis\Modelling\models\xlm_roberta_absa_best.pt
    Exists:        True

  Data Path:       C:\Users\Ong Hui Ling\Dropbox\PC\Documents\Github\Aspect-Based-Sentiment-Analysis\Dataset\aspect_categorization_refined.pkl
    Exists:        True

  Output Directory: C:\Users\Ong Hui Ling\Dropbox\PC\Documents\Github\Aspect-Based-Sentiment-Analysis\Dataset
    Exists:        True

  Batch Size:      64
  Confidence Threshold: 0.7
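
One caveat with deriving paths inside a dataclass body: an expression such as `os.path.join(output_dir, ...)` is evaluated once, when the class is defined, so overriding `output_dir` at instantiation does not update the derived paths. The sketch below shows one possible alternative using `__post_init__`. The class name `OutputPaths` is hypothetical, and `posixpath` is used only so the example produces the same result on any operating system.

```python
import posixpath
from dataclasses import dataclass, field


@dataclass
class OutputPaths:
    """Hypothetical helper: derive output files from a single directory."""

    output_dir: str
    segment_predictions_path: str = field(init=False)
    restaurant_aggregates_path: str = field(init=False)
    kano_input_path: str = field(init=False)

    def __post_init__(self):
        # Runs after __init__, so it sees the output_dir passed by the caller
        join = posixpath.join
        self.segment_predictions_path = join(self.output_dir, "segment_level_predictions.csv")
        self.restaurant_aggregates_path = join(self.output_dir, "restaurant_aspect_aggregates.csv")
        self.kano_input_path = join(self.output_dir, "kano_model_input.csv")


paths = OutputPaths("/content/drive/MyDrive/Aspect-Based-Sentiment-Analysis/Dataset")
other = OutputPaths("/tmp/scratch")
```

With this layout, `other.kano_input_path` follows `/tmp/scratch` instead of silently pointing at the default directory.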


## 📊 Confidence Threshold Selection

**Academic Context:**
- **Hendrycks & Gimpel (2017)** - "A Baseline for Detecting Misclassified and Out-of-Distribution Examples in Neural Networks" showed that the maximum softmax probability tends to be higher for correctly classified examples than for misclassified or out-of-distribution ones
- **Guo et al. (2017)** - "On Calibration of Modern Neural Networks" showed that modern deep networks are often overconfident, so a threshold on raw softmax scores is a heuristic that needs empirical validation
- **Ratner et al. (2016)** - "Data Programming" (weak supervision framework) treats programmatically generated labels as noisy, which motivates a more conservative threshold (≥0.7) here

**Practical Guidelines:**
- **Business Intelligence Context**: You want high-confidence predictions for strategic decisions
- **Weak Supervision**: Your training labels (star ratings) are noisy → conservative threshold needed
- **Power BI Use Case**: Flagging low-confidence predictions allows stakeholders to focus on reliable insights

**Recommended Range**: 0.7 - 0.8 (we'll validate empirically after inference)

# STAGE 2: Load Trained Model

In [4]:
# ==============================================================================
# Load Pre-Trained Model & Tokenizer
# ==============================================================================

print("\n" + "=" * 70)
print("LOADING MODEL & TOKENIZER")
print("=" * 70)

# Load tokenizer (same as training)
tokenizer = AutoTokenizer.from_pretrained(CFG.model_name)
print(f"  ✓ Tokenizer loaded: {CFG.model_name}")

# Load model architecture (must match training setup)
model = AutoModelForSequenceClassification.from_pretrained(
    CFG.model_name,
    num_labels=CFG.num_labels,
)

# Load trained weights from checkpoint
# map_location ensures compatibility if trained on GPU but inferring on CPU
checkpoint = torch.load(CFG.model_path, map_location=DEVICE)

# Handle custom wrapper: training used ABSASentimentClassifier with a 'backbone.' prefix
# Keep only the backbone weights and strip the leading prefix from each key
PREFIX = 'backbone.'
if any(key.startswith(PREFIX) for key in checkpoint.keys()):
    print("  ⚠️  Detected custom training wrapper. Extracting backbone weights...")
    state_dict = {
        key[len(PREFIX):]: value  # slice so only the leading prefix is removed
        for key, value in checkpoint.items()
        if key.startswith(PREFIX)
    }
    model.load_state_dict(state_dict)
    print(f"  ✓ Successfully loaded {len(state_dict)} backbone parameters")
else:
    # Direct loading (if checkpoint structure matches)
    model.load_state_dict(checkpoint)

model = model.to(DEVICE)
model.eval()  # Set to evaluation mode (disables dropout)

print(f"  ✓ Model loaded from: {CFG.model_path}")
print(f"  ✓ Model moved to: {DEVICE}")
print(f"  ✓ Evaluation mode: ON (dropout disabled)")

# Count parameters
total_params = sum(p.numel() for p in model.parameters())
print(f"  ✓ Total parameters: {total_params:,}")
print("=" * 70)


LOADING MODEL & TOKENIZER
  ✓ Tokenizer loaded: xlm-roberta-base


Some weights of XLMRobertaForSequenceClassification were not initialized from the model checkpoint at xlm-roberta-base and are newly initialized: ['classifier.dense.bias', 'classifier.dense.weight', 'classifier.out_proj.bias', 'classifier.out_proj.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


  ⚠️  Detected custom training wrapper. Extracting backbone weights...
  ✓ Successfully loaded 201 backbone parameters
  ✓ Model loaded from: C:\Users\Ong Hui Ling\Dropbox\PC\Documents\Github\Aspect-Based-Sentiment-Analysis\Modelling\models\xlm_roberta_absa_best.pt
  ✓ Model moved to: cpu
  ✓ Evaluation mode: ON (dropout disabled)
  ✓ Total parameters: 278,045,186
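
The prefix handling in this stage can be illustrated without PyTorch. The sketch below uses a plain dictionary with made-up key names and placeholder values (real checkpoints map names to tensors); `strip_prefix` is a hypothetical helper, not part of the training code.

```python
def strip_prefix(state_dict, prefix="backbone."):
    """Keep only keys that start with `prefix` and drop that prefix."""
    return {
        key[len(prefix):]: value
        for key, value in state_dict.items()
        if key.startswith(prefix)
    }


# Illustrative keys only; the values stand in for tensors
wrapped = {
    "backbone.roberta.embeddings.word_embeddings.weight": "tensor-A",
    "backbone.classifier.out_proj.bias": "tensor-B",
    "loss_fn.weight": "tensor-C",  # wrapper-only entry, not part of the backbone
}

unwrapped = strip_prefix(wrapped)
```

Slicing off the leading prefix (rather than calling `str.replace`) avoids touching a key that happens to contain `backbone.` somewhere in the middle. Because `load_state_dict` is strict by default, any key mismatch after stripping would raise an error rather than pass silently.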


# STAGE 3: Load Full Dataset (No Train/Val/Test Split)

In [5]:
# ==============================================================================
# Load FULL Dataset for Inference
# ==============================================================================

print("\n" + "=" * 70)
print("LOADING FULL DATASET")
print("=" * 70)

df_full = pd.read_pickle(CFG.data_path)
print(f"  ✓ Dataset loaded: {len(df_full):,} rows")

# Data quality checks
print(f"\n  Data Quality Checks:")
print(f"    Missing segments:   {df_full['Segment'].isna().sum()}")
print(f"    Missing aspects:    {df_full['Aspect_Labels'].isna().sum()}")
print(f"    Empty segments:     {(df_full['Segment'].str.strip() == '').sum()}")

# Show column overview
print(f"\n  Available Columns:")
for col in df_full.columns:
    print(f"    - {col}")

# Show aspect distribution
print(f"\n  Aspect Label Distribution:")
# Count single vs multi-aspect segments
df_full['num_aspects'] = df_full['Aspect_Labels'].apply(lambda x: len(x) if isinstance(x, list) else 0)
single_aspect = (df_full['num_aspects'] == 1).sum()
multi_aspect = (df_full['num_aspects'] > 1).sum()
print(f"    Single-aspect segments:  {single_aspect:,} ({single_aspect/len(df_full)*100:.1f}%)")
print(f"    Multi-aspect segments:   {multi_aspect:,} ({multi_aspect/len(df_full)*100:.1f}%)")

# We'll process ALL segments (including multi-aspect)
print(f"\n  ✓ Processing ALL segments (single + multi-aspect)")
print(f"    Total segments to predict: {len(df_full):,}")
print("=" * 70)


LOADING FULL DATASET
  ‚úì Dataset loaded: 129,034 rows

  Data Quality Checks:
    Missing segments:   0
    Missing aspects:    0
    Empty segments:     0

  Available Columns:
    - Original_Review_ID
    - Full_Review
    - Segment
    - Sentiment_Label
    - Aspect_Labels
    - Aspect_Labels_dict

  Aspect Label Distribution:
    Single-aspect segments:  99,900 (77.4%)
    Multi-aspect segments:   29,134 (22.6%)

  ✓ Processing ALL segments (single + multi-aspect)
    Total segments to predict: 129,034


# STAGE 4: Prepare Data for Inference

In [6]:
# ==============================================================================
# Explode Multi-Aspect Segments for Aspect-Conditional Prediction
# ==============================================================================

print("\n" + "=" * 70)
print("PREPARING DATA FOR ASPECT-CONDITIONAL INFERENCE")
print("=" * 70)

# Each segment can have multiple aspects. We need to predict sentiment for
# EACH (segment, aspect) pair separately because the same segment can have
# different sentiments for different aspects.
#
# Example:
#   Segment: "The food was amazing but service was slow"
#   Aspects: [FOOD, SERVICE]
#   → We need 2 predictions:
#       (segment, FOOD)    → likely POSITIVE
#       (segment, SERVICE) → likely NEGATIVE

# Explode: Create one row per (segment, aspect) pair
df_exploded = df_full.explode('Aspect_Labels').reset_index(drop=True)
df_exploded.rename(columns={'Aspect_Labels': 'aspect'}, inplace=True)

print(f"  Original rows:           {len(df_full):,}")
print(f"  After exploding:         {len(df_exploded):,} (segment, aspect) pairs")
print(f"  Increase factor:         {len(df_exploded)/len(df_full):.2f}x")

# Show aspect distribution after exploding
print(f"\n  Aspect Distribution (after exploding):")
aspect_counts = df_exploded['aspect'].value_counts()
for aspect, count in aspect_counts.items():
    print(f"    {aspect:<30}: {count:>6,} pairs ({count/len(df_exploded)*100:>5.1f}%)")

print("=" * 70)


PREPARING DATA FOR ASPECT-CONDITIONAL INFERENCE
  Original rows:           129,034
  After exploding:         163,667 (segment, aspect) pairs
  Increase factor:         1.27x

  Aspect Distribution (after exploding):
    FOOD                          : 72,064 pairs ( 44.0%)
    SERVICE                       : 24,352 pairs ( 14.9%)
    AMBIENCE                      : 20,177 pairs ( 12.3%)
    LOYALTY (RETURN INTENT)       : 18,176 pairs ( 11.1%)
    VALUE                         : 14,212 pairs (  8.7%)
    LOCATION                      :  6,408 pairs (  3.9%)
    AUTHENTICITY & LOCAL VIBE     :  5,061 pairs (  3.1%)
    NON-HALAL ELEMENTS            :  2,422 pairs (  1.5%)
    HALAL COMPLIANCE              :    792 pairs (  0.5%)
    GENERAL                       :      3 pairs (  0.0%)
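
The explode step is easy to picture on a toy example. The following sketch reproduces the same one-row-per-(segment, aspect) expansion in plain Python; the two rows are made up for illustration, and the notebook itself relies on `DataFrame.explode`.

```python
# Toy rows mirroring the 'Segment' and 'Aspect_Labels' columns
rows = [
    {"Segment": "the food was amazing but service was slow",
     "Aspect_Labels": ["FOOD", "SERVICE"]},
    {"Segment": "a must-visit for true malaysian comfort food",
     "Aspect_Labels": ["AMBIENCE"]},
]

# One output row per (segment, aspect) pair, in the original order
pairs = [
    {"Segment": row["Segment"], "aspect": aspect}
    for row in rows
    for aspect in row["Aspect_Labels"]
]

increase_factor = len(pairs) / len(rows)
```

One difference worth keeping in mind: this comprehension silently drops rows with an empty aspect list, whereas pandas `explode` keeps such rows with a missing value, which would later break `aspect.upper()`. The data quality checks in Stage 3 suggest that no such rows exist in this dataset.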


# STAGE 5: Create PyTorch Dataset & DataLoader

In [7]:
# ==============================================================================
# PyTorch Dataset for Inference (No Labels Needed)
# ==============================================================================

class InferenceDataset(Dataset):
    """Aspect-conditioned dataset for inference (no labels).
    
    Same input format as training: "[ASPECT] </s></s> [segment text]"
    But we don't need labels since we're only predicting, not training.
    """
    
    def __init__(self, df: pd.DataFrame, tokenizer, max_length: int = 128):
        self.texts = df['Segment'].tolist()
        self.aspects = df['aspect'].tolist()
        self.tokenizer = tokenizer
        self.max_length = max_length
        
    def __len__(self):
        return len(self.texts)
    
    def __getitem__(self, idx):
        aspect = self.aspects[idx]
        segment = self.texts[idx]
        
        # Aspect-conditioned input (same as training)
        conditioned_text = f"{aspect.upper()} </s></s> {segment}"
        
        # Tokenize
        encoding = self.tokenizer(
            conditioned_text,
            max_length=self.max_length,
            truncation=True,
            padding=False,  # Dynamic padding in collator
            return_tensors=None,
        )
        
        return {
            'input_ids': encoding['input_ids'],
            'attention_mask': encoding['attention_mask'],
        }


# Create dataset and dataloader
print("\n" + "=" * 70)
print("BUILDING DATALOADER")
print("=" * 70)

inference_dataset = InferenceDataset(df_exploded, tokenizer, CFG.max_seq_length)
data_collator = DataCollatorWithPadding(tokenizer=tokenizer)

inference_loader = DataLoader(
    inference_dataset,
    batch_size=CFG.batch_size,
    shuffle=False,  # IMPORTANT: Keep order to match predictions back to df_exploded
    collate_fn=data_collator,
    num_workers=0,  # Set to 0 for Windows compatibility
    pin_memory=torch.cuda.is_available(),
)

n_batches = len(inference_loader)
print(f"  ✓ Dataset size:      {len(inference_dataset):,} (segment, aspect) pairs")
print(f"  ✓ Batch size:        {CFG.batch_size}")
print(f"  ✓ Number of batches: {n_batches:,}")
print(f"  ✓ Shuffle:           OFF (preserves row order)")

# Quick sanity check: decode first sample
sample_batch = next(iter(inference_loader))
sample_text = tokenizer.decode(sample_batch['input_ids'][0], skip_special_tokens=False)
print(f"\n  Sample Input (decoded):")
print(f"    \"{sample_text}\"")
print("=" * 70)


BUILDING DATALOADER
  ✓ Dataset size:      163,667 (segment, aspect) pairs
  ✓ Batch size:        64
  ✓ Number of batches: 2,558
  ✓ Shuffle:           OFF (preserves row order)

  Sample Input (decoded):
    "<s> AMBIENCE</s></s> a must-visit for true malaysian comfort food</s><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad>"

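The `<pad>` tokens in the decoded sample come from dynamic padding: each batch is padded only to the length of its longest sequence. A rough sketch of that idea in plain Python follows. The function `pad_batch` and the token ids are illustrative assumptions, and `pad_id=1` is assumed to match the XLM-RoBERTa pad token; in practice the value should be read from `tokenizer.pad_token_id`.

```python
def pad_batch(sequences, pad_id=1):
    """Pad token-id lists to the longest sequence in this batch."""
    max_len = max(len(seq) for seq in sequences)
    input_ids = [seq + [pad_id] * (max_len - len(seq)) for seq in sequences]
    # 1 marks real tokens, 0 marks padding the model should ignore
    attention_mask = [
        [1] * len(seq) + [0] * (max_len - len(seq)) for seq in sequences
    ]
    return input_ids, attention_mask


# Made-up token ids for two sequences of different lengths
batch = [[0, 101, 102, 2], [0, 201, 202, 203, 204, 2]]
input_ids, attention_mask = pad_batch(batch)
```

Padding per batch rather than to `max_seq_length` keeps short batches small, which matters most when inference runs on CPU.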

# STAGE 6: Batch Inference (Generate Predictions)

In [8]:
# ==============================================================================
# Run Inference on ALL (Segment, Aspect) Pairs
# ==============================================================================

print("\n" + "=" * 70)
print("RUNNING BATCH INFERENCE")
print("=" * 70)

all_predictions = []
all_probabilities = []

model.eval()

with torch.no_grad():
    for batch_idx, batch in enumerate(tqdm(inference_loader, desc="  Predicting")):
        # Move batch to device
        input_ids = batch['input_ids'].to(DEVICE)
        attention_mask = batch['attention_mask'].to(DEVICE)
        
        # Forward pass
        outputs = model(input_ids=input_ids, attention_mask=attention_mask)
        logits = outputs.logits  # (batch_size, num_classes)
        
        # Convert logits to probabilities using softmax
        probs = torch.softmax(logits, dim=-1)  # (batch_size, num_classes)
        
        # Get predicted class (argmax)
        preds = torch.argmax(probs, dim=-1)  # (batch_size,)
        
        # Store results
        all_predictions.extend(preds.cpu().numpy().tolist())
        all_probabilities.extend(probs.cpu().numpy().tolist())

print(f"\n  ✓ Inference complete!")
print(f"    Total predictions:  {len(all_predictions):,}")
print(f"    Shape matches data: {len(all_predictions) == len(df_exploded)}")
print("=" * 70)


RUNNING BATCH INFERENCE


  Predicting:   0%|          | 0/2558 [00:00<?, ?it/s]

KeyboardInterrupt: 

# STAGE 7: Add Predictions to DataFrame

In [None]:
# ==============================================================================
# Add Predictions & Confidence Scores to DataFrame
# ==============================================================================

print("\n" + "=" * 70)
print("ADDING PREDICTIONS TO DATAFRAME")
print("=" * 70)

# Add raw predictions (0 or 1)
df_exploded['predicted_sentiment_id'] = all_predictions

# Add sentiment labels (negative/positive)
df_exploded['predicted_sentiment'] = df_exploded['predicted_sentiment_id'].map(ID2LABEL)

# Add probabilities for both classes
# all_probabilities is a list of [prob_negative, prob_positive] for each sample
probs_array = np.array(all_probabilities)
df_exploded['prob_negative'] = probs_array[:, 0]
df_exploded['prob_positive'] = probs_array[:, 1]

# Add confidence score (probability of the predicted class)
# The predicted class is the argmax, so its probability is the row-wise maximum
df_exploded['confidence'] = probs_array.max(axis=1)

# Flag low-confidence predictions for review
df_exploded['is_high_confidence'] = df_exploded['confidence'] >= CFG.confidence_threshold

# Show prediction statistics
print(f"\n  Prediction Distribution:")
pred_counts = df_exploded['predicted_sentiment'].value_counts()
for sentiment, count in pred_counts.items():
    pct = count / len(df_exploded) * 100
    print(f"    {sentiment.capitalize():<10}: {count:>7,} ({pct:>5.1f}%)")

print(f"\n  Confidence Statistics:")
print(f"    Mean confidence:       {df_exploded['confidence'].mean():.3f}")
print(f"    Median confidence:     {df_exploded['confidence'].median():.3f}")
print(f"    High confidence (>={CFG.confidence_threshold}): {df_exploded['is_high_confidence'].sum():,} ({df_exploded['is_high_confidence'].mean()*100:.1f}%)")
print(f"    Low confidence (<{CFG.confidence_threshold}): {(~df_exploded['is_high_confidence']).sum():,} ({(~df_exploded['is_high_confidence']).mean()*100:.1f}%)")

print(f"\n  Per-Aspect Prediction Distribution:")
aspect_sentiment = df_exploded.groupby(['aspect', 'predicted_sentiment']).size().unstack(fill_value=0)
print(aspect_sentiment)

print("=" * 70)

## STAGE 7b: Empirical Confidence Threshold Analysis

In [None]:
# ==============================================================================
# Empirical Analysis: Confidence Threshold Trade-offs
# ==============================================================================

print("\n" + "=" * 70)
print("CONFIDENCE THRESHOLD ANALYSIS")
print("=" * 70)
print("\n📚 Academic References:")
print("  [1] Hendrycks & Gimpel (2017) - 'A Baseline for Detecting Misclassified")
print("      and Out-of-Distribution Examples in Neural Networks'")
print("      → Established softmax confidence as predictor of correctness")
print("")
print("  [2] Guo et al. (2017) - 'On Calibration of Modern Neural Networks'")
print("      → Showed modern networks are often overconfident, so thresholds need empirical validation")
print("")
print("  [3] Ratner et al. (2016) - 'Data Programming: Creating Large Training")
print("      Sets, Quickly' → Weak-supervision labels are noisy, motivating conservative thresholds")
print("=" * 70)

# Define threshold candidates
thresholds = [0.5, 0.6, 0.7, 0.8, 0.9]

print(f"\n{'Threshold':<12} {'High Conf %':<15} {'Flagged %':<15} {'Mean Conf':<15} {'Interpretation'}")
print("-" * 90)

threshold_analysis = []

for thresh in thresholds:
    high_conf_mask = df_exploded['confidence'] >= thresh
    pct_high = (high_conf_mask.sum() / len(df_exploded)) * 100
    pct_flagged = 100 - pct_high
    mean_conf = df_exploded[high_conf_mask]['confidence'].mean() if high_conf_mask.sum() > 0 else 0
    
    # Interpretation based on literature
    if thresh <= 0.6:
        interpret = "Liberal (High Recall)"
    elif thresh <= 0.75:
        interpret = "Balanced (Recommended)"
    elif thresh <= 0.85:
        interpret = "Conservative"
    else:
        interpret = "Very Conservative"
    
    threshold_analysis.append({
        'threshold': thresh,
        'pct_high_conf': round(pct_high, 2),
        'pct_flagged': round(pct_flagged, 2),
        'mean_conf': round(mean_conf, 4),
        'interpretation': interpret
    })
    
    print(f"{thresh:<12.1f} {pct_high:<15.1f} {pct_flagged:<15.1f} {mean_conf:<15.4f} {interpret}")

print("=" * 70)

# Confidence distribution percentiles
print(f"\n  Confidence Score Percentiles:")
percentiles = [10, 25, 50, 75, 90, 95, 99]
for p in percentiles:
    val = np.percentile(df_exploded['confidence'], p)
    print(f"    {p:>2}th percentile: {val:.4f}")

print("\n" + "=" * 70)
print("RECOMMENDATION (Based on Literature & Weak Supervision Context):")
print("=" * 70)
print(f"  Current Threshold: {CFG.confidence_threshold}")
print(f"  High Confidence:   {(df_exploded['confidence'] >= CFG.confidence_threshold).sum():,} predictions ({(df_exploded['confidence'] >= CFG.confidence_threshold).mean()*100:.1f}%)")
print(f"  Flagged for Review: {(df_exploded['confidence'] < CFG.confidence_threshold).sum():,} predictions ({(df_exploded['confidence'] < CFG.confidence_threshold).mean()*100:.1f}%)")
print(f"\n  ✓ For Power BI Dashboard: 0.7-0.8 provides good balance")
print(f"  ✓ For High-Stakes Decisions: Use ≥0.8 and manually review flagged cases")
print(f"  ✓ For Maximum Coverage: Use ≥0.6 but note increased noise risk")
print("=" * 70)

# Create confidence distribution visualization
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Left plot: Confidence histogram with threshold line
ax1 = axes[0]
ax1.hist(df_exploded['confidence'], bins=50, color='#3498db', edgecolor='black', alpha=0.7)
ax1.axvline(CFG.confidence_threshold, color='red', linestyle='--', linewidth=2, 
            label=f'Current Threshold ({CFG.confidence_threshold})')
ax1.axvline(0.8, color='orange', linestyle=':', linewidth=2, label='Conservative (0.8)')
ax1.set_title('Confidence Score Distribution', fontsize=12, fontweight='bold')
ax1.set_xlabel('Confidence Score')
ax1.set_ylabel('Frequency')
ax1.legend()
ax1.grid(alpha=0.3)

# Right plot: Coverage vs Threshold trade-off
ax2 = axes[1]
thresh_range = np.linspace(0.5, 0.95, 50)
coverage = [(df_exploded['confidence'] >= t).mean() * 100 for t in thresh_range]
ax2.plot(thresh_range, coverage, linewidth=2, color='#2ecc71')
ax2.axvline(CFG.confidence_threshold, color='red', linestyle='--', linewidth=2, 
            label=f'Current ({CFG.confidence_threshold})')
ax2.axhline(80, color='gray', linestyle=':', alpha=0.5, label='80% Coverage')
ax2.set_title('Coverage vs Confidence Threshold', fontsize=12, fontweight='bold')
ax2.set_xlabel('Confidence Threshold')
ax2.set_ylabel('Coverage (%)')
ax2.legend()
ax2.grid(alpha=0.3)
ax2.set_ylim([0, 105])

plt.tight_layout()
plt.savefig(os.path.join(CFG.output_dir, 'confidence_threshold_analysis.png'), 
            dpi=300, bbox_inches='tight')
plt.show()

print(f"\n✓ Analysis saved to: {os.path.join(CFG.output_dir, 'confidence_threshold_analysis.png')}")
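
The threshold sweep above boils down to a small calculation per candidate threshold. A minimal sketch on made-up confidence scores follows; `coverage_table` is a hypothetical helper and the five scores are not results from the model.

```python
def coverage_table(confidences, thresholds=(0.5, 0.6, 0.7, 0.8, 0.9)):
    """For each threshold, report coverage, flag rate, and mean kept confidence."""
    n = len(confidences)
    table = []
    for thresh in thresholds:
        kept = [c for c in confidences if c >= thresh]
        table.append({
            "threshold": thresh,
            "pct_high_conf": 100 * len(kept) / n,
            "pct_flagged": 100 * (n - len(kept)) / n,
            # Guard against an empty selection at very strict thresholds
            "mean_conf": sum(kept) / len(kept) if kept else 0.0,
        })
    return table


# Illustrative scores only
toy_confidences = [0.55, 0.62, 0.71, 0.83, 0.97]
table = coverage_table(toy_confidences)
```

On these toy scores, raising the threshold from 0.7 to 0.9 cuts coverage from three of five predictions to one of five, which is the trade-off the right-hand plot visualizes on the real data.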

# STAGE 8: Export Segment-Level Predictions

In [None]:
# ==============================================================================
# Export Segment-Level Predictions (Full Detail)
# ==============================================================================

print("\n" + "=" * 70)
print("EXPORTING SEGMENT-LEVEL PREDICTIONS")
print("=" * 70)

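# Select relevant columns for Power BI
segment_export_cols = [
    'Original_Review_ID',
    'Restaurant_Name',
    'Segment',
    'aspect',
    'predicted_sentiment',
    'predicted_sentiment_id',
    'confidence',
    'prob_negative',
    'prob_positive',
    'is_high_confidence',
]

# Add weak label (star rating) if available for comparison
if 'Sentiment_Label' in df_exploded.columns:
    segment_export_cols.append('Sentiment_Label')

# Keep only columns that exist: the Stage 3 column listing does not include
# 'Restaurant_Name', so selecting it unconditionally would raise a KeyError
missing_cols = [c for c in segment_export_cols if c not in df_exploded.columns]
if missing_cols:
    print(f"  ⚠️  Skipping missing columns: {missing_cols}")
segment_export_cols = [c for c in segment_export_cols if c in df_exploded.columns]

df_segment_export = df_exploded[segment_export_cols].copy()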

# Save to CSV
df_segment_export.to_csv(CFG.segment_predictions_path, index=False, encoding='utf-8-sig')

print(f"  ✓ Segment-level predictions saved")
print(f"    Path:     {CFG.segment_predictions_path}")
print(f"    Rows:     {len(df_segment_export):,}")
print(f"    Columns:  {len(df_segment_export.columns)}")
print(f"    File size: {os.path.getsize(CFG.segment_predictions_path) / 1024 / 1024:.2f} MB")

# Show sample
print(f"\n  Sample rows:")
print(df_segment_export.head(3).to_string(index=False))
print("=" * 70)

# STAGE 9: Aggregate by Restaurant + Aspect

In [None]:
# ==============================================================================
# Aggregate Predictions by Restaurant + Aspect (for Power BI Dashboard)
# ==============================================================================

print("\n" + "=" * 70)
print("AGGREGATING BY RESTAURANT + ASPECT")
print("=" * 70)

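# Group by restaurant and aspect
# 'Restaurant_Name' is not among the columns listed in Stage 3, so fail early
# with a clear message if restaurant metadata has not been joined in
if 'Restaurant_Name' not in df_exploded.columns:
    raise KeyError("df_exploded has no 'Restaurant_Name' column; join restaurant metadata before aggregating.")
agg_df = df_exploded.groupby(['Restaurant_Name', 'aspect']).agg(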
    total_segments=('Segment', 'count'),
    num_positive=('predicted_sentiment_id', lambda x: (x == 1).sum()),
    num_negative=('predicted_sentiment_id', lambda x: (x == 0).sum()),
    avg_confidence=('confidence', 'mean'),
    high_confidence_count=('is_high_confidence', 'sum'),
).reset_index()

# Calculate sentiment percentages
agg_df['pct_positive'] = (agg_df['num_positive'] / agg_df['total_segments'] * 100).round(2)
agg_df['pct_negative'] = (agg_df['num_negative'] / agg_df['total_segments'] * 100).round(2)
agg_df['avg_confidence'] = agg_df['avg_confidence'].round(4)

# Calculate sentiment score: range from -1 (all negative) to +1 (all positive)
# Formula: (num_positive - num_negative) / total_segments
agg_df['sentiment_score'] = (
    (agg_df['num_positive'] - agg_df['num_negative']) / agg_df['total_segments']
).round(4)

# Determine dominant sentiment for each (restaurant, aspect) pair
agg_df['dominant_sentiment'] = agg_df.apply(
    lambda row: 'positive' if row['num_positive'] > row['num_negative'] 
                else ('negative' if row['num_negative'] > row['num_positive'] else 'neutral'),
    axis=1
)

# Calculate high confidence ratio
agg_df['pct_high_confidence'] = (
    agg_df['high_confidence_count'] / agg_df['total_segments'] * 100
).round(2)

print(f"  ✓ Aggregation complete")
print(f"    Unique restaurants: {agg_df['Restaurant_Name'].nunique():,}")
print(f"    Unique aspects:     {agg_df['aspect'].nunique()}")
print(f"    Total (restaurant, aspect) pairs: {len(agg_df):,}")

print(f"\n  Sample aggregates:")
print(agg_df.head(10).to_string(index=False))

# Save aggregated data
agg_df.to_csv(CFG.restaurant_aggregates_path, index=False, encoding='utf-8-sig')
print(f"\n  ✓ Restaurant-aspect aggregates saved")
print(f"    Path: {CFG.restaurant_aggregates_path}")
print(f"    Size: {os.path.getsize(CFG.restaurant_aggregates_path) / 1024:.2f} KB")
print("=" * 70)
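
As a worked example of the sentiment score, the sketch below applies the same formula, (num_positive - num_negative) / total_segments, to a handful of made-up predictions. The restaurant names, the `aggregate` helper, and the input tuples are all illustrative assumptions; the notebook computes the same quantities with a pandas `groupby`.

```python
from collections import defaultdict


def aggregate(predictions):
    """Group (restaurant, aspect, sentiment_id) tuples and score each group."""
    counts = defaultdict(lambda: {"num_positive": 0, "num_negative": 0})
    for restaurant, aspect, sentiment_id in predictions:
        key = "num_positive" if sentiment_id == 1 else "num_negative"
        counts[(restaurant, aspect)][key] += 1

    result = {}
    for group, c in counts.items():
        total = c["num_positive"] + c["num_negative"]
        result[group] = {
            **c,
            "total_segments": total,
            # -1 means all negative, +1 means all positive
            "sentiment_score": (c["num_positive"] - c["num_negative"]) / total,
        }
    return result


# Made-up predictions: 1 = positive, 0 = negative
toy_predictions = [
    ("Restaurant A", "FOOD", 1),
    ("Restaurant A", "FOOD", 1),
    ("Restaurant A", "FOOD", 1),
    ("Restaurant A", "FOOD", 0),
    ("Restaurant A", "SERVICE", 0),
    ("Restaurant A", "SERVICE", 0),
]
scores = aggregate(toy_predictions)
```

Three positives and one negative give a FOOD score of (3 - 1) / 4 = 0.5, while two negatives give SERVICE a score of -1.0.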

# STAGE 10: Prepare Kano Model Input

In [None]:
# ==============================================================================
# Prepare Kano Model Input (Aspect-Level Sentiment Distribution)
# ==============================================================================

print("\n" + "=" * 70)
print("PREPARING KANO MODEL INPUT")
print("=" * 70)

# Kano Model requires understanding sentiment distribution per aspect GLOBALLY
# (across all restaurants) to categorize aspects into:
#   - Must-Have: Negative sentiment has high impact on satisfaction
#   - Performance: Linear relationship (more positive = better)
#   - Attractive: Positive sentiment delights, absence doesn't hurt
#   - Indifferent: Sentiment doesn't affect satisfaction

# Aggregate by aspect only (across all restaurants)
kano_df = df_exploded.groupby('aspect').agg(
    total_mentions=('Segment', 'count'),
    num_positive=('predicted_sentiment_id', lambda x: (x == 1).sum()),
    num_negative=('predicted_sentiment_id', lambda x: (x == 0).sum()),
    avg_confidence=('confidence', 'mean'),
).reset_index()

# Calculate percentages
kano_df['pct_positive'] = (kano_df['num_positive'] / kano_df['total_mentions'] * 100).round(2)
kano_df['pct_negative'] = (kano_df['num_negative'] / kano_df['total_mentions'] * 100).round(2)
kano_df['avg_confidence'] = kano_df['avg_confidence'].round(4)

# Calculate sentiment polarity (how skewed the aspect is)
# Range: -1 (all negative) to +1 (all positive)
kano_df['sentiment_polarity'] = (
    (kano_df['num_positive'] - kano_df['num_negative']) / kano_df['total_mentions']
).round(4)

# Sort by total mentions (most discussed aspects)
kano_df = kano_df.sort_values('total_mentions', ascending=False)

print(f"  ✓ Kano Model input prepared")
print(f"    Total aspects: {len(kano_df)}")

print(f"\n  Aspect Sentiment Distribution (for Kano categorization):")
print(kano_df.to_string(index=False))

# Save Kano input
kano_df.to_csv(CFG.kano_input_path, index=False, encoding='utf-8-sig')
print(f"\n  ✓ Kano Model input saved")
print(f"    Path: {CFG.kano_input_path}")
print("=" * 70)

# Interpretation guide for Kano categorization
print(f"\n  KANO MODEL CATEGORIZATION GUIDE:")
print("  " + "═" * 63)
print(f"  Use this data to categorize aspects in Power BI DAX:")
print(f"")
print(f"  1. MUST-HAVE (Basic Needs):")
print(f"     → High negative % + High total mentions")
print(f"     → Absence causes dissatisfaction, presence is expected")
print(f"     → Example: FOOD, SERVICE, HALAL COMPLIANCE")
print(f"")
print(f"  2. PERFORMANCE (Proportional Satisfaction):")
print(f"     → Balanced negative/positive %")
print(f"     → More = Better, Less = Worse")
print(f"     → Example: VALUE, AMBIENCE")
print(f"")
print(f"  3. ATTRACTIVE (Delighters):")
print(f"     → High positive % + Lower total mentions")
print(f"     → Presence delights, absence doesn't hurt")
print(f"     → Example: AUTHENTICITY & LOCAL VIBE, LOYALTY")
print(f"")
print(f"  4. INDIFFERENT:")
print(f"     → Low sentiment polarity + Low mentions")
print(f"     → Doesn't affect satisfaction")
print("  " + "═" * 63)
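
The guide above can also be applied in pandas as a quick sanity check before building the DAX column. The sketch below is one possible encoding: `categorize_kano` is a hypothetical helper, the cut-offs mirror the illustrative DAX thresholds used in the Power BI section of this notebook and are not validated values, and the toy data only mimics the columns of `kano_df`.

```python
import pandas as pd

def categorize_kano(row, high_mentions=100, low_mentions=50):
    """Rule-of-thumb Kano label from sentiment shares (illustrative thresholds)."""
    if row["pct_negative"] > 30 and row["total_mentions"] > high_mentions:
        return "Must-Have"
    if row["pct_positive"] > 70 and row["total_mentions"] < low_mentions:
        return "Attractive"
    if row["pct_positive"] > 40 and row["pct_negative"] > 20:
        return "Performance"
    return "Indifferent"

# Toy data with the same column names as kano_df (all values are made up)
toy_kano = pd.DataFrame({
    "aspect": ["SERVICE", "LOYALTY", "VALUE", "PARKING"],
    "total_mentions": [500, 30, 80, 60],
    "pct_positive": [60.0, 90.0, 75.0, 85.0],
    "pct_negative": [40.0, 10.0, 25.0, 15.0],
})

# Rules are checked in order, so the first matching rule wins
toy_kano["kano_category"] = toy_kano.apply(categorize_kano, axis=1)
print(toy_kano[["aspect", "kano_category"]].to_string(index=False))
```

Because the rules are evaluated in order, an aspect that satisfies several conditions receives the first matching label, which is the same behaviour as `SWITCH(TRUE(), ...)` in DAX.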

# STAGE 11: Generate Summary Statistics

In [None]:
# ==============================================================================
# Generate Summary Statistics (for quick validation & thesis reporting)
# ==============================================================================

print("\n" + "=" * 70)
print("GENERATING SUMMARY STATISTICS")
print("=" * 70)

summary_stats = {
    "data_overview": {
        "total_reviews": int(df_exploded['Original_Review_ID'].nunique()),
        "total_restaurants": int(df_exploded['Restaurant_Name'].nunique()),
        "total_segments": int(len(df_full)),
        "total_aspect_segment_pairs": int(len(df_exploded)),
        "unique_aspects": int(df_exploded['aspect'].nunique()),
        "aspects_list": sorted(df_exploded['aspect'].unique().tolist()),
    },
    
    "prediction_distribution": {
        "positive_predictions": int((df_exploded['predicted_sentiment'] == 'positive').sum()),
        "negative_predictions": int((df_exploded['predicted_sentiment'] == 'negative').sum()),
        "pct_positive": float(round((df_exploded['predicted_sentiment'] == 'positive').mean() * 100, 2)),
        "pct_negative": float(round((df_exploded['predicted_sentiment'] == 'negative').mean() * 100, 2)),
    },
    
    "confidence_metrics": {
        "mean_confidence": float(round(df_exploded['confidence'].mean(), 4)),
        "median_confidence": float(round(df_exploded['confidence'].median(), 4)),
        "high_confidence_count": int(df_exploded['is_high_confidence'].sum()),
        "low_confidence_count": int((~df_exploded['is_high_confidence']).sum()),
        "pct_high_confidence": float(round(df_exploded['is_high_confidence'].mean() * 100, 2)),
        "confidence_threshold": float(CFG.confidence_threshold),
    },
    
    "per_aspect_summary": {},
    
    "model_info": {
        "model_name": CFG.model_name,
        "model_path": CFG.model_path,
        "batch_size": CFG.batch_size,
        "max_seq_length": CFG.max_seq_length,
    },
    
    "output_files": {
        "segment_predictions": CFG.segment_predictions_path,
        "restaurant_aggregates": CFG.restaurant_aggregates_path,
        "kano_input": CFG.kano_input_path,
    }
}

# Add per-aspect breakdown
for aspect in sorted(df_exploded['aspect'].unique()):
    aspect_data = df_exploded[df_exploded['aspect'] == aspect]
    summary_stats["per_aspect_summary"][aspect] = {
        "total_mentions": int(len(aspect_data)),
        "num_positive": int((aspect_data['predicted_sentiment'] == 'positive').sum()),
        "num_negative": int((aspect_data['predicted_sentiment'] == 'negative').sum()),
        "pct_positive": float(round((aspect_data['predicted_sentiment'] == 'positive').mean() * 100, 2)),
        "pct_negative": float(round((aspect_data['predicted_sentiment'] == 'negative').mean() * 100, 2)),
        "avg_confidence": float(round(aspect_data['confidence'].mean(), 4)),
    }

# Save summary as JSON
with open(CFG.summary_path, 'w', encoding='utf-8') as f:
    json.dump(summary_stats, f, indent=2, ensure_ascii=False)

print(f"  ✓ Summary statistics saved")
print(f"    Path: {CFG.summary_path}")

# Print key statistics
print(f"\n  KEY STATISTICS:")
print("  " + "═" * 63)
print(f"  Total Reviews:              {summary_stats['data_overview']['total_reviews']:>8,}")
print(f"  Total Restaurants:          {summary_stats['data_overview']['total_restaurants']:>8,}")
print(f"  Total Segments:             {summary_stats['data_overview']['total_segments']:>8,}")
print(f"  Total Aspect-Segment Pairs: {summary_stats['data_overview']['total_aspect_segment_pairs']:>8,}")
print("  " + "─" * 63)
print(f"  Positive Predictions:       {summary_stats['prediction_distribution']['positive_predictions']:>8,} ({summary_stats['prediction_distribution']['pct_positive']:>5.1f}%)")
print(f"  Negative Predictions:       {summary_stats['prediction_distribution']['negative_predictions']:>8,} ({summary_stats['prediction_distribution']['pct_negative']:>5.1f}%)")
print("  " + "─" * 63)
print(f"  Mean Confidence:            {summary_stats['confidence_metrics']['mean_confidence']:>8.4f}")
print(f"  High Confidence (>{CFG.confidence_threshold}):   {summary_stats['confidence_metrics']['high_confidence_count']:>8,} ({summary_stats['confidence_metrics']['pct_high_confidence']:>5.1f}%)")
print("  " + "═" * 63)
print("=" * 70)

# STAGE 12: Quick Visualizations (for validation)

In [None]:
# ==============================================================================
# Quick Visualizations for Sanity Checks
# ==============================================================================

import matplotlib.pyplot as plt
import seaborn as sns

sns.set_style("whitegrid")
plt.rcParams['figure.figsize'] = (14, 10)

fig, axes = plt.subplots(2, 2, figsize=(14, 10))

# 1. Sentiment Distribution by Aspect
aspect_sentiment = df_exploded.groupby(['aspect', 'predicted_sentiment']).size().unstack(fill_value=0)
aspect_sentiment_pct = aspect_sentiment.div(aspect_sentiment.sum(axis=1), axis=0) * 100

ax1 = axes[0, 0]
aspect_sentiment_pct.plot(kind='barh', stacked=True, color=['#e74c3c', '#2ecc71'], ax=ax1)
ax1.set_title('Sentiment Distribution by Aspect (%)', fontsize=12, fontweight='bold')
ax1.set_xlabel('Percentage (%)')
ax1.set_ylabel('Aspect')
ax1.legend(title='Sentiment', labels=['Negative', 'Positive'])

# 2. Confidence Score Distribution
ax2 = axes[0, 1]
ax2.hist(df_exploded['confidence'], bins=50, color='#3498db', edgecolor='black', alpha=0.7)
ax2.axvline(CFG.confidence_threshold, color='red', linestyle='--', linewidth=2, label=f'Threshold ({CFG.confidence_threshold})')
ax2.set_title('Confidence Score Distribution', fontsize=12, fontweight='bold')
ax2.set_xlabel('Confidence Score')
ax2.set_ylabel('Frequency')
ax2.legend()

# 3. Aspect Mention Frequency
ax3 = axes[1, 0]
aspect_counts = df_exploded['aspect'].value_counts()
aspect_counts.plot(kind='barh', color='#9b59b6', ax=ax3)
ax3.set_title('Aspect Mention Frequency', fontsize=12, fontweight='bold')
ax3.set_xlabel('Number of Mentions')
ax3.set_ylabel('Aspect')

# 4. Sentiment Polarity by Aspect (for Kano Model)
ax4 = axes[1, 1]
kano_df_sorted = kano_df.sort_values('sentiment_polarity')
colors = ['#e74c3c' if x < 0 else '#2ecc71' for x in kano_df_sorted['sentiment_polarity']]
ax4.barh(kano_df_sorted['aspect'], kano_df_sorted['sentiment_polarity'], color=colors)
ax4.axvline(0, color='black', linewidth=1)
ax4.set_title('Sentiment Polarity by Aspect (Kano Input)', fontsize=12, fontweight='bold')
ax4.set_xlabel('Sentiment Polarity (-1 to +1)')
ax4.set_ylabel('Aspect')
ax4.set_xlim(-1, 1)

plt.tight_layout()
plt.savefig(os.path.join(CFG.output_dir, 'sentiment_analysis_overview.png'), dpi=300, bbox_inches='tight')
plt.show()

print(f"\n✓ Visualizations saved to: {os.path.join(CFG.output_dir, 'sentiment_analysis_overview.png')}")

# ✅ INFERENCE COMPLETE - Next Steps for Power BI

## Generated Files (All saved to `Dataset/` folder):

1. **`segment_level_predictions.csv`** (Detailed)
   - One row per (segment, aspect) pair
   - Includes: segment text, aspect, predicted sentiment, probabilities, confidence
   - Use for: Drill-down analysis, finding specific mentions

2. **`restaurant_aspect_aggregates.csv`** (Summary)
   - One row per (restaurant, aspect) combination
   - Includes: sentiment counts, percentages, sentiment score (-1 to +1)
   - Use for: Restaurant profiling, comparative analysis

3. **`kano_model_input.csv`** (Strategic)
   - One row per aspect (across all restaurants)
   - Includes: total mentions, sentiment distribution, polarity
   - Use for: Kano Model categorization (Must-Have vs Attractive)

4. **`prediction_summary.json`** (Metadata)
   - Overall statistics for validation and thesis reporting

---

## Power BI Integration Steps:

### 1. Load Data into Power BI
```powerquery
// Power Query (M), not DAX: load restaurant aggregates as the main table
Source = Csv.Document(File.Contents("Dataset/restaurant_aspect_aggregates.csv"))
```

### 2. Create Kano Model DAX Calculated Column
```dax
Kano_Category = 
VAR TotalMentions = [total_segments]
VAR PositivePct = [pct_positive]
VAR NegativePct = [pct_negative]
RETURN
    SWITCH(
        TRUE(),
        NegativePct > 30 && TotalMentions > 100, "Must-Have",
        PositivePct > 70 && TotalMentions < 50, "Attractive",
        PositivePct > 40 && NegativePct > 20, "Performance",
        "Indifferent"
    )
```

### 3. Key Visualizations to Create:
- **Sentiment Heatmap**: Restaurant (rows) × Aspect (columns) colored by sentiment_score
- **Kano Model Quadrant**: Scatter plot with sentiment_polarity vs. total_mentions
- **Top Negative Aspects by Restaurant**: Bar chart filtered by `pct_negative > 40`
- **Confidence Filter**: Slicer for `is_high_confidence` to show only reliable predictions

---

## ⚠️ Important Notes:

1. **Low Confidence Predictions**: 
   - Predictions with confidence < 0.6 should be flagged for manual review
   - These are visible in the `is_high_confidence` column

2. **Multi-Aspect Segments**:
   - Each segment can appear multiple times (once per aspect)
   - Use `Original_Review_ID` to track back to full reviews

3. **Data Validation**:
   - Check `prediction_summary.json` for overall statistics
   - Verify sentiment distribution matches expectations (~90% positive from weak labels)
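
The validation step in note 3 can be partly automated. The sketch below checks internal consistency of a summary dictionary shaped like `prediction_summary.json`; `validate_summary` is a hypothetical helper, the tolerance is an assumption, and the numbers in the toy summary are made up.

```python
import json
import tempfile
from pathlib import Path

def validate_summary(summary, tol=0.05):
    """Return a list of human-readable problems found in a summary dict."""
    problems = []
    total_pairs = summary["data_overview"]["total_aspect_segment_pairs"]
    dist = summary["prediction_distribution"]
    conf = summary["confidence_metrics"]

    # Binary predictions: positive + negative should cover every pair
    if dist["positive_predictions"] + dist["negative_predictions"] != total_pairs:
        problems.append("prediction counts do not sum to total aspect-segment pairs")
    # Percentages are rounded, so allow a small tolerance
    if abs(dist["pct_positive"] + dist["pct_negative"] - 100) > tol:
        problems.append("positive/negative percentages do not sum to 100")
    if conf["high_confidence_count"] + conf["low_confidence_count"] != total_pairs:
        problems.append("confidence counts do not sum to total aspect-segment pairs")
    return problems

# Toy summary (made-up numbers) written to and read back from a temporary file
toy_summary = {
    "data_overview": {"total_aspect_segment_pairs": 1000},
    "prediction_distribution": {
        "positive_predictions": 900,
        "negative_predictions": 100,
        "pct_positive": 90.0,
        "pct_negative": 10.0,
    },
    "confidence_metrics": {"high_confidence_count": 850, "low_confidence_count": 150},
}

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "prediction_summary.json"
    path.write_text(json.dumps(toy_summary), encoding="utf-8")
    loaded = json.loads(path.read_text(encoding="utf-8"))

problems = validate_summary(loaded)
print(problems or "summary looks consistent")
```

An empty list means the counts and percentages agree with each other; it does not say whether the predictions themselves are accurate.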

---

## 🎯 Ready for Dashboard Creation!
All prediction data is now available in CSV format for Power BI import.

In [None]:
# ==============================================================================
# Empirical Analysis: Confidence Threshold Trade-offs
# ==============================================================================

print("\n" + "=" * 70)
print("CONFIDENCE THRESHOLD ANALYSIS")
print("=" * 70)
print("\nAcademic References:")
print("  [1] Hendrycks & Gimpel (2017) - Baseline for Detecting Misclassified Examples")
print("  [2] Guo et al. (2017) - On Calibration of Modern Neural Networks")
print("  [3] Ratner et al. (2016) - Data Programming (Weak Supervision)")
print("=" * 70)

# Define threshold candidates
thresholds = [0.5, 0.6, 0.7, 0.8, 0.9]

print(f"\n{'Threshold':<12} {'High Conf %':<15} {'Flagged %':<15} {'Mean Conf':<15} {'Interpretation'}")
print("-" * 90)

threshold_analysis = []

for thresh in thresholds:
    high_conf_mask = df_exploded['confidence'] >= thresh
    pct_high = (high_conf_mask.sum() / len(df_exploded)) * 100
    pct_flagged = 100 - pct_high
    mean_conf = df_exploded[high_conf_mask]['confidence'].mean() if high_conf_mask.sum() > 0 else 0
    
    # Interpretation based on literature
    if thresh <= 0.6:
        interpret = "Liberal (High Recall)"
    elif thresh <= 0.75:
        interpret = "Balanced (Recommended)"
    elif thresh <= 0.85:
        interpret = "Conservative"
    else:
        interpret = "Very Conservative"
    
    threshold_analysis.append({
        'threshold': thresh,
        'pct_high_conf': round(pct_high, 2),
        'pct_flagged': round(pct_flagged, 2),
        'mean_conf': round(mean_conf, 4),
        'interpretation': interpret
    })
    
    print(f"{thresh:<12.1f} {pct_high:<15.1f} {pct_flagged:<15.1f} {mean_conf:<15.4f} {interpret}")

print("=" * 70)

# Confidence distribution percentiles
print(f"\n  Confidence Score Percentiles:")
percentiles = [10, 25, 50, 75, 90, 95, 99]
for p in percentiles:
    val = np.percentile(df_exploded['confidence'], p)
    print(f"    {p:>2}th percentile: {val:.4f}")

print("\n" + "=" * 70)
print("RECOMMENDATION (Based on Guo et al. 2017 & Weak Supervision Literature):")
print("=" * 70)
print(f"  Current Threshold: {CFG.confidence_threshold}")
print(f"  High Confidence:   {(df_exploded['confidence'] >= CFG.confidence_threshold).sum():,} predictions ({(df_exploded['confidence'] >= CFG.confidence_threshold).mean()*100:.1f}%)")
print(f"  Flagged for Review: {(df_exploded['confidence'] < CFG.confidence_threshold).sum():,} predictions ({(df_exploded['confidence'] < CFG.confidence_threshold).mean()*100:.1f}%)")
print(f"\n  ✓ For Power BI Dashboard: 0.7-0.8 provides good balance")
print(f"  ✓ For High-Stakes Decisions: Use ≥0.8 and manually review flagged cases")
print(f"  ✓ For Maximum Coverage: Use ≥0.6 but note increased noise risk")
print("=" * 70)

# Create confidence distribution visualization
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Left plot: Confidence histogram with threshold line
ax1 = axes[0]
ax1.hist(df_exploded['confidence'], bins=50, color='#3498db', edgecolor='black', alpha=0.7)
ax1.axvline(CFG.confidence_threshold, color='red', linestyle='--', linewidth=2, 
            label=f'Current Threshold ({CFG.confidence_threshold})')
ax1.axvline(0.8, color='orange', linestyle=':', linewidth=2, label='Conservative (0.8)')
ax1.set_title('Confidence Score Distribution', fontsize=12, fontweight='bold')
ax1.set_xlabel('Confidence Score')
ax1.set_ylabel('Frequency')
ax1.legend()
ax1.grid(alpha=0.3)

# Right plot: Coverage vs Threshold trade-off
ax2 = axes[1]
thresh_range = np.linspace(0.5, 0.95, 50)
coverage = [(df_exploded['confidence'] >= t).mean() * 100 for t in thresh_range]
ax2.plot(thresh_range, coverage, linewidth=2, color='#2ecc71')
ax2.axvline(CFG.confidence_threshold, color='red', linestyle='--', linewidth=2, 
            label=f'Current ({CFG.confidence_threshold})')
ax2.axhline(80, color='gray', linestyle=':', alpha=0.5, label='80% Coverage')
ax2.set_title('Coverage vs Confidence Threshold', fontsize=12, fontweight='bold')
ax2.set_xlabel('Confidence Threshold')
ax2.set_ylabel('Coverage (%)')
ax2.legend()
ax2.grid(alpha=0.3)
ax2.set_ylim([0, 105])

plt.tight_layout()
plt.savefig(os.path.join(CFG.output_dir, 'confidence_threshold_analysis.png'), 
            dpi=300, bbox_inches='tight')
plt.show()

print(f"\n✓ Analysis saved to: {os.path.join(CFG.output_dir, 'confidence_threshold_analysis.png')}")
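
The coverage curve can also be read in reverse: fix a target coverage (for example the 80% reference line) and solve for the threshold that achieves it. A minimal sketch, assuming a plain array of confidence scores; `threshold_for_coverage` is a hypothetical helper and the scores are made up.

```python
import numpy as np

def threshold_for_coverage(confidences, target_coverage):
    """Largest threshold t such that at least target_coverage of scores are >= t."""
    scores = np.sort(np.asarray(confidences, dtype=float))[::-1]  # descending
    # Number of predictions to keep; the small epsilon guards against
    # floating-point products such as 0.7 * 10 == 7.000000000000001
    k = int(np.ceil(target_coverage * len(scores) - 1e-9))
    k = min(max(k, 1), len(scores))
    return float(scores[k - 1])

toy_conf = np.array([0.55, 0.62, 0.71, 0.80, 0.88, 0.91, 0.95, 0.97, 0.99, 0.99])
t = threshold_for_coverage(toy_conf, 0.80)
coverage = float((toy_conf >= t).mean())
print(f"threshold={t:.2f}, coverage={coverage:.0%}")
```

With tied scores the achieved coverage can exceed the target, since every prediction sharing the threshold value is kept.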

# STAGE 13: Merge with Silver Standard for Metadata (State, Category)


In [None]:
# ==============================================================================
# Merge Predictions with Silver Standard Metadata (State, Category)
# ==============================================================================

print("\n" + "=" * 70)
print("MERGING WITH SILVER STANDARD FOR METADATA")
print("=" * 70)

# Load silver standard (contains state, category, restaurant name)
#silver_path = r"C:\Users\Ong Hui Ling\Dropbox\PC\Documents\Github\Aspect-Based-Sentiment-Analysis\Dataset\silver_std.pkl"
silver_path = "/content/drive/MyDrive/Aspect-Based-Sentiment-Analysis/Dataset/silver_std.pkl"

df_silver = pd.read_pickle(silver_path)
print(f"  ✓ Silver standard loaded: {len(df_silver):,} reviews")

# Extract unique review ID and restaurant metadata
# silver_std has one row per review, aspect_categorization has multiple rows
# (segments per review) 
df_silver_meta = df_silver[[
    'reviewID',           # Review ID
    'name',               # Restaurant name
    'state',              # State (for strategic analysis)
    'main_category',      # Main category (e.g., Mamak, Fine Dining)
    'sub_category',       # Sub category
    'place_overall_rating',  # Restaurant overall rating (context)
    'user_review_rating'     # Star rating (weak label source)
]].drop_duplicates(subset=['reviewID']).reset_index(drop=True)

print(f"  ✓ Metadata extracted: {len(df_silver_meta):,} unique reviews")

# Merge predictions with metadata on the review ID.
# Assumption: Original_Review_ID in the predictions corresponds to reviewID in
# silver_std. The match-rate check below flags the case where it does not.
# df_silver_meta is already deduplicated on reviewID, so the left merge
# cannot duplicate prediction rows.

print(f"\n  Data Quality Check:")
print(f"    aspect_exploded rows: {len(df_exploded):,}")
print(f"    unique reviews in exploded: {df_exploded['Original_Review_ID'].nunique():,}")

print(f"\n  Attempting merge on review ID (Original_Review_ID = reviewID)...")

df_exploded_with_meta = df_exploded.merge(
    df_silver_meta[[
        'reviewID',           # Join key (must be selected for right_on to work)
        'state',
        'main_category',
        'sub_category',
        'name',               # Restaurant name
        'user_review_rating'  # Star rating
    ]],
    left_on='Original_Review_ID',
    right_on='reviewID',
    how='left'
)

# Check merge success
merge_success = df_exploded_with_meta['state'].notna().sum()
print(f"  ✓ Merge result: {merge_success:,} / {len(df_exploded_with_meta):,} rows matched ({merge_success/len(df_exploded_with_meta)*100:.1f}%)")

if merge_success < len(df_exploded_with_meta) * 0.9:
    print(f"\n  ⚠️  WARNING: Only {merge_success/len(df_exploded_with_meta)*100:.1f}% rows matched!")
    print(f"      This suggests Original_Review_ID and reviewID don't align perfectly.")
    print(f"      Please verify the data source.")

# Drop the redundant join key (no longer needed)
df_exploded_with_meta = df_exploded_with_meta.drop(columns=['reviewID'], errors='ignore')

print(f"\n  ✓ State distribution:")
state_counts = df_exploded_with_meta['state'].value_counts()
for state, count in state_counts.head(10).items():
    if pd.notna(state):
        print(f"    {state:<20}: {count:>6,} segments")

print(f"\n  ✓ Main category distribution:")
cat_counts = df_exploded_with_meta['main_category'].value_counts()
for cat, count in cat_counts.head(10).items():
    if pd.notna(cat):
        print(f"    {cat:<20}: {count:>6,} segments")

print("=" * 70)
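
A left merge silently keeps unmatched rows and can duplicate rows when the key is not unique on the right side, so it may help to let pandas check both properties. The sketch below uses the `validate` and `indicator` parameters of `DataFrame.merge` on toy stand-ins for the prediction and metadata tables; the data values are made up.

```python
import pandas as pd

# Toy stand-ins for the exploded predictions and the deduplicated metadata
predictions = pd.DataFrame({
    "Original_Review_ID": [1, 1, 2, 3],
    "aspect": ["FOOD", "SERVICE", "FOOD", "VALUE"],
})
metadata = pd.DataFrame({
    "reviewID": [1, 2],
    "state": ["Johor", "Selangor"],
})

merged = predictions.merge(
    metadata,
    left_on="Original_Review_ID",
    right_on="reviewID",
    how="left",
    validate="many_to_one",  # raises MergeError if reviewID is duplicated
    indicator=True,          # adds a _merge column: 'both' or 'left_only'
)

unmatched = merged[merged["_merge"] == "left_only"]
match_rate = (merged["_merge"] == "both").mean()
print(f"rows: {len(merged)}, matched: {match_rate:.1%}")
print("unmatched review IDs:", unmatched["Original_Review_ID"].unique().tolist())
```

Unlike counting non-null `state` values, the `_merge` column separates reviews that were never found from reviews that were found but have a missing state.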


# STAGE 14: State-Level Aggregation (For Government Strategic Planning)


In [None]:
# ==============================================================================
# State-Level Aggregation (Critical for Government Tourism Boards)
# ==============================================================================

print("\n" + "=" * 70)
print("STATE-LEVEL SENTIMENT AGGREGATION")
print("=" * 70)
print("\nWhy State-Level?")
print("  - Government tourism strategies operate at state level")
print("  - Allows regional benchmarking (which states excel/lag)")
print("  - Supports resource allocation decisions")
print("  - Macro dataset covers ALL states → comprehensive coverage")

# Group by state and aspect
state_agg = df_exploded_with_meta.dropna(subset=['state']).groupby(['state', 'aspect']).agg(
    total_segments=('Segment', 'count'),
    num_positive=('predicted_sentiment_id', lambda x: (x == 1).sum()),
    num_negative=('predicted_sentiment_id', lambda x: (x == 0).sum()),
    avg_confidence=('confidence', 'mean'),
    high_confidence_count=('is_high_confidence', 'sum'),
    num_restaurants=('name', 'nunique'),
).reset_index()

# Calculate percentages and sentiment score
state_agg['pct_positive'] = (state_agg['num_positive'] / state_agg['total_segments'] * 100).round(2)
state_agg['pct_negative'] = (state_agg['num_negative'] / state_agg['total_segments'] * 100).round(2)
state_agg['sentiment_score'] = (
    (state_agg['num_positive'] - state_agg['num_negative']) / state_agg['total_segments']
).round(4)
state_agg['avg_confidence'] = state_agg['avg_confidence'].round(4)
state_agg['pct_high_confidence'] = (
    state_agg['high_confidence_count'] / state_agg['total_segments'] * 100
).round(2)

# Sort by popularity (most discussed aspects per state)
state_agg = state_agg.sort_values(['state', 'total_segments'], ascending=[True, False])

# Output path
state_agg_path = os.path.join(CFG.output_dir, "state_level_summary.csv")
state_agg.to_csv(state_agg_path, index=False, encoding='utf-8-sig')

print(f"\n  ✓ State-level aggregation complete")
print(f"    Rows: {len(state_agg):,} (state × aspect combinations)")
print(f"    States covered: {state_agg['state'].nunique()}")
print(f"    Aspects per state: {state_agg.groupby('state').size().mean():.1f}")
print(f"    Saved to: {state_agg_path}")

# No percentage filter is applied here: the single most negative aspect is shown per state
print(f"\n  Most Negative Aspect per State:")
print("  " + "=" * 66)
for state in sorted(state_agg['state'].unique()):
    state_data = state_agg[state_agg['state'] == state].nlargest(1, 'pct_negative')
    if len(state_data) > 0:
        worst = state_data.iloc[0]
        print(f"  {state:<20}: {worst['aspect']:<25} ({worst['pct_negative']:>5.1f}% negative)")

print("=" * 70)


# STAGE 15: Category-Level Aggregation (Restaurant Type Insights)


In [None]:
# ==============================================================================
# Category-Level Aggregation (Restaurant Type Performance)
# ==============================================================================

print("\n" + "=" * 70)
print("CATEGORY-LEVEL SENTIMENT AGGREGATION")
print("=" * 70)
print("\nWhy Category-Level?")
print("  - Compare performance across restaurant types (e.g., Mamak vs Fine Dining)")
print("  - Identify category-specific pain points")
print("  - Support category-specific improvement initiatives")
print("  - Benchmark category standards")

# Group by main_category and aspect
category_agg = df_exploded_with_meta.dropna(subset=['main_category']).groupby(['main_category', 'aspect']).agg(
    total_segments=('Segment', 'count'),
    num_positive=('predicted_sentiment_id', lambda x: (x == 1).sum()),
    num_negative=('predicted_sentiment_id', lambda x: (x == 0).sum()),
    avg_confidence=('confidence', 'mean'),
    high_confidence_count=('is_high_confidence', 'sum'),
    num_restaurants=('name', 'nunique'),
).reset_index()

# Calculate percentages and sentiment score
category_agg['pct_positive'] = (category_agg['num_positive'] / category_agg['total_segments'] * 100).round(2)
category_agg['pct_negative'] = (category_agg['num_negative'] / category_agg['total_segments'] * 100).round(2)
category_agg['sentiment_score'] = (
    (category_agg['num_positive'] - category_agg['num_negative']) / category_agg['total_segments']
).round(4)
category_agg['avg_confidence'] = category_agg['avg_confidence'].round(4)
category_agg['pct_high_confidence'] = (
    category_agg['high_confidence_count'] / category_agg['total_segments'] * 100
).round(2)

# Sort by category and segments
category_agg = category_agg.sort_values(['main_category', 'total_segments'], ascending=[True, False])

# Output path
category_agg_path = os.path.join(CFG.output_dir, "category_level_summary.csv")
category_agg.to_csv(category_agg_path, index=False, encoding='utf-8-sig')

print(f"\n  ✓ Category-level aggregation complete")
print(f"    Rows: {len(category_agg):,} (category × aspect combinations)")
print(f"    Categories: {category_agg['main_category'].nunique()}")

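print(f"\n  Categories covered:")
# Count unique restaurants per category directly. Summing the per-aspect
# num_restaurants would count the same restaurant once for every aspect.
restaurants_per_cat = df_exploded_with_meta.groupby('main_category')['name'].nunique()
for cat in sorted(category_agg['main_category'].unique()):
    cat_data = category_agg[category_agg['main_category'] == cat]
    total_seg = cat_data['total_segments'].sum()
    num_rest = restaurants_per_cat.get(cat, 0)
    print(f"    {cat:<30}: {total_seg:>6,} segments from {num_rest:>4,} restaurants")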

print(f"\n  Category Performance (avg sentiment across all aspects):")
print("  " + "=" * 60)
cat_performance = category_agg.groupby('main_category').agg(
    avg_sentiment_score=('sentiment_score', 'mean'),
    avg_pct_positive=('pct_positive', 'mean'),
    total_segments=('total_segments', 'sum'),
).sort_values('avg_sentiment_score', ascending=False)

for cat, row in cat_performance.iterrows():
    print(f"  {cat:<30}: {row['avg_sentiment_score']:>7.4f} ({row['avg_pct_positive']:>5.1f}% positive) [{int(row['total_segments']):>6,} segments]")

print(f"\n  Saved to: {category_agg_path}")
print("=" * 70)
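
The state-level and category-level stages repeat the same aggregation with different grouping keys. One way to factor that out is sketched below; `aggregate_sentiment` is a hypothetical helper, not part of the notebook, and the toy frame only mimics the relevant columns of `df_exploded_with_meta`.

```python
import pandas as pd

def aggregate_sentiment(df, keys):
    """Aggregate binary sentiment predictions (1 = positive, 0 = negative) by keys."""
    out = df.groupby(keys).agg(
        total_segments=("predicted_sentiment_id", "count"),
        num_positive=("predicted_sentiment_id", lambda x: (x == 1).sum()),
        num_negative=("predicted_sentiment_id", lambda x: (x == 0).sum()),
        avg_confidence=("confidence", "mean"),
    ).reset_index()
    out["pct_positive"] = (out["num_positive"] / out["total_segments"] * 100).round(2)
    out["pct_negative"] = (out["num_negative"] / out["total_segments"] * 100).round(2)
    # Score ranges from -1 (all negative) to +1 (all positive)
    out["sentiment_score"] = (
        (out["num_positive"] - out["num_negative"]) / out["total_segments"]
    ).round(4)
    return out

# Toy predictions (made-up values)
toy_predictions = pd.DataFrame({
    "state": ["Johor", "Johor", "Johor", "Selangor"],
    "aspect": ["FOOD", "FOOD", "FOOD", "FOOD"],
    "predicted_sentiment_id": [1, 1, 0, 0],
    "confidence": [0.9, 0.8, 0.7, 0.6],
})

state_toy = aggregate_sentiment(toy_predictions, ["state", "aspect"])
print(state_toy.to_string(index=False))
```

The same function could then be called with `["main_category", "aspect"]` or `["state", "main_category", "aspect"]`, which keeps the metric definitions identical across all summary files.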


# STAGE 16: State × Category Matrix (Strategic Heatmap Input)


In [None]:
# ==============================================================================
# State × Category Matrix (For Government Dashboard)
# ==============================================================================

print("\n" + "=" * 70)
print("STATE × CATEGORY MATRIX AGGREGATION")
print("=" * 70)
print("\nWhy State × Category?")
print("  - See which restaurant types perform best in each state")
print("  - Identify regional disparities (e.g., Mamak weak in State X)")
print("  - Supports policy targeting (e.g., 'improve Mamak service in Johor')")
print("  - Enable Power BI heatmap visualizations")

# Group by state, main_category, and aspect
state_cat_agg = df_exploded_with_meta.dropna(subset=['state', 'main_category']).groupby(
    ['state', 'main_category', 'aspect']
).agg(
    total_segments=('Segment', 'count'),
    num_positive=('predicted_sentiment_id', lambda x: (x == 1).sum()),
    num_negative=('predicted_sentiment_id', lambda x: (x == 0).sum()),
    avg_confidence=('confidence', 'mean'),
    num_restaurants=('name', 'nunique'),
).reset_index()

# Calculate metrics
state_cat_agg['pct_positive'] = (state_cat_agg['num_positive'] / state_cat_agg['total_segments'] * 100).round(2)
state_cat_agg['pct_negative'] = (state_cat_agg['num_negative'] / state_cat_agg['total_segments'] * 100).round(2)
state_cat_agg['sentiment_score'] = (
    (state_cat_agg['num_positive'] - state_cat_agg['num_negative']) / state_cat_agg['total_segments']
).round(4)
state_cat_agg['avg_confidence'] = state_cat_agg['avg_confidence'].round(4)

# Sort for readability
state_cat_agg = state_cat_agg.sort_values(['state', 'main_category', 'total_segments'], ascending=[True, True, False])

# Output path
state_cat_path = os.path.join(CFG.output_dir, "state_category_summary.csv")
state_cat_agg.to_csv(state_cat_path, index=False, encoding='utf-8-sig')

print(f"\n  ✓ State × Category aggregation complete")
print(f"    Rows: {len(state_cat_agg):,} (state × category × aspect)")
print(f"    States: {state_cat_agg['state'].nunique()}")
print(f"    Categories: {state_cat_agg['main_category'].nunique()}")
# Count (state, category) pairs that actually occur, not the theoretical product
print(f"    Observed (state, category) pairs: {len(state_cat_agg[['state', 'main_category']].drop_duplicates()):,}")
print(f"    Saved to: {state_cat_path}")

# Summary: Best and Worst State-Category combinations
print(f"\n  TOP 10 Best Performing (State, Category) Combinations:")
print("  " + "=" * 60)
best_combos = state_cat_agg.groupby(['state', 'main_category']).agg({
    'sentiment_score': 'mean',
    'total_segments': 'sum',
    'num_restaurants': 'sum'
}).sort_values('sentiment_score', ascending=False).head(10)

for (state, cat), row in best_combos.iterrows():
    print(f"  {state:<20} × {cat:<20}: {row['sentiment_score']:>7.4f} ({int(row['total_segments']):>6,} segs)")

print(f"\n  TOP 10 Worst Performing (State, Category) Combinations:")
print("  " + "=" * 60)
worst_combos = state_cat_agg.groupby(['state', 'main_category']).agg({
    'sentiment_score': 'mean',
    'total_segments': 'sum',
    'num_restaurants': 'sum'
}).sort_values('sentiment_score', ascending=True).head(10)

for (state, cat), row in worst_combos.iterrows():
    print(f"  {state:<20} × {cat:<20}: {row['sentiment_score']:>7.4f} ({int(row['total_segments']):>6,} segs)")

print("=" * 70)


# ✅ GOVERNMENT-LEVEL AGGREGATIONS COMPLETE

## Strategic Output Files Generated

### For Government Tourism Boards & Stakeholders:

**1. `state_level_summary.csv`** ⭐ PRIMARY
   - **Rows**: State × Aspect combinations
   - **Use case**: Regional performance benchmarking
   - **Key columns**: 
     - `state`, `aspect`, `total_segments`, `pct_positive`, `pct_negative`, `sentiment_score`
   - **Dashboard**: State selector → Aspect heatmap → Identify regional weak points

**2. `category_level_summary.csv`** ⭐ SECONDARY
   - **Rows**: Restaurant Category × Aspect combinations
   - **Use case**: Understand performance by restaurant type
   - **Examples**: 
     - How do Mamak restaurants perform on SERVICE?
     - Are Fine Dining restaurants better at AMBIENCE?
   - **Dashboard**: Category selector → Aspect performance → Comparative analysis

**3. `state_category_summary.csv`** ⭐ DETAILED
   - **Rows**: State × Category × Aspect combinations (most granular)
   - **Use case**: Pinpoint policy targets
   - **Example insights**:
     - "Mamak restaurants in Johor need SERVICE training"
     - "Fine Dining in Selangor excels at AMBIENCE"
   - **Dashboard**: State × Category filter → Heatmap visualization

---

## Power BI Implementation Strategy

### Dashboard 1: Regional Benchmarking (For Government)
```
Top Level: State Selector (Slicer)
├─ Visualization 1: Aspect Sentiment Heatmap (State × Aspect)
│  └ Color by sentiment_score (-1 to +1)
├─ Visualization 2: Top 3 Weak Aspects by State
│  └ Filter: pct_negative > 25%
└─ Visualization 3: Restaurants in State
   └ Ranking by category health
```

**DAX Example:**
```dax
State_Sentiment_Score = 
CALCULATE(
    AVERAGE(StateLevel[sentiment_score]),
    ALLEXCEPT(StateLevel, StateLevel[state], StateLevel[aspect])
)
```
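
Before building the "Top 3 Weak Aspects by State" visual in Dashboard 1, it can help to prototype the same filter in pandas and check the result. This is a minimal sketch on made-up numbers. It assumes `pct_negative` is stored as a percentage (0 to 100) and that "weakest" means lowest `sentiment_score`.

```python
import pandas as pd

# Hypothetical rows in the shape of state_level_summary.csv.
state_level = pd.DataFrame({
    "state": ["Johor"] * 4 + ["Selangor"] * 2,
    "aspect": ["SERVICE", "VALUE", "FOOD", "AMBIENCE", "SERVICE", "FOOD"],
    "pct_negative": [40.0, 30.0, 10.0, 26.0, 35.0, 5.0],
    "sentiment_score": [-0.20, -0.05, 0.70, 0.10, -0.15, 0.82],
})

# Keep aspects above the 25% negative threshold, then take the three
# lowest-scoring aspects within each state.
weak = (
    state_level[state_level["pct_negative"] > 25]
    .sort_values(["state", "sentiment_score"])
    .groupby("state")
    .head(3)
)
print(weak[["state", "aspect", "pct_negative", "sentiment_score"]])
```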

---

### Dashboard 2: Category Performance (For Industry)
```
Top Level: Category Selector (Slicer)
├─ Visualization 1: Aspect Performance Across States
│  └ Line chart: Each state as trend
├─ Visualization 2: State Rankings by Category
│  └ Ordered by sentiment_score
└─ Visualization 3: Confidence by State
   └ Filter low-confidence recommendations
```

---

### Dashboard 3: Strategic Hotspots (For Policy)
```
Main: State × Category Heatmap (matrix visual)
├─ Row: State (16 regions: 13 states and 3 federal territories)
├─ Column: Main_Category (Mamak, Fine Dining, etc.)
├─ Color: Average sentiment_score for that combination
└─ Tooltip: total_segments, num_restaurants, pct_positive
```
```
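
The State × Category matrix can be prototyped as a pandas pivot before it is built in Power BI. The sketch below uses invented rows and takes an unweighted mean of `sentiment_score` across aspects. Weighting by `total_segments` may be preferable in practice, and the exact column set of `state_category_summary.csv` is assumed here.

```python
import pandas as pd

# Hypothetical rows in the shape of state_category_summary.csv.
state_category = pd.DataFrame({
    "state": ["Johor", "Johor", "Johor", "Selangor"],
    "Main_Category": ["Hawker", "Hawker", "Mamak", "Mamak"],
    "aspect": ["FOOD", "SERVICE", "FOOD", "FOOD"],
    "sentiment_score": [0.4, -0.2, 0.6, 0.8],
})

# Rows = state, columns = category, cell = mean score over aspects.
# Combinations with no data come out as NaN.
matrix = state_category.pivot_table(
    index="state",
    columns="Main_Category",
    values="sentiment_score",
    aggfunc="mean",
)
print(matrix)
```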

**Interpretation:**
- **Green cells** ✅ = Good performance (state-category combination healthy)
- **Red cells** ❌ = Intervention needed (state-category combination weak)
- **Gray cells** ? = Low sample size (fewer than 10 segments)
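
The color rules above can be written as a small function, which is handy for checking the Power BI conditional formatting against known cases. The 10-segment minimum comes from the text. The green/red cut at a score of zero is an assumption for this sketch, since the actual boundary is not stated here.

```python
MIN_SEGMENTS = 10      # from the text: fewer than 10 segments is low sample
SCORE_CUTOFF = 0.0     # assumed boundary between healthy and weak


def cell_color(sentiment_score, total_segments,
               min_segments=MIN_SEGMENTS, cutoff=SCORE_CUTOFF):
    """Return the heatmap color for one state-category cell."""
    # The sample-size check comes first: a sparse cell is gray
    # regardless of its score.
    if total_segments < min_segments:
        return "gray"
    return "green" if sentiment_score >= cutoff else "red"


print(cell_color(0.82, 1500))   # healthy, well-sampled cell
print(cell_color(-0.60, 240))   # weak, well-sampled cell
print(cell_color(-0.60, 7))     # too few segments to judge
```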

---

## Example Government Insights

### Finding 1: Regional Disparities
```
State: Selangor
  FOOD sentiment: +0.82 (Strong)
  SERVICE sentiment: -0.15 (Weak - needs intervention)
  
Action: Government training programs for restaurant service in Selangor
```

### Finding 2: Category-Specific Issues
```
Category: Mamak
  VALUE sentiment: -0.45 (Concerning - customers feel overpriced)
  
Action: Investigate pricing standards, recommend price transparency initiatives
```

### Finding 3: State × Category Hotspot
```
State: Johor × Category: Hawker
  HALAL_COMPLIANCE sentiment: -0.60 (Critical)
  
Action (urgent): Halal certification audit of hawker stalls in Johor
```

---

## File Organization

```
Dataset/
├── segment_level_predictions.csv     ← Original (detail)
├── restaurant_aspect_aggregates.csv  ← Original (restaurant-level)
├── kano_model_input.csv              ← Original (aspects)
├── state_level_summary.csv           ← NEW (government primary)
├── category_level_summary.csv        ← NEW (industry insights)
├── state_category_summary.csv        ← NEW (policy heatmap)
├── prediction_summary.json           ← Original (metadata)
└── confidence_threshold_analysis.png ← Original (validation)
```
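
A quick check that every file in the tree above exists can catch a skipped cell before the data reaches Power BI. This sketch uses only the standard library. The helper name `missing_outputs` is hypothetical, and the `Dataset` path is assumed to be relative to the working directory.

```python
from pathlib import Path

# File names taken from the tree above.
EXPECTED_FILES = [
    "segment_level_predictions.csv",
    "restaurant_aspect_aggregates.csv",
    "kano_model_input.csv",
    "state_level_summary.csv",
    "category_level_summary.csv",
    "state_category_summary.csv",
    "prediction_summary.json",
    "confidence_threshold_analysis.png",
]


def missing_outputs(dataset_dir):
    """Return the expected output files that are not present in dataset_dir."""
    root = Path(dataset_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


missing = missing_outputs("Dataset")
if missing:
    print("Missing outputs:", ", ".join(missing))
else:
    print("All expected outputs are present.")
```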

---

## Next Steps

1. **Load** `state_level_summary.csv` as main table in Power BI
2. **Create slicer** for state selection
3. **Build heatmap**: Rows = Aspect, Columns = State, Values = sentiment_score
4. **Add filters**: confidence > 0.7 (to show only reliable insights)
5. **Create** "Hotspots" view using `state_category_summary.csv`
6. **Export** to stakeholders with navigation guide

**Ready for government presentation! 🎯**
