# 🔮 What-If Engine - Scenario Analysis

## Objective

This notebook implements a **"What-If Engine"** to explore how behavioral changes affect burnout risk.

### Use Case
An employee with high burnout risk wants to know:
> "If I sleep 1 extra hour and reduce 2 hours of screen time, will my risk decrease?"

### How It Works
1. Take an example from the dataset (e.g., a high burnout case)
2. Apply modifications ("deltas") to specific features
3. Compare predicted probabilities before/after
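
The three steps can be sketched without the trained network. The example below is a minimal, dependency-free illustration, not the notebook's implementation: `toy_predict` is a hypothetical stand-in for the MLP (hand-picked linear weights followed by a softmax), and the feature values are invented for the demonstration.

```python
import math

# Hypothetical stand-in for the trained model: one weight per feature per class.
# Classes are ordered [low, medium, high]; the weights are made up for illustration.
TOY_WEIGHTS = {
    "sleep_hours_mean": [0.5, 0.0, -0.5],
    "screen_time_hours_mean": [-0.3, 0.0, 0.3],
}

def softmax(logits):
    """Convert raw scores to probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def toy_predict(features):
    """Return [P(low), P(medium), P(high)] for a dict of feature values."""
    logits = [
        sum(TOY_WEIGHTS[name][k] * value for name, value in features.items())
        for k in range(3)
    ]
    return softmax(logits)

# Step 1: take an example (invented values).
base = {"sleep_hours_mean": 5.0, "screen_time_hours_mean": 9.0}

# Step 2: apply deltas to specific features.
deltas = {"sleep_hours_mean": +1.0, "screen_time_hours_mean": -2.0}
modified = {name: value + deltas.get(name, 0.0) for name, value in base.items()}

# Step 3: compare predicted probabilities before and after.
base_proba = toy_predict(base)
new_proba = toy_predict(modified)
proba_change = [new - old for new, old in zip(new_proba, base_proba)]
print("P(high) before:", round(base_proba[2], 3))
print("P(high) after: ", round(new_proba[2], 3))
```

Because the probabilities always sum to 1, the entries of `proba_change` sum to zero: any drop in P(high) is redistributed to the other classes.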

### Practical Applications
- **HR Analytics**: identify personalized interventions for at-risk employees
- **Self-monitoring**: wellness apps that suggest behavioral changes
- **Policy making**: evaluate impact of company policies (e.g., reduced hours)

### Limitations
- The model captures **correlations**, not **causation**
- Real interventions may have different effects
- Synthetic dataset: validate on real data before deployment

In [None]:
# =============================================================================
# SETUP AND MODEL LOADING
# =============================================================================
# Load trained MLP model for what-if predictions

import numpy as np
import pandas as pd
import torch
from torch import nn
from pathlib import Path
from sklearn.model_selection import train_test_split
import joblib

# Paths
DATA_DIR = Path('../data/processed')
MODEL_DIR = Path('../models/saved')

# =============================================================================
# DATA LOADING
# =============================================================================
# Use same dataset as training for consistent features

df = pd.read_parquet(DATA_DIR / 'tabular_ml_ready.parquet')
feature_cols = [c for c in df.columns if c not in {'burnout_level', 'burnout_score'}]
X = df[feature_cols].values.astype(np.float32)
y = df['burnout_level'].values.astype(np.int64)

# Split identical to training (same random_state!)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Convert to DataFrame to keep column names
X_train = pd.DataFrame(X_train, columns=feature_cols)
X_test = pd.DataFrame(X_test, columns=feature_cols)

# =============================================================================
# MLP MODEL LOADING
# =============================================================================
# Define same architecture used in training

DEVICE = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

class MLP(nn.Module):
    """Same architecture as 03_deep_learning_mlp.ipynb"""
    def __init__(self, input_dim, num_classes):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(input_dim, 256),
            nn.BatchNorm1d(256),
            nn.ReLU(),
            nn.Dropout(0.3),
            nn.Linear(256, 128),
            nn.BatchNorm1d(128),
            nn.ReLU(),
            nn.Dropout(0.2),
            nn.Linear(128, num_classes),
        )
    def forward(self, x):
        return self.net(x)

# Load saved model
model_path = MODEL_DIR / 'mlp_classifier.pt'
if model_path.exists():
    checkpoint = torch.load(model_path, map_location=DEVICE, weights_only=False)
    mlp = MLP(len(feature_cols), 3).to(DEVICE)
    mlp.load_state_dict(checkpoint['model_state'])
    mlp.eval()  # Inference mode
    print("✅ Model loaded successfully!")
else:
    # Fail here rather than later with a NameError on `mlp`
    raise FileNotFoundError(
        f"❌ Model not found at {model_path}. Run train_mlp.py first."
    )

# Load scaler (optional, for de-normalizing features)
scaler_path = DATA_DIR / 'feature_scaler.joblib'
scaler = joblib.load(scaler_path) if scaler_path.exists() else None
feature_names = feature_cols


In [None]:
# =============================================================================
# WHAT-IF FUNCTION
# =============================================================================
# Main function for scenario analysis

def what_if_scenario(model, x_row: pd.Series, deltas: dict, device=DEVICE):
    """
    Analyze how feature modifications change predictions.
    
    Args:
        model: Trained PyTorch model
        x_row: A row of features (pd.Series with column names)
        deltas: Dict {feature_name: change}, in the same units as the
                model's input features
                e.g., {"sleep_hours_mean": +1.0} = +1 hour of sleep
                if that feature is stored in raw hours
        device: Torch device
    
    Returns:
        dict with:
        - x_base: original features
        - x_new: modified features
        - base_proba: class probabilities (original)
        - new_proba: class probabilities (after modification)
        - proba_change: difference new - base
    """
    x_base = x_row.copy()
    x_new = x_row.copy()
    
    # Apply modifications
    for feat, delta in deltas.items():
        if feat in x_new.index:
            x_new[feat] = x_new[feat] + delta
        else:
            print(f"⚠️ Warning: feature '{feat}' not found in data")

    # Predictions
    model.eval()
    with torch.no_grad():
        # Convert to tensors
        x_base_tensor = torch.from_numpy(
            x_base.values.astype(np.float32)
        ).unsqueeze(0).to(device)
        x_new_tensor = torch.from_numpy(
            x_new.values.astype(np.float32)
        ).unsqueeze(0).to(device)
        
        # Forward pass
        base_logits = model(x_base_tensor)
        new_logits = model(x_new_tensor)
        
        # Convert logits to probabilities
        base_proba = torch.softmax(base_logits, dim=1).cpu().numpy()[0]
        new_proba = torch.softmax(new_logits, dim=1).cpu().numpy()[0]

    return {
        "x_base": x_base,
        "x_new": x_new,
        "base_proba": base_proba,           # [P(low), P(medium), P(high)]
        "new_proba": new_proba,
        "proba_change": new_proba - base_proba,  # Difference
    }

## 🧪 Example: Intervention Scenario

Let's take a **high burnout** case from the test set and simulate an intervention:
- +1 hour of sleep (`sleep_hours_mean`)
- -2 hours of screen time (`screen_time_hours_mean`)
- -1 hour of work (`work_hours_mean`)

This simulates a week with better work-life balance.
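
These deltas are stated in hours, but `what_if_scenario` adds them directly to the model's input values. If the features in `tabular_ml_ready.parquet` were standardized before training (an assumption; the notebook does not say), a delta of `+1.0` would mean one standard deviation rather than one hour. The sketch below shows the conversion for z-score scaling. The helper `to_scaled_deltas` and the standard deviations are hypothetical. With a fitted scikit-learn `StandardScaler`, the per-feature standard deviations are available as `scaler.scale_`.

```python
def to_scaled_deltas(raw_deltas, feature_std):
    """Convert deltas in raw units (e.g. hours) to standardized units.

    For z-score scaling, x_scaled = (x - mean) / std, so shifting x by d
    shifts x_scaled by d / std. The mean cancels out.
    """
    scaled = {}
    for name, delta in raw_deltas.items():
        std = feature_std[name]
        if std == 0:
            raise ValueError(f"feature '{name}' has zero standard deviation")
        scaled[name] = delta / std
    return scaled

# Hypothetical standard deviations, in hours, for illustration only.
feature_std = {
    "sleep_hours_mean": 0.8,
    "screen_time_hours_mean": 2.5,
    "work_hours_mean": 1.25,
}
raw_deltas = {
    "sleep_hours_mean": +1.0,
    "screen_time_hours_mean": -2.0,
    "work_hours_mean": -1.0,
}
scaled_deltas = to_scaled_deltas(raw_deltas, feature_std)
print(scaled_deltas)
```

If the stored features are already in raw hours, no conversion is needed and the deltas can be passed as written.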

In [None]:
# =============================================================================
# WHAT-IF ANALYSIS EXECUTION
# =============================================================================

# Find a high burnout case (class 2) for analysis
mask_high = (y_test == 2)
high_indices = np.where(mask_high)[0]

if len(high_indices) > 0:
    # Take first high burnout case
    idx = high_indices[0]
    x_example = X_test.iloc[idx]
    
    # Define scenario: improve sleep, reduce screen time and work hours
    deltas = {
        "sleep_hours_mean": +1.0,       # +1 hour average sleep
        "screen_time_hours_mean": -2.0,  # -2 hours average screen time
        "work_hours_mean": -1.0,         # -1 hour average work
    }
    
    # Execute analysis
    result = what_if_scenario(mlp, x_example, deltas)
    
    # =============================================================================
    # RESULTS VISUALIZATION
    # =============================================================================
    print("=" * 50)
    print("🔮 WHAT-IF ANALYSIS RESULTS")
    print("=" * 50)
    
    print("\n📊 Scenario:")
    for feat, delta in deltas.items():
        sign = "+" if delta > 0 else ""
        print(f"   {feat}: {sign}{delta}")
    
    print("\n📈 Original prediction (Low, Medium, High):")
    print(f"   {result['base_proba'].round(3)}")
    
    print("\n📉 Modified prediction (Low, Medium, High):")
    print(f"   {result['new_proba'].round(3)}")
    
    print("\n🔄 Probability change:")
    print(f"   {result['proba_change'].round(3)}")
    
    # Automatic interpretation (changes are in percentage points of probability)
    print("\n💡 Interpretation:")
    high_change = result['proba_change'][2] * 100
    if high_change < 0:
        print(f"   ✅ High burnout risk DECREASED by {abs(high_change):.1f} percentage points")
    elif high_change > 0:
        print(f"   ⚠️ High burnout risk INCREASED by {high_change:.1f} percentage points")
    else:
        print("   High burnout risk is UNCHANGED")
    
    if result['proba_change'][0] > 0:
        improvement = result['proba_change'][0] * 100
        print(f"   ✅ Low burnout probability INCREASED by {improvement:.1f} percentage points")

else:
    print("❌ No high burnout cases found in test set")

## 📝 Conclusions

### Analysis Results
The example shows how the engine estimates the effect of small behavioral changes on predicted burnout risk. The scenario combines three changes, each with a plausible rationale:
- **+1 hour sleep**: improves physical and mental recovery
- **-2 hours screen time**: reduces visual fatigue and digital stimulation
- **-1 hour work**: improves work-life balance

### Future Developments
1. **Interactive UI**: sliders to modify features in real-time
2. **Batch analysis**: analyze entire team for targeted interventions
3. **Causal model**: use causal inference to validate interventions
4. **App integration**: connect to wellness app for personalized recommendations
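
The batch-analysis idea (item 2) could look like the following sketch, which applies one set of deltas to every row and summarizes the change in the high-burnout probability. Everything here is illustrative: `batch_what_if` is a hypothetical helper, and `toy_predict` is an invented stand-in for the trained MLP so the block runs on its own.

```python
import math

def batch_what_if(predict, rows, deltas, class_index=2):
    """Apply the same deltas to every row and summarize the change in one class.

    predict: callable mapping a feature dict to a list of class probabilities
    rows: list of feature dicts (one per employee)
    deltas: dict {feature_name: change}
    class_index: class whose probability is tracked (2 = high burnout)
    """
    if not rows:
        raise ValueError("rows must not be empty")
    changes = []
    for row in rows:
        modified = {name: value + deltas.get(name, 0.0) for name, value in row.items()}
        changes.append(predict(modified)[class_index] - predict(row)[class_index])
    return {
        "mean_change": sum(changes) / len(changes),
        "share_improved": sum(c < 0 for c in changes) / len(changes),
        "changes": changes,
    }

def toy_predict(features):
    """Hypothetical stand-in model: less sleep gives a higher P(high)."""
    score = 6.0 - features["sleep_hours_mean"]
    p_high = 1.0 / (1.0 + math.exp(-score))
    p_rest = (1.0 - p_high) / 2.0
    return [p_rest, p_rest, p_high]

team = [{"sleep_hours_mean": hours} for hours in (4.5, 6.0, 7.5)]
summary = batch_what_if(toy_predict, team, {"sleep_hours_mean": +1.0})
print(round(summary["mean_change"], 3), summary["share_improved"])
```

With the real model, stacking all rows into a single tensor and running one forward pass would likely be faster than predicting row by row.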

### Limitations
- Predictions are based on correlations, not causation
- Dataset is synthetic - validate on real data
- Real effects depend on many unmodeled factors