# Decision Intelligence Studio - Interactive Demo

This notebook provides an interactive walkthrough of the Decision Intelligence system.

## Overview

We'll demonstrate:
1. **Data Exploration** - Understanding the dataset
2. **Causal Analysis** - Estimating treatment effects
3. **Segmentation** - Identifying high-value customers
4. **What-If Simulation** - Business scenario planning
5. **Action Recommendations** - Optimal targeting strategy

In [None]:
# Import libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from pathlib import Path
import warnings
warnings.filterwarnings('ignore')

# Set style
plt.style.use('seaborn-v0_8-darkgrid')
sns.set_palette('husl')

print("✓ Libraries imported successfully")

## 1. Data Exploration

In [None]:
# Load canonical data
canonical_df = pd.read_parquet('../data/processed/canonical_events.parquet')

print(f"Dataset shape: {canonical_df.shape}")
print(f"Date range: {canonical_df['event_ts'].min()} to {canonical_df['event_ts'].max()}")
print(f"\nTreatment distribution:")
print(canonical_df['treatment'].value_counts())

canonical_df.head()

In [None]:
# Visualize treatment vs outcome
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Distribution comparison
treated = canonical_df[canonical_df['treatment'] == 1]['outcome']
control = canonical_df[canonical_df['treatment'] == 0]['outcome']

axes[0].hist([control, treated], bins=30, label=['Control', 'Treated'], alpha=0.7)
axes[0].set_xlabel('Outcome (Revenue)')
axes[0].set_ylabel('Frequency')
axes[0].set_title('Outcome Distribution by Treatment Group')
axes[0].legend()

# Mean comparison
means = canonical_df.groupby('treatment')['outcome'].mean()
axes[1].bar(['Control', 'Treated'], means, color=['#ff6b6b', '#4ecdc4'])
axes[1].set_ylabel('Mean Outcome ($)')
axes[1].set_title('Average Revenue by Group')
axes[1].axhline(y=means.mean(), color='gray', linestyle='--', alpha=0.5)

plt.tight_layout()
plt.show()

print(f"\nNaive ATE estimate: ${(means[1] - means[0]):.2f}")
print("(This may be biased due to confounding!)")
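
One quick way to see whether confounding is plausible is to compare pre-treatment covariates across the two groups. The sketch below computes standardized mean differences (SMD) on synthetic data. The covariate names `prior_spend` and `tenure_days` are hypothetical and are not known columns of `canonical_df`. The common rule of thumb that |SMD| above roughly 0.1 signals imbalance is a convention, not a hard test.

```python
import numpy as np
import pandas as pd

def standardized_mean_diff(df, covariates, treatment_col="treatment"):
    """Return the standardized mean difference (treated minus control) per covariate."""
    treated = df[df[treatment_col] == 1]
    control = df[df[treatment_col] == 0]
    rows = {}
    for col in covariates:
        # Pool the two group variances so the difference is expressed in SD units
        pooled_sd = np.sqrt((treated[col].var() + control[col].var()) / 2)
        diff = treated[col].mean() - control[col].mean()
        rows[col] = diff / pooled_sd if pooled_sd > 0 else np.nan
    return pd.Series(rows, name="smd")

# Synthetic data (assumption): treatment is more likely for high prior spenders,
# while tenure is unrelated to treatment assignment.
rng = np.random.default_rng(0)
n = 5000
prior_spend = rng.normal(100, 20, n)
tenure_days = rng.normal(365, 90, n)
p_treat = 1 / (1 + np.exp(-(prior_spend - 100) / 20))
demo_df = pd.DataFrame({
    "prior_spend": prior_spend,
    "tenure_days": tenure_days,
    "treatment": rng.binomial(1, p_treat),
})

smd = standardized_mean_diff(demo_df, ["prior_spend", "tenure_days"])
print(smd.round(3))
```

A large SMD on a covariate that also drives revenue would mean the naive difference in means mixes the treatment effect with pre-existing differences between the groups.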

## 2. Causal Analysis Results

In [None]:
# Load uplift scores
uplift_df = pd.read_parquet('../data/outputs/uplift_scores.parquet')

print(f"Causal estimates for {len(uplift_df)} customers")
print(f"\nAverage Treatment Effect (ATE): ${uplift_df['uplift_score'].mean():.2f}")
print(f"CATE range: ${uplift_df['uplift_score'].min():.2f} to ${uplift_df['uplift_score'].max():.2f}")
print(f"CATE std: ${uplift_df['uplift_score'].std():.2f}")

uplift_df.head(10)
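
The ATE printed above is a point estimate, the mean of the per-customer uplift scores. A rough sense of its sampling variability can be had from a percentile bootstrap, sketched here on synthetic scores. This treats the scores as fixed, so it reflects only which customers happen to be in the sample and not the estimation error of the causal model itself. The helper `bootstrap_mean_ci` is illustrative and not part of the project.

```python
import numpy as np

def bootstrap_mean_ci(values, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of `values`."""
    rng = np.random.default_rng(seed)
    values = np.asarray(values, dtype=float)
    # Each row of idx is one resample (with replacement) of the original indices
    idx = rng.integers(0, len(values), size=(n_boot, len(values)))
    boot_means = values[idx].mean(axis=1)
    lower, upper = np.quantile(boot_means, [alpha / 2, 1 - alpha / 2])
    return values.mean(), lower, upper

# Synthetic stand-in for uplift_df['uplift_score'] (assumption)
demo_scores = np.random.default_rng(1).normal(12.0, 8.0, size=1000)
mean, lower, upper = bootstrap_mean_ci(demo_scores)
print(f"Mean uplift: {mean:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

With the real data, the call would be `bootstrap_mean_ci(uplift_df['uplift_score'])`.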

In [None]:
# Visualize uplift distribution
fig, axes = plt.subplots(2, 2, figsize=(14, 10))

# 1. Uplift distribution
axes[0, 0].hist(uplift_df['uplift_score'], bins=50, color='#667eea', alpha=0.7)
axes[0, 0].axvline(x=uplift_df['uplift_score'].mean(), color='red', 
                   linestyle='--', label=f'Mean: ${uplift_df["uplift_score"].mean():.2f}')
axes[0, 0].set_xlabel('Uplift Score ($)')
axes[0, 0].set_ylabel('Frequency')
axes[0, 0].set_title('Distribution of Individual Treatment Effects')
axes[0, 0].legend()

# 2. Uplift by segment
segment_means = uplift_df.groupby('segment_name')['uplift_score'].mean().sort_values()
segment_means.plot(kind='barh', ax=axes[0, 1], color='#764ba2')
axes[0, 1].set_xlabel('Mean Uplift ($)')
axes[0, 1].set_title('Mean Uplift by Segment')

# 3. Segment sizes
segment_counts = uplift_df['segment_name'].value_counts()
axes[1, 0].pie(segment_counts, labels=segment_counts.index, autopct='%1.1f%%',
               colors=['#667eea', '#764ba2', '#ed64a6', '#ff9a9e'])
axes[1, 0].set_title('Customer Segment Distribution')

# 4. Uplift vs Outcome scatter
for segment in uplift_df['segment_name'].unique():
    seg_data = uplift_df[uplift_df['segment_name'] == segment]
    axes[1, 1].scatter(seg_data['uplift_score'], seg_data['outcome'], 
                      label=segment, alpha=0.5, s=20)
axes[1, 1].set_xlabel('Uplift Score ($)')
axes[1, 1].set_ylabel('Observed Outcome ($)')
axes[1, 1].set_title('Uplift vs Outcome by Segment')
axes[1, 1].legend()

plt.tight_layout()
plt.show()

## 3. Segment Analysis

In [None]:
# Detailed segment statistics
segment_stats = uplift_df.groupby('segment_name').agg({
    'user_id': 'count',
    'uplift_score': ['mean', 'median', 'std'],
    'outcome': 'mean',
    'treatment': 'mean'
}).round(2)

segment_stats.columns = ['Count', 'Mean Uplift', 'Median Uplift', 'Uplift Std', 
                         'Mean Outcome', 'Treatment Rate']
segment_stats = segment_stats.sort_values('Mean Uplift', ascending=False)

print("\nSEGMENT ANALYSIS")
print("=" * 80)
print(segment_stats)

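# Key insights
print("\n\nKEY INSIGHTS:")
print("=" * 80)
high_seg = segment_stats.iloc[0]   # segment with the highest mean uplift
low_seg = segment_stats.iloc[-1]   # segment with the lowest mean uplift

# A ratio is only meaningful when the lowest segment's mean uplift is positive
if low_seg['Mean Uplift'] > 0:
    uplift_ratio = high_seg['Mean Uplift'] / low_seg['Mean Uplift']
    print(f"1. {high_seg.name} segment shows {uplift_ratio:.2f}x the mean uplift of the {low_seg.name} segment")
else:
    uplift_gap = high_seg['Mean Uplift'] - low_seg['Mean Uplift']
    print(f"1. {high_seg.name} segment's mean uplift is ${uplift_gap:.2f} higher than the {low_seg.name} segment's")
print(f"2. {high_seg.name} segment represents {(high_seg['Count']/len(uplift_df)*100):.1f}% of customers")
print(f"3. Treating every customer in the {high_seg.name} segment could yield ${(high_seg['Mean Uplift'] * high_seg['Count']):,.2f} in incremental revenue (before treatment cost)")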
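
The notebook loads `segment_name` ready-made, so how the segments were built is not shown here. The four names used in this notebook suggest quartile buckets of the uplift score. Under that assumption, the labels could be reproduced roughly as follows. The helper `assign_quartile_segments` is hypothetical, and the upstream pipeline may well use different cut points.

```python
import numpy as np
import pandas as pd

# Ordered from lowest to highest uplift quartile (assumed mapping)
SEGMENT_LABELS = ["Low Uplift", "Medium-Low", "Medium-High", "High Uplift"]

def assign_quartile_segments(scores):
    """Label each score with the quartile of the score distribution it falls in."""
    # Ranking with method="first" breaks ties, so qcut always gets distinct bin edges
    ranks = pd.Series(scores).rank(method="first")
    return pd.qcut(ranks, q=4, labels=SEGMENT_LABELS)

# Synthetic stand-in for uplift_df['uplift_score'] (assumption)
demo = pd.DataFrame({"uplift_score": np.random.default_rng(2).normal(10, 5, 1000)})
demo["segment_name"] = assign_quartile_segments(demo["uplift_score"])

summary = demo.groupby("segment_name", observed=True)["uplift_score"].agg(["count", "mean"])
print(summary.round(2))
```

If the segments really are quartiles, each one holds about 25% of customers by construction, so the share printed in insight 2 carries no extra information.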

## 4. What-If Simulation

In [None]:
# Define business parameters
COST_PER_TREATMENT = 10.0
BUDGET = 50000.0

def simulate_campaign(segment_name=None, budget=BUDGET):
    """Simulate treating the highest-uplift users in one segment (or all users if None) within budget."""
    
    # Filter data
    if segment_name:
        data = uplift_df[uplift_df['segment_name'] == segment_name].copy()
    else:
        data = uplift_df.copy()
    
    # Calculate metrics
    max_users = int(budget / COST_PER_TREATMENT)
    
    # Sort by uplift and take top users within budget
    data = data.sort_values('uplift_score', ascending=False).head(max_users)
    
    n_users = len(data)
    total_cost = n_users * COST_PER_TREATMENT
    expected_revenue = data['uplift_score'].sum()
    roi = (expected_revenue - total_cost) / total_cost if total_cost > 0 else 0
    
    return {
        'segment': segment_name or 'All Users',
        'n_users': n_users,
        'cost': total_cost,
        'expected_revenue': expected_revenue,
        'roi': roi
    }

# Run simulations for each segment
print("CAMPAIGN SIMULATION RESULTS")
print("=" * 80)
print(f"Budget: ${BUDGET:,.2f}")
print(f"Cost per treatment: ${COST_PER_TREATMENT:.2f}")
print("\n")

results = []
for segment in ['High Uplift', 'Medium-High', 'Medium-Low', 'Low Uplift', None]:
    result = simulate_campaign(segment)
    results.append(result)
    print(f"Scenario: {result['segment']}")
    print(f"  Users: {result['n_users']:,}")
    print(f"  Cost: ${result['cost']:,.2f}")
    print(f"  Expected Revenue: ${result['expected_revenue']:,.2f}")
    print(f"  ROI: {result['roi']:.1%}")
    print()

# Visualize
results_df = pd.DataFrame(results)
fig, ax = plt.subplots(figsize=(12, 6))
x = np.arange(len(results_df))
width = 0.35

ax.bar(x - width/2, results_df['expected_revenue'], width, label='Expected Revenue', color='#4ecdc4')
ax.bar(x + width/2, results_df['cost'], width, label='Cost', color='#ff6b6b')

ax.set_xlabel('Targeting Strategy')
ax.set_ylabel('Amount ($)')
ax.set_title('Expected Revenue vs Cost by Targeting Strategy')
ax.set_xticks(x)
ax.set_xticklabels(results_df['segment'], rotation=45, ha='right')
ax.legend()
ax.grid(axis='y', alpha=0.3)

plt.tight_layout()
plt.show()
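
The simulation above fixes a single budget. Because users are treated in descending order of uplift, each extra dollar buys a less responsive customer, so ROI can only fall as the budget grows. The sketch below sweeps several budgets on synthetic scores to show that shape. The function `budget_sweep` and the budget values are illustrative assumptions.

```python
import numpy as np
import pandas as pd

def budget_sweep(scores, cost_per_treatment, budgets):
    """Expected revenue and ROI when treating the top-uplift users at each budget."""
    ranked = np.sort(np.asarray(scores, dtype=float))[::-1]  # highest uplift first
    cumulative_gain = np.cumsum(ranked)
    rows = []
    for budget in budgets:
        n_users = min(int(budget // cost_per_treatment), len(ranked))
        cost = n_users * cost_per_treatment
        revenue = cumulative_gain[n_users - 1] if n_users > 0 else 0.0
        rows.append({
            "budget": budget,
            "n_users": n_users,
            "cost": cost,
            "expected_revenue": revenue,
            "roi": (revenue - cost) / cost if cost > 0 else 0.0,
        })
    return pd.DataFrame(rows)

# Synthetic stand-in for uplift_df['uplift_score'] (assumption)
demo_scores = np.random.default_rng(3).normal(12, 8, 10_000)
sweep = budget_sweep(demo_scores, cost_per_treatment=10.0,
                     budgets=[10_000, 25_000, 50_000, 100_000])
print(sweep.round(2))
```

Reading the table from top to bottom gives a rough picture of where additional budget stops paying for itself.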

## 5. Action Recommendations

In [None]:
# Generate optimal recommendation
MIN_ROI_THRESHOLD = 0.5  # 50% minimum ROI

# Calculate expected value for each user
recommendation_df = uplift_df.copy()
recommendation_df['expected_gain'] = recommendation_df['uplift_score']
recommendation_df['net_benefit'] = recommendation_df['expected_gain'] - COST_PER_TREATMENT
recommendation_df['individual_roi'] = recommendation_df['net_benefit'] / COST_PER_TREATMENT

# Keep users whose individual ROI clears the threshold
# (net_benefit > 0 is then implied for any non-negative threshold)
candidates = recommendation_df[
    (recommendation_df['net_benefit'] > 0) & 
    (recommendation_df['individual_roi'] > MIN_ROI_THRESHOLD)
].sort_values('net_benefit', ascending=False)

# Apply budget
max_users = int(BUDGET / COST_PER_TREATMENT)
recommended = candidates.head(max_users)

print("RECOMMENDED ACTION")
print("=" * 80)
total_gain = recommended['expected_gain'].sum()
total_cost = len(recommended) * COST_PER_TREATMENT
# Guard against division by zero when no user clears the ROI threshold
overall_roi = (total_gain - total_cost) / total_cost if total_cost > 0 else 0.0
print(f"\nTarget {len(recommended):,} customers with highest uplift")
print(f"Expected incremental revenue: ${total_gain:,.2f}")
print(f"Campaign cost: ${total_cost:,.2f}")
print(f"Net profit: ${(total_gain - total_cost):,.2f}")
print(f"Overall ROI: {overall_roi:.1%}")

print("\n\nSegment breakdown of recommended users:")
print(recommended['segment_name'].value_counts())

# Visualize recommended vs not recommended
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Uplift distribution
axes[0].hist([recommendation_df[~recommendation_df['user_id'].isin(recommended['user_id'])]['uplift_score'],
              recommended['uplift_score']], 
             bins=30, label=['Not Recommended', 'Recommended'], alpha=0.7)
axes[0].set_xlabel('Uplift Score ($)')
axes[0].set_ylabel('Frequency')
axes[0].set_title('Uplift Distribution: Recommended vs Not Recommended')
axes[0].legend()

# ROI scatter
axes[1].scatter(recommendation_df['uplift_score'], recommendation_df['individual_roi'], 
               alpha=0.3, label='All Users', s=20)
axes[1].scatter(recommended['uplift_score'], recommended['individual_roi'], 
               alpha=0.6, label='Recommended', color='green', s=20)
axes[1].axhline(y=MIN_ROI_THRESHOLD, color='red', linestyle='--', 
               label=f'Min ROI Threshold ({MIN_ROI_THRESHOLD:.0%})')
axes[1].set_xlabel('Uplift Score ($)')
axes[1].set_ylabel('Individual ROI')
axes[1].set_title('ROI vs Uplift: Targeting Strategy')
axes[1].legend()

plt.tight_layout()
plt.show()
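
The ROI filter in this section reduces to a single cutoff on the uplift score: `(uplift - cost) / cost > min_roi` is the same as `uplift > cost * (1 + min_roi)`. With the notebook's values of $10 per treatment and a 50% minimum ROI, the cutoff is $15. The short check below confirms the equivalence on a few illustrative scores. The helper name `min_uplift_for_roi` is hypothetical.

```python
import pandas as pd

def min_uplift_for_roi(cost_per_treatment, min_roi):
    """Uplift a user must exceed for (uplift - cost) / cost to be above min_roi."""
    return cost_per_treatment * (1.0 + min_roi)

threshold = min_uplift_for_roi(cost_per_treatment=10.0, min_roi=0.5)

# Illustrative scores only, not taken from the notebook's data
demo_scores = pd.Series([5.0, 12.0, 15.0, 15.01, 40.0])
roi_mask = (demo_scores - 10.0) / 10.0 > 0.5   # filter as written in this section
cutoff_mask = demo_scores > threshold          # equivalent single cutoff

print(threshold)
print(roi_mask.equals(cutoff_mask))
```

Stating the rule as a dollar cutoff makes it easy to read off the dashed line in the ROI scatter plot: every recommended user sits to the right of that uplift value.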

## Summary

### Key Findings:

1. **Heterogeneous Treatment Effects**: Estimated treatment effects vary across customers, as the spread of the CATE distribution in Section 2 shows

2. **Segment-Specific Strategy**: The High Uplift segment has the highest mean estimated benefit from treatment (see the segment table in Section 3)

3. **ROI Optimization**: Ranking customers by uplift and treating only those above the ROI threshold gives a higher expected ROI than treating customers regardless of their uplift

4. **Actionable Insights**: The recommendation in Section 5 lists which customers to target, with the expected cost, incremental revenue, and ROI

### Business Value:

- **Precision Targeting**: Focus resources where they have the most impact
- **Cost Efficiency**: Avoid wasting budget on customers who won't respond
- **Measurable Impact**: Track predictions vs actual outcomes
- **Continuous Learning**: Feed experiment results back into model
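
Tracking predictions against actual outcomes could look something like the sketch below: bucket customers by predicted uplift, then compare the predicted mean in each bucket with the observed treated-minus-control difference. This assumes a holdout in which treatment was assigned at random, which the notebook does not confirm. The data here is synthetic and `uplift_by_decile` is a hypothetical helper.

```python
import numpy as np
import pandas as pd

def uplift_by_decile(df, score_col="uplift_score", treatment_col="treatment",
                     outcome_col="outcome", n_bins=10):
    """Compare mean predicted uplift with observed uplift inside each score bin."""
    df = df.copy()
    # Rank first so ties cannot produce duplicate bin edges
    df["bin"] = pd.qcut(df[score_col].rank(method="first"), q=n_bins, labels=False)
    group_means = (
        df.groupby(["bin", treatment_col])[outcome_col].mean().unstack(treatment_col)
    )
    return pd.DataFrame({
        "predicted_uplift": df.groupby("bin")[score_col].mean(),
        "observed_uplift": group_means[1] - group_means[0],  # treated minus control
    })

# Synthetic randomized holdout (assumption): true effect equals prediction plus noise
rng = np.random.default_rng(4)
n = 20_000
predicted = rng.normal(10, 5, n)
true_effect = predicted + rng.normal(0, 2, n)
treatment = rng.binomial(1, 0.5, n)
outcome = 50 + treatment * true_effect + rng.normal(0, 10, n)
holdout = pd.DataFrame({"uplift_score": predicted, "treatment": treatment,
                        "outcome": outcome})

calib = uplift_by_decile(holdout)
print(calib.round(2))
```

If the model is well calibrated, the two columns should rise together. Bins where observed uplift falls well short of the prediction are candidates for retraining data.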

### Technical Approach:

- **Causal Inference**: DoWhy for identification + EconML for estimation
- **Robustness**: Multiple refutation tests validate conclusions
- **Production-Ready**: API-first design, containerized, monitored
- **Scalable**: Cloud-native architecture handles large datasets
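
To give a feel for what a refutation test checks, here is a hand-rolled placebo test in plain NumPy. It is a stand-in written for illustration, not DoWhy's API, and it uses a simple difference in means instead of the notebook's estimator. The idea: shuffle the treatment labels so they carry no causal information, re-estimate the effect, and confirm the estimate collapses toward zero.

```python
import numpy as np

def diff_in_means(outcome, treatment):
    """Difference in mean outcome between treated and control units."""
    return outcome[treatment == 1].mean() - outcome[treatment == 0].mean()

def placebo_test(outcome, treatment, n_permutations=500, seed=0):
    """Re-estimate the effect under randomly permuted (placebo) treatment labels."""
    rng = np.random.default_rng(seed)
    observed = diff_in_means(outcome, treatment)
    placebo = np.array([
        diff_in_means(outcome, rng.permutation(treatment))
        for _ in range(n_permutations)
    ])
    # Share of placebo effects at least as extreme as the observed one
    p_value = (np.abs(placebo) >= abs(observed)).mean()
    return observed, placebo.mean(), p_value

# Synthetic randomized data with a true effect of 5 (assumption)
rng = np.random.default_rng(5)
n = 4000
treatment = rng.binomial(1, 0.5, n)
outcome = 50 + 5 * treatment + rng.normal(0, 10, n)

observed, placebo_mean, p_value = placebo_test(outcome, treatment)
print(f"Observed effect: {observed:.2f}")
print(f"Mean placebo effect: {placebo_mean:.2f}")
print(f"Permutation p-value: {p_value:.3f}")
```

An estimate that stays large under placebo labels would suggest the pipeline is picking up something other than the treatment.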