# Retail Store Sales Optimization

**Goal**: Find optimal interventions to increase store sales by 20%

This notebook demonstrates how to use the Intervention Search system to identify the best ways to improve retail store performance through causal interventions.

## 1. Load Data and Setup

In [1]:
import pandas as pd
import numpy as np
import networkx as nx
import warnings
warnings.filterwarnings('ignore')

# Load retail data
df = pd.read_csv('data/retail_data.csv')
print(f"Loaded {len(df)} retail stores")
print(f"\nColumns: {list(df.columns)}")
df.head()

Loaded 500 retail stores

Columns: ['store_id', 'store_location', 'store_size', 'marketing_spend', 'price_discount', 'staff_count', 'competitor_proximity', 'foot_traffic', 'inventory_level', 'conversion_rate', 'customer_satisfaction', 'sales']


| | store_id | store_location | store_size | marketing_spend | price_discount | staff_count | competitor_proximity | foot_traffic | inventory_level | conversion_rate | customer_satisfaction | sales |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | STORE_000 | Suburban | 7283.0 | 3591.86 | 15.6 | 9 | 7.1 | 3013.0 | 3673.0 | 24.23 | 71.0 | 63267.9 |
| 1 | STORE_001 | Rural | 5825.0 | 8586.61 | 14.4 | 5 | 5.56 | 6971.0 | 2815.0 | 27.2 | 63.5 | 154348.83 |
| 2 | STORE_002 | Suburban | 3786.0 | 13221.24 | 0.8 | 16 | 4.76 | 10890.0 | 2215.0 | 17.8 | 82.4 | 157303.11 |
| 3 | STORE_003 | Suburban | 8324.0 | 11251.15 | 10.2 | 12 | 1.3 | 9255.0 | 4233.0 | 22.16 | 65.1 | 163656.9 |
| 4 | STORE_004 | Urban | 7163.0 | 12291.86 | 11.4 | 18 | 4.29 | 10172.0 | 3510.0 | 24.36 | 86.1 | 201706.98 |
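Before training, each column's type matters: the training log below labels `store_location` as categorical and everything else, including the integer `staff_count`, as continuous. The sketch below shows one simple dtype-based rule that yields those labels on the first five rows. It is a hypothetical stand-in (`detect_variable_types` is not part of `ht_categ`), since the library's actual detection rule is not shown here.

```python
import pandas as pd

# First five rows of the data above (subset of columns)
sample = pd.DataFrame({
    'store_location': ['Suburban', 'Rural', 'Suburban', 'Suburban', 'Urban'],
    'staff_count': [9, 5, 16, 12, 18],
    'sales': [63267.9, 154348.83, 157303.11, 163656.9, 201706.98],
})

def detect_variable_types(frame):
    """Hypothetical rule: non-numeric columns are categorical, numeric ones continuous."""
    types = {}
    for col in frame.columns:
        if pd.api.types.is_numeric_dtype(frame[col]):
            types[col] = 'CONTINUOUS'
        else:
            classes = sorted(frame[col].dropna().unique())
            types[col] = f"CATEGORICAL ({len(classes)} classes: {classes})"
    return types

for col, kind in detect_variable_types(sample).items():
    print(f"{col}: {kind}")
```

A count-like integer column such as `staff_count` is treated as continuous under this rule; a stricter rule could also flag low-cardinality integers as categorical.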


## 2. Define Causal Graph

**Causal Structure:**
- `store_location → foot_traffic → sales`
- `store_size → inventory_level → sales`
- `marketing_spend → foot_traffic`
- `price_discount → conversion_rate → sales`
- `staff_count → customer_satisfaction → sales`
- `competitor_proximity → foot_traffic`

In [2]:
# Define the causal graph as a node list plus directed (parent, child) edges
nodes = ['store_location', 'store_size', 'marketing_spend', 'price_discount', 
         'staff_count', 'competitor_proximity', 'foot_traffic', 'inventory_level', 
         'conversion_rate', 'customer_satisfaction', 'sales']

edges = [
    ('store_location', 'foot_traffic'),
    ('marketing_spend', 'foot_traffic'),
    ('competitor_proximity', 'foot_traffic'),
    ('store_size', 'inventory_level'),
    ('price_discount', 'conversion_rate'),
    ('staff_count', 'customer_satisfaction'),
    ('foot_traffic', 'sales'),
    ('inventory_level', 'sales'),
    ('conversion_rate', 'sales'),
    ('customer_satisfaction', 'sales')
]

# Create adjacency matrix
adj_matrix = pd.DataFrame(0, index=nodes, columns=nodes)
for parent, child in edges:
    adj_matrix.loc[parent, child] = 1

print("Causal Graph:")
adj_matrix

Causal Graph:


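| | store_location | store_size | marketing_spend | price_discount | staff_count | competitor_proximity | foot_traffic | inventory_level | conversion_rate | customer_satisfaction | sales |
|---|---|---|---|---|---|---|---|---|---|---|---|
| store_location | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
| store_size | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
| marketing_spend | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
| price_discount | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
| staff_count | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
| competitor_proximity | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
| foot_traffic | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| inventory_level | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| conversion_rate | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| customer_satisfaction | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |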
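`networkx` is imported in the first cell but not used so far. A quick sanity check with it confirms that the edge list forms a directed acyclic graph (a requirement for this kind of structural causal model) and shows which nodes are roots and which directly cause `sales`. This is a minimal sketch that re-declares the same edge list so it runs on its own.

```python
import networkx as nx

# Same edge list as the cell above
edges = [
    ('store_location', 'foot_traffic'),
    ('marketing_spend', 'foot_traffic'),
    ('competitor_proximity', 'foot_traffic'),
    ('store_size', 'inventory_level'),
    ('price_discount', 'conversion_rate'),
    ('staff_count', 'customer_satisfaction'),
    ('foot_traffic', 'sales'),
    ('inventory_level', 'sales'),
    ('conversion_rate', 'sales'),
    ('customer_satisfaction', 'sales'),
]

G = nx.DiGraph(edges)

# A cycle would make the causal ordering (and per-node model training) ill-defined
if not nx.is_directed_acyclic_graph(G):
    raise ValueError("Causal graph contains a cycle")

root_nodes = sorted(n for n in G if G.in_degree(n) == 0)
sales_parents = sorted(G.predecessors('sales'))
sales_ancestors = nx.ancestors(G, 'sales')
order = list(nx.topological_sort(G))

print("Root nodes:", root_nodes)
print("Direct causes of sales:", sales_parents)
print("Ancestors of sales:", len(sales_ancestors))
print("Topological order ends with:", order[-1])
```

The six root nodes are exactly the ones the training log reports as "Root node (no parents)".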


## 3. Train Causal Model

In [3]:
import sys
sys.path.append('..')  # Adjust the path as needed to import ht_categ

In [4]:
from ht_categ import HT, HTConfig

# Create and train HT model
config = HTConfig(graph=adj_matrix, model_type='XGBoost')
ht_model = HT(config)
ht_model.train(df)

print("✓ Causal model trained")
print(f"\nModel metrics (R² scores):")
for node, metrics in ht_model.model_metrics.items():
    if 'r2' in metrics:
        print(f"  {node}: {metrics['r2']:.3f}")

🎓 TRAINING MODELS WITH QUALITY ASSESSMENT

📊 Detecting variable types...
   ✓ store_location: CATEGORICAL (3 classes: ['Rural', 'Suburban', 'Urban']...)
   ✓ store_size: CONTINUOUS
   ✓ marketing_spend: CONTINUOUS
   ✓ price_discount: CONTINUOUS
   ✓ staff_count: CONTINUOUS
   ✓ competitor_proximity: CONTINUOUS
   ✓ foot_traffic: CONTINUOUS
   ✓ inventory_level: CONTINUOUS
   ✓ conversion_rate: CONTINUOUS
   ✓ customer_satisfaction: CONTINUOUS
   ✓ sales: CONTINUOUS

🔧 Training models (model_type: XGBoost)...
   ✓ store_location: Root node (no parents) - baseline scaling only
   ✓ store_size: Root node (no parents) - baseline scaling only
   ✓ marketing_spend: Root node (no parents) - baseline scaling only
   ✓ price_discount: Root node (no parents) - baseline scaling only
   ✓ staff_count: Root node (no parents) - baseline scaling only
   ✓ competitor_proximity: Root node (no parents) - baseline scaling only
   ✓ foot_traffic: regressor tra
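The log suggests the pattern: root nodes only get baseline statistics, and every other node gets its own regressor fit on its parents. The sketch below illustrates that pattern on one hypothetical branch (`marketing_spend → foot_traffic → sales`) with synthetic data, using ordinary least squares as a stand-in for HT's XGBoost regressors. The data-generating coefficients are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Hypothetical synthetic data for one branch: marketing_spend -> foot_traffic -> sales
marketing_spend = rng.uniform(2_000, 15_000, n)
foot_traffic = 1_000 + 0.6 * marketing_spend + rng.normal(0, 500, n)
sales = 5_000 + 12.0 * foot_traffic + rng.normal(0, 5_000, n)
data = {'marketing_spend': marketing_spend, 'foot_traffic': foot_traffic, 'sales': sales}

parents = {'marketing_spend': [], 'foot_traffic': ['marketing_spend'], 'sales': ['foot_traffic']}

def fit_node(child, parent_names):
    """Least-squares fit of a node on its parents; returns (coefficients, R²)."""
    y = data[child]
    X = np.column_stack([np.ones(n)] + [data[p] for p in parent_names])
    coefs, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coefs
    r2 = 1 - resid.var() / y.var()
    return coefs, r2

models = {}
for node, pa in parents.items():
    if not pa:
        # Root node: nothing to regress on, keep baseline statistics only
        models[node] = {'mean': data[node].mean(), 'std': data[node].std()}
    else:
        coefs, r2 = fit_node(node, pa)
        models[node] = {'coefs': coefs, 'r2': r2}
        print(f"{node}: R² = {r2:.3f}, coefs = {np.round(coefs, 2)}")
```

Swapping `fit_node` for a gradient-boosted regressor keeps the same structure; only the per-node model changes.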

## 4. Find Optimal Interventions

**Objective**: Increase sales by 20% with high confidence

In [None]:
from intervention_search import InterventionSearch

# Initialize intervention search; more simulations give more stable CI estimates
searcher = InterventionSearch(
    graph=ht_model.graph,
    ht_model=ht_model,
    n_simulations=5000  # Increased from 1000: less Monte Carlo noise in the CI bounds (model uncertainty is unchanged)
)

# Search for interventions to increase sales by 20%
results = searcher.find_interventions(
    target_outcome='sales',
    target_change=20.0,  # +20% increase
    tolerance=3.0,       # ±3% tolerance
    confidence_level=0.90,
    max_intervention_pct=30.0,
    verbose=True
)
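The search has to find how large a change on a node produces the requested +20% (within ±3%) without exceeding `max_intervention_pct`. The result reports `search_iterations: 8`, but the search strategy itself is not shown. For a single node whose effect grows monotonically with the change, bisection is one simple way to do it. The sketch below assumes a hypothetical response curve (`toy_effect_pct`, with an invented exponent of 1.2) in place of the trained model's simulated effect.

```python
def toy_effect_pct(pct_change):
    """Hypothetical monotone response: % change in sales for a % change in one node."""
    return 100.0 * ((1 + pct_change / 100.0) ** 1.2 - 1)

def search_pct_change(effect_fn, target, tolerance, max_pct, precision=0.01):
    """Bisect on [0, max_pct] for the smallest change whose effect reaches the target."""
    lo, hi = 0.0, max_pct
    if effect_fn(hi) < target:
        return None  # target not reachable within the allowed range
    iterations = 0
    while hi - lo > precision:
        mid = (lo + hi) / 2
        if effect_fn(mid) < target:
            lo = mid
        else:
            hi = mid
        iterations += 1
    effect = effect_fn(hi)
    return {
        'pct_change': hi,
        'effect': effect,
        'iterations': iterations,
        'within_tolerance': abs(effect - target) <= tolerance,
    }

result = search_pct_change(toy_effect_pct, target=20.0, tolerance=3.0, max_pct=30.0)
print(result)
```

Under this toy curve the required change is about +16.4%, reached in 12 halvings of the 30% range. Bisection relies on monotonicity; with several nodes at once the search becomes a multi-dimensional optimization instead.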

In [6]:
results

{'best_intervention': {'intervention_type': 'single',
  'nodes': ['conversion_rate'],
  'required_pct_changes': {'conversion_rate': 16.67},
  'actual_effect': 21.2,
  'error_from_target': 1.2,
  'within_tolerance': True,
  'ci_90': (-206.13745818685257, 50546.84328215676),
  'ci_50': (18231.231489011454, 36524.92158498851),
  'prediction_uncertainty_std': 14807.829666218966,
  'confidence': 0.4854903937081826,
  'search_iterations': 8,
  'validation': {'is_valid': True,
   'is_feasible': True,
   'is_safe': True,
   'confidence_adjustment': 1.0,
   'errors': []},
  'ranking_scores': {'overall': 0.625,
   'accuracy': 1.0,
   'uncertainty': 4.496668749592266e-304,
   'model_quality': 0.5,
   'simplicity': 1.0,
   'safety_multiplier': 1.0},
  'overall_score': 0.625,
  'rank': 1},
 'all_candidates': [{'intervention_type': 'single',
   'nodes': ['conversion_rate'],
   'required_pct_changes': {'conversion_rate': 16.67},
   'actual_effect': 21.2,
   'error_from_target': 1.2,
   'within_tolera

## 5. Analyze Best Intervention

In [7]:
best = results['best_intervention']

print("\n" + "="*70)
print("RECOMMENDED INTERVENTION")
print("="*70)
print(f"\nIntervene on: {', '.join(best['nodes'])}")
print(f"\nRequired changes:")
for node, change in best['required_pct_changes'].items():
    baseline = ht_model.baseline_stats[node]['mean']
    new_value = baseline * (1 + change/100)
    print(f"  • {node}: {change:+.1f}% (from {baseline:.0f} to {new_value:.0f})")

print(f"\nExpected Impact:")
print(f"  • Predicted sales change: {best['actual_effect']:+.1f}% (target: +20.0%)")
# ci_90 / ci_50 are not on the % scale of actual_effect; they appear to be absolute sales units
print(f"  • 90% Confidence Interval (sales units): [{best['ci_90'][0]:+.1f}, {best['ci_90'][1]:+.1f}]")
print(f"  • 50% Confidence Interval (sales units): [{best['ci_50'][0]:+.1f}, {best['ci_50'][1]:+.1f}]")
print(f"  • Confidence Score: {best['confidence']:.0%}")
print(f"  • Status: {'✅ APPROVED' if best['within_tolerance'] else '❌ NOT APPROVED'}")
print("="*70)


RECOMMENDED INTERVENTION

Intervene on: conversion_rate

Required changes:
  • conversion_rate: +16.7% (from 24 to 28)

Expected Impact:
  • Predicted sales change: +21.2% (target: +20.0%)
  • 90% Confidence Interval (sales units): [-206.1, +50546.8]
  • 50% Confidence Interval (sales units): [+18231.2, +36524.9]
  • Confidence Score: 49%
  • Status: ✅ APPROVED
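The `ci_90` and `ci_50` values in the results are clearly not percentages: an upper bound of 50,546.8 cannot be a percent change around a +21.2% point estimate. Assuming they are absolute changes in sales, they can be put on the same scale as `actual_effect` by dividing by baseline sales. The sketch below uses a hypothetical baseline of 130,000; in the notebook you would use the observed mean, e.g. `df['sales'].mean()`.

```python
def ci_to_pct(ci_abs, baseline_mean):
    """Express an absolute-change interval as a % of a baseline mean."""
    lo, hi = ci_abs
    return (100 * lo / baseline_mean, 100 * hi / baseline_mean)

# Values copied from results['best_intervention'] above
ci_90_abs = (-206.13745818685257, 50546.84328215676)
ci_50_abs = (18231.231489011454, 36524.92158498851)

baseline_sales = 130_000.0  # hypothetical; replace with df['sales'].mean()

ci90_pct = ci_to_pct(ci_90_abs, baseline_sales)
ci50_pct = ci_to_pct(ci_50_abs, baseline_sales)
print(f"90% CI: [{ci90_pct[0]:+.1f}%, {ci90_pct[1]:+.1f}%]")
print(f"50% CI: [{ci50_pct[0]:+.1f}%, {ci50_pct[1]:+.1f}%]")
```

Whatever baseline is used, the 90% interval's lower bound stays negative, so the model does not rule out a small drop in sales even though the point estimate is within tolerance.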


## 6. Compare Top Interventions

In [None]:
# Show top 5 interventions
print("\nTop 5 Interventions:\n")
for i, candidate in enumerate(results['all_candidates'][:5], 1):
    print(f"{i}. {', '.join(candidate['nodes'])}")
    print(f"   Effect: {candidate['actual_effect']:+.1f}% | "
          f"Confidence: {candidate['confidence']:.0%} | "
          f"Status: {'✅' if candidate.get('within_tolerance', False) else '⚠️'}")
    print(f"   Changes: {candidate['required_pct_changes']}\n")
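When comparing candidates, it helps to see how the `overall` score is built. For the best intervention, the reported components (accuracy 1.0, uncertainty ≈ 0, model_quality 0.5, simplicity 1.0, safety multiplier 1.0) are consistent with an equal-weight mean multiplied by the safety multiplier. That combination rule is an inference from the numbers, not a documented formula.

```python
# Component scores copied from results['best_intervention']['ranking_scores']
ranking_scores = {
    'accuracy': 1.0,
    'uncertainty': 4.496668749592266e-304,
    'model_quality': 0.5,
    'simplicity': 1.0,
}
safety_multiplier = 1.0

# Assumed rule: equal-weight mean of the components, scaled by the safety multiplier
overall = safety_multiplier * sum(ranking_scores.values()) / len(ranking_scores)
print(round(overall, 3))  # 0.625
```

Under this reading, the uncertainty component contributes essentially nothing: the very wide confidence interval drives it to zero, and the candidate ranks first on accuracy and simplicity alone.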

## 7. Path Analysis

Identify which causal paths contribute most to the effect.

In [9]:
if 'path_analysis' in results:
    path_info = results['path_analysis']
    print("\nCausal Path Sensitivity Analysis:")
    print(f"  • Total paths analyzed: {path_info.get('total_paths', 'N/A')}")
    print(f"  • High quality paths: {path_info.get('high_quality_paths', 'N/A')}")
    print(f"  • Average path quality: {path_info.get('avg_path_quality', 0):.3f}")


Causal Path Sensitivity Analysis:
  • Total paths analyzed: N/A
  • High quality paths: N/A
  • Average path quality: 0.000
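The path-analysis fields come back as N/A (and the 0.000 average is just the `.get` default), so `results['path_analysis']` apparently does not expose these keys. The causal paths themselves can be enumerated directly from the graph. This is a minimal sketch with a depth-first search over the same edge list; `all_paths` is a hypothetical helper, not part of the Intervention Search API.

```python
from collections import defaultdict

edges = [
    ('store_location', 'foot_traffic'),
    ('marketing_spend', 'foot_traffic'),
    ('competitor_proximity', 'foot_traffic'),
    ('store_size', 'inventory_level'),
    ('price_discount', 'conversion_rate'),
    ('staff_count', 'customer_satisfaction'),
    ('foot_traffic', 'sales'),
    ('inventory_level', 'sales'),
    ('conversion_rate', 'sales'),
    ('customer_satisfaction', 'sales'),
]

children = defaultdict(list)
for parent, child in edges:
    children[parent].append(child)

def all_paths(source, target, path=None):
    """Enumerate every directed path from source to target (graph assumed acyclic)."""
    path = (path or []) + [source]
    if source == target:
        return [path]
    paths = []
    for nxt in children[source]:
        paths.extend(all_paths(nxt, target, path))
    return paths

for node in ['conversion_rate', 'marketing_spend', 'staff_count']:
    for p in all_paths(node, 'sales'):
        print(' → '.join(p))
```

In this graph every node reaches `sales` through exactly one path, so the effect of the recommended `conversion_rate` change flows entirely through the single edge `conversion_rate → sales`. On a `networkx` graph, `nx.all_simple_paths(G, source, target)` gives the same enumeration.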


## 8. Business Interpretation

### Key Insights:

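1. **Primary Levers**: The analysis identifies which operational variables have the strongest causal impact on sales; here, a +16.7% increase in `conversion_rate` is predicted to lift sales by +21.2%.
2. **Confidence Levels**: The confidence score summarizes how reliable a prediction is. The best intervention scores only 49%, so its predicted effect should be treated as a hypothesis to test rather than a guarantee.
3. **Uncertainty**: Confidence intervals are estimated by Monte Carlo simulation of the causal model. For the best intervention, the lower bound of the 90% interval is negative, so a small sales decrease cannot be ruled out.
4. **Feasibility**: Interventions are checked for out-of-distribution values and practical constraints (see the `validation` fields `is_feasible` and `is_safe`).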
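The Monte Carlo intervals above can be reproduced in spirit with percentiles of simulated draws. The sketch below assumes normally distributed draws of the sales change with a hypothetical center of 27,000 and a spread close to the reported `prediction_uncertainty_std` (≈14,800); the actual simulator draws from the trained model instead.

```python
import numpy as np

rng = np.random.default_rng(0)
n_simulations = 5000

# Hypothetical simulated changes in sales for one intervention
# (center invented; spread echoes prediction_uncertainty_std)
simulated_change = rng.normal(loc=27_000, scale=14_800, size=n_simulations)

# Percentile intervals: 90% CI = 5th to 95th percentile, 50% CI = 25th to 75th
ci_90 = tuple(np.percentile(simulated_change, [5, 95]))
ci_50 = tuple(np.percentile(simulated_change, [25, 75]))
prob_decrease = float(np.mean(simulated_change < 0))

print(f"90% CI: [{ci_90[0]:,.0f}, {ci_90[1]:,.0f}]")
print(f"50% CI: [{ci_50[0]:,.0f}, {ci_50[1]:,.0f}]")
print(f"P(sales change < 0): {prob_decrease:.1%}")
```

With σ ≈ 14,800 the 90% interval is roughly ±24,000 around its center, which is why it reaches below zero. Raising `n_simulations` makes these percentile estimates less noisy but does not narrow the interval; only a more certain model does.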

### Recommended Actions:

Based on the best intervention identified:
- Implement the recommended changes gradually
- Monitor actual vs. predicted outcomes
- Consider multi-node interventions for robust improvements
- Focus on high-quality causal paths for maximum reliability
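Monitoring actual against predicted outcomes works best with a control group, so that market-wide trends are not mistaken for the intervention's effect. The sketch below uses a simple difference-in-differences on growth rates: the uplift is treated-store growth minus control-store growth, compared with the predicted +21.2% and the ±3% tolerance from the search. The pilot sales figures and the helper `pilot_uplift_pct` are hypothetical.

```python
from statistics import mean

def pilot_uplift_pct(treated_before, treated_after, control_before, control_after):
    """Difference-in-differences on growth: treated growth minus control growth, in % points."""
    treated_growth = mean(treated_after) / mean(treated_before) - 1
    control_growth = mean(control_after) / mean(control_before) - 1
    return 100 * (treated_growth - control_growth)

# Hypothetical pilot: sales per store before and after the intervention
treated_before = [150_000, 140_000, 160_000]
treated_after = [178_500, 166_600, 190_400]
control_before = [145_000, 155_000, 150_000]
control_after = [147_900, 158_100, 153_000]

predicted_effect = 21.2  # best['actual_effect'] from the search
tolerance = 3.0          # same tolerance passed to find_interventions

uplift = pilot_uplift_pct(treated_before, treated_after, control_before, control_after)
within = abs(uplift - predicted_effect) <= tolerance
print(f"Observed uplift: {uplift:+.1f} pts vs predicted {predicted_effect:+.1f}% (within tolerance: {within})")
```

Here the treated stores grew 19% and the controls 2%, so the estimated uplift of +17.0 points falls short of the prediction by more than the tolerance, a signal to revisit the model before rolling out further.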

## Summary

This notebook demonstrated:
- ✅ Loading and preparing retail data
- ✅ Defining causal graph structure
- ✅ Training causal models with HT
- ✅ Finding optimal interventions with uncertainty quantification
- ✅ Analyzing causal paths and model quality
- ✅ Interpreting results for business decisions