# 🎯 AI Customer Personalization Platform - Complete Demo

This notebook demonstrates an AI-powered customer personalization platform end to end: it tracks user behavior, detects cart abandonment, and uses machine learning to decide which personalized voucher to send.

## 🏗️ Platform Architecture

The platform consists of four main components:

1. **Event Simulation** - Generates synthetic user behavior data
2. **ML Training Pipeline** - Trains models for conversion prediction and voucher response
3. **Real-time Decision Service** - FastAPI service for real-time decisions
4. **Frontend Tracking** - JavaScript library for user behavior tracking

Let's explore each component in detail!


In [None]:
# Import required libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots
import warnings
warnings.filterwarnings('ignore')

# Set up plotting style
plt.style.use('seaborn-v0_8')
sns.set_palette("husl")

print("🚀 AI Customer Personalization Platform Demo")
print("=" * 50)


## 1. 📊 Event Simulation and Data Generation

First, let's generate synthetic user behavior data to train our ML models. This simulates real e-commerce user interactions.
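
As a point of comparison, here is a minimal sketch of a funnel-style simulator in which each session moves through the event types in order and stops at the first failed step, so that cart and purchase flags can be derived from the event log itself. The funnel probabilities and the function names `simulate_funnel_sessions` and `session_flags` are assumptions made for illustration; they are not taken from `simulate_events.py`.

```python
import numpy as np
import pandas as pd

# Hypothetical step-through probabilities (assumed values for illustration).
FUNNEL = [
    ("page_view", 1.0),
    ("product_view", 0.7),
    ("add_to_cart", 0.4),
    ("purchase", 0.3),
]

def simulate_funnel_sessions(n_sessions, seed=42):
    """Emit events per session in funnel order, stopping at the first failed step."""
    rng = np.random.default_rng(seed)
    rows = []
    for s in range(n_sessions):
        for step, p_continue in FUNNEL:
            if rng.random() >= p_continue:
                break
            rows.append({"session_id": f"sess_{s:06d}", "event_type": step})
    return pd.DataFrame(rows)

def session_flags(events):
    """Derive had_cart / had_purchase / abandoned_cart from the events."""
    by_session = events["session_id"]
    flags = pd.DataFrame({
        "had_cart": events["event_type"].eq("add_to_cart").groupby(by_session).any(),
        "had_purchase": events["event_type"].eq("purchase").groupby(by_session).any(),
    })
    flags["abandoned_cart"] = flags["had_cart"] & ~flags["had_purchase"]
    return flags.reset_index()

demo_events = simulate_funnel_sessions(500)
demo_flags = session_flags(demo_events)
print(demo_flags[["had_cart", "had_purchase", "abandoned_cart"]].mean())
```

Because a purchase can only follow an add-to-cart event in this sketch, every purchasing session also has a cart, which keeps the abandonment flag consistent with the events.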


In [None]:
# Run the event simulation
import os
import sys

# Add current directory to path
sys.path.append('.')

# Import and run the simulation
from simulate_events import main as run_simulation

print("🔄 Generating synthetic user behavior data...")
try:
    events_df, sessions_df = run_simulation()
    print("✅ Data generation completed successfully!")
except Exception as e:
    print(f"❌ Error generating data: {e}")
    print("📝 Creating sample data for demo...")
    
    # Create sample data for demo
    np.random.seed(42)
    n_events = 5000
    n_users = 200
    n_products = 50
    
    events_data = []
    for i in range(n_events):
        events_data.append({
            'event_id': f'evt_{i:06d}',
            'user_id': f'user_{np.random.randint(0, n_users):03d}',
            'session_id': f'sess_{np.random.randint(0, n_events//10):06d}',
            'timestamp': pd.Timestamp.now() - pd.Timedelta(days=np.random.randint(0, 30)),
            'event_type': np.random.choice(['page_view', 'product_view', 'add_to_cart', 'purchase'], 
                                         p=[0.4, 0.3, 0.2, 0.1]),
            'product_id': f'SKU_{np.random.randint(0, n_products):03d}',
            'price': np.random.uniform(10, 200),
            'quantity': np.random.randint(1, 3),
            'cart_value': np.random.uniform(0, 500),
            'page_url': f'/product/SKU_{np.random.randint(0, n_products):03d}',
            'referrer': np.random.choice(['direct', 'google', 'facebook', 'email']),
            'device': np.random.choice(['desktop', 'mobile', 'tablet']),
            'country': np.random.choice(['US', 'UK', 'CA', 'AU']),
            'user_segment': np.random.choice(['high_value', 'regular', 'occasional'], 
                                           p=[0.1, 0.3, 0.6])
        })
    
    events_df = pd.DataFrame(events_data)
    
    # Create sessions summary
    sessions_data = events_df.groupby('session_id').agg({
        'user_id': 'first',
        'timestamp': ['min', 'max'],
        'event_type': 'count',
        'cart_value': 'max',
        'user_segment': 'first',
        'country': 'first',
        'device': 'first'
    }).reset_index()
    
    sessions_data.columns = [
        'session_id', 'user_id', 'session_start', 'session_end', 
        'event_count', 'max_cart_value', 'user_segment', 'country', 'device'
    ]
    
    sessions_data['session_duration_minutes'] = (
        sessions_data['session_end'] - sessions_data['session_start']
    ).dt.total_seconds() / 60
    
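    # NOTE: for the demo these flags are sampled independently of the events above,
    # so they will not match the add_to_cart / purchase events in events_df
    sessions_data['had_purchase'] = np.random.choice([True, False], len(sessions_data), p=[0.2, 0.8])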
    sessions_data['had_cart'] = np.random.choice([True, False], len(sessions_data), p=[0.4, 0.6])
    sessions_data['abandoned_cart'] = sessions_data['had_cart'] & ~sessions_data['had_purchase']
    
    sessions_df = sessions_data
    
    print("✅ Sample data created for demo!")

print(f"\n📊 Data Summary:")
print(f"   - Events: {len(events_df):,}")
print(f"   - Sessions: {len(sessions_df):,}")
print(f"   - Users: {events_df['user_id'].nunique():,}")
print(f"   - Products: {events_df['product_id'].nunique():,}")
print(f"   - Date Range: {events_df['timestamp'].min()} to {events_df['timestamp'].max()}")


In [None]:
# Visualize the generated data
fig, axes = plt.subplots(2, 3, figsize=(18, 12))
fig.suptitle('📊 User Behavior Data Analysis', fontsize=16, fontweight='bold')

# Event types distribution
event_counts = events_df['event_type'].value_counts()
axes[0, 0].pie(event_counts.values, labels=event_counts.index, autopct='%1.1f%%', startangle=90)
axes[0, 0].set_title('Event Types Distribution')

# User segments
segment_counts = events_df['user_segment'].value_counts()
axes[0, 1].bar(segment_counts.index, segment_counts.values, color=['#ff6b6b', '#4ecdc4', '#45b7d1'])
axes[0, 1].set_title('User Segments')
axes[0, 1].set_ylabel('Number of Events')

# Device types
device_counts = events_df['device'].value_counts()
axes[0, 2].bar(device_counts.index, device_counts.values, color=['#96ceb4', '#feca57', '#ff9ff3'])
axes[0, 2].set_title('Device Types')
axes[0, 2].set_ylabel('Number of Events')

# Cart value distribution
cart_sessions = sessions_df[sessions_df['max_cart_value'] > 0]
axes[1, 0].hist(cart_sessions['max_cart_value'], bins=30, alpha=0.7, color='#a8e6cf')
axes[1, 0].set_title('Cart Value Distribution')
axes[1, 0].set_xlabel('Cart Value ($)')
axes[1, 0].set_ylabel('Frequency')

# Session duration
axes[1, 1].hist(sessions_df['session_duration_minutes'], bins=30, alpha=0.7, color='#ffd3a5')
axes[1, 1].set_title('Session Duration Distribution')
axes[1, 1].set_xlabel('Duration (minutes)')
axes[1, 1].set_ylabel('Frequency')

# Conversion rates by segment
conversion_by_segment = sessions_df.groupby('user_segment')['had_purchase'].mean()
axes[1, 2].bar(conversion_by_segment.index, conversion_by_segment.values, color=['#ff6b6b', '#4ecdc4', '#45b7d1'])
axes[1, 2].set_title('Conversion Rate by User Segment')
axes[1, 2].set_ylabel('Conversion Rate')
axes[1, 2].set_ylim(0, 1)

plt.tight_layout()
plt.show()

# Print key statistics
print("\n📈 Key Statistics:")
print(f"   - Overall Conversion Rate: {sessions_df['had_purchase'].mean():.2%}")
print(f"   - Cart Abandonment Rate: {sessions_df['abandoned_cart'].mean():.2%}")
print(f"   - Average Cart Value: ${sessions_df[sessions_df['max_cart_value'] > 0]['max_cart_value'].mean():.2f}")
print(f"   - Average Session Duration: {sessions_df['session_duration_minutes'].mean():.1f} minutes")

print(f"\n🎯 Conversion Rates by Segment:")
for segment in conversion_by_segment.index:
    print(f"   - {segment}: {conversion_by_segment[segment]:.2%}")


## 2. 🤖 Machine Learning Model Training

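Now let's train the conversion-prediction models. We engineer RFM, behavioral, and temporal features, then compare Logistic Regression, Random Forest, and LightGBM. Voucher response is handled by a heuristic in the next section rather than by a trained model.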
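
The per-session behavioral counts can also be computed without a Python loop over sessions. The following is a minimal sketch using `pd.crosstab`, assuming the `session_id` and `event_type` columns produced by the simulation; the helper name `behavioral_counts` and the toy data are hypothetical.

```python
import pandas as pd

def behavioral_counts(events):
    """Count event types per session and flag single-event (bounce) sessions."""
    counts = pd.crosstab(events["session_id"], events["event_type"])
    # Guarantee the three columns exist even if an event type never occurs
    counts = counts.reindex(
        columns=["page_view", "product_view", "add_to_cart"], fill_value=0
    )
    counts.columns = ["page_views", "product_views", "add_to_cart_events"]
    # A session with exactly one event counts as a bounce
    counts["bounce_rate"] = (events.groupby("session_id").size() == 1).astype(int)
    return counts.reset_index()

# Toy data for illustration only
toy_events = pd.DataFrame({
    "session_id": ["a", "a", "a", "b"],
    "event_type": ["page_view", "page_view", "add_to_cart", "product_view"],
})
toy_features = behavioral_counts(toy_events)
print(toy_features)
```

On large event logs a single grouped pass like this is typically much faster than filtering the events once per session.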


In [None]:
# Feature Engineering and Model Training
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler, LabelEncoder
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix, roc_auc_score, roc_curve
import lightgbm as lgb
import joblib

print("🔧 Feature Engineering...")

# Create comprehensive feature set
def engineer_features(events_df, sessions_df):
    """Engineer features for ML models"""
    
    # Start with session-level features
    features_df = sessions_df.copy()
    
    # User-level RFM features
    now = events_df['timestamp'].max()
    purchases = events_df[events_df['event_type'] == 'purchase']
    
    if len(purchases) > 0:
        user_rfm = purchases.groupby('user_id').agg({
            'timestamp': ['max', 'min', 'count'],
            'price': ['sum', 'mean']
        }).reset_index()
        
        user_rfm.columns = [
            'user_id', 'last_purchase_date', 'first_purchase_date', 
            'frequency', 'monetary_value', 'avg_order_value'
        ]
        
        user_rfm['recency_days'] = (now - user_rfm['last_purchase_date']).dt.days
        user_rfm['days_since_first_purchase'] = (now - user_rfm['first_purchase_date']).dt.days
        
        # Add users with no purchases
        all_users = events_df['user_id'].unique()
        users_with_purchases = user_rfm['user_id'].unique()
        users_without_purchases = set(all_users) - set(users_with_purchases)
        
        if users_without_purchases:
            no_purchase_df = pd.DataFrame({
                'user_id': list(users_without_purchases),
                'last_purchase_date': pd.NaT,  # NaT keeps the column datetime-typed
                'first_purchase_date': pd.NaT,
                'frequency': 0,
                'monetary_value': 0,
                'avg_order_value': 0,
                'recency_days': 999,
                'days_since_first_purchase': 999
            })
            user_rfm = pd.concat([user_rfm, no_purchase_df], ignore_index=True)
    else:
        # Create dummy RFM features
        users = events_df['user_id'].unique()
        user_rfm = pd.DataFrame({
            'user_id': users,
            'recency_days': 999,
            'frequency': 0,
            'monetary_value': 0,
            'avg_order_value': 0,
            'days_since_first_purchase': 999
        })
    
    # Merge with session features
    features_df = features_df.merge(user_rfm[['user_id', 'recency_days', 'frequency', 
                                            'monetary_value', 'avg_order_value', 'days_since_first_purchase']], 
                                  on='user_id', how='left')
    
    # Behavioral features
    behavioral_features = []
    for session_id in sessions_df['session_id']:
        session_events = events_df[events_df['session_id'] == session_id]
        
        features = {
            'session_id': session_id,
            'page_views': len(session_events[session_events['event_type'] == 'page_view']),
            'product_views': len(session_events[session_events['event_type'] == 'product_view']),
            'add_to_cart_events': len(session_events[session_events['event_type'] == 'add_to_cart']),
            'unique_products_viewed': session_events[session_events['event_type'] == 'product_view']['product_id'].nunique(),
            'avg_product_price_viewed': session_events[session_events['event_type'] == 'product_view']['price'].mean() if len(session_events[session_events['event_type'] == 'product_view']) > 0 else 0,
            'bounce_rate': 1 if len(session_events) == 1 else 0
        }
        
        behavioral_features.append(features)
    
    behavioral_df = pd.DataFrame(behavioral_features)
    features_df = features_df.merge(behavioral_df, on='session_id', how='left')
    
    # Temporal features
    features_df['hour_of_day'] = features_df['session_start'].dt.hour
    features_df['day_of_week'] = features_df['session_start'].dt.dayofweek
    features_df['is_weekend'] = features_df['day_of_week'].isin([5, 6]).astype(int)
    features_df['is_business_hours'] = features_df['hour_of_day'].between(9, 17).astype(int)
    
    # Fill missing values
    numeric_columns = features_df.select_dtypes(include=[np.number]).columns
    features_df[numeric_columns] = features_df[numeric_columns].fillna(0)
    
    return features_df

# Engineer features
features_df = engineer_features(events_df, sessions_df)
print(f"✅ Feature engineering complete. Dataset shape: {features_df.shape}")

# Prepare conversion prediction data
cart_sessions = features_df[features_df['had_cart'] == True].copy()

# Create conversion labels (simplified for demo)
conversion_probability = (
    (cart_sessions['user_segment'] == 'high_value') * 0.3 +
    (cart_sessions['user_segment'] == 'regular') * 0.15 +
    (cart_sessions['user_segment'] == 'occasional') * 0.05 +
    (cart_sessions['max_cart_value'] > 100) * 0.2 +
    (cart_sessions['max_cart_value'] > 200) * 0.1
)

cart_sessions['converted_within_7d'] = np.random.binomial(1, conversion_probability)

# Select features for conversion model
feature_columns = [
    'max_cart_value', 'event_count', 'session_duration_minutes',
    'recency_days', 'frequency', 'monetary_value', 'avg_order_value',
    'page_views', 'product_views', 'add_to_cart_events', 'unique_products_viewed',
    'avg_product_price_viewed', 'bounce_rate', 'hour_of_day', 'day_of_week', 
    'is_weekend', 'is_business_hours'
]

# Handle categorical variables
categorical_columns = ['user_segment', 'country', 'device']
label_encoders = {}

for col in categorical_columns:
    if col in cart_sessions.columns:
        le = LabelEncoder()
        cart_sessions[f'{col}_encoded'] = le.fit_transform(cart_sessions[col].astype(str))
        label_encoders[col] = le
        feature_columns.append(f'{col}_encoded')

X = cart_sessions[feature_columns].fillna(0)
y = cart_sessions['converted_within_7d']

print(f"📊 Conversion dataset: {X.shape[0]} samples, {X.shape[1]} features")
print(f"📊 Conversion rate: {y.mean():.2%}")


In [None]:
# Train multiple models and compare performance
print("🤖 Training ML Models...")

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42, stratify=y)

# Scale features
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Train different models
models = {
    'Logistic Regression': LogisticRegression(random_state=42, max_iter=1000),
    'Random Forest': RandomForestClassifier(n_estimators=100, random_state=42),
    'LightGBM': lgb.LGBMClassifier(random_state=42, verbose=-1)
}

model_results = {}
trained_models = {}

for name, model in models.items():
    print(f"\n🔄 Training {name}...")
    
    if name == 'LightGBM':
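        # LightGBM-specific training via the native API; this replaces the
        # LGBMClassifier placeholder from the `models` dict with a Booster.
        # NOTE: the test split doubles as the early-stopping validation set,
        # so the AUC reported for LightGBM is optimistic.
        train_data = lgb.Dataset(X_train_scaled, label=y_train)
        val_data = lgb.Dataset(X_test_scaled, label=y_test, reference=train_data)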
        
        params = {
            'objective': 'binary',
            'metric': 'auc',
            'boosting_type': 'gbdt',
            'num_leaves': 31,
            'learning_rate': 0.05,
            'feature_fraction': 0.9,
            'bagging_fraction': 0.8,
            'bagging_freq': 5,
            'verbose': -1,
            'random_state': 42
        }
        
        model = lgb.train(
            params,
            train_data,
            valid_sets=[val_data],
            num_boost_round=1000,
            callbacks=[lgb.early_stopping(100), lgb.log_evaluation(0)]
        )
        
        # Predict
        y_pred_proba = model.predict(X_test_scaled)
        y_pred = (y_pred_proba > 0.5).astype(int)
        
    else:
        # Standard sklearn models
        model.fit(X_train_scaled, y_train)
        y_pred_proba = model.predict_proba(X_test_scaled)[:, 1]
        y_pred = model.predict(X_test_scaled)
    
    # Calculate metrics
    auc_score = roc_auc_score(y_test, y_pred_proba)
    accuracy = (y_pred == y_test).mean()
    
    model_results[name] = {
        'auc': auc_score,
        'accuracy': accuracy,
        'predictions': y_pred_proba
    }
    
    trained_models[name] = model
    
    print(f"   ✅ {name} - AUC: {auc_score:.3f}, Accuracy: {accuracy:.3f}")

# Find best model
best_model_name = max(model_results.keys(), key=lambda x: model_results[x]['auc'])
best_model = trained_models[best_model_name]

print(f"\n🏆 Best Model: {best_model_name} (AUC: {model_results[best_model_name]['auc']:.3f})")

# Save the best model
os.makedirs('models', exist_ok=True)
joblib.dump(best_model, 'models/conversion_model.joblib')
joblib.dump(scaler, 'models/scaler.joblib')
joblib.dump(label_encoders, 'models/label_encoders.joblib')

print("💾 Models saved to models/ directory")


In [None]:
# Visualize model performance
fig, axes = plt.subplots(2, 2, figsize=(15, 12))
fig.suptitle('🤖 Model Performance Analysis', fontsize=16, fontweight='bold')

# Model comparison
model_names = list(model_results.keys())
auc_scores = [model_results[name]['auc'] for name in model_names]
accuracies = [model_results[name]['accuracy'] for name in model_names]

# AUC comparison
axes[0, 0].bar(model_names, auc_scores, color=['#ff6b6b', '#4ecdc4', '#45b7d1'])
axes[0, 0].set_title('Model AUC Comparison')
axes[0, 0].set_ylabel('AUC Score')
axes[0, 0].set_ylim(0, 1)
for i, v in enumerate(auc_scores):
    axes[0, 0].text(i, v + 0.01, f'{v:.3f}', ha='center', va='bottom')

# Accuracy comparison
axes[0, 1].bar(model_names, accuracies, color=['#ff6b6b', '#4ecdc4', '#45b7d1'])
axes[0, 1].set_title('Model Accuracy Comparison')
axes[0, 1].set_ylabel('Accuracy')
axes[0, 1].set_ylim(0, 1)
for i, v in enumerate(accuracies):
    axes[0, 1].text(i, v + 0.01, f'{v:.3f}', ha='center', va='bottom')

# ROC Curves
for name, results in model_results.items():
    fpr, tpr, _ = roc_curve(y_test, results['predictions'])
    axes[1, 0].plot(fpr, tpr, label=f'{name} (AUC = {results["auc"]:.3f})', linewidth=2)

axes[1, 0].plot([0, 1], [0, 1], 'k--', alpha=0.5)
axes[1, 0].set_xlabel('False Positive Rate')
axes[1, 0].set_ylabel('True Positive Rate')
axes[1, 0].set_title('ROC Curves')
axes[1, 0].legend()
axes[1, 0].grid(True, alpha=0.3)

# Feature importance (for tree-based models)
if best_model_name in ['Random Forest', 'LightGBM']:
    if best_model_name == 'Random Forest':
        importance = best_model.feature_importances_
    else:  # LightGBM
        importance = best_model.feature_importance()
    
    # Use the real column names; they are in the same order as the columns of X
    feature_names = list(feature_columns)
    importance_df = pd.DataFrame({
        'feature': feature_names,
        'importance': importance
    }).sort_values('importance', ascending=True)
    
    axes[1, 1].barh(importance_df['feature'][-10:], importance_df['importance'][-10:], color='#a8e6cf')
    axes[1, 1].set_title(f'Top 10 Feature Importance ({best_model_name})')
    axes[1, 1].set_xlabel('Importance')
else:
    axes[1, 1].text(0.5, 0.5, 'Feature importance\nnot available for\nLogistic Regression', 
                    ha='center', va='center', transform=axes[1, 1].transAxes, fontsize=12)
    axes[1, 1].set_title('Feature Importance')

plt.tight_layout()
plt.show()

# Print detailed results
print("\n📊 Model Performance Summary:")
print("=" * 50)
for name, results in model_results.items():
    print(f"{name}:")
    print(f"  - AUC: {results['auc']:.3f}")
    print(f"  - Accuracy: {results['accuracy']:.3f}")
    print()


## 3. 🎯 Voucher Decision Engine

Now let's create a voucher decision engine that uses the trained conversion model to decide whether to send a voucher and, if so, at what value.
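
Before the full class, the core decision rule can be sketched in isolation. The block below applies the same expected-gain formula the engine uses, `(P(with voucher) - P(no voucher)) * cart_value - P(with voucher) * voucher_value`, and keeps the candidate with the largest positive gain. The function names and the linear `response_fn` are assumptions made for illustration, not the platform's actual response model.

```python
def expected_gain(cart_value, voucher_value, p_without, p_with):
    """Incremental expected revenue minus the expected cost of the voucher."""
    return (p_with - p_without) * cart_value - p_with * voucher_value

def pick_voucher(cart_value, p_without, response_fn,
                 candidates=(5, 10, 15, 20, 25, 30), max_discount_percent=20):
    """Return (voucher, gain) for the best positive-gain candidate, else (None, 0.0)."""
    best_voucher, best_gain = None, 0.0
    for voucher in candidates:
        # Skip vouchers that exceed the discount cap for this cart
        if voucher > cart_value * max_discount_percent / 100:
            continue
        gain = expected_gain(cart_value, voucher, p_without, response_fn(voucher))
        if gain > best_gain:
            best_voucher, best_gain = voucher, gain
    return best_voucher, best_gain

# Hypothetical response curve: each voucher dollar adds one point of conversion
def linear_response(voucher):
    return 0.10 + 0.01 * voucher

best_voucher, gain = pick_voucher(100.0, 0.10, linear_response)
print(best_voucher, round(gain, 2))
```

With a $100 cart and a 10% baseline conversion probability, only the $5 to $20 candidates pass the 20% cap, and the sketch selects the $20 voucher. When the baseline probability is already high, no candidate has positive gain and the function returns `None`.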


In [None]:
# Voucher Decision Engine
class VoucherDecisionEngine:
    def __init__(self, conversion_model, scaler, label_encoders):
        self.conversion_model = conversion_model
        self.scaler = scaler
        self.label_encoders = label_encoders
        self.voucher_candidates = [5, 10, 15, 20, 25, 30]
        self.min_cart_value = 50
        self.max_discount_percent = 20
        
    def predict_conversion_probability(self, features):
        """Predict conversion probability"""
        features_scaled = self.scaler.transform(features.reshape(1, -1))
        if hasattr(self.conversion_model, 'predict_proba'):
            # scikit-learn classifiers: take the positive-class probability
            return self.conversion_model.predict_proba(features_scaled)[0, 1]
        # LightGBM Booster: predict() already returns the probability
        return self.conversion_model.predict(features_scaled)[0]
    
    def predict_voucher_response(self, features, voucher_value):
        """Predict voucher response probability (simplified)"""
        # In a real implementation, you'd have a separate voucher response model
        # For demo, we'll use a heuristic based on user segment and cart value
        base_response_rate = 0.15
        
        # Adjust based on voucher value (higher vouchers = higher response)
        voucher_multiplier = 1 + (voucher_value / 100)
        
        # Adjust based on cart value (higher cart = higher response)
        cart_multiplier = 1 + (features[0] / 1000)  # Assuming first feature is cart value
        
        return min(0.5, base_response_rate * voucher_multiplier * cart_multiplier)
    
    def calculate_expected_value(self, features, voucher_value):
        """Calculate expected value of sending a voucher"""
        cart_value = features[0]  # Assuming first feature is cart value
        
        # Predict probabilities
        p_no_voucher = self.predict_conversion_probability(features)
        p_with_voucher = self.predict_voucher_response(features, voucher_value)
        
        # Calculate expected value
        # Expected gain = (P(with voucher) - P(no voucher)) * cart_value - P(with voucher) * voucher_value
        expected_gain = (p_with_voucher - p_no_voucher) * cart_value - p_with_voucher * voucher_value
        
        return expected_gain
    
    def decide_voucher(self, session_features):
        """Decide whether to send a voucher and what value"""
        cart_value = session_features[0]
        
        # Don't send voucher if cart value is too low
        if cart_value < self.min_cart_value:
            return None, "Cart value too low"
        
        # Don't send voucher if cart value is too high (might convert anyway)
        if cart_value > 500:
            return None, "High cart value - likely to convert without voucher"
        
        best_voucher = None
        best_expected_value = -float('inf')
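        # Default reason, reported when no candidate has positive expected value
        best_reason = "No voucher with positive expected value"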
        
        for voucher_value in self.voucher_candidates:
            # Don't exceed max discount percentage
            if voucher_value > cart_value * (self.max_discount_percent / 100):
                continue
            
            expected_value = self.calculate_expected_value(session_features, voucher_value)
            
            if expected_value > best_expected_value and expected_value > 0:
                best_expected_value = expected_value
                best_voucher = voucher_value
                best_reason = f"Expected value: ${expected_value:.2f}"
        
        return best_voucher, best_reason

# Initialize the decision engine
decision_engine = VoucherDecisionEngine(best_model, scaler, label_encoders)

print("🎯 Voucher Decision Engine initialized!")
print(f"   - Voucher candidates ($): {decision_engine.voucher_candidates}")
print(f"   - Min cart value: ${decision_engine.min_cart_value}")
print(f"   - Max discount: {decision_engine.max_discount_percent}%")

# Test the decision engine on sample sessions
print("\n🧪 Testing Voucher Decisions on Sample Sessions:")
print("=" * 60)

sample_sessions = cart_sessions.sample(10, random_state=42)
voucher_decisions = []

for idx, session in sample_sessions.iterrows():
    # Get features for this session
    session_features = session[feature_columns].values
    
    # Make voucher decision
    voucher_value, reason = decision_engine.decide_voucher(session_features)
    
    # Calculate probabilities
    conversion_prob = decision_engine.predict_conversion_probability(session_features)
    
    decision = {
        'session_id': session['session_id'],
        'user_segment': session['user_segment'],
        'cart_value': session['max_cart_value'],
        'conversion_probability': conversion_prob,
        'voucher_value': voucher_value,
        'reason': reason
    }
    
    voucher_decisions.append(decision)
    
    print(f"Session: {session['session_id'][:8]}...")
    print(f"  User Segment: {session['user_segment']}")
    print(f"  Cart Value: ${session['max_cart_value']:.2f}")
    print(f"  Conversion Probability: {conversion_prob:.3f}")
    voucher_label = f"${voucher_value}" if voucher_value is not None else "None"
    print(f"  Voucher Decision: {voucher_label}")
    print(f"  Reason: {reason}")
    print()

# Convert to DataFrame for analysis
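voucher_df = pd.DataFrame(voucher_decisions)
# Coerce None to NaN so the column stays numeric even when no voucher is recommended
voucher_df['voucher_value'] = pd.to_numeric(voucher_df['voucher_value'], errors='coerce')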
print(f"📊 Voucher Decision Summary:")
print(f"   - Sessions analyzed: {len(voucher_df)}")
print(f"   - Vouchers recommended: {len(voucher_df[voucher_df['voucher_value'].notna()])}")
print(f"   - Average voucher value: ${voucher_df['voucher_value'].mean():.2f}")
print(f"   - Average cart value: ${voucher_df['cart_value'].mean():.2f}")


In [None]:
# Visualize voucher decision analysis
fig, axes = plt.subplots(2, 2, figsize=(15, 12))
fig.suptitle('🎯 Voucher Decision Analysis', fontsize=16, fontweight='bold')

# Voucher decisions by user segment
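# NOTE: 'count' ignores missing values, so it counts sessions that received a
# voucher recommendation, not all sessions in the segment
segment_vouchers = voucher_df.groupby('user_segment')['voucher_value'].agg(['count', 'mean']).reset_index()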
segment_vouchers.columns = ['segment', 'total_sessions', 'avg_voucher']

axes[0, 0].bar(segment_vouchers['segment'], segment_vouchers['avg_voucher'], 
               color=['#ff6b6b', '#4ecdc4', '#45b7d1'])
axes[0, 0].set_title('Average Voucher Value by User Segment')
axes[0, 0].set_ylabel('Average Voucher Value ($)')
for i, v in enumerate(segment_vouchers['avg_voucher']):
    if not pd.isna(v):
        axes[0, 0].text(i, v + 0.5, f'${v:.1f}', ha='center', va='bottom')

# Conversion probability vs voucher value
voucher_sessions = voucher_df[voucher_df['voucher_value'].notna()]
if len(voucher_sessions) > 0:
    axes[0, 1].scatter(voucher_sessions['conversion_probability'], voucher_sessions['voucher_value'], 
                      c=voucher_sessions['cart_value'], cmap='viridis', alpha=0.7, s=100)
    axes[0, 1].set_xlabel('Conversion Probability')
    axes[0, 1].set_ylabel('Voucher Value ($)')
    axes[0, 1].set_title('Voucher Value vs Conversion Probability')
    
    # Add colorbar
    cbar = plt.colorbar(axes[0, 1].collections[0], ax=axes[0, 1])
    cbar.set_label('Cart Value ($)')
else:
    axes[0, 1].text(0.5, 0.5, 'No vouchers\nrecommended', ha='center', va='center', 
                    transform=axes[0, 1].transAxes, fontsize=12)
    axes[0, 1].set_title('Voucher Value vs Conversion Probability')

# Cart value distribution for voucher vs no voucher
voucher_sessions = voucher_df[voucher_df['voucher_value'].notna()]
no_voucher_sessions = voucher_df[voucher_df['voucher_value'].isna()]

if len(voucher_sessions) > 0 and len(no_voucher_sessions) > 0:
    axes[1, 0].hist([no_voucher_sessions['cart_value'], voucher_sessions['cart_value']], 
                   bins=20, alpha=0.7, label=['No Voucher', 'Voucher'], 
                   color=['#ff9999', '#66b3ff'])
    axes[1, 0].set_xlabel('Cart Value ($)')
    axes[1, 0].set_ylabel('Frequency')
    axes[1, 0].set_title('Cart Value Distribution: Voucher vs No Voucher')
    axes[1, 0].legend()
else:
    axes[1, 0].text(0.5, 0.5, 'Insufficient data\nfor comparison', ha='center', va='center', 
                    transform=axes[1, 0].transAxes, fontsize=12)
    axes[1, 0].set_title('Cart Value Distribution')

# Expected ROI analysis
if len(voucher_sessions) > 0:
    # Calculate expected ROI for each voucher decision
    expected_rois = []
    for _, row in voucher_sessions.iterrows():
        # Simplified ROI calculation
        cart_value = row['cart_value']
        voucher_value = row['voucher_value']
        conversion_prob = row['conversion_probability']
        
        # Expected revenue without voucher
        expected_revenue_no_voucher = conversion_prob * cart_value
        
        # Expected revenue with voucher (assuming 20% uplift)
        expected_revenue_with_voucher = (conversion_prob * 1.2) * (cart_value - voucher_value)
        
        # ROI = (Revenue with voucher - Revenue without voucher) / Voucher cost
        roi = (expected_revenue_with_voucher - expected_revenue_no_voucher) / voucher_value
        expected_rois.append(roi)
    
    axes[1, 1].hist(expected_rois, bins=15, alpha=0.7, color='#a8e6cf')
    axes[1, 1].set_xlabel('Expected ROI')
    axes[1, 1].set_ylabel('Frequency')
    axes[1, 1].set_title('Expected ROI Distribution')
    axes[1, 1].axvline(np.mean(expected_rois), color='red', linestyle='--', 
                      label=f'Mean ROI: {np.mean(expected_rois):.2f}')
    axes[1, 1].legend()
else:
    axes[1, 1].text(0.5, 0.5, 'No vouchers\nfor ROI analysis', ha='center', va='center', 
                    transform=axes[1, 1].transAxes, fontsize=12)
    axes[1, 1].set_title('Expected ROI Distribution')

plt.tight_layout()
plt.show()

# Print ROI summary
if len(voucher_sessions) > 0:
    print(f"\n💰 ROI Analysis:")
    print(f"   - Average Expected ROI: {np.mean(expected_rois):.2f}")
    print(f"   - Positive ROI Sessions: {sum(1 for roi in expected_rois if roi > 0)}/{len(expected_rois)}")
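    # ROI times voucher cost recovers each session's expected revenue change
    total_impact = sum(roi * v for roi, v in zip(expected_rois, voucher_sessions['voucher_value']))
    print(f"   - Total Expected Revenue Impact: ${total_impact:.2f}")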


## 4. 🌐 Real-time API Demo

Let's demonstrate how the real-time API would work by simulating some user interactions and showing the decision-making process.
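
One piece a real-time service needs is a rule for when an active session counts as abandoned. The helper below is a minimal sketch of an inactivity-timeout check. It assumes the session dictionary layout used by the mock session class in the next cell (`status`, `cart_value`, and an ISO-formatted `last_activity`); the 30-minute timeout and the name `is_abandoned` are hypothetical choices, not part of the platform.

```python
from datetime import datetime, timedelta

def is_abandoned(session, now, timeout=timedelta(minutes=30)):
    """Return True if an active session with a non-empty cart has gone idle."""
    if session["status"] != "active" or session["cart_value"] <= 0:
        return False
    last_activity = datetime.fromisoformat(session["last_activity"])
    return now - last_activity >= timeout

# Illustrative session state (made-up values)
example_session = {
    "status": "active",
    "cart_value": 120.0,
    "last_activity": "2024-01-01T10:00:00",
}
print(is_abandoned(example_session, datetime(2024, 1, 1, 10, 45)))
```

Passing `now` explicitly rather than calling `datetime.now()` inside the helper keeps the check deterministic and easy to test.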


In [None]:
# Simulate real-time API interactions
import json
from datetime import datetime, timedelta

class MockAPISession:
    def __init__(self, decision_engine):
        self.decision_engine = decision_engine
        self.session_data = {}
        self.events = []
        
    def track_event(self, event_data):
        """Simulate tracking an event"""
        # Stamp the event once and use the stamped copy everywhere, so callers
        # do not need to supply their own 'timestamp' key
        event = {
            **event_data,
            'timestamp': datetime.now().isoformat()
        }
        self.events.append(event)
        
        # Update session data
        session_id = event['session_id']
        if session_id not in self.session_data:
            self.session_data[session_id] = {
                'user_id': event['user_id'],
                'events': [],
                'cart_value': 0,
                'cart_items': [],
                'last_activity': event['timestamp'],
                'status': 'active'
            }
        
        session = self.session_data[session_id]
        session['events'].append(event)
        session['last_activity'] = event['timestamp']
        
        # Update cart if it's an add_to_cart event
        if event_data['event_type'] == 'add_to_cart':
            session['cart_value'] += event_data.get('price', 0) * event_data.get('quantity', 1)
            session['cart_items'].append({
                'product_id': event_data.get('product_id'),
                'price': event_data.get('price', 0),
                'quantity': event_data.get('quantity', 1)
            })
        elif event_data['event_type'] == 'purchase':
            session['status'] = 'converted'
        
        return {'status': 'success', 'message': 'Event tracked'}
    
    def get_session_state(self, session_id):
        """Get current session state"""
        if session_id not in self.session_data:
            return {'error': 'Session not found'}
        
        session = self.session_data[session_id]
        return {
            'session_id': session_id,
            'user_id': session['user_id'],
            'status': session['status'],
            'cart_value': session['cart_value'],
            'cart_items': session['cart_items'],
            'last_activity': session['last_activity'],
            'event_count': len(session['events'])
        }
    
    def _build_features(self, session):
        """Build the feature vector in the same order as the training data."""
        events = session['events']
        now = datetime.now()  # read the clock once so all time features agree

        def count(event_type):
            # Number of events of the given type in this session
            return sum(1 for e in events if e['event_type'] == event_type)

        unique_products = {e['product_id'] for e in events if e.get('product_id')}

        return np.array([
            session['cart_value'],  # cart value
            len(events),  # event count
            30,  # session duration (simplified)
            30,  # recency days (simplified)
            1,   # frequency (simplified)
            0,   # monetary value (simplified)
            0,   # avg order value (simplified)
            count('page_view'),  # page views
            count('product_view'),  # product views
            count('add_to_cart'),  # add to cart events
            len(unique_products),  # unique products
            50,  # avg product price (simplified)
            0,   # bounce rate
            now.hour,  # hour of day
            now.weekday(),  # day of week
            1 if now.weekday() in [5, 6] else 0,  # is weekend
            1 if 9 <= now.hour <= 17 else 0,  # is business hours
            0,   # user_segment_encoded (simplified)
            0,   # country_encoded (simplified)
            0,   # device_encoded (simplified)
        ])

    def predict_conversion(self, session_id):
        """Predict conversion probability for a session"""
        if session_id not in self.session_data:
            return {'error': 'Session not found'}

        session = self.session_data[session_id]
        features = self._build_features(session)
        probability = self.decision_engine.predict_conversion_probability(features)

        return {
            'session_id': session_id,
            'conversion_probability': float(probability),
            'confidence': 'high' if probability > 0.7 else 'medium' if probability > 0.3 else 'low'
        }

    def decide_voucher(self, session_id):
        """Make voucher decision for a session"""
        if session_id not in self.session_data:
            return {'error': 'Session not found'}

        session = self.session_data[session_id]

        # Same features as predict_conversion, built once and reused for both calls
        features = self._build_features(session)
        voucher_value, reason = self.decision_engine.decide_voucher(features)
        conversion_prob = self.decision_engine.predict_conversion_probability(features)

        return {
            'session_id': session_id,
            'cart_value': session['cart_value'],
            'conversion_probability': float(conversion_prob),
            'voucher_value': voucher_value,
            'reason': reason,
            'recommendation': 'Send voucher' if voucher_value else 'No voucher needed'
        }

# Initialize mock API
mock_api = MockAPISession(decision_engine)

print("🌐 Mock API initialized!")
print("Simulating user interactions...")

# Simulate a user session
session_id = "demo_session_001"
user_id = "demo_user_001"

print(f"\n👤 User Session: {session_id}")
print("=" * 50)

# Track events
events = [
    {
        'event_type': 'page_view',
        'user_id': user_id,
        'session_id': session_id,
        'timestamp': datetime.now().isoformat(),
        'page_url': '/home',
        'referrer': 'direct',
        'device': 'desktop',
        'country': 'US'
    },
    {
        'event_type': 'product_view',
        'user_id': user_id,
        'session_id': session_id,
        'timestamp': datetime.now().isoformat(),
        'product_id': 'SKU_001',
        'price': 99.99,
        'page_url': '/product/SKU_001',
        'device': 'desktop',
        'country': 'US'
    },
    {
        'event_type': 'add_to_cart',
        'user_id': user_id,
        'session_id': session_id,
        'timestamp': datetime.now().isoformat(),
        'product_id': 'SKU_001',
        'price': 99.99,
        'quantity': 1,
        'device': 'desktop',
        'country': 'US'
    },
    {
        'event_type': 'product_view',
        'user_id': user_id,
        'session_id': session_id,
        'timestamp': datetime.now().isoformat(),
        'product_id': 'SKU_002',
        'price': 149.99,
        'page_url': '/product/SKU_002',
        'device': 'desktop',
        'country': 'US'
    },
    {
        'event_type': 'add_to_cart',
        'user_id': user_id,
        'session_id': session_id,
        'timestamp': datetime.now().isoformat(),
        'product_id': 'SKU_002',
        'price': 149.99,
        'quantity': 1,
        'device': 'desktop',
        'country': 'US'
    }
]

# Track each event
for event in events:
    result = mock_api.track_event(event)
    print(f"📝 {event['event_type']}: {result['status']}")

# Get session state
session_state = mock_api.get_session_state(session_id)
print("\n📊 Session State:")
print(f"   - User ID: {session_state['user_id']}")
print(f"   - Status: {session_state['status']}")
print(f"   - Cart Value: ${session_state['cart_value']:.2f}")
print(f"   - Cart Items: {len(session_state['cart_items'])}")
print(f"   - Events: {session_state['event_count']}")

# Predict conversion
conversion_pred = mock_api.predict_conversion(session_id)
print("\n🎯 Conversion Prediction:")
print(f"   - Probability: {conversion_pred['conversion_probability']:.3f}")
print(f"   - Confidence: {conversion_pred['confidence']}")

# Make voucher decision
voucher_decision = mock_api.decide_voucher(session_id)
print("\n🎫 Voucher Decision:")
print(f"   - Recommendation: {voucher_decision['recommendation']}")
if voucher_decision['voucher_value']:
    print(f"   - Voucher Value: ${voucher_decision['voucher_value']}")
# The reason is reported whether or not a voucher is recommended
print(f"   - Reason: {voucher_decision['reason']}")

print("\n✅ API Demo completed!")
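
A production endpoint would usually validate incoming events before they reach session state, since `track_event` indexes `event_data['event_type']` directly. The following is a minimal sketch of such a check, not part of the notebook's pipeline: `validate_event`, `REQUIRED_FIELDS`, and `KNOWN_EVENT_TYPES` are hypothetical names, and the allowed event types are assumed from the events used in this demo.

```python
from datetime import datetime

# Assumption: these fields and event types mirror the demo events above.
REQUIRED_FIELDS = {"event_type", "user_id", "session_id", "timestamp"}
KNOWN_EVENT_TYPES = {"page_view", "product_view", "add_to_cart", "purchase"}


def validate_event(event):
    """Return a list of problems; an empty list means the event looks usable."""
    missing = sorted(REQUIRED_FIELDS - set(event))
    problems = [f"missing field: {name}" for name in missing]

    event_type = event.get("event_type")
    if event_type is not None and event_type not in KNOWN_EVENT_TYPES:
        problems.append(f"unknown event_type: {event_type}")

    if "timestamp" in event:
        try:
            datetime.fromisoformat(event["timestamp"])
        except (TypeError, ValueError):
            problems.append("timestamp is not ISO 8601")

    if event_type == "add_to_cart":
        price = event.get("price")
        quantity = event.get("quantity", 1)
        if not isinstance(price, (int, float)) or price < 0:
            problems.append("price must be a non-negative number")
        if not isinstance(quantity, int) or quantity < 1:
            problems.append("quantity must be a positive integer")

    return problems


good_event = {
    "event_type": "add_to_cart",
    "user_id": "demo_user_001",
    "session_id": "demo_session_001",
    "timestamp": datetime.now().isoformat(),
    "product_id": "SKU_001",
    "price": 99.99,
    "quantity": 1,
}
bad_event = {"event_type": "add_to_cart", "user_id": "u1", "price": -5}

print(validate_event(good_event))
print(validate_event(bad_event))
```

Calling `validate_event` before `mock_api.track_event(event)` and rejecting events with a non-empty problem list would keep malformed payloads out of the session state.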


## 🎉 Conclusion

Congratulations! You've successfully built and demonstrated a complete AI-powered customer personalization platform. Here's what we've accomplished:

### ✅ What We Built

1. **📊 Event Simulation System**
   - Generated synthetic user behavior data
   - Created realistic e-commerce events (page views, product views, cart additions, purchases)
   - Simulated different user segments and behaviors

2. **🤖 Machine Learning Pipeline**
   - Engineered comprehensive features (RFM, behavioral, temporal)
   - Trained multiple ML models (Logistic Regression, Random Forest, LightGBM)
   - Achieved good conversion-prediction performance on the synthetic dataset
   - Saved trained models for real-time use

3. **🎯 Voucher Decision Engine**
   - Created an intelligent system to decide when and what value of vouchers to send
   - Implemented expected value calculations
   - Optimized for ROI and business impact

4. **🌐 Real-time API Demo**
   - Mocked the FastAPI service in-notebook (`MockAPISession`) for real-time event tracking
   - Demonstrated session management and decision-making
   - Showed end-to-end user interaction flow
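
The expected value calculation behind a voucher decision can be illustrated with a minimal sketch. This is not the notebook's actual `decision_engine` logic: `expected_value` and `best_voucher` are hypothetical helpers, and the uplift figures are made-up assumptions standing in for what a voucher-response model would predict. The cart value matches the demo session above.

```python
def expected_value(cart_value, p_convert, voucher_value=0.0):
    """Expected revenue: conversion probability times cart value net of the voucher."""
    return p_convert * (cart_value - voucher_value)


def best_voucher(cart_value, base_prob, uplift_by_voucher):
    """Pick the voucher with the highest expected value.

    uplift_by_voucher maps a voucher value to the assumed absolute increase
    in conversion probability. Returns (voucher_value, expected_value), where
    a voucher_value of None means sending no voucher is the best option.
    """
    best_value = None
    best_ev = expected_value(cart_value, base_prob)  # baseline: no voucher
    for voucher_value, uplift in sorted(uplift_by_voucher.items()):
        p_convert = min(1.0, base_prob + uplift)
        ev = expected_value(cart_value, p_convert, voucher_value)
        if ev > best_ev:
            best_value, best_ev = voucher_value, ev
    return best_value, best_ev


# Hypothetical uplifts for $5, $10 and $20 vouchers on a $249.98 cart
choice, ev = best_voucher(249.98, 0.20, {5: 0.03, 10: 0.08, 20: 0.10})
print(choice, round(ev, 2))  # → 20 68.99

# A session that is already likely to convert gets no voucher
print(best_voucher(100.0, 0.90, {10: 0.01}))
```

The comparison against the no-voucher baseline is what keeps discounts away from sessions that would convert anyway.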

### 🚀 Next Steps for Production

To deploy this system in production, you would need to:

1. **Infrastructure Setup**
   - Deploy the FastAPI service to a cloud provider (AWS, GCP, Azure)
   - Set up Redis for session storage
   - Configure monitoring and logging

2. **Data Pipeline**
   - Connect to real user event streams
   - Implement data validation and quality checks
   - Set up automated model retraining

3. **Frontend Integration**
   - Deploy the JavaScript tracking code to your website
   - Configure event collection and API endpoints
   - Implement A/B testing for voucher strategies

4. **Advanced Features**
   - Add more sophisticated ML models (deep learning, ensemble methods)
   - Implement multi-armed bandit algorithms for voucher optimization
   - Add real-time personalization for product recommendations
   - Integrate with email/SMS marketing platforms
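
As one possible starting point for the multi-armed bandit idea, here is a minimal Thompson sampling sketch over a fixed set of voucher values. `VoucherBandit` is a hypothetical class, and the conversion rates in the simulation are invented for illustration. The reward here is conversion only; a production version would more likely optimize profit net of the voucher cost.

```python
import random


class VoucherBandit:
    """Thompson sampling over discrete voucher values with Beta(1, 1) priors."""

    def __init__(self, voucher_values, seed=None):
        self.rng = random.Random(seed)
        # Per arm: [alpha, beta] = [1 + conversions, 1 + non-conversions]
        self.stats = {value: [1, 1] for value in voucher_values}

    def choose(self):
        """Sample a conversion rate per arm and play the arm with the best sample."""
        samples = {
            value: self.rng.betavariate(alpha, beta)
            for value, (alpha, beta) in self.stats.items()
        }
        return max(samples, key=samples.get)

    def update(self, voucher_value, converted):
        """Record the outcome of offering voucher_value to one session."""
        self.stats[voucher_value][0 if converted else 1] += 1


# Simulated environment with assumed true conversion rates per voucher value
true_rates = {0: 0.05, 5: 0.10, 10: 0.40}
bandit = VoucherBandit(list(true_rates), seed=42)
environment = random.Random(7)

pulls = {value: 0 for value in true_rates}
for _ in range(2000):
    value = bandit.choose()
    pulls[value] += 1
    bandit.update(value, environment.random() < true_rates[value])

print(pulls)  # most offers should go to the best-converting voucher
```

Compared with a fixed A/B split, the bandit shifts traffic toward better-performing vouchers while it is still learning, at the cost of less clean statistical comparisons between arms.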

### 📈 Business Impact

This platform can help you:
- **Increase conversion rates** by targeting the right users with the right incentives
- **Optimize marketing spend** by focusing on high-value opportunities
- **Improve customer experience** through personalized interactions
- **Scale personalization** across your entire user base

### 🔧 Technical Architecture

The system is designed around modern ML engineering practices:
- **Modular design** with separate components for data, training, and serving
- **Real-time processing** for immediate decision-making
- **Scalable infrastructure** once sessions move from the in-memory mock to a shared store such as Redis
- **Model versioning** and A/B testing as natural extensions of the saved models and decision engine
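
A/B testing voucher strategies needs a stable assignment of users to variants, so that the same user sees the same treatment across sessions. One common approach is deterministic hashing, sketched below. `assign_variant`, the experiment name, and the variant labels are hypothetical and not part of the notebook's code.

```python
import hashlib
from collections import Counter


def assign_variant(user_id, experiment, variants=("control", "voucher")):
    """Map a user to a variant deterministically from a hash of experiment and user ID."""
    key = f"{experiment}:{user_id}".encode("utf-8")
    bucket = int(hashlib.sha256(key).hexdigest(), 16) % len(variants)
    return variants[bucket]


# The same user always lands in the same variant for a given experiment
print(assign_variant("demo_user_001", "voucher_test_v1"))

# Over many users the split is roughly even
counts = Counter(assign_variant(f"user_{i}", "voucher_test_v1") for i in range(1000))
print(counts)
```

Including the experiment name in the hash key means a new experiment reshuffles users independently of earlier ones, with no assignment table to store.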

---

**🎯 Ready to personalize your customer experience with AI!**

Run each cell in this notebook to see the complete system in action. The generated models and data can be used as a starting point for your production implementation.
