# Multi-Product Cross-Sell Model - Enhanced Feature Engineering

## Objective
Add interaction features and market context indicators to improve model performance and explainability for agent recommendations.

## Key Features:
1. **Product Affinity Patterns**: FROM → TO product transition features
2. **Market Context Indicators**: S&P 500 volatility, trends, momentum using actual data
3. **Interaction Features**: Age × Product, AUM × Product, Channel × Product, Market × Product
4. **Cross-Sell Timing Features**: Quick vs delayed cross-sell patterns
5. **Explainability Features**: Rich metadata for agent conversations

## Approach
- Build on multi_product_training_data from notebook 01 (already has market data and life stage triggers)
- Add product transition affinity features
- Create market context indicators using actual S&P fields
- Generate interaction features for better predictions
- Prepare final feature set for multi-output model training

## Note
Notebook 01 already provides: client features, life stage triggers, risk tolerance, asset allocation, and S&P 500 data.
This notebook adds the final layer of interaction and affinity features.


In [0]:
# Configuration
dbutils.widgets.text("target_schema", "eda_smartlist.us_wealth_management_smartlist")
dbutils.widgets.text("business_month", "202510")

# Get parameters
target_schema = dbutils.widgets.get("target_schema")
business_month = dbutils.widgets.get("business_month")

print(f"Target Schema: {target_schema}")
print(f"Business Month: {business_month}")

# Import required libraries
from pyspark.sql.functions import col, when, abs as spark_abs


Target Schema: eda_smartlist.us_wealth_management_smartlist
Business Month: 202510


## Step 1: Verify Source Data for Market Context Indicators

### Confirm that the S&P 500 volatility, trend, and momentum fields from notebook 01 are available before the indicators are built in Step 3


In [0]:
# Step 1: Verify source data and check available S&P 500 fields
print("=== Checking Source Data from Notebook 01 ===\n")

# Check what fields we have from multi_product_training_data
source_data = spark.sql(f"SELECT * FROM {target_schema}.multi_product_training_data LIMIT 1")
print(f"Total columns available: {len(source_data.columns)}")

# Check S&P 500 fields available
# Use `c` as the loop variable so it is not confused with the imported `col` function
snp_fields = [c for c in source_data.columns if 'snp' in c.lower()]
print(f"\nS&P 500 fields available: {len(snp_fields)}")
for field in snp_fields[:10]:  # Show first 10
    print(f"  • {field}")

# Check record count
total_records = spark.sql(f"SELECT COUNT(*) as total FROM {target_schema}.multi_product_training_data").collect()[0]['total']
print(f"\nTotal training records: {total_records:,}")
print("✅ Source data verified")


=== Checking Source Data from Notebook 01 ===

Total columns available: 108

S&P 500 fields available: 25
  • snp_business_month
  • snp_close_variation
  • snp_open_variation
  • snp_close_month
  • snp_open_month
  • snp_high_month
  • snp_low_month
  • snp_high_variation
  • snp_low_variation
  • snp_close_lead_3

Total training records: 297,376
✅ Source data verified
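
The printout above lists only the first 10 S&P fields, so it does not show whether `snp_close_lead_6` and `snp_close_lead_12` (both used in Step 3) are present. A minimal sketch of an explicit check follows. The helper name `missing_columns` is hypothetical; the column list is taken from the Step 3 SELECT.

```python
# Columns referenced by the Step 3 SELECT in this notebook.
REQUIRED_COLUMNS = [
    "snp_close_variation", "snp_close_lead_3", "snp_close_lead_6", "snp_close_lead_12",
    "product_category", "cross_sell_product_category", "client_age", "acct_val_amt",
    "channel", "aggressive_investor", "conservative_investor",
    "retirement_planning_trigger", "family_protection_trigger", "wealth_building_trigger",
    "days_to_cross_sell", "client_tenure_years", "aum_segment",
]


def missing_columns(available, required=REQUIRED_COLUMNS):
    """Return the required column names that are absent from `available` (case-insensitive)."""
    available_lower = {name.lower() for name in available}
    return [name for name in required if name.lower() not in available_lower]


# Made-up column list for illustration; in the notebook, pass source_data.columns instead.
example_columns = ["snp_close_variation", "snp_close_lead_3", "client_age"]
example_required = ["snp_close_variation", "snp_close_lead_6", "client_age"]
print(missing_columns(example_columns, example_required))
```

In the notebook this could be called as `missing_columns(source_data.columns)` and the run stopped if the result is non-empty.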


## Step 2: Drop Existing Final Training Table

### Prepare for creating the final enhanced training dataset


In [0]:
# Step 2: Drop existing final training table
spark.sql(f"DROP TABLE IF EXISTS {target_schema}.multi_product_final_training_data")
print("✅ Dropped existing final training table (if it existed)")


✅ Dropped existing final training table (if it existed)
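
Dropping the table first leaves a window in which it does not exist if Step 3 fails. One possible alternative, sketched here and not the approach this notebook uses, is a single `CREATE OR REPLACE TABLE ... AS SELECT` statement. The helper `build_replace_table_sql` is a hypothetical name, and the SELECT passed in is shortened for illustration.

```python
def build_replace_table_sql(schema, table, select_sql):
    """Build one CREATE OR REPLACE TABLE ... AS SELECT statement for a Delta table."""
    return (
        f"CREATE OR REPLACE TABLE {schema}.{table}\n"
        f"USING delta\n"
        f"AS\n{select_sql.strip()}"
    )


# Shortened SELECT for illustration; the real one is the full Step 3 query.
stmt = build_replace_table_sql(
    "eda_smartlist.us_wealth_management_smartlist",
    "multi_product_final_training_data",
    "SELECT base.* FROM eda_smartlist.us_wealth_management_smartlist.multi_product_training_data base",
)
print(stmt)
```

The resulting string would be passed to `spark.sql(stmt)` in place of the separate DROP and CREATE statements.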


## Step 3: Create Final Training Dataset with Enhanced Features

### Add market context indicators, product affinity patterns, and interaction features
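
Most of the features in this step follow one template: a client-side condition combined with a target product category, emitted as a 0/1 flag. The sketch below shows how such expressions could be generated from a small specification instead of being written by hand. The helpers `flag_expr` and `cross_flags` are hypothetical and are not used by the notebook cell that follows.

```python
def flag_expr(condition, alias):
    """Render one binary indicator as a CASE expression with a column alias."""
    return f"CASE WHEN {condition} THEN 1 ELSE 0 END AS {alias}"


def cross_flags(client_conditions, products, target_col="cross_sell_product_category"):
    """Cross every client-side condition with every target product category."""
    exprs = []
    for prefix, condition in client_conditions.items():
        for suffix, category in products.items():
            full_condition = f"{condition} AND {target_col} = '{category}'"
            exprs.append(flag_expr(full_condition, f"{prefix}_{suffix}_target"))
    return exprs


# Age bands and categories copied from the Step 3 SQL.
age_bands = {
    "senior": "client_age >= 55",
    "midcareer": "client_age >= 40 AND client_age < 55",
}
products = {"retirement": "RETIREMENT", "life": "LIFE_INSURANCE", "investment": "INVESTMENT"}

age_exprs = cross_flags(age_bands, products)
print(",\n".join(age_exprs))
```

The joined string could be spliced into the SELECT list. Hand-written SQL, as used below, is easier to read line by line; generation mainly helps keep thresholds and aliases consistent.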


In [0]:
# Step 3: Create final training dataset with all enhanced features

spark.sql(f"""
CREATE TABLE {target_schema}.multi_product_final_training_data
USING delta 
AS
SELECT 
    base.*,
    
    -- ============================================
    -- MARKET CONTEXT INDICATORS (Using actual S&P fields)
    -- ============================================
    
    -- Market volatility indicators (using snp_close_variation)
    CASE WHEN ABS(snp_close_variation) > 2.0 THEN 1 ELSE 0 END AS high_market_volatility,
    CASE WHEN ABS(snp_close_variation) > 5.0 THEN 1 ELSE 0 END AS extreme_market_volatility,
    CASE WHEN ABS(snp_close_variation) < 1.0 THEN 1 ELSE 0 END AS low_market_volatility,
    
    -- Market trend indicators (using snp_close_lead fields)
    CASE WHEN snp_close_lead_3 > 0 THEN 1 ELSE 0 END AS market_uptrend_3m,
    CASE WHEN snp_close_lead_6 > 0 THEN 1 ELSE 0 END AS market_uptrend_6m,
    CASE WHEN snp_close_lead_12 > 0 THEN 1 ELSE 0 END AS market_uptrend_12m,
    
    -- Market momentum strength
    CASE WHEN snp_close_lead_3 > 5.0 THEN 1 ELSE 0 END AS strong_market_momentum_3m,
    CASE WHEN snp_close_lead_6 > 10.0 THEN 1 ELSE 0 END AS strong_market_momentum_6m,
    
    -- Market conditions for product recommendations
    CASE WHEN ABS(snp_close_variation) < 1.0 AND snp_close_lead_3 > 0 THEN 1 ELSE 0 END AS stable_growth_market,
    CASE WHEN ABS(snp_close_variation) > 3.0 THEN 1 ELSE 0 END AS volatile_market,
    CASE WHEN snp_close_lead_6 < -5.0 THEN 1 ELSE 0 END AS declining_market,
    
    -- ============================================
    -- PRODUCT AFFINITY PATTERNS (FROM → TO transitions)
    -- ============================================
    
    -- Common cross-sell patterns (based on test data: Investment 64%, Retirement 26%, Life 5%)
    CASE WHEN product_category = 'LIFE_INSURANCE' AND cross_sell_product_category = 'RETIREMENT' THEN 1 ELSE 0 END AS life_to_retirement_pattern,
    CASE WHEN product_category = 'LIFE_INSURANCE' AND cross_sell_product_category = 'INVESTMENT' THEN 1 ELSE 0 END AS life_to_investment_pattern,
    CASE WHEN product_category = 'RETIREMENT' AND cross_sell_product_category = 'INVESTMENT' THEN 1 ELSE 0 END AS retirement_to_investment_pattern,
    CASE WHEN product_category = 'RETIREMENT' AND cross_sell_product_category = 'LIFE_INSURANCE' THEN 1 ELSE 0 END AS retirement_to_life_pattern,
    CASE WHEN product_category = 'INVESTMENT' AND cross_sell_product_category = 'RETIREMENT' THEN 1 ELSE 0 END AS investment_to_retirement_pattern,
    CASE WHEN product_category = 'INVESTMENT' AND cross_sell_product_category = 'LIFE_INSURANCE' THEN 1 ELSE 0 END AS investment_to_life_pattern,
    
    -- ============================================
    -- AGE × PRODUCT INTERACTIONS
    -- ============================================
    
    -- Senior clients (55+) and product targets
    CASE WHEN client_age >= 55 AND cross_sell_product_category = 'RETIREMENT' THEN 1 ELSE 0 END AS senior_retirement_target,
    CASE WHEN client_age >= 55 AND cross_sell_product_category = 'LIFE_INSURANCE' THEN 1 ELSE 0 END AS senior_life_target,
    CASE WHEN client_age >= 55 AND cross_sell_product_category = 'INVESTMENT' THEN 1 ELSE 0 END AS senior_investment_target,
    
    -- Mid-career clients (40-54) and product targets
    CASE WHEN client_age >= 40 AND client_age < 55 AND cross_sell_product_category = 'RETIREMENT' THEN 1 ELSE 0 END AS midcareer_retirement_target,
    CASE WHEN client_age >= 40 AND client_age < 55 AND cross_sell_product_category = 'LIFE_INSURANCE' THEN 1 ELSE 0 END AS midcareer_life_target,
    CASE WHEN client_age >= 40 AND client_age < 55 AND cross_sell_product_category = 'INVESTMENT' THEN 1 ELSE 0 END AS midcareer_investment_target,
    
    -- Young clients (<40) and product targets
    CASE WHEN client_age < 40 AND cross_sell_product_category = 'RETIREMENT' THEN 1 ELSE 0 END AS young_retirement_target,
    CASE WHEN client_age < 40 AND cross_sell_product_category = 'INVESTMENT' THEN 1 ELSE 0 END AS young_investment_target,
    
    -- ============================================
    -- AUM × PRODUCT INTERACTIONS
    -- ============================================
    
    -- High AUM clients and product targets
    CASE WHEN acct_val_amt > 100000 AND cross_sell_product_category = 'INVESTMENT' THEN 1 ELSE 0 END AS high_aum_investment_target,
    CASE WHEN acct_val_amt > 100000 AND cross_sell_product_category = 'RETIREMENT' THEN 1 ELSE 0 END AS high_aum_retirement_target,
    CASE WHEN acct_val_amt > 200000 AND cross_sell_product_category = 'NETWORK_PRODUCTS' THEN 1 ELSE 0 END AS ultra_high_aum_network_target,
    
    -- Medium AUM clients and product targets
    CASE WHEN acct_val_amt >= 50000 AND acct_val_amt <= 100000 AND cross_sell_product_category = 'LIFE_INSURANCE' THEN 1 ELSE 0 END AS medium_aum_life_target,
    CASE WHEN acct_val_amt >= 50000 AND acct_val_amt <= 100000 AND cross_sell_product_category = 'RETIREMENT' THEN 1 ELSE 0 END AS medium_aum_retirement_target,
    
    -- ============================================
    -- CHANNEL × PRODUCT INTERACTIONS
    -- ============================================
    
    -- Branch channel and product targets
    CASE WHEN channel = 'Branch Assist' AND cross_sell_product_category = 'RETIREMENT' THEN 1 ELSE 0 END AS branch_to_retirement,
    CASE WHEN channel = 'Branch Assist' AND cross_sell_product_category = 'LIFE_INSURANCE' THEN 1 ELSE 0 END AS branch_to_life,
    
    -- Advisor channel and product targets
    CASE WHEN channel = 'Advisor Assist/Retail' AND cross_sell_product_category = 'INVESTMENT' THEN 1 ELSE 0 END AS advisor_to_investment,
    CASE WHEN channel = 'Advisor Assist/Retail' AND cross_sell_product_category = 'RETIREMENT' THEN 1 ELSE 0 END AS advisor_to_retirement,
    
    -- ============================================
    -- MARKET × PRODUCT AFFINITY
    -- ============================================
    
    -- Volatile market and product demand
    CASE WHEN ABS(snp_close_variation) > 3.0 AND cross_sell_product_category = 'RETIREMENT' THEN 1 ELSE 0 END AS volatile_market_retirement_demand,
    CASE WHEN ABS(snp_close_variation) > 3.0 AND cross_sell_product_category = 'LIFE_INSURANCE' THEN 1 ELSE 0 END AS volatile_market_life_demand,
    
    -- Stable market and product demand
    CASE WHEN ABS(snp_close_variation) < 1.0 AND cross_sell_product_category = 'INVESTMENT' THEN 1 ELSE 0 END AS stable_market_investment_demand,
    CASE WHEN ABS(snp_close_variation) < 1.0 AND snp_close_lead_6 > 5.0 AND cross_sell_product_category = 'INVESTMENT' THEN 1 ELSE 0 END AS stable_growth_investment_demand,
    
    -- ============================================
    -- RISK TOLERANCE × PRODUCT INTERACTIONS
    -- ============================================
    
    -- Aggressive investors and product targets
    CASE WHEN aggressive_investor = 1 AND cross_sell_product_category = 'INVESTMENT' THEN 1 ELSE 0 END AS aggressive_to_investment,
    CASE WHEN aggressive_investor = 1 AND cross_sell_product_category = 'NETWORK_PRODUCTS' THEN 1 ELSE 0 END AS aggressive_to_network,
    
    -- Conservative investors and product targets
    CASE WHEN conservative_investor = 1 AND cross_sell_product_category = 'RETIREMENT' THEN 1 ELSE 0 END AS conservative_to_retirement,
    CASE WHEN conservative_investor = 1 AND cross_sell_product_category = 'LIFE_INSURANCE' THEN 1 ELSE 0 END AS conservative_to_life,
    
    -- ============================================
    -- LIFE STAGE × PRODUCT INTERACTIONS
    -- ============================================
    
    -- Retirement planning trigger and product targets
    CASE WHEN retirement_planning_trigger = 1 AND cross_sell_product_category = 'RETIREMENT' THEN 1 ELSE 0 END AS retirement_trigger_to_retirement,
    CASE WHEN retirement_planning_trigger = 1 AND cross_sell_product_category = 'INVESTMENT' THEN 1 ELSE 0 END AS retirement_trigger_to_investment,
    CASE WHEN retirement_planning_trigger = 1 AND snp_close_lead_12 > 0 THEN 1 ELSE 0 END AS retirement_trigger_in_uptrend,
    
    -- Family protection trigger and product targets
    CASE WHEN family_protection_trigger = 1 AND cross_sell_product_category = 'LIFE_INSURANCE' THEN 1 ELSE 0 END AS family_trigger_to_life,
    CASE WHEN family_protection_trigger = 1 AND cross_sell_product_category = 'RETIREMENT' THEN 1 ELSE 0 END AS family_trigger_to_retirement,
    
    -- Wealth building trigger and product targets
    CASE WHEN wealth_building_trigger = 1 AND cross_sell_product_category = 'INVESTMENT' THEN 1 ELSE 0 END AS wealth_trigger_to_investment,
    CASE WHEN wealth_building_trigger = 1 AND cross_sell_product_category = 'RETIREMENT' THEN 1 ELSE 0 END AS wealth_trigger_to_retirement,
    
    -- ============================================
    -- CROSS-SELL TIMING FEATURES
    -- ============================================
    
    -- Quick cross-sell (within 1 year)
    CASE WHEN days_to_cross_sell < 365 THEN 1 ELSE 0 END AS quick_cross_sell,
    CASE WHEN days_to_cross_sell < 365 AND cross_sell_product_category = 'INVESTMENT' THEN 1 ELSE 0 END AS quick_to_investment,
    CASE WHEN days_to_cross_sell < 365 AND cross_sell_product_category = 'RETIREMENT' THEN 1 ELSE 0 END AS quick_to_retirement,
    
    -- Medium timing (1-3 years)
    CASE WHEN days_to_cross_sell >= 365 AND days_to_cross_sell < 1095 THEN 1 ELSE 0 END AS medium_timing_cross_sell,
    
    -- Delayed cross-sell (3+ years); very_delayed_cross_sell marks 5+ years (1825 days)
    CASE WHEN days_to_cross_sell >= 1095 THEN 1 ELSE 0 END AS delayed_cross_sell,
    CASE WHEN days_to_cross_sell >= 1825 THEN 1 ELSE 0 END AS very_delayed_cross_sell,
    
    -- ============================================
    -- PRODUCT CATEGORY × MARKET CONDITIONS
    -- ============================================
    
    -- Current product in market context
    CASE WHEN product_category = 'LIFE_INSURANCE' AND ABS(snp_close_variation) > 3.0 THEN 1 ELSE 0 END AS life_in_volatile_market,
    CASE WHEN product_category = 'RETIREMENT' AND ABS(snp_close_variation) < 1.0 THEN 1 ELSE 0 END AS retirement_in_stable_market,
    CASE WHEN product_category = 'INVESTMENT' AND snp_close_lead_6 > 5.0 THEN 1 ELSE 0 END AS investment_in_growth_market,
    
    -- ============================================
    -- COMBINED AFFINITY FEATURES (For Explainability)
    -- ============================================
    
    -- High-value senior retirement planning
    CASE WHEN client_age >= 55 AND acct_val_amt > 100000 AND retirement_planning_trigger = 1 
              AND cross_sell_product_category = 'RETIREMENT' THEN 1 ELSE 0 END AS ideal_retirement_candidate,
    
    -- Young affluent wealth building (uses age < 45, wider than the < 40 young-client band above)
    CASE WHEN client_age < 45 AND acct_val_amt > 50000 AND wealth_building_trigger = 1 
              AND cross_sell_product_category = 'INVESTMENT' THEN 1 ELSE 0 END AS ideal_investment_candidate,
    
    -- Family protection life insurance
    CASE WHEN client_age >= 30 AND client_age < 55 AND family_protection_trigger = 1 
              AND cross_sell_product_category = 'LIFE_INSURANCE' THEN 1 ELSE 0 END AS ideal_life_candidate,
    
    -- High net worth network products (same condition as ultra_high_aum_network_target above)
    CASE WHEN acct_val_amt > 200000 AND cross_sell_product_category = 'NETWORK_PRODUCTS' THEN 1 ELSE 0 END AS ideal_network_candidate,
    
    -- ============================================
    -- TENURE × PRODUCT AFFINITY
    -- ============================================
    
    -- Long tenure clients (more than 5 years)
    CASE WHEN client_tenure_years > 5 AND cross_sell_product_category = 'RETIREMENT' THEN 1 ELSE 0 END AS long_tenure_retirement_affinity,
    CASE WHEN client_tenure_years > 5 AND cross_sell_product_category = 'INVESTMENT' THEN 1 ELSE 0 END AS long_tenure_investment_affinity,
    
    -- New clients (<2 years)
    CASE WHEN client_tenure_years < 2 AND cross_sell_product_category = 'INVESTMENT' THEN 1 ELSE 0 END AS new_client_investment_affinity,
    CASE WHEN client_tenure_years < 2 AND cross_sell_product_category = 'LIFE_INSURANCE' THEN 1 ELSE 0 END AS new_client_life_affinity,
    
    -- ============================================
    -- AUM SEGMENT × PRODUCT AFFINITY
    -- ============================================
    
    CASE WHEN aum_segment = 'ULTRA_HIGH' AND cross_sell_product_category = 'NETWORK_PRODUCTS' THEN 1 ELSE 0 END AS ultra_high_to_network,
    CASE WHEN aum_segment = 'ULTRA_HIGH' AND cross_sell_product_category = 'INVESTMENT' THEN 1 ELSE 0 END AS ultra_high_to_investment,
    CASE WHEN aum_segment = 'HIGH' AND cross_sell_product_category = 'RETIREMENT' THEN 1 ELSE 0 END AS high_to_retirement,
    CASE WHEN aum_segment = 'MEDIUM' AND cross_sell_product_category = 'LIFE_INSURANCE' THEN 1 ELSE 0 END AS medium_to_life,
    
    -- ============================================
    -- MARKET × CLIENT × PRODUCT (3-way interactions)
    -- ============================================
    
    -- Volatile market + aggressive investor + investment target
    CASE WHEN ABS(snp_close_variation) > 3.0 AND aggressive_investor = 1 AND cross_sell_product_category = 'INVESTMENT' THEN 1 ELSE 0 END AS volatile_aggressive_investment,
    
    -- Stable market + conservative investor + retirement target
    CASE WHEN ABS(snp_close_variation) < 1.0 AND conservative_investor = 1 AND cross_sell_product_category = 'RETIREMENT' THEN 1 ELSE 0 END AS stable_conservative_retirement,
    
    -- Growth market + retirement planning + retirement target
    CASE WHEN snp_close_lead_12 > 10.0 AND retirement_planning_trigger = 1 AND cross_sell_product_category = 'RETIREMENT' THEN 1 ELSE 0 END AS growth_retirement_planning,
    
    -- Volatile market + family protection + life target
    CASE WHEN ABS(snp_close_variation) > 3.0 AND family_protection_trigger = 1 AND cross_sell_product_category = 'LIFE_INSURANCE' THEN 1 ELSE 0 END AS volatile_family_life
    
FROM {target_schema}.multi_product_training_data base
""")

print("✅ Final training dataset created with enhanced interaction features")


✅ Final training dataset created with enhanced interaction features
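
The four volatility flags are not mutually exclusive: they are nested thresholds on `ABS(snp_close_variation)`, and values between 1.0 and 2.0 set none of them. The pure-Python sketch below mirrors the Step 3 thresholds for a single value to make the overlap visible. It assumes a non-null input; in the SQL, a NULL falls through to `ELSE 0` for every flag.

```python
def volatility_flags(close_variation):
    """Mirror the Step 3 volatility thresholds for one snp_close_variation value."""
    magnitude = abs(close_variation)
    return {
        "low_market_volatility": int(magnitude < 1.0),
        "high_market_volatility": int(magnitude > 2.0),
        "volatile_market": int(magnitude > 3.0),
        "extreme_market_volatility": int(magnitude > 5.0),
    }


# Illustrative values only, not taken from the training data.
for value in (0.4, -1.5, 2.5, -3.2, 6.0):
    print(value, volatility_flags(value))
```

Because of the nesting, the count for `high_market_volatility` is always at least the count for `volatile_market`, which in turn is at least the count for `extreme_market_volatility`.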


## Step 4: Feature Summary and Quality Check

### Verify enhanced features and interaction patterns
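
One check that follows from the Step 3 definitions: `quick_cross_sell`, `medium_timing_cross_sell`, and `delayed_cross_sell` partition every record with a non-null `days_to_cross_sell`, while `very_delayed_cross_sell` is a subset of `delayed_cross_sell`. The sketch below mirrors those thresholds in plain Python; the function names are hypothetical and `None` stands in for a SQL NULL.

```python
def timing_flags(days_to_cross_sell):
    """Mirror the Step 3 timing thresholds; None stands in for a SQL NULL."""
    if days_to_cross_sell is None:
        # A comparison with NULL is not true in SQL, so every CASE falls through to ELSE 0.
        return {"quick": 0, "medium": 0, "delayed": 0, "very_delayed": 0}
    return {
        "quick": int(days_to_cross_sell < 365),
        "medium": int(365 <= days_to_cross_sell < 1095),
        "delayed": int(days_to_cross_sell >= 1095),
        "very_delayed": int(days_to_cross_sell >= 1825),
    }


def exclusive_bucket_total(flags):
    """Sum the three mutually exclusive buckets (very_delayed is excluded as a subset)."""
    return flags["quick"] + flags["medium"] + flags["delayed"]


# Illustrative values only, chosen to sit on and around the bucket boundaries.
for days in (30, 365, 1094, 1095, 2000, None):
    flags = timing_flags(days)
    print(days, flags, exclusive_bucket_total(flags))
```

In the summary below, the first three timing counts should therefore add up to the number of records with a non-null `days_to_cross_sell`.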


In [0]:
# Step 4: Feature summary and quality check

print("=== Final Training Dataset Summary ===\n")

# Check total records
total_records = spark.sql(f"SELECT COUNT(*) AS total FROM {target_schema}.multi_product_final_training_data").collect()[0]['total']
print(f"Total training records: {total_records:,}")

# Check market context indicators
print("\n=== Market Context Indicators Distribution ===")
market_dist = spark.sql(f"""
SELECT 
    SUM(high_market_volatility) AS high_volatility,
    SUM(extreme_market_volatility) AS extreme_volatility,
    SUM(low_market_volatility) AS low_volatility,
    SUM(stable_growth_market) AS stable_growth,
    SUM(volatile_market) AS volatile_market,
    SUM(declining_market) AS declining_market,
    SUM(market_uptrend_3m) AS uptrend_3m,
    SUM(market_uptrend_6m) AS uptrend_6m,
    SUM(market_uptrend_12m) AS uptrend_12m
FROM {target_schema}.multi_product_final_training_data
""").collect()[0]

for condition, count in market_dist.asDict().items():
    percentage = (count / total_records) * 100 if total_records > 0 else 0
    print(f"{condition}: {count:,} clients ({percentage:.1f}%)")

# Check product affinity patterns
print("\n=== Product Affinity Patterns Distribution ===")
affinity_dist = spark.sql(f"""
SELECT 
    SUM(life_to_retirement_pattern) AS life_to_retirement,
    SUM(life_to_investment_pattern) AS life_to_investment,
    SUM(retirement_to_investment_pattern) AS retirement_to_investment,
    SUM(retirement_to_life_pattern) AS retirement_to_life,
    SUM(investment_to_retirement_pattern) AS investment_to_retirement,
    SUM(investment_to_life_pattern) AS investment_to_life
FROM {target_schema}.multi_product_final_training_data
""").collect()[0]

for pattern, count in affinity_dist.asDict().items():
    print(f"{pattern}: {count:,} clients")

# Check ideal candidate features (for explainability)
print("\n=== Ideal Candidate Features Distribution ===")
ideal_dist = spark.sql(f"""
SELECT 
    SUM(ideal_retirement_candidate) AS ideal_retirement,
    SUM(ideal_investment_candidate) AS ideal_investment,
    SUM(ideal_life_candidate) AS ideal_life,
    SUM(ideal_network_candidate) AS ideal_network
FROM {target_schema}.multi_product_final_training_data
""").collect()[0]

for candidate, count in ideal_dist.asDict().items():
    percentage = (count / total_records) * 100 if total_records > 0 else 0
    print(f"{candidate}: {count:,} clients ({percentage:.1f}%)")

# Check cross-sell timing distribution
print("\n=== Cross-Sell Timing Distribution ===")
timing_dist = spark.sql(f"""
SELECT 
    SUM(quick_cross_sell) AS quick_cross_sell,
    SUM(medium_timing_cross_sell) AS medium_timing,
    SUM(delayed_cross_sell) AS delayed_cross_sell,
    SUM(very_delayed_cross_sell) AS very_delayed
FROM {target_schema}.multi_product_final_training_data
""").collect()[0]

for timing, count in timing_dist.asDict().items():
    percentage = (count / total_records) * 100 if total_records > 0 else 0
    print(f"{timing}: {count:,} clients ({percentage:.1f}%)")

print("\n✅ Feature engineering validation complete!")


=== Final Training Dataset Summary ===

Total training records: 297,376

=== Market Context Indicators Distribution ===
high_volatility: 217,318 clients (73.1%)
extreme_volatility: 81,202 clients (27.3%)
low_volatility: 45,051 clients (15.1%)
stable_growth: 34,400 clients (11.6%)
volatile_market: 162,704 clients (54.7%)
declining_market: 28,305 clients (9.5%)
uptrend_3m: 213,603 clients (71.8%)
uptrend_6m: 226,528 clients (76.2%)
uptrend_12m: 241,786 clients (81.3%)

=== Product Affinity Patterns Distribution ===
life_to_retirement: 2,430 clients
life_to_investment: 1,512 clients
retirement_to_investment: 8,913 clients
retirement_to_life: 2,023 clients
investment_to_retirement: 30,941 clients
investment_to_life: 8,872 clients

=== Ideal Candidate Features Distribution ===
ideal_retirement: 5,686 clients (1.9%)
ideal_investment: 40 clients (0.0%)
ideal_life: 656 clients (0.2%)
ideal_network: 1,906 clients (0.6%)

=== Cross-Sell Timing Distribution ===
quick_cross_sell: 159,125 clients (

## Step 5: Sample Data Preview with Explainability Features

### Preview the final training data showing key interaction and affinity features
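
As an illustration of how the ideal-candidate flags could feed an agent conversation, the sketch below turns the flags set on a row into short reasons. The `REASONS` wording and the `explain` helper are hypothetical; each reason only restates the Step 3 condition behind the flag.

```python
# Hypothetical wording; each reason restates the Step 3 condition for that flag.
REASONS = {
    "ideal_retirement_candidate": "age 55+, account value above 100,000, retirement planning trigger",
    "ideal_investment_candidate": "age under 45, account value above 50,000, wealth building trigger",
    "ideal_life_candidate": "age 30 to 54, family protection trigger",
    "ideal_network_candidate": "account value above 200,000",
}


def explain(row):
    """Return one readable reason per ideal-candidate flag that is set to 1 on the row."""
    return [f"{flag}: {reason}" for flag, reason in REASONS.items() if row.get(flag) == 1]


# Made-up row for illustration; in the notebook a Spark Row can be converted with row.asDict().
example_row = {"ideal_retirement_candidate": 1, "ideal_life_candidate": 0}
print(explain(example_row))
```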


In [0]:
# Step 5: Sample data preview with explainability features
print("=== Sample Final Training Data ===\n")

# Show key features for model training
sample_data = spark.sql(f"""
SELECT 
    axa_party_id,
    client_age,
    acct_val_amt,
    product_category AS current_product,
    cross_sell_product_category AS target_product,
    days_to_cross_sell,
    
    -- Primary targets
    life_insurance_cross_sell,
    retirement_cross_sell,
    investment_cross_sell,
    
    -- Life stage triggers
    retirement_planning_trigger,
    family_protection_trigger,
    wealth_building_trigger,
    
    -- Market context
    high_market_volatility,
    stable_growth_market,
    market_uptrend_6m,
    
    -- Product affinity patterns
    life_to_retirement_pattern,
    retirement_to_investment_pattern,
    
    -- Ideal candidate features (for explainability)
    ideal_retirement_candidate,
    ideal_investment_candidate,
    ideal_life_candidate,
    
    -- Interaction features
    senior_retirement_target,
    high_aum_investment_target,
    advisor_to_investment
    
FROM {target_schema}.multi_product_final_training_data
LIMIT 10
""")

sample_data.show(truncate=False)

# Show feature counts
print("\n=== Total Feature Count ===")
total_columns = len(spark.sql(f"SELECT * FROM {target_schema}.multi_product_final_training_data LIMIT 1").columns)
print(f"Total columns in final training data: {total_columns}")
print("✅ Final training dataset ready for model training!")


=== Sample Final Training Data ===

+--------------------+----------+------------+----------------+----------------+------------------+-------------------------+---------------------+---------------------+---------------------------+-------------------------+-----------------------+----------------------+--------------------+-----------------+--------------------------+--------------------------------+--------------------------+--------------------------+--------------------+------------------------+--------------------------+---------------------+
|axa_party_id        |client_age|acct_val_amt|current_product |target_product  |days_to_cross_sell|life_insurance_cross_sell|retirement_cross_sell|investment_cross_sell|retirement_planning_trigger|family_protection_trigger|wealth_building_trigger|high_market_volatility|stable_growth_market|market_uptrend_6m|life_to_retirement_pattern|retirement_to_investment_pattern|ideal_retirement_candidate|ideal_investment_candidate|ideal_life_candidate|s

## Summary

### This notebook creates:
1. **Market Context Indicators**: S&P 500 volatility, trends, momentum using actual fields (11 features)
2. **Product Affinity Patterns**: FROM → TO transition features (6 patterns)
3. **Age × Product Interactions**: Senior, mid-career, young client targeting (8 features)
4. **AUM × Product Interactions**: High, medium AUM targeting (5 features)
5. **Channel × Product Interactions**: Branch, advisor channel affinity (4 features)
6. **Market × Product Affinity**: Volatile/stable market product demand (4 features)
7. **Risk Tolerance × Product**: Aggressive/conservative investor targeting (4 features)
8. **Life Stage × Product**: Trigger-based product affinity (7 features)
9. **Cross-Sell Timing Features**: Quick, medium, delayed patterns (6 features)
10. **Product Category × Market Conditions**: Current product in market context (3 features)
11. **Ideal Candidate Features**: Combined affinity for explainability (4 features)
12. **Tenure × Product Affinity**: Long-tenure and new-client targeting (4 features)
13. **AUM Segment × Product Affinity**: Segment-based targeting (4 features)
14. **3-Way Interactions**: Market × Client × Product combinations (4 features)

### Total Enhanced Features Added: 74 market context, interaction, and affinity features
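
The feature total can be recomputed from the Step 3 SELECT text rather than tallied by hand. A minimal sketch, assuming every engineered feature is written as `CASE ... END AS <alias>` as in Step 3; the helper `engineered_aliases` is a hypothetical name and the SQL snippet is a two-feature excerpt.

```python
import re

# Matches the alias that follows the END of a CASE expression.
ALIAS_PATTERN = re.compile(r"\bEND\s+AS\s+(\w+)", re.IGNORECASE)


def engineered_aliases(select_sql):
    """Return the aliases of CASE ... END AS <alias> expressions, in order of appearance."""
    return ALIAS_PATTERN.findall(select_sql)


# Two-feature excerpt of the Step 3 SELECT, including one multi-line CASE.
sample_sql = """
    CASE WHEN ABS(snp_close_variation) > 2.0 THEN 1 ELSE 0 END AS high_market_volatility,
    CASE WHEN client_age >= 55 AND acct_val_amt > 100000
              AND cross_sell_product_category = 'RETIREMENT' THEN 1 ELSE 0 END AS ideal_retirement_candidate
"""
aliases = engineered_aliases(sample_sql)
print(len(aliases), aliases)
```

Comparing `len(aliases)` with `len(set(aliases))` on the full query would also reveal any alias defined twice.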

### Output Table:
- **multi_product_final_training_data**: Complete training dataset with all features ready for model training

### Key Innovations:
- **Uses actual S&P fields**: No simulated data, only real market indicators
- **Product affinity patterns**: Encodes FROM → TO transitions as explicit features
- **Explainability ready**: Ideal candidate features for agent conversations
- **Business aligned**: Features match how agents think about clients
- **Timing aware**: Cross-sell speed patterns for urgency scoring

### Next Steps:
- **Notebook 03**: Train multi-output XGBoost models using this final dataset
- **Focus on 4 main categories**: Life, Retirement, Investment, Network (Health/Disability optional)
- **Use AutoML or custom training**: Both approaches supported
- **Feature importance**: Will show which interaction features matter most

### Expected Feature Importance:
We expect the following feature groups to be the most predictive; the feature importance output from Notebook 03 will confirm or correct this ranking:
- Product affinity patterns (FROM → TO)
- Ideal candidate features (combined triggers)
- Age × Product interactions
- AUM × Product interactions
- Market context indicators
