# Causal Analysis for Cross-Modal Audience Intelligence

This notebook demonstrates how to perform causal analysis on audience engagement data, including:
- Causal structure discovery
- Causal effect estimation
- Counterfactual analysis
- Causal feature selection
- Actionable recommendations for optimizing content

We'll use the structural causal model and counterfactual analysis tools from the CAIP platform to understand what truly drives audience engagement, going beyond simple correlations.

In [1]:
%pip install pandas numpy matplotlib seaborn networkx scipy tqdm torch torchvision scikit-learn pillow


Note: you may need to restart the kernel to use updated packages.


In [None]:
# Import required libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import networkx as nx
from scipy import stats
from typing import Dict, List, Tuple, Optional, Union, Any
import torch
from tqdm.notebook import tqdm
import time
import warnings
import os
import json

# Import CAIP causal inference components
from causal.structural_model import CausalGraph, StructuralCausalModel
from causal.causal_features import CausalFeatureSelector
from causal.counterfactual import CounterfactualAnalyzer

# Import other CAIP components as needed
from models.text.text_features import TextFeatureExtractor
from models.visual.visual_features import VisualFeatureExtractor
from data.data_loader import DataLoader

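# Configure matplotlib
# 'seaborn-whitegrid' was renamed to 'seaborn-v0_8-whitegrid' in matplotlib 3.6
style_name = 'seaborn-v0_8-whitegrid' if 'seaborn-v0_8-whitegrid' in plt.style.available else 'seaborn-whitegrid'
plt.style.use(style_name)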
plt.rcParams['figure.figsize'] = (14, 8)
plt.rcParams['font.size'] = 12

# Suppress warnings
warnings.filterwarnings('ignore')

# Set seed for reproducibility
np.random.seed(42)
torch.manual_seed(42)

## 1. Data Loading and Preprocessing

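Let's load our audience engagement dataset and prepare it for causal analysis. We'll use a combination of Nielsen panel data, streaming platform metrics, and content features. If no processed data is found, the cell below falls back to synthetic data generated from a known causal structure, which gives us a ground truth to check the analysis against.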

In [None]:
# Configure paths
DATA_DIR = "./data"
MODELS_DIR = "./models/saved"
RESULTS_DIR = "./results/causal"

# Create directories if they don't exist
os.makedirs(RESULTS_DIR, exist_ok=True)
os.makedirs(MODELS_DIR, exist_ok=True)  # needed later when saving the causal model

# Initialize data loader
data_loader = DataLoader(cache_dir=f"{DATA_DIR}/cache")

try:
    # Try to load processed data
    audience_data = data_loader.load_processed_data()
    print(f"Loaded audience data with {len(audience_data)} entries")
except FileNotFoundError:
    print("No processed data found, creating synthetic data for demonstration")
    
    # Generate synthetic data for demonstration
    # This would be replaced with real data loading in production
    np.random.seed(42)
    n_samples = 1000
    
    # Create synthetic content features and engagement data
    audience_data = pd.DataFrame({
        'content_id': [f'SHOW{i:03d}' for i in range(n_samples)],
        'content_duration': np.random.randint(15, 120, n_samples),  # in minutes
        'episode_number': np.random.randint(1, 20, n_samples),
        'genre_drama': np.random.randint(0, 2, n_samples),
        'genre_comedy': np.random.randint(0, 2, n_samples),
        'genre_action': np.random.randint(0, 2, n_samples),
        'has_popular_actor': np.random.randint(0, 2, n_samples),
        'budget_tier': np.random.randint(1, 4, n_samples),  # 1=low, 2=medium, 3=high
        'promotion_level': np.random.randint(1, 5, n_samples),  # 1-4 scale
        'title_length': np.random.randint(1, 8, n_samples),  # in words
        'is_sequel': np.random.randint(0, 2, n_samples),
        'thumbnail_brightness': np.random.uniform(0.2, 0.9, n_samples),
        'thumbnail_saturation': np.random.uniform(0.3, 1.0, n_samples),
        'content_freshness': np.random.randint(0, 100, n_samples),  # days since release
        'description_sentiment': np.random.normal(0.2, 0.3, n_samples).clip(-1, 1),
        'text_complexity': np.random.uniform(0.1, 0.9, n_samples),
        'social_media_mentions': np.random.exponential(scale=50, size=n_samples).astype(int),
        'weekday_release': np.random.randint(0, 2, n_samples),
        'release_month': np.random.randint(1, 13, n_samples),
        'similar_content_performance': np.random.normal(0.5, 0.15, n_samples).clip(0, 1)
    })
    
    # Define the true causal structure for synthetic data generation
    # This is our "ground truth" for demonstration purposes
    
    # Create engagement with a known causal structure
    audience_data['engagement'] = (
        0.3 * audience_data['promotion_level'] / 4 +
        0.2 * audience_data['has_popular_actor'] +
        0.15 * (audience_data['thumbnail_brightness'] > 0.6).astype(int) +
        -0.1 * (audience_data['content_duration'] > 90).astype(int) +
        0.2 * audience_data['genre_action'] -
        0.15 * audience_data['content_freshness'] / 100 +
        0.25 * audience_data['similar_content_performance'] +
        0.1 * np.random.normal(0, 1, n_samples)  # Noise term
    )
    
    # Normalize engagement to [0, 1] range
    audience_data['engagement'] = (audience_data['engagement'] - audience_data['engagement'].min()) / \
                              (audience_data['engagement'].max() - audience_data['engagement'].min())
    
    print(f"Created synthetic data with {len(audience_data)} samples")

# Display the first few rows
audience_data.head()

## 2. Exploratory Analysis

Before diving into causal analysis, let's explore the correlations between features and engagement to see what traditional analysis would tell us.

In [None]:
# Calculate correlation with engagement (numeric columns only; content_id is a string)
correlation_with_engagement = audience_data.corr(numeric_only=True)['engagement'].sort_values(ascending=False)

# Plot correlation with engagement
plt.figure(figsize=(12, 8))
correlation_with_engagement.drop('engagement').plot(kind='bar')
plt.title('Correlation of Features with Engagement')
plt.xlabel('Features')
plt.ylabel('Pearson Correlation')
plt.axhline(y=0, color='black', linestyle='--', alpha=0.3)
plt.grid(axis='y', alpha=0.3)
plt.tight_layout()
plt.show()

# Display top correlations
print("Top correlations with engagement:")
print(correlation_with_engagement.head(10))

print("\nBottom correlations with engagement:")
print(correlation_with_engagement.tail(10))

In [None]:
# Visualize the relationship between key features and engagement
plt.figure(figsize=(18, 12))

# Select top correlated features
top_features = correlation_with_engagement.drop('engagement').abs().sort_values(ascending=False).head(6).index

for i, feature in enumerate(top_features):
    plt.subplot(2, 3, i+1)
    
    # For categorical features, use box plot
    if audience_data[feature].nunique() < 10:
        sns.boxplot(x=feature, y='engagement', data=audience_data)
        plt.title(f'{feature} vs Engagement')
    # For continuous features, use scatter plot with regression line
    else:
        sns.regplot(x=feature, y='engagement', data=audience_data, scatter_kws={'alpha':0.5}, line_kws={'color':'red'})
        plt.title(f'{feature} vs Engagement (r={correlation_with_engagement[feature]:.2f})')
    
    plt.tight_layout()

plt.suptitle('Relationship Between Top Features and Engagement', fontsize=16, y=1.02)
plt.tight_layout()
plt.show()

### The Problem with Correlation

While correlation analysis gives us a starting point, it has critical limitations:

1. **Spurious Correlations**: Features may correlate with engagement due to common causes, not direct effects
2. **Confounding Variables**: Hidden factors can make non-causal features appear important
3. **No Direction of Causality**: Correlation doesn't tell us if A causes B or vice versa
4. **No Intervention Prediction**: Correlation can't reliably predict what will happen if we change a feature

We need causal analysis to overcome these limitations and discover the true drivers of engagement.
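
To make the first two points concrete, here is a minimal sketch on simulated data. The variable names and coefficients are hypothetical and are not taken from the dataset above: a hidden `budget` variable drives both `promotion` and `engagement`, while `promotion` has no direct effect at all. The raw correlation is still large, and it shrinks toward zero once we remove the linear influence of the common cause.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5_000

# Hypothetical setup: budget drives both promotion and engagement,
# while promotion has no direct effect on engagement.
budget = rng.normal(size=n)
promotion = 0.8 * budget + rng.normal(scale=0.6, size=n)
engagement = 0.7 * budget + rng.normal(scale=0.6, size=n)

# The raw correlation looks substantial even though promotion is not a cause here.
raw_corr = np.corrcoef(promotion, engagement)[0, 1]


def residualize(y, x):
    """Return the part of y that is not linearly explained by x (with intercept)."""
    design = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return y - design @ coef


# Partial correlation: correlate what is left after removing budget from both.
partial_corr = np.corrcoef(
    residualize(promotion, budget), residualize(engagement, budget)
)[0, 1]

print(f"Raw correlation:     {raw_corr:.3f}")
print(f"Partial correlation: {partial_corr:.3f}")
```

In real data the common cause is often unobserved, which is why we need an explicit causal graph rather than a longer list of control variables.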

## 3. Causal Structure Discovery

Let's discover the causal structure (graph) in our data to understand the relationships between variables.
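
The cell below asks CAIP to use the PC algorithm, which builds the graph by running many conditional independence tests at significance level `alpha`. As a rough illustration of a single such test, here is a sketch of the Fisher z-test on partial correlations, a common choice for continuous data. This is an assumption about the kind of test involved, not a description of what `StructuralCausalModel` does internally, and `fisher_z_independence` is a hypothetical helper.

```python
import numpy as np
from scipy import stats


def fisher_z_independence(data, i, j, cond=(), alpha=0.05):
    """Test whether columns i and j are independent given the columns in cond.

    Returns (independent, p_value). Assumes roughly linear-Gaussian data.
    """
    idx = [i, j, *cond]
    corr = np.corrcoef(data[:, idx], rowvar=False)
    precision = np.linalg.inv(corr)
    # Partial correlation of i and j given cond, read off the precision matrix.
    r = -precision[0, 1] / np.sqrt(precision[0, 0] * precision[1, 1])
    r = np.clip(r, -0.999999, 0.999999)
    n = data.shape[0]
    z = 0.5 * np.log((1 + r) / (1 - r)) * np.sqrt(n - len(cond) - 3)
    p_value = float(2 * stats.norm.sf(abs(z)))
    return bool(p_value > alpha), p_value


# Simulated chain x -> y -> z: x and z are dependent, but independent given y.
rng = np.random.default_rng(1)
n = 2_000
x = rng.normal(size=n)
y = 0.9 * x + rng.normal(size=n)
z = 0.9 * y + rng.normal(size=n)
data = np.column_stack([x, y, z])

marginal_indep, p_marginal = fisher_z_independence(data, 0, 2)
conditional_indep, p_conditional = fisher_z_independence(data, 0, 2, cond=(1,))

print(f"x independent of z?         {marginal_indep} (p={p_marginal:.3g})")
print(f"x independent of z given y? {conditional_indep} (p={p_conditional:.3g})")
```

PC removes the edge between two variables as soon as it finds a conditioning set that makes them independent, so a lower `alpha` tends to produce a sparser graph.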

In [None]:
# Select features for causal analysis (exclude IDs and other non-causal variables)
causal_features = audience_data.drop(columns=['content_id', 'engagement']).columns.tolist()

# Initialize structural causal model
causal_model = StructuralCausalModel(
    discovery_method='pc',  # Use PC algorithm for causal discovery
    alpha=0.05,  # Significance level for independence tests
    feature_names=causal_features
)

# Discover causal graph
print("Discovering causal graph... This may take some time for complex data.")
start_time = time.time()

causal_graph = causal_model.discover_graph(
    data=audience_data[causal_features + ['engagement']],
    outcome_var='engagement',
    treatment_vars=['promotion_level', 'has_popular_actor', 'thumbnail_brightness',
                   'genre_drama', 'genre_comedy', 'genre_action']
)

print(f"Causal graph discovery completed in {time.time() - start_time:.2f} seconds")

# Display some basic info about the graph
print(f"Number of nodes: {len(causal_graph.graph.nodes())}")
print(f"Number of edges: {len(causal_graph.graph.edges())}")

In [None]:
# Visualize the causal graph
plt.figure(figsize=(20, 12))

# Create layout
pos = nx.spring_layout(causal_graph.graph, seed=42, k=1.5)

# Draw the graph
nx.draw(
    causal_graph.graph, pos,
    with_labels=True,
    node_color='lightblue',
    node_size=3000,
    font_size=10,
    font_weight='bold',
    edge_color='gray',
    arrowsize=20,
    arrowstyle='->',
    width=1.5,
    alpha=0.9
)

# Highlight the outcome node (engagement)
nx.draw_networkx_nodes(
    causal_graph.graph,
    pos,
    nodelist=['engagement'],
    node_color='lightcoral',
    node_size=3500
)

# Highlight direct causes of engagement
direct_causes = list(causal_graph.get_parents('engagement'))
if direct_causes:
    nx.draw_networkx_nodes(
        causal_graph.graph,
        pos,
        nodelist=direct_causes,
        node_color='lightgreen',
        node_size=3500
    )
    
    # Highlight edges from direct causes to engagement
    edges_to_engagement = [(cause, 'engagement') for cause in direct_causes]
    nx.draw_networkx_edges(
        causal_graph.graph,
        pos,
        edgelist=edges_to_engagement,
        edge_color='green',
        width=3,
        arrowsize=25
    )

plt.title('Causal Graph of Audience Engagement Factors', fontsize=16)
plt.axis('off')

# Add legend
from matplotlib.lines import Line2D
legend_elements = [
    Line2D([0], [0], marker='o', color='w', markerfacecolor='lightblue', markersize=15, label='Feature'),
    Line2D([0], [0], marker='o', color='w', markerfacecolor='lightgreen', markersize=15, label='Direct Cause of Engagement'),
    Line2D([0], [0], marker='o', color='w', markerfacecolor='lightcoral', markersize=15, label='Engagement (Outcome)'),
    Line2D([0], [0], color='gray', lw=2, label='Causal Relationship'),
    Line2D([0], [0], color='green', lw=2, label='Direct Effect on Engagement')
]
plt.legend(handles=legend_elements, loc='lower right', fontsize=12)

plt.tight_layout()
plt.savefig(f"{RESULTS_DIR}/causal_graph.png", dpi=300, bbox_inches='tight')
plt.show()

### Analyzing Causal Paths

Let's analyze the causal paths to understand both direct and indirect effects on engagement.
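
Before querying the discovered graph, it may help to see the same concepts on a small hand-built example. The sketch below uses plain `networkx` on a hypothetical toy graph (the edges are made up for illustration and are not the discovered structure). It treats a confounder of a treatment as any ancestor of the treatment that can still reach the outcome once the treatment node is removed.

```python
import networkx as nx

# Hypothetical toy graph, for illustration only.
graph = nx.DiGraph([
    ("budget_tier", "promotion_level"),
    ("budget_tier", "has_popular_actor"),
    ("promotion_level", "social_media_mentions"),
    ("promotion_level", "engagement"),
    ("has_popular_actor", "engagement"),
    ("social_media_mentions", "engagement"),
])
outcome = "engagement"

# Direct causes are the parents of the outcome.
direct = sorted(graph.predecessors(outcome))

# Indirect causes are ancestors that are not parents.
indirect = sorted(nx.ancestors(graph, outcome) - set(direct))


def confounders_of(graph, treatment, outcome):
    """Ancestors of the treatment that reach the outcome without passing through it."""
    pruned = graph.copy()
    pruned.remove_node(treatment)
    return sorted(
        node
        for node in nx.ancestors(graph, treatment)
        if nx.has_path(pruned, node, outcome)
    )


print("Direct causes:  ", direct)
print("Indirect causes:", indirect)
print("Confounders of promotion_level:", confounders_of(graph, "promotion_level", outcome))
```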

In [None]:
# Identify direct causes of engagement
direct_causes = list(causal_graph.get_parents('engagement'))
print(f"Direct causes of engagement: {direct_causes}")

# Find indirect causes (ancestors of engagement excluding direct causes)
all_ancestors = causal_graph.get_ancestors('engagement')
indirect_causes = [node for node in all_ancestors if node not in direct_causes]
print(f"Indirect causes of engagement: {indirect_causes}")

# Find potential confounders: parents of a direct cause that are
# themselves direct causes of engagement
engagement_parents = set(direct_causes)
confounders = set()
for cause in direct_causes:
    cause_parents = set(causal_graph.get_parents(cause))
    confounders.update(cause_parents & engagement_parents)
print(f"Potential confounders: {confounders}")

# Calculate the minimal adjustment set for causal effect estimation
for cause in direct_causes:
    adjustment_set = causal_graph.get_minimal_adjustment_set(cause, 'engagement')
    print(f"Adjustment set for {cause} -> engagement: {adjustment_set}")

## 4. Causal Effect Estimation

Now that we've discovered the causal structure, let's estimate the causal effects to quantify how much each factor truly influences engagement.
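
The cell below uses the `backdoor` method. As a minimal sketch of the idea, assuming a linear model and a single observed confounder, backdoor adjustment amounts to including the adjustment set as a regressor. The data and coefficients here are simulated and hypothetical, and CAIP's own estimator may work differently (for example, through the random forest models fitted below).

```python
import numpy as np

rng = np.random.default_rng(7)
n = 10_000

# Simulated data: z confounds the treatment t and the outcome y.
# The true causal effect of t on y is 0.3.
z = rng.normal(size=n)
t = 0.8 * z + rng.normal(size=n)
y = 0.3 * t + 0.5 * z + rng.normal(scale=0.5, size=n)


def ols_slope(outcome, *regressors):
    """Least-squares coefficient of the first regressor (intercept included)."""
    design = np.column_stack([np.ones(len(outcome)), *regressors])
    coef, *_ = np.linalg.lstsq(design, outcome, rcond=None)
    return coef[1]


naive = ols_slope(y, t)        # ignores the confounder, so it is biased upward
adjusted = ols_slope(y, t, z)  # conditions on the backdoor adjustment set {z}

print(f"Naive estimate:    {naive:.3f}")
print(f"Adjusted estimate: {adjusted:.3f} (true effect: 0.3)")
```

The adjusted estimate is only trustworthy if the adjustment set really blocks every backdoor path, which is why the graph from the previous section matters.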

In [None]:
# Fit models for estimating causal effects
print("Fitting structural models...")
causal_model.fit_models(
    data=audience_data[causal_features + ['engagement']],
    model_type='random_forest',  # Use random forest for flexible modeling
    outcome_vars=['engagement'] + direct_causes  # Model direct causes and outcome
)

# Estimate causal effects for all potential causal factors
print("Estimating causal effects...")
causal_effects = causal_model.estimate_all_effects(
    data=audience_data[causal_features + ['engagement']],
    outcome='engagement',
    method='backdoor',  # Use backdoor adjustment method
    min_effect=0.01  # Minimum effect size to consider
)

In [None]:
# Convert effects to DataFrame for easier analysis
effects_df = pd.DataFrame([
    {
        'feature': feature,
        'causal_effect': effect['causal_effect'],
        'p_value': effect.get('p_value', float('nan')),
        'significant': effect.get('p_value', 1.0) < 0.05,
        'relative_effect': effect.get('relative_effect', float('nan'))
    }
    for feature, effect in causal_effects.items()
])

# Sort by absolute causal effect
effects_df['abs_effect'] = effects_df['causal_effect'].abs()
effects_df = effects_df.sort_values('abs_effect', ascending=False).drop('abs_effect', axis=1)

# Display causal effects
effects_df

In [None]:
# Visualize causal effects
plt.figure(figsize=(12, 10))

# Plot horizontal bars of the estimated effects
effects = effects_df['causal_effect'].values
features = effects_df['feature'].values
significant = effects_df['significant'].values

# Use different colors for significant vs non-significant effects
colors = ['green' if sig else 'gray' for sig in significant]
alphas = [0.8 if sig else 0.5 for sig in significant]

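y_pos = np.arange(len(features))
bars = plt.barh(y_pos, effects, color=colors)
# barh accepts only a single alpha value, so set transparency per bar
for bar, alpha in zip(bars, alphas):
    bar.set_alpha(alpha)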

# Add feature names
plt.yticks(y_pos, features)
plt.xlabel('Causal Effect on Engagement')
plt.title('Estimated Causal Effects on Audience Engagement')

# Add a vertical line at zero
plt.axvline(x=0, color='black', linestyle='--', alpha=0.7)

# Add a legend
from matplotlib.patches import Patch
legend_elements = [
    Patch(facecolor='green', alpha=0.8, label='Significant (p < 0.05)'),
    Patch(facecolor='gray', alpha=0.5, label='Not Significant')
]
plt.legend(handles=legend_elements, loc='lower right')

plt.grid(axis='x', alpha=0.3)
plt.tight_layout()
plt.savefig(f"{RESULTS_DIR}/causal_effects.png", dpi=300, bbox_inches='tight')
plt.show()

### Correlation vs. Causation

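Now let's compare the correlation analysis with our causal analysis to see where they disagree. The two quantities may be on different scales (a correlation is unitless, while a causal effect is typically expressed in units of the outcome), so treat the difference column as a rough ranking aid rather than a precise measure.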

In [None]:
# Create a dataframe for comparison
comparison_df = pd.DataFrame({
    'feature': effects_df['feature'],
    'causal_effect': effects_df['causal_effect'],
    'correlation': [correlation_with_engagement.get(feature, 0) 
                   for feature in effects_df['feature']]
})

# Calculate the difference
comparison_df['difference'] = comparison_df['causal_effect'] - comparison_df['correlation']
comparison_df['abs_difference'] = comparison_df['difference'].abs()

# Sort by absolute difference to highlight the most misleading correlations
comparison_df = comparison_df.sort_values('abs_difference', ascending=False)

# Display comparison
comparison_df[['feature', 'causal_effect', 'correlation', 'difference']]

In [None]:
# Visualize the comparison
plt.figure(figsize=(14, 10))

# Select top differences for clarity
top_diff = comparison_df.head(12)

# Set up bar positions
bar_width = 0.35
r1 = np.arange(len(top_diff))
r2 = [x + bar_width for x in r1]

# Create bars
plt.barh(r1, top_diff['causal_effect'], bar_width, label='Causal Effect', color='green', alpha=0.7)
plt.barh(r2, top_diff['correlation'], bar_width, label='Correlation', color='blue', alpha=0.7)

# Add feature names
plt.yticks([r + bar_width/2 for r in r1], top_diff['feature'])

# Add a vertical line at zero
plt.axvline(x=0, color='black', linestyle='--', alpha=0.5)

plt.xlabel('Effect on Engagement')
plt.title('Causal Effect vs. Correlation: Top Differences')
plt.legend()

plt.grid(axis='x', alpha=0.3)
plt.tight_layout()
plt.savefig(f"{RESULTS_DIR}/causal_vs_correlation.png", dpi=300, bbox_inches='tight')
plt.show()

## 5. Counterfactual Analysis

Now let's perform counterfactual analysis to understand how interventions on specific features would affect engagement.
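
A counterfactual for a single instance is usually computed in three steps: abduction (infer the instance's noise term from what we observed), action (set the intervened feature), and prediction (recompute the outcome with the same noise). The sketch below shows these steps for a hypothetical linear model with additive noise. The weights, intercept, and `counterfactual_linear` helper are illustrative assumptions, not CAIP's implementation, and the sketch ignores any downstream effects of the intervened feature on other features.

```python
def counterfactual_linear(observed, weights, intercept, intervention, outcome="engagement"):
    """Counterfactual outcome under a linear model with additive noise."""
    # 1. Abduction: recover this instance's noise from the factual outcome.
    factual_prediction = intercept + sum(w * observed[name] for name, w in weights.items())
    noise = observed[outcome] - factual_prediction

    # 2. Action: overwrite the intervened features.
    modified = {**observed, **intervention}

    # 3. Prediction: recompute the outcome, keeping the same noise term.
    return intercept + sum(w * modified[name] for name, w in weights.items()) + noise


# Hypothetical model and instance.
weights = {"promotion_level": 0.075, "has_popular_actor": 0.2}
intercept = 0.1
instance = {"promotion_level": 2, "has_popular_actor": 0, "engagement": 0.40}

cf_engagement = counterfactual_linear(
    instance, weights, intercept, intervention={"has_popular_actor": 1}
)
print(f"Factual engagement:        {instance['engagement']:.2f}")
print(f"Counterfactual engagement: {cf_engagement:.2f}")
```

Keeping the noise term fixed is what separates a counterfactual for this specific show from a population-level prediction for shows with a popular actor.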

In [None]:
# Initialize counterfactual analyzer
cf_analyzer = CounterfactualAnalyzer(causal_model)

# Select a sample instance for counterfactual analysis
# Choose an instance with moderate engagement
engagement_median = audience_data['engagement'].median()
sample_instance = audience_data[
    (audience_data['engagement'] > engagement_median - 0.05) &
    (audience_data['engagement'] < engagement_median + 0.05)
].iloc[0]

print("Sample instance for counterfactual analysis:")
for col in ['content_id', 'engagement'] + effects_df['feature'].head(8).tolist():
    print(f"{col}: {sample_instance[col]}")

In [None]:
# Generate counterfactuals for top causal features
top_causal_features = effects_df[effects_df['significant']]['feature'].head(5).tolist()
counterfactuals = {}

print("Generating counterfactuals for top causal features...")
for feature in top_causal_features:
    # For binary features
    if audience_data[feature].nunique() <= 2:
        # Flip the value (0 to 1 or 1 to 0)
        new_value = 1 - sample_instance[feature]
        intervention = {feature: new_value}
        
    # For ordinal and continuous features
    else:
        # Increase by 50% of the current magnitude (so negative values also move upward)
        current_value = sample_instance[feature]
        new_value = current_value + 0.5 * abs(current_value)
        # Ensure we don't exceed the range of the data
        new_value = min(new_value, audience_data[feature].max())
        intervention = {feature: new_value}
    
    # Generate counterfactual
    result = cf_analyzer.generate_counterfactual(
        data=audience_data[causal_features + ['engagement']],
        interventions=intervention,
        outcome_var='engagement',
        reference_values=sample_instance.to_dict()
    )
    
    counterfactuals[feature] = result
    
    print(f"\nCounterfactual for {feature}:")
    print(f"  Current value: {sample_instance[feature]}")
    print(f"  Intervention: {intervention[feature]}")
    print(f"  Factual engagement: {result['factual_outcome']:.4f}")
    print(f"  Counterfactual engagement: {result['counterfactual_outcome']:.4f}")
    print(f"  Change: {result['outcome_change']:.4f} ({result['outcome_change_percent']:.2f}%)")

In [None]:
# Visualize counterfactual results
plt.figure(figsize=(14, 8))

# Extract data for plotting
features = list(counterfactuals.keys())
factual = [cf['factual_outcome'] for cf in counterfactuals.values()]
counterfactual = [cf['counterfactual_outcome'] for cf in counterfactuals.values()]
changes = [cf['outcome_change'] for cf in counterfactuals.values()]

# Calculate positions
x = np.arange(len(features))
width = 0.35

# Create grouped bar chart
plt.bar(x - width/2, factual, width, label='Factual Engagement', color='blue', alpha=0.7)
plt.bar(x + width/2, counterfactual, width, label='Counterfactual Engagement', color='green', alpha=0.7)

# Add feature names
plt.xticks(x, features)
plt.xlabel('Intervened Feature')
plt.ylabel('Engagement Score')
plt.title('Counterfactual Analysis of Feature Interventions')
plt.legend()

# Add percentage change annotations
for i, change in enumerate(changes):
    pct_change = change / factual[i] * 100 if factual[i] != 0 else float('inf')
    plt.annotate(
        f'{pct_change:+.1f}%',
        xy=(x[i], max(factual[i], counterfactual[i]) + 0.02),
        ha='center',
        va='bottom',
        fontweight='bold',
        color='black' if pct_change > 0 else 'red'
    )

plt.grid(axis='y', alpha=0.3)
plt.tight_layout()
plt.savefig(f"{RESULTS_DIR}/counterfactual_analysis.png", dpi=300, bbox_inches='tight')
plt.show()

## 6. Finding Optimal Interventions

Let's find the optimal interventions to maximize engagement.
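
One simple way to think about this search, when the number of candidate features and values is small, is exhaustive enumeration: try every combination of up to `max_features` interventions and keep the one with the highest predicted outcome. The sketch below does exactly that with a hypothetical `predict` function and hand-picked candidate values. It is not the algorithm behind `find_optimal_intervention`, which may use a different search strategy.

```python
from itertools import combinations, product


def best_intervention(predict, reference, candidates, max_features=3):
    """Exhaustively search interventions on at most max_features features.

    candidates maps each feature name to the list of values we may set it to.
    Returns (intervention, predicted_outcome); ties keep the first one found.
    """
    best = ({}, predict(reference))
    for k in range(1, max_features + 1):
        for features in combinations(candidates, k):
            for values in product(*(candidates[f] for f in features)):
                intervention = dict(zip(features, values))
                outcome = predict({**reference, **intervention})
                if outcome > best[1]:
                    best = (intervention, outcome)
    return best


def predict(row):
    """Hypothetical engagement model, for illustration only."""
    return (
        0.075 * row["promotion_level"]
        + 0.2 * row["has_popular_actor"]
        + 0.15 * (row["thumbnail_brightness"] > 0.6)
        - 0.1 * (row["content_duration"] > 90)
    )


reference = {
    "promotion_level": 2,
    "has_popular_actor": 0,
    "thumbnail_brightness": 0.4,
    "content_duration": 100,
}
candidates = {
    "promotion_level": [3],
    "has_popular_actor": [1],
    "thumbnail_brightness": [0.7],
    "content_duration": [80],
}

intervention, outcome = best_intervention(predict, reference, candidates, max_features=2)
print(f"Baseline: {predict(reference):.3f}")
print(f"Best intervention: {intervention} -> {outcome:.3f}")
```

Exhaustive search grows combinatorially with the number of candidates, so limiting `max_features` keeps both the runtime and the resulting recommendation manageable.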

In [None]:
# Find optimal interventions to maximize engagement
print("Finding optimal interventions to maximize engagement...")
optimal_result = cf_analyzer.find_optimal_intervention(
    data=audience_data[causal_features + ['engagement']],
    outcome_var='engagement',
    target_outcome=0.9,  # Target high engagement
    candidate_features=effects_df[effects_df['significant']]['feature'].tolist(),
    reference_values=sample_instance.to_dict(),
    max_features=3  # Limit to 3 features for simplicity
)

print("\nOptimal intervention strategy:")
print(f"Baseline engagement: {optimal_result['baseline_outcome']:.4f}")
print(f"Predicted engagement after intervention: {optimal_result['predicted_outcome']:.4f}")
print(f"Improvement: {optimal_result['improvement']:.4f} ({optimal_result['percent_improvement']:.2f}%)")
print("\nFeature interventions:")
for feature, value in optimal_result['optimal_intervention'].items():
    original_value = sample_instance[feature]
    print(f"  {feature}: {original_value:.4f} -> {value:.4f}")

In [None]:
# Sensitivity analysis for key features
print("Performing sensitivity analysis...")
feature_to_analyze = effects_df['feature'].iloc[0]  # Top causal feature

sensitivity = cf_analyzer.analyze_sensitivity(
    data=audience_data[causal_features + ['engagement']],
    outcome_var='engagement',
    feature=feature_to_analyze,
    reference_values=sample_instance.to_dict(),
    perturbation_range=0.3,  # Perturb by ±30%
    num_points=15
)

print(f"\nSensitivity analysis for {feature_to_analyze}:")
print(f"Reference value: {sensitivity['reference_value']:.4f}")
print(f"Average elasticity: {sensitivity['average_elasticity']:.4f}")
print(f"Sensitivity score: {sensitivity['sensitivity_score']:.4f}")

In [None]:
# Visualize sensitivity analysis
plt.figure(figsize=(12, 6))

# Plot outcomes vs perturbation values
plt.plot(
    sensitivity['perturbation_values'],
    sensitivity['outcomes'],
    marker='o',
    linestyle='-',
    linewidth=2,
    markersize=8
)

# Highlight reference point
ref_idx = np.abs(np.array(sensitivity['perturbation_values']) - sensitivity['reference_value']).argmin()
plt.plot(
    sensitivity['perturbation_values'][ref_idx],
    sensitivity['outcomes'][ref_idx],
    marker='o',
    markersize=12,
    markerfacecolor='red',
    markeredgecolor='black',
    markeredgewidth=2,
    label='Reference Point'
)

plt.xlabel(f"{feature_to_analyze} Value")
plt.ylabel('Engagement')
plt.title(f'Sensitivity of Engagement to Changes in {feature_to_analyze}')
plt.grid(True, alpha=0.3)

# Add elasticity info
plt.annotate(
    f"Elasticity: {sensitivity['average_elasticity']:.2f}\nSensitivity: {sensitivity['sensitivity_score']:.2f}",
    xy=(0.05, 0.95),
    xycoords='axes fraction',
    bbox=dict(boxstyle='round', facecolor='white', alpha=0.8)
)

plt.legend()
plt.tight_layout()
plt.savefig(f"{RESULTS_DIR}/sensitivity_analysis.png", dpi=300, bbox_inches='tight')
plt.show()

## 7. Causal Feature Selection

Let's use causal feature selection to identify the most important causal features for audience engagement.

In [None]:
# Initialize causal feature selector
feature_selector = CausalFeatureSelector(causal_model)

# Fit the selector to identify causal features
causal_features_dict = feature_selector.fit(
    data=audience_data[causal_features + ['engagement']],
    outcome_var='engagement',
    discovery_method='manual',  # Use the already discovered graph
    exclude_vars=[]
)

# Get top causal features
top_features = feature_selector.get_top_features(n=10)
print("Top causal features:")
for feature, effect in top_features.items():
    print(f"  {feature}: {effect:.4f}")

In [None]:
# Visualize causal features
feature_selector.visualize_causal_features(
    figsize=(16, 10),
    show_effect_size=True
)

In [None]:
# Generate a feature importance summary
feature_summary = feature_selector.feature_effect_summary()
feature_summary

## 8. Generate Actionable Recommendations

Based on our causal analysis, let's generate actionable recommendations to optimize content for audience engagement.

In [None]:
# Generate recommendations for a specific piece of content
sample_content = audience_data.iloc[10]  # Select a sample content item
target_engagement = 0.8  # Set a high target engagement

print(f"Content ID: {sample_content['content_id']}")
print(f"Current engagement: {sample_content['engagement']:.4f}")
print(f"Target engagement: {target_engagement:.4f}")

# Set feature constraints based on what can realistically be changed
constraints = {
    # Some features can't be changed or have limited range
    'content_duration': (sample_content['content_duration'] * 0.9, sample_content['content_duration'] * 1.1),  # ±10%
    'genre_drama': (0, 1),  # Binary constraint
    'genre_comedy': (0, 1),  # Binary constraint
    'genre_action': (0, 1),  # Binary constraint
    'has_popular_actor': (sample_content['has_popular_actor'], 1),  # Can only add, not remove actors
    'promotion_level': (sample_content['promotion_level'], 4),  # Can only increase up to max (4)
    'thumbnail_brightness': (0.2, 0.9),
    'thumbnail_saturation': (0.3, 1.0)
}

# Generate recommendations
recommendations = feature_selector.generate_feature_recommendations(
    target_outcome=target_engagement,
    current_values=sample_content.to_dict(),
    constraints=constraints
)

# Display recommendations
print("\nRecommended changes to achieve target engagement:")
for feature, rec in recommendations.items():
    print(f"  {feature}: {rec['current_value']:.4f} -> {rec['recommended_value']:.4f} (Impact: {rec['impact']:.4f})")

In [None]:
# Visualize recommendations
plt.figure(figsize=(14, 8))

# Extract data for plotting
features = list(recommendations.keys())
current = [rec['current_value'] for rec in recommendations.values()]
recommended = [rec['recommended_value'] for rec in recommendations.values()]
impacts = [rec['impact'] for rec in recommendations.values()]

# Sort by absolute impact
impact_abs = [abs(impact) for impact in impacts]
sorted_indices = np.argsort(impact_abs)[::-1]

features = [features[i] for i in sorted_indices]
current = [current[i] for i in sorted_indices]
recommended = [recommended[i] for i in sorted_indices]
impacts = [impacts[i] for i in sorted_indices]

# Get top 5 recommendations by impact
features = features[:5]
current = current[:5]
recommended = recommended[:5]
impacts = impacts[:5]

# Normalize values for better visualization
max_values = [max(curr, rec) for curr, rec in zip(current, recommended)]
current_norm = [curr / max_val if max_val != 0 else 0 for curr, max_val in zip(current, max_values)]
recommended_norm = [rec / max_val if max_val != 0 else 0 for rec, max_val in zip(recommended, max_values)]

# Set up bar positions
x = np.arange(len(features))
width = 0.35

# Create bars
plt.bar(x - width/2, current_norm, width, label='Current Value', color='blue', alpha=0.7)
plt.bar(x + width/2, recommended_norm, width, label='Recommended Value', color='green', alpha=0.7)

# Add feature names
plt.xticks(x, features, rotation=45, ha='right')
plt.xlabel('Feature')
plt.ylabel('Relative Value')
plt.title('Top Feature Recommendations for Optimal Engagement')
plt.legend()

# Add impact annotations
for i, impact in enumerate(impacts):
    plt.annotate(
        f'Impact: {impact:.3f}',
        xy=(x[i], max(current_norm[i], recommended_norm[i]) + 0.05),
        ha='center',
        va='bottom',
        fontweight='bold',
        color='darkgreen' if impact > 0 else 'darkred'
    )

plt.ylim(0, 1.3)  # Set y-axis limit with room for annotations
plt.grid(axis='y', alpha=0.3)
plt.tight_layout()
plt.savefig(f"{RESULTS_DIR}/feature_recommendations.png", dpi=300, bbox_inches='tight')
plt.show()

## 9. Save Causal Model for Production Use

Let's save the causal model for use in production systems.

In [None]:
# Save the causal model
model_path = f"{MODELS_DIR}/causal_model.pt"
causal_model.save(model_path)
print(f"Saved causal model to {model_path}")

# Test loading the model
loaded_model = StructuralCausalModel.load(model_path)
print("Successfully loaded causal model")

# Save key insights and findings
insights = {
    "causal_effects": {feature: float(effect) for feature, effect in causal_features_dict.items()},
    "top_features": {feature: float(effect) for feature, effect in top_features.items()},
    "direct_causes": direct_causes,
    "indirect_causes": indirect_causes,
    "confounders": list(confounders),
    "sample_recommendations": {
        feature: {
            "current": float(rec["current_value"]),
            "recommended": float(rec["recommended_value"]),
            "impact": float(rec["impact"])
        } for feature, rec in recommendations.items()
    },
    "analysis_date": time.strftime("%Y-%m-%d %H:%M:%S")
}

# Save insights to JSON
insights_path = f"{RESULTS_DIR}/causal_insights.json"
with open(insights_path, 'w') as f:
    json.dump(insights, f, indent=2)
print(f"Saved causal insights to {insights_path}")

## 10. Conclusion

In this notebook, we demonstrated how causal analysis goes beyond traditional correlational analysis to uncover the true drivers of audience engagement.

### Key Findings:

1. **Causal Structure**: We discovered the causal relationships between content features and audience engagement
2. **Causal Effects**: We estimated the causal effect of each feature on engagement
3. **Correlation vs. Causation**: We compared causal estimates with correlations to flag features whose correlation overstates or understates their effect
4. **Counterfactual Analysis**: We predicted how changing features would affect engagement
5. **Optimal Interventions**: We found the most effective changes to maximize engagement
6. **Actionable Recommendations**: We generated specific recommendations for optimizing content

### Applications:

- **Content Strategy**: Focus resources on modifying the most impactful features
- **A/B Testing**: Design tests based on causal insights rather than correlations
- **Audience Targeting**: Better understand which factors truly drive engagement for different audience segments
- **Content Optimization**: Automatically recommend changes to improve engagement
- **ROI Analysis**: Prioritize investments based on causal impact rather than misleading correlations

By applying causal inference to audience intelligence, we can make more informed decisions and create more engaging content.