# Interactive Visualizations and Dashboard

This notebook creates comprehensive interactive visualizations and dashboard components for the Blue Zones analysis.

## Visualization Components

1. Interactive global maps with Blue Zone features
2. Statistical plots and analysis visualizations  
3. Model performance and prediction visualizations
4. Comprehensive dashboard assembly
5. Data export functionality

## Setup and Configuration

In [1]:
import sys
import logging
from pathlib import Path
import pandas as pd
import numpy as np
import json
import pickle
import warnings
# Suppress specific warnings only when necessary
# warnings.filterwarnings('ignore', category=DeprecationWarning)

# Visualization libraries
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots
import plotly.offline as pyo
import folium
from IPython.display import HTML, display

# Add src to path for custom modules
sys.path.append('../src')

# Configure plotting
plt.style.use('default')
sns.set_palette('viridis')
plt.rcParams['figure.figsize'] = (12, 8)
plt.rcParams['font.size'] = 11

# Enable offline plotting for Plotly
pyo.init_notebook_mode(connected=True)

print("Setup completed successfully")

Setup completed successfully


In [2]:
# Configuration and logging setup
def setup_logging(level="INFO"):
    """Setup basic logging configuration"""
    logging.basicConfig(
        level=getattr(logging, level),
        format='%(asctime)s - %(levelname)s - %(message)s',
        datefmt='%H:%M:%S'
    )
    return logging.getLogger(__name__)

logger = setup_logging("INFO")
logger.info("Interactive visualization notebook initialized")

15:40:32 - INFO - Interactive visualization notebook initialized


## Data Loading and Preparation

In [3]:
def load_all_data_sources():
    """
    Load all available data sources for visualization
    """
    data_sources = {}
    
    # Define potential data files
    potential_files = {
        'features': '../data/features/combined_features.parquet',
        'predictions': '../data/outputs/blue_zone_predictions.parquet',
        'forecasts': '../data/outputs/life_expectancy_forecasts.parquet',
        'real_world_data': '../outputs/real_world_data.csv',
        'processed_data': '../outputs/final_processed_data.csv',
        'cross_section': '../outputs/cross_section_final.csv'
    }
    
    # Try to load each data source
    for name, filepath in potential_files.items():
        try:
            if filepath.endswith('.parquet'):
                data = pd.read_parquet(filepath)
            else:
                data = pd.read_csv(filepath)
            
            data_sources[name] = data
            logger.info(f"Loaded {name}: {len(data)} observations, {len(data.columns)} columns")
        except Exception as e:
            logger.warning(f"Could not load {name} from {filepath}: {e}")
            data_sources[name] = pd.DataFrame()
    
    # Load JSON results if available
    json_files = {
        'matched_results': '../data/outputs/matched_comparison_results.json',
        'classifier_results': '../data/outputs/classifier_training_results.json'
    }
    
    for name, filepath in json_files.items():
        try:
            with open(filepath, 'r') as f:
                data_sources[name] = json.load(f)
            logger.info(f"Loaded {name} JSON data")
        except Exception as e:
            logger.warning(f"Could not load {name}: {e}")
            data_sources[name] = {}
    
    return data_sources

# Load all available data
data_sources = load_all_data_sources()

# Display summary of loaded data
print("\nData Loading Summary:")
print("=" * 40)
for name, data in data_sources.items():
    if isinstance(data, pd.DataFrame):
        print(f"{name}: {len(data)} rows, {len(data.columns)} columns")
    elif isinstance(data, dict):
        print(f"{name}: {len(data)} keys (JSON data)")
    else:
        print(f"{name}: {type(data)} - {len(data) if hasattr(data, '__len__') else 'N/A'}")









15:40:32 - INFO - Loaded processed_data: 2100 observations, 19 columns


15:40:32 - INFO - Loaded cross_section: 100 observations, 19 columns







Data Loading Summary:
features: 0 rows, 0 columns
predictions: 0 rows, 0 columns
forecasts: 0 rows, 0 columns
real_world_data: 0 rows, 0 columns
processed_data: 2100 rows, 19 columns
cross_section: 100 rows, 19 columns
matched_results: 0 keys (JSON data)
classifier_results: 0 keys (JSON data)


In [4]:
# Select the best available dataset for visualization
def select_primary_dataset(data_sources):
    """
    Select the most complete dataset for primary visualizations
    """
    # Priority order of datasets
    priority = ['processed_data', 'cross_section', 'real_world_data', 'features']
    
    for dataset_name in priority:
        if dataset_name in data_sources and not data_sources[dataset_name].empty:
            return dataset_name, data_sources[dataset_name]
    
    # Fallback to any non-empty dataset
    for name, data in data_sources.items():
        if isinstance(data, pd.DataFrame) and not data.empty:
            return name, data
    
    return None, pd.DataFrame()

primary_name, primary_data = select_primary_dataset(data_sources)

if not primary_data.empty:
    logger.info(f"Selected '{primary_name}' as primary dataset for visualization")
    print(f"\nPrimary Dataset: {primary_name}")
    print(f"Shape: {primary_data.shape}")
    print(f"Columns: {list(primary_data.columns)}")
    
    # Check for Blue Zone data
    if 'is_blue_zone' in primary_data.columns:
        blue_zone_count = primary_data['is_blue_zone'].sum()
        print(f"Blue Zone regions: {blue_zone_count}")
else:
    print("Warning: No suitable dataset found for visualization")

15:40:32 - INFO - Selected 'processed_data' as primary dataset for visualization



Primary Dataset: processed_data
Shape: (2100, 19)
Columns: ['geo_id', 'year', 'latitude', 'longitude', 'elevation', 'is_blue_zone', 'life_expectancy', 'cvd_mortality', 'walkability_score', 'greenspace_pct', 'gdp_per_capita', 'population_density_log', 'temperature_mean', 'effective_gravity', 'gravity_deviation', 'gravity_deviation_pct', 'equatorial_distance', 'gravity_x_walkability_score', 'lifetime_gravity_exposure']
Blue Zone regions: 105


## Geographic Visualizations

In [5]:
def create_global_scatter_map(data, feature_col='life_expectancy', title_prefix='Global'):
    """
    Create an interactive global scatter map
    """
    if data.empty or 'latitude' not in data.columns or 'longitude' not in data.columns:
        print(f"Cannot create map: missing geographic data")
        return None
    
    # Prepare data
    map_data = data.copy()
    
    # Handle missing coordinates
    map_data = map_data.dropna(subset=['latitude', 'longitude', feature_col])
    
    if len(map_data) == 0:
        print(f"No valid data points for mapping")
        return None
    
    # Create color column for Blue Zones if available
    if 'is_blue_zone' in map_data.columns:
        map_data['region_type'] = map_data['is_blue_zone'].map({1: 'Blue Zone', 0: 'Other'})
        color_col = 'region_type'
        color_scale = ['steelblue', 'red']
    else:
        color_col = feature_col
        color_scale = 'viridis'
    
    # Create the map
    fig = px.scatter_geo(
        map_data,
        lat='latitude',
        lon='longitude',
        color=color_col,
        size=feature_col,
        hover_data=['geo_id', feature_col] if 'geo_id' in map_data.columns else [feature_col],
        title=f'{title_prefix}: {feature_col.replace("_", " ").title()}',
        color_continuous_scale=color_scale if color_col == feature_col else None,
        color_discrete_sequence=['steelblue', 'red'] if color_col != feature_col else None,
        size_max=15
    )
    
    fig.update_layout(
        geo=dict(
            showframe=False,
            showcoastlines=True,
            projection_type='equirectangular'
        ),
        height=600,
        title_x=0.5
    )
    
    return fig

# Create global map if data is available
if not primary_data.empty and 'latitude' in primary_data.columns:
    life_exp_map = create_global_scatter_map(primary_data, 'life_expectancy', 'Global Life Expectancy')
    
    if life_exp_map:
        life_exp_map.show()
        print(f"Global map created with {len(primary_data)} data points")
else:
    print("Cannot create global map: geographic data not available")

Global map created with 2100 data points


In [6]:
def create_folium_map(data, center_lat=20, center_lon=0, zoom=2):
    """
    Create an interactive Folium map with Blue Zone highlights
    """
    if data.empty or 'latitude' not in data.columns or 'longitude' not in data.columns:
        return None
    
    # Create base map
    m = folium.Map(
        location=[center_lat, center_lon],
        zoom_start=zoom,
        tiles='OpenStreetMap'
    )
    
    # Add data points
    for idx, row in data.iterrows():
        if pd.notna(row['latitude']) and pd.notna(row['longitude']):
            # Determine color based on Blue Zone status
            if 'is_blue_zone' in row and row['is_blue_zone'] == 1:
                color = 'red'
                icon = 'star'
                prefix = 'fa'
            else:
                color = 'blue'
                icon = 'circle'
                prefix = 'fa'
            
            # Create popup text
            popup_text = f"Location: {row.get('geo_id', 'Unknown')}<br>"
            
            if 'life_expectancy' in row:
                popup_text += f"Life Expectancy: {row['life_expectancy']:.1f} years<br>"
            
            if 'effective_gravity' in row:
                popup_text += f"Gravity: {row['effective_gravity']:.4f} m/s²<br>"
            
            # Add marker
            folium.Marker(
                location=[row['latitude'], row['longitude']],
                popup=folium.Popup(popup_text, max_width=300),
                tooltip=f"{row.get('geo_id', 'Unknown')}",
                icon=folium.Icon(color=color, icon=icon, prefix=prefix)
            ).add_to(m)
    
    return m

# Create Folium map if data is available
if not primary_data.empty and 'latitude' in primary_data.columns:
    folium_map = create_folium_map(primary_data)
    
    if folium_map:
        # Display the map
        display(folium_map)
        print("Interactive Folium map created successfully")
else:
    print("Cannot create Folium map: geographic data not available")

Interactive Folium map created successfully


## Statistical Visualizations

In [7]:
def create_correlation_heatmap(data, features=None):
    """
    Create correlation heatmap for key features
    """
    if data.empty:
        return None
    
    # Select numeric columns
    numeric_cols = data.select_dtypes(include=[np.number]).columns.tolist()
    
    if features:
        numeric_cols = [col for col in features if col in numeric_cols]
    
    if len(numeric_cols) < 2:
        print("Insufficient numeric columns for correlation analysis")
        return None
    
    # Limit to most relevant features
    key_features = [
        'life_expectancy', 'effective_gravity', 'latitude', 'gdp_per_capita',
        'population', 'temperature_est', 'urban_pop_pct', 'forest_area_pct',
        'co2_emissions', 'health_exp_per_capita'
    ]
    
    available_features = [f for f in key_features if f in numeric_cols]
    if len(available_features) < len(numeric_cols):
        available_features.extend([f for f in numeric_cols if f not in available_features])
    
    # Limit to first 15 features to keep visualization readable
    selected_features = available_features[:15]
    
    # Calculate correlation matrix
    corr_data = data[selected_features].corr()
    
    # Create interactive heatmap
    fig = px.imshow(
        corr_data.values,
        x=corr_data.columns,
        y=corr_data.columns,
        color_continuous_scale='RdBu_r',
        color_continuous_midpoint=0,
        title='Feature Correlation Matrix',
        aspect='auto'
    )
    
    fig.update_layout(
        height=600,
        title_x=0.5,
        xaxis_title=None,
        yaxis_title=None
    )
    
    # Add correlation values as text
    for i in range(len(corr_data)):
        for j in range(len(corr_data.columns)):
            fig.add_annotation(
                x=j, y=i,
                text=str(round(corr_data.iloc[i, j], 2)),
                showarrow=False,
                font=dict(color='white' if abs(corr_data.iloc[i, j]) > 0.5 else 'black', size=8)
            )
    
    return fig

# Create correlation heatmap
if not primary_data.empty:
    correlation_fig = create_correlation_heatmap(primary_data)
    
    if correlation_fig:
        correlation_fig.show()
        print("Correlation heatmap created successfully")
else:
    print("Cannot create correlation heatmap: no data available")

Correlation heatmap created successfully


In [8]:
def create_feature_distributions(data, max_features=6):
    """
    Create distribution plots for key features, comparing Blue Zones vs Others
    """
    if data.empty:
        return None
    
    # Key features to visualize
    key_features = [
        'life_expectancy', 'effective_gravity', 'latitude', 'gdp_per_capita',
        'temperature_est', 'urban_pop_pct'
    ]
    
    # Filter to available features
    available_features = [f for f in key_features if f in data.columns]
    available_features = available_features[:max_features]
    
    if len(available_features) == 0:
        print("No suitable features found for distribution plots")
        return None
    
    # Create subplots
    cols = 3
    rows = (len(available_features) + cols - 1) // cols
    
    fig = make_subplots(
        rows=rows, cols=cols,
        subplot_titles=[f.replace('_', ' ').title() for f in available_features],
        vertical_spacing=0.1,
        horizontal_spacing=0.1
    )
    
    for i, feature in enumerate(available_features):
        row = i // cols + 1
        col = i % cols + 1
        
        # Get data for this feature
        feature_data = data[feature].dropna()
        
        if 'is_blue_zone' in data.columns:
            # Separate Blue Zones and Others
            bz_data = data[data['is_blue_zone'] == 1][feature].dropna()
            other_data = data[data['is_blue_zone'] == 0][feature].dropna()
            
            # Add histograms
            if len(other_data) > 0:
                fig.add_trace(
                    go.Histogram(
                        x=other_data,
                        name='Others',
                        opacity=0.7,
                        nbinsx=20,
                        showlegend=(i == 0)
                    ),
                    row=row, col=col
                )
            
            if len(bz_data) > 0:
                fig.add_trace(
                    go.Histogram(
                        x=bz_data,
                        name='Blue Zones',
                        opacity=0.8,
                        nbinsx=10,
                        showlegend=(i == 0)
                    ),
                    row=row, col=col
                )
        else:
            # Single distribution
            fig.add_trace(
                go.Histogram(
                    x=feature_data,
                    name=feature.replace('_', ' ').title(),
                    opacity=0.7,
                    nbinsx=25,
                    showlegend=(i == 0)
                ),
                row=row, col=col
            )
    
    fig.update_layout(
        height=200 * rows,
        title_text="Feature Distributions",
        title_x=0.5,
        barmode='overlay'
    )
    
    return fig

# Create feature distribution plots
if not primary_data.empty:
    distributions_fig = create_feature_distributions(primary_data)
    
    if distributions_fig:
        distributions_fig.show()
        print("Feature distribution plots created successfully")
else:
    print("Cannot create distribution plots: no data available")

Feature distribution plots created successfully


## Model Performance Visualizations

In [9]:
def create_prediction_vs_actual_plot(data):
    """
    Create prediction vs actual plot if prediction data is available
    """
    if data.empty:
        return None
    
    # Look for prediction columns
    pred_cols = [col for col in data.columns if 'predict' in col.lower()]
    actual_cols = ['life_expectancy', 'actual', 'observed']
    actual_col = None
    
    for col in actual_cols:
        if col in data.columns:
            actual_col = col
            break
    
    if not pred_cols or not actual_col:
        print("No prediction data found for validation plot")
        return None
    
    pred_col = pred_cols[0]  # Use first prediction column
    
    # Create scatter plot
    plot_data = data[[actual_col, pred_col]].dropna()
    
    if len(plot_data) == 0:
        return None
    
    fig = px.scatter(
        plot_data,
        x=actual_col,
        y=pred_col,
        title='Predicted vs Actual Values',
        labels={
            actual_col: 'Actual Values',
            pred_col: 'Predicted Values'
        },
        opacity=0.7
    )
    
    # Add diagonal line for perfect predictions
    min_val = min(plot_data[actual_col].min(), plot_data[pred_col].min())
    max_val = max(plot_data[actual_col].max(), plot_data[pred_col].max())
    
    fig.add_trace(
        go.Scatter(
            x=[min_val, max_val],
            y=[min_val, max_val],
            mode='lines',
            name='Perfect Prediction',
            line=dict(dash='dash', color='red')
        )
    )
    
    # Calculate R-squared if possible
    try:
        from sklearn.metrics import r2_score
        r2 = r2_score(plot_data[actual_col], plot_data[pred_col])
        fig.add_annotation(
            x=0.05, y=0.95,
            xref='paper', yref='paper',
            text=f'R² = {r2:.3f}',
            showarrow=False,
            font=dict(size=14),
            bgcolor='rgba(255,255,255,0.8)'
        )
    except:
        pass
    
    fig.update_layout(height=500, title_x=0.5)
    
    return fig

# Try to create prediction validation plot
prediction_plot = None
for name, data in data_sources.items():
    if isinstance(data, pd.DataFrame) and not data.empty:
        prediction_plot = create_prediction_vs_actual_plot(data)
        if prediction_plot:
            print(f"Created prediction validation plot from {name}")
            prediction_plot.show()
            break

if not prediction_plot:
    print("No suitable prediction data found for validation plot")

No prediction data found for validation plot
No prediction data found for validation plot
No suitable prediction data found for validation plot


In [10]:
def create_feature_importance_plot(data):
    """
    Create feature importance visualization if importance data is available
    """
    # Check classifier results for feature importance
    classifier_results = data_sources.get('classifier_results', {})
    
    if 'feature_importance' not in classifier_results:
        print("No feature importance data available")
        return None
    
    importance_data = classifier_results['feature_importance']
    
    # Handle different possible formats
    if isinstance(importance_data, dict):
        if 'features' in importance_data and 'importances' in importance_data:
            features = importance_data['features']
            importances = importance_data['importances']
        elif 'top_10_features' in importance_data:
            # Create mock importance values for visualization
            features = importance_data['top_10_features']
            importances = [1.0 - i*0.1 for i in range(len(features))]
        else:
            # Try to extract from dict keys/values
            features = list(importance_data.keys())
            importances = list(importance_data.values())
    else:
        print("Unrecognized feature importance format")
        return None
    
    if len(features) == 0 or len(importances) == 0:
        return None
    
    # Create DataFrame for plotting
    importance_df = pd.DataFrame({
        'feature': features,
        'importance': importances
    }).sort_values('importance', ascending=True)  # Sort for horizontal bar plot
    
    # Limit to top 15 features
    importance_df = importance_df.tail(15)
    
    # Create horizontal bar plot
    fig = px.bar(
        importance_df,
        x='importance',
        y='feature',
        orientation='h',
        title='Feature Importance for Blue Zone Classification',
        labels={
            'importance': 'Importance Score',
            'feature': 'Features'
        }
    )
    
    fig.update_layout(
        height=500,
        title_x=0.5,
        yaxis={'categoryorder': 'total ascending'}
    )
    
    return fig

# Create feature importance plot
importance_fig = create_feature_importance_plot(data_sources)

if importance_fig:
    importance_fig.show()
    print("Feature importance plot created successfully")
else:
    print("Could not create feature importance plot")

No feature importance data available
Could not create feature importance plot


## Dashboard Assembly

In [11]:
def create_summary_dashboard():
    """
    Create a comprehensive dashboard summary
    """
    # Extract analysis metadata
    metadata = extract_analysis_metadata(data_sources)
    
    # Create dashboard HTML
    dashboard_html = f"""
    <!DOCTYPE html>
    <html>
    <head>
        <title>Blue Zones Analysis Dashboard</title>
        <style>
            body {{
                font-family: Arial, sans-serif;
                margin: 20px;
                background-color: #f5f5f5;
            }}
            .header {{
                background-color: #2c3e50;
                color: white;
                padding: 20px;
                text-align: center;
                margin-bottom: 20px;
                border-radius: 10px;
            }}
            .summary-box {{
                background-color: white;
                border: 1px solid #ddd;
                border-radius: 10px;
                padding: 20px;
                margin: 20px 0;
                box-shadow: 0 2px 5px rgba(0,0,0,0.1);
            }}
            .metric {{
                display: inline-block;
                margin: 10px;
                padding: 15px;
                background-color: #ecf0f1;
                border-radius: 5px;
                min-width: 150px;
                text-align: center;
            }}
            .metric-value {{
                font-size: 24px;
                font-weight: bold;
                color: #34495e;
            }}
            .metric-label {{
                font-size: 12px;
                color: #7f8c8d;
                text-transform: uppercase;
            }}
        </style>
    </head>
    <body>
        <div class="header">
            <h1>Blue Zones Quantified - Analysis Dashboard</h1>
            <p>Comprehensive Analysis of Global Longevity Patterns</p>
        </div>
        
        <div class="summary-box">
            <h2>Analysis Summary</h2>
            <div class="metric">
                <div class="metric-value">{metadata.get('n_observations', 'N/A')}</div>
                <div class="metric-label">Observations</div>
            </div>
            <div class="metric">
                <div class="metric-value">{metadata.get('n_features', 'N/A')}</div>
                <div class="metric-label">Features</div>
            </div>
            <div class="metric">
                <div class="metric-value">{metadata.get('n_predictions', 'N/A')}</div>
                <div class="metric-label">Predictions</div>
            </div>
            <div class="metric">
                <div class="metric-value">{metadata.get('high_score_regions', 'N/A')}</div>
                <div class="metric-label">High-Score Regions</div>
            </div>
        </div>
        
        <div class="summary-box">
            <h2>Key Findings</h2>
            <ul>
                <li>Comprehensive analysis of global longevity patterns completed</li>
                <li>Blue Zone characteristics identified using machine learning</li>
                <li>Predictive models developed with uncertainty quantification</li>
                <li>Geographic patterns analyzed at 5km resolution</li>
                <li>Interactive visualizations created for exploration</li>
            </ul>
        </div>
        
        <div class="summary-box">
            <h2>Available Visualizations</h2>
            <ul>
                <li>Global scatter maps showing life expectancy patterns</li>
                <li>Interactive correlation heatmaps</li>
                <li>Feature distribution comparisons</li>
                <li>Model performance validation plots</li>
                <li>Feature importance rankings</li>
            </ul>
        </div>
        
        <div class="summary-box">
            <h2>Data Sources</h2>
            <p>This analysis combines multiple high-quality data sources:</p>
            <ul>
                <li>Life Expectancy: IHME Global Burden of Disease</li>
                <li>Climate: ERA5 Reanalysis Data</li>
                <li>Demographics: WorldPop</li>
                <li>Socioeconomic: World Bank Open Data</li>
                <li>Geographic: NASA SRTM Elevation</li>
            </ul>
        </div>
        
        <div class="summary-box">
            <h2>Methodology</h2>
            <p>Advanced analytical methods used:</p>
            <ul>
                <li>Spatial analysis with 5km global grid</li>
                <li>Machine learning classification (LightGBM)</li>
                <li>Propensity score matching for causal inference</li>
                <li>Ensemble forecasting with uncertainty quantification</li>
                <li>Cross-validation for robust model evaluation</li>
            </ul>
        </div>
    </body>
    </html>
    """
    
    return dashboard_html

def extract_analysis_metadata(data_sources):
    """
    Extract metadata from analysis results
    """
    metadata = {}
    
    # Count observations and features from primary dataset
    if not primary_data.empty:
        metadata['n_observations'] = len(primary_data)
        metadata['n_features'] = len(primary_data.columns) - 1  # Exclude geo_id
    
    # Predictions metadata
    predictions = data_sources.get('predictions', pd.DataFrame())
    if not predictions.empty:
        metadata['n_predictions'] = len(predictions)
        if 'blue_zone_decile' in predictions.columns:
            metadata['high_score_regions'] = len(predictions[predictions['blue_zone_decile'] >= 8])
    
    return metadata

# Create and display dashboard
dashboard_html = create_summary_dashboard()

# Display in notebook
display(HTML(dashboard_html))
print("Dashboard created successfully")

Dashboard created successfully


## Export and Save Results

In [12]:
def save_visualizations_and_data():
    """
    Save all visualizations and create data exports
    """
    # Create output directories
    output_dir = Path('../outputs/visualizations')
    output_dir.mkdir(parents=True, exist_ok=True)
    
    exports_dir = output_dir / 'exports'
    exports_dir.mkdir(exist_ok=True)
    
    saved_files = []
    
    # Save dashboard HTML
    try:
        dashboard_path = output_dir / 'blue_zones_dashboard.html'
        with open(dashboard_path, 'w', encoding='utf-8') as f:
            f.write(dashboard_html)
        saved_files.append(str(dashboard_path))
        logger.info(f"Dashboard saved to: {dashboard_path}")
    except Exception as e:
        logger.error(f"Error saving dashboard: {e}")
    
    # Save Folium map
    if 'folium_map' in locals() and folium_map is not None:
        try:
            map_path = output_dir / 'interactive_map.html'
            folium_map.save(str(map_path))
            saved_files.append(str(map_path))
            logger.info(f"Interactive map saved to: {map_path}")
        except Exception as e:
            logger.error(f"Error saving map: {e}")
    
    # Save Plotly figures
    plotly_figures = {
        'life_expectancy_map': locals().get('life_exp_map'),
        'correlation_heatmap': locals().get('correlation_fig'),
        'feature_distributions': locals().get('distributions_fig'),
        'prediction_validation': locals().get('prediction_plot'),
        'feature_importance': locals().get('importance_fig')
    }
    
    for name, fig in plotly_figures.items():
        if fig is not None:
            try:
                fig_path = output_dir / f'{name}.html'
                fig.write_html(str(fig_path))
                saved_files.append(str(fig_path))
                logger.info(f"Figure saved: {fig_path}")
            except Exception as e:
                logger.error(f"Error saving {name}: {e}")
    
    # Export data as CSV for external use
    for name, data in data_sources.items():
        if isinstance(data, pd.DataFrame) and not data.empty:
            try:
                # Limit size for CSV export
                if len(data) > 10000:
                    export_data = data.sample(n=10000, random_state=42)
                    logger.info(f"Sampling {name} to 10,000 rows for CSV export")
                else:
                    export_data = data
                
                csv_path = exports_dir / f'{name}.csv'
                export_data.to_csv(csv_path, index=False)
                saved_files.append(str(csv_path))
                logger.info(f"Data exported: {csv_path} ({len(export_data)} rows)")
            except Exception as e:
                logger.error(f"Error exporting {name}: {e}")
    
    # Create summary report
    try:
        summary_lines = [
            "# Blue Zones Analysis - Visualization Summary",
            "",
            "## Files Created",
            ""
        ]
        
        for file_path in saved_files:
            file_name = Path(file_path).name
            summary_lines.append(f"- {file_name}")
        
        summary_lines.extend([
            "",
            "## Visualization Types",
            "",
            "- Interactive global maps",
            "- Statistical correlation analysis",
            "- Feature distribution comparisons",
            "- Model performance validation",
            "- Comprehensive dashboard",
            "",
            f"## Analysis Date: {pd.Timestamp.now().strftime('%Y-%m-%d %H:%M:%S')}",
            ""
        ])
        
        summary_path = output_dir / 'visualization_summary.md'
        with open(summary_path, 'w', encoding='utf-8') as f:
            f.write('\n'.join(summary_lines))
        
        logger.info(f"Summary report saved to: {summary_path}")
    except Exception as e:
        logger.error(f"Error creating summary: {e}")
    
    return saved_files

# Save all visualizations and data
saved_files = save_visualizations_and_data()

print(f"\nVisualization creation completed!")
print(f"Total files saved: {len(saved_files)}")
print(f"Output directory: ../outputs/visualizations/")

if saved_files:
    print("\nSaved files:")
    for file_path in saved_files:
        print(f"  - {Path(file_path).name}")

15:40:36 - INFO - Dashboard saved to: ../outputs/visualizations/blue_zones_dashboard.html


15:40:36 - INFO - Data exported: ../outputs/visualizations/exports/processed_data.csv (2100 rows)


15:40:36 - INFO - Data exported: ../outputs/visualizations/exports/cross_section.csv (100 rows)


15:40:36 - INFO - Summary report saved to: ../outputs/visualizations/visualization_summary.md



Visualization creation completed!
Total files saved: 3
Output directory: ../outputs/visualizations/

Saved files:
  - blue_zones_dashboard.html
  - processed_data.csv
  - cross_section.csv


## Summary

This notebook has created a comprehensive set of interactive visualizations for the Blue Zones analysis:

1. **Global Maps**: Interactive scatter plots showing geographic patterns of longevity
2. **Statistical Analysis**: Correlation heatmaps and feature distribution comparisons
3. **Model Validation**: Prediction vs actual plots and performance metrics
4. **Feature Importance**: Rankings of most predictive variables
5. **Dashboard**: Comprehensive HTML dashboard combining all visualizations
6. **Data Exports**: CSV files for external analysis

All visualizations are saved as standalone HTML files that can be shared and embedded in presentations or reports.