# Interactive Visualizations and Dashboard (Fixed Version)

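This notebook builds the interactive visualizations for the Blue Zones analysis: maps, statistical plots, a time series view, a 3D scatter, a feature explorer, and an HTML dashboard. Each plotting function validates its input first, and all figures are collected in one dictionary that is passed explicitly to the dashboard and save steps.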

## Key Improvements in This Version
- Fixed variable scope issues for proper figure saving
- Added data validation before creating visualizations
- Enhanced dashboard with embedded visualizations
- Added time series and 3D visualizations
- Improved error handling and fallback mechanisms
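
The scope fix in the first bullet comes down to handing the collected figures to the save step as an argument instead of reading notebook globals. A minimal sketch of that pattern, using plain strings as stand-ins for figures; `register_figure` and `names_to_save` are hypothetical helpers for illustration, not functions defined in this notebook:

```python
def register_figure(registry, name, fig):
    """Store a figure under a name; skip None so failed plots are not saved."""
    if fig is not None:
        registry[name] = fig
    return registry


def names_to_save(registry):
    """Return the names a save step would write, using only its argument."""
    return sorted(registry)


figures = {}
register_figure(figures, "correlation_heatmap", "<figure A>")
register_figure(figures, "time_series", None)  # creation failed, not registered
register_figure(figures, "life_expectancy_map", "<figure B>")

print(names_to_save(figures))
```

Because the save step sees only what it is given, a figure that was never created cannot cause a `NameError` at save time.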

## Visualization Components

1. Interactive global maps with Blue Zone features
2. Statistical plots and correlation analysis
3. Time series analysis and trends
4. 3D geographic visualizations
5. Interactive feature explorer
6. Comprehensive interactive dashboard
7. Data export functionality

## Setup and Configuration

In [None]:
import sys
import logging
from pathlib import Path
import pandas as pd
import numpy as np
import json
import warnings
from datetime import datetime

# Visualization libraries
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots
import plotly.offline as pyo
import folium
from folium import plugins
from IPython.display import HTML, display, IFrame

# Configure plotting
plt.style.use('default')
sns.set_palette('viridis')
plt.rcParams['figure.figsize'] = (12, 8)
plt.rcParams['font.size'] = 11
plt.rcParams['axes.facecolor'] = '#E5ECF6'

# Enable offline plotting for Plotly
pyo.init_notebook_mode(connected=True)

# Setup logging
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(levelname)s - %(message)s',
    datefmt='%H:%M:%S'
)
logger = logging.getLogger(__name__)

print("Setup completed successfully (Fixed Version)")
logger.info("Interactive visualization notebook initialized")

## Data Loading and Validation

In [None]:
def load_and_validate_data():
    """
    Load all available data sources with validation
    """
    data_sources = {}
    
    # Define potential data files
    potential_files = {
        'processed_data': '../outputs/final_processed_data.csv',
        'cross_section': '../outputs/cross_section_final.csv',
        'comprehensive_panel': '../outputs/comprehensive_panel_data.csv',
        'model_comparison': '../outputs/model_comparison_fixed.csv',
        'feature_analysis': '../outputs/feature_correlation_analysis_fixed.csv'
    }
    
    # Try to load each data source
    for name, filepath in potential_files.items():
        try:
            if Path(filepath).exists():
                data = pd.read_csv(filepath)
                data_sources[name] = data
                logger.info(f"Loaded {name}: {len(data)} rows, {len(data.columns)} columns")
            else:
                logger.warning(f"File not found: {filepath}")
                data_sources[name] = pd.DataFrame()
        except Exception as e:
            logger.error(f"Error loading {name}: {e}")
            data_sources[name] = pd.DataFrame()
    
    return data_sources

# Load all data
data_sources = load_and_validate_data()

# Select primary dataset
primary_data = None
for name in ['processed_data', 'cross_section', 'comprehensive_panel']:
    if name in data_sources and not data_sources[name].empty:
        primary_data = data_sources[name]
        logger.info(f"Selected '{name}' as primary dataset")
        break

if primary_data is not None:
    print(f"\nPrimary Dataset Shape: {primary_data.shape}")
    print(f"Columns: {list(primary_data.columns)[:10]}...")
    if 'is_blue_zone' in primary_data.columns:
        print(f"Blue Zone regions: {primary_data['is_blue_zone'].sum()}")
else:
    print("Warning: No suitable dataset found for visualization")
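
The primary dataset is chosen by walking a fixed priority list and taking the first non-empty frame. The sketch below isolates that rule on toy data so it can be checked without the CSV files; `select_primary` is a hypothetical helper name and the toy frames are made up for illustration:

```python
import pandas as pd


def select_primary(sources, priority):
    """Return (name, frame) for the first non-empty frame in priority order."""
    for name in priority:
        frame = sources.get(name)
        if frame is not None and not frame.empty:
            return name, frame
    return None, None


sources = {
    "processed_data": pd.DataFrame(),  # e.g. the file was missing
    "cross_section": pd.DataFrame({"life_expectancy": [81.2, 79.5]}),
    "comprehensive_panel": pd.DataFrame({"life_expectancy": [80.0]}),
}
name, frame = select_primary(
    sources, ["processed_data", "cross_section", "comprehensive_panel"]
)
print(name, frame.shape)
```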

## Validation Functions

In [None]:
def validate_data_for_visualization(data, required_columns, viz_type="generic"):
    """
    Validate data has required columns for specific visualization
    """
    if data is None or data.empty:
        logger.warning(f"Empty data provided for {viz_type} visualization")
        return False
    
    missing = [col for col in required_columns if col not in data.columns]
    if missing:
        logger.warning(f"Missing columns for {viz_type}: {missing}")
        return False
    
    return True

def create_fallback_visualization(data, title="Data Overview"):
    """
    Create a simple fallback visualization when specific ones fail
    """
    if data is None or data.empty:
        fig = go.Figure()
        fig.add_annotation(
            text="No data available for visualization",
            xref="paper", yref="paper",
            x=0.5, y=0.5, showarrow=False
        )
    else:
        # Create a simple bar chart of non-null counts
        non_null_counts = data.count()
        fig = px.bar(
            x=non_null_counts.index[:20], 
            y=non_null_counts.values[:20],
            title=f"{title} - Data Availability",
            labels={'x': 'Column', 'y': 'Non-null Count'}
        )
    
    return fig
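
When a validation check fails, it can help to know exactly which columns are absent. A small sketch of that lookup on a toy frame; `missing_columns` is a hypothetical variant that returns the list rather than a boolean, and the coordinates are made-up sample values:

```python
import pandas as pd


def missing_columns(data, required_columns):
    """List required columns absent from data; all of them if data is None."""
    if data is None:
        return list(required_columns)
    return [col for col in required_columns if col not in data.columns]


toy = pd.DataFrame({"latitude": [39.7], "longitude": [9.1]})
print(missing_columns(toy, ["latitude", "longitude", "life_expectancy"]))
```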

## Geographic Visualizations

In [None]:
# Dictionary to store all created figures
all_figures = {}

def create_interactive_map(data):
    """
    Create interactive map with Plotly
    """
    required_cols = ['latitude', 'longitude', 'life_expectancy']
    
    if not validate_data_for_visualization(data, required_cols, "map"):
        return create_fallback_visualization(data, "Geographic Data")
    
    # Drop rows with missing values; Plotly raises on NaN in `size`
    plot_data = data.dropna(subset=required_cols)
    if plot_data.empty:
        return create_fallback_visualization(data, "Geographic Data")
    
    # Prepare hover text (one entry per plotted row)
    hover_text = []
    for idx, row in plot_data.iterrows():
        text = f"Life Expectancy: {row['life_expectancy']:.1f}"
        if 'geo_id' in row:
            text = f"{row['geo_id']}<br>" + text
        if 'is_blue_zone' in row:
            text += f"<br>Blue Zone: {'Yes' if row['is_blue_zone'] else 'No'}"
        hover_text.append(text)
    
    # Create map
    fig = px.scatter_mapbox(
        plot_data,
        lat='latitude',
        lon='longitude',
        color='life_expectancy',
        size='life_expectancy',
        hover_name=hover_text,
        color_continuous_scale='Viridis',
        mapbox_style='carto-positron',
        title='Global Life Expectancy Distribution',
        height=600
    )
    
    fig.update_layout(
        mapbox=dict(zoom=2),
        margin={"r":0,"t":40,"l":0,"b":0}
    )
    
    return fig

# Create map if data available
if primary_data is not None:
    life_exp_map = create_interactive_map(primary_data)
    all_figures['life_expectancy_map'] = life_exp_map
    display(life_exp_map)
else:
    print("Cannot create map: No data available")
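
Map inputs are only usable when the coordinates are present and fall inside valid ranges. The sketch below shows one possible pre-filter; `clean_geo_rows` is a hypothetical helper, and the range check (latitude within ±90, longitude within ±180) is an assumption about what counts as a valid row rather than something the notebook enforces:

```python
import pandas as pd


def clean_geo_rows(data, value_col="life_expectancy"):
    """Keep rows with a non-null value and coordinates inside valid ranges."""
    cols = ["latitude", "longitude", value_col]
    cleaned = data.dropna(subset=cols)
    in_range = (
        cleaned["latitude"].between(-90, 90)
        & cleaned["longitude"].between(-180, 180)
    )
    return cleaned[in_range].reset_index(drop=True)


toy = pd.DataFrame({
    "latitude": [40.1, None, 95.0, 26.3],
    "longitude": [9.0, 10.0, 20.0, 127.9],
    "life_expectancy": [83.0, 80.0, 79.0, None],
})
result = clean_geo_rows(toy)
print(result)
```

Only the first toy row survives: the others have a missing latitude, an out-of-range latitude, or a missing value.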

In [None]:
def create_folium_map(data):
    """
    Create interactive Folium map with clustering
    """
    required_cols = ['latitude', 'longitude', 'life_expectancy']
    
    if not validate_data_for_visualization(data, required_cols, "folium_map"):
        return None
    
    # Initialize map
    m = folium.Map(location=[20, 0], zoom_start=2)
    
    # Add marker cluster
    marker_cluster = plugins.MarkerCluster().add_to(m)
    
    # Keep only complete rows; NaN values break marker radii and the heatmap
    valid = data.dropna(subset=required_cols)
    
    # Add markers
    for idx, row in valid.iterrows():
        color = 'blue' if 'is_blue_zone' in row and row['is_blue_zone'] else 'green'
        
        popup_text = f"Life Expectancy: {row['life_expectancy']:.1f}"
        if 'geo_id' in row:
            popup_text = f"{row['geo_id']}<br>" + popup_text
        
        folium.CircleMarker(
            location=[row['latitude'], row['longitude']],
            radius=row['life_expectancy']/10,
            popup=popup_text,
            color=color,
            fill=True,
            fillColor=color
        ).add_to(marker_cluster)
    
    # Add heatmap layer weighted by life expectancy
    heat_data = valid[required_cols].values.tolist()
    if heat_data:
        plugins.HeatMap(heat_data).add_to(m)
    
    return m

# Create Folium map
folium_map = None
if primary_data is not None:
    folium_map = create_folium_map(primary_data)
    if folium_map:
        print("Folium map created successfully")
        # Note: Folium maps don't display inline in all environments
        # They will be saved to HTML file
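
The markers use `life_expectancy / 10` as the radius, so values of 70 and 85 give radii of 7 and 8.5, which are hard to tell apart on a map. One alternative is to stretch the observed range onto a wider pixel range. This is a sketch only: `scale_radius` and its default bounds of 4 and 12 pixels are illustrative assumptions, not part of the notebook.

```python
def scale_radius(value, lo, hi, min_radius=4.0, max_radius=12.0):
    """Linearly map value from [lo, hi] to [min_radius, max_radius], clamped."""
    if hi <= lo:
        # Degenerate range: use the midpoint radius for every marker
        return (min_radius + max_radius) / 2
    fraction = min(max((value - lo) / (hi - lo), 0.0), 1.0)
    return min_radius + fraction * (max_radius - min_radius)


for life_expectancy in (70.0, 77.5, 85.0):
    print(life_expectancy, scale_radius(life_expectancy, lo=70.0, hi=85.0))
```

In practice `lo` and `hi` would come from the data, for example the column minimum and maximum.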

## Statistical Visualizations

In [None]:
def create_correlation_heatmap(data):
    """
    Create correlation heatmap for numeric features
    """
    numeric_cols = data.select_dtypes(include=[np.number]).columns
    
    if len(numeric_cols) < 2:
        return create_fallback_visualization(data, "Correlation Analysis")
    
    # Limit the heatmap to the first 20 numeric columns
    top_features = numeric_cols[:min(20, len(numeric_cols))]
    corr_matrix = data[top_features].corr()
    
    fig = px.imshow(
        corr_matrix,
        labels=dict(x="Features", y="Features", color="Correlation"),
        x=top_features,
        y=top_features,
        color_continuous_scale='RdBu',
        zmin=-1, zmax=1,
        title="Feature Correlation Heatmap"
    )
    
    fig.update_layout(height=600, width=800)
    
    return fig

# Create correlation heatmap
if primary_data is not None:
    correlation_fig = create_correlation_heatmap(primary_data)
    all_figures['correlation_heatmap'] = correlation_fig
    display(correlation_fig)
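
The heatmap is capped at 20 numeric columns. If the columns should be chosen by variance rather than by position, a sketch along these lines could be used; `top_variance_columns` is a hypothetical helper and the toy values are made up. Note that raw variance depends on units, so large-scale columns such as GDP will rank first unless the data is standardized beforehand.

```python
import numpy as np
import pandas as pd


def top_variance_columns(data, limit=20):
    """Names of up to `limit` numeric columns, highest variance first."""
    numeric = data.select_dtypes(include=[np.number])
    variances = numeric.var().dropna()
    return variances.sort_values(ascending=False).head(limit).index.tolist()


toy = pd.DataFrame({
    "gdp_per_capita": [1000.0, 30000.0, 52000.0],
    "life_expectancy": [62.0, 79.0, 83.0],
    "constant": [1.0, 1.0, 1.0],
    "region": ["a", "b", "c"],  # non-numeric, ignored
})
print(top_variance_columns(toy, limit=2))
```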

In [None]:
def create_feature_distributions(data):
    """
    Create distribution plots for key features
    """
    key_features = ['life_expectancy', 'cvd_mortality', 'gdp_per_capita', 
                   'elevation', 'temperature_mean']
    
    available_features = [f for f in key_features if f in data.columns]
    
    if not available_features:
        return create_fallback_visualization(data, "Feature Distributions")
    
    n_features = len(available_features)
    fig = make_subplots(
        rows=(n_features + 1) // 2, 
        cols=2,
        subplot_titles=available_features
    )
    
    for i, feature in enumerate(available_features):
        row = (i // 2) + 1
        col = (i % 2) + 1
        
        # Add histogram
        fig.add_trace(
            go.Histogram(x=data[feature].dropna(), name=feature, showlegend=False),
            row=row, col=col
        )
        
        # Add box plot if Blue Zone data available
        if 'is_blue_zone' in data.columns:
            for is_bz in [0, 1]:
                subset = data[data['is_blue_zone'] == is_bz][feature].dropna()
                if len(subset) > 0:
                    fig.add_trace(
                        go.Box(
                            y=subset,
                            name=f"Blue Zone: {'Yes' if is_bz else 'No'}",
                            showlegend=(i==0)
                        ),
                        row=row, col=col
                    )
    
    fig.update_layout(height=800, title_text="Feature Distributions")
    
    return fig

# Create distributions
if primary_data is not None:
    distributions_fig = create_feature_distributions(primary_data)
    all_figures['feature_distributions'] = distributions_fig
    display(distributions_fig)
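
The subplot layout places features left to right in a two-column grid, with 1-based row and column indices as `make_subplots` expects. The arithmetic can be checked in isolation; `grid_positions` is a hypothetical helper written for this illustration:

```python
def grid_positions(n_items, n_cols=2):
    """Return (n_rows, [(row, col), ...]) with 1-based indices."""
    n_rows = (n_items + n_cols - 1) // n_cols  # ceiling division
    positions = [(i // n_cols + 1, i % n_cols + 1) for i in range(n_items)]
    return n_rows, positions


n_rows, positions = grid_positions(5)
print(n_rows, positions)
# → 3 [(1, 1), (1, 2), (2, 1), (2, 2), (3, 1)]
```

With five features the grid has three rows and the last cell stays empty.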

## Time Series Analysis

In [None]:
def create_time_series_analysis(data):
    """
    Create time series visualization if year data available
    """
    if 'year' not in data.columns or 'life_expectancy' not in data.columns:
        logger.warning("Year or life_expectancy column not found for time series")
        return None
    
    # Aggregate by year
    yearly_data = data.groupby('year').agg({
        'life_expectancy': ['mean', 'std', 'min', 'max']
    }).reset_index()
    
    yearly_data.columns = ['year', 'mean', 'std', 'min', 'max']
    
    fig = go.Figure()
    
    # Add mean line
    fig.add_trace(go.Scatter(
        x=yearly_data['year'],
        y=yearly_data['mean'],
        mode='lines+markers',
        name='Mean Life Expectancy',
        line=dict(width=3)
    ))
    
    # Add confidence band
    fig.add_trace(go.Scatter(
        x=yearly_data['year'].tolist() + yearly_data['year'].tolist()[::-1],
        y=(yearly_data['mean'] + yearly_data['std']).tolist() + 
          (yearly_data['mean'] - yearly_data['std']).tolist()[::-1],
        fill='toself',
        fillcolor='rgba(0,100,80,0.2)',
        line=dict(color='rgba(255,255,255,0)'),
        name='±1 Std Dev',
        showlegend=True
    ))
    
    # Add min/max range
    fig.add_trace(go.Scatter(
        x=yearly_data['year'],
        y=yearly_data['min'],
        mode='lines',
        name='Min',
        line=dict(dash='dash', color='red')
    ))
    
    fig.add_trace(go.Scatter(
        x=yearly_data['year'],
        y=yearly_data['max'],
        mode='lines',
        name='Max',
        line=dict(dash='dash', color='green')
    ))
    
    fig.update_layout(
        title='Life Expectancy Trends Over Time',
        xaxis_title='Year',
        yaxis_title='Life Expectancy (years)',
        hovermode='x unified',
        height=500
    )
    
    return fig

# Create time series
if primary_data is not None:
    time_series_fig = create_time_series_analysis(primary_data)
    if time_series_fig:
        all_figures['time_series'] = time_series_fig
        display(time_series_fig)
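
The shaded band is drawn as one closed polygon: the x values run forward along the upper edge and then backward along the lower edge. A toy example of the aggregation and polygon construction follows, with made-up values; it selects the column before calling `agg`, which yields flat column names and avoids the manual rename.

```python
import pandas as pd

toy = pd.DataFrame({
    "year": [2000, 2000, 2001, 2001],
    "life_expectancy": [70.0, 74.0, 72.0, 76.0],
})

# Per-year mean and sample standard deviation
yearly = (
    toy.groupby("year")["life_expectancy"]
    .agg(["mean", "std"])
    .reset_index()
)

years = yearly["year"].tolist()
upper = (yearly["mean"] + yearly["std"]).tolist()
lower = (yearly["mean"] - yearly["std"]).tolist()

# Forward along the top edge, then back along the bottom edge
band_x = years + years[::-1]
band_y = upper + lower[::-1]
print(band_x)
print(band_y)
```

A year with a single observation has an undefined sample standard deviation (NaN), so the band will have a gap there unless those years are filtered or filled.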

## 3D Visualizations

In [None]:
def create_3d_scatter(data):
    """
    Create 3D scatter plot for multi-dimensional analysis
    """
    required_cols = ['latitude', 'longitude', 'elevation', 'life_expectancy']
    
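    if not validate_data_for_visualization(data, required_cols, "3d_scatter"):
        return None
    
    # Drop incomplete rows (Plotly raises on NaN in `size`), then cap at 1000 points
    plot_data = data.dropna(subset=required_cols)
    if plot_data.empty:
        logger.warning("No complete rows for 3D visualization")
        return None
    if len(plot_data) > 1000:
        # Fixed seed so repeated runs plot the same sample
        plot_data = plot_data.sample(1000, random_state=42)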
    
    fig = px.scatter_3d(
        plot_data,
        x='longitude',
        y='latitude',
        z='elevation',
        color='life_expectancy',
        size='life_expectancy',
        hover_data=['life_expectancy'],
        color_continuous_scale='Viridis',
        title='3D Geographic Distribution of Life Expectancy'
    )
    
    fig.update_layout(
        scene=dict(
            xaxis_title='Longitude',
            yaxis_title='Latitude',
            zaxis_title='Elevation (m)'
        ),
        height=600
    )
    
    return fig

# Create 3D visualization
if primary_data is not None:
    scatter_3d = create_3d_scatter(primary_data)
    if scatter_3d:
        all_figures['3d_scatter'] = scatter_3d
        display(scatter_3d)

## Interactive Feature Explorer

In [None]:
def create_interactive_feature_explorer(data):
    """
    Create dropdown-based feature explorer
    """
    numeric_cols = data.select_dtypes(include=[np.number]).columns.tolist()
    
    if len(numeric_cols) < 2:
        return None
    
    fig = go.Figure()
    
    # Add traces for each feature
    for col in numeric_cols[:15]:  # Limit to 15 features
        values = data[col].dropna()
        fig.add_trace(go.Histogram(
            x=values,
            name=col,
            visible=False,
            nbinsx=30,
            marker_color='blue',
            opacity=0.7
        ))
    
    # Make first trace visible
    if fig.data:
        fig.data[0].visible = True
    
    # Create dropdown menu
    buttons = []
    for i, col in enumerate(numeric_cols[:15]):
        visibility = [False] * len(fig.data)
        if i < len(fig.data):
            visibility[i] = True
        
        buttons.append(dict(
            label=col.replace('_', ' ').title(),
            method='update',
            args=[{'visible': visibility},
                  {'title': f'Distribution of {col.replace("_", " ").title()}',
                   'xaxis': {'title': col}}]
        ))
    
    fig.update_layout(
        updatemenus=[dict(
            buttons=buttons,
            direction='down',
            showactive=True,
            x=0.1,
            xanchor='left',
            y=1.15,
            yanchor='top'
        )],
        title='Interactive Feature Explorer',
        height=500,
        showlegend=False
    )
    
    return fig

# Create feature explorer
if primary_data is not None:
    feature_explorer = create_interactive_feature_explorer(primary_data)
    if feature_explorer:
        all_figures['feature_explorer'] = feature_explorer
        display(feature_explorer)
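
Each dropdown button carries a visibility mask with exactly one `True`, at the position of the trace it reveals. The mask logic needs no Plotly to verify; in this sketch `visibility_masks` and `make_buttons` are hypothetical helpers that produce plain dictionaries in the shape passed to `updatemenus`:

```python
def visibility_masks(n_traces):
    """One boolean list per trace, with only that trace's slot set to True."""
    return [[i == j for j in range(n_traces)] for i in range(n_traces)]


def make_buttons(names):
    """Build one button dict per trace name."""
    masks = visibility_masks(len(names))
    return [
        {
            "label": name.replace("_", " ").title(),
            "method": "update",
            "args": [{"visible": mask}, {"title": f"Distribution of {name}"}],
        }
        for name, mask in zip(names, masks)
    ]


buttons = make_buttons(["life_expectancy", "gdp_per_capita"])
print(buttons[1]["label"], buttons[1]["args"][0]["visible"])
# → Gdp Per Capita [False, True]
```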

## Comprehensive Dashboard Creation

In [None]:
def create_comprehensive_dashboard(figures_dict, data_sources):
    """
    Create comprehensive dashboard with embedded visualizations
    """
    dashboard_html = f"""
    <!DOCTYPE html>
    <html>
    <head>
        <title>Blue Zones Analysis Dashboard</title>
        <meta charset="utf-8">
        <script src="https://cdn.plot.ly/plotly-latest.min.js"></script>
        <style>
            body {{
                font-family: 'Segoe UI', Arial, sans-serif;
                margin: 0;
                padding: 20px;
                background: linear-gradient(135deg, #667eea 0%, #764ba2 100%);
            }}
            .container {{
                max-width: 1400px;
                margin: 0 auto;
                background: white;
                border-radius: 15px;
                padding: 30px;
                box-shadow: 0 20px 60px rgba(0,0,0,0.3);
            }}
            h1 {{
                color: #2c3e50;
                text-align: center;
                font-size: 2.5em;
                margin-bottom: 10px;
            }}
            .subtitle {{
                text-align: center;
                color: #7f8c8d;
                margin-bottom: 30px;
            }}
            .stats-grid {{
                display: grid;
                grid-template-columns: repeat(auto-fit, minmax(200px, 1fr));
                gap: 20px;
                margin-bottom: 30px;
            }}
            .stat-card {{
                background: linear-gradient(135deg, #667eea 0%, #764ba2 100%);
                color: white;
                padding: 20px;
                border-radius: 10px;
                text-align: center;
            }}
            .stat-value {{
                font-size: 2em;
                font-weight: bold;
            }}
            .stat-label {{
                font-size: 0.9em;
                opacity: 0.9;
                margin-top: 5px;
            }}
            .visualization-section {{
                margin: 30px 0;
                padding: 20px;
                background: #f8f9fa;
                border-radius: 10px;
            }}
            .viz-title {{
                color: #2c3e50;
                font-size: 1.5em;
                margin-bottom: 15px;
                border-bottom: 2px solid #667eea;
                padding-bottom: 10px;
            }}
            .footer {{
                text-align: center;
                margin-top: 40px;
                padding-top: 20px;
                border-top: 1px solid #e0e0e0;
                color: #7f8c8d;
            }}
        </style>
    </head>
    <body>
        <div class="container">
            <h1>🌍 Blue Zones Analysis Dashboard</h1>
            <div class="subtitle">Interactive Visualization of Global Longevity Patterns</div>
            
            <div class="stats-grid">
    """
    
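    # Add statistics cards. Resolve the primary dataset from data_sources
    # instead of relying on a notebook-level global.
    primary_data = next(
        (data_sources[name]
         for name in ('processed_data', 'cross_section', 'comprehensive_panel')
         if name in data_sources and not data_sources[name].empty),
        None
    )
    if primary_data is not None: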
        stats = [
            ('Total Regions', len(primary_data)),
            ('Blue Zones', primary_data['is_blue_zone'].sum() if 'is_blue_zone' in primary_data.columns else 'N/A'),
            ('Avg Life Expectancy', f"{primary_data['life_expectancy'].mean():.1f}" if 'life_expectancy' in primary_data.columns else 'N/A'),
            ('Data Years', f"{primary_data['year'].nunique()}" if 'year' in primary_data.columns else '1')
        ]
        
        for label, value in stats:
            dashboard_html += f"""
                <div class="stat-card">
                    <div class="stat-value">{value}</div>
                    <div class="stat-label">{label}</div>
                </div>
            """
    
    dashboard_html += "</div>"  # Close stats-grid
    
    # Add visualizations
    viz_count = 0
    for name, fig in figures_dict.items():
        if fig is not None:
            viz_count += 1
            title = name.replace('_', ' ').title()
            
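            # Convert figure to an HTML fragment (full_html=False avoids
            # nesting a complete <html> document inside the dashboard)
            fig_html = fig.to_html(full_html=False, include_plotlyjs=False, div_id=f"viz_{viz_count}")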
            
            dashboard_html += f"""
            <div class="visualization-section">
                <div class="viz-title">{title}</div>
                {fig_html}
            </div>
            """
    
    # Add footer
    dashboard_html += f"""
            <div class="footer">
                <p>Dashboard generated on {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}</p>
                <p>Total visualizations: {viz_count} | Data sources: {len(data_sources)}</p>
            </div>
        </div>
    </body>
    </html>
    """
    
    return dashboard_html

# Create dashboard
dashboard_html = create_comprehensive_dashboard(all_figures, data_sources)
print(f"Dashboard created with {len(all_figures)} visualizations")
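
The dashboard interpolates labels and values straight into HTML. If any of them could contain characters such as `<` or `&` (for instance a region name read from a CSV), escaping them first keeps the markup intact. A minimal sketch using the standard library; `stat_card` is a hypothetical helper that mirrors the card markup used in the dashboard:

```python
from html import escape


def stat_card(label, value):
    """Render one stat card, escaping both fields before interpolation."""
    return (
        '<div class="stat-card">'
        f'<div class="stat-value">{escape(str(value))}</div>'
        f'<div class="stat-label">{escape(str(label))}</div>'
        '</div>'
    )


card = stat_card("Regions <all>", 42)
print(card)
```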

## Export and Save All Visualizations

In [None]:
def save_all_visualizations(figures_dict, folium_map, dashboard_html, data_sources):
    """
    Save all visualizations with proper handling
    """
    output_dir = Path('../outputs/visualizations')
    output_dir.mkdir(parents=True, exist_ok=True)
    
    exports_dir = output_dir / 'exports'
    exports_dir.mkdir(exist_ok=True)
    
    saved_files = []
    
    # Save dashboard
    try:
        dashboard_path = output_dir / 'blue_zones_dashboard.html'
        with open(dashboard_path, 'w', encoding='utf-8') as f:
            f.write(dashboard_html)
        saved_files.append(str(dashboard_path))
        logger.info(f"Dashboard saved to: {dashboard_path}")
    except Exception as e:
        logger.error(f"Error saving dashboard: {e}")
    
    # Save Folium map
    if folium_map is not None:
        try:
            map_path = output_dir / 'interactive_folium_map.html'
            folium_map.save(str(map_path))
            saved_files.append(str(map_path))
            logger.info(f"Folium map saved to: {map_path}")
        except Exception as e:
            logger.error(f"Error saving Folium map: {e}")
    
    # Save all Plotly figures
    for name, fig in figures_dict.items():
        if fig is not None:
            try:
                fig_path = output_dir / f'{name}.html'
                fig.write_html(str(fig_path))
                saved_files.append(str(fig_path))
                logger.info(f"Figure '{name}' saved to: {fig_path}")
            except Exception as e:
                logger.error(f"Error saving {name}: {e}")
    
    # Export data as CSV
    for name, data in data_sources.items():
        if isinstance(data, pd.DataFrame) and not data.empty:
            try:
                csv_path = exports_dir / f'{name}.csv'
                data.to_csv(csv_path, index=False)
                saved_files.append(str(csv_path))
                logger.info(f"Data exported: {csv_path} ({len(data)} rows)")
            except Exception as e:
                logger.error(f"Error exporting {name}: {e}")
    
    # Create summary report
    try:
        summary_path = output_dir / 'visualization_summary.md'
        summary_content = f"""# Blue Zones Visualization Summary

## Generated: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}

## Files Created:
"""
        for file_path in saved_files:
            summary_content += f"- {Path(file_path).name}\n"
        
        summary_content += f"""
## Statistics:
- Total visualizations: {len(figures_dict)}
- Data sources: {len(data_sources)}
- Files saved: {len(saved_files)}

## Visualization Types:
- Interactive maps (Plotly and Folium)
- Correlation heatmaps
- Feature distributions
- Time series analysis
- 3D geographic plots
- Interactive feature explorer
"""
        
        with open(summary_path, 'w', encoding='utf-8') as f:
            f.write(summary_content)
        
        logger.info(f"Summary saved to: {summary_path}")
    except Exception as e:
        logger.error(f"Error creating summary: {e}")
    
    return saved_files

# Save everything
saved_files = save_all_visualizations(all_figures, folium_map, dashboard_html, data_sources)

print("\n" + "="*60)
print("VISUALIZATION CREATION COMPLETED (FIXED VERSION)")
print("="*60)
print(f"Total files saved: {len(saved_files)}")
print("Output directory: ../outputs/visualizations/")

if saved_files:
    print("\nSaved files:")
    for file_path in saved_files[:10]:  # Show first 10
        print(f"  - {Path(file_path).name}")
    if len(saved_files) > 10:
        print(f"  ... and {len(saved_files) - 10} more files")
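
The save step wraps each write in its own `try` block so that one failure does not stop the rest. The same pattern can be exercised against a temporary directory; `save_text_files` is a hypothetical helper, and unlike the notebook's function it also returns the failures so they can be inspected rather than only logged:

```python
import tempfile
from pathlib import Path


def save_text_files(files, output_dir):
    """Write {filename: text} into output_dir; return (saved, failed) lists."""
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    saved, failed = [], []
    for name, text in files.items():
        try:
            (output_dir / name).write_text(text, encoding="utf-8")
            saved.append(name)
        except (OSError, TypeError) as exc:
            # Record the failure and continue with the remaining files
            failed.append((name, str(exc)))
    return saved, failed


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "visualizations"
    saved, failed = save_text_files(
        {"dashboard.html": "<html></html>", "broken.html": None},  # None is not text
        target,
    )
    written = sorted(p.name for p in target.iterdir())

print(saved, [name for name, _ in failed], written)
```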

## Summary

This version of the notebook builds the full set of interactive visualizations and saves them to disk. The changes and outputs are summarized below.

### Key Improvements Implemented:
1. **Fixed Variable Scope**: All figures are collected in a dictionary and passed explicitly to save functions
2. **Data Validation**: Each visualization checks for required columns before creation
3. **Fallback Mechanisms**: Alternative visualizations when specific data is missing
4. **Enhanced Dashboard**: Embedded actual visualizations instead of placeholders
5. **Additional Visualizations**: Time series, 3D plots, and interactive feature explorer

### Visualizations Created:
1. **Global Maps**: Interactive Plotly and Folium maps with clustering and heatmaps
2. **Statistical Analysis**: Correlation heatmaps and feature distributions
3. **Time Series**: Temporal trends with confidence bands
4. **3D Visualizations**: Multi-dimensional geographic analysis
5. **Interactive Explorer**: Dropdown-based feature exploration
6. **Comprehensive Dashboard**: All visualizations combined with statistics

### Data Exports:
- All visualizations saved as standalone HTML files
- Data exported as CSV for external analysis
- Summary report with metadata

Each figure is saved as its own HTML file, so it can be shared, embedded in presentations, or opened in a browser for further analysis.