# Module 12: Interactive Dashboards and Streamlit

## 🚀 Building Dynamic Data Applications

**Learning Objectives:**
- Master Streamlit for rapid dashboard development
- Create interactive visualizations with user controls
- Build multi-page data applications with navigation
- Implement real-time data updates and filtering
- Deploy dashboards for sharing and collaboration
- Optimize performance for large datasets and multiple users

**Topics Covered:**
1. **Streamlit Fundamentals** - Layout, widgets, and basic interactivity
2. **Advanced Components** - Custom widgets, multi-select, date ranges
3. **Data Integration** - File uploads, database connections, APIs
4. **Interactive Plotting** - Plotly integration, dynamic updates
5. **Multi-Page Apps** - Navigation, state management, user sessions
6. **Performance Optimization** - Caching, lazy loading, efficient updates

---

### Why Interactive Dashboards Matter

- **User Engagement**: Interactive elements encourage data exploration
- **Self-Service Analytics**: Enable stakeholders to answer their own questions
- **Real-Time Insights**: Dynamic updates provide current information
- **Accessibility**: Web-based dashboards work across devices and platforms

Let's build professional interactive dashboards! 🎯

In [2]:
# Essential libraries for interactive dashboards
import streamlit as st
import pandas as pd
import numpy as np
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots
import matplotlib.pyplot as plt
import seaborn as sns
from datetime import datetime, timedelta
import time
import warnings
warnings.filterwarnings('ignore')

# Dashboard styling and utilities
import io
import base64
from PIL import Image

# Data sources
from sklearn.datasets import load_iris, load_wine, make_classification
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

print("🚀 Interactive Dashboard Libraries Loaded!")
print("✅ Streamlit version:", st.__version__)
import plotly
print("✅ Plotly version:", plotly.__version__)
print("✅ Ready to build interactive dashboards!")

# Create comprehensive sample datasets for dashboard demonstrations
def load_dashboard_data():
    """Load and prepare sample datasets for dashboard examples"""
    
    # Dataset 1: Sales data with time series
    np.random.seed(42)
    dates = pd.date_range('2023-01-01', '2024-12-31', freq='D')
    n_days = len(dates)
    
    # Simulate realistic sales patterns
    trend = np.linspace(1000, 1500, n_days)
    seasonality = 200 * np.sin(2 * np.pi * np.arange(n_days) / 365.25)
    weekly_pattern = 100 * np.sin(2 * np.pi * np.arange(n_days) / 7)
    noise = np.random.normal(0, 50, n_days)
    
    sales_data = pd.DataFrame({
        'date': dates,
        'sales': trend + seasonality + weekly_pattern + noise,
        'region': np.random.choice(['North', 'South', 'East', 'West'], n_days),
        'product': np.random.choice(['A', 'B', 'C', 'D'], n_days),
        'sales_rep': np.random.choice([f'Rep_{i}' for i in range(1, 11)], n_days)
    })
    sales_data['sales'] = np.maximum(sales_data['sales'], 100)  # Ensure positive sales
    
    # Dataset 2: Enhanced Iris dataset
    iris = load_iris()
    iris_df = pd.DataFrame(iris.data, columns=iris.feature_names)
    iris_df['species'] = iris.target_names[iris.target]
    iris_df['measurement_date'] = pd.date_range('2024-01-01', periods=len(iris_df), freq='D')
    iris_df['researcher'] = np.random.choice(['Dr. Smith', 'Dr. Johnson', 'Dr. Brown'], len(iris_df))
    
    # Dataset 3: Customer analytics data
    n_customers = 1000
    customer_data = pd.DataFrame({
        'customer_id': range(1, n_customers + 1),
        'age': np.random.normal(40, 15, n_customers).astype(int),
        'income': np.random.lognormal(10.5, 0.5, n_customers),
        'spending': np.random.gamma(2, 500, n_customers),
        'satisfaction': np.random.beta(3, 1, n_customers) * 5,
        'segment': np.random.choice(['Premium', 'Standard', 'Basic'], n_customers, p=[0.2, 0.5, 0.3]),
        'signup_date': pd.date_range('2020-01-01', '2024-01-01', periods=n_customers)
    })
    customer_data['age'] = np.clip(customer_data['age'], 18, 80)
    
    return sales_data, iris_df, customer_data

# Load the datasets
sales_data, iris_df, customer_data = load_dashboard_data()

print("📊 Sample Datasets Created:")
print(f"   • Sales Data: {len(sales_data):,} records from {sales_data['date'].min()} to {sales_data['date'].max()}")
print(f"   • Iris Dataset: {len(iris_df)} botanical measurements")
print(f"   • Customer Analytics: {len(customer_data):,} customer profiles")
print("✅ Ready for interactive dashboard development!")

🚀 Interactive Dashboard Libraries Loaded!
✅ Streamlit version: 1.48.1
✅ Plotly version: 6.3.0
✅ Ready to build interactive dashboards!
📊 Sample Datasets Created:
   • Sales Data: 731 records from 2023-01-01 00:00:00 to 2024-12-31 00:00:00
   • Iris Dataset: 150 botanical measurements
   • Customer Analytics: 1,000 customer profiles
✅ Ready for interactive dashboard development!
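
The record count and the noise-free part of the simulated sales signal can be checked by hand. The sketch below uses only the standard library and mirrors the `trend`, `seasonality`, and `weekly_pattern` terms from `load_dashboard_data()`. The helper name `deterministic_sales` is hypothetical, and the random noise term is deliberately left out.

```python
import math
from datetime import date

# Inclusive day count for 2023-01-01 .. 2024-12-31 (2024 is a leap year)
n_days = (date(2024, 12, 31) - date(2023, 1, 1)).days + 1

def deterministic_sales(i: int, n: int = n_days) -> float:
    """Noise-free sales for day index i: linear trend + yearly + weekly sine terms."""
    trend = 1000 + (1500 - 1000) * i / (n - 1)  # same endpoints as np.linspace(1000, 1500, n)
    seasonality = 200 * math.sin(2 * math.pi * i / 365.25)
    weekly = 100 * math.sin(2 * math.pi * i / 7)
    return trend + seasonality + weekly

print(n_days)                            # 731, matching the record count printed above
print(round(deterministic_sales(0), 2))  # 1000.0: both sine terms are zero on day 0
```

Because the yearly term has an amplitude of 200 and the weekly term 100, the noise-free signal stays within 300 of the trend line, which is why the `np.maximum(..., 100)` floor rarely triggers.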


## 🎮 Creating Interactive Streamlit Dashboards

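Streamlit apps run as standalone web applications served by the `streamlit run` command rather than inside a notebook kernel, so the cell below writes each dashboard to its own `.py` file that can be launched separately. The three examples cover different dashboard patterns: a filtered KPI dashboard, a scientific data explorer, and a multi-page template.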
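
As a rough illustration of this write-then-launch workflow, the sketch below saves a tiny app to a temporary directory and assembles the launch command. The file name `hello_dashboard.py` and the app body are hypothetical. `python -m streamlit run <file>` is the module form of the `streamlit run` command, and the actual launch is left commented out because it starts a blocking web server.

```python
import sys
import tempfile
from pathlib import Path

# Hypothetical minimal app; any valid Streamlit script would work here
app_source = 'import streamlit as st\nst.title("Hello, dashboard")\n'

app_dir = Path(tempfile.mkdtemp())
app_path = app_dir / "hello_dashboard.py"
# Explicit UTF-8 keeps emoji and other non-ASCII characters portable across platforms
app_path.write_text(app_source, encoding="utf-8")

# Module form of `streamlit run`, tied to the current interpreter
launch_cmd = [sys.executable, "-m", "streamlit", "run", str(app_path)]
print(" ".join(launch_cmd))

# Uncomment to start the server (blocks until stopped with Ctrl+C):
# import subprocess
# subprocess.run(launch_cmd, check=True)
```

Building the command from `sys.executable` is one way to make sure the app runs in the same environment as the notebook kernel.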

In [3]:
# Create dashboard directory
import os
dashboard_dir = '/Users/sanjeevadodlapati/Downloads/Repos/DataVisualization-Comprehensive-Tutorial/outputs/dashboards'
os.makedirs(dashboard_dir, exist_ok=True)

# Dashboard 1: Sales Analytics Dashboard
sales_dashboard_code = '''
import streamlit as st
import pandas as pd
import numpy as np
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots
from datetime import datetime, timedelta

# Configure page
st.set_page_config(
    page_title="Sales Analytics Dashboard",
    page_icon="📊",
    layout="wide",
    initial_sidebar_state="expanded"
)

# Custom CSS for styling
st.markdown("""
<style>
    .main-header {
        font-size: 3rem;
        color: #1f77b4;
        text-align: center;
        margin-bottom: 2rem;
    }
    .metric-card {
        background-color: #f0f2f6;
        padding: 1rem;
        border-radius: 0.5rem;
        margin: 0.5rem 0;
    }
    .sidebar .sidebar-content {
        background-color: #262730;
    }
</style>
""", unsafe_allow_html=True)

# Sample data generation (same as in notebook)
@st.cache_data
def load_sales_data():
    np.random.seed(42)
    dates = pd.date_range('2023-01-01', '2024-12-31', freq='D')
    n_days = len(dates)
    
    trend = np.linspace(1000, 1500, n_days)
    seasonality = 200 * np.sin(2 * np.pi * np.arange(n_days) / 365.25)
    weekly_pattern = 100 * np.sin(2 * np.pi * np.arange(n_days) / 7)
    noise = np.random.normal(0, 50, n_days)
    
    sales_data = pd.DataFrame({
        'date': dates,
        'sales': trend + seasonality + weekly_pattern + noise,
        'region': np.random.choice(['North', 'South', 'East', 'West'], n_days),
        'product': np.random.choice(['Product A', 'Product B', 'Product C', 'Product D'], n_days),
        'sales_rep': np.random.choice([f'Rep {i}' for i in range(1, 11)], n_days)
    })
    sales_data['sales'] = np.maximum(sales_data['sales'], 100)
    return sales_data

# Load data
df = load_sales_data()

# Header
st.markdown('<h1 class="main-header">📊 Sales Analytics Dashboard</h1>', unsafe_allow_html=True)

# Sidebar filters
st.sidebar.header("🔧 Filters & Controls")
st.sidebar.markdown("---")

# Date range selector
date_range = st.sidebar.date_input(
    "Select Date Range",
    value=[df['date'].min().date(), df['date'].max().date()],
    min_value=df['date'].min().date(),
    max_value=df['date'].max().date()
)

# Region selector
regions = st.sidebar.multiselect(
    "Select Regions",
    options=df['region'].unique(),
    default=df['region'].unique()
)

# Product selector
products = st.sidebar.multiselect(
    "Select Products",
    options=df['product'].unique(),
    default=df['product'].unique()
)

# Filter data based on selections
if len(date_range) == 2:
    filtered_df = df[
        (df['date'] >= pd.to_datetime(date_range[0])) &
        (df['date'] <= pd.to_datetime(date_range[1])) &
        (df['region'].isin(regions)) &
        (df['product'].isin(products))
    ]
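else:
    filtered_df = df[
        (df['region'].isin(regions)) &
        (df['product'].isin(products))
    ]

# Stop early when the filters match nothing, so the KPIs and charts below never show NaN
if filtered_df.empty:
    st.warning("No data matches the current filters. Adjust the sidebar selections.")
    st.stop()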

# Key Metrics Row
st.subheader("📈 Key Performance Indicators")
col1, col2, col3, col4 = st.columns(4)

total_sales = filtered_df['sales'].sum()
avg_daily_sales = filtered_df['sales'].mean()
max_daily_sales = filtered_df['sales'].max()
total_days = len(filtered_df)

with col1:
    st.metric(
        label="Total Sales",
        value=f"${total_sales:,.0f}",
        delta=f"{total_sales/len(filtered_df)*30:,.0f} (30-day proj.)"
    )

with col2:
    # Delta is measured against a fixed illustrative baseline of $1,200 per day
    st.metric(
        label="Average Daily Sales", 
        value=f"${avg_daily_sales:,.0f}",
        delta=f"{(avg_daily_sales-1200)/1200*100:+.1f}%"
    )

with col3:
    st.metric(
        label="Peak Daily Sales",
        value=f"${max_daily_sales:,.0f}",
        delta="Record high" if max_daily_sales > 1800 else "Within range"
    )

with col4:
    st.metric(
        label="Days in Period",
        value=f"{total_days:,}",
        delta=f"{total_days} days selected"
    )

# Charts Row 1
st.subheader("📊 Sales Trends Analysis")
col1, col2 = st.columns([2, 1])

with col1:
    # Time series chart
    daily_sales = filtered_df.groupby('date')['sales'].sum().reset_index()
    fig_ts = px.line(
        daily_sales, 
        x='date', 
        y='sales',
        title='Daily Sales Trend',
        color_discrete_sequence=['#1f77b4']
    )
    fig_ts.update_layout(
        xaxis_title="Date",
        yaxis_title="Sales ($)",
        hovermode='x unified'
    )
    st.plotly_chart(fig_ts, use_container_width=True)

with col2:
    # Regional distribution
    regional_sales = filtered_df.groupby('region')['sales'].sum().reset_index()
    fig_pie = px.pie(
        regional_sales,
        values='sales',
        names='region',
        title='Sales by Region'
    )
    st.plotly_chart(fig_pie, use_container_width=True)

# Charts Row 2
col1, col2 = st.columns(2)

with col1:
    # Product performance
    product_sales = filtered_df.groupby('product')['sales'].agg(['sum', 'mean', 'count']).reset_index()
    product_sales.columns = ['product', 'total_sales', 'avg_sales', 'transaction_count']
    
    fig_bar = px.bar(
        product_sales,
        x='product',
        y='total_sales',
        title='Total Sales by Product',
        color='avg_sales',
        color_continuous_scale='viridis'
    )
    st.plotly_chart(fig_bar, use_container_width=True)

with col2:
    # Sales rep performance
    rep_performance = filtered_df.groupby('sales_rep')['sales'].agg(['sum', 'count']).reset_index()
    rep_performance.columns = ['sales_rep', 'total_sales', 'transaction_count']
    rep_performance = rep_performance.sort_values('total_sales', ascending=True).tail(10)
    
    fig_horizontal = px.bar(
        rep_performance,
        x='total_sales',
        y='sales_rep',
        orientation='h',
        title='Top 10 Sales Representatives',
        color='total_sales',
        color_continuous_scale='plasma'
    )
    st.plotly_chart(fig_horizontal, use_container_width=True)

# Advanced Analytics Section
st.subheader("🧪 Advanced Analytics")

# Moving averages
st.subheader("📈 Moving Average Analysis")
ma_days = st.slider("Moving Average Days", min_value=7, max_value=60, value=30)

daily_sales_ma = daily_sales.copy()
daily_sales_ma[f'MA_{ma_days}'] = daily_sales_ma['sales'].rolling(window=ma_days).mean()

fig_ma = go.Figure()
fig_ma.add_trace(go.Scatter(
    x=daily_sales_ma['date'],
    y=daily_sales_ma['sales'],
    mode='lines',
    name='Daily Sales',
    line=dict(color='lightblue', width=1),
    opacity=0.7
))
fig_ma.add_trace(go.Scatter(
    x=daily_sales_ma['date'],
    y=daily_sales_ma[f'MA_{ma_days}'],
    mode='lines',
    name=f'{ma_days}-Day Moving Average',
    line=dict(color='red', width=3)
))
fig_ma.update_layout(
    title=f'Sales Trend with {ma_days}-Day Moving Average',
    xaxis_title='Date',
    yaxis_title='Sales ($)'
)
st.plotly_chart(fig_ma, use_container_width=True)

# Data Table
st.subheader("📋 Detailed Data")
if st.checkbox("Show raw data"):
    st.dataframe(
        filtered_df.head(100),
        use_container_width=True
    )

# Summary statistics
st.subheader("📊 Summary Statistics")
col1, col2 = st.columns(2)

with col1:
    st.write("**Sales Statistics**")
    st.write(filtered_df['sales'].describe())

with col2:
    st.write("**Regional Breakdown**")
    regional_stats = filtered_df.groupby('region')['sales'].agg(['count', 'mean', 'std']).round(2)
    st.write(regional_stats)

# Footer
st.markdown("---")
st.markdown("**Dashboard Created with Streamlit** | Last Updated: " + datetime.now().strftime("%Y-%m-%d %H:%M:%S"))
'''

# Save the sales dashboard
with open(f"{dashboard_dir}/sales_dashboard.py", "w", encoding="utf-8") as f:  # emoji in the source need UTF-8
    f.write(sales_dashboard_code)

print("📊 Sales Analytics Dashboard Created!")
print(f"   📁 File: {dashboard_dir}/sales_dashboard.py")
print("   🚀 To run: streamlit run sales_dashboard.py")
print()

# Dashboard 2: Scientific Data Explorer
scientific_dashboard_code = '''
import streamlit as st
import pandas as pd
import numpy as np
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots
from sklearn.datasets import load_iris, load_wine
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans
import seaborn as sns

st.set_page_config(
    page_title="Scientific Data Explorer",
    page_icon="🔬",
    layout="wide"
)

st.title("🔬 Scientific Data Explorer")
st.markdown("**Interactive analysis of scientific datasets with machine learning insights**")

# Sidebar for dataset selection
st.sidebar.header("🧪 Dataset Selection")
dataset_choice = st.sidebar.selectbox(
    "Choose a dataset:",
    ["Iris Flower Dataset", "Wine Quality Dataset"]
)

@st.cache_data
def load_scientific_data(dataset_name):
    if dataset_name == "Iris Flower Dataset":
        data = load_iris()
        df = pd.DataFrame(data.data, columns=data.feature_names)
        df['target'] = data.target
        df['species'] = [data.target_names[i] for i in data.target]
        return df, data.feature_names, 'species'
    else:  # Wine dataset
        data = load_wine()
        df = pd.DataFrame(data.data, columns=data.feature_names)
        df['target'] = data.target
        df['wine_class'] = [f'Class {i}' for i in data.target]
        return df, data.feature_names, 'wine_class'

df, feature_names, target_col = load_scientific_data(dataset_choice)

# Analysis options
st.sidebar.header("🔍 Analysis Options")
analysis_type = st.sidebar.selectbox(
    "Select Analysis Type:",
    ["Exploratory Data Analysis", "Principal Component Analysis", "Clustering Analysis", "Feature Relationships"]
)

if analysis_type == "Exploratory Data Analysis":
    st.header("📊 Exploratory Data Analysis")
    
    # Dataset overview
    col1, col2, col3 = st.columns(3)
    with col1:
        st.metric("Total Samples", len(df))
    with col2:
        st.metric("Features", len(feature_names))
    with col3:
        st.metric("Classes", df[target_col].nunique())
    
    # Feature selection for visualization
    selected_features = st.multiselect(
        "Select features to analyze:",
        feature_names,
        default=feature_names[:4] if len(feature_names) >= 4 else feature_names
    )
    
    if selected_features:
        # Distribution plots
        st.subheader("📈 Feature Distributions")
        fig_dist = make_subplots(
            rows=2, cols=2,
            subplot_titles=selected_features[:4],
            specs=[[{"type": "xy"}, {"type": "xy"}],
                   [{"type": "xy"}, {"type": "xy"}]]
        )
        
        for i, feature in enumerate(selected_features[:4]):
            row = i // 2 + 1
            col = i % 2 + 1
            
            for class_name in df[target_col].unique():
                class_data = df[df[target_col] == class_name][feature]
                fig_dist.add_trace(
                    go.Histogram(
                        x=class_data,
                        name=f"{class_name}",
                        opacity=0.7,
                        legendgroup=class_name,
                        showlegend=(i == 0)
                    ),
                    row=row, col=col
                )
        
        fig_dist.update_layout(
            title="Feature Distributions by Class",
            height=600,
            barmode='overlay'
        )
        st.plotly_chart(fig_dist, use_container_width=True)
        
        # Correlation matrix
        st.subheader("🔗 Feature Correlation Matrix")
        corr_matrix = df[selected_features].corr()
        
        fig_corr = px.imshow(
            corr_matrix,
            text_auto=True,
            aspect="auto",
            title="Feature Correlation Heatmap",
            color_continuous_scale="RdBu"
        )
        st.plotly_chart(fig_corr, use_container_width=True)

elif analysis_type == "Principal Component Analysis":
    st.header("🎯 Principal Component Analysis")
    
    # PCA computation
    X = df[feature_names]
    scaler = StandardScaler()
    X_scaled = scaler.fit_transform(X)
    
    n_components = st.slider("Number of PCA Components", 2, min(len(feature_names), 10), 3)
    pca = PCA(n_components=n_components)
    X_pca = pca.fit_transform(X_scaled)
    
    # Explained variance
    st.subheader("📊 Explained Variance")
    variance_df = pd.DataFrame({
        'Component': [f'PC{i+1}' for i in range(n_components)],
        'Explained_Variance': pca.explained_variance_ratio_,
        'Cumulative_Variance': np.cumsum(pca.explained_variance_ratio_)
    })
    
    fig_var = px.bar(
        variance_df,
        x='Component',
        y='Explained_Variance',
        title='Explained Variance by Principal Component'
    )
    st.plotly_chart(fig_var, use_container_width=True)
    
    # PCA scatter plot
    st.subheader("🎨 PCA Visualization")
    if n_components >= 2:
        pca_df = pd.DataFrame(X_pca[:, :3], columns=[f'PC{i+1}' for i in range(min(3, n_components))])
        pca_df[target_col] = df[target_col]
        
        if n_components >= 3:
            fig_3d = px.scatter_3d(
                pca_df,
                x='PC1',
                y='PC2',
                z='PC3',
                color=target_col,
                title='3D PCA Visualization'
            )
            st.plotly_chart(fig_3d, use_container_width=True)
        
        fig_2d = px.scatter(
            pca_df,
            x='PC1',
            y='PC2',
            color=target_col,
            title='2D PCA Visualization'
        )
        st.plotly_chart(fig_2d, use_container_width=True)

elif analysis_type == "Clustering Analysis":
    st.header("🎯 K-Means Clustering Analysis")
    
    # Feature selection for clustering
    clustering_features = st.multiselect(
        "Select features for clustering:",
        feature_names,
        default=feature_names[:4] if len(feature_names) >= 4 else feature_names
    )
    
    if clustering_features:
        X_cluster = df[clustering_features]
        scaler = StandardScaler()
        X_scaled = scaler.fit_transform(X_cluster)
        
        # Number of clusters
        n_clusters = st.slider("Number of Clusters", 2, 10, 3)
        
        # Perform clustering
        kmeans = KMeans(n_clusters=n_clusters, random_state=42)
        cluster_labels = kmeans.fit_predict(X_scaled)
        
        # Add cluster labels to dataframe
        df_clustered = df.copy()
        df_clustered['Cluster'] = [f'Cluster {i}' for i in cluster_labels]
        
        # Visualization
        if len(clustering_features) >= 2:
            fig_cluster = px.scatter(
                df_clustered,
                x=clustering_features[0],
                y=clustering_features[1],
                color='Cluster',
                symbol=target_col,
                title=f'K-Means Clustering (k={n_clusters})',
                hover_data=clustering_features[2:] if len(clustering_features) > 2 else None
            )
            st.plotly_chart(fig_cluster, use_container_width=True)
        
        # Cluster centers
        st.subheader("🎯 Cluster Centers")
        centers_df = pd.DataFrame(
            scaler.inverse_transform(kmeans.cluster_centers_),
            columns=clustering_features,
            index=[f'Cluster {i}' for i in range(n_clusters)]
        )
        st.dataframe(centers_df, use_container_width=True)

else:  # Feature Relationships
    st.header("🔗 Feature Relationships Analysis")
    
    # Feature selection
    col1, col2 = st.columns(2)
    with col1:
        x_feature = st.selectbox("Select X-axis feature:", feature_names)
    with col2:
        y_feature = st.selectbox("Select Y-axis feature:", feature_names, index=1)
    
    # Scatter plot with regression (trendline="ols" requires the statsmodels package)
    fig_scatter = px.scatter(
        df,
        x=x_feature,
        y=y_feature,
        color=target_col,
        trendline="ols",
        title=f'{x_feature} vs {y_feature}',
        hover_data=feature_names
    )
    st.plotly_chart(fig_scatter, use_container_width=True)
    
    # Statistical summary
    st.subheader("📊 Statistical Summary")
    correlation = df[x_feature].corr(df[y_feature])
    st.metric("Pearson Correlation", f"{correlation:.3f}")
    
    # Box plots
    st.subheader("📦 Distribution by Class")
    col1, col2 = st.columns(2)
    
    with col1:
        fig_box1 = px.box(df, x=target_col, y=x_feature, title=f'{x_feature} by Class')
        st.plotly_chart(fig_box1, use_container_width=True)
    
    with col2:
        fig_box2 = px.box(df, x=target_col, y=y_feature, title=f'{y_feature} by Class')
        st.plotly_chart(fig_box2, use_container_width=True)

# Raw data display
st.sidebar.markdown("---")
if st.sidebar.checkbox("Show Raw Data"):
    st.subheader("📋 Raw Dataset")
    st.dataframe(df, use_container_width=True)

st.markdown("---")
st.markdown("**Scientific Data Explorer** | Built with Streamlit & Scikit-learn")
'''

# Save the scientific dashboard
with open(f"{dashboard_dir}/scientific_explorer.py", "w", encoding="utf-8") as f:  # emoji in the source need UTF-8
    f.write(scientific_dashboard_code)

print("🔬 Scientific Data Explorer Created!")
print(f"   📁 File: {dashboard_dir}/scientific_explorer.py")
print("   🚀 To run: streamlit run scientific_explorer.py")
print()

# Dashboard 3: Multi-Page Dashboard Template
multipage_dashboard_code = '''
import streamlit as st
import pandas as pd
import numpy as np
import plotly.express as px
import plotly.graph_objects as go
from datetime import datetime, timedelta

# Configure the page
st.set_page_config(
    page_title="Multi-Page Dashboard",
    page_icon="📚",
    layout="wide",
    initial_sidebar_state="expanded"
)

# Initialize session state
if 'page' not in st.session_state:
    st.session_state.page = 'Overview'

# Navigation
st.sidebar.title("📚 Navigation")
pages = {
    "📊 Overview": "Overview",
    "📈 Analytics": "Analytics", 
    "🎯 Predictions": "Predictions",
    "⚙️ Settings": "Settings"
}

selected_page = st.sidebar.radio("Go to:", list(pages.keys()))
st.session_state.page = pages[selected_page]

# Sample data
@st.cache_data
def generate_sample_data():
    np.random.seed(42)
    dates = pd.date_range('2024-01-01', periods=365, freq='D')
    data = pd.DataFrame({
        'date': dates,
        'metric_a': np.cumsum(np.random.randn(365)) + 100,
        'metric_b': np.cumsum(np.random.randn(365)) + 50,
        'category': np.random.choice(['X', 'Y', 'Z'], 365),
        'region': np.random.choice(['North', 'South', 'East', 'West'], 365)
    })
    return data

df = generate_sample_data()

# Page content
if st.session_state.page == "Overview":
    st.title("📊 Dashboard Overview")
    st.markdown("Welcome to the multi-page interactive dashboard!")
    
    # Key metrics
    col1, col2, col3, col4 = st.columns(4)
    with col1:
        st.metric("Total Records", len(df), delta="365 days")
    # st.metric takes arrow direction and color from the sign of delta, so pass signed values
    with col2:
        st.metric("Avg Metric A", f"{df['metric_a'].mean():.1f}", delta="2.3%")
    with col3:
        st.metric("Avg Metric B", f"{df['metric_b'].mean():.1f}", delta="-1.1%")
    with col4:
        st.metric("Categories", df['category'].nunique(), delta="3 active")
    
    # Overview charts
    col1, col2 = st.columns(2)
    
    with col1:
        fig1 = px.line(df, x='date', y='metric_a', title='Metric A Trend')
        st.plotly_chart(fig1, use_container_width=True)
    
    with col2:
        category_counts = df['category'].value_counts()
        fig2 = px.pie(values=category_counts.values, names=category_counts.index, title='Category Distribution')
        st.plotly_chart(fig2, use_container_width=True)

elif st.session_state.page == "Analytics":
    st.title("📈 Advanced Analytics")
    
    # Filters
    st.sidebar.header("🔧 Filters")
    date_range = st.sidebar.date_input(
        "Date Range",
        value=[df['date'].min().date(), df['date'].max().date()],
        min_value=df['date'].min().date(),
        max_value=df['date'].max().date()
    )
    
    selected_categories = st.sidebar.multiselect(
        "Categories",
        df['category'].unique(),
        default=df['category'].unique()
    )
    
    # Filter data
    if len(date_range) == 2:
        filtered_df = df[
            (df['date'] >= pd.to_datetime(date_range[0])) &
            (df['date'] <= pd.to_datetime(date_range[1])) &
            (df['category'].isin(selected_categories))
        ]
    else:
        filtered_df = df[df['category'].isin(selected_categories)]
    
    # Analytics content
    tab1, tab2, tab3 = st.tabs(["📊 Trends", "🔍 Correlations", "📋 Statistics"])
    
    with tab1:
        fig = px.line(filtered_df, x='date', y=['metric_a', 'metric_b'], title='Metrics Over Time')
        st.plotly_chart(fig, use_container_width=True)
        
        # Regional analysis
        regional_data = filtered_df.groupby(['region', 'category']).agg({
            'metric_a': 'mean',
            'metric_b': 'mean'
        }).reset_index()
        
        fig_region = px.bar(
            regional_data, 
            x='region', 
            y='metric_a', 
            color='category',
            title='Average Metric A by Region and Category'
        )
        st.plotly_chart(fig_region, use_container_width=True)
    
    with tab2:
        # Correlation analysis
        correlation = filtered_df[['metric_a', 'metric_b']].corr()
        fig_corr = px.imshow(correlation, text_auto=True, title='Metric Correlation')
        st.plotly_chart(fig_corr, use_container_width=True)
        
        # Scatter plot
        fig_scatter = px.scatter(
            filtered_df, 
            x='metric_a', 
            y='metric_b', 
            color='category',
            title='Metric A vs Metric B'
        )
        st.plotly_chart(fig_scatter, use_container_width=True)
    
    with tab3:
        st.subheader("📊 Summary Statistics")
        st.dataframe(filtered_df[['metric_a', 'metric_b']].describe())
        
        st.subheader("📋 Sample Data")
        st.dataframe(filtered_df.head(20))

elif st.session_state.page == "Predictions":
    st.title("🎯 Predictive Analytics")
    st.info("This section would contain machine learning models and predictions.")
    
    # Simple moving average prediction
    window = st.slider("Moving Average Window", 5, 50, 20)
    
    df_pred = df.copy()
    df_pred['ma_metric_a'] = df_pred['metric_a'].rolling(window=window).mean()
    
    # Simple linear extrapolation for demo
    last_values = df_pred['metric_a'].tail(window).values
    trend = np.polyfit(range(window), last_values, 1)
    future_dates = pd.date_range(df['date'].max() + timedelta(days=1), periods=30, freq='D')
    future_values = [trend[0] * (window + i) + trend[1] for i in range(30)]
    
    # Combine historical and predicted
    pred_df = pd.DataFrame({
        'date': list(df['date']) + list(future_dates),
        'metric_a': list(df['metric_a']) + [np.nan] * 30,
        'predicted': [np.nan] * len(df) + future_values
    })
    
    fig_pred = go.Figure()
    fig_pred.add_trace(go.Scatter(
        x=pred_df['date'][:len(df)],
        y=pred_df['metric_a'][:len(df)],
        mode='lines',
        name='Historical',
        line=dict(color='blue')
    ))
    fig_pred.add_trace(go.Scatter(
        x=pred_df['date'][len(df):],
        y=pred_df['predicted'][len(df):],
        mode='lines',
        name='Predicted',
        line=dict(color='red', dash='dash')
    ))
    fig_pred.update_layout(title='Metric A: Historical vs Predicted')
    st.plotly_chart(fig_pred, use_container_width=True)

else:  # Settings
    st.title("⚙️ Dashboard Settings")
    
    # Theme settings
    st.subheader("🎨 Appearance")
    theme_color = st.color_picker("Primary Color", "#1f77b4")
    
    # Data settings
    st.subheader("📊 Data Configuration")
    refresh_rate = st.selectbox("Data Refresh Rate", ["Manual", "5 minutes", "15 minutes", "1 hour"])
    
    # Export settings
    st.subheader("📤 Export Options")
    export_format = st.selectbox("Default Export Format", ["CSV", "Excel", "JSON"])
    
    # Save settings
    if st.button("Save Settings"):
        st.success("Settings saved successfully!")
        
    # Reset button
    if st.button("Reset to Defaults"):
        st.warning("Settings reset to defaults.")

# Footer
st.sidebar.markdown("---")
st.sidebar.markdown("**Multi-Page Dashboard** | Built with Streamlit")
st.sidebar.markdown(f"Current time: {datetime.now().strftime('%H:%M:%S')}")
'''

# Save the multi-page dashboard
with open(f"{dashboard_dir}/multipage_dashboard.py", "w", encoding="utf-8") as f:  # emoji in the source need UTF-8
    f.write(multipage_dashboard_code)

print("📚 Multi-Page Dashboard Template Created!")
print(f"   📁 File: {dashboard_dir}/multipage_dashboard.py")
print("   🚀 To run: streamlit run multipage_dashboard.py")
print()

print("✅ All Dashboard Examples Created Successfully!")
print(f"📁 Dashboard files location: {dashboard_dir}")
print("\n🚀 To run any dashboard:")
print("   1. Open terminal in the dashboard directory")
print("   2. Run: streamlit run <dashboard_name>.py")
print("   3. Dashboard will open in your web browser")
print("\n📋 Available Dashboards:")
print("   • sales_dashboard.py - Sales analytics with filters and KPIs")
print("   • scientific_explorer.py - Scientific data analysis with ML")
print("   • multipage_dashboard.py - Multi-page app template")

📊 Sales Analytics Dashboard Created!
   📁 File: /Users/sanjeevadodlapati/Downloads/Repos/DataVisualization-Comprehensive-Tutorial/outputs/dashboards/sales_dashboard.py
   🚀 To run: streamlit run sales_dashboard.py

🔬 Scientific Data Explorer Created!
   📁 File: /Users/sanjeevadodlapati/Downloads/Repos/DataVisualization-Comprehensive-Tutorial/outputs/dashboards/scientific_explorer.py
   🚀 To run: streamlit run scientific_explorer.py

📚 Multi-Page Dashboard Template Created!
   📁 File: /Users/sanjeevadodlapati/Downloads/Repos/DataVisualization-Comprehensive-Tutorial/outputs/dashboards/multipage_dashboard.py
   🚀 To run: streamlit run multipage_dashboard.py

✅ All Dashboard Examples Created Successfully!
📁 Dashboard files location: /Users/sanjeevadodlapati/Downloads/Repos/DataVisualization-Comprehensive-Tutorial/outputs/dashboards

🚀 To run any dashboard:
   1. Open terminal in the dashboard directory
   2. Run: streamlit run <dashboard_name>.py
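   3. Dashboard will open in your web browser

📋 Available Dashboards:
   • sales_dashboard.py - Sales analytics with filters and KPIs
   • scientific_explorer.py - Scientific data analysis with ML
   • multipage_dashboard.py - Multi-page app template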
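
The dashboards above guard their date filters with `len(date_range) == 2` because a range-style `st.date_input` can return fewer than two dates while the user is still picking the end of the range. One way to centralize that guard is sketched below with the standard library only. `normalize_date_range` is a hypothetical helper, not part of Streamlit, and keeping a lone start date (instead of dropping the date filter, as the dashboards do) is a design choice made for this sketch.

```python
from datetime import date
from typing import Sequence, Tuple

def normalize_date_range(
    selection: Sequence[date], default_start: date, default_end: date
) -> Tuple[date, date]:
    """Return a (start, end) pair, falling back to defaults for an incomplete selection."""
    if len(selection) == 2:
        start, end = selection
        # Tolerate a reversed pair rather than returning an empty window
        return (start, end) if start <= end else (end, start)
    if len(selection) == 1:
        # Only the start has been picked so far: keep it and extend to the default end
        return selection[0], default_end
    return default_start, default_end

full = (date(2023, 1, 1), date(2024, 12, 31))
print(normalize_date_range((date(2023, 6, 1),), *full))
print(normalize_date_range((), *full))
```

The returned pair can then feed a single filtering expression, so the two-branch `if`/`else` around the DataFrame mask is no longer needed.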

## ⚡ Performance Optimization & Deployment

Let's explore advanced Streamlit techniques for optimizing dashboard performance and preparing for deployment.
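
The core idea behind the caching decorators used in this section is memoization: the function body runs once per distinct set of arguments, and later calls reuse the stored result. The sketch below shows that idea with the standard library's `functools.lru_cache` as a stand-in. It is only an analogy, since `st.cache_data` has its own hashing and storage behavior, and `slow_square` is a made-up example function.

```python
from functools import lru_cache

call_count = 0  # counts how often the function body actually executes

@lru_cache(maxsize=None)
def slow_square(n: int) -> int:
    """Stand-in for an expensive load or computation."""
    global call_count
    call_count += 1
    return n * n

results = [slow_square(4), slow_square(4), slow_square(5)]
print(results)     # [16, 16, 25]
print(call_count)  # 2: the second slow_square(4) was served from the cache
```

This matters in Streamlit because the whole script reruns on every widget interaction, so an uncached loader would repeat its work each time.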

In [4]:
# Performance Optimization Techniques for Streamlit

# 1. Demonstrate caching strategies
print("⚡ STREAMLIT PERFORMANCE OPTIMIZATION TECHNIQUES")
print("=" * 60)

# Example: Advanced caching with @st.cache_data
caching_example = '''
# ✅ GOOD: Efficient data loading with caching
@st.cache_data
def load_large_dataset(file_path, filters=None):
    """Cached function for loading and filtering data"""
    df = pd.read_csv(file_path)
    if filters:
        for column, values in filters.items():
            df = df[df[column].isin(values)]
    return df

# ✅ GOOD: Cached expensive computations  
@st.cache_data
def compute_complex_analysis(data, analysis_type):
    """Cache expensive calculations"""
    if analysis_type == "pca":
        from sklearn.decomposition import PCA
        pca = PCA(n_components=2)
        return pca.fit_transform(data)
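    elif analysis_type == "clustering":
        from sklearn.cluster import KMeans
        # A fixed seed and explicit n_init keep cluster labels stable across reruns
        kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
        return kmeans.fit_predict(data)
    raise ValueError(f"Unknown analysis_type: {analysis_type!r}")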

# ✅ GOOD: Session state for user interactions
if 'user_selections' not in st.session_state:
    st.session_state.user_selections = {}

# ❌ BAD: Loading data without caching
# df = pd.read_csv("large_file.csv")  # Runs every time!

# ❌ BAD: Complex computations without caching
# pca_result = PCA().fit_transform(data)  # Recalculates every interaction!
'''

print("📋 Caching Best Practices:")
print("✅ Use @st.cache_data for data loading and transformations")
print("✅ Use @st.cache_resource for ML models and database connections")
print("✅ Use st.session_state for user interaction state")
print("✅ Cache expensive computations (PCA, clustering, etc.)")
print("❌ Don't cache Streamlit widgets or UI elements")
print()

# 2. Create optimized dashboard example
optimized_dashboard_code = '''
import streamlit as st
import pandas as pd
import numpy as np
import plotly.express as px
import plotly.graph_objects as go
from sklearn.datasets import make_classification
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
import time

# Configure page with performance settings
st.set_page_config(
    page_title="Optimized Dashboard",
    page_icon="⚡",
    layout="wide",
    initial_sidebar_state="expanded"
)

# Cached data generation and analysis helpers
@st.cache_data
def generate_large_dataset(n_samples=10000):
    """Generate large dataset with caching"""
    X, y = make_classification(
        n_samples=n_samples,
        n_features=20,
        n_informative=15,
        n_redundant=5,
        n_clusters_per_class=1,
        random_state=42
    )
    
    feature_names = [f'feature_{i}' for i in range(20)]
    df = pd.DataFrame(X, columns=feature_names)
    df['target'] = y
    # Seeded generator so categories are identical whenever the cache is rebuilt
    df['category'] = np.random.default_rng(42).choice(['A', 'B', 'C'], n_samples)
    
    return df

@st.cache_data
def compute_pca_analysis(data, n_components=2):
    """Cached PCA computation"""
    scaler = StandardScaler()
    data_scaled = scaler.fit_transform(data)
    
    pca = PCA(n_components=n_components)
    pca_result = pca.fit_transform(data_scaled)
    
    return pca_result, pca.explained_variance_ratio_

@st.cache_data
def filter_data(df, selected_categories, feature_range):
    """Cached data filtering"""
    filtered_df = df[df['category'].isin(selected_categories)]
    
    # Apply feature range filtering if provided
    if feature_range:
        for feature, (min_val, max_val) in feature_range.items():
            filtered_df = filtered_df[
                (filtered_df[feature] >= min_val) & 
                (filtered_df[feature] <= max_val)
            ]
    
    return filtered_df

# Initialize session state
if 'data_loaded' not in st.session_state:
    st.session_state.data_loaded = False
if 'analysis_cache' not in st.session_state:
    st.session_state.analysis_cache = {}

# Title and description
st.title("⚡ High-Performance Dashboard")
st.markdown("**Demonstration of optimized Streamlit techniques for large datasets**")

# Performance metrics in sidebar
st.sidebar.header("📊 Performance Metrics")

# Data loading with progress
if not st.session_state.data_loaded:
    with st.spinner("Loading dataset..."):
        start_time = time.time()
        df = generate_large_dataset(10000)
        load_time = time.time() - start_time
        st.session_state.data_loaded = True
        st.session_state.df = df
        st.sidebar.success(f"Data loaded in {load_time:.2f}s")
else:
    df = st.session_state.df
    st.sidebar.info("Data loaded from cache")

# Sidebar controls
st.sidebar.header("🔧 Controls")

# Category filter
selected_categories = st.sidebar.multiselect(
    "Select Categories",
    options=['A', 'B', 'C'],
    default=['A', 'B', 'C'],
    key="category_filter"
)

# Feature selection
feature_columns = [col for col in df.columns if col.startswith('feature_')]
selected_features = st.sidebar.multiselect(
    "Select Features for Analysis",
    options=feature_columns,
    default=feature_columns[:5],
    key="feature_selection"
)

# Analysis type
analysis_type = st.sidebar.selectbox(
    "Analysis Type",
    ["Overview", "PCA Analysis", "Feature Correlation"],
    key="analysis_type"
)

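# Key the session cache on the category selection only: filter_data() does not
# depend on the selected features, so including them would create duplicate entries
filter_key = f"{tuple(sorted(selected_categories))}"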
if filter_key not in st.session_state.analysis_cache:
    with st.spinner("Filtering data..."):
        start_time = time.time()
        filtered_df = filter_data(df, selected_categories, None)
        filter_time = time.time() - start_time
        st.session_state.analysis_cache[filter_key] = filtered_df
        st.sidebar.info(f"Filter time: {filter_time:.3f}s")
else:
    filtered_df = st.session_state.analysis_cache[filter_key]
    st.sidebar.info("Filtered data from cache")

# Display metrics
col1, col2, col3, col4 = st.columns(4)
with col1:
    st.metric("Total Records", f"{len(df):,}")
with col2:
    st.metric("Filtered Records", f"{len(filtered_df):,}")
with col3:
    st.metric("Selected Features", len(selected_features))
with col4:
    st.metric("Cache Entries", len(st.session_state.analysis_cache))

# Stop early when nothing matches; PCA and correlation fail on empty data
if filtered_df.empty:
    st.warning("No records match the selected categories.")
    st.stop()

# Main content based on analysis type
if analysis_type == "Overview":
    st.subheader("📊 Dataset Overview")
    
    # Use columns for efficient layout
    col1, col2 = st.columns(2)
    
    with col1:
        # Category distribution
        category_counts = filtered_df['category'].value_counts()
        fig_pie = px.pie(
            values=category_counts.values,
            names=category_counts.index,
            title="Category Distribution"
        )
        st.plotly_chart(fig_pie, use_container_width=True, key="category_pie")
    
    with col2:
        # Target distribution
        target_counts = filtered_df['target'].value_counts()
        fig_bar = px.bar(
            x=target_counts.index,
            y=target_counts.values,
            title="Target Distribution"
        )
        st.plotly_chart(fig_bar, use_container_width=True, key="target_bar")

elif analysis_type == "PCA Analysis":
    if len(selected_features) >= 2:
        st.subheader("🎯 Principal Component Analysis")
        
        # Cached PCA computation
        pca_key = f"pca_{filter_key}_{tuple(selected_features)}"
        if pca_key not in st.session_state.analysis_cache:
            with st.spinner("Computing PCA..."):
                start_time = time.time()
                pca_result, variance_ratio = compute_pca_analysis(
                    filtered_df[selected_features], 
                    n_components=min(3, len(selected_features))
                )
                pca_time = time.time() - start_time
                st.session_state.analysis_cache[pca_key] = (pca_result, variance_ratio)
                st.sidebar.info(f"PCA time: {pca_time:.3f}s")
        else:
            pca_result, variance_ratio = st.session_state.analysis_cache[pca_key]
            st.sidebar.info("PCA from cache")
        
        # Explained variance
        col1, col2 = st.columns(2)
        
        with col1:
            variance_df = pd.DataFrame({
                'Component': [f'PC{i+1}' for i in range(len(variance_ratio))],
                'Explained_Variance': variance_ratio
            })
            fig_var = px.bar(
                variance_df,
                x='Component',
                y='Explained_Variance',
                title='Explained Variance by Component'
            )
            st.plotly_chart(fig_var, use_container_width=True)
        
        with col2:
            # PCA scatter plot
            pca_df = pd.DataFrame(pca_result[:, :2], columns=['PC1', 'PC2'])
            pca_df['category'] = filtered_df['category'].values
            pca_df['target'] = filtered_df['target'].values
            
            fig_pca = px.scatter(
                pca_df,
                x='PC1',
                y='PC2',
                color='category',
                symbol='target',
                title='PCA Visualization'
            )
            st.plotly_chart(fig_pca, use_container_width=True)
    else:
        st.warning("Please select at least 2 features for PCA analysis")

else:  # Feature Correlation
    if len(selected_features) >= 2:
        st.subheader("🔗 Feature Correlation Analysis")
        
        # Cached correlation computation
        corr_key = f"corr_{filter_key}_{tuple(selected_features)}"
        if corr_key not in st.session_state.analysis_cache:
            with st.spinner("Computing correlations..."):
                start_time = time.time()
                correlation_matrix = filtered_df[selected_features].corr()
                corr_time = time.time() - start_time
                st.session_state.analysis_cache[corr_key] = correlation_matrix
                st.sidebar.info(f"Correlation time: {corr_time:.3f}s")
        else:
            correlation_matrix = st.session_state.analysis_cache[corr_key]
            st.sidebar.info("Correlation from cache")
        
        # Correlation heatmap
        fig_corr = px.imshow(
            correlation_matrix,
            text_auto=True,
            aspect="auto",
            title="Feature Correlation Matrix"
        )
        st.plotly_chart(fig_corr, use_container_width=True)
    else:
        st.warning("Please select at least 2 features for correlation analysis")

# Performance summary
st.sidebar.markdown("---")
st.sidebar.subheader("⚡ Performance Summary")
st.sidebar.info(f"Cache size: {len(st.session_state.analysis_cache)} entries")
st.sidebar.info(f"Total records: {len(df):,}")

# Clear cache button: reset both the session-level dict and Streamlit's data cache
if st.sidebar.button("Clear Cache"):
    st.session_state.analysis_cache.clear()
    st.cache_data.clear()
    st.sidebar.success("Cache cleared!")
'''

# Save optimized dashboard
with open(f"{dashboard_dir}/optimized_dashboard.py", "w") as f:
    f.write(optimized_dashboard_code)

print("⚡ Optimized Performance Dashboard Created!")
print(f"   📁 File: {dashboard_dir}/optimized_dashboard.py")
print("   🚀 Features: Caching, session state, performance monitoring")
print()

# 3. Deployment configurations
print("🚀 DEPLOYMENT CONFIGURATIONS")
print("=" * 40)

# Create Streamlit config
config_toml = '''
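[server]
# runOnSave is a development convenience; keep it off for deployed apps
runOnSave = false
port = 8501
# Leave CORS and XSRF protection enabled unless a trusted reverse proxy requires otherwise
enableCORS = true
enableXsrfProtection = true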

[browser]
gatherUsageStats = false

[theme]
primaryColor = "#1f77b4"
backgroundColor = "#ffffff"
secondaryBackgroundColor = "#f0f2f6"
textColor = "#262730"
'''

# Save config
config_dir = f"{dashboard_dir}/.streamlit"
os.makedirs(config_dir, exist_ok=True)
with open(f"{config_dir}/config.toml", "w") as f:
    f.write(config_toml)

# Create requirements file
requirements_txt = '''
streamlit>=1.48.0
pandas>=2.0.0
numpy>=1.24.0
plotly>=5.0.0
scikit-learn>=1.3.0
'''

with open(f"{dashboard_dir}/requirements.txt", "w") as f:
    f.write(requirements_txt)

# Create deployment script
deploy_script = '''#!/bin/bash
# Streamlit Dashboard Deployment Script

echo "🚀 Starting Streamlit Dashboard Deployment"

# Check if virtual environment exists
if [ ! -d "venv" ]; then
    echo "📦 Creating virtual environment..."
    python -m venv venv
fi

# Activate virtual environment
source venv/bin/activate

# Install requirements
echo "📥 Installing dependencies..."
pip install -r requirements.txt

# Server options are read from .streamlit/config.toml, so no overrides are set here

echo "✅ Setup complete!"
echo "👉 Activate the environment in your own shell first: source venv/bin/activate"
echo "🌐 Available dashboards:"
echo "   • streamlit run sales_dashboard.py"
echo "   • streamlit run scientific_explorer.py"
echo "   • streamlit run multipage_dashboard.py"
echo "   • streamlit run optimized_dashboard.py"
'''

with open(f"{dashboard_dir}/deploy.sh", "w") as f:
    f.write(deploy_script)

# Make script executable
os.chmod(f"{dashboard_dir}/deploy.sh", 0o755)

print("📦 Deployment Files Created:")
print("   ✅ .streamlit/config.toml - Streamlit configuration")
print("   ✅ requirements.txt - Python dependencies")
print("   ✅ deploy.sh - Deployment script")
print()

print("🌐 DEPLOYMENT OPTIONS:")
print("=" * 30)
print("1. 🖥️  Local Development:")
print("   streamlit run dashboard_name.py")
print()
print("2. ☁️  Streamlit Cloud:")
print("   • Push to GitHub repository")
print("   • Connect at share.streamlit.io")
print("   • Automatic deployment from main branch")
print()
print("3. 🐳 Docker Deployment:")
print("   • Create Dockerfile with Streamlit")
print("   • Deploy to cloud platforms (AWS, GCP, Azure)")
print()
print("4. 🚀 Advanced Hosting:")
print("   • Heroku with Procfile")
print("   • AWS EC2 with reverse proxy")
print("   • Kubernetes cluster deployment")

# Performance optimization summary
print("\n⚡ PERFORMANCE OPTIMIZATION SUMMARY:")
print("=" * 45)
optimization_tips = [
    "✅ Use @st.cache_data for data loading and transformations",
    "✅ Use @st.cache_resource for ML models and connections",
    "✅ Implement session state for user interactions",
    "✅ Use st.columns() for efficient layouts",
    "✅ Add loading spinners for better UX",
    "✅ Monitor performance with timing metrics",
    "✅ Clear cache periodically for memory management",
    "✅ Use lazy loading for large datasets",
    "✅ Optimize Plotly charts with sampling for large data",
    "✅ Configure page layout for better performance"
]

for tip in optimization_tips:
    print(f"   {tip}")

print("\n🎯 Ready for production-grade dashboard deployment!")

⚡ STREAMLIT PERFORMANCE OPTIMIZATION TECHNIQUES
📋 Caching Best Practices:
✅ Use @st.cache_data for data loading and transformations
✅ Use @st.cache_resource for ML models and database connections
✅ Use st.session_state for user interaction state
✅ Cache expensive computations (PCA, clustering, etc.)
❌ Don't cache Streamlit widgets or UI elements

⚡ Optimized Performance Dashboard Created!
   📁 File: /Users/sanjeevadodlapati/Downloads/Repos/DataVisualization-Comprehensive-Tutorial/outputs/dashboards/optimized_dashboard.py
   🚀 Features: Caching, session state, performance monitoring

🚀 DEPLOYMENT CONFIGURATIONS
📦 Deployment Files Created:
   ✅ .streamlit/config.toml - Streamlit configuration
   ✅ requirements.txt - Python dependencies
   ✅ deploy.sh - Deployment script

🌐 DEPLOYMENT OPTIONS:
1. 🖥️  Local Development:
   streamlit run dashboard_name.py

2. ☁️  Streamlit Cloud:
   • Push to GitHub repository
   • Connect at share.streamlit.io
   • Automatic deployment from main branch


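One of the tips printed above, sampling before plotting, is easy to sketch. The helper below (`downsample_evenly` is a hypothetical name, not part of Streamlit or Plotly) keeps evenly spaced points so that a chart receives at most `max_points` values. It works on plain sequences so the idea is visible without any library.

```python
import math


def downsample_evenly(values, max_points):
    """Return at most max_points evenly spaced items, always keeping the first."""
    if max_points <= 0:
        raise ValueError("max_points must be positive")
    n = len(values)
    if n <= max_points:
        return list(values)
    step = math.ceil(n / max_points)  # stride that keeps the result within budget
    return list(values[::step])


readings = list(range(10_000))  # hypothetical large series
plot_points = downsample_evenly(readings, 1_000)
print(len(readings), "->", len(plot_points), "points")
```

With a DataFrame, the same stride can be applied as `df.iloc[::step]`, or `df.sample(n=max_points, random_state=42)` when random sampling is acceptable. Stride sampling preserves order, which usually matters for time series.
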
## 🎉 Module 12 Complete: Interactive Dashboard Mastery

### 📋 What You've Accomplished

✅ **Streamlit Fundamentals**: Page configuration, layouts, widgets, and styling  
✅ **Interactive Components**: Filters, selectors, date pickers, and real-time updates  
✅ **Advanced Visualizations**: Plotly integration with dynamic chart updates  
✅ **Multi-Page Applications**: Navigation, session state, and user experience  
✅ **Performance Optimization**: Caching strategies, memory management, monitoring  
✅ **Deployment Ready**: Configuration files, requirements, deployment scripts  

### 🏆 Dashboard Portfolio Created

**4 Complete Interactive Dashboards:**
1. **Sales Analytics Dashboard** - KPIs, filters, time series analysis
2. **Scientific Data Explorer** - ML integration, PCA, clustering analysis  
3. **Multi-Page Dashboard** - Navigation, tabs, settings management
4. **Optimized Performance Dashboard** - Caching, large datasets, monitoring

### 💡 Key Skills Mastered

```python
# Essential Streamlit patterns you've learned:
STREAMLIT_SKILLS = {
    'Caching': '@st.cache_data, @st.cache_resource',
    'State Management': 'st.session_state for persistence',
    'Layout Control': 'st.columns(), st.container(), st.expander()',
    'Interactive Widgets': 'sliders, selectors, date pickers',
    'Performance': 'monitoring, optimization, memory management',
    'Deployment': 'configuration, requirements, cloud hosting'
}
```

### 🚀 Production Deployment Ready

Your dashboards are now equipped with:
- **Professional configuration** (config.toml)
- **Dependency management** (requirements.txt)  
- **Deployment automation** (deploy.sh)
- **Performance monitoring** and optimization
- **Scalable architecture** for large datasets
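
Before pushing to a host, a quick pre-flight check can confirm that these files are actually present. This is a minimal sketch: `missing_deployment_files` is a hypothetical helper, and the file list simply mirrors the files written earlier in this module.

```python
from pathlib import Path

# Files this module writes into the dashboard directory
EXPECTED_FILES = [".streamlit/config.toml", "requirements.txt", "deploy.sh"]


def missing_deployment_files(directory):
    """Return the expected deployment files that are absent from directory."""
    base = Path(directory)
    return [name for name in EXPECTED_FILES if not (base / name).is_file()]


missing = missing_deployment_files(".")  # assumes you run this from the dashboard directory
if missing:
    print("Missing files:", ", ".join(missing))
else:
    print("All deployment files found")
```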

### 📈 Tutorial Progress: 12/14 Complete (85.7%)

**🎯 Next Module**: Data Storytelling and Narrative Design - Learn to craft compelling data narratives that engage and persuade audiences!

**Ready to transform your interactive dashboards into powerful storytelling tools!** 🎬

In [5]:
# Module 12 Completion Summary
print("🎊 MODULE 12: INTERACTIVE DASHBOARDS & STREAMLIT COMPLETE! 🎊")
print("=" * 70)
print()

# Dashboards created
dashboards_created = [
    "📊 Sales Analytics Dashboard - Comprehensive business metrics with filters",
    "🔬 Scientific Data Explorer - ML-powered data analysis and visualization",
    "📚 Multi-Page Dashboard - Navigation, tabs, and settings management",
    "⚡ Optimized Performance Dashboard - Large dataset handling with caching"
]

print("🎯 INTERACTIVE DASHBOARDS CREATED:")
for dashboard in dashboards_created:
    print(f"   ✅ {dashboard}")

print()
print("⚡ ADVANCED FEATURES IMPLEMENTED:")
advanced_features = [
    "✅ Real-time data filtering and interactive controls",
    "✅ Advanced caching strategies (@st.cache_data, @st.cache_resource)",
    "✅ Session state management for user persistence", 
    "✅ Performance monitoring and optimization techniques",
    "✅ Multi-page navigation with seamless UX",
    "✅ Plotly integration for dynamic visualizations",
    "✅ Machine learning integration (PCA, clustering)",
    "✅ Production deployment configuration"
]

for feature in advanced_features:
    print(f"   {feature}")

print()
print("📁 FILES GENERATED:")
import os
dashboard_files = []
# dashboard_dir was defined when the dashboards were written earlier in this notebook
for file in os.listdir(dashboard_dir):
    if file.endswith(('.py', '.txt', '.sh')):
        file_size = os.path.getsize(os.path.join(dashboard_dir, file)) / 1024
        dashboard_files.append(f"📄 {file:<30} ({file_size:.1f} KB)")

for file_info in dashboard_files:
    print(f"   {file_info}")

print(f"\n   📁 Configuration: .streamlit/config.toml")

print()
print("🚀 DEPLOYMENT OPTIONS:")
deployment_options = [
    "🖥️  Local: streamlit run dashboard_name.py",
    "☁️  Streamlit Cloud: share.streamlit.io (GitHub integration)",
    "🐳 Docker: Containerized deployment for cloud platforms",
    "🌐 Production: Heroku, AWS, GCP, Azure hosting"
]

for option in deployment_options:
    print(f"   {option}")

print()
print("📈 TUTORIAL PROGRESS:")
print("   📚 Module 12/14 Complete (85.7% done)")
print("   🎯 Next: Module 13 - Data Storytelling and Narrative Design")
print()
print("🎪 You've mastered interactive dashboard development!")
print("   Next: Learn to craft compelling data stories that persuade and engage! 🎬")

# Performance summary
print(f"\n💾 TOTAL PROJECT SIZE:")
total_size = 0
# The outputs folder is the parent of the dashboard directory
for root, dirs, files in os.walk(os.path.dirname(dashboard_dir)):
    for file in files:
        total_size += os.path.getsize(os.path.join(root, file))

print(f"   📊 Generated content: {total_size / (1024*1024):.1f} MB")
print(f"   📄 Interactive dashboards: 4 complete applications")
print(f"   ⚡ Performance optimized for production deployment")

🎊 MODULE 12: INTERACTIVE DASHBOARDS & STREAMLIT COMPLETE! 🎊

🎯 INTERACTIVE DASHBOARDS CREATED:
   ✅ 📊 Sales Analytics Dashboard - Comprehensive business metrics with filters
   ✅ 🔬 Scientific Data Explorer - ML-powered data analysis and visualization
   ✅ 📚 Multi-Page Dashboard - Navigation, tabs, and settings management
   ✅ ⚡ Optimized Performance Dashboard - Large dataset handling with caching

⚡ ADVANCED FEATURES IMPLEMENTED:
   ✅ Real-time data filtering and interactive controls
   ✅ Advanced caching strategies (@st.cache_data, @st.cache_resource)
   ✅ Session state management for user persistence
   ✅ Performance monitoring and optimization techniques
   ✅ Multi-page navigation with seamless UX
   ✅ Plotly integration for dynamic visualizations
   ✅ Machine learning integration (PCA, clustering)
   ✅ Production deployment configuration

📁 FILES GENERATED:
   📄 scientific_explorer.py         (8.8 KB)
   📄 sales_dashboard.py             (7.3 KB)
   📄 requirements.txt               

## 📱 In-Notebook Interactive Dashboards

While Streamlit dashboards are great for web deployment, we can also create interactive dashboards that render directly in Jupyter notebooks using **ipywidgets** and **Plotly**. This keeps exploration and interactivity inside the notebook, with no separate server process to run.
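
Widget callbacks are easier to debug when the filtering logic lives in a plain function that the callback merely calls. The sketch below shows that separation using only the standard library. `filter_sales`, the record fields, and the sample rows are hypothetical stand-ins for the notebook's `sales_data`, not the code used in the next cell.

```python
from datetime import date, timedelta


def filter_sales(records, end_date, days_back, regions):
    """Keep records inside [end_date - days_back, end_date] for the given regions."""
    start_date = end_date - timedelta(days=days_back)
    allowed = set(regions)
    return [
        r for r in records
        if start_date <= r["date"] <= end_date and r["region"] in allowed
    ]


# Hypothetical sample rows
records = [
    {"date": date(2024, 1, 5), "region": "North", "sales": 120.0},
    {"date": date(2024, 3, 1), "region": "South", "sales": 80.0},
    {"date": date(2024, 3, 20), "region": "North", "sales": 200.0},
]

recent_north = filter_sales(records, end_date=date(2024, 3, 31),
                            days_back=30, regions=["North"])
print(sum(r["sales"] for r in recent_north))
```

A widget observer then only needs to read the current widget values and pass them in, so the same function can be exercised in a unit test without a running kernel.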

In [6]:
# Interactive Dashboard Components with ipywidgets
import ipywidgets as widgets
from IPython.display import display, clear_output
import plotly.graph_objects as go
from plotly.subplots import make_subplots

print("📱 Creating In-Notebook Interactive Dashboard...")

# Create interactive widgets
date_picker = widgets.DatePicker(
    value=sales_data['date'].max().date(),
    description='End Date:',
    style={'description_width': 'initial'}
)

region_selector = widgets.SelectMultiple(
    options=list(sales_data['region'].unique()),
    value=list(sales_data['region'].unique()),
    description='Regions:',
    style={'description_width': 'initial'},
    layout=widgets.Layout(height='100px')
)

product_selector = widgets.SelectMultiple(
    options=list(sales_data['product'].unique()),
    value=list(sales_data['product'].unique()),
    description='Products:',
    style={'description_width': 'initial'},
    layout=widgets.Layout(height='100px')
)

days_back_slider = widgets.IntSlider(
    value=90,
    min=30,
    max=365,
    step=30,
    description='Days Back:',
    style={'description_width': 'initial'},
    layout=widgets.Layout(width='400px')
)

chart_type_dropdown = widgets.Dropdown(
    options=['Line Chart', 'Bar Chart', 'Area Chart'],
    value='Line Chart',
    description='Chart Type:',
    style={'description_width': 'initial'}
)

# Create output widget for the dashboard
output_widget = widgets.Output()

def update_dashboard(*args):
    """Function to update dashboard when widgets change"""
    with output_widget:
        clear_output(wait=True)
        
        # Filter data based on widget selections
        end_date = pd.to_datetime(date_picker.value)
        start_date = end_date - pd.Timedelta(days=days_back_slider.value)
        
        filtered_data = sales_data[
            (sales_data['date'] >= start_date) &
            (sales_data['date'] <= end_date) &
            (sales_data['region'].isin(region_selector.value)) &
            (sales_data['product'].isin(product_selector.value))
        ]
        
        if len(filtered_data) == 0:
            print("⚠️ No data matches the selected filters")
            return
        
        # Create dashboard layout
        fig = make_subplots(
            rows=2, cols=2,
            subplot_titles=[
                'Sales Trend Over Time',
                'Sales by Region', 
                'Product Performance',
                'Daily Sales Distribution'
            ],
            specs=[[{"type": "xy"}, {"type": "domain"}],
                   [{"type": "xy"}, {"type": "xy"}]]
        )
        
        # 1. Time series chart (top left)
        daily_sales = filtered_data.groupby('date')['sales'].sum().reset_index()
        
        if chart_type_dropdown.value == 'Line Chart':
            fig.add_trace(
                go.Scatter(x=daily_sales['date'], y=daily_sales['sales'],
                          mode='lines+markers', name='Sales', line=dict(width=3)),
                row=1, col=1
            )
        elif chart_type_dropdown.value == 'Bar Chart':
            fig.add_trace(
                go.Bar(x=daily_sales['date'], y=daily_sales['sales'], name='Sales'),
                row=1, col=1
            )
        else:  # Area Chart
            fig.add_trace(
                go.Scatter(x=daily_sales['date'], y=daily_sales['sales'],
                          fill='tozeroy', name='Sales'),
                row=1, col=1
            )
        
        # 2. Regional pie chart (top right)
        regional_sales = filtered_data.groupby('region')['sales'].sum()
        fig.add_trace(
            go.Pie(labels=regional_sales.index, values=regional_sales.values, 
                   name="Regional Sales"),
            row=1, col=2
        )
        
        # 3. Product bar chart (bottom left)
        product_sales = filtered_data.groupby('product')['sales'].sum().sort_values(ascending=True)
        fig.add_trace(
            go.Bar(x=product_sales.values, y=product_sales.index, 
                   orientation='h', name='Product Sales'),
            row=2, col=1
        )
        
        # 4. Sales distribution histogram (bottom right)
        fig.add_trace(
            go.Histogram(x=filtered_data['sales'], nbinsx=20, 
                        name='Sales Distribution'),
            row=2, col=2
        )
        
        # Update layout
        fig.update_layout(
            height=800,
            title_text=f"Interactive Sales Dashboard ({len(filtered_data):,} records)",
            title_x=0.5,
            showlegend=False
        )
        
        # Display metrics
        total_sales = filtered_data['sales'].sum()
        avg_daily_sales = daily_sales['sales'].mean()
        max_daily_sales = daily_sales['sales'].max()
        
        print("📊 KEY METRICS")
        print("=" * 40)
        print(f"💰 Total Sales: ${total_sales:,.0f}")
        print(f"📈 Average Daily: ${avg_daily_sales:,.0f}")
        print(f"🔥 Peak Daily: ${max_daily_sales:,.0f}")
        print(f"📅 Date Range: {start_date.date()} to {end_date.date()}")
        print(f"📊 Records: {len(filtered_data):,}")
        print()
        
        # Show the interactive plot
        fig.show()

# Connect widgets to update function
date_picker.observe(update_dashboard, names='value')
region_selector.observe(update_dashboard, names='value')
product_selector.observe(update_dashboard, names='value')
days_back_slider.observe(update_dashboard, names='value')
chart_type_dropdown.observe(update_dashboard, names='value')

# Create dashboard layout
dashboard_title = widgets.HTML(
    value="<h2>📊 Interactive Sales Dashboard</h2><p>Use the controls below to filter and explore the data:</p>",
    layout=widgets.Layout(margin='0 0 20px 0')
)

controls_box = widgets.VBox([
    dashboard_title,
    widgets.HBox([
        widgets.VBox([date_picker, days_back_slider], layout=widgets.Layout(margin='0 20px 0 0')),
        widgets.VBox([region_selector], layout=widgets.Layout(margin='0 20px 0 0')),
        widgets.VBox([product_selector], layout=widgets.Layout(margin='0 20px 0 0')),
        widgets.VBox([chart_type_dropdown])
    ]),
    output_widget
], layout=widgets.Layout(border='2px solid #ddd', padding='20px', margin='10px'))

# Display the dashboard
display(controls_box)

# Initial load
update_dashboard()

print("✅ Interactive Dashboard Created!")
print("🎮 Use the controls above to filter and explore the data in real-time!")

📱 Creating In-Notebook Interactive Dashboard...


VBox(children=(HTML(value='<h2>📊 Interactive Sales Dashboard</h2><p>Use the controls below to filter and explo…

✅ Interactive Dashboard Created!
🎮 Use the controls above to filter and explore the data in real-time!


In [None]:
# Advanced Scientific Dashboard with ML Integration
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

print("\n🔬 Creating Advanced Scientific Dashboard...")

# Scientific dashboard widgets
dataset_dropdown = widgets.Dropdown(
    options=['Iris Dataset', 'Customer Analytics'],
    value='Iris Dataset',
    description='Dataset:',
    style={'description_width': 'initial'}
)

analysis_tabs = widgets.Tab()
analysis_tabs.children = [
    widgets.VBox([]),  # PCA tab
    widgets.VBox([]),  # Clustering tab  
    widgets.VBox([])   # Statistics tab
]
analysis_tabs.titles = ['PCA Analysis', 'Clustering', 'Statistics']

# PCA controls
pca_components_slider = widgets.IntSlider(
    value=2,
    min=2,
    max=4,
    description='Components:',
    style={'description_width': 'initial'}
)

# Clustering controls
n_clusters_slider = widgets.IntSlider(
    value=3,
    min=2,
    max=8,
    description='Clusters:',
    style={'description_width': 'initial'}
)

feature_selector = widgets.SelectMultiple(
    options=[],
    value=[],
    description='Features:',
    style={'description_width': 'initial'},
    layout=widgets.Layout(height='120px')
)

# Scientific output widget
scientific_output = widgets.Output()

def update_scientific_dashboard(*args):
    """Update scientific dashboard based on widget selections"""
    with scientific_output:
        clear_output(wait=True)
        
        # Select dataset
        if dataset_dropdown.value == 'Iris Dataset':
            data = iris_df.copy()
            numeric_cols = ['sepal length (cm)', 'sepal width (cm)', 
                          'petal length (cm)', 'petal width (cm)']
            target_col = 'species'
        else:  # Customer Analytics
            data = customer_data.copy()
            numeric_cols = ['age', 'income', 'spending', 'satisfaction']
            target_col = 'segment'
        
        # Update feature selector options
        if list(feature_selector.options) != numeric_cols:
            feature_selector.options = numeric_cols
            feature_selector.value = numeric_cols[:4] if len(numeric_cols) >= 4 else numeric_cols
        
        selected_features = list(feature_selector.value)
        if len(selected_features) < 2:
            print("⚠️ Please select at least 2 features for analysis")
            return
        
        # Prepare data
        X = data[selected_features]
        scaler = StandardScaler()
        X_scaled = scaler.fit_transform(X)
        
        # Create subplots
        fig = make_subplots(
            rows=2, cols=2,
            subplot_titles=[
                'PCA Visualization',
                'K-Means Clustering',
                'Feature Correlation',
                'Distribution Comparison'
            ],
            specs=[[{"type": "xy"}, {"type": "xy"}],
                   [{"type": "xy"}, {"type": "xy"}]]
        )
        
        # 1. PCA Analysis
        n_components = min(pca_components_slider.value, len(selected_features))
        pca = PCA(n_components=n_components)
        X_pca = pca.fit_transform(X_scaled)
        
        pca_df = pd.DataFrame(X_pca[:, :2], columns=['PC1', 'PC2'])
        pca_df[target_col] = data[target_col].values
        
        for category in pca_df[target_col].unique():
            mask = pca_df[target_col] == category
            fig.add_trace(
                go.Scatter(
                    x=pca_df.loc[mask, 'PC1'],
                    y=pca_df.loc[mask, 'PC2'],
                    mode='markers',
                    name=f'PCA - {category}',
                    legendgroup='pca'
                ),
                row=1, col=1
            )
        
        # 2. K-Means Clustering
        kmeans = KMeans(n_clusters=n_clusters_slider.value, n_init=10, random_state=42)
        clusters = kmeans.fit_predict(X_scaled)
        
        cluster_df = pd.DataFrame(X_pca[:, :2], columns=['PC1', 'PC2'])
        cluster_df['cluster'] = [f'Cluster {i}' for i in clusters]
        
        for cluster in cluster_df['cluster'].unique():
            mask = cluster_df['cluster'] == cluster
            fig.add_trace(
                go.Scatter(
                    x=cluster_df.loc[mask, 'PC1'],
                    y=cluster_df.loc[mask, 'PC2'],
                    mode='markers',
                    name=cluster,
                    legendgroup='clustering'
                ),
                row=1, col=2
            )
        
        # Add cluster centers (they are already in scaled space, which is what the PCA was fit on)
        centers_pca = pca.transform(kmeans.cluster_centers_)
        fig.add_trace(
            go.Scatter(
                x=centers_pca[:, 0],
                y=centers_pca[:, 1],
                mode='markers',
                marker=dict(symbol='x', size=15, color='black'),
                name='Centers',
                legendgroup='clustering'
            ),
            row=1, col=2
        )
        
        # 3. Feature Correlation Heatmap
        if len(selected_features) >= 2:
            corr_matrix = X.corr()
            fig.add_trace(
                go.Heatmap(
                    z=corr_matrix.values,
                    x=corr_matrix.columns,
                    y=corr_matrix.columns,
                    colorscale='RdBu',
                    zmid=0,
                    showscale=True
                ),
                row=2, col=1
            )
        
        # 4. Distribution comparison (box plots)
        if len(selected_features) >= 1:
            feature_to_plot = selected_features[0]
            for category in data[target_col].unique():
                category_data = data[data[target_col] == category][feature_to_plot]
                fig.add_trace(
                    go.Box(
                        y=category_data,
                        name=category,
                        legendgroup='distributions'
                    ),
                    row=2, col=2
                )
        
        # Update layout
        fig.update_layout(
            height=800,
            title_text=f"Scientific Data Analysis: {dataset_dropdown.value}",
            title_x=0.5
        )
        
        # Display analysis metrics
        print("🔬 ANALYSIS RESULTS")
        print("=" * 50)
        print(f"📊 Dataset: {dataset_dropdown.value}")
        print(f"🔍 Features Analyzed: {len(selected_features)}")
        print(f"📈 PCA Components: {n_components}")
        print(f"🎯 Clusters: {n_clusters_slider.value}")
        print()
        
        print("📊 PCA Explained Variance:")
        for i, var_ratio in enumerate(pca.explained_variance_ratio_):
            print(f"   PC{i+1}: {var_ratio:.3f} ({var_ratio*100:.1f}%)")
        
        print(f"   Cumulative: {pca.explained_variance_ratio_.sum():.3f} ({pca.explained_variance_ratio_.sum()*100:.1f}%)")
        print()
        
        print("🎯 Clustering Quality:")
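        # Silhouette score is only defined for 2 or more clusters (and fewer clusters than samples)
        from sklearn.metrics import silhouette_score
        if 1 < len(set(clusters)) < len(X_scaled):
            silhouette_avg = f"{silhouette_score(X_scaled, clusters):.3f}"
        else:
            silhouette_avg = "N/A"
        print(f"   Silhouette Score: {silhouette_avg}")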
        print(f"   Inertia: {kmeans.inertia_:.2f}")
        print(f"   Iterations: {kmeans.n_iter_}")
        print()
        
        # Show the plot
        fig.show()

# Connect scientific widgets
dataset_dropdown.observe(update_scientific_dashboard, names='value')
pca_components_slider.observe(update_scientific_dashboard, names='value')
n_clusters_slider.observe(update_scientific_dashboard, names='value')
feature_selector.observe(update_scientific_dashboard, names='value')

# Scientific dashboard layout
scientific_title = widgets.HTML(
    value="<h2>🔬 Scientific Data Analysis Dashboard</h2><p>Explore datasets with PCA and clustering analysis:</p>",
    layout=widgets.Layout(margin='0 0 20px 0')
)

scientific_controls = widgets.VBox([
    scientific_title,
    widgets.HBox([
        widgets.VBox([dataset_dropdown, pca_components_slider], 
                    layout=widgets.Layout(margin='0 20px 0 0')),
        widgets.VBox([n_clusters_slider], 
                    layout=widgets.Layout(margin='0 20px 0 0')),
        widgets.VBox([feature_selector])
    ]),
    scientific_output
], layout=widgets.Layout(border='2px solid #ddd', padding='20px', margin='20px 0'))

# Display scientific dashboard
display(scientific_controls)

# Initial load for scientific dashboard
update_scientific_dashboard()

print("✅ Scientific Dashboard Created!")
print("🧪 Explore PCA, clustering, and correlation analysis interactively!")


🔬 Creating Advanced Scientific Dashboard...


✅ Scientific Dashboard Created!
🧪 Explore PCA, clustering, and correlation analysis interactively!


## 🎯 Dashboard Approaches Comparison

We've covered both **file-based dashboards** (Streamlit) and **notebook-based dashboards** (ipywidgets):

### 🌐 Streamlit Dashboards (File-Based)
**Best for:** Production deployment, sharing with stakeholders, web applications
- ✅ Professional web interface
- ✅ Easy deployment to cloud platforms
- ✅ Multi-page applications
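- ✅ Built-in caching for faster reruns (`st.cache_data` and `st.cache_resource` in recent Streamlit versions)
- ✅ Per-user session state via `st.session_state` (user authentication usually needs additional setup)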

### 📓 ipywidgets Dashboards (Notebook-Based)
**Best for:** Exploratory analysis, research, interactive presentations
- ✅ Immediate rendering in notebooks
- ✅ No separate files needed
- ✅ Perfect for Jupyter environments
- ✅ Great for prototyping and exploration
- ✅ Integrates seamlessly with analysis workflow

### 🎮 Interactive Features Available:
- **Real-time filtering** (dates, categories, numerical ranges)
- **Dynamic chart types** (scatter, line, bar, heatmap)
- **Machine learning integration** (PCA, clustering)
- **Statistical analysis** (correlations, distributions)
- **Multi-dataset support** (easy switching between datasets)
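
The filtering features listed above can share one small helper that either a Streamlit script or an ipywidgets callback could call. The following is a minimal sketch, not code from the dashboards in this module: the function name `filter_dashboard_data`, the column names `date`, `category`, and `value`, and the sample rows are all hypothetical.

```python
import pandas as pd


def filter_dashboard_data(df, date_range=None, categories=None, value_range=None,
                          date_col="date", category_col="category", value_col="value"):
    """Return the rows of df that pass every active filter (None means 'no filter')."""
    mask = pd.Series(True, index=df.index)
    if date_range is not None:
        start, end = pd.Timestamp(date_range[0]), pd.Timestamp(date_range[1])
        mask &= df[date_col].between(start, end)  # inclusive on both ends
    if categories:
        mask &= df[category_col].isin(categories)
    if value_range is not None:
        low, high = value_range
        mask &= df[value_col].between(low, high)
    return df[mask]


# Hypothetical sample data, for illustration only
sample = pd.DataFrame({
    "date": pd.to_datetime(["2024-01-05", "2024-01-20", "2024-02-10", "2024-03-01"]),
    "category": ["A", "B", "A", "C"],
    "value": [10.0, 25.0, 40.0, 55.0],
})

filtered = filter_dashboard_data(
    sample,
    date_range=("2024-01-01", "2024-02-28"),
    categories=["A", "B"],
    value_range=(20, 50),
)
print(filtered)
```

Keeping the filter logic free of any widget code means the same function can be reused in both dashboard styles and tested on its own.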

## 🚀 Next Steps & Advanced Topics

### 📈 Performance Optimization
- Implement data sampling for large datasets
- Use progressive loading for complex visualizations
- Optimize memory usage with chunked processing
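
As a rough illustration of the first and third points, the sketch below downsamples a DataFrame before plotting and computes a column mean from a CSV source one chunk at a time. The helper names `sample_for_plot` and `chunked_column_mean`, the thresholds, and the synthetic data are assumptions made for this example; `DataFrame.sample` and the `chunksize` argument of `pd.read_csv` are standard pandas features.

```python
import io

import numpy as np
import pandas as pd


def sample_for_plot(df, max_points=5_000, random_state=42):
    """Return df unchanged if it is small, otherwise a reproducible random sample."""
    if len(df) <= max_points:
        return df
    return df.sample(n=max_points, random_state=random_state)


def chunked_column_mean(csv_source, column, chunksize=100_000):
    """Compute the mean of one CSV column without loading the whole file at once."""
    total, count = 0.0, 0
    for chunk in pd.read_csv(csv_source, usecols=[column], chunksize=chunksize):
        total += chunk[column].sum()      # sum() skips missing values
        count += chunk[column].count()    # count() counts only non-missing values
    return float(total / count) if count else float("nan")


# Synthetic data, for illustration only
rng = np.random.default_rng(0)
big = pd.DataFrame({"x": rng.normal(size=50_000), "y": rng.normal(size=50_000)})
plot_df = sample_for_plot(big, max_points=2_000)
print(f"Plotting {len(plot_df)} of {len(big)} rows")

# A tiny in-memory CSV stands in for a large file on disk
csv_text = "x\n" + "\n".join(str(i) for i in range(10))
mean_x = chunked_column_mean(io.StringIO(csv_text), "x", chunksize=4)
print(f"Chunked mean: {mean_x}")
```

Random sampling works well for scatter plots of overall structure, but it can hide rare outliers, so aggregated views (histograms, binned means) may be a better fit when those matter.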

### 🎨 Advanced Styling
- Custom CSS for Streamlit applications
- Professional dashboard themes
- Responsive design for mobile devices

### 🔗 Integration Opportunities
- Database connections (SQL, MongoDB)
- API integrations for live data
- Cloud storage integration (AWS S3, Google Cloud)
- Version control for dashboard configurations
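
A database connection can be tried out without any server by using SQLite from the standard library. This is a minimal sketch under stated assumptions: the `sales` table, its columns, and the `load_sales` helper are hypothetical, and a production dashboard would point at its own database and schema.

```python
import sqlite3

import pandas as pd

# In-memory database with a hypothetical table, for illustration only
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("North", 120.0), ("South", 80.0), ("North", 30.0)],
)
conn.commit()


def load_sales(connection, min_amount=0.0):
    """Load per-region totals, passing the filter value as a bound query parameter."""
    query = (
        "SELECT region, SUM(amount) AS total "
        "FROM sales WHERE amount >= ? "
        "GROUP BY region ORDER BY region"
    )
    return pd.read_sql_query(query, connection, params=(min_amount,))


totals = load_sales(conn, min_amount=50.0)
print(totals)
conn.close()
```

Passing user-controlled values through `params` rather than string formatting avoids SQL injection, which matters once widget input feeds the query.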

### 📊 Enterprise Features
- User authentication and role-based access
- Audit logging and usage analytics
- Automated report generation
- Email notifications and alerts
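
Role-based access and audit logging can be prototyped with the standard library before choosing a real authentication provider. The sketch below is illustrative only: the role table `ROLE_PERMISSIONS`, the `requires_permission` decorator, the `export_report` function, and the user dictionaries are all hypothetical and do not correspond to any Streamlit or ipywidgets API.

```python
import logging
from functools import wraps

logging.basicConfig(level=logging.INFO)
audit_logger = logging.getLogger("dashboard.audit")

# Hypothetical mapping from role to permitted actions
ROLE_PERMISSIONS = {
    "viewer": {"view"},
    "analyst": {"view", "export"},
    "admin": {"view", "export", "configure"},
}


def requires_permission(permission):
    """Decorator that logs every call and blocks users whose role lacks the permission."""
    def decorator(func):
        @wraps(func)
        def wrapper(user, *args, **kwargs):
            allowed = permission in ROLE_PERMISSIONS.get(user["role"], set())
            audit_logger.info(
                "user=%s action=%s allowed=%s", user["name"], func.__name__, allowed
            )
            if not allowed:
                raise PermissionError(f"{user['name']} may not {permission}")
            return func(user, *args, **kwargs)
        return wrapper
    return decorator


@requires_permission("export")
def export_report(user, report_name):
    return f"{report_name}.csv exported for {user['name']}"


print(export_report({"name": "alice", "role": "analyst"}, "q1_sales"))
```

Calling `export_report` with a `viewer` user raises `PermissionError`, and both the allowed and the denied attempt are written to the audit log.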

## 🎓 Module 12 Complete!

You now have the skills to create:
- ✅ Professional Streamlit web dashboards
- ✅ Interactive notebook-based dashboards
- ✅ Scientific analysis tools with ML integration
- ✅ Performance-optimized applications
- ✅ Deployment-ready configurations

**Ready for Module 13: Data Storytelling & Narrative Design! 📖**