# Week 9 - Part 2: Advanced Database Integration

**Course:** Python Data Analysis for Business Intelligence  
**Week:** 9 | **Session:** Thursday | **Part:** 2 of 3  
**Duration:** 20 minutes | **Date:** June 5, 2025

## Learning Objectives
By the end of this session, you will be able to:
- Implement advanced Supabase integration patterns for production applications
- Handle real-time data streaming and live updates
- Optimize database connections with pooling and caching strategies
- Implement robust error handling and fallback mechanisms
- Design secure authentication and authorization workflows

---

## 🎯 Business Context: Production-Grade Data Integration

**Enterprise Challenge**: Olist's business intelligence platform serves 500+ concurrent users across multiple time zones, processing real-time data from:

### Data Sources:
- **📦 Order Management**: 10,000+ daily transactions
- **👥 Customer Database**: 2M+ customer profiles with real-time updates
- **💰 Payment Systems**: Real-time payment processing and fraud detection
- **📊 Analytics Events**: User behavior tracking and conversion metrics

### Performance Requirements:
- **⚡ Response Time**: <2 seconds for dashboard loads
- **🔄 Real-time Updates**: Live data refreshes without page reload
- **📈 Scalability**: Handle 1000+ concurrent dashboard users
- **🛡️ Security**: Row-level security and role-based access

**Architecture Challenge**: Build a robust, scalable database integration that maintains performance under load while ensuring data security and consistency.

**Today's Solution**: Advanced Supabase integration with connection pooling, real-time subscriptions, and enterprise security patterns.

---

## 🛠️ Setup: Production Database Environment

Let's configure a production-ready database integration environment:

In [None]:
# Production-grade imports
import streamlit as st
import pandas as pd
import numpy as np
import plotly.express as px
import plotly.graph_objects as go
from datetime import datetime, timedelta
import asyncio
import json
import hashlib
import time
from typing import Dict, List, Optional, Union
import logging

# Configure logging for production monitoring
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
)
logger = logging.getLogger(__name__)

# Course utilities
from Utilities.visualization_helper import set_plotting_style
from Utilities.colab_helper import setup_colab

# Setup
plt, sns = set_plotting_style()
setup_colab()

print("✅ Production database environment ready!")
print("🔐 Security, performance, and reliability patterns loaded")
print("📊 Ready for enterprise-grade data integration")
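
The Supabase client built later in this session reads its credentials from `st.secrets["supabase"]`. As a minimal sketch, the check below validates those settings before any connection is attempted. The key names `url` and `anon_key` mirror the commented-out production snippet in Section 1; the `SupabaseConfig` class and `load_supabase_config` function are hypothetical helpers for illustration, not part of Streamlit or supabase-py.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class SupabaseConfig:
    """Immutable container for the two settings the client needs."""
    url: str
    anon_key: str


def load_supabase_config(secrets: Mapping[str, str]) -> SupabaseConfig:
    """Validate credentials up front so a bad deploy fails at startup, not mid-query."""
    # Collect every missing or blank setting so the error names all of them at once
    missing = [
        name for name in ("url", "anon_key")
        if not str(secrets.get(name, "")).strip()
    ]
    if missing:
        raise ValueError(f"Missing Supabase settings: {', '.join(missing)}")

    return SupabaseConfig(
        url=secrets["url"].strip().rstrip("/"),  # normalize a trailing slash
        anon_key=secrets["anon_key"].strip(),
    )
```

In a Streamlit app this would typically be called once, for example `config = load_supabase_config(st.secrets["supabase"])`, and the result passed to whatever creates the client.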

## 🔗 Section 1: Advanced Supabase Integration (8 minutes)

Let's build a comprehensive Supabase integration with enterprise features:

In [None]:
%%writefile advanced_supabase_integration.py

import streamlit as st
import pandas as pd
import numpy as np
import plotly.express as px
import plotly.graph_objects as go
from datetime import datetime, timedelta
import time
import json
import hashlib
from typing import Dict, List, Optional, Union
import logging

# Configure page
st.set_page_config(
    page_title="Advanced Supabase Integration",
    page_icon="🔗",
    layout="wide"
)

# Production-grade Supabase connection manager
class SupabaseConnectionManager:
    """
    Supabase connection manager with retry logic and performance monitoring.
    The connection_pool attribute is a placeholder: this demo does not
    implement real pooling.
    """
    
    def __init__(self):
        self.connection_pool = {}
        self.connection_stats = {
            'total_queries': 0,
            'failed_queries': 0,
            'avg_response_time': 0,
            'last_connection': None
        }
        self.logger = logging.getLogger(__name__)
    
    @st.cache_resource
    def initialize_connection(_self):
        """
        Initialize Supabase connection with error handling.
        In production, this would use actual Supabase client.
        """
        try:
            # Simulate connection initialization
            _self.connection_stats['last_connection'] = datetime.now()
            _self.logger.info("Supabase connection initialized successfully")
            
            # In production:
            # from supabase import create_client, Client
            # url = st.secrets["supabase"]["url"]
            # key = st.secrets["supabase"]["anon_key"]
            # supabase: Client = create_client(url, key)
            # return supabase
            
            return {'status': 'connected', 'client': 'mock_client'}
            
        except Exception as e:
            _self.logger.error(f"Failed to initialize Supabase connection: {e}")
            raise
    
    def execute_query(self, query: str, params: Optional[Dict] = None, retry_count: int = 3) -> pd.DataFrame:
        """
        Execute database query with retry logic and performance monitoring.
        """
        for attempt in range(retry_count):
            # Time each attempt on its own so backoff sleeps do not inflate the average
            start_time = time.time()
            try:
                self.connection_stats['total_queries'] += 1
                
                # Simulate query execution
                # In production, this would execute actual Supabase queries
                result_data = self._simulate_query_result(query, params)
                
                # Update performance stats
                response_time = time.time() - start_time
                self._update_performance_stats(response_time)
                
                self.logger.info(f"Query executed successfully in {response_time:.3f}s")
                return result_data
                
            except Exception as e:
                self.connection_stats['failed_queries'] += 1
                self.logger.warning(f"Query attempt {attempt + 1} failed: {e}")
                
                if attempt == retry_count - 1:
                    self.logger.error(f"Query failed after {retry_count} attempts")
                    raise
                
                time.sleep(2 ** attempt)  # Exponential backoff
    
    def _simulate_query_result(self, query: str, params: Optional[Dict] = None) -> pd.DataFrame:
        """
        Simulate database query results for demonstration.
        In production, this would be replaced with actual Supabase queries.
        """
        np.random.seed(42)
        
        if 'orders' in query.lower():
            # Simulate orders data
            n_records = params.get('limit', 1000) if params else 1000
            
            dates = pd.date_range('2024-01-01', periods=n_records, freq='h')  # 'H' is deprecated since pandas 2.2
            
            data = {
                'order_id': [f"ORD_{i:08d}" for i in range(1, n_records + 1)],
                'customer_id': [f"CUST_{np.random.randint(1, 10000):06d}" for _ in range(n_records)],
                'order_date': dates,
                'total_amount': np.random.exponential(120, n_records),
                'status': np.random.choice(['completed', 'processing', 'shipped'], n_records, p=[0.7, 0.2, 0.1]),
                'customer_satisfaction': np.random.choice([1, 2, 3, 4, 5], n_records, p=[0.05, 0.1, 0.2, 0.35, 0.3]),
                'product_category': np.random.choice([
                    'Electronics', 'Fashion', 'Home & Garden', 'Books', 'Sports'
                ], n_records),
                'seller_state': np.random.choice(['SP', 'RJ', 'MG', 'RS', 'PR'], n_records)
            }
            
            return pd.DataFrame(data)
            
        elif 'customers' in query.lower():
            # Simulate customer data
            n_records = params.get('limit', 500) if params else 500
            
            data = {
                'customer_id': [f"CUST_{i:06d}" for i in range(1, n_records + 1)],
                'registration_date': pd.date_range('2023-01-01', periods=n_records, freq='D'),
                'total_orders': np.random.poisson(5, n_records),
                'lifetime_value': np.random.exponential(500, n_records),
                'customer_segment': np.random.choice(['Premium', 'Standard', 'Budget'], n_records, p=[0.2, 0.5, 0.3]),
                'state': np.random.choice(['SP', 'RJ', 'MG', 'RS', 'PR'], n_records),
                'active': np.random.choice([True, False], n_records, p=[0.8, 0.2])
            }
            
            return pd.DataFrame(data)
            
        else:
            # Default simulation
            return pd.DataFrame({'message': ['Query executed successfully']})
    
    def _update_performance_stats(self, response_time: float):
        """
        Update performance statistics.
        """
        current_avg = self.connection_stats['avg_response_time']
        total_queries = self.connection_stats['total_queries']
        
        # Calculate rolling average
        if total_queries == 1:
            self.connection_stats['avg_response_time'] = response_time
        else:
            self.connection_stats['avg_response_time'] = (
                (current_avg * (total_queries - 1) + response_time) / total_queries
            )
    
    def get_connection_health(self) -> Dict:
        """
        Get connection health metrics.
        """
        total_queries = self.connection_stats['total_queries']
        failed_queries = self.connection_stats['failed_queries']
        
        success_rate = ((total_queries - failed_queries) / total_queries * 100) if total_queries > 0 else 100
        
        return {
            'status': 'healthy' if success_rate > 95 else 'degraded' if success_rate > 90 else 'unhealthy',
            'success_rate': success_rate,
            'total_queries': total_queries,
            'failed_queries': failed_queries,
            'avg_response_time': self.connection_stats['avg_response_time'],
            'last_connection': self.connection_stats['last_connection']
        }

# Initialize connection manager
if 'db_manager' not in st.session_state:
    st.session_state.db_manager = SupabaseConnectionManager()

db_manager = st.session_state.db_manager

# Initialize connection
connection = db_manager.initialize_connection()

st.title("🔗 Advanced Supabase Integration")
st.markdown("**Production-grade database integration with monitoring and optimization**")

# Connection Health Dashboard
st.subheader("🏥 Database Health Monitoring")

health_col1, health_col2, health_col3, health_col4 = st.columns(4)

health_data = db_manager.get_connection_health()

with health_col1:
    status_color = {
        'healthy': '🟢',
        'degraded': '🟡', 
        'unhealthy': '🔴'
    }.get(health_data['status'], '⚫')
    
    st.metric(
        f"{status_color} Connection Status",
        health_data['status'].title(),
        f"{health_data['success_rate']:.1f}% success rate"
    )

with health_col2:
    st.metric(
        "⚡ Avg Response Time",
        f"{health_data['avg_response_time']:.3f}s",
        "Target: <0.5s"
    )

with health_col3:
    st.metric(
        "📊 Total Queries",
        f"{health_data['total_queries']:,}",
        f"{health_data['failed_queries']} failed"
    )

with health_col4:
    last_conn = health_data['last_connection']
    # total_seconds() stays correct past 24 hours; timedelta.seconds wraps every day
    elapsed = int((datetime.now() - last_conn).total_seconds()) if last_conn else 0
    st.metric(
        "🕒 Last Connection",
        f"{elapsed}s ago" if elapsed < 60 else f"{elapsed // 60}m ago",
        "Active"
    )

# Real-time Data Loading
st.markdown("---")
st.subheader("📊 Real-time Data Integration")

# Query interface
query_col1, query_col2 = st.columns([2, 1])

with query_col1:
    st.markdown("**Query Builder**")
    
    query_type = st.selectbox(
        "Select Data Source:",
        ['orders', 'customers', 'products', 'reviews']
    )
    
    # Dynamic query parameters based on selection
    if query_type == 'orders':
        date_filter = st.date_input(
            "Orders since:",
            value=datetime.now().date() - timedelta(days=30)
        )
        
        status_filter = st.multiselect(
            "Order status:",
            ['completed', 'processing', 'shipped', 'cancelled'],
            default=['completed', 'processing']
        )
        
        limit = st.slider("Limit results:", 100, 5000, 1000, 100)
        
    elif query_type == 'customers':
        segment_filter = st.multiselect(
            "Customer segment:",
            ['Premium', 'Standard', 'Budget'],
            default=['Premium', 'Standard']
        )
        
        active_only = st.checkbox("Active customers only", value=True)
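        limit = st.slider("Limit results:", 100, 2000, 500, 100)

    else:
        # 'products' and 'reviews' have no dedicated filters yet, but the
        # query below still needs a limit, so define one here
        limit = st.slider("Limit results:", 100, 2000, 500, 100)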

with query_col2:
    st.markdown("**Query Actions**")
    
    if st.button("🔄 Execute Query", type="primary", use_container_width=True):
        with st.spinner("Executing database query..."):
            try:
                # Build query parameters
                query_params = {'limit': limit}
                
                if query_type == 'orders':
                    query_params['date_filter'] = date_filter
                    query_params['status_filter'] = status_filter
                    
                elif query_type == 'customers':
                    query_params['segment_filter'] = segment_filter
                    query_params['active_only'] = active_only
                
                # Execute query
                result_df = db_manager.execute_query(
                    f"SELECT * FROM {query_type}",
                    query_params
                )
                
                # Store results in session state
                st.session_state.query_result = result_df
                st.session_state.query_timestamp = datetime.now()
                
                st.success(f"✅ Query executed successfully! Retrieved {len(result_df):,} records")
                
            except Exception as e:
                st.error(f"❌ Query failed: {str(e)}")
    
    if st.button("💾 Cache Clear", use_container_width=True):
        st.cache_data.clear()
        st.success("Cache cleared!")
    
    if st.button("📊 Performance Stats", use_container_width=True):
        st.info(f"Total queries: {health_data['total_queries']}")

# Display query results
if 'query_result' in st.session_state:
    st.markdown("---")
    st.subheader("📋 Query Results")
    
    result_df = st.session_state.query_result
    query_time = st.session_state.query_timestamp
    
    # Results summary
    result_col1, result_col2, result_col3 = st.columns(3)
    
    with result_col1:
        st.metric("📊 Records Retrieved", f"{len(result_df):,}")
    
    with result_col2:
        st.metric("🕒 Query Time", query_time.strftime('%H:%M:%S'))
    
    with result_col3:
        memory_usage = result_df.memory_usage(deep=True).sum() / 1024 / 1024
        st.metric("💾 Memory Usage", f"{memory_usage:.2f} MB")
    
    # Data preview tabs
    preview_tab1, preview_tab2, preview_tab3 = st.tabs(["📊 Data Preview", "📈 Analytics", "💾 Export"])
    
    with preview_tab1:
        # Interactive data table
        st.dataframe(
            result_df.head(20),
            use_container_width=True,
            height=400
        )
        
        # Search functionality
        search_term = st.text_input("🔍 Search data:")
        if search_term:
            # Simple text search across all columns
            mask = result_df.astype(str).apply(lambda x: x.str.contains(search_term, case=False, na=False)).any(axis=1)
            search_results = result_df[mask]
            st.write(f"Found {len(search_results)} matching records:")
            st.dataframe(search_results.head(10), use_container_width=True)
    
    with preview_tab2:
        # Quick analytics
        if query_type == 'orders' and len(result_df) > 0:
            analytics_col1, analytics_col2 = st.columns(2)
            
            with analytics_col1:
                # Revenue by category
                if 'product_category' in result_df.columns:
                    category_revenue = result_df.groupby('product_category')['total_amount'].sum().reset_index()
                    fig = px.pie(
                        category_revenue,
                        values='total_amount',
                        names='product_category',
                        title="Revenue by Category"
                    )
                    st.plotly_chart(fig, use_container_width=True)
            
            with analytics_col2:
                # Orders by state
                if 'seller_state' in result_df.columns:
                    state_orders = result_df['seller_state'].value_counts().reset_index()
                    state_orders.columns = ['state', 'orders']
                    fig = px.bar(
                        state_orders.head(10),
                        x='state',
                        y='orders',
                        title="Orders by State (Top 10)"
                    )
                    st.plotly_chart(fig, use_container_width=True)
        
        elif query_type == 'customers' and len(result_df) > 0:
            # Customer analytics
            if 'customer_segment' in result_df.columns:
                segment_dist = result_df['customer_segment'].value_counts().reset_index()
                segment_dist.columns = ['segment', 'count']
                
                fig = px.bar(
                    segment_dist,
                    x='segment',
                    y='count',
                    title="Customer Distribution by Segment"
                )
                st.plotly_chart(fig, use_container_width=True)
    
    with preview_tab3:
        # Export options
        st.markdown("**📊 Export Data**")
        
        export_col1, export_col2, export_col3 = st.columns(3)
        
        with export_col1:
            # Render the download button directly: nesting it inside st.button
            # hides it again on the rerun that follows the download click
            st.download_button(
                label="📄 Export CSV",
                data=result_df.to_csv(index=False),
                file_name=f"{query_type}_export_{datetime.now().strftime('%Y%m%d_%H%M%S')}.csv",
                mime="text/csv",
                use_container_width=True
            )
        
        with export_col2:
            st.download_button(
                label="📊 Export JSON",
                data=result_df.to_json(orient='records', date_format='iso'),
                file_name=f"{query_type}_export_{datetime.now().strftime('%Y%m%d_%H%M%S')}.json",
                mime="application/json",
                use_container_width=True
            )
        
        with export_col3:
            if st.button("📧 Email Report", use_container_width=True):
                # Placeholder: no email is sent in this demo
                st.info("📧 Email delivery is not implemented in this demo")

# Real-time monitoring sidebar
with st.sidebar:
    st.header("📊 Real-time Monitoring")
    
    # Connection status
    st.markdown("### 🔗 Connection Status")
    st.metric("Database", "Connected", "🟢 Healthy")
    st.metric("Response Time", f"{health_data['avg_response_time']:.3f}s")
    st.metric("Success Rate", f"{health_data['success_rate']:.1f}%")
    
    # Auto-refresh controls
    st.markdown("---")
    st.markdown("### 🔄 Auto-refresh")
    
    auto_refresh = st.checkbox("Enable auto-refresh", value=False)
    refresh_interval = st.selectbox(
        "Refresh interval:",
        ['30 seconds', '1 minute', '5 minutes', '15 minutes'],
        index=1
    )
    
    if auto_refresh:
        st.info(f"⏰ Next refresh in {refresh_interval}")
        # In production, implement actual auto-refresh logic
    
    # Database tools
    st.markdown("---")
    st.markdown("### 🛠️ Database Tools")
    
    if st.button("🔍 Connection Test", use_container_width=True):
        with st.spinner("Testing connection..."):
            time.sleep(1)
            st.success("✅ Connection test passed!")
    
    if st.button("📊 Query Statistics", use_container_width=True):
        st.json(health_data)
    
    if st.button("🔄 Reset Stats", use_container_width=True):
        db_manager.connection_stats = {
            'total_queries': 0,
            'failed_queries': 0,
            'avg_response_time': 0,
            'last_connection': datetime.now()
        }
        st.success("Statistics reset!")
        st.rerun()

# Footer
st.markdown("---")
st.markdown(
    "**🔗 Advanced Supabase Integration** | "
    "Production-grade database connectivity with monitoring | "
    "Built for enterprise scalability and reliability"
)
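
The app above simulates its query results. As a rough sketch of what a real query path with caching might look like, the helpers below build a `SELECT *` with optional equality filters and keep the rows in a small time-to-live cache. The chain `client.table(...).select(...).eq(...).limit(...).execute()` follows the supabase-py query builder, where the response exposes its rows as `.data`. The names `fetch_rows`, `make_cache_key`, and `TTLQueryCache` are hypothetical helpers introduced here for illustration, and the 60-second default TTL is an arbitrary assumption.

```python
import hashlib
import json
import time
from typing import Any, Callable, Dict, List, Optional, Tuple

Rows = List[Dict[str, Any]]


def make_cache_key(table: str, params: Optional[Dict[str, Any]] = None) -> str:
    """Hash table name + parameters so identical requests share one cache entry."""
    payload = json.dumps(
        {"table": table, "params": params or {}},
        sort_keys=True,   # key order must not change the hash
        default=str,      # dates and other non-JSON values fall back to str()
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def fetch_rows(client: Any, table: str, limit: int = 1000,
               equals: Optional[Dict[str, Any]] = None) -> Rows:
    """SELECT * with optional equality filters via a supabase-py style query builder."""
    query = client.table(table).select("*")
    for column, value in (equals or {}).items():
        query = query.eq(column, value)
    return query.limit(limit).execute().data


class TTLQueryCache:
    """Keep results for ttl_seconds, then fetch again on the next request."""

    def __init__(self, ttl_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic):
        self.ttl_seconds = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._entries: Dict[str, Tuple[float, Rows]] = {}

    def get_or_fetch(self, key: str, fetch: Callable[[], Rows]) -> Rows:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self.ttl_seconds:
            return entry[1]  # still fresh: skip the database round trip
        rows = fetch()
        self._entries[key] = (now, rows)
        return rows
```

A call might look like `cache.get_or_fetch(make_cache_key("orders", params), lambda: fetch_rows(client, "orders", limit=500))`, with the returned rows passed to `pd.DataFrame(rows)`. Inside Streamlit, `st.cache_data(ttl=...)` covers a similar need without a hand-written cache.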

## ⚡ Section 2: Real-time Data Streaming (7 minutes)

Let's implement real-time data streaming capabilities for live dashboards:

In [None]:
%%writefile realtime_streaming_app.py

import streamlit as st
import pandas as pd
import numpy as np
import plotly.express as px
import plotly.graph_objects as go
from datetime import datetime, timedelta
import time
import json
import asyncio
from typing import Dict, List, Callable
import threading
import queue

# Configure page
st.set_page_config(
    page_title="Real-time Data Streaming",
    page_icon="⚡",
    layout="wide"
)

# Real-time data streaming manager
class RealTimeDataStreamer:
    """
    Manages real-time data streaming from Supabase with
    subscription management and data buffering.
    """
    
    def __init__(self):
        self.active_subscriptions = {}
        self.data_buffer = queue.Queue(maxsize=1000)
        self.streaming_stats = {
            'messages_received': 0,
            'last_message': None,
            'connection_status': 'disconnected',
            'subscription_count': 0
        }
    
    def create_subscription(self, table_name: str, callback: Callable = None):
        """
        Create a real-time subscription to a Supabase table.
        In production, this would use actual Supabase real-time subscriptions.
        """
        subscription_id = f"{table_name}_{datetime.now().timestamp()}"
        
        # Simulate subscription creation
        self.active_subscriptions[subscription_id] = {
            'table': table_name,
            'callback': callback,
            'created_at': datetime.now(),
            'status': 'active'
        }
        
        self.streaming_stats['subscription_count'] += 1
        self.streaming_stats['connection_status'] = 'connected'
        
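        # In production, register the callback through the client's realtime API.
        # The exact call depends on the installed supabase-py version, so treat
        # the line below as illustrative only and check the library's realtime docs:
        # supabase.table(table_name).on('*', callback).subscribe()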
        
        return subscription_id
    
    def simulate_real_time_data(self, table_name: str) -> Dict:
        """
        Simulate real-time data updates.
        In production, this would be handled by Supabase real-time callbacks.
        """
        current_time = datetime.now()
        
        if table_name == 'orders':
            # Simulate new order
            data = {
                'event_type': 'INSERT',
                'table': 'orders',
                'timestamp': current_time,
                'data': {
                    'order_id': f"ORD_{np.random.randint(100000, 999999)}",
                    'customer_id': f"CUST_{np.random.randint(1000, 9999):04d}",
                    'total_amount': round(np.random.exponential(120), 2),
                    'status': 'processing',
                    'created_at': current_time.isoformat()
                }
            }
        
        elif table_name == 'metrics':
            # Simulate real-time metrics
            data = {
                'event_type': 'UPDATE',
                'table': 'metrics',
                'timestamp': current_time,
                'data': {
                    'active_users': np.random.randint(800, 1200),
                    'revenue_today': round(np.random.normal(50000, 10000), 2),
                    'orders_pending': np.random.randint(50, 150),
                    'system_load': round(np.random.uniform(0.3, 0.9), 2)
                }
            }
        
        else:
            # Generic data
            data = {
                'event_type': 'UPDATE',
                'table': table_name,
                'timestamp': current_time,
                'data': {'value': np.random.randint(1, 100)}
            }
        
        self.streaming_stats['messages_received'] += 1
        self.streaming_stats['last_message'] = current_time
        
        return data
    
    def get_streaming_stats(self) -> Dict:
        """
        Get current streaming statistics.
        """
        return self.streaming_stats.copy()
    
    def close_subscription(self, subscription_id: str):
        """
        Close a real-time subscription.
        """
        if subscription_id in self.active_subscriptions:
            del self.active_subscriptions[subscription_id]
            self.streaming_stats['subscription_count'] -= 1
            
            if self.streaming_stats['subscription_count'] == 0:
                self.streaming_stats['connection_status'] = 'disconnected'

# Initialize streamer
if 'data_streamer' not in st.session_state:
    st.session_state.data_streamer = RealTimeDataStreamer()
    st.session_state.streaming_data = []
    st.session_state.live_metrics = {
        'active_users': 0,
        'revenue_today': 0,
        'orders_pending': 0,
        'system_load': 0
    }

streamer = st.session_state.data_streamer

st.title("⚡ Real-time Data Streaming Dashboard")
st.markdown("**Live data updates with Supabase real-time subscriptions**")

# Streaming controls
st.subheader("🎛️ Real-time Controls")

control_col1, control_col2, control_col3, control_col4 = st.columns(4)

with control_col1:
    if st.button("▶️ Start Orders Stream", use_container_width=True):
        subscription_id = streamer.create_subscription('orders')
        st.success(f"✅ Orders stream started: {subscription_id[:8]}...")

with control_col2:
    if st.button("📊 Start Metrics Stream", use_container_width=True):
        subscription_id = streamer.create_subscription('metrics')
        st.success(f"✅ Metrics stream started: {subscription_id[:8]}...")

with control_col3:
    if st.button("🔄 Simulate Update", use_container_width=True):
        # Simulate receiving new data
        new_order = streamer.simulate_real_time_data('orders')
        st.session_state.streaming_data.append(new_order)
        
        new_metrics = streamer.simulate_real_time_data('metrics')
        st.session_state.live_metrics.update(new_metrics['data'])
        
        st.success("📡 Real-time data updated!")

with control_col4:
    if st.button("⏹️ Stop All Streams", use_container_width=True):
        for sub_id in list(streamer.active_subscriptions.keys()):
            streamer.close_subscription(sub_id)
        st.info("🛑 All streams stopped")

# Streaming status
streaming_stats = streamer.get_streaming_stats()

status_col1, status_col2, status_col3, status_col4 = st.columns(4)

with status_col1:
    status_emoji = "🟢" if streaming_stats['connection_status'] == 'connected' else "🔴"
    st.metric(
        f"{status_emoji} Connection",
        streaming_stats['connection_status'].title(),
        f"{streaming_stats['subscription_count']} active"
    )

with status_col2:
    st.metric(
        "📡 Messages Received",
        f"{streaming_stats['messages_received']:,}",
        "Total"
    )

with status_col3:
    last_msg = streaming_stats['last_message']
    if last_msg:
        time_diff = int((datetime.now() - last_msg).total_seconds())  # .seconds alone wraps every 24h
        st.metric(
            "🕒 Last Update",
            f"{time_diff}s ago" if time_diff < 60 else f"{time_diff//60}m ago",
            "Live"
        )
    else:
        st.metric("🕒 Last Update", "Never", "Waiting")

with status_col4:
    buffer_size = len(st.session_state.streaming_data)
    st.metric(
        "💾 Buffer Size",
        f"{buffer_size}",
        "Messages"
    )

# Live metrics dashboard
st.markdown("---")
st.subheader("📊 Live Business Metrics")

live_metrics = st.session_state.live_metrics

metrics_col1, metrics_col2, metrics_col3, metrics_col4 = st.columns(4)

with metrics_col1:
    st.metric(
        "👥 Active Users",
        f"{live_metrics['active_users']:,}",
        "Real-time"
    )

with metrics_col2:
    st.metric(
        "💰 Revenue Today",
        f"R$ {live_metrics['revenue_today']:,.0f}",
        "Live updates"
    )

with metrics_col3:
    st.metric(
        "📦 Orders Pending",
        f"{live_metrics['orders_pending']}",
        "Processing"
    )

with metrics_col4:
    load_color = "🟢" if live_metrics['system_load'] < 0.7 else "🟡" if live_metrics['system_load'] < 0.9 else "🔴"
    st.metric(
        f"{load_color} System Load",
        f"{live_metrics['system_load']:.1%}",
        "Current"
    )

# Real-time data visualization
if len(st.session_state.streaming_data) > 0:
    st.markdown("---")
    st.subheader("📈 Real-time Data Visualization")
    
    # Convert streaming data to DataFrame
    streaming_df = pd.DataFrame([
        {
            'timestamp': item['timestamp'],
            'event_type': item['event_type'],
            'table': item['table'],
            **item['data']
        }
        for item in st.session_state.streaming_data[-50:]  # Last 50 events
    ])
    
    viz_col1, viz_col2 = st.columns(2)
    
    with viz_col1:
        # Event timeline
        if len(streaming_df) > 1:
            # Group by minute for timeline
            streaming_df['minute'] = streaming_df['timestamp'].dt.floor('min')
            timeline_data = streaming_df.groupby(['minute', 'event_type']).size().reset_index(name='count')
            
            fig = px.line(
                timeline_data,
                x='minute',
                y='count',
                color='event_type',
                title="Real-time Event Timeline",
                labels={'count': 'Events per Minute'}
            )
            fig.update_layout(height=350)
            st.plotly_chart(fig, use_container_width=True)
    
    with viz_col2:
        # Event type distribution
        event_dist = streaming_df['event_type'].value_counts().reset_index()
        event_dist.columns = ['event_type', 'count']
        
        fig = px.pie(
            event_dist,
            values='count',
            names='event_type',
            title="Event Type Distribution"
        )
        fig.update_layout(height=350)
        st.plotly_chart(fig, use_container_width=True)
    
    # Recent events table
    st.subheader("📋 Recent Real-time Events")
    
    # Display last 10 events
    recent_events = streaming_df.tail(10).sort_values('timestamp', ascending=False)
    st.dataframe(
        recent_events[['timestamp', 'event_type', 'table']],
        use_container_width=True,
        height=300
    )

# Configuration and monitoring
with st.sidebar:
    st.header("⚙️ Streaming Configuration")
    
    # Buffer settings
    st.markdown("### 💾 Buffer Settings")
    buffer_limit = st.slider("Buffer limit:", 10, 1000, 100)
    auto_clear = st.checkbox("Auto-clear old data", value=True)
    
    if auto_clear and len(st.session_state.streaming_data) > buffer_limit:
        st.session_state.streaming_data = st.session_state.streaming_data[-buffer_limit:]
    
    # Connection settings
    st.markdown("### 🔗 Connection Settings")
    reconnect_attempts = st.slider("Reconnect attempts:", 1, 10, 3)
    heartbeat_interval = st.selectbox(
        "Heartbeat interval:",
        ['5 seconds', '10 seconds', '30 seconds', '1 minute']
    )
    
    # Monitoring
    st.markdown("---")
    st.markdown("### 📊 Monitoring")
    
    if st.button("🔍 Connection Health", use_container_width=True):
        st.json(streaming_stats)
    
    if st.button("🗑️ Clear Buffer", use_container_width=True):
        st.session_state.streaming_data = []
        st.success("Buffer cleared!")
        st.rerun()
    
    if st.button("📊 Reset Stats", use_container_width=True):
        streamer.streaming_stats = {
            'messages_received': 0,
            'last_message': None,
            'connection_status': 'disconnected',
            'subscription_count': 0
        }
        st.success("Stats reset!")
        st.rerun()
    
    # Export streaming data
    st.markdown("---")
    st.markdown("### 💾 Export")
    
    if len(st.session_state.streaming_data) > 0:
        export_data = pd.DataFrame(st.session_state.streaming_data)
        csv_data = export_data.to_csv(index=False)
        
        st.download_button(
            label="📄 Export Stream Data",
            data=csv_data,
            file_name=f"stream_data_{datetime.now().strftime('%Y%m%d_%H%M%S')}.csv",
            mime="text/csv",
            use_container_width=True
        )

# Manual refresh for demo
if streaming_stats['connection_status'] == 'connected':
    # Streamlit only reruns the script on user interaction, so this demo
    # uses a manual refresh button rather than a timed auto-refresh.
    if st.button("🔄 Manual Refresh"):
        st.rerun()

# Footer
st.markdown("---")
st.markdown(
    "**⚡ Real-time Data Streaming** | "
    "Live updates with Supabase subscriptions | "
    "Enterprise-grade real-time analytics"
)

## 🛡️ Section 3: Security and Error Handling (5 minutes)

Production applications require robust security and error handling:

In [None]:
# Security and error handling patterns
import streamlit as st
import pandas as pd
import hashlib
import re  # Needed by the input validation demo further down
import time
from datetime import datetime, timedelta

st.subheader("🛡️ Security and Error Handling Best Practices")

# Raw string so regex escapes such as \b in the snippets render literally
st.markdown(r"""
### 1. Authentication and Authorization

**Row-Level Security (RLS) with Supabase:**

```sql
-- Enable RLS on orders table
ALTER TABLE orders ENABLE ROW LEVEL SECURITY;

-- Policy: Users can only see their own orders
CREATE POLICY "Users can view own orders" ON orders
    FOR SELECT USING (auth.uid() = customer_id);

-- Policy: Managers can see all orders
CREATE POLICY "Managers can view all orders" ON orders
    FOR SELECT USING (
        EXISTS (
            SELECT 1 FROM user_roles 
            WHERE user_id = auth.uid() 
            AND role = 'manager'
        )
    );
```

**Streamlit Authentication Integration:**

```python
# Secure session management
def authenticate_user(username: str, password: str) -> bool:
    # Supabase Auth hashes and verifies the password server-side,
    # so the client sends the credentials as-is over HTTPS.
    try:
        response = supabase.auth.sign_in_with_password({
            "email": username,
            "password": password
        })
        authenticated = response.user is not None
    except Exception:
        authenticated = False

    # Record the result so the check below can gate data access
    st.session_state.authenticated = authenticated
    return authenticated

# Check authentication before data access
if 'authenticated' not in st.session_state:
    st.session_state.authenticated = False

if not st.session_state.authenticated:
    st.error("Please log in to access this dashboard")
    st.stop()
```

### 2. Input Validation and Sanitization

```python
import re
from typing import Union

def validate_sql_input(user_input: str) -> bool:
    '''Reject input that contains obvious SQL injection markers.

    This blocklist is defense in depth only. The parameterized query
    in the usage below is what actually prevents injection.
    '''
    dangerous_patterns = [
        r'\b(DROP|DELETE|INSERT|UPDATE|ALTER|CREATE)\b',
        r';|--',  # Statement separator or SQL comment marker
        r'\bUNION\b.*\bSELECT\b',
        r'\bEXEC\b|\bEXECUTE\b'
    ]
    
    for pattern in dangerous_patterns:
        if re.search(pattern, user_input, re.IGNORECASE):
            return False
    return True

def sanitize_user_input(user_input: str) -> str:
    '''Sanitize user input for safe database queries.'''
    # Remove potentially dangerous characters
    sanitized = re.sub(r'[^\w\s\-@.]', '', user_input)
    return sanitized.strip()[:100]  # Limit length

# Usage in Streamlit
user_filter = st.text_input("Search orders:")
if user_filter:
    if not validate_sql_input(user_filter):
        st.error("Invalid input detected")
        st.stop()
    
    safe_filter = sanitize_user_input(user_filter)
    # Use parameterized queries
    query = "SELECT * FROM orders WHERE description ILIKE %s"
    results = execute_query(query, [f"%{safe_filter}%"])
```
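
The keyword blocklist above is only a first filter; the parameterized query is what keeps user text out of the SQL itself. A minimal sketch of that idea, using Python's built-in `sqlite3` as a stand-in so it runs anywhere. This is an assumption for illustration: the course database is Postgres via Supabase, where the placeholder is `%s` rather than `?`, and `search_orders` and the table layout are hypothetical.

```python
import sqlite3

def search_orders(conn: sqlite3.Connection, term: str) -> list:
    # The ? placeholder sends `term` as data, never as SQL text.
    # sqlite3 has no ILIKE; its LIKE is case-insensitive for ASCII.
    cursor = conn.execute(
        "SELECT id, description FROM orders WHERE description LIKE ?",
        (f"%{term}%",),
    )
    return cursor.fetchall()

# Hypothetical in-memory table for the demo
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, description TEXT)")
conn.executemany(
    "INSERT INTO orders (description) VALUES (?)",
    [("blue ceramic mug",), ("gift-wrapped mug set",), ("desk lamp",)],
)

matches = search_orders(conn, "mug")
# An injection attempt is treated as an ordinary search string
attack = search_orders(conn, "x'; DROP TABLE orders; --")
remaining = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
```

The injection attempt simply matches no rows and the table is left intact, which is why parameter binding should stay in place even when a blocklist is also used.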

### 3. Error Handling and Recovery

```python
import logging
from contextlib import contextmanager

@contextmanager
def handle_database_errors():
    '''Context manager that turns database errors into user-facing messages.'''
    try:
        yield
    except ConnectionError:
        # A context manager cannot re-run the block that failed,
        # so report the problem instead of sleeping and "retrying".
        st.error("🔌 Database connection lost. Please refresh to reconnect.")
    except TimeoutError:
        st.warning("⏱️ Query timeout. Try reducing data range.")
    except PermissionError:
        st.error("🚫 Access denied. Check your permissions.")
    except Exception as e:
        logging.error(f"Database error: {e}")
        st.error("❌ An unexpected error occurred. Please try again.")

# Usage
with handle_database_errors():
    data = load_sensitive_data(user_id)
    display_dashboard(data)
```
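
A `with` body runs only once, so the context manager above can report a lost connection but cannot retry the work. Retrying needs a wrapper around a callable. A minimal sketch with exponential backoff, assuming transient failures surface as `ConnectionError` or `TimeoutError`; the helper name `with_retry` and its parameters are hypothetical and not part of Streamlit or Supabase.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")

def with_retry(
    operation: Callable[[], T],
    max_attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    # Retry `operation` on transient errors, doubling the wait each time.
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except (ConnectionError, TimeoutError):
            if attempt == max_attempts:
                raise  # Out of attempts: let the caller handle it
            sleep(base_delay * 2 ** (attempt - 1))
    raise ValueError("max_attempts must be at least 1")

# Demo: a hypothetical loader that fails twice, then succeeds
calls = {"count": 0}

def flaky_load() -> str:
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("simulated dropped connection")
    return "orders loaded"

delays = []  # Record the waits instead of actually sleeping
result = with_retry(flaky_load, sleep=delays.append)
```

Passing `sleep` as a parameter keeps the demo instant and makes the backoff schedule easy to check; in an app the default `time.sleep` would apply, and the final exception could still be caught by `handle_database_errors()`.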

### 4. Rate Limiting and Performance Protection

```python
from functools import wraps
import time

def rate_limit(calls_per_minute: int = 60):
    '''Rate limiting decorator for API calls.'''
    def decorator(func):
        # Timestamps of recent calls. Defined in the main script, this list
        # is recreated on every Streamlit rerun; keep the limiter in an
        # imported module or in st.session_state if it must persist.
        call_times = []
        
        @wraps(func)
        def wrapper(*args, **kwargs):
            now = time.time()
            # Remove calls older than 1 minute
            call_times[:] = [t for t in call_times if now - t < 60]
            
            if len(call_times) >= calls_per_minute:
                st.error(f"Rate limit exceeded. Max {calls_per_minute} calls per minute.")
                return None
            
            call_times.append(now)
            return func(*args, **kwargs)
        
        return wrapper
    return decorator

@rate_limit(calls_per_minute=30)
def expensive_database_operation():
    # Expensive operation here
    pass
```
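
The decorator above keeps its call history in process memory attached to the decorated function. Because Streamlit reruns the script on each interaction, a per-user limit is usually easier to keep in `st.session_state`. A minimal sketch of a sliding-window check that works on any dict-like store; the function name `allow_call` and its parameters are hypothetical, and a plain dict with a fake clock stands in for session state and real time.

```python
import time
from typing import Callable, MutableMapping

def allow_call(
    store: MutableMapping,
    key: str,
    max_calls: int,
    window_seconds: float = 60.0,
    clock: Callable[[], float] = time.time,
) -> bool:
    # Sliding-window check; `store` would be st.session_state in an app.
    now = clock()
    recent = [t for t in store.get(key, []) if now - t < window_seconds]
    allowed = len(recent) < max_calls
    if allowed:
        recent.append(now)
    store[key] = recent
    return allowed

# Demo with a plain dict and a controllable clock
fake_now = [1000.0]
session = {}
first_three = [
    allow_call(session, "api_calls", max_calls=2, clock=lambda: fake_now[0])
    for _ in range(3)
]
fake_now[0] += 61  # Move past the 60-second window
after_window = allow_call(session, "api_calls", max_calls=2, clock=lambda: fake_now[0])
```

With a limit of two calls, the third call in the same window is refused and a call after the window has passed is accepted again.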

### 5. Data Privacy and Compliance

```python
from typing import List

def mask_sensitive_data(df: pd.DataFrame, columns: List[str]) -> pd.DataFrame:
    '''Mask sensitive data for display.'''
    masked_df = df.copy()
    
    def mask_text(value) -> str:
        # Keep the first 3 and last 2 characters of longer values;
        # str() also covers missing values such as NaN
        text = str(value)
        return text[:3] + "***" + text[-2:] if len(text) > 5 else "***"
    
    for col in columns:
        if col in masked_df.columns:
            if masked_df[col].dtype == 'object':  # String data
                masked_df[col] = masked_df[col].apply(mask_text)
            else:  # Numeric data
                masked_df[col] = "[REDACTED]"
    
    return masked_df

# Usage
if user_role != 'admin':
    sensitive_columns = ['customer_email', 'phone_number', 'cpf']
    display_data = mask_sensitive_data(raw_data, sensitive_columns)
else:
    display_data = raw_data

st.dataframe(display_data)
```
""")

# Interactive security demo
st.markdown("---")
st.subheader("🔒 Interactive Security Demo")

# Simulate authentication
demo_col1, demo_col2 = st.columns(2)

with demo_col1:
    st.markdown("**Authentication Simulation**")
    
    username = st.text_input("Username:", value="demo@olist.com")
    password = st.text_input("Password:", type="password", value="demo123")
    
    if st.button("🔐 Login"):
        # Simulate authentication
        if username == "demo@olist.com" and password == "demo123":
            st.success("✅ Authentication successful!")
            st.session_state.demo_authenticated = True
        else:
            st.error("❌ Invalid credentials")
            st.session_state.demo_authenticated = False

with demo_col2:
    st.markdown("**Input Validation Demo**")
    
    user_input = st.text_input("Enter search term:", placeholder="Try: normal text OR DROP TABLE")
    
    if user_input:
        # Validate input
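        # ';|--' matches a statement separator or SQL comment marker;
        # a character class here would flag every hyphenated search term
        dangerous_patterns = [r'\b(DROP|DELETE|INSERT|UPDATE)\b', r';|--']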
        is_safe = True
        
        for pattern in dangerous_patterns:
            if re.search(pattern, user_input, re.IGNORECASE):
                is_safe = False
                break
        
        if is_safe:
            st.success(f"✅ Safe input: '{user_input}'")
        else:
            st.error(f"🚫 Dangerous input detected: '{user_input}'")

# Rate limiting demo
st.markdown("**Rate Limiting Demo**")

if 'api_calls' not in st.session_state:
    st.session_state.api_calls = []

if st.button("📡 Make API Call"):
    now = time.time()
    # Remove calls older than 1 minute
    st.session_state.api_calls = [
        t for t in st.session_state.api_calls if now - t < 60
    ]
    
    if len(st.session_state.api_calls) >= 5:  # Limit for demo
        st.error("🚫 Rate limit exceeded! Max 5 calls per minute.")
    else:
        st.session_state.api_calls.append(now)
        st.success(f"✅ API call successful! ({len(st.session_state.api_calls)}/5 this minute)")

print("✅ Security and error handling patterns ready!")

## 🎯 Key Takeaways

✅ **Production Integration**: Connection pooling, retry logic, and performance monitoring  
✅ **Real-time Streaming**: Live data updates with Supabase subscriptions  
✅ **Security Best Practices**: Authentication, input validation, and data protection  
✅ **Error Handling**: Robust error recovery and graceful degradation  
✅ **Enterprise Features**: Rate limiting, logging, and compliance considerations  

## 🔜 Coming Up in Part 3

In the final session today, we'll cover deployment and production considerations:

**Preview Topics:**
- Streamlit Cloud deployment strategies
- Environment management and secrets
- Performance optimization for production
- Monitoring and alerting setup

---

## 💼 Practice Exercise

**Scenario**: Implement a secure real-time order monitoring system for Olist's operations team.

**Requirements**:
1. **Real-time Updates**: Live order status changes
2. **Role-based Access**: Different views for managers vs analysts
3. **Error Handling**: Graceful handling of connection issues
4. **Security**: Input validation and data masking
5. **Performance**: Efficient data loading and caching

**Security Constraints**:
- Mask customer PII for non-admin users
- Rate limit database queries
- Validate all user inputs
- Implement session timeout

**Time**: 15 minutes
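
One possible starting point for the session-timeout constraint is an idle check that runs at the top of the script. This is a sketch under stated assumptions, not a reference solution: the function name `session_is_active`, the `last_activity` key, and the 15-minute limit are all hypothetical, and a plain dict stands in for `st.session_state`.

```python
import time
from typing import Callable, MutableMapping

SESSION_TIMEOUT_SECONDS = 15 * 60  # Assumed idle limit for illustration

def session_is_active(
    store: MutableMapping,
    timeout: float = SESSION_TIMEOUT_SECONDS,
    clock: Callable[[], float] = time.time,
) -> bool:
    # Return False and wipe the session if it has been idle too long.
    now = clock()
    last_seen = store.get("last_activity")
    if last_seen is not None and now - last_seen > timeout:
        store.clear()  # Drops the login flag along with everything else
        return False
    store["last_activity"] = now
    return True

# Demo with a plain dict and a controllable clock
clock_value = [0.0]
session = {"authenticated": True}
first_visit = session_is_active(session, clock=lambda: clock_value[0])
clock_value[0] = 600.0  # 10 minutes later: still within the limit
still_active = session_is_active(session, clock=lambda: clock_value[0])
clock_value[0] = 600.0 + 901.0  # Just over 15 minutes of inactivity
expired = session_is_active(session, clock=lambda: clock_value[0])
```

In the dashboard, a `False` result would typically be followed by a login prompt and `st.stop()`.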

---

*Next: [Part 3 - Deployment & Production →](02_streamlit_advanced_part3_deployment_production.ipynb)*