# Week 9 - Part 3: Deployment and Production

**Course:** Python Data Analysis for Business Intelligence  
**Week:** 9 | **Session:** Thursday | **Part:** 3 of 3  
**Duration:** 20 minutes | **Date:** June 5, 2025

## Learning Objectives
By the end of this session, you will be able to:
- Deploy Streamlit applications to production using Streamlit Cloud
- Implement environment management and secure secrets handling
- Optimize application performance for production workloads
- Set up monitoring, logging, and alerting for deployed applications
- Implement CI/CD pipelines for automated deployment

---

## 🎯 Business Context: Production Deployment Strategy

**Enterprise Challenge**: Olist's business intelligence platform must be deployed to serve multiple stakeholder groups with enterprise-grade reliability:

### Production Requirements:
- **👥 User Base**: 500+ concurrent users across Brazil
- **⚡ Performance**: <2 second load times, 99.9% uptime
- **🔒 Security**: Enterprise authentication, data encryption, audit trails
- **📈 Scalability**: Auto-scaling for peak usage periods
- **🌍 Global Access**: Multi-region deployment for low latency

### Deployment Architecture:
- **Frontend**: Streamlit Cloud with custom domain
- **Database**: Supabase with read replicas
- **Monitoring**: Real-time performance and error tracking
- **CI/CD**: Automated testing and deployment pipeline

**Production Challenge**: Move from a development prototype to an enterprise-grade production system that can handle real business operations.

**Today's Solution**: A complete deployment workflow, from local development to production monitoring, using enterprise DevOps practices.

---

## 🛠️ Setup: Production Environment Preparation

Let's prepare our development environment for production deployment:

In [None]:
# Production deployment imports
import streamlit as st
import pandas as pd
import numpy as np
import os
import json
import logging
import time
from datetime import datetime, timedelta
from typing import Dict, List, Optional
import hashlib
import base64
from pathlib import Path

# Configure production logging
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',
    handlers=[
        logging.StreamHandler(),
        logging.FileHandler('app.log')
    ]
)
logger = logging.getLogger(__name__)

# Course utilities
from Utilities.visualization_helper import set_plotting_style
from Utilities.colab_helper import setup_colab

# Setup
plt, sns = set_plotting_style()
setup_colab()

print("✅ Production deployment environment ready!")
print("🚀 Ready to deploy enterprise-grade Streamlit applications")
print("📊 DevOps, monitoring, and optimization patterns loaded")
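
The `FileHandler('app.log')` configured above appends to a single file indefinitely. As a minimal sketch of one alternative, the standard library's `RotatingFileHandler` caps the file size and keeps a few backups. The helper name `build_rotating_logger` and the default sizes are assumptions for illustration, not part of the course utilities.

```python
import logging
from logging.handlers import RotatingFileHandler
from pathlib import Path


def build_rotating_logger(name: str, log_path: str,
                          max_bytes: int = 1_000_000,
                          backup_count: int = 3) -> logging.Logger:
    """Return a logger that writes to a size-capped, rotating log file."""
    # Create the log directory if it does not exist yet
    Path(log_path).parent.mkdir(parents=True, exist_ok=True)

    rotating_logger = logging.getLogger(name)
    rotating_logger.setLevel(logging.INFO)

    # Avoid stacking duplicate handlers when a notebook cell is re-run
    already_attached = any(
        isinstance(h, RotatingFileHandler) for h in rotating_logger.handlers
    )
    if not already_attached:
        handler = RotatingFileHandler(
            log_path, maxBytes=max_bytes, backupCount=backup_count,
            encoding="utf-8",
        )
        handler.setFormatter(logging.Formatter(
            '%(asctime)s - %(name)s - %(levelname)s - %(message)s'
        ))
        rotating_logger.addHandler(handler)
    return rotating_logger
```

Calling `build_rotating_logger("olist_bi", "logs/app.log")` would then write to `logs/app.log` and roll over to `app.log.1`, `app.log.2`, and so on once the size cap is reached.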

## 🚀 Section 1: Streamlit Cloud Deployment (8 minutes)

Let's create a complete deployment workflow for Streamlit Cloud:

In [None]:
%%writefile deployment_ready_app.py

import streamlit as st
import pandas as pd
import numpy as np
import plotly.express as px
import plotly.graph_objects as go
from datetime import datetime, timedelta
import os
import logging
import time
import json
from typing import Dict, Optional

# Configure production settings
st.set_page_config(
    page_title="Olist Business Intelligence Platform",
    page_icon="🏢",
    layout="wide",
    initial_sidebar_state="expanded",
    menu_items={
        'Get Help': 'https://docs.olist.com/support',
        'Report a bug': 'https://github.com/olist/bi-platform/issues',
        'About': "Olist BI Platform v2.1 - Built with Streamlit"
    }
)

# Production environment configuration
class ProductionConfig:
    """
    Production configuration management with environment-specific settings.
    """
    
    def __init__(self):
        self.environment = os.getenv('STREAMLIT_ENV', 'development')
        self.debug_mode = os.getenv('DEBUG', 'False').lower() == 'true'
        self.version = os.getenv('APP_VERSION', 'v2.1.0')
        
        # Configure logging based on environment
        log_level = logging.DEBUG if self.debug_mode else logging.INFO
        logging.basicConfig(
            level=log_level,
            format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
        )
        self.logger = logging.getLogger(__name__)
    
    def get_database_config(self) -> Dict:
        """
        Get database configuration from environment or Streamlit secrets.
        """
        try:
            # Try Streamlit secrets first (production)
            if hasattr(st, 'secrets') and 'database' in st.secrets:
                return {
                    'url': st.secrets['database']['url'],
                    'api_key': st.secrets['database']['api_key'],
                    'connection_pool_size': st.secrets['database'].get('pool_size', 10)
                }
        except Exception as e:
            # Reading st.secrets can raise when no secrets.toml exists (typical
            # in local development), so fall through to environment variables
            self.logger.debug(f"Streamlit secrets unavailable: {e}")

        try:
            # Fallback to environment variables (development)
            return {
                'url': os.getenv('DATABASE_URL'),
                'api_key': os.getenv('DATABASE_API_KEY'),
                'connection_pool_size': int(os.getenv('DB_POOL_SIZE', '5'))
            }
        except ValueError as e:
            self.logger.error(f"Failed to load database config: {e}")
            return {}
    
    def get_api_config(self) -> Dict:
        """
        Get external API configuration.
        """
        try:
            if hasattr(st, 'secrets') and 'apis' in st.secrets:
                return {
                    'analytics_key': st.secrets['apis']['google_analytics'],
                    'monitoring_key': st.secrets['apis']['datadog_key'],
                    'email_service': st.secrets['apis']['sendgrid_key']
                }
        except Exception as e:
            # Reading st.secrets can raise when no secrets.toml exists,
            # so fall through to environment variables
            self.logger.warning(f"API secrets not available, using environment variables: {e}")

        return {
            'analytics_key': os.getenv('GOOGLE_ANALYTICS_KEY'),
            'monitoring_key': os.getenv('DATADOG_API_KEY'),
            'email_service': os.getenv('SENDGRID_API_KEY')
        }

# Initialize production configuration
config = ProductionConfig()
logger = config.logger

# Performance monitoring
class PerformanceMonitor:
    """
    Monitor application performance and user interactions.
    """
    
    def __init__(self):
        if 'performance_metrics' not in st.session_state:
            st.session_state.performance_metrics = {
                'page_loads': 0,
                'query_times': [],
                'error_count': 0,
                'user_sessions': 0,
                'last_active': datetime.now()
            }
    
    def track_page_load(self, page_name: str):
        """
        Track page load event.
        """
        st.session_state.performance_metrics['page_loads'] += 1
        st.session_state.performance_metrics['last_active'] = datetime.now()
        
        logger.info(f"Page loaded: {page_name}")
        
        # In production, send to analytics service
        # analytics.track('page_view', {'page': page_name, 'timestamp': datetime.now()})
    
    def track_query_performance(self, query_type: str, execution_time: float):
        """
        Track database query performance.
        """
        st.session_state.performance_metrics['query_times'].append({
            'type': query_type,
            'time': execution_time,
            'timestamp': datetime.now()
        })
        
        logger.info(f"Query executed: {query_type} in {execution_time:.3f}s")
        
        # Alert on slow queries
        if execution_time > 5.0:
            logger.warning(f"Slow query detected: {query_type} took {execution_time:.3f}s")
    
    def track_error(self, error_type: str, error_message: str):
        """
        Track application errors.
        """
        st.session_state.performance_metrics['error_count'] += 1
        
        logger.error(f"Application error: {error_type} - {error_message}")
        
        # In production, send to error tracking service
        # sentry.capture_exception(error_message)
    
    def get_metrics_summary(self) -> Dict:
        """
        Get performance metrics summary.
        """
        metrics = st.session_state.performance_metrics
        
        avg_query_time = 0
        if metrics['query_times']:
            avg_query_time = sum(q['time'] for q in metrics['query_times']) / len(metrics['query_times'])
        
        return {
            'page_loads': metrics['page_loads'],
            'total_queries': len(metrics['query_times']),
            'avg_query_time': avg_query_time,
            'error_count': metrics['error_count'],
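            # Minutes since this session's first tracked page load. (Subtracting
            # 'last_active' directly would always give ~0, because it is reset
            # on every page load.)
            'uptime_minutes': (
                datetime.now()
                - st.session_state.setdefault('session_started_at', metrics['last_active'])
            ).total_seconds() / 60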
        }

# Initialize performance monitoring
monitor = PerformanceMonitor()
monitor.track_page_load('main_dashboard')

# Custom CSS for production branding
st.markdown("""
<style>
    /* Production branding */
    .main-header {
        background: linear-gradient(135deg, #667eea 0%, #764ba2 100%);
        padding: 2rem;
        border-radius: 10px;
        color: white;
        margin-bottom: 2rem;
    }
    
    /* Environment indicator */
    .env-indicator {
        position: fixed;
        top: 10px;
        right: 10px;
        padding: 5px 10px;
        border-radius: 15px;
        font-size: 12px;
        font-weight: bold;
        z-index: 1000;
    }
    
    .env-production {
        background: #28a745;
        color: white;
    }
    
    .env-development {
        background: #ffc107;
        color: black;
    }
    
    /* Performance indicators */
    .performance-good {
        background: #d4edda;
        border: 1px solid #c3e6cb;
        color: #155724;
        padding: 0.75rem;
        border-radius: 5px;
        border-left: 4px solid #28a745;
    }
    
    .performance-warning {
        background: #fff3cd;
        border: 1px solid #ffeaa7;
        color: #856404;
        padding: 0.75rem;
        border-radius: 5px;
        border-left: 4px solid #ffc107;
    }
    
    .performance-critical {
        background: #f8d7da;
        border: 1px solid #f5c6cb;
        color: #721c24;
        padding: 0.75rem;
        border-radius: 5px;
        border-left: 4px solid #dc3545;
    }
</style>
""", unsafe_allow_html=True)

# Environment indicator
env_class = "env-production" if config.environment == "production" else "env-development"
st.markdown(f"""
<div class="env-indicator {env_class}">
    {config.environment.upper()} • {config.version}
</div>
""", unsafe_allow_html=True)

# Production header
st.markdown("""
<div class="main-header">
    <h1>🏢 Olist Business Intelligence Platform</h1>
    <p>Production Dashboard • Real-time Analytics • Enterprise Grade</p>
</div>
""", unsafe_allow_html=True)

# Health check endpoint simulation
def health_check() -> Dict:
    """
    Application health check for monitoring.
    """
    try:
        # Check that a database URL is configured (this does not open a connection)
        db_config = config.get_database_config()
        db_healthy = bool(db_config.get('url'))
        
        # Check performance metrics
        metrics = monitor.get_metrics_summary()
        performance_healthy = metrics['avg_query_time'] < 2.0 and metrics['error_count'] < 10
        
        # Overall health status
        overall_healthy = db_healthy and performance_healthy
        
        return {
            'status': 'healthy' if overall_healthy else 'unhealthy',
            'timestamp': datetime.now().isoformat(),
            'version': config.version,
            'environment': config.environment,
            'database': 'connected' if db_healthy else 'disconnected',
            'performance': 'good' if performance_healthy else 'degraded',
            'uptime_minutes': metrics['uptime_minutes'],
            'total_requests': metrics['page_loads']
        }
        
    except Exception as e:
        logger.error(f"Health check failed: {e}")
        return {
            'status': 'unhealthy',
            'error': str(e),
            'timestamp': datetime.now().isoformat()
        }

# Application monitoring dashboard
st.subheader("📊 Application Health & Performance")

health_data = health_check()
performance_metrics = monitor.get_metrics_summary()

# Health status indicators
health_col1, health_col2, health_col3, health_col4 = st.columns(4)

with health_col1:
    status_emoji = "🟢" if health_data['status'] == 'healthy' else "🔴"
    st.metric(
        f"{status_emoji} System Health",
        health_data['status'].title(),
        f"Environment: {config.environment}"
    )

with health_col2:
    st.metric(
        "📡 Database Status",
        health_data.get('database', 'unknown').title(),
        "Real-time"
    )

with health_col3:
    st.metric(
        "⚡ Avg Response Time",
        f"{performance_metrics['avg_query_time']:.3f}s",
        "Target: <2.0s"
    )

with health_col4:
    st.metric(
        "🔢 Total Requests",
        f"{performance_metrics['page_loads']:,}",
        f"Errors: {performance_metrics['error_count']}"
    )

# Performance status
if performance_metrics['avg_query_time'] < 1.0:
    performance_class = "performance-good"
    performance_message = "🚀 Application performance is excellent"
elif performance_metrics['avg_query_time'] < 2.0:
    performance_class = "performance-warning"
    performance_message = "⚠️ Application performance is acceptable but could be improved"
else:
    performance_class = "performance-critical"
    performance_message = "🚨 Application performance needs immediate attention"

st.markdown(f"""
<div class="{performance_class}">
    <strong>Performance Status:</strong> {performance_message}
</div>
""", unsafe_allow_html=True)

# Sample business data with performance tracking
@st.cache_data(ttl=300)  # 5-minute cache for production
def load_production_data():
    """
    Load production data with performance monitoring.
    """
    start_time = time.time()
    
    try:
        # Simulate production data loading
        np.random.seed(42)
        
        # Generate business metrics
        dates = pd.date_range('2024-01-01', periods=365, freq='D')
        data = {
            'date': dates,
            'revenue': np.random.normal(50000, 10000, 365).cumsum(),
            'orders': np.random.poisson(200, 365),
            'customers': np.random.poisson(150, 365),
            'satisfaction': np.random.normal(4.2, 0.3, 365)
        }
        
        df = pd.DataFrame(data)
        
        # Track query performance
        execution_time = time.time() - start_time
        monitor.track_query_performance('load_production_data', execution_time)
        
        logger.info(f"Production data loaded successfully in {execution_time:.3f}s")
        return df
        
    except Exception as e:
        monitor.track_error('data_loading_error', str(e))
        raise

# Load and display production data
try:
    with st.spinner("Loading production data..."):
        production_df = load_production_data()
    
    # Business metrics dashboard
    st.markdown("---")
    st.subheader("📈 Business Performance Dashboard")
    
    # Current metrics
    current_revenue = production_df['revenue'].iloc[-1]
    current_orders = production_df['orders'].iloc[-30:].sum()
    current_satisfaction = production_df['satisfaction'].iloc[-30:].mean()
    
    metric_col1, metric_col2, metric_col3 = st.columns(3)
    
    with metric_col1:
        st.metric(
            "💰 Total Revenue",
            f"R$ {current_revenue:,.0f}",
            "YTD Performance"
        )
    
    with metric_col2:
        st.metric(
            "📦 Orders (30d)",
            f"{current_orders:,}",
            "Recent Performance"
        )
    
    with metric_col3:
        st.metric(
            "⭐ Satisfaction",
            f"{current_satisfaction:.2f}/5.0",
            "30-day Average"
        )
    
    # Revenue trend chart
    st.subheader("📊 Revenue Trend Analysis")
    
    fig = px.line(
        production_df.tail(90),  # Last 90 days
        x='date',
        y='revenue',
        title="Revenue Trend (Last 90 Days)",
        labels={'revenue': 'Revenue (R$)', 'date': 'Date'}
    )
    
    fig.update_traces(line=dict(color='#667eea', width=3))
    fig.update_layout(height=400)
    
    st.plotly_chart(fig, use_container_width=True)
    
except Exception as e:
    st.error(f"❌ Failed to load production data: {str(e)}")
    monitor.track_error('dashboard_error', str(e))

# Production deployment information
with st.sidebar:
    st.header("🚀 Deployment Info")
    
    st.markdown(f"""
    **Environment:** {config.environment}  
    **Version:** {config.version}  
    **Build:** {datetime.now().strftime('%Y%m%d-%H%M')}  
    **Status:** {health_data['status'].title()}  
    """)
    
    st.markdown("---")
    st.subheader("📊 Performance Metrics")
    
    st.metric("Page Loads", performance_metrics['page_loads'])
    st.metric("Total Queries", performance_metrics['total_queries'])
    st.metric("Error Count", performance_metrics['error_count'])
    
    st.markdown("---")
    st.subheader("🔧 Admin Tools")
    
    if st.button("🔄 Clear Cache", use_container_width=True):
        st.cache_data.clear()
        st.success("Cache cleared!")
    
    if st.button("📊 Full Health Check", use_container_width=True):
        health_result = health_check()
        st.json(health_result)
    
    if st.button("📥 Export Logs", use_container_width=True):
        # In production, this would export actual logs
        st.info("Logs exported to admin panel")

# Footer with deployment information
st.markdown("---")
st.markdown(f"""
<div style="text-align: center; color: #666; padding: 1rem;">
    <p><strong>🏢 Olist Business Intelligence Platform</strong> | 
    Version {config.version} | 
    Environment: {config.environment.title()} | 
    <a href="#" style="color: #667eea;">Support</a> | 
    <a href="#" style="color: #667eea;">Documentation</a></p>
    <p>Deployed on Streamlit Cloud | 
    Last updated: {time.strftime('%Y-%m-%d %H:%M:%S', time.gmtime())} UTC</p>
</div>
""", unsafe_allow_html=True)

# Log successful page render
logger.info("Dashboard rendered successfully for user session")
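
The app above measures load time by hand with `time.time()` before and after the work. As a minimal sketch, the same pattern can be wrapped in a context manager so that timing is reported even when the block raises. The name `timed` and its `on_done` callback parameter are hypothetical; in the app, the callback could be `monitor.track_query_performance`, which already applies the 5-second slow-query warning.

```python
import time
from contextlib import contextmanager
from typing import Callable, Iterator, Optional


@contextmanager
def timed(query_type: str,
          on_done: Optional[Callable[[str, float], None]] = None) -> Iterator[None]:
    """Time the enclosed block and report (query_type, seconds) to a callback."""
    start = time.perf_counter()  # monotonic clock, suited to measuring durations
    try:
        yield
    finally:
        # Runs on success and on error, so failed queries are timed too
        elapsed = time.perf_counter() - start
        if on_done is not None:
            on_done(query_type, elapsed)
```

Usage would look like `with timed('load_production_data', monitor.track_query_performance): df = pd.DataFrame(data)`, which replaces the manual `start_time` bookkeeping.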

## 📋 Section 2: Production Configuration Files (7 minutes)

Let's create all the necessary configuration files for production deployment:

In [None]:
# Create production configuration files
import os
from pathlib import Path

# Create project structure
project_files = {
    'requirements.txt': '''
streamlit>=1.28.0
pandas>=1.5.0
numpy>=1.24.0
plotly>=5.15.0
supabase>=1.0.3
python-dotenv>=1.0.0
psycopg2-binary>=2.9.5
requests>=2.28.0
psutil>=5.9.0
'''.strip(),
    
    '.streamlit/config.toml': '''
[global]
developmentMode = false
showWarningOnDirectExecution = false

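[server]
headless = true
address = "0.0.0.0"
port = 8501
# enableCORS must stay enabled while XSRF protection is on;
# Streamlit overrides enableCORS = false in that combination
enableCORS = true
enableXsrfProtection = true
maxUploadSize = 200

[browser]
gatherUsageStats = false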

[theme]
primaryColor = "#667eea"
backgroundColor = "#ffffff"
secondaryBackgroundColor = "#f0f2f6"
textColor = "#262730"
font = "sans serif"
''',
    
    '.streamlit/secrets.toml': '''
# Production secrets (example - use actual values in production)
[database]
url = "your-supabase-url"
api_key = "your-supabase-anon-key"
service_role_key = "your-service-role-key"
pool_size = 10

[apis]
google_analytics = "your-ga-key"
datadog_key = "your-datadog-key"
sendgrid_key = "your-sendgrid-key"

[auth]
jwt_secret = "your-jwt-secret"
session_timeout = 3600

[monitoring]
sentry_dsn = "your-sentry-dsn"
log_level = "INFO"
''',
    
    '.env.example': '''
# Environment variables for local development
STREAMLIT_ENV=development
DEBUG=true
APP_VERSION=v2.1.0

# Database
DATABASE_URL=your-supabase-url
DATABASE_API_KEY=your-api-key
DB_POOL_SIZE=5

# External APIs
GOOGLE_ANALYTICS_KEY=your-ga-key
DATADOG_API_KEY=your-datadog-key
SENDGRID_API_KEY=your-sendgrid-key

# Monitoring
SENTRY_DSN=your-sentry-dsn
LOG_LEVEL=DEBUG
''',
    
    'Dockerfile': '''
FROM python:3.9-slim

WORKDIR /app

# Install system dependencies
RUN apt-get update && apt-get install -y \\
    build-essential \\
    curl \\
    software-properties-common \\
    git \\
    && rm -rf /var/lib/apt/lists/*

# Copy requirements first for better caching
COPY requirements.txt .
RUN pip3 install -r requirements.txt

# Copy application code
COPY . .

# Create non-root user
RUN useradd -m -u 1000 streamlit && chown -R streamlit:streamlit /app
USER streamlit

# Health check
HEALTHCHECK CMD curl --fail http://localhost:8501/_stcore/health

# Expose port
EXPOSE 8501

# Run application
ENTRYPOINT ["streamlit", "run", "deployment_ready_app.py", "--server.port=8501", "--server.address=0.0.0.0"]
''',
    
    'docker-compose.yml': '''
version: '3.8'

services:
  streamlit-app:
    build: .
    ports:
      - "8501:8501"
    environment:
      - STREAMLIT_ENV=production
      - DEBUG=false
    volumes:
      - ./logs:/app/logs
    restart: unless-stopped
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8501/_stcore/health"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 40s
    
  nginx:
    image: nginx:alpine
    ports:
      - "80:80"
      - "443:443"
    volumes:
      - ./nginx.conf:/etc/nginx/nginx.conf
      - ./ssl:/etc/nginx/ssl
    depends_on:
      - streamlit-app
    restart: unless-stopped
''',
    
    '.github/workflows/deploy.yml': '''
name: Deploy to Production

on:
  push:
    branches: [ main ]
  pull_request:
    branches: [ main ]

jobs:
  test:
    runs-on: ubuntu-latest
    
    steps:
    - uses: actions/checkout@v3
    
    - name: Set up Python
      uses: actions/setup-python@v4
      with:
        python-version: '3.9'
    
    - name: Install dependencies
      run: |
        python -m pip install --upgrade pip
        pip install -r requirements.txt
        pip install pytest flake8 black
    
    - name: Lint with flake8
      run: |
        flake8 . --count --select=E9,F63,F7,F82 --show-source --statistics
    
    - name: Format check with black
      run: |
        black --check .
    
    - name: Test with pytest
      run: |
        pytest tests/ -v
  
  deploy:
    needs: test
    runs-on: ubuntu-latest
    if: github.ref == 'refs/heads/main'
    
    steps:
    - uses: actions/checkout@v3
    
    - name: Deploy to Streamlit Cloud
      run: |
        echo "Deploying to Streamlit Cloud..."
        # Streamlit Cloud automatically deploys on push to main
        echo "Deployment triggered successfully"
    
    - name: Notify deployment status
      run: |
        echo "Sending deployment notification..."
        # Add notification logic here
'''
}

print("📁 Production Configuration Files:")
print()
for filename, content in project_files.items():
    print(f"📄 {filename}")
    print(f"   {len(content.splitlines())} lines")
    print()

# Display key configuration explanations
st.subheader("📋 Production Configuration Guide")

config_tabs = st.tabs(["Requirements", "Streamlit Config", "Secrets", "Docker", "CI/CD"])

with config_tabs[0]:
    st.markdown("""
    ### 📦 requirements.txt
    
    **Purpose**: Define all Python dependencies for consistent deployments
    
    **Key Dependencies**:
    - `streamlit>=1.28.0` - Core framework with latest features
    - `supabase>=1.0.3` - Database integration
    - `plotly>=5.15.0` - Interactive visualizations
    - `psycopg2-binary>=2.9.5` - PostgreSQL adapter
    
    **Best Practices**:
    - The `>=` specifiers above only set minimum versions; pin exact versions (`==`) or add upper bounds to avoid breaking changes
    - Use `pip freeze > requirements.txt` after testing
    - Separate dev dependencies if needed
    """)

with config_tabs[1]:
    st.markdown("""
    ### ⚙️ .streamlit/config.toml
    
    **Purpose**: Configure Streamlit behavior for production
    
    **Key Settings**:
    - `developmentMode = false` - Disable development features
    - `enableXsrfProtection = true` - Enable security protection
    - `maxUploadSize = 200` - Limit file uploads (MB)
    - Custom theme colors for branding
    
    **Security Note**: Never disable XSRF protection in production
    """)

with config_tabs[2]:
    st.markdown("""
    ### 🔐 .streamlit/secrets.toml
    
    **Purpose**: Store sensitive configuration securely
    
    **Never Put in Source Code or Version Control**:
    - Database passwords
    - API keys
    - JWT secrets or encryption keys
    
    **Production Setup**:
    1. Add secrets in Streamlit Cloud dashboard
    2. Use environment variables for local development
    3. Never commit secrets.toml to version control
    
    **Access in Code**:
    ```python
    db_url = st.secrets["database"]["url"]
    api_key = st.secrets["apis"]["sendgrid_key"]
    ```
    """)

with config_tabs[3]:
    st.markdown("""
    ### 🐳 Docker Configuration
    
    **Dockerfile Benefits**:
    - Consistent deployment environment
    - Easy local testing of production setup
    - Container orchestration support
    
    **Security Features**:
    - Non-root user execution
    - Health check endpoint
    - Minimal base image
    
    **Usage**:
    ```bash
    # Build image
    docker build -t olist-bi .
    
    # Run container
    docker run -p 8501:8501 olist-bi
    
    # Use docker-compose for full stack
    docker-compose up -d
    ```
    """)

with config_tabs[4]:
    st.markdown("""
    ### 🔄 CI/CD Pipeline
    
    **GitHub Actions Workflow**:
    1. **Test Stage**: Run linting, formatting, and unit tests
    2. **Deploy Stage**: Automatic deployment on main branch
    3. **Notification**: Alert team of deployment status
    
    **Quality Gates**:
    - Code formatting with Black
    - Linting with Flake8
    - Unit tests with pytest
    - Security scanning (optional)
    
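    **Deployment Triggers**:
    - Push to main branch (tests, then deploy)
    - Pull requests against main (tests only)
    - Manual dispatch or scheduled runs, once `workflow_dispatch` / `schedule` triggers are added under `on:`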
    """)

print("✅ Production configuration files defined!")
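
The cell above only builds the `project_files` dictionary and prints line counts; nothing is written to disk. A minimal sketch of a writer follows, assuming the dictionary maps relative paths to file contents as it does above. The function name `write_project_files` and the `root` parameter are illustrative, not part of the course utilities.

```python
from pathlib import Path
from typing import Dict, List


def write_project_files(files: Dict[str, str], root: str = ".") -> List[Path]:
    """Write each {relative_path: content} entry under root, creating folders."""
    written = []
    for relative_name, content in files.items():
        target = Path(root) / relative_name
        # Nested paths such as .streamlit/ or .github/workflows/ need their folders
        target.parent.mkdir(parents=True, exist_ok=True)
        # Drop the leading/trailing blank lines left by the triple-quoted strings
        target.write_text(content.strip() + "\n", encoding="utf-8")
        written.append(target)
    return written
```

Running `write_project_files(project_files, "olist-bi")` would create the full project layout in an `olist-bi` folder. Since the dictionary includes `.streamlit/secrets.toml`, that path should be listed in `.gitignore` before the folder is committed.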

## 🔍 Section 3: Monitoring and Optimization (5 minutes)

Let's implement comprehensive monitoring and performance optimization:

In [None]:
%%writefile monitoring_optimization_guide.py

import streamlit as st
import pandas as pd
import numpy as np
import time
import logging
from datetime import datetime, timedelta
from typing import Dict, List
import psutil
import gc

st.title("🔍 Production Monitoring & Optimization Guide")
st.markdown("**Enterprise-grade monitoring and performance optimization strategies**")

# Performance monitoring class
class ProductionMonitor:
    """
    Comprehensive production monitoring system.
    """
    
    def __init__(self):
        if 'monitoring_data' not in st.session_state:
            st.session_state.monitoring_data = {
                'response_times': [],
                'memory_usage': [],
                'cpu_usage': [],
                'user_sessions': [],
                'error_logs': [],
                'cache_hits': 0,
                'cache_misses': 0
            }
    
    def log_performance_metric(self, metric_type: str, value: float, metadata: Dict = None):
        """
        Log performance metrics with timestamp.
        """
        timestamp = datetime.now()
        
        metric_entry = {
            'timestamp': timestamp,
            'value': value,
            'metadata': metadata or {}
        }
        
        if metric_type in st.session_state.monitoring_data:
            st.session_state.monitoring_data[metric_type].append(metric_entry)
            
            # Keep only last 1000 entries to prevent memory bloat
            if len(st.session_state.monitoring_data[metric_type]) > 1000:
                st.session_state.monitoring_data[metric_type] = \
                    st.session_state.monitoring_data[metric_type][-1000:]
    
    def get_system_metrics(self) -> Dict:
        """
        Get current system performance metrics.
        """
        try:
            # Get system metrics
            cpu_percent = psutil.cpu_percent(interval=0.1)
            memory = psutil.virtual_memory()
            disk = psutil.disk_usage('/')
            
            return {
                'cpu_percent': cpu_percent,
                'memory_percent': memory.percent,
                'memory_available_gb': memory.available / (1024**3),
                'disk_percent': disk.percent,
                'disk_free_gb': disk.free / (1024**3),
                'timestamp': datetime.now()
            }
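        except Exception as e:
            # Fallback for environments where psutil might not work.
            # The values below are simulated placeholders, not real measurements.
            logging.getLogger(__name__).warning(
                f"psutil metrics unavailable, showing simulated values: {e}"
            )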
            return {
                'cpu_percent': 45.2,  # Simulated values
                'memory_percent': 62.8,
                'memory_available_gb': 2.1,
                'disk_percent': 78.5,
                'disk_free_gb': 15.7,
                'timestamp': datetime.now()
            }
    
    def track_cache_performance(self, hit: bool = True):
        """
        Track cache hit/miss rates.
        """
        if hit:
            st.session_state.monitoring_data['cache_hits'] += 1
        else:
            st.session_state.monitoring_data['cache_misses'] += 1
    
    def get_cache_metrics(self) -> Dict:
        """
        Get cache performance metrics.
        """
        hits = st.session_state.monitoring_data['cache_hits']
        misses = st.session_state.monitoring_data['cache_misses']
        total = hits + misses
        
        hit_rate = (hits / total * 100) if total > 0 else 0
        
        return {
            'hit_rate': hit_rate,
            'total_requests': total,
            'cache_hits': hits,
            'cache_misses': misses
        }

# Initialize monitor
monitor = ProductionMonitor()

# Real-time monitoring dashboard
st.subheader("📊 Real-time System Monitoring")

# Get current system metrics
system_metrics = monitor.get_system_metrics()
cache_metrics = monitor.get_cache_metrics()

# System metrics display
sys_col1, sys_col2, sys_col3, sys_col4 = st.columns(4)

with sys_col1:
    cpu_color = "🟢" if system_metrics['cpu_percent'] < 70 else "🟡" if system_metrics['cpu_percent'] < 90 else "🔴"
    st.metric(
        f"{cpu_color} CPU Usage",
        f"{system_metrics['cpu_percent']:.1f}%",
        "Current"
    )

with sys_col2:
    mem_color = "🟢" if system_metrics['memory_percent'] < 70 else "🟡" if system_metrics['memory_percent'] < 85 else "🔴"
    st.metric(
        f"{mem_color} Memory Usage",
        f"{system_metrics['memory_percent']:.1f}%",
        f"{system_metrics['memory_available_gb']:.1f}GB free"
    )

with sys_col3:
    disk_color = "🟢" if system_metrics['disk_percent'] < 80 else "🟡" if system_metrics['disk_percent'] < 90 else "🔴"
    st.metric(
        f"{disk_color} Disk Usage",
        f"{system_metrics['disk_percent']:.1f}%",
        f"{system_metrics['disk_free_gb']:.1f}GB free"
    )

with sys_col4:
    cache_color = "🟢" if cache_metrics['hit_rate'] > 80 else "🟡" if cache_metrics['hit_rate'] > 60 else "🔴"
    st.metric(
        f"{cache_color} Cache Hit Rate",
        f"{cache_metrics['hit_rate']:.1f}%",
        f"{cache_metrics['total_requests']} requests"
    )

# Performance optimization guides
st.markdown("---")
st.subheader("⚡ Performance Optimization Strategies")

opt_tabs = st.tabs(["🚀 Caching", "💾 Memory", "📊 Database", "🔍 Monitoring", "🛡️ Security"])

with opt_tabs[0]:
    st.markdown("""
    ### 🚀 Advanced Caching Strategies
    
    **1. Multi-level Caching**
    ```python
    # Application-level cache
    @st.cache_data(ttl=300)  # 5 minutes
    def load_dashboard_data():
        return expensive_query()
    
    # Resource-level cache
    @st.cache_resource
    def get_database_connection():
        return create_connection_pool()
    
    # Per-user cache: user_id is part of the cache key; entries expire after 10 minutes
    @st.cache_data(ttl=600)
    def load_user_specific_data(user_id: str):
        return query_user_data(user_id)
    ```
    
    **2. Cache Warming**
    ```python
    # Pre-load critical data
    def warm_cache():
        load_dashboard_data()
        load_user_specific_data('default')
        logger.info("Cache warmed successfully")
    
    # Run once per user session (the cached results are shared across sessions)
    if 'cache_warmed' not in st.session_state:
        warm_cache()
        st.session_state.cache_warmed = True
    ```
    
    **3. Smart Cache Invalidation**
    ```python
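    def invalidate_user_cache(user_id: str):
        # Clears every cached entry of load_user_specific_data (all users).
        # st.cache_data.clear() would instead wipe ALL cached functions in the app.
        load_user_specific_data.clear()
        logger.info(f"User data cache invalidated (triggered for user: {user_id})")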
    ```
    """)

with opt_tabs[1]:
    st.markdown("""
    ### 💾 Memory Management
    
    **1. DataFrame Optimization**
    ```python
    # Use appropriate data types
    def optimize_dataframe(df: pd.DataFrame) -> pd.DataFrame:
        # Convert object columns to category for repeated strings
        for col in df.select_dtypes(include=['object']).columns:
            if df[col].nunique() / len(df) < 0.5:
                df[col] = df[col].astype('category')
        
        # Downcast numeric types
        for col in df.select_dtypes(include=['int64']).columns:
            df[col] = pd.to_numeric(df[col], downcast='integer')
        
        for col in df.select_dtypes(include=['float64']).columns:
            df[col] = pd.to_numeric(df[col], downcast='float')
        
        return df
    ```
    
    **2. Memory Monitoring**
    ```python
    import tracemalloc
    
    def monitor_memory_usage():
        tracemalloc.start()
        
        # Your code here
        
        current, peak = tracemalloc.get_traced_memory()
        logger.info(f"Memory usage: {current / 1024 / 1024:.1f} MB")
        logger.info(f"Peak memory: {peak / 1024 / 1024:.1f} MB")
        
        tracemalloc.stop()
    ```
    
    **3. Garbage Collection**
    ```python
    import gc
    
    def cleanup_memory():
        # Force garbage collection
        collected = gc.collect()
        logger.info(f"Garbage collected: {collected} objects")
    
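    # Run cleanup every 100 script reruns in this session
    st.session_state['request_count'] = st.session_state.get('request_count', 0) + 1
    if st.session_state['request_count'] % 100 == 0:
        cleanup_memory()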
    ```
    """)

with opt_tabs[2]:
    st.markdown("""
    ### 📊 Database Optimization
    
    **1. Connection Pooling**
    ```python
    from sqlalchemy import create_engine
    from sqlalchemy.pool import QueuePool
    
    @st.cache_resource
    def get_database_engine():
        return create_engine(
            DATABASE_URL,
            poolclass=QueuePool,
            pool_size=10,
            max_overflow=20,
            pool_pre_ping=True,
            pool_recycle=3600
        )
    ```
    
    **2. Query Optimization**
    ```python
    from sqlalchemy import text
    
    # Use pagination for large datasets.
    # Bound parameters (:name) replace f-string interpolation to avoid SQL injection.
    def load_paginated_data(page: int = 1, page_size: int = 1000):
        offset = (page - 1) * page_size
        query = text('''
        SELECT * FROM orders
        ORDER BY order_date DESC
        LIMIT :page_size OFFSET :offset
        ''')
        return pd.read_sql(
            query, get_database_engine(),
            params={'page_size': page_size, 'offset': offset}
        )
    
    # Use database-level aggregation
    def get_daily_metrics(start_date: str, end_date: str):
        query = text('''
        SELECT
            DATE(order_date) as date,
            COUNT(*) as orders,
            SUM(total_amount) as revenue,
            AVG(customer_satisfaction) as satisfaction
        FROM orders
        WHERE order_date BETWEEN :start_date AND :end_date
        GROUP BY DATE(order_date)
        ORDER BY date
        ''')
        return pd.read_sql(
            query, get_database_engine(),
            params={'start_date': start_date, 'end_date': end_date}
        )
    ```
    
    **3. Index Optimization**
    ```sql
    -- Create indexes for common queries
    CREATE INDEX idx_orders_date ON orders(order_date);
    CREATE INDEX idx_orders_customer ON orders(customer_id);
    CREATE INDEX idx_orders_status ON orders(status);
    
    -- Composite index for common filter combinations
    CREATE INDEX idx_orders_date_status ON orders(order_date, status);
    ```
    """)

with opt_tabs[3]:
    st.markdown("""
    ### 🔍 Production Monitoring Setup
    
    **1. Application Performance Monitoring (APM)**
    ```python
    import logging
    
    import sentry_sdk
    from sentry_sdk.integrations.logging import LoggingIntegration
    
    # Initialize Sentry for error tracking
    # (in production, load the DSN from secrets rather than hardcoding it)
    sentry_sdk.init(
        dsn="your-sentry-dsn",
        integrations=[LoggingIntegration(level=logging.INFO)],
        traces_sample_rate=0.1,  # trace 10% of requests
        environment="production"
    )
    ```
    
    **2. Custom Metrics**
    ```python
    # Track business metrics
    def track_business_metric(metric_name: str, value: float, tags: Optional[Dict] = None):
        # Send to monitoring service (Datadog, New Relic, etc.)
        logger.info(f"Metric: {metric_name} = {value}", extra={
            'metric_name': metric_name,
            'metric_value': value,
            'tags': tags or {}
        })
    
    # Usage
    track_business_metric('dashboard.load_time', 1.23, {'page': 'revenue'})
    track_business_metric('user.session_duration', 456.7, {'user_type': 'admin'})
    ```
    
    **3. Health Checks**
    ```python
    def health_check_endpoint():
        checks = {
            'database': test_database_connection(),
            'cache': test_cache_connection(),
            'memory': get_memory_usage() < 0.9,
            'disk': get_disk_usage() < 0.9
        }
        
        all_healthy = all(checks.values())
        
        return {
            'status': 'healthy' if all_healthy else 'unhealthy',
            'checks': checks,
            'timestamp': datetime.now().isoformat()
        }
    ```
    """)

with opt_tabs[4]:
    st.markdown("""
    ### 🛡️ Security and Compliance
    
    **1. Security Headers**
    ```python
    # Add security headers (if using custom server)
    def add_security_headers():
        headers = {
            'X-Content-Type-Options': 'nosniff',
            'X-Frame-Options': 'DENY',
            'X-XSS-Protection': '1; mode=block',  # legacy header; modern browsers ignore it and rely on CSP
            'Strict-Transport-Security': 'max-age=31536000; includeSubDomains',
            'Content-Security-Policy': "default-src 'self'"
        }
        return headers
    ```
    
    **2. Audit Logging**
    ```python
    def audit_log(action: str, user_id: str, resource: str, details: Optional[Dict] = None):
        audit_entry = {
            'timestamp': datetime.now().isoformat(),
            'action': action,
            'user_id': user_id,
            'resource': resource,
            'details': details or {},
            'ip_address': get_client_ip(),
            'user_agent': get_user_agent()
        }
        
        # Send to secure logging service
        logger.info("AUDIT", extra=audit_entry)
    
    # Usage
    audit_log('data_export', user_id, 'customer_data', {
        'export_format': 'csv',
        'record_count': 1000
    })
    ```
    
    **3. Data Privacy**
    ```python
    import hashlib
    
    def anonymize_sensitive_data(df: pd.DataFrame, columns: List[str]) -> pd.DataFrame:
        '''Replace sensitive columns with truncated hashes for display.'''
        anonymized_df = df.copy()
        
        for col in columns:
            if col in anonymized_df.columns:
                # Unsalted hashes are pseudonymous, not anonymous: low-entropy
                # values (e.g. phone numbers) can be recovered by brute force
                anonymized_df[col] = anonymized_df[col].apply(
                    lambda x: hashlib.sha256(str(x).encode()).hexdigest()[:8]
                )
        
        return anonymized_df
    ```
    """)

# Performance testing demo
st.markdown("---")
st.subheader("🧪 Performance Testing Demo")

test_col1, test_col2 = st.columns(2)

with test_col1:
    st.markdown("**Load Test Simulation**")
    
    if st.button("🚀 Run Load Test"):
        with st.spinner("Running load test..."):
            # Simulate load test
            for i in range(5):
                start_time = time.time()
                
                # Simulate query
                time.sleep(np.random.uniform(0.1, 0.5))
                
                response_time = time.time() - start_time
                monitor.log_performance_metric('response_times', response_time)
                
                # Simulate cache hit/miss
                cache_hit = np.random.choice([True, False], p=[0.8, 0.2])
                monitor.track_cache_performance(cache_hit)
        
        st.success("✅ Load test completed!")

with test_col2:
    st.markdown("**Memory Usage Test**")
    
    if st.button("💾 Test Memory Usage"):
        with st.spinner("Testing memory usage..."):
            # Create large DataFrame to test memory
            large_df = pd.DataFrame({
                'col1': np.random.randn(100000),
                'col2': np.random.randn(100000),
                'col3': np.random.choice(['A', 'B', 'C'], 100000)
            })
            
            # Optimize DataFrame
            large_df['col3'] = large_df['col3'].astype('category')
            
            memory_usage = large_df.memory_usage(deep=True).sum() / 1024 / 1024
            monitor.log_performance_metric('memory_usage', memory_usage)
            
            # Clean up
            del large_df
            gc.collect()
        
        st.success(f"✅ Memory test completed! DataFrame size after optimization: {memory_usage:.2f} MB")

# Display monitoring results
if st.session_state.monitoring_data['response_times']:
    st.markdown("---")
    st.subheader("📈 Performance Metrics")
    
    # Response time chart
    response_times = [m['value'] for m in st.session_state.monitoring_data['response_times']]
    timestamps = [m['timestamp'] for m in st.session_state.monitoring_data['response_times']]
    
    metrics_df = pd.DataFrame({
        'timestamp': timestamps,
        'response_time': response_times
    })
    
    import plotly.express as px
    
    fig = px.line(
        metrics_df,
        x='timestamp',
        y='response_time',
        title="Response Time Trend",
        labels={'response_time': 'Response Time (seconds)'}
    )
    
    st.plotly_chart(fig, use_container_width=True)

print("✅ Monitoring and optimization guide complete!")
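
The traffic-light thresholds in the system metrics columns above repeat the same nested conditional four times. As a minimal sketch, they could be factored into one helper. The function name `status_icon` and its parameters are hypothetical and not part of the course code; the thresholds in the usage lines are the ones the dashboard already uses for memory and cache hit rate.

```python
def status_icon(value: float, warn: float, crit: float,
                higher_is_better: bool = False) -> str:
    """Return a traffic-light emoji for a metric value."""
    if higher_is_better:
        # e.g. cache hit rate: green above `warn`, yellow above `crit`
        if value > warn:
            return "🟢"
        return "🟡" if value > crit else "🔴"
    # e.g. CPU, memory, disk: green below `warn`, yellow below `crit`
    if value < warn:
        return "🟢"
    return "🟡" if value < crit else "🔴"


# Same thresholds as the dashboard code above (sample values are made up)
mem_color = status_icon(72.5, warn=70, crit=85)
cache_color = status_icon(91.0, warn=80, crit=60, higher_is_better=True)
print(mem_color, cache_color)
```

Keeping the comparison logic in one place makes it harder for the four columns to drift apart when thresholds change.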

## 🎯 Key Takeaways

✅ **Production Deployment**: Complete Streamlit Cloud deployment workflow with configuration  
✅ **Environment Management**: Secure secrets handling and environment-specific configurations  
✅ **Performance Optimization**: Caching strategies, memory management, and database optimization  
✅ **Monitoring & Alerting**: Comprehensive application and system monitoring setup  
✅ **CI/CD Pipeline**: Automated testing and deployment with GitHub Actions  

## 🚀 Deployment Checklist

Before deploying to production, ensure you have:

### Pre-Deployment
- [ ] Created `requirements.txt` with pinned versions
- [ ] Configured `.streamlit/config.toml` for production
- [ ] Set up secrets management in Streamlit Cloud
- [ ] Implemented error handling and logging
- [ ] Added performance monitoring
- [ ] Created health check endpoints
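
The first checklist item (pinned versions) is easy to verify automatically. The sketch below flags requirement lines that are not pinned with `==`. The helper name `find_unpinned`, the regular expression, and the sample package versions are illustrative assumptions; it only handles simple `name==version` lines, not every form pip accepts.

```python
import re

# Matches "package==1.2.3", optionally with extras such as "pkg[extra]==1.0"
PINNED = re.compile(r"^[A-Za-z0-9_.\-]+(\[[A-Za-z0-9_,\-\s]+\])?==[^=\s]+")


def find_unpinned(requirement_lines):
    """Return the requirement lines that are not pinned with '=='."""
    unpinned = []
    for raw in requirement_lines:
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or line.startswith("-"):  # skip blanks and pip options like "-r"
            continue
        if not PINNED.match(line):
            unpinned.append(line)
    return unpinned


# Illustrative input; in practice, read the lines from requirements.txt
sample = ["streamlit==1.35.0", "pandas>=2.0", "plotly", "# dev tools", "numpy==1.26.4"]
print(find_unpinned(sample))
```

A check like this could run as a step in the CI pipeline so that an unpinned dependency fails the build before deployment.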

### Security
- [ ] Enabled XSRF protection
- [ ] Implemented input validation
- [ ] Set up audit logging
- [ ] Configured secure database connections
- [ ] Added rate limiting (if needed)
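
For the rate-limiting item, one possible approach is a sliding-window limiter keyed by user. This is a minimal in-memory sketch, not a Streamlit feature: the class name `SlidingWindowRateLimiter` and the key `"analyst_1"` are hypothetical, and the limiter is not thread-safe as written.

```python
import time
from collections import defaultdict, deque


class SlidingWindowRateLimiter:
    """Allow at most `max_calls` per `window_seconds` for each key (e.g. a user ID)."""

    def __init__(self, max_calls: int, window_seconds: float, clock=time.monotonic):
        self.max_calls = max_calls
        self.window_seconds = window_seconds
        self._clock = clock  # injectable so the limiter can be tested without sleeping
        self._calls = defaultdict(deque)

    def allow(self, key: str) -> bool:
        now = self._clock()
        calls = self._calls[key]
        # Drop timestamps that have fallen out of the window
        while calls and now - calls[0] >= self.window_seconds:
            calls.popleft()
        if len(calls) >= self.max_calls:
            return False
        calls.append(now)
        return True


limiter = SlidingWindowRateLimiter(max_calls=3, window_seconds=60)
results = [limiter.allow("analyst_1") for _ in range(4)]
print(results)  # the fourth call inside the window is rejected
```

In a Streamlit app, the limiter instance would need to be shared across sessions (for example by returning it from a function decorated with `st.cache_resource`) and guarded with a lock, since each session runs in its own thread.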

### Performance
- [ ] Implemented caching strategies
- [ ] Optimized database queries
- [ ] Added memory management
- [ ] Set up connection pooling
- [ ] Configured auto-scaling (if applicable)

### Monitoring
- [ ] Set up error tracking (Sentry)
- [ ] Configured performance monitoring
- [ ] Added business metrics tracking
- [ ] Set up alerting for critical issues
- [ ] Created monitoring dashboards
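
Alerting for critical issues usually comes down to comparing current metrics against thresholds. The sketch below shows the idea under stated assumptions: `AlertRule` and `evaluate_alerts` are hypothetical names, the memory and disk thresholds mirror the red levels used in the dashboard, and the 2-second response-time rule mirrors the load-time target from the production requirements. Sending the messages to a real channel (email, Slack, a pager) is left out.

```python
from dataclasses import dataclass


@dataclass
class AlertRule:
    metric: str
    threshold: float
    severity: str  # "warning" or "critical"


def evaluate_alerts(metrics: dict, rules: list) -> list:
    """Return a message for every rule whose metric exceeds its threshold."""
    alerts = []
    for rule in rules:
        value = metrics.get(rule.metric)
        # Metrics that were not collected are skipped rather than treated as zero
        if value is not None and value > rule.threshold:
            alerts.append(
                f"[{rule.severity.upper()}] {rule.metric}={value} exceeds {rule.threshold}"
            )
    return alerts


rules = [
    AlertRule("memory_percent", 85, "critical"),
    AlertRule("disk_percent", 90, "critical"),
    AlertRule("response_time_s", 2.0, "warning"),
]
# Made-up sample readings for illustration
current = {"memory_percent": 91.3, "disk_percent": 40.0, "response_time_s": 2.4}
for message in evaluate_alerts(current, rules):
    print(message)
```

Keeping the rules as data makes it straightforward to review thresholds alongside the checklist and to adjust them per environment.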

---

## 🎓 Week 9 Complete!

Congratulations! You've now mastered:

**Wednesday - Fundamentals:**
- Streamlit setup and architecture
- Interactive widgets and user interfaces
- Data visualization with Plotly integration

**Thursday - Advanced:**
- Multi-page business dashboard architecture
- Advanced database integration with Supabase
- Production deployment and monitoring

## 🔜 Next Steps: Month 4 Capstone Project

You're now ready to build professional business intelligence applications for your capstone project:

- **Project Scope**: 6-week comprehensive BI application
- **Technology Stack**: Streamlit + Supabase + Plotly
- **Deployment**: Production-ready application on Streamlit Cloud
- **Business Focus**: Real-world e-commerce analytics with Olist data

---

## 💼 Final Assignment: Major Group Project

**Create a production-ready business intelligence application:**

### Requirements:
1. **Multi-page Application**: 4-5 pages serving different stakeholder needs
2. **Live Data Integration**: Real-time Supabase connectivity
3. **Professional Design**: Enterprise-grade UI/UX
4. **Full Deployment**: Live application on Streamlit Cloud
5. **Team Presentation**: 15-minute demo to class

### Deliverables:
- Complete Streamlit application source code
- Production deployment with live URL
- Technical documentation
- Business presentation highlighting key insights

### Due: End of Week 10

**This project will serve as your portfolio piece for Month 4 capstone preparation!**

---

*Congratulations on completing Week 9! You're now ready to build enterprise-grade business intelligence applications.* 🎉