# Week 10: Final Project - Comprehensive Seoul Heatwave Analysis System

**Instructor**: Sohn Chul

## Project Overview

In this final project, you will integrate all the skills and knowledge gained throughout the course to build a comprehensive heatwave analysis and prediction system for Seoul. This system will:

1. Process real S-DoT sensor data
2. Calculate KMA heat index values
3. Perform spatial-temporal analysis
4. Identify urban heat islands
5. Build predictive models
6. Create interactive visualizations
7. Generate automated reports
8. Provide real-time monitoring capabilities

## Learning Objectives

By completing this project, you will demonstrate:
- Ability to work with real-world climate data
- Proficiency in applying the KMA heat index formula
- Skills in building end-to-end data science pipelines
- Capability to create production-ready analysis systems
- Understanding of climate change impact assessment

## Part 1: System Architecture and Setup

In [None]:
# Import all required libraries
import pandas as pd
import numpy as np
import geopandas as gpd
from shapely.geometry import Point
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.graph_objects as go
import plotly.express as px
from plotly.subplots import make_subplots
import folium
from folium import plugins
import streamlit as st

# Machine Learning
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestRegressor, IsolationForest
from sklearn.cluster import KMeans, DBSCAN
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score
import xgboost as xgb
import tensorflow as tf
from tensorflow import keras

# Time Series
from statsmodels.tsa.seasonal import seasonal_decompose
from statsmodels.tsa.statespace.sarimax import SARIMAX

# Utilities
import warnings
import json
import pickle
from datetime import datetime, timedelta
from pathlib import Path
import logging
from typing import Dict, List, Tuple, Optional

warnings.filterwarnings('ignore')

# Setup logging
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(levelname)s - %(message)s'
)

In [None]:
class SeoulHeatwaveAnalysisSystem:
    """
    Comprehensive Seoul Heatwave Analysis System
    Integrates all course components into a production-ready system.
    """
    
    def __init__(self, config_path: Optional[str] = None):
        """
        Initialize the analysis system.
        
        Parameters:
        config_path: Path to configuration file (JSON)
        """
        self.logger = logging.getLogger(self.__class__.__name__)
        self.config = self._load_config(config_path)
        self.data = None
        self.models = {}
        self.results = {}
        
    def _load_config(self, config_path: Optional[str]) -> Dict:
        """Load system configuration."""
        default_config = {
            'data_path': '../data/s-dot/',
            'output_path': '../outputs/',
            'model_path': '../models/',
            'heat_index_threshold': 33,
            'analysis_period': {'start': '2025-04-01', 'end': '2025-08-31'},
            'districts': ['Gangnam', 'Gangdong', 'Gangbuk', 'Gangseo', 'Gwanak',
                         'Gwangjin', 'Guro', 'Geumcheon', 'Nowon', 'Dobong',
                         'Dongdaemun', 'Dongjak', 'Mapo', 'Seodaemun', 'Seocho',
                         'Seongdong', 'Seongbuk', 'Songpa', 'Yangcheon', 'Yeongdeungpo',
                         'Yongsan', 'Eunpyeong', 'Jongno', 'Jung', 'Jungnang']
        }
        
        if config_path and Path(config_path).exists():
            with open(config_path, 'r') as f:
                user_config = json.load(f)
                default_config.update(user_config)
        
        return default_config
    
    @staticmethod
    def calculate_wet_bulb_temperature(Ta: np.ndarray, RH: np.ndarray) -> np.ndarray:
        """
        Calculate wet-bulb temperature using Stull's formula.
        
        Parameters:
        Ta: Air temperature (°C)
        RH: Relative humidity (%)
        
        Returns:
        Tw: Wet-bulb temperature (°C)
        """
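        # Stull (2011) empirical fit; intended for RH of roughly 5-99% and
        # Ta of roughly -20 to 50 °C at standard sea-level pressure.
        Tw = (Ta * np.arctan(0.151977 * (RH + 8.313659)**0.5) +
              np.arctan(Ta + RH) -
              np.arctan(RH - 1.676331) +
              0.00391838 * RH**1.5 * np.arctan(0.023101 * RH) -
              4.686035)
        return Tw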
    
    @staticmethod
    def calculate_heat_index_kma(Ta: np.ndarray, RH: np.ndarray) -> np.ndarray:
        """
        Calculate heat index using Korea Meteorological Administration (KMA) formula.
        
        Parameters:
        Ta: Air temperature (°C)
        RH: Relative humidity (%)
        
        Returns:
        HI: Heat index (°C)
        """
        Tw = SeoulHeatwaveAnalysisSystem.calculate_wet_bulb_temperature(Ta, RH)
        HI = (-0.2442 + 0.55399 * Tw + 0.45535 * Ta - 
              0.0022 * Tw**2 + 0.00278 * Tw * Ta + 3.0)
        return HI
    
    def load_data(self, data_source: str = 'synthetic') -> pd.DataFrame:
        """
        Load data from specified source.
        
        Parameters:
        data_source: 'synthetic' or 'real' (S-DoT data)
        
        Returns:
        DataFrame with loaded data
        """
        self.logger.info(f"Loading data from {data_source} source")
        
        if data_source == 'synthetic':
            self.data = self._generate_synthetic_data()
        else:
            self.data = self._load_sdot_data()
        
        # Calculate KMA heat index
        self.data['heat_index'] = self.calculate_heat_index_kma(
            self.data['temperature'].values,
            self.data['humidity'].values
        )
        
        self.logger.info(f"Data loaded: {len(self.data)} records")
        return self.data
    
    def _generate_synthetic_data(self) -> pd.DataFrame:
        """Generate synthetic data for demonstration."""
        np.random.seed(42)
        
        date_range = pd.date_range(
            start=self.config['analysis_period']['start'],
            end=self.config['analysis_period']['end'],
            freq='h'  # hourly; the uppercase 'H' alias is deprecated in pandas 2.2+
        )
        
        data_list = []
        for district in self.config['districts'][:10]:  # Use 10 districts for demo
            for date in date_range:
                hour = date.hour
                month = date.month
                
                # Simplified seasonal and diurnal temperature pattern (synthetic, not calibrated)
                base_temp = 15 + (month - 4) * 3
                daily_variation = 8 * np.sin((hour - 6) * np.pi / 12) if 6 <= hour <= 18 else -2
                temp = base_temp + daily_variation + np.random.normal(0, 2)
                
                # Humidity patterns
                base_humidity = 70 - (month - 4) * 5
                humidity = base_humidity - daily_variation * 2 + np.random.normal(0, 5)
                humidity = np.clip(humidity, 20, 95)
                
                # Air quality
                pm25 = 20 + np.random.exponential(10)
                pm10 = pm25 * 1.5 + np.random.normal(0, 5)
                
                data_list.append({
                    'timestamp': date,
                    'district': district,
                    'temperature': temp,
                    'humidity': humidity,
                    'pm25': pm25,
                    'pm10': pm10,
                    'lat': 37.5665 + np.random.uniform(-0.1, 0.1),
                    'lon': 126.9780 + np.random.uniform(-0.15, 0.15)
                })
        
        return pd.DataFrame(data_list)
    
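    def _load_sdot_data(self) -> pd.DataFrame:
        """Load real S-DoT sensor data (not implemented in this template)."""
        # Placeholder: a full implementation would read the CSV files under
        # self.config['data_path'] and return a DataFrame with the same columns
        # as _generate_synthetic_data(). Raising here avoids silently returning
        # None, which would make load_data() fail with a confusing TypeError.
        raise NotImplementedError("S-DoT loading is not implemented; use data_source='synthetic'")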
    
    def perform_eda(self) -> Dict:
        """Perform exploratory data analysis."""
        self.logger.info("Performing exploratory data analysis")
        
        eda_results = {
            'basic_stats': self.data.describe(),
            'missing_values': self.data.isnull().sum(),
            'heat_index_stats': {
                'mean': self.data['heat_index'].mean(),
                'max': self.data['heat_index'].max(),
                'min': self.data['heat_index'].min(),
                'std': self.data['heat_index'].std(),
                'danger_count': (self.data['heat_index'] >= self.config['heat_index_threshold']).sum()
            },
            'district_analysis': self.data.groupby('district')['heat_index'].agg(['mean', 'max', 'min', 'std']),
            'temporal_patterns': self._analyze_temporal_patterns(),
            'correlations': self.data[['temperature', 'humidity', 'heat_index', 'pm25', 'pm10']].corr()
        }
        
        self.results['eda'] = eda_results
        return eda_results
    
    def _analyze_temporal_patterns(self) -> Dict:
        """Analyze temporal patterns in the data."""
        patterns = {
            'hourly': self.data.groupby(self.data['timestamp'].dt.hour)['heat_index'].mean(),
            'daily': self.data.groupby(self.data['timestamp'].dt.date)['heat_index'].mean(),
            'monthly': self.data.groupby(self.data['timestamp'].dt.month)['heat_index'].mean(),
            'weekday': self.data.groupby(self.data['timestamp'].dt.dayofweek)['heat_index'].mean()
        }
        return patterns
    
    def identify_urban_heat_islands(self) -> pd.DataFrame:
        """Identify urban heat island areas."""
        self.logger.info("Identifying urban heat islands")
        
        # Calculate UHI intensity for each district
        district_avg = self.data.groupby('district')['heat_index'].mean()
        overall_avg = self.data['heat_index'].mean()
        
        uhi_intensity = district_avg - overall_avg
        uhi_df = pd.DataFrame({
            'district': uhi_intensity.index,
            'uhi_intensity': uhi_intensity.values,
            'classification': pd.cut(uhi_intensity.values, 
                                    bins=[-np.inf, -1, 0, 1, 2, np.inf],
                                    labels=['Cool Island', 'Neutral', 'Mild UHI', 
                                          'Moderate UHI', 'Severe UHI'])
        })
        
        self.results['uhi'] = uhi_df
        return uhi_df
    
    def build_prediction_models(self) -> Dict:
        """Build multiple prediction models."""
        self.logger.info("Building prediction models")
        
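        # Prepare features.
        # Note: heat_index is computed directly from temperature and humidity,
        # so same-hour values of both let a model reproduce the formula; the
        # scores below measure that fit, not forecasting skill.
        feature_cols = ['temperature', 'humidity', 'pm25', 'pm10']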
        self.data['hour'] = self.data['timestamp'].dt.hour
        self.data['month'] = self.data['timestamp'].dt.month
        self.data['day_of_week'] = self.data['timestamp'].dt.dayofweek
        
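        # Add lag features (sort first so shift() follows time order within each district)
        self.data = self.data.sort_values(['district', 'timestamp'])
        for lag in [1, 3, 6, 12, 24]:
            self.data[f'heat_index_lag_{lag}'] = self.data.groupby('district')['heat_index'].shift(lag)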
        
        # Remove NaN values from lag features
        data_clean = self.data.dropna()
        
        # Feature matrix
        X = data_clean[feature_cols + ['hour', 'month', 'day_of_week'] + 
                      [f'heat_index_lag_{lag}' for lag in [1, 3, 6, 12, 24]]]
        y = data_clean['heat_index']
        
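        # Split data. A random split mixes earlier and later hours, so with lag
        # features the test metrics are optimistic; a chronological split is stricter.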
        X_train, X_test, y_train, y_test = train_test_split(
            X, y, test_size=0.2, random_state=42
        )
        
        # Scale features
        scaler = StandardScaler()
        X_train_scaled = scaler.fit_transform(X_train)
        X_test_scaled = scaler.transform(X_test)
        
        # 1. Random Forest
        rf_model = RandomForestRegressor(n_estimators=100, random_state=42, n_jobs=-1)
        rf_model.fit(X_train, y_train)
        rf_pred = rf_model.predict(X_test)
        
        # 2. XGBoost
        xgb_model = xgb.XGBRegressor(
            n_estimators=100,
            learning_rate=0.1,
            max_depth=6,
            random_state=42
        )
        xgb_model.fit(X_train, y_train)
        xgb_pred = xgb_model.predict(X_test)
        
        # 3. Neural Network
        nn_model = keras.Sequential([
            keras.layers.Dense(64, activation='relu', input_shape=(X_train_scaled.shape[1],)),
            keras.layers.Dropout(0.2),
            keras.layers.Dense(32, activation='relu'),
            keras.layers.Dropout(0.2),
            keras.layers.Dense(16, activation='relu'),
            keras.layers.Dense(1)
        ])
        
        nn_model.compile(optimizer='adam', loss='mse', metrics=['mae'])
        nn_model.fit(
            X_train_scaled, y_train,
            epochs=50,
            batch_size=32,
            validation_split=0.2,
            verbose=0
        )
        nn_pred = nn_model.predict(X_test_scaled).flatten()
        
        # Store models
        self.models = {
            'random_forest': rf_model,
            'xgboost': xgb_model,
            'neural_network': nn_model,
            'scaler': scaler
        }
        
        # Evaluate models
        model_results = {
            'random_forest': {
                'rmse': np.sqrt(mean_squared_error(y_test, rf_pred)),
                'mae': mean_absolute_error(y_test, rf_pred),
                'r2': r2_score(y_test, rf_pred)
            },
            'xgboost': {
                'rmse': np.sqrt(mean_squared_error(y_test, xgb_pred)),
                'mae': mean_absolute_error(y_test, xgb_pred),
                'r2': r2_score(y_test, xgb_pred)
            },
            'neural_network': {
                'rmse': np.sqrt(mean_squared_error(y_test, nn_pred)),
                'mae': mean_absolute_error(y_test, nn_pred),
                'r2': r2_score(y_test, nn_pred)
            }
        }
        
        self.results['models'] = model_results
        return model_results
    
    def detect_anomalies(self) -> pd.DataFrame:
        """Detect anomalous heat events."""
        self.logger.info("Detecting anomalies")
        
        # Prepare data for anomaly detection
        anomaly_features = self.data[['temperature', 'humidity', 'heat_index']].dropna()
        
        # Isolation Forest
        iso_forest = IsolationForest(contamination=0.05, random_state=42)
        anomalies = iso_forest.fit_predict(anomaly_features)
        
        # Add anomaly labels to data
        anomaly_df = self.data.copy()
        anomaly_df.loc[anomaly_features.index, 'anomaly'] = anomalies
        anomaly_df['is_anomaly'] = anomaly_df['anomaly'] == -1
        
        # Analyze anomalies
        anomaly_stats = {
            'total_anomalies': anomaly_df['is_anomaly'].sum(),
            'anomaly_rate': anomaly_df['is_anomaly'].mean(),
            'anomaly_heat_index_mean': anomaly_df[anomaly_df['is_anomaly']]['heat_index'].mean(),
            'normal_heat_index_mean': anomaly_df[~anomaly_df['is_anomaly']]['heat_index'].mean()
        }
        
        self.results['anomalies'] = anomaly_stats
        return anomaly_df[anomaly_df['is_anomaly']]
    
    def generate_report(self, output_format: str = 'html') -> str:
        """Generate comprehensive analysis report."""
        self.logger.info(f"Generating {output_format} report")
        
        if output_format == 'html':
            return self._generate_html_report()
        elif output_format == 'pdf':
            return self._generate_pdf_report()
        else:
            return self._generate_json_report()
    
    def _generate_html_report(self) -> str:
        """Generate HTML report."""
        html_template = """
        <!DOCTYPE html>
        <html>
        <head>
            <title>Seoul Heatwave Analysis Report</title>
            <style>
                body {{ font-family: Arial, sans-serif; margin: 40px; }}
                h1 {{ color: #2c3e50; }}
                h2 {{ color: #34495e; margin-top: 30px; }}
                .metric {{ 
                    display: inline-block; 
                    padding: 15px; 
                    margin: 10px;
                    background: #ecf0f1; 
                    border-radius: 5px;
                }}
                .warning {{ background-color: #e74c3c; color: white; }}
                table {{ border-collapse: collapse; width: 100%; margin: 20px 0; }}
                th, td {{ border: 1px solid #ddd; padding: 12px; text-align: left; }}
                th {{ background-color: #3498db; color: white; }}
            </style>
        </head>
        <body>
            <h1>Seoul Heatwave Analysis Report</h1>
            <p>Generated: {timestamp}</p>
            <p>Analysis Period: {start_date} to {end_date}</p>
            <p>Instructor: Sohn Chul</p>
            
            <h2>Executive Summary</h2>
            <div class="metric">
                <strong>Average Heat Index:</strong> {avg_heat_index:.1f}°C
            </div>
            <div class="metric">
                <strong>Maximum Heat Index:</strong> {max_heat_index:.1f}°C
            </div>
            <div class="metric {danger_class}">
                <strong>Danger Events:</strong> {danger_count}
            </div>
            
            <h2>District Analysis</h2>
            {district_table}
            
            <h2>Model Performance</h2>
            {model_table}
            
            <h2>Urban Heat Islands</h2>
            {uhi_table}
            
            <h2>Recommendations</h2>
            <ul>
                <li>Implement cooling centers in high-risk districts</li>
                <li>Increase green space in severe UHI areas</li>
                <li>Deploy early warning systems based on predictive models</li>
                <li>Focus resources on vulnerable populations during peak hours</li>
            </ul>
            
            <footer>
                <p style="margin-top: 50px; font-size: 0.9em; color: #7f8c8d;">
                    Report generated using KMA Heat Index Formula | Seoul Heatwave Analysis System
                </p>
            </footer>
        </body>
        </html>
        """
        
        # Prepare data for the report
        heat_stats = self.results.get('eda', {}).get('heat_index_stats', {})
        
        # Format tables
        district_table = self.results.get('eda', {}).get('district_analysis', pd.DataFrame()).to_html()
        
        model_df = pd.DataFrame(self.results.get('models', {}))
        model_table = model_df.T.to_html()
        
        uhi_table = self.results.get('uhi', pd.DataFrame()).to_html(index=False)
        
        # Fill template
        report = html_template.format(
            timestamp=datetime.now().strftime('%Y-%m-%d %H:%M:%S'),
            start_date=self.config['analysis_period']['start'],
            end_date=self.config['analysis_period']['end'],
            avg_heat_index=heat_stats.get('mean', 0),
            max_heat_index=heat_stats.get('max', 0),
            danger_count=heat_stats.get('danger_count', 0),
            danger_class='warning' if heat_stats.get('danger_count', 0) > 100 else '',
            district_table=district_table,
            model_table=model_table,
            uhi_table=uhi_table
        )
        
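        # Save report (create the output directory if it does not exist yet)
        output_dir = Path(self.config['output_path'])
        output_dir.mkdir(parents=True, exist_ok=True)
        output_path = output_dir / f"report_{datetime.now().strftime('%Y%m%d_%H%M%S')}.html"
        with open(output_path, 'w') as f:
            f.write(report)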
        
        self.logger.info(f"Report saved to {output_path}")
        return str(output_path)
    
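    def _generate_pdf_report(self) -> str:
        """Generate PDF report (placeholder, not implemented)."""
        # A full implementation could use a library such as reportlab or weasyprint.
        raise NotImplementedError("PDF reports are not implemented; use 'html' or 'json'")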
    
    def _generate_json_report(self) -> str:
        """Generate JSON report."""
        report_data = {
            'metadata': {
                'generated': datetime.now().isoformat(),
                'analysis_period': self.config['analysis_period'],
                'instructor': 'Sohn Chul'
            },
            'results': self.results
        }
        
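        # Create the output directory if it does not exist yet
        output_dir = Path(self.config['output_path'])
        output_dir.mkdir(parents=True, exist_ok=True)
        output_path = output_dir / f"report_{datetime.now().strftime('%Y%m%d_%H%M%S')}.json"
        with open(output_path, 'w') as f:
            json.dump(report_data, f, indent=2, default=str)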
        
        return str(output_path)
    
    def save_models(self) -> None:
        """Save trained models to disk."""
        self.logger.info("Saving models")
        
        model_path = Path(self.config['model_path'])
        model_path.mkdir(parents=True, exist_ok=True)
        
        # Save scikit-learn models
        for name in ['random_forest', 'xgboost', 'scaler']:
            if name in self.models:
                with open(model_path / f"{name}.pkl", 'wb') as f:
                    pickle.dump(self.models[name], f)
        
        # Save neural network
        if 'neural_network' in self.models:
            self.models['neural_network'].save(model_path / 'neural_network.h5')
    
    def load_models(self) -> None:
        """Load saved models from disk."""
        self.logger.info("Loading models")
        
        model_path = Path(self.config['model_path'])
        
        # Load scikit-learn models
        for name in ['random_forest', 'xgboost', 'scaler']:
            pkl_path = model_path / f"{name}.pkl"
            if pkl_path.exists():
                with open(pkl_path, 'rb') as f:
                    self.models[name] = pickle.load(f)
        
        # Load neural network
        nn_path = model_path / 'neural_network.h5'
        if nn_path.exists():
            self.models['neural_network'] = keras.models.load_model(nn_path)
    
    def run_full_analysis(self) -> Dict:
        """Run complete analysis pipeline."""
        self.logger.info("Starting full analysis pipeline")
        
        # 1. Load data
        self.load_data('synthetic')
        
        # 2. Exploratory Data Analysis
        self.perform_eda()
        
        # 3. Identify Urban Heat Islands
        self.identify_urban_heat_islands()
        
        # 4. Build prediction models
        self.build_prediction_models()
        
        # 5. Detect anomalies
        self.detect_anomalies()
        
        # 6. Save models
        self.save_models()
        
        # 7. Generate report
        report_path = self.generate_report('html')
        
        self.logger.info("Analysis pipeline completed successfully")
        
        return {
            'status': 'success',
            'report_path': report_path,
            'results_summary': {
                'records_analyzed': len(self.data),
                'districts_covered': self.data['district'].nunique(),
                'danger_events': self.results['eda']['heat_index_stats']['danger_count'],
                'best_model': min(self.results['models'].items(), 
                                 key=lambda x: x[1]['rmse'])[0]
            }
        }
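
As a quick sanity check of the two formulas used by the class, the following standalone sketch recomputes the wet-bulb temperature and the KMA heat index for two readings. The function names `wet_bulb_stull` and `heat_index_kma` are illustrative stand-ins for the static methods of `SeoulHeatwaveAnalysisSystem`, and the input readings are made up.

```python
import numpy as np

def wet_bulb_stull(ta, rh):
    """Wet-bulb temperature (°C) from air temperature (°C) and RH (%), after Stull (2011)."""
    return (ta * np.arctan(0.151977 * np.sqrt(rh + 8.313659))
            + np.arctan(ta + rh)
            - np.arctan(rh - 1.676331)
            + 0.00391838 * rh**1.5 * np.arctan(0.023101 * rh)
            - 4.686035)

def heat_index_kma(ta, rh):
    """Heat index (°C) using the same polynomial as calculate_heat_index_kma."""
    tw = wet_bulb_stull(ta, rh)
    return (-0.2442 + 0.55399 * tw + 0.45535 * ta
            - 0.0022 * tw**2 + 0.00278 * tw * ta + 3.0)

# Two illustrative readings: a warm, moderately humid hour and a hot, humid hour
ta = np.array([30.0, 35.0])
rh = np.array([50.0, 70.0])

tw = wet_bulb_stull(ta, rh)
hi = heat_index_kma(ta, rh)

for t, r, w, h in zip(ta, rh, tw, hi):
    print(f"Ta={t:.1f} °C, RH={r:.0f} % -> Tw={w:.1f} °C, heat index={h:.1f} °C")
```

For the 35 °C / 70 % reading this gives a wet-bulb temperature of roughly 30 °C and a heat index of roughly 36 °C, which is above the default `heat_index_threshold` of 33.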

## Part 2: System Deployment and Testing

In [None]:
# Initialize and run the analysis system
system = SeoulHeatwaveAnalysisSystem()

# Run full analysis
results = system.run_full_analysis()

print("\n" + "="*50)
print("ANALYSIS COMPLETE")
print("="*50)
print(f"Report saved to: {results['report_path']}")
print(f"\nResults Summary:")
for key, value in results['results_summary'].items():
    print(f"  {key}: {value}")
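
`SeoulHeatwaveAnalysisSystem()` runs with its built-in defaults. To change settings, `_load_config` reads a JSON file and merges it over the defaults with `dict.update`, which is a shallow merge. The sketch below reproduces that merge in isolation; the file name `heatwave_config.json` and the override values are made up for illustration.

```python
import json
import tempfile
from pathlib import Path

# A subset of the defaults defined in _load_config
defaults = {
    'heat_index_threshold': 33,
    'analysis_period': {'start': '2025-04-01', 'end': '2025-08-31'},
}

# Hypothetical overrides: a stricter threshold and a later start date
overrides = {
    'heat_index_threshold': 31,
    'analysis_period': {'start': '2025-06-01'},  # note: no 'end' key
}

with tempfile.TemporaryDirectory() as tmp:
    config_path = Path(tmp) / 'heatwave_config.json'
    config_path.write_text(json.dumps(overrides, indent=2), encoding='utf-8')

    # Same steps as _load_config: read the file, then update the defaults
    merged = dict(defaults)
    merged.update(json.loads(config_path.read_text(encoding='utf-8')))

print(merged['heat_index_threshold'])  # 31
print(merged['analysis_period'])       # {'start': '2025-06-01'}
```

Because the merge is shallow, a partial `analysis_period` replaces the whole nested dictionary and drops `end`, so an override file should spell out both dates. Passing the file path as `SeoulHeatwaveAnalysisSystem(config_path=...)` applies the overrides.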

## Part 3: Interactive Dashboard Creation

In [None]:
# Create visualization dashboard
def create_comprehensive_dashboard(system):
    """Create comprehensive visualization dashboard."""
    
    # Create subplots
    fig = make_subplots(
        rows=3, cols=2,
        subplot_titles=(
            'Heat Index Time Series',
            'District Comparison',
            'Hourly Patterns',
            'Model Performance',
            'Urban Heat Islands',
            'Correlation Matrix'
        ),
        specs=[
            [{'type': 'scatter'}, {'type': 'bar'}],
            [{'type': 'scatter'}, {'type': 'bar'}],
            [{'type': 'bar'}, {'type': 'heatmap'}]
        ],
        vertical_spacing=0.1,
        horizontal_spacing=0.15
    )
    
    # 1. Time series
    daily_avg = system.data.groupby(system.data['timestamp'].dt.date)['heat_index'].mean()
    fig.add_trace(
        go.Scatter(x=daily_avg.index, y=daily_avg.values, 
                  mode='lines', name='Daily Avg Heat Index'),
        row=1, col=1
    )
    
    # 2. District comparison
    district_stats = system.results['eda']['district_analysis']
    fig.add_trace(
        go.Bar(x=district_stats.index, y=district_stats['mean'],
              name='Mean Heat Index'),
        row=1, col=2
    )
    
    # 3. Hourly patterns
    hourly_patterns = system.results['eda']['temporal_patterns']['hourly']
    fig.add_trace(
        go.Scatter(x=hourly_patterns.index, y=hourly_patterns.values,
                  mode='lines+markers', name='Hourly Pattern'),
        row=2, col=1
    )
    
    # 4. Model performance
    model_perf = pd.DataFrame(system.results['models'])
    fig.add_trace(
        go.Bar(x=model_perf.columns, y=model_perf.loc['rmse'],
              name='RMSE'),
        row=2, col=2
    )
    
    # 5. Urban Heat Islands
    uhi_data = system.results['uhi']
    fig.add_trace(
        go.Bar(x=uhi_data['district'], y=uhi_data['uhi_intensity'],
              name='UHI Intensity'),
        row=3, col=1
    )
    
    # 6. Correlation matrix
    corr_matrix = system.results['eda']['correlations']
    fig.add_trace(
        go.Heatmap(z=corr_matrix.values,
                  x=corr_matrix.columns,
                  y=corr_matrix.index,
                  colorscale='RdBu'),
        row=3, col=2
    )
    
    # Update layout
    fig.update_layout(
        title='Seoul Heatwave Analysis Dashboard - KMA Heat Index',
        height=1200,
        showlegend=False
    )
    
    return fig

# Create and display dashboard
dashboard = create_comprehensive_dashboard(system)
dashboard.show()

## Part 4: Real-time Monitoring System

In [None]:
class RealTimeMonitor:
    """
    Real-time monitoring system for Seoul heatwave conditions.
    """
    
    def __init__(self, system: SeoulHeatwaveAnalysisSystem):
        self.system = system
        self.alerts = []
        
    def check_current_conditions(self, current_data: Dict) -> Dict:
        """
        Check current conditions and generate alerts.
        
        Parameters:
        current_data: Dictionary with current temperature and humidity
        
        Returns:
        Dictionary with analysis results and alerts
        """
        # Calculate KMA heat index
        heat_index = self.system.calculate_heat_index_kma(
            current_data['temperature'],
            current_data['humidity']
        )
        
        # Determine alert level
        if heat_index < 25:
            alert_level = 'Safe'
            color = 'green'
        elif heat_index < 30:
            alert_level = 'Caution'
            color = 'yellow'
        elif heat_index < 33:
            alert_level = 'Warning'
            color = 'orange'
        else:
            alert_level = 'Danger'
            color = 'red'
        
        # Generate recommendations
        recommendations = self._generate_recommendations(heat_index)
        
        # Create an alert for every 'Danger' reading (heat index of 33 °C or above)
        if alert_level == 'Danger':
            alert = {
                'timestamp': datetime.now(),
                'heat_index': heat_index,
                'alert_level': alert_level,
                'district': current_data.get('district', 'Unknown'),
                'message': f"DANGER: Heat index {heat_index:.1f}°C exceeds safety threshold"
            }
            self.alerts.append(alert)
        
        return {
            'heat_index': heat_index,
            'alert_level': alert_level,
            'color': color,
            'recommendations': recommendations,
            'recent_alerts': self.alerts[-5:]  # Last 5 alerts
        }
    
    def _generate_recommendations(self, heat_index: float) -> List[str]:
        """Generate recommendations based on heat index."""
        recommendations = []
        
        if heat_index < 25:
            recommendations.append("Comfortable conditions for outdoor activities")
        elif heat_index < 30:
            recommendations.append("Stay hydrated during outdoor activities")
            recommendations.append("Take frequent breaks in shade")
        elif heat_index < 33:
            recommendations.append("Limit outdoor activities during peak hours")
            recommendations.append("Drink water frequently")
            recommendations.append("Wear light-colored, loose clothing")
            recommendations.append("Check on elderly neighbors")
        else:
            recommendations.append("AVOID outdoor activities")
            recommendations.append("Stay in air-conditioned spaces")
            recommendations.append("Drink water every 15-20 minutes")
            recommendations.append("Seek immediate medical attention for heat-related symptoms")
            recommendations.append("Check on vulnerable populations immediately")
        
        return recommendations
    
    def predict_next_hours(self, hours: int = 6) -> pd.DataFrame:
        """
        Predict heat index for the next specified hours.
        
        Parameters:
        hours: Number of hours to predict
        
        Returns:
        DataFrame with predictions
        """
        # This would use the trained models from the system
        # For demonstration, we'll create synthetic predictions
        
        current_time = datetime.now()
        predictions = []
        
        for h in range(1, hours + 1):
            future_time = current_time + timedelta(hours=h)
            
            # Simulate prediction (would use actual model)
            base_heat_index = 28 + np.random.normal(0, 3)
            hour_effect = 5 * np.sin((future_time.hour - 6) * np.pi / 12)
            predicted_heat_index = base_heat_index + hour_effect
            
            predictions.append({
                'time': future_time,
                'predicted_heat_index': predicted_heat_index,
                'confidence_lower': predicted_heat_index - 2,
                'confidence_upper': predicted_heat_index + 2
            })
        
        return pd.DataFrame(predictions)

# Test the monitoring system
monitor = RealTimeMonitor(system)

# Simulate current conditions
current_conditions = {
    'temperature': 35,
    'humidity': 70,
    'district': 'Gangnam'
}

# Check conditions
analysis = monitor.check_current_conditions(current_conditions)

print("\n" + "="*50)
print("REAL-TIME MONITORING RESULTS")
print("="*50)
print(f"Current Heat Index: {analysis['heat_index']:.1f}°C")
print(f"Alert Level: {analysis['alert_level']}")
print("\nRecommendations:")
for rec in analysis['recommendations']:
    print(f"  • {rec}")

# Get predictions
predictions = monitor.predict_next_hours(6)
print("\nNext 6 Hours Predictions:")
print(predictions[['time', 'predicted_heat_index']].to_string(index=False))
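
`check_current_conditions` classifies one reading at a time. When many sensors report at once, the same thresholds (25, 30, and 33 °C) can be applied to a whole column. The sketch below uses `pandas.cut` with `right=False` so that each bin includes its lower edge, matching the `<` comparisons in the monitor; the `readings` table and its values are invented for illustration.

```python
import numpy as np
import pandas as pd

# Made-up heat index readings for four districts
readings = pd.DataFrame({
    'district': ['Gangnam', 'Mapo', 'Nowon', 'Jongno'],
    'heat_index': [24.9, 27.3, 33.0, 36.4],
})

# Bin edges follow the monitor: <25 Safe, <30 Caution, <33 Warning, otherwise Danger
bins = [-np.inf, 25, 30, 33, np.inf]
labels = ['Safe', 'Caution', 'Warning', 'Danger']

# right=False makes the bins [low, high), so exactly 33.0 falls into 'Danger'
readings['alert_level'] = pd.cut(
    readings['heat_index'], bins=bins, labels=labels, right=False
)

danger = readings[readings['alert_level'] == 'Danger']
print(readings)
print("Districts in Danger:", danger['district'].tolist())
```

Keeping the edges in one list also makes it easier to move them into the configuration later, next to `heat_index_threshold`.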

## Part 5: Project Submission Guidelines

### Deliverables

Your final project submission should include:

1. **Code Repository**
   - Complete Python code for the analysis system
   - Well-documented functions and classes
   - `requirements.txt` file listing dependencies
   - README.md with setup instructions

2. **Data Analysis**
   - Processed S-DoT sensor data
   - KMA heat index calculations
   - Statistical analysis results
   - Anomaly detection findings

3. **Predictive Models**
   - Trained models (saved as .pkl or .h5 files)
   - Model evaluation metrics
   - Feature importance analysis
   - Cross-validation results

4. **Visualizations**
   - Interactive dashboard (Streamlit or similar)
   - Static analysis plots
   - Geospatial heat maps
   - Time series visualizations

5. **Final Report**
   - Executive summary
   - Methodology description
   - Key findings and insights
   - Policy recommendations
   - Future work suggestions
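
The "Processed S-DoT sensor data" deliverable requires replacing the placeholder `_load_sdot_data` method with a real loader. The sketch below shows one possible shape for it: read every CSV in a folder, rename columns, coerce types, and drop unusable rows. The function name `load_sdot_folder`, the raw column names in `COLUMN_MAP`, and the UTF-8 encoding are all assumptions; the real headers and file encoding must be checked against the downloaded files.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical mapping from raw export headers to the names the analysis expects
COLUMN_MAP = {
    'measured_at': 'timestamp',
    'gu': 'district',
    'temp_c': 'temperature',
    'rh_pct': 'humidity',
}

def load_sdot_folder(folder, column_map=COLUMN_MAP, encoding='utf-8'):
    """Read every CSV in `folder` and return one cleaned, sorted DataFrame."""
    files = sorted(Path(folder).glob('*.csv'))
    if not files:
        raise FileNotFoundError(f"No CSV files found in {folder}")

    frames = [pd.read_csv(f, encoding=encoding) for f in files]
    df = pd.concat(frames, ignore_index=True).rename(columns=column_map)

    # Coerce types; unparseable values become NaT/NaN instead of raising
    df['timestamp'] = pd.to_datetime(df['timestamp'], errors='coerce')
    for col in ('temperature', 'humidity'):
        df[col] = pd.to_numeric(df[col], errors='coerce')

    # Drop rows without a usable timestamp or reading, then impossible humidity values
    df = df.dropna(subset=['timestamp', 'temperature', 'humidity'])
    df = df[df['humidity'].between(0, 100)]
    return df.sort_values(['district', 'timestamp']).reset_index(drop=True)

# Demo with two tiny made-up files
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, 'a.csv').write_text(
        "measured_at,gu,temp_c,rh_pct\n"
        "2025-07-01 13:00,Mapo,31.2,65\n"
        "2025-07-01 12:00,Mapo,30.1,68\n",
        encoding='utf-8')
    Path(tmp, 'b.csv').write_text(
        "measured_at,gu,temp_c,rh_pct\n"
        "2025-07-01 12:00,Gangnam,32.0,140\n"   # humidity out of range
        "bad-date,Gangnam,29.5,70\n"            # unparseable timestamp
        "2025-07-01 13:00,Gangnam,33.4,60\n",
        encoding='utf-8')
    sdot = load_sdot_folder(tmp)

print(sdot)
```

Sorting by district and timestamp before returning matters for the lag features, since `shift()` works on row order rather than on time.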

### Evaluation Criteria

Your project will be evaluated based on:

1. **Technical Implementation (40%)**
   - Correct implementation of KMA heat index formula
   - Code quality and organization
   - Model performance and validation
   - System robustness and error handling

2. **Data Analysis (30%)**
   - Thoroughness of exploratory analysis
   - Insights from spatial-temporal patterns
   - Urban heat island identification
   - Anomaly detection effectiveness

3. **Visualization and Communication (20%)**
   - Quality of visualizations
   - Dashboard usability
   - Report clarity and completeness
   - Presentation of findings

4. **Innovation and Creativity (10%)**
   - Novel approaches or insights
   - Additional features beyond requirements
   - Real-world applicability
   - Scalability considerations
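
The criteria above weigh model validation, and the deliverables ask for cross-validation results and feature importances. Because the readings are hourly time series with lag features, a chronological scheme such as scikit-learn's `TimeSeriesSplit` avoids training on hours that come after the test hours. The sketch below is one way to do this on a small synthetic series; the data, the feature names, and the choice of five splits are illustrative assumptions, not a prescribed setup.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import TimeSeriesSplit

# Synthetic hourly heat index for 30 days: a daily cycle plus noise
rng = np.random.default_rng(42)
hours = pd.date_range('2025-07-01', periods=24 * 30, freq='h')
hour_of_day = hours.hour.to_numpy()
heat_index = 28 + 5 * np.sin((hour_of_day - 9) * np.pi / 12) + rng.normal(0, 1, len(hours))

df = pd.DataFrame({'heat_index': heat_index, 'hour': hour_of_day}, index=hours)
for lag in (1, 24):
    df[f'lag_{lag}'] = df['heat_index'].shift(lag)
df = df.dropna()

X = df[['hour', 'lag_1', 'lag_24']]
y = df['heat_index']

# Each fold trains on an initial stretch of the series and tests on the hours after it
rmse_per_fold = []
for train_idx, test_idx in TimeSeriesSplit(n_splits=5).split(X):
    model = RandomForestRegressor(n_estimators=50, random_state=42)
    model.fit(X.iloc[train_idx], y.iloc[train_idx])
    pred = model.predict(X.iloc[test_idx])
    rmse_per_fold.append(float(np.sqrt(mean_squared_error(y.iloc[test_idx], pred))))

# Feature importances from the last fold's model (trained on the longest window)
importances = pd.Series(model.feature_importances_, index=X.columns).sort_values(ascending=False)

print("RMSE per fold:", [round(r, 2) for r in rmse_per_fold])
print(importances)
```

Reporting the per-fold RMSE values, rather than a single score from a shuffled split, shows how stable the model is as the training window grows.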

## Conclusion

Congratulations on completing the Seoul Heatwave Analysis course! Through this final project, you have:

✅ Integrated all course components into a comprehensive system
✅ Applied the KMA heat index formula throughout the analysis
✅ Built production-ready data science pipelines
✅ Created predictive models for heatwave forecasting
✅ Developed interactive visualizations and dashboards
✅ Designed a real-time monitoring system

### Key Takeaways

1. **Climate Data Analysis**: You now have the skills to work with complex climate datasets
2. **Heat Index Calculation**: Mastery of the KMA formula and its applications
3. **Machine Learning**: Experience building and evaluating predictive models
4. **Visualization**: Ability to create compelling data stories
5. **System Design**: Understanding of end-to-end data science systems

### Future Directions

Consider extending this project by:
- Integrating real-time S-DoT sensor feeds
- Implementing mobile alert systems
- Adding social vulnerability indices
- Incorporating satellite imagery analysis
- Developing climate change projections

### Resources for Continued Learning

- Korea Meteorological Administration (KMA): http://www.kma.go.kr
- Seoul Open Data Plaza: https://data.seoul.go.kr
- Climate Change Knowledge Portal: https://climateknowledgeportal.worldbank.org
- IPCC Reports: https://www.ipcc.ch

Thank you for participating in this course. Your work contributes to a better understanding and mitigation of urban heatwaves in Seoul and beyond.

**Instructor: Sohn Chul**

---

*"The best way to predict the future is to create it."* - Peter Drucker

Good luck with your climate analysis journey!