# Week 4: Spatial Analysis of Heatwave Patterns
## Geographic Distribution and Hotspot Identification

**Instructor**: Sohn Chul

---

## 🎯 Learning Objectives

By the end of this session, you will be able to:
1. Create spatial visualizations of heat index data
2. Identify urban heat islands and hotspots
3. Perform spatial interpolation and analysis
4. Generate interactive maps with Folium
5. Analyze spatial patterns and clustering

## 1. Setup and Data Loading

In [None]:
# Import required libraries
import pandas as pd
import numpy as np
import geopandas as gpd
import folium
from folium import plugins
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import spatial
from scipy.interpolate import griddata
from shapely.geometry import Point
import warnings
warnings.filterwarnings('ignore')

# Set visualization style
plt.style.use('seaborn-v0_8-darkgrid')
sns.set_palette('RdYlBu_r')

print("✅ Libraries imported successfully!")

## 2. Load Sensor Location Data

In [None]:
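# Load sensor locations
try:
    # Load the actual sensor location file
    # (filename: "Seoul City Data Sensor (S-DoT) Environmental Information Installation Locations")
    # NOTE: the columns in this file may not be named serial_number/latitude/longitude/district;
    # rename them to those names before running the cells below.
    sensor_locations = pd.read_excel(
        '../../서울시 도시데이터 센서(S-DoT) 환경정보 설치 위치정보.xlsx',
        engine='openpyxl'
    )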
    print("✅ Sensor location data loaded successfully")
    print(f"Total sensors: {len(sensor_locations)}")
except FileNotFoundError:
    print("⚠️ Sensor location file not found. Creating sample data...")
    
    # Create sample sensor locations across Seoul
    np.random.seed(42)
    n_sensors = 100
    
    sensor_locations = pd.DataFrame({
        'serial_number': [f'S{i:04d}' for i in range(n_sensors)],
        'latitude': np.random.uniform(37.45, 37.65, n_sensors),
        'longitude': np.random.uniform(126.85, 127.15, n_sensors),
        'district': np.random.choice(
            ['Gangnam', 'Jongno', 'Mapo', 'Seodaemun', 'Yeongdeungpo', 
             'Dongdaemun', 'Gwanak', 'Nowon', 'Songpa', 'Gangdong'],
            n_sensors
        ),
        'installation_date': pd.date_range('2020-01-01', periods=n_sensors, freq='W')
    })
    print(f"✅ Created {n_sensors} sample sensor locations")

sensor_locations.head()

## 3. Create GeoDataFrame

In [None]:
# Convert to GeoDataFrame
geometry = [Point(xy) for xy in zip(sensor_locations.longitude, sensor_locations.latitude)]
gdf_sensors = gpd.GeoDataFrame(sensor_locations, geometry=geometry, crs='EPSG:4326')

print(f"✅ GeoDataFrame created with {len(gdf_sensors)} sensors")
print(f"CRS: {gdf_sensors.crs}")
print(f"\nBounds:")
print(f"  Min Lat: {gdf_sensors.latitude.min():.4f}")
print(f"  Max Lat: {gdf_sensors.latitude.max():.4f}")
print(f"  Min Lon: {gdf_sensors.longitude.min():.4f}")
print(f"  Max Lon: {gdf_sensors.longitude.max():.4f}")

## 4. Load and Merge Heat Index Data

In [None]:
# Generate synthetic heat index data for spatial analysis
# (a stand-in for real S-DoT measurements)
np.random.seed(42)

# Create heat index values with spatial correlation
# Simulate an urban heat island effect (higher temperatures in central areas)
center_lat = 37.55
center_lon = 127.0

# Euclidean distance in raw degrees: a simplification, since at Seoul's latitude
# one degree of longitude is shorter than one degree of latitude
distances = np.sqrt(
    (sensor_locations['latitude'] - center_lat)**2 + 
    (sensor_locations['longitude'] - center_lon)**2
)

# Heat index decreases with distance from the center (urban heat island):
# 35°C at the center, falling 0.5°C for every 0.01° of distance
base_heat_index = 35 - (distances * 50)
noise = np.random.normal(0, 2, len(sensor_locations))  # Add some randomness
heat_index_values = base_heat_index + noise

# Clip values to realistic range
heat_index_values = np.clip(heat_index_values, 25, 42)

# Add to GeoDataFrame
gdf_sensors['heat_index_avg'] = heat_index_values
gdf_sensors['heat_index_max'] = heat_index_values + np.random.uniform(2, 5, len(gdf_sensors))
gdf_sensors['heat_index_min'] = heat_index_values - np.random.uniform(2, 5, len(gdf_sensors))

print("✅ Heat index data added to sensors")
print(f"\nHeat Index Statistics:")
print(gdf_sensors[['heat_index_avg', 'heat_index_max', 'heat_index_min']].describe())
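The synthetic gradient above measures distance in raw degrees, which overweights longitude: at about 37.5°N a degree of longitude spans only cos(37.5°) ≈ 0.79 times the ground distance of a degree of latitude. A minimal sketch of a great-circle (haversine) distance in kilometres follows, assuming a spherical Earth with mean radius 6371 km; `haversine_km` is a hypothetical helper, not part of the notebook.

```python
import numpy as np

# Mean Earth radius in km; treating the Earth as a sphere is an approximation (assumption)
EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between points given in decimal degrees.

    Hypothetical helper; works elementwise on scalars, NumPy arrays, or pandas Series.
    """
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = np.sin(dlat / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))


# Same synthetic "urban core" as in the cell above
center_lat, center_lon = 37.55, 127.0

# Three test points: the center itself, 0.1° north of it, and 0.1° east of it
lats = np.array([37.55, 37.65, 37.55])
lons = np.array([127.0, 127.0, 127.1])
dist_km = haversine_km(lats, lons, center_lat, center_lon)
print(dist_km.round(2))
```

The 0.1° step east comes out noticeably shorter than the 0.1° step north. In the notebook, `haversine_km(sensor_locations['latitude'], sensor_locations['longitude'], center_lat, center_lon)` would return kilometres instead of degrees; the gradient coefficient (50 per degree) would then need rescaling, since a degree of latitude is roughly 111 km.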

## 5. Basic Spatial Visualization

In [None]:
# Create static map with heat index values
fig, ax = plt.subplots(1, 1, figsize=(12, 10))

# Plot sensors colored by heat index
scatter = ax.scatter(
    gdf_sensors['longitude'], 
    gdf_sensors['latitude'],
    c=gdf_sensors['heat_index_avg'],
    s=100,
    cmap='RdYlBu_r',
    alpha=0.7,
    edgecolors='black',
    linewidth=0.5
)

# Add colorbar
cbar = plt.colorbar(scatter, ax=ax, label='Average Heat Index (°C)')

# Labels and title
ax.set_xlabel('Longitude', fontsize=12)
ax.set_ylabel('Latitude', fontsize=12)
ax.set_title('Spatial Distribution of Heat Index Across Seoul', fontsize=14, fontweight='bold')

# Add grid
ax.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

## 6. Interactive Map with Folium

In [None]:
# Create base map centered on Seoul
seoul_center = [37.5665, 126.9780]
m = folium.Map(location=seoul_center, zoom_start=11, tiles='OpenStreetMap')

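# Add heat map layer
# Leaflet.heat treats a weight of 1.0 as maximum intensity by default, so rescale
# the heat index to [0, 1] before using it as the point weight
hi_min, hi_max = gdf_sensors['heat_index_avg'].min(), gdf_sensors['heat_index_avg'].max()
heat_data = [[row['latitude'], row['longitude'], (row['heat_index_avg'] - hi_min) / (hi_max - hi_min)]
             for idx, row in gdf_sensors.iterrows()]

plugins.HeatMap(heat_data, radius=15, blur=10, max_zoom=1).add_to(m)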

# Add sensor markers with popup information
for idx, row in gdf_sensors.iterrows():
    # Determine color based on heat index
    if row['heat_index_avg'] >= 40:
        color = 'red'
        icon = 'fire'
    elif row['heat_index_avg'] >= 35:
        color = 'orange'
        icon = 'exclamation-triangle'
    elif row['heat_index_avg'] >= 30:
        color = 'beige'  # folium.Icon has no 'yellow'; 'beige' is the closest supported color
        icon = 'sun'
    else:
        color = 'green'
        icon = 'cloud'
    
    popup_text = f"""
    <b>Sensor: {row['serial_number']}</b><br>
    District: {row.get('district', 'Unknown')}<br>
    Avg Heat Index: {row['heat_index_avg']:.1f}°C<br>
    Max Heat Index: {row['heat_index_max']:.1f}°C<br>
    Min Heat Index: {row['heat_index_min']:.1f}°C
    """
    
    folium.Marker(
        location=[row['latitude'], row['longitude']],
        popup=popup_text,
        icon=folium.Icon(color=color, icon=icon, prefix='fa'),
        tooltip=f"Sensor {row['serial_number']}: {row['heat_index_avg']:.1f}°C"
    ).add_to(m)

# Add title
title_html = '''
             <h3 align="center" style="font-size:20px"><b>Seoul Heat Index Distribution Map</b></h3>
             '''
m.get_root().html.add_child(folium.Element(title_html))

# Display map
print("✅ Interactive map created")
m

## 7. Spatial Interpolation

In [None]:
# Create grid for interpolation
grid_lat = np.linspace(gdf_sensors.latitude.min(), gdf_sensors.latitude.max(), 50)
grid_lon = np.linspace(gdf_sensors.longitude.min(), gdf_sensors.longitude.max(), 50)
grid_lon_mesh, grid_lat_mesh = np.meshgrid(grid_lon, grid_lat)

# Prepare points and values
points = gdf_sensors[['longitude', 'latitude']].values
values = gdf_sensors['heat_index_avg'].values

# Perform interpolation using different methods
fig, axes = plt.subplots(2, 2, figsize=(14, 12))

methods = ['nearest', 'linear', 'cubic', 'points']
titles = ['Nearest Neighbor', 'Linear Interpolation', 'Cubic Interpolation', 'Original Points']

# Shared color levels so the four panels are directly comparable
levels = np.linspace(values.min(), values.max(), 16)

for method, title, ax in zip(methods, titles, axes.flat):
    if method != 'points':
        # Interpolate onto the grid; 'linear' and 'cubic' return NaN outside the
        # convex hull of the sensors, which contourf leaves blank
        grid_values = griddata(points, values, (grid_lon_mesh, grid_lat_mesh), method=method)
        
        # Plot interpolated surface (extend='both' colors any cubic overshoot beyond the data range)
        im = ax.contourf(grid_lon_mesh, grid_lat_mesh, grid_values, levels=levels, cmap='RdYlBu_r', extend='both')
        ax.contour(grid_lon_mesh, grid_lat_mesh, grid_values, levels=levels[::2], colors='black', alpha=0.3, linewidths=0.5)
    else:
        # Plot original points on the same color scale
        im = ax.scatter(gdf_sensors['longitude'], gdf_sensors['latitude'],
                        c=gdf_sensors['heat_index_avg'], s=50, cmap='RdYlBu_r',
                        vmin=levels[0], vmax=levels[-1], edgecolors='black', linewidth=0.5)
    
    ax.set_xlabel('Longitude')
    ax.set_ylabel('Latitude')
    ax.set_title(title, fontweight='bold')
    ax.grid(True, alpha=0.3)
    
    # Add colorbar
    plt.colorbar(im, ax=ax, label='Heat Index (°C)')

plt.suptitle('Spatial Interpolation of Heat Index', fontsize=16, fontweight='bold')
plt.tight_layout()
plt.show()
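The panels above compare the methods visually. One way to compare them quantitatively is leave-one-out cross-validation: hold out each sensor, rebuild the surface from the others, and measure the error at the held-out location. The sketch below assumes `scipy.interpolate.griddata` as in the cell above; `loo_rmse` is a hypothetical helper, and the demo data are synthetic.

```python
import numpy as np
from scipy.interpolate import griddata


def loo_rmse(points, values, method):
    """Leave-one-out RMSE of scipy.interpolate.griddata for one method (hypothetical helper).

    Each point is held out in turn, the surface is rebuilt from the remaining points,
    and the prediction at the held-out location is compared with its true value.
    'linear' and 'cubic' return NaN for points outside the convex hull of the rest;
    those points are skipped, so the number of evaluated points is returned too.
    """
    errors = []
    for i in range(len(points)):
        keep = np.ones(len(points), dtype=bool)
        keep[i] = False
        pred = griddata(points[keep], values[keep], points[i:i + 1], method=method)[0]
        if not np.isnan(pred):
            errors.append(pred - values[i])
    errors = np.asarray(errors)
    return float(np.sqrt(np.mean(errors ** 2))), len(errors)


# Synthetic demo data (assumption): a smooth east-west gradient plus noise
rng = np.random.default_rng(0)
demo_points = rng.uniform(0, 1, size=(60, 2))
demo_values = 30 + 5 * demo_points[:, 0] + rng.normal(0, 0.5, 60)

results = {}
for method in ['nearest', 'linear', 'cubic']:
    rmse, n_used = loo_rmse(demo_points, demo_values, method)
    results[method] = (rmse, n_used)
    print(f"{method:8s} RMSE = {rmse:.3f} (evaluated at {n_used} of {len(demo_points)} points)")
```

On the notebook's own arrays this would be `loo_rmse(points, values, 'linear')`. Note that 'nearest' is scored on every sensor while 'linear' and 'cubic' skip sensors on the convex hull, so the RMSE values are computed on slightly different subsets.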

## 8. Hotspot Analysis

In [None]:
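# Identify hotspots (areas with high heat index)
threshold_extreme = 40   # Extreme heat
threshold_high = 35      # High heat
threshold_moderate = 30  # Moderate heat

# Classify sensors; right=False makes the bins [30, 35), [35, 40), ... so the
# boundaries match the >= thresholds used for the Folium markers
category_labels = ['Normal', 'Moderate', 'High', 'Extreme']
gdf_sensors['heat_category'] = pd.cut(
    gdf_sensors['heat_index_avg'],
    bins=[-np.inf, threshold_moderate, threshold_high, threshold_extreme, np.inf],
    labels=category_labels,
    right=False
)

# Count by category, kept in category order so the bar colors line up with the labels
category_counts = gdf_sensors['heat_category'].value_counts().reindex(category_labels, fill_value=0)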

# Visualize hotspot distribution
fig, axes = plt.subplots(1, 2, figsize=(14, 6))

# Bar chart of categories
colors = ['green', 'yellow', 'orange', 'red']
category_counts.plot(kind='bar', ax=axes[0], color=colors)
axes[0].set_title('Distribution of Heat Categories', fontweight='bold')
axes[0].set_xlabel('Heat Category')
axes[0].set_ylabel('Number of Sensors')
axes[0].grid(True, alpha=0.3)

# Spatial plot of hotspots
for category, color in zip(['Normal', 'Moderate', 'High', 'Extreme'], colors):
    mask = gdf_sensors['heat_category'] == category
    axes[1].scatter(
        gdf_sensors[mask]['longitude'],
        gdf_sensors[mask]['latitude'],
        c=color,
        label=category,
        s=100,
        alpha=0.7,
        edgecolors='black',
        linewidth=0.5
    )

axes[1].set_xlabel('Longitude')
axes[1].set_ylabel('Latitude')
axes[1].set_title('Spatial Distribution of Heat Categories', fontweight='bold')
axes[1].legend()
axes[1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

# Print summary
print("\n📊 Hotspot Analysis Summary:")
print("="*40)
for category in category_counts.index:
    count = category_counts[category]
    percentage = (count / len(gdf_sensors)) * 100
    print(f"{category:10s}: {count:3d} sensors ({percentage:5.1f}%)")
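The categories above classify each sensor on its own value. Whether high values actually sit next to each other (spatial autocorrelation) can be checked with global Moran's I. The sketch below assumes row-standardised k-nearest-neighbour weights and a permutation test; the helper names are hypothetical, the demo data are synthetic, and PySAL's `esda` package (for example `esda.Moran`) offers tested implementations.

```python
import numpy as np
from scipy.spatial import cKDTree


def knn_weights(coords, k=5):
    """Row-standardised k-nearest-neighbour weights matrix (dense; fine for ~hundreds of points)."""
    n = len(coords)
    tree = cKDTree(coords)
    # Ask for k+1 neighbours because each point's own location is returned as well
    _, idx = tree.query(coords, k=k + 1)
    W = np.zeros((n, n))
    for i in range(n):
        neighbours = [j for j in idx[i] if j != i][:k]
        W[i, neighbours] = 1.0
    return W / W.sum(axis=1, keepdims=True)


def morans_i(values, W):
    """Global Moran's I = (n / S0) * (z' W z) / (z' z), with z = x - mean(x) and S0 = sum of weights."""
    x = np.asarray(values, dtype=float)
    z = x - x.mean()
    n = len(z)
    return (n / W.sum()) * (z @ W @ z) / (z @ z)


def permutation_pvalue(values, W, n_perm=999, seed=0):
    """One-sided pseudo p-value for positive autocorrelation, from random permutations of the values."""
    x = np.asarray(values, dtype=float)
    rng = np.random.default_rng(seed)
    observed = morans_i(x, W)
    sims = np.array([morans_i(rng.permutation(x), W) for _ in range(n_perm)])
    p = (np.sum(sims >= observed) + 1) / (n_perm + 1)
    return observed, p


# Synthetic demo (assumption): values peak at the center of a unit square
rng = np.random.default_rng(42)
coords = rng.uniform(0, 1, size=(100, 2))
smooth = 35 - 10 * np.linalg.norm(coords - 0.5, axis=1) + rng.normal(0, 0.5, 100)

W = knn_weights(coords, k=5)
I_obs, p_val = permutation_pvalue(smooth, W)
print(f"Moran's I = {I_obs:.3f}, pseudo p-value = {p_val:.3f}")
```

For the sensors, `W = knn_weights(gdf_sensors[['longitude', 'latitude']].values)` followed by `permutation_pvalue(gdf_sensors['heat_index_avg'].values, W)` would apply the same test; with raw degrees the neighbour ranking is slightly distorted by the longitude scale. Values near 0 suggest no spatial pattern, while clearly positive values indicate that similar heat index readings cluster together.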

## 9. District-Level Analysis

In [None]:
# Aggregate by district
if 'district' in gdf_sensors.columns:
    district_stats = gdf_sensors.groupby('district').agg({
        'heat_index_avg': ['mean', 'std', 'min', 'max'],
        'serial_number': 'count'
    }).round(2)
    
    district_stats.columns = ['Avg Heat Index', 'Std Dev', 'Min Heat Index', 
                              'Max Heat Index', 'Sensor Count']
    district_stats = district_stats.sort_values('Avg Heat Index', ascending=False)
    
    print("📊 District-Level Heat Index Statistics:")
    print("="*60)
    print(district_stats)
    
    # Visualize district comparison
    fig, ax = plt.subplots(figsize=(12, 6))
    
    x = np.arange(len(district_stats))
    width = 0.35
    
    bars = ax.bar(x, district_stats['Avg Heat Index'], width, 
                  label='Average', color='orange', alpha=0.7)
    
    # Add error bars for standard deviation
    ax.errorbar(x, district_stats['Avg Heat Index'], 
                yerr=district_stats['Std Dev'],
                fmt='none', color='black', capsize=5)
    
    ax.set_xlabel('District', fontsize=12)
    ax.set_ylabel('Heat Index (°C)', fontsize=12)
    ax.set_title('Average Heat Index by District', fontsize=14, fontweight='bold')
    ax.set_xticks(x)
    ax.set_xticklabels(district_stats.index, rotation=45, ha='right')
    ax.grid(True, alpha=0.3)
    
    # Add value labels on bars
    for i, (bar, value) in enumerate(zip(bars, district_stats['Avg Heat Index'])):
        ax.text(bar.get_x() + bar.get_width()/2, bar.get_height() + 0.5,
                f'{value:.1f}', ha='center', va='bottom', fontsize=9)
    
    plt.tight_layout()
    plt.show()

## 10. Spatial Clustering Analysis

In [None]:
from sklearn.cluster import DBSCAN
from sklearn.preprocessing import StandardScaler

# Prepare data for clustering
X = gdf_sensors[['longitude', 'latitude', 'heat_index_avg']].values

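# Standardize features so longitude, latitude, and heat index contribute equally
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Apply DBSCAN clustering (eps is measured in standardized units, not degrees or °C)
dbscan = DBSCAN(eps=0.5, min_samples=5)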
clusters = dbscan.fit_predict(X_scaled)

# Add cluster labels to GeoDataFrame
gdf_sensors['cluster'] = clusters

# Visualize clusters
fig, ax = plt.subplots(figsize=(12, 10))

# Plot each cluster with different color
unique_clusters = np.unique(clusters)
colors = plt.cm.Set3(np.linspace(0, 1, len(unique_clusters)))

for cluster_id, color in zip(unique_clusters, colors):
    mask = clusters == cluster_id
    if cluster_id == -1:
        # Noise points
        ax.scatter(X[mask, 0], X[mask, 1], c='gray', 
                  label='Noise', s=30, alpha=0.5)
    else:
        ax.scatter(X[mask, 0], X[mask, 1], c=[color], 
                  label=f'Cluster {cluster_id}', s=100, alpha=0.7,
                  edgecolors='black', linewidth=0.5)

ax.set_xlabel('Longitude', fontsize=12)
ax.set_ylabel('Latitude', fontsize=12)
ax.set_title('Spatial Clustering of Heat Index Patterns', fontsize=14, fontweight='bold')
ax.legend()
ax.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

# Print cluster statistics
print("\n📊 Cluster Statistics:")
print("="*50)
for cluster_id in unique_clusters:
    if cluster_id != -1:
        cluster_data = gdf_sensors[gdf_sensors['cluster'] == cluster_id]
        print(f"\nCluster {cluster_id}:")
        print(f"  Size: {len(cluster_data)} sensors")
        print(f"  Avg Heat Index: {cluster_data['heat_index_avg'].mean():.2f}°C")
        print(f"  Std Heat Index: {cluster_data['heat_index_avg'].std():.2f}°C")
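The clustering above hard-codes `eps=0.5`. A widely used heuristic for choosing it is the k-distance curve: sort every point's distance to its `min_samples`-th nearest neighbour and look for the knee where the curve turns sharply upward. The sketch below uses scikit-learn's `NearestNeighbors`; the knee rule in `knee_eps` is a simple assumption rather than a standard, both helper names are hypothetical, and the demo data are synthetic blobs.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs
from sklearn.neighbors import NearestNeighbors
from sklearn.preprocessing import StandardScaler


def k_distances(X, min_samples=5):
    """Sorted distance from each point to the farthest member of its min_samples neighbourhood.

    kneighbors(X) on the training data returns each point itself first (distance 0),
    matching DBSCAN's convention that min_samples counts the point itself, so a point
    is a core point exactly when eps is at least its value here.
    """
    nn = NearestNeighbors(n_neighbors=min_samples).fit(X)
    dist, _ = nn.kneighbors(X)
    return np.sort(dist[:, -1])


def knee_eps(sorted_d):
    """Simple knee heuristic (an assumption): after scaling both axes to [0, 1],
    pick the point lying farthest below the diagonal joining the curve's endpoints."""
    x = np.linspace(0, 1, len(sorted_d))
    span = sorted_d[-1] - sorted_d[0]
    if span == 0:
        return float(sorted_d[0])
    y = (sorted_d - sorted_d[0]) / span
    return float(sorted_d[np.argmax(x - y)])


# Synthetic demo (assumption): three dense blobs plus scattered noise, standardized as above
blobs, _ = make_blobs(n_samples=150, centers=3, cluster_std=0.4, random_state=42)
rng = np.random.default_rng(0)
noise = rng.uniform(blobs.min(axis=0) - 2, blobs.max(axis=0) + 2, size=(15, 2))
X_demo = StandardScaler().fit_transform(np.vstack([blobs, noise]))

kd = k_distances(X_demo, min_samples=5)
eps_guess = knee_eps(kd)
labels = DBSCAN(eps=eps_guess, min_samples=5).fit_predict(X_demo)
n_found = len(np.unique(labels[labels != -1]))
print(f"suggested eps ≈ {eps_guess:.3f}; clusters: {n_found}; noise points: {np.sum(labels == -1)}")
```

Applied to the notebook, `k_distances(X_scaled, min_samples=5)` gives the curve for the standardized sensor features; plotting it with `plt.plot(kd)` is a useful check on whatever knee the heuristic picks.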

## 11. Save Spatial Analysis Results

In [None]:
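import os

# Save processed spatial data (create the output folder first if it does not exist)
output_file = '../data/processed/spatial_analysis_results.csv'
os.makedirs(os.path.dirname(output_file), exist_ok=True)
gdf_sensors.to_csv(output_file, index=False)
print(f"✅ Spatial analysis results saved to {output_file}")

# Save interactive map
map_file = '../outputs/seoul_heat_map.html'
os.makedirs(os.path.dirname(map_file), exist_ok=True)
m.save(map_file)
print(f"✅ Interactive map saved to {map_file}")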

# Generate summary report
summary = f"""
SPATIAL ANALYSIS SUMMARY
========================
Analysis Date: {pd.Timestamp.now().strftime('%Y-%m-%d %H:%M')}

Sensor Coverage:
- Total Sensors: {len(gdf_sensors)}
- Geographic Extent:
  * Latitude: {gdf_sensors.latitude.min():.4f} to {gdf_sensors.latitude.max():.4f}
  * Longitude: {gdf_sensors.longitude.min():.4f} to {gdf_sensors.longitude.max():.4f}

Heat Index Statistics:
- Average: {gdf_sensors['heat_index_avg'].mean():.2f}°C
- Maximum: {gdf_sensors['heat_index_max'].max():.2f}°C
- Minimum: {gdf_sensors['heat_index_min'].min():.2f}°C

Hotspot Analysis:
{category_counts.to_string()}

Clustering Results:
- Number of Clusters: {len(unique_clusters[unique_clusters != -1])}
- Noise Points: {sum(clusters == -1)}
"""

print("\n" + "="*50)
print(summary)

## 12. Assignment

### Week 4 Tasks:

1. **Spatial Visualization** (25 points)
   - Create at least 3 different types of spatial visualizations
   - Include both static and interactive maps
   - Add proper legends and color scales

2. **Interpolation Analysis** (25 points)
   - Compare different interpolation methods
   - Evaluate interpolation accuracy using cross-validation
   - Create continuous heat surface maps

3. **Hotspot Identification** (25 points)
   - Identify and map urban heat islands
   - Perform statistical hotspot analysis
   - Calculate spatial autocorrelation metrics

4. **District Comparison** (25 points)
   - Aggregate data by administrative boundaries
   - Compare heat patterns across districts
   - Identify vulnerable areas

### Bonus Challenge:
- Implement Getis-Ord Gi* statistic for hotspot detection
- Create an animated map showing temporal changes
- Integrate demographic data for vulnerability assessment
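For the Getis-Ord Gi* challenge, a minimal NumPy/SciPy sketch of the standard Gi* z-score is shown below. It assumes binary distance-band weights that include each point itself; the band width is a free parameter in the same units as the coordinates (degrees if you pass raw longitude/latitude). The function name and the planted-hotspot demo data are hypothetical, and PySAL's `esda` package provides a tested local G implementation.

```python
import numpy as np
from scipy.spatial.distance import cdist


def getis_ord_gi_star(coords, values, band):
    """Gi* z-scores with binary distance-band weights that include each point itself.

    Gi*_i = (sum_j w_ij x_j - xbar * sum_j w_ij)
            / (S * sqrt((n * sum_j w_ij**2 - (sum_j w_ij)**2) / (n - 1)))
    where xbar = mean(x) and S = sqrt(mean(x**2) - xbar**2).
    """
    x = np.asarray(values, dtype=float)
    n = len(x)
    # w_ij = 1 if j lies within `band` of i; w_ii = 1 because d_ii = 0
    W = (cdist(coords, coords) <= band).astype(float)
    xbar = x.mean()
    s = np.sqrt((x ** 2).mean() - xbar ** 2)
    w_sum = W.sum(axis=1)
    w2_sum = (W ** 2).sum(axis=1)
    num = W @ x - xbar * w_sum
    den = s * np.sqrt((n * w2_sum - w_sum ** 2) / (n - 1))
    # A band that covers every point gives a zero denominator; report 0 there
    return np.divide(num, den, out=np.zeros_like(num), where=den > 0)


# Synthetic demo (assumption): noisy background with one planted hot corner
rng = np.random.default_rng(7)
coords = rng.uniform(0, 1, size=(200, 2))
values = 30 + rng.normal(0, 1, 200)
hot = np.linalg.norm(coords - [0.8, 0.8], axis=1) < 0.15
values[hot] += 5

gi = getis_ord_gi_star(coords, values, band=0.15)
print(f"sensors flagged as hotspots (Gi* > 1.96): {np.sum(gi > 1.96)}")
print(f"mean Gi* inside planted hotspot: {gi[hot].mean():.2f}, outside: {gi[~hot].mean():.2f}")
```

Gi* is already a z-score, so 1.96 corresponds to roughly 95% confidence for a single location; with many sensors tested at once, some false positives are expected unless the threshold is adjusted.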

## Summary

This week, we covered:
- ✅ Spatial data handling with GeoPandas
- ✅ Interactive mapping with Folium
- ✅ Spatial interpolation techniques
- ✅ Hotspot and cluster analysis
- ✅ District-level aggregation

### Next Week Preview:
**Week 5: Temporal Pattern Analysis**
- Time series decomposition
- Trend and seasonality analysis
- Diurnal and weekly patterns
- Temporal forecasting

### Resources:
- [GeoPandas Documentation](https://geopandas.org/)
- [Folium Documentation](https://python-visualization.github.io/folium/)
- [Spatial Analysis in Python](https://geographicdata.science/book/intro.html)

---
**End of Week 4**

*Instructor: Sohn Chul*