# Week 5: Temporal Pattern Analysis
## Time Series Analysis and Trend Detection

**Instructor**: Sohn Chul

---

## 🎯 Learning Objectives

By the end of this session, you will be able to:
1. Analyze temporal patterns in heat index data
2. Identify daily, weekly, and seasonal trends
3. Perform time series decomposition
4. Detect anomalies and extreme events
5. Create temporal visualizations and forecasts

## 1. Setup and Data Loading

In [None]:
# Import required libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.graph_objects as go
import plotly.express as px
from plotly.subplots import make_subplots
from statsmodels.tsa.seasonal import seasonal_decompose
from statsmodels.tsa.stattools import adfuller, acf, pacf
from scipy import stats
from datetime import datetime, timedelta
import warnings
warnings.filterwarnings('ignore')

# Set visualization style
plt.style.use('seaborn-v0_8-whitegrid')
sns.set_palette('husl')

print("✅ Libraries imported successfully!")

## 2. Generate Time Series Data

In [None]:
# Generate synthetic time series data for April-August 2025 (10-minute resolution)
np.random.seed(42)

# Create date range (10-minute intervals, through 23:50 on the last day)
start_date = '2025-04-01'
end_date = '2025-08-31'
date_range = pd.date_range(start=start_date,
                           end=pd.Timestamp(end_date) + pd.Timedelta(days=1),
                           freq='10min', inclusive='left')

print(f"📅 Date Range: {start_date} to {end_date}")
print(f"📊 Total observations: {len(date_range):,} (10-minute intervals)")

# Generate realistic temperature patterns
n = len(date_range)
hours = date_range.hour + date_range.minute/60
days = (date_range - date_range[0]).days

# Base temperature with seasonal trend (about +2°C per 30 days from April to August)
seasonal_trend = 20 + (days / 30) * 2

# Daily pattern (cooler at night, warmer during the day)
# sin((h - 8) * pi / 12) is largest at h = 14 (2 PM) and smallest at h = 2 (2 AM)
daily_pattern = 5 * np.sin((hours - 8) * np.pi / 12)

# Weekly pattern (+0.5°C on Saturdays and Sundays, i.e. dayofweek 5 and 6)
weekly_pattern = np.where(date_range.dayofweek >= 5, 0.5, 0.0)

# Random noise
noise = np.random.normal(0, 2, n)

# Combine all components
temperature = seasonal_trend + daily_pattern + weekly_pattern + noise

# Generate humidity (inversely correlated with temperature)
humidity = 80 - temperature * 0.8 + np.random.normal(0, 5, n)
humidity = np.clip(humidity, 30, 95)  # Realistic bounds

# KMA Heat Index Calculation Functions
def calculate_wet_bulb_temperature(Ta, RH):
    """Calculate wet-bulb temperature (°C) from Ta (°C) and RH (%) using Stull's (2011) empirical formula."""
    Tw = (Ta * np.arctan(0.151977 * (RH + 8.313659)**0.5) +
          np.arctan(Ta + RH) -
          np.arctan(RH - 1.676331) +
          0.00391838 * RH**1.5 * np.arctan(0.023101 * RH) - 
          4.686035)
    return Tw

def calculate_heat_index_kma(Ta, RH):
    """Calculate heat index using KMA (Korea Meteorological Administration) formula."""
    # Step 1: Calculate wet-bulb temperature
    Tw = calculate_wet_bulb_temperature(Ta, RH)
    
    # Step 2: Calculate heat index using KMA formula
    HI = (-0.2442 + 
          0.55399 * Tw + 
          0.45535 * Ta - 
          0.0022 * Tw**2 + 
          0.00278 * Tw * Ta + 
          3.0)
    
    return HI

# Calculate KMA heat index
heat_index = calculate_heat_index_kma(temperature, humidity)

# Create DataFrame
df_temporal = pd.DataFrame({
    'datetime': date_range,
    'temperature': temperature,
    'humidity': humidity,
    'heat_index': heat_index
})

# Add time-based features
df_temporal['hour'] = df_temporal['datetime'].dt.hour
df_temporal['day'] = df_temporal['datetime'].dt.day
df_temporal['month'] = df_temporal['datetime'].dt.month
df_temporal['weekday'] = df_temporal['datetime'].dt.dayofweek
df_temporal['is_weekend'] = df_temporal['weekday'].isin([5, 6])

print("\n📊 Data Summary (KMA Heat Index):")
print(df_temporal[['temperature', 'humidity', 'heat_index']].describe())
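
A quick way to sanity-check the two formulas is to evaluate them at a single, easy-to-verify point. The standalone sketch below re-implements them under the hypothetical short names `wet_bulb_stull` and `heat_index_kma` (same coefficients as the functions above) and evaluates Ta = 20°C, RH = 50%, where Stull's formula gives a wet-bulb temperature of about 13.7°C.

```python
import numpy as np

def wet_bulb_stull(Ta, RH):
    """Stull's empirical wet-bulb temperature (°C); Ta in °C, RH in %."""
    return (Ta * np.arctan(0.151977 * np.sqrt(RH + 8.313659))
            + np.arctan(Ta + RH)
            - np.arctan(RH - 1.676331)
            + 0.00391838 * RH**1.5 * np.arctan(0.023101 * RH)
            - 4.686035)

def heat_index_kma(Ta, RH):
    """KMA heat index (°C), computed from Ta and the Stull wet-bulb temperature."""
    Tw = wet_bulb_stull(Ta, RH)
    return (-0.2442 + 0.55399 * Tw + 0.45535 * Ta
            - 0.0022 * Tw**2 + 0.00278 * Tw * Ta + 3.0)

# Single-point check at 20°C and 50% relative humidity
tw_check = wet_bulb_stull(20.0, 50.0)
hi_check = heat_index_kma(20.0, 50.0)
print(f"Tw = {tw_check:.2f}°C, heat index = {hi_check:.2f}°C")  # Tw = 13.70°C, heat index = 19.80°C
```

If the notebook's own functions return different values at this point, a coefficient has likely been mistyped.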

## 3. Time Series Overview

In [None]:
# Create interactive time series plot
fig = make_subplots(
    rows=3, cols=1,
    subplot_titles=('Temperature', 'Humidity', 'Heat Index'),
    shared_xaxes=True,
    vertical_spacing=0.1
)

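# Resample to hourly means for a lighter plot ('h' is the hourly alias; 'H' is deprecated since pandas 2.2)
df_hourly = df_temporal.set_index('datetime')[['temperature', 'humidity', 'heat_index']].resample('h').mean()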

# Temperature
fig.add_trace(
    go.Scatter(x=df_hourly.index, y=df_hourly['temperature'],
               mode='lines', name='Temperature',
               line=dict(color='red', width=1)),
    row=1, col=1
)

# Humidity
fig.add_trace(
    go.Scatter(x=df_hourly.index, y=df_hourly['humidity'],
               mode='lines', name='Humidity',
               line=dict(color='blue', width=1)),
    row=2, col=1
)

# Heat Index
fig.add_trace(
    go.Scatter(x=df_hourly.index, y=df_hourly['heat_index'],
               mode='lines', name='Heat Index',
               line=dict(color='orange', width=1)),
    row=3, col=1
)

fig.update_xaxes(title_text="Date", row=3, col=1)
fig.update_yaxes(title_text="°C", row=1, col=1)
fig.update_yaxes(title_text="%", row=2, col=1)
fig.update_yaxes(title_text="°C", row=3, col=1)

fig.update_layout(
    height=800,
    title_text="Temporal Patterns: April-August 2025",
    showlegend=False
)

fig.show()
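
The hourly plot above averages six 10-minute readings into each hourly value. The toy sketch below, using hypothetical readings 0 to 11 over two hours, shows what `resample('h').mean()` does: bins are left-closed and labelled by the start of each hour.

```python
import numpy as np
import pandas as pd

# Two hours of hypothetical 10-minute readings: 0.0, 1.0, ..., 11.0
idx = pd.date_range('2025-07-01 00:00', periods=12, freq='10min')
readings = pd.Series(np.arange(12, dtype=float), index=idx)

# Each hourly bin [hh:00, hh+1:00) holds six readings; the label is the bin start
hourly = readings.resample('h').mean()
print(hourly)
```

The first bin averages readings 0 to 5 (2.5) and the second averages 6 to 11 (8.5), so sharp 10-minute spikes are smoothed in `df_hourly`.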

## 4. Daily Pattern Analysis

In [None]:
# Analyze daily patterns
hourly_avg = df_temporal.groupby('hour')[['temperature', 'humidity', 'heat_index']].mean()

fig, axes = plt.subplots(1, 3, figsize=(15, 5))

# Temperature pattern
axes[0].plot(hourly_avg.index, hourly_avg['temperature'], 'r-', linewidth=2)
axes[0].fill_between(hourly_avg.index, hourly_avg['temperature'], alpha=0.3, color='red')
axes[0].set_xlabel('Hour of Day')
axes[0].set_ylabel('Temperature (°C)')
axes[0].set_title('Average Daily Temperature Pattern')
axes[0].grid(True, alpha=0.3)

# Humidity pattern
axes[1].plot(hourly_avg.index, hourly_avg['humidity'], 'b-', linewidth=2)
axes[1].fill_between(hourly_avg.index, hourly_avg['humidity'], alpha=0.3, color='blue')
axes[1].set_xlabel('Hour of Day')
axes[1].set_ylabel('Humidity (%)')
axes[1].set_title('Average Daily Humidity Pattern')
axes[1].grid(True, alpha=0.3)

# Heat Index pattern
axes[2].plot(hourly_avg.index, hourly_avg['heat_index'], 'orange', linewidth=2)
axes[2].fill_between(hourly_avg.index, hourly_avg['heat_index'], alpha=0.3, color='orange')
axes[2].set_xlabel('Hour of Day')
axes[2].set_ylabel('Heat Index (°C)')
axes[2].set_title('Average Daily Heat Index Pattern')
axes[2].grid(True, alpha=0.3)

plt.suptitle('Diurnal Patterns', fontsize=16, fontweight='bold')
plt.tight_layout()
plt.show()

# Identify peak hours
peak_temp_hour = hourly_avg['temperature'].idxmax()
peak_hi_hour = hourly_avg['heat_index'].idxmax()
min_temp_hour = hourly_avg['temperature'].idxmin()

print(f"\n🌡️ Daily Pattern Summary:")
print(f"  Peak temperature hour: {peak_temp_hour:02d}:00 ({hourly_avg.loc[peak_temp_hour, 'temperature']:.1f}°C)")
print(f"  Peak heat index hour: {peak_hi_hour:02d}:00 ({hourly_avg.loc[peak_hi_hour, 'heat_index']:.1f}°C)")
print(f"  Minimum temperature hour: {min_temp_hour:02d}:00 ({hourly_avg.loc[min_temp_hour, 'temperature']:.1f}°C)")
print(f"  Mean diurnal temperature range (max - min of hourly means): {hourly_avg['temperature'].max() - hourly_avg['temperature'].min():.1f}°C")
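
In the synthetic data, the peak hour reported above is set by the phase offset inside the daily sine term: sin(x) is largest at x = π/2, so sin((h - c)·π/12) peaks at h = c + 6 and is lowest at h = c - 6 (mod 24). The sketch below evaluates offsets c = 6 and c = 8 on a 10-minute grid to confirm which hour each one puts the maximum at.

```python
import numpy as np

# One day at 10-minute resolution: 0, 1/6, ..., 23 + 5/6 hours
hours = np.arange(144) / 6

peaks = {}
for offset in (6, 8):
    daily = 5 * np.sin((hours - offset) * np.pi / 12)
    peaks[offset] = (hours[np.argmax(daily)], hours[np.argmin(daily)])
    print(f"offset {offset}: max at {peaks[offset][0]:.2f} h, min at {peaks[offset][1]:.2f} h")
```

An offset of 6 peaks at noon, and an offset of 8 peaks at 14:00 (2 PM). Random noise can shift the observed `peak_temp_hour` by an hour or so.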

## 5. Weekly Pattern Analysis

In [None]:
# Weekly patterns
weekday_names = ['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun']
weekly_avg = df_temporal.groupby('weekday')[['temperature', 'humidity', 'heat_index']].mean()

fig, ax = plt.subplots(figsize=(12, 6))

x = np.arange(len(weekday_names))
width = 0.25

bars1 = ax.bar(x - width, weekly_avg['temperature'], width, label='Temperature', color='red', alpha=0.7)
bars2 = ax.bar(x, weekly_avg['heat_index'], width, label='Heat Index', color='orange', alpha=0.7)
bars3 = ax.bar(x + width, weekly_avg['humidity']/2, width, label='Humidity/2', color='blue', alpha=0.7)

ax.set_xlabel('Day of Week', fontsize=12)
ax.set_ylabel('°C (humidity shown as % / 2)', fontsize=12)
ax.set_title('Weekly Pattern Analysis', fontsize=14, fontweight='bold')
ax.set_xticks(x)
ax.set_xticklabels(weekday_names)
ax.legend()
ax.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

# Weekend vs Weekday comparison
weekend_mask = df_temporal['is_weekend']
weekday_hi = df_temporal[~weekend_mask]['heat_index'].mean()
weekend_hi = df_temporal[weekend_mask]['heat_index'].mean()

print(f"\n📅 Weekly Pattern Summary:")
print(f"  Weekday avg heat index: {weekday_hi:.2f}°C")
print(f"  Weekend avg heat index: {weekend_hi:.2f}°C")
print(f"  Weekend-Weekday difference: {weekend_hi - weekday_hi:.2f}°C")
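
The weekend-weekday difference above is a single number. One rough way to judge whether it exceeds day-to-day noise is a Welch two-sample t-test (`scipy.stats.ttest_ind` with `equal_var=False`) on daily means rather than on the 10-minute values, which are strongly autocorrelated through the diurnal cycle. The sketch uses hypothetical daily data with a +0.5°C weekend offset. With the notebook data, `daily_hi` could be built as `df_temporal.set_index('datetime')['heat_index'].resample('D').mean()`. Daily means still carry the seasonal trend and some autocorrelation, so treat the p-value as indicative only.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Hypothetical daily-mean heat index for 21 weeks, with +0.5°C added on weekends
rng = np.random.default_rng(0)
days = pd.date_range('2025-04-07', periods=147, freq='D')  # 2025-04-07 is a Monday
daily_hi = pd.Series(25 + rng.normal(0, 1.0, len(days)), index=days)
daily_hi[days.dayofweek >= 5] += 0.5

# Welch's t-test does not assume equal variances in the two groups
is_weekend = daily_hi.index.dayofweek >= 5
t_stat, p_value = stats.ttest_ind(daily_hi[is_weekend], daily_hi[~is_weekend], equal_var=False)
diff = daily_hi[is_weekend].mean() - daily_hi[~is_weekend].mean()
print(f"weekend - weekday = {diff:+.2f}°C, Welch t = {t_stat:.2f}, p = {p_value:.3f}")
```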

## 6. Monthly Trend Analysis

In [None]:
# Monthly analysis
monthly_stats = df_temporal.groupby('month').agg({
    'temperature': ['mean', 'std', 'min', 'max'],
    'heat_index': ['mean', 'std', 'min', 'max']
}).round(2)

month_names = ['April', 'May', 'June', 'July', 'August']

fig, axes = plt.subplots(1, 2, figsize=(14, 6))

# Temperature trend
months = list(range(4, 9))
temp_means = [monthly_stats.loc[m, ('temperature', 'mean')] for m in months]
temp_stds = [monthly_stats.loc[m, ('temperature', 'std')] for m in months]

axes[0].errorbar(month_names, temp_means, yerr=temp_stds, 
                 fmt='o-', capsize=5, capthick=2, linewidth=2, 
                 markersize=8, color='red')
axes[0].fill_between(range(len(month_names)), 
                     np.array(temp_means) - np.array(temp_stds),
                     np.array(temp_means) + np.array(temp_stds),
                     alpha=0.2, color='red')
axes[0].set_ylabel('Temperature (°C)', fontsize=12)
axes[0].set_title('Monthly Temperature Trend', fontsize=14, fontweight='bold')
axes[0].grid(True, alpha=0.3)

# Heat Index trend
hi_means = [monthly_stats.loc[m, ('heat_index', 'mean')] for m in months]
hi_stds = [monthly_stats.loc[m, ('heat_index', 'std')] for m in months]

axes[1].errorbar(month_names, hi_means, yerr=hi_stds, 
                 fmt='o-', capsize=5, capthick=2, linewidth=2, 
                 markersize=8, color='orange')
axes[1].fill_between(range(len(month_names)), 
                     np.array(hi_means) - np.array(hi_stds),
                     np.array(hi_means) + np.array(hi_stds),
                     alpha=0.2, color='orange')
axes[1].set_ylabel('Heat Index (°C)', fontsize=12)
axes[1].set_title('Monthly Heat Index Trend', fontsize=14, fontweight='bold')
axes[1].grid(True, alpha=0.3)

plt.suptitle('Seasonal Progression: April to August 2025', fontsize=16)
plt.tight_layout()
plt.show()

print("\n📊 Monthly Statistics:")
print(monthly_stats)
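
Monthly means show the seasonal rise as a series of steps. An ordinary least-squares line through daily means summarizes it as a single rate in °C per day. The sketch below applies `scipy.stats.linregress` to hypothetical daily values built with the same +2°C per 30 days trend as the generator. The same call works on `df_daily` from the decomposition section, using `np.arange(len(df_daily))` as the time axis.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Hypothetical daily means rising 2°C per 30 days, plus day-to-day noise
rng = np.random.default_rng(1)
days = pd.date_range('2025-04-01', '2025-08-31', freq='D')
t = np.arange(len(days))
daily_temp = 20 + (2 / 30) * t + rng.normal(0, 1.0, len(days))

# Least-squares fit: slope is in °C per day
fit = stats.linregress(t, daily_temp)
print(f"slope = {fit.slope:.4f}°C/day (~{fit.slope * 30:.2f}°C per 30 days), r^2 = {fit.rvalue**2:.2f}")
```

A linear fit assumes a constant warming rate. Over a full year, the seasonal curve would bend and a straight line would no longer be adequate.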

## 7. Time Series Decomposition

In [None]:
# Perform seasonal decomposition on daily averages
df_daily = df_temporal.set_index('datetime').resample('D')['heat_index'].mean()

# Seasonal decomposition
decomposition = seasonal_decompose(df_daily, model='additive', period=7)  # Weekly period

fig, axes = plt.subplots(4, 1, figsize=(14, 12))

# Original
df_daily.plot(ax=axes[0], color='blue')
axes[0].set_ylabel('Heat Index')
axes[0].set_title('Original Time Series')
axes[0].grid(True, alpha=0.3)

# Trend
decomposition.trend.plot(ax=axes[1], color='red')
axes[1].set_ylabel('Trend')
axes[1].set_title('Trend Component')
axes[1].grid(True, alpha=0.3)

# Seasonal
decomposition.seasonal.plot(ax=axes[2], color='green')
axes[2].set_ylabel('Seasonal')
axes[2].set_title('Seasonal Component (Weekly)')
axes[2].grid(True, alpha=0.3)

# Residual
decomposition.resid.plot(ax=axes[3], color='purple')
axes[3].set_ylabel('Residual')
axes[3].set_title('Residual Component')
axes[3].set_xlabel('Date')
axes[3].grid(True, alpha=0.3)

plt.suptitle('Time Series Decomposition of Daily Heat Index', fontsize=16, fontweight='bold')
plt.tight_layout()
plt.show()

## 8. Extreme Event Detection

In [None]:
# Identify extreme heat events
threshold_extreme = df_temporal['heat_index'].quantile(0.95)  # 95th percentile
threshold_high = df_temporal['heat_index'].quantile(0.90)     # 90th percentile

# Mark extreme events
df_temporal['is_extreme'] = df_temporal['heat_index'] >= threshold_extreme
df_temporal['is_high'] = (df_temporal['heat_index'] >= threshold_high) & (~df_temporal['is_extreme'])

# Count runs of consecutive extreme observations (each step is one 10-minute interval)
def count_consecutive_events(series):
    """Return the length of each run of equal values in a boolean series (runs of False sum to 0)."""
    groups = (series != series.shift()).cumsum()
    return series.groupby(groups).sum()

extreme_events = count_consecutive_events(df_temporal['is_extreme'])
extreme_events = extreme_events[extreme_events > 0]

# Visualize extreme events
fig = go.Figure()

# Base heat index
fig.add_trace(go.Scatter(
    x=df_hourly.index,
    y=df_hourly['heat_index'],
    mode='lines',
    name='Heat Index',
    line=dict(color='lightgray', width=1)
))

# Extreme events
extreme_data = df_temporal[df_temporal['is_extreme']].set_index('datetime')[['heat_index']].resample('h').mean()
fig.add_trace(go.Scatter(
    x=extreme_data.index,
    y=extreme_data['heat_index'],
    mode='markers',
    name='Extreme Heat',
    marker=dict(color='red', size=8, symbol='diamond')
))

# High events
high_data = df_temporal[df_temporal['is_high']].set_index('datetime')[['heat_index']].resample('h').mean()
fig.add_trace(go.Scatter(
    x=high_data.index,
    y=high_data['heat_index'],
    mode='markers',
    name='High Heat',
    marker=dict(color='orange', size=6, symbol='circle')
))

# Add threshold lines
fig.add_hline(y=threshold_extreme, line_dash="dash", line_color="red", 
              annotation_text="Extreme Threshold (95th percentile)")
fig.add_hline(y=threshold_high, line_dash="dash", line_color="orange", 
              annotation_text="High Threshold (90th percentile)")

fig.update_layout(
    title="Extreme Heat Event Detection",
    xaxis_title="Date",
    yaxis_title="Heat Index (°C)",
    height=500
)

fig.show()

# Summary statistics
print(f"\n🌡️ Extreme Event Statistics:")
print(f"  Extreme threshold (95th percentile): {threshold_extreme:.2f}°C")
print(f"  High threshold (90th percentile): {threshold_high:.2f}°C")
n_extreme = df_temporal['is_extreme'].sum()
n_high = df_temporal['is_high'].sum()
print(f"  Extreme heat observations: {n_extreme} 10-min intervals (~{n_extreme / 6:.1f} h, {n_extreme / len(df_temporal) * 100:.1f}%)")
print(f"  High heat observations: {n_high} 10-min intervals (~{n_high / 6:.1f} h, {n_high / len(df_temporal) * 100:.1f}%)")
print(f"  Number of extreme events: {len(extreme_events)}")
print(f"  Longest extreme event: {extreme_events.max():.0f} consecutive 10-min intervals ({extreme_events.max() * 10:.0f} min)")
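
`count_consecutive_events` returns only run lengths. To study when extremes happen and how they cluster (assignment task 3), it helps to keep each run's start and end time as well. The sketch below applies the same run-labelling idea to a hypothetical series of 10-minute flags and builds a small event table. `flags` stands in for `df_temporal.set_index('datetime')['is_extreme']`.

```python
import pandas as pd

# Hypothetical 10-minute flags (True = heat index at or above the extreme threshold)
idx = pd.date_range('2025-08-01 12:00', periods=10, freq='10min')
flags = pd.Series([False, True, True, True, False, False, True, False, True, True], index=idx)

# Same labelling as count_consecutive_events: a new id whenever the flag changes
run_id = (flags != flags.shift()).cumsum()

# Keep only the True runs and summarize each by its first/last timestamp and length
times = flags.index.to_series()[flags]
events = times.groupby(run_id[flags]).agg(['min', 'max', 'size'])
events.columns = ['start', 'end', 'n_intervals']
events['duration_min'] = events['n_intervals'] * 10
print(events.reset_index(drop=True))
```

Gaps between consecutive `end` and `start` values can then be used to merge runs separated by a single cool interval, if such short breaks should not split an event.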

## 9. Autocorrelation Analysis

In [None]:
# Autocorrelation and partial autocorrelation
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

fig, axes = plt.subplots(2, 2, figsize=(14, 8))

# ACF for hourly data
plot_acf(df_hourly['heat_index'].dropna(), lags=48, ax=axes[0, 0])
axes[0, 0].set_title('Autocorrelation Function (Hourly)', fontweight='bold')

# PACF for hourly data
plot_pacf(df_hourly['heat_index'].dropna(), lags=48, ax=axes[0, 1])
axes[0, 1].set_title('Partial Autocorrelation Function (Hourly)', fontweight='bold')

# ACF for daily data
plot_acf(df_daily.dropna(), lags=30, ax=axes[1, 0])
axes[1, 0].set_title('Autocorrelation Function (Daily)', fontweight='bold')

# PACF for daily data
plot_pacf(df_daily.dropna(), lags=30, ax=axes[1, 1])
axes[1, 1].set_title('Partial Autocorrelation Function (Daily)', fontweight='bold')

plt.suptitle('Time Series Correlation Analysis', fontsize=16)
plt.tight_layout()
plt.show()

# Stationarity test
result = adfuller(df_daily.dropna())
print(f"\n📊 Augmented Dickey-Fuller Test:")
print(f"  ADF Statistic: {result[0]:.4f}")
print(f"  p-value: {result[1]:.4f}")
print(f"  Critical Values:")
for key, value in result[4].items():
    print(f"    {key}: {value:.4f}")
    
if result[1] <= 0.05:
    print("  Result: reject H0 (unit root) at the 5% level; the series appears stationary")
else:
    print("  Result: fail to reject H0 (unit root) at the 5% level; the series appears non-stationary")

## 10. Temporal Heatmap

In [None]:
# Create temporal heatmap
# Pivot data for heatmap
df_temporal['date'] = df_temporal['datetime'].dt.date
heatmap_data = df_temporal.pivot_table(
    values='heat_index',
    index='hour',
    columns='date',
    aggfunc='mean'
)

# Select subset for better visualization (every 5th day)
heatmap_subset = heatmap_data.iloc[:, ::5]

plt.figure(figsize=(16, 8))
sns.heatmap(heatmap_subset, cmap='RdYlBu_r', cbar_kws={'label': 'Heat Index (°C)'},
            xticklabels=5, yticklabels=2)
plt.xlabel('Date', fontsize=12)
plt.ylabel('Hour of Day', fontsize=12)
plt.title('Temporal Heatmap of Heat Index', fontsize=14, fontweight='bold')
plt.xticks(rotation=45, ha='right')
plt.tight_layout()
plt.show()
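
Each heatmap cell is the mean heat index for one hour of day on one calendar date. With 10-minute data, `pivot_table(..., aggfunc='mean')` averages the six readings in that hour, and `heatmap_data.iloc[:, ::5]` then keeps every fifth date column. The toy sketch below uses two hypothetical days of readings (values 0, 1, 2, ...) to show the resulting grid.

```python
import numpy as np
import pandas as pd

# Two hypothetical days of 10-minute readings with values 0, 1, 2, ...
times = pd.date_range('2025-07-01', periods=2 * 144, freq='10min')
df = pd.DataFrame({'datetime': times, 'heat_index': np.arange(len(times), dtype=float)})
df['hour'] = df['datetime'].dt.hour
df['date'] = df['datetime'].dt.date

# Rows = hour of day, columns = calendar date, cells = mean of the six readings in that hour
grid = df.pivot_table(values='heat_index', index='hour', columns='date', aggfunc='mean')
print(grid.shape)  # (24, 2)
print(grid.head(3))
```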

## 11. Summary and Export

In [None]:
# Generate temporal analysis summary
summary_report = f"""
TEMPORAL ANALYSIS SUMMARY
=========================
Analysis Period: {start_date} to {end_date}
Total Observations: {len(df_temporal):,}

TEMPERATURE STATISTICS:
- Mean: {df_temporal['temperature'].mean():.2f}°C
- Max: {df_temporal['temperature'].max():.2f}°C
- Min: {df_temporal['temperature'].min():.2f}°C
- Std Dev: {df_temporal['temperature'].std():.2f}°C

HEAT INDEX STATISTICS:
- Mean: {df_temporal['heat_index'].mean():.2f}°C
- Max: {df_temporal['heat_index'].max():.2f}°C
- Min: {df_temporal['heat_index'].min():.2f}°C
- 95th Percentile: {threshold_extreme:.2f}°C

TEMPORAL PATTERNS:
- Peak Heat Hour: {peak_hi_hour:02d}:00
- Minimum Heat Hour: {hourly_avg['heat_index'].idxmin():02d}:00
- Daily Range: {hourly_avg['heat_index'].max() - hourly_avg['heat_index'].min():.1f}°C
- Weekend Effect: {weekend_hi - weekday_hi:+.2f}°C

EXTREME EVENTS:
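- Extreme Heat Intervals (10-min): {df_temporal['is_extreme'].sum()} (~{df_temporal['is_extreme'].sum() / 6:.1f} h, {df_temporal['is_extreme'].sum()/len(df_temporal)*100:.1f}%)
- High Heat Intervals (10-min): {df_temporal['is_high'].sum()} (~{df_temporal['is_high'].sum() / 6:.1f} h, {df_temporal['is_high'].sum()/len(df_temporal)*100:.1f}%)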
- Number of Events: {len(extreme_events)}
"""

print(summary_report)

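# Save processed data (create the output folder first if it does not exist)
from pathlib import Path

output_file = Path('../data/processed/temporal_analysis_results.csv')
output_file.parent.mkdir(parents=True, exist_ok=True)
df_temporal.to_csv(output_file, index=False)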
print(f"\n✅ Temporal analysis results saved to {output_file}")
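
Learning objective 5 also covers forecasts. A common first baseline for an hourly series with a strong daily cycle is the seasonal-naive forecast, which predicts each hour of the next day with the same hour of the last observed day. The sketch below holds out the final 24 hours of a hypothetical hourly series and compares that baseline with a flat mean forecast by mean absolute error (MAE). On the notebook data, `df_hourly['heat_index']` would take the place of `hourly_hi`.

```python
import numpy as np
import pandas as pd

# Hypothetical hourly heat index: a 24-hour cycle plus noise, 30 days long
rng = np.random.default_rng(4)
idx = pd.date_range('2025-08-01', periods=24 * 30, freq='h')
hours = idx.hour.to_numpy()
hourly_hi = pd.Series(27 + 4 * np.sin((hours - 8) * np.pi / 12) + rng.normal(0, 0.5, len(idx)), index=idx)

# Hold out the last 24 hours as the test window
train, test = hourly_hi.iloc[:-24], hourly_hi.iloc[-24:]

# Seasonal-naive: repeat the last observed day (season length = 24 hours)
forecast = pd.Series(train.iloc[-24:].to_numpy(), index=test.index)
# Flat baseline: the training mean for every hour
mean_forecast = pd.Series(train.mean(), index=test.index)

mae_naive = (test - forecast).abs().mean()
mae_mean = (test - mean_forecast).abs().mean()
print(f"MAE seasonal-naive: {mae_naive:.2f}°C, MAE flat mean: {mae_mean:.2f}°C")
```

Any more elaborate model, for example statsmodels' `ExponentialSmoothing` with a 24-hour seasonal period, should at least beat the seasonal-naive MAE on the same holdout window.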

## 12. Assignment

### Week 5 Tasks:

1. **Temporal Pattern Identification** (25 points)
   - Analyze diurnal, weekly, and monthly patterns
   - Identify peak hours and days for heat stress
   - Compare weekday vs weekend patterns

2. **Time Series Decomposition** (25 points)
   - Perform seasonal decomposition
   - Extract trend, seasonal, and residual components
   - Test for stationarity

3. **Extreme Event Analysis** (25 points)
   - Define and identify extreme heat events
   - Calculate duration and frequency of events
   - Analyze temporal clustering of extremes

4. **Visualization** (25 points)
   - Create interactive time series plots
   - Generate temporal heatmaps
   - Design clear summary visualizations

### Bonus Challenge:
- Implement change point detection algorithm
- Perform wavelet analysis for multi-scale patterns
- Create animated visualization of temporal evolution

## Summary

In this week, we covered:
- ✅ Time series data handling and resampling
- ✅ Daily, weekly, and monthly pattern analysis
- ✅ Time series decomposition
- ✅ Extreme event detection
- ✅ Autocorrelation and stationarity testing

### Next Week Preview:
**Week 6: Urban Heat Island Analysis**
- Spatial-temporal integration
- UHI intensity calculation
- Land use impact analysis
- Mitigation strategy assessment

### Resources:
- [Statsmodels Time Series](https://www.statsmodels.org/stable/tsa.html)
- [Plotly Time Series](https://plotly.com/python/time-series/)
- [Time Series Analysis in Python](https://www.machinelearningplus.com/time-series/time-series-analysis-python/)

---
**End of Week 5**

*Instructor: Sohn Chul*