# SA Tourism Weather: Hourly Feature Engineering

---

**Purpose:** Create tourism-relevant features from hourly weather data for detailed analysis and dashboards.


**Why Hourly?**

- Enables time-of-day insights (morning, afternoon, evening, night)

- Supports activity planning (best beach hours, hiking, etc.)

- Complements daily analysis for advanced dashboards



---

In [None]:
# Import libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from pathlib import Path
from datetime import datetime, timedelta

plt.style.use('seaborn-v0_8-darkgrid')
pd.set_option('display.max_columns', None)
print('✅ Libraries imported successfully!')

In [None]:
# Load hourly data
data_dir = Path('../data/processed')
hourly = pd.read_parquet(data_dir / 'hourly' / 'all_locations_hourly.parquet')
print(f'✅ Loaded {len(hourly):,} hourly records')
print(f'Date range: {hourly["date"].min()} to {hourly["date"].max()}')
print(f'Locations: {hourly["location_code"].nunique()}')

In [None]:
# Create a working copy
df = hourly.copy()
df['date'] = pd.to_datetime(df['date'])
df['hour'] = df['date'].dt.hour
print(f'Working with {len(df):,} records')
df.head()

## Time-of-Day Features

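Extract the hour of day and map it to a part of day (Morning, Afternoon, Evening, Night) that the activity indicators later in the notebook build on.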

In [None]:
# Hour of day (already derived on the working copy; repeated so this cell runs on its own)
df['hour'] = df['date'].dt.hour

# Part of day: Morning 05-11, Afternoon 12-16, Evening 17-20, Night 21-04
def get_part_of_day(hour):
    if 5 <= hour < 12:
        return 'Morning'
    elif 12 <= hour < 17:
        return 'Afternoon'
    elif 17 <= hour < 21:
        return 'Evening'
    else:
        return 'Night'

df['part_of_day'] = df['hour'].apply(get_part_of_day)
print(df[['date', 'hour', 'part_of_day']].head(10))
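
On a large hourly table, a row-by-row `apply` can be slow. The sketch below shows one possible vectorized equivalent using `np.select`, with the same hour boundaries as `get_part_of_day`. The function name `part_of_day_vectorized` is introduced here for illustration only and is not part of the original notebook.

```python
import numpy as np
import pandas as pd

def part_of_day_vectorized(hours: pd.Series) -> pd.Series:
    """Map hours (0-23) to a part of day without a Python-level loop."""
    # Same boundaries as get_part_of_day: each window is left-closed, right-open
    conditions = [
        (hours >= 5) & (hours < 12),
        (hours >= 12) & (hours < 17),
        (hours >= 17) & (hours < 21),
    ]
    choices = ['Morning', 'Afternoon', 'Evening']
    # Anything outside the three windows (21:00 to 04:59) falls back to 'Night'
    labels = np.select(conditions, choices, default='Night')
    return pd.Series(labels, index=hours.index, name='part_of_day')

# Small demo on boundary hours
demo_hours = pd.Series([0, 5, 11, 12, 16, 17, 20, 21, 23])
demo_parts = part_of_day_vectorized(demo_hours)
print(demo_parts.tolist())
```

In the notebook this would be applied as `df['part_of_day'] = part_of_day_vectorized(df['hour'])`.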

## Weather Comfort Features (Hourly)

Create comfort indicators for each hour.

In [None]:
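# Temperature comfort
def categorize_temperature(temp):
    if pd.isna(temp):
        return np.nan  # keep missing readings missing instead of labelling them 'Very Hot'
    if temp < 10:
        return 'Cold'
    elif temp < 18:
        return 'Cool'
    elif temp < 28:
        return 'Comfortable'
    elif temp < 35:
        return 'Hot'
    else:
        return 'Very Hot'

df['temp_category'] = df['temperature_2m'].apply(categorize_temperature)
# inclusive='left' matches the 'Comfortable' band exactly (18 <= temp < 28)
df['is_comfortable_temp'] = df['temperature_2m'].between(18, 28, inclusive='left')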
print(df['temp_category'].value_counts())

In [None]:
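# Rain indicator
df['is_rainy'] = df['precipitation'] > 0.5
# Bins are right-closed, so trace amounts in (0, 0.5] are 'Light' but not flagged as rainy
rain_bins = [-0.1, 0, 2, 10, 20, np.inf]
rain_labels = ['No Rain', 'Light', 'Moderate', 'Heavy', 'Very Heavy']
df['rain_intensity'] = pd.cut(df['precipitation'], bins=rain_bins, labels=rain_labels)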
print(df['rain_intensity'].value_counts())

In [None]:
# Wind indicator (the threshold of 30 is in the units of wind_speed_10m, assumed here to be km/h)
df['is_windy'] = df['wind_speed_10m'] > 30
print(df['is_windy'].sum())
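
To turn the hourly comfort flags into something a dashboard can plot, one option is to compute the share of hours per location where each flag is true. The sketch below uses a tiny hand-made frame in place of `df`; the location codes `CPT` and `DBN` are placeholders, not values taken from the dataset.

```python
import pandas as pd

# Stand-in for df with the three boolean comfort flags (values are illustrative only)
sample = pd.DataFrame({
    'location_code': ['CPT', 'CPT', 'CPT', 'DBN'],  # hypothetical codes
    'is_comfortable_temp': [True, False, True, True],
    'is_rainy': [False, False, True, False],
    'is_windy': [False, True, False, False],
})

flag_columns = ['is_comfortable_temp', 'is_rainy', 'is_windy']

# The mean of a boolean column is the fraction of rows where it is True
flag_share = sample.groupby('location_code')[flag_columns].mean().round(2)
print(flag_share)
```

Adding `part_of_day` to the `groupby` keys would give the same shares split by time of day.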

## Activity-Specific Hourly Features

Indicators for best hours for beach, hiking, wine tasting, etc.

In [None]:
# Example: Perfect Beach Hour
df['perfect_beach_hour'] = (
    (df['temperature_2m'] > 22) &
    (df['temperature_2m'] < 32) &
    (df['precipitation'] < 0.5) &
    (df['wind_speed_10m'] < 25) &
    (df['part_of_day'] == 'Afternoon')
)
print(df['perfect_beach_hour'].sum())
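
Other activities can follow the same pattern as the beach indicator. The sketch below shows what a hiking indicator might look like. The function name `perfect_hiking_hour` and every threshold in it (temperature range, rain and wind limits, allowed parts of day) are assumptions chosen for illustration, not values from the original analysis, and should be tuned before use.

```python
import pandas as pd

def perfect_hiking_hour(
    frame: pd.DataFrame,
    temp_range=(12, 26),               # hypothetical comfortable hiking temperatures
    max_rain=0.5,                      # hypothetical precipitation limit
    max_wind=30,                       # hypothetical wind speed limit
    parts=('Morning', 'Afternoon'),    # hypothetical allowed parts of day
) -> pd.Series:
    """Return a boolean Series marking hours that meet all hiking conditions."""
    low, high = temp_range
    return (
        frame['temperature_2m'].between(low, high)   # inclusive on both ends
        & (frame['precipitation'] < max_rain)
        & (frame['wind_speed_10m'] < max_wind)
        & frame['part_of_day'].isin(parts)
    )

# Stand-in rows: only the first one satisfies every condition
sample = pd.DataFrame({
    'temperature_2m': [18, 30, 18, 18, 18],
    'precipitation': [0.0, 0.0, 1.0, 0.0, 0.0],
    'wind_speed_10m': [10, 10, 10, 10, 35],
    'part_of_day': ['Morning', 'Morning', 'Morning', 'Night', 'Morning'],
})
hiking_flags = perfect_hiking_hour(sample)
print(hiking_flags.tolist())
```

Keeping the thresholds as parameters makes it easy to compare stricter and looser definitions without rewriting the condition.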

## Save Engineered Hourly Dataset

In [None]:
output_path = Path('../data/processed/hourly/hourly_with_features.parquet')
df.to_parquet(output_path, index=False)
print(f'✅ Engineered hourly dataset saved! Rows: {len(df):,}')

## Next Steps

**You now have tourism-ready hourly features!**

- Use for time-of-day dashboards in Power BI
- Compare hourly vs daily patterns
- Add more activity indicators as needed

---

**Tip:** Combine with daily features for multi-level insights.
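
One way to combine the two levels is to roll the hourly flags up to one row per location and day, then join that summary onto the daily table. The sketch below covers the roll-up step on a small stand-in frame; the location code and the output column names `beach_hours` and `rainy_hours` are illustrative assumptions.

```python
import pandas as pd

# Stand-in for the engineered hourly frame (values are illustrative only)
hourly_sample = pd.DataFrame({
    'location_code': ['CPT'] * 4,  # hypothetical code
    'date': pd.to_datetime([
        '2024-01-01 13:00', '2024-01-01 14:00',
        '2024-01-02 13:00', '2024-01-02 14:00',
    ]),
    'perfect_beach_hour': [True, True, False, True],
    'is_rainy': [False, False, True, False],
})

daily_summary = (
    hourly_sample
    # Drop the time component so all hours of a day share one key
    .assign(day=hourly_sample['date'].dt.normalize())
    .groupby(['location_code', 'day'], as_index=False)
    # Summing a boolean column counts the hours where it is True
    .agg(
        beach_hours=('perfect_beach_hour', 'sum'),
        rainy_hours=('is_rainy', 'sum'),
    )
)
print(daily_summary)
```

The result could then be merged onto a daily feature table with `pd.merge` on the location and day columns, assuming that table uses matching keys.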