# US Weather + Energy Analysis Pipeline Exploration

This notebook demonstrates how to load, explore, and visualize the processed weather and energy data used by the dashboard app.

You will:

- Load the processed CSV data
- Inspect the data structure
- Visualize temperature and energy demand trends
- Analyze correlation between temperature and energy usage
- Create a usage patterns heatmap

In [None]:
# Import required libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

## Load the processed weather and energy data

In [None]:
# Load the processed CSV file
df = pd.read_csv('../data/processed/weather_energy_data.csv', parse_dates=['date'])
df.head()

## Inspect the data structure

In [None]:
# Show basic info and missing values
df.info()
df.isnull().sum()
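
The notes at the end of this notebook mention that data quality checks run before analysis, but the pipeline's actual checks are not shown here. The following is only a sketch of what such checks might look like, assuming the `date`, `city`, `temp_avg_f`, and `energy_demand_gwh` columns used in later cells. The helper name `summarize_quality`, the choice of checks, and the toy data are hypothetical.

```python
import numpy as np
import pandas as pd


def summarize_quality(frame: pd.DataFrame) -> dict:
    """Return a few basic data-quality counts (hypothetical helper)."""
    return {
        # Rows with at least one missing value
        "rows_with_missing": int(frame.isnull().any(axis=1).sum()),
        # Repeated (city, date) pairs, which would double-count a day
        "duplicate_city_dates": int(frame.duplicated(subset=["city", "date"]).sum()),
        # Assumption: negative demand is treated as implausible
        "negative_demand": int((frame["energy_demand_gwh"] < 0).sum()),
    }


# Toy data for illustration only; not taken from the real dataset
toy = pd.DataFrame({
    "date": pd.to_datetime(["2024-01-01", "2024-01-01", "2024-01-02", "2024-01-03"]),
    "city": ["New York", "New York", "New York", "New York"],
    "temp_avg_f": [31.0, 31.0, np.nan, 40.5],
    "energy_demand_gwh": [410.0, 410.0, 395.0, -5.0],
})
quality = summarize_quality(toy)
print(quality)
```

Running the same helper on `df` would give a quick summary to compare against the `df.isnull().sum()` output.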

## Visualize temperature and energy demand trends

In [None]:
# Plot temperature and energy demand over time for a selected city
city = 'New York'
city_df = df[df['city'] == city]

fig, ax1 = plt.subplots(figsize=(12, 5))
# Left axis: average temperature
ax1.plot(city_df['date'], city_df['temp_avg_f'], color='tab:red', label='Avg Temp (°F)')
ax1.set_xlabel('Date')
ax1.set_ylabel('Avg Temp (°F)', color='tab:red')
# Right axis shares the x-axis so energy demand can use its own scale
ax2 = ax1.twinx()
ax2.plot(city_df['date'], city_df['energy_demand_gwh'], color='tab:blue', label='Energy Demand (GWh)')
ax2.set_ylabel('Energy Demand (GWh)', color='tab:blue')
ax1.set_title(f'Temperature and Energy Demand Trends in {city}')
fig.tight_layout()
plt.show()

## Correlation Analysis: Temperature vs. Energy Usage

In [None]:
# Pearson correlation between temperature and demand, followed by a scatter plot
corr = city_df['temp_avg_f'].corr(city_df['energy_demand_gwh'])
plt.figure(figsize=(8, 5))
sns.scatterplot(x='temp_avg_f', y='energy_demand_gwh', data=city_df)
plt.title(f'Temperature vs. Energy Usage in {city} (corr={corr:.2f})')
plt.xlabel('Avg Temp (°F)')
plt.ylabel('Energy Demand (GWh)')
plt.show()
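
The cell above computes the correlation for a single city. As a rough sketch, the same calculation can be repeated for every city with a `groupby`. The toy data below is made up for illustration, and `corr_by_city` is a hypothetical name. Note that `Series.corr` defaults to the Pearson coefficient, which only measures linear association.

```python
import pandas as pd

# Toy data for illustration only; city names and values are invented
toy = pd.DataFrame({
    "city": ["A"] * 4 + ["B"] * 4,
    "temp_avg_f": [30, 40, 50, 60, 70, 80, 90, 100],
    "energy_demand_gwh": [400, 380, 360, 340, 300, 330, 360, 390],
})

# One Pearson coefficient per city
corr_by_city = {
    name: group["temp_avg_f"].corr(group["energy_demand_gwh"])
    for name, group in toy.groupby("city")
}

for name, value in corr_by_city.items():
    print(f"{name}: {value:.2f}")
```

In this toy data, demand falls with temperature in city A and rises with it in city B, so the two coefficients have opposite signs. Replacing `toy` with `df` would apply the same comparison to the real data.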

## Usage Patterns Heatmap

This heatmap shows average energy demand by temperature range and day of week.
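
To make the binning concrete, here is a small sketch of the same steps on toy data: cut temperatures into left-closed ranges, label each row with its day of week, then average demand per cell. The shortened edge list and the toy values are hypothetical and chosen only for illustration.

```python
import pandas as pd

# Shortened, hypothetical edges; the notebook cell below uses a longer list
edges = [50, 60, 70]
names = ['50-60°F', '60-70°F']

# Toy data for illustration only
toy = pd.DataFrame({
    "date": pd.to_datetime(["2024-01-01", "2024-01-08", "2024-01-02", "2024-01-09"]),
    "temp_avg_f": [52.0, 58.0, 60.0, 69.9],
    "energy_demand_gwh": [400.0, 420.0, 380.0, 360.0],
})

# right=False makes each bin include its lower edge and exclude its upper edge,
# so 60.0 falls into '60-70°F' rather than '50-60°F'
toy["temp_range"] = pd.cut(toy["temp_avg_f"], bins=edges, labels=names, right=False)
toy["day_of_week"] = toy["date"].dt.day_name()

# Mean demand per (temperature range, day of week); empty cells become NaN
table = (
    toy.groupby(["temp_range", "day_of_week"], observed=False)["energy_demand_gwh"]
    .mean()
    .unstack()
)
print(table)
```

Cells with no observations stay empty (NaN), which is worth keeping in mind when reading sparse regions of the heatmap.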

In [None]:
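# Prepare temperature bins and day of week.
# The first bin starts at -inf so sub-zero temperatures are not dropped,
# and right=False makes each bin include its lower edge (so the last bin is >= 90).
bins = [-np.inf, 50, 60, 70, 80, 90, np.inf]
labels = ['<50°F', '50-60°F', '60-70°F', '70-80°F', '80-90°F', '≥90°F']
# assign() returns a new frame, which avoids pandas' SettingWithCopyWarning on the filtered slice
city_df = city_df.assign(
    temp_range=pd.cut(city_df['temp_avg_f'], bins=bins, labels=labels, right=False),
    day_of_week=city_df['date'].dt.day_name(),
)

# Pivot for heatmap; observed=False keeps temperature ranges that have no rows
heatmap_data = (
    city_df.groupby(['temp_range', 'day_of_week'], observed=False)['energy_demand_gwh']
    .mean()
    .unstack()
)
day_order = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']
heatmap_data = heatmap_data.reindex(columns=day_order)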

plt.figure(figsize=(10,6))
sns.heatmap(heatmap_data, annot=True, fmt='.0f', cmap='coolwarm')
plt.title(f'Usage Patterns Heatmap for {city}')
plt.ylabel('Temperature Range')
plt.xlabel('Day of Week')
plt.show()

## Notes & Business Rules

- Weather data is converted from Celsius to Fahrenheit in the pipeline.
- Data quality checks are performed before analysis.
- The dashboard warns if the selected date range has less than 20°F of temperature variation.
- Heatmap bins are designed to show how energy demand changes with temperature and day of week.
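
Two of the rules above can be sketched in a few lines. This is not the pipeline's or the dashboard's actual code: the function names are hypothetical, and "temperature variation" is assumed here to mean the maximum minus the minimum over the selected range.

```python
import pandas as pd

MIN_VARIATION_F = 20.0  # threshold taken from the note above


def celsius_to_fahrenheit(temp_c):
    """Standard conversion: F = C * 9/5 + 32. Works on scalars and Series."""
    return temp_c * 9.0 / 5.0 + 32.0


def has_low_variation(temps_f: pd.Series, threshold: float = MIN_VARIATION_F) -> bool:
    """Return True when the temperature spread is below the threshold.

    Assumption: variation is measured as max minus min.
    """
    return bool(temps_f.max() - temps_f.min() < threshold)


# Toy Celsius readings for illustration only
temps_c = pd.Series([0.0, 5.0, 10.0])
temps_f = celsius_to_fahrenheit(temps_c)

low = has_low_variation(temps_f)
if low:
    print("Warning: selected range has less than 20°F of temperature variation.")
```

A different definition of variation (for example, standard deviation) would change which date ranges trigger the warning, so the real rule should be checked against the dashboard code.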