# ⚡ Thailand Energy Demand Forecasting – Exploratory Data Analysis (EDA)

## Project Overview
This notebook performs exploratory data analysis (EDA) on Thailand electricity demand data.
The goal is to understand demand patterns, seasonality, trends, and data quality issues 
in preparation for time series forecasting models.

### Objectives:
- Understand data structure and quality
- Analyze temporal patterns
- Identify trends and seasonality
- Prepare data for feature engineering and modeling

Dataset Source: Thailand Government Open Data  
Domain: Energy  


In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

plt.rcParams["figure.figsize"] = (12,5)


## Load Dataset

We load the electricity demand dataset and inspect its structure.


In [None]:
df = pd.read_csv("../data/raw/energy_thailand.csv")
df.head()


## Initial Data Inspection

We examine data types, missing values, and general structure.


In [None]:
df.info()
df.describe()


## Datetime Processing

Convert the datetime column to pandas datetime and sort the data chronologically.


In [None]:
df['datetime'] = pd.to_datetime(df['datetime'])
df = df.sort_values('datetime')
df.set_index('datetime', inplace=True)

df.head()
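
Sorting alone does not reveal whether timestamps are missing. The sketch below shows one way to check for gaps. It uses a small made-up sample rather than the real dataset and assumes readings are expected every hour. The names `sample`, `expected`, and `sample_full` are illustrative only.

```python
import pandas as pd

# Hypothetical hourly sample with one missing timestamp (03:00) to illustrate the check.
idx = pd.to_datetime([
    "2023-01-01 00:00", "2023-01-01 01:00", "2023-01-01 02:00",
    "2023-01-01 04:00", "2023-01-01 05:00",
])
sample = pd.DataFrame({"total_demand": [100.0, 95.0, 90.0, 92.0, 98.0]}, index=idx)

# Assumption: the data is expected at a regular hourly frequency.
expected = pd.date_range(sample.index.min(), sample.index.max(), freq=pd.Timedelta(hours=1))
missing_timestamps = expected.difference(sample.index)

print("Index is sorted:", sample.index.is_monotonic_increasing)
print("Missing timestamps:", list(missing_timestamps))

# Reindexing turns each gap into an explicit NaN row that later cleaning steps can see.
sample_full = sample.reindex(expected)
```

If the real data uses a different interval, the `freq` argument would need to change accordingly.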


## Construct Total Electricity Demand

We aggregate regional demand values into a single total demand feature.


In [None]:
demand_cols = [
    'north_demand',
    'south_demand',
    'central_demand',
    'northeast_demand',
    'metropolitan_demand'
]

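# min_count makes the total NaN when any region is missing, instead of silently
# treating the gap as zero; the cleaning step below then fills or drops it.
df['total_demand'] = df[demand_cols].sum(axis=1, min_count=len(demand_cols))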
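
It is worth checking how missing regional values affect the total, because `sum` skips NaN by default. The sketch below contrasts the default with the `min_count` argument on a made-up two-region sample. The values and the reduced column list are illustrative, not taken from the dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical two-region sample; the second row is missing one regional value.
sample = pd.DataFrame({
    "north_demand": [10.0, 12.0],
    "south_demand": [20.0, np.nan],
})
cols = ["north_demand", "south_demand"]

# Default behaviour: NaN is skipped, so the second total silently omits a region.
total_default = sample[cols].sum(axis=1)

# With min_count equal to the number of regions, any missing value yields NaN.
total_strict = sample[cols].sum(axis=1, min_count=len(cols))

print(total_default.tolist())
print(total_strict.tolist())
```

A NaN total is easier to spot and handle during cleaning than an understated one.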


## Data Cleaning

Check and handle missing values and duplicates.


In [None]:
df.isnull().sum()


In [None]:
df = df.interpolate(method='time')
df = df.dropna()
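
To illustrate what `method='time'` does, the sketch below compares it with positional linear interpolation on a made-up series with irregular spacing. It also shows the `limit` argument, which caps how many consecutive missing values are filled. Whether long gaps in the real data should be filled at all is a judgment call that this example does not settle.

```python
import numpy as np
import pandas as pd

# Hypothetical irregular sample: readings at 00:00, 01:00 and 04:00, with 01:00 missing.
idx = pd.to_datetime(["2023-01-01 00:00", "2023-01-01 01:00", "2023-01-01 04:00"])
s = pd.Series([10.0, np.nan, 40.0], index=idx)

# method="time" weights by elapsed time: 1 of 4 hours has passed, so 10 + 30 * 1/4.
by_time = s.interpolate(method="time")

# method="linear" ignores the index and treats the points as equally spaced.
by_position = s.interpolate(method="linear")
print(by_time.iloc[1], by_position.iloc[1])

# limit=1 fills only the first value of a longer gap and leaves the rest as NaN.
hourly_idx = pd.date_range("2023-01-01", periods=5, freq=pd.Timedelta(hours=1))
gappy = pd.Series([1.0, np.nan, np.nan, np.nan, 5.0], index=hourly_idx)
limited = gappy.interpolate(method="time", limit=1)
print(limited.tolist())
```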


In [None]:
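print("Duplicate timestamps:", df.index.duplicated().sum())
df = df[~df.index.duplicated(keep='first')]  # keep the first reading per timestamp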


## Overall Electricity Demand Trend

Visualize long-term trends in electricity demand.


In [None]:
df['total_demand'].plot(title="Total Electricity Demand Over Time")
plt.show()


## Hourly Demand Pattern

Analyze average demand by hour of the day.


In [None]:
df['hour'] = df.index.hour
df.groupby('hour')['total_demand'].mean().plot(title="Average Hourly Demand")
plt.show()


## Weekly Demand Pattern

Analyze average demand by day of week.


In [None]:
df['weekday'] = df.index.weekday
df.groupby('weekday')['total_demand'].mean().plot(title="Average Demand by Weekday")
plt.show()
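
The hourly and weekday averages can also be combined into one table to see whether the daily shape differs between weekdays and weekends. The sketch below does this on a synthetic two-week series, not the real dataset. The sine-shaped daily cycle and the 10% weekend reduction are assumptions made purely for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic two-week hourly series starting on a Monday: a daily sine cycle,
# lowered by 10% on weekends.
idx = pd.date_range("2023-01-02", periods=24 * 14, freq=pd.Timedelta(hours=1))
hours = idx.hour.to_numpy()
weekdays = idx.weekday.to_numpy()
demand = 100 + 20 * np.sin(2 * np.pi * hours / 24)
demand = np.where(weekdays >= 5, demand * 0.9, demand)
sample = pd.DataFrame({"total_demand": demand}, index=idx)

sample["hour"] = sample.index.hour
sample["weekday"] = sample.index.weekday  # 0 = Monday, 6 = Sunday

# Rows are hours of the day, columns are weekdays, cells are mean demand.
profile = sample.pivot_table(
    index="hour", columns="weekday", values="total_demand", aggfunc="mean"
)
print(profile.round(1).head())
```

A table in this shape can be passed to `sns.heatmap` if a visual comparison is preferred.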


## Monthly Seasonality

Explore seasonal trends across months.


In [None]:
df['month'] = df.index.month
df.groupby('month')['total_demand'].mean().plot(title="Monthly Average Demand")
plt.show()


## Demand Distribution

Examine the distribution of electricity demand values.


In [None]:
sns.histplot(df['total_demand'], bins=50, kde=True)
plt.title("Distribution of Total Demand")
plt.show()


## Resampling

Resample the data to daily totals to smooth out intraday variation before forecasting.


In [None]:
df_daily = df['total_demand'].resample('D').sum()
df_daily.plot(title="Daily Electricity Demand")
plt.show()
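
Daily sums are only comparable when every day has the same number of readings, so a partial first or last day can look like a sudden drop in demand. The sketch below counts readings per day alongside the sum and keeps only complete days. It uses synthetic data and assumes hourly readings, so 24 per day. That assumption would need adjusting for a different sampling interval.

```python
import numpy as np
import pandas as pd

# Synthetic hourly sample: two full days plus a partial third day (6 hours only).
idx = pd.date_range("2023-01-01", periods=54, freq=pd.Timedelta(hours=1))
hourly = pd.Series(np.full(len(idx), 100.0), index=idx, name="total_demand")

# Aggregate the sum and the number of readings for each calendar day.
daily = hourly.resample("D").agg(["sum", "count"])

# Assumption: a complete day has 24 hourly readings.
complete = daily.loc[daily["count"] == 24, "sum"]
print(daily)
```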


## Key Insights from EDA

- Electricity demand shows clear daily and weekly patterns in the hourly and weekday averages.
- Seasonal variation is visible in the monthly averages.
- Demand peaks occur during daytime hours.
- These recurring patterns make the dataset suitable for time series forecasting.
- Daily resampling smooths intraday noise, which can make models more stable at the cost of hourly detail.

These insights guide feature engineering and model selection.


## Next Steps

- Perform feature engineering (lags, rolling means, calendar features)
- Train baseline forecasting models
- Apply advanced models (ARIMA, XGBoost, LSTM)
- Evaluate performance using MAE and RMSE
- Build an API and a Streamlit dashboard
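
As a rough sketch of the first few steps above, the example below builds lag, rolling-mean, and calendar features on a synthetic daily series and scores a seasonal-naive baseline with MAE and RMSE. The data, the 7-day lag, the 14-day holdout, and the names `features` and `holdout` are all assumptions for illustration, not choices made in this notebook.

```python
import numpy as np
import pandas as pd

# Synthetic daily series (not the real dataset): weekly cycle plus a slow trend.
idx = pd.date_range("2023-01-01", periods=70, freq="D")
y = pd.Series(
    1000 + 50 * np.sin(2 * np.pi * idx.dayofweek.to_numpy() / 7) + np.arange(70),
    index=idx,
    name="total_demand",
)

features = pd.DataFrame({"y": y})
features["lag_1"] = y.shift(1)                          # previous day
features["lag_7"] = y.shift(7)                          # same weekday one week earlier
features["roll_mean_7"] = y.shift(1).rolling(7).mean()  # mean of the 7 days before today
features["weekday"] = features.index.dayofweek
features = features.dropna()

# Seasonal-naive baseline: predict each day with the value from 7 days earlier.
holdout = features.iloc[-14:]
errors = holdout["y"] - holdout["lag_7"]
mae = errors.abs().mean()
rmse = np.sqrt((errors ** 2).mean())
print(f"MAE={mae:.2f}  RMSE={rmse:.2f}")
```

Shifting before taking the rolling mean keeps the current day's value out of its own feature, which avoids leaking the target into the inputs.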
