ENERGY CONSUMPTION FORECASTING

The goal of this project is to develop a predictive model that accurately forecasts household electricity consumption using historical data. By analyzing patterns in electricity usage, we aim to identify trends, seasonal variations, and potential anomalies in energy consumption.

Accurate forecasts of household energy consumption help utility providers optimize power distribution, reduce energy waste, and support sustainable energy management.

Objective:

Predict future household energy usage based on historical consumption data.

Evaluate forecasting models (ARIMA, Prophet, LSTM) for accuracy.
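
The objective above speaks of "accuracy" without naming a metric. As a minimal sketch, assuming the models are compared on MAE, RMSE, and MAPE (a choice made here for illustration, not stated in the notebook), a helper could look like the following. The function name `forecast_errors` and the sample numbers are hypothetical.

```python
import numpy as np

def forecast_errors(actual, predicted):
    """Return MAE, RMSE and MAPE (in percent) for two equal-length sequences."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    errors = actual - predicted
    mae = np.mean(np.abs(errors))
    rmse = np.sqrt(np.mean(errors ** 2))
    # MAPE divides by the actual values, so it is undefined if any of them is zero
    mape = np.mean(np.abs(errors / actual)) * 100
    return {"MAE": mae, "RMSE": rmse, "MAPE": mape}

# Illustrative values only, not results from this dataset
scores = forecast_errors([1.0, 2.0, 4.0], [1.5, 2.0, 3.0])
print(scores)
```

Scoring every model with the same function on the same held-out period keeps the ARIMA, Prophet, and LSTM comparisons consistent.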



In [None]:
#  Import all libraries


import pandas as pd
import numpy as np

import matplotlib.pyplot as plt
import seaborn as sns

from statsmodels.tsa.arima.model import ARIMA
from statsmodels.tsa.seasonal import seasonal_decompose
from prophet import Prophet

from sklearn.model_selection import train_test_split

import warnings
warnings.filterwarnings("ignore")


Dataset:

Source: UCI Machine Learning Repository – Individual Household Electric Power Consumption

Time Range: December 2006 – November 2010

Frequency: Minute-level measurements

Key Features:

Global Active Power (kilowatts) – target variable

Global Reactive Power

Voltage

Global Intensity

Sub-metering values (1, 2, 3)

In [None]:
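# Load dataset ('?' marks missing readings in the raw file)
data = pd.read_csv("household_power_consumption.txt", sep=';',
                   low_memory=False, na_values=['?'])

# Combine Date and Time into one Datetime column.
# Dates are day-first (e.g. 16/12/2006), so the format is given explicitly.
data['Datetime'] = pd.to_datetime(data['Date'] + ' ' + data['Time'],
                                  format='%d/%m/%Y %H:%M:%S')
data = data.drop(columns=['Date', 'Time'])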

# Check shape & info
print("Shape:", data.shape)
data.info()  # info() prints its summary directly and returns None
data.head()


In [None]:
# Missing values
print(data.isnull().sum())

# Convert columns to numeric
cols = data.columns.drop('Datetime')
data[cols] = data[cols].apply(pd.to_numeric, errors='coerce')

# Fill missing values (forward fill: carry the last valid reading forward)
data.ffill(inplace=True)

# Plot global active power
plt.figure(figsize=(15,5))
plt.plot(data['Datetime'], data['Global_active_power'])
plt.title("Global Active Power over Time")
plt.xlabel("Time")
plt.ylabel("Global Active Power (kilowatts)")
plt.show()


In [None]:
# Set datetime as index
data.set_index('Datetime', inplace=True)

# Resample to daily mean
daily_data = data['Global_active_power'].resample('D').mean()

# Optional: log transform
daily_data_log = np.log(daily_data)
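
If the models are fitted on the log-transformed series, their forecasts come out on the log scale and need `np.exp` to get back to kilowatts before plotting or scoring. The sketch below shows the round trip on a small hypothetical series (the values are illustrative, not taken from the dataset). Note that `np.log` requires strictly positive inputs.

```python
import numpy as np
import pandas as pd

# Hypothetical daily averages in kilowatts (illustrative only)
daily_kw = pd.Series([1.2, 0.9, 1.5],
                     index=pd.date_range("2007-01-01", periods=3, freq="D"))

log_kw = np.log(daily_kw)      # scale a model would be trained on
restored_kw = np.exp(log_kw)   # back to kilowatts for reporting and error metrics

print(restored_kw)
```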


In [None]:
# Chronological train-test split (first 80% for training, last 20% for testing; no shuffling)
train_size = int(len(daily_data_log) * 0.8)
train = daily_data_log[:train_size]
test = daily_data_log[train_size:]

print("Training set size:", train.shape)
print("Testing set size:", test.shape)
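
Because this is a time series, the split is positional rather than random: the test period must come strictly after the training period, otherwise the model would see the future. A minimal sketch of that check on a toy series follows; the helper name `chronological_split` and the sample data are hypothetical.

```python
import numpy as np
import pandas as pd

def chronological_split(series, train_fraction=0.8):
    """Split a time-indexed series by position, keeping temporal order."""
    split_at = int(len(series) * train_fraction)
    return series.iloc[:split_at], series.iloc[split_at:]

# Toy daily series (illustrative only)
toy = pd.Series(np.arange(10, dtype=float),
                index=pd.date_range("2010-01-01", periods=10, freq="D"))

train_part, test_part = chronological_split(toy)

# Every training timestamp precedes every test timestamp
no_leakage = train_part.index.max() < test_part.index.min()
print(len(train_part), len(test_part), no_leakage)
```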
