# Task 03 - Energy Consumption Time Series Forecasting

## Dataset Introduction 

The Household Power Consumption dataset contains detailed measurements of electric power usage recorded from a single household at one-minute intervals over a long period.

For this project, the dataset is used to analyze historical energy consumption patterns and to forecast short-term household electricity usage. By leveraging the time-based nature of the data, we aim to identify temporal trends, daily cycles, and behavioral patterns in energy usage. The dataset's high-frequency measurements make it well suited to time series forecasting techniques.

## EDA

In this step, I load the household power consumption dataset, drop unnecessary columns, and combine the separate date and time columns into a single datetime index. This conversion is essential for time series analysis because it allows us to resample, visualize, and forecast energy consumption over time.
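
To illustrate why the datetime index matters, here is a minimal sketch on made-up data (the `toy` frame and its values are hypothetical, not taken from the real file). It combines `Date` and `Time` in the same way as the cells below, then uses the resulting index to resample one-minute readings to hourly means.

```python
import pandas as pd

# Hypothetical toy data: six one-minute readings, not from the real dataset
toy = pd.DataFrame(
    {
        "Date": ["1/1/07"] * 6,
        "Time": ["0:00:00", "0:01:00", "0:02:00", "1:00:00", "1:01:00", "1:02:00"],
        "Global_active_power": [2.0, 4.0, 6.0, 1.0, 2.0, 3.0],
    }
)

# Combine the two text columns and parse them into a single timestamp
toy["DateTime"] = pd.to_datetime(toy["Date"] + " " + toy["Time"], dayfirst=True)
toy = toy.set_index("DateTime").drop(columns=["Date", "Time"])

# With a DatetimeIndex in place, resampling to hourly means is a single call
hourly = toy["Global_active_power"].resample("h").mean()
print(hourly)
```

Without the datetime index, `resample` would not know how to group the rows into time buckets.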

In [1]:
import pandas as pd
import os

In [5]:
print(os.path.exists('01 - raw_household_power_consumption.csv'))

True


In [6]:
df = pd.read_csv('01 - raw_household_power_consumption.csv')

In [7]:
df.head()

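| Unnamed: 0 | index | Date | Time | Global_active_power | Global_reactive_power | Voltage | Global_intensity | Sub_metering_1 | Sub_metering_2 | Sub_metering_3 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 0 | 1/1/07 | 0:00:00 | 2.58 | 0.136 | 241.97 | 10.6 | 0 | 0 | 0.0 |
| 1 | 1 | 1/1/07 | 0:01:00 | 2.552 | 0.1 | 241.75 | 10.4 | 0 | 0 | 0.0 |
| 2 | 2 | 1/1/07 | 0:02:00 | 2.55 | 0.1 | 241.64 | 10.4 | 0 | 0 | 0.0 |
| 3 | 3 | 1/1/07 | 0:03:00 | 2.55 | 0.1 | 241.71 | 10.4 | 0 | 0 | 0.0 |
| 4 | 4 | 1/1/07 | 0:04:00 | 2.554 | 0.1 | 241.98 | 10.4 | 0 | 0 | 0.0 |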


In [None]:
df.info()

In [None]:
df.drop(columns='index', inplace=True) # drop index column

In [None]:
df['DateTime'] = df['Date'] + ' ' + df['Time']

df['DateTime'] = pd.to_datetime(df['DateTime'], dayfirst=True, errors='coerce')
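
Because `errors='coerce'` silently turns unparseable strings into `NaT`, it can be worth counting how many timestamps were lost. The sketch below uses hypothetical toy strings (one deliberately malformed) to show that check, along with the effect of `dayfirst=True` on an ambiguous date.

```python
import pandas as pd

# Hypothetical toy strings; the last one is deliberately malformed
raw = pd.Series(["1/1/07 0:00:00", "2/1/07 0:01:00", "not a date"])

parsed = pd.to_datetime(raw, dayfirst=True, errors="coerce")

# errors='coerce' yields NaT for unparseable strings instead of raising
n_bad = int(parsed.isna().sum())
print(f"{n_bad} of {len(parsed)} timestamps could not be parsed")

# dayfirst=True reads "2/1/07" as 2 January 2007 rather than 1 February 2007
print(parsed.iloc[1])
```

On the real data, the same count could be obtained with `df['DateTime'].isna().sum()` before the column is set as the index.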

In [None]:
df.set_index('DateTime', inplace=True) # set the DateTime column as index

In [None]:
df.drop(columns=['Date'], inplace=True) # drop date column

In [None]:
df.drop(columns=['Time'], inplace=True) # drop time column

In [None]:
# converting datatypes
numeric_columns = ['Global_active_power', 'Global_reactive_power', 'Voltage', 'Global_intensity', 'Sub_metering_1', 'Sub_metering_2', 'Sub_metering_3']

for col in numeric_columns:
    df[col] = pd.to_numeric(df[col], errors='coerce')

In [None]:
df.info()

In [None]:
df.isnull().sum() # checking null values

In [None]:
# filling null values with 0
df.fillna(0, inplace=True)
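
Note that `fillna(0)` writes a literal 0 into every column, including `Voltage`. As a possible alternative, which this notebook does not use, missing readings could be interpolated along the time axis. The sketch below compares the two on a hypothetical toy series; the values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical one-minute voltage readings with a two-minute gap
idx = pd.date_range("2007-01-01 00:00", periods=5, freq="min")
voltage = pd.Series([241.0, np.nan, np.nan, 244.0, 245.0], index=idx, name="Voltage")

# Approach used in this notebook: replace gaps with 0
zero_filled = voltage.fillna(0)

# Alternative sketch: linear interpolation weighted by the time index
interpolated = voltage.interpolate(method="time")

print(pd.DataFrame({"zero_filled": zero_filled, "interpolated": interpolated}))
```

Which approach is appropriate depends on how the forecasting model should treat the gaps; this is only a comparison, not a recommendation derived from the dataset.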

In [None]:
df.isnull().sum() # checking null values again

In [None]:
# Check if datetime index is sorted
df.index.is_monotonic_increasing
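
If the check above were to return `False`, the index could be sorted before saving. A minimal sketch on hypothetical out-of-order toy data, which also counts duplicate timestamps as an extra sanity check:

```python
import pandas as pd

# Hypothetical toy frame whose timestamps are deliberately out of order
idx = pd.to_datetime(["2007-01-01 00:02", "2007-01-01 00:00", "2007-01-01 00:01"])
toy = pd.DataFrame({"Global_active_power": [2.55, 2.58, 2.552]}, index=idx)

# Sort chronologically only when the index is not already ordered
if not toy.index.is_monotonic_increasing:
    toy = toy.sort_index()

# Duplicate timestamps can also cause trouble when resampling or slicing
n_dupes = int(toy.index.duplicated().sum())
print(toy.index.is_monotonic_increasing, n_dupes)
```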

In [None]:
df.to_csv('cleaned_household_power_consumption.csv') # saving the cleaned dataset
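
When the cleaned file is read back in a later step, the datetime index has to be restored explicitly, because CSV stores it as plain text. The sketch below shows one way to do that with `index_col` and `parse_dates`; it uses a hypothetical toy frame and a temporary file (`toy_cleaned.csv`) so that it runs without the real data.

```python
import os
import tempfile

import pandas as pd

# Hypothetical toy frame with a named DatetimeIndex, standing in for the cleaned data
idx = pd.date_range("2007-01-01", periods=3, freq="min", name="DateTime")
toy = pd.DataFrame({"Global_active_power": [2.58, 2.552, 2.55]}, index=idx)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "toy_cleaned.csv")  # hypothetical file name
    toy.to_csv(path)

    # index_col picks the saved index column; parse_dates=True parses it as datetimes
    reloaded = pd.read_csv(path, index_col="DateTime", parse_dates=True)

print(type(reloaded.index).__name__)
```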