<a href="https://colab.research.google.com/github/AechGit/Energy-Consumption-Project-Springboard/blob/main/Energy_Consumption_Project.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

**Problem Statement:**
In this project, the goal is to develop a predictive model that can forecast future energy consumption values for a power grid based on historical consumption patterns. The dataset contains time series data of energy consumption recorded at regular intervals.

**Dataset Name: Household Power Consumption Data**

Dataset Overview: This dataset contains detailed measurements of electric power consumption in a single household over a period of time. The dataset captures electrical energy usage, including active and reactive power, voltage, current intensity, and energy sub-metering. It is structured in a time series format, where each record represents one minute of power consumption data.

Features:

**Date**: The date when the data was recorded, in the format dd/mm/yyyy.

**Time**: The time when the data was recorded, in the format hh:mm:ss.

**Global Active Power** (kilowatts): The total active power consumed by the household, measured in kilowatts (kW). This is the primary measure of power consumption and represents the rate at which the household consumes electricity.

**Global Reactive Power** (kilowatts): The reactive power consumed by the household, measured in kilowatts (kW). Reactive power is the power that flows back and forth between the source and the load, and it is important for maintaining voltage levels in the system.

**Voltage (volts)**: The household voltage measured during each recording. It indicates the potential difference in the electrical system and is crucial for ensuring safe and efficient power distribution.

**Global Intensity** (amps): The intensity of the electrical current flowing through the household, measured in amperes (A). It reflects the overall current drawn by the household appliances.

**Sub-Metering 1** (watt-hour): Active energy in watt-hours measured by sub-metering channel 1. According to the UCI documentation for this dataset, this channel covers the kitchen, mainly a dishwasher, an oven, and a microwave.

**Sub-Metering 2** (watt-hour): Active energy in watt-hours measured by sub-metering channel 2. According to the UCI documentation, this channel covers the laundry room: a washing machine, a tumble dryer, a refrigerator, and a light.

**Sub-Metering 3** (watt-hour): Active energy in watt-hours measured by sub-metering channel 3. According to the UCI documentation, this channel covers an electric water heater and an air conditioner. Consumption on the remaining household circuits, such as lighting or general power outlets, is not sub-metered.
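
The UCI documentation for this dataset gives the expression `Global_active_power * 1000 / 60 - Sub_metering_1 - Sub_metering_2 - Sub_metering_3` for the active energy per minute (in watt-hours) used by equipment outside the three sub-metered circuits. Below is a minimal sketch of that conversion, assuming this file is the UCI release. The function name `unmetered_energy_wh` is hypothetical, and the inputs are the first record shown by `df.head()`.

```python
def unmetered_energy_wh(global_active_power_kw, sub_1_wh, sub_2_wh, sub_3_wh):
    """Active energy (Wh) used in one minute outside the three sub-metered circuits."""
    # kW averaged over one minute -> Wh: x1000 (W per kW), /60 (minutes per hour)
    total_wh = global_active_power_kw * 1000 / 60
    return total_wh - sub_1_wh - sub_2_wh - sub_3_wh


# First record of the dataset: 4.216 kW, sub-meters at 0.0, 1.0 and 17.0 Wh
remainder = unmetered_energy_wh(4.216, 0.0, 1.0, 17.0)
print(round(remainder, 3))  # 52.267
```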




In [1]:
!gdown --fuzzy https://drive.google.com/file/d/1bvaXJJqNObOCkX-i475BNxpidk024pyx/view?usp=sharing


Downloading...
From (original): https://drive.google.com/uc?id=1bvaXJJqNObOCkX-i475BNxpidk024pyx
From (redirected): https://drive.google.com/uc?id=1bvaXJJqNObOCkX-i475BNxpidk024pyx&confirm=t&uuid=f3f2e305-8cd6-49d1-9ded-c5b048c55507
To: /content/household_power_consumption.txt
100% 133M/133M [00:01<00:00, 99.9MB/s]


In [3]:
import pandas as pd
import numpy as np
df=pd.read_csv('/content/household_power_consumption.txt',sep=";")

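A hedged alternative for loading the file: the UCI release of this dataset marks missing readings with `?`, which makes pandas read the affected measurement columns as strings unless told otherwise. The sketch below assumes that placeholder and parses a small in-memory sample instead of the real file. The second row is made up to illustrate a missing reading.

```python
import io

import pandas as pd

# Two-row stand-in for household_power_consumption.txt (second row is illustrative)
sample_text = (
    "Date;Time;Global_active_power;Global_reactive_power;Voltage;"
    "Global_intensity;Sub_metering_1;Sub_metering_2;Sub_metering_3\n"
    "16/12/2006;17:24:00;4.216;0.418;234.840;18.400;0.000;1.000;17.000\n"
    "17/12/2006;00:00:00;?;?;?;?;?;?;\n"
)

# na_values="?" turns the placeholder into NaN while parsing;
# low_memory=False reads each column in one pass, avoiding mixed-type warnings
sample_df = pd.read_csv(
    io.StringIO(sample_text), sep=";", na_values="?", low_memory=False
)

print(sample_df.dtypes)
print(sample_df.isnull().sum())
```

With the real file, the same two arguments can be passed alongside the path and separator used in the cell above.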


In [12]:
df.head()

| | Date | Time | Global_active_power | Global_reactive_power | Voltage | Global_intensity | Sub_metering_1 | Sub_metering_2 | Sub_metering_3 |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 16/12/2006 | 17:24:00 | 4.216 | 0.418 | 234.84 | 18.4 | 0.0 | 1.0 | 17.0 |
| 1 | 16/12/2006 | 17:25:00 | 5.36 | 0.436 | 233.63 | 23.0 | 0.0 | 1.0 | 16.0 |
| 2 | 16/12/2006 | 17:26:00 | 5.374 | 0.498 | 233.29 | 23.0 | 0.0 | 2.0 | 17.0 |
| 3 | 16/12/2006 | 17:27:00 | 5.388 | 0.502 | 233.74 | 23.0 | 0.0 | 1.0 | 17.0 |
| 4 | 16/12/2006 | 17:28:00 | 3.666 | 0.528 | 235.68 | 15.8 | 0.0 | 1.0 | 17.0 |


In [13]:
df.tail()

| | Date | Time | Global_active_power | Global_reactive_power | Voltage | Global_intensity | Sub_metering_1 | Sub_metering_2 | Sub_metering_3 |
|---|---|---|---|---|---|---|---|---|---|
| 2075254 | 26/11/2010 | 20:58:00 | 0.946 | 0.0 | 240.43 | 4.0 | 0.0 | 0.0 | 0.0 |
| 2075255 | 26/11/2010 | 20:59:00 | 0.944 | 0.0 | 240.0 | 4.0 | 0.0 | 0.0 | 0.0 |
| 2075256 | 26/11/2010 | 21:00:00 | 0.938 | 0.0 | 239.82 | 3.8 | 0.0 | 0.0 | 0.0 |
| 2075257 | 26/11/2010 | 21:01:00 | 0.934 | 0.0 | 239.7 | 3.8 | 0.0 | 0.0 | 0.0 |
| 2075258 | 26/11/2010 | 21:02:00 | 0.932 | 0.0 | 239.55 | 3.8 | 0.0 | 0.0 | 0.0 |


In [14]:
df.describe()

| | Global_active_power | Global_reactive_power | Voltage | Global_intensity | Sub_metering_1 | Sub_metering_2 | Sub_metering_3 |
|---|---|---|---|---|---|---|---|
| count | 2075259.0 | 2075259.0 | 2075259.0 | 2075259.0 | 2075259.0 | 2075259.0 | 2075259.0 |
| mean | 1.07795 | 0.1221658 | 237.8249 | 4.569827 | 1.107879 | 1.282265 | 6.377598 |
| std | 1.057642 | 0.1128556 | 26.97024 | 4.446361 | 6.115669 | 5.787271 | 8.414871 |
| min | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 25% | 0.304 | 0.046 | 238.89 | 1.4 | 0.0 | 0.0 | 0.0 |
| 50% | 0.578 | 0.1 | 240.96 | 2.6 | 0.0 | 0.0 | 1.0 |
| 75% | 1.52 | 0.192 | 242.86 | 6.4 | 0.0 | 1.0 | 17.0 |
| max | 11.122 | 1.39 | 254.15 | 48.4 | 88.0 | 80.0 | 31.0 |
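
A minimum `Voltage` of 0.0 sits far below the 25th percentile of 238.89, so it may reflect missing readings that were replaced with zeros rather than real measurements. The sketch below uses made-up values to show one way to check how many zeros a column holds and how much they pull the mean down. Treating zeros as missing is an assumption, not something the dataset confirms.

```python
import numpy as np
import pandas as pd

# Illustrative voltages; the two zeros stand in for filled-in gaps
voltage = pd.Series([234.84, 0.0, 233.63, 0.0, 240.43], name="Voltage")

# Share of rows that are exactly zero
zero_share = (voltage == 0).mean()

# Mean with zeros kept versus zeros treated as missing
mean_with_zeros = voltage.mean()
mean_without_zeros = voltage.replace(0, np.nan).mean()

print(zero_share, mean_with_zeros, mean_without_zeros)
```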


In [15]:
len(df)

2075259

In [16]:
df.shape

(2075259, 9)

In [17]:
df.isnull().any()

Date                     False
Time                     False
Global_active_power      False
Global_reactive_power    False
Voltage                  False
Global_intensity         False
Sub_metering_1           False
Sub_metering_2           False
Sub_metering_3           False
dtype: bool


In [18]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2075259 entries, 0 to 2075258
Data columns (total 9 columns):
 #   Column                 Dtype  
---  ------                 -----  
 0   Date                   object 
 1   Time                   object 
 2   Global_active_power    float64
 3   Global_reactive_power  float64
 4   Voltage                float64
 5   Global_intensity       float64
 6   Sub_metering_1         float64
 7   Sub_metering_2         float64
 8   Sub_metering_3         float64
dtypes: float64(7), object(2)
memory usage: 142.5+ MB


In [19]:
df.isnull().sum()

Date                     0
Time                     0
Global_active_power      0
Global_reactive_power    0
Voltage                  0
Global_intensity         0
Sub_metering_1           0
Sub_metering_2           0
Sub_metering_3           0
dtype: int64


In [20]:
df.columns

Index(['Date', 'Time', 'Global_active_power', 'Global_reactive_power',
       'Voltage', 'Global_intensity', 'Sub_metering_1', 'Sub_metering_2',
       'Sub_metering_3'],
      dtype='object')

In [29]:
df.nunique()


Date                     1442
Time                     1440
Global_active_power      4187
Global_reactive_power     532
Voltage                  2838
Global_intensity          222
Sub_metering_1             88
Sub_metering_2             81
Sub_metering_3             32
dtype: int64
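
These counts allow a quick completeness check. A gap-free minute-level series running from the first timestamp in `df.head()` (16/12/2006 17:24:00) to the last in `df.tail()` (26/11/2010 21:02:00) should have exactly as many rows as there are minutes in that span. A small sketch using only the standard library:

```python
from datetime import datetime, timedelta

start = datetime(2006, 12, 16, 17, 24)  # first timestamp in the data
end = datetime(2010, 11, 26, 21, 2)     # last timestamp in the data

# Whole minutes between the two timestamps, counting both endpoints
expected_rows = (end - start) // timedelta(minutes=1) + 1
print(expected_rows)  # 2075259
```

The result equals `len(df)`, which suggests every minute in the range has a row, even where its readings are missing.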


In [21]:
# Percentage of missing values in each column
null_percentage = (df.isnull().sum() / len(df)) * 100
print(null_percentage)

Date                     0.0
Time                     0.0
Global_active_power      0.0
Global_reactive_power    0.0
Voltage                  0.0
Global_intensity         0.0
Sub_metering_1           0.0
Sub_metering_2           0.0
Sub_metering_3           0.0
dtype: float64


In [22]:
df.fillna(0, inplace=True)  # replace missing values with 0
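
Filling with 0 records a missing reading as zero consumption, which can distort averages and mislead a forecasting model. As a hedged comparison, the sketch below applies zero-filling, forward-filling, and time-based interpolation to a made-up five-minute series. Which strategy suits this dataset is a modeling choice this notebook does not settle.

```python
import numpy as np
import pandas as pd

# Illustrative minute-level series with a two-minute gap
idx = pd.date_range("2006-12-16 17:24", periods=5, freq="min")
power = pd.Series([4.0, np.nan, np.nan, 1.0, 1.2], index=idx,
                  name="Global_active_power")

zero_filled = power.fillna(0)                    # gap becomes 0.0, 0.0
forward_filled = power.ffill()                   # gap repeats the last reading
interpolated = power.interpolate(method="time")  # gap follows a straight line in time

print(pd.DataFrame({"zero": zero_filled,
                    "ffill": forward_filled,
                    "interp": interpolated}))
```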

In [23]:
df.isnull()

| | Date | Time | Global_active_power | Global_reactive_power | Voltage | Global_intensity | Sub_metering_1 | Sub_metering_2 | Sub_metering_3 |
|---|---|---|---|---|---|---|---|---|---|
| 0 | False | False | False | False | False | False | False | False | False |
| 1 | False | False | False | False | False | False | False | False | False |
| 2 | False | False | False | False | False | False | False | False | False |
| 3 | False | False | False | False | False | False | False | False | False |
| 4 | False | False | False | False | False | False | False | False | False |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2075254 | False | False | False | False | False | False | False | False | False |
| 2075255 | False | False | False | False | False | False | False | False | False |
| 2075256 | False | False | False | False | False | False | False | False | False |
| 2075257 | False | False | False | False | False | False | False | False | False |


In [30]:
# Measurement columns to cast to numeric; entries that cannot be parsed become NaN
conv_cols = ['Global_active_power', 'Global_reactive_power', 'Voltage', 'Global_intensity', 'Sub_metering_1', 'Sub_metering_2']

for col in conv_cols:
    df[col] = pd.to_numeric(df[col], errors='coerce')
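
`errors='coerce'` replaces any entry that cannot be parsed as a number with `NaN` instead of raising an error, so the conversion can introduce missing values that earlier null checks did not see. A toy illustration, assuming a `?` placeholder like the one used in the UCI release of this file:

```python
import pandas as pd

# String column with one non-numeric placeholder (illustrative values)
raw = pd.Series(["4.216", "?", "5.360"], name="Global_active_power")

# Unparseable entries become NaN; the result is a float column
converted = pd.to_numeric(raw, errors="coerce")

# Number of missing values created by the conversion itself
n_new_missing = int(converted.isna().sum() - raw.isna().sum())
print(converted.dtype, n_new_missing)
```

Re-running `df.isnull().sum()` after the conversion shows whether any values were coerced.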


In [31]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2075259 entries, 0 to 2075258
Data columns (total 9 columns):
 #   Column                 Dtype  
---  ------                 -----  
 0   Date                   object 
 1   Time                   object 
 2   Global_active_power    float64
 3   Global_reactive_power  float64
 4   Voltage                float64
 5   Global_intensity       float64
 6   Sub_metering_1         float64
 7   Sub_metering_2         float64
 8   Sub_metering_3         float64
dtypes: float64(7), object(2)
memory usage: 142.5+ MB


In [32]:
df.describe(include = object)

| | Date | Time |
|---|---|---|
| count | 2075259 | 2075259 |
| unique | 1442 | 1440 |
| top | 6/12/2008 | 17:24:00 |
| freq | 1440 | 1442 |
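
`Date` and `Time` are still plain strings at this point. For time series work they are usually combined into a single timestamp index. Below is a minimal sketch on a two-row sample, assuming the day-first format described in the feature list. The second row's power value is made up.

```python
import pandas as pd

# Two-row sample in the same string format as the dataset
sample = pd.DataFrame({
    "Date": ["16/12/2006", "6/12/2008"],
    "Time": ["17:24:00", "00:00:00"],
    "Global_active_power": [4.216, 0.5],  # second value is illustrative
})

# Join the two string columns and parse them day-first
sample["datetime"] = pd.to_datetime(
    sample["Date"] + " " + sample["Time"], format="%d/%m/%Y %H:%M:%S"
)

# Use the timestamp as the index and drop the original string columns
sample = sample.set_index("datetime").drop(columns=["Date", "Time"])
print(sample)
```

An explicit `format` keeps a date such as `6/12/2008` from being read as June 12.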
