<a href="https://colab.research.google.com/github/Srijan-Rai/Appliane-Energy-Prediction/blob/main/ApplianceEnergyPrediction.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Appliance Energy Prediction**

---
Energy plays a vital role in households and industries. Many places, especially in the developing world, still experience power outages. Knowing how much energy household appliances consume can help us tackle such problems.

In this project, we will build a machine learning model to predict the energy consumption of the appliances in a house.

The data set was recorded at 10-minute intervals over about 4.5 months. The temperature and humidity conditions in the house were monitored with a ZigBee wireless sensor network. Each wireless node transmitted the temperature and humidity conditions about every 3.3 minutes, and the wireless data was then averaged over 10-minute periods. The energy data was logged every 10 minutes with m-bus energy meters.
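
The averaging step described above can be illustrated with a small sketch. The readings below are hypothetical, the 200-second spacing is only an approximation of "about 3.3 minutes", and aligning the windows to the start of each 10-minute period is an assumption, since the source does not say how the windows were aligned.

```python
import pandas as pd

# Hypothetical raw readings from one sensor node, one sample every 200 s (~3.3 min)
raw_index = pd.date_range(
    "2016-01-11 17:00:00", periods=9, freq=pd.Timedelta(seconds=200)
)
raw = pd.DataFrame(
    {"T1": [19.8, 19.9, 20.0, 20.1, 20.2, 20.3, 20.4, 20.5, 20.6]},
    index=raw_index,
)

# Average all readings that fall inside each 10-minute window
averaged = raw.resample("10min").mean()
print(averaged)
```

Each 10-minute window here contains three raw readings, so every output row is the mean of three samples.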

Weather data from the nearest airport weather station (Chievres Airport, Belgium) was downloaded from a public data set from Reliable Prognosis (rp5.ru) and merged with the experimental data sets using the date and time column.
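
A merge on the date and time column could look like the sketch below. The two small frames are illustrative stand-ins (their values are taken from the first rows of the data set), and the assumption that both sources already share the same 10-minute timestamps is ours; the source only states that the date and time column was used as the key.

```python
import pandas as pd

timestamps = pd.to_datetime(
    ["2016-01-11 17:00:00", "2016-01-11 17:10:00", "2016-01-11 17:20:00"]
)

# Illustrative house measurements
house = pd.DataFrame({"date": timestamps, "Appliances": [60, 60, 50]})

# Illustrative weather records, assumed to be on the same 10-minute grid
weather = pd.DataFrame({"date": timestamps, "T_out": [6.6, 6.483333, 6.366667]})

# Keep every house row and attach the weather reading with the same timestamp
merged = house.merge(weather, on="date", how="left")
print(merged)
```

A left join keeps all house measurements even if a weather record is missing, in which case the weather columns would contain `NaN` for that row.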

Two random variables have been included in the data set for testing the regression models and for filtering out non-predictive attributes (parameters).
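
One possible way to use a random variable as a filter is sketched below on synthetic data. Using the absolute Pearson correlation with the target as the score is an assumption made for illustration, not a method stated in the source, and the column names `signal`, `noise`, `rv`, and `target` are hypothetical.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 2000

# Synthetic demo data: one informative feature, one uninformative feature,
# and a random variable that plays the role of rv1/rv2
demo = pd.DataFrame({
    "signal": rng.normal(size=n),
    "noise": rng.normal(size=n),
    "rv": rng.uniform(0, 50, size=n),
})
demo["target"] = 3.0 * demo["signal"] + rng.normal(size=n)

# Score each feature by its absolute correlation with the target
corr = demo.drop(columns="target").corrwith(demo["target"]).abs()

# Keep only features that score higher than the random variable
threshold = corr["rv"]
candidates = corr.drop("rv")
kept = candidates[candidates > threshold].index.tolist()
print(kept)
```

A feature that cannot beat a purely random column under the chosen score is unlikely to carry useful signal for that score.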

The approach taken to achieve the objective of the project:


*   Understanding the data
*   Data Preprocessing
*   Exploratory Data Analysis
*   Building different machine learning models
*   Choosing the best model based on the necessary evaluation metrics






## **Problem Statement**
We are given a data set and need to predict the appliance energy consumption of a house from the features it provides. This is a supervised learning task, so we will develop models using regression algorithms.
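
As a rough illustration of what a regression baseline involves, the sketch below fits an ordinary least squares model with NumPy on synthetic data and reports the RMSE on a held-out part. The data, the 80/20 chronological split, and the choice of RMSE are assumptions for illustration, not the models or metrics used later in this notebook.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 500

# Synthetic features and target with a known linear relationship
X = rng.normal(size=(n, 3))
true_coef = np.array([2.0, -1.0, 0.5])
y = 10.0 + X @ true_coef + rng.normal(scale=0.1, size=n)

# Hold out the last 20% of the rows for evaluation
split = int(0.8 * n)
X_train, X_test = X[:split], X[split:]
y_train, y_test = y[:split], y[split:]

# Fit ordinary least squares with an intercept column
A_train = np.column_stack([np.ones(len(X_train)), X_train])
coef, *_ = np.linalg.lstsq(A_train, y_train, rcond=None)

# Evaluate on the held-out rows
A_test = np.column_stack([np.ones(len(X_test)), X_test])
pred = A_test @ coef
rmse = float(np.sqrt(np.mean((y_test - pred) ** 2)))
print(f"RMSE: {rmse:.3f}")
```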

In [1]:
# Importing necessary libraries
import warnings

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Use seaborn's dark grid theme for all plots
sns.set_theme(style="darkgrid")
# Suppress warnings to keep the notebook output readable
warnings.filterwarnings("ignore")
%matplotlib inline

## **Reading and understanding the data**

In [2]:
# mounting the drive to obtain the data
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [3]:
# Reading the dataset
path = "/content/drive/My Drive/Colab Notebooks/Appliance Energy Prediction(Regression Project)/data_application_energy.csv"
df = pd.read_csv(path)
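
By default `pd.read_csv` leaves a timestamp column such as `date` as plain strings. The sketch below shows one way to parse it while reading, using a small in-memory CSV as a stand-in for the real file. The three-column excerpt is an assumption made to keep the example short.

```python
from io import StringIO

import pandas as pd

# Two rows copied from the head of the data set, trimmed to three columns
csv_text = """date,Appliances,lights
2016-01-11 17:00:00,60,30
2016-01-11 17:10:00,60,30
"""

# parse_dates converts the listed columns to datetime64 while reading
sample = pd.read_csv(StringIO(csv_text), parse_dates=["date"])

# With real timestamps, time differences between rows can be computed directly
step = sample["date"].diff().iloc[1]
print(sample.dtypes)
print(step)
```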

In [4]:
df.columns

Index(['date', 'Appliances', 'lights', 'T1', 'RH_1', 'T2', 'RH_2', 'T3',
       'RH_3', 'T4', 'RH_4', 'T5', 'RH_5', 'T6', 'RH_6', 'T7', 'RH_7', 'T8',
       'RH_8', 'T9', 'RH_9', 'T_out', 'Press_mm_hg', 'RH_out', 'Windspeed',
       'Visibility', 'Tdewpoint', 'rv1', 'rv2'],
      dtype='object')

## **Dataset Columns**
The dataset consists of the following columns:


* **date** - Timestamp in the format year-month-day hour:minute:second
* **Appliances** - Energy use in Wh (dependent variable)
* **lights** - Energy use of light fixtures in the house, in Wh
* **T1** - Temperature in kitchen area, in Celsius
* **RH_1** - Humidity in kitchen area, in %
* **T2** - Temperature in living room area, in Celsius
* **RH_2** - Humidity in living room area, in %
* **T3** - Temperature in laundry room area, in Celsius
* **RH_3** - Humidity in laundry room area, in %
* **T4** - Temperature in office room, in Celsius
* **RH_4** - Humidity in office room, in %
* **T5** - Temperature in bathroom, in Celsius
* **RH_5** - Humidity in bathroom, in %
* **T6** - Temperature outside the building (north side), in Celsius
* **RH_6** - Humidity outside the building (north side), in %
* **T7** - Temperature in ironing room, in Celsius
* **RH_7** - Humidity in ironing room, in %
* **T8** - Temperature in teenager room 2, in Celsius
* **RH_8** - Humidity in teenager room 2, in %
* **T9** - Temperature in parents room, in Celsius
* **RH_9** - Humidity in parents room, in %
* **T_out** - Temperature outside (from Chievres weather station), in Celsius
* **Press_mm_hg** - Pressure (from Chievres weather station), in mm Hg
* **RH_out** - Humidity outside (from Chievres weather station), in %
* **Windspeed** - Wind speed (from Chievres weather station), in m/s
* **Visibility** - Visibility (from Chievres weather station), in km
* **Tdewpoint** - Dew point temperature (from Chievres weather station), in °C
* **rv1** - Random variable 1, nondimensional
* **rv2** - Random variable 2, nondimensional
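
Because the column names follow a regular pattern, they can be grouped programmatically. The sketch below works on the list of names printed by `df.columns`. The group names (`temperature_cols`, `humidity_cols`, `weather_cols`) are our own labels, introduced only for illustration.

```python
# Column names as printed by df.columns
columns = [
    "date", "Appliances", "lights", "T1", "RH_1", "T2", "RH_2", "T3",
    "RH_3", "T4", "RH_4", "T5", "RH_5", "T6", "RH_6", "T7", "RH_7", "T8",
    "RH_8", "T9", "RH_9", "T_out", "Press_mm_hg", "RH_out", "Windspeed",
    "Visibility", "Tdewpoint", "rv1", "rv2",
]

# Sensor columns: "T" or "RH_" followed by the sensor number
temperature_cols = [c for c in columns if c.startswith("T") and c[1:].isdigit()]
humidity_cols = [c for c in columns if c.startswith("RH_") and c[3:].isdigit()]

# Columns that come from the Chievres weather station
weather_cols = ["T_out", "Press_mm_hg", "RH_out", "Windspeed", "Visibility", "Tdewpoint"]

# The two random variables
random_cols = ["rv1", "rv2"]

print(temperature_cols)
print(humidity_cols)
```

Together with `date`, `Appliances`, and `lights`, these groups account for all 29 columns.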

## **Exploratory Data Analysis**

### **Head and Tail**

In [5]:
# Head of the data frame
df.head()

| | date | Appliances | lights | T1 | RH_1 | T2 | RH_2 | T3 | RH_3 | T4 | ... | T9 | RH_9 | T_out | Press_mm_hg | RH_out | Windspeed | Visibility | Tdewpoint | rv1 | rv2 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2016-01-11 17:00:00 | 60 | 30 | 19.89 | 47.596667 | 19.2 | 44.79 | 19.79 | 44.73 | 19.0 | ... | 17.033333 | 45.53 | 6.6 | 733.5 | 92.0 | 7.0 | 63.0 | 5.3 | 13.275433 | 13.275433 |
| 1 | 2016-01-11 17:10:00 | 60 | 30 | 19.89 | 46.693333 | 19.2 | 44.7225 | 19.79 | 44.79 | 19.0 | ... | 17.066667 | 45.56 | 6.483333 | 733.6 | 92.0 | 6.666667 | 59.166667 | 5.2 | 18.606195 | 18.606195 |
| 2 | 2016-01-11 17:20:00 | 50 | 30 | 19.89 | 46.3 | 19.2 | 44.626667 | 19.79 | 44.933333 | 18.926667 | ... | 17.0 | 45.5 | 6.366667 | 733.7 | 92.0 | 6.333333 | 55.333333 | 5.1 | 28.642668 | 28.642668 |
| 3 | 2016-01-11 17:30:00 | 50 | 40 | 19.89 | 46.066667 | 19.2 | 44.59 | 19.79 | 45.0 | 18.89 | ... | 17.0 | 45.4 | 6.25 | 733.8 | 92.0 | 6.0 | 51.5 | 5.0 | 45.410389 | 45.410389 |
| 4 | 2016-01-11 17:40:00 | 60 | 40 | 19.89 | 46.333333 | 19.2 | 44.53 | 19.79 | 45.0 | 18.89 | ... | 17.0 | 45.4 | 6.133333 | 733.9 | 92.0 | 5.666667 | 47.666667 | 4.9 | 10.084097 | 10.084097 |


In [6]:
# Tail of the data
df.tail()

| | date | Appliances | lights | T1 | RH_1 | T2 | RH_2 | T3 | RH_3 | T4 | ... | T9 | RH_9 | T_out | Press_mm_hg | RH_out | Windspeed | Visibility | Tdewpoint | rv1 | rv2 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 19730 | 2016-05-27 17:20:00 | 100 | 0 | 25.566667 | 46.56 | 25.89 | 42.025714 | 27.2 | 41.163333 | 24.7 | ... | 23.2 | 46.79 | 22.733333 | 755.2 | 55.666667 | 3.333333 | 23.666667 | 13.333333 | 43.096812 | 43.096812 |
| 19731 | 2016-05-27 17:30:00 | 90 | 0 | 25.5 | 46.5 | 25.754 | 42.08 | 27.133333 | 41.223333 | 24.7 | ... | 23.2 | 46.79 | 22.6 | 755.2 | 56.0 | 3.5 | 24.5 | 13.3 | 49.28294 | 49.28294 |
| 19732 | 2016-05-27 17:40:00 | 270 | 10 | 25.5 | 46.596667 | 25.628571 | 42.768571 | 27.05 | 41.69 | 24.7 | ... | 23.2 | 46.79 | 22.466667 | 755.2 | 56.333333 | 3.666667 | 25.333333 | 13.266667 | 29.199117 | 29.199117 |
| 19733 | 2016-05-27 17:50:00 | 420 | 10 | 25.5 | 46.99 | 25.414 | 43.036 | 26.89 | 41.29 | 24.7 | ... | 23.2 | 46.8175 | 22.333333 | 755.2 | 56.666667 | 3.833333 | 26.166667 | 13.233333 | 6.322784 | 6.322784 |
| 19734 | 2016-05-27 18:00:00 | 430 | 10 | 25.5 | 46.6 | 25.264286 | 42.971429 | 26.823333 | 41.156667 | 24.7 | ... | 23.2 | 46.845 | 22.2 | 755.2 | 57.0 | 4.0 | 27.0 | 13.2 | 34.118851 | 34.118851 |


### **Data Exploration**

In [7]:
df.shape

(19735, 29)

We can see that there are 19735 rows and 29 columns in the data provided.
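
The row count can be cross-checked against the first and last timestamps shown in the head and tail of the data. The sketch below assumes the recording has no gaps, so that consecutive rows are always exactly 10 minutes apart.

```python
import pandas as pd

# First and last timestamps, taken from the head and tail of the data
start = pd.Timestamp("2016-01-11 17:00:00")
end = pd.Timestamp("2016-05-27 18:00:00")

# A gap-free 10-minute grid between the two timestamps (both ends included)
expected_index = pd.date_range(start, end, freq="10min")
n_expected = len(expected_index)

# Length of the recording in days
span_days = (end - start) / pd.Timedelta(days=1)

print(n_expected)           # 19735
print(round(span_days, 2))  # 137.04
```

A gap-free grid has exactly 19735 points, which matches the number of rows, and about 137 days corresponds to the roughly 4.5 months of data.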


In [8]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 19735 entries, 0 to 19734
Data columns (total 29 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   date         19735 non-null  object 
 1   Appliances   19735 non-null  int64  
 2   lights       19735 non-null  int64  
 3   T1           19735 non-null  float64
 4   RH_1         19735 non-null  float64
 5   T2           19735 non-null  float64
 6   RH_2         19735 non-null  float64
 7   T3           19735 non-null  float64
 8   RH_3         19735 non-null  float64
 9   T4           19735 non-null  float64
 10  RH_4         19735 non-null  float64
 11  T5           19735 non-null  float64
 12  RH_5         19735 non-null  float64
 13  T6           19735 non-null  float64
 14  RH_6         19735 non-null  float64
 15  T7           19735 non-null  float64
 16  RH_7         19735 non-null  float64
 17  T8           19735 non-null  float64
 18  RH_8         19735 non-null  float64
 19  T9  