#### The aim of this project is to predict the energy use of lights and appliances in a low-energy building using the UCI dataset collected in 2017. The dataset was provided to us by the tutoring team of Computational Machine Learning, Master of AI at RMIT University. The description of the dataset comes from the same resources.

In [2]:
# Importing the necessary libraries

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

### Retrieving the dataset

In [5]:
# For the scope of this project, the provided dataset is sufficient; no other
# data source is needed to conduct and present our analysis of this problem.

df = pd.read_csv('UCI_data.csv')

#### The description of the dataset as provided with the resources of the project.

date - Date and time, in year-month-day hour:minute:second format <br>
T1 -  Temperature in kitchen area, in Celsius <br>
RH_1 -  Humidity in kitchen area, in % <br>
T2 - Temperature in living room area, in Celsius <br> 
RH_2 - Humidity in living room area, in % <br>
T3 -  Temperature in laundry room area, in Celsius <br>
RH_3 -  Humidity in laundry room area, in % <br>
T4 -  Temperature in office room, in Celsius <br>
RH_4 -  Humidity in office room, in % <br>
T5 - Temperature in bathroom, in Celsius <br>
RH_5 - Humidity in bathroom, in % <br>
T6 - Temperature outside the building (north side), in Celsius <br>
RH_6 - Humidity outside the building (north side), in % <br>
T7 - Temperature in ironing room, in Celsius <br>
RH_7 - Humidity in ironing room, in % <br>
T8 - Temperature in teenager room 2, in Celsius <br>
RH_8 - Humidity in teenager room 2, in % <br>
T9 - Temperature in parents room, in Celsius <br>
RH_9 - Humidity in parents room, in % <br>
To - Temperature outside (from Chievres weather station), in Celsius <br>
Pressure - (from Chievres weather station), in mm Hg <br>
RH_out - Humidity outside (from Chievres weather station), in % <br>
Wind speed - (from Chievres weather station), in m/s <br>
Visibility - (from Chievres weather station), in km <br>
Tdewpoint - (from Chievres weather station), in °C <br>
rv1 - Random variable 1, nondimensional <br>
rv2 - Random variable 2, nondimensional <br>
<br>
------------------------------------------------------------------------------ <br>
TARGET_Energy - energy use of Appliances and light fixtures in the house in Wh

### Data Preparation

In [18]:
# Printing the name and the number of columns

print("Number of columns:- {} columns".format(len(df.columns)))
print("Column Names:- ",", ".join(list(df.columns)))

Number of columns:- 28 columns
Column Names:-  date, T1, RH_1, T2, RH_2, T3, RH_3, T4, RH_4, T5, RH_5, T6, RH_6, T7, RH_7, T8, RH_8, T9, RH_9, T_out, Press_mm_hg, RH_out, Windspeed, Visibility, Tdewpoint, rv1, rv2, TARGET_energy


In [19]:
# Printing some rows to get a basic idea of the data

df.head()

,date,T1,RH_1,T2,RH_2,T3,RH_3,T4,RH_4,T5,...,RH_9,T_out,Press_mm_hg,RH_out,Windspeed,Visibility,Tdewpoint,rv1,rv2,TARGET_energy
0,2016-04-19 20:30:00,22.2,39.5,20.566667,37.656667,22.23,37.03,22.318571,36.61,20.633333,...,33.9,9.7,766.1,65.5,3.5,40.0,3.35,24.061869,24.061869,60
1,2016-03-05 04:40:00,20.356667,37.126667,17.566667,40.23,20.89,37.663333,18.7,36.26,18.463333,...,41.09,0.3,740.333333,99.0,1.0,41.333333,0.1,4.622052,4.622052,50
2,2016-03-14 12:40:00,20.926667,38.79,21.1,35.526667,21.6,36.29,21.0,34.826667,18.1,...,38.76,4.4,768.466667,72.0,6.0,22.666667,-0.266667,5.635898,5.635898,80
3,2016-01-22 15:30:00,18.29,38.9,17.29,39.26,18.39,39.326667,16.1,38.79,16.1,...,39.2,3.35,760.6,82.0,5.5,41.0,0.5,49.216445,49.216445,40
4,2016-02-10 00:40:00,22.29,42.333333,21.6,40.433333,22.666667,43.363333,19.1,40.9,19.29,...,43.73,3.2,738.9,88.0,7.333333,56.0,1.4,47.617579,47.617579,60


From the rows printed above, the dataset appears to be a time series, with the 'date' column holding the date and time of each reading. Most columns hold float values, while the target column holds integers. The value ranges differ noticeably between columns (for example, pressure is in the hundreds while temperatures are in the tens), so scaling may be required.
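The two observations in the paragraph above (a date column stored as text, and columns on different scales) can be sketched as follows. This is a minimal illustration, not the method used later in this project: `sample` is a hypothetical three-row frame whose values are copied from the `df.head()` output, and the choice of standardisation (zero mean, unit variance) is an assumption, since other scalers would also be reasonable.

```python
import pandas as pd

# Hypothetical three-row sample; values copied from the head() output above
sample = pd.DataFrame({
    "date": ["2016-04-19 20:30:00", "2016-03-05 04:40:00", "2016-03-14 12:40:00"],
    "T1": [22.2, 20.356667, 20.926667],
    "Press_mm_hg": [766.1, 740.333333, 768.466667],
    "TARGET_energy": [60, 50, 80],
})

# Parse the 'date' strings into datetime values and order the rows in time
sample["date"] = pd.to_datetime(sample["date"], format="%Y-%m-%d %H:%M:%S")
sample = sample.sort_values("date").reset_index(drop=True)

# Standardise the feature columns: subtract the mean, divide by the
# population standard deviation (ddof=0)
feature_cols = ["T1", "Press_mm_hg"]
scaled = sample.copy()
means = sample[feature_cols].mean()
stds = sample[feature_cols].std(ddof=0)
scaled[feature_cols] = (sample[feature_cols] - means) / stds

print(sample.dtypes)
print(scaled[feature_cols])
```

When a model is trained, the means and standard deviations would normally be computed on the training split only and then reused for the test split.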

In [21]:
# Checking the size of the dataset.

df_len = len(df)
df_len

19735

Therefore, the dataset has shape (19735, 28): 19735 rows and 28 columns.
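The row and column counts can also be read in a single step from the `shape` attribute of a DataFrame. A small sketch on a hypothetical stand-in frame called `toy` (not the project data):

```python
import pandas as pd

# Hypothetical stand-in for df: 3 rows and 2 columns
toy = pd.DataFrame({"T1": [22.2, 20.36, 20.93], "TARGET_energy": [60, 50, 80]})

# shape is a (rows, columns) tuple
n_rows, n_cols = toy.shape
print("Rows: {}, Columns: {}".format(n_rows, n_cols))
```

Applied to `df`, the same attribute would be expected to give the (19735, 28) shape stated above.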

In [22]:
# Extracting out some basic information from the dataset.

df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 19735 entries, 0 to 19734
Data columns (total 28 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   date           19735 non-null  object 
 1   T1             19735 non-null  float64
 2   RH_1           19735 non-null  float64
 3   T2             19735 non-null  float64
 4   RH_2           19735 non-null  float64
 5   T3             19735 non-null  float64
 6   RH_3           19735 non-null  float64
 7   T4             19735 non-null  float64
 8   RH_4           19735 non-null  float64
 9   T5             19735 non-null  float64
 10  RH_5           19735 non-null  float64
 11  T6             19735 non-null  float64
 12  RH_6           19735 non-null  float64
 13  T7             19735 non-null  float64
 14  RH_7           19735 non-null  float64
 15  T8             19735 non-null  float64
 16  RH_8           19735 non-null  float64
 17  T9             19735 non-null  float64
 18  RH_9  

In [24]:
# Describing the dataset in statistical terms.

df.describe()

,T1,RH_1,T2,RH_2,T3,RH_3,T4,RH_4,T5,RH_5,...,RH_9,T_out,Press_mm_hg,RH_out,Windspeed,Visibility,Tdewpoint,rv1,rv2,TARGET_energy
count,19735.0,19735.0,19735.0,19735.0,19735.0,19735.0,19735.0,19735.0,19735.0,19735.0,...,19735.0,19735.0,19735.0,19735.0,19735.0,19735.0,19735.0,19735.0,19735.0,19735.0
mean,21.686571,40.259739,20.341219,40.42042,22.267611,39.2425,20.855335,39.026904,19.592106,50.949283,...,41.552401,7.411665,755.522602,79.750418,4.039752,38.330834,3.760707,24.988033,24.988033,101.496833
std,1.606066,3.979299,2.192974,4.069813,2.006111,3.254576,2.042884,4.341321,1.844623,9.022034,...,4.151497,5.317409,7.399441,14.901088,2.451221,11.794719,4.194648,14.496634,14.496634,104.380829
min,16.79,27.023333,16.1,20.463333,17.2,28.766667,15.1,27.66,15.33,29.815,...,29.166667,-5.0,729.3,24.0,0.0,1.0,-6.6,0.005322,0.005322,10.0
25%,20.76,37.333333,18.79,37.9,20.79,36.9,19.53,35.53,18.2775,45.4,...,38.5,3.666667,750.933333,70.333333,2.0,29.0,0.9,12.497889,12.497889,50.0
50%,21.6,39.656667,20.0,40.5,22.1,38.53,20.666667,38.4,19.39,49.09,...,40.9,6.916667,756.1,83.666667,3.666667,40.0,3.433333,24.897653,24.897653,60.0
75%,22.6,43.066667,21.5,43.26,23.29,41.76,22.1,42.156667,20.619643,53.663333,...,44.338095,10.408333,760.933333,91.666667,5.5,40.0,6.566667,37.583769,37.583769,100.0
max,26.26,63.36,29.856667,56.026667,29.236,50.163333,26.2,51.09,25.795,96.321667,...,53.326667,26.1,772.3,100.0,14.0,66.0,15.5,49.99653,49.99653,1110.0
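
The summary above reports the same mean, standard deviation, and quartiles for rv1 and rv2, and the five rows printed earlier carry the same value in both columns. Matching statistics alone do not prove that two columns are identical, so an element-wise comparison is one way to check. The sketch below uses a hypothetical five-row frame `toy` with values copied from the `df.head()` output; on the real data the same call would be applied to `df`.

```python
import pandas as pd

# Hypothetical five-row frame; rv1/rv2 values copied from the head() output
toy = pd.DataFrame({
    "rv1": [24.061869, 4.622052, 5.635898, 49.216445, 47.617579],
    "rv2": [24.061869, 4.622052, 5.635898, 49.216445, 47.617579],
})

# Series.equals compares the two columns element by element
same = toy["rv1"].equals(toy["rv2"])
print("rv1 and rv2 identical:", same)
```

If the check holds on the full dataset, one of the two columns carries no extra information and could be dropped.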
