# Weather Influence on Energy Consumption of a Building

- Created by Clayton Miller - clayton@nus.edu.sg - miller.clayton@gmail.com

We learned in the last notebook that floor area is a good normalizing factor for energy consumption. The question we will look into now is how weather influences energy consumption.



In [0]:
import pandas as pd
from google.colab import drive
import os

In [0]:
%matplotlib inline

In [0]:
drive.mount('/content/gdrive')
os.chdir("/content/gdrive/My Drive/EDX Data Science for Construction, Architecture and Engineering/Week 3 - Construction - Pandas Fundamentals/meter_data/")

If you get an error after the last cell, check that the file path in your code matches the location of the folder on your own Google Drive.

In [0]:
rawdata = pd.read_csv("UnivClass_Ciara.csv", parse_dates=True, index_col='timestamp')

In [0]:
rawdata.info()

In [0]:
rawdata.plot(figsize=(10,4))

# Load the weather data file - Cleaning Data and Dealing with Missing Data

In this case, we will use weather data to supplement the analysis, which means combining two different data sets.

First we have to find the right weather file, which can be looked up manually in the `meta.csv` file. For this building, the weather file is `weather2.csv`.

In [0]:
os.chdir("/content/gdrive/My Drive/EDX Data Science for Construction, Architecture and Engineering/Week 3 - Construction - Pandas Fundamentals/weather_data/")
weather_data = pd.read_csv("weather2.csv", index_col='timestamp', parse_dates=True)

In [0]:
weather_data.head()

In [0]:
weather_data.info()

Let's take a look at the data

In [0]:
weather_data["TemperatureC"].plot(figsize=(10,4))

## Finding and removing outliers

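It looks like quite a few readings in this data set are implausible: temperatures on the order of -10,000 °C are physically impossible, since nothing can be colder than absolute zero (-273.15 °C).

This is a common scenario with IoT devices. We can filter out those outliers and then fill the gaps they leave behind. In the cells below, the data is first resampled to hourly means, and every value that is not greater than -40 is then replaced with a missing value (`NaN`).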

In [0]:
weather_hourly = weather_data.resample("H").mean()

In [0]:
weather_hourly_nooutlier = weather_hourly[weather_hourly > -40]

In [0]:
weather_hourly.info()

In [0]:
weather_hourly_nooutlier.info()

In [0]:
weather_hourly_nooutlier["TemperatureC"].plot(figsize=(10,4))
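The filtering step above relies on boolean masking. As a minimal sketch on made-up data (the readings and the `-9999.0` error value are assumptions for illustration, not values taken from `weather2.csv`), the mask keeps every row but turns the values that fail the condition into `NaN`:

```python
import pandas as pd

# Synthetic hourly readings (hypothetical values); -9999.0 stands in for a sensor error code.
index = pd.date_range("2015-01-01", periods=5, freq=pd.Timedelta(hours=1))
demo = pd.DataFrame({"TemperatureC": [4.0, 5.0, -9999.0, 6.0, 7.0]}, index=index)

# Indexing a DataFrame with a boolean DataFrame keeps the shape and
# replaces every value where the condition is False with NaN.
demo_nooutlier = demo[demo > -40]

print(demo_nooutlier)
print("Rows before:", len(demo), "rows after:", len(demo_nooutlier))
print("Missing values:", int(demo_nooutlier["TemperatureC"].isna().sum()))
```

Because the rows are kept, the time index stays regular, which is why `.info()` reports the same number of entries but fewer non-null values after filtering.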

## Filling gaps in data

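We can fill the gaps left by filtering out the outliers with a forward fill, which copies the last valid reading into each following gap. The `.ffill()` method is equivalent to `.fillna(method='ffill')`, which recent pandas versions have deprecated.

In [0]:
weather_hourly_nooutlier_nogaps = weather_hourly_nooutlier.ffill()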

In [0]:
weather_hourly_nooutlier_nogaps.info()
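To see what the forward fill used above does, here is a small sketch on a made-up series (the values are assumptions for illustration). It also shows the optional `limit` argument, which caps how many consecutive gaps are filled and can be useful when long outages should stay visible as missing data:

```python
import numpy as np
import pandas as pd

# Synthetic hourly temperatures with gaps (hypothetical values).
index = pd.date_range("2015-01-01", periods=6, freq=pd.Timedelta(hours=1))
gappy = pd.Series([4.0, np.nan, np.nan, 7.0, np.nan, 8.0], index=index, name="TemperatureC")

# Forward fill: each NaN takes the most recent valid value before it.
filled = gappy.ffill()

# With limit=1, only the first NaN of each run of consecutive NaNs is filled.
filled_limited = gappy.ffill(limit=1)

print(pd.DataFrame({"original": gappy, "ffill": filled, "ffill_limit_1": filled_limited}))
```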

# Merge Temperature and Electricity Data - Combining Data Sets

Once again, we need to combine two data sets. This time we will use both the `.concat()` and `.merge()` functions to show the differences.

In [0]:
weather_hourly_nooutlier_nogaps['TemperatureC'].head()

In [0]:
rawdata = rawdata[~rawdata.index.duplicated(keep='first')]

In [0]:
rawdata['UnivClass_Ciara'].head()
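The de-duplication cell above drops rows whose timestamp appears more than once, since repeated index labels can cause problems when pandas aligns two data sets on their index. A minimal sketch of the same pattern on made-up data (the `demo_meter` name and its values are hypothetical):

```python
import pandas as pd

# Synthetic meter readings where the 01:00 timestamp appears twice (hypothetical values).
index = pd.to_datetime(
    ["2015-01-01 00:00", "2015-01-01 01:00", "2015-01-01 01:00", "2015-01-01 02:00"]
)
meter = pd.Series([10.0, 11.0, 99.0, 12.0], index=index, name="demo_meter")

# duplicated(keep="first") marks every repeat of a label except its first occurrence.
is_repeat = meter.index.duplicated(keep="first")

# Inverting the mask with ~ keeps only the first row for each timestamp.
deduplicated = meter[~is_repeat]

print(is_repeat)
print(deduplicated)
print("Index unique after de-duplication:", deduplicated.index.is_unique)
```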

## Using `.concat()` to combine data sets

In [0]:
comparison = pd.concat([weather_hourly_nooutlier_nogaps['TemperatureC'], rawdata['UnivClass_Ciara']], axis=1)

In [0]:
comparison.info()

In [0]:
comparison.head()

## Using the `.merge()` function

The `.merge()` function is useful for combining data sets that don't line up perfectly. It has several additional parameters that indicate which columns or indexes to merge on (here `left_index` and `right_index`) and **how** the merge will occur (the `how` parameter, for example `'inner'`, `'outer'`, `'left'`, or `'right'`).

Refer to the pandas cheat sheet for more information: https://pandas.pydata.org/Pandas_Cheat_Sheet.pdf

In [0]:
comparison_merged = pd.merge(weather_hourly_nooutlier_nogaps['TemperatureC'], rawdata['UnivClass_Ciara'], left_index=True, right_index=True, how='outer')

In [0]:
comparison_merged.info()
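The difference between the two approaches shows up most clearly when the indexes only partly overlap. The following is a minimal sketch on made-up data (the `demo_meter` name and all values are hypothetical): `pd.concat` with `axis=1` keeps every timestamp by default, and `pd.merge` lets the `how` parameter decide which timestamps survive.

```python
import pandas as pd

step = pd.Timedelta(hours=1)

# Two synthetic hourly series whose time ranges overlap only partly (hypothetical values).
temperature = pd.Series(
    [4.0, 5.0, 6.0],
    index=pd.date_range("2015-01-01 00:00", periods=3, freq=step),
    name="TemperatureC",
)
meter = pd.Series(
    [10.0, 11.0, 12.0],
    index=pd.date_range("2015-01-01 01:00", periods=3, freq=step),
    name="demo_meter",
)

# concat along columns aligns on the index and keeps the union of timestamps.
concat_outer = pd.concat([temperature, meter], axis=1)

# merge on the index: 'outer' keeps the union, 'inner' keeps only shared timestamps.
merged_outer = pd.merge(temperature, meter, left_index=True, right_index=True, how="outer")
merged_inner = pd.merge(temperature, meter, left_index=True, right_index=True, how="inner")

print("concat:", concat_outer.shape)
print("merge outer:", merged_outer.shape)
print("merge inner:", merged_inner.shape)
```

With `how='inner'`, the result contains no missing values because only timestamps present in both series are kept, which can be convenient before plotting one variable against the other.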

In [0]:
comparison.plot(figsize=(20,10), subplots=True)

# Analyze the weather influence on energy consumption

To understand how weather and energy are related, we will use a scatter plot to visualize the comparison. The first plot uses the combined data as is, and the second uses daily means.

In [0]:
comparison.plot(kind='scatter', x='TemperatureC', y='UnivClass_Ciara', figsize=(10,10))

In [0]:
comparison.resample("D").mean().plot(kind='scatter', x='TemperatureC', y='UnivClass_Ciara', figsize=(10,10))
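The second scatter plot above averages the data by day before plotting. As a small sketch of what `.resample("D").mean()` does, using made-up hourly data (the `demo_meter` name and all values are hypothetical), each calendar day collapses into a single row holding the mean of its hourly values:

```python
import numpy as np
import pandas as pd

# Two days of synthetic hourly data (hypothetical values).
index = pd.date_range("2015-01-01", periods=48, freq=pd.Timedelta(hours=1))
hourly = pd.DataFrame(
    {
        "TemperatureC": np.arange(48, dtype=float),  # rises by 1 each hour
        "demo_meter": np.tile([1.0, 3.0], 24),       # alternates between 1 and 3
    },
    index=index,
)

# Group the rows into calendar days and take the mean of each column.
daily = hourly.resample("D").mean()

print(daily)
```

Daily averaging removes the within-day cycle, so the remaining scatter mostly reflects day-to-day differences.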

# Advanced Visualizations using Seaborn

It looks like there are two linear regimes with a change point between them. Let's use some more advanced visualization techniques from the Seaborn library to visualize the two regions and draw a separate regression line through each.

https://seaborn.pydata.org/

In [0]:
import seaborn as sns

In [0]:
comparison.info()

In [0]:
comparison[comparison.TemperatureC > 14].info()

In [0]:
def make_color_division(x):
  if x < 14:
    return "Heating"
  else:
    return "Cooling"

In [0]:
comparison = comparison.resample("D").mean()

In [0]:
comparison['heating_vs_cooling'] = comparison.TemperatureC.apply(make_color_division)

In [0]:
comparison.head()

In [0]:
g = sns.lmplot(x="TemperatureC", y="UnivClass_Ciara", hue="heating_vs_cooling",
               truncate=True, data=comparison)

g.set_axis_labels("Outdoor Air Temperature (°C)", "Average Hourly kWh")
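The plot above draws one regression line per regime. To put numbers on those lines, one option is to fit a straight line to each group separately. The sketch below does this with `numpy.polyfit` on made-up data; it is an illustration under stated assumptions, not the procedure Seaborn runs internally. The 14 °C change point comes from the notebook, while the `demo_meter` name and the synthetic slopes are hypothetical.

```python
import numpy as np
import pandas as pd

CHANGE_POINT_C = 14  # threshold used in the notebook's make_color_division

# Synthetic daily data with a known piecewise-linear shape (hypothetical values):
# consumption falls by 2 per degree below the change point and rises by 3 above it.
temperature = np.arange(0.0, 29.0)
energy = np.where(
    temperature < CHANGE_POINT_C,
    100.0 - 2.0 * temperature,
    72.0 + 3.0 * (temperature - CHANGE_POINT_C),
)
demo = pd.DataFrame({"TemperatureC": temperature, "demo_meter": energy})
demo["heating_vs_cooling"] = np.where(
    demo["TemperatureC"] < CHANGE_POINT_C, "Heating", "Cooling"
)

# Fit a degree-1 polynomial (a straight line) to each regime separately.
fits = {}
for regime, group in demo.groupby("heating_vs_cooling"):
    slope, intercept = np.polyfit(
        group["TemperatureC"].to_numpy(), group["demo_meter"].to_numpy(), deg=1
    )
    fits[regime] = (slope, intercept)

for regime, (slope, intercept) in fits.items():
    print(f"{regime}: slope = {slope:.2f} per °C, intercept = {intercept:.2f}")
```

On real data, rows with missing temperature or meter values would need to be dropped first (for example with `.dropna()`), since `numpy.polyfit` does not skip `NaN` values.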