<p>
    <img src="https://s3.amazonaws.com/iotanalytics-templates/Logo.png" style="float:left;">
    <h1 style="color:#1A5276;padding-left:115px;padding-bottom:0px;font-size:28px;">AWS IoT Analytics | Building Energy Consumption Prediction</h1>
</p>
<p style="color:#1A5276;padding-left:90px;padding-top:0px;position:relative;font-style:italic;font-size:18px">
Apply the model produced by the training job to predict building energy consumption, then write the predictions out as an IoT Analytics dataset.
</p>

## Set-up: Import Required Notebook Libraries

In [1]:
# This notebook uses the holidays and lightgbm packages; install them if they are missing.

try:
    import holidays
    import lightgbm as lgb
except ImportError:
    !pip install holidays lightgbm
    import holidays
    import lightgbm as lgb

In [2]:
import pandas as pd
import numpy as np
import boto3
import os
import sys
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import KFold
import datetime
import gc
from sklearn.metrics import mean_squared_error

from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.impute import SimpleImputer
from sklearn.pipeline import FeatureUnion, Pipeline

In [3]:
import warnings

warnings.filterwarnings("ignore", message="numpy.dtype size changed")
warnings.filterwarnings("ignore", message="numpy.ufunc size changed")
warnings.simplefilter(action='ignore', category=FutureWarning)

<h1 style="color:#20B3CD;font-size:20px;float:left">Step 1  |  Load Meter-Reading and Weather Data from IoT Analytics</h1> <div style="float:right;height:7px;background-color:#20B3CD;margin-top:30px;width:70%"></div>

In [4]:
# Before loading the data, create an IoT Analytics client for accessing datasets.
client = boto3.client('iotanalytics')

dataset = "jh_iot_analytics_data_set"
dataset_url = client.get_dataset_content(datasetName = dataset)['entries'][0]['dataURI']

With the data location (URL) for the given dataset in hand, we can start working with the data. To call get_dataset_content, you must first grant the corresponding IoT Analytics IAM permission:

In [5]:
# Load the meter-reading data
test_df = pd.read_csv(dataset_url, parse_dates=True)
if test_df.empty:
    raise Exception('No data found')

# Drop columns that are not needed for prediction
drop_col = ['meter_reading', '__dt']
test_df.drop(drop_col, axis=1, inplace=True)
# The timestamp is divided by 1000 and read as seconds, i.e. it is treated as epoch milliseconds
test_df['timestamp'] = pd.to_datetime(test_df['timestamp'] / 1000., unit='s')
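The division by 1000 implies that the `timestamp` column holds epoch milliseconds. As a minimal sketch on made-up values (the sample numbers are hypothetical, not taken from the dataset), pandas can also parse milliseconds directly with `unit='ms'`, which should give the same result without the intermediate float division:

```python
import pandas as pd

# Hypothetical epoch-millisecond values standing in for the 'timestamp' column
raw_ms = pd.Series([1598850121000, 1598853721000])

# Approach used in the notebook cell: convert to seconds first
via_seconds = pd.to_datetime(raw_ms / 1000., unit='s')

# Alternative: let pandas interpret the integers as milliseconds
via_millis = pd.to_datetime(raw_ms, unit='ms')

print((via_seconds == via_millis).all())
```

Either form works for whole-second values like these; parsing the integers directly avoids any float rounding concerns for sub-second timestamps.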

In [6]:
weather_dataset = "iot_analytics_data_set"
weather_dataset_url = client.get_dataset_content(datasetName = weather_dataset)['entries'][0]['dataURI']

# Load the weather data
weather_test_df = pd.read_csv(weather_dataset_url, parse_dates=True)
if weather_test_df.empty:
    raise Exception('No data found')

# Drop columns that are not needed for prediction
drop_col = ['__dt']
weather_test_df.drop(drop_col, axis=1, inplace=True)
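Both loading cells index `['entries'][0]['dataURI']` directly, which fails with an opaque `IndexError` or `KeyError` if the dataset has no content yet. The sketch below shows one way to guard that lookup. The helper name `first_data_uri` and the placeholder URL are assumptions for illustration, and the dictionary only imitates the shape of a `get_dataset_content` response:

```python
def first_data_uri(response):
    """Return the dataURI of the first entry, or raise a descriptive error."""
    entries = response.get('entries') or []
    if not entries or 'dataURI' not in entries[0]:
        raise ValueError('Dataset content has no entries; check that the dataset has been generated')
    return entries[0]['dataURI']

# Stand-in for client.get_dataset_content(datasetName=...); the URL is a placeholder
fake_response = {'entries': [{'entryName': 'result', 'dataURI': 'https://example.com/data.csv'}]}
uri = first_data_uri(fake_response)
print(uri)
```

In the notebook, the same helper could wrap the real response before it is passed to `pd.read_csv`.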

<h1 style="color:#20B3CD;font-size:20px;float:left">Step 2  |  Feature Engineering</h1> <div style="float:right;height:7px;background-color:#20B3CD;margin-top:30px;width:70%"></div>

## (1) Weather Transformer
Add the missing time-series rows by building the full range between start_date and end_date.
Then fill in the missing weather data: temperature, cloud coverage, dew_temperature, sea_level, wind_direction, wind_speed, and precip_depth.


In [7]:
from weathertranformer import WeatherTranformer

In [8]:
weather_test_df = WeatherTranformer(True).fit_transform(weather_test_df)
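The source of `weathertranformer` is not shown here, so the following is only a sketch of the idea described above, not the module's actual implementation. It assumes hourly readings, fills gaps by linear interpolation, and uses a hypothetical helper name, `fill_hourly_gaps`:

```python
import pandas as pd

def fill_hourly_gaps(df, value_cols, time_col='timestamp', site_col='site_id'):
    """Reindex each site to a complete hourly range and interpolate the gaps."""
    filled = []
    for site, group in df.groupby(site_col):
        group = group.set_index(time_col).sort_index()
        # Complete range from the first to the last observed timestamp
        full_index = pd.date_range(group.index.min(), group.index.max(),
                                   freq=pd.Timedelta(hours=1))
        group = group.reindex(full_index)
        group.index.name = time_col
        group[site_col] = site  # restore the site id on the newly created rows
        group[value_cols] = group[value_cols].interpolate(method='linear',
                                                          limit_direction='both')
        filled.append(group.reset_index())
    return pd.concat(filled, ignore_index=True)

# Hypothetical sample with the 02:00 reading missing
sample = pd.DataFrame({
    'site_id': [0, 0, 0],
    'timestamp': pd.to_datetime(['2017-01-01 00:00', '2017-01-01 01:00', '2017-01-01 03:00']),
    'air_temperature': [10.0, 11.0, 13.0],
})
completed = fill_hourly_gaps(sample, ['air_temperature'])
print(completed)
```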

## (2) Smoothing Filter
Smooth the air and dew temperature series.

In [9]:
weather_test_df.dropna(subset=['air_temperature'],inplace=True)

In [10]:
from SGFilter import SGFilterTranformer

In [11]:
weather_test_df = SGFilterTranformer(True).fit_transform(weather_test_df)
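The name `SGFilter` suggests a Savitzky-Golay filter, though the module's source is not shown, so treat that as an assumption. As a rough illustration of that kind of smoothing, the sketch below applies `scipy.signal.savgol_filter` to a synthetic temperature curve; the window length and polynomial order are arbitrary choices for this example:

```python
import numpy as np
from scipy.signal import savgol_filter

# Synthetic daily temperature cycle with added noise (not real data)
rng = np.random.default_rng(0)
hours = np.arange(48)
clean = 20 + 5 * np.sin(2 * np.pi * hours / 24)
noisy = clean + rng.normal(scale=0.5, size=hours.size)

# Fit a local quadratic over an 11-sample window around each point
smoothed = savgol_filter(noisy, window_length=11, polyorder=2)

print(smoothed[:5])
```

A wider window smooths more aggressively, while a higher polynomial order follows the original signal more closely.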

## (3) Rolling Window
Calculate the rolling minimum, maximum, and standard deviation over a time window of 24 samples.

In [12]:
from Rollwindow import RollwinTranformer

In [13]:
weather_test_df = RollwinTranformer(True,24).fit_transform(weather_test_df)
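A rolling-window feature of this kind can be sketched with plain pandas. The helper below is hypothetical (`add_rolling_stats` is not part of the notebook's modules) and assumes the statistics are computed per site over rows ordered by time:

```python
import pandas as pd

def add_rolling_stats(df, col, window=24, site_col='site_id', time_col='timestamp'):
    """Append rolling min, max, and std of `col`, computed separately for each site."""
    out = df.sort_values([site_col, time_col]).copy()
    grouped = out.groupby(site_col)[col]
    out[f'{col}_min'] = grouped.transform(lambda s: s.rolling(window, min_periods=1).min())
    out[f'{col}_max'] = grouped.transform(lambda s: s.rolling(window, min_periods=1).max())
    out[f'{col}_std'] = grouped.transform(lambda s: s.rolling(window, min_periods=1).std())
    return out

# Tiny made-up sample; a window of 2 keeps the arithmetic easy to follow
sample = pd.DataFrame({
    'site_id': [0, 0, 0],
    'timestamp': pd.to_datetime(['2017-01-01 00:00', '2017-01-01 01:00', '2017-01-01 02:00']),
    'air_temperature': [1.0, 2.0, 3.0],
})
rolled = add_rolling_stats(sample, 'air_temperature', window=2)
print(rolled)
```

With `min_periods=1` the first rows still get a minimum and maximum, while the standard deviation of a single sample is undefined and stays missing.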

## (4) Get Building-Related Metadata from the S3 Bucket and Merge All Datasets Together

In [14]:
bucket = "check-ride-data-explore"
file_name = "input/ashrae-energy-prediction/building_metadata.csv"

# Create an S3 client using the default configuration
s3 = boto3.client('s3')

# Fetch the object identified by the bucket and key (file name)
obj = s3.get_object(Bucket=bucket, Key=file_name)

# 'Body' holds the streamed content of the object
building_df = pd.read_csv(obj['Body'])


In [15]:
test_df = test_df.merge(building_df, on='building_id', how='left')


In [16]:
weather_test_df['timestamp'] = pd.to_datetime(weather_test_df['timestamp'])
test_df = test_df.merge(weather_test_df, how='left', on=['site_id', 'timestamp'])
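Left merges silently leave missing values where no match exists, and they duplicate rows if the right-hand key is not unique. As a small sketch on made-up frames, the `validate` and `indicator` options of `DataFrame.merge` can surface both problems:

```python
import pandas as pd

# Made-up frames: building 3 has no metadata row
readings = pd.DataFrame({'building_id': [1, 2, 3], 'meter': [0, 0, 1]})
buildings = pd.DataFrame({'building_id': [1, 2], 'site_id': [0, 0]})

merged = readings.merge(
    buildings,
    on='building_id',
    how='left',
    validate='many_to_one',  # raises MergeError if building_id repeats in `buildings`
    indicator=True,          # adds a '_merge' column describing each row's origin
)

unmatched = int((merged['_merge'] == 'left_only').sum())
print(f'{unmatched} reading(s) without building metadata')
```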

## (5) Numerical Features
Apply feature transforms to the numerical features.

In [17]:
from NumericalEng import NumericalTransformer
test_df = NumericalTransformer(True, True, True, True).fit_transform(test_df)

In [18]:
# Keep a handle on the merged features so the predictions can be attached to them in Step 3
test_time_df = test_df

## (6) Holidays Features
Add a feature that indicates whether each day is a public holiday.

In [19]:
from HolidayFea import HolidayTranformer

In [20]:
test_df = HolidayTranformer(True).fit_transform(test_df)
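The `HolidayFea` module is not shown, so the sketch below only illustrates the general idea of a holiday flag. It assumes the holiday dates are available as a plain list (in the notebook they presumably come from the `holidays` package), and the helper name `add_holiday_flag` is hypothetical:

```python
import pandas as pd

def add_holiday_flag(df, holiday_dates, time_col='timestamp'):
    """Add an 'is_holiday' column: 1 if the row's calendar day is in holiday_dates, else 0."""
    out = df.copy()
    days = pd.to_datetime(out[time_col]).dt.normalize()  # strip the time of day
    out['is_holiday'] = days.isin(pd.to_datetime(list(holiday_dates))).astype('int8')
    return out

# Hypothetical sample and holiday list (New Year's Day only)
sample = pd.DataFrame({'timestamp': pd.to_datetime(['2017-01-01 13:00', '2017-01-03 08:00'])})
flagged = add_holiday_flag(sample, ['2017-01-01'])
print(flagged)
```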


## (7) One-Hot Encoding for Primary Use
One-hot encode the categorical feature.

In [21]:
from LabelEncode import CategoricalTransformer

In [22]:
test_df.dropna(subset=['building_id'],inplace=True)
test_df = CategoricalTransformer().fit_transform(test_df)
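One practical concern with one-hot encoding at prediction time is that a batch may not contain every category seen during training, so the encoded columns can differ from what the model expects. The sketch below shows one way to align them. It assumes a column named `primary_use` and a known list of training columns, and `one_hot_aligned` is a hypothetical helper rather than the notebook's `CategoricalTransformer`:

```python
import pandas as pd

def one_hot_aligned(df, column, expected_columns):
    """One-hot encode `column`, then force the exact column set used at training time."""
    encoded = pd.get_dummies(df, columns=[column], dtype=int)
    # Missing categories become all-zero columns; unexpected ones are dropped
    return encoded.reindex(columns=expected_columns, fill_value=0)

# Made-up batch that lacks the 'Lodging' category
batch = pd.DataFrame({'building_id': [1, 2], 'primary_use': ['Office', 'Education']})
train_columns = ['building_id', 'primary_use_Education',
                 'primary_use_Lodging', 'primary_use_Office']
features = one_hot_aligned(batch, 'primary_use', train_columns)
print(features)
```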

<h1 style="color:#20B3CD;font-size:20px;float:left">Step 3  |  Generate Prediction</h1> <div style="float:right;height:7px;background-color:#20B3CD;margin-top:30px;width:70%"></div>


Set up the test features and load the trained model for prediction.

In [23]:
import joblib

gbm_pickle = joblib.load('lgb.pkl')

In [24]:
results = gbm_pickle.predict(test_df, num_iteration=gbm_pickle.best_iteration) 

In [25]:
pred_df = pd.DataFrame({'pred':results})
# Energy consumption cannot be negative, so clamp negative predictions to zero
pred_df.loc[pred_df.pred < 0, 'pred'] = 0


In [27]:
test_final_df = pd.concat([test_time_df,pred_df], axis=1, sort=False)
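`pd.concat(..., axis=1)` aligns rows by index label, so predictions only land on the right rows if both frames share the same index. If rows were dropped from the features before prediction (as `dropna` can do), a fresh 0..n-1 index on the predictions would no longer match. The sketch below, on made-up data, shows one way to keep the alignment explicit by reusing the feature index:

```python
import numpy as np
import pandas as pd

# Made-up metadata for three rows; pretend the middle row was dropped before prediction
meta = pd.DataFrame({'building_id': [10, 11, 12]}, index=[0, 1, 2])
features = meta.drop(index=1)
raw_pred = np.array([-0.3, 4.2])  # hypothetical model output for the two remaining rows

# Clamp negatives to zero and carry the feature index along with the predictions
pred = pd.Series(np.clip(raw_pred, 0, None), index=features.index, name='pred')

# Join by index: the dropped row keeps a missing prediction instead of a shifted one
result = meta.join(pred)
print(result)
```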



<h1 style="color:#20B3CD;font-size:20px;float:left">Step 4  |  Output Predictions to an IoT Analytics Dataset</h1> <div style="float:right;height:7px;background-color:#20B3CD;margin-top:30px;width:70%"></div>

In [28]:
from io import StringIO
from datetime import datetime

bucket = 'check-ride-data-explore'
csv_key = 'prediction_result.csv'
# Date-stamped key prefix: 'energy_prediction' followed by today's date and a slash
prefix = 'energy_prediction' + datetime.now().strftime('%Y-%m-%d') + "/"

# Serialize the predictions to CSV in memory and upload them to S3
s3 = boto3.resource('s3')
csv_buffer = StringIO()
test_final_df.to_csv(csv_buffer)
s3.Object(bucket, prefix + csv_key).put(Body=csv_buffer.getvalue())

{'ResponseMetadata': {'RequestId': '6BDBB2CEE92130CA',
  'HostId': 't08zjP6AoG0f1sk7DfHM0sYVT57qg7LSi/WrG5lYtZ27XLYAcovz+KgQFfxvE+xalJ0YHUOk8a0=',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'x-amz-id-2': 't08zjP6AoG0f1sk7DfHM0sYVT57qg7LSi/WrG5lYtZ27XLYAcovz+KgQFfxvE+xalJ0YHUOk8a0=',
   'x-amz-request-id': '6BDBB2CEE92130CA',
   'date': 'Mon, 31 Aug 2020 05:02:01 GMT',
   'etag': '"cf04317f65d910f98aa864138ba4aa9e"',
   'content-length': '0',
   'server': 'AmazonS3'},
  'RetryAttempts': 0},
 'ETag': '"cf04317f65d910f98aa864138ba4aa9e"'}
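To make the resulting object key and CSV body easier to inspect without touching S3, the upload inputs can be built separately. This is a sketch with a hypothetical helper, `build_key`, and a made-up one-row frame. It mirrors the string concatenation used above, so there is no separator between `energy_prediction` and the date. Passing `index=False` is an optional variation that omits the DataFrame index column, which the original cell writes:

```python
from datetime import date
from io import StringIO

import pandas as pd

def build_key(day, filename='prediction_result.csv', root='energy_prediction'):
    """Compose the S3 object key the same way the cell above does."""
    return f"{root}{day.strftime('%Y-%m-%d')}/{filename}"

key = build_key(date(2020, 8, 31))

# Serialize a made-up result frame to an in-memory CSV string
buffer = StringIO()
pd.DataFrame({'building_id': [1], 'pred': [0.5]}).to_csv(buffer, index=False)
body = buffer.getvalue()

print(key)
print(body)
```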

<div style="height:60px;"><div style="height:7px;background-color:#20B3CD;width:100%;margin-top:20px;position:relative;"><img src="https://s3.amazonaws.com/iotanalytics-templates/Logo.png" style="height:50px;width:50px;margin-top:-20px;position:absolute;margin-left:42%;"></div></div>