# Mockup: how the machine learning model would be used

This notebook shows an example of how the machine learning model would be used by a "customer".

- Initially, we would use it as a "monitoring and warning system"
  - Read the data coming in from the various sensors and display the predicted temperature on a dashboard
  - Two possible warnings:
    - the 2nd floor temperature will get too hot in 4 hours
    - a sensor or the database is not communicating (e.g. flag a sensor if no valid value has been received for 2-3 samples)

  - The system should allow for adding sensors/input features as they become available
    - for example, sensors for humidity, air speed, etc.
    - the model would have to be retrained to include the new inputs


- Later on, the dashboard could include a simulation component that allows different inputs to be tried
  - display both the predicted temperature and what it would be if a simulated input were applied
  - used more like a recommendation system
    - "From the simulation results, the best thing to do to lower the temperature is this"
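
The "sensor not communicating" warning could be sketched as below. This is a minimal illustration, not the project's implementation: the helper name `stale_sensors`, the `max_missing` parameter, and the sample values are assumptions, and a missing reading is assumed to arrive as `NaN`.

```python
import numpy as np
import pandas as pd

def stale_sensors(readings, max_missing=3):
    """Return the sensor columns whose last `max_missing` samples are all missing.

    Hypothetical helper: assumes one column per sensor and NaN for an invalid reading.
    """
    recent = readings.tail(max_missing)
    if len(recent) < max_missing:
        return []  # not enough history to decide yet
    all_missing = recent.isna().all()
    return list(all_missing[all_missing].index)

# made-up readings: the outdoor sensor stopped reporting three samples ago
readings = pd.DataFrame({
    "OAT_north_173": [1.2, 1.4, np.nan, np.nan, np.nan],
    "HRV_north_427": [22.7, np.nan, 22.4, 40.4, 41.1],
})
print(stale_sensors(readings))  # ['OAT_north_173']
```

A single missing value (as in `HRV_north_427`) does not trigger the warning; only an unbroken run at the end of the series does.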

---

In [28]:
import pandas as pd
import numpy as np
import io

from datetime import date, datetime, timedelta


In [6]:
# in a real dashboard, this is where the connection to the database would be implemented
# for example using sqlalchemy package

# for the mockup, everything is read from the csv file
# consecutive data block file
# (click on the "choose files" button after running the cell)
from google.colab import files
uploaded = files.upload()

Saving consecutive_data_block_220days_1hr_with5am.csv to consecutive_data_block_220days_1hr_with5am.csv
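
As a rough sketch of what the database read mentioned in the cell above might look like, the example below loads readings with `pd.read_sql_query`. To stay runnable without a server it uses the standard-library `sqlite3` module with an in-memory database instead of sqlalchemy; the table name `sensor_readings`, its schema, and the helper `load_readings` are assumptions made for illustration.

```python
import sqlite3
import pandas as pd

# in-memory stand-in for the real sensor database (hypothetical schema)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sensor_readings (DateAndTimeIndex TEXT, OAT_north_173 REAL)")
conn.executemany(
    "INSERT INTO sensor_readings VALUES (?, ?)",
    [("2019-02-22 15:00:00", 1.151461), ("2019-02-22 16:00:00", 1.431155)],
)

def load_readings(connection):
    """Read all sensor rows into a dataframe indexed by timestamp."""
    return pd.read_sql_query(
        "SELECT * FROM sensor_readings ORDER BY DateAndTimeIndex",
        connection,
        parse_dates=["DateAndTimeIndex"],
        index_col="DateAndTimeIndex",
    )

db_df = load_readings(conn)
print(db_df)
```

The resulting dataframe has the same shape as the one read from the csv file, so the rest of the notebook would not need to change.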


In [29]:
# save raw data to dataframes

data_1hr_df = pd.read_csv(
    io.BytesIO(uploaded['consecutive_data_block_220days_1hr_with5am.csv']), 
    parse_dates = ['DateAndTimeIndex','fact_date','fact_time'],
    index_col = 'DateAndTimeIndex')


In [30]:
data_1hr_df.head()


DateAndTimeIndex,fact_date,fact_time,2nd_floor_406,HRV_north_427,HRV_south_466,OAT_north_173,carport_pv_469,solar_wall_146,solar_roof_185,temp_5am_173,year,day_mo,fact_day,fact_month,fact_hour
2019-02-22 15:00:00,2019-02-22,2021-12-16 15:00:00,23.710943,22.711896,22.917343,1.151461,4641.016667,19.063403,7.708834,-15.552165,2019,02-22,22,2,15
2019-02-22 16:00:00,2019-02-22,2021-12-16 16:00:00,23.345989,22.569029,22.768495,1.431155,4329.4,18.638129,7.934233,-15.552165,2019,02-22,22,2,16
2019-02-22 17:00:00,2019-02-22,2021-12-16 17:00:00,23.144977,22.413946,22.608923,-1.538001,714.5,14.824185,7.35675,-15.552165,2019,02-22,22,2,17
2019-02-22 18:00:00,2019-02-22,2021-12-16 18:00:00,22.869214,40.433321,40.791459,-4.367817,-0.05,4.763032,2.198453,-15.552165,2019,02-22,22,2,18
2019-02-22 19:00:00,2019-02-22,2021-12-16 19:00:00,22.659765,41.090683,41.134331,-5.559224,-7.6,0.510342,-1.548033,-15.552165,2019,02-22,22,2,19


In [31]:
data_1hr_df.columns

Index(['fact_date', 'fact_time', '2nd_floor_406', 'HRV_north_427',
       'HRV_south_466', 'OAT_north_173', 'carport_pv_469', 'solar_wall_146',
       'solar_roof_185', 'temp_5am_173', 'year', 'day_mo', 'fact_day',
       'fact_month', 'fact_hour'],
      dtype='object')

---
## Predict the temperature
For the final project, we built the machine learning model in mlOS.

In this mockup, we use scikit-learn's random forest regressor with default settings to show that Python can be used to integrate the model and the dashboard.

In [32]:
# create the random forest regressor model, following the approach we used in mlOS
# the 2nd_floor_406 column is the data we want to predict
# model the "4 hour ahead prediction" by creating a new column where the 406 data is shifted up by 4 rows
# (4 hours at 1-hour sampling), so the inputs at the current DateAndTimeIndex are paired with the future value
model_test = data_1hr_df.copy()
model_test['2nd_floor_406_MVT'] = model_test['2nd_floor_406'].shift(-4)
model_test.head(10)

DateAndTimeIndex,fact_date,fact_time,2nd_floor_406,HRV_north_427,HRV_south_466,OAT_north_173,carport_pv_469,solar_wall_146,solar_roof_185,temp_5am_173,year,day_mo,fact_day,fact_month,fact_hour,2nd_floor_406_MVT
2019-02-22 15:00:00,2019-02-22,2021-12-16 15:00:00,23.710943,22.711896,22.917343,1.151461,4641.016667,19.063403,7.708834,-15.552165,2019,02-22,22,2,15,22.659765
2019-02-22 16:00:00,2019-02-22,2021-12-16 16:00:00,23.345989,22.569029,22.768495,1.431155,4329.4,18.638129,7.934233,-15.552165,2019,02-22,22,2,16,22.394025
2019-02-22 17:00:00,2019-02-22,2021-12-16 17:00:00,23.144977,22.413946,22.608923,-1.538001,714.5,14.824185,7.35675,-15.552165,2019,02-22,22,2,17,22.075849
2019-02-22 18:00:00,2019-02-22,2021-12-16 18:00:00,22.869214,40.433321,40.791459,-4.367817,-0.05,4.763032,2.198453,-15.552165,2019,02-22,22,2,18,21.987131
2019-02-22 19:00:00,2019-02-22,2021-12-16 19:00:00,22.659765,41.090683,41.134331,-5.559224,-7.6,0.510342,-1.548033,-15.552165,2019,02-22,22,2,19,21.900601
2019-02-22 20:00:00,2019-02-22,2021-12-16 20:00:00,22.394025,38.672799,38.25474,-5.505789,-7.283333,-0.919854,-3.494038,-15.552165,2019,02-22,22,2,20,21.739694
2019-02-22 21:00:00,2019-02-22,2021-12-16 21:00:00,22.075849,30.801815,28.46953,-5.45709,-7.4,-0.237125,-3.587553,-15.552165,2019,02-22,22,2,21,21.600509
2019-02-22 22:00:00,2019-02-22,2021-12-16 22:00:00,21.987131,27.040945,24.976827,-5.352929,-7.6,-0.289764,-3.290348,-15.552165,2019,02-22,22,2,22,21.434555
2019-02-22 23:00:00,2019-02-22,2021-12-16 23:00:00,21.900601,25.285081,23.86081,-5.273453,-7.083333,-0.391399,-2.985559,-15.552165,2019,02-22,22,2,23,21.268868
2019-02-23 00:00:00,2019-02-23,2021-12-16 00:00:00,21.739694,25.0141,23.447419,-6.412402,-6.95,-0.499769,-3.208749,-12.38206,2019,02-23,23,2,0,22.153262
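
The effect of `shift(-4)` is easier to see on a small series. The sketch below uses made-up, rounded temperatures; the column names `now` and `in_4_hours` are illustrative only.

```python
import pandas as pd

# six hourly readings (illustrative values)
temps = pd.Series([23.7, 23.3, 23.1, 22.9, 22.7, 22.4], name="2nd_floor_406")

# shift(-4) moves each value up 4 rows, so row i holds the reading from row i + 4
demo = pd.DataFrame({"now": temps, "in_4_hours": temps.shift(-4)})
print(demo)

# the last 4 rows have no "future" value and become NaN
print(demo["in_4_hours"].isna().sum())
```

Only the first two rows keep a target value here, which is why the last 4 rows of the real dataframe have to be dropped before training.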


In [33]:
# because of the shift, the last 4 rows of 2nd_floor_406_MVT are NaN since there was no "future" data
# drop the last 4 rows and check that there is no missing data in any of the columns
model_test = model_test.dropna()
model_test.isnull().sum()

fact_date            0
fact_time            0
2nd_floor_406        0
HRV_north_427        0
HRV_south_466        0
OAT_north_173        0
carport_pv_469       0
solar_wall_146       0
solar_roof_185       0
temp_5am_173         0
year                 0
day_mo               0
fact_day             0
fact_month           0
fact_hour            0
2nd_floor_406_MVT    0
dtype: int64

In [34]:
# select columns for input features and target

# all of the values read from the other sensors are used as inputs
# additionally, the minimum temperature for the day (temp_5am_173) is used to model the starting point for the temperature
# and including the day, month, and hour data allows us to take seasonality into account
# (summer/winter and day/night)
input_features=['HRV_north_427','HRV_south_466','OAT_north_173','carport_pv_469','solar_wall_146','solar_roof_185',
                'temp_5am_173',
                'fact_day','fact_month','fact_hour']
# the predicted value is 2nd floor temperature 4 hours ahead
target=['2nd_floor_406_MVT']

# create the input and output variables
X = model_test[input_features].reset_index(drop=True)
y = model_test[target].reset_index(drop=True)

In [35]:
# create the machine learning model and save it
# in production, model creation and testing steps would be separate 
# and we would just import the saved model file as in the next cell

# split the data into train and test
from sklearn.model_selection import train_test_split

# set parameters:
# 80% of the X is used for learning and 20% for testing
# for mockup use the same random state value so that data is always split the same way
test_size= 0.2  
random_state=102202

X_train, X_test, y_train, y_test=train_test_split(X,y, test_size=test_size,random_state=random_state)

# pick ml algorithm
from sklearn.ensemble  import RandomForestRegressor
import pickle

model_rf = RandomForestRegressor(n_estimators=100)

model_rf.fit(X_train, y_train.values.ravel())

print(model_rf)
# save the model file 
model_filename= "regr_model_rf.pkl"
with open(model_filename, 'wb') as outfile:
    pickle.dump(model_rf,outfile)

# test the model by predicting output on the test data

y_pred_rf = model_rf.predict(X_test)

# check out the prediction errors
from sklearn.metrics import mean_absolute_error, mean_squared_error

mae_rf = mean_absolute_error(y_test, y_pred_rf)
print('Random forest regressor MAE: ', mae_rf)
mse_rf = mean_squared_error(y_test, y_pred_rf)
print('Random forest regressor MSE: ', mse_rf)

# check out the feature importance (which input has the most impact on the predicted value)
feature_importance = pd.DataFrame(
    model_rf.feature_importances_, index=X_train.columns, columns=['importance']
).sort_values('importance', ascending=False)
print(feature_importance)

RandomForestRegressor()
Random forest regressor MAE:  0.4084012803053819
Random forest regressor MSE:  0.3344610942834355
                importance
OAT_north_173     0.453253
HRV_north_427     0.179966
HRV_south_466     0.088819
temp_5am_173      0.062509
solar_wall_146    0.054592
fact_day          0.048721
fact_month        0.042914
fact_hour         0.028415
carport_pv_469    0.025616
solar_roof_185    0.015196
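
To make the two error metrics concrete, here is a small worked example that computes them by hand with NumPy. The `y_true` and `y_pred` arrays are made up for illustration; only the reported MSE value is taken from the output above. The square root of the MSE (RMSE) is added because it is in the same units as the temperature.

```python
import numpy as np

# made-up targets and predictions, for illustration only
y_true = np.array([22.0, 23.0, 24.0, 25.0])
y_pred = np.array([22.5, 22.5, 24.0, 26.0])

errors = y_pred - y_true
mae = np.mean(np.abs(errors))   # mean absolute error
mse = np.mean(errors ** 2)      # mean squared error, penalizes large misses more
rmse = np.sqrt(mse)             # back in the units of the target
print(mae, mse, rmse)

# RMSE corresponding to the MSE printed by the notebook cell above
reported_rmse = np.sqrt(0.3344610942834355)
print(round(reported_rmse, 2))
```

For the model above this gives an RMSE of about 0.58, slightly above the MAE of about 0.41, which suggests that a few larger errors are present.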


In [36]:
# # example of how to read the model file
# model_filename= "regr_model_rf.pkl"
# with open(model_filename, 'rb') as infile:
#   model_rf=pickle.load(infile)

# create the predicted output for the dashboard display
y_mock = model_rf.predict(X)
y_mock_df = pd.DataFrame(data=y_mock,index=model_test.index, columns=['406_predicted_temp'])
y_mock_df = y_mock_df.reset_index()

# take a look at data (picked random date)
y_mock_df[y_mock_df.DateAndTimeIndex >= datetime(2019,5,1,0,0,0)].head(20)

Unnamed: 0,DateAndTimeIndex,406_predicted_temp
1617,2019-05-01 00:00:00,22.077982
1618,2019-05-01 01:00:00,21.727152
1619,2019-05-01 02:00:00,21.579444
1620,2019-05-01 03:00:00,21.550694
1621,2019-05-01 04:00:00,21.633209
1622,2019-05-01 05:00:00,21.872601
1623,2019-05-01 06:00:00,22.338921
1624,2019-05-01 07:00:00,22.634666
1625,2019-05-01 08:00:00,22.816213
1626,2019-05-01 09:00:00,23.119776


---
## Create the mockup dashboard

In the mockup, the input sensor data is read from the combined csv file, and the displayed output is the predicted temperature from y_mock_df.
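
The "too hot" warning can be expressed as a small classification step on the predicted value. The sketch below mirrors the thresholds used by the `style_data_conditional` rules in the dashboard cell (22, 24, and 26); the function name `temperature_band` and the label "normal" are assumptions.

```python
def temperature_band(predicted_temp):
    """Map a predicted temperature to a warning colour (hypothetical helper)."""
    if predicted_temp >= 26:
        return "red"
    if predicted_temp >= 24:
        return "orange"
    if predicted_temp > 22:
        return "yellow"
    return "normal"

for temp in (21.6, 23.0, 24.0, 26.5):
    print(temp, temperature_band(temp))
```

Keeping the thresholds in one function like this would let the dashboard colours and any alerting logic share a single definition.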

In [None]:
# google colab doesn't include jupyter_dash by default, so we have to install it
# uncomment the line below the first time the cell is run, then comment it out for the rest of the session
# !pip install jupyter-dash

In [37]:
# create the dashboard using JupyterDash package
from jupyter_dash import JupyterDash
from dash import dcc
from dash import html
from dash.dependencies import Input, Output

import plotly.express as px
from plotly.subplots import make_subplots
import plotly.graph_objects as go
from dash import dash_table  # the standalone dash_table package is deprecated in Dash 2.x


# use a global variable for the timestamp
# this is really BAD design, but it works for the demo because only one user is using it
# and the timestamp is arbitrary (in real life it would always be datetime.now()
# at the time of the call, or we would use proper state handling)

# pick either the timestamp of the input file
data_1hr_df.reset_index(inplace=True) # reset index so it becomes a column in dataframe
start_timestamp = min(data_1hr_df.DateAndTimeIndex) + timedelta(hours=4)
# or just an arbitrary start time for the demo (comment out one)
start_timestamp = datetime(2019,5,1,0,0,0)

# load data for the component that displays predicted temperature
def getData():
  global start_timestamp
  # mask = (data_1hr_df.DateAndTimeIndex == start_timestamp+timedelta(hours=4)) 
  mask = (y_mock_df.DateAndTimeIndex == start_timestamp)

  return y_mock_df.loc[mask].round(0).to_dict('records')

# -----------------------------------------------------------------
# Build dashboard components
app = JupyterDash(__name__)

app.layout = html.Div([
    html.H1("Temperature Monitoring Demo MockUp", style=
            {'textAlign': 'center','color': 'black'}),
    html.Div(id='date_time_display', children = []),
    
    # predicted temperature in a cell of a table
    dash_table.DataTable(
          id = 'table',
          data = getData(),
          columns=[{'name': 'Predicted Temperature (4 hours ahead):', 'id': '406_predicted_temp'},],
          style_cell={'font-size': '20px',
                      'textAlign': 'center'
          },
          style_data={'height': '90px','font-size': '48px', 'font-weight':'bold'},
          style_data_conditional=[  # changes the background colour of the cell based on the temp value
              # the rules must reference the column shown in the table ('406_predicted_temp')
              {
                  'if': {
                      'filter_query': '{406_predicted_temp} > 22 && {406_predicted_temp} < 24',
                      'column_id': '406_predicted_temp'
                  },
                  'backgroundColor': 'yellow',
                  'color': 'black'
              },
              {
                  'if': {
                      'filter_query': '{406_predicted_temp} >= 24 && {406_predicted_temp} < 26',
                      'column_id': '406_predicted_temp'
                  },
                  'backgroundColor': 'orange',
                  'color': 'black'
              },
              {
                  'if': {
                      'filter_query': '{406_predicted_temp} >= 26',
                      'column_id': '406_predicted_temp'
                  },
                  'backgroundColor': 'red',
                  'color': 'white'
              }

          ],
          fill_width=True, 
          page_size=200),

    html.Br(), # graphically display the input data for the past 5 hours
    html.H2('Sensor Input Data', style=
            {'textAlign': 'center','color': 'black'}), 
    dcc.Graph(id='graph_features'),

    dcc.Interval( # set the interval refresh for the dashboard mockup
            id='interval-component',
            interval=1*1000, # in milliseconds, so we'll update every second
            n_intervals=0
        ),

    html.Br(),
    # add a dropdown menu to show how extra components could be added for the future
    # for example, user would be able to select one or more of these inputs
    # and simulate how they would modify the predicted temperature
    # (would have to implement code for that part)
    html.H2('Add Modifier Effects to the Predicted Temperature', style=
            {'textAlign': 'center','color': 'black'}), 
    html.Label('Select one or more from list below:'),
    dcc.Dropdown(
            options=[
                {'label': 'Humidity', 'value': 'H'},
                {'label': 'Blinds at 100%', 'value': 'B100'},
                {'label': 'Blinds at 50%', 'value': 'B50'},
                {'label': 'Air Speed, 1 fan', 'value': 'F'},
                {'label': 'Air Speed, 2 fans', 'value': 'F2'},
                {'label': 'CO_2 levels (occupancy)', 'value': 'CO'}
            ],
            value=['H', 'F'],
            multi=True
        ),
    html.Br(),
    html.Br(),
    html.Br(),
    html.Br(),
    html.Br(),
    html.Br()
], style={'width':'95%','height':'85%'})

# dashboard callbacks, executed for each component at interval time
# -----
# Define callback to update timestamp
@app.callback(
    Output('date_time_display', 'children'),
    Input('interval-component', 'n_intervals')
)
def update_time(n):
  global start_timestamp
  # return a single component (no trailing comma, which would turn the result into a tuple)
  return html.H2("Current date and time: " + start_timestamp.strftime('%Y-%m-%d %H:%M:%S'), style=
            {'opacity': '1','color': 'black','textAlign': 'center',})

# -----
# Define callback to update table
@app.callback(
    Output('table','data'),
    Input('interval-component', 'n_intervals')
)
def update_table(n):
  # getData() selects the prediction row for the current start_timestamp
  return getData()

# -----
# Define callback to update graph
@app.callback(
    Output('graph_features','figure'),
    Input('interval-component', 'n_intervals')
)
def update_figure(n):
  global start_timestamp
  # select the window from 4 hours before the current timestamp up to the current timestamp
  mask = (data_1hr_df.DateAndTimeIndex  >= start_timestamp-timedelta(hours=4)) & (data_1hr_df.DateAndTimeIndex  <= start_timestamp)

  fig = make_subplots(
      rows = 2, cols=3,
      subplot_titles = ('Heat Supply Temp, North Side (427)', 'Heat Supply Temp, South Side (466)', 'Outdoor Air Temp, North (173)', 
                        'Carport Power Generation (469)', 'Solar Wall Temp (146)', 'Solar Roof Temp (185)'))
  # fig.add_trace(go.Scatter(x = data_1hr_df.DateAndTimeIndex, y = data_1hr_df['2nd_floor_406']), row=1,col=1)
  fig.add_trace(go.Scatter(x = data_1hr_df.DateAndTimeIndex.loc[mask], y = data_1hr_df['HRV_north_427'].loc[mask]), row=1,col=1)
  fig.add_trace(go.Scatter(x = data_1hr_df.DateAndTimeIndex.loc[mask], y = data_1hr_df['HRV_south_466'].loc[mask]), row=1,col=2)
  fig.add_trace(go.Scatter(x = data_1hr_df.DateAndTimeIndex.loc[mask], y = data_1hr_df['OAT_north_173'].loc[mask]), row=1,col=3)
  fig.add_trace(go.Scatter(x = data_1hr_df.DateAndTimeIndex.loc[mask], y = data_1hr_df['carport_pv_469'].loc[mask]), row=2,col=1)
  fig.add_trace(go.Scatter(x = data_1hr_df.DateAndTimeIndex.loc[mask], y = data_1hr_df['solar_wall_146'].loc[mask]), row=2,col=2)
  fig.add_trace(go.Scatter(x = data_1hr_df.DateAndTimeIndex.loc[mask], y = data_1hr_df['solar_roof_185'].loc[mask]), row=2,col=3)

  fig.update_layout(height=600, width=1200, margin=dict(l=20, r=20, t=20, b=20),
     showlegend=False)
  # advance the demo clock by one hour per refresh; the other callbacks only read this value
  start_timestamp += timedelta(hours=1)
  return fig

# -------------------------------------------------------------------------    
# Run app and display result in a separate tab (or use "inline" to display in the notebook)
app.run_server(mode='external')

Dash app running on:


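
One way to avoid the global timestamp in the demo would be to derive the displayed time from the `n_intervals` counter that `dcc.Interval` already passes to every callback. The sketch below shows only that calculation; the names `DEMO_START` and `timestamp_for_tick` are assumptions, and the callbacks would need to be adapted to call it.

```python
from datetime import datetime, timedelta

# arbitrary demo start time, matching the one used in the mockup
DEMO_START = datetime(2019, 5, 1, 0, 0, 0)

def timestamp_for_tick(n_intervals, start=DEMO_START, step=timedelta(hours=1)):
    """Return the simulated time for a refresh count (hypothetical helper).

    The counter may be None before the first tick, so it is treated as 0.
    """
    return start + (n_intervals or 0) * step

print(timestamp_for_tick(0))
print(timestamp_for_tick(5))
```

Since each callback would compute the same timestamp from the same counter, the result no longer depends on the order in which the callbacks run.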