<a href="https://colab.research.google.com/github/BitKnitting/FitHome_Analysis/blob/master/notebooks/Baseline.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Calculating Baseline
Insights provided to the homeowner by the FitHome experience begin with knowledge of the home's initial electricity consumption.

We take an average daily value to determine the starting baseline of a homeowner's electricity consumption.  

The challenge with providing the baseline is getting enough readings over a long enough window of time.

__We constrain a baseline to be at least three days' worth of readings, where each day has enough readings to represent the full 24 hours.__

The goal of this notebook is to provide a walkthrough of how we calculate a home's baseline average daily electricity use.

# Data from Firebase
Our data comes from the Firebase Realtime Database that the home's monitor publishes to.  The data is retrieved and saved to the baseline.json file.


In [1]:
%%time
!curl 'https://fithome-9ebbd.firebaseio.com/flower-09282019/readings.json?print=pretty' > baseline.json 

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 18.0M  100 18.0M    0     0  2065k      0  0:00:08  0:00:08 --:--:-- 5406k
CPU times: user 80.7 ms, sys: 20.8 ms, total: 102 ms
Wall time: 10.4 s


# Pandas Dataframe
Now that we have a copy of the data, we must get it into a structure that has all the amazing math/transform goop...Ooohh - we've got - PANDAS!

In [58]:
# @title Put into a Pandas dataframe
# This can take >= 3 minutes.
%%time
import pandas as pd
df_needs_reshape = pd.read_json("baseline.json", encoding="utf8")
df_reshaped = df_needs_reshape.T
# The datetimes are in UTC.  We are in the US Pacific time zone.
df_reshaped.index = df_reshaped.index.tz_localize('UTC').tz_convert('US/Pacific')
# @title Start Date and End Date of Series
print('Start date: {}'.format(df_reshaped.index.min()))
print('End   date: {}'.format(df_reshaped.index.max()))
time_between = df_reshaped.index.max() - df_reshaped.index.min()
print('\n\nElapsed time: {}'.format(time_between))

Start date: 2019-10-11 13:46:35-07:00
End   date: 2019-10-31 08:40:27-07:00


Elapsed time: 19 days 18:53:52
CPU times: user 4min 2s, sys: 6.12 s, total: 4min 8s
Wall time: 3min 59s
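
To make the reshape step concrete, here is a minimal sketch on a two-reading stand-in for baseline.json. The layout (`{timestamp: {"I": ..., "P": ...}}`) and the epoch-second keys are assumptions for illustration; the real file may label readings differently. `America/Los_Angeles` is the canonical name for the same zone as `US/Pacific`.

```python
import pandas as pd

# Hypothetical miniature of baseline.json: {timestamp: {"I": ..., "P": ...}}.
# Epoch-second string keys are an assumption about how readings are labeled.
raw = {
    "1570826795": {"I": 2.5, "P": 260.0},
    "1570826796": {"I": 2.6, "P": 270.0},
}

# The outer keys become columns, so transpose to get one row per reading.
df = pd.DataFrame(raw).T

# Interpret the keys as UTC epoch seconds, then convert to US Pacific time.
utc_index = pd.to_datetime([int(key) for key in df.index], unit="s", utc=True)
df.index = utc_index.tz_convert("America/Los_Angeles")

print(df)
```

After the transpose, each row is one reading and the columns are `I` and `P`, which is the shape the rest of the notebook works with.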


# Describe the Data
Let's look at some interesting stats.

In [59]:
# @title Interesting Data Stats
df_reshaped.describe()

|       | I        | P          |
|-------|----------|------------|
| count | 314616.0 | 314616.0   |
| mean  | 6.17531  | 660.158902 |
| std   | 4.876879 | 546.450344 |
| min   | 0.002    | 0.00032    |
| 25%   | 2.525    | 259.5798   |
| 50%   | 4.309    | 479.15085  |
| 75%   | 8.495    | 896.704275 |
| max   | 52.895   | 5663.446   |


# Baseline
The baseline is the amount of kWh used on a daily average for at least 3 days.  The readings are received while the monitor is in Learning mode.  Once there are enough readings to calculate a baseline, the FitHome experience changes state to Active mode.  Active mode is where the FitHome experience starts giving personalized insights about minimizing electricity use without any lifestyle changes.

Below are the steps taken to come up with a baseline kWh value.

## Daily Readings
Get the days that have enough readings to be included.

In [62]:
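all_power_data = df_reshaped['P']
# Mean of the readings for each calendar day, and the number of readings per day.
daily_readings = all_power_data.resample(rule='D').mean()
daily_counts = all_power_data.resample(rule='D').count()
# Keep only the days whose reading count is above the 70th percentile of daily counts.
enough_samples = daily_counts > daily_counts.quantile(.7)
daily_readings_kWh = daily_readings[enough_samples]/1000
daily_readings_kWh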

2019-10-22 00:00:00-07:00    0.332787
2019-10-23 00:00:00-07:00    0.383072
2019-10-24 00:00:00-07:00    0.587490
2019-10-25 00:00:00-07:00    0.916216
2019-10-27 00:00:00-07:00    0.570707
2019-10-28 00:00:00-07:00    0.570248
Name: P, dtype: float64
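
To see how the quantile rule selects days, here is a small sketch on made-up daily reading counts (the numbers are illustrative, not taken from the monitor). Because the threshold is the 70th percentile of the counts themselves, the rule keeps roughly the top 30% of days by reading count, whatever the absolute coverage is.

```python
import pandas as pd

# Made-up reading counts for ten days; a few days are missing most readings.
days = pd.date_range("2019-10-20", periods=10, freq="D")
daily_counts = pd.Series(
    [100, 1000, 1200, 1300, 1250, 400, 1280, 1270, 50, 1100], index=days
)

# Same rule as the notebook: keep days above the 70th percentile of counts.
threshold = daily_counts.quantile(0.7)
kept_days = daily_counts[daily_counts > threshold]

print("threshold:", threshold)
print(kept_days)
```

If the aim is specifically "enough readings to represent 24 hours", a fixed minimum count based on the monitor's expected sampling rate would be an alternative worth considering, since a relative threshold can drop days that are well covered.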

## Calculate Baseline
We have at least three days of readings that are acceptable for use in the baseline calculation.  The average of these values will be our baseline value.
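
The "at least three days" condition could also be enforced in code rather than checked by eye. A minimal sketch, where `compute_baseline` and `MIN_DAYS` are hypothetical names introduced here and the sample values are the six daily figures from the Daily Readings output:

```python
import pandas as pd

MIN_DAYS = 3  # the notebook's stated minimum number of qualifying days


def compute_baseline(daily_values: pd.Series, min_days: int = MIN_DAYS):
    """Return the rounded mean of the daily values, or None if too few days qualify."""
    if len(daily_values) < min_days:
        return None
    return round(float(daily_values.mean()), 2)


# The six qualifying daily values from the Daily Readings step.
daily = pd.Series([0.332787, 0.383072, 0.587490, 0.916216, 0.570707, 0.570248])

print(compute_baseline(daily))           # enough days, so a value is returned
print(compute_baseline(daily.iloc[:2]))  # only two days, so None
```

Returning `None` is one way to signal that the monitor should stay in Learning mode; raising an exception would be an equally reasonable design.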

In [66]:
baseline_kWh = round(daily_readings_kWh.mean(),2)
print('\n************************\nThe baseline is: {} kWh.\n************************\n'.format(baseline_kWh))



************************
The baseline is: 0.56 kWh.
************************
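
A note on units, assuming `P` is instantaneous power in watts (the column name and magnitudes suggest this, but the notebook does not say): the mean of a day's readings divided by 1000 is then an average power in kW, and the energy used over the day is that average multiplied by 24 hours. A minimal sketch of the conversion, with `daily_energy_kwh` as a hypothetical helper:

```python
HOURS_PER_DAY = 24


def daily_energy_kwh(mean_power_watts: float) -> float:
    """Energy used over one day (kWh) at a given average power (W)."""
    return mean_power_watts / 1000 * HOURS_PER_DAY


# Example: an average draw of 560 W sustained over a full day.
print(round(daily_energy_kwh(560.0), 2))
```

Under that assumption, an average of 0.56 kW corresponds to roughly 13.44 kWh per day.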

